Algorithms and accuracy in A-Level results
This week saw students in the UK receive their A-Level (qualifications between compulsory and university education) exam results. Due to social distancing and stay-at-home orders from Coronavirus, individual exams were not possible.
Instead, grades were assigned using an algorithm devised by Ofqual, the UK’s qualifications and examinations regulator, and the results have been controversial.
There has been significant media coverage of students who have been downgraded from their teachers’ predicted grades, and calls for the predicted grades to be used instead.
It’s an interesting case study for me for two reasons:
- accuracy of predictions is an important aspect of (cyber) risk management
- rights against automated decision-making in data protection regulation
On the accuracy front, I’ve looked to understand the percentage of grades where the algorithm matched the teacher’s prediction (or “Centre Assessment Grade”) and how good, or not, teachers are at predicting their students’ achievements.
First up, how ‘accurate’ were the results generated by the Ofqual algorithm? The Guardian reports that 39.1% of predicted grades were adjusted down by one or more grades as a result of the algorithm, while 2.2% were adjusted up by one or more grades. So 58.7% of results were unchanged, or ‘accurate’, when compared to teachers’ predictions.
I’ve now read a few studies into the accuracy of teachers’ grade predictions, one of the most recent (Gill, 2019) being conducted by Cambridge Assessment, the parent of exam board OCR, and another (UCAS, 2013) looking at the differences between predicted and achieved grades for university applicants.
So secondly, how accurate are teachers at predicting exam results? Both studies were similar in their findings: over-prediction occurred 43% and 48% of the time, under-prediction 15% and 11% of the time, with achieved grades matching predictions 43% of the time in both.
So statistically teachers are wrong more than they are right, and there is a strong tendency for optimism leading to over-prediction of students’ grades. This has, in part, been attributed to predicted grades being used as a motivational tool.
On the surface, then, this would seem to be a good thing: predicted grades are often optimistic and, in the majority of cases, these were adopted. Claims of bias in the algorithm abound though, backed by a noticeable uptick in results at private and independent schools, and downgrades appearing to be more common at state-funded schools.
Ofqual defended the algorithm to Channel 4, saying that it was unbiased and the type of institution was not a factor. There does seem to be merit in the claims of bias, though, because of how the algorithm handles different class sizes.
For smaller class sizes of fewer than 15, there is an increasing reliance on teacher predictions, down to class sizes of 5 where they are the sole factor.
And class size may be a proxy for institution type, with independent schools entering 9.4 students/subject, whereas sixth form and further education colleges enter 33 students/subject.
So the smaller class sizes found at independent schools favoured teachers’ predictions, which are only correct ~2/5 of the time. While independent schools do have the highest accuracy (47.5%) of predicted grades, this means they still get it wrong more often than they get it right.
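The class-size effect described above can be sketched in a few lines. This is only an illustration: the straight-line taper between 5 and 15 students, and the name `teacher_weight`, are my assumptions, not Ofqual’s published formula.

```python
def teacher_weight(class_size, small=5, large=15):
    """Share of the final grade taken from teacher predictions (0.0 to 1.0).

    ASSUMPTION: a straight-line taper between the two thresholds.
    The real Ofqual weighting is not reproduced here.
    """
    if class_size <= small:
        return 1.0  # teacher prediction is the sole factor
    if class_size >= large:
        return 0.0  # the statistical model is the sole factor
    return (large - class_size) / (large - small)


# Average entries per subject quoted above
independent = teacher_weight(9.4)  # partly teacher-led
college = teacher_weight(33)       # fully model-led
```

Under this assumed taper, a typical independent school class would lean on teacher predictions for roughly half of the outcome, while a typical college class would not use them at all.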
And this is where my two interests start to converge because for larger class sizes there is the introduction of the school’s past performance. The last three years for each subject at each school are used along with an ordinal ranking (i.e. no ‘joint places’) of students.
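To make the rank-plus-history idea concrete, here is a deliberately simplified sketch. It is not Ofqual’s model: the function name `grades_from_history`, the student names and the past grade counts are all made up for illustration, and it assumes the school has at least some past results for the subject.

```python
def grades_from_history(ranked_students, past_grade_counts):
    """Hand out grades by rank so that this year's cohort mirrors the
    school's past grade distribution for the subject.

    ranked_students: names ordered best first, with no joint places.
    past_grade_counts: grade -> number of past students awarded it,
                       listed from best grade to worst (hypothetical data).
    """
    total = sum(past_grade_counts.values())
    n = len(ranked_students)
    results = {}
    position = 0
    cumulative = 0
    for grade, count in past_grade_counts.items():
        cumulative += count
        # How far down the ranking this grade stretches
        cutoff = round(cumulative * n / total)
        while position < cutoff:
            results[ranked_students[position]] = grade
            position += 1
    return results


cohort = ["Asha", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hal", "Ida", "Jo"]
history = {"A": 12, "B": 30, "C": 18}  # e.g. three past years combined
results = grades_from_history(cohort, history)
```

In this sketch the top two students get an A, the next five a B and the last three a C, regardless of what any individual might have achieved in an exam.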
A Mishcon de Reya blog post recalls a watershed case from 1991 in which what is now Experian was using an individual’s address to determine their credit rating: if you moved to an address where someone had previously struggled with repayments, you would ‘inherit’ this negative mark on your credit file and be deemed less creditworthy.
The Data Protection Registrar (now the ICO) ruled that this was too broad and didn’t meet the requirements for ‘fairness of processing.’ Fast-forward to 2020 and this is now enshrined in the General Data Protection Regulation as rights relating to automated individual decision-making.
Article 22, as it’s known, says that “the data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”
‘Solely’ will undoubtedly be the crux here: the rankings that are used were provided by teachers (assumed human!). Presumably the defence, as legal proceedings are being brought against Ofqual, will be that this constitutes adequate human input into the decision-making process.
Algorithms do not have any agency of their own: they are subject to the individual biases of their creators, and historical statistical models can only repeat the past. They shine a bright light on the socio-economic disparities present in the UK, from region, to ethnicity, to institutional privilege, and many more.
In Maslow’s hierarchy, esteem and self-actualisation are higher-order needs. Even with a 100% accurate model, subjecting individuals to automated assessment is hugely disenfranchising.
At the 60,000ft level, the aggregate decisions may appear statistically sound, but for thousands of students at sea-level, they have been denied the ability to determine their future.
300,000 URLs that tried to scam victims out of money have been removed by NCSC in four months, and members of the public have made 1.8M reports to the Suspicious Email Reporting Service since its launch in April this year ncsc.gov.uk
Other newsy bits
ReVoLTE vulnerability allows decryption of 4G calls
The way that many telcos have configured their Voice-over-LTE (VoLTE) 4G calls means that it is possible to decrypt calls. When you make a call over 4G it will likely be encrypted using a stream cipher, where one block is used to derive the encryption for the next block. The starting key is meant to be unique to each call; however, researchers found that many mobile operators assign the same key for all calls on the same base station (mast/tower). An attacker on the same base station as you can call you and the call will use the same encryption keys. By keeping you on the phone they will collect the encryption keys for subsequent blocks too and can use these to decrypt a previously recorded phone call. In Germany, the researchers found as many as 80% of base stations may have been configured incorrectly in December 2019. Since the issue was reported to the industry body, the GSM Association, all appear to have been addressed. zdnet.com, revolte-attack.net
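Why key reuse is so damaging for a stream cipher can be shown with a toy example. This is a minimal sketch and not the VoLTE cipher itself: the keystream here is just random bytes, and the names `xor_bytes`, `recorded_call` and `attacker_call` are hypothetical.

```python
import os


def xor_bytes(a, b):
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream a base station wrongly uses for two calls
keystream = os.urandom(32)

recorded_call = b"victim's earlier private call"    # unknown to the attacker
attacker_call = b"attacker's own known call data"   # known to the attacker

# What an eavesdropper captures over the air for each call
recorded_cipher = xor_bytes(recorded_call, keystream)
attacker_cipher = xor_bytes(attacker_call, keystream)

# XORing the two ciphertexts cancels the shared keystream, leaving
# plaintext XOR plaintext; the attacker then removes their own half.
recovered = xor_bytes(xor_bytes(recorded_cipher, attacker_cipher), attacker_call)
```

The attacker never learns the key; knowing the content of their own call is enough to recover the recorded one, which is why the attacker needs to keep the victim on the phone for at least as long as the call they want to decrypt.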
MITRE launches Shield knowledge base
The US not-for-profit has released a new collection of information for blue team defenders. Shield covers the opportunities, use cases and high-level procedures that detection and response teams may wish to implement to counter the tactics and techniques described in the adversary-focussed ATT&CK framework. It looks like it will become a useful resource for teams looking for a reference on how they may defend against different threat actor behaviours. There’s also a focus on deception and decoy technologies to detect and disrupt attacks. mitre.org
Attacks, incidents & breaches
- Internet access, mobile and landline communications have been ‘shut down’ - reportedly by the government - in Belarus after unrest following last weekend’s disputed election results wired.com
- 513 emails containing 28,000 people’s information were forwarded out of security training provider SANS’ control following a phishing attack bleepingcomputer.com
- ‘Cheat sheets’ developed by NCC Group, detailing answers to multiple-choice and practical exercises in the CREST Registered Tester exams, were posted to GitHub. The content suggests a long-running emphasis at the security consultancy on training employees to ‘pass’ the exam, rather than understand the concepts theregister.com
- Instagram has paid a $6,000 bounty to a researcher who found they were not properly deleting images and direct messages from their services, in potential violation of data protection regulations techcrunch.com
- NSA releases attribution, technical info and indicators of compromise for Drovorub malware used by Russia’s GRU (AKA Fancy Bear, APT28, Strontium) on Linux systems defense.gov (PDF)
- Patch your Citrix XenMobile instances: “we do anticipate malicious actors will move quickly to exploit” — Citrix CISO cyberscoop.com
- Researcher drops zero-day for ‘inadequately patched’ pre-auth remote code execution vulnerability in vBulletin forum software zdnet.com
The vBulletin story above is a good example of the ‘security research’ source Phil Huggins and I have in OISRU. You may be forced to respond to a risk event in order to avoid an ‘active’ attack at short notice.
- Great Firewall of China is now blocking TLS 1.3 connections that use the ‘ESNI’ feature, which encrypts the name of the server a client is communicating with theregister.com
- Forrester are predicting that, with the proliferation of TLS 1.3 and DNS-over-HTTPS, “within two years you’ll lose the ability to analyze network traffic” zdnet.com
- NCSC write up of the security behind the revised NHS Test & Trace App and changes from the previous version ncsc.gov.uk
Internet of Things
- A bug in Amazon’s Alexa voice assistant may have allowed attackers to access voice history and add/remove ‘skills’ (apps) from the victim’s devices bbc.co.uk
- UK Court of Appeal landmark ruling finds South Wales Police use of facial recognition tech was ‘unlawful’ and ‘violated human rights’ independent.co.uk
Mergers, acquisitions and investments
- Tel Aviv startup Adaptive Shield raises $4M of seed funding for SaaS security scanner techcrunch.com
Researchers have secretly been hobbling Emotet malware for the last six months
This year there has been a recurring theme of malware gotchas (vol. 3, iss. 30) and own-goals (vol. 3, iss. 28). This week it’s that, for the last six months, security researchers at Binary Defense have been helping to detect and vaccinate organisations against Emotet malware infections. By crafting a special Windows registry key (used by the malware to store an encryption key) they were able to crash the malware before infection. The crash would also generate two log files that you can use to detect the issue. Good work! zdnet.com