Facebook outage may cost the firm $60M
Facebook accidentally disconnected itself from the Internet this week, causing all of its services to be unavailable to its three billion users for six hours, as engineers scrambled to fix the issue.
The outage was caused when the social network stopped advertising Border Gateway Protocol (BGP) routes that signpost how to get to its network. The company also hosts its Domain Name System (DNS) servers itself, and that meant that human-friendly names like ‘facebook.com’ couldn’t be resolved to the IP addresses used by devices to communicate. (Rob Graham has a good explanation).
Facebook’s engineering blog confirmed most of the speculation doing the rounds, though it in itself prompts more questions about engineering practices and infrastructure design.
It suggests Facebook’s internal tools are so bad that it is trivial for a user to confuse a command that ‘assesses global backbone capacity’ with ‘disconnect yourself from the backbone’. Perhaps the command in question allows capacity limits to be set, as well as read, and someone set it (or it defaulted) to zero, essentially disconnecting the network. Either way, not only was it suitably unclear for the engineer, it wasn’t picked up by tooling designed to audit and prevent this type of command from getting through.
Lots of Silicon Valley tech companies end up building complex platforms with sophisticated internal toolsets for managing workloads at scale - Google even created its own filesystem and network protocol! - and the user experience and human interface on these tools can often be woeful. This is a lesson in investing similarly in the engineering toolsets, and appropriate ‘are you sure’ checks and safeguards.
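As a minimal sketch of the kind of ‘are you sure’ safeguard described above: the function name, exception and units below are hypothetical and are not Facebook’s tooling. The idea is simply that a request to set capacity must be explicit, and that a value of zero is refused unless someone has deliberately confirmed it.

```python
class UnsafeChangeError(Exception):
    """Raised when a change would remove all capacity without confirmation."""


def plan_capacity_change(current_gbps, requested_gbps, confirmed=False):
    """Hypothetical pre-flight check for a backbone capacity change.

    Returns a description of the change if it looks safe to apply.
    """
    # Refuse to fall back to a default: the operator must state a value.
    if requested_gbps is None:
        raise ValueError("requested capacity must be set explicitly")
    if requested_gbps < 0:
        raise ValueError("requested capacity cannot be negative")
    # Zero capacity is effectively 'disconnect', so require explicit sign-off.
    if requested_gbps == 0 and not confirmed:
        raise UnsafeChangeError(
            "setting capacity to 0 disconnects the backbone; "
            "re-run with confirmed=True if this is intended"
        )
    return {"from_gbps": current_gbps, "to_gbps": requested_gbps}
```

A real tool would presumably layer peer review and staged rollout on top; the point of the sketch is only that reading and writing capacity should not be easy to confuse, and that the destructive case should be hard to reach by accident.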
Secondly, Facebook has tightly integrated and automated aspects of its network maintenance, such that its DNS servers can manipulate the BGP, or routing, information of its network. There are reasons why you’d want to do this (if an IP address is unavailable in one location, you don’t want to keep advertising it there; you want to reroute requests to a data centre where it is available).
Thirdly, this took down Facebook’s “primary and out-of-band network access,” so… That’s not really out-of-band network access, is it, Facebook? Typically this connectivity is provisioned over a separate provider’s network, to a subset of critical components that would, in the event of a catastrophic outage, allow you to get back in and correct the mistake. When I worked at Multiplay we had such a link, and an IP KVM that we could request be plugged into servers to give us ‘hands on keyboard’ type access to critical components. (Back then, things like leaving a floppy disk in the drive caused all sorts of issues when the machine eventually rebooted. Ahem.) It’s also why network engineers for mobile operators are often given SIM cards on competitor networks. Of course, managing that at ‘Facebook scale’ is going to be more complex, but the underlying principle of segregated, redundant and resilient infrastructure appears to have been missed.
This kind of ‘cold start’ for a business is something rarely practised, but understanding the dependencies and the order in which things need to be brought up is often something that catches organisations out in smaller-scale failures, or following bad ransomware attacks.
This is the sort of risk consolidation or aggregation that regulators and insurance companies are worrying about when it comes to ‘the cloud’. Essentially, have we got too many eggs in too few baskets? Research and modelling have traditionally focussed on the big cloud providers (AWS, Azure and GCP) and put a three-day outage at one of these providers in the region of $19 billion impact to the US economy. The outage is reportedly going to cost Facebook $60 million; the total economic impact, when considering all the businesses that rely on Facebook’s services, could be much higher.
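Working out a cold-start order is, at heart, a dependency-ordering problem. The sketch below uses Python’s standard-library graphlib to derive one; the service names and their dependencies are made up for illustration and would need to come from your own architecture.

```python
from graphlib import TopologicalSorter

# Hypothetical map of each service to the services it needs running first.
DEPENDENCIES = {
    "dns": {"network"},
    "auth": {"dns"},
    "database": {"network", "dns"},
    "web": {"auth", "database"},
}


def startup_order(dependencies):
    """Return a list of services in an order that respects dependencies.

    TopologicalSorter raises graphlib.CycleError if two services depend
    on each other, which is itself a useful thing to discover in advance.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(startup_order(DEPENDENCIES))
```

Running this sort of check against a real inventory tends to surface circular dependencies (for example, remote access that needs DNS, which needs the network you are trying to fix) long before an outage does.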
To many users Facebook is primarily a place to connect with friends, share photos and be reminded of birthdays, though it has devoted significant effort to presenting itself to businesses as the place not just to advertise to customers, but to access them. That’s far more than just Facebook pages that you can like: its portfolio of Facebook, WhatsApp and Instagram are machines for commerce and communication. WhatsApp in India replaced the voice call button with a shopping button to browse catalogues and order products from businesses. It is also rolling out insurance and pension products accessible via the platform. One restaurant in Ghana reported a loss of 50% of sales because customers were unable to place orders.
Similarly, the ‘login with Facebook’ button removes a lot of friction during account creation and login (plus gives Facebook insight into the apps and services) but left many sites that use this mechanism without a way for their customers to log in.
Monday’s outage provides an interesting data point and avenue for investigation into this type of risk and the nature of its (unforeseen) consequences.
cnbc.com, fb.com (engineering blog), @ErrataRob explanation, arstechnica.com, WhatsApp business: India shopping, insurance, pensions [bbc.co.uk], Ghana restaurant, Risk aggregation studies: lloyds.com, cam.ac.uk
15% of respondents to SANS’ ICS/OT survey admitted at least one cyber security incident in the last 12 months, with 1/5 originating from engineering workstations, ~1/3 via remote access, and 1/2 citing ‘external connections’ as the initial vector zdnet.com
58% of nation-state attacks observed by Microsoft originate from Russia, and 32% are successful (up from 21% last year), according to the company bleepingcomputer.com
An older stat, but there are… 40,000 managed services providers in the US techcrunch.com
Other newsy bits
Syniverse - a company at the centre of global telecommunications - ‘quietly’ discloses breach
A Securities and Exchange Commission (SEC) filing from Syniverse has ‘quietly’ disclosed that an “individual or organization gained unauthorized access to databases within its network on several occasions.” The unauthorised access is believed to affect 235 of its telco customers. The intrusion was discovered in May 2021, but is believed to have started five years ago in May 2016.
Syniverse, who you’ve probably never heard of, helps mobile operators to exchange information when their customers are roaming, including call records (who called who and for how long) and text messages (SMS). A company presentation from 2020 boasts connecting 8 billion devices and delivering 1 trillion SMS annually.
The company says it has ‘adequately remediated’ the vulnerabilities that caused the incident. Given what an attractive target they are for intelligence agencies, I’d be amazed if only a single actor had targeted and infiltrated their network.
Massive breach of Twitch data
Over 125GB of proprietary and user data from Amazon-owned streaming service Twitch was leaked online this week. The data included full source code for the platform, earnings of creators using the platform, and an unannounced competitor to Valve’s Steam gaming platform.
A post on 4chan accompanying the leak said “Jeff Bezos paid $970m for this, we’re giving it away FOR FREE.”
Twitch has been quiet about the incident, save two short updates on its blog, which confirm the data was exposed following a configuration change. That’s frustrating many content creators (whom Twitch calls Partners) who are worried about what data, in addition to earnings info, may be in the hands of nefarious actors. Some top streamers earn millions from the platform and that wealth may make them a target for cybercriminals.
While user accounts aren’t reported to have been compromised, if you have an account you should consider resetting your password and enabling multi-factor authentication.
UK High Court finds ruler of Dubai used NSO’s Pegasus spyware to access ex-wife’s phone during divorce proceedings
Dubai’s ruler and Vice President of the United Arab Emirates, Sheikh Mohammed bin Rashid Al Maktoum, used NSO Group’s Pegasus spyware to access the phones of ex-wife Princess Haya and her divorce lawyer during their divorce proceedings. As Ciaran Martin, former head of the UK’s National Cyber Security Centre, puts it:
Cyber security is often boring.
A British court finding that a Dubai prince used Israeli spyware to hack a member of the Lords who is also a senior lawyer who was alerted to it by Cherie Blair who advises the Israeli firm…
…is not one of those times
Interestingly, The Guardian reports an NSO insider saying that after the incident came to light, all UK country code ‘+44’ numbers were added to an exclusion list.
Attacks, incidents & breaches
- Scottish brewer BrewDog exposed details of 200,000 shareholders because of hard-coded API token that would also allow app users to claim FREE BEER, takes four attempts to fix pentestpartners.com
- UK newspaper The Telegraph leaves user data and server logs in publicly accessible ElasticSearch instance, claims less than 0.1% of users affected theregister.com
- IT technician fired from a school and then an IT company carried out revenge attacks wiping data and changing passwords, pleads guilty and faces up to 10 years in prison bleepingcomputer.com
- AvosLocker ransomware gang shifts double-extortion to auction system for stolen data therecord.media
- FIN12 group, associated with TrickBot gang, targets healthcare groups with revenues of at least $300M, in ransomware attacks; avoids data exfiltration and in doing so cuts dwell, or ‘time to ransom,’ from over twelve to under three days bleepingcomputer.com
- New MalKamak group, believed to be affiliated with Iran, tied to new ShellClient remote access trojan used against aviation and telco targets in Middle East, Europe and America bleepingcomputer.com
- Russian companies targeted by smaller ransomware outfits says Kaspersky bleepingcomputer.com
- QuickBooks accounting software customers targeted in phishing attacks, warns developer Intuit bleepingcomputer.com
- Misconfigured Apache Airflow instances leak creds zdnet.com
- UK NCSC has updated guidance on bring your own device (BYOD) security ncsc.gov.uk
- Google stumps up $1M to Linux Foundation’s Secure Open Source initiative that “financially rewards developers for enhancing the security of critical open-source projects” zdnet.com
- Google to mandate multi-factor authentication on 150 million accounts, including YouTube creators therecord.media
- Microsoft will disable widely abused Excel 4.0 ‘XLM’ macros (introduced in 1992!) by the end of the year for Microsoft 365 customers therecord.media
- NSA warns of wildcard TLS certificates and ‘ALPACA’ attacks therecord.media
Internet of Things
- Medtronic recalls insulin pump controllers that are vulnerable to replay attacks and could administer deadly over, or under, dosages bleepingcomputer.com Medtronic are repeat offenders in this area, with their insulin pumps (vol. 2, iss. 26) and pacemakers (vol. 1, iss. 8) requiring recalls.
- The UK’s National Cyber Force will be based in Samlesbury, just outside Preston and not far from GCHQ’s Manchester office as part of £5B investment in offensive cyber capabilities bbc.co.uk
- There’s critical national infrastructure, then there’s really critical national infrastructure: new bill proposes CISA designate “systemically important critical infrastructure” cyberscoop.com
- Members of the European Parliament (MEPs) vote in favour of resolution to ban artificial intelligence powered mass surveillance systems, including facial recognition and other biometric identifiers bleepingcomputer.com
- US Ransomware Disclosure Act would require organisations to disclose ransom payments within 48 hours techcrunch.com
- US government will bring civil claims against suppliers that hide breaches under ‘Civil Cyber-Fraud Initiative’ arstechnica.com
- Netherlands may use intelligence or armed forces in response to ransomware attacks deemed a threat to national security, says Dutch minister therecord.media
- Shout out to Immersive Labs’ UK financial services resilience test being run with the Bank of England ft.com
- US Transportation Security Administration to issue cyber regulation before the end of the year for ‘high risk’ rail and transit systems and the aviation sector therecord.media
- Police arrest two in Ukraine for their parts in ransomware attacks, seizing two luxury cars, $375K in cash and a further $1.3M in cryptocurrency zdnet.com
- Three indicted for business email compromise frauds, including Bank of America/TD Bank insider who opened new accounts for money laundering purposes cyberscoop.com
- 22-year-old arrested in France for stealing and leaking the COVID-19 tests of 1.4M Parisians therecord.media
Mergers, acquisitions and investments
- One Identity acquires OneLogin to boost secure sign-on capabilities in rivalry with Okta and Ping techcrunch.com
- Duality raises $30M Series B to continue development of collaboration tools that use homomorphic encryption techcrunch.com
- McAfee/FireEye completed their $1.2B merger zdnet.com
Apache’s nostalgic directory traversal bug
When you request the ‘index.html’ page from example.com, the web server you’re talking to may look for that file within the ‘/var/www/example.com/public_html/’ directory configured to host those files. To make it easier for humans to navigate filesystems, the ‘..’ folder is a shortcut to the parent directory. Directory traversal attacks exploit these shortcuts to cause unintended information to be exposed, such as a request for ‘../../../../etc/passwd’ for details of users on a Unix system.
Such issues were more common in Microsoft’s IIS web server in the 2000s, but a recent patch to the popular Apache HTTPD project introduced an issue that should have been long relegated to history.
A fix is available.
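For anyone writing their own file-serving code, the defence against the ‘..’ trick described above is to resolve the requested path first and only then check that it still sits inside the web root. The following is a minimal sketch in Python, not how Apache does it; the function name and the example web root are assumptions for illustration.

```python
from pathlib import Path
from urllib.parse import unquote


def resolve_within(root, requested):
    """Map a requested URL path to a file path, refusing to leave `root`.

    Hypothetical helper: raises PermissionError on traversal attempts.
    """
    root_path = Path(root).resolve()
    # Decode percent-encoding first, so '%2e%2e' is seen as '..'.
    decoded = unquote(requested)
    # Strip leading slashes so the request cannot replace the root outright.
    candidate = (root_path / decoded.lstrip("/")).resolve()
    try:
        # relative_to raises ValueError if candidate is outside root_path.
        candidate.relative_to(root_path)
    except ValueError:
        raise PermissionError(f"path escapes web root: {requested!r}") from None
    return candidate
```

With a web root of ‘/var/www/example.com/public_html’, a request for ‘index.html’ resolves to a file inside that directory, while ‘../../../../etc/passwd’ (or its percent-encoded equivalent) is rejected. The key design choice is to normalise before checking, rather than searching the raw string for ‘..’.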