Data has become a crucial battleground in the war against Coronavirus, as many countries have used sophisticated methods for gathering and analysing data to monitor and manage the pandemic.
This could lead to a lasting shift in how data is thought about, opening up a new social contract around data and its potential public benefits. Here I set out what that might contain – and how we might move beyond the frustrating vagueness that has characterised much of the debate about data (a parallel blog has just been published on the Economist Intelligence Unit website).
The binary debate of the 2010s
For the past decade discussion of data has often been squeezed into a binary framework. On one side were big organisations – governments and large companies – harvesting data on an unprecedented scale with, often, little honesty about what they’d done with it and a tendency to assume that privacy was a thing of the past. Against them grew up a community of activists who argued for new rights and restrictions to put data under the control of citizens. In essence this was a battle between the surveillance state and surveillance capitalism on the one hand, and resisters and rebels on the other. Strangely both sides thrived: while the profits of Google and Facebook continued to soar, by the end of the last decade the activists were winning some of the battles, as the EU adopted GDPR and companies like Microsoft and Apple repositioned themselves as guardians of privacy.
COVID-19 has now shown the limits of both data hubris and data restriction. Smart use of data from multiple sources can be very much in the public interest. But it’s clearer than ever that strong rules will be needed if the power that this creates isn’t to be abused.
Where we may be headed is a new social contract around data that combines three distinct elements: first, new norms of data minimisation and privacy by design; second, strong laws to punish abuses; and third, a new generation of regulators and institutions charged with maximising the public value from data. If we can get this right, we’ll see radically more data sharing where there is a public interest in doing so, and less where there isn’t. But the details will be all-important.
Innovations in the crisis
The prompt is the extraordinary innovation forced by the crisis. China moved first, using mobile phone data to track the millions who left Wuhan in the hours before it was cut off, and then later using Alipay and WeChat’s HealthCode (which also drew on self-reporting and medical records) to give people red, yellow or green status to determine their freedom of movement, depending on whether they had been near infected individuals. Taiwan then also used mobile phone data to track people who had been infected and manage their quarantines. Singapore relied on a combination of its TraceTogether app and teams doing investigations and interviews to find who needed to be tested. South Korea used smartphone data, credit card payments and other sources to find out who had been in contact with whom (and sparked controversy when transparency about people’s travel patterns uncovered illicit affairs), and then a self-quarantine GPS-based tracking app.
Each approach was slightly different. But all of these countries were aggressive in pulling data together to contain the crisis. Western countries had little comparable, but are now trying to copy them. In the UK, for example, much effort is now going into an NHS app that it’s hoped a majority of the population will adopt to allow a faster end to the lockdown.
One irony of these experiences is that new apps aren’t technically needed, since smart phones automatically know where they are, and so intelligence agencies and phone companies can easily track who was in close proximity to who (and in Israel Shin Bet, the intelligence agency, has apparently been active in using location data to track infections). The barriers are, of course, legal and cultural.
But the crisis is also throwing up important design and technical choices which are simultaneously legal and cultural choices too, and may point to the new kinds of hybrid approach that will be needed in other fields.
For example, tracing can be done using either Bluetooth or geolocation through 4G and 5G networks – Bluetooth is in principle more decentralised and leaves more control in the hands of citizens, but it creates its own problems if it’s always on – a challenge Google and Apple are working on, prompting an argument with France’s digital minister, Cedric O. Another choice is how far to anonymise the data that’s collected. Europe’s DP-3T (Decentralized Privacy-Preserving Proximity Tracing) is trying to use clever randomisation and Bluetooth so that if someone is infected they can upload codes to authorities – and then inform others who have been near – but without the authorities needing to know the identities of who is being informed. This is appealing – but at a certain point there is no avoiding the need to identify people and ensure that they are showing up for tests. Here we come up against the unavoidable tension between individual rights and the collective interest – and the need for some visible governance mechanisms to judge how that trade-off should be made in different conditions.
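To make the decentralised idea concrete, here is a much-simplified sketch in Python of how a DP-3T-style scheme can work: phones broadcast short-lived random identifiers derived from a daily key, and only if someone tests positive is that key published, letting other phones check locally whether they were nearby. This is illustrative only – the real protocol’s key rotation, epochs and cryptography are considerably more involved, and all function names here are hypothetical.

```python
import hashlib
import os

def daily_key(prev_key: bytes) -> bytes:
    # Each day's key is derived one-way from the previous one, so
    # publishing one day's key reveals nothing about earlier days.
    return hashlib.sha256(prev_key).digest()

def ephemeral_ids(day_key: bytes, n: int = 96):
    # Short-lived broadcast identifiers derived from the day key;
    # an observer cannot link them to a person without that key.
    return [hashlib.sha256(day_key + i.to_bytes(2, "big")).digest()[:16]
            for i in range(n)]

# Phone A generates a secret and broadcasts its ephemeral IDs over Bluetooth.
secret = os.urandom(32)
key_day1 = daily_key(secret)
broadcast = ephemeral_ids(key_day1)

# Phone B simply records the IDs it hears nearby.
heard = {broadcast[10], broadcast[42]}

# If A tests positive, A uploads key_day1. B recomputes the IDs locally
# and checks for matches -- the server never learns who B is.
matches = [eid for eid in ephemeral_ids(key_day1) if eid in heard]
print(len(matches))  # number of risky encounters B can verify locally
```

The point the design makes is that the matching happens on the citizen’s device, not in a central database – which is exactly why identifying people for testing then requires a separate, visible governance step.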
There will be even harder judgements to make about using data to manage certification of immunity – given the uncertainties of the tests (which seem to be unreliable); the uncertainties of the science (on how strong immunity is and how long it lasts); the risks of fraud; and uncertainties of everyday justice (ie how much freedom should the apparently immune get?).
As these experiments unfold in front of our eyes the crisis is bringing to the surface many of the big questions that would anyway need to be answered if we’re to make the most of data and AI over the next decade. It has already prompted some hand-wringing by prominent thinkers such as Yuval Harari and Shoshana Zuboff, though it’s striking that, as in the past, they have remarkably little to say about possible solutions (others have been more practical, like this recent assessment from the Ada Lovelace Institute).
So what lessons should we draw? And what could a more permanent settlement or social contract around data look like? Clearly the answers have to go far beyond what’s appropriate for COVID-19. But the structure that works in the crisis may also work in other fields. So here I will stick my neck out and suggest that a longer term settlement will combine three apparently very different, but complementary elements, and that any answers need to be similarly three-dimensional.
First, we will need new approaches to technology design that build in data minimisation. We have become used to digital tools that gather and share data on an extraordinary scale, but mainly for the benefit of a handful of big commercial platforms. Google really does know more about you than you do. But this is not inevitable; it is the result of choices. The alternative route promotes data minimisation and says that companies and governments should only gather what they need. Some of the projects in the EU DECODE programme have been experimenting with doing this – for example, if you book a hotel room there is no need for the hotel to know your passport details or all of your bank details.
Various minimisation methods have been used during this and other crises: limiting how long data can be stored (as in Norway), or Flowminder’s low resolution data. My guess is that data minimisation and privacy by design will increasingly become the norm, and the default, but with provisions for greater gathering of data where there is a particular and clear-cut public interest.
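The two minimisation methods just mentioned – time-limited storage and reduced resolution – are simple enough to sketch in a few lines of Python. The names, retention window and coordinate precision below are illustrative assumptions, not any country’s actual rules.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # a time-limited storage window, as in Norway

def prune(records, now):
    # Discard anything older than the retention window.
    return [r for r in records if now - r["t"] <= RETENTION]

def coarsen(lat, lon, decimals=2):
    # Round coordinates to roughly 1 km resolution: enough for
    # epidemiology, too coarse to follow an individual around.
    return round(lat, decimals), round(lon, decimals)

now = datetime(2020, 5, 1)
records = [
    {"t": now - timedelta(days=5),  "loc": coarsen(51.5074, -0.1278)},
    {"t": now - timedelta(days=45), "loc": coarsen(51.5155, -0.0922)},
]
kept = prune(records, now)
print(len(kept))  # the 45-day-old record has been pruned
```

The design choice embodied here is that minimisation happens before storage and analysis, so there is nothing sensitive to leak or repurpose later.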
Second, we will continue to need laws that are strong enough to penalise abuses and flexible enough to adapt to changing pressures and technologies. GDPR has become a de facto standard, and, contrary to the complaints of Silicon Valley, has turned out to be quite flexible, for example allowing employers to gather data on which employees need to be self-isolating because of symptoms but with strict rules on what they can do with it. The European Data Protection Board acknowledged that an emergency like this is a “legal condition which may legitimise restrictions of freedoms provided these restrictions are proportionate and limited to the emergency period”, and Article 9 allows the processing of personal information without consent if it’s necessary to protect “against serious cross-border threats to health”. It’s clearer than ever that every country will need some laws of this kind, and there is now little chance of the UK, post-Brexit, moving far away from GDPR.
Third, we will need new institutions, designed to protect trust and make judgements about trade-offs. The crisis has confirmed the glaring gap: there are few institutions with the skills and authority to be trusted guardians of data and data linking, including the kinds of data that are being gathered for COVID responses. Currently this is an empty space. Although some countries have Information Commissioners, they hardly ever appear on the evening news discussing the trade-offs, and all of the big events in this space – like the Cambridge Analytica scandal – were driven by whistle-blowers and the media, not by public regulators.
Yet history tells us that when powerful new technologies arise we cannot rely just on law or design. Instead it’s the combination of law, design and accountable institutions that gives us confidence our interests are being protected. On their own law and design cannot solve the problem of making judgements about trade-offs: such as how intrusive contact tracing should be.
We take the role of institutions for granted in relation to other everyday technologies like the car, and in finance where complex ecosystems of regulation and law manage the subtleties of pensions, insurance, equities, savings and banking. My expectation is that we will see a comparable complexity in data – giving us visible institutions to work out, in the public interest, the balance of issues around options like an NHS App.
The solutions will have to be complex because the issues are. The chart above summarises some of this complexity. Some data we can control – for example, choosing whether to have an app that for the public benefit tracks contacts. But other data we can’t control, including the traces our phones leave automatically. There is a similar complexity in the value that is latent in the data. Some of it is only valuable to individuals, like most of what’s on a fitbit. But other data has huge public value, including tracking the health patterns of the virus and its impacts to help us be better prepared next time (much of which can be anonymised, in the way that Deutsche Telekom is providing anonymised “movement flows” data to the Robert-Koch Institute, a research institute and government agency responsible for disease control and prevention).
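One way anonymised “movement flows” of the kind described above can be shared safely is by aggregating origin-destination counts and suppressing any cell small enough to single people out – a basic k-anonymity-style threshold. The sketch below is a hedged illustration of that idea, not Deutsche Telekom’s actual pipeline; the threshold and place names are invented.

```python
from collections import Counter

K = 5  # minimum number of travellers before a flow is released

def aggregate_flows(trips, k=K):
    # Count trips per origin-destination pair, then suppress any pair
    # with fewer than k travellers so rare journeys can't identify anyone.
    counts = Counter((origin, dest) for origin, dest in trips)
    return {pair: n for pair, n in counts.items() if n >= k}

trips = [("Berlin", "Hamburg")] * 7 + [("Berlin", "Munich")] * 2
flows = aggregate_flows(trips)
print(flows)  # only the common flow survives; the rare one is suppressed
```

The trade-off a governance body would have to judge is exactly where to set that threshold: too low and individuals become identifiable, too high and the data loses its public-health value.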
Into this space I expect we will see the creation of an array of different kinds of data trusts, including trusts responsible for the myriad of choices needed around health (the diagram below summarises some of these). During crises the top right-hand part of these charts becomes all the more important, requiring visible and accountable bodies to manage the decisions and make difficult judgements.
This is a debate that has hardly started, as the still remarkably vague comments from many leading opinion formers confirm. Hopefully COVID-19 will force the pace towards a more sophisticated public debate, and towards a more durable social contract around data that gives us the benefits of smart technologies as well as reliable protections against misuse.