How much can data tell us? - Ada Lovelace Institute

In our latest issue of Trust & Foundation News, we spoke to members about the kinds of data they use and what it can help them to achieve, as well as some of its risks and limitations.

The seventh and final article in this series comes from Reema Patel, Programme Manager – Data Ethics & AI, Ada Lovelace Institute.

ACF member the Nuffield Foundation worked with partners to create the Ada Lovelace Institute to ensure that data and artificial intelligence (AI) work for people and society. As part of this work, the institute is grappling with many social and ethical challenges that arise.

One of the challenges is around the concept of data ownership. People have very different views on whether data can be owned in the same way that, say, a chair can be owned. Clearly, we are not talking about ownership in the traditional sense. If I buy a chair, it is exclusively mine, and acquiring it costs something. Data, by contrast, can be shared, or held by many groups of people simultaneously, and can be free to access at the point of use. So it can be argued that data is more like a public good, such as the BBC or the air, than it is like a chair.

It has been suggested that the relationship is instead one of stewardship, that those who hold data are holding that data on trust for a group of people. That brings us to the really interesting point of data for the common good, or in the public interest. The legitimacy of the use of the data comes not from the fact that someone owns the data, but from the fact that the data has been shared, and the holder is trusted to use it for public purposes.

Much of the existing debate about data sharing has been framed around the notion of consent. However, that consent may be given in a very superficial way, in a tick-box manner, and people don’t always realise the implications of what they are giving away. Two prime examples are the cases of Cambridge Analytica and Facebook. People may have consented to data sharing in the traditional sense, but feel uncomfortable with the subsequent use of that data and information. More recently, the NHS rowed back on proposals to share NHS patient data with the Home Office, because in that instance the data sharing was not seen as in the public interest, and not protecting the rights of migrants or asylum-seekers.

But data sharing does not mean that data has to always be in the public domain – in some instances this might be highly inappropriate. The Office for Artificial Intelligence is looking at a concept called ‘data trust’. The tension is the fact that data is often valuable when it is merged and shared, and it can be valuable for public good purposes, but that needs to be balanced against potential harm to people and the risk of exploitation if it is in the public domain. A really good example is confidential information about people, such as their medical condition. It is perfectly legitimate that people might not want that in the public domain, but they might feel differently if that data is used to advance medical research or by foundations and other organisations focused on social good.

The government is currently working with the Alan Turing Institute and the Open Data Institute to design what a data trust might look like, and to create a framework for using data for public good purposes while still protecting the rights of people to whom the data relates.

There is also the issue of what you could describe as data monopolies. Traditionally a lot of data is collected for commercial purposes by organisations like Facebook, Twitter or Google. They have amassed large datasets and data sources. How best can social purpose organisations, universities and research bodies work with the holder of the data for non-profit purposes? Many people and academic and research institutions are calling for the data to be donated for the public good.

Artificial intelligence

Foundations and other organisations commissioning research currently use quantitative and qualitative social science methodologies. Artificial intelligence could complement or strengthen these methodologies, and vice versa. For example, good-quality social science research can help sense-check the AI: it can show where the AI is not identifying trends accurately, or check its outputs for bias. Of course, an AI system is only as good as the data it learns from, the data that informed and shaped it in the first place. So there is a range of challenges around the datasets the AI is learning from.

The first is whether the datasets are accurate. Take the example of the criminal justice system. In many instances the data is historical, and may no longer support the assumptions being made about individuals years later. Has the right information been input or gathered? There is also a big risk of omission. More data is available about certain groups of people than others, and that lends itself to discriminatory outcomes for minority communities. For example, facial recognition systems find it easier to identify certain groups, notably white middle-class men, simply because of the sheer volume of input data for those groups; there is far less data on ethnic minorities or women.
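One simple way to surface this kind of omission bias is to break a system's accuracy down by demographic group rather than reporting a single overall figure. The sketch below (in Python, with hypothetical group names and made-up records) illustrates the idea: a healthy overall accuracy can hide a much worse result for an underrepresented group.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute prediction accuracy separately for each demographic group.

    records: iterable of (group, predicted_label, true_label) tuples.
    Returns {group: accuracy}. A large gap between groups is a signal
    that the training data may under-represent some of them.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Hypothetical audit of a face-matching system's outputs:
results = [
    ("group_a", "match", "match"), ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"), ("group_a", "match", "match"),
    ("group_b", "no_match", "match"), ("group_b", "match", "match"),
]
print(accuracy_by_group(results))  # group_a scores 1.0, group_b only 0.5
```

Here the overall accuracy is 83%, which looks acceptable, yet every error falls on the smaller group; this is exactly the pattern an audit broken down by group is designed to catch.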

More broadly, the quality of the data is really important, because it determines what the data can be used for. In many instances there may be a lot of data, but even if it is accurate, that does not necessarily mean it is useful, or that the AI is learning much from it or identifying many patterns in it. Mechanisms are being developed to ensure that the AI is not relying exclusively on the data: the idea that 'humans are in the loop' in the use of the AI. Individuals are tasked with sense-checking whether the datasets are lending themselves to biased outcomes, and then feeding into the AI rules or caveats that enable it to correct those biased outcomes. In many cases the AI is not just making an automated decision; it is flagging grey-area scenarios for people to then make a judgement on.
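The 'humans in the loop' approach described above can be sketched as a simple triage rule: the system acts automatically only on clear-cut cases and routes the grey area to a human reviewer. This is an illustrative sketch, not any particular system's design; the function name and thresholds are assumptions for the example.

```python
def triage(risk_score, clear_below=0.2, flag_above=0.8):
    """Route a model's risk score (between 0.0 and 1.0).

    Only clear-cut scores are handled automatically; anything in the
    grey area between the two (assumed) thresholds goes to a person.
    """
    if risk_score >= flag_above:
        return "auto_flag"     # confident enough to act automatically
    if risk_score <= clear_below:
        return "auto_clear"    # confident enough to dismiss automatically
    return "human_review"      # grey-area scenario: a human makes the call

print(triage(0.95))  # auto_flag
print(triage(0.05))  # auto_clear
print(triage(0.55))  # human_review
```

The design choice here is that the thresholds, not the model, encode how much is delegated to automation; narrowing the grey band automates more decisions, widening it sends more of them to human judgement.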

Reema Patel
Programme Manager – Data Ethics & AI
Ada Lovelace Institute


Other articles in this series

How much can data tell us? - Walcot Foundation 
How much can data tell us? - Friends Provident Foundation 
How much can data tell us? - Paul Hamlyn Foundation 
How much can data tell us? - Corra Foundation 
How much can data tell us? - United St Saviour’s Charity 
How much can data tell us? - Co-op Foundation 
