Making the most of data

From our Care and Cure magazine of Summer 2015, find out more about how researchers are sharing data about dementia.

With the availability of new technologies and access to detailed information about public health, researchers are producing more data than ever before. To make the most of this information, new efforts are making it easier to share it widely among researchers.

We live in an age of information. Not only is it easier to create more information, it is also getting easier to tap into the sum knowledge of humanity.

Research is continuing to produce increasing amounts of data. One obvious source of this is genetics studies - the entire sequence of the human genome would take 100 books to print out. However, the use of data goes far beyond genetics; researchers are also gathering vast amounts of data from health records and long-term group studies.

Gathering this kind of data can be expensive - the original Human Genome Project cost roughly £2 billion and phase III clinical trials regularly cost hundreds of millions of pounds - and it is important that we can make as much use of it as possible. 

Pooling data

Dr Graciela Muniz-Terrera from University College London is being funded by Alzheimer's Society to look at past studies and better understand the relationship between education and the decline of brain function in dementia.

'I'm trying to see whether there are common patterns across the different datasets. I understand that this is the only way we can make sure that a result we find is not just a fluke of the data. This involves the analysis of different datasets, trying to minimise the differences in results that may emerge from using different statistical methods.'

This kind of study is known as a co-ordinated analysis, where a researcher analyses data from several sources in a consistent way to reduce differences from the various approaches used. By pooling together data from several studies, new results can appear that wouldn't have been possible to see in any one study. 
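As a rough illustration of the idea, the sketch below applies one identical statistical model to each study separately and then compares the results side by side. It is a minimal example under stated assumptions, not Dr Muniz-Terrera's method: the study names, the variable names (education_years, decline_score) and every number are invented for illustration, and a simple straight-line fit stands in for whatever models a real analysis would use.

```python
from math import sqrt


def fit_slope(x, y):
    """Ordinary least-squares slope of y on x, with its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    std_err = sqrt(residual_ss / (n - 2) / sxx)
    return slope, std_err


def coordinated_analysis(studies):
    """Apply the same model to every study; return one result per study."""
    return {
        name: fit_slope(data["education_years"], data["decline_score"])
        for name, data in studies.items()
    }


# Invented numbers, for illustration only. These are not real study data.
studies = {
    "study_a": {
        "education_years": [8, 10, 12, 14, 16],
        "decline_score": [5.1, 4.6, 4.2, 3.5, 3.1],
    },
    "study_b": {
        "education_years": [9, 11, 13, 15, 17, 19],
        "decline_score": [6.0, 5.8, 5.1, 4.9, 4.2, 4.0],
    },
}

results = coordinated_analysis(studies)
for name, (slope, std_err) in results.items():
    print(f"{name}: slope={slope:.3f}, standard error={std_err:.3f}")

# A pattern seen in every study is less likely to be a fluke of one dataset.
same_direction = len({slope > 0 for slope, _ in results.values()}) == 1
print("Same direction in all studies:", same_direction)
```

Because every study goes through the same function, any difference between the results reflects the data rather than the choice of statistical method.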

This kind of research isn't the only way that sharing data can help. Additional information may also help to build connections between fields of expertise. As Dr Petroula Proitsi explains in her article in this issue, access to large datasets from other researchers can help to build a better understanding of the bigger picture. 

However, pooling data is not always straightforward. 'In analysis of cognition, that is not an easy thing to do. Say if I want to measure verbal fluency, one study may have one measure of verbal fluency and another one will have another measure,' says Dr Muniz-Terrera.

'So we cannot bring together these datasets, because even though they measure the same cognitive function, the tests used are different and may have different degrees of difficulty. So although these methods of pooling data from multiple studies may work in other research areas, in our area it is more difficult.'

In confidence

Another difficulty involves the ethical concerns of sharing people's personal information. When participants sign up to take part in a study, they do so based on an understanding of how their data is going to be used and protected. So when sharing their data, researchers have to ensure that they abide by the participants' wishes.

'I think that with development of technology, somehow we will be able to access data in such a way that the data does not need to leave researchers' facilities,' says Dr Muniz-Terrera. 

'There are ethical concerns, like identifiability of patients and so on, and people are concerned about where copies of their data are stored and I understand that. You can't break participants' confidence and trust.'

She adds that there is a need for an 'initiative that reassures the researchers and the study participants that their data is safe, but at the same time that doesn't make the use of the data more difficult for researchers.'

One way researchers are trying to tackle these difficulties is through the creation of the Dementias Platform UK, a multimillion-pound public-private partnership. This will bring together 22 existing studies in the UK with a total of 2 million participants to create the world's largest population study for dementia.

'The challenge is overcoming the very real technical and ethical challenges of data sharing, so that the scientific advantages may be realised rapidly and cost effectively,' says Dr John Gallacher, director of the platform.

'The Dementias Platform UK aims to do this by developing a "one-stop shop" for scientists in which the technical and ethical challenges have been addressed centrally, releasing scientists to get on with what they do best: the science.'

Ways of sharing

Although the Dementias Platform UK includes an impressive number of participants, there are still hundreds of smaller studies that are not included, which might look at more niche areas of study. Dr Muniz-Terrera has been experimenting with ways of making the most of these smaller studies, without the researchers having to distribute their data.

'I produce a script for statistical analysis and I send it to different people involved in the studies. They run the analysis and send me back the results. However, that takes a long time, because it involves the other person having time to do the analysis and then I get the results back, but there is always some information missing so it takes many, many iterations.'
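The workflow described here might look something like the sketch below, in which a shared script reduces individual records to summary figures on the data holder's own computer, so that only the summaries are sent back. This is a hypothetical illustration rather than the actual script used: the function names, the field names (n, mean_score, sd_score) and all the numbers are assumptions made for the example. The second function shows one way a co-ordinating researcher could spot missing information in a returned summary straight away, which is the cause of the repeated iterations mentioned in the quote.

```python
from statistics import mean, stdev

# Hypothetical list of figures the co-ordinating researcher expects back.
REQUIRED_FIELDS = ("n", "mean_score", "sd_score")


def summarise_locally(scores):
    """Run by the data holder: reduce individual scores to summary figures.

    Only this summary is sent back; the individual scores stay where they are.
    """
    return {
        "n": len(scores),
        "mean_score": mean(scores),
        "sd_score": stdev(scores),
    }


def missing_fields(summary):
    """Run by the co-ordinator: list expected fields absent from a summary."""
    return [field for field in REQUIRED_FIELDS if field not in summary]


# Invented scores, for illustration only. These are not real study data.
local_scores = [21, 18, 25, 30, 17]
summary = summarise_locally(local_scores)

# A returned summary with one figure left out, also invented.
incomplete_summary = {"n": 40, "mean_score": 22.5}

print("Missing from complete summary:", missing_fields(summary))
print("Missing from incomplete summary:", missing_fields(incomplete_summary))
```

Checking returned summaries against an agreed list of fields will not remove the wait for each study team to run the script, but it could reduce the number of rounds needed.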

Alongside sending out scripts, she has found a more direct method: working with researchers in person. 'We have organised workshops and invited researchers associated with different studies to come to our workshops, where we don't access their data. Each of them comes with their own datasets and we guide the analysis and then we produce a common paper with summary results.'

Dr Muniz-Terrera is 'really hopeful' for the future of the Dementias Platform UK and other efforts to aid data sharing, but cautions that it won't be easy. 'We are in relatively early days in terms of using multiple datasets and we still have to convince people that it is worth the effort.'

Further reading