Who will benefit most from the Data Portal?
The centralisation of previously disparate cohort datasets should benefit the research community across the board, from early career scientists looking to engage with wider cohort data, to established principal investigators who wish to collaborate or make use of the technology for their own research. Cohorts will benefit from increased interest in their data and both the technology and the reach of DPUK will allow for new types of research to be undertaken on such cohort data, which may have previously not been possible or even considered.
In a broader sense, it is hoped that the DPUK Data Portal, as a use case of unprecedented collaboration and influential outputs, will demonstrate that national data repositories built on world-leading analysis infrastructure are fundamental to the advancement of epidemiology and experimental medicine.
What does the 90-day timescale for data access actually entail?
The 90 days are counted from the date an application is submitted. Some applications will take longer than others from the researcher's perspective, depending on cohort response, but DPUK aims to process every application as efficiently as possible.
The timescale breaks down as follows:
Up to 28 days for a cohort (via its designated contact or PI) to respond to an application. We anticipate that any amendments and queries can also be addressed during this period. Following approval, researchers must arrange signature of the DPUK Data Access Agreement. In parallel, DPUK prepares the data so that it can be made available upon receipt of the signed Agreement. We estimate that the signing process takes between 15 and 45 days, depending on the institution. The only other factors that lengthen the access process are a request for bespoke software for the study, or a cohort preparing subsets of data itself for upload to the Portal; in those cases we still aim for the data to be available within 90 days of application.
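As a rough illustration of how these figures add up, the sketch below computes the latest expected dates for a given submission date. The function name, the dictionary keys and the example date are hypothetical; only the 28-day, 15 to 45-day and 90-day figures come from the answer above.

```python
from datetime import date, timedelta

# Figures taken from the timescale described above.
COHORT_RESPONSE_MAX_DAYS = 28   # cohort decision, including queries
SIGNING_MAX_DAYS = 45           # upper estimate for signing the Agreement
TARGET_DAYS = 90                # overall target, counted from submission


def access_timeline(submitted: date) -> dict:
    """Return the latest expected date for each stage (illustrative only)."""
    cohort_decision = submitted + timedelta(days=COHORT_RESPONSE_MAX_DAYS)
    agreement_signed = cohort_decision + timedelta(days=SIGNING_MAX_DAYS)
    target = submitted + timedelta(days=TARGET_DAYS)
    return {
        "cohort_decision_by": cohort_decision,
        "agreement_signed_by": agreement_signed,
        "target_access_by": target,
        # Days left for bespoke software or cohort-prepared subsets.
        "contingency_days": (target - agreement_signed).days,
    }


timeline = access_timeline(date(2018, 1, 8))  # hypothetical submission date
for stage, value in timeline.items():
    print(stage, value)
```

In the slowest standard case (28 days for the cohort plus 45 days for signing) this takes 73 days, which leaves 17 days of the 90-day target for bespoke arrangements.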
Do you have any idea of a timescale for availability for accessing the cohort data, 6 months, 12 months' time?
With regard to timescales, we are aiming to have 25 cohorts’ data engaged with the Portal, either having shared their data to the repository or ready to upload on a study-to-study basis by the end of March 2018. We will however make any data we have available subject to approval on an incremental basis, or facilitate data being uploaded on a study-to-study basis where necessary subject to cohort circumstances. By the end of 2018, DPUK hopes to be engaged on a data share or facilitated access basis with at least 30 of its cohorts.
Are there any costs involved for accessing cohort data?
The DPUK Data Portal is a free-to-use resource for researchers across the world, unless bespoke arrangements are requested. If a researcher is happy to use the standard infrastructure provided, including software and storage, there is no cost for analysis. We will only discuss possible charges for studies where a researcher wishes to use bespoke software or scaled-up compute power and memory.
DPUK aims to absorb any data access costs that cohorts would normally charge, and make data freely available subject to approval of use.
What is the procedure to get access to these cohorts?
Data access applications are very simple via DPUK. The online form allows essential information on a proposed study to be captured and sent to cohorts for their approval. As per the general timescales, once a study application is submitted, the form is sent by DPUK to cohort contacts, and a decision is then made. If a study is approved, DPUK will arrange with the applicants for signature of the DPUK Data Access Agreement, and upon receipt of a signed copy, make the cohort data available within the analysis environment.
Will researchers need to complete multiple applications for each cohort they wish to access?
One of DPUK's most important aims is to make all aspects of dementias data research easier for all parties concerned. DPUK manages a single application process that is intended to let a researcher apply for data once, with each cohort concerned then giving its decision within a set time period. This way, even though individual cohorts consider each application, approvals are given centrally via DPUK, and researchers can still have a back-and-forth conversation with a cohort during the scrutiny period.
What kind of data can be accessed? Subject-level data or aggregated data?
The Data Portal will primarily store subject-level phenotypic data as standard; however, there will also be some datasets in aggregate form, as well as summary data derived from genetic and imaging studies.
Will DPUK have standardised psychological and behavioural test data on the Data Portal?
Although DPUK is taking data ‘as is’ directly from cohorts, many of them use standard behavioural and cognitive tests such as MMSE and MoCA. DPUK will not lead cohorts in standardising their own data collection, but DPUK’s metadata tools will allow researchers to identify cohorts that have used the same tests, so that they can apply for comparable sets of variables across datasets. Cohorts will of course employ these scales differently; however, DPUK will seek to release variables by category so that areas of study can be captured by topic, with cognitive data, lifestyle data and metabolomic data, for example, each available for request as a grouped set of variables.
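The kind of lookup described above can be pictured as a simple filter over cohort metadata. This is a minimal sketch and not DPUK's metadata tooling: the catalogue structure, the cohort labels and the function name are all hypothetical placeholders.

```python
# Hypothetical metadata catalogue: cohort label -> tests it has recorded.
# The labels are placeholders, not real DPUK cohorts.
COHORT_TESTS = {
    "cohort_a": {"MMSE", "MoCA"},
    "cohort_b": {"MMSE"},
    "cohort_c": {"MoCA"},
}


def cohorts_using(tests, catalogue=COHORT_TESTS):
    """Return, in alphabetical order, the cohorts that recorded every requested test."""
    wanted = set(tests)
    return sorted(name for name, used in catalogue.items() if wanted <= used)


print(cohorts_using(["MMSE"]))          # cohorts that recorded MMSE
print(cohorts_using(["MMSE", "MoCA"]))  # cohorts that recorded both tests
```

A researcher could use a list like this to decide which cohorts to include in a single application for a shared set of cognitive variables.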
How is data stored, accessed and analysed?
Cohort data that is transferred to DPUK is physically stored on servers at Swansea University. The data is then provided within a virtual desktop infrastructure to researchers whose applications have been approved. The virtual desktop is a Windows desktop that is accessed via the VMware Horizon View Client software. The Client allows access to the desktop, hosted in Swansea, by two-factor authentication: a username and password combined with either a YubiKey device or a Google Authenticator code.
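For readers unfamiliar with authenticator codes, the sketch below shows how a time-based one-time password of the kind Google Authenticator displays is derived under the public TOTP standard (RFC 6238, with its usual defaults of HMAC-SHA1, a 30-second step and 6 digits). It illustrates the general technique only; it is not DPUK's or VMware's implementation, and the secret shown is the published RFC test value, not a real credential.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    # Number of whole time steps since the Unix epoch, as an 8-byte counter.
    counter = struct.pack(">Q", int(at_time) // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Test secret published in RFC 6238; a real secret is shared privately
# between the server and the user's authenticator app.
RFC_TEST_SECRET = b"12345678901234567890"

print(totp(RFC_TEST_SECRET, time.time()))  # code valid for the current 30 s window
```

Because the code depends on both the shared secret and the current time, a stolen password alone is not enough to log in.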
Once connected, researchers will find the data files they have been given access to in shared folders on the desktop, and can open those files in the provided software of their choice. DPUK aims to accommodate researchers whatever software they specialise in, and so provides a variety of software in the Windows desktop, such as Design Studio, WinSQL, R, Eclipse, SPSS, SAS, and Stata.
What are the rules on using data with the virtual desktop?
The datasets available from cohorts and any other providers may not be removed from the virtual desktops; however, any derived results and outputs can be removed via the data out process. This is a simple process of submitting files to DPUK for approval of their release, which can be done from within the virtual desktop.
DPUK expects researchers to follow the analysis protocol in the application, and only data that has been applied for will be supplied to researchers. Researchers who already own, or are approved for physical access to, data outside of the Portal can bring this into the Portal using our ‘data in’ mechanism, either for analysis on its own or in combination with cohort data; however, such use must also be outlined in the DPUK application.
Will the processing capacity of the portal cope with a large number of researchers carrying out big studies?
The DPUK Data Portal is housed on an instance of the UK Secure eResearch Platform (UKSeRP) at Swansea University. This infrastructure is scalable to meet demand, and DPUK has a complex instance of the platform that will support both a large number of concurrent users and multi-omics data studies that may contain very large datasets. The virtual desktop infrastructure used to provide the remote desktop to a researcher is also customisable if specialist software or more memory is needed.