Applying for Data

What is the procedure for accessing cohort data?

Applying for data access via DPUK is straightforward. The online form captures the essential information on a proposed study and sends it to the cohorts for approval. In line with the general timescales, once a study application is submitted, DPUK sends the form to the cohort contacts, who then make a decision. If a study is approved, DPUK will arrange with the applicants for signature of the DPUK Data Access Agreement and, upon receipt of a signed copy, make the cohort data available within the analysis environment.

Will researchers need to complete multiple applications for each cohort they wish to access?

DPUK aims to make all aspects of dementias data research easier for all parties concerned. The DPUK application process allows a researcher to make one application for data from multiple cohorts, with each cohort giving its decision within a specified time period. This way, even though individual cohorts will consider applications, approvals are managed centrally via DPUK, with the possibility of a dialogue with a cohort during the review period.

What does the 90-day timescale for data access actually entail?

The 90-day period is counted from the date an application is submitted. DPUK will aim to process applications as efficiently as possible.

The timescale breaks down as follows:

Up to 28 days for a cohort (via their designated contact or PI) to make a decision on an application. We anticipate that any amendments and queries can also be addressed during this period, although these could arise at any stage of engagement.

Following approval, researchers must arrange signature of the DPUK Data Access Agreement. In parallel, DPUK will prepare the data so that they can be made available upon receipt of the signed Agreement. We estimate that the signing process takes between 15 and 45 days, depending on the institution.

Further time may be needed if a researcher has requested bespoke software for their study, or if a cohort is preparing subsets of data themselves for upload to the Portal. However, we still aim to make the data available within 90 days of application.
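As a rough illustration of how these figures fit together, the sketch below works out indicative milestone dates from a submission date. It assumes the cohort decision takes the full 28 days and that signing starts immediately afterwards; the function and key names are hypothetical and are not part of any DPUK system.

```python
from datetime import date, timedelta

# Figures taken from the timescale described above.
COHORT_DECISION_DAYS = 28   # maximum time for a cohort decision
SIGNING_DAYS_MIN = 15       # estimated shortest signing period
SIGNING_DAYS_MAX = 45       # estimated longest signing period
TARGET_DAYS = 90            # overall target from submission to access


def access_timeline(submitted: date) -> dict:
    """Return indicative milestone dates for an application.

    Assumption: the decision uses the full 28 days and signing
    begins as soon as the decision is made.
    """
    decision_by = submitted + timedelta(days=COHORT_DECISION_DAYS)
    return {
        "decision_by": decision_by,
        "signed_earliest": decision_by + timedelta(days=SIGNING_DAYS_MIN),
        "signed_latest": decision_by + timedelta(days=SIGNING_DAYS_MAX),
        "target_access_by": submitted + timedelta(days=TARGET_DAYS),
    }


timeline = access_timeline(date(2024, 1, 1))
# Days left before the 90-day target if both steps take their maximum time.
slack_days = TARGET_DAYS - (COHORT_DECISION_DAYS + SIGNING_DAYS_MAX)
```

Under these assumptions the longest path is 28 + 45 = 73 days, which leaves 17 days within the 90-day target for bespoke software or data preparation.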

If cohorts refuse to provide their data in the context of an application, what happens?

If you apply for access to the data from four cohorts and one refuses, for example, the Data Access Agreement would be drawn up to incorporate only the cohorts that have agreed to allow their data to be accessed. Cohorts normally provide a reason for denying access, which can be relayed back to the applicant. If a significant number of cohorts refuse access to data for a study, the researcher may need to evaluate whether the proposed project remains viable or whether it would be better to reapply requesting access to different cohorts.

Can you modify a submitted data access proposal?

Yes. Once an application has been submitted, DPUK will facilitate minor changes to its content, or the addition of cohorts, usually by email. If major changes are needed, however (a complete change of cohort choice, a new analysis plan, etc.), we may advise submitting a new application in the interest of clarity.

Is there a way to look at the completeness of data before you apply for access to a cohort? In other words, can you see whether the specific data you need are available?

It is not possible to view the data held in the Portal before making an application. DPUK is refining tools that will provide more details about the cohorts, which will be in the form of data availability tables, data dictionaries, and metadata discovery tools.

Is there any way of altering a project end date if you find out that the one chosen in the beginning was not appropriate?

There are no strict rules on this, but it is possible to provide a justification or rationale for altering the end date. In much the same way as DPUK facilitates initial approval, we would send details of any proposed study extension to the named cohorts for their comment and approval.

Who should I contact to discuss my application?

In the first instance please contact Mark Newbury, part of the DPUK Swansea team, who coordinates applications (m.s.newbury@swansea.ac.uk, Tel: 01792 604366).

Why do I need approval from the cohorts?

The Data Portal was established with an ethos of sharing and collaboration, but cohorts were also assured that they would retain control over who accessed their data, and for what purpose. The DPUK application process was developed to ensure this, and it has been streamlined as far as possible to facilitate rapid access for researchers.

Do DPUK plan to incorporate additional cohorts into the portal?

DPUK is concluding discussions with a number of cohorts that are completing the legal formalities. New cohorts wishing to explore joining DPUK should contact Chris Orton, Data Project Manager (c.orton@swansea.ac.uk), in the first instance.

Data Types and Categorisation

What kind of data can be accessed? Subject-level or aggregate data?

Please do consult our cohort matrix and cohort directory for specific details (https://portal.dementiasplatform.uk/CohortDirectory). The Data Portal will primarily store subject-level phenotypic data as standard. However, there will be some datasets available in aggregate form together with summary data derived from genetic and imaging studies, as well as sequence data and images in their own right.

Will DPUK have standardised psychological and behavioural test data on the Data Portal?

Although DPUK takes data ‘as is’ directly from cohorts, many of them use standard behavioural and cognitive tests such as MMSE and MoCA. DPUK does not insist on cohorts standardising their own data collection, but DPUK’s metadata tools will allow researchers to identify cohorts that have used the same tests. Cohorts will of course employ these scales differently. However, DPUK will seek to release variables by category so that areas of study can be captured by topic: cognitive data, lifestyle data, and metabolomic data, for example, would each be available for request as a grouped set of variables.

Can I request specific data categories from a cohort?

Yes, and this is highly desirable, as cohorts are reluctant to provide complete datasets unless a strong scientific justification has been given. The application form is configured to show which categories of data are available when a specific cohort is selected.

Data Storage, Access and Analysis

How is data stored?

Cohort data transferred to DPUK are physically stored on servers at Swansea University, part of the UK Secure eResearch Platform, which is accredited to ISO 27001.

How is data accessed?

The data are provided within a virtual desktop infrastructure to researchers whose applications have been approved. The virtual desktop is a Windows desktop accessed via the VMware Horizon View Client software. The Client gives access to the desktop, hosted in Swansea, using two-factor authentication: a username and password combined with either a YubiKey device or the more recently adopted mobile authenticator code. Linux desktops can be provided, subject to discussion during the application period.

How is data analysed?

Once connected, researchers will find the data files they have been given access to in a dedicated study folder on the desktop, where they can open them in the software provided. Folder access is shared across all approved study researchers to allow collaborative team science, and users also have their own personal workspaces. DPUK aims to accommodate researchers whatever software they specialise in, and so provides a variety of software in the Windows desktop, such as Design Studio, WinSQL, R, Eclipse, Python, SPSS, SAS, and STATA. New additions include customisation for genomic analysis, with a variety of tools and packages available from the Resources drive, as well as in package form via R and Python.

What are the rules on using data within the virtual desktop?

The datasets available from cohorts and any other providers cannot be removed from the virtual desktops; however, any derived results and outputs (e.g. for posters or preliminary data in support of grant applications) can be removed via the ‘data out’ process. This is a simple process of submitting files to DPUK to approve their release, which can be done from within the virtual desktop.

Researchers who wish to use derived results and outputs should follow the DPUK Publications process (see Publication Policy: https://www.dementiasplatform.uk/about/policies), which involves the use of the DPUK logo on posters, for example.

DPUK expects researchers to follow the analysis protocol outlined in their application, and only data that have been applied for will be supplied to researchers. Researchers who already own data, or who have been approved for access to data outside of the Portal, can bring these into the Portal using the ‘data in’ mechanism. The use of such data must be stated in the submitted application, or declared when appropriate later in the study.

How do I get access to my own data to use on the Portal?

DPUK has a ‘data in’ process, which is a standard feature available to all researchers. The mechanism allows for automatic approval of uploads to the virtual desktop environment. Files can then be retrieved by following the ‘File In and Out’ link within the virtual desktop, and subsequently analysed within the Portal.

Is it possible to add bespoke analysis tools in the Portal?

Yes, this can be considered. Subject to request, there are mechanisms for either DPUK installing the required software or a researcher bringing it in themselves. Please contact Chris Orton, Data Project Manager, at c.orton@swansea.ac.uk for further details.

Is it possible for a team of researchers who may be geographically dispersed to work simultaneously on a dataset and store and access their analyses?

Yes, provided all the individuals have been named on the study and/or have signed a data access agreement granting access to the data. The Data Portal desktops are user-specific in terms of access permissions; however, all researchers on a study have access to the same data and are provided with shared network folders in which to save collaborative work, as well as their own personal directories.

How much space is available on the virtual desktop?

This depends on the technical specification of the desktop you selected at the time of application. A Standard desktop will be provided (8 GB RAM, 4 CPUs) unless you have asked for a Large (32 GB RAM, 8 CPUs) or XLarge (128 GB RAM, 16 CPUs) version. A charge may be made for anything other than a Standard desktop, and you will be advised of likely costs at the time of application.
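For planning purposes, the three desktop sizes above can be written down as a small lookup table. The sketch below is illustrative only: the class, dictionary, and function names are hypothetical, and the charge flag simply records that a charge may apply to anything other than a Standard desktop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DesktopSpec:
    ram_gb: int
    cpus: int
    may_incur_charge: bool  # a charge may apply to non-Standard desktops


# Specifications as listed above, ordered from smallest to largest.
DESKTOPS = {
    "Standard": DesktopSpec(ram_gb=8, cpus=4, may_incur_charge=False),
    "Large": DesktopSpec(ram_gb=32, cpus=8, may_incur_charge=True),
    "XLarge": DesktopSpec(ram_gb=128, cpus=16, may_incur_charge=True),
}


def smallest_desktop(min_ram_gb: int, min_cpus: int) -> Optional[str]:
    """Return the smallest listed desktop meeting both requirements, or None."""
    # Dictionaries keep insertion order, so sizes are checked smallest first.
    for name, spec in DESKTOPS.items():
        if spec.ram_gb >= min_ram_gb and spec.cpus >= min_cpus:
            return name
    return None


choice = smallest_desktop(min_ram_gb=16, min_cpus=4)
```

Here an analysis needing 16 GB of RAM would call for a Large desktop, for which likely costs would be advised at the time of application.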

Who should I contact to discuss data storage or access?

Please contact Chris Orton, Data Project Manager (c.orton@swansea.ac.uk).

Cohort Data and Portal Access Charges

Are there any costs involved for accessing cohort data?

The DPUK Data Portal is predominantly a free-to-use resource for researchers across the world. Any bespoke arrangements (see below) may incur a charge. If a researcher is happy to make use of the standard provided infrastructure, including software and storage, there is no cost for analysis. We will only charge for studies where a researcher wishes to use bespoke software, and/or scaled-up compute power and memory.

Further charges may be incurred if a researcher wishes to use data from a cohort that makes a specific charge. Details of such cohorts are given on this page.

Outputs from Data Portal studies

Do I automatically have permission to publish the results of my research using data from the portal?

Researchers should follow the process described in the DPUK Data Access Agreement. Briefly, this involves giving written notice to Swansea 30 days prior to the submission of any publication. Swansea will liaise with the data providers, who have 20 days to object to, or request a delay to, the publication for reasons of accuracy or patent protection. If objections are made, the Swansea team will work with the objector and the publishing team to facilitate discussions, with the aim of resolving any outstanding issues. Researchers also need to adhere to the DPUK Publications Policy (https://www.dementiasplatform.uk/about/policies/dpuk-publications-policy).
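To help plan around the notice period, the sketch below works backwards from a planned submission date. It assumes that the 20-day objection window runs from the date notice is given, which the text above does not state explicitly; the Data Access Agreement is the authority on the exact terms. The function and key names are hypothetical.

```python
from datetime import date, timedelta

NOTICE_DAYS = 30      # written notice required before submission
OBJECTION_DAYS = 20   # time data providers have to object or request a delay


def publication_dates(planned_submission: date) -> dict:
    """Return the latest notice date and the end of the objection window.

    Assumption: the objection window starts on the day notice is given.
    """
    latest_notice = planned_submission - timedelta(days=NOTICE_DAYS)
    return {
        "latest_notice": latest_notice,
        "objection_window_ends": latest_notice + timedelta(days=OBJECTION_DAYS),
    }


dates = publication_dates(date(2024, 6, 30))
```

For a planned submission on 30 June 2024, notice would be due by 31 May 2024 and, under the stated assumption, the objection window would close on 20 June 2024.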