Applying for Data

What is the procedure for accessing cohort data?

Applying for data access through DPUK is straightforward. The online form captures the essential information on a proposed study and sends it to cohorts for their approval. In line with the general timescales, once a study application is submitted, DPUK sends the form to the cohort contacts, who then make a decision. If a study is approved, DPUK will arrange with the applicants for signature of the DPUK Data Access Agreement and, upon receipt of a signed copy, make the cohort data available within the analysis environment.

Will researchers need to complete a separate application for each cohort they wish to access?

DPUK aims to make all aspects of dementias data research easier for all parties concerned. The DPUK application process allows a researcher to apply for data once, with each cohort giving its decision within a fixed time period. This way, although individual cohorts consider each application, approvals are given centrally via DPUK, with the opportunity for back-and-forth discussion with a cohort during the review period.

What does the 90-day timescale for data access actually entail?

The 90-day period is counted from the date an application is submitted. DPUK will aim to process applications as efficiently as possible.

The timescale is broken down as follows:

Up to 28 days for a cohort (via its designated contact or PI) to make a decision on an application. We anticipate that any amendments and queries can also be addressed during this period, although this could take place at any stage of engagement.

Following approval, researchers must arrange signature of the DPUK Data Access Agreement. In parallel, DPUK will arrange for the data to be made available upon receipt of the signed Agreement. We estimate that the signing process takes between 15 and 45 days, depending on the institution.

Further time may be needed if a researcher has requested bespoke software for their study, or if a cohort is preparing subsets of data itself for upload to the Portal. However, we still aim to make the data available within 90 days of application.
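As a rough illustration of how these figures add up, the following Python sketch works through the arithmetic for a given submission date. It is only a sketch: the function and field names are made up for this example, and it assumes the cohort uses its full 28 days before signing begins, which need not be the case in practice.

```python
from datetime import date, timedelta

# Figures taken from the timescale described above.
TARGET_DAYS = 90               # overall aim, counted from submission
COHORT_DECISION_MAX_DAYS = 28  # maximum time for a cohort decision
SIGNING_MIN_DAYS = 15          # estimated quickest signing process
SIGNING_MAX_DAYS = 45          # estimated slowest signing process


def access_window(submitted: date) -> dict:
    """Illustrative breakdown of the 90-day timescale (hypothetical helper)."""
    # Assumption: the cohort takes its full 28 days, then signing starts.
    fast = COHORT_DECISION_MAX_DAYS + SIGNING_MIN_DAYS
    slow = COHORT_DECISION_MAX_DAYS + SIGNING_MAX_DAYS
    return {
        "fast_signing_days": fast,
        "slow_signing_days": slow,
        # Days left for bespoke software or data preparation in the slow case.
        "headroom_days": TARGET_DAYS - slow,
        "decision_due": submitted + timedelta(days=COHORT_DECISION_MAX_DAYS),
        "target_date": submitted + timedelta(days=TARGET_DAYS),
    }


window = access_window(date(2024, 1, 1))
print(window["fast_signing_days"], window["slow_signing_days"])  # 43 73
print(window["headroom_days"])                                   # 17
print(window["decision_due"], window["target_date"])             # 2024-01-29 2024-03-31
```

On these estimates, even the slowest signing process leaves about 17 days within the 90-day aim for any bespoke software or data preparation.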

If cohorts refuse to provide their data in the context of an application, what happens?

If you apply for access to the data from four cohorts and one refuses, for example, the Data Access Agreement would be drawn up to incorporate only the cohorts that have agreed to allow data to be accessed. Normally, cohorts do provide a reason for denying access, which can be relayed back to the applicant. If a significant number of cohorts refuse access to data for a study, it may be necessary to evaluate whether the proposed project remains viable.

Can you modify a submitted data access proposal?

Yes. Once an application has been submitted, DPUK will facilitate minor changes to the content of the submission, or the addition of cohorts, usually by email. If major changes are needed, however (a complete change of cohort choice, a new analysis plan, etc.), we may advise submitting a new application in the interest of clarity.

Is there a way to look at the completeness of data before you apply for access to a cohort? In other words, can you see the specific data you need are available?

It is not possible to view the actual data before making an application, but DPUK is working on tools that will provide more detail about the cohorts, in the form of data availability tables, data dictionaries, and metadata discovery tools.

Is there any way of altering a project end date if you find out that the one chosen in the beginning was not appropriate?

There are no strict rules on this, but it would be possible to provide a justification or rationale for an alteration to the end date. In much the same way that DPUK facilitates initial approval, we would send details of any proposed study extension to the named cohorts.

Data Types and Categorisation

What kind of data can be accessed? Subject-level or aggregate data?

The Data Portal will primarily store subject-level phenotypic data as standard. However, some datasets will be in aggregate form, and there will be summary data derived from genetic and imaging studies. DPUK will also store subject-level genetic data from some cohorts, as well as images and their analyses.

Will DPUK have standardised psychological and behavioural test data on the Data Portal?

Although DPUK takes data ‘as is’ directly from cohorts, many of them use standard behavioural and cognitive tests such as MMSE and MoCA. DPUK does not require cohorts to standardise their own data collection, but DPUK’s metadata tools will allow researchers to identify cohorts that have used the same tests. Cohorts will of course employ these scales differently; however, DPUK will seek to release variables by category so that areas of study can be captured by topic. For example, cognitive data, lifestyle data, and metabolomic data would each be available for request as a grouped set of variables.

Data Storage, Access and Analysis

How is data stored, accessed and analysed?

Cohort data transferred to DPUK is physically stored on servers at Swansea University. The data is then provided within a virtual desktop infrastructure to researchers who have had their applications approved. The virtual desktop is a Windows desktop accessed via the VMware Horizon View Client software. The Client allows access to the desktop, hosted in Swansea, using two-factor authentication: a username and password combined with either a YubiKey encryption device or a Google Authenticator code.

Once connected, researchers will find the data files they have been granted access to in shared folders on the desktop, and can open these files in the provided software. DPUK aims to support researchers whatever software they specialise in, and so provides a variety of software on the Windows desktop, such as Design Studio, WinSQL, R, Eclipse, SPSS, SAS, and Stata.

What are the rules on using data within the virtual desktop?

The datasets available from cohorts and any other providers cannot be removed from the virtual desktops; however, any derived results and outputs can be removed via the ‘data out’ process. This is a simple process of submitting files to DPUK to approve their release, which can be done from within the virtual desktop.

DPUK expects researchers to follow the analysis protocol in the application, and only data that has been applied for will be supplied to researchers. Researchers who already own, or have been approved for access to, data outside of the Portal can bring this into the Portal using the ‘data in’ mechanism.

How do I get access to my own data to use on the Portal?

DPUK has a ‘data in’ process, which is a standard feature available to all researchers. The mechanism allows for automatic approval of upload to the virtual desktop environment. Files can then be retrieved following the ‘File In and Out’ link within the virtual desktop, and subsequently analysed within the Portal.

Is it possible to add bespoke analysis tools in the Portal?

Yes, this can be considered. Subject to request, there are mechanisms for either DPUK installing the required software or a researcher bringing it in themselves. Please contact Chris Orton, Data Project Manager, at c.orton@swansea.ac.uk, for further details.

Is it possible for a team of researchers who may be geographically dispersed to work simultaneously on a dataset and store and access their analyses?

Yes, provided all the individuals have been named on the study and/or have signed a data access agreement granting access to data. The Data Portal desktops are user-specific in terms of access permissions; however, all researchers on a study have access to the same data and are provided with shared network folders in which to save collaborative work, as well as their own personal directories.

Cohort Data and Portal Access Charges

Are there any costs involved for accessing cohort data?

The DPUK Data Portal is a free-to-use resource for researchers across the world, although bespoke arrangements (see below) may incur a charge. If a researcher is happy to make use of the standard provided infrastructure, including software and storage, there is no cost for analysis. We will only discuss a possible charge for studies where a researcher wishes to use bespoke software or scaled-up compute power and memory, or is seeking to engage with DPUK for experimental medicine funding.

DPUK aims to absorb any data access costs that cohorts would normally charge, and make data freely available subject to approval of use.

Changes to Approved Studies

Can I modify a study once it has begun?

Yes. Once a study is underway, there are a few options for making modifications. If the modification is a time extension request, we will ask the relevant cohorts to approve the new end date in the same manner as the original approval. If more cohorts are to be added to the study, we can simply send the original proposal to the new cohorts for approval; the data will then be made available subject to this approval. Should a study need to completely change its topic, methodology, or cohort selection, we would advise creating a new submission to DPUK so that the new study proposal is considered appropriately.