Sharing research data
Guidance on the steps you can take to make your research data discoverable and accessible.
Overview
Research data should be made ‘as open as possible, as closed as necessary’.
When sharing research data, there are a few simple steps you can take to make it discoverable and accessible. These will help potential collaborators find your work and help you reuse your data in the future.
These steps will also help to make your data FAIR (Findable, Accessible, Interoperable and Reusable), in line with the requirements of major research funders and the University’s open research statement.
- Check that you have permission to share the data.
- Prepare your dataset with appropriate metadata and documentation.
- Decide whether your dataset will need to be restricted or embargoed.
- Choose a repository: either the one specified by your funder, one used in your discipline, or the University of Sheffield repository, ORDA. You can find a short video introduction to data sharing and the ORDA repository.
- Upload your data and assign a licence to it.
- Add a data availability statement linking to the data in any related papers.
Considering these steps early in the project, via a data management plan, can make it easier to effectively share research data.
For more information on how to apply the FAIR principles to your research data and outputs, see our guidance.
The guidance is split into sections covering the periods before, during and after a research project, with actions that can be taken at each point to help make your outputs more FAIR.
It also contains extra information broken down by the type of research data that you are handling, such as sensitive data or code/software.
Permissions
Whether you make data available openly or on request, you must make sure that sharing it is permitted under the terms of participant consent, partner agreements and the conditions of third-party data providers.
If you can’t share raw data, you may be able to share analysed data, but you will still need to check that this complies with stakeholder agreements.
Preparing your dataset
When preparing to share a dataset, you will need to ensure that the files are organised in a clear structure and use open or commonly used file formats.
Remember to upload any software code developed to generate or process the data, or details of any proprietary software used.
It is also important to consider the metadata that will help people find out what a dataset includes and how it can be accessed and used.
While basic metadata such as title and creator are mandatory in most repositories, individual repositories may require other descriptive and technical details.
You should check these requirements when you choose a repository and ensure you retain the relevant information throughout your project. You can find more information about metadata from the and .
The terms ‘metadata’ and ‘data documentation’ are sometimes used interchangeably.
While ‘metadata’ usually refers to details in a repository record that enable data discovery and access, ‘data documentation’ generally refers to information stored with a dataset that enables understanding and reuse of data, such as a README.
It’s important to include both types of information when sharing data.
Top tip: Remember that an openly shared dataset is available for everyone to see. Make your metadata detailed enough to enable use of the data but simple enough for a non-expert to understand what the dataset is about.
Restrictions and embargoes
It may not be appropriate to share some data openly, perhaps because it contains sensitive information or consists of extremely large files, but you may be able to share the data, or part of it, on request.
It is usually best to make arrangements for this to be done through your department rather than giving your own email address, which may cause issues if you are not available or are no longer at the University.
You can create a ‘metadata-only’ record in the University data repository, ORDA, giving details of how to request data access. In the case of sensitive data, you may require completion of a form agreeing to specific conditions of reuse.
You may sometimes need to place an embargo on data, perhaps to comply with funder or publisher requirements. Temporary or permanent embargoes can be placed on data and metadata in ORDA and many other repositories.
You can also restrict access to data in ORDA to certain groups of people, such as members of the University or your department.
You may need to check if Export Control Legislation applies when sharing your data outside the UK. Export controls are measures imposed by the government to regulate the transfer of goods – including data – to other states, where there are end-use or end-user concerns, and when destinations are subject to sanctions or other restrictions.
For example, applied research that could be misused for military purposes would be considered particularly high risk. (staff only link) can provide further information and advice.
Top tip: If you upload data to ORDA that require different embargoes or access conditions, it’s best to upload them as separate items.
Choosing a repository
Research data repositories are the best option for storing and publishing research data at the end of a project. Some funders may recommend a repository or provide their own data centres, but generally you will select the most appropriate repository for your data.
The University data repository, ORDA, is suitable for most non-sensitive research data, and there are also many subject-specific repositories (you can find details of these at ).
Sharing sites such as ResearchGate and Academia.edu are not considered suitable alternatives for long-term storage and publishing of research data.
Most repositories will give your data a DOI (Digital Object Identifier), a persistent identifier that can be assigned to a digital object such as a research paper or dataset.
Objects with a DOI are guaranteed to be accessible online for the foreseeable future, and can be found using the DOI even if they move to a different website (eg if a journal changes publisher). DOIs can save time when adding outputs to databases, and there are also online tools that use DOIs to and .
A DOI is automatically allocated to items published in ORDA. If the most suitable location for your data does not assign DOIs, you could create a record in ORDA and ‘link’ it to the data. You can then use the DOI provided by ORDA as a permanent identifier for your data.
Top tip: Choose one repository for your data. It can cause difficulties for you and confusion for other researchers if your data is available in different places with multiple DOIs.
Uploading data and choosing a licence
When you upload data to a repository such as ORDA, you will usually be given a choice of licences under which to make your data available. A licence tells people how they can use your data, and options often include the Creative Commons licences.
Licences range from very open to more restrictive, so you can select the one that best suits your needs. There are also licences specifically for software, including MIT and Apache.
Top tip: If different parts of your data require different licences, it’s best to deposit them separately and give links to the related items.
View our short video about Creative Commons licences .
Data availability statement
When producing publications or other research outputs, it is important to tell readers if and how underlying data can be accessed. The best way to do this is to include a data availability statement, also known as a data access statement.
If data is available through a repository, you should include the dataset DOI, which provides a direct link to the dataset. Not only does this enable readers to access the data, it also enables them to cite your data more easily and accurately if they reuse it.
If data can only be made available on request, your data availability statement should explain how access can be requested.
Top tip: Many funders require a data availability statement to be included in research outputs – whether data is available openly, on request, not at all, or even when there is no data involved.
Examples of data availability statements include:
- Data supporting this publication can be freely downloaded from the University of Sheffield research data repository at <your doi here>, under the terms of the Creative Commons Attribution (CC BY) licence.
- Data supporting this publication include personal information, and may be obtained by contacting <group email address>@. A signed data sharing agreement may be required to comply with participant consent.
- Data supporting this publication are confidential, and can only be supplied by our industrial partner, <name>.
Further examples of data availability statements can be found on the University of Manchester website.
For further information, contact rdm@sheffield.ac.uk.