
Data Management Plans

This guide covers what Data Management Plans (DMPs) are, how to make an implementable one, and how to use the DMPTool.

About this page

This page gives a brief overview of the questions you should consider for each stage of the research process before you write your DMP. Working through the questions will help you identify the data management actions your project requires, all of which should be included in your DMP. Completing the Questions to Consider first will make composing the DMP, with all the needed details, much easier than going straight to the DMPTool or a template.

This page will not cover best practices for each of these stages. To further explore each of these topics in detail, see this guide from the University of Arizona or this guide from Penn State Harrisburg.

Data Collection & Description

Define & Describe

First, consider the types of data you will collect, the format and approximate volume of the data, and the methods you will use to collect it.

  • The type of data you collect will inform security considerations and how the data should be documented.
  • Understanding the volume of data that will be produced helps you determine which storage solutions will work.
  • Think through collection methods before you start so that the appropriate steps can be documented in the DMP and then implemented.

Questions to Consider

  • What types of data will I have?
    • Some example data types to consider include:
      • sample or specimen data (e.g., images)
      • observational (e.g., sensor data, data from surveys, data from field studies)
      • experimental (e.g., genetic/genomic sequencing data)
      • simulation (e.g., climate modeling data)
      • derived or compiled (e.g., text mining)
      • code
  • What kind of file formats will my data be in? (.txt, PDF, .csv, .jpg, .tiff, .xlsx, .R, etc.)
    • One factor to consider when choosing data formats is whether the format is proprietary or an open, community-supported standard. Using open, well-documented formats that are widely used by researchers helps ensure that your data will remain accessible over the long term. Learn more about file formats in the UK Data Archive's Managing and Sharing Data: A Best Practice Guide for Researchers.
  • Would a user of the data require special software to open/use the data?
  • What is the approximate volume of data to be generated (file size, number of records, etc.), and will it grow over time? Estimating volume and growth rate helps with choosing a storage location. How long will the data need to be retained?
  • What methods will I use to collect the data, and how will they be standardized for consistency and future reuse?
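To answer the volume and retention questions above, a quick back-of-the-envelope calculation can help. The Python sketch below assumes linear growth and three stored copies (matching the backup practice described later on this page); the function names and example numbers are hypothetical placeholders, not a Penn State tool.

```python
"""Rough storage estimate for a DMP (hypothetical helper, not an official tool).

Assumes data grows by a constant amount each year and that several copies
are kept (e.g., an active copy plus two backups).
"""


def volume_from_records(n_records: int, bytes_per_record: int) -> float:
    """Convert a record count and average record size into GB (10**9 bytes)."""
    return n_records * bytes_per_record / 10**9


def projected_storage_gb(initial_gb: float, annual_growth_gb: float,
                         years: int, copies: int = 3) -> float:
    """Return the total storage needed in GB across all copies.

    initial_gb: data volume at project start
    annual_growth_gb: data added per year (linear growth assumed)
    years: how long the data must be retained
    copies: number of copies kept
    """
    single_copy = initial_gb + annual_growth_gb * years
    return single_copy * copies


if __name__ == "__main__":
    # Hypothetical survey: 50,000 records at roughly 2 KB each.
    start = volume_from_records(50_000, 2_000)
    total = projected_storage_gb(start, annual_growth_gb=5.0, years=5)
    print(f"Start: {start:.2f} GB, total with 3 copies: {total:.2f} GB")
```

With 50,000 records of about 2 KB each, the starting volume is 0.1 GB; adding 5 GB per year over a 5-year retention period and keeping three copies gives roughly 75.3 GB. Substitute your own figures, and allow extra headroom if growth is uncertain.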

Data Documentation & Organization

Metadata & Organization

One of the main purposes of data management and data sharing is to make the data generated by funded research available to others for replication, reuse, and other purposes. For research data to be as useful and relevant as it can be, it should follow the FAIR Principles as closely as possible. FAIR stands for Findable, Accessible, Interoperable, and Reusable. A brief outline of each follows.

Findable - Both people and machines should be able to find the data. This requires a persistent identifier, such as a digital object identifier (DOI). Your metadata should also be detailed enough for others to discover the data, and the data should be listed in a visible catalog or data repository.
Accessible - Both people and machines should be able to access the data through clearly defined, open protocols. Once a user finds the data, they need to know how they can be accessed.
Interoperable - Data usually needs to be integrated with other data throughout the research process for analysis, storage, and processing. Strive to follow disciplinary standards when describing your data and metadata, and consider the vocabularies and metadata schemas used in your field.
Reusable - To be reusable, data must be clearly and fully documented from beginning to end. Be sure to include a specific usage license and relevant provenance information, for example who created the data, when, and how. Others who want to reuse your data will need all of this information.
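One way to check that a dataset carries the information the FAIR principles call for is to write its core metadata down as a structured record before depositing it. The sketch below is a hypothetical, minimal example in Python: the field names, the placeholder DOI, and the sample values are illustrative assumptions rather than any repository's required schema, so map them onto the standard your discipline or repository uses.

```python
import json

# Hypothetical minimal metadata record. Field names are illustrative only;
# in practice use your discipline's or repository's schema (e.g., DataCite).
REQUIRED_FIELDS = (
    "title",       # Findable: a descriptive title
    "creators",    # Reusable: provenance (who)
    "identifier",  # Findable: persistent identifier such as a DOI
    "license",     # Reusable: usage license
    "created",     # Reusable: provenance (when)
    "methods",     # Reusable: provenance (how)
    "keywords",    # Findable: terms others might search for
)

record = {
    "title": "Example sensor observations (placeholder)",
    "creators": ["Lastname, Firstname"],
    "identifier": "doi:10.xxxx/placeholder",  # normally assigned by a repository
    "license": "CC-BY-4.0",
    "created": "2024-06-01",
    "methods": "Hourly readings from field sensors; see README.txt",
    "keywords": ["example", "sensor data"],
}


def missing_fields(rec: dict) -> list[str]:
    """Return required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print("Missing:", missing_fields(record) or "none")
```

Saving a record like this as JSON alongside the data also gives machines, not just people, something they can read, which is part of what the Findable and Accessible principles ask for.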


To that end, it's important to determine how you will describe your data and how it will be organized throughout the project. This includes naming conventions and the hierarchy of files to be created or collected, README files, data dictionaries, and metadata. See best practices for these here.

Questions to Consider

  • What is your naming and versioning strategy for storing the data that is collected? 
  • What metadata will be provided with your data to help others identify and discover the data?
  • What other documentation might be necessary to enable reuse of the data? Where would you capture this information (e.g., README text file, file headers, etc.)?
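A naming and versioning strategy is easiest to follow when it is written down as a rule that can be checked. The Python sketch below assumes one possible convention, YYYYMMDD_project_description_vNN.ext; the convention and the helper names are hypothetical examples, not a Penn State standard.

```python
import re
from datetime import date

# Hypothetical convention: YYYYMMDD_project_description_vNN.ext
# Lowercase letters and digits only, hyphens within the description.
PATTERN = re.compile(r"^(\d{8})_([a-z0-9]+)_([a-z0-9-]+)_v(\d{2,})\.([a-z0-9]+)$")


def make_filename(day: date, project: str, description: str,
                  version: int, ext: str) -> str:
    """Build a file name that follows the convention above."""
    proj = re.sub(r"[^a-z0-9]+", "", project.lower())
    desc = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day:%Y%m%d}_{proj}_{desc}_v{version:02d}.{ext.lower()}"


def is_valid(name: str) -> bool:
    """Check whether an existing file name follows the convention."""
    return PATTERN.match(name) is not None


if __name__ == "__main__":
    name = make_filename(date(2024, 3, 5), "SoilStudy", "Site A plots", 3, "csv")
    print(name)                                # 20240305_soilstudy_site-a-plots_v03.csv
    print(is_valid(name))                      # True
    print(is_valid("final data FINAL2.xlsx"))  # False
```

Dates written as YYYYMMDD and zero-padded version numbers keep files in chronological and version order when sorted by name, which is one reason conventions like this are common.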

Data Security & Active Storage

Active Storage, Security, & Backups

This step covers how the data will be stored during the active phase of the project. For sensitive or protected data, be vigilant about security during active storage, including physical security, network security, and the security of computer systems and files, to protect your data from unauthorized access, modification, disclosure, or destruction. Be sure to check relevant Penn State policies for guidance on securing specific types of protected data.

Once you have identified any limitations on where and how you can store your research data, you can look at the storage options available. This section deals specifically with active storage, not repositories or data preservation, which are covered later. Penn State has created a Data Storage Finder that lets researchers select criteria relevant to their project and then explore matching storage options at Penn State. See also the Storage Options for Research Data on the Additional Resources page.

You should always have a plan for where and when your data will be backed up. Best practice for backups is to always have three copies of your data:

  • The original/active file stored on your computer. This is the file you update daily throughout the course of your work.
  • A backup copy on a physical external device (e.g., disk, jump drive, external hard drive). Update it frequently in case your primary work computer is lost.
  • A backup copy stored on a remote or cloud drive (e.g., shared university drive, cloud storage). Update it frequently in case your physical storage devices are lost. This copy also gives you remote access to your files.
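Keeping three copies only protects you if the copies actually match. A common way to confirm this is to compare checksums. The Python sketch below uses SHA-256 from the standard library's hashlib module; the function names and file paths are hypothetical, and the demonstration uses temporary files in place of your real working, external-drive, and cloud copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, *backups: Path) -> dict[str, bool]:
    """Compare each backup against the original file's checksum.

    Returns a mapping from backup path to True (identical) or False
    (missing or different), so stale or corrupted copies stand out.
    """
    reference = sha256_of(original)
    return {str(b): b.is_file() and sha256_of(b) == reference for b in backups}


if __name__ == "__main__":
    import tempfile

    # Temporary files stand in for the three copies (hypothetical locations).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        active = root / "data.csv"
        external = root / "external_drive_copy.csv"
        cloud = root / "cloud_copy.csv"
        active.write_text("id,value\n1,42\n")
        external.write_text("id,value\n1,42\n")
        cloud.write_text("id,value\n1,41\n")  # an out-of-date copy
        print(copies_match(active, external, cloud))
```

Running a check like this on the schedule set in your backup plan turns "the backups exist" into "the backups are current and intact."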

Questions to Consider

  • Where will the data be stored while the project is active? 
  • Who will have access to the data during the project, and will any platforms be used to share data?
  • What is your plan to prevent data loss?
    • Where will you back up your data? How often? 
    • Who is responsible for performing backups and ensuring the backup schedule is adhered to?

Note that once you have selected your active storage solutions, there is often boilerplate language to include in your DMP about that storage option. 

Data Sharing, Reuse, & Distribution

Data Sharing & Reuse

A large part of the reason DMPs are required for research is to ensure that researchers plan how they will share their data after the project. Because sharing is such an integral part of research, a separate guide will cover data sharing best practices (stay tuned for updates). In addition, many funders, institutions, and publishers require researchers to share their data, so be sure to check the requirements for your project.

Questions to Consider

  • Do any of the ethical, privacy, and confidentiality concerns described below affect what data you can share?
  • What type of license will you place on your data?
  • What data can be shared publicly? 
  • What are the journal/funder requirements for sharing your data?
  • When will the data be shared publicly, and will there be an embargo period?
  • Which repository will you use to share your data?
  • Will any additional costs be incurred to prepare the data for sharing that should be added to the project budget?

Data Preservation

Preservation

Finally, consider what will happen to your data in the years after the project closes. Many repositories offer preservation options, but confirm what a given repository provides before relying on it. Check with your institution and funder for preservation requirements and best practices for disposing of data.

Questions to Consider

  • How long will the data be retained and available? 
  • When will it be archived, and is any additional work needed to prepare the data for archiving?
  • Who will be responsible for the data after the initial data sharing period?
  • Is there a cost to long-term preservation that can be included in the budget request?

Ethical & Legal Considerations

Ethical, Confidentiality, and Privacy Concerns

These considerations must be identified early because they inform storage and documentation decisions. Several aspects of a project could raise a confidentiality or ethical concern, most notably the collection of data about people. This could include genomic data, personally identifiable information (PII) about a subject, or health data covered by the Health Insurance Portability and Accountability Act (HIPAA). If your research data raises any of these concerns, note them in the DMP. If your project will generate human subjects data, you should also state your intention to comply with Penn State's IRB requirements, as set by research administration guidelines and policies. Consult Penn State's Human Subjects Research (IRB) site, and review RA 22: HIPAA and Research at Penn State University or, if applicable, RA 23: HIPAA and the Milton S. Hershey Medical Center and Penn State College of Medicine.

Questions to Consider

  • Are there any relevant privacy or ethical concerns related to my data? 
  • Will I have to anonymize or deidentify my data before sharing? What methods will be used?
  • Have I obtained Informed Consent from human subjects to share the data collected and in what form?
  • Have I engaged the correct specialists if necessary (e.g., the IRB)?
  • Does my data have other privacy or ethical concerns such as endangered species, sensitive locations, or genomic data?
  • Is my data under the purview of HIPAA? FERPA? Export control?
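If you plan to replace direct identifiers with codes before sharing, the method should be described in the DMP. The Python sketch below shows one common technique, keyed hashing (HMAC-SHA256), for turning identifiers such as email addresses into stable participant codes. It is an illustration under stated assumptions only: the function and field names are hypothetical, and pseudonymizing one identifier does not on its own make data deidentified under HIPAA, FERPA, or IRB requirements, since other fields can still identify people. Confirm your method with the IRB or the relevant specialists.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Map a direct identifier (e.g., an email address) to a stable code.

    HMAC-SHA256 gives the same code for the same input, but the code cannot
    be recomputed without the secret key. Keep the key separate from the
    shared data, or destroy it if re-linking will never be needed.
    """
    normalized = identifier.strip().lower()
    mac = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return "P" + mac.hexdigest()[:length]


def pseudonymize_rows(rows, id_field, secret_key, drop_fields=()):
    """Replace id_field with a code and drop other listed direct identifiers.

    Hypothetical helper: it is not a substitute for a full deidentification
    review, because remaining fields may still identify individuals.
    """
    out = []
    for row in rows:
        clean = {k: v for k, v in row.items()
                 if k != id_field and k not in drop_fields}
        clean["participant_id"] = pseudonymize(row[id_field], secret_key)
        out.append(clean)
    return out


if __name__ == "__main__":
    # In practice, generate the key randomly (e.g., secrets.token_bytes(32)).
    key = b"replace-with-a-randomly-generated-secret"
    rows = [
        {"email": "Ana@example.org", "phone": "555-0100", "score": 7},
        {"email": "ana@example.org ", "phone": "555-0100", "score": 9},
    ]
    for r in pseudonymize_rows(rows, "email", key, drop_fields=("phone",)):
        print(r)  # both rows share one participant_id; email and phone removed
```

Using a keyed hash rather than a plain hash matters here: without the key, someone holding a list of likely email addresses cannot simply hash them and match the codes.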