Library Guides: Research Data Services at Penn State Harrisburg: Data Documentation

General information

Thorough documentation is essential if research data are to be shared and reused. This section outlines key factors in documenting your data properly. Data can be documented at two levels: the study level and the data level. Study-level documentation covers your research design and data collection methods. Data-level documentation covers labels and descriptions for individual items, such as variables in a database or codes applied to interview transcripts. Three documentation formats are outlined below.

Metadata

Put simply, metadata is "data about your data". It is highly structured and laid out in specific fields, often with a controlled vocabulary for each field. Three levels of metadata are described below.

Minimal - Basic information to enable discovery and access. Fields such as: Creator, Title, Publisher, Dates, and License/Embargo terms.

General Contextual - Can be administrative, for example: funding agency and grant number. Can also be methodological, for example: project information, collection process, geographical location of data collection, and date range of data collection.

Discipline Specific - Details particular to a discipline or domain that enable re-use, for example: specific machine settings and experimental conditions.
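
As a rough illustration of the three levels, the sketch below stores one metadata record as a nested Python dictionary and serializes it to JSON. The key names, the grouping into three blocks, and every value are assumptions made for this example; they do not follow any particular metadata standard, and a real project would use the fields and controlled vocabularies required by its repository or discipline.

```python
import json

# Hypothetical metadata record; every value below is a placeholder.
metadata = {
    # Minimal: enough to discover and access the data.
    "minimal": {
        "creator": "Doe, Jane",
        "title": "Example stream temperature survey",
        "publisher": "Example University",
        "dates": {"created": "2023-05-01"},
        "license": "CC BY 4.0",
    },
    # General contextual: administrative and methodological details.
    "general_contextual": {
        "funding_agency": "Example Funding Agency",
        "grant_number": "EX-0000",
        "collection_process": "Hourly readings from fixed sensors",
        "location": "Example County, Pennsylvania",
        "collection_date_range": ["2023-05-01", "2023-09-30"],
    },
    # Discipline specific: settings and conditions needed for re-use.
    "discipline_specific": {
        "instrument_model": "Example logger",
        "sampling_interval_minutes": 60,
    },
}

# Fields listed under the Minimal level above.
REQUIRED_MINIMAL = ("creator", "title", "publisher", "dates", "license")


def missing_minimal_fields(record):
    """Return the minimal-level fields that are absent or empty."""
    minimal = record.get("minimal", {})
    return [field for field in REQUIRED_MINIMAL if not minimal.get(field)]


# Serialize to JSON so the record can be stored alongside the data.
metadata_json = json.dumps(metadata, indent=2)
print(missing_minimal_fields(metadata))
```

A check like `missing_minimal_fields` is one simple way to confirm that the fields needed for discovery and access are filled in before the data are deposited.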

Data dictionaries

Data dictionaries are separate file(s) that define and explain data variables. Common information includes:

Variable name and definition
How variable was measured
Data units/format
Coded values and their meanings
Any known issues with the data (missing values, bias, etc.)
Other important notes about the data
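
The items above map naturally onto the columns of a table. The following is a minimal sketch, assuming the data dictionary is kept as a CSV file with one row per variable; the column names, variable names, and descriptions are all hypothetical and exist only to show the shape of such a file.

```python
import csv
import io

# One column per item in the list above (column names are illustrative).
FIELDS = [
    "variable",
    "definition",
    "measurement",
    "units_format",
    "coded_values",
    "known_issues",
    "notes",
]

# Hypothetical entries; replace with the variables in your own dataset.
entries = [
    {
        "variable": "temp_c",
        "definition": "Water temperature",
        "measurement": "Fixed sensor, hourly reading",
        "units_format": "degrees Celsius, one decimal place",
        "coded_values": "-999 = missing",
        "known_issues": "Sensor offline for part of the study",
        "notes": "",
    },
    {
        "variable": "site_id",
        "definition": "Identifier of the sampling site",
        "measurement": "Assigned by the project team",
        "units_format": "text",
        "coded_values": "A = upstream; B = downstream",
        "known_issues": "",
        "notes": "",
    },
]


def data_dictionary_csv(rows):
    """Return the data dictionary as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dictionary_text = data_dictionary_csv(entries)
print(dictionary_text)
```

Saving `dictionary_text` to a file such as `data_dictionary.csv` next to the data keeps the definitions in an open format that spreadsheet software and scripts can both read.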

README.txt files

A README file is a simple text file that contains important information about your data. These files are often associated with code and are increasingly used with research data. To create a README file for your data, use a plain-text editor such as Notepad (Windows) or TextEdit (macOS, in plain-text mode), as these tools can save .txt files. Avoid creating README files in proprietary software formats such as .docx (Microsoft Word) and .pdf (Adobe).
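
A README can also be generated by a script, which may help when many datasets need consistent files. The sketch below is one possible approach, not a required template: the section headings, the helper names `build_readme` and `write_readme`, and all of the content are assumptions chosen for illustration.

```python
from pathlib import Path


def build_readme(title, sections):
    """Assemble README text from a title and (heading, body) pairs."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections:
        lines += [heading.upper(), body, ""]
    return "\n".join(lines)


def write_readme(path, text):
    """Save the text as UTF-8 plain text so any editor can open it."""
    Path(path).write_text(text, encoding="utf-8")


# All names and descriptions below are placeholders.
readme_text = build_readme(
    "Example stream temperature survey",
    [
        ("Contact", "Jane Doe, Example University"),
        (
            "Files",
            "temperature.csv: hourly readings\n"
            "data_dictionary.csv: variable definitions",
        ),
        ("Methods", "Readings were logged hourly by fixed sensors."),
        ("License", "CC BY 4.0"),
    ],
)
print(readme_text)
```

Calling `write_readme("README.txt", readme_text)` in the folder that holds the data would save the file as plain text, which matches the advice above to avoid proprietary formats.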