Library Guides: Research Data Services at Penn State Harrisburg: Data Discovery and Storage

General information

Whether your research project requires you to gather and create your own data or to locate pre-existing datasets, it is important to know your data discovery and data storage options. Knowing how and where to locate and store data makes for a smoother research process, and this section is here to help. What data are you looking for? What data do you want to store? Often, researchers are looking for (or creating) three types of data:

raw data sets,
summary statistics, and/or
restricted or proprietary data.

If you need assistance, the library can help! Your Harrisburg librarians can provide frontline assistance and will be able to get you started on your data journey. If necessary, your librarian will connect you with the Research Informatics and Publishing department, where you can submit your data-related questions via their contact form.

Data repositories

A data repository is a location that holds data, makes data available to users, and organizes the data in a logical manner (National Library of Medicine, 2020). Data repositories are great places to find and/or store research data. They fall into three categories: general purpose, discipline or domain specific, and institutional. Each category is described below.

General purpose data repositories are domain agnostic. Generally, these repositories accept all files and all formats. While they are good to search, they are not always the best fit for finding and storing data. Examples of this type of repository are: Dryad, figshare, and Zenodo.

Discipline / domain specific data repositories are primarily designed for specific research domains. These repositories offer more search options than general purpose ones. Often, users can browse by subject, country, and/or content-type. Additionally, domain specific repositories offer several filtering options for users. If you are interested in locating a domain specific repository for data, a great place to begin is the Registry of Research Data Repositories, also known as re3data.

Institutional repositories are data repositories that are created and hosted by institutions. These are available to researchers affiliated with the institution, and the availability of data is determined by the researcher and/or research team. Researchers can make their data open and publicly available or available only to those within the institution. ScholarSphere is Penn State's institutional repository, and anyone with a Penn State Access ID can deposit materials relating to the University's teaching, learning, and research mission to ScholarSphere. All types of scholarly materials, including publications, instructional materials, creative works, and research data, are accepted. In addition to ScholarSphere, DataCommons is a disciplinary data repository at Penn State to which researchers may submit data for dissemination and compliance purposes.

Selecting a data repository

Like other aspects of a research project, investigating and selecting an appropriate data repository can be daunting. If you or your research team have a voice in the selection process, considering these features may help:

Provides your work a landing page with metadata,
Provides a persistent identifier (e.g. DOI),
Matches your needs (formats accepted, access, back-up and recovery, etc.),
Offers a sustainable platform with long-term service,
States clear terms and conditions,
Tracks how the data has been used (if that's important to you), and/or
Costs little or nothing.
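
If your team is weighing several candidate repositories, the checklist above can be turned into a simple side-by-side comparison. The sketch below is only an illustration: the criterion names, the function names, and the two example repositories ("Repository A" and "Repository B") are hypothetical placeholders, not assessments of any real service, and every criterion is weighted equally, which may not reflect your priorities.

```python
# Criteria paraphrased from the checklist above; the identifiers are hypothetical.
CRITERIA = [
    "landing_page_with_metadata",
    "persistent_identifier",
    "matches_needs",
    "sustainable_platform",
    "clear_terms",
    "tracks_usage",
    "low_or_no_cost",
]


def missing_features(features: set[str]) -> list[str]:
    """Return the checklist criteria a repository does not meet, in checklist order."""
    return [criterion for criterion in CRITERIA if criterion not in features]


def rank_repositories(candidates: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank repositories by how many criteria they meet (ties broken by name)."""
    scored = [
        (name, len(CRITERIA) - len(missing_features(features)))
        for name, features in candidates.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up example data: fill this in from your own review of each repository.
    candidates = {
        "Repository A": set(CRITERIA),
        "Repository B": {"persistent_identifier", "clear_terms"},
    }
    for name, score in rank_repositories(candidates):
        print(f"{name}: meets {score} of {len(CRITERIA)} criteria")
        for criterion in missing_features(candidates[name]):
            print(f"  missing: {criterion}")
```

A count like this is only a starting point for discussion. A single unmet criterion, such as an unsupported file format, may rule a repository out regardless of its total.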

Data storage options at Penn State

At Penn State, researchers have several data storage options that offer different features. Here you will find a description of four storage options available to Penn State users: ScholarSphere, Penn State's G Suite, PSU OneDrive, and Penn State's Institute for Computational and Data Sciences (ICDS). Ultimately, your data storage selection should serve your needs and you are encouraged to reach out to the responsible parties for more information.

Table exploring four different data storage options available to Penn State users.
Storage Option	Key Features
ScholarSphere	Self-deposit for PSU students, faculty, and staff; Deposit, actively manage, and share scholarly works; Flexible access and visibility controls; Assistance in complying with data management planning and sharing requirements; Long-term access and preservation of work; Curation services from University Libraries
Penn State's G Suite	Available to PSU students, faculty, and staff; External collaborators possible; Unlimited space; Any type of file, 5 TB
PSU OneDrive	Available to PSU students, faculty, staff, and emeritus; Data needing more security; External collaborators possible; Unlimited space; Any type of file, 15 GB
PSU ICDS	Available to PSU PIs and their students, post-docs, researchers, staff, and sponsored guests; External collaborators possible; By default, every ICDS-ACI user receives access to two free storage directories; Additional storage can be purchased

Looking for more assistance comparing your data storage options? Penn State created a website to do just that. Check out the university's Data Storage Finder to find the data storage option that best suits your research needs.