Research Using Web Archives

A guide to finding and using archived websites in research, including web archive collections curated by Penn State University Libraries.

Web Archives Explained

According to the International Internet Preservation Consortium (IIPC), "Web archiving is the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use."  

Penn State University Libraries uses the Archive-It vended service from the Internet Archive to crawl and capture content on the Web. Using Archive-It, Penn State selects URLs for capture, organizes them into collections, takes periodic snapshots of those websites on specific dates, and then makes the archived websites available for "playback" that is meant to match the original browsing experience. Because the Web changes frequently and the technologies used to deliver web content are complex and varied, there is no guarantee that every file from a website will be captured or reconstructed with perfect fidelity. As you navigate an archived website, please be aware that some content may not have been captured accurately, or may have fallen outside the scope of the crawl, resulting in broken links. If you encounter missing content that you feel should be available, please let us know. More information about Archive-It is available at https://archive-it.org/. The full scope of University Libraries’ web archives can be found at https://archive-it.org/home/psu.
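
For researchers who want to cite or revisit a specific snapshot, it can help to see how a capture is typically addressed: by collection, capture date, and original URL. The following is a minimal sketch in Python, not an official tool. The playback host, the URL pattern, and the collection number 1234 are assumptions for illustration only; check the address shown in your browser when viewing an archived page for the actual values.

```python
from datetime import datetime

# Assumed playback host for illustration; confirm against a real archived page.
WAYBACK_BASE = "https://wayback.archive-it.org"


def playback_url(collection_id: int, captured_at: datetime, original_url: str) -> str:
    """Build a Wayback-style playback address for one capture.

    Assumed pattern: <host>/<collection id>/<14-digit timestamp>/<original URL>
    """
    # Capture dates are commonly written as YYYYMMDDhhmmss.
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_BASE}/{collection_id}/{timestamp}/{original_url}"


# 1234 is a hypothetical collection number, not a real Penn State collection.
print(playback_url(1234, datetime(2020, 3, 15, 12, 0, 0), "https://www.psu.edu/"))
```

Recording the capture date alongside the original URL in a citation lets a reader return to the same snapshot rather than the live site, which may have changed since.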