Digital Preservation: Web Crawling Program

This page describes the current Web Archiving Program at Virginia Tech University Libraries. Virginia Tech uses the Archive-It web crawler hosted by the Internet Archive.

Web Archiving @ VT

Web Archiving Program at Virginia Tech University Libraries

Virginia Tech University Libraries conducts a biannual crawl of the Virginia Tech web domain. Alex Kinnaman, Digital Preservation Coordinator, oversees the program. The purpose of this program is to systematically capture all university web content, both to protect against data loss during web content migrations and to archive all areas of university web communications as a matter of institutional history.

The scope of the crawl includes all top-level subdomains of vt.edu (e.g. liberalarts.vt.edu, icat.vt.edu, dsa.vt.edu, etc.). The scope also includes externally hosted pages related to VT faculty research and professional activities. Members of the university community may contact Alex Kinnaman (alexk93@vt.edu) to request targeted crawls to archive topical collections of web documents related to research projects.

Due to the limitations of crawling technology and the unpredictability of web site design, the scope may exclude certain embedded dynamic content such as calendars, social networking modules, and certain video formats.
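The domain portion of the scope described above can be illustrated with a short sketch. This is not how Archive-It implements its scoping rules; it is a minimal, hypothetical Python check that assumes "in scope" simply means the host is vt.edu or one of its subdomains. The function name in_domain_scope and the constant IN_SCOPE_DOMAIN are made up for illustration, and externally hosted pages (which the program also includes) would need a separate, manually curated list.

```python
from urllib.parse import urlsplit

# Assumption: the domain scope is modeled as "vt.edu or any subdomain of it".
IN_SCOPE_DOMAIN = "vt.edu"


def in_domain_scope(url: str, domain: str = IN_SCOPE_DOMAIN) -> bool:
    """Return True if the URL's host is the domain itself or one of its subdomains."""
    host = urlsplit(url).hostname  # already lowercased; None if the URL has no host
    if host is None:
        return False
    host = host.rstrip(".")  # tolerate a trailing dot in fully qualified names
    # Match on a dot boundary so that look-alike hosts such as "notvt.edu" are rejected.
    return host == domain or host.endswith("." + domain)
```

Under these assumptions, https://liberalarts.vt.edu/ and https://icat.vt.edu/ would count as in scope, while a look-alike host such as vt.edu.example.com would not.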

The Virginia Tech University Libraries Archive-It collection is located at https://www.archive-it.org/collections/5315.

Last updated: November 3, 2018

Web Archiving COVID-19

In support of the project Hokies@Home: Documenting COVID-19 at Virginia Tech, we have curated a second web archive, hosted by the University Libraries, that crawls relevant Virginia Tech websites daily. This web archive is publicly available at https://archive-it.org/collections/14068.

Some seed URLs have changed since the initial information was released, and we actively monitor these changes so that we can update the seeds. As a result, the collection may contain some duplication, and some retired seeds have been replaced by current seeds.

Web Crawling Explained