
Digital Preservation: Web Crawling Program

This guide describes the current Web Archiving Program at Virginia Tech University Libraries. Virginia Tech uses the Archive-It web crawler hosted by the Internet Archive.


Web Archiving Program at Virginia Tech University Libraries

Virginia Tech University Libraries conducts a biannual crawl of the Virginia Tech web domain. Alex Kinnaman, Digital Preservation Coordinator, oversees the program. The program has two purposes: to systematically capture all university web content as protection against data loss during web content migrations, and to archive all areas of university web communications as a matter of institutional history.

The scope of the crawl includes all top-level subdomains of vt.edu (e.g. liberalarts.vt.edu, icat.vt.edu, dsa.vt.edu, etc.). The scope also includes externally hosted pages related to VT faculty research and professional activities. Members of the university community may contact Alex Kinnaman (alexk93@vt.edu) to request targeted crawls that archive topical collections of web documents related to research projects.
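
As an illustration only, the domain portion of the scope described above can be sketched as a simple hostname check. This is not how Archive-It is configured for this program; the function name in_domain_scope and the constant BASE_DOMAIN are hypothetical, and externally hosted pages that are in scope cannot be identified by hostname alone, so they would need to be listed separately.

```python
from urllib.parse import urlsplit

# Assumption for illustration: the base domain named in the scope statement.
BASE_DOMAIN = "vt.edu"


def in_domain_scope(url: str, base_domain: str = BASE_DOMAIN) -> bool:
    """Return True if the URL's host is the base domain or one of its subdomains."""
    # urlsplit lowercases the hostname; it is None when the URL has no scheme/host part.
    host = urlsplit(url).hostname
    if host is None:
        return False
    host = host.rstrip(".")  # tolerate a trailing dot in fully qualified names
    # Require an exact match or a ".vt.edu" suffix so that look-alike
    # hosts such as "notvt.edu" are not treated as in scope.
    return host == base_domain or host.endswith("." + base_domain)


if __name__ == "__main__":
    for candidate in (
        "https://liberalarts.vt.edu/about.html",
        "https://icat.vt.edu/",
        "https://vt.edu.example.com/",
    ):
        print(candidate, in_domain_scope(candidate))
```

The suffix comparison includes the leading dot on purpose: matching on "vt.edu" alone would also accept unrelated hosts whose names merely end with those characters.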

Due to limitations of crawling technology and the unpredictability of website design, the crawl may not capture certain embedded dynamic content such as calendars, social networking modules, and certain video formats.

The Virginia Tech University Libraries Archive-It collection is located at https://www.archive-it.org/collections/5315.

Last updated: November 3, 2018

Web Crawling Explained