Web Archiving: Web Crawling Program

This guide describes the current Web Archiving Program at Virginia Tech University Libraries. Virginia Tech uses the Archive-It web crawler hosted by the Internet Archive.

Web Archiving @ VT

Web Archiving Program at Virginia Tech University Libraries

Virginia Tech University Libraries conducts a biannual crawl of the Virginia Tech web domain. Alex Kinnaman, Digital Preservation Coordinator, oversees the program. The program systematically captures all university web content, both to protect against data loss during web content migrations and to archive all areas of university web communications as a matter of institutional history.

The scope of the crawl includes all subdomains of vt.edu (e.g. liberalarts.vt.edu, icat.vt.edu, dsa.vt.edu, etc.). The scope also includes externally hosted pages related to VT faculty research and professional activities. Members of the university community may contact Alex Kinnaman (alexk93@vt.edu) to request targeted crawls to archive topical collections of web documents related to research projects.

Because of limitations in crawling technology and the unpredictability of website design, the crawl may not capture certain embedded dynamic content such as calendars, social networking modules, and certain video formats.
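
As a rough illustration of the domain part of the scope described above, the following Python sketch checks whether a URL falls under vt.edu. It is not the Archive-It configuration, which is managed through that service. The names `BASE_DOMAIN`, `EXTRA_HOSTS`, and `in_scope` are hypothetical, and the allow-list for externally hosted pages is an assumption about how such pages might be modeled.

```python
from urllib.parse import urlsplit

# Assumption: the domain scope is "vt.edu and any host beneath it".
BASE_DOMAIN = "vt.edu"

# Assumption: externally hosted pages are modeled as an explicit allow-list.
# The host below is a made-up placeholder, not a real seed.
EXTRA_HOSTS = {"example-lab.org"}


def in_scope(url: str) -> bool:
    """Return True if the URL's host is vt.edu, a subdomain of it, or allow-listed."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Match the domain itself or a true subdomain; the leading dot prevents
    # look-alike hosts such as "notvt.edu" from matching.
    if host == BASE_DOMAIN or host.endswith("." + BASE_DOMAIN):
        return True
    return host in EXTRA_HOSTS


if __name__ == "__main__":
    for candidate in (
        "https://liberalarts.vt.edu/",
        "https://icat.vt.edu/projects",
        "https://notvt.edu/",
        "https://example-lab.org/paper",
    ):
        print(candidate, in_scope(candidate))
```

Comparing on the parsed hostname rather than the raw URL string keeps a host such as vt.edu.example.com from being treated as part of the university domain.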

The Virginia Tech University Libraries Archive-It collection is located at https://www.archive-it.org/collections/5315.

Last updated: November 3, 2018

Web Archiving COVID-19

In support of the project Hokies@Home: Documenting COVID-19 at Virginia Tech, we have curated a second web archive, hosted by the University Libraries, that crawls relevant Virginia Tech websites daily. This web archive is publicly available at https://archive-it.org/collections/14068.

Some seed URLs have changed since the initial information was released, and we actively monitor those changes so that we can update the seeds. As a result, the collection may contain some duplication, and some retired seeds have been replaced by current seeds.

Submit to the VT Web Archive

The Virginia Tech Web Archive continues to expand so that it fully represents Virginia Tech's web presence. If you have a website you would like added, please submit it to the Digital Preservation Coordinator.

Websites that are in scope include:

  • Departments, colleges, institutes, centers, and other groups affiliated with Virginia Tech
  • Digital Humanities and Digital Scholarship projects
  • Course websites outside of Canvas
  • Theses with website components
  • Personal websites of Faculty and Staff containing work related to Virginia Tech
  • Grant project websites
  • Social media from departments, groups, etc. related to Virginia Tech

Submit your website HERE.

If you have a website that is designed to be temporary, you may also add it to the web archive. You can also request a one-time crawl of the website and receive a WARC file. Please email Alex Kinnaman (alexk93@vt.edu) for additional information.

Web Crawling Explained