
Web Archiving: How to Use the VT Web Archive

The Virginia Tech Web Archive archives the web presence of Virginia Tech. It largely includes pages for Colleges, Departments, Centers, and Institutes at Virginia Tech, as well as Digital Humanities and Digital Scholarship projects, Creative Technologies Theses, and select social media.

Things to Note

Some things to note about the VT Web Archive:

  • This web archive crawls 4 levels. This means that, starting from the original website, links that extend up to four pages into the website will be crawled.
  • We respect robots.txt exclusions. Some sites have a robots.txt file that may not allow anything to crawl them (see the Web Archiving at Virginia Tech tab for additional information).
  • Social media is selective. Social media takes up a significant amount of storage due to the constant addition of new links, so only select accounts are archived.
  • We are always looking to expand. Please submit a request to add a website, or to add information to an existing website, HERE.
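
To make the first two notes concrete, here is a minimal Python sketch. It is an illustration only and not how the crawler behind the VT Web Archive is implemented. It assumes the seed page counts as level 0 and that "4 levels" means following links up to four hops from the seed. The function names and the small link graph are hypothetical, and the robots.txt check uses Python's standard urllib.robotparser module.

```python
from collections import deque
from urllib.robotparser import RobotFileParser


def crawl_to_depth(seed, links, max_depth=4):
    """Return {url: depth} for every page within max_depth link hops of the seed.

    `links` is a hypothetical in-memory map of page -> pages it links to,
    standing in for the live web.
    """
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical site where each page links to the next one.
site = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}
reachable = crawl_to_depth("a", site)  # page "f" is five hops away and is skipped

rules = "User-agent: *\nDisallow: /private/"
blocked = not allowed_by_robots(rules, "https://example.org/private/page")
```

Under these assumptions, a page that can only be reached by a fifth link from the seed is never visited, and any URL that the site's robots.txt disallows is left out of the crawl.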

Redirections

Website URLs often break, are discontinued, or are redirected. In the VT Web Archive you will see three primary types of seeds.

  1. Seeds that are original and have not been redirected.
  2. Seeds that are no longer accessible but show a last crawled date. An example is the Center for Applied Technologies in the Humanities, which previously hosted several Digital Humanities sites but is no longer accessible.
  3. Seeds that have been redirected. Specific examples include:
       1. The College of Liberal Arts and Human Sciences and all of its child pages, redirected from http://www.clahs.vt.edu/ to https://liberalarts.vt.edu
       2. The School of Plant and Environmental Sciences, which absorbed several other departments, such as the Department of Plant Pathology, Physiology, and Weed Science, whose sites were then redirected to the School of Plant and Environmental Sciences
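
The three seed types above can be sketched as a small classification, shown here in Python. This is only an illustration: the `Seed` record and its `is_live` and `redirected_to` fields are hypothetical and do not reflect how Archive-It stores seed metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seed:
    url: str
    is_live: bool                        # hypothetical: does the URL still respond?
    redirected_to: Optional[str] = None  # hypothetical: the new URL, if any


def seed_status(seed: Seed) -> str:
    """Classify a seed as 'original', 'inactive', or 'redirected'."""
    if seed.redirected_to:
        return "redirected"
    if not seed.is_live:
        return "inactive"  # kept in the archive with its last crawled date
    return "original"


clahs = Seed(
    url="http://www.clahs.vt.edu/",
    is_live=True,
    redirected_to="https://liberalarts.vt.edu",
)
status = seed_status(clahs)
```

Checking for a redirect first means a seed that has moved is reported as redirected even if the old address no longer responds on its own.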


Searching

Archive-It allows for in-depth searching of sites and pages within those sites. Users may:

  • Filter by Group, Keyword, Creator, Publisher
  • Filter select sites by other fields such as Date, Language, Redirected To, and others
  • Search titles of websites
  • Search page text of websites

Submit to the Virginia Tech Web Archive

The Virginia Tech Web Archive is always expanding to fully represent Virginia Tech's web presence. If you have a website you would like to be added, please submit it to the Digital Preservation Coordinator.

Websites that are in scope include:

  • Departments, colleges, institutes, centers, and other groups affiliated with Virginia Tech
  • Digital Humanities and Digital Scholarship projects
  • Course websites outside of Canvas
  • Theses with website components
  • Personal websites of Faculty and Staff containing work related to Virginia Tech
  • Grant project websites
  • Social media from departments, groups, etc. related to Virginia Tech

Other additions to existing sites that can be requested include:

  • Additional subjects (keywords)
  • Inclusion in a new or existing group
  • Specific metadata desired for better searching/filtering

Submit your website HERE.

If you have a website that is designed to be temporary, you may also add it to the web archive. You can also request a one-time crawl of the website and receive a WARC file. Please email Alex Kinnaman (alexk93@vt.edu) for additional information.

Digital Preservation Coordinator

Alex Kinnaman
Contact:
Newman Library, Room 4062
540-231-9474

Archive-It Groups

Groups are not subjects and are designed simply to help group websites together by theme. Currently the groups included in the VT Web Archive are: