Web Archiving: How to Use the VT Web Archive
The Virginia Tech Web Archive archives the web presence of Virginia Tech. It largely includes pages for Colleges, Departments, Centers, and Institutes at Virginia Tech, as well as Digital Humanities and Digital Scholarship projects, Creative Technologies Theses, and select social media.
Things to Note
Some things to note about the VT Web Archive:
- This web archive crawls four levels deep. Starting from the original website, links that extend up to four pages into the website will be crawled.
- We respect robots.txt exclusions. Some sites have a robots.txt file that does not allow anything to crawl them, so those sites may not be captured (see the Web Archiving at Virginia Tech tab for additional information).
- Social media is selective. Social media takes up a significant amount of storage due to the constant addition of new links, so only select social media is archived.
- We are always looking to expand. Please submit a request to add a website, or to add information to an existing website, HERE.
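To make the crawl depth and robots.txt notes above more concrete, here is a minimal sketch in Python. It is an illustration only, not the crawler the VT Web Archive actually runs: the link graph `LINKS`, the `ROBOTS_TXT` rules, and the `crawl` function are all hypothetical, and "four levels" is assumed to mean four link hops from the starting page.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical link graph standing in for a real website: page -> pages it links to.
LINKS = {
    "/": ["/about", "/research"],
    "/about": ["/about/staff"],
    "/research": ["/research/labs", "/private/drafts"],
    "/about/staff": ["/about/staff/alice"],
    "/about/staff/alice": ["/about/staff/alice/cv"],
    "/about/staff/alice/cv": ["/about/staff/alice/cv/appendix"],
}

# Hypothetical robots.txt that blocks one directory for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(seed, max_depth=4):
    """Breadth-first crawl from `seed`, following links at most `max_depth` hops.

    Returns a dict mapping each captured page to its depth from the seed.
    """
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())

    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in LINKS.get(page, []):
            if link in seen:
                continue  # already captured
            if not robots.can_fetch("*", link):
                continue  # robots.txt disallows this path
            seen[link] = depth + 1
            queue.append(link)
    return seen


if __name__ == "__main__":
    for page, depth in crawl("/").items():
        print(depth, page)
```

In this toy site, the page five links away from the start and the page under `/private/` are both left out of the capture, which is why a site's deepest pages or a site with a restrictive robots.txt may be missing from the archive.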
Website URLs often break, are discontinued, or are redirected. You will see in the VT Web Archive that there are three primary types of seeds:
- Seeds that are original and have not been redirected.
- Seeds that are no longer accessible but show a last crawled date. An example is the Center for Applied Technologies in the Humanities, which previously hosted several Digital Humanities sites but is no longer accessible.
- Seeds that have been redirected. Specific examples include:
  - The School of Plant and Environmental Sciences absorbed several other departments, such as the Department of Plant Pathology, Physiology, and Weed Science, whose sites were then redirected to the School of Plant and Environmental Sciences.
Archive-It allows for in-depth searching of sites and pages within those sites. Users may:
- Filter by Group, Keyword, Creator, or Publisher
- Find select sites by filtering on other fields such as Date, Language, and Redirected To
- Search titles of websites
- Search page text of websites
Submit to the Virginia Tech Web Archive
The Virginia Tech Web Archive is always expanding to fully represent Virginia Tech's web presence. If you have a website you would like to be added, please submit it to the Digital Preservation Coordinator.
Websites that are in scope include:
- Departments, colleges, institutes, centers, and other groups affiliated with Virginia Tech
- Digital Humanities and Digital Scholarship projects
- Course websites outside of Canvas
- Theses with website components
- Personal websites of Faculty and Staff containing work related to Virginia Tech
- Grant project websites
- Social media from departments, groups, etc. related to Virginia Tech
Other additions to existing sites that can be requested include:
- Additional subjects (keywords)
- Inclusion in a new or existing group
- Specific metadata desired for better searching/filtering
Submit your website HERE.
If you have a website that is designed to be temporary, you may also add it to the web archive. You can also request a one-time crawl of the website and receive a WARC file. Please email Alex Kinnaman (email@example.com) for additional information.
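If you receive a WARC file, it may help to know roughly what is inside it. A WARC file is a sequence of records, each with a version line, a block of named headers, and a payload whose size is given by the `Content-Length` header. The sketch below, in Python, reads records from an uncompressed WARC stream. It is a simplified illustration under stated assumptions: `iter_warc_records` and `make_record` are hypothetical helper names, the sample records omit fields that real WARC records carry (such as `WARC-Record-ID` and `WARC-Date`), and a file delivered as `.warc.gz` would first need to be opened with `gzip.open`. For real work, a maintained library such as warcio is likely the more robust choice.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)

        # Read "Name: value" header lines until the blank line that ends them.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(record_type, uri, body):
    """Build a simplified sample record (real records carry more headers)."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {record_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two made-up records standing in for a real crawl.
SAMPLE = (
    make_record("resource", "https://example.org/", b"<h1>Hello</h1>")
    + make_record("resource", "https://example.org/about", b"About page")
)

if __name__ == "__main__":
    for headers, payload in iter_warc_records(io.BytesIO(SAMPLE)):
        print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```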