Web archiving is quickly becoming a popular practice. This page provides a non-exhaustive list of resources on web archiving and web crawling.
Library of Congress Guide to Creating Preservable Websites: Creating preservable websites increases how effectively and comprehensively those websites can be archived.
API (application programming interface): a set of routines, protocols, and tools for building software applications that specifies how software components should interact
Digital preservation: the series of managed activities necessary to ensure continued access to digital materials for as long as necessary
robots.txt: a.k.a. the Robots Exclusion Protocol; a standard that tells web crawlers which parts of a website should and should not be crawled
Seed: a URL from which a web crawler starts its crawl
Web archive: a collection of pages from the World Wide Web
Web crawler (a.k.a. spider): software that automatically and systematically browses the web and captures snapshots of web pages
WARC (Web ARChive file format): a file format, derived from the ARC (ARChival file format), that stores harvested web pages and allows more metadata to be captured than the ARC format does
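To make the robots.txt and seed entries above more concrete, here is a minimal sketch of how a crawler might check a list of seeds against a site's robots.txt before crawling. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the seed URLs, the `example-archiver` user agent, and the `allowed_seeds` helper are all made up for illustration and do not come from any real site or archiving tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_seeds(robots_txt, seeds, user_agent="example-archiver"):
    """Return the seed URLs that robots_txt permits user_agent to crawl."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in seeds if parser.can_fetch(user_agent, url)]


# Hypothetical seeds: one public page and one under the disallowed path.
seeds = [
    "https://example.org/",
    "https://example.org/private/report.html",
]

print(allowed_seeds(ROBOTS_TXT, seeds))
```

In this sketch only the first seed passes the check, because the second falls under the `Disallow: /private/` rule. A real crawler would normally fetch robots.txt from the site itself (for example with `RobotFileParser.set_url` and `read`) rather than use a hard-coded string.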