History: Historical newspapers & transcribed news broadcasts

US historical news media

Thematic/specialized historical newspaper collections

Historical news sources outside the US

About text mining

The availability of news content for text mining and other computational analysis is complicated and varies by source.

In general:

  • The library's vendors of historical archives generally permit text mining of those collections we have purchased.  Talk to us before you contact the vendor, and allow time for negotiation before your research clock starts.  Costs and contractual obligations vary.  Assume that brute-force extraction from our vendors violates university policy, our contracts, and assorted laws.
  • Open-access collections of out-of-copyright publications may be amenable to text mining, though their providers may not offer you much technical support.  Virginia Tech researchers have a long track record with Chronicling America, for example.
  • Providers of current news (e.g., Factiva) usually do not permit text mining and the like.  They rarely own the copyrights to the news they aggregate.
  • News publishers set their own policies on computational access to their content.  Check each paper's website for information about API access.  The New York Times is very generous; the Washington Post, on the other hand, is very restrictive.
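
For readers new to text mining, here is a minimal sketch of the kind of analysis involved once you have lawfully obtained plain-text (OCR) pages from a collection that permits it.  It counts how often a word appears per publication year.  The sample pages, the function name `term_counts_by_year`, and the (year, text) input shape are all hypothetical illustrations, not the format of any particular archive or vendor.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (publication year, OCR text) pairs.
# Real input would come from files you are licensed or permitted to mine.
SAMPLE_PAGES = [
    (1918, "Influenza closes schools. Influenza cases rise in the city."),
    (1918, "The war news from France."),
    (1919, "Influenza cases decline; schools reopen."),
]

# Very simple tokenizer: runs of ASCII letters only (an assumption;
# real OCR text usually needs more careful cleanup).
TOKEN_RE = re.compile(r"[a-z]+")


def term_counts_by_year(pages, term):
    """Count case-insensitive whole-word occurrences of `term` per year."""
    term = term.lower()
    counts = defaultdict(int)
    for year, text in pages:
        tokens = TOKEN_RE.findall(text.lower())
        counts[year] += tokens.count(term)
    return dict(counts)


if __name__ == "__main__":
    print(term_counts_by_year(SAMPLE_PAGES, "influenza"))
```

A per-year count like this is only a starting point; OCR errors in historical newspapers can noticeably distort simple word counts.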


Directories of online news archives, mostly free, from around the world

Recent news sources with some historical coverage

Most recent news collections start coverage in the late 1980s or later.  Typically, they do not include the photos, graphics, other illustrations, or advertising that appeared in the original print edition, on microfilm, or in archival digital facsimiles.