Text and Data Mining: TDM: Sources
Policies, practices, tools, and current issues related to TDM.
Can I Mine This Source?
Library Databases:
Most of the libraries' databases do not allow text or data mining due to license agreements. We will continue to work with database vendors to include TDM in future license agreements. See the list below for vendors that allow TDM.
Project Support: For TDM projects using content from a library database, or other content sources where you are uncertain of rights or permissions, please contact us and we can investigate options with you.
Other Sources:
There are a variety of content sources that allow text and data mining openly or within stated restrictions. Resources listed in these sections below are selected examples. For assistance in identifying additional sources, please contact us.
Open Access Publishers and TDM
- Hindawi: The entire Hindawi corpus, in XML and updated daily, is available for download.
- MDPI: Articles from 160 journals can be downloaded in bulk from the Swiss open access publisher MDPI. It also offers chemical structures for download in an SDF file.
- PLOS: All of PLOS download: PLOS offers all of its content published through October 2016 in downloadable form. The file includes full XML but not data or images. Also see the Twitter hashtag #allofplos.
- PLOS: ALM API: An API for Article Level Metrics (ALM) of PLOS articles (views, readers, likes, comments).
- PLOS: API: The entire PLOS corpus of research articles, searchable by Solr query.
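As a rough illustration of what a Solr query against the PLOS corpus can look like, the sketch below only builds a request URL and does not contact any server. The endpoint address, the field names (`id`, `title`), and the helper name `build_plos_query` are assumptions made for this example; check the current PLOS API documentation for the exact endpoint, available fields, and usage limits before sending requests.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; confirm it in the PLOS API documentation.
PLOS_SEARCH_URL = "https://api.plos.org/search"


def build_plos_query(query, fields=("id", "title"), rows=10, start=0):
    """Build a Solr-style search URL (hypothetical helper, not a PLOS tool).

    query  -- a Solr query string, e.g. 'title:"text mining"'
    fields -- names of the fields to return (assumed field names)
    rows   -- number of results per page
    start  -- offset of the first result, used for paging
    """
    params = {
        "q": query,
        "fl": ",".join(fields),
        "rows": rows,
        "start": start,
        "wt": "json",  # standard Solr parameter asking for a JSON response
    }
    # urlencode percent-escapes the query so it is safe to place in a URL.
    return f"{PLOS_SEARCH_URL}?{urlencode(params)}"


url = build_plos_query('title:"text mining"', rows=5)
print(url)
```

Paging through a large result set would mean calling the helper repeatedly with an increasing `start` value and pausing between requests.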
Open Corpus sources for TDM
Subscribed sources for TDM
Source | Content | Fee | Access Details | Help Guides |
---|---|---|---|---|
Adam Matthew | Any Adam Matthew primary source database we subscribe to | No | Contact your college librarian | Adam Matthew guide |
Brill | Subscribed and open access journals and ebooks | | Contact your college librarian | |
Elsevier, ScienceDirect, and Scopus | Subscribed and open access content on ScienceDirect, bibliographic citations in Scopus | No | Request an API key, then programmatically access the data | |
Gale | Primary source documents: historical news, books, journals and magazines | Yes, if not through Artemis | Content delivered on a hard drive or through Gale Artemis: Primary Sources | Gale guide |
HathiTrust | Complete set of digitized books and journals | No | | HathiTrust guide |
IEEE | All online content | No | Contact your college librarian; requires notifying IEEE | |
JSTOR | Licensed and open access journals and books, up to 25,000 articles at a time | No | Data for Research | Data for Research guide |
PubMed | Bibliographic citations in XML | No | FTP baseline citations and incremental updates | PubMed info |
SAGE | Subscribed and open access journals | No | Restrictions on speed of downloads | SAGE guide |
Springer Nature | Subscribed and open access journals and books | Sometimes | Substantial downloads should be made through their API, which has a cost involved | Springer Nature guide |
Wiley | Subscribed and open access journals and books | No | Use CrossRef Text and Data Mining Service | Wiley info |
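For sources in the table that deliver bibliographic citations as XML, such as PubMed, a first processing step is usually to pull a few fields out of each record. The sketch below is a minimal example using only the Python standard library. The element names (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`) reflect the PubMed citation format as commonly documented, but treat them as assumptions and verify them against the files you download. The sample record and the function name `extract_citations` are placeholders invented for this illustration, not real data.

```python
import xml.etree.ElementTree as ET

# Placeholder record invented for illustration; not a real citation.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">0000001</PMID>
      <Article>
        <Journal><Title>Placeholder Journal</Title></Journal>
        <ArticleTitle>Placeholder article title</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_citations(xml_text):
    """Return one dict per PubmedArticle element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        records.append({
            # findtext returns None when an element is missing.
            "pmid": article.findtext("MedlineCitation/PMID"),
            "title": article.findtext("MedlineCitation/Article/ArticleTitle"),
            "journal": article.findtext("MedlineCitation/Article/Journal/Title"),
        })
    return records


citations = extract_citations(SAMPLE)
print(citations)
```

Real files are large, so a streaming parser such as `xml.etree.ElementTree.iterparse` may be a better fit than loading a whole file at once. Titles that contain inline markup would also need more careful handling than `findtext` provides, since it returns only the text before the first child element.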