Text and Data Mining: TDM: Copyright & Licensing

Policies, practices, tools, and current issues related to TDM.

Legal considerations

Mapping the Text and Data Mining Source Landscape for Access and Use

There is a spectrum of variables to consider for legal access to and use of TDM sources, ranging all the way from "Here are little fluffy bunnies" to "HERE THERE BE DRAGONS". One of the largest is copyright: do you have permission to do what you want to do with the resource? On the maps of old, where map makers didn't know what existed, they marked "Here there be Dragons" to indicate uncertainty and potential danger. Many aspects of TDM remain uncertain while the landscape is still being explored, but subject librarians are available to guide you on your journey. One caveat: please be a good digital citizen. If you don't follow the rules for accessing the resources, VT as a whole can be denied access. If you have questions, feel free to contact your librarian.

N.B.: As Virginia Tech is a land-grant university based in the US, this guide assumes that US law applies.

Image credits: Left, Bunnies: Creative Commons Zero (CC0), Max Pixel. Middle, Traveller: CC0. Right, Dragon: CC0.

Suggestions on how to start

We can't tell you what to do - but we can suggest what to consider!

Copyrights: Do you own the copyright for the source content? If not, is it Open Access or under a Creative Commons license?

  • Some resources are fully open and include language, or even Creative Commons licenses, indicating that all TDM activity is approved. This is the "here be fluffy bunnies" area.

Contract Requirements / Licensing Agreements: Did the Library (or someone) pay for access/have a contract for access?

  • Some resources are restricted by the contract that was signed to allow access to them. "Here there be dragons," because you might violate the contract terms if you do not investigate them first.

Fair Use: For content published in the United States, Fair Use may apply to the use of sources for text and data mining. This is particularly relevant for content that is not fully open access and for which you are not the copyright holder. However, contractual and licensing agreements can take precedence over standard U.S. copyright law and may prohibit even actions that might otherwise fall under 'fair use.' Find out more about Fair Use and the factors to consider.

API/Downloading constraints: Are there specific permissions or prohibitions built into the allowable methods for access? If so, are there other ways to access the content that are permissible and will allow you to get what you need?
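
As one illustration of checking access constraints before downloading, the sketch below reads a site's robots.txt rules with Python's standard urllib.robotparser module. The rules, URLs, and the "my-tdm-project" user agent are invented for this example. A robots.txt file is only one technical signal: it does not replace the license or contract terms for a resource, so treat this as a minimal sketch rather than a compliance check.

```python
from urllib import robotparser

# Hypothetical robots.txt content, made up for this example.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /fulltext/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been retrieved."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that the robots.txt rules permit for this agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent, default=1.0):
    """Seconds to wait between requests; the default is an assumption."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


parser = build_parser(EXAMPLE_ROBOTS)
candidates = [
    "https://example.org/abstracts/1",
    "https://example.org/fulltext/1",
]
permitted = allowed_urls(parser, "my-tdm-project", candidates)
delay = polite_delay(parser, "my-tdm-project")
print(permitted, delay)
```

If a provider offers a dedicated TDM API or bulk-download option, that route is usually the one to ask your librarian about first, since it reflects what the provider has explicitly agreed to.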

Source Data Display (whether and how to display the source and/or results of your data analysis): If you use TDM methods to analyze copyrighted material, you may not be able to display all, or even very much, of the source data, but you can display your analysis and summary conclusions. Reproducing individual images or sections of text in research papers may require permission unless a separate fair use case can be made for each. An image may be considered use of the 'whole work', and a large section of text may be too large a portion of the work (amount is one factor in fair use). However, such uses may also be okay, so talk with a librarian for guidance. (Copyright Consultation contact)
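
To make the distinction between source data and analysis concrete, here is a minimal sketch that reports aggregate term counts instead of reproducing the text itself. The sample sentence and the term_frequencies helper are made up for illustration. Whether a particular output is acceptable to publish still depends on the license and on fair use, so this shows the idea only.

```python
import re
from collections import Counter


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Invented sample text standing in for a copyrighted source.
sample = "the dragon guards the map and the bunnies read the map"
summary = term_frequencies(sample, top_n=2)
print(summary)
```

The printed summary contains counts derived from the text, not the passage itself, which is the kind of result the paragraph above describes as displayable analysis.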