Please enjoy this blog post written by Emily McGinn.

Open Access is at the heart of the Digital Humanities (DH). Freely available and unfettered data is key to being able to gather, curate, and analyze information about questions and research in the humanities.[1] However, finding the data is only the beginning of the project.

Data is a fraught category for humanists. Using and sharing data need not mean the “sciencing of the humanities.” Instead, we can think of humanities objects of study, from literature to art to dance, as structured information that can be studied in a new way.

Within the DH community, there is a deep ethos of collaborating and sharing. Therefore, it is important to give back to the community as a part of any public-facing project. This sharing can include:

  1. Making your data accessible and usable to others.
  2. Offering details regarding how the data was used or cleaned.
  3. Adding a Creative Commons license so future researchers know how they can reuse your work.
  4. Clearly crediting everyone who worked on the project.

Even when the data is openly accessible, the ability to reuse it depends on good documentation and the comprehensibility of the data. A thousand tiny decisions go into every DH project. Each one might seem trivial while you are working on the project, but without documentation it will be incredibly difficult to remember them all in the future.

A white paper or a README file can keep track of the project’s goals and the decisions that went into cleaning and analyzing the data. This document benefits both the current researcher, who can record their process, and a future researcher, who needs to understand what the original researcher has already done and what research question they were answering. It can serve as a map that lets the future researcher follow the original researcher’s train of thought.
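
As a rough illustration, a README of this kind could be assembled with a short script so that it is easy to update as decisions accumulate. The Python sketch below shows only one possible layout. The function name `build_readme`, the section headings, and all of the sample project details are hypothetical and are not drawn from any particular DH project.

```python
from datetime import date


def build_readme(title, research_question, decisions, license_name, contributors):
    """Return README text recording goals, data decisions, license, and credits.

    This is an illustrative layout only; adapt the sections to your project.
    """
    lines = [
        title,
        "=" * len(title),
        "",
        f"Last updated: {date.today().isoformat()}",
        "",
        "Research question",
        "-----------------",
        research_question,
        "",
        "Data cleaning and analysis decisions",
        "------------------------------------",
    ]
    # Number each decision so later readers can refer to it unambiguously.
    lines += [f"{i}. {text}" for i, text in enumerate(decisions, start=1)]
    lines += ["", "License", "-------", license_name, "", "Credits", "-------"]
    # One line per contributor, pairing a name with the role they played.
    lines += [f"- {name}: {role}" for name, role in contributors]
    return "\n".join(lines) + "\n"


# Hypothetical sample values, invented for this example.
readme = build_readme(
    title="Sample Correspondence Network",
    research_question="Who exchanged letters with whom, and how often?",
    decisions=[
        "Normalized variant spellings of names to a single form.",
        "Excluded letters with no legible date.",
    ],
    license_name="CC BY-NC 4.0",
    contributors=[("A. Researcher", "project lead"), ("B. Student", "data cleaning")],
)
print(readme)
```

Saving the returned text as a file alongside the dataset keeps the documentation and the data together. Listing each decision as its own numbered entry is a design choice meant to make small steps easy to add while the work is still fresh.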

Adding a Creative Commons license to a project, site, dataset, or document can provide further insight into how a future researcher should reuse that work. There are several types of Creative Commons licenses; the most common in DH is CC BY-NC (Attribution-NonCommercial). This license lets others remix and adapt the work as long as the original creator is credited and the work is not used for commercial purposes.

Even without a Creative Commons license, a DH project can include the preferred guidelines for reusing the data and research from the project. The Colored Conventions Project (CCP) is an exemplar of providing a use statement. Before downloading the CCP corpus, the user is asked to abide by their principles:

  • “I honor CCP’s commitment to a use of data that humanizes and acknowledges the Black people whose collective organizational histories are assembled here. Although the subjects of datasets are often reduced to abstract data points, I will contextualize and narrate the conditions of the people who appear as “data” and to name them when possible.
  • I will include the above language in my first citation of any data I pull/use from the CCP Corpus.
  • I will be sensitive to a standard use of language that again reduces 19th-century Black people to being objects. Words like “item” and “object,” standard in digital humanities and data collection, fall into this category.
  • I will acknowledge that Colored Conventions were produced through collectives rather than by the work of singular figures or events.
  • I will fully attribute the Colored Conventions Project for corpora content.”

(Introduction to the CCP Corpus)

While the CCP provides open access to their data, they do ask that a researcher maintain the ethos that informs their work and provide a citation back to the CCP.

Finally, in keeping with the spirit of collaboration, a public-facing DH project should give credit to all those who have worked on the project. Too often the humanities are thought of as a solitary endeavor where monographs and research are produced by a singular genius. DH requires many hands and a variety of expertise to effectively execute a project. One of the most comprehensive credits pages is from Belfast Group Poetry|Networks out of Emory University, which includes the contributors, the architects, the software used, and the research produced. A credits page does not have to be as extensive as this one, but it should give credit to those who had a hand in the project and be transparent about the work that went into its creation.

The continued existence of open data depends on the community. Shared goodwill and a dedication to providing one’s own data at the completion of a project are key to sustaining a robust community. Generosity towards future researchers and clear explanations of what research has already been done allow the data to find a new home and become the foundation for future work.


[1] Humanities-specific data can be hard to find (another reason to archive and publish your own). For this reason, Data Services has created a list of humanities datasets. You can also explore the data produced by other Johns Hopkins researchers in JHU’s Research Data Repository.

For guidance on making your own data accessible to future researchers, see Data Services’ article on Archiving Data. If there is an article or project that could be productive for your own work, but the data is not present or not in a form you can use, reach out to the author directly to ask for use of their data or to establish a deeper collaboration.