Ways to Share Research Data

Many funders and some publishers are placing increasing importance on the sharing of research data, and this push for greater access to research data is expected to grow (see OSTP memo). As a consultant with JHU Data Management Services, I am often asked by scientists how they can share their research data. The diagram below illustrates how research data can be shared through a variety of mechanisms, but each solution has pros and cons that researchers need to consider.


Ease of Access

Researchers need to consider how public they want to make their data. What are your funder’s or publisher’s expectations for sharing research data? How much time are you willing to devote to responding to requests for data and then responding to follow-up questions about your data? How do others in your research community share data? Please note that data with any legal/ethical restrictions on it, such as confidentiality, security, intellectual property, and privacy concerns, should not be shared.

Providing data through peer-to-peer correspondence allows scientists to retain control over who is using it and doesn’t require upfront preparation for sharing as a website or data archive would; however, the onus for finding, sending, and explaining the data remains with the scientist.

Persistence of Data

For persistence of data, researchers need to consider their ability to preserve and understand their digital research data in the future. Can you maintain multiple copies of your data? Can you ensure that, in the future, your files can be opened and are not corrupted? Will you be able to find and understand your data in three or more years?

The more time that has elapsed between when data are generated and when data are requested, the greater the probability that 1) technological problems will have occurred, such as loss of file integrity or obsolescence of media, software, hardware, or format, and 2) your ability to find and understand your data will have diminished. File sharing services and repositories provide the technological infrastructure for preserving data. In addition, because data in repositories can be better organized, documented, and cited, it is easier for others to find and understand data without having to contact the scientist who generated it.

Data Repositories

Data repositories, digital systems that actively manage data, provide the most robust access and persistence services. Repositories differ in their capabilities, but most include the following to varying degrees:

  • Providing a web-accessible interface for discovering and downloading research data collections.
  • Managing the preservation of digital objects, for example through file integrity checking and redundant offsite backups.
  • Using identifiers, such as DOIs (digital object identifiers), to give datasets persistent location links and citations, which are more stable than website URLs.
  • Describing projects and files, and including documentation sufficient for others to use the collection without contacting the researcher.
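File integrity checking generally means recording a checksum when a file is deposited and recomputing it later; if the two values differ, the file has changed or been corrupted. The Python sketch below illustrates that idea with SHA-256 from the standard library's `hashlib`. The file name and the choice of SHA-256 are assumptions for illustration only; individual repositories may use other algorithms and their own workflows.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """True if the file still matches the checksum recorded when it was deposited."""
    return sha256_of(path) == recorded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "measurements.csv"  # hypothetical dataset file
        data_file.write_text("sample,value\nA,1.0\n")
        recorded = sha256_of(data_file)  # stored alongside the file at deposit time

        print(verify_fixity(data_file, recorded))  # True: file unchanged
        data_file.write_text("sample,value\nA,9.9\n")  # simulate silent corruption
        print(verify_fixity(data_file, recorded))  # False: contents changed
```

In practice the recorded checksums would typically be kept apart from the files themselves, and the check rerun on a schedule and after each copy to backup storage.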

Search for repositories in your field on the Re3data website, or contact us for assistance in locating a suitable data repository.

In addition, while some academic disciplines have established research data repositories, many fields of research do not have easily available options for archiving and online access. Johns Hopkins researchers may deposit their research data into the JHU Data Archive. If you are interested in archiving your data here, please contact us at datamanagement@jhu.edu to discuss your research and data access needs.

A Paper Database Part 2: Decoding the Key to the Roland Park Records

Part of a monthly series of posts highlighting uncovered items of note, and the archival process brought to bear on these items, as we preserve, arrange, and describe the Roland Park Company Archives.

This is Part 2 of the two-part post titled A Paper Database. Be sure to read Part 1 here!

So now it’s time for a post that offers some very real and important insight into the Roland Park Company Records, and it’s information that I hope will aid researchers when they come to use the collection at the Sheridan Libraries. It’s also probably going to be a pretty long blog post, but you might thank me one day.

So let’s get straight to it: the vast majority of the RPC Records are correspondence files, which make up around 200 boxes of material. Normally, correspondence in archival collections is organized alphabetically, chronologically, or by a combination of the two. For example, letters might be organized by year and then by correspondent, making the arrangement both chronological and alphabetical.

Well, if there’s one thing I can tell you that is simple and clear, it’s that the clerks at the Roland Park Company definitely organized things chronologically. That’s the good news! The bad news is that within each year they used a system that you probably haven't seen before (with nuances that even I hadn't seen before!).

Did you read my very first blog post? In it, I gave the example of one box that was described as both “Letters 662-970” and “General correspondence A-Z.” Well, A-Z makes sense, but what did 662-970 mean and how can it be the same thing?

Well, it turns out that the clerks used a numerical filing system, in which each topic or correspondent is assigned a number, and a concordance (a list of the numbers and what they stand for, much like an index) records those assignments. So, say you filed all your credit cards under the number 5, all your tax returns under 10, and all letters from your cousin under 432. This doesn’t make sense on a personal scale, but it does make sense when a huge company corresponds with over 1,000 entities. So somewhere in the Roland Park office there was a list of almost 1,000 numbers, and each number represented a correspondent.
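In programming terms, a concordance works like a lookup table from numbers to subjects. The Python sketch below reuses the made-up personal examples from the previous paragraph (they are not Roland Park Company entries) and shows what happens when a number, such as 662, has no recorded meaning.

```python
from typing import Dict, Optional

# Hypothetical concordance built from the personal-scale example above;
# these are not Roland Park Company entries.
concordance: Dict[int, str] = {
    5: "credit cards",
    10: "tax returns",
    432: "letters from your cousin",
}


def subject_of(file_number: int) -> Optional[str]:
    """Return what a file number stands for, or None if the concordance has no entry."""
    return concordance.get(file_number)


print(subject_of(432))  # letters from your cousin
print(subject_of(662))  # None: the number is on the folder, but its meaning is unknown
```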

Here is an original finding aid for the collection (archivist nerd joy!), used by the clerks to find correspondence files in filing cabinets in their basement. You see that they started using this numerical system in 1902.

This is a single page from a document titled “Index of Books, Letters, Vouchers, etc. filed in the Basement of 4810 Roland Avenue,” written circa 1933.


The hand-written note at the top of this document instructs the clerk to file this under number 882. Tellingly, the second hand-written note at the bottom shows an insistence on proper filing, which is good for us!

In Part 1 of this post I talked about the importance of keys in relational databases; in other words, the link between two kinds of information. In this case, there is [a numbered document], shown above, and an unknown [topic or correspondent]. The link/key between them is the number [882].

Are you ready for the really bad news? There is no key!! For whatever reason, the list of 1,000 numbers and what they mean did not survive, and so huge parts of the Roland Park Records are numbered, but we don’t know what those numbers mean! This paper database does not have a key! As a Real Life Information Professional, my suggestion is to panic!

So of course, some hope does exist. In trying to figure out this system I started a spreadsheet that attempts to re-create or de-code the numerical key. I didn’t come close to finishing, but more importantly I determined that the system is consistent and helpful once you figure it out.

This is a selection of samples from my concordance. Here are some important observations:

  • #68: The number stayed the same even when the company changed names, and was consistent for 30 years!
  • #84: The topic changed, but once it did, it was stable for over 15 years.
  • #454: The number again stayed the same even when the company changed names.
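If you are building a reconstruction spreadsheet of your own, a few lines of code can summarize how stable each number is. This Python sketch assumes the spreadsheet has been exported as rows of (year, file number, name on folder); the rows below are invented placeholders shaped like the #68 and #84 observations, not actual data from the collection.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical export of a reconstruction spreadsheet: (year, file number, name on folder).
# Placeholder names only; not actual Roland Park Company correspondents.
rows: List[Tuple[int, int, str]] = [
    (1905, 68, "Company A"),
    (1920, 68, "Company A (new name)"),
    (1910, 84, "Topic X"),
    (1915, 84, "Topic Y"),
    (1930, 84, "Topic Y"),
]


def history_by_number(
    rows: List[Tuple[int, int, str]],
) -> Dict[int, Dict[str, Tuple[int, int]]]:
    """For each file number, map every name seen under it to (first year, last year)."""
    history: Dict[int, Dict[str, Tuple[int, int]]] = defaultdict(dict)
    for year, number, name in sorted(rows):
        first, last = history[number].get(name, (year, year))
        history[number][name] = (min(first, year), max(last, year))
    return dict(history)


for number, names in sorted(history_by_number(rows).items()):
    # More than one name under a number flags a rename or a change of topic to check by hand.
    print(number, names)
```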

So there remains one unanswered question: how can “Letters 662-970” and “General correspondence A-Z” be the same thing? The answer is that the Roland Park Company clerks filed correspondence first by year (chronologically), then by both number (numerical) and by letter (alphabetical).  It kinda looks like this:

The numbers 1-100, filed A-Z. Then the numbers 101-200, filed A-Z, then 201-300, and so on.
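Read as a sorting rule, that arrangement is: year first, then the block of 100 numbers a file falls in, then alphabetical order within the block. The Python sketch below expresses this as a sort key. It is one interpretation of the description above; the block size of 100 comes from it, and the letters are hypothetical.

```python
from typing import NamedTuple, Tuple


class Letter(NamedTuple):
    year: int
    file_number: int
    correspondent: str


def filing_key(letter: Letter, block_size: int = 100) -> Tuple[int, int, str]:
    """Sort by year, then by block of numbers (1-100 -> 0, 101-200 -> 1, ...), then A-Z."""
    block = (letter.file_number - 1) // block_size
    return (letter.year, block, letter.correspondent.casefold())


# Hypothetical letters, for illustration only.
letters = [
    Letter(1910, 150, "Adams"),
    Letter(1910, 7, "Zimmer"),
    Letter(1910, 42, "Baker"),
    Letter(1909, 500, "Young"),
]

for letter in sorted(letters, key=filing_key):
    print(letter.year, letter.file_number, letter.correspondent)
# 1909 500 Young
# 1910 42 Baker
# 1910 7 Zimmer
# 1910 150 Adams
```

Within a block, Baker (42) is filed before Zimmer (7) because alphabetical order governs there, while Adams (150) lands in the next block even though it would come first alphabetically.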

Confused? Don’t worry, you should be. But, I can promise future researchers that once you get elbow-deep in the files, and read the painfully long explanation of this system provided both here and in the finding aid, you will find that while complex, the numerical system really is dependable and useful.

Doesn’t this just make you appreciate real databases all the more?

A Paper Database Part 1: Understanding Relational Databases

Part of a monthly series of posts highlighting uncovered items of note, and the archival process brought to bear on these items, as we preserve, arrange, and describe the Roland Park Company Archives.

We all know the word “database.” We definitely, definitely know the word database. We know (and often rely on the fact) that our information is in databases: our credit card information, our consumer information, medical records, social security number, phone number, email address(es), driving records, insurance information, purchases, everything. When you call customer service anywhere to do anything, they are usually pulling you up in their database. When you log into a site that stores information on you, like Amazon or your cell phone provider (or every login website ever), the web interface is pulling its data from a database.

So databases are everywhere, and our digital world is one hundred percent reliant on them and the information they store for us. They’re pretty awesome, when they aren’t pretty scary. So why am I mentioning it?

Well, because I want to talk about how they work. One very important part of databases is that most are relational. That means that information stored in two different places (often called tables, within the database) can relate. To relate, they need a key. So hold on, I’ll explain.

If you have a bank account and a credit card with the same bank, you know that when you call you can ask the customer service person about either one. But how, exactly, does the database know that the two accounts are linked? This may seem obvious, but give it a thought. It’s because there’s a piece of information in common between the two, like the fact that you gave your social security number to open both accounts. Your SS# is the unique key that the database uses to know that the Holmes, Sherlock that opened the credit card is the same Holmes, Sherlock that has a bank account. The key (SS#) links them, thus the bank’s database is relational.

The image below is a partial screenshot from a real Access database displaying how the database understands the relationship I just described:
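The same relationship can be sketched in SQL. The Python example below uses the standard library's `sqlite3` module with an in-memory database; the table names, column names, and the obviously fake SSN are illustrative assumptions, not taken from the Access database in the screenshot. The `JOIN ... ON a.ssn = c.ssn` clause is the key doing its work: it is the only thing connecting a row in `accounts` to a row in `customers`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE customers (
        ssn  TEXT PRIMARY KEY,  -- the shared key
        name TEXT NOT NULL
    );
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL REFERENCES customers(ssn),
        kind       TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO customers VALUES (?, ?)", ("000-00-0221", "Holmes, Sherlock"))
conn.executemany(
    "INSERT INTO accounts (ssn, kind) VALUES (?, ?)",
    [("000-00-0221", "bank account"), ("000-00-0221", "credit card")],
)

# Both accounts come back under one customer because they share the same key.
rows = conn.execute(
    """
    SELECT c.name, a.kind
    FROM customers AS c
    JOIN accounts AS a ON a.ssn = c.ssn
    ORDER BY a.kind
    """
).fetchall()
print(rows)  # [('Holmes, Sherlock', 'bank account'), ('Holmes, Sherlock', 'credit card')]
```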

So what does this have to do with the Roland Park Company Records? It has a lot to do with them, and if you think you will ever be interested in using the records, then you most definitely need to read "A Paper Database Part 2: Decoding the Key to the Roland Park Records."

But I’ll leave you with one thought: what if the link in the above graphic disappeared? What happens when there is no key?

Hopkins Retrospective


Did you know there is a Tumblr site devoted to promoting Hopkins history? Leading up to Alumni Weekend on April 11-13, we will be posting photographs with captions commemorating earlier classes, particularly the Classes of 1954, 1959, 1964, 1969, 1974, 1979, 1984, 1989, 1994, 1999, 2004 and 2009. These photos will come from the yearbooks and from the Archives’ photograph collection. But, prior to that, we wanted to give you a taste of our history from earlier years, including our original campus downtown. We will continue using this site after Alumni Weekend to promote our unique history. Take a look, enjoy, and let me know if you have any questions.

Journal Article Impact III: Altmetrics

Now you know how to figure out how often your journal article has been cited. Other nagging questions include: How many people read the article but didn't cite it? And what if your article isn't pure research? What if it's more important to educators, policy makers, clinicians, or other practitioners? They would also read your article but not necessarily cite it. What if your article is picked up by the news media? Or there's a discussion on a blog about it? None of these 'impacts' are included in the scholarly citation count. How can this kind of use be measured or captured?

This is where altmetrics come in. Altmetrics (alternative metrics, get it?) focus on social and news media rather than the scholarly literature. Different services and publishers cover slightly different mixes of blogs, Twitter accounts, news media, and sharing sites like Mendeley, figshare, and GitHub. A good overview of altmetrics is provided by Robin Chin Roemer and Rachel Borchardt. There are several services that will provide some of these numbers for you; a few are listed below.

  • Altmetric offers several commercial products that let you monitor and display how frequently an article has been mentioned in social media or the news. Publishers that use the Altmetric service include Elsevier, BioMed Central, and Nature Publishing Group.
  • ImpactStory is a nonprofit that lets you build a public profile based on your publications. Their data is open source.
  • PLoS is an example of a publisher that provides their own article level metrics. They provide article views, HTML page views, as well as PDF and XML download numbers for each article. Mentions on Wikipedia, Google+, blogs, and Twitter are included, as are links to services which provide the more traditional citation numbers.

And that's the end of this short series on journal article impact. We covered rules of thumb, citations, and altmetrics. If you're interested, there's a Scholarly Metrics guide with more information about other metrics. As always, your librarians are happy to discuss these topics with you.

NEW! Research Clinics!

Have you ever gotten stuck with zero results trying to find articles on a topic on which you know articles exist? Or, stuck with too many articles that are completely unrelated to what you need? Have you felt your brain frying as you search all over the web to find one simple statistic? Ever felt uneasy about a few citations you added? Worse, have you had these or other research related questions way too close to the deadline to feel comfortable asking your professor? Could you use an ear and a little push in order to get started? Would you like a blueprint to follow to the finish?

Research Clinics to the rescue! The library will hold Research Clinics, a sort of triage unit for papers and projects, on Sunday evenings at 7pm throughout the spring semester. This is a no-pressure environment for students to work and get help when they need it. No question is too big or too small. Just show up with what you're working on and what you need help with, and we'll take it from there. Fellow students who have been specially trained in research skills will be on hand to help, along with a librarian or two. There is no formal instruction, only one-on-one research guidance. We can help you narrow your topic to something doable and findable. We can point you in the right direction to find statistics, data, and citation information. Even better, we can help you navigate the library website and databases to get to the right place to find better information. Students are encouraged to stay and work or simply drop in and out with questions.

Research Clinics will be held on the following Sundays at 7pm in the Electronic Resource Center (ERC) in Eisenhower Library:

  • March 9
  • March 23
  • April 6
  • April 20
  • April 27

Web of Science Has Changed

One tool that will tell you how many times an article has been cited, or who cited a particular article, is Web of Science. It has a new platform, so here are a few helpful tips.

You land on the Basic Search page:


  • You can add more rows by clicking "Add Another Field."
  • Use the drop-down boxes on the right to specify what you're searching for -- choose title words, author, etc.


Go to the Cited Reference Search page by clicking the drop-down arrow next to Basic Search.


More tools for you on the top right of the screen:

  1. Sign In -- Register so that you can save searches or export citations to EndNote
  2. Help -- How to save your settings and create alerts
  3. Languages -- Click the word "English" to change language
  4. My Tools -- Choose EndNote, ResearcherID, or Saved Searches and Alerts
  5. Search History -- See what searches you have done during this session
  6. Marked List -- This shows you how many citations are in your "marked list"

Here is a quick-reference page. And here are tutorials about cited reference searching, exporting records, and other topics.

Finally, big news: Google Scholar search results link directly to Web of Science citations, and Web of Science citations link directly to corresponding full text in Google Scholar:   


Google Scholar linking to Web of Science


Web of Science linking to Google Scholar

Ask your librarian if you have more questions.

A Student Exhibition at Homewood

First, there was the field trip to Clifton. Then, there was the classwork. And finally, by the end of Fall 2013, the students of Ms. Authur’s Museums and Society class had researched and prepared the material for their exhibition entitled "A Tale of Two Houses: Homewood, Clifton & Historic Preservation." They then installed maps, photos, and objects (many from Special Collections) and gave an opening gallery talk about their work and findings. Now is the time to visit the Homewood House and view the results of their labors.


The Thompsons

The exhibition introduces visitors to the Carrolls, who owned Homewood House, and the Thompsons, who built Clifton, which eventually became the summer home of Johns Hopkins.

Where are the homes located? Students chose maps from the Sheridan Libraries’ collections to put the story of the two houses in context. The visitor can see how the city grew and eventually surrounded the houses.

The Carrolls

In the back entryway, there are objects such as an old chair, a silver chocolate pot, and a copy of The American Register with a rare signature of Charles Carroll, Jr.

An exciting section of the exhibition displays the power of paint analysis. The original analysis of paint colors during the 1980s restoration showed that the color of the decorative trim was green. The paint analysis completed in the fall, however, produced a very different result: it shows that the colors were Naples Yellow with Prussian Blue trim.


Come, enjoy and learn about the Thompsons, the Carrolls, and Mr. Johns Hopkins and their summer villas. The exhibition will be open until May 25, 2014.

ArcGIS Workshops Resume!

The Sheridan Libraries GIS and Data Services Department is resuming its popular series of workshops, "Getting Acquainted With ArcGIS." We welcome all to attend: the curious, the besieged, the beset, frequent GIS users, and novices, too!

The workshops are being held on Tuesdays from 4:00 to 5:30 p.m. on A Level of the MSE Library. The series begins with "Introduction to ArcGIS" and progresses to more powerful aspects of the Esri software.

You can now download the free software from our department’s “Maps and GIS” Library Guide, and access data there, too.

Our “Data and Statistics” LibGuide has further help with data (“data” and “statistics” are not necessarily the same!) and its sources, file formats, citations, and other resources.

Free workshops? Got ‘em. Free software? Got it. Free individual consultation and one-on-one help? We got that, too! For GIS and maps inquiries, contact Bonni Wittstadt. For help with datasets, contact Jen Darragh. You can always shoot us an email, at GISandData@jhu.edu. Or, stop by and visit!

Happy Birthday, Arthur Schopenhauer!

Happy, happy!

The great German philosopher, Arthur Schopenhauer, was born this day in 1788 in Danzig (now Gdansk, Poland), the son of a wealthy merchant, Heinrich Floris Schopenhauer. Young Arthur was unhappily destined to follow his father into a career in commerce, when in fact he wanted nothing more than to study at the university and become a scholar. Upon the untimely death of his father, thought by some to have been a suicide, Schopenhauer, his mother Johanna, and his sister Adele, were left with funds sufficient, if managed prudently, to support them. And Schopenhauer was thus free to pursue his dream of study, which he did, variously studying medicine and philosophy at Göttingen, then Berlin, and ultimately earning his doctoral degree in philosophy in 1813 from the University of Jena.

While still in his twenties, Schopenhauer wrote the first edition of his masterwork, Die Welt als Wille und Vorstellung, frequently translated as The World as Will and Representation. Schopenhauer took as his starting point our experience of our own selves and bodies. I experience my body in two different ways: from what might be called an objective perspective, and through a subjective experience. I can look at my arm, for example, and view it as an object inhabiting a world along with other objects. But I also experience that arm right there as being my arm. (We know this experience occurs not only because it's immediately obvious, but because there are cases where humans have lost this sense of bodily ownership, what neurologists call "proprioception." See the strange and terrifying case study of this in Oliver Sacks' The Man Who Mistook His Wife For a Hat.) So I experience the world in two different ways, an outer, objective way and an inner, subjective way. My objective experience is governed by the laws of physics, chemistry, biology, mathematics, logic, etc., boiled down to what Schopenhauer and others before him called "the principle of sufficient reason." The subjective experience, however, is more elusive; it is not governed by this rational principle and is oftentimes downright irrational or, more accurately, arational. My motivations, desires, strivings for survival, and swirling inner experiences Schopenhauer calls "Will," a notion that may seem anachronistic, but only in its terminology, when considered in light of some of the tenets of contemporary cognitive science and of contemporary sociobiology and evolutionary psychology. (The philosopher and Schopenhauer scholar Julian Young went so far as to state that Schopenhauer "deserves to be regarded as the father, or at least the grandfather, of both disciplines.")

Schopenhauer's work was not recognized for some time, and his professorial career was a wreck. He once purposely scheduled his own lectures to coincide with those of his rival, Georg Wilhelm Friedrich Hegel. Students flocked to Hegel's lectures, and Schopenhauer's star sank.

His private life was no better, filled with conflict and strife, fueled no doubt by Schopenhauer's curmudgeonly personality and misogynist manner. His relationship with his mother Johanna, who had become a sparkling socialite and famous popular novelist, was fraught. And he once, in a violent fit, threw a woman to the floor for talking too loudly outside the door of his apartment. He was ordered to pay her a monthly sum for the rest of her life. Upon learning years later of her death, he wrote on the death notice, "obit anus abit onus" -- "the old woman has died; the burden has lifted."

Despite his difficult personality, the ethical side of his philosophy, a side derived from his metaphysics of the world as Will, can be called an ethics of compassion. If I am experiencing my own inner striving, sufferings, and turmoil, then upon pain of solipsism, I must assume that you are too. We're all experiencing this! One of Schopenhauer's favorite adages was from the Hindu Sanskrit: "tat tvam asi" -- "this thou art." Differentiation follows from the principle of sufficient reason; our Will is not governed by that principle, so our Will is not differentiated. If our Will is not differentiated, then we are one. If we are one, then compassion for our fellows, a recognition of our oneness, becomes the highest ethical value.

In later years, Schopenhauer began to be recognized and appreciated, which he enjoyed immensely. His more technical philosophical work was supplemented by essayistic examinations of the human condition. These were eventually collected in his Parerga und Paralipomena, a work that is frequently mined for gems and republished in various collections of Schopenhauer's essays and aphorisms, most of them accessible, readable, enjoyable, and wise.

The stamp of Schopenhauer's thought can be found in the works of Nietzsche, Freud, and Wittgenstein.


Until the past few decades, Schopenhauer's influence was not really appreciated in the Anglo-American scholarly world. His influence on such figures as Friedrich Nietzsche, Sigmund Freud, and the early Ludwig Wittgenstein was known, but not appreciated. Recent years, however, have witnessed an increased interest in Schopenhauer. In 2007, a new English translation of the first volume of The World as Will and Representation appeared, translated by University of Tennessee philosopher Richard Aquila; volume two appeared in 2010. Likewise, another translation, by Judith Norman, Alistair Welchman, and Christopher Janaway, appeared in 2010. Last year, University of Wisconsin-Whitewater philosopher and renowned Schopenhauer scholar David E. Cartwright published Schopenhauer: A Biography, "the first comprehensive biography of Schopenhauer written in English" and a solid example of a superlative intellectual biography.


From Wikimedia Commons: http://upload.wikimedia.org/wikipedia/commons/8/84/ACJziegfeld_cake.jpg

On a more mundane note, as I ponder Schopenhauer's philosophy and meditate on the nature of Oneness, I can't help but think that my darling step-daughter really should not get upset with me when she returns home from work only to discover that I've eaten the leftover dessert she brought home from The Cheesecake Factory. I mean, if our Wills are undifferentiated, then we are One, and if we are One, then me eating that cake is really just like her eating that cake. "Was it good?" she might ask, miffed. "You tell me!" I'd no doubt reply.

I think you see my reasoning.

I know Schopenhauer would.