From using GPS to navigate traffic to apps tailoring selections based on location, geospatial data is integral to our daily lives. Researchers also increasingly use geospatial data. For example, in a recent article featuring several Johns Hopkins researchers, geospatial data was used to understand sub-surface features of an archaeological site in modern-day Oman.
Currently, I am a Council on Library and Information Resources (CLIR) Postdoctoral Fellow in the Data Services unit of the Sheridan Libraries. I came into this fellowship from outside the library world – my Ph.D. is in Anthropology. I have a background in geospatial methods and I have used geospatial data in my research.
Since we generate and use geospatial data so frequently in our lives and our work, I am currently considering the question, ‘How do we find geospatial data?’
When reflecting on my research process, my methods of finding geospatial data were serendipitous. I discovered new data through advice from my professors, suggestions from colleagues, or stumbling across a dataset through a peer-reviewed publication.
As our collective use of geospatial data continues to increase, so does the effort to streamline the process of finding geospatial data digitally. In the information science world, specifically in the context of academic libraries and higher education institutions, three general thematic areas help users find geospatial data: discovery, access, and preservation.
The concept of discovery is the ‘process of searching, locating, and retrieving data from a file or database.’ In other words, ‘How we locate data.’
Geoportals are a front end for the discovery of geospatial data. Perhaps the most well-known backend technology supporting the creation of geoportals in a North American information science context is GeoBlacklight. GeoBlacklight, based on the widely adopted discovery platform Blacklight, is an open-source, multi-institutional platform that endeavors to help users discover geospatial data. Because it is open source, both users and institutions can adapt it to fit their needs.
The concept of access is ‘to use or engage with information, often using electronic resources.’ In other words, ‘How we get data.’
Access technologies are an essential component of retrieving geospatial data. They serve as the go-between that connects the place where we locate data with the place where it is stored. GeoServer is a popular open-source server that enables users to view, edit, and share geospatial data in a variety of formats via a local or cloud-based server.
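To make the idea of a go-between a little more concrete, here is a minimal sketch of how a client might ask a map server for an image of a layer. It assumes the server exposes the OGC Web Map Service (WMS) interface, which GeoServer supports; the host name, the layer name, and the bounding box are hypothetical placeholders, not a real service.

```python
from urllib.parse import urlencode


def build_wms_getmap_url(base_url, layer, bbox, width=512, height=512, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of the chosen
    spatial reference system (decimal degrees for EPSG:4326).
    """
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value asks for the layer's default style
        "bbox": ",".join(str(value) for value in bbox),
        "width": width,
        "height": height,
        "srs": srs,
        "format": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, used only for illustration.
url = build_wms_getmap_url(
    "https://geoserver.example.edu/geoserver/wms",
    "demo:survey_area",
    (56.0, 22.0, 57.0, 23.0),
)
print(url)
```

The user never needs to know where the underlying files live. The request names a layer and an area, and the server handles retrieval and rendering.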
The concept of preservation includes ‘activities associated with maintaining library and archival materials for use, either in their original physical form or in some other usable way.’ In other words, ‘How we maintain data.’
Repositories are a typical space for preservation. Traditionally, they were ‘a place where archives, manuscripts, books, or other documents are stored.’ The scope of repositories has widened to include digital repositories aiming to preserve digital data. Samvera (formerly known as Hydra) is a popular open-source digital repository software that is used by many institutions to preserve geospatial data.
While discovery, access, and preservation are distinct concepts, they work together to create an ecosystem that aims to connect users with the data they seek. Without the ability to discover data, no one will know they could access specific datasets. Without access, discovery is meaningless because users cannot get to the data or might not understand how to use it. Finally, without preservation, there would be no data to discover or access. An important underlying component of this ecosystem that fosters all three concepts is metadata.
Metadata is ‘information used to describe a work to enable discovery and use.’ In other words, it is ‘information about data.’
For geospatial data, metadata is crucial for discovery and preservation, and thus for facilitating access. Metadata describes geospatial data, which allows users to know important details like what the data is and how it was created. The International Organization for Standardization (ISO) and the Federal Geographic Data Committee (FGDC) are entities that maintain broader geospatial metadata standards (e.g., ISO 19115). The more detailed the metadata, the better the chances that users will find the data and reuse it. Search indexes, such as those built with Apache Solr, are often needed to make it easier for users to search for geospatial data.
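A small sketch may help show why richer metadata improves discovery. The record structure and the search function below are simplified assumptions made up for illustration; they are not the ISO 19115 model, the GeoBlacklight schema, or how Solr works internally. The sketch only shows that a search can match what the metadata actually says.

```python
from dataclasses import dataclass, field


@dataclass
class GeoRecord:
    """A toy metadata record (hypothetical fields, not a real standard)."""
    title: str
    description: str
    bbox: tuple  # (west, south, east, north) in decimal degrees
    keywords: list = field(default_factory=list)


def bboxes_intersect(a, b):
    """Return True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, term, area=None):
    """Find records whose text mentions term and, optionally, overlap area."""
    term = term.lower()
    hits = []
    for record in records:
        text = " ".join([record.title, record.description, *record.keywords]).lower()
        if term in text and (area is None or bboxes_intersect(record.bbox, area)):
            hits.append(record)
    return hits


# Two made-up records: one well described, one with almost no metadata.
records = [
    GeoRecord(
        "Radar survey tiles",
        "Synthetic aperture radar imagery",
        (56.0, 22.0, 57.0, 23.0),
        ["radar", "archaeology"],
    ),
    GeoRecord("Untitled shapefile", "", (-77.0, 39.0, -76.0, 40.0)),
]

print([record.title for record in search(records, "radar")])
```

The well-described record can be found by topic, keyword, or location, while the sparsely described one can only be found by someone who already knows its title or where it sits on the map.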
While there is an increase in efforts toward developing digital mechanisms and ecosystems for finding geospatial data, there are also significant challenges.
This post lays out the interconnections amongst discovery, access, and preservation, but there can be barriers to connectivity across these concepts. For example, an institution might have a repository, but lack a geoportal. Thus, while data remains preserved, discovery capabilities are weakened.
A recent usability study found that users had trouble finding geospatial data in a geoportal if they were not experienced ‘searchers’, even if they were knowledgeable about geospatial data. Thus, if the user lacked searching skills (e.g., skills built through research or even online shopping!), then the geoportal was challenging to use. Educating users on best practices for searching for geospatial data might be a necessary step toward alleviating this challenge.
Generating metadata can be a challenge, as many researchers might not realize that they need to factor in this step as they create their geospatial datasets. I was guilty of this during my doctoral research. Without metadata, the context and potential reuse of geospatial data are compromised. With the increased demand by funding agencies for data management plans, factoring in metadata creation might emerge as an essential step in research.
Finally, the various geospatial discovery, access, and preservation technologies mentioned in this post are not ‘out-of-the-box’ solutions. Currently, they require a substantial amount of time, personnel, and effort to transform into a form that serves user and institutional needs. The necessary adaptation of these technologies can create gaps and lags in geospatial data services. We are at a critical juncture: will standardized ‘out-of-the-box’ solutions emerge, will institutions continue to create custom solutions, or will we see a combination of both?
While these are not the only challenges surrounding finding geospatial data, they are prominent in my mind as I continue this fellowship. While I certainly do not have a panacea for finding geospatial data, I do view these challenges as opportunities to connect with users and contribute to the creation of solutions that promote a balance amongst geospatial data discovery, access, and preservation.
Smiti Nathan is a CLIR Postdoctoral Fellow in Geospatial Data Discovery, Access, Management, and Curation. She received her PhD in Anthropology (Archaeology) from New York University and her MSc in GIS and Spatial Analysis in Archaeology from University College London. For more on her research, check out her ORCID. She is also the founder and editor of the website Habits of a Travelling Archaeologist, where she publishes posts on archaeology, travel, and academic productivity.
 Wiig, Frances, Michael Harrower, Alexander Braun, Smiti Nathan, Joseph Lehner, Katie Simon, Jennie Sturm, et al. 2018. “Mapping a Subsurface Water Channel with X-Band and C-Band Synthetic Aperture Radar at the Iron Age Archaeological Site of ‘Uqdat Al-Bakrah (Safah), Oman.” Geosciences 8 (9): 334. https://doi.org/10.3390/geosciences8090334.
Definitions of these concepts come from an information science context and can be found in the following reference: Levine-Clark, Michael, and Toni M. Carter, eds. 2013. ALA Glossary of Library and Information Science. 4th ed.
 Blake, Mara, Karen Majewicz, Amanda Tickner, and Jason Lam. 2017. “Usability Analysis of the Big Ten Academic Alliance Geoportal: Findings and Recommendations for Improvement of the User Experience.” Code4Lib Journal, no. 38.