From using GPS to navigate traffic to apps tailoring selections based on location, geospatial data is integral to our daily lives. Researchers also increasingly use geospatial data. For example, in a recent article featuring several Johns Hopkins researchers, geospatial data was used to understand sub-surface features of an archaeological site in modern-day Oman.

The above images show how geospatial data is used to locate sub-surface features at an archaeological site in Oman. Both images are taken from Wiig et al. 2018[1] and use DLR’s TanDEM-X satellite (A: Summed stack of 10 VV images and B: Summed stack of HH images).
Currently, I am a Council on Library and Information Resources (CLIR) Postdoctoral Fellow in the Data Services unit of the Sheridan Libraries. I came into this fellowship from outside the library world – my Ph.D. is in Anthropology. I have a background in geospatial methods and I have used geospatial data in my research.

CLIR Fellows from around the world (Image used with permission from CLIR).

Since we generate and use geospatial data so frequently in our lives and our work, I am currently considering the question, ‘How do we find geospatial data?’

When reflecting on my research process, my methods of finding geospatial data were serendipitous. I discovered new data through advice from my professors, suggestions from colleagues, or stumbling across a dataset through a peer-reviewed publication.

This image depicts typical methods of finding geospatial data.

As our collective use of geospatial data continues to increase, so does the effort to streamline the process of finding geospatial data digitally. In the information science world, specifically in the context of academic libraries and higher education institutions, three general thematic areas help users find geospatial data: discovery, access, and preservation.[2]

This is a service blueprint displaying the various components you might interact with when using geospatial data. For more information on service design blueprints, check out this article from the Nielsen Norman Group.

The concept of discovery is the ‘process of searching, locating, and retrieving data from a file or database.’ In other words, ‘How we locate data.’

Geoportals are a front end for the discovery of geospatial data. Perhaps the most well-known backend technology supporting the creation of geoportals in a North American information science context is GeoBlacklight. GeoBlacklight, based on the widely adopted discovery platform Blacklight, is an open-source, multi-institutional platform that endeavors to help users discover geospatial data. Because it is open source, both users and institutions can adapt it to fit their needs.

The geospatial data service blueprint highlighting the concept of discovery.
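
To make the idea of discovery a bit more concrete, here is a minimal Python sketch of the kind of search a geoportal might perform: filtering catalog records by keyword and by whether their bounding box overlaps an area of interest. The `Record` fields, the sample catalog, and the `search` function are all hypothetical and made up for this illustration; they are not GeoBlacklight's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical catalog entry; field names are illustrative only."""
    title: str
    keywords: list
    # Bounding box as (west, south, east, north) in decimal degrees.
    bbox: tuple


def bboxes_overlap(a, b):
    """Return True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalog, keyword, area):
    """Return titles of records that match the keyword and overlap the area."""
    keyword = keyword.lower()
    return [
        r.title
        for r in catalog
        if any(keyword in k.lower() for k in r.keywords)
        and bboxes_overlap(r.bbox, area)
    ]


# Made-up sample records for illustration.
catalog = [
    Record("Radar imagery, northern Oman", ["radar", "archaeology"], (55.0, 22.0, 58.0, 25.0)),
    Record("Road network, Maryland", ["roads", "transportation"], (-79.5, 37.9, -75.0, 39.7)),
    Record("Radar imagery, Maryland", ["radar"], (-79.5, 37.9, -75.0, 39.7)),
]

# Area of interest roughly covering Oman (approximate values).
oman_area = (52.0, 16.0, 60.0, 27.0)
results = search(catalog, "radar", oman_area)
print(results)
```

Only the record that both carries the keyword and falls within the area of interest is returned, which is the basic promise of discovery: narrowing a large catalog down to the data a user is actually looking for.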

The concept of access is ‘to use or engage with information, often using electronic resources.’ In other words, ‘How we get data.’

Access technologies are an essential component of retrieving geospatial data. They serve as the go-between that connects the place where we locate data with the place where it is stored. GeoServer is a popular open-source server that enables users to view, edit, and share geospatial data in a variety of formats via a local or cloud-based server.

The geospatial data service blueprint highlighting the concept of access.
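
As a rough illustration of access, servers such as GeoServer can expose data through standard web service interfaces like the OGC Web Feature Service (WFS). The Python sketch below only builds a WFS `GetFeature` request URL; it does not contact any server. The base URL and the layer name are placeholders made up for this example, and a real deployment may require different parameters.

```python
from urllib.parse import urlencode


def build_wfs_url(base_url, layer, max_features=10):
    """Build a WFS 2.0.0 GetFeature URL that asks for features as GeoJSON."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer,      # layer to fetch (hypothetical name below)
        "count": max_features,   # limit the number of features returned
        "outputFormat": "application/json",
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder server address and layer name, for illustration only.
url = build_wfs_url("https://example.org/geoserver/wfs", "demo:survey_points")
print(url)
```

A request like this is what sits between a 'download' or 'preview' button in a geoportal and the stored data itself.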

The concept of preservation includes ‘activities associated with maintaining library and archival materials for use, either in their original physical form or in some other usable way.’ In other words, ‘How we maintain data.’

Repositories are a typical space for preservation. Traditionally, they were ‘a place where archives, manuscripts, books, or other documents are stored.’ The scope of repositories has widened to include digital repositories that aim to preserve digital data. Samvera (formerly known as Hydra) is popular open-source digital repository software that many institutions use to preserve geospatial data.

The geospatial data service blueprint highlighting the concept of preservation.

While discovery, access, and preservation are distinct concepts, they work together to create an ecosystem that aims to connect users with the data they seek. Without the ability to discover data, no one will know they could access specific datasets. Without access, discovery is meaningless because users cannot get to the data or might not understand how to use it. Finally, without preservation, there would be no data to discover or access. An important underlying component of this ecosystem that fosters all three concepts is metadata.

The geospatial data ecosystem

Metadata is ‘information used to describe a work to enable discovery and use.’ In other words, it is ‘information about data.’

For geospatial data, metadata is crucial for discovery and preservation, and it thus facilitates access. Metadata describes geospatial data, which allows users to know important details like what the data is and how it was created. ISO and FGDC are entities that facilitate broader geospatial metadata standards (e.g., ISO 19115). The more detailed the metadata, the better the chances that users will find and reuse the data it describes. Search indexes, like Apache Solr, are often needed to make it easier for users to search for geospatial data.

Quick notes about metadata
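
To illustrate why detail matters, the following Python sketch checks a metadata record for missing descriptive fields. The field list is a simplified, hypothetical one chosen for this example; it is not the ISO 19115 or FGDC element set, and the sample record is invented.

```python
# Simplified, hypothetical set of descriptive fields (not a real standard's element list).
REQUIRED_FIELDS = [
    "title",
    "description",
    "creator",
    "date_created",
    "bounding_box",
    "coordinate_system",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# A made-up record with some gaps.
record = {
    "title": "Survey points, field season 1",
    "creator": "A. Researcher",
    "bounding_box": (55.0, 22.0, 58.0, 25.0),
    "description": "",
}

gaps = missing_fields(record)
print(gaps)
```

Each gap reported here is a detail that a future user would have to guess at, such as when the data was created or which coordinate system it uses.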

While there is an increase in efforts toward developing digital mechanisms and ecosystems for finding geospatial data, there are also significant challenges.

This post lays out the interconnections amongst discovery, access, and preservation, but there can be barriers to connectivity across these concepts. For example, an institution might have a repository, but lack a geoportal. Thus, while data remains preserved, discovery capabilities are weakened.

A recent usability study [3] found that users had trouble finding geospatial data in a geoportal if they were not experienced ‘searchers’, even if they were knowledgeable about geospatial data. Thus, if the user lacked searching skills (e.g., those gained through research or even online shopping!), then the geoportal was challenging to use. Educating users on best practices for searching for geospatial data might be a necessary step toward alleviating this challenge.

Generating metadata can be a challenge, as many researchers might not realize that they need to factor in this step as they create their geospatial datasets. I was guilty of this during my doctoral research. Without metadata, the context and potential reuse of geospatial data are compromised. With the increased demand by funding agencies for data management plans, factoring in metadata creation might emerge as an essential step in research.

Finally, the various geospatial discovery, access, and preservation technologies mentioned in this post are not ‘out-of-the-box’ solutions. Currently, they require a substantial amount of time, personnel, and effort to transform into a form that serves user and institutional needs. The adaptation these technologies require can create gaps and lags in geospatial data services. We are at a critical juncture: will standardized ‘out-of-the-box’ solutions emerge, will institutions continue to create custom solutions, or will we see a combination of both?

Challenges of finding geospatial data

While these are not the only challenges surrounding finding geospatial data, they are prominent in my mind as I continue this fellowship. I certainly do not have a panacea for finding geospatial data, but I do view these challenges as opportunities to connect with users and contribute to the creation of solutions that promote a balance amongst geospatial data discovery, access, and preservation.

Smiti Nathan is a CLIR Postdoctoral Fellow in Geospatial Data Discovery, Access, Management, and Curation. She received her PhD in Anthropology (Archaeology) from New York University and her MSc in GIS and Spatial Analysis in Archaeology from University College London. For more on her research, check out her ORCID. She is also the founder and editor of the website Habits of a Travelling Archaeologist, where she publishes posts on archaeology, travel, and academic productivity.

Acknowledgments: This post benefited from feedback from Reid Boehm, Meredith Shelby, Matthew Sisk, and Mara Blake.

Footnotes:

[1] Wiig, Frances, Michael Harrower, Alexander Braun, Smiti Nathan, Joseph Lehner, Katie Simon, Jennie Sturm, et al. 2018. “Mapping a Subsurface Water Channel with X-Band and C-Band Synthetic Aperture Radar at the Iron Age Archaeological Site of ‘Uqdat Al-Bakrah (Safah), Oman.” Geosciences 8 (9): 334. https://doi.org/10.3390/geosciences8090334.

[2] Definitions of these concepts come from an information science context and can be found in the following reference: Levine-Clark, Michael, and Toni M. Carter, eds. 2013. ALA Glossary of Library and Information Science. 4th ed.

[3] Blake, Mara, Karen Majewicz, Amanda Tickner, and Jason Lam. 2017. “Usability Analysis of the Big Ten Academic Alliance Geoportal: Findings and Recommendations for Improvement of the User Experience.” Code4Lib Journal, no. 38.