Gateways 2019
Wednesday, September 25 • 1:20pm - 1:40pm
EarthCube Data Discovery Studio, an integration of a semantically enhanced cross-disciplinary catalog with JupyterHub to enable an analytical workbench

EarthCube Data Discovery Studio (DDStudio) integrates resources described by metadata with analytical platforms. DDStudio has harvested over 1.6 million metadata records from more than 40 sources, enhanced them through an augmentation pipeline, built a catalog from them, and provides an interface that lets users explore the data in Jupyter notebooks.

DDStudio uses a scalable metadata augmentation pipeline that improves and re-indexes metadata content using text analytics and an integrated geoscience ontology. Metadata enhancers automatically add keywords and related ontology references that describe science domains, geospatial features, measured variables, equipment, geoscience processes, and other characteristics, thereby improving search and discovery of semantically indexed datasets. The pipeline also enhances spatial and temporal extents and organization identifiers, enabling faceted browsing by these parameters. In addition, it generates provenance for each enhanced metadata document and publishes the metadata using schema.org markup, and users can validate or invalidate individual enhancements. Users can also upload metadata descriptions for resources not already in the catalog and have them immediately available in the search interface.

Users can search for datasets in DDStudio using text, search facets, and geospatial and temporal filters. Researchers can gather records of interest into collections, save the collections for later use, and share them with collaborators.

DDStudio and the JupyterHubs are loosely coupled and communicate through a simple interface we call a dispatcher. From DDStudio, users can launch Jupyter notebooks residing on several JupyterHubs for any metadata record or for a user-built collection of records. The dispatcher seeks to identify appropriate resources for visualization, analysis, or modeling, thus bridging resource discovery with more in-depth data exploration.
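To make the enhancer step more concrete, here is a minimal Python sketch, assuming a toy term-to-ontology table in place of DDStudio's integrated geoscience ontology. The names `ONTOLOGY`, `enhance`, `to_schema_org`, and `facet_counts`, the example.org concept URIs, and the record fields (`id`, `title`, `description`) are all hypothetical. It illustrates ontology-based keyword augmentation, a per-record provenance entry, a per-enhancement slot for user validation, schema.org `Dataset` JSON-LD output, and facet counts over the added categories; it is not the DDStudio pipeline itself.

```python
import json
import re
from collections import Counter
from datetime import datetime, timezone

# Hypothetical mini-ontology: surface term -> (category, concept URI).
# Stands in for DDStudio's integrated geoscience ontology; the entries
# and example.org URIs are illustrative placeholders only.
ONTOLOGY = {
    "coral reef": ("geospatial feature", "http://example.org/onto/CoralReef"),
    "river": ("geospatial feature", "http://example.org/onto/River"),
    "salinity": ("measured variable", "http://example.org/onto/Salinity"),
    "ctd": ("equipment", "http://example.org/onto/CTD"),
}


def enhance(record: dict) -> dict:
    """Return a copy of `record` with ontology-derived keywords and provenance."""
    text = f"{record.get('title', '')} {record.get('description', '')}".lower()
    added = []
    for term, (category, uri) in ONTOLOGY.items():
        # \b anchors match whole words/phrases only, so "river" skips "riverine".
        if re.search(rf"\b{re.escape(term)}\b", text):
            added.append({
                "term": term,
                "category": category,
                "uri": uri,
                "validated": None,  # later True/False when a user (in)validates it
            })
    enhanced = dict(record)
    enhanced["keywords"] = sorted(set(record.get("keywords", [])) | {a["term"] for a in added})
    enhanced["enhancements"] = added
    enhanced["provenance"] = {
        "agent": "keyword-enhancer (hypothetical)",
        "source": record.get("id"),
        "generatedAt": datetime.now(timezone.utc).isoformat(),
    }
    return enhanced


def to_schema_org(enhanced: dict) -> str:
    """Serialize an enhanced record as schema.org Dataset JSON-LD."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "identifier": enhanced.get("id"),
        "name": enhanced.get("title"),
        "description": enhanced.get("description"),
        "keywords": enhanced["keywords"],
        "about": [{"@id": a["uri"], "name": a["term"]} for a in enhanced["enhancements"]],
    }
    return json.dumps(doc, indent=2)


def facet_counts(records: list) -> Counter:
    """Count how many enhanced records carry each ontology category (one facet)."""
    counts = Counter()
    for r in records:
        counts.update({a["category"] for a in r["enhancements"]})  # once per record
    return counts


if __name__ == "__main__":
    rec = {"id": "rec-001", "title": "Coral reef salinity survey",
           "description": "CTD casts near a fringing coral reef."}
    enhanced = enhance(rec)
    print(enhanced["keywords"])  # ['coral reef', 'ctd', 'salinity']
    print(to_schema_org(enhanced))
```

Whole-phrase matching is a deliberately simple stand-in; a production enhancer would presumably draw on ontology synonyms and richer text analytics rather than exact string matches.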
Users can contribute their own notebooks to process additional types of data indexed in DDStudio. Through a series of examples from coral reef and river geochemistry studies, DDStudio demonstrates how linking catalog search results directly to software tools and environments reduces time to science. DDStudio has worked with the Science Gateways Community Institute (SGCI) to improve its processes and utility through centralized authentication, security analysis, and outreach to user communities.
URL: datadiscoverystudio.org
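The abstract describes the dispatcher only as a simple interface, so the sketch below is one hypothetical way such loose coupling could work, assuming each catalog record carries a `format` field and each registered notebook declares the formats it can handle. `Notebook`, `Dispatcher`, the example.org hub URLs, and the `/launch` query-string link are invented for illustration and are not DDStudio or JupyterHub APIs. Registering a notebook models a user contribution, and `candidates` returns the notebooks able to process every record in a single record or a collection.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass(frozen=True)
class Notebook:
    hub_url: str        # base URL of a JupyterHub (hypothetical)
    path: str           # notebook path on that hub
    handles: frozenset  # record formats this notebook can process


@dataclass
class Dispatcher:
    notebooks: list = field(default_factory=list)

    def register(self, nb: Notebook) -> None:
        """Add a (possibly user-contributed) notebook to the registry."""
        self.notebooks.append(nb)

    def candidates(self, records: list) -> list:
        """Return notebooks that can process every record in the selection."""
        formats = {r["format"] for r in records}
        return [nb for nb in self.notebooks if formats <= nb.handles]

    def launch_url(self, nb: Notebook, records: list) -> str:
        """Build a hypothetical launch link that passes record IDs to the notebook."""
        query = urlencode({"notebook": nb.path,
                           "records": ",".join(r["id"] for r in records)})
        return f"{nb.hub_url}/launch?{query}"


if __name__ == "__main__":
    d = Dispatcher()
    d.register(Notebook("https://hub-a.example.org", "csv_explorer.ipynb",
                        frozenset({"text/csv"})))
    d.register(Notebook("https://hub-b.example.org", "netcdf_plot.ipynb",
                        frozenset({"text/csv", "application/x-netcdf"})))
    collection = [{"id": "rec-001", "format": "text/csv"},
                  {"id": "rec-002", "format": "application/x-netcdf"}]
    for nb in d.candidates(collection):
        print(d.launch_url(nb, collection))
    # https://hub-b.example.org/launch?notebook=netcdf_plot.ipynb&records=rec-001%2Crec-002
```

Limiting the exchange to record IDs and formats means the catalog needs no knowledge of how each hub runs its notebooks, which is one plausible reading of the loose coupling described above.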

Wednesday September 25, 2019 1:20pm - 1:40pm PDT
Toucan Room, Catamaran Resort