Gateways 2019 has ended
Wednesday, September 25 • 10:50am - 11:10am
Streamed Data via Cloud-Hosted Real-Time Data Services for the Geosciences as an Ingestion Interface into the Planet Texas Science Gateway and Integrated Modeling Platform


By the year 2050, the population of Texas is forecast to grow from 28 million to nearly 55 million residents. As a result, the effects of present utilization on the sustainability of natural resources (water, energy, and land use) must be modeled and made available to policymakers. The Planet Texas 2050 (PT2050) project is designed to provide the knowledge and information needed to inform and support resilient responses in the face of identified vulnerabilities.

The DataX Science Gateway is in development as part of the PT2050 initiative to provide a platform through which scientists, data analysts, and policymakers collaborate to generate cross-disciplinary environmental models. The scientists and analysts creating the hybridized models will have unique access to datasets, workflow generation tools, and collaborators historically partitioned across disciplines. The DataX Gateway enables data ingestion, data transformation, and the composition of integrated models. Core capabilities within the data portal include tools for assimilating disparate datasets, pre-processing data sources for inclusion in integrated models, and sharing within the community, along with access to large-scale resources, including storage and computational capabilities, at the Texas Advanced Computing Center.

Generally, integrated models use static datasets. The purpose of this research was to explore a method by which real-time, in-situ environmental edge monitoring systems could stream data into backend models for processing. The real-time data serves as a ground-truth source of information for models and expands the spectrum of use cases the DataX Gateway could support. The Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS) project, funded by the EarthCube program at NSF, was implemented within the DataX platform from an edge-sensor point of view. Non-standard use of the application programming interface (API) to ingest previously collected, non-streamed datasets was also addressed as a possible use case. Future work aims to create a workflow that converts data streams into data frames, as an approach for connecting real-time or near-real-time data with integrated models at scale. Challenges include addressing authentication and data confidentiality for potential users, as well as the limitations of data collection at scale.
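
As an illustration only, the two ingestion paths described above (a live sensor reading pushed through an API, and streamed records gathered into a frame-like structure for a model) can be sketched in Python. The abstract does not specify the API, so the endpoint path `measurements/url_create`, the parameter names `instrument_id`, `at`, and `key`, and both helper functions are assumptions chosen for the example, not the project's confirmed interface.

```python
from urllib.parse import urlencode


def build_ingest_url(base_url, instrument_id, readings, timestamp, api_key=None):
    """Compose a GET-style ingestion URL for one measurement.

    The path and parameter names are assumed for illustration; check the
    portal's own documentation before relying on them.
    """
    params = {"instrument_id": instrument_id, **readings, "at": timestamp}
    if api_key is not None:
        params["key"] = api_key
    # urlencode percent-escapes values such as the colons in the timestamp.
    return f"{base_url.rstrip('/')}/measurements/url_create?{urlencode(params)}"


def records_to_columns(records):
    """Pivot a list of per-timestamp records into column-oriented form.

    Variables missing from a record are filled with None so that every
    column has one entry per record.
    """
    names = []
    for record in records:
        for name in record:
            if name not in names:
                names.append(name)
    return {name: [record.get(name) for record in records] for name in names}


# A live reading and a previously collected one can share the same path:
# only the timestamp differs.
url = build_ingest_url(
    "https://chords.example.org/",  # hypothetical portal address
    instrument_id=1,
    readings={"temp": 21.5},
    timestamp="2019-09-25T10:50:00Z",
)

columns = records_to_columns([
    {"at": "2019-09-25T10:50:00Z", "temp": 21.5},
    {"at": "2019-09-25T10:51:00Z", "temp": 21.7, "rh": 40},
])
```

The column-oriented dictionary is the shape that a data frame library such as pandas accepts directly, which is one plausible way a stream-to-data-frame step could hand data to a model.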

Early implementation and testing of data streaming in the gateway have demonstrated that the capabilities of the API extend beyond standard data streaming. When viewed as a core service, CHORDS becomes a method by which datasets can be added to the DataX platform while providing both standardized geoscience naming schemes and direct pipelines into integrated model workflows.

Wednesday September 25, 2019 10:50am - 11:10am PDT
Toucan Room, Catamaran Resort