Gateways 2019 has ended
Concurrents A
Tuesday, September 24
 

10:30am PDT

Enabling rich data sharing for Science Gateways via the SeedMeLab platform
Science Gateways provide an easily accessible and powerful computing environment for researchers. They are built around sets of software tools that are heavily used by large research communities in specific domains. Science Gateways have been catering to researchers' growing need for easy-to-use computational tools; however, their usage model is typically centered on a single user. As scientific research becomes ever more team-oriented, the need for integrated collaborative capabilities in Science Gateways has been emerging. One such need is the ability to share data and results with others. In this article, we describe and discuss our effort to provide a rich environment for data organization and sharing by integrating the SeedMeLab platform with two Science Gateways: CIPRES and GenApp.


Tuesday September 24, 2019 10:30am - 10:50am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:50am PDT

iReceptor: A case study in the importance of standards for data sharing
Next-generation sequencing (NGS) allows the characterization of the adaptive immune receptor repertoire (AIRR) in exquisite detail. These large-scale AIRR-seq data sets have rapidly become critical to vaccine development, understanding the immune response in autoimmune and infectious disease, and monitoring novel therapeutics against cancer. Over the past five years, a grassroots international community (the AIRR Community, www.airr-community.org) has been working towards establishing standards and recommendations for obtaining, analyzing, curating, and comparing/sharing NGS AIRR-seq datasets. Using these standards, the AIRR Community Common Repository Working Group (CRWG) is working towards establishing an international network of AIRR-seq repositories whose data are findable, accessible, interoperable, and reusable (FAIR).

The iReceptor Data Integration Platform (gateway.ireceptor.org) provides an implementation of the AIRR Data Commons envisioned by the AIRR Community. The iReceptor Scientific Gateway links distributed (federated) AIRR-seq repositories, allowing sequence searches or repertoire metadata queries across multiple studies at multiple institutions, returning sets of sequences fulfilling specific criteria. The data standards developed by the AIRR Community are at the foundation of our ability to implement such a platform. In this paper we use iReceptor as a case study that considers the importance of standards for effective data sharing.
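As a rough illustration of the federated search idea described above, the following minimal sketch sends one query to several repositories and merges the matches. It is not the iReceptor implementation or the AIRR Data Commons API: the repositories are hypothetical in-memory lists, and the names `Rearrangement`, `query_repository`, and `federated_search`, along with the record fields and sample sequences, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rearrangement:
    """One annotated receptor sequence (field names are illustrative)."""
    repository: str
    study_id: str
    junction_aa: str


# Hypothetical in-memory stand-ins for two independent repositories.
REPOSITORIES = {
    "repo_a": [
        Rearrangement("repo_a", "study_1", "CASSLGTDTQYF"),
        Rearrangement("repo_a", "study_2", "CASSPGQGNEQFF"),
    ],
    "repo_b": [
        Rearrangement("repo_b", "study_3", "CASSLGTDTQYF"),
    ],
}


def query_repository(name, junction_aa):
    """Return the records in one repository that match the junction sequence."""
    return [r for r in REPOSITORIES[name] if r.junction_aa == junction_aa]


def federated_search(junction_aa):
    """Send the same query to every repository and merge the results."""
    results = []
    for name in sorted(REPOSITORIES):
        results.extend(query_repository(name, junction_aa))
    return results


hits = federated_search("CASSLGTDTQYF")
for hit in hits:
    print(hit.repository, hit.study_id)
```

The merge step is only this simple because every repository in the sketch returns records with the same fields, which is the role shared data standards play in a federated platform.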

The short paper will discuss the process that the AIRR Community went through to establish its working groups and the standards those working groups produced. This will include discussions of the Minimal Information for AIRR-seq data (MiAIRR), the Standardized Representations for Annotated Immune Repertoires, and the emerging AIRR Data Commons Web API. Each of these standards will be discussed in the context of the iReceptor Platform, in terms of both its importance to the platform's implementation and its expected usefulness to the scientific community.


Tuesday September 24, 2019 10:50am - 11:10am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:10am PDT

Purdue University Research Repository - adapting when small data gets bigger
PURR was founded in 2011 as a partnership between Purdue University Libraries, Information Technology at Purdue (ITaP), and the Office of the Executive Vice President for Research to provide campus-wide support for researchers throughout the data management lifecycle. It is built on the HUBzero® platform, which was developed at Purdue. PURR provides the tools and expertise to help researchers plan for data management, share data with collaborators, publish completed datasets in compliance with federal funding guidelines, safely archive data, and track data publication impact. Every PURR user has access to private space for storing and sharing research data files. When research is completed, PURR takes users through a step-by-step process for selecting and describing data files for publication. Upon publication, PURR mints a DOI for each dataset and provides archiving services through the MetaArchive network. All published datasets are maintained and accessible on the PURR website for at least 10 years, after which time they will be reviewed by the libraries and could be decommissioned or moved to library archives.

Over the past eight years, PURR has published 975 datasets and served over 3,600 researchers with 481 grant awards. In that time, PURR’s services have grown along with the HUBzero® platform to meet the changing needs of the Purdue community as researchers across all fields produce more data. Supporting larger datasets requires a multi-faceted approach far beyond simply acquiring additional storage space. Our recent development has followed a five-pronged plan: 1) increased storage quotas, 2) new publication series functionality, 3) an online database viewer, 4) publication file preview, and 5) seamless FTP transfers for large publications. Combined, these improvements ensure our increasingly large data publications are not only stored safely but also remain accessible over the long term.

The newly published Rough Cilicia Survey Pottery Study dataset series illustrates both the motivation for and the results of PURR’s recent development. The culmination of four years of close collaboration between PURR’s data curator and a faculty member from Purdue’s classics department, the Rough Cilicia collection is composed of 25 datasets. The collection takes advantage of PURR’s series functionality, which allows authors to separate large data collections into smaller, more manageable, related subsets. These subsets are easier to download than the entire collection, and each subset has a DOI for precise citation. This series makes available images of hundreds of pottery sherds from the ancient Cilicia region of modern-day Turkey, and their associated descriptive information in a series of interactive data tables that allow the user to view, search, and filter data on the PURR website. Users can also download the data files for closer study and reuse. At about 15 GB, the Rough Cilicia series is not exactly “big data,” but it is large enough to stretch the limits of a web-based repository like PURR, and we are increasingly seeing datasets of this size or more. Moderate improvements like the five mentioned here allow us to publish larger datasets while maintaining the ease and convenience of serving users through a web browser.

Presenters

Claire Stirm

Project Coordinator, UC San Diego | SDSC
Claire Stirm is the Deputy Director of the Incubator and Project Coordinator for the Science Gateways Community Institute (SGCI). 

Sandi Caldrone

Purdue University Libraries


Tuesday September 24, 2019 11:10am - 11:20am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:20am PDT

Search SRA Gateway for Metagenomics Data
The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) houses all publicly available biological DNA sequence data to enhance reproducibility, reduce redundancy, and allow for new discoveries by comparing data. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD System®, Helicos Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®. The SRA, the world’s largest database of sequences, is growing at the alarming rate of 10 TB per day. But this data is inaccessible to most researchers because of the need for large storage and computing facilities to search through the datasets. Most individual laboratories do not have the computing capacity to deal with this volume of data.
Empowering scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. Together with XSEDE ECSS support, we developed a gateway (https://www.searchsra.org/) to provide computational analysis of a subset of the SRA, focused on metagenomic sequences. These sequences come from diverse environments, and their analysis is computationally challenging. Our users submit a DNA or protein sequence to be compared to all of the known sequences in the public databases. The computation is performed on the XSEDE cloud resource Jetstream, and the data are housed on the XSEDE Wrangler resource. Results from the computation are saved only briefly, to enable the users to download the outputs.
Future improvements will provide data versioning and integrity and a wider range of search algorithms, and will integrate other applications into the gateway to streamline direct job submission and result retrieval.


Tuesday September 24, 2019 11:20am - 11:40am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:40am PDT

ESS-DIVE: A Scalable Community Repository for Managing Earth and Environmental Science Data
This demonstration presents the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), a new Department of Energy (DOE) web-based data repository that serves the earth and environmental science community. The multidisciplinary ESS-DIVE team consists of computer scientists, environmental scientists, and digital librarians who have come together to build this system. We will highlight the end-to-end features of ESS-DIVE to showcase its unique capabilities, including (1) implementation of data standards and an HTTP API using JSON-LD, (2) publication workflow and automated DOI generation, (3) scalable, repeatable containerized infrastructure through Docker, (4) core capabilities based on the NCEAS Metacat and MetacatUI software, including ORCID-based single sign-on, data search and access, data publication, and dataset management, and (5) federated data access and replication on the DataONE network.


Tuesday September 24, 2019 11:40am - 12:00pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109
 