Tuesday, September 24 • 11:20am - 11:40am
Search SRA Gateway for Metagenomics Data

The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) houses all publicly available biological DNA sequence data to enhance reproducibility, reduce redundancy, and allow new discoveries through comparison of data. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD System®, Helicos Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®. The SRA, the world’s largest database of sequences, is growing at the alarming rate of 10 TB per day. But this data is inaccessible to most researchers, because searching through the datasets requires large storage and computing facilities. Most individual laboratories do not have the computing capacity to deal with this volume of data.
Empowering scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. Together with XSEDE ECSS support, we developed a gateway (https://www.searchsra.org/) to provide computational analysis of a subset of the SRA, focussed on metagenomic sequences. These sequences come from diverse environments, and their analysis is computationally challenging. Our users submit a DNA or protein sequence to be compared to all of the known sequences in the public databases. The computation is performed on the XSEDE cloud resource Jetstream, and the data is housed on the XSEDE Wrangler resource. Results from the computation are retained only for a short time, long enough for users to download the outputs.
Future improvements will provide data versioning and integrity, offer a wider range of search algorithms, and integrate other applications into the gateway to streamline direct job submission and result retrieval.

Kon Tiki Room, Catamaran Resort, 3999 Mission Boulevard, San Diego, California 92109