Gateways 2019 Schedule
Monday, September 23
 

8:00am PDT

Registration Opens, Breakfast Available
Monday September 23, 2019 8:00am - 9:00am PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

9:00am PDT

Deploying Science Gateways using Apache Airavata
The authors present the Apache Airavata framework for deploying science gateways, illustrating how to request, administer, modify, and extend a basic gateway tenant on the Airavata middleware.

Skill Level: Intermediate.

Prerequisites:
  1. Python 3.6 – latest 3.6 release (https://www.python.org/downloads/)
  2. Docker (https://www.docker.com/products/docker-desktop)
 
Technology Requirements:
  • Laptop with Linux OS, macOS, or a Linux VM
 
https://s.apache.org/django-tutorial
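As a quick self-check before the session, attendees could verify both prerequisites from the command line. The small script below is an illustrative sketch (not part of the tutorial materials), using only the Python standard library:

```python
import shutil
import sys

def check_prerequisites():
    """Return True if Python 3.6+ and a Docker client are available."""
    ok = True
    if sys.version_info < (3, 6):
        # Tutorial asks for the latest 3.6 release or newer.
        print("Python 3.6 or later is required; found",
              "%d.%d" % sys.version_info[:2])
        ok = False
    if shutil.which("docker") is None:
        # Docker Desktop installs a "docker" client on the PATH.
        print("Docker client not found on PATH; install Docker Desktop")
        ok = False
    return ok

if __name__ == "__main__":
    print("ready" if check_prerequisites() else "missing prerequisites")
```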


Monday September 23, 2019 9:00am - 12:30pm PDT
Macaw Room, Catamaran Resort

9:00am PDT

Science Gateways Bootcamp [now Focus Week] Brain Trust: A Problem-Solving Workshop
This session will be repeated in the afternoon. Attend one or attend both!

The Science Gateways Bootcamp [now Gateway Focus Week] is designed to help teams in any stage of development for their gateway project - from planning to developing to maintaining their gateway. Principal Investigators and their teams learn core business strategy skills, best practices, and long-term sustainability strategies. Participants engage in hands-on activities to help them articulate the value of their work to key stakeholders and to create a strong development, operations, and sustainability plan.

Bootcamp participants work closely with one another and, as a result, have the opportunity to network and establish relationships with people who are engaged in similar activities. Since the first Bootcamp in April 2017, 210 participants have gone through the Bootcamp curriculum. Those participants have said they find it valuable to the work of their gateways to keep in touch with members of their cohort. They have also expressed interest in diving deeper into their problems and receiving feedback from their cohort and the instructors. This three-hour workshop will give Bootcamp alumni an opportunity to participate in a "Brain Trust" exercise with members of their cohort and other cohorts, as well as other conference participants. Through this exercise, alumni will share a problem they are facing and receive recommendations from their peers on how to approach the issue. Alumni and other participants will share ideas for how to resolve the issue, along with commitments to help work towards a solution. Bootcamp instructors will lead this engagement and guide the exercise. We expect to moderate up to six teams per three-hour session. All participants will leave with a new tool for problem solving.

Prerequisites: Prior participation in Bootcamp/Focus Week is encouraged but not expected. No laptop is needed.

Presenters

Claire Stirm

Project Coordinator, UC San Diego | SDSC
Claire Stirm is the Deputy Director of the Incubator and Project Coordinator for the Science Gateways Community Institute (SGCI). 


Monday September 23, 2019 9:00am - 12:30pm PDT
Boardroom East, Catamaran Resort

9:00am PDT

Secure Coding Practices and Automated Assessment Tools
High performance computing increasingly involves the development and deployment of network and cloud services to access resources for computation, communication, data, instruments, and analytics. Unique to the HPC field is the large amount of software that we develop to drive these services. These services must assure data integrity and availability, while providing access to a global scientific and engineering community. Securing your network is not enough. Every service that you deploy is a window into your data center from the outside world, and a window that could be exploited by an attacker.

This tutorial is relevant to anyone wanting to learn about minimizing security flaws in the software they develop or manage. We share our experiences gained from performing vulnerability assessments of critical middleware. You will learn skills critical for software developers and analysts concerned with security.

Software assurance tools – tools that scan the source or binary code of a program to find weaknesses – are the first line of defense in assessing the security of a software project. These tools can catch flaws in a program that affect both the correctness and safety of the code. The tutorial also shows how to use these automated assessment tools to minimize security flaws in the software you develop or manage.
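As a minimal illustration of the kind of weakness such scanners report, consider untrusted input spliced into a shell command. The example below is ours, not drawn from the tutorial materials:

```python
# Illustrative sketch: the injection-style flaw that source-code
# scanners commonly flag, and a safer alternative.

def risky_command(filename):
    # BAD: untrusted input is concatenated into a shell command string;
    # a filename like "x; rm -rf ~" injects a second command.
    return "ls -l " + filename

def safer_command(filename):
    # BETTER: pass an argument vector (e.g. to subprocess.run);
    # no shell ever parses the untrusted string.
    return ["ls", "-l", filename]

untrusted = "notes.txt; rm -rf ~"
print(risky_command(untrusted))   # the injected command survives intact
print(safer_command(untrusted))  # the whole string is one harmless argument
```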

Content level: 50% beginner, 25% intermediate, 25% advanced. The target audience for this tutorial is anyone involved with the development, deployment, assessment, or management of critical software.

Prerequisites: To gain maximum benefit from this tutorial, attendees should be familiar with the process of developing software and at least one of C, C++, Java, or a scripting programming language. This tutorial does not assume any prior knowledge of security assessment or vulnerabilities. The hands-on exercise will be packed in a VirtualBox image, which will be available to attendees before the tutorial session (and available on the web and memory sticks at the tutorial). The VirtualBox image will be pre-configured and ready to run (on Linux, Windows, and MacOS) with example code and step-by-step instructions.

To attend this tutorial, you will need to:
1. Bring your own laptop.
2. Have VirtualBox installed on your machine.
   a. Go to https://www.virtualbox.org/wiki/Downloads and download VirtualBox 5.2.30 for your platform. If you already have VirtualBox installed and its version is lower than the very new 6.0, you should be fine. Note that the binary for 5.2.30 is at https://www.virtualbox.org/wiki/Download_Old_Builds_5_2 (third bullet).
   b. Run the downloaded installer.
   c. Check that you are able to run VirtualBox.
3. For the class exercises, we will use a virtual machine image.

Please download it from:
http://www.cs.wisc.edu/mist/trusted-ci-ubuntu-mini-2019.ova (2.4 GB)
Save it on the local disk of the machine you will be using for the tutorial. If you have problems downloading this image, we will have copies at the class.
If you have any questions before the tutorial, please contact elisa@cs.wisc.edu




Monday September 23, 2019 9:00am - 12:30pm PDT
Cockatoo Room, Catamaran Resort

9:00am PDT

Portable, Reproducible High Performance Computing In the Cloud (All-day Tutorial)
This tutorial will focus on providing attendees exposure to state-of-the-art techniques for portable, reproducible research computing, enabling them to easily transport analyses from cloud to HPC resources. We will introduce open source technologies such as Jupyter, Docker, and Singularity, the emerging "serverless" computing paradigm, and how to utilize these tools within two NSF-funded cyberinfrastructure platforms, the Tapis API (formerly Agave API) and the Abaco API. The approaches introduced not only increase application portability and reproducibility but also reduce or eliminate the need for investigators to maintain physical infrastructure, so that more time can be spent on analysis. For the tutorial, attendees will have access to allocations on XSEDE Jetstream and one or more HPC resources such as TACC’s Stampede2 or Frontera.

Target Audience: This tutorial is targeted to CI professionals and researchers that are interested in learning to use container technologies for research computing, and leveraging national cyberinfrastructure platforms to execute containerized compute jobs on cloud and HPC resources.

Content Level: Beginner 70%, Intermediate 30%

Prerequisites: Basic familiarity with Linux, SSH, and the command line will be assumed. A valid, active TACC account will be needed to complete the exercises (attendees can register for a TACC account for free on the TACC User Portal: https://portal.tacc.utexas.edu/account-request ). Some familiarity with Python will be helpful but not required. Attendees must bring their own laptops.
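The portability idea above can be sketched with a minimal container recipe; the base image, packages, and script name below are illustrative assumptions, not the tutorial's actual materials:

```dockerfile
# Illustrative Dockerfile: package an analysis script and its
# dependencies so the same image runs on cloud or HPC resources.
FROM python:3.9-slim
RUN pip install --no-cache-dir numpy pandas
COPY analyze.py /app/analyze.py
ENTRYPOINT ["python", "/app/analyze.py"]
```

On an HPC cluster where Docker is unavailable, Singularity can typically execute the same image pulled from a registry (e.g. `singularity exec docker://<image> ...`), which is what makes the cloud-to-HPC handoff portable.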


Monday September 23, 2019 9:00am - 5:00pm PDT
Toucan Room, Catamaran Resort

10:30am PDT

Tutorial Coffee Break
Monday September 23, 2019 10:30am - 11:00am PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

12:30pm PDT

Lunch
Monday September 23, 2019 12:30pm - 1:30pm PDT
Beach, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:30pm PDT

Creating Science Gateways with GenApp, Containers and Abaco
GenApp is a framework for rapidly building and deploying graphical front ends for underlying computational modules. GenApp builds full-featured applications on an extensible variety of web and local GUI-based targets. GenApp works by reading a collection of definition files which guide the assembly of code fragments to output the application. In particular, GenApp can build fully functioning science gateways with a rich set of features. Science gateways built with GenApp can utilize OAuth2 for users to register and log on with XSEDE, Google, or other credentials. Underlying jobs can run on a variety of resources, including direct execution via SSH, elastically via OpenStack, on queue-managed resources via Apache Airavata, and, newly, in containers locally and via Abaco. GenApp is currently being successfully used as the generator of multiple production science gateways. Abaco is an NSF-funded web service and distributed computing platform providing functions-as-a-service (FaaS) to the research computing community. Abaco implements functions using the Actor Model of concurrent computation. This tutorial will be presented in two 90-minute segments. The first will cover practical usage of advanced user interface methods. The second will cover GenApp execution methods, in particular building containers from GenApp modules and running jobs in containers locally and via Abaco.

Skill level: Although this tutorial’s skill level is listed as “Intermediate," beginners should consider taking advantage of our on-line GenApp basics tutorial and available interactive training described under prerequisites.

Prerequisites: Participants should bring a laptop with an SSH client and a modern web browser installed. Working knowledge of some text editor under Linux is required, such as nano, vi, or emacs. The instructors will arrange for students to have access to cloud-based training accounts. Attendees are expected to familiarize themselves with the material covered in the GenApp basics tutorial available at http://genapp.rocks/learn. On-line training will be available beforehand to assist attendees with the GenApp basics, if requested. For more information on additional training or to ask any questions, please subscribe to the users’ mailing list at http://genapp.rocks/join. For the second session, participants are additionally requested to set up a Docker account; you can do this at https://docker.com by clicking “Sign in” near the top right. Second-session participants are also requested to set up a TACC account; you can do this at https://portal.tacc.utexas.edu by clicking “Sign in” at the top right and following the instructions under “Don’t have a TACC Account?”.

History: This tutorial builds upon the Gateways 2016 and Gateways 2018 tutorials. (The Gateways 2016 tutorial is available on-line at http://genapp.rocks/learn.) The first 90-minute session is a repeat of that provided in the Gateways 2018 tutorial. The second 90-minute session contains all new material.


Monday September 23, 2019 1:30pm - 5:00pm PDT
Boardroom West, Catamaran Resort

1:30pm PDT

Deploy computations and workflows, at-scale, on the Open Science Grid
Could your Gateway or other computational work benefit from the ability to concurrently run hundreds or thousands of independent computations, for free? The Open Science Grid (OSG) facilitates distributed high-throughput computing (dHTC) via a partnership of national labs, universities, and other organizations who contribute and share computing capacity for use by researchers across and beyond the United States. Individual researchers, institutions, or multi-institutional collaborations can access OSG via local submission points or through the OSG Connect service (freely available to U.S. academic, government, and non-profit researchers).

The OSG is perhaps the most scalable resource for computational work that can be run as numerous independent jobs, making it an ideal fit for many existing and future gateways. Individual users regularly occupy thousands of CPU cores across jobs when each runs for less than a day on a single core, achieving greater parallelization than on any individual cluster. Computational work runs in the OSG via the HTCondor job scheduler, which can be integrated with numerous workflow tools (Pegasus, TOIL, CCTools, HTCondor's own DAG Manager, etc.) and made interchangeable with submission to other schedulers. Available capacity includes not only the significant CPU and data storage in OSG, but also GPUs and seamless expansion to cloud resources. The OSG offers multiple services in support of the Science Gateway Community, including consulting on workflow design/optimization, data handling, and software solutions via help@opensciencegrid.org.

During this 3-hour tutorial, you'll learn to run examples of large HTC workloads and multi-step workflows via the OSG Connect service, including discussion of the support available to gateway developers through OSG. If time permits, the OSG User Support team will also help you run your own sample workload on OSG.

Skill Level: Intermediate

Prerequisites: Familiarity with the Unix command line; familiarity with SSH connection to a remote server

Requirements: Participants will need to bring a laptop with WiFi and SSH capabilities (e.g. PuTTY for Windows or Terminal for Mac/Linux laptops); learning accounts will be distributed to participants; no additional software will need to be installed ahead of the tutorial.
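For a flavor of the hands-on portion, a high-throughput HTCondor workload is described by a submit file along the lines of the sketch below (the executable and file names are illustrative, not from the tutorial):

```
# Illustrative HTCondor submit file: run 100 independent instances
# of the same executable, one job per $(Process) value 0..99.
executable = analyze.sh
arguments  = $(Process)
output     = job.$(Process).out
error      = job.$(Process).err
log        = jobs.log
request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB
queue 100
```

Submitting this with `condor_submit` queues all 100 jobs at once; OSG then matches them to available capacity across contributing sites.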


Monday September 23, 2019 1:30pm - 5:00pm PDT
Cockatoo Room, Catamaran Resort

1:30pm PDT

How to Create and Maintain an Effective Information Architecture and Navigation System for Science Gateway Websites
Part one of IA-SEO tutorial: https://drive.google.com/file/d/1Ew02M7NrAz0gx4c8pYCq7TwyQJaq-q2K/view
Part two of IA-SEO tutorial: https://tinyurl.com/yxsgs87e

Whether you have an existing Science Gateway website or are creating your first one, this hands-on tutorial will show you, step by step, how to create and update gateway websites so that their content is easier to find and easier to use.

As a Science Gateway provides its web-based tools and resources, it is essential that these sites utilize specific usability tests and other research methods to ensure positive and productive experiences with the sites. Successful information architecture (IA), intuitive site navigation, and clear user interfaces (UIs) all rely on knowing where various users expect to find needed information.

Since many Science Gateway creators are educated as subject domain scientists (e.g. biological, chemical, physical, environmental, social, mathematical, and computer scientists), they are not likely to understand the importance of IA and site navigation, including the downstream dependence these have on search engine visibility and user engagement. Additionally, the information architecture and site navigation processes are iterative, requiring ongoing measurement, assessment, and updates. To accomplish optimal findability (which includes the behaviors of browsing, searching, and asking), creators of gateway websites should understand the information architecture vocabulary, different architecture research methods, and when to use each research method to determine the best ways to label and organize content. Then, based on qualitative data (desirability studies, interviews, diary studies, and so forth) and quantitative data (web analytics data for an existing website, search tools, performance-based usability tests, and so forth), gateway managers should create and maintain a navigation system that contains at least five types of navigation schemes.

If the navigation system of a gateway’s site is effective, visitors will have positive user experience (UX), and search engines will be able to properly access a site’s documents and prioritize its content. Ultimately, the site’s IA and navigation system will lead to increased page views per visitor, a low bounce rate, and other conversions such as filling out and submitting forms, creating an account, logging in/out, downloading/uploading resources, and using gateway tools.

This tutorial includes exercises as well as downloadable checklists (PDFs). At the end of this tutorial, attendees will:
  • Know what to put in design templates to make gateway content more findable before & after people arrive on the site
  • Identify & measure significant architecture & navigation metrics
  • Learn various methods to identify, evaluate, & fix common IA/navigation issues
  • Communicate with web professionals, usability/UX professionals, & information architects in an informed way
Skill level: All (Beginner – Advanced). This tutorial is created for gateways of all types & for all stages of web design/development (from planning to site launch). No programming skills required.

Technology requirements: A laptop or notebook computer is highly recommended so attendees can view live gateway sites & useful online tools.


Monday September 23, 2019 1:30pm - 5:00pm PDT
Macaw Room, Catamaran Resort

1:30pm PDT

Science Gateways Bootcamp (Focus Week) Brain Trust: A Problem-Solving Workshop
This session is a repeat of the morning session. Attend one or attend both!

The Science Gateways Bootcamp [now Gateway Focus Week] is designed to help teams in any stage of development for their gateway project - from planning to developing to maintaining their gateway. Principal Investigators and their teams learn core business strategy skills, best practices, and long-term sustainability strategies. Participants engage in hands-on activities to help them articulate the value of their work to key stakeholders and to create a strong development, operations, and sustainability plan.

Bootcamp participants work closely with one another and, as a result, have the opportunity to network and establish relationships with people who are engaged in similar activities. Since the first Bootcamp in April 2017, 210 participants have gone through the Bootcamp curriculum. Those participants have said they find it valuable to the work of their gateways to keep in touch with members of their cohort. They have also expressed interest in diving deeper into their problems and receiving feedback from their cohort and the instructors. This three-hour workshop will give Bootcamp alumni an opportunity to participate in a "Brain Trust" exercise with members of their cohort and other cohorts, as well as other conference participants. Through this exercise, alumni will share a problem they are facing and receive recommendations from their peers on how to approach the issue. Alumni and other participants will share ideas for how to resolve the issue, along with commitments to help work towards a solution. Bootcamp instructors will lead this engagement and guide the exercise. We expect to moderate up to six teams per three-hour session. All participants will leave with a new tool for problem solving.

Prerequisites: Prior participation in Bootcamp/Focus Week is encouraged but not expected. No laptop is needed.

Presenters

Claire Stirm

Project Coordinator, UC San Diego | SDSC
Claire Stirm is the Deputy Director of the Incubator and Project Coordinator for the Science Gateways Community Institute (SGCI). 


Monday September 23, 2019 1:30pm - 5:00pm PDT
Boardroom East, Catamaran Resort

3:00pm PDT

Tutorial Snack Break
Monday September 23, 2019 3:00pm - 3:30pm PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109
 
Tuesday, September 24
 

7:30am PDT

Registration Opens, Breakfast Available
Tuesday September 24, 2019 7:30am - 8:30am PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

8:30am PDT

Welcome & Awards
The Science Gateways Community Institute (SGCI) is pleased to welcome you to the Gateways 2019 conference, the fourth sponsored by SGCI. In addition to welcoming you to the next two days of the conference, SGCI's Workforce Development area will announce the winners of the Young Professional of the Year award.

Presenters

Katherine Lawrence

Associate Director, Community Engagement & Exchange, U of Michigan/Science Gateways Community Institute
I help people creating advanced digital resources for research and education connect their projects with helpful services, expertise, and information. Ask me how the Science Gateways Community Institute can support your projects--at no cost--to better leverage the people and money...


Tuesday September 24, 2019 8:30am - 9:00am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

9:00am PDT

Keynote: James Taylor on "Galaxy: From genomic science gateway to global community"
Presenters

James Taylor

Professor, Johns Hopkins University
James Taylor is the Ralph S. O'Connor Professor of Biology and professor of computer science at Johns Hopkins University. Until 2014, he was an associate professor in the departments of biology and mathematics and computer science at Emory University. He is one of the original developers...


Tuesday September 24, 2019 9:00am - 10:00am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:00am PDT

Coffee Break
Tuesday September 24, 2019 10:00am - 10:30am PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:30am PDT

Enabling rich data sharing for Science Gateways via the SeedMeLab platform
Science Gateways provide an easily accessible and powerful computing environment for researchers. These are built around a set of software tools that are heavily used by large research communities in specific domains. Science Gateways have been catering to a growing need among researchers for easy-to-use computational tools; however, their usage model is typically single-user-centric. As scientific research becomes ever more team-oriented, the need for integrated collaborative capabilities in Science Gateways has been emerging. One such need is the ability to share data and results with others. In this article, we describe and discuss our effort to provide a rich environment for data organization and sharing by integrating the SeedMeLab platform with two Science Gateways: CIPRES and GenApp.


Tuesday September 24, 2019 10:30am - 10:50am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:30am PDT

Gateway design features for undergraduate education communities
Like science gateways, an education gateway should provide research and management support for a community of practitioners working collaboratively to solve a set of challenging problems. While the technical aspects of the cyberinfrastructure play an important role in the utility of a gateway, they are not sufficient to attract users who are new to collaborative, online scholarship. Over the course of the development of the Quantitative Undergraduate Biology Education and Synthesis (QUBES) gateway we have learned to adapt our services and messaging to reach out to our target audience and recruit their participation. Part of this process has involved aligning our services with common project management challenges and being aware of the opportunities and constraints faced by teaching faculty. Adopting a client-centered approach has made it possible not only to build our user base, but to foster important conversations among users around promoting a shared culture that supports scholarly approaches to teaching.

Presenters

Michael LaMar

Associate Professor, College of William and Mary


Tuesday September 24, 2019 10:30am - 10:50am PDT
Toucan Room, Catamaran Resort

10:50am PDT

iReceptor: A case study in the importance of standards for data sharing
Next-generation sequencing (NGS) allows the characterization of the adaptive immune receptor repertoire (AIRR) in exquisite detail. These large-scale AIRR-seq data sets have rapidly become critical to vaccine development, understanding the immune response in autoimmune and infectious disease, and monitoring novel therapeutics against cancer. Over the past five years, a grassroots, international community (the AIRR Community - www.airr-community.org) has been working towards establishing standards and recommendations for obtaining, analyzing, curating, and comparing/sharing NGS AIRR-seq datasets. Using these standards, the AIRR Community Common Repository Working Group (CRWG) is working towards establishing an international network of AIRR-seq repositories whose data are findable, accessible, interoperable, and reusable (FAIR).

The iReceptor Data Integration Platform (gateway.ireceptor.org) provides an implementation of the AIRR Data Commons envisioned by the AIRR Community. The iReceptor Scientific Gateway links distributed (federated) AIRR-seq repositories, allowing sequence searches or repertoire metadata queries across multiple studies at multiple institutions, returning sets of sequences fulfilling specific criteria. The data standards developed by the AIRR Community are at the foundation of our ability to implement such a platform. In this paper we use iReceptor as a case study that considers the importance of standards for effective data sharing.

The short paper will discuss the process that the AIRR Community went through to establish its working groups and the standards those working groups produced. This will include discussions of the Minimal Information for AIRR-seq data (MiAIRR) standard, the Standardized Representations for Annotated Immune Repertoires, and the emerging AIRR Data Commons Web API. Each of these standards will be discussed in the context of the iReceptor Platform, in terms of its importance to the platform's implementation as well as its expected usefulness to the scientific community.


Tuesday September 24, 2019 10:50am - 11:10am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:50am PDT

External Communication to Diffuse Science Gateways and Cyberinfrastructure for Research with Big Data
In the era of big data, for science gateway (SG) and cyberinfrastructure (CI) projects to have the greatest impacts, their tools need to be widely adopted in the scientific community. However, diffusion activities are often an afterthought in SG/CI projects. We warn against the fallacy of ‘If You Build It, They Will Come’: projects should be intentional in promoting tool adoption. We identified five external communication practices based on an analysis of 20 interviews with administrators, developers, users, and outreach educators working in CI across the US. The practices include raising awareness of the innovations, engaging in educational outreach, building relationships with trust, networking with the community, and keeping a track record of reliability. While exploratory in nature, the findings can be used as a guideline for projects to promote SG/CI diffusion. The paper serves as evidence to justify bigger budgets from funders for diffusion activities to increase adoption and broaden impacts.


Tuesday September 24, 2019 10:50am - 11:10am PDT
Toucan Room, Catamaran Resort

11:10am PDT

Purdue University Research Repository - adapting when small data gets bigger
PURR was founded in 2011 as a partnership between Purdue University Libraries, Information Technology at Purdue (ITaP), and the Office of the Executive Vice President for Research to provide campus-wide support for researchers throughout the data management lifecycle. It is built on the HUBzero® platform, which was developed at Purdue. PURR provides the tools and expertise to help researchers plan for data management, share data with collaborators, publish completed datasets in compliance with federal funding guidelines, safely archive data, and track data publication impact. Every PURR user has access to private space for storing and sharing research data files. When research is completed, PURR takes users through a step-by-step process for selecting and describing data files for publication. Upon publication, PURR mints a DOI for each dataset and provides archiving services through the MetaArchive network. All published datasets are maintained and accessible on the PURR website for at least 10 years, after which they will be reviewed by the Libraries and could be decommissioned or moved to library archives.

Over the past eight years, PURR has published 975 datasets and served over 3,600 researchers with 481 grant awards. In that time, PURR’s services have grown along with the HUBzero® platform to meet the changing needs of the Purdue community as researchers across all fields produce more data. Supporting larger datasets requires a multi-faceted approach far beyond simply acquiring additional storage space. Our recent development has followed a 5-pronged plan: 1) increased storage quotas, 2) new publication series functionality, 3) an online database viewer, 4) publication file preview, and 5) seamless FTP transfers for large publications. Combined, these improvements ensure our increasingly large data publications are not only stored safely, but also are accessible over the long term.

The newly published Rough Cilicia Survey Pottery Study dataset series illustrates both the motivation for and the results of PURR’s recent development. The culmination of four years of close collaboration between PURR’s data curator and a faculty member from Purdue’s classics department, the Rough Cilicia collection is composed of 25 datasets. The collection takes advantage of PURR’s series functionality, which allows authors to separate large data collections into smaller, more manageable, related subsets. These subsets are easier to download than the entire collection, and each subset has a DOI for precise citation. This series makes available images of hundreds of pottery sherds from the ancient Cilicia region of modern-day Turkey, and their associated descriptive information in a series of interactive data tables that allow the user to view, search, and filter data on the PURR website. Users can also download the data files for closer study and reuse. At about 15 GB, the Rough Cilicia series is not exactly “big data,” but it is large enough to stretch the limits of a web-based repository like PURR, and we are increasingly seeing datasets of this size or more. Moderate improvements like the five mentioned here allow us to publish larger datasets while maintaining the ease and convenience of serving users through a web browser.

Presenters

Claire Stirm

Project Coordinator, UC San Diego | SDSC
Claire Stirm is the Deputy Director of the Incubator and Project Coordinator for the Science Gateways Community Institute (SGCI). 

Sandi Caldrone

Purdue University Libraries


Tuesday September 24, 2019 11:10am - 11:20am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:10am PDT

TAMU HPRC Portal: Leveraging Open OnDemand for Research and Education
The Texas A&M University High Performance Research Computing (TAMU HPRC) Portal is a local installation and adaptation of Open OnDemand (OOD) on the HPRC clusters. The portal provides an advanced cyberinfrastructure that enables HPRC users with various backgrounds to utilize the High Performance Computing (HPC) resources for their research. It also serves as an educational platform for HPRC staff to train their users with cluster technologies and HPC applications.
Using OOD for the HPRC portal has three benefits. First, it provides a single point of access to all the HPC resources via a web browser and can greatly simplify HPC workflows. Second, it provides an intuitive user interface that significantly reduces the barrier between users and HPC working environments. Third, the extensible and scalable design makes it easy to accommodate a growing number of users and applications.
In addition to the out-of-the-box features, we have extensively customized the Matlab interface for our local needs. We have also developed a dynamic form generation scheme that makes the portal app deployment and management more efficient. We have used the portal in multiple training programs and have received positive feedback from the instructors and the users.
To understand the impact of the portal on our users, we analyzed portal access data and conducted a survey among HPRC portal users. We received 148 survey responses out of 554 users who have accessed the portal between March 22, 2018 and April 24, 2019. The responses demonstrate that most users think the apps are useful and they would recommend the portal to other users. Additionally, we provide two use cases from our portal users, one for research and one for training, to demonstrate the usefulness of the portal.
Our paper is the first to describe the experience with OOD at an HPC site outside of OSC. Overall, the TAMU HPRC Portal based on OOD provides a robust and simple solution for both novice and experienced users at TAMU HPRC to access HPC resources. It is a valuable addition to the traditional command-line-based approach.
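The dynamic form generation scheme mentioned above is not specified in the abstract; as a rough illustration of the idea (all app names, fields, and widget types here are hypothetical), a single declarative spec per app can be expanded into form fields, so deploying a new portal app means adding a spec rather than hand-writing a form:

```python
# Hypothetical sketch of dynamic form generation for portal apps:
# one declarative spec per app is expanded into form-field
# definitions, so app deployment and management stay data-driven.
APP_SPECS = {
    "matlab": {
        "versions": ["R2018b", "R2019a"],
        "max_hours": 24,
        "node_types": ["default", "gpu"],
    },
}

def generate_form(app_name):
    """Expand an app spec into a list of form-field definitions."""
    spec = APP_SPECS[app_name]
    return [
        {"name": "version", "widget": "select", "options": spec["versions"]},
        {"name": "walltime_hours", "widget": "number",
         "min": 1, "max": spec["max_hours"]},
        {"name": "node_type", "widget": "select",
         "options": spec["node_types"]},
    ]

form = generate_form("matlab")
```

In Open OnDemand itself, interactive app forms are typically described declaratively (e.g., in YAML) and rendered by the portal; the sketch above only mirrors that spec-to-form expansion in plain Python.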


Tuesday September 24, 2019 11:10am - 11:20am PDT
Toucan Room, Catamaran Resort

11:20am PDT

Search SRA Gateway for Metagenomics Data
The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) houses all publicly available biological DNA sequence data to enhance reproducibility, reduce redundancy, and allow new discoveries through data comparison. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD System®, Helicos HeliScope®, Complete Genomics®, and Pacific Biosciences SMRT®. The world's largest database of sequences, the SRA is growing at the alarming rate of 10 TB per day. Yet this data is inaccessible to most researchers, because searching the datasets requires large storage and computing facilities, and most individual laboratories lack the computing capacity to deal with this volume of data.
Empowering scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. Together with XSEDE ECSS support, we developed a gateway (https://www.searchsra.org/) that provides computational analysis of a subset of the SRA, focused on metagenomic sequences. These sequences come from diverse environments, and their analysis is computationally challenging. Our users submit a DNA or protein sequence to be compared against all of the known sequences in the public databases. The computation is performed on the XSEDE cloud resource Jetstream, and the data are housed on the XSEDE Wrangler resource. Results from the computation are retained only briefly, so users should download their outputs promptly.
Future improvements will provide data versioning and integrity, a wider range of search algorithms, and integration of other applications into the gateway to streamline direct job submission and result retrieval.
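The search pattern described above — one query compared against a large, partitioned reference collection — can be sketched as a scatter/gather computation. In the toy sketch below, a substring test stands in for a real sequence-comparison tool, and the shard contents and run accessions are invented for illustration:

```python
# Scatter/gather sketch: the reference collection is split into shards
# that can be searched independently (e.g., one worker VM per shard),
# and per-shard hits are merged into one result list for the user.
from concurrent.futures import ThreadPoolExecutor

SHARDS = [
    {"SRR001": "ACGTACGTGGCC", "SRR002": "TTGACCA"},
    {"SRR003": "GGCCACGA", "SRR004": "ACGTACGT"},
]

def search_shard(shard, query):
    """Toy matcher: report runs whose sequence contains the query."""
    return [run_id for run_id, seq in shard.items() if query in seq]

def search_all(query):
    with ThreadPoolExecutor() as pool:
        results = pool.map(search_shard, SHARDS, [query] * len(SHARDS))
    return sorted(hit for hits in results for hit in hits)

hits = search_all("ACGTACGT")  # → ['SRR001', 'SRR004']
```

The appeal of this pattern for a gateway is that shards have no dependencies on one another, so the search scales out across cloud instances in an embarrassingly parallel way.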


Tuesday September 24, 2019 11:20am - 11:40am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:20am PDT

Using a Scientific Gateway to Build STEM Education Capacity Around the World
With a broad focus on STEM education, STEMEdhub.org brings together university researchers in STEM disciplines with other researchers, K-12 teachers and practitioners, students, and the public through the site's various groups and projects. Built on Purdue's HUBzero architecture, STEMEdhub.org is a fully functional gateway for hosting interactive scientific tools, online presentations, wikis, and documents such as assessment plans and courses, available for download or interactive editing, complemented by document tagging to enable searching and a rating tool for commenting on shared resources. STEMEdhub has been used for over 8 years to build capacity in many areas of STEM education in the United States and throughout the world. It currently hosts over 6,000 users in 160 user groups with over 1,300 published resources. More importantly, STEMEdhub.org allows users to create and manage their own groups, resources, and communities of practice, enabling it to operate with very little overhead and a small staff. While other science gateways focus on high-performance computing capabilities, STEMEdhub is focused on using a science gateway platform to make connections, build partnerships, and engage students. This demo will show how STEMEdhub.org is used as a science gateway to build STEM education capacity throughout the world.

Presenters

Ann Bessenbacher

Data Scientist, ELRC/Purdue University


Tuesday September 24, 2019 11:20am - 11:40am PDT
Toucan Room, Catamaran Resort

11:40am PDT

ESS-DIVE: A Scalable Community Repository for Managing Earth and Environmental Science Data
This demonstration presents the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), a new Department of Energy (DOE) web-based data repository serving the earth and environmental science community. The multidisciplinary ESS-DIVE team consists of computer scientists, environmental scientists, and digital librarians who have come together to build this system. We will highlight the end-to-end features of ESS-DIVE to showcase its unique capabilities, including (1) implementation of data standards and an HTTP API using JSON-LD, (2) a publication workflow with automated DOI generation, (3) scalable, repeatable containerized infrastructure through Docker, (4) core capabilities based on the NCEAS Metacat and MetacatUI software, including ORCID-based single sign-on, data search and access, data publication, and dataset management, and (5) federated data access and replication on the DataONE network.
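As a hedged illustration of capability (1), a dataset metadata record expressed as JSON-LD might be assembled as below. The abstract does not spell out ESS-DIVE's exact metadata fields, so the schema.org terms and helper function here are only indicative:

```python
import json

def make_dataset_record(name, creator_orcid, doi=None):
    """Build a minimal schema.org Dataset record as JSON-LD.
    Field choices are illustrative, not ESS-DIVE's exact schema."""
    record = {
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": name,
        "creator": {"@type": "Person", "@id": creator_orcid},
    }
    if doi:  # an identifier is attached once a DOI has been minted
        record["identifier"] = {"@type": "PropertyValue",
                                "propertyID": "DOI", "value": doi}
    return json.dumps(record)

payload = make_dataset_record("Soil respiration 2018",
                              "https://orcid.org/0000-0002-1825-0097")
```

Because JSON-LD records carry their own `@context`, a repository API can accept and serve them without clients needing out-of-band schema documentation.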


Tuesday September 24, 2019 11:40am - 12:00pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:40am PDT

Open OnDemand: State of the Platform and the Project
High-performance computing (HPC) has led to remarkable advances in science and engineering and has become an indispensable tool for research. Unfortunately, HPC use and adoption by many researchers is often hindered by the complex way in which these resources are accessed. Indeed, while the web has become the dominant access mechanism for remote computing services in virtually every computing area, it has not for HPC. Open OnDemand is an open-source project providing web-based access to HPC resources (https://openondemand.org). This paper describes the challenges to adoption and other lessons learned over the three-year project that may be relevant to other science gateway projects, and describes future plans for the Open OnDemand 2.0 project.


Tuesday September 24, 2019 11:40am - 12:00pm PDT
Toucan Room, Catamaran Resort

12:00pm PDT

Lunch
Tuesday September 24, 2019 12:00pm - 1:00pm PDT
Beach, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:00pm PDT

Learning Labs
Tuesday Learning Labs (1:00-1:45pm)
Table Numbers and Topics for this time slot
  1. Cybersecurity Risks for Science Gateways
    This forum, hosted by a cybersecurity analyst affiliated with SGCI, will provide an opportunity for gateway operators and developers to learn more about security risks applicable to gateways and to share their own security challenges. Gateway operators will be able to take away solutions to common risks as well as receive direct advice and insight into more unique challenges. Hosted by Mark Krenz.
  2. Gateway Ambassadors - Join our community for community builders
    SGCI is in the process of spinning up a community of Gateway Ambassadors. Gateway Ambassadors serve as community builders, making connections between people, experts, and resources on campus and in distributed projects. Come brainstorm ideas and concepts for what you would like to see in such a community. You can join even if you don't want to fill the role of a Gateway Ambassador yourself; anyone interested is welcome to participate. Hosted by Sandra Gesing.
  3. Gateway Data Wrangling
    What do you find most challenging about transferring & managing data in your gateway? Lee is from the Globus team and wants to know what Globus can do to make life easier for gateway developers. Hosted by Lee Liming.
  4. Workflow Frameworks
    Does your gateway currently support user-defined workflows? If so, what framework do you use and how? Hosted by Rajesh Kalyanam.
  5. Self-hosted HUBzero Community
    A call for self-hosted HUBzero site developers and potential open source contributors to brainstorm on ways to foster and sustain a community of support. Hosted by Jack Smith.
  6. Training and supporting communities to engage with your Gateway
    Training and supporting researchers to use, extend, or customize a scientific Gateway. This BoF is being organized by Dave Clements and Mo Heydarian, who would be happy to have additional attendees of Gateways 2019 lead the discussion around training.
  7. Automation tech and Jetstream
    "Do it once, and it's done. Do it twice, and you should have automated." "Anything that you do more than twice has to be automated." "Three strikes and you automate." Wherever you draw the line, automation is a critical part of running a successful science gateway. Come join other developers, engineers, and DevOps practitioners to share your experiences automating all the things on Jetstream and the commercial cloud. Hosted by Rion Dooley.

About Learning Labs
“Learning Labs” may be one of several styles of impromptu learning:
  • Pop-up BOFs (Birds-of-a-Feather Sessions)
  • Mini Hacks
  • Coffee-Break Conversations
We will have three 45-minute periods devoted to your ideas. Round tables will be set up for you to meet with others and discuss the topics of your choice. Here are the ways you can get involved:
  1. Propose a topic that you’d be willing to host. (You don’t have to be an expert, just interested!) 
  2. Find a topic that interests you, and join a table!
Submit your topic with this Google form by Tuesday, September 24 at 7pm Pacific: https://forms.gle/1Th5svq13VeKreLE7.

We’ll announce when and where the first round is happening by Tuesday morning on Sched, and the second round by Wednesday morning. We may be able to insert additional topics after the deadline if space is available.


Tuesday September 24, 2019 1:00pm - 1:45pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:50pm PDT

Measuring Success: How Science Gateways Define Impact
Science gateways, also known as advanced web portals, virtual research environments and more, have changed the face of research and scholarship over the last two decades. Scholars world-wide leverage science gateways for a wide variety of individual research endeavors spanning diverse scientific fields. Evaluating the value of a given gateway to its constituent community is critical in obtaining the financial and human resources to sustain gateway operations. Accordingly, those who run gateways must routinely measure and communicate impact. Just as gateways are varied, their success metrics vary as well. In this survey paper, a variety of different gateways briefly share their approaches.


Tuesday September 24, 2019 1:50pm - 2:20pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:50pm PDT

DDX-Interface: An interface to and a factory of interoperable scientific gateways.
Data access and distribution is an ongoing problem in science, affecting many research fields, from genomic information to microscopic images of rocks. Issues such as differing database schemas and file formats, the inability to customize or enforce laboratory terms of use, infrastructure failure, and financial limitations have reduced public access to scientific data. Centralized solutions have been funded by government agencies in an attempt to expand access, but often valuable resources cannot be published in repositories without being part of a peer-reviewed publication. Previously, we proposed to answer the demand for public access to raw scientific data using Open Index Protocol, a specification for publishing metadata into a public blockchain-based ledger and hosting files in peer-to-peer file systems. With this method, 30 TB of cryo-electron tomography datasets are publicly available today. We have now generalized this idea to let researchers publish any kind of scientific dataset using a distributed public ledger as a common index between interoperable databases. Here we describe a customizable gateway capable of exploring these distributed databases and publishing new records. The basic gateway design is built to be intuitively operable by academic and non-academic users alike, expanding the reach of the data distribution. As Open Index Protocol becomes a popular choice for data distribution by laboratories, focus on the user experience of the interface for data consumption will be key to achieving its full impact on society.

In the demo part of this presentation, we will demonstrate how to build a distributed database to share scientific data using Open Index Protocol, a specification to publish metadata on FLO blockchain and use a peer-to-peer file system for file storage and distribution. To begin, we will launch an instance of DDX and publish the metadata schema to the blockchain. Next, we will publish a few datasets to the database using the schema. Then, we will configure the explorer template and customize it to create a static webpage capable of exploring, searching & downloading the datasets published. A remote colleague will run another instance of DDX, configured to be compatible with the database we just created, and will use it to publish some records. We will be able to visualize their records in our own instance of DDX. Finally, we will show how to deploy and build a static website that serves as a gateway to visualize the records in the newly created database. In this brief demonstration, we will show the flexibility and power of this distributed resource for increasing access to raw datasets. Our main goal is to make it easy for researchers to participate in the effort to host and share their own data.
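The publish flow in the demo can be sketched under a strong simplification: treat the blockchain index as an append-only list, publish a metadata schema first, validate records against it, and reference files by content address rather than storing them in the ledger. All names and the ledger representation below are hypothetical:

```python
# Highly simplified sketch of an Open-Index-Protocol-style flow:
# metadata lives in an append-only public index; files live in a
# peer-to-peer file system and are referenced by content address.
LEDGER = []  # stands in for the blockchain-based ledger

def publish_schema(name, required_fields):
    LEDGER.append({"kind": "schema", "name": name,
                   "fields": required_fields})

def publish_record(schema_name, metadata, file_address):
    """Validate a record against its published schema, then index it."""
    schema = next(e for e in LEDGER
                  if e["kind"] == "schema" and e["name"] == schema_name)
    missing = [f for f in schema["fields"] if f not in metadata]
    if missing:
        raise ValueError(f"record missing fields: {missing}")
    LEDGER.append({"kind": "record", "schema": schema_name,
                   "meta": metadata, "file": file_address})

publish_schema("tomogram", ["title", "microscope"])
publish_record("tomogram",
               {"title": "Sample A", "microscope": "Titan Krios"},
               "QmExampleContentHash")
```

Because every compatible gateway instance reads the same shared index, a record published by a remote colleague becomes visible to all instances without any direct database-to-database link — the interoperability the demo aims to show.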


Tuesday September 24, 2019 1:50pm - 2:20pm PDT
Toucan Room, Catamaran Resort

2:20pm PDT

Chatbot Guided Domain-science Knowledge Discovery in a Science Gateway Application
Neuroscientists are increasingly relying on high-performance/high-throughput computing resources for experimentation on voluminous data, analysis, and visualization at multiple neural levels. Though current science gateways provide access to computing resources, datasets, and tools specific to their disciplines, neuroscientists require guided knowledge discovery at various levels to accomplish their research and education tasks. This guidance can help them navigate relevant publications, tools, topic associations, and cloud platform options as they accomplish important research and education activities. To address this need and to spur research productivity and rapid learning-platform deployment, we present "OnTimeRecommend", a novel recommender system that comprises several integrated recommender modules exposed through RESTful web services. We detail a neuroscience use case in a CyNeuro science gateway, and show how the OnTimeRecommend design can enable novice/expert user interfaces, as well as template-driven control of heterogeneous cloud resources.

Dr. Songjie Wang will be presenting this paper on behalf of the project.


Tuesday September 24, 2019 2:20pm - 2:30pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

2:20pm PDT

Enabling Data Streaming-based Science Gateway through Federated Cyberinfrastructure
Large scientific facilities are unique and complex infrastructures that have become fundamental instruments for enabling high-quality, world-leading research tackling scientific problems at unprecedented scales. Cyberinfrastructure (CI) is an essential component of these facilities, providing the user community with access to data, data products, and services with the potential to transform data into knowledge. However, the timely evolution of the CI available at large facilities is challenging and can result in science communities' requirements not being fully satisfied. Furthermore, integrating CI across multiple facilities as part of a scientific workflow is hard, resulting in data silos.

In this paper, we explore how science gateways can provide improved user experience and services that may not be offered at the large facilities datacenter. Using a science gateway supported by the Science Gateway Community Institute that provides subscription-based delivery of streamed data and data products from the NSF Ocean Observatories Initiative (OOI), we propose a system that enables streaming-based capabilities and workflows using data from large facilities such as OOI in a scalable manner. We leverage data infrastructure building blocks such as the Virtual Data Collaboratory that provides data and computing capabilities in the continuum to efficiently and collaboratively integrate multiple data-centric CI, build data-driven workflows and connect large facilities data sources with NSF-funded CI such as XSEDE. We also introduce architectural solutions for running these workflows using dynamically provisioned federated CI.


Tuesday September 24, 2019 2:20pm - 2:30pm PDT
Toucan Room, Catamaran Resort

2:30pm PDT

Snack Break
Tuesday September 24, 2019 2:30pm - 3:00pm PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

3:00pm PDT

nanoHUB@home: Expanding nanoHUB through Volunteer Computing
Volunteer computing (VC) uses consumer digital electronics products, such as PCs, mobile devices, and game consoles, for high-throughput scientific computing. Device owners participate in VC by installing a program which, in the background, downloads and executes jobs from servers operated by science projects. Most VC projects use BOINC, an open-source middleware system for VC. BOINC allows scientists to create and operate VC projects and enables volunteers to participate in these projects. Volunteers install a single application (the BOINC client) and then choose projects to support. We have developed a BOINC project, nanoHUB@home, to make use of VC in support of the nanoHUB science gateway. VC has greatly expanded the computational resources available for nanoHUB simulations.

We are using VC to support “speculative exploration”, a model of computing that explores the input parameters of online simulation tools published through the nanoHUB gateway, pre-computing results that have not been requested by users. These results are stored in a cache, and when a user launches an interactive simulation our system first checks the cache. If the result is already available it is returned to the user immediately, leaving the computational resources free and not re-computing existing results. The cache is also useful for machine learning (ML) studies, building surrogate models for nanoHUB simulation tools that allow us to quickly estimate results before running an expensive simulation.

VC resources also allow us to support uncertainty quantification (UQ) in nanoHUB simulation tools, to go beyond simulations and deliver real-world predictions. Models are typically simulated with precise input values, but real-world experiments involve imprecise values for device measurements, material properties, and stimuli. The imprecise values can be expressed as a probability distribution of values, such as a Gaussian distribution with a mean and standard deviation, or an actual distribution measured from experiments. Stochastic collocation methods can be used to predict the resulting outputs given a series of probability distributions for inputs. These computations require hundreds or thousands of simulation runs for each prediction. This workload is well-suited to VC, since the runs are completely separate, but the results of all runs are combined in a statistical analysis.
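The cache-first lookup behind "speculative exploration" can be sketched as below. The abstract describes the cache only at a high level, so the key scheme, tool name, and toy "simulation" here are placeholder assumptions:

```python
# Sketch of a result cache keyed by tool name plus a canonical form
# of the input parameters: pre-computed volunteer-computing results
# answer interactive requests immediately on a cache hit, and misses
# fall through to an actual simulation run.
CACHE = {}

def cache_key(tool, params):
    # sort items so parameter order does not change the key
    return (tool, tuple(sorted(params.items())))

def run_simulation(tool, params):
    return {"energy": sum(params.values())}  # toy stand-in for a real tool

def get_result(tool, params):
    key = cache_key(tool, params)
    if key not in CACHE:           # miss: compute once and store
        CACHE[key] = run_simulation(tool, params)
    return CACHE[key]              # hit: return the stored result

first = get_result("nanowire", {"length": 10, "voltage": 2})
second = get_result("nanowire", {"voltage": 2, "length": 10})  # cache hit
```

The same cached results double as training data for the surrogate-model (ML) studies mentioned above, since they already cover a swept region of each tool's input space.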


Tuesday September 24, 2019 3:00pm - 3:20pm PDT
Toucan Room, Catamaran Resort

3:00pm PDT

The Dark Energy Survey Data Release Infrastructure
In this paper and demo, we present and showcase the Data Release Infrastructure we have developed and deployed using state-of-the-art technologies such as Kubernetes, Jupyter, Celery, and Python, allowing scientists to access, explore, and analyze the catalogs and images generated by the Dark Energy Survey, a community-based scientific project (with over 500 scientists) whose goal is to understand the origin of dark matter and dark energy by surveying the night sky and observing millions of galaxies and stars over a five-year period. This infrastructure includes novel data visualization and exploration tools to enable scientific discovery. We will review the deployment and development process, the scientific output and feedback, and the main features of our gateway.


Tuesday September 24, 2019 3:00pm - 3:30pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

3:20pm PDT

Cloud bursting to AWS from the CIPRES Science Gateway
The role of commercial cloud computing as a source of scalable compute power for science gateways is an area of ongoing investigation. As part of this effort, we are exploring the practicality of cloud bursting to a commercial provider for the CIPRES Science Gateway (CIPRES), a highly accessed gateway that delivers compute resources to users across all fields of biology. CIPRES provides browser and RESTful access to popular phylogenetics codes run on large computational clusters. Historically, CIPRES has submitted compute-intensive jobs to clusters provided through the NSF-funded XSEDE project. An ongoing issue for CIPRES is whether compute time available on XSEDE resources will be adequate to meet the needs of a large and growing user base. Here we describe a partnership with Internet2 to create infrastructure that supports CIPRES submissions to compute resources available through a commercial cloud provider, Amazon Web Services (AWS). This paper describes the design and implementation of the infrastructure created, which allows users to submit a specific subset of CIPRES jobs to V100 GPU nodes at AWS. This new infrastructure allows us to refine and tune job submissions to commercial clouds as a production service at CIPRES. In the short term, the results will speed the discovery process by allowing users greater discretionary access to GPU resources at AWS. In the long term, this infrastructure can be expanded and improved to submit all CIPRES jobs to one or more commercial providers on a fee-for-service basis.
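The routing decision described above — only a specific subset of jobs is sent to AWS GPU nodes, with everything else staying on XSEDE clusters — can be sketched as a simple eligibility check. The tool names, queue labels, and opt-in flag below are assumptions for illustration, not CIPRES's actual configuration:

```python
# Hypothetical sketch of cloud-bursting job routing: a job goes to
# the commercial-cloud GPU queue only if its tool/resource pairing
# has been enabled for bursting and the submission allows it;
# otherwise it follows the traditional cluster path.
CLOUD_ELIGIBLE = {("beast", "gpu"), ("raxml", "gpu")}

def route_job(tool, resource_type, allow_cloud):
    if allow_cloud and (tool, resource_type) in CLOUD_ELIGIBLE:
        return "aws-v100"
    return "xsede-cluster"

target = route_job("beast", "gpu", allow_cloud=True)
```

Keeping the eligibility set small and explicit is what lets a gateway refine and tune commercial-cloud submissions gradually, as the paper describes, before expanding to all job types.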


Tuesday September 24, 2019 3:20pm - 3:30pm PDT
Toucan Room, Catamaran Resort

3:30pm PDT

BASIN-3D: Reducing the data processing burden for earth scientists
Earth scientists expend significant effort synthesizing data often from multiple data sources for both modeling and empirical analyses. We introduce BASIN-3D (Broker for Assimilation, Synthesis and Integration of eNvironmental Diverse, Distributed Datasets) as a data brokering approach to reduce the scientist's data processing burden. BASIN-3D synthesizes diverse data from a variety of remote sources in real-time without the need for additional storage. BASIN-3D is an extendable Django web framework application using a generalized data synthesis model that makes the synthesized data available via REpresentational State Transfer (REST) Application Programming Interface (API). We have currently implemented our data synthesis model to represent sensor-based time series earth science observations across a hierarchy of spatial locations. Supporting our data synthesis model is a plugin framework that allows developers to map data sources of interest to the BASIN-3D synthesis model. In this demo, we give an overview of BASIN-3D's synthesis model and plugin framework and compare direct time-series queries to a public data source with queries to BASIN-3D. Additionally, we demonstrate a web interface built on top of BASIN-3D that provides a usable interface for scientific users including features such as an interactive map, data visualization, and download.
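A minimal sketch of the plugin idea (not BASIN-3D's actual API): each plugin maps one remote data source's records onto the shared synthesis model, so the broker can serve many sources through a single interface without storing copies. Source names and field mappings below are invented:

```python
# Plugin-registry sketch: developers register a mapper per data
# source; the broker synthesizes records on the fly by routing each
# request through the matching plugin.
PLUGINS = {}

def register(source_name):
    """Class decorator that registers a plugin instance by source name."""
    def wrap(cls):
        PLUGINS[source_name] = cls()
        return cls
    return wrap

@register("usgs")
class USGSPlugin:
    def to_synthesis(self, raw):
        # translate source-specific field names into the shared model
        return {"timestamp": raw["dateTime"], "value": raw["val"],
                "source": "usgs"}

def synthesize(source, raw_records):
    plugin = PLUGINS[source]
    return [plugin.to_synthesis(r) for r in raw_records]

rows = synthesize("usgs", [{"dateTime": "2019-09-24T11:40:00", "val": 3.2}])
```

Adding support for a new data source then means writing one mapper class, while the REST layer and synthesis model stay untouched — the extensibility the plugin framework is meant to provide.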


Tuesday September 24, 2019 3:30pm - 3:50pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

3:30pm PDT

Supporting Characterisation Communities with Interactive HPC (Characterisation Virtual Laboratory)
The Characterisation VL is an Australian nationally funded virtual laboratory focused on bringing together the national community around their research data and software. Principally this means imaging techniques, including optical microscopy, CT, MRI, cryo-electron microscopy, and other non-traditional techniques. Characterisation turns out to be very general, but it does have two principal commonalities:
- Data sets are getting larger every day (CryoEM ~2-5TB per dataset, LLSM ~1-10TB per dataset). They are becoming too large for the average workstation and difficult for enterprise IT providers within universities.
- Many data processing tools take the form of desktop applications, requiring interactivity and input from domain experts.
Rather than building a dedicated web interface to a single workflow, the CVL has chosen to provide access to a virtual laboratory with all of the techniques needed by the range of characterisation communities. In this demonstration we will show how easy access to virtual laboratories (science gateways) has benefited the Australian characterisation community, explain the first and second generations of the architecture used, and describe how it can be reused by other computing facilities to benefit their users.


Tuesday September 24, 2019 3:30pm - 3:50pm PDT
Toucan Room, Catamaran Resort

3:50pm PDT

BEACO2N Data Explorer
The BEACO2N website offers an easy-to-use tool for visualizing, comparing, and downloading air quality data. Used in K-12 science curricula and by academic researchers, these data provide measurements of the air we breathe and the factors that influence it.


Tuesday September 24, 2019 3:50pm - 4:10pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

3:50pm PDT

SciServer: Bringing Analysis to Petabytes of Scientific Data
SciServer is a free science gateway that offers access to more than five Petabytes of data across multiple science domains, along with free online tools to analyze, share, and publish results.
SciServer’s online services are entirely browser-based, with no software to install and configure, and are designed to be easy to learn and use. They include familiar user interface components for managing and sharing files, creating groups, and running computational analysis in Python, R, or Matlab by means of Jupyter Notebooks or RStudio.

The SciServer project grew out of an existing system designed to support astronomy research, featuring several research and education tools that made access to hundreds of Terabytes of astronomical data easy and intuitive for researchers, students, and the public. One component of the previous system was Galaxy Zoo, a citizen science project that resulted in reliable classifications of hundreds of thousands of galaxy images and led to more than 40 peer-reviewed scientific publications.

The current SciServer system has scaled out these tools for multi-science-domain support, applicable to any form of data. SciServer has been used in a variety of fields, from oceanography to mechanical engineering to social sciences and finance.

SciServer features a learning environment that is being used in K-12 and university education in a variety of contexts, both formal and informal. We have continued to develop the educational tools into a new component called Courseware, which allows a classroom or course project to be defined, giving teachers and students direct access to hosted scientific data sets.
SciServer has sufficiently impressed our collaborators that three of them have taken the system and deployed it themselves in varied environments. To facilitate this, over the past year we redeveloped the packaging and deployment model to support deployment in Kubernetes clusters. This work then led us to a new deployment of the system in the Amazon Cloud on the EKS platform. This latter installation is allowing us to experiment with the issues around data hosting and data transfer in a hybrid-cloud environment, and with how best to support integration of user data between local and cloud-hosted data sets.

SciServer is being developed by the Institute for Data-Intensive Engineering and Science (IDIES) at Johns Hopkins University, with funding from a five-year award from the National Science Foundation.


Tuesday September 24, 2019 3:50pm - 4:10pm PDT
Toucan Room, Catamaran Resort

4:15pm PDT

Plenary Panel: Effective Checklists for Developers and Researchers to Gather Requirements for Science Gateways
The initial idea for a science gateway is often driven by requirements in research and/or teaching; however, the knowledge needed to implement an extensible, scalable, easy-to-use, and sustainable science gateway is not necessarily in the portfolio of the researcher behind the idea. On the development side, someone knowledgeable about creating science gateways may not be an expert in the research area served by an envisioned gateway. Close collaboration between researchers and science gateway creators is crucial to gather all necessary information and requirements for a science gateway. This design task is usually underestimated, and settling the exact layout of the gateway is a continuous and iterative process: developers suggest design and layout while the user community provides feedback and comments. While each community and its requirements for a science gateway are unique, the questions that need to be answered in planning and designing a particular science gateway are very similar for any domain. The panel will discuss effective checklists to support developers communicating with diverse domain experts. Such checklists may be the basis for starting a Software Requirements Specification for an envisioned science gateway.

This panel will be useful for both software developers creating gateways and researchers/educators who specialize in the content of a gateway, as it will illuminate both sides of the process. Additional questions for the panelists will be accepted in advance during the Tuesday afternoon Learning Labs. More details will follow.

Presenters

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University


Tuesday September 24, 2019 4:15pm - 5:00pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

5:00pm PDT

Reception & Poster Session, including eScience attendees
Tuesday September 24, 2019 5:00pm - 7:00pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109
 
Wednesday, September 25
 

7:30am PDT

Registration Opens, Breakfast Available
Wednesday September 25, 2019 7:30am - 8:30am PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

8:30am PDT

Joint Welcome (including eScience attendees)
Presenters

Katherine Lawrence

Associate Director, Community Engagement & Exchange, U of Michigan/Science Gateways Community Institute
I help people creating advanced digital resources for research and education connect their projects with helpful services, expertise, and information. Ask me how the Science Gateways Community Institute can support your projects--at no cost--to better leverage the people and money...


Wednesday September 25, 2019 8:30am - 9:00am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

9:00am PDT

Keynote: Randy Olson on "Narrative Is Everything: The ABT Framework and Narrative Evolution"
Presenters

Randy Olson

Randy Olson is a scientist-turned-filmmaker who left a tenured professorship of marine biology (PhD Harvard University) to attend USC Cinema School, then work in and around Hollywood for 25 years. He wrote and directed the documentary feature film “Flock of Dodos: The Evolution-Intelligent Design Circus,” which premier...


Wednesday September 25, 2019 9:00am - 10:00am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:00am PDT

Coffee Break
Wednesday September 25, 2019 10:00am - 10:30am PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:30am PDT

Tapis-CHORDS Integration: Time-Series Data Support in Science Gateway Infrastructure
The explosion of IoT devices and sensors in recent years has led to a demand for efficiently storing, processing, and analyzing time-series data. Geoscience researchers use time-series data stores such as Hydroserver, VOEIS, and CHORDS. Many of these tools require a great deal of infrastructure to deploy and expertise to manage and scale. The Tapis (formerly known as Agave) platform as a service supports researchers in a way that frees them from responsibility for the infrastructure so they can focus on the science. The University of Hawaii (UH) and the Texas Advanced Computing Center (TACC) have collaborated to develop a new API integration that combines Tapis with the CHORDS time-series data service to support projects at both institutions for storing, annotating, and querying time-series data. This new Streams API leverages the strengths of both the Tapis platform and the CHORDS service to enable capabilities for supporting time-series data streams not available in either tool alone. These new capabilities may be leveraged by Tapis-powered science gateways that need to handle spatially indexed time-series datasets for their researchers, as they have been at UH and TACC.
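As a rough illustration of the kind of ingestion such an integration builds on, CHORDS accepts measurements over plain HTTP. The sketch below builds a CHORDS-style ingest URL; the portal address, instrument id, and API key are hypothetical, and the exact parameter names may differ from the deployed Streams API.

```python
from urllib.parse import urlencode

def chords_ingest_url(portal, instrument_id, values, api_key, at=None):
    """Build a CHORDS-style measurement-ingest URL (GET /measurements/url_create)."""
    params = {"instrument_id": instrument_id, "key": api_key}
    params.update(values)          # measured variables, e.g. {"temp": 24.5}
    if at:
        params["at"] = at          # ISO-8601 timestamp; omit to use server time
    return f"{portal}/measurements/url_create?{urlencode(params)}"

# Hypothetical portal, instrument, and key, for illustration only.
url = chords_ingest_url("http://chords.example.edu", 25,
                        {"temp": 24.5}, "SECRET",
                        at="2019-09-25T10:30:00")
print(url)
```

In the combined system, a gateway would presumably broker such requests through Tapis rather than exposing instrument keys to end users.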


Wednesday September 25, 2019 10:30am - 10:50am PDT
Toucan Room, Catamaran Resort

10:30am PDT

Ghub: Building a Glaciology Gateway to Unify a Community
There is currently no consensus on how quickly the Greenland ice sheet is melting due to global warming, and what the ramifications will be for the rise in sea level. Sea level rise is a grave concern, due to its potential impact on coastal populations, global economies, and national security. Therefore, the ice-sheet science community is striving to improve their understanding of the problem. This community consists of two groups that perform related but distinct kinds of science: a data community, and a model building community. Broadly, the data community characterizes past and current states of the ice sheets, by assembling data from past events and from satellite observations. The modeling community, meanwhile, seeks to explain and forecast the speed and extent of ice sheet melting and subsequent sea level rise, by developing and validating computational models to explain these changes. Although ice sheet experimental data and models are dependent on one another, these two groups of scientists are not well integrated; better coordination is needed between data collection efforts and modeling efforts if we are to improve our understanding of ice sheet melting rates. These two scientific communities must build closer ties in order to better validate models and reduce prediction uncertainties.

We present a new science gateway, GHub, that is taking form as a collaboration space for ice sheet scientists in academia and government agencies alike. This gateway, built on the HUBzero platform, will host datasets and modeling workflows, and provide access to codes for community tool building. First, we aim to collect, centralize, and fuse existing datasets, creating new data products that more completely catalog the ice sheets of Greenland and Antarctica. Second, we plan to build workflows that provide support for correct model validation and improve uncertainty quantification, thus extending existing ice sheet models. Finally, we will host existing community codes. We will install codes such as CmCt on the gateway server itself, and others, such as ISSM, on gateway-accessible high-performance computing resources, so that scientists can build new tools utilizing them. A natural objective of this gateway is to provide a unifying location where these disparate scientific communities may gather, mingle, and collaborate, using collaborative gateway features with the goal of doing better science. Overall, this gateway will be a major step towards accomplishing goals that were identified by a recent NSF workshop on the Greenland ice sheet. With this new cyberinfrastructure, ice sheet scientists will gain improved tools to quantify the rate and extent of sea level rise, for the benefit of human societies around the globe.

Presenters

Jeanette Sperhac

Scientific Programmer, University at Buffalo/Center for Computational Research


Wednesday September 25, 2019 10:30am - 10:50am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

10:50am PDT

Streamed Data via Cloud-Hosted Real-Time Data Services for the Geosciences as an Ingestion Interface into the Planet Texas Science Gateway and Integrated Modeling Platform
By the year 2050, the population of Texas is forecast to grow from 28 million to nearly 55 million residents. As a result, the effects of present utilization on the sustainability of natural resources (water, energy, and land use) must be modeled and made available to policymakers. The Planet Texas 2050 (PT2050) project is designed to provide the knowledge and information needed to inform and support resilient responses in the face of identified vulnerabilities.

The DataX Science Gateway is in development as part of the PT2050 initiative to provide a platform through which scientists, data analysts, and policymakers collaborate to generate cross-disciplinary environmental models. The scientists and analysts creating the hybridized models will have unique access to datasets, workflow-generation tools, and collaborators historically partitioned across disciplines. The DataX Gateway enables the ingestion and transformation of data and the composition of integrated models. Core capabilities within the data portal include tools for assimilating disparate datasets, pre-processing data sources for inclusion in integrated models, and sharing with the community, with access to large-scale resources, including storage and computational capabilities, at the Texas Advanced Computing Center.

Generally, integrated models use static datasets. The purpose of this research was to explore a method by which real-time, in-situ environmental edge monitoring systems could stream data into backend models for processing. The real-time data serves as a ground-truth source of information for models and expands the spectrum of possible use cases the DataX Gateway could support. The Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS) project, funded by the EarthCube program at NSF, was implemented within the DataX platform from an edge-sensor point of view. Non-standard utilization of the application programming interface (API) for the ingestion of prior, non-streamed datasets was also addressed as a possible use case. Future work aims to create a data-streaming-to-data-frame workflow as an approach for connecting real-time or near-real-time data with integrated models at scale. Challenges include addressing authentication and data confidentiality for potential users, as well as limitations of data collection at scale.

Early implementation and testing of data streaming in the gateway have demonstrated that the capabilities of the API exceed standard data streaming. When viewed as a core service, CHORDS becomes a method by which datasets can be added to the DataX platform while providing both standardized geoscience naming schemes and direct pipelines into integrated model workflows.


Wednesday September 25, 2019 10:50am - 11:10am PDT
Toucan Room, Catamaran Resort

10:50am PDT

The ‘Ike Wai Hawai‘i Groundwater Recharge Tool
This paper discusses the design and implementation of the ‘Ike Wai Hawai‘i Groundwater Recharge Tool, an application for providing data and analyses of the impacts of land-cover and climate modifications on groundwater-recharge rates for the island of O‘ahu. This application uses simulation data based on a set of 29 land-cover types and two rainfall scenarios to provide users with real-time recharge calculations for interactively defined land-cover modifications. Two visualizations, representing the land cover for the island and the resultant groundwater-recharge rates, and a set of metrics indicating the changes to groundwater recharge for relevant areas of the map are provided to present a set of easily interpreted outcomes based on the user-defined simulations. Tools are provided to give users varying degrees of control over the granularity of data input and output, allowing for the quick production of a roughly defined simulation, or more precise land-cover models that can be exported for further analysis. Heuristics are used to provide a responsive user interface and performant integration with the database containing the full set of simulation data. This tool is designed to provide user-friendly access to the information on the impacts of land-cover and climate changes on groundwater-recharge rates needed to make data-driven decisions.
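The core interactive calculation can be pictured as a lookup of per-cover recharge rates applied over modified grid cells. The sketch below is illustrative only: the rates, cover types, and units are invented stand-ins, not the tool's actual simulation data.

```python
# Hypothetical recharge rates (mm/yr) per (land_cover, rainfall_scenario);
# the real tool draws on simulation data for 29 cover types and 2 scenarios.
RECHARGE_RATES = {
    ("forest", "baseline"): 520.0,
    ("urban",  "baseline"): 110.0,
    ("forest", "drought"):  390.0,
    ("urban",  "drought"):   80.0,
}

def total_recharge(cells, scenario):
    """Sum recharge over grid cells, where each cell is (land_cover, area_m2).

    1 mm of recharge over 1 m^2 equals 1 litre, so the result is litres/yr.
    """
    return sum(RECHARGE_RATES[(cover, scenario)] * area
               for cover, area in cells)

cells = [("forest", 1000.0), ("urban", 500.0)]
before = total_recharge(cells, "baseline")
# A user-defined land-cover modification: the forest cell becomes urban.
after = total_recharge([("urban", 1000.0), ("urban", 500.0)], "baseline")
print(before, after)  # → 575000.0 165000.0
```

The tool's real-time responsiveness comes from precomputed simulation results; a lookup-and-sum like this only has to aggregate, not re-simulate.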


Wednesday September 25, 2019 10:50am - 11:10am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

11:15am PDT

Featured Interactive Presentation: Engaging Presentation Skills
You’ve got solid data, but are people really listening to you?

For everything from seeking funding to motivating stakeholder action, communication matters. You’ve got the substance, but if it’s not presented with the proper form, structure, and style, it’s not going to be understood. I’m here to help you with this.

This brief, interactive talk will introduce you to simple techniques designed to engage audiences and make you a more effective communicator. I’ve worked with a wide range of scientists from the Jet Propulsion Labs in Pasadena to doctors at major hospitals. I’m eager to share this training with you.

Brian Palermo is a career professional actor who’s been training scientists and science communicators for a decade. https://www.palermoscienceimprov.com/

Presenters

Brian Palermo

Palermo Improv Training
Brian Palermo is an engaging actor with an impressive resume of performances in television, film and top comedy venues. He graduated from the University of New Orleans with a degree in Drama and Communications. He has been a performer and teacher with The Groundlings Theatre, Los...


Wednesday September 25, 2019 11:15am - 12:00pm PDT
Toucan Room, Catamaran Resort

11:15am PDT

Learning Labs
Wednesday Learning Labs (11:15-12:00pm)
Table Numbers and Topics for this time slot
  1. Building gateways shouldn't be this hard   
    It's no secret that science gateways are more than "just a website." But what makes them so challenging (and expensive) to build? People from every area of the gateway community are invited to come exchange problems, solutions, and ideas about how to make gateway development better for everyone. Hosted by Rion Dooley.
  2. Containers: The Why and How
    Do you currently use containers in your gateway? If so, why did you choose to? If not, could you benefit from their use? Hosted by Rajesh Kalyanam.
  3. New User Adoption of Gateways
    What can we do to promote new user adoption of gateways? Come share your tips, strategies, and ideas, and learn from others too. Hosted by Kerk Kee.
  4. iReceptor: Gateways and Data Standards
    Following up on my talk on Tuesday, discussing iReceptor, a gateway for Immune Genetics, including both the gateway architecture and how we use data standards in our platform. Hosted by Brian Corrie.
  5. HPC in the Cloud with Tapis
    Discussions around how to use the Tapis API framework for high throughput and high performance computing. Hosted by Joe Stubbs.
  6. SGCI Cloudify Science Gateways
    This solicitation seeks proposals from the SGCI Community to Cloudify Science Gateways for operation in the public cloud. Stop by to talk to us about your idea. Hosted by Boyd Wilson and Amy Cannon.

About Learning Labs
“Learning Labs” may be one of several styles of impromptu learning:
  • Pop-up BOFs (Birds-of-a-Feather Sessions)
  • Mini Hacks
  • Coffee-Break Conversations
We will have three 45-minute periods devoted to your ideas. Round tables will be set up for you to meet with others and discuss the topics of your choice. Here are the ways you can get involved:
  1. Propose a topic that you’d be willing to host. (You don’t have to be an expert, just interested!)
  2. Find a topic that interests you, and join a table!
Submit your topic with this Google form by Tuesday, September 24 at 7pm Pacific: https://forms.gle/1Th5svq13VeKreLE7.

We’ll announce on Sched when and where the first round is happening by Tuesday morning, and the second round by Wednesday morning. We may be able to insert additional topics after the deadline if space is available.

Wednesday September 25, 2019 11:15am - 12:00pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

12:00pm PDT

Lunch
Wednesday September 25, 2019 12:00pm - 1:00pm PDT
Beach, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:00pm PDT

ROTDIF-web and ALTENS: GenApp-based Science Gateways for Biomolecular Nuclear Magnetic Resonance (NMR) Data Analysis and Structure Modeling
Proteins and nucleic acids participate in essentially every biochemical process in living organisms, and the elucidation of their structure and motions is essential for our understanding of how these molecular machines perform their function. Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful, versatile technique that provides critical information on molecular structure and dynamics. Spin-relaxation data are used to determine the overall rotational diffusion and local motions of biological macromolecules, while residual dipolar couplings (RDCs) reveal the local and long-range structural architecture of these molecules and their complexes. This information allows researchers to refine structures of proteins and nucleic acids and provides restraints for molecular docking. Several software packages have been developed by NMR researchers to tackle the complicated experimental data analysis and structure modeling. However, many of them are offline packages or command-line applications that require users to set up the runtime environment and to possess certain programming skills, which inevitably limits the accessibility of this software to a broad scientific community. Here we present new science gateways designed for the NMR/structural biology community that address these limitations in NMR data analysis. Using the GenApp technology for scientific gateways (https://genapp.rocks), we successfully transformed ROTDIF and ALTENS, two offline packages for bio-NMR data analysis, into science gateways that provide advanced computational functionality, cloud-based data management, and interactive 2D and 3D plotting and visualization. Furthermore, these gateways are integrated with molecular structure visualization tools (Jmol) and with gateways/engines (SASSIE-web) capable of generating huge computer-simulated structural ensembles of proteins and nucleic acids.
This enables researchers to seamlessly incorporate conformational ensembles into the analysis in order to adequately account for the structural heterogeneity and dynamic nature of biological macromolecules. ROTDIF-web offers a versatile set of integrated modules/tools for determining and predicting molecular rotational diffusion tensors, for model-free characterization of bond dynamics in biomacromolecules, and for docking of molecular complexes driven by information extracted from NMR relaxation data. ALTENS allows characterization of molecular alignment under anisotropic conditions, which enables researchers to obtain accurate local and long-range bond-vector restraints for refining 3-D structures of macromolecules and their complexes. We will describe our experience bringing our programs into GenApp and illustrate the use of these gateways with specific examples of protein systems of high biological significance. We expect these gateways to be useful to structural biologists, biophysicists, and the broader NMR community, and to stimulate other researchers to share their scientific software in a similar way.


Wednesday September 25, 2019 1:00pm - 1:20pm PDT
Toucan Room, Catamaran Resort

1:00pm PDT

Instant On: Caching Simulation Results for Science Gateways
Powered by the HUBzero platform, nanoHUB is the science gateway built and operated by the Network for Computational Nanotechnology (NCN). Like many science gateways, nanoHUB offers a variety of content. Among all HUBzero hubs, nanoHUB is unique for its large catalog of simulation tools and its community of tool users. In 2018, nanoHUB saw 16,750 users execute more than 750,000 simulation jobs using some 600 simulation tools. The resources applied to computing these jobs totaled some 145,000 CPU hours.

While the CPU allocation is significant, what is arguably more significant is the “wall” time experienced by the users running the simulations. Our own internal studies have shown a relationship between usage and wall time: tools with a low expected wall time typically have the highest utilization. The bulk of nanoHUB Rappture tools execute jobs ranging from nearly 0 seconds to the maximum allowed session time of two weeks. Across these jobs, the expected (median) wall time is approximately 17.0 seconds.

In 2011, the leadership teams of nanoHUB and HUBzero were jointly awarded an NSF grant for the “Instant On” project. This project invested in several strategies to reduce resource consumption and improve user experience by reducing the turnaround time between submitting a simulation job and receiving the computed result. One of these strategies was to develop a system to re-use simulation results when possible. This development ultimately became part of the HUBzero middleware as a caching system. It is this caching system on which the remainder of this paper will focus.

In Section 2, we describe the design goals of the “Instant On” cache and highlight some of the implementation details and features. In Section 3, we discuss the operation of the cache with respect to utility and economy, along with some pitfalls, both experienced and potential. Section 4 presents some future directions in which the cache is but one of several services built on top of the underlying archive of simulation results. We conclude in Section 5 with an invitation for other science gateways to use “Instant On” as part of their tool and workflow pipelines.
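The result re-use idea at the heart of such a cache can be sketched in a few lines. This is an illustration only, not the actual HUBzero middleware implementation; keying on the tool version as well as the inputs guards against one obvious pitfall, serving stale results after a tool changes.

```python
import hashlib
import json

class SimulationCache:
    """Sketch of result re-use: key a run by a digest of (tool, version, inputs)."""

    def __init__(self):
        self._store = {}                      # digest -> cached result

    def _key(self, tool, version, inputs):
        # Canonical JSON so logically identical inputs hash identically.
        blob = json.dumps({"tool": tool, "version": version,
                           "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get_or_run(self, tool, version, inputs, run):
        k = self._key(tool, version, inputs)
        if k not in self._store:              # cache miss: pay the wall time once
            self._store[k] = run(inputs)
        return self._store[k]                 # cache hit: "instant on"

# Hypothetical tool: count invocations to show the second call never runs it.
calls = []
def fake_sim(inputs):
    calls.append(inputs)
    return inputs["x"] ** 2

cache = SimulationCache()
r1 = cache.get_or_run("demo", "1.0", {"x": 3}, fake_sim)
r2 = cache.get_or_run("demo", "1.0", {"x": 3}, fake_sim)   # served from cache
print(r1, r2, len(calls))  # → 9 9 1
```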


Wednesday September 25, 2019 1:00pm - 1:20pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:20pm PDT

EarthCube Data Discovery Studio, an integration of a semantically enhanced cross-disciplinary catalog with JupyterHub to enable an analytical workbench
EarthCube Data Discovery Studio (DDStudio) integrates resources described by metadata with analytical platforms. DDStudio has harvested over 1.6 million metadata records from over 40 sources, enhanced them via an augmentation pipeline, created a catalog, and provided an interface that allows users to explore the data via Jupyter Notebooks. DDStudio utilizes a scalable metadata augmentation pipeline designed to improve and re-index metadata content using text analytics and an integrated geoscience ontology. Metadata enhancers automatically add keywords and related ontology references that describe science domains, geospatial features, measured variables, equipment, geoscience processes, and other characteristics, thus enabling search and discovery of semantically indexed datasets. In the pipeline, we enhance spatial and temporal extents and organization identifiers, enabling faceted browsing by these parameters. The pipeline also generates provenance for each enhanced metadata document, publishes the metadata using schema.org markup, lets users validate or invalidate metadata enhancements, and enables faceted search. Users may upload metadata descriptions for resources not already in the catalog and have them immediately available within the search interface. DDStudio and the JupyterHubs are loosely coupled and communicate via a simple interface we call a dispatcher. Users can search for datasets in DDStudio using text, search facets, and geospatial and temporal filters. Researchers can gather records of interest into collections, save the collections for further use, and share collections of resources with collaborators. From DDStudio, users can launch Jupyter notebooks residing on several JupyterHubs for any metadata record or collection of metadata records. The dispatcher identifies appropriate resources to use in visualization, analysis, or modeling, thus bridging resource discovery with more in-depth data exploration.
Users can contribute their own notebooks to process additional types of data indexed in DDStudio. DDStudio demonstrates how linking search results from the catalog directly to software tools and environments reduces time to science, illustrated by examples from coral reef and river geochemistry studies. DDStudio has worked with SGCI to enhance its process and utility through centralized authentication, security analysis, and outreach to user communities. URL: datadiscoverystudio.org
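The dispatcher's role can be pictured as handing a collection of catalog record identifiers to a notebook environment. The sketch below is entirely hypothetical: the hub address, endpoint path, and parameter names are invented for illustration, not the real dispatcher interface.

```python
from urllib.parse import urlencode

def dispatch_url(hub_base, notebook, record_ids):
    """Hypothetical dispatcher: encode a record collection as launch parameters
    so a notebook on a JupyterHub can fetch and process those datasets."""
    qs = urlencode({"notebook": notebook, "records": ",".join(record_ids)})
    return f"{hub_base}/launch?{qs}"

# Illustrative collection of two catalog records from a search session.
url = dispatch_url("https://hub.example.org", "coral_reef.ipynb",
                   ["rec-001", "rec-002"])
print(url)
```

Keeping the coupling this loose means any JupyterHub that understands the query parameters can serve as an analysis backend.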


Wednesday September 25, 2019 1:20pm - 1:40pm PDT
Toucan Room, Catamaran Resort

1:20pm PDT

SimTools: Standardized Packaging for Simulation Tools
In this paper we introduce SimTools, a simple way to create tools with well-defined inputs and outputs. Ease of use is a priority: a SimTool is a Jupyter notebook, which can be a self-contained simulation or a wrapper that calls a larger tool. Inputs and outputs are described in YAML embedded in the notebook. A new copy of the notebook is returned as the result of the run, with the cells showing the progress of the simulation, including intermediate results for debugging. Outputs are embedded in the notebook metadata as data or database references. Published SimTools can be deployed as Docker or Singularity images and will be runnable on any platform that can run those containers.
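Since a notebook is just JSON, results embedded in its metadata can be read back with standard tooling. The metadata key (`simtool`) and the YAML schema in the input cell below are illustrative assumptions, not the published SimTool format.

```python
import json

# Illustrative notebook: a YAML-described input plus results embedded in
# metadata, mimicking the "outputs in notebook metadata" idea.
NOTEBOOK = {
    "cells": [
        {"cell_type": "code",
         "source": ["%%yaml INPUTS\n",
                    "temperature: {type: Number, units: K, value: 300}\n"]},
    ],
    "metadata": {
        "simtool": {"outputs": {"energy": {"value": -1.27, "units": "eV"}}}
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

def read_outputs(nb):
    """Pull embedded results out of a completed run's notebook metadata."""
    return nb["metadata"].get("simtool", {}).get("outputs", {})

# A notebook file is plain JSON, so a round-trip stands in for loading a run result.
nb = json.loads(json.dumps(NOTEBOOK))
outputs = read_outputs(nb)
print(outputs["energy"]["value"], outputs["energy"]["units"])  # → -1.27 eV
```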


Wednesday September 25, 2019 1:20pm - 1:40pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:40pm PDT

Rebasing HUBzero tool middleware from OpenVZ to Docker
HUBzero middleware has been based on OpenVZ container technology for a decade. OpenVZ provided very powerful control and customization options with light resource utilization, ahead of the alternatives at the time. However, OpenVZ 6 will reach end of life in November 2019, and the next version, OpenVZ 7, is substantially different from its predecessors. Architecturally, OpenVZ 7 is becoming its own Linux distribution, with limited support for the previous "simfs" container management. Adapting the HUBzero middleware to simfs under OpenVZ 7 resulted in a loss of quota management. HUBzero tool development under OpenVZ, as well as testing of the entire HUBzero software stack, has been problematic because it required people to install a different kernel than the one provided by their distribution; under OpenVZ 7, having to install a specific distribution would make the problem even worse. The HUBzero middleware also required that all tools use the same tool template, so upgrades to the tool template necessitated synchronized upgrades and retesting of all tools.

Meanwhile, Docker has emerged as a popular choice for creating, sharing, and deploying containers. Docker isn't tied to a specific Linux distribution and is easier to install and use than OpenVZ. Redeploying the entire HUBzero software stack, not just the middleware, as Docker containers would ease testing, development, adoption, and deployment. However, there were several challenges to doing so. The first is that, by default, Docker heavily manages the host firewall, conflicting with the HUBzero middleware, which also interacts extensively with the host firewall. This Docker behavior is optional but enabled by default, and users normally expect it to work; we didn't want to disable it, as doing so might be surprising and cause compatibility issues. The second challenge was separating the X11 server and related services from the tools themselves, which all used to live in the same OpenVZ container. Doing so creates flexibility and makes sense, as newer tools tend to emit HTML directly and do not require an X11 server; it also makes tool containers smaller and more easily shared and managed. The third challenge is an ongoing one: evaluating the security implications of using Docker instead of OpenVZ and developing better assurances based on gathered experience and evidence.


Wednesday September 25, 2019 1:40pm - 1:50pm PDT
Toucan Room, Catamaran Resort

1:40pm PDT

Chem Compute Undergraduate Computational Chemistry Science Gateway
The Chem Compute Science Gateway provides access for undergraduate chemistry students to perform computational chemistry jobs. These jobs mostly run within a typical 3-4 hour laboratory period; thus, the users and usage of our gateway are quite different from those of a typical research-based gateway. We will demonstrate the usage of the gateway and the aspects of it that are geared toward interfacing with undergraduates in a short lab period.

Presenters

Mark Perri

Associate Professor, Chem Compute


Wednesday September 25, 2019 1:40pm - 1:50pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

1:50pm PDT

vDef-Web: A Case-Study on Building a Science Gateway Around a Research Code
Many research codes assume a user’s proficiency with high-performance computing tools, which often hinders their adoption by a community of users.
Our goal is to create a user-friendly gateway that allows such users to leverage the new capabilities brought to the fracture mechanics community by the phase-field approach to fracture, implemented in the open-source code vDef.

We leveraged popular existing tools for building such frameworks, namely Agave, Django, and Docker, to build a Science Gateway that allows a user to submit a large number of jobs at once.
We use the Agave framework to run jobs and handle all communications with the high-performance computers, as well as data sharing and tracking of provenance.
Django was used to create a web application.
Docker provided an easily deployable image of the system, simplifying setup by the user.

The result is a system that masks all interactions with the high-performance computing environment and provides a graphical interface that makes sense for scientists.
In the common situation of parameter sweeps, our gateway also helps scientists compare the outputs of various computations using a matrix view that links to individual computations.
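The parameter-sweep case can be pictured as expanding a small sweep specification into one job description per combination, which the gateway then submits through Agave. The job shape and field names below are hypothetical stand-ins, not the vDef-Web implementation.

```python
from itertools import product

def sweep_jobs(app_id, fixed, sweep):
    """Expand a parameter sweep into one job description per combination.

    `fixed` holds parameters shared by every run; `sweep` maps each swept
    parameter name to its list of values (hypothetical job schema).
    """
    names = sorted(sweep)                       # deterministic ordering
    jobs = []
    for combo in product(*(sweep[n] for n in names)):
        params = dict(fixed, **dict(zip(names, combo)))
        jobs.append({"appId": app_id, "parameters": params})
    return jobs

# Illustrative sweep: 2 toughness values x 3 load values = 6 jobs,
# naturally displayed as a 2x3 matrix of results.
jobs = sweep_jobs("vdef-1.0",
                  fixed={"mesh": "plate.msh"},
                  sweep={"toughness": [0.5, 1.0], "load": [10, 20, 30]})
print(len(jobs))  # → 6
```

The same combination structure is what makes the matrix view natural: each cell of the matrix corresponds to one generated job.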


Wednesday September 25, 2019 1:50pm - 2:00pm PDT
Toucan Room, Catamaran Resort

2:00pm PDT

Simplifying Natural Hazards Engineering Research Data with an Interactive Curation Process
The organization of complex datasets, experiments, and simulations into readable and reusable information is a challenging task in the realm of Natural Hazards Engineering Research. The varying types and formats of data collected in the field and in laboratory research are difficult to present due to the scale and complexity of these procedures. Experimental facilities use a variety of unique tools to simulate natural disasters. These tools have differing setups as well as various configurations of sensors and sensor types, and each sensor type may have its own output. When reproducing or referencing these procedures, it is difficult to interpret how each piece of information relates to the whole. Finally, researchers need to store data while in the field, collaborate with other researchers, visualize results, and curate findings in a place that can be easily referenced and reused by other members of the community.
DesignSafe is a Science Gateway that aims to enable natural hazards researchers by addressing these issues in an easy-to-use web interface. It provides researchers the ability to collaborate on projects in a shared workspace and publish their data through an interactive curation process. Giving researchers and engineers the ability to accurately portray relationships between their predictions, procedures, and results greatly improves the readability and reusability of their findings. To do this, we have collaborated with several research groups and universities to develop a series of standardized yet flexible models that researchers use to structure their projects. As a result, researchers can publish large and complex procedures in a way that is simple to interpret, cite, and reuse. This capability is known as the DesignSafe curation process.
This paper focuses on how this curation process was developed and implemented. It expands on the challenges of implementing the process and on future work planned to further improve the pipeline.


Wednesday September 25, 2019 2:00pm - 2:20pm PDT
Toucan Room, Catamaran Resort

2:00pm PDT

Learning Labs
Wednesday Learning Labs (2:00-2:40pm)
Table Numbers and Topics for this time slot
  1. Kubernetes for Fun and Science
    With Kubernetes establishing itself as the de facto container-orchestration platform in the cloud, what happens when the gateway community adopts Kubernetes into its projects? Meet fellow developers, designers, and community builders to talk about Kubernetes for science and what the availability of open platforms means for gateway sustainability. Hosted by Rion Dooley.
  2. Code and Coffee
    We get it. Sometimes you just need a fix. Take a break from the program to grab some coffee and knock out some code. No host, but open to anyone.
  3. Jupyter(Hub) in Gateways
    This is a discussion about all things related to the jupyter project, including integrating JupyterHub with Gateways, running Jupyter Notebooks as jobs, and deploying JupyterHub on Jetstream with Kubernetes. Hosted by Andrea Zonca and Julia Looney.
  4. Gateways for classroom education
    Let's talk about the challenges of focusing on education in a research oriented world. Hosted by Mark Perri.
  5. Research Software Engineers - Challenges of a career path in academia
    One major concern in achieving software sustainability is improving career paths for RSEs, research programmers and/or facilitators - whether they are staff or faculty at academic institutions or national labs. Software contributions are generally not a factor in career advancement in academia. While several initiatives and projects such as US-RSE show the interest in changing the culture in academia to support career paths, the interaction between these initiatives is still sparse. This could be a good time to join forces and use the momentum of the diverse communities thinking about similar challenges. Hosted by Sandra Gesing.
About Learning Labs
“Learning Labs” may be one of several styles of impromptu learning:
  • Pop-up BOFs (Birds-of-a-Feather Sessions)
  • Mini Hacks
  • Coffee-Break Conversations
We will have three 45-minute periods devoted to your ideas. Round tables will be set up for you to meet with others and discuss the topics of your choice. Here are the ways you can get involved:
  1. Propose a topic that you’d be willing to host. (You don’t have to be an expert, just interested!)
  2. Find a topic that interests you, and join a table!
Submit your topic with this Google form by Tuesday, September 24 at 7pm Pacific: https://forms.gle/1Th5svq13VeKreLE7.

We’ll announce on Sched when and where the first round is happening by Tuesday morning, and the second round by Wednesday morning. We may be able to insert additional topics after the deadline if space is available.

Wednesday September 25, 2019 2:00pm - 2:40pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

2:20pm PDT

Protecting integrity and provenance of research data with the Open Science Chain
Facilitating the future reuse of data is critical to the advancement of research. Researchers need the ability to independently validate the authenticity of scientific datasets and to track provenance information in order to extend or build upon prior research. The National Science Foundation-funded Open Science Chain project is building a cyberinfrastructure solution that enables a broad set of researchers to efficiently verify and validate the authenticity of scientific datasets and to share metadata, including detailed provenance information, in a secure manner. In this demonstration, we will show how science gateway users can benefit from utilizing the Open Science Chain cyberinfrastructure to enhance the trustworthiness of their data.
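The abstract does not describe the Open Science Chain's actual APIs or on-chain mechanisms, but the core idea of independently validating a dataset's authenticity can be sketched with a generic cryptographic-digest check: record a hash of the dataset at publication time, then recompute and compare it later. The function names below are illustrative, not part of the Open Science Chain software.

```python
# Illustrative sketch only -- not the Open Science Chain API.
# Shows the general principle of dataset integrity verification:
# compare a dataset's cryptographic digest against one recorded earlier.
import hashlib


def dataset_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, recorded_digest: str) -> bool:
    """True if the dataset still matches the digest recorded at registration."""
    return dataset_digest(data) == recorded_digest


original = b"temperature,pressure\n21.5,101.3\n"
recorded = dataset_digest(original)  # stored (e.g., in a secure ledger) when the dataset is registered

assert verify_integrity(original, recorded)                     # unmodified data passes
assert not verify_integrity(original + b"tampered", recorded)   # any change is detected
```

A system like the Open Science Chain additionally secures the recorded digests and provenance metadata themselves, so that the reference values cannot be silently altered.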


Wednesday September 25, 2019 2:20pm - 2:40pm PDT
Toucan Room, Catamaran Resort

2:45pm PDT

Plenary Closing
We're eager for your input about growing the gateway community and shaping upcoming conferences into "can't miss it" events. Join us for a quick chat before grabbing a snack and heading home!

Wednesday September 25, 2019 2:45pm - 3:15pm PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

3:15pm PDT

Snack Break
The conference will be wrapped up, but please feel free to stick around, grab a snack, and chat with others before leaving town!

Wednesday September 25, 2019 3:15pm - 3:30pm PDT
Foyer, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109

5:00pm PDT

Optional dinner gathering at Waterbar
On Wednesday, September 25 at 5:00pm, those who are staying in town for the evening are welcome to join us for a relaxing evening at a local restaurant. If you are interested, sign up at the Registration Desk by the end of the reception on Tuesday.
The cost of dinner is not included in your registration. We’ll be meeting at Waterbar, a 7-minute walk from the Catamaran.
Waterbar
4325 Ocean Blvd. 
San Diego, CA
https://www.waterbarsd.com/

Wednesday September 25, 2019 5:00pm - 8:00pm PDT
 