Gateways 2019: Concurrents B
Tuesday, September 24
 

10:30am PDT

Gateway design features for undergraduate education communities
Like science gateways, an education gateway should provide research and management support for a community of practitioners working collaboratively to solve a set of challenging problems. While the technical aspects of the cyberinfrastructure play an important role in the utility of a gateway, they are not sufficient to attract users who are new to collaborative, online scholarship. Over the course of developing the Quantitative Undergraduate Biology Education and Synthesis (QUBES) gateway, we have learned to adapt our services and messaging to reach our target audience and recruit their participation. Part of this process has involved aligning our services with common project management challenges and being aware of the opportunities and constraints faced by teaching faculty. Adopting a client-centered approach has made it possible not only to build our user base but also to foster important conversations among users about promoting a shared culture that supports scholarly approaches to teaching.

Presenters
Michael LaMar

Associate Professor, College of William and Mary


Tuesday September 24, 2019 10:30am - 10:50am PDT
Toucan Room, Catamaran Resort

10:50am PDT

External Communication to Diffuse Science Gateways and Cyberinfrastructure for Research with Big Data
In the era of big data, for science gateways (SG) and cyberinfrastructure (CI) projects to have the greatest impact, the tools need to be widely adopted in the scientific community. However, diffusion activities are often an afterthought in SG/CI projects. We warn against the fallacy of ‘If You Build It, They Will Come’: projects should be intentional in promoting tool adoption. We identified five external communication practices based on an analysis of 20 interviews with administrators, developers, users, and outreach educators working in CI across the US: raising awareness of the innovations, engaging in educational outreach, building relationships with trust, networking with the community, and maintaining a track record of reliability. While exploratory in nature, the findings can be used as guidelines for projects to promote SG/CI diffusion. The paper also serves as evidence to justify bigger budgets from funders for diffusion activities to increase adoption and broader impacts.


Tuesday September 24, 2019 10:50am - 11:10am PDT
Toucan Room, Catamaran Resort

11:10am PDT

TAMU HPRC Portal: Leveraging Open OnDemand for Research and Education
The Texas A&M University High Performance Research Computing (TAMU HPRC) Portal is a local installation and adaptation of Open OnDemand (OOD) on the HPRC clusters. The portal provides an advanced cyberinfrastructure that enables HPRC users from various backgrounds to use High Performance Computing (HPC) resources for their research. It also serves as an educational platform for HPRC staff to train users in cluster technologies and HPC applications.
Using OOD for the HPRC portal has three benefits. First, it provides a single point of access to all the HPC resources via a web browser and can greatly simplify HPC workflows. Second, it provides an intuitive user interface that significantly reduces the barrier between users and HPC working environments. Third, the extensible and scalable design makes it easy to accommodate a growing number of users and applications.
In addition to the out-of-the-box features, we have extensively customized the Matlab interface for our local needs. We have also developed a dynamic form generation scheme that makes the portal app deployment and management more efficient. We have used the portal in multiple training programs and have received positive feedback from the instructors and the users.
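As a rough illustration of such a scheme (hypothetical; the manifest layout and field names below are not TAMU HPRC's actual format), a deploy-time generator might render an OOD-style form definition from a compact app manifest:

    # Hypothetical sketch of dynamic form generation: render an Open
    # OnDemand-style form definition from a compact app manifest, so a
    # new portal app does not need a hand-written form.
    import yaml

    APP_MANIFEST = {
        "title": "MATLAB",
        "attributes": {
            "module": {"widget": "select",
                       "options": ["MATLAB/2018b", "MATLAB/2019a"]},
            "num_cores": {"widget": "number_field", "value": 1,
                          "min": 1, "max": 28},
            "walltime": {"widget": "text_field", "value": "01:00:00"},
        },
    }

    def generate_form(manifest):
        """Emit a form.yml-like document from the manifest."""
        doc = {
            "title": manifest["title"],
            "attributes": manifest["attributes"],
            "form": list(manifest["attributes"]),  # field display order
        }
        return yaml.safe_dump(doc, sort_keys=False)

    print(generate_form(APP_MANIFEST))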
To understand the impact of the portal on our users, we analyzed portal access data and conducted a survey among HPRC portal users. We received 148 survey responses from the 554 users who accessed the portal between March 22, 2018 and April 24, 2019. The responses show that most users find the apps useful and would recommend the portal to other users. Additionally, we provide two use cases from our portal users, one for research and one for training, to demonstrate the usefulness of the portal.
Our paper is the first to describe experience with OOD at an HPC site outside the Ohio Supercomputer Center (OSC). Overall, the TAMU HPRC Portal based on OOD provides a robust and simple solution for both novice and experienced users at TAMU HPRC to access HPC resources. It is a valuable addition to the traditional command-line-based approach.


Tuesday September 24, 2019 11:10am - 11:20am PDT
Toucan Room, Catamaran Resort

11:20am PDT

Using a Scientific Gateway to Build STEM Education Capacity Around the World
With a broad focus on STEM education, STEMEdhub.org brings together university researchers in STEM disciplines with other researchers, K-12 teachers/practitioners, students, and the public through the various groups and projects on the site. Built on Purdue’s HUBzero architecture, STEMEdhub.org is a fully functional gateway that facilitates the hosting of interactive scientific tools, online presentations, wikis, and documents such as assessment plans and courses for downloading or interactive editing, complemented by document tagging to enable searching and a rating tool for commenting on shared resources. STEMEdhub has been used for over 8 years to build capacity in many areas of STEM education in the United States and throughout the world. It currently hosts over 6,000 users in 160 user groups with over 1,300 published resources. More importantly, STEMEdhub.org allows users themselves to create and manage their own groups, resources, and communities of practice, enabling the site to operate with very little overhead and a small staff. While other science gateways focus on high performance computing capabilities, STEMEdhub is focused on using a science gateway platform to make connections, build partnerships, and engage students. This demo will show how STEMEdhub.org is used as a science gateway to build STEM education capacity throughout the world.

Presenters
Ann Bessenbacher

Data Scientist, ELRC/Purdue University


Tuesday September 24, 2019 11:20am - 11:40am PDT
Toucan Room, Catamaran Resort

11:40am PDT

Open OnDemand: State of the Platform and the Project
High performance computing (HPC) has led to remarkable advances in science and engineering and has become an indispensable tool for research. Unfortunately, HPC use and adoption by many researchers is often hindered by the complex way in which these resources are accessed. Indeed, while the web has become the dominant access mechanism for remote computing services in virtually every computing area, it has not for HPC. Open OnDemand is an open-source project to provide web-based access to HPC resources (https://openondemand.org). This paper describes the challenges to adoption and other lessons learned over the three-year project that may be relevant to other science gateway projects, and describes future plans for the Open OnDemand 2.0 project.


Tuesday September 24, 2019 11:40am - 12:00pm PDT
Toucan Room, Catamaran Resort

1:50pm PDT

DDX-Interface: An interface to and a factory of interoperable scientific gateways.
Data access and distribution is an ongoing problem in science, affecting many research fields, from genomic information to microscopic images of rocks. Issues such as differing database schemas and file formats, the inability to customize or enforce laboratory terms of use, infrastructure failure, and financial limitations have reduced public access to scientific data. Centralized solutions have been funded by government agencies in an attempt to expand access, but often valuable resources cannot be published in repositories without being part of a peer-reviewed publication. Previously, we proposed to answer the demand for public access to raw scientific data using Open Index Protocol, a specification for publishing metadata into a public blockchain-based ledger and hosting files in peer-to-peer file systems. With this method, 30 TB of cryo-electron tomography datasets are publicly available today. We have now generalized this idea to let researchers publish any kind of scientific dataset using a distributed public ledger as a common index between interoperable databases. Here we describe a customizable gateway capable of exploring these distributed databases and publishing new records. The basic gateway design is built to be intuitively operable by academic and non-academic audiences alike, expanding the reach of the data distribution. As Open Index Protocol becomes a popular choice for data distribution by laboratories, focus on the user experience of the interface for data consumption will be key to achieving its full impact on society.

In the demo part of this presentation, we will demonstrate how to build a distributed database to share scientific data using Open Index Protocol, a specification that publishes metadata on the FLO blockchain and uses a peer-to-peer file system for file storage and distribution. To begin, we will launch an instance of DDX and publish the metadata schema to the blockchain. Next, we will publish a few datasets to the database using the schema. Then, we will configure the explorer template and customize it to create a static webpage capable of exploring, searching, and downloading the published datasets. A remote colleague will run another instance of DDX, configured to be compatible with the database we just created, and will use it to publish some records. We will be able to visualize their records in our own instance of DDX. Finally, we will show how to build and deploy a static website that serves as a gateway for visualizing the records in the newly created database. In this brief demonstration, we will show the flexibility and power of this distributed resource for increasing access to raw datasets. Our main goal is to make it easy for researchers to participate in the effort to host and share their own data.
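As a rough, hypothetical illustration of the publish flow (the function and field names below are stand-ins, not the actual Open Index Protocol or FLO client APIs):

    # Hypothetical sketch of the publish flow: (1) place the dataset in
    # a peer-to-peer file system and get back a content address, then
    # (2) publish a metadata record referencing it to a public ledger.
    # p2p_add() and publish_to_ledger() are stand-ins, not real APIs.
    import hashlib, json, time

    def p2p_add(data):
        """Pretend to pin bytes; return a content address (assumption)."""
        return hashlib.sha256(data).hexdigest()

    def publish_to_ledger(record):
        """Stand-in for an Open Index Protocol publish transaction."""
        print("broadcasting:", json.dumps(record, sort_keys=True))

    content_id = p2p_add(b"...raw tomogram bytes...")
    publish_to_ledger({
        "schema": "cryo-et-dataset-v1",  # schema published on-chain first
        "title": "Example tomogram",
        "file": content_id,
        "timestamp": int(time.time()),
    })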


Tuesday September 24, 2019 1:50pm - 2:20pm PDT
Toucan Room, Catamaran Resort

2:20pm PDT

Enabling Data Streaming-based Science Gateway through Federated Cyberinfrastructure
Large scientific facilities are unique and complex infrastructures that have become fundamental instruments for enabling high-quality, world-leading research tackling scientific problems at unprecedented scales. Cyberinfrastructure (CI) is an essential component of these facilities, providing the user community with access to data, data products, and services with the potential to transform data into knowledge. However, the timely evolution of the CI available at the large facilities is challenging and can result in science communities’ requirements not being fully satisfied. Furthermore, integrating CI across multiple facilities as part of a scientific workflow is hard, resulting in data silos.

In this paper, we explore how science gateways can provide improved user experience and services that may not be offered at the large facilities’ datacenters. Using a science gateway supported by the Science Gateway Community Institute that provides subscription-based delivery of streamed data and data products from the NSF Ocean Observatories Initiative (OOI), we propose a system that enables streaming-based capabilities and workflows using data from large facilities such as OOI in a scalable manner. We leverage data infrastructure building blocks such as the Virtual Data Collaboratory, which provides data and computing capabilities in the continuum, to efficiently and collaboratively integrate multiple data-centric CIs, build data-driven workflows, and connect large facilities’ data sources with NSF-funded CI such as XSEDE. We also introduce architectural solutions for running these workflows using dynamically provisioned federated CI.


Tuesday September 24, 2019 2:20pm - 2:30pm PDT
Toucan Room, Catamaran Resort

3:00pm PDT

nanoHUB@home: Expanding nanoHUB through Volunteer Computing
Volunteer computing (VC) uses consumer digital electronics products, such as PCs, mobile devices, and game consoles, for high-throughput scientific computing. Device owners participate in VC by installing a program which, in the background, downloads and executes jobs from servers operated by science projects. Most VC projects use BOINC, an open-source middleware system for VC. BOINC allows scientists to create and operate VC projects and enables volunteers to participate in these projects. Volunteers install a single application (the BOINC client) and then choose projects to support. We have developed a BOINC project, nanoHUB@home, to make use of VC in support of the nanoHUB science gateway. VC has greatly expanded the computational resources available for nanoHUB simulations.

We are using VC to support “speculative exploration”, a model of computing that explores the input parameters of online simulation tools published through the nanoHUB gateway, pre-computing results that have not yet been requested by users. These results are stored in a cache, and when a user launches an interactive simulation our system first checks the cache. If the result is already available, it is returned to the user immediately, leaving the computational resources free and avoiding recomputation of existing results. The cache is also useful for machine learning (ML) studies, building surrogate models for nanoHUB simulation tools that allow us to quickly estimate results before running an expensive simulation.
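A minimal sketch of this cache-first launch path (illustrative Python, not nanoHUB's actual middleware):

    # Keys are a hash of the tool name and its input parameters;
    # run_simulation() stands in for dispatching a real nanoHUB job.
    import hashlib, json

    CACHE = {}  # in production this would be a persistent store

    def cache_key(tool, params):
        canonical = json.dumps({"tool": tool, "params": params}, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def launch(tool, params, run_simulation):
        key = cache_key(tool, params)
        if key in CACHE:                 # hit: return immediately
            return CACHE[key]
        result = run_simulation(params)  # miss: compute and store
        CACHE[key] = result
        return result

    # Usage: identical inputs are computed once, then served from cache.
    square = lambda p: p["x"] ** 2
    assert launch("demo", {"x": 3}, square) == 9
    assert launch("demo", {"x": 3}, square) == 9  # served from cache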

VC resources also allow us to support uncertainty quantification (UQ) in nanoHUB simulation tools, to go beyond simulations and deliver real-world predictions. Models are typically simulated with precise input values, but real-world experiments involve imprecise values for device measurements, material properties, and stimuli. The imprecise values can be expressed as a probability distribution of values, such as a Gaussian distribution with a mean and standard deviation, or an actual distribution measured from experiments. Stochastic collocation methods can be used to predict the resulting outputs given a series of probability distributions for inputs. These computations require hundreds or thousands of simulation runs for each prediction. This workload is well-suited to VC, since the runs are completely separate, but the results of all runs are combined in a statistical analysis.
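For a single uncertain input, the collocation idea can be sketched with NumPy's Gauss-Hermite rule; the model function below is a hypothetical stand-in for a simulation run:

    # Sketch of 1-D stochastic collocation: evaluate the model only at
    # Gauss-Hermite nodes and combine the results into output statistics.
    import numpy as np

    def model(x):                      # hypothetical expensive simulation
        return np.sin(x) + 0.1 * x**2

    mu, sigma, n_nodes = 1.0, 0.2, 9   # input ~ N(mu, sigma^2)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)

    samples = model(mu + sigma * nodes)           # one run per node
    norm = np.sqrt(2 * np.pi)                     # weights sum to sqrt(2*pi)
    mean = np.sum(weights * samples) / norm
    var = np.sum(weights * samples**2) / norm - mean**2

    print(f"output mean={mean:.4f}, std={np.sqrt(var):.4f}")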


Tuesday September 24, 2019 3:00pm - 3:20pm PDT
Toucan Room, Catamaran Resort

3:20pm PDT

Cloud bursting to AWS from the CIPRES Science Gateway
The role of commercial cloud computing as a source of scalable compute power for science gateways is an area of ongoing investigation. As part of this effort, we are exploring the practicality of cloud bursting to a commercial provider for the CIPRES Science Gateway (CIPRES), a highly accessed gateway that delivers compute resources to users across all fields of biology. CIPRES provides browser and RESTful access to popular phylogenetics codes run on large computational clusters. Historically, CIPRES has submitted compute-intensive jobs to clusters provided through the NSF-funded XSEDE project. An ongoing issue for CIPRES is whether compute time available on XSEDE resources will be adequate to meet the needs of a large and growing user base. Here we describe a partnership with Internet2 to create infrastructure that supports CIPRES submissions to compute resources available through a commercial cloud provider, Amazon Web Services (AWS). This paper describes the design and implementation of the infrastructure created, which allows users to submit a specific subset of CIPRES jobs to V100 GPU nodes at AWS. This new infrastructure allows us to refine and tune job submissions to commercial clouds as a production service at CIPRES. In the short term, the results will speed the discovery process by allowing users greater discretionary access to GPU resources at AWS. In the long term, this infrastructure can be expanded and improved to submit all CIPRES jobs to one or more commercial providers on a fee-for-service basis.
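Conceptually, routing the GPU-eligible subset of jobs might look like the following sketch (tool names and backend labels are purely illustrative, not CIPRES's actual routing table):

    # Illustrative sketch (not CIPRES code): route the specific subset
    # of jobs that benefit from V100 GPUs to AWS, and everything else
    # to the usual XSEDE clusters.
    GPU_TOOLS_ON_AWS = {"beast-gpu", "raxml-ng-gpu"}   # hypothetical IDs

    def choose_backend(tool_id, user_opted_in):
        if tool_id in GPU_TOOLS_ON_AWS and user_opted_in:
            return "aws-v100"       # submit via the Internet2/AWS path
        return "xsede-cluster"      # default allocation-based path

    assert choose_backend("beast-gpu", True) == "aws-v100"
    assert choose_backend("mrbayes", True) == "xsede-cluster"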


Tuesday September 24, 2019 3:20pm - 3:30pm PDT
Toucan Room, Catamaran Resort

3:30pm PDT

Supporting Characterisation Communities with Interactive HPC (Characterisation Virtual Laboratory)
The Characterisation VL is an Australian nationally funded virtual laboratory focused on bringing together the national community around their research data and software. Principally this means imaging techniques, including optical microscopy, CT, MRI, cryo-electron microscopy, and other non-traditional techniques. As it turns out, characterisation is very general, but it does have two principal commonalities:
- Data sets are getting larger every day (CryoEM ~2-5 TB per dataset, LLSM ~1-10 TB per dataset). They are becoming too large for the average workstation and difficult for enterprise IT providers within universities to support.
- Many data processing tools take the form of desktop applications, requiring interactivity and input from domain experts.
Rather than building a dedicated web interface to a single workflow, the CVL has chosen to provide access to a virtual laboratory with all of the techniques needed by the range of characterisation communities. In this demonstration we will show how easy access to virtual laboratories (science gateways) has impacted the Australian characterisation community, as well as explain the first and second generations of the architecture used and how it can be reused by other computing facilities to benefit their users.


Tuesday September 24, 2019 3:30pm - 3:50pm PDT
Toucan Room, Catamaran Resort

3:50pm PDT

SciServer: Bringing Analysis to Petabytes of Scientific Data
SciServer is a free science gateway that offers access to more than five petabytes of data across multiple science domains, along with free online tools to analyze, share, and publish results.
SciServer’s online services are entirely browser-based, with no software to install or configure, and are designed to be easy to learn and use. They include familiar user interface components for managing and sharing files, creating groups, and running computational analyses in Python, R, or Matlab by means of Jupyter Notebooks or RStudio.
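For example, a notebook cell using SciServer's Python client might look roughly like this (the query, database context, and exact call signatures are assumptions based on the public SciScript-Python package):

    # Sketch of a SciServer Compute notebook cell; treat the context
    # name and signatures as assumptions, not guaranteed API.
    from SciServer import Authentication, CasJobs

    token = Authentication.login("my_username", "my_password")

    sql = "SELECT TOP 10 objID, ra, dec FROM PhotoObj"  # example query
    frame = CasJobs.executeQuery(sql, context="DR16")   # pandas DataFrame
    print(frame.head())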

The SciServer project grew out of an existing system designed to support astronomy research, featuring several research and education tools that made access to hundreds of terabytes of astronomical data easy and intuitive for researchers, students, and the public. One component of the previous system was Galaxy Zoo, a citizen science project that resulted in reliable classifications of hundreds of thousands of galaxy images, and led to more than 40 peer-reviewed scientific publications.

The current SciServer system has scaled out these tools for multi-science-domain support, applicable to any form of data. SciServer has been used in a variety of fields, from oceanography to mechanical engineering to social sciences and finance.

SciServer features a learning environment that is being used in K-12 and university education in a variety of contexts, both formal and informal. We have continued to develop the educational tools into a new component called Courseware, which allows a classroom or course project to be defined, giving teachers and students direct access to hosted scientific data sets.
SciServer has sufficiently impressed some of our collaborators that three of them have deployed the system themselves in varied environments. To facilitate this, over the past year we redeveloped the packaging and deployment model to support deployment on Kubernetes clusters. This work then led us to a new deployment of the system in the Amazon cloud on the EKS platform. This latter installation is allowing us to experiment with the issues around data hosting and data transfer in a hybrid-cloud environment, and with how best to support integration of user data between locally hosted and cloud-hosted data sets.

SciServer is being developed by the Institute for Data-Intensive Engineering and Science (IDIES) at Johns Hopkins University, with funding from a five-year award from the National Science Foundation.


Tuesday September 24, 2019 3:50pm - 4:10pm PDT
Toucan Room, Catamaran Resort
 
Wednesday, September 25
 

10:30am PDT

Ghub: Building a Glaciology Gateway to Unify a Community
There is currently no consensus on how quickly the Greenland ice sheet is melting due to global warming, or what the ramifications will be for the rise in sea level. Sea level rise is a grave concern, due to its potential impact on coastal populations, global economies, and national security. Therefore, the ice-sheet science community is striving to improve its understanding of the problem. This community consists of two groups that perform related but distinct kinds of science: a data community and a model-building community. Broadly, the data community characterizes past and current states of the ice sheets, by assembling data from past events and from satellite observations. The modeling community, meanwhile, seeks to explain and forecast the speed and extent of ice sheet melting and subsequent sea level rise, by developing and validating computational models to explain these changes. Although ice sheet experimental data and models are dependent on one another, these two groups of scientists are not well integrated; better coordination is needed between data collection efforts and modeling efforts if we are to improve our understanding of ice sheet melting rates. These two scientific communities must build closer ties in order to better validate models and reduce prediction uncertainties.

We present a new science gateway, Ghub, that is taking form as a collaboration space for ice sheet scientists in academia and government agencies alike. This gateway, built on the HUBzero platform, will host datasets and modeling workflows, and provide access to codes for community tool building. First, we aim to collect, centralize, and fuse existing datasets, creating new data products that more completely catalog the ice sheets of Greenland and Antarctica. Second, we plan to build workflows that provide support for correct model validation and improve uncertainty quantification, thus extending existing ice sheet models. Finally, we will host existing community codes. We will install codes such as CmCt on the gateway server itself, and others, such as ISSM, on gateway-accessible high-performance computing resources, so that scientists can build new tools utilizing them. A natural objective of this gateway is to provide a unifying location where these disparate scientific communities may gather, mingle, and collaborate, using collaborative gateway features with the goal of doing better science. Overall, this gateway will be a major step towards accomplishing goals that were identified by a recent NSF workshop on the Greenland ice sheet. With this new cyberinfrastructure, ice sheet scientists will gain improved tools to quantify the rate and extent of sea level rise, for the benefit of human societies around the globe.

Presenters
Jeanette Sperhac

Scientific Programmer, University at Buffalo/Center for Computational Research


Wednesday September 25, 2019 10:30am - 10:50am PDT
Kon Tiki Room, Catamaran Resort, 3999 Mission Boulevard, San Diego, California 92109

10:50am PDT

The ‘Ike Wai Hawai‘i Groundwater Recharge Tool
This paper discusses the design and implementation of the ‘Ike Wai Hawai‘i Groundwater Recharge Tool, an application for providing data and analyses of the impacts of land-cover and climate modifications on groundwater-recharge rates for the island of O‘ahu. This application uses simulation data based on a set of 29 land-cover types and two rainfall scenarios to provide users with real-time recharge calculations for interactively defined land-cover modifications. Two visualizations, representing the land cover for the island and the resultant groundwater-recharge rates, and a set of metrics indicating the changes to groundwater recharge for relevant areas of the map are provided to present a set of easily interpreted outcomes based on the user-defined simulations. Tools are provided to give users varying degrees of control over the granularity of data input and output, allowing for the quick production of a roughly defined simulation, or more precise land-cover models that can be exported for further analysis. Heuristics are used to provide a responsive user interface and performant integration with the database containing the full set of simulation data. This tool is designed to provide user-friendly access to the information on the impacts of land-cover and climate changes on groundwater-recharge rates needed to make data-driven decisions.
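A minimal sketch of such a real-time calculation, assuming recharge rates are precomputed per map cell, land-cover type, and rainfall scenario (all names and values below are illustrative, not the tool's actual data):

    # Hypothetical sketch: recharge is precomputed per (cell, land cover,
    # rainfall scenario), so a user edit needs only lookups and a sum,
    # never a new simulation.
    RECHARGE = {                        # mm/yr; illustrative values only
        ("cell_17", "forest", "wet"): 420.0,
        ("cell_17", "urban",  "wet"): 150.0,
        ("cell_18", "forest", "wet"): 390.0,
        ("cell_18", "urban",  "wet"): 130.0,
    }

    def recharge_change(edits, scenario):
        """Total recharge delta for user-edited cells under one scenario."""
        return sum(
            RECHARGE[(cell, new, scenario)] - RECHARGE[(cell, old, scenario)]
            for cell, old, new in edits
        )

    edits = [("cell_17", "forest", "urban"), ("cell_18", "forest", "urban")]
    print(recharge_change(edits, "wet"))   # -530.0 mm/yr across edited cells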


Wednesday September 25, 2019 10:50am - 11:10am PDT
Kon Tiki Room, Catamaran Resort, 3999 Mission Boulevard, San Diego, California 92109

1:00pm PDT

Instant On: Caching Simulation Results for Science Gateways
Powered by the HUBzero platform, nanoHUB is the science gateway built and operated by the Network for Computational Nanotechnology (NCN). Like many science gateways, nanoHUB offers a variety of content. Among all HUBzero hubs, nanoHUB is unique for its large catalog of simulation tools and its community of tool users. In 2018, nanoHUB saw 16,750 users execute more than 750,000 simulation jobs using some 600 simulation tools. The resources applied to computing these jobs totaled some 145,000 CPU hours.

While the CPU allocation is significant, what is arguably more significant is the “wall” time experienced by users running the simulations. Our own internal studies have shown a relationship between usage and wall time: tools with a low expected wall time typically have the highest utilization. The bulk of nanoHUB Rappture tools execute jobs ranging from nearly zero seconds to the maximum allowed session time of two weeks. Across these jobs, the expected (median) wall time is approximately 17.0 seconds.

Starting in 2011, the leadership teams of nanoHUB and HUBzero were jointly awarded an NSF grant for the “Instant On” project. This project invested in several strategies to reduce resource consumption and improve user experience by reducing the turnaround time between submitting a simulation job and receiving the computed result. One of these strategies was to develop a system to reuse simulation results when possible. This development ultimately became part of the HUBzero middleware as a caching system. It is this caching system on which the remainder of this paper will focus.

In Section 2, we will describe the design goals of the “Instant On” cache and highlight some of the implementation details and features. In Section 3, we will discuss the operation of the cache with respect to utility and economy, as well as some of the pitfalls, both experienced and potential. Section 4 will present some future directions in which the cache is but one of several services built on top of the underlying archive of simulation results. We will conclude in Section 5 with an invitation for other science gateways to use “Instant On” as part of their tool and workflow pipelines.


Wednesday September 25, 2019 1:00pm - 1:20pm PDT
Kon Tiki Room, Catamaran Resort, 3999 Mission Boulevard, San Diego, California 92109

1:20pm PDT

SimTools: Standardized Packaging for Simulation Tools
In this paper we introduce SimTools, a simple way to create tools with well-defined inputs and outputs. Ease of use is a priority: a SimTool is a Jupyter notebook that can be a self-contained simulation or a wrapper that calls a larger tool. Inputs and outputs are described in YAML embedded in the notebook. A new copy of the notebook is returned as the result of the run, with the cells showing the progress of the simulation, including intermediate results for debugging. Outputs are embedded in the notebook metadata as data or database references. Published SimTools can be deployed as Docker or Singularity images and will be runnable on any platform that can run those containers.
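A notebook cell declaring inputs and outputs might resemble the following sketch; the exact SimTools YAML schema is not given here, so the layout is an assumption:

    # Sketch of a YAML input/output declaration embedded in a SimTool
    # notebook cell; the field names are illustrative, not the real schema.
    import yaml

    DESCRIPTION = """
    inputs:
      temperature:
        type: Number
        value: 300
        units: K
    outputs:
      bandgap:
        type: Number
        units: eV
    """

    spec = yaml.safe_load(DESCRIPTION)
    print(sorted(spec["inputs"]), "->", sorted(spec["outputs"]))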


Wednesday September 25, 2019 1:20pm - 1:40pm PDT
Kon Tiki Room, Catamaran Resort, 3999 Mission Boulevard, San Diego, California 92109

1:40pm PDT

Chem Compute Undergraduate Computational Chemistry Science Gateway
The Chem Compute Science Gateway provides access for undergraduate chemistry students to perform computational chemistry jobs. These jobs mostly run within a typical 3-4 hour laboratory period. Thus, the users and usage of our gateway are quite different from those of a typical research-based gateway. We will demonstrate the usage of the gateway and the aspects of it that are geared towards interfacing with undergraduates in a short lab period.

Presenters
Mark Perri

Associate Professor, Chem Compute


Wednesday September 25, 2019 1:40pm - 1:50pm PDT
Kon Tiki Room, Catamaran Resort, 3999 Mission Boulevard, San Diego, California 92109
 