Gateways 2019 has ended
Tuesday, September 24 • 11:10am - 11:20am
Purdue University Research Repository - adapting when small data gets bigger


The Purdue University Research Repository (PURR) was founded in 2011 as a partnership between Purdue University Libraries, Information Technology at Purdue (ITaP), and the Office of the Executive Vice President for Research. It provides campus-wide support for researchers throughout the data management lifecycle and is built on the HUBzero® platform, which was developed at Purdue. PURR provides the tools and expertise to help researchers plan for data management, share data with collaborators, publish completed datasets in compliance with federal funding guidelines, safely archive data, and track data publication impact. Every PURR user has access to private space for storing and sharing research data files. When research is completed, PURR takes users through a step-by-step process for selecting and describing data files for publication. Upon publication, PURR mints a DOI for each dataset and provides archiving services through the MetaArchive network. All published datasets are maintained and accessible on the PURR website for at least 10 years, after which they will be reviewed by the libraries and could be decommissioned or moved to library archives.

Over the past eight years, PURR has published 975 datasets and served over 3,600 researchers with 481 grant awards. In that time, PURR’s services have grown along with the HUBzero® platform to meet the changing needs of the Purdue community as researchers across all fields produce more data. Supporting larger datasets requires a multi-faceted approach far beyond simply acquiring additional storage space. Our recent development has followed a five-pronged plan: 1) increased storage quotas, 2) new publication series functionality, 3) an online database viewer, 4) publication file preview, and 5) seamless FTP transfers for large publications. Combined, these improvements ensure that our increasingly large data publications are not only stored safely but also remain accessible over the long term.

The newly published Rough Cilicia Survey Pottery Study dataset series illustrates both the motivation for and the results of PURR’s recent development. The culmination of four years of close collaboration between PURR’s data curator and a faculty member from Purdue’s classics department, the Rough Cilicia collection is composed of 25 datasets. The collection takes advantage of PURR’s series functionality, which allows authors to separate large data collections into smaller, more manageable, related subsets. These subsets are easier to download than the entire collection, and each subset has a DOI for precise citation. This series makes available images of hundreds of pottery sherds from the ancient Cilicia region of modern-day Turkey, along with their associated descriptive information, in a series of interactive data tables that allow the user to view, search, and filter data on the PURR website. Users can also download the data files for closer study and reuse. At about 15 GB, the Rough Cilicia series is not exactly “big data,” but it is large enough to stretch the limits of a web-based repository like PURR, and we are increasingly seeing datasets of this size or larger. Moderate improvements like the five mentioned here allow us to publish larger datasets while maintaining the ease and convenience of serving users through a web browser.


Claire Stirm

Project Coordinator, UC San Diego | SDSC
Claire Stirm is the Deputy Director of the Incubator and Project Coordinator for the Science Gateways Community Institute (SGCI). 

Sandi Caldrone

Purdue University Libraries

Tuesday September 24, 2019 11:10am - 11:20am PDT
Kon Tiki Room, Catamaran Resort 3999 Mission Boulevard, San Diego, California 92109