HPDF: Status and Plans 

June 11, 2024

Presenters

  • Amber Boehnlein, Project Director
  • Lavanya Ramakrishnan, Deputy Project Director
  • Graham Heyes, Technical Director
  • Shane Canon, Deputy Technical Director

Overview

The mission of the High Performance Data Facility Project is to enable and accelerate scientific discovery by delivering state-of-the-art data management infrastructure, capabilities, and tools. HPDF will be a component of the U.S. Department of Energy’s Advanced Scientific Computing Research ecosystem. The HPDF Project is a partnership between Thomas Jefferson National Accelerator Facility and Lawrence Berkeley National Laboratory.  

During this webinar, the HPDF leadership team introduced the project, summarizing its status and immediate plans. Click on the presentation and video links to learn more.

Presentation

Slides: 2024-06-HPDFWebinarSlides-final.pdf

Webinar Q&A

Q1: What types of data are the focus of the HPDF Project and future facility?

A1: The HPDF Project team is designing a facility to address the full data lifecycle for experimental, observational, and simulation data, for the wide variety of data-intensive science research supported by the Office of Science. A successful HPDF Project will provide infrastructure and services that support the data lifecycle from data collection or generation through processing and curation, analysis, publishing, and archiving, covering the end-to-end scientific discovery process. HPDF will support the three Integrated Research Infrastructure (IRI) patterns identified in the IRI Blueprint Activity: time-sensitive, data-integration-intensive, and long-term campaigns.

For example, HPDF will help researchers in communities collect and process experimental and observational data and possibly integrate these products with theoretical and simulation data to answer boundary-pushing, cross-cutting research questions. In cases of time-sensitive workflows, experimental data will be streamed into HPDF for rapid analysis leading to refinement of the experimental approach. Similarly, when handling long-term campaigns, HPC simulation data will be moved into HPDF for continued analysis and combination with experimental data archived from multiple current and past instruments in the community. We are interested in hearing about the range of needs and use cases from across the DOE Office of Science complex as we prioritize the services we aim to offer over time.

Q2: What types of advanced data capabilities and data management services will be integrated into HPDF?

A2: The precise set of data capabilities will be prioritized in partnership with the community. Our long-term vision is a comprehensive portfolio of data services that address challenges throughout the complete data lifecycle. We anticipate the initial set of services will be chosen in consultation with the community and will expand over time. HPDF data management capabilities will be built around the FAIR principles and will have metadata and rich provenance. A key part of our strategy for addressing these challenges will be to leverage existing software in use across the U.S. Department of Energy’s Office of Science complex as well as in open-source communities.

Q3: What is [the relationship of HPDF to] the IRI program? Is HPDF the data facility within the IRI program?

A3: The Integrated Research Infrastructure (IRI) is a broad DOE Office of Science effort, organized and led by the ASCR program. HPDF is a DOE 413 Project, sponsored by the ASCR program, to enable and accelerate scientific discovery by delivering state-of-the-art data management infrastructure, capabilities, and tools. The IRI Blueprint Activity report emphasizes that the IRI effort will “empower researchers to meld DOE’s world-class research tools, infrastructure, and user facilities seamlessly and securely in novel ways to radically accelerate discovery and innovation” (p. 3), and the entire ASCR Facilities ecosystem will play a major role in furthering IRI. We expect the HPDF Project to both contribute to and leverage outcomes from IRI, but it will still have a distinct mission and objectives as the team works on delivering resources to the community. In the IRI ecosystem, HPDF specifically will enable analysis, preservation, and accessibility of the staggering amounts of data produced by SC facilities and projects.

Questions?

Reach out via our Contact page.