Earth System Grid II: Turning Climate Datasets Into Community Resources
http://www.earthsystemgrid.org
PIs: Ian Foster (ANL), Don Middleton (NCAR), & Dean Williams (PCMDI)
Executive Summary, September 2001
Vision
The need to evaluate climate change scenarios under the Kyoto accord makes climate modeling a mission-critical application area for DOE. The climate modeling component of DOE’s SciDAC program seeks to address this need by creating an advanced climate simulation program that will accelerate the execution of climate models one-hundred-fold by 2005 relative to today’s execution rates. This program, along with similar modeling and observational programs, will produce a tremendous volume of data that has the potential to revolutionize our understanding of complex climate processes. For this potential to be realized, geographically distributed teams of researchers must be able to develop new knowledge effectively and rapidly from these massive, distributed data holdings and share the results with a wider community.
The magnitude and complexity of the data problem already challenge the strained resources of research groups and, if left unaddressed, could become a formidable barrier to research progress. If target research goals are to be realized, fundamentally new methodologies for managing, accessing, recombining, analyzing and intercomparing distributed data are required. The Earth System Grid II (ESG) project addresses this critical problem. We will define, develop, and deploy a next-generation environment that harnesses the combined potential of massive distributed data resources, remote computation, and high-bandwidth wide-area networks as an integrated resource for the research scientist. We envision the ESG as a foundation for next-generation analysis applications, web-based data portals, and collaborative problem-solving environments, and thus as vital enabling infrastructure for sustaining and advancing climate and other environmental research.
Major Goals and Technical Challenges
Creating an effective and efficient ESG in support of DOE climate research goals is challenging at multiple levels. A large community of global change researchers at laboratories and universities around the nation will need to access significant fractions of the data. Most requests will require significant analysis, which may be computationally demanding. User requests for data products will be translated into appropriate combinations of accesses to data caches, requests to central data archives, and new large-scale simulations. Managing the resulting data movement effectively will tax even the highest-performance, most advanced networks.
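To make the request translation described above concrete, here is a minimal sketch, assuming a simple three-tier policy in which a nearby cache is preferred, the central archive is tried next, and a new simulation is scheduled only when neither holds the data. The names `Source`, `Step`, and `plan_request` and the dataset identifiers are hypothetical and illustrative; they do not describe ESG's actual request manager.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Tiers a data product can come from (illustrative, not ESG's design)."""
    CACHE = "cache"
    ARCHIVE = "archive"
    SIMULATION = "simulation"


@dataclass(frozen=True)
class Step:
    """One planned action: obtain `dataset` from `source`."""
    dataset: str
    source: Source


def plan_request(datasets, cached, archived):
    """Assign each requested dataset to the first tier that can supply it.

    Assumed policy: a nearby cache is cheapest, the central archive is next,
    and anything found in neither must be regenerated by a new simulation.
    """
    plan = []
    for name in datasets:
        if name in cached:
            plan.append(Step(name, Source.CACHE))
        elif name in archived:
            plan.append(Step(name, Source.ARCHIVE))
        else:
            plan.append(Step(name, Source.SIMULATION))
    return plan


# Hypothetical dataset identifiers, for illustration only.
plan = plan_request(
    ["pcm.tas.1990", "pcm.pr.1990", "ccsm.tas.2050"],
    cached={"pcm.tas.1990"},
    archived={"pcm.tas.1990", "pcm.pr.1990"},
)
for step in plan:
    print(step.dataset, step.source.value)
```

A real planner would weigh transfer and compute costs rather than follow a fixed tier order; the fixed order simply shows how one user request fans out into cache reads, archive retrievals, and simulation runs.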
In previous DOE-funded research, we took the first steps towards realizing the Earth System Grid vision. Specifically, we developed techniques for the high-speed movement of data between centers and users, replica catalogs for keeping track of data location, request managers for coordinating multiple transfers, and a Grid-enabled version of the data analysis package produced by the Program for Climate Model Diagnosis and Intercomparison (PCMDI). We demonstrated our ability to manage the location and movement of large datasets from the user’s desktop. We also learned much about user requirements: in particular, the importance of thin clients and standard data access protocols as a means of delivering ESG capabilities to the largest possible audience, and the emerging importance of moving analysis processes to the data.
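The replica catalog idea mentioned above can be illustrated with a minimal sketch, assuming a catalog that maps a logical file name to its registered physical copies and picks the copy with the lowest estimated transfer time (a fixed startup cost plus size divided by bandwidth). The classes `Replica` and `ReplicaCatalog`, their fields, the cost model, and the placeholder GridFTP-style URLs are all hypothetical; this is not the interface of the catalog built in the earlier work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (illustrative fields)."""
    url: str
    bandwidth_mbps: float  # assumed throughput from this site to the user, Mb/s
    latency_s: float       # assumed fixed startup cost of a transfer, seconds


@dataclass
class ReplicaCatalog:
    """Toy replica catalog: logical file name -> registered physical copies."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica):
        self.entries.setdefault(logical_name, []).append(replica)

    def locate(self, logical_name):
        return list(self.entries.get(logical_name, []))

    def best_replica(self, logical_name, size_mb):
        """Return the copy with the lowest estimated transfer time."""
        replicas = self.locate(logical_name)
        if not replicas:
            raise KeyError(f"no replicas registered for {logical_name!r}")

        def transfer_time(r):
            # startup cost + (megabytes * 8 bits/byte) / (megabits per second)
            return r.latency_s + size_mb * 8 / r.bandwidth_mbps

        return min(replicas, key=transfer_time)


catalog = ReplicaCatalog()
catalog.register("pcm/tas_1990.nc",
                 Replica("gsiftp://archive.example.org/pcm/tas_1990.nc", 100.0, 0.5))
catalog.register("pcm/tas_1990.nc",
                 Replica("gsiftp://cache.example.org/pcm/tas_1990.nc", 622.0, 2.0))
best = catalog.best_replica("pcm/tas_1990.nc", size_mb=2048)
print(best.url)  # gsiftp://cache.example.org/pcm/tas_1990.nc
```

Under these assumed numbers, a 2048 MB file is fetched from the faster cache (about 28 s versus 164 s), while a 1 MB file comes from the archive because its lower startup cost dominates.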
Building upon this base, the ESG will be a next-generation environment that enables flexible, efficient, universal access to large datasets and to the distributed resources needed to apply analysis and filtering functions to them. In creating this new community infrastructure, we will turn climate model data into true community resources and place these advanced capabilities in the hands of a substantial user community.
Major Activities and Milestones
The primary activities of ESG are:
Year 1:
Year 2:
Year 3:
Connectivity to SciDAC and Other Projects
The DOE Science Grid (PI: William Johnston, LBL) – Leverage common infrastructure provided by the Science Grid to support development and deployment of ESG technologies and services.
Storage Resource Management for Data Grid Applications (PI: Arie Shoshani, LBL) – Leverage SRM developments in support of ESG.
Data Management ISIC (PI: Arie Shoshani, LBL) – Coordinate with this effort so that data management approaches are aligned and metadata strategies apply consistently across both projects.
Security and Policy for Group Collaboration (PI: Steven Tuecke, ANL) – Leverage community authorization services.
High-Performance Data Grid Toolkit (PI: Ian Foster, ANL) – Leverage this toolkit for data dissemination and access.