Grid Research Integration Deployment and Support Center 

A Grid Computing Primer

The following primer on Grid computing was adapted from "The Grid: A New Infrastructure for 21st Century Science," in Physics Today, February 2002, by Ian Foster (Argonne National Laboratory and the University of Chicago), and "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of High Performance Computing Applications, 2001, by Ian Foster, Carl Kesselman (University of Southern California Information Sciences Institute), and Steve Tuecke (Argonne National Laboratory).
See http://www-fp.mcs.anl.gov/~foster/ for the full text of both articles.

Background
An infrastructure is a technology that we can take for granted when performing our activities. The road system enables us to travel by car; the international banking system allows us to transfer funds across borders; and the Internet allows us to communicate with virtually any electronic device.

To be useful, an infrastructure technology must be broadly deployed, which means, in turn, that it must be simple, extraordinarily valuable, or both. A good example is the set of protocols that must be implemented within a device to allow Internet access. The set is so small that people have constructed matchbox-sized Web servers. A Grid infrastructure needs to provide more functionality than the Internet on which it rests, but it must also remain simple. And, of course, the need remains to support the capabilities that power the Grid, such as high-speed data movement, caching of large datasets, and on-demand access to computing.

Tools make use of infrastructure services. Internet and Web tools include browsers for accessing remote Web sites, e-mail programs for handling electronic messages, and search engines for locating Web pages. Grid tools are concerned with resource discovery, data management, scheduling of computation, security, and so forth. But the Grid goes beyond sharing and distributing data and computing resources.

The following examples illustrate how researchers can put Grids to work.

Science Portals
Scientists often face a steep learning curve when installing and using new software. Science portals make advanced problem-solving methods easier to use by invoking sophisticated packages remotely from Web browsers or other simple, easily downloaded "thin clients." The packages themselves can also run remotely on suitable computers within a Grid. Such portals are currently being developed in biology, fusion, computational chemistry, and other disciplines.
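
To make the idea concrete, the sketch below shows how small such a "thin client" can be: a single HTTP request asking a portal to run a package on the researcher's behalf. The portal URL and the job-description fields are invented for this illustration; a real portal (for example, one built with GridPort) defines its own interface.

    # Minimal sketch of a portal "thin client" (hypothetical endpoint and fields).
    import json
    import urllib.request

    job = {
        "application": "chem-workbench",  # package installed on the portal side
        "input_file": "molecule.pdb",     # data already staged near the portal
        "cpu_hours": 4,
    }

    request = urllib.request.Request(
        "https://portal.example.org/api/submit",  # illustrative URL only
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print("Job accepted:", response.read().decode("utf-8"))
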
Resources:
http://www.cactuscode.org/ - Cactus, open-source software for creating Grid portals
http://workbench.sdsc.edu/ - Biology Workbench portal
http://archive.ncsa.uiuc.edu/alliance/partners/ApplicationTechnologies/ChemicalEngineering.html - Chemical Engineering Workbench portal
http://gridport.net/pubs/10.25.04.GridPort3Training.ppt - "Building Grid Enabled Portals Using GridPort 3"
http://www.collab-ogce.org/nmi/papers/PortletTutorial.ppt - "Grid Portlet Writing Tutorial"
http://www.collab-ogce.org - Open Grid Computing Environments Collaboratory Web site, which has a list of active science portals

Distributed Computing
High-speed networks can yoke together an organization's workstations and PCs to form a substantial computational resource. Entropia Inc.'s FightAIDS@Home system harnesses more than 30,000 computers to analyze AIDS drug candidates. And in 2001, mathematicians across the US and Italy pooled their computational resources to solve a particular instance, dubbed "Nug30," of an optimization problem. For a week, the collaboration brought an average of 630 computers (and a maximum of 1006) to bear on Nug30, delivering a total of 42,000 CPU-days. Future improvements in network performance and Grid technologies will increase the range of problems that aggregated computing resources can tackle.
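
The programming pattern behind such projects is typically master/worker: a coordinator farms independent tasks out to whatever machines are currently available and collects results as they complete. The sketch below imitates the pattern on a single host using Python's multiprocessing module; the work function is a placeholder standing in for, say, one drug-candidate docking run or one branch of the Nug30 search tree.

    # Master/worker sketch of pooled computing, run locally for illustration.
    from multiprocessing import Pool

    def analyze(task_id):
        """Placeholder for one compute-heavy, independent unit of work."""
        result = sum(i * i for i in range(100_000))  # stand-in computation
        return task_id, result

    if __name__ == "__main__":
        tasks = range(32)                # in practice: millions of tasks
        with Pool(processes=4) as pool:  # in practice: thousands of machines
            for task_id, result in pool.imap_unordered(analyze, tasks):
                print("task", task_id, "finished:", result)
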
Resources:
http://www.fightaidsathome.org/ - Entropia's FightAIDS@Home
http://www-unix.mcs.anl.gov/metaneos/nug30/ - Nug30 project

Large-Scale Data Analysis
Many interesting scientific problems require analysis of large datasets. For such problems, harnessing distributed computing and storage resources is clearly of great value. Furthermore, the natural parallelism inherent in many data analysis procedures makes it feasible to use distributed resources efficiently. For example, analysis of the many petabytes of data to be produced by the Large Hadron Collider (LHC) and other future high-energy physics experiments will require the marshalling of tens of thousands of processors and hundreds of terabytes of disk space for holding intermediate results. For various technical and political reasons, assembling these resources at a single location appears impractical. Yet the collective institutional and national resources of the hundreds of participating institutions can meet these needs. Beyond sharing computers and storage, these communities can also share analysis procedures and computational results.
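
A minimal sketch of that data parallelism, under the assumption that each partition of the dataset can be summarized independently: every site reduces the partitions it stores to a small intermediate result, and only those summaries cross the network. The partition names and the per-partition analysis below are hypothetical placeholders.

    # Sketch of partitioned analysis: map over partitions, then combine.
    from concurrent.futures import ProcessPoolExecutor

    def local_analysis(partition):
        """Stand-in for per-partition event selection; returns a small summary."""
        # A real Grid job would read the partition from nearby storage.
        return len(partition)  # placeholder "count of interesting events"

    if __name__ == "__main__":
        partitions = ["events-%04d.dat" % i for i in range(16)]
        with ProcessPoolExecutor() as executor:
            counts = list(executor.map(local_analysis, partitions))
        print("total interesting events:", sum(counts))
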
Resources:
http://www.griphyn.org/ - GriPhyN, the Grid Physics Network
http://www.neesgrid.org/ - Network for Earthquake Engineering Simulation

Computer-in-the-Loop Instrumentation
Scientific instruments such as telescopes, synchrotrons, and electron microscopes generate raw data streams that are archived for subsequent batch processing. But quasi-real-time analysis can greatly enhance an instrument's capabilities. For example, consider an astronomer studying solar flares with a radio telescope array. The deconvolution and analysis algorithms used to process the data and detect flares are computationally demanding. Running the algorithms continuously would be inefficient for studying flares that are brief and sporadic. But if the astronomer could call on substantial computing resources (and sophisticated software) in an on-demand fashion, he or she could use automated detection techniques to zoom in on solar flares as they occurred.
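
The control flow is easy to sketch: a cheap trigger watches the data stream continuously, and the expensive analysis is dispatched to remote resources only when a candidate event appears. The stream source, trigger threshold, and dispatch call below are all illustrative stand-ins.

    # Sketch of computer-in-the-loop triggering (all values illustrative).
    import random

    THRESHOLD = 0.97  # hypothetical trigger level for a flare candidate

    def read_sample():
        """Stand-in for one reading from the instrument's data stream."""
        return random.random()

    def request_grid_analysis(sample):
        """Stand-in for dispatching deconvolution to on-demand Grid resources."""
        print("flare candidate %.3f: submitting full analysis job" % sample)

    for _ in range(10_000):
        sample = read_sample()
        if sample > THRESHOLD:             # the cheap check runs continuously...
            request_grid_analysis(sample)  # ...the costly analysis, only on demand
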
Resources:
http://www.ivdgl.org/ - International Virtual Data Grid Laboratory
http://www.ppdg.net/ - Particle Physics Data Grid

Collaborative Work
Researchers often want to aggregate not only data and computing power, but also human expertise. Collaborative problem formulation, data analysis, and the like are important Grid applications. For example, an astrophysicist who has performed a large, multi-terabyte simulation might want colleagues around the world to visualize the results in the same way and at the same time so that the group can discuss the results in real time.
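
One simple way to keep such a session in step is for the presenter to broadcast the shared view state (camera position, colormap, timestep) so that each participant's viewer renders the same frame. The multicast address below is an arbitrary example; a real collaborative tool would add authentication and reliable delivery.

    # Sketch of publishing shared visualization state (example address only).
    import json
    import socket

    GROUP, PORT = "239.1.1.1", 5007  # arbitrary example multicast group

    def publish_view(state):
        """Send the current view state to all listening collaborators."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        sock.sendto(json.dumps(state).encode("utf-8"), (GROUP, PORT))
        sock.close()

    publish_view({"camera": [0.0, 0.0, 5.0], "colormap": "hot", "timestep": 1200})
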
Resources:
http://www.teragrid.org/ - TeraGrid project
http://www.eu-datagrid.org/ - European Union Data Grid
http://birn.ncrr.nih.gov/ - Biomedical Informatics Research Network

Here are some other "real-world" examples of Grid applications:
  • A company needing to reach a decision on the placement of a new factory invokes a sophisticated financial forecasting model from an Application Service Provider (ASP), providing the ASP with access to appropriate proprietary historical data from a corporate database on storage systems operated by a Storage Service Provider (SSP). During the decision-making meeting, what-if scenarios are run collaboratively and interactively, even though the division heads participating in the decision are located in different cities. The ASP itself contracts with a cycle provider for additional "oomph" during particularly demanding scenarios, requiring, of course, that the cycles meet desired security and performance requirements.

  • An industrial consortium formed to develop a feasibility study for a next-generation supersonic aircraft undertakes a highly accurate multidisciplinary simulation of the entire aircraft. This simulation integrates proprietary software components developed by different participants, with each component operating on that participant’s computers and having access to appropriate design databases and other data made available to the consortium by its members.

  • A crisis management team responds to a chemical spill by using local weather and soil models to estimate the spread of the spill, determining the impact based on population location as well as geographic features such as rivers and water supplies, creating a short-term mitigation plan (perhaps based on chemical reaction models), and tasking emergency response personnel by planning and coordinating evacuation, notifying hospitals, and so forth.

  • Thousands of physicists at hundreds of laboratories and universities worldwide come together to design, create, operate, and analyze the products of a major detector at CERN, the European high-energy physics laboratory. During the analysis phase, they pool their computing, storage, and networking resources to create a "Data Grid" capable of analyzing petabytes of data.



©2004 GRIDS Center. All Rights Reserved.