The following primer on Grid computing was adapted from "The Grid: A New Infrastructure for 21st Century Science," in Physics Today, February 2002, by Ian Foster (Argonne National Laboratory and the University of Chicago), and "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, 2001, by Ian Foster, Carl Kesselman (University of Southern California Information Sciences Institute), and Steve Tuecke (Argonne National Laboratory). See http://www-fp.mcs.anl.gov/~foster/ for the full text of both articles.
Background
An infrastructure is a technology that we can take for granted when performing our activities. The road system enables us to travel by
car; the international banking system allows us to transfer funds across borders; and the Internet allows us to communicate with
virtually any electronic device.
To be useful, an infrastructure technology must be broadly deployed, which means, in turn, that it must be simple, extraordinarily
valuable, or both. A good example is the set of protocols that must be implemented within a device to allow Internet access. The set is
so small that people have constructed matchbox-sized Web servers. A Grid infrastructure needs to provide more functionality than the
Internet on which it rests, but it must also remain simple. And, of course, it must still support the resources that power the Grid: high-speed data movement, caching of large datasets, and on-demand access to computing.
Tools make use of infrastructure services. Internet and Web tools include browsers for accessing remote Web sites, e-mail programs for
handling electronic messages, and search engines for locating Web pages. Grid tools are concerned with resource discovery, data
management, scheduling of computation, security, and so forth. But the Grid goes beyond sharing and distributing data and computing
resources.
The following examples describe how researchers may deploy Grids.
Science Portals
Scientists often face a steep learning curve when installing and using new software. Science portals make advanced problem-solving
methods easier to use by invoking sophisticated packages remotely from Web browsers or other simple, easily downloaded "thin
clients." The packages themselves can also run remotely on suitable computers within a Grid. Such portals are currently being
developed in biology, fusion, computational chemistry, and other disciplines.
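To make the portal pattern concrete, here is a rough sketch of what a minimal "thin client" might look like: it posts a job description to a portal over HTTP and then polls until the remotely running package finishes. The portal URL, endpoint paths, and job fields below are illustrative assumptions, not features of any particular portal.

    # Minimal "thin client" sketch. The portal address, endpoints, and job
    # fields are hypothetical; a real science portal would define its own API.
    import json
    import time
    import urllib.request

    PORTAL = "https://portal.example.org/api"   # hypothetical portal endpoint

    def submit(job):
        """POST a job description; the heavyweight package runs remotely."""
        req = urllib.request.Request(
            f"{PORTAL}/jobs",
            data=json.dumps(job).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["job_id"]

    def wait_for(job_id, poll_seconds=30):
        """Poll the portal until the remote computation finishes."""
        while True:
            with urllib.request.urlopen(f"{PORTAL}/jobs/{job_id}") as resp:
                status = json.load(resp)
            if status["state"] in ("done", "failed"):
                return status
            time.sleep(poll_seconds)

    if __name__ == "__main__":
        job_id = submit({"package": "chem-suite", "input": "benzene.in", "cpus": 64})
        print(wait_for(job_id))

All the heavy lifting (installation, scheduling, execution) happens on the portal's side; the client needs nothing but a standard HTTP library.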
Distributed Computing
High-speed networks can yoke together an organization's PCs and workstations to form a substantial computational resource. Entropia Inc.'s FightAIDSAtHome system harnesses more than 30 000 computers to analyze AIDS drug candidates. And in 2001, mathematicians across the US and Italy pooled their computational resources to solve a particular instance, dubbed "Nug30," of an optimization problem. For a week, the collaboration brought an average of 630 computers (and a maximum of 1006) to bear on Nug30, delivering a total of 42 000 CPU-days. Future improvements in network performance and Grid technologies will increase the range of problems that aggregated computing resources can tackle.
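At its core, this style of distributed computing is a master/worker pattern: a coordinator farms out many independent work units and collects the results. The sketch below illustrates the idea, with a local process pool standing in for the thousands of pooled machines and a placeholder scoring function in place of a real docking or optimization code; nothing in it is specific to Entropia or Nug30.

    # Master/worker sketch of a desktop Grid. A local process pool stands in
    # for remote volunteer machines; score_candidate() is a toy placeholder.
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def score_candidate(candidate_id):
        # Placeholder for an expensive, independent computation
        # (e.g., evaluating one drug candidate or one search-tree branch).
        return candidate_id, sum(i * i for i in range(200_000)) % 97

    def main():
        work_units = range(1000)                  # independent tasks
        results = {}
        with ProcessPoolExecutor() as pool:       # "the Grid", locally faked
            futures = [pool.submit(score_candidate, c) for c in work_units]
            for fut in as_completed(futures):
                cid, score = fut.result()
                results[cid] = score
        best = min(results, key=results.get)
        print(f"best candidate: {best} (score {results[best]})")

    if __name__ == "__main__":
        main()

Because the work units are independent, the pattern scales to however many machines happen to be available, which is exactly what makes idle PCs a useful aggregate resource.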
Large-Scale Data Analysis
Many interesting scientific problems require analysis of large datasets. For such problems, harnessing distributed computing and storage
resources is clearly of great value. Furthermore, the natural parallelism inherent in many data analysis procedures makes it feasible to
use distributed resources efficiently. For example, analyzing the many petabytes of data to be produced by the Large Hadron Collider (LHC) and other future high-energy physics experiments will require marshalling tens of thousands of processors and hundreds of terabytes of disk space for holding intermediate results. For various technical and political reasons, assembling these resources at a single location appears impractical. Yet, taken together, the hundreds of institutions participating in those experiments can provide them. Beyond sharing computers and storage, these communities can also share analysis procedures and
computational results.
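The parallelism referred to above is typically of a map-and-merge kind: each site analyzes only the data it already stores and returns a compact summary, and the summaries are merged centrally. The sketch below illustrates the pattern with invented site names, file lists, and a fake event analysis; it is not any experiment's actual analysis code.

    # Map-and-merge sketch of distributed data analysis. Sites, files, and the
    # "analysis" (a toy histogram of synthetic values) are all invented.
    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor
    import random

    SITES = {                                     # hypothetical data placement
        "site-a": ["run001.dat", "run002.dat"],
        "site-b": ["run003.dat"],
        "site-c": ["run004.dat", "run005.dat"],
    }

    def analyze_files(files):
        """Pretend to reconstruct events and histogram one quantity."""
        hist = Counter()
        rng = random.Random(files[0])             # deterministic per site
        for _ in files:
            for _ in range(10_000):               # synthetic "events" per file
                hist[int(rng.gauss(91, 3))] += 1  # fake peak-like values
        return hist

    def main():
        merged = Counter()
        with ProcessPoolExecutor() as pool:       # one task per site
            for partial in pool.map(analyze_files, SITES.values()):
                merged.update(partial)            # the reduction step
        print("most populated bin:", merged.most_common(1))

    if __name__ == "__main__":
        main()

Only the small histograms travel between sites, not the petabytes of raw data, which is why this division of labor maps so naturally onto geographically distributed resources.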
Computer-in-the-Loop Instrumentation
Scientific instruments such as telescopes, synchrotrons, and electron microscopes generate raw data streams that are archived for
subsequent batch processing. But quasi-real-time analysis can greatly enhance an instrument's capabilities. For example, consider an
astronomer studying solar flares with a radio telescope array. The deconvolution and analysis algorithms used to process the data and
detect flares are computationally demanding. Running the algorithms continuously would be inefficient for studying flares that are brief
and sporadic. But if the astronomer could call on substantial computing resources (and sophisticated software) in an on-demand fashion,
he or she could use automated detection techniques to zoom in on solar flares as they occurred.
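One way to picture this "computing on demand" pattern: a cheap trigger runs continuously alongside the instrument, and only the data blocks that look interesting are handed off to substantial (possibly remote) computing for full analysis. The sketch below simulates the whole loop; the synthetic data stream, the trigger threshold, and the "deconvolution" stand-in are all invented for illustration.

    # On-demand analysis sketch: a lightweight trigger watches the stream and
    # dispatches heavy processing only for interesting blocks.
    import random
    import statistics
    from concurrent.futures import ThreadPoolExecutor

    def expensive_deconvolution(samples):
        # Stand-in for the demanding algorithms that would run on Grid resources.
        return max(samples) / (statistics.pstdev(samples) + 1e-9)

    def telescope_stream(n_blocks=50, block_size=100):
        """Yield blocks of fake radio samples, occasionally with a 'flare'."""
        rng = random.Random(42)
        for _ in range(n_blocks):
            block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
            if rng.random() < 0.1:                         # sporadic flare
                block[rng.randrange(block_size)] += 20.0
            yield block

    def main():
        with ThreadPoolExecutor(max_workers=4) as remote:  # stands in for the Grid
            for i, block in enumerate(telescope_stream()):
                if max(block) > 5 * statistics.pstdev(block):   # cheap trigger
                    job = remote.submit(expensive_deconvolution, block)
                    print(f"block {i}: flare candidate, significance {job.result():.1f}")

    if __name__ == "__main__":
        main()

The point is the division of labor: the always-on part is trivial, while the expensive part is requested only when needed, so the astronomer pays for large computations only during the brief intervals when flares actually occur.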
Collaborative Work
Researchers often want to aggregate not only data and computing power, but also human expertise. Collaborative problem formulation, data
analysis, and the like are important Grid applications. For example, an astrophysicist who has performed a large, multi-terabyte
simulation might want colleagues around the world to visualize the results in the same way and at the same time, so that the group can discuss them in real time.
Here are some other "real-world" examples of Grid applications:
A company needing to reach a decision on the placement of a new factory invokes a sophisticated financial forecasting model from
an Application Service Provider (ASP), providing the ASP with access to appropriate proprietary historical data from a corporate
database on storage systems operated by a Storage Service Provider (SSP). During the decision-making meeting, what-if scenarios are
run collaboratively and interactively, even though the division heads participating in the decision are located in different cities.
The ASP itself contracts with a cycle provider for additional "oomph" during particularly demanding scenarios, subject, of course, to those cycles meeting the desired security and performance requirements.
An industrial consortium formed to develop a feasibility study for a next-generation supersonic aircraft undertakes a highly
accurate multidisciplinary simulation of the entire aircraft. This simulation integrates proprietary software components developed by
different participants, with each component operating on that participant’s computers and having access to appropriate design
databases and other data made available to the consortium by its members.
A crisis management team responds to a chemical spill by using local weather and soil models to estimate the spread of the spill,
determining the impact based on population location as well as geographic features such as rivers and water supplies, creating a
short-term mitigation plan (perhaps based on chemical reaction models), and tasking emergency response personnel by planning and
coordinating evacuation, notifying hospitals, and so forth.
Thousands of physicists at hundreds of laboratories and universities worldwide come together to design, create, operate, and
analyze the products of a major detector at CERN, the European high-energy physics laboratory. During the analysis phase, they pool
their computing, storage, and networking resources to create a "Data Grid" capable of analyzing petabytes of data.