Grid Research Integration Deployment and Support Center 

Running Condor-G in NMI

Condor-G is part of the NSF Middleware Initiative (NMI) All Bundle and can be used to schedule workflows. Condor-G should work immediately after installation, but you may need to configure a few things. This document is a guide to configuring and validating the installed Condor-G software.

Overview

We will look at:

  • Verifying that Condor-G is running
  • Examining the Condor-G configuration file
  • Submitting and tracking a job
  • Reviewing problems that might be encountered and their fixes

Condor Configuration

When running a full Condor server, you normally see several daemons:

   daemon 12699 1 0 Nov18 ? 00:09:23 ./condor_master
   daemon 24670 12699 0 Nov19 ? 00:01:22 condor_collector -f
   daemon 24671 12699 0 Nov19 ? 00:00:44 condor_negotiator -f
   daemon 24672 12699 0 Nov19 ? 00:07:47 condor_startd -f
   daemon 24673 12699 0 Nov19 ? 00:00:12 condor_schedd -f

However, when running the Condor-G that comes bundled with the NMI software stack, you will have only two daemons running:

   condor_master
   condor_schedd

To verify that this is the case, run:

   # ps -ef | grep condor_
   daemon 18640 1 0 Nov01 ? 00:03:50 /usr/local/nmi51//sbin/condor_master
   daemon 18641 18640 0 Nov01 ? 00:00:27 condor_schedd -f
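The check above can also be scripted. The following is a minimal sketch, assuming `ps -ef` style output is piped in on standard input; `check_condor_g_daemons` is a hypothetical helper name and not part of the NMI bundle.

```shell
# Sketch only: check_condor_g_daemons is a hypothetical helper, not an NMI tool.
# It reads `ps -ef` style output on standard input and reports whether the two
# daemons expected in a Condor-G setup are present.
# Usage: ps -ef | check_condor_g_daemons
check_condor_g_daemons() {
    ps_output=$(cat)
    missing=0
    for daemon in condor_master condor_schedd; do
        # Ignore the grep process itself if it shows up in the listing.
        if printf '%s\n' "$ps_output" | grep -v 'grep' | grep "$daemon" >/dev/null; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns a non-zero status when either daemon is missing, so it can be used in a startup or monitoring script.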

In most cases, Condor-G should work as installed with the NMI All Bundle. After installing the bundle, start the Condor-G daemon set by issuing the following command:

   $ $GLOBUS_LOCATION/sbin/condor_master

The condor_master daemon starts the other daemons that are configured to run as part of the Condor-G setup. It then keeps running in the background and checks that all of those daemons stay up. If a daemon dies, condor_master attempts to restart it.

How Condor-G runs is determined by the settings in the Condor configuration file, which is normally found at $GLOBUS_LOCATION/etc/condor_config. You need to set the environment variable CONDOR_CONFIG to point to the actual location of this file. As stated, Condor-G should be ready to run jobs after being installed as part of the NMI bundle. However, you may want to check the following values in the condor_config file:

   HOSTALLOW_READ
   HOSTALLOW_WRITE

You can either search for these values in the condor_config file or query them with the condor_config_val utility. As described in the condor_config file, these settings mean:

READ access

Machines that are listed as allowed (and/or not listed as denied) can view the status of your pool but cannot join your pool or run jobs. If you do not customize these entries, you grant READ access to the whole world, so you may want to restrict it to hosts in your domain. If possible, please also grant READ access to "*.cs.wisc.edu" so that the Condor developers can view the status of your pool and more easily help you install, configure, or debug your Condor installation. Define this value explicitly rather than relying on the default.

WRITE access

Machines listed here can join your pool, submit jobs, etc.

Note: Any machine that has WRITE access must also be granted READ access. Granting WRITE access does not automatically grant READ access; you must change HOSTALLOW_READ as well. If you leave HOSTALLOW_WRITE as installed, it is unspecified and, in effect, allows anyone to write to your pool.

For example, on the NCSA GRIDS Center test system, the values are defined as follows:

   [root@grids3 bin]# condor_config_val HOSTALLOW_READ
   *.ncsa.uiuc.edu, *.cs.wisc.edu
   [root@grids3 bin]# condor_config_val HOSTALLOW_WRITE
   *

With these values, hosts in the domains *.ncsa.uiuc.edu and *.cs.wisc.edu can read status information about the Condor pool, and any host can write to it.
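To see how a wildcard list like the one above applies to a particular host, a small test can help. This is a minimal sketch using shell glob matching, which only approximates how Condor itself evaluates these lists (for instance, it is case-sensitive and ignores deny lists); `host_allowed` is a hypothetical helper name.

```shell
# Sketch only: host_allowed is a hypothetical helper, not a Condor tool.
# It tests whether a host name matches any entry in a comma-separated
# HOSTALLOW-style list such as "*.ncsa.uiuc.edu, *.cs.wisc.edu".
# Usage: host_allowed HOSTNAME "PATTERN, PATTERN, ..."
host_allowed() {
    host=$1
    list=$2
    old_ifs=$IFS
    set -f      # keep the shell from expanding "*" against file names
    IFS=,       # split the list on commas
    for pattern in $list; do
        # Trim the spaces left over from ", " separators.
        pattern=$(printf '%s' "$pattern" | sed 's/^ *//; s/ *$//')
        case $host in
            $pattern) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}
```

For example, `host_allowed "$(hostname)" "$(condor_config_val HOSTALLOW_READ)"` succeeds when the local host matches one of the configured patterns.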

How Condor is installed also matters. Install Condor and Globus as the same user; otherwise, permission problems may arise.

For a single-user install, install the bundles, start GRAM and Condor, and run the validation tests all as the same non-root user. For a multi-user install, install the bundles as root and start GRAM and Condor as root. (Condor will by default run as the daemon or condor user if one exists; this is expected.) Run the validation tests as any user.

You should now be ready to test your Condor installation.
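Before testing, it can be worth confirming that CONDOR_CONFIG points at a readable file. The following is a minimal sketch that falls back to the default location named earlier, $GLOBUS_LOCATION/etc/condor_config, when CONDOR_CONFIG is not set; `use_default_condor_config` is a hypothetical helper name.

```shell
# Sketch only: use_default_condor_config is a hypothetical helper.
# It leaves an existing CONDOR_CONFIG alone, otherwise points it at the
# default location under GLOBUS_LOCATION, and then checks that the file
# can be read.
use_default_condor_config() {
    if [ -z "${CONDOR_CONFIG:-}" ]; then
        if [ -z "${GLOBUS_LOCATION:-}" ]; then
            echo "neither CONDOR_CONFIG nor GLOBUS_LOCATION is set" >&2
            return 1
        fi
        CONDOR_CONFIG="$GLOBUS_LOCATION/etc/condor_config"
    fi
    export CONDOR_CONFIG
    if [ -r "$CONDOR_CONFIG" ]; then
        echo "using $CONDOR_CONFIG"
    else
        echo "cannot read $CONDOR_CONFIG" >&2
        return 1
    fi
}
```

Call the function in the current shell (not in a subshell) so that the exported variable is visible to later commands such as condor_config_val.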

Condor and Validation

The easiest test is to create a simple job and submit it to the Condor daemons. The submit description file can look something like this:

   executable = /usr/bin/uptime
   transfer_executable = false
   globusscheduler = grids3.ncsa.uiuc.edu/jobmanager
   universe = globus
   output = grids3.cndr-test.out
   error = grids3.cndr-test.err
   log = grids3.cndr-test.log
   queue

This submit description file tells Condor-G to run the executable uptime, a UNIX utility that reports how long the system has been running. Because transfer_executable is false, the executable is not copied from the submit machine, so /usr/bin/uptime must already exist on the remote machine. Condor-G submits the job to the job manager on the machine grids3.ncsa.uiuc.edu. The job's output is stored in a file called grids3.cndr-test.out in the current directory.

To submit the job simply enter:

   condor_submit <filename>
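If you test several gatekeepers, it may be convenient to generate the submit description file instead of editing it by hand. This is a minimal sketch that prints the same settings as the example file; `make_submit_file` and the file name `cndr-test.sub` used in the note that follows are hypothetical.

```shell
# Sketch only: make_submit_file is a hypothetical helper.
# Arguments: gatekeeper contact, executable path, and a prefix for the
# output, error, and log file names. The file is printed to standard output.
make_submit_file() {
    contact=$1
    executable=$2
    prefix=$3
    cat <<EOF
executable = $executable
transfer_executable = false
globusscheduler = $contact
universe = globus
output = $prefix.out
error = $prefix.err
log = $prefix.log
queue
EOF
}
```

For example, `make_submit_file grids3.ncsa.uiuc.edu/jobmanager /usr/bin/uptime grids3.cndr-test > cndr-test.sub` writes the file, and `condor_submit cndr-test.sub` submits it.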

You can view the status of your job using condor_q. This utility returns information on the Condor job queue. After submitting your job and entering condor_q you should see something like:

   $ condor_q

   -- Submitter: grids3.ncsa.uiuc.edu : <141.142.97.108:33805> : grids3.ncsa.uiuc.edu
    ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
      3.0   pduda    10/21 13:11   0+00:00:00 I  0   0.0  uptime

   1 jobs; 1 idle, 0 running, 0 held
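When checking the queue from a script, the summary line is the easiest part of this output to parse. The following is a minimal sketch, assuming the summary line has exactly the format shown above (other Condor versions may print additional fields); `queue_summary` is a hypothetical helper name.

```shell
# Sketch only: queue_summary is a hypothetical helper.
# It reads condor_q output on standard input and rewrites the summary line
# "N jobs; N idle, N running, N held" as name=value pairs.
# Usage: condor_q | queue_summary
queue_summary() {
    sed -n 's/^ *\([0-9][0-9]*\) jobs; \([0-9][0-9]*\) idle, \([0-9][0-9]*\) running, \([0-9][0-9]*\) held.*$/total=\1 idle=\2 running=\3 held=\4/p'
}
```

For the output shown above, the function prints `total=1 idle=1 running=0 held=0`.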

If you look in your log file, grids3.cndr-test.log, you will see output similar to:

   000 (003.000.000) 10/21 13:11:28 Job submitted from host: <141.142.97.108:33805>
   ...
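Each event in the log begins with a header line like the one above: a three-digit event code, the job identifier in parentheses, a timestamp, and a description. The following is a minimal sketch that lists just those header lines in a compact form, assuming every event header follows the layout shown; `log_events` is a hypothetical helper name.

```shell
# Sketch only: log_events is a hypothetical helper.
# It reads a Condor user log on standard input and prints, for each event
# header line, the event code, the timestamp, and the description.
# Usage: log_events < grids3.cndr-test.log
log_events() {
    sed -n 's/^\([0-9][0-9][0-9]\) ([0-9.]*) \([0-9/]* [0-9:]*\) \(.*\)$/\1 \2 \3/p'
}
```

Lines that are not event headers, such as the `...` separators, are skipped.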

In the output file, grids3.cndr-test.out, you should see:

   09:53:14 up 98 days, 2:14, 1 user, load average: 0.00, 0.00, 0.00

If the job does not run, check that GRAM is up and running. GRAM is the default job manager in the NMI bundle and should be configured to run when the system starts up. This is done by changing the configuration of either inetd or xinetd, depending on which one your system uses. The steps are described at http://npackage.npaci.edu/sysadmin_gram_config.html

If you are not able to configure and run GRAM as a system resource, you can still use Condor-G. Instead of using the system job manager, you can create your own local one with globus-personal-gatekeeper, as follows:

First, make sure you have a current valid proxy certificate:

   % grid-proxy-init -debug -verify

Then, start globus-personal-gatekeeper:

   % globus-personal-gatekeeper -start

You should see output like:

   GRAM contact: grids3.ncsa.uiuc.edu:40117:/O=Grid/OU=GlobusTest/ \
      OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test
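Because the contact string is long, it can help to capture it in a shell variable rather than retyping it. This is a minimal sketch, assuming the gatekeeper prints the contact on a single line beginning with `GRAM contact:` (the line is wrapped above only for display); `gram_contact` is a hypothetical helper name.

```shell
# Sketch only: gram_contact is a hypothetical helper.
# It reads globus-personal-gatekeeper output on standard input and prints
# the text that follows "GRAM contact:".
# Usage: contact=$(globus-personal-gatekeeper -start | gram_contact)
gram_contact() {
    sed -n 's/^ *GRAM contact: *//p'
}
```

Always expand the captured variable inside double quotes, as in `"$contact"`, because the string contains a space.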

The contact string is important: it is used when submitting jobs to the gatekeeper. For example, we could run the following:

   globus-job-run \
      "grids3.ncsa.uiuc.edu:40117:/O=Grid/OU=GlobusTest/OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test" \
      /bin/date

Note the quotation marks around the contact string. They are needed because the string contains a space (in "CN=Bob Test"); without them, the shell would split it into two arguments. This command should print the current system date and time.
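The effect of the quotation marks can be seen without contacting a gatekeeper at all. The following is a small sketch that counts how many arguments the shell passes in each case; `count_args` is a hypothetical helper name.

```shell
# Sketch only: count_args is a hypothetical helper that prints how many
# arguments it received.
count_args() {
    echo "$#"
}

contact='grids3.ncsa.uiuc.edu:40117:/O=Grid/OU=GlobusTest/OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test'

count_args "$contact"   # quoted: one argument, prints 1
count_args $contact     # unquoted: split at the space, prints 2
```

In the unquoted case, a command such as globus-job-run would receive a truncated contact string followed by a stray `Test` argument.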

This same idea is used when submitting a job to Condor using the globus-personal-gatekeeper as the job manager. Just use the contact string as the hostname when defining the job manager.

The submit description file from the validation test now looks like the following:

   executable = /usr/bin/uptime
   transfer_executable = false
   globusscheduler = grids3.ncsa.uiuc.edu:40117:/O=Grid/OU=GlobusTest/OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test/jobmanager
   universe = globus
   output = grids3.cndr-test.out
   error = grids3.cndr-test.err
   log = grids3.cndr-test.log
   queue

You should now be able to execute a job using Condor-G and a local job manager such as globus-personal-gatekeeper.

2004 GRIDS Center. All Rights Reserved.