Running Condor-G in NMI
Condor-G is part of the NSF Middleware Initiative (NMI) All Bundle and can be used to schedule
workflows. While Condor-G should work immediately after installation, there may be a
few things that you will need to configure. This document provides
a guide to configuration tasks and validation for the installed Condor-G
application.
Overview
We will look at:
- Verifying that Condor-G is running
- Examining the Condor-G configuration file
- Submitting and tracking a job
- Reviewing problems that might be encountered and their fixes
Condor Configuration
When running the main Condor server, you normally have several daemons:
daemon 12699 1 0 Nov18 ? 00:09:23 ./condor_master
daemon 24670 12699 0 Nov19 ? 00:01:22 condor_collector -f
daemon 24671 12699 0 Nov19 ? 00:00:44 condor_negotiator -f
daemon 24672 12699 0 Nov19 ? 00:07:47 condor_startd -f
daemon 24673 12699 0 Nov19 ? 00:00:12 condor_schedd -f
However, when running the Condor-G that comes bundled with the NMI software stack,
you will have only two daemons running:
condor_master
condor_schedd
To verify that this is the case, run:
# ps -ef | grep condor_
daemon 18640     1 0 Nov01 ? 00:03:50 /usr/local/nmi51//sbin/condor_master
daemon 18641 18640 0 Nov01 ? 00:00:27 condor_schedd -f
In most cases, Condor-G should work as installed with the NMI All
Bundle. After installing the bundle, start the Condor-G daemon set by
issuing the following command:
$ $GLOBUS_LOCATION/sbin/condor_master
The condor_master script starts the other daemons that are configured to run as part of
the Condor-G setup. It also runs in the background, checking to make sure that all of the
daemons stay up and running. If a daemon dies, condor_master will attempt to restart it.
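If the daemons do not stay up, the master's own log file is usually the first place to look. Assuming condor_config_val is on your path and CONDOR_CONFIG is set (as described below), you can ask Condor where that log lives:
$ condor_config_val MASTER_LOG
The command prints the path to the master's log file, which records why daemons were started, stopped, or restarted.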
How Condor-G runs is determined by the settings in the Condor configuration file. This file
is normally found in $GLOBUS_LOCATION/etc/condor_config. You need to set
an environment variable, CONDOR_CONFIG, to point to the actual location of this file.
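For example, in a Bourne-style shell (csh users would use setenv instead):
$ export CONDOR_CONFIG=$GLOBUS_LOCATION/etc/condor_config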
As stated, Condor-G should be ready to run jobs after being installed as part of the NMI
bundle. However, you may want to check a few values in the condor_config file:
HOSTALLOW_READ
HOSTALLOW_WRITE
These values can either be searched for in the condor_config file, or you can look them
up with the condor_config_val utility. As explained in the comments in the condor_config
file, these values mean the following:
READ access:
  Machines listed as allow (and/or not listed as deny) can view
  the status of your pool but cannot join your pool or run jobs.
  By default, without these entries customized, you are granting
  READ access to the whole world. You may want to restrict that
  to hosts in your domain. If possible, please also grant READ
  access to "*.cs.wisc.edu", so the Condor developers will be
  able to view the status of your pool and more easily help you
  install, configure or debug your Condor installation. It is
  important to have this defined.
WRITE access:
  Machines listed here can join your pool, submit jobs, etc.
  Note: Any machine that has WRITE access must also be granted
  READ access. Granting WRITE access below does not also
  automatically grant READ access; you must change HOSTALLOW_READ
  above as well. If you leave it as it is, it will be unspecified,
  and, in effect, it will allow anyone to write to your pool.
For example, in the case of the NCSA Grids Center test system, the values are defined as follows:
[root@grids3 bin]# condor_config_val HOSTALLOW_READ
*.ncsa.uiuc.edu, *.cs.wisc.edu
[root@grids3 bin]# condor_config_val HOSTALLOW_WRITE
*
With these values, hosts in the *.ncsa.uiuc.edu and *.cs.wisc.edu domains are able to read
status information about Condor, and any machine can write to (that is, submit jobs to) the current Condor pool.
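If you want to tighten this up, you can edit the entries in condor_config directly. A sketch, assuming your machines live in a hypothetical example.edu domain:
HOSTALLOW_READ  = *.example.edu, *.cs.wisc.edu
HOSTALLOW_WRITE = *.example.edu
After editing the file, tell the running daemons to re-read it with condor_reconfig, or simply restart condor_master.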
One other important point is how Condor is installed. When installing Condor and Globus,
both packages should be installed as the same user; otherwise, permission problems may arise.
For a single-user install, install the bundles, start Gram and Condor, and run the validation
tests all as the same non-root user. For a multi-user install, install the bundles as root and
start Gram and Condor as root. (Condor will then switch to running as the daemon or condor user
if one exists; this is expected.) Run the validation tests as any user.
You should now be ready to test your Condor installation.
Condor-G Validation
The easiest way to validate the installation is to create a simple job and submit it to the
Condor daemons. The submit description file can look something like this:
executable = /usr/bin/uptime
transfer_executable = false
globusscheduler = grids3.ncsa.uiuc.edu/jobmanager
universe = globus
output = grids3.cndr-test.out
error = grids3.cndr-test.err
log = grids3.cndr-test.log
queue
This script tells Condor-G to run the executable uptime, a UNIX utility that reports how long
the system has been running. Condor will submit the job to the job manager found on the machine
grids3.ncsa.uiuc.edu. The output will be stored in a file called grids3.cndr-test.out in the
current directory.
To submit the job, simply enter:
condor_submit <filename>
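For example, if the submit description above was saved in a file named uptime.submit (a name chosen here purely for illustration):
$ condor_submit uptime.submit
condor_submit responds with a short message confirming that the job was submitted to a cluster.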
You can view the status of your job using condor_q . This utility returns information on
the Condor job queue. After submitting your job and entering condor_q you should see something
like:
$ condor_q
-- Submitter: grids3.ncsa.uiuc.edu : <141.142.97.108:33805> : grids3.ncsa.uiuc.edu
 ID      OWNER        SUBMITTED     RUN_TIME ST PRI SIZE CMD
  3.0   pduda       10/21 13:11   0+00:00:00 I  0   0.0  uptime
1 jobs; 1 idle, 0 running, 0 held
If you look in your log file, grids3.cndr-test.log, you will see output similar to:
000 (003.000.000) 10/21 13:11:28 Job submitted from host:
<141.142.97.108:33805>
...
In the output file, grids3.cndr-test.out, you should see:
09:53:14 up 98 days, 2:14, 1 user, load average: 0.00, 0.00, 0.00
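If you need to remove a job from the queue, for instance one that stays idle while you sort out a problem, condor_rm takes the job ID reported by condor_q:
$ condor_rm 3.0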
If the job does not run, check to make sure Gram is up and running. Gram is the default
job manager in the NMI bundle and should be configured to run when the system starts up.
This configuration is done by making changes to either inetd or xinetd, depending on which
your system uses. The steps to do this can be found at
http://npackage.npaci.edu/sysadmin_gram_config.html
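A quick way to check whether the gatekeeper is reachable, assuming the Globus client tools are installed and you have a valid proxy, is an authentication-only test:
% globusrun -a -r grids3.ncsa.uiuc.edu/jobmanager
If authentication succeeds, globusrun reports a successful GRAM authentication test; if not, the error message usually indicates whether the gatekeeper is down or the proxy has expired.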
If you are not able to configure and run Gram as a system service, you can still use
Condor-G. Instead of using the system job manager, you can create your own local one with
globus-personal-gatekeeper, as follows:
First, make sure you have a current valid proxy certificate:
% grid-proxy-init -debug -verify
Then, start globus-personal-gatekeeper:
% globus-personal-gatekeeper -start
You should see output like:
GRAM contact: grids3.ncsa.uiuc.edu:40117:/O=Grid/OU=GlobusTest/ \
OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test
Make a note of this contact string; it is used when submitting jobs to the
gatekeeper. For example, we could run the following:
globus-job-run "grids3.ncsa.uiuc.edu:40117:/O=Grid/ \
OU=GlobusTest/OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test" /bin/date
Note the quotation marks around the contact string; they are needed because of the spaces
in the string. This command should print the current system date and time.
The same idea applies when submitting a job to Condor-G with globus-personal-gatekeeper as the
job manager: use the contact string in place of the hostname when defining the job manager.
The submit script from above now looks like the following:
executable = /usr/bin/uptime
transfer_executable = false
globusscheduler = grids3.ncsa.uiuc.edu:40117:/O=Grid/OU=GlobusTest/ \
OU=simpleCA-grids3.ncsa.uiuc.edu/OU=localdomain/CN=Bob Test/jobmanager
universe = globus
output = grids3.cndr-test.out
error = grids3.cndr-test.err
log = grids3.cndr-test.log
queue
You should now be able to execute a job using Condor-G and a local
job manager such as globus-personal-gatekeeper.
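When you are finished testing, the personal gatekeeper can be shut down again. On the installations we have seen, globus-personal-gatekeeper accepts a -killall option; check its -help output on your system to confirm:
% globus-personal-gatekeeper -killall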