Next: 3.3 Installing Contrib Modules Up: 3. Administrators' Manual Previous: 3.1 Introduction

3.2 Installation of Condor

This section contains the instructions for installing Condor at your site. The installation sets up a default configuration, which the sections that follow explain how to customize.

Please read the copyright and disclaimer information in the manual, or in the file LICENSE.TXT, before proceeding. Installation and use of Condor is acknowledgement that you have read and agreed to these terms.

The Condor binary distribution is packaged in the following six files and one directory:

DOC
file containing directions for where to find the Condor documentation
INSTALL
these installation directions
LICENSE.TXT
by installing Condor, you agree to the contents of the LICENSE.TXT file
README
general info
condor_install
Perl script to install and configure Condor
examples
directory containing C, Fortran and C++ example programs to run with Condor
release.tar
tar file of the 'release directory', which contains the Condor binaries and libraries

Before you install, please consider joining the condor-world mailing list. Traffic on this list is kept to an absolute minimum. It is only used to announce new releases of Condor. To subscribe, send a message to majordomo@cs.wisc.edu with the body:

   subscribe condor-world

  
3.2.1 Preparing to Install Condor

Before you install Condor at your site, there are a few important decisions you must make about the basic layout of your pool. These are:

1. What machine will be the Central Manager?
2. Will Condor run as root or not?
3. Who will be administering Condor on the machines in your pool?
4. Will you have a 'condor' user and will its home directory be shared?
5. Where should the machine-specific directories for Condor go?
6. Where should the parts of the Condor system be installed?
7. Am I using AFS?
8. Do I have enough disk space for Condor?

If you feel you already know the answers to these questions, you can skip to the 'Installation Procedure' section below, section 3.2.2. If you are unsure about any of them, read on.

3.2.1.1 What machine will be the Central Manager?

One machine in your pool must be the Central Manager. You should set up and install Condor on this machine first. This is the centralized information repository for the Condor pool and is also the machine that does match-making between available machines and waiting jobs. If the Central Manager machine crashes, any currently active matches in the system will keep running, but no new matches will be made. Moreover, most Condor tools will stop working. Because of the importance of this machine for the proper functioning of Condor, we recommend you install it on a machine that is likely to stay up all the time, or at the very least, one that will be rebooted quickly if it does crash. Also, because all the daemons will send updates (by default every 5 minutes) to this machine, it is advisable to consider network traffic and your network layout when choosing your central manager.

3.2.1.2 Will Condor run as root or not?

We strongly recommend that you start up the Condor daemons as root. Otherwise, Condor can do very little to enforce security and policy decisions. If you don't have root access and would like to install Condor, under most platforms you can run Condor under any user you'd like. However, doing so has serious security consequences. Please see section 3.10.1 in the manual for details on running Condor as non-root.

3.2.1.3 Who will be administering Condor on the machines in your pool?

Either root will administer Condor directly, or someone else will act as the Condor administrator. If root has delegated the responsibility to another person but doesn't want to grant that person root access, root can specify a condor_config.root file that will override settings in the other Condor config files. This way, the global condor_config file can be owned and controlled by whoever is condor-admin, and the condor_config.root file can be owned and controlled only by root. Settings that would compromise root security (such as which binaries are started as root) can be specified in the condor_config.root file, while other settings that only control policy or Condor-specific settings can still be controlled without root access.

3.2.1.4 Will you have a 'condor' user and will its home directory be shared?

To simplify installation of Condor at your site, we recommend that you create a 'condor' user on all machines in your pool. The condor daemons will create files (such as the log files) owned by this user, and the home directory can be used to specify the location of files and directories needed by Condor. The home directory of this user can either be shared among all machines in your pool, or could be a separate home directory on the local partition of each machine. Both approaches have advantages and disadvantages. Having the directories centralized can make administration easier, but also concentrates the resource usage such that you potentially need a lot of space for a single shared home directory. See the section below on machine-specific directories for more details.

If you choose not to create a condor user, you must specify via the CONDOR_IDS environment variable which uid.gid pair should be used for the ownership of various Condor files. See section 3.10.2 on ``UIDs in Condor'' in the Administrator's Manual for details.
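As a rough illustration, the uid.gid pair for CONDOR_IDS could be derived with the standard id utility. The helper name condor_ids_for is made up for this sketch, and reusing the current account is only an assumption; substitute whichever account should own Condor's files at your site.

```shell
#!/bin/sh
# Hypothetical helper: print "uid.gid" for an account.
# With no argument, `id` reports on the current user.
condor_ids_for() {
    uid=$(id -u ${1:+"$1"}) || return 1
    gid=$(id -g ${1:+"$1"}) || return 1
    printf '%s.%s\n' "$uid" "$gid"
}

# Assumption: the current account will own the Condor files.
CONDOR_IDS=$(condor_ids_for)
export CONDOR_IDS
echo "CONDOR_IDS=$CONDOR_IDS"
```

The variable has to be set in the environment from which the Condor daemons are started for it to have any effect.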

3.2.1.5 Where should the machine-specific directories for Condor go?

Condor needs a few directories that are unique on every machine in your pool. These are 'spool', 'log', and 'execute'. Generally, all three are subdirectories of a single machine specific directory called the 'local directory' (specified by the LOCAL_DIR parameter in the config file).

If you have a 'condor' user with a local home directory on each machine, the LOCAL_DIR could just be user condor's home directory ('LOCAL_DIR = $(TILDE)' in the config file). If this user's home directory is shared among all machines in your pool, you would want to create a directory for each host (named by hostname) for the local directory ('LOCAL_DIR = $(TILDE)/hosts/$(HOSTNAME)' for example). If you don't have a condor account on your machines, you can put these directories wherever you'd like. However, where to place them will require some thought, as each one has its own resource needs:

execute
This is the directory that acts as the current working directory for any Condor jobs that run on a given execute machine. The binary for the remote job is copied into this directory, so you must have enough space for that. (Condor won't send a job to a machine that doesn't have enough disk space to hold the initial binary). In addition, if the remote job dumps core for some reason, it is first dumped to the execute directory before it is sent back to the submit machine. So, you will want to put the execute directory on a partition with enough space to hold a possible core file from the jobs submitted to your pool.

spool
The spool directory holds the job queue and history files, and the checkpoint files for all jobs submitted from a given machine. As a result, disk space requirements for spool can be quite large, particularly if users are submitting jobs with very large executables or image sizes. By using a checkpoint server (see section 3.3.5 on ``Installing a Checkpoint Server'' for details), you can ease the disk space requirements, since all checkpoint files are stored on the server instead of the spool directories for each machine. However, the initial checkpoint files (the executables for all the clusters you submit) are still stored in the spool directory, so you will need some space, even with a checkpoint server.

log
Each Condor daemon writes its own log file which is placed in the log directory. You can specify what size you want these files to grow to before they are rotated, so the disk space requirements of the log directory are configurable. The larger the logs, the more historical information they will hold if there's a problem, but the more disk space they use up. If you have a network filesystem installed at your pool, you might want to place the log directories in a shared location (such as /usr/local/condor/logs/$(HOSTNAME)) so that you can view the log files from all your machines in a single location. However, if you take this approach, you will have to specify a local partition for the lock directory (see below).

lock
Condor uses a small number of lock files to synchronize access to some files that are shared between multiple daemons. Because of problems we've had with file locking and network filesystems (particularly NFS), these lock files should be placed on a local partition on each machine. By default, they are just placed in the log directory. If you place your log directory on a network filesystem partition, you should specify a local partition for the lock files with the 'LOCK' parameter in the config file (such as /var/lock/condor).

Generally speaking, we recommend that you do not put these directories (except lock) on the same partition as /var, since if they fill up the partition, /var fills up as well, which will cause lots of problems for your machines. Ideally, you'd have a separate partition for the Condor directories, so that the only consequence of filling it up would be that Condor malfunctions, not your whole machine.

3.2.1.6 Where should the parts of the Condor system be installed?

Config Files
There are a number of config files that allow you different levels of control over how Condor is configured at each machine in your pool. In general, you will have one global configuration file for each platform. In addition, there is a local config file for each machine, where you can override settings in the global file. This allows you to have different daemons running, different policies for when to start and stop Condor jobs, and so on. Beginning with Condor version 6.0.1, you can use a single config file which is shared among all platforms in your pool, and have both platform-specific and machine-specific files. See section 3.9.2 about ``Configuring Condor for Multiple Platforms'' for details.

In addition, because we recommend that you start the Condor daemons as root, we allow you to create config files that are owned and controlled by root that will override any other Condor settings. This way, if the Condor administrator isn't root, the regular Condor config files can be owned and writable by condor-admin, but root doesn't have to grant root access to this person. See section 3.10.3 in the manual for a detailed discussion of the root config files, whether you should use them, and what settings should be in them.

Condor looks in a number of places to find its config files. The first file it looks for is the global config file. The following locations are searched in order until a config file is found. If none contains a valid config file, Condor prints an error message and exits:

1. File specified in the CONDOR_CONFIG environment variable
2. /etc/condor/condor_config
3. ~condor/condor_config
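The search order above can be sketched as a small shell function. This is only an approximation for checking a machine by hand, not Condor's own lookup code: the function name find_global_config is invented here, the two directories are passed as arguments so the sketch can be tried without a 'condor' account, and it assumes the search simply moves on when CONDOR_CONFIG names a file that does not exist.

```shell
#!/bin/sh
# Hypothetical helper: print the first existing global config file,
# checking the three documented locations in order.
#   $1  directory standing in for /etc/condor (default: /etc/condor)
#   $2  home directory of the 'condor' user (no default; skipped if empty)
find_global_config() {
    etc_dir="${1:-/etc/condor}"
    condor_home="${2:-}"
    for candidate in "${CONDOR_CONFIG:-}" \
                     "$etc_dir/condor_config" \
                     "${condor_home:+$condor_home/condor_config}"; do
        # Empty candidates (unset variable, no home given) are skipped.
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no global condor_config found" >&2
    return 1
}
```

For example, find_global_config /etc/condor /home/condor prints whichever file would be picked up first, where /home/condor is an assumed home directory for the condor user.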

Next, Condor tries to load the machine-specific, or local config file. The only way to specify the local config file is in the global config file, with the LOCAL_CONFIG_FILE macro. If that macro isn't set, no local config file is used. Beginning with Condor version 6.0.1, this macro can be a list of files instead of a single file.

The root config files come in last. The global root config file is searched for in the following places:

1. /etc/condor/condor_config.root
2. ~condor/condor_config.root

The local root config file is found with the LOCAL_ROOT_CONFIG_FILE macro. If that isn't set, no local root config file is used. Beginning with Condor version 6.0.1, this macro can also be a list of files instead of a single file.

Release Directory

Every binary distribution contains a 'release.tar' file that contains four subdirectories: 'bin', 'etc', 'lib' and 'sbin'. Wherever you choose to install these 4 directories we call the 'release directory' (specified by the 'RELEASE_DIR' parameter in the config file). Each release directory contains platform dependent binaries and libraries, so you will need to install a separate one for each kind of machine in your pool.

Documentation

The documentation provided with Condor is currently only available in HTML, Postscript and PDF (Adobe Acrobat). It can be locally installed wherever is customary at your site. You can also find the Condor documentation on the web at: http://www.cs.wisc.edu/condor/manual.

3.2.1.7 Am I using AFS?

If you are using AFS at your site, be sure to read section 3.9.1 in the manual. Condor does not currently have a way to authenticate itself to AFS. We're working on a solution, but it is not ready for version 6.0. This means that you probably do not want to have the LOCAL_DIR for Condor on AFS. However, you can (and probably should) have the Condor RELEASE_DIR on AFS, so that you can share one copy of those files and upgrade them in a centralized location. You will also have to do something special if you submit jobs to Condor from a directory on AFS. Again, read manual section 3.9.1 for all the gory details.

3.2.1.8 Do I have enough disk space for Condor?

The Condor release directory takes up a fair amount of space. This is another reason why it's a good idea to have it on a shared filesystem. The rough size requirements for the release directory on various platforms are listed in table 3.1.


 
Table 3.1: Release Directory Size Requirements

Platform              Size
Intel/Linux           11 megs (statically linked)
Intel/Linux           6.5 megs (dynamically linked)
Intel/Solaris         8 megs
Sparc/Solaris         10 megs
SGI/IRIX              17.5 megs
Alpha/Digital Unix    15.5 megs

In addition, you will need a lot of disk space in the local directory of any machines that are submitting jobs to Condor. See question 5 above for details on this.
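One way to check a candidate partition before installing is sketched below, using the portable output format of df. The helper names free_kb and has_room and the choice of /tmp as the directory to test are assumptions made for illustration; 17920 KB is 17.5 megs, the largest entry in table 3.1.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in kilobytes, on the
# partition holding a directory. `df -P` forces the portable output
# format, in which the fourth column is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed if the directory's partition has at
# least the requested number of kilobytes free.
has_room() {
    avail_kb=$(free_kb "$1")
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$2" ]
}

# Assumption: /tmp stands in for the partition you plan to use.
if has_room /tmp 17920; then
    echo "/tmp has room for a release directory"
else
    echo "/tmp is too small for a release directory"
fi
```

The same check can be pointed at the partition that will hold the local directory, with a threshold chosen to match the executables and checkpoint files you expect users to submit.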

  
3.2.2 Installation Procedure

IF YOU HAVE DECIDED TO CREATE A 'condor' USER AND GROUP, YOU SHOULD DO THAT ON ALL YOUR MACHINES BEFORE YOU DO ANYTHING ELSE.

The easiest way to install Condor is to use one or both of the scripts provided to help you: condor_install and condor_init. You should run these scripts as the user that you are going to run the Condor daemons as. First, run condor_install on the machine that will be a fileserver for shared files used by Condor, such as the release directory, and possibly the condor user's home directory. When you do, choose the ``full-install'' option in step #1 described below.

Once you have run condor_install on a file server to set up your release directory and configure Condor for your site, you should run condor_init on any other machines in your pool to create any locally used files that aren't created by condor_install. In the simplest case, where nearly all of Condor is installed on a shared file system, condor_install will create nearly all the files and directories you need, but you will still need to use condor_init to create the LOCK directory on the local disk of each machine. If you have a shared release directory, but the LOCAL_DIR is local on each machine, condor_init will create all the directories and files needed in LOCAL_DIR. In addition, condor_init will create any soft links on each machine that are needed so that Condor can find its global config file.

If you don't have a shared filesystem, you will need to run condor_install on each machine in your pool to set up Condor. In this case, there is no need to run condor_init at all.

In addition, you will want to run condor_install on your central manager machine if that machine is different from your file server, using the ``central-manager'' option in step #1 described below. Run condor_install on your file server first, then on your central manager. If this step fails for some reason (NFS permissions, etc), you can do it manually quite easily. All this does is copy the condor_config.local.central.manager file from <release_dir>/etc/examples to the proper location for the local config file of your central manager machine. If your central manager is an Alpha or an SGI, you might want to add ``KBDD'' to the DAEMON_LIST parameter. See section 3.4 ``Configuring Condor'' of the manual for details.

condor_install assumes you have perl installed in /usr/bin/perl. If this is not the case, you can either edit the script to put the right path in, or you will have to invoke perl directly from your shell (assuming perl is in your PATH):

% perl condor_install

condor_install breaks down the installation procedure into various steps. Each step is clearly numbered. The following section explains what each step is for, and suggests how to answer the questions condor_install will ask you for each one.

3.2.2.1 condor_install, step-by-step

STEP 1: What type of Condor installation do you want?

There are three types of Condor installation you might choose: 'submit-only', 'full-install', and 'central-manager'. A submit-only machine can submit jobs to a Condor pool, but Condor jobs will not run on it. A full-install machine can both submit and run Condor jobs.

If you are planning to run Condor jobs on your machines, you should either install and run Condor as root, or as user 'condor'.

If you are planning to set up a submit-only machine, you can either install Condor machine-wide as root or user 'condor', or you can install Condor as yourself into your home directory.

The other possible installation type is setting up a machine as a central manager. If you do a full-install and you say that you want the local host to be your central manager, this step will be done automatically. You should only choose the central-manager option at step 1 if you have already run condor_install on your file server and you now want to run condor_install on a different machine that will be your central manager.

STEP 2: How many machines are you setting up this way?

If you are installing Condor for multiple machines, and you have a shared file system, condor_install will prompt you for the hostnames of each machine you want to add to your Condor pool. If you don't have a shared file system, you will have to run condor_install locally on each machine, anyway, so it doesn't bother asking you for the names. If you provide a list, it will use the names to automatically create directories and files later. At the end, condor_install will dump out this list to a 'roster' file which can be used by scripts to help maintain your Condor pool.

If you are only installing Condor on 1 machine, you would just answer 'no' to the first question, and move on.

STEP 3: Install the Condor release directory
The release directory contains four subdirectories: 'bin', 'etc', 'lib' and 'sbin'. bin contains user-level executable programs. etc is the recommended location for your Condor config files, and also includes an 'examples' directory with default config files and other default files used for installing condor. lib contains libraries to link condor user programs and scripts used by the Condor system. sbin contains all administrative executable programs and the Condor daemons.

If you have multiple machines with a shared filesystem that will be running Condor, you should put the release directory on that shared filesystem so you only have one copy of all the binaries, and so that when you update them, you can do so in one place. Note that the release directory is architecture dependent, so you will need to download separate binary distributions for every platform in your pool.

condor_install tries to find an already installed release directory. If it can't find one, it asks if you have installed one already. If you have not installed one, it tries to do so for you by untarring the release.tar file from the binary distribution.

NOTE: If you are only setting up a central manager (you chose 'central-manager' in step 1), step 3 is the last question you will need to answer.

STEP 4: How and where should Condor send email if things go wrong?

Various parts of Condor will send email to a condor administrator if something goes wrong that needs human attention. You will need to specify the email address of this administrator.

You will also need to specify the full path to a mail program that Condor will use to send the email. This program needs to understand the '-s' option, which is how you specify a subject for the outgoing message. The default on most platforms will probably be correct. On Linux machines, since there is such variation in Linux distributions and installations, you should verify that the default works. If the script complains that it cannot find the mail program that was specified, you can try 'which mail' from your shell prompt to see what 'mail' program is currently in your PATH. If there is none, try 'which mailx'. If you still can't find anything, ask your system administrator. You should verify that the program you end up using supports '-s'. The man page for that program will probably tell you.
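The lookup suggested above ('which mail', then 'which mailx') can be sketched as follows. The function name find_mailer is invented for this example, and it uses the shell builtin `command -v` in place of 'which'. It only locates a program; it does not check that the program supports '-s'.

```shell
#!/bin/sh
# Hypothetical helper: print the full path of the first mail program
# found in PATH, trying 'mail' first and then 'mailx'.
find_mailer() {
    for name in mail mailx; do
        path=$(command -v "$name" 2>/dev/null) || continue
        printf '%s\n' "$path"
        return 0
    done
    return 1
}

if mailer=$(find_mailer); then
    echo "candidate mail program: $mailer"
else
    echo "no mail or mailx in PATH; ask your system administrator"
fi
```

Whatever path this prints still needs to be checked against the program's man page for '-s' support before you give it to condor_install.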

STEP 5: Where should public programs be installed?

It is recommended that you install the user-level Condor programs in the release directory (where they go by default). This way, when you want to install a new version of the Condor binaries, you can just replace your release directory and everything will be updated at once. So, one option is to have Condor users add <release_dir>/bin to their PATH, so that they can access the programs. However, we recommend putting in soft links from some directory already in their PATH (such as /usr/local/bin) that point back to the Condor user programs. condor_install will do this for you; all you have to do is tell it what directory to put these links into. This way, users don't have to change their PATH to use Condor but you can still have the binaries installed in their own location.

If you are installing Condor as neither root nor condor, condor_install automatically creates a perl script wrapper for all the Condor tools, which sets some appropriate environment variables and automatically passes certain options to the tools. You need to tell condor_install where to put this perl script. The script is linked to itself under many different names, since it is the name that determines the behavior of the script. This script should go somewhere that is in your PATH already, if possible (such as ~/bin).

At this point, the remaining steps are different depending on what kind of installation you are doing. Skip to the appropriate section depending on what kind of installation you selected in STEP 1 above.

3.2.2.2 Full Install

STEP 6: What machine will be your central manager?

Simply type in the full hostname of the machine you have chosen for your central manager. If condor_install can't find information about the host you typed by querying your nameserver, it will print out an error message and ask you to confirm.

STEP 7: Where will the 'local directory' go?

This is the directory discussed in question #5 from the introduction. condor_install tries to make some educated guesses as to what directory you want to use for the purpose. Simply agree to the correct guess, or (when condor_install has run out of guesses) type in what you want. Since this directory needs to be unique, it is common to use the hostname of each machine in its name. When typing in your own path, you can use '$(HOSTNAME)' which condor_install (and the Condor config files) will expand to the hostname of the machine you are currently on. condor_install will try to create the corresponding directories for all the machines you told it about in STEP 2 above.

Once you have selected the local directory, condor_install creates all the needed subdirectories of each one with the proper permissions. They should have the following permissions and ownerships:

     drwxr-xr-x   2 condor   root         1024 Mar  6 01:30 execute/
     drwxr-xr-x   2 condor   root         1024 Mar  6 01:30 log/
     drwxr-xr-x   2 condor   root         1024 Mar  6 01:30 spool/
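If you ever need to recreate these subdirectories by hand, something along the following lines would produce the same modes. It is a sketch only: make_local_dirs is a made-up name, the scratch location stands in for your real local directory, and the ownership change is left as a comment because it requires root and an existing 'condor' account.

```shell
#!/bin/sh
# Hypothetical helper: create the three machine-specific subdirectories
# under a local directory with mode 755 (drwxr-xr-x), as in the listing.
make_local_dirs() {
    local_dir="$1"
    for sub in execute log spool; do
        mkdir -p "$local_dir/$sub" || return 1
        chmod 755 "$local_dir/$sub" || return 1
    done
}

# Try it in a scratch location named after this host; a real local
# directory is site-specific.
scratch=$(mktemp -d)
make_local_dirs "$scratch/hosts/$(uname -n)"
ls -ld "$scratch/hosts/$(uname -n)"/*
rm -rf "$scratch"

# When run as root, ownership could then be set to match the listing:
#   chown condor:root "$local_dir"/execute "$local_dir"/log "$local_dir"/spool
```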

If your local directory is on a shared file system, condor_install will prompt you for the location of your lock files, as discussed in question #5 above. In this case, when condor_install is finished, you will have to run condor_init on each machine in your pool to create the lock directory before you can start up Condor.

STEP 8: Where will the local (machine-specific) config files go?

As discussed in question #6 above, there are a few different levels of Condor config file. There's the global config file that will be installed in <release_dir>/etc/condor_config, and there are machine-specific, or local config files that override the settings in the global file. If you are installing on multiple machines or are configuring your central manager machine, you must select a location for your local config files.

The two main options are to have a single directory that holds all the local config files, each one named '$(HOSTNAME).local', or to have the local config files go into the individual local directories for each machine. Given a shared filesystem, we recommend the first option, since it makes it easier to configure your pool from a centralized location.

STEP 9: How do you want Condor to find its config file?

Since there are a few known places Condor looks to find your config file, we recommend that you put a soft link from one of them to point to <release_dir>/etc/condor_config. This way, you can keep your Condor configuration in a centralized location, but all the Condor daemons and tools will be able to find their config files. Alternatively, you can set the CONDOR_CONFIG environment variable to contain <release_dir>/etc/condor_config.

condor_install will ask you if you want to create a soft link from either of the two fixed locations that Condor searches.
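Done by hand, the soft link amounts to the following. The function name link_global_config is hypothetical, and the directories in the usage note are placeholders, not locations that condor_install is known to choose.

```shell
#!/bin/sh
# Hypothetical helper: link one of the fixed search locations to the
# shared global config file.
#   $1  release directory (use an absolute path so the link resolves)
#   $2  directory Condor searches, e.g. /etc/condor or ~condor
link_global_config() {
    release_dir="$1"
    target_dir="$2"
    mkdir -p "$target_dir" || return 1
    ln -sf "$release_dir/etc/condor_config" "$target_dir/condor_config"
}
```

As root, a call such as link_global_config /usr/local/condor /etc/condor would create /etc/condor/condor_config, assuming the release directory is /usr/local/condor. The environment-variable alternative is to export CONDOR_CONFIG set to the same <release_dir>/etc/condor_config path.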

Once you have completed STEP 9, you're done. condor_install prints out a message describing what to do next. Please skip to section 3.2.3.

3.2.2.3 Submit Only

This section has not yet been written.

  
3.2.3 Condor is installed... now what?

Now that Condor has been installed on your machine(s), there are a few things you should check before you start up Condor.

1. Read through the <release_dir>/etc/condor_config file. There are a lot of possible settings and you should at least take a look at the first two main sections to make sure everything looks okay. In particular, you might want to set up host/IP-based security for Condor. See section 3.7 in the manual to learn how to do this.

2. Condor can monitor the activity of your mouse and keyboard, provided that you tell it where to look. You do this with the CONSOLE_DEVICES entry in the condor_startd section of the config file. On most platforms, we provide reasonable defaults. For example, the default device for the mouse on Linux is 'mouse', since most Linux installations have a soft link from '/dev/mouse' that points to the right device (such as tty00 if you have a serial mouse, psaux if you have a PS/2 bus mouse, etc). If you don't have a /dev/mouse link, you should either create one (you'll be glad you did), or change the CONSOLE_DEVICES entry in Condor's config file. This entry is just a comma-separated list, so you can have any devices in /dev count as 'console devices', and activity will be reported in the condor_startd's classad as ConsoleIdleTime.

3. (Linux only) Condor needs to be able to find the 'utmp' file. According to the Linux File System Standard, this file should be /var/run/utmp. If Condor can't find it there, it looks in /var/adm/utmp. If it still can't find it, it gives up. So, if your Linux distribution puts this file somewhere else, be sure to put a soft link from /var/run/utmp to point to the real location.
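For the utmp check in item 3, a quick way to see which of the two locations exists on a given machine is sketched below. The function name find_utmp is made up, and the script only mirrors the lookup order described above; it is not how Condor itself performs the check.

```shell
#!/bin/sh
# Hypothetical helper: print the first utmp file found among the
# candidate paths given as arguments, in order.
find_utmp() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if found=$(find_utmp /var/run/utmp /var/adm/utmp); then
    echo "utmp found at $found"
else
    echo "no utmp found; link /var/run/utmp to the real location"
fi
```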

3.2.4 Starting up the Condor daemons

To start up the Condor daemons, all you need to do is execute <release_dir>/sbin/condor_master. This is the Condor master, whose only job in life is to make sure the other Condor daemons are running. The master keeps track of the daemons, restarts them if they crash, and periodically checks to see if you have installed new binaries (and if so, restarts the affected daemons).

If you're setting up your own pool, you should start Condor on your central manager machine first. If you have done a submit-only installation and are adding machines to an existing pool, it doesn't matter what order to start them in.

To ensure that Condor is running, you can run either:

        ps -ef | egrep condor_
or
        ps -aux | egrep condor_

depending on your flavor of Unix. On your central manager machine you should have processes for:

        condor_master
        condor_collector
        condor_negotiator
        condor_startd
        condor_schedd

On all other machines in your pool you should have processes for:

        condor_master
        condor_startd
        condor_schedd

(NOTE: On Alphas and IRIX machines, there will also be a 'condor_kbdd'. See section 3.9.4 of the manual for details.) If you have set up a submit-only machine, you will only see:

        condor_master
        condor_schedd

Once you're sure the Condor daemons are running, check to make sure that they are communicating with each other. You can run condor_status to get a one line summary of the status of each machine in your pool.

Once you're sure Condor is working properly, you should add ``condor_master'' into your startup/bootup scripts (e.g. /etc/rc) so that your machine runs condor_master upon bootup. condor_master will then fire up the necessary Condor daemons whenever your machine is rebooted.

If your system uses System-V style init scripts, you can look in <release_dir>/etc/examples/condor.boot for a script that can be used to start and stop Condor automatically by init. Normally, you would install this script as /etc/init.d/condor and put in soft links from various directories (for example, /etc/rc2.d) that point back to /etc/init.d/condor. The exact location of these scripts and links will vary on different platforms.

If your system uses BSD style boot scripts, you probably have an /etc/rc.local file. Just add a line in there to start up <release_dir>/sbin/condor_master and you're done.
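Adding that line can be scripted so that running the script twice does not add it twice. The sketch below is one way to do it; add_master_to_rc is an invented name, and the paths in the usage note are assumptions about a particular site.

```shell
#!/bin/sh
# Hypothetical helper: append a line that starts condor_master to a
# BSD-style rc file, unless the file already mentions it.
#   $1  rc file to edit, e.g. /etc/rc.local
#   $2  full path to <release_dir>/sbin/condor_master
add_master_to_rc() {
    rc_file="$1"
    master="$2"
    if grep -qF "$master" "$rc_file" 2>/dev/null; then
        return 0          # already present; nothing to do
    fi
    printf '%s\n' "$master" >> "$rc_file"
}
```

Run as root, add_master_to_rc /etc/rc.local /usr/local/condor/sbin/condor_master would add the line, assuming the release directory is /usr/local/condor.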

  
3.2.5 The Condor daemons are running... now what?

Now that the Condor daemons are running, there are a few things you can and should do:

1. (Optional) Do a full install for the condor_compile script. condor_compile assists in linking jobs with the Condor libraries to take advantage of all of Condor's features. As it is currently installed, it will work by placing it in front of any of the following commands that you would normally use to link your code: gcc, g++, g77, cc, acc, c89, CC, f77, fort77 and ld. If you complete the full install, you will be able to use condor_compile with any command whatsoever, in particular, make. See section 3.9.3 in the manual for directions.

2. Try building and submitting some test jobs. See examples/README for details.

3. If your site uses the AFS network file system, see section 3.9.1 in the manual.

4. We strongly recommend that you start up Condor (i.e. run the condor_master daemon) as user root. If you must start Condor as some user other than root, see section 3.10.1.


condor-admin@cs.wisc.edu