Condor is a software system that creates a High Throughput Computing (HTC) environment by harnessing the power of a cluster of UNIX workstations on a network. Although Condor can manage a dedicated cluster of workstations, a key appeal is its ability to harness non-dedicated, preexisting resources in a distributed ownership setting, such as machines sitting on people's desks in offices and labs.
Instead of running a CPU-intensive job in the background on their own workstation, users submit their job to Condor. Condor then finds an available machine on the network and begins running the job on that machine. When Condor detects that a machine running a Condor job is no longer available (perhaps because the owner of the machine came back from lunch and started typing on the keyboard), Condor checkpoints the job and migrates it over the network to a different machine that would otherwise be idle. Condor then restarts the job on the new machine from precisely where it left off. If no machine on the network is currently available, the job is stored in a queue on disk until a machine becomes available.
So, for example, say you submit a compute job that typically takes 5 hours to run. Condor may start running it on Machine A, but after 3 hours Condor notices activity on the keyboard. Condor then checkpoints your job and migrates it to Machine B. After the remaining 2 hours on Machine B, your job completes (and Condor notifies you via email).
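The checkpoint-and-migrate sequence in this example can be illustrated with a minimal Python sketch. It is a toy model, not Condor's mechanism: the `Job` class, `checkpoint`, and `restart` are hypothetical names, and the job's state is reduced to a single counter saved as JSON, whereas Condor checkpoints the running job itself.

```python
import json

class Job:
    """Toy job whose entire state is a progress counter (hypothetical)."""

    def __init__(self, total_hours, hours_done=0):
        self.total_hours = total_hours
        self.hours_done = hours_done

    def run(self, hours_available):
        """Run until the machine is reclaimed or the job finishes.

        Returns True once the job is complete.
        """
        step = min(hours_available, self.total_hours - self.hours_done)
        self.hours_done += step
        return self.hours_done == self.total_hours


def checkpoint(job):
    """Serialize the job's state so it can be sent to another machine."""
    return json.dumps({"total_hours": job.total_hours,
                       "hours_done": job.hours_done})


def restart(image):
    """Rebuild the job from a checkpoint image, resuming where it left off."""
    state = json.loads(image)
    return Job(state["total_hours"], state["hours_done"])


job = Job(total_hours=5)
finished_on_a = job.run(hours_available=3)        # owner returns to Machine A
image = checkpoint(job)                           # state saved after 3 hours
migrated = restart(image)                         # resumed on Machine B
finished_on_b = migrated.run(hours_available=10)  # only 2 hours are needed
```

The work done on Machine A is not repeated: the job restarted on Machine B begins with `hours_done` already at 3.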
Perhaps you have to run this 5-hour compute job 250 different times (perhaps on 250 different data sets). In this case, Condor can be a real time saver. With one command you can submit all 250 runs into Condor. Depending upon the number of machines in your organization's Condor pool, there could be dozens or even hundreds of otherwise idle machines (especially at night, for example) running your jobs at any given moment.
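To make the potential saving concrete, here is a rough back-of-the-envelope estimate in Python. It assumes an idealized pool in which every machine stays available for the whole run and checkpointing, migration, and queueing overhead are ignored; the function name and the pool sizes are illustrative assumptions, not measurements.

```python
import math

def ideal_wall_clock_hours(runs, hours_per_run, machines):
    """Best-case elapsed time: runs execute in 'waves' of one job per machine."""
    waves = math.ceil(runs / machines)
    return waves * hours_per_run

one_machine = ideal_wall_clock_hours(250, 5, 1)     # all 250 runs back to back
pool_of_50 = ideal_wall_clock_hours(250, 5, 50)     # 5 waves of 50 runs
pool_of_250 = ideal_wall_clock_hours(250, 5, 250)   # every run at once
```

Under these assumptions the 1250 hours of serial computation shrink to 25 hours on 50 machines and to 5 hours on 250 machines. Real elapsed time will be longer, since non-dedicated machines come and go.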
Condor makes it easy to maximize the number of machines that can run your job. Because Condor does not require participating machines to share file systems (via NFS or AFS, for example), machines across the entire enterprise can run your job, including machines in different administrative domains. Condor does not even require you to have an account (login) on the machines where it runs your job. Condor can do this because of its Remote System Call technology, which traps operating system calls for operations such as reading from and writing to disk files and sends them back over the network to be performed on the machine where the job was submitted.
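The idea behind remote system calls can be sketched in a few lines of Python. This is a simplified illustration, not Condor's implementation: `SubmitSideAgent` and `RemoteFileStub` are hypothetical names, the request format is invented, and the "network" is a direct method call within one process. What it shows is the division of labor: the job computes on the execute machine, while every file operation is carried out on the submit machine.

```python
import os
import tempfile

class SubmitSideAgent:
    """Performs file operations on the submit machine on the job's behalf."""

    def handle(self, request):
        op = request["op"]
        if op == "read":
            with open(request["path"], "r") as f:
                return f.read()
        if op == "write":
            with open(request["path"], "w") as f:
                return f.write(request["data"])
        raise ValueError(f"unsupported operation: {op}")


class RemoteFileStub:
    """Stands in for local file I/O on the execute machine.

    A real system would send each request over the network; this sketch
    calls the agent directly.
    """

    def __init__(self, agent):
        self.agent = agent

    def read(self, path):
        return self.agent.handle({"op": "read", "path": path})

    def write(self, path, data):
        return self.agent.handle({"op": "write", "path": path, "data": data})


# Files that exist only on the (simulated) submit machine.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.txt")
output_path = os.path.join(workdir, "output.txt")
with open(input_path, "w") as f:
    f.write("3 4 5")

# The "job": it never opens a file itself, it only talks to the stub.
io = RemoteFileStub(SubmitSideAgent())
numbers = [int(token) for token in io.read(input_path).split()]
io.write(output_path, str(sum(numbers)))

with open(output_path) as f:
    result = f.read()
```

Because the job touches files only through the stub, the execute machine needs neither a shared file system nor an account for the user.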
In addition to migrating jobs to available machines, Condor provides sophisticated and distributed resource management. Match-making resource owners with resource consumers is the cornerstone of a successful HTC environment. Unlike many other compute cluster resource management systems which attach properties to the job queues themselves (resulting in user confusion over which queue to use as well as administrative hassle in constantly adding and editing queue properties to satisfy user demands), Condor implements a clean design called ClassAds.
ClassAds work in a fashion similar to newspaper classified advertising want-ads. All machines in the Condor pool advertise their resource properties, such as available RAM, CPU type and speed, virtual memory size, physical location, current load average, and many other static and dynamic properties, in a Resource Offer ad. Likewise, when submitting a job, users can specify a Resource Request ad, which defines both the required and the desired set of resources to run the job. Similarly, a Resource Offer ad can define requirements and preferences. Condor then acts as a broker by matching and ranking Resource Offer ads with Resource Request ads, making certain that all requirements in both ads are satisfied. During this match-making process, Condor also takes several layers of priority values into consideration: the priority the user assigned to the Resource Request ad, the priority of the user who submitted the ad, and the desire of machines in the pool to accept certain types of ads over others.
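A minimal Python sketch of two-sided matching follows. It does not use ClassAd syntax or any Condor API: ads are plain dictionaries, requirements and preferences are ordinary Python functions, and all attribute names (`memory_mb`, `arch`, `owner`) are assumptions made for illustration. It also leaves out the user-priority layers and the machines' own preferences described above.

```python
def match(request, offers):
    """Return the offer the request ranks highest, or None.

    An offer is a candidate only if the requirements of BOTH ads hold.
    """
    candidates = [
        offer for offer in offers
        if request["requirements"](offer) and offer["requirements"](request)
    ]
    if not candidates:
        return None
    return max(candidates, key=request["rank"])


# Resource Offer ads: machine properties plus the owner's requirements.
offers = [
    {"name": "A", "arch": "INTEL", "memory_mb": 64,
     "requirements": lambda job: True},
    {"name": "B", "arch": "INTEL", "memory_mb": 256,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "C", "arch": "SPARC", "memory_mb": 512,
     "requirements": lambda job: True},
]

# Resource Request ad: what the job requires, and what it prefers (rank).
request = {
    "owner": "alice",
    "requirements": lambda m: m["arch"] == "INTEL" and m["memory_mb"] >= 32,
    "rank": lambda m: m["memory_mb"],   # prefer machines with more memory
}

best = match(request, offers)

# The same request from a user that machine B refuses to serve.
guest_request = dict(request, owner="guest")
guest_best = match(guest_request, offers)
```

Here `best` is machine B: C has the most memory but fails the architecture requirement, and B outranks A on memory. For the guest user, B's own requirement fails, so the match falls back to A.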