TaskForest

A simple, expressive, open-source, text-file-based Job Scheduler with console, HTTP, and RESTful API interfaces.

Documentation

Options

The following options are required. If they are not specified on the command line, the system will search the environment for the corresponding environment variables or look for the options in the configuration file. I recommend that you specify all options in the configuration file.
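The lookup described above can be pictured with a minimal Python sketch. It is not TaskForest's own code: the function name resolve_option is hypothetical, and the order in which the environment and the configuration file are consulted is an assumption based on the order in which they are mentioned.

```python
import os

def resolve_option(name, cli_args, config, environ=os.environ):
    """Return the first value found for an option such as 'log_dir'.

    Assumed order: command line, then environment (TF_<NAME>), then
    configuration file. Returns None if the option is not set anywhere.
    """
    if cli_args.get(name) is not None:
        return cli_args[name]
    env_key = "TF_" + name.upper()   # e.g. log_dir -> TF_LOG_DIR
    if env_key in environ:
        return environ[env_key]
    return config.get(name)
```

For example, resolve_option("log_dir", {}, {"log_dir": "/a/b/l"}, {}) falls through to the configuration value.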

--run_wrapper=/a/b/r [or environment variable TF_RUN_WRAPPER]

This is the location of the run wrapper that is used to execute the job files. The run wrapper is also responsible for creating the semaphore files that denote whether a job ran successfully or not. The system comes with two run wrappers: bin/run and bin/run_with_log.

The first provides the most basic functionality, while the second also captures the stdout and stderr from the invoked job and saves them to a file in the log directory. You may use either run wrapper. If you need additional functionality, you can create your own run wrapper, as long as it preserves the functionality of the default run wrapper.

You are encouraged to use run_with_log because of the extra functionality it provides. If you also use the included web server to look at the status of today's jobs, or to browse the logs from earlier days, clicking on the status of a job that has already run will bring up the log file associated with that job. This is very convenient when you are investigating a job failure.

--log_dir=/a/b/l [or environment variable TF_LOG_DIR]

This is called the root log directory. Every day, a dated directory named in the form YYYYMMDD is created under it, and that day's semaphore files are created in that dated directory.
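As an illustration of the naming scheme, here is a small Python sketch that builds the dated directory path for a given day. The function name dated_log_dir is hypothetical, and the sketch only computes the path; it does not create anything on disk.

```python
import os
from datetime import date

def dated_log_dir(log_dir, day=None):
    """Return the path of the YYYYMMDD directory under the root log directory."""
    day = day or date.today()          # default to today's date
    return os.path.join(log_dir, day.strftime("%Y%m%d"))
```

With log_dir set to /a/b/l, the directory for 7 March 2009 would be named 20090307 under /a/b/l.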

--job_dir=/a/b/j [or environment variable TF_JOB_DIR]

This is the location of all the job files. Each job file should be an executable file (e.g.: a binary file, a shell script, a perl or python script). The file names are used as job names in the family configuration files. Job names may only contain the characters a-z, A-Z, 0-9 and _. You may create aliases to jobs within this directory.

If a job J1 is present in a family config file, any other occurrence of J1 in that family refers TO THAT SAME JOB INSTANCE. It does not mean that the job will be run twice.

If you want the same job running twice, you will have to put it in different families, or make soft links to it and have the soft link(s) in the family file along with the actual file name.

If a job is to run repeatedly every x minutes, you could specify that using the 'repeat/every' syntax shown above.

--family_dir=/a/b/f [or environment variable TF_FAMILY_DIR]

This is the location of all the family files. As is the case with jobs, family names are the file names. Family names may only contain the characters a-z, A-Z, 0-9 and _.
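The naming rule for job and family names can be checked with a one-line regular expression. The following Python sketch is only an illustration; the helper name is_valid_name is hypothetical and is not part of TaskForest.

```python
import re

# Only a-z, A-Z, 0-9 and _ are allowed, and the name must not be empty.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")

def is_valid_name(name):
    """Return True if name uses only the characters allowed for job and family names."""
    return NAME_RE.fullmatch(name) is not None
```

Under this rule a file called J1 or nightly_backup is a valid name, while my-job or job.sh is not.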

The following options are optional.

--once_only

If this option is set, the system will check each family once, run any jobs in the Ready state and then exit. This is useful for testing, or if you want to invoke the program via cron or some similar system, or if you just want the program to run on demand, and not run and sleep all day.

--end_time=HH:MM

If once_only is not set, this option determines when the main program loop should end. This refers to the local time in 24-hour format. By default it is set to 23:55. This is the recommended maximum.

--wait_time

This is the number of seconds to sleep at the end of every loop. The default value is 60.
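To show how once_only, end_time and wait_time interact, here is a rough Python sketch of a main loop. It is a simplification under stated assumptions, not the actual implementation: check_families stands in for whatever the system does on each pass, and the now and sleep parameters exist only so the loop can be exercised without waiting.

```python
import time
from datetime import datetime

def main_loop(check_families, once_only=False, end_time="23:55",
              wait_time=60, now=datetime.now, sleep=time.sleep):
    """Run check_families repeatedly; return the number of passes made."""
    end_h, end_m = (int(part) for part in end_time.split(":"))
    passes = 0
    while True:
        check_families()               # check each family, run Ready jobs
        passes += 1
        if once_only:                  # single pass, then exit
            break
        current = now()
        if (current.hour, current.minute) >= (end_h, end_m):
            break                      # local end_time reached
        sleep(wait_time)               # sleep at the end of every loop
    return passes
```

Calling main_loop(check, once_only=True) makes exactly one pass, which is the behaviour you would want when invoking the program from cron.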

--verbose

Print a lot of debugging information.

--help

Display help text.

--log

Log stdout and stderr to files. Before logging starts, the program will print the names of the log file and error file to stdout. The program logs incidents at different levels ("debug", "info", "warning", "error" and "fatal"). The log_threshold option sets the level at which logs are written to the stdout file. If the value of log_threshold is "info", then only those log messages with a level of "info" or above ("warning", "error" or "fatal") will be written to the stdout log file. The stderr log file always has logs printed at level "error" or above, as well as anything printed explicitly to STDERR.

--log_threshold=t

Only log messages at level t and above will be printed to the stdout log file. The default value is "warn".
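The threshold rule can be expressed as a comparison of positions in an ordered list of levels. The Python sketch below is illustrative only: the function names are hypothetical, and the level names are the ones listed in the description of --log.

```python
# Levels in increasing order of severity, as listed for the --log option.
LEVELS = ["debug", "info", "warning", "error", "fatal"]

def goes_to_stdout_log(level, threshold):
    """True if a message at 'level' meets the log_threshold."""
    return LEVELS.index(level) >= LEVELS.index(threshold)

def goes_to_stderr_log(level):
    """True if a message at 'level' is 'error' or above."""
    return LEVELS.index(level) >= LEVELS.index("error")
```

With a threshold of "info", a "debug" message is dropped from the stdout log file while a "warning" message is kept.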

--log_file=o

Messages printed to stdout are saved to file o in the log_directory (if --log is set). The default value is "stdout".

--err_file=e

Messages printed to stderr are saved to file e in the log_directory (if --log is set). The default value is "stderr".

--config_file=f

Read configuration settings from config file f.

--chained

If this is set, all recurring jobs will have the chained attribute set to 1 unless specified explicitly in the family file.

--collapse

If this option is set, then the status command will behave as if the --collapse option was specified on the command line.

--ignore_regex=r

If this option is set then the family files whose names match the perl regular expression r will be ignored. You can specify this option more than once on the command line or in the configuration file, but if you use the environment to set this option, you can only set it to one value. Look at the included configuration file taskforest.cfg for examples.
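The effect of one or more ignore_regex values can be sketched in Python as follows. This is an approximation: TaskForest uses Perl regular expressions, and Python's re module differs from Perl in some details. The function name families_to_load and the patterns in the usage note are hypothetical.

```python
import re

def families_to_load(family_names, ignore_patterns):
    """Return the family file names that match none of the ignore patterns."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return [name for name in family_names
            if not any(rx.search(name) for rx in compiled)]
```

For instance, with the patterns "_bak$" and "^TEST", the names DAILY_bak and TEST_X would be ignored and DAILY would be kept.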

--default_time_zone

This is the time zone in which jobs that ran on days in the past will be displayed. When looking at previous days' status, the system has no way of knowing what time zone the job was originally scheduled for. Therefore, the system will choose the time zone denoted by this option. The default value for this option is "America/Chicago".

token

This is a configuration option that can only be set in the configuration file. A token is a dependency. It is something that a job must 'possess' before it can run, if that job needs that token. You can create different types of tokens, giving each type a common name. You can also specify how many instances of tokens of each type are to exist. The syntax for specifying the different types of tokens and the number of instances each type can have is as follows:


   +-------------------------------------------------------
   | ...
   | <token T>
   |   number = 1
   | </token>
   | <token U>
   |   number = 2
   | </token>
   | ...
   +-------------------------------------------------------

For more details on tokens, please see the tokens page.
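One way to think about tokens is as a pool of counted resources. The Python sketch below models the configuration shown above (one instance of T, two of U). It is a conceptual illustration only: the class TokenPool and its methods are hypothetical, and the assumption that a job gives its tokens back when it finishes is mine, not something stated on this page.

```python
class TokenPool:
    """Tracks how many instances of each token type are currently free."""

    def __init__(self, counts):
        self.free = dict(counts)       # e.g. {"T": 1, "U": 2}

    def try_acquire(self, needed):
        """Take one instance of every token in 'needed', or take nothing."""
        if any(self.free.get(token, 0) < 1 for token in needed):
            return False               # a required token is unavailable
        for token in needed:
            self.free[token] -= 1
        return True

    def release(self, held):
        """Return tokens to the pool (assumed to happen when a job ends)."""
        for token in held:
            self.free[token] += 1
```

With this model, two jobs that both need token T cannot hold it at the same time, while two jobs that need U can.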

--calendar_dir=/a/b/f

This is the location of all the calendar files. As is the case with jobs, calendar names are the file names. Calendar names may only contain the characters a-z, A-Z, 0-9 and _.

--num_retries=n

This is the number of times to automatically retry running a job that fails. The default value is 0.

--retry_sleep=n

Wait this many seconds before automatically retrying a job that fails. The default value is 60.
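The interaction between num_retries and retry_sleep can be sketched in Python like this. It is a simplified model rather than the system's own code: run_job is a hypothetical callable that returns True on success, and the sleep parameter exists only so the example can be tried without waiting.

```python
import time

def run_with_retries(run_job, num_retries=0, retry_sleep=60, sleep=time.sleep):
    """Run a job, retrying up to num_retries times after a failure.

    Returns (succeeded, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        if run_job():
            return True, attempts
        if attempts > num_retries:     # retries exhausted
            return False, attempts
        sleep(retry_sleep)             # wait before the next attempt
```

With the default num_retries of 0, a failing job is run once and never retried.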

--smtp_server=server_name

This is the SMTP server that will be used to send emails, for example when a job fails.

--smtp_port=p

This is the SMTP server port. If not defined it defaults to 25.

--smtp_timeout=t

This is the SMTP timeout in seconds. If not defined it defaults to 60.

--smtp_sender=s

This is the SMTP envelope sender (the text after "MAIL FROM:").

--mail_from=f

This is the email address that appears in the From: mail header.

--mail_reply_to=r

If a user replies to a received email, the reply will go to this address instead of the From: address. This address is set in the Reply-To mail header.

--mail_return_path=p

This is the address to which bounces will be sent if they occur at the SMTP server (as opposed to the receiving Mail Transfer Agent).

--instructions_dir=i

This is the directory that stores the contents of the emails that are sent by the system.

--email=e

When a job fails, emails are sent to this address.

--retry_email=e

When a job fails but is being automatically retried, emails are sent to this address, as opposed to the one stored in the 'email' setting. If no_retry_email is set, then no email will be sent in this case.

--retry_success_email=e

When a job fails, is automatically retried one or more times and then succeeds, emails are sent to this address, as opposed to any of the others. If no_retry_success_email is set, then no email will be sent in this case.

--no_retry_email=n

If this is set to 1, then an email will not be sent when a job fails and is being automatically retried. The default value is 0 (retry emails will be sent).

--no_retry_success_email=n

If this is set to 1, then an email will not be sent when a job fails, was automatically retried one or more times and then finally succeeded. The default value is 0 (retry_success emails will be sent).
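The rules for choosing a recipient can be summarised in a short Python sketch. The function notification_address and the event names "failure", "retry" and "retry_success" are hypothetical labels for the three situations described above. What happens when retry_email or retry_success_email is not configured is not covered here, so the sketch simply returns whatever is set.

```python
def notification_address(event, opts):
    """Return the address to notify for an event, or None to send nothing."""
    if event == "failure":             # job failed
        return opts.get("email")
    if event == "retry":               # job failed and is being retried
        if int(opts.get("no_retry_email", 0)) == 1:
            return None
        return opts.get("retry_email")
    if event == "retry_success":       # job succeeded after retries
        if int(opts.get("no_retry_success_email", 0)) == 1:
            return None
        return opts.get("retry_success_email")
    raise ValueError("unknown event: " + event)
```

For example, setting no_retry_email to 1 suppresses the "retry" message while leaving the other two kinds of email unaffected.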