Running scheduled tasks in SeedDMS
Whenever large amounts of data are managed, as in SeedDMS, sooner or later there is a need to run certain tasks, e.g. to clean up, to perform update operations, or simply to check for data changes that occurred in the past. One of the more obvious operations in SeedDMS is checking for expired documents. But there are others, like informing users about reviews or approvals that are due, or updating the full text index. None of them would ever be done without an external trigger, because a web application is not a constantly running process that does all of the above at recurring intervals. A document in SeedDMS which expired some time ago will not change its status to expired unless a user accesses that document and forces SeedDMS to check the expiration date again. If you looked at the database, you would see the document remaining in its old state. In most cases this is just fine, because nobody actually cares about the status of a document unless it is being accessed. But there are other cases where it makes a difference. The full text index, which also stores the status of a document, will not be aware of expired documents unless it is updated regularly. That’s why the so-called scheduler was added in SeedDMS 6.
The scheduler
The scheduler in SeedDMS manages those tasks which need to be run regularly. It is much like a cron daemon on Unix systems, and the way a task is scheduled has actually been borrowed from crond.
SeedDMS 6 already ships with a small number of tasks, which need to be configured and activated to be of any use:
- finding expired documents and informing the owner
- updating or recreating the full text index
- checking for an incorrect checksum of a document
- checking for missing preview images
- checking for upcoming events in the calendar and informing the owner of the event
Many of the available extensions add more tasks of this kind, e.g. for importing mail attachments, checking for due revisions, emptying the trash can, etc.
If you have ever looked into the directory utils of your SeedDMS installation, you will have found some PHP scripts which do exactly what the tasks above do. In the past those scripts had to be called individually by a cron job. Hence, each task needed its own cron job. The scheduler combines all tasks into one cron job, which just calls utils/seeddms-schedulercli. All the remaining configuration and activation of a task can be done within SeedDMS.
Configuring the scheduler
The scheduler itself is not a constantly running process either. It must be run by a cron job as the same user that runs your web server:
*/5 * * * * /home/seeddms/utils/seeddms-schedulercli --mode=run
In this particular case it is run every 5 minutes. The time between runs can be shorter or longer, but keep in mind that it is also the minimum time between two runs of a task. A task configured to run every minute will still run only every 5 minutes if the scheduler itself is run every 5 minutes.
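The effect of the scheduler interval on a task can be illustrated with a small simulation. This is only a simplified model, not the logic SeedDMS actually uses: the function name count_task_runs and the assumption that a task becomes due a fixed number of minutes after its last run are made up for this illustration.

```python
def count_task_runs(task_every_min: int, scheduler_every_min: int, window_min: int) -> int:
    """Count how often a task runs within a window of `window_min` minutes.

    Simplified model (an assumption, not SeedDMS code): a task can only
    start when the scheduler itself is triggered, and it becomes due again
    `task_every_min` minutes after its last run.
    """
    runs = 0
    next_due = 0  # the task is due right at the start of the window
    tick = 0      # minute at which cron triggers the scheduler
    while tick < window_min:
        if tick >= next_due:
            runs += 1
            next_due = tick + task_every_min
        tick += scheduler_every_min
    return runs


if __name__ == "__main__":
    # Task wants to run every minute, scheduler is triggered every 5 minutes.
    print(count_task_runs(task_every_min=1, scheduler_every_min=5, window_min=60))
```

In this model a task configured for every minute runs 12 times per hour instead of 60 when the scheduler is triggered every 5 minutes, while a task configured for every 15 minutes is not affected and still runs 4 times.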
The scheduler will not run unless a user cli_scheduler exists in SeedDMS. This user will be the one used to access the documents and folders. Hence, it should have sufficient access rights.
Though the scheduler script seeddms-schedulercli is usually called from a cron job, it can also be called on the command line. This can be very helpful for debugging. In that case it’s worth having a look at the different options which can be passed on the command line. Just run the script with the option -h to get a list of possible command line parameters. Always run the scheduler as the same user as your web server, otherwise files created by any of the tasks may not be changeable later (e.g. the full text index if it is recreated). On Debian you would usually run:
sudo -u www-data utils/seeddms-schedulercli --mode=run
If SeedDMS’ configuration file cannot be found, just specify the full path of the configuration file with the command line option --config.
What if you can’t run it as a cron job?
There are cases where you cannot run the scheduler from a cron daemon. That’s why the page op/op.Cron.php exists. Calling this page with the parameter mode=run is identical to running utils/seeddms-schedulercli --mode=run. The result of that call is a JSON data object with information about each task. You may evaluate it, but often it’s sufficient to check for the HTTP status code 200 to tell whether the execution of the scheduler was successful. The page uses basic authentication and requires you to log in as user cli_scheduler.
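A call to this page could be sketched in Python as follows. Several details are assumptions and not taken from SeedDMS itself: the base URL and password are placeholders, mode=run is assumed to be passed as a query parameter, and the helper names build_cron_request and run_scheduler are made up for this example. Only the path op/op.Cron.php, the user cli_scheduler and the check for status code 200 come from the description above.

```python
import base64
import urllib.error
import urllib.request


def build_cron_request(base_url: str, password: str) -> urllib.request.Request:
    """Prepare the request for op/op.Cron.php with basic authentication."""
    # Assumption: mode=run is accepted as a query string parameter.
    url = base_url.rstrip("/") + "/op/op.Cron.php?mode=run"
    credentials = f"cli_scheduler:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_scheduler(base_url: str, password: str) -> bool:
    """Trigger the scheduler and report whether the server answered with 200."""
    request = build_cron_request(base_url, password)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx answers, e.g. 401 on a wrong password.
        return False


if __name__ == "__main__":
    # Placeholder URL and password, replace with your own installation.
    print(run_scheduler("https://dms.example.org", "secret"))
```

Such a script can then be triggered by whatever periodic mechanism is available outside the server running SeedDMS, e.g. a monitoring system or a cron daemon on a different machine.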
Configuring a task
Each task itself is configured in SeedDMS. Just click on the button ‘Scheduler’ in the admin area. The page will list all available task classes. It’s important to recognize the difference between a task class and its instantiation. It’s very much like classes and objects in object oriented programming. The task classes shipped with SeedDMS all start with the prefix core:: (e.g. core::indexingdocs). Extensions should use their extension name, followed by ::, as the prefix.
The first step to run a task is choosing one of the task classes and clicking on the + button. This opens a form next to the list of task classes. The first four fields of the form are identical for all tasks. The name and the frequency are mandatory. The frequency follows the same syntax as the first five fields of a crontab entry. It also understands the terms @daily and @hourly, which mean exactly what they say. The fourth field, Disabled, can be used to activate or deactivate a task. Any further fields in the form are specific to the task. Those extra parameters explain why there can be several tasks derived from the same task class: they do different things depending on the parameters. A common scenario is to have two tasks based on the class core::indexingdocs: one running more frequently with the parameter recreate left unchecked, and a second one, running e.g. once a week, which recreates the whole full text index.
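To get a feeling for the frequency syntax, the following Python sketch checks whether a frequency matches a given point in time. It is not the parser used by SeedDMS. It assumes the usual crontab conventions: @hourly stands for "0 * * * *", @daily for "0 0 * * *", Sunday is weekday 0, and if both day of month and day of week are restricted, either one may match. Only the common field forms (*, */n, single values, lists and ranges) are supported, and the names field_matches and is_due are made up for this example.

```python
from datetime import datetime

# Assumed expansion of the two aliases, following the crontab convention.
ALIASES = {"@hourly": "0 * * * *", "@daily": "0 0 * * *"}


def field_matches(field: str, value: int, lowest: int) -> bool:
    """Check one crontab field such as '*/5', '1,15' or '9-17' against a value."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_size = int(step) if step else 1
        if rng == "*":
            if (value - lowest) % step_size == 0:
                return True
            continue
        first, _, last = rng.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start <= value <= end and (value - start) % step_size == 0:
            return True
    return False


def is_due(frequency: str, now: datetime) -> bool:
    """Return True if the five-field frequency matches the minute given by `now`."""
    expr = ALIASES.get(frequency.strip(), frequency)
    minute, hour, dom, month, dow = expr.split()
    cron_weekday = (now.weekday() + 1) % 7  # crontab counts Sunday as 0
    time_ok = (field_matches(minute, now.minute, 0)
               and field_matches(hour, now.hour, 0)
               and field_matches(month, now.month, 1))
    dom_ok = field_matches(dom, now.day, 1)
    dow_ok = field_matches(dow, cron_weekday, 0)
    # crontab rule: if both day fields are restricted, one match is enough.
    day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
    return time_ok and day_ok


if __name__ == "__main__":
    sunday_3am = datetime(2024, 1, 7, 3, 0)
    print(is_due("*/5 * * * *", sunday_3am))  # frequent task, e.g. index update
    print(is_due("0 3 * * 0", sunday_3am))    # weekly task, e.g. index recreation
```

For the two-task scenario described above, a frequency like "*/5 * * * *" would suit the frequent index update and "0 3 * * 0" (Sundays at 3:00) the weekly recreation. Both values are only examples.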
Once you have created the first task, it will appear in another table below the task classes. Tasks which are activated have a green background, all other tasks are white. Besides the name, description, class and frequency of a task, the table also shows the time of the last and next run. Tasks which have been deactivated may show a time for the next run that lies a long time in the past. If such a task is activated again, it will be run the next time the scheduler is run.
Conclusion
Most installations of SeedDMS get along without tasks, but sooner or later your users will request reminders, status reports, monitoring, etc., or your daily administrative duties will become too boring. Then it’s time to automate things with periodically run tasks. A good starting point might be the example extension in SeedDMS.