Coal Mine - Periodic task execution monitor

The home page is on GitHub. Releases are available on PyPI.

What is Coal Mine?

Periodic, recurring tasks are ubiquitous in computing, and so one of the most common problems in systems administration and operations is ensuring that such tasks execute as expected. Designing the tasks to report errors is necessary but not sufficient; what if a task isn’t being run at all (crashed daemon, misconfigured crontab) or is running much more slowly than it should be?

Coal Mine provides a simple yet powerful tool for solving this problem. In a nutshell:

  • Each recurring task has a Coal Mine “canary” associated with it.

  • The task triggers the canary when it is finished executing.

  • The canary knows how often the task is supposed to execute.

  • Coal Mine alerts by email when a canary is late and alerts again when the late canary resumes.

  • Coal Mine keeps a (partial) history of when each canary was triggered.

Track tasks that are supposed to execute periodically using “canaries” that the tasks trigger when they execute. Alert by email when a canary is late. Alert again when a late canary resumes. Keep a partial history of canary trigger times.

The server notifies immediately when the deadline for an unpaused canary passes. Similarly, the server notifies immediately when a previously late canary is triggered.

Prerequisites

  • Python 3

  • MongoDB for storage (pull requests to add additional storage engines are welcome)

  • requirements listed in requirements.txt

  • for development, requirements listed in requirements_dev.txt

Concepts

Coal Mine provides two interfaces, a REST API and a command-line interface (CLI). Since triggering a canary requires nothing more than hitting its endpoint with a GET or POST query, it’s best to do triggering through the API, so that the CLI doesn’t need to be installed on every system running monitoring tasks. For administrative operations, on the other hand, the CLI is usually easier.
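As a rough illustration of triggering through the API from inside a Python task, the sketch below builds the short-form trigger URL and requests it using only the standard library. The server name and identifier are borrowed from the examples later in this document; the helper names `trigger_url` and `trigger` are hypothetical and not part of Coal Mine.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def trigger_url(server, identifier, comment=None):
    """Build the short-form trigger URL, e.g. http://coal-mine-server/fbkvlsby."""
    url = "http://{}/{}".format(server, identifier)
    if comment:
        url += "?" + urlencode({"comment": comment})
    return url


def trigger(server, identifier, comment=None, timeout=10):
    """Send the trigger request and return the raw response body as text."""
    with urlopen(trigger_url(server, identifier, comment), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; calling trigger() needs a reachable server.
print(trigger_url("coal-mine-server", "fbkvlsby", "nightly backup done"))
```

Because nothing beyond the standard library is involved, a task can trigger its canary this way without the CLI being installed.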

All timestamps stored and displayed by Coal Mine are in UTC.

Operations

The operations that can be performed on canaries through the CLI or API are:

  • create

  • delete

  • reconfigure

  • get information about

  • pause – stop monitoring and alerting

  • unpause

  • trigger

  • list – all canaries or the ones matching search terms

Coal Mine security is rudimentary. If the server is configured with an optional authentication key, then the key must be specified with all operations except trigger.

Data

These canary attributes are specified when it is created or updated:

  • name

  • description

  • periodicity – the maximum number of seconds that can elapse before a canary is late, or a schedule in the format described below, which allows the periodicity of the canary to vary over time

  • zero or more notification email address(es)

These are created and maintained by Coal Mine:

  • slug – the canary’s name, lower-cased, with spaces and underscores converted to hyphens and other non-alphanumeric characters removed

  • a random identifier consisting of eight lower-case letters, generated when the canary is created and guaranteed to be unique against other canaries in the database

  • late state (boolean)

  • paused state (boolean)

  • deadline by which the canary should be triggered to avoid being late

  • a history of triggers, pruned when >1000 or (>100 and older than one week)
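The slug and identifier rules above can be sketched roughly as follows. This is an illustration of the described behavior, not Coal Mine's actual code: the function names are hypothetical, each space or underscore is assumed to become its own hyphen, and only ASCII letters and digits are assumed to count as alphanumeric.

```python
import random
import re
import string


def make_slug(name):
    """Lower-case the name, turn spaces/underscores into hyphens, drop the rest."""
    slug = re.sub(r"[ _]", "-", name.lower())
    # Assumption: anything other than a-z, 0-9, and hyphen is removed.
    return re.sub(r"[^a-z0-9-]", "", slug)


def make_identifier(existing):
    """Pick eight random lower-case letters not already present in `existing`."""
    while True:
        candidate = "".join(random.choice(string.ascii_lowercase) for _ in range(8))
        if candidate not in existing:
            return candidate


print(make_slug("My First Canary"))
print(make_identifier({"fbkvlsby"}))
```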

Scheduled periodicity

Coal Mine allows the periodicity of a canary to vary automatically based on the time, date, day of week, etc. There are three contexts in which this is useful:

  1. a recurring task executes with different frequencies at different times;

  2. a continuous recurring task takes more or less time to finish at different times; or

  3. the urgency of responding to delays in a recurring task varies at different times.

To specify a varying periodicity for a canary, instead of just specifying a number of seconds, you specify a series of crontab-like directives separated by semicolons. Here’s an example, split onto multiple lines for clarity:

# 5-minute delays are ok on weekends ;
* * * * sat,sun 300 ;
# 5-minute delays are ok overnight ;
* 0-12 * * mon-fri 300 ;
# otherwise, we require a shorter periodicity ;
* 13-23 * * mon-fri 90

Notes:

  • The last field in each directive is the periodicity value, i.e., the maximum number of seconds to allow between triggers during the specified time range.

  • As indicated above, even though the example is shown split across multiple lines, it must be specified all on one line when providing it to Coal Mine.

  • Note that comments like the ones shown above really are allowed in the schedule you specify to Coal Mine – they’re not just for decoration in the example – but you need to remember to end them with semicolons.

  • Schedule directives cannot overlap. For example, this won’t work, because the second directive overlaps with the first one every Saturday and Sunday between midnight and noon:

    * * * * sat,sun 60 ;
    * 0-11 * * * 90
  • If a canary’s schedule has gaps, then the canary is effectively paused during them. For example, in this schedule, the canary would be paused all day Saturday:

    * * * * sun 300 ;
    * * * * mon-fri 60
  • As with everything else in Coal Mine, the hours and minutes specified here are in UTC.

  • When you create or update a canary with a periodicity schedule, the canary data returned to you in response will include a “periodicity_schedule” field showing how the schedule you specified plays out. The schedule will extend far enough into the future for each of the directives you specified to be shown at least once, or for a week, whichever is longer.
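To make the one-line format concrete, here is a small sketch that splits a schedule string into its directives. It is not Coal Mine's own parser: `parse_schedule` is a hypothetical helper that assumes every directive has five crontab-like time fields followed by the periodicity, and it does not check for overlaps or gaps.

```python
def parse_schedule(schedule):
    """Split a one-line schedule into (time_fields, periodicity_seconds) pairs."""
    directives = []
    for part in schedule.split(";"):
        part = part.strip()
        if not part or part.startswith("#"):
            continue  # skip empty segments and semicolon-terminated comments
        fields = part.split()
        if len(fields) != 6:
            raise ValueError("expected 5 time fields plus a periodicity: " + part)
        directives.append((tuple(fields[:5]), int(fields[5])))
    return directives


# The example schedule from above, joined onto a single line.
schedule = (
    "# 5-minute delays are ok on weekends ; * * * * sat,sun 300 ; "
    "# 5-minute delays are ok overnight ; * 0-12 * * mon-fri 300 ; "
    "# otherwise, we require a shorter periodicity ; * 13-23 * * mon-fri 90"
)
for fields, seconds in parse_schedule(schedule):
    print(" ".join(fields), "->", seconds, "seconds")
```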

Installation and configuration

Server

  1. pip install coal-mine

  2. Create /etc/coal-mine.ini (see below)

  3. Run coal-mine &

  4. Add that command to /etc/rc.local, or an equivalent startup mechanism, to ensure that the server is restarted on reboot.

Server configuration file

The server configuration file, coal-mine.ini, can go in the current directory where the server is launched, /etc, or /usr/local/etc. (If you need to put it somewhere else, modify the list of directories near the top of main() in server.py.)

The file is (obviously) in INI format. Here are the sections and settings that it can or must contain:

  • [logging] – optional

      • file – log file path; otherwise logging goes to stderr

      • rotate – if true, then rotate the log file when it gets too large

      • max_size – max log file size before rotating (default: 1048576)

      • backup_count – number of rotated log files to keep (default: 5)

  • [mongodb] – required

      • hosts – the first argument to pymongo 3’s MongoClient

      • database – database name. Coal Mine will create only one collection in the database, called “canaries”.

      • username – must be specified, but can be blank if no authentication is required

      • password – must be specified, but can be blank if no authentication is required

      • replicaSet – must be specified if using a replica set

      • other arguments will be passed through to MongoClient

  • [email] – required

      • sender – email address to put in the From line of notification emails

  • [wsgi] – optional

      • port – port number the server should listen on (default: 80)

      • auth_key – if non-empty, then this key must be passed as a parameter of the same name with all API requests except “trigger”.
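As an illustration of how these sections fit together, the sketch below uses Python's `configparser` to generate a minimal coal-mine.ini. Every value shown (log path, database name, sender address, port, key) is a made-up placeholder, and writing "true" for rotate is an assumption about the accepted spelling.

```python
import configparser
import io

config = configparser.ConfigParser()
config["logging"] = {"file": "/var/log/coal-mine.log", "rotate": "true"}
config["mongodb"] = {
    "hosts": "localhost",
    "database": "coal_mine",
    # Both must be present, but may be blank when MongoDB needs no authentication.
    "username": "",
    "password": "",
}
config["email"] = {"sender": "coal-mine@example.com"}
config["wsgi"] = {"port": "8080", "auth_key": "change-me"}

# Render to a string; write it to /etc/coal-mine.ini (or another searched directory).
buffer = io.StringIO()
config.write(buffer)
ini_text = buffer.getvalue()
print(ini_text)
```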

CLI

  1. pip install coal-mine

  2. cmcli configure [--host server-host-name] [--port server-port] [--auth-key key | --no-auth-key]

The CLI stores its configuration in ~/.coal-mine.ini. Note that the authentication key is stored in plaintext. Any configuration parameters the CLI needs that aren’t stored in the INI file must be specified explicitly on the command line when using the CLI.

Using Coal Mine

CLI

The Coal Mine CLI, cmcli, provides convenient access to the full range of Coal Mine’s functionality.

To make the CLI easier to use, you can configure it as shown above, but you also have the option of specifying the server connection information every time you use it. Also, connection information specified on the command line overrides the stored configuration.

Here are some example commands:

cmcli create --help

cmcli create --name 'My Second Canary' --periodicity $((60*60*25))  # $((60*60*25)) is 25 hours
cmcli trigger --id aseprogj
cmcli delete --slug 'my-second-canary'

Run cmcli --help for more information.

For commands that operate on individual canaries, you can identify the canary with --id, --name, or --slug. Note that for the update command, if you want to update the name of a canary you will need to identify it with --id or --slug, because in that case the --name argument is used to specify the new name.

API usage examples

Example commands

$ coal-mine &
[1] 7564
$ curl 'http://coal-mine-server/coal-mine/v1/canary/create?name=My+First+Canary&periodicity=3600'
{
    "status": "ok",
    "canary": {
        "deadline": "2015-03-19T02:08:44.885182",
        "id": "fbkvlsby",
        "paused": false,
        "description": "",
        "periodicity": 3600,
        "name": "My First Canary",
        "slug": "my-first-canary",
        "emails": [],
        "history": [
            [
                "2015-03-19T01:08:44.885182",
                "Canary created"
            ]
        ],
        "late": false
    }
}
$ curl 'http://coal-mine-server/fbkvlsby?comment=short+form+trigger+url'
{
    "recovered": false,
    "unpaused": false,
    "status": "ok"
}
$ curl 'http://coal-mine-server/coal-mine/v1/canary/trigger?slug=my-first-canary&comment=long+form+trigger+url'
{
    "recovered": false,
    "unpaused": false,
    "status": "ok"
}
$ curl 'http://coal-mine-server/coal-mine/v1/canary/get?name=My+First+Canary'
{
    "canary": {
        "paused": false,
        "name": "My First Canary",
        "history": [
            [
                "2015-03-19T01:11:56.408000",
                "Triggered (long form trigger url)"
            ],
            [
                "2015-03-19T01:10:42.608000",
                "Triggered (short form trigger url)"
            ],
            [
                "2015-03-19T01:08:44.885000",
                "Canary created"
            ]
        ],
        "emails": [],
        "id": "fbkvlsby",
        "late": false,
        "slug": "my-first-canary",
        "deadline": "2015-03-19T02:11:56.408000",
        "periodicity": 3600,
        "description": ""
    },
    "status": "ok"
}

All API endpoints are fully documented below.

Watching a cron job

0 0 * * * my-backup-script.sh && (curl http://coal-mine-server/fbkvlsby >/dev/null 2>&1)
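The same "trigger only on success" pattern can be expressed in Python for tasks that are launched from a wrapper script rather than directly from cron. This is a sketch under assumptions: `run_then_trigger` and `trigger_canary` are hypothetical names, and the URL is the one from the crontab line above.

```python
import subprocess
import sys
from urllib.request import urlopen


def run_then_trigger(command, trigger):
    """Run `command`; call `trigger()` only if it exits with status 0."""
    result = subprocess.run(command)
    if result.returncode == 0:
        trigger()
    else:
        print("task failed; canary not triggered", file=sys.stderr)
    return result.returncode


def trigger_canary():
    """Hit the canary's short-form trigger URL and discard the response."""
    urlopen("http://coal-mine-server/fbkvlsby", timeout=10).close()


# Example use (not executed here):
#   sys.exit(run_then_trigger(["my-backup-script.sh"], trigger_canary))
```

If the task fails, the canary is simply not triggered, so Coal Mine reports it as late once its deadline passes.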

API reference

All API requests are submitted as HTTP(S) GET requests (the trigger endpoint also accepts POST). Results are returned in JSON.

All results have a “status” field which is “ok” on success or “error” on failure. Failures also return a reasonable HTTP error status code.
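A client can rely on the “status” field to detect failures. The following is a minimal sketch of such a check; `CoalMineError` and `check_response` are hypothetical names, not part of Coal Mine.

```python
import json


class CoalMineError(Exception):
    """Raised when a response's "status" field is not "ok"."""


def check_response(body):
    """Parse a JSON response body and return it, or raise on failure."""
    data = json.loads(body)
    if data.get("status") != "ok":
        raise CoalMineError("request failed: {!r}".format(data))
    return data


result = check_response('{"status": "ok", "recovered": false, "unpaused": false}')
print(result["recovered"], result["unpaused"])
```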

Boolean fields in API requests should be specified as “true”, “yes”, or “1” for true, or “false”, “no”, “0”, or empty string for false. Boolean fields in responses are standard JSON, i.e., “true” or “false”.
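The accepted spellings can be captured in a pair of small helpers, sketched below. Both function names are hypothetical, and ignoring letter case and surrounding whitespace is an assumption; the text above only lists the lower-case forms.

```python
TRUE_VALUES = {"true", "yes", "1"}
FALSE_VALUES = {"false", "no", "0", ""}


def to_api_bool(flag):
    """Render a Python bool as a request parameter value."""
    return "true" if flag else "false"


def parse_api_bool(text):
    """Interpret a request parameter value as a bool."""
    value = text.strip().lower()  # assumption: case and padding are ignored
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError("not a recognized boolean: {!r}".format(text))


print(to_api_bool(True), parse_api_bool("yes"), parse_api_bool(""))
```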

Timestamps returned by the API are always UTC.

Create canary

Endpoint: /coal-mine/v1/canary/create

Side effects:

Adds canary to database. Creates history record at current time with “Canary created” as its comment. Sets deadline to current time plus periodicity, unless “paused” was specified.

Required parameters:

  • name

  • periodicity

  • auth_key (if authentication is enabled in the server)

Optional parameters:

  • description - empty if unspecified

  • email - specify multiple times for multiple addresses; no notifications if unspecified

  • paused - allows canary to be created already in paused state

Response is the same as shown for get().
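Since “email” may be repeated, it helps to build the query string from a list of pairs rather than a dictionary. The sketch below does that with the standard library; `create_url` is a hypothetical helper, and the addresses in the usage example are placeholders.

```python
from urllib.parse import urlencode


def create_url(server, name, periodicity, emails=(), description=None,
               paused=False, auth_key=None):
    """Build a create-canary URL, repeating "email" once per address."""
    params = [("name", name), ("periodicity", str(periodicity))]
    if description:
        params.append(("description", description))
    params.extend(("email", address) for address in emails)
    if paused:
        params.append(("paused", "true"))
    if auth_key:
        params.append(("auth_key", auth_key))
    return "http://{}/coal-mine/v1/canary/create?{}".format(server, urlencode(params))


print(create_url("coal-mine-server", "My First Canary", 3600,
                 emails=["ops@example.com", "oncall@example.com"]))
```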

Delete canary

Endpoint: /coal-mine/v1/canary/delete

Required parameters:

  • name, id, or slug

  • auth_key

Response:

{'status': 'ok'}

Update canary

Endpoint: /coal-mine/v1/canary/update

Side effects:

Updates the specified canary attributes. Updates deadline to latest history timestamp plus periodicity if periodicity is updated and canary is unpaused, and sets late state if new deadline is before now. Sends notification if canary goes from not late to late or vice versa.

Required parameters:

  • id or slug (not name, which should only be specified to update the name and slug)

  • auth_key

Optional parameters:

  • name

  • periodicity

  • description

  • email - specify a single value of “-” to clear existing email addresses

Response is the same as shown for get().
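The deadline side effect of an update can be sketched as below for the simple case of a numeric periodicity. `recompute_deadline` is a hypothetical name; scheduled periodicities and the notification step are left out, and the timestamps are taken from the usage example earlier in this document.

```python
from datetime import datetime, timedelta


def recompute_deadline(latest_history, periodicity, now):
    """Return (deadline, late) after a periodicity change on an unpaused canary."""
    deadline = latest_history + timedelta(seconds=periodicity)
    late = deadline < now  # late if the new deadline has already passed
    return deadline, late


last_trigger = datetime(2015, 3, 19, 1, 11, 56, 408000)
deadline, late = recompute_deadline(last_trigger, 3600, datetime(2015, 3, 19, 1, 30))
print(deadline.isoformat(), late)
```

Shortening the periodicity can therefore make a canary late immediately, which is why an update may send a notification.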

Get canary

Endpoint: /coal-mine/v1/canary/get

Required parameters:

  • name, id, or slug

  • auth_key

Response:

{'status': 'ok',
 'canary': {'name': name,
           'description': description,
           'id': identifier,
           'slug': slug,
           'periodicity': seconds,
           'emails': [address, ...],
           'late': boolean,
           'paused': boolean,
           'deadline': 'YYYY-MM-DDTHH:MM:SSZ',
           'history': [['YYYY-MM-DDTHH:MM:SSZ', comment], ...]}}

List canaries

Endpoint: /coal-mine/v1/canary/list

Required parameters:

  • auth_key

Optional parameters:

  • verbose - include all query output for each canary

  • paused - boolean, whether to list paused / unpaused canaries only

  • late - boolean, whether to list late / timely canaries only

  • search - string, regular expression to match against name, identifier, and slug

Response:

{'status': 'ok',
 'canaries': [{'name': name,
             'id': identifier},
            ...]}

If “verbose” is true, then the JSON for each canary includes all the fields shown above, not just the name and identifier.
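To show what the “search” parameter means in practice, here is a client-side sketch of the same kind of filtering. It assumes an unanchored, case-sensitive regular-expression search, which the text above does not confirm; `search_canaries` and the second canary's name and slug are made up for the example.

```python
import re


def search_canaries(canaries, pattern):
    """Return canaries whose name, id, or slug matches the regular expression."""
    regex = re.compile(pattern)
    return [
        canary for canary in canaries
        if any(regex.search(canary[key]) for key in ("name", "id", "slug"))
    ]


canaries = [
    {"name": "My First Canary", "id": "fbkvlsby", "slug": "my-first-canary"},
    {"name": "Nightly Backup", "id": "aseprogj", "slug": "nightly-backup"},
]
print([c["name"] for c in search_canaries(canaries, r"^my-")])
```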

Trigger canary

Endpoint: /coal-mine/v1/canary/trigger

Also: /identifier, in which case the “id” parameter is implied

Note that the server will accept POST requests for triggers as well as GET requests, so that you can use triggers as webhooks in applications that expect to be able to POST. The content of the POST is ignored; even when using POST, the API parameters must still be specified as a query string.

Side effects:

Sets late state to false. Sets deadline to now plus periodicity. Adds history record. Prunes history records. Unpauses canary. Generates notification email if canary was previously late.

Required parameters:

  • name, id, or slug

Optional parameters:

  • comment - stored in history with trigger record

Response:

{'status': 'ok', 'recovered': boolean, 'unpaused': boolean}
  • recovered - indicates whether the canary was previously late before this trigger

  • unpaused - indicates whether the canary was previously paused before this trigger
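The trigger side effects and response fields can be modeled with a small in-memory sketch. This is an illustration only, not the server's implementation: the `Canary` class and `trigger` function are hypothetical, a numeric periodicity is assumed, and history pruning and the notification email are omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Canary:
    periodicity: int  # seconds
    late: bool = False
    paused: bool = False
    deadline: Optional[datetime] = None
    history: list = field(default_factory=list)


def trigger(canary, now, comment=None):
    """Apply the trigger side effects and return the API-style response."""
    response = {"status": "ok", "recovered": canary.late, "unpaused": canary.paused}
    canary.late = False
    canary.paused = False
    canary.deadline = now + timedelta(seconds=canary.periodicity)
    label = "Triggered ({})".format(comment) if comment else "Triggered"
    canary.history.insert(0, (now, label))  # newest first, as in the get() output
    return response


canary = Canary(periodicity=3600, late=True, paused=True)
print(trigger(canary, datetime(2015, 3, 19, 1, 10), "short form trigger url"))
```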

Pause canary

Endpoint: /coal-mine/v1/canary/pause

Side effects:

Clears deadline. Sets late state to false if necessary. Pauses canary. Adds history record about pause. Prunes history records.

Required parameters:

  • name, id, or slug

  • auth_key

Optional parameters:

  • comment

Response is the same as shown for get().

Unpause canary

Endpoint: /coal-mine/v1/canary/unpause

Side effects:

Sets deadline to now plus periodicity. Unpauses canary. Adds history record about unpause. Prunes history records.

Required parameters:

  • name, id, or slug

  • auth_key

Optional parameters:

  • comment

Response is the same as shown for get().

Quis custodiet ipsos custodes?

Obviously, if you’re relying on Coal Mine to let you know when something is wrong, you need to make sure that Coal Mine itself stays running. One way to do that is to have a cron job which periodically triggers a canary and generates output (which crond will email to you) if the trigger fails. Something like:

0 * * * * (curl -s http://coal-mine-server/atvywzoa | grep -q -s '"status": "ok"') || echo "Failed to trigger canary."

I also recommend using a log-monitoring service such as Papertrail to monitor and alert about errors in the Coal Mine log.

Contacts

Github

Email

PyPI

Contributors

Coal Mine was created by Jonathan Kamens, with design help from the awesome folks at Quantopian. Thanks, also, to Quantopian for supporting the development and open-sourcing of this project.

Development philosophy

Use Python.

Do one, simple thing well. There are several similar projects out there that do more than this project attempts to do.

Make the implementation as simple and straightforward as possible. The code should be small. What everything does should be obvious from reading it.

Minimize external dependencies. If something is simple and straightforward to do ourselves, don’t use a third-party package just for the sake of using a third-party package.

Alternatives

Alternatives to Coal Mine include:

We chose to write something new, rather than using what’s already out there, for several reasons:

  • We wanted more control over the stability and reliability of our watch service than the commercial alternatives provide.

  • We wanted fine-grained control over the periodicity of our watches, as well as assurance that we would be notified immediately when a watch is late, something that not all of the alternatives guarantee.

  • We like Python.

  • We like OSS.

To Do

(Pull requests welcome!)

Other storage engines.

Other notification mechanisms.

More smtplib configuration options in INI file.

Web UI.

Links to Web UI in email notifications.

Repeat notifications if a canary remains late for an extended period of time? Not even sure I want this.

Better authentication?

Support time-zone localization of displayed timestamps.

SSL support in server
