Harrison is a project mainly written in JavaScript, based on the BSD-3-Clause license.
SimpleGeo's offline task system
Harrison is an offline task system that uses Redis and Node.js to keep track of tasks and push them out to ancillary worker processes. It is inspired by Resque, Flickr's Offline Task system(s), and others.
Harrison is named for John Harrison, who built the Longitude Clocks in his attempt to win the Longitude Prize.
The "public" (initiator- and worker-facing) JSON description of a task looks like:
{
  "class": "UserBackfill",
  "args": [
    "12"
  ]
}
This is the payload that is used both to create tasks and to pass them to worker processes. An id parameter may be included when the payload is sent to a worker.
The internal task definition looks like this:
{
  "class": "UserBackfill",
  "args": [
    "12"
  ],
  "id": 42,
  "attempts": 0,
  "state": "ready",
  "lastError": null,
  "queuedAt": "2010-07-08T18:02:23.347Z",
  "firstRunAt": "2010-07-08T18:04:00.000Z",
  "reservedAt": "2010-07-08T18:03:00.000Z",
  "lastRunAt": null,
  "firstScheduledFor": "2010-07-08T18:02:23.347Z",
  "scheduledFor": "2010-07-08T18:02:23.347Z",
  "lastRunBy": null,
  "priority": ""
}
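As a rough illustration of how the two shapes relate, the sketch below expands a public description into an internal record and reduces it back to the worker-facing payload. The function names toInternalTask and toWorkerPayload are hypothetical, and the initial field values (state "ready", zero attempts, scheduledFor equal to queuedAt) are assumptions rather than something taken from Harrison's source.

```javascript
// Hypothetical helpers; names and default values are assumptions for illustration.

// Expand a public task description into the internal shape shown above.
function toInternalTask(publicTask, id, now) {
  const timestamp = now.toISOString();
  return {
    "class": publicTask["class"],
    args: publicTask.args.slice(),
    id: id,
    attempts: 0,
    state: "ready",
    lastError: null,
    queuedAt: timestamp,
    firstRunAt: null,
    reservedAt: null,
    lastRunAt: null,
    firstScheduledFor: timestamp,
    scheduledFor: timestamp,
    lastRunBy: null,
    priority: ""
  };
}

// Reduce an internal task to the payload a worker receives:
// the public fields plus the id.
function toWorkerPayload(task) {
  return { "class": task["class"], args: task.args, id: task.id };
}

const task = toInternalTask(
  { "class": "UserBackfill", args: ["12"] },
  42,
  new Date("2010-07-08T18:02:23.347Z")
);
console.log(JSON.stringify(toWorkerPayload(task)));
```

Keeping the worker payload as a strict subset of the internal record means bookkeeping fields such as attempts or reservedAt never have to be understood by workers.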
The following states are valid:
Task ids are generated by INCRing the string value next.task.id. Uniqueness is checked by taking the SHA of the public task description (sans whitespace) and seeing whether a matching task has already been scheduled (and not yet run).
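The id and uniqueness rules can be sketched roughly as follows. This is not Harrison's implementation: it uses an in-memory stand-in instead of Redis so that it runs on its own, it assumes "SHA" means SHA-1 and that "sans whitespace" means compact JSON, and the names incr, fingerprint, schedule and markRun are hypothetical. Where the digest is actually stored is not specified above, so the sketch simply keeps a map from digest to task id.

```javascript
const crypto = require("crypto");

// In-memory stand-in for Redis so the sketch runs without a server.
// With a real client, incr() would be an INCR on next.task.id.
const store = { counters: new Map(), scheduled: new Map() };

function incr(key) {
  const next = (store.counters.get(key) || 0) + 1;
  store.counters.set(key, next);
  return next;
}

// Assumption: SHA-1 over compact JSON (JSON.stringify emits no whitespace).
function fingerprint(publicTask) {
  const compact = JSON.stringify({
    "class": publicTask["class"],
    args: publicTask.args
  });
  return crypto.createHash("sha1").update(compact).digest("hex");
}

// Returns the new task id, or null if an identical task is already scheduled.
function schedule(publicTask) {
  const sha = fingerprint(publicTask);
  if (store.scheduled.has(sha)) return null;
  const id = incr("next.task.id");
  store.scheduled.set(sha, id);
  return id;
}

// Called once a task has run, so the same description can be scheduled again.
function markRun(publicTask) {
  store.scheduled.delete(fingerprint(publicTask));
}

const first = schedule({ "class": "UserBackfill", args: ["12"] });
const duplicate = schedule({ "class": "UserBackfill", args: ["12"] });
console.log(first, duplicate);
```

Hashing only the class and args keeps the check independent of internal fields such as id or queuedAt, which differ between otherwise identical requests.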
The following Redis data structures are used (prefixed with a user-defined value, harrison by default):
* next.task.id - string value containing the maximum task id, suitable for INCRing.
* tasks:<id> - hash containing the internal definition of a task
* tasks - set containing ids of all pending / running tasks (for uniqueness checks)
* queue - sorted set containing ids of pending / running tasks (sort key = scheduledFor)
* by_priority - sorted set containing ids of pending / running tasks with a particular priority (sort key = priority)
* reservoir - sorted set containing ids of tasks scheduled for the future (scheduledFor > now) (sort key = scheduledFor)
* failed - sorted set containing ids of failed tasks (sort key = attempts)
* pending - sorted set containing ids of reserved (in Harrison's local buffer) tasks (sort key = reservedAt)
* error - set containing tasks that passed the retry limit
* invalid - set containing ids of invalid tasks (invalid JSON, etc.)
* errors:<id> - string value containing the most recent error output for a failed task
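To make the key layout more concrete, here is a small sketch of how prefixed key names might be built and how tasks could move from reservoir to queue once they fall due. Several things are assumptions: the ":" separator between prefix and name, the helper names makeKeys, zadd and promoteDue, and the promotion step itself, which is inferred from the descriptions of reservoir and queue rather than stated. An ordered array stands in for a Redis sorted set.

```javascript
// Assumption: keys are formed as "<prefix>:<name>"; the separator is a guess.
function makeKeys(prefix) {
  const p = prefix || "harrison";
  return {
    nextTaskId: p + ":next.task.id",
    task: (id) => p + ":tasks:" + id,
    tasks: p + ":tasks",
    queue: p + ":queue",
    byPriority: p + ":by_priority",
    reservoir: p + ":reservoir",
    failed: p + ":failed",
    pending: p + ":pending",
    error: p + ":error",
    invalid: p + ":invalid",
    lastError: (id) => p + ":errors:" + id
  };
}

// Minimal sorted-set stand-in: an array of { id, score } kept ordered by score.
function zadd(set, id, score) {
  set.push({ id: id, score: score });
  set.sort((a, b) => a.score - b.score);
}

// Move every task whose scheduledFor has passed from the reservoir to the queue.
function promoteDue(reservoir, queue, now) {
  while (reservoir.length > 0 && reservoir[0].score <= now) {
    const entry = reservoir.shift();
    zadd(queue, entry.id, entry.score);
  }
}

const keys = makeKeys();
const reservoir = [];
const queue = [];
zadd(reservoir, 7, Date.parse("2010-07-08T18:05:00.000Z"));
zadd(reservoir, 8, Date.parse("2010-07-08T19:00:00.000Z"));
promoteDue(reservoir, queue, Date.parse("2010-07-08T18:10:00.000Z"));
console.log(keys.task(42), queue.map((e) => e.id), reservoir.map((e) => e.id));
```

Using scheduledFor as the score in both sets means a promoted task keeps its position relative to other tasks without being re-scored.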
harrison.WORKER_MAP = {
  "backfill": {
    "href": "http://localhost:8080/jobs/backfill",
    "concurrency": 10
  },
  "load": {
    "href": "http://localhost:8081/",
    "concurrency": 12
  },
  "echo": "http://localhost:8081/"
};
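The map above mixes two entry shapes: an object with href and concurrency, and a bare URL string. One way to handle both uniformly is to normalize entries before use, as in the sketch below. The function normalizeWorkerMap is hypothetical, and the default concurrency of 1 for string entries is an assumption, since the default Harrison applies is not stated here.

```javascript
// Assumption: a bare string is shorthand for { href: <string> } and gets a
// default concurrency of 1; Harrison's actual default is not stated here.
const DEFAULT_CONCURRENCY = 1;

function normalizeWorkerMap(workerMap) {
  const normalized = {};
  for (const name of Object.keys(workerMap)) {
    const entry = workerMap[name];
    normalized[name] = typeof entry === "string"
      ? { href: entry, concurrency: DEFAULT_CONCURRENCY }
      : { href: entry.href, concurrency: entry.concurrency || DEFAULT_CONCURRENCY };
  }
  return normalized;
}

const workers = normalizeWorkerMap({
  "backfill": { "href": "http://localhost:8080/jobs/backfill", "concurrency": 10 },
  "echo": "http://localhost:8081/"
});
console.log(workers.echo.href, workers.echo.concurrency);
```

After normalization, dispatch code can read href and concurrency from every entry without checking its type.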
The following actions need to occur on a regular basis:
* ... pending and Harrison's local queue
* ... error

When a task completes (the HTTP request ends), some of the following actions should be taken:
* ... failed and reschedule into reservoir as appropriate
* ... pending tasks and tasks:<id> as appropriate

The concurrency of the following operations is configurable:
Harrison has the following dependencies:
Follow the instructions for hashlib to add it to Node's require.path.
$ NODE_PATH=lib expresso
$ NODE_PATH=../../fictorial/redis-node-client/lib/:lib/ node lib/harrison.js