How to use a database server for distributed job scheduling?

I have around 100 computers and a few workers on each of them. They already connect to a central database to query for job parameters.

Now I have to do job scheduling for them. One job for one worker takes a few minutes, doesn't require a network connection (except for handing out jobs and reporting) and can be done at any time in any order.

Constraints:

  • No job may be taken or done twice.
  • There must be a timeout in case the worker dies.

I thought that I could use a separate table in the DB to schedule jobs. How should I create and access such a job scheduling table?

Best answer

Break it down into pieces:

You have a job description, which may have some constraints on where it's performed, and you have a queue of jobs to be performed. Broadly speaking, if there are no other constraints you'd expect the jobs to be done in order, i.e. you take them from the front of the queue and add them to the end.

If we run the queue as a single table then jobs will have 3 states:

  • Not Started
  • In Progress
  • Completed

So looking for work to do is simply a matter of finding the first job (or the first n jobs if work is to be assigned in a batch) that is not started and then flagging it as started. If the constraints are more complex than simply the first n available jobs then it becomes the first n available jobs that meet the constraints, but it should still be fairly simple.

That implies the following fields in the queue table:

  • Status
  • DateQueued (date and time) for sort
  • DateStarted (date and time) for timeout
  • AssignedTo

One should probably add a DateCompleted, but if the work is batched that's not going to be strictly accurate (it'll be the time the work was reported as complete).

So for a worker (the worker "app") the process becomes:

  1. Connect to server
  2. Report completed work - set status and completion time
  3. Request new work
    1. Search for new work for worker (first n jobs not started that worker can do)
    2. Assign new work to worker (set Status, DateStarted and AssignedTo) - search and assign as a single transaction.
  4. List work and disconnect
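The "search and assign as a transaction" step can be sketched roughly as follows. This is only an illustration, not a definitive implementation: it assumes Python with the standard `sqlite3` module standing in for the central database, a hypothetical table name `job_queue`, made-up integer status codes, and hypothetical helper names (`open_demo_queue`, `claim_jobs`, `report_completed`). On a server database you would more likely lock the candidate rows with something like `SELECT ... FOR UPDATE` instead of SQLite's `BEGIN IMMEDIATE`.

```python
import sqlite3

# Assumed status codes; the answer only names the three states.
NOT_STARTED, IN_PROGRESS, COMPLETED = 0, 1, 2


def open_demo_queue():
    """Create an in-memory stand-in for the central database (assumption)."""
    # isolation_level=None: the module opens no implicit transactions,
    # so the BEGIN/COMMIT statements below are the only ones.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        """CREATE TABLE job_queue (
               ID INTEGER PRIMARY KEY,
               JobID INTEGER NOT NULL,
               StatusID INTEGER NOT NULL DEFAULT 0,
               DateQueued TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               DateStarted TEXT,
               AssignedToID INTEGER,
               DateCompleted TEXT)"""
    )
    return conn


def claim_jobs(conn, worker_id, n=1):
    """Search for and assign up to n unstarted jobs in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before searching
    try:
        rows = conn.execute(
            "SELECT ID FROM job_queue WHERE StatusID = ? "
            "ORDER BY DateQueued, ID LIMIT ?",
            (NOT_STARTED, n),
        ).fetchall()
        ids = [r[0] for r in rows]
        conn.executemany(
            "UPDATE job_queue SET StatusID = ?, "
            "DateStarted = CURRENT_TIMESTAMP, AssignedToID = ? WHERE ID = ?",
            [(IN_PROGRESS, worker_id, job) for job in ids],
        )
        conn.execute("COMMIT")
        return ids
    except Exception:
        conn.execute("ROLLBACK")
        raise


def report_completed(conn, worker_id, ids):
    """Mark jobs as completed, but only if they are still assigned to this worker."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany(
            "UPDATE job_queue SET StatusID = ?, DateCompleted = CURRENT_TIMESTAMP "
            "WHERE ID = ? AND AssignedToID = ? AND StatusID = ?",
            [(COMPLETED, job, worker_id, IN_PROGRESS) for job in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the search and the update happen inside one write transaction, two workers calling `claim_jobs` cannot be handed the same row. The extra `AssignedToID` check in `report_completed` is a design choice of this sketch: a worker whose job was timed out and reassigned can no longer mark it complete.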

Separately you'd need processes to queue work, to look for jobs that have "timed out" so that their status can be reset, and to archive or otherwise clear out completed jobs from the queue.
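As a rough sketch of the "timed out" sweep, something like the following could run periodically. It assumes Python with the standard `sqlite3` module as a stand-in for the central database, the same hypothetical `job_queue` table and status codes (0 = Not Started, 1 = In Progress), and an arbitrary 30-minute default timeout; none of these come from the question.

```python
import sqlite3


def open_demo_queue():
    """Create an in-memory stand-in for the central database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE job_queue (
               ID INTEGER PRIMARY KEY,
               JobID INTEGER NOT NULL,
               StatusID INTEGER NOT NULL DEFAULT 0,
               DateQueued TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               DateStarted TEXT,
               AssignedToID INTEGER,
               DateCompleted TEXT)"""
    )
    return conn


def requeue_timed_out(conn, timeout_minutes=30):
    """Reset jobs that have been In Progress for longer than the timeout.

    Returns the number of jobs put back into the Not Started state.
    """
    cur = conn.execute(
        "UPDATE job_queue "
        "SET StatusID = 0, DateStarted = NULL, AssignedToID = NULL "
        "WHERE StatusID = 1 AND DateStarted < datetime('now', ?)",
        (f"-{timeout_minutes} minutes",),
    )
    conn.commit()
    return cur.rowcount
```

The timeout should be comfortably longer than the slowest legitimate job, otherwise a slow but healthy worker will have its job handed to someone else.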

The full table would have the following fields, plus any audit fields required.

  • ID
  • JobID -- Assuming that jobs are defined elsewhere
  • StatusID
  • DateQueued
  • DateStarted
  • AssignedToID
  • DateCompleted
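For illustration, the field list above might translate into a table definition along these lines. This is a sketch under assumptions: Python's standard `sqlite3` module stands in for the central database, and the table name `job_queue`, the column types, the integer status codes, the index, and the helper names are all hypothetical choices that would need adapting to the real server.

```python
import sqlite3

# Hypothetical schema: column names follow the field list above,
# while types, defaults and status codes are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_queue (
    ID            INTEGER PRIMARY KEY,
    JobID         INTEGER NOT NULL,            -- jobs are defined elsewhere
    StatusID      INTEGER NOT NULL DEFAULT 0,  -- 0 Not Started, 1 In Progress, 2 Completed
    DateQueued    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    DateStarted   TEXT,
    AssignedToID  INTEGER,
    DateCompleted TEXT
);
-- Supports the "first n jobs not started, oldest first" search.
CREATE INDEX IF NOT EXISTS idx_job_queue_status_queued
    ON job_queue (StatusID, DateQueued);
"""


def create_queue(conn):
    """Create the queue table and its index if they do not exist yet."""
    conn.executescript(SCHEMA)


def enqueue(conn, job_id):
    """Add a job to the end of the queue and return the new queue row ID."""
    cur = conn.execute("INSERT INTO job_queue (JobID) VALUES (?)", (job_id,))
    conn.commit()
    return cur.lastrowid
```

A new row starts out as Not Started with no worker assigned, so `enqueue(conn, 42)` is all the queuing process needs to do.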

Hope that helps...

Other answers

The interesting part, and the part where all of the difficulties lie, is in wrapping things up in transactions.

You'll need two tables: a table of available work, and a table recording work that is in progress. The "work in progress" table has a unique foreign key to the work-available table.

A process wishing to do work first locates a row from the table of work to be done. This should be done using a random sort order, in order to reduce contention.

That process then starts a transaction.

That process then creates a row in the "work in progress" table, with a foreign key referencing the work that is being done. It should then do the work. As a part of doing that work, it should change the state of the item being worked on (e.g., making it "finished" and no longer available to be worked on).

That process then removes the "work in progress" row. It was never meant to persist outside the transaction. It's only for locking.

The process then commits its transaction.

If some other process has grabbed the work, then this process's transaction will fail due to its attempt to commit a duplicate foreign key to the "work in progress" table. In that case, the process should back off for a short, random interval, and go back to the start, trying to locate some work to do.
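A simplified sketch of this flow, assuming Python with the standard `sqlite3` module as a stand-in database. The table names `work_available` and `work_in_progress`, the `state` column, and the helper names are hypothetical. How a conflicting insert behaves under real concurrency (an immediate error, or blocking until the other transaction ends) depends on the database and its isolation level, so treat the error handling here as illustrative only.

```python
import random
import sqlite3
import time


def open_demo_db():
    """Create an in-memory stand-in with the two tables (assumed names)."""
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE work_available (
            id    INTEGER PRIMARY KEY,
            state TEXT NOT NULL DEFAULT 'available'
        );
        CREATE TABLE work_in_progress (
            work_id INTEGER NOT NULL UNIQUE REFERENCES work_available(id)
        );
        """
    )
    return conn


def process_one(conn, do_work):
    """Try to do one item. Returns 'done', 'empty' or 'conflict'."""
    # Pick a random available item to reduce contention between workers.
    row = conn.execute(
        "SELECT id FROM work_available WHERE state = 'available' "
        "ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    if row is None:
        return "empty"
    work_id = row[0]

    conn.execute("BEGIN")
    try:
        # The UNIQUE constraint rejects a second claim on the same item.
        conn.execute(
            "INSERT INTO work_in_progress (work_id) VALUES (?)", (work_id,)
        )
        do_work(work_id)
        conn.execute(
            "UPDATE work_available SET state = 'finished' WHERE id = ?",
            (work_id,),
        )
        # The lock row is only meant to live inside the transaction.
        conn.execute(
            "DELETE FROM work_in_progress WHERE work_id = ?", (work_id,)
        )
        conn.execute("COMMIT")
        return "done"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        # Back off for a short, random interval before the caller retries.
        time.sleep(random.uniform(0.01, 0.05))
        return "conflict"
```

A worker would call `process_one` in a loop, retrying on `'conflict'` and sleeping or exiting on `'empty'`. Note that the work itself runs inside the open transaction, so this pattern holds a transaction open for the full duration of the job.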

Monitor the "work in progress" table carefully. Some databases, or some versions of some databases, don't expect a table such as this work-in-progress table to be used as a queue, with rows constantly being created and deleted. Specifically, older versions of PostgreSQL had difficulty cleaning up the old, no longer used rows, causing table bloat and poor performance.




