One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters, then POSTs to that URL when the task queue is ready to execute the task.
This structure means that tasks always execute against the most current version of the code. My Gearman workers, by contrast, all run code inside my Django project -- so when I push a new version live, I have to kill off the old workers and start new ones so that they pick up the current code.
My goal is for the task queue to be independent of the code base, so that I can push a new version live without restarting any workers. So I got to thinking: why not make tasks executable by URL, just like the Google App Engine task queue?
The process would work like this:
- A user request comes in and triggers a few tasks that shouldn't block the response.
- Each task has a unique URL, so I enqueue a Gearman job to POST to that URL.
- The Gearman server finds a worker and passes it the URL and POST data.
- The worker simply POSTs to the URL with the data, thus executing the task (a sketch of both halves follows this list).
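For concreteness, here's a minimal sketch of what I have in mind, assuming the python-gearman client library and requests for the HTTP call. The task name `post_url`, the server address, and the JSON envelope are all placeholders of my own:

```python
import json

import gearman   # python-gearman client library (assumed)
import requests  # assumed available for the HTTP POST

GEARMAN_SERVERS = ['localhost:4730']  # hypothetical server address

# --- Enqueue side (runs inside the Django request) ---

def enqueue_task(url, params):
    """Queue a background job that will POST `params` to `url`."""
    client = gearman.GearmanClient(GEARMAN_SERVERS)
    payload = json.dumps({'url': url, 'params': params})
    # background=True returns immediately instead of blocking the user request
    client.submit_job('post_url', payload, background=True)

# --- Worker side (a standalone process, independent of the code base) ---

def post_url(gearman_worker, gearman_job):
    """Execute a task by POSTing its parameters to its URL."""
    job = json.loads(gearman_job.data)
    # Request signing omitted here; see the signing sketch further down.
    # The 10-second timeout matches the assumed cap on task run time.
    resp = requests.post(job['url'], data=job['params'], timeout=10)
    resp.raise_for_status()
    return str(resp.status_code)

if __name__ == '__main__':
    worker = gearman.GearmanWorker(GEARMAN_SERVERS)
    worker.register_task('post_url', post_url)
    worker.work()  # blocks, pulling jobs forever
```

Because the worker knows nothing but "POST this data to this URL", deploying new application code never requires touching it.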
Assume the following:
- Each request from a Gearman worker is signed somehow, so that we know it's coming from a Gearman server and not a malicious client (see the signing sketch after this list).
- Tasks are limited to under 10 seconds of run time (there would be no long tasks that could time out).
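For the signing, something as simple as a shared-secret HMAC over the POST body would probably do. A rough sketch; the secret and the header name are placeholders:

```python
import hashlib
import hmac

SHARED_SECRET = b'keep-this-out-of-source-control'  # hypothetical shared secret

def sign(body):
    """Hex HMAC-SHA256 of the POST body, sent as e.g. an X-Task-Signature header."""
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()

def verify(body, signature):
    """Constant-time comparison on the receiving (Django) side."""
    return hmac.compare_digest(sign(body), signature)
```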
What are the potential pitfalls of such an approach? Here's one that worries me:
- The server can potentially get hammered with many requests all at once, all triggered by a single upstream request: one user request might fan out into 10 concurrent HTTP requests. I suppose I could run a single worker that sleeps before every request to rate-limit, as in the sketch below.
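Something like this naive throttle is what I'm picturing inside the worker; the interval is an arbitrary placeholder:

```python
import time

MIN_INTERVAL = 0.5  # hypothetical floor: at most ~2 POSTs per second per worker
_last_post = [0.0]

def throttle():
    """Sleep just long enough to keep POSTs at least MIN_INTERVAL apart."""
    wait = MIN_INTERVAL - (time.time() - _last_post[0])
    if wait > 0:
        time.sleep(wait)
    _last_post[0] = time.time()
```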
Any thoughts?