Merge lp://qastaging/~jamesj/openobject-server/complex-cron-fix into lp://qastaging/openobject-server/6.1

Proposed by James Jesudason
Status: Work in progress
Proposed branch: lp://qastaging/~jamesj/openobject-server/complex-cron-fix
Merge into: lp://qastaging/openobject-server/6.1
Diff against target: 92 lines (+37/-8)
2 files modified
gunicorn.conf.py (+3/-0)
openerp/cron.py (+34/-8)
To merge this branch: bzr merge lp://qastaging/~jamesj/openobject-server/complex-cron-fix
Reviewer Review Type Date Requested Status
OpenERP Core Team Pending
Review via email: mp+96024@code.qastaging.launchpad.net

Commit message

[FIX] Make the cron master thread gunicorn-friendly by using the multiprocessing Process module instead of threading, and Queue instead of heapq. This allows the thread to work in a multi-processor and multi-threaded environment.

Description of the change

[FIX] Make the cron master thread gunicorn-friendly by using the multiprocessing Process module instead of threading, and Queue instead of heapq. This allows the thread to work in a multi-processor and multi-threaded environment.

To post a comment you must log in.
Revision history for this message
Vo Minh Thu (thu) wrote :

Hi,

Thanks for proposing changes as code!

Unfortunately, the patch doesn't seem right to me.

Leaving aside the fact things as still named 'threads' in various places (including in the command line options and in the docstrings), there are truly implementation details that are thread-specific that you have not removed/changed:
  - the possibility to cancel wakeups (and I guess that once a task is pushed to the multiprocessing.Queue, it doesn't make sense to alter its cancel attribute) and the related variables and locks
  - the old runner_body that you have replaced is still there

But those are details; the biggest problem is that the main change (i.e. using a process-friendly Queue) has no purpose here:

You use a master cron process, instead of a master cron thread. But you still spawn worker *threads*. The difference is those will run inside the master *cron* process instead of the *main*
application process. Thus there is no need for a process-friendly Queue: the worker threads can still communicate with the main cron thread (the main thread of the cron process).

This leaves us with the question of starting the main cron thread in its own process or not (i.e. around the line 80s in the patch). I am not against that choice but actually don't see any benefits doing so.

The real problem with the current implementation (and still present in your patch) is that the cron system is not aware of changes that happen in database because of other processes. Say a worker process creates a new task, in the current implementation, the wakeup structure is updateed and shared with the master thread. If that communication is broken, the master thread will never spawn a new worker. A possibility to solve the problem is to poll the database to see if anything has change. This seems a very good solution when the number of database to monitor is limited.

Revision history for this message
James Jesudason (jamesj) wrote :

Hi,

Thanks for the feedback. I know that there are redundant code and comments in this proposal, but I wanted to get approval for the approach before chopping out large slices of the existing code. I can clean that up once we have an agreed approach.

> the biggest problem is that the main change (i.e. using a process-friendly Queue) has no purpose here:
On the surface, this appears to be true. However, I did a *lot* of experimenting before coming to this solution. There is a problem with using heapq as it depends on the queue list '_wakeups'. The runner code checks that the '_wakeups' list is not empty before getting the next queued item. However, the problem is that the '_wakeups' list is *always* empty when viewed from the runner. I think that the reason for this is because the queue push is happening in a different process to the queue pull. The multiprocessing.Queue was used because that was the only method that worked. I encourage you to try using heapq instead of multiprocessing.Queue, it may be easier to understand the choice when you do.

> This leaves us with the question of starting the main cron thread in its own process or not (i.e. around the line 80s in the patch). I am not against that choice but actually don't see any benefits doing so.
Again, this came from experimenting - starting using a thread always found that '_wakeups' (or _queue) was empty. I opted for multiprocessing.Process because that was the solution that worked.

>If that communication is broken, the master thread will never spawn a new worker. A possibility to solve the problem
> is to poll the database to see if anything has change. This seems a very good solution when the number of database
> to monitor is limited.
Not sure that I understand this comment properly. In the other (simpler) branch that I submitted, the master thread kept a list of databases and just checked them every 60s (no wakeups involved). Do you mean doing something like that if the last wakeup for a database was greater than MAX_SLEEP?

Thanks.

Revision history for this message
Vo Minh Thu (thu) wrote :

> On the surface, this appears to be true. However, I did a *lot* of experimenting before coming to this solution. There is a problem with using heapq as it depends on the queue list '_wakeups'. The runner code checks that the '_wakeups' list is not empty before getting the next queued item. However, the problem is that the '_wakeups' list is *always* empty when viewed from the runner. I think that the reason for this is because the queue push is happening in a different process to the queue pull. The multiprocessing.Queue was used because that was the only method that worked. I encourage you to try using heapq instead of multiprocessing.Queue, it may be easier to understand the choice when you do.

Of course you're right. From the perspective of the cron worker processes, the multiprocessing.Queue has no importance. But I completely forgot about the gunicorn wroker processes doing the pushes to the queue.

This leaves the problem of the `cancel` functionality being lost. Which is not a big deal to me. I just have to check removing it won't cause any problem (beside hitting the database for a canceled task) but I don't think it would.

Thanks!

Revision history for this message
James Jesudason (jamesj) wrote :

I've discovered an issue with this proposal when deploying to another test server. The 'runner' method calls registry['ir.cron']._run_jobs_multithread() to check/run the scheduled tasks, and '_run_jobs_multithread' checks that there are some available threads before processing an ir.cron. I found that '_run_jobs_multithread' was always getting None for the number of threads available.

The solution should be straightforward: the _threads_slots variable should be initialised in 'runner' before going into the loop.

Unmerged revisions

4090. By James <jjesudason@ubuntu-server>

[FIX] Make the cron master thread gunicorn-friendly by using the multiprocessing Process module instead of threading, and Queue instead of heapq. This allows the thread to work in a multi-processor and multi-threaded environment.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.