Mir

Merge lp://qastaging/~alan-griffiths/mir/keep-helgrind-happy into lp://qastaging/mir

keep-helgrind-happy
Merge into development-branch

Proposed by Alan Griffiths on 2014-02-20

Status:

Rejected

Rejected by:

Alan Griffiths on 2014-02-24

Proposed branch:

lp://qastaging/~alan-griffiths/mir/keep-helgrind-happy

Merge into:

lp://qastaging/mir

Diff against target:

13 lines (+3/-1)

1 file modified

tests/integration-tests/compositor/test_swapping_swappers.cpp (+3/-1)

To merge this branch:

bzr merge lp://qastaging/~alan-griffiths/mir/keep-helgrind-happy

Low

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
Daniel van Vugt			Disapprove on 2014-02-21
Kevin DuBois (community)			Approve on 2014-02-21
Alberto Aguirre (community)		2014-02-20	Approve on 2014-02-20
PS Jenkins bot (community)	continuous-integration		Approve on 2014-02-20
Review via email: mp+207389@code.qastaging.launchpad.net

Commit message

tests: frig code to prevent helgrind seeing a race condition (LP: #1282886)

Description of the change

tests: frig code to prevent helgrind seeing a race condition

After some investigation it became apparent that helgrind is seeing some read access of the address of mutex from the callback *before* the function is re-entered and a new mutex created on the same spot. I don't think the logic is actually wrong (just that helgrind hasn't got a way to sequence the destruction and re-creation of the mutex with the compositing thread).

As we don't need a new mutex for each call I've made it persistent.

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-20:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/mir-team-mir-development-branch-ci/905/rebuild

review: Approve (continuous-integration)

Revision history for this message

Alberto Aguirre (albaguirre) on 2014-02-20:

review: Approve

Revision history for this message

Kevin DuBois (kdub) wrote on 2014-02-21:

okay

review: Approve

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-02-21:

Helgrind uses the Eracer algorithm, which is pretty simple and even obvious if you think about it for a bit;
http://www.cs.duke.edu/courses/cps210/spring06/papers/eraser.pdf
As such it's very easy to trust the correctness of Helgrind, providing it has implemented hooks for all the lock/unlock functions you may depend on.

I'm not sure, but it appears the problem is that mutex is not guaranteed to be 1:1 to the switching_bundle which affects cv.

The basic rule for condition variables is that if you're not protecting any data/objects in the associated mutex then you're using the condition variable in a racy and unsafe way. This is why both C++ and pthreads always require a condition have an associated lock. Because failure to use that lock properly will make you race. I forget the exact reasoning, but it appears the solution would be to tie mutex to the switching bundle more closely. Even lock it before entering the function.

Using a static might be a safe solution, not sure, but it's definitely inefficient forcing all threads to share the same mutex. So not ideal. When/if I can commit more thought to the problem, I'm sure it would be possible to solve more correctly without "static".

review: Abstain

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-02-21:

I think I found the real issue. The mutex obviously needs to live longer than the lambda that's locking/unlocking it. But that's not guaranteed because the lambda runs in a different thread. Using static provides the guarantee, but it's a bit of a hack. I'll try to propose a more correct solution.

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-02-21:

I feel static is a hack (but is a solution too). Also, this proposal doesn't silence the races reported for std::atomic.

I've made a separate proposal that fixes all the races, for helgrind cleanliness:
https://code.launchpad.net/~vanvugt/mir/fix-1282886/+merge/207596