Merge lp://qastaging/~vanvugt/compiz-core/fix-880707.2 into lp://qastaging/compiz-core/0.9.5

Proposed by Daniel van Vugt
Status: Merged
Approved by: Sam Spilsbury
Approved revision: 2903
Merged at revision: 2915
Proposed branch: lp://qastaging/~vanvugt/compiz-core/fix-880707.2
Merge into: lp://qastaging/compiz-core/0.9.5
Diff against target: 408 lines (+89/-150)
3 files modified
plugins/composite/src/privates.h (+2/-7)
plugins/composite/src/screen.cpp (+63/-131)
plugins/opengl/src/screen.cpp (+24/-12)
To merge this branch: bzr merge lp://qastaging/~vanvugt/compiz-core/fix-880707.2
Reviewer Review Type Date Requested Status
Sam Spilsbury Approve
Compiz Maintainers Pending
Tim Penhey Pending
Review via email: mp+83472@code.qastaging.launchpad.net

This proposal supersedes a proposal from 2011-11-14.

Commit message

Fix inaccurate frame timing causing tearing and stuttering (LP: #880707) (LP: #92599) (LP: #798868) (LP: #876575) (LP: #755841) (LP: #891744)

Description of the change

Introduced "tickless" frame timing to the composite plugin. This means we no longer need to poll for repaints. Composite only wakes up when a repaint is required and it's the right time to do so. This not only improves vsync accuracy, but also reduces CPU usage and power consumption.

To post a comment you must log in.
Revision history for this message
Tim Penhey (thumper) wrote : Posted in a previous version of this proposal

> bool scheduled, painting, reschedule;

Many coding standards say only one variable declaration per line. I'm not entirely sure what the compiz-core one says.

Also, please prefer initializer lists to setting variables in the constructor.

+ scheduled = false;
+ painting = false;
+ reschedule = false;

These should all be in the initializer list.

review: Needs Fixing
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

I disagree...

That file already contains multiple variables per line. So I am not violating the existing standard.

And I tried using initializer lists, but gcc gave warnings about variable initialization order. Those =false lines were the only way I could make the warnings go away.

Revision history for this message
Sam Spilsbury (smspillaz) wrote : Posted in a previous version of this proposal

222 + /*
223 + * Note the use of delay = 1 instead of 0, even though 0 would be better.
224 + * A delay of zero is presently broken due to CompTimer bugs;
225 + * 1. Infinite loop in CompTimeoutSource::callback when a zero
226 + * timer is set.
227 + * 2. Priority set too high in CompTimeoutSource::CompTimeoutSource
228 + * causing the glib main event loop to be starved of X events.'

Just use the delay of zero and mark the other branches as prerequisites to this one :)

Revision history for this message
Sam Spilsbury (smspillaz) wrote : Posted in a previous version of this proposal

Oh, it might also be good to get this code into unit tests, since regressions that could be introduced on frame timing would end up very difficult to debug down the road.

Feel free to send me an email about it, and I'll work on it next week. Thanks for all this, its excellent :)

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

> Just use the delay of zero and mark the other branches as prerequisites to
> this one :)

I tested a delay of zero. While it made a huge difference in the old composite code (and exposed the CompTimer bugs), using 0 instead of 1 makes no perceivable performance difference in this new version (tested with fixes for CompTimer applied). So maybe my comment is a bit wrong and should not say "even though 0 would be better". In fact, in theory, a delay of zero would increase CPU usage. And since it provides no perceivable benefit any more, I don't think we should aim to change it to zero.

Also, creating dependencies on other branches when you don't need to is not ideal. I would prefer to keep things simple as they are...

Revision history for this message
Tim Penhey (thumper) wrote : Posted in a previous version of this proposal

The reason you get warnings in the initializer list is that the member variables are initialized in the order that they are declared in the class. The initializer list expects that the order in the list is the same as the order that they are defined. If you don't, it is possible that a developer would assume that something earlier in the initializer list was initialized when in fact it wasn't.

Sam, what does the compiz coding standards say about initializer lists (if anything)?

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

Awesome. Testing with NVIDIA-280, this fix appears to solve bug 92599 and bug 888039.

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

Found a problem when testing with "nouveau". Nouveau advertises support for GLX_SGI_video_sync, however it is quietly disabled by default and returns immediately. This causes the code in this proposal to attempt to run unthrottled at 1000 FPS, which obviously is a little laggy.

The workaround for nouveau is in /etc/X11/xorg.conf:
Section "Device"
 Identifier "My Graphics"
 Option "GLXVBlank" "on"
EndSection

However, I will have to build a workaround into this proposal so that users don't need to edit xorg.conf.

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Posted in a previous version of this proposal

Or - query upstream as to why it is disabled? Maybe it's something we
can safely distropatch, they already have it in the pipes?

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

I will research the issue with nouveau separately. However it's now clear that we need to occasionally expect graphics drivers to be a little broken. Compiz should be able to handle it without becoming unusable. I will look at adding detection for such driver problems into compiz (opengl plugin?) today.

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

Interestingly, I only realized today that the fix for bug 763005 is a prerequisite for this proposal. It probably wouldn't work otherwise. Lucky the fix for 763005 is already committed...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This proposal also appears to solve LP: #798868 and LP: #876575.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

This will be merged as soon as the testing branch in https://code.launchpad.net/~smspillaz/compiz-core/compiz-core.fix_880707.2.test/+merge/82961 is complete

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

(I also don't like working around driver bugs inside of the opengl plugin since these can break if the driver developers change something. If you need to add a workaround, add it to the workarounds plugin by replacing the GL::getVideoSync function pointer with your work-around wrapper)

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm generalizing the "driver bug workaround" today because similar code is needed for all drivers in some circumstances.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Done. The changes to plugins/opengl/src/screen.cpp are no longer just for nouveau, but now important for all users.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Major improvement today. Reduced tearing enough to also resolve bug 755841.

Revision history for this message
Omer Akram (om26er) wrote :

Daniel, is there a specific reason this change won't make into Precise? I have noticed real positive improvements on on Intel(smooth minimize animations), nvidia-current (no tearing) and nouvea (no tearing)

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Omer, you misinterpreted my comment in bug 880707. What I meant was that this fix is so complex that it won't go into any oneiric updates. But getting it into precise is my absolute top priority.

I agree; the number of improvements this branch brings is much larger than I ever envisioned. I never even noticed the stuttering bugs until I saw they were fixed by this code. It was a nice surprise.

We do absolutely want everyone to have these improvements by the time precise is released. :)

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

I'm going to have another look at how we can do automated testing from this branch again tomorrow.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I already did automate testing... using an army of testers now subscribed to ppa:vanvugt/compiz ;)

OK, maybe you meant a different kind of automated...

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

Indeed :)

I've started updating lp:~smspillaz/compiz-core/fix-880707.2.test with the stuff we discussed there earlier. Namely testing for phase (eg, that it doesn't take > 4ms to get from a vblank begin fence to finishing a swapbuffers / paint) and also testing that we don't fall out of phase when we are throttling the framerate with the composite plugin.

I don't really think its possible to be testing for an exact fix to bug 880707 without instantiating a separate opengl context and running out constructs in there, but doing that is going to be far too much effort than its worth really.

Please let me know of any other parts of the composite plugin's code that you think can be tested.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :
Revision history for this message
Sam Spilsbury (smspillaz) wrote :

(and thank you for setting up the PPA)

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Comments in the other branch.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

More work may be required on this proposal. Some anecdotal evidence from a minority of testers, and a brief test on a slow netbook, both suggest that this proposal can make performance worse. I will need to look into this.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

Sure. I'll keep working on testcases for the current stuff we've got and rebase on any other work you come up with when that happens.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Fixed the slow-motion animations problem in r2900.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

I agree with fixing the plugins for what its worth.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

Hey Daniel,

I had a think about the problem that you were encountering with the plugins. As you said earlier, the plugins are expecting a situation where msSinceLastPaint is going to be reflective of optimalRedrawTime, but with the tickless algorithm, msSinceLastPaint could be quite large. On slower systems, if it took > optimalRedrawTime to actually process a paint, then forcing msSinceLastPaint to be optimalRedrawTime would result in slow animations since frame skipping would not work correctly.

Since the only case we really wanted to cover was the case where the screen went from idle to active, what do you think about doing something like this:

    struct timeval tv;
    int timeDiff;

    gettimeofday (&tv, 0);

    timeDiff = TIMEVALDIFF (&tv, &mLastRedraw);

    /* handle clock rollback */

    if (timeDiff < 0)
 timeDiff = 0;

    /*
     * Now that we use a "tickless" timing algorithm, timeDiff could be
     * very large if the screen is truely idle.
     * However plugins expect the old behaviour where timeDiff is never
     * larger than the frame rate (optimalRedrawTime).
     * So enforce this to keep animations timed correctly and smooth...
     */

    if (timeDiff > mOptimalRedrawTime && !reschedule)
 timeDiff = mOptimalRedrawTime;

    ...

At least on your branch, reschedule is going to be set every time a plugin requests a redraw in donePaint or addWindowDamage (which is the correct way to do it), so that means that in those cases, msSinceLastPaint is always going to be accurate. And where we have switched from inactive to active, msSinceLastPaint is always going to be mOptimalRedrawTime (in fact, it could be zero, but that might require further testing).

I've updated my changes to reflect this.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Your suggestion looks incorrect. priv->reschedule is always false there. I don't think it will work.

Best to use my r2900 below. Mine has been tested for a few days, whereas yours would appear to probably not work (or compile in that form :)

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

I forgot to mention - the check would have to be moved to before priv->reschedule has been set to false (I've roughly transposed the code from my branch to yours)

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

(It works quite well here, and I've tested it with all the plugins)

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

I think that checking if the timeDiff is more than 100 only really works around the original problem which I outlined earlier as well. There's a case where some really slow plugin could take more than 100 ms per frame to process some animation, which would result in perceived slowness in its animation code ;-)

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

"I forgot to mention - the check would have to be moved to before priv->reschedule has been set to false (I've roughly transposed the code from my branch to yours)"

That still makes no sense in my mind. But if you can reproduce the slow animations bug consistently, and have tested and verified your fix works then go nuts. If however this is another code change you're proposing without actually testing it, then please reconsider.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

On Thu, 22 Dec 2011, Daniel van Vugt wrote:

> "I forgot to mention - the check would have to be moved to before priv->reschedule has been set to false (I've roughly transposed the code from my branch to yours)"
>
> That still makes no sense in my mind. But if you can reproduce the slow animations bug consistently, and have tested and verified your fix works then go nuts. If however this is another code change you're proposing without actually testing it, then please reconsider.

I'm running it right now and haven't run into any problems whatsoever. I
can probably give it a try on some of my other systems as well, but I put
some cases in to force it to run slowly and it still works then.

> --
> https://code.launchpad.net/~vanvugt/compiz-core/fix-880707.2/+merge/83472
> You are requested to review the proposed merge of lp:~vanvugt/compiz-core/fix-880707.2 into lp:compiz-core.
>

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Yep, a simple sleep or something to make painting slow should allow you to reproduce the problem and then verify you're actually fixing it.

In my real-world example, I have an old netbook which takes between 2-4 frames to paint each frame (up to 64ms) during animations. So it's easy to reproduce "slow motion animations" on a machine that only achieves 15-30 FPS.

Oddly enough, as I have mentioned before, this "slow" machine gets a perfect 60 FPS with "Force full screen redraw (buffer swap) on repaint" enabled. But only when using this branch. The difference is staggering.

2901. By Daniel van Vugt

Merge with upstream changes (lp:compiz-core).

2902. By Daniel van Vugt

Merge upstream changes.

2903. By Daniel van Vugt

Merge from upstream.

Revision history for this message
Sam Spilsbury (smspillaz) wrote :

OK, I think there's been enough discussion here.

I'm confident with the changes this branch is introducing, and think we just need to land it now. I have created an entire testing framework for it, which can be found at lp:~smspillaz/compiz-core/compiz-core.fix_880707.2.test . That test tests for both phase and period timings. On a period of aggressive vblank scheduling for 70 seconds, that test passes each time on my machine.

The testing branch will need to land a little bit after the main one does, once Daniel is ok with it, but ideally it needs to be in before feature freeze.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches