Merge lp://qastaging/~epics-core/epics-base/fix-calc-bit-manipulation into lp://qastaging/~epics-core/epics-base/3.14

Proposed by Ralph Lange
Status: Merged
Approved by: Andrew Johnson
Approved revision: 12615
Merged at revision: 12611
Proposed branch: lp://qastaging/~epics-core/epics-base/fix-calc-bit-manipulation
Merge into: lp://qastaging/~epics-core/epics-base/3.14
Diff against target: 371 lines (+141/-53)
3 files modified
src/libCom/calc/calcPerform.c (+35/-22)
src/libCom/calc/postfix.c (+15/-12)
src/libCom/test/epicsCalcTest.cpp (+91/-19)
To merge this branch: bzr merge lp://qastaging/~epics-core/epics-base/fix-calc-bit-manipulation
Reviewer Review Type Date Requested Status
Ralph Lange Approve
Andrew Johnson Approve
Review via email: mp+286515@code.qastaging.launchpad.net

Description of the change

Fix for bug lp:1514520

- Pulls a backport of a 3.15 change to the calc engine, setting the internal integer format to the size-fixed epicsInt32 type.
- Adds tests that show the behavior described in the bug report.
- Applies the original patch by Dirk from that bug report.

To be able to apply this to 3.14, I had to backport the change to the sized-fixed integers. For sanity.

When applying to 3.15, that first commit has to be omitted.

One thing that I am not sure of: left shifts (zeroes filled in) are handled different than right shifts (ones filled in, assuming a signed integer operation).
The documentation (AppDevGuide) leaves it open.

To post a comment you must log in.
12614. By Andrew Johnson

Fixed conversion overflows in tests

Minor tidying-up, added comments about casting for bitwise operations.

Revision history for this message
Andrew Johnson (anj) wrote :

Added an epicsUInt32 utop variable to calcPerform() to reduce the number of casts, and added some code comments.

Previous versions have always handled shifts as signed quantities, so the sign bit is extended on right-shifts. I changed the left-shift code to match, although it doesn't actually make any difference.

The shift operators also need to have their shift count limited to avoid Undefined Behavior; I see 2 realistic possibilities: ANDing the shift count with 31; or return 0 on shift-overflow but when right-shifting a negative quantity return -1 on shift-overflow. The first is the fastest and simplest to code, but the second one might be regarded as be more logical. Any preferences or alternative suggestions? I have implemented the first option for now.

The large positive decimal values in your tests are causing the same overflows when converted from double to integer that Dirk originally reported, they always convert to 0x80000000 on my RHEL systems so some tests fail. The paragraph before Dirk's "The fix is:" line explains that you have to use negative values for bit 31 to be set, which my tests agree with. The negative versions that work are 0xaaaaaaaa = -1431655766, 0xffff0000 = -65536. Note that postfix() compiles double literals into a LITERAL_INT op-code if the value can be safely held in an epicsInt32, so adding ".0" has no effect; but using ".1" instead does work, the fraction gets truncated.

Also fixed some minor white-space issues.

review: Approve
Revision history for this message
Ralph Lange (ralph-lange) wrote :

> Added an epicsUInt32 utop variable to calcPerform() to reduce the number of
> casts, and added some code comments.

Looks good, thanks.

> Previous versions have always handled shifts as signed quantities, so the sign
> bit is extended on right-shifts.

Fine with me. I was just seeing the asymmetry.

> The shift operators also need to have their shift count limited to avoid
> Undefined Behavior; I see 2 realistic possibilities: ANDing the shift count
> with 31; or return 0 / -1 on shift-overflow [...]
> I have implemented the first option for now.

Fine with me. We don't have to fix the holes in the definition.

> The large positive decimal values in your tests are causing the same overflows
> when converted from double to integer that Dirk originally reported, they
> always convert to 0x80000000 on my RHEL systems so some tests fail.

Interesting. Is that a compiler or a size dependency? My Debian 64bit gcc5 machine was testing them fine.

Thanks a lot!

Revision history for this message
Andrew Johnson (anj) wrote :
Download full text (4.6 KiB)

> Is that a compiler or a size dependency? My Debian 64bit gcc5 machine was testing them fine.

Don't know, I was testing on RHEL 6.7 with both 64 and 32-bit builds. Let me boot my Ubuntu VM to see what that does...
No difference; I'm doing manual checks on an IOC with a single calc record loaded.

This is with an unfixed 3.14 IOC, 64-bit:
tux% caget -lx anj:calc.CALC anj:calc
anj:calc.CALC a:=2863311530.0; a >> 8
anj:calc 0xAAAAAA

This is a 3.15 IOC, 64-bit:
tux% caget -lx anj:calc.CALC anj:calc
anj:calc.CALC a:=2863311530.0; a >> 8
anj:calc 0xFFFFFFFFFF800000

Then with this branch, 64-bit:
tux% caget -lx anj:calc.CALC anj:calc
anj:calc.CALC a:=2863311530.0; a >> 8
anj:calc 0xFFFFFFFFFFAAAAAA

The difference between 3.14 and 3.15 is because the bit-wise expressions were changed from long to epicsInt32, which fixed the shift to make them sign-extend properly, but show the broken value conversions. The final result above is correct (the sign extension on the results above are being done by caget -lx on 64-bit). I can get the right result from 3.15 using the negative value instead:
tux% caget -lx anj:calc.CALC anj:calc
anj:calc.CALC a:=-1431655766.1; a >> 8
anj:calc 0xFFFFFFFFFFAAAAAA

I think I got a bit confused when I was working on this last week; the negative value is evidently not (and maybe shouldn't be) necessary with the proper fix, so maybe I didn't need to change those tests. However I have just rewritten the double tests to check that we do convert the arguments properly for every bit-wise operator instead of duplicating the earlier integer tests:

    // converting double values (add 0.1 to force as double)
    // 0xaaaaaaaa = -1431655766 or 2863311530u
    testUInt32Calc("-1431655766.1 OR 0", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 OR 0", 0xaaaaaaaau);
    testUInt32Calc("0 OR -1431655766.1", 0xaaaaaaaau);
    testUInt32Calc("0 OR 2863311530.1", 0xaaaaaaaau);
    testUInt32Calc("-1431655766.1 XOR 0", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 XOR 0", 0xaaaaaaaau);
    testUInt32Calc("0 XOR -1431655766.1", 0xaaaaaaaau);
    testUInt32Calc("0 XOR 2863311530.1", 0xaaaaaaaau);
    testUInt32Calc("-1431655766.1 AND 0xffffffff", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 AND 0xffffffff", 0xaaaaaaaau);
    testUInt32Calc("0xffffffff AND -1431655766.1", 0xaaaaaaaau);
    testUInt32Calc("0xffffffff AND 2863311530.1", 0xaaaaaaaau);
    testUInt32Calc("~ -1431655766.1", 0x55555555u);
    testUInt32Calc("~ 2863311530.1", 0x55555555u);
    testUInt32Calc("-1431655766.1 >> 0", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 >> 0", 0xaaaaaaaau);
    testUInt32Calc("-1431655766.1 >> 0.1", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 >> 0.1", 0xaaaaaaaau);
    testUInt32Calc("-1431655766.1 << 0", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 << 0", 0xaaaaaaaau);
    testUInt32Calc("-1431655766.1 << 0.1", 0xaaaaaaaau);
    testUInt32Calc("2863311530.1 << 0.1", 0xaaaaaaaau);

Hmm, I just copied the resulting epicsCalcTest.cpp file into an unmodified 3.14 branch, where it happily...

Read more...

review: Needs Information
Revision history for this message
Ralph Lange (ralph-lange) wrote :

That's a nice one, eh?! When working on it last week, I got knots in my brain.

First of all, I set a narrow scope: I decided that I wanted to fix the calc engine, not the calc(out) record. To achieve that, I based the main tests on setting fields to double values before doing the operation, as this gets as close as possible to what the calc record does.
The other tests (using literals and setting fields to integers) were added later when I saw different failure patterns between them and the main tests.

On plain 3.14, the operations completely depend on sizeof(long). On 64bit things work fine (as setting bit31 is *not* setting the sign bit), on 32bit things fail.

To get a more reproducible situation, I first backported the change in the calc engine that changes all operations to work on fixed 32bit size types. With that change, I had the tests consistently failing in a pattern that matched Dirk's report.

After applying Dirk's patch, all tests passed. That's the state I pushed to LaunchPad.

I never tested in a real IOC using Dirk's database.
Having a test producing the same failure pattern that Dirk saw using his database, and his fix passing all my tests and (as per Dirk's report) his database testing seemed good enough to me.

Revision history for this message
Ralph Lange (ralph-lange) wrote :

Note that your caget based tests add another conversion (CA server converting from the calc record's double VAL field to integer).

Too many chained conversions in one test, for my taste.

12615. By Andrew Johnson

Test each bitwise cast individually for overflow

Revision history for this message
Andrew Johnson (anj) wrote :

Ok, so I wasn't testing against 3.14 on 32-bit, my bad — the tests do indeed fail there, thanks for pointing that out.

I have committed my changes to the double value tests that I posted yesterday, they are designed to check each cast in the bit-wise operator implementation individually for not overflowing with either +ve or -ve double values that have bit 31 set.

I agree that my caget technique introduced another conversion, I was trying to allow for that in my understanding of what's actually going on. Your testUInt32Calc() routine does the same conversion though:
    uresult = (epicsUInt32) result;
We could avoid that completely by doing the test comparisons inside the expression being tested:
- testUInt32Calc("0xaaaaaaaa AND 0xffff0000", 0xaaaa0000u);
+ testCalc("(0xaaaaaaaa AND 0xffff0000) == 0xaaaa0000", 1);
However the returned result is less useful for diagnosing overflow problems, so I'm happy to keep what we have now.

If you're happy with this I will merge it.

review: Approve
Revision history for this message
Ralph Lange (ralph-lange) wrote :

I am happy with it!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches