[LLVMbugs] [Bug 19396] New: ARM code emitter misoptimization with 16-bit multiplication

Thu Apr 10 14:27:57 PDT 2014

http://llvm.org/bugs/show_bug.cgi?id=19396

            Bug ID: 19396
           Summary: ARM code emitter misoptimization with 16-bit
                    multiplication
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: ARM
          Assignee: unassignedbugs at nondot.org
          Reporter: craig at ni.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

The following code snippet (post opt -O2 optimization):

  %136 = getelementptr i8* %dsp, i32 844
  %137 = bitcast i8* %136 to i32**
  %138 = load i32** %137, align 4
  %139 = bitcast i32* %138 to i16*
  %140 = load i16* %139, align 2
  %vr22 = sext i16 %140 to i32
  %vr22.1 = mul i32 %vr22, 5000
  %141 = getelementptr i8* %dsp, i32 872
  %142 = bitcast i8* %141 to i16*
  %143 = trunc i32 %vr22.1 to i16
  store i16 %143, i16* %142, align 2
  %144 = getelementptr i8* %dsp, i32 876
  %sext = mul i32 %vr22, 327680000  ; 5000<<16
  %145 = ashr exact i32 %sext, 16
  %146 = bitcast i8* %144 to i32*
  store i32 %145, i32* %146, align 4

miscompiles on ARM with 'llc -mtriple=arm-pc-eabi -mcpu=cortex-a8
-mattr=-thumb2'
(even with -O0 added).

It generates the following:
        ldr     r0, [r4, #844]
        mov     r1, #904
        mov     r2, #59244544
        orr     r1, r1, #4096
        orr     r2, r2, #268435456
        ldrsh   r0, [r0]
        smulbb  r1, r0, r1
        smulwb  r0, r2, r0
        mov     r2, #872
        strh    r1, [r4, r2]
        str     r0, [r4, #876]

The problem is that the second multiply (smulwb) folds the ashr operation
into the multiply, but the two are not equivalent.  In the original IR the
32-bit multiply by 5000<<16 wraps first, so the ashr yields the 16-bit
product truncated to 16 bits and then sign-extended to 32.  Instead, smulwb
multiplies all 32 bits of r2 by the low 16 bits of r0 and writes the high
32 bits of the 48-bit intermediate result into r0.  This leaves the entire
32-bit result intact, without the truncation to 16 bits and sign-extension.
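
To make the mismatch concrete, here is a small Python sketch that models both
computations with plain integers.  The helper names (to_signed, ir_result,
smulwb_result) are made up for illustration, the smulwb model simply follows
the description above (bits 47..16 of the 48-bit product), and the input value
7 is an arbitrary choice for which 5000 * 7 no longer fits in 16 bits.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def ir_result(x):
    """Model of the IR: sext i16, mul i32 by 5000<<16 (wrapping), ashr 16."""
    vr22 = to_signed(x, 16)
    sext = to_signed(vr22 * 327680000, 32)   # the 32-bit mul wraps here
    return sext >> 16                        # arithmetic shift right

def smulwb_result(rn, rm):
    """Model of smulwb: 32 x 16 -> 48-bit product, keep bits 47..16."""
    product = to_signed(rn, 32) * to_signed(rm, 16)
    return to_signed(product >> 16, 32)      # no 32-bit wrap before the shift

x = 7
print(ir_result(x))                 # value the IR asks for
print(smulwb_result(327680000, x))  # value the folded instruction computes
```

For x = 7 the IR model gives -30536 (35000 wrapped to 16 bits and
sign-extended), while the smulwb model gives 35000.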

The culprit is this pattern in ARMInstrInfo.td:

     def : ARMV5TEPat<(sra (mul GPR:$a, sext_16_node:$b), (i32 16)),
                  (SMULWB GPR:$a, GPR:$b)>;
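
As a rough check of when this fold is safe, the Python sketch below compares
the two sides of the pattern for every 16-bit value of $b, with $a fixed to
the constant from this report.  This is my own integer model rather than
anything taken from the LLVM sources; the function names sra_of_mul and
smulwb are illustrative only.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def sra_of_mul(a, b):
    """Left side of the pattern: 32-bit wrapping mul, then sra by 16."""
    return to_signed(a * b, 32) >> 16

def smulwb(a, b):
    """Right side of the pattern: bits 47..16 of the full 48-bit product."""
    return to_signed((a * b) >> 16, 32)

A = 327680000  # 5000 << 16, the constant from the IR above

# Collect every signed 16-bit b for which both sides agree.
agreeing = [b for b in range(-32768, 32768) if sra_of_mul(A, b) == smulwb(A, b)]
print(agreeing)
```

In this model the two sides agree only for b from -6 to 6, that is, exactly
when the full product still fits in a signed 32-bit value.  That suggests the
pattern would need some guarantee that the multiply cannot overflow 32 bits.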

Commenting it out makes the bug go away, and correct code is emitted:
        ldr     r0, [r4, #844]
        mov     r2, #904
        orr     r2, r2, #4096
        ldrsh   r1, [r0]
        mov     r0, #59244544
        orr     r0, r0, #268435456
        mul     r0, r1, r0
        smulbb  r1, r1, r2
        mov     r2, #872
        strh    r1, [r4, r2]
        asr     r0, r0, #16
        str     r0, [r4, #876]
