[PATCH] D78439: [AVR] Fix miscompilation of zext + add

Sat Apr 18 23:25:36 PDT 2020

dylanmckay added a comment.

I'm not sure that this is a miscompile. Note that AVR has no "add with immediate" instruction, instead subsumed by the "subtract with immediate" instruction, which of course with immediate values being immediately known to the compiler, can be negated directly at the callsite to effectively form a "add with immediate" instruction.

In this case, with the two snippets

(trunk)

  foo:
      mov     r18, r20
      clr     r19
      add     r22, r18
      adc     r23, r19
      sbci    r24, 0
      sbci    r25, 0
      ret

(with patch)

  foo:
      mov     r18, r20
      clr     r19
      ldi     r20, 0
      ldi     r21, 0
      add     r22, r18
      adc     r23, r19
      adc     r24, r20
      adc     r25, r21
      ret

The first two instructions and the final `ret` are identical in both, let's remove.

(trunk)

  add     r22, r18
  adc     r23, r19
  sbci    r24, 0
  sbci    r25, 0

(with patch)

  ldi     r20, 0
  ldi     r21, 0
  add     r22, r18
  adc     r23, r19
  adc     r24, r20
  adc     r25, r21

With the patch, the LDIs and `r20` and `r21` aren't used until the last two ADCs, let's reorder

(with patch)

  add     r22, r18
  adc     r23, r19
  ldi     r20, 0
  ldi     r21, 0
  adc     r24, r20
  adc     r25, r21

This is equivalent assembly as before, and now the two snippets both have the exact same initial two `add+adc` instructions. Let's remove from both snippets, leaving

(trunk)

  sbci    r24, 0
  sbci    r25, 0

(with patch)

  ldi     r20, 0
  ldi     r21, 0
  adc     r24, r20
  adc     r25, r21

So the only possible difference between the two variants, is in the addition of last two bytes from the 32-bit zero extended `i1` value.

The remaining (trunk) asm just subtracts zero from the upper half of the non-extended 32-bit integer `%a` in registers `r24:r25`. This is the compiler inserting an addition of the immediate `0` (as `-0 = 0`).  This makes sense because the corresponding bits from the `i1` value have to be zero because it was zero-extended to 32-bits. A subtraction of zero is always redundant, therefore the two SBCIs from the first patch are completely redundant, and could be removed by the compiler in a future optimization.

The remaining (with patch) asm just adds zero to the upper half of the non-zero extended 32-bit integer `%a` in registers `r24:25` but with extra steps to load the zero value into a register.

The only two remaining patches are completely identical in terms of execution as far as I can tell.

In summary: I think this is not a miscompile, let me know if you still think otherwise.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78439/new/

https://reviews.llvm.org/D78439