[PATCH] D87771: [AArch64] Emit zext move when the source of the zext is AssertZext or AssertSext

Wed Sep 16 12:31:38 PDT 2020

efriedma added a subscriber: spop.
efriedma added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.h:425
+         Opc != ISD::CopyFromReg && Opc != ISD::AssertSext &&
+         Opc != ISD::AssertZext;
 }
----------------
AssertSext/AssertZext don't really indicate anything either way about how the value was defined; it would make more sense to check the operand, maybe?  That said, I guess the operand will usually be a CopyFromReg, so maybe it doesn't matter much.
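
To illustrate what "check the operand" could look like, here is a rough, self-contained sketch.  It is a toy model, not LLVM's SelectionDAG API: `Opcode`, `Node`, `stripAsserts` and `isDef32Sketch` are hypothetical names made up for this example, and only the opcodes visible in the hunk above (plus a generic `Add`) are modelled.

```cpp
// Toy stand-ins for a few ISD opcodes (hypothetical, not the real enum).
enum class Opcode { Add, CopyFromReg, AssertSext, AssertZext };

// Toy DAG node: an opcode plus its first operand (null if it has none).
struct Node {
  Opcode Opc;
  const Node *Operand;
};

// Look through any chain of AssertSext/AssertZext wrappers and return the
// node underneath. An assert node without an operand is returned as-is.
inline const Node &stripAsserts(const Node &N) {
  const Node *Cur = &N;
  while ((Cur->Opc == Opcode::AssertSext || Cur->Opc == Opcode::AssertZext) &&
         Cur->Operand)
    Cur = Cur->Operand;
  return *Cur;
}

// Decide based on the node underneath the asserts rather than on the assert
// itself. CopyFromReg stays conservative (returns false), as does an assert
// node that could not be looked through.
inline bool isDef32Sketch(const Node &N) {
  Opcode Opc = stripAsserts(N).Opc;
  return Opc != Opcode::CopyFromReg && Opc != Opcode::AssertSext &&
         Opc != Opcode::AssertZext;
}
```

With this shape, an AssertZext wrapped around a CopyFromReg is still treated conservatively, while an assert wrapped around some other node is judged by that node.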

More generally, I don't like the approach of guessing what isel will do, as opposed to examining what isel actually did.  Because of the way the dot product patterns are written, it's actually possible for an i32 EXTRACT_VECTOR_ELT to produce a value that isn't zero-extended (testcase follows).  And I'm not confident there aren't other weird edge cases.  I'd be happier doing this after isel, when we can tell which instruction actually produced the value in question.

```
define i64 @test_udot_v8i8(i8* nocapture readonly %a, i8* nocapture readonly %b) {
entry:
; CHECK-LABEL: test_udot_v8i8:
; CHECK:  udot {{v[0-9]+}}.2s, {{v[0-9]+}}.8b, {{v[0-9]+}}.8b
  %0 = bitcast i8* %a to <8 x i8>*
  %1 = load <8 x i8>, <8 x i8>* %0
  %2 = zext <8 x i8> %1 to <8 x i32>
  %3 = bitcast i8* %b to <8 x i8>*
  %4 = load <8 x i8>, <8 x i8>* %3
  %5 = zext <8 x i8> %4 to <8 x i32>
  %6 = mul nuw nsw <8 x i32> %5, %2
  %7 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> %6)
  %8 = zext i32 %7 to i64
  ret i64 %8
}
declare i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32>)
```

================
Comment at: llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll:1
+; RUN: llc -O2 -mtriple=aarch64-linux-gnu < %s | FileCheck %s
+
----------------
This testcase is way too complicated; can you extract out the bit that actually triggers this issue?

================
Comment at: llvm/test/CodeGen/AArch64/shift_minsize.ll:62
 ; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    // kill: def $w2 killed $w2 def $x2
+; CHECK-NEXT:    mov w2, w2
 ; CHECK-NEXT:    bl __ashlti3
----------------
This mov shouldn't be necessary, but the reason isn't really related to this patch, I guess.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D87771