[PATCH] D80716: [AArch64]: BFloat Load/Store Intrinsics&CodeGen

Tue Jun 2 08:47:03 PDT 2020

LukeGeeson added a subscriber: pratlucas.
LukeGeeson added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:10366
+    auto Alignment = CGM.getNaturalPointeeTypeAlignment(
+        E->getArg(0)->IgnoreParenCasts()->getType());
     Ops[0] = Builder.CreateBitCast(Ops[0], llvm::PointerType::getUnqual(VTy));
----------------
simon_tatham wrote:
> What effect is this change of strategy having on the alignment computation, for the already-supported instances of this builtin?
> 
> It //looks// to me as if `__builtin_neon_vld1_v` with (say) a `uint8_t *` pointer argument will now compute `Alignment=1` (the natural alignment of the pointee type), whereas it would previously have computed `Alignment=8` (the size of the whole vector being loaded or stored).
> 
> Is that intentional? Or accidental? Or have I completely misunderstood what's going on?
> 
> (Whichever of the three, some discussion in the commit message would be a good idea, explaining why this change does or does not make a difference, as appropriate.)
Clang was incorrectly assuming that all the pointers from which loads were being generated for vld1 intrinsics were aligned according to according to the intrinsics result type. This causes alignment faults on the code generated by the backend.
This fixes the issue so that alignment is based on the type of the pointer provided to as input to the intrinsic.

@pratlucas has done some work on this in parallel https://reviews.llvm.org/D79721 which has been approved and may overrule this particular line of code. 

I shall add a note to the commit message, and tentatively mark this as fixed, given it's liable to adopt the work of Lucas.

================
Comment at: clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:36
+// CHECK32: ret <4 x bfloat> %vld1_lane
+
+bfloat16x8_t test_vld1q_lane_bf16(bfloat16_t const *ptr, bfloat16x8_t src) {
----------------
labrinea wrote:
> CHECK-NEXT or CHECK-DAG are preferable for sequences.
Added to be consistent with the rest of the file (ie no CHECK-NEXT, but CHECK32/64)

================
Comment at: clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:180
+// %.fca.0.2.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.1.insert, <4 x bfloat> %vld3_lane.fca.2.extract, 0, 2
+
+bfloat16x8x3_t test_vld3q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x3_t src) {
----------------
labrinea wrote:
> where are the check lines?
Added to be consistent with the rest of the file

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80716/new/

https://reviews.llvm.org/D80716