[PATCH] D41863: [AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian

Tue Jan 16 02:03:33 PST 2018

olista01 added inline comments.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:736
+    AddPromotedToType(ISD::STORE, VT, MVT::v4i16);
   } else if (VT == MVT::v2f64 || VT == MVT::v4f32 || VT == MVT::v8f16) {
     setOperationAction(ISD::LOAD, VT, Promote);
----------------
pbarrio wrote:
> SjoerdMeijer wrote:
> > How about MVT::v8f16? Does this one needs to get a similar treatment?
> I could do the same for MVT::v8f16, but this would be an optimization rather than a bugfix, since LLVM generates correct code in this case:
> 
> ```
>         ld1     { v0.2d }, [x0]
>         rev64   v0.8h, v0.8h
> ```
> 
> which does the correct byte reversal after the load. For the MVT::v4f16 case, LLVM currently generates the following incorrect code:
> 
> ```
>         ld1     { v0.2s }, [x0]
>         rev64   v0.4h, v0.4h
> ```
> 
> I will give the optimization a try and upload an updated patch soon.
Why is this change needed for correctness of v4f16, but only an optimisation for v8f16? If there's a difference between the treatment of these two types elsewhere in the compiler, would it not be better to handle them consistently?

================
Comment at: lib/Target/AArch64/AArch64InstrInfo.td:5849
                              (v4f16 (REV64v4i16 FPR64:$src))>;
 def : Pat<(v4f16 (bitconvert (v2i32 FPR64:$src))),
                              (v4f16 (REV64v4i16 FPR64:$src))>;
----------------
I suspect that the problem is actually in these patterns - the pattern on line 5823 (v2i32 -> v4i16) used REV32, but this one (v2i32 -> v4f16) uses REV64. I would assume that these patterns should be the same regardless of whether the lanes are integers or floating-point.

https://reviews.llvm.org/D41863