[PATCH] D89162: [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation

Fri Oct 23 07:50:12 PDT 2020

paulwalker-arm added a comment.

In D89162#2347814 <https://reviews.llvm.org/D89162#2347814>, @cameron.mcinally wrote:

> The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

Are you sure?  I took your patch for a test drive and removed all but the `TLI.getOperationAction` related change from Legalize*.{cpp, h} and the tests passed.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16293
+  SDValue Rdx = DAG.getNode(AArch64ISD::FADDA_PRED, DL,
+                            getPackedSVEVectorVT(ResVT),
+                            Pg, AccOp, VecOp);
----------------
I think this can be ContainerVT?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16297
+  return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ResVT, Rdx,
+                     DAG.getConstant(0, DL, MVT::i64));
+}
----------------
You use this in a couple of places, so might be worth extracting into `Zero = ` ?

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll:65
+
+; NOTE: NEON expansions used to ensure ordering.
+  %op = load <32 x half>, <32 x half>* %a
----------------
What does this mean?

I was going to say you're missing VBITS_EQ_256 lines here but then noticed the output.  What's you're feeling on this?  I see two routes:

Stick with the expanded output for this patch, in which case just having `VBITS_EQ_256-COUNT-33: fadd` is good enough.

For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard.

`VECREDUCE_SEQ_FADD A, X_512 => VECREDUCE_SEQ_FADD (VECREDUCE_SEQ_FADD X, X_LO), X_HI` ?

I'm happy with either approach.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89162/new/

https://reviews.llvm.org/D89162