[PATCH] D77251: [llvm][CodeGen] Addressing modes for SVE ldN.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 22 08:14:27 PDT 2020
sdesmalen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:4731
/// EVT.
+template <unsigned NumVec>
static EVT getPackedVectorTypeFromPredicateType(LLVMContext &Ctx, EVT PredVT) {
----------------
fpetrogalli wrote:
> sdesmalen wrote:
> > Why is this a template argument instead of a regular argument with default = 1?
> It is a template because I don't foresee `NumVec` becoming a runtime value. I have used the same approach in other places, for example in `setInfoSVEStN`. I could set the default to 1, but I'd rather call it out explicitly at the template invocation.
>
> Your comment also made me realize I should make sure that the values of `NumVec` are architecturally valid, which is why I have added a static assert in the latest version of the patch.
This means the compiler has to generate a separate version of `getPackedVectorTypeFromPredicateType` for every value of `NumVec` used, because you force it to be a compile-time constant. It would be better to just pass it as a regular parameter.
One of the reasons this file contains various templated functions is so that you can target them from `ComplexPattern`s in TableGen by naming a specific variant of the function, e.g. `SelectSVEShiftImm64<0, 64>` for the `ComplexPattern` `SVEShiftImm64`; that does not apply here.
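For reference, a minimal sketch of the regular-parameter alternative (only the signature changes; the body stays as in the diff, and the `static_assert` becomes a regular `assert`):

```
static EVT getPackedVectorTypeFromPredicateType(LLVMContext &Ctx, EVT PredVT,
                                                unsigned NumVec = 1) {
  assert(NumVec > 0 && NumVec < 5 && "Invalid number of vectors.");
  // ... derive the packed vector EVT from PredVT as in the current diff ...
}
```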
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:4768
return cast<VTSDNode>(Root->getOperand(4))->getVT();
+ case AArch64ISD::SVE_LD2_MERGE_ZERO:
+ return getPackedVectorTypeFromPredicateType<2>(
----------------
fpetrogalli wrote:
> sdesmalen wrote:
> > How come that ST2..ST4 are not covered here?
> We cannot handle the structured loads the same way as the stores, because for the structured stores the addressing-mode optimization is done on the intrinsics:
>
> ```
> case Intrinsic::aarch64_sve_st2: {
>   if (VT == MVT::nxv16i8) {
>     SelectPredicatedStore</*Scale=*/0>(Node, 2, AArch64::ST2B,
>                                        AArch64::ST2B_IMM);
>     return;
> ```
>
> For the loads, on the other hand, the addressing-mode optimization is done on the custom ISD nodes:
>
> ```
> case AArch64ISD::SVE_LD2_MERGE_ZERO: {
>   if (VT == MVT::nxv16i8) {
>     SelectPredicatedLoad</*Scale=*/0>(Node, 2, AArch64::LD2B_IMM,
>                                       AArch64::LD2B);
>     return;
> ```
>
> This means that, for the stores, the method `getMemVTFromNode` determines the memory type with the following generic code:
>
> ```
> if (isa<MemIntrinsicSDNode>(Root))
>   return cast<MemIntrinsicSDNode>(Root)->getMemoryVT();
> ```
>
> We might do the same for the loads, but we would have to teach `AArch64TargetLowering::getTgtMemIntrinsic` how to extract the memory type from the intrinsic calls, like we do with the stores:
>
> ```
> switch (Intrinsic) {
> case Intrinsic::aarch64_sve_st2:
>   return setInfoSVEStN<2>(Info, I);
> case Intrinsic::aarch64_sve_st3:
>   return setInfoSVEStN<3>(Info, I);
> case Intrinsic::aarch64_sve_st4:
>   return setInfoSVEStN<4>(Info, I);
> ```
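>
> A load-side counterpart might look like the following (a sketch only; a helper like `setInfoSVELdN` does not exist today, the name is hypothetical):
>
> ```
> switch (Intrinsic) {
> case Intrinsic::aarch64_sve_ld2:
>   return setInfoSVELdN<2>(Info, I);
> case Intrinsic::aarch64_sve_ld3:
>   return setInfoSVELdN<3>(Info, I);
> case Intrinsic::aarch64_sve_ld4:
>   return setInfoSVELdN<4>(Info, I);
> }
> ```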
>
> But that would probably require changing all the code that converts the load intrinsics into the `SVE_LD*_MERGE_ZERO` ISD nodes...
>
> If we want to do this cleanup, maybe it is better to do it for all of the structured load/store intrinsics and ISD nodes in a separate patch? As far as I understand it, this would be quite an extensive change, because we would have to make sure that the structured-load nodes still carry the intrinsic, and not the custom ISD node, by the time we enter this method, which is not what happens now. I believe there is a reason the intrinsic nodes are converted into custom ISD nodes before the addressing-mode optimization gets here?
Thanks for explaining, I guess I'm happy keeping it as it is now.
================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll:452
+; CHECK-NEXT: ret
+%base = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %addr, i64 16
+%base_ptr = bitcast <vscale x 2 x i64>* %base to i64 *
----------------
fpetrogalli wrote:
> sdesmalen wrote:
> > nit: when testing these pairs, can you at least make sure to test the min/max values?
> I can do that; I just want to make sure you didn't miss the comment at the top of the file that explains why I haven't added range checks for all cases:
>
> ```
> ; NOTE: invalid, upper and lower bound immediate values of the regimm
> ; addressing mode are checked only for the byte version of each
> ; instruction (`ld<N>b`), as the code for detecting the immediate is
> ; common to all instructions, and varies only for the number of
> ; elements of the structure store, which is <N> = 2, 3, 4.
> ```
>
> Let me know if you think this is not a sufficient explanation!
>
I meant using `#-32` and `#28` instead of `#16` and `#-16` for `ld4.nxv8i64` and `ld4.nxv8f64` respectively
(and similarly for the other tests in this file).
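For example, a lower-bound variant for `ld4.nxv8i64` could look like the following sketch (illustrative only; the intrinsic mangling and CHECK style are assumed to match the declarations already in the file):

```
define <vscale x 8 x i64> @ld4.nxv8i64_lower_bound(<vscale x 2 x i1> %Pg, <vscale x 2 x i64>* %addr) {
; CHECK-LABEL: ld4.nxv8i64_lower_bound
; CHECK: ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [x0, #-32, mul vl]
; CHECK-NEXT: ret
%base = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %addr, i64 -32
%base_ptr = bitcast <vscale x 2 x i64>* %base to i64 *
%res = call <vscale x 8 x i64> @llvm.aarch64.sve.ld4.nxv8i64.nxv2i1.p0i64(<vscale x 2 x i1> %Pg, i64 *%base_ptr)
ret <vscale x 8 x i64> %res
}
```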
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77251/new/
https://reviews.llvm.org/D77251