[PATCH] D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`

Wed Oct 27 01:42:05 PDT 2021

pengfei resigned from this revision.
pengfei added a comment.

> @pengfei like with D111546 <https://reviews.llvm.org/D111546>, are intel benchmarks impacted by this? in which way?

I checked cpu2017 on SKX, no obvious improvment/regression found.

I'll resign given I neither have strong objections nor stand for it.

================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3856-3861
+    // FIXME: this is a hack to artificially favor interleaved loads over gather
+    // on pre-AVX2. We do this because we can't correctly model the cost of an
+    // interleaved load, we overshoot by a an order of magnitude (because we
+    // essentially just fallback to the same sequence as gather) but we can
+    // model the cost of gather, and the cost is more favorable. Yet the
+    // interleaved load is better in general in reality.
----------------
The comments are easy misleading. I suggest we just remove the comments here. Maybe only explain why works for targets prior to AVX2.

================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3862-3863
+    // interleaved load is better in general in reality.
+    // TODO: AVX2 is the current cut-off because we have the correct
+    //       interleaving costs for a select number of triples.
     if (!BaseT::isStridedAccess(Ptr))
----------------
> I'm not doing this (D111460) to affect the interleaving costs,
The comments here make me think interleaved cost is substitute of the math here.

================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3858-3859
       return NumVectorInstToHideOverhead;
     if (!BaseT::getConstantStrideStep(SE, Ptr))
       return 1;
   }
----------------
lebedev.ri wrote:
> This appears to be dead code, if i replace `return 1;` with `assert(false);` it does not fire on any of the tests.
Neither does on my tests.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7061
   Type *PtrTy = ToVectorTy(Ptr->getType(), VF);
+  // FIXME: PtrTy should not be a vector, it's a hack.

----------------
RKSimon wrote:
> dmgreen wrote:
> > pengfei wrote:
> > > lebedev.ri wrote:
> > > > pengfei wrote:
> > > > > It would be more productive if it includes the reason why we can’t change it right away.
> > > > > Lack of such explanation led to the original version of the patch that repeats D93129.
> > > > I'm open to suggestions as to how this should be amended.
> > > Sorry, I don't have a concrete suggestion. I have concerns because I saw disagreement from Dave and didn't see consensus have been made.
> > My comments were mostly for ARM/AArch64, which this patch no longer affects. I don't have as much knowledge on X86 matters and no evidence suggesting one way or another whether this is a good idea I'm afraid.
> > 
> > I don't think this comment is particularly useful though. Personally I would personally just remove this comment and leave the patch to the X86 backend.
> @fhahn What happened to D93129 that actually dealt with this?
Removing the comment sounds good to me.

================
Comment at: llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll:1450
-; ENABLED_MASKED_STRIDED-NEXT:    [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT:    call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
 ; ENABLED_MASKED_STRIDED-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
----------------
I think masked gather/scatter/load/store etc. is definitely better than branches on AVX512. I think we should consider the cost of branches.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111460/new/

https://reviews.llvm.org/D111460