[PATCH] D149893: Rewrite LSV to handle longer chains.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 1 07:30:38 PDT 2023
arsenm added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:805
+ // existing tests. This isn't *so* bad, because at most we align to 4
+ // bytes (current value of StackAdjustedAlignment).
+ //
----------------
bjope wrote:
> barannikov88 wrote:
> > arsenm wrote:
> > > jlebar wrote:
> > > > barannikov88 wrote:
> > > > > Does DL.getStackAlign() instead of 4 cause many negative differences in tests?
> > > > >
> > > > DL.getStackAlignment() assert-fails on x86, nvptx, and amdgpu:
> > > >
> > > > llvm::DataLayout::getStackAlignment() const: StackNaturalAlign && "StackNaturalAlign must be defined"
> > > >
> > > > Not sure what's going on, but I don't think this is a windmill I want to tilt at in this patch.
> > > For AMDGPU we definitely do not want to promote stack objects above 4 alignment. We should in fact have optmizations to under-align any stack values to 4 as long as the address isn't captured
> > > For AMDGPU we definitely do not want to promote stack objects above 4 alignment.
> >
> > AMDGPU has "S32" in the datalayout string, so this promotion shouldn't happen.
> >
> > > DL.getStackAlignment() assert-fails on x86, nvptx, and amdgpu:
> >
> > This is probably because the datalayout in these tests does not have this "S" specifier.
> > Anyway, it was merely a suggestion, not a request for a change.
> >
> Our downstream target suffered from this. Now it starts to dynamically align things on the stack, that weren't aligned in the past. So we see regressions after this patch.
>
> I wonder if it would be OK to do something like this as a workaround until these fixme:s are sorted out:
>
> ```
> @@ -822,11 +822,12 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
> // FIXME: We will upgrade the alignment of the alloca even if it turns out
> // we can't vectorize for some other reason.
> Align Alignment = getLoadStoreAlignment(C[CBegin].Inst);
> + Align PrefAlignment = DL.getPrefTypeAlign(Type::getInt32Ty(F.getContext()));
> if (AS == DL.getAllocaAddrSpace() && Alignment.value() % SizeBytes != 0 &&
> - IsAllowedAndFast(Align(StackAdjustedAlignment))) {
> + IsAllowedAndFast(PrefAlignment)) {
> Align NewAlign = getOrEnforceKnownAlignment(
> getLoadStorePointerOperand(C[CBegin].Inst),
> - Align(StackAdjustedAlignment), DL, C[CBegin].Inst, nullptr, &DT);
> + PrefAlignment, DL, C[CBegin].Inst, nullptr, &DT);
> if (NewAlign >= Alignment) {
> LLVM_DEBUG(dbgs()
> << "LSV: splitByChain upgrading alloca alignment from "
> ```
>
> So instead of the hard-coded 4 byte alignment, we ask DataLayout about preferred alignment of a 32-bit value. I guess that is 4 for most targets, at least all upstream lit tests seem to pass.
>
> Well, I haven't thought much about it myself. I just tried to make a workaround that makes our downstream target happy while still all in-tree lit tests pass.
Could also clamp by max of existing stack alignments and stack alignment
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D149893/new/
https://reviews.llvm.org/D149893
More information about the llvm-commits
mailing list