[llvm] [LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (PR #159388)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 17 08:46:35 PDT 2025
llvmbot wrote:
@llvm/pr-subscribers-backend-nvptx
Author: Drew Kersnar (dakersnar)
Changes:
This change introduces Gap Filling, an optimization that fills holes in otherwise contiguous load/store chains to enable vectorization. It also introduces Chain Extending, which extends the end of a chain so its length rounds up to the next power of 2.
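To make that concrete, here is a rough sketch (hand-written IR, not taken from the patch's tests; offsets and names are made up) of the load case: three i32 loads with a one-element hole become a single `<4 x i32>` load once the gap is filled.

```llvm
define i32 @gap_fill_example(ptr %base) {
  ; i32 loads at offsets 0, 4, and 12: the 4-byte hole at offset 8 blocks
  ; vectorization today.
  %p1 = getelementptr inbounds i8, ptr %base, i64 4
  %p3 = getelementptr inbounds i8, ptr %base, i64 12
  %v0 = load i32, ptr %base, align 16
  %v1 = load i32, ptr %p1, align 4
  %v3 = load i32, ptr %p3, align 4
  ; With gap filling, the LSV inserts an extra load for offset 8 and the
  ; chain is emitted roughly as:
  ;   %vec = load <4 x i32>, ptr %base, align 16
  ;   %v0  = extractelement <4 x i32> %vec, i32 0
  ;   %v1  = extractelement <4 x i32> %vec, i32 1
  ;   %v3  = extractelement <4 x i32> %vec, i32 3
  ; The filled lane (index 2) is simply never used.
  %s0 = add i32 %v0, %v1
  %s1 = add i32 %s0, %v3
  ret i32 %s1
}
```

Chain extending covers the related case where, say, three contiguous i32 loads (offsets 0, 4, 8) are rounded up to a `<4 x i32>` load with the fourth lane unused.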
This was originally motivated by the NVPTX target, but I have tried to generalize it so it applies to any target that uses the LSV. One way I did so was by introducing a new TTI API, "isLegalToWidenLoads", which lets targets opt in to these optimizations. I'm more than willing to make adjustments to improve how target-agnostic this change is. I fully expect there are some issues and encourage feedback on how to improve things.
For stores, which, unlike loads, cannot be filled or extended without consequence, we only perform the optimization when we can generate a legal llvm.masked.store intrinsic, masking off the additional elements. Determining legality for stores is a little tricky on the NVPTX side, because these intrinsics are only supported for 256-bit vectors. See the other PR I opened for the implementation of the NVPTX lowering of masked store intrinsics, which includes NVPTX TTI changes that return true for isLegalMaskedStore under certain conditions: https://github.com/llvm/llvm-project/pull/159387. This change depends on that backend change, but I expect this one will require more discussion, so I am putting them both up at the same time. The backend change will be merged first, assuming both are approved.
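As an illustration of the store side (a hand-written sketch, not taken from the patch's tests; function names and constants are made up), padding seven contiguous i32 stores up to a 256-bit vector and masking off the extra lane could look roughly like this:

```llvm
declare void @llvm.masked.store.v8i32.p0(<8 x i32>, ptr, i32, <8 x i1>)

define void @store_pad_example(ptr %base, <7 x i32> %vals) {
  ; Seven contiguous i32 stores are padded with one extra (poison) element to
  ; form a 256-bit <8 x i32> vector; the final lane is disabled in the mask,
  ; so no byte beyond the original seven stores is written.
  %vec = shufflevector <7 x i32> %vals, <7 x i32> poison,
         <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  call void @llvm.masked.store.v8i32.p0(<8 x i32> %vec, ptr %base, i32 32,
         <8 x i1> <i1 true, i1 true, i1 true, i1 true,
                   i1 true, i1 true, i1 true, i1 false>)
  ret void
}
```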
---
Patch is 95.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159388.diff
15 Files Affected:
- (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+6)
- (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
- (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
- (modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h (+2)
- (modified) llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp (+383-52)
- (modified) llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll (+21-19)
- (modified) llvm/test/CodeGen/NVPTX/param-vectorize-device.ll (+2-4)
- (modified) llvm/test/CodeGen/NVPTX/variadics-backend.ll (+1-1)
- (added) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/extend-chain.ll (+81)
- (added) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill-cleanup.ll (+37)
- (added) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill-invariant.ll (+83)
- (added) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill-vectors.ll (+186)
- (added) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill.ll (+194)
- (added) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/masked-store.ll (+541)
- (modified) llvm/test/Transforms/LoadStoreVectorizer/NVPTX/vectorize_i8.ll (+1-2)
``````````diff
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 41ff54f0781a2..f8f134c833ea2 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -817,6 +817,12 @@ class TargetTransformInfo {
LLVM_ABI bool isLegalMaskedLoad(Type *DataType, Align Alignment,
unsigned AddressSpace) const;
+ /// Return true if it is legal to widen loads beyond their current width,
+ /// assuming the result is still well-aligned. For example, converting a load
+ /// i32 to a load i64, or vectorizing three contiguous load i32s into a load
+ /// <4 x i32>.
+ LLVM_ABI bool isLegalToWidenLoads() const;
+
/// Return true if the target supports nontemporal store.
LLVM_ABI bool isLegalNTStore(Type *DataType, Align Alignment) const;
/// Return true if the target supports nontemporal load.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 566e1cf51631a..55bd4bd709589 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -318,6 +318,8 @@ class TargetTransformInfoImplBase {
return false;
}
+ virtual bool isLegalToWidenLoads() const { return false; }
+
virtual bool isLegalNTStore(Type *DataType, Align Alignment) const {
// By default, assume nontemporal memory stores are available for stores
// that are aligned and have a size that is a power of 2.
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 09b50c5270e57..89cda79558057 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -476,6 +476,10 @@ bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType, Align Alignment,
return TTIImpl->isLegalMaskedLoad(DataType, Alignment, AddressSpace);
}
+bool TargetTransformInfo::isLegalToWidenLoads() const {
+ return TTIImpl->isLegalToWidenLoads();
+}
+
bool TargetTransformInfo::isLegalNTStore(Type *DataType,
Align Alignment) const {
return TTIImpl->isLegalNTStore(DataType, Alignment);
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
index b32d931bd3074..d56cff1ce3695 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
@@ -72,6 +72,8 @@ class NVPTXTTIImpl final : public BasicTTIImplBase<NVPTXTTIImpl> {
return isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment, AddrSpace);
}
+ bool isLegalToWidenLoads() const override { return true; };
+
// NVPTX has infinite registers of all kinds, but the actual machine doesn't.
// We conservatively return 1 here which is just enough to enable the
// vectorizers but disables heuristics based on the number of registers.
diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index 7b5137b0185ab..04f4e92826a52 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -119,6 +119,29 @@ using namespace llvm;
#define DEBUG_TYPE "load-store-vectorizer"
+cl::opt<bool>
+ ExtendLoads("vect-extend-loads", cl::Hidden,
+ cl::desc("Load more elements if the target VF is higher "
+ "than the chain length."),
+ cl::init(true));
+
+cl::opt<bool> ExtendStores(
+ "vect-extend-stores", cl::Hidden,
+ cl::desc("Store more elements if the target VF is higher "
+ "than the chain length and we have access to masked stores."),
+ cl::init(true));
+
+cl::opt<bool> FillLoadGaps(
+ "vect-fill-load-gaps", cl::Hidden,
+ cl::desc("Should Loads be introduced in gaps to enable vectorization."),
+ cl::init(true));
+
+cl::opt<bool>
+ FillStoreGaps("vect-fill-store-gaps", cl::Hidden,
+ cl::desc("Should Stores be introduced in gaps to enable "
+ "vectorization into masked stores."),
+ cl::init(true));
+
STATISTIC(NumVectorInstructions, "Number of vector accesses generated");
STATISTIC(NumScalarsVectorized, "Number of scalar accesses vectorized");
@@ -246,12 +269,16 @@ class Vectorizer {
const DataLayout &DL;
IRBuilder<> Builder;
- // We could erase instrs right after vectorizing them, but that can mess up
- // our BB iterators, and also can make the equivalence class keys point to
- // freed memory. This is fixable, but it's simpler just to wait until we're
- // done with the BB and erase all at once.
+ /// We could erase instrs right after vectorizing them, but that can mess up
+ /// our BB iterators, and also can make the equivalence class keys point to
+ /// freed memory. This is fixable, but it's simpler just to wait until we're
+ /// done with the BB and erase all at once.
SmallVector<Instruction *, 128> ToErase;
+ /// We insert load/store instructions and GEPs to fill gaps and extend chains
+ /// to enable vectorization. Keep track and delete them later.
+ DenseSet<Instruction *> ExtraElements;
+
public:
Vectorizer(Function &F, AliasAnalysis &AA, AssumptionCache &AC,
DominatorTree &DT, ScalarEvolution &SE, TargetTransformInfo &TTI)
@@ -344,6 +371,28 @@ class Vectorizer {
/// Postcondition: For all i, ret[i][0].second == 0, because the first instr
/// in the chain is the leader, and an instr touches distance 0 from itself.
std::vector<Chain> gatherChains(ArrayRef<Instruction *> Instrs);
+
+ /// Is a load/store with this alignment allowed by TTI and at least as fast
+ /// as an unvectorized load/store.
+ bool accessIsAllowedAndFast(unsigned SizeBytes, unsigned AS, Align Alignment,
+ unsigned VecElemBits) const;
+
+ /// Before attempting to fill gaps, check if the chain is a candidate for
+ /// a masked store, to save compile time if it is not possible for the address
+ /// space and element type.
+ bool shouldAttemptMaskedStore(const ArrayRef<ChainElem> C) const;
+
+ /// Create a new GEP and a new Load/Store instruction such that the GEP
+ /// is pointing at PrevElem + Offset. In the case of stores, store poison.
+ /// Extra elements will either be combined into a vector/masked store or
+ /// deleted before the end of the pass.
+ ChainElem createExtraElementAfter(const ChainElem &PrevElem, APInt Offset,
+ StringRef Prefix,
+ Align Alignment = Align(1));
+
+ /// Delete dead GEPs and extra Load/Store instructions created by
+ /// createExtraElementAfter
+ void deleteExtraElements();
};
class LoadStoreVectorizerLegacyPass : public FunctionPass {
@@ -457,12 +506,21 @@ bool Vectorizer::run() {
Changed |= runOnPseudoBB(*It, *std::next(It));
for (Instruction *I : ToErase) {
+ // These will get deleted in deleteExtraElements.
+ // This is because ExtraElements will include both extra elements
+ // that *were* vectorized and extra elements that *were not*
+ // vectorized. ToErase will only include extra elements that *were*
+ // vectorized, so in order to avoid double deletion we skip them here and
+ // handle them in deleteExtraElements.
+ if (ExtraElements.contains(I))
+ continue;
auto *PtrOperand = getLoadStorePointerOperand(I);
if (I->use_empty())
I->eraseFromParent();
RecursivelyDeleteTriviallyDeadInstructions(PtrOperand);
}
ToErase.clear();
+ deleteExtraElements();
}
return Changed;
@@ -623,6 +681,29 @@ std::vector<Chain> Vectorizer::splitChainByContiguity(Chain &C) {
dumpChain(C);
});
+ // If the chain is not contiguous, we try to fill the gap with "extra"
+ // elements to artificially make it contiguous, to try to enable
+ // vectorization.
+ // - Filling gaps in loads is always ok if the target supports widening loads.
+ // - For stores, we only fill gaps if there is a potentially legal masked
+ // store for the target. If later on, we don't end up with a chain that
+ // could be vectorized into a legal masked store, the chains with extra
+ // elements will be filtered out in splitChainByAlignment.
+ bool TryFillGaps = isa<LoadInst>(C[0].Inst)
+ ? (FillLoadGaps && TTI.isLegalToWidenLoads())
+ : (FillStoreGaps && shouldAttemptMaskedStore(C));
+
+ unsigned ASPtrBits =
+ DL.getIndexSizeInBits(getLoadStoreAddressSpace(C[0].Inst));
+
+ // Compute the alignment of the leader of the chain (which every stored offset
+ // is based on) using the current first element of the chain. This is
+ // conservative, we may be able to derive better alignment by iterating over
+ // the chain and finding the leader.
+ Align LeaderOfChainAlign =
+ commonAlignment(getLoadStoreAlignment(C[0].Inst),
+ C[0].OffsetFromLeader.abs().getLimitedValue());
+
std::vector<Chain> Ret;
Ret.push_back({C.front()});
@@ -633,7 +714,8 @@ std::vector<Chain> Vectorizer::splitChainByContiguity(Chain &C) {
unsigned SzBits = DL.getTypeSizeInBits(getLoadStoreType(&*Prev.Inst));
assert(SzBits % 8 == 0 && "Non-byte sizes should have been filtered out by "
"collectEquivalenceClass");
- APInt PrevReadEnd = Prev.OffsetFromLeader + SzBits / 8;
+ APInt PrevSzBytes = APInt(ASPtrBits, SzBits / 8);
+ APInt PrevReadEnd = Prev.OffsetFromLeader + PrevSzBytes;
// Add this instruction to the end of the current chain, or start a new one.
bool AreContiguous = It->OffsetFromLeader == PrevReadEnd;
@@ -642,10 +724,54 @@ std::vector<Chain> Vectorizer::splitChainByContiguity(Chain &C) {
<< *Prev.Inst << " (ends at offset " << PrevReadEnd
<< ") -> " << *It->Inst << " (starts at offset "
<< It->OffsetFromLeader << ")\n");
- if (AreContiguous)
+
+ if (AreContiguous) {
CurChain.push_back(*It);
- else
- Ret.push_back({*It});
+ continue;
+ }
+
+ // For now, we aren't filling gaps between load/stores of different sizes.
+ // Additionally, as a conservative heuristic, we only fill gaps of 1-2
+ // elements. Generating loads/stores with too many unused bytes has a side
+ // effect of increasing register pressure (on NVIDIA targets at least),
+ // which could cancel out the benefits of reducing number of load/stores.
+ if (TryFillGaps &&
+ SzBits == DL.getTypeSizeInBits(getLoadStoreType(It->Inst))) {
+ APInt OffsetOfGapStart = Prev.OffsetFromLeader + PrevSzBytes;
+ APInt GapSzBytes = It->OffsetFromLeader - OffsetOfGapStart;
+ if (GapSzBytes == PrevSzBytes) {
+ // There is a single gap between Prev and Curr, create one extra element
+ ChainElem NewElem = createExtraElementAfter(
+ Prev, PrevSzBytes, "GapFill",
+ commonAlignment(LeaderOfChainAlign,
+ OffsetOfGapStart.abs().getLimitedValue()));
+ CurChain.push_back(NewElem);
+ CurChain.push_back(*It);
+ continue;
+ }
+ // There are two gaps between Prev and Curr, only create two extra
+ // elements if Prev is the first element in a sequence of four.
+ // This has the highest chance of resulting in a beneficial vectorization.
+ if ((GapSzBytes == 2 * PrevSzBytes) && (CurChain.size() % 4 == 1)) {
+ ChainElem NewElem1 = createExtraElementAfter(
+ Prev, PrevSzBytes, "GapFill",
+ commonAlignment(LeaderOfChainAlign,
+ OffsetOfGapStart.abs().getLimitedValue()));
+ ChainElem NewElem2 = createExtraElementAfter(
+ NewElem1, PrevSzBytes, "GapFill",
+ commonAlignment(
+ LeaderOfChainAlign,
+ (OffsetOfGapStart + PrevSzBytes).abs().getLimitedValue()));
+ CurChain.push_back(NewElem1);
+ CurChain.push_back(NewElem2);
+ CurChain.push_back(*It);
+ continue;
+ }
+ }
+
+ // The chain is not contiguous and cannot be made contiguous with gap
+ // filling, so we need to start a new chain.
+ Ret.push_back({*It});
}
// Filter out length-1 chains, these are uninteresting.
@@ -721,6 +847,14 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
unsigned AS = getLoadStoreAddressSpace(C[0].Inst);
unsigned VecRegBytes = TTI.getLoadStoreVecRegBitWidth(AS) / 8;
+ // For compile time reasons, we cache whether or not the superset
+ // of all candidate chains contains any extra stores from earlier gap
+ // filling.
+ bool CandidateChainsMayContainExtraStores =
+ !IsLoadChain && any_of(C, [this](const ChainElem &E) {
+ return ExtraElements.contains(E.Inst);
+ });
+
std::vector<Chain> Ret;
for (unsigned CBegin = 0; CBegin < C.size(); ++CBegin) {
// Find candidate chains of size not greater than the largest vector reg.
@@ -769,41 +903,6 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
continue;
}
- // Is a load/store with this alignment allowed by TTI and at least as fast
- // as an unvectorized load/store?
- //
- // TTI and F are passed as explicit captures to WAR an MSVC misparse (??).
- auto IsAllowedAndFast = [&, SizeBytes = SizeBytes, &TTI = TTI,
- &F = F](Align Alignment) {
- if (Alignment.value() % SizeBytes == 0)
- return true;
- unsigned VectorizedSpeed = 0;
- bool AllowsMisaligned = TTI.allowsMisalignedMemoryAccesses(
- F.getContext(), SizeBytes * 8, AS, Alignment, &VectorizedSpeed);
- if (!AllowsMisaligned) {
- LLVM_DEBUG(dbgs()
- << "LSV: Access of " << SizeBytes << "B in addrspace "
- << AS << " with alignment " << Alignment.value()
- << " is misaligned, and therefore can't be vectorized.\n");
- return false;
- }
-
- unsigned ElementwiseSpeed = 0;
- (TTI).allowsMisalignedMemoryAccesses((F).getContext(), VecElemBits, AS,
- Alignment, &ElementwiseSpeed);
- if (VectorizedSpeed < ElementwiseSpeed) {
- LLVM_DEBUG(dbgs()
- << "LSV: Access of " << SizeBytes << "B in addrspace "
- << AS << " with alignment " << Alignment.value()
- << " has relative speed " << VectorizedSpeed
- << ", which is lower than the elementwise speed of "
- << ElementwiseSpeed
- << ". Therefore this access won't be vectorized.\n");
- return false;
- }
- return true;
- };
-
// If we're loading/storing from an alloca, align it if possible.
//
// FIXME: We eagerly upgrade the alignment, regardless of whether TTI
@@ -818,8 +917,7 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
isa<AllocaInst>(PtrOperand->stripPointerCasts());
Align Alignment = getLoadStoreAlignment(C[CBegin].Inst);
Align PrefAlign = Align(StackAdjustedAlignment);
- if (IsAllocaAccess && Alignment.value() % SizeBytes != 0 &&
- IsAllowedAndFast(PrefAlign)) {
+ if (IsAllocaAccess && Alignment.value() % SizeBytes != 0) {
Align NewAlign = getOrEnforceKnownAlignment(
PtrOperand, PrefAlign, DL, C[CBegin].Inst, nullptr, &DT);
if (NewAlign >= Alignment) {
@@ -831,7 +929,59 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
}
}
- if (!IsAllowedAndFast(Alignment)) {
+ Chain ExtendingLoadsStores;
+ bool ExtendChain = IsLoadChain
+ ? ExtendLoads
+ : ExtendStores;
+ if (ExtendChain && NumVecElems < TargetVF && NumVecElems % 2 != 0 &&
+ VecElemBits >= 8) {
+ // TargetVF may be a lot higher than NumVecElems,
+ // so only extend to the next power of 2.
+ assert(VecElemBits % 8 == 0);
+ unsigned VecElemBytes = VecElemBits / 8;
+ unsigned NewNumVecElems = PowerOf2Ceil(NumVecElems);
+ unsigned NewSizeBytes = VecElemBytes * NewNumVecElems;
+
+ assert(NewNumVecElems <= TargetVF);
+
+ LLVM_DEBUG(dbgs() << "LSV: attempting to extend chain of "
+ << NumVecElems << " "
+ << (IsLoadChain ? "loads" : "stores") << " to "
+ << NewNumVecElems << " elements\n");
+ // Do not artificially increase the chain if it becomes misaligned,
+ // otherwise we may unnecessarily split the chain when the target actually
+ // supports non-pow2 VF.
+ if (accessIsAllowedAndFast(NewSizeBytes, AS, Alignment, VecElemBits) &&
+ ((IsLoadChain ? TTI.isLegalToWidenLoads()
+ : TTI.isLegalMaskedStore(
+ FixedVectorType::get(VecElemTy, NewNumVecElems),
+ Alignment, AS, /*IsMaskConstant=*/true)))) {
+ LLVM_DEBUG(dbgs()
+ << "LSV: extending " << (IsLoadChain ? "load" : "store")
+ << " chain of " << NumVecElems << " "
+ << (IsLoadChain ? "loads" : "stores")
+ << " with total byte size of " << SizeBytes << " to "
+ << NewNumVecElems << " "
+ << (IsLoadChain ? "loads" : "stores")
+ << " with total byte size of " << NewSizeBytes
+ << ", TargetVF=" << TargetVF << " \n");
+
+ unsigned ASPtrBits = DL.getIndexSizeInBits(AS);
+ ChainElem Prev = C[CEnd];
+ for (unsigned i = 0; i < (NewNumVecElems - NumVecElems); i++) {
+ ChainElem NewElem = createExtraElementAfter(
+ Prev, APInt(ASPtrBits, VecElemBytes), "Extend");
+ ExtendingLoadsStores.push_back(NewElem);
+ Prev = ExtendingLoadsStores.back();
+ }
+
+ // Update the size and number of elements for upcoming checks.
+ SizeBytes = NewSizeBytes;
+ NumVecElems = NewNumVecElems;
+ }
+ }
+
+ if (!accessIsAllowedAndFast(SizeBytes, AS, Alignment, VecElemBits)) {
LLVM_DEBUG(
dbgs() << "LSV: splitChainByAlignment discarding candidate chain "
"because its alignment is not AllowedAndFast: "
@@ -849,10 +999,41 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
continue;
}
+ if (CandidateChainsMayContainExtraStores) {
+ // The legality of adding extra stores to ExtendingLoadsStores has
+ // already been checked, but if the candidate chain contains extra
+ // stores from an earlier optimization, confirm legality now.
+ // This filter is essential because, when filling gaps in
+ // splitChainByContinuity, we queried the API to check that (for a given
+ // element type and address space) there *may* be a legal masked store
+ // we can try to create. Now, we need to check if the actual chain we
+ // ended up with is legal to turn into a masked store.
+ // This is relevant for NVPTX targets, for example, where a masked store
+ // is only legal if we have ended up with a 256-bit vector.
+ bool CandidateChainContainsExtraStores = llvm::any_of(
+ ArrayRef<ChainElem>(C).slice(CBegin, CEnd - CBegin + 1),
+ [this](const ChainElem &E) {
+ return ExtraElements.contains(E.Inst);
+ });
+
+ if (CandidateChainContainsExtraStores &&
+ !TTI.isLegalMaskedStore(
+ FixedVectorType::get(VecElemTy, NumVecElems), Alignment, AS,
+ /*IsMaskConstant=*/true)) {
+ LLVM_DEBUG(dbgs()
+ << "LSV: splitChainByAlignment discarding candidate chain "
+ "because it contains extra stores that we cannot "
+ "legally vectorize into a masked store \n");
+ continue;
+ }
+ }
+
// Hooray, we can vect...
[truncated]
``````````
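For reference while reading the splitChainByContiguity hunk above: a two-missing-element hole is only filled when the preceding element would start a group of four. A hand-written sketch of that case (not from the patch's tests; offsets are made up):

```llvm
define i32 @two_element_gap(ptr %base) {
  ; i32 loads at offsets 0 and 12 leave an 8-byte hole (two missing elements).
  ; Because the load at offset 0 would be the first element of a group of
  ; four, both holes are filled and the chain can become one <4 x i32> load
  ; in which lanes 1 and 2 are unused.
  %p3 = getelementptr inbounds i8, ptr %base, i64 12
  %v0 = load i32, ptr %base, align 16
  %v3 = load i32, ptr %p3, align 4
  %sum = add i32 %v0, %v3
  ret i32 %sum
}
```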
https://github.com/llvm/llvm-project/pull/159388