[PATCH] D52653: [CodeGen, AArch64] Combine Interleaved Loads which are not covered by the Vectorizer

Martin Elshuber via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 29 10:45:34 PDT 2018


marels added a comment.

I also added missing colons in some of the test, and in the main loop I added a missing pop_back

@john.brawn: thanks for commenting



================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:679
+  /// Information of a Vector Element
+  struct ElementInfo {
+    /// Offset Polynomial.
----------------
john.brawn wrote:
> marels wrote:
> > john.brawn wrote:
> > > LI is used only for the InsertionPoint in combine, and you can find that by picking the least element of VectorInfo::LIs e.g.
> > > 
> > > ```
> > >   LoadInst *getFirstLoad() const { return *std::min_element(LIs.begin(), LIs.end()); }
> > > ```
> > > so LI isn't needed here, so we don't need ElementInfo at all and can just have an array of Polynomial.
> > I am not sure if this field can be omitted:
> > 
> >   - The InsertionPoint is not always the first load. It is the load instruction loading the from the memory address where the first element of the interleave is located at. This insertion point is selected because at this position it is guaranteed that the Pointer Value from which that combined load has to load from is available and that this value can be used just by adding a pointer cast.
> >   - The LI field also guarantees that this load exists. This is not always the case consider for example the wired case:
> > ```
> > %ptr1 = &a[0];
> > %ptr2 = &a[9];
> > %v1 =<12 x float> load %ptr1
> > %v2 =<12 x float> load %ptr2
> > %s1 = <4 x float> shufflevector %v1, %v2, <1, 5, 9, 16>
> > %s2 = <4 x float> shufflevector %v1, %v2, <2, 6, 10, 17>
> > %s3 = <4 x float> shufflevector %v1, %v2, <3, 7, 11, 18>
> > %s4 = <4 x float> shufflevector %v1, %v2, <4, 8, 15, 19>
> > ```
> > This interleave would be detected in the first step, but the transformation would be aborted because the there is no load that loads from &a[1].
> > 
> > 
> > 
> Could you add tests for these two cases? Currently if you do as I suggested then you get no failures (so it looks like it works). Also for the second it looks like it could be handled by inserting a getelementptr to generate the address &a[1] and load from that, though it doesn't look easy to do (seems like it would involve generating a getelementptr from a Polynomial in some manner) so I wouldn't worry about implementing it.
I added a case for where this load does not exist.

Yes, one could add a corresponding GEP, but it might not only be complicated to reconstruct this from a polynomial, i cannot imagine that many real-life examples would profit.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1229
+  for (auto &VI : InterleavedLoad) {
+    if (!DT->dominates(InsertionPoint, VI->SVI))
+      return false;
----------------
john.brawn wrote:
> marels wrote:
> > john.brawn wrote:
> > > This isn't needed, as if this weren't the case the IR wouldn't be valid in the first place (I'm pretty sure).
> > I think it is. Again consider a weird but possible case:
> > 
> > ```
> > %ptr2 = &a[2]
> > %v2 = <14 x float> load %ptr2
> > %s3 = <4 x float> shufflevector %v2, undef, <0, 4, 8, 12>
> > %s4 = <4 x float> shufflevector %v2, undef, <1, 5, 9, 13>
> > %x = some-nonaliasing-use-eg-extract-element-instr of %s3 or %s4 
> > %ptr2 = &a[0];
> > %v1 = <14 x float> load %ptr2
> > %s1 = shufflevector %v1, undef, <0, 4, 8, 12>
> > %s2 = shufflevector %v1, undef, <1, 5, 9, 13>
> > 
> > ``` 
> > 
> > Unless moving and proving that %ptr2 is movable before %v2 the insertion point must be at %v1. However because %v1 does not dominate %x the transformation would generate an invalid IR. This loop prevents this case. Also because the shuffle vector instructions are not necessarily in the same basic block I think that the dominator relation is the way to go.
> > 
> > Also not the extract element instructions are inserted just before the %s1 to %s3 respectively. 
> > 
> Could you add a test case for this?
sure & done


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1277-1278
+  for (unsigned Factor = MaxFactor; Factor >= 2; Factor--) {
+    //    SmallVector<std::shared_ptr<VectorInfo>, 4> Candidates;
+    std::list<VectorInfo> Candidates;
+
----------------
john.brawn wrote:
> The comment here looks like it should be deleted.
done


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1284-1285
+
+          if (DL.getTypeAllocSize(SVI->getType()) % VectorWidth)
+            continue;
+
----------------
john.brawn wrote:
> Now that the instruction cost is being computed this isn't needed, as that will already handle mis-sized vector types.
removed, including related code


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1298-1299
+
+    // std::vector<std::shared_ptr<VectorInfo>> InterleavedLoad(Factor);
+    std::list<VectorInfo> InterleavedLoad;
+    while (findPattern(Candidates, InterleavedLoad, Factor, DL)) {
----------------
john.brawn wrote:
> The same for this comment.
removed


https://reviews.llvm.org/D52653





More information about the llvm-commits mailing list