[PATCH] D52653: [CodeGen, AArch64] Combine Interleaved Loads which are not covered by the Vectorizer

John Brawn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 23 07:00:34 PDT 2018


john.brawn added inline comments.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:679
+  /// Information of a Vector Element
+  struct ElementInfo {
+    /// Offset Polynomial.
----------------
marels wrote:
> john.brawn wrote:
> > LI is used only for the InsertionPoint in combine, and you can find that by picking the least element of VectorInfo::LIs e.g.
> > 
> > ```
> >   LoadInst *getFirstLoad() const { return *std::min_element(LIs.begin(), LIs.end()); }
> > ```
> > so LI isn't needed here, so we don't need ElementInfo at all and can just have an array of Polynomial.
> I am not sure if this field can be omitted:
> 
>   - The InsertionPoint is not always the first load. It is the load instruction loading the from the memory address where the first element of the interleave is located at. This insertion point is selected because at this position it is guaranteed that the Pointer Value from which that combined load has to load from is available and that this value can be used just by adding a pointer cast.
>   - The LI field also guarantees that this load exists. This is not always the case consider for example the wired case:
> ```
> %ptr1 = &a[0];
> %ptr2 = &a[9];
> %v1 =<12 x float> load %ptr1
> %v2 =<12 x float> load %ptr2
> %s1 = <4 x float> shufflevector %v1, %v2, <1, 5, 9, 16>
> %s2 = <4 x float> shufflevector %v1, %v2, <2, 6, 10, 17>
> %s3 = <4 x float> shufflevector %v1, %v2, <3, 7, 11, 18>
> %s4 = <4 x float> shufflevector %v1, %v2, <4, 8, 15, 19>
> ```
> This interleave would be detected in the first step, but the transformation would be aborted because the there is no load that loads from &a[1].
> 
> 
> 
Could you add tests for these two cases? Currently if you do as I suggested then you get no failures (so it looks like it works). Also for the second it looks like it could be handled by inserting a getelementptr to generate the address &a[1] and load from that, though it doesn't look easy to do (seems like it would involve generating a getelementptr from a Polynomial in some manner) so I wouldn't worry about implementing it.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1229
+  for (auto &VI : InterleavedLoad) {
+    if (!DT->dominates(InsertionPoint, VI->SVI))
+      return false;
----------------
marels wrote:
> john.brawn wrote:
> > This isn't needed, as if this weren't the case the IR wouldn't be valid in the first place (I'm pretty sure).
> I think it is. Again consider a weird but possible case:
> 
> ```
> %ptr2 = &a[2]
> %v2 = <14 x float> load %ptr2
> %s3 = <4 x float> shufflevector %v2, undef, <0, 4, 8, 12>
> %s4 = <4 x float> shufflevector %v2, undef, <1, 5, 9, 13>
> %x = some-nonaliasing-use-eg-extract-element-instr of %s3 or %s4 
> %ptr2 = &a[0];
> %v1 = <14 x float> load %ptr2
> %s1 = shufflevector %v1, undef, <0, 4, 8, 12>
> %s2 = shufflevector %v1, undef, <1, 5, 9, 13>
> 
> ``` 
> 
> Unless moving and proving that %ptr2 is movable before %v2 the insertion point must be at %v1. However because %v1 does not dominate %x the transformation would generate an invalid IR. This loop prevents this case. Also because the shuffle vector instructions are not necessarily in the same basic block I think that the dominator relation is the way to go.
> 
> Also not the extract element instructions are inserted just before the %s1 to %s3 respectively. 
> 
Could you add a test case for this?


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1277-1278
+  for (unsigned Factor = MaxFactor; Factor >= 2; Factor--) {
+    //    SmallVector<std::shared_ptr<VectorInfo>, 4> Candidates;
+    std::list<VectorInfo> Candidates;
+
----------------
The comment here looks like it should be deleted.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1284-1285
+
+          if (DL.getTypeAllocSize(SVI->getType()) % VectorWidth)
+            continue;
+
----------------
Now that the instruction cost is being computed this isn't needed, as that will already handle mis-sized vector types.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1298-1299
+
+    // std::vector<std::shared_ptr<VectorInfo>> InterleavedLoad(Factor);
+    std::list<VectorInfo> InterleavedLoad;
+    while (findPattern(Candidates, InterleavedLoad, Factor, DL)) {
----------------
The same for this comment.


https://reviews.llvm.org/D52653





More information about the llvm-commits mailing list