[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.
Shahid via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 22 06:35:26 PST 2016
ashahid added a comment.
Hi Simon, Michael
Thanks for the comments. Pls find the response inlined.
Thanks,
Shahid
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1022
+ ScalarEvolution &SE) {
+ std::multimap<int, Value *> offValPair;
+ for (unsigned i = 0; i < VL.size(); i++) {
----------------
mkuper wrote:
> Why is this a multimap?
This is because the elements in the multimap follow a certain order, so using this will ensure that the values are sorted accordingly.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:464
+ /// \returns true if the scalars in VL are found in this entry.
+ bool isJumbled(ArrayRef<Value *> VL) const {
+ assert(VL.size() == Scalars.size() && "Invalid size");
----------------
mkuper wrote:
> I think you may want a different name - this doesn't actually check whether the scalars are jumbled, it checks whether they're all present. That is, it'll return true even if they're all in-order.
What about isFoundJumbled()?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:466
+ assert(VL.size() == Scalars.size() && "Invalid size");
+ for (unsigned i = 0; i < VL.size(); i++) {
+ bool Found = false;
----------------
mkuper wrote:
> RKSimon wrote:
> > This might be simplified with a for range loop and use of llvm:none_of / any_of?
> Would it make sense to pre-sort both arrays, and then check the two sorted arrays are equal? This would make it O(nlogn) instead of O(n^2)
> (I'm not sure sort based on what, though - as well as the actual gain, since I guess VL.size() is small in practice.)
Pre-sorting would require two calls for sort and then compare, IMO, for the given small VL.size it would not make much difference. However I am open to other views.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1197
// Check if the loads are consecutive, reversed, or neither.
// TODO: What we really want is to sort the loads, but for now, check
// the two likely directions.
----------------
mkuper wrote:
> This TODO gets done.
Yes, that's right.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1217
+ if (VL.size() > 2) {
+ SmallVector<Value *, 8> list;
----------------
mkuper wrote:
> mkuper wrote:
> > Do we still need the ReverseConsecutive case at all? That was introduced in r276477 as a patch for the common case of PR28474, where we find the loads in reverse order. But this should completely supersede it, right?
> Why not for VL.size() == 2?
A jumbled VL of VL.size() == 2 is essentially a case of reversed VL. Considering the tradeoff between compile time of extra buildTree() for VL.size==2 vs additional runtime for shufflevector, I opted for extra compile time over extra runtime.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1219
+ SmallVector<Value *, 8> list;
+ auto newVL = (ArrayRef<Value *>)0;
+ bool shuffledLoads = true;
----------------
mkuper wrote:
> This looks rather weird. Can you make it more idiomatic?
Sure
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1228
+ }
+ }
+ if (shuffledLoads) {
----------------
RKSimon wrote:
> Is it worth breaking here once we know that shuffledLoad is false?
>
> Remove the braces if you can.
Seems yes.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2582
+ TreeEntry *E = &VectorizableTree[Idx];
+ for (unsigned Lane = 0, LE = VL.size(); Lane != LE; ++Lane) {
+ if (E->Scalars[Lane] == VL[i]) {
----------------
RKSimon wrote:
> for range loop?
Sure
================
Comment at: test/Transforms/SLPVectorizer/X86/jumbled-load.ll:3
+
+
+; CHECK-LABEL: @jumbled-load
----------------
RKSimon wrote:
> Possibly commit this test to trunk with the current output generated by utils/update_test_checks.py?
By "current output" do you mean output generated by utils/update_test_checks.py with this patch by ?
================
Comment at: test/Transforms/SLPVectorizer/X86/jumbled-load.ll:10
+
+define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %out) {
+
----------------
mkuper wrote:
> Please add a test that has several load packets (e.g. multiplies one load sequence by another load sequence).
Sure
================
Comment at: test/Transforms/SLPVectorizer/X86/reduction_loads.ll:8
+; CHECK: mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[SHUF]]
define i32 @test(i32* nocapture readonly %p) {
----------------
mkuper wrote:
> RKSimon wrote:
> > It'd be better if there was more context to this additional shuffle - regenerate + commit the current output with utils/update_test_checks.py ? For an IR loop it's not that large an output.
> Yes, I'd be interested to know if we added a shuffle here, or just moved a shuffle from the store side to the load side (which makes sense).
By "current output" do you mean output generated by utils/update_test_checks.py with this patch by ? Pls explain.
https://reviews.llvm.org/D26905
More information about the llvm-commits
mailing list