[llvm] [SLP] no need to generate extract for in-tree uses for original scala… (PR #76077)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 21 18:33:08 PST 2023
================
@@ -11898,6 +11864,12 @@ Value *BoUpSLP::vectorizeTree(
Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");
+ // Generate extract for in-tree uses if the use is scalar operand in
+ // vectorized instruction.
+ if (auto *UserVecTE = getTreeEntry(User))
+ if (doesInTreeUserNeedToExtract(Scalar, cast<Instruction>(User), TLI))
+ User = cast<llvm::User>(UserVecTE->VectorizedValue);
+
----------------
Enna1 wrote:
> Why need this?
This patch removes the code in `BoUpSLP::vectorizeTree()` that added new vectorized load/store/call instructions using an in-tree scalar to `ExternalUses`, but we still need to generate an extract for those newly added vectorized instructions. Because the original scalar load/store/call is added to `ExternalUses` in `buildingExtractUsers()`, I added a check here so that the extract is generated for the newly added vectorized load/store/call rather than for the scalar load/store/call, which will be erased later.
Take `llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll` as an example.
```
define i32 @fn1() {
entry:
%0 = load ptr, ptr @a, align 8
%add.ptr = getelementptr inbounds i64, ptr %0, i64 11
%1 = ptrtoint ptr %add.ptr to i64
store i64 %1, ptr %add.ptr, align 8
%add.ptr1 = getelementptr inbounds i64, ptr %0, i64 56
%2 = ptrtoint ptr %add.ptr1 to i64
%arrayidx2 = getelementptr inbounds i64, ptr %0, i64 12
store i64 %2, ptr %arrayidx2, align 8
; store <2 x i64> %5, ptr %add.ptr, align 8
ret i32 undef
}
```
Without this change, when generating extracts, `ExternalUses` contains both
`{%add.ptr, store i64 %1, ptr %add.ptr, align 8}` and
`{%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8}`.
What this patch wants to do is generate an extract only for `{%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8}`, since there is no need to generate one for `{%add.ptr, store i64 %1, ptr %add.ptr, align 8}`.
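As a rough illustration of that redirect, here is a toy model in which plain strings stand in for LLVM values. `ExternalUse`, `VectorizedMap` and `redirectInTreeUsers` are hypothetical names made up for this sketch, not SLPVectorizer APIs, and the map is assumed to contain only those scalar users for which `doesInTreeUserNeedToExtract` would return true.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an ExternalUses entry: a scalar and the instruction using it.
// This is NOT the LLVM data structure, only a simplified model.
struct ExternalUse {
  std::string Scalar;
  std::string User;
};

// Hypothetical lookup: scalar user (part of a vectorized tree entry) ->
// the vectorized instruction that replaces it.
using VectorizedMap = std::map<std::string, std::string>;

// For every recorded use whose user has been vectorized, point the use at the
// vectorized instruction, so the extract is emitted for the instruction that
// survives rather than for the scalar one that is erased later.
std::vector<ExternalUse> redirectInTreeUsers(std::vector<ExternalUse> Uses,
                                             const VectorizedMap &VectorizedUser) {
  for (ExternalUse &U : Uses) {
    auto It = VectorizedUser.find(U.User);
    if (It != VectorizedUser.end())
      U.User = It->second; // redirect to the vectorized user
  }
  return Uses;
}
```

With the `fn1` example, an entry `{%add.ptr, store i64 %1, ptr %add.ptr, align 8}` would be rewritten to `{%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8}`, while entries whose user is not in the map are left unchanged.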
---
> I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.
Sorry, I don't quite understand this. Could you please explain it a bit more?
https://github.com/llvm/llvm-project/pull/76077