[llvm] [VectorCombine] isExtractExtractCheap - specify the extract/insert shuffle mask to improve shuffle costs (PR #114780)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 4 09:39:19 PST 2024
================
@@ -688,9 +688,9 @@ define i32 @load_multiple_extracts_with_constant_idx(ptr %x) {
define i32 @load_multiple_extracts_with_constant_idx_profitable(ptr %x) {
; CHECK-LABEL: @load_multiple_extracts_with_constant_idx_profitable(
; CHECK-NEXT: [[LV:%.*]] = load <8 x i32>, ptr [[X:%.*]], align 16
-; CHECK-NEXT: [[E_0:%.*]] = extractelement <8 x i32> [[LV]], i32 0
-; CHECK-NEXT: [[E_1:%.*]] = extractelement <8 x i32> [[LV]], i32 6
-; CHECK-NEXT: [[RES:%.*]] = add i32 [[E_0]], [[E_1]]
+; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <8 x i32> [[LV]], <8 x i32> poison, <8 x i32> <i32 6, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
----------------
davemgreen wrote:
Hello. Am I right in saying that the load is not included in the cost? It will be difficult to beat scalarization of the load if this doesn't lead to other optimizations, but I suspect that is not really what the costs are measuring.
Maybe @fhahn remembers more about this specific case. The change you have (passing the mask to the shuffle cost) seems like a sensible optimization. There is a comment that says:
```
// Aggressively form a vector op if the cost is equal because the transform
// may enable further optimization.
// Codegen can reverse this transform (scalarize) if it was not profitable.
```
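For reference, the mask being passed to the shuffle cost is the one visible in the updated CHECK line: lane 6 is moved into lane 0 and every other lane is poison. Giving the cost model this concrete mask, rather than costing an arbitrary shuffle, presumably lets the target see that only a single lane moves. The sketch below only illustrates how such a mask is laid out; `makeShiftMask` is a hypothetical helper and `PoisonLane` is a local stand-in for LLVM's `PoisonMaskElem` (-1), not the code in the PR.

```cpp
#include <cassert>
#include <vector>

// Local stand-in for LLVM's PoisonMaskElem (-1): a "don't care" lane.
constexpr int PoisonLane = -1;

// Hypothetical helper (not an LLVM API): build a single-lane "shift" mask
// that moves lane OldIndex of a NumElts-wide vector into lane NewIndex and
// leaves every other lane poison.
std::vector<int> makeShiftMask(unsigned NumElts, unsigned OldIndex,
                               unsigned NewIndex) {
  assert(OldIndex < NumElts && NewIndex < NumElts && "lane out of range");
  std::vector<int> Mask(NumElts, PoisonLane);
  Mask[NewIndex] = static_cast<int>(OldIndex);
  return Mask;
}
```

For the test above, `makeShiftMask(8, 6, 0)` yields `{6, -1, -1, -1, -1, -1, -1, -1}`, which corresponds to `<i32 6, i32 poison, ...>` in the CHECK line.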
Maybe the backend should be more aggressive at scalarizing. It looks like the costs here should be extract-lane-0 + extract-lane-2 (2) + i32 add (1) for the scalar form vs. extract-lane-0 + shuffle (1 now?) + v8i32 add (2) for the vector form, which makes the two sides equal. If the cost model could realize that the final v8i32 add was actually a v4i32 add, that might be more accurate (if I have those costs correct).
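The cost comparison above can be written out as a small compile-time sketch. It uses only the numbers quoted in the comment; the lane-0 extract cost is not stated, so it is a free parameter (it appears on both sides and cancels), and costing a v4i32 add at 1 is an assumption. All names are hypothetical, and ties go to the vector form, following the quoted VectorCombine comment.

```cpp
// Hypothetical per-instruction costs for the two forms discussed above.
struct ExtractExtractCosts {
  int ExtractLane0; // not stated in the comment; common to both forms
  int ExtractLane2; // "extract-lane-2" in the comment: 2
  int ScalarAdd;    // i32 add: 1
  int Shuffle;      // with the mask: 1 ("1 now?")
  int VectorAdd;    // v8i32 add: 2
};

// Scalar form: two extracts feeding an i32 add.
constexpr int scalarFormCost(ExtractExtractCosts C) {
  return C.ExtractLane0 + C.ExtractLane2 + C.ScalarAdd;
}

// Vector form: shuffle and vector add, then a single lane-0 extract.
constexpr int vectorFormCost(ExtractExtractCosts C) {
  return C.ExtractLane0 + C.Shuffle + C.VectorAdd;
}

// Ties go to the vector form ("Aggressively form a vector op if the cost is equal").
constexpr bool prefersVectorForm(ExtractExtractCosts C) {
  return vectorFormCost(C) <= scalarFormCost(C);
}

// Quoted costs, with the lane-0 extract set to 0 because it cancels.
constexpr ExtractExtractCosts QuotedCosts{0, 2, 1, 1, 2};
static_assert(scalarFormCost(QuotedCosts) == vectorFormCost(QuotedCosts),
              "both forms cost 3, so it is a tie");
static_assert(prefersVectorForm(QuotedCosts), "the tie goes to the vector form");

// Assumption: the add is costed as a v4i32 add at 1 instead of 2.
constexpr ExtractExtractCosts AsV4I32Add{0, 2, 1, 1, 1};
static_assert(vectorFormCost(AsV4I32Add) == 2, "vector form drops to 2");
```

Changing `ExtractLane0` shifts both totals by the same amount, so it never changes which form is preferred.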
https://github.com/llvm/llvm-project/pull/114780