[llvm] [WebAssembly] Enable interleaved memory accesses (PR #125696)

Sam Parker via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 17 00:53:43 PST 2025


================
@@ -0,0 +1,361 @@
+; RUN: opt -mattr=+simd128 -passes=loop-vectorize %s | llc -mtriple=wasm32 -mattr=+simd128 -verify-machineinstrs -o - | FileCheck %s
+
+target triple = "wasm32"
+target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-i128:128-n32:64-S128-ni:1:10:20"
+
+%struct.Output32x2 = type { i32, i32 }
+%struct.Input8x2 = type { i8, i8 }
+%struct.Output32x4 = type { i32, i32, i32, i32 }
+%struct.Input8x4 = type { i8, i8, i8, i8 }
+%struct.Input16x2 = type { i16, i16 }
+%struct.Input16x4 = type { i16, i16, i16, i16 }
+%struct.Input32x2 = type { i32, i32 }
+%struct.Input32x4 = type { i32, i32, i32, i32 }
+
+; Function Attrs: nofree norecurse nosync nounwind memory(argmem: readwrite)
+define hidden void @accumulate8x2(ptr dead_on_unwind noalias writable sret(%struct.Output32x2) align 4 captures(none) %0, ptr noundef readonly captures(none) %1, i32 noundef %2) local_unnamed_addr #0 {
+; CHECK-LABEL: accumulate8x2:
+; CHECK: loop
+; CHECK: v128.load64_zero
+; CHECK: i8x16.shuffle 1, 3, 5, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+; CHECK: i16x8.extend_low_i8x16_u
+; CHECK: i32x4.extend_low_i16x8_u
----------------
sparker-arm wrote:

The loop accumulates each field of the struct separately. Here the vectorizer is choosing a factor of 8, but the arithmetic is being done four lanes at a time. So these shuffle patterns deinterleave four elements at a time, so that we can accumulate the alternating values.

I doubt the shuffles are great for anyone really, so I have a generic optimisation (which I need to re-commit to V8) to simplify the lowering of shuffles like this. But, as I said in the description, the potential uplift from choosing a higher vectorization factor should outweigh sub-optimal shuffle lowering. 
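
For illustration, here is a rough C sketch of the kind of loop that `accumulate8x2` appears to represent, plus a scalar model of what one deinterleaving step does. The C source is not part of this excerpt, so the field names, the unsigned types (guessed from the `_u` extends in the CHECK lines), the `size_t` count and the `deinterleave4` helper are all assumptions rather than the actual test input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C equivalents of %struct.Input8x2 and %struct.Output32x2.
   Field names and signedness are assumptions. */
typedef struct { uint8_t a, b; } Input8x2;
typedef struct { uint32_t a, b; } Output32x2;

/* Scalar loop: each field of the input struct is summed into its own
   32-bit accumulator, so the memory access is interleaved with stride 2. */
Output32x2 accumulate8x2(const Input8x2 *in, size_t n) {
    Output32x2 out = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        out.a += in[i].a;
        out.b += in[i].b;
    }
    return out;
}

/* Scalar model of one deinterleave step (hypothetical helper): 8 loaded
   bytes hold 4 structs; even bytes are field a, odd bytes are field b.
   Each byte is zero-extended to 32 bits, mirroring the two extend_low
   steps in the CHECK lines. */
void deinterleave4(const uint8_t bytes[8], uint32_t even[4], uint32_t odd[4]) {
    for (int lane = 0; lane < 4; ++lane) {
        even[lane] = bytes[2 * lane];
        odd[lane] = bytes[2 * lane + 1];
    }
}
```

In this model, the `i8x16.shuffle 1, 3, 5, 7, ...` mask in the test corresponds to gathering the `odd` lanes from the 8 bytes brought in by `v128.load64_zero`.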

https://github.com/llvm/llvm-project/pull/125696

