[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Fri Nov 3 09:25:50 PDT 2017

kristof.beyls added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:71-73
+  // This is used to cache instruction replacement decisions within function
+  // units and across function units.
+  std::map<unsigned, bool> SIMDInstrTable;
----------------
The cache here will also need to take into account the TargetSubtargetInfo as the key into the cache.
Otherwise, wouldn't you get unexpected results for e.g. the following .ll file with a mixture of targeting different subtargets (see "target-cpu" function attributes)?

```
declare <2 x float> @llvm.fma.v2f32(<2 x float>, <2 x float>, <2 x float>)

define <2 x float> @test_vfma_lane_f32_a57(<2 x float> %a, <2 x float> %b, <2 x float> %v)
"target-cpu"="cortex-a57" {
entry:
  %lane = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
  %0 = tail call <2 x float> @llvm.fma.v2f32(<2 x float> %lane, <2 x float> %b, <2 x float> %a)
  ret <2 x float> %0
}

define <2 x float> @test_vfma_lane_f32_exynos(<2 x float> %a, <2 x float> %b, <2 x float> %v)
"target-cpu"="exynos-m1" {
entry:
  %lane = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
  %0 = tail call <2 x float> @llvm.fma.v2f32(<2 x float> %lane, <2 x float> %b, <2 x float> %a)
  ret <2 x float> %0
}
```

I have no idea how to get a stable key from TargetSubtargetInfo to use in this cache.

At this point, I'd measure the overhead of this pass e.g. using CTMark to make sure that the caching mechanism is actually needed before investing more time in it.

https://reviews.llvm.org/D38196