[PATCH] D140069: [DAGCombiner] Scalarize vectorized loads that are splatted

Thu Dec 15 03:41:31 PST 2022

luke added a comment.

In D140069#3997198 <https://reviews.llvm.org/D140069#3997198>, @pengfei wrote:

> Checked changes in X86 tests are all correct.

Thanks! It looks like some other X86 tests are still failing; I'll try to address them too.

On AArch64, some tests in `arm64-dup.ll` are failing, for example:

  define <8 x i8> @vduplane8(<8 x i8>* %A) nounwind {
  ; CHECK-LABEL: vduplane8:
  ; CHECK:       // %bb.0:
  ; CHECK-NEXT:    ldr d0, [x0]
  ; CHECK-NEXT:    dup.8b v0, v0[1]
  ; CHECK-NEXT:    ret
  	%tmp1 = load <8 x i8>, <8 x i8>* %A
  	%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> < i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1 >
  	ret <8 x i8> %tmp2
  }

This is now selected as a broadcast load (`ld1r`) from `x0 + 1`:

  vduplane8:                              // @vduplane8
  // %bb.0:
  	add	x8, x0, #1
  	ld1r.8b	{ v0 }, [x8]
  	ret
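The equivalence this combine relies on can be modelled in a few lines of C++. This is a hedged sketch, not the DAGCombiner code: `V8i8`, `splatOfVectorLoad` and `broadcastOfScalarLoad` are hypothetical names, memory is a plain byte buffer, and the legality checks a real combine would need are ignored. It shows that splatting lane `Lane` of a loaded `<8 x i8>` gives the same result as loading the single element at byte offset `Lane * sizeof(element)` and broadcasting it, which is where the `add x8, x0, #1` comes from for lane 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of an <8 x i8> vector value.
using V8i8 = std::array<std::uint8_t, 8>;

// Original form (as in @vduplane8): load the whole vector, then splat lane
// `Lane` with an all-`Lane` shuffle mask (ldr d0, [x0]; dup.8b v0, v0[Lane]).
V8i8 splatOfVectorLoad(const std::uint8_t *A, std::size_t Lane) {
  assert(Lane < 8 && "lane out of range");
  V8i8 Loaded;
  std::memcpy(Loaded.data(), A, Loaded.size()); // full 8-byte vector load
  V8i8 Result;
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = Loaded[Lane]; // every mask element selects lane `Lane`
  return Result;
}

// Scalarized form: load only the splatted element, found at byte offset
// Lane * sizeof(element) from A, and broadcast it (add x8, x0, #1;
// ld1r.8b {v0}, [x8] when Lane == 1).
V8i8 broadcastOfScalarLoad(const std::uint8_t *A, std::size_t Lane) {
  assert(Lane < 8 && "lane out of range");
  const std::size_t Offset = Lane * sizeof(std::uint8_t);
  const std::uint8_t Elt = A[Offset]; // single-element scalar load
  V8i8 Result;
  Result.fill(Elt); // replicate into all 8 lanes
  return Result;
}
```

Both functions agree for every in-range lane; the scalarized form only touches one byte of memory, which is why it can be selected as `ld1r` instead of a full vector load plus `dup`.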

I'm not familiar with AArch64, but would it be possible to fold the offset into the load instead, like this?

  vduplane8:                              // @vduplane8
  // %bb.0:
  	ld1r.8b	{ v0 }, [x0], #1
  	ret
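One detail worth checking in the form above: `[x0], #1` is post-index addressing. Under the usual AArch64 semantics, which this hypothetical C++ model assumes (`ld1rNoOffset` and `ld1rPostIndex` are illustrative names only), the load reads from the unmodified `x0` and `x0 + 1` is written back to the base register only afterwards, so this form would broadcast lane 0 rather than lane 1. `ld1r` accepts only a plain base register or post-index addressing, so the offset cannot be folded into the access address this way.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: an <8 x i8> result register and a base register that
// holds a pointer into a byte buffer.
using V8i8 = std::array<std::uint8_t, 8>;

// ld1r.8b {v0}, [xN]: load one byte from xN and replicate it into all lanes.
V8i8 ld1rNoOffset(const std::uint8_t *Xn) {
  assert(Xn != nullptr);
  V8i8 V;
  V.fill(*Xn);
  return V;
}

// ld1r.8b {v0}, [xN], #1 (post-index): the load still reads from the old
// xN; the base register is only incremented afterwards (write-back).
V8i8 ld1rPostIndex(const std::uint8_t *&Xn) {
  V8i8 V = ld1rNoOffset(Xn);
  Xn += 1;
  return V;
}
```

In this model, the current output (`add x8, x0, #1` followed by `ld1r.8b {v0}, [x8]`) corresponds to `ld1rNoOffset(A + 1)`, while the proposed post-indexed form corresponds to `ld1rPostIndex(A)`, which loads `A[0]`.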

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D140069