[PATCH] D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 6 03:32:51 PDT 2022
lebedev.ri added inline comments.
================
Comment at: llvm/test/CodeGen/X86/slow-pmulld.ll:284
; AVX2-64-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
-; AVX2-64-NEXT: vpbroadcastd {{.*#+}} ymm2 = [18778,18778,18778,18778,18778,18778,18778,18778]
+; AVX2-64-NEXT: vmovdqa {{.*#+}} ymm2 = <18778,u,18778,u,18778,u,18778,u,18778,u,18778,u,18778,u,18778,u>
; AVX2-64-NEXT: vpmaddwd %ymm2, %ymm0, %ymm0
----------------
lebedev.ri wrote:
> With AVX1, we can only broadcast i32 load to XMM/YMM, and i64 to YMM,
> but with AVX2 we can broadcast i8/i16/i32 load to XMM/YMM.
> Is `lowerBuildVectorAsBroadcast()` intentionally not doing that,
> because such i8/i16 broadcasts are slow, or is that a bug?
~~but with AVX2 we can broadcast i8/i16/i32 load to XMM/YMM.~~
but with AVX2 we can broadcast i8/i16/i32/i64 load to XMM/YMM.
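
For context on the hunk above, here is a minimal Python model (not LLVM code) of why the odd i16 lanes of the constant can be `u` without changing the `vpmaddwd` result. The operand produced by `vpmovzxbd` has zeros in every odd i16 lane, so whatever sits in the matching lanes of the constant is multiplied by zero. The helper names `pmaddwd` and `zext_bytes_to_i16_lanes` are hypothetical, the input bytes are made up, and `-12345` is just an arbitrary stand-in for an undef lane.

```python
def pmaddwd(a, b):
    """Model of (v)pmaddwd on lists of signed i16 lanes.

    Multiplies lanes pairwise and adds each adjacent pair of products
    into one i32 lane (wrapping to 32 bits).
    """
    assert len(a) == len(b) and len(a) % 2 == 0

    def wrap32(x):
        x &= 0xFFFFFFFF
        return x - (1 << 32) if x & (1 << 31) else x

    return [wrap32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]


def zext_bytes_to_i16_lanes(values):
    """Model of vpmovzxbd, viewed as i16 lanes: each byte becomes [byte, 0]."""
    lanes = []
    for v in values:
        assert 0 <= v <= 255
        lanes += [v, 0]
    return lanes


src = [0, 1, 2, 127, 128, 200, 254, 255]   # made-up input bytes
x = zext_bytes_to_i16_lanes(src)

# Old constant: i32 broadcast of 18778, i.e. i16 lanes [18778, 0] repeated.
broadcast = [18778, 0] * 8
# New constant: <18778,u,...>; an arbitrary value stands in for each 'u'.
with_undef = [18778, -12345] * 8

expected = [v * 18778 for v in src]
assert pmaddwd(x, broadcast) == expected
assert pmaddwd(x, with_undef) == expected
```

This only models the arithmetic. It says nothing about whether a full-width `vmovdqa` constant load is preferable to a `vpbroadcastd`, which is the question raised above.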
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123163/new/
https://reviews.llvm.org/D123163