[PATCH] D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 6 03:40:06 PDT 2022
RKSimon accepted this revision.
RKSimon added a comment.
This revision is now accepted and ready to land.
LGTM - cheers
================
Comment at: llvm/test/CodeGen/X86/slow-pmulld.ll:284
; AVX2-64-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
-; AVX2-64-NEXT: vpbroadcastd {{.*#+}} ymm2 = [18778,18778,18778,18778,18778,18778,18778,18778]
+; AVX2-64-NEXT: vmovdqa {{.*#+}} ymm2 = <18778,u,18778,u,18778,u,18778,u,18778,u,18778,u,18778,u,18778,u>
; AVX2-64-NEXT: vpmaddwd %ymm2, %ymm0, %ymm0
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > With AVX1, we can only broadcast i32 load to XMM/YMM, and i64 to YMM,
> > but with AVX2 we can broadcast i8/i16/i32 load to XMM/YMM.
> > Is `lowerBuildVectorAsBroadcast()` intentionally not doing that,
> > because such i8/i16 broadcasts are slow, or is that a bug?
> ~~but with AVX2 we can broadcast i8/i16/i32 load to XMM/YMM.~~
> but with AVX2 we can broadcast i8/i16/i32/i64 load to XMM/YMM.
>
It's one of the many annoyances of lowering constant broadcasts that I mentioned on https://github.com/llvm/llvm-project/issues/54743 - I think this is because AVX512 doesn't have many ops that can fold an i8/i16 broadcast from memory? Let's accept it for now.
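For readers following the test diff, here is a rough scalar sketch of why the odd i16 elements of the constant can become `u` (undef) at all. It assumes the usual vpmaddwd semantics (multiply corresponding signed i16 elements, then add each adjacent pair into an i32 lane) and that vpmovzxbd leaves the upper i16 half of every i32 lane zero. All function names below are hypothetical and only model the instructions; this is not the code from the patch.

```cpp
#include <array>
#include <cstdint>

// Scalar model of a 256-bit vpmaddwd: multiply corresponding signed i16
// elements, then add each adjacent pair of products into one i32 lane.
std::array<int32_t, 8> pmaddwd_model(const std::array<int16_t, 16> &a,
                                     const std::array<int16_t, 16> &b) {
  std::array<int32_t, 8> r{};
  for (int i = 0; i < 8; ++i)
    r[i] = int32_t(a[2 * i]) * b[2 * i] + int32_t(a[2 * i + 1]) * b[2 * i + 1];
  return r;
}

// Model of the vpmovzxbd result viewed as 16 x i16: each byte is
// zero-extended to i32, so every odd i16 element is known to be zero.
std::array<int16_t, 16> zext_bytes_as_i16(const std::array<uint8_t, 8> &bytes) {
  std::array<int16_t, 16> r{};
  for (int i = 0; i < 8; ++i)
    r[2 * i] = bytes[i]; // r[2 * i + 1] stays zero
  return r;
}

// The multiplier constant: 18778 in even i16 elements, and an arbitrary
// value standing in for "undef" in the odd ones.
std::array<int16_t, 16> make_constant(int16_t odd_fill) {
  std::array<int16_t, 16> c{};
  for (int i = 0; i < 8; ++i) {
    c[2 * i] = 18778;
    c[2 * i + 1] = odd_fill;
  }
  return c;
}

// Hypothetical check: the result must not depend on the odd elements of
// the constant, because they are always multiplied by a known zero.
bool odd_lanes_are_dont_care() {
  const std::array<uint8_t, 8> bytes = {0, 1, 2, 127, 128, 200, 254, 255};
  const auto lhs = zext_bytes_as_i16(bytes);
  return pmaddwd_model(lhs, make_constant(0)) ==
         pmaddwd_model(lhs, make_constant(-12345));
}
```

With odd elements of 0, the constant is the old `[18778,18778,...]` i32 splat. Any other odd fill gives the same product in this model, which is consistent with the new `<18778,u,...>` form in the CHECK line. Once the odd elements are undef, the constant is no longer matched as an i32 splat, hence the vmovdqa instead of vpbroadcastd.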
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123163/new/
https://reviews.llvm.org/D123163