[PATCH] D79652: [X86] Remove the v16i8->v16i16 path for MULHS with AVX2.

Tue May 12 10:44:09 PDT 2020

craig.topper marked an inline comment as done.
craig.topper added inline comments.

================
Comment at: llvm/test/CodeGen/X86/vec_smulo.ll:2842
+; AVX512-NEXT:    vpmullw %zmm3, %zmm4, %zmm3
+; AVX512-NEXT:    vmovdqa64 {{.*#+}} zmm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
+; AVX512-NEXT:    vpandq %zmm4, %zmm3, %zmm3
----------------
RKSimon wrote:
> How come this can't be a broadcast of a smaller constant?
We ended up in the code below, which for some reason only knows to broadcast i16 when optimizing for size.

  // On Sandybridge (no AVX2), it is still better to load a constant vector
  // from the constant pool and not to broadcast it from a scalar.
  // But override that restriction when optimizing for size.
  // TODO: Check if splatting is recommended for other AVX-capable CPUs.
  if (ConstSplatVal && (Subtarget.hasAVX2() || OptForSize)) {
    EVT CVT = Ld.getValueType();
    assert(!CVT.isVector() && "Must not broadcast a vector type");

    // Splat f32, i32, v4f64, v4i64 in all cases with AVX2.
    // For size optimization, also splat v2f64 and v2i64, and for size opt
    // with AVX2, also splat i8 and i16.
    // With pattern matching, the VBROADCAST node may become a VMOVDDUP.
    if (ScalarSize == 32 || (IsGE256 && ScalarSize == 64) ||
        (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2()))) {
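
To make the gap concrete, here is a rough standalone model of just the two conditions quoted above. It is a sketch, not the actual lowering code: the SplatQuery struct and the takesQuotedBroadcastPath name are made up for illustration, the flags are plain bools standing in for the Subtarget queries, and any other broadcast paths in the real function are ignored.

```cpp
// Simplified model of the two quoted conditions only. SplatQuery and
// takesQuotedBroadcastPath are hypothetical names for this illustration.
struct SplatQuery {
  unsigned ScalarSize; // width of the splatted scalar in bits
  bool IsGE256;        // result vector is 256 bits or wider
  bool HasAVX2;        // stands in for Subtarget.hasAVX2()
  bool OptForSize;     // function is being optimized for size
};

constexpr bool takesQuotedBroadcastPath(SplatQuery Q) {
  // Outer check: AVX2 or size optimization is required at all.
  return (Q.HasAVX2 || Q.OptForSize) &&
         // Inner check: 32-bit always, 64-bit for wide vectors,
         // and i8/i16 only when optimizing for size with AVX2.
         (Q.ScalarSize == 32 || (Q.IsGE256 && Q.ScalarSize == 64) ||
          (Q.OptForSize && (Q.ScalarSize == 64 || Q.HasAVX2)));
}

// The case from the test: a 512-bit splat of i16 with AVX2 available and no
// size optimization falls through, so a full constant-pool vector is loaded.
static_assert(!takesQuotedBroadcastPath({16, true, true, false}),
              "i16 splat is not broadcast without OptForSize");
// The same splat is broadcast once size optimization is requested.
static_assert(takesQuotedBroadcastPath({16, true, true, true}),
              "i16 splat is broadcast with AVX2 and OptForSize");
// A 32-bit splat is broadcast with AVX2 regardless of OptForSize.
static_assert(takesQuotedBroadcastPath({32, true, true, false}),
              "i32 splat is broadcast with AVX2");
```

Under this model the i16 mask is rejected only by the inner check, which matches the full 32-element constant in the AVX512 test output.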

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79652/new/

https://reviews.llvm.org/D79652