[all-commits] [llvm/llvm-project] 437958: [X86] Improve SMULO/UMULO codegen for vXi8 vectors.

Wed Mar 31 10:14:12 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 437958d9fdb63779c10499befd7fb6ef67418a5f
      https://github.com/llvm/llvm-project/commit/437958d9fdb63779c10499befd7fb6ef67418a5f
  Author: Craig Topper <craig.topper at sifive.com>
  Date:   2021-03-31 (Wed, 31 Mar 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/prefer-avx256-mulo.ll
    M llvm/test/CodeGen/X86/vec_smulo.ll
    M llvm/test/CodeGen/X86/vec_umulo.ll

  Log Message:
  -----------
  [X86] Improve SMULO/UMULO codegen for vXi8 vectors.

The default expansion creates a MUL and either a MULHS/MULHU. Each
of those separately expand to sequences that use one or more
PMULLW instructions as well as additional instructions to
extend the types to vXi16. The MULHS/MULHU expansion computes the
whole 16-bit product, but only keeps the high part.

We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU
expansion, but keep both the high and low parts. And we can use
those parts to calculate the overflow.

For AVX512 we might have vXi1 overflow outputs. We can improve those by using
vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better
than truncating the high result to use vpcmpeqb. If we don't have avx512bw we
can extend up to v16i32 to use vpcmpeqd to produce a k register.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97624