[all-commits] [llvm/llvm-project] 01636c: [X86] Remove the v16i8->v16i16 path for MULHS with...

Tue May 12 10:32:54 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 01636c1eeace5371fef8508c7318df9d7a25b489
      https://github.com/llvm/llvm-project/commit/01636c1eeace5371fef8508c7318df9d7a25b489
  Author: Craig Topper <craig.topper at intel.com>
  Date:   2020-05-12 (Tue, 12 May 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/vec_smulo.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-256.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll

  Log Message:
  -----------
  [X86] Remove the v16i8->v16i16 path for MULHS with AVX2.

We have a couple main strategies for legalizing MULH.

-If the vXi16 type is legal, extend to do the full i16 multiply
and then shift and truncate the results.
-Use unpcks to split each 128 bit lane into high and low halves.a

For signed we have an extra case to split a v32i8 to v16i8 and then
use the extending to v16i16 strategy.

This patch proposes to use the unpck strategy instead. Which is
what we already do for unsigned.

This seems to be 1 instruction shorter when the RHS is constant
like the idiv case. It's 1 instruction longer for the smulo case.
But we're trading cross lane shuffles for inlane shuffles and a
shift.

Differential Revision: https://reviews.llvm.org/D79652