[all-commits] [llvm/llvm-project] 913d7a: [X86][SSE2] Use smarter instruction patterns for l...

Simon Pilgrim via All-commits all-commits at lists.llvm.org
Sun Oct 11 03:21:53 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 913d7a110efaad06888523d17e03b2833fc83ed2
      https://github.com/llvm/llvm-project/commit/913d7a110efaad06888523d17e03b2833fc83ed2
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-10-11 (Sun, 11 Oct 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/arith-uminmax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
    M llvm/test/CodeGen/X86/machine-combiner-int-vec.ll
    M llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
    M llvm/test/CodeGen/X86/midpoint-int-vec-128.ll
    M llvm/test/CodeGen/X86/sat-add.ll
    M llvm/test/CodeGen/X86/umax.ll
    M llvm/test/CodeGen/X86/umin.ll
    M llvm/test/CodeGen/X86/vec_minmax_uint.ll
    M llvm/test/CodeGen/X86/vector-reduce-umax.ll
    M llvm/test/CodeGen/X86/vector-reduce-umin.ll
    M llvm/test/CodeGen/X86/vector-trunc-usat.ll
    M llvm/test/CodeGen/X86/vselect-minmax.ll

  Log Message:
  -----------
  [X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16.

This is my first LLVM patch, so please tell me if there are any process issues.

The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by turning the signbit of both inputs and the output which turns the unsigned minimum/maximum into a signed one.

We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this is the needs one large move instruction. It's just that the sign bit turning has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single use case, this patch no longer proposes this.

Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However independent of that I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweighs the downsides in that specific case.

Patch By: @TomHender (Tom Hender) ActuallyaDeviloper

Differential Revision: https://reviews.llvm.org/D87236




More information about the All-commits mailing list