[all-commits] [llvm/llvm-project] cc65a7: [X86] Improve i8 + 'slow' i16 funnel shift codegen
Simon Pilgrim via All-commits
all-commits at lists.llvm.org
Sun May 24 00:09:29 PDT 2020
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: cc65a7a5ea81f8cb5068e99d9bf407745623c624
https://github.com/llvm/llvm-project/commit/cc65a7a5ea81f8cb5068e99d9bf407745623c624
Author: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: 2020-05-24 (Sun, 24 May 2020)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/fshl.ll
M llvm/test/CodeGen/X86/fshr.ll
M llvm/test/CodeGen/X86/rotate-extract.ll
Log Message:
-----------
[X86] Improve i8 + 'slow' i16 funnel shift codegen
This is a preliminary patch before I deal with the xor+and issue raised in D77301.
We get much better code for i8/i16 funnel shifts by concatenating the operands together and performing the shift as a double-width type; it avoids repeated use of the shift amount and partial registers.
fshl(x,y,z) -> (((zext(x) << bw) | zext(y)) << (z & (bw-1))) >> bw.
fshr(x,y,z) -> ((zext(x) << bw) | zext(y)) >> (z & (bw-1)).
Alive2: http://volta.cs.utah.edu:8080/z/CZx7Cn
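For readers who want to sanity-check the formulas above, here is a small standalone C++ sketch (not part of the patch - the actual lowering lives in X86ISelLowering.cpp, and all function names here are hypothetical) that exhaustively verifies the widened i8 forms against reference funnel-shift definitions:

#include <cassert>
#include <cstdint>

// Reference i8 funnel shifts, following LLVM's fshl/fshr semantics
// (the shift amount is taken modulo the bit width).
static uint8_t fshl8Ref(uint8_t X, uint8_t Y, uint8_t Z) {
  unsigned S = Z & 7;
  return S ? uint8_t((X << S) | (Y >> (8 - S))) : X;
}

static uint8_t fshr8Ref(uint8_t X, uint8_t Y, uint8_t Z) {
  unsigned S = Z & 7;
  return S ? uint8_t((X << (8 - S)) | (Y >> S)) : Y;
}

// Widened forms from the formulas above: concatenate x:y into an i16
// and perform a single double-width shift, so the shift amount is only
// used once and no partial registers are needed.
static uint8_t fshl8Wide(uint8_t X, uint8_t Y, uint8_t Z) {
  uint16_t XY = uint16_t((unsigned(X) << 8) | Y);
  return uint8_t(uint16_t(XY << (Z & 7)) >> 8); // take the upper half
}

static uint8_t fshr8Wide(uint8_t X, uint8_t Y, uint8_t Z) {
  uint16_t XY = uint16_t((unsigned(X) << 8) | Y);
  return uint8_t(XY >> (Z & 7)); // truncate to the lower half
}

int main() {
  // Exhaustive check over all i8 operands and shift amounts.
  for (unsigned X = 0; X != 256; ++X)
    for (unsigned Y = 0; Y != 256; ++Y)
      for (unsigned Z = 0; Z != 256; ++Z) {
        assert(fshl8Wide(X, Y, Z) == fshl8Ref(X, Y, Z));
        assert(fshr8Wide(X, Y, Z) == fshr8Ref(X, Y, Z));
      }
  return 0;
}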
This doesn't do as well for i32 cases on x86_64 (the xor+and followup patch is much better), so I haven't bothered with that.
Cases with constant amounts are more dubious as well, so I haven't currently bothered with those - it's these kinds of 'edge' cases that put me off trying to put this in TargetLowering::expandFunnelShift.
Differential Revision: https://reviews.llvm.org/D80466