[PATCH] D145299: [X86] Generate better code for std::bit_ceil

Kazu Hirata via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Mar 4 12:37:47 PST 2023


kazu marked an inline comment as done.
kazu added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47340
+  if (Add.getOperand(0) != Cond.getOperand(0) ||
+      !isAllOnesConstant(Add.getOperand(1)))
+    return SDValue();
----------------
goldstein.w.n wrote:
> This all seems a bit a bit brittle, for example if someone does `std::bit_ceil(x + 1)` we won't have the `add` on `ctlz`:
> 
> ```
>   %2 = tail call i32 @llvm.ctlz.i32(i32 %0, i1 false), !range !5
>   %3 = sub nuw nsw i32 32, %2
>   %4 = shl nuw i32 1, %3
>   %5 = add i32 %0, -1
>   %6 = icmp ult i32 %5, -2
>   %7 = select i1 %6, i32 %4, i32 1
>   ret i32 %7
> ```
> 
> is there a way we could more generically detect that the `select` is unneeded?
Yes, brittleness is a real concern.

I could calculate the difference between the `ctlz` argument and `icmp` argument.


================
Comment at: llvm/test/CodeGen/X86/bit_ceil.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi2,+lzcnt | FileCheck %s
 
----------------
goldstein.w.n wrote:
> Can you add tests w.o `bmi` and w.o `lzcnt` enabled?
Sure.  I just added:

https://github.com/llvm/llvm-project/commit/ccc849e0b1a4acb5fb16b0d0ddb63744fb0faec9


================
Comment at: llvm/test/CodeGen/X86/bit_ceil.ll:36
-; CHECK-NEXT:    cmpq $2, %rdi
-; CHECK-NEXT:    cmovbq %rcx, %rax
 ; CHECK-NEXT:    retq
----------------
goldstein.w.n wrote:
> Is the select at the end just unneeded? If so this would probably be better addressed in the IR or `DAGCombiner` as its a pretty generic. Maybe `DAGCombiner::VisitSETCC`? 
Correct, we do not need the `select` at the end.

We do rely on a subtle property that `shlx` masks off the shift count with 31 or 63, depending on the operand size.  I generate `ISD::AND` with 31 (or 63), expecting that the hardware shift instruction absorbs it.

At the LLVM IR level, the result of the shift instruction is undefined because the final `select` picks 1 for inputs < 2.

Yes, we could do this in the IR or DAG combiner somewhere.  Let me look into it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145299/new/

https://reviews.llvm.org/D145299



More information about the llvm-commits mailing list