[PATCH] D145299: [X86] Generate better code for std::bit_ceil
Kazu Hirata via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Mar 4 12:37:47 PST 2023
kazu marked an inline comment as done.
kazu added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47340
+ if (Add.getOperand(0) != Cond.getOperand(0) ||
+ !isAllOnesConstant(Add.getOperand(1)))
+ return SDValue();
----------------
goldstein.w.n wrote:
> This all seems a bit a bit brittle, for example if someone does `std::bit_ceil(x + 1)` we won't have the `add` on `ctlz`:
>
> ```
> %2 = tail call i32 @llvm.ctlz.i32(i32 %0, i1 false), !range !5
> %3 = sub nuw nsw i32 32, %2
> %4 = shl nuw i32 1, %3
> %5 = add i32 %0, -1
> %6 = icmp ult i32 %5, -2
> %7 = select i1 %6, i32 %4, i32 1
> ret i32 %7
> ```
>
> is there a way we could more generically detect that the `select` is unneeded?
Yes, brittleness is a real concern.
I could calculate the difference between the `ctlz` argument and `icmp` argument.
================
Comment at: llvm/test/CodeGen/X86/bit_ceil.ll:2
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi2,+lzcnt | FileCheck %s
----------------
goldstein.w.n wrote:
> Can you add tests w.o `bmi` and w.o `lzcnt` enabled?
Sure. I just added:
https://github.com/llvm/llvm-project/commit/ccc849e0b1a4acb5fb16b0d0ddb63744fb0faec9
================
Comment at: llvm/test/CodeGen/X86/bit_ceil.ll:36
-; CHECK-NEXT: cmpq $2, %rdi
-; CHECK-NEXT: cmovbq %rcx, %rax
; CHECK-NEXT: retq
----------------
goldstein.w.n wrote:
> Is the select at the end just unneeded? If so this would probably be better addressed in the IR or `DAGCombiner` as its a pretty generic. Maybe `DAGCombiner::VisitSETCC`?
Correct, we do not need the `select` at the end.
We do rely on a subtle property that `shlx` masks off the shift count with 31 or 63, depending on the operand size. I generate `ISD::AND` with 31 (or 63), expecting that the hardware shift instruction absorbs it.
At the LLVM IR level, the result of the shift instruction is undefined because the final `select` picks 1 for inputs < 2.
Yes, we could do this in the IR or DAG combiner somewhere. Let me look into it.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145299/new/
https://reviews.llvm.org/D145299
More information about the llvm-commits
mailing list