[LLVMdev] X86TargetLowering::LowerToBT

Mon Jan 19 10:16:04 PST 2015

On Sun, Jan 18, 2015 at 5:13 PM, Chris Sears <chris.sears at gmail.com> wrote:
> I'm tracking down an X86 code generation malfeasance regarding BT (bit test)
> and I have some questions.
>
> This IR matches and then X86TargetLowering::LowerToBT is called:
>
> %and = and i64 %shl, %val      ; (val & (1 << index)) != 0
>                                ; bit test with a register index
>
>
> This IR does not match and so X86TargetLowering::LowerToBT is not called:
>
> %and = lshr i64 %val, 25          ; (val & (1 << 25)) != 0
>                                   ; bit test with an immediate index
>
> %conv = and i64 %and, 1
>
>
> Let's back that up a bit. Clang emits this IR. These expressions start out
> life in C as an AND with a left-shifted masking bit, and are then converted
> into IR as right-shifted values ANDed with a masking bit.
>
> This IR then remains untouched until Expand ISel Pseudo-instructions in llc
> (-O3). At that point, LowerToBT is called on the REGISTER version and
> substitutes in a BT reg,reg instruction:
>
> btq %rsi, %rdi                          ## <MCInst #312 BT64rr
>
>
> The IMMEDIATE version doesn't match the pattern and so LowerToBT is not
> called.
>
> Question: This is during pseudo instruction expansion. How could LowerToBT's
> caller have enough context to match the immediate IR version? In fact, lli
> isn't calling LowerToBT so it isn't matching. But isn't this really a
> peephole optimization issue?
>
> LLVM has a generic peephole optimizer, CodeGen/PeepholeOptimizer.cpp which
> has exactly one subclass in NVPTXTargetMachine.cpp.
>
> But isn't it better to deal with X86 LowerToBT in a PeepholeOptimizer
> subclass where you have a small window of instructions rather than during
> pseudo instruction expansion where you have really one instruction?
> PeepholeOptimizer doesn't seem to be getting much attention and certainly no
> attention at the subclass level.
>
> Bluntly, expansion is about expansion. Peephole optimization is the
> opposite.
>
> Question: Regardless, why is LowerToBT not being called for the IMMEDIATE
> version? I suppose you could look at the preceding instruction in the DAG.
> That seems a bit hacky.
>
> Another approach using LowerToBT would be to match lshr reg/imm first and
> then if the following instruction was an and reg,1 replace both with a BT.
> It doesn't look like LowerToBT as is can do that right now since it is
> matching the and instruction.

I think it's actually matching the comparison: LowerToBT is called by
LowerSetCC, which has a comment saying:

  // Optimize to BT if possible.
  // Lower (X & (1 << N)) == 0 to BT(X, N).
  // Lower ((X >>u N) & 1) != 0 to BT(X, N).
  // Lower ((X >>s N) & 1) != 0 to BT(X, N).

This doesn't match the immediate/LSHR version, because the ANDed
result is returned directly, and there's no comparison with 0.
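To make the distinction concrete, here is a minimal C++ sketch of the source-level forms involved. The function names are hypothetical and chosen only for this illustration; the point is that the first two end in a comparison against zero (the shape the LowerSetCC comment describes), while the third returns the ANDed value directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, named for this sketch only.

// Mask form with a comparison: (val & (1 << index)) != 0.
bool testBitMask(uint64_t val, unsigned index) {
  return (val & (uint64_t{1} << index)) != 0;
}

// Shift form with a comparison: ((val >> index) & 1) != 0.
bool testBitShift(uint64_t val, unsigned index) {
  return ((val >> index) & 1) != 0;
}

// Immediate form as in the quoted IR: lshr by 25, then and with 1.
// The ANDed result is returned as an integer; there is no compare with 0.
uint64_t extractBit25(uint64_t val) {
  return (val >> 25) & 1;
}

// Self-check: the forms agree on every valid index (0..63).
void checkForms(uint64_t val) {
  for (unsigned i = 0; i < 64; ++i)
    assert(testBitMask(val, i) == testBitShift(val, i));
  assert(extractBit25(val) == (testBitMask(val, 25) ? 1u : 0u));
}
```

All three compute the same bit, so the difference is only in the shape of the expression that reaches instruction selection.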

If it is indeed profitable to generate the BT (a quick glance at
Agner's tables for Merom/Haswell shows it probably is), I would start
by looking at PerformAndCombine, to replace the two nodes with an
X86ISD::BT.
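
As a rough illustration of the shape of such a combine, here is a toy sketch in C++. The node types and the combineAnd function are hypothetical and are not LLVM's SelectionDAG API; the toy BitTest node simply yields 0 or 1, whereas a real lowering would also have to turn the flag that BT sets into a value.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy expression nodes (hypothetical; NOT LLVM's SDNode classes).
enum class Op { Value, Const, Srl, And, BitTest };

struct Node {
  Op op;
  uint64_t imm;                    // payload, used by Const only
  std::shared_ptr<Node> lhs, rhs;  // operands of Srl, And, BitTest
  Node(Op o, uint64_t i, std::shared_ptr<Node> l, std::shared_ptr<Node> r)
      : op(o), imm(i), lhs(l), rhs(r) {}
};
using NodePtr = std::shared_ptr<Node>;

NodePtr node(Op op, uint64_t imm, NodePtr l, NodePtr r) {
  return std::make_shared<Node>(op, imm, l, r);
}
NodePtr value() { return node(Op::Value, 0, NodePtr(), NodePtr()); }
NodePtr constant(uint64_t c) { return node(Op::Const, c, NodePtr(), NodePtr()); }

// Evaluate the tree with x substituted for the single Value leaf.
uint64_t eval(const NodePtr &n, uint64_t x) {
  switch (n->op) {
  case Op::Value: return x;
  case Op::Const: return n->imm;
  case Op::And:   return eval(n->lhs, x) & eval(n->rhs, x);
  case Op::Srl:
  case Op::BitTest: {
    uint64_t amt = eval(n->rhs, x);
    assert(amt < 64 && "shift amount out of range");
    uint64_t shifted = eval(n->lhs, x) >> amt;
    return n->op == Op::Srl ? shifted : (shifted & 1);
  }
  }
  return 0;
}

bool isConstant(const NodePtr &n, uint64_t c) {
  return n->op == Op::Const && n->imm == c;
}

// Match And(Srl(X, N), 1) and replace the two nodes with BitTest(X, N).
// Any other node is returned unchanged.
NodePtr combineAnd(const NodePtr &n) {
  if (n->op == Op::And && isConstant(n->rhs, 1) && n->lhs->op == Op::Srl)
    return node(Op::BitTest, 0, n->lhs->lhs, n->lhs->rhs);
  return n;
}
```

For example, building And(Srl(value, 25), 1) and passing it to combineAnd yields a BitTest node that evaluates to the same result for every input, while an unrelated And such as And(value, 3) is left alone.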

-Ahmed

> SDValue X86TargetLowering::LowerToBT(SDValue And, ISD::CondCode CC, SDLoc
> dl, SelectionDAG &DAG) const { ... }
>
>
> But I think this is better done in a subclass of
> CodeGen/PeepholeOptimizer.cpp.
>
> thanks.