[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

Craig Topper via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jan 18 19:29:22 PST 2023


craig.topper added a comment.

In D141798#4064142 <https://reviews.llvm.org/D141798#4064142>, @MaskRay wrote:

> `ZB_Max` is the strange mode that should be dropped, perhaps also `ZB_Undefined`.
>
> In D141798#4064114 <https://reviews.llvm.org/D141798#4064114>, @arsenm wrote:
>
>>> If you care about compilation speed, you should build LLVM with an appropriate -march= to take advantage of lzcnt and tzcnt.
>>
>> I think this is bad reasoning, nobody really uses -march
>
> I agree. The reason should be clarified that the lzcnt performance here really doesn't matter.
> Note: https://stackoverflow.com/questions/21390165/why-does-breaking-the-output-dependency-of-lzcnt-matter lzcnt/tzcnt have false dependencies on older (pre-Skylake) Intel processors. But this doesn't really matter for LLVM, at least the minor issue does not justify keeping the weird mode `ZB_Undefined`.

I'm not sure the false dependency issue is relevant here. Without -march, compilers will emit BSR/BSF and possibly a cmov to handle the zero behavior. Both BSR and BSF always have a false dependency because their behavior for a 0 input is to keep the old value of the output register. Knowing we can use lzcnt/tzcnt is always better than BSR/BSF.
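
To make the zero-handling point concrete, here is a rough sketch in portable C++. It is not LLVM code; `bsrModel`, `lzcntModel`, `clzViaBsr` and `checkZeroHandling` are hypothetical names that only model the instruction semantics described above for 32-bit inputs. The select in `clzViaBsr` stands in for the extra cmov a compiler may need when only BSR is available.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of BSR: index of the highest set bit. For a zero
// input the destination keeps its old value, modeled here by returning
// the caller-supplied OldDest. That read of the old value is the
// dependency on the output register.
constexpr uint32_t bsrModel(uint32_t X, uint32_t OldDest) {
  if (X == 0)
    return OldDest;
  uint32_t Idx = 31;
  while ((X & (1u << Idx)) == 0)
    --Idx;
  return Idx;
}

// Hypothetical model of LZCNT: defined for every input, 32 for zero.
constexpr uint32_t lzcntModel(uint32_t X) {
  uint32_t N = 0;
  for (uint32_t Mask = 1u << 31; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Count leading zeros on top of the BSR model. The conditional is the
// extra zero handling (a cmov in generated code) that LZCNT does not need.
constexpr uint32_t clzViaBsr(uint32_t X) {
  uint32_t Idx = bsrModel(X, /*OldDest=*/0);
  return X == 0 ? 32 : 31 - Idx;
}

// Both routes agree once the zero case is patched up.
inline void checkZeroHandling() {
  const uint32_t Samples[] = {0u,      1u,          2u,         0x50u,
                              0xFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    assert(clzViaBsr(X) == lzcntModel(X));
}
```

If the caller can promise a nonzero input, the conditional in `clzViaBsr` can go away, which is the case the next paragraph is about.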

If we know zero doesn't matter, then even without -march, gcc and recent clang will always emit BSF using the encoding for TZCNT. On CPUs without TZCNT, this encoding is treated as BSF. This allows us to use TZCNT at runtime if the CPU is modern, even without -march being passed. We can't do the same for BSR/LZCNT because their outputs are inverted from each other: BSR returns the index of the highest set bit, while LZCNT returns the number of leading zeros.
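
A small sketch of why the shared encoding works for one pair and not the other. Again this is portable C++ with hypothetical helper names (`bitScanForward`, `bitScanReverse`, `trailingZeroCount`, `leadingZeroCount`, `checkEncodingTrick`) that model the 32-bit results for nonzero inputs only; it is not how any compiler implements them.

```cpp
#include <cassert>
#include <cstdint>

// The two bit-scan models are only valid for nonzero 32-bit inputs.

// BSF model: index of the lowest set bit.
constexpr uint32_t bitScanForward(uint32_t X) {
  uint32_t Idx = 0;
  while ((X & (1u << Idx)) == 0)
    ++Idx;
  return Idx;
}

// BSR model: index of the highest set bit.
constexpr uint32_t bitScanReverse(uint32_t X) {
  uint32_t Idx = 31;
  while ((X & (1u << Idx)) == 0)
    --Idx;
  return Idx;
}

// TZCNT model: number of zero bits below the lowest set bit.
constexpr uint32_t trailingZeroCount(uint32_t X) {
  uint32_t N = 0;
  for (uint32_t Mask = 1; Mask != 0 && (X & Mask) == 0; Mask <<= 1)
    ++N;
  return N;
}

// LZCNT model: number of zero bits above the highest set bit.
constexpr uint32_t leadingZeroCount(uint32_t X) {
  uint32_t N = 0;
  for (uint32_t Mask = 1u << 31; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// 0x50 is 0b0101'0000: lowest set bit 4, highest set bit 6.
// BSF and TZCNT give the same answer, so one encoding can serve both.
static_assert(bitScanForward(0x50) == trailingZeroCount(0x50), "same result");
// BSR and LZCNT do not: 6 versus 25. They are related by 31 - x.
static_assert(bitScanReverse(0x50) != leadingZeroCount(0x50), "differ");
static_assert(bitScanReverse(0x50) == 31 - leadingZeroCount(0x50), "inverted");

inline void checkEncodingTrick() {
  const uint32_t Samples[] = {1u, 2u, 0x50u, 0xFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples) {
    assert(bitScanForward(X) == trailingZeroCount(X));
    assert(bitScanReverse(X) == 31 - leadingZeroCount(X));
  }
}
```

Because the BSR and LZCNT results differ for the same input, running one instruction where the code expected the other would silently give a wrong answer, so there is no equivalent fallback for the leading-zero case.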


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798


