[PATCH] D89479: [SimplifyCFG] Be more conservative when speculating in loops. (WIP)

Fri Oct 16 06:24:49 PDT 2020

spatel added a comment.

In D89479#2333186 <https://reviews.llvm.org/D89479#2333186>, @fhahn wrote:

> In D89479#2333065 <https://reviews.llvm.org/D89479#2333065>, @spatel wrote:
>
>> Another reason that we would likely want a finer-grain solution: recent AMD implementations appear to have full-speed lzcnt (1 cycle and full throughput according to Agner's tables for Jaguar and Ryzen).
>
> Yeah, the issue here is really the throughput/number of execution units available together with the number of cycles. I guess we could ask TTI about that and get roughly sane results?

It's tricky. We can not make the TTI model completely accurate without re-implementing or somehow exposing all of codegen's behavior to the IR optimizer. And so IIUC, it's intentional that we not make such fine-grain decisions in IR; we want those kinds of transforms to happen later. For example, the x86 cost models use worst-case timing for a given ISA/attribute set instead of differentiating per CPU model. It's up to codegen to improve on that.

That said, I think there's something weird already going on for `ctlz` as demonstrated in `test1` and others in the test file shown here. In that test, we speculated the ctlz no matter what target attributes (bmi/lzcnt) were given. But the CHECK lines show that we arrive at that same result on 2 different paths - in one case the select has no name set, and the other is called "spec.select". If the intent was that base x86 not speculate a potentially expensive ctlz intrinsic, we already broke that. It's possible that both the cost model and SimplifyCFG are at fault.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89479/new/

https://reviews.llvm.org/D89479