[PATCH] D89479: [SimplifyCFG] Be more conservative when speculating in loops. (WIP)

Fri Oct 16 11:39:24 PDT 2020

fhahn added a comment.

In D89479#2334697 <https://reviews.llvm.org/D89479#2334697>, @spatel wrote:

> In D89479#2333186 <https://reviews.llvm.org/D89479#2333186>, @fhahn wrote:
>
>> In D89479#2333065 <https://reviews.llvm.org/D89479#2333065>, @spatel wrote:
>>
>>> Another reason that we would likely want a finer-grain solution: recent AMD implementations appear to have full-speed lzcnt (1 cycle and full throughput according to Agner's tables for Jaguar and Ryzen).
>>
>> Yeah, the issue here is really the throughput/number of execution units available together with the number of cycles. I guess we could ask TTI about that and get roughly sane results?
>
> It's tricky. We can not make the TTI model completely accurate without re-implementing or somehow exposing all of codegen's behavior to the IR optimizer. And so IIUC, it's intentional that we not make such fine-grain decisions in IR; we want those kinds of transforms to happen later. For example, the x86 cost models use worst-case timing for a given ISA/attribute set instead of differentiating per CPU model. It's up to codegen to improve on that.
>
> That said, I think there's something weird already going on for `ctlz` as demonstrated in `test1` and others in the test file shown here. In that test, we speculated the ctlz no matter what target attributes (bmi/lzcnt) were given. But the CHECK lines show that we arrive at that same result on 2 different paths - in one case the select has no name set, and the other is called "spec.select". If the intent was that base x86 not speculate a potentially expensive ctlz intrinsic, we already broke that. It's possible that both the cost model and SimplifyCFG are at fault.

I think the problem with TTI & CTLZ/CTTZ is that D80012 <https://reviews.llvm.org/D80012> removed the early exit (`return TCC_Expensive`) that was taken when they are not cheap to speculate. My immediate problem can be fixed by restoring that early exit (89578). But we might want to reconsider on which architectures CTLZ/CTTZ count as cheap. Currently we consider them cheap on Haswell or Skylake, but they really are not, especially when speculated in a loop.
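
To make the early-exit idea concrete, here is a minimal standalone sketch of such a cost gate. It is not the actual LLVM code: `TargetInfo`, `getIntrinsicCost`, `shouldSpeculate`, and the budget are hypothetical names invented for illustration. Only the `TCC_*` names and values are meant to mirror TTI's cost constants, and the two booleans stand in for the target's "cheap to speculate" hooks.

```cpp
// Standalone model, not LLVM code. The TCC_* names and values are intended to
// mirror TTI's TargetCostConstants; everything else here is hypothetical.
enum TargetCostConstants { TCC_Free = 0, TCC_Basic = 1, TCC_Expensive = 4 };

enum class Intrinsic { Ctlz, Cttz, Other };

// Hypothetical stand-in for the target's "cheap to speculate" hooks.
struct TargetInfo {
  bool CheapToSpeculateCtlz;
  bool CheapToSpeculateCttz;
};

// Cost query with the early exit: ctlz/cttz are reported as expensive unless
// the target says they are cheap to speculate.
int getIntrinsicCost(Intrinsic ID, const TargetInfo &TI) {
  switch (ID) {
  case Intrinsic::Ctlz:
    if (!TI.CheapToSpeculateCtlz)
      return TCC_Expensive; // early exit
    return TCC_Basic;
  case Intrinsic::Cttz:
    if (!TI.CheapToSpeculateCttz)
      return TCC_Expensive; // early exit
    return TCC_Basic;
  case Intrinsic::Other:
    return TCC_Basic;
  }
  return TCC_Basic;
}

// Hypothetical speculation check: hoist the call only if its cost fits the
// budget. The default budget is an assumption, not SimplifyCFG's real threshold.
bool shouldSpeculate(Intrinsic ID, const TargetInfo &TI,
                     int Budget = TCC_Basic) {
  return getIntrinsicCost(ID, TI) <= Budget;
}
```

With the early exit in place, a target that does not report ctlz as cheap gets `TCC_Expensive` and the call is not hoisted. Without it, both kinds of target would see the same basic cost. A loop-aware variant could pass a smaller budget for blocks inside a loop, but that is speculation on my part and not something this sketch models.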

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89479