[PATCH] D32352: Go to eleven

Mon Apr 24 11:07:45 PDT 2017

RKSimon added a comment.

In https://reviews.llvm.org/D32352#735441, @spatel wrote:

> In https://reviews.llvm.org/D32352#735421, @avt77 wrote:
>
> > In https://reviews.llvm.org/D32352#735393, @spatel wrote:
> >
> > > Is this or should this be limited when optimizing for size? I didn't count the instruction bytes...it might depend on the multiplier constant which version is smaller?
> >
> >
> > It's already limited:
> >
> >   // An imul is usually smaller than the alternative sequence.
> >   if (DAG.getMachineFunction().getFunction()->optForMinSize())
>
>
> Ah, sorry I missed that. The fact that it is "MinSize" highlights that we're in a gray area for the DAG. That is, it's hard to know what the best sequence will be without looking at the instruction timing. Given that, we need to know if converting these muls is generally good. Do you have real or synthetic benchmark info for these cases? Is there a perf difference, for example, between Jaguar and Haswell (since those CPUs are specified in the tests)? Is the codegen ever different for those CPUs? If not, why are we adding different RUNs for them in this patch?

The hope is that in the longer term this can all be converted to MC patterns and driven by the scheduler models, but for that we need decent scheduler modelling of LEA (PR32326). In the meantime we might be better off only using multiple LEA instructions when !Subtarget->slowLEA()? In which case we need to add tests for Silvermont.
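
For illustration only, here is a rough standalone sketch of the gating discussed above, using multiply-by-11 as the example. It is not the code in the patch: TuningSketch, lea, mulBy11ViaLEA, shouldUseLEASequence and mulBy11 are hypothetical names, the two bool fields merely stand in for optForMinSize() and slowLEA(), and the decomposition shown is just one possible way to reach 11 with two LEA-shaped steps.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two queries mentioned in this thread.
// This is NOT an LLVM class; it only models the two booleans.
struct TuningSketch {
  bool OptForMinSize; // models Function::optForMinSize()
  bool SlowLEA;       // models Subtarget->slowLEA()
};

// One LEA can compute Base + Index * Scale, with Scale in {1, 2, 4, 8}.
constexpr uint32_t lea(uint32_t Base, uint32_t Index, uint32_t Scale) {
  return Base + Index * Scale;
}

// x * 11 as two LEA-shaped steps: 5x = x + 4x, then 11x = x + 2 * 5x.
constexpr uint32_t mulBy11ViaLEA(uint32_t X) {
  return lea(X, lea(X, X, 4), 2);
}

// Assumed policy: keep the imul when optimizing for minimum size or when
// the subtarget reports slow LEA; otherwise use the LEA sequence.
constexpr bool shouldUseLEASequence(const TuningSketch &T) {
  return !T.OptForMinSize && !T.SlowLEA;
}

// Both paths must produce the same value; only the instruction choice differs.
constexpr uint32_t mulBy11(uint32_t X, const TuningSketch &T) {
  return shouldUseLEASequence(T) ? mulBy11ViaLEA(X) : X * 11u;
}

static_assert(mulBy11ViaLEA(7) == 77, "LEA decomposition must equal x * 11");
```

The sketch only checks that the two forms agree arithmetically; whether the LEA pair is actually faster than the imul on a given CPU is exactly the open question here.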

https://reviews.llvm.org/D32352