[llvm-dev] rL296252 Made large integer operation codegen significantly worse.

Mon Feb 27 16:01:49 PST 2017

This patch only results in relaxing dependencies. This now *allows* new
orderings that were not previously allowed, but, the fact that we then
actually get such a suboptimal output likely indicates an issue elsewhere,
that this allowance is exacerbating.

Some observations:

1. For some reason, memop folding seems to be generating seriously
non-optimal instructions. It is somehow causing there to be 7 adds in the
output instead of 4 -- some with the store integrated, but also keeping the
original adds without the store integrated. That's no good...and didn't
used to happen. I expect this is the main problem.

2. The scheduler is then choosing an ordering that requires spilling
eflags. Not sure why; possibly due to the former it's pushed itself into a
corner where this appears like it's required.

3. Then, even if you need to spill, it's a shame that the x86 backend isn't
tracking bits in the flag register separately... Thus, a definition and use
of the carry bit requires saving/restoring the entire flags register, even
if all you cared about was the one carry bit. That's quite unfortunate, as
saving/restoring just the carry bit would be a LOT cheaper than
saving/restoring the entire register. I suspect it'd be possible to define
a 1-bit subregister of eflags and mark the various carry-in ops as only
using that. Might be worthwhile doing that, separately, even if fixing #1
makes this particular issue disappear for this test case.

On Sat, Feb 25, 2017 at 3:06 PM, Nirav Davé via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> rL296252's  main change was to turn on anti-aliasing in the DAGCombiner.
> This should generally be a mild improvement to code due to the relaxed
> memory constraints, modulo any patterns downstream that are no longer
> general enough. This looks to be the case here.
>
> I'm going to leave this for a little while longer to get a check that all
> the buildbots pass, but I'll revert this and make sure this test case looks
> more reasonable.
>
> -Nirav
>
>
>
> On Sat, Feb 25, 2017 at 12:51 PM, Amaury SECHET <deadalnix at gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm working with workload where the bottleneck is cryptographic signature
>> checks. Or, in compiler terms, most large integer operations.
>>
>> Looking at rL296252 , the state of affair in that area degraded quite
>> significantly, see test/CodeGen/X86/i256-add.ll for instance.
>>
>> Is there some kind of work in progress here and it is expected to get
>> better ? Because if not, that's a big problem. It looks like the problem is
>> that the compiler now choose to use pushfq/popfq in some cases rather than
>> chaining adc to propagate the carry in additions.
>>
>> I hope this can get sorted out quickly. I'm happy to help if that is
>> necessary.
>>
>> Thanks,
>>
>> Amaury SECHET
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170227/00255d16/attachment.html>