[llvm-dev] Missed optimization - spill/load generated instead of reg-to-reg move (and two other questions)

Wed Feb 28 15:18:12 PST 2018

On 02/27/2018 10:21 AM, Alex Wang via llvm-dev wrote:
> Hello all!
>
> I was looking through the results of disassembling a heavily-used short
> function in the program I'm working on, and ended up wondering why LLVM
> was generating that assembly and what changes would be necessary to
> improve the code. I asked on #llvm, but it seems that the people with
> the necessary expertise weren't around.
>
> Here is a condensed version of the code: https://godbolt.org/g/ec5cP7
>
> My main question concerns assembly lines 37/38 and 59/60, where xmm0 is
> spilled to the stack, only to be immediately reloaded into xmm1. Google
> tells me that there is a register-to-register mov instruction for the
> xmmn registers, so I found it odd that LLVM missed what looks like an
> easy optimization. tstellar on #llvm pointed me towards using
> -debug-only=regalloc with llc to see what LLVM is thinking (regalloc log
> here, since I'm not sure what's considered "too large" for mailing
> lists: [0]), and it seemed to me like the load/store were introduced
> separately, and llc never looked at them at the same time, and so never
> realized that they could be folded. Is that what is happening? I know
> little about compilers, so I wouldn't be surprised if I were wrong.
I don't have time to dig into this in detail, but you're heading in the 
right direction if you're looking at regalloc tracing.  This vaguely 
looks like something related to phi lowering, so you might want to check 
what the MIR looks like immediately before regalloc as well.
>
> The other two questions are tangential, so please let me know if I
> should ask them somewhere else.
>
> On assembly lines 24 and 46, I think the vtable pointer for the Quad
> object is being reloaded every iteration of the loop. nbjoerg on #llvm
> said that's due to the possibility of placement new being used somewhere
> inside the called function, which makes sense to me. Is there a way to
> indicate to LLVM that this will not happen? I tried [[gnu::pure]], since
> the function doesn't write to externally-visible memory, but the vtable
> pointer reload remained.
I don't know that we have anything like this, but we totally should if 
we don't.  You're more likely to get a useful answer if you send this 
separately to cfe-dev, though.  The Clang frontend devs don't tend to 
read emails that appear to be about register allocation.  :)

If you want to assist with the devirtualization directly, you could 
capture the member pointer on the first iteration, then reuse it.  This 
does require that all Nodes in your array are the exact same type, though!
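A minimal sketch of that kind of manual devirtualization follows. It assumes a hypothetical `Node`/`Quad` hierarchy standing in for the classes in the linked example, because the real signatures aren't shown in the thread. It swaps in a one-time `typeid` check followed by a qualified (non-virtual) call instead of capturing a member pointer. In standard C++, calling through a pointer to a virtual member function still dispatches virtually.

```cpp
#include <cassert>
#include <typeinfo>
#include <vector>

// Hypothetical stand-ins for the classes in the original example.
struct Node {
  virtual ~Node() = default;
  virtual double GetLocalValue() const = 0;
};

struct Quad : Node {
  explicit Quad(double v) : value(v) {}
  double GetLocalValue() const override { return value; }
  double value;
};

// Generic path: one virtual dispatch (and vtable load) per element.
double SumVirtual(const std::vector<const Node*>& nodes) {
  double sum = 0.0;
  for (const Node* n : nodes) sum += n->GetLocalValue();
  return sum;
}

// Manually devirtualized path: check the dynamic type once, up front,
// then call Quad::GetLocalValue directly for every element.
double SumDevirtualized(const std::vector<const Node*>& nodes) {
  if (nodes.empty()) return 0.0;
  const Node& first = *nodes.front();
  if (typeid(first) != typeid(Quad))
    return SumVirtual(nodes);  // not a Quad array: fall back to virtual calls
  double sum = 0.0;
  for (const Node* n : nodes) {
    // Precondition: every node has exactly the same dynamic type.
    // Checked only in debug builds.
    assert(typeid(*n) == typeid(Quad));
    // Qualified call: no virtual dispatch, so no per-iteration vtable load.
    sum += static_cast<const Quad*>(n)->Quad::GetLocalValue();
  }
  return sum;
}
```

The `assert` catches a mixed array only in debug builds. With `NDEBUG` defined, a non-`Quad` node reaching the fast path is undefined behavior, which is the same exact-type requirement as above.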

>
> Finally, I'm inclined to say that this routine should be vectorizable,
> since it's essentially just an accumulate, but Clang can't prove that
> GetLocalValue doesn't have side effects that will affect later
> iterations. Is this correct, and if so, are there any hints I can give
> Clang besides just manually parallelizing it with #pragma omp or
> something?
To clarify, is that the GetLocalValue in your example?  Or some more 
complicated version?  It depends a lot on what IPO can conclude about 
the function.  You can also manually annotate the initial IR, but I 
don't know how to do that from clang.
>
> I do intend on changing this loop to something a bit less messy, but
> it'll be part of a larger refactoring, so it's still a ways off.
>
> Thanks!
>
> Alex
>
>    [0]: https://hastebin.com/raw/oqamesahos
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev