[llvm-dev] LLVM struct, alloca, SROA and the entry basic block

Sanjay Patel via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 9 08:38:21 PDT 2015


Hi Benoit -

I've been looking at memcpy/memset lowering and alignment issues recently.
See:
https://llvm.org/bugs/show_bug.cgi?id=24678
and the links from there.

If you can file a bug report with your test case and any perf data that
you've collected, that would be very helpful.

Thanks!


On Tue, Sep 8, 2015 at 12:11 PM, Benoit Belley via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Philip,
>
> Attached you will find the LLVM IR that causes LLVM 3.7.0 to emit assembly
> generating a whole bunch of blocked store-forwarding pipeline stalls.
>
> Compile using:
>
> $ llvm/3.7.0/Release/bin/opt -S -O3 store-forward-failure.ll -o - |
> llvm/3.7.0/Release/bin/llc -filetype=asm -O3 -o -
>
>
> You will find assembly sequences such as:
>
>         movss   dword ptr [rcx - 12], xmm4 # 32-bit store
>         movss   dword ptr [rcx - 16], xmm3 # 32-bit store
>         mov     rdx, qword ptr [rcx - 16]  # 64-bit load
>
> Notice how the stores and loads are back-to-back and of different
> bit widths.  On my processor (Intel Sandy Bridge), this sequence seems to
> fail store forwarding and to cause a huge CPU pipeline stall. Or at least,
> this is what the following CPU performance counter leads me to believe:
>
> LD_BLOCKS.STORE_FORWARD: Loads blocked by overlapping with store buffer
> that cannot be forwarded.
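
The access pattern above can be checked against a deliberately simplified model of store forwarding. This is only a sketch: it assumes a load can be forwarded only when the youngest overlapping store fully contains it, which is a common rule of thumb and not an exact description of Sandy Bridge. The names `Access` and `load_is_blocked` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Sequence


class Access(NamedTuple):
    offset: int  # byte offset relative to the base register (rcx in the listing)
    size: int    # access width in bytes


def load_is_blocked(load: Access, stores: Sequence[Access]) -> bool:
    """Simplified model (an assumption, not a hardware specification).

    Walk the pending stores from youngest to oldest. The first store that
    overlaps the load must fully contain it, otherwise forwarding fails.
    """
    load_end = load.offset + load.size
    for store in reversed(stores):  # youngest store first
        store_end = store.offset + store.size
        overlaps = store.offset < load_end and load.offset < store_end
        if overlaps:
            contained = store.offset <= load.offset and load_end <= store_end
            return not contained
    return False  # no pending store overlaps, nothing to forward


# The two 32-bit stores from the listing, in program order.
pending_stores = [Access(-12, 4), Access(-16, 4)]

# The 64-bit load at [rcx - 16] spans both stores, so neither contains it.
wide_load_blocked = load_is_blocked(Access(-16, 8), pending_stores)

# A 32-bit load at [rcx - 16] is fully covered by the second store.
narrow_load_blocked = load_is_blocked(Access(-16, 4), pending_stores)
```

Under this model the 64-bit load is blocked while a 32-bit load of either half is not, which is consistent with the counter readings reported here.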
>
> My test case generates 1,500,000,000 of these "blocked store-forwarding"
> events when using LLVM 3.7, versus 74,000 for LLVM 3.6! The number of
> instructions executed per CPU cycle goes down to 0.7 IPC instead of
> 2.2 IPC.
>
> Further analysis suggests that it might be due to the GVN pass (which runs
> just before the MemCpy pass), which actually combines two 32-bit loads into
> a single 64-bit load.  See the attached files.
>
> I have also noted that the allocas are actually getting properly annotated
> with an alignment of 8 bytes by the "Combine redundant instructions"
> pass. So, I guess that annotating allocas when emitting LLVM IR within our
> JIT compiler is unnecessary. Is that a fair assessment?
>
> Is store forwarding always blocked for these kinds of memory accesses, even
> if they are properly aligned?
>
> (Side note: Moving the allocas into the entry BB causes all of these
> redundant alloca, store and load instructions to be optimized out, and the
> entire store-forwarding issue goes away for this particular test case. But
> isn’t this an issue that could be triggered in other valid cases?)
>
> Cheers,
> Benoit
>
> *Benoit Belley*
>
> Sr Principal Developer
>
> M&E-Product Development Group
>
>
>
> *MAIN* +1 514 393 1616
>
> *DIRECT* +1 438 448 6304
>
> *FAX* +1 514 393 0110
>
>
>
> Twitter <http://twitter.com/autodesk>
>
> Facebook <https://www.facebook.com/Autodesk>
>
>
>
> *Autodesk, Inc.*
>
> 10 Duke Street
>
> Montreal, Quebec, Canada H3C 2L7
>
> www.autodesk.com
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Benoit
> Belley via llvm-dev <llvm-dev at lists.llvm.org>
> Reply-To: Benoit Belley <benoit.belley at autodesk.com>
> Date: Tuesday, September 8, 2015 13:11
> To: Philip Reames <listmail at philipreames.com>, "llvm-dev at lists.llvm.org" <
> llvm-dev at lists.llvm.org>
>
> Subject: Re: [llvm-dev] LLVM struct, alloca, SROA and the entry basic
> block
>
> From: Philip Reames <listmail at philipreames.com>
> Date: Tuesday, September 8, 2015 12:50
> To: Benoit Belley <benoit.belley at autodesk.com>, "llvm-dev at lists.llvm.org"
> <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] LLVM struct, alloca, SROA and the entry basic
> block
>
> On 09/08/2015 07:21 AM, Benoit Belley via llvm-dev wrote:
>
> Hi everyone,
>
> We have noticed that the SROA pass will only eliminate ‘alloca’
> instructions if those are located in the entry basic block of a function.
>
> *As a general recommendation, should the LLVM IR emitted by our compiler
> always place ‘alloca’ instructions in the entry basic block? (I couldn’t
> find any recommendations concerning this matter.)*
>
> Yes.
>
>
>
> Thanks Phil. Should this be mentioned somewhere in the documentation? As
> a footnote in the LLVM Language Reference manual maybe?
>
> As a note, I have also found out that alloca instructions should be placed
> before any call instructions, as these can get inlined and then the
> original alloca can no longer be placed in the entry basic block!
>
>
>
> In addition, we have noticed that the MemCpy pass will attempt to copy
> LLVM structs using moves that are as large as possible. For example, a
> struct of 3 floats is copied using a 64-bit and a 32-bit move. It is
> therefore important that such a struct be aligned on an 8-byte boundary,
> not just 4 bytes! Otherwise, one runs the risk of triggering
> store-forwarding failures and pipeline stalls (which we did encounter,
> quite badly, with one of our internal performance benchmarks).
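
The "as large as possible" behaviour described above can be illustrated with a greedy split of a copy size into power-of-two move widths. This is a sketch only: the function name `split_into_moves` and the `max_width` parameter are hypothetical, and the real MemCpy pass and backend lowering may choose differently (for example, overlapping moves).

```python
def split_into_moves(size: int, max_width: int = 8) -> list[int]:
    """Greedily split a copy of `size` bytes into power-of-two move widths.

    `max_width` is the widest move considered, in bytes, and is assumed to be
    a power of two (8 models a 64-bit general-purpose register move).
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if max_width <= 0 or max_width & (max_width - 1):
        raise ValueError("max_width must be a positive power of two")

    moves = []
    width = max_width
    while size > 0:
        # Shrink the move width until it fits in the remaining bytes.
        while width > size:
            width //= 2
        moves.append(width)
        size -= width
    return moves


# A struct of 3 floats is 12 bytes: one 64-bit move plus one 32-bit move.
three_floats = split_into_moves(3 * 4)
```

With this split, the first move of a 12-byte struct is 8 bytes wide, which is why 4-byte alignment of the struct is not enough for that move to be naturally aligned.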
>
> This sounds like a bug to me.  We shouldn't be using the large load/stores
> without knowing they're aligned or that unaligned access is fast on a
> particular target.  Where this is best fixed (memcpy, store lowering?) I
> don't know.
>
>
> I’ll send out a test case. Maybe that will help.
>
>
>
> *Are there any guidelines for specifying the alignment of LLVM structs
> allocated by alloca instructions? Is rounding the structure size down
> to the next power of 2 a good strategy? Will the MemCpy pass issue moves
> of up to 64 bytes on AVX-512 capable processors?*
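
For concreteness, the rounding strategy asked about above works out as follows. This is a sketch of the arithmetic only, not an endorsement of the strategy; the helper name `round_down_to_power_of_two` is hypothetical.

```python
def round_down_to_power_of_two(size: int) -> int:
    """Return the largest power of two that is less than or equal to `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    # bit_length() - 1 is the index of the highest set bit.
    return 1 << (size.bit_length() - 1)


# A struct of 3 floats (12 bytes) would get an 8-byte alignment.
alignment_for_three_floats = round_down_to_power_of_two(12)

# A 64-byte struct would get a 64-byte alignment.
alignment_for_64_bytes = round_down_to_power_of_two(64)
```

For the 12-byte struct this gives 8, which matches the 8-byte boundary mentioned earlier in the message.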
>
> Cheers,
> Benoit
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>