[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer

Tue Dec 2 13:37:42 PST 2014

On 2 Dec 2014, at 19:57, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:

> On Tue, Dec 02, 2014 at 07:23:01PM +0000, Robert Lougher wrote:
>> In feedback from game studios, a common issue is the replacement of
>> loops with calls to memcpy/memset.  These loops are often
>> hand-optimised and highly efficient, and the developers strongly want
>> a way to control the compiler (i.e. leave my loop alone).
> 
> I doubt that. If anything, it means the lowering of the intrinsic is
> bad, not that the transformation should not happen.

I'd agree.  On x86-64, however, memcpy is difficult.  Some recent profiling shows that different SSE-based approaches have around a 50% performance difference across Sandy Bridge, Ivy Bridge and Haswell, and the relative performance of the different versions varies considerably from one chip to the next (I have no idea what the variation is like between AMD chips).

Lowering memcpy in LLVM is particularly horrible because it's done in three different places, only one of which has anything resembling a cost model.

We can often generate a very efficient memcpy loop in the back end if we know that the data being copied is strongly aligned.  For x86-64 (and our architecture), if the data is 256-bit aligned and its size is known to be a multiple of 256 bits (or, even better, a multiple of a known multiple of 256 bits), then we can generate something that is likely to be significantly faster than a call to memcpy.  Unfortunately, we often lose this information by the time we are doing the lowering.
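To make the alignment argument concrete, here is a minimal C sketch of the kind of loop that becomes possible once the 256-bit alignment and size facts are known. The helper names copy_aligned_256 and copy_bytes are hypothetical (not part of LLVM or any libc), and the code is portable C rather than the SSE/AVX sequence a back end would actually emit. It also assumes the buffers really are uint64_t storage, to stay clear of strict-aliasing problems. Note that at -O2 Clang's loop-idiom recognizer may turn the inner loop back into a memcpy call, which is exactly the behaviour this thread is about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes when the caller guarantees that dst and
   src are 32-byte (256-bit) aligned and n is a multiple of 32 bytes.
   Assumes the buffers are genuinely uint64_t storage (strict aliasing). */
static void copy_aligned_256(uint64_t *restrict dst,
                             const uint64_t *restrict src, size_t n)
{
    assert((uintptr_t)dst % 32 == 0);
    assert((uintptr_t)src % 32 == 0);
    assert(n % 32 == 0);

    size_t words = n / sizeof(uint64_t);
    /* Each iteration moves one 256-bit block as four 64-bit words; because
       alignment and size are known, no head or tail handling is needed. */
    for (size_t i = 0; i < words; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
}

/* Use the specialised loop only when the alignment and size facts hold at
   run time; otherwise fall back to the library memcpy. */
static void copy_bytes(void *restrict dst, const void *restrict src, size_t n)
{
    if ((uintptr_t)dst % 32 == 0 && (uintptr_t)src % 32 == 0 && n % 32 == 0)
        copy_aligned_256(dst, src, n);
    else
        memcpy(dst, src, n);
}
```

The run-time checks in copy_bytes stand in for what a compiler would ideally know statically. If the alignment and size survive to lowering, the branch and the fallback disappear entirely.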

The interface for target-specific lowering of memcpy is horribly convoluted (and assumes that memcpy is always in address space (AS) 0, even though the intrinsic supports multiple address spaces, but that's a different issue), so some cleanup would make it possible to exploit this information better.  Ideally, I'd like to see it moved entirely into the back end (or controlled by a single flag saying 'expand this in the IR, I don't care about optimising it yet'), rather than having the back end try to provide SelectionDAG with hints that it only sometimes uses.

David