[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer

Fri Dec 5 10:08:27 PST 2014

On 12/04/2014 11:46 PM, David Chisnall wrote:
> On 3 Dec 2014, at 23:36, Robert Lougher <rob.lougher at gmail.com> wrote:
>
>> On 2 December 2014 at 22:18, Alex Rosenberg <alexr at leftfield.org> wrote:
>>> Our C library amplifies this problem by being in a dynamic library, so the
>>> call has additional overhead, which for small trip counts swamps the
>>> copy/set.
>>>
>> I can't imagine we're the only platform (now or in the future) that
>> has comparatively slow library calls.  We had discussed some sort of
>> platform flag (has slow library calls) but this would be too late to
>> affect the loop-idiom.  However, it could affect lowering.  Following
>> on from Reid's earlier idea to lower short memcpys to an inlined,
>> slightly widened loop, we could expand into a guarded loop for small
>> values and a call?
> I think the bug is not that we are recognising that the loop is memcpy; it's that we're then generating an inefficient memcpy.  We do this for a variety of reasons, some of which apply elsewhere.  One issue I hit a few months ago was that the vectoriser doesn't check whether unaligned loads and stores are supported, so it will happily replace two adjacent i32 align 4 loads followed by two adjacent i32 align 4 stores with an i64 align 4 load followed by an i64 align 4 store, which more than doubles the number of instructions that the back end emits.
>
> We expand memcpy and friends in several different places (in the IR in at least one place, then in SelectionDAG, and then again in the back end, as I recall - I remember playing whack-a-bug with this for a while as the lowering was differently broken for our target in each place).  In SelectionDAG, we're dealing with a single basic block, so we can't construct the loop.  In the back end we've already lost a lot of high-level type information that would make this easier.
>
> I'd be in favour of consolidating the memcpy / memset / memmove expansion into an IR pass that would take a cost model from the target.
+1
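
To make the "guarded loop for small values and a call" idea concrete,
here is a rough C sketch of what such an expansion would compute.  The
name guarded_copy and the SMALL_COPY_THRESHOLD constant are made up for
illustration; in a real pass the cutoff would come from the target's
cost model, and the expansion would be done on IR, not on source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: stands in for a value supplied by a target cost model. */
#define SMALL_COPY_THRESHOLD 16

/* Illustrative only: copy small sizes inline, fall back to the library call. */
static void *guarded_copy(void *dst, const void *src, size_t n)
{
    if (n <= SMALL_COPY_THRESHOLD) {
        /* Small trip count: a byte loop avoids the call overhead. */
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < n; ++i)
            d[i] = s[i];
        return dst;
    }
    /* Large trip count: the library routine should win despite the call cost. */
    return memcpy(dst, src, n);
}
```

The guard costs one compare and branch, which is the price of keeping
short copies away from the slow dynamic-library call.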

It sounds like we might also be losing information about alignment in
the loop-idiom recognizer.  Or at least not using it when we lower.
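
As a source-level illustration of why the alignment matters (the
function names here are invented for the example), compare two adjacent
32-bit accesses with the merged 64-bit form that only has 4-byte
alignment guaranteed.  Both produce the same bytes, but on a target
without unaligned access support the second form would presumably have
to be split up again by the back end.

```c
#include <stdint.h>
#include <string.h>

/* Two adjacent 32-bit loads and stores, each naturally aligned. */
static void copy_pair_32(uint32_t *dst, const uint32_t *src)
{
    dst[0] = src[0];
    dst[1] = src[1];
}

/* Merged form: one 64-bit load and store, but the pointers are only
 * known to be 4-byte aligned.  memcpy is used so the C stays well
 * defined regardless of the actual address. */
static void copy_pair_64(uint32_t *dst, const uint32_t *src)
{
    uint64_t tmp;
    memcpy(&tmp, src, sizeof tmp);
    memcpy(dst, &tmp, sizeof tmp);
}
```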
>
> David