[PATCH] Gut InstCombiner::SimplifyMemTransfer.

Wed Apr 8 22:54:46 PDT 2015

Hi Chandler,

Not as easy as I was hoping then.

> Do you see any other way to solve the problem of non-overlapping
information?

I'll have to do some reading. If there's any aliasing metadata that we can
attach to express that the pointers are disjoint, that would work: In
SelectionDAGBuilder we could detect disjoint, misaligned load/store pairs
where the load has no other users and use the memcpy expansion instead.

> Is an under aligned memcpy really that much better than an under aligned
load and store???

It saves a bit of shifting and masking as you try to reconstruct the full
iN value in a register. This would be exacerbated if we raised the size
cap. I'll see if I can get you some numbers.

- Lang.

On Wed, Apr 8, 2015 at 10:38 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> On Wed, Apr 8, 2015 at 10:15 PM Lang Hames <lhames at gmail.com> wrote:
>
>> Hi David, Chandler,
>>
>> The attached patch guts the SimplifyMemTransfer method, which turns small
>> memcpys (1/2/4/8 bytes) into load/store pairs. Turning memcpys into
>> load/store pairs loses information, since we can no longer assume the
>> source and dest are non-overlapping. This is leading to some suboptimal
>> expansions for small memcpys on AArch64 when -mno-unaligned-access is
>> turned on (see r234462). I suspect other architectures would suffer similar
>> issues.
>>
>> I assume this transform is an old workaround to simplify other
>> non-memcpy-aware IR transforms. These days I think most IR transforms can
>> reason sensibly about memcpys, so I'm hoping this is safe to remove. FWIW,
>> removing it didn't hit any regression tests except those that were
>> verifying that this optimisation was being applied, but then you wouldn't
>> really expect it to hit any others.
>>
>
> Heh. I tried to remove it before and it regressed a *lot* of performance.
> Have you measured it? I think there are many places that don't today reason
> about memcpy but do reason about loads and stores. Here is a partial list:
>
> - GVN
> - ValueTracking.cpp's available loaded value (or whatever its called)
> which drives load combining and store-to-load forwarding throughout
> instcombine and the IR
> - EarlyCSE
> - LoopVectorize
>
> I thought about fixing all of this, but it seems really complicated and to
> have very little value. Loads and stores and SSA values are really useful.
> Do you see any other way to solve the problem of non-overlapping
> information?
>
>
>>
>> If this transform really is useful then we should probably revisit the
>> cut-off: 8-bytes isn't much these days.
>>
>
> Yea, this has been kind of horrible. I think the correct heuristic would
> be when the size is one for which we have a legal integer type.
>
>
>> Perhaps we should also only apply it if the alignment on the memcpy is
>> sufficiently high?
>>
>
> Is an under aligned memcpy really that much better than an under aligned
> load and store???
>
>>
>> Cheers,
>> Lang.
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150408/568e0527/attachment.html>