[llvm-dev] Which pass should be propagating memory copies
Daniel Neilson via llvm-dev
llvm-dev at lists.llvm.org
Tue May 16 11:15:11 PDT 2017
Ah, sorry. I misunderstood the question. I’m new to the LLVM infrastructure, so I’m not sure exactly what exists to do this, but I’d expect it to be a combination of transforms done in sequence: instcombine to lower the memcpys, followed by a data-flow transform such as value numbering to propagate the value stored to the stack into the second store, then some form of dead-store elimination. However, there are a couple of immediate challenges that I can spot: (1) instcombine won’t lower a 32-byte memcpy (it limits itself to lowering memcpys of 8 bytes or fewer), and (2) the aliasing between %ptr and %dataptr might be a barrier to the value numbering.
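To make the size limit in (1) concrete, here is a minimal sketch of the kind of check involved. The helper name is hypothetical and is not an LLVM API. The 8-byte cap comes from the description above; the power-of-two requirement is an assumption about how the check is written.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LLVM: models the size test that decides
// whether a memcpy can be replaced by one load/store pair.
// Assumption: only sizes of 1, 2, 4, or 8 bytes qualify.
bool fits_single_load_store(std::uint64_t size) {
  return size != 0 && size <= 8 && (size & (size - 1)) == 0;
}
```

Under this model the 32-byte copies in the example are rejected, so instcombine leaves them as memcpy calls.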
An alternative would be for value numbering to understand the load/store semantics of memcpy and be smart enough to realize that it’s okay to merge these particular memcpys. There’s a FIXME regarding memcpys in NewGVN::performSymbolicStoreEvaluation(). Perhaps that’s a place to start looking?
-Daniel
On May 16, 2017, at 12:56 PM, Keno Fischer <keno at juliacomputing.com> wrote:
Hi Daniel,
as far as I can tell that handles turning small memcpys into store instructions. What I'm looking for
is something that can simplify (copy to stack) -> (modify stack) -> (copy back to heap) into a single
heap modification.
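
As a source-level illustration of the pattern (a sketch only, written in C++ rather than IR, with hypothetical function names), the two functions below should be observably equivalent whenever idx is in bounds. The first is the shape the frontend emits, the second is the shape I would like the optimizer to reach.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

using Block = std::array<double, 4>;

// Emitted shape: copy heap -> stack, modify the stack copy, copy it back.
void update_via_stack(Block *ptr, std::size_t idx) {
  Block stack;
  std::memcpy(&stack, ptr, sizeof(Block));
  stack[idx] = 0.0;
  std::memcpy(ptr, &stack, sizeof(Block));
}

// Desired shape: a single store directly into the heap object.
void update_in_place(Block *ptr, std::size_t idx) {
  (*ptr)[idx] = 0.0;
}
```

Both functions assume idx < 4; an out-of-bounds index is undefined behavior in either version.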
Keno
On Tue, May 16, 2017 at 1:50 PM, Daniel Neilson <dneilson at azul.com> wrote:
The InstCombine transform does exactly what you want. Take a look at InstCombiner::SimplifyMemTransfer in lib/Transforms/InstCombine/InstCombineCalls.cpp.
With your align parameter on the memcpy being zero you are likely hitting the first conditional in that function:
    if (CopyAlign < MinAlign) {
      MI->setAlignment(ConstantInt::get(MI->getAlignmentType(), MinAlign, false));
      return MI;
    }
Arguably, instcombine probably shouldn’t bail on trying to simplify the memcpy just because it could update the alignment on the call...
-Daniel
> On May 16, 2017, at 12:37 PM, Keno Fischer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Consider the following IR example:
>
> define void @simple([4 x double] *%ptr, i64 %idx) {
> %stack = alloca [4 x double]
> %ptri8 = bitcast [4 x double] *%ptr to i8*
> %stacki8 = bitcast [4 x double] *%stack to i8*
> call void @llvm.memcpy.p0i8.p0i8.i32(i8 *%stacki8, i8 *%ptri8, i32 32, i32 0, i1 0)
> %dataptr = getelementptr inbounds [4 x double], [4 x double] *%ptr, i32 0, i64 %idx
> store double 0.0, double *%dataptr
> call void @llvm.memcpy.p0i8.p0i8.i32(i8 *%ptri8, i8 *%stacki8, i32 32, i32 0, i1 0)
> ret void
> }
>
>
> I would like to see this optimized to just a single store (into %ptr). Right now, even at -O3 that doesn't happen. My frontend guarantees that idx is always inbounds for the allocation, but I do think the transformation should be valid regardless because accessing beyond the bounds of the alloca should be undefined behavior. Now, my question is which pass should be responsible for doing this? SROA? DSE? GVN? A new pass just to do this kind of thing? Maybe there already is some pass that does this, just not in the default pipeline? Any hints would be much appreciated.
>
> Thanks,
> Keno