[PATCH] D35035: [InstCombine] Prevent memcpy generation for small data size

Tue Oct 2 09:50:08 PDT 2018

spatel added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:139
+  // largest legal integer size.
+  unsigned LargestInt = DL.getLargestLegalIntTypeSizeInBits();
+  if (!LargestInt || Size > LargestInt)
----------------
lebedev.ri wrote:
> jfb wrote:
> > spatel wrote:
> > > hiraditya wrote:
> > > > jfb wrote:
> > > > > Why does this make sense? I don't understand why we'd want to tie this to the largest legal integer type, and not have it be its own target parameter.
> > > > We used 'largest legal integer size' because that will fit in a register for sure. I think making a target specific parameter seems reasonable, or maybe using TargetLowering.h:getMaxStoresPerMemcpy() which is already available.
> > > I'm not understanding this discussion...
> > > This patch is trying to apply some target-based constraint to a questionable (reverse) IR canonicalization that currently just pulls a number out of the air.
> > > The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.
> > > But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.
> > > Expanding memcpy for performance using target parameters belongs in a late, target-aware pass (and as mentioned, it already exists), not early in generic instcombine.
> > Let me try to expand a bit, and let me know if that makes sense:
> > 
> > IMO `LargestLegalIntTypeSizeInBits` doesn't make sense to use here. On ARM I might want to use paired integer load / store for memcpy, or paired Q registers. These have nothing to do with the largest legal integer type. It doesn't matter what the patch is trying to do or what is currently being done: the patch is adding something weird. It shouldn't.
> It's replacing magic hardcoded number with a number pulled out of target datalayout.
> Should be a new field in [[ https://llvm.org/docs/LangRef.html#data-layout | target datalayout ]]?
> I'm not sure if we can get TargetLowering info from the backend in the middle-end?
I agree that this is a strange transform for instcombine. We're trying to safely replace an obviously bogus magic number constraint (8) with something based on target reality. 

So this is an intermediate step as I see it. The goal of this patch (and this may be counter to the original goal...) is to not expand memcpy(x, y, 8) on a 32-bit system because that could take >1 load/store pair. The final goal is to not expand memcpy at all.

IIUC, you'd like to make this transform more aggressive rather than more conservative though? Ie, a pair of registers or Q regs are always bigger than LargestLegalIntType (let me know if I'm wrong).

But that's moving us further away from the target-independent canonicalization goal of instcombine. The kind of expansion where we ask if pairs or vector regs are available should be happening in memcpyopt (if it's not, then that should be enhanced in that pass/lowering).

https://reviews.llvm.org/D35035