[PATCH] D10905: move DAGCombiner's allowableAlignment() helper function into the TLI

Tue Jul 21 09:35:20 PDT 2015

spatel added inline comments.

================
Comment at: lib/CodeGen/TargetLoweringBase.cpp:1547
@@ +1546,3 @@
+  // aligned access of this type is allowed too.
+  if (allowsMisalignedMemoryAccesses(VT, AddrSpace, Alignment, Fast))
+    return true;
----------------
jyknight wrote:
> dsanders wrote:
> > spatel wrote:
> > > jyknight wrote:
> > > > This won't return the right result in Fast.
> > > > 
> > > > E.g. allowsMisalignedMemoryAccesses might return with `{ *Fast = false; return true; }`, which would then cause this function to claim that a properly aligned memory access is not fast.
> > > > 
> > > > That's obviously wrong, but I'm not really sure what Right is.
> > > > 
> > > > What's the actual contract for the ABI alignment and Preferred alignment information in DataLayout?
> > > > 
> > > > I think there's three things that LLVM might want to know about alignment and memory accesses:
> > > > 
> > > > 1) If I can choose any alignment for this data, what should it be? (getPrefTypeAlignment should return that, I think)
> > > > 
> > > > 2) Is this memory access guaranteed to work *AT ALL*, no matter the speed. I think that's a combination of getABITypeAlignment, and then allowsMisalignedMemoryAccesses without checking "Fast".
> > > > 
> > > >  Although: is it always guaranteed that the ABI alignment for a type is always okay to load/store from? It'd be sane for it to be, but it's not really totally crazy to have a target where that's not true: e.g. ABI alignment of 32bits for an i64, but requires 64bit alignment for an i64 load/store?) I guess that's not the case of any existing target, since LLVM doesn't support it.
> > > > 
> > > > 3) Is it worthwhile to combine some memory accesses into a larger memory access, given alignment of X? I'm not sure this is actually adequately expressed in LLVM right now. Could be it's intended to be getPrefTypeAlignment and allowsMisalignedMemoryAccesses (with testing Fast==true) is supposed to be this.
> > > > 
> > > >  But, is "preferred alignment" and "fast to access" supposed to always be the same thing? Because, e.g. MIPS says i8:8:32-i16:16:32-i64:64 -- so both i8 and i16 have abi alignment of their size, but preferred alignment of 32 bits. I don't know a lot about MIPS, so I don't know why it claims a preferred alignment of 32bits, but I think it has a perfectly good 1/2-byte load instructions that work at their natural alignment.
> > > > 
> > > >  So on MIPS (even pre-MIPS32r6/MIPS64r6 which apparently made unaligned access allowed), it seems like it should always be an improvement to merge 8-bit loads which are 16-bit aligned into a 16-bit load. Yet, going by getPrefTypeAlignment would say that's not ok.
> > > Hi James -
> > > 
> > > Thanks for the close review - much appreciated!
> > > 
> > > I can certainly fix the Fast bug that you noted, but I also don't know what the right answer is. 
> > > 
> > > My guess is that the MIPS DataLayout is wrong. Let's see if we can find out...
> > > 
> > > The 32-bit preferred alignment for MIPS i8/i16 was introduced way back here:
> > > http://reviews.llvm.org/rL53585 
> > > 
> > > Adding Bruno (author) and Daniel (currently listed as owner of the MIPS backend).
> > > 
> > I'm afraid I'm not sure why we prefer 32-bit alignment for i8 and i16. My best guess is that it pays off in library calls by allowing better-optimised memcpy/strcpy/strcmp/etc.
> > 
> > > I don't know a lot about MIPS, so I don't know why it claims a preferred alignment of 32bits, 
> > > but I think it has a perfectly good 1/2-byte load instructions that work at their natural alignment. 
> > 
> > That's right. As far as I know they're usually (possibly always) the same latency too.
> > 
> > > ... (even pre-MIPS32r6/MIPS64r6 which apparently made unaligned access 
> > > allowed) ...
> > 
> > That's right. 32r6/64r6 dropped the unaligned load/store left/right instruction pairs in favour a promise that something in the system will make unaligned load/store work without special instructions. Depending on the MIPS implementation, the 'something' could be full hardware support, or a software exception handler, or a hybrid such as hardware-support for unaligned accesses inside a cache-line and software otherwise.
> > 
> > > So on MIPS (even pre-MIPS32r6/MIPS64r6 which apparently made unaligned access 
> > > allowed), it seems like it should always be an improvement to merge 8-bit loads which are 
> > > 16-bit aligned into a 16-bit load. Yet, going by getPrefTypeAlignment would say that's not ok.
> > 
> > As far as the load is concerned, I agree it's always an improvement to merge like this. The risk is that manipulating the data might cost more than the saving.
> > 
> > It's worth mentioning that some MIPS implementations will do this optimization in hardware without the risk of needing additional data manipulation.
> Even if there were no reason at all for MIPS to have preferred alignment set as it is, I think I've convinced myself that using "preferred" alignment in this function isn't correct. The "Fast" output should be indicating whether it's faster to do a potentially-misaligned load/store, or to use multiple load/store instructions. But "preferred" alignment is indicating the overall BEST alignment choice, not just what is going to be faster-or-equal than separate loads.
> 
> E.g. consider an i64:32:64 alignment spec, with a target that doesn't support under-aligned memory access (it supports the abi alignment of 32bits, but not below). It still is quite possible that the target behavior is such that merging two 32-bit loads to a 64bit load is a good idea -- e.g. that it's at worst the same speed as two 32-bit loads, but is still faster when there's 64bit alignment.
> 
> So perhaps the DataLayout should not actually be used to determine what memory alignments are valid for load/store on a target at all? That is, I'm thinking that targets maybe ought to implement "allowsMemoryAccess" directly, describing the entirety of that target's rules for what memory alignments work, and are "fast", rather than having "allowsMisalignedMemoryAccess" which describes only half of them, the other half intuited from the DataLayout. That seems a potentially saner model -- leaving the "Data Layout" spec to just describe the layout, and the target to describe what the hardware/runtime-environment can actually do, independent of that.
> 
> Alternatively it might be okay for the ABI alignment to just be assumed to be fast. If that's not the case, that's certainly a suboptimal ABI, but it wouldn't surprise me wouldn't surprise me if some ABIs were suboptimal in that way.
I see your point about making a new target hook that implements all of 'allowsMemoryAccess' directly, but a quick look at the implementations of DataLayout::getAlignment() and DataLayout::getAlignmentInfo() makes me scared...

Let me post an updated version of the patch that assumes the ABI alignment is fast and see what people think.

http://reviews.llvm.org/D10905