[llvm-dev] How to change CLang struct alignment behaviour?
Joan Lluch via llvm-dev
llvm-dev at lists.llvm.org
Tue May 14 09:51:01 PDT 2019
I tracked down the issue to a LLVM omission or bug, or that’s what I think. The issue can be easily reproduced for targets that do not support misaligned memory accessed and do not implement allowsMisalignedMemoryAccesses. Let me try to explain what happens:
In getMemcpyLoadsAndStores, the function FindOptimalMemOpLowering is called to obtain the list of Store operations that should be generated. If the memcpy destination is the non-fixed section of the stack, a value of Zero is passed as the destination alignment (DstAlign) to signal that it can be “changed”.
The FindOptimalMemOpLowering starts by calling the target getOptimalMemOpType. Most targets implement the later by returning the largest possible EVT that shall be used to create the load/store pairs. This is generally ok for targets that can handle non-aligned load/stores. If this function is not implemented, then the FindOptimalMemOpLowering does that by finding the largest legal integer type. Again, this is ok if the target would support misaligned accesses. Upon return, the FindOptimalMemOpLowering has found the minimal list of memory operations required to complete the memcpy.
Next, the getMemcpyLoadsAndStores continues by generating the required load/store pairs, which will result into too large loads/stores for targets that do not support misaligned accesses.
Later on, during the legalizeDAG phase, the function LegalizeLoadOps is invoked which eventually calls allowsMemoryAccess function, which only now calls allowsMisalignedMemoryAccesses to determine that it must replace the too long loads by the odd Shifts and OR aggregates of smaller loads, the latter happens inside expandUnalignedLoad
I would have expected, that for targets not supporting misaligned accesses, the list of load/stores generated around the getMemcpyLoadsAndStores function, would have respected such target limitation and would have not produced misaligned load/stores to begin with. Instead, the default implementation defers that for later and ends replacing such incorrect load/stores by odd expanded aggregates.
The workaround that I found was implementing getOptimalMemOpType in a way that rather than finding the highest possible EVT to minimise the number of load/stores, it would provide one that will never get greater than the SrcAlign. So this is what I have now:
EVT CPU74TargetLowering::getOptimalMemOpType(uint64_t Size,
unsigned DstAlign, unsigned SrcAlign, bool IsMemset, bool ZeroMemset,
bool MemcpyStrSrc, MachineFunction &MF) const
if ( DstAlign == 0 && !IsMemset )
// Return the EVT of the source
DstAlign = SrcAlign;
EVT VT = MVT::i64;
while (DstAlign && DstAlign < VT.getSizeInBits() / 8)
VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
Maybe, this is what we are expected to do after all, but it still looks somewhat weird to me that the default LLVM implementation would not take care of misaligned memory access targets in a more appropriate way. Solving this issue is just a matter of including the code above in FindOptimalMemOpLowering for such targets. The current implementation does just the opposite, and will even use MVT::i64 as the destination align even if the source align is only MVT::i8, which I regard as totally wrong.
This problem is also shared by the MSP430 target, and it’s very easy to reproduce by just compiling the code that I posted before.
> On 13 May 2019, at 20:09, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Joan,
> On Mon, 13 May 2019 at 18:01, Joan Lluch <joan.lluch at icloud.com> wrote:
>> After looking at it a bit further, I think this is a Clang thing. Clang issues “align 2” if the struct has at least one int (2 bytes), but also if the entire struct size is multiple of 2. For example a struct with 4 char members. In these cases the LLVM backend correctly creates word sized load/stores (2 bytes).
> I'm slightly surprised that it happens based purely on size, but
> either way LLVM should be able to cope.
>> The LLVM backend just follows what’s dictated by Clang regarding alignment and thus it creates 2 byte or 1 byte load/stores instructions accordingly. I have not found a way to override this in LLVM. Any suggestions are appreciated.
> That sounds right, but I don't think it explains the shifts you
> described before. It should work out a lot better than what you're
> seeing. Specifically, a 3 byte struct (for example) ought to either
> lower to:
> load i16, load i8 + stores if your target can do misaligned i16 operations.
> load i8, load i8, load i8 + stores if not.
> Neither of those involve shifting operations. I'd suggest breaking
> just after getMemcpyLoadsAndStores and using SelectionDAG::dump to see
> exactly what it's created. Then try to work out where that gets
> pessimized to shifts, because it's not normal.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev