[llvm-dev] How to change CLang struct alignment behaviour?

Fri May 17 14:12:09 PDT 2019

In terms of increasing the alignment of a variable, there’s an API getOrEnforceKnownAlignment which is used by LLVM optimizations to increase the alignment of global variables and stack allocations, when it appears profitable.  There currently isn’t any transform that calls it for memcpy operations, but maybe worth implementing for cases where the size of the memcpy is a small constant.

-Eli

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Joan Lluch via llvm-dev
Sent: Friday, May 17, 2019 1:13 PM
To: Tim Northover <t.p.northover at gmail.com>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] How to change CLang struct alignment behaviour?

Hello Tim,

After a second look at the topic I have gotten a better understanding on what’s going on.

On the original implementation FindOptimalMemOpLowering and getOptimalMemOpType gets called to find the minimal set of load/store pairs to replace a memcpy operation. So for example, a 5 byte memcpy on a 32 bit architecture gets replaced by a 32 bit load/store and a 8 bit extload/truncstore. In a similar way, for a 16 bit architecture this will get replaced by a two 16 bit load/stores and one 8 bit extload/truncstore  That’s the optimal way to deal with it, however, for targets not implementing the <TheTarget>TargetLowering::allowsMisalignedMemoryAccesses function, this causes a call to expandUnalignedLoad at a later stage, which inserts the undesired shift/or code sequences.

On my previous fix approach, I attempted to solve the issue by getting the FindOptimalMemOpLowering and getOptimalMemOpType to use the smallest possible alignment for both the source and the destination. This avoids the shift/or code because the generated load/stores are always of the smaller type. This kind of works, but that approach generates a larger number of load/store pairs than the original implementation. For example, for a 5 byte memcpy it will generate 5 single byte load/store pairs, which may be more or less optimal depending on the target or whether you aim to optimize for speed or size.

The ultimate reason for the original problem is that the ‘load’ instructions are conservatively created with the source operand alignment. But the loads in reality never attempt to access unaligned offsets. Indeed, all the load/store replacements (regardless of size) are created on aligned offsets with respect to the origin of the memcpy operands, regardless of the target supporting or not misaligned accesses. This means that there’s no need to expand the load/stores into shifts/ors even for targets not supporting misaligned accesses, PROVIDED THAT we know that the initial operand addresses of the memcpy were aligned already.

So this leads to a problem of a more difficult solution, we need to infer more information from the memcpy operands. The current getMemcpyLoadsAndStores implementation already checks for the destination to be the stack frame, and uses this to provide bigger alignments to stores. For the source operand the function calls SelectionDAG::InferPtrAlignment. Unfortunately, the later always returns 0 rather than the proper alignment for the source node, which is the seed of the problem. Instead, if we already knew that pointers to a base Global Address are aligned to 2 for a particular architecture, we should use that information to set SrcAlign, and therefore produce an optimal sequence of load/stores for memcpy replacement.

We need to find the nature of the source and set SrcAlign to the correct value (greater than 1) at the beginning of getMemcpyLoadsAndStores. That would be the perfect fix.

Alternatively, we could provide the correct Alignment argument already to the getMemcpy function

So my question now goes back to my original one, how do I do that?

By the way, if I manually set align to 2 in Clang output files, such as on the line below, (which corresponds to a global var declaration):

@ga = dso_local local_unnamed_addr global %struct.AA { i8 3, i8 4, i8 5, i8 6 }, align 1

replaced by

@ga = dso_local local_unnamed_addr global %struct.AA { i8 3, i8 4, i8 5, i8 6 }, align 2

then everything turns perfect inside LLVM.

So again, I need a hook either in CLang or in LLVM to set the desired align to global vars and stack allocas.

Any ideas?

John Lluch

On 14 May 2019, at 23:21, Joan Lluch <joan.lluch at icloud.com<mailto:joan.lluch at icloud.com>> wrote:

Hi Tim,

I am not sure about what you mean by “upstream” the function. I attach a text file with my proposed changes in case you can do something with it. It seems to me that the programmer that coded this function was already considering this, or maybe somebody spoiled it later, because it turns out that the problem is just a missing sentence in the function, that I have now added. I have tested it the best that I could for several targets, but I have not applied any systematised testing procedure.

I also found that not only the MSP430 target is affected, but also the MIPS16 target too.

For the MIPS16 you also need to remove the “getOptimalMemOpType” or make it return MVT::Other, because the current implementation returning a concrete MVT prevents the proposed changes to take the desired effect. The changes benefit the MIPS16  (llc -march=mips -mips-os16) because it’s a subtarget that does not support unaligned accesses.

Thanks for your help. The function is attached, it just has an added statement

John Lluch
<FindOptimalMemOpLowering.txt>

On 14 May 2019, at 19:55, Tim Northover <t.p.northover at gmail.com<mailto:t.p.northover at gmail.com>> wrote:

Hi John,

On Tue, 14 May 2019 at 17:51, Joan Lluch <joan.lluch at icloud.com<mailto:joan.lluch at icloud.com>> wrote:

This problem is also shared by the MSP430 target, and it’s very easy to reproduce by just compiling the code that I posted before.

That's some good detective work; it definitely explains what you're
seeing. Since MSP430 is affected it would probably be pretty easy to
upstream an alignment-aware version of the function and test it, if
you're keen. I'd promise to do a review promptly!

Cheers.

Tim.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190517/5dc9aa15/attachment.html>