[LLVMdev] How to enable use of 64bit load/store for 32bit architecture

Tue Apr 7 06:54:56 PDT 2015

On Apr 6, 2015, at 4:44 PM, Pete Cooper <peter_cooper at apple.com> wrote:
>> But, somewhere in the bowels after that, llvm seems to try to merge the two sequential i32 loads and stores into a v2i32 load and store. But, because "x" and "y" are not 64bit aligned, it then ends up converting that v2i32 load to occur from the stack -- after using i32 load/stores to copy x.a, x.b onto the stack temporarily! Then, immediately after the v2i32 load from the stack....it starts the reverse process -- a v2i32 store BACK to a different place in the stack, and i32 load/stores copying the latter stack location onto y.a and y.b.
>> 
>> That seems quite sub-optimal, but I'm not sure where it comes from. It looks like just *having* the v2i32 type assigned to the register class causes that "optimization". The other stuff I implemented doesn't affect it. Why is it trying to do a conversion to v2i32 when the arguments are not properly aligned? Running "llc -debug" doesn't seem to shed any light for me.
> Its probably DAGCombiner::MergeConsecutiveStores which is checking that the stores have equal alignment (and they do: 4), but not that the final store has enough alignment to be legal.  I’d breakpoint MergeStoresOfConstantsOrVecElts and see if you get a hit there.

Yes, it is indeed MergeConsecutiveStores doing this. (not MergeStoresOfConstantsOrVecElts; the case of merging a sequence of stores whose data comes from a sequence of loads handled in the main function body, and that's the case in question).

I see in a bunch of other places, llvm checks 'allowsMisalignedMemoryAccesses' before creating a new larger load/store operation. I suppose that code should do so as well. I wonder if this is the only place that creates unaligned operations improperly?

I'd say, additionally, the alignment checks that MergeConsecutiveStores does do seem pretty bogus:
1) it doesn't make sense to require that the stores all have the same alignment as each-other: if the struct as a whole had an alignment of 8, the first store would have alignment 8, and the second would have alignment 4. And *that* ought to be valid, but isn't currently treated as such.
2) Later on, it then requires that the alignment of each load is the same as the alignment of the corresponding store "if (Ld->getAlignment() != St->getAlignment()) break;". But why? As long as the alignments are both acceptable, I don't see what purpose ensuring that they're identical has.

As to what it's doing storing temporarily onto the stack, that's done by "ExpandUnalignedLoad" and "ExpandUnalignedStore" -- apparently they are only able to deal with unaligned vector and floating-point ops by temporarily storing to the stack, unless there's an equivalent integer type of the same size. That seems quite suboptimal -- I'd assume it would almost always be better to split into loads of the largest integer type actually available, and build things up from there, instead of indirecting through the stack...