[LLVMdev] How to enable use of 64bit load/store for 32bit architecture

Mon Apr 6 13:44:13 PDT 2015

> On Apr 6, 2015, at 1:35 PM, James Y Knight <jyknight at google.com> wrote:
> 
> Thanks for the help so far!
> 
> So, there were three alternatives proposed on this thread.
> 
> (1) Jim's suggestion to copy the solution ARM uses for LDRD/STRD. This looks like it would be applicable, since ARM's LDRD/STRD seem a basically direct analog of the Sparc's LDD/STD instructions. Both require an even/odd register pair, and 64bit alignment of the memory for the 2x32 load/store. But that solution seemed pretty complicated, requiring writing a separate optimization pass. I thought I'd try a simpler solution first...
> 
> (2) Tom's suggestion of declaring that the IntPair registers are of type v2i32. I actually tried implementing this. And it does seem to work...kinda.
> 
> First thing -- it seems kinda hacky to me to use a vector type for a pair of integer registers. What if the architecture also had real vector ops? Wouldn't that conflict with the use v2i32 to represent a pair of integer registers?
> 
> But anyways,
> I implemented Legal actions of LOAD, STORE, EXTRACT_VECTOR_ELT, and BUILD_VECTOR (the latter two expanding to EXTRACT_SUBREG/INSERT_SUBREG patterns) on the v2i32 type in the .td file. All other operations for v2i32 are marked Expand. I made i64 LOAD/STORE Custom, and added handlers in LowerOperation and ReplaceNodeResults, which do a v2i32 load/store, and a bitcast (which turns into the build/extract operations).
> 
> So, that all seems to work fine for i64 operations now.
> 
> But, doing some testing, it seems that something within LLVM *really* wants to merge two i32 loads into an v2i32, even though it's not appropriate. It makes terrible code for this example:
>   struct X {
>     int a, b;
>   };
>   struct X x, y;
> 
>   void f() {
>     long a = x.a, b = x.b;
>     y.a = a; y.b = b;
>   }
> 
> "f()" turns into the expected 4 operations:
>   %0 = load i32, i32* getelementptr inbounds (%struct.X, %struct.X* @x, i32 0, i32 0), align 4, !tbaa !1
>   %1 = load i32, i32* getelementptr inbounds (%struct.X, %struct.X* @x, i32 0, i32 1), align 4, !tbaa !6
>   store i32 %0, i32* getelementptr inbounds (%struct.X, %struct.X* @y, i32 0, i32 0), align 4, !tbaa !1
>   store i32 %1, i32* getelementptr inbounds (%struct.X, %struct.X* @y, i32 0, i32 1), align 4, !tbaa !6
> 
> But, somewhere in the bowels after that, llvm seems to try to merge the two sequential i32 loads and stores into a v2i32 load and store. But, because "x" and "y" are not 64bit aligned, it then ends up converting that v2i32 load to occur from the stack -- after using i32 load/stores to copy x.a, x.b onto the stack temporarily! Then, immediately after the v2i32 load from the stack....it starts the reverse process -- a v2i32 store BACK to a different place in the stack, and i32 load/stores copying the latter stack location onto y.a and y.b.
> 
> That seems quite sub-optimal, but I'm not sure where it comes from. It looks like just *having* the v2i32 type assigned to the register class causes that "optimization". The other stuff I implemented doesn't affect it. Why is it trying to do a conversion to v2i32 when the arguments are not properly aligned? Running "llc -debug" doesn't seem to shed any light for me.
Its probably DAGCombiner::MergeConsecutiveStores which is checking that the stores have equal alignment (and they do: 4), but not that the final store has enough alignment to be legal.  I’d breakpoint MergeStoresOfConstantsOrVecElts and see if you get a hit there.

Cheers,
Pete
> 
> (3) Pete's "not pretty" solution of using Untyped operands, and REG_SEQUENCE. I'm gonna try implementing this one next. It seems like it might be the simplest of the three, and have fewer side-effects since it doesn't introduce a new legal type at all.
> 
> 
> On Apr 2, 2015, at 8:02 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com>> wrote:
> 
>>> 
>>> On Apr 2, 2015, at 2:07 PM, Tom Stellard <tom at stellard.net <mailto:tom at stellard.net>> wrote:
>>> 
>>> On Thu, Apr 02, 2015 at 01:35:55PM -0700, Pete Cooper wrote:
>>>> Hi James, Jim
>>>> 
>>>> If you *really* want this to work in selection DAG then there is a solution, but its not pretty.
>>>> 
>>>> First make i64 not be legal.  Then, assuming the regclass you gave has some subregs, you can give load/store a custom legalisation where you change the i64 to MVT::Untyped.  So something like this for ISD::STORE:
>>>> 
>>>> SDValue ValueToBeStored = St.getOperand(…)
>>>> 
>>>> auto SeqOps[] = { 
>>>>  DAG.getTargetConstant(SP::IntPairRegClassID, MVT::i32),
>>>>  DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, ValueToBeStored, DAG.getConstant(0, MVT::i32)),
>>>>  DAG.getTargetConstant(SP ::sub0, MVT::i32),
>>>>  DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, ValueToBeStored, DAG.getConstant(1, MVT::i32)),
>>>>  DAG.getTargetConstant(SP ::sub1, MVT::i32)
>>>> };
>>>> 
>>>> SDValue NewValueToBeStored = DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, dl, MVT::Untyped, SeqOps);
>>>> 
>>>> return DAG.getStore(…, NewValueToBeStored, …)
>>>> 
>>>> The legalizer won’t touch Untyped operands, so once you’ve changed the store operand to this, you won’t have the legalize complain later.  But this is very target specific and prone to error if you get sub regs or anything wrong.
>>>> 
>>>> Cheers,
>>>> Pete
>>> 
>>> Hi,
>>> 
>>> We have a similar issue in the R600 backend.  We need to support 128-bit
>>> (and larger) stores, but i128 is not a legal type.  The way we handle
>>> this is to make a same sized vector type (v4i32) legal and then add a
>>> loop to the TargetLowering constructor to Expand everything except loads and
>>> stores.
>>> 
>>> I'm not sure if the TypeLegalizer will promote i64 stores to v2i32, but
>>> if not I think you can add a target DAGCombine for it.
>>> 
>>> 
>>> I'm not sure if this is better or worse than Pete's suggestion, but it
>>> may be worth a try.
>> This solution is definitely less of a hack than mine :)
>> 
>> Mine has the advantage of working with weird sized types, but if you don’t need to store something like a <3 x float> then i’d go with Tom’s solution.
>> 
>> Cheers,
>> Pete
>>> 
>>> -Tom
>>> 
>>>>> On Apr 2, 2015, at 1:20 PM, Jim Grosbach <grosbach at apple.com <mailto:grosbach at apple.com>> wrote:
>>>>> 
>>>>> Hi James,
>>>>> 
>>>>> I’m not too familiar with Sparc’s design here, but you may find it useful to have a look at how the ARM target deals with this sort of thing. In particular, the load/store optimization pass which can form the LDRD/STRD instructions among other things.
>>>>> 
>>>>> -Jim
>>>>> 
>>>>>> On Apr 2, 2015, at 11:43 AM, James Y Knight <jyknight at google.com <mailto:jyknight at google.com> <mailto:jyknight at google.com <mailto:jyknight at google.com>>> wrote:
>>>>>> 
>>>>>> In http://reviews.llvm.org/D8713 <http://reviews.llvm.org/D8713> <http://reviews.llvm.org/D8713 <http://reviews.llvm.org/D8713>>, I added the 64bit integer store ("std") and load ("ldd") instructions for 32bit sparc. But now I need codegen to know how to emit them, and am not sure the best way to go about teaching the backend that 64bit integers can be used natively, but only for loads and stores.
>>>>>> 
>>>>>> (I originally wrote an earlier draft of question in the review but it seems like it may be better on the dev list, since it's more what to do next than what was done.)
>>>>>> 
>>>>>> Basically, I'd like it if C code like:
>>>>>>  long long x,y;
>>>>>>  void f() { y = x; }
>>>>>> Or equivalently, the ll code
>>>>>>  %0 = load i64, i64* @x, align 8
>>>>>>  store i64 %0, i64* @y, align 8
>>>>>> turned into just "ldd, std" instructions, as it does in GCC, rather than loading and storing the two 32bit halves of the variables separately.
>>>>>> 
>>>>>> To allow that, I tried adding:
>>>>>>  addRegisterClass(MVT::i64, &SP::IntPairRegClass)
>>>>>> to SparcTargetLowering::SparcTargetLowering in 32bit mode.
>>>>>> 
>>>>>> Doing that then makes load/store work. But it causes llvm to try to use i64 operations for *everything*, which of course fails for all other operations, since there's no such instruction pattern for them.
>>>>>> 
>>>>>> Okay, so I then try setting all the operations to "Expand" via:
>>>>>>    for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op) {
>>>>>>      setOperationAction(Op, MVT::i64, Expand);
>>>>>>    }
>>>>>> 
>>>>>> This does not actually work properly. E.g.
>>>>>>  %res = add nsw i64 %0, 3
>>>>>> gives an error:
>>>>>>  LLVM ERROR: Cannot select: 0x43b1df0: i64 = add 0x43b1bd0, 0x43b1ce0 [ORD=3] [ID=15]
>>>>>> ...
>>>>>> Which seems odd -- it appears as if it ignored the Expand and just tried to continue onto selection for 64bit add as if it had been legal?
>>>>>> 
>>>>>> And a different example:
>>>>>>  %res = shl i64 %0, 1
>>>>>> aborts with error:
>>>>>>  llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:3988: void {anonymous}::SelectionDAGLegalize::ExpandNode(llvm::SDNode*): Assertion `VT.isVector() && "Unable to legalize non-vector shift"' failed.
>>>>>> 
>>>>>> So, it seems to me that a whole bunch of other code will need to be added or modified, too, to teach LLVM how to expand i64 to i32 operations for everything other than load/store, if I continue down this path -- despite that it can do so just fine if i64 doesn't have a register class. This seems ugly...
>>>>>> 
>>>>>> Is there some other better way to notate the case that ONLY loads/stores can be done on 64bit integers, but that everything else must be done as if 64bit integers didn't exist?
>>>>>> 
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> <http://llvm.cs.uiuc.edu/ <http://llvm.cs.uiuc.edu/>>
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>>>> 
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>>> 
>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150406/f99bf47d/attachment.html>