[LLVMdev] Avoiding load narrowing in DAGCombiner

Wed Jul 27 18:07:10 PDT 2011

On 07/27/2011 05:59 PM, Eli Friedman wrote:
> On Wed, Jul 27, 2011 at 3:50 PM, Matt Johnson
> <johnso87 at crhc.illinois.edu>  wrote:
>> Hi Eli,
>>
>> On 07/27/2011 04:59 PM, Eli Friedman wrote:
>>> On Wed, Jul 27, 2011 at 2:28 PM, Matt Johnson
>>> <johnso87 at crhc.illinois.edu>    wrote:
>>>> Hi All,
>>>>      I'm writing a backend for a target which only supports 4-byte,
>>>> 4-byte-aligned loads and stores.  I custom-lower all {*EXT}LOAD and
>>>> STORE nodes in TargetISelLowering.cpp to take advantage of all alignment
>>>> information available to the backend, rather than treat each load and
>>>> store conservatively, which takes O(10) instructions.  My target's
>>>> allowsUnalignedMemoryOperations() always returns 'false', and the
>>>> setOperationAction()s for i8,i16,i32 loads and stores are all 'Custom'.
>>>>
>>>>      I'm running into a problem where DAGCombiner is being too clever
>>>> for me; it runs LegalizeDAG, which calls my custom LowerLOAD() and
>>>> LowerSTORE() routines (which emit between 1 and O(10) SDValues,
>>>> depending on alignment information), and then runs DAGCombine.  To lower
>>>> an i16 STORE that is known to be in the high-addressed 2 bytes of a word
>>>> on my little-endian target, I emit and LD4 from the word-aligned address
>>>> and an SRL 16 to shift the i16 into the LSbits of the register.
>>>>
>>>> DAGCombine visit()s an ISD::SRL node and notices that it is
>>>> right-shifting the result of an LD4 from %arrayidx4 by 16 bits, and
>>>> replaces it with an LD2 from %arrayidx+2.
>>>>
>>>> Replaces
>>>> --------
>>>> 0x17f7070: i32,ch = load 0x17faa00, 0x17f7f70, 0x17f6a70<LD4[%arrayidx4]>
>>>> 0x17f94c0: i32 = Constant<16>    [ORD=9] [ID=10]
>>>> 0x17f7470: i32 = srl 0x17f7070, 0x17f94c0
>>>>
>>>> With
>>>> ----
>>>> 0x17fceb0: i32,ch = load 0x17faa00, 0x17fac00,
>>>> 0x17f6a70<LD2[%arrayidx4+2], zext from i16>
>>>>
>>>> That seems like a logical thing to do on a lot of targets, but in my
>>>> case, the LD4->SRL combo was emitted precisely because I *don't* want a
>>>> ZEXTLOAD i16.  I'm wondering if there is:
>>>>
>>>> a) A way to turn off this optimization in DAGCombine if your target
>>>> doesn't support LD2 natively
>>> This.
>>>
>>> I think you're talking about DAGCombiner::ReduceLoadWidth... and the
>>> "if (LegalOperations&&    !TLI.isLoadExtLegal(ExtType, ExtVT))" is
>>> supposed to ensure that the transformation in question is safe.  That
>>> said, IIRC I fixed a bug there recently; are you using trunk?
>> Perfect!  I'm using 2.8 for now (am hoping to roll forward to trunk and stay
>> there in a month or two), and 2.8 only had that check for SIGN_EXTEND_INREG,
>> not SRL or others.  The bugfix was in r124587.
>>
>> I'm now seeing a similar problem with SimplifyDemandedBits() on a
>> 4-byte-aligned i8 load.  LowerLOAD() emits an aligned LD4 followed by an AND
>> with constant 0xFF.  SimplifyDemandedBits() sees this and changes it to a
>> zero-extended LD1.  Is this the same situation, where a bugfix was made
>> after 2.8?  Any idea where to look?
> The following?
>
>      // If the LHS is '(and load, const)', the RHS is 0,
>      // the test is for equality or unsigned, and all 1 bits of the const are
>      // in the same partial word, see if we can shorten the load.
>      if (DCI.isBeforeLegalize()&&
>          N0.getOpcode() == ISD::AND&&  C1 == 0&&
>          N0.getNode()->hasOneUse()&&
>          isa<LoadSDNode>(N0.getOperand(0))&&
>          N0.getOperand(0).getNode()->hasOneUse()&&
>          isa<ConstantSDNode>(N0.getOperand(1))) {
>        LoadSDNode *Lod = cast<LoadSDNode>(N0.getOperand(0));
>        APInt bestMask;
> [...]
>
It turned out that this problem is fixed by (your) r135462.  
isLoadExtLegal() returned true if the extload was marked 'Custom', which 
is a problem post-legalize because you can't go back and do your custom 
lowering :)  It sounds like that patch was motivated by CellSPU, which 
isn't surprising since I'm in much the same boat w.r.t. custom 
load/store lowering.

On a side note, are there other consumers of isLoadExt/TruncStoreLegal() 
that run pre-legalize and *do* want 'true' when the extload/truncstore 
is 'Custom'?

> This shouldn't ever do the wrong thing because of the
> DCI.isBeforeLegalize() check, although the result might be less than
> ideal.
>
> Note that in general, this stuff only gets very lightly tested.
>
Fair enough.  Hopefully I'll be able to come up with some bugfixes of my 
own once I roll the backend forward :)
> -Eli
Thanks again,
-Matt