[LLVMdev] Avoiding load narrowing in DAGCombiner
Matt Johnson
johnso87 at crhc.illinois.edu
Wed Jul 27 18:07:10 PDT 2011
On 07/27/2011 05:59 PM, Eli Friedman wrote:
> On Wed, Jul 27, 2011 at 3:50 PM, Matt Johnson
> <johnso87 at crhc.illinois.edu> wrote:
>> Hi Eli,
>>
>> On 07/27/2011 04:59 PM, Eli Friedman wrote:
>>> On Wed, Jul 27, 2011 at 2:28 PM, Matt Johnson
>>> <johnso87 at crhc.illinois.edu> wrote:
>>>> Hi All,
>>>> I'm writing a backend for a target which only supports 4-byte,
>>>> 4-byte-aligned loads and stores. I custom-lower all {*EXT}LOAD and
>>>> STORE nodes in TargetISelLowering.cpp to take advantage of all alignment
>>>> information available to the backend, rather than treat each load and
>>>> store conservatively, which takes O(10) instructions. My target's
>>>> allowsUnalignedMemoryOperations() always returns 'false', and the
>>>> setOperationAction()s for i8,i16,i32 loads and stores are all 'Custom'.
>>>>
>>>> I'm running into a problem where DAGCombiner is being too clever
>>>> for me; it runs LegalizeDAG, which calls my custom LowerLOAD() and
>>>> LowerSTORE() routines (which emit between 1 and O(10) SDValues,
>>>> depending on alignment information), and then runs DAGCombine. To lower
>>>> an i16 STORE that is known to be in the high-addressed 2 bytes of a word
>>>> on my little-endian target, I emit and LD4 from the word-aligned address
>>>> and an SRL 16 to shift the i16 into the LSbits of the register.
>>>>
>>>> DAGCombine visit()s an ISD::SRL node and notices that it is
>>>> right-shifting the result of an LD4 from %arrayidx4 by 16 bits, and
>>>> replaces it with an LD2 from %arrayidx+2.
>>>>
>>>> Replaces
>>>> --------
>>>> 0x17f7070: i32,ch = load 0x17faa00, 0x17f7f70, 0x17f6a70<LD4[%arrayidx4]>
>>>> 0x17f94c0: i32 = Constant<16> [ORD=9] [ID=10]
>>>> 0x17f7470: i32 = srl 0x17f7070, 0x17f94c0
>>>>
>>>> With
>>>> ----
>>>> 0x17fceb0: i32,ch = load 0x17faa00, 0x17fac00,
>>>> 0x17f6a70<LD2[%arrayidx4+2], zext from i16>
>>>>
>>>> That seems like a logical thing to do on a lot of targets, but in my
>>>> case, the LD4->SRL combo was emitted precisely because I *don't* want a
>>>> ZEXTLOAD i16. I'm wondering if there is:
>>>>
>>>> a) A way to turn off this optimization in DAGCombine if your target
>>>> doesn't support LD2 natively
>>> This.
>>>
>>> I think you're talking about DAGCombiner::ReduceLoadWidth... and the
>>> "if (LegalOperations&& !TLI.isLoadExtLegal(ExtType, ExtVT))" is
>>> supposed to ensure that the transformation in question is safe. That
>>> said, IIRC I fixed a bug there recently; are you using trunk?
>> Perfect! I'm using 2.8 for now (am hoping to roll forward to trunk and stay
>> there in a month or two), and 2.8 only had that check for SIGN_EXTEND_INREG,
>> not SRL or others. The bugfix was in r124587.
>>
>> I'm now seeing a similar problem with SimplifyDemandedBits() on a
>> 4-byte-aligned i8 load. LowerLOAD() emits an aligned LD4 followed by an AND
>> with constant 0xFF. SimplifyDemandedBits() sees this and changes it to a
>> zero-extended LD1. Is this the same situation, where a bugfix was made
>> after 2.8? Any idea where to look?
> The following?
>
> // If the LHS is '(and load, const)', the RHS is 0,
> // the test is for equality or unsigned, and all 1 bits of the const are
> // in the same partial word, see if we can shorten the load.
> if (DCI.isBeforeLegalize()&&
> N0.getOpcode() == ISD::AND&& C1 == 0&&
> N0.getNode()->hasOneUse()&&
> isa<LoadSDNode>(N0.getOperand(0))&&
> N0.getOperand(0).getNode()->hasOneUse()&&
> isa<ConstantSDNode>(N0.getOperand(1))) {
> LoadSDNode *Lod = cast<LoadSDNode>(N0.getOperand(0));
> APInt bestMask;
> [...]
>
It turned out that this problem is fixed by (your) r135462.
isLoadExtLegal() returned true if the extload was marked 'Custom', which
is a problem post-legalize because you can't go back and do your custom
lowering :) It sounds like that patch was motivated by CellSPU, which
isn't surprising since I'm in much the same boat w.r.t. custom
load/store lowering.
On a side note, are there other consumers of isLoadExt/TruncStoreLegal()
that run pre-legalize and *do* want 'true' when the extload/truncstore
is 'Custom'?
> This shouldn't ever do the wrong thing because of the
> DCI.isBeforeLegalize() check, although the result might be less than
> ideal.
>
> Note that in general, this stuff only gets very lightly tested.
>
Fair enough. Hopefully I'll be able to come up with some bugfixes of my
own once I roll the backend forward :)
> -Eli
Thanks again,
-Matt
More information about the llvm-dev
mailing list