[PATCH] D30416: [InstCombine] Redo reduceLoadOpStoreWidth in instcombine for bitfield store optimization.

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 7 10:39:40 PST 2017


You can call DataLayout::isLegalInteger() to verify the type you're
shrinking to is reasonable (InstCombine already does this, e.g.
InstCombiner::shouldChangeType()).
This isn't quite the same thing as querying TLI, but...
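For concreteness, a minimal sketch of that legality check (the helper name and
its use here are mine, not something in the patch):

  #include "llvm/IR/DataLayout.h"

  // Only shrink to a width the target declares as a legal integer type
  // (i.e. a width listed in the DataLayout "n" spec, e.g. n8:16:32:64 on
  // x86-64), much like InstCombiner::shouldChangeType() does.
  static bool isShrinkWidthReasonable(const llvm::DataLayout &DL,
                                      unsigned NewBitWidth) {
    return DL.isLegalInteger(NewBitWidth);
  }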

On Tue, Mar 7, 2017 at 9:36 AM, Friedman, Eli <efriedma at codeaurora.org>
wrote:

> On 3/7/2017 9:15 AM, Wei Mi wrote:
>
>
>
> On Mon, Mar 6, 2017 at 8:50 PM, Wei Mi <wmi at google.com> wrote:
>
>>
>>
>> On Mon, Mar 6, 2017 at 4:36 PM, Friedman, Eli <efriedma at codeaurora.org>
>> wrote:
>>
>>> On 3/6/2017 11:06 AM, Wei Mi wrote:
>>>
>>>> After looking at the benchmark motivating the test, I find it is much
>>>> harder for codegenprepare to catch all the shrinking opportunities than it
>>>> is for instcombine, because of two things:
>>>>
>>>> * loadpre may merge the original load in a shrinking pattern with another
>>>> load at a much earlier position, and there may be other stores in between.
>>>> The safety check in codegenprepare currently just scans for
>>>> mayWriteToMemory instructions between the load and the store of the
>>>> candidate shrinking pattern (roughly the check sketched below), and it
>>>> fails because of those stores in between.
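For reference, the conservative scan described in that first bullet amounts to
something like the following. This is a sketch of the idea, not the actual
codegenprepare code, and it assumes the load and the store are in the same
basic block:

  #include "llvm/IR/Instructions.h"

  // Bail out as soon as any instruction between Load and Store may write to
  // memory; no alias query is made, so stores to unrelated memory also block
  // the shrinking.
  static bool nothingMayWriteBetween(llvm::LoadInst *Load,
                                     llvm::StoreInst *Store) {
    for (llvm::Instruction *I = Load->getNextNode(); I && I != Store;
         I = I->getNextNode())
      if (I->mayWriteToMemory())
        return false;
    return true;
  }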
>>>>
>>>
>>> It would be easy to make the check a lot more powerful using MemorySSA;
>>> I guess you don't have access to that in the passes where you're doing the
>>> transform?
>>
>>
>> Yes, if MemorySSA plus alias queries can be used, that should solve the
>> problem. I didn't find any existing usage of MemorySSA in instcombine or
>> CGP, so I am not sure whether it can be used there.
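For illustration, a MemorySSA-based replacement for the linear scan might look
roughly like this. It is only a sketch under the assumption that MemorySSA is
available in the pass; the helper name is mine:

  #include "llvm/Analysis/MemoryLocation.h"
  #include "llvm/Analysis/MemorySSA.h"
  using namespace llvm;

  // Instead of scanning every instruction between Load and Store, ask whether
  // the nearest clobber of the loaded location, seen from just above Store,
  // is exactly the access Load already reads from. Stores to unrelated memory
  // in between no longer defeat the transform.
  static bool noClobberBetween(LoadInst *Load, StoreInst *Store,
                               MemorySSA &MSSA) {
    MemorySSAWalker *Walker = MSSA.getWalker();
    MemoryAccess *LoadClobber = Walker->getClobberingMemoryAccess(Load);
    MemoryUseOrDef *StoreAccess = MSSA.getMemoryAccess(Store);
    MemoryAccess *ClobberAboveStore = Walker->getClobberingMemoryAccess(
        StoreAccess->getDefiningAccess(), MemoryLocation::get(Load));
    return ClobberAboveStore == LoadClobber;
  }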
>>
>>
>>>
>>>
>>>> * we may see the following pattern. a.f1 is a bitfield. Ideally, we
>>>> should do the shrinking for all three stores. But codegenprepare works in
>>>> top-down order; after the first store is shrunk, it is hard for the second
>>>> and the third to know that %bf.set is the same as the value loaded from
>>>> %0. If we do this in instcombine, it is much easier because, before
>>>> load/store elimination, %bf.set is still loaded from %0 separately in
>>>> if.then9 and in if.end41.
>>>>
>>>
>>> I'm not following how it helps to do this in instcombine?  Normally,
>>> EarlyCSE runs first.  (A complete C testcase might be useful here.)
>>
>>
>> It is not earlycse; it is again loadpre that causes the load to be replaced
>> by %bf.set.
>>
>> Before global value numbering:
>>
>> entry:
>>   %0 = bitcast %class.B* %this to i64*
>>   %bf.load = load i64, i64* %0, align 8
>>   %dec = add i64 %bf.load, 65535
>>   %bf.value = and i64 %dec, 65535
>>   %bf.clear5 = and i64 %bf.load, -65536
>>   %bf.set = or i64 %bf.value, %bf.clear5
>>   store i64 %bf.set, i64* %0, align 8
>>   ...
>> if.then9:
>>   ; %bf.load1 is helpful for the patch to recognize the load-and-or-store pattern.
>>   %bf.load1 = load i64, i64* %0, align 8
>>   %inc79 = add i64 %bf.load1, 281474976710656
>>   %bf.shl = and i64 %inc79, 71776119061217280
>>   %bf.clear18 = and i64 %bf.load1, -71776119061217281
>>   %bf.set19 = or i64 %bf.shl, %bf.clear18
>>   store i64 %bf.set19, i64* %0, align 8
>>   ...
>> if.end41:
>>   ; %bf.load2 is helpful for the patch to recognize the load-and-or-store pattern.
>>   %bf.load2 = load i64, i64* %0, align 8
>>   %inc4578 = add i64 %bf.load2, 65536
>>   %bf.shl48 = and i64 %inc4578, 4294901760
>>   %bf.clear49 = and i64 %bf.load2, -4294901761
>>   %bf.set50 = or i64 %bf.shl48, %bf.clear49
>>   store i64 %bf.set50, i64* %0, align 8
>>
>> After global value numbering:
>>
>> entry:
>>   %0 = bitcast %class.B* %this to i64*
>>   %bf.load = load i64, i64* %0, align 8
>>   %dec = add i64 %bf.load, 65535
>>   %bf.value = and i64 %dec, 65535
>>   %bf.clear5 = and i64 %bf.load, -65536
>>   %bf.set = or i64 %bf.value, %bf.clear5
>>   store i64 %bf.set, i64* %0, align 8
>>   ...
>> if.then9:
>>   ; we hope to know that %bf.set is the same as "load i64, i64* %0, align 8".
>>   %inc79 = add i64 %bf.set, 281474976710656
>>   %bf.shl = and i64 %inc79, 71776119061217280
>>   %bf.clear18 = and i64 %bf.set, -71776119061217281
>>   %bf.set19 = or i64 %bf.shl, %bf.clear18
>>   store i64 %bf.set19, i64* %0, align 8
>>   ...
>> if.end41:
>>   ; we hope to know that %bf.set is the same as "load i64, i64* %0, align 8".
>>   %inc4578 = add i64 %bf.set, 65536
>>   %bf.shl48 = and i64 %inc4578, 4294901760
>>   %bf.clear49 = and i64 %bf.set, -4294901761
>>   %bf.set50 = or i64 %bf.shl48, %bf.clear49
>>   store i64 %bf.set50, i64* %0, align 8
>>
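The original C testcase is not in this thread, but a hypothetical C++
reduction that produces IR of roughly this shape (one 64-bit bitfield word
with three fields updated under different conditions) would be:

  class B {
    unsigned long f0 : 16; // bits [0,16):  decremented in the entry block
    unsigned long f1 : 16; // bits [16,32): incremented in if.end41
    unsigned long f2 : 16; // bits [32,48): untouched here
    unsigned long f3 : 8;  // bits [48,56): incremented in if.then9
  public:
    void update(bool a);
  };

  void B::update(bool a) {
    f0--;        // store #1: shrinkable to an i16 store
    if (a)
      f3++;      // store #2: shrinkable to an i8 store
    f1++;        // store #3: shrinkable to an i16 store
  }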
>
> I don't like depending on pass ordering like this... I mean, I can't
> really say without the original C testcase, but at first glance, EarlyCSE
> should be catching this, and you're just getting lucky.
>
>  Or could I move all of the shrinking to instcombine? Since I only shrink
> to i8, i16, and i32, I think almost all common architectures support
> i8/i16/i32 load/store directly?
>
> It should be safe to assume every architecture has a reasonable lowering
> for i8, i16, and i32 load/store.
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
>
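To make the intended rewrite concrete: for the entry-block pattern above (a
16-bit field at bit offset 0, on a little-endian target where i16 is legal),
the shrinking would replace the i64 load/add/and/or/store sequence with a
narrow load, a narrow add, and a narrow store. A rough IRBuilder sketch of
that rewrite, not the patch itself:

  #include "llvm/IR/IRBuilder.h"
  using namespace llvm;

  // WideStore is "store i64 %bf.set, i64* %0" and WideLoad is the matching
  // "load i64, i64* %0"; this assumes the modified field occupies bits
  // [0,16) and that i16 loads/stores are legal on the target.
  static void shrinkLowFieldDecrement(StoreInst &WideStore, LoadInst &WideLoad) {
    IRBuilder<> B(&WideStore);
    Type *NarrowTy = B.getInt16Ty();
    Value *NarrowPtr =
        B.CreateBitCast(WideLoad.getPointerOperand(), NarrowTy->getPointerTo());
    Value *NarrowLoad = B.CreateLoad(NarrowTy, NarrowPtr, "bf.load.narrow");
    // "add i64 ..., 65535" followed by a 16-bit mask is just a decrement of
    // the low 16-bit field, so the narrow version is a plain i16 add of -1.
    Value *NarrowDec =
        B.CreateAdd(NarrowLoad, B.getInt16(0xFFFF), "bf.dec.narrow");
    B.CreateStore(NarrowDec, NarrowPtr);
    // The wide store and its now-dead feeding instructions would then be
    // removed by the caller.
  }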
>