[PATCH] D30416: [InstCombine] Redo reduceLoadOpStoreWidth in instcombine for bitfield store optimization.
Friedman, Eli via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 8 14:11:35 PST 2017
On 3/8/2017 12:26 PM, Wei Mi wrote:
> Eli, thank you for pointing me to the thread. Very helpful to
> understand the background.
>
> Now I have a better understanding of the problem. The change in that
> thread, wrapping consecutive bitfields into a group, has the benefit
> that loads of different bitfields can be merged. If the bitfield
> accesses are not of a legal type like i8/i16/i32, the merge is
> helpful. If the bitfield accesses are of a legal type, we want to do
> the shrinking here.
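>
> To make the setting concrete, here is a rough C++ sketch of the kind
> of code involved (the struct and field names are made up): the
> consecutive 16-bit fields share one 64-bit word, so each field store
> becomes a wide i64 load-mask-or-store sequence like the IR further
> below.
>
> struct S {
>   unsigned long f1 : 16;  // bits 0-15 of the shared word
>   unsigned long f2 : 16;  // bits 16-31
>   unsigned long f3 : 16;  // bits 32-47
> };
>
> void set(S *s, unsigned short a, unsigned short b) {
>   s->f1 = a;  // wide load + mask + or + wide store
>   s->f2 = b;  // candidate for shrinking to a narrow i16 store
> }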
>
> However, for bitfield load/store merging (EarlyCSE, GVN, and even
> InstCombine can do such merging, but GVN does it most thoroughly) and
> legal-type bitfield access shrinking, whichever one happens first
> makes the other one more difficult. If we do the legal-type shrinking
> first, the shrunk load/store will prevent other bitfield loads/stores
> from the same bitfield group from being merged. If we do the bitfield
> load/store merging first, we face the problems described previously:
> 1. we need MemorySSA for the safety check; 2. the pattern matching
> becomes more difficult.
>
> After some thought, I think it is still better to do the bitfield
> shrinking later. To keep the shrinking effective for the case below:
> 1. I need to use MemorySSA to check that there is no aliasing memory
> write between the load and the final store. 2. I need to extend the
> existing pattern matching to handle
> store(or(and(...or(and(load Ptr, Cst), Value1)..., CstN), ValueN), Ptr)
> (a rough matcher sketch follows the IR below).
>
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> .....
>
> %bf.clear1 = and i64 %bf.set, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> .....
>
> %bf.clear2 = and i64 %bf.set1, -4294901761
> %bf.set2 = or i64 %bf.value2, %bf.clear2
> store i64 %bf.set2, i64* %0, align 8
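>
> The chain above could be matched with something like the following
> rough sketch (function and variable names are illustrative, not from
> the patch):
>
> #include "llvm/IR/Constants.h"
> #include "llvm/IR/Instructions.h"
> using namespace llvm;
>
> // Walk from the value a store writes back through a chain of
> // or(and(Prev, Cst), Value) operations; return the root load if it
> // reads the same address the store writes.
> static LoadInst *findRootLoad(StoreInst *SI) {
>   Value *V = SI->getValueOperand();
>   while (auto *Or = dyn_cast<BinaryOperator>(V)) {
>     if (Or->getOpcode() != Instruction::Or)
>       break;
>     Value *Next = nullptr;
>     // One operand should be and(Prev, Cst); the other is the value
>     // being inserted into the bitfield.
>     for (unsigned I = 0; I != 2 && !Next; ++I)
>       if (auto *And = dyn_cast<BinaryOperator>(Or->getOperand(I)))
>         if (And->getOpcode() == Instruction::And &&
>             isa<ConstantInt>(And->getOperand(1)))
>           Next = And->getOperand(0);
>     if (!Next)
>       break;
>     V = Next;
>   }
>   auto *LI = dyn_cast<LoadInst>(V);
>   if (LI && LI->getPointerOperand() == SI->getPointerOperand())
>     return LI;
>   return nullptr;
> }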
>
> For 2, we know that the extended pattern above first loads the
> bitfield group, and the several "or + and" operations adjust different
> parts of the loaded value. If only the last "or + and" touches bits
> different from all the previous "or + and" operations, it is fine to
> split the final store into two stores and do the shrinking for the
> second store, like below (a sketch of the disjointness check follows
> the IR). I think this just adds more complexity to the existing patch,
> but it should still be doable.
>
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> .....
>
> %bf.clear1 = and i64 %bf.set, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> store i64 %bf.set1, i64* %0, align 8
> .....
>
> %bf.value2.shrunk = shrink_op %bf.value2   ; pseudo-op: lshr + trunc to i16
> %bf.addr = bitcast i64* %0 to i16*
> %bf.addr.1 = getelementptr i16, i16* %bf.addr, i64 1  ; bits 16-31 of the word
> store i16 %bf.value2.shrunk, i16* %bf.addr.1, align 2
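>
> The "different bits" condition can be checked from the and masks
> alone. A minimal sketch of what I have in mind (my assumption, not
> code from the patch): the bits the final mask clears must not overlap
> the bits any earlier mask cleared.
>
> #include "llvm/ADT/APInt.h"
> #include "llvm/ADT/ArrayRef.h"
> using namespace llvm;
>
> // Each and-mask Ci keeps the bits it leaves set, so ~Ci are the bits
> // that step rewrites. The final store may be split off and shrunk
> // only if ~Clast is disjoint from every earlier ~Ci.
> static bool lastStepTouchesDisjointBits(ArrayRef<APInt> Masks) {
>   APInt LastCleared = ~Masks.back();
>   for (const APInt &M : Masks.drop_back())
>     if ((LastCleared & ~M) != 0)
>       return false;
>   return true;
> }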
>
> For 1, I probably need to add a new pass for it, because I cannot put
> it in CGP anymore (there is no dominator tree to use in CGP; the
> dominator tree update there is broken, as described in
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html.
> That thread is three years old, but from my reading of the code the
> conclusion still holds).
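>
> For the MemorySSA check itself, the shape would be roughly the
> following (an assumption about how I would use the API, not final
> code):
>
> #include "llvm/Transforms/Utils/MemorySSA.h"
> using namespace llvm;
>
> // The load/store pair is safe to shrink only if the store's nearest
> // clobbering write is exactly the memory state the load read from,
> // i.e. nothing may have written the location in between.
> static bool nothingWroteBetween(LoadInst *LI, StoreInst *SI,
>                                 MemorySSA &MSSA) {
>   auto *LoadUse = cast<MemoryUse>(MSSA.getMemoryAccess(LI));
>   MemoryAccess *Clobber =
>       MSSA.getWalker()->getClobberingMemoryAccess(SI);
>   return Clobber == LoadUse->getDefiningAccess();
> }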
>
> Comments are welcome.
This makes sense... but please post a thread on llvmdev before you spend
too much time writing code.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project