[PATCH] D30416: [InstCombine] Redo reduceLoadOpStoreWidth in instcombine for bitfield store optimization.
Friedman, Eli via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 8 14:11:35 PST 2017
On 3/8/2017 12:26 PM, Wei Mi wrote:
> Eli, thank you for pointing me to the thread. Very helpful to
> understand the background.
>
> Now I have a better understanding of the problem. The change in that
> thread, wrapping consecutive bitfields into a group, has the benefit
> that loads of different bitfields can be merged. If the bitfield
> accesses are not of a legal type like i8/i16/i32, the merge is
> helpful. If the bitfield accesses are of a legal type, we want to do
> the shrinking here.
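>
> To make the setting concrete, here is a rough C++ sketch of the kind
> of code involved (the struct and field names are made up): the
> consecutive 16-bit fields share one 64-bit word, so each field store
> becomes a wide i64 load-mask-or-store sequence like the IR further
> below.
>
> struct S {
>   unsigned long f1 : 16;  // bits 0-15 of the shared word
>   unsigned long f2 : 16;  // bits 16-31
>   unsigned long f3 : 16;  // bits 32-47
> };
>
> void set(S *s, unsigned short a, unsigned short b) {
>   s->f1 = a;  // wide load + mask + or + wide store
>   s->f2 = b;  // candidate for shrinking to a narrow i16 store
> }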
>
> However, for bitfield load/store merging (EarlyCSE, GVN, and even
> InstCombine can do such merging, but GVN does it most thoroughly) and
> legal-type bitfield access shrinking, whichever one happens first
> makes the other one more difficult. If we do the legal-type shrinking
> first, the shrunk load/store will prevent other bitfield loads/stores
> from the same bitfield group from being merged. If we do the bitfield
> load/store merging first, we face the problems described previously:
> 1. we need MemorySSA for the safety check; 2. the pattern matching
> becomes more difficult.
>
> After some thought, I think it is still better to do the bitfield
> shrinking later. To keep the shrinking effective for the case below:
> 1. I need to use MemorySSA to check that there is no aliasing memory
> write between the load and the final store. 2. I need to extend the
> existing pattern matching to handle
> store(or(and(...or(and(load Ptr, Cst), Value1)..., CstN), ValueN), Ptr)
> (a rough matcher sketch follows the IR below).
>
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> .....
>
> %bf.clear1 = and i64 %bf.set, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> .....
>
> %bf.clear2 = and i64 %bf.set1, -4294901761
> %bf.set2 = or i64 %bf.value2, %bf.clear2
> store i64 %bf.set2, i64* %0, align 8
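>
> The chain above could be matched with something like the following
> rough sketch (function and variable names are illustrative, not from
> the patch):
>
> #include "llvm/IR/Constants.h"
> #include "llvm/IR/Instructions.h"
> using namespace llvm;
>
> // Walk from the value a store writes back through a chain of
> // or(and(Prev, Cst), Value) operations; return the root load if it
> // reads the same address the store writes.
> static LoadInst *findRootLoad(StoreInst *SI) {
>   Value *V = SI->getValueOperand();
>   while (auto *Or = dyn_cast<BinaryOperator>(V)) {
>     if (Or->getOpcode() != Instruction::Or)
>       break;
>     Value *Next = nullptr;
>     // One operand should be and(Prev, Cst); the other is the value
>     // being inserted into the bitfield.
>     for (unsigned I = 0; I != 2 && !Next; ++I)
>       if (auto *And = dyn_cast<BinaryOperator>(Or->getOperand(I)))
>         if (And->getOpcode() == Instruction::And &&
>             isa<ConstantInt>(And->getOperand(1)))
>           Next = And->getOperand(0);
>     if (!Next)
>       break;
>     V = Next;
>   }
>   auto *LI = dyn_cast<LoadInst>(V);
>   if (LI && LI->getPointerOperand() == SI->getPointerOperand())
>     return LI;
>   return nullptr;
> }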
>
> For 2, we know that the extended pattern above first loads the
> bitfield group, and the several "or + and" operations adjust different
> parts of the loaded value. If only the last "or + and" touches bits
> different from all the previous "or + and" operations, it is fine to
> split the final store into two stores and do the shrinking for the
> second store, like below (a sketch of the disjointness check follows
> the IR). I think this just adds more complexity to the existing patch,
> but it should still be doable.
>
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> .....
>
> %bf.clear1 = and i64 %bf.set, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> store i64 %bf.set1, i64* %0, align 8
> .....
>
> %bf.value2.shrunk = shrink_op %bf.value2   ; pseudo-op: lshr + trunc to i16
> %bf.addr = bitcast i64* %0 to i16*
> %bf.addr.1 = getelementptr i16, i16* %bf.addr, i64 1  ; bits 16-31 of the word
> store i16 %bf.value2.shrunk, i16* %bf.addr.1, align 2
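>
> The "different bits" condition can be checked from the and masks
> alone. A minimal sketch of what I have in mind (my assumption, not
> code from the patch): the bits the final mask clears must not overlap
> the bits any earlier mask cleared.
>
> #include "llvm/ADT/APInt.h"
> #include "llvm/ADT/ArrayRef.h"
> using namespace llvm;
>
> // Each and-mask Ci keeps the bits it leaves set, so ~Ci are the bits
> // that step rewrites. The final store may be split off and shrunk
> // only if ~Clast is disjoint from every earlier ~Ci.
> static bool lastStepTouchesDisjointBits(ArrayRef<APInt> Masks) {
>   APInt LastCleared = ~Masks.back();
>   for (const APInt &M : Masks.drop_back())
>     if ((LastCleared & ~M) != 0)
>       return false;
>   return true;
> }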
>
> For 1, I probably need to add a new pass for it, because I cannot put
> it in CGP anymore (there is no dominator tree to use in CGP; the
> dominator tree update there is broken, as described in
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html.
> That thread is three years old, but from my reading of the code the
> conclusion still holds).
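>
> For the MemorySSA check itself, the shape would be roughly the
> following (an assumption about how I would use the API, not final
> code):
>
> #include "llvm/Transforms/Utils/MemorySSA.h"
> using namespace llvm;
>
> // The load/store pair is safe to shrink only if the store's nearest
> // clobbering write is exactly the memory state the load read from,
> // i.e. nothing may have written the location in between.
> static bool nothingWroteBetween(LoadInst *LI, StoreInst *SI,
>                                 MemorySSA &MSSA) {
>   auto *LoadUse = cast<MemoryUse>(MSSA.getMemoryAccess(LI));
>   MemoryAccess *Clobber =
>       MSSA.getWalker()->getClobberingMemoryAccess(SI);
>   return Clobber == LoadUse->getDefiningAccess();
> }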
>
> Comments are welcome.
This makes sense... but please post a thread on llvmdev before you spend
too much time writing code.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project