[llvm-dev] [RFC] bitfield access shrinking

Thu Mar 9 10:28:23 PST 2017

We could add intrinsics to extract/insert a bitfield, which would 
simplify a lot of that bitwise logic.

-Krzysztof

On 3/9/2017 12:14 PM, Wei Mi via llvm-dev wrote:
> In http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html,
> consecutive bitfields are wrapped as a group and represented as a
> large integer and emits loads stores and bit operations appropriate
> for extracting bits from within it. It fixes the problem of violating
> C++11 memory model that original widen load/store of bitfield was
> facing. It also brings more coalescing opportunities across bitfields.
>
> If some bitfields are natural aligned and their num of bits can form a
> legal integer type that the target supports, it is more efficient to
> access them directly than doing a large integer load/store plus a
> series of bit operations. We call such reverse transformation legal
> type bitfield shrinking. Currently, llvm depends on DAGCombiner to do
> such shrinking.
>
> However, DAGCombiner has the one-basic-block-a-time limitation, so we
> started to implement a new shrinking optimization in
> https://reviews.llvm.org/D30416, and initially put it in instcombine,
> then moved it to CGP because we want to use some TargetLowering
> information.
>
> The initial implementation in D30416 only supports load-and-or-store
> pattern matching, and it uses a inst-by-inst scan as a safety check to
> make sure there is no other memory write insn between the load and
> store (no alias query is done).  After getting the initial
> implementation, we found more problems: EarlyCSE, LoadPRE and even
> InstCombine itself can do coalescing before the shrinking (LoadPRE
> does it the most thoroughly).
> The coalescing can move the load many BasicBlocks earlier and make
> simple inst-by-inst scan unscalable and oftentimes fail. It also
> breaks the load-and-or-store pattern. An example is below:
>
> Before coalescing done by earlycse/loadpre:
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> store i64 %bf.set2, i64* %9, align 8
> .....
> %bf.load1 = load i64, i64* %0, align 8
> %bf.clear1 = and i64 %bf.load1, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> store i64 %bf.set2, i64* %9, align 8
> .....
> %bf.load2 = load i64, i64* %0, align 8
> %bf.clear2 = and i64 %bf.load2, -4294901761
> %bf.set2 = or i64 %bf.value2, %bf.clear2
> store i64 %bf.set2, i64* %9, align 8
>
> After coalescing, it will become:
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> .....
> %bf.clear1 = and i64 %bf.set, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> .....
> %bf.clear2 = and i64 %bf.set1, -4294901761
> %bf.set2 = or i64 %bf.value2, %bf.clear2
> store i64 %bf.set2, i64* %9, align 8
>
> After load-store coalescing, %bf.load now is far away from the store,
> and the previous load-and-or-store pattern disappears.
>
> A simple idea to fix it is to move the shrinking in a very early pass
> before the first pass of EarlyCSE. However, if we move shrinking
> earlier, it is possible to block the coalescing of other ilegal type
> bitfields which can not be shrinked. So for coalescing and shrinking,
> no matter which one is first, it will block the other one.
>
> After some discussions with Eli and Michael, I come up with another
> idea to let shrinking stay in the late pipeline. It needs two changes
> to the current patch:
>
> 1. extending the pattern match to handle store(or(and(or(and...))
> pattern above. It needs to analyze the bit mask of every and-or pairs.
> If the last and-or pair touch different section with the other and-or
> pairs, we can split the original store into two, and do the shrinking
> for the second store, like below
>
> %bf.load = load i64, i64* %0, align 8
> %bf.clear = and i64 %bf.load, -65536
> %bf.set = or i64 %bf.value, %bf.clear
> .....
>
> %bf.clear1 = and i64 %bf.set, -4294901761
> %bf.set1 = or i64 %bf.value1, %bf.clear1
> store i64 %bf.set1, i64* %0, align 8                         // the first store.
> .....
>
> %bf.value2.shrinked = shrink_op %bf.value2
> store i16 %bf.value2.shrinked, i64* %0, align 8       // shrinked store.
>
> 2. use memoryssa + alias query to do the safety check. Because
> dominator tree is not properly updated in CGP, I have to create a new
> pass and put it before CGP, in order to use memoryssa.
>
> Eli suggested me to ask for more opinions before start writting code.
> I think it is a good idea and here is the post. Comments are
> appreciated.
>
> Thanks,
> Wei.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation