<div dir="ltr"><div>Eli, thank you for pointing me to the thread. It is very helpful for understanding the background.<br></div><div><br></div><div>Now I have a better understanding of the problem. The change discussed in that thread, wrapping consecutive bitfields into a single group, has the benefit that loads of different bitfields can be merged. If the bitfield accesses are not of legal types like i8/i16/i32, the merging is helpful. If the bitfield accesses are of legal types, we want to do the shrinking here.</div><div><br></div><div>However, bitfield load/store merging (earlycse, gvn, and even instcombine can do such merging, but gvn does it more thoroughly) and legal-type bitfield access shrinking interact badly: whichever one happens first makes the other more difficult. If we do the legal-type shrinking first, the shrunk load/store will block other loads/stores of the same bitfield group from being merged. If we do the load/store merging first, we face the problems described previously: 1. we need MemorySSA for the safety check; 2. the pattern matching becomes more difficult.</div><div><br></div><div>After some thought, I think it is still better to do the bitfield shrinking later. To keep the shrinking effective for the case below: 1. I need to use MemorySSA to check that there is no aliasing memory write between the load and the store below; 2. I need to extend the existing pattern match to handle store(or(and(or(and...or(and(load Ptr, Value), Value1)..., ValueN), Ptr).</div><div><br></div><pre>%bf.load = load i64, i64* %0, align 8
%bf.clear = and i64 %bf.load, -65536
%bf.set = or i64 %bf.value, %bf.clear
.....

%bf.clear1 = and i64 %bf.set, -4294901761
%bf.set1 = or i64 %bf.value1, %bf.clear1
.....

%bf.clear2 = and i64 %bf.set1, -4294901761
%bf.set2 = or i64 %bf.value2, %bf.clear2
store i64 %bf.set2, i64* %0, align 8</pre><div><br></div><div>For 2, we know that the extended pattern above first loads the bitfield group, and the several "or + and" operations then adjust different parts of the loaded value. If the last "or + and" touches bits disjoint from all the previous "or + and" operations, we can split the final store into two stores and do the shrinking for the second one, as below. This adds more complexity to the existing patch, but it should still be doable.</div><div><br></div><pre>%bf.load = load i64, i64* %0, align 8
%bf.clear = and i64 %bf.load, -65536
%bf.set = or i64 %bf.value, %bf.clear
.....

%bf.clear1 = and i64 %bf.set, -4294901761
%bf.set1 = or i64 %bf.value1, %bf.clear1
store i64 %bf.set1, i64* %0, align 8
.....

%bf.value2.shrinked = shrink_op %bf.value2
store i16 %bf.value2.shrinked, i64* %0, align 8</pre><div><br></div><div>For 1, I probably need to add a new pass for it, because I cannot put it in CGP anymore (there is no dominator tree to use in CGP; the dominator tree update there is broken, as described in <a href="http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html">http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html</a>. That thread is from three years ago, but from my reading of the code, the conclusion still holds.)</div>
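<div><br></div><div>For 1, the safety check I have in mind is roughly the following. This is only an untested sketch to show the shape of the MemorySSA query, not code from the patch; it assumes the new pass has MemorySSA (and hence a dominator tree) available, that the load dominates the store, and that both access the same bitfield group location. The helper name is just for illustration.</div><div><br></div><pre>#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch: return true if MemorySSA sees no write that may alias the
// loaded/stored location between Load and Store. Assumes Load dominates
// Store and both use the same pointer (the bitfield group).
static bool noAliasWriteBetween(LoadInst *Load, StoreInst *Store,
                                MemorySSA &amp;MSSA) {
  // Nearest access above Store that may clobber Store's location; the
  // walker skips defs that MemorySSA/AA already proved to be no-alias.
  MemoryAccess *Clobber =
      MSSA.getWalker()-&gt;getClobberingMemoryAccess(Store);
  // The MemoryUse corresponding to the load.
  MemoryAccess *LoadAccess = MSSA.getMemoryAccess(Load);
  // If that clobber is at or before the load, there is no aliasing write
  // in between, so shrinking/splitting the store is safe memory-wise.
  return MSSA.dominates(Clobber, LoadAccess);
}</pre><div><br></div><div>The idea is that if the nearest may-clobber access above the store already dominates the load, nothing between the load and the store can write to that location, which is what the current mayWriteToMemory scan tries to prove.</div>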
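<div><br></div><div>For 2, the extended match could be a walk over the or/and chain, roughly like below. Again this is only an untested sketch (the function name and interface are mine, not from the patch); it records, for each "or + and" pair, the bits that pair may modify, so the disjointness condition above can be checked afterwards.</div><div><br></div><pre>#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Sketch: if StoredVal has the shape
//   or(and(... or(and(load Ptr, M1), V1) ..., Mn), Vn)
// return the root load and append, for each or+and pair, the bits it may
// modify (the bits cleared by its mask). Returns nullptr otherwise.
static LoadInst *matchOrAndChain(Value *StoredVal, Value *Ptr,
                                 SmallVectorImpl&lt;APInt&gt; &amp;ModifiedBits) {
  Value *Cur = StoredVal;
  while (true) {
    auto *Or = dyn_cast&lt;BinaryOperator&gt;(Cur);
    if (!Or || Or-&gt;getOpcode() != Instruction::Or)
      return nullptr;
    // The and(X, Mask) part may be either operand of the or. A real
    // implementation has to disambiguate when both operands are
    // and-with-constant (e.g. by checking which side reaches the load).
    Value *X = nullptr;
    ConstantInt *Mask = nullptr;
    if (!match(Or-&gt;getOperand(1), m_And(m_Value(X), m_ConstantInt(Mask))) &amp;&amp;
        !match(Or-&gt;getOperand(0), m_And(m_Value(X), m_ConstantInt(Mask))))
      return nullptr;
    // This or+and pair can only change the bits its mask clears.
    ModifiedBits.push_back(~Mask-&gt;getValue());
    if (auto *LI = dyn_cast&lt;LoadInst&gt;(X))
      return LI-&gt;getPointerOperand() == Ptr ? LI : nullptr;
    Cur = X; // Otherwise X should be the next or(and(...)) layer.
  }
}</pre><div><br></div><div>With something like this, the final store can be split as in the second example above when the bits modified by the last "or + and" pair are disjoint from the union of all the earlier masks, and the MemorySSA check says there is no aliasing write in between.</div>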
<div><br></div><div>Comments are welcome.</div><div><br></div><div>Thanks,</div><div>Wei.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 7, 2017 at 3:26 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 3/7/2017 1:57 PM, Wei Mi wrote:<br></div>
<blockquote type="cite"><div dir="ltr"><div class="gmail_quote">On Tue, Mar 7, 2017 at 10:51 AM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote">On Tue, Mar 7, 2017 at 9:36 AM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 3/7/2017 9:15 AM, Wei Mi wrote:<br></div>
<blockquote type="cite"><div dir="ltr"><div class="gmail_quote">On Mon, Mar 6, 2017 at 8:50 PM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote">On Mon, Mar 6, 2017 at 4:36 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 3/6/2017 11:06 AM, Wei Mi wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">After looking at the benchmark motivating the test, I find it is much harder for codegenprepare to catch all the shrinking opportunities, compared with instcombine, because of two things:<br>
<br>
* loadpre may merge the original load in a shrinking pattern with another load in a very early position, and there may be other stores in between. The safety check in codegenprepare now simply scans mayWriteToMemory instructions between the load and the store in the candidate shrinking pattern, and it will fail because of those stores in between.<br>
</blockquote>
<br>
It would be easy to make the check a lot more powerful using MemorySSA; I guess you don't have access to that in the passes where you're doing the transform?</blockquote>
<div><br></div>
<div>Yes, if MemorySSA + alias query can be used, it should solve the problem. I didn't find any existing usage of MemorySSA in instcombine or CGP, so I am not sure whether it can be used.</div>
<div><br></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">* we may see the following pattern. a.f1 is a bitfield. Ideally, we should do the shrinking for all three stores. But codegenprepare works in top-down order: after the first store is shrunk, it is hard for the second and the third to know %bf.set is the same as the value loaded from %0. If we do this in instcombine, it is much easier, because before load/store elimination %bf.set will be loaded from %0 respectively in if.then9 and in if.end41.<br>
</blockquote>
<br>
I'm not following how it helps to do this in instcombine? Normally, EarlyCSE runs first. (A complete C testcase might be useful here.)</blockquote>
<div><br></div>
<div>It is not earlycse; rather, loadpre causes the load to be replaced by %bf.set.</div>
<div><br></div>
<div>Before global value numbering:</div>
<pre>entry:
  %0 = bitcast %class.B* %this to i64*
  %bf.load = load i64, i64* %0, align 8
  %dec = add i64 %bf.load, 65535
  %bf.value = and i64 %dec, 65535
  %bf.clear5 = and i64 %bf.load, -65536
  %bf.set = or i64 %bf.value, %bf.clear5
  store i64 %bf.set, i64* %0, align 8
  ...
if.then9:
  %bf.load1 = load i64, i64* %0, align 8        // %bf.load1 is helpful for the patch to recognize the load-and-or-store pattern.
  %inc79 = add i64 %bf.load1, 281474976710656
  %bf.shl = and i64 %inc79, 71776119061217280
  %bf.clear18 = and i64 %bf.load1, -71776119061217281
  %bf.set19 = or i64 %bf.shl, %bf.clear18
  store i64 %bf.set19, i64* %0, align 8
  ...
if.end41:
  %bf.load2 = load i64, i64* %0, align 8        // %bf.load2 is helpful for the patch to recognize the load-and-or-store pattern.
  %inc4578 = add i64 %bf.load2, 65536
  %bf.shl48 = and i64 %inc4578, 4294901760
  %bf.clear49 = and i64 %bf.load2, -4294901761
  %bf.set50 = or i64 %bf.shl48, %bf.clear49
  store i64 %bf.set50, i64* %0, align 8</pre>
<div>After global value numbering:</div>
<pre>entry:
  %0 = bitcast %class.B* %this to i64*
  %bf.load = load i64, i64* %0, align 8
  %dec = add i64 %bf.load, 65535
  %bf.value = and i64 %dec, 65535
  %bf.clear5 = and i64 %bf.load, -65536
  %bf.set = or i64 %bf.value, %bf.clear5
  store i64 %bf.set, i64* %0, align 8
  ...
if.then9:
  %inc79 = add i64 %bf.set, 281474976710656     // we hope to know %bf.set is the same as load i64, i64* %0, align 8.
  %bf.shl = and i64 %inc79, 71776119061217280
  %bf.clear18 = and i64 %bf.set, -71776119061217281
  %bf.set19 = or i64 %bf.shl, %bf.clear18
  store i64 %bf.set19, i64* %0, align 8
  ...
if.end41:
  %inc4578 = add i64 %bf.set, 65536             // we hope to know %bf.set is the same as load i64, i64* %0, align 8.
  %bf.shl48 = and i64 %inc4578, 4294901760
  %bf.clear49 = and i64 %bf.set, -4294901761
  %bf.set50 = or i64 %bf.shl48, %bf.clear49
  store i64 %bf.set50, i64* %0, align 8</pre>
</div></div></blockquote>
</div></div></blockquote>
<br>
I don't like depending on pass ordering like this... I mean, I can't really say without the original C testcase, but at first glance, EarlyCSE should be catching this, and you're just getting lucky.</div></blockquote>
<div><br></div>
<div>For the testcase above, EarlyCSE doesn't replace the load with the previously stored value because there are calls and other stores in between (in the ellipsis). You are right: in general, unless we add another pass of instcombine before earlycse, doing the shrinking in instcombine after earlycse faces the same problem. I haven't got an idea for solving it generally except adding another pass of instcombine. Still thinking...</div>
</div></div></blockquote>
<div><br></div>
<div>To catch all the bitfield shrinking opportunities, it seems the optimization should be placed before the first pass of earlycse. But I don't think adding another pass of instcombine before earlycse is a good idea, because I vaguely remember we saw a case requiring earlycse to happen before instcombine.</div>
<div><br></div>
<div>Is it OK to add a separate pass before earlycse for bitfield-related shrinking? This is the best way I can think of for now. Suggestions are welcome.</div>
</div></div></blockquote>
<br>
You might want to skim the thread <a class="moz-txt-link-freetext" href="http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html" target="_blank">http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html</a> for some background on our current treatment of bitfields.<br>
<br>
I'm not sure what the right answer here looks like, given the way it interacts with other memory transforms like GVN. Could you send an email to llvm-dev with a description of what we're looking at, to see if anyone else has suggestions?<br>
<p>-Eli<br></p>
<pre class="moz-signature" cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
</div>
</blockquote></div><br></div>