<div dir="ltr"><div>Eli, thank you for pointing me to the thread. It is very helpful for understanding the background.<br></div><div><br></div><div>Now I have a better understanding of the problem. The change discussed in that thread, wrapping consecutive bitfields into a single group, has the benefit that loads of different bitfields can be merged. If the bitfield accesses are not of legal types like i8/i16/i32, the merging is helpful. If the bitfield accesses are of legal types, we want to do the shrinking here.</div><div><br></div><div>However, bitfield load/store merging (earlycse, gvn, and even instcombine can do such merging, but gvn does it more thoroughly) and legal-type bitfield access shrinking interact badly: whichever one happens first makes the other more difficult. If we do the legal-type shrinking first, the shrunk load/store will block other loads/stores of the same bitfield group from being merged. If we do the load/store merging first, we face the problems described previously: 1. we need MemorySSA for the safety check; 2. the pattern matching becomes more difficult.</div><div><br></div><div>After some thought, I think it is still better to do the bitfield shrinking later. To keep the shrinking effective for the case below: 1. I need to use MemorySSA to check that there is no aliasing memory write between the load and the store below; 2. I need to extend the existing pattern match to handle store(or(and(or(and...or(and(load Ptr, Value), Value1)..., ValueN), Ptr).</div><div><br></div><pre>%bf.load = load i64, i64* %0, align 8
%bf.clear = and i64 %bf.load, -65536
%bf.set = or i64 %bf.value, %bf.clear
.....

%bf.clear1 = and i64 %bf.set, -4294901761
%bf.set1 = or i64 %bf.value1, %bf.clear1
.....

%bf.clear2 = and i64 %bf.set1, -4294901761
%bf.set2 = or i64 %bf.value2, %bf.clear2
store i64 %bf.set2, i64* %0, align 8</pre><div><br></div><div>For 2, we know that the extended pattern above first loads the bitfield group, and the several "or + and" operations then adjust different parts of the loaded value. If the last "or + and" touches bits disjoint from all the previous "or + and" operations, we can split the final store into two stores and do the shrinking for the second one, as below. This adds more complexity to the existing patch, but it should still be doable.</div><div><br></div><pre>%bf.load = load i64, i64* %0, align 8
%bf.clear = and i64 %bf.load, -65536
%bf.set = or i64 %bf.value, %bf.clear
.....

%bf.clear1 = and i64 %bf.set, -4294901761
%bf.set1 = or i64 %bf.value1, %bf.clear1
store i64 %bf.set1, i64* %0, align 8
.....

%bf.value2.shrinked = shrink_op %bf.value2
store i16 %bf.value2.shrinked, i64* %0, align 8</pre><div><br></div><div>For 1, I probably need to add a new pass for it, because I cannot put it in CGP anymore (there is no dominator tree to use in CGP; the dominator tree update there is broken, as described in <a href="http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html">http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html</a>. That thread is from three years ago, but from my reading of the code, the conclusion still holds.)</div>
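<div><br></div><div>For 1, the safety check I have in mind is roughly the following. This is only an untested sketch to show the shape of the MemorySSA query, not code from the patch; it assumes the new pass has MemorySSA (and hence a dominator tree) available, that the load dominates the store, and that both access the same bitfield group location. The helper name is just for illustration.</div><div><br></div><pre>#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch: return true if MemorySSA sees no write that may alias the
// loaded/stored location between Load and Store. Assumes Load dominates
// Store and both use the same pointer (the bitfield group).
static bool noAliasWriteBetween(LoadInst *Load, StoreInst *Store,
                                MemorySSA &amp;MSSA) {
  // Nearest access above Store that may clobber Store's location; the
  // walker skips defs that MemorySSA/AA already proved to be no-alias.
  MemoryAccess *Clobber =
      MSSA.getWalker()-&gt;getClobberingMemoryAccess(Store);
  // The MemoryUse corresponding to the load.
  MemoryAccess *LoadAccess = MSSA.getMemoryAccess(Load);
  // If that clobber is at or before the load, there is no aliasing write
  // in between, so shrinking/splitting the store is safe memory-wise.
  return MSSA.dominates(Clobber, LoadAccess);
}</pre><div><br></div><div>The idea is that if the nearest may-clobber access above the store already dominates the load, nothing between the load and the store can write to that location, which is what the current mayWriteToMemory scan tries to prove.</div>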
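<div><br></div><div>For 2, the extended match could be a walk over the or/and chain, roughly like below. Again this is only an untested sketch (the function name and interface are mine, not from the patch); it records, for each "or + and" pair, the bits that pair may modify, so the disjointness condition above can be checked afterwards.</div><div><br></div><pre>#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Sketch: if StoredVal has the shape
//   or(and(... or(and(load Ptr, M1), V1) ..., Mn), Vn)
// return the root load and append, for each or+and pair, the bits it may
// modify (the bits cleared by its mask). Returns nullptr otherwise.
static LoadInst *matchOrAndChain(Value *StoredVal, Value *Ptr,
                                 SmallVectorImpl&lt;APInt&gt; &amp;ModifiedBits) {
  Value *Cur = StoredVal;
  while (true) {
    auto *Or = dyn_cast&lt;BinaryOperator&gt;(Cur);
    if (!Or || Or-&gt;getOpcode() != Instruction::Or)
      return nullptr;
    // The and(X, Mask) part may be either operand of the or. A real
    // implementation has to disambiguate when both operands are
    // and-with-constant (e.g. by checking which side reaches the load).
    Value *X = nullptr;
    ConstantInt *Mask = nullptr;
    if (!match(Or-&gt;getOperand(1), m_And(m_Value(X), m_ConstantInt(Mask))) &amp;&amp;
        !match(Or-&gt;getOperand(0), m_And(m_Value(X), m_ConstantInt(Mask))))
      return nullptr;
    // This or+and pair can only change the bits its mask clears.
    ModifiedBits.push_back(~Mask-&gt;getValue());
    if (auto *LI = dyn_cast&lt;LoadInst&gt;(X))
      return LI-&gt;getPointerOperand() == Ptr ? LI : nullptr;
    Cur = X; // Otherwise X should be the next or(and(...)) layer.
  }
}</pre><div><br></div><div>With something like this, the final store can be split as in the second example above when the bits modified by the last "or + and" pair are disjoint from the union of all the earlier masks, and the MemorySSA check says there is no aliasing write in between.</div>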
<div><br></div><div>Comments are welcome.</div><div><br></div><div>Thanks,</div><div>Wei.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 7, 2017 at 3:26 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 3/7/2017 1:57 PM, Wei Mi wrote:<br></div>
<blockquote type="cite"><div dir="ltr"><div class="gmail_quote">On Tue, Mar 7, 2017 at 10:51 AM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote">On Tue, Mar 7, 2017 at 9:36 AM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 3/7/2017 9:15 AM, Wei Mi wrote:<br></div>
<blockquote type="cite"><div dir="ltr"><div class="gmail_quote">On Mon, Mar 6, 2017 at 8:50 PM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote">On Mon, Mar 6, 2017 at 4:36 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 3/6/2017 11:06 AM, Wei Mi wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">After looking at the benchmark motivating the test, I find it is much harder for codegenprepare to catch all the shrinking opportunities, compared with instcombine, because of two things:<br>
<br>
* loadpre may merge the original load in a shrinking pattern with another load in a very early position, and there may be other stores in between. The safety check in codegenprepare now simply scans mayWriteToMemory instructions between the load and the store in the candidate shrinking pattern, and it will fail because of those stores in between.<br>
</blockquote>
<br>
It would be easy to make the check a lot more powerful using MemorySSA; I guess you don't have access to that in the passes where you're doing the transform?</blockquote>
<div><br></div>
<div>Yes, if MemorySSA + alias query can be used, it should solve the problem. I didn't find any existing usage of MemorySSA in instcombine or CGP, so I am not sure whether it can be used.</div>
<div><br></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">* we may see the following pattern. a.f1 is a bitfield. Ideally, we should do the shrinking for all three stores. But codegenprepare works in top-down order: after the first store is shrunk, it is hard for the second and the third to know %bf.set is the same as the value loaded from %0. If we do this in instcombine, it is much easier, because before load/store elimination %bf.set will be loaded from %0 respectively in if.then9 and in if.end41.<br>
</blockquote>
<br>
I'm not following how it helps to do this in instcombine? Normally, EarlyCSE runs first. (A complete C testcase might be useful here.)</blockquote>
<div><br></div>
<div>It is not earlycse; rather, loadpre causes the load to be replaced by %bf.set.</div>
<div><br></div>
<div>Before global value numbering:</div>
<pre>entry:
  %0 = bitcast %class.B* %this to i64*
  %bf.load = load i64, i64* %0, align 8
  %dec = add i64 %bf.load, 65535
  %bf.value = and i64 %dec, 65535
  %bf.clear5 = and i64 %bf.load, -65536
  %bf.set = or i64 %bf.value, %bf.clear5
  store i64 %bf.set, i64* %0, align 8
  ...
if.then9:
  %bf.load1 = load i64, i64* %0, align 8        // %bf.load1 is helpful for the patch to recognize the load-and-or-store pattern.
  %inc79 = add i64 %bf.load1, 281474976710656
  %bf.shl = and i64 %inc79, 71776119061217280
  %bf.clear18 = and i64 %bf.load1, -71776119061217281
  %bf.set19 = or i64 %bf.shl, %bf.clear18
  store i64 %bf.set19, i64* %0, align 8
  ...
if.end41:
  %bf.load2 = load i64, i64* %0, align 8        // %bf.load2 is helpful for the patch to recognize the load-and-or-store pattern.
  %inc4578 = add i64 %bf.load2, 65536
  %bf.shl48 = and i64 %inc4578, 4294901760
  %bf.clear49 = and i64 %bf.load2, -4294901761
  %bf.set50 = or i64 %bf.shl48, %bf.clear49
  store i64 %bf.set50, i64* %0, align 8</pre>
<div>After global value numbering:</div>
<pre>entry:
  %0 = bitcast %class.B* %this to i64*
  %bf.load = load i64, i64* %0, align 8
  %dec = add i64 %bf.load, 65535
  %bf.value = and i64 %dec, 65535
  %bf.clear5 = and i64 %bf.load, -65536
  %bf.set = or i64 %bf.value, %bf.clear5
  store i64 %bf.set, i64* %0, align 8
  ...
if.then9:
  %inc79 = add i64 %bf.set, 281474976710656     // we hope to know %bf.set is the same as load i64, i64* %0, align 8.
  %bf.shl = and i64 %inc79, 71776119061217280
  %bf.clear18 = and i64 %bf.set, -71776119061217281
  %bf.set19 = or i64 %bf.shl, %bf.clear18
  store i64 %bf.set19, i64* %0, align 8
  ...
if.end41:
  %inc4578 = add i64 %bf.set, 65536             // we hope to know %bf.set is the same as load i64, i64* %0, align 8.
  %bf.shl48 = and i64 %inc4578, 4294901760
  %bf.clear49 = and i64 %bf.set, -4294901761
  %bf.set50 = or i64 %bf.shl48, %bf.clear49
  store i64 %bf.set50, i64* %0, align 8</pre>
</div></div></blockquote>
</div></div></blockquote>
<br>
I don't like depending on pass ordering like this... I mean, I can't really say without the original C testcase, but at first glance, EarlyCSE should be catching this, and you're just getting lucky.</div></blockquote>
<div><br></div>
<div>For the testcase above, EarlyCSE doesn't replace the load with the previously stored value because there are calls and other stores in between (in the ellipsis). You are right: in general, unless we add another pass of instcombine before earlycse, doing the shrinking in instcombine after earlycse faces the same problem. I haven't got an idea for solving it generally except adding another pass of instcombine. Still thinking...</div>
</div></div></blockquote>
<div><br></div>
<div>To catch all the bitfield shrinking opportunities, it seems the optimization should be placed before the first pass of earlycse. But I don't think adding another pass of instcombine before earlycse is a good idea, because I vaguely remember we saw a case requiring earlycse to happen before instcombine.</div>
<div><br></div>
<div>Is it OK to add a separate pass before earlycse for bitfield-related shrinking? This is the best way I can think of for now. Suggestions are welcome.</div>
</div></div></blockquote>
<br>
You might want to skim the thread <a class="moz-txt-link-freetext" href="http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html" target="_blank">http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html</a> for some background on our current treatment of bitfields.<br>
<br>
I'm not sure what the right answer here looks like, given the way it interacts with other memory transforms like GVN. Could you send an email to llvm-dev with a description of what we're looking at, to see if anyone else has suggestions?<br>
<p>-Eli<br></p>
<pre class="moz-signature" cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
</div>
</blockquote></div><br></div>