<div dir="ltr"><div>Eli, thank you for pointing me to the thread. Very helpful to understand the background.<br></div><div><br></div><div>Now I have better understanding about the problem. The change in the thread wrapping consecutive bitfields into a group has the benefit for loads of different bitfields to be merged. If the bitfield acceses are not legal types like i8/i16/i32, the merge will be helpful. If the bitfield acceses are legal type, we want to do the shrinking here. </div><div><br></div><div>However, for bitfield load/store merge (earlycse, gvn and even instcombine can do such merge, but gvn does it more thoroughly) and legal type bitfield accesses shrinking, no matter which one happens first, it can make the other one more difficult. Like if we do legal type bitfield accesses shrinking first, the shrinked load/store will block other bitfield load/store from the same bitfield groups to merge. If we do bitfield load/store merge first, we will face the problems described previously: 1. we need memoryssa in safety check. 2. The pattern matching becomes more difficult. </div><div><br></div><div>After some thought, I think it is still better to do the bitfield shrinking later. To make the shrinking still effective, for the case below, 1. I need to use memoryssa to check there is no alias mem write between the load and the store below. 2. I need to extend the existing pattern match to handle store(or(and(or(and...or(and(load Ptr, Value), Value1)..., ValueN), Ptr).</div><div><br></div><div>%bf.load = load i64, i64* %0, align 8<br></div><div><div>%bf.clear = and i64 %bf.load, -65536</div><div>%bf.set = or i64 %bf.value, %bf.clear</div></div><div>.....</div><div><br></div><div><div>%bf.clear1 = and i64 %bf.set, -4294901761</div><div>%bf.set1 = or i64 %bf.value1, %bf.clear1</div></div><div>.....</div><div><br></div><div><div>%bf.clear2 = and i64 %bf.set1, -4294901761</div><div>%bf.set2 = or i64 %bf.value2, %bf.clear2</div><div>store i64 %bf.set2, i64* %9, align 8</div></div><div><br></div><div>For 2, we know that the extended pattern above is to load the bitfield group first, and the several "or+and" operations are to adjusting different parts of the load. If only the last "or + and" touch different bits with all previous "or + and" operations, it is good to separate the store in the end to two stores and do the shrinking for the second store, like below. I think it just adds more complexity to the existing patch, but should still be doable.</div><div><br></div><div><div>%bf.load = load i64, i64* %0, align 8<br></div><div><div>%bf.clear = and i64 %bf.load, -65536</div><div>%bf.set = or i64 %bf.value, %bf.clear</div></div><div>.....</div><div><br></div><div><div>%bf.clear1 = and i64 %bf.set, -4294901761</div><div>%bf.set1 = or i64 %bf.value1, %bf.clear1</div></div><div>store i64 %bf.set1, i64* %0, align 8<br></div><div>.....</div><div><br></div><div><div>%bf.value2.shrinked = shrink_op %bf.value2 <br></div><div>store i16 %bf.value2.shrinked, i64* %0, align 8</div></div></div><div><br></div><div>For 1, probably I need to add a new pass for it because I cannot put it in CGP anymore (There is no dominator tree to use in CGP. The dominator tree update there is broken, which was described in <a href="http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html">http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140120/202291.html</a>. The thread is three years ago, but from my looking at the code, the conclusion still holds).</div><div><br></div><div>Comments are welcomed.</div><div><br></div><div>Thanks,</div><div>Wei.</div><div><br></div><div><div><br></div><div> </div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 7, 2017 at 3:26 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div><div class="h5">
    <div class="m_-565701091899102164moz-cite-prefix">On 3/7/2017 1:57 PM, Wei Mi wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Mar 7, 2017 at 10:51 AM, Wei
            Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div dir="ltr"><br>
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">
                    <div>
                      <div class="m_-565701091899102164h5">On Tue, Mar 7, 2017 at 9:36 AM,
                        Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div bgcolor="#FFFFFF">
                            <div>
                              <div class="m_-565701091899102164m_-182419626503815813gmail-h5">
                                <div class="m_-565701091899102164m_-182419626503815813gmail-m_-940156540931862899moz-cite-prefix">On
                                  3/7/2017 9:15 AM, Wei Mi wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr"><br>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote">On Mon,
                                        Mar 6, 2017 at 8:50 PM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div dir="ltr"><br>
                                            <div class="gmail_extra"><br>
                                              <div class="gmail_quote"><span>On
                                                  Mon, Mar 6, 2017 at
                                                  4:36 PM, Friedman, Eli
                                                  <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span>
                                                  wrote:<br>
                                                </span><span>
                                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_-565701091899102164m_-182419626503815813gmail-m_-940156540931862899m_-8464126786741311821gmail-">On
                                                      3/6/2017 11:06 AM,
                                                      Wei Mi wrote:<br>
                                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                        After looking at
                                                        the benchmark
                                                        motivating the
                                                        test, I find it
                                                        is much harder
                                                        for
                                                        codegenprepare
                                                        to catch all the
                                                        shrinking
                                                        opportunities
                                                        compared with
                                                        instcombine,
                                                        because of two
                                                        things:<br>
                                                        <br>
                                                        * loadpre may
                                                        merge the
                                                        original load in
                                                        a shrinking
                                                        pattern with
                                                        other load in a
                                                        very early
                                                        position, and
                                                        there maybe
                                                        other stores in
                                                        between. The
                                                        safety check for
                                                        the
                                                        codegenprepare
                                                        now simply scan
                                                        mayWriteToMemory
                                                        insns between
                                                        load and store
                                                        in the candidate
                                                        shrinking
                                                        pattern, and it
                                                        will fail
                                                        because of those
                                                        stores in
                                                        between.<br>
                                                      </blockquote>
                                                      <br>
                                                    </span> It would be
                                                    easy to make the
                                                    check a lot more
                                                    powerful using
                                                    MemorySSA; I guess
                                                    you don't have
                                                    access to that in
                                                    the passes where
                                                    you're doing the
                                                    transform?</blockquote>
                                                  <div><br>
                                                  </div>
                                                </span>
                                                <div>Yes, if MemorySSA +
                                                  alias query can be
                                                  used, it should solve
                                                  the problem. I didn't
                                                  find existing usage of
                                                  MemorySSA in
                                                  instcombine or CGP so
                                                  I am not sure whether
                                                  it can be used. </div>
                                                <span>
                                                  <div> </div>
                                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_-565701091899102164m_-182419626503815813gmail-m_-940156540931862899m_-8464126786741311821gmail-"><br>
                                                      <br>
                                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                        * we may see the
                                                        following
                                                        pattern. a.f1 is
                                                        a bitfield.
                                                        Ideally, we
                                                        should do the
                                                        shrinking for
                                                        all the three
                                                        stores. But
                                                        codegenprepare
                                                        works in
                                                        top-down order.
                                                        After the first
                                                        store is
                                                        shrinked, it is
                                                        hard for the
                                                        second and the
                                                        third to know
                                                        %bf.set is the
                                                        same as the
                                                        value loaded
                                                        from %0. If we
                                                        do this in
                                                        instcombine, it
                                                        is much easier
                                                        because before
                                                        load/store
                                                        elimination,
                                                        %bf.set will be
                                                        loaded from %0
                                                        respecitively in
                                                        if.then9 and in
                                                        if.end41.<br>
                                                      </blockquote>
                                                      <br>
                                                    </span> I'm not
                                                    following how it
                                                    helps to do this in
                                                    instcombine? 
                                                    Normally, EarlyCSE
                                                    runs first.  (A
                                                    complete C testcase
                                                    might be useful
                                                    here.)</blockquote>
                                                  <div><br>
                                                  </div>
                                                </span>
                                                <div>It is not earlycse.
                                                  It is also loadpre to
                                                  cause the load to be
                                                  replaced by %bf.set. </div>
                                                <div><br>
                                                </div>
                                                <div>
                                                  <div style="font-size:12.8px">Before
                                                    global value
                                                    numbering:</div>
                                                  <span>
                                                    <div style="font-size:12.8px"><br class="m_-565701091899102164m_-182419626503815813gmail-m_-940156540931862899m_-8464126786741311821gmail-Apple-interchange-newline">
                                                      entry:</div>
                                                    <div style="font-size:12.8px"> 
                                                      %0 = bitcast
                                                      %class.B* %this to
                                                      i64*</div>
                                                    <div style="font-size:12.8px"> 
                                                      %bf.load = load
                                                      i64, i64* %0,
                                                      align 8</div>
                                                    <div style="font-size:12.8px">
                                                      <div>  %dec = add
                                                        i64 %bf.load,
                                                        65535</div>
                                                      <div>  %bf.value =
                                                        and i64 %dec,
                                                        65535</div>
                                                      <div>  %bf.clear5
                                                        = and i64
                                                        %bf.load, -65536</div>
                                                      <div>  %bf.set =
                                                        or i64
                                                        %bf.value,
                                                        %bf.clear5</div>
                                                      <div>  store i64
                                                        %bf.set, i64*
                                                        %0, align 8</div>
                                                    </div>
                                                    <div style="font-size:12.8px"> 
                                                      ...</div>
                                                    <div style="font-size:12.8px">if.then9:</div>
                                                  </span>
                                                  <div style="font-size:12.8px">  <span style="font-size:12.8px">%bf.load1 = load i64, i64* %0, align 8        
                                                                   //
                                                      %bf.load1 is
                                                      helpful for the
                                                      patch to recognize
                                                      the
                                                      load-and-or-store
                                                      pattern.</span></div>
                                                  <div style="font-size:12.8px">
                                                    <div>  %inc79 = add
                                                      i64 %bf.load1,
                                                      281474976710656  
                                                          </div>
                                                    <span>
                                                      <div>  %bf.shl =
                                                        and i64 %inc79,
71776119061217280</div>
                                                    </span>
                                                    <div>  %bf.clear18 =
                                                      and i64 %bf.load1,
                                                      -71776119061217281</div>
                                                    <span>
                                                      <div>  %bf.set19 =
                                                        or i64 %bf.shl,
                                                        %bf.clear18</div>
                                                      <div>  store i64
                                                        %bf.set19, i64*
                                                        %0, align 8</div>
                                                    </span></div>
                                                  <div style="font-size:12.8px"> 
                                                    ...</div>
                                                  <div style="font-size:12.8px">if.end41:</div>
                                                  <div style="font-size:12.8px">  <span style="font-size:12.8px">%bf.load2 = load i64, i64* %0, align 8        
                                                                   </span><span style="font-size:12.8px">// %bf.load2 is helpful for the patch to
                                                      recognize the
                                                      load-and-or-store
                                                      pattern.</span></div>
                                                  <div style="font-size:12.8px">
                                                    <div>  %inc4578 =
                                                      add i64 %bf.load2,
                                                      65536            
                                                                </div>
                                                    <span>
                                                      <div>  %bf.shl48 =
                                                        and i64
                                                        %inc4578,
                                                        4294901760</div>
                                                    </span>
                                                    <div>  %bf.clear49 =
                                                      and i64 %bf.load2,
                                                      -4294901761</div>
                                                    <span>
                                                      <div>  %bf.set50 =
                                                        or i64
                                                        %bf.shl48,
                                                        %bf.clear49</div>
                                                      <div>  store i64
                                                        %bf.set50, i64*
                                                        %0, align 8</div>
                                                    </span></div>
                                                </div>
                                                <div> </div>
                                                <div>After global value
                                                  numbering:</div>
                                                <span>
                                                  <div><br>
                                                  </div>
                                                  <div>
                                                    <div style="font-size:12.8px">entry:</div>
                                                    <div style="font-size:12.8px"> 
                                                      %0 = bitcast
                                                      %class.B* %this to
                                                      i64*</div>
                                                    <div style="font-size:12.8px"> 
                                                      %bf.load = load
                                                      i64, i64* %0,
                                                      align 8</div>
                                                    <div style="font-size:12.8px">
                                                      <div>  %dec = add
                                                        i64 %bf.load,
                                                        65535</div>
                                                      <div>  %bf.value =
                                                        and i64 %dec,
                                                        65535</div>
                                                      <div>  %bf.clear5
                                                        = and i64
                                                        %bf.load, -65536</div>
                                                      <div>  %bf.set =
                                                        or i64
                                                        %bf.value,
                                                        %bf.clear5</div>
                                                      <div>  store i64
                                                        %bf.set, i64*
                                                        %0, align 8</div>
                                                    </div>
                                                    <div style="font-size:12.8px"> 
                                                      ...</div>
                                                    <div style="font-size:12.8px">if.then9:</div>
                                                    <div style="font-size:12.8px">
                                                      <div>  %inc79 =
                                                        add i64 %bf.set,
                                                        281474976710656
                                                              // we hope
                                                        to know %bf.set
                                                        is the same as
                                                        load i64, i64*
                                                        %0, align 8.</div>
                                                      <div>  %bf.shl =
                                                        and i64 %inc79,
71776119061217280</div>
                                                      <div>  %bf.clear18
                                                        = and i64
                                                        %bf.set,
                                                        -71776119061217281</div>
                                                      <div>  %bf.set19 =
                                                        or i64 %bf.shl,
                                                        %bf.clear18</div>
                                                      <div>  store i64
                                                        %bf.set19, i64*
                                                        %0, align 8</div>
                                                    </div>
                                                    <div style="font-size:12.8px"> 
                                                      ...</div>
                                                    <div style="font-size:12.8px">if.end41:</div>
                                                    <div style="font-size:12.8px">
                                                      <div>  %inc4578 =
                                                        add i64 %bf.set,
                                                        65536          
                                                                     //
                                                        we hope to know
                                                        %bf.set is the
                                                        same as load
                                                        i64, i64* %0,
                                                        align 8.</div>
                                                      <div>  %bf.shl48 =
                                                        and i64
                                                        %inc4578,
                                                        4294901760</div>
                                                      <div>  %bf.clear49
                                                        = and i64
                                                        %bf.set,
                                                        -4294901761</div>
                                                      <div>  %bf.set50 =
                                                        or i64
                                                        %bf.shl48,
                                                        %bf.clear49</div>
                                                      <div>  store i64
                                                        %bf.set50, i64*
                                                        %0, align 8</div>
                                                    </div>
                                                  </div>
                                                </span></div>
                                            </div>
                                          </div>
                                        </blockquote>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                            I don't like depending on pass ordering like
                            this... I mean, I can't really say without
                            the original C testcase, but at first
                            glance, EarlyCSE should be catching this,
                            and you're just getting lucky.</div>
                        </blockquote>
                        <div><br>
                        </div>
                      </div>
                    </div>
                    <div>For the testcase above, EarlyCSE does't replace
                      the load with previous store value because there
                      are calls and other stores in between (in the <span style="color:rgb(84,84,84);font-family:roboto,arial,sans-serif">ellipsis</span>).
                      You are right, generally, unless we add another
                      pass of instcombine before earlycse, doing the
                      shrinking in instcombine after earlycse can face
                      the same problem. I havn't got an idea to solve it
                      generally except that to add another pass of
                      instcombine. Still thinking...</div>
                  </div>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>To catch all the bitfield shrinking opportunities,
              seems the optimization should be placed before the first
              pass of earlycse. But I don't think adding another pass of
              instcombine before earlycse is a good idea, because I
              vaguely remember we saw case requiring earlycse to happen
              before instcombine. </div>
            <div><br>
            </div>
            <div>Is it ok to add a separate pass before earlycse for
              bitfield related shrinking? This is the best way I can
              think of for now. Suggestions are welcomed.</div>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br></div></div>
    You might want to skim the thread
<a class="m_-565701091899102164moz-txt-link-freetext" href="http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html" target="_blank">http://lists.llvm.org/<wbr>pipermail/cfe-commits/Week-of-<wbr>Mon-20120827/063200.html</a>
    for some background on our current treatment of bitfields.  <br>
    <br>
    I'm not sure what the right answer here looks like, given the way it
    interacts with other memory transforms like GVN. Could you send an
    email to llvm-dev with a description of what we're looking at, to
    see if anyone else has suggestions?<span class=""><br>
    <p>-Eli<br>
    </p>
    <pre class="m_-565701091899102164moz-signature" cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
  </span></div>

</blockquote></div><br></div>