<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>I think I have a change building on Quentin's patch that catches
      this case with only a few extra changes.  I should have a rough
      draft ready for feedback in a day or two.<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 11/15/2016 3:46 PM, Matthias Braun
      wrote:<br>
    </div>
    <blockquote
      cite="mid:86AC25FA-47A7-490B-8C50-260C3117FD10@braunis.de"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="">Ah right,</div>
      <div class=""><br class="">
      </div>
      <div class="">that is indeed a different issue. I had that issue
        on my TODO list as well (but somewhere deep down). Without
        actually trying it, I hope we can do that by having a smarter
        spill creation callback which matches a spill(copy(zero))
        pattern.</div>
      <div class=""><br class="">
      </div>
      <div class="">- Matthias</div>
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Nov 15, 2016, at 12:13 PM, Geoff Berry <<a
              moz-do-not-send="true" href="mailto:gberry@codeaurora.org"
              class="">gberry@codeaurora.org</a>> wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <meta content="text/html; charset=utf-8"
              http-equiv="Content-Type" class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <p class="">Hi Matthias,</p>
              <p class="">Those changes don't catch the particular cases
                I'm looking at.  The problem is that the xzr/wzr copies
                don't become candidates for coalescing until the middle
                of register allocation when some live ranges have been
                split/spilled.  Here's an example test case:<br class="">
              </p>
              declare i32 @bar()<br class="">
              declare i32 @baz()<br class="">
              ; Check that the spill of the zero value gets stored
              directly instead<br class="">
              ; of being copied from wzr and then stored.<br class="">
              define i32 @test_zr_spill_copyprop(i1 %c) {<br class="">
              ; CHECK-LABEL: test_zr_spill_copyprop:<br class="">
              entry:<br class="">
                br i1 %c, label %if.else, label %if.then<br class="">
              <br class="">
              if.else:<br class="">
              ; CHECK: bl bar<br class="">
              ; CHECK-NEXT: str w0, [sp, #[[SLOT:[0-9]+]]]<br class="">
                %call1 = tail call i32 @bar()<br class="">
                br label %if.end<br class="">
              <br class="">
              if.then:<br class="">
              ; CHECK: bl baz<br class="">
              ; CHECK-NEXT: str wzr, [sp, #[[SLOT]]]<br class="">
                %call2 = tail call i32 @baz()<br class="">
                br label %if.end<br class="">
              <br class="">
              if.end:<br class="">
                %x.0 = phi i32 [ 0, %if.then ], [ %call1, %if.else ]<br
                class="">
                call void asm sideeffect "",
"~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x13},~{x14},~{x15},~{x16},~{x17},~{x18},~{x19},~{x20},~{x21},~{x22},~{x23},~{x24},~{x25},~{x26},~{x27},~{x28},~{fp},~{lr},~{sp}"()
              nounwind<br class="">
                %x.1 = add i32 %x.0, 1<br class="">
                ret i32 %x.1<br class="">
              }<br class="">
              <br class="">
              <br class="">
              <div class="moz-cite-prefix">On 11/15/2016 2:58 PM,
                Matthias Braun wrote:<br class="">
              </div>
              <blockquote
                cite="mid:61E456A4-F98C-43B7-ADE4-AFCF426B1BCB@braunis.de"
                type="cite" class="">
                <meta http-equiv="Content-Type" content="text/html;
                  charset=utf-8" class="">
                <div class="">I would xzr/wzr problems expect to be
                  handled by the changes in <a moz-do-not-send="true"
                    href="https://reviews.llvm.org/D26106" class="">https://reviews.llvm.org/D26106</a> as
                  soon as <a moz-do-not-send="true"
                    href="https://reviews.llvm.org/D26111" class="">https://reviews.llvm.org/D26111</a> is
                  committed.</div>
                <div class=""><br class="">
                </div>
                <div class="">- Matthias</div>
                <div class=""><br class="">
                </div>
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">On Nov 15, 2016, at 11:01 AM, Geoff
                      Berry <<a moz-do-not-send="true"
                        href="mailto:gberry@codeaurora.org" class="">gberry@codeaurora.org</a>>
                      wrote:</div>
                    <br class="Apple-interchange-newline">
                    <div class="">
                      <meta content="text/html; charset=utf-8"
                        http-equiv="Content-Type" class="">
                      <div bgcolor="#FFFFFF" text="#000000" class="">
                        <p class="">Hi Quentin,</p>
                        <p class="">Thanks for taking a look.  I would
                          greatly prefer a one line change :)  My
                          original motivation for looking into this was
                          a case involving XZR/WZR copys on aarch64, so
                          I would like to catch those cases as well. 
                          I'll take a look into adding constant reg
                          support to the register allocator unless you
                          already know how you'd like to add that
                          functionality.  Once we have that I can
                          compare the two approaches to make sure there
                          aren't more cases being missed.<br class="">
                        </p>
                        <div class="moz-cite-prefix">On 11/14/2016 9:37
                          PM, Quentin Colombet wrote:<br class="">
                        </div>
                        <blockquote
                          cite="mid:3B5F4700-B001-437E-A6D3-8DF06F99D7E5@apple.com"
                          type="cite" class="">
                          <pre class="" wrap="">Hi Geoff,

I had a closer look at the problem and in particular checked why the “hint fix-up” thing was not catching those cases.
The reason why it didn’t catch them is because we were doing the fix-up only for live-range that regalloc split not all missed hints.
Fixing that seems to solve your motivating cases (but the wzr case because this is a reserved reg and we don’t touch it, though I could improve the code to handle constant reg… let me know if you want me to have a look).

Could you try the attached patch and tell me how it works for you?
Assuming it solves the cases you wanted to catch, I’ll push it and we can drop that patch.

Cheers,
-Quentin
</pre>
                          <br class="">
                          <fieldset class="mimeAttachmentHeader"></fieldset>
                          <br class="">
                          <blockquote type="cite" class="">
                            <pre class="" wrap="">On Nov 10, 2016, at 4:37 PM, Quentin Colombet <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:qcolombet@apple.com"><qcolombet@apple.com></a> wrote:

qcolombet added inline comments.


================
Comment at: lib/CodeGen/VirtRegMap.cpp:158
+// Map from physical regunit to list of def SlotIndexes.
+using PhysDefsMap = DenseMap<unsigned, SmallVector<SlotIndex, 4>>;
+
----------------
For consistency with the rest of LLVM source code I would use typedef.


================
Comment at: lib/CodeGen/VirtRegMap.cpp:158
+// Map from physical regunit to list of def SlotIndexes.
+using PhysDefsMap = DenseMap<unsigned, SmallVector<SlotIndex, 4>>;
+
----------------
qcolombet wrote:
</pre>
                            <blockquote type="cite" class="">
                              <pre class="" wrap="">For consistency with the rest of LLVM source code I would use typedef.
</pre>
                            </blockquote>
                            <pre class="" wrap="">Having read the code now, I believe it would be more efficient to map PhysReg to LI.
It would greatly simplify add/removeMappedPhysReg.


================
Comment at: lib/CodeGen/VirtRegMap.cpp:221
  AU.addRequired<VirtRegMap>();
+  AU.addRequired<MachineBlockFrequencyInfo>();
+  AU.addPreserved<MachineBlockFrequencyInfo>();
----------------
We could gate that dependency on DisableRenameCopys.
That being said, if we move all that code in its own pass, that's not going to be a problem :).


================
Comment at: lib/CodeGen/VirtRegMap.cpp:296
+      bool FoundDef = false;
+      for (auto DI = DefIndexes.begin(); DI != DefIndexes.end(); ++DI)
+        if (*DI == DefIndex) {
----------------
Seems inefficient to use a vector here.
At the very least it should be sorted.


================
Comment at: lib/CodeGen/VirtRegMap.cpp:336
+                                    const TargetRegisterInfo &TRI,
+                                    const VirtRegMap &VRM) {
+
----------------
Looks to me like this is reimplementing functionalities already available in the LiveRegMatrix class.
Reuse that instead.


================
Comment at: lib/CodeGen/VirtRegMap.cpp:428
+    // Check the cost/benefit of renaming.
+    BlockFrequency DefFreq = MBFI->getBlockFreq(MI->getParent());
+    BlockFrequency Benefit(DefFreq);
----------------
Wasn't the goal to propagate useless copies, i.e., that should always be beneficial, right?

If you want to do more, then please add test cases covering that. Indeed, as far as I can tell, the changes in the test cases only cover the simple case of propagating useless copies.


================
Comment at: lib/CodeGen/VirtRegMap.cpp:491
+    // reg so kill flags computation will be correct.
+    assert(LI.getNumValNums() == 1);
+    SlotIndex SrcDefPrevSlot = LI.getValNumInfo(0)->def.getPrevSlot();
----------------
Add && "Msg" in the assert.

Also explain in the comment before the assert what would it take to extend that to multiple VNs.


<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://reviews.llvm.org/D25347">https://reviews.llvm.org/D25347</a>



</pre>
                          </blockquote>
                        </blockquote>
                        <br class="">
                        <pre class="moz-signature" cols="72">-- 
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <br class="">
              </blockquote>
              <br class="">
              <pre class="moz-signature" cols="72">-- 
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
  </body>
</html>