<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><span class="vcard"><a class="email" href="mailto:andrew.b.adams@gmail.com" title="Andrew Adams <andrew.b.adams@gmail.com>"> <span class="fn">Andrew Adams</span></a>
</span> changed
              <a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED FIXED - alloca in local memory not promoted to registers"
   href="https://llvm.org/bugs/show_bug.cgi?id=31333">bug 31333</a>
        <br>
             <table border="1" cellspacing="0" cellpadding="8">
          <tr>
            <th>What</th>
            <th>Removed</th>
            <th>Added</th>
          </tr>

         <tr>
           <td style="text-align:right;">Status</td>
           <td>NEW
           </td>
           <td>RESOLVED
           </td>
         </tr>

         <tr>
           <td style="text-align:right;">Resolution</td>
           <td>---
           </td>
           <td>FIXED
           </td>
         </tr></table>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED FIXED - alloca in local memory not promoted to registers"
   href="https://llvm.org/bugs/show_bug.cgi?id=31333#c8">Comment # 8</a>
              on <a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED FIXED - alloca in local memory not promoted to registers"
   href="https://llvm.org/bugs/show_bug.cgi?id=31333">bug 31333</a>
              from <span class="vcard"><a class="email" href="mailto:andrew.b.adams@gmail.com" title="Andrew Adams <andrew.b.adams@gmail.com>"> <span class="fn">Andrew Adams</span></a>
</span></b>
        <pre>Your comments made me wonder if it's our pass setup that's wrong:

<a href="https://github.com/halide/Halide/blob/master/src/CodeGen_PTX_Dev.cpp#L267">https://github.com/halide/Halide/blob/master/src/CodeGen_PTX_Dev.cpp#L267</a>

I think that code was written in 2012. Leaving it as a size-64 alloca, but
changing the pass setup to this:

<a href="https://github.com/halide/Halide/blob/0e1662f2382e7134205abcdcd995a54f3441365a/src/CodeGen_PTX_Dev.cpp#L267">https://github.com/halide/Halide/blob/0e1662f2382e7134205abcdcd995a54f3441365a/src/CodeGen_PTX_Dev.cpp#L267</a>

gives me the best timings I've seen. It's 15% faster than the 64 individual
allocas. No usage of local memory, and it seems to have decided to make the
loads non-cached.

So, PEBKAC I guess. Sorry about that. SROA probably wasn't even running before
the address-space casts appeared. That's still using the legacy pass manager
though. Is there some canonical piece of code that shows the right way to set
up the passes for PTX kernels?</pre>
        </div>
      </p>
    </body>
</html>