<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Halide would like to set the .maxnreg directive per PTX kernel."

   href="https://llvm.org/bugs/show_bug.cgi?id=31321">31321</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Halide would like to set the .maxnreg directive per PTX kernel.

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>Other

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: PTX

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>zalman@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>The PTX backend can already emit certain per-kernel (.entry) PTX

directives from metadata annotations. The test for this is

test/CodeGen/NVPTX/annotations.ll. According to NVIDIA's PTX documentation:

<a href="http://docs.nvidia.com/cuda/parallel-thread-execution/#performance-tuning-directives">http://docs.nvidia.com/cuda/parallel-thread-execution/#performance-tuning-directives</a>

the .maxnreg value can be set on a per-entry basis as well. Halide would like

to use this to provide a scheduling directive that controls the

value. (See:

    <a href="https://github.com/halide/Halide/pull/1667">https://github.com/halide/Halide/pull/1667</a>

where a workaround sets the maximum number of registers on a per-module

basis at load time.)

Plumbing this through involves adding .maxnreg support to

NVPTXAsmPrinter::emitKernelFunctionDirectives.</pre>
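        <pre>For reference, a minimal sketch of the existing annotation mechanism and of
what the requested one might look like. The "kernel" and "maxntidx" keys are
the kind exercised by annotations.ll. The "maxnreg" key, the kernel name
@halide_kernel, and the values 256 and 32 are assumptions made for
illustration; this is not something the backend is known to accept today.

    target triple = "nvptx64-nvidia-cuda"

    define void @halide_kernel(float* %out) {
      ret void
    }

    !nvvm.annotations = !{!0, !1, !2}
    ; Marks the function as a kernel, so it is emitted as .entry.
    !0 = !{void (float*)* @halide_kernel, !"kernel", i32 1}
    ; Existing: emitted as a .maxntid directive on the entry.
    !1 = !{void (float*)* @halide_kernel, !"maxntidx", i32 256}
    ; Hypothetical: would be emitted as ".maxnreg 32" on the entry.
    !2 = !{void (float*)* @halide_kernel, !"maxnreg", i32 32}

With such support in place, the PTX generated for this kernel would be
expected to carry .maxnreg 32 alongside .maxntid on the .entry.</pre>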

        </div>

      </p>


    </body>

</html>