<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [AMDGPU][MC] Improve disassembler error handling"

   href="https://bugs.llvm.org/show_bug.cgi?id=36347">36347</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[AMDGPU][MC] Improve disassembler error handling

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: AMDGPU

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>tcorring@amd.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Currently AMDGPUDisassembler::getInstruction() follows the documented standard

LLVM disassembler behavior for reporting an instruction that cannot be

disassembled: it returns an empty string and reports 0 bytes as consumed.

As a result, the 4-byte instruction is reported as unknown, but the

disassembler steps just 1 byte before attempting to decode the next

instruction. As amdgcn instructions are multiples of 4 bytes long, a 1-byte step

almost guarantees that the result will either be another failed instruction or

the bogus disassembly of an instruction encoding that is made up from part of

one instruction and part of another. In the best case this results in a cascade

of four invalid instructions, but it can result in much longer cascades. In either

case the remainder of the disassembly can't be trusted.
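
To illustrate the cascade, here is a minimal Python sketch of a toy decoder.
It is not the AMDGPU encoding: KNOWN_OPCODES, decode(), disassemble() and
STREAM are hypothetical, and a word counts as valid purely when its first
byte is a known opcode. It only compares stepping 1 byte against stepping
4 bytes after a decode failure.

```python
# Toy model only: these opcodes and this validity rule are assumptions made
# for illustration, not the real amdgcn instruction encoding.
KNOWN_OPCODES = {0x10, 0x11}


def decode(data, offset):
    """Return the 4-byte word at offset if its first byte is known, else None."""
    word = data[offset:offset + 4]
    if len(word) == 4 and word[0] in KNOWN_OPCODES:
        return word
    return None


def disassemble(data, step_on_failure):
    """Decode the stream, advancing step_on_failure bytes after each failure."""
    decoded_offsets, failed_offsets, offset = [], [], 0
    while len(data) - offset >= 4:
        if decode(data, offset) is None:
            failed_offsets.append(offset)
            offset += step_on_failure
        else:
            decoded_offsets.append(offset)
            offset += 4
    return decoded_offsets, failed_offsets


# Four 4-byte words; the second one (offset 4) has an unknown opcode 0xFF.
STREAM = bytes([0x10, 0x00, 0x00, 0x00,
                0xFF, 0x10, 0x00, 0x00,
                0x11, 0x22, 0x33, 0x44,
                0x10, 0x00, 0x00, 0x00])

print(disassemble(STREAM, 1))  # 1-byte step after a failure
print(disassemble(STREAM, 4))  # 4-byte step after a failure
```

In this toy stream the 1-byte step decodes a bogus word at offset 5 (made of
bytes from two different instructions), loses the real instruction at
offset 8, and reports failures at offsets 4, 9, 10 and 11. The 4-byte step
reports only the single failure at offset 4 and stays aligned.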

There are two approaches to improve this:

1) change the disassembler to report 4 bytes consumed when reporting an

instruction disassembly failure. This is a trivial one-character change, but

it does go against the LLVM disassembler behavior; some other targets have

already made this change.

2) disassemble unrecognized opcodes as data using the .long directive (assuming

the unknown instruction is 4 bytes). This avoids disassembly failures, so the

change at 1) isn't required. However, it doesn't guarantee that the disassembly is

always valid, as the unrecognized instruction may be 8 bytes, leading to

further errors or bogus disassembly - this is likely to be in fewer cases than

previously though. An advantage of this approach is that the resulting

disassembly is in a format that is suitable for re-assembly.
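
As a rough sketch of approach 2), the following Python models a decoder that
falls back to a .long directive for any 4-byte word it does not recognize.
The MNEMONICS table, the toy_add/toy_mov names and the function name are
hypothetical and stand in for the real decoder tables; the word is read as
little-endian, and every unknown instruction is assumed to be 4 bytes.

```python
# Hypothetical toy decoder: the first byte selects a mnemonic. This is an
# illustration only, not the real amdgcn encoding or the LLVM API.
MNEMONICS = {0x10: "toy_add", 0x11: "toy_mov"}


def disassemble_with_long_fallback(data):
    """Return one output line per complete 4-byte word in data."""
    lines = []
    for offset in range(0, len(data) - 3, 4):
        word = data[offset:offset + 4]
        mnemonic = MNEMONICS.get(word[0])
        if mnemonic is None:
            # Unknown opcode: emit the raw word as data so the output can
            # still be re-assembled, then continue at the next word.
            value = int.from_bytes(word, "little")
            lines.append(".long 0x{:08x}".format(value))
        else:
            lines.append(mnemonic)
    return lines


SAMPLE = bytes([0x10, 0x00, 0x00, 0x00,
                0xFF, 0x10, 0x00, 0x00,
                0x11, 0x22, 0x33, 0x44])

for line in disassemble_with_long_fallback(SAMPLE):
    print(line)
```

For SAMPLE this prints toy_add, .long 0x000010ff and toy_mov. If the unknown
instruction were really 8 bytes, its second word would still be decoded on
its own, which is the remaining source of bogus output described above.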

I favor 2), as this has more advantages. There isn't a solution that is always

correct in the presence of unknown opcodes, but this is probably as close as is

possible, and much better than the current behavior.</pre>

        </div>

      </p>


    </body>

</html>