<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - No AMX ldtilecfg emitted when compiled without optimizations"
   href="https://bugs.llvm.org/show_bug.cgi?id=49223">49223</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>No AMX ldtilecfg emitted when compiled without optimizations
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>john@jwlawson.co.uk
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=24547" name="attach_24547" title="Small test case that uses AMX instructions and fails to run without optimizations enabled">attachment 24547</a> <a href="attachment.cgi?id=24547&action=edit" title="Small test case that uses AMX instructions and fails to run without optimizations enabled">[details]</a></span>
Small test case that uses AMX instructions and fails to run without
optimizations enabled

When the AMX C interface intrinsics are used without optimizations, the
instruction that loads the tile configuration (ldtilecfg) is not generated.
This leads to a runtime error as soon as any other AMX instruction executes,
because a valid tile configuration is a prerequisite for all of them. The AMX
instructions are all tile based with programmable sizes, so the configuration
must be set before anything can be done with the tiles.

    #include &lt;immintrin.h&gt;
    #include &lt;stdint.h&gt;
    void compute_matmul(uint16_t row, uint16_t col, uint16_t acc,
                        int8_t* data, int32_t* out) {
      /* Each tile shape is {rows, bytes per row}. */
      __tile1024i A_tile = {row, acc};
      __tile1024i B_tile = {acc / 4, 4 * col};
      __tile1024i out_tile = {row, 4 * col};

      /* out_tile = A_tile * B_tile, then store the int32 result. */
      __tile_zero(&out_tile);
      __tile_loadd(&A_tile, data, acc);
      __tile_loadd(&B_tile, data, 4 * col);
      __tile_dpbssd(&out_tile, A_tile, B_tile);
      __tile_stored(out, 4 * col, out_tile);
    }

With optimizations this generates the correct code:
    ...
    ldtilecfg -64(%rsp)                                                         
    tilezero  %tmm0                                                             
    movl  %edx, %r8d                                                            
    movl  $data, %r9d                                                           
    tileloadd (%r9,%r8), %tmm1                                                  
    tileloadd (%r9,%rsi), %tmm2                                                 
    tdpbssd %tmm2, %tmm1, %tmm0                                                 
    tilestored  %tmm0, (%rcx,%rsi)                                              
    tilerelease
    ...

When compiled without optimizations, however, no ldtilecfg instruction is
emitted before the other AMX instructions.
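
For reference, here is a rough sketch of the 64-byte memory block that
ldtilecfg reads, filled in with the shapes used by compute_matmul. The struct
name, the helper make_matmul_config and the tile numbering (tmm0 = out_tile,
tmm1 = A_tile, tmm2 = B_tile, taken from the optimized listing above) are
assumptions for illustration only, not what the compiler emits. Plain C types
are used so the sketch needs no headers, which assumes x86-64 type widths.

```c
/* Hypothetical mirror of the 64-byte memory operand read by ldtilecfg.
 * Assumes x86-64: unsigned char is 8 bits, unsigned short is 16 bits. */
struct tile_config {
  unsigned char palette_id;   /* byte 0: palette 1 selects the eight-tile layout */
  unsigned char start_row;    /* byte 1: restart row for an interrupted load/store */
  unsigned char reserved[14]; /* bytes 2-15: must be zero */
  unsigned short colsb[16];   /* bytes 16-47: bytes per row, one entry per tile */
  unsigned char rows[16];     /* bytes 48-63: number of rows, one entry per tile */
};

_Static_assert(sizeof(struct tile_config) == 64, "ldtilecfg reads 64 bytes");

/* Illustrative helper: build the configuration implied by compute_matmul.
 * Assumed register assignment: tmm0 = out_tile, tmm1 = A_tile, tmm2 = B_tile.
 * Unused tiles stay zeroed. */
struct tile_config make_matmul_config(unsigned short row, unsigned short col,
                                      unsigned short acc) {
  struct tile_config cfg = {0};
  cfg.palette_id = 1;

  cfg.rows[0] = (unsigned char)row;          /* out_tile: row x (4 * col) bytes */
  cfg.colsb[0] = (unsigned short)(4 * col);

  cfg.rows[1] = (unsigned char)row;          /* A_tile: row x acc bytes */
  cfg.colsb[1] = acc;

  cfg.rows[2] = (unsigned char)(acc / 4);    /* B_tile: (acc / 4) x (4 * col) bytes */
  cfg.colsb[2] = (unsigned short)(4 * col);

  return cfg;
}
```

With row = 16, col = 16 and acc = 64, all three tiles come out as 16 rows of
64 bytes, which is the largest shape palette 1 allows. A block like this has
to be loaded by ldtilecfg before tilezero can execute.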

The attached test case can be compiled with clang:
    clang amx_test.c -O0 -march=sapphirerapids -S -o amx_test.s
The resulting assembly shows that the ldtilecfg instruction is not generated
at -O0, but is at -O1.

The binary can be tested using Intel's CPU emulator [0] and run with the
Sapphire Rapids target:
    clang amx_test.c -O0 -march=sapphirerapids -o amx_test
    sde -spr -- ./amx_test

This raises an error that the tiles are not configured:
    #UD TILES_CONFIGURED == 0
    XDIS 00000000004007ee: AMX_TILE C4E27B49C0               tilezero tmm0

At -O1 the generated binary runs successfully and prints out the matrices.

Tested with:
* clang version 13.0.0 (<a href="https://github.com/llvm/llvm-project.git">https://github.com/llvm/llvm-project.git</a>
63a35f35ecf8a46e63af750a88faecfe1c6354a6) compiled from source
* Ubuntu clang version
12.0.0-++20210206062546+716eef9ad5b3-1~exp1~20210206053323.18 from apt.llvm.org

[0]:
<a href="https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html">https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html</a></pre>
        </div>
      </p>


    </body>
</html>