<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - No AMX ldtilecfg emitted when compiled without optimizations"
href="https://bugs.llvm.org/show_bug.cgi?id=49223">49223</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>No AMX ldtilecfg emitted when compiled without optimizations
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>john@jwlawson.co.uk
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=24547" name="attach_24547" title="Small test case that uses AMX instructions and fails to run without optimizations enabled">attachment 24547</a> <a href="attachment.cgi?id=24547&action=edit" title="Small test case that uses AMX instructions and fails to run without optimizations enabled">[details]</a></span>
Small test case that uses AMX instructions and fails to run without
optimizations enabled
When using the AMX C interface intrinsics without optimizations, the instruction
that loads the tile configuration (ldtilecfg) is not generated. This leads to a
runtime error as soon as any other AMX instruction executes, because a loaded
tile configuration is a prerequisite for them. The AMX instructions are all tile
based with programmable sizes, so the configuration must be set before anything
can be done with the tiles.
#include <immintrin.h>
#include <stdint.h>
void compute_matmul(uint16_t row, uint16_t col, uint16_t acc,
                    int8_t* data, int32_t* out) {
  __tile1024i A_tile = {row, acc};
  __tile1024i B_tile = {acc / 4, 4 * col};
  __tile1024i out_tile = {row, 4 * col};
  __tile_zero(&out_tile);
  __tile_loadd(&A_tile, data, acc);
  __tile_loadd(&B_tile, data, 4 * col);
  __tile_dpbssd(&out_tile, A_tile, B_tile);
  __tile_stored(out, 4 * col, out_tile);
}
With optimizations this generates the correct code:
...
ldtilecfg -64(%rsp)
tilezero %tmm0
movl %edx, %r8d
movl $data, %r9d
tileloadd (%r9,%r8), %tmm1
tileloadd (%r9,%rsi), %tmm2
tdpbssd %tmm2, %tmm1, %tmm0
tilestored %tmm0, (%rcx,%rsi)
tilerelease
...
But when compiled without optimizations there is no ldtilecfg instruction
emitted before the other AMX instructions.
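
For reference, here is a rough sketch of the 64-byte block that ldtilecfg is
expected to read for the function above. It is an illustration only, not the
code the compiler generates: the struct name tile_config and the helper
make_matmul_config are hypothetical, the field layout is my reading of the AMX
palette 1 format, and the tile numbering (tmm0 = out, tmm1 = A, tmm2 = B) is
taken from the optimized listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the 64-byte memory operand of ldtilecfg (palette 1).
   tile_config is a hypothetical name used only for this sketch. */
typedef struct {
  uint8_t palette_id;   /* 1 selects the palette with 8 tiles */
  uint8_t start_row;    /* restart row for interrupted loads/stores */
  uint8_t reserved[14]; /* must be zero */
  uint16_t colsb[16];   /* bytes per row of each tile; entries 8..15 unused */
  uint8_t rows[16];     /* number of rows of each tile; entries 8..15 unused */
} tile_config;

static_assert(sizeof(tile_config) == 64, "ldtilecfg reads 64 bytes");

/* Hypothetical helper: describe the three tiles used by compute_matmul.
   Assumes tmm0 = out_tile, tmm1 = A_tile, tmm2 = B_tile. */
static tile_config make_matmul_config(uint16_t row, uint16_t col,
                                      uint16_t acc) {
  tile_config cfg;
  memset(&cfg, 0, sizeof cfg); /* unused tiles and reserved bytes stay zero */
  cfg.palette_id = 1;
  cfg.rows[0] = (uint8_t)row;        /* out_tile = {row, 4 * col} */
  cfg.colsb[0] = (uint16_t)(4 * col);
  cfg.rows[1] = (uint8_t)row;        /* A_tile = {row, acc} */
  cfg.colsb[1] = acc;
  cfg.rows[2] = (uint8_t)(acc / 4);  /* B_tile = {acc / 4, 4 * col} */
  cfg.colsb[2] = (uint16_t)(4 * col);
  return cfg;
}
```

With row = 16, col = 16 and acc = 64, each of the three tiles would be
described as 16 rows of 64 bytes. Without a block like this being loaded first,
tilezero and the other tile instructions have no shape to work with, which is
consistent with the fault seen at -O0.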
The attached test case can be compiled with clang:
clang amx_test.c -O0 -march=sapphirerapids -S -o amx_test.s
The resulting assembly shows that the ldtilecfg instruction is not generated at
-O0, but is generated at -O1.
The binary can be tested using Intel's CPU emulator [0] and run with the
Sapphire Rapids target:
clang amx_test.c -O0 -march=sapphirerapids -o amx_test
sde -spr -- ./amx_test
This fails with an error reporting that the tiles are not configured:
#UD TILES_CONFIGURED == 0
XDIS 00000000004007ee: AMX_TILE C4E27B49C0 tilezero tmm0
At -O1 the generated binary runs successfully and prints out the matrices.
Tested with
* clang version 13.0.0 (<a href="https://github.com/llvm/llvm-project.git">https://github.com/llvm/llvm-project.git</a>
63a35f35ecf8a46e63af750a88faecfe1c6354a6) compiled from source
* Ubuntu clang version
12.0.0-++20210206062546+716eef9ad5b3-1~exp1~20210206053323.18 from apt.llvm.org
[0]:
<a href="https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html">https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html</a></pre>
</div>
</p>
</body>
</html>