[llvm-bugs] [Bug 49223] New: No AMX ldtilecfg emitted when compiled without optimizations
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Feb 17 07:14:16 PST 2021
https://bugs.llvm.org/show_bug.cgi?id=49223
Bug ID: 49223
Summary: No AMX ldtilecfg emitted when compiled without optimizations
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: john at jwlawson.co.uk
CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org, llvm-dev at redking.me.uk, pengfei.wang at intel.com, spatel+llvm at rotateright.com
Created attachment 24547
--> https://bugs.llvm.org/attachment.cgi?id=24547&action=edit
Small test case that uses AMX instructions and fails to run without optimizations enabled
When the AMX C interface intrinsics are used without optimizations, the ldtilecfg
instruction that loads the tile configuration is not generated. Any subsequent
AMX instruction then fails at runtime, because it requires the tiles to be
configured first. AMX instructions operate on tiles with programmable shapes
(number of rows and bytes per row), so the configuration must be loaded before
any tile can be used.
#include <immintrin.h>
#include <stdint.h>

void compute_matmul(uint16_t row, uint16_t col, uint16_t acc,
                    int8_t* data, int32_t* out) {
    __tile1024i A_tile = {row, acc};
    __tile1024i B_tile = {acc / 4, 4 * col};
    __tile1024i out_tile = {row, 4 * col};
    __tile_zero(&out_tile);
    __tile_loadd(&A_tile, data, acc);
    __tile_loadd(&B_tile, data, 4 * col);
    __tile_dpbssd(&out_tile, A_tile, B_tile);
    __tile_stored(out, 4 * col, out_tile);
}
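For checking the printed matrices, a scalar sketch of what __tile_dpbssd computes may help. It assumes the TDPBSSD semantics from Intel's instruction reference: each int32 element of the destination accumulates dot products of groups of four signed bytes taken from a row of A and a row of B. The function name ref_dpbssd and its parameter order are hypothetical. With the shapes in compute_matmul, M = row, N = col, K = acc / 4, and the byte strides are acc (A), 4 * col (B) and 4 * col (out).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for TDPBSSD (signed int8 x signed int8 -> int32).
   All strides are in bytes, like the stride arguments of __tile_loadd/__tile_stored.
     a: M rows of 4*K int8 values
     b: K rows of 4*N int8 values (each group of 4 bytes pairs with one output column)
     c: M rows of N int32 accumulators
   The hardware accumulates with 32-bit wraparound; this sketch assumes no overflow. */
static void ref_dpbssd(size_t M, size_t N, size_t K,
                       int32_t *c, size_t c_stride,
                       const int8_t *a, size_t a_stride,
                       const int8_t *b, size_t b_stride) {
    for (size_t m = 0; m < M; ++m) {
        int32_t *c_row = (int32_t *)((char *)c + m * c_stride);
        const int8_t *a_row = a + m * a_stride;
        for (size_t k = 0; k < K; ++k) {
            const int8_t *b_row = b + k * b_stride;
            for (size_t n = 0; n < N; ++n) {
                /* Dot product of 4 signed bytes from A with 4 from B. */
                int32_t sum = 0;
                for (size_t i = 0; i < 4; ++i)
                    sum += (int32_t)a_row[4 * k + i] * (int32_t)b_row[4 * n + i];
                c_row[n] += sum;
            }
        }
    }
}
```

Comparing this reference against the SDE run at -O1 is one way to confirm the tile shapes and strides were set as intended.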
With optimizations this generates the correct code:
...
ldtilecfg -64(%rsp)
tilezero %tmm0
movl %edx, %r8d
movl $data, %r9d
tileloadd (%r9,%r8), %tmm1
tileloadd (%r9,%rsi), %tmm2
tdpbssd %tmm2, %tmm1, %tmm0
tilestored %tmm0, (%rcx,%rsi)
tilerelease
...
But when compiled without optimizations there is no ldtilecfg instruction
emitted before the other AMX instructions.
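In the -O1 listing, ldtilecfg reads its operand from a 64-byte stack slot (-64(%rsp)). The sketch below shows the layout of that block, assuming the palette 1 format described in Intel's LDTILECFG documentation, and fills it with the shapes compute_matmul needs. The struct name tile_config and the helper fill_matmul_config are hypothetical; the tile numbering mirrors the -O1 listing (tmm0 = out, tmm1 = A, tmm2 = B). In user code, such a block would be handed to the _tile_loadconfig intrinsic, which emits ldtilecfg.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for the 64-byte memory operand read by ldtilecfg (palette 1). */
typedef struct {
    uint8_t  palette_id;   /* byte 0: 1 = 8 tiles, up to 16 rows x 64 bytes each */
    uint8_t  start_row;    /* byte 1: restart row for interrupted loads/stores */
    uint8_t  reserved[14]; /* bytes 2-15: must be zero */
    uint16_t colsb[16];    /* bytes 16-47: bytes per row for each tile */
    uint8_t  rows[16];     /* bytes 48-63: number of rows for each tile */
} tile_config;

_Static_assert(sizeof(tile_config) == 64, "ldtilecfg reads exactly 64 bytes");
_Static_assert(offsetof(tile_config, colsb) == 16, "colsb starts at byte 16");
_Static_assert(offsetof(tile_config, rows) == 48, "rows starts at byte 48");

/* Hypothetical helper: configure the three tiles used by compute_matmul. */
static void fill_matmul_config(tile_config *cfg, uint16_t row, uint16_t col,
                               uint16_t acc) {
    memset(cfg, 0, sizeof *cfg); /* unused tiles and reserved bytes stay zero */
    cfg->palette_id = 1;
    cfg->rows[0] = (uint8_t)row;       cfg->colsb[0] = (uint16_t)(4 * col); /* out */
    cfg->rows[1] = (uint8_t)row;       cfg->colsb[1] = acc;                 /* A   */
    cfg->rows[2] = (uint8_t)(acc / 4); cfg->colsb[2] = (uint16_t)(4 * col); /* B   */
}
```

The entries for unused tiles must be zero, which the memset guarantees; the shapes match the __tile1024i initializers in the test case.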
The attached test case can be compiled with clang:
clang amx_test.c -O0 -march=sapphirerapids -S -o amx_test.s
The resulting assembly contains no ldtilecfg instruction at -O0, whereas at -O1
it does.
The binary can be tested with Intel's Software Development Emulator (SDE) [0]
using the Sapphire Rapids target:
clang amx_test.c -O0 -march=sapphirerapids -o amx_test
sde -spr -- ./amx_test
This fails with an invalid-opcode (#UD) exception because the tiles are not
configured:
#UD TILES_CONFIGURED == 0
XDIS 00000000004007ee: AMX_TILE C4E27B49C0 tilezero tmm0
At -O1 the generated binary runs successfully and prints the matrices.
Tested with:
* clang version 13.0.0 (https://github.com/llvm/llvm-project.git 63a35f35ecf8a46e63af750a88faecfe1c6354a6) compiled from source
* Ubuntu clang version 12.0.0-++20210206062546+716eef9ad5b3-1~exp1~20210206053323.18 from apt.llvm.org
[0]: https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html