[llvm-bugs] [Bug 49223] New: No AMX ldtilecfg emitted when compiled without optimizations

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Feb 17 07:14:16 PST 2021


https://bugs.llvm.org/show_bug.cgi?id=49223

            Bug ID: 49223
           Summary: No AMX ldtilecfg emitted when compiled without
                    optimizations
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: john at jwlawson.co.uk
                CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
                    llvm-dev at redking.me.uk, pengfei.wang at intel.com,
                    spatel+llvm at rotateright.com

Created attachment 24547
  --> https://bugs.llvm.org/attachment.cgi?id=24547&action=edit
Small test case that uses AMX instructions and fails to run without
optimizations enabled

When the AMX C interface intrinsics are used without optimizations, the
instruction that loads the tile configuration (ldtilecfg) is not generated.
This leads to a runtime error as soon as any other AMX instruction executes,
because a loaded tile configuration is a prerequisite for all of them. The AMX
instructions are all tile based with programmable sizes, so the configuration
must be set before anything can be done with the tiles.

    #include <immintrin.h>
    #include <stdint.h>
    void compute_matmul(uint16_t row, uint16_t col, uint16_t acc,
                        int8_t* data, int32_t* out) {
      __tile1024i A_tile = {row, acc};
      __tile1024i B_tile = {acc / 4, 4 * col};
      __tile1024i out_tile = {row, 4 * col};

      __tile_zero(&out_tile);
      __tile_loadd(&A_tile, data, acc);
      __tile_loadd(&B_tile, data, 4 * col);
      __tile_dpbssd(&out_tile, A_tile, B_tile);
      __tile_stored(out, 4 * col, out_tile);
    }

With optimizations this generates the correct code:
    ...
    ldtilecfg -64(%rsp)
    tilezero  %tmm0
    movl  %edx, %r8d
    movl  $data, %r9d
    tileloadd (%r9,%r8), %tmm1
    tileloadd (%r9,%rsi), %tmm2
    tdpbssd %tmm2, %tmm1, %tmm0
    tilestored  %tmm0, (%rcx,%rsi)
    tilerelease
    ...

But when compiled without optimizations there is no ldtilecfg instruction
emitted before the other AMX instructions.
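
For context, the operand that ldtilecfg reads is a 64-byte block in memory
describing the shape of each tile. The following is only a sketch of that
layout for the three tiles used above, assuming palette 1 and the
tmm0/tmm1/tmm2 assignment seen in the optimized output. The tile_config
struct and the make_matmul_config helper are hypothetical names chosen for
illustration; they are not part of the attached test case and do not claim to
mirror what the compiler does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of the 64-byte memory operand read by ldtilecfg. */
typedef struct {
  uint8_t  palette_id;   /* byte 0: palette 1 = 8 tiles, up to 16 rows x 64 bytes */
  uint8_t  start_row;    /* byte 1: row at which an interrupted load/store restarts */
  uint8_t  reserved[14]; /* bytes 2-15: must be zero */
  uint16_t colsb[16];    /* bytes 16-47: bytes per row; entries 0-7 are tmm0-tmm7 */
  uint8_t  rows[16];     /* bytes 48-63: row count; entries 0-7 are tmm0-tmm7 */
} tile_config;

static_assert(sizeof(tile_config) == 64, "tile config must be 64 bytes");
static_assert(offsetof(tile_config, colsb) == 16, "colsb starts at byte 16");
static_assert(offsetof(tile_config, rows) == 48, "rows start at byte 48");

/* Hypothetical helper: fills in the shapes used by compute_matmul.
 * Assumed register assignment: out_tile = tmm0, A_tile = tmm1, B_tile = tmm2. */
static tile_config make_matmul_config(uint16_t row, uint16_t col, uint16_t acc) {
  tile_config cfg;
  memset(&cfg, 0, sizeof cfg);  /* unused tiles and reserved bytes stay zero */
  cfg.palette_id = 1;

  cfg.rows[0] = (uint8_t)row;       /* out_tile = {row, 4 * col} */
  cfg.colsb[0] = (uint16_t)(4 * col);

  cfg.rows[1] = (uint8_t)row;       /* A_tile = {row, acc} */
  cfg.colsb[1] = acc;

  cfg.rows[2] = (uint8_t)(acc / 4); /* B_tile = {acc / 4, 4 * col} */
  cfg.colsb[2] = (uint16_t)(4 * col);

  return cfg;
}
```

On AMX hardware (or under an emulator), a block like this could be handed to
_tile_loadconfig(&cfg) from immintrin.h before using the raw _tile_*
intrinsics. With the __tile1024i interface used in this report, the compiler
is expected to build and load the equivalent configuration itself, which is
the step that appears to be missing at -O0.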

The attached test case can be compiled with clang:
    clang amx_test.c -O0 -march=sapphirerapids -S -o amx_test.s
The resulting assembly shows that the ldtilecfg instruction is not generated
at -O0, but is generated at -O1.

The binary can be tested using Intel's CPU emulator [0] and run with the
Sapphire Rapids target:
    clang amx_test.c -O0 -march=sapphirerapids -o amx_test
    sde -spr -- ./amx_test

This fails with an error reporting that the tiles are not configured:
    #UD TILES_CONFIGURED == 0
    XDIS 00000000004007ee: AMX_TILE C4E27B49C0               tilezero tmm0

At -O1 the generated binary runs successfully and prints out the matrices.

Tested with:
* clang version 13.0.0 (https://github.com/llvm/llvm-project.git
  63a35f35ecf8a46e63af750a88faecfe1c6354a6), compiled from source
* Ubuntu clang version
  12.0.0-++20210206062546+716eef9ad5b3-1~exp1~20210206053323.18, from
  apt.llvm.org

[0]:
https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html
