[llvm] [X86] Support EGPR (R16-R31) for APX (PR #67702)

Shengchen Kan via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 16 00:44:41 PDT 2023


KanRobert wrote:

Hi @nikic, I reproduced the sqlite3 regression and investigated it. The 0.4% instruction-count regression is caused by the definitions of the new registers themselves. As soon as we add definitions like the following to the td file,
```
def R16B : X86Reg<"r16b", 16>;
```
the regression occurs, even if we do not add these registers to any register class. For example, when we collect instruction counts with the command
```shell
valgrind --tool=callgrind clang -DNDEBUG -O3 -w -Werror=date-time \
  -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 \
  -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 \
  -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 \
  -DSQLITE_THREADSAFE=0 -I. -MD \
  -MT MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/shell.c.o \
  -MF MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/shell.c.o.d \
  -o MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/shell.c.o \
  -c /export/users/skan/test-suite/MultiSource/Applications/sqlite3/shell.c
```
the newly added registers introduce a 0.6% regression in total. Here are the statistics:

Registers added (cumulative) | Instruction count | Regression
--------- | ------------------ | ------------
baseline | 2,252,104,105 | 0%
\+ R16B-R31B | 2,256,497,475 | 0.17%
\+ R16BH-R31BH | 2,259,699,801 | 0.26%
\+ R16WH-R31WH | 2,261,163,600 | 0.4%
\+ R16W-R31W | 2,262,719,972 | 0.44%
\+ R16D-R31D | 2,264,204,477 | 0.53%
\+ R16-R31 | 2,267,003,623 | 0.6%


The extra instructions come mainly from loops that iterate over all of these registers, for example:
```cpp
bool LiveVariables::runOnMachineFunction(MachineFunction &mf) {
...
  const unsigned NumRegs = TRI->getNumRegs();
...
  for (MachineBasicBlock *MBB : depth_first_ext(Entry, Visited)) {
    runOnBlock(MBB, NumRegs);
...
}

void LiveVariables::runOnBlock(MachineBasicBlock *MBB, const unsigned NumRegs) {
...
  // Loop over PhysRegDef / PhysRegUse, killing any registers that are
  // available at the end of the basic block.
  for (unsigned i = 0; i != NumRegs; ++i)
    if ((PhysRegDef[i] || PhysRegUse[i]) && !LiveOuts.count(i))
      HandlePhysRegDef(i, nullptr, Defs);

}
```
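`NumRegs` is 292 without this PR and 388 (292 + 16 * 6) with it, because six sets of 16 registers are added (R16B-R31B, R16BH-R31BH, R16WH-R31WH, R16W-R31W, R16D-R31D and R16-R31). Loops like this one do work for every register, even for registers the target does not support. I think this is a target-independent issue, and it is not clear how to fix it.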
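The scaling can be illustrated with a simplified C++ model of the end-of-block loop in `LiveVariables::runOnBlock` quoted above. This is a sketch under stated assumptions, not LLVM code. `ScanStats`, `simulateEndOfBlockScans`, the block count and the set of touched registers are all hypothetical. The model also ignores the `LiveOuts` check and the per-block reset of `PhysRegDef`/`PhysRegUse`. It only counts loop iterations, because that is the part of the cost that grows with `NumRegs`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the end-of-block loop in
// LiveVariables::runOnBlock (not actual LLVM code). Every block scans every
// physical-register slot, so the loop runs NumBlocks * NumRegs times no matter
// how few registers the function really defines or uses.
struct ScanStats {
  std::uint64_t SlotsVisited = 0; // total loop iterations
  std::uint64_t SlotsHandled = 0; // iterations that would call HandlePhysRegDef
};

ScanStats simulateEndOfBlockScans(unsigned NumRegs, unsigned NumBlocks,
                                  const std::vector<unsigned> &TouchedRegs) {
  // Stand-in for PhysRegDef/PhysRegUse: true if the register was defined or
  // used. Assumption: the same registers are touched in every block.
  std::vector<bool> Touched(NumRegs, false);
  for (unsigned Reg : TouchedRegs) {
    assert(Reg < NumRegs && "register number out of range");
    Touched[Reg] = true;
  }

  ScanStats Stats;
  for (unsigned B = 0; B != NumBlocks; ++B) {
    // Mirrors `for (unsigned i = 0; i != NumRegs; ++i)` in runOnBlock.
    for (unsigned I = 0; I != NumRegs; ++I) {
      ++Stats.SlotsVisited;
      if (Touched[I]) // the LiveOuts check is ignored in this model
        ++Stats.SlotsHandled;
    }
  }
  return Stats;
}
```

For example, `simulateEndOfBlockScans(292, 1000, {1, 2})` visits 292,000 slots and `simulateEndOfBlockScans(388, 1000, {1, 2})` visits 388,000. Both report the same 2,000 slots that would reach `HandlePhysRegDef`. In this model, the loop runs about 33% more iterations with no extra useful work. How much of that overhead appears in the total instruction count depends on how much time the compiler spends in loops of this kind.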

Do you have any ideas, or could we ignore this regression for now? cc @phoebewang @RKSimon @topperc

https://github.com/llvm/llvm-project/pull/67702

