[PATCH] D32684: [X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.

Mon Jun 5 17:08:20 PDT 2017

chandlerc added a comment.

Folks, I know this has gone back and forth a few times, but I'm afraid I have to revert it again.

Ultimately, this patch does two things at once, and I think that caught a lot of people by surprise:

1. It changes *how* the fold tables are built to use tablegen
2. It changes *what* is in the fold tables

#1 should be a no-op change in terms of behavior. But #2 is a really big behavior change that has already caused us to track down three really painful miscompiles due to inappropriate folding. Essentially, the new tables aren't correct yet, and the fact that this new functionality came all at once and coupled with a more benign refactoring makes it hard to cope with.

I also have some design concerns about the mechanisms used here. The only way to fix these is to grow an ever larger table of instructions in the C++ source code for tablegen, which is (IMO) even worse than a big table of instructions in the backend. No one will even think to look here to find out that they should add instructions to this. There is no checking that you have named an instruction correctly. And it isn't even clear what the correct name is in some cases.

All of this is also being done without properly addressing the glaring TODOs in the memory folding tests in the X86 backend. For example, in `test/CodeGen/X86/stack-folding-int-sse42.ll` we have `; TODO: stack_fold_pextrw`. This means we don't currently have any coverage of this instruction when doing folding. I think we need to have this kind of coverage before attempting to do #2, otherwise, we will run into bugs.

As it happens, I just debugged exactly this instruction. If I copy and paste the `pextrd` test like so:

  define i16 @stack_fold_pextrw(<8 x i16> %a0) {
    ;CHECK-LABEL: stack_fold_pextrw
    ;CHECK:       pextrw $1, {{%xmm[0-9][0-9]*}}, {{-?[0-9]*}}(%rsp) {{.*#+}} 4-byte Folded Spill
    ;CHECK:       movl    {{-?[0-9]*}}(%rsp), %eax {{.*#+}} 4-byte Reload
    ; add forces execution domain
    %1 = add <8 x i16> %a0, <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>
    %2 = extractelement <8 x i16> %1, i32 1
    %3 = tail call <2 x i64> asm sideeffect "nop", "=x,~{rax},~{rbx},~{rcx},~{rdx},~{rsi},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15}"()
    ret i16 %2
  }

It passes. Note that we do `pextrw` to a memory location and then reload it with `movl`. This will load garbage from the high 16 bits! We shouldn't be folding a 32-bit memory access into the memory operand of `pextrw`, but I can't even find where the pattern is that defines this lowering...

I could just blindly guess based on knowing how instructions are encoded in the X86 backend and probably disable folding for it, but it would be the third such disabling that I'm aware of. We can't just play whack-a-mole here.

So my concrete suggestions are:

1. Redesign how the folding works so that it is enabled or disabled by the instruction pattern itself rather than by a table in this file with string literals in it. Or at least by something else in the X86 tablegen files rather than in this C++ code.
2. Add the new tablegen emitter then, with just the flags in instruction patterns that generate the exact same table as was hard coded before.
3. Audit the outstanding TODOs (I'll submit the above test case though as soon as I finish reverting) and fill in the obvious ones so we have better coverage of folding.
4. Then start incrementally enabling more things with commensurate test updates as you go.

Does that make sense?

It'll probably take me an hour just to construct the revert, so shout if folks are really vehemently opposed to this. But it seems better for me to revert back to green than try to guess at how to stamp out yet another miscompile derived from this patch. =[

Repository:
  rL LLVM

https://reviews.llvm.org/D32684