[PATCH] D88823: [CodeGen][TailDuplicator] Don't duplicate blocks with INLINEASM_BR

Bill Wendling via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 5 14:57:19 PDT 2020


void added a comment.

In D88823#2312933 <https://reviews.llvm.org/D88823#2312933>, @nickdesaulniers wrote:

> I confirmed that this patch does fix the kernel panic observed in https://github.com/ClangBuiltLinux/linux/issues/1125#issuecomment-703890483.
>
> I don't understand the bug though.  Looking at the output of:
>
>   llc -mtriple=i686-- -stop-after=early-tailduplication -print-after-all llvm/test/CodeGen/X86/tail-dup-asm-goto.ll -o - 2>&1 | less
>
> Before/After `IR Dump After Early Tail Duplication` before rebuilding clang with this patch applied (so still in a bad state), but nothing looks wrong to me.  So what precisely is the bug?
>
> Before early tail dup:
>
>   bb.0.bb:
>     successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
>   
>     %2:gr32 = IMPLICIT_DEF
>     %0:gr32 = MOV32rm killed %2:gr32, 1, $noreg, 0, $noreg :: (load 4 from `i8** undef`, align 8)
>     %3:gr32_abcd = MOV32r0 implicit-def dead $eflags
>     %4:gr8 = COPY %3.sub_8bit:gr32_abcd
>     TEST8rr %4:gr8, %4:gr8, implicit-def $eflags
>     JCC_1 %bb.2, 5, implicit $eflags
>     JMP_1 %bb.1
>   
>   bb.1.bb100:
>   ; predecessors: %bb.0
>     successors: %bb.3(0x80000000); %bb.3(100.00%)
>   
>     %6:gr32 = IMPLICIT_DEF
>     MOV32mi killed %6:gr32, 1, $noreg, 0, $noreg, 0 :: (store 4 into `i8** undef`, align 8)
>     JMP_1 %bb.3
>   
>   bb.2.bb106:
>   ; predecessors: %bb.0
>     successors: %bb.3(0x80000000); %bb.3(100.00%)
>   
>     ADJCALLSTACKDOWN32 0, 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
>     CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $esp, implicit-def 
>   $ssp
>     ADJCALLSTACKUP32 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
>     %5:gr32 = MOV32r0 implicit-def dead $eflags
>   
>   bb.3.bb110:
>   ; predecessors: %bb.2, %bb.1
>     successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)
>   
>     %1:gr32 = PHI %5:gr32, %bb.2, %0:gr32, %bb.1

Note that we have a PHI node here that passes through the indirect branch (starting at `bb.4.bb17.i.i.i`) and used in `bb.5.kmem_cache_has_cpu_partial.exit`.

>   INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[
>
> clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
>
>   JMP_1 %bb.5

The duplication of `bb.3.bb110` has the effect that the PHI is now pushed down into the indirect destination block. See below.

> bb.4.bb17.i.i.i (address-taken):
> ; predecessors: %bb.3
>
>   successors: %bb.5(0x80000000); %bb.5(100.00%)
>
> bb.5.kmem_cache_has_cpu_partial.exit:
> ; predecessors: %bb.3, %bb.4
>
>   $eax = COPY %1:gr32
>   RET 0, $eax
>
>   After early tail dup:
>
> bb.0.bb:
>
>   successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
>   
>   %2:gr32 = IMPLICIT_DEF
>   %0:gr32 = MOV32rm killed %2:gr32, 1, $noreg, 0, $noreg :: (load 4 from `i8** undef`, align 8)
>   %3:gr32_abcd = MOV32r0 implicit-def dead $eflags
>   %4:gr8 = COPY %3.sub_8bit:gr32_abcd
>   TEST8rr %4:gr8, %4:gr8, implicit-def $eflags
>   JCC_1 %bb.2, 5, implicit $eflags
>   JMP_1 %bb.1
>
> bb.1.bb100:
> ; predecessors: %bb.0
>
>   successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)
>   
>   %6:gr32 = IMPLICIT_DEF
>   MOV32mi killed %6:gr32, 1, $noreg, 0, $noreg, 0 :: (store 4 into `i8** undef`, align 8)
>   INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
>   JMP_1 %bb.5
>
> bb.2.bb106:
> ; predecessors: %bb.0
>
>   successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)
>   
>   ADJCALLSTACKDOWN32 0, 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
>   CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $esp, implicit-def $ssp
>   ADJCALLSTACKUP32 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
>   %5:gr32 = MOV32r0 implicit-def dead $eflags
>   INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
>   JMP_1 %bb.5

The PHI node that was in `bb.3.bb110` was moved back into the predecessors: `%6:gr32 = IMPLICIT_DEF` and `%5:gr32 = MOV32r0 implicit-def dead $eflags`. (Note that `%5.gr32 ...` is now with the `INLINEASM_BR` due to the predecessor and `bb.3.bb110` merging.) This doesn't resolve the PHI node though, so it ends up in the indirect destination block (`bb.4.bb17.i.i.i`). Those values also end up in the PHI node in the default destination.

> bb.4.bb17.i.i.i (address-taken):
> ; predecessors: %bb.1, %bb.2
>
>   successors: %bb.5(0x80000000); %bb.5(100.00%)
>   
>   %9:gr32 = PHI %0:gr32, %bb.1, %5:gr32, %bb.2

The problem is this PHI here. We don't have a way to resolve PHI nodes in the entry block to an indirect destination. (This is the reason we don't allow "asm goto" outputs on the indirect paths.) When we go to resolve this PHI node, there's no place to put the put the values that works in all situations; we can't split the critical edges and it's not valid to place the resolved instructions at the ends of the predecessor blocks.

> bb.5.kmem_cache_has_cpu_partial.exit:
> ; predecessors: %bb.4, %bb.1, %bb.2
>
>   %10:gr32 = PHI %9:gr32, %bb.4, %0:gr32, %bb.1, %5:gr32, %bb.2
>   $eax = COPY %10:gr32
>   RET 0, $eax
>
>   The comment
>   
>   > We assume that all values used on the indirect path
>   > dominates all paths into the indirect block, i.e. don't have PHI nodes.
>   
>   Doesn't make sense to me.  Which PHI node in the above post-early-tail-dup is problematic?
>   
>   I wonder if there's perhaps a later pass that's messing up, and this change is simply hiding that? Or if duplicating the `INLINEASM_BR` results in multiple side effects? (Even if the code is never run, the kernel uses `.pushsection` to squirrel away data in these, so tail duplication is duplicating the data.  Is that a problem? Not sure yet.)

Later passes are messing up because tail duplication created a situation they're not able to handle.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88823/new/

https://reviews.llvm.org/D88823



More information about the llvm-commits mailing list