[PATCH] D69411: [MC] Parse .if conditions with symbols in consecutive MCDataFragements

Tue Nov 12 11:32:56 PST 2019

peter.smith added a comment.

In D69411#1742501 <https://reviews.llvm.org/D69411#1742501>, @nickdesaulniers wrote:

> Thanks for the summary.
>
> In D69411#1742030 <https://reviews.llvm.org/D69411#1742030>, @peter.smith wrote:
>
> > To summarise where I think we are right now.
> >
> > - D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
> > - This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
> >   - When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.
>
>
> I'll follow up on this.  There's likely a lot of code that can be dropped if that's the case.
>
> >   - When there is a Subtarget change between MCDataFragments.
> > - In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
> > - The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.
> > 
> >   From the perspective of the linux kernel, is D76002 sufficient?
>
> I can still reproduce https://github.com/ClangBuiltLinux/linux/issues/742 with https://reviews.llvm.org/D70062 applied.

That's unfortunate. I've found out how to reproduce the original issue:

  clang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__ASSEMBLY__ --target=arm-linux-gnueabihf -D__LINUX_ARM_ARCH__=7 -march=armv7-a -include asm/unified.h -c -o /tmp/proc-v7.o arch/arm/mm/proc-v7.Sclang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__ASSEMBLY__ --target=arm-linux-gnueabihf -D__LINUX_ARM_ARCH__=7 -march=armv7-a -include asm/unified.h -c -o /tmp/proc-v7.o arch/arm/mm/proc-v7.S

If I extract out just the small part of the test case it works even without 76002.

> The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide).  See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.
> 
> I don't see any subarch directives there.

They are there, I preprocessed to get a rather large output file including:

  .globl cpu_v7_dcache_clean_area ; .align 0 ; cpu_v7_dcache_clean_area:
   9998: nop @ MP extensions imply L1 PTW
   .equ up_b_offset, 1f - 9998b ; .pushsection ".alt.smp.init", "a" ; .long 9998b ; b . + up_b_offset ; .popsection
   ret lr
  1: dcache_line_size r2, r3
  2: mcr p15, 0, r0, c7, c10, 1 @ clean D entry
   add r0, r0, r2
   subs r1, r1, r2
   bhi 2b
   dsb ishst
   ret lr
  .type cpu_v7_dcache_clean_area, %function; .size cpu_v7_dcache_clean_area, .-cpu_v7_dcache_clean_area

   .arch_extension sec
  .globl cpu_v7_smc_switch_mm ; .align 0 ; cpu_v7_smc_switch_mm:
   stmfd sp!, {r0 - r3}
   movw r0, #:lower16:(((1) << 31) | ((0) << 30) | (((0) & 0x3F) << 24) | ((0x8000) & 0xFFFF))
   movt r0, #:upper16:(((1) << 31) | ((0) << 30) | (((0) & 0x3F) << 24) | ((0x8000) & 0xFFFF))
   smc #0
   ldmfd sp!, {r0 - r3}
   b cpu_v7_switch_mm
  .type cpu_v7_smc_switch_mm, %function; .size cpu_v7_smc_switch_mm, .-cpu_v7_smc_switch_mm

  ...
  ...

   9998: orr r1, r1, #((0 << 0) | (1 << 6))|(1 << 1)|(1 << 5)|(1 << 3)
   .pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection

This confused me for a bit, I think something like the following is happening.
.pushsection  ".alt.smp.init" 
// some instructions with Subtarget X
.popsection

.arch_extension sec // or anything to change the subtarget
// more stuff

.pushsection  ".alt.smp.init" 
// we are returning to .alt.smp.init but our subtarget is now X +sec
next instruction will start a new fragment
.popsection

It is getting late over here so I need to go home. Would jcai19 or yourself be able to investigate to confirm, check further? We would need to start a new MCDataFragment within .alt.smp.init for the new SubTarget, but I'd expect it all to happen before the 9997: in the failing case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69411/new/

https://reviews.llvm.org/D69411