[PATCH] D70157: Align branches within 32-Byte boundary
Kan Shengchen via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Dec 5 08:02:16 PST 2019
skan marked 3 inline comments as done.
skan added inline comments.
================
Comment at: llvm/test/MC/X86/x86-64-align-branch-1b.s:10
+# CHECK: 0000000000000000 foo:
+# CHECK-NEXT: 0: 64 89 04 25 01 00 00 00 movl %eax, %fs:1
+# CHECK-NEXT: 8: 2e 55 pushq %rbp
----------------
MaskRay wrote:
> I think 1a.s and 1b.s should be merged. FileCheck supports
>
> --check-prefixes=CHECK,PREFIX5
> --check-prefixes=CHECK,PREFIX1
>
> ```
> CHECK: common part
> CHECK-NEXT: common part
> PREFIX5:
> PREFIX5-NEXT:
> PREFIX1:
> PREFIX1-NEXT:
> CHECK:
> CHECK-NEXT:
> ```
>
> ```
> % diff -U1 x86-64-align-branch-1[ab].s
> # CHECK: 0000000000000000 foo:
> -# CHECK-NEXT: 0: 64 64 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
> -# CHECK-NEXT: b: 55 pushq %rbp
> -# CHECK-NEXT: c: 55 pushq %rbp
> -# CHECK-NEXT: d: 55 pushq %rbp
> +# CHECK-NEXT: 0: 64 89 04 25 01 00 00 00 movl %eax, %fs:1
> +# CHECK-NEXT: 8: 2e 55 pushq %rbp
> +# CHECK-NEXT: a: 2e 55 pushq %rbp
> +# CHECK-NEXT: c: 2e 55 pushq %rbp
> # CHECK-NEXT: e: 48 89 e5 movq %rsp, %rbp
> ```
>
> Is there performance benefit to add 4 prefixes to the same instruction?
Thanks for the knowledge! There is no performance benefit to add 4 prefixes to the same instruction. The instruction `movl %eax, %fs:1` has already has a prefix %fs (0x64), if option `-x86-align-branch-prefix-size=5` is used, we could add 4 more prefixes at most. The branch to be aligned needs 3-byte padding, so 3 prefixes are added to the move instruction.
================
Comment at: llvm/test/MC/X86/x86-64-align-branch-1c.s:2
+# Check only fused conditional jumps and conditional jumps are aligned with option --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc --x86-align-branch-prefix-size=5
+# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc --x86-align-branch-prefix-size=5 %s | llvm-objdump -d - | FileCheck %s
+
----------------
MaskRay wrote:
> The difference between 1a and 1c is that 1c does not allow "jmp", but in 1a no jmp instructions get a prefix in the test, so it is unclear why 1c has different output.
```
# CHECK-NEXT: 45: 2e 89 45 fc movl %eax, %cs:-4(%rbp)
# CHECK-NEXT: 49: 89 75 f4 movl %esi, -12(%rbp)
# CHECK-NEXT: 4c: 89 7d f8 movl %edi, -8(%rbp)
# CHECK-NEXT: 4f: 89 75 f4 movl %esi, -12(%rbp)
# CHECK-NEXT: 52: 89 75 f4 movl %esi, -12(%rbp)
# CHECK-NEXT: 55: 89 75 f4 movl %esi, -12(%rbp)
# CHECK-NEXT: 58: 89 75 f4 movl %esi, -12(%rbp)
# CHECK-NEXT: 5b: 89 75 f4 movl %esi, -12(%rbp)
# CHECK-NEXT: 5e: 5d popq %rbp
# CHECK-NEXT: 5f: 5d popq %rbp
# CHECK-NEXT: 60: eb 26 jmp {{.*}}
```
The prefix 2e is added at
```
45: 2e 89 45 fc movl %eax, %cs:-4(%rbp)
```
================
Comment at: llvm/test/MC/X86/x86-64-align-branch-1e.s:46
+# CHECK-NEXT: 5c: eb 27 jmp {{.*}}
+# CHECK-NEXT: 5e: 90 nop
+# CHECK-NEXT: 5f: 90 nop
----------------
MaskRay wrote:
> This is weird. Comparing this with 1d, 1e allows more instruction types, yet it inserts two NOPs which actually seems to degrade performance.
Not all the target cpu support long nop, you can find related code in X86AsmBackend::writeNopData.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D70157/new/
https://reviews.llvm.org/D70157
More information about the cfe-commits
mailing list