[Openmp-commits] [PATCH] D138704: RFC: [openmp] Provide an assembly implementation of __kmp_invoke_microtask on ARM
    Martin Storsjö via Phabricator via Openmp-commits 
    openmp-commits at lists.llvm.org
       
    Fri Nov 25 09:47:49 PST 2022
    
    
  
mstorsjo added a comment.
Just BTW - the assembly here isn't entirely written from scratch, but I took the existing aarch64 implementation and rewrote it, so most of the style/structure matches that.
I'll try to add comments to clarify the details that you pointed out. Thanks for the review!
================
Comment at: openmp/runtime/src/z_Linux_asm.S:1429
+
+	// Pushing one extra register to keep the stack aligned
+	push	{r3-r11,lr}
----------------
DavidSpickett wrote:
> You might want to add "for when we call pkfn below".
> 
> Since AAPCS32 says align to 4, unless at a public interface where it should be 8. Which includes calling a function.
> (which evidently you knew but I did not)
> 
> Also worth saying which is the extra register.
Sure, can do.
================
Comment at: openmp/runtime/src/z_Linux_asm.S:1431
+	push	{r3-r11,lr}
+	ldrd    r4, r5, [sp, #10*4]
+
----------------
DavidSpickett wrote:
> What does this load? I think p_argv and exit_frame.
Yes, that's correct - I'll add a comment.
================
Comment at: openmp/runtime/src/z_Linux_asm.S:1452
+	sub	sp, sp, r5, lsl #3
+	sub	r3, r3, #1
+
----------------
DavidSpickett wrote:
> r3 is the arg count. So I think you are rounding up for the case where the count is an odd number, which would leave sp at 4 byte alignment? But if that's the case why do you then multiply by 8 not 4.
> 
> Because you divided by 2 in the first place? Making the calculation produce a number of 8 byte slots not 4?
> 
> Tell me where I'm going wrong there.
Exactly - we want to adjust the stack down 4*argc bytes, but rounded up to an even increment of 8 bytes.
================
Comment at: openmp/runtime/src/z_Linux_asm.S:1465
+
+	cmp	r5, #0
+	beq	KMP_LABEL(kmp_1)
----------------
DavidSpickett wrote:
> This section you doing:
> ```
> while (argc != 0) {
>    copy and argument to the stack
>    argc--
> }
> 
> call pkfn
> ```
> 
> Correct?
Yes, pretty much.
================
Comment at: openmp/runtime/src/z_Linux_asm.S:1467
+	beq	KMP_LABEL(kmp_1)
+	ldr	r2, [r6]
+
----------------
DavidSpickett wrote:
> What does this `ldr r2` and the `ldr r3` below setup?
When calling a function, the first 4 arguments go in r0-r3 - r0/r1 are filled with `>id` and `&tid`, while r2/r3 should get the first arguments among the regular ones. If argc > 2, then we copy the rest of them onto the stack. In the case of aarch64, we'd copy incoming arguments into x2-x7 and the rest on the stack - here we only need to handle r2/r3 separately.
================
Comment at: openmp/runtime/src/z_Linux_asm.S:2062
     .size __kmp_unnamed_critical_addr,4
+#endif
 #endif /* KMP_ARCH_ARM */
----------------
DavidSpickett wrote:
> And this is an elf/coff difference? No need for a equivalent coff line?
`.size` is a ELF-specific assembly directive (and object file feature) - COFF doesn't store per-symbol sizes.
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138704/new/
https://reviews.llvm.org/D138704
    
    
More information about the Openmp-commits
mailing list