[PATCH] D125418: [Arm64EC 6/?] Implement C/C++ mangling for Arm64EC function definitions.

Thu Aug 11 20:00:23 PDT 2022

bcl5980 added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:5128
+    // to the function itself; it points to a stub for the compiler.
+    // FIXME: We also need to emit an entry thunk.
+    SmallString<256> MangledName;
----------------
efriedma wrote:
> bcl5980 wrote:
> > efriedma wrote:
> > > bcl5980 wrote:
> > > > efriedma wrote:
> > > > > bcl5980 wrote:
> > > > > > A headache thing here.
> > > > > > We need to get the function definition with triple x64 to define entry thunk. For now the function definition here is aarch64 version.
> > > > > > For example the case in Microsoft doc "Understanding Arm64EC ABI and assembly code":
> > > > > > 
> > > > > > ```
> > > > > > struct SC {
> > > > > >     char a;
> > > > > >     char b;
> > > > > >     char c;
> > > > > > };
> > > > > > int fB(int a, double b, int i1, int i2, int i3);
> > > > > > int fC(int a, struct SC c, int i1, int i2, int i3);
> > > > > > int fA(int a, double b, struct SC c, int i1, int i2, int i3) {
> > > > > >     return fB(a, b, i1, i2, i3) + fC(a, c, i1, i2, i3);
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > x64 version IR for fA is:
> > > > > > ```
> > > > > > define dso_local i32 @fA(i32 noundef %a, double noundef %b, ptr nocapture noundef readonly %c, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) local_unnamed_addr #0 { ... }
> > > > > > ```
> > > > > > aarch64 version IR for fA is:
> > > > > > 
> > > > > > ```
> > > > > > define dso_local i32 @"#fA"(i32 noundef %a, double noundef %b, i64 %c.coerce, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) #0 {...}
> > > > > > ```
> > > > > > Arm64 will allow any size structure to be assigned to a register directly. x64 only allows sizes 1, 2, 4 and 8. 
> > > > > > Entry thunk follow x64 version function type. But we only have aarch64 version function type.
> > > > > > 
> > > > > > I think the best way to do is create a x64 version codeGenModule and use the x64 CGM to generate the function type for entry thunk. But it is hard for me to do here. I tried a little but a lot of issues happen.
> > > > > > 
> > > > > > One other way is only modify `AArch64ABIInfo::classifyArgumentType`, copy the x64 code into the function and add a flag to determine which version will the function use. It is easier but I'm not sure it is the only difference between x64 and aarch64. Maybe the classify return also need to do this. And it is not a clean way I think.
> > > > > Oh, that's annoying... I hadn't considered the case of a struct of size 3/5/6/7.
> > > > > 
> > > > > Like I noted on D126811, attaching thunks to calls is tricky if we try to do it from clang.
> > > > > 
> > > > > Computing the right IR type shouldn't be that hard by itself; we can call into call lowering code in TargetInfo without modifying much else.  (We just need a bit to tell the TargetInfo to redirect the call, like D125419.  Use an entry point like CodeGenTypes::arrangeCall.)  You don't need to mess with the type system or anything like that.
> > > > > 
> > > > > The problem is correctly representing the lowered call in IR; we really don't want to do lowering early because it will block optimizations.  I considered using an operand bundle; we can probably make that work, but it's complicated, and probably disables some optimizations.
> > > > > 
> > > > > I think the best thing we can do here is add an IR attribute to mark arguments which are passed directly on AArch64, but need to be passed indirectly for the x64 ABI.  Then AArch64Arm64ECCallLowering can check for the attribute and modify its behavior.  This isn't really clean in the sense that it's specific to the x64/aarch64 pair of calling conventions, but I think the alternative is worse.
> > > > It looks not only 3/5/6/7, but also all size exclusive larger than 8 and less than 16 are difference between x86 ABI and Aarch64 ABI.
> > > > Maybe we can emit a function declaration here for the x86ABI thunk, then define it in Arm64ECCallLowering.
> > > > 
> > > I think the sizes between 8 and 16 work correctly already?  All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.  But that's not really important for the general question.
> > > 
> > > We need to preserve the required semantics for both the AArch64 and x86 calling conventions.  There are basically the following possibilities:
> > > 
> > > - We compute the declaration of the thunk in the frontend, and attach it to the call with an operand bundle.  Like I mentioned, I don't want to go down this path: the operand bundle blocks optimizations, and it becomes more complicated for other code to generate arm64ec compatible calls.
> > > - We don't compute the definition of the thunk in the frontend.  Given that, the only other way to attach the information we need to the call is to use attributes.  The simplest thing is probably to attach the attribute directly to the argument; name it "arm64ec-thunk-pass-indirect", or something like that.  (I mean, we could compute the whole signature and stuff it into a string attribute, but that doesn't really seem like an improvement...)
> > > I think the sizes between 8 and 16 work correctly already? All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.
> > Yeah, current code for exit thunk already account for that. I mean we need to mark the parameter because entry thunk behavior is also different.
> >  
> > Maybe we can compute the mangle name like `$iexit_thunk$cdecl$i8$m6` or `$ientry_thunk$cdecl$m16$f` for the thunk function. Then set attributes like
> > ```
> > "arm64ec-exitthunk"="$iexit_thunk$cdecl$i8$m6"
> > "arm64ec-entrythunk"="$ientry_thunk$cdecl$m16$f"
> > ```
> > to the function.
> > Based on the mangle name we can restore the whole thunk I think. This should be a little easier.
> Each function has an arm64 function signature, and a corresponding x64 signature.  The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature.  That part is the same whether we're generating an entry thunk, or an exit thunk.  So I'm not sure why you're distinguishing between them in this context.
> 
> I'm not sure it makes sense to force the frontend to generate the mangled form, then make the backend demangle it.  Seems more straightforward to just attach an attribute to an argument, and make the backend generate the mangled form?
> Each function has an arm64 function signature, and a corresponding x64 signature. The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature. That part is the same whether we're generating an entry thunk, or an exit thunk. So I'm not sure why you're distinguishing between them in this context.

I mean that the set of arguments that need to be marked differs between the entry thunk and the exit thunk.
Both the entry thunk and the exit thunk need arguments of size 3/5/6/7 to be marked.
But when the size is larger than 8 and less than 16, the entry thunk still needs the mark while the exit thunk doesn't.
So if we attach an attribute to an argument, we also need to handle sizes larger than 8 and less than 16.
When a function has a 15-byte argument, the frontend coerces it to i64x2. If we don't attach an attribute to it, the backend can't generate the correct entry thunk, because the real size of the argument has already been lost. The exit thunk doesn't need the attribute, because the code for 15 bytes and 16 bytes is the same: store the i64x2 to memory, then pass the address.
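
To make the size ranges concrete, here is a rough sketch of the rule as I understand it. The function names are hypothetical and are not existing clang or LLVM APIs; the cutoffs are only the ones discussed in this thread (x64 passes sizes 1, 2, 4 and 8 directly, and sizes from 16 up need no mark).

```cpp
#include <cstdint>

// x64 passes an aggregate directly only when its size is 1, 2, 4 or 8 bytes.
constexpr bool x64PassesDirectly(std::uint64_t size) {
  return size == 1 || size == 2 || size == 4 || size == 8;
}

// Hypothetical helper: the exit thunk only needs a mark for the small odd
// sizes (3/5/6/7). Anything above 8 is stored to memory and passed by
// address anyway, so 15 and 16 bytes generate the same code.
constexpr bool exitThunkNeedsMark(std::uint64_t size) {
  return size != 0 && size < 8 && !x64PassesDirectly(size);
}

// Hypothetical helper: the entry thunk additionally needs a mark for sizes
// 9..15, because the coerced IR type no longer carries the real size.
constexpr bool entryThunkNeedsMark(std::uint64_t size) {
  return exitThunkNeedsMark(size) || (size > 8 && size < 16);
}

static_assert(entryThunkNeedsMark(15) && !exitThunkNeedsMark(15),
              "15 bytes: only the entry thunk needs the mark");
```

If one attribute on the argument has to serve both thunks, it would have to cover the larger entry-thunk set.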

This is part of the entry thunk for a 15-byte argument:

```
	mov         fp,sp
	mov         x10,x1
	ldr         w1,[x10,#8]
	mov         x19,x0
	ldur        w8,[x10,#0xB]
	ldr         x0,[x10]
	bfi         x1,x8,#0x18,#0x20
	blr         x9
```
This is part of the entry thunk for a 16-byte argument:

```
	mov         fp,sp
	mov         x8,x1
	mov         x19,x0
	ldp         x0,x1,[x8]
	blr         x9
```
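
For reference, a small C++ model of what I read the 15-byte sequence as doing: it builds the second register from two overlapping 4-byte loads (offsets 8 and 11) merged with `bfi`, so it never reads byte 15, while the 16-byte version can simply `ldp` two full 8-byte words. The helper names are made up for illustration, and the model assumes little-endian loads.

```cpp
#include <cstdint>

// Little-endian 32-bit load, written byte by byte so it is constexpr.
constexpr std::uint32_t load32le(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Models the computation of x1 in the 15-byte thunk (bytes 8..14).
constexpr std::uint64_t loadHigh7Of15(const unsigned char *p) {
  return (static_cast<std::uint64_t>(load32le(p + 8))      // ldr  w1,[x10,#8]
          & ~(0xFFFFFFFFull << 24))                        // bits cleared by bfi
         | (static_cast<std::uint64_t>(load32le(p + 11))   // ldur w8,[x10,#0xB]
            << 24);                                        // bfi  x1,x8,#0x18,#0x20
}

// Sample 15-byte object holding the byte values 1..15.
static constexpr unsigned char kSample[15] = {1, 2,  3,  4,  5,  6,  7, 8,
                                              9, 10, 11, 12, 13, 14, 15};
static_assert(loadHigh7Of15(kSample) == 0x000F0E0D0C0B0A09ull,
              "x1 holds bytes 8..14, top byte zero");
```

This is why the backend needs the real size: from the coerced type alone it cannot tell whether the overlapping loads or the plain `ldp` is required.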

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125418/new/

https://reviews.llvm.org/D125418