[PATCH] D125418: [Arm64EC 6/?] Implement C/C++ mangling for Arm64EC function definitions.

Tue Jul 19 10:31:20 PDT 2022

efriedma added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:5128
+    // to the function itself; it points to a stub for the compiler.
+    // FIXME: We also need to emit an entry thunk.
+    SmallString<256> MangledName;
----------------
bcl5980 wrote:
> This part is a headache.
> To define the entry thunk, we need the function definition for the x64 triple, but the definition available here is the aarch64 version.
> For example, take the case from the Microsoft doc "Understanding Arm64EC ABI and assembly code":
> 
> ```
> struct SC {
>     char a;
>     char b;
>     char c;
> };
> int fB(int a, double b, int i1, int i2, int i3);
> int fC(int a, struct SC c, int i1, int i2, int i3);
> int fA(int a, double b, struct SC c, int i1, int i2, int i3) {
>     return fB(a, b, i1, i2, i3) + fC(a, c, i1, i2, i3);
> }
> ```
> 
> The x64 IR for fA is:
> ```
> define dso_local i32 @fA(i32 noundef %a, double noundef %b, ptr nocapture noundef readonly %c, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) local_unnamed_addr #0 { ... }
> ```
> The aarch64 IR for fA is:
> 
> ```
> define dso_local i32 @"#fA"(i32 noundef %a, double noundef %b, i64 %c.coerce, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) #0 {...}
> ```
> Arm64 allows a structure of any size to be assigned to a register directly, while x64 only allows sizes 1, 2, 4 and 8.
> The entry thunk follows the x64 function type, but we only have the aarch64 function type.
> 
> I think the best approach is to create an x64 CodeGenModule and use it to generate the function type for the entry thunk. But that is hard for me to do here; I tried briefly and ran into a lot of issues.
> 
> Another way is to modify only `AArch64ABIInfo::classifyArgumentType`: copy the x64 code into the function and add a flag that selects which version the function uses. That is easier, but I'm not sure it is the only difference between x64 and aarch64; the return classification may need the same treatment. I also don't think it is a clean approach.
Oh, that's annoying... I hadn't considered the case of a struct of size 3/5/6/7.
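
To make the problematic sizes concrete, here is a minimal standalone sketch of the size rule quoted above. The helper names are hypothetical and are not part of clang; the sketch only covers structs of up to 8 bytes, which is the case under discussion, and it assumes the x64 rule exactly as stated in the quoted comment.

```cpp
#include <cstddef>

// Hypothetical helper (not a clang API): models the x64 rule quoted above,
// where a struct is passed by value only if its size is 1, 2, 4 or 8 bytes.
constexpr bool x64PassesDirectly(std::size_t SizeInBytes) {
  return SizeInBytes == 1 || SizeInBytes == 2 || SizeInBytes == 4 ||
         SizeInBytes == 8;
}

// Hypothetical helper: true for small structs (at most 8 bytes) that the
// aarch64 signature coerces to an integer while the x64 signature takes a
// pointer. Larger structs are deliberately out of scope for this sketch.
constexpr bool needsIndirectOnX64Only(std::size_t SizeInBytes) {
  return SizeInBytes >= 1 && SizeInBytes <= 8 &&
         !x64PassesDirectly(SizeInBytes);
}

// The struct from the quoted example: three chars, so three bytes.
struct SC {
  char a;
  char b;
  char c;
};

static_assert(sizeof(SC) == 3, "SC is a 3-byte struct");
static_assert(needsIndirectOnX64Only(sizeof(SC)),
              "SC is i64-coerced on aarch64 but passed by pointer on x64");
```

Under these assumptions, the mismatching sizes are exactly 3, 5, 6 and 7, which is why `%c.coerce` is an `i64` in the aarch64 IR while the x64 IR takes a `ptr`.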

Like I noted on D126811, attaching thunks to calls is tricky if we try to do it from clang.

Computing the right IR type shouldn't be that hard by itself; we can call into call lowering code in TargetInfo without modifying much else.  (We just need a bit to tell the TargetInfo to redirect the call, like D125419.  Use an entry point like CodeGenTypes::arrangeCall.)  You don't need to mess with the type system or anything like that.

The problem is correctly representing the lowered call in IR; we really don't want to do lowering early because it will block optimizations.  I considered using an operand bundle; we can probably make that work, but it's complicated, and probably disables some optimizations.

I think the best thing we can do here is add an IR attribute to mark arguments which are passed directly on AArch64, but need to be passed indirectly for the x64 ABI.  Then AArch64Arm64ECCallLowering can check for the attribute and modify its behavior.  This isn't really clean in the sense that it's specific to the x64/aarch64 pair of calling conventions, but I think the alternative is worse.
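
As a rough illustration of that split of responsibilities, here is a toy model that does not use any LLVM classes. Every name in it is hypothetical: `X64Indirect` stands in for the proposed IR attribute (which has no name yet), `annotateStructArg` stands in for the frontend, and `lowerForThunk` stands in for the check that AArch64Arm64ECCallLowering would perform. It assumes the x64 size rule from the quoted comment.

```cpp
#include <cstddef>

// Toy model only; none of these types exist in LLVM.
enum class ThunkArgKind { Direct, Indirect };

struct ArgInfo {
  std::size_t SizeInBytes;
  // Stand-in for the proposed IR attribute: the argument is passed directly
  // on aarch64 but must be passed indirectly for the x64 ABI.
  bool X64Indirect;
};

// "Frontend" side (hypothetical): knows the x64 ABI rule and records the
// result as a mark on the argument instead of lowering the call early.
constexpr ArgInfo annotateStructArg(std::size_t SizeInBytes) {
  return ArgInfo{SizeInBytes, !(SizeInBytes == 1 || SizeInBytes == 2 ||
                                SizeInBytes == 4 || SizeInBytes == 8)};
}

// "Backend" side (hypothetical): the thunk lowering never re-derives the
// x64 ABI rule; it only consults the mark left by the frontend.
constexpr ThunkArgKind lowerForThunk(ArgInfo Arg) {
  return Arg.X64Indirect ? ThunkArgKind::Indirect : ThunkArgKind::Direct;
}

static_assert(lowerForThunk(annotateStructArg(3)) == ThunkArgKind::Indirect,
              "a 3-byte struct is passed by pointer in the x64 thunk");
static_assert(lowerForThunk(annotateStructArg(8)) == ThunkArgKind::Direct,
              "an 8-byte struct stays direct in the x64 thunk");
```

The point of the design is that the aarch64 signature stays unlowered for the optimizer, and the x64-specific knowledge travels as a single bit per argument.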

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D125418