[PATCH] D125418: [Arm64EC 6/?] Implement C/C++ mangling for Arm64EC function definitions.

Thu Aug 11 12:41:45 PDT 2022

efriedma added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:5128
+    // to the function itself; it points to a stub for the compiler.
+    // FIXME: We also need to emit an entry thunk.
+    SmallString<256> MangledName;
----------------
bcl5980 wrote:
> efriedma wrote:
> > bcl5980 wrote:
> > > efriedma wrote:
> > > > bcl5980 wrote:
> > > > > A headache thing here.
> > > > > We need to get the function definition with triple x64 to define entry thunk. For now the function definition here is aarch64 version.
> > > > > For example the case in Microsoft doc "Understanding Arm64EC ABI and assembly code":
> > > > > 
> > > > > ```
> > > > > struct SC {
> > > > >     char a;
> > > > >     char b;
> > > > >     char c;
> > > > > };
> > > > > int fB(int a, double b, int i1, int i2, int i3);
> > > > > int fC(int a, struct SC c, int i1, int i2, int i3);
> > > > > int fA(int a, double b, struct SC c, int i1, int i2, int i3) {
> > > > >     return fB(a, b, i1, i2, i3) + fC(a, c, i1, i2, i3);
> > > > > }
> > > > > ```
> > > > > 
> > > > > x64 version IR for fA is:
> > > > > ```
> > > > > define dso_local i32 @fA(i32 noundef %a, double noundef %b, ptr nocapture noundef readonly %c, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) local_unnamed_addr #0 { ... }
> > > > > ```
> > > > > aarch64 version IR for fA is:
> > > > > 
> > > > > ```
> > > > > define dso_local i32 @"#fA"(i32 noundef %a, double noundef %b, i64 %c.coerce, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) #0 {...}
> > > > > ```
> > > > > Arm64 will allow any size structure to be assigned to a register directly. x64 only allows sizes 1, 2, 4 and 8. 
> > > > > Entry thunk follow x64 version function type. But we only have aarch64 version function type.
> > > > > 
> > > > > I think the best way to do is create a x64 version codeGenModule and use the x64 CGM to generate the function type for entry thunk. But it is hard for me to do here. I tried a little but a lot of issues happen.
> > > > > 
> > > > > One other way is only modify `AArch64ABIInfo::classifyArgumentType`, copy the x64 code into the function and add a flag to determine which version will the function use. It is easier but I'm not sure it is the only difference between x64 and aarch64. Maybe the classify return also need to do this. And it is not a clean way I think.
> > > > Oh, that's annoying... I hadn't considered the case of a struct of size 3/5/6/7.
> > > > 
> > > > Like I noted on D126811, attaching thunks to calls is tricky if we try to do it from clang.
> > > > 
> > > > Computing the right IR type shouldn't be that hard by itself; we can call into call lowering code in TargetInfo without modifying much else.  (We just need a bit to tell the TargetInfo to redirect the call, like D125419.  Use an entry point like CodeGenTypes::arrangeCall.)  You don't need to mess with the type system or anything like that.
> > > > 
> > > > The problem is correctly representing the lowered call in IR; we really don't want to do lowering early because it will block optimizations.  I considered using an operand bundle; we can probably make that work, but it's complicated, and probably disables some optimizations.
> > > > 
> > > > I think the best thing we can do here is add an IR attribute to mark arguments which are passed directly on AArch64, but need to be passed indirectly for the x64 ABI.  Then AArch64Arm64ECCallLowering can check for the attribute and modify its behavior.  This isn't really clean in the sense that it's specific to the x64/aarch64 pair of calling conventions, but I think the alternative is worse.
> > > It looks not only 3/5/6/7, but also all size exclusive larger than 8 and less than 16 are difference between x86 ABI and Aarch64 ABI.
> > > Maybe we can emit a function declaration here for the x86ABI thunk, then define it in Arm64ECCallLowering.
> > > 
> > I think the sizes between 8 and 16 work correctly already?  All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.  But that's not really important for the general question.
> > 
> > We need to preserve the required semantics for both the AArch64 and x86 calling conventions.  There are basically the following possibilities:
> > 
> > - We compute the declaration of the thunk in the frontend, and attach it to the call with an operand bundle.  Like I mentioned, I don't want to go down this path: the operand bundle blocks optimizations, and it becomes more complicated for other code to generate arm64ec compatible calls.
> > - We don't compute the definition of the thunk in the frontend.  Given that, the only other way to attach the information we need to the call is to use attributes.  The simplest thing is probably to attach the attribute directly to the argument; name it "arm64ec-thunk-pass-indirect", or something like that.  (I mean, we could compute the whole signature and stuff it into a string attribute, but that doesn't really seem like an improvement...)
> > I think the sizes between 8 and 16 work correctly already? All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.
> Yeah, current code for exit thunk already account for that. I mean we need to mark the parameter because entry thunk behavior is also different.
>  
> Maybe we can compute the mangle name like `$iexit_thunk$cdecl$i8$m6` or `$ientry_thunk$cdecl$m16$f` for the thunk function. Then set attributes like
> ```
> "arm64ec-exitthunk"="$iexit_thunk$cdecl$i8$m6"
> "arm64ec-entrythunk"="$ientry_thunk$cdecl$m16$f"
> ```
> to the function.
> Based on the mangle name we can restore the whole thunk I think. This should be a little easier.
Each function has an arm64 function signature, and a corresponding x64 signature.  The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature.  That part is the same whether we're generating an entry thunk, or an exit thunk.  So I'm not sure why you're distinguishing between them in this context.

I'm not sure it makes sense to force the frontend to generate the mangled form, then make the backend demangle it.  Seems more straightforward to just attach an attribute to an argument, and make the backend generate the mangled form?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125418/new/

https://reviews.llvm.org/D125418