[PATCH] D125418: [Arm64EC 6/?] Implement C/C++ mangling for Arm64EC function definitions.

Wed Aug 10 13:07:27 PDT 2022

efriedma added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:5128
+    // to the function itself; it points to a stub for the compiler.
+    // FIXME: We also need to emit an entry thunk.
+    SmallString<256> MangledName;
----------------
bcl5980 wrote:
> efriedma wrote:
> > bcl5980 wrote:
> > > A headache thing here.
> > > We need to get the function definition with triple x64 to define entry thunk. For now the function definition here is aarch64 version.
> > > For example the case in Microsoft doc "Understanding Arm64EC ABI and assembly code":
> > > 
> > > ```
> > > struct SC {
> > >     char a;
> > >     char b;
> > >     char c;
> > > };
> > > int fB(int a, double b, int i1, int i2, int i3);
> > > int fC(int a, struct SC c, int i1, int i2, int i3);
> > > int fA(int a, double b, struct SC c, int i1, int i2, int i3) {
> > >     return fB(a, b, i1, i2, i3) + fC(a, c, i1, i2, i3);
> > > }
> > > ```
> > > 
> > > x64 version IR for fA is:
> > > ```
> > > define dso_local i32 @fA(i32 noundef %a, double noundef %b, ptr nocapture noundef readonly %c, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) local_unnamed_addr #0 { ... }
> > > ```
> > > aarch64 version IR for fA is:
> > > 
> > > ```
> > > define dso_local i32 @"#fA"(i32 noundef %a, double noundef %b, i64 %c.coerce, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) #0 {...}
> > > ```
> > > Arm64 will allow any size structure to be assigned to a register directly. x64 only allows sizes 1, 2, 4 and 8. 
> > > Entry thunk follow x64 version function type. But we only have aarch64 version function type.
> > > 
> > > I think the best way to do is create a x64 version codeGenModule and use the x64 CGM to generate the function type for entry thunk. But it is hard for me to do here. I tried a little but a lot of issues happen.
> > > 
> > > One other way is only modify `AArch64ABIInfo::classifyArgumentType`, copy the x64 code into the function and add a flag to determine which version will the function use. It is easier but I'm not sure it is the only difference between x64 and aarch64. Maybe the classify return also need to do this. And it is not a clean way I think.
> > Oh, that's annoying... I hadn't considered the case of a struct of size 3/5/6/7.
> > 
> > Like I noted on D126811, attaching thunks to calls is tricky if we try to do it from clang.
> > 
> > Computing the right IR type shouldn't be that hard by itself; we can call into call lowering code in TargetInfo without modifying much else.  (We just need a bit to tell the TargetInfo to redirect the call, like D125419.  Use an entry point like CodeGenTypes::arrangeCall.)  You don't need to mess with the type system or anything like that.
> > 
> > The problem is correctly representing the lowered call in IR; we really don't want to do lowering early because it will block optimizations.  I considered using an operand bundle; we can probably make that work, but it's complicated, and probably disables some optimizations.
> > 
> > I think the best thing we can do here is add an IR attribute to mark arguments which are passed directly on AArch64, but need to be passed indirectly for the x64 ABI.  Then AArch64Arm64ECCallLowering can check for the attribute and modify its behavior.  This isn't really clean in the sense that it's specific to the x64/aarch64 pair of calling conventions, but I think the alternative is worse.
> It looks not only 3/5/6/7, but also all size exclusive larger than 8 and less than 16 are difference between x86 ABI and Aarch64 ABI.
> Maybe we can emit a function declaration here for the x86ABI thunk, then define it in Arm64ECCallLowering.
> 
I think the sizes between 8 and 16 work correctly already?  All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.  But that's not really important for the general question.

We need to preserve the required semantics for both the AArch64 and x86 calling conventions.  There are basically the following possibilities:

- We compute the declaration of the thunk in the frontend, and attach it to the call with an operand bundle.  Like I mentioned, I don't want to go down this path: the operand bundle blocks optimizations, and it becomes more complicated for other code to generate arm64ec compatible calls.
- We don't compute the definition of the thunk in the frontend.  Given that, the only other way to attach the information we need to the call is to use attributes.  The simplest thing is probably to attach the attribute directly to the argument; name it "arm64ec-thunk-pass-indirect", or something like that.  (I mean, we could compute the whole signature and stuff it into a string attribute, but that doesn't really seem like an improvement...)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125418/new/

https://reviews.llvm.org/D125418