[PATCH] D128108: [lld-macho] Add support for objc_msgSend stubs
Keith Smiley via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 8 17:51:54 PDT 2022
keith added inline comments.
================
Comment at: lld/MachO/Arch/ARM64.cpp:114-116
+ 0xd4200020, // brk #0
+ 0xd4200020, // brk #0
+ 0xd4200020, // brk #0
----------------
int3 wrote:
> What's the purpose of these `brk #0` instructions? Also, any idea why they are rendered as `brk #0x1` in the `arm64-objcstubs.s` test?
The comment here was outdated, fixed to show `0x1`. I'm not sure why there are extras here, maybe some alignment concern? I wasn't really sure if the right path is to copy ld64, or just ignore unless we hit an issue. I haven't tested without them.
================
Comment at: lld/MachO/Driver.cpp:1183
+ in.objcMethnameSection->addInput(isec);
+ in.objcMethnameSection->isec->markLive(0);
+}
----------------
int3 wrote:
> is this necessary? isn't marking the pieces as `live` above sufficient?
Without it I hit this assert https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/lld/MachO/InputSection.cpp#L221 when using -dead_strip, the entire section being alive or not doesn't include the strings being alive
================
Comment at: lld/MachO/SyntheticSections.cpp:731
+ std::string methname = sym->getName().drop_front(symbolPrefixLength).str();
+ methname.push_back('\0');
+ auto map = in.objcMethnameSection->getDuplicateStringOffsetMap();
----------------
int3 wrote:
> keith wrote:
> > int3 wrote:
> > > isn't the string already null-terminated? why do we have to add another `\0`?
> > I didn't dig into this, but this is how I got here:
> >
> > In `CStringSection`:
> >
> > ```
> > StringRef string = isec->getStringRef(i);
> > llvm::outs() << "'" << string << "'" << " len: " << string.size() << "\n";
> > ```
> >
> > This prints: `'foo' size 4`.
> >
> > In `ObjCStubsSection::addEntry`
> >
> > ```
> > llvm::outs() << "'" << sym->getName() << "' size " << sym->getName().size() << "\n";
> > std::string methname = sym->getName().drop_front(symbolPrefixLength).str();
> > ```
> >
> > This prints `'foo' size 3`.
> could you dig into this? I would really prefer we avoid copying the string if possible
I pushed a new option here which I believe avoids the copies. The core issue is that here https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/lld/MachO/InputSection.h#L206-L212 we manually construct a StringRef with the actual data, and with the length of the actual data + 1, since `pieces[i + 1].inSecOff` is after the null terminator. But any other StringRef initialization, like this one https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/lld/MachO/InputFiles.cpp#L943 goes through an initializer that computes the length, which naturally excludes the null terminator https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/llvm/include/llvm/ADT/StringRef.h#L83-L85
I'm now mirroring this StringRef initialization, but maybe we should correct that string length off by 1 instead?
================
Comment at: lld/test/MachO/x86_64-objcstubs.s:18
+.section __TEXT,__objc_methname,cstring_literals
+selref1:
+ .asciz "foo"
----------------
int3 wrote:
> just curious -- does `clang -S` normally generate regular symbols for strings in `__objc_methname`? AFAIK it usually uses private `LFoo` symbols for `__cstring`
Here's a snippet:
```
.section __TEXT,__objc_methname,cstring_literals
l_OBJC_METH_VAR_NAME_: ; @OBJC_METH_VAR_NAME_
.asciz "foo"
```
Then they reference this from `__objc_const`:
```
.section __DATA,__objc_const
.p2align 3 ; @"_OBJC_$_INSTANCE_METHODS_Foo"
__OBJC_$_INSTANCE_METHODS_Foo:
.long 24 ; 0x18
.long 3 ; 0x3
.quad l_OBJC_METH_VAR_NAME_
.quad l_OBJC_METH_VAR_TYPE_
.quad "-[Foo foo]"
```
I added a prefix to these to be more clear
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128108/new/
https://reviews.llvm.org/D128108
More information about the llvm-commits
mailing list