[PATCH] D131309: [ELF] Add ability to get a symbol by name from the hash table section

Fri Aug 19 00:16:32 PDT 2022

jhenderson added a comment.

Mostly just nits remaining from me, plus a couple of "normal behaviour" test cases I missed earlier.

FWIW, I think implementing SHT_HASH support first, as this patch does, should definitely happen. This is because the SHT_HASH is the thing that is in the gABI, and not SHT_GNU_HASH, however widely the GNU version is supported. Last I checked my company's system didn't use GNU hash tables yet, for example. I wouldn't be opposed to follow-up work for the GNU hash table though.

Presumably LLD has the GNU hash alogrithm in its source code, since it can generate such tables? Could that code be moved to libObject if needed?

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:835-840
+  - Name:            y
+    Index:           SHN_ABS
+    Value:           0x1
+  - Name:            x
+    Index:           SHN_ABS
+    Value:           0x2
----------------
If I follow this correctly, y is in the same bucket and ahead of x, such that we see y, check its name, and then move onto x. Assuming that is right, it might be worth a brief comment stating why we have y (I think we should have it FWIW, to show that the bucket/chain logic is correct).

We definitely also want a test case where the symbol is not found. I think it would also be worth a test case involving multiple buckets, to show that we calculate the bucket correctly (this test case only currently has one bucket, so there's no risk that the wrong bucket would be selected).

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:923
+    Bucket:          [ 1 ]
+    Chain:           [ 0, 0 ]
+  - Name:            .dynsym
----------------
I believe nchain is supposed to always match the size of the .dynsym. Not sure that it matters that much, but might be less confusing/risk of causing problems if it did in these test cases.

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:1064
+
+// Test for getSymbolFromHashTable: check that it fails with an invalid table.
+TEST(ELFObjectFileTest, SymbolFromHashTableTestName) {
----------------
I'm not sure "an invalid table" is clear as to the meaning. I'd go with "an invalid symbol name offset."

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:1106
+
+// Test for getSymbolFromHashTable: check that it fails with an invalid sh_link.
+TEST(ELFObjectFileTest, SymbolFromHashTableTestLink) {
----------------
Probably change this comment to "check that it fails when not referencing a dynamic symbol table" or something to that effect. "invalid sh_link" sounds like the value is outside the range of the section indexes, which is what the next case does.

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:1164-1165
+
+// Test for getSymbolFromHashTable: check that it fails given invalid section
+// header offsets.
+TEST(ELFObjectFileTest, SymbolFromHashTableTestHeaderOffset) {
----------------

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:916
+  Expected<ELFObjectFile<ELF64LE>> ElfOrErr = toBinary<ELF64LE>(Storage, R"(
+--- !ELF
+FileHeader:
----------------
MaskRay wrote:
> jhenderson wrote:
> > jhuber6 wrote:
> > > jhenderson wrote:
> > > > You might be able to avoid duplicating this string between some of the test cases, by having a factory function that constructs the string by having a prefix (which ends at the end of the section you want to modify) and a suffix (which starts immediately after it), and then an optional field that can be inserted in the middle. Essentially you could have lines like the following then in the different tests:
> > > > ```
> > > > makeYAMLWithInvalidDynsym("ShOffset: 9999");
> > > > makeYAMLWithInvalidHash("ShSize: 1");
> > > > ```
> > > > or perhaps:
> > > > ```
> > > > makeYAML(DynsymPrefix, "ShOffset: 9999", DynsymSuffix);
> > > > makeYAML(HashPrefix, "ShSize: 1", HashSuffix);
> > > > ```
> > > > Potentially, you could have that function construct the ELF and return the hash table too, to reduce the boiler plate.
> > > The solution I had previously that modified the ELF in-place was more compact because I could iteratively make different field invalid using the same input. I could go back to that method, otherwise I think it's too much work to make some factory generator just for saving a few string constants in a test.
> > Yes, I see your point. I do have a strong aversion to const_cast, but I don't really have much justification to it in this particular context. I see there is limited usage within LLVM unit tests of it too. I'd like a second opinion from a regular contributor in this area:
> > 
> > @MaskRay, do you have any thoughts?
> I prefer avoiding `const_cast`, but for this case it's fine if an alternative needs too much boilerplate.
> 
> `SmallString<0> Storage;` and the `toBinary<ELF64LE>(Storage, R"(` use make it pretty clear the underlying object is actually writable.
Okay, thanks @MaskRay. @jhuber6, if you would prefer to revert to const_cast style, I'm not going to oppose it, although I'd ask that you keep the test cases separate still, with a helper function for the boiler plate to set up the ELF. The reasoning for this is mostly to keep the errors isolated (i.e. "the symbol would be found/nullptr returned except for this one thing").

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131309/new/

https://reviews.llvm.org/D131309