[PATCH] D131309: [ELF] Add ability to get a symbol by name from the hash table section

Thu Aug 18 00:48:48 PDT 2022

jhenderson added inline comments.

================
Comment at: llvm/include/llvm/Object/ELF.h:506
+  if (Sec.sh_type != ELF::SHT_HASH)
+    return createError("invalid sh_type for hash table, expected SHT_HASH");
+  Expected<Elf_Shdr_Range> SectionsOrError = sections();
----------------
jhenderson wrote:
> I'm looking at this error and wondering if it would be useful to print the actual type (possibly just as a number), so that if a user encounters this area, it might be slightly easier to find the problem. Up to you whether you think it's useful or not though.
Just pinging this point in case you missed it, as you've not responded one way or another (even with just a "no I don't think it's useful).

================
Comment at: llvm/include/llvm/Object/ELF.h:527-528
+                       " has invalid sh_offset: " + Twine(Sec.sh_offset));
+  if (Sec.sh_size < sizeof(Elf_Hash) + sizeof(Elf_Word) * HashTab->nbucket +
+                        sizeof(Elf_Word) * HashTab->nchain)
+    return createError("section " + getSecIndexForError(*this, Sec) +
----------------
jhenderson wrote:
> If HashTab's size is 0, then you can't read the nbucket or nchain field, so this will result in reading outside the section. You'll need to split this into a check to make sure it's big enough for the nchain/nbucket fields, and then a separate one to make sure the arrays fit.
Thanks for the fix, although there are now two failure cases here, both of which need separate testing (i.e. one with a very small/nearly empty SHT_HASH and another where it has enough space for the header fields, but not the full bucket array).

================
Comment at: llvm/include/llvm/Object/ELF.h:505-516
+  if (Sec.sh_type != ELF::SHT_HASH)
+    return createError("invalid sh_type for hash table, expected SHT_HASH");
+  Expected<Elf_Shdr_Range> SectionsOrError = sections();
+  if (!SectionsOrError)
+    return SectionsOrError.takeError();
+
+  auto SymTabOrErr = object::getSection<ELFT>(*SectionsOrError, Sec.sh_link);
----------------
jhuber6 wrote:
> jhenderson wrote:
> > jhuber6 wrote:
> > > jhenderson wrote:
> > > > All these checks still need test cases.
> > > Checking these errors explicitly is excessive. This isn't any new logic so it should already be covered by existing tests. I added one for calling this with an incorrect section but the other error paths are already covered as far as I'm aware.
> > I'm not suggesting the called code itself needs additional testing, merely testing that shows that if the called code reports an error, the error is correctly handled and returned up the call stack. The new if statements //are// new logic. As things stand, there is zero testing for those statements.
> I'll add a test for an invalid `st_link`. That should be good enough to assert that this part functions properly. I think it's safe to assert that getting a section from an index is sufficiently tested.
(I assume you meant sh_link). That will cover the `getSection` if statement, but it doesn't cover the `sections()` if statement or the `getStringTableForSymtab`: every conditional should have at least one test case that triggers both the true and false cases, or you haven't fully tested the if.

Putting it in other terms, if things were easier to mock, I'd suggest mocking out the return values of each of those functions, so that in one case they produced an error and in the other case they produced a valid value that can be used later on, and then having test cases that used both mocked versions. In practice, what you have to do instead is craft some invalid ELF to trigger the error. I don't care how you achieve that, as long as the respective functions trigger errors.

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:1080
+
+// Test for getSymbolFromHashTable: check that it fails with an invalid st_link.
+TEST(ELFObjectFileTest, SymbolFromHashTableTestLink) {
----------------
`sh_link` not `st_link`

================
Comment at: llvm/unittests/Object/ELFObjectFileTest.cpp:916
+  Expected<ELFObjectFile<ELF64LE>> ElfOrErr = toBinary<ELF64LE>(Storage, R"(
+--- !ELF
+FileHeader:
----------------
jhuber6 wrote:
> jhenderson wrote:
> > You might be able to avoid duplicating this string between some of the test cases, by having a factory function that constructs the string by having a prefix (which ends at the end of the section you want to modify) and a suffix (which starts immediately after it), and then an optional field that can be inserted in the middle. Essentially you could have lines like the following then in the different tests:
> > ```
> > makeYAMLWithInvalidDynsym("ShOffset: 9999");
> > makeYAMLWithInvalidHash("ShSize: 1");
> > ```
> > or perhaps:
> > ```
> > makeYAML(DynsymPrefix, "ShOffset: 9999", DynsymSuffix);
> > makeYAML(HashPrefix, "ShSize: 1", HashSuffix);
> > ```
> > Potentially, you could have that function construct the ELF and return the hash table too, to reduce the boiler plate.
> The solution I had previously that modified the ELF in-place was more compact because I could iteratively make different field invalid using the same input. I could go back to that method, otherwise I think it's too much work to make some factory generator just for saving a few string constants in a test.
Yes, I see your point. I do have a strong aversion to const_cast, but I don't really have much justification to it in this particular context. I see there is limited usage within LLVM unit tests of it too. I'd like a second opinion from a regular contributor in this area:

@MaskRay, do you have any thoughts?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131309/new/

https://reviews.llvm.org/D131309