[PATCH] D93362: [llvm-elfabi] Support ELF file that lacks .gnu.hash section

Wed Jan 13 22:15:44 PST 2021

haowei added a comment.

In D93362#2492168 <https://reviews.llvm.org/D93362#2492168>, @jhenderson wrote:

> In D93362#2492101 <https://reviews.llvm.org/D93362#2492101>, @haowei wrote:
>
>> I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.
>
> What do you mean it seems upset? The review here looks fine to me.

If I compare Diff 2 and Diff 3 in Phabricator, the result looks weird as rebase changes were included.

>> In D93362#2486135 <https://reviews.llvm.org/D93362#2486135>, @jhenderson wrote:
>>
>>> The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.
>>
>> Do you mean I should also modify llvm-readobj so it use my `getDynSymtabSize` function to determine `.dynsym` size? I assume it currently has its own way to determine the number of dynamic symbols.
>
> I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.

Sorry, I misunderstood your suggestion. I thought you suggest to use llvm-readelf to verify the dynsym sizes.

I added a test to use llvm-elfabi to read elf files under 3 different conditions:

- ELF contains section headers, .gnu.hash and .hash sections.
- ELF contains .gnu.hash and .hash sections, section headers are stripped.
- ELF contains .hash sections, section headers are stripped.

Please let me know if these tests are sufficient.

================
Comment at: llvm/include/llvm/Object/ELF.h:566
+  // Locate the end of the chain to find the last symbol index.
+  while ((*It & 1) == 0) {
+    LastSymIdx++;
----------------
grimar wrote:
> jhenderson wrote:
> > haowei wrote:
> > > jhenderson wrote:
> > > > Seems like this might enter an infinite loop, if there's a corrupt section without a 0 value at the end of the chain.
> > > You are right, but what would be the best way to prevent that? I thought about to determine the max bucket capacity by calculating the differences between indexes in the bucket. However, it is not guaranteed that hashes will evenly distributed in the buckets. It also won't work if this is only one bucket.
> > I know @grimar spent some time fixing loops in hash table code like this in llvm-readelf. Maybe he can provide some guidance there. If nothing else, we can at least make sure the hash table doesn't run off the end of the file.
> My understanding is that this loop might either run past the end of the section or past the end of the file. It seems that saying "infinite loop" is not really correct then: it should crash sooner ol later, or return a wrong result.
> 
> There are 2 cases:
> 1) When we have a section header table, we can get the size of the hash table from there.
> 2) When we don't have a section header table, we have no information about the hash table size and only can check that we don't go past the end of the file I believe.
> 
> Perhaps, `getDynSymtabSizeFromGnuHash` could also accept one more argument, like `void *End` or something alike,
> which will either point past the end of the section or past the end of the file. The code could check that no overrun happens then.
> 
> May be it worth just to pass both `Optional<uint64_t> SecSize` and `void *BufEnd` for better diagnostics.
I pass BufEnd into getDynSymtabSizeFromGnuHash function. If section headers are stripped, there seems to be no way to know the size limit of DynSym if the hash tables are corrupted. I think file's BufEnd might be the only choice.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93362/new/

https://reviews.llvm.org/D93362