[PATCH] D93267: [lld-macho] Handle overflow beyond the 127 common encodings limit
Jez Ng via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 16 19:19:26 PST 2020
int3 added inline comments.
================
Comment at: lld/MachO/UnwindInfoSection.cpp:142-146
if (a.second == b.second)
// When frequencies match, secondarily sort on encoding
// to maintain parity with validate-unwind-info.py
return a.first > b.first;
return a.second > b.second;
----------------
I know this is existing code, but isn't this the default behavior of `std::pair::operator>()`?
================
Comment at: lld/MachO/UnwindInfoSection.cpp:149
- // Split folded encodings into pages, limited by capacity of a page
- // and the 24-bit range of function offset
- //
- // Record the page splits as a vector of iterators on cuPtrVector
- // such that successive elements form a semi-open interval. E.g.,
- // page X's bounds are thus: [ pageBounds[X] .. pageBounds[X+1] )
- //
- // Note that pageBounds.size() is one greater than the number of
- // pages, and pageBounds.back() holds the sentinel cuPtrVector.cend()
- pageBounds.push_back(cuPtrVector.cbegin());
- // TODO(gkm): cut 1st page entries short to accommodate section headers ???
- CompactUnwindEntry64 cuEntryKey;
- for (size_t i = 0;;) {
- // Limit the search to entries that can fit within a 4 KiB page.
- const auto pageBegin = pageBounds[0] + i;
- const auto pageMax =
- pageBounds[0] +
- std::min(i + UNWIND_INFO_COMPRESSED_SECOND_LEVEL_ENTRIES_MAX,
- cuPtrVector.size());
- // Exclude entries with functionOffset that would overflow 24 bits
- cuEntryKey.functionAddress = (*pageBegin)->functionAddress +
- UNWIND_INFO_COMPRESSED_ENTRY_FUNC_OFFSET_MASK;
- const auto pageBreak = std::lower_bound(
- pageBegin, pageMax, &cuEntryKey,
- [](const CompactUnwindEntry64 *a, const CompactUnwindEntry64 *b) {
- return a->functionAddress < b->functionAddress;
- });
- pageBounds.push_back(pageBreak);
- if (pageBreak == cuPtrVector.cend())
- break;
- i = pageBreak - cuPtrVector.cbegin();
+ // Truncate the vector to 127 elements.
+ // Common encoding indexes are limited to 0..126, while enconding
----------------
depending on how large the vector is, might be a useful perf optimization to use quickselect to find the 127th largest element and just sort over those smaller than it... but we can consider that later
================
Comment at: lld/MachO/UnwindInfoSection.cpp:187
+ bytesRemaining -= sizeof(uint32_t);
+ } else if (bytesRemaining >= 2 * sizeof(uint32_t) &&
+ n < UNWIND_INFO_COMPACT_ENCODINGS_MAX) {
----------------
`2 * sizeof(uint32_t)` is due to the size of `unwind_info_regular_second_level_entry`, right? how about using `sizeof(unwind_info_regular_second_level_entry)` instead?
================
Comment at: lld/MachO/UnwindInfoSection.cpp:193-194
+ bytesRemaining -= 2 * sizeof(uint32_t);
+ } else
+ break;
+ }
----------------
let's use braces here too since the other if-else clauses have them
================
Comment at: lld/MachO/UnwindInfoSection.cpp:197
+ page.entryCount = i - page.entryIndex;
+ if (i < cuPtrVector.size() &&
+ page.entryCount < UNWIND_INFO_REGULAR_SECOND_LEVEL_ENTRIES_MAX) {
----------------
why does it matter if `i < cuPtrVector.size()`?
================
Comment at: lld/MachO/UnwindInfoSection.cpp:217
personalities.size() * sizeof(uint32_t) +
- pageBounds.size() * sizeof(unwind_info_section_header_index_entry) +
+ (secondLevelPages.size() + 1) *
+ sizeof(unwind_info_section_header_index_entry) +
----------------
why the `+ 1`?
================
Comment at: lld/MachO/UnwindInfoSection.cpp:255
auto *iep = reinterpret_cast<unwind_info_section_header_index_entry *>(i32p);
- for (size_t i = 0; i < pageBounds.size() - 1; i++) {
- iep->functionOffset = (*pageBounds[i])->functionAddress;
+ for (const auto &page : secondLevelPages) {
+ iep->functionOffset = cuPtrVector[page.entryIndex]->functionAddress;
----------------
avoid `auto` for types with simple names
================
Comment at: lld/MachO/UnwindInfoSection.cpp:279
+ auto *ep = reinterpret_cast<uint32_t *>(lep);
+ for (const auto &page : secondLevelPages) {
+ if (page.kind == UNWIND_SECOND_LEVEL_COMPRESSED) {
----------------
avoid `auto`
================
Comment at: lld/MachO/UnwindInfoSection.cpp:303-304
+ }
+ for (auto &encoding : page.localEncodings)
+ *ep++ = encoding;
+ } else {
----------------
could use `memcpy`
================
Comment at: lld/MachO/UnwindInfoSection.cpp:319
}
- p2p =
- reinterpret_cast<unwind_info_compressed_second_level_page_header *>(ep);
+ memset(ep, 0, page.padByteCount);
+ ep += (page.padByteCount / sizeof(uint32_t));
----------------
so I don't know if this is spec-guaranteed behavior, but there are already several places in LLD where we assume that an unwritten section of an `mmap`'ed buffer will be zero. I.e. we don't zero things out explicitly, and it seems to work in practice. I think doing that would allow us to eliminate `padByteCount`?
================
Comment at: lld/MachO/UnwindInfoSection.cpp:322-323
}
- assert(getSize() ==
- static_cast<size_t>((reinterpret_cast<uint8_t *>(p2p) - buf)));
+ // assert(getSize() ==
+ // static_cast<size_t>((reinterpret_cast<uint8_t *>(p2p) - buf)));
}
----------------
rm / uncomment?
================
Comment at: lld/MachO/UnwindInfoSection.h:17
#include "llvm/ADT/DenseMap.h"
+#include <unordered_map>
----------------
doesn't seem like `unordered_map` is actually used
================
Comment at: lld/MachO/UnwindInfoSection.h:42-47
+// This is a wrapper class that is otherwise like size_t, but has a
+// default constructor that yields ~0 rather than 0. I need it so that
+// lookup() of absent keys from DenseMap<..., size_tx> return ~0,
+// since 0 is a valid value.
+// TODO: If someone knows nicer way to accomplish this, please advise.
+// A simpler alternative: bias all values by 1.
----------------
why not use `find()` instead of `lookup()`?
================
Comment at: lld/MachO/UnwindInfoSection.h:97-112
#define UNWIND_INFO_COMMON_ENCODINGS_MAX 127
+#define UNWIND_INFO_COMPACT_ENCODINGS_MAX 256
#define UNWIND_INFO_SECOND_LEVEL_PAGE_SIZE 4096
#define UNWIND_INFO_REGULAR_SECOND_LEVEL_ENTRIES_MAX \
((UNWIND_INFO_SECOND_LEVEL_PAGE_SIZE - \
sizeof(unwind_info_regular_second_level_page_header)) / \
----------------
general aesthetic suggestion: if we moved this into the .cpp file, we could do away with all the `UNWIND_INFO_` prefixes and reduce some of the (extensive) line wrapping. Probably better in a separate diff though
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93267/new/
https://reviews.llvm.org/D93267
More information about the llvm-commits
mailing list