[PATCH] D148188: [llvm-profdata] Make profile reader behavior consistent when encountering malformed profiles

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 27 18:38:06 PDT 2023


wenlei added inline comments.


================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:557-558
     Data = MD5NameMemStart + ((*Idx) * sizeof(uint64_t));
+    End = reinterpret_cast<const uint8_t *>(
+        std::numeric_limits<uintptr_t>::max());
     auto FID = readUnencodedNumber<uint64_t>();
----------------
huangjd wrote:
> wenlei wrote:
> > What is the bogus End used for? Defend against call site that didn't check error and tries to continue reading? 
> Even though the section header of the name table should appear before the section header of the function offset table or profiles, the actual section data can be placed in arbitrary order. If the name table section is placed after the profile data section, when a string is read, End still points to the end of the profile data section which is before Data here, and readUnencodedNumber fails even if the name table index is valid. 
> 
> Since we already checked a fixed length MD5 name table contains actually enough entries when reading the name table, we don't need to check for bounds here again, so setting End to max int so that readUnencodedNumber never fails bounds check 
I think it's better to keep the invariant that `End >= Data` everywhere instead trying to get around it. To me, the problem here is the caller of `readUnencodedNumber` only set `Data`, but not `End` which leave the two in an inconsistent state. Can we also set/save/restore `End` there, and avoid the trick here to bypass the check? 

```
    const uint8_t *SavedData = Data;
    Data = MD5NameMemStart + ((*Idx) * sizeof(uint64_t));
    auto FID = readUnencodedNumber<uint64_t>();
```


================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:743-744
-    bool UseMD5 = hasSecFlag(Entry, SecNameTableFlags::SecFlagMD5Name);
-    assert((!FixedLengthMD5 || UseMD5) &&
-           "If FixedLengthMD5 is true, UseMD5 has to be true");
     FunctionSamples::HasUniqSuffix =
----------------
huangjd wrote:
> wenlei wrote:
> > Why is assertion removed? Is it no longer the case? 
> In release mode assert is ignored, and test case 2 causes a bug with a flag of FixedLengthMD5 & ~UseMD5 . I fixed the logic so that even in this case the logic is correct, so the check is no longer needed. 
Perhaps we should fix the logic/testcase that produced the situation `FixedLengthMD5` is true while `UseMD5` is false, instead of removing the assert and tolerate such cases.

In this function below, it also assumed `IsMD5` needs to be true if `FixedLengthMD5`. I think it's reasonable to keep the assertion. 
```
SampleProfileReaderExtBinaryBase::readNameTableSec(bool IsMD5,
                                                   bool FixedLengthMD5)
  if (!IsMD5)
    return SampleProfileReaderBinary::readNameTable();
...
  if (FixedLengthMD5) 

```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148188/new/

https://reviews.llvm.org/D148188



More information about the llvm-commits mailing list