<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/84463>84463</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[analyzer] MemRegion::getDescriptiveName breaks for Index not of type nonloc::ConcreteInt
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
T-Gruber
</td>
</tr>
</table>
<pre>
I am currently using the Clang Static Analyzer to analyze small C code examples and have come across a behaviour of MemRegion::getDescriptiveName that I do not fully understand. I am working with the llvm-project 17.x release.
The simple checker below should be sufficient to demonstrate it. In the checkLocation callback, it extracts the MemRegion for the given SVal and prints the region's descriptive name.
```Cpp
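#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
namespace {
class TestChecker : public clang::ento::Checker<clang::ento::check::Location> {
public:
  // Called for every load from or store to a memory location.
  void checkLocation(const clang::ento::SVal &Loc, bool IsLoad,
                     const clang::Stmt *S,
                     clang::ento::CheckerContext &CC) const {
    clang::ento::ProgramStateRef PSR = CC.getState();
    const clang::ento::MemRegion *MemReg = Loc.getAsRegion();
    if (MemReg) {
      // Print the human-readable name of the accessed region.
      std::string MemRegName = MemReg->getDescriptiveName();
      llvm::outs() << "MemRegion: Name = " << MemRegName << "\n";
    }
  }
};
} // namespace
void clang::ento::registerTestChecker(clang::ento::CheckerManager &CM) {
  CM.registerChecker<TestChecker>();
}
bool clang::ento::shouldRegisterTestChecker(
    const clang::ento::CheckerManager &CM) {
  return true;
}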
```
This is the C code example to be analysed.
```C
unsigned int index1 = 1;
extern unsigned int index2;
extern int array[10];
int main() {
array[index1];
array[index2];
return 0;
}
```
The extracted MemRegions and their names for the first statement match my expectations:
```
MemRegion: Name = 'index1'
ElementRegion: ER = Element{array,1 S64b,int}
Index: ConcreteInt = 1
Next iteration: R = array
MemRegion: Name = 'array[1]'
```
Here is the relevant part of MemRegion::getDescriptiveName with a few additional outputs:
```Cpp
std::string MemRegion::getDescriptiveName(bool UseQuotes) const {
  std::string VariableName;
  std::string ArrayIndices;
  const MemRegion *R = this;
  SmallString<50> buf;
  llvm::raw_svector_ostream os(buf);
  // Obtain array indices to add them to the variable name.
  const ElementRegion *ER = nullptr;
  while ((ER = R->getAs<ElementRegion>())) {
    // Index is a ConcreteInt.
    llvm::outs() << "ElementRegion: ER = " << ER->getString() << "\n";
    if (auto CI = ER->getIndex().getAs<nonloc::ConcreteInt>()) {
      llvm::SmallString<2> Idx;
      CI->getValue().toString(Idx);
      ArrayIndices = (llvm::Twine("[") + Idx.str() + "]" + ArrayIndices).str();
      llvm::outs() << "Index: ConcreteInt = " << Idx.str() << "\n";
    }
    // If not a ConcreteInt, try to obtain the variable
    // name by calling 'getDescriptiveName' recursively.
    else {
      std::string Idx = ER->getDescriptiveName(false);
      if (!Idx.empty()) {
        ArrayIndices = (llvm::Twine("[") + Idx + "]" + ArrayIndices).str();
        llvm::outs() << "Index: NOT ConcreteInt = " << Idx << "\n";
      }
    }
    R = ER->getSuperRegion();
    if (R)
      llvm::outs() << "Next iteration: R = " << R->getString() << "\n";
  }
...
```
The second statement, however, causes the method to call itself recursively on the same ElementRegion (the else branch inside the while loop). Nothing changes between calls, so the recursion never terminates and eventually ends in a segmentation fault.
```
MemRegion: Name = 'index2'
ElementRegion: ER = Element{array,reg_$0<unsigned int index2>,int}
ElementRegion: ER = Element{array,reg_$0<unsigned int index2>,int}
ElementRegion: ER = Element{array,reg_$0<unsigned int index2>,int}
...
#254 0x000055f02a6171ed clang::ento::MemRegion::getDescriptiveName[abi:cxx11](bool) const (bin/clang+0x57321ed)
#255 0x000055f02a6171ed clang::ento::MemRegion::getDescriptiveName[abi:cxx11](bool) const (bin/clang+0x57321ed)
Segmentation fault (core dumped)
```
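The non-terminating call can be reduced to a toy model that uses no Clang types. All names here (ToyElement, countNestedCalls, MaxDepth) are hypothetical and only illustrate the shape of the problem: the fallback is invoked on the same object in the same state, so every call reaches the same branch again. The depth cap is an artificial addition so that the sketch stops instead of overflowing the stack.
```Cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an ElementRegion: an array name plus an index that
// is either a known constant or something else (e.g. a symbolic value).
struct ToyElement {
  std::string Array;
  bool IndexIsConcrete;
  int ConcreteIndex;
};

// Mirrors the shape of the else branch in the excerpt: for a non-concrete index
// the function is called again on the *same* element. Depth counts the nested
// calls; MaxDepth is an artificial cap that does not exist in the real code.
int countNestedCalls(const ToyElement &E, int Depth, int MaxDepth) {
  assert(MaxDepth > 0 && "cap must be positive");
  if (E.IndexIsConcrete)
    return Depth; // Terminates: the index can be printed directly.
  if (Depth == MaxDepth)
    return Depth; // Only the cap ends the non-concrete case.
  return countNestedCalls(E, Depth + 1, MaxDepth); // Same object, no progress.
}
```
With a concrete index the model returns after a single call. With a non-concrete index it always runs into the cap, which corresponds to the repeated stack frames in the trace above.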
The result I would expect can be achieved by adding the following else-if branch:
```Cpp
std::string MemRegion::getDescriptiveName(bool UseQuotes) const {
  std::string VariableName;
  std::string ArrayIndices;
  const MemRegion *R = this;
  SmallString<50> buf;
  llvm::raw_svector_ostream os(buf);
  // Obtain array indices to add them to the variable name.
  const ElementRegion *ER = nullptr;
  while ((ER = R->getAs<ElementRegion>())) {
    // Index is a ConcreteInt.
    llvm::outs() << "ElementRegion: ER = " << ER->getString() << "\n";
    if (auto CI = ER->getIndex().getAs<nonloc::ConcreteInt>()) {
      llvm::SmallString<2> Idx;
      CI->getValue().toString(Idx);
      ArrayIndices = (llvm::Twine("[") + Idx.str() + "]" + ArrayIndices).str();
      llvm::outs() << "Index: ConcreteInt = " << Idx.str() << "\n";
    }
    // Added: Try to get index as SymbolVal -> SymbolRef -> OriginRegion -> Name
    else if (auto SI = ER->getIndex().getAs<nonloc::SymbolVal>()) {
      if (auto SR = SI->getAsSymbol()) {
        if (auto OR = SR->getOriginRegion()) {
          std::string Idx = OR->getDescriptiveName(false);
          ArrayIndices = (llvm::Twine("[") + Idx + "]" + ArrayIndices).str();
          llvm::outs() << "Index: SymbolVal = " << Idx << "\n";
        }
      }
    }
    // If not a ConcreteInt, try to obtain the variable
    // name by calling 'getDescriptiveName' recursively.
    else {
      std::string Idx = ER->getDescriptiveName(false);
      if (!Idx.empty()) {
        ArrayIndices = (llvm::Twine("[") + Idx + "]" + ArrayIndices).str();
        llvm::outs() << "Index: NOT ConcreteInt = " << Idx << "\n";
      }
    }
    R = ER->getSuperRegion();
    if (R)
      llvm::outs() << "Next iteration: R = " << R->getString() << "\n";
  }
```
With this change the output is as follows, and the infinite recursion in MemRegion::getDescriptiveName is also avoided for this case:
```
MemRegion: Name = 'index2'
ElementRegion: ER = Element{array,reg_$0<unsigned int index2>,int}
Index: SymbolVal = index2
Next iteration: R = array
MemRegion: Name = 'array[index2]'
```
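The index handling I have in mind can be sketched without any Clang types. Everything in this sketch (IndexKind, ToyIndex, describe) is a hypothetical stand-in and not part of the analyzer API. It also assumes one design choice that the patch above does not make: an index that can be named neither as a constant nor through an origin variable yields an empty name instead of triggering another recursive call.
```Cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical index kinds, loosely modelled on the cases discussed above.
enum class IndexKind { Concrete, SymbolWithOrigin, Unknown };

struct ToyIndex {
  IndexKind Kind;
  long Value;         // Used when Kind == Concrete.
  std::string Origin; // Name of the originating variable, if there is one.
};

// Builds "base[i][j]..." from the outermost to the innermost index. Returns an
// empty string as soon as one index cannot be named (an assumption of this
// sketch), so there is no code path that calls describe() again.
std::string describe(const std::string &Base,
                     const std::vector<ToyIndex> &Indices) {
  std::string Out = Base;
  for (const ToyIndex &I : Indices) {
    switch (I.Kind) {
    case IndexKind::Concrete:
      Out += "[" + std::to_string(I.Value) + "]";
      break;
    case IndexKind::SymbolWithOrigin:
      assert(!I.Origin.empty() && "origin name expected");
      Out += "[" + I.Origin + "]";
      break;
    case IndexKind::Unknown:
      return ""; // Give up instead of recursing.
    }
  }
  return Out;
}
```
For the two statements in the example, describe("array", {{IndexKind::Concrete, 1, ""}}) produces array[1] and describe("array", {{IndexKind::SymbolWithOrigin, 0, "index2"}}) produces array[index2]. Whether an empty name is the right result for an index that cannot be described is open for discussion.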
Am I missing something or is the described behaviour intended?
</pre>
ZF66-7zApvL39N6NwBPWNZdu7w4CWRE6DxjcuDFh371mtbDDbr1Np-jvoFMaN78ROo5Ikg7S28cLpvXfKLkDCU3oZAzRIYqiaDJZR5TdxtMYr2zO3idSk3uWcZIs8sMhDvzQM6sen6KzzLH9p6CA3keHyTShMRanHPJGTf7DjFq9AiT4wxONUNTVtm_-tcTTaNw0f6i298dOudIazdalo1UXWwjI25Mplpccd1i4luI3IEoItXeNoMeKBZeuVDeJxwPfW7x9VPW_xZN_EuWfRPknUf7jiPKiKNBlMLwEZrzBpuEAM7A6VpkS35gAF9bm8hnX4fKr5hsumxzwd3xtOGloatgJHasfRMdJ-xVsQEfdgvyA5dXylF9BwhsMuTf7azO7ta7v3ZCEa3T-64_Q-X81Q_8o4LqF_zFSfoawt-j5n3yb1gPXz73an3Ovdvb25OopfmCIDfe74H3hCLQ9oW7xqeQHj1WDZGH8GwGjhOOW4XSbG8iZwe6d4YV98O_fb12pNM3wP-xwu3uz8OYJ96KCJVTcGLcsRrkds18g3R57Fz74mWPvp3fvXFqUvnk-3RTzpLhL7tgNzuNpdDe7u41pclPO71gcjW_HlGU4GU_iLJ7ScTGL47vxJM7XRXTD5zSi4yiJZnEczcbJ6HYSJ1mRT6IowyibJGQcYcW4GDmEj5Te3HBjapzPxuPb5EawDIWZh5ogcQ_-YSgGN3ru3-Fn9caQcSS4saaTYrkV_qsNzRcRNJk8fAB1mUb2PbxFCUTUlW-1BnvcIgxTu5tai3lp7dYf9vuiveG2rLNRripCn3zqhn_t9w0IffKOGEKfvKP_CAAA__9eNcT-">