<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/63229>63229</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
use a smarter binary search in `SourceManager::getFileIDLocal`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
zygoloid
</td>
</tr>
</table>
<pre>
We perform a binary search in `SourceManager::getFileIDLocal` to map a source location to a `SLocEntry`. (We do the same for `getFileIDLoaded`, but per #2851, we should do something different there.) This binary search is given a target `SLocOffset`, but doesn't take advantage of that when picking its probe points. It should! In some compilations we spend a significant amount of time here.
Instead of probing at (low ID + high ID) / 2, we should consider probing at lerp(low ID, high ID, (SLocOffset - low offset) / (high offset - low offset)). On the assumption that the sizes of `SLocEntry`s are independently identically distributed, this should get us to the right entry in many fewer probes. One downside is that we'll potentially pull in more cache lines during the search, so we should do some measurement to make sure this actually works out being better. It's also worst-case linear, for example if we have a large number of tiny SLocEntry's and then a really big one, and we look for the start of the really big one, so some adaptive behavior falling back to the worst-case logarithmic time search might make sense for that case. (For example, we could compute the current naive probe and the target-offset-based probe, and interpolate between them in some way, perhaps based on how many probes we've already done or how much closer we got in source location address space last time.)
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJycVU-P27gP_TTKhZjAI0_-HXJoO78AA_SHHnaBnmmLtrUjS4ZIJ5t--gVltzNt97SXBLCp9_geH2Vk9n0kOpvdR7N73uAsQ8rnb_c-heTdpknufv5KMFHuUh4BofER8x2YMLcD-AhmX_2R5tzS_zFiT9nUH0z9oSe5-EAvz59Ti8HsK5AEI06AwKUaQmpRfIr6AgvK59T-L0q-m321BWOPXwlcAhkIGEeCLmUte4eMjpzZV8Z-gmYWbRKMre1x96iPbgQ8pDk4ReE0kgw-9uB811GmKIqcaWvsCf4cPP-qjKH3V4qAIJh7ku8tfuk6JnlH6xJxNPYgIPhKgO6KUbAnSB3IgAK3gSJMvn1Vei8MU04NwZR8FN7Ci6x9GvsIL7G0Cm0aJx-KQVyUTBSdmuf76DvfYhTAMc1RCo0fCRY11bOpPiy_L5GF0GmBMio7ihob0g1ensHYjzD4foCXZzXB2AvYn41rU2TvKL8_HyhPPzC0_AfEJ8V-swgeQIvS4tdKYOyx1Kd_LTH2tIUvscwcmedxWhKiLpYc-G_EqueXuDBgJvDRkdpEUcIdvP77FkO4g_Ms2TezkNM2Rce9StTJzqwhVILs-0GAFFazPWK8Q0e31QFi7U5TeSu-aEiWCZOxhxBgSqKkhXOaQygYKRO02A4EwUdicHNWK4uekjVtidPveYWRkOdMYwmrrs8rgT5YBGArc2G6pfzKkGaBhhS5IRHKGixjDwwYFDxllocWeekCs5LqRtHfOE6BwHfawIBXAoSgiYc4jw3lJV_xDm-GF9ToVILuR6bSRuN7SJEUWF_edMXTayEpWgXzElb1-bcjvGpGh5P4K0FDA159ytBhCEUWtq_f5_ReTuoxexlG3y57sC7wWEa5eEaRaW0EBfRYuWAub_rX3Ldr7MdpFipM7ZzLZRFRm1o2d9W-XgwPS3wfGmRyS8V3D3wUylMKKKpHbkQl26Pmoqi94V1rJ8oDTgwLRIowpNsSviV2S8B0NCETuju4FAlSXurmdoA2JKasEvokC_zPtyw6l4kZeEJ9jCzFLb3-Nu5cu1N9wg2dH_fHfb2zp4PdDGesuh3Vrmlrd9o3dKxs1xyb3YGeqCO7e9z4s61sXe2rk62t3dnt4bi3iIfHk93t6ubpaJ4qGtGHbQjXcZtyv_HMM533tbWnTcCGApdvj7WRblBeGmv1U5TPeuahmXs2T1XwLPyGIl4CnWfWsPKIWSj_5y_TZs7hPIhMrAX2Yuyl9zLMzbZNo7EXJV3_Hqac_qJWjL2UVtnYS5HyTwAAAP__DpuBcQ">