<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)">Hi all,</span>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><br>
</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)">We'd like to contribute something related to -path-equivalence option of llvm-cov. We want to hear feedbacks from you guys.</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><br>
</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)">In summary, there are three changes.</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)">1. Use regular expression to remap paths.</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)">2. Do not use `sys::fs::equivalent` to find the same file, which causes a huge performance issue in our environment.</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)">3. Use remapped paths for output directory paths of `llvm-show` instead of original paths.</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><br>
</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">Here is some context why we want this feature. We're building a single binary from multiple machines, so the binary contains different source code paths for
 the same file. </span><span style="margin:0px">There is already `-path-equivalence` option for remapping paths, but it does not meet our requirement because we need n to 1 mapping while the current implementation only supports 1 to 1 mapping. Thus, currently,
 we are using a custom llvm-cov build using a regular expression. The command line looks like this</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">$ llvm-cov show </span><span style="margin:0px">-path-equivalence='[A-Za-z0-9_/\\:]+[\\/]src[\\/][0-9]+[\\/]([A-Za-z]),:/' ...</span><br>
</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">Then, /path/to/src/0/f/asdf becomes f:/asdf.</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">We are using a capture to find a prefix (i.e. captured_string + filepath.replace(regex, to)), but we can drop this part if it looks complicated and not intuitive.</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">Secondly, there is a huge performance degradation in our environment because of file system accesses in llvm-cov.</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px"><a href="https://github.com/llvm/llvm-project/blob/d079dbc591899159925a1fe10b081fa0f6bb61bd/llvm/tools/llvm-cov/CodeCoverage.cpp#L251-L253" target="_blank" rel="noopener noreferrer" data-auth="NotApplicable" style="margin:0px">https://github.com/llvm/llvm-project/blob/d079dbc591899159925a1fe10b081fa0f6bb61bd/llvm/tools/llvm-cov/CodeCoverage.cpp#L251-L253</a><br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">```</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">ErrorOr<const MemoryBuffer &><br>
CodeCoverageTool::getSourceFile(StringRef SourceFile) {<br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">...<br>
  for (const auto &Files : LoadedSourceFiles)
<div style="margin:0px">    if (sys::fs::equivalent(SourceFile, Files.first))</div>
<div style="margin:0px">    <span style="margin:0px">  return *Files.second;</span></div>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">```</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><br>
</div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">If I rewrite the `sys::fs::equivalent` to a plain string comparison, the performance is improved a lot. When I tested it with a sample binary, the running time
 reduced from 1 minutes to 7 seconds. And, the overall running time of llvm-cov in our Windows environment reduced from 17 hours to 4 hours.</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">Since we are already using the remapped paths to find the same files, we don't need this kind of strict check to find equivalent files. So, I'd like to replace
 this strict comparison to a simple one when this regex-based remapping is being used.</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">Lastly, the current implementation is using the original paths to generate output files. For example, if `-path-equivalence=/tmp,/output -format=html -output-dir=./outdir`
 is used, the generated file paths will be `./outdir/tmp/...`, not `./outdir/output/...`. It does not make sense because there can be multiple different original paths. Thus, I'd like to use the remapped file paths to generate the output files.</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">
<div style="margin:0px">Since we don't want to break the existing option, I'm thinking about introducing a new flag adding `-regex` at the end (i.e. `-path-equivalence-regex`). I want to ask which approach is best for this case (introducing a new flag or updating
 the existing behavior).</div>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;background-color:rgb(255, 255, 255)"><span style="margin:0px">Please let me know if there is any concern from this proposal.<br>
I also uploaded a proof of concept to <a href="https://reviews.llvm.org/D96696" target="_blank" rel="noopener noreferrer" data-auth="NotApplicable" style="margin:0px">https://reviews.llvm.org/D96696</a>
<div class="x__Entity x__EType_OWALinkPreview x__EId_OWALinkPreview x__EReadonly_1" style="margin:0px">
</div>
<br>
Thanks,<br>
Choongwoo Han</span></div>
<br>
</div>
</body>
</html>