<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/144681>144681</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[XRay] Weird sled behavior with `-O3` and `-fno-inline`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Thyre
</td>
</tr>
</table>
<pre>
**Godbolt link:** https://godbolt.org/z/KvoMf5GjY
-----
Given this very short example:
```c++
#include <math.h>
inline int SQRT(int arg) { return sqrtf(static_cast<float>(arg)); }
template<typename T>
T foo( T a )
{
    return SQRT( (T)a );
}
int main( int argc, char** argv )
{
    return foo( argc );
}
```
With the flags `-O3 -fno-inline -fxray-instrument -fxray-instruction-threshold=1`, `clang` generates some interesting assembly code once XRay is involved:
```assembly
main:
        nop     word ptr [rax + rax + 512]
        nop     word ptr [rax + rax + 512]
        jmp     int foo<int>(int)
int foo<int>(int):
        nop     word ptr [rax + rax + 512]
        nop     word ptr [rax + rax + 512]
        jmp     SQRT(int)
SQRT(int):
        nop     word ptr [rax + rax + 512]
        [...]
        ret
        nop     word ptr cs:[rax + rax + 512]
```
Both `main` and `int foo<int>(int)` have proper sleds for XRay instrumentation. However, both the entry and the exit sled are placed before the actual function content (i.e. the `jmp` instruction), so each function is reported as exited before control reaches its callee.
This causes an issue for tools that want to represent a proper tree structure of the functions being called, e.g. performance tools. One would see something like this:
```
- ./a.out
  - main
  - int foo<int>(int)
  - SQRT(int)
```
Instead of
```
- ./a.out
  - main
    - int foo<int>(int)
      - SQRT(int)
```
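To illustrate why the tree flattens, here is a minimal sketch that assumes a tool tracking nesting with a simple depth counter over enter/exit events. The `Event`, `Node`, `buildTree`, `tailCallOrder` and `nestedOrder` names are hypothetical and not part of the XRay API, and the event order is what the assembly above suggests, not a captured trace.

```c++
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event model for illustration; this is not the XRay runtime API.
enum class Kind { Enter, Exit };

struct Event {
    Kind kind;
    std::string function;
};

struct Node {
    std::string function;
    std::size_t depth; // nesting level below ./a.out that a tool would assign
};

// Replays enter/exit events with a depth counter, the way a tree-building
// tool might, and records each function at the depth where it was entered.
std::vector<Node> buildTree(const std::vector<Event>& events) {
    std::vector<Node> nodes;
    std::size_t depth = 0;
    for (const Event& e : events) {
        if (e.kind == Kind::Enter) {
            nodes.push_back({e.function, depth});
            ++depth;
        } else if (depth > 0) {
            --depth;
        }
    }
    return nodes;
}

// Assumed event order when each exit sled fires before the `jmp`.
std::vector<Event> tailCallOrder() {
    return {
        {Kind::Enter, "main"},     {Kind::Exit, "main"},
        {Kind::Enter, "foo<int>"}, {Kind::Exit, "foo<int>"},
        {Kind::Enter, "SQRT"},     {Kind::Exit, "SQRT"},
    };
}

// Event order a tool would expect from ordinary nested calls.
std::vector<Event> nestedOrder() {
    return {
        {Kind::Enter, "main"}, {Kind::Enter, "foo<int>"}, {Kind::Enter, "SQRT"},
        {Kind::Exit, "SQRT"},  {Kind::Exit, "foo<int>"},  {Kind::Exit, "main"},
    };
}
```

With these assumptions, `buildTree(tailCallOrder())` places all three functions at depth 0, while `buildTree(nestedOrder())` places them at depths 0, 1 and 2, which matches the two trees above.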
In the case of LULESH with our current (in-development) XRay instrumentation adapter in [Score-P](https://www.vi-hps.org/projects/score-p/overview/overview.html), this even caused an inconsistent profile, probably due to similar reasons.
Given that this is a very constructed case, I don't see this as being a huge issue. However, I think this may be a limitation that should be documented somewhere. I can't immediately think of a solution for this, and I think most people will not encounter this issue. Why would someone prevent inlining with `-O3` in the first place? (Well, me, because I wanted to test the overhead when filtering functions.)
</pre>