<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/76824>76824</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
castToDeclContext takes 2% of execution time
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Destroyerrrocket
</td>
</tr>
</table>
<pre>
The function castToDeclContext takes around 2% of the execution time in my test run.
I was profiling clang to see whether I could spot any small mistakes that take a significant amount of time in debug builds. Pretty much everything is either complicated enough that optimizing it would be impossible for me, or is already very performant, but I noticed that valgrind was reporting an interesting function at the top of the 'auto' category:
![interesting_entry](https://github.com/llvm/llvm-project/assets/25348040/8b7e5d79-d7c7-4c3e-a0de-2ea63058c5aa)
It is being executed 1.8 billion times, and the implementation looks pretty trivial to me:
![code_snippet](https://github.com/llvm/llvm-project/assets/25348040/a9f87c3f-55da-43b8-b028-db79ad61cf20)
Looking at the assembly, I expected pretty much a lookup table and an add operation, but it is pretty clear that the resulting code is not ideal:
clang::Decl::castFromDeclContext(clang::DeclContext const*): # @clang::Decl::castFromDeclContext(clang::DeclContext const*)
.L_ZN5clang4Decl19castFromDeclContextEPKNS_11DeclContextE$local:
movzwl 8(%rdi), %edx
movq %rdi, %rax
andl $127, %edx
leal -1(%rdx), %esi
cmpl $84, %esi
ja .LBB65_7
leaq .LJTI65_0(%rip), %rdi
movq $-40, %rcx
movslq (%rdi,%rsi,4), %rsi
addq %rdi, %rsi
jmpq *%rsi
.LBB65_2:
addq %rcx, %rax
retq
.LBB65_4:
movq $-48, %rcx
addq %rcx, %rax
retq
.LBB65_5:
movq $-64, %rcx
addq %rcx, %rax
retq
.LBB65_6:
movq $-56, %rcx
addq %rcx, %rax
retq
.LBB65_7:
leal -53(%rdx), %esi
movq $-72, %rcx
cmpl $6, %esi
jb .LBB65_2
addl $-34, %edx
xorl %ecx, %ecx
cmpl $5, %edx
setae %cl
shll $4, %ecx
orq $-64, %rcx
addq %rcx, %rax
retq
.LJTI65_0:
<JUMP TABLE>
The PR I'll submit in a few minutes fixes this problem by eliminating the need for the macro DECL_CONTEXT_BASE (it's only used here and in two other analogous functions), and by reordering the AST decls so that classes that inherit from DeclContext come first. I also experimented with hand-rolled offset tables, but that is far from maintainable, even if it manages to compress 3 lookup tables into one. The resulting assembly is just:
clang::Decl::castFromDeclContext(clang::DeclContext const*): # @clang::Decl::castFromDeclContext(clang::DeclContext const*)
.L_ZN5clang4Decl19castFromDeclContextEPKNS_11DeclContextE$local:
movzwl 8(%rdi), %ecx
leaq .Lswitch.table._ZN5clang4Decl19castFromDeclContextEPKNS_11DeclContextE(%rip), %rdx
movq %rdi, %rax
andl $127, %ecx
addq (%rdx,%rcx,8), %rax
retq
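To illustrate why the reordering helps, here is a minimal sketch, not the actual clang code. All names in it (Kind, Context, Node, NodeA/NodeB/NodeC, castFromContext) are hypothetical stand-ins for the Decl kinds, DeclContext, and Decl. The assumption is that when the kinds that embed a context are numbered densely from 0, a switch that only selects a constant offset is something an optimizing compiler can lower to a single table load plus an add, much like the assembly above:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kinds: the ones that embed a Context are listed first,
// so their values are dense and start at 0.
enum Kind : unsigned char { KindA, KindB, KindC, KindPlain };

struct Context { Kind kind; };  // plays the role of DeclContext
struct Node { int id; };        // plays the role of Decl

// Each node type keeps its Context at a different offset.
struct NodeA { Node base; Context ctx; };
struct NodeB { Node base; long extra; Context ctx; };
struct NodeC { Node base; long extra[3]; Context ctx; };

// Recover the enclosing node from a pointer to its embedded Context.
// Every case only picks a constant, so a dense switch like this can be
// turned into an offset table by the optimizer.
inline Node *castFromContext(Context *c) {
  std::ptrdiff_t offset = 0;
  switch (c->kind) {
  case KindA: offset = offsetof(NodeA, ctx); break;
  case KindB: offset = offsetof(NodeB, ctx); break;
  case KindC: offset = offsetof(NodeC, ctx); break;
  case KindPlain:
    assert(false && "kind does not embed a Context");
    return nullptr;
  }
  // 'base' is the first member of each node struct, so the address of
  // the node is also the address of its Node part.
  return reinterpret_cast<Node *>(reinterpret_cast<char *>(c) - offset);
}
```

Whether a given compiler really emits a table for this sketch depends on the optimization level and target, so checking the generated assembly is still the way to confirm it.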
And the build difference of clang+clang-tools-extra is:
NonOpt: ninja 19007,02s user 760,01s system 2284% cpu 14:25,23 total
Opt: ninja 18806,18s user 763,33s system 2308% cpu 14:07,74 total
So that is roughly a 1.02x speedup in wall-clock time (about 0.98 of the previous build time). Nothing earth-shattering, but it is in line with what valgrind suggested, and I already did all the legwork, so I might as well send it :)
</pre>