<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Date: Tue, 23 Mar 2021 19:44:49 +0000<br>
From: Sjoerd Meijer via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;<br>
To: "<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>" &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;<br>
Subject: [llvm-dev] Function specialisation pass<br><br><br>
I am interested in adding a function specialisation(*) pass to LLVM ...<br>
<br>
Both previous attempts were parked at approximately the same point: the<br>
transformation was implemented but the cost-model to control compile-times<br>
and code-size was lacking ...<br></blockquote><div><br></div><div>This sounds right. The transform itself is fairly mechanical: clone the function, replace uses of the specialised argument in the clone with the known value, and redirect the call sites that pass that value to the clone. Great to see there's already work in that direction.</div><div><br></div><div>I'd be delighted to contribute to the implementation effort. It may even qualify as a legitimate thing to do during work hours, since it would let the AMDGPU OpenMP runtime back off from its enthusiasm for inlining everything. Some initial thoughts below.<br></div><div><br></div><div>Thanks!</div><div><br></div><div>Eliding call overhead and specialising on known arguments are currently bundled together as 'inlining', which also has a challenging cost model. If we can specialise reliably, call sites that are presently inlined mainly for the sake of their known arguments would no longer need to be. I'm sure the end point, with specialisation and inlining working in harmony, would beat what we have now on compile time, code size and code quality.</div><div><br></div><div>Getting the heuristics, and their interaction with inlining, right is a lot of work though. I'd suggest an intermediate step of specialising based on user annotations, which kicks that problem down the road:</div><div><br></div><div>void example(int x, __attribute__((bikeshed)) int y) {... big function ...}</div><div><br></div><div>where the annotation means that a call site passing a compile-time-known value for y, say 42, is rewritten, without consulting any heuristics, into a call to:</div><div><br></div><div>void example.carefully-named.42(int x) {constexpr int y = 42; ...}</div><div><br></div><div>We still have to get the machinery right: caching previous specialisations, mapping repeated requests for the same specialisation down to a single clone, care with naming, maybe teaching ThinLTO about it, and so forth. 
However, we postpone the problem of heuristically determining which arguments and call sites are profitable to specialise.</div><div><br></div><div>The big motivating examples for me are functions that take function pointers (qsort style) and functions containing a large switch on an argument whose value is often known at the call site. An in-tree example of the latter, which uses 'the trick' from partial evaluation to reach the same end result by hand, is in llvm/openmp/libomptarget/plugins/amdgpu/impl/msgpack.h:</div><div><br></div><div>The entry point is:</div><div>template &lt;typename F&gt;<br>const unsigned char *handle_msgpack(byte_range bytes, F f);<br></div><div><br></div><div>It switches on the first byte of the byte_range, but the function is too large to inline. So it is specialised, not with a convenient attribute, but by introducing something like:</div><div><br></div><div>template &lt;typename F, msgpack::type ty&gt;<br>const unsigned char *func(byte_range bytes, F f) {</div><div> switch(ty) {}</div><div>}</div><div>template &lt;typename F&gt;<br>const unsigned char *handle_msgpack(byte_range bytes, F f) {</div><div>auto ty = bytes[0];</div><div>switch(ty) {</div><div> case 0: return func&lt;F,0&gt;(...);</div><div> case 1: return func&lt;F,1&gt;(...);</div><div>}</div><div>}</div><div><br></div><div>This is, alas, a slightly more complicated example than `void example(int x, __attribute__((bikeshed)) int y)`, but I'm optimistic that explicitly annotating a parameter as 'when this is known at compile time, specialise on it' would still let me simplify that code.</div><div><br></div><div>Interfaces taking function pointers that are only ever called with a couple of different pointers, such as OpenMP target regions, may be a more obvious win.</div></div></div>
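<div>To make 'the trick' concrete, here is a minimal, self-contained C++ sketch of the same pattern on a toy problem, assuming a function whose behaviour is a switch over a small tag. The tag becomes a template parameter so that each instantiation folds the switch down to one case, and a single run-time dispatch picks the right instantiation. The names Kind, handle_kind and handle are hypothetical illustrations, not taken from msgpack.h or any LLVM API.</div>

```cpp
#include <cstdint>

// Hypothetical tag type: an illustrative stand-in, not msgpack::type.
enum class Kind : std::uint8_t { Nil = 0, Bool = 1, Int = 2 };

// Specialised body: K is a compile-time constant, so in each instantiation
// the switch folds to a single case. This is the kind of clone a
// specialisation pass would otherwise have to produce automatically.
template <Kind K>
int handle_kind(int payload) {
  switch (K) {
  case Kind::Nil:
    return 0;
  case Kind::Bool:
    return payload != 0 ? 1 : 0;
  case Kind::Int:
    return payload * 2;
  }
  return -1;  // not reached for the enumerators above
}

// Generic entry point: the tag is only known at run time, so dispatch
// once to the matching specialisation (the hand-written part of the trick).
int handle(std::uint8_t tag, int payload) {
  switch (static_cast<Kind>(tag)) {
  case Kind::Nil:
    return handle_kind<Kind::Nil>(payload);
  case Kind::Bool:
    return handle_kind<Kind::Bool>(payload);
  case Kind::Int:
    return handle_kind<Kind::Int>(payload);
  }
  return -1;  // unknown tag
}
```

<div>A caller that already knows the tag at compile time, say Kind::Int, can call the Kind::Int instantiation of handle_kind directly and skip the dispatch; that direct call is roughly what an annotated parameter would let the compiler generate without the hand-written switch. Callers that only have the tag at run time go through handle.</div>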