[LLVMdev] [Proposal] Speculative execution of function calls
Kuperstein, Michael M
michael.m.kuperstein at intel.com
Wed Jul 31 05:18:05 PDT 2013
Whether cost is an issue depends on the specific use of speculative execution.
In the context of LICM, I believe it is almost always a good idea to hoist, as loop counts of 0 are relatively rare. This applies especially to expensive functions.
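To make the LICM trade-off concrete, here is a minimal sketch of what hoisting a speculatable call out of a loop amounts to. The names `sum_unhoisted` and `sum_hoisted` are hypothetical, and the callable `f` stands in for any function assumed to be side-effect free. This is a source-level illustration only, not what the pass literally does on IR.

```cpp
#include <vector>

// Before hoisting: f(k) is evaluated once per iteration,
// and not at all when the loop body never runs.
template <typename F>
long sum_unhoisted(const std::vector<int>& v, int k, F f) {
    long total = 0;
    for (int x : v) {
        total += static_cast<long>(x) * f(k);
    }
    return total;
}

// After hoisting: f(k) is evaluated exactly once, before the loop.
// If v is empty, that single evaluation is wasted (speculative) work,
// which is only acceptable when f is safe to execute unconditionally.
template <typename F>
long sum_hoisted(const std::vector<int>& v, int k, F f) {
    const long fk = f(k);  // speculated: executed even for a trip count of 0
    long total = 0;
    for (int x : v) {
        total += static_cast<long>(x) * fk;
    }
    return total;
}
```

With a non-empty vector, the hoisted form saves all but one evaluation of `f`; with an empty vector it performs one evaluation that the original loop would have skipped.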
As to the use of speculative execution purely to elide jumps - right now, cost is not a factor in the isSafeTo...() decision in any case. A memory load may also be much more expensive than a jump, but loads, when possible, are still considered safe. So I think cost is indeed orthogonal to safety, and should perhaps be a separate query. Some passes may want to perform that query (in fact, SimplifyCFG already has an internal ComputeSpeculationCost() method), while others will want to speculate whenever possible (LICM).
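The split between a safety query and a cost query might look roughly like the sketch below. Everything in it is hypothetical: `CallInfo`, `is_safe`, `licm_should_hoist`, `cfg_should_speculate` and the numeric cost units are made up for illustration and are not LLVM APIs.

```cpp
// Hypothetical summary of a call site; not an LLVM type.
struct CallInfo {
    bool speculatable;        // assumed: no side effects, no traps, no hidden state
    unsigned estimated_cost;  // assumed: abstract cost units from some cost model
};

// Safety query: answers only "may this be executed unconditionally?"
// Cost deliberately plays no part here.
bool is_safe(const CallInfo& c) {
    return c.speculatable;
}

// An LICM-like client: hoists whenever it is safe to do so.
bool licm_should_hoist(const CallInfo& c) {
    return is_safe(c);
}

// A SimplifyCFG-like client: additionally asks the separate cost query,
// and speculates only if the call is cheap enough to be worth eliding a branch.
bool cfg_should_speculate(const CallInfo& c, unsigned budget) {
    return is_safe(c) && c.estimated_cost <= budget;
}
```

Under this arrangement an expensive but safe call would still be hoisted by the LICM-like client, while the branch-eliding client would decline it.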
As to being safe for only a subset of inputs - if a function is safe only for a subset of inputs, it's not safe, just like a function that is readonly for a subset of inputs is not readonly. ;-)
From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall
Sent: Wednesday, July 31, 2013 13:56
To: Kuperstein, Michael M
Cc: LLVMdev at cs.uiuc.edu
Subject: Re: [LLVMdev] [Proposal] Speculative execution of function calls
On 31 Jul 2013, at 10:50, "Kuperstein, Michael M" <michael.m.kuperstein at intel.com> wrote:
> This has two main uses:
> 1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute - hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by "readnone" (e.g. rounding mode) will also not have any of the other issues.
> 2) DSL Frontends (e.g. OpenCL, my specific domain) will be able to mark library functions they know are safe.
The slightly orthogonal question to safety is the cost of execution. For most intrinsics that represent CPU instructions, executing them speculatively is cheaper than a conditional jump, but this is not the case for all (for example, some forms of divide instructions on in-order RISC processors). For other functions, it's even worse because the cost may be dependent on the input. Consider as a trivial example the well-loved recursive Fibonacci function. This is always safe to call speculatively, because it only touches local variables. It is, however, probably never a good idea to do so. It's also likely that a real function call is far more expensive than the elided jump, although this may not be the case on GPUs where divergent flow control is more expensive than redundant execution. Making this decision requires knowledge of both the target architecture and the complexity of the function, which may be dependent on its inputs. Even in your examples, some of the functions are only safe to speculatively execute for some subset of their inputs, and you haven't proposed a way of determining this.
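The Fibonacci example can be sketched as follows. The function names `branchy` and `speculated` are hypothetical and only illustrate the transformation at source level. Both forms return the same value, but the speculated form pays for the full recursion even when the result is discarded.

```cpp
#include <cstdint>

// Naive recursive Fibonacci: touches only its argument and locals,
// but the number of recursive calls grows exponentially with n.
std::uint64_t fib(unsigned n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Original form: fib runs only when cond is true.
std::uint64_t branchy(bool cond, unsigned n) {
    if (cond) {
        return fib(n);
    }
    return 0;
}

// Speculated form: the branch is replaced by a select,
// so fib runs unconditionally, whatever its cost.
std::uint64_t speculated(bool cond, unsigned n) {
    const std::uint64_t r = fib(n);
    return cond ? r : 0;
}
```

The two functions are interchangeable as far as results go, which is the sense in which speculation is safe here; whether it is profitable depends on `n` and on how often `cond` is false.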
I suspect that much of the problem here comes from modelling intrinsics as calls in the IR, when most of them are closer to arithmetic operations. This means that optimisations have to be aware that some calls are not really calls and so don't cause any flow control effects. I wonder if it's worth revisiting some of the design of intrinsics and having some notion of target-dependent instructions. This would also help if anyone wants to try the route discussed at the San Jose DevMeeting last year of progressively lowering machine-independent IR to machine instructions.
A final issue that may be relevant is parallel safety. On architectures that have very cheap userspace coroutine creation, it may be interesting to speculatively execute some functions in parallel. On others, I can imagine transforming certain longer-running calls into libdispatch invocations followed by joins. This, however, requires that you can detect that the call is safe to execute speculatively, doesn't have read dependencies on any shared state that might be modified, and is sufficiently expensive for the overhead of parallel execution to be worth it. This is probably a lot beyond the scope of the current discussion.