[llvm-dev] Adding support for self-modifying branches to LLVM?
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 9 14:22:26 PST 2016
On Tue, Feb 9, 2016 at 8:07 AM, Philip Reames <listmail at philipreames.com> wrote:
> On 02/09/2016 06:57 AM, Jonas Wagner wrote:
> I'm coming back to this old thread with data about the performance of
> NOPs. Recall that I was considering transforming NOP instructions into
> branches and back, in order to dynamically enable code. One use case for
> this was enabling/disabling individual sanitizer checks (ASan, UBSan) on
> I wrote a pass which takes an ASan-instrumented program, and replaces each
> ASan check with an llvm.experimental.patchpoint intrinsic. This intrinsic
> inserts a NOP of configurable size. It has otherwise no effect on the
> program semantics. It does prevent some optimizations, presumably because
> instructions cannot be moved across the patchpoint.
> Some results:
> - On SPEC, patchpoints introduce an overhead of ~25% compared to a version
> where ASan checks are removed.
> - This is almost half of the cost of the checks themselves.
> - The results are similar for NOPs of size 1 and 5 bytes.
> - Interestingly, the results are similar for NOPs of 0 bytes, too. These
> are patchpoints that don't insert any code and only inhibit optimizations.
> I've only tested this on one benchmark, though.
> To summarize, only part of the cost of NOPs is due to executing them.
> Their effect on optimizations is significant, too. I guess this would hold
> for branches and sanitizer checks as well.
> I don't think you can really draw strong conclusions from the experiments
> you described. What you've ended up measuring is nearly the impact of not
> optimizing over patchpoints at the check locations. This doesn't really
> tell you much about what a check (which is likely to inhibit optimization
> much less) costs over a nop at the same position.
> One bit of data you could extract from the experiment as constructed would
> be the relative cost of extra nops. You do mention that the results are
> similar for sizes 1-5 bytes, but similar is very vague in this context.
> Are the results statistically indistinguishable? Or is there a noticeable
> but small slowdown that results? (Numbers would be great here.)
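
One way to make "statistically indistinguishable" concrete is a permutation test on repeated timings of two configurations. The following is only a minimal sketch: the function name `permutation_p_value`, its parameters, and all sample timings are made up for illustration and are not measurements from this thread.

```python
import random


def permutation_p_value(a, b, rounds=10000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    Returns the fraction of random relabelings whose mean difference is
    at least as large as the observed one. A large value suggests the
    two sets of timings cannot be told apart.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        left = pooled[:len(a)]
        right = pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        # Small tolerance so rounding noise does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    return hits / rounds


# Placeholder timings in seconds (hypothetical, NOT real benchmark data).
runs_1_byte_nop = [10.0, 10.1, 10.2, 10.3, 10.4]
runs_5_byte_nop = [12.0, 12.1, 12.2, 12.3, 12.4]
p = permutation_p_value(runs_1_byte_nop, runs_5_byte_nop)
print("p-value:", p)
```

A small p-value here would point to a real difference between the two NOP sizes; a value near 1 would support calling them indistinguishable at this sample size.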
In this same vein, try inserting 1,2,3,4,5,6,... nops and measure the
performance impact (the total size of nops is also interesting but is more
difficult to measure reliably). I've used this kind of technique
successfully in the past for e.g. measuring the cost of "stat" syscalls on
Windows. I call the technique "stuffing". Basically, make a plot of the
performance degradation as you insert more and more redundant stuff (e.g. 1
nop, 2 nops, 3 nops, etc.). If the result is a strong linear trend, then
you can pretty confidently extrapolate backward to the "0 nop" case to see
the overhead of inserting 1 nop.
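
A rough sketch of that extrapolation, assuming the run time has already been measured for each nop count: fit a straight line by ordinary least squares, read the per-nop cost off the slope, and read the extrapolated "0 nop" time off the intercept. The helper name `fit_line` and the timings below are placeholders for illustration, not real data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs.

    Returns (slope, intercept, r_squared). An r_squared close to 1
    indicates the strong linear trend that makes extrapolation reasonable.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder data: nops per patch site vs. run time in seconds
# (hypothetical numbers, NOT measurements).
nops = [1, 2, 3, 4, 5, 6]
seconds = [10.5, 11.0, 11.5, 12.0, 12.5, 13.0]

slope, intercept, r_squared = fit_line(nops, seconds)
print("cost per extra nop:", slope)
print("extrapolated 0-nop time:", intercept)
print("r^2:", r_squared)
```

If r^2 comes out well below 1, the trend is not linear and the extrapolated intercept should not be trusted.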
-- Sean Silva
> On Thu, Jan 21, 2016 at 11:52 PM Jonas Wagner <jonas.wagner at epfl.ch> wrote:
>> There is some data on this, e.g., in “High System-Code Security with Low
>> Overhead” <http://dslab.epfl.ch/proj/asap/#publications>. In this work
>> we found that, for ASan as well as other instrumentation tools, most
>> overhead comes from the checks. Especially for CPU-intensive applications,
>> the cost of maintaining shadow memory is small.
>> How did you measure this? If it was measured by removing the checks
>> before optimization happens, then what you may have been measuring is not
>> the execution overhead of the branches (which is what would be eliminated
>> by nop’ing them out) but the effect on the optimizer.
>> Interesting. Indeed this was measured by removing some checks and then
>> re-optimizing the program.
>> I’m aware of some impact checks may have on optimization. For example,
>> I’ve seen cases where much less inlining happens because functions with
>> checks are larger. Do you know other concrete examples? This is definitely
>> something I’ll have to be careful about. Philip Reames confirms this, too.
>> On the other hand, we’ve also found that the benefit from removing a
>> check is roughly proportional to the number of cycles spent executing that
>> check’s instructions. Our model of this is not very precise, but it shows
>> that the cost of executing the check’s instructions matters.
>> I'll try to measure this, and will come back when I have data.