[PATCH] D68351: [profile] Add a mode to continuously sync counter updates to a file

Fri Oct 4 10:30:30 PDT 2019

Petr's method is more like 'relocating' the counter memory from static
allocated memory to mmapped memory, but better be named 'reloc' method.

Now lets do some comparison:

1) reloc method allows runtime to mmap from the start of the raw profile,
so there is no need to page align the counter data in the file;
2) reloc method does have more instrumentation overhead, however it is not
as large as we think.
   a) the bias load can be done at the start of the function
   b) on x86, the addressing mode kicks in, there is actually no size
difference in terms of updating:
        addq    $1, counter+16(%rip)
   vs
       addq    $1, counter+16(%rax)

  both are 7 bytes long.

so the overhead will be a few bytes more per function.

3) reloc method produces smaller sized raw profile

4) The continuous method has the advantage of being more efficient in
running speed.

Also, I believe both methods should support on-line merging -- as the
profile update happens 'In-place' -- the online merging code can be
simplified (no locking, reading, updating are needed).

David

On Thu, Oct 3, 2019 at 4:18 PM Vedant Kumar via Phabricator via
llvm-commits <llvm-commits at lists.llvm.org> wrote:

> vsk added a comment.
>
> In D68351#1693693 <https://reviews.llvm.org/D68351#1693693>, @phosek
> wrote:
>
> > In D68351#1693307 <https://reviews.llvm.org/D68351#1693307>, @davidxl
> wrote:
> >
> > > +petr who has similar needs for Fuchia platform
> >
> >
> > Thank you for including me, we have exactly the same use case in Fuchsia
> and this is the solution I've initially considered and been experimenting
> with locally based on the discussion with @davidxl. However, after internal
> discussion we've decided against this solution for two reasons:
> >
> > 1. This requires the ability to mmap the output file over the (portion
> of) binary which is something we don't allow in Fuchsia for security
> reasons; once all modules have been mapped in by the dynamic linker, we
> don't allow any further changes to their mapping and using this solution
> would require special dynamic linker which is possible but highly
> undesirable.
> > 2. It introduces bloat to both the binary and the output file. It also
> complicates the runtime implementation due to alignment and padding
> requirements.
> >
> >   The alternative solution we've came up and that I've now been
> implementing is to change the instrumentation to allow extra level of
> indirection. Concretely, the instrumentation conceptually does the
> following: `c = __profc_foo[idx]; c++; __profc_foo[idx] = c`. We'd like to
> change this `c = *(&__profc[idx] + *bias); c++; *(&__profc[idx] + *bias) =
> c` where `bias` is a global variable set by the runtime to be the offset
> between the `__llvm_prf_cnts` section and the corresponding location in the
> file that's mmapped in. Initially, that offset would be `0` which would
> result in exactly the same behavior as today, but the runtime can mmap the
> output file into address space and then change the offset to make the
> counters be continuously updated.
> >
> >   The advantage of this solution is that there are no changes needed to
> the profile format. It also doesn't require mmapping the output file over
> the binary, the output file can be mmapped anywhere in the address space.
> The disadvantage is extra overhead since instrumentation is going to be
> slightly more complicated, although we don't have any numbers yet to
> quantify how much slower it's going to be. The implementation should be
> fairly minimal and my tentative plan was to gate it on compiler switch, so
> it wouldn't affect existing in any way (modulo introducing one new variable
> in the runtime to hold the bias). I'm hoping to have the prototype ready
> and uploaded for review within the next few days. What's your opinion on
> this idea? Would this be something that you'd be interested in as an
> alternative approach?
>
>
> @phosek thank you for sharing this alternative. I hope you don't mind my
> calling this the bias method, after the term in the proposed
> instrumentation.
>
> The TLDR is that I don't think the bias method is a good fit for
> Linux/Darwin. Imho, this method won't reduce the complexity of continuous
> mode, and its instrumentation overhead is likely to outweigh the savings
> from reduced section alignment. Let me try and justify these claims :).
>
> First and most critically, note that (on Linux & Darwin, at least), mmap()
> requires that the file `offset` argument be page-aligned. So, I don't
> believe there's a way to avoid changing the raw profile format, or to avoid
> the concomitant complexity necessary to calculate padding bytes in the
> runtime. I don't see how the bias method solves this problem: this seems to
> be a fundamental limitation of mmap().
>
> Second, note that the size overhead of counter increment instrumentation
> dwarfs the alignment overhead for `__llvm_prf_{cnts,data}` for all but the
> smallest of programs. Assuming 16K pages, in the worst case, the section
> alignment costs 32K per image. Assuming an increment can be encoded in 7
> bytes as `incq l___profc_main(%rip)`, this section alignment cost is
> equivalent to ~4700 increments. For comparison, this is roughly the number
> of counter increments in //FileCheck.cpp.o//. If the size overhead of
> counter instrumentation were to double, as I believe it would with the bias
> method, it would rapidly erase the memory savings from eliminating the
> section alignment requirement.
>
> Third, I'm not sure I understand the security constraints in Fuchsia that
> make it undesirable to map the contents of a section onto a file. Is the
> concern that an attacker process may update the file, thereby changing the
> in-memory contents of the section? I'll note that on Darwin, at least, the
> kernel does not permit this kind of sharing. If an attacker process
> modifies a file on disk that has the in-memory contents of a section
> MAP_SHARED over it, the in-memory contents of the mapped section are not
> changed (I verified this by making a small modification to
> `darwin-proof-of-concept.c`). If there is some security aspect to the
> problem I'm missing here, could you please elaborate on it? I'm hoping that
> the bias method will not be necessary to support continuous mode on
> Fuchsia, but it seems this depends on the answer to the security question.
> Either way, istm that the high level approach taken in this patch
> as-written is a better fit for Linux/Darwin, and moreover probably provides
> support that can be used to implement the bias method.
>
> I appreciate your feedback and think we can definitely work together to
> build some kind of common solution!
>
>
> CHANGES SINCE LAST ACTION
>   https://reviews.llvm.org/D68351/new/
>
> https://reviews.llvm.org/D68351
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20191004/88272bde/attachment.html>