[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Mon Aug 8 07:41:40 PDT 2016

> On 8 Aug 2016, at 23:27, Serge Rogatch <serge.rogatch at gmail.com> wrote:
> 
> I think that 32-bit systems (especially ARM) may be short on memory so doubling the size of the table containing (potentially) all the functions may give a tangible overhead. I would even align the entries to 4 bytes (so 12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24-bytes per entry) on 64-bit platforms, to improve CPU cache hits. What do you think?

It should work, but I'm a little wary about painting ourselves into a corner -- for example, I'm already designing some extensions to this table to represent other kinds of information (that might fit into what is currently padding, or use some more bits of the bytes in the entries). I might need to introduce some sort of versioning into this table so that tools reading the same table (not necessarily the runtime) can determine what kinds of information will be available in the entries. You're right that 14 bytes of padding is maybe a little excessive, but I'm being very conservative here. :D
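To make the sizes under discussion concrete, here is a minimal sketch of the three layouts. The struct and field names are hypothetical and not taken from the XRay sources; the sketch assumes each entry holds two pointer-sized fields plus two one-byte fields, which is consistent with the 12-byte, 24-byte and "14 bytes of padding" figures mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts for illustration only.

// Fixed-size entry: two 64-bit fields, two flag bytes, 14 bytes of padding.
struct PaddedEntry {
  uint64_t Address;
  uint64_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
  unsigned char Padding[14]; // room for future extensions
};

// Compact entry for 64-bit targets, aligned to 8 bytes.
struct alignas(8) CompactEntry64 {
  uint64_t Address;
  uint64_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
};

// Compact entry for 32-bit targets, aligned to 4 bytes.
struct alignas(4) CompactEntry32 {
  uint32_t Address;
  uint32_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
};

static_assert(sizeof(PaddedEntry) == 32, "fixed-size entry is 32 bytes");
static_assert(sizeof(CompactEntry64) == 24, "18 bytes rounded up to 8-byte alignment");
static_assert(sizeof(CompactEntry32) == 12, "10 bytes rounded up to 4-byte alignment");

// Total table size in bytes for a given number of entries.
inline std::size_t tableBytes(std::size_t Entries, std::size_t EntrySize) {
  assert(EntrySize == 0 || Entries <= SIZE_MAX / EntrySize); // no overflow
  return Entries * EntrySize;
}
```

Under these assumptions, a table of 100,000 entries takes 3,200,000 bytes with the padded layout, 2,400,000 bytes with the compact 64-bit layout and 1,200,000 bytes with the compact 32-bit layout.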

Basically the trade-off is between binary/resident size and tooling support.

For 32-bit systems I think it's possible to have smaller entries at the cost of making the tooling and runtime a bit more complex -- i.e. there's going to be a special implementation for x86 32-bit and 64-bit, ARM 32-bit and 64-bit, etc. Then we need to think about the tools external to the runtime that will access the same table. We can probably write tools that extract the table from binaries in COFF, ELF, and Mach-O, then turn those into a canonicalised instrumentation map. We don't even deal with endianness yet (reading an instrumentation map for a 64-bit binary from a 32-bit system -- what order do the bytes come in, and how should the 32-bit system interpret those values?). These issues start expanding the tooling support matrix.
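As a rough illustration of what a canonicalising tool has to do, the sketch below decodes raw little-endian entries byte by byte, so the result does not depend on the word size or byte order of the machine running the tool. The record layout (address, function, one kind byte) and all names here are assumptions made for the example, not the actual XRay format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical canonical form: fields are always widened to 64 bits.
struct CanonicalEntry {
  uint64_t Address;
  uint64_t Function;
  uint8_t Kind;
};

// Read a little-endian integer of Width bytes, independent of host byte order.
inline uint64_t readLE(const unsigned char *P, std::size_t Width) {
  assert(Width <= 8);
  uint64_t V = 0;
  for (std::size_t I = Width; I > 0; --I)
    V = (V << 8) | P[I - 1]; // most significant byte is stored last
  return V;
}

// Decode one entry whose address and function fields are PtrWidth bytes each,
// followed by a single kind byte. PtrWidth is 4 or 8 in this sketch.
inline CanonicalEntry decodeLEEntry(const unsigned char *Bytes,
                                    std::size_t Size, std::size_t PtrWidth) {
  assert(PtrWidth == 4 || PtrWidth == 8);
  assert(Size >= 2 * PtrWidth + 1);
  CanonicalEntry E;
  E.Address = readLE(Bytes, PtrWidth);
  E.Function = readLE(Bytes + PtrWidth, PtrWidth);
  E.Kind = Bytes[2 * PtrWidth];
  return E;
}
```

A big-endian target would need a matching reader that walks the bytes in the opposite order, which is one way the support matrix grows: one decoder per combination of pointer width and byte order.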

Maybe that's inevitable, depending on which platforms the members of the LLVM community would like to see XRay available on. :D

As far as CPU cache hits/misses are concerned, I personally don't think it's that crucial to get the table packed so we utilise the cache more -- the patching code runs through this table sequentially, and the cost is actually in the system calls making code pages writeable (and marking the pages dirty and causing all sorts of more important issues). I think cache hits/misses are the least of the problems here. ;)
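A small sketch of why the system calls dominate: walking a sorted list of patch addresses, every time the walk crosses into a new page the permissions of that page have to change. The function below only counts those crossings. It assumes one permission change per distinct page, which is a simplification for illustration and not a description of how the XRay runtime batches its calls; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the distinct pages touched by a sequential walk over sorted addresses.
// Each distinct page stands for one "make this page writeable" request.
inline std::size_t countPagesTouched(const std::vector<uint64_t> &SortedAddrs,
                                     uint64_t PageSize) {
  // PageSize must be a non-zero power of two for the mask below to work.
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0);
  std::size_t Pages = 0;
  uint64_t LastPage = 0;
  bool HaveLast = false;
  for (uint64_t Addr : SortedAddrs) {
    uint64_t Page = Addr & ~(PageSize - 1); // round down to the page start
    if (!HaveLast || Page != LastPage) {
      ++Pages;
      LastPage = Page;
      HaveLast = true;
    }
  }
  return Pages;
}
```

The count depends only on where the instrumented code lives, not on how wide the table entries are, so shrinking the entries changes the table's footprint but not the number of page permission changes.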

I am open to suggestions here too, so I'd be happy to shave a few more bytes off if that means the impact of having XRay tables in the binary is minimised.

I suppose I should detail a bit more what other things will be coming up, which should help with the overall design direction here. I'll update within the week about some other things we're looking to bring upstream with more details as soon as I'm done fleshing those out. :)

Stay tuned!

Cheers

-- Dean