[LLVMdev] Generating different assembly code for the same LLVM instruction depending on the metadata.

Tue Jun 28 02:50:12 PDT 2011

Hi LLVM devs,

Suppose I have an instrumentation pass that adds some code (say,
function calls) before some memory access instructions and marks those
calls with special metadata.
I want the compiler to lower the instrumentation code to a sequence of
no-ops when generating the object code.

For example, the assembly for the following code:
  void _instr();  // the instrumentation function

  void foo(int *x) {
    _instr();
    *x = *x + 1;
    _instr();
  }

should look like:

0000000000000000 <foo>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 83 ec 10          	sub    $0x10,%rsp
   8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
   c:	b8 00 00 00 00       	mov    $0x0,%eax
  11:	90                   	nop
  12:	90                   	nop
  13:	90                   	nop
  14:	90                   	nop
  15:	90                   	nop
  16:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  1a:	8b 00                	mov    (%rax),%eax
  1c:	8d 50 01             	lea    0x1(%rax),%edx
  1f:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  23:	89 10                	mov    %edx,(%rax)
  25:	b8 00 00 00 00       	mov    $0x0,%eax
  2a:	90                   	nop
  2b:	90                   	nop
  2c:	90                   	nop
  2d:	90                   	nop
  2e:	90                   	nop
  2f:	c9                   	leaveq
  30:	c3                   	retq

instead of:

0000000000000000 <foo>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 83 ec 10          	sub    $0x10,%rsp
   8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
   c:	b8 00 00 00 00       	mov    $0x0,%eax
  11:	e8 00 00 00 00       	callq  16 <foo+0x16>
  16:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  1a:	8b 00                	mov    (%rax),%eax
  1c:	8d 50 01             	lea    0x1(%rax),%edx
  1f:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  23:	89 10                	mov    %edx,(%rax)
  25:	b8 00 00 00 00       	mov    $0x0,%eax
  2a:	e8 00 00 00 00       	callq  2f <foo+0x2f>
  2f:	c9                   	leaveq
  30:	c3                   	retq

What's the easiest/best way to do so?

I'm considering the following approaches:
 -- use some uncommon DWARF tags to mark the instrumentation
instructions (to make sure they appear in the resulting binary) and
then post-process the .o file replacing the necessary bytes
 -- hack the code generator such that it generates different assembly
sequences depending on the metadata (am I right that it is not
currently supported?)
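
To make the first approach concrete, here is a minimal sketch of the
byte-replacement step, assuming the offsets of the marked call sites
have already been recovered (for example from the debug info). The
helper name nop_out_calls and its parameters are hypothetical. It only
rewrites the code bytes; in a relocatable .o file each call's
displacement also has a relocation entry, which would have to be
dropped or neutralized too, otherwise the linker would write into the
patched bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CALL_REL32_LEN = 5 };  /* opcode 0xE8 plus a 4-byte displacement */

/* Hypothetical helper: overwrite each 5-byte `call rel32` found at the given
 * offsets with five one-byte NOPs (0x90). Returns the number of patched sites.
 * Offsets that run past the buffer or do not start with 0xE8 are skipped. */
static size_t nop_out_calls(unsigned char *text, size_t text_len,
                            const size_t *call_offsets, size_t n_offsets)
{
    assert(text != NULL && call_offsets != NULL);

    size_t patched = 0;
    for (size_t i = 0; i < n_offsets; i++) {
        size_t off = call_offsets[i];
        if (off > text_len || text_len - off < CALL_REL32_LEN)
            continue;  /* the call would not fit inside the buffer */
        if (text[off] != 0xE8)
            continue;  /* not a near call, leave the bytes alone */
        memset(text + off, 0x90, CALL_REL32_LEN);
        patched++;
    }
    return patched;
}
```

For the foo() listing above, the call sites are at offsets 0x11 and
0x2a, so passing those two offsets would turn both callq instructions
into five nops each and leave every other byte unchanged.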

The goal I'm trying to accomplish is to make two versions of the code
that can be hot-swapped at runtime using a single mmap() call. This
would make it possible to turn the instrumentation on and off at
runtime.
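
A rough sketch of what such a swap could look like, assuming both
versions of the code are stored page-aligned in one file and have the
same layout. The helper name swap_code_version and its parameters are
hypothetical; the only real API used is mmap() with MAP_FIXED, which
replaces an existing mapping at the given address in one call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: remap the range [addr, addr + len) so that it shows
 * the bytes stored at `offset` in `fd`. MAP_FIXED discards the old mapping
 * and installs the new one in a single call. Both `addr` and `offset` must
 * be page-aligned. Returns 0 on success, -1 on error. */
static int swap_code_version(void *addr, size_t len, int fd, off_t offset,
                             int prot)
{
    assert(addr != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || (uintptr_t)addr % (uintptr_t)page != 0 ||
        offset % page != 0)
        return -1;

    void *p = mmap(addr, len, prot, MAP_PRIVATE | MAP_FIXED, fd, offset);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    return 0;
}
```

For real code pages, prot would be PROT_READ | PROT_EXEC. The sketch
ignores threads that are executing inside the range at the moment of
the swap, which a real implementation would have to deal with.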