[PATCH] D35928: [XRay][X86] Use a valid instruction for the synthetic reference.

Tue Aug 1 03:25:39 PDT 2017

dberris added a comment.

In https://reviews.llvm.org/D35928#826967, @chandlerc wrote:

> I'd like to revisit the goal here:
>
> Is this a performance fix by avoiding an unknown instruction that gets decoded in some cases? Or is there a correctness issue? (I think you mentioned something about linking having trouble w/ this?)

This is mainly for performance reasons, and speculative ones at that. It's an attempt to align whatever comes after the synthetic reference to 16 bytes, so I was looking for a way to do this with "valid" instructions.

> In https://reviews.llvm.org/D35928#822817, @dberris wrote:
> 
>> > I can't think of a better instruction if you need all the bits. movabsq is the only instruction that takes a 64-bit immediate. The only other option might be one of the loads into al/ax/eax/rax from a 64-bit absolute address.
>>
>> I see. I think writing to the scratch register from outside of the function bodies might be more "benign" if it ever gets executed (compared to a read into one of the other registers). Although I'm not a security expert, so maybe the tradeoff here isn't as clear-cut. :/
>>
>> Will the x86 pipelines be happier with a store to a register than a load of an address? Or put another way, is the cost of `movabsq <immediate>,%r10` higher if it gets speculatively executed compared to, say, a load instruction? Or should I not even worry about this because the odds of the instruction being decoded/executed are very small (since it's aligned to 16-byte boundaries away from the end of the function)?
> 
> 
> Emitting a load sounds really bad here. Honestly, even movabsq seems a bit bad if you're worried about performance.
> 
> To help with performance I would pad this out with nops that we know decode *fast* (as in, not into micro-ops). If you want to be defensive about ROP gadgets, I'd emit a sled of int3 instructions followed by whatever you want.
> 
> The interesting bit (as indicated above) is whether you can just guard the raw data with a nop/int3 sled, or if you actually need a valid instruction. If you *do* need a valid instruction, I'd like to understand why. We have a reasonable number of places that slap raw data above a function section...

I settled on an INT3 followed by a 7-byte sled, then the reference. This mitigates, as much as possible, the off chance that the synthetic reference gets decoded and speculatively executed.
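
To make the byte arithmetic concrete, here is a rough sketch of that layout, not the code in the patch. It assumes the reference is an 8-byte address, so that INT3 (1 byte) + sled (7 bytes) + reference (8 bytes) fills exactly one 16-byte unit. The helper name `buildSyntheticBlock`, the constants, and the choice of single-byte NOPs (0x90) as the sled filler are all hypothetical, for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constants for illustration; none are taken from the patch.
constexpr std::uint8_t kInt3 = 0xCC;         // x86 INT3 opcode (one byte)
constexpr std::uint8_t kAssumedSled = 0x90;  // assumption: single-byte NOP filler
constexpr std::size_t kSledSize = 7;
constexpr std::size_t kRefSize = sizeof(std::uint64_t);  // assumption: 8-byte reference
constexpr std::size_t kBlockSize = 1 + kSledSize + kRefSize;

static_assert(kBlockSize == 16, "INT3 + sled + reference should fill 16 bytes");

// Lays out: [INT3][7 sled bytes][8-byte reference].
std::array<std::uint8_t, kBlockSize> buildSyntheticBlock(
    std::uint64_t ref, std::uint8_t sled = kAssumedSled) {
  std::array<std::uint8_t, kBlockSize> block{};
  block[0] = kInt3;
  for (std::size_t i = 1; i <= kSledSize; ++i)
    block[i] = sled;
  // Copies the reference in host byte order (little-endian on x86-64).
  std::memcpy(block.data() + 1 + kSledSize, &ref, kRefSize);
  return block;
}
```

With this shape, a block that starts on a 16-byte boundary also ends on one, so whatever follows the reference stays aligned.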

Thanks, Chandler!

PTAL

https://reviews.llvm.org/D35928