[PATCH] D41723: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", one of the two halves of Spectre.
Chandler Carruth via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 4 15:17:58 PST 2018
On Thu, Jan 4, 2018 at 5:54 PM Rui Ueyama via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
> On Fri, Jan 5, 2018 at 7:16 AM, Chandler Carruth via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
>> chandlerc marked an inline comment as done.
>> chandlerc added a comment.
>>
>> In https://reviews.llvm.org/D41723#967580, @jyknight wrote:
>>
>> > In https://reviews.llvm.org/D41723#967554, @chandlerc wrote:
>> >
>> > > > That should be easy to support -- just don't do the thunk-emission
>> -- but it does then mean needing to standardize on the names and semantics
>> of the required thunks.
>> > >
>> > > I don't think this is true. We need to document LLVM's semantics and
>> the thunks required. If they match GCC's, cool. If not and a user provides a
>> flag to request an external thunk, then they have to give LLVM one that
>> matches LLVM's semantics. Since they control the name, they can even have
>> thunks with each compiler's semantics and different names. While in
>> practice, I expect the semantics to match, I don't think we should be in
>> the business of forcing this into the ABI. We already have waaaay too much
>> in the ABI. If we come up with a better way to do retpoline in the future,
>> we should rapidly switch to that without needing to coordinate about names
>> and semantics across compilers.
>> >
>> >
>> > Right, I shouldn't have said "standardize", I did actually mean
>> "document".
>> >
>> > >> And it would be good if the same function names were used as in GCC.
>> Fine to do as a followup patch, but maybe at least have an idea what the
>> command-line UI will be for it.
>> > >
>> > > I strongly disagree here. Again, this should *not* be part of the ABI
>> and it should not be something that we have to engineer to match exactly
>> with other compilers.
>> >
>> > No, I agree with you, it does not _need_ to match. However, if the
>> semantics are the same as GCC's thunks (which they do appear to be), it
>> would be _good_ to coordinate with each other, simply in order to make
>> users' lives easier. Which is the same reason to coordinate the
>> command-line UI. Neither are required, but both would be good.
>> >
>> > > We should even be free to add `__llvm_retpoline2_foo()` thunks in the
>> future with new semantics with zero compatibility concerns.
>> >
>> > It wouldn't have _zero_ concerns due to externally-provided thunks --
>> they'll need to provide new ones to use the new compiler version. But
>> probably that's okay.
>>
>>
>> Sure, sounds like we're generally in agreement here. I'll take a stab at
>> supporting external thunks and skipping emission; shouldn't be too hard.
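>>
>> For reference, the thunk shape I have in mind is roughly the following
>> (a sketch of the r11 variant only; the label names are illustrative, and
>> it uses the same pause-loop as the rest of this patch):
>>
>> ```
>> __llvm_retpoline_r11:
>>   callq next            # pushes &loop as the return address
>> loop: pause             # speculation down the return path lands here
>>   jmp loop              # ...and spins harmlessly until squashed
>> next: movq %r11, (%rsp) # overwrite the return address with the target
>>   retq                  # architecturally jumps to the address in %r11
>> ```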
>>
>>
>>
>> ================
>> Comment at: lld/ELF/Arch/X86.cpp:491
>> + 0x83, 0xc4, 0x04, // next: add $4, %esp
>> + 0x87, 0x04, 0x24, // xchg %eax, (%esp)
>> + 0xc3, // ret
>> ----------------
>> ruiu wrote:
>> > rnk wrote:
>> > > chandlerc wrote:
>> > > > Does it make sense to use something like the `pushl` sequence Reid
>> came up with here? In the non-PLT case it looks like:
>> > > >
>> > > > ```
>> > > > addl $4, %esp
>> > > > pushl 4(%esp)
>> > > > pushl 4(%esp)
>> > > > popl 8(%esp)
>> > > > popl (%esp)
>> > > > ```
>> > > >
>> > > > So it would potentially need to be done a touch differently to work
>> here, but maybe something like that rather than `xchg`?
>> > > >
>> > > > Even if the alternative is a lot more instructions, the `xchg`
>> instruction is a locked instruction on x86 and so this will actually create
>> synchronization overhead on the cache line of the top of the stack.
>> > > This is a real concern. I checked the Intel manual, and it says:
>> > >
>> > > > Exchanges the contents of the destination (first) and source
>> (second) operands. The operands can be two general purpose
>> > > > registers or a register and a memory location. If a memory operand
>> is referenced, the processor’s locking
>> > > > protocol is automatically implemented for the duration of the
>> exchange operation, regardless of the presence or
>> > > > absence of the LOCK prefix or of the value of the IOPL.
>> > >
>> > > One question is, do we want to try to avoid `PUSH/POP MEM`
>> instructions? LLVM has x86 subtarget features that suggest that these
>> instructions are slow on some CPU models.
>> > >
>> > > To avoid them completely, we could use this code sequence:
>> > > ```
>> > > movl %ecx, (%esp) # save ECX over useless RA
>> > > movl 4(%esp), %ecx # load original EAX to ECX
>> > > movl %eax, 4(%esp) # store callee over saved EAX
>> > > movl %ecx, %eax # restore EAX
>> > > popl %ecx # restore ECX
>> > > retl # return to callee
>> > > ```
>> > >
>> > > When it comes down to it, this just implements `xchg` with a scratch
>> register.
>> > >
>> > > On reflection, I think the code sequence above is the best we can do.
>> The PUSH sequence you describe is 8 memory operations vs 4 if we use ECX as
>> scratch.
>> > I didn't know that xchg is so slow. There's no reason not to use
>> push/pop instructions to swap a word at the stack top and a register. Since
>> this is a PLT header (and not a PLT entry), the size of the instructions
>> doesn't really matter. Do you want me to update my patch?
>> Yeah, if you can email me a delta on top of this, I'll fold it in. Thanks!
>>
>> (Suggesting that in part to make backports a bit easier...)
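>>
>> (For concreteness -- and this is just a sketch of what I'd expect the
>> folded-in tail to look like -- the `add $4, %esp` goes away too, since
>> the dead return-address slot is reused as the %ecx spill:)
>>
>> ```
>> next: movl %ecx, (%esp) # save %ecx over the dead return address
>>   movl 4(%esp), %ecx    # load the original %eax into %ecx
>>   movl %eax, 4(%esp)    # store the branch target over the saved %eax
>>   movl %ecx, %eax       # restore %eax
>>   popl %ecx             # restore %ecx
>>   retl                  # return to the branch target
>> ```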
>>
>>
>> ================
>> Comment at: lld/ELF/Arch/X86_64.cpp:517-525
>> + 0x4c, 0x8b, 0x1d, 0, 0, 0, 0, // mov foo at GOTPLT(%rip), %r11
>> + 0xe8, 0x04, 0x00, 0x00, 0x00, // callq next
>> + 0xf3, 0x90, // loop: pause
>> + 0xeb, 0xfc, // jmp loop; .align 16
>> + 0x4c, 0x89, 0x1c, 0x24, // next: mov %r11, (%rsp)
>> + 0xc3, // ret
>> + 0x68, 0, 0, 0, 0, // pushq <relocation index>
>> ----------------
>> ruiu wrote:
>> > Chandler,
>> >
>> > I also noticed we can improve instructions here. We can use the
>> following instructions instead so that the jump target to lazy-resolve the
>> PLT is aligned to 16 bytes. I can make a change now if you want.
>> >
>> > mov foo at GOTPLT(%rip), %r11
>> > callq next
>> > loop: pause
>> > jmp plt+32; .align 16
>> > pushq <relocation index> // lazy-resolve a PLT entry
>> > jmpq plt[0]
>> >
>> Will this break the 16-byte property when combined with -z now? That
>> seems more relevant. Given that the lazy-binding is intended to only run
>> once, I don't think it's worth burning much space to speed it up.
>
>
> No, since for `-z now`, we use a completely different code sequence for the
> PLT entry, and the new code sequence shouldn't be larger than the current
> one.
>
Maybe I don't follow, but if it doesn't change the size, great.
> I'm not sure how important it is for the retpoline target to be aligned
> to 16 bytes, though.
>
It is useful to align the retpoline target if it is hot. Lazy binding
shouldn't be hot.
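
For the record, here is the -z now shape I'd expect (a sketch only; the
speculation trap can live once in the PLT header, and each entry becomes a
load plus a direct jump with no lazy-binding tail at all):

```
# PLT header, shared by every entry:
        callq next
loop:   pause
        jmp loop
next:   movq %r11, (%rsp)
        retq

# Each -z now PLT entry:
        movq foo@GOTPLT(%rip), %r11
        jmpq plt            # reuse the header's thunk
```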
>
>
>>
>> ================
>> Comment at: llvm/lib/CodeGen/IndirectBrExpandPass.cpp:113
>> + // index.
>> + if (BBIndex == -1) {
>> + BBIndex = BBs.size();
>> ----------------
>> efriedma wrote:
>> > blockaddresses are uniqued, so no block should ever have more than one
>> blockaddress user. So this should probably be an assertion.
>> I just didn't want to hard-code that assumption, but I can if you prefer.
>>
>>
>> ================
>> Comment at: llvm/lib/CodeGen/IndirectBrExpandPass.cpp:134
>> + // Now rewrite each indirectbr to cast its loaded pointer to an
>> integer and
>> + // switch on it using the integer map from above.
>> + for (auto *IBr : IndirectBrs) {
>> ----------------
>> sanjoy wrote:
>> > Do we care about inline assembly here? The langref says "Finally, some
>> targets may provide defined semantics when using the value as the operand
>> to an inline assembly, but that is target specific."
>> I mean, yes, but also no. ;]
>>
>> It would be nice to maybe preserve inline asm uses of blockaddr and not
>> any others. And then force people to not rinse their blockaddr usage
>> through inline asm and mix that with `-mretpoline`. That would allow the
>> common usage I'm aware of to remain (showing label addresses in crash dumps
>> in things like kernels) and really allow almost any usage wholly contained
>> within inline asm to continue working perfectly.
>>
>> But it seemed reasonable for a follow-up. That said, maybe it's not too
>> complex to add now...
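>>
>> (For anyone following along, the rewrite itself is straightforward -- a
>> rough sketch with illustrative names: each blockaddress constant becomes
>> a small integer id, and the `indirectbr` becomes a `switch` on that id:)
>>
>> ```
>> ; before
>>   indirectbr i8* %target, [label %bb1, label %bb2]
>>
>> ; after: blockaddress constants are rewritten to integer ids 1, 2, ...
>>   %id = ptrtoint i8* %target to i32
>>   switch i32 %id, label %bb1 [ i32 2, label %bb2 ]
>> ```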
>>
>>
>> ================
>> Comment at: llvm/lib/Target/X86/X86InstrCompiler.td:1160
>> (TCRETURNdi tglobaladdr:$dst, imm:$off)>,
>> - Requires<[NotLP64]>;
>> + Requires<[NotLP64, NotUseRetpoline]>;
>>
>> ----------------
>> AndreiGrischenko wrote:
>> > Hi Chandler,
>> > Do you really want to use "NotUseRetpoline" here? It will match
>> RETPOLINE_TCRETURN32 then, even when it is not an indirect branch.
>> > I guess the following test case will crash llc.
>> >
>> > ```
>> > target triple = "i386-unknown-linux-gnu"
>> >
>> > define void @FOO() {
>> > ret void
>> > }
>> >
>> > define void @FOO1() {
>> > entry:
>> > tail call void @FOO()
>> > ret void
>> > }
>> > ```
>> >
>> Reid, could you take a look?
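>>
>> (For reference, the expected lowering for this test case -- a direct
>> tail call has an immediate target, so no thunk is needed and it should
>> still emit a plain direct jump:)
>>
>> ```
>> FOO1:
>>   jmp FOO   # direct tail call: no indirect branch, no retpoline needed
>> ```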
>>
>>
>> ================
>> Comment at: llvm/lib/Target/X86/X86RetpolineThunks.cpp:121
>> +
>> + createThunk(M, "r11", X86::R11);
>> + } else {
>> ----------------
>> AndreiGrischenko wrote:
>> > You will create a thunk function even if it is not necessary? For
>> example, for `tail call void @FOO()`?
>> >
>> >
>> There is a FIXME above about being more careful when creating these.
>>
>>
>> https://reviews.llvm.org/D41723
>>
>>
>>