[llvm-dev] Potential missed optimisation with SEH funclets

Thu Jun 27 11:39:37 PDT 2019

I’d like to work on improving this, and I’ve got a few ideas thanks to your pointers. However there’s one issue that I can’t seem to work out.

The funclets are treated as save and restore blocks for the associated function, which means that they’ll push/pop every callee-saved register that the associated function uses, even if the funclets themselves don’t use them. I tried fixing this with some custom logic in X86FrameLowering::[spill/restore]CalleeSavedRegisters, but I couldn’t find a good way to determine which registers the funclet’s blocks actually use (without iterating over each instruction).
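
To make the per-instruction scan concrete, here is a minimal, self-contained sketch of what I mean. It deliberately uses toy stand-in types rather than real LLVM classes: Instr, Block and clobberedCalleeSaved are hypothetical names for illustration only, and registers are plain strings. A real version would presumably also have to account for register aliases and call clobber masks, which this sketch ignores.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (NOT an LLVM type): it only
// records the names of the registers the instruction writes.
struct Instr {
  std::vector<std::string> defs;
};

// Toy stand-in for a basic block (NOT an LLVM type).
struct Block {
  std::vector<Instr> instrs;
};

// Walk every instruction in the funclet's blocks and collect the
// callee-saved registers that are actually written. Under the assumptions
// of this sketch, only these would need a push/pop in the funclet.
std::set<std::string>
clobberedCalleeSaved(const std::vector<Block> &funcletBlocks,
                     const std::set<std::string> &calleeSaved) {
  std::set<std::string> result;
  for (const Block &block : funcletBlocks)
    for (const Instr &instr : block.instrs)
      for (const std::string &reg : instr.defs)
        if (calleeSaved.count(reg))
          result.insert(reg);
  return result;
}
```

With this model, a funclet whose blocks only write rcx would yield an empty set, so no callee-saved pushes or pops would be needed for it. The cost is the full walk over every instruction, which is what I was hoping to avoid.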

Is there a better way to approach this?

> On 26 Jun 2019, at 21:17, Reid Kleckner <rnk at google.com> wrote:
> 
> 
> Yes, not much effort has been applied to optimizing Windows exception handling. We were primarily concerned with making it correct, and improving it hasn't been a priority. You can follow the code path through X86FrameLowering::emitPrologue with IsFunclet=true and see that it mechanically emits all the extra instructions mentioned above without any logic to skip such steps when not necessary.
> 
> However, while the mid-level representation we chose makes it hard to write these types of micro-level code quality optimizations, it allows the optimizers to do a variety of fancy things like heap to stack promotion on unique_ptr in the presence of exceptional control flow.
> 
>> On Tue, Jun 25, 2019 at 4:08 AM Hamza Sood via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> I’ve been experimenting with SEH handling in LLVM, and it seems like the unwind funclets generated by LLVM are much larger than those generated by Microsoft’s CL compiler.
>> 
>> I used the following code as a test:
>> 
>> void test() {
>>   MyClass x;
>>   externalFunction();
>> }
>> 
>> Compiling with CL, the unwind funclet that destroys ‘x’ is just two lines of asm:
>> 
>> lea rcx, QWORD PTR x$[rdx]
>> jmp ??1MyClass@@QEAA@XZ
>> 
>> However when compiling with clang-cl, it seems like it sets up an entire function frame just for the destructor call:
>> 
>> mov qword ptr [rsp + 16], rdx
>> push rbp
>> .seh_pushreg 5
>> sub rsp, 32
>> .seh_stackalloc 32
>> lea rbp, [rdx + 48]
>> .seh_endprologue
>> lea rcx, [rbp - 16]
>> call "??1MyClass@@QEAA@XZ"
>> nop
>> add rsp, 32
>> pop rbp
>> ret
>> 
>> Both were compiled with “/c /O2 /MD /EHsc”
>> 
>> Is LLVM missing a major optimisation here?