[PATCH] D80344: [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
John McCall via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 21 21:35:37 PDT 2020
rjmccall added a comment.
In D80344#2286666 <https://reviews.llvm.org/D80344#2286666>, @tentzen wrote:
> Thank you for prompt reply again.
>
>> [rjmccall] And I agree with him that the potential benefits are substantial. I'm just saying that one barely-trafficked RFC thread is not evidence of a community consensus.
>
> OK, thanks. it's good to know you are also supportive in adding this feature.
>
>> [rjmccall] As I understand it, you are creating implicit edges within the function to the active SEH handler (potentially part of the same function) from arbitrary places within covered basic blocks, turning basic blocks from a single-entry single-(internal-)exit representation to a single-entry multiple-exit representation. That is a huge change. How do you expect LLVM transformations to preserve or merge this information across blocks? How do you expect transformations to not move code between blocks in problematic ways, or reorder code within blocks in ways that will suddenly be visible to the optimizer? These are the standard objections to supporting features like this, that they have basically unspecified behavior and appear to work by good intentions and happy thoughts.
>
> There is absolutely NO explicit edge added in this design.
I didn't say there was. I said that there was an *implicit* edge: from the memory access to the handler. It's precisely because that edge is implicit that it's problematic for analysis and optimization.
> For LLVM transformations/optimizations, the only constrain added is **volatile** attribute on memory operations within a SEH region.
When is this attribute applied? Because you can't just apply it at a single point; you also need to mark operations when they're copied/inlined into the region. We once had a similar attempt at handling exceptions with a setjmp-based ABI (not the "SJLJ" exceptions supported by the backends, something frontend-centric) that had persistent problems because of poor modeling in the IR, which it feels like this is doomed to repeat.
>> [rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?
>
> Yes. this is just the first patch. there will be second patch to land LLVM code-gen part in, then 3rd patch to add some test cases (Ex, xcpt4u.c from MSVC). Without option -EHa, the patch is a zero-impact change for now.
Do you expect to maintain a CI bot that runs the full test suite continuously as changes go into LLVM? Because this sort of thing is extremely easy to break.
>> [rjmccall] There is a major difference between HW exceptions and non-HW exceptions, which is that non-HW exceptions are observed only at call sites. It is quite easy to write code in IRGen that works because the code emitted at a certain point isn't totally arbitrary
>
> still not completely sure what you are referring to. let me guess. Are you concerned that the Cleanup mechanism of a SEH region may not be setup properly if there is no call instruction in the region? This is actually a good question. The answer is that the presence of seh.try.begin() intrinsic will ensure that a Cleanup stack is established properly because there is an EH edge built for seh.try.begin() automatically. hope this answer your concern here.
Does this happen whenever there are changes to the active cleanups, and not just when entering the scope? Since you're not modeling the control edge from the access to the handler, IRGen isn't naturally going to create a cleanup just because there's a memory access in a scope, and even if it did, the cleanup would appear to be unreachable.
That's really a major part of my concern, that the test suite might be passing because of some common structure to the tests which minimizes the impact of the poor modeling in IR. IR modeling inadequacies are not going to cause obvious problems if in practice the SEH scope is always entered in a function that doesn't do much inside the scope besides call a function, and then the trap happens with the callee. In fact, I would be a lot more confident in this feature if you actually forced the IR to model it that way, so that there's nothing but a by-fiat unoptimizable invoke within the scope. You could extract the function in the frontend and then, if necessary, inline it in a very late pass after any IR-level optimizations, although that would leave me still concerned about LLVM CodeGen's transformations.
But even doing that wouldn't help with code like the following, where the only operation within the scope of s2 is a memory access:
void test() {
std::string s1 = <blah>;
int *ip = get_invalid_ptr();
std::string s2 = <blah>;
*ip = 0;
}
>> [rjmccall] Yes, but it's not known well to all the compiler developers who are reading and maintaining the Clang codebase. I'm not saying you should rename the command-line flag, I'm asking that you not just say "EHa" in comments and identifiers.
>
> Ok, no problem. more comments will be added.
Comments and identifiers. So, for example, the intrinsic should not be called `eha_scope_begin`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80344/new/
https://reviews.llvm.org/D80344
More information about the llvm-commits
mailing list