[PATCH] D102817: [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 30 02:58:59 PDT 2022


efriedma added inline comments.


================
Comment at: llvm/lib/CodeGen/BranchFolding.cpp:1207
     // If it is dead, remove it.
-    if (MBB->pred_empty()) {
+    if (MBB->pred_empty() && !MBB->hasAddressTaken()) {
       RemoveDeadBlock(MBB);
----------------
efriedma wrote:
> tentzen wrote:
> > efriedma wrote:
> > > tentzen wrote:
> > > > JosephTremoulet wrote:
> > > > > do the tests (either the new ones you're adding or existing ones) exercise the case where we need this?
> > > > Yes, the **Dtor one.  it's how we found this bug. thanks.
> > > We intentionally do not check for `hasAddressTaken()` here. Even if something refers to the address, it's not legal to branch to the block, so it can't contain any useful instructions.  AsmPrinter::emitFunctionBody has some code to print a fake label if there's a reference to an erased block.
> > Hmm, really? there are at least three places where hasAddressTaken() is used to guard optimizations in the same pass (BranchFolding.cpp).  why is OptimizeBranches() special?
> The other places in the pass are checking hasAddressTaken() to try to figure out if they can rewrite the predecessors.  Here, we can trivially rewrite all the predecessors: there aren't any.
I guess given that we have references to an MBB stored in FuncInfo, we can't really delete the block.  So I guess this is right, sort of.  But it would still be undefined behavior to jump to it, so I'm not sure this really solves whatever problem you ran into.


================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2916
+      if (CleanupMBB) // a block referenced by EH table
+        CleanupMBB->setHasAddressTaken(); // so dtor-funclet not removed by opts
       break;
----------------
tentzen wrote:
> efriedma wrote:
> > tentzen wrote:
> > > efriedma wrote:
> > > > tentzen wrote:
> > > > > efriedma wrote:
> > > > > > tentzen wrote:
> > > > > > > efriedma wrote:
> > > > > > > > tentzen wrote:
> > > > > > > > > efriedma wrote:
> > > > > > > > > > I assume this is trying to trick optimizations into avoiding optimizations on the cleanup block if the __try involves multiple basic blocks.
> > > > > > > > > > 
> > > > > > > > > > Have you considered trying to fix the control flow graph?  I mean, I understand the difficulty of expressing the control flow at the IR level, given we don't have IR versions of "load" and "store" with an unwind edge, but those concerns don't really apply in the same way in a MachineFunction.  You can mark the cleanup block as a successor of any block that can throw an exception, without explicitly marking up the control flow on each individual instruction.  (This is how exceptions normally work.)
> > > > > > > > > The Cleanup block is turned into a dtor funclet at the end. And the beginning address of that funclet will be recorded in EH table in .xdata section. So it's really address-taken by definition. If you verify this by checking Linker relocation records or the EH table in .asm file.  IMO this flag should be set regardless of the SEH feature.
> > > > > > > > That's not what "address-taken" is supposed to mean.
> > > > > > > > 
> > > > > > > > It's supposed to be equivalent to BasicBlock::hasAddressTaken, which means that the block is the target of a blockaddress expression (used for indirectbr).  Primarily, this restricts certain optimizations because in general, we can't split a critical edge from an indirectbr.
> > > > > > > > 
> > > > > > > > hasAddressTaken() is not a general "will the address of the block eventually be emitted somewhere in the assembly".  That can happen for a variety of unrelated reasons, including exception handling, debug info, jump tables, and branch lowering.
> > > > > > > I'm not sure about your interpretation.  the comment of this flag from the header is:
> > > > > > >   
> > > > > > >   /// Returns true if there are any uses of this basic block other than
> > > > > > >   /// direct branches, switches, etc. to it.
> > > > > > >   bool hasAddressTaken() const {
> > > > > > >     return getBasicBlockBits().BlockAddressRefCount != 0;
> > > > > > >   }
> > > > > > > it does include "switch" which I think refers to jump-table.
> > > > > > > Besides, the dtor-funclet in EH table is invoked via indirectbr/indirect-call.  Does not that also meet your interpretation?
> > > > > > > 
> > > > > > "other than direct branches, switches, etc." means MachineBasicBlock terminators, exception pads, and jump tables.  Anything which is *not* one of those, we mark up, because target-independent code can't analyze it.  But in almost all cases, the above list covers basically everything except indirectbr until you get very late in the compile process.
> > > > > > 
> > > > > > If we do end up performing some transform that materializes the address of a block outside of the above cases, it could make sense to mark a block address-taken.  (Some target-specific code does this.  Not sure if that code is really doing it correctly, but none of those calls are relevant to this patch.)  But we should not be preemptively marking basic block address-taken: we should wait until such a transform is actually performed.
> > > > > > 
> > > > > > For a dtor-funclet, we don't actually "take the address" of the funclet until the AsmPrinter, when we compute the exception tables.  We could add an address-taken marking at that point... but nothing would check it that late in the pipeline.
> > > > > This is exactly the case that a target-independent code (like BranchFolding.cpp) must be aware of it. 
> > > > > The dtor-funclet cleanup block is genuinely address-taken the moment it's created.
> > > > > Yes, it's not emitted until AsmPrinter. but it does not mean the block is not address-taken until AsmPrinter. 
> > > > > 
> > > > There's not really any point to arguing what the person who originally wrote the comment meant. Sorry, I shouldn't have tried to take the discussion in that direction.
> > > > 
> > > > But I still think this is confusing: there's multiple different use-cases smashed together into one boolean.
> > > > 
> > > > How about this: we split up isBlockAddressTarget(), isAsyncSEHTarget(), and maybe isMachineLabelUsed() for the target-specific stuff that messes with MachineBasicBlock labels in weird ways.  And then as a transitional measure hasAddressTaken() checks all of them, maybe?  Really, I expect most of the relevant places that would check hasAddressTaken(), or something like that.
> > > > 
> > > > Then we can go through later and refine various places to check exactly what they need.  I suspect most places don't actually need all three.
> > > I feel doing that is over-engineering and I don't see much value out of it.
> > > 
> > > note that in addition to BasicBlock.h, it's also commented in MachineBasicBlock.h :
> > > 
> > >   /// Test whether this block is **potentially the target of an indirect branch.**
> > >   bool hasAddressTaken() const { return AddressTaken; }
> > > 
> > > Also, some present usages are far from what you described. 
> > > See the usages in RetpolineThunkInserter where CaptureSpec and CallTarget blocks are not even address-taken.
> > > 
> > > The usage of hasAddressTaken() in this revision is much more appropriate and legitimate.
> > Looked a bit more.  It turns out there's an actual bug here: mixing up the different markings could confuse the AsmPrinter if we mix together the different bits.  Posted D124697.
> > 
> > With that, the "MBBLabelUsed" is probably close enough to what you need, if we need something.  I guess we don't really need to distinguish between the SEH uses and other target-specific uses.
> > 
> > I'm still not sure where, exactly, the MBB is being stored that necessitates this.  If I look at an MIR dump, the only place the EHPad shows up is as a successor of the block that can throw an exception. And as far as I can tell, none of the side-tables in WinEHFuncInfo contain any references to MBBs.
> The EH table is named $stateUnwindMap$**, the 'state' and 'action' map table.  It is stored in $cppxdata$** that is the primary EH table passed to _CxxFrameHandler3/4 by __ehhandler$**.  Check the new test case windows-seh-EHa-cpp-Dtors01.ll.
> LLVM code that emits it is in WinException.cpp:
> 
>   // UnwindMapEntry {
>   //   int32_t ToState;
>   //   void  (*Action)();
>   // };
>   if (UnwindMapXData) {
>     OS.emitLabel(UnwindMapXData);
>     for (const CxxUnwindMapEntry &UME : FuncInfo.CxxUnwindMap) {
>       MCSymbol *CleanupSym =
>           getMCSymbolForMBB(Asm, UME.Cleanup.dyn_cast<MachineBasicBlock *>());
>       AddComment("ToState");
>       OS.emitInt32(UME.ToState);
> 
>       AddComment("Action");
>       OS.emitValue(create32bitRef(CleanupSym), 4);
>     }
> 
> note getMCSymbolForMBB() that refers to the ehcleanup block we were talking about here.
I was more concerned about the other end of it... but looking into it, I guess FunctionLoweringInfo::set() is actually where we end up computing the mapping.  So we need to ensure that the FuncInfo tables stay consistent through the entirety of codegen.  Which seems like an extreme restriction, but I guess it's unlikely to matter that much in practice.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102817/new/

https://reviews.llvm.org/D102817



More information about the llvm-commits mailing list