[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations
Rafael Auler via llvm-dev
llvm-dev at lists.llvm.org
Wed Oct 2 15:18:18 PDT 2019
You’re correct, except that, in Propeller, CFI duplication happens for every basic block as it operates with the conservative assumption that a block can be put anywhere by the linker. That’s a significant bloat that is not cleaned up later. So, during link time, if N blocks from the same function are contiguous in the final layout, as it should happen most of the time for any sane BB order, we would have several FDEs for a region that only needs one. The bloat goes to the final binary (a lot more FDEs, specifically, one FDE per basic block).
BOLT will only split a function in two parts, and only if it has profile. Most of the time, a function is not split. It also has an option not to split at all. For internally reordered basic blocks of a given function, it has CFI deduplication logic (it will interpret and build the CFI states for each block and rewrite the CFIs in a way that uses the minimum number of instructions to encode the states for each block).
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of James Y Knight via llvm-dev <llvm-dev at lists.llvm.org>
Reply-To: James Y Knight <jyknight at google.com>
Date: Wednesday, October 2, 2019 at 1:59 PM
To: Maksim Panchenko <maks at fb.com>
Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations
I'm a bit confused by this subthread -- doesn't BOLT have the exact same CFI bloat issue? From my cursory reading of the propellor doc, the CFI duplication is _necessary_ to represent discontiguous functions, not anything particular to the way Propellor happens to generate those discontiguous functions.
And emitting discontiguous functions is a fundamental goal of this, right?
On Wed, Oct 2, 2019 at 4:25 PM Maksim Panchenko via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Thanks for clarifying. This means once you move to the next basic block (or any other basic
block in the function) you have to execute an entirely new set of CFI instructions
except for the common CIE part. While indeed this is not as bad, on average, the overall
active memory footprint will increase.
Creating one FDE per basic block means that .eh_frame_hdr, an allocatable section,
will be bloated too. This will increase the FDE lookup time. I don’t see .eh_frame_hdr
being mentioned in the proposal.
On 10/2/19, 12:20 PM, "Krzysztof Pszeniczny" <kpszeniczny at google.com<mailto:kpszeniczny at google.com>> wrote:
On Wed, Oct 2, 2019 at 8:41 PM Maksim Panchenko via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
*Pessimization/overhead for stack unwinding used by system-wide profilers and
for exception handling*
Larger CFI programs put an extra burden on unwinding at runtime as more CFI
(and thus native) instructions have to be executed. This will cause more
overhead for any profiler that records stack traces, and, as you correctly note
in the proposal, for any program that heavily uses exceptions.
The number of CFI instructions that have to be executed when unwinding any given stack stays the same. The CFI instructions for a function have to be duplicated in every basic block section, but when performing unwinding only one such a set is executed -- the copy for the current basic block. However, this copy contains precisely the same CFI instructions as the ones that would have to be executed if there were no basic block sections.
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev