[cfe-dev] Disable certain llvm optimizations at clang frontend
Y Song via cfe-dev
cfe-dev at lists.llvm.org
Tue Jun 23 23:13:44 PDT 2020
On Tue, Jun 23, 2020 at 8:09 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> On 6/23/20 3:37 PM, Y Song wrote:
> > On Tue, Jun 23, 2020 at 11:55 AM Hal Finkel <hfinkel at anl.gov> wrote:
> >>
> >> On 6/23/20 12:28 AM, Y Song wrote:
> >>> On Mon, Jun 22, 2020 at 3:42 PM Hal Finkel <hfinkel at anl.gov> wrote:
> >>>> On 6/21/20 2:30 AM, Y Song wrote:
> >>>>> On Fri, Jun 19, 2020 at 6:20 PM Hal Finkel <hfinkel at anl.gov> wrote:
> >>>>>> On 6/19/20 3:08 PM, Eli Friedman wrote:
> >>>>>>> (Reply inline)
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Y Song via cfe-
> >>>>>>>> dev
> >>>>>>>> Sent: Friday, June 19, 2020 11:40 AM
> >>>>>>>> To: Hal Finkel <hfinkel at anl.gov>
> >>>>>>>> Cc: Andrii Nakryiko <andrii.nakryiko at gmail.com>; Clang Dev <cfe-
> >>>>>>>> dev at lists.llvm.org>; Alexei Starovoitov <ast at kernel.org>
> >>>>>>>> Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
> >>>>>>>> frontend
> >>>>>>>>
> >>>>>>>> Just to be more specific about what transformations we want to disable:
> >>>>>>>> (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
> >>>>>>>> (https://reviews.llvm.org/D72787)
> >>>>>>>> This transformation creates a new variable "var.off" for the comparison
> >>>>>>>> but keeps using the original variable "var" later on. The kernel verifier
> >>>>>>>> does not have a refined range for "var" at its use site, and this may
> >>>>>>>> cause verification failure.
> >>>>>>>>
> >>>>>>>> To generalize, BPF prefers that a single variable is refined and then
> >>>>>>>> used later for verification. A new variable can be created and used for
> >>>>>>>> further value range refinement, but all later uses of the old variable
> >>>>>>>> should be replaced with the new variable.
> >>>>>>>> (2). Prevent speculative code motion
> >>>>>>>> (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
> >>>>>>>> If the code in the original program may not execute under certain
> >>>>>>>> conditions, we would like the code not to execute in the final byte code
> >>>>>>>> under the same conditions, regardless of whether it is safe to execute
> >>>>>>>> or not.
> >>>>>>>>
> >>>>>>>> I guess we could have two attributes here:
> >>>>>>>> - "refine-value-range-with-original-var" (false by default, true for BPF)
> >>>>>>>> - "no-speculative-code-motion" (false by default, true for BPF)
> >>>>>>> Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.
> >>>>>>>
> >>>>>>> If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely
> >>>>> bpf verifier is briefly described in the comments here:
> >>>>> https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38
> >>>>>
> >>>>> It is a path-sensitive verifier. It will go through all possible paths
> >>>>> to ensure all memory accesses are safe.
> >>>>> The verifier is able to carry refined info across function calls.
> >>>>> A condition may refine a value range, e.g.,
> >>>>>
> >>>>>   unsigned a = foo(); /* "a" is an unsigned int; it could be any
> >>>>>                          value, 0 <= "a" <= UINT_MAX */
> >>>>>   if (a < 128) {
> >>>>>     /* verifier concludes "a" range now is [0, 127]. */
> >>>>>     ... *(p + a) ... /* test whether *(p + [0, 127]) is legal or not */
> >>>>>   }
> >>>>>
> >>>>> But the verifier is not smart enough to do backtracking now, so if you
> >>>>> have code like
> >>>>>
> >>>>>   unsigned a = foo();
> >>>>>   b = a + 10;
> >>>>>   if (b < 128) {
> >>>>>     ... *(p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
> >>>>>   }
> >>>>>
> >>>>> the verifier may reject it, as (p + a) may be out of the legally
> >>>>> permitted memory range.
> >>>>>
> >>>>> Why hasn't the verifier implemented backtracking? There are a few reasons:
> >>>>> - It is actually not very common. As long as the program does not have
> >>>>> control flow like the above that triggers the particular instcombine
> >>>>> transformation, the compiler actually does a good job of avoiding a
> >>>>> second copy of the same variable. But when this does happen, it often
> >>>>> causes a lot of frustration for developers, as their code is perfectly
> >>>>> okay and they often do not know how to please the verifier.
> >>>>> - The verifier is already very complex. Adding generic backtracking
> >>>>> would add a lot of complexity. More state would need to be kept during
> >>>>> the verification process, which would require more kernel memory. The
> >>>>> verifier would become slower too.
> >>>>>
> >>>>> The same goes for speculative code motion. Yes, the current verifier may
> >>>>> reject a pointer arithmetic operation (say, pkt_ptr += UINT_MAX). But in
> >>>>> reality, this pkt_ptr += UINT_MAX may just be a speculative move, and
> >>>>> later on pkt_ptr = original_legal_pkt_ptr + 10 will assign a proper
> >>>>> value before the actual memory access. In this case, the verifier would
> >>>>> be wrong to reject the program.
> >>>>>
> >>>>> Not all speculative code motion causes problems. Some scalar speculative
> >>>>> code motion is totally fine. And if the speculative code motion stays
> >>>>> within the pointer's valid range, it is fine too.
> >>>>>
> >>>>> My original proposal to disable certain optimizations is a big hammer,
> >>>>> just to get things working so the developer can move on. I totally
> >>>>> agree that using a function attribute to disable certain specific
> >>>>> optimizations may happen to work for some cases, but it may not work
> >>>>> for all cases, since other passes may perform similar transformations.
> >>>> Thanks, this is helpful. One question: Is the information used by the
> >>>> verifier available at compile time? It seems like what you want here is
> >>> We have identified some compiler optimizations in instcombine and
> >>> simplifyCFG that may generate verifier-unfriendly code. Yes, we could do
> >>> IR analysis to identify these patterns; it may not be 100% reliable, but
> >>> it should be close.
> >>>
> >>> Completely modeling what the verifier does in llvm is complex. The
> >>> verifier interacts with kernel subsystems through helper functions, and
> >>> these helper functions have semantics the verifier makes use of.
> >>> Bringing kernel helpers into llvm does not sound like a good idea.
> >>> The verifier implements a path-sensitive verification framework.
> >>> Replicating that in llvm is probably not a good idea either, as there
> >>> are constant changes to the verifier and it would be very hard to keep
> >>> the two in sync. So the best approach in llvm is probably to identify
> >>> known IR patterns where verifier-unfriendly code may be generated and
> >>> prevent them from being generated (preferred), or undo them if already
> >>> generated.
> >>
> >> Honestly, I don't think that you're thinking about this in the
> >> most-useful way. Obviously, we strive for code reuse and reduced
> >> duplication. However, imagine if this were, not a VM with a
> >> software-programmed loader, but a hardware architecture. The verifier
> >> would be part of the hardware, and its semantics would be part of the
> >> hardware ISA. In this case, updates to the verifier would, without
> >> particular debate, necessitate changes to the compiler. When the
> >> hardware is updated, the compiler needs to be updated. Keeping the two
> >> in sync is just part of the normal compiler-development process. Will
> >> this be difficult? Maybe. But, as Eli mentioned, you can use
> >> conservative approximations where modeling the details doesn't matter
> >> too much. Regardless, at a high level, yes, you should duplicate the
> >> relevant parts of the verifier logic in the backend. You can then use
> >> that to move code that is correct only assuming speculation is
> >> side-effect free (consistent with LLVM's semantics) into the blocks with
> >> the eventual users (which is required by the verifier).
> >>
> >> The idea that you're going to disable particular optimizations to fix
> >> this problem is not robust or sound. Any number of other transformations
> >> could introduce similar issues. Are you going to constantly audit
> >> everything? The input source-language code could have the same problem.
> >> You can, of course, say that the source-language inputs are not really
> >> C/C++ code, but some similar-looking language with different semantics,
> >> but frankly, you'll be fighting an uphill battle the whole way. I
> >> understand that you already do this to some extent (e.g., by disallowing
> >> unbounded loops), but these semantics are far more pervasive (it seems
> >> to me, anyway, as qualitatively different). As you can probably tell, I
> >> really don't recommend trying to go down this path.
> > I agree that disabling optimization is not nice. It is just a big-hammer
> > approach, and it may not work if the optimization changes. What I do not
> > want to do is implement the whole verifier logic in LLVM. But as you and
> > Eli mentioned, we can implement just a portion of the verifier logic,
> > focusing solely on what the verifier does not like.
> >
> > One of my previous suggestions was to implement some verifier logic in the
> > clang code generation pass. I am not 100% sure how easy that is. Hence I
> > proposed a target-influenced PassManager in clang so the BPF target can
> > have a say.
> >
> > But later Eli pointed out that we could use the
> > addCoroutinePassesToExtensionPoints() approach to have an early IR pass
> > encode verifier logic into the IR so downstream optimizations can stay as
> > is. This is effectively very similar to my clang target-influenced
> > PassManager, and it is already implemented in llvm.
> > I think this is a sound approach and I will experiment with it.
>
>
> I don't understand what you're planning to try. How would the pass
> encode the verifier constraints in the IR?
The following two examples, which I described in one of my previous emails,
show what I want to do in the IR transformation. What we really want is to
"serialize" variable uses to avoid backtracking and speculative code
motion. The following is a specific example:

  a = bar();
  if (a >= 0) {
    if (a <= 16) {
      ...
    }
  }

We would like to generate code like:

  // var_barrier(a) defines and uses a
  #define var_barrier(a) asm volatile ("" : "=r"(a) : "0"(a))

  a = bar();
  var_barrier(a);
  if (a >= 0) {
    var_barrier(a);
    if (a <= 16) {
      ...
    }
  }

The above will effectively disable control flow optimization and will
generate verifier-friendly code.
For the speculative code motion part:

  a = bar();
  if (a < 10) {
    ... *(p + a) ...
  }

we can do:

  a = bar();
  var_barrier(a);
  if (a < 10) {
    var_barrier(a);
    ... *(p + a) ...
  }

This will prevent speculative code motion as well.
Basically, we selectively put some memory or variable barriers in place to
restrict certain reorderings or condition merging. As experiments go on,
we will see whether this is sufficient or not.
>
> Backends, as you're probably aware, commonly add IR level passes that
> run prior to instruction selection (BPFPassConfig::addIRPasses already
> does so).
Yes, I am aware of this and we already use it. I hope to use
adjustPassManager() so BPF can "preprocess" the IR before the major
optimizations happen.
>
> -Hal
>
>
> >
> > Thanks for all your involved discussions for this topic!
> >
> >> Thanks again,
> >>
> >> Hal
> >>
> >>
> >>>
> >>>> for the backend to iterate over the IR, perform the verification, and in
> >>>> cases where it would fail, push the relevant parts of the use-def chain
> >>>> down into the blocks with their eventual users.
> >>> This is a good idea. The issue is that instCombine and SimplifyCFG run
> >>> before any backend IR passes, so the backend would have to undo some of
> >>> these optimizations. For speculative code motion, that is probably hard
> >>> for the backend to do, since it does not even know which code was
> >>> speculatively generated.
> >>>
> >>> Can we prevent these optimizations happening in the first place?
> >>>
> >>> In addition to my previous three ideas:
> >>> - function attribute to state intention,
> >>> - clang IR generation with certain variable barriers.
> >>> - new TLI functions used by bpf backend and additional bpf backend
> >>> IR phases.
> >>>
> >>> Two more ideas here:
> >>> (1). Is it possible to implement a generic IR pass (before InstCombine
> >>> and SimplifyCFG) which takes some hints from the backend and modifies
> >>> the IR to favor the BPF verifier?
> >>> (2). Is it possible for the clang frontend to provide a hook for the
> >>> target to supply a pass manager, or at least have a say in the pass
> >>> manager? This way, the target could disable certain passes.
> >>> I realize this is a big-hammer approach. But considering that our main
> >>> goal here is to improve the development experience, and performance is
> >>> not a goal, maybe it is still acceptable?
> >>>
> >>>> -Hal
> >>>>
> >>>> ...
> >>>>
> >>>>
> >>>>>> -Hal
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Hal Finkel
> >>>>>> Lead, Compiler Technology and Programming Languages
> >>>>>> Leadership Computing Facility
> >>>>>> Argonne National Laboratory
> >>>>>>