[cfe-dev] Disable certain llvm optimizations at clang frontend

Mon Jun 22 15:42:38 PDT 2020

On 6/21/20 2:30 AM, Y Song wrote:
> On Fri, Jun 19, 2020 at 6:20 PM Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> On 6/19/20 3:08 PM, Eli Friedman wrote:
>>> (Reply inline)
>>>
>>>> -----Original Message-----
>>>> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Y Song via cfe-
>>>> dev
>>>> Sent: Friday, June 19, 2020 11:40 AM
>>>> To: Hal Finkel <hfinkel at anl.gov>
>>>> Cc: Andrii Nakryiko <andrii.nakryiko at gmail.com>; Clang Dev <cfe-
>>>> dev at lists.llvm.org>; Alexei Starovoitov <ast at kernel.org>
>>>> Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
>>>> frontend
>>>>
>>>> Just to be more specific about what transformations we want to disable:
>>>>     (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
>>>>           (https://reviews.llvm.org/D72787)
>>>>           This transformation created a new variable "var.off" for comparison but
>>>>           using the original variable "var" later on. The kernel verifier does not
>>>>           have a better range of "var" at its use place and this may cause
>>>>           verification failure.
>>>>
>>>>           To generalize, BPF prefers the single variable is refined and used later
>>>>           for each verification. New variable can be created and used for further
>>>>           value range refinement, but all later usage of old variable should be
>>>>           replaced with the new variable.
>>>>      (2). Prevent speculative code motion
>>>>             (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
>>>>            If the code in the original program may not execute under
>>>> certain conditions,
>>>>            we could like the code not to execute in the final byte code
>>>> under the same
>>>>            condition, regardless of whether it is safe to execute or not.
>>>>
>>>>       I guess we could have two attributes here:
>>>>          - "refine-value-range-with-original-var" (false by default, true for BPF)
>>>>          - "no-speculative-code-motion" (false by default, true for BPF)
>>> Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle.  It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.
>>>
>>> If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition.  Both clang and LLVM optimizations are completely
> bpf verifier is briefly described in the comments here:
>    https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38
>
> It is a path sensitive verifier. It will go through all possible paths
> to ensure all memory accesses are safe.
> The verifier is able to carry refined info across function calls.
> The condition may refine a value range, e.g.,
>     unsigned a = foo(); /* " a" is an unsigned int, could be any value
> 0 <= "a" <= UINT_MAX */
>     if (a < 128) {
>       /* varifier concludes "a" range now is [0, 127]. */
>       ...  * (p + a) ...   /* test whether *(p + [0, 127]) is legal or not.
>    }
>
> But verifier is not smart enough to do backtracking now, so if you have code
>     unsigned a = foo();
>     b = a + 10;
>     if (b < 128) {
>         ... * (p + a) ...    /* b will be [10, 127], but a is [0, UINT_MAX]. */
>     }
> The verifier may reject the above code as (p + a) may be out of bound
> of legal permitted memory range.
>
> Why verifier did not then implement backtracking? There are a few reasons:
>     - This is actually not very common. As long as the program does not
> have control flow like the above triggering a particular instcombine
> transformation, the compiler actually did a good job avoiding the
> above pattern to avoid a second copy of the same variable. But if this
> actually happens, it often cause a lot of
> frustration for developers as their codes are perfectly okay and they
> often do not know how to please
> the verifier.
>     - verifier is already very complex. Adding generic backtracking
> will add a lot of complexity to verifier.
> More states may need to be kept during verification process and this
> will require more kernel memory. The verifier will become slower too.
>
> The same for speculative code motion. Yes, the current verifier may
> reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
> this pkt_ptr += UINT_MAX may just a speculative move and later on
> pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
> actual memory access. In this case, the verifier will be wrong to
> reject the program.
>
> Not all speculative code motion will cause problems. Some scalar
> speculative code motion totally fine. Or if the speculative code
> motion still within ptr valid range, it will be fine too.
>
> My original proposal to disable certain optimizations are a big
> hammer, just to get it work so the developer can move on. I totally
> agree that using func attribute to disable certain specific
> optimizations may happen to work for some cases, but it may not work
> for all cases since other optimizations may perform optimizations.

Thanks, this is helpful. One question: Is the information used by the 
verifier available at compile time? It seems like what you want here is 
for the backend to iterate over the IR, perform the verification, and in 
cases where it would fail, push the relevant parts of the use-def chain 
down into the blocks with their eventual users.

  -Hal

...

>>    -Hal
>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory