[PATCH] D37216: [SROA] propagate !range metadata when moving loads

Sun Sep 3 10:25:10 PDT 2017

On 09/03/2017 12:23 PM, Ariel Ben-Yehuda wrote:
>> What do you mean? We'd need to have one in the pass and pass it in to the function.
> I was talking about the fact that this also possibly causes damage in
> D37215. For promoteMemoryToRegister only it is pretty easy to patch
> things up.

Can you be more specific?

>
>   > But we're now automatically adding assumptions, and clang does emit
> nonnull. In any case, I'd not be surprised if you're right, but we
> need some data here.
>
> I just ordered a benchmarking result on rustc and a lot of other rust
> code. Hope I'll have results soon.

Great, thanks.

  -Hal

>
> On Sun, Sep 3, 2017 at 8:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> On 09/03/2017 11:31 AM, Ariel Ben-Yehuda wrote:
>>> I don't see an easy way to keep the OrderedInstructions info valid
>>> through all the passes that use computeKnownBits.
>>
>> What do you mean? We'd need to have one in the pass and pass it in to the
>> function. Regarding maintaining it, I don't think that's too much of a
>> problem (although it will require some work). OrderedInstructions is a cache
>> around OrderedBasicBlock, and OrderedBasicBlock already deals gracefully
>> with additions of new instructions (for dominance). I think that we could
>> essentially use the same scheme to track guaranteed-execution regions. Maybe
>> we need to change it to use value handles to track deletions.
>>
>>> I'm not sure how much of a problem this is at these passes - we can
>>> already go through quite a lot of stuff with the old
>>> `isSafeToSpeculativelyExecute` (e.g., loads of `dereferencable` data +
>>> arbitrary arithmetic). Because of that, I'm not inclined to believe
>>> this will make that problem much worse in practice.
>>>
>>> In any case I doubt any of this will show at clang, because clang does
>>> not emit either a lot of calls to either assume or !range metadata.
>>
>> But we're now automatically adding assumptions, and clang does emit nonnull.
>> In any case, I'd not be surprised if you're right, but we need some data
>> here.
>>
>>
>> Thanks again,
>> Hal
>>
>>>
>>> On Sun, Sep 3, 2017 at 7:19 PM, Ariel Ben-Yehuda <ariel.byd at gmail.com>
>>> wrote:
>>>> I'll try to see what I can do. Do you have any other pointers for me?
>>>>
>>>> On Sun, Sep 3, 2017 at 7:17 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>> On 09/03/2017 11:12 AM, Ariel Ben-Yehuda wrote:
>>>>>> I think the performance worry is already a problem with my previous
>>>>>> patch - we already call isValidAssumeForContext quite a bit of times
>>>>>> in ValueTracking, where it will perform the long walk on a big BB.
>>>>>
>>>>> Indeed. Maybe we can give
>>>>> include/llvm/Transforms/Utils/OrderedInstructions.h a mode in which it
>>>>> keeps
>>>>> track of guaranteed-execution regions? Then we can check for region
>>>>> membership and dominance in one step.
>>>>>
>>>>> Thanks again,
>>>>> Hal
>>>>>
>>>>>
>>>>>> On Sun, Sep 3, 2017 at 6:58 PM, Hal Finkel via Phabricator
>>>>>> <reviews at reviews.llvm.org> wrote:
>>>>>>> hfinkel added a comment.
>>>>>>>
>>>>>>> This looks good, but, have you checked for compile-time regressions?
>>>>>>> I'm
>>>>>>> concerned here because:
>>>>>>>
>>>>>>> We've now made addAssumptionsFromMetadata linear in the number of
>>>>>>> instructions between the instruction and its replacement. That should
>>>>>>> be
>>>>>>> documented with the function declaration (at least). As the
>>>>>>> replacement
>>>>>>> always dominates the original instruction, when they're in the same
>>>>>>> BB,
>>>>>>> we'll always be doing that linear walk. As you're making the walk go
>>>>>>> through
>>>>>>> normal loads, I can imagine this becoming quadratic in some inputs
>>>>>>> (unless
>>>>>>> there's something about the way that SROA uses this that prevents
>>>>>>> that, and
>>>>>>> if so, we need to document that too).
>>>>>>>
>>>>>>> If this can become quadratic, I think we'll want to maintain some kind
>>>>>>> of
>>>>>>> BB-region analysis, there we partition each BB into regions based on
>>>>>>> guaranteed execution, and then use that instead of walking.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: include/llvm/Analysis/ValueTracking.h:375
>>>>>>> +  /// Return true if, when CxtI executes, it is guaranteed that
>>>>>>> either
>>>>>>> +  /// I had executed already or that I is guaranteed to be reached
>>>>>>> and
>>>>>>> +  /// be executed.
>>>>>>> ----------------
>>>>>>> reached and executed -> later executed
>>>>>>>
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: lib/Transforms/Utils/PromoteMemoryToRegister.cpp:314
>>>>>>> +                                      AssumptionCache *AC)
>>>>>>> +{
>>>>>>> +  if (LI->getMetadata(LLVMContext::MD_nonnull) &&
>>>>>>> ----------------
>>>>>>> Brace on previous line.
>>>>>>>
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: lib/Transforms/Utils/PromoteMemoryToRegister.cpp:316
>>>>>>> +  if (LI->getMetadata(LLVMContext::MD_nonnull) &&
>>>>>>> +      !llvm::isKnownNonNullAt(ReplVal, LI, &DT)) {
>>>>>>> +    addAssumeNonNull(AC, LI);
>>>>>>> ----------------
>>>>>>> Do you need the llvm:: here? If not, please remove it.
>>>>>>>
>>>>>>>
>>>>>>> https://reviews.llvm.org/D37216
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Hal Finkel
>>>>> Lead, Compiler Technology and Programming Languages
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory