[LLVMdev] [Patches] Some LazyValueInfo and related patches

Mon Jan 27 07:48:11 PST 2014

Hi,

Let me just add a comment to what Nick already said:
LVI is not recursive on purpose. The tree of an expression may be 
arbitrarily long, and therefore a recursion could overflow the stack. Or it 
could also take too long to compute.
The current LVI's worklist limits its precision. It has an arbitrary limit 
of not looking back more than 2 basic blocks, AFAIR.

I dislike this behavior of LVI as well, but turning it into an unbounded 
recursion is not the solution, though.  I believe an eager and sparse 
analysis would perform better than what we have today, but I don't have any 
numbers to corroborate my intuition.  People have also opposed the SSI-based 
thing we had previously (which ended up being removed).

Nuno

----- Original Message -----
> Olivier Goffart wrote:
>> Hi.
>>
>> Attached you will find a set of patches which I did while I was trying to 
>> solve
>> two problems.
>
> Patches should go to the llvm-commits list, not llvmdev. (Think of it as 
> "llvm-patches" if that helps.)
>
>> I did not manage to solve fully what i wanted to improve, but I think it 
>> is
>> still a step in the right direction.
>>
>> The patches are hopefully self-explanatory.
>> The biggest change here is that LazyValueInfo do not maintain a separate 
>> stack
>> of work to do,
>> but do the work directly recursively.
>
> That sounds scary. We often get crash reports where we have recursion, and 
> replace it with a worklist stack. The difference is purely mechanical. Is 
> there a reason you needed to do this?
>
>> The test included in the patch 4 also test the patch 2.
>
> For patch 1 (SCCP):
>
> -  // load null -> null
> -  if (isa<ConstantPointerNull>(Ptr) && I.getPointerAddressSpace() == 0)
> -    return markConstant(IV, &I, Constant::getNullValue(I.getType()));
>
> This patch doesn't include a test so it isn't clear why you removed this. 
> The transform you remove is correct -- load of null in addrspace 0 is 
> undefined behaviour. We could model the load as returning undef, but 
> undef's nature makes it hard to reason about in SCCP (if one transform 
> relies on a particular bit being 0 and another transform relies on a 
> particular bit being 1, we won't notice and will miscompile), so picking 
> null as a result seems reasonable?
>
> For patch 2:
>
> +static bool inferFromComparison(Value *Val, ICmpInst *ICI, bool 
> isTrueDest,
> +                                LVILatticeVal &Result) {
> +  if (ICI && isa<Constant>(ICI->getOperand(1))) {
>
> In your patch, ICI is never null (it has one caller, which does a 
> dyn_cast). Please remove the redundant test. Then, you can replace the 
> isa<Constant>(ICI->getOperand(1)) with a dyn_cast, and remove the 
> "cast<Constant>(ICI->getOperand(1))" in two lines later on.
>
> Furthermore, since this whole function relies on that getOperand(1) being 
> a Constant*, please also make it an early exit.
>
> +      // We know that V has the RHS constant if this is a true SETEQ or
> +      // false SETNE.
>
> I know you're just moving code around here, but seteq and setne were 
> removed as of LLVM 2.0. Could you update the comment to refer to icmp eq 
> and icmp ne?
>
> Putting those together, it's something like:
>
> static bool inferFromComparison(Value *Val, ICmpInst *ICI, bool 
> isTrueDest,
>                                 LVILatticeVal &Result) {
>   Constant *Op1 = dyn_cast<Constant>(ICI->getOperand(1));
>   if (!Op1) return false;
>
>   if (ICI->isEquality() && ICI->getOperand(0) == Val) {
>     // We know that V has the RHS constant if this is a true icmp eq
>     // or false icmp ne.
>     if (isTrueDest == (ICI->getPredicate() == ICmpInst::ICMP_EQ))
>       Result = LVILatticeVal::get(Op1);
>     else
>       Result = LVILatticeVal::getNot(Op1);
>     return true;
>   }
>
> +bool inferFromCondition(Value* Val, Value *Condition, bool isTrueDest,
> +                        LVILatticeVal& Result) {
>
> You have "Value* Val, Value *Condition" :) Please be consistent about 
> putting the space before the * or & (some code is consistent about putting 
> it after, if you find such files, consistency is more important).
>
> +  if (Condition == Val) {
> +    Result = LVILatticeVal::get(ConstantInt::get(
> +                            Type::getInt1Ty(Val->getContext()), 
> isTrueDest));
> +        return true;
> +  }
>
> Too many spaces in front of "return true;".
>
> +  // If the condition of the branch is an equality comparison, we may be
> +  // able to infer the value.
> +  if (ICmpInst *ICI = dyn_cast<ICmpInst>(Condition)) {
> +    return inferFromComparison(Val, ICI, isTrueDest, Result);
> +  }
>
> This is no longer specific to equality comparison, update the comment.
>
> Again, this patch will need a testcase to land (I know you said it's in 
> patch 4, but with rare exception, we commit a patch with matching 
> testcase).
>
>
> inferFromCondition is directly recursive, please make it use a worklist. 
> It doesn't need to exit and go through solveBlock, it can just keep its 
> worklist local to its own function.
>
> Both inferFromCondition and inferFromComparison should have function 
> comments. I had to keep referring back to getEdgeValueLocal to figure out 
> exactly what those functions were required to produce, as part of 
> reviewing your patch.
>
> For patch 3:
>
> This looks good! If I understand it correctly, this saves us some 
> compile-time and doesn't cause any functional change?
>
> For patch 4:
>
> Sorry, but this isn't acceptable as is, you'll have to find a different 
> approach. Please find a way to make it work within the framework of LVI.
>
> Why doesn't the existing recursion work? As far as I can tell, you ought 
> to be getting the same result without this patch, and I think that 
> suggests there's a bug elsewhere in LVI that we need to find instead. 
> Returning false ought to redo the relevant computations.
>
> It's hard for me to see it fail because the testcase depends on patches 1 
> through 3, otherwise I would fire up a debugger and try to sort it out 
> myself.
>
> I'm going to hold off reviewing the rest of the patches, under the 
> assumption that they depend on this one, until we get this sorted.
>
>> The first problem I was trying to solve is to be let the code give hint 
>> on the
>> range of the values.
>>
>> Imagine, in a library:
>>
>> class CopyOnWrite {
>>      char *stuff;
>>      int ref_count;
>>      void detach_internal();
>>      inline void detach() {
>>          if (ref_count>  1) {
>>              detach_internal();
>>              /* ref_count = 1; */
>>          }
>>      }
>> public:
>>      char&operator[](int i) { detach(); return stuff[i]; }
>> };
>>
>> Then, in code like this:
>>
>> int doStuffWithStuff(CoptOnWrite&stuff) {
>>      return stuff[0] + stuff[1] * stuff[2];
>> }
>>
>> The generated code will contains three test of ref_count, and three call 
>> to
>> detach_internal
>>
>> Is there a way to tell the compiler that ref_count is actually smaller or
>> equal to 1 after a call to detach_internal?
>> Having the "ref_count=1" explicit in the code help (with my patches), but 
>> then
>> the operation itself is in the code, and I don't want that.
>>
>> Something like
>>
>>   if (ref_count>1)
>>       __builtin_unreachable()
>>
>> Works fine in GCC,  but does not work with LLVM.
>> Well, it almost work.  but the problem is that the whole condition is 
>> removed
>> before the inlining is done.
>
> Exactly. I've looked into not deleting the condition that leads into an 
> unreachable, but that in turn blocks other optimization and leads to worse 
> code overall.
>
> Nick
>
>> So what can be done for that to work?  Either delay the removal of
>> __builtin_unreachable() to after inlining (when?)
>> Another way could be, while removing branches because they are 
>> unreachable,
>> somehow leave the range information kept.
>> I was thinking about a !range metadata, but I don't know where to put it.
>>
>> The other problem was that i was analyzing code like this:
>>
>> void toLatin1(uchar *dst, const ushort *src, int length)
>> {
>>      if (length) {
>> #if defined(__SSE2__)
>>          if (length>= 16) {
>>              for (int i = 0; i<  length>>  4; ++i) {
>>                  /* skipped code using SSE2 intrinsics */
>>                  src += 16; dst += 16;
>>              }
>>              length = length % 16;
>>          }
>> #endif
>>          while (length--) {
>>              *dst++ = (*src>0xff) ? '?' : (uchar) *src;
>>              ++src;
>>          }
>>      }
>> }
>>
>> I was wondering, if compiling with AVX, would clang/LLVM be able to even
>> vectorize more the SSE2 intrinsics to wider vectors? Or would the non
>> intrinsics branch be better?
>> It turns out the result is not great.  LLVM leaves the intrinsics code
>> unchanged (that's ok),  but tries to also vectorize the second loop. (And 
>> the
>> result of this vectorisation is quite horrible.)
>> Shouldn't the compiler see that length is never bigger than 16 and hence
>> deduce that there is no point in vectorizing? This is why I implemented 
>> the
>> srem and urem in LVI.
>> But then, maybe some other pass a loop pass should use LVI to see than a 
>> loop
>> never enters, or loop vectorizer could use LVI to avoid creating the loop 
>> in
>> the first place.
>>
>> --
>> Olivier