[LLVMdev] [Patches] Some LazyValueInfo and related patches

Sun Jan 26 23:02:33 PST 2014

Olivier Goffart wrote:
> Hi.
>
> Attached you will find a set of patches which I did while I was trying to solve
> two problems.

Patches should go to the llvm-commits list, not llvmdev. (Think of it as 
"llvm-patches" if that helps.)

> I did not manage to solve fully what i wanted to improve, but I think it is
> still a step in the right direction.
>
> The patches are hopefully self-explanatory.
> The biggest change here is that LazyValueInfo do not maintain a separate stack
> of work to do,
> but do the work directly recursively.

That sounds scary. We often get crash reports where we have recursion, 
and replace it with a worklist stack. The difference is purely 
mechanical. Is there a reason you needed to do this?

> The test included in the patch 4 also test the patch 2.

For patch 1 (SCCP):

-  // load null -> null
-  if (isa<ConstantPointerNull>(Ptr) && I.getPointerAddressSpace() == 0)
-    return markConstant(IV, &I, Constant::getNullValue(I.getType()));

This patch doesn't include a test so it isn't clear why you removed 
this. The transform you remove is correct -- load of null in addrspace 0 
is undefined behaviour. We could model the load as returning undef, but 
undef's nature makes it hard to reason about in SCCP (if one transform 
relies on a particular bit being 0 and another transform relies on a 
particular bit being 1, we won't notice and will miscompile), so picking 
null as a result seems reasonable?

For patch 2:

+static bool inferFromComparison(Value *Val, ICmpInst *ICI, bool isTrueDest,
+                                LVILatticeVal &Result) {
+  if (ICI && isa<Constant>(ICI->getOperand(1))) {

In your patch, ICI is never null (it has one caller, which does a 
dyn_cast). Please remove the redundant test. Then, you can replace the 
isa<Constant>(ICI->getOperand(1)) with a dyn_cast, and remove the 
"cast<Constant>(ICI->getOperand(1))" in two lines later on.

Furthermore, since this whole function relies on that getOperand(1) 
being a Constant*, please also make it an early exit.

+      // We know that V has the RHS constant if this is a true SETEQ or
+      // false SETNE.

I know you're just moving code around here, but seteq and setne were 
removed as of LLVM 2.0. Could you update the comment to refer to icmp eq 
and icmp ne?

Putting those together, it's something like:

static bool inferFromComparison(Value *Val, ICmpInst *ICI, bool isTrueDest,
                                 LVILatticeVal &Result) {
   Constant *Op1 = dyn_cast<Constant>(ICI->getOperand(1));
   if (!Op1) return false;

   if (ICI->isEquality() && ICI->getOperand(0) == Val) {
     // We know that V has the RHS constant if this is a true icmp eq
     // or false icmp ne.
     if (isTrueDest == (ICI->getPredicate() == ICmpInst::ICMP_EQ))
       Result = LVILatticeVal::get(Op1);
     else
       Result = LVILatticeVal::getNot(Op1);
     return true;
   }

+bool inferFromCondition(Value* Val, Value *Condition, bool isTrueDest,
+                        LVILatticeVal& Result) {

You have "Value* Val, Value *Condition" :) Please be consistent about 
putting the space before the * or & (some code is consistent about 
putting it after, if you find such files, consistency is more important).

+  if (Condition == Val) {
+    Result = LVILatticeVal::get(ConstantInt::get(
+                            Type::getInt1Ty(Val->getContext()), 
isTrueDest));
+        return true;
+  }

Too many spaces in front of "return true;".

+  // If the condition of the branch is an equality comparison, we may be
+  // able to infer the value.
+  if (ICmpInst *ICI = dyn_cast<ICmpInst>(Condition)) {
+    return inferFromComparison(Val, ICI, isTrueDest, Result);
+  }

This is no longer specific to equality comparison, update the comment.

Again, this patch will need a testcase to land (I know you said it's in 
patch 4, but with rare exception, we commit a patch with matching testcase).

inferFromCondition is directly recursive, please make it use a worklist. 
It doesn't need to exit and go through solveBlock, it can just keep its 
worklist local to its own function.

Both inferFromCondition and inferFromComparison should have function 
comments. I had to keep referring back to getEdgeValueLocal to figure 
out exactly what those functions were required to produce, as part of 
reviewing your patch.

For patch 3:

This looks good! If I understand it correctly, this saves us some 
compile-time and doesn't cause any functional change?

For patch 4:

Sorry, but this isn't acceptable as is, you'll have to find a different 
approach. Please find a way to make it work within the framework of LVI.

Why doesn't the existing recursion work? As far as I can tell, you ought 
to be getting the same result without this patch, and I think that 
suggests there's a bug elsewhere in LVI that we need to find instead. 
Returning false ought to redo the relevant computations.

It's hard for me to see it fail because the testcase depends on patches 
1 through 3, otherwise I would fire up a debugger and try to sort it out 
myself.

I'm going to hold off reviewing the rest of the patches, under the 
assumption that they depend on this one, until we get this sorted.

> The first problem I was trying to solve is to be let the code give hint on the
> range of the values.
>
> Imagine, in a library:
>
> class CopyOnWrite {
>      char *stuff;
>      int ref_count;
>      void detach_internal();
>      inline void detach() {
>          if (ref_count>  1) {
>              detach_internal();
>              /* ref_count = 1; */
>          }
>      }
> public:
>      char&operator[](int i) { detach(); return stuff[i]; }
> };
>
> Then, in code like this:
>
> int doStuffWithStuff(CoptOnWrite&stuff) {
>      return stuff[0] + stuff[1] * stuff[2];
> }
>
> The generated code will contains three test of ref_count, and three call to
> detach_internal
>
> Is there a way to tell the compiler that ref_count is actually smaller or
> equal to 1 after a call to detach_internal?
> Having the "ref_count=1" explicit in the code help (with my patches), but then
> the operation itself is in the code, and I don't want that.
>
> Something like
>
>   if (ref_count>1)
>       __builtin_unreachable()
>
> Works fine in GCC,  but does not work with LLVM.
> Well, it almost work.  but the problem is that the whole condition is removed
> before the inlining is done.

Exactly. I've looked into not deleting the condition that leads into an 
unreachable, but that in turn blocks other optimization and leads to 
worse code overall.

Nick

> So what can be done for that to work?  Either delay the removal of
> __builtin_unreachable() to after inlining (when?)
> Another way could be, while removing branches because they are unreachable,
> somehow leave the range information kept.
> I was thinking about a !range metadata, but I don't know where to put it.
>
> The other problem was that i was analyzing code like this:
>
> void toLatin1(uchar *dst, const ushort *src, int length)
> {
>      if (length) {
> #if defined(__SSE2__)
>          if (length>= 16) {
>              for (int i = 0; i<  length>>  4; ++i) {
>                  /* skipped code using SSE2 intrinsics */
>                  src += 16; dst += 16;
>              }
>              length = length % 16;
>          }
> #endif
>          while (length--) {
>              *dst++ = (*src>0xff) ? '?' : (uchar) *src;
>              ++src;
>          }
>      }
> }
>
> I was wondering, if compiling with AVX, would clang/LLVM be able to even
> vectorize more the SSE2 intrinsics to wider vectors? Or would the non
> intrinsics branch be better?
> It turns out the result is not great.  LLVM leaves the intrinsics code
> unchanged (that's ok),  but tries to also vectorize the second loop. (And the
> result of this vectorisation is quite horrible.)
> Shouldn't the compiler see that length is never bigger than 16 and hence
> deduce that there is no point in vectorizing? This is why I implemented the
> srem and urem in LVI.
> But then, maybe some other pass a loop pass should use LVI to see than a loop
> never enters, or loop vectorizer could use LVI to avoid creating the loop in
> the first place.
>
> --
> Olivier
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev