[LLVMdev] [Patches] Some LazyValueInfo and related patches
Olivier Goffart
olivier at woboq.com
Thu Jan 23 23:34:33 PST 2014
Ping?
On Tuesday 21 January 2014 14:21:43 Olivier Goffart wrote:
> Hi.
>
> Attached you will find a set of patches which I did while I was trying to
> solve two problems.
> I did not manage to fully solve what I wanted to improve, but I think it is
> still a step in the right direction.
>
> The patches are hopefully self-explanatory.
> The biggest change here is that LazyValueInfo no longer maintains a separate
> stack of work to do, but does the work directly, recursively.
>
> The test included in patch 4 also tests patch 2.
>
>
> The first problem I was trying to solve is to let the code give hints about
> the range of values.
>
> Imagine, in a library:
>
> class CopyOnWrite {
>     char *stuff;
>     int ref_count;
>     void detach_internal();
>     inline void detach() {
>         if (ref_count > 1) {
>             detach_internal();
>             /* ref_count = 1; */
>         }
>     }
> public:
>     char &operator[](int i) { detach(); return stuff[i]; }
> };
>
> Then, in code like this:
>
> int doStuffWithStuff(CopyOnWrite &stuff) {
>     return stuff[0] + stuff[1] * stuff[2];
> }
>
> The generated code will contain three tests of ref_count and three calls to
> detach_internal.
>
> Is there a way to tell the compiler that ref_count is actually less than or
> equal to 1 after a call to detach_internal?
> Having the explicit "ref_count = 1" in the code helps (with my patches), but
> then the store itself ends up in the generated code, and I don't want that.
>
> Something like
>
> if (ref_count > 1)
>     __builtin_unreachable();
>
> works fine in GCC, but does not work with LLVM.
> Well, it almost works, but the problem is that the whole condition is
> removed before inlining is done.
> So what can be done to make this work? One option would be to delay the
> removal of __builtin_unreachable() until after inlining (but when?).
> Another way could be, while removing branches because they are unreachable,
> to somehow keep the range information around.
> I was thinking about a !range metadata, but I don't know where to put it.
>
> The other problem came up while I was analyzing code like this:
>
> void toLatin1(uchar *dst, const ushort *src, int length)
> {
>     if (length) {
> #if defined(__SSE2__)
>         if (length >= 16) {
>             for (int i = 0; i < length >> 4; ++i) {
>                 /* skipped code using SSE2 intrinsics */
>                 src += 16; dst += 16;
>             }
>             length = length % 16;
>         }
> #endif
>         while (length--) {
>             *dst++ = (*src > 0xff) ? '?' : (uchar) *src;
>             ++src;
>         }
>     }
> }
>
> I was wondering: when compiling with AVX, would clang/LLVM be able to
> vectorize the SSE2 intrinsics further, to wider vectors? Or would the
> non-intrinsics branch produce better code?
> It turns out the result is not great. LLVM leaves the intrinsics code
> unchanged (that's OK), but it also tries to vectorize the second loop, and
> the result of that vectorization is quite horrible.
> Shouldn't the compiler see that length is always smaller than 16 at that
> point, and hence deduce that there is no point in vectorizing? This is why I
> implemented srem and urem in LVI.
> But then, maybe some loop pass should use LVI to see that a loop is never
> entered, or the loop vectorizer could use LVI to avoid creating the
> vectorized loop in the first place.
>
> --
> Olivier