[LLVMdev] [Patches] Some LazyValueInfo and related patches
Olivier Goffart
olivier at woboq.com
Thu Jan 23 23:34:33 PST 2014
Ping?
On Tuesday 21 January 2014 14:21:43 Olivier Goffart wrote:
> Hi.
>
> Attached you will find a set of patches which I did while I was trying to
> solve two problems.
> I did not manage to fully solve what I wanted to improve, but I think it is
> still a step in the right direction.
>
> The patches are hopefully self-explanatory.
> The biggest change here is that LazyValueInfo no longer maintains a separate
> stack of work to do, but does the work directly, recursively.
>
> The test included in patch 4 also tests patch 2.
>
>
> The first problem I was trying to solve is to let the code give hints about
> the range of values.
>
> Imagine, in a library:
>
> class CopyOnWrite {
>     char *stuff;
>     int ref_count;
>     void detach_internal();
>     inline void detach() {
>         if (ref_count > 1) {
>             detach_internal();
>             /* ref_count = 1; */
>         }
>     }
> public:
>     char &operator[](int i) { detach(); return stuff[i]; }
> };
>
> Then, in code like this:
>
> int doStuffWithStuff(CopyOnWrite &stuff) {
>     return stuff[0] + stuff[1] * stuff[2];
> }
>
> The generated code will contain three tests of ref_count and three calls to
> detach_internal.
>
> Is there a way to tell the compiler that ref_count is actually less than or
> equal to 1 after a call to detach_internal?
> Having the explicit "ref_count = 1" in the code helps (with my patches), but
> then the store itself ends up in the generated code, and I don't want that.
>
> Something like
>
> if (ref_count > 1)
>     __builtin_unreachable();
>
> works fine in GCC, but does not work with LLVM.
> Well, it almost works, but the problem is that the whole condition is
> removed before inlining is done.
> So what can be done to make this work? One option would be to delay the
> removal of __builtin_unreachable() until after inlining (but when?).
> Another way could be, while removing branches because they are unreachable,
> to somehow keep the range information around.
> I was thinking about a !range metadata, but I don't know where to put it.
>
> The other problem came up while I was analyzing code like this:
>
> void toLatin1(uchar *dst, const ushort *src, int length)
> {
>     if (length) {
> #if defined(__SSE2__)
>         if (length >= 16) {
>             for (int i = 0; i < length >> 4; ++i) {
>                 /* skipped code using SSE2 intrinsics */
>                 src += 16; dst += 16;
>             }
>             length = length % 16;
>         }
> #endif
>         while (length--) {
>             *dst++ = (*src > 0xff) ? '?' : (uchar) *src;
>             ++src;
>         }
>     }
> }
>
> I was wondering: when compiling with AVX, would clang/LLVM be able to
> vectorize the SSE2 intrinsics further, to wider vectors? Or would the
> non-intrinsics branch produce better code?
> It turns out the result is not great. LLVM leaves the intrinsics code
> unchanged (that's OK), but it also tries to vectorize the second loop, and
> the result of that vectorization is quite horrible.
> Shouldn't the compiler see that length is always smaller than 16 at that
> point, and hence deduce that there is no point in vectorizing? This is why I
> implemented srem and urem in LVI.
> But then, maybe some loop pass should use LVI to see that a loop is never
> entered, or the loop vectorizer could use LVI to avoid creating the
> vectorized loop in the first place.
>
> --
> Olivier