[PATCH] [SLPVectorization] Enhance Ability to Vectorize Horizontal Reductions from Consecutive Loads

suyog suyog.sarda at samsung.com
Thu Dec 25 22:13:36 PST 2014


Hi Michael.

Ideally, for a sum of 8 operands, where we have a deeper tree, Right and Left should each hold 4 operands.
But because of the way the tree is built up (we recursively call build_tree() for Left and Right), we handle only 2 loads on each side at a time.

Let's take an example :). I will number the add ops.

                    + (1)
                /      \ 
               /        \ 
              /          \
             /            \
           + (2)            + (3)
         /   \            /  \
        /     \          /    \
       /       \        /      \
     +(4)       +(5)    +(6)    +(7)
    /  \      /  \    /  \     /  \
  0    1     2    3  4    5    6   7

When add 1 (the topmost add) is encountered, we split the tree into left and right subtrees, rooted at the 2nd and 3rd adds, and recursively call
build_tree() on the left and right operands of those two adds. So we go to the left of the 2nd and 3rd adds and arrive at the 4th and 6th adds. Then we again
go left from the 4th and 6th adds and encounter a[0] and a[4]; we put them into the Left vector and go to the right of the 4th and 6th adds. There we arrive at
a[1] and a[5], which we put into the Right vector. After this we check whether the elements in Left and Right can be bundled into a vector of loads.

  Left        Right
  a[0]          a[1]
  a[4]          a[5]

At this point we are totally unaware of the other loads, since we haven't yet called build_tree() on the right side of the 2nd and 3rd adds.
(The DFS runs in parallel over the subtrees rooted at the 2nd and 3rd adds.)

Once we are done processing and bundling the above pairs of loads into vectors, we move to the right recursively, finally
encounter the 5th and 7th adds, and then have Left and Right as:

  Left        Right
  a[2]          a[3]
  a[6]          a[7]

Note that at this point, a[0], a[1], a[4] and a[5] are already processed. I hope you get my point :)
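To make the visiting order concrete, here is a small Python model of the walk described above. It is only a sketch of the traversal, not the actual SLP vectorizer code: the `Load` and `Add` classes, the `balanced` helper and this toy `build_tree` are all made up for illustration, and it assumes a perfectly balanced reduction tree.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Load:
    index: int  # stands for a[index]


@dataclass(frozen=True)
class Add:
    lhs: "Node"
    rhs: "Node"


Node = Union[Load, Add]
Bundle = Tuple[List[int], List[int]]  # (Left indices, Right indices)


def balanced(lo: int, hi: int) -> Node:
    """Build a balanced add tree over a[lo] .. a[hi-1]."""
    if hi - lo == 1:
        return Load(lo)
    mid = (lo + hi) // 2
    return Add(balanced(lo, mid), balanced(mid, hi))


def build_tree(bundle: List[Node], log: List[Bundle]) -> None:
    """Toy model: walk a bundle of adds depth-first, left operands first."""
    if any(isinstance(n, Load) for n in bundle):
        return  # nothing below a load
    left = [n.lhs for n in bundle]
    right = [n.rhs for n in bundle]
    if all(isinstance(n, Load) for n in left + right):
        # This is the point where Left and Right are checked for bundling.
        log.append(([n.index for n in left], [n.index for n in right]))
        return
    build_tree(left, log)   # e.g. 4th and 6th add
    build_tree(right, log)  # e.g. 5th and 7th add


root = balanced(0, 8)                  # add 1
log: List[Bundle] = []
build_tree([root.lhs, root.rhs], log)  # start from the 2nd and 3rd add

for left, right in log:
    print("Left:", left, "Right:", right)
```

In this model Left and Right never hold more than 2 entries, because the starting bundle holds only the 2 children of the topmost add.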

Left and Right would contain more than 2 operands only if there were more than 2 subtrees, which is not possible with a single tree,
since each node of a single tree has only 2 children.

I think this should be handled in a better way, because the above code had even more potential to get vectorized into <4x> vectors instead of <2x> vectors,
though I am not yet sure how to do that. I would be happy to hear your suggestions on it.
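Just to illustrate the arithmetic (this is not a proposal for how the vectorizer should find it), the Python sketch below compares the <2x> grouping from the Left/Right tables above with a hypothetical <4x> grouping of a[0..3] and a[4..7]. The sample values and the `vadd` helper are made up for the example.

```python
from typing import List

a = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary sample values for a[0] .. a[7]


def vadd(x: List[int], y: List[int]) -> List[int]:
    """Lane-wise add of two equally sized vectors."""
    assert len(x) == len(y)
    return [p + q for p, q in zip(x, y)]


# <2x> grouping, as bundled in the walk above: three vector adds of width 2.
left_1, right_1 = [a[0], a[4]], [a[1], a[5]]
left_2, right_2 = [a[2], a[6]], [a[3], a[7]]
two_wide = vadd(vadd(left_1, right_1), vadd(left_2, right_2))

# Hypothetical <4x> grouping: one vector add of width 4.
four_wide = vadd(a[0:4], a[4:8])

# Either way, a final horizontal reduction of the lanes gives the scalar sum.
scalar_sum = sum(a)
print(sum(two_wide), sum(four_wide), scalar_sum)
```

Both groupings reduce to the same total; the difference is only in how many vector adds are needed and how wide they are.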

Awaiting your reply !!

Regards,
Suyog

(Merry Xmas and Happy New Year :) )


REPOSITORY
  rL LLVM

http://reviews.llvm.org/D6675
