[PATCH] D9822: Reducing the costs of cast instructions to enable more vectorization of smaller types in LoopVectorize

Mon Jul 13 04:15:24 PDT 2015

samparker added a comment.

In http://reviews.llvm.org/D9822#202874, @sbaranga wrote:

> In http://reviews.llvm.org/D9822#201852, @samparker wrote:
>
> > Hi Silviu,
> >
> > So I've added some comments for clarification and also removed the sub operands from the chains for simplicity, as I didn't notice any performance difference in the tests I was looking at. I've also added support for 64-bit > 32-bit operations and updated the test file to reflect this.
> >
> > cheers,
>
>
> Thanks Sam!
>
> Some further comments:
>
> Subtraction is a very common operation, so it would be much nicer to get it now. With regard to performance numbers, this would only show up in a benchmark if it affected the hot loop _and_ the loop ended up being vectorized.
>
> Looking at what bits we don't care about seems like the right approach to me. I think an easier way to reason about these is to have a bitmask for each value in the chain that we're looking at, indicating what bits we care about. Different operations would do different things on this bitmask, and we would compute the value of the bitmasks starting from the trunc operations.
>
> For example:
>
>   b = add a, c means that a and c have the same bitmask as b
>   
>
> The same would be true for mul, left shift, all bitwise operators, and sub (since sub can be written with an add and a not).
>
>   b = shr a, 2, where b has a bitmask of 0b00011 (1 means we care, 0 means we don't care), means a has a bitmask of 0b01111 (technically it would be 0b01100, but this is an approximation).
>   
>   
>
> This can probably be done for most operators.
>
> Also, if you have an operation that takes a constant and you know what bits you care about it should be perfectly legal to truncate the constant and get rid of some of the bits that you don't care about.
>
> Technically, we could set the mask values on the truncate operations and iterate using a work list until all bitmasks have converged, but that is best left for future work.
>
> What would be your opinion on this?
>
> Cheers,
> Silviu
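
The bitmask rules quoted above can be sketched roughly as follows. This is only an illustration and not code from the patch: the names (`Mask`, `truncMask`, `sameAsResult`, `shrExact`, `shrApprox`, `truncateConstant`) are hypothetical, masks are assumed to fit in 64 bits, and shift amounts are assumed to be constants smaller than 64. The "same mask as the result" rule for add, sub and mul relies on the masks being contiguous low-bit masks, which is what you get when starting from a trunc.

```cpp
#include <cstdint>

// Hypothetical mask type: bit i set means "we care about bit i".
using Mask = std::uint64_t;

// Seed: a trunc to `bits` bits only cares about the low `bits` bits
// of its operand.
constexpr Mask truncMask(unsigned bits) {
  return bits >= 64 ? ~Mask(0) : (Mask(1) << bits) - 1;
}

// add, sub, mul, shl and the bitwise operators: the operands get the same
// mask as the result. For the arithmetic operators this assumes `result`
// is a low-bit mask, since carries only travel from low bits to high bits.
constexpr Mask sameAsResult(Mask result) { return result; }

// Shift right by a constant: bit i of the result comes from bit
// i + amount of the operand, so the exact operand mask is shifted left.
constexpr Mask shrExact(Mask result, unsigned amount) {
  return amount >= 64 ? Mask(0) : result << amount;
}

// The approximation from the example: also keep the low `amount` bits,
// so the operand mask stays a contiguous low-bit mask.
constexpr Mask shrApprox(Mask result, unsigned amount) {
  return shrExact(result, amount) | truncMask(amount);
}

// A constant operand can drop every bit that nobody cares about.
constexpr std::uint64_t truncateConstant(std::uint64_t value, Mask mask) {
  return value & mask;
}
```

With these helpers, the shr example gives `shrExact(0x3, 2) == 0xC` (0b01100) and `shrApprox(0x3, 2) == 0xF` (0b01111). Similarly, if an add feeds a trunc to 8 bits, a constant operand of 0x1234 could be replaced by `truncateConstant(0x1234, truncMask(8))`, which is 0x34.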

Hey Silviu,

I believe this is the technique that James had told me about; he mentioned that it had already been implemented somewhere in the code base, but without an interface. I didn't go for it originally because I didn't want the ramp-up time of trying to understand, refactor and modify two separate passes, and I'm still concerned about that time factor! The bitmask idea does sound like a much more useful and reusable approach. However, I really can't commit the time to implement it, unless you believe it can slot into the current structure of my analysis.

Cheers,

http://reviews.llvm.org/D9822