[PATCH] D45834: [TTI] Add a hook to TTI for choosing scalarized shuffle-reduction sequence for reduction idiom

Mon Apr 23 13:57:36 PDT 2018

hsaito added a comment.

In https://reviews.llvm.org/D45834#1074249, @FarhanaAleen wrote:

> Hi Hideki.
>
> "It looks to me that you are trying to fine-tune the final step of log-2 reduction."
>
> This patch actually tries to generate required sized vectors(precise vector length) for shuffleReduction (where scalarization follows naturally) instead of keeping the starting vector length fixed all the way through which are filled with unnecessary undefs. I agree it was not clear in the previous patch. To make the distinction clear, I've created a separate function called getVariableLengthShuffleReduction() with additional changes.
>
> I am not opposed to do this in the target/Codegen, it's just that the implementation is a little messy since there is no good way to classify scalar operations as opposed to vector operations whereas it can be implemented in a cleaner way at the first place with a TTI hook.
>
> Does it look reasonable to add a TTI hook and allow SLP to generate precise vector length for the shuffle reduction at the first place if target wants  that?

I don't know about your original intention, but your previous patch (on the left pane as of now) wasn't doing what you described above ---- and the right pane does what you've described above.

This time, I can see why you'd want to do this upfront --- except for the last step in your code. You may call this last step more optimal and I don't have any data to support or go against your opinion about that ---- but I can say it's not consistent. Why bother? Is that REALLY needed? In any case, you seem to be introducing a new "canonical form" for reduction last value compute. You are better off having a good discussion in llvm-dev, by explicitly asking for those who "latch on" to reduction last value compute, to see if they are okay to add another form (or if you really think this is a better representation, you should argue that the old form should be retired). I suggest sending an RFC to llvm-dev. At this moment, you fail to convince me why the community should support both canonical forms (i.e., why you can't optimize from the current form ---- or why you'd want to keep the old form). If others support having both is a great idea, I won't insist.

BTW, have you looked at ARM's last value compute experimental intrinsic: e.g., int_experimental_vector_reduce_add? If you don't mind losing IR-level optimizations for reduction last value code, this is the easiest way to do custom lowering in the Target. I don't know whether this meets your needs, but I thought it's worth mentioning.

https://reviews.llvm.org/D45834