[PATCH] D25350: [X86] Enable interleaved memory accesses by default
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 20 13:29:08 PDT 2016
mkuper added a comment.
In https://reviews.llvm.org/D25350#575681, @RKSimon wrote:
> >> 1 - We've lost a number of cases where we had vectorized horizontal reduction clamp + sum patterns. These were typically loading 16 sparse integers as 4 x v4i32 in vpinsrd buildvector sequences and then performing the clamps (pminsd/pmaxsd) + hadd's. These are fully scalarized now.
> > That seems fairly bad.
> > Do you have a reproducer? This didsn't seem to break our existing horizontal reduction lit tests.
> We should be able to create one and address this as a follow up issue.
Great, please CC me on the PR when you have one, I'll look into it.
>>> This should be improvable in the backend with a mixture of more shuffle improvements (PR21281 and PR21138 come to mind)
>> It seems like PR21281 was mostly resolved. I'll need to look at PR21138.
> I have a possible shuffle patch that cover both of these, but haven't had time to finish it - its a rewrite of lowerVectorShuffleByMerging128BitLanes that acts a bit like lowerShuffleAsRepeatedMaskAndLanePermute but in reverse (multiple input lane permute followed by repeated mask).
Sounds good, feel free to add me to the review.
>>> and also possibly splitting a ymm load into 2 if the only use of the load is to extract the low / high xmm subvectors.
>> This is a bit weird - I'm not sure I'd expect this to fire in this kind of situation.
> Sorry, I meant such a change could fix the regression - and possibly allow a great deal more shuffle folding. It'll be a fine balance as to when to let it fire though.
Sorry, I wasn't clear- I understood what you meant. I'm just confused about why we're producing this pattern.
>> In any case, how do you think we can move forward with this? I'd really like to get this in (because of cases like the 60% improvement in denbench), but, obviously, with a minimum amount of regressions. :-)
>> If you can provide reproducers for the CG issues, I'll look into fixing them before enabling this. Otherwise, are you ok with this going on as is? If not, what's the alternative?
> Yes I think I'm happy for this to go ahead, the regression areas we can work on afterward, most can be solved during lowering and are existing issues - its just interleaving makes them a little more obvious!
Of course this doesn't introduce CG issues, it was just a question of which of the ones it exposes we "prefetch" vs. which ones we handle later. :)
Anyway, this sounds good to me, thanks a lot.
I'm going to push this in. If we see significant regressions coming from other people, I'm perfectly ok with reverting and reapplying after they're fixed.
More information about the llvm-commits