[llvm] [X86] LowerSelect - use BLENDV for scalar selection if not all operands are multi use (PR #125853)
Phoebe Wang via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 6 17:18:29 PST 2025
phoebewang wrote:
> > > > Changes to sse-minmax.ll - I'll push the diff (tmp commit for review - I'll remove it again later)
> > >
> > >
> > > We assume moves have negligible cost in the uarch and the total instruction count is not increased. Why is it not preferred?
> >
> >
> > Not all uarchs from the SSE4 era had move elimination, and often the BLENDV instructions were 2 uops or more - so the total uop count could increase if the 3 x 1uop logic ops (+ maybe a 1uop move for the ANDNP mask) were replaced with 3 x 1uop moves + 1 x 2uop BLENDV - that's the worst-case scenario. But we already always take that chance with BLENDV for vector selects; it's just the scalar selects where for some reason we were more cautious. I was trying to find a compromise, but I'm not against dropping the multiuse limit for SSE4 entirely.
>
> Requiring either op to have one use seems a reasonable compromise to me, although I also think removing the requirement entirely is sensible. My guess is most stuff compiled for SSE4 is probably running on newer hardware (mov elim, fast blendv) and just compiled with SSE4 for compatibility's sake.
+1. If it's ok either way, we can assume performance on newer hardware is more important than on older hardware. We used this strategy when bumping the general tuning.
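
For reference, the trade-off discussed above can be put into a rough standalone sketch. This is not the code in the PR or in X86ISelLowering.cpp: every name here (`selectViaLogic`, `selectViaBlend`, `preferBlendv`, `UopModel`, and the uop helpers) is hypothetical, the uop figures are just the ones quoted in the thread, and treating eliminated moves as free is a simplification that ignores front-end cost.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// Scalar select done with bitwise logic on an all-ones / all-zeros mask,
// mirroring the AND + ANDN + OR sequence: (Mask & A) | (~Mask & B).
inline float selectViaLogic(bool Cond, float A, float B) {
  std::uint32_t Mask = Cond ? 0xFFFFFFFFu : 0u;
  std::uint32_t ABits, BBits;
  std::memcpy(&ABits, &A, sizeof(ABits));
  std::memcpy(&BBits, &B, sizeof(BBits));
  std::uint32_t RBits = (Mask & ABits) | (~Mask & BBits);
  float R;
  std::memcpy(&R, &RBits, sizeof(R));
  return R;
}

// The same select modelled the way BLENDV works: the choice is made per
// element from the sign bit of the mask.
inline float selectViaBlend(bool Cond, float A, float B) {
  std::uint32_t Mask = Cond ? 0xFFFFFFFFu : 0u;
  return (Mask >> 31) ? A : B;
}

// The compromise from the thread: with SSE4.1 available, prefer BLENDV
// unless both select operands have multiple uses.
inline bool preferBlendv(bool HasSSE41, bool LHSOneUse, bool RHSOneUse) {
  return HasSSE41 && (LHSOneUse || RHSOneUse);
}

// Assumed per-uarch parameters; the values passed in are illustrative.
struct UopModel {
  unsigned BlendvUops;  // e.g. 2 on some SSE4-era cores, 1 on newer ones
  bool MoveElimination; // register moves counted as free when true
};

// 3 single-uop logic ops, plus possibly 1 move to preserve the ANDNP mask.
inline unsigned logicSequenceUops(bool NeedsMaskCopy) {
  return 3u + (NeedsMaskCopy ? 1u : 0u);
}

// NumMoves register copies plus the BLENDV itself.
inline unsigned blendvSequenceUops(const UopModel &M, unsigned NumMoves) {
  return (M.MoveElimination ? 0u : NumMoves) + M.BlendvUops;
}
```

With these assumptions the worst case from the thread is `blendvSequenceUops({2, false}, 3)`, i.e. 5 uops against 4 for `logicSequenceUops(true)`, while a core with move elimination and a single-uop BLENDV comes out at 1.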
https://github.com/llvm/llvm-project/pull/125853