[PATCH] Improve DAG combine pass on certain IR vector patterns

Chandler Carruth chandlerc at google.com
Fri Jan 16 15:28:59 PST 2015

On Fri, Jan 16, 2015 at 3:23 PM, Fiona Glaser <fglaser at apple.com> wrote:

> > On Jan 16, 2015, at 3:13 PM, Chandler Carruth <chandlerc at google.com>
> > wrote:
> >
> > Cool, this looks good to me provided your data indicates this pattern
> > works well on other targets as well. =]
> >
> > Thanks for working on it! Any chance you can also look at the fact that
> > we use vmovq here rather than vmovlpd?
> I would think vmovq is more correct; it zeroes out the rest of the
> register while vmovlpd doesn’t, so vmovlpd would create a false dependency
> on the source register.
> >
> > We also at some point need to do a post-processing of the shuffles and
> > replace ones that use packed double type when there is an equivalent for
> > packed single type and it removes a bitcast.... It would be really awesome
> > to get the "obvious" code of vmovlps + vmovhps here (or some variant of
> > vmovlps that still targeted the floating point vector unit and didn't have
> > an input dependency... maybe vxorps + vmovlps + vmovhps would be best).
> I’m really not certain saving one cycle of latency on a unit/unit
> forwarding delay would be worth an entire extra uop;

All of my measurements indicate that it is actually more than one cycle in
practice. =/ It is actually a huge hit on AMD chips, and even on Intel,
I've seen code whose performance really fluctuated around this.

The other reason I'm not worried about it is that xorps X, X should only
take up space in the decode buffer, etc.; the register renamer and such
handle those, AFAICT, with essentially zero execution cost.
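
To make the dependency argument concrete, here is a minimal sketch in
Python that models a 128-bit register as two 64-bit lanes. It is only a toy
model of lane semantics, not a measurement of anything; the function names
are hypothetical, and they assume the memory-load forms of the instructions
discussed above.

```python
# Toy model: an XMM register is a list of two 64-bit lanes, [low, high].
# All function names here are hypothetical and exist only for illustration.

def movq_load(mem64):
    """vmovq-style load: write the low lane and zero the high lane."""
    return [mem64, 0]

def movlpd_load(other, mem64):
    """vmovlpd/vmovlps-style load: write the low lane, keep other's high lane."""
    return [mem64, other[1]]

def movhpd_load(other, mem64):
    """vmovhpd/vmovhps-style load: write the high lane, keep other's low lane."""
    return [other[0], mem64]

def xor_zero():
    """xorps X, X: the result is all zeros whatever X held before."""
    return [0, 0]

stale = [0xDEAD, 0xBEEF]  # whatever the register happened to hold earlier

# vmovq never reads the old register contents.
assert movq_load(1) == [1, 0]

# A merging low load carries the stale high lane forward, so its result
# depends on the previous value: this is the false dependency.
assert movlpd_load(stale, 1) == [1, 0xBEEF]

# Zeroing first makes the final value independent of the stale contents.
assert movhpd_load(movlpd_load(xor_zero(), 1), 2) == [1, 2]
```

The last assertion mirrors the vxorps + vmovlps + vmovhps sequence: both
merging loads still read a register, but that register is the freshly
zeroed one rather than whatever was there before.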

> that doesn’t really feel worth it at all. Plus I’m not even sure those
> particular instructions have that delay (it’s only specific combinations, I
> think…?)

That may well be true. I would certainly hope that they get decoded to
something less crazy.

> It’s not my fault x86 has weirdly non-orthogonal vector instructions ;-)

;] But without them, the vector shuffle lowering wouldn't be *nearly* so
much fun.