[PATCH] Improve DAG combine pass on certain IR vector patterns

Fiona Glaser fglaser at apple.com
Fri Jan 16 15:32:41 PST 2015

> All of my measurements indicate that it is actually more than one cycle in practice. =/ It is actually a huge hit on AMD chips, and even on Intel, I've seen code whose performance really fluctuated around this.

Darnit, I had almost escaped the “having to care about performance on AMD” train ;-)

> The other reason I'm not worried about it is that xorps X, X should only take up space in the decode buffer, etc.; the register renamer and such handle those, AFAICT, with essentially zero execution cost.
> that doesn’t really feel worth it at all. Plus I’m not even sure those particular instructions have that delay (it’s only specific combinations, I think…?)
> That may well be true. I would certainly hope that they get decoded to something less crazy.
> It’s not my fault x86 has weirdly non-orthogonal vector instructions ;-)
> ;] But without them, the vector shuffle lowering wouldn't be *nearly* so much fun.

Heehee, indeed. Anyways, I think this is getting slightly off-topic from the original thread!
