[PATCH] Improve DAG combine pass on certain IR vector patterns
fglaser at apple.com
Fri Jan 16 15:32:41 PST 2015
> All of my measurements indicate that it is actually more than one cycle in practice. =/ It is actually a huge hit on AMD chips, and even on Intel, I've seen code whose performance really fluctuated because of this.
Darnit, I had almost escaped the “having to care about performance on AMD” train ;-)
> The other reason I'm not worried about it is that xorps X, X should only take up space in the decode buffer, etc.; the register renamer and such handle those, AFAICT, with essentially zero execution cost.
> that doesn’t really feel worth it at all. Plus I’m not even sure those particular instructions have that delay (it’s only specific combinations, I think…?)
> That may well be true. I would certainly hope that they get decoded to something less crazy.
> It’s not my fault x86 has weirdly non-orthogonal vector instructions ;-)
> ;] But without them, the vector shuffle lowering wouldn't be *nearly* so much fun.
Heehee, indeed. Anyways, I think this is getting slightly off-topic from the original thread!