[PATCH] Optimize double storing by memset; memcpy (Take two)
Peter Cooper
peter_cooper at apple.com
Wed Mar 6 09:41:47 PST 2013
Hi Joel
I implemented something similar to this a year ago and found that it regressed performance when it shortened a memcpy that would otherwise have been done entirely in vector registers.
E.g., if you have a 64-byte memcpy, it would have been 4 vector load/stores (that is, with 16-byte vectors). But if the memset shortens it to 60 bytes, you'll get 3 vector load/stores plus 2 or 3 scalar ones.
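To make the arithmetic concrete, here is a rough sketch of the count I have in mind. It assumes 16-byte vector registers and a greedy lowering that covers the copy with the widest access first; the names (CopyCost, estimateCopyCost) are made up for illustration and are not anything in LLVM.

```cpp
#include <cstddef>

// Hypothetical cost model, not LLVM code. Each "op" is one load/store pair.
struct CopyCost {
    std::size_t vectorOps;  // full-width vector load/store pairs
    std::size_t scalarOps;  // scalar load/store pairs for the leftover tail
};

// Greedily cover `bytes` with vector accesses, then with power-of-two
// scalar accesses no wider than `maxScalarWidth`.
// Assumes vectorWidth > 0 and maxScalarWidth is a power of two.
CopyCost estimateCopyCost(std::size_t bytes, std::size_t vectorWidth,
                          std::size_t maxScalarWidth) {
    CopyCost cost{bytes / vectorWidth, 0};
    std::size_t remaining = bytes % vectorWidth;
    for (std::size_t w = maxScalarWidth; w >= 1 && remaining > 0; w /= 2) {
        cost.scalarOps += remaining / w;
        remaining %= w;
    }
    return cost;
}
```

With vectorWidth = 16, a 64-byte copy comes out as 4 vector ops and no scalar ops. A 60-byte copy comes out as 3 vector ops plus 2 scalar ops when 8-byte scalars are available (8 + 4), or 3 scalar ops when the widest scalar is 4 bytes.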
I don't think this comment should block the patch; it's just something to think about. I don't think this part of the optimiser even knows the vector width of the target, if you did want to check for that.
Thanks
Pete
On Mar 6, 2013, at 8:05 AM, Joel Jones <joel_k_jones at apple.com> wrote:
>
> Note that the compile times aren't close, as my change is against a different version than the baseline.
>
> http://llvm-reviews.chandlerc.com/D498
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits