[PATCH] Optimize double storing by memset; memcpy (Take two)

Wed Mar 6 09:41:47 PST 2013

Hi Joel

I implemented something similar to this a year ago and found that it regressed performance when it shortened a memcpy that would otherwise have been done entirely in vector registers.

E.g., if you have a 64-byte memcpy, it would have been 4 vector load/stores (assuming 16-byte vectors). But if the memset shortens it to 60 bytes, you'll have 3 vector load/stores and 2 or 3 scalar ones.
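
To make the arithmetic concrete, here is a rough sketch of the counting. It is not code from the patch or from LLVM: the names estimate_copy_cost and shortening_regresses are hypothetical, and it assumes the copy is expanded inline into full-width vector load/store pairs plus a scalar tail covered greedily with power-of-two widths. Real lowering may use overlapping or unaligned accesses and count differently.

```c
#include <stddef.h>

/* Hypothetical cost model: number of load/store pairs needed to copy
 * `len` bytes inline. Not taken from LLVM; for illustration only. */
struct copy_cost {
    size_t vector_ops; /* full-width vector load/store pairs */
    size_t scalar_ops; /* scalar load/store pairs covering the tail */
};

/* Assumes vector_bytes is nonzero and max_scalar_bytes is a nonzero
 * power of two (e.g. 8 on a 64-bit target, 4 on a 32-bit target). */
static struct copy_cost estimate_copy_cost(size_t len, size_t vector_bytes,
                                           size_t max_scalar_bytes)
{
    struct copy_cost cost;
    size_t tail = len % vector_bytes;
    size_t width;

    cost.vector_ops = len / vector_bytes;
    cost.scalar_ops = 0;

    /* Cover the tail greedily with the widest scalar that still fits. */
    for (width = max_scalar_bytes; width >= 1; width /= 2) {
        while (tail >= width) {
            tail -= width;
            cost.scalar_ops++;
        }
    }
    return cost;
}

/* Returns 1 if shortening the copy from old_len to new_len increases the
 * total number of load/store pairs under this model, 0 otherwise. */
static int shortening_regresses(size_t old_len, size_t new_len,
                                size_t vector_bytes, size_t max_scalar_bytes)
{
    struct copy_cost before =
        estimate_copy_cost(old_len, vector_bytes, max_scalar_bytes);
    struct copy_cost after =
        estimate_copy_cost(new_len, vector_bytes, max_scalar_bytes);
    return (after.vector_ops + after.scalar_ops) >
           (before.vector_ops + before.scalar_ops);
}
```

Under this model, a 64-byte copy with 16-byte vectors is 4 vector operations, while a 60-byte copy is 3 vector operations plus 2 scalar ones (8 + 4 bytes) when 8-byte scalars are available, or 3 scalar ones (4 + 4 + 4 bytes) when the widest scalar is 4 bytes. A check like shortening_regresses would need the target's vector width as an input.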

I don't think this comment should block the patch; it's just something to think about. I don't think this part of the optimiser even knows the vector width of the target, should you want to check for that.

Thanks
Pete

On Mar 6, 2013, at 8:05 AM, Joel Jones <joel_k_jones at apple.com> wrote:

> 
> Note that the compile times aren't close, as my change is against a different version than the baseline.
> 
> http://llvm-reviews.chandlerc.com/D498