[PATCH] D20176: [MemCpyOpt] Use MaxIntSize in byte instead of bit

Fri May 13 08:01:23 PDT 2016

junbuml added a comment.

> In your test case aren't we merging these stores into a memset to be later lowered to a series of stores (or single 64-bit store) on most targets? The final code generated will be better in this case, because it can merge the 3 stores.

Yes, for even small number of stores, isProfitableToUseMemset() try to see if we can reduce the total number of stores by merge them to memset which will be lowed later. But MaxIntSize should be in byte to make the heuristic correct.

> In other cases I'm concerned the rather small memset might perturb optimizations that don't know how to deal with memsets.

Not sure exactly which pass, but I think it could be, if a pass gives up it's optimization when encountering a call instruction. If we concern merging small number of stores into memset here, we might need to discuss if doing this itself is profitable or not.

> Don't we have a pass that merges stores into wider stores (e.g., SLP vectorizer or the counterpart to the LoadMerge pass or something)?

For  sequential stores, it seams that the narrow stores are combined in DAG level :

  store i16 0, i16* %0
  store i16 0, i16* %1

>
-

  STRWui %vreg1, %vreg0, 0;

As far as I check, however, this doesn't seem to handle the test case in this change, where I mixed the stored size :  2byte, 4byte, 2byte.

http://reviews.llvm.org/D20176