[llvm-dev] array fill idioms

Fri Nov 25 15:46:54 PST 2016

Take a look at memset (byte patterns) and memset_patternX (multi-byte 
patterns, currently supported only on select targets).  In general, 
support for fill idioms is something we could stand to improve, and it's 
something I or someone on my team is likely to be working on within the 
next year.
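
To make the "byte patterns" distinction concrete, here is a minimal sketch 
in C++ (not LLVM code).  The names is_byte_splat and fill_u16 are 
hypothetical, made up for this example.  A 16-bit fill value can be 
expressed as a plain memset only when both of its bytes are equal; anything 
else needs a multi-byte pattern fill or a store loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when both bytes of v are identical, i.e. the
// fill can be expressed as a single-byte memset pattern.
inline bool is_byte_splat(std::uint16_t v) {
    return (v >> 8) == (v & 0xFF);
}

// Hypothetical helper: fill n 16-bit elements with v, using memset when the
// byte pattern allows it and a plain store loop otherwise.
inline void fill_u16(std::uint16_t *dst, std::size_t n, std::uint16_t v) {
    if (is_byte_splat(v)) {
        std::memset(dst, v & 0xFF, n * sizeof(std::uint16_t));
        return;
    }
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = v;
}
```

Under this check, values such as 0 or 0xFFFF qualify for memset, while 42 
(0x002A) does not and falls through to the loop.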

Today, the naive store loop is probably your best choice to have emitted 
by the frontend.  This loop will be nicely vectorized by the loop 
vectorizer, specialized if the loop length is known to be small, and 
otherwise decently handled.  The only serious problem with this 
implementation strategy is that you end up with many copies of the fill 
loop scattered throughout your code (code bloat).  (I'm assuming this 
gets aggressively inlined.  If it doesn't, well, then there are bigger 
problems.)
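
For reference, a minimal sketch of what I mean by the naive store loop, 
written as C++ source rather than IR.  The name fill_range is hypothetical; 
the point is only the shape: one store per element, with no intrinsics or 
library calls.

```cpp
#include <cstddef>
#include <cstdint>

// Naive fill: store the same value into every element of the half-open
// range [begin, end).  Nothing here is target specific.
template <typename T>
inline void fill_range(T *data, std::size_t begin, std::size_t end, T value) {
    for (std::size_t i = begin; i < end; ++i)
        data[i] = value;
}
```

With a 16-bit element type, fill_range<std::int16_t>(a, 2, 10, 42) 
corresponds to the D slice assignment "A[2..10] = 42;" from the question, 
writing elements 2 through 9.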

Moving forward, any further support we add would definitely include 
pattern matching of the naive loop constructs, so this approach is 
reasonably future-proof.

Philip

On 11/10/2016 01:25 PM, Bagel via llvm-dev wrote:
> I am asking for some collective wisdom/guidance.
>
> What sort of IR construct should one use to implement filling each
> element in an array (or vector) with the same value?  In C++, this
> might arise in "std::fill" or "std::fill_n", when the element values in the
> vector are identical.
> In the D language, one can fill an array or a slice of an array
> by an assignment, e.g.
>    "A[2..10] = 42;"
>
> 1. What I would prefer is an explicit intrinsic, call it "llvm.fill.*" that
>     would work similar to the "llvm.memset.*" intrinsic.  The memset intrinsic
>     only works with byte arrays, but provides wonderful optimizations in the
>     various code generators.  Hopefully, these similar optimizations would be
>     implemented for "llvm.fill.*".
>
> 2. Given that I probably won't get my wish, I note that some front-ends use
>     vector assignment:
>     store <8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42,
>           i16 42>, <8 x i16>* %14, align 2
>     Does this work well for architectures without SIMD?
>     What chunk size should be used for the vector, and is that architecture
>     dependent?
>
> 3. If vectors are not used, but rather an explicit loop of stores,
>     element-by-element, will this be recognized as an idiom for
>     architecture-dependent optimizations?
>
> Thanks in advance.