[llvm-dev] array fill idioms
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Fri Nov 25 15:46:54 PST 2016
Take a look at memset (byte patterns) and memset_patternX (multi-byte
patterns, currently supported only on select targets). In general,
support for fill idioms is something we could stand to improve and it's
something I or someone on my team is likely to be working on within the
Today, the naive store loop is probably your best choice to have emitted
by the frontend. This loop will be nicely vectorized by the loop
vectorizer, specialized if the loop length is known to be small, and
otherwise decently handled. The only serious problem with this
implementation strategy is that you end up with many copies of the fill
loop scattered throughout your code (code bloat). (I'm assuming this
gets aggressively inlined. If it doesn't, well, then there are bigger
Moving forward, any further support we add would definitely handle
pattern matching the naive loop constructs. Given that, the naive loop
is also reasonably future-proof.
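As a rough illustration of the naive store loop discussed above, here is a minimal C++ sketch of the source-level shape a frontend would lower to IR. The names fill_naive and is_single_byte_pattern are hypothetical helpers made up for this example; they are not LLVM or standard library APIs. The second helper only illustrates the "byte pattern" distinction: a fill can be expressed as a plain memset only when every byte of the value is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the plain element-by-element store loop, e.g. what a
// frontend could emit for a D slice assignment such as "A[2..10] = 42;".
template <typename T>
void fill_naive(T *dst, std::size_t n, T value) {
    assert(dst != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;  // one store per element, no explicit vectors
}

// Hypothetical helper: returns true when all bytes of `value` are identical,
// i.e. when the same fill could be written as a single-byte memset.
template <typename T>
bool is_single_byte_pattern(T value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    for (std::size_t i = 1; i < sizeof(T); ++i)
        if (bytes[i] != bytes[0])
            return false;
    return true;
}
```

For example, a 16-bit 42 (bytes 0x2A and 0x00) is not a single-byte pattern, so it falls into the multi-byte case, whereas 0x2A2A would qualify for a byte-wise memset.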
On 11/10/2016 01:25 PM, Bagel via llvm-dev wrote:
> I am asking for some collective wisdom/guidance.
> What sort of IR construct should one use to implement filling each
> element in an array (or vector) with the same value? In C++, this
> might arise in "std::fill" or "std::fill_n", when the element values in the
> vector are identical.
> In the D language, one can fill an array or a slice of an array
> by an assignment, e.g.
> "A[2..10] = 42;"
> 1. What I would prefer is an explicit intrinsic, call it "llvm.fill.*" that
> would work similar to the "llvm.memset.*" intrinsic. The memset intrinsic
> only works with byte arrays, but provides wonderful optimizations in the
> various code generators. Hopefully, these similar optimizations would be
> implemented for "llvm.fill.*".
> 2. Given that I probably won't get my wish, I note that some front-ends use
> vector assignment:
> store <8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>, <8 x i16>* %14, align 2
> Does this work well for architectures without SIMD?
> What chunk size should be used for the vector, and is that architecture
> 3. If vectors are not used, but rather an explicit loop of stores,
> element-by-element, will this be recognized as an idiom for
> architecture-dependent optimizations?
> Thanks in advance.