[llvm-dev] Lowering llvm.memset for ARM target

Tue Sep 5 12:23:46 PDT 2017

As reported in an earlier thread
(http://clang-developers.42468.n3.nabble.com/Disable-memset-synthesis-tp4057810.html),
we noticed that in some cases lowering the llvm.memset intrinsic to
stores, rather than to a call to memset, helps performance.

Here's a test case. If LIMIT is greater than 8, a call to memset is
emitted for the arm and aarch64 targets, but not for the x86 target.

typedef struct {
    int v0[100];
} test;
#define LIMIT 9
void init(test *t)
{
    int i;
    for (i = 0; i < LIMIT ; i++)
      t->v0[i] = 0;
}
int main() {
    test t;
    init(&t);
    return 0;
}
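
To make the comparison concrete, here is a hand-written sketch of the two forms the loop in init() can end up in. The names init_call and init_stores are made up for illustration and are not compiler output; an optimizing compiler is free to turn either form into the other.

```c
#include <string.h>

#define LIMIT 9

typedef struct {
    int v0[100];
} test;

/* Form 1: the loop recognized as a memset and left as a library call. */
void init_call(test *t)
{
    memset(t->v0, 0, LIMIT * sizeof(int));
}

/* Form 2: the same zeroing written as LIMIT individual word stores,
   roughly what lowering the intrinsic to stores amounts to. */
void init_stores(test *t)
{
    t->v0[0] = 0;
    t->v0[1] = 0;
    t->v0[2] = 0;
    t->v0[3] = 0;
    t->v0[4] = 0;
    t->v0[5] = 0;
    t->v0[6] = 0;
    t->v0[7] = 0;
    t->v0[8] = 0;
}
```

Both functions zero the first LIMIT elements of v0 and leave the rest untouched; they differ only in whether a call is involved.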

Looking at the llvm sources, I see that there are two key
target-specific variables, MaxStoresPerMemset and
MaxStoresPerMemsetOptSize, that determine whether the llvm.memset
intrinsic can be lowered into store operations. For ARM, these
variables are set to 8 and 4 respectively.
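
As a rough illustration of how such a limit plays out, here is a simplified model in C. It is only a sketch: the helper names are hypothetical, the store width is assumed to be a power of two, and the actual lowering in LLVM picks store types per target and takes alignment into account, so this shows the counting idea only.

```c
#include <stddef.h>

/* Hypothetical helper: count the stores needed to cover `bytes` bytes,
   using stores of `store_width` bytes first and progressively narrower
   stores (half the width each time) for whatever is left over.
   Assumes store_width is a power of two. */
static unsigned stores_needed(size_t bytes, size_t store_width)
{
    unsigned count = 0;
    while (store_width > 0) {
        count += (unsigned)(bytes / store_width);
        bytes %= store_width;
        store_width /= 2;
    }
    return count;
}

/* Returns 1 if the memset would be expanded into stores under this
   model, 0 if it would stay a call to memset. */
int lowers_to_stores(size_t bytes, size_t store_width, unsigned max_stores)
{
    return stores_needed(bytes, store_width) <= max_stores;
}
```

Under this model, with 4-byte stores and a limit of 8, LIMIT = 8 (32 bytes) fits and LIMIT = 9 (36 bytes) does not, which matches the behaviour of the test case. A limit of 16 would admit both.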

I do not know how the default values for these two variables were
arrived at, but doubling them (similar to the values for the x86
target) seems to help our case: we observe a 7% increase in the
performance of our networking application. We build with -O3 and
-flto for 32-bit ARM.

I can prepare a patch and post it for review if such a change, say
under CodeGenOpt::Aggressive, would be acceptable.

Thanks,
Bharathi