[PATCH] D103009: [DSE] Transform memset + malloc --> calloc (PR25892)

Fri Aug 20 07:49:48 PDT 2021

alexfh added a comment.

In D103009#2956871 <https://reviews.llvm.org/D103009#2956871>, @yurai007 wrote:

>> I found and reduced a test case that shows a too eager replacement of malloc with calloc: https://gcc.godbolt.org/z/dTjonof74
>> (...) This is quite a serious problem. Would you please revert while working on a fix?
>
> First of all just to make it clear - in this particular case GCC does exactly same: https://gcc.godbolt.org/z/qbE6Kxdnv unless you pass different flags.

As I said, "GCC also has this problem" is not a good excuse.

In D103009#2956948 <https://reviews.llvm.org/D103009#2956948>, @xbolva00 wrote:

> I think more codes with “common pattern” see improvements so few ones which are problematic should use flag or attribute.
>
> Also unroller sometimes goes crazy and degrades performance but here nobody talks about revert/removal/turning off.

Revert is the safest thing to do when a real problem has been root caused to the commit and there is no quick and obvious fix. If there's no consensus on introducing a postdominance check with the purpose of making this transformation more conservative, I'd insist on reverting until there's a clear path forward.
I'd probably be fine with using a workaround (`-fno-builtin-calloc`) for now, but it also disables the completely reasonable and uncontroversial transformation of `memset(malloc(n), 0, n)` to `calloc(1, n)`, which is a net regression from the state before this commit: https://gcc.godbolt.org/z/Ev64KPs85

And needless to say, "there's a problem elsewhere, why should we do better here?" is not a productive approach ;) If you see a change in unroller pessimizing use cases you care about, it *may* be reasonable to speak up, ask for a revert and figure out how to solve the problem while keeping trunk in a better shape.

In D103009#2956897 <https://reviews.llvm.org/D103009#2956897>, @yurai007 wrote:

>> I will be blunt: i do not understand what are the profitability heuristics for this transform are?
>> When is `calloc` better than a `malloc` immediately followed by the `memset`?
>
> IIRC the main rationale behind this transformation is here: https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc. 
> I think I could prepare benchmark showing benefits of calloc over malloc+memset, I already observed reduced RSS and less page faults after transformation on some tests.

Yes, please. A benchmark measuring the benefits of this transformation would be nice. Specifically, it would be interesting to see the differences between the too eager (current) version of this transform vs a more conservative version (to be implemented) with a "memset postdominating malloc" check. For comparison I have a benchmark that shows a 4+x increase in RSS with this transformation in a real production code running in a rather realistic configuration.

> However if non-standard malloc is used it may behave differently.

In the post I linked to above (http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html) there were measurements of performance impacts of this transformation (when done by GCC) with three different malloc implementations - glibc, tcmalloc, jemalloc. glibc malloc suffered the most (>30% drop in QPS).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103009/new/

https://reviews.llvm.org/D103009