[PATCH] D104636: [LoopIdiom] [LoopNest] let the pass deal with runtime memset size

Thu Jul 15 05:49:52 PDT 2021

yurai007 added a comment.

Just a minor comment regarding potential benefits and benchmarking. If I understand the patch correctly, it flattens a multilevel loop containing a memset into a single memset. According to simple microbenchmarks (https://godbolt.org/z/MTnYcvvYo) on my x86-64 Skylake, flattening the memset in a double loop (memset_3D) into one memset gives a performance boost ranging from as much as 800% (for a small working set) down to ~80% (when the working set is larger than the LLC).
But how useful would such a transformation be in practice? I'm not sure. We need to keep in mind that memset is usually just one part of memory initialization/reuse code, so in real-world benchmarks flattening memset loops may be less beneficial than the microbenchmarks show.
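
To make the shape of the transformation concrete, here is a minimal sketch of what I mean by flattening. It is not taken from the patch or from the linked benchmark: the function names, the contiguous n x m x o layout, and the zero fill value are all assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical names and layout, for illustration only.
// Nested form: one memset call per innermost row of a contiguous
// n x m x o byte buffer, with all three sizes known only at run time.
void memset_nested(char *buf, std::size_t n, std::size_t m, std::size_t o) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      std::memset(buf + (i * m + j) * o, 0, o);
}

// Flattened form: a single memset covering the same n * m * o bytes.
// Assumes the rows are adjacent in memory and that n * m * o does not
// overflow std::size_t.
void memset_flat(char *buf, std::size_t n, std::size_t m, std::size_t o) {
  std::memset(buf, 0, n * m * o);
}
```

Both functions write the same bytes; the second one just replaces n * m library calls with one, which is presumably where most of the microbenchmark gain on a small working set comes from.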

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104636/new/

https://reviews.llvm.org/D104636