[PATCH] D77528: [MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm

Tue Apr 7 21:46:06 PDT 2020

bondhugula marked 14 inline comments as done.
bondhugula added a comment.

@ftynse Everything's addressed now but you may want to take another look because I had completely missed the fact earlier that aligned_alloc only supports a size that is a multiple of alignment. So I have additional handling (and test cases for that) now to bump the allocation size to the next multiple of alignment. Also, if the elt size is not a power of two (and an alignment attribute doesn't exist), the next power of two is used for alignment (although this isn't an interesting case for aligned allocation, didn't want to punt to malloc for this to keep things simple/clear). So, all heap allocations use aligned_alloc whenever -use-aligned-alloc is set.

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1427
+      uint64_t constEltSizeBytes = 0;
+      auto isMallocAlignmentSufficient = [&]() {
+        if (auto vectorType = elementType.template dyn_cast<VectorType>())
----------------
rriddle wrote:
> ftynse wrote:
> > bondhugula wrote:
> > > mehdi_amini wrote:
> > > > ftynse wrote:
> > > > > nicolasvasilache wrote:
> > > > > > bondhugula wrote:
> > > > > > > ftynse wrote:
> > > > > > > > bondhugula wrote:
> > > > > > > > > ftynse wrote:
> > > > > > > > > > I wouldn't add a lambda that is only called once immediately after its definition.
> > > > > > > > > Hmm... this is just for better readability - it gives a name / auto documents a code block without the need to outline it into a function or add an explicit comment. I've seen this as a standard practice.
> > > > > > > > This does not seem to be common practice in MLIR. FWIW, I find it less readable than just writing
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > int64_t constEltSizeBytes = 0;
> > > > > > > > if (auto vectorType = elementType.template dyn_cast<VectorType>())
> > > > > > > >   constEltSizeBytes =
> > > > > > > >       vectorType.getNumElements() *
> > > > > > > >       llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
> > > > > > > > else
> > > > > > > >   constEltSizeBytes =
> > > > > > > >       llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
> > > > > > > > 
> > > > > > > > // Use aligned_alloc if elt_size > malloc's alignment.
> > > > > > > > bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
> > > > > > > > useAlignedAlloc = isMallocAlignmentSufficient;
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > Since you already have the block comment immediately above it anyway, and variables can have names just as well as lambdas. The lambda also mutates a global state that it captures by-reference, so the only effects of lambda are: (1) extra indentation; (2) extra LoC; and (3) extra concepts leading to cognitive overhead.
> > > > > > > Hmm... the demarcation/isolation is important I feel. I'm fine with changing to the straightline style but out of curiosity and for future purposes as well, it'll be good to have a third person view here on coding style as far as such patterns go: @mehdi_amini - is there a guideline here?
> > > > > > I am generally a fan of such style (esp. when mixed with functional combinators), so I'd vote for +1 it when it makes sense.
> > > > > Generally, I would be strongly opposed to defining style guidelines based on a single use of a construct in a single diff, where only a small subset of contributors could participate (and before you can object that review history is public, are you reading all comments on all diffs?). I would be also opposed to having to define additional rules until we have to. I am not generally opposed to helper lambdas, I just don't see any benefit from this specific one, only drawbacks. And a lambda that the entire environment by reference is not exactly my definition of isolation.
> > > > @bondhugula this seems too detailed to have a guideline :)
> > > > I wouldn't say it is "common", but probably not unheard of?
> > > > I have been doing this myself but in general not calling it right after, rather to outline a block of code outside of a loop to make the loop shorter and easier to read, or similar situation (getting large boilerplate out of the way and "naming it").
> > > > 
> > > > @rriddle ?
> > > As Mehdi now confirms, this is too detailed to have a style guideline. The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it. 
> > > The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. 
> > 
> > Well, you have a block comment right above it (modulo the variable declaration that is only used inside the lambda), so I wouldn't call it auto-documenting since you felt like you needed to write documentation for it. And having a named variable of lambda-type or an identically-named variable of boolean type still gives you exactly the same naming scheme.
> > 
> > > Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.
> > 
> > I read Mehdi's comment differently:
> > 
> > > but in general not calling it right after
> > 
> > > rather to outline a block of code outside of a loop to make the loop shorter
> > 
> > do not seem to necessarily support your usage here (neither does it contradict).
> > 
> > Readability is a very subjective thing. I was reading your code for review purposes and this construct did make me lose time and expand more energy than for a straight-line code here, so for me personally it decreased readability. Namely because it (a) mutates an implicitly captured variable and (b) requires to unwrap more abstractions mentally. Anyway, I won't block the commit just because of a stylistic discussion.
> > 
> > I can suggest an alternative that would address part of my readability concerns:
> > 
> > ```
> > int64_t constEltSizeBytes = [elementType]() {
> >   if (auto vectorType = elementType.template dyn_cast<VectorType>())
> >     return vectorType.getNumElements() *
> >         llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
> >   else
> >     return llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
> > }();
> > bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
> > ```
> > 
> > This removes implicit by-reference capture, makes it clear that you do not intend for the lambda to be reused (named lambda would be also okay since it doesn't store references anyway, but there's no point), and this way of using lambdas is actually considered a C++11 idiom for comlex constant initialization (https://groups.google.com/a/isocpp.org/g/std-discussion/c/FBjcR4WJlkU/m/nQnsSOziq04J) so one can claim it's "common enough".
> FWIWI, there is already a guide on using lamdas for computing predicates:
> 
> https://llvm.org/docs/CodingStandards.html#turn-predicate-loops-into-predicate-functions
Thank you all for commenting. This discussion is now moot since the aligned_alloc generation is now untangled from being contingent on this conditional / what malloc supports, and so this whole lambda and conditionals are gone. 

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1589
+  // This is the alignment malloc typically provides.
+  constexpr static unsigned kMallocAlignment = 16;
 };
----------------
ftynse wrote:
> bondhugula wrote:
> > ftynse wrote:
> > > bondhugula wrote:
> > > > ftynse wrote:
> > > > > bondhugula wrote:
> > > > > > ftynse wrote:
> > > > > > > I am a bit concerned about hardcoding the "typical" value. Can we make it parametric instead?
> > > > > > I had sort of a similar concern. But 16 bytes is pretty much what glibc malloc gives on nearly every system we have (on probably really old ones, it was perhaps 8 bytes). Did you want a pass flag and then letting 16 be the default - that would be too much plumbing (just like alignedAlloc). This is already a parameter of sorts. 
> > > > > The world is not limited to glibc. MLIR should also work on other platforms, and you essentially shift the burden of the plumbing you didn't do to people debugging builds on those platforms. You can have one pass option that corresponds to malloc alignment and, if it is set to 0, treat it as "never use aligned_alloc".
> > > > Sorry, I didn't quite understand. What should the pass options be and what should the behavior and the default behavior be?
> > > Normally, you would have two pass options (and a configuration `struct` for the constructors like Nicolas proposed in another patch to decrease the amount of churn in pass constructor APIs): `-use-aligned-alloc` and `-assume-malloc-alignment`. If you don't want two separate options, you could get away with one `-use-aligned-alloc-and-assume-malloc-alignment` (did not think about a better name). If it is set to zero (default), the conversion doesn't use aligned_alloc at all. If it is set to non-zero, the conversion uses aligned_alloc and treats the option value as malloc alignment in order to also use aligned_alloc in relevant cases.
> > The configuration struct is now done (in the parent revision). How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided? This revision is not about tinkering with malloc alignment handling. 
> > 
> > Update - PTAL. Thanks for all the feedback.
> > How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided?
> 
> Works for me!
Done - aligned_alloc is now used only with -use-aligned-alloc, and for all heap allocations whenever that cmd line flag exists. 

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1408
+    // Whenever we don't have alignment set, we will still use aligned_alloc
+    // if the alignment needed is more than what malloc can provide.
+    uint64_t constEltSizeBytes = 0;
----------------
nicolasvasilache wrote:
> Alignment needed is really a target-specific thing isn't it?
> I would have expected something that looks at DataLayout and that brings in the can of worms we have been punting on (I understand once flang is in MLIR core we will want to reopen it).
> 
> Can this part be dropped from this revision, esp in light of @ftynse's comments?
> 
> I also have some fun micro ARM targets I will need to test some of this on, the smaller the baked in assumptions the better.
> 
> 
All of this now dropped. -use-aligned-alloc is now independent of what malloc can do. The malloc path remains untouched.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77528/new/

https://reviews.llvm.org/D77528