[PATCH] D77528: [MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm

Tue Apr 7 04:49:56 PDT 2020

ftynse marked an inline comment as done.
ftynse added inline comments.

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1404
+  static Optional<int64_t> getAllocationAlignment(Operation *op,
+                                                  bool &useAlignedAlloc) {
+    auto elementType = cast<AllocLikeOp>(op).getType().getElementType();
----------------
bondhugula wrote:
> bondhugula wrote:
> > ftynse wrote:
> > > bondhugula wrote:
> > > > ftynse wrote:
> > > > > Can't you just rely on the Optional result being None to signal that aligned alloc is not required?
> > > > We still use aligned_alloc if the elt size is > 16 bytes (in spite of no alignment attribute). The code would otherwise crash since the loads/stores won't have alignment guaranteed (with malloc) but the they would be lowered by LLVM via aligned load/stores. (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries). 
> > > > We still use aligned_alloc if the elt size is > 16 bytes
> > > 
> > > At which point, this function returns the alignment value rather than `llvm::None`.
> > > The only situation where it returns a non-`None` value and sets `useAlignedAlloc` to `false` is for `AllocaOp`. I would consider refactoring this function to only work on `AllocOp`, and querying the `AllocaOp` allocation separately in the call place. This would make the API simpler and shorten the code. And you wouldn't need an extra boolean result.
> > > 
> > > > (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries).
> > > 
> > > I suppose we can just take the ABI alignment into account when lowering the `alloc` operation (we already do so for the `alignment` attribute). Not all platforms have `aligned_alloc`.
> > >I would consider refactoring this function to only work on 
> > >AllocOp, and querying the AllocaOp allocation separately in the 
> > 
> > Sure, sounds good.
> > 
> > >I suppose we can just take the ABI alignment into account when 
> > >lowering the alloc operation (we already do so for the
> > 
> > Yes, we should do this with malloc as well. This can be a TODO and should be done in a separate revision (because it isn't to do with aligned_alloc but fixing malloc lowering).
> > Not all platforms have aligned_alloc.
> 
> Btw, aligned_alloc is a C11 and C++11 standard, and it's available on Windows too (it's posix_memalign that isn't). And of course there are systems where there isn't any dynamic memory allocation at all. 
Yes, I know it's C11 (but C++17, most changes in C11 did not make it to C++11), but unfortunately, I don't think we can assume all target systems have C11, let alone C++17.

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1427
+      uint64_t constEltSizeBytes = 0;
+      auto isMallocAlignmentSufficient = [&]() {
+        if (auto vectorType = elementType.template dyn_cast<VectorType>())
----------------
bondhugula wrote:
> mehdi_amini wrote:
> > ftynse wrote:
> > > nicolasvasilache wrote:
> > > > bondhugula wrote:
> > > > > ftynse wrote:
> > > > > > bondhugula wrote:
> > > > > > > ftynse wrote:
> > > > > > > > I wouldn't add a lambda that is only called once immediately after its definition.
> > > > > > > Hmm... this is just for better readability - it gives a name / auto documents a code block without the need to outline it into a function or add an explicit comment. I've seen this as a standard practice.
> > > > > > This does not seem to be common practice in MLIR. FWIW, I find it less readable than just writing
> > > > > > 
> > > > > > ```
> > > > > > int64_t constEltSizeBytes = 0;
> > > > > > if (auto vectorType = elementType.template dyn_cast<VectorType>())
> > > > > >   constEltSizeBytes =
> > > > > >       vectorType.getNumElements() *
> > > > > >       llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
> > > > > > else
> > > > > >   constEltSizeBytes =
> > > > > >       llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
> > > > > > 
> > > > > > // Use aligned_alloc if elt_size > malloc's alignment.
> > > > > > bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
> > > > > > useAlignedAlloc = isMallocAlignmentSufficient;
> > > > > > ```
> > > > > > 
> > > > > > Since you already have the block comment immediately above it anyway, and variables can have names just as well as lambdas. The lambda also mutates a global state that it captures by-reference, so the only effects of lambda are: (1) extra indentation; (2) extra LoC; and (3) extra concepts leading to cognitive overhead.
> > > > > Hmm... the demarcation/isolation is important I feel. I'm fine with changing to the straightline style but out of curiosity and for future purposes as well, it'll be good to have a third person view here on coding style as far as such patterns go: @mehdi_amini - is there a guideline here?
> > > > I am generally a fan of such style (esp. when mixed with functional combinators), so I'd vote for +1 it when it makes sense.
> > > Generally, I would be strongly opposed to defining style guidelines based on a single use of a construct in a single diff, where only a small subset of contributors could participate (and before you can object that review history is public, are you reading all comments on all diffs?). I would be also opposed to having to define additional rules until we have to. I am not generally opposed to helper lambdas, I just don't see any benefit from this specific one, only drawbacks. And a lambda that the entire environment by reference is not exactly my definition of isolation.
> > @bondhugula this seems too detailed to have a guideline :)
> > I wouldn't say it is "common", but probably not unheard of?
> > I have been doing this myself but in general not calling it right after, rather to outline a block of code outside of a loop to make the loop shorter and easier to read, or similar situation (getting large boilerplate out of the way and "naming it").
> > 
> > @rriddle ?
> As Mehdi now confirms, this is too detailed to have a style guideline. The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it. 
> The lambda demarcates the start and end of the thing it's naming/auto-documenting - you don't get it from code comments alone and I see it better for readability. 

Well, you have a block comment right above it (modulo the variable declaration that is only used inside the lambda), so I wouldn't call it auto-documenting since you felt like you needed to write documentation for it. And having a named variable of lambda-type or an identically-named variable of boolean type still gives you exactly the same naming scheme.

> Given @ntv's and @mehdi_amini's comments, I'm now strongly inclined to retain it.

I read Mehdi's comment differently:

> but in general not calling it right after

> rather to outline a block of code outside of a loop to make the loop shorter

do not seem to necessarily support your usage here (neither does it contradict).

Readability is a very subjective thing. I was reading your code for review purposes and this construct did make me lose time and expand more energy than for a straight-line code here, so for me personally it decreased readability. Namely because it (a) mutates an implicitly captured variable and (b) requires to unwrap more abstractions mentally. Anyway, I won't block the commit just because of a stylistic discussion.

I can suggest an alternative that would address part of my readability concerns:

```
int64_t constEltSizeBytes = [elementType]() {
  if (auto vectorType = elementType.template dyn_cast<VectorType>())
    return vectorType.getNumElements() *
        llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
  else
    return llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);
}();
bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
```

This removes implicit by-reference capture, makes it clear that you do not intend for the lambda to be reused (named lambda would be also okay since it doesn't store references anyway, but there's no point), and this way of using lambdas is actually considered a C++11 idiom for comlex constant initialization (https://groups.google.com/a/isocpp.org/g/std-discussion/c/FBjcR4WJlkU/m/nQnsSOziq04J) so one can claim it's "common enough".

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1589
+  // This is the alignment malloc typically provides.
+  constexpr static unsigned kMallocAlignment = 16;
 };
----------------
bondhugula wrote:
> ftynse wrote:
> > bondhugula wrote:
> > > ftynse wrote:
> > > > bondhugula wrote:
> > > > > ftynse wrote:
> > > > > > I am a bit concerned about hardcoding the "typical" value. Can we make it parametric instead?
> > > > > I had sort of a similar concern. But 16 bytes is pretty much what glibc malloc gives on nearly every system we have (on probably really old ones, it was perhaps 8 bytes). Did you want a pass flag and then letting 16 be the default - that would be too much plumbing (just like alignedAlloc). This is already a parameter of sorts. 
> > > > The world is not limited to glibc. MLIR should also work on other platforms, and you essentially shift the burden of the plumbing you didn't do to people debugging builds on those platforms. You can have one pass option that corresponds to malloc alignment and, if it is set to 0, treat it as "never use aligned_alloc".
> > > Sorry, I didn't quite understand. What should the pass options be and what should the behavior and the default behavior be?
> > Normally, you would have two pass options (and a configuration `struct` for the constructors like Nicolas proposed in another patch to decrease the amount of churn in pass constructor APIs): `-use-aligned-alloc` and `-assume-malloc-alignment`. If you don't want two separate options, you could get away with one `-use-aligned-alloc-and-assume-malloc-alignment` (did not think about a better name). If it is set to zero (default), the conversion doesn't use aligned_alloc at all. If it is set to non-zero, the conversion uses aligned_alloc and treats the option value as malloc alignment in order to also use aligned_alloc in relevant cases.
> The configuration struct is now done (in the parent revision). How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided? This revision is not about tinkering with malloc alignment handling. 
> 
> Update - PTAL. Thanks for all the feedback.
> How about just keeping it simple with -use-aligned-alloc and not changing previous/existing behavior when -use-aligned-alloc is not provided?

Works for me!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77528/new/

https://reviews.llvm.org/D77528