[PATCH] D77528: [MLIR] Add support to use aligned_alloc to lower AllocOp from std to llvm

Mon Apr 6 05:22:29 PDT 2020

ftynse added inline comments.

================
Comment at: mlir/include/mlir/Conversion/Passes.td:228
   let options = [
-    Option<"useAlloca", "use-alloca", "bool", /*default=*/"false",
-           "Use `alloca` instead of `call @malloc` for converting std.alloc">,
----------------
Nit: I suppose this option should be removed by the previous commit, instead of rewritten by this one.

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1404
+  static Optional<int64_t> getAllocationAlignment(Operation *op,
+                                                  bool &useAlignedAlloc) {
+    auto elementType = cast<AllocLikeOp>(op).getType().getElementType();
----------------
bondhugula wrote:
> ftynse wrote:
> > Can't you just rely on the Optional result being None to signal that aligned alloc is not required?
> We still use aligned_alloc if the elt size is > 16 bytes (in spite of no alignment attribute). The code would otherwise crash since the loads/stores won't have alignment guaranteed (with malloc) but the they would be lowered by LLVM via aligned load/stores. (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries). 
> We still use aligned_alloc if the elt size is > 16 bytes

At which point, this function returns the alignment value rather than `llvm::None`.
The only situation where it returns a non-`None` value and sets `useAlignedAlloc` to `false` is for `AllocaOp`. I would consider refactoring this function to only work on `AllocOp`, and querying the `AllocaOp` allocation separately in the call place. This would make the API simpler and shorten the code. And you wouldn't need an extra boolean result.

> (LLVM in the absence of alignment attributes on load/store would assume ABI alignment which for say vector<8xf32> elt types would be 256-bit boundaries).

I suppose we can just take the ABI alignment into account when lowering the `alloc` operation (we already do so for the `alignment` attribute). Not all platforms have `aligned_alloc`.

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1427
+      uint64_t constEltSizeBytes = 0;
+      auto isMallocAlignmentSufficient = [&]() {
+        if (auto vectorType = elementType.template dyn_cast<VectorType>())
----------------
bondhugula wrote:
> ftynse wrote:
> > I wouldn't add a lambda that is only called once immediately after its definition.
> Hmm... this is just for better readability - it gives a name / auto documents a code block without the need to outline it into a function or add an explicit comment. I've seen this as a standard practice.
This does not seem to be common practice in MLIR. FWIW, I find it less readable than just writing

```
int64_t constEltSizeBytes = 0;
if (auto vectorType = elementType.template dyn_cast<VectorType>())
  constEltSizeBytes =
      vectorType.getNumElements() *
      llvm::divideCeil(vectorType.getElementTypeBitWidth(), 8);
else
  constEltSizeBytes =
      llvm::divideCeil(elementType.getIntOrFloatBitWidth(), 8);

// Use aligned_alloc if elt_size > malloc's alignment.
bool isMallocAlignmentSufficient = constEltSizeBytes > kMallocAlignment;
useAlignedAlloc = isMallocAlignmentSufficient;
```

Since you already have the block comment immediately above it anyway, and variables can have names just as well as lambdas. The lambda also mutates a global state that it captures by-reference, so the only effects of lambda are: (1) extra indentation; (2) extra LoC; and (3) extra concepts leading to cognitive overhead.

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1589
+  // This is the alignment malloc typically provides.
+  constexpr static unsigned kMallocAlignment = 16;
 };
----------------
bondhugula wrote:
> ftynse wrote:
> > I am a bit concerned about hardcoding the "typical" value. Can we make it parametric instead?
> I had sort of a similar concern. But 16 bytes is pretty much what glibc malloc gives on nearly every system we have (on probably really old ones, it was perhaps 8 bytes). Did you want a pass flag and then letting 16 be the default - that would be too much plumbing (just like alignedAlloc). This is already a parameter of sorts. 
The world is not limited to glibc. MLIR should also work on other platforms, and you essentially shift the burden of the plumbing you didn't do to people debugging builds on those platforms. You can have one pass option that corresponds to malloc alignment and, if it is set to 0, treat it as "never use aligned_alloc".

================
Comment at: mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp:1464
           loc, elementPtrType, cumulativeSize,
-          allocationAlignment ? allocationAlignment.getValue().getSExtValue()
-                              : 0);
+          allocationAlignment ? allocationAlignment.getValue() : 0);
     }
----------------
I would just have

```
auto allocaOp = dyn_cast<AllocaOp>(op);
if (allocaOp) {
  allocatedBytePtr = nullptr;
  accessAlignment = nullptr;
  return rewriter.create<LLVM::AllocaOp>(
          loc, elementPtrType, cumulativeSize,
          allocaOp.alignment() ? allocaOp.alignemnt().getValue() : 0);
}
```

and sink the `getAllocationAlignment` below, all while making it handle only `AllocOp`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77528/new/

https://reviews.llvm.org/D77528