[llvm-dev] [RFC] Proposal for TLX: Tensor LLVM eXtensions
Kothari, Akash via llvm-dev
llvm-dev at lists.llvm.org
Tue Nov 23 10:57:24 PST 2021
(+ Vikram, Charith, Dounia, Rafae, Milind and Sudipta)
Thanks for your feedback and questions. Please see my inlined comments.
On Nov 23, 2021, at 11:32 AM, Florian Hahn <florian_hahn at apple.com> wrote:
Thanks for sharing the proposal! I think the matrix extension has shown that it is feasible to use a ‘flat vector’ encoding to support more complex operations. Decoupling the shape information from the ‘operational’ intrinsics seems very neat!
Below are some additional initial questions.
* The proposal itself is very big, both in terms of text as well as in the code that will be required to implement it. Have you thought about how to bring up support in a way that allows using a (smaller) subset of intrinsics end-to-end?
We intend to submit code review requests for patches that add the proposed intrinsics (typeinfo, the different MMA intrinsics, matmul, transpose, padding, etc.) to LLVM, along with support for legalizing them to target-specific intrinsics for Intel AMX, NVIDIA Tensor Cores, and other targets. Lowering for different targets could go in separate code review requests. We can also implement and submit a code review request for a pass that converts the current matrix extensions to our intrinsics, so that people using the existing matrix extensions can try out our intrinsics and the lowering support for different targets. In subsequent code review requests, we will submit support for N-D to 2-D lowering as that implementation becomes more mature on our end.
* What will the hardware specific lowering look like? I think you mentioned you are planning to support a set of different hardware architectures. Will this require a separate lowering pass for each of those?
Yes, lowering for different hardware targets entails lowering from target-agnostic intrinsics to target-specific ones in LLVM IR, so each target would require a separate pass.
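As a rough illustration of the "one lowering pass per target" structure described above, here is a minimal Python sketch. It is not LLVM code: the operation names (`tensor.*`, `target_a.*`, `target_b.*`) and the `lower` function are hypothetical placeholders, not real intrinsics or APIs from the proposal.

```python
from typing import Dict, List

# Hypothetical names throughout: neither the "tensor.*" operations nor the
# "target_a.*" / "target_b.*" ones are real LLVM intrinsics.
LOWERING_TABLES: Dict[str, Dict[str, str]] = {
    "target_a": {
        "tensor.matmul": "target_a.tile_matmul",
        "tensor.transpose": "target_a.tile_transpose",
    },
    "target_b": {
        "tensor.matmul": "target_b.mma",
    },
}


def lower(ops: List[str], target: str) -> List[str]:
    """Rewrite target-agnostic ops into the ops of a single target.

    Each target has its own table, mirroring the idea of a separate
    lowering pass per target. Ops that are not tensor ops pass through.
    """
    table = LOWERING_TABLES[target]
    lowered = []
    for op in ops:
        if op.startswith("tensor.") and op not in table:
            # This target has no mapping for the tensor op.
            raise NotImplementedError(f"{target} cannot lower {op}")
        lowered.append(table.get(op, op))
    return lowered
```

For example, `lower(["tensor.matmul", "add"], "target_a")` rewrites only the tensor operation and leaves `add` untouched, while asking `target_b` to lower `tensor.transpose` raises an error because that table has no entry for it.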
* What’s the motivation for some intrinsics returning a vector and others returning a token type? Could all intrinsics return vector? This would be more consistent and the type info is associated to the value itself in any case.
Actually, all intrinsics return vectors except the typeinfo and tensor load intrinsics. The tensor load intrinsic returns a token type because it explicitly returns a tensor, and the tensor info is known at the time the tensor load is performed. LLVM already has an instruction for loading vectors.
* Will variable shapes/sizes be supported? IIRC you mentioned that the type intrinsic can take arbitrary values as arguments. But some intrinsics return vectors, so they would need a fixed size?
Variable shapes/sizes can be represented with our intrinsics. In that case, our intrinsics could return values of a vector type with vscale. The lowering does not use this vector type information; it uses the shape information available in typeinfo. The use of vscale in the IR should therefore suffice to indicate that the size of the tensors is not known at compile time.
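To make the separation between the flat vector and the shape information concrete, here is a minimal Python sketch. It is only an analogy, not the proposed IR: the `TypeInfo` class, the row-major layout, and the helper names are assumptions made for illustration. The point is that the data stays a flat sequence, and every operation consults a separate shape record rather than the vector's own length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical stand-in for the shape record carried by typeinfo.

    A row-major layout is assumed here for simplicity.
    """
    shape: Tuple[int, ...]


def flat_index(info: TypeInfo, idx: Tuple[int, ...]) -> int:
    """Map an N-D index to an offset into the flat vector (row-major)."""
    offset = 0
    for dim, i in zip(info.shape, idx):
        if not 0 <= i < dim:
            raise IndexError(f"index {idx} out of bounds for {info.shape}")
        offset = offset * dim + i
    return offset


def matmul_flat(a: List[int], a_info: TypeInfo,
                b: List[int], b_info: TypeInfo) -> Tuple[List[int], TypeInfo]:
    """Multiply two 2-D tensors stored as flat vectors.

    The shapes come only from the TypeInfo records, never from the
    lengths of the flat vectors themselves.
    """
    m, k = a_info.shape
    k2, n = b_info.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out_info = TypeInfo((m, n))
    out = [0] * (m * n)
    for i in range(m):
        for j in range(n):
            out[flat_index(out_info, (i, j))] = sum(
                a[flat_index(a_info, (i, p))] * b[flat_index(b_info, (p, j))]
                for p in range(k)
            )
    return out, out_info
```

Because the shape lives in `TypeInfo`, the same flat data can be reinterpreted under a different shape without touching the vector, which is roughly why the vector type in the IR does not need to encode the tensor's real dimensions.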
* You mentioned Julia and Halide as potential adopters. Do you know if there’s concrete interest to switch to using the new intrinsics by the maintainers? What would the anticipated timeframe be? I think this could be a crucial argument for having this in LLVM directly, if we have people who are going to use this ASAP.
We intend to get in touch with people working on Julia soon. We have not spoken with people working on Halide specifically but the lead of a compiler team at Qualcomm, Anshu Dasgupta, has been involved in our project. Currently, Halide generates target-specific intrinsics in LLVM IR to be able to target different architectures and there is some interest in using the target-agnostic intrinsics. However, we have not discussed any timeline with them yet.
* What will Clang support for arbitrary tensors look like? If Clang won’t support arbitrary tensors, why not?
Clang would just need to generate our intrinsics with the appropriate tensor shape, layout information, etc., which are available in the application written in a frontend language supported by LLVM. For an example, see the section of our documentation on lowering from Julia to our tensor intrinsics: https://docs.google.com/document/d/1A3xbrtouckRsPz94v2XttjoaTSqQlz1pSzVe80-Jmro/edit#heading=h.17j13gwxto8i
* AFAICT this should completely subsume the matrix extension and if we decide to add the more general extension the matrix extension should be removed. How will the transition from the current matrix intrinsics to the new tensor intrinsics work? Can existing IR be auto-upgraded?
We are thinking about implementing a pass that can convert LLVM IR with matrix extensions to IR with our proposed intrinsics and vice versa. If people would be interested in this, we could contribute these passes to LLVM.