[llvm-dev] Complex proposal v3 + roundtable agenda

Thu Nov 12 13:13:43 PST 2020

Some architectures have instructions that assist with complex arithmetic. Without intrinsics it may be hard to use such instructions, especially because arithmetic simplifications can rewrite the expanded arithmetic before it can be matched. Perhaps, depending on TTI, those intrinsics could be expanded into the explicit arithmetic?
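
For reference, a minimal sketch of what "the explicit arithmetic" for a complex multiply looks like when complex values are packed into a vector. The function name `complex_multiply_packed` and the interleaved [re, im, re, im, ...] layout are assumptions made for illustration; they are not taken from the proposal or the patches, and the sketch uses the naive formula without any special handling of NaN or infinity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): multiplies complex numbers that are
// stored interleaved as [re0, im0, re1, im1, ...]. The layout is an assumption.
std::vector<float> complex_multiply_packed(const std::vector<float> &a,
                                           const std::vector<float> &b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); i += 2) {
    const float ar = a[i], ai = a[i + 1];
    const float br = b[i], bi = b[i + 1];
    // (ar + ai*i) * (br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i
    out[i] = ar * br - ai * bi;     // real part: two fmul and one fsub
    out[i + 1] = ar * bi + ai * br; // imaginary part: two fmul and one fadd
  }
  return out;
}
```

Each complex product expands to four multiplies plus one add and one subtract, which is the kind of fadd/fsub/fmul group a late pattern match would have to recognize after other combines have run.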

--
Krzysztof Parzyszek  kparzysz at quicinc.com   AI tools development

-----Original Message-----
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Cameron McInally via llvm-dev
Sent: Thursday, November 12, 2020 12:53 PM
To: Florian Hahn <florian_hahn at apple.com>
Cc: David Greene <dag at cray.com>; llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] Complex proposal v3 + roundtable agenda

On Thu, Nov 12, 2020 at 12:03 PM Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> There’s growing interest among our users in making better use of dedicated hardware instructions for complex math, and I would like to restart the discussion on the topic. Given that the original thread was started a while ago, apologies if I missed anything already discussed on the list or at the round-table. The original mail is quoted below.
>
> In particular, I’m interested in the AArch64 side of things, like using FCMLA [1] for complex multiplications to start with.
>
> To get the discussion going, I’d like to share an alternative pitch. Instead of starting with adding complex types, we could start with adding a set of intrinsics that operate on complex values packed into vectors instead.
>
> Starting with intrinsics would allow us to bring up the lowering of those intrinsics to target-specific nodes incrementally without having to make substantial changes across the codebase, as adding new types would require. Initially, we could try to match IR patterns that correspond to complex operations late in the pipeline. We can then work on incrementally moving the point where the intrinsics are introduced earlier in the pipeline, as we adapt more passes to deal with them. This way, we won’t have to teach all passes about complex types at once or risk losing out on all the existing combines on the corresponding floating point operations.
>
> I think if we introduce a small set of intrinsics for complex math (like @llvm.complex.multiply) we could use them to improve code generation in key passes like the vectorizers and deliver large improvements to our users fairly quickly. There might be some scenarios which require a dedicated IR type, but I think we can get a long way with a set of specialized intrinsics at a much lower cost. If we later decide that dedicated IR types are needed, replacing the intrinsics should be easy and we will benefit from having already updated various passes to deal with the intrinsics.
>
> We took a similar approach when adding matrix support to LLVM and I think that worked out very well in the end. The implementation upstream generates equivalent or better code than our earlier implementation using dedicated IR matrix types, while being simpler and impacting a much smaller area of the codebase.
>
> An independent issue to discuss is how to generate complex math intrinsics.
> As part of the initial bring-up, I’d propose matching the code Clang generates for operations on std::complex<> & co to introduce the complex math intrinsics. This won’t be perfect and will miss cases, but it allows us to deliver initial improvements without requiring extensive updates to existing libraries or frontends. I don’t think either the intrinsic-only or the complex-type variant is inherently more convenient for frontends to emit.
>
> To better illustrate what this approach could look like, I put up a set of rough patches that introduce a @llvm.complex.multiply intrinsic (https://reviews.llvm.org/D91347), replace a set of fadd/fsub/fmul instructions with @llvm.complex.multiply (https://reviews.llvm.org/D91353) and lower the intrinsic to FCMLA on AArch64 (https://reviews.llvm.org/D91354). Note that those are just rough proof-of-concept patches.
>
> Cheers,
> Florian

Hi Florian,

The proposed experimental intrinsics are a difficult detour to accept for performance reasons. With a complex type, the usual algebraic simplifications fall out for free (or close to it). Teaching existing optimizations how to handle the new complex intrinsics seems like a LOT of unnecessary work.

That said, we recently had this same conversation at Simon Moll's native predication sync-up meeting. Simon had some convincing ways to work around predicated intrinsic optimization (e.g. the PredicatedInstruction class). Maybe we should explore a more generalized solution that would cover complex intrinsics too?

Digressing a bit, have we ever discussed using a branch to develop something like complex support? That way we would avoid an experimental intrinsic implementation, but also not disturb the codebase until the implementation is complete.

-Cameron