[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

Mon Oct 7 09:27:30 PDT 2019

dmgreen added subscribers: samparker, SjoerdMeijer.
dmgreen added a comment.

This is looking good to me. My understanding is that is has some dependencies? The llvm side will likely needed to go in first, plus a couple of clang patches?

================
Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8529
   "argument should be a multiple of %0">;
+def err_argument_not_power_of_2 : Error<
+  "argument should be a power of 2">;
----------------
simon_tatham wrote:
> dmgreen wrote:
> > Do we have any/should we have some tests for these errors?
> By the time we finish implementing all the intrinsics, there will be tests for these errors. The intrinsics that need them aren't in this preliminary commit, though.
OK. That's fair.

================
Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp -mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp -mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s
----------------
simon_tatham wrote:
> dmgreen wrote:
> > These tests all run -O3, the entire pass pipeline. I see quite a few tests in the same folder do the same thing, but in this case we will be adding quite a few tests. Random mid-end optimizations may well keep on altering them.
> > 
> > Is it possible to use -disable-O0-optnone and pipe them into opt -mem2reg? What would that do to the codegen, would it be a lot more verbose than it is now?
> > 
> > Also could/should they be using clang_cc1?
> The immediate problem is that if you use any form of `clang | opt | FileCheck` command line, then `update_cc_test_checks.py` says 'skipping non-FileChecked line', because it doesn't support anything more complicated than a two-stage pipeline consisting of clang and FileCheck.
> 
> I've enhanced `update_cc_test_checks` to handle that case, in D68406, and respun these tests accordingly. But if that patch doesn't get approval then I'll have to rethink this again.
> 
> (For `vld24.c` I ended up running `opt -sroa -early-cse` to avoid the IR becoming noticeably more verbose. The rest worked fine with just `mem2reg`, though.)
> 
> 
> Patching `update_cc_test_checks.py` to support more complex pipelines, it seems to work OK: most codegen changes are trivial ones, such as modifiers not appearing on IR operations (`call` instead of `tail call`, plain `shl` in place of `shl nuw`). Only `vld24` becomes significantly more complicated: for that one file I had to run `opt -mem2reg -sroa -early-cse` instead.
Yeah, they look OK. Thanks for making the change. It looks like a useful feature being added.

================
Comment at: clang/test/CodeGen/arm-mve-intrinsics/vcvt.c:13
+{
+    return vcvttq_f16_f32(a, b);
+}
----------------
Tests for bottom too would be good to see too. In general, more tests are better.

================
Comment at: clang/test/CodeGen/arm-mve-intrinsics/vcvt.c:20
+// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
+// CHECK-NEXT:    [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.fltnarrow.predicated(<8 x half> [[A:%.*]], <4 x float> [[B:%.*]], i32 1, <4 x i1> [[TMP1]])
+// CHECK-NEXT:    ret <8 x half> [[TMP2]]
----------------
Are these tests still using old names? I'm not missing something about how these are generated, am I?

================
Comment at: clang/utils/TableGen/MveEmitter.cpp:132
+    // the pointee type.
+    Pointer,
+  };
----------------
simon_tatham wrote:
> dmgreen wrote:
> > The gathers are really a Vector of Pointers, in a way. But in the intrinsics (and IR), they are just treated as a vector of ints, so I presume a new type is not needed? We may (but not yet) want to make use of the llvm masked gathers. We would have to add codegen support for them first though (which I don't think we have plans for in the near term).
> Yes, I'm working so far on the assumption that we don't need to represent scatter/gather address vectors as a special vector-of-pointers type.
> 
> (If nothing else, it would be a bit strange for the 64-bit versions, where the vector element isn't even the same //size// as a pointer.)
> 
> If auto-generation of gather loads during vectorization turns out to need a special representation, then I suppose we'll have to rethink.
Sounds good.

================
Comment at: clang/utils/TableGen/MveEmitter.cpp:207
+    // pointer, especially if the pointee is also const.
+    if (isa<PointerType>(Pointee)) {
+      if (Const)
----------------
simon_tatham wrote:
> dmgreen wrote:
> > Would making this always east const be simpler? Do we generate anything with inner pointer types? Should there be a whitespace before the "const " in Name += "const "?
> There shouldn't be a whitespace, because clang-format agrees that the right spacing is `char *const p` and not `char * const p` :-)
> 
> But, hmm, now you mention it, it may well be that we never actually need a pointer-to-pointer type in the current state of the ACLE spec. Previous versions did have pointers to pointers, for written-back base addresses in contiguous loads/stores, but those intrinsics were removed in later drafts. So perhaps you're right that this is now unnecessary complexity.
Ah, I see. My brain and clang format often disagree about where the whitespace around a * should be :)

================
Comment at: clang/utils/TableGen/MveEmitter.cpp:755
+    }
+    std::list<Value::Ptr> used;
+    gen_code_dfs(Code, used, Pass);
----------------
simon_tatham wrote:
> dmgreen wrote:
> > Used
> > 
> > And why a list, out of interest?
> Just on the general principle of 'don't demand more capabilities from your data structure than you actually need' – I didn't need indexed random access in this case, and a list is the simplest structure that supports constant-time append and iteration. I can make it a vector if you prefer. (Perhaps the constant factors are better.)
Yeah, OK. Makes sense. I was thinking about it from the other way around; it doesn't need to insert into the middle of the list anywhere, so my default would be a vector. The spatial locality can be very nice.

No opinion which is really better though. Whatever you think.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161