[clang] [llvm] [Clang][AArch64] Add customisable immediate range checking to NEON (PR #100278)
via cfe-commits
cfe-commits at lists.llvm.org
Tue Aug 13 04:30:41 PDT 2024
================
@@ -1959,9 +2064,12 @@ multiclass VCMLA_ROTS<string type, string lanety, string laneqty> {
let isLaneQ = 1 in {
// vcmla{ROT}_laneq
+ // ACLE specifies that the fp16 vcmla_#ROT_laneq variant has an immediate range of 0 <= lane <= 1.
+ // fp16 is the only variant for which these two differ.
+ // https://developer.arm.com/documentation/ihi0073/latest/
+ defvar getlanety = !if(!eq(type, "h"), lanety, laneqty);
def : SOpInst<"vcmla" # ROT # "_laneq", "...QI", type, Op<(call "vcmla" # ROT, $p0, $p1,
- (bitcast $p0, (dup_typed lanety, (call "vget_lane", (bitcast laneqty, $p2), $p3))))>>;
-
+ (bitcast $p0, (dup_typed lanety, (call "vget_lane", (bitcast getlanety, $p2), $p3))))>>;
----------------
SpencerAbson wrote:
The `vcmla` intrinsics are instantiated for each base type ([f16, f32, f64]) under this `VCMLA_ROTS` multiclass. For example:
`defm VCMLA_F32 : VCMLA_ROTS<"f", "uint64x1_t", "uint64x2_t">;`
In each instantiation, `lanety` (here `uint64x1_t`) is a vector type with as many lanes as the `lane` argument can address in the [vcmlaq_lane](https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vcmlaq_lane) and [vcmla_lane](https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vcmla_lane) type intrinsics. `laneqty` (here `uint64x2_t`) plays the same role for the [vcmlaq_laneq](https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vcmlaq_laneq) and [vcmla_laneq](https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vcmla_laneq) type intrinsics. This gives the call to `vget_lane` in the instruction definition a vector with the correct number of elements, so that the range of the immediate is correctly bounded. (The immediate can only access some of the lower half of the vector, unlike traditional `_lane` type intrinsics, where we can access an arbitrary lane.)
This approach breaks down for the `f16`/`h` base type: the range of the immediate in `vcmlaq_laneq_f16` differs from that in `vcmla_laneq_f16` (`0<=lane<=3` and `0<=lane<=1` respectively). This is how the intrinsics are instantiated:
`defm VCMLA_FP16 : VCMLA_ROTS<"h", "uint32x2_t", "uint32x4_t">;`
The simplest fix I found for this case was to add the `getlanety` conditional, which selects `uint32x2_t` for `vcmla_laneq_f16` and `uint32x4_t` for `vcmlaq_laneq_f16`, giving the correct range for the immediates.
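To show the selection in isolation, here is a minimal standalone TableGen sketch. The class name `LaneqProxy`, the field `ProxyTy`, and the two record names are hypothetical and do not exist in `arm_neon.td`; only the `!if(!eq(type, "h"), lanety, laneqty)` expression and the type strings are taken from the patch and the instantiations above.

```tablegen
// Hypothetical stand-in for the choice made inside VCMLA_ROTS for the
// non-Q "_laneq" variant; this is not the real arm_neon.td definition.
class LaneqProxy<string type, string lanety, string laneqty> {
  // Mirrors the `getlanety` defvar: for fp16 ("h") keep the narrower
  // proxy type, otherwise use the quad-register proxy type.
  string ProxyTy = !if(!eq(type, "h"), lanety, laneqty);
}

// fp16: ProxyTy is "uint32x2_t", which bounds the lane to 0 <= lane <= 1.
def VCMLA_LANEQ_F16 : LaneqProxy<"h", "uint32x2_t", "uint32x4_t">;

// f32: ProxyTy is "uint64x2_t", unchanged from the previous behaviour.
def VCMLA_LANEQ_F32 : LaneqProxy<"f", "uint64x1_t", "uint64x2_t">;
```

Running this file through `llvm-tblgen` with no backend selected should print both records, so the resolved `ProxyTy` strings can be checked by eye.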
https://github.com/llvm/llvm-project/pull/100278