<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/142561>142561</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SVE] Compile error with SVE intrinsics on Windows on Arm
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
scoutzeng
</td>
</tr>
</table>
<pre>
I was compiling an SVE intrinsics sample with clang-cl 19.1.1 on Windows on Arm, and the compiler reported the following error:
**error: Incorrect size for matrix_multiply_sve epilogue: 12 bytes of instructions in range, but .seh directives corresponding to 8 bytes**
```c
#include &lt;arm_sve.h&gt;

void matrix_multiply_sve(const float32_t* A, const float32_t* B, float32_t* C, uint32_t n, uint32_t m, uint32_t k) {
    /*
     * Multiply matrices A and B, store the result in C.
     * It is the user's responsibility to make sure the matrices are compatible.
     */
    int a_idx;
    int b_idx;
    int c_idx;
    // these are the columns of a nx4 sub matrix of A
    svfloat32_t A0;
    svfloat32_t A1;
    svfloat32_t A2;
    svfloat32_t A3;
    // these are the columns of a 4x4 sub matrix of B
    svfloat32_t B0;
    svfloat32_t B1;
    svfloat32_t B2;
    svfloat32_t B3;
    // these are the columns of a nx4 sub matrix of C
    svfloat32_t C0;
    svfloat32_t C1;
    svfloat32_t C2;
    svfloat32_t C3;
    for (int i_idx = 0; i_idx < n; i_idx += svcntw()) {
        // calculate predicate for this i_idx
        svbool_t pred = svwhilelt_b32_u32(i_idx, n);
        for (int j_idx = 0; j_idx < m; j_idx += 4) {
            // zero accumulators before matrix op
            C0 = svdup_n_f32(0);
            C1 = svdup_n_f32(0);
            C2 = svdup_n_f32(0);
            C3 = svdup_n_f32(0);
            for (int k_idx = 0; k_idx < k; k_idx += 4) {
                // compute base index to 4x4 block
                a_idx = i_idx + n * k_idx;
                b_idx = k * j_idx + k_idx;
                // load most current a values in row
                A0 = svld1_f32(pred, A + a_idx);
                A1 = svld1_f32(pred, A + a_idx + n);
                A2 = svld1_f32(pred, A + a_idx + 2 * n);
                A3 = svld1_f32(pred, A + a_idx + 3 * n);
                // multiply accumulate 4x1 blocks, that is each column C
                B0 = svld1rq_f32(svptrue_b32(), B + b_idx);
                C0 = svmla_lane_f32(C0, A0, B0, 0);
                C0 = svmla_lane_f32(C0, A1, B0, 1);
                C0 = svmla_lane_f32(C0, A2, B0, 2);
                C0 = svmla_lane_f32(C0, A3, B0, 3);
                B1 = svld1rq_f32(svptrue_b32(), B + b_idx + k);
                C1 = svmla_lane_f32(C1, A0, B1, 0);
                C1 = svmla_lane_f32(C1, A1, B1, 1);
                C1 = svmla_lane_f32(C1, A2, B1, 2);
                C1 = svmla_lane_f32(C1, A3, B1, 3);
                B2 = svld1rq_f32(svptrue_b32(), B + b_idx + 2 * k);
                C2 = svmla_lane_f32(C2, A0, B2, 0);
                C2 = svmla_lane_f32(C2, A1, B2, 1);
                C2 = svmla_lane_f32(C2, A2, B2, 2);
                C2 = svmla_lane_f32(C2, A3, B2, 3);
                B3 = svld1rq_f32(svptrue_b32(), B + b_idx + 3 * k);
                C3 = svmla_lane_f32(C3, A0, B3, 0);
                C3 = svmla_lane_f32(C3, A1, B3, 1);
                C3 = svmla_lane_f32(C3, A2, B3, 2);
                C3 = svmla_lane_f32(C3, A3, B3, 3);
            }
            // compute base index for stores
            c_idx = n * j_idx + i_idx;
            svst1_f32(pred, C + c_idx, C0);
            svst1_f32(pred, C + c_idx + n, C1);
            svst1_f32(pred, C + c_idx + 2 * n, C2);
            svst1_f32(pred, C + c_idx + 3 * n, C3);
        }
    }
}
```
The code is a sample from the Arm Developer website that introduces matrix multiplication with SVE intrinsics. I don't know how to work around this error. Is this a compiler bug?
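To check the results once the function compiles, a plain scalar version of the same multiply can serve as a reference. This is a sketch, not part of the Arm sample: the name `matrix_multiply_ref` is hypothetical, and it assumes the column-major layout implied by the index arithmetic above (A is n x k, B is k x m, C is n x m).

```c
/* Hypothetical scalar reference for matrix_multiply_sve (not from the Arm sample).
 * Assumes column-major storage: A is n x k, B is k x m, C is n x m,
 * matching a_idx = i + n*k_idx, b_idx = k*j + k_idx and c_idx = n*j + i above. */
void matrix_multiply_ref(const float *A, const float *B, float *C,
                         unsigned n, unsigned m, unsigned k) {
    for (unsigned j = 0; j < m; j++) {         /* column of C and of B */
        for (unsigned i = 0; i < n; i++) {     /* row of C and of A */
            float acc = 0.0f;
            for (unsigned p = 0; p < k; p++) { /* shared dimension */
                acc += A[i + n * p] * B[p + k * j];
            }
            C[i + n * j] = acc;
        }
    }
}
```

Unlike the SVE version, this loop does not require m and k to be multiples of 4. The summation order differs from the vectorized code, so results may differ in the last bits; compare with a small tolerance rather than exact equality.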
</pre>