<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/129434>129434</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            s390x: `__builtin_reduce_and` does not optimize well
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          folkertdev
      </td>
    </tr>
</table>

<pre>
Given this C code:

https://godbolt.org/z/WvfG8TTxf

```c
#include <vecintrin.h>
#include <stdbool.h>

bool vectors_equal_builtin(vector int a, vector int b) {
    return vec_all_eq(a, b);
}

typedef int vec4i __attribute__((vector_size(16)));

bool vectors_equal_manual(vec4i a, vec4i b) {
    return __builtin_reduce_and(a == b);
}
``` 

The manual implementation is not optimized to the same code as the builtin one:

```asm
vectors_equal_builtin:
        vceqfs  %v0, %v24, %v26
        lghi    %r2, 0
        locghie %r2, 1
        br      %r14

vectors_equal_manual:
        aghi    %r15, -168
        vceqf   %v0, %v24, %v26
        vno     %v0, %v0, %v0
        vlgvf   %r1, %v0, 0
        vlgvf   %r0, %v0, 1
        sll     %r1, 3
        rosbg   %r1, %r0, 61, 61, 2
        vlgvf   %r0, %v0, 2
        rosbg   %r1, %r0, 62, 62, 1
        vlgvf   %r0, %v0, 3
        rosbg   %r1, %r0, 63, 63, 0
        tmll    %r1, 15
        lghi    %r2, 0
        locghie %r2, 1
        aghi    %r15, 168
        br      %r14
```

```llvm
define dso_local noundef zeroext i1 @vectors_equal_manual(<4 x i32> noundef %a, <4 x i32> noundef %b) local_unnamed_addr {
entry:
  %0 = icmp ne <4 x i32> %a, %b
  %1 = bitcast <4 x i1> %0 to i4
  %2 = icmp eq i4 %1, 0
  ret i1 %2
}
``` 

There are many variations on `vec_all_eq` (see https://www.ibm.com/docs/en/zos/2.4.0?topic=functions-any-predicates), and it would be neat if those all optimized. It might be possible to simplify clang's `vecintrin.h` too.
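
For illustration, here is a rough sketch of how a few of those variations might be written in the same generic style as `vectors_equal_manual`. The function names (`all_eq_manual` and friends) and the `all_lanes`/`any_lane` helpers are hypothetical, not part of `vecintrin.h`, and the `__has_builtin` fallback is only an assumption added so the sketch also builds on compilers without the reduce builtins. Whether any of these lower to a single `vceqfs` is exactly the open question.

```c
#include <stdbool.h>

typedef int vec4i __attribute__((vector_size(16)));

#ifndef __has_builtin
#define __has_builtin(x) 0 /* assumption: no reduce builtins, use the fallback */
#endif

/* Each lane of a vector comparison result is -1 (true) or 0 (false). */
static inline bool all_lanes(vec4i m) {
#if __has_builtin(__builtin_reduce_and)
    return __builtin_reduce_and(m);
#else
    return (m[0] & m[1] & m[2] & m[3]) != 0;
#endif
}

static inline bool any_lane(vec4i m) {
#if __has_builtin(__builtin_reduce_or)
    return __builtin_reduce_or(m);
#else
    return (m[0] | m[1] | m[2] | m[3]) != 0;
#endif
}

/* Hypothetical generic counterparts of the all/any predicates. */
bool all_eq_manual(vec4i a, vec4i b) { return all_lanes(a == b); }
bool any_eq_manual(vec4i a, vec4i b) { return any_lane(a == b); }
bool all_ne_manual(vec4i a, vec4i b) { return all_lanes(a != b); }
bool any_ne_manual(vec4i a, vec4i b) { return any_lane(a != b); }
```

The same pattern would presumably extend to the ordered comparisons (`<`, `<=`, `>`, `>=`).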

This came up while implementing `vec_all_eq` in the Rust standard library, where fewer custom intrinsics are better in every way.

cc @uweigand (posted here so it can be linked to)

</pre>