<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/65058>65058</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [bug] Different results of big and little endian's loop vectorize testcase on clang-15.x
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Jolyon0202
      </td>
    </tr>
</table>

<pre>
    Here is our testcase:

```
#include <stdio.h>

unsigned char bitmap;
unsigned segNum;

#define MIN(a, b) ((a) <= (b) ? (a) : (b))
__attribute__((noinline)) void tf(unsigned char segnum) {
  segNum = segnum;
  unsigned char seg = MIN(segNum, 4);
  for (unsigned int i = 0; i < seg; i++) {
    bitmap |= (unsigned char)(1UL << i);
  }
}

int main() {
  tf(0);
  printf("tf(0) bitmap: 0x%x\n", bitmap);
  tf(1);
  printf("tf(1) bitmap: 0x%x\n", bitmap);
  tf(2);
  printf("tf(2) bitmap: 0x%x\n", bitmap);
  tf(3);
  printf("tf(3) bitmap: 0x%x\n", bitmap);
  tf(4);
  printf("tf(4) bitmap: 0x%x\n", bitmap);
  tf(5);
  printf("tf(5) bitmap: 0x%x\n", bitmap);
  return 0;
}
```

$clang test.cpp -O2 --target=aarch64-linux-gnu -march=armv8-a && a.out

tf(0) bitmap: 0x0
tf(1) bitmap: 0x1
tf(2) bitmap: 0x3
tf(3) bitmap: 0x7
tf(4) bitmap: 0xf
tf(5) bitmap: 0xf

$clang test.cpp -O2 --target=aarch64_be-linux-gnu -march=armv8-a && a.out

tf(0) bitmap: 0x0
tf(1) bitmap: 0x0
tf(2) bitmap: 0x0
tf(3) bitmap: 0x0
tf(4) bitmap: 0x0
tf(5) bitmap: 0x0



Trunk LLVM does not reproduce this bug, because it applies a different loop-vectorize optimization to the tf function. But the underlying bug is still present in CodeGen. See: https://godbolt.org/z/zcGcvezdE

The key point appears to be the two REV32 instructions: if we remove them, we get the right result.
Alternatively, if we replace the uzp1 with uzp2, the result is also right.

15.x produces a trunc instruction in IR, which becomes trunc and bitcast in SelectionDAG, and CodeGen then emits XTN and REV32. On little endian the sequence is: trunc -> trunc + bitcast -> XTN.

The instruction selection patterns differ:
BE: def : Pat<(v4i16 (bitconvert (v2i32 FPR64:$src))), (v4i16 (REV32v4i16 FPR64:$src))>;
LE: def : Pat<(v4i16 (bitconvert (v2i32 FPR64:$src))), (v4i16 FPR64:$src)>;
See this commit: https://github.com/llvm/llvm-project/commit/30e0e11eb4d9157e8abeea3f0f9027a00617f2bb









</pre>