[llvm] add tests for loop definition of bitconvert (PR #133052)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 26 01:43:49 PDT 2025
https://github.com/Shoreshen created https://github.com/llvm/llvm-project/pull/133052
All tests passed, for two reasons:
1. For the SelectionDAG path, the patterns do not distinguish between SReg and VReg. One sample is:
```
define <2 x double> @v_bitcast_v4f32_to_v2f64(<4 x float> inreg %a, i32 %b) {
%cmp = icmp eq i32 %b, 0
br i1 %cmp, label %cmp.true, label %cmp.false
cmp.true:
%a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
%a2 = bitcast <4 x float> %a1 to <2 x double>
br label %end
cmp.false:
%a3 = bitcast <4 x float> %a to <2 x double>
br label %end
end:
%phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
ret <2 x double> %phi
}
```
It is supposed to select a scalar-register pattern, but the VReg pattern is matched instead, as shown below:
```
Debug log:
ISEL: Starting selection on root node: t3: v2f64 = bitcast t2
ISEL: Starting pattern match
Initial Opcode index to 440336
Skipped scope entry (due to false predicate) at index 440339, continuing at 440367
Skipped scope entry (due to false predicate) at index 440368, continuing at 440396
Skipped scope entry (due to false predicate) at index 440397, continuing at 440435
Skipped scope entry (due to false predicate) at index 440436, continuing at 440467
Skipped scope entry (due to false predicate) at index 440468, continuing at 440499
Skipped scope entry (due to false predicate) at index 440500, continuing at 440552
Skipped scope entry (due to false predicate) at index 440553, continuing at 440587
Skipped scope entry (due to false predicate) at index 440588, continuing at 440622
Skipped scope entry (due to false predicate) at index 440623, continuing at 440657
Skipped scope entry (due to false predicate) at index 440658, continuing at 440692
Skipped scope entry (due to false predicate) at index 440693, continuing at 440727
Skipped scope entry (due to false predicate) at index 440728, continuing at 440769
Skipped scope entry (due to false predicate) at index 440770, continuing at 440798
Skipped scope entry (due to false predicate) at index 440799, continuing at 440836
Skipped scope entry (due to false predicate) at index 440837, continuing at 440870
TypeSwitch[v2f64] from 440873 to 440892
Patterns:
/*440892*/ OPC_CompleteMatch, 1, 0,
// Src: (bitconvert:{ *:[v2f64] } VReg_128:{ *:[v4f32] }:$src0) - Complexity = 3
// Dst: VReg_128:{ *:[v2f64] }:$src0
```
2. GlobalISel uses `Select_COPY` to select the bitcast.
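To illustrate why a plain copy suffices there: a same-size bitcast only reinterprets the bits already held in the register, so no instruction other than a register copy is needed. A minimal sketch (a hypothetical reduced case, not one of the functions added by this patch) would be:
```llvm
define <2 x i32> @bitcast_i64_to_v2i32(i64 %a) {
  ; Same bit width on both sides (64 bits), so selection can
  ; lower this to a register copy rather than any ALU operation.
  %cast = bitcast i64 %a to <2 x i32>
  ret <2 x i32> %cast
}
```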
>From e01a648dc6aff77e34b4eaa3fcde576063267206 Mon Sep 17 00:00:00 2001
From: shore <372660931 at qq.com>
Date: Wed, 26 Mar 2025 16:32:18 +0800
Subject: [PATCH] add tests for loop definition of bitconvert
---
.../CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll | 2394 +++
.../CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll | 6084 ++++++
.../CodeGen/AMDGPU/amdgcn.bitcast.160bit.ll | 178 +
.../CodeGen/AMDGPU/amdgcn.bitcast.16bit.ll | 556 +
.../CodeGen/AMDGPU/amdgcn.bitcast.192bit.ll | 1062 ++
.../CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll | 194 +
.../CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll | 9118 +++++++++
.../CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll | 209 +
.../CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll | 220 +
.../CodeGen/AMDGPU/amdgcn.bitcast.32bit.ll | 1960 ++
.../CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll | 228 +
.../CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll | 235 +
.../CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll | 15566 ++++++++++++++++
.../CodeGen/AMDGPU/amdgcn.bitcast.64bit.ll | 4574 +++++
.../CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll | 163 +
llvm/test/lit.cfg.py | 2 +-
16 files changed, 42742 insertions(+), 1 deletion(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.160bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.16bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.192bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.32bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.64bit.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
new file mode 100644
index 0000000000000..9134339cd1665
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
@@ -0,0 +1,2394 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <32 x float> @v_bitcast_v32i32_to_v32f32(<32 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i32_to_v32f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_i32_e32 v31, vcc, 3, v31
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: v_add_i32_e32 v29, vcc, 3, v29
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v27, vcc, 3, v27
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v25, vcc, 3, v25
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v23, vcc, 3, v23
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v21, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v19, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v17, vcc, 3, v17
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i32_to_v32f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB0_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_u32_e32 v31, vcc, 3, v31
+; VI-NEXT: v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT: v_add_u32_e32 v29, vcc, 3, v29
+; VI-NEXT: v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT: v_add_u32_e32 v27, vcc, 3, v27
+; VI-NEXT: v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT: v_add_u32_e32 v25, vcc, 3, v25
+; VI-NEXT: v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT: v_add_u32_e32 v23, vcc, 3, v23
+; VI-NEXT: v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT: v_add_u32_e32 v21, vcc, 3, v21
+; VI-NEXT: v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT: v_add_u32_e32 v19, vcc, 3, v19
+; VI-NEXT: v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT: v_add_u32_e32 v17, vcc, 3, v17
+; VI-NEXT: v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB0_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i32_to_v32f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB0_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_u32_e32 v31, 3, v31
+; GFX9-NEXT: v_add_u32_e32 v30, 3, v30
+; GFX9-NEXT: v_add_u32_e32 v29, 3, v29
+; GFX9-NEXT: v_add_u32_e32 v28, 3, v28
+; GFX9-NEXT: v_add_u32_e32 v27, 3, v27
+; GFX9-NEXT: v_add_u32_e32 v26, 3, v26
+; GFX9-NEXT: v_add_u32_e32 v25, 3, v25
+; GFX9-NEXT: v_add_u32_e32 v24, 3, v24
+; GFX9-NEXT: v_add_u32_e32 v23, 3, v23
+; GFX9-NEXT: v_add_u32_e32 v22, 3, v22
+; GFX9-NEXT: v_add_u32_e32 v21, 3, v21
+; GFX9-NEXT: v_add_u32_e32 v20, 3, v20
+; GFX9-NEXT: v_add_u32_e32 v19, 3, v19
+; GFX9-NEXT: v_add_u32_e32 v18, 3, v18
+; GFX9-NEXT: v_add_u32_e32 v17, 3, v17
+; GFX9-NEXT: v_add_u32_e32 v16, 3, v16
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB0_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i32_to_v32f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_nc_u32_e32 v31, 3, v31
+; GFX11-NEXT: v_add_nc_u32_e32 v30, 3, v30
+; GFX11-NEXT: v_add_nc_u32_e32 v29, 3, v29
+; GFX11-NEXT: v_add_nc_u32_e32 v28, 3, v28
+; GFX11-NEXT: v_add_nc_u32_e32 v27, 3, v27
+; GFX11-NEXT: v_add_nc_u32_e32 v26, 3, v26
+; GFX11-NEXT: v_add_nc_u32_e32 v25, 3, v25
+; GFX11-NEXT: v_add_nc_u32_e32 v24, 3, v24
+; GFX11-NEXT: v_add_nc_u32_e32 v23, 3, v23
+; GFX11-NEXT: v_add_nc_u32_e32 v22, 3, v22
+; GFX11-NEXT: v_add_nc_u32_e32 v21, 3, v21
+; GFX11-NEXT: v_add_nc_u32_e32 v20, 3, v20
+; GFX11-NEXT: v_add_nc_u32_e32 v19, 3, v19
+; GFX11-NEXT: v_add_nc_u32_e32 v18, 3, v18
+; GFX11-NEXT: v_add_nc_u32_e32 v17, 3, v17
+; GFX11-NEXT: v_add_nc_u32_e32 v16, 3, v16
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i32> %a, splat (i32 3)
+ %a2 = bitcast <32 x i32> %a1 to <32 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i32> %a to <32 x float>
+ br label %end
+
+end:
+ %phi = phi <32 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x float> %phi
+}
+
+define <32 x i32> @v_bitcast_v32f32_to_v32i32(<32 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f32_to_v32i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_f32_e32 v31, 1.0, v31
+; GCN-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GCN-NEXT: v_add_f32_e32 v29, 1.0, v29
+; GCN-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GCN-NEXT: v_add_f32_e32 v27, 1.0, v27
+; GCN-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GCN-NEXT: v_add_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GCN-NEXT: v_add_f32_e32 v23, 1.0, v23
+; GCN-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GCN-NEXT: v_add_f32_e32 v21, 1.0, v21
+; GCN-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GCN-NEXT: v_add_f32_e32 v19, 1.0, v19
+; GCN-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GCN-NEXT: v_add_f32_e32 v17, 1.0, v17
+; GCN-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f32_to_v32i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB1_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_f32_e32 v31, 1.0, v31
+; VI-NEXT: v_add_f32_e32 v30, 1.0, v30
+; VI-NEXT: v_add_f32_e32 v29, 1.0, v29
+; VI-NEXT: v_add_f32_e32 v28, 1.0, v28
+; VI-NEXT: v_add_f32_e32 v27, 1.0, v27
+; VI-NEXT: v_add_f32_e32 v26, 1.0, v26
+; VI-NEXT: v_add_f32_e32 v25, 1.0, v25
+; VI-NEXT: v_add_f32_e32 v24, 1.0, v24
+; VI-NEXT: v_add_f32_e32 v23, 1.0, v23
+; VI-NEXT: v_add_f32_e32 v22, 1.0, v22
+; VI-NEXT: v_add_f32_e32 v21, 1.0, v21
+; VI-NEXT: v_add_f32_e32 v20, 1.0, v20
+; VI-NEXT: v_add_f32_e32 v19, 1.0, v19
+; VI-NEXT: v_add_f32_e32 v18, 1.0, v18
+; VI-NEXT: v_add_f32_e32 v17, 1.0, v17
+; VI-NEXT: v_add_f32_e32 v16, 1.0, v16
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB1_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f32_to_v32i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB1_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_f32_e32 v31, 1.0, v31
+; GFX9-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GFX9-NEXT: v_add_f32_e32 v29, 1.0, v29
+; GFX9-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GFX9-NEXT: v_add_f32_e32 v27, 1.0, v27
+; GFX9-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GFX9-NEXT: v_add_f32_e32 v25, 1.0, v25
+; GFX9-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GFX9-NEXT: v_add_f32_e32 v23, 1.0, v23
+; GFX9-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GFX9-NEXT: v_add_f32_e32 v21, 1.0, v21
+; GFX9-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GFX9-NEXT: v_add_f32_e32 v19, 1.0, v19
+; GFX9-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GFX9-NEXT: v_add_f32_e32 v17, 1.0, v17
+; GFX9-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB1_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f32_to_v32i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB1_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_dual_add_f32 v31, 1.0, v31 :: v_dual_add_f32 v30, 1.0, v30
+; GFX11-NEXT: v_dual_add_f32 v29, 1.0, v29 :: v_dual_add_f32 v28, 1.0, v28
+; GFX11-NEXT: v_dual_add_f32 v27, 1.0, v27 :: v_dual_add_f32 v26, 1.0, v26
+; GFX11-NEXT: v_dual_add_f32 v25, 1.0, v25 :: v_dual_add_f32 v24, 1.0, v24
+; GFX11-NEXT: v_dual_add_f32 v23, 1.0, v23 :: v_dual_add_f32 v22, 1.0, v22
+; GFX11-NEXT: v_dual_add_f32 v21, 1.0, v21 :: v_dual_add_f32 v20, 1.0, v20
+; GFX11-NEXT: v_dual_add_f32 v19, 1.0, v19 :: v_dual_add_f32 v18, 1.0, v18
+; GFX11-NEXT: v_dual_add_f32 v17, 1.0, v17 :: v_dual_add_f32 v16, 1.0, v16
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB1_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <32 x float> %a1 to <32 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x float> %a to <32 x i32>
+ br label %end
+
+end:
+ %phi = phi <32 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i32> %phi
+}
+
+define <16 x i64> @v_bitcast_v32i32_to_v16i64(<32 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i32_to_v16i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_i32_e32 v31, vcc, 3, v31
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: v_add_i32_e32 v29, vcc, 3, v29
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v27, vcc, 3, v27
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v25, vcc, 3, v25
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v23, vcc, 3, v23
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v21, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v19, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v17, vcc, 3, v17
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i32_to_v16i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB2_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_u32_e32 v31, vcc, 3, v31
+; VI-NEXT: v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT: v_add_u32_e32 v29, vcc, 3, v29
+; VI-NEXT: v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT: v_add_u32_e32 v27, vcc, 3, v27
+; VI-NEXT: v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT: v_add_u32_e32 v25, vcc, 3, v25
+; VI-NEXT: v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT: v_add_u32_e32 v23, vcc, 3, v23
+; VI-NEXT: v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT: v_add_u32_e32 v21, vcc, 3, v21
+; VI-NEXT: v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT: v_add_u32_e32 v19, vcc, 3, v19
+; VI-NEXT: v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT: v_add_u32_e32 v17, vcc, 3, v17
+; VI-NEXT: v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB2_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i32_to_v16i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB2_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_u32_e32 v31, 3, v31
+; GFX9-NEXT: v_add_u32_e32 v30, 3, v30
+; GFX9-NEXT: v_add_u32_e32 v29, 3, v29
+; GFX9-NEXT: v_add_u32_e32 v28, 3, v28
+; GFX9-NEXT: v_add_u32_e32 v27, 3, v27
+; GFX9-NEXT: v_add_u32_e32 v26, 3, v26
+; GFX9-NEXT: v_add_u32_e32 v25, 3, v25
+; GFX9-NEXT: v_add_u32_e32 v24, 3, v24
+; GFX9-NEXT: v_add_u32_e32 v23, 3, v23
+; GFX9-NEXT: v_add_u32_e32 v22, 3, v22
+; GFX9-NEXT: v_add_u32_e32 v21, 3, v21
+; GFX9-NEXT: v_add_u32_e32 v20, 3, v20
+; GFX9-NEXT: v_add_u32_e32 v19, 3, v19
+; GFX9-NEXT: v_add_u32_e32 v18, 3, v18
+; GFX9-NEXT: v_add_u32_e32 v17, 3, v17
+; GFX9-NEXT: v_add_u32_e32 v16, 3, v16
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB2_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i32_to_v16i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB2_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_nc_u32_e32 v31, 3, v31
+; GFX11-NEXT: v_add_nc_u32_e32 v30, 3, v30
+; GFX11-NEXT: v_add_nc_u32_e32 v29, 3, v29
+; GFX11-NEXT: v_add_nc_u32_e32 v28, 3, v28
+; GFX11-NEXT: v_add_nc_u32_e32 v27, 3, v27
+; GFX11-NEXT: v_add_nc_u32_e32 v26, 3, v26
+; GFX11-NEXT: v_add_nc_u32_e32 v25, 3, v25
+; GFX11-NEXT: v_add_nc_u32_e32 v24, 3, v24
+; GFX11-NEXT: v_add_nc_u32_e32 v23, 3, v23
+; GFX11-NEXT: v_add_nc_u32_e32 v22, 3, v22
+; GFX11-NEXT: v_add_nc_u32_e32 v21, 3, v21
+; GFX11-NEXT: v_add_nc_u32_e32 v20, 3, v20
+; GFX11-NEXT: v_add_nc_u32_e32 v19, 3, v19
+; GFX11-NEXT: v_add_nc_u32_e32 v18, 3, v18
+; GFX11-NEXT: v_add_nc_u32_e32 v17, 3, v17
+; GFX11-NEXT: v_add_nc_u32_e32 v16, 3, v16
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB2_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i32> %a, splat (i32 3)
+ %a2 = bitcast <32 x i32> %a1 to <16 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i32> %a to <16 x i64>
+ br label %end
+
+end:
+ %phi = phi <16 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i64> %phi
+}
+
+define <32 x i32> @v_bitcast_v16i64_to_v32i32(<16 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i64_to_v32i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_addc_u32_e32 v31, vcc, 0, v31, vcc
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_addc_u32_e32 v29, vcc, 0, v29, vcc
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_addc_u32_e32 v27, vcc, 0, v27, vcc
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_addc_u32_e32 v25, vcc, 0, v25, vcc
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_addc_u32_e32 v23, vcc, 0, v23, vcc
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_addc_u32_e32 v21, vcc, 0, v21, vcc
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_addc_u32_e32 v19, vcc, 0, v19, vcc
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_addc_u32_e32 v17, vcc, 0, v17, vcc
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i64_to_v32i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB3_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_addc_u32_e32 v31, vcc, 0, v31, vcc
+; VI-NEXT: v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT: v_addc_u32_e32 v29, vcc, 0, v29, vcc
+; VI-NEXT: v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT: v_addc_u32_e32 v27, vcc, 0, v27, vcc
+; VI-NEXT: v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT: v_addc_u32_e32 v25, vcc, 0, v25, vcc
+; VI-NEXT: v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT: v_addc_u32_e32 v23, vcc, 0, v23, vcc
+; VI-NEXT: v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT: v_addc_u32_e32 v21, vcc, 0, v21, vcc
+; VI-NEXT: v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT: v_addc_u32_e32 v19, vcc, 0, v19, vcc
+; VI-NEXT: v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT: v_addc_u32_e32 v17, vcc, 0, v17, vcc
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB3_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i64_to_v32i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB3_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v30, vcc, 3, v30
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_addc_co_u32_e32 v31, vcc, 0, v31, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v28, vcc, 3, v28
+; GFX9-NEXT: v_addc_co_u32_e32 v29, vcc, 0, v29, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v26, vcc, 3, v26
+; GFX9-NEXT: v_addc_co_u32_e32 v27, vcc, 0, v27, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v24, vcc, 3, v24
+; GFX9-NEXT: v_addc_co_u32_e32 v25, vcc, 0, v25, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v22, vcc, 3, v22
+; GFX9-NEXT: v_addc_co_u32_e32 v23, vcc, 0, v23, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v20, vcc, 3, v20
+; GFX9-NEXT: v_addc_co_u32_e32 v21, vcc, 0, v21, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v18, vcc, 3, v18
+; GFX9-NEXT: v_addc_co_u32_e32 v19, vcc, 0, v19, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v16, vcc, 3, v16
+; GFX9-NEXT: v_addc_co_u32_e32 v17, vcc, 0, v17, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB3_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i64_to_v32i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB3_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v30, vcc_lo, v30, 3
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_co_ci_u32_e32 v31, vcc_lo, 0, v31, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v28, vcc_lo, v28, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v29, vcc_lo, 0, v29, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v26, vcc_lo, v26, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v27, vcc_lo, 0, v27, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v24, vcc_lo, v24, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v25, vcc_lo, 0, v25, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v22, vcc_lo, v22, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v23, vcc_lo, 0, v23, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v20, vcc_lo, v20, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v21, vcc_lo, 0, v21, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v18, vcc_lo, v18, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v19, vcc_lo, 0, v19, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v16, vcc_lo, v16, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v17, vcc_lo, 0, v17, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB3_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i64> %a, splat (i64 3)
+ %a2 = bitcast <16 x i64> %a1 to <32 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i64> %a to <32 x i32>
+ br label %end
+
+end:
+ %phi = phi <32 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i32> %phi
+}
+
+define <16 x double> @v_bitcast_v32i32_to_v16f64(<32 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i32_to_v16f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_i32_e32 v31, vcc, 3, v31
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: v_add_i32_e32 v29, vcc, 3, v29
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v27, vcc, 3, v27
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v25, vcc, 3, v25
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v23, vcc, 3, v23
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v21, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v19, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v17, vcc, 3, v17
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i32_to_v16f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB4_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_u32_e32 v31, vcc, 3, v31
+; VI-NEXT: v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT: v_add_u32_e32 v29, vcc, 3, v29
+; VI-NEXT: v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT: v_add_u32_e32 v27, vcc, 3, v27
+; VI-NEXT: v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT: v_add_u32_e32 v25, vcc, 3, v25
+; VI-NEXT: v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT: v_add_u32_e32 v23, vcc, 3, v23
+; VI-NEXT: v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT: v_add_u32_e32 v21, vcc, 3, v21
+; VI-NEXT: v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT: v_add_u32_e32 v19, vcc, 3, v19
+; VI-NEXT: v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT: v_add_u32_e32 v17, vcc, 3, v17
+; VI-NEXT: v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB4_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i32_to_v16f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB4_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_u32_e32 v31, 3, v31
+; GFX9-NEXT: v_add_u32_e32 v30, 3, v30
+; GFX9-NEXT: v_add_u32_e32 v29, 3, v29
+; GFX9-NEXT: v_add_u32_e32 v28, 3, v28
+; GFX9-NEXT: v_add_u32_e32 v27, 3, v27
+; GFX9-NEXT: v_add_u32_e32 v26, 3, v26
+; GFX9-NEXT: v_add_u32_e32 v25, 3, v25
+; GFX9-NEXT: v_add_u32_e32 v24, 3, v24
+; GFX9-NEXT: v_add_u32_e32 v23, 3, v23
+; GFX9-NEXT: v_add_u32_e32 v22, 3, v22
+; GFX9-NEXT: v_add_u32_e32 v21, 3, v21
+; GFX9-NEXT: v_add_u32_e32 v20, 3, v20
+; GFX9-NEXT: v_add_u32_e32 v19, 3, v19
+; GFX9-NEXT: v_add_u32_e32 v18, 3, v18
+; GFX9-NEXT: v_add_u32_e32 v17, 3, v17
+; GFX9-NEXT: v_add_u32_e32 v16, 3, v16
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB4_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i32_to_v16f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB4_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_nc_u32_e32 v31, 3, v31
+; GFX11-NEXT: v_add_nc_u32_e32 v30, 3, v30
+; GFX11-NEXT: v_add_nc_u32_e32 v29, 3, v29
+; GFX11-NEXT: v_add_nc_u32_e32 v28, 3, v28
+; GFX11-NEXT: v_add_nc_u32_e32 v27, 3, v27
+; GFX11-NEXT: v_add_nc_u32_e32 v26, 3, v26
+; GFX11-NEXT: v_add_nc_u32_e32 v25, 3, v25
+; GFX11-NEXT: v_add_nc_u32_e32 v24, 3, v24
+; GFX11-NEXT: v_add_nc_u32_e32 v23, 3, v23
+; GFX11-NEXT: v_add_nc_u32_e32 v22, 3, v22
+; GFX11-NEXT: v_add_nc_u32_e32 v21, 3, v21
+; GFX11-NEXT: v_add_nc_u32_e32 v20, 3, v20
+; GFX11-NEXT: v_add_nc_u32_e32 v19, 3, v19
+; GFX11-NEXT: v_add_nc_u32_e32 v18, 3, v18
+; GFX11-NEXT: v_add_nc_u32_e32 v17, 3, v17
+; GFX11-NEXT: v_add_nc_u32_e32 v16, 3, v16
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB4_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i32> %a, splat (i32 3)
+ %a2 = bitcast <32 x i32> %a1 to <16 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i32> %a to <16 x double>
+ br label %end
+
+end:
+ %phi = phi <16 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x double> %phi
+}
+
+define <32 x i32> @v_bitcast_v16f64_to_v32i32(<16 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f64_to_v32i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GCN-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GCN-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GCN-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GCN-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GCN-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GCN-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GCN-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f64_to_v32i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB5_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; VI-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; VI-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; VI-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; VI-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; VI-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; VI-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; VI-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB5_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f64_to_v32i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB5_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GFX9-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GFX9-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GFX9-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GFX9-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GFX9-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GFX9-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GFX9-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB5_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f64_to_v32i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB5_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GFX11-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GFX11-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GFX11-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GFX11-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GFX11-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GFX11-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GFX11-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB5_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <16 x double> %a1 to <32 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x double> %a to <32 x i32>
+ br label %end
+
+end:
+ %phi = phi <32 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i32> %phi
+}
+
+define <16 x i64> @v_bitcast_v32f32_to_v16i64(<32 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f32_to_v16i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_f32_e32 v31, 1.0, v31
+; GCN-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GCN-NEXT: v_add_f32_e32 v29, 1.0, v29
+; GCN-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GCN-NEXT: v_add_f32_e32 v27, 1.0, v27
+; GCN-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GCN-NEXT: v_add_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GCN-NEXT: v_add_f32_e32 v23, 1.0, v23
+; GCN-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GCN-NEXT: v_add_f32_e32 v21, 1.0, v21
+; GCN-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GCN-NEXT: v_add_f32_e32 v19, 1.0, v19
+; GCN-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GCN-NEXT: v_add_f32_e32 v17, 1.0, v17
+; GCN-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB6_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f32_to_v16i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB6_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_f32_e32 v31, 1.0, v31
+; VI-NEXT: v_add_f32_e32 v30, 1.0, v30
+; VI-NEXT: v_add_f32_e32 v29, 1.0, v29
+; VI-NEXT: v_add_f32_e32 v28, 1.0, v28
+; VI-NEXT: v_add_f32_e32 v27, 1.0, v27
+; VI-NEXT: v_add_f32_e32 v26, 1.0, v26
+; VI-NEXT: v_add_f32_e32 v25, 1.0, v25
+; VI-NEXT: v_add_f32_e32 v24, 1.0, v24
+; VI-NEXT: v_add_f32_e32 v23, 1.0, v23
+; VI-NEXT: v_add_f32_e32 v22, 1.0, v22
+; VI-NEXT: v_add_f32_e32 v21, 1.0, v21
+; VI-NEXT: v_add_f32_e32 v20, 1.0, v20
+; VI-NEXT: v_add_f32_e32 v19, 1.0, v19
+; VI-NEXT: v_add_f32_e32 v18, 1.0, v18
+; VI-NEXT: v_add_f32_e32 v17, 1.0, v17
+; VI-NEXT: v_add_f32_e32 v16, 1.0, v16
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB6_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f32_to_v16i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB6_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_f32_e32 v31, 1.0, v31
+; GFX9-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GFX9-NEXT: v_add_f32_e32 v29, 1.0, v29
+; GFX9-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GFX9-NEXT: v_add_f32_e32 v27, 1.0, v27
+; GFX9-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GFX9-NEXT: v_add_f32_e32 v25, 1.0, v25
+; GFX9-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GFX9-NEXT: v_add_f32_e32 v23, 1.0, v23
+; GFX9-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GFX9-NEXT: v_add_f32_e32 v21, 1.0, v21
+; GFX9-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GFX9-NEXT: v_add_f32_e32 v19, 1.0, v19
+; GFX9-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GFX9-NEXT: v_add_f32_e32 v17, 1.0, v17
+; GFX9-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB6_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f32_to_v16i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB6_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_dual_add_f32 v31, 1.0, v31 :: v_dual_add_f32 v30, 1.0, v30
+; GFX11-NEXT: v_dual_add_f32 v29, 1.0, v29 :: v_dual_add_f32 v28, 1.0, v28
+; GFX11-NEXT: v_dual_add_f32 v27, 1.0, v27 :: v_dual_add_f32 v26, 1.0, v26
+; GFX11-NEXT: v_dual_add_f32 v25, 1.0, v25 :: v_dual_add_f32 v24, 1.0, v24
+; GFX11-NEXT: v_dual_add_f32 v23, 1.0, v23 :: v_dual_add_f32 v22, 1.0, v22
+; GFX11-NEXT: v_dual_add_f32 v21, 1.0, v21 :: v_dual_add_f32 v20, 1.0, v20
+; GFX11-NEXT: v_dual_add_f32 v19, 1.0, v19 :: v_dual_add_f32 v18, 1.0, v18
+; GFX11-NEXT: v_dual_add_f32 v17, 1.0, v17 :: v_dual_add_f32 v16, 1.0, v16
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB6_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <32 x float> %a1 to <16 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x float> %a to <16 x i64>
+ br label %end
+
+end:
+ %phi = phi <16 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i64> %phi
+}
+
+define <32 x float> @v_bitcast_v16i64_to_v32f32(<16 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i64_to_v32f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_addc_u32_e32 v31, vcc, 0, v31, vcc
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_addc_u32_e32 v29, vcc, 0, v29, vcc
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_addc_u32_e32 v27, vcc, 0, v27, vcc
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_addc_u32_e32 v25, vcc, 0, v25, vcc
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_addc_u32_e32 v23, vcc, 0, v23, vcc
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_addc_u32_e32 v21, vcc, 0, v21, vcc
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_addc_u32_e32 v19, vcc, 0, v19, vcc
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_addc_u32_e32 v17, vcc, 0, v17, vcc
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i64_to_v32f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB7_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_addc_u32_e32 v31, vcc, 0, v31, vcc
+; VI-NEXT: v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT: v_addc_u32_e32 v29, vcc, 0, v29, vcc
+; VI-NEXT: v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT: v_addc_u32_e32 v27, vcc, 0, v27, vcc
+; VI-NEXT: v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT: v_addc_u32_e32 v25, vcc, 0, v25, vcc
+; VI-NEXT: v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT: v_addc_u32_e32 v23, vcc, 0, v23, vcc
+; VI-NEXT: v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT: v_addc_u32_e32 v21, vcc, 0, v21, vcc
+; VI-NEXT: v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT: v_addc_u32_e32 v19, vcc, 0, v19, vcc
+; VI-NEXT: v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT: v_addc_u32_e32 v17, vcc, 0, v17, vcc
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB7_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i64_to_v32f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB7_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v30, vcc, 3, v30
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_addc_co_u32_e32 v31, vcc, 0, v31, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v28, vcc, 3, v28
+; GFX9-NEXT: v_addc_co_u32_e32 v29, vcc, 0, v29, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v26, vcc, 3, v26
+; GFX9-NEXT: v_addc_co_u32_e32 v27, vcc, 0, v27, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v24, vcc, 3, v24
+; GFX9-NEXT: v_addc_co_u32_e32 v25, vcc, 0, v25, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v22, vcc, 3, v22
+; GFX9-NEXT: v_addc_co_u32_e32 v23, vcc, 0, v23, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v20, vcc, 3, v20
+; GFX9-NEXT: v_addc_co_u32_e32 v21, vcc, 0, v21, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v18, vcc, 3, v18
+; GFX9-NEXT: v_addc_co_u32_e32 v19, vcc, 0, v19, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v16, vcc, 3, v16
+; GFX9-NEXT: v_addc_co_u32_e32 v17, vcc, 0, v17, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB7_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i64_to_v32f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB7_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v30, vcc_lo, v30, 3
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_co_ci_u32_e32 v31, vcc_lo, 0, v31, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v28, vcc_lo, v28, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v29, vcc_lo, 0, v29, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v26, vcc_lo, v26, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v27, vcc_lo, 0, v27, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v24, vcc_lo, v24, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v25, vcc_lo, 0, v25, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v22, vcc_lo, v22, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v23, vcc_lo, 0, v23, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v20, vcc_lo, v20, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v21, vcc_lo, 0, v21, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v18, vcc_lo, v18, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v19, vcc_lo, 0, v19, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v16, vcc_lo, v16, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v17, vcc_lo, 0, v17, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB7_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i64> %a, splat (i64 3)
+ %a2 = bitcast <16 x i64> %a1 to <32 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i64> %a to <32 x float>
+ br label %end
+
+end:
+ %phi = phi <32 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x float> %phi
+}
+
+define <16 x double> @v_bitcast_v32f32_to_v16f64(<32 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f32_to_v16f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_f32_e32 v31, 1.0, v31
+; GCN-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GCN-NEXT: v_add_f32_e32 v29, 1.0, v29
+; GCN-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GCN-NEXT: v_add_f32_e32 v27, 1.0, v27
+; GCN-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GCN-NEXT: v_add_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GCN-NEXT: v_add_f32_e32 v23, 1.0, v23
+; GCN-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GCN-NEXT: v_add_f32_e32 v21, 1.0, v21
+; GCN-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GCN-NEXT: v_add_f32_e32 v19, 1.0, v19
+; GCN-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GCN-NEXT: v_add_f32_e32 v17, 1.0, v17
+; GCN-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB8_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f32_to_v16f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB8_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_f32_e32 v31, 1.0, v31
+; VI-NEXT: v_add_f32_e32 v30, 1.0, v30
+; VI-NEXT: v_add_f32_e32 v29, 1.0, v29
+; VI-NEXT: v_add_f32_e32 v28, 1.0, v28
+; VI-NEXT: v_add_f32_e32 v27, 1.0, v27
+; VI-NEXT: v_add_f32_e32 v26, 1.0, v26
+; VI-NEXT: v_add_f32_e32 v25, 1.0, v25
+; VI-NEXT: v_add_f32_e32 v24, 1.0, v24
+; VI-NEXT: v_add_f32_e32 v23, 1.0, v23
+; VI-NEXT: v_add_f32_e32 v22, 1.0, v22
+; VI-NEXT: v_add_f32_e32 v21, 1.0, v21
+; VI-NEXT: v_add_f32_e32 v20, 1.0, v20
+; VI-NEXT: v_add_f32_e32 v19, 1.0, v19
+; VI-NEXT: v_add_f32_e32 v18, 1.0, v18
+; VI-NEXT: v_add_f32_e32 v17, 1.0, v17
+; VI-NEXT: v_add_f32_e32 v16, 1.0, v16
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB8_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f32_to_v16f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB8_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_f32_e32 v31, 1.0, v31
+; GFX9-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GFX9-NEXT: v_add_f32_e32 v29, 1.0, v29
+; GFX9-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GFX9-NEXT: v_add_f32_e32 v27, 1.0, v27
+; GFX9-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GFX9-NEXT: v_add_f32_e32 v25, 1.0, v25
+; GFX9-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GFX9-NEXT: v_add_f32_e32 v23, 1.0, v23
+; GFX9-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GFX9-NEXT: v_add_f32_e32 v21, 1.0, v21
+; GFX9-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GFX9-NEXT: v_add_f32_e32 v19, 1.0, v19
+; GFX9-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GFX9-NEXT: v_add_f32_e32 v17, 1.0, v17
+; GFX9-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB8_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f32_to_v16f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB8_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_dual_add_f32 v31, 1.0, v31 :: v_dual_add_f32 v30, 1.0, v30
+; GFX11-NEXT: v_dual_add_f32 v29, 1.0, v29 :: v_dual_add_f32 v28, 1.0, v28
+; GFX11-NEXT: v_dual_add_f32 v27, 1.0, v27 :: v_dual_add_f32 v26, 1.0, v26
+; GFX11-NEXT: v_dual_add_f32 v25, 1.0, v25 :: v_dual_add_f32 v24, 1.0, v24
+; GFX11-NEXT: v_dual_add_f32 v23, 1.0, v23 :: v_dual_add_f32 v22, 1.0, v22
+; GFX11-NEXT: v_dual_add_f32 v21, 1.0, v21 :: v_dual_add_f32 v20, 1.0, v20
+; GFX11-NEXT: v_dual_add_f32 v19, 1.0, v19 :: v_dual_add_f32 v18, 1.0, v18
+; GFX11-NEXT: v_dual_add_f32 v17, 1.0, v17 :: v_dual_add_f32 v16, 1.0, v16
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB8_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <32 x float> %a1 to <16 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x float> %a to <16 x double>
+ br label %end
+
+end:
+ %phi = phi <16 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x double> %phi
+}
+
+define <32 x float> @v_bitcast_v16f64_to_v32f32(<16 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f64_to_v32f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GCN-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GCN-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GCN-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GCN-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GCN-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GCN-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GCN-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB9_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f64_to_v32f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB9_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; VI-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; VI-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; VI-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; VI-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; VI-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; VI-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; VI-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB9_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f64_to_v32f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB9_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GFX9-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GFX9-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GFX9-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GFX9-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GFX9-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GFX9-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GFX9-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB9_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f64_to_v32f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB9_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GFX11-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GFX11-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GFX11-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GFX11-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GFX11-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GFX11-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GFX11-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB9_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <16 x double> %a1 to <32 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x double> %a to <32 x float>
+ br label %end
+
+end:
+ %phi = phi <32 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x float> %phi
+}
+
+define <16 x double> @v_bitcast_v16i64_to_v16f64(<16 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i64_to_v16f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_addc_u32_e32 v17, vcc, 0, v17, vcc
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_addc_u32_e32 v19, vcc, 0, v19, vcc
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_addc_u32_e32 v21, vcc, 0, v21, vcc
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_addc_u32_e32 v23, vcc, 0, v23, vcc
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_addc_u32_e32 v25, vcc, 0, v25, vcc
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_addc_u32_e32 v27, vcc, 0, v27, vcc
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_addc_u32_e32 v29, vcc, 0, v29, vcc
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_addc_u32_e32 v31, vcc, 0, v31, vcc
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i64_to_v16f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB10_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v16, vcc, 3, v16
+; VI-NEXT: v_addc_u32_e32 v17, vcc, 0, v17, vcc
+; VI-NEXT: v_add_u32_e32 v18, vcc, 3, v18
+; VI-NEXT: v_addc_u32_e32 v19, vcc, 0, v19, vcc
+; VI-NEXT: v_add_u32_e32 v20, vcc, 3, v20
+; VI-NEXT: v_addc_u32_e32 v21, vcc, 0, v21, vcc
+; VI-NEXT: v_add_u32_e32 v22, vcc, 3, v22
+; VI-NEXT: v_addc_u32_e32 v23, vcc, 0, v23, vcc
+; VI-NEXT: v_add_u32_e32 v24, vcc, 3, v24
+; VI-NEXT: v_addc_u32_e32 v25, vcc, 0, v25, vcc
+; VI-NEXT: v_add_u32_e32 v26, vcc, 3, v26
+; VI-NEXT: v_addc_u32_e32 v27, vcc, 0, v27, vcc
+; VI-NEXT: v_add_u32_e32 v28, vcc, 3, v28
+; VI-NEXT: v_addc_u32_e32 v29, vcc, 0, v29, vcc
+; VI-NEXT: v_add_u32_e32 v30, vcc, 3, v30
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_addc_u32_e32 v31, vcc, 0, v31, vcc
+; VI-NEXT: .LBB10_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i64_to_v16f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB10_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v16, vcc, 3, v16
+; GFX9-NEXT: v_addc_co_u32_e32 v17, vcc, 0, v17, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v18, vcc, 3, v18
+; GFX9-NEXT: v_addc_co_u32_e32 v19, vcc, 0, v19, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v20, vcc, 3, v20
+; GFX9-NEXT: v_addc_co_u32_e32 v21, vcc, 0, v21, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v22, vcc, 3, v22
+; GFX9-NEXT: v_addc_co_u32_e32 v23, vcc, 0, v23, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v24, vcc, 3, v24
+; GFX9-NEXT: v_addc_co_u32_e32 v25, vcc, 0, v25, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v26, vcc, 3, v26
+; GFX9-NEXT: v_addc_co_u32_e32 v27, vcc, 0, v27, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v28, vcc, 3, v28
+; GFX9-NEXT: v_addc_co_u32_e32 v29, vcc, 0, v29, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v30, vcc, 3, v30
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_addc_co_u32_e32 v31, vcc, 0, v31, vcc
+; GFX9-NEXT: .LBB10_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i64_to_v16f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB10_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v16, vcc_lo, v16, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v17, vcc_lo, 0, v17, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v18, vcc_lo, v18, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v19, vcc_lo, 0, v19, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v20, vcc_lo, v20, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v21, vcc_lo, 0, v21, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v22, vcc_lo, v22, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v23, vcc_lo, 0, v23, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v24, vcc_lo, v24, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v25, vcc_lo, 0, v25, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v26, vcc_lo, v26, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v27, vcc_lo, 0, v27, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v28, vcc_lo, v28, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v29, vcc_lo, 0, v29, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v30, vcc_lo, v30, 3
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_co_ci_u32_e32 v31, vcc_lo, 0, v31, vcc_lo
+; GFX11-NEXT: .LBB10_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i64> %a, splat (i64 3)
+ %a2 = bitcast <16 x i64> %a1 to <16 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i64> %a to <16 x double>
+ br label %end
+
+end:
+ %phi = phi <16 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x double> %phi
+}
+
+define <16 x i64> @v_bitcast_v16f64_to_v16i64(<16 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f64_to_v16i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GCN-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GCN-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GCN-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GCN-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GCN-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GCN-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GCN-NEXT: .LBB11_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f64_to_v16i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; VI-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB11_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; VI-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; VI-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; VI-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; VI-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; VI-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; VI-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; VI-NEXT: .LBB11_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f64_to_v16i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB11_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GFX9-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GFX9-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GFX9-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GFX9-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GFX9-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GFX9-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GFX9-NEXT: .LBB11_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f64_to_v16i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
+; GFX11-NEXT: scratch_load_b32 v31, off, s32
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v32
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB11_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GFX11-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GFX11-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GFX11-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GFX11-NEXT: v_add_f64 v[24:25], v[24:25], 1.0
+; GFX11-NEXT: v_add_f64 v[26:27], v[26:27], 1.0
+; GFX11-NEXT: v_add_f64 v[28:29], v[28:29], 1.0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_add_f64 v[30:31], v[30:31], 1.0
+; GFX11-NEXT: .LBB11_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <16 x double> %a1 to <16 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x double> %a to <16 x i64>
+ br label %end
+
+end:
+ %phi = phi <16 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i64> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll
new file mode 100644
index 0000000000000..5d09b22bf383c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll
@@ -0,0 +1,6084 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <4 x float> @v_bitcast_v4i32_to_v4f32(<4 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i32_to_v4f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i32_to_v4f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i32_to_v4f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i32_to_v4f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i32> %a, splat (i32 3)
+ %a2 = bitcast <4 x i32> %a1 to <4 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i32> %a to <4 x float>
+ br label %end
+
+end:
+ %phi = phi <4 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x float> %phi
+}
+
+define <4 x i32> @v_bitcast_v4f32_to_v4i32(<4 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f32_to_v4i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f32_to_v4i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f32_to_v4i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f32_to_v4i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <4 x float> %a1 to <4 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x float> %a to <4 x i32>
+ br label %end
+
+end:
+ %phi = phi <4 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i32> %phi
+}
+
+define <2 x i64> @v_bitcast_v4i32_to_v2i64(<4 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i32_to_v2i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i32_to_v2i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i32_to_v2i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i32_to_v2i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i32> %a, splat (i32 3)
+ %a2 = bitcast <4 x i32> %a1 to <2 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i32> %a to <2 x i64>
+ br label %end
+
+end:
+ %phi = phi <2 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i64> %phi
+}
+
+define <4 x i32> @v_bitcast_v2i64_to_v4i32(<2 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i64_to_v4i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i64_to_v4i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i64_to_v4i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i64_to_v4i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i64> %a, splat (i64 3)
+ %a2 = bitcast <2 x i64> %a1 to <4 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i64> %a to <4 x i32>
+ br label %end
+
+end:
+ %phi = phi <4 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i32> %phi
+}
+
+define <2 x double> @v_bitcast_v4i32_to_v2f64(<4 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i32_to_v2f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i32_to_v2f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i32_to_v2f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i32_to_v2f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i32> %a, splat (i32 3)
+ %a2 = bitcast <4 x i32> %a1 to <2 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i32> %a to <2 x double>
+ br label %end
+
+end:
+ %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x double> %phi
+}
+
+define <4 x i32> @v_bitcast_v2f64_to_v4i32(<2 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f64_to_v4i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f64_to_v4i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB5_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB5_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f64_to_v4i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB5_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB5_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f64_to_v4i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB5_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB5_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <2 x double> %a1 to <4 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x double> %a to <4 x i32>
+ br label %end
+
+end:
+ %phi = phi <4 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i32> %phi
+}
+
+define <8 x i16> @v_bitcast_v4i32_to_v8i16(<4 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i32_to_v8i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v8, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v5, v6, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB6_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_alignbit_b32 v5, v6, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB6_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v4, v8
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i32_to_v8i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i32_to_v8i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i32_to_v8i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i32> %a, splat (i32 3)
+ %a2 = bitcast <4 x i32> %a1 to <8 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i32> %a to <8 x i16>
+ br label %end
+
+end:
+ %phi = phi <8 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i16> %phi
+}
+
+define <4 x i32> @v_bitcast_v8i16_to_v4i32(<8 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i16_to_v4i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_4
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB7_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v6
+; GCN-NEXT: v_or_b32_e32 v0, v0, v8
+; GCN-NEXT: v_or_b32_e32 v1, v1, v11
+; GCN-NEXT: v_or_b32_e32 v2, v2, v5
+; GCN-NEXT: v_or_b32_e32 v3, v3, v7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: .LBB7_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v6
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_or_b32_e32 v0, v8, v0
+; GCN-NEXT: v_or_b32_e32 v1, v11, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v7, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 0x30000, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 0x30000, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i16_to_v4i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB7_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v5, 3
+; VI-NEXT: v_add_u16_e32 v4, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v4, v3
+; VI-NEXT: v_add_u16_e32 v4, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v4, v2
+; VI-NEXT: v_add_u16_e32 v4, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v4, v1
+; VI-NEXT: v_add_u16_e32 v4, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v4, v0
+; VI-NEXT: .LBB7_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i16_to_v4i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i16_to_v4i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i16> %a, splat (i16 3)
+ %a2 = bitcast <8 x i16> %a1 to <4 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i16> %a to <4 x i32>
+ br label %end
+
+end:
+ %phi = phi <4 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i32> %phi
+}
+
+define <8 x half> @v_bitcast_v4i32_to_v8f16(<4 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i32_to_v8f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v9, v3
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v11, v1
+; GCN-NEXT: v_mov_b32_e32 v8, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_4
+; GCN-NEXT: .LBB8_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB8_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: .LBB8_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v8
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i32_to_v8f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i32_to_v8f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i32_to_v8f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i32> %a, splat (i32 3)
+ %a2 = bitcast <4 x i32> %a1 to <8 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i32> %a to <8 x half>
+ br label %end
+
+end:
+ %phi = phi <8 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x half> %phi
+}
+
+define <4 x i32> @v_bitcast_v8f16_to_v4i32(<8 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f16_to_v4i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_4
+; GCN-NEXT: .LBB9_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB9_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v10, v0
+; GCN-NEXT: v_or_b32_e32 v1, v8, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v4, v3
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: .LBB9_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_or_b32_e32 v2, v5, v6
+; GCN-NEXT: v_or_b32_e32 v3, v4, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f16_to_v4i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB9_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v4, 0x200
+; VI-NEXT: v_add_f16_sdwa v5, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v5
+; VI-NEXT: v_add_f16_sdwa v5, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v5
+; VI-NEXT: v_add_f16_sdwa v5, v1, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v4, v0, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v5
+; VI-NEXT: v_or_b32_e32 v0, v0, v4
+; VI-NEXT: .LBB9_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f16_to_v4i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f16_to_v4i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <8 x half> %a1 to <4 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x half> %a to <4 x i32>
+ br label %end
+
+end:
+ %phi = phi <4 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i32> %phi
+}
+
+define <8 x bfloat> @v_bitcast_v4i32_to_v8bf16(<4 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i32_to_v8bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v11, v3
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v1
+; GCN-NEXT: v_mov_b32_e32 v8, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_4
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB10_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v11
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v10
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: .LBB10_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v11
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i32_to_v8bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i32_to_v8bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i32_to_v8bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i32> %a, splat (i32 3)
+ %a2 = bitcast <4 x i32> %a1 to <8 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i32> %a to <8 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <8 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x bfloat> %phi
+}
+
+define <4 x i32> @v_bitcast_v8bf16_to_v4i32(<8 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8bf16_to_v4i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v12, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_4
+; GCN-NEXT: .LBB11_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB11_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v0, v11, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v9, 16
+; GCN-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: .LBB11_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v9
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v12
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v2, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8bf16_to_v4i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB11_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v3
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v3, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v3
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v2
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v1
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v0
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v0
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; VI-NEXT: .LBB11_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8bf16_to_v4i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB11_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v3, v4, v3, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v2
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v2, v4, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v1, v4, v1, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v0, v4, v0, s7
+; GFX9-NEXT: .LBB11_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8bf16_to_v4i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB11_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v4, 0x40c00000, v4 :: v_dual_lshlrev_b32 v3, 16, v3
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v7, v4, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v7, v7, v4, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX11-NEXT: v_add3_u32 v9, v9, v3, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v7, v8 :: v_dual_add_f32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v11, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v13, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v11, v5, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v1
+; GFX11-NEXT: v_add3_u32 v7, v13, v2, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v5, v11, v12 :: v_dual_add_f32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v0, 16, v0
+; GFX11-NEXT: v_perm_b32 v3, v4, v3, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v7, v8 :: v_dual_add_f32 v7, 0x40c00000, v10
+; GFX11-NEXT: v_bfe_u32 v10, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v8, v9, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v9, 0x400000, v6
+; GFX11-NEXT: v_bfe_u32 v11, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v7
+; GFX11-NEXT: v_perm_b32 v2, v5, v2, 0x7060302
+; GFX11-NEXT: v_add3_u32 v11, v11, v7, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v8, v9, vcc_lo
+; GFX11-NEXT: v_add3_u32 v9, v10, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v0, 0x40c00000, v0 :: v_dual_cndmask_b32 v1, v9, v10
+; GFX11-NEXT: v_bfe_u32 v8, v0, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v8, v8, v0, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_perm_b32 v1, v6, v1, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v8, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v7, v0, 0x7060302
+; GFX11-NEXT: .LBB11_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <8 x bfloat> %a1 to <4 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x bfloat> %a to <4 x i32>
+ br label %end
+
+end:
+ %phi = phi <4 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i32> %phi
+}
+
+define <2 x i64> @v_bitcast_v4f32_to_v2i64(<4 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f32_to_v2i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB12_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB12_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f32_to_v2i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f32_to_v2i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f32_to_v2i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <4 x float> %a1 to <2 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x float> %a to <2 x i64>
+ br label %end
+
+end:
+ %phi = phi <2 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i64> %phi
+}
+
+define <4 x float> @v_bitcast_v2i64_to_v4f32(<2 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i64_to_v4f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB13_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB13_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i64_to_v4f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i64_to_v4f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i64_to_v4f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i64> %a, splat (i64 3)
+ %a2 = bitcast <2 x i64> %a1 to <4 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i64> %a to <4 x float>
+ br label %end
+
+end:
+ %phi = phi <4 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x float> %phi
+}
+
+define <2 x double> @v_bitcast_v4f32_to_v2f64(<4 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f32_to_v2f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB14_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB14_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f32_to_v2f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f32_to_v2f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f32_to_v2f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <4 x float> %a1 to <2 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x float> %a to <2 x double>
+ br label %end
+
+end:
+ %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x double> %phi
+}
+
+define <4 x float> @v_bitcast_v2f64_to_v4f32(<2 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f64_to_v4f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB15_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB15_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f64_to_v4f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB15_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB15_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f64_to_v4f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB15_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB15_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f64_to_v4f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB15_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB15_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <2 x double> %a1 to <4 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x double> %a to <4 x float>
+ br label %end
+
+end:
+ %phi = phi <4 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x float> %phi
+}
+
+define <8 x i16> @v_bitcast_v4f32_to_v8i16(<4 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f32_to_v8i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v8, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v5, v6, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB16_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_alignbit_b32 v5, v6, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB16_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v4, v8
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f32_to_v8i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f32_to_v8i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f32_to_v8i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <4 x float> %a1 to <8 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x float> %a to <8 x i16>
+ br label %end
+
+end:
+ %phi = phi <8 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i16> %phi
+}
+
+define <4 x float> @v_bitcast_v8i16_to_v4f32(<8 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i16_to_v4f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_4
+; GCN-NEXT: .LBB17_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB17_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v6
+; GCN-NEXT: v_or_b32_e32 v0, v0, v8
+; GCN-NEXT: v_or_b32_e32 v1, v1, v11
+; GCN-NEXT: v_or_b32_e32 v2, v2, v5
+; GCN-NEXT: v_or_b32_e32 v3, v3, v7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB17_2
+; GCN-NEXT: .LBB17_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v6
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_or_b32_e32 v0, v8, v0
+; GCN-NEXT: v_or_b32_e32 v1, v11, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v7, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 0x30000, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 0x30000, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i16_to_v4f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB17_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v5, 3
+; VI-NEXT: v_add_u16_e32 v4, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v4, v3
+; VI-NEXT: v_add_u16_e32 v4, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v4, v2
+; VI-NEXT: v_add_u16_e32 v4, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v4, v1
+; VI-NEXT: v_add_u16_e32 v4, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v4, v0
+; VI-NEXT: .LBB17_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i16_to_v4f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i16_to_v4f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i16> %a, splat (i16 3)
+ %a2 = bitcast <8 x i16> %a1 to <4 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i16> %a to <4 x float>
+ br label %end
+
+end:
+ %phi = phi <4 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x float> %phi
+}
+
+define <8 x half> @v_bitcast_v4f32_to_v8f16(<4 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f32_to_v8f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v9, v3
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v11, v1
+; GCN-NEXT: v_mov_b32_e32 v8, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB18_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB18_4
+; GCN-NEXT: .LBB18_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB18_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_2
+; GCN-NEXT: .LBB18_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v8
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f32_to_v8f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f32_to_v8f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f32_to_v8f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <4 x float> %a1 to <8 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x float> %a to <8 x half>
+ br label %end
+
+end:
+ %phi = phi <8 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x half> %phi
+}
+
+define <4 x float> @v_bitcast_v8f16_to_v4f32(<8 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f16_to_v4f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_4
+; GCN-NEXT: .LBB19_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB19_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v10, v0
+; GCN-NEXT: v_or_b32_e32 v1, v8, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v4, v3
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB19_2
+; GCN-NEXT: .LBB19_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_or_b32_e32 v2, v5, v6
+; GCN-NEXT: v_or_b32_e32 v3, v4, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f16_to_v4f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB19_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v4, 0x200
+; VI-NEXT: v_add_f16_sdwa v5, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v5
+; VI-NEXT: v_add_f16_sdwa v5, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v5
+; VI-NEXT: v_add_f16_sdwa v5, v1, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v4, v0, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v5
+; VI-NEXT: v_or_b32_e32 v0, v0, v4
+; VI-NEXT: .LBB19_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f16_to_v4f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f16_to_v4f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <8 x half> %a1 to <4 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x half> %a to <4 x float>
+ br label %end
+
+end:
+ %phi = phi <4 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x float> %phi
+}
+
+define <8 x bfloat> @v_bitcast_v4f32_to_v8bf16(<4 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f32_to_v8bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v11, v3
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v1
+; GCN-NEXT: v_mov_b32_e32 v8, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB20_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB20_4
+; GCN-NEXT: .LBB20_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB20_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v11
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v10
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB20_2
+; GCN-NEXT: .LBB20_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v11
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f32_to_v8bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f32_to_v8bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f32_to_v8bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <4 x float> %a1 to <8 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x float> %a to <8 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <8 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x bfloat> %phi
+}
+
+define <4 x float> @v_bitcast_v8bf16_to_v4f32(<8 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8bf16_to_v4f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v12, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB21_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB21_4
+; GCN-NEXT: .LBB21_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB21_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v0, v11, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v9, 16
+; GCN-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB21_2
+; GCN-NEXT: .LBB21_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v9
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v12
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v2, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8bf16_to_v4f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB21_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v3
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v3, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v3
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v2
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v1
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v0
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v0
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; VI-NEXT: .LBB21_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8bf16_to_v4f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB21_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v3, v4, v3, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v2
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v2, v4, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v1, v4, v1, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v0, v4, v0, s7
+; GFX9-NEXT: .LBB21_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8bf16_to_v4f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB21_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v4, 0x40c00000, v4 :: v_dual_lshlrev_b32 v3, 16, v3
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v7, v4, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v7, v7, v4, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX11-NEXT: v_add3_u32 v9, v9, v3, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v7, v8 :: v_dual_add_f32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v11, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v13, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v11, v5, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v1
+; GFX11-NEXT: v_add3_u32 v7, v13, v2, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v5, v11, v12 :: v_dual_add_f32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v0, 16, v0
+; GFX11-NEXT: v_perm_b32 v3, v4, v3, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v7, v8 :: v_dual_add_f32 v7, 0x40c00000, v10
+; GFX11-NEXT: v_bfe_u32 v10, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v8, v9, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v9, 0x400000, v6
+; GFX11-NEXT: v_bfe_u32 v11, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v7
+; GFX11-NEXT: v_perm_b32 v2, v5, v2, 0x7060302
+; GFX11-NEXT: v_add3_u32 v11, v11, v7, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v8, v9, vcc_lo
+; GFX11-NEXT: v_add3_u32 v9, v10, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v0, 0x40c00000, v0 :: v_dual_cndmask_b32 v1, v9, v10
+; GFX11-NEXT: v_bfe_u32 v8, v0, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v8, v8, v0, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_perm_b32 v1, v6, v1, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v8, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v7, v0, 0x7060302
+; GFX11-NEXT: .LBB21_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <8 x bfloat> %a1 to <4 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x bfloat> %a to <4 x float>
+ br label %end
+
+end:
+ %phi = phi <4 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x float> %phi
+}
+
+define <2 x double> @v_bitcast_v2i64_to_v2f64(<2 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i64_to_v2f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB22_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: .LBB22_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i64_to_v2f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i64_to_v2f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i64_to_v2f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i64> %a, splat (i64 3)
+ %a2 = bitcast <2 x i64> %a1 to <2 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i64> %a to <2 x double>
+ br label %end
+
+end:
+ %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x double> %phi
+}
+
+define <2 x i64> @v_bitcast_v2f64_to_v2i64(<2 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f64_to_v2i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB23_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: .LBB23_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f64_to_v2i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB23_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: .LBB23_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f64_to_v2i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB23_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: .LBB23_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f64_to_v2i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB23_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: .LBB23_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <2 x double> %a1 to <2 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x double> %a to <2 x i64>
+ br label %end
+
+end:
+ %phi = phi <2 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i64> %phi
+}
+
+define <8 x i16> @v_bitcast_v2i64_to_v8i16(<2 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i64_to_v8i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v8, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v5, v6, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB24_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v2, vcc, 0, v2, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v6, vcc, 0, v6, vcc
+; GCN-NEXT: v_alignbit_b32 v5, v6, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB24_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v4, v8
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i64_to_v8i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i64_to_v8i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i64_to_v8i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i64> %a, splat (i64 3)
+ %a2 = bitcast <2 x i64> %a1 to <8 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i64> %a to <8 x i16>
+ br label %end
+
+end:
+ %phi = phi <8 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i16> %phi
+}
+
+define <2 x i64> @v_bitcast_v8i16_to_v2i64(<8 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i16_to_v2i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_4
+; GCN-NEXT: .LBB25_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB25_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v6
+; GCN-NEXT: v_or_b32_e32 v0, v0, v8
+; GCN-NEXT: v_or_b32_e32 v1, v1, v11
+; GCN-NEXT: v_or_b32_e32 v2, v2, v5
+; GCN-NEXT: v_or_b32_e32 v3, v3, v7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB25_2
+; GCN-NEXT: .LBB25_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v6
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_or_b32_e32 v0, v8, v0
+; GCN-NEXT: v_or_b32_e32 v1, v11, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v7, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 0x30000, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 0x30000, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i16_to_v2i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB25_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v5, 3
+; VI-NEXT: v_add_u16_e32 v4, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v4, v3
+; VI-NEXT: v_add_u16_e32 v4, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v4, v2
+; VI-NEXT: v_add_u16_e32 v4, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v4, v1
+; VI-NEXT: v_add_u16_e32 v4, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v4, v0
+; VI-NEXT: .LBB25_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i16_to_v2i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i16_to_v2i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i16> %a, splat (i16 3)
+ %a2 = bitcast <8 x i16> %a1 to <2 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i16> %a to <2 x i64>
+ br label %end
+
+end:
+ %phi = phi <2 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i64> %phi
+}
+
+define <8 x half> @v_bitcast_v2i64_to_v8f16(<2 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i64_to_v8f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v10, v3
+; GCN-NEXT: v_mov_b32_e32 v9, v2
+; GCN-NEXT: v_mov_b32_e32 v11, v1
+; GCN-NEXT: v_mov_b32_e32 v8, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB26_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB26_4
+; GCN-NEXT: .LBB26_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB26_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB26_2
+; GCN-NEXT: .LBB26_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v9
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v10, vcc
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v8
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i64_to_v8f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i64_to_v8f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i64_to_v8f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i64> %a, splat (i64 3)
+ %a2 = bitcast <2 x i64> %a1 to <8 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i64> %a to <8 x half>
+ br label %end
+
+end:
+ %phi = phi <8 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x half> %phi
+}
+
+define <2 x i64> @v_bitcast_v8f16_to_v2i64(<8 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f16_to_v2i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB27_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB27_4
+; GCN-NEXT: .LBB27_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB27_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v10, v0
+; GCN-NEXT: v_or_b32_e32 v1, v8, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v4, v3
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB27_2
+; GCN-NEXT: .LBB27_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_or_b32_e32 v2, v5, v6
+; GCN-NEXT: v_or_b32_e32 v3, v4, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f16_to_v2i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB27_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v4, 0x200
+; VI-NEXT: v_add_f16_sdwa v5, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v5
+; VI-NEXT: v_add_f16_sdwa v5, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v5
+; VI-NEXT: v_add_f16_sdwa v5, v1, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v4, v0, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v5
+; VI-NEXT: v_or_b32_e32 v0, v0, v4
+; VI-NEXT: .LBB27_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f16_to_v2i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f16_to_v2i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <8 x half> %a1 to <2 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x half> %a to <2 x i64>
+ br label %end
+
+end:
+ %phi = phi <2 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i64> %phi
+}
+
+define <8 x bfloat> @v_bitcast_v2i64_to_v8bf16(<2 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i64_to_v8bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v11, v3
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v1
+; GCN-NEXT: v_mov_b32_e32 v8, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_4
+; GCN-NEXT: .LBB28_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB28_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v11
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v10
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB28_2
+; GCN-NEXT: .LBB28_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v11, vcc
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i64_to_v8bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i64_to_v8bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i64_to_v8bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i64> %a, splat (i64 3)
+ %a2 = bitcast <2 x i64> %a1 to <8 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i64> %a to <8 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <8 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x bfloat> %phi
+}
+
+define <2 x i64> @v_bitcast_v8bf16_to_v2i64(<8 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8bf16_to_v2i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v12, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB29_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB29_4
+; GCN-NEXT: .LBB29_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB29_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v0, v11, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v9, 16
+; GCN-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB29_2
+; GCN-NEXT: .LBB29_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v9
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v12
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v2, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8bf16_to_v2i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB29_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v3
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v3, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v3
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v2
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v1
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v0
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v0
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; VI-NEXT: .LBB29_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8bf16_to_v2i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB29_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v3, v4, v3, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v2
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v2, v4, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v1, v4, v1, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v0, v4, v0, s7
+; GFX9-NEXT: .LBB29_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8bf16_to_v2i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB29_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v4, 0x40c00000, v4 :: v_dual_lshlrev_b32 v3, 16, v3
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v7, v4, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v7, v7, v4, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX11-NEXT: v_add3_u32 v9, v9, v3, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v7, v8 :: v_dual_add_f32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v11, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v13, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v11, v5, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v1
+; GFX11-NEXT: v_add3_u32 v7, v13, v2, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v5, v11, v12 :: v_dual_add_f32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v0, 16, v0
+; GFX11-NEXT: v_perm_b32 v3, v4, v3, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v7, v8 :: v_dual_add_f32 v7, 0x40c00000, v10
+; GFX11-NEXT: v_bfe_u32 v10, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v8, v9, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v9, 0x400000, v6
+; GFX11-NEXT: v_bfe_u32 v11, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v7
+; GFX11-NEXT: v_perm_b32 v2, v5, v2, 0x7060302
+; GFX11-NEXT: v_add3_u32 v11, v11, v7, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v8, v9, vcc_lo
+; GFX11-NEXT: v_add3_u32 v9, v10, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v0, 0x40c00000, v0 :: v_dual_cndmask_b32 v1, v9, v10
+; GFX11-NEXT: v_bfe_u32 v8, v0, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v8, v8, v0, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_perm_b32 v1, v6, v1, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v8, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v7, v0, 0x7060302
+; GFX11-NEXT: .LBB29_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <8 x bfloat> %a1 to <2 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x bfloat> %a to <2 x i64>
+ br label %end
+
+end:
+ %phi = phi <2 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i64> %phi
+}
+
+define <8 x i16> @v_bitcast_v2f64_to_v8i16(<2 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f64_to_v8i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v9, v3
+; GCN-NEXT: v_mov_b32_e32 v8, v2
+; GCN-NEXT: v_mov_b32_e32 v11, v1
+; GCN-NEXT: v_mov_b32_e32 v10, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v5, v9, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v11, v10, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v11
+; GCN-NEXT: .LBB30_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_alignbit_b32 v5, v9, v8, 16
+; GCN-NEXT: v_alignbit_b32 v1, v11, v10, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v11
+; GCN-NEXT: .LBB30_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v10
+; GCN-NEXT: v_mov_b32_e32 v2, v11
+; GCN-NEXT: v_mov_b32_e32 v4, v8
+; GCN-NEXT: v_mov_b32_e32 v6, v9
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f64_to_v8i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB30_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB30_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f64_to_v8i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB30_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB30_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f64_to_v8i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB30_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB30_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <2 x double> %a1 to <8 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x double> %a to <8 x i16>
+ br label %end
+
+end:
+ %phi = phi <8 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i16> %phi
+}
+
+define <2 x double> @v_bitcast_v8i16_to_v2f64(<8 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i16_to_v2f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v10, v2
+; GCN-NEXT: v_mov_b32_e32 v9, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_4
+; GCN-NEXT: .LBB31_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB31_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v6
+; GCN-NEXT: v_or_b32_e32 v0, v0, v8
+; GCN-NEXT: v_or_b32_e32 v1, v1, v11
+; GCN-NEXT: v_or_b32_e32 v2, v2, v5
+; GCN-NEXT: v_or_b32_e32 v3, v3, v7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB31_2
+; GCN-NEXT: .LBB31_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v6
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_or_b32_e32 v0, v8, v0
+; GCN-NEXT: v_or_b32_e32 v1, v11, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v7, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 0x30000, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 0x30000, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i16_to_v2f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB31_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v5, 3
+; VI-NEXT: v_add_u16_e32 v4, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v4, v3
+; VI-NEXT: v_add_u16_e32 v4, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v4, v2
+; VI-NEXT: v_add_u16_e32 v4, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v4, v1
+; VI-NEXT: v_add_u16_e32 v4, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v4, v0
+; VI-NEXT: .LBB31_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i16_to_v2f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i16_to_v2f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i16> %a, splat (i16 3)
+ %a2 = bitcast <8 x i16> %a1 to <2 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i16> %a to <2 x double>
+ br label %end
+
+end:
+ %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x double> %phi
+}
+
+define <8 x half> @v_bitcast_v2f64_to_v8f16(<2 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f64_to_v8f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: .LBB32_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: .LBB32_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v9
+; GCN-NEXT: v_mov_b32_e32 v1, v11
+; GCN-NEXT: v_mov_b32_e32 v2, v8
+; GCN-NEXT: v_mov_b32_e32 v3, v10
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f64_to_v8f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB32_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB32_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f64_to_v8f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB32_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB32_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f64_to_v8f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB32_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB32_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <2 x double> %a1 to <8 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x double> %a to <8 x half>
+ br label %end
+
+end:
+ %phi = phi <8 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x half> %phi
+}
+
+define <2 x double> @v_bitcast_v8f16_to_v2f64(<8 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f16_to_v2f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB33_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB33_4
+; GCN-NEXT: .LBB33_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB33_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v10, v0
+; GCN-NEXT: v_or_b32_e32 v1, v8, v1
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v4, v3
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB33_2
+; GCN-NEXT: .LBB33_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_or_b32_e32 v2, v5, v6
+; GCN-NEXT: v_or_b32_e32 v3, v4, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f16_to_v2f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB33_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v4, 0x200
+; VI-NEXT: v_add_f16_sdwa v5, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v5
+; VI-NEXT: v_add_f16_sdwa v5, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v5
+; VI-NEXT: v_add_f16_sdwa v5, v1, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v4, v0, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v5
+; VI-NEXT: v_or_b32_e32 v0, v0, v4
+; VI-NEXT: .LBB33_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f16_to_v2f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f16_to_v2f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <8 x half> %a1 to <2 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x half> %a to <2 x double>
+ br label %end
+
+end:
+ %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x double> %phi
+}
+
+define <8 x bfloat> @v_bitcast_v2f64_to_v8bf16(<2 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f64_to_v8bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: .LBB34_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v0
+; GCN-NEXT: .LBB34_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v11
+; GCN-NEXT: v_mov_b32_e32 v1, v10
+; GCN-NEXT: v_mov_b32_e32 v2, v9
+; GCN-NEXT: v_mov_b32_e32 v3, v8
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f64_to_v8bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB34_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB34_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f64_to_v8bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB34_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB34_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f64_to_v8bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB34_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB34_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <2 x double> %a1 to <8 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x double> %a to <8 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <8 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x bfloat> %phi
+}
+
+define <2 x double> @v_bitcast_v8bf16_to_v2f64(<8 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8bf16_to_v2f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v12, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v6
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB35_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB35_4
+; GCN-NEXT: .LBB35_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB35_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v0, v11, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v9, 16
+; GCN-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB35_2
+; GCN-NEXT: .LBB35_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v9
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v12
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v2, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8bf16_to_v2f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB35_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v3
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v3, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v3
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v2
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v1
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v4, 16
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v0
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v0
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; VI-NEXT: .LBB35_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8bf16_to_v2f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB35_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v5, v6, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v3, v4, v3, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v2
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v2, v4, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v1, v4, v1, s7
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; GFX9-NEXT: v_perm_b32 v0, v4, v0, s7
+; GFX9-NEXT: .LBB35_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8bf16_to_v2f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB35_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v4, 0x40c00000, v4 :: v_dual_lshlrev_b32 v3, 16, v3
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v7, v4, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v7, v7, v4, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX11-NEXT: v_add3_u32 v9, v9, v3, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v7, v8 :: v_dual_add_f32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v11, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v13, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v11, v5, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v1
+; GFX11-NEXT: v_add3_u32 v7, v13, v2, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v5, v11, v12 :: v_dual_add_f32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v0, 16, v0
+; GFX11-NEXT: v_perm_b32 v3, v4, v3, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v7, v8 :: v_dual_add_f32 v7, 0x40c00000, v10
+; GFX11-NEXT: v_bfe_u32 v10, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v8, v9, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v9, 0x400000, v6
+; GFX11-NEXT: v_bfe_u32 v11, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v7
+; GFX11-NEXT: v_perm_b32 v2, v5, v2, 0x7060302
+; GFX11-NEXT: v_add3_u32 v11, v11, v7, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v8, v9, vcc_lo
+; GFX11-NEXT: v_add3_u32 v9, v10, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v0, 0x40c00000, v0 :: v_dual_cndmask_b32 v1, v9, v10
+; GFX11-NEXT: v_bfe_u32 v8, v0, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v8, v8, v0, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_perm_b32 v1, v6, v1, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v8, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v7, v0, 0x7060302
+; GFX11-NEXT: .LBB35_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <8 x bfloat> %a1 to <2 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x bfloat> %a to <2 x double>
+ br label %end
+
+end:
+ %phi = phi <2 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x double> %phi
+}
+
+define <8 x half> @v_bitcast_v8i16_to_v8f16(<8 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i16_to_v8f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v16, v7
+; GCN-NEXT: v_mov_b32_e32 v9, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v11, v4
+; GCN-NEXT: v_mov_b32_e32 v12, v3
+; GCN-NEXT: v_mov_b32_e32 v13, v2
+; GCN-NEXT: v_mov_b32_e32 v14, v1
+; GCN-NEXT: v_mov_b32_e32 v15, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB36_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB36_4
+; GCN-NEXT: .LBB36_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB36_3: ; %cmp.false
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB36_2
+; GCN-NEXT: .LBB36_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i16_to_v8f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB36_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v4, 3
+; VI-NEXT: v_add_u16_sdwa v5, v0, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v6, v1, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v7, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v4, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v3, 3, v3
+; VI-NEXT: v_add_u16_e32 v2, 3, v2
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v3, v3, v4
+; VI-NEXT: v_or_b32_e32 v2, v2, v7
+; VI-NEXT: v_or_b32_e32 v1, v1, v6
+; VI-NEXT: v_or_b32_e32 v0, v0, v5
+; VI-NEXT: .LBB36_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i16_to_v8f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i16_to_v8f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i16> %a, splat (i16 3)
+ %a2 = bitcast <8 x i16> %a1 to <8 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i16> %a to <8 x half>
+ br label %end
+
+end:
+ %phi = phi <8 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x half> %phi
+}
+
+define <8 x i16> @v_bitcast_v8f16_to_v8i16(<8 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f16_to_v8i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB37_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v0, v1
+; GCN-NEXT: v_or_b32_e32 v4, v4, v5
+; GCN-NEXT: v_or_b32_e32 v6, v6, v8
+; GCN-NEXT: v_or_b32_e32 v2, v2, v9
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v5, 16
+; GCN-NEXT: .LBB37_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f16_to_v8i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB37_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v5, 0x200
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v2
+; VI-NEXT: v_add_f16_sdwa v2, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v3
+; VI-NEXT: v_add_f16_sdwa v3, v3, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v8, v3
+; VI-NEXT: v_or_b32_e32 v2, v7, v2
+; VI-NEXT: v_or_b32_e32 v1, v6, v1
+; VI-NEXT: v_or_b32_e32 v0, v4, v0
+; VI-NEXT: .LBB37_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f16_to_v8i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f16_to_v8i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <8 x half> %a1 to <8 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x half> %a to <8 x i16>
+ br label %end
+
+end:
+ %phi = phi <8 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i16> %phi
+}
+
+define <8 x bfloat> @v_bitcast_v8i16_to_v8bf16(<8 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i16_to_v8bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v11, v6
+; GCN-NEXT: v_mov_b32_e32 v12, v4
+; GCN-NEXT: v_mov_b32_e32 v9, v2
+; GCN-NEXT: v_mov_b32_e32 v10, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_4
+; GCN-NEXT: .LBB38_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB38_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v11
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB38_2
+; GCN-NEXT: .LBB38_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v11
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v10
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_or_b32_e32 v0, v7, v0
+; GCN-NEXT: v_or_b32_e32 v2, v5, v2
+; GCN-NEXT: v_or_b32_e32 v3, v3, v4
+; GCN-NEXT: v_or_b32_e32 v1, v1, v6
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 0x30000, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i16_to_v8bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB38_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v4, 3
+; VI-NEXT: v_add_u16_sdwa v5, v0, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v6, v1, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v7, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v4, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v3, 3, v3
+; VI-NEXT: v_add_u16_e32 v2, 3, v2
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v3, v3, v4
+; VI-NEXT: v_or_b32_e32 v2, v2, v7
+; VI-NEXT: v_or_b32_e32 v1, v1, v6
+; VI-NEXT: v_or_b32_e32 v0, v0, v5
+; VI-NEXT: .LBB38_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i16_to_v8bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i16_to_v8bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i16> %a, splat (i16 3)
+ %a2 = bitcast <8 x i16> %a1 to <8 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i16> %a to <8 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <8 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x bfloat> %phi
+}
+
+define <8 x i16> @v_bitcast_v8bf16_to_v8i16(<8 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8bf16_to_v8i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_mul_f32_e32 v15, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v14, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v12, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v7
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB39_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB39_4
+; GCN-NEXT: .LBB39_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB39_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v10
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB39_2
+; GCN-NEXT: .LBB39_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v15
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v14
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v12
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v9
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v8
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v4
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v6
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GCN-NEXT: v_alignbit_b32 v0, v9, v0, 16
+; GCN-NEXT: v_alignbit_b32 v4, v10, v2, 16
+; GCN-NEXT: v_alignbit_b32 v6, v7, v8, 16
+; GCN-NEXT: v_alignbit_b32 v2, v3, v5, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v11, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8bf16_to_v8i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB39_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v0
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v0
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v5, 16, v1
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_bfe_u32 v6, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v6, vcc, v6, v5
+; VI-NEXT: v_add_u32_e32 v6, vcc, s6, v6
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v7, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v5, v6, v7, vcc
+; VI-NEXT: v_bfe_u32 v6, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v6, vcc, v6, v1
+; VI-NEXT: v_add_u32_e32 v6, vcc, s6, v6
+; VI-NEXT: v_or_b32_e32 v7, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v6, v7, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v6, 16, v2
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_bfe_u32 v7, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v7, vcc, v7, v6
+; VI-NEXT: v_add_u32_e32 v7, vcc, s6, v7
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v8, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v6, v7, v8, vcc
+; VI-NEXT: v_bfe_u32 v7, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v7, vcc, v7, v2
+; VI-NEXT: v_add_u32_e32 v7, vcc, s6, v7
+; VI-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v7, v8, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v7, 16, v3
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_bfe_u32 v8, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v8, vcc, v8, v7
+; VI-NEXT: v_add_u32_e32 v8, vcc, s6, v8
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v9, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v7, v8, v9, vcc
+; VI-NEXT: v_bfe_u32 v8, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v8, vcc, v8, v3
+; VI-NEXT: v_add_u32_e32 v8, vcc, 0x7fff, v8
+; VI-NEXT: v_or_b32_e32 v9, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v8, v9, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v7, 16
+; VI-NEXT: v_alignbit_b32 v2, v2, v6, 16
+; VI-NEXT: v_alignbit_b32 v1, v1, v5, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; VI-NEXT: .LBB39_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8bf16_to_v8i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB39_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_bfe_u32 v6, v5, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v6, v6, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v7, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v6, v7, vcc
+; GFX9-NEXT: v_bfe_u32 v6, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v6, v6, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v7, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v7, vcc
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v2
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_bfe_u32 v7, v6, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX9-NEXT: v_add3_u32 v7, v7, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v8, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v7, v8, vcc
+; GFX9-NEXT: v_bfe_u32 v7, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v7, v7, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v8, vcc
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_bfe_u32 v8, v7, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX9-NEXT: v_add3_u32 v8, v8, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v9, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v8, v9, vcc
+; GFX9-NEXT: v_bfe_u32 v8, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v8, v8, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v9, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v8, v9, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v3, v7, v3, s6
+; GFX9-NEXT: v_perm_b32 v2, v6, v2, s6
+; GFX9-NEXT: v_perm_b32 v1, v5, v1, s6
+; GFX9-NEXT: v_perm_b32 v0, v4, v0, s6
+; GFX9-NEXT: .LBB39_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8bf16_to_v8i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB39_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_bfe_u32 v7, v4, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v7, v7, v4, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v7, v8 :: v_dual_and_b32 v5, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v5, 0x40c00000, v5 :: v_dual_lshlrev_b32 v0, 16, v0
+; GFX11-NEXT: v_dual_add_f32 v0, 0x40c00000, v0 :: v_dual_lshlrev_b32 v1, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v11, v5, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX11-NEXT: v_add3_u32 v11, v11, v5, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v2
+; GFX11-NEXT: v_bfe_u32 v13, v1, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v5, v11, v12 :: v_dual_add_f32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_add3_u32 v7, v13, v1, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX11-NEXT: v_bfe_u32 v11, v6, 16, 1
+; GFX11-NEXT: v_add3_u32 v9, v9, v0, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v7, v8 :: v_dual_add_f32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v8, v11, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: v_bfe_u32 v12, v2, 16, 1
+; GFX11-NEXT: v_bfe_u32 v13, v7, 16, 1
+; GFX11-NEXT: v_bfe_u32 v14, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v8, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v8, v12, v2, 0x7fff
+; GFX11-NEXT: v_add3_u32 v11, v13, v7, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_add3_u32 v13, v14, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v3
+; GFX11-NEXT: v_perm_b32 v1, v5, v1, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v13, v14, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_perm_b32 v3, v7, v3, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v8, v15, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v2, v6, v2, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v4, v0, 0x7060302
+; GFX11-NEXT: .LBB39_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <8 x bfloat> %a1 to <8 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x bfloat> %a to <8 x i16>
+ br label %end
+
+end:
+ %phi = phi <8 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i16> %phi
+}
+
+define <8 x bfloat> @v_bitcast_v8f16_to_v8bf16(<8 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f16_to_v8bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v7
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB40_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB40_4
+; GCN-NEXT: .LBB40_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB40_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v15
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB40_2
+; GCN-NEXT: .LBB40_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v8
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v11
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f16_to_v8bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB40_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v5, 0x200
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v2
+; VI-NEXT: v_add_f16_sdwa v2, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v3
+; VI-NEXT: v_add_f16_sdwa v3, v3, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v8, v3
+; VI-NEXT: v_or_b32_e32 v2, v7, v2
+; VI-NEXT: v_or_b32_e32 v1, v6, v1
+; VI-NEXT: v_or_b32_e32 v0, v4, v0
+; VI-NEXT: .LBB40_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f16_to_v8bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f16_to_v8bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <8 x half> %a1 to <8 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x half> %a to <8 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <8 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x bfloat> %phi
+}
+
+define <8 x half> @v_bitcast_v8bf16_to_v8f16(<8 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8bf16_to_v8f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v12, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v14, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v15, 1.0, v7
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB41_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB41_4
+; GCN-NEXT: .LBB41_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB41_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB41_2
+; GCN-NEXT: .LBB41_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v15
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v14
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v12
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v9
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v8
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v8
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8bf16_to_v8f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB41_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v4, 16, v0
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_bfe_u32 v5, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v4
+; VI-NEXT: v_add_u32_e32 v5, vcc, 0x7fff, v5
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; VI-NEXT: v_bfe_u32 v5, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v5, vcc, v5, v0
+; VI-NEXT: v_add_u32_e32 v5, vcc, s6, v5
+; VI-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v5, 16, v1
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_bfe_u32 v6, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v6, vcc, v6, v5
+; VI-NEXT: v_add_u32_e32 v6, vcc, s6, v6
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v7, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v5, v6, v7, vcc
+; VI-NEXT: v_bfe_u32 v6, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v6, vcc, v6, v1
+; VI-NEXT: v_add_u32_e32 v6, vcc, s6, v6
+; VI-NEXT: v_or_b32_e32 v7, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v6, v7, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v6, 16, v2
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_bfe_u32 v7, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v7, vcc, v7, v6
+; VI-NEXT: v_add_u32_e32 v7, vcc, s6, v7
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v8, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v6, v7, v8, vcc
+; VI-NEXT: v_bfe_u32 v7, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v7, vcc, v7, v2
+; VI-NEXT: v_add_u32_e32 v7, vcc, s6, v7
+; VI-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v7, v8, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v7, 16, v3
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_bfe_u32 v8, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v8, vcc, v8, v7
+; VI-NEXT: v_add_u32_e32 v8, vcc, s6, v8
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v9, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v7, v8, v9, vcc
+; VI-NEXT: v_bfe_u32 v8, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v8, vcc, v8, v3
+; VI-NEXT: v_add_u32_e32 v8, vcc, 0x7fff, v8
+; VI-NEXT: v_or_b32_e32 v9, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v8, v9, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v7, 16
+; VI-NEXT: v_alignbit_b32 v2, v2, v6, 16
+; VI-NEXT: v_alignbit_b32 v1, v1, v5, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; VI-NEXT: .LBB41_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8bf16_to_v8f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB41_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_bfe_u32 v5, v4, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add3_u32 v5, v5, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v6, vcc
+; GFX9-NEXT: v_bfe_u32 v5, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v5, v5, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v6, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v6, vcc
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_bfe_u32 v6, v5, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v6, v6, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v7, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v6, v7, vcc
+; GFX9-NEXT: v_bfe_u32 v6, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v6, v6, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v7, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v7, vcc
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v2
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_bfe_u32 v7, v6, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX9-NEXT: v_add3_u32 v7, v7, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v8, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v7, v8, vcc
+; GFX9-NEXT: v_bfe_u32 v7, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v7, v7, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v8, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v8, vcc
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_bfe_u32 v8, v7, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX9-NEXT: v_add3_u32 v8, v8, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v9, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v8, v9, vcc
+; GFX9-NEXT: v_bfe_u32 v8, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v8, v8, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v9, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v8, v9, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v3, v7, v3, s6
+; GFX9-NEXT: v_perm_b32 v2, v6, v2, s6
+; GFX9-NEXT: v_perm_b32 v1, v5, v1, s6
+; GFX9-NEXT: v_perm_b32 v0, v4, v0, s6
+; GFX9-NEXT: .LBB41_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8bf16_to_v8f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v4
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB41_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_bfe_u32 v7, v4, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v7, v7, v4, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v7, v8 :: v_dual_and_b32 v5, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v5, 0x40c00000, v5 :: v_dual_lshlrev_b32 v0, 16, v0
+; GFX11-NEXT: v_dual_add_f32 v0, 0x40c00000, v0 :: v_dual_lshlrev_b32 v1, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v11, v5, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX11-NEXT: v_add3_u32 v11, v11, v5, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v2
+; GFX11-NEXT: v_bfe_u32 v13, v1, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v5, v11, v12 :: v_dual_add_f32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_add3_u32 v7, v13, v1, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GFX11-NEXT: v_bfe_u32 v11, v6, 16, 1
+; GFX11-NEXT: v_add3_u32 v9, v9, v0, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v7, v8 :: v_dual_add_f32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v8, v11, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: v_bfe_u32 v12, v2, 16, 1
+; GFX11-NEXT: v_bfe_u32 v13, v7, 16, 1
+; GFX11-NEXT: v_bfe_u32 v14, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v8, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v8, v12, v2, 0x7fff
+; GFX11-NEXT: v_add3_u32 v11, v13, v7, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_add3_u32 v13, v14, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v3
+; GFX11-NEXT: v_perm_b32 v1, v5, v1, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v13, v14, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_perm_b32 v3, v7, v3, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v8, v15, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v2, v6, v2, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v4, v0, 0x7060302
+; GFX11-NEXT: .LBB41_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <8 x bfloat> %a1 to <8 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x bfloat> %a to <8 x half>
+ br label %end
+
+end:
+ %phi = phi <8 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x half> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.160bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.160bit.ll
new file mode 100644
index 0000000000000..6396ca92d598a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.160bit.ll
@@ -0,0 +1,178 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <5 x float> @v_bitcast_v5i32_to_v5f32(<5 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v5i32_to_v5f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v5
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v5i32_to_v5f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v5
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v5i32_to_v5f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v5
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v5i32_to_v5f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v5
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <5 x i32> %a, splat (i32 3)
+ %a2 = bitcast <5 x i32> %a1 to <5 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <5 x i32> %a to <5 x float>
+ br label %end
+
+end:
+ %phi = phi <5 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <5 x float> %phi
+}
+
+define <5 x i32> @v_bitcast_v5f32_to_v5i32(<5 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v5f32_to_v5i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v5
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v5f32_to_v5i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v5
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v5f32_to_v5i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v5
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v5f32_to_v5i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v5
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v4, 1.0, v4 :: v_dual_add_f32 v3, 1.0, v3
+; GFX11-NEXT: v_dual_add_f32 v2, 1.0, v2 :: v_dual_add_f32 v1, 1.0, v1
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <5 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <5 x float> %a1 to <5 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <5 x float> %a to <5 x i32>
+ br label %end
+
+end:
+ %phi = phi <5 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <5 x i32> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.16bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.16bit.ll
new file mode 100644
index 0000000000000..6c88d7e5d5edd
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.16bit.ll
@@ -0,0 +1,556 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define half @v_bitcast_i16_to_f16(i16 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i16_to_f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB0_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB0_4
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB0_3: ; %cmp.false
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v2
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: .LBB0_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i16_to_f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i16_to_f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u16_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i16_to_f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u16 v0, v0, 3
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i16 %a, 3
+ %a2 = bitcast i16 %a1 to half
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i16 %a to half
+ br label %end
+
+end:
+ %phi = phi half [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret half %phi
+}
+
+define i16 @v_bitcast_f16_to_i16(half %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f16_to_i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v2, v0
+; GCN-NEXT: v_mov_b32_e32 v0, 0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB1_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB1_4
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB1_3: ; %cmp.false
+; GCN-NEXT: v_mov_b32_e32 v0, v1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: .LBB1_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f16_to_i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f16_to_i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f16_e32 v0, 0x200, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f16_to_i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f16_e32 v0, 0x200, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd half %a, 0xH0200
+ %a2 = bitcast half %a1 to i16
+ br label %end
+
+cmp.false:
+ %a3 = bitcast half %a to i16
+ br label %end
+
+end:
+ %phi = phi i16 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i16 %phi
+}
+
+define bfloat @v_bitcast_i16_to_bf16(i16 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i16_to_bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i16_to_bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i16_to_bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u16_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i16_to_bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u16 v0, v0, 3
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i16 %a, 3
+ %a2 = bitcast i16 %a1 to bfloat
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i16 %a to bfloat
+ br label %end
+
+end:
+ %phi = phi bfloat [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret bfloat %phi
+}
+
+define i16 @v_bitcast_bf16_to_i16(bfloat %a, i32 %b) {
+; GCN-LABEL: v_bitcast_bf16_to_i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v2, v0
+; GCN-NEXT: v_mov_b32_e32 v0, 0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_mul_f32_e32 v1, 1.0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB3_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB3_4
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB3_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v1
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: .LBB3_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_bf16_to_i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_bfe_u32 v1, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v1, vcc, v1, v0
+; VI-NEXT: v_add_u32_e32 v1, vcc, 0x7fff, v1
+; VI-NEXT: v_or_b32_e32 v2, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_bf16_to_i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_bfe_u32 v1, v0, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_add3_u32 v1, v1, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v2, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc
+; GFX9-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_bf16_to_i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB3_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v1, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v2, 0x400000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v1, v1, v0, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; GFX11-NEXT: .LBB3_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd bfloat %a, 0xR40C0
+ %a2 = bitcast bfloat %a1 to i16
+ br label %end
+
+cmp.false:
+ %a3 = bitcast bfloat %a to i16
+ br label %end
+
+end:
+ %phi = phi i16 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i16 %phi
+}
+
+define bfloat @v_bitcast_f16_to_bf16(half %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f16_to_bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB4_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB4_4
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB4_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v1
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: .LBB4_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f16_to_bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f16_to_bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f16_e32 v0, 0x200, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f16_to_bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f16_e32 v0, 0x200, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd half %a, 0xH0200
+ %a2 = bitcast half %a1 to bfloat
+ br label %end
+
+cmp.false:
+ %a3 = bitcast half %a to bfloat
+ br label %end
+
+end:
+ %phi = phi bfloat [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret bfloat %phi
+}
+
+define half @v_bitcast_bf16_to_f16(bfloat %a, i32 %b) {
+; GCN-LABEL: v_bitcast_bf16_to_f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_mul_f32_e32 v1, 1.0, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB5_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB5_4
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB5_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: .LBB5_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_bf16_to_f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_bfe_u32 v1, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v1, vcc, v1, v0
+; VI-NEXT: v_add_u32_e32 v1, vcc, 0x7fff, v1
+; VI-NEXT: v_or_b32_e32 v2, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_bf16_to_f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_bfe_u32 v1, v0, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_add3_u32 v1, v1, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v2, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc
+; GFX9-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_bf16_to_f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB5_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v1, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v2, 0x400000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v1, v1, v0, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v1, v2, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; GFX11-NEXT: .LBB5_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd bfloat %a, 0xR40C0
+ %a2 = bitcast bfloat %a1 to half
+ br label %end
+
+cmp.false:
+ %a3 = bitcast bfloat %a to half
+ br label %end
+
+end:
+ %phi = phi half [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret half %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.192bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.192bit.ll
new file mode 100644
index 0000000000000..c775efbb2b660
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.192bit.ll
@@ -0,0 +1,1062 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <6 x float> @v_bitcast_v6i32_to_v6f32(<6 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v6i32_to_v6f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v6i32_to_v6f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v6i32_to_v6f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v6i32_to_v6f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <6 x i32> %a, splat (i32 3)
+ %a2 = bitcast <6 x i32> %a1 to <6 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <6 x i32> %a to <6 x float>
+ br label %end
+
+end:
+ %phi = phi <6 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <6 x float> %phi
+}
+
+define <6 x i32> @v_bitcast_v6f32_to_v6i32(<6 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v6f32_to_v6i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v6f32_to_v6i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v6f32_to_v6i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v6f32_to_v6i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <6 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <6 x float> %a1 to <6 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <6 x float> %a to <6 x i32>
+ br label %end
+
+end:
+ %phi = phi <6 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <6 x i32> %phi
+}
+
+define <3 x i64> @v_bitcast_v6i32_to_v3i64(<6 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v6i32_to_v3i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v6i32_to_v3i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v6i32_to_v3i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v6i32_to_v3i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <6 x i32> %a, splat (i32 3)
+ %a2 = bitcast <6 x i32> %a1 to <3 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <6 x i32> %a to <3 x i64>
+ br label %end
+
+end:
+ %phi = phi <3 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x i64> %phi
+}
+
+define <6 x i32> @v_bitcast_v3i64_to_v6i32(<3 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3i64_to_v6i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3i64_to_v6i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3i64_to_v6i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3i64_to_v6i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <3 x i64> %a, splat (i64 3)
+ %a2 = bitcast <3 x i64> %a1 to <6 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x i64> %a to <6 x i32>
+ br label %end
+
+end:
+ %phi = phi <6 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <6 x i32> %phi
+}
+
+define <3 x double> @v_bitcast_v6i32_to_v3f64(<6 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v6i32_to_v3f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v6i32_to_v3f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v6i32_to_v3f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v6i32_to_v3f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <6 x i32> %a, splat (i32 3)
+ %a2 = bitcast <6 x i32> %a1 to <3 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <6 x i32> %a to <3 x double>
+ br label %end
+
+end:
+ %phi = phi <3 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x double> %phi
+}
+
+define <6 x i32> @v_bitcast_v3f64_to_v6i32(<3 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3f64_to_v6i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3f64_to_v6i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB5_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB5_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3f64_to_v6i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB5_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB5_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3f64_to_v6i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB5_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB5_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <3 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <3 x double> %a1 to <6 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x double> %a to <6 x i32>
+ br label %end
+
+end:
+ %phi = phi <6 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <6 x i32> %phi
+}
+
+define <3 x i64> @v_bitcast_v6f32_to_v3i64(<6 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v6f32_to_v3i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB6_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v6f32_to_v3i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v6f32_to_v3i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v6f32_to_v3i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <6 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <6 x float> %a1 to <3 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <6 x float> %a to <3 x i64>
+ br label %end
+
+end:
+ %phi = phi <3 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x i64> %phi
+}
+
+define <6 x float> @v_bitcast_v3i64_to_v6f32(<3 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3i64_to_v6f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3i64_to_v6f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3i64_to_v6f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3i64_to_v6f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <3 x i64> %a, splat (i64 3)
+ %a2 = bitcast <3 x i64> %a1 to <6 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x i64> %a to <6 x float>
+ br label %end
+
+end:
+ %phi = phi <6 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <6 x float> %phi
+}
+
+define <3 x double> @v_bitcast_v6f32_to_v3f64(<6 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v6f32_to_v3f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB8_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v6f32_to_v3f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v6f32_to_v3f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v6f32_to_v3f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <6 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <6 x float> %a1 to <3 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <6 x float> %a to <3 x double>
+ br label %end
+
+end:
+ %phi = phi <3 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x double> %phi
+}
+
+define <6 x float> @v_bitcast_v3f64_to_v6f32(<3 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3f64_to_v6f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB9_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3f64_to_v6f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB9_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB9_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3f64_to_v6f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB9_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB9_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3f64_to_v6f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB9_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB9_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <3 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <3 x double> %a1 to <6 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x double> %a to <6 x float>
+ br label %end
+
+end:
+ %phi = phi <6 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <6 x float> %phi
+}
+
+define <3 x double> @v_bitcast_v3i64_to_v3f64(<3 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3i64_to_v3f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3i64_to_v3f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3i64_to_v3f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3i64_to_v3f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <3 x i64> %a, splat (i64 3)
+ %a2 = bitcast <3 x i64> %a1 to <3 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x i64> %a to <3 x double>
+ br label %end
+
+end:
+ %phi = phi <3 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x double> %phi
+}
+
+define <3 x i64> @v_bitcast_v3f64_to_v3i64(<3 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3f64_to_v3i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: .LBB11_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3f64_to_v3i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB11_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: .LBB11_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3f64_to_v3i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v6
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB11_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: .LBB11_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3f64_to_v3i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v6
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB11_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: .LBB11_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <3 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <3 x double> %a1 to <3 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x double> %a to <3 x i64>
+ br label %end
+
+end:
+ %phi = phi <3 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x i64> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll
new file mode 100644
index 0000000000000..9fa0ba5ad4a19
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll
@@ -0,0 +1,194 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <7 x float> @v_bitcast_v7i32_to_v7f32(<7 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v7i32_to_v7f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v7i32_to_v7f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v7
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v7i32_to_v7f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v7
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v7i32_to_v7f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v7
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <7 x i32> %a, splat (i32 3)
+ %a2 = bitcast <7 x i32> %a1 to <7 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <7 x i32> %a to <7 x float>
+ br label %end
+
+end:
+ %phi = phi <7 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <7 x float> %phi
+}
+
+define <7 x i32> @v_bitcast_v7f32_to_v7i32(<7 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v7f32_to_v7i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v7f32_to_v7i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v7
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v7f32_to_v7i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v7
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v7f32_to_v7i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v7
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v6, 1.0, v6 :: v_dual_add_f32 v5, 1.0, v5
+; GFX11-NEXT: v_dual_add_f32 v4, 1.0, v4 :: v_dual_add_f32 v3, 1.0, v3
+; GFX11-NEXT: v_dual_add_f32 v2, 1.0, v2 :: v_dual_add_f32 v1, 1.0, v1
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <7 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <7 x float> %a1 to <7 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <7 x float> %a to <7 x i32>
+ br label %end
+
+end:
+ %phi = phi <7 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <7 x i32> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll
new file mode 100644
index 0000000000000..b0fc34b5d3ce8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll
@@ -0,0 +1,9118 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <8 x float> @v_bitcast_v8i32_to_v8f32(<8 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i32_to_v8f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i32_to_v8f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i32_to_v8f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i32_to_v8f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i32> %a, splat (i32 3)
+ %a2 = bitcast <8 x i32> %a1 to <8 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i32> %a to <8 x float>
+ br label %end
+
+end:
+ %phi = phi <8 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x float> %phi
+}
+
+define <8 x i32> @v_bitcast_v8f32_to_v8i32(<8 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f32_to_v8i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f32_to_v8i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f32_to_v8i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f32_to_v8i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <8 x float> %a1 to <8 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x float> %a to <8 x i32>
+ br label %end
+
+end:
+ %phi = phi <8 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i32> %phi
+}
+
+define <4 x i64> @v_bitcast_v8i32_to_v4i64(<8 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i32_to_v4i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i32_to_v4i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i32_to_v4i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i32_to_v4i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB2_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB2_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i32> %a, splat (i32 3)
+ %a2 = bitcast <8 x i32> %a1 to <4 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i32> %a to <4 x i64>
+ br label %end
+
+end:
+ %phi = phi <4 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i64> %phi
+}
+
+define <8 x i32> @v_bitcast_v4i64_to_v8i32(<4 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i64_to_v8i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i64_to_v8i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i64_to_v8i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i64_to_v8i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB3_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB3_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i64> %a, splat (i64 3)
+ %a2 = bitcast <4 x i64> %a1 to <8 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i64> %a to <8 x i32>
+ br label %end
+
+end:
+ %phi = phi <8 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i32> %phi
+}
+
+define <4 x double> @v_bitcast_v8i32_to_v4f64(<8 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i32_to_v4f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i32_to_v4f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i32_to_v4f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i32_to_v4f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB4_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB4_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i32> %a, splat (i32 3)
+ %a2 = bitcast <8 x i32> %a1 to <4 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i32> %a to <4 x double>
+ br label %end
+
+end:
+ %phi = phi <4 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x double> %phi
+}
+
+define <8 x i32> @v_bitcast_v4f64_to_v8i32(<4 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f64_to_v8i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f64_to_v8i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB5_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB5_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f64_to_v8i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB5_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB5_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f64_to_v8i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB5_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB5_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <4 x double> %a1 to <8 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x double> %a to <8 x i32>
+ br label %end
+
+end:
+ %phi = phi <8 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i32> %phi
+}
+
+define <16 x i16> @v_bitcast_v8i32_to_v16i16(<8 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i32_to_v16i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v14, v7
+; GCN-NEXT: v_mov_b32_e32 v12, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v16, v4
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v4, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB6_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB6_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v8, v16
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i32_to_v16i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i32_to_v16i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i32_to_v16i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB6_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB6_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i32> %a, splat (i32 3)
+ %a2 = bitcast <8 x i32> %a1 to <16 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i32> %a to <16 x i16>
+ br label %end
+
+end:
+ %phi = phi <16 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i16> %phi
+}
+
+define <8 x i32> @v_bitcast_v16i16_to_v8i32(<16 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i16_to_v8i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v20, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v4
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v15
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_4
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB7_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v18
+; GCN-NEXT: v_or_b32_e32 v0, v0, v22
+; GCN-NEXT: v_or_b32_e32 v1, v1, v23
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v19
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v14
+; GCN-NEXT: v_or_b32_e32 v2, v2, v16
+; GCN-NEXT: v_or_b32_e32 v3, v3, v21
+; GCN-NEXT: v_or_b32_e32 v4, v4, v9
+; GCN-NEXT: v_or_b32_e32 v5, v5, v11
+; GCN-NEXT: v_or_b32_e32 v6, v6, v13
+; GCN-NEXT: v_or_b32_e32 v7, v7, v15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: .LBB7_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v17
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v14
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_or_b32_e32 v0, v22, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_or_b32_e32 v2, v16, v2
+; GCN-NEXT: v_or_b32_e32 v3, v21, v3
+; GCN-NEXT: v_or_b32_e32 v4, v9, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v13, v6
+; GCN-NEXT: v_or_b32_e32 v7, v15, v7
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 0x30000, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 0x30000, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 0x30000, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i16_to_v8i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB7_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v9, 3
+; VI-NEXT: v_add_u16_e32 v8, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v8, v7
+; VI-NEXT: v_add_u16_e32 v8, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v8, v6
+; VI-NEXT: v_add_u16_e32 v8, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v8, v5
+; VI-NEXT: v_add_u16_e32 v8, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v8, v4
+; VI-NEXT: v_add_u16_e32 v8, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v8, v3
+; VI-NEXT: v_add_u16_e32 v8, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v8, v2
+; VI-NEXT: v_add_u16_e32 v8, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v8, v1
+; VI-NEXT: v_add_u16_e32 v8, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v8, v0
+; VI-NEXT: .LBB7_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i16_to_v8i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i16_to_v8i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB7_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB7_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i16> %a, splat (i16 3)
+ %a2 = bitcast <16 x i16> %a1 to <8 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i16> %a to <8 x i32>
+ br label %end
+
+end:
+ %phi = phi <8 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i32> %phi
+}
+
+define <16 x half> @v_bitcast_v8i32_to_v16f16(<8 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i32_to_v16f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v17, v7
+; GCN-NEXT: v_mov_b32_e32 v18, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v5
+; GCN-NEXT: v_mov_b32_e32 v20, v4
+; GCN-NEXT: v_mov_b32_e32 v21, v3
+; GCN-NEXT: v_mov_b32_e32 v22, v2
+; GCN-NEXT: v_mov_b32_e32 v23, v1
+; GCN-NEXT: v_mov_b32_e32 v16, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_4
+; GCN-NEXT: .LBB8_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB8_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v24, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: .LBB8_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v23
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v16, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v18, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i32_to_v16f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i32_to_v16f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i32_to_v16f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB8_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB8_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i32> %a, splat (i32 3)
+ %a2 = bitcast <8 x i32> %a1 to <16 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i32> %a to <16 x half>
+ br label %end
+
+end:
+ %phi = phi <16 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x half> %phi
+}
+
+define <8 x i32> @v_bitcast_v16f16_to_v8i32(<16 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f16_to_v8i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_4
+; GCN-NEXT: .LBB9_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB9_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v24
+; GCN-NEXT: v_or_b32_e32 v0, v25, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v19, v2
+; GCN-NEXT: v_or_b32_e32 v3, v17, v3
+; GCN-NEXT: v_or_b32_e32 v4, v16, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v9, v6
+; GCN-NEXT: v_or_b32_e32 v7, v8, v7
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: .LBB9_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v11, v12
+; GCN-NEXT: v_or_b32_e32 v6, v9, v13
+; GCN-NEXT: v_or_b32_e32 v7, v8, v10
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f16_to_v8i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB9_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v8, 0x200
+; VI-NEXT: v_add_f16_sdwa v9, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v9
+; VI-NEXT: v_add_f16_sdwa v9, v6, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v9
+; VI-NEXT: v_add_f16_sdwa v9, v5, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v9
+; VI-NEXT: v_add_f16_sdwa v9, v4, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v9
+; VI-NEXT: v_add_f16_sdwa v9, v3, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v9
+; VI-NEXT: v_add_f16_sdwa v9, v2, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v9
+; VI-NEXT: v_add_f16_sdwa v9, v1, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v8, v0, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v9
+; VI-NEXT: v_or_b32_e32 v0, v0, v8
+; VI-NEXT: .LBB9_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f16_to_v8i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f16_to_v8i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB9_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB9_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <16 x half> %a1 to <8 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x half> %a to <8 x i32>
+ br label %end
+
+end:
+ %phi = phi <8 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i32> %phi
+}
+
+define <16 x bfloat> @v_bitcast_v8i32_to_v16bf16(<8 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i32_to_v16bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v23, v7
+; GCN-NEXT: v_mov_b32_e32 v22, v6
+; GCN-NEXT: v_mov_b32_e32 v21, v5
+; GCN-NEXT: v_mov_b32_e32 v20, v4
+; GCN-NEXT: v_mov_b32_e32 v19, v3
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v1
+; GCN-NEXT: v_mov_b32_e32 v16, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_4
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB10_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v23
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v22
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v20
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v19
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v18
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: .LBB10_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v17
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v23
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i32_to_v16bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i32_to_v16bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i32_to_v16bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB10_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB10_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i32> %a, splat (i32 3)
+ %a2 = bitcast <8 x i32> %a1 to <16 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i32> %a to <16 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <16 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x bfloat> %phi
+}
+
+define <8 x i32> @v_bitcast_v16bf16_to_v8i32(<16 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16bf16_to_v8i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v26, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v24, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_4
+; GCN-NEXT: .LBB11_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB11_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v23
+; GCN-NEXT: v_alignbit_b32 v0, v0, v26, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v24, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v2, v20, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v18, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: .LBB11_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v26
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v24
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v20
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v12, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v13, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v9, v8, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16bf16_to_v8i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB11_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v7, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v7
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v6
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v5
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v4
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v3
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v2
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v1
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v0
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v8, 16
+; VI-NEXT: .LBB11_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16bf16_to_v8i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB11_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v7, v7, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v8, s7
+; GFX9-NEXT: .LBB11_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16bf16_to_v8i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB11_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v9 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_bfe_u32 v10, v8, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v15, v6, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_add3_u32 v13, v13, v9, 0x7fff
+; GFX11-NEXT: v_add3_u32 v10, v10, v8, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v10, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v15, v6, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_bfe_u32 v12, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v12, v12, v7, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v7, v12, v14 :: v_dual_lshlrev_b32 v12, 16, v5
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v7, v7, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v6
+; GFX11-NEXT: v_dual_cndmask_b32 v9, v13, v10 :: v_dual_add_f32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v8, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v12, v10, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v11, v14 :: v_dual_lshlrev_b32 v11, 16, v4
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_add3_u32 v8, v8, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v9, 0x7060302
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v11 :: v_dual_add_f32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_add3_u32 v11, v12, v10, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v14, v9, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX11-NEXT: v_lshlrev_b32_e32 v12, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v8, v13, vcc_lo
+; GFX11-NEXT: v_add3_u32 v8, v14, v9, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v12 :: v_dual_lshlrev_b32 v12, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: v_perm_b32 v5, v5, v10, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v10, v4, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v8, v11, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v4
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v10, v10, v4, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v10, v11, vcc_lo
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v11, v13, v9, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v9
+; GFX11-NEXT: v_bfe_u32 v13, v3, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_bfe_u32 v14, v10, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_add3_u32 v13, v14, v10, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v14, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v3, v11, v12 :: v_dual_add_f32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_perm_b32 v4, v4, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v10
+; GFX11-NEXT: v_bfe_u32 v16, v2, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_bfe_u32 v14, v11, 16, 1
+; GFX11-NEXT: v_perm_b32 v3, v3, v9, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add3_u32 v12, v16, v2, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v13, v15 :: v_dual_lshlrev_b32 v15, 16, v0
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v12, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v15
+; GFX11-NEXT: v_add3_u32 v13, v14, v11, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v11
+; GFX11-NEXT: v_bfe_u32 v15, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_bfe_u32 v16, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v12
+; GFX11-NEXT: v_perm_b32 v2, v2, v10, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v13, v14, vcc_lo
+; GFX11-NEXT: v_add3_u32 v14, v15, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v16, v16, v12, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v14, v15 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_perm_b32 v1, v1, v11, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v13, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v16, v17, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v13, v13, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v13, v18, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v12, 0x7060302
+; GFX11-NEXT: .LBB11_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <16 x bfloat> %a1 to <8 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x bfloat> %a to <8 x i32>
+ br label %end
+
+end:
+ %phi = phi <8 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i32> %phi
+}
+
+define <4 x i64> @v_bitcast_v8f32_to_v4i64(<8 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f32_to_v4i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB12_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB12_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f32_to_v4i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f32_to_v4i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f32_to_v4i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <8 x float> %a1 to <4 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x float> %a to <4 x i64>
+ br label %end
+
+end:
+ %phi = phi <4 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i64> %phi
+}
+
+define <8 x float> @v_bitcast_v4i64_to_v8f32(<4 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i64_to_v8f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB13_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB13_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i64_to_v8f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i64_to_v8f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i64_to_v8f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB13_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB13_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i64> %a, splat (i64 3)
+ %a2 = bitcast <4 x i64> %a1 to <8 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i64> %a to <8 x float>
+ br label %end
+
+end:
+ %phi = phi <8 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x float> %phi
+}
+
+define <4 x double> @v_bitcast_v8f32_to_v4f64(<8 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f32_to_v4f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB14_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB14_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f32_to_v4f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f32_to_v4f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f32_to_v4f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <8 x float> %a1 to <4 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x float> %a to <4 x double>
+ br label %end
+
+end:
+ %phi = phi <4 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x double> %phi
+}
+
+define <8 x float> @v_bitcast_v4f64_to_v8f32(<4 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f64_to_v8f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB15_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB15_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f64_to_v8f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB15_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB15_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f64_to_v8f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB15_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB15_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f64_to_v8f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB15_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB15_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <4 x double> %a1 to <8 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x double> %a to <8 x float>
+ br label %end
+
+end:
+ %phi = phi <8 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x float> %phi
+}
+
+define <16 x i16> @v_bitcast_v8f32_to_v16i16(<8 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f32_to_v16i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v14, v7
+; GCN-NEXT: v_mov_b32_e32 v12, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v16, v4
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v4, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB16_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v16, 1.0, v16
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB16_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v8, v16
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f32_to_v16i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f32_to_v16i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f32_to_v16i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <8 x float> %a1 to <16 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x float> %a to <16 x i16>
+ br label %end
+
+end:
+ %phi = phi <16 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i16> %phi
+}
+
+define <8 x float> @v_bitcast_v16i16_to_v8f32(<16 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i16_to_v8f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v20, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v4
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v15
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_4
+; GCN-NEXT: .LBB17_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB17_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v18
+; GCN-NEXT: v_or_b32_e32 v0, v0, v22
+; GCN-NEXT: v_or_b32_e32 v1, v1, v23
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v19
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v14
+; GCN-NEXT: v_or_b32_e32 v2, v2, v16
+; GCN-NEXT: v_or_b32_e32 v3, v3, v21
+; GCN-NEXT: v_or_b32_e32 v4, v4, v9
+; GCN-NEXT: v_or_b32_e32 v5, v5, v11
+; GCN-NEXT: v_or_b32_e32 v6, v6, v13
+; GCN-NEXT: v_or_b32_e32 v7, v7, v15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB17_2
+; GCN-NEXT: .LBB17_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v17
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v14
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_or_b32_e32 v0, v22, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_or_b32_e32 v2, v16, v2
+; GCN-NEXT: v_or_b32_e32 v3, v21, v3
+; GCN-NEXT: v_or_b32_e32 v4, v9, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v13, v6
+; GCN-NEXT: v_or_b32_e32 v7, v15, v7
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 0x30000, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 0x30000, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 0x30000, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i16_to_v8f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB17_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v9, 3
+; VI-NEXT: v_add_u16_e32 v8, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v8, v7
+; VI-NEXT: v_add_u16_e32 v8, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v8, v6
+; VI-NEXT: v_add_u16_e32 v8, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v8, v5
+; VI-NEXT: v_add_u16_e32 v8, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v8, v4
+; VI-NEXT: v_add_u16_e32 v8, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v8, v3
+; VI-NEXT: v_add_u16_e32 v8, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v8, v2
+; VI-NEXT: v_add_u16_e32 v8, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v8, v1
+; VI-NEXT: v_add_u16_e32 v8, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v8, v0
+; VI-NEXT: .LBB17_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i16_to_v8f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i16_to_v8f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB17_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB17_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i16> %a, splat (i16 3)
+ %a2 = bitcast <16 x i16> %a1 to <8 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i16> %a to <8 x float>
+ br label %end
+
+end:
+ %phi = phi <8 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x float> %phi
+}
+
+define <16 x half> @v_bitcast_v8f32_to_v16f16(<8 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f32_to_v16f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v17, v7
+; GCN-NEXT: v_mov_b32_e32 v18, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v5
+; GCN-NEXT: v_mov_b32_e32 v20, v4
+; GCN-NEXT: v_mov_b32_e32 v21, v3
+; GCN-NEXT: v_mov_b32_e32 v22, v2
+; GCN-NEXT: v_mov_b32_e32 v23, v1
+; GCN-NEXT: v_mov_b32_e32 v16, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB18_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB18_4
+; GCN-NEXT: .LBB18_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB18_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v24, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_2
+; GCN-NEXT: .LBB18_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v16
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v23
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v22
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v21
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v20
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v19
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v18
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v16, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v18, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f32_to_v16f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f32_to_v16f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f32_to_v16f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <8 x float> %a1 to <16 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x float> %a to <16 x half>
+ br label %end
+
+end:
+ %phi = phi <16 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x half> %phi
+}
+
+define <8 x float> @v_bitcast_v16f16_to_v8f32(<16 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f16_to_v8f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_4
+; GCN-NEXT: .LBB19_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB19_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v24
+; GCN-NEXT: v_or_b32_e32 v0, v25, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v19, v2
+; GCN-NEXT: v_or_b32_e32 v3, v17, v3
+; GCN-NEXT: v_or_b32_e32 v4, v16, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v9, v6
+; GCN-NEXT: v_or_b32_e32 v7, v8, v7
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB19_2
+; GCN-NEXT: .LBB19_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v11, v12
+; GCN-NEXT: v_or_b32_e32 v6, v9, v13
+; GCN-NEXT: v_or_b32_e32 v7, v8, v10
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f16_to_v8f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB19_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v8, 0x200
+; VI-NEXT: v_add_f16_sdwa v9, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v9
+; VI-NEXT: v_add_f16_sdwa v9, v6, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v9
+; VI-NEXT: v_add_f16_sdwa v9, v5, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v9
+; VI-NEXT: v_add_f16_sdwa v9, v4, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v9
+; VI-NEXT: v_add_f16_sdwa v9, v3, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v9
+; VI-NEXT: v_add_f16_sdwa v9, v2, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v9
+; VI-NEXT: v_add_f16_sdwa v9, v1, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v8, v0, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v9
+; VI-NEXT: v_or_b32_e32 v0, v0, v8
+; VI-NEXT: .LBB19_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f16_to_v8f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f16_to_v8f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB19_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB19_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <16 x half> %a1 to <8 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x half> %a to <8 x float>
+ br label %end
+
+end:
+ %phi = phi <8 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x float> %phi
+}
+
+define <16 x bfloat> @v_bitcast_v8f32_to_v16bf16(<8 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f32_to_v16bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v23, v7
+; GCN-NEXT: v_mov_b32_e32 v22, v6
+; GCN-NEXT: v_mov_b32_e32 v21, v5
+; GCN-NEXT: v_mov_b32_e32 v20, v4
+; GCN-NEXT: v_mov_b32_e32 v19, v3
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v1
+; GCN-NEXT: v_mov_b32_e32 v16, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB20_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB20_4
+; GCN-NEXT: .LBB20_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB20_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v23
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v22
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v20
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v19
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v18
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB20_2
+; GCN-NEXT: .LBB20_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v16
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v17
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v18
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v19
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v20
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v21
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v22
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v23
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f32_to_v16bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f32_to_v16bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f32_to_v16bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <8 x float> %a1 to <16 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x float> %a to <16 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <16 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x bfloat> %phi
+}
+
+define <8 x float> @v_bitcast_v16bf16_to_v8f32(<16 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16bf16_to_v8f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v26, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v24, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB21_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB21_4
+; GCN-NEXT: .LBB21_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB21_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v23
+; GCN-NEXT: v_alignbit_b32 v0, v0, v26, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v24, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v2, v20, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v18, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB21_2
+; GCN-NEXT: .LBB21_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v26
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v24
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v20
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v12, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v13, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v9, v8, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16bf16_to_v8f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB21_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v7, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v7
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v6
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v5
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v4
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v3
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v2
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v1
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v0
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v8, 16
+; VI-NEXT: .LBB21_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16bf16_to_v8f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB21_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v7, v7, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v8, s7
+; GFX9-NEXT: .LBB21_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16bf16_to_v8f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB21_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v9 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_bfe_u32 v10, v8, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v15, v6, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_add3_u32 v13, v13, v9, 0x7fff
+; GFX11-NEXT: v_add3_u32 v10, v10, v8, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v10, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v15, v6, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_bfe_u32 v12, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v12, v12, v7, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v7, v12, v14 :: v_dual_lshlrev_b32 v12, 16, v5
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v7, v7, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v6
+; GFX11-NEXT: v_dual_cndmask_b32 v9, v13, v10 :: v_dual_add_f32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v8, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v12, v10, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v11, v14 :: v_dual_lshlrev_b32 v11, 16, v4
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_add3_u32 v8, v8, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v9, 0x7060302
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v11 :: v_dual_add_f32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_add3_u32 v11, v12, v10, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v14, v9, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX11-NEXT: v_lshlrev_b32_e32 v12, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v8, v13, vcc_lo
+; GFX11-NEXT: v_add3_u32 v8, v14, v9, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v12 :: v_dual_lshlrev_b32 v12, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: v_perm_b32 v5, v5, v10, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v10, v4, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v8, v11, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v4
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v10, v10, v4, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v10, v11, vcc_lo
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v11, v13, v9, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v9
+; GFX11-NEXT: v_bfe_u32 v13, v3, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_bfe_u32 v14, v10, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_add3_u32 v13, v14, v10, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v14, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v3, v11, v12 :: v_dual_add_f32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_perm_b32 v4, v4, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v10
+; GFX11-NEXT: v_bfe_u32 v16, v2, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_bfe_u32 v14, v11, 16, 1
+; GFX11-NEXT: v_perm_b32 v3, v3, v9, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add3_u32 v12, v16, v2, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v13, v15 :: v_dual_lshlrev_b32 v15, 16, v0
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v12, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v15
+; GFX11-NEXT: v_add3_u32 v13, v14, v11, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v11
+; GFX11-NEXT: v_bfe_u32 v15, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_bfe_u32 v16, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v12
+; GFX11-NEXT: v_perm_b32 v2, v2, v10, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v13, v14, vcc_lo
+; GFX11-NEXT: v_add3_u32 v14, v15, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v16, v16, v12, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v14, v15 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_perm_b32 v1, v1, v11, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v13, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v16, v17, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v13, v13, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v13, v18, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v12, 0x7060302
+; GFX11-NEXT: .LBB21_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <16 x bfloat> %a1 to <8 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x bfloat> %a to <8 x float>
+ br label %end
+
+end:
+ %phi = phi <8 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x float> %phi
+}
+
+define <4 x double> @v_bitcast_v4i64_to_v4f64(<4 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i64_to_v4f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB22_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: .LBB22_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i64_to_v4f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i64_to_v4f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i64_to_v4f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB22_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: .LBB22_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i64> %a, splat (i64 3)
+ %a2 = bitcast <4 x i64> %a1 to <4 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i64> %a to <4 x double>
+ br label %end
+
+end:
+ %phi = phi <4 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x double> %phi
+}
+
+define <4 x i64> @v_bitcast_v4f64_to_v4i64(<4 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f64_to_v4i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB23_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: .LBB23_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f64_to_v4i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB23_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: .LBB23_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f64_to_v4i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB23_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: .LBB23_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f64_to_v4i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB23_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: .LBB23_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <4 x double> %a1 to <4 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x double> %a to <4 x i64>
+ br label %end
+
+end:
+ %phi = phi <4 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i64> %phi
+}
+
+define <16 x i16> @v_bitcast_v4i64_to_v16i16(<4 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i64_to_v16i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v14, v7
+; GCN-NEXT: v_mov_b32_e32 v12, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v16, v4
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v4, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB24_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v2, vcc, 0, v2, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v6, vcc, 0, v6, vcc
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v16
+; GCN-NEXT: v_addc_u32_e32 v10, vcc, 0, v10, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v14, vcc, 0, v14, vcc
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB24_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v8, v16
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i64_to_v16i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i64_to_v16i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i64_to_v16i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB24_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB24_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i64> %a, splat (i64 3)
+ %a2 = bitcast <4 x i64> %a1 to <16 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i64> %a to <16 x i16>
+ br label %end
+
+end:
+ %phi = phi <16 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i16> %phi
+}
+
+define <4 x i64> @v_bitcast_v16i16_to_v4i64(<16 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i16_to_v4i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v20, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v4
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v15
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_4
+; GCN-NEXT: .LBB25_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB25_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v18
+; GCN-NEXT: v_or_b32_e32 v0, v0, v22
+; GCN-NEXT: v_or_b32_e32 v1, v1, v23
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v19
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v14
+; GCN-NEXT: v_or_b32_e32 v2, v2, v16
+; GCN-NEXT: v_or_b32_e32 v3, v3, v21
+; GCN-NEXT: v_or_b32_e32 v4, v4, v9
+; GCN-NEXT: v_or_b32_e32 v5, v5, v11
+; GCN-NEXT: v_or_b32_e32 v6, v6, v13
+; GCN-NEXT: v_or_b32_e32 v7, v7, v15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB25_2
+; GCN-NEXT: .LBB25_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v17
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v14
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_or_b32_e32 v0, v22, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_or_b32_e32 v2, v16, v2
+; GCN-NEXT: v_or_b32_e32 v3, v21, v3
+; GCN-NEXT: v_or_b32_e32 v4, v9, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v13, v6
+; GCN-NEXT: v_or_b32_e32 v7, v15, v7
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 0x30000, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 0x30000, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 0x30000, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i16_to_v4i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB25_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v9, 3
+; VI-NEXT: v_add_u16_e32 v8, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v8, v7
+; VI-NEXT: v_add_u16_e32 v8, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v8, v6
+; VI-NEXT: v_add_u16_e32 v8, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v8, v5
+; VI-NEXT: v_add_u16_e32 v8, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v8, v4
+; VI-NEXT: v_add_u16_e32 v8, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v8, v3
+; VI-NEXT: v_add_u16_e32 v8, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v8, v2
+; VI-NEXT: v_add_u16_e32 v8, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v8, v1
+; VI-NEXT: v_add_u16_e32 v8, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v8, v0
+; VI-NEXT: .LBB25_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i16_to_v4i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i16_to_v4i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB25_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB25_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i16> %a, splat (i16 3)
+ %a2 = bitcast <16 x i16> %a1 to <4 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i16> %a to <4 x i64>
+ br label %end
+
+end:
+ %phi = phi <4 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i64> %phi
+}
+
+define <16 x half> @v_bitcast_v4i64_to_v16f16(<4 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i64_to_v16f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v18, v7
+; GCN-NEXT: v_mov_b32_e32 v17, v6
+; GCN-NEXT: v_mov_b32_e32 v20, v5
+; GCN-NEXT: v_mov_b32_e32 v19, v4
+; GCN-NEXT: v_mov_b32_e32 v22, v3
+; GCN-NEXT: v_mov_b32_e32 v21, v2
+; GCN-NEXT: v_mov_b32_e32 v23, v1
+; GCN-NEXT: v_mov_b32_e32 v16, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB26_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB26_4
+; GCN-NEXT: .LBB26_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB26_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v24, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB26_2
+; GCN-NEXT: .LBB26_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v16
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v23, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v21
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v22, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v19
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v20, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v17
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v18, vcc
+; GCN-NEXT: v_lshrrev_b32_e32 v16, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v18, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i64_to_v16f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i64_to_v16f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i64_to_v16f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB26_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB26_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i64> %a, splat (i64 3)
+ %a2 = bitcast <4 x i64> %a1 to <16 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i64> %a to <16 x half>
+ br label %end
+
+end:
+ %phi = phi <16 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x half> %phi
+}
+
+define <4 x i64> @v_bitcast_v16f16_to_v4i64(<16 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f16_to_v4i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB27_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB27_4
+; GCN-NEXT: .LBB27_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB27_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v24
+; GCN-NEXT: v_or_b32_e32 v0, v25, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v19, v2
+; GCN-NEXT: v_or_b32_e32 v3, v17, v3
+; GCN-NEXT: v_or_b32_e32 v4, v16, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v9, v6
+; GCN-NEXT: v_or_b32_e32 v7, v8, v7
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB27_2
+; GCN-NEXT: .LBB27_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v11, v12
+; GCN-NEXT: v_or_b32_e32 v6, v9, v13
+; GCN-NEXT: v_or_b32_e32 v7, v8, v10
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f16_to_v4i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB27_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v8, 0x200
+; VI-NEXT: v_add_f16_sdwa v9, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v9
+; VI-NEXT: v_add_f16_sdwa v9, v6, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v9
+; VI-NEXT: v_add_f16_sdwa v9, v5, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v9
+; VI-NEXT: v_add_f16_sdwa v9, v4, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v9
+; VI-NEXT: v_add_f16_sdwa v9, v3, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v9
+; VI-NEXT: v_add_f16_sdwa v9, v2, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v9
+; VI-NEXT: v_add_f16_sdwa v9, v1, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v8, v0, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v9
+; VI-NEXT: v_or_b32_e32 v0, v0, v8
+; VI-NEXT: .LBB27_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f16_to_v4i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f16_to_v4i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB27_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB27_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <16 x half> %a1 to <4 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x half> %a to <4 x i64>
+ br label %end
+
+end:
+ %phi = phi <4 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i64> %phi
+}
+
+define <16 x bfloat> @v_bitcast_v4i64_to_v16bf16(<4 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i64_to_v16bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v23, v7
+; GCN-NEXT: v_mov_b32_e32 v22, v6
+; GCN-NEXT: v_mov_b32_e32 v21, v5
+; GCN-NEXT: v_mov_b32_e32 v20, v4
+; GCN-NEXT: v_mov_b32_e32 v19, v3
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v1
+; GCN-NEXT: v_mov_b32_e32 v16, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_4
+; GCN-NEXT: .LBB28_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB28_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v23
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v22
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v20
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v19
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v18
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v16
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB28_2
+; GCN-NEXT: .LBB28_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v16
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v17, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v18
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v19, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v20
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v21, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v22
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v23, vcc
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i64_to_v16bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i64_to_v16bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i64_to_v16bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB28_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB28_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i64> %a, splat (i64 3)
+ %a2 = bitcast <4 x i64> %a1 to <16 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i64> %a to <16 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <16 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x bfloat> %phi
+}
+
+define <4 x i64> @v_bitcast_v16bf16_to_v4i64(<16 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16bf16_to_v4i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v26, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v24, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB29_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB29_4
+; GCN-NEXT: .LBB29_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB29_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v23
+; GCN-NEXT: v_alignbit_b32 v0, v0, v26, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v24, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v2, v20, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v18, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB29_2
+; GCN-NEXT: .LBB29_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v26
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v24
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v20
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v12, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v13, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v9, v8, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16bf16_to_v4i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB29_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v7, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v7
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v6
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v5
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v4
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v3
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v2
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v1
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v0
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v8, 16
+; VI-NEXT: .LBB29_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16bf16_to_v4i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB29_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v7, v7, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v8, s7
+; GFX9-NEXT: .LBB29_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16bf16_to_v4i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB29_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v9 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_bfe_u32 v10, v8, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v15, v6, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_add3_u32 v13, v13, v9, 0x7fff
+; GFX11-NEXT: v_add3_u32 v10, v10, v8, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v10, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v15, v6, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_bfe_u32 v12, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v12, v12, v7, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v7, v12, v14 :: v_dual_lshlrev_b32 v12, 16, v5
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v7, v7, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v6
+; GFX11-NEXT: v_dual_cndmask_b32 v9, v13, v10 :: v_dual_add_f32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v8, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v12, v10, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v11, v14 :: v_dual_lshlrev_b32 v11, 16, v4
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_add3_u32 v8, v8, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v9, 0x7060302
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v11 :: v_dual_add_f32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_add3_u32 v11, v12, v10, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v14, v9, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX11-NEXT: v_lshlrev_b32_e32 v12, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v8, v13, vcc_lo
+; GFX11-NEXT: v_add3_u32 v8, v14, v9, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v12 :: v_dual_lshlrev_b32 v12, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: v_perm_b32 v5, v5, v10, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v10, v4, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v8, v11, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v4
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v10, v10, v4, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v10, v11, vcc_lo
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v11, v13, v9, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v9
+; GFX11-NEXT: v_bfe_u32 v13, v3, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_bfe_u32 v14, v10, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_add3_u32 v13, v14, v10, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v14, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v3, v11, v12 :: v_dual_add_f32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_perm_b32 v4, v4, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v10
+; GFX11-NEXT: v_bfe_u32 v16, v2, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_bfe_u32 v14, v11, 16, 1
+; GFX11-NEXT: v_perm_b32 v3, v3, v9, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add3_u32 v12, v16, v2, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v13, v15 :: v_dual_lshlrev_b32 v15, 16, v0
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v12, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v15
+; GFX11-NEXT: v_add3_u32 v13, v14, v11, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v11
+; GFX11-NEXT: v_bfe_u32 v15, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_bfe_u32 v16, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v12
+; GFX11-NEXT: v_perm_b32 v2, v2, v10, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v13, v14, vcc_lo
+; GFX11-NEXT: v_add3_u32 v14, v15, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v16, v16, v12, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v14, v15 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_perm_b32 v1, v1, v11, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v13, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v16, v17, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v13, v13, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v13, v18, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v12, 0x7060302
+; GFX11-NEXT: .LBB29_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <16 x bfloat> %a1 to <4 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x bfloat> %a to <4 x i64>
+ br label %end
+
+end:
+ %phi = phi <4 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i64> %phi
+}
+
+define <16 x i16> @v_bitcast_v4f64_to_v16i16(<4 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f64_to_v16i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v17, v7
+; GCN-NEXT: v_mov_b32_e32 v16, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v5
+; GCN-NEXT: v_mov_b32_e32 v18, v4
+; GCN-NEXT: v_mov_b32_e32 v21, v3
+; GCN-NEXT: v_mov_b32_e32 v20, v2
+; GCN-NEXT: v_mov_b32_e32 v23, v1
+; GCN-NEXT: v_mov_b32_e32 v22, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v13, v17, v16, 16
+; GCN-NEXT: v_alignbit_b32 v9, v19, v18, 16
+; GCN-NEXT: v_alignbit_b32 v5, v21, v20, 16
+; GCN-NEXT: v_alignbit_b32 v1, v23, v22, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v23
+; GCN-NEXT: .LBB30_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[22:23], v[22:23], 1.0
+; GCN-NEXT: v_add_f64 v[20:21], v[20:21], 1.0
+; GCN-NEXT: v_add_f64 v[18:19], v[18:19], 1.0
+; GCN-NEXT: v_add_f64 v[16:17], v[16:17], 1.0
+; GCN-NEXT: v_alignbit_b32 v13, v17, v16, 16
+; GCN-NEXT: v_alignbit_b32 v9, v19, v18, 16
+; GCN-NEXT: v_alignbit_b32 v5, v21, v20, 16
+; GCN-NEXT: v_alignbit_b32 v1, v23, v22, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v23
+; GCN-NEXT: .LBB30_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v22
+; GCN-NEXT: v_mov_b32_e32 v2, v23
+; GCN-NEXT: v_mov_b32_e32 v4, v20
+; GCN-NEXT: v_mov_b32_e32 v6, v21
+; GCN-NEXT: v_mov_b32_e32 v8, v18
+; GCN-NEXT: v_mov_b32_e32 v10, v19
+; GCN-NEXT: v_mov_b32_e32 v12, v16
+; GCN-NEXT: v_mov_b32_e32 v14, v17
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f64_to_v16i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB30_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB30_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f64_to_v16i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB30_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB30_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f64_to_v16i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB30_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB30_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <4 x double> %a1 to <16 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x double> %a to <16 x i16>
+ br label %end
+
+end:
+ %phi = phi <16 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i16> %phi
+}
+
+define <4 x double> @v_bitcast_v16i16_to_v4f64(<16 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i16_to_v4f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v20, v6
+; GCN-NEXT: v_mov_b32_e32 v19, v4
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v15
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_4
+; GCN-NEXT: .LBB31_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB31_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v17
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v18
+; GCN-NEXT: v_or_b32_e32 v0, v0, v22
+; GCN-NEXT: v_or_b32_e32 v1, v1, v23
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v19
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v14
+; GCN-NEXT: v_or_b32_e32 v2, v2, v16
+; GCN-NEXT: v_or_b32_e32 v3, v3, v21
+; GCN-NEXT: v_or_b32_e32 v4, v4, v9
+; GCN-NEXT: v_or_b32_e32 v5, v5, v11
+; GCN-NEXT: v_or_b32_e32 v6, v6, v13
+; GCN-NEXT: v_or_b32_e32 v7, v7, v15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB31_2
+; GCN-NEXT: .LBB31_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v17
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v14
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_or_b32_e32 v0, v22, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_or_b32_e32 v2, v16, v2
+; GCN-NEXT: v_or_b32_e32 v3, v21, v3
+; GCN-NEXT: v_or_b32_e32 v4, v9, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v13, v6
+; GCN-NEXT: v_or_b32_e32 v7, v15, v7
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 0x30000, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 0x30000, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 0x30000, v7
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i16_to_v4f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB31_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v9, 3
+; VI-NEXT: v_add_u16_e32 v8, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v8, v7
+; VI-NEXT: v_add_u16_e32 v8, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v8, v6
+; VI-NEXT: v_add_u16_e32 v8, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v8, v5
+; VI-NEXT: v_add_u16_e32 v8, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v8, v4
+; VI-NEXT: v_add_u16_e32 v8, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v8, v3
+; VI-NEXT: v_add_u16_e32 v8, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v8, v2
+; VI-NEXT: v_add_u16_e32 v8, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v8, v1
+; VI-NEXT: v_add_u16_e32 v8, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v8, v0
+; VI-NEXT: .LBB31_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i16_to_v4f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i16_to_v4f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB31_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB31_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i16> %a, splat (i16 3)
+ %a2 = bitcast <16 x i16> %a1 to <4 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i16> %a to <4 x double>
+ br label %end
+
+end:
+ %phi = phi <4 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x double> %phi
+}
+
+define <16 x half> @v_bitcast_v4f64_to_v16f16(<4 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f64_to_v16f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v24, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: .LBB32_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: .LBB32_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v19
+; GCN-NEXT: v_mov_b32_e32 v1, v23
+; GCN-NEXT: v_mov_b32_e32 v2, v16
+; GCN-NEXT: v_mov_b32_e32 v3, v22
+; GCN-NEXT: v_mov_b32_e32 v4, v17
+; GCN-NEXT: v_mov_b32_e32 v5, v21
+; GCN-NEXT: v_mov_b32_e32 v6, v18
+; GCN-NEXT: v_mov_b32_e32 v7, v20
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f64_to_v16f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB32_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB32_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f64_to_v16f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB32_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB32_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f64_to_v16f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB32_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB32_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <4 x double> %a1 to <16 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x double> %a to <16 x half>
+ br label %end
+
+end:
+ %phi = phi <16 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x half> %phi
+}
+
+define <4 x double> @v_bitcast_v16f16_to_v4f64(<16 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f16_to_v4f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB33_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB33_4
+; GCN-NEXT: .LBB33_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB33_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v24
+; GCN-NEXT: v_or_b32_e32 v0, v25, v0
+; GCN-NEXT: v_or_b32_e32 v1, v23, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v19, v2
+; GCN-NEXT: v_or_b32_e32 v3, v17, v3
+; GCN-NEXT: v_or_b32_e32 v4, v16, v4
+; GCN-NEXT: v_or_b32_e32 v5, v11, v5
+; GCN-NEXT: v_or_b32_e32 v6, v9, v6
+; GCN-NEXT: v_or_b32_e32 v7, v8, v7
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB33_2
+; GCN-NEXT: .LBB33_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v11, v12
+; GCN-NEXT: v_or_b32_e32 v6, v9, v13
+; GCN-NEXT: v_or_b32_e32 v7, v8, v10
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f16_to_v4f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB33_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v8, 0x200
+; VI-NEXT: v_add_f16_sdwa v9, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v9
+; VI-NEXT: v_add_f16_sdwa v9, v6, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v9
+; VI-NEXT: v_add_f16_sdwa v9, v5, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v9
+; VI-NEXT: v_add_f16_sdwa v9, v4, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v9
+; VI-NEXT: v_add_f16_sdwa v9, v3, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v9
+; VI-NEXT: v_add_f16_sdwa v9, v2, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v9
+; VI-NEXT: v_add_f16_sdwa v9, v1, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v8, v0, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v9
+; VI-NEXT: v_or_b32_e32 v0, v0, v8
+; VI-NEXT: .LBB33_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f16_to_v4f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f16_to_v4f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB33_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB33_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <16 x half> %a1 to <4 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x half> %a to <4 x double>
+ br label %end
+
+end:
+ %phi = phi <4 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x double> %phi
+}
+
+define <16 x bfloat> @v_bitcast_v4f64_to_v16bf16(<4 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f64_to_v16bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v3
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v2
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v1
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: .LBB34_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v3
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v2
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v1
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v0
+; GCN-NEXT: .LBB34_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v23
+; GCN-NEXT: v_mov_b32_e32 v1, v22
+; GCN-NEXT: v_mov_b32_e32 v2, v21
+; GCN-NEXT: v_mov_b32_e32 v3, v20
+; GCN-NEXT: v_mov_b32_e32 v4, v19
+; GCN-NEXT: v_mov_b32_e32 v5, v18
+; GCN-NEXT: v_mov_b32_e32 v6, v17
+; GCN-NEXT: v_mov_b32_e32 v7, v16
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f64_to_v16bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB34_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB34_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f64_to_v16bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB34_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB34_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f64_to_v16bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB34_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB34_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <4 x double> %a1 to <16 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x double> %a to <16 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <16 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x bfloat> %phi
+}
+
+define <4 x double> @v_bitcast_v16bf16_to_v4f64(<16 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16bf16_to_v4f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v26, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v24, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v11, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v10, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v9, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v8, 1.0, v14
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB35_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB35_4
+; GCN-NEXT: .LBB35_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB35_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v23
+; GCN-NEXT: v_alignbit_b32 v0, v0, v26, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v24, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v2, v20, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v18, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB35_2
+; GCN-NEXT: .LBB35_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v26
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v24
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v23
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v20
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v12, v11, 16
+; GCN-NEXT: v_alignbit_b32 v6, v13, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v9, v8, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16bf16_to_v4f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB35_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v7, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v7
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v6
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v5
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v4
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v3
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v2
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v1
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v8, 16
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v0
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v8, 16
+; VI-NEXT: .LBB35_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16bf16_to_v4f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB35_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v9, v10, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v7, v7, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v8, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v8, s7
+; GFX9-NEXT: .LBB35_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16bf16_to_v4f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB35_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v8, 16, v7
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v9 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_bfe_u32 v10, v8, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v15, v6, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_add3_u32 v13, v13, v9, 0x7fff
+; GFX11-NEXT: v_add3_u32 v10, v10, v8, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v10, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v15, v6, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_bfe_u32 v12, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v12, v12, v7, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v7, v12, v14 :: v_dual_lshlrev_b32 v12, 16, v5
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v7, v7, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v6
+; GFX11-NEXT: v_dual_cndmask_b32 v9, v13, v10 :: v_dual_add_f32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v8, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v5
+; GFX11-NEXT: v_bfe_u32 v12, v10, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v11, v14 :: v_dual_lshlrev_b32 v11, 16, v4
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_add3_u32 v8, v8, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v9, 0x7060302
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v11 :: v_dual_add_f32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_add3_u32 v11, v12, v10, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v14, v9, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX11-NEXT: v_lshlrev_b32_e32 v12, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v8, v13, vcc_lo
+; GFX11-NEXT: v_add3_u32 v8, v14, v9, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v12 :: v_dual_lshlrev_b32 v12, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: v_perm_b32 v5, v5, v10, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v10, v4, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v8, v11, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v4
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v10, v10, v4, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v10, v11, vcc_lo
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v11, v13, v9, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v9
+; GFX11-NEXT: v_bfe_u32 v13, v3, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_bfe_u32 v14, v10, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_add3_u32 v13, v14, v10, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v14, 16, v1
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v3, v11, v12 :: v_dual_add_f32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_perm_b32 v4, v4, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v10
+; GFX11-NEXT: v_bfe_u32 v16, v2, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_bfe_u32 v14, v11, 16, 1
+; GFX11-NEXT: v_perm_b32 v3, v3, v9, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add3_u32 v12, v16, v2, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v13, v15 :: v_dual_lshlrev_b32 v15, 16, v0
+; GFX11-NEXT: v_or_b32_e32 v13, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v12, v13, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v15
+; GFX11-NEXT: v_add3_u32 v13, v14, v11, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v11
+; GFX11-NEXT: v_bfe_u32 v15, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_bfe_u32 v16, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v12
+; GFX11-NEXT: v_perm_b32 v2, v2, v10, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v13, v14, vcc_lo
+; GFX11-NEXT: v_add3_u32 v14, v15, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v16, v16, v12, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v14, v15 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_perm_b32 v1, v1, v11, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v13, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v16, v17, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v13, v13, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v13, v18, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v12, 0x7060302
+; GFX11-NEXT: .LBB35_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <16 x bfloat> %a1 to <4 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x bfloat> %a to <4 x double>
+ br label %end
+
+end:
+ %phi = phi <4 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x double> %phi
+}
+
+define <16 x half> @v_bitcast_v16i16_to_v16f16(<16 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i16_to_v16f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v31, v15
+; GCN-NEXT: v_mov_b32_e32 v30, v14
+; GCN-NEXT: v_mov_b32_e32 v29, v13
+; GCN-NEXT: v_mov_b32_e32 v28, v12
+; GCN-NEXT: v_mov_b32_e32 v27, v11
+; GCN-NEXT: v_mov_b32_e32 v26, v10
+; GCN-NEXT: v_mov_b32_e32 v25, v9
+; GCN-NEXT: v_mov_b32_e32 v24, v8
+; GCN-NEXT: v_mov_b32_e32 v23, v7
+; GCN-NEXT: v_mov_b32_e32 v22, v6
+; GCN-NEXT: v_mov_b32_e32 v21, v5
+; GCN-NEXT: v_mov_b32_e32 v20, v4
+; GCN-NEXT: v_mov_b32_e32 v19, v3
+; GCN-NEXT: v_mov_b32_e32 v18, v2
+; GCN-NEXT: v_mov_b32_e32 v17, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB36_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB36_4
+; GCN-NEXT: .LBB36_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB36_3: ; %cmp.false
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v28
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v30
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v31
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB36_2
+; GCN-NEXT: .LBB36_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v31
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v30
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v29
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v27
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v25
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v23
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v17
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i16_to_v16f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB36_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v8, 3
+; VI-NEXT: v_add_u16_sdwa v9, v0, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v10, v1, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v11, v2, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v12, v3, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v13, v4, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v14, v5, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v15, v6, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v8, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v7, 3, v7
+; VI-NEXT: v_add_u16_e32 v6, 3, v6
+; VI-NEXT: v_add_u16_e32 v5, 3, v5
+; VI-NEXT: v_add_u16_e32 v4, 3, v4
+; VI-NEXT: v_add_u16_e32 v3, 3, v3
+; VI-NEXT: v_add_u16_e32 v2, 3, v2
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v7, v7, v8
+; VI-NEXT: v_or_b32_e32 v6, v6, v15
+; VI-NEXT: v_or_b32_e32 v5, v5, v14
+; VI-NEXT: v_or_b32_e32 v4, v4, v13
+; VI-NEXT: v_or_b32_e32 v3, v3, v12
+; VI-NEXT: v_or_b32_e32 v2, v2, v11
+; VI-NEXT: v_or_b32_e32 v1, v1, v10
+; VI-NEXT: v_or_b32_e32 v0, v0, v9
+; VI-NEXT: .LBB36_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i16_to_v16f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i16_to_v16f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB36_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB36_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i16> %a, splat (i16 3)
+ %a2 = bitcast <16 x i16> %a1 to <16 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i16> %a to <16 x half>
+ br label %end
+
+end:
+ %phi = phi <16 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x half> %phi
+}
+
+define <16 x i16> @v_bitcast_v16f16_to_v16i16(<16 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f16_to_v16i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB37_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v0, v1
+; GCN-NEXT: v_or_b32_e32 v4, v4, v5
+; GCN-NEXT: v_or_b32_e32 v8, v8, v9
+; GCN-NEXT: v_or_b32_e32 v12, v12, v13
+; GCN-NEXT: v_or_b32_e32 v14, v14, v16
+; GCN-NEXT: v_or_b32_e32 v10, v10, v17
+; GCN-NEXT: v_or_b32_e32 v6, v6, v18
+; GCN-NEXT: v_or_b32_e32 v2, v2, v19
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v5, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v9, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v13, 16
+; GCN-NEXT: .LBB37_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f16_to_v16i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB37_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v9, 0x200
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v10, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v11, 0x200, v2
+; VI-NEXT: v_add_f16_sdwa v2, v2, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v12, 0x200, v3
+; VI-NEXT: v_add_f16_sdwa v3, v3, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v13, 0x200, v4
+; VI-NEXT: v_add_f16_sdwa v4, v4, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v14, 0x200, v5
+; VI-NEXT: v_add_f16_sdwa v5, v5, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v15, 0x200, v6
+; VI-NEXT: v_add_f16_sdwa v6, v6, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v16, 0x200, v7
+; VI-NEXT: v_add_f16_sdwa v7, v7, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v16, v7
+; VI-NEXT: v_or_b32_e32 v6, v15, v6
+; VI-NEXT: v_or_b32_e32 v5, v14, v5
+; VI-NEXT: v_or_b32_e32 v4, v13, v4
+; VI-NEXT: v_or_b32_e32 v3, v12, v3
+; VI-NEXT: v_or_b32_e32 v2, v11, v2
+; VI-NEXT: v_or_b32_e32 v1, v10, v1
+; VI-NEXT: v_or_b32_e32 v0, v8, v0
+; VI-NEXT: .LBB37_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f16_to_v16i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f16_to_v16i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB37_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB37_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <16 x half> %a1 to <16 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x half> %a to <16 x i16>
+ br label %end
+
+end:
+ %phi = phi <16 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i16> %phi
+}
+
+define <16 x bfloat> @v_bitcast_v16i16_to_v16bf16(<16 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i16_to_v16bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v23, v14
+; GCN-NEXT: v_mov_b32_e32 v22, v12
+; GCN-NEXT: v_mov_b32_e32 v21, v10
+; GCN-NEXT: v_mov_b32_e32 v20, v8
+; GCN-NEXT: v_mov_b32_e32 v19, v6
+; GCN-NEXT: v_mov_b32_e32 v18, v4
+; GCN-NEXT: v_mov_b32_e32 v17, v2
+; GCN-NEXT: v_mov_b32_e32 v24, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v15
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_4
+; GCN-NEXT: .LBB38_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB38_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB38_2
+; GCN-NEXT: .LBB38_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v23
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v21
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v19
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v17
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v24
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v14
+; GCN-NEXT: v_or_b32_e32 v0, v15, v0
+; GCN-NEXT: v_or_b32_e32 v2, v13, v2
+; GCN-NEXT: v_or_b32_e32 v4, v11, v4
+; GCN-NEXT: v_or_b32_e32 v6, v9, v6
+; GCN-NEXT: v_or_b32_e32 v7, v7, v8
+; GCN-NEXT: v_or_b32_e32 v5, v5, v10
+; GCN-NEXT: v_or_b32_e32 v3, v3, v12
+; GCN-NEXT: v_or_b32_e32 v1, v1, v14
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v12, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v10, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v8, vcc, s6, v6
+; GCN-NEXT: v_add_i32_e32 v6, vcc, s6, v7
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v5
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, s6, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v8
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v14
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i16_to_v16bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB38_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v8, 3
+; VI-NEXT: v_add_u16_sdwa v9, v0, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v10, v1, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v11, v2, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v12, v3, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v13, v4, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v14, v5, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v15, v6, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v8, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v7, 3, v7
+; VI-NEXT: v_add_u16_e32 v6, 3, v6
+; VI-NEXT: v_add_u16_e32 v5, 3, v5
+; VI-NEXT: v_add_u16_e32 v4, 3, v4
+; VI-NEXT: v_add_u16_e32 v3, 3, v3
+; VI-NEXT: v_add_u16_e32 v2, 3, v2
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v7, v7, v8
+; VI-NEXT: v_or_b32_e32 v6, v6, v15
+; VI-NEXT: v_or_b32_e32 v5, v5, v14
+; VI-NEXT: v_or_b32_e32 v4, v4, v13
+; VI-NEXT: v_or_b32_e32 v3, v3, v12
+; VI-NEXT: v_or_b32_e32 v2, v2, v11
+; VI-NEXT: v_or_b32_e32 v1, v1, v10
+; VI-NEXT: v_or_b32_e32 v0, v0, v9
+; VI-NEXT: .LBB38_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i16_to_v16bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i16_to_v16bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB38_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB38_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i16> %a, splat (i16 3)
+ %a2 = bitcast <16 x i16> %a1 to <16 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i16> %a to <16 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <16 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x bfloat> %phi
+}
+
+define <16 x i16> @v_bitcast_v16bf16_to_v16i16(<16 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16bf16_to_v16i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_mul_f32_e32 v31, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v30, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v29, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v28, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v27, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v26, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v24, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v15
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB39_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB39_4
+; GCN-NEXT: .LBB39_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB39_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v31
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v16
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v27
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v24
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v22
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB39_2
+; GCN-NEXT: .LBB39_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v31
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v30
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v29
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v28
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v27
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v26
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v24
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v23
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v20
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v16
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v7
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v8
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v10
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v12
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GCN-NEXT: v_alignbit_b32 v0, v14, v0, 16
+; GCN-NEXT: v_alignbit_b32 v4, v19, v2, 16
+; GCN-NEXT: v_alignbit_b32 v8, v20, v16, 16
+; GCN-NEXT: v_alignbit_b32 v12, v21, v5, 16
+; GCN-NEXT: v_alignbit_b32 v14, v15, v17, 16
+; GCN-NEXT: v_alignbit_b32 v10, v11, v9, 16
+; GCN-NEXT: v_alignbit_b32 v6, v7, v18, 16
+; GCN-NEXT: v_alignbit_b32 v2, v3, v13, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v24, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v23, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v22, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16bf16_to_v16i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB39_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v0
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_bfe_u32 v10, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v10, vcc, v10, v9
+; VI-NEXT: v_add_u32_e32 v10, vcc, s6, v10
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v9, v10, v11, vcc
+; VI-NEXT: v_bfe_u32 v10, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v10, vcc, v10, v1
+; VI-NEXT: v_add_u32_e32 v10, vcc, s6, v10
+; VI-NEXT: v_or_b32_e32 v11, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v10, v11, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v10, 16, v2
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_bfe_u32 v11, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v11, vcc, v11, v10
+; VI-NEXT: v_add_u32_e32 v11, vcc, s6, v11
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc
+; VI-NEXT: v_bfe_u32 v11, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v11, vcc, v11, v2
+; VI-NEXT: v_add_u32_e32 v11, vcc, s6, v11
+; VI-NEXT: v_or_b32_e32 v12, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v11, v12, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_bfe_u32 v12, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v12, vcc, v12, v11
+; VI-NEXT: v_add_u32_e32 v12, vcc, s6, v12
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v13, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v11, v12, v13, vcc
+; VI-NEXT: v_bfe_u32 v12, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v12, vcc, v12, v3
+; VI-NEXT: v_add_u32_e32 v12, vcc, s6, v12
+; VI-NEXT: v_or_b32_e32 v13, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v12, v13, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v12, 16, v4
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_bfe_u32 v13, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v13, vcc, v13, v12
+; VI-NEXT: v_add_u32_e32 v13, vcc, s6, v13
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v14, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v12, v13, v14, vcc
+; VI-NEXT: v_bfe_u32 v13, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v13, vcc, v13, v4
+; VI-NEXT: v_add_u32_e32 v13, vcc, s6, v13
+; VI-NEXT: v_or_b32_e32 v14, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v13, v14, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v13, 16, v5
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_bfe_u32 v14, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v14, vcc, v14, v13
+; VI-NEXT: v_add_u32_e32 v14, vcc, s6, v14
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v15, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v13, v14, v15, vcc
+; VI-NEXT: v_bfe_u32 v14, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v14, vcc, v14, v5
+; VI-NEXT: v_add_u32_e32 v14, vcc, s6, v14
+; VI-NEXT: v_or_b32_e32 v15, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v14, v15, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v14, 16, v6
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_bfe_u32 v15, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v15, vcc, v15, v14
+; VI-NEXT: v_add_u32_e32 v15, vcc, s6, v15
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v16, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v14, v15, v16, vcc
+; VI-NEXT: v_bfe_u32 v15, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v15, vcc, v15, v6
+; VI-NEXT: v_add_u32_e32 v15, vcc, s6, v15
+; VI-NEXT: v_or_b32_e32 v16, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v15, v16, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v15, 16, v7
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_bfe_u32 v16, v15, 16, 1
+; VI-NEXT: v_add_u32_e32 v16, vcc, v16, v15
+; VI-NEXT: v_add_u32_e32 v16, vcc, s6, v16
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v17, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v15, v16, v17, vcc
+; VI-NEXT: v_bfe_u32 v16, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v16, vcc, v16, v7
+; VI-NEXT: v_add_u32_e32 v16, vcc, 0x7fff, v16
+; VI-NEXT: v_or_b32_e32 v17, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v16, v17, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v15, 16
+; VI-NEXT: v_alignbit_b32 v6, v6, v14, 16
+; VI-NEXT: v_alignbit_b32 v5, v5, v13, 16
+; VI-NEXT: v_alignbit_b32 v4, v4, v12, 16
+; VI-NEXT: v_alignbit_b32 v3, v3, v11, 16
+; VI-NEXT: v_alignbit_b32 v2, v2, v10, 16
+; VI-NEXT: v_alignbit_b32 v1, v1, v9, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v8, 16
+; VI-NEXT: .LBB39_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16bf16_to_v16i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB39_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_bfe_u32 v10, v9, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v10, v10, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v10, v11, vcc
+; GFX9-NEXT: v_bfe_u32 v10, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v10, v10, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v11, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v10, v11, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v10, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_bfe_u32 v11, v10, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v11, v11, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc
+; GFX9-NEXT: v_bfe_u32 v11, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v11, v11, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v12, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v11, v12, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_bfe_u32 v12, v11, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v12, v12, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v13, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v12, v13, vcc
+; GFX9-NEXT: v_bfe_u32 v12, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v12, v12, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v13, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v12, v13, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v12, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_bfe_u32 v13, v12, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v13, v13, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v14, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v13, v14, vcc
+; GFX9-NEXT: v_bfe_u32 v13, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v13, v13, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v14, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v13, v14, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v13, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_bfe_u32 v14, v13, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v14, v14, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v15, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v14, v15, vcc
+; GFX9-NEXT: v_bfe_u32 v14, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v14, v14, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v15, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v14, v15, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v14, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_bfe_u32 v15, v14, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v15, v15, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v16, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v15, v16, vcc
+; GFX9-NEXT: v_bfe_u32 v15, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v15, v15, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v16, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v15, v16, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v15, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_bfe_u32 v16, v15, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v16, v16, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v17, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v16, v17, vcc
+; GFX9-NEXT: v_bfe_u32 v16, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v16, v16, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v17, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v16, v17, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v7, v7, v15, s6
+; GFX9-NEXT: v_perm_b32 v6, v6, v14, s6
+; GFX9-NEXT: v_perm_b32 v5, v5, v13, s6
+; GFX9-NEXT: v_perm_b32 v4, v4, v12, s6
+; GFX9-NEXT: v_perm_b32 v3, v3, v11, s6
+; GFX9-NEXT: v_perm_b32 v2, v2, v10, s6
+; GFX9-NEXT: v_perm_b32 v1, v1, v9, s6
+; GFX9-NEXT: v_perm_b32 v0, v0, v8, s6
+; GFX9-NEXT: .LBB39_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16bf16_to_v16i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB39_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v9 :: v_dual_lshlrev_b32 v8, 16, v0
+; GFX11-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_bfe_u32 v11, v8, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v8
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v13, v13, v9, 0x7fff
+; GFX11-NEXT: v_add3_u32 v11, v11, v8, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v8, v11, v14 :: v_dual_and_b32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v14, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_dual_add_f32 v14, 0x40c00000, v14 :: v_dual_add_f32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v12, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_or_b32_e32 v16, 0x400000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v12, v12, v0, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v12, v15 :: v_dual_lshlrev_b32 v15, 16, v4
+; GFX11-NEXT: v_bfe_u32 v12, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v0, v0, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v13, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v12, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v10, 16, v2
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v15
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v11, v12 :: v_dual_add_f32 v10, 0x40c00000, v10
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v18, v4, 16, 1
+; GFX11-NEXT: v_perm_b32 v1, v1, v9, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v13, v10, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_add3_u32 v11, v13, v10, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v13, v2, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v2, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v2
+; GFX11-NEXT: v_bfe_u32 v13, v14, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v14, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_bfe_u32 v14, v15, 16, 1
+; GFX11-NEXT: v_bfe_u32 v13, v3, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v11, v11, v12 :: v_dual_lshlrev_b32 v12, 16, v5
+; GFX11-NEXT: v_add3_u32 v14, v14, v15, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_add3_u32 v15, v18, v4, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v13, v13, v3, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v14, v14, v17, vcc_lo
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_perm_b32 v2, v2, v10, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v19, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v21, v5, 16, 1
+; GFX11-NEXT: v_add3_u32 v18, v19, v12, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v19, 16, v6
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v15, v17 :: v_dual_add_f32 v15, 0x40c00000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_add3_u32 v17, v21, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v19, v15, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v18, v20, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v7
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v15
+; GFX11-NEXT: v_add3_u32 v19, v19, v15, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v22, v6, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v23, v18, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v15, v19, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v21, v22, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: v_bfe_u32 v19, v7, 16, 1
+; GFX11-NEXT: v_add3_u32 v23, v23, v18, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v24, 0x400000, v18
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v21, v22, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_add3_u32 v19, v19, v7, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v15, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v23, v24, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v19, v25, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v7, v7, v18, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_perm_b32 v4, v4, v14, 0x7060302
+; GFX11-NEXT: v_perm_b32 v5, v5, v12, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v13, v16, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v3, v3, v11, 0x7060302
+; GFX11-NEXT: .LBB39_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <16 x bfloat> %a1 to <16 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x bfloat> %a to <16 x i16>
+ br label %end
+
+end:
+ %phi = phi <16 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i16> %phi
+}
+
+define <16 x bfloat> @v_bitcast_v16f16_to_v16bf16(<16 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f16_to_v16bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v30, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v15
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB40_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB40_4
+; GCN-NEXT: .LBB40_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB40_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v28
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v29
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v30
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v31
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB40_2
+; GCN-NEXT: .LBB40_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v30
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v28
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v16
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v23
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f16_to_v16bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB40_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v9, 0x200
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v10, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v11, 0x200, v2
+; VI-NEXT: v_add_f16_sdwa v2, v2, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v12, 0x200, v3
+; VI-NEXT: v_add_f16_sdwa v3, v3, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v13, 0x200, v4
+; VI-NEXT: v_add_f16_sdwa v4, v4, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v14, 0x200, v5
+; VI-NEXT: v_add_f16_sdwa v5, v5, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v15, 0x200, v6
+; VI-NEXT: v_add_f16_sdwa v6, v6, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v16, 0x200, v7
+; VI-NEXT: v_add_f16_sdwa v7, v7, v9 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v16, v7
+; VI-NEXT: v_or_b32_e32 v6, v15, v6
+; VI-NEXT: v_or_b32_e32 v5, v14, v5
+; VI-NEXT: v_or_b32_e32 v4, v13, v4
+; VI-NEXT: v_or_b32_e32 v3, v12, v3
+; VI-NEXT: v_or_b32_e32 v2, v11, v2
+; VI-NEXT: v_or_b32_e32 v1, v10, v1
+; VI-NEXT: v_or_b32_e32 v0, v8, v0
+; VI-NEXT: .LBB40_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f16_to_v16bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f16_to_v16bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB40_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB40_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <16 x half> %a1 to <16 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x half> %a to <16 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <16 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x bfloat> %phi
+}
+
+define <16 x half> @v_bitcast_v16bf16_to_v16f16(<16 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16bf16_to_v16f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v24, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v26, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v27, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v28, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v29, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v30, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v31, 1.0, v15
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB41_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB41_4
+; GCN-NEXT: .LBB41_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB41_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v16
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v24
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v27
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB41_2
+; GCN-NEXT: .LBB41_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v31
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v30
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v29
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v28
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v27
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v26
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v24
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v23
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v20
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v16
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v16, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v18, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16bf16_to_v16f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB41_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_bfe_u32 v9, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v8
+; VI-NEXT: v_add_u32_e32 v9, vcc, 0x7fff, v9
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; VI-NEXT: v_bfe_u32 v9, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v9, vcc, v9, v0
+; VI-NEXT: v_add_u32_e32 v9, vcc, s6, v9
+; VI-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_bfe_u32 v10, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v10, vcc, v10, v9
+; VI-NEXT: v_add_u32_e32 v10, vcc, s6, v10
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v9, v10, v11, vcc
+; VI-NEXT: v_bfe_u32 v10, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v10, vcc, v10, v1
+; VI-NEXT: v_add_u32_e32 v10, vcc, s6, v10
+; VI-NEXT: v_or_b32_e32 v11, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v10, v11, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v10, 16, v2
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_bfe_u32 v11, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v11, vcc, v11, v10
+; VI-NEXT: v_add_u32_e32 v11, vcc, s6, v11
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc
+; VI-NEXT: v_bfe_u32 v11, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v11, vcc, v11, v2
+; VI-NEXT: v_add_u32_e32 v11, vcc, s6, v11
+; VI-NEXT: v_or_b32_e32 v12, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v11, v12, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_bfe_u32 v12, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v12, vcc, v12, v11
+; VI-NEXT: v_add_u32_e32 v12, vcc, s6, v12
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v13, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v11, v12, v13, vcc
+; VI-NEXT: v_bfe_u32 v12, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v12, vcc, v12, v3
+; VI-NEXT: v_add_u32_e32 v12, vcc, s6, v12
+; VI-NEXT: v_or_b32_e32 v13, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v12, v13, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v12, 16, v4
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_bfe_u32 v13, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v13, vcc, v13, v12
+; VI-NEXT: v_add_u32_e32 v13, vcc, s6, v13
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v14, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v12, v13, v14, vcc
+; VI-NEXT: v_bfe_u32 v13, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v13, vcc, v13, v4
+; VI-NEXT: v_add_u32_e32 v13, vcc, s6, v13
+; VI-NEXT: v_or_b32_e32 v14, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v13, v14, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v13, 16, v5
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_bfe_u32 v14, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v14, vcc, v14, v13
+; VI-NEXT: v_add_u32_e32 v14, vcc, s6, v14
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v15, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v13, v14, v15, vcc
+; VI-NEXT: v_bfe_u32 v14, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v14, vcc, v14, v5
+; VI-NEXT: v_add_u32_e32 v14, vcc, s6, v14
+; VI-NEXT: v_or_b32_e32 v15, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v14, v15, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v14, 16, v6
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_bfe_u32 v15, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v15, vcc, v15, v14
+; VI-NEXT: v_add_u32_e32 v15, vcc, s6, v15
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v16, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v14, v15, v16, vcc
+; VI-NEXT: v_bfe_u32 v15, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v15, vcc, v15, v6
+; VI-NEXT: v_add_u32_e32 v15, vcc, s6, v15
+; VI-NEXT: v_or_b32_e32 v16, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v15, v16, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v15, 16, v7
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_bfe_u32 v16, v15, 16, 1
+; VI-NEXT: v_add_u32_e32 v16, vcc, v16, v15
+; VI-NEXT: v_add_u32_e32 v16, vcc, s6, v16
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v17, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v15, v16, v17, vcc
+; VI-NEXT: v_bfe_u32 v16, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v16, vcc, v16, v7
+; VI-NEXT: v_add_u32_e32 v16, vcc, 0x7fff, v16
+; VI-NEXT: v_or_b32_e32 v17, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v16, v17, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v15, 16
+; VI-NEXT: v_alignbit_b32 v6, v6, v14, 16
+; VI-NEXT: v_alignbit_b32 v5, v5, v13, 16
+; VI-NEXT: v_alignbit_b32 v4, v4, v12, 16
+; VI-NEXT: v_alignbit_b32 v3, v3, v11, 16
+; VI-NEXT: v_alignbit_b32 v2, v2, v10, 16
+; VI-NEXT: v_alignbit_b32 v1, v1, v9, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v8, 16
+; VI-NEXT: .LBB41_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16bf16_to_v16f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v8
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB41_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v8, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_bfe_u32 v9, v8, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v9, v9, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v9, v10, vcc
+; GFX9-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v9, v9, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_bfe_u32 v10, v9, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v10, v10, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v10, v11, vcc
+; GFX9-NEXT: v_bfe_u32 v10, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v10, v10, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v11, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v10, v11, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v10, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_bfe_u32 v11, v10, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v11, v11, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc
+; GFX9-NEXT: v_bfe_u32 v11, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v11, v11, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v12, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v11, v12, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v11, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_bfe_u32 v12, v11, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v12, v12, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v13, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v12, v13, vcc
+; GFX9-NEXT: v_bfe_u32 v12, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v12, v12, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v13, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v12, v13, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v12, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_bfe_u32 v13, v12, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v13, v13, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v14, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v13, v14, vcc
+; GFX9-NEXT: v_bfe_u32 v13, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v13, v13, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v14, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v13, v14, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v13, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_bfe_u32 v14, v13, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v14, v14, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v15, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v14, v15, vcc
+; GFX9-NEXT: v_bfe_u32 v14, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v14, v14, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v15, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v14, v15, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v14, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_bfe_u32 v15, v14, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v15, v15, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v16, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v15, v16, vcc
+; GFX9-NEXT: v_bfe_u32 v15, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v15, v15, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v16, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v15, v16, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v15, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_bfe_u32 v16, v15, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v16, v16, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v17, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v16, v17, vcc
+; GFX9-NEXT: v_bfe_u32 v16, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v16, v16, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v17, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v16, v17, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v7, v7, v15, s6
+; GFX9-NEXT: v_perm_b32 v6, v6, v14, s6
+; GFX9-NEXT: v_perm_b32 v5, v5, v13, s6
+; GFX9-NEXT: v_perm_b32 v4, v4, v12, s6
+; GFX9-NEXT: v_perm_b32 v3, v3, v11, s6
+; GFX9-NEXT: v_perm_b32 v2, v2, v10, s6
+; GFX9-NEXT: v_perm_b32 v1, v1, v9, s6
+; GFX9-NEXT: v_perm_b32 v0, v0, v8, s6
+; GFX9-NEXT: .LBB41_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16bf16_to_v16f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v8
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB41_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v9, 0x40c00000, v9 :: v_dual_lshlrev_b32 v8, 16, v0
+; GFX11-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v13, v9, 16, 1
+; GFX11-NEXT: v_bfe_u32 v11, v8, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v14, 0x400000, v8
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v13, v13, v9, 0x7fff
+; GFX11-NEXT: v_add3_u32 v11, v11, v8, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v8, v11, v14 :: v_dual_and_b32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_or_b32_e32 v11, 0x400000, v9
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v14, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_dual_add_f32 v14, 0x40c00000, v14 :: v_dual_add_f32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v12, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v15, 0x400000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_or_b32_e32 v16, 0x400000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v12, v12, v0, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v12, v15 :: v_dual_lshlrev_b32 v15, 16, v4
+; GFX11-NEXT: v_bfe_u32 v12, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v0, v0, v8, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v13, v11, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v12, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v10, 16, v2
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v15
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v11, v12 :: v_dual_add_f32 v10, 0x40c00000, v10
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v18, v4, 16, 1
+; GFX11-NEXT: v_perm_b32 v1, v1, v9, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v13, v10, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_add3_u32 v11, v13, v10, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v13, v2, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v2, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v2
+; GFX11-NEXT: v_bfe_u32 v13, v14, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v11, v12, vcc_lo
+; GFX11-NEXT: v_add3_u32 v11, v13, v14, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v12, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_bfe_u32 v14, v15, 16, 1
+; GFX11-NEXT: v_bfe_u32 v13, v3, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v11, v11, v12 :: v_dual_lshlrev_b32 v12, 16, v5
+; GFX11-NEXT: v_add3_u32 v14, v14, v15, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_add3_u32 v15, v18, v4, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_add3_u32 v13, v13, v3, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v14, v14, v17, vcc_lo
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_perm_b32 v2, v2, v10, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v19, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v21, v5, 16, 1
+; GFX11-NEXT: v_add3_u32 v18, v19, v12, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v19, 16, v6
+; GFX11-NEXT: v_or_b32_e32 v17, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v4, v15, v17 :: v_dual_add_f32 v15, 0x40c00000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: v_add3_u32 v17, v21, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v19, v15, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v18, v20, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v7
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v15
+; GFX11-NEXT: v_add3_u32 v19, v19, v15, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v22, v6, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v23, v18, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v15, v19, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v21, v22, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: v_bfe_u32 v19, v7, 16, 1
+; GFX11-NEXT: v_add3_u32 v23, v23, v18, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v24, 0x400000, v18
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v21, v22, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_add3_u32 v19, v19, v7, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v15, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v23, v24, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v19, v25, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v7, v7, v18, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_perm_b32 v4, v4, v14, 0x7060302
+; GFX11-NEXT: v_perm_b32 v5, v5, v12, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v13, v16, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v3, v3, v11, 0x7060302
+; GFX11-NEXT: .LBB41_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <16 x bfloat> %a1 to <16 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x bfloat> %a to <16 x half>
+ br label %end
+
+end:
+ %phi = phi <16 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x half> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll
new file mode 100644
index 0000000000000..ee3becb4b9d8a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <9 x float> @v_bitcast_v9i32_to_v9f32(<9 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v9i32_to_v9f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v9
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v9i32_to_v9f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v9
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v9i32_to_v9f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v9
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v9i32_to_v9f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v9
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <9 x i32> %a, splat (i32 3)
+ %a2 = bitcast <9 x i32> %a1 to <9 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <9 x i32> %a to <9 x float>
+ br label %end
+
+end:
+ %phi = phi <9 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <9 x float> %phi
+}
+
+define <9 x i32> @v_bitcast_v9f32_to_v9i32(<9 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v9f32_to_v9i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v9
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v9f32_to_v9i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v9
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v9f32_to_v9i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v9
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v9f32_to_v9i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v9
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v8, 1.0, v8 :: v_dual_add_f32 v7, 1.0, v7
+; GFX11-NEXT: v_dual_add_f32 v6, 1.0, v6 :: v_dual_add_f32 v5, 1.0, v5
+; GFX11-NEXT: v_dual_add_f32 v4, 1.0, v4 :: v_dual_add_f32 v3, 1.0, v3
+; GFX11-NEXT: v_dual_add_f32 v2, 1.0, v2 :: v_dual_add_f32 v1, 1.0, v1
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <9 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <9 x float> %a1 to <9 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <9 x float> %a to <9 x i32>
+ br label %end
+
+end:
+ %phi = phi <9 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <9 x i32> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll
new file mode 100644
index 0000000000000..5220867f9676c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll
@@ -0,0 +1,220 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <10 x float> @v_bitcast_v10i32_to_v10f32(<10 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v10i32_to_v10f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v10
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v10i32_to_v10f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v10
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB0_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB0_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v10i32_to_v10f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v10
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB0_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB0_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v10i32_to_v10f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v10
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <10 x i32> %a, splat (i32 3)
+ %a2 = bitcast <10 x i32> %a1 to <10 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <10 x i32> %a to <10 x float>
+ br label %end
+
+end:
+ %phi = phi <10 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <10 x float> %phi
+}
+
+define <10 x i32> @v_bitcast_v10f32_to_v10i32(<10 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v10f32_to_v10i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v10
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v10f32_to_v10i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v10
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB1_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB1_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v10f32_to_v10i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v10
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB1_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB1_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v10f32_to_v10i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v10
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <10 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <10 x float> %a1 to <10 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <10 x float> %a to <10 x i32>
+ br label %end
+
+end:
+ %phi = phi <10 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <10 x i32> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.32bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.32bit.ll
new file mode 100644
index 0000000000000..6ba466662299d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.32bit.ll
@@ -0,0 +1,1960 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define float @v_bitcast_i32_to_f32(i32 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i32_to_f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i32_to_f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i32_to_f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i32_to_f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i32 %a, 3
+ %a2 = bitcast i32 %a1 to float
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i32 %a to float
+ br label %end
+
+end:
+ %phi = phi float [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret float %phi
+}
+
+define i32 @v_bitcast_f32_to_i32(float %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f32_to_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f32_to_i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f32_to_i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f32_to_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd float %a, 1.000000e+00
+ %a2 = bitcast float %a1 to i32
+ br label %end
+
+cmp.false:
+ %a3 = bitcast float %a to i32
+ br label %end
+
+end:
+ %phi = phi i32 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i32 %phi
+}
+
+define <2 x i16> @v_bitcast_i32_to_v2i16(i32 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i32_to_v2i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB2_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB2_4
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB2_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: .LBB2_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i32_to_v2i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i32_to_v2i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i32_to_v2i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i32 %a, 3
+ %a2 = bitcast i32 %a1 to <2 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i32 %a to <2 x i16>
+ br label %end
+
+end:
+ %phi = phi <2 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i16> %phi
+}
+
+define i32 @v_bitcast_v2i16_to_i32(<2 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i16_to_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v3, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB3_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB3_4
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB3_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v3
+; GCN-NEXT: v_or_b32_e32 v0, v0, v1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: .LBB3_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v3
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i16_to_i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 3
+; VI-NEXT: v_add_u16_e32 v1, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v1, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i16_to_i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i16_to_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i16> %a, splat (i16 3)
+ %a2 = bitcast <2 x i16> %a1 to i32
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i16> %a to i32
+ br label %end
+
+end:
+ %phi = phi i32 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i32 %phi
+}
+
+define <2 x half> @v_bitcast_i32_to_v2f16(i32 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i32_to_v2f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v2, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB4_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB4_4
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB4_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v2
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: .LBB4_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i32_to_v2f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i32_to_v2f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i32_to_v2f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i32 %a, 3
+ %a2 = bitcast i32 %a1 to <2 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i32 %a to <2 x half>
+ br label %end
+
+end:
+ %phi = phi <2 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x half> %phi
+}
+
+define i32 @v_bitcast_v2f16_to_i32(<2 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f16_to_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB5_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB5_4
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB5_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: .LBB5_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f16_to_i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v1, 0x200
+; VI-NEXT: v_add_f16_sdwa v1, v0, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v0, v0, v1
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f16_to_i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f16_to_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <2 x half> %a1 to i32
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x half> %a to i32
+ br label %end
+
+end:
+ %phi = phi i32 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i32 %phi
+}
+
+define <2 x bfloat> @v_bitcast_i32_to_v2bf16(i32 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i32_to_v2bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v2, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB6_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB6_4
+; GCN-NEXT: .LBB6_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB6_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v2
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: .LBB6_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v2
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i32_to_v2bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i32_to_v2bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i32_to_v2bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i32 %a, 3
+ %a2 = bitcast i32 %a1 to <2 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i32 %a to <2 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <2 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x bfloat> %phi
+}
+
+define i32 @v_bitcast_v2bf16_to_i32(<2 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2bf16_to_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_mul_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_4
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB7_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v1
+; GCN-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: .LBB7_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v2
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2bf16_to_i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB7_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_bfe_u32 v2, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v1
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; VI-NEXT: v_bfe_u32 v2, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v1, 16
+; VI-NEXT: .LBB7_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2bf16_to_i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB7_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v2, v2, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; GFX9-NEXT: v_bfe_u32 v2, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v2, v2, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v1, s6
+; GFX9-NEXT: .LBB7_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2bf16_to_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB7_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v2, v2, v1, 0x7fff
+; GFX11-NEXT: v_add3_u32 v3, v3, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v2, v4, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v3, v5, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v1, 0x7060302
+; GFX11-NEXT: .LBB7_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <2 x bfloat> %a1 to i32
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x bfloat> %a to i32
+ br label %end
+
+end:
+ %phi = phi i32 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i32 %phi
+}
+
+define <2 x i16> @v_bitcast_f32_to_v2i16(float %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f32_to_v2i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_4
+; GCN-NEXT: .LBB8_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB8_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: .LBB8_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f32_to_v2i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f32_to_v2i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f32_to_v2i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd float %a, 1.000000e+00
+ %a2 = bitcast float %a1 to <2 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast float %a to <2 x i16>
+ br label %end
+
+end:
+ %phi = phi <2 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i16> %phi
+}
+
+define float @v_bitcast_v2i16_to_f32(<2 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i16_to_f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v3, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_4
+; GCN-NEXT: .LBB9_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB9_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v3
+; GCN-NEXT: v_or_b32_e32 v0, v0, v1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: .LBB9_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v3
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i16_to_f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 3
+; VI-NEXT: v_add_u16_e32 v1, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v1, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i16_to_f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i16_to_f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i16> %a, splat (i16 3)
+ %a2 = bitcast <2 x i16> %a1 to float
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i16> %a to float
+ br label %end
+
+end:
+ %phi = phi float [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret float %phi
+}
+
+define <2 x half> @v_bitcast_f32_to_v2f16(float %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f32_to_v2f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v2, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_4
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB10_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v2
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: .LBB10_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f32_to_v2f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f32_to_v2f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f32_to_v2f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd float %a, 1.000000e+00
+ %a2 = bitcast float %a1 to <2 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast float %a to <2 x half>
+ br label %end
+
+end:
+ %phi = phi <2 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x half> %phi
+}
+
+define float @v_bitcast_v2f16_to_f32(<2 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f16_to_f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_4
+; GCN-NEXT: .LBB11_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB11_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: .LBB11_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f16_to_f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v1, 0x200
+; VI-NEXT: v_add_f16_sdwa v1, v0, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v0, v0, v1
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f16_to_f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f16_to_f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <2 x half> %a1 to float
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x half> %a to float
+ br label %end
+
+end:
+ %phi = phi float [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret float %phi
+}
+
+define <2 x bfloat> @v_bitcast_f32_to_v2bf16(float %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f32_to_v2bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v2, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB12_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB12_4
+; GCN-NEXT: .LBB12_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB12_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v2
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB12_2
+; GCN-NEXT: .LBB12_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v2
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f32_to_v2bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f32_to_v2bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f32_to_v2bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd float %a, 1.000000e+00
+ %a2 = bitcast float %a1 to <2 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast float %a to <2 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <2 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x bfloat> %phi
+}
+
+define float @v_bitcast_v2bf16_to_f32(<2 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2bf16_to_f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_mul_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB13_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB13_4
+; GCN-NEXT: .LBB13_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB13_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v1
+; GCN-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB13_2
+; GCN-NEXT: .LBB13_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v2
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2bf16_to_f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB13_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_bfe_u32 v2, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v1
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; VI-NEXT: v_bfe_u32 v2, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v1, 16
+; VI-NEXT: .LBB13_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2bf16_to_f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB13_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v2, v2, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; GFX9-NEXT: v_bfe_u32 v2, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v2, v2, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v1, s6
+; GFX9-NEXT: .LBB13_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2bf16_to_f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB13_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v2, v2, v1, 0x7fff
+; GFX11-NEXT: v_add3_u32 v3, v3, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v2, v4, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v3, v5, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v1, 0x7060302
+; GFX11-NEXT: .LBB13_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <2 x bfloat> %a1 to float
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x bfloat> %a to float
+ br label %end
+
+end:
+ %phi = phi float [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret float %phi
+}
+
+define <2 x half> @v_bitcast_v2i16_to_v2f16(<2 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i16_to_v2f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v4, v1
+; GCN-NEXT: v_mov_b32_e32 v3, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB14_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB14_4
+; GCN-NEXT: .LBB14_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB14_3: ; %cmp.false
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB14_2
+; GCN-NEXT: .LBB14_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i16_to_v2f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v1, 3
+; VI-NEXT: v_add_u16_sdwa v1, v0, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v0, v0, v1
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i16_to_v2f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i16_to_v2f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i16> %a, splat (i16 3)
+ %a2 = bitcast <2 x i16> %a1 to <2 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i16> %a to <2 x half>
+ br label %end
+
+end:
+ %phi = phi <2 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x half> %phi
+}
+
+define <2 x i16> @v_bitcast_v2f16_to_v2i16(<2 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f16_to_v2i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB15_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_or_b32_e32 v0, v0, v2
+; GCN-NEXT: .LBB15_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f16_to_v2i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 0x200
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v1, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f16_to_v2i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f16_to_v2i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <2 x half> %a1 to <2 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x half> %a to <2 x i16>
+ br label %end
+
+end:
+ %phi = phi <2 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i16> %phi
+}
+
+define <2 x bfloat> @v_bitcast_v2i16_to_v2bf16(<2 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i16_to_v2bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: .LBB16_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i16_to_v2bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v1, 3
+; VI-NEXT: v_add_u16_sdwa v1, v0, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v0, v0, v1
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i16_to_v2bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i16_to_v2bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i16> %a, splat (i16 3)
+ %a2 = bitcast <2 x i16> %a1 to <2 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i16> %a to <2 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <2 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x bfloat> %phi
+}
+
+define <2 x i16> @v_bitcast_v2bf16_to_v2i16(<2 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2bf16_to_v2i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_mul_f32_e32 v3, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_4
+; GCN-NEXT: .LBB17_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB17_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB17_2
+; GCN-NEXT: .LBB17_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v3
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2bf16_to_v2i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB17_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_bfe_u32 v2, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v1
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; VI-NEXT: v_bfe_u32 v2, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v1, 16
+; VI-NEXT: .LBB17_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2bf16_to_v2i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB17_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v2, v2, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; GFX9-NEXT: v_bfe_u32 v2, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v2, v2, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v1, s6
+; GFX9-NEXT: .LBB17_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2bf16_to_v2i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB17_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v2, v2, v1, 0x7fff
+; GFX11-NEXT: v_add3_u32 v3, v3, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v2, v4, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v3, v5, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v1, 0x7060302
+; GFX11-NEXT: .LBB17_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <2 x bfloat> %a1 to <2 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x bfloat> %a to <2 x i16>
+ br label %end
+
+end:
+ %phi = phi <2 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i16> %phi
+}
+
+define <2 x bfloat> @v_bitcast_v2f16_to_v2bf16(<2 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f16_to_v2bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB18_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB18_4
+; GCN-NEXT: .LBB18_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB18_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v3
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_2
+; GCN-NEXT: .LBB18_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v2
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f16_to_v2bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 0x200
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v1, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f16_to_v2bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f16_to_v2bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <2 x half> %a1 to <2 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x half> %a to <2 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <2 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x bfloat> %phi
+}
+
+define <2 x half> @v_bitcast_v2bf16_to_v2f16(<2 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2bf16_to_v2f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v3, 1.0, v1
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_4
+; GCN-NEXT: .LBB19_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB19_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB19_2
+; GCN-NEXT: .LBB19_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v3
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v2
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2bf16_to_v2f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB19_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_bfe_u32 v2, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v1
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; VI-NEXT: v_bfe_u32 v2, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT: v_add_u32_e32 v2, vcc, 0x7fff, v2
+; VI-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v1, 16
+; VI-NEXT: .LBB19_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2bf16_to_v2f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v1
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB19_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v2, v2, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
+; GFX9-NEXT: v_bfe_u32 v2, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v2, v2, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v3, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v1, s6
+; GFX9-NEXT: .LBB19_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2bf16_to_v2f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v1
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB19_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v0
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v2, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v2, v2, v1, 0x7fff
+; GFX11-NEXT: v_add3_u32 v3, v3, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v2, v4, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v3, v5, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v1, 0x7060302
+; GFX11-NEXT: .LBB19_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <2 x bfloat> %a1 to <2 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x bfloat> %a to <2 x half>
+ br label %end
+
+end:
+ %phi = phi <2 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x half> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll
new file mode 100644
index 0000000000000..eaf45e7231b79
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll
@@ -0,0 +1,228 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <11 x float> @v_bitcast_v11i32_to_v11f32(<11 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v11i32_to_v11f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v11
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v11i32_to_v11f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v11
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB0_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB0_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v11i32_to_v11f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v11
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB0_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB0_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v11i32_to_v11f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v11
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <11 x i32> %a, splat (i32 3)
+ %a2 = bitcast <11 x i32> %a1 to <11 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <11 x i32> %a to <11 x float>
+ br label %end
+
+end:
+ %phi = phi <11 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <11 x float> %phi
+}
+
+define <11 x i32> @v_bitcast_v11f32_to_v11i32(<11 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v11f32_to_v11i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v11
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v11f32_to_v11i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v11
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB1_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB1_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v11f32_to_v11i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v11
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB1_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB1_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v11f32_to_v11i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v11
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v10, 1.0, v10 :: v_dual_add_f32 v9, 1.0, v9
+; GFX11-NEXT: v_dual_add_f32 v8, 1.0, v8 :: v_dual_add_f32 v7, 1.0, v7
+; GFX11-NEXT: v_dual_add_f32 v6, 1.0, v6 :: v_dual_add_f32 v5, 1.0, v5
+; GFX11-NEXT: v_dual_add_f32 v4, 1.0, v4 :: v_dual_add_f32 v3, 1.0, v3
+; GFX11-NEXT: v_dual_add_f32 v2, 1.0, v2 :: v_dual_add_f32 v1, 1.0, v1
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <11 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <11 x float> %a1 to <11 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <11 x float> %a to <11 x i32>
+ br label %end
+
+end:
+ %phi = phi <11 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <11 x i32> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll
new file mode 100644
index 0000000000000..152d07cb1dd61
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll
@@ -0,0 +1,235 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <12 x float> @v_bitcast_v12i32_to_v12f32(<12 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v12i32_to_v12f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v12
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v12i32_to_v12f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v12
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB0_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB0_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v12i32_to_v12f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v12
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB0_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB0_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v12i32_to_v12f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v12
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <12 x i32> %a, splat (i32 3)
+ %a2 = bitcast <12 x i32> %a1 to <12 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <12 x i32> %a to <12 x float>
+ br label %end
+
+end:
+ %phi = phi <12 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <12 x float> %phi
+}
+
+define <12 x i32> @v_bitcast_v12f32_to_v12i32(<12 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v12f32_to_v12i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v12
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v12f32_to_v12i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v12
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB1_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB1_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v12f32_to_v12i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v12
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB1_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB1_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v12f32_to_v12i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v12
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <12 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <12 x float> %a1 to <12 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <12 x float> %a to <12 x i32>
+ br label %end
+
+end:
+ %phi = phi <12 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <12 x i32> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
new file mode 100644
index 0000000000000..5b5b0073c5ca4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
@@ -0,0 +1,15566 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <16 x float> @v_bitcast_v16i32_to_v16f32(<16 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i32_to_v16f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i32_to_v16f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB0_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB0_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i32_to_v16f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB0_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB0_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i32_to_v16f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB0_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB0_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i32> %a, splat (i32 3)
+ %a2 = bitcast <16 x i32> %a1 to <16 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i32> %a to <16 x float>
+ br label %end
+
+end:
+ %phi = phi <16 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x float> %phi
+}
+
+define <16 x i32> @v_bitcast_v16f32_to_v16i32(<16 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f32_to_v16i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f32_to_v16i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB1_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB1_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f32_to_v16i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB1_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB1_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f32_to_v16i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB1_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB1_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <16 x float> %a1 to <16 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x float> %a to <16 x i32>
+ br label %end
+
+end:
+ %phi = phi <16 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i32> %phi
+}
+
+define <8 x i64> @v_bitcast_v16i32_to_v8i64(<16 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i32_to_v8i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i32_to_v8i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB2_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB2_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i32_to_v8i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB2_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB2_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i32_to_v8i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB2_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB2_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i32> %a, splat (i32 3)
+ %a2 = bitcast <16 x i32> %a1 to <8 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i32> %a to <8 x i64>
+ br label %end
+
+end:
+ %phi = phi <8 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i64> %phi
+}
+
+define <16 x i32> @v_bitcast_v8i64_to_v16i32(<8 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i64_to_v16i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i64_to_v16i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB3_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB3_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i64_to_v16i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB3_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB3_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i64_to_v16i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB3_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB3_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i64> %a, splat (i64 3)
+ %a2 = bitcast <8 x i64> %a1 to <16 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i64> %a to <16 x i32>
+ br label %end
+
+end:
+ %phi = phi <16 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i32> %phi
+}
+
+define <8 x double> @v_bitcast_v16i32_to_v8f64(<16 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i32_to_v8f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v15
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v13
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v11
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v9
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v7
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v3
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i32_to_v8f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB4_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB4_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i32_to_v8f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB4_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB4_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i32_to_v8f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB4_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB4_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i32> %a, splat (i32 3)
+ %a2 = bitcast <16 x i32> %a1 to <8 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i32> %a to <8 x double>
+ br label %end
+
+end:
+ %phi = phi <8 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x double> %phi
+}
+
+define <16 x i32> @v_bitcast_v8f64_to_v16i32(<8 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f64_to_v16i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f64_to_v16i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB5_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB5_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f64_to_v16i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB5_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB5_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f64_to_v16i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB5_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB5_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <8 x double> %a1 to <16 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x double> %a to <16 x i32>
+ br label %end
+
+end:
+ %phi = phi <16 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i32> %phi
+}
+
+define <32 x i16> @v_bitcast_v16i32_to_v32i16(<16 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i32_to_v32i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v30, v15
+; GCN-NEXT: v_mov_b32_e32 v28, v14
+; GCN-NEXT: v_mov_b32_e32 v26, v13
+; GCN-NEXT: v_mov_b32_e32 v24, v12
+; GCN-NEXT: v_mov_b32_e32 v22, v11
+; GCN-NEXT: v_mov_b32_e32 v20, v10
+; GCN-NEXT: v_mov_b32_e32 v18, v9
+; GCN-NEXT: v_mov_b32_e32 v32, v8
+; GCN-NEXT: v_mov_b32_e32 v14, v7
+; GCN-NEXT: v_mov_b32_e32 v12, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v8, v4
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v4, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v29, v30, v28, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v20, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v32, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v8, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB6_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v32, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v30
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_alignbit_b32 v29, v30, v28, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v20, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v32, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v8, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB6_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v16, v32
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i32_to_v32i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB6_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB6_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i32_to_v32i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB6_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB6_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i32_to_v32i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB6_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB6_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i32> %a, splat (i32 3)
+ %a2 = bitcast <16 x i32> %a1 to <32 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i32> %a to <32 x i16>
+ br label %end
+
+end:
+ %phi = phi <32 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i16> %phi
+}
+
+define <16 x i32> @v_bitcast_v32i16_to_v16i32(<32 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i16_to_v16i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v38, v14
+; GCN-NEXT: v_mov_b32_e32 v37, v12
+; GCN-NEXT: v_mov_b32_e32 v36, v10
+; GCN-NEXT: v_mov_b32_e32 v35, v8
+; GCN-NEXT: v_mov_b32_e32 v34, v6
+; GCN-NEXT: v_mov_b32_e32 v33, v4
+; GCN-NEXT: v_mov_b32_e32 v32, v2
+; GCN-NEXT: v_mov_b32_e32 v31, v0
+; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32
+; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_lshlrev_b32_e32 v54, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v55, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v39, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v48, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v49, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v50, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v51, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v52, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v53, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_4
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB7_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v31
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v32
+; GCN-NEXT: v_or_b32_e32 v0, v0, v54
+; GCN-NEXT: v_or_b32_e32 v1, v1, v55
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v33
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v34
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v36
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v37
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v38
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v16
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v18
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v22
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v24
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v26
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v28
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v30
+; GCN-NEXT: v_or_b32_e32 v2, v2, v39
+; GCN-NEXT: v_or_b32_e32 v3, v3, v48
+; GCN-NEXT: v_or_b32_e32 v4, v4, v49
+; GCN-NEXT: v_or_b32_e32 v5, v5, v50
+; GCN-NEXT: v_or_b32_e32 v6, v6, v51
+; GCN-NEXT: v_or_b32_e32 v7, v7, v52
+; GCN-NEXT: v_or_b32_e32 v8, v8, v17
+; GCN-NEXT: v_or_b32_e32 v9, v9, v19
+; GCN-NEXT: v_or_b32_e32 v10, v10, v21
+; GCN-NEXT: v_or_b32_e32 v11, v11, v23
+; GCN-NEXT: v_or_b32_e32 v12, v12, v25
+; GCN-NEXT: v_or_b32_e32 v13, v13, v27
+; GCN-NEXT: v_or_b32_e32 v14, v14, v29
+; GCN-NEXT: v_or_b32_e32 v15, v15, v53
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: .LBB7_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v31
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v30
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v13
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v14
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v15
+; GCN-NEXT: v_or_b32_e32 v0, v54, v0
+; GCN-NEXT: v_or_b32_e32 v1, v55, v1
+; GCN-NEXT: v_or_b32_e32 v2, v39, v2
+; GCN-NEXT: v_or_b32_e32 v3, v48, v3
+; GCN-NEXT: v_or_b32_e32 v4, v49, v4
+; GCN-NEXT: v_or_b32_e32 v5, v50, v5
+; GCN-NEXT: v_or_b32_e32 v6, v51, v6
+; GCN-NEXT: v_or_b32_e32 v7, v52, v7
+; GCN-NEXT: v_or_b32_e32 v8, v17, v8
+; GCN-NEXT: v_or_b32_e32 v9, v19, v9
+; GCN-NEXT: v_or_b32_e32 v10, v21, v10
+; GCN-NEXT: v_or_b32_e32 v11, v23, v11
+; GCN-NEXT: v_or_b32_e32 v12, v25, v12
+; GCN-NEXT: v_or_b32_e32 v13, v27, v13
+; GCN-NEXT: v_or_b32_e32 v14, v29, v14
+; GCN-NEXT: v_or_b32_e32 v15, v53, v15
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, s6, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, s6, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, s6, v7
+; GCN-NEXT: v_add_i32_e32 v8, vcc, s6, v8
+; GCN-NEXT: v_add_i32_e32 v9, vcc, s6, v9
+; GCN-NEXT: v_add_i32_e32 v10, vcc, s6, v10
+; GCN-NEXT: v_add_i32_e32 v11, vcc, s6, v11
+; GCN-NEXT: v_add_i32_e32 v12, vcc, s6, v12
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 0x30000, v13
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 0x30000, v14
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 0x30000, v15
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i16_to_v16i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB7_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v17, 3
+; VI-NEXT: v_add_u16_e32 v16, 3, v15
+; VI-NEXT: v_add_u16_sdwa v15, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v15, v16, v15
+; VI-NEXT: v_add_u16_e32 v16, 3, v14
+; VI-NEXT: v_add_u16_sdwa v14, v14, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v14, v16, v14
+; VI-NEXT: v_add_u16_e32 v16, 3, v13
+; VI-NEXT: v_add_u16_sdwa v13, v13, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v13, v16, v13
+; VI-NEXT: v_add_u16_e32 v16, 3, v12
+; VI-NEXT: v_add_u16_sdwa v12, v12, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v12, v16, v12
+; VI-NEXT: v_add_u16_e32 v16, 3, v11
+; VI-NEXT: v_add_u16_sdwa v11, v11, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v11, v16, v11
+; VI-NEXT: v_add_u16_e32 v16, 3, v10
+; VI-NEXT: v_add_u16_sdwa v10, v10, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v10, v16, v10
+; VI-NEXT: v_add_u16_e32 v16, 3, v9
+; VI-NEXT: v_add_u16_sdwa v9, v9, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v9, v16, v9
+; VI-NEXT: v_add_u16_e32 v16, 3, v8
+; VI-NEXT: v_add_u16_sdwa v8, v8, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v8, v16, v8
+; VI-NEXT: v_add_u16_e32 v16, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v16, v7
+; VI-NEXT: v_add_u16_e32 v16, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v16, v6
+; VI-NEXT: v_add_u16_e32 v16, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v16, v5
+; VI-NEXT: v_add_u16_e32 v16, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v16, v4
+; VI-NEXT: v_add_u16_e32 v16, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v16, v3
+; VI-NEXT: v_add_u16_e32 v16, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v16, v2
+; VI-NEXT: v_add_u16_e32 v16, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v16, v1
+; VI-NEXT: v_add_u16_e32 v16, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v16, v0
+; VI-NEXT: .LBB7_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i16_to_v16i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB7_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB7_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i16_to_v16i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB7_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB7_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i16> %a, splat (i16 3)
+ %a2 = bitcast <32 x i16> %a1 to <16 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i16> %a to <16 x i32>
+ br label %end
+
+end:
+ %phi = phi <16 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i32> %phi
+}
+
+define <32 x half> @v_bitcast_v16i32_to_v32f16(<16 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i32_to_v32f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 ; 4-byte Folded Spill
+; GCN-NEXT: v_mov_b32_e32 v33, v15
+; GCN-NEXT: v_mov_b32_e32 v34, v14
+; GCN-NEXT: v_mov_b32_e32 v35, v13
+; GCN-NEXT: v_mov_b32_e32 v36, v12
+; GCN-NEXT: v_mov_b32_e32 v37, v11
+; GCN-NEXT: v_mov_b32_e32 v38, v10
+; GCN-NEXT: v_mov_b32_e32 v39, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v49, v7
+; GCN-NEXT: v_mov_b32_e32 v50, v6
+; GCN-NEXT: v_mov_b32_e32 v51, v5
+; GCN-NEXT: v_mov_b32_e32 v52, v4
+; GCN-NEXT: v_mov_b32_e32 v53, v3
+; GCN-NEXT: v_mov_b32_e32 v54, v2
+; GCN-NEXT: v_mov_b32_e32 v55, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v37
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v39
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v49
+; GCN-NEXT: s_waitcnt expcnt(6)
+; GCN-NEXT: v_lshrrev_b32_e32 v40, 16, v50
+; GCN-NEXT: s_waitcnt expcnt(5)
+; GCN-NEXT: v_lshrrev_b32_e32 v41, 16, v51
+; GCN-NEXT: s_waitcnt expcnt(4)
+; GCN-NEXT: v_lshrrev_b32_e32 v42, 16, v52
+; GCN-NEXT: s_waitcnt expcnt(3)
+; GCN-NEXT: v_lshrrev_b32_e32 v43, 16, v53
+; GCN-NEXT: s_waitcnt expcnt(2)
+; GCN-NEXT: v_lshrrev_b32_e32 v44, 16, v54
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: v_lshrrev_b32_e32 v45, 16, v55
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: v_lshrrev_b32_e32 v46, 16, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v42
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v46
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v32
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: .LBB8_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v55
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v54
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v53
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v52
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v51
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v50
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v49
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v48
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v39
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v32, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v33, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v34, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v35, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v36, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v37, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v38, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v39, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v32
+; GCN-NEXT: .LBB8_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i32_to_v32f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB8_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB8_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i32_to_v32f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB8_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB8_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i32_to_v32f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB8_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB8_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i32> %a, splat (i32 3)
+ %a2 = bitcast <16 x i32> %a1 to <32 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i32> %a to <32 x half>
+ br label %end
+
+end:
+ %phi = phi <32 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x half> %phi
+}
+
+define <16 x i32> @v_bitcast_v32f16_to_v16i32(<32 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f16_to_v16i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_cvt_f16_f32_e32 v45, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v44, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v43, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v42, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v41, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v52, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v40, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v50, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v55, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v48, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v54, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v38, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v53, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v36, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v51, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v34, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v49, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v33, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v39, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v32, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v37, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v35, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v46
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v45
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v43
+; GCN-NEXT: v_or_b32_e32 v0, v44, v0
+; GCN-NEXT: v_or_b32_e32 v1, v42, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v52, v2
+; GCN-NEXT: v_or_b32_e32 v3, v50, v3
+; GCN-NEXT: v_or_b32_e32 v4, v48, v4
+; GCN-NEXT: v_or_b32_e32 v5, v38, v5
+; GCN-NEXT: v_or_b32_e32 v6, v36, v6
+; GCN-NEXT: v_or_b32_e32 v7, v34, v7
+; GCN-NEXT: v_or_b32_e32 v8, v33, v8
+; GCN-NEXT: v_or_b32_e32 v9, v32, v9
+; GCN-NEXT: v_or_b32_e32 v10, v31, v10
+; GCN-NEXT: v_or_b32_e32 v11, v21, v11
+; GCN-NEXT: v_or_b32_e32 v12, v19, v12
+; GCN-NEXT: v_or_b32_e32 v13, v18, v13
+; GCN-NEXT: v_or_b32_e32 v14, v17, v14
+; GCN-NEXT: v_or_b32_e32 v15, v16, v15
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB9_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x38000000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x38000000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x38000000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x38000000, v28
+; GCN-NEXT: v_add_f32_e32 v29, 0x38000000, v29
+; GCN-NEXT: v_add_f32_e32 v21, 0x38000000, v21
+; GCN-NEXT: v_add_f32_e32 v25, 0x38000000, v25
+; GCN-NEXT: v_add_f32_e32 v19, 0x38000000, v19
+; GCN-NEXT: v_add_f32_e32 v23, 0x38000000, v23
+; GCN-NEXT: v_add_f32_e32 v18, 0x38000000, v18
+; GCN-NEXT: v_add_f32_e32 v22, 0x38000000, v22
+; GCN-NEXT: v_add_f32_e32 v17, 0x38000000, v17
+; GCN-NEXT: v_add_f32_e32 v20, 0x38000000, v20
+; GCN-NEXT: v_add_f32_e32 v16, 0x38000000, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v9, v8
+; GCN-NEXT: v_or_b32_e32 v6, v11, v10
+; GCN-NEXT: v_or_b32_e32 v7, v13, v12
+; GCN-NEXT: v_or_b32_e32 v8, v15, v14
+; GCN-NEXT: v_or_b32_e32 v9, v26, v24
+; GCN-NEXT: v_or_b32_e32 v10, v28, v27
+; GCN-NEXT: v_or_b32_e32 v11, v21, v29
+; GCN-NEXT: v_or_b32_e32 v12, v19, v25
+; GCN-NEXT: v_or_b32_e32 v13, v18, v23
+; GCN-NEXT: v_or_b32_e32 v14, v17, v22
+; GCN-NEXT: v_or_b32_e32 v15, v16, v20
+; GCN-NEXT: .LBB9_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f16_to_v16i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB9_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v16, 0x200
+; VI-NEXT: v_add_f16_sdwa v17, v15, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v15, 0x200, v15
+; VI-NEXT: v_or_b32_e32 v15, v15, v17
+; VI-NEXT: v_add_f16_sdwa v17, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v14, 0x200, v14
+; VI-NEXT: v_or_b32_e32 v14, v14, v17
+; VI-NEXT: v_add_f16_sdwa v17, v13, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v13, 0x200, v13
+; VI-NEXT: v_or_b32_e32 v13, v13, v17
+; VI-NEXT: v_add_f16_sdwa v17, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v12, 0x200, v12
+; VI-NEXT: v_or_b32_e32 v12, v12, v17
+; VI-NEXT: v_add_f16_sdwa v17, v11, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v11, 0x200, v11
+; VI-NEXT: v_or_b32_e32 v11, v11, v17
+; VI-NEXT: v_add_f16_sdwa v17, v10, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v10, 0x200, v10
+; VI-NEXT: v_or_b32_e32 v10, v10, v17
+; VI-NEXT: v_add_f16_sdwa v17, v9, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v9, 0x200, v9
+; VI-NEXT: v_or_b32_e32 v9, v9, v17
+; VI-NEXT: v_add_f16_sdwa v17, v8, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v8
+; VI-NEXT: v_or_b32_e32 v8, v8, v17
+; VI-NEXT: v_add_f16_sdwa v17, v7, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v17
+; VI-NEXT: v_add_f16_sdwa v17, v6, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v17
+; VI-NEXT: v_add_f16_sdwa v17, v5, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v17
+; VI-NEXT: v_add_f16_sdwa v17, v4, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v17
+; VI-NEXT: v_add_f16_sdwa v17, v3, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v17
+; VI-NEXT: v_add_f16_sdwa v17, v2, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v17
+; VI-NEXT: v_add_f16_sdwa v17, v1, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v16, v0, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v17
+; VI-NEXT: v_or_b32_e32 v0, v0, v16
+; VI-NEXT: .LBB9_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f16_to_v16i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB9_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v15, v15, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v14, v14, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v13, v13, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v12, v12, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v11, v11, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v10, v10, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v9, v9, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v8, v8, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB9_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f16_to_v16i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB9_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v15, 0x200, v15 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v14, 0x200, v14 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v13, 0x200, v13 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v12, 0x200, v12 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v11, 0x200, v11 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v10, 0x200, v10 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v9, 0x200, v9 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v8, 0x200, v8 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB9_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <32 x half> %a1 to <16 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x half> %a to <16 x i32>
+ br label %end
+
+end:
+ %phi = phi <16 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i32> %phi
+}
+
+define <32 x bfloat> @v_bitcast_v16i32_to_v32bf16(<16 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16i32_to_v32bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v55, v15
+; GCN-NEXT: v_mov_b32_e32 v54, v14
+; GCN-NEXT: v_mov_b32_e32 v53, v13
+; GCN-NEXT: v_mov_b32_e32 v52, v12
+; GCN-NEXT: v_mov_b32_e32 v51, v11
+; GCN-NEXT: v_mov_b32_e32 v50, v10
+; GCN-NEXT: v_mov_b32_e32 v49, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v39, v7
+; GCN-NEXT: v_mov_b32_e32 v38, v6
+; GCN-NEXT: v_mov_b32_e32 v37, v5
+; GCN-NEXT: v_mov_b32_e32 v36, v4
+; GCN-NEXT: v_mov_b32_e32 v35, v3
+; GCN-NEXT: v_mov_b32_e32 v34, v2
+; GCN-NEXT: v_mov_b32_e32 v33, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_4
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB10_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v55
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v54
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v53
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v52
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v52
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v51
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v50
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v50
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v49
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v48
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v48
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v39
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v38
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v38
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v36
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v36
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v34
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v34
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v33
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v33
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v32
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v32
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: .LBB10_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v39
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v48
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v49
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v50
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v51
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v52
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v53
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v54
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v55
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v15
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v14
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v13
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v12
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v11
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v10
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v9
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16i32_to_v32bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB10_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v15, vcc, 3, v15
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_add_u32_e32 v13, vcc, 3, v13
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_add_u32_e32 v11, vcc, 3, v11
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_add_u32_e32 v9, vcc, 3, v9
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_add_u32_e32 v7, vcc, 3, v7
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_add_u32_e32 v5, vcc, 3, v5
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_add_u32_e32 v3, vcc, 3, v3
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: .LBB10_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16i32_to_v32bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB10_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v15, 3, v15
+; GFX9-NEXT: v_add_u32_e32 v14, 3, v14
+; GFX9-NEXT: v_add_u32_e32 v13, 3, v13
+; GFX9-NEXT: v_add_u32_e32 v12, 3, v12
+; GFX9-NEXT: v_add_u32_e32 v11, 3, v11
+; GFX9-NEXT: v_add_u32_e32 v10, 3, v10
+; GFX9-NEXT: v_add_u32_e32 v9, 3, v9
+; GFX9-NEXT: v_add_u32_e32 v8, 3, v8
+; GFX9-NEXT: v_add_u32_e32 v7, 3, v7
+; GFX9-NEXT: v_add_u32_e32 v6, 3, v6
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v5
+; GFX9-NEXT: v_add_u32_e32 v4, 3, v4
+; GFX9-NEXT: v_add_u32_e32 v3, 3, v3
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: .LBB10_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16i32_to_v32bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB10_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v15, 3, v15
+; GFX11-NEXT: v_add_nc_u32_e32 v14, 3, v14
+; GFX11-NEXT: v_add_nc_u32_e32 v13, 3, v13
+; GFX11-NEXT: v_add_nc_u32_e32 v12, 3, v12
+; GFX11-NEXT: v_add_nc_u32_e32 v11, 3, v11
+; GFX11-NEXT: v_add_nc_u32_e32 v10, 3, v10
+; GFX11-NEXT: v_add_nc_u32_e32 v9, 3, v9
+; GFX11-NEXT: v_add_nc_u32_e32 v8, 3, v8
+; GFX11-NEXT: v_add_nc_u32_e32 v7, 3, v7
+; GFX11-NEXT: v_add_nc_u32_e32 v6, 3, v6
+; GFX11-NEXT: v_add_nc_u32_e32 v5, 3, v5
+; GFX11-NEXT: v_add_nc_u32_e32 v4, 3, v4
+; GFX11-NEXT: v_add_nc_u32_e32 v3, 3, v3
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: .LBB10_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <16 x i32> %a, splat (i32 3)
+ %a2 = bitcast <16 x i32> %a1 to <32 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x i32> %a to <32 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <32 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x bfloat> %phi
+}
+
+define <16 x i32> @v_bitcast_v32bf16_to_v16i32(<32 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32bf16_to_v16i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_mul_f32_e32 v44, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v45, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v42, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v43, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v41, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v51, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v40, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v49, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v55, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v39, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v54, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v37, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v53, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v36, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v52, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v34, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v50, 1.0, v17
+; GCN-NEXT: v_mul_f32_e32 v33, 1.0, v16
+; GCN-NEXT: v_mul_f32_e32 v48, 1.0, v19
+; GCN-NEXT: v_mul_f32_e32 v32, 1.0, v18
+; GCN-NEXT: v_mul_f32_e32 v38, 1.0, v21
+; GCN-NEXT: v_mul_f32_e32 v31, 1.0, v20
+; GCN-NEXT: v_mul_f32_e32 v35, 1.0, v23
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v22
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v24
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v27
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v26
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v29
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v46
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v44
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v42
+; GCN-NEXT: v_alignbit_b32 v0, v0, v45, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v43, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v52
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v50
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v2, v51, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v49, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v39, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v37, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v36, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v34, 16
+; GCN-NEXT: v_alignbit_b32 v8, v8, v33, 16
+; GCN-NEXT: v_alignbit_b32 v9, v9, v32, 16
+; GCN-NEXT: v_alignbit_b32 v10, v10, v31, 16
+; GCN-NEXT: v_alignbit_b32 v11, v11, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v12, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v13, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v14, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB11_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v45
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v44
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v43
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v51
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v41
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v49
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v40
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v39
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v55
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v54
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v36
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v53
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v34
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v52
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v33
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v50
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v32
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff0000, v48
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v31
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff0000, v38
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v35
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v23
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v20
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GCN-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GCN-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GCN-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GCN-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GCN-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GCN-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GCN-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v28, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v9, v8, 16
+; GCN-NEXT: v_alignbit_b32 v6, v11, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v13, v12, 16
+; GCN-NEXT: v_alignbit_b32 v8, v15, v14, 16
+; GCN-NEXT: v_alignbit_b32 v9, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v10, v28, v27, 16
+; GCN-NEXT: v_alignbit_b32 v11, v29, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v25, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v23, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v22, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v20, v16, 16
+; GCN-NEXT: .LBB11_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32bf16_to_v16i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB11_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v15, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v15
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; VI-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v14
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v14, 16, v14
+; VI-NEXT: v_alignbit_b32 v14, v14, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v13
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; VI-NEXT: v_alignbit_b32 v13, v13, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v12
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; VI-NEXT: v_alignbit_b32 v12, v12, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v11
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; VI-NEXT: v_alignbit_b32 v11, v11, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v10
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v10, 16, v10
+; VI-NEXT: v_alignbit_b32 v10, v10, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v9
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; VI-NEXT: v_alignbit_b32 v9, v9, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v8
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v8, 16, v8
+; VI-NEXT: v_alignbit_b32 v8, v8, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v7
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v6
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v5
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v4
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v3
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v2
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v1
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v0
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v16, 16
+; VI-NEXT: .LBB11_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32bf16_to_v16i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB11_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v15, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v15, v15, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v14, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v14, v14, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v13, v13, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v12, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v12, v12, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v11, v11, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v10, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v10, v10, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v9, v9, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v8, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v8, v8, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v7, v7, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v16, s7
+; GFX9-NEXT: .LBB11_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32bf16_to_v16i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB11_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v17, 16, v14
+; GFX11-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX11-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v17 :: v_dual_add_f32 v16, 0x40c00000, v16
+; GFX11-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v16, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v16
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v23, v14, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v16, v16
+; GFX11-NEXT: v_add3_u32 v21, v21, v17, 0x7fff
+; GFX11-NEXT: v_add3_u32 v18, v18, v16, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v18, v19, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v23, v14, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_bfe_u32 v20, v15, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v15
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v20, v20, v15, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v15, v20, v22 :: v_dual_lshlrev_b32 v20, 16, v13
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v15, v15, v16, 0x7060302
+; GFX11-NEXT: v_dual_cndmask_b32 v17, v21, v18 :: v_dual_add_f32 v18, 0x40c00000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v14, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_perm_b32 v14, v14, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_cndmask_b32 v16, v16, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v13, v13
+; GFX11-NEXT: v_add3_u32 v17, v17, v13, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cndmask_b32_e32 v13, v17, v21, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_perm_b32 v13, v13, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v11
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_bfe_u32 v18, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v18, v18, v12, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v10
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v12, v12, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v17, v17, v11, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_bfe_u32 v19, v10, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v19, v19, v10, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v10
+; GFX11-NEXT: v_perm_b32 v11, v11, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v19, v22 :: v_dual_lshlrev_b32 v21, 16, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_lshlrev_b32 v19, 16, v8
+; GFX11-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX11-NEXT: v_perm_b32 v10, v10, v17, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v8, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v9, 0x40c00000, v9
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v18, v18, v8, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v9
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_add3_u32 v17, v17, v9, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v8
+; GFX11-NEXT: v_perm_b32 v9, v9, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v6
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v8, v8, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v6, 0x40c00000, v6 :: v_dual_add_f32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_bfe_u32 v19, v6, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v19, v19, v6, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_add3_u32 v17, v17, v7, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v6
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_cndmask_b32 v17, v17, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v20, v18, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_perm_b32 v7, v7, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v17, 0x40c00000, v19
+; GFX11-NEXT: v_add3_u32 v19, v20, v18, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_bfe_u32 v22, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_lshlrev_b32_e32 v20, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_add3_u32 v16, v16, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v16, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v16, v22, v17, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v5, v5, v18, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v18, v4, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v19, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v18, v18, v4, 0x7fff
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v20 :: v_dual_lshlrev_b32 v20, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v18, v19, vcc_lo
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v20
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v17
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v4, v4, v16, 0x7060302
+; GFX11-NEXT: v_add3_u32 v19, v21, v17, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v21, v3, 16, 1
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v18
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v19, v20, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v21, v3, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v21, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GFX11-NEXT: v_bfe_u32 v24, v2, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v19, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v22
+; GFX11-NEXT: v_add3_u32 v20, v24, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v3, v3, v17, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v21, v23, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v20, v21 :: v_dual_lshlrev_b32 v23, 16, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v21, v22, v19, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add_f32_e32 v20, 0x40c00000, v23
+; GFX11-NEXT: v_perm_b32 v2, v2, v18, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v19, v21, v22 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v24, v20, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v21, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v24, v24, v20, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v21, v21, v0, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v23, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_add3_u32 v22, v23, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v22, v23, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v20, v20
+; GFX11-NEXT: v_perm_b32 v1, v1, v19, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v20, v24, v25, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v21, v26, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v20, 0x7060302
+; GFX11-NEXT: .LBB11_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <32 x bfloat> %a1 to <16 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x bfloat> %a to <16 x i32>
+ br label %end
+
+end:
+ %phi = phi <16 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x i32> %phi
+}
+
+define <8 x i64> @v_bitcast_v16f32_to_v8i64(<16 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f32_to_v8i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB12_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB12_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f32_to_v8i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB12_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB12_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f32_to_v8i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB12_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB12_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f32_to_v8i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB12_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB12_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <16 x float> %a1 to <8 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x float> %a to <8 x i64>
+ br label %end
+
+end:
+ %phi = phi <8 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i64> %phi
+}
+
+define <16 x float> @v_bitcast_v8i64_to_v16f32(<8 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i64_to_v16f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB13_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB13_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i64_to_v16f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB13_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB13_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i64_to_v16f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB13_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB13_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i64_to_v16f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB13_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB13_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i64> %a, splat (i64 3)
+ %a2 = bitcast <8 x i64> %a1 to <16 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i64> %a to <16 x float>
+ br label %end
+
+end:
+ %phi = phi <16 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x float> %phi
+}
+
+define <8 x double> @v_bitcast_v16f32_to_v8f64(<16 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f32_to_v8f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB14_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB14_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f32_to_v8f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB14_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB14_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f32_to_v8f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB14_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB14_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f32_to_v8f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB14_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB14_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <16 x float> %a1 to <8 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x float> %a to <8 x double>
+ br label %end
+
+end:
+ %phi = phi <8 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x double> %phi
+}
+
+define <16 x float> @v_bitcast_v8f64_to_v16f32(<8 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f64_to_v16f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB15_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB15_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f64_to_v16f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB15_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB15_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f64_to_v16f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB15_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB15_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f64_to_v16f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB15_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB15_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <8 x double> %a1 to <16 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x double> %a to <16 x float>
+ br label %end
+
+end:
+ %phi = phi <16 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x float> %phi
+}
+
+define <32 x i16> @v_bitcast_v16f32_to_v32i16(<16 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f32_to_v32i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v30, v15
+; GCN-NEXT: v_mov_b32_e32 v28, v14
+; GCN-NEXT: v_mov_b32_e32 v26, v13
+; GCN-NEXT: v_mov_b32_e32 v24, v12
+; GCN-NEXT: v_mov_b32_e32 v22, v11
+; GCN-NEXT: v_mov_b32_e32 v20, v10
+; GCN-NEXT: v_mov_b32_e32 v18, v9
+; GCN-NEXT: v_mov_b32_e32 v32, v8
+; GCN-NEXT: v_mov_b32_e32 v14, v7
+; GCN-NEXT: v_mov_b32_e32 v12, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v8, v4
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v4, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v29, v30, v28, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v20, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v32, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v8, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB16_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GCN-NEXT: v_add_f32_e32 v18, 1.0, v18
+; GCN-NEXT: v_add_f32_e32 v32, 1.0, v32
+; GCN-NEXT: v_add_f32_e32 v22, 1.0, v22
+; GCN-NEXT: v_add_f32_e32 v20, 1.0, v20
+; GCN-NEXT: v_add_f32_e32 v26, 1.0, v26
+; GCN-NEXT: v_add_f32_e32 v24, 1.0, v24
+; GCN-NEXT: v_add_f32_e32 v30, 1.0, v30
+; GCN-NEXT: v_add_f32_e32 v28, 1.0, v28
+; GCN-NEXT: v_alignbit_b32 v29, v30, v28, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v20, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v32, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v8, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB16_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v16, v32
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f32_to_v32i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB16_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB16_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f32_to_v32i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB16_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB16_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f32_to_v32i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB16_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB16_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <16 x float> %a1 to <32 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x float> %a to <32 x i16>
+ br label %end
+
+end:
+ %phi = phi <32 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i16> %phi
+}
+
+define <16 x float> @v_bitcast_v32i16_to_v16f32(<32 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i16_to_v16f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v38, v14
+; GCN-NEXT: v_mov_b32_e32 v37, v12
+; GCN-NEXT: v_mov_b32_e32 v36, v10
+; GCN-NEXT: v_mov_b32_e32 v35, v8
+; GCN-NEXT: v_mov_b32_e32 v34, v6
+; GCN-NEXT: v_mov_b32_e32 v33, v4
+; GCN-NEXT: v_mov_b32_e32 v32, v2
+; GCN-NEXT: v_mov_b32_e32 v31, v0
+; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32
+; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_lshlrev_b32_e32 v54, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v55, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v39, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v48, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v49, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v50, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v51, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v52, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v53, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_4
+; GCN-NEXT: .LBB17_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB17_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v31
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v32
+; GCN-NEXT: v_or_b32_e32 v0, v0, v54
+; GCN-NEXT: v_or_b32_e32 v1, v1, v55
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v33
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v34
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v36
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v37
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v38
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v16
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v18
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v22
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v24
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v26
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v28
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v30
+; GCN-NEXT: v_or_b32_e32 v2, v2, v39
+; GCN-NEXT: v_or_b32_e32 v3, v3, v48
+; GCN-NEXT: v_or_b32_e32 v4, v4, v49
+; GCN-NEXT: v_or_b32_e32 v5, v5, v50
+; GCN-NEXT: v_or_b32_e32 v6, v6, v51
+; GCN-NEXT: v_or_b32_e32 v7, v7, v52
+; GCN-NEXT: v_or_b32_e32 v8, v8, v17
+; GCN-NEXT: v_or_b32_e32 v9, v9, v19
+; GCN-NEXT: v_or_b32_e32 v10, v10, v21
+; GCN-NEXT: v_or_b32_e32 v11, v11, v23
+; GCN-NEXT: v_or_b32_e32 v12, v12, v25
+; GCN-NEXT: v_or_b32_e32 v13, v13, v27
+; GCN-NEXT: v_or_b32_e32 v14, v14, v29
+; GCN-NEXT: v_or_b32_e32 v15, v15, v53
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB17_2
+; GCN-NEXT: .LBB17_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v31
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v30
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v13
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v14
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v15
+; GCN-NEXT: v_or_b32_e32 v0, v54, v0
+; GCN-NEXT: v_or_b32_e32 v1, v55, v1
+; GCN-NEXT: v_or_b32_e32 v2, v39, v2
+; GCN-NEXT: v_or_b32_e32 v3, v48, v3
+; GCN-NEXT: v_or_b32_e32 v4, v49, v4
+; GCN-NEXT: v_or_b32_e32 v5, v50, v5
+; GCN-NEXT: v_or_b32_e32 v6, v51, v6
+; GCN-NEXT: v_or_b32_e32 v7, v52, v7
+; GCN-NEXT: v_or_b32_e32 v8, v17, v8
+; GCN-NEXT: v_or_b32_e32 v9, v19, v9
+; GCN-NEXT: v_or_b32_e32 v10, v21, v10
+; GCN-NEXT: v_or_b32_e32 v11, v23, v11
+; GCN-NEXT: v_or_b32_e32 v12, v25, v12
+; GCN-NEXT: v_or_b32_e32 v13, v27, v13
+; GCN-NEXT: v_or_b32_e32 v14, v29, v14
+; GCN-NEXT: v_or_b32_e32 v15, v53, v15
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, s6, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, s6, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, s6, v7
+; GCN-NEXT: v_add_i32_e32 v8, vcc, s6, v8
+; GCN-NEXT: v_add_i32_e32 v9, vcc, s6, v9
+; GCN-NEXT: v_add_i32_e32 v10, vcc, s6, v10
+; GCN-NEXT: v_add_i32_e32 v11, vcc, s6, v11
+; GCN-NEXT: v_add_i32_e32 v12, vcc, s6, v12
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 0x30000, v13
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 0x30000, v14
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 0x30000, v15
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i16_to_v16f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB17_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v17, 3
+; VI-NEXT: v_add_u16_e32 v16, 3, v15
+; VI-NEXT: v_add_u16_sdwa v15, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v15, v16, v15
+; VI-NEXT: v_add_u16_e32 v16, 3, v14
+; VI-NEXT: v_add_u16_sdwa v14, v14, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v14, v16, v14
+; VI-NEXT: v_add_u16_e32 v16, 3, v13
+; VI-NEXT: v_add_u16_sdwa v13, v13, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v13, v16, v13
+; VI-NEXT: v_add_u16_e32 v16, 3, v12
+; VI-NEXT: v_add_u16_sdwa v12, v12, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v12, v16, v12
+; VI-NEXT: v_add_u16_e32 v16, 3, v11
+; VI-NEXT: v_add_u16_sdwa v11, v11, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v11, v16, v11
+; VI-NEXT: v_add_u16_e32 v16, 3, v10
+; VI-NEXT: v_add_u16_sdwa v10, v10, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v10, v16, v10
+; VI-NEXT: v_add_u16_e32 v16, 3, v9
+; VI-NEXT: v_add_u16_sdwa v9, v9, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v9, v16, v9
+; VI-NEXT: v_add_u16_e32 v16, 3, v8
+; VI-NEXT: v_add_u16_sdwa v8, v8, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v8, v16, v8
+; VI-NEXT: v_add_u16_e32 v16, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v16, v7
+; VI-NEXT: v_add_u16_e32 v16, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v16, v6
+; VI-NEXT: v_add_u16_e32 v16, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v16, v5
+; VI-NEXT: v_add_u16_e32 v16, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v16, v4
+; VI-NEXT: v_add_u16_e32 v16, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v16, v3
+; VI-NEXT: v_add_u16_e32 v16, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v16, v2
+; VI-NEXT: v_add_u16_e32 v16, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v16, v1
+; VI-NEXT: v_add_u16_e32 v16, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v16, v0
+; VI-NEXT: .LBB17_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i16_to_v16f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB17_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB17_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i16_to_v16f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB17_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB17_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i16> %a, splat (i16 3)
+ %a2 = bitcast <32 x i16> %a1 to <16 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i16> %a to <16 x float>
+ br label %end
+
+end:
+ %phi = phi <16 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x float> %phi
+}
+
+define <32 x half> @v_bitcast_v16f32_to_v32f16(<16 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f32_to_v32f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 ; 4-byte Folded Spill
+; GCN-NEXT: v_mov_b32_e32 v33, v15
+; GCN-NEXT: v_mov_b32_e32 v34, v14
+; GCN-NEXT: v_mov_b32_e32 v35, v13
+; GCN-NEXT: v_mov_b32_e32 v36, v12
+; GCN-NEXT: v_mov_b32_e32 v37, v11
+; GCN-NEXT: v_mov_b32_e32 v38, v10
+; GCN-NEXT: v_mov_b32_e32 v39, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v49, v7
+; GCN-NEXT: v_mov_b32_e32 v50, v6
+; GCN-NEXT: v_mov_b32_e32 v51, v5
+; GCN-NEXT: v_mov_b32_e32 v52, v4
+; GCN-NEXT: v_mov_b32_e32 v53, v3
+; GCN-NEXT: v_mov_b32_e32 v54, v2
+; GCN-NEXT: v_mov_b32_e32 v55, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v37
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v39
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v49
+; GCN-NEXT: s_waitcnt expcnt(6)
+; GCN-NEXT: v_lshrrev_b32_e32 v40, 16, v50
+; GCN-NEXT: s_waitcnt expcnt(5)
+; GCN-NEXT: v_lshrrev_b32_e32 v41, 16, v51
+; GCN-NEXT: s_waitcnt expcnt(4)
+; GCN-NEXT: v_lshrrev_b32_e32 v42, 16, v52
+; GCN-NEXT: s_waitcnt expcnt(3)
+; GCN-NEXT: v_lshrrev_b32_e32 v43, 16, v53
+; GCN-NEXT: s_waitcnt expcnt(2)
+; GCN-NEXT: v_lshrrev_b32_e32 v44, 16, v54
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: v_lshrrev_b32_e32 v45, 16, v55
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: v_lshrrev_b32_e32 v46, 16, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v42
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v46
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v32
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: .LBB18_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v32
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v55
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v54
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v53
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v52
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v51
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v50
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v49
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v48
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v39
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v38
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v37
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v36
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v35
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v34
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v32, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v33, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v34, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v35, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v36, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v37, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v38, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v39, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v32
+; GCN-NEXT: .LBB18_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f32_to_v32f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB18_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB18_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f32_to_v32f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB18_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB18_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f32_to_v32f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB18_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB18_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <16 x float> %a1 to <32 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x float> %a to <32 x half>
+ br label %end
+
+end:
+ %phi = phi <32 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x half> %phi
+}
+
+define <16 x float> @v_bitcast_v32f16_to_v16f32(<32 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f16_to_v16f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_cvt_f16_f32_e32 v45, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v44, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v43, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v42, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v41, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v52, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v40, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v50, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v55, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v48, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v54, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v38, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v53, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v36, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v51, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v34, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v49, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v33, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v39, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v32, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v37, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v35, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v46
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB19_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v45
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v43
+; GCN-NEXT: v_or_b32_e32 v0, v44, v0
+; GCN-NEXT: v_or_b32_e32 v1, v42, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v52, v2
+; GCN-NEXT: v_or_b32_e32 v3, v50, v3
+; GCN-NEXT: v_or_b32_e32 v4, v48, v4
+; GCN-NEXT: v_or_b32_e32 v5, v38, v5
+; GCN-NEXT: v_or_b32_e32 v6, v36, v6
+; GCN-NEXT: v_or_b32_e32 v7, v34, v7
+; GCN-NEXT: v_or_b32_e32 v8, v33, v8
+; GCN-NEXT: v_or_b32_e32 v9, v32, v9
+; GCN-NEXT: v_or_b32_e32 v10, v31, v10
+; GCN-NEXT: v_or_b32_e32 v11, v21, v11
+; GCN-NEXT: v_or_b32_e32 v12, v19, v12
+; GCN-NEXT: v_or_b32_e32 v13, v18, v13
+; GCN-NEXT: v_or_b32_e32 v14, v17, v14
+; GCN-NEXT: v_or_b32_e32 v15, v16, v15
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB19_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB19_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x38000000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x38000000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x38000000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x38000000, v28
+; GCN-NEXT: v_add_f32_e32 v29, 0x38000000, v29
+; GCN-NEXT: v_add_f32_e32 v21, 0x38000000, v21
+; GCN-NEXT: v_add_f32_e32 v25, 0x38000000, v25
+; GCN-NEXT: v_add_f32_e32 v19, 0x38000000, v19
+; GCN-NEXT: v_add_f32_e32 v23, 0x38000000, v23
+; GCN-NEXT: v_add_f32_e32 v18, 0x38000000, v18
+; GCN-NEXT: v_add_f32_e32 v22, 0x38000000, v22
+; GCN-NEXT: v_add_f32_e32 v17, 0x38000000, v17
+; GCN-NEXT: v_add_f32_e32 v20, 0x38000000, v20
+; GCN-NEXT: v_add_f32_e32 v16, 0x38000000, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v9, v8
+; GCN-NEXT: v_or_b32_e32 v6, v11, v10
+; GCN-NEXT: v_or_b32_e32 v7, v13, v12
+; GCN-NEXT: v_or_b32_e32 v8, v15, v14
+; GCN-NEXT: v_or_b32_e32 v9, v26, v24
+; GCN-NEXT: v_or_b32_e32 v10, v28, v27
+; GCN-NEXT: v_or_b32_e32 v11, v21, v29
+; GCN-NEXT: v_or_b32_e32 v12, v19, v25
+; GCN-NEXT: v_or_b32_e32 v13, v18, v23
+; GCN-NEXT: v_or_b32_e32 v14, v17, v22
+; GCN-NEXT: v_or_b32_e32 v15, v16, v20
+; GCN-NEXT: .LBB19_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f16_to_v16f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB19_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v16, 0x200
+; VI-NEXT: v_add_f16_sdwa v17, v15, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v15, 0x200, v15
+; VI-NEXT: v_or_b32_e32 v15, v15, v17
+; VI-NEXT: v_add_f16_sdwa v17, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v14, 0x200, v14
+; VI-NEXT: v_or_b32_e32 v14, v14, v17
+; VI-NEXT: v_add_f16_sdwa v17, v13, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v13, 0x200, v13
+; VI-NEXT: v_or_b32_e32 v13, v13, v17
+; VI-NEXT: v_add_f16_sdwa v17, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v12, 0x200, v12
+; VI-NEXT: v_or_b32_e32 v12, v12, v17
+; VI-NEXT: v_add_f16_sdwa v17, v11, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v11, 0x200, v11
+; VI-NEXT: v_or_b32_e32 v11, v11, v17
+; VI-NEXT: v_add_f16_sdwa v17, v10, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v10, 0x200, v10
+; VI-NEXT: v_or_b32_e32 v10, v10, v17
+; VI-NEXT: v_add_f16_sdwa v17, v9, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v9, 0x200, v9
+; VI-NEXT: v_or_b32_e32 v9, v9, v17
+; VI-NEXT: v_add_f16_sdwa v17, v8, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v8
+; VI-NEXT: v_or_b32_e32 v8, v8, v17
+; VI-NEXT: v_add_f16_sdwa v17, v7, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v17
+; VI-NEXT: v_add_f16_sdwa v17, v6, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v17
+; VI-NEXT: v_add_f16_sdwa v17, v5, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v17
+; VI-NEXT: v_add_f16_sdwa v17, v4, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v17
+; VI-NEXT: v_add_f16_sdwa v17, v3, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v17
+; VI-NEXT: v_add_f16_sdwa v17, v2, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v17
+; VI-NEXT: v_add_f16_sdwa v17, v1, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v16, v0, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v17
+; VI-NEXT: v_or_b32_e32 v0, v0, v16
+; VI-NEXT: .LBB19_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f16_to_v16f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB19_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v15, v15, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v14, v14, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v13, v13, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v12, v12, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v11, v11, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v10, v10, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v9, v9, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v8, v8, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB19_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f16_to_v16f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB19_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v15, 0x200, v15 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v14, 0x200, v14 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v13, 0x200, v13 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v12, 0x200, v12 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v11, 0x200, v11 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v10, 0x200, v10 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v9, 0x200, v9 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v8, 0x200, v8 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB19_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <32 x half> %a1 to <16 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x half> %a to <16 x float>
+ br label %end
+
+end:
+ %phi = phi <16 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x float> %phi
+}
+
+define <32 x bfloat> @v_bitcast_v16f32_to_v32bf16(<16 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v16f32_to_v32bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v55, v15
+; GCN-NEXT: v_mov_b32_e32 v54, v14
+; GCN-NEXT: v_mov_b32_e32 v53, v13
+; GCN-NEXT: v_mov_b32_e32 v52, v12
+; GCN-NEXT: v_mov_b32_e32 v51, v11
+; GCN-NEXT: v_mov_b32_e32 v50, v10
+; GCN-NEXT: v_mov_b32_e32 v49, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v39, v7
+; GCN-NEXT: v_mov_b32_e32 v38, v6
+; GCN-NEXT: v_mov_b32_e32 v37, v5
+; GCN-NEXT: v_mov_b32_e32 v36, v4
+; GCN-NEXT: v_mov_b32_e32 v35, v3
+; GCN-NEXT: v_mov_b32_e32 v34, v2
+; GCN-NEXT: v_mov_b32_e32 v33, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB20_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB20_4
+; GCN-NEXT: .LBB20_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB20_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v55
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v54
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v53
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v52
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v52
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v51
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v50
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v50
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v49
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v48
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v48
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v39
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v38
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v38
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v36
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v36
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v34
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v34
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v33
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v33
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v32
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v32
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB20_2
+; GCN-NEXT: .LBB20_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v32
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v33
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v34
+; GCN-NEXT: v_add_f32_e32 v3, 1.0, v35
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v36
+; GCN-NEXT: v_add_f32_e32 v5, 1.0, v37
+; GCN-NEXT: v_add_f32_e32 v6, 1.0, v38
+; GCN-NEXT: v_add_f32_e32 v7, 1.0, v39
+; GCN-NEXT: v_add_f32_e32 v8, 1.0, v48
+; GCN-NEXT: v_add_f32_e32 v9, 1.0, v49
+; GCN-NEXT: v_add_f32_e32 v10, 1.0, v50
+; GCN-NEXT: v_add_f32_e32 v11, 1.0, v51
+; GCN-NEXT: v_add_f32_e32 v12, 1.0, v52
+; GCN-NEXT: v_add_f32_e32 v13, 1.0, v53
+; GCN-NEXT: v_add_f32_e32 v14, 1.0, v54
+; GCN-NEXT: v_add_f32_e32 v15, 1.0, v55
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v15
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v14
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v13
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v12
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v11
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v10
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v9
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v16f32_to_v32bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB20_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v15, 1.0, v15
+; VI-NEXT: v_add_f32_e32 v14, 1.0, v14
+; VI-NEXT: v_add_f32_e32 v13, 1.0, v13
+; VI-NEXT: v_add_f32_e32 v12, 1.0, v12
+; VI-NEXT: v_add_f32_e32 v11, 1.0, v11
+; VI-NEXT: v_add_f32_e32 v10, 1.0, v10
+; VI-NEXT: v_add_f32_e32 v9, 1.0, v9
+; VI-NEXT: v_add_f32_e32 v8, 1.0, v8
+; VI-NEXT: v_add_f32_e32 v7, 1.0, v7
+; VI-NEXT: v_add_f32_e32 v6, 1.0, v6
+; VI-NEXT: v_add_f32_e32 v5, 1.0, v5
+; VI-NEXT: v_add_f32_e32 v4, 1.0, v4
+; VI-NEXT: v_add_f32_e32 v3, 1.0, v3
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: .LBB20_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v16f32_to_v32bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB20_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v15, 1.0, v15
+; GFX9-NEXT: v_add_f32_e32 v14, 1.0, v14
+; GFX9-NEXT: v_add_f32_e32 v13, 1.0, v13
+; GFX9-NEXT: v_add_f32_e32 v12, 1.0, v12
+; GFX9-NEXT: v_add_f32_e32 v11, 1.0, v11
+; GFX9-NEXT: v_add_f32_e32 v10, 1.0, v10
+; GFX9-NEXT: v_add_f32_e32 v9, 1.0, v9
+; GFX9-NEXT: v_add_f32_e32 v8, 1.0, v8
+; GFX9-NEXT: v_add_f32_e32 v7, 1.0, v7
+; GFX9-NEXT: v_add_f32_e32 v6, 1.0, v6
+; GFX9-NEXT: v_add_f32_e32 v5, 1.0, v5
+; GFX9-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GFX9-NEXT: v_add_f32_e32 v3, 1.0, v3
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: .LBB20_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v16f32_to_v32bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB20_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v15, 1.0, v15 :: v_dual_add_f32 v14, 1.0, v14
+; GFX11-NEXT: v_dual_add_f32 v13, 1.0, v13 :: v_dual_add_f32 v12, 1.0, v12
+; GFX11-NEXT: v_dual_add_f32 v11, 1.0, v11 :: v_dual_add_f32 v10, 1.0, v10
+; GFX11-NEXT: v_dual_add_f32 v9, 1.0, v9 :: v_dual_add_f32 v8, 1.0, v8
+; GFX11-NEXT: v_dual_add_f32 v7, 1.0, v7 :: v_dual_add_f32 v6, 1.0, v6
+; GFX11-NEXT: v_dual_add_f32 v5, 1.0, v5 :: v_dual_add_f32 v4, 1.0, v4
+; GFX11-NEXT: v_dual_add_f32 v3, 1.0, v3 :: v_dual_add_f32 v2, 1.0, v2
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: .LBB20_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <16 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <16 x float> %a1 to <32 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <16 x float> %a to <32 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <32 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x bfloat> %phi
+}
+
+define <16 x float> @v_bitcast_v32bf16_to_v16f32(<32 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32bf16_to_v16f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_mul_f32_e32 v44, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v45, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v42, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v43, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v41, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v51, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v40, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v49, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v55, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v39, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v54, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v37, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v53, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v36, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v52, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v34, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v50, 1.0, v17
+; GCN-NEXT: v_mul_f32_e32 v33, 1.0, v16
+; GCN-NEXT: v_mul_f32_e32 v48, 1.0, v19
+; GCN-NEXT: v_mul_f32_e32 v32, 1.0, v18
+; GCN-NEXT: v_mul_f32_e32 v38, 1.0, v21
+; GCN-NEXT: v_mul_f32_e32 v31, 1.0, v20
+; GCN-NEXT: v_mul_f32_e32 v35, 1.0, v23
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v22
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v24
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v27
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v26
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v29
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v46
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB21_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v44
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v42
+; GCN-NEXT: v_alignbit_b32 v0, v0, v45, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v43, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v52
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v50
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v2, v51, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v49, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v39, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v37, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v36, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v34, 16
+; GCN-NEXT: v_alignbit_b32 v8, v8, v33, 16
+; GCN-NEXT: v_alignbit_b32 v9, v9, v32, 16
+; GCN-NEXT: v_alignbit_b32 v10, v10, v31, 16
+; GCN-NEXT: v_alignbit_b32 v11, v11, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v12, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v13, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v14, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB21_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB21_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v45
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v44
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v43
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v51
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v41
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v49
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v40
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v39
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v55
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v54
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v36
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v53
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v34
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v52
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v33
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v50
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v32
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff0000, v48
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v31
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff0000, v38
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v35
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v23
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v20
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GCN-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GCN-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GCN-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GCN-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GCN-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GCN-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GCN-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v28, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v9, v8, 16
+; GCN-NEXT: v_alignbit_b32 v6, v11, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v13, v12, 16
+; GCN-NEXT: v_alignbit_b32 v8, v15, v14, 16
+; GCN-NEXT: v_alignbit_b32 v9, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v10, v28, v27, 16
+; GCN-NEXT: v_alignbit_b32 v11, v29, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v25, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v23, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v22, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v20, v16, 16
+; GCN-NEXT: .LBB21_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32bf16_to_v16f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB21_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v15, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v15
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; VI-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v14
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v14, 16, v14
+; VI-NEXT: v_alignbit_b32 v14, v14, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v13
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; VI-NEXT: v_alignbit_b32 v13, v13, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v12
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; VI-NEXT: v_alignbit_b32 v12, v12, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v11
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; VI-NEXT: v_alignbit_b32 v11, v11, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v10
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v10, 16, v10
+; VI-NEXT: v_alignbit_b32 v10, v10, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v9
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; VI-NEXT: v_alignbit_b32 v9, v9, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v8
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v8, 16, v8
+; VI-NEXT: v_alignbit_b32 v8, v8, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v7
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v6
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v5
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v4
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v3
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v2
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v1
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v0
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v16, 16
+; VI-NEXT: .LBB21_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32bf16_to_v16f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB21_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v15, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v15, v15, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v14, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v14, v14, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v13, v13, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v12, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v12, v12, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v11, v11, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v10, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v10, v10, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v9, v9, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v8, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v8, v8, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v7, v7, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v16, s7
+; GFX9-NEXT: .LBB21_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32bf16_to_v16f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB21_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v17, 16, v14
+; GFX11-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX11-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v17 :: v_dual_add_f32 v16, 0x40c00000, v16
+; GFX11-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v16, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v16
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v23, v14, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v16, v16
+; GFX11-NEXT: v_add3_u32 v21, v21, v17, 0x7fff
+; GFX11-NEXT: v_add3_u32 v18, v18, v16, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v18, v19, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v23, v14, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_bfe_u32 v20, v15, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v15
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v20, v20, v15, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v15, v20, v22 :: v_dual_lshlrev_b32 v20, 16, v13
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v15, v15, v16, 0x7060302
+; GFX11-NEXT: v_dual_cndmask_b32 v17, v21, v18 :: v_dual_add_f32 v18, 0x40c00000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v14, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_perm_b32 v14, v14, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_cndmask_b32 v16, v16, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v13, v13
+; GFX11-NEXT: v_add3_u32 v17, v17, v13, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cndmask_b32_e32 v13, v17, v21, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_perm_b32 v13, v13, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v11
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_bfe_u32 v18, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v18, v18, v12, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v10
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v12, v12, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v17, v17, v11, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_bfe_u32 v19, v10, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v19, v19, v10, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v10
+; GFX11-NEXT: v_perm_b32 v11, v11, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v19, v22 :: v_dual_lshlrev_b32 v21, 16, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_lshlrev_b32 v19, 16, v8
+; GFX11-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX11-NEXT: v_perm_b32 v10, v10, v17, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v8, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v9, 0x40c00000, v9
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v18, v18, v8, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v9
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_add3_u32 v17, v17, v9, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v8
+; GFX11-NEXT: v_perm_b32 v9, v9, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v6
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v8, v8, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v6, 0x40c00000, v6 :: v_dual_add_f32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_bfe_u32 v19, v6, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v19, v19, v6, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_add3_u32 v17, v17, v7, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v6
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_cndmask_b32 v17, v17, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v20, v18, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_perm_b32 v7, v7, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v17, 0x40c00000, v19
+; GFX11-NEXT: v_add3_u32 v19, v20, v18, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_bfe_u32 v22, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_lshlrev_b32_e32 v20, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_add3_u32 v16, v16, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v16, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v16, v22, v17, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v5, v5, v18, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v18, v4, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v19, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v18, v18, v4, 0x7fff
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v20 :: v_dual_lshlrev_b32 v20, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v18, v19, vcc_lo
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v20
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v17
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v4, v4, v16, 0x7060302
+; GFX11-NEXT: v_add3_u32 v19, v21, v17, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v21, v3, 16, 1
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v18
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v19, v20, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v21, v3, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v21, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GFX11-NEXT: v_bfe_u32 v24, v2, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v19, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v22
+; GFX11-NEXT: v_add3_u32 v20, v24, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v3, v3, v17, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v21, v23, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v20, v21 :: v_dual_lshlrev_b32 v23, 16, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v21, v22, v19, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add_f32_e32 v20, 0x40c00000, v23
+; GFX11-NEXT: v_perm_b32 v2, v2, v18, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v19, v21, v22 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v24, v20, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v21, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v24, v24, v20, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v21, v21, v0, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v23, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_add3_u32 v22, v23, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v22, v23, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v20, v20
+; GFX11-NEXT: v_perm_b32 v1, v1, v19, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v20, v24, v25, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v21, v26, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v20, 0x7060302
+; GFX11-NEXT: .LBB21_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <32 x bfloat> %a1 to <16 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x bfloat> %a to <16 x float>
+ br label %end
+
+end:
+ %phi = phi <16 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <16 x float> %phi
+}
+
+define <8 x double> @v_bitcast_v8i64_to_v8f64(<8 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i64_to_v8f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB22_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v6
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v10
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v14
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; GCN-NEXT: .LBB22_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i64_to_v8f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB22_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: .LBB22_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i64_to_v8f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB22_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: .LBB22_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i64_to_v8f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB22_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: .LBB22_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i64> %a, splat (i64 3)
+ %a2 = bitcast <8 x i64> %a1 to <8 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i64> %a to <8 x double>
+ br label %end
+
+end:
+ %phi = phi <8 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x double> %phi
+}
+
+define <8 x i64> @v_bitcast_v8f64_to_v8i64(<8 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f64_to_v8i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB23_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: .LBB23_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f64_to_v8i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB23_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: .LBB23_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f64_to_v8i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB23_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: .LBB23_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f64_to_v8i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB23_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: .LBB23_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <8 x double> %a1 to <8 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x double> %a to <8 x i64>
+ br label %end
+
+end:
+ %phi = phi <8 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i64> %phi
+}
+
+define <32 x i16> @v_bitcast_v8i64_to_v32i16(<8 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i64_to_v32i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v30, v15
+; GCN-NEXT: v_mov_b32_e32 v28, v14
+; GCN-NEXT: v_mov_b32_e32 v26, v13
+; GCN-NEXT: v_mov_b32_e32 v24, v12
+; GCN-NEXT: v_mov_b32_e32 v22, v11
+; GCN-NEXT: v_mov_b32_e32 v20, v10
+; GCN-NEXT: v_mov_b32_e32 v18, v9
+; GCN-NEXT: v_mov_b32_e32 v32, v8
+; GCN-NEXT: v_mov_b32_e32 v14, v7
+; GCN-NEXT: v_mov_b32_e32 v12, v6
+; GCN-NEXT: v_mov_b32_e32 v10, v5
+; GCN-NEXT: v_mov_b32_e32 v8, v4
+; GCN-NEXT: v_mov_b32_e32 v6, v3
+; GCN-NEXT: v_mov_b32_e32 v4, v2
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v29, v30, v28, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v20, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v32, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v8, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB24_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v2, vcc, 0, v2, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v6, vcc, 0, v6, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v8
+; GCN-NEXT: v_addc_u32_e32 v10, vcc, 0, v10, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v12
+; GCN-NEXT: v_addc_u32_e32 v14, vcc, 0, v14, vcc
+; GCN-NEXT: v_add_i32_e32 v32, vcc, 3, v32
+; GCN-NEXT: v_addc_u32_e32 v18, vcc, 0, v18, vcc
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v20
+; GCN-NEXT: v_addc_u32_e32 v22, vcc, 0, v22, vcc
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v24
+; GCN-NEXT: v_addc_u32_e32 v26, vcc, 0, v26, vcc
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v28
+; GCN-NEXT: v_addc_u32_e32 v30, vcc, 0, v30, vcc
+; GCN-NEXT: v_alignbit_b32 v29, v30, v28, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v20, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v32, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v12, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v8, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GCN-NEXT: .LBB24_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v16, v32
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i64_to_v32i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB24_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB24_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i64_to_v32i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB24_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB24_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i64_to_v32i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB24_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB24_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i64> %a, splat (i64 3)
+ %a2 = bitcast <8 x i64> %a1 to <32 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i64> %a to <32 x i16>
+ br label %end
+
+end:
+ %phi = phi <32 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i16> %phi
+}
+
+define <8 x i64> @v_bitcast_v32i16_to_v8i64(<32 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i16_to_v8i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v38, v14
+; GCN-NEXT: v_mov_b32_e32 v37, v12
+; GCN-NEXT: v_mov_b32_e32 v36, v10
+; GCN-NEXT: v_mov_b32_e32 v35, v8
+; GCN-NEXT: v_mov_b32_e32 v34, v6
+; GCN-NEXT: v_mov_b32_e32 v33, v4
+; GCN-NEXT: v_mov_b32_e32 v32, v2
+; GCN-NEXT: v_mov_b32_e32 v31, v0
+; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32
+; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_lshlrev_b32_e32 v54, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v55, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v39, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v48, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v49, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v50, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v51, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v52, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v53, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_4
+; GCN-NEXT: .LBB25_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB25_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v31
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v32
+; GCN-NEXT: v_or_b32_e32 v0, v0, v54
+; GCN-NEXT: v_or_b32_e32 v1, v1, v55
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v33
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v34
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v36
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v37
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v38
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v16
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v18
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v22
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v24
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v26
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v28
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v30
+; GCN-NEXT: v_or_b32_e32 v2, v2, v39
+; GCN-NEXT: v_or_b32_e32 v3, v3, v48
+; GCN-NEXT: v_or_b32_e32 v4, v4, v49
+; GCN-NEXT: v_or_b32_e32 v5, v5, v50
+; GCN-NEXT: v_or_b32_e32 v6, v6, v51
+; GCN-NEXT: v_or_b32_e32 v7, v7, v52
+; GCN-NEXT: v_or_b32_e32 v8, v8, v17
+; GCN-NEXT: v_or_b32_e32 v9, v9, v19
+; GCN-NEXT: v_or_b32_e32 v10, v10, v21
+; GCN-NEXT: v_or_b32_e32 v11, v11, v23
+; GCN-NEXT: v_or_b32_e32 v12, v12, v25
+; GCN-NEXT: v_or_b32_e32 v13, v13, v27
+; GCN-NEXT: v_or_b32_e32 v14, v14, v29
+; GCN-NEXT: v_or_b32_e32 v15, v15, v53
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB25_2
+; GCN-NEXT: .LBB25_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v31
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v30
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v13
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v14
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v15
+; GCN-NEXT: v_or_b32_e32 v0, v54, v0
+; GCN-NEXT: v_or_b32_e32 v1, v55, v1
+; GCN-NEXT: v_or_b32_e32 v2, v39, v2
+; GCN-NEXT: v_or_b32_e32 v3, v48, v3
+; GCN-NEXT: v_or_b32_e32 v4, v49, v4
+; GCN-NEXT: v_or_b32_e32 v5, v50, v5
+; GCN-NEXT: v_or_b32_e32 v6, v51, v6
+; GCN-NEXT: v_or_b32_e32 v7, v52, v7
+; GCN-NEXT: v_or_b32_e32 v8, v17, v8
+; GCN-NEXT: v_or_b32_e32 v9, v19, v9
+; GCN-NEXT: v_or_b32_e32 v10, v21, v10
+; GCN-NEXT: v_or_b32_e32 v11, v23, v11
+; GCN-NEXT: v_or_b32_e32 v12, v25, v12
+; GCN-NEXT: v_or_b32_e32 v13, v27, v13
+; GCN-NEXT: v_or_b32_e32 v14, v29, v14
+; GCN-NEXT: v_or_b32_e32 v15, v53, v15
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, s6, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, s6, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, s6, v7
+; GCN-NEXT: v_add_i32_e32 v8, vcc, s6, v8
+; GCN-NEXT: v_add_i32_e32 v9, vcc, s6, v9
+; GCN-NEXT: v_add_i32_e32 v10, vcc, s6, v10
+; GCN-NEXT: v_add_i32_e32 v11, vcc, s6, v11
+; GCN-NEXT: v_add_i32_e32 v12, vcc, s6, v12
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 0x30000, v13
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 0x30000, v14
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 0x30000, v15
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i16_to_v8i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB25_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v17, 3
+; VI-NEXT: v_add_u16_e32 v16, 3, v15
+; VI-NEXT: v_add_u16_sdwa v15, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v15, v16, v15
+; VI-NEXT: v_add_u16_e32 v16, 3, v14
+; VI-NEXT: v_add_u16_sdwa v14, v14, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v14, v16, v14
+; VI-NEXT: v_add_u16_e32 v16, 3, v13
+; VI-NEXT: v_add_u16_sdwa v13, v13, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v13, v16, v13
+; VI-NEXT: v_add_u16_e32 v16, 3, v12
+; VI-NEXT: v_add_u16_sdwa v12, v12, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v12, v16, v12
+; VI-NEXT: v_add_u16_e32 v16, 3, v11
+; VI-NEXT: v_add_u16_sdwa v11, v11, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v11, v16, v11
+; VI-NEXT: v_add_u16_e32 v16, 3, v10
+; VI-NEXT: v_add_u16_sdwa v10, v10, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v10, v16, v10
+; VI-NEXT: v_add_u16_e32 v16, 3, v9
+; VI-NEXT: v_add_u16_sdwa v9, v9, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v9, v16, v9
+; VI-NEXT: v_add_u16_e32 v16, 3, v8
+; VI-NEXT: v_add_u16_sdwa v8, v8, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v8, v16, v8
+; VI-NEXT: v_add_u16_e32 v16, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v16, v7
+; VI-NEXT: v_add_u16_e32 v16, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v16, v6
+; VI-NEXT: v_add_u16_e32 v16, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v16, v5
+; VI-NEXT: v_add_u16_e32 v16, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v16, v4
+; VI-NEXT: v_add_u16_e32 v16, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v16, v3
+; VI-NEXT: v_add_u16_e32 v16, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v16, v2
+; VI-NEXT: v_add_u16_e32 v16, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v16, v1
+; VI-NEXT: v_add_u16_e32 v16, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v16, v0
+; VI-NEXT: .LBB25_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i16_to_v8i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB25_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB25_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i16_to_v8i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB25_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB25_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i16> %a, splat (i16 3)
+ %a2 = bitcast <32 x i16> %a1 to <8 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i16> %a to <8 x i64>
+ br label %end
+
+end:
+ %phi = phi <8 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i64> %phi
+}
+
+define <32 x half> @v_bitcast_v8i64_to_v32f16(<8 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i64_to_v32f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 ; 4-byte Folded Spill
+; GCN-NEXT: v_mov_b32_e32 v34, v15
+; GCN-NEXT: v_mov_b32_e32 v33, v14
+; GCN-NEXT: v_mov_b32_e32 v36, v13
+; GCN-NEXT: v_mov_b32_e32 v35, v12
+; GCN-NEXT: v_mov_b32_e32 v38, v11
+; GCN-NEXT: v_mov_b32_e32 v37, v10
+; GCN-NEXT: v_mov_b32_e32 v48, v9
+; GCN-NEXT: v_mov_b32_e32 v39, v8
+; GCN-NEXT: v_mov_b32_e32 v50, v7
+; GCN-NEXT: v_mov_b32_e32 v49, v6
+; GCN-NEXT: v_mov_b32_e32 v52, v5
+; GCN-NEXT: v_mov_b32_e32 v51, v4
+; GCN-NEXT: v_mov_b32_e32 v54, v3
+; GCN-NEXT: v_mov_b32_e32 v53, v2
+; GCN-NEXT: v_mov_b32_e32 v55, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB26_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v37
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v39
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v50
+; GCN-NEXT: s_waitcnt expcnt(6)
+; GCN-NEXT: v_lshrrev_b32_e32 v40, 16, v49
+; GCN-NEXT: s_waitcnt expcnt(5)
+; GCN-NEXT: v_lshrrev_b32_e32 v41, 16, v52
+; GCN-NEXT: s_waitcnt expcnt(4)
+; GCN-NEXT: v_lshrrev_b32_e32 v42, 16, v51
+; GCN-NEXT: s_waitcnt expcnt(3)
+; GCN-NEXT: v_lshrrev_b32_e32 v43, 16, v54
+; GCN-NEXT: s_waitcnt expcnt(2)
+; GCN-NEXT: v_lshrrev_b32_e32 v44, 16, v53
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: v_lshrrev_b32_e32 v45, 16, v55
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: v_lshrrev_b32_e32 v46, 16, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v42
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v46
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v32
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: .LBB26_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB26_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v32
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v55, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v53
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v54, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v51
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v52, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v49
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v50, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v39
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v48, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v37
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v38, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v35
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v36, vcc
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v33
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v34, vcc
+; GCN-NEXT: v_lshrrev_b32_e32 v32, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v33, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v34, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v35, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v36, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v37, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v38, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v39, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v32
+; GCN-NEXT: .LBB26_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i64_to_v32f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB26_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB26_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i64_to_v32f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB26_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB26_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i64_to_v32f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB26_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB26_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i64> %a, splat (i64 3)
+ %a2 = bitcast <8 x i64> %a1 to <32 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i64> %a to <32 x half>
+ br label %end
+
+end:
+ %phi = phi <32 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x half> %phi
+}
+
+define <8 x i64> @v_bitcast_v32f16_to_v8i64(<32 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f16_to_v8i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_cvt_f16_f32_e32 v45, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v44, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v43, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v42, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v41, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v52, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v40, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v50, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v55, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v48, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v54, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v38, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v53, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v36, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v51, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v34, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v49, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v33, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v39, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v32, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v37, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v35, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v46
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB27_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v45
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v43
+; GCN-NEXT: v_or_b32_e32 v0, v44, v0
+; GCN-NEXT: v_or_b32_e32 v1, v42, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v52, v2
+; GCN-NEXT: v_or_b32_e32 v3, v50, v3
+; GCN-NEXT: v_or_b32_e32 v4, v48, v4
+; GCN-NEXT: v_or_b32_e32 v5, v38, v5
+; GCN-NEXT: v_or_b32_e32 v6, v36, v6
+; GCN-NEXT: v_or_b32_e32 v7, v34, v7
+; GCN-NEXT: v_or_b32_e32 v8, v33, v8
+; GCN-NEXT: v_or_b32_e32 v9, v32, v9
+; GCN-NEXT: v_or_b32_e32 v10, v31, v10
+; GCN-NEXT: v_or_b32_e32 v11, v21, v11
+; GCN-NEXT: v_or_b32_e32 v12, v19, v12
+; GCN-NEXT: v_or_b32_e32 v13, v18, v13
+; GCN-NEXT: v_or_b32_e32 v14, v17, v14
+; GCN-NEXT: v_or_b32_e32 v15, v16, v15
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB27_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB27_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x38000000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x38000000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x38000000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x38000000, v28
+; GCN-NEXT: v_add_f32_e32 v29, 0x38000000, v29
+; GCN-NEXT: v_add_f32_e32 v21, 0x38000000, v21
+; GCN-NEXT: v_add_f32_e32 v25, 0x38000000, v25
+; GCN-NEXT: v_add_f32_e32 v19, 0x38000000, v19
+; GCN-NEXT: v_add_f32_e32 v23, 0x38000000, v23
+; GCN-NEXT: v_add_f32_e32 v18, 0x38000000, v18
+; GCN-NEXT: v_add_f32_e32 v22, 0x38000000, v22
+; GCN-NEXT: v_add_f32_e32 v17, 0x38000000, v17
+; GCN-NEXT: v_add_f32_e32 v20, 0x38000000, v20
+; GCN-NEXT: v_add_f32_e32 v16, 0x38000000, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v9, v8
+; GCN-NEXT: v_or_b32_e32 v6, v11, v10
+; GCN-NEXT: v_or_b32_e32 v7, v13, v12
+; GCN-NEXT: v_or_b32_e32 v8, v15, v14
+; GCN-NEXT: v_or_b32_e32 v9, v26, v24
+; GCN-NEXT: v_or_b32_e32 v10, v28, v27
+; GCN-NEXT: v_or_b32_e32 v11, v21, v29
+; GCN-NEXT: v_or_b32_e32 v12, v19, v25
+; GCN-NEXT: v_or_b32_e32 v13, v18, v23
+; GCN-NEXT: v_or_b32_e32 v14, v17, v22
+; GCN-NEXT: v_or_b32_e32 v15, v16, v20
+; GCN-NEXT: .LBB27_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f16_to_v8i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB27_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v16, 0x200
+; VI-NEXT: v_add_f16_sdwa v17, v15, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v15, 0x200, v15
+; VI-NEXT: v_or_b32_e32 v15, v15, v17
+; VI-NEXT: v_add_f16_sdwa v17, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v14, 0x200, v14
+; VI-NEXT: v_or_b32_e32 v14, v14, v17
+; VI-NEXT: v_add_f16_sdwa v17, v13, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v13, 0x200, v13
+; VI-NEXT: v_or_b32_e32 v13, v13, v17
+; VI-NEXT: v_add_f16_sdwa v17, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v12, 0x200, v12
+; VI-NEXT: v_or_b32_e32 v12, v12, v17
+; VI-NEXT: v_add_f16_sdwa v17, v11, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v11, 0x200, v11
+; VI-NEXT: v_or_b32_e32 v11, v11, v17
+; VI-NEXT: v_add_f16_sdwa v17, v10, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v10, 0x200, v10
+; VI-NEXT: v_or_b32_e32 v10, v10, v17
+; VI-NEXT: v_add_f16_sdwa v17, v9, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v9, 0x200, v9
+; VI-NEXT: v_or_b32_e32 v9, v9, v17
+; VI-NEXT: v_add_f16_sdwa v17, v8, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v8
+; VI-NEXT: v_or_b32_e32 v8, v8, v17
+; VI-NEXT: v_add_f16_sdwa v17, v7, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v17
+; VI-NEXT: v_add_f16_sdwa v17, v6, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v17
+; VI-NEXT: v_add_f16_sdwa v17, v5, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v17
+; VI-NEXT: v_add_f16_sdwa v17, v4, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v17
+; VI-NEXT: v_add_f16_sdwa v17, v3, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v17
+; VI-NEXT: v_add_f16_sdwa v17, v2, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v17
+; VI-NEXT: v_add_f16_sdwa v17, v1, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v16, v0, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v17
+; VI-NEXT: v_or_b32_e32 v0, v0, v16
+; VI-NEXT: .LBB27_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f16_to_v8i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB27_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v15, v15, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v14, v14, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v13, v13, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v12, v12, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v11, v11, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v10, v10, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v9, v9, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v8, v8, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB27_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f16_to_v8i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB27_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v15, 0x200, v15 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v14, 0x200, v14 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v13, 0x200, v13 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v12, 0x200, v12 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v11, 0x200, v11 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v10, 0x200, v10 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v9, 0x200, v9 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v8, 0x200, v8 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB27_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <32 x half> %a1 to <8 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x half> %a to <8 x i64>
+ br label %end
+
+end:
+ %phi = phi <8 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i64> %phi
+}
+
+define <32 x bfloat> @v_bitcast_v8i64_to_v32bf16(<8 x i64> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8i64_to_v32bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v55, v15
+; GCN-NEXT: v_mov_b32_e32 v54, v14
+; GCN-NEXT: v_mov_b32_e32 v53, v13
+; GCN-NEXT: v_mov_b32_e32 v52, v12
+; GCN-NEXT: v_mov_b32_e32 v51, v11
+; GCN-NEXT: v_mov_b32_e32 v50, v10
+; GCN-NEXT: v_mov_b32_e32 v49, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v39, v7
+; GCN-NEXT: v_mov_b32_e32 v38, v6
+; GCN-NEXT: v_mov_b32_e32 v37, v5
+; GCN-NEXT: v_mov_b32_e32 v36, v4
+; GCN-NEXT: v_mov_b32_e32 v35, v3
+; GCN-NEXT: v_mov_b32_e32 v34, v2
+; GCN-NEXT: v_mov_b32_e32 v33, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_4
+; GCN-NEXT: .LBB28_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB28_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v55
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v54
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v53
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v52
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v52
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v51
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v50
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v50
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v49
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v48
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v48
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v39
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v38
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v38
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v36
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v36
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v34
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v34
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v33
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v33
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v32
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v32
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB28_2
+; GCN-NEXT: .LBB28_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v32
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v33, vcc
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v34
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v35, vcc
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v36
+; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v37, vcc
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v38
+; GCN-NEXT: v_addc_u32_e32 v7, vcc, 0, v39, vcc
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v48
+; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v49, vcc
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v50
+; GCN-NEXT: v_addc_u32_e32 v11, vcc, 0, v51, vcc
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v52
+; GCN-NEXT: v_addc_u32_e32 v13, vcc, 0, v53, vcc
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v54
+; GCN-NEXT: v_addc_u32_e32 v15, vcc, 0, v55, vcc
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v15
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v14
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v13
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v12
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v11
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v10
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v9
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v7
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v6
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v5
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v3
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8i64_to_v32bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB28_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v14, vcc, 3, v14
+; VI-NEXT: v_addc_u32_e32 v15, vcc, 0, v15, vcc
+; VI-NEXT: v_add_u32_e32 v12, vcc, 3, v12
+; VI-NEXT: v_addc_u32_e32 v13, vcc, 0, v13, vcc
+; VI-NEXT: v_add_u32_e32 v10, vcc, 3, v10
+; VI-NEXT: v_addc_u32_e32 v11, vcc, 0, v11, vcc
+; VI-NEXT: v_add_u32_e32 v8, vcc, 3, v8
+; VI-NEXT: v_addc_u32_e32 v9, vcc, 0, v9, vcc
+; VI-NEXT: v_add_u32_e32 v6, vcc, 3, v6
+; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v7, vcc
+; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v4
+; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v5, vcc
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: .LBB28_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8i64_to_v32bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB28_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v14, vcc, 3, v14
+; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v15, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, 3, v12
+; GFX9-NEXT: v_addc_co_u32_e32 v13, vcc, 0, v13, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, 3, v10
+; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, 3, v8
+; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, 3, v6
+; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v7, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, 3, v4
+; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, 3, v2
+; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: .LBB28_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8i64_to_v32bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB28_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v14, vcc_lo, v14, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v15, vcc_lo, 0, v15, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v12, vcc_lo, v12, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v13, vcc_lo, 0, v13, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v10, vcc_lo, v10, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v11, vcc_lo, 0, v11, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v8, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, 0, v9, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v4, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, 0, v5, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v3, vcc_lo
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: .LBB28_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <8 x i64> %a, splat (i64 3)
+ %a2 = bitcast <8 x i64> %a1 to <32 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x i64> %a to <32 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <32 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x bfloat> %phi
+}
+
+define <8 x i64> @v_bitcast_v32bf16_to_v8i64(<32 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32bf16_to_v8i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_mul_f32_e32 v44, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v45, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v42, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v43, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v41, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v51, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v40, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v49, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v55, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v39, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v54, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v37, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v53, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v36, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v52, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v34, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v50, 1.0, v17
+; GCN-NEXT: v_mul_f32_e32 v33, 1.0, v16
+; GCN-NEXT: v_mul_f32_e32 v48, 1.0, v19
+; GCN-NEXT: v_mul_f32_e32 v32, 1.0, v18
+; GCN-NEXT: v_mul_f32_e32 v38, 1.0, v21
+; GCN-NEXT: v_mul_f32_e32 v31, 1.0, v20
+; GCN-NEXT: v_mul_f32_e32 v35, 1.0, v23
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v22
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v24
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v27
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v26
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v29
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v46
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB29_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v44
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v42
+; GCN-NEXT: v_alignbit_b32 v0, v0, v45, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v43, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v52
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v50
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v2, v51, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v49, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v39, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v37, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v36, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v34, 16
+; GCN-NEXT: v_alignbit_b32 v8, v8, v33, 16
+; GCN-NEXT: v_alignbit_b32 v9, v9, v32, 16
+; GCN-NEXT: v_alignbit_b32 v10, v10, v31, 16
+; GCN-NEXT: v_alignbit_b32 v11, v11, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v12, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v13, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v14, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB29_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB29_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v45
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v44
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v43
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v51
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v41
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v49
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v40
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v39
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v55
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v54
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v36
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v53
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v34
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v52
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v33
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v50
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v32
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff0000, v48
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v31
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff0000, v38
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v35
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v23
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v20
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GCN-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GCN-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GCN-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GCN-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GCN-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GCN-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GCN-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v28, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v9, v8, 16
+; GCN-NEXT: v_alignbit_b32 v6, v11, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v13, v12, 16
+; GCN-NEXT: v_alignbit_b32 v8, v15, v14, 16
+; GCN-NEXT: v_alignbit_b32 v9, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v10, v28, v27, 16
+; GCN-NEXT: v_alignbit_b32 v11, v29, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v25, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v23, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v22, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v20, v16, 16
+; GCN-NEXT: .LBB29_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32bf16_to_v8i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB29_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v15, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v15
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; VI-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v14
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v14, 16, v14
+; VI-NEXT: v_alignbit_b32 v14, v14, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v13
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; VI-NEXT: v_alignbit_b32 v13, v13, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v12
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; VI-NEXT: v_alignbit_b32 v12, v12, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v11
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; VI-NEXT: v_alignbit_b32 v11, v11, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v10
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v10, 16, v10
+; VI-NEXT: v_alignbit_b32 v10, v10, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v9
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; VI-NEXT: v_alignbit_b32 v9, v9, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v8
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v8, 16, v8
+; VI-NEXT: v_alignbit_b32 v8, v8, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v7
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v6
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v5
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v4
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v3
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v2
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v1
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v0
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v16, 16
+; VI-NEXT: .LBB29_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32bf16_to_v8i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB29_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v15, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v15, v15, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v14, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v14, v14, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v13, v13, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v12, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v12, v12, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v11, v11, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v10, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v10, v10, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v9, v9, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v8, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v8, v8, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v7, v7, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v16, s7
+; GFX9-NEXT: .LBB29_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32bf16_to_v8i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB29_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v17, 16, v14
+; GFX11-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX11-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v17 :: v_dual_add_f32 v16, 0x40c00000, v16
+; GFX11-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v16, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v16
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v23, v14, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v16, v16
+; GFX11-NEXT: v_add3_u32 v21, v21, v17, 0x7fff
+; GFX11-NEXT: v_add3_u32 v18, v18, v16, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v18, v19, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v23, v14, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_bfe_u32 v20, v15, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v15
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v20, v20, v15, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v15, v20, v22 :: v_dual_lshlrev_b32 v20, 16, v13
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v15, v15, v16, 0x7060302
+; GFX11-NEXT: v_dual_cndmask_b32 v17, v21, v18 :: v_dual_add_f32 v18, 0x40c00000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v14, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_perm_b32 v14, v14, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_cndmask_b32 v16, v16, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v13, v13
+; GFX11-NEXT: v_add3_u32 v17, v17, v13, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cndmask_b32_e32 v13, v17, v21, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_perm_b32 v13, v13, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v11
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_bfe_u32 v18, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v18, v18, v12, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v10
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v12, v12, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v17, v17, v11, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_bfe_u32 v19, v10, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v19, v19, v10, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v10
+; GFX11-NEXT: v_perm_b32 v11, v11, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v19, v22 :: v_dual_lshlrev_b32 v21, 16, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_lshlrev_b32 v19, 16, v8
+; GFX11-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX11-NEXT: v_perm_b32 v10, v10, v17, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v8, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v9, 0x40c00000, v9
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v18, v18, v8, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v9
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_add3_u32 v17, v17, v9, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v8
+; GFX11-NEXT: v_perm_b32 v9, v9, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v6
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v8, v8, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v6, 0x40c00000, v6 :: v_dual_add_f32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_bfe_u32 v19, v6, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v19, v19, v6, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_add3_u32 v17, v17, v7, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v6
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_cndmask_b32 v17, v17, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v20, v18, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_perm_b32 v7, v7, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v17, 0x40c00000, v19
+; GFX11-NEXT: v_add3_u32 v19, v20, v18, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_bfe_u32 v22, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_lshlrev_b32_e32 v20, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_add3_u32 v16, v16, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v16, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v16, v22, v17, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v5, v5, v18, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v18, v4, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v19, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v18, v18, v4, 0x7fff
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v20 :: v_dual_lshlrev_b32 v20, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v18, v19, vcc_lo
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v20
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v17
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v4, v4, v16, 0x7060302
+; GFX11-NEXT: v_add3_u32 v19, v21, v17, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v21, v3, 16, 1
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v18
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v19, v20, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v21, v3, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v21, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GFX11-NEXT: v_bfe_u32 v24, v2, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v19, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v22
+; GFX11-NEXT: v_add3_u32 v20, v24, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v3, v3, v17, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v21, v23, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v20, v21 :: v_dual_lshlrev_b32 v23, 16, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v21, v22, v19, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add_f32_e32 v20, 0x40c00000, v23
+; GFX11-NEXT: v_perm_b32 v2, v2, v18, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v19, v21, v22 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v24, v20, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v21, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v24, v24, v20, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v21, v21, v0, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v23, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_add3_u32 v22, v23, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v22, v23, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v20, v20
+; GFX11-NEXT: v_perm_b32 v1, v1, v19, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v20, v24, v25, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v21, v26, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v20, 0x7060302
+; GFX11-NEXT: .LBB29_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <32 x bfloat> %a1 to <8 x i64>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x bfloat> %a to <8 x i64>
+ br label %end
+
+end:
+ %phi = phi <8 x i64> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x i64> %phi
+}
+
+define <32 x i16> @v_bitcast_v8f64_to_v32i16(<8 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f64_to_v32i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v55, v15
+; GCN-NEXT: v_mov_b32_e32 v54, v14
+; GCN-NEXT: v_mov_b32_e32 v53, v13
+; GCN-NEXT: v_mov_b32_e32 v52, v12
+; GCN-NEXT: v_mov_b32_e32 v51, v11
+; GCN-NEXT: v_mov_b32_e32 v50, v10
+; GCN-NEXT: v_mov_b32_e32 v49, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v38, v7
+; GCN-NEXT: v_mov_b32_e32 v37, v6
+; GCN-NEXT: v_mov_b32_e32 v36, v5
+; GCN-NEXT: v_mov_b32_e32 v35, v4
+; GCN-NEXT: v_mov_b32_e32 v34, v3
+; GCN-NEXT: v_mov_b32_e32 v33, v2
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v29, v55, v54, 16
+; GCN-NEXT: v_alignbit_b32 v25, v53, v52, 16
+; GCN-NEXT: v_alignbit_b32 v21, v51, v50, 16
+; GCN-NEXT: v_alignbit_b32 v17, v49, v48, 16
+; GCN-NEXT: v_alignbit_b32 v13, v38, v37, 16
+; GCN-NEXT: v_alignbit_b32 v9, v36, v35, 16
+; GCN-NEXT: v_alignbit_b32 v5, v34, v33, 16
+; GCN-NEXT: v_alignbit_b32 v32, v1, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v51
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v49
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: .LBB30_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[33:34], v[33:34], 1.0
+; GCN-NEXT: v_add_f64 v[35:36], v[35:36], 1.0
+; GCN-NEXT: v_add_f64 v[37:38], v[37:38], 1.0
+; GCN-NEXT: v_add_f64 v[48:49], v[48:49], 1.0
+; GCN-NEXT: v_add_f64 v[50:51], v[50:51], 1.0
+; GCN-NEXT: v_add_f64 v[52:53], v[52:53], 1.0
+; GCN-NEXT: v_add_f64 v[54:55], v[54:55], 1.0
+; GCN-NEXT: v_alignbit_b32 v29, v55, v54, 16
+; GCN-NEXT: v_alignbit_b32 v25, v53, v52, 16
+; GCN-NEXT: v_alignbit_b32 v21, v51, v50, 16
+; GCN-NEXT: v_alignbit_b32 v17, v49, v48, 16
+; GCN-NEXT: v_alignbit_b32 v13, v38, v37, 16
+; GCN-NEXT: v_alignbit_b32 v9, v36, v35, 16
+; GCN-NEXT: v_alignbit_b32 v5, v34, v33, 16
+; GCN-NEXT: v_alignbit_b32 v32, v1, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v51
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v49
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: .LBB30_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v2, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v33
+; GCN-NEXT: v_mov_b32_e32 v6, v34
+; GCN-NEXT: v_mov_b32_e32 v8, v35
+; GCN-NEXT: v_mov_b32_e32 v10, v36
+; GCN-NEXT: v_mov_b32_e32 v12, v37
+; GCN-NEXT: v_mov_b32_e32 v14, v38
+; GCN-NEXT: v_mov_b32_e32 v16, v48
+; GCN-NEXT: v_mov_b32_e32 v18, v49
+; GCN-NEXT: v_mov_b32_e32 v20, v50
+; GCN-NEXT: v_mov_b32_e32 v22, v51
+; GCN-NEXT: v_mov_b32_e32 v24, v52
+; GCN-NEXT: v_mov_b32_e32 v26, v53
+; GCN-NEXT: v_mov_b32_e32 v28, v54
+; GCN-NEXT: v_mov_b32_e32 v30, v55
+; GCN-NEXT: v_mov_b32_e32 v1, v32
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f64_to_v32i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB30_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB30_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f64_to_v32i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB30_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB30_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f64_to_v32i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB30_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB30_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <8 x double> %a1 to <32 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x double> %a to <32 x i16>
+ br label %end
+
+end:
+ %phi = phi <32 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i16> %phi
+}
+
+define <8 x double> @v_bitcast_v32i16_to_v8f64(<32 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i16_to_v8f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v38, v14
+; GCN-NEXT: v_mov_b32_e32 v37, v12
+; GCN-NEXT: v_mov_b32_e32 v36, v10
+; GCN-NEXT: v_mov_b32_e32 v35, v8
+; GCN-NEXT: v_mov_b32_e32 v34, v6
+; GCN-NEXT: v_mov_b32_e32 v33, v4
+; GCN-NEXT: v_mov_b32_e32 v32, v2
+; GCN-NEXT: v_mov_b32_e32 v31, v0
+; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32
+; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_lshlrev_b32_e32 v54, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v55, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v39, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v48, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v49, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v50, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v51, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v52, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v53, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_4
+; GCN-NEXT: .LBB31_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB31_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v31
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v32
+; GCN-NEXT: v_or_b32_e32 v0, v0, v54
+; GCN-NEXT: v_or_b32_e32 v1, v1, v55
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v33
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v34
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v35
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v36
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v37
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v38
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v16
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v18
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v22
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v24
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v26
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v28
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v30
+; GCN-NEXT: v_or_b32_e32 v2, v2, v39
+; GCN-NEXT: v_or_b32_e32 v3, v3, v48
+; GCN-NEXT: v_or_b32_e32 v4, v4, v49
+; GCN-NEXT: v_or_b32_e32 v5, v5, v50
+; GCN-NEXT: v_or_b32_e32 v6, v6, v51
+; GCN-NEXT: v_or_b32_e32 v7, v7, v52
+; GCN-NEXT: v_or_b32_e32 v8, v8, v17
+; GCN-NEXT: v_or_b32_e32 v9, v9, v19
+; GCN-NEXT: v_or_b32_e32 v10, v10, v21
+; GCN-NEXT: v_or_b32_e32 v11, v11, v23
+; GCN-NEXT: v_or_b32_e32 v12, v12, v25
+; GCN-NEXT: v_or_b32_e32 v13, v13, v27
+; GCN-NEXT: v_or_b32_e32 v14, v14, v29
+; GCN-NEXT: v_or_b32_e32 v15, v15, v53
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB31_2
+; GCN-NEXT: .LBB31_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v31
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v32
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v16
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v18
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v20
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v22
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v24
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v26
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v28
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v30
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff, v7
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff, v9
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff, v11
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff, v13
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v14
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff, v15
+; GCN-NEXT: v_or_b32_e32 v0, v54, v0
+; GCN-NEXT: v_or_b32_e32 v1, v55, v1
+; GCN-NEXT: v_or_b32_e32 v2, v39, v2
+; GCN-NEXT: v_or_b32_e32 v3, v48, v3
+; GCN-NEXT: v_or_b32_e32 v4, v49, v4
+; GCN-NEXT: v_or_b32_e32 v5, v50, v5
+; GCN-NEXT: v_or_b32_e32 v6, v51, v6
+; GCN-NEXT: v_or_b32_e32 v7, v52, v7
+; GCN-NEXT: v_or_b32_e32 v8, v17, v8
+; GCN-NEXT: v_or_b32_e32 v9, v19, v9
+; GCN-NEXT: v_or_b32_e32 v10, v21, v10
+; GCN-NEXT: v_or_b32_e32 v11, v23, v11
+; GCN-NEXT: v_or_b32_e32 v12, v25, v12
+; GCN-NEXT: v_or_b32_e32 v13, v27, v13
+; GCN-NEXT: v_or_b32_e32 v14, v29, v14
+; GCN-NEXT: v_or_b32_e32 v15, v53, v15
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, s6, v1
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v3, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v5, vcc, s6, v5
+; GCN-NEXT: v_add_i32_e32 v6, vcc, s6, v6
+; GCN-NEXT: v_add_i32_e32 v7, vcc, s6, v7
+; GCN-NEXT: v_add_i32_e32 v8, vcc, s6, v8
+; GCN-NEXT: v_add_i32_e32 v9, vcc, s6, v9
+; GCN-NEXT: v_add_i32_e32 v10, vcc, s6, v10
+; GCN-NEXT: v_add_i32_e32 v11, vcc, s6, v11
+; GCN-NEXT: v_add_i32_e32 v12, vcc, s6, v12
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 0x30000, v13
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 0x30000, v14
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 0x30000, v15
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i16_to_v8f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB31_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v17, 3
+; VI-NEXT: v_add_u16_e32 v16, 3, v15
+; VI-NEXT: v_add_u16_sdwa v15, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v15, v16, v15
+; VI-NEXT: v_add_u16_e32 v16, 3, v14
+; VI-NEXT: v_add_u16_sdwa v14, v14, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v14, v16, v14
+; VI-NEXT: v_add_u16_e32 v16, 3, v13
+; VI-NEXT: v_add_u16_sdwa v13, v13, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v13, v16, v13
+; VI-NEXT: v_add_u16_e32 v16, 3, v12
+; VI-NEXT: v_add_u16_sdwa v12, v12, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v12, v16, v12
+; VI-NEXT: v_add_u16_e32 v16, 3, v11
+; VI-NEXT: v_add_u16_sdwa v11, v11, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v11, v16, v11
+; VI-NEXT: v_add_u16_e32 v16, 3, v10
+; VI-NEXT: v_add_u16_sdwa v10, v10, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v10, v16, v10
+; VI-NEXT: v_add_u16_e32 v16, 3, v9
+; VI-NEXT: v_add_u16_sdwa v9, v9, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v9, v16, v9
+; VI-NEXT: v_add_u16_e32 v16, 3, v8
+; VI-NEXT: v_add_u16_sdwa v8, v8, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v8, v16, v8
+; VI-NEXT: v_add_u16_e32 v16, 3, v7
+; VI-NEXT: v_add_u16_sdwa v7, v7, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v16, v7
+; VI-NEXT: v_add_u16_e32 v16, 3, v6
+; VI-NEXT: v_add_u16_sdwa v6, v6, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v16, v6
+; VI-NEXT: v_add_u16_e32 v16, 3, v5
+; VI-NEXT: v_add_u16_sdwa v5, v5, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v16, v5
+; VI-NEXT: v_add_u16_e32 v16, 3, v4
+; VI-NEXT: v_add_u16_sdwa v4, v4, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v16, v4
+; VI-NEXT: v_add_u16_e32 v16, 3, v3
+; VI-NEXT: v_add_u16_sdwa v3, v3, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v3, v16, v3
+; VI-NEXT: v_add_u16_e32 v16, 3, v2
+; VI-NEXT: v_add_u16_sdwa v2, v2, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v2, v16, v2
+; VI-NEXT: v_add_u16_e32 v16, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v16, v1
+; VI-NEXT: v_add_u16_e32 v16, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v16, v0
+; VI-NEXT: .LBB31_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i16_to_v8f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB31_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB31_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i16_to_v8f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB31_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB31_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i16> %a, splat (i16 3)
+ %a2 = bitcast <32 x i16> %a1 to <8 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i16> %a to <8 x double>
+ br label %end
+
+end:
+ %phi = phi <8 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x double> %phi
+}
+
+define <32 x half> @v_bitcast_v8f64_to_v32f16(<8 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f64_to_v32f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 ; 4-byte Folded Spill
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v39, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v48, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v49, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v50, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v51, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v52, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v53, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v54, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v55, 16, v3
+; GCN-NEXT: s_waitcnt expcnt(2)
+; GCN-NEXT: v_lshrrev_b32_e32 v40, 16, v2
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: v_lshrrev_b32_e32 v41, 16, v1
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: v_lshrrev_b32_e32 v42, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v38, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v37, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v36, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v35, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v34, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v33, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v32, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v48, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v49, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v50, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v51, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v52, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v53, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v54, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v55, v42
+; GCN-NEXT: v_cvt_f32_f16_e32 v39, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: .LBB32_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_lshrrev_b32_e32 v55, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v54, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v53, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v52, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v51, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v50, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v49, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v48, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v38, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v37, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v36, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v35, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v34, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v33, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v32, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v39, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v48, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v49, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v50, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v51, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v52, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v53, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v54, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v55, v55
+; GCN-NEXT: .LBB32_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v39
+; GCN-NEXT: v_mov_b32_e32 v1, v55
+; GCN-NEXT: v_mov_b32_e32 v2, v32
+; GCN-NEXT: v_mov_b32_e32 v3, v54
+; GCN-NEXT: v_mov_b32_e32 v4, v33
+; GCN-NEXT: v_mov_b32_e32 v5, v53
+; GCN-NEXT: v_mov_b32_e32 v6, v34
+; GCN-NEXT: v_mov_b32_e32 v7, v52
+; GCN-NEXT: v_mov_b32_e32 v8, v35
+; GCN-NEXT: v_mov_b32_e32 v9, v51
+; GCN-NEXT: v_mov_b32_e32 v10, v36
+; GCN-NEXT: v_mov_b32_e32 v11, v50
+; GCN-NEXT: v_mov_b32_e32 v12, v37
+; GCN-NEXT: v_mov_b32_e32 v13, v49
+; GCN-NEXT: v_mov_b32_e32 v14, v38
+; GCN-NEXT: v_mov_b32_e32 v15, v48
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f64_to_v32f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB32_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB32_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f64_to_v32f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB32_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB32_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f64_to_v32f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB32_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB32_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <8 x double> %a1 to <32 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x double> %a to <32 x half>
+ br label %end
+
+end:
+ %phi = phi <32 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x half> %phi
+}
+
+define <8 x double> @v_bitcast_v32f16_to_v8f64(<32 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f16_to_v8f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_cvt_f16_f32_e32 v45, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v44, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v43, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v42, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v41, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v52, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v40, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v50, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v55, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v48, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v54, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v38, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v53, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v36, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v51, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v34, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v49, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v33, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v39, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v32, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v37, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v35, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v46
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB33_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v45
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v43
+; GCN-NEXT: v_or_b32_e32 v0, v44, v0
+; GCN-NEXT: v_or_b32_e32 v1, v42, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v52, v2
+; GCN-NEXT: v_or_b32_e32 v3, v50, v3
+; GCN-NEXT: v_or_b32_e32 v4, v48, v4
+; GCN-NEXT: v_or_b32_e32 v5, v38, v5
+; GCN-NEXT: v_or_b32_e32 v6, v36, v6
+; GCN-NEXT: v_or_b32_e32 v7, v34, v7
+; GCN-NEXT: v_or_b32_e32 v8, v33, v8
+; GCN-NEXT: v_or_b32_e32 v9, v32, v9
+; GCN-NEXT: v_or_b32_e32 v10, v31, v10
+; GCN-NEXT: v_or_b32_e32 v11, v21, v11
+; GCN-NEXT: v_or_b32_e32 v12, v19, v12
+; GCN-NEXT: v_or_b32_e32 v13, v18, v13
+; GCN-NEXT: v_or_b32_e32 v14, v17, v14
+; GCN-NEXT: v_or_b32_e32 v15, v16, v15
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB33_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB33_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x38000000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x38000000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x38000000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x38000000, v28
+; GCN-NEXT: v_add_f32_e32 v29, 0x38000000, v29
+; GCN-NEXT: v_add_f32_e32 v21, 0x38000000, v21
+; GCN-NEXT: v_add_f32_e32 v25, 0x38000000, v25
+; GCN-NEXT: v_add_f32_e32 v19, 0x38000000, v19
+; GCN-NEXT: v_add_f32_e32 v23, 0x38000000, v23
+; GCN-NEXT: v_add_f32_e32 v18, 0x38000000, v18
+; GCN-NEXT: v_add_f32_e32 v22, 0x38000000, v22
+; GCN-NEXT: v_add_f32_e32 v17, 0x38000000, v17
+; GCN-NEXT: v_add_f32_e32 v20, 0x38000000, v20
+; GCN-NEXT: v_add_f32_e32 v16, 0x38000000, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_or_b32_e32 v2, v3, v2
+; GCN-NEXT: v_or_b32_e32 v3, v5, v4
+; GCN-NEXT: v_or_b32_e32 v4, v7, v6
+; GCN-NEXT: v_or_b32_e32 v5, v9, v8
+; GCN-NEXT: v_or_b32_e32 v6, v11, v10
+; GCN-NEXT: v_or_b32_e32 v7, v13, v12
+; GCN-NEXT: v_or_b32_e32 v8, v15, v14
+; GCN-NEXT: v_or_b32_e32 v9, v26, v24
+; GCN-NEXT: v_or_b32_e32 v10, v28, v27
+; GCN-NEXT: v_or_b32_e32 v11, v21, v29
+; GCN-NEXT: v_or_b32_e32 v12, v19, v25
+; GCN-NEXT: v_or_b32_e32 v13, v18, v23
+; GCN-NEXT: v_or_b32_e32 v14, v17, v22
+; GCN-NEXT: v_or_b32_e32 v15, v16, v20
+; GCN-NEXT: .LBB33_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f16_to_v8f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB33_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v16, 0x200
+; VI-NEXT: v_add_f16_sdwa v17, v15, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v15, 0x200, v15
+; VI-NEXT: v_or_b32_e32 v15, v15, v17
+; VI-NEXT: v_add_f16_sdwa v17, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v14, 0x200, v14
+; VI-NEXT: v_or_b32_e32 v14, v14, v17
+; VI-NEXT: v_add_f16_sdwa v17, v13, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v13, 0x200, v13
+; VI-NEXT: v_or_b32_e32 v13, v13, v17
+; VI-NEXT: v_add_f16_sdwa v17, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v12, 0x200, v12
+; VI-NEXT: v_or_b32_e32 v12, v12, v17
+; VI-NEXT: v_add_f16_sdwa v17, v11, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v11, 0x200, v11
+; VI-NEXT: v_or_b32_e32 v11, v11, v17
+; VI-NEXT: v_add_f16_sdwa v17, v10, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v10, 0x200, v10
+; VI-NEXT: v_or_b32_e32 v10, v10, v17
+; VI-NEXT: v_add_f16_sdwa v17, v9, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v9, 0x200, v9
+; VI-NEXT: v_or_b32_e32 v9, v9, v17
+; VI-NEXT: v_add_f16_sdwa v17, v8, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v8, 0x200, v8
+; VI-NEXT: v_or_b32_e32 v8, v8, v17
+; VI-NEXT: v_add_f16_sdwa v17, v7, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v7, 0x200, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v17
+; VI-NEXT: v_add_f16_sdwa v17, v6, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v6, 0x200, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v17
+; VI-NEXT: v_add_f16_sdwa v17, v5, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v5, 0x200, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v17
+; VI-NEXT: v_add_f16_sdwa v17, v4, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v4
+; VI-NEXT: v_or_b32_e32 v4, v4, v17
+; VI-NEXT: v_add_f16_sdwa v17, v3, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v17
+; VI-NEXT: v_add_f16_sdwa v17, v2, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v2
+; VI-NEXT: v_or_b32_e32 v2, v2, v17
+; VI-NEXT: v_add_f16_sdwa v17, v1, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v16, v0, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v17
+; VI-NEXT: v_or_b32_e32 v0, v0, v16
+; VI-NEXT: .LBB33_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f16_to_v8f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB33_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v15, v15, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v14, v14, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v13, v13, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v12, v12, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v11, v11, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v10, v10, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v9, v9, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v8, v8, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB33_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f16_to_v8f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB33_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v15, 0x200, v15 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v14, 0x200, v14 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v13, 0x200, v13 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v12, 0x200, v12 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v11, 0x200, v11 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v10, 0x200, v10 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v9, 0x200, v9 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v8, 0x200, v8 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB33_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <32 x half> %a1 to <8 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x half> %a to <8 x double>
+ br label %end
+
+end:
+ %phi = phi <8 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x double> %phi
+}
+
+define <32 x bfloat> @v_bitcast_v8f64_to_v32bf16(<8 x double> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v8f64_to_v32bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v15
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v14
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v13
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v12
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v11
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v10
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v9
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GCN-NEXT: v_and_b32_e32 v32, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v33, 16, v7
+; GCN-NEXT: v_and_b32_e32 v34, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v35, 16, v6
+; GCN-NEXT: v_and_b32_e32 v36, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v37, 16, v5
+; GCN-NEXT: v_and_b32_e32 v38, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v39, 16, v4
+; GCN-NEXT: v_and_b32_e32 v48, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v49, 16, v3
+; GCN-NEXT: v_and_b32_e32 v50, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v51, 16, v2
+; GCN-NEXT: v_and_b32_e32 v52, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v53, 16, v1
+; GCN-NEXT: v_and_b32_e32 v54, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v55, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: .LBB34_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GCN-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GCN-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GCN-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GCN-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v15
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v14
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v13
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v12
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v11
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v10
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v9
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GCN-NEXT: v_and_b32_e32 v32, 0xffff0000, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v33, 16, v7
+; GCN-NEXT: v_and_b32_e32 v34, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v35, 16, v6
+; GCN-NEXT: v_and_b32_e32 v36, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v37, 16, v5
+; GCN-NEXT: v_and_b32_e32 v38, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v39, 16, v4
+; GCN-NEXT: v_and_b32_e32 v48, 0xffff0000, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v49, 16, v3
+; GCN-NEXT: v_and_b32_e32 v50, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v51, 16, v2
+; GCN-NEXT: v_and_b32_e32 v52, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v53, 16, v1
+; GCN-NEXT: v_and_b32_e32 v54, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v55, 16, v0
+; GCN-NEXT: .LBB34_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v55
+; GCN-NEXT: v_mov_b32_e32 v1, v54
+; GCN-NEXT: v_mov_b32_e32 v2, v53
+; GCN-NEXT: v_mov_b32_e32 v3, v52
+; GCN-NEXT: v_mov_b32_e32 v4, v51
+; GCN-NEXT: v_mov_b32_e32 v5, v50
+; GCN-NEXT: v_mov_b32_e32 v6, v49
+; GCN-NEXT: v_mov_b32_e32 v7, v48
+; GCN-NEXT: v_mov_b32_e32 v8, v39
+; GCN-NEXT: v_mov_b32_e32 v9, v38
+; GCN-NEXT: v_mov_b32_e32 v10, v37
+; GCN-NEXT: v_mov_b32_e32 v11, v36
+; GCN-NEXT: v_mov_b32_e32 v12, v35
+; GCN-NEXT: v_mov_b32_e32 v13, v34
+; GCN-NEXT: v_mov_b32_e32 v14, v33
+; GCN-NEXT: v_mov_b32_e32 v15, v32
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v8f64_to_v32bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB34_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; VI-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; VI-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; VI-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; VI-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; VI-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; VI-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: .LBB34_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v8f64_to_v32bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB34_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX9-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX9-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX9-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX9-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX9-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX9-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: .LBB34_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v8f64_to_v32bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB34_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[14:15], v[14:15], 1.0
+; GFX11-NEXT: v_add_f64 v[12:13], v[12:13], 1.0
+; GFX11-NEXT: v_add_f64 v[10:11], v[10:11], 1.0
+; GFX11-NEXT: v_add_f64 v[8:9], v[8:9], 1.0
+; GFX11-NEXT: v_add_f64 v[6:7], v[6:7], 1.0
+; GFX11-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GFX11-NEXT: v_add_f64 v[2:3], v[2:3], 1.0
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB34_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <8 x double> %a, splat (double 1.000000e+00)
+ %a2 = bitcast <8 x double> %a1 to <32 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <8 x double> %a to <32 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <32 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x bfloat> %phi
+}
+
+define <8 x double> @v_bitcast_v32bf16_to_v8f64(<32 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32bf16_to_v8f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_mul_f32_e32 v44, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v45, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v42, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v43, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v41, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v51, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v40, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v49, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v55, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v39, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v54, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v37, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v53, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v36, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v52, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v34, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v50, 1.0, v17
+; GCN-NEXT: v_mul_f32_e32 v33, 1.0, v16
+; GCN-NEXT: v_mul_f32_e32 v48, 1.0, v19
+; GCN-NEXT: v_mul_f32_e32 v32, 1.0, v18
+; GCN-NEXT: v_mul_f32_e32 v38, 1.0, v21
+; GCN-NEXT: v_mul_f32_e32 v31, 1.0, v20
+; GCN-NEXT: v_mul_f32_e32 v35, 1.0, v23
+; GCN-NEXT: v_mul_f32_e32 v21, 1.0, v22
+; GCN-NEXT: v_mul_f32_e32 v25, 1.0, v25
+; GCN-NEXT: v_mul_f32_e32 v19, 1.0, v24
+; GCN-NEXT: v_mul_f32_e32 v23, 1.0, v27
+; GCN-NEXT: v_mul_f32_e32 v18, 1.0, v26
+; GCN-NEXT: v_mul_f32_e32 v22, 1.0, v29
+; GCN-NEXT: v_mul_f32_e32 v17, 1.0, v28
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v47
+; GCN-NEXT: v_mul_f32_e32 v20, 1.0, v46
+; GCN-NEXT: v_mul_f32_e32 v16, 1.0, v30
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB35_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v44
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v42
+; GCN-NEXT: v_alignbit_b32 v0, v0, v45, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v43, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v41
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v40
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v54
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v52
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v50
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v2, v51, 16
+; GCN-NEXT: v_alignbit_b32 v3, v3, v49, 16
+; GCN-NEXT: v_alignbit_b32 v4, v4, v39, 16
+; GCN-NEXT: v_alignbit_b32 v5, v5, v37, 16
+; GCN-NEXT: v_alignbit_b32 v6, v6, v36, 16
+; GCN-NEXT: v_alignbit_b32 v7, v7, v34, 16
+; GCN-NEXT: v_alignbit_b32 v8, v8, v33, 16
+; GCN-NEXT: v_alignbit_b32 v9, v9, v32, 16
+; GCN-NEXT: v_alignbit_b32 v10, v10, v31, 16
+; GCN-NEXT: v_alignbit_b32 v11, v11, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v12, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v13, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v14, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: .LBB35_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB35_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v45
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v44
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v43
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v42
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v51
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v41
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v49
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v40
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v39
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v55
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v37
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v54
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v36
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v53
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v34
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v52
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v33
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v50
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v32
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff0000, v48
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v31
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff0000, v38
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v21
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v35
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v19
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v25
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v18
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v23
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v17
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v22
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v16
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v20
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GCN-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GCN-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GCN-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GCN-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GCN-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GCN-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GCN-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GCN-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GCN-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v28, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_alignbit_b32 v3, v5, v4, 16
+; GCN-NEXT: v_alignbit_b32 v4, v7, v6, 16
+; GCN-NEXT: v_alignbit_b32 v5, v9, v8, 16
+; GCN-NEXT: v_alignbit_b32 v6, v11, v10, 16
+; GCN-NEXT: v_alignbit_b32 v7, v13, v12, 16
+; GCN-NEXT: v_alignbit_b32 v8, v15, v14, 16
+; GCN-NEXT: v_alignbit_b32 v9, v26, v24, 16
+; GCN-NEXT: v_alignbit_b32 v10, v28, v27, 16
+; GCN-NEXT: v_alignbit_b32 v11, v29, v21, 16
+; GCN-NEXT: v_alignbit_b32 v12, v25, v19, 16
+; GCN-NEXT: v_alignbit_b32 v13, v23, v18, 16
+; GCN-NEXT: v_alignbit_b32 v14, v22, v17, 16
+; GCN-NEXT: v_alignbit_b32 v15, v20, v16, 16
+; GCN-NEXT: .LBB35_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32bf16_to_v8f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB35_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v15, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v15
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; VI-NEXT: v_alignbit_b32 v15, v15, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v14
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v14, 16, v14
+; VI-NEXT: v_alignbit_b32 v14, v14, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v13
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; VI-NEXT: v_alignbit_b32 v13, v13, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v12
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; VI-NEXT: v_alignbit_b32 v12, v12, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v11
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; VI-NEXT: v_alignbit_b32 v11, v11, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v10
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v10, 16, v10
+; VI-NEXT: v_alignbit_b32 v10, v10, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v9
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; VI-NEXT: v_alignbit_b32 v9, v9, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v8
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v8, 16, v8
+; VI-NEXT: v_alignbit_b32 v8, v8, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v7
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_alignbit_b32 v7, v7, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v6
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_alignbit_b32 v6, v6, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v5
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_alignbit_b32 v5, v5, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v4
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_alignbit_b32 v4, v4, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v3
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v3, v3, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v2
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_alignbit_b32 v2, v2, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v1
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v16, 16
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v0
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v16, 16
+; VI-NEXT: .LBB35_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32bf16_to_v8f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB35_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v15, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v17, v18, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v15, v15, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v14
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v14, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v14, v14, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v13
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v13, v13, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v12
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v12, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v12, v12, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v11
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v11, v11, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v10
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v10, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v10, v10, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v9
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v9, v9, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v8
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v8, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v8, v8, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v7, v7, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v6, v6, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v5, v5, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v4, v4, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v3, v3, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v2, v2, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v1, v1, v16, s7
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; GFX9-NEXT: v_perm_b32 v0, v0, v16, s7
+; GFX9-NEXT: .LBB35_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32bf16_to_v8f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB35_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v17, 16, v14
+; GFX11-NEXT: v_lshlrev_b32_e32 v16, 16, v15
+; GFX11-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v17 :: v_dual_add_f32 v16, 0x40c00000, v16
+; GFX11-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v16, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v16
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_bfe_u32 v23, v14, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v16, v16
+; GFX11-NEXT: v_add3_u32 v21, v21, v17, 0x7fff
+; GFX11-NEXT: v_add3_u32 v18, v18, v16, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v18, v19, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v23, v14, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX11-NEXT: v_or_b32_e32 v18, 0x400000, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_bfe_u32 v20, v15, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v15
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v20, v20, v15, 0x7fff
+; GFX11-NEXT: v_dual_cndmask_b32 v15, v20, v22 :: v_dual_lshlrev_b32 v20, 16, v13
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v15, v15, v16, 0x7060302
+; GFX11-NEXT: v_dual_cndmask_b32 v17, v21, v18 :: v_dual_add_f32 v18, 0x40c00000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_cndmask_b32 v14, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_perm_b32 v14, v14, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v13, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_cndmask_b32 v16, v16, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v13, v13
+; GFX11-NEXT: v_add3_u32 v17, v17, v13, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cndmask_b32_e32 v13, v17, v21, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_perm_b32 v13, v13, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v11
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_bfe_u32 v18, v12, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v18, v18, v12, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v10
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v12, v12, v17, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v17, v11, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v17, v17, v11, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_bfe_u32 v19, v10, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v19, v19, v10, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v10
+; GFX11-NEXT: v_perm_b32 v11, v11, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: v_dual_cndmask_b32 v10, v19, v22 :: v_dual_lshlrev_b32 v21, 16, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_lshlrev_b32 v19, 16, v8
+; GFX11-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX11-NEXT: v_perm_b32 v10, v10, v17, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v16, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_dual_add_f32 v19, 0x40c00000, v19 :: v_dual_add_f32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_add3_u32 v16, v16, v18, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_bfe_u32 v18, v8, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v9, 0x40c00000, v9
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_add3_u32 v18, v18, v8, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v9, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v9
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: v_add3_u32 v17, v17, v9, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v19, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v8
+; GFX11-NEXT: v_perm_b32 v9, v9, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v17, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v18, v22, vcc_lo
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v6
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v8, v8, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_dual_add_f32 v6, 0x40c00000, v6 :: v_dual_add_f32 v19, 0x40c00000, v21
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v19, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_add3_u32 v16, v16, v19, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_bfe_u32 v19, v6, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v16, v16, v20 :: v_dual_add_f32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add3_u32 v19, v19, v6, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v17, v7, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: v_add3_u32 v17, v17, v7, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v17, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v17, v22, v18, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v6
+; GFX11-NEXT: v_dual_add_f32 v18, 0x40c00000, v21 :: v_dual_cndmask_b32 v17, v17, v20
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v20, v18, 16, 1
+; GFX11-NEXT: v_dual_cndmask_b32 v6, v19, v22 :: v_dual_lshlrev_b32 v19, 16, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_perm_b32 v7, v7, v16, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v6, v6, v17, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v17, 0x40c00000, v19
+; GFX11-NEXT: v_add3_u32 v19, v20, v18, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX11-NEXT: v_bfe_u32 v22, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_lshlrev_b32_e32 v20, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v16, v5, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: v_add3_u32 v16, v16, v5, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v16, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v16, v22, v17, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v5, v5, v18, 0x7060302
+; GFX11-NEXT: v_bfe_u32 v18, v4, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v16, v19, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: v_add3_u32 v18, v18, v4, 0x7fff
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v20 :: v_dual_lshlrev_b32 v20, 16, v2
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v18, v19, vcc_lo
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v18, 0x40c00000, v20
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v17
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_perm_b32 v4, v4, v16, 0x7060302
+; GFX11-NEXT: v_add3_u32 v19, v21, v17, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v21, v3, 16, 1
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v18
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v19, v20, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v21, v3, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v3
+; GFX11-NEXT: v_add3_u32 v21, v22, v18, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v22, 16, v1
+; GFX11-NEXT: v_bfe_u32 v24, v2, 16, 1
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v19, v20, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v19, 0x40c00000, v22
+; GFX11-NEXT: v_add3_u32 v20, v24, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_4) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v3, v3, v17, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v18, v21, v23, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v2
+; GFX11-NEXT: v_bfe_u32 v22, v19, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v20, v21 :: v_dual_lshlrev_b32 v23, 16, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
+; GFX11-NEXT: v_add3_u32 v21, v22, v19, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v19
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v19, v19
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add_f32_e32 v20, 0x40c00000, v23
+; GFX11-NEXT: v_perm_b32 v2, v2, v18, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v19, v21, v22 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v24, v20, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v20
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v21, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v24, v24, v20, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_add3_u32 v21, v21, v0, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_bfe_u32 v23, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_add3_u32 v22, v23, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v22, v23, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v20, v20
+; GFX11-NEXT: v_perm_b32 v1, v1, v19, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v20, v24, v25, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v21, v26, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v20, 0x7060302
+; GFX11-NEXT: .LBB35_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <32 x bfloat> %a1 to <8 x double>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x bfloat> %a to <8 x double>
+ br label %end
+
+end:
+ %phi = phi <8 x double> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <8 x double> %phi
+}
+
+define <32 x half> @v_bitcast_v32i16_to_v32f16(<32 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i16_to_v32f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: s_waitcnt expcnt(1)
+; GCN-NEXT: v_mov_b32_e32 v62, v30
+; GCN-NEXT: v_mov_b32_e32 v61, v29
+; GCN-NEXT: v_mov_b32_e32 v60, v28
+; GCN-NEXT: v_mov_b32_e32 v59, v27
+; GCN-NEXT: v_mov_b32_e32 v58, v26
+; GCN-NEXT: v_mov_b32_e32 v57, v25
+; GCN-NEXT: v_mov_b32_e32 v56, v24
+; GCN-NEXT: v_mov_b32_e32 v47, v23
+; GCN-NEXT: v_mov_b32_e32 v46, v22
+; GCN-NEXT: v_mov_b32_e32 v45, v21
+; GCN-NEXT: v_mov_b32_e32 v44, v20
+; GCN-NEXT: v_mov_b32_e32 v43, v19
+; GCN-NEXT: v_mov_b32_e32 v42, v18
+; GCN-NEXT: v_mov_b32_e32 v41, v17
+; GCN-NEXT: v_mov_b32_e32 v40, v16
+; GCN-NEXT: v_mov_b32_e32 v55, v15
+; GCN-NEXT: v_mov_b32_e32 v54, v14
+; GCN-NEXT: v_mov_b32_e32 v53, v13
+; GCN-NEXT: v_mov_b32_e32 v52, v12
+; GCN-NEXT: v_mov_b32_e32 v51, v11
+; GCN-NEXT: v_mov_b32_e32 v50, v10
+; GCN-NEXT: v_mov_b32_e32 v49, v9
+; GCN-NEXT: v_mov_b32_e32 v48, v8
+; GCN-NEXT: v_mov_b32_e32 v39, v7
+; GCN-NEXT: v_mov_b32_e32 v38, v6
+; GCN-NEXT: v_mov_b32_e32 v37, v5
+; GCN-NEXT: v_mov_b32_e32 v36, v4
+; GCN-NEXT: v_mov_b32_e32 v35, v3
+; GCN-NEXT: v_mov_b32_e32 v34, v2
+; GCN-NEXT: v_mov_b32_e32 v33, v1
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:4
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB36_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v42
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v46
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v47
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v56
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v57
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v58
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v59
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v60
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v61
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v62
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v63
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr46
+; GCN-NEXT: ; implicit-def: $vgpr47
+; GCN-NEXT: ; implicit-def: $vgpr56
+; GCN-NEXT: ; implicit-def: $vgpr57
+; GCN-NEXT: ; implicit-def: $vgpr58
+; GCN-NEXT: ; implicit-def: $vgpr59
+; GCN-NEXT: ; implicit-def: $vgpr60
+; GCN-NEXT: ; implicit-def: $vgpr61
+; GCN-NEXT: ; implicit-def: $vgpr62
+; GCN-NEXT: ; implicit-def: $vgpr63
+; GCN-NEXT: .LBB36_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB36_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_add_i32_e32 v31, vcc, 3, v63
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v62
+; GCN-NEXT: v_add_i32_e32 v29, vcc, 3, v61
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v60
+; GCN-NEXT: v_add_i32_e32 v27, vcc, 3, v59
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v58
+; GCN-NEXT: v_add_i32_e32 v25, vcc, 3, v57
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v56
+; GCN-NEXT: v_add_i32_e32 v23, vcc, 3, v47
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v46
+; GCN-NEXT: v_add_i32_e32 v21, vcc, 3, v45
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v44
+; GCN-NEXT: v_add_i32_e32 v19, vcc, 3, v43
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v42
+; GCN-NEXT: v_add_i32_e32 v17, vcc, 3, v41
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v40
+; GCN-NEXT: v_add_i32_e32 v15, vcc, 3, v55
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v54
+; GCN-NEXT: v_add_i32_e32 v13, vcc, 3, v53
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v52
+; GCN-NEXT: v_add_i32_e32 v11, vcc, 3, v51
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v50
+; GCN-NEXT: v_add_i32_e32 v9, vcc, 3, v49
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v48
+; GCN-NEXT: v_add_i32_e32 v7, vcc, 3, v39
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v5, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v28
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v30
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: .LBB36_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i16_to_v32f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB36_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v16, 3
+; VI-NEXT: v_add_u16_sdwa v19, v15, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v15, 3, v15
+; VI-NEXT: v_or_b32_e32 v15, v15, v19
+; VI-NEXT: v_add_u16_sdwa v19, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v14, 3, v14
+; VI-NEXT: v_or_b32_e32 v14, v14, v19
+; VI-NEXT: v_add_u16_sdwa v19, v13, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v13, 3, v13
+; VI-NEXT: v_or_b32_e32 v13, v13, v19
+; VI-NEXT: v_add_u16_sdwa v19, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v12, 3, v12
+; VI-NEXT: v_or_b32_e32 v12, v12, v19
+; VI-NEXT: v_add_u16_sdwa v19, v11, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v11, 3, v11
+; VI-NEXT: v_or_b32_e32 v11, v11, v19
+; VI-NEXT: v_add_u16_sdwa v19, v10, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v10, 3, v10
+; VI-NEXT: v_or_b32_e32 v10, v10, v19
+; VI-NEXT: v_add_u16_sdwa v19, v9, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v9, 3, v9
+; VI-NEXT: v_or_b32_e32 v9, v9, v19
+; VI-NEXT: v_add_u16_sdwa v19, v8, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v8, 3, v8
+; VI-NEXT: v_or_b32_e32 v8, v8, v19
+; VI-NEXT: v_add_u16_sdwa v19, v7, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v7, 3, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v19
+; VI-NEXT: v_add_u16_sdwa v19, v6, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v6, 3, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v19
+; VI-NEXT: v_add_u16_sdwa v19, v5, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v5, 3, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v19
+; VI-NEXT: v_add_u16_sdwa v19, v4, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v4, 3, v4
+; VI-NEXT: v_add_u16_sdwa v17, v0, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v18, v1, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v4, v19
+; VI-NEXT: v_add_u16_sdwa v19, v2, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v16, v3, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v3, 3, v3
+; VI-NEXT: v_add_u16_e32 v2, 3, v2
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v3, v3, v16
+; VI-NEXT: v_or_b32_e32 v2, v2, v19
+; VI-NEXT: v_or_b32_e32 v1, v1, v18
+; VI-NEXT: v_or_b32_e32 v0, v0, v17
+; VI-NEXT: .LBB36_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i16_to_v32f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB36_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB36_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i16_to_v32f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB36_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB36_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i16> %a, splat (i16 3)
+ %a2 = bitcast <32 x i16> %a1 to <32 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i16> %a to <32 x half>
+ br label %end
+
+end:
+ %phi = phi <32 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x half> %phi
+}
+
+define <32 x i16> @v_bitcast_v32f16_to_v32i16(<32 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f16_to_v32i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
+; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt vmcnt(1)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v31
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v30, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v32
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB37_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v30
+; GCN-NEXT: v_add_f32_e32 v31, 0x38000000, v31
+; GCN-NEXT: v_add_f32_e32 v30, 0x38000000, v30
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v31
+; GCN-NEXT: v_cvt_f16_f32_e32 v30, v30
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v31
+; GCN-NEXT: v_or_b32_e32 v30, v30, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x38000000, v27
+; GCN-NEXT: v_add_f32_e32 v26, 0x38000000, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v27
+; GCN-NEXT: v_or_b32_e32 v26, v26, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_add_f32_e32 v23, 0x38000000, v23
+; GCN-NEXT: v_add_f32_e32 v22, 0x38000000, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v23
+; GCN-NEXT: v_or_b32_e32 v22, v22, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_add_f32_e32 v19, 0x38000000, v19
+; GCN-NEXT: v_add_f32_e32 v18, 0x38000000, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v19
+; GCN-NEXT: v_or_b32_e32 v18, v18, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v15, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v14, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v15
+; GCN-NEXT: v_or_b32_e32 v14, v14, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v11, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v10, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v11
+; GCN-NEXT: v_or_b32_e32 v10, v10, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v7
+; GCN-NEXT: v_or_b32_e32 v6, v6, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v32, 16, v3
+; GCN-NEXT: v_or_b32_e32 v2, v2, v32
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v28
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v17, 0x38000000, v17
+; GCN-NEXT: v_add_f32_e32 v16, 0x38000000, v16
+; GCN-NEXT: v_add_f32_e32 v21, 0x38000000, v21
+; GCN-NEXT: v_add_f32_e32 v20, 0x38000000, v20
+; GCN-NEXT: v_add_f32_e32 v25, 0x38000000, v25
+; GCN-NEXT: v_add_f32_e32 v24, 0x38000000, v24
+; GCN-NEXT: v_add_f32_e32 v29, 0x38000000, v29
+; GCN-NEXT: v_add_f32_e32 v28, 0x38000000, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v9, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v8, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v13, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v12, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: v_or_b32_e32 v0, v0, v1
+; GCN-NEXT: v_or_b32_e32 v4, v4, v5
+; GCN-NEXT: v_or_b32_e32 v8, v8, v9
+; GCN-NEXT: v_or_b32_e32 v12, v12, v13
+; GCN-NEXT: v_or_b32_e32 v16, v16, v17
+; GCN-NEXT: v_or_b32_e32 v20, v20, v21
+; GCN-NEXT: v_or_b32_e32 v24, v24, v25
+; GCN-NEXT: v_or_b32_e32 v28, v28, v29
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v5, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v9, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v13, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v17, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v21, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v25, 16
+; GCN-NEXT: v_alignbit_b32 v29, v30, v29, 16
+; GCN-NEXT: .LBB37_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f16_to_v32i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB37_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v17, 0x200
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v15
+; VI-NEXT: v_add_f16_sdwa v15, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v15, v19, v15
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v14
+; VI-NEXT: v_add_f16_sdwa v14, v14, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v14, v19, v14
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v13
+; VI-NEXT: v_add_f16_sdwa v13, v13, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v13, v19, v13
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v12
+; VI-NEXT: v_add_f16_sdwa v12, v12, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v12, v19, v12
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v11
+; VI-NEXT: v_add_f16_sdwa v11, v11, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v11, v19, v11
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v10
+; VI-NEXT: v_add_f16_sdwa v10, v10, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v10, v19, v10
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v9
+; VI-NEXT: v_add_f16_sdwa v9, v9, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v9, v19, v9
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v8
+; VI-NEXT: v_add_f16_sdwa v8, v8, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v8, v19, v8
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v7
+; VI-NEXT: v_add_f16_sdwa v7, v7, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v19, v7
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v6
+; VI-NEXT: v_add_f16_sdwa v6, v6, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v19, v6
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v5
+; VI-NEXT: v_add_f16_sdwa v5, v5, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v19, v5
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v4
+; VI-NEXT: v_add_f16_sdwa v4, v4, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v16, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v18, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v19, v4
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v2
+; VI-NEXT: v_add_f16_sdwa v2, v2, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_sdwa v17, v3, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v17
+; VI-NEXT: v_or_b32_e32 v2, v19, v2
+; VI-NEXT: v_or_b32_e32 v1, v18, v1
+; VI-NEXT: v_or_b32_e32 v0, v16, v0
+; VI-NEXT: .LBB37_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f16_to_v32i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB37_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v15, v15, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v14, v14, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v13, v13, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v12, v12, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v11, v11, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v10, v10, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v9, v9, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v8, v8, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB37_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f16_to_v32i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB37_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v15, 0x200, v15 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v14, 0x200, v14 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v13, 0x200, v13 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v12, 0x200, v12 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v11, 0x200, v11 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v10, 0x200, v10 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v9, 0x200, v9 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v8, 0x200, v8 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB37_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <32 x half> %a1 to <32 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x half> %a to <32 x i16>
+ br label %end
+
+end:
+ %phi = phi <32 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i16> %phi
+}
+
+define <32 x bfloat> @v_bitcast_v32i16_to_v32bf16(<32 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32i16_to_v32bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v55, v30
+; GCN-NEXT: v_mov_b32_e32 v54, v28
+; GCN-NEXT: v_mov_b32_e32 v53, v26
+; GCN-NEXT: v_mov_b32_e32 v52, v24
+; GCN-NEXT: v_mov_b32_e32 v51, v22
+; GCN-NEXT: v_mov_b32_e32 v50, v20
+; GCN-NEXT: v_mov_b32_e32 v49, v18
+; GCN-NEXT: v_mov_b32_e32 v48, v16
+; GCN-NEXT: v_mov_b32_e32 v39, v14
+; GCN-NEXT: v_mov_b32_e32 v38, v12
+; GCN-NEXT: v_mov_b32_e32 v37, v10
+; GCN-NEXT: v_mov_b32_e32 v36, v8
+; GCN-NEXT: v_mov_b32_e32 v35, v6
+; GCN-NEXT: v_mov_b32_e32 v34, v4
+; GCN-NEXT: v_mov_b32_e32 v33, v2
+; GCN-NEXT: v_mov_b32_e32 v32, v0
+; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32
+; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v7
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v9
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v11
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v15
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v29
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v31, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_4
+; GCN-NEXT: .LBB38_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB38_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v32
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v33
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v34
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v36
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v38
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v48
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v50
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v52
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v55
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB38_2
+; GCN-NEXT: .LBB38_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v55
+; GCN-NEXT: s_mov_b32 s6, 0x30000
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v54
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v53
+; GCN-NEXT: v_add_i32_e32 v6, vcc, 3, v52
+; GCN-NEXT: v_add_i32_e32 v8, vcc, 3, v51
+; GCN-NEXT: v_add_i32_e32 v10, vcc, 3, v50
+; GCN-NEXT: v_add_i32_e32 v12, vcc, 3, v49
+; GCN-NEXT: v_add_i32_e32 v14, vcc, 3, v48
+; GCN-NEXT: v_add_i32_e32 v16, vcc, 3, v39
+; GCN-NEXT: v_add_i32_e32 v18, vcc, 3, v38
+; GCN-NEXT: v_add_i32_e32 v20, vcc, 3, v37
+; GCN-NEXT: v_add_i32_e32 v22, vcc, 3, v36
+; GCN-NEXT: v_add_i32_e32 v24, vcc, 3, v35
+; GCN-NEXT: v_add_i32_e32 v26, vcc, 3, v34
+; GCN-NEXT: v_add_i32_e32 v28, vcc, 3, v33
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 3, v32
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff, v4
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff, v6
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff, v8
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff, v10
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff, v12
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff, v14
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff, v16
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff, v18
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff, v20
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff, v22
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff, v24
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff, v26
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff, v28
+; GCN-NEXT: v_and_b32_e32 v30, 0xffff, v30
+; GCN-NEXT: v_or_b32_e32 v0, v31, v0
+; GCN-NEXT: v_or_b32_e32 v2, v29, v2
+; GCN-NEXT: v_or_b32_e32 v4, v27, v4
+; GCN-NEXT: v_or_b32_e32 v6, v25, v6
+; GCN-NEXT: v_or_b32_e32 v8, v23, v8
+; GCN-NEXT: v_or_b32_e32 v10, v21, v10
+; GCN-NEXT: v_or_b32_e32 v12, v19, v12
+; GCN-NEXT: v_or_b32_e32 v14, v17, v14
+; GCN-NEXT: v_or_b32_e32 v15, v15, v16
+; GCN-NEXT: v_or_b32_e32 v13, v13, v18
+; GCN-NEXT: v_or_b32_e32 v11, v11, v20
+; GCN-NEXT: v_or_b32_e32 v9, v9, v22
+; GCN-NEXT: v_or_b32_e32 v7, v7, v24
+; GCN-NEXT: v_or_b32_e32 v5, v5, v26
+; GCN-NEXT: v_or_b32_e32 v3, v3, v28
+; GCN-NEXT: v_or_b32_e32 v1, v1, v30
+; GCN-NEXT: v_add_i32_e32 v30, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v28, vcc, s6, v2
+; GCN-NEXT: v_add_i32_e32 v26, vcc, s6, v4
+; GCN-NEXT: v_add_i32_e32 v24, vcc, s6, v6
+; GCN-NEXT: v_add_i32_e32 v22, vcc, s6, v8
+; GCN-NEXT: v_add_i32_e32 v20, vcc, s6, v10
+; GCN-NEXT: v_add_i32_e32 v18, vcc, s6, v12
+; GCN-NEXT: v_add_i32_e32 v16, vcc, s6, v14
+; GCN-NEXT: v_add_i32_e32 v14, vcc, s6, v15
+; GCN-NEXT: v_add_i32_e32 v12, vcc, s6, v13
+; GCN-NEXT: v_add_i32_e32 v10, vcc, s6, v11
+; GCN-NEXT: v_add_i32_e32 v8, vcc, s6, v9
+; GCN-NEXT: v_add_i32_e32 v6, vcc, s6, v7
+; GCN-NEXT: v_add_i32_e32 v4, vcc, s6, v5
+; GCN-NEXT: v_add_i32_e32 v2, vcc, s6, v3
+; GCN-NEXT: v_add_i32_e32 v0, vcc, s6, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v4
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v6
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v8
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v8
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v10
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v10
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v12
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v12
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v14
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v14
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v16
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v18
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v20
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v22
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v24
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v26
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v28
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v28
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v30
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v30
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32i16_to_v32bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB38_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v16, 3
+; VI-NEXT: v_add_u16_sdwa v19, v15, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v15, 3, v15
+; VI-NEXT: v_or_b32_e32 v15, v15, v19
+; VI-NEXT: v_add_u16_sdwa v19, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v14, 3, v14
+; VI-NEXT: v_or_b32_e32 v14, v14, v19
+; VI-NEXT: v_add_u16_sdwa v19, v13, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v13, 3, v13
+; VI-NEXT: v_or_b32_e32 v13, v13, v19
+; VI-NEXT: v_add_u16_sdwa v19, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v12, 3, v12
+; VI-NEXT: v_or_b32_e32 v12, v12, v19
+; VI-NEXT: v_add_u16_sdwa v19, v11, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v11, 3, v11
+; VI-NEXT: v_or_b32_e32 v11, v11, v19
+; VI-NEXT: v_add_u16_sdwa v19, v10, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v10, 3, v10
+; VI-NEXT: v_or_b32_e32 v10, v10, v19
+; VI-NEXT: v_add_u16_sdwa v19, v9, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v9, 3, v9
+; VI-NEXT: v_or_b32_e32 v9, v9, v19
+; VI-NEXT: v_add_u16_sdwa v19, v8, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v8, 3, v8
+; VI-NEXT: v_or_b32_e32 v8, v8, v19
+; VI-NEXT: v_add_u16_sdwa v19, v7, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v7, 3, v7
+; VI-NEXT: v_or_b32_e32 v7, v7, v19
+; VI-NEXT: v_add_u16_sdwa v19, v6, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v6, 3, v6
+; VI-NEXT: v_or_b32_e32 v6, v6, v19
+; VI-NEXT: v_add_u16_sdwa v19, v5, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v5, 3, v5
+; VI-NEXT: v_or_b32_e32 v5, v5, v19
+; VI-NEXT: v_add_u16_sdwa v19, v4, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v4, 3, v4
+; VI-NEXT: v_add_u16_sdwa v17, v0, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v18, v1, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v4, v19
+; VI-NEXT: v_add_u16_sdwa v19, v2, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v16, v3, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v3, 3, v3
+; VI-NEXT: v_add_u16_e32 v2, 3, v2
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v3, v3, v16
+; VI-NEXT: v_or_b32_e32 v2, v2, v19
+; VI-NEXT: v_or_b32_e32 v1, v1, v18
+; VI-NEXT: v_or_b32_e32 v0, v0, v17
+; VI-NEXT: .LBB38_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32i16_to_v32bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB38_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB38_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32i16_to_v32bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB38_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v15, v15, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v14, v14, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v13, v13, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v12, v12, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v11, v11, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v10, v10, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v9, v9, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v8, v8, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v7, v7, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v6, v6, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v5, v5, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v4, v4, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v3, v3, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v2, v2, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: .LBB38_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <32 x i16> %a, splat (i16 3)
+ %a2 = bitcast <32 x i16> %a1 to <32 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x i16> %a to <32 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <32 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x bfloat> %phi
+}
+
+define <32 x i16> @v_bitcast_v32bf16_to_v32i16(<32 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32bf16_to_v32i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: buffer_load_dword v55, off, s[0:3], s32 offset:4
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: v_mul_f32_e32 v63, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v62, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v33, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v32, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v61, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v60, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v35, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v34, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v59, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v58, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v37, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v36, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v57, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v56, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v39, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v38, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v47, 1.0, v16
+; GCN-NEXT: v_mul_f32_e32 v46, 1.0, v17
+; GCN-NEXT: v_mul_f32_e32 v49, 1.0, v18
+; GCN-NEXT: v_mul_f32_e32 v48, 1.0, v19
+; GCN-NEXT: v_mul_f32_e32 v45, 1.0, v20
+; GCN-NEXT: v_mul_f32_e32 v44, 1.0, v21
+; GCN-NEXT: v_mul_f32_e32 v51, 1.0, v22
+; GCN-NEXT: v_mul_f32_e32 v50, 1.0, v23
+; GCN-NEXT: v_mul_f32_e32 v43, 1.0, v24
+; GCN-NEXT: v_mul_f32_e32 v42, 1.0, v25
+; GCN-NEXT: v_mul_f32_e32 v53, 1.0, v26
+; GCN-NEXT: v_mul_f32_e32 v52, 1.0, v27
+; GCN-NEXT: v_mul_f32_e32 v41, 1.0, v28
+; GCN-NEXT: v_mul_f32_e32 v40, 1.0, v29
+; GCN-NEXT: v_mul_f32_e32 v54, 1.0, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v55
+; GCN-NEXT: v_mul_f32_e32 v55, 1.0, v31
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB39_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v63
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v62
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v32
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v61
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v60
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v59
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v58
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v37
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v57
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v56
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v39
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v16, 16, v47
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v46
+; GCN-NEXT: v_lshrrev_b32_e32 v18, 16, v49
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v45
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v44
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v51
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v50
+; GCN-NEXT: v_lshrrev_b32_e32 v24, 16, v43
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v42
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v52
+; GCN-NEXT: v_lshrrev_b32_e32 v28, 16, v41
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v40
+; GCN-NEXT: v_lshrrev_b32_e32 v30, 16, v54
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v55
+; GCN-NEXT: ; implicit-def: $vgpr63
+; GCN-NEXT: ; implicit-def: $vgpr62
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr61
+; GCN-NEXT: ; implicit-def: $vgpr60
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr59
+; GCN-NEXT: ; implicit-def: $vgpr58
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr57
+; GCN-NEXT: ; implicit-def: $vgpr56
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr47
+; GCN-NEXT: ; implicit-def: $vgpr46
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: .LBB39_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB39_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v63
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v62
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v61
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v60
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v59
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v58
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v57
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v56
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v47
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v46
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v45
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v44
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v43
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v42
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v41
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v40
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v54
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v55
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v53
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v52
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v51
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v50
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v49
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v48
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v39
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v38
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff0000, v37
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v36
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff0000, v35
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v34
+; GCN-NEXT: v_and_b32_e32 v30, 0xffff0000, v33
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v32
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v32, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v33, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v15
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v16
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v17
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v18
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v19
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v20
+; GCN-NEXT: v_add_f32_e32 v19, 0x40c00000, v21
+; GCN-NEXT: v_add_f32_e32 v21, 0x40c00000, v22
+; GCN-NEXT: v_add_f32_e32 v20, 0x40c00000, v23
+; GCN-NEXT: v_add_f32_e32 v34, 0x40c00000, v24
+; GCN-NEXT: v_add_f32_e32 v22, 0x40c00000, v25
+; GCN-NEXT: v_add_f32_e32 v25, 0x40c00000, v26
+; GCN-NEXT: v_add_f32_e32 v24, 0x40c00000, v27
+; GCN-NEXT: v_add_f32_e32 v35, 0x40c00000, v28
+; GCN-NEXT: v_add_f32_e32 v26, 0x40c00000, v29
+; GCN-NEXT: v_add_f32_e32 v29, 0x40c00000, v30
+; GCN-NEXT: v_add_f32_e32 v28, 0x40c00000, v31
+; GCN-NEXT: v_lshrrev_b32_e32 v30, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v36, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v37, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v38, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v39, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v48, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v49, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v50, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v15
+; GCN-NEXT: v_and_b32_e32 v51, 0xffff0000, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v16
+; GCN-NEXT: v_and_b32_e32 v52, 0xffff0000, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v19
+; GCN-NEXT: v_and_b32_e32 v53, 0xffff0000, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v20
+; GCN-NEXT: v_and_b32_e32 v54, 0xffff0000, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v22
+; GCN-NEXT: v_and_b32_e32 v55, 0xffff0000, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v24
+; GCN-NEXT: v_and_b32_e32 v40, 0xffff0000, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v26
+; GCN-NEXT: v_and_b32_e32 v41, 0xffff0000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v28
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GCN-NEXT: v_alignbit_b32 v0, v30, v0, 16
+; GCN-NEXT: v_alignbit_b32 v4, v36, v2, 16
+; GCN-NEXT: v_alignbit_b32 v8, v37, v32, 16
+; GCN-NEXT: v_alignbit_b32 v12, v38, v5, 16
+; GCN-NEXT: v_alignbit_b32 v16, v39, v33, 16
+; GCN-NEXT: v_alignbit_b32 v20, v48, v9, 16
+; GCN-NEXT: v_alignbit_b32 v24, v49, v10, 16
+; GCN-NEXT: v_alignbit_b32 v28, v50, v13, 16
+; GCN-NEXT: v_alignbit_b32 v30, v31, v14, 16
+; GCN-NEXT: v_alignbit_b32 v26, v27, v17, 16
+; GCN-NEXT: v_alignbit_b32 v22, v23, v18, 16
+; GCN-NEXT: v_alignbit_b32 v18, v19, v21, 16
+; GCN-NEXT: v_alignbit_b32 v14, v15, v34, 16
+; GCN-NEXT: v_alignbit_b32 v10, v11, v25, 16
+; GCN-NEXT: v_alignbit_b32 v6, v7, v35, 16
+; GCN-NEXT: v_alignbit_b32 v2, v3, v29, 16
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: v_alignbit_b32 v5, v6, v41, 16
+; GCN-NEXT: v_alignbit_b32 v9, v10, v40, 16
+; GCN-NEXT: v_alignbit_b32 v13, v14, v55, 16
+; GCN-NEXT: v_alignbit_b32 v17, v18, v54, 16
+; GCN-NEXT: v_alignbit_b32 v21, v22, v53, 16
+; GCN-NEXT: v_alignbit_b32 v25, v26, v52, 16
+; GCN-NEXT: v_alignbit_b32 v29, v30, v51, 16
+; GCN-NEXT: .LBB39_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32bf16_to_v32i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB39_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v0
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v17, 16, v1
+; VI-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; VI-NEXT: v_bfe_u32 v18, v17, 16, 1
+; VI-NEXT: v_add_u32_e32 v18, vcc, v18, v17
+; VI-NEXT: v_add_u32_e32 v18, vcc, s6, v18
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v17, v17
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v17, v18, v19, vcc
+; VI-NEXT: v_bfe_u32 v18, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v18, vcc, v18, v1
+; VI-NEXT: v_add_u32_e32 v18, vcc, s6, v18
+; VI-NEXT: v_or_b32_e32 v19, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v18, v19, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v18, 16, v2
+; VI-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; VI-NEXT: v_bfe_u32 v19, v18, 16, 1
+; VI-NEXT: v_add_u32_e32 v19, vcc, v19, v18
+; VI-NEXT: v_add_u32_e32 v19, vcc, s6, v19
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v18, v18
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc
+; VI-NEXT: v_bfe_u32 v19, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v19, vcc, v19, v2
+; VI-NEXT: v_add_u32_e32 v19, vcc, s6, v19
+; VI-NEXT: v_or_b32_e32 v20, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v19, v20, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v19, 16, v3
+; VI-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; VI-NEXT: v_bfe_u32 v20, v19, 16, 1
+; VI-NEXT: v_add_u32_e32 v20, vcc, v20, v19
+; VI-NEXT: v_add_u32_e32 v20, vcc, s6, v20
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v21, 0x400000, v19
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v19, v19
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v19, v20, v21, vcc
+; VI-NEXT: v_bfe_u32 v20, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v20, vcc, v20, v3
+; VI-NEXT: v_add_u32_e32 v20, vcc, s6, v20
+; VI-NEXT: v_or_b32_e32 v21, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v20, v21, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v20, 16, v4
+; VI-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; VI-NEXT: v_bfe_u32 v21, v20, 16, 1
+; VI-NEXT: v_add_u32_e32 v21, vcc, v21, v20
+; VI-NEXT: v_add_u32_e32 v21, vcc, s6, v21
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v22, 0x400000, v20
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v20, v20
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v20, v21, v22, vcc
+; VI-NEXT: v_bfe_u32 v21, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v21, vcc, v21, v4
+; VI-NEXT: v_add_u32_e32 v21, vcc, s6, v21
+; VI-NEXT: v_or_b32_e32 v22, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v21, v22, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; VI-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; VI-NEXT: v_bfe_u32 v22, v21, 16, 1
+; VI-NEXT: v_add_u32_e32 v22, vcc, v22, v21
+; VI-NEXT: v_add_u32_e32 v22, vcc, s6, v22
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v23, 0x400000, v21
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v21, v21
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v21, v22, v23, vcc
+; VI-NEXT: v_bfe_u32 v22, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v22, vcc, v22, v5
+; VI-NEXT: v_add_u32_e32 v22, vcc, s6, v22
+; VI-NEXT: v_or_b32_e32 v23, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v22, v23, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v22, 16, v6
+; VI-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; VI-NEXT: v_bfe_u32 v23, v22, 16, 1
+; VI-NEXT: v_add_u32_e32 v23, vcc, v23, v22
+; VI-NEXT: v_add_u32_e32 v23, vcc, s6, v23
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v24, 0x400000, v22
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v22, v22
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v22, v23, v24, vcc
+; VI-NEXT: v_bfe_u32 v23, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v23, vcc, v23, v6
+; VI-NEXT: v_add_u32_e32 v23, vcc, s6, v23
+; VI-NEXT: v_or_b32_e32 v24, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v23, v24, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v23, 16, v7
+; VI-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; VI-NEXT: v_bfe_u32 v24, v23, 16, 1
+; VI-NEXT: v_add_u32_e32 v24, vcc, v24, v23
+; VI-NEXT: v_add_u32_e32 v24, vcc, s6, v24
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v25, 0x400000, v23
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v23, v23
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v23, v24, v25, vcc
+; VI-NEXT: v_bfe_u32 v24, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v24, vcc, v24, v7
+; VI-NEXT: v_add_u32_e32 v24, vcc, s6, v24
+; VI-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v24, v25, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v24, 16, v8
+; VI-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; VI-NEXT: v_bfe_u32 v25, v24, 16, 1
+; VI-NEXT: v_add_u32_e32 v25, vcc, v25, v24
+; VI-NEXT: v_add_u32_e32 v25, vcc, s6, v25
+; VI-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; VI-NEXT: v_or_b32_e32 v26, 0x400000, v24
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v24, v24
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_cndmask_b32_e32 v24, v25, v26, vcc
+; VI-NEXT: v_bfe_u32 v25, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v25, vcc, v25, v8
+; VI-NEXT: v_add_u32_e32 v25, vcc, s6, v25
+; VI-NEXT: v_or_b32_e32 v26, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_cndmask_b32_e32 v8, v25, v26, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v25, 16, v9
+; VI-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; VI-NEXT: v_bfe_u32 v26, v25, 16, 1
+; VI-NEXT: v_add_u32_e32 v26, vcc, v26, v25
+; VI-NEXT: v_add_u32_e32 v26, vcc, s6, v26
+; VI-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; VI-NEXT: v_or_b32_e32 v27, 0x400000, v25
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v25, v25
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_cndmask_b32_e32 v25, v26, v27, vcc
+; VI-NEXT: v_bfe_u32 v26, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v26, vcc, v26, v9
+; VI-NEXT: v_add_u32_e32 v26, vcc, s6, v26
+; VI-NEXT: v_or_b32_e32 v27, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_cndmask_b32_e32 v9, v26, v27, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v26, 16, v10
+; VI-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; VI-NEXT: v_bfe_u32 v27, v26, 16, 1
+; VI-NEXT: v_add_u32_e32 v27, vcc, v27, v26
+; VI-NEXT: v_add_u32_e32 v27, vcc, s6, v27
+; VI-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; VI-NEXT: v_or_b32_e32 v28, 0x400000, v26
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v26, v26
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_cndmask_b32_e32 v26, v27, v28, vcc
+; VI-NEXT: v_bfe_u32 v27, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v27, vcc, v27, v10
+; VI-NEXT: v_add_u32_e32 v27, vcc, s6, v27
+; VI-NEXT: v_or_b32_e32 v28, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_cndmask_b32_e32 v10, v27, v28, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v27, 16, v11
+; VI-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; VI-NEXT: v_bfe_u32 v28, v27, 16, 1
+; VI-NEXT: v_add_u32_e32 v28, vcc, v28, v27
+; VI-NEXT: v_add_u32_e32 v28, vcc, s6, v28
+; VI-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; VI-NEXT: v_or_b32_e32 v29, 0x400000, v27
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v27, v27
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_cndmask_b32_e32 v27, v28, v29, vcc
+; VI-NEXT: v_bfe_u32 v28, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v28, vcc, v28, v11
+; VI-NEXT: v_add_u32_e32 v28, vcc, s6, v28
+; VI-NEXT: v_or_b32_e32 v29, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_cndmask_b32_e32 v11, v28, v29, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v28, 16, v12
+; VI-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; VI-NEXT: v_bfe_u32 v29, v28, 16, 1
+; VI-NEXT: v_add_u32_e32 v29, vcc, v29, v28
+; VI-NEXT: v_add_u32_e32 v29, vcc, s6, v29
+; VI-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; VI-NEXT: v_or_b32_e32 v30, 0x400000, v28
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v28, v28
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_cndmask_b32_e32 v28, v29, v30, vcc
+; VI-NEXT: v_bfe_u32 v29, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v29, vcc, v29, v12
+; VI-NEXT: v_add_u32_e32 v29, vcc, s6, v29
+; VI-NEXT: v_or_b32_e32 v30, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_cndmask_b32_e32 v12, v29, v30, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v29, 16, v13
+; VI-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; VI-NEXT: v_bfe_u32 v30, v29, 16, 1
+; VI-NEXT: v_add_u32_e32 v30, vcc, v30, v29
+; VI-NEXT: v_add_u32_e32 v30, vcc, s6, v30
+; VI-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; VI-NEXT: v_or_b32_e32 v31, 0x400000, v29
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v29, v29
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_cndmask_b32_e32 v29, v30, v31, vcc
+; VI-NEXT: v_bfe_u32 v30, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v30, vcc, v30, v13
+; VI-NEXT: v_add_u32_e32 v30, vcc, s6, v30
+; VI-NEXT: v_or_b32_e32 v31, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_cndmask_b32_e32 v13, v30, v31, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v30, 16, v14
+; VI-NEXT: v_add_f32_e32 v30, 0x40c00000, v30
+; VI-NEXT: v_bfe_u32 v31, v30, 16, 1
+; VI-NEXT: v_add_u32_e32 v31, vcc, v31, v30
+; VI-NEXT: v_add_u32_e32 v31, vcc, s6, v31
+; VI-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; VI-NEXT: v_or_b32_e32 v32, 0x400000, v30
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v30, v30
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_cndmask_b32_e32 v30, v31, v32, vcc
+; VI-NEXT: v_bfe_u32 v31, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v31, vcc, v31, v14
+; VI-NEXT: v_add_u32_e32 v31, vcc, s6, v31
+; VI-NEXT: v_or_b32_e32 v32, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_cndmask_b32_e32 v14, v31, v32, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v31, 16, v15
+; VI-NEXT: v_add_f32_e32 v31, 0x40c00000, v31
+; VI-NEXT: v_bfe_u32 v32, v31, 16, 1
+; VI-NEXT: v_add_u32_e32 v32, vcc, v32, v31
+; VI-NEXT: v_add_u32_e32 v32, vcc, s6, v32
+; VI-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; VI-NEXT: v_or_b32_e32 v33, 0x400000, v31
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v31, v31
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_cndmask_b32_e32 v31, v32, v33, vcc
+; VI-NEXT: v_bfe_u32 v32, v15, 16, 1
+; VI-NEXT: v_add_u32_e32 v32, vcc, v32, v15
+; VI-NEXT: v_add_u32_e32 v32, vcc, s6, v32
+; VI-NEXT: v_or_b32_e32 v33, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_cndmask_b32_e32 v15, v32, v33, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; VI-NEXT: v_lshrrev_b32_e32 v14, 16, v14
+; VI-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; VI-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; VI-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; VI-NEXT: v_lshrrev_b32_e32 v10, 16, v10
+; VI-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; VI-NEXT: v_lshrrev_b32_e32 v8, 16, v8
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v15, v15, v31, 16
+; VI-NEXT: v_alignbit_b32 v14, v14, v30, 16
+; VI-NEXT: v_alignbit_b32 v13, v13, v29, 16
+; VI-NEXT: v_alignbit_b32 v12, v12, v28, 16
+; VI-NEXT: v_alignbit_b32 v11, v11, v27, 16
+; VI-NEXT: v_alignbit_b32 v10, v10, v26, 16
+; VI-NEXT: v_alignbit_b32 v9, v9, v25, 16
+; VI-NEXT: v_alignbit_b32 v8, v8, v24, 16
+; VI-NEXT: v_alignbit_b32 v7, v7, v23, 16
+; VI-NEXT: v_alignbit_b32 v6, v6, v22, 16
+; VI-NEXT: v_alignbit_b32 v5, v5, v21, 16
+; VI-NEXT: v_alignbit_b32 v4, v4, v20, 16
+; VI-NEXT: v_alignbit_b32 v3, v3, v19, 16
+; VI-NEXT: v_alignbit_b32 v2, v2, v18, 16
+; VI-NEXT: v_alignbit_b32 v1, v1, v17, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v16, 16
+; VI-NEXT: .LBB39_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32bf16_to_v32i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB39_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v17, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GFX9-NEXT: v_bfe_u32 v18, v17, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v18, v18, v17, s6
+; GFX9-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v17, v17
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v17, v18, v19, vcc
+; GFX9-NEXT: v_bfe_u32 v18, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v18, v18, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v19, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v18, v19, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v18, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX9-NEXT: v_bfe_u32 v19, v18, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v19, v19, v18, s6
+; GFX9-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v18, v18
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc
+; GFX9-NEXT: v_bfe_u32 v19, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v19, v19, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v20, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v19, v20, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v19, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GFX9-NEXT: v_bfe_u32 v20, v19, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v20, v20, v19, s6
+; GFX9-NEXT: v_or_b32_e32 v21, 0x400000, v19
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v19, v19
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v19, v20, v21, vcc
+; GFX9-NEXT: v_bfe_u32 v20, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v20, v20, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v21, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v20, v21, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v20, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GFX9-NEXT: v_bfe_u32 v21, v20, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v21, v21, v20, s6
+; GFX9-NEXT: v_or_b32_e32 v22, 0x400000, v20
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v20, v20
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v20, v21, v22, vcc
+; GFX9-NEXT: v_bfe_u32 v21, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v21, v21, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v22, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v21, v22, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GFX9-NEXT: v_bfe_u32 v22, v21, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v22, v22, v21, s6
+; GFX9-NEXT: v_or_b32_e32 v23, 0x400000, v21
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v21, v21
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v21, v22, v23, vcc
+; GFX9-NEXT: v_bfe_u32 v22, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v22, v22, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v23, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v22, v23, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v22, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GFX9-NEXT: v_bfe_u32 v23, v22, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v23, v23, v22, s6
+; GFX9-NEXT: v_or_b32_e32 v24, 0x400000, v22
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v22, v22
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v22, v23, v24, vcc
+; GFX9-NEXT: v_bfe_u32 v23, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v23, v23, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v24, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v23, v24, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v23, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GFX9-NEXT: v_bfe_u32 v24, v23, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v24, v24, v23, s6
+; GFX9-NEXT: v_or_b32_e32 v25, 0x400000, v23
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v23, v23
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v23, v24, v25, vcc
+; GFX9-NEXT: v_bfe_u32 v24, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v24, v24, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v24, v25, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v24, 16, v8
+; GFX9-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GFX9-NEXT: v_bfe_u32 v25, v24, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX9-NEXT: v_add3_u32 v25, v25, v24, s6
+; GFX9-NEXT: v_or_b32_e32 v26, 0x400000, v24
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v24, v24
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v24, v25, v26, vcc
+; GFX9-NEXT: v_bfe_u32 v25, v8, 16, 1
+; GFX9-NEXT: v_add3_u32 v25, v25, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v26, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v25, v26, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v25, 16, v9
+; GFX9-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GFX9-NEXT: v_bfe_u32 v26, v25, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX9-NEXT: v_add3_u32 v26, v26, v25, s6
+; GFX9-NEXT: v_or_b32_e32 v27, 0x400000, v25
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v25, v25
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v25, v26, v27, vcc
+; GFX9-NEXT: v_bfe_u32 v26, v9, 16, 1
+; GFX9-NEXT: v_add3_u32 v26, v26, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v27, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v26, v27, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v26, 16, v10
+; GFX9-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GFX9-NEXT: v_bfe_u32 v27, v26, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX9-NEXT: v_add3_u32 v27, v27, v26, s6
+; GFX9-NEXT: v_or_b32_e32 v28, 0x400000, v26
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v26, v26
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v26, v27, v28, vcc
+; GFX9-NEXT: v_bfe_u32 v27, v10, 16, 1
+; GFX9-NEXT: v_add3_u32 v27, v27, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v28, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v27, v28, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v27, 16, v11
+; GFX9-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GFX9-NEXT: v_bfe_u32 v28, v27, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX9-NEXT: v_add3_u32 v28, v28, v27, s6
+; GFX9-NEXT: v_or_b32_e32 v29, 0x400000, v27
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v27, v27
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v27, v28, v29, vcc
+; GFX9-NEXT: v_bfe_u32 v28, v11, 16, 1
+; GFX9-NEXT: v_add3_u32 v28, v28, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v29, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v28, v29, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v28, 16, v12
+; GFX9-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GFX9-NEXT: v_bfe_u32 v29, v28, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX9-NEXT: v_add3_u32 v29, v29, v28, s6
+; GFX9-NEXT: v_or_b32_e32 v30, 0x400000, v28
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v28, v28
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v28, v29, v30, vcc
+; GFX9-NEXT: v_bfe_u32 v29, v12, 16, 1
+; GFX9-NEXT: v_add3_u32 v29, v29, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v30, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v29, v30, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v29, 16, v13
+; GFX9-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GFX9-NEXT: v_bfe_u32 v30, v29, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX9-NEXT: v_add3_u32 v30, v30, v29, s6
+; GFX9-NEXT: v_or_b32_e32 v31, 0x400000, v29
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v29, v29
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v29, v30, v31, vcc
+; GFX9-NEXT: v_bfe_u32 v30, v13, 16, 1
+; GFX9-NEXT: v_add3_u32 v30, v30, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v31, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v30, v31, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v30, 16, v14
+; GFX9-NEXT: v_add_f32_e32 v30, 0x40c00000, v30
+; GFX9-NEXT: v_bfe_u32 v31, v30, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX9-NEXT: v_add3_u32 v31, v31, v30, s6
+; GFX9-NEXT: v_or_b32_e32 v32, 0x400000, v30
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v30, v30
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v30, v31, v32, vcc
+; GFX9-NEXT: v_bfe_u32 v31, v14, 16, 1
+; GFX9-NEXT: v_add3_u32 v31, v31, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v32, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v31, v32, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v31, 16, v15
+; GFX9-NEXT: v_add_f32_e32 v31, 0x40c00000, v31
+; GFX9-NEXT: v_bfe_u32 v32, v31, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX9-NEXT: v_add3_u32 v32, v32, v31, s6
+; GFX9-NEXT: v_or_b32_e32 v33, 0x400000, v31
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v31, v31
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v31, v32, v33, vcc
+; GFX9-NEXT: v_bfe_u32 v32, v15, 16, 1
+; GFX9-NEXT: v_add3_u32 v32, v32, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v33, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v32, v33, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v15, v15, v31, s6
+; GFX9-NEXT: v_perm_b32 v14, v14, v30, s6
+; GFX9-NEXT: v_perm_b32 v13, v13, v29, s6
+; GFX9-NEXT: v_perm_b32 v12, v12, v28, s6
+; GFX9-NEXT: v_perm_b32 v11, v11, v27, s6
+; GFX9-NEXT: v_perm_b32 v10, v10, v26, s6
+; GFX9-NEXT: v_perm_b32 v9, v9, v25, s6
+; GFX9-NEXT: v_perm_b32 v8, v8, v24, s6
+; GFX9-NEXT: v_perm_b32 v7, v7, v23, s6
+; GFX9-NEXT: v_perm_b32 v6, v6, v22, s6
+; GFX9-NEXT: v_perm_b32 v5, v5, v21, s6
+; GFX9-NEXT: v_perm_b32 v4, v4, v20, s6
+; GFX9-NEXT: v_perm_b32 v3, v3, v19, s6
+; GFX9-NEXT: v_perm_b32 v2, v2, v18, s6
+; GFX9-NEXT: v_perm_b32 v1, v1, v17, s6
+; GFX9-NEXT: v_perm_b32 v0, v0, v16, s6
+; GFX9-NEXT: .LBB39_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32bf16_to_v32i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB39_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v17, 16, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX11-NEXT: v_lshlrev_b32_e32 v24, 16, v5
+; GFX11-NEXT: v_lshlrev_b32_e32 v26, 16, v7
+; GFX11-NEXT: v_lshlrev_b32_e32 v28, 16, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v17 :: v_dual_add_f32 v16, 0x40c00000, v16
+; GFX11-NEXT: v_lshlrev_b32_e32 v30, 16, v11
+; GFX11-NEXT: v_dual_add_f32 v24, 0x40c00000, v24 :: v_dual_lshlrev_b32 v25, 16, v6
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v19, v16, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v16
+; GFX11-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v16, v16
+; GFX11-NEXT: v_add3_u32 v21, v21, v17, 0x7fff
+; GFX11-NEXT: v_add3_u32 v19, v19, v16, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: v_dual_add_f32 v26, 0x40c00000, v26 :: v_dual_lshlrev_b32 v27, 16, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v19, v22, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v22, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX11-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v29, 16, v10
+; GFX11-NEXT: v_bfe_u32 v20, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_dual_add_f32 v28, 0x40c00000, v28 :: v_dual_add_f32 v29, 0x40c00000, v29
+; GFX11-NEXT: v_add3_u32 v20, v20, v0, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX11-NEXT: v_dual_add_f32 v30, 0x40c00000, v30 :: v_dual_lshlrev_b32 v31, 16, v12
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v20, v23 :: v_dual_lshlrev_b32 v23, 16, v4
+; GFX11-NEXT: v_bfe_u32 v20, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_dual_add_f32 v10, 0x40c00000, v10 :: v_dual_add_f32 v23, 0x40c00000, v23
+; GFX11-NEXT: v_perm_b32 v0, v0, v16, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v21, v19, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v20, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v2
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_dual_add_f32 v4, 0x40c00000, v4 :: v_dual_add_f32 v31, 0x40c00000, v31
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v19, v20 :: v_dual_add_f32 v18, 0x40c00000, v18
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GFX11-NEXT: v_bfe_u32 v21, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_or_b32_e32 v33, 0x400000, v31
+; GFX11-NEXT: v_add3_u32 v19, v21, v18, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v21, v2, 16, 1
+; GFX11-NEXT: v_bfe_u32 v34, v12, 16, 1
+; GFX11-NEXT: v_perm_b32 v1, v1, v17, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v18, v19, v20 :: v_dual_and_b32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_add3_u32 v19, v21, v2, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_bfe_u32 v21, v22, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v19, v20 :: v_dual_and_b32 v11, 0xffff0000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v19, v21, v22, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v22
+; GFX11-NEXT: v_bfe_u32 v21, v3, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v22, v22
+; GFX11-NEXT: v_bfe_u32 v22, v23, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX11-NEXT: v_perm_b32 v2, v2, v18, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX11-NEXT: v_cndmask_b32_e32 v19, v19, v20, vcc_lo
+; GFX11-NEXT: v_add3_u32 v20, v21, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_or_b32_e32 v32, 0x400000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v20, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v20, v22, v23, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v23
+; GFX11-NEXT: v_bfe_u32 v22, v4, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v23, v23
+; GFX11-NEXT: v_bfe_u32 v23, v24, 16, 1
+; GFX11-NEXT: v_perm_b32 v3, v3, v19, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v20, v20, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v21, v22, v4, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v21, v22, vcc_lo
+; GFX11-NEXT: v_add3_u32 v21, v23, v24, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v24
+; GFX11-NEXT: v_bfe_u32 v23, v5, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v24, v24
+; GFX11-NEXT: v_bfe_u32 v24, v25, 16, 1
+; GFX11-NEXT: v_perm_b32 v4, v4, v20, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v21, v21, v22, vcc_lo
+; GFX11-NEXT: v_add3_u32 v22, v23, v5, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v22, v23, vcc_lo
+; GFX11-NEXT: v_add3_u32 v22, v24, v25, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v25
+; GFX11-NEXT: v_bfe_u32 v24, v6, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v25, v25
+; GFX11-NEXT: v_bfe_u32 v25, v26, 16, 1
+; GFX11-NEXT: v_perm_b32 v5, v5, v21, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v22, v22, v23, vcc_lo
+; GFX11-NEXT: v_add3_u32 v23, v24, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v24, 0x400000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v23, v24, vcc_lo
+; GFX11-NEXT: v_add3_u32 v23, v25, v26, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v24, 0x400000, v26
+; GFX11-NEXT: v_bfe_u32 v25, v7, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v26, v26
+; GFX11-NEXT: v_bfe_u32 v26, v27, 16, 1
+; GFX11-NEXT: v_perm_b32 v6, v6, v22, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v23, v23, v24, vcc_lo
+; GFX11-NEXT: v_add3_u32 v24, v25, v7, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v24, v25, vcc_lo
+; GFX11-NEXT: v_add3_u32 v24, v26, v27, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v27
+; GFX11-NEXT: v_bfe_u32 v26, v8, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v27, v27
+; GFX11-NEXT: v_bfe_u32 v27, v28, 16, 1
+; GFX11-NEXT: v_perm_b32 v7, v7, v23, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v24, v24, v25, vcc_lo
+; GFX11-NEXT: v_add3_u32 v25, v26, v8, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v8
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v25, v26, vcc_lo
+; GFX11-NEXT: v_add3_u32 v25, v27, v28, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v28
+; GFX11-NEXT: v_bfe_u32 v27, v9, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v28, v28
+; GFX11-NEXT: v_bfe_u32 v28, v29, 16, 1
+; GFX11-NEXT: v_perm_b32 v8, v8, v24, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v25, v25, v26, vcc_lo
+; GFX11-NEXT: v_add3_u32 v26, v27, v9, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v27, 0x400000, v9
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v26, v27, vcc_lo
+; GFX11-NEXT: v_add3_u32 v26, v28, v29, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v27, 0x400000, v29
+; GFX11-NEXT: v_bfe_u32 v28, v10, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v29, v29
+; GFX11-NEXT: v_bfe_u32 v29, v30, 16, 1
+; GFX11-NEXT: v_perm_b32 v9, v9, v25, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v26, v26, v27, vcc_lo
+; GFX11-NEXT: v_add3_u32 v27, v28, v10, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v28, 0x400000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v27, v28, vcc_lo
+; GFX11-NEXT: v_add3_u32 v27, v29, v30, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v28, 0x400000, v30
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v30, v30
+; GFX11-NEXT: v_bfe_u32 v30, v31, 16, 1
+; GFX11-NEXT: v_bfe_u32 v29, v11, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v27, v27, v28 :: v_dual_lshlrev_b32 v28, 16, v13
+; GFX11-NEXT: v_add3_u32 v30, v30, v31, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v31, v31
+; GFX11-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX11-NEXT: v_add3_u32 v31, v34, v12, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GFX11-NEXT: v_add3_u32 v29, v29, v11, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v30, v30, v33, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v33, 0x400000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_bfe_u32 v35, v28, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX11-NEXT: v_or_b32_e32 v36, 0x400000, v28
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v31, v33, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v34, v35, v28, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v35, 16, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v28, v28
+; GFX11-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX11-NEXT: v_bfe_u32 v37, v13, 16, 1
+; GFX11-NEXT: v_perm_b32 v10, v10, v26, 0x7060302
+; GFX11-NEXT: v_dual_add_f32 v31, 0x40c00000, v35 :: v_dual_cndmask_b32 v28, v34, v36
+; GFX11-NEXT: v_lshlrev_b32_e32 v34, 16, v15
+; GFX11-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX11-NEXT: v_add3_u32 v33, v37, v13, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v35, v31, 16, 1
+; GFX11-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX11-NEXT: v_add_f32_e32 v34, 0x40c00000, v34
+; GFX11-NEXT: v_or_b32_e32 v37, 0x400000, v31
+; GFX11-NEXT: v_bfe_u32 v38, v14, 16, 1
+; GFX11-NEXT: v_add3_u32 v35, v35, v31, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v31, v31
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_bfe_u32 v39, v34, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v48, 0x400000, v34
+; GFX11-NEXT: v_or_b32_e32 v36, 0x400000, v13
+; GFX11-NEXT: v_cndmask_b32_e32 v31, v35, v37, vcc_lo
+; GFX11-NEXT: v_add3_u32 v37, v38, v14, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v38, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_bfe_u32 v35, v15, 16, 1
+; GFX11-NEXT: v_add3_u32 v39, v39, v34, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v49, 0x400000, v15
+; GFX11-NEXT: v_cndmask_b32_e32 v14, v37, v38, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v34, v34
+; GFX11-NEXT: v_add3_u32 v35, v35, v15, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v14, v14, v31, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v34, v39, v48, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: v_cndmask_b32_e32 v15, v35, v49, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v13, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v15, v15, v34, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v13, v33, v36, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_perm_b32 v12, v12, v30, 0x7060302
+; GFX11-NEXT: v_perm_b32 v13, v13, v28, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v29, v32, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v11, v11, v27, 0x7060302
+; GFX11-NEXT: .LBB39_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <32 x bfloat> %a1 to <32 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x bfloat> %a to <32 x i16>
+ br label %end
+
+end:
+ %phi = phi <32 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x i16> %phi
+}
+
+define <32 x bfloat> @v_bitcast_v32f16_to_v32bf16(<32 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32f16_to_v32bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_cvt_f16_f32_e32 v32, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v33, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v34, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v35, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v36, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v37, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v38, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v39, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v48, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v49, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v50, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v51, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v52, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v53, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v54, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v55, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v40, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v41, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v42, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v43, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v44, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v45, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v46, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v47, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v56, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v57, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v58, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v59, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v60, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v61, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v62, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v63
+; GCN-NEXT: v_cvt_f16_f32_e32 v63, v31
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB40_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v32
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v33
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v34
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v36
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v38
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v48
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v50
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v52
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v55
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v40
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v41
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v42
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v43
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v44
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v45
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v46
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v47
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v56
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v57
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v58
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v59
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v60
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v61
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v62
+; GCN-NEXT: v_lshlrev_b32_e32 v31, 16, v63
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr46
+; GCN-NEXT: ; implicit-def: $vgpr47
+; GCN-NEXT: ; implicit-def: $vgpr56
+; GCN-NEXT: ; implicit-def: $vgpr57
+; GCN-NEXT: ; implicit-def: $vgpr58
+; GCN-NEXT: ; implicit-def: $vgpr59
+; GCN-NEXT: ; implicit-def: $vgpr60
+; GCN-NEXT: ; implicit-def: $vgpr61
+; GCN-NEXT: ; implicit-def: $vgpr62
+; GCN-NEXT: ; implicit-def: $vgpr63
+; GCN-NEXT: .LBB40_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB40_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v63
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v62
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v61
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v60
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v59
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v58
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v57
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v56
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v47
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v46
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v45
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v44
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v43
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v42
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v41
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v40
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v32
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x38000000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x38000000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x38000000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x38000000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x38000000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x38000000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x38000000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x38000000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x38000000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x38000000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x38000000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x38000000, v15
+; GCN-NEXT: v_add_f32_e32 v16, 0x38000000, v16
+; GCN-NEXT: v_add_f32_e32 v17, 0x38000000, v17
+; GCN-NEXT: v_add_f32_e32 v18, 0x38000000, v18
+; GCN-NEXT: v_add_f32_e32 v19, 0x38000000, v19
+; GCN-NEXT: v_add_f32_e32 v20, 0x38000000, v20
+; GCN-NEXT: v_add_f32_e32 v21, 0x38000000, v21
+; GCN-NEXT: v_add_f32_e32 v22, 0x38000000, v22
+; GCN-NEXT: v_add_f32_e32 v23, 0x38000000, v23
+; GCN-NEXT: v_add_f32_e32 v24, 0x38000000, v24
+; GCN-NEXT: v_add_f32_e32 v25, 0x38000000, v25
+; GCN-NEXT: v_add_f32_e32 v26, 0x38000000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x38000000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x38000000, v28
+; GCN-NEXT: v_add_f32_e32 v29, 0x38000000, v29
+; GCN-NEXT: v_add_f32_e32 v30, 0x38000000, v30
+; GCN-NEXT: v_add_f32_e32 v31, 0x38000000, v31
+; GCN-NEXT: v_cvt_f16_f32_e32 v31, v31
+; GCN-NEXT: v_cvt_f16_f32_e32 v30, v30
+; GCN-NEXT: v_cvt_f16_f32_e32 v29, v29
+; GCN-NEXT: v_cvt_f16_f32_e32 v28, v28
+; GCN-NEXT: v_cvt_f16_f32_e32 v27, v27
+; GCN-NEXT: v_cvt_f16_f32_e32 v26, v26
+; GCN-NEXT: v_cvt_f16_f32_e32 v25, v25
+; GCN-NEXT: v_cvt_f16_f32_e32 v24, v24
+; GCN-NEXT: v_cvt_f16_f32_e32 v23, v23
+; GCN-NEXT: v_cvt_f16_f32_e32 v22, v22
+; GCN-NEXT: v_cvt_f16_f32_e32 v21, v21
+; GCN-NEXT: v_cvt_f16_f32_e32 v20, v20
+; GCN-NEXT: v_cvt_f16_f32_e32 v19, v19
+; GCN-NEXT: v_cvt_f16_f32_e32 v18, v18
+; GCN-NEXT: v_cvt_f16_f32_e32 v17, v17
+; GCN-NEXT: v_cvt_f16_f32_e32 v16, v16
+; GCN-NEXT: v_cvt_f16_f32_e32 v32, v15
+; GCN-NEXT: v_cvt_f16_f32_e32 v33, v14
+; GCN-NEXT: v_cvt_f16_f32_e32 v34, v13
+; GCN-NEXT: v_cvt_f16_f32_e32 v35, v12
+; GCN-NEXT: v_cvt_f16_f32_e32 v36, v11
+; GCN-NEXT: v_cvt_f16_f32_e32 v37, v10
+; GCN-NEXT: v_cvt_f16_f32_e32 v38, v9
+; GCN-NEXT: v_cvt_f16_f32_e32 v39, v8
+; GCN-NEXT: v_cvt_f16_f32_e32 v48, v7
+; GCN-NEXT: v_cvt_f16_f32_e32 v49, v6
+; GCN-NEXT: v_cvt_f16_f32_e32 v50, v5
+; GCN-NEXT: v_cvt_f16_f32_e32 v51, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v52, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v53, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v54, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v55, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v31
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v30
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v29
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v28
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v27
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v26
+; GCN-NEXT: v_lshlrev_b32_e32 v6, 16, v25
+; GCN-NEXT: v_lshlrev_b32_e32 v7, 16, v24
+; GCN-NEXT: v_lshlrev_b32_e32 v8, 16, v23
+; GCN-NEXT: v_lshlrev_b32_e32 v9, 16, v22
+; GCN-NEXT: v_lshlrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_lshlrev_b32_e32 v11, 16, v20
+; GCN-NEXT: v_lshlrev_b32_e32 v12, 16, v19
+; GCN-NEXT: v_lshlrev_b32_e32 v13, 16, v18
+; GCN-NEXT: v_lshlrev_b32_e32 v14, 16, v17
+; GCN-NEXT: v_lshlrev_b32_e32 v15, 16, v16
+; GCN-NEXT: v_lshlrev_b32_e32 v16, 16, v32
+; GCN-NEXT: v_lshlrev_b32_e32 v17, 16, v33
+; GCN-NEXT: v_lshlrev_b32_e32 v18, 16, v34
+; GCN-NEXT: v_lshlrev_b32_e32 v19, 16, v35
+; GCN-NEXT: v_lshlrev_b32_e32 v20, 16, v36
+; GCN-NEXT: v_lshlrev_b32_e32 v21, 16, v37
+; GCN-NEXT: v_lshlrev_b32_e32 v22, 16, v38
+; GCN-NEXT: v_lshlrev_b32_e32 v23, 16, v39
+; GCN-NEXT: v_lshlrev_b32_e32 v24, 16, v48
+; GCN-NEXT: v_lshlrev_b32_e32 v25, 16, v49
+; GCN-NEXT: v_lshlrev_b32_e32 v26, 16, v50
+; GCN-NEXT: v_lshlrev_b32_e32 v27, 16, v51
+; GCN-NEXT: v_lshlrev_b32_e32 v28, 16, v52
+; GCN-NEXT: v_lshlrev_b32_e32 v29, 16, v53
+; GCN-NEXT: v_lshlrev_b32_e32 v30, 16, v54
+; GCN-NEXT: v_lshlrev_b32_e32 v31, 16, v55
+; GCN-NEXT: .LBB40_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32f16_to_v32bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB40_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v17, 0x200
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v15
+; VI-NEXT: v_add_f16_sdwa v15, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v15, v19, v15
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v14
+; VI-NEXT: v_add_f16_sdwa v14, v14, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v14, v19, v14
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v13
+; VI-NEXT: v_add_f16_sdwa v13, v13, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v13, v19, v13
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v12
+; VI-NEXT: v_add_f16_sdwa v12, v12, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v12, v19, v12
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v11
+; VI-NEXT: v_add_f16_sdwa v11, v11, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v11, v19, v11
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v10
+; VI-NEXT: v_add_f16_sdwa v10, v10, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v10, v19, v10
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v9
+; VI-NEXT: v_add_f16_sdwa v9, v9, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v9, v19, v9
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v8
+; VI-NEXT: v_add_f16_sdwa v8, v8, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v8, v19, v8
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v7
+; VI-NEXT: v_add_f16_sdwa v7, v7, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v7, v19, v7
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v6
+; VI-NEXT: v_add_f16_sdwa v6, v6, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v6, v19, v6
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v5
+; VI-NEXT: v_add_f16_sdwa v5, v5, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v5, v19, v5
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v4
+; VI-NEXT: v_add_f16_sdwa v4, v4, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v16, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v18, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v4, v19, v4
+; VI-NEXT: v_add_f16_e32 v19, 0x200, v2
+; VI-NEXT: v_add_f16_sdwa v2, v2, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_sdwa v17, v3, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v3, 0x200, v3
+; VI-NEXT: v_or_b32_e32 v3, v3, v17
+; VI-NEXT: v_or_b32_e32 v2, v19, v2
+; VI-NEXT: v_or_b32_e32 v1, v18, v1
+; VI-NEXT: v_or_b32_e32 v0, v16, v0
+; VI-NEXT: .LBB40_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32f16_to_v32bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB40_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v15, v15, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v14, v14, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v13, v13, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v12, v12, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v11, v11, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v10, v10, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v9, v9, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v8, v8, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v7, v7, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v6, v6, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v5, v5, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v4, v4, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v3, v3, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v2, v2, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: .LBB40_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32f16_to_v32bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB40_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v15, 0x200, v15 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v14, 0x200, v14 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v13, 0x200, v13 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v12, 0x200, v12 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v11, 0x200, v11 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v10, 0x200, v10 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v9, 0x200, v9 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v8, 0x200, v8 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v7, 0x200, v7 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v6, 0x200, v6 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v5, 0x200, v5 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v4, 0x200, v4 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v3, 0x200, v3 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v2, 0x200, v2 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: .LBB40_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <32 x half> %a1 to <32 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x half> %a to <32 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <32 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x bfloat> %phi
+}
+
+define <32 x half> @v_bitcast_v32bf16_to_v32f16(<32 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v32bf16_to_v32f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
+; GCN-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GCN-NEXT: s_waitcnt expcnt(0)
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32 offset:4
+; GCN-NEXT: v_mul_f32_e32 v32, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v33, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v34, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v35, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v36, 1.0, v4
+; GCN-NEXT: v_mul_f32_e32 v37, 1.0, v5
+; GCN-NEXT: v_mul_f32_e32 v38, 1.0, v6
+; GCN-NEXT: v_mul_f32_e32 v39, 1.0, v7
+; GCN-NEXT: v_mul_f32_e32 v48, 1.0, v8
+; GCN-NEXT: v_mul_f32_e32 v49, 1.0, v9
+; GCN-NEXT: v_mul_f32_e32 v50, 1.0, v10
+; GCN-NEXT: v_mul_f32_e32 v51, 1.0, v11
+; GCN-NEXT: v_mul_f32_e32 v52, 1.0, v12
+; GCN-NEXT: v_mul_f32_e32 v53, 1.0, v13
+; GCN-NEXT: v_mul_f32_e32 v54, 1.0, v14
+; GCN-NEXT: v_mul_f32_e32 v55, 1.0, v15
+; GCN-NEXT: v_mul_f32_e32 v40, 1.0, v16
+; GCN-NEXT: v_mul_f32_e32 v41, 1.0, v17
+; GCN-NEXT: v_mul_f32_e32 v42, 1.0, v18
+; GCN-NEXT: v_mul_f32_e32 v43, 1.0, v19
+; GCN-NEXT: v_mul_f32_e32 v44, 1.0, v20
+; GCN-NEXT: v_mul_f32_e32 v45, 1.0, v21
+; GCN-NEXT: v_mul_f32_e32 v46, 1.0, v22
+; GCN-NEXT: v_mul_f32_e32 v47, 1.0, v23
+; GCN-NEXT: v_mul_f32_e32 v56, 1.0, v24
+; GCN-NEXT: v_mul_f32_e32 v57, 1.0, v25
+; GCN-NEXT: v_mul_f32_e32 v58, 1.0, v26
+; GCN-NEXT: v_mul_f32_e32 v59, 1.0, v27
+; GCN-NEXT: v_mul_f32_e32 v60, 1.0, v28
+; GCN-NEXT: v_mul_f32_e32 v61, 1.0, v29
+; GCN-NEXT: v_mul_f32_e32 v62, 1.0, v30
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v63
+; GCN-NEXT: v_mul_f32_e32 v63, 1.0, v31
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr9
+; GCN-NEXT: ; implicit-def: $vgpr10
+; GCN-NEXT: ; implicit-def: $vgpr11
+; GCN-NEXT: ; implicit-def: $vgpr12
+; GCN-NEXT: ; implicit-def: $vgpr13
+; GCN-NEXT: ; implicit-def: $vgpr14
+; GCN-NEXT: ; implicit-def: $vgpr15
+; GCN-NEXT: ; implicit-def: $vgpr16
+; GCN-NEXT: ; implicit-def: $vgpr17
+; GCN-NEXT: ; implicit-def: $vgpr18
+; GCN-NEXT: ; implicit-def: $vgpr19
+; GCN-NEXT: ; implicit-def: $vgpr20
+; GCN-NEXT: ; implicit-def: $vgpr21
+; GCN-NEXT: ; implicit-def: $vgpr22
+; GCN-NEXT: ; implicit-def: $vgpr23
+; GCN-NEXT: ; implicit-def: $vgpr24
+; GCN-NEXT: ; implicit-def: $vgpr25
+; GCN-NEXT: ; implicit-def: $vgpr26
+; GCN-NEXT: ; implicit-def: $vgpr27
+; GCN-NEXT: ; implicit-def: $vgpr28
+; GCN-NEXT: ; implicit-def: $vgpr29
+; GCN-NEXT: ; implicit-def: $vgpr30
+; GCN-NEXT: ; implicit-def: $vgpr31
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB41_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v32
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v33
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v34
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v35
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v36
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v37
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v38
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v39
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v48
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v49
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v50
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v51
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v52
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v53
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v54
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v55
+; GCN-NEXT: v_lshrrev_b32_e32 v16, 16, v40
+; GCN-NEXT: v_lshrrev_b32_e32 v17, 16, v41
+; GCN-NEXT: v_lshrrev_b32_e32 v18, 16, v42
+; GCN-NEXT: v_lshrrev_b32_e32 v19, 16, v43
+; GCN-NEXT: v_lshrrev_b32_e32 v20, 16, v44
+; GCN-NEXT: v_lshrrev_b32_e32 v21, 16, v45
+; GCN-NEXT: v_lshrrev_b32_e32 v22, 16, v46
+; GCN-NEXT: v_lshrrev_b32_e32 v23, 16, v47
+; GCN-NEXT: v_lshrrev_b32_e32 v24, 16, v56
+; GCN-NEXT: v_lshrrev_b32_e32 v25, 16, v57
+; GCN-NEXT: v_lshrrev_b32_e32 v26, 16, v58
+; GCN-NEXT: v_lshrrev_b32_e32 v27, 16, v59
+; GCN-NEXT: v_lshrrev_b32_e32 v28, 16, v60
+; GCN-NEXT: v_lshrrev_b32_e32 v29, 16, v61
+; GCN-NEXT: v_lshrrev_b32_e32 v30, 16, v62
+; GCN-NEXT: v_lshrrev_b32_e32 v31, 16, v63
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v16
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v17
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v18
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v19
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v20
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v21
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v22
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v23
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v24
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v25
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v26
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v27
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v28
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v29
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v30
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v31
+; GCN-NEXT: ; implicit-def: $vgpr32
+; GCN-NEXT: ; implicit-def: $vgpr33
+; GCN-NEXT: ; implicit-def: $vgpr34
+; GCN-NEXT: ; implicit-def: $vgpr35
+; GCN-NEXT: ; implicit-def: $vgpr36
+; GCN-NEXT: ; implicit-def: $vgpr37
+; GCN-NEXT: ; implicit-def: $vgpr38
+; GCN-NEXT: ; implicit-def: $vgpr39
+; GCN-NEXT: ; implicit-def: $vgpr48
+; GCN-NEXT: ; implicit-def: $vgpr49
+; GCN-NEXT: ; implicit-def: $vgpr50
+; GCN-NEXT: ; implicit-def: $vgpr51
+; GCN-NEXT: ; implicit-def: $vgpr52
+; GCN-NEXT: ; implicit-def: $vgpr53
+; GCN-NEXT: ; implicit-def: $vgpr54
+; GCN-NEXT: ; implicit-def: $vgpr55
+; GCN-NEXT: ; implicit-def: $vgpr40
+; GCN-NEXT: ; implicit-def: $vgpr41
+; GCN-NEXT: ; implicit-def: $vgpr42
+; GCN-NEXT: ; implicit-def: $vgpr43
+; GCN-NEXT: ; implicit-def: $vgpr44
+; GCN-NEXT: ; implicit-def: $vgpr45
+; GCN-NEXT: ; implicit-def: $vgpr46
+; GCN-NEXT: ; implicit-def: $vgpr47
+; GCN-NEXT: ; implicit-def: $vgpr56
+; GCN-NEXT: ; implicit-def: $vgpr57
+; GCN-NEXT: ; implicit-def: $vgpr58
+; GCN-NEXT: ; implicit-def: $vgpr59
+; GCN-NEXT: ; implicit-def: $vgpr60
+; GCN-NEXT: ; implicit-def: $vgpr61
+; GCN-NEXT: ; implicit-def: $vgpr62
+; GCN-NEXT: ; implicit-def: $vgpr63
+; GCN-NEXT: .LBB41_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB41_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v63
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v62
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v61
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v60
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v59
+; GCN-NEXT: v_and_b32_e32 v5, 0xffff0000, v58
+; GCN-NEXT: v_and_b32_e32 v6, 0xffff0000, v57
+; GCN-NEXT: v_and_b32_e32 v7, 0xffff0000, v56
+; GCN-NEXT: v_and_b32_e32 v8, 0xffff0000, v47
+; GCN-NEXT: v_and_b32_e32 v9, 0xffff0000, v46
+; GCN-NEXT: v_and_b32_e32 v10, 0xffff0000, v45
+; GCN-NEXT: v_and_b32_e32 v11, 0xffff0000, v44
+; GCN-NEXT: v_and_b32_e32 v12, 0xffff0000, v43
+; GCN-NEXT: v_and_b32_e32 v13, 0xffff0000, v42
+; GCN-NEXT: v_and_b32_e32 v14, 0xffff0000, v41
+; GCN-NEXT: v_and_b32_e32 v15, 0xffff0000, v40
+; GCN-NEXT: v_and_b32_e32 v16, 0xffff0000, v55
+; GCN-NEXT: v_and_b32_e32 v17, 0xffff0000, v54
+; GCN-NEXT: v_and_b32_e32 v18, 0xffff0000, v53
+; GCN-NEXT: v_and_b32_e32 v19, 0xffff0000, v52
+; GCN-NEXT: v_and_b32_e32 v20, 0xffff0000, v51
+; GCN-NEXT: v_and_b32_e32 v21, 0xffff0000, v50
+; GCN-NEXT: v_and_b32_e32 v22, 0xffff0000, v49
+; GCN-NEXT: v_and_b32_e32 v23, 0xffff0000, v48
+; GCN-NEXT: v_and_b32_e32 v24, 0xffff0000, v39
+; GCN-NEXT: v_and_b32_e32 v25, 0xffff0000, v38
+; GCN-NEXT: v_and_b32_e32 v26, 0xffff0000, v37
+; GCN-NEXT: v_and_b32_e32 v27, 0xffff0000, v36
+; GCN-NEXT: v_and_b32_e32 v28, 0xffff0000, v35
+; GCN-NEXT: v_and_b32_e32 v29, 0xffff0000, v34
+; GCN-NEXT: v_and_b32_e32 v30, 0xffff0000, v33
+; GCN-NEXT: v_and_b32_e32 v31, 0xffff0000, v32
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GCN-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GCN-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GCN-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GCN-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GCN-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GCN-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GCN-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GCN-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GCN-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GCN-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GCN-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GCN-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GCN-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GCN-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GCN-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GCN-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GCN-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GCN-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GCN-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GCN-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GCN-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GCN-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GCN-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GCN-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GCN-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GCN-NEXT: v_add_f32_e32 v30, 0x40c00000, v30
+; GCN-NEXT: v_add_f32_e32 v31, 0x40c00000, v31
+; GCN-NEXT: v_lshrrev_b32_e32 v32, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v33, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v34, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v35, 16, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v36, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v37, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v38, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v39, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v48, 16, v8
+; GCN-NEXT: v_lshrrev_b32_e32 v49, 16, v9
+; GCN-NEXT: v_lshrrev_b32_e32 v50, 16, v10
+; GCN-NEXT: v_lshrrev_b32_e32 v51, 16, v11
+; GCN-NEXT: v_lshrrev_b32_e32 v52, 16, v12
+; GCN-NEXT: v_lshrrev_b32_e32 v53, 16, v13
+; GCN-NEXT: v_lshrrev_b32_e32 v54, 16, v14
+; GCN-NEXT: v_lshrrev_b32_e32 v55, 16, v15
+; GCN-NEXT: v_lshrrev_b32_e32 v15, 16, v16
+; GCN-NEXT: v_lshrrev_b32_e32 v14, 16, v17
+; GCN-NEXT: v_lshrrev_b32_e32 v13, 16, v18
+; GCN-NEXT: v_lshrrev_b32_e32 v12, 16, v19
+; GCN-NEXT: v_lshrrev_b32_e32 v11, 16, v20
+; GCN-NEXT: v_lshrrev_b32_e32 v10, 16, v21
+; GCN-NEXT: v_lshrrev_b32_e32 v9, 16, v22
+; GCN-NEXT: v_lshrrev_b32_e32 v8, 16, v23
+; GCN-NEXT: v_lshrrev_b32_e32 v7, 16, v24
+; GCN-NEXT: v_lshrrev_b32_e32 v6, 16, v25
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v26
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v27
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v28
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v29
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v30
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v31
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v6, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v7, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v8, v8
+; GCN-NEXT: v_cvt_f32_f16_e32 v9, v9
+; GCN-NEXT: v_cvt_f32_f16_e32 v10, v10
+; GCN-NEXT: v_cvt_f32_f16_e32 v11, v11
+; GCN-NEXT: v_cvt_f32_f16_e32 v12, v12
+; GCN-NEXT: v_cvt_f32_f16_e32 v13, v13
+; GCN-NEXT: v_cvt_f32_f16_e32 v14, v14
+; GCN-NEXT: v_cvt_f32_f16_e32 v15, v15
+; GCN-NEXT: v_cvt_f32_f16_e32 v16, v55
+; GCN-NEXT: v_cvt_f32_f16_e32 v17, v54
+; GCN-NEXT: v_cvt_f32_f16_e32 v18, v53
+; GCN-NEXT: v_cvt_f32_f16_e32 v19, v52
+; GCN-NEXT: v_cvt_f32_f16_e32 v20, v51
+; GCN-NEXT: v_cvt_f32_f16_e32 v21, v50
+; GCN-NEXT: v_cvt_f32_f16_e32 v22, v49
+; GCN-NEXT: v_cvt_f32_f16_e32 v23, v48
+; GCN-NEXT: v_cvt_f32_f16_e32 v24, v39
+; GCN-NEXT: v_cvt_f32_f16_e32 v25, v38
+; GCN-NEXT: v_cvt_f32_f16_e32 v26, v37
+; GCN-NEXT: v_cvt_f32_f16_e32 v27, v36
+; GCN-NEXT: v_cvt_f32_f16_e32 v28, v35
+; GCN-NEXT: v_cvt_f32_f16_e32 v29, v34
+; GCN-NEXT: v_cvt_f32_f16_e32 v30, v33
+; GCN-NEXT: v_cvt_f32_f16_e32 v31, v32
+; GCN-NEXT: .LBB41_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:44 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:48 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:52 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:56 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:60 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:64 ; 4-byte Folded Reload
+; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:68 ; 4-byte Folded Reload
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v32bf16_to_v32f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB41_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; VI-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; VI-NEXT: v_bfe_u32 v17, v16, 16, 1
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v16
+; VI-NEXT: v_add_u32_e32 v17, vcc, 0x7fff, v17
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; VI-NEXT: v_bfe_u32 v17, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v17, vcc, v17, v0
+; VI-NEXT: v_add_u32_e32 v17, vcc, s6, v17
+; VI-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v17, 16, v1
+; VI-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; VI-NEXT: v_bfe_u32 v18, v17, 16, 1
+; VI-NEXT: v_add_u32_e32 v18, vcc, v18, v17
+; VI-NEXT: v_add_u32_e32 v18, vcc, s6, v18
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v17, v17
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v17, v18, v19, vcc
+; VI-NEXT: v_bfe_u32 v18, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v18, vcc, v18, v1
+; VI-NEXT: v_add_u32_e32 v18, vcc, s6, v18
+; VI-NEXT: v_or_b32_e32 v19, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v18, v19, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v18, 16, v2
+; VI-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; VI-NEXT: v_bfe_u32 v19, v18, 16, 1
+; VI-NEXT: v_add_u32_e32 v19, vcc, v19, v18
+; VI-NEXT: v_add_u32_e32 v19, vcc, s6, v19
+; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; VI-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v18, v18
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc
+; VI-NEXT: v_bfe_u32 v19, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v19, vcc, v19, v2
+; VI-NEXT: v_add_u32_e32 v19, vcc, s6, v19
+; VI-NEXT: v_or_b32_e32 v20, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_cndmask_b32_e32 v2, v19, v20, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v19, 16, v3
+; VI-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; VI-NEXT: v_bfe_u32 v20, v19, 16, 1
+; VI-NEXT: v_add_u32_e32 v20, vcc, v20, v19
+; VI-NEXT: v_add_u32_e32 v20, vcc, s6, v20
+; VI-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; VI-NEXT: v_or_b32_e32 v21, 0x400000, v19
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v19, v19
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_cndmask_b32_e32 v19, v20, v21, vcc
+; VI-NEXT: v_bfe_u32 v20, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v20, vcc, v20, v3
+; VI-NEXT: v_add_u32_e32 v20, vcc, s6, v20
+; VI-NEXT: v_or_b32_e32 v21, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_cndmask_b32_e32 v3, v20, v21, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v20, 16, v4
+; VI-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; VI-NEXT: v_bfe_u32 v21, v20, 16, 1
+; VI-NEXT: v_add_u32_e32 v21, vcc, v21, v20
+; VI-NEXT: v_add_u32_e32 v21, vcc, s6, v21
+; VI-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; VI-NEXT: v_or_b32_e32 v22, 0x400000, v20
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v20, v20
+; VI-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; VI-NEXT: v_cndmask_b32_e32 v20, v21, v22, vcc
+; VI-NEXT: v_bfe_u32 v21, v4, 16, 1
+; VI-NEXT: v_add_u32_e32 v21, vcc, v21, v4
+; VI-NEXT: v_add_u32_e32 v21, vcc, s6, v21
+; VI-NEXT: v_or_b32_e32 v22, 0x400000, v4
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; VI-NEXT: v_cndmask_b32_e32 v4, v21, v22, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; VI-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; VI-NEXT: v_bfe_u32 v22, v21, 16, 1
+; VI-NEXT: v_add_u32_e32 v22, vcc, v22, v21
+; VI-NEXT: v_add_u32_e32 v22, vcc, s6, v22
+; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; VI-NEXT: v_or_b32_e32 v23, 0x400000, v21
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v21, v21
+; VI-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; VI-NEXT: v_cndmask_b32_e32 v21, v22, v23, vcc
+; VI-NEXT: v_bfe_u32 v22, v5, 16, 1
+; VI-NEXT: v_add_u32_e32 v22, vcc, v22, v5
+; VI-NEXT: v_add_u32_e32 v22, vcc, s6, v22
+; VI-NEXT: v_or_b32_e32 v23, 0x400000, v5
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; VI-NEXT: v_cndmask_b32_e32 v5, v22, v23, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v22, 16, v6
+; VI-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; VI-NEXT: v_bfe_u32 v23, v22, 16, 1
+; VI-NEXT: v_add_u32_e32 v23, vcc, v23, v22
+; VI-NEXT: v_add_u32_e32 v23, vcc, s6, v23
+; VI-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; VI-NEXT: v_or_b32_e32 v24, 0x400000, v22
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v22, v22
+; VI-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; VI-NEXT: v_cndmask_b32_e32 v22, v23, v24, vcc
+; VI-NEXT: v_bfe_u32 v23, v6, 16, 1
+; VI-NEXT: v_add_u32_e32 v23, vcc, v23, v6
+; VI-NEXT: v_add_u32_e32 v23, vcc, s6, v23
+; VI-NEXT: v_or_b32_e32 v24, 0x400000, v6
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; VI-NEXT: v_cndmask_b32_e32 v6, v23, v24, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v23, 16, v7
+; VI-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; VI-NEXT: v_bfe_u32 v24, v23, 16, 1
+; VI-NEXT: v_add_u32_e32 v24, vcc, v24, v23
+; VI-NEXT: v_add_u32_e32 v24, vcc, s6, v24
+; VI-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; VI-NEXT: v_or_b32_e32 v25, 0x400000, v23
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v23, v23
+; VI-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; VI-NEXT: v_cndmask_b32_e32 v23, v24, v25, vcc
+; VI-NEXT: v_bfe_u32 v24, v7, 16, 1
+; VI-NEXT: v_add_u32_e32 v24, vcc, v24, v7
+; VI-NEXT: v_add_u32_e32 v24, vcc, s6, v24
+; VI-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; VI-NEXT: v_cndmask_b32_e32 v7, v24, v25, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v24, 16, v8
+; VI-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; VI-NEXT: v_bfe_u32 v25, v24, 16, 1
+; VI-NEXT: v_add_u32_e32 v25, vcc, v25, v24
+; VI-NEXT: v_add_u32_e32 v25, vcc, s6, v25
+; VI-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; VI-NEXT: v_or_b32_e32 v26, 0x400000, v24
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v24, v24
+; VI-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; VI-NEXT: v_cndmask_b32_e32 v24, v25, v26, vcc
+; VI-NEXT: v_bfe_u32 v25, v8, 16, 1
+; VI-NEXT: v_add_u32_e32 v25, vcc, v25, v8
+; VI-NEXT: v_add_u32_e32 v25, vcc, s6, v25
+; VI-NEXT: v_or_b32_e32 v26, 0x400000, v8
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; VI-NEXT: v_cndmask_b32_e32 v8, v25, v26, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v25, 16, v9
+; VI-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; VI-NEXT: v_bfe_u32 v26, v25, 16, 1
+; VI-NEXT: v_add_u32_e32 v26, vcc, v26, v25
+; VI-NEXT: v_add_u32_e32 v26, vcc, s6, v26
+; VI-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; VI-NEXT: v_or_b32_e32 v27, 0x400000, v25
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v25, v25
+; VI-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; VI-NEXT: v_cndmask_b32_e32 v25, v26, v27, vcc
+; VI-NEXT: v_bfe_u32 v26, v9, 16, 1
+; VI-NEXT: v_add_u32_e32 v26, vcc, v26, v9
+; VI-NEXT: v_add_u32_e32 v26, vcc, s6, v26
+; VI-NEXT: v_or_b32_e32 v27, 0x400000, v9
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; VI-NEXT: v_cndmask_b32_e32 v9, v26, v27, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v26, 16, v10
+; VI-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; VI-NEXT: v_bfe_u32 v27, v26, 16, 1
+; VI-NEXT: v_add_u32_e32 v27, vcc, v27, v26
+; VI-NEXT: v_add_u32_e32 v27, vcc, s6, v27
+; VI-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; VI-NEXT: v_or_b32_e32 v28, 0x400000, v26
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v26, v26
+; VI-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; VI-NEXT: v_cndmask_b32_e32 v26, v27, v28, vcc
+; VI-NEXT: v_bfe_u32 v27, v10, 16, 1
+; VI-NEXT: v_add_u32_e32 v27, vcc, v27, v10
+; VI-NEXT: v_add_u32_e32 v27, vcc, s6, v27
+; VI-NEXT: v_or_b32_e32 v28, 0x400000, v10
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; VI-NEXT: v_cndmask_b32_e32 v10, v27, v28, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v27, 16, v11
+; VI-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; VI-NEXT: v_bfe_u32 v28, v27, 16, 1
+; VI-NEXT: v_add_u32_e32 v28, vcc, v28, v27
+; VI-NEXT: v_add_u32_e32 v28, vcc, s6, v28
+; VI-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; VI-NEXT: v_or_b32_e32 v29, 0x400000, v27
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v27, v27
+; VI-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; VI-NEXT: v_cndmask_b32_e32 v27, v28, v29, vcc
+; VI-NEXT: v_bfe_u32 v28, v11, 16, 1
+; VI-NEXT: v_add_u32_e32 v28, vcc, v28, v11
+; VI-NEXT: v_add_u32_e32 v28, vcc, s6, v28
+; VI-NEXT: v_or_b32_e32 v29, 0x400000, v11
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; VI-NEXT: v_cndmask_b32_e32 v11, v28, v29, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v28, 16, v12
+; VI-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; VI-NEXT: v_bfe_u32 v29, v28, 16, 1
+; VI-NEXT: v_add_u32_e32 v29, vcc, v29, v28
+; VI-NEXT: v_add_u32_e32 v29, vcc, s6, v29
+; VI-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; VI-NEXT: v_or_b32_e32 v30, 0x400000, v28
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v28, v28
+; VI-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; VI-NEXT: v_cndmask_b32_e32 v28, v29, v30, vcc
+; VI-NEXT: v_bfe_u32 v29, v12, 16, 1
+; VI-NEXT: v_add_u32_e32 v29, vcc, v29, v12
+; VI-NEXT: v_add_u32_e32 v29, vcc, s6, v29
+; VI-NEXT: v_or_b32_e32 v30, 0x400000, v12
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; VI-NEXT: v_cndmask_b32_e32 v12, v29, v30, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v29, 16, v13
+; VI-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; VI-NEXT: v_bfe_u32 v30, v29, 16, 1
+; VI-NEXT: v_add_u32_e32 v30, vcc, v30, v29
+; VI-NEXT: v_add_u32_e32 v30, vcc, s6, v30
+; VI-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; VI-NEXT: v_or_b32_e32 v31, 0x400000, v29
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v29, v29
+; VI-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; VI-NEXT: v_cndmask_b32_e32 v29, v30, v31, vcc
+; VI-NEXT: v_bfe_u32 v30, v13, 16, 1
+; VI-NEXT: v_add_u32_e32 v30, vcc, v30, v13
+; VI-NEXT: v_add_u32_e32 v30, vcc, s6, v30
+; VI-NEXT: v_or_b32_e32 v31, 0x400000, v13
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; VI-NEXT: v_cndmask_b32_e32 v13, v30, v31, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v30, 16, v14
+; VI-NEXT: v_add_f32_e32 v30, 0x40c00000, v30
+; VI-NEXT: v_bfe_u32 v31, v30, 16, 1
+; VI-NEXT: v_add_u32_e32 v31, vcc, v31, v30
+; VI-NEXT: v_add_u32_e32 v31, vcc, s6, v31
+; VI-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; VI-NEXT: v_or_b32_e32 v32, 0x400000, v30
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v30, v30
+; VI-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; VI-NEXT: v_cndmask_b32_e32 v30, v31, v32, vcc
+; VI-NEXT: v_bfe_u32 v31, v14, 16, 1
+; VI-NEXT: v_add_u32_e32 v31, vcc, v31, v14
+; VI-NEXT: v_add_u32_e32 v31, vcc, s6, v31
+; VI-NEXT: v_or_b32_e32 v32, 0x400000, v14
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; VI-NEXT: v_cndmask_b32_e32 v14, v31, v32, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v31, 16, v15
+; VI-NEXT: v_add_f32_e32 v31, 0x40c00000, v31
+; VI-NEXT: v_bfe_u32 v32, v31, 16, 1
+; VI-NEXT: v_add_u32_e32 v32, vcc, v32, v31
+; VI-NEXT: v_add_u32_e32 v32, vcc, s6, v32
+; VI-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; VI-NEXT: v_or_b32_e32 v33, 0x400000, v31
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v31, v31
+; VI-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; VI-NEXT: v_cndmask_b32_e32 v31, v32, v33, vcc
+; VI-NEXT: v_bfe_u32 v32, v15, 16, 1
+; VI-NEXT: v_add_u32_e32 v32, vcc, v32, v15
+; VI-NEXT: v_add_u32_e32 v32, vcc, s6, v32
+; VI-NEXT: v_or_b32_e32 v33, 0x400000, v15
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; VI-NEXT: v_cndmask_b32_e32 v15, v32, v33, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v15, 16, v15
+; VI-NEXT: v_lshrrev_b32_e32 v14, 16, v14
+; VI-NEXT: v_lshrrev_b32_e32 v13, 16, v13
+; VI-NEXT: v_lshrrev_b32_e32 v12, 16, v12
+; VI-NEXT: v_lshrrev_b32_e32 v11, 16, v11
+; VI-NEXT: v_lshrrev_b32_e32 v10, 16, v10
+; VI-NEXT: v_lshrrev_b32_e32 v9, 16, v9
+; VI-NEXT: v_lshrrev_b32_e32 v8, 16, v8
+; VI-NEXT: v_lshrrev_b32_e32 v7, 16, v7
+; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v6
+; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v5
+; VI-NEXT: v_lshrrev_b32_e32 v4, 16, v4
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
+; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; VI-NEXT: v_alignbit_b32 v15, v15, v31, 16
+; VI-NEXT: v_alignbit_b32 v14, v14, v30, 16
+; VI-NEXT: v_alignbit_b32 v13, v13, v29, 16
+; VI-NEXT: v_alignbit_b32 v12, v12, v28, 16
+; VI-NEXT: v_alignbit_b32 v11, v11, v27, 16
+; VI-NEXT: v_alignbit_b32 v10, v10, v26, 16
+; VI-NEXT: v_alignbit_b32 v9, v9, v25, 16
+; VI-NEXT: v_alignbit_b32 v8, v8, v24, 16
+; VI-NEXT: v_alignbit_b32 v7, v7, v23, 16
+; VI-NEXT: v_alignbit_b32 v6, v6, v22, 16
+; VI-NEXT: v_alignbit_b32 v5, v5, v21, 16
+; VI-NEXT: v_alignbit_b32 v4, v4, v20, 16
+; VI-NEXT: v_alignbit_b32 v3, v3, v19, 16
+; VI-NEXT: v_alignbit_b32 v2, v2, v18, 16
+; VI-NEXT: v_alignbit_b32 v1, v1, v17, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v16, 16
+; VI-NEXT: .LBB41_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v32bf16_to_v32f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB41_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v16, 0x40c00000, v16
+; GFX9-NEXT: v_bfe_u32 v17, v16, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v17, v17, v16, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v16
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v16, v16
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v16, v17, v18, vcc
+; GFX9-NEXT: v_bfe_u32 v17, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v17, v17, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v18, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v17, v18, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v17, 16, v1
+; GFX9-NEXT: v_add_f32_e32 v17, 0x40c00000, v17
+; GFX9-NEXT: v_bfe_u32 v18, v17, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX9-NEXT: v_add3_u32 v18, v18, v17, s6
+; GFX9-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v17, v17
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v17, v18, v19, vcc
+; GFX9-NEXT: v_bfe_u32 v18, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v18, v18, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v19, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v18, v19, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v18, 16, v2
+; GFX9-NEXT: v_add_f32_e32 v18, 0x40c00000, v18
+; GFX9-NEXT: v_bfe_u32 v19, v18, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX9-NEXT: v_add3_u32 v19, v19, v18, s6
+; GFX9-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v18, v18
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v18, v19, v20, vcc
+; GFX9-NEXT: v_bfe_u32 v19, v2, 16, 1
+; GFX9-NEXT: v_add3_u32 v19, v19, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v20, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v19, v20, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v19, 16, v3
+; GFX9-NEXT: v_add_f32_e32 v19, 0x40c00000, v19
+; GFX9-NEXT: v_bfe_u32 v20, v19, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX9-NEXT: v_add3_u32 v20, v20, v19, s6
+; GFX9-NEXT: v_or_b32_e32 v21, 0x400000, v19
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v19, v19
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v19, v20, v21, vcc
+; GFX9-NEXT: v_bfe_u32 v20, v3, 16, 1
+; GFX9-NEXT: v_add3_u32 v20, v20, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v21, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v20, v21, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v20, 16, v4
+; GFX9-NEXT: v_add_f32_e32 v20, 0x40c00000, v20
+; GFX9-NEXT: v_bfe_u32 v21, v20, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX9-NEXT: v_add3_u32 v21, v21, v20, s6
+; GFX9-NEXT: v_or_b32_e32 v22, 0x400000, v20
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v20, v20
+; GFX9-NEXT: v_add_f32_e32 v4, 0x40c00000, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v20, v21, v22, vcc
+; GFX9-NEXT: v_bfe_u32 v21, v4, 16, 1
+; GFX9-NEXT: v_add3_u32 v21, v21, v4, s6
+; GFX9-NEXT: v_or_b32_e32 v22, 0x400000, v4
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v4, v4
+; GFX9-NEXT: v_cndmask_b32_e32 v4, v21, v22, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v21, 16, v5
+; GFX9-NEXT: v_add_f32_e32 v21, 0x40c00000, v21
+; GFX9-NEXT: v_bfe_u32 v22, v21, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX9-NEXT: v_add3_u32 v22, v22, v21, s6
+; GFX9-NEXT: v_or_b32_e32 v23, 0x400000, v21
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v21, v21
+; GFX9-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v21, v22, v23, vcc
+; GFX9-NEXT: v_bfe_u32 v22, v5, 16, 1
+; GFX9-NEXT: v_add3_u32 v22, v22, v5, s6
+; GFX9-NEXT: v_or_b32_e32 v23, 0x400000, v5
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v5, v5
+; GFX9-NEXT: v_cndmask_b32_e32 v5, v22, v23, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v22, 16, v6
+; GFX9-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GFX9-NEXT: v_bfe_u32 v23, v22, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX9-NEXT: v_add3_u32 v23, v23, v22, s6
+; GFX9-NEXT: v_or_b32_e32 v24, 0x400000, v22
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v22, v22
+; GFX9-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v22, v23, v24, vcc
+; GFX9-NEXT: v_bfe_u32 v23, v6, 16, 1
+; GFX9-NEXT: v_add3_u32 v23, v23, v6, s6
+; GFX9-NEXT: v_or_b32_e32 v24, 0x400000, v6
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v6, v6
+; GFX9-NEXT: v_cndmask_b32_e32 v6, v23, v24, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v23, 16, v7
+; GFX9-NEXT: v_add_f32_e32 v23, 0x40c00000, v23
+; GFX9-NEXT: v_bfe_u32 v24, v23, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v7, 0xffff0000, v7
+; GFX9-NEXT: v_add3_u32 v24, v24, v23, s6
+; GFX9-NEXT: v_or_b32_e32 v25, 0x400000, v23
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v23, v23
+; GFX9-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v23, v24, v25, vcc
+; GFX9-NEXT: v_bfe_u32 v24, v7, 16, 1
+; GFX9-NEXT: v_add3_u32 v24, v24, v7, s6
+; GFX9-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v7, v7
+; GFX9-NEXT: v_cndmask_b32_e32 v7, v24, v25, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v24, 16, v8
+; GFX9-NEXT: v_add_f32_e32 v24, 0x40c00000, v24
+; GFX9-NEXT: v_bfe_u32 v25, v24, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX9-NEXT: v_add3_u32 v25, v25, v24, s6
+; GFX9-NEXT: v_or_b32_e32 v26, 0x400000, v24
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v24, v24
+; GFX9-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v24, v25, v26, vcc
+; GFX9-NEXT: v_bfe_u32 v25, v8, 16, 1
+; GFX9-NEXT: v_add3_u32 v25, v25, v8, s6
+; GFX9-NEXT: v_or_b32_e32 v26, 0x400000, v8
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v8, v8
+; GFX9-NEXT: v_cndmask_b32_e32 v8, v25, v26, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v25, 16, v9
+; GFX9-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GFX9-NEXT: v_bfe_u32 v26, v25, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX9-NEXT: v_add3_u32 v26, v26, v25, s6
+; GFX9-NEXT: v_or_b32_e32 v27, 0x400000, v25
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v25, v25
+; GFX9-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v25, v26, v27, vcc
+; GFX9-NEXT: v_bfe_u32 v26, v9, 16, 1
+; GFX9-NEXT: v_add3_u32 v26, v26, v9, s6
+; GFX9-NEXT: v_or_b32_e32 v27, 0x400000, v9
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v9, v9
+; GFX9-NEXT: v_cndmask_b32_e32 v9, v26, v27, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v26, 16, v10
+; GFX9-NEXT: v_add_f32_e32 v26, 0x40c00000, v26
+; GFX9-NEXT: v_bfe_u32 v27, v26, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX9-NEXT: v_add3_u32 v27, v27, v26, s6
+; GFX9-NEXT: v_or_b32_e32 v28, 0x400000, v26
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v26, v26
+; GFX9-NEXT: v_add_f32_e32 v10, 0x40c00000, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v26, v27, v28, vcc
+; GFX9-NEXT: v_bfe_u32 v27, v10, 16, 1
+; GFX9-NEXT: v_add3_u32 v27, v27, v10, s6
+; GFX9-NEXT: v_or_b32_e32 v28, 0x400000, v10
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v10, v10
+; GFX9-NEXT: v_cndmask_b32_e32 v10, v27, v28, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v27, 16, v11
+; GFX9-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GFX9-NEXT: v_bfe_u32 v28, v27, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v11, 0xffff0000, v11
+; GFX9-NEXT: v_add3_u32 v28, v28, v27, s6
+; GFX9-NEXT: v_or_b32_e32 v29, 0x400000, v27
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v27, v27
+; GFX9-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v27, v28, v29, vcc
+; GFX9-NEXT: v_bfe_u32 v28, v11, 16, 1
+; GFX9-NEXT: v_add3_u32 v28, v28, v11, s6
+; GFX9-NEXT: v_or_b32_e32 v29, 0x400000, v11
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v11, v11
+; GFX9-NEXT: v_cndmask_b32_e32 v11, v28, v29, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v28, 16, v12
+; GFX9-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GFX9-NEXT: v_bfe_u32 v29, v28, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX9-NEXT: v_add3_u32 v29, v29, v28, s6
+; GFX9-NEXT: v_or_b32_e32 v30, 0x400000, v28
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v28, v28
+; GFX9-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v28, v29, v30, vcc
+; GFX9-NEXT: v_bfe_u32 v29, v12, 16, 1
+; GFX9-NEXT: v_add3_u32 v29, v29, v12, s6
+; GFX9-NEXT: v_or_b32_e32 v30, 0x400000, v12
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v12, v12
+; GFX9-NEXT: v_cndmask_b32_e32 v12, v29, v30, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v29, 16, v13
+; GFX9-NEXT: v_add_f32_e32 v29, 0x40c00000, v29
+; GFX9-NEXT: v_bfe_u32 v30, v29, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX9-NEXT: v_add3_u32 v30, v30, v29, s6
+; GFX9-NEXT: v_or_b32_e32 v31, 0x400000, v29
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v29, v29
+; GFX9-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v29, v30, v31, vcc
+; GFX9-NEXT: v_bfe_u32 v30, v13, 16, 1
+; GFX9-NEXT: v_add3_u32 v30, v30, v13, s6
+; GFX9-NEXT: v_or_b32_e32 v31, 0x400000, v13
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v13, v13
+; GFX9-NEXT: v_cndmask_b32_e32 v13, v30, v31, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v30, 16, v14
+; GFX9-NEXT: v_add_f32_e32 v30, 0x40c00000, v30
+; GFX9-NEXT: v_bfe_u32 v31, v30, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX9-NEXT: v_add3_u32 v31, v31, v30, s6
+; GFX9-NEXT: v_or_b32_e32 v32, 0x400000, v30
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v30, v30
+; GFX9-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v30, v31, v32, vcc
+; GFX9-NEXT: v_bfe_u32 v31, v14, 16, 1
+; GFX9-NEXT: v_add3_u32 v31, v31, v14, s6
+; GFX9-NEXT: v_or_b32_e32 v32, 0x400000, v14
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v14, v14
+; GFX9-NEXT: v_cndmask_b32_e32 v14, v31, v32, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v31, 16, v15
+; GFX9-NEXT: v_add_f32_e32 v31, 0x40c00000, v31
+; GFX9-NEXT: v_bfe_u32 v32, v31, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX9-NEXT: v_add3_u32 v32, v32, v31, s6
+; GFX9-NEXT: v_or_b32_e32 v33, 0x400000, v31
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v31, v31
+; GFX9-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v31, v32, v33, vcc
+; GFX9-NEXT: v_bfe_u32 v32, v15, 16, 1
+; GFX9-NEXT: v_add3_u32 v32, v32, v15, s6
+; GFX9-NEXT: v_or_b32_e32 v33, 0x400000, v15
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v15, v15
+; GFX9-NEXT: v_cndmask_b32_e32 v15, v32, v33, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v15, v15, v31, s6
+; GFX9-NEXT: v_perm_b32 v14, v14, v30, s6
+; GFX9-NEXT: v_perm_b32 v13, v13, v29, s6
+; GFX9-NEXT: v_perm_b32 v12, v12, v28, s6
+; GFX9-NEXT: v_perm_b32 v11, v11, v27, s6
+; GFX9-NEXT: v_perm_b32 v10, v10, v26, s6
+; GFX9-NEXT: v_perm_b32 v9, v9, v25, s6
+; GFX9-NEXT: v_perm_b32 v8, v8, v24, s6
+; GFX9-NEXT: v_perm_b32 v7, v7, v23, s6
+; GFX9-NEXT: v_perm_b32 v6, v6, v22, s6
+; GFX9-NEXT: v_perm_b32 v5, v5, v21, s6
+; GFX9-NEXT: v_perm_b32 v4, v4, v20, s6
+; GFX9-NEXT: v_perm_b32 v3, v3, v19, s6
+; GFX9-NEXT: v_perm_b32 v2, v2, v18, s6
+; GFX9-NEXT: v_perm_b32 v1, v1, v17, s6
+; GFX9-NEXT: v_perm_b32 v0, v0, v16, s6
+; GFX9-NEXT: .LBB41_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v32bf16_to_v32f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v16
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB41_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v17, 16, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v16, 16, v0
+; GFX11-NEXT: v_lshlrev_b32_e32 v24, 16, v5
+; GFX11-NEXT: v_lshlrev_b32_e32 v26, 16, v7
+; GFX11-NEXT: v_lshlrev_b32_e32 v28, 16, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_add_f32 v17, 0x40c00000, v17 :: v_dual_add_f32 v16, 0x40c00000, v16
+; GFX11-NEXT: v_lshlrev_b32_e32 v30, 16, v11
+; GFX11-NEXT: v_dual_add_f32 v24, 0x40c00000, v24 :: v_dual_lshlrev_b32 v25, 16, v6
+; GFX11-NEXT: v_bfe_u32 v21, v17, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v19, v16, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v16
+; GFX11-NEXT: v_add_f32_e32 v25, 0x40c00000, v25
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v16, v16
+; GFX11-NEXT: v_add3_u32 v21, v21, v17, 0x7fff
+; GFX11-NEXT: v_add3_u32 v19, v19, v16, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GFX11-NEXT: v_and_b32_e32 v6, 0xffff0000, v6
+; GFX11-NEXT: v_dual_add_f32 v26, 0x40c00000, v26 :: v_dual_lshlrev_b32 v27, 16, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_cndmask_b32_e32 v16, v19, v22, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v19, 0x400000, v17
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_lshlrev_b32 v22, 16, v3
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GFX11-NEXT: v_add_f32_e32 v6, 0x40c00000, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_and_b32_e32 v8, 0xffff0000, v8
+; GFX11-NEXT: v_add_f32_e32 v27, 0x40c00000, v27
+; GFX11-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX11-NEXT: v_lshlrev_b32_e32 v29, 16, v10
+; GFX11-NEXT: v_bfe_u32 v20, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v0
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add_f32_e32 v8, 0x40c00000, v8
+; GFX11-NEXT: v_dual_add_f32 v28, 0x40c00000, v28 :: v_dual_add_f32 v29, 0x40c00000, v29
+; GFX11-NEXT: v_add3_u32 v20, v20, v0, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v10, 0xffff0000, v10
+; GFX11-NEXT: v_dual_add_f32 v30, 0x40c00000, v30 :: v_dual_lshlrev_b32 v31, 16, v12
+; GFX11-NEXT: v_and_b32_e32 v12, 0xffff0000, v12
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v20, v23 :: v_dual_lshlrev_b32 v23, 16, v4
+; GFX11-NEXT: v_bfe_u32 v20, v1, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v17, v17
+; GFX11-NEXT: v_and_b32_e32 v4, 0xffff0000, v4
+; GFX11-NEXT: v_dual_add_f32 v10, 0x40c00000, v10 :: v_dual_add_f32 v23, 0x40c00000, v23
+; GFX11-NEXT: v_perm_b32 v0, v0, v16, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v17, v21, v19, vcc_lo
+; GFX11-NEXT: v_add3_u32 v19, v20, v1, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v18, 16, v2
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GFX11-NEXT: v_dual_add_f32 v4, 0x40c00000, v4 :: v_dual_add_f32 v31, 0x40c00000, v31
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v1, v19, v20 :: v_dual_add_f32 v18, 0x40c00000, v18
+; GFX11-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX11-NEXT: v_add_f32_e32 v12, 0x40c00000, v12
+; GFX11-NEXT: v_and_b32_e32 v5, 0xffff0000, v5
+; GFX11-NEXT: v_add_f32_e32 v22, 0x40c00000, v22
+; GFX11-NEXT: v_bfe_u32 v21, v18, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v18
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v18, v18
+; GFX11-NEXT: v_add_f32_e32 v5, 0x40c00000, v5
+; GFX11-NEXT: v_or_b32_e32 v33, 0x400000, v31
+; GFX11-NEXT: v_add3_u32 v19, v21, v18, 0x7fff
+; GFX11-NEXT: v_bfe_u32 v21, v2, 16, 1
+; GFX11-NEXT: v_bfe_u32 v34, v12, 16, 1
+; GFX11-NEXT: v_perm_b32 v1, v1, v17, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_dual_cndmask_b32 v18, v19, v20 :: v_dual_and_b32 v7, 0xffff0000, v7
+; GFX11-NEXT: v_add3_u32 v19, v21, v2, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_bfe_u32 v21, v22, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v7, 0x40c00000, v7
+; GFX11-NEXT: v_and_b32_e32 v9, 0xffff0000, v9
+; GFX11-NEXT: v_dual_cndmask_b32 v2, v19, v20 :: v_dual_and_b32 v11, 0xffff0000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v19, v21, v22, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v20, 0x400000, v22
+; GFX11-NEXT: v_bfe_u32 v21, v3, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v22, v22
+; GFX11-NEXT: v_bfe_u32 v22, v23, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v9, 0x40c00000, v9
+; GFX11-NEXT: v_perm_b32 v2, v2, v18, 0x7060302
+; GFX11-NEXT: v_add_f32_e32 v11, 0x40c00000, v11
+; GFX11-NEXT: v_cndmask_b32_e32 v19, v19, v20, vcc_lo
+; GFX11-NEXT: v_add3_u32 v20, v21, v3, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v3
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_or_b32_e32 v32, 0x400000, v11
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v20, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v20, v22, v23, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v21, 0x400000, v23
+; GFX11-NEXT: v_bfe_u32 v22, v4, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v23, v23
+; GFX11-NEXT: v_bfe_u32 v23, v24, 16, 1
+; GFX11-NEXT: v_perm_b32 v3, v3, v19, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v20, v20, v21, vcc_lo
+; GFX11-NEXT: v_add3_u32 v21, v22, v4, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v4
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v4, v4
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v4, v21, v22, vcc_lo
+; GFX11-NEXT: v_add3_u32 v21, v23, v24, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v22, 0x400000, v24
+; GFX11-NEXT: v_bfe_u32 v23, v5, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v24, v24
+; GFX11-NEXT: v_bfe_u32 v24, v25, 16, 1
+; GFX11-NEXT: v_perm_b32 v4, v4, v20, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v21, v21, v22, vcc_lo
+; GFX11-NEXT: v_add3_u32 v22, v23, v5, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v5
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v5, v5
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v5, v22, v23, vcc_lo
+; GFX11-NEXT: v_add3_u32 v22, v24, v25, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v23, 0x400000, v25
+; GFX11-NEXT: v_bfe_u32 v24, v6, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v25, v25
+; GFX11-NEXT: v_bfe_u32 v25, v26, 16, 1
+; GFX11-NEXT: v_perm_b32 v5, v5, v21, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v22, v22, v23, vcc_lo
+; GFX11-NEXT: v_add3_u32 v23, v24, v6, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v24, 0x400000, v6
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v6, v6
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v6, v23, v24, vcc_lo
+; GFX11-NEXT: v_add3_u32 v23, v25, v26, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v24, 0x400000, v26
+; GFX11-NEXT: v_bfe_u32 v25, v7, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v26, v26
+; GFX11-NEXT: v_bfe_u32 v26, v27, 16, 1
+; GFX11-NEXT: v_perm_b32 v6, v6, v22, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v23, v23, v24, vcc_lo
+; GFX11-NEXT: v_add3_u32 v24, v25, v7, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v7
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v7, v7
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v7, v24, v25, vcc_lo
+; GFX11-NEXT: v_add3_u32 v24, v26, v27, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v25, 0x400000, v27
+; GFX11-NEXT: v_bfe_u32 v26, v8, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v27, v27
+; GFX11-NEXT: v_bfe_u32 v27, v28, 16, 1
+; GFX11-NEXT: v_perm_b32 v7, v7, v23, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v24, v24, v25, vcc_lo
+; GFX11-NEXT: v_add3_u32 v25, v26, v8, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v8
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v8, v8
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v8, v25, v26, vcc_lo
+; GFX11-NEXT: v_add3_u32 v25, v27, v28, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v26, 0x400000, v28
+; GFX11-NEXT: v_bfe_u32 v27, v9, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v28, v28
+; GFX11-NEXT: v_bfe_u32 v28, v29, 16, 1
+; GFX11-NEXT: v_perm_b32 v8, v8, v24, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v25, v25, v26, vcc_lo
+; GFX11-NEXT: v_add3_u32 v26, v27, v9, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v27, 0x400000, v9
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v9, v9
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v9, v26, v27, vcc_lo
+; GFX11-NEXT: v_add3_u32 v26, v28, v29, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v27, 0x400000, v29
+; GFX11-NEXT: v_bfe_u32 v28, v10, 16, 1
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v29, v29
+; GFX11-NEXT: v_bfe_u32 v29, v30, 16, 1
+; GFX11-NEXT: v_perm_b32 v9, v9, v25, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v26, v26, v27, vcc_lo
+; GFX11-NEXT: v_add3_u32 v27, v28, v10, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v28, 0x400000, v10
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v10, v10
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v10, v27, v28, vcc_lo
+; GFX11-NEXT: v_add3_u32 v27, v29, v30, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v28, 0x400000, v30
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v30, v30
+; GFX11-NEXT: v_bfe_u32 v30, v31, 16, 1
+; GFX11-NEXT: v_bfe_u32 v29, v11, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_dual_cndmask_b32 v27, v27, v28 :: v_dual_lshlrev_b32 v28, 16, v13
+; GFX11-NEXT: v_add3_u32 v30, v30, v31, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v31, v31
+; GFX11-NEXT: v_and_b32_e32 v13, 0xffff0000, v13
+; GFX11-NEXT: v_add3_u32 v31, v34, v12, 0x7fff
+; GFX11-NEXT: v_add_f32_e32 v28, 0x40c00000, v28
+; GFX11-NEXT: v_add3_u32 v29, v29, v11, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v30, v30, v33, vcc_lo
+; GFX11-NEXT: v_or_b32_e32 v33, 0x400000, v12
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v12, v12
+; GFX11-NEXT: v_bfe_u32 v35, v28, 16, 1
+; GFX11-NEXT: v_add_f32_e32 v13, 0x40c00000, v13
+; GFX11-NEXT: v_or_b32_e32 v36, 0x400000, v28
+; GFX11-NEXT: v_cndmask_b32_e32 v12, v31, v33, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_add3_u32 v34, v35, v28, 0x7fff
+; GFX11-NEXT: v_lshlrev_b32_e32 v35, 16, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v28, v28
+; GFX11-NEXT: v_and_b32_e32 v14, 0xffff0000, v14
+; GFX11-NEXT: v_bfe_u32 v37, v13, 16, 1
+; GFX11-NEXT: v_perm_b32 v10, v10, v26, 0x7060302
+; GFX11-NEXT: v_dual_add_f32 v31, 0x40c00000, v35 :: v_dual_cndmask_b32 v28, v34, v36
+; GFX11-NEXT: v_lshlrev_b32_e32 v34, 16, v15
+; GFX11-NEXT: v_add_f32_e32 v14, 0x40c00000, v14
+; GFX11-NEXT: v_add3_u32 v33, v37, v13, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v35, v31, 16, 1
+; GFX11-NEXT: v_and_b32_e32 v15, 0xffff0000, v15
+; GFX11-NEXT: v_add_f32_e32 v34, 0x40c00000, v34
+; GFX11-NEXT: v_or_b32_e32 v37, 0x400000, v31
+; GFX11-NEXT: v_bfe_u32 v38, v14, 16, 1
+; GFX11-NEXT: v_add3_u32 v35, v35, v31, 0x7fff
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v31, v31
+; GFX11-NEXT: v_add_f32_e32 v15, 0x40c00000, v15
+; GFX11-NEXT: v_bfe_u32 v39, v34, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v48, 0x400000, v34
+; GFX11-NEXT: v_or_b32_e32 v36, 0x400000, v13
+; GFX11-NEXT: v_cndmask_b32_e32 v31, v35, v37, vcc_lo
+; GFX11-NEXT: v_add3_u32 v37, v38, v14, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v38, 0x400000, v14
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v14, v14
+; GFX11-NEXT: v_bfe_u32 v35, v15, 16, 1
+; GFX11-NEXT: v_add3_u32 v39, v39, v34, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v49, 0x400000, v15
+; GFX11-NEXT: v_cndmask_b32_e32 v14, v37, v38, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v34, v34
+; GFX11-NEXT: v_add3_u32 v35, v35, v15, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_perm_b32 v14, v14, v31, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v34, v39, v48, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v15, v15
+; GFX11-NEXT: v_cndmask_b32_e32 v15, v35, v49, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v13, v13
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_perm_b32 v15, v15, v34, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v13, v33, v36, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v11, v11
+; GFX11-NEXT: v_perm_b32 v12, v12, v30, 0x7060302
+; GFX11-NEXT: v_perm_b32 v13, v13, v28, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v11, v29, v32, vcc_lo
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v11, v11, v27, 0x7060302
+; GFX11-NEXT: .LBB41_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <32 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <32 x bfloat> %a1 to <32 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <32 x bfloat> %a to <32 x half>
+ br label %end
+
+end:
+ %phi = phi <32 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <32 x half> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.64bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.64bit.ll
new file mode 100644
index 0000000000000..e982d6b0631c7
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.64bit.ll
@@ -0,0 +1,4574 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define double @v_bitcast_i64_to_f64(i64 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i64_to_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i64_to_f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i64_to_f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i64_to_f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i64 %a, 3
+ %a2 = bitcast i64 %a1 to double
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i64 %a to double
+ br label %end
+
+end:
+ %phi = phi double [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret double %phi
+}
+
+define i64 @v_bitcast_f64_to_i64(double %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f64_to_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f64_to_i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f64_to_i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f64_to_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB1_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB1_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd double %a, 1.000000e+00
+ %a2 = bitcast double %a1 to i64
+ br label %end
+
+cmp.false:
+ %a3 = bitcast double %a to i64
+ br label %end
+
+end:
+ %phi = phi i64 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i64 %phi
+}
+
+define <2 x i32> @v_bitcast_i64_to_v2i32(i64 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i64_to_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB2_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB2_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i64_to_v2i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i64_to_v2i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i64_to_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i64 %a, 3
+ %a2 = bitcast i64 %a1 to <2 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i64 %a to <2 x i32>
+ br label %end
+
+end:
+ %phi = phi <2 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i32> %phi
+}
+
+define i64 @v_bitcast_v2i32_to_i64(<2 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i32_to_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB3_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB3_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i32_to_i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i32_to_i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i32_to_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i32> %a, splat (i32 3)
+ %a2 = bitcast <2 x i32> %a1 to i64
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i32> %a to i64
+ br label %end
+
+end:
+ %phi = phi i64 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i64 %phi
+}
+
+define <2 x float> @v_bitcast_i64_to_v2f32(i64 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i64_to_v2f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB4_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GCN-NEXT: .LBB4_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i64_to_v2f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i64_to_v2f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i64_to_v2f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i64 %a, 3
+ %a2 = bitcast i64 %a1 to <2 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i64 %a to <2 x float>
+ br label %end
+
+end:
+ %phi = phi <2 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x float> %phi
+}
+
+define i64 @v_bitcast_v2f32_to_i64(<2 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f32_to_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB5_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB5_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f32_to_i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f32_to_i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f32_to_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <2 x float> %a1 to i64
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x float> %a to i64
+ br label %end
+
+end:
+ %phi = phi i64 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i64 %phi
+}
+
+define <4 x i16> @v_bitcast_i64_to_v4i16(i64 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i64_to_v4i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v4, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v1, v4, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: .LBB6_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB6_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_addc_u32_e32 v4, vcc, 0, v4, vcc
+; GCN-NEXT: v_alignbit_b32 v1, v4, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: .LBB6_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v2, v4
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i64_to_v4i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i64_to_v4i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i64_to_v4i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i64 %a, 3
+ %a2 = bitcast i64 %a1 to <4 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i64 %a to <4 x i16>
+ br label %end
+
+end:
+ %phi = phi <4 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i16> %phi
+}
+
+define i64 @v_bitcast_v4i16_to_i64(<4 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i16_to_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB7_4
+; GCN-NEXT: .LBB7_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB7_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v2
+; GCN-NEXT: v_or_b32_e32 v0, v0, v4
+; GCN-NEXT: v_or_b32_e32 v1, v1, v3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB7_2
+; GCN-NEXT: .LBB7_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v2
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i16_to_i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v3, 3
+; VI-NEXT: v_add_u16_e32 v2, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v2, v1
+; VI-NEXT: v_add_u16_e32 v2, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v2, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i16_to_i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i16_to_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i16> %a, splat (i16 3)
+ %a2 = bitcast <4 x i16> %a1 to i64
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i16> %a to i64
+ br label %end
+
+end:
+ %phi = phi i64 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i64 %phi
+}
+
+define <4 x half> @v_bitcast_i64_to_v4f16(i64 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i64_to_v4f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB8_4
+; GCN-NEXT: .LBB8_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB8_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB8_2
+; GCN-NEXT: .LBB8_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i64_to_v4f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i64_to_v4f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i64_to_v4f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i64 %a, 3
+ %a2 = bitcast i64 %a1 to <4 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i64 %a to <4 x half>
+ br label %end
+
+end:
+ %phi = phi <4 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x half> %phi
+}
+
+define i64 @v_bitcast_v4f16_to_i64(<4 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f16_to_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB9_4
+; GCN-NEXT: .LBB9_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB9_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v1
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB9_2
+; GCN-NEXT: .LBB9_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f16_to_i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 0x200
+; VI-NEXT: v_add_f16_sdwa v3, v1, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v2, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v3
+; VI-NEXT: v_or_b32_e32 v0, v0, v2
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f16_to_i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f16_to_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <4 x half> %a1 to i64
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x half> %a to i64
+ br label %end
+
+end:
+ %phi = phi i64 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i64 %phi
+}
+
+define <4 x bfloat> @v_bitcast_i64_to_v4bf16(i64 %a, i32 %b) {
+; GCN-LABEL: v_bitcast_i64_to_v4bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB10_4
+; GCN-NEXT: .LBB10_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB10_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB10_2
+; GCN-NEXT: .LBB10_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v4
+; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_i64_to_v4bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_i64_to_v4bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 3, v0
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_i64_to_v4bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, 3
+; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add i64 %a, 3
+ %a2 = bitcast i64 %a1 to <4 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast i64 %a to <4 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <4 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x bfloat> %phi
+}
+
+define i64 @v_bitcast_v4bf16_to_i64(<4 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4bf16_to_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB11_4
+; GCN-NEXT: .LBB11_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB11_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB11_2
+; GCN-NEXT: .LBB11_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v5
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4bf16_to_i64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB11_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v1, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v1
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v0
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; VI-NEXT: .LBB11_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4bf16_to_i64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB11_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; GFX9-NEXT: v_perm_b32 v1, v2, v1, s7
+; GFX9-NEXT: .LBB11_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4bf16_to_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB11_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v3, 0x40c00000, v3 :: v_dual_lshlrev_b32 v2, 16, v0
+; GFX11-NEXT: v_dual_add_f32 v2, 0x40c00000, v2 :: v_dual_lshlrev_b32 v1, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v8, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_bfe_u32 v4, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_add3_u32 v8, v8, v3, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v4, v4, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v6, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v7, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v9, v9, v1, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v5, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v6, v6, v0, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v6, v7, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v8, v10, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v9, v4, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v2, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_perm_b32 v1, v3, v1, 0x7060302
+; GFX11-NEXT: .LBB11_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <4 x bfloat> %a1 to i64
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x bfloat> %a to i64
+ br label %end
+
+end:
+ %phi = phi i64 [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret i64 %phi
+}
+
+define <2 x i32> @v_bitcast_f64_to_v2i32(double %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f64_to_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB12_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB12_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f64_to_v2i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f64_to_v2i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f64_to_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB12_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB12_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd double %a, 1.000000e+00
+ %a2 = bitcast double %a1 to <2 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast double %a to <2 x i32>
+ br label %end
+
+end:
+ %phi = phi <2 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i32> %phi
+}
+
+define double @v_bitcast_v2i32_to_f64(<2 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i32_to_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB13_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB13_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i32_to_f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i32_to_f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i32_to_f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i32> %a, splat (i32 3)
+ %a2 = bitcast <2 x i32> %a1 to double
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i32> %a to double
+ br label %end
+
+end:
+ %phi = phi double [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret double %phi
+}
+
+define <2 x float> @v_bitcast_f64_to_v2f32(double %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f64_to_v2f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB14_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: .LBB14_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f64_to_v2f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f64_to_v2f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f64_to_v2f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB14_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB14_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd double %a, 1.000000e+00
+ %a2 = bitcast double %a1 to <2 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast double %a to <2 x float>
+ br label %end
+
+end:
+ %phi = phi <2 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x float> %phi
+}
+
+define double @v_bitcast_v2f32_to_f64(<2 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f32_to_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB15_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB15_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f32_to_f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f32_to_f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f32_to_f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <2 x float> %a1 to double
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x float> %a to double
+ br label %end
+
+end:
+ %phi = phi double [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret double %phi
+}
+
+define <4 x i16> @v_bitcast_f64_to_v4i16(double %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f64_to_v4i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v1, v5, v4, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v5
+; GCN-NEXT: .LBB16_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB16_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[4:5], v[4:5], 1.0
+; GCN-NEXT: v_alignbit_b32 v1, v5, v4, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v5
+; GCN-NEXT: .LBB16_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v4
+; GCN-NEXT: v_mov_b32_e32 v2, v5
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f64_to_v4i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f64_to_v4i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f64_to_v4i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB16_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB16_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd double %a, 1.000000e+00
+ %a2 = bitcast double %a1 to <4 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast double %a to <4 x i16>
+ br label %end
+
+end:
+ %phi = phi <4 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i16> %phi
+}
+
+define double @v_bitcast_v4i16_to_f64(<4 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i16_to_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB17_4
+; GCN-NEXT: .LBB17_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB17_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v2
+; GCN-NEXT: v_or_b32_e32 v0, v0, v4
+; GCN-NEXT: v_or_b32_e32 v1, v1, v3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB17_2
+; GCN-NEXT: .LBB17_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v2
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i16_to_f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v3, 3
+; VI-NEXT: v_add_u16_e32 v2, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v2, v1
+; VI-NEXT: v_add_u16_e32 v2, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v2, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i16_to_f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i16_to_f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i16> %a, splat (i16 3)
+ %a2 = bitcast <4 x i16> %a1 to double
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i16> %a to double
+ br label %end
+
+end:
+ %phi = phi double [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret double %phi
+}
+
+define <4 x half> @v_bitcast_f64_to_v4f16(double %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f64_to_v4f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v0
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: .LBB18_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB18_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v4, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v5, v5
+; GCN-NEXT: .LBB18_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v4
+; GCN-NEXT: v_mov_b32_e32 v1, v5
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f64_to_v4f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f64_to_v4f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f64_to_v4f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB18_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB18_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd double %a, 1.000000e+00
+ %a2 = bitcast double %a1 to <4 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast double %a to <4 x half>
+ br label %end
+
+end:
+ %phi = phi <4 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x half> %phi
+}
+
+define double @v_bitcast_v4f16_to_f64(<4 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f16_to_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB19_4
+; GCN-NEXT: .LBB19_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB19_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v1
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB19_2
+; GCN-NEXT: .LBB19_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f16_to_f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 0x200
+; VI-NEXT: v_add_f16_sdwa v3, v1, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v2, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v3
+; VI-NEXT: v_or_b32_e32 v0, v0, v2
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f16_to_f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f16_to_f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <4 x half> %a1 to double
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x half> %a to double
+ br label %end
+
+end:
+ %phi = phi double [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret double %phi
+}
+
+define <4 x bfloat> @v_bitcast_f64_to_v4bf16(double %a, i32 %b) {
+; GCN-LABEL: v_bitcast_f64_to_v4bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB20_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v0
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: .LBB20_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB20_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v5, 16, v0
+; GCN-NEXT: .LBB20_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v0, v5
+; GCN-NEXT: v_mov_b32_e32 v1, v4
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_f64_to_v4bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_f64_to_v4bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_f64_to_v4bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB20_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_f64 v[0:1], v[0:1], 1.0
+; GFX11-NEXT: .LBB20_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd double %a, 1.000000e+00
+ %a2 = bitcast double %a1 to <4 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast double %a to <4 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <4 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x bfloat> %phi
+}
+
+define double @v_bitcast_v4bf16_to_f64(<4 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4bf16_to_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB21_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB21_4
+; GCN-NEXT: .LBB21_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB21_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB21_2
+; GCN-NEXT: .LBB21_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v5
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4bf16_to_f64:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB21_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v1, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v1
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v0
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; VI-NEXT: .LBB21_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4bf16_to_f64:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB21_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; GFX9-NEXT: v_perm_b32 v1, v2, v1, s7
+; GFX9-NEXT: .LBB21_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4bf16_to_f64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB21_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v3, 0x40c00000, v3 :: v_dual_lshlrev_b32 v2, 16, v0
+; GFX11-NEXT: v_dual_add_f32 v2, 0x40c00000, v2 :: v_dual_lshlrev_b32 v1, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v8, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_bfe_u32 v4, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_add3_u32 v8, v8, v3, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v4, v4, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v6, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v7, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v9, v9, v1, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v5, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v6, v6, v0, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v6, v7, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v8, v10, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v9, v4, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v2, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_perm_b32 v1, v3, v1, 0x7060302
+; GFX11-NEXT: .LBB21_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <4 x bfloat> %a1 to double
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x bfloat> %a to double
+ br label %end
+
+end:
+ %phi = phi double [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret double %phi
+}
+
+define <2 x float> @v_bitcast_v2i32_to_v2f32(<2 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i32_to_v2f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB22_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB22_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i32_to_v2f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i32_to_v2f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i32_to_v2f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i32> %a, splat (i32 3)
+ %a2 = bitcast <2 x i32> %a1 to <2 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i32> %a to <2 x float>
+ br label %end
+
+end:
+ %phi = phi <2 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x float> %phi
+}
+
+define <2 x i32> @v_bitcast_v2f32_to_v2i32(<2 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f32_to_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB23_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB23_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f32_to_v2i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f32_to_v2i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f32_to_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <2 x float> %a1 to <2 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x float> %a to <2 x i32>
+ br label %end
+
+end:
+ %phi = phi <2 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i32> %phi
+}
+
+define <4 x i16> @v_bitcast_v2i32_to_v4i16(<2 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i32_to_v4i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v4, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v1, v4, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: .LBB24_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB24_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v4, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: v_alignbit_b32 v1, v4, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: .LBB24_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v2, v4
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i32_to_v4i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i32_to_v4i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i32_to_v4i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i32> %a, splat (i32 3)
+ %a2 = bitcast <2 x i32> %a1 to <4 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i32> %a to <4 x i16>
+ br label %end
+
+end:
+ %phi = phi <4 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i16> %phi
+}
+
+define <2 x i32> @v_bitcast_v4i16_to_v2i32(<4 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i16_to_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB25_4
+; GCN-NEXT: .LBB25_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB25_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v2
+; GCN-NEXT: v_or_b32_e32 v0, v0, v4
+; GCN-NEXT: v_or_b32_e32 v1, v1, v3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB25_2
+; GCN-NEXT: .LBB25_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v2
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i16_to_v2i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v3, 3
+; VI-NEXT: v_add_u16_e32 v2, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v2, v1
+; VI-NEXT: v_add_u16_e32 v2, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v2, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i16_to_v2i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i16_to_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i16> %a, splat (i16 3)
+ %a2 = bitcast <4 x i16> %a1 to <2 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i16> %a to <2 x i32>
+ br label %end
+
+end:
+ %phi = phi <2 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i32> %phi
+}
+
+define <4 x half> @v_bitcast_v2i32_to_v4f16(<2 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i32_to_v4f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB26_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB26_4
+; GCN-NEXT: .LBB26_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB26_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB26_2
+; GCN-NEXT: .LBB26_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i32_to_v4f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i32_to_v4f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i32_to_v4f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i32> %a, splat (i32 3)
+ %a2 = bitcast <2 x i32> %a1 to <4 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i32> %a to <4 x half>
+ br label %end
+
+end:
+ %phi = phi <4 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x half> %phi
+}
+
+define <2 x i32> @v_bitcast_v4f16_to_v2i32(<4 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f16_to_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB27_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB27_4
+; GCN-NEXT: .LBB27_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB27_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v1
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB27_2
+; GCN-NEXT: .LBB27_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f16_to_v2i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 0x200
+; VI-NEXT: v_add_f16_sdwa v3, v1, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v2, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v3
+; VI-NEXT: v_or_b32_e32 v0, v0, v2
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f16_to_v2i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f16_to_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <4 x half> %a1 to <2 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x half> %a to <2 x i32>
+ br label %end
+
+end:
+ %phi = phi <2 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i32> %phi
+}
+
+define <4 x bfloat> @v_bitcast_v2i32_to_v4bf16(<2 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2i32_to_v4bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB28_4
+; GCN-NEXT: .LBB28_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB28_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB28_2
+; GCN-NEXT: .LBB28_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v4
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v5
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2i32_to_v4bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2i32_to_v4bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2i32_to_v4bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <2 x i32> %a, splat (i32 3)
+ %a2 = bitcast <2 x i32> %a1 to <4 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x i32> %a to <4 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <4 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x bfloat> %phi
+}
+
+define <2 x i32> @v_bitcast_v4bf16_to_v2i32(<4 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4bf16_to_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB29_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB29_4
+; GCN-NEXT: .LBB29_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB29_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB29_2
+; GCN-NEXT: .LBB29_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v5
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4bf16_to_v2i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB29_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v1, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v1
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v0
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; VI-NEXT: .LBB29_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4bf16_to_v2i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB29_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; GFX9-NEXT: v_perm_b32 v1, v2, v1, s7
+; GFX9-NEXT: .LBB29_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4bf16_to_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB29_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v3, 0x40c00000, v3 :: v_dual_lshlrev_b32 v2, 16, v0
+; GFX11-NEXT: v_dual_add_f32 v2, 0x40c00000, v2 :: v_dual_lshlrev_b32 v1, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v8, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_bfe_u32 v4, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_add3_u32 v8, v8, v3, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v4, v4, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v6, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v7, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v9, v9, v1, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v5, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v6, v6, v0, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v6, v7, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v8, v10, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v9, v4, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v2, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_perm_b32 v1, v3, v1, 0x7060302
+; GFX11-NEXT: .LBB29_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <4 x bfloat> %a1 to <2 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x bfloat> %a to <2 x i32>
+ br label %end
+
+end:
+ %phi = phi <2 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x i32> %phi
+}
+
+define <4 x i16> @v_bitcast_v2f32_to_v4i16(<2 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f32_to_v4i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v4, v1
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_2
+; GCN-NEXT: ; %bb.1: ; %cmp.false
+; GCN-NEXT: v_alignbit_b32 v1, v4, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: .LBB30_2: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB30_4
+; GCN-NEXT: ; %bb.3: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v4, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: v_alignbit_b32 v1, v4, v0, 16
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: .LBB30_4: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: v_mov_b32_e32 v2, v4
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f32_to_v4i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f32_to_v4i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f32_to_v4i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <2 x float> %a1 to <4 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x float> %a to <4 x i16>
+ br label %end
+
+end:
+ %phi = phi <4 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i16> %phi
+}
+
+define <2 x float> @v_bitcast_v4i16_to_v2f32(<4 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i16_to_v2f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB31_4
+; GCN-NEXT: .LBB31_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB31_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v2
+; GCN-NEXT: v_or_b32_e32 v0, v0, v4
+; GCN-NEXT: v_or_b32_e32 v1, v1, v3
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB31_2
+; GCN-NEXT: .LBB31_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v2
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 0x30000, v1
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i16_to_v2f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v3, 3
+; VI-NEXT: v_add_u16_e32 v2, 3, v1
+; VI-NEXT: v_add_u16_sdwa v1, v1, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v2, v1
+; VI-NEXT: v_add_u16_e32 v2, 3, v0
+; VI-NEXT: v_add_u16_sdwa v0, v0, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v0, v2, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i16_to_v2f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i16_to_v2f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i16> %a, splat (i16 3)
+ %a2 = bitcast <4 x i16> %a1 to <2 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i16> %a to <2 x float>
+ br label %end
+
+end:
+ %phi = phi <2 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x float> %phi
+}
+
+define <4 x half> @v_bitcast_v2f32_to_v4f16(<2 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f32_to_v4f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB32_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB32_4
+; GCN-NEXT: .LBB32_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB32_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB32_2
+; GCN-NEXT: .LBB32_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f32_to_v4f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f32_to_v4f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f32_to_v4f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <2 x float> %a1 to <4 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x float> %a to <4 x half>
+ br label %end
+
+end:
+ %phi = phi <4 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x half> %phi
+}
+
+define <2 x float> @v_bitcast_v4f16_to_v2f32(<4 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f16_to_v2f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB33_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB33_4
+; GCN-NEXT: .LBB33_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB33_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v4, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v1
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB33_2
+; GCN-NEXT: .LBB33_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v4
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v1, v0
+; GCN-NEXT: v_or_b32_e32 v1, v2, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f16_to_v2f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 0x200
+; VI-NEXT: v_add_f16_sdwa v3, v1, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v1, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v2, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v0, 0x200, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v3
+; VI-NEXT: v_or_b32_e32 v0, v0, v2
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f16_to_v2f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f16_to_v2f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <4 x half> %a1 to <2 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x half> %a to <2 x float>
+ br label %end
+
+end:
+ %phi = phi <2 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x float> %phi
+}
+
+define <4 x bfloat> @v_bitcast_v2f32_to_v4bf16(<2 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v2f32_to_v4bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v5, v1
+; GCN-NEXT: v_mov_b32_e32 v4, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB34_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB34_4
+; GCN-NEXT: .LBB34_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB34_3: ; %cmp.false
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v5
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v4
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB34_2
+; GCN-NEXT: .LBB34_4: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v4
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v5
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v2f32_to_v4bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v2f32_to_v4bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v2f32_to_v4bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v1, 1.0, v1 :: v_dual_add_f32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <2 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <2 x float> %a1 to <4 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <2 x float> %a to <4 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <4 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x bfloat> %phi
+}
+
+define <2 x float> @v_bitcast_v4bf16_to_v2f32(<4 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4bf16_to_v2f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v3, 1.0, v3
+; GCN-NEXT: v_mul_f32_e32 v2, 1.0, v2
+; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB35_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB35_4
+; GCN-NEXT: .LBB35_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB35_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v0, v4, 16
+; GCN-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB35_2
+; GCN-NEXT: .LBB35_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v4
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v5
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v2
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v3
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
+; GCN-NEXT: v_alignbit_b32 v1, v3, v2, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4bf16_to_v2f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB35_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v1, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v1
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v2, 16
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v0, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v0
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; VI-NEXT: .LBB35_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4bf16_to_v2f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB35_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; GFX9-NEXT: s_mov_b32 s7, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v2, s7
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; GFX9-NEXT: v_perm_b32 v1, v2, v1, s7
+; GFX9-NEXT: .LBB35_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4bf16_to_v2f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB35_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_and_b32_e32 v3, 0xffff0000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v3, 0x40c00000, v3 :: v_dual_lshlrev_b32 v2, 16, v0
+; GFX11-NEXT: v_dual_add_f32 v2, 0x40c00000, v2 :: v_dual_lshlrev_b32 v1, 16, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v8, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v3
+; GFX11-NEXT: v_bfe_u32 v4, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: v_add3_u32 v8, v8, v3, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: v_add3_u32 v4, v4, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v1, 0x40c00000, v1 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v9, v1, 16, 1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_bfe_u32 v6, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v7, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v9, v9, v1, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v5, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_add3_u32 v6, v6, v0, 0x7fff
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v6, v7, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v8, v10, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v9, v4, vcc_lo
+; GFX11-NEXT: v_perm_b32 v0, v0, v2, 0x7060302
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_perm_b32 v1, v3, v1, 0x7060302
+; GFX11-NEXT: .LBB35_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <4 x bfloat> %a1 to <2 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x bfloat> %a to <2 x float>
+ br label %end
+
+end:
+ %phi = phi <2 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <2 x float> %phi
+}
+
+define <4 x half> @v_bitcast_v4i16_to_v4f16(<4 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i16_to_v4f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v8, v3
+; GCN-NEXT: v_mov_b32_e32 v5, v2
+; GCN-NEXT: v_mov_b32_e32 v6, v1
+; GCN-NEXT: v_mov_b32_e32 v7, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB36_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB36_4
+; GCN-NEXT: .LBB36_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB36_3: ; %cmp.false
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v8
+; GCN-NEXT: ; implicit-def: $vgpr8
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB36_2
+; GCN-NEXT: .LBB36_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v3, vcc, 3, v8
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v5
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i16_to_v4f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 3
+; VI-NEXT: v_add_u16_sdwa v3, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v2, v1, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v2
+; VI-NEXT: v_or_b32_e32 v0, v0, v3
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i16_to_v4f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i16_to_v4f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i16> %a, splat (i16 3)
+ %a2 = bitcast <4 x i16> %a1 to <4 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i16> %a to <4 x half>
+ br label %end
+
+end:
+ %phi = phi <4 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x half> %phi
+}
+
+define <4 x i16> @v_bitcast_v4f16_to_v4i16(<4 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f16_to_v4i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB37_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v1, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v4, 16, v3
+; GCN-NEXT: v_or_b32_e32 v0, v0, v1
+; GCN-NEXT: v_or_b32_e32 v2, v2, v4
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: .LBB37_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f16_to_v4i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v3, 0x200
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v4, v1
+; VI-NEXT: v_or_b32_e32 v0, v2, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f16_to_v4i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f16_to_v4i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <4 x half> %a1 to <4 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x half> %a to <4 x i16>
+ br label %end
+
+end:
+ %phi = phi <4 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i16> %phi
+}
+
+define <4 x bfloat> @v_bitcast_v4i16_to_v4bf16(<4 x i16> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4i16_to_v4bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_mov_b32_e32 v6, v2
+; GCN-NEXT: v_mov_b32_e32 v5, v0
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v3
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB38_4
+; GCN-NEXT: .LBB38_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB38_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v6
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB38_2
+; GCN-NEXT: .LBB38_4: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v6
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v5
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
+; GCN-NEXT: v_or_b32_e32 v0, v3, v0
+; GCN-NEXT: v_or_b32_e32 v1, v1, v2
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 0x30000, v0
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 0x30000, v1
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v0
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v2
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4i16_to_v4bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v2, 3
+; VI-NEXT: v_add_u16_sdwa v3, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_sdwa v2, v1, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_u16_e32 v1, 3, v1
+; VI-NEXT: v_add_u16_e32 v0, 3, v0
+; VI-NEXT: v_or_b32_e32 v1, v1, v2
+; VI-NEXT: v_or_b32_e32 v0, v0, v3
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4i16_to_v4bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4i16_to_v4bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_u16 v1, v1, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: v_pk_add_u16 v0, v0, 3 op_sel_hi:[1,0]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <4 x i16> %a, splat (i16 3)
+ %a2 = bitcast <4 x i16> %a1 to <4 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x i16> %a to <4 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <4 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x bfloat> %phi
+}
+
+define <4 x i16> @v_bitcast_v4bf16_to_v4i16(<4 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4bf16_to_v4i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_mul_f32_e32 v7, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v6, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v3
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB39_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB39_4
+; GCN-NEXT: .LBB39_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB39_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v7
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v4
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB39_2
+; GCN-NEXT: .LBB39_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v7
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v6
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v5
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v3
+; GCN-NEXT: v_alignbit_b32 v0, v4, v0, 16
+; GCN-NEXT: v_alignbit_b32 v2, v3, v2, 16
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; GCN-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4bf16_to_v4i16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB39_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v0
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v3, 16, v1
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_bfe_u32 v4, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v4, vcc, v4, v3
+; VI-NEXT: v_add_u32_e32 v4, vcc, s6, v4
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v5, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v3, v4, v5, vcc
+; VI-NEXT: v_bfe_u32 v4, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v4, vcc, v4, v1
+; VI-NEXT: v_add_u32_e32 v4, vcc, 0x7fff, v4
+; VI-NEXT: v_or_b32_e32 v5, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v4, v5, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v3, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; VI-NEXT: .LBB39_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4bf16_to_v4i16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB39_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_bfe_u32 v4, v3, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v4, v4, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v5, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v4, v5, vcc
+; GFX9-NEXT: v_bfe_u32 v4, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v4, v4, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v5, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v5, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v3, s6
+; GFX9-NEXT: v_perm_b32 v1, v2, v1, s6
+; GFX9-NEXT: .LBB39_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4bf16_to_v4i16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB39_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v3, 16, v0
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v3, 0x40c00000, v3 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v7, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v7, v7, v3, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX11-NEXT: v_add3_u32 v9, v9, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v2, 0x40c00000, v2 :: v_dual_add_f32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_bfe_u32 v4, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v6, v1, 16, 1
+; GFX11-NEXT: v_add3_u32 v4, v4, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add3_u32 v6, v6, v1, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v5, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v7, v8, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v3, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v6, v4, vcc_lo
+; GFX11-NEXT: v_perm_b32 v1, v2, v1, 0x7060302
+; GFX11-NEXT: .LBB39_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <4 x bfloat> %a1 to <4 x i16>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x bfloat> %a to <4 x i16>
+ br label %end
+
+end:
+ %phi = phi <4 x i16> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x i16> %phi
+}
+
+define <4 x bfloat> @v_bitcast_v4f16_to_v4bf16(<4 x half> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4f16_to_v4bf16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v0
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v6, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v7, v3
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB40_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB40_4
+; GCN-NEXT: .LBB40_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB40_3: ; %cmp.false
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v5
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v6
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v7
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB40_2
+; GCN-NEXT: .LBB40_4: ; %cmp.true
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v6
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x38000000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x38000000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x38000000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x38000000, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v3, v3
+; GCN-NEXT: v_cvt_f16_f32_e32 v2, v2
+; GCN-NEXT: v_cvt_f16_f32_e32 v4, v1
+; GCN-NEXT: v_cvt_f16_f32_e32 v5, v0
+; GCN-NEXT: v_lshlrev_b32_e32 v0, 16, v3
+; GCN-NEXT: v_lshlrev_b32_e32 v1, 16, v2
+; GCN-NEXT: v_lshlrev_b32_e32 v2, 16, v4
+; GCN-NEXT: v_lshlrev_b32_e32 v3, 16, v5
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4f16_to_v4bf16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_mov_b32_e32 v3, 0x200
+; VI-NEXT: v_add_f16_e32 v2, 0x200, v0
+; VI-NEXT: v_add_f16_sdwa v0, v0, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_add_f16_e32 v4, 0x200, v1
+; VI-NEXT: v_add_f16_sdwa v1, v1, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-NEXT: v_or_b32_e32 v1, v4, v1
+; VI-NEXT: v_or_b32_e32 v0, v2, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4f16_to_v4bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: s_movk_i32 s6, 0x200
+; GFX9-NEXT: v_pk_add_f16 v1, v1, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: v_pk_add_f16 v0, v0, s6 op_sel_hi:[1,0]
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4f16_to_v4bf16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_pk_add_f16 v1, 0x200, v1 op_sel_hi:[0,1]
+; GFX11-NEXT: v_pk_add_f16 v0, 0x200, v0 op_sel_hi:[0,1]
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x half> %a, splat (half 0xH0200)
+ %a2 = bitcast <4 x half> %a1 to <4 x bfloat>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x half> %a to <4 x bfloat>
+ br label %end
+
+end:
+ %phi = phi <4 x bfloat> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x bfloat> %phi
+}
+
+define <4 x half> @v_bitcast_v4bf16_to_v4f16(<4 x bfloat> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v4bf16_to_v4f16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
+; GCN-NEXT: v_mul_f32_e32 v4, 1.0, v0
+; GCN-NEXT: v_mul_f32_e32 v5, 1.0, v1
+; GCN-NEXT: v_mul_f32_e32 v6, 1.0, v2
+; GCN-NEXT: v_mul_f32_e32 v7, 1.0, v3
+; GCN-NEXT: ; implicit-def: $vgpr0
+; GCN-NEXT: ; implicit-def: $vgpr1
+; GCN-NEXT: ; implicit-def: $vgpr2
+; GCN-NEXT: ; implicit-def: $vgpr3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB41_3
+; GCN-NEXT: ; %bb.1: ; %Flow
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execnz .LBB41_4
+; GCN-NEXT: .LBB41_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+; GCN-NEXT: .LBB41_3: ; %cmp.false
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v4
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v5
+; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v6
+; GCN-NEXT: v_lshrrev_b32_e32 v3, 16, v7
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v3
+; GCN-NEXT: ; implicit-def: $vgpr7
+; GCN-NEXT: ; implicit-def: $vgpr6
+; GCN-NEXT: ; implicit-def: $vgpr5
+; GCN-NEXT: ; implicit-def: $vgpr4
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB41_2
+; GCN-NEXT: .LBB41_4: ; %cmp.true
+; GCN-NEXT: v_and_b32_e32 v0, 0xffff0000, v7
+; GCN-NEXT: v_and_b32_e32 v1, 0xffff0000, v6
+; GCN-NEXT: v_and_b32_e32 v2, 0xffff0000, v5
+; GCN-NEXT: v_and_b32_e32 v3, 0xffff0000, v4
+; GCN-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GCN-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GCN-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GCN-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GCN-NEXT: v_lshrrev_b32_e32 v4, 16, v0
+; GCN-NEXT: v_lshrrev_b32_e32 v5, 16, v1
+; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v2
+; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v3
+; GCN-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GCN-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GCN-NEXT: v_cvt_f32_f16_e32 v2, v5
+; GCN-NEXT: v_cvt_f32_f16_e32 v3, v4
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v4bf16_to_v4f16:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: s_cbranch_execz .LBB41_2
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v0
+; VI-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; VI-NEXT: v_bfe_u32 v3, v2, 16, 1
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v2
+; VI-NEXT: v_add_u32_e32 v3, vcc, 0x7fff, v3
+; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; VI-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; VI-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; VI-NEXT: v_bfe_u32 v3, v0, 16, 1
+; VI-NEXT: s_movk_i32 s6, 0x7fff
+; VI-NEXT: v_add_u32_e32 v3, vcc, v3, v0
+; VI-NEXT: v_add_u32_e32 v3, vcc, s6, v3
+; VI-NEXT: v_or_b32_e32 v4, 0x400000, v0
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT: v_cndmask_b32_e32 v0, v3, v4, vcc
+; VI-NEXT: v_lshlrev_b32_e32 v3, 16, v1
+; VI-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; VI-NEXT: v_bfe_u32 v4, v3, 16, 1
+; VI-NEXT: v_add_u32_e32 v4, vcc, v4, v3
+; VI-NEXT: v_add_u32_e32 v4, vcc, s6, v4
+; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
+; VI-NEXT: v_or_b32_e32 v5, 0x400000, v3
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; VI-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; VI-NEXT: v_cndmask_b32_e32 v3, v4, v5, vcc
+; VI-NEXT: v_bfe_u32 v4, v1, 16, 1
+; VI-NEXT: v_add_u32_e32 v4, vcc, v4, v1
+; VI-NEXT: v_add_u32_e32 v4, vcc, 0x7fff, v4
+; VI-NEXT: v_or_b32_e32 v5, 0x400000, v1
+; VI-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; VI-NEXT: v_cndmask_b32_e32 v1, v4, v5, vcc
+; VI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
+; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
+; VI-NEXT: v_alignbit_b32 v1, v1, v3, 16
+; VI-NEXT: v_alignbit_b32 v0, v0, v2, 16
+; VI-NEXT: .LBB41_2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v4bf16_to_v4f16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: s_cbranch_execz .LBB41_2
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX9-NEXT: v_add_f32_e32 v2, 0x40c00000, v2
+; GFX9-NEXT: v_bfe_u32 v3, v2, 16, 1
+; GFX9-NEXT: s_movk_i32 s6, 0x7fff
+; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX9-NEXT: v_add3_u32 v3, v3, v2, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v2
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v2, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 0x40c00000, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v4, vcc
+; GFX9-NEXT: v_bfe_u32 v3, v1, 16, 1
+; GFX9-NEXT: v_add3_u32 v3, v3, v1, s6
+; GFX9-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v4, vcc
+; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v0
+; GFX9-NEXT: v_add_f32_e32 v3, 0x40c00000, v3
+; GFX9-NEXT: v_bfe_u32 v4, v3, 16, 1
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: v_add3_u32 v4, v4, v3, s6
+; GFX9-NEXT: v_or_b32_e32 v5, 0x400000, v3
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v3, v3
+; GFX9-NEXT: v_add_f32_e32 v0, 0x40c00000, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v3, v4, v5, vcc
+; GFX9-NEXT: v_bfe_u32 v4, v0, 16, 1
+; GFX9-NEXT: v_add3_u32 v4, v4, v0, s6
+; GFX9-NEXT: v_or_b32_e32 v5, 0x400000, v0
+; GFX9-NEXT: v_cmp_u_f32_e32 vcc, v0, v0
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v5, vcc
+; GFX9-NEXT: s_mov_b32 s6, 0x7060302
+; GFX9-NEXT: v_perm_b32 v0, v0, v3, s6
+; GFX9-NEXT: v_perm_b32 v1, v2, v1, s6
+; GFX9-NEXT: .LBB41_2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v4bf16_to_v4f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v2
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: s_cbranch_execz .LBB41_2
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_lshlrev_b32_e32 v3, 16, v0
+; GFX11-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v3, 0x40c00000, v3 :: v_dual_add_f32 v0, 0x40c00000, v0
+; GFX11-NEXT: v_bfe_u32 v7, v3, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v8, 0x400000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v9, v0, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v10, 0x400000, v0
+; GFX11-NEXT: v_add3_u32 v7, v7, v3, 0x7fff
+; GFX11-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
+; GFX11-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX11-NEXT: v_add3_u32 v9, v9, v0, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_dual_add_f32 v2, 0x40c00000, v2 :: v_dual_add_f32 v1, 0x40c00000, v1
+; GFX11-NEXT: v_bfe_u32 v4, v2, 16, 1
+; GFX11-NEXT: v_or_b32_e32 v5, 0x400000, v2
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v2, v2
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-NEXT: v_bfe_u32 v6, v1, 16, 1
+; GFX11-NEXT: v_add3_u32 v4, v4, v2, 0x7fff
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT: v_add3_u32 v6, v6, v1, 0x7fff
+; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v5, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v3, v3
+; GFX11-NEXT: v_or_b32_e32 v4, 0x400000, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v3, v7, v8, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v9, v10, vcc_lo
+; GFX11-NEXT: v_cmp_u_f32_e32 vcc_lo, v1, v1
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_perm_b32 v0, v0, v3, 0x7060302
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v6, v4, vcc_lo
+; GFX11-NEXT: v_perm_b32 v1, v2, v1, 0x7060302
+; GFX11-NEXT: .LBB41_2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <4 x bfloat> %a, splat (bfloat 0xR40C0)
+ %a2 = bitcast <4 x bfloat> %a1 to <4 x half>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <4 x bfloat> %a to <4 x half>
+ br label %end
+
+end:
+ %phi = phi <4 x half> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <4 x half> %phi
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll
new file mode 100644
index 0000000000000..2b6d288880dff
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll
@@ -0,0 +1,163 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -mtriple=amdgcn -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=tonga -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=VI %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-codegenprepare-break-large-phis-threshold=4096 < %s | FileCheck -check-prefixes=GFX11 %s
+
+define <3 x float> @v_bitcast_v3i32_to_v3f32(<3 x i32> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3i32_to_v3f32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB0_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_i32_e32 v2, vcc, 3, v2
+; GCN-NEXT: v_add_i32_e32 v1, vcc, 3, v1
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 3, v0
+; GCN-NEXT: .LBB0_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3i32_to_v3f32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v3
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_u32_e32 v2, vcc, 3, v2
+; VI-NEXT: v_add_u32_e32 v1, vcc, 3, v1
+; VI-NEXT: v_add_u32_e32 v0, vcc, 3, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3i32_to_v3f32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v3
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_u32_e32 v2, 3, v2
+; GFX9-NEXT: v_add_u32_e32 v1, 3, v1
+; GFX9-NEXT: v_add_u32_e32 v0, 3, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3i32_to_v3f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v3
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_add_nc_u32_e32 v2, 3, v2
+; GFX11-NEXT: v_add_nc_u32_e32 v1, 3, v1
+; GFX11-NEXT: v_add_nc_u32_e32 v0, 3, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = add <3 x i32> %a, splat (i32 3)
+ %a2 = bitcast <3 x i32> %a1 to <3 x float>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x i32> %a to <3 x float>
+ br label %end
+
+end:
+ %phi = phi <3 x float> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x float> %phi
+}
+
+define <3 x i32> @v_bitcast_v3f32_to_v3i32(<3 x float> %a, i32 %b) {
+; GCN-LABEL: v_bitcast_v3f32_to_v3i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v3
+; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GCN-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GCN-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GCN-NEXT: s_cbranch_execz .LBB1_2
+; GCN-NEXT: ; %bb.1: ; %cmp.true
+; GCN-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GCN-NEXT: .LBB1_2: ; %end
+; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: v_bitcast_v3f32_to_v3i32:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v3
+; VI-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; VI-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; VI-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; VI-NEXT: ; %bb.1: ; %cmp.true
+; VI-NEXT: v_add_f32_e32 v2, 1.0, v2
+; VI-NEXT: v_add_f32_e32 v1, 1.0, v1
+; VI-NEXT: v_add_f32_e32 v0, 1.0, v0
+; VI-NEXT: ; %bb.2: ; %end
+; VI-NEXT: s_or_b64 exec, exec, s[4:5]
+; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: v_bitcast_v3f32_to_v3i32:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v3
+; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
+; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
+; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
+; GFX9-NEXT: ; %bb.1: ; %cmp.true
+; GFX9-NEXT: v_add_f32_e32 v2, 1.0, v2
+; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
+; GFX9-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX9-NEXT: ; %bb.2: ; %end
+; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: v_bitcast_v3f32_to_v3i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_mov_b32 s0, exec_lo
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v3
+; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
+; GFX11-NEXT: ; %bb.1: ; %cmp.true
+; GFX11-NEXT: v_dual_add_f32 v2, 1.0, v2 :: v_dual_add_f32 v1, 1.0, v1
+; GFX11-NEXT: v_add_f32_e32 v0, 1.0, v0
+; GFX11-NEXT: ; %bb.2: ; %end
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %cmp = icmp eq i32 %b, 0
+ br i1 %cmp, label %cmp.true, label %cmp.false
+
+cmp.true:
+ %a1 = fadd <3 x float> %a, splat (float 1.000000e+00)
+ %a2 = bitcast <3 x float> %a1 to <3 x i32>
+ br label %end
+
+cmp.false:
+ %a3 = bitcast <3 x float> %a to <3 x i32>
+ br label %end
+
+end:
+ %phi = phi <3 x i32> [ %a2, %cmp.true ], [ %a3, %cmp.false ]
+ ret <3 x i32> %phi
+}
diff --git a/llvm/test/lit.cfg.py b/llvm/test/lit.cfg.py
index aad7a088551b2..50921879cd1f2 100644
--- a/llvm/test/lit.cfg.py
+++ b/llvm/test/lit.cfg.py
@@ -466,7 +466,7 @@ def have_cxx_shared_library():
print("could not exec llvm-readobj")
return False
- readobj_out = readobj_cmd.stdout.read().decode("ascii")
+ readobj_out = readobj_cmd.stdout.read().decode("utf-8")
readobj_cmd.wait()
regex = re.compile(r"(libc\+\+|libstdc\+\+|msvcp).*\.(so|dylib|dll)")
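The `lit.cfg.py` hunk above swaps the decoding of `llvm-readobj` output from ASCII to UTF-8. A minimal sketch (not from the patch itself; the byte string is a made-up stand-in for readobj output containing a non-ASCII path) of why this matters: the ASCII codec raises on any byte above 0x7F, while UTF-8 decodes the same output cleanly.

```python
# Hypothetical readobj-style output containing a non-ASCII character
# in a library path; encoded as UTF-8 bytes, as a subprocess pipe
# would deliver them.
raw = "NEEDED: /pfad-\u00fcber/libc++.so.1".encode("utf-8")

# Decoding as ASCII fails on the multi-byte UTF-8 sequence for 'ü'.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 succeeds and preserves the character, so the
# subsequent regex match over library names still works.
utf8_out = raw.decode("utf-8")
```

With `decode("ascii")`, a single non-ASCII byte in the tool's output would abort the whole feature probe with an exception rather than returning `False`; UTF-8 avoids that failure mode.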