<div dir="ltr"><pre>Hi Daniel,<br></pre><pre>Clang with the LLVM backend seems to miss some opportunities to fill branch delay slots on the MIPS M14K processor.<br><br></pre><pre>Example:<br>static inline void LOCAL(encode_position)(VARS)
{
    if (vars->is_mcs05) {
        udi(vpu_find_first_neg_from_reduction_4_mcs05, 1, 0);
    } else {
        udi(vpu_find_first_neg_from_reduction_4, 1, 0);
    }
}
compiles to:
bfc03394: 12800004 beqz s4,bfc033a8 <rx4_eostf+0x4a4>
bfc03398: 00000000 nop
bfc0339c: 70016450 udi VPU,vpu_find_first_neg_from_reduction_4_mcs05,1,zero
bfc033a0: 0bf00ceb j bfc033ac <rx4_eostf+0x4a8>
bfc033a4: 00000000 nop
bfc033a8: 70016350 udi VPU,vpu_find_first_neg_from_reduction_4,1,zero
bfc033ac: 24012200 li at,8704
bfc033b0: 72201bd0 udi VPU,wait_req_pending,1,s1,0,0
I think the following schedule is possible:
beqz s4,1f
li at,8704 ; **DELAY SLOT**
j 2f
udi VPU,vpu_find_first_neg_from_reduction_4_mcs05,1,zero ; **DELAY SLOT**
1:
udi VPU,vpu_find_first_neg_from_reduction_4,1,zero
2:
udi VPU,wait_req_pending,1,s1,0,0<br></pre><pre><br>Is there something I could try in order to fill the delay slots of branch instructions, or is there a recent patch that deals with this?<br><br></pre><pre>Thanks,<br></pre><pre>Ambuj<br></pre><pre><br></pre></div>