<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
With so many test changes, it's a little difficult to see exactly what<br>
defect this is trying to fix. Can you give a simple example of code<br>
that breaks currently? Or is suboptimal?<br>
<div><br></div></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The C test case changes show this clearly. Previously, all the test cases passed only the constant 0 or 1 as the lane argument. Now each ACLE intrinsic function has two test cases: one checks the upper-bound lane number, and the other checks lane 0. For example,</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default"><div class="gmail_default"><div class="gmail_default"><div class="gmail_default"><font face="arial, helvetica, sans-serif"> int16x4_t test_vmla_laneq_s16(int16x4_t a, int16x4_t b, int16x8_t v) {</font></div>
<div class="gmail_default"><font face="arial, helvetica, sans-serif"> // CHECK: test_vmla_laneq_s16</font></div><div class="gmail_default"><font face="arial, helvetica, sans-serif">- return vmla_laneq_s16(a, b, v, 1);</font></div>
<div class="gmail_default"><font face="arial, helvetica, sans-serif">- // CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]</font></div><div class="gmail_default"><font face="arial, helvetica, sans-serif">+ return vmla_laneq_s16(a, b, v, 7);</font></div>
<div class="gmail_default"><font face="arial, helvetica, sans-serif">+ // CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[7]</font></div><div class="gmail_default"><font face="arial, helvetica, sans-serif"> }</font></div>
<div><br></div></div><div><div><div>+int16x4_t test_vmla_laneq_s16_0(int16x4_t a, int16x4_t b, int16x8_t v) {</div><div>+ // CHECK: test_vmla_laneq_s16_0</div><div>+ return vmla_laneq_s16(a, b, v, 0);</div><div>+ // CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[0]</div>
<div>+}</div></div></div><div><br></div><div>Without the patch, the function <span style="font-family:arial,helvetica,sans-serif">test_vmla_laneq_s16 would fail </span><span style="font-family:arial,helvetica,sans-serif">to generate the single instruction </span>mla v0.4h, v1.4h, v2.h[7]<span style="font-family:arial,helvetica,sans-serif">, and would instead produce two instructions,</span></div>
<div><br></div><div><div> dup d2, v2.d[1]</div><div> mla v0.4h, v1.4h, v2.h[3]</div></div><div><br></div><div><br></div></div></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>> The LowerVECTOR_SHUFFLE is to cover the following two cases,<br>
> 1) the 1st operand of VDUPlane is EXTRACT_SUBVECTOR<br>
<br>
</div>OK, I see why this one is useful: if the extract index is non-zero<br>
then instead of "INS, OP" we can fold the INS or whatever into the<br>
main operation.<br>
<div><br>
> 2) the 1st operand of VDUPlane is CONCAT_VECTORS, for which the 2nd operand<br>
> is UNDEF.<br>
<br>
</div>This one is more puzzling to me. I'd expect the version with the<br>
CONCAT to be more useful. Instructions accepting a NEON_VDUPLANE want<br>
an Rm with 128-bits don't they? If no such instruction exists to make<br>
use of it then there should be a pattern converting (CONCAT x,<br>
(IMPLICIT_DEF)) into a simple SUBREG_TO_REG as a last ditch effort.<br>
<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">First, we already have the pattern you mentioned, but it doesn't solve my problem.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
Second, consider the DAG below,</div></div><div><br></div><div><div class="gmail_default"><div class="gmail_default"><font face="arial, helvetica, sans-serif"> 0x21dd900: v2f32 = fneg 0x21dd800 [ORD=1] [ID=13]</font></div>
<div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21dda00: v2f32 = undef [ID=4]</span><br></div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21ddb00: v4f32 = concat_vectors 0x21dd900, 0x21dda00 [ORD=2] [ID=14]</span><br>
</div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21ddc00: v4f32 = undef [ID=5]</span><br></div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21ddd00: v4f32 = vector_shuffle 0x21ddb00, 0x21ddc00<1,1,1,1> [ORD=2] [ID=15]</span><br>
</div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21af288: <multiple use></span><br></div><div class="gmail_default"><font face="arial, helvetica, sans-serif"> 0x21dd300: f128 = Register %vreg1 [ID=2]</font></div>
<div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21dd400: f128,ch = CopyFromReg 0x21af288, 0x21dd300 [ID=8]</span><br></div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21dd500: v4f32 = bitcast 0x21dd400 [ID=11]</span><br>
</div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21af288: <multiple use></span><br></div><div class="gmail_default"><font face="arial, helvetica, sans-serif"> 0x21dd000: f128 = Register %vreg0 [ID=1]</font></div>
<div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21dd100: f128,ch = CopyFromReg 0x21af288, 0x21dd000 [ID=7]</span><br></div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21dd200: v4f32 = bitcast 0x21dd100 [ID=10]</span><br>
</div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"> 0x21dde00: v4f32 = fma 0x21ddd00, 0x21dd500, 0x21dd200 [ORD=3] [ID=16]</span><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">
<br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Here, if we don't remove the concat_vectors at the lowering stage, we have to describe concat_vectors explicitly in the pattern being matched. Otherwise, we fail to combine fneg and fma into an fms. But when concat_vectors(xxx, undef) is followed by a VDUPLANE, which is still a vector_shuffle before lowering, the concat_vectors is meaningless and can be removed completely. With this optimization, we needn't match concat_vectors in the patterns at all, and the .td file becomes neater, right? This is why you see code changes like,</div>
<div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div><div><font face="arial, helvetica, sans-serif">- def : NI_2VE_lane<!cast<Instruction>(subop # "_4s4s"), neon_uimm1_bare,</font></div>
<div><font face="arial, helvetica, sans-serif">- op, VPR128, VPR128, VPR64, v4i32, v4i32, v2i32,</font></div><div><font face="arial, helvetica, sans-serif">- BinOpFrag<(Neon_vduplane</font></div>
<div><font face="arial, helvetica, sans-serif">- (Neon_combine_4S node:$LHS, undef),</font></div><div><font face="arial, helvetica, sans-serif">- node:$RHS)>>;</font></div>
<div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div></div><div><div><font face="arial, helvetica, sans-serif"> def : NI_2VEswap_lane<!cast<Instruction>(subop # "_4s4s"),</font></div>
<div><font face="arial, helvetica, sans-serif"> neon_uimm1_bare, op, VPR128, VPR64, v4f32, v2f32,</font></div><div><font face="arial, helvetica, sans-serif">- BinOpFrag<(Neon_vduplane</font></div>
<div><font face="arial, helvetica, sans-serif">- (Neon_combine_4f (fneg node:$LHS), undef),</font></div><div><font face="arial, helvetica, sans-serif">- node:$RHS)>>;</font></div>
<div><font face="arial, helvetica, sans-serif">+ BinOpFrag<(Neon_vduplane (fneg node:$LHS), node:$RHS)>>;</font></div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br>
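<div>To illustrate why dropping the concat_vectors is harmless, here is a small C model of the argument. It is only a sketch and not code from the patch: the function names are hypothetical, the undef half is modelled by an arbitrary "junk" value, and the lane index is assumed to select from the defined lower half.</div>

```c
#include <assert.h>
#include <string.h>

#define N 2 /* lanes in the 64-bit input, e.g. v2f32 */

/* Model of concat_vectors(x, undef): the upper lanes receive an arbitrary
   value, because undef may turn out to be anything. */
static void concat_with_undef(const float x[N], float junk, float out[2 * N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = x[i];
        out[N + i] = junk;
    }
}

/* Model of a splat vector_shuffle <lane, lane, lane, lane>. */
static void splat_lane(const float *v, int lane, float out[2 * N]) {
    for (int i = 0; i < 2 * N; ++i)
        out[i] = v[lane];
}

/* Returns 1 when splatting `lane` out of concat(x, undef) gives the same
   result as splatting it straight out of x. */
static int elision_is_safe(const float x[N], int lane, float junk) {
    float wide[2 * N], via_concat[2 * N], direct[2 * N];

    /* Assumption: the lane comes from the defined half of the concat. */
    assert(lane >= 0 && lane < N);

    concat_with_undef(x, junk, wide);
    splat_lane(wide, lane, via_concat);
    splat_lane(x, lane, direct);
    return memcmp(via_concat, direct, sizeof direct) == 0;
}
```

<div>Whatever value stands in for undef, the splat never reads it, so in this model the duplicated lane can be taken from the 64-bit operand directly.</div>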
</div></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">You can reproduce this problem with this piece of LLVM IR code,</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br>
</div><div><div><font face="arial, helvetica, sans-serif">declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)</font></div><div><span style="font-family:arial,helvetica,sans-serif">define <4 x float> @test_vfmsq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) {</span><br>
</div><div><font face="arial, helvetica, sans-serif">; CHECK: test_vfmsq_lane_f32:</font></div><div><font face="arial, helvetica, sans-serif">; CHECK: fmls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]</font></div><div>
<font face="arial, helvetica, sans-serif">entry:</font></div><div><font face="arial, helvetica, sans-serif"> %sub = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %v</font></div><div><font face="arial, helvetica, sans-serif"> %lane = shufflevector <2 x float> %sub, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1></font></div>
<div><font face="arial, helvetica, sans-serif"> %0 = tail call <4 x float> @llvm.fma.v4f32(<4 x float> %lane, <4 x float> %b, <4 x float> %a)</font></div><div><font face="arial, helvetica, sans-serif"> ret <4 x float> %0</font></div>
<div><font face="arial, helvetica, sans-serif">}</font></div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><span style="font-family:arial"> </span><br></div></div></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Could you take me through an example this elision improves?<br>
<br>
Cheers.<br>
<span><font color="#888888"><br>
Tim.<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><font face="courier new, monospace">Thanks,</font><div><font face="courier new, monospace">-Jiangning</font></div></div>
</div></div>