<div dir="ltr">Hi Tim,<div><br></div>I am working on removing all VPR64 instruction format , but facing a problem. I wish to extend 64 bit vector(eg. v8i8) to 128 bit (v16i8) before insert/extract, then I can use 128 bit insert/extract to process it. I tried SUBREG_TO_REG and INSERT_SUBREG to get this promotion, but all got an assertion failure:<div>

<br><div>llc: /home/kevin/llvm_trunk/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:534: void llvm::InstrEmitter::EmitSubregNode(llvm::SDNode*, llvm::DenseMap<llvm::SDValue, unsigned int>&, bool, bool): Assertion `SRC && "No register class supports VT and SubIdx for INSERT_SUBREG"' failed.<br>

</div></div><div><br></div><div>I see in AArch64RegisterInfo.td, Vx is defined as subreg of Qx, but there is no belonging relation describe between VPR64 and VPR128. How can I define VPR64 is subreg of VPR128 while both of them hold same register name Vx?</div>

<div><br></div><div>Or, Do you know any dag node can do such promotion without introducing instruction?</div><div><br></div><div>Thanks.</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">

2013/9/10 Tim Northover <span dir="ltr"><<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Kevin,<br>

<br>

Thanks very much for keeping on working on this, I quite understand<br>

the need to share progress so others don't get blocked.<br>

<div class="im"><br>

>>+      if(getSubTarget().hasNEON())<br>

>>+        [...]<br>

>>+      else<br>

>><br>

>>What's the advantage of emitting  a different instruction if the core<br>

>>has NEON here? The normal FP one should work regardless (and may<br>

>>actually be faster since the lane doesn't have to be decoded)<br>

><br>

> We think if someone move data bewteen GPR and FPR on the device with NEON,<br>

> he most likely to use some SISD instructions  after or before movement.<br>

> Considering FPU and NEON are different hard cores, mixing FPU instruction<br>

> with NEON instruction may get worse performance.<br>

<br>

</div>That's happened historically (Cortex-A9 springs to mind?) but I don't<br>

believe it's a significant issue on either Cortex-A15 or Swift, let<br>

alone any AArch64 chips that may be coming.<br>

<br>

Quite apart from that, your predicate isn't based on whether NEON or<br>

FP is actually being used, but on whether the CPU supports it, which<br>

is a completely different question.<br>

<div class="im"><br>

>>+def Neon_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT",<br>

>> SDT_Neon_mov_lane>;<br>

>>+def Neon_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT", SDTypeProfile<1,<br>

>> 3,<br>

>>+                           [SDTCisVec<0>, SDTCisSameAs<0, 1>,<br>

>>+                           SDTCisInt<2>, SDTCisVT<3, i32>]>>;<br>

>><br>

>>What's wrong with the existing extractelt and insertelt?<br>

><br>

> extractelt is defined as the output holds the same value type of vector<br>

> element, but smov and umov can extract the element and extend it<br>

> simultaneously.<br>

<br>

</div>Quite possibly; in that case your patterns would match things like<br>

"(and (extractelt ...), 0xff)" and "(sext_inreg (extractelt ...),<br>

i8)". At least, that's what I see in my DAGs for code I've looked at.<br>

<div class="im"><br>

> Also,I try to use vector_extract . But its last parameter is defined as a<br>

> pointer, which is presented as a i64 constant in aarch64.<br>

<br>

</div>But that's how the nodes are created too. The DAG *does* have an i64<br>

type in the lane slot when I view it. If your patterns work with those<br>

nodes, it's because you're lying to TableGen with the<br>

Neon_vector_extract definition (& insert).<br>

<div class="im"><br>

> This will cause<br>

> pattern match fail because all immediate numbers are defined as i32.<br>

<br>

</div>They don't have to be. You can add i64 immediates to your instructions<br>

if it's more convenient. They get stored as int64_t anyway, and all<br>

type information is lost by the time we get to MachineInstrs.<br>

<div class="im"><br>

>>+               let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};<br>

>><br>

>>What's bit 11? Same for later ones. If it's set elsewhere, a comment<br>

>>would be useful. If it's not then setting it is probably essential.<br>

><br>

> Because bit 11 is unspecified, any value is ok.<br>

<br>

</div>Ah, didn't notice that bit. I thought ARM were trying to do away with<br>

such things (Grr!). Oh well. A comment would definitely be helpful<br>

here then.<br>

<div class="im"><br>

>>+    def _16B16 : NeonI_insert<0b1, 0b1,<br>

>>+    [...]<br>

>>+    def _16B8 : NeonI_insert<0b1, 0b1,<br>

>><br>

>>Why are these separate instructions? It seems to be some attempt to<br>

>>model the VPR64/VPR128 distinction, but I don't think it actually<br>

>>gains us anything. It's not like the high part of VPR128 can be used<br>

>>independently really.<br>

><br>

> I am working on removing all VPR64 instrcution format and custom promote all<br>

> 64 bit vector to 128 bit. I will add this modification in next edition.<br>

<br>

</div>Thanks.<br>

<span class="HOEnZb"><font color="#888888"><br>

Tim.<br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div>

</div>