[PATCH] D23155: Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions

Mon Oct 3 11:55:33 PDT 2016

----- Original Message -----
> From: "Nemanja Ivanovic" <nemanja.i.ibm at gmail.com>
> To: "nemanja i ibm" <nemanja.i.ibm at gmail.com>, hfinkel at anl.gov, amehsan at ca.ibm.com, wschmidt at linux.vnet.ibm.com,
> tjablin at gmail.com, cycheng at multicorewareinc.com
> Cc: qcolombet at apple.com, llvm-commits at lists.llvm.org, echristo at gmail.com
> Sent: Monday, October 3, 2016 1:33:18 PM
> Subject: [PATCH] D23155: Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions
> 
> nemanjai added inline comments.
> 
> 
> > kbarton wrote in PPCInstPrinter.cpp:474
> > This should change to C++-style comments (i.e., //) as per the
> > comment-formatting guidelines:
> > http://llvm.org/docs/CodingStandards.html#comment-formatting
> 
> Will do.
> 
> > hfinkel wrote in PPCInstPrinter.cpp:479
> > We don't need to mention how it used to work, that won't help
> > anyone looking at the current code, and might be confusing.
> 
> OK, I'll remove the last sentence.
> 
> > kbarton wrote in PPCAsmPrinter.cpp:174
> > Reformat comments
> 
> Will do.
> 
> > hfinkel wrote in PPCAsmPrinter.cpp:179
> > Same here; we don't need to mention how it used to work.
> 
> OK, will remove the last sentence.
> 
> > kbarton wrote in PPCCallingConv.td:71
> > This is very minor, but this used to be sorted by integer types (in
> > decending order of vector elements), followed by floats sorted in
> > the same way. To maintain this, v2f64 should be moved to the end
> > after v4f32 unless there is a specific reason it has been put
> > where it is.
> 
> I don't think the order implies anything in this list. I'll move
> v2f64 to the end (after v4f32).
> 
> > kbarton wrote in PPCCallingConv.td:122
> > Same comment about placement of v2f64.
> 
> Yup, will move it.
> 
> > kbarton wrote in PPCCallingConv.td:192
> > Same comment about v2f64
> 
> Yup.
> 
> > kbarton wrote in PPCISelLowering.cpp:7079
> > Could this condition be combined with the one below?
> 
> Yeah, the one below was an afterthought and I guess I just missed
> adding it to this one. I'll combine them.
> 
> > kbarton wrote in PPCISelLowering.cpp:10577
> > Is FP really needed? Can you just return on the lines where FP is
> > being assigned?
> 
> Yup, I'll rewrite it that way.
> 
> > kbarton wrote in PPCISelLowering.cpp:10829
> > I don't see ByteWidth used anywhere else, so you can probably do
> > away with it by just copying the logic to assign ByteWidth into
> > this constructor call.
> 
> Yeah, I'll combine it.
> 
> > hfinkel wrote in PPCInstrInfo.cpp:1058
> > We should make this comment more specific, and explain that the
> > difference is that the lanes are stored in a different order.
> 
> I'll rewrite it as:
> // We need to avoid a situation in which the value from a VRRC
> register is spilled using an Altivec instruction and reloaded into
> // a VSRC register using a VSX instruction. The issue with this is
> that the VSX load/store instructions swap the doublewords
> // in the vector and the Altivec ones don't. The register classes on
> the spill/reload may be different if the register is defined using
> // an Altivec instruction and is then used by a VSX instruction.
> 
> > hfinkel wrote in PPCInstrVSX.td:96
> > I don't really understand what this is saying and why using the
> > same register would matter.
> 
> If you look at the instruction where I use this for an example:
> 
>   def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
>              (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;
> 
> If I were to replace that with
> 
>   def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
>              (v2i64 (XXPERMDI (LXSIHZX xoaddr:$src), (LXSIHZX
>              xoaddr:$src), 0))>;
> 
> The output DAG actually produces code that will perform both loads
> into different registers (i.e. two separate LXSIHZX instructions)
> and will do the XXPERMDI on the result registers. I just found using
> a version of the instruction that has a single input register
> operand to be an easy way to avoid this.

Okay; this makes sense. Please update the comment to explain this.

 -Hal

> 
> > hfinkel wrote in PPCInstrVSX.td:1125
> > Why don't you set UseVSXReg = 1 in an outer block (essentially over
> > all definitions), and then override by setting UseVSXReg = 0 for
> > those instructions that shouldn't have it. I'm under the
> > impression that this is a smaller set.
> 
> I think that we'd have to interrupt this block for definitions that
> do not inherit from `Instruction` such as `MovesToVSR,
> VectorExtractions`, etc.
> In any case, I think that over time, our use of these blocks has
> introduced some subtle issues that may or may not ever manifest
> themselves.
> For example, we may have a
> 
>   let Predicates = [ListA] in {
>     let Predicates = [ListB] in {
>     }
>   }
> 
> where the content of the inner block should really have
> 
>   let Predicates = [ListA, ListB] in {
>   }
> 
> but of course, it does not since the inner assignment overrides the
> Predicates rather than appending tothem.
> I plan to go through the file and clean this up a bit but I'd prefer
> to do this as a separate patch.
> 
> > kbarton wrote in PPCMIPeephole.cpp:198
> > Is it possible for MyOpcode to be VSPLTBs also? I don't see this
> > case handled anywhere and it's not clear that this situation could
> > not be symmetric (i.e., MyOpcode == PPC::VSPLTBs and DefOpcode ==
> > PPC::VSPLTB).
> 
> We don't currently have any patterns that will produce such a
> combination. Namely, the versions suffixed with "s" are instructions
> that splat a scalar (actually loaded as a scalar) into the vector
> register. Without this scalar load, we should not be using these
> scalar splats. So a VSPLTBs can only be fed by an LXSIBZX (or I
> suppose also an MTVSR[DW]) but never a VSPLTB.
> 
> > kbarton wrote in PPCRegisterInfo.td:286
> > Same comment about ordering.
> 
> OK, I'll reorder this as well.
> 
> Repository:
>   rL LLVM
> 
> https://reviews.llvm.org/D23155
> 
> 
> 
> 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory