[PATCH] D23155: Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions

Mon Oct 3 11:33:18 PDT 2016

nemanjai added inline comments.

> kbarton wrote in PPCInstPrinter.cpp:474
> This should change to C++-style comments (i.e., //) as per the comment-formatting guidelines: http://llvm.org/docs/CodingStandards.html#comment-formatting

Will do.

> hfinkel wrote in PPCInstPrinter.cpp:479
> We don't need to mention how it used to work, that won't help anyone looking at the current code, and might be confusing.

OK, I'll remove the last sentence.

> kbarton wrote in PPCAsmPrinter.cpp:174
> Reformat comments

Will do.

> hfinkel wrote in PPCAsmPrinter.cpp:179
> Same here; we don't need to mention how it used to work.

OK, will remove the last sentence.

> kbarton wrote in PPCCallingConv.td:71
> This is very minor, but this used to be sorted by integer types (in decending order of vector elements), followed by floats sorted in the same way. To maintain this, v2f64 should be moved to the end after v4f32 unless there is a specific reason it has been put where it is.

I don't think the order implies anything in this list. I'll move v2f64 to the end (after v4f32).

> kbarton wrote in PPCCallingConv.td:122
> Same comment about placement of v2f64.

Yup, will move it.

> kbarton wrote in PPCCallingConv.td:192
> Same comment about v2f64

Yup.

> kbarton wrote in PPCISelLowering.cpp:7079
> Could this condition be combined with the one below?

Yeah, the one below was an afterthought and I guess I just missed adding it to this one. I'll combine them.

> kbarton wrote in PPCISelLowering.cpp:10577
> Is FP really needed? Can you just return on the lines where FP is being assigned?

Yup, I'll rewrite it that way.

> kbarton wrote in PPCISelLowering.cpp:10829
> I don't see ByteWidth used anywhere else, so you can probably do away with it by just copying the logic to assign ByteWidth into this constructor call.

Yeah, I'll combine it.

> hfinkel wrote in PPCInstrInfo.cpp:1058
> We should make this comment more specific, and explain that the difference is that the lanes are stored in a different order.

I'll rewrite it as:
// We need to avoid a situation in which the value from a VRRC register is spilled using an Altivec instruction and reloaded into
// a VSRC register using a VSX instruction. The issue with this is that the VSX load/store instructions swap the doublewords
// in the vector and the Altivec ones don't. The register classes on the spill/reload may be different if the register is defined using
// an Altivec instruction and is then used by a VSX instruction.

> hfinkel wrote in PPCInstrVSX.td:96
> I don't really understand what this is saying and why using the same register would matter.

If you look at the instruction where I use this for an example:

  def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
             (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;

If I were to replace that with

  def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
             (v2i64 (XXPERMDI (LXSIHZX xoaddr:$src), (LXSIHZX xoaddr:$src), 0))>;

The output DAG actually produces code that will perform both loads into different registers (i.e. two separate LXSIHZX instructions) and will do the XXPERMDI on the result registers. I just found using a version of the instruction that has a single input register operand to be an easy way to avoid this.

> hfinkel wrote in PPCInstrVSX.td:1125
> Why don't you set UseVSXReg = 1 in an outer block (essentially over all definitions), and then override by setting UseVSXReg = 0 for those instructions that shouldn't have it. I'm under the impression that this is a smaller set.

I think that we'd have to interrupt this block for definitions that do not inherit from `Instruction` such as `MovesToVSR, VectorExtractions`, etc.
In any case, I think that over time, our use of these blocks has introduced some subtle issues that may or may not ever manifest themselves.
For example, we may have a

  let Predicates = [ListA] in {
    let Predicates = [ListB] in {
    }
  }

where the content of the inner block should really have

  let Predicates = [ListA, ListB] in {
  }

but of course, it does not since the inner assignment overrides the Predicates rather than appending tothem.
I plan to go through the file and clean this up a bit but I'd prefer to do this as a separate patch.

> kbarton wrote in PPCMIPeephole.cpp:198
> Is it possible for MyOpcode to be VSPLTBs also? I don't see this case handled anywhere and it's not clear that this situation could not be symmetric (i.e., MyOpcode == PPC::VSPLTBs and DefOpcode == PPC::VSPLTB).

We don't currently have any patterns that will produce such a combination. Namely, the versions suffixed with "s" are instructions that splat a scalar (actually loaded as a scalar) into the vector register. Without this scalar load, we should not be using these scalar splats. So a VSPLTBs can only be fed by an LXSIBZX (or I suppose also an MTVSR[DW]) but never a VSPLTB.

> kbarton wrote in PPCRegisterInfo.td:286
> Same comment about ordering.

OK, I'll reorder this as well.

Repository:
  rL LLVM

https://reviews.llvm.org/D23155