[PATCH] D56175: [PowerPC] Exploit store instructions that store a single vector element

Mon Jan 14 13:50:46 PST 2019

nemanjai marked 2 inline comments as done.
nemanjai added inline comments.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3343
+      def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddr:$src),
+                (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
+                             sub_64), xaddr:$src)>;
----------------
jsji wrote:
> nemanjai wrote:
> > jsji wrote:
> > > These are identical to little endian patterns above?
> > Oops. The extract indices were supposed to be reversed. Good catch. Thank you.
> Thanks for updating! One question to me is why this not exposed by testcases below? Are we missing some coverage there?
I'll add it to the respective test.

================
Comment at: test/CodeGen/PowerPC/extract-and-store.ll:118
+; CHECK-NEXT:    addi r3, r7, 12
+; CHECK-NEXT:    stfiwx f0, 0, r3
 ; CHECK-NEXT:    blr
----------------
jsji wrote:
> nemanjai wrote:
> > jsji wrote:
> > > jsji wrote:
> > > > nemanjai wrote:
> > > > > One of the unanswered comments from the original patch was along the lines of:
> > > > > "The `stfs` instruction performs a conversion from 8-byte single precision (that PPC uses for single precision representation in registers) to 4-byte single precision (the in-memory single precision representation). The updated code no longer involves such a conversion, is that semantically correct?"
> > > > > 
> > > > > The short answer is yes. The reason we don't need this conversion is that single precision vector elements are represented the same way in registers and memory. If you inspect the original code sequence carefully, you'll see that it does the following
> > > > > - `xxsldwi` to line up the element into the correct location in the register
> > > > > - `xscvspdpn` to convert vector single precision to scalar single precision
> > > > > - `stfs` to implicitly convert the value back and store it
> > > > > The new sequence just skips all conversion and stores the single-precision vector element as a 4-byte single-precision value.
> > > > Thanks for checking! 
> > > > yes, you are right, the new sequence should work the same way at the old sequences for normal floating point values.
> > > > 
> > > > But I am more concern with abnormal values. 
> > > > 
> > > > Since conversions will change abnormal values, so I think we may get different results for out of range values eg: NaN, Inf, Denormals if we remove those conversions.
> > > > 
> > > > Of course we may argue that the new behavior without conversion maybe be better? 
> > > > 
> > > > So I still think **the semantic is actually different**, but it is a good idea to remove the conversions if we can accept the risk of changing some floating point application's behavior.
> > > > 
> > > Also, looks like some typos in above comments, so it looks confusing to me. 
> > > 
> > > My understandings from ISA are:
> > > 
> > > 1. Power ISA defined two data formats for floating point:
> > >   * 32-bit (4-byte) single format: 
> > >   * 64-bit (8-byte) double format:
> > > 
> > > 2. Each **FPR **contains 64 bits that support the floating-point **double format.** Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.
> > > 
> > > 3. Vector Floating-Point (VMX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the floating-point **single data **format.
> > > 
> > > 4. Vector-Scalar Floating-Point (VSX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the** floating-point single data format (4 bytes) or floating-point double format (8 bytes) depending on the length.**
> > > 
> > > 5. Single Precision Floating-Point **takes operands from the FPRs in double format**, performs the operation, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. **The result is then converted to double format and placed into an FPR**. 
> > > 
> > > 6. `xscvspdpn` is used to convert the element from single data format to double format, without setting FPSCR etc. 
> > > 
> > > 7. `stfs` will converting x in FPR from floating-point double format to floating-point single format, then store.
> > > 
> > > 8. `stfiwx` will store the 32-63 bits of FPR to memory directly, without conversion.
> > > 
> > > 
> > I'm not really sure what typos you're referring to, but we can think of it as there being two different single precision representations (i.e. bit patterns). The "in-scalar-register" representation is 64-bits wide and is equivalent to the double precision bit pattern for the same value. The in-memory representation is 32-bits wide and conforms to IEEE specifications for binary 32-bit floating point. Vectors of single precision values use the latter for each element.
> > 
> > Each instruction that operates on scalar floating point registers and "produces a single precision result" actually produces a double precision result that has equivalent precision to the single precision value that would be produced by performing that operation on its inputs.
> > 
> > Now for the issue at hand (i.e. convert-to-extract-then-convert-to-store vs. store-with-no-conversion):
> > - The current implementation does the conversions
> > - The implementation in this patch does not do the conversions, just store
> > - The first conversion (`xscvspdpn`) will perform normalization, NaNs and INFs just produce double precision versions of the same
> > - The second conversion (`stfs`) will perform denormalization (NaNs, INFs and zero remain)
> > - Now the memory contains an unmodified or denormalized version of the vector element
> > -- If it's unmodified (the conversion simply extracts the correct number of bits from exponent/mantissa), nothing to discuss here
> > -- If it's denormalized, it will produce a 32-bit bit pattern that is equivalent to the original denormal value in the vector
> > 
> > So unless I am missing something subtle in the conversion that `stfs` does, I believe that this change does not modify the semantics.
> 
> 
> The typos I was referring is the "8 byte single precision", as I don't think there is any "8 byte single precision" ...
> There are only two data format: 4 byte single format , 8 byte double format.
> 
> Anyway, if this can help for your "understanding", it is fine to "think of it as there being two different single precision".
> 
> I think our key divergency here is whether convert-convert will change semantic for NaN/INFs.
> 
> From my point of view, conversions might change the values.
> 
> eg: ConvertSPtoDP_NS used by xscvspdpn 
> 
> ```
> if (x.bit[1:8] == 255) then do 
>      exponent <- 2047
> end
> else if (x.bit[1:8] == 0) && (fraction == 0) the do 
>      exponent <- 0
> end  
> ```
> 
> The exponent might be overridden.
> 
> So `convert-to-extract-then-convert-to-store` vs. `store-with-no-conversion` might get different results.
> 
> But I also agree that both results should be valid.
> 
> Just that we may want to be aware or accept the risk of changing some floating point application's behavior.
But that's precisely what I'm saying is *not* the case. Setting the exponent to 2047 is precisely the "keeping NaNs NaNs and INFs INFs" that I was referring to.
The `if` in the RTL you have pasted accomplishes that very thing. The `else if` just says that a zero stays a zero.

A NaN has the exponent set to all 1's. All 1's in single precision is `255` and in double precision, it's `2047`. INF has the same property with the additional property that the fraction bits are all zeros.
https://en.wikipedia.org/wiki/IEEE_754-1985#NaN

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56175/new/

https://reviews.llvm.org/D56175