[PATCH] D56175: [PowerPC] Exploit store instructions that store a single vector element

Fri Jan 11 06:58:45 PST 2019

nemanjai marked 3 inline comments as done.
nemanjai added inline comments.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3343
+      def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddr:$src),
+                (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
+                             sub_64), xaddr:$src)>;
----------------
jsji wrote:
> These are identical to little endian patterns above?
Oops. The extract indices were supposed to be reversed. Good catch. Thank you.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3881
+              (STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
+    foreach Idx = [ [0,3], [2,1], [3,2] ] in {
+      def : Pat<(store (i32 (extractelt v4i32:$A, !head(Idx))), xoaddr:$src),
----------------
jsji wrote:
> Can we add some comments about the "magic" indexes here. 
Sure, will do.

================
Comment at: test/CodeGen/PowerPC/extract-and-store.ll:118
+; CHECK-NEXT:    addi r3, r7, 12
+; CHECK-NEXT:    stfiwx f0, 0, r3
 ; CHECK-NEXT:    blr
----------------
jsji wrote:
> jsji wrote:
> > nemanjai wrote:
> > > One of the unanswered comments from the original patch was along the lines of:
> > > "The `stfs` instruction performs a conversion from 8-byte single precision (that PPC uses for single precision representation in registers) to 4-byte single precision (the in-memory single precision representation). The updated code no longer involves such a conversion, is that semantically correct?"
> > > 
> > > The short answer is yes. The reason we don't need this conversion is that single precision vector elements are represented the same way in registers and memory. If you inspect the original code sequence carefully, you'll see that it does the following
> > > - `xxsldwi` to line up the element into the correct location in the register
> > > - `xscvspdpn` to convert vector single precision to scalar single precision
> > > - `stfs` to implicitly convert the value back and store it
> > > The new sequence just skips all conversion and stores the single-precision vector element as a 4-byte single-precision value.
> > Thanks for checking! 
> > yes, you are right, the new sequence should work the same way at the old sequences for normal floating point values.
> > 
> > But I am more concern with abnormal values. 
> > 
> > Since conversions will change abnormal values, so I think we may get different results for out of range values eg: NaN, Inf, Denormals if we remove those conversions.
> > 
> > Of course we may argue that the new behavior without conversion maybe be better? 
> > 
> > So I still think **the semantic is actually different**, but it is a good idea to remove the conversions if we can accept the risk of changing some floating point application's behavior.
> > 
> Also, looks like some typos in above comments, so it looks confusing to me. 
> 
> My understandings from ISA are:
> 
> 1. Power ISA defined two data formats for floating point:
>   * 32-bit (4-byte) single format: 
>   * 64-bit (8-byte) double format:
> 
> 2. Each **FPR **contains 64 bits that support the floating-point **double format.** Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.
> 
> 3. Vector Floating-Point (VMX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the floating-point **single data **format.
> 
> 4. Vector-Scalar Floating-Point (VSX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the** floating-point single data format (4 bytes) or floating-point double format (8 bytes) depending on the length.**
> 
> 5. Single Precision Floating-Point **takes operands from the FPRs in double format**, performs the operation, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. **The result is then converted to double format and placed into an FPR**. 
> 
> 6. `xscvspdpn` is used to convert the element from single data format to double format, without setting FPSCR etc. 
> 
> 7. `stfs` will converting x in FPR from floating-point double format to floating-point single format, then store.
> 
> 8. `stfiwx` will store the 32-63 bits of FPR to memory directly, without conversion.
> 
> 
I'm not really sure what typos you're referring to, but we can think of it as there being two different single precision representations (i.e. bit patterns). The "in-scalar-register" representation is 64-bits wide and is equivalent to the double precision bit pattern for the same value. The in-memory representation is 32-bits wide and conforms to IEEE specifications for binary 32-bit floating point. Vectors of single precision values use the latter for each element.

Each instruction that operates on scalar floating point registers and "produces a single precision result" actually produces a double precision result that has equivalent precision to the single precision value that would be produced by performing that operation on its inputs.

Now for the issue at hand (i.e. convert-to-extract-then-convert-to-store vs. store-with-no-conversion):
- The current implementation does the conversions
- The implementation in this patch does not do the conversions, just store
- The first conversion (`xscvspdpn`) will perform normalization, NaNs and INFs just produce double precision versions of the same
- The second conversion (`stfs`) will perform denormalization (NaNs, INFs and zero remain)
- Now the memory contains an unmodified or denormalized version of the vector element
-- If it's unmodified (the conversion simply extracts the correct number of bits from exponent/mantissa), nothing to discuss here
-- If it's denormalized, it will produce a 32-bit bit pattern that is equivalent to the original denormal value in the vector

So unless I am missing something subtle in the conversion that `stfs` does, I believe that this change does not modify the semantics.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56175/new/

https://reviews.llvm.org/D56175