[PATCH] D56175: [PowerPC] Exploit store instructions that store a single vector element

Jinsong Ji via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 7 14:50:30 PST 2019


jsji added a comment.

Thanks for updating!



================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3343
+      def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddr:$src),
+                (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
+                             sub_64), xaddr:$src)>;
----------------
These are identical to little endian patterns above?


================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3881
+              (STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
+    foreach Idx = [ [0,3], [2,1], [3,2] ] in {
+      def : Pat<(store (i32 (extractelt v4i32:$A, !head(Idx))), xoaddr:$src),
----------------
Can we add some comments about the "magic" indexes here. 


================
Comment at: test/CodeGen/PowerPC/extract-and-store.ll:118
+; CHECK-NEXT:    addi r3, r7, 12
+; CHECK-NEXT:    stfiwx f0, 0, r3
 ; CHECK-NEXT:    blr
----------------
nemanjai wrote:
> One of the unanswered comments from the original patch was along the lines of:
> "The `stfs` instruction performs a conversion from 8-byte single precision (that PPC uses for single precision representation in registers) to 4-byte single precision (the in-memory single precision representation). The updated code no longer involves such a conversion, is that semantically correct?"
> 
> The short answer is yes. The reason we don't need this conversion is that single precision vector elements are represented the same way in registers and memory. If you inspect the original code sequence carefully, you'll see that it does the following
> - `xxsldwi` to line up the element into the correct location in the register
> - `xscvspdpn` to convert vector single precision to scalar single precision
> - `stfs` to implicitly convert the value back and store it
> The new sequence just skips all conversion and stores the single-precision vector element as a 4-byte single-precision value.
Thanks for checking! 
yes, you are right, the new sequence should work the same way at the old sequences for normal floating point values.

But I am more concern with abnormal values. 

Since conversions will change abnormal values, so I think we may get different results for out of range values eg: NaN, Inf, Denormals if we remove those conversions.

Of course we may argue that the new behavior without conversion maybe be better? 

So I still think **the semantic is actually different**, but it is a good idea to remove the conversions if we can accept the risk of changing some floating point application's behavior.



================
Comment at: test/CodeGen/PowerPC/extract-and-store.ll:118
+; CHECK-NEXT:    addi r3, r7, 12
+; CHECK-NEXT:    stfiwx f0, 0, r3
 ; CHECK-NEXT:    blr
----------------
jsji wrote:
> nemanjai wrote:
> > One of the unanswered comments from the original patch was along the lines of:
> > "The `stfs` instruction performs a conversion from 8-byte single precision (that PPC uses for single precision representation in registers) to 4-byte single precision (the in-memory single precision representation). The updated code no longer involves such a conversion, is that semantically correct?"
> > 
> > The short answer is yes. The reason we don't need this conversion is that single precision vector elements are represented the same way in registers and memory. If you inspect the original code sequence carefully, you'll see that it does the following
> > - `xxsldwi` to line up the element into the correct location in the register
> > - `xscvspdpn` to convert vector single precision to scalar single precision
> > - `stfs` to implicitly convert the value back and store it
> > The new sequence just skips all conversion and stores the single-precision vector element as a 4-byte single-precision value.
> Thanks for checking! 
> yes, you are right, the new sequence should work the same way at the old sequences for normal floating point values.
> 
> But I am more concern with abnormal values. 
> 
> Since conversions will change abnormal values, so I think we may get different results for out of range values eg: NaN, Inf, Denormals if we remove those conversions.
> 
> Of course we may argue that the new behavior without conversion maybe be better? 
> 
> So I still think **the semantic is actually different**, but it is a good idea to remove the conversions if we can accept the risk of changing some floating point application's behavior.
> 
Also, looks like some typos in above comments, so it looks confusing to me. 

My understandings from ISA are:

1. Power ISA defined two data formats for floating point:
  * 32-bit (4-byte) single format: 
  * 64-bit (8-byte) double format:

2. Each **FPR **contains 64 bits that support the floating-point **double format.** Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.

3. Vector Floating-Point (VMX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the floating-point **single data **format.

4. Vector-Scalar Floating-Point (VSX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the** floating-point single data format (4 bytes) or floating-point double format (8 bytes) depending on the length.**

5. Single Precision Floating-Point **takes operands from the FPRs in double format**, performs the operation, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. **The result is then converted to double format and placed into an FPR**. 

6. `xscvspdpn` is used to convert the element from single data format to double format, without setting FPSCR etc. 

7. `stfs` will converting x in FPR from floating-point double format to floating-point single format, then store.

8. `stfiwx` will store the 32-63 bits of FPR to memory directly, without conversion.




Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56175/new/

https://reviews.llvm.org/D56175





More information about the llvm-commits mailing list