[PATCH] D26743: Expandload and Compressing store - documentation update

Mon Nov 28 00:27:48 PST 2016

Ayal added inline comments.

================
Comment at: docs/LangRef.rst:11866
+"""""""
+This is an overloaded intrinsic. The loaded data is several values of integer, floating point or pointer data type are loaded from consecutive memory address and stored into the elements of a vector according to the mask.
+
----------------
"The loaded data is several values of integer," >> "Several values of integer,"

"memory address" >> "memory addresses"

================
Comment at: docs/LangRef.rst:11882
+
+The first operand is the base pointer for the load. The second operand, mask, is a vector of boolean values with the same number of elements as the return type. The third is a pass-through value that is used to fill the masked-off lanes of the result. The return type and the type of the '``passthru``' operand are the same vector types.
+
----------------
Explain the type of the first operand.

================
Comment at: docs/LangRef.rst:11887
+
+The '``llvm.masked.expandload``' intrinsic is designed for sequential reading of multiple scalar values from memory into a sparse vector in a single IR operation. It is useful for targets that support vector expanding loads and allows vectorizing loop with cross-iteration dependency like in the following example:
+
----------------
If the terms "dense" or "sparse" are to be used, they should be defined to avoid confusion: a sparse representation is often the one that is condensed.
Alternatively:
"designed for sequential reading of multiple scalar values from memory into a sparse vector in a single IR operation" >>
"designed for reading multiple scalar values from adjacent memory addresses into possibly non-adjacent vector lanes in a single IR operation"

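To make the suggested wording concrete, here is a rough scalar model of the expanding load, written in Python only for illustration. The function name `expandload` and the list-based "memory" are assumptions of this sketch, not part of the patch; it assumes enabled lanes are filled from lower to higher.

```python
def expandload(memory, base, mask, passthru):
    """Scalar model: read elements from adjacent memory locations,
    starting at memory[base], into the lanes whose mask element is true.
    Masked-off lanes keep the corresponding passthru value."""
    result = list(passthru)
    next_addr = base  # advances only when a lane is enabled
    for lane, enabled in enumerate(mask):
        if enabled:
            result[lane] = memory[next_addr]
            next_addr += 1
    return result
```

With memory `[10, 20, 30, 40]`, base 0 and mask `[False, True, False, True]`, only the first two memory elements are read, and they land in lanes 1 and 3.
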
================
Comment at: docs/LangRef.rst:11902
+    ; Load several elements from array B and expand them in a vector.
+    ; The number of loaded elements is equal to the number of 'true' elements in the mask.
+    %Tmp = call <8 x double> @llvm.masked.expandload.v8f64(double* %Bptr, <8 x i1> %mask, <8 x double> undef)
----------------
For consistency, use "'1' bits" or "'true' elements", but not both.

================
Comment at: docs/LangRef.rst:11903
+    ; The number of loaded elements is equal to the number of 'true' elements in the mask.
+    %Tmp = call <8 x double> @llvm.masked.expandload.v8f64(double* %Bptr, <8 x i1> %mask, <8 x double> undef)
+    ; Store the result in A
----------------
%Bptr should have type <8 x double>*, right?

================
Comment at: docs/LangRef.rst:11918
+"""""""
+This is an overloaded intrinsic. The stored data is a number of scalar values of any integer, floating point or pointer data type picked up from an input vector and stored as a contiguous vector in memory. The mask defines active elements from the input vector that should be stored.
+
----------------
"The stored data is a number of scalar values of any integer, floating point or pointer data type picked up from an input vector and stored as a contiguous vector in memory" >>
"A number of scalar values of integer, floating point or pointer data type are collected from an input vector and stored into adjacent memory addresses"

"The mask defines active elements from the input vector that should be stored" >>
"A mask defines which elements to collect from the vector"

================
Comment at: docs/LangRef.rst:11922-11923
+
+      declare void @llvm.masked.compressstore.v8i32  (<8  x i32>   <value>, i32*   <ptr>, <8  x i1> <mask>)
+      declare void @llvm.masked.compressstore.v16f32 (<16 x float> <value>, float* <ptr>, <16 x i1> <mask>)
+
----------------
ptr should have pointer-to-vector types.

================
Comment at: docs/LangRef.rst:11928
+
+Selects elements from input vector '``value``' according to the '``mask``'. Writes all selected elements from lower to higher sequentially to memory '``ptr``' as one contiguous vector. The mask holds a bit for each vector lane, and is used to select elements to be stored. The number of elements to be stored is equal to number of active bits in the mask.
+
----------------
"Writes all selected elements from lower to higher sequentially to memory '``ptr``' as one contiguous vector." >>
"All selected elements are written into adjacent memory addresses starting at address '``ptr``', from lower to higher."

"equal to number" >> "equal to the number"

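For reference, the behavior this paragraph describes can be sketched as a scalar model, again in Python for illustration only. The name `compressstore`, the list-based "memory" and the returned element count are assumptions of the sketch, not part of the patch.

```python
def compressstore(memory, base, value, mask):
    """Scalar model: collect the elements of `value` whose mask element
    is true and write them, from lower to higher lane, into adjacent
    memory locations starting at memory[base].
    Returns the number of elements written."""
    next_addr = base  # advances only when a lane is selected
    for lane, enabled in enumerate(mask):
        if enabled:
            memory[next_addr] = value[lane]
            next_addr += 1
    return next_addr - base
```

The number of elements written equals the number of true elements in the mask, and memory beyond them is left untouched.
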
================
Comment at: docs/LangRef.rst:11933
+
+The first operand is the vector value, which elements to be picked up and written to memory. The second operand is the base pointer for the store, it has the same underlying type as the element of the vector value operand. The third operand is the mask, a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
+
----------------
"The first operand is the vector value, which elements to be picked up and written to memory." >>
"The first operand is the input vector, from which elements are collected and written to memory."

"vector value operand." >> "input vector operand."

"The types of the mask and the value operand" >> "The mask and the input vector"

================
Comment at: docs/LangRef.rst:11939
+
+The '``llvm.masked.compressstore``' intrinsic is designed for data compressing. It allows to pick up single elements from a vector and store them contiguously in memory in one IR operation. It is useful for targets that support compressing store operation and allows vectorizing loop with a cross-iteration dependency like in the following example:
+
----------------
"data compressing" >> "compressing data in memory"

"to pick up single elements" >> "to collect elements from possibly non-adjacent lanes of a vector"

"store operation" >> "store operations"

"vectorizing loop" >> "vectorizing loops"

"a cross-iteration dependency" >> "cross-iteration dependences"

================
Comment at: docs/LangRef.rst:11943
+
+    // In this loop we load elements from A and dense them into B
+    double *A, B; int *C;
----------------
"dense them" >> "store them consecutively"

================
Comment at: docs/LangRef.rst:11955
+    %Tmp = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* %Aptr, i32 8, <8 x i1> %mask, <8 x double> undef)
+    ; Store all selected elements densely in array B
+    call <void> @llvm.masked.compressstore.v8f64(<8 x double> %Tmp, double* %Bptr, <8 x i1> %mask)
----------------
"densely" >> "consecutively"

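As a sanity check on the example, the loop and its vectorized form can be compared with a small scalar model. This is a sketch under assumptions: the loop body (`if C[i] != 0` then append `A[i]` to `B`) is inferred from the comment in the example, the vector width of 8 matches the `<8 x double>` types, and both function names are hypothetical.

```python
VF = 8  # vector width, matching <8 x double> in the example

def scalar_compress(A, C):
    """The scalar loop: store selected elements of A consecutively in B."""
    B = []
    for i in range(len(A)):
        if C[i] != 0:
            B.append(A[i])
    return B

def chunked_compress(A, C, vf=VF):
    """Model of the vectorized loop: per chunk, build a mask from C,
    load the lanes of A, and store the selected lanes consecutively."""
    B = []
    for start in range(0, len(A), vf):
        lanes = A[start:start + vf]
        mask = [c != 0 for c in C[start:start + vf]]
        # Compressing store: selected lanes, lower to higher, appended to B.
        B.extend(x for x, m in zip(lanes, mask) if m)
    return B
```

Both functions produce the same contents of B, which is the cross-iteration dependence on the output index that the compressing store hides.
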
Repository:
  rL LLVM

https://reviews.llvm.org/D26743