[PATCH] D10991: [LNT] Reduce I/O execution time for Polybench

Renato Golin renato.golin at linaro.org
Tue Jul 7 10:00:08 PDT 2015


================
Comment at: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.c:46
@@ +45,3 @@
+    for (j = 0; j < n; j++)
+      print_element(A[i][j], j*8, printmat);
+    fputs(printmat, stderr);
----------------
kristof.beyls wrote:
> Do I understand correctly that this code basically only prints out the values of the last row of the entire matrix (the offset is j*8)? I think we'd want whatever the hash function implementation we end up with to still take all elements as input, to improve the chance of detecting a mis-compilation.
> I think the hash function can be really simple - no need for anything complex or secure; but we probably should feed in all matrix elements into the hash function. Maybe the straightforward solution here is to just print out the sum of all elements in a row, rather than each element in the row?
> 
> Tobias may know these tests better: are we expecting bit-reproducible results for these tests? I'm guessing so, unless DATA_PRINTF_MODIFIER in the original code was chosen so that it prints out with less precision?
> 
> 
No.

print_element receives a float value (4 bytes) and expands it into 8 nibbles, each written as one char (8 bytes in total). So each A[i][j] is written to printmat[j*8] through printmat[j*8+7]. Here j*8 is only the starting position of a run of 8 chars, not the *only* position printed.
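
To make the layout concrete, here is a minimal sketch of the same idea. The names encode_element and encode_row are hypothetical and not part of the patch, and a 4-byte float is assumed. Note that in C `>>` binds tighter than `&`, so the mask is parenthesized before the shift.

```c
#include <string.h>

/* Hypothetical stand-in for print_element: writes 8 chars to
   out[pos] .. out[pos+7]. Assumes sizeof(float) == 4. */
static void encode_element(float el, int pos, char *out)
{
  unsigned char bytes[4];
  int k;

  memcpy(bytes, &el, 4);  /* bytes are taken in host order */
  for (k = 0; k < 4; k++) {
    /* each nibble becomes one char in the range '0' .. '?' */
    out[pos + 2*k]     = (char)(((bytes[k] & 0xF0) >> 4) + '0');
    out[pos + 2*k + 1] = (char)((bytes[k] & 0x0F) + '0');
  }
}

/* Encodes a row of n floats; out must have room for n*8 + 1 chars. */
static void encode_row(const float *row, int n, char *out)
{
  int j;

  for (j = 0; j < n; j++)
    encode_element(row[j], j * 8, out);  /* element j starts at j*8 */
  out[n * 8] = '\0';
}
```

Each element fills its own 8-char slot, so a row of n floats produces a string of n*8 chars that can be printed with a single fputs.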

================
Comment at: SingleSource/Benchmarks/Polybench/utilities/polybench.h:609-630
@@ -608,3 +608,24 @@
 
-
+/* To avoid calling printf M*M times (and make it run
+   for a long time), we split the output into an encoded string,
+   and print it as a simple char pointer, M times.*/
+static inline
+void print_element(float el, int pos, char *out)
+{
+  union {
+    float datum;
+    char bytes[4];
+  } block;
+
+  block.datum = el;
+  /* each nibble as a char, within the printable range */
+  *(out+pos)   = ((block.bytes[0]&0xF0)>>4)+'0';
+  *(out+pos+1) = (block.bytes[0]&0x0F)     +'0';
+  *(out+pos+2) = ((block.bytes[1]&0xF0)>>4)+'0';
+  *(out+pos+3) = (block.bytes[1]&0x0F)     +'0';
+  *(out+pos+4) = ((block.bytes[2]&0xF0)>>4)+'0';
+  *(out+pos+5) = (block.bytes[2]&0x0F)     +'0';
+  *(out+pos+6) = ((block.bytes[3]&0xF0)>>4)+'0';
+  *(out+pos+7) = (block.bytes[3]&0x0F)     +'0';
+}
 
----------------
kristof.beyls wrote:
> I'm not sure, but it looks like this may give different answers on big versus little endian machines (which is something the previous implementation didn't have)?
> Maybe just printing out the sum of each row of a matrix (i.e. 4000 floats being printed) instead of the entire matrix (4000*4000 floats being printed) already reduces the IO overhead to be in the noise? If not, a couple of rows could be summed up together to reduce the amount of IO further?
Yes, it does, and that's OK. We already have many tests whose output differs between big- and little-endian targets, and we deal with it by having a separate file called *.reference_outputs.big-endian. I can't generate one myself, as I don't have a big-endian box. If you do, and want to do it now, feel free to send me the reference outputs. If not, we can wait until someone who runs the tests on a big-endian machine sends them to us. That's how we've done it in the past.
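
For reference, the endianness dependence comes from reading the float's bytes in memory order. A small sketch, assuming a 4-byte IEEE 754 float (host_is_little_endian and float_bytes are hypothetical helper names, not part of the patch):

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
  uint16_t probe = 1;
  unsigned char first;

  memcpy(&first, &probe, 1);  /* lowest-addressed byte of the probe */
  return first == 1;
}

/* Copies the in-memory bytes of a float, in host order.
   Assumes sizeof(float) == 4. */
static void float_bytes(float el, unsigned char out[4])
{
  memcpy(out, &el, 4);
}
```

Under that assumption 1.0f is 0x3F800000, so a little-endian host stores 00 00 80 3F and a big-endian host stores 3F 80 00 00. The encoded string differs accordingly, which is why a separate big-endian reference output is needed.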

A sum of all the elements would *also* not be perfect, as two changes that cancel each other out would go unnoticed.
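
As a rough illustration of that weakness (row_sum and rows_identical are hypothetical helpers, not code from the patch), a row sum cannot tell apart two rows whose elements are merely swapped, while a bitwise comparison of the elements can:

```c
#include <string.h>

/* Sum of a row, as a per-row checksum would compute it. */
static float row_sum(const float *row, int n)
{
  float s = 0.0f;
  int j;

  for (j = 0; j < n; j++)
    s += row[j];
  return s;
}

/* Returns 1 if both rows are bitwise identical, which is what
   comparing the per-element encoded output amounts to. */
static int rows_identical(const float *a, const float *b, int n)
{
  return memcmp(a, b, (size_t)n * sizeof(float)) == 0;
}
```

For example, {1, 2, 3, 4} and {1, 3, 2, 4} both sum to 10, yet the rows are not identical, so only the per-element output would flag the difference.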

I/O is not an issue any more, and the differences are indistinguishable from noise.


Repository:
  rL LLVM

http://reviews.llvm.org/D10991






