[PATCH] D10991: [LNT] Reduce I/O execution time for Polybench

Tue Jul 7 12:14:31 PDT 2015

================
Comment at: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.c:46
@@ +45,3 @@
+    for (j = 0; j < n; j++)
+      print_element(A[i][j], j*8, printmat);
+    fputs(printmat, stderr);
----------------
kristof.beyls wrote:
> rengolin wrote:
> > kristof.beyls wrote:
> > > rengolin wrote:
> > > > kristof.beyls wrote:
> > > > > Do I understand correctly that this code basically only prints out the values of the last row of the entire matrix (the offset is j*8)? I think we'd want whatever hash function implementation we end up with to still take all elements as input, to improve the chance of detecting a mis-compilation.
> > > > > I think the hash function can be really simple - no need for anything complex or secure; but we probably should feed all matrix elements into the hash function. Maybe the straightforward solution here is to just print out the sum of all elements in a row, rather than each element in the row?
> > > > > 
> > > > > Tobias may know these tests better: are we expecting bit-reproducible results for these tests? I'm guessing so, unless DATA_PRINTF_MODIFIER in the original code was chosen so that it prints out with less precision?
> > > > > 
> > > > > 
> > > > No.
> > > > 
> > > > print_element receives a float value (4 bytes) and expands it into 8 nibbles (8 bytes of output). So, on every iteration, A[i][j] is printed starting at printmat[j*8]. Here, j*8 is only the starting position of a run of 8 characters, not the *only* position printed.
> > > Yes, I got that, but given that this is a 2-dimensional matrix, with i indicating the row and j indicating the column, only using j to index the printed out result means that every iteration of the i-loop overwrites the results in printmat written on the previous iteration, right? The malloc(n*8) also indicates there is only room to print a single row, not the entire matrix. Or maybe I'm still missing something?
> > That's why the fputs below is inside the i loop. I'm printing one row at a time. This also saves a lot of memory and avoids thrashing the allocators, helps caching, etc.
> > 
> > Since the runtime now is indistinguishable from when *not* printing anything, I think it's a good trade-off.
> 
> D'oh - I missed that.
> Cool, so we're still producing roughly the same amount of output - but way more efficiently.
> 
> Provided these tests were already checking for bit-accurate results (I'm not sure - it probably depends on the DATA_PRINTF_MODIFIER), this looks good to me.
> Cool, so we're still producing roughly the same amount of output - but way more efficiently.

Precisely. :)

> Provided these tests were already checking for bit-accurate results (I'm not sure - it probably depends on the DATA_PRINTF_MODIFIER), this looks good to me.

They weren't: the modifier was mostly "%0.2f", and that's why one of the tests (fdtd-apml) didn't work with the new technique. But the gain was too small to justify any work towards that goal.

However, most of the others worked out of the box on ARM, AArch64 and x86_64. I think having stricter checking is OK, as long as we understand that this was not a requirement. Still, I think it's a good thing.

I'll add a comment before print_element() to that effect, so if people see it failing, we can revert the ones that weren't exact.

I can't see how an alternative would work without either using printf or accumulation of values.
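
For readers following the thread, the row-at-a-time scheme described above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the body of print_element, the driver name print_rows, the flat row-major array, and the extra byte for the string terminator are all hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for print_element(): writes the 32-bit pattern of a
   float as 8 hex characters (one per nibble) into buf[pos] .. buf[pos + 7].
   Assumes float is a 32-bit IEEE 754 value. */
void print_element(float el, int pos, char *buf)
{
    static const char hex[] = "0123456789abcdef";
    uint32_t bits;
    int k;

    memcpy(&bits, &el, sizeof bits);  /* copy the bits without aliasing issues */
    for (k = 0; k < 8; k++)           /* most significant nibble first */
        buf[pos + k] = hex[(bits >> (28 - 4 * k)) & 0xF];
}

/* Hypothetical driver: formats one row into a reusable buffer and emits it
   with a single fputs per row, so the buffer only needs room for one row.
   A is an n-by-n matrix stored row-major in a flat array (an assumption). */
int print_rows(int n, const float *A, FILE *out)
{
    /* n * 8 characters per row, plus one byte for the terminator (assumed). */
    char *printmat = malloc((size_t)n * 8 + 1);
    int i, j;

    if (printmat == NULL)
        return -1;
    printmat[n * 8] = '\0';

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            print_element(A[i * n + j], j * 8, printmat);
        fputs(printmat, out);         /* row i is written before it is overwritten */
    }

    free(printmat);
    return 0;
}
```

Because every bit of every element ends up in the output, any difference in the computed values changes the text that gets compared, which is what makes the check stricter than a "%0.2f" print.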

Repository:
  rL LLVM

http://reviews.llvm.org/D10991