[PATCH RFC 0/4] R600: Implement 64bit div/rem

Fri Apr 25 12:08:34 PDT 2014

Hi,

I tried adding 64-bit div/rem support for R600, and this is what I came up with.
The first patch is just a random cleanup in the 32-bit version.
The second patch changes UDIV/UREM nodes to UDIVREM, as this does not happen
automatically during the type legalization phase.
The third patch implements the basic iterative division algorithm
(loop-unrolled version), and the last one adds some optimizations that
I could think of. I still have the original commits if you prefer to apply them
individually. The optimizations result in about 60% fewer instructions and 40%
fewer instruction groups.
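
For reference, the basic iterative algorithm is ordinary shift-and-subtract
division, producing one quotient bit per step. Below is a minimal scalar sketch
in C++, not the lowering code from the patch: the function name is hypothetical,
the loop is left rolled for readability, and the divisor is assumed to be
non-zero.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not from the patch): restoring shift-and-subtract
// division. Returns {quotient, remainder}. Assumes den != 0.
std::pair<uint64_t, uint64_t> udivrem64_iterative(uint64_t num, uint64_t den) {
    uint64_t quot = 0;
    uint64_t rem = 0;
    for (int bit = 63; bit >= 0; --bit) {
        // Shift the next numerator bit into the partial remainder.
        // This cannot overflow: rem never exceeds the numerator prefix
        // consumed so far, which always fits in 64 bits.
        rem = (rem << 1) | ((num >> bit) & 1u);
        // If the divisor fits, subtract it and set this quotient bit.
        if (rem >= den) {
            rem -= den;
            quot |= uint64_t{1} << bit;
        }
    }
    return {quot, rem};
}
```

For example, udivrem64_iterative(100, 7) yields quotient 14 and remainder 2.
Unrolling simply repeats the loop body once per bit.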

My assumption was that people should only ever use 64-bit integers if they intend
to use large numbers, so the additional overhead of speculative UDIVREM32
does not really matter (it's about 5% of the total instruction count).

A better version would have one initial runtime check and branch to either
UDIVREM-64-by-32 or a UDIVREM-64-by-64 that does not do the initial
speculation (and requires the divisor to be >= 2^32). This would speed up
execution if all threads in a workgroup have the same kind of divisor
(either all < 2^32 or all >= 2^32).
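
To make the idea concrete, here is a rough scalar sketch of such a dispatch in
C++. It is only an illustration under my own assumptions, not code from this
series: all names are hypothetical, the divisor is assumed to be non-zero, and
both paths share a 32-step shift-and-subtract tail. When the divisor is
>= 2^32, the quotient fits in 32 bits, so the high word of the numerator can
serve directly as the initial partial remainder.

```cpp
#include <cstdint>

struct DivRem64 {
    uint64_t quot;
    uint64_t rem;
};

// Hypothetical shared tail: compute the low 32 quotient bits by
// shift-and-subtract. Requires rem < den on entry.
DivRem64 finish_low_half(uint64_t rem, uint32_t num_lo,
                         uint64_t den, uint64_t quot_hi) {
    uint64_t quot = quot_hi << 32;
    for (int bit = 31; bit >= 0; --bit) {
        rem = (rem << 1) | ((num_lo >> bit) & 1u);
        if (rem >= den) {
            rem -= den;
            quot |= uint64_t{1} << bit;
        }
    }
    return {quot, rem};
}

// Divisor < 2^32: one 32-bit div/rem on the high word, then the tail.
DivRem64 udivrem_64_by_32(uint64_t num, uint32_t den) {
    uint32_t hi = static_cast<uint32_t>(num >> 32);
    uint32_t lo = static_cast<uint32_t>(num);
    return finish_low_half(hi % den, lo, den, hi / den);
}

// Divisor >= 2^32: the high quotient word is zero and the high numerator
// word (< 2^32 <= den) is already a valid partial remainder.
DivRem64 udivrem_64_by_64_large(uint64_t num, uint64_t den) {
    return finish_low_half(num >> 32, static_cast<uint32_t>(num), den, 0);
}

// Single runtime check that selects the path. Assumes den != 0.
DivRem64 udivrem64(uint64_t num, uint64_t den) {
    if ((den >> 32) == 0)
        return udivrem_64_by_32(num, static_cast<uint32_t>(den));
    return udivrem_64_by_64_large(num, den);
}
```

A scalar sketch cannot show the benefit itself, which depends on the check
being uniform across the workgroup so that only one path is executed.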

regards,
Jan

Jan Vesely (4):
  R600: remove unused variable
  R600: Change UDIV/UREM to UDIVREM when legalizing types
  R600: Implement iterative algorithm for udivrem
  R600: optimize the UDIVREM algorithm for 64bit operands

 lib/Target/R600/AMDGPUISelLowering.cpp | 94 +++++++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+), 2 deletions(-)

-- 
1.9.0