<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Justin Holewinski wrote:

<blockquote

 cite="mid:448D0C8E64E8084981082A23DEDAEC70A28C88FD52@HQMAIL02.nvidia.com"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">-----Original Message-----

From: Dan Bailey [<a class="moz-txt-link-freetext" href="mailto:dan@dneg.com">mailto:dan@dneg.com</a>]

Sent: Sunday, April 29, 2012 8:46 AM

To: Justin Holewinski

Cc: Jim Grosbach; <a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>; Vinod Grover;

<a class="moz-txt-link-abbreviated" href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a>

Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend

Justin,

Firstly, this is great! It seems to be so much further forward in terms of

features compared to the PTX backend and I'm pleased to see all the support

for vector instructions, atomics, debugging information, etc - much of which

was still to be added to the other backend.

I have a few comments to make, but bear in mind my knowledge comes

solely from working on the PTX backend, so I don't know much about how

this compares to other backends and the feasibility of reusing components

designed to be shared across backends, plus my knowledge of tablegen is

also relatively limited.

This is in no particular order:

* Is there any reason why the 32-bit arch name is nvptx and the 64-bit arch

name is nvptx64, especially as the default for the NVCC compiler is to pass

the -m64 flag? Also, all the internal naming (NVPTXTargetMachine for

example) use 32 / 64 suffixes.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

As far as I know, there is no fundamental reason for this.  If it bugs you too much, I'm sure we could change it. :)

  </pre>

  <blockquote type="cite">

    <pre wrap="">* The register naming seems a little arbitrary as well, using FL prefixes for 64-

bit float and da prefixes for 64-bit float arguments for example.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Really, any choice is going to be arbitrary.

  </pre>

  <blockquote type="cite">

    <pre wrap="">* Something I picked up in the NVVM IR spec - it seems to only be possible to

use the bar.sync 0 instruction. Unless this is being removed for PTX 3.0, the

spec (and the PTX backend) support using bar.sync {0..15}. The old PTX

intrinsic also supports a non-zero integer operand.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

The NVVM intrinsic is there to implement CUDA's __syncthreads().  The old intrinsic is still exposed.  I'll see about adding a proper NVVM intrinsic.

  </pre>

  <blockquote type="cite">

    <pre wrap="">* I guess this raises the question of whether or not it's actually worthwhile

retaining compatibility with the old backend. I converted my Jet compiler to

use NVVM intrinsics and still found the address spaces were out of line, so it

was largely useless until I'd done the full conversion anyway. If the plan is to

deprecate the old backend (as I fully support), I'd be tempted to say maybe

just leave all these old intrinsics out too?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

This is something I've debated back and forth.  The original thought was to maintain as much backwards compatibility as possible with the old back-end.  I don't think it hurts anything to leave the old intrinsics in there.

  </pre>

  <blockquote type="cite">

    <pre wrap="">* I know this isn't an easy problem as I remember discussing this at length

with you at the time Justin, but supporting i8 for ld, st, cvt seems to introduce

a load of blocks of string concat code in tablegen. Is there not a better way of

tackling this, it looks like a fairly ugly solution to me?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Oh there are quite a few ugly TableGen solutions in there. :)  I'm trying to find ways to clean some of this stuff up, but it works so it's hard to justify devoting time to it.

  </pre>

  <blockquote type="cite">

    <pre wrap="">* There are a few places in the code where there are blocks of commented

out code that I presume are waiting for upstreaming of other additions.

Some of these are documented, but many of them aren't. Maybe these

should either be removed for addition later or some explanation added as to

why they're not being used.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Good point.  I'll see about adding more documentation to these.

  </pre>

  <blockquote type="cite">

    <pre wrap="">* There seems to be a lot of features that are lacking tests too. I was trying to

find the equivalent flags to use for switching on and off FMA (a feature we

use quite a bit to compare results with x86) and couldn't find any tests

demonstrating this. Considering the amount of code in the backend

committed to handling mad / fma, there really should be a bunch of tests to

go along with this. In fact non of the nvptx-* flags seem to have any

accompanying test cases, nor did atomics, nvvm intrinsics or many of the

other instructions. I imagine NVidia has a fairly full featured CUDA testing

framework, but if this is going to be made open-source, I'd rather as many of

the instructions could have tests too. Having said that, maybe some of the

test cases for the PTX backend could be migrated across?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I'm working on converting the tests now.

  </pre>

  <blockquote type="cite">

    <pre wrap="">* Finally I spotted a number of typos in the comments in the code that could

be cleaned up.

There's a lot of code here, so I might have missed some things. On the whole

though, most of this is fairly minor. It looks well designed and I'm looking

forward to this making it in. I'm especially excited to see the new pass for

vectorization and other passes to do code transformations. I managed to run

our fluid solver through this backend and it generated PTX that did the right

thing. I haven't had time to do any profiling comparing the old backend yet.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Can I put you down as a technical reviewer of the back-end? :)

As the main consumer of the open-source back-end, your seal of approval would go a long way</pre>

</blockquote>

All sounds good to me. Sure, put me down. :)<br>

<br>

Thanks,<br>

Dan<br>

<br>

<br>

</body>

</html>