[PATCH] D69103: Backend for NEC SX-Aurora

Simon Moll via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 28 12:42:53 PDT 2019


simoll added a comment.

In D69103#1723851 <https://reviews.llvm.org/D69103#1723851>, @rengolin wrote:

> In D69103#1723824 <https://reviews.llvm.org/D69103#1723824>, @simoll wrote:
>
> > Great! How do we mark the target to make it clear it's experimental?
>
>
> There used to be different registration points in CMake and some structs, I can't remember exactly. There are some comments in here that may help, but ultimately you want to make sure that your target is only built if explicitly named, so we don't break bots.


Okay. I'll double-check that CMake only configures the build with the target when it is explicitly added to `LLVM_EXPERIMENTAL_TARGETS_TO_BUILD`. Some pointers to the structs in question would be very helpful.
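
For reference, a minimal configure line along these lines should be all that is needed to opt in (the target name `VE` and the Ninja generator are assumptions here, matching `lib/Target/VE`):

```
cmake -G Ninja -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=VE ../llvm
```

As long as `VE` stays out of the default target list, a plain configuration should not build or test it, which keeps the bots unaffected.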

> 
> 
>> How much of that do you need to see before this patch can be approved? I am asking because, as we are refactoring the out-of-tree patches for upstreaming, it would ease the pain of rebasing if some of the changes outside of `lib/Target/VE` could already go in.
> 
> Ideally we'd like you to at least show the next steps (in the order you named). This could be a tentative Phab review, or a branch on your GitHub repo that shows it, but it would have to end up in Phab in a way that can be merged pretty soon.
> 
> What we're trying to avoid here is having a long period (potentially across releases) with just the stub in and no real code. The current patch is just a marker and it would be good to see that you're on the right path early on, so do any major refactoring before starting the patch set.

I understand. We will repackage the GitHub code into digestible commits and put them on Phabricator. I'll come back to you when the patch sets leading up to full scalar codegen are ready.

> 
> 
>> Beyond this patch, the critical changes are:
>> 
>> - TableGen:
>>   - IntrinsicEmitter (Additional `IIT_Info` enum entries).
>>   - CodeGenTarget (`MVT` enum entries).
>> - Clang: VE toolchain.
>> 
>>   Later, we will add target-specific vector intrinsics to Clang and LLVM.
> 
> I'm also interested in how you're going to connect to the rest of the toolchain. Are you emitting binaries directly (are they ELF-like?) or assembly that gets lowered by a proprietary tool (like NVIDIA)?
> 
> Also, what is the subset of IR that can work on your target (if any), and will Clang be responsible for filtering it or do we barf in the middle-end?

Any code that would run on your CPU, really :) The full application binary (ELF, by the way) runs on the card by default and dispatches system calls to a proxy process on the host. The GitHub code already implements full scalar instruction codegen (and there are VE ports of libcxx/libcxxabi/libunwind, ...).

> I'm expecting the loop vectoriser to be of interest, because if all we have is intrinsics, then it's easier to write assembly directly. :)

Well, we already use the target-specific intrinsics for hand-written TensorFlow kernels. However, we are planning to implement codegen for standard vector instructions and LLVM-VP once it's available upstream. The Region Vectorizer is already capable of emitting VP intrinsics. To be useful for the VE, the current loop vectorizer would need to be extended to support tail-loop predication by setting the active vector length (otherwise, e.g. in packed f32 mode, there would be up to 511 scalarized remainder iterations).
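
To illustrate, here is a rough sketch of one strip-mined iteration written against the VP intrinsics as proposed in the LLVM-VP RFC (the intrinsic name, the 256-element type, and the `%mask`/`%avl` operands are illustrative; the final upstream form may differ):

```llvm
; assumptions: %n is the trip count, %i the current base index,
; %mask an all-true <256 x i1> vector, %x/%y the loaded operands.
declare <256 x double> @llvm.vp.fadd.v256f64(<256 x double>, <256 x double>, <256 x i1>, i32)

  %rem = sub i32 %n, %i
  %cmp = icmp ult i32 %rem, 256
  %avl = select i1 %cmp, i32 %rem, i32 256   ; active vector length, shrinks for the tail

  ; only the first %avl lanes participate; a matching VP store would leave
  ; elements beyond %avl untouched, so no scalarized remainder loop is needed.
  %sum = call <256 x double> @llvm.vp.fadd.v256f64(<256 x double> %x, <256 x double> %y,
                                                   <256 x i1> %mask, i32 %avl)
```

The idea is that the vectorizer folds the tail into the main loop by recomputing `%avl` per iteration instead of emitting a scalar epilogue.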


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69103/new/

https://reviews.llvm.org/D69103




