LEON Sparc Sub-targets upgrade - first commit

Mon Sep 7 21:34:02 PDT 2015

On Sep 7, 2015, at 2:26 PM, Chris.Dewhurst <Chris.Dewhurst at lero.ie> wrote:
> Hi James,
> 
> Thanks for responding.
> 
> The design was largely done before I started on the project and, while we expect changes to make the code acceptable to the LLVM community, I need to balance that against the need to avoid excessive churn in the LEON processor back-end design.

It's *great* that you have an implementation of all of this -- and I certainly hope it can go in easily. BUT, I'd just like to say up-front that avoiding churn in an out-of-tree design is really not a terribly interesting consideration. What gets committed upstream might need to diverge drastically from what you've got, in response to review comments, and it's best to make peace with that reality early.

BTW, since you have a large body of work already done, it would be helpful, if possible, to post your current diff as-is as a "work in progress, not ready for in-depth review" patch. Even something known not to be ready is often better posted earlier rather than later. It can be referred to for context when reviewing smaller cleaned-up patches, and comments on the general outline can be made early on -- potentially more helpfully than is possible from only an English description of the future submissions.

It would also be useful for me. Some of the features you have an implementation for are things that I need, or might find that I need soon, and was planning to implement myself in LLVM upstream. (In particular, I was planning to implement both CASA and instruction scheduling for LEON4 soon.) I'd very much prefer to avoid redoing that work, if I could assist in upstreaming some already-completed work instead.

> What follows is the briefest of explanations of the design of the LEON back-end. I've tried to structure them as questions so that you might be able to give me some feedback on what is most likely to be accepted by the LLVM community. I'd very much appreciate any feedback that you can give me and I'm very thankful for any amount of time you can give to this. I'm trying to keep this email as brief as possible to avoid taking too much of your time:
> 
> 1. The LEON back-end is implemented as a sub-target of the Sparc processor, based on the Sparc 32-bit processor. There are four variants: LEON2, LEON3, LEON4 and NGMP. Each is implemented as a separate sub-target. LEON processors support all Sparc 32 bit operations. Can I assume this is going to be the best way to implement LEON processor back-end support?

That seems basically reasonable. (isn't NGMP just a couple LEON4 cores, though?)

> 2. LEON also supports a few specialised operations, such as CASA, an atomic compare-and-swap instruction. I'm intending to add these as pattern matching features in SparcInstrInfo.td, with appropriate constraints to target LEON only. Can you verify that this is the best way to do this?

Yes, that sounds fine. (Of course, CASA is in both SparcV9 and LEON, just with incompatible ASI arguments required.) BTW, did you also add the equivalent of gcc's -m[no-]user-mode to select between ASI 0x0A and 0x0B for LEON's CASA?

> 3. LEON's hardware implementation also has a large number of errata, which we need to work around in software. The intention in the design is to implement a back-end pass for each workaround and add these to the LEON target's information about the processor type. e.g. SDIV is not working correctly on LEON processors, but SDIVrr works correctly, so a back-end pass will modify the instructions to work around this erratum. A sample of a LEON processor type definition is as follows, with similar variants for LEON3, LEON4 and NGMP:

Yeah, these AT697 and UT699 processors sure have a whole lot of errata. [e.g. http://www.atmel.com/Images/doc4409.pdf is really rather astounding!] Although, I'd note that most of them are already fixed by the AT697F and UT699E revisions.

> //LEON 2 FT (AT697)
> def : Processor<"leon2", LEONItineraries, [FeatureLeon2, FPUSupport, SaveRestoreSupport, ReplaceSDIV, FixCALL,
> RestoreExecAddress, IgnoreZeroFlag, FillDataCache, InsertNOPDoublePrecision,
> RemovingRedundantLoads, LoopInvariantCodeMotion, PromoteArgumentsToScalars,
> BasicBlockPlacement, GlobalVariableOptimizer, CanonicalizeInductionVariables,
> CanonicalizeInductionVariables, DeduceFunctionAttributes, SimpleConstantPropagation,
> InternalizeGlobalSymbols, CodeSinking, ReassociateExpressions, MemCpy, PromoteMemoryRegister,
> LowerSwitchInsts, DeadStoreElimination, DeadGlobalElimination, DeadTypeElimination,
> DeadCodeElimination, DeadInstructionElimination, DeadArgumentElimination, DeadArgumentElimination,
> GlobalValueNumbering, OptimizeCodeGeneration, MergeDuplicateGlobal, DeleteDeadLoops,
> UnusedFunctionPrototypes, StripAllSymbols, RemoveUnusedExceptionHandling, MergeFunctions,
> TailCallElimination, TailRecursionElimination, LoopTiling, OptimizeExecutionTime, OptimizeCodeSize,
> OptimizeDataSize, OptimizeSize, AllOptimizations]>;

That's sure a lot more subtarget features than I'd expect to see here. (after InsertNOPDoublePrecision it looks like just a list of optimization passes, not features?)

I also think these errata fixes probably shouldn't be done under the generic processor type (e.g. "leon2"), but rather as a separate opt-in feature, because they do get fixed in later revisions of the same chip.
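
As a rough sketch of the opt-in approach: ReplaceSDIV, FeatureLeon2 and LEONItineraries below are the names from your definition and are assumed to be defined elsewhere; the feature string, the description text and the "at697e" CPU name are placeholders I made up for illustration. Which workarounds belong to which chip revision would have to come from the vendor errata sheets.

```tablegen
// Sketch only; assumes the usual Sparc.td context (Target.td already
// included, FeatureLeon2 and LEONItineraries defined elsewhere).

// Each erratum workaround is its own subtarget feature (placeholder text).
def ReplaceSDIV
  : SubtargetFeature<"replacesdiv", "ReplaceSDIV", "true",
                     "Erratum workaround: rewrite SDIV (placeholder)">;

// Generic CPU name: no errata workarounds enabled by default.
def : Processor<"leon2", LEONItineraries, [FeatureLeon2]>;

// Hypothetical revision-specific CPU name that opts in to the workaround.
def : Processor<"at697e", LEONItineraries, [FeatureLeon2, ReplaceSDIV]>;
```

With something along these lines, a user on a fixed revision pays nothing, and a user on an affected revision either picks the revision-specific CPU name or turns the feature on explicitly (e.g. via llc's -mattr=+replacesdiv, given the placeholder feature string above).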

> Is this design going to be acceptable to the LLVM community? If not, what direction should we move in to implement hardware errata for these processors?

Would need to see the code to say for sure. :) But I don't think it sounds unreasonable.

Of course, if nobody actually needs some errata workarounds any more because they're for older chip revisions that aren't used anymore, it'd be preferable to not add any code to LLVM supporting it. (AT697F came out in 2009 -- that's not old enough by now that the AT697E-only workarounds are unnecessary for a brand new toolchain?)

> 4. Itineraries: Sparc currently does not implement itineraries. LEON processors, however, have multiple core support and itineraries are a required feature to optimise compiler output. As you can see from the definition above, LEON processors' definitions include itineraries (defined elsewhere, but not here, for brevity). Sparc processors will still implement no itinerary. This also requires an itinerary class to be defined for all existing instructions. e.g. FPU instructions largely have the same itinerary class, so FADDS would have this defined at the end as:

The other Sparc processors don't implement it only because nobody has got around to it yet -- they *should* implement it too (but it's fine to leave that for someone else, of course).

> // Floating-point Add and Subtract Instructions, p. 146
> def FADDS  : F3_3<2, 0b110100, 0b001000001,
>                  (outs FPRegs:$rd), (ins FPRegs:$rs1, FPRegs:$rs2),
>                  "fadds $rs1, $rs2, $rd",
>                  [(set f32:$rd, (fadd f32:$rs1, f32:$rs2))], IIC_fpu_instr>;
> 
> Is adding an itinerary to every Sparc instruction going to be an acceptable way to move forward with LEON implementation? I'm assuming that the lack of any itinerary on existing (i.e. non-LEON) processors means that these itinerary classes for the instructions are effectively ignored for any non-LEON processor.

Yes, that sounds good to me. If someone else ever wants to implement it for modern sparcs, they may have to split some of the itinerary classes, but you don't need to worry about that.
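
For reference, the shape I'd expect is roughly the following. IIC_fpu_instr and LEONItineraries are the names from your mail; the functional unit names and all cycle counts are invented placeholders, not real LEON timings.

```tablegen
// Sketch only; assumes the usual Sparc.td context (Target.td already included).

// Itinerary class referenced by the FPU instruction definitions.
def IIC_fpu_instr : InstrItinClass;

// Hypothetical functional units.
def LEONIU  : FuncUnit;  // integer unit
def LEONFPU : FuncUnit;  // floating-point unit

def LEONItineraries : ProcessorItineraries<
  [LEONIU, LEONFPU], [], [
    // Placeholder numbers: occupy the FPU for 4 cycles; result is
    // available at cycle 4, both sources are read at cycle 1.
    InstrItinData<IIC_fpu_instr, [InstrStage<4, [LEONFPU]>], [4, 1, 1]>
  ]>;

// A processor that names NoItineraries has an empty itinerary table, so
// the itinerary class on each instruction carries no timing data for it.
def : Processor<"v8", NoItineraries, []>;
```

As far as I know, that last point is why tagging every instruction with an itinerary class is harmless for the non-LEON CPUs.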

Thanks!