[PATCH] Rewrite arm_neon.h generation from scratch

Jim Grosbach grosbach at apple.com
Thu Jun 12 15:09:30 PDT 2014


> On Jun 11, 2014, at 3:38 AM, James Molloy <james.molloy at arm.com> wrote:
> 
> Hi Tim,
> 
> Thanks for the comments!
> 
>> +  /// NeededEarly - set if any other intrinsic depends on this intrinsic.
>> +  bool NeededEarly;
>> 
>> Was this an early version of the Dependencies thing? It seems unused.
> 
> Yes and no. It was indeed an early version of Dependencies, but I kept it because
> I think I may well need it for the small-ish big-endian change I have lined up for
> after this. I can remove it now and re-add it again in a later commit if I needed it,
> or I can keep it and remove it if I didn't need it. I'm easy.
> 
>>> def OP_XTN      : Op<(call "vcombine", $p0, (call "vmovn", $p1))>;
>> 
>> (And others). Was it a conscious decision to use a "vcombine" call
>> over "(shuffle X, Y, (add mask0, mask1))"? I can see arguments both
>> ways.
> 
> The answer is "yes" -  I consciously tried to keep the exact same code-gen where
> possible. This made it much easier to diff, and to stop subtle mistakes that I
> wouldn't have spotted otherwise. Now that the code demonstrably works, I'm happy to change it.

Makes sense. I’d love to see us refactor more things to use generic IR. You’re right that’s best as follow-on work, though.

In general, I really, really like this patch and the direction it takes the emitter. It seems a strict improvement over the increasingly unwieldy code we have now.

I would like to have clang tests again, but I’m not sold on them being auto-generated from the same source as the header file. That’s always felt like the fox guarding the henhouse to me. Instead, I’d rather us have a separately developed (probably hand-maintained, unfortunately) .c file that has a simple test for each intrinsic. Perhaps we could use the intrinsics.c file from ARM, licensing permitting? In that, I *really* want the tests to check for IR output, not ARM64 assembly output. We should have equivalent tests in the llvm backend to check that the IR maps to the appropriate instruction(s). I don’t mind that work being a separate set of work.

> 
> Maybe as a followup commit though? First commit for identical (as far as possible) codegen, then
> start optimizing the codegen?
> 
>> It only really works because the vcombine is at the outside,
>> unfortunately. I'd love to be able to apply it to the many vget_high
>> calls too but can't think of a clean way without reinventing something
>> like DAGISel's PatFrag.
> 
> I did consider something like this. But I didn't want to be accused of overengineering ;)

Part of what I love about this approach is how it allows for incremental progress in directions like that as needed/desired. This is groundwork that sets the stage for even more.

Good stuff.

-Jim


> 
> Cheers,
> 
> James
> 
>> -----Original Message-----
>> From: Tim Northover [mailto:t.p.northover at gmail.com]
>> Sent: 11 June 2014 11:28
>> To: James Molloy
>> Cc: Renato Golin; Jiangning Liu; Clang Commits
>> Subject: Re: [PATCH] Rewrite arm_neon.h generation from scratch
>> 
>> Hi James,
>> 
>> On 11 June 2014 10:08, James Molloy <James.Molloy at arm.com> wrote:
>>> Ping? :)
>> 
>> I also think this is a wonderful change! Even committed now, it would
>> be a vast improvement on the existing code. I just spotted a couple of
>> possibilities:
>> 
>> +  /// NeededEarly - set if any other intrinsic depends on this intrinsic.
>> +  bool NeededEarly;
>> 
>> Was this an early version of the Dependencies thing? It seems unused.
>> 
>>> def OP_XTN      : Op<(call "vcombine", $p0, (call "vmovn", $p1))>;
>> 
>> (And others). Was it a conscious decision to use a "vcombine" call
>> over "(shuffle X, Y, (add mask0, mask1))"? I can see arguments both
>> ways.
>> 
>> Depending on how cunning we want to be, we could use something like:
>> 
>>    class ConcatOp<dag LHS, dag RHS> : Op<(shuffle LHS, RHS, (add
>> mask0, mask1))>;
>> 
>> Then write (with appropriate "node" handling in Intrinsic::emitDag):
>> 
>>    def node;
>>    def OP_XTN : ConcatOp<(node $p0), (call "vmovn", $p1)>;
>> 
>> It only really works because the vcombine is at the outside,
>> unfortunately. I'd love to be able to apply it to the many vget_high
>> calls too but can't think of a clean way without reinventing something
>> like DAGISel's PatFrag.
>> 
>> Cheers.
>> 
>> Tim.
> 
> 
> 
> 
> 
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits





More information about the cfe-commits mailing list