[llvm-commits] [patch] Mostly mechanical removal of getPhysicalRegisterRegClass.patch

Tue Jun 29 08:29:02 PDT 2010

On Jun 29, 2010, at 8:11 AM, Rafael Espindola wrote:

> On 29 June 2010 10:54, Anton Korobeynikov <asl at math.spbu.ru> wrote:
>> Hello, Jakob
>> 
>>> This is a little worrying, but probably not your fault.
>> Yes. This somehow seems to be a fallout from subreg indices unification.
>> 
>> Basically, the problem is that currently we can easily have
>> cross-domain switches while walking via subregs.
>> Look, for example, into DPR_VFP2 regclass. This class is used for NEON
>> code and thus all the stuff operates inside the NEON domain. It should
>> not contain any SPR subregs since there are no 32-bit reg-reg moves in
>> NEON (think about the core with VFP disabled, but not NEON).
>> 
>> In fact, this was the only reason why this regclass does exist - to
>> restrict the set of regs used for certain neon instructions.
> 
> I think we agreed on IRC. The issue is that the question "give me a
> regclass for D0" is ambiguous. Is that the NEON one or the VFP one? If
> we don't specify this, the existing DPR_VFP2 class is a valid answer
> and gives the copy creation extra freedom.
> 
> Since there are cases where we care a lot about moves from neon to
> vfp, we should make those explicit. I agree with Anton that what we
> should do is
> 
> *) Declare the neon registers as independent entities from the vfp
> registers (D0N, D1N, ...)
> *) Declare the class of neon registers. This will not intersect with
> the vfp register classes
> *) Declare move instructions between the two (COPY_TO_REGCLASS?). This
> will print just like a regular copy, but have a cost that reflects the
> move from neon to vfp or the other way.
> 
> Check?

I don't like the sound of this proposal very much.  I know the Cortex-A8 microarchitecture imposes a significant penalty for moving between NEON and VFP, and maybe the A9 does as well (I forget), but that is just a bad design.  There's nothing in the ARM architecture that encourages those pipelines to be separated as they are in those particular implementations, and we should not design our compiler around that.  Hopefully someday ARM will fix the problem, and at that point, we don't want to be stuck with two sets of equivalent registers.

I missed the IRC discussion.  Could you explain how you came to this conclusion and what other options you considered?