[llvm-commits] [PATCH] Fix PR11334

Mon Aug 13 14:34:07 PDT 2012

Hi

Just pinging again for any suggestions on this proposal.

Yours
- Michael

On Thu, 2012-08-02 at 13:14 -0700, Michael Liao wrote:
> Hi
> 
> Just ping. Do we have any suggestions on this?
> 
> Yours
> - Michael
> 
> On Tue, 2012-07-31 at 20:06 -0700, Michael Liao wrote:
> > On Fri, 2012-07-27 at 17:18 -0700, Evan Cheng wrote:
> > > On Jul 26, 2012, at 10:27 AM, Michael Liao <michael.liao at intel.com> wrote:
> > > 
> > > > On Thu, 2012-07-26 at 08:42 +0200, Duncan Sands wrote:
> > > >> Hi Michael, CC'ing Evan and Chris to see if they have any suggestions for
> > > >> how best to deal with this issue.
> > > >> 
> > > > [snipped]
> > > >> However this scheme doesn't work so well for floating point vectors, since it's
> > > >> not that clear to me what
> > > >>   v2f64 = vector_shuffle v4f32, v4f32<0,1>
> > > >> would mean exactly.
> > > > 
> > > > > That looks OK. From the perspective of making the backend generate
> > > > > proper and efficient code, I have no preference as to which one is
> > > > > relaxed. But the rationale for relaxing FP_EXTEND is that we could
> > > > > leverage the existing optimizations in DAG combining and other
> > > > > target-independent code generation passes without changing the
> > > > > semantics of an ISD opcode or adding new ones. (This is also one of
> > > > > the reasons why PR11334 is fixed by recovering FP_EXTEND instead of
> > > > > generating a new node at a very early stage.)
> > > > 
> > > > > Relaxing FP_EXTEND changes that assumption a little, but my personal
> > > > > feeling is that it should be OK. Semantically, it is still what it is.
> > > > > When the input and output vectors have mismatched element counts:
> > > > > * if the input has more elements than the output, only the low part is
> > > > > extended.
> > > > > * if the output has more elements than the input (most unlikely for
> > > > > FP_EXTEND), the high part of the output vector is undefined.
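
For readers following the thread, the two rules above can be modelled in a few lines of plain Python. This is only an illustrative sketch of the proposed semantics, not LLVM code: the names `relaxed_fp_extend`, `f32`, and `UNDEF` are made up for this example, and `None` is an assumed stand-in for an undefined lane.

```python
import struct

# Hypothetical marker for an undefined output lane (assumption for this sketch).
UNDEF = None


def f32(x):
    """Round a Python float (an f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def relaxed_fp_extend(src_lanes, num_dst_lanes):
    """Model the proposed relaxed FP_EXTEND from f32 lanes to f64 lanes.

    - More input lanes than output lanes: only the low lanes are extended.
    - More output lanes than input lanes: the high output lanes are undefined.
    """
    # Python floats are already f64, so widening an f32 value is exact.
    low = [float(f32(x)) for x in src_lanes[:num_dst_lanes]]
    padding = [UNDEF] * (num_dst_lanes - len(low))
    return low + padding


# v4f32 -> v2f64: only the two low lanes are extended.
print(relaxed_fp_extend([1.5, 2.5, 3.5, 4.5], 2))
# v2f32 -> v4f64: the two high lanes are left undefined.
print(relaxed_fp_extend([1.5, 2.5], 4))
```

The first case matches what CVTPS2PD does on x86, which converts the two low single-precision elements of its source into two double-precision results.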
> > > 
> > > > This is potentially a big change. I'm very wary of it unless there really isn't a better way to achieve the optimization that you are aiming for. I'm probably not the best person to reason about this. Dan, do you have comments?
> > > 
> > > Evan
> > 
> > > I made some experimental changes to relax ISD::FP_EXTEND and
> > > ISD::FP_ROUND and refined the logic for widening their results and
> > > operands. No more scalarization is needed.
> > > 
> > > I added experimental support in the X86 backend as well. So far, 'make
> > > check-all' passes with all backends turned on, including the new test
> > > cases, which are attached as well.
> > 
> > Yours
> > - Michael
> > 
> > > BTW, to make the current trunk generate better code for FP rounding from
> > > v2f64 to v2f32, from v4f64 to v4f32, etc., I did a similar fix to
> > > recover the scalarized FP_ROUND and map it onto the X86-specific round
> > > ISD node. I will submit that patch soon, along with other minor fixes.
> > 
> >  
> > > 
> > > > 
> > > > > Unless an optimization really cares about the number of vector elements
> > > > > (assertions are exceptions), most of them just optimize based on the
> > > > > ISD opcode.
> > > > 
> > > > Yours
> > > > - Michael
> > > > 
> > > >> 
> > > > >> So I'm not too sure what the best plan is in general.
> > > >> 
> > > >> Ciao, Duncan.
> > > >> 
> > > >> [*] This is the main reason why running the output of the GCC vectorizer through
> > > >> llc sometimes produces poor code.  The GCC vectorizer only produces operations
> > > >> that it knows can be represented well by the target processor, so we should
> > > >> never end up scalarizing but we do.
> > > >> 
> > > >>> 
> > > > >>> However, I need more comments and suggestions from the community
> > > > >>> before pushing in that direction.
> > > >>> 
> > > >>> As a short-term solution, this patch only adds a target-specific DAG
> > > >>> optimization.
> > > >>> 
> > > >>> Yours
> > > >>> - Michael
> > > >>> 
> > > >>> On Wed, 2012-07-25 at 12:58 -0700, Rotem, Nadav wrote:
> > > >>>> Hi Michael,
> > > >>>> 
> > > > >>>> In your patch you are counting on the type-legalizer to scalarize the FPEXT operation, only to gather it again. I think that the pre-type-legalization DAGCombine code would be short and simple. Why not implement a DAGCombine optimization which works on vector FPEXT ISDs? I understand that it will be more difficult to handle types such as <3 x float>, but are these really important?
> > > >>>> 
> > > >>>> Thanks,
> > > >>>> Nadav
> > > >>>> 
> > > >>>> -----Original Message-----
> > > >>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Michael Liao
> > > >>>> Sent: Wednesday, July 25, 2012 00:28
> > > >>>> To: llvm-commits at cs.uiuc.edu
> > > >>>> Subject: [llvm-commits] [PATCH] Fix PR11334
> > > >>>> 
> > > >>>> Hi
> > > >>>> 
> > > > >>>> Please review the attached patch fixing PR11334. With this patch, the test case in PR11334 generates the expected insn, CVTPS2PD, instead of a series of CVTSS2SD. An enhanced test case is included as well.
> > > >>>> 
> > > >>>> Yours
> > > >>>> - Michael
> > > >>> 
> > > >>> 
> > > >>> _______________________________________________
> > > >>> llvm-commits mailing list
> > > >>> llvm-commits at cs.uiuc.edu
> > > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > >>> 
> > > >> 
> > > >> _______________________________________________
> > > >> llvm-commits mailing list
> > > >> llvm-commits at cs.uiuc.edu
> > > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > > 
> > > > 
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > 
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits