[LLVMdev] LowerPacked pass
sabre at nondot.org
Fri Nov 19 09:24:33 PST 2004
On Fri, 19 Nov 2004, Morten Ofstad wrote:
> Chris Lattner wrote:
> > Note that packed support in LLVM is not complete yet. In
> > particular, here are some of the big missing pieces:
> > 1. No code generators can generate vector instructions yet (SSE or
> > altivec, for example). This should be fairly easy to add though.
> > 2. The lowerpacked pass, which currently converts packed ops into their
> > scalar counterparts, has a few limitations:
> > C. It has never been thoroughly tested, primarily because we don't
> > have a producer of packed operations yet. I believe it should
> > work reasonably well though.
> It works reasonably well, quite impressive really considering it's not
> been tested ;-)
You can thank Brad Jones for that, he did a great job! :)
> > B. It always lowers all of the way to scalar ops, even if the target
> > supports SOME packed types. For example, it would be nice for it
> > to eventually lower <16 x float> into 4 <4 x float>'s if the
> > target supports them.
> B is not much of a problem for my use,
Yup, I suspected not. You know what you're generating. :)
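
To make limitation B concrete, here is a minimal sketch in plain C++ (not LLVM code) of what partial lowering would mean: a 16-element packed add is split into four 4-wide adds instead of 16 scalar adds. The names add4 and add_chunked are made up for illustration, and add4 is only a stand-in for the single vector instruction a target with <4 x float> support might provide.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for one native <4 x float> add; a real backend would emit a single
// vector instruction here. The name is hypothetical, not an LLVM API.
void add4(const float *a, const float *b, float *out) {
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] + b[i];
}

// Lower an n-element packed add into n/4 four-wide adds rather than n scalar adds.
void add_chunked(const float *a, const float *b, float *out, std::size_t n) {
    assert(n % 4 == 0 && "sketch only handles widths that are multiples of 4");
    for (std::size_t base = 0; base < n; base += 4)
        add4(a + base, b + base, out + base);
}
```

Calling add_chunked(a, b, out, 16) issues four add4 calls, which is the <16 x float> into 4 <4 x float>'s case mentioned above.
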
> > A. It does not handle packed arguments to functions
> but A is a bit annoying even though I mostly pass pointers to packed
> types anyway. Can you elaborate a bit on what is the problem with this?
> I have calls going back into our code by adding mappings to the JIT, but
> I'm not sure if I can get it to call functions with R32x4 (<4 x float>)
> args without making a wrapper that takes a pointer.
The basic problem is that a FunctionPass, like lower packed, cannot change
the prototype of the function it is running on, so you can't change:
void foo(<4 x float>) -> void foo(float,float,float,float).
Like you said, passing a pointer is a work-around. A better solution
would be to implement insert/extract operations, and require all code
generators to support passing Packed types by value, but only require them
to implement the insert/extract operations; all other ops would be
lowered. This is reasonable because the insert/extract ops can be
implemented as simple memory copies.
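
As a rough illustration of both the pointer work-around and the "simple memory copies" remark, the sketch below models a <4 x float> value as a plain struct in C++. This is not LLVM code: Float4, extract, insert and sum_lanes are hypothetical names, and the memory layout is an assumption made only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical in-memory layout for a <4 x float> value; not an LLVM type.
struct Float4 {
    float lanes[4];
};

// Extract one element by copying it out of the packed value's memory.
float extract(const Float4 &v, std::size_t index) {
    assert(index < 4);
    float result;
    std::memcpy(&result, &v.lanes[index], sizeof(float));
    return result;
}

// Insert one element by copying the value and overwriting a single lane.
Float4 insert(Float4 v, std::size_t index, float value) {
    assert(index < 4);
    std::memcpy(&v.lanes[index], &value, sizeof(float));
    return v;
}

// The pointer work-around: the callee receives the address of the packed
// value, so its prototype never mentions a packed type by value.
float sum_lanes(const Float4 *v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < 4; ++i) total += extract(*v, i);
    return total;
}
```

With only extract and insert available, any other packed operation can be written lane by lane, which is why requiring just those two from every code generator seems like a reasonable minimum.
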
> > For your work, it might be most expedient to just ignore the lower packed
> > pass and add SSE support to the X86 backend: that will get you up and
> > running quickly and get you the performance you are obviously after. If
> > backwards compatibility with old hardware is an issue, revisiting the
> > lower packed pass would make sense.
> Is it easy to add intrinsics to do things like dot product of packed
> types using SSE instructions? That's probably all I need...
Yes, it's quite easy, take a look at this for some more info:
Before you start adding a bunch of X86 specific intrinsics, please ping
the list with information (ideally in the form of a LangRef.html patch :),
about the intrinsics. While it is not necessarily a problem to have X86
specific intrinsics, we only want them for truly X86 specific operations.
I would think that dot product can be implemented successfully in multiple
different vector ISAs.
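
For reference, the target-independent meaning of a packed dot product can be written as a scalar loop. The sketch below is plain C++, not an LLVM intrinsic, and packed_dot is a hypothetical name; it only shows that nothing in the operation itself is tied to SSE.

```cpp
#include <cassert>
#include <cstddef>

// Target-independent meaning of a packed dot product: multiply lane by lane,
// then sum. The function name is hypothetical; this is not an LLVM intrinsic.
float packed_dot(const float *a, const float *b, std::size_t lanes) {
    assert(a != nullptr && b != nullptr);
    float sum = 0.0f;
    for (std::size_t i = 0; i < lanes; ++i) sum += a[i] * b[i];
    return sum;
}
```

Any vector ISA that can multiply lanes and sum them could implement this, each in its own way.
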
Also, I would assume you will want simple things like add and multiply of
packed values as well. These can be added directly to X86ISelSimple.cpp,
like the intrinsics. If you have questions about that process, let me know.
> > Let me know what you think. In the very short term, the hook exposed to
> > create the lower packed pass can be plunked into the X86TargetMachine and
> > get intra function packed types working for you.
> The patch you did was missing the actual implementation of
> createLowerPackedPass, so I'm including my own differences -- I guess
Sounds good, applied. Sorry for not doing it right in the first place!:
> you don't want to apply the changes to X86TargetMachine as I'm the only
> one actually generating packed types, but I include it for completeness..
This should definitely go in in the future, but I'd rather wait until
packed types work 100% before doing so. Maybe putting it under control of an
-enable-simd flag or something would work.
Speaking of flags, if you look at the top of X86TargetMachine.cpp, there
is an SSEArg command line argument that is currently #ifdef'd out. I would
appreciate it if you enable it and use it to control the instructions
being emitted by the X86 backend (SSE1-3), if you start working on it.