[PATCH] D15914: [OpenCL] Pipe builtin functions

Mon Jan 11 10:54:26 PST 2016

pekka.jaaskelainen added inline comments.

================
Comment at: lib/CodeGen/CGBuiltin.cpp:2033
@@ +2032,3 @@
+          *Arg1 = EmitScalarExpr(E->getArg(1));
+    llvm::Type *ReservedIDTy = ConvertType(getContext().OCLReserveIDTy);
+
----------------
Anastasia wrote:
> pekka.jaaskelainen wrote:
> > Anastasia wrote:
> > > Why do we need to modify user defined functions? Only builtins will have this extra parameter.
> > > 
> > > I think packet size would be useful for builtin implementation to know the number of bytes to be read/written. I don't see how to implement it correctly otherwise. As mentioned earlier relying on the metadata is not a conventional compilation approach and should be avoided if possible.
> > The pipe struct can have the packet size in its header before the actual ring buffer or whatever, which can be used as a fallback unless the compiler can optimize it to a larger access. Correct implementation thus doesn't require a "hidden parameter". Adding it as a compile time hidden constant argument should help the optimizers, that's of course true, but I don't think it's strictly required.
> > 
> > If you think having a special behavior for the built-ins calls isn't problematic, then fine, I'm not having so strong opinion on this.
> So where would this size be initialized/set? Note that host code might have different type sizes.
> 
> I am thinking that having a size parameter makes the generated function more generic since we lose information about the type here and recovering it might involve extra steps. I am currently not clear about how to do that.
https://www.khronos.org/registry/cl/sdk/2.0/docs/man/xhtml/clCreatePipe.html sets the packet size in creation. We have a prototype implementation working using that. But that generic software version uses a byte loop which is painfully slow, which of course should be optimized to wide loads/stores whenever possible.

If we want to optimize it, it means inlining the builtin call and then deriving the size for the loop at the compile time after which it can be vectorized (assuming the packet fifo is aligned). Where this size constant comes from is the question under discussion -- you propose having it as an additional hidden parameter to the calls and I propose a metadata. Yours optimizes automatically but requires special casing (when calling a builtin, not user function, add this magic parameter), mine requires a new optimization that takes the MD in account and converts it to a constant after inlining the built-in.

I'm open for both as I do not really know how much trouble the special casing will bring and I appreciate that optimizations might just work, but in my understanding the size info is strictly not required, but very useful for optimization.

http://reviews.llvm.org/D15914