[cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver

Mon Mar 7 17:11:38 PST 2016

----- Original Message -----
> From: "C Bergström via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Justin Lebar" <jlebar at google.com>
> Cc: "Alexey Bataev" <a.bataev at hotmail.com>, "C Bergström via cfe-dev" <cfe-dev at lists.llvm.org>, "Samuel F Antao"
> <sfantao at us.ibm.com>, "John McCall" <rjmccall at gmail.com>
> Sent: Monday, March 7, 2016 6:46:57 PM
> Subject: Re: [cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver
> 
> #1 OMP allows multiple code generation, but doesn't *require* it. It
> wouldn't be invalid code if you only generated for a single target at
> a time - which imho isn't that unreasonable. Why?! It's unlikely that
> a user is going to try to build a single binary that runs across
> different supercomputers. It's not like what ANL is getting will be a
> mix (today/near future) of PHI+GPU. It will be a PHI shop.. ORNL is a
> GPU shop. The only burden is the user building the source twice (not
> that hard and what they do today anyway)

I agree, but supercomputers are not the only relevant platforms. Lots of people have GPUs, and OpenMP offloading can also be used on various kinds of heterogeneous systems. I see no reason to design, at the driver level, for only a single target device type unless doing more is a significant implementation burden.

 -Hal

> 
> #2 This proposed tarball hack/wrapper thingie is just yuck design
> imho. I think there are better and more clean long term solutions
> 
> #3 re: "ARM ISA and Thumb via a runtime switch.  As a result, objdump
> has a very difficult time figuring out how to disassemble code that
> uses both ARM and Thumb."
> 
> My proposed solution of prefixing/name mangling the symbol to include
> "target" or optimization level solves this. It's almost exactly what
> glibc does (I spent 15 minutes looking for this doc I've seen before,
> but couldn't find it.. if really needed I'll go dig in the glibc
> sources for examples - the "doc" I'm looking for could be on the
> loader side though)
> 
> In the meantime there's also this
> 
> https://sourceware.org/glibc/wiki/libmvec
> "For x86_64 vector functions names are created based on #2.6. Vector
> Function Name Mangling from Vector ABI"
> 
> https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
> 
> Which explicitly handles this case and explicitly mentions OMP.
> "Vector Function ABI provides ABI for vector functions generated by
> compiler supporting SIMD constructs of OpenMP 4.0 [1]."
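
To make the naming scheme above concrete, here is a minimal sketch of how a per-ISA symbol name can be derived from a scalar function name. It assumes the simple form of the x86_64 Vector ABI mangling, "_ZGV" + ISA letter + mask letter + vector length + parameter kinds + "_" + scalar name, which is the shape of libmvec symbols such as _ZGVbN2v_sin. The helper name vector_variant_name and the ISA_CODES table are hypothetical, chosen only for illustration; strides, alignment suffixes and other parameter annotations from the ABI document are not handled.

```python
# Sketch only: covers the simple case of the x86_64 Vector ABI mangling.
# ISA letters as they appear in libmvec symbol names.
ISA_CODES = {"sse": "b", "avx": "c", "avx2": "d", "avx512": "e"}


def vector_variant_name(scalar_name, isa, lanes, params="v", masked=False):
    """Return '_ZGV<isa><mask><lanes><params>_<scalar_name>'.

    params is passed through unchanged: one letter per argument,
    e.g. 'v' for a vector argument ('vv' for two of them).
    """
    if isa not in ISA_CODES:
        raise ValueError(f"unknown ISA: {isa!r}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    mask = "M" if masked else "N"  # M = masked variant, N = unmasked
    return f"_ZGV{ISA_CODES[isa]}{mask}{lanes}{params}_{scalar_name}"


if __name__ == "__main__":
    # One scalar function, several target-specific symbols in one object.
    for isa, lanes in [("sse", 2), ("avx2", 4), ("avx512", 8)]:
        print(vector_variant_name("sin", isa, lanes))
```

Because the target is encoded in the symbol itself, variants for different ISAs can coexist in one binary without colliding, which is the property the prefixing proposal relies on.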
> 
> 
> it may also be worthwhile looking at existing solutions more closely
> https://gcc.gnu.org/wiki/FunctionSpecificOpt
> https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Function-Attributes.html
> 
> "The target attribute is used to specify that a function is to be
> compiled with different target options than specified on the command
> line. This can be used for instance to have functions compiled with a
> different ISA"
> -------------
> Side note on ARM - Their semi-unified ISA is actually the "right way"
> to go. It's imho a good thing to have the vector or gpu instructions
> unified as a direct extension to the scalar stuff. I won't go into low
> level details why, but in the end that design would win over one where
> unified memory is possible, but separate load/store is required for
> left side to talk to right side. (iow ARM thumb is imho not the
> problem.. it's objdump - anyway objdump can't handle nvidia SASS/PTX
> blobs so it's probably more a "nice to have" instead of absolute
> blocker)

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory