[flang-commits] [flang] 4ad7279 - [flang] CUDA Fortran - part 1/5: parsing

Peter Klausler via flang-commits flang-commits at lists.llvm.org
Wed May 31 09:49:06 PDT 2023


Author: Peter Klausler
Date: 2023-05-31T09:48:59-07:00
New Revision: 4ad7279392653c0bcf564799ffb3f7e20ed4ef00

URL: https://github.com/llvm/llvm-project/commit/4ad7279392653c0bcf564799ffb3f7e20ed4ef00
DIFF: https://github.com/llvm/llvm-project/commit/4ad7279392653c0bcf564799ffb3f7e20ed4ef00.diff

LOG: [flang] CUDA Fortran - part 1/5: parsing

Begin upstreaming of CUDA Fortran support in LLVM Flang.

This first patch implements parsing for CUDA Fortran syntax,
including:
 - a new LanguageFeature enum value for CUDA Fortran
 - driver change to enable that feature for *.cuf and *.CUF source files
 - parse tree representation of CUDA Fortran syntax
 - dumping and unparsing of the parse tree
 - the actual parsers for CUDA Fortran syntax
 - prescanning support for !@CUF and !$CUF
 - basic sanity testing via unparsing and parse tree dumps

... along with any minimized changes elsewhere to make these
work, mostly no-op cases in common::visitors instances in
semantics and lowering to allow them to compile in the face
of new types in variant<> instances in the parse tree.

Because CUDA Fortran allows the kernel launch chevron syntax
("call foo<<<blocks, threads>>>()") only on CALL statements and
not on function references, the parse tree nodes for CallStmt,
FunctionReference, and their shared Call were rearranged a bit;
this caused a fair amount of one-line changes in many files.

More patches will follow that implement CUDA Fortran in the symbol
table and name resolution, and then semantic checking.

Differential Revision: https://reviews.llvm.org/D150159

Added: 
    flang/test/Parser/cuf-sanity-common
    flang/test/Parser/cuf-sanity-tree.CUF
    flang/test/Parser/cuf-sanity-unparse.CUF

Modified: 
    flang/docs/ParserCombinators.md
    flang/include/flang/Common/Fortran-features.h
    flang/include/flang/Common/Fortran.h
    flang/include/flang/Common/indirection.h
    flang/include/flang/Common/template.h
    flang/include/flang/Frontend/FrontendOptions.h
    flang/include/flang/Parser/dump-parse-tree.h
    flang/include/flang/Parser/message.h
    flang/include/flang/Parser/parse-tree-visitor.h
    flang/include/flang/Parser/parse-tree.h
    flang/lib/Common/Fortran.cpp
    flang/lib/Frontend/FrontendAction.cpp
    flang/lib/Frontend/FrontendOptions.cpp
    flang/lib/Lower/Allocatable.cpp
    flang/lib/Lower/Bridge.cpp
    flang/lib/Lower/PFTBuilder.cpp
    flang/lib/Parser/Fortran-parsers.cpp
    flang/lib/Parser/basic-parsers.h
    flang/lib/Parser/executable-parsers.cpp
    flang/lib/Parser/io-parsers.cpp
    flang/lib/Parser/misc-parsers.h
    flang/lib/Parser/parse-tree.cpp
    flang/lib/Parser/parsing.cpp
    flang/lib/Parser/preprocessor.cpp
    flang/lib/Parser/prescan.cpp
    flang/lib/Parser/program-parsers.cpp
    flang/lib/Parser/stmt-parser.h
    flang/lib/Parser/unparse.cpp
    flang/lib/Semantics/check-allocate.cpp
    flang/lib/Semantics/check-do-forall.cpp
    flang/lib/Semantics/expression.cpp
    flang/lib/Semantics/resolve-names.cpp
    flang/lib/Semantics/tools.cpp
    flang/test/lib/lit.local.cfg
    flang/test/lit.cfg.py

Removed: 
    


################################################################################
diff  --git a/flang/docs/ParserCombinators.md b/flang/docs/ParserCombinators.md
index bb17509d16015..b00347396471e 100644
--- a/flang/docs/ParserCombinators.md
+++ b/flang/docs/ParserCombinators.md
@@ -97,8 +97,9 @@ They are `constexpr`, so they should be viewed as type-safe macros.
 * `nonemptySeparated(p, q)` repeatedly matches "p q p q p q ... p",
   returning a `std::list<>` of only the values of the p's.  It fails if
   p immediately fails.
-* `extension(p)` parses p if strict standard compliance is disabled,
-   or with a warning if nonstandard usage warnings are enabled.
+* `extension<feature>([msg,]p)` parses p if strict standard compliance is
+  disabled, or with an optional warning when nonstandard usage warnings
+  are enabled.
 * `deprecated(p)` parses p if strict standard compliance is disabled,
   with a warning if deprecated usage warnings are enabled.
 * `inContext(msg, p)` runs p within an error message context; any
@@ -165,9 +166,9 @@ is built.  All of the following parsers consume characters acquired from
    a longer identifier or keyword).
 * `parenthesized(p)` is shorthand for `"(" >> p / ")"`.
 * `bracketed(p)` is shorthand for `"[" >> p / "]"`.
-* `nonEmptyList(p)` matches a comma-separated list of one or more
+* `nonemptyList(p)` matches a comma-separated list of one or more
   instances of p.
-* `nonEmptyList(errorMessage, p)` is equivalent to
+* `nonemptyList(errorMessage, p)` is equivalent to
   `withMessage(errorMessage, nonemptyList(p))`, which allows one to supply
   a meaningful error message in the event of an empty list.
 * `optionalList(p)` is the same thing, but can be empty, and always succeeds.

diff  --git a/flang/include/flang/Common/Fortran-features.h b/flang/include/flang/Common/Fortran-features.h
index 2e33ec1df792c..8466c86dfff7e 100644
--- a/flang/include/flang/Common/Fortran-features.h
+++ b/flang/include/flang/Common/Fortran-features.h
@@ -27,7 +27,7 @@ ENUM_CLASS(LanguageFeature, BackslashEscapes, OldDebugLines,
     SignedPrimary, FileName, Carriagecontrol, Convert, Dispose,
     IOListLeadingComma, AbbreviatedEditDescriptor, ProgramParentheses,
     PercentRefAndVal, OmitFunctionDummies, CrayPointer, Hollerith, ArithmeticIF,
-    Assign, AssignedGOTO, Pause, OpenACC, OpenMP, CruftAfterAmpersand,
+    Assign, AssignedGOTO, Pause, OpenACC, OpenMP, CUDA, CruftAfterAmpersand,
     ClassicCComments, AdditionalFormats, BigIntLiterals, RealDoControls,
     EquivalenceNumericWithCharacter, EquivalenceNonDefaultNumeric,
     EquivalenceSameNonSequence, AdditionalIntrinsics, AnonymousParents,
@@ -54,6 +54,7 @@ class LanguageFeatureControl {
     disable_.set(LanguageFeature::OldDebugLines);
     disable_.set(LanguageFeature::OpenACC);
     disable_.set(LanguageFeature::OpenMP);
+    disable_.set(LanguageFeature::CUDA); // !@cuf
     disable_.set(LanguageFeature::ImplicitNoneTypeNever);
     disable_.set(LanguageFeature::ImplicitNoneTypeAlways);
     disable_.set(LanguageFeature::DefaultSave);

diff  --git a/flang/include/flang/Common/Fortran.h b/flang/include/flang/Common/Fortran.h
index 4a3e261373f37..59d82744fea71 100644
--- a/flang/include/flang/Common/Fortran.h
+++ b/flang/include/flang/Common/Fortran.h
@@ -15,6 +15,7 @@
 #include "enum-set.h"
 #include "idioms.h"
 #include <cinttypes>
+#include <optional>
 #include <string>
 
 namespace Fortran::common {
@@ -80,6 +81,12 @@ using Label = std::uint64_t;
 // Fortran arrays may have up to 15 dimensions (See Fortran 2018 section 5.4.6).
 static constexpr int maxRank{15};
 
+// CUDA subprogram attribute combinations
+ENUM_CLASS(CUDASubprogramAttrs, Host, Device, HostDevice, Global, Grid_Global)
+
+// CUDA data attributes; mutually exclusive
+ENUM_CLASS(CUDADataAttr, Constant, Device, Managed, Pinned, Shared, Texture)
+
 // Fortran names may have up to 63 characters (See Fortran 2018 C601).
 static constexpr int maxNameLen{63};
 
@@ -99,5 +106,8 @@ static constexpr IgnoreTKRSet ignoreTKRAll{IgnoreTKR::Type, IgnoreTKR::Kind,
     IgnoreTKR::Rank, IgnoreTKR::Device, IgnoreTKR::Managed};
 std::string AsFortran(IgnoreTKRSet);
 
+bool AreCompatibleCUDADataAttrs(
+    std::optional<CUDADataAttr>, std::optional<CUDADataAttr>, IgnoreTKRSet);
+
 } // namespace Fortran::common
 #endif // FORTRAN_COMMON_FORTRAN_H_

diff  --git a/flang/include/flang/Common/indirection.h b/flang/include/flang/Common/indirection.h
index 0bb7cc540a3c2..7348eb0473f07 100644
--- a/flang/include/flang/Common/indirection.h
+++ b/flang/include/flang/Common/indirection.h
@@ -148,6 +148,7 @@ template <typename A> class ForwardOwningPointer {
   A *operator->() const { return p_; }
   operator bool() const { return p_ != nullptr; }
   A *get() { return p_; }
+  auto get() const { return reinterpret_cast<std::add_const_t<A> *>(p_); }
   A *release() {
     A *result{p_};
     p_ = nullptr;

diff  --git a/flang/include/flang/Common/template.h b/flang/include/flang/Common/template.h
index 2a9958f74db38..2ab3b8bce1df9 100644
--- a/flang/include/flang/Common/template.h
+++ b/flang/include/flang/Common/template.h
@@ -94,8 +94,10 @@ constexpr int SearchMembers{
         TUPLEorVARIANT>::value()};
 
 template <typename A, typename TUPLEorVARIANT>
-constexpr bool HasMember{
-    SearchMembers<MatchType<A>::template Match, TUPLEorVARIANT> >= 0};
+constexpr int FindMember{
+    SearchMembers<MatchType<A>::template Match, TUPLEorVARIANT>};
+template <typename A, typename TUPLEorVARIANT>
+constexpr bool HasMember{FindMember<A, TUPLEorVARIANT> >= 0};
 
 // std::optional<std::optional<A>> -> std::optional<A>
 template <typename A>

diff  --git a/flang/include/flang/Frontend/FrontendOptions.h b/flang/include/flang/Frontend/FrontendOptions.h
index f24741b73e6a9..4691dfe87a731 100644
--- a/flang/include/flang/Frontend/FrontendOptions.h
+++ b/flang/include/flang/Frontend/FrontendOptions.h
@@ -113,6 +113,10 @@ bool isFreeFormSuffix(llvm::StringRef suffix);
 /// \return True if the file should be preprocessed
 bool isToBePreprocessed(llvm::StringRef suffix);
 
+/// \param suffix The file extension
+/// \return True if the file contains CUDA Fortran
+bool isCUDAFortranSuffix(llvm::StringRef suffix);
+
 enum class Language : uint8_t {
   Unknown,
 
@@ -182,6 +186,9 @@ class FrontendInputFile {
   /// sufficient to implement gfortran`s logic controlled with `-cpp/-nocpp`.
   unsigned mustBePreprocessed : 1;
 
+  /// Whether to enable CUDA Fortran language extensions
+  bool isCUDAFortran{false};
+
 public:
   FrontendInputFile() = default;
   FrontendInputFile(llvm::StringRef file, InputKind inKind)
@@ -193,6 +200,7 @@ class FrontendInputFile {
     std::string pathSuffix{file.substr(pathDotIndex + 1)};
     isFixedForm = isFixedFormSuffix(pathSuffix);
     mustBePreprocessed = isToBePreprocessed(pathSuffix);
+    isCUDAFortran = isCUDAFortranSuffix(pathSuffix);
   }
 
   FrontendInputFile(const llvm::MemoryBuffer *memBuf, InputKind inKind)
@@ -204,6 +212,7 @@ class FrontendInputFile {
   bool isFile() const { return (buffer == nullptr); }
   bool getIsFixedForm() const { return isFixedForm; }
   bool getMustBePreprocessed() const { return mustBePreprocessed; }
+  bool getIsCUDAFortran() const { return isCUDAFortran; }
 
   llvm::StringRef getFile() const {
     assert(isFile());

diff  --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h
index fa8db77b1ffbf..158064c2f3e31 100644
--- a/flang/include/flang/Parser/dump-parse-tree.h
+++ b/flang/include/flang/Parser/dump-parse-tree.h
@@ -14,6 +14,7 @@
 #include "parse-tree.h"
 #include "tools.h"
 #include "unparse.h"
+#include "flang/Common/Fortran.h"
 #include "flang/Common/idioms.h"
 #include "flang/Common/indirection.h"
 #include "llvm/Support/raw_ostream.h"
@@ -45,6 +46,8 @@ class ParseTreeDumper {
   NODE(std, string)
   NODE(std, int64_t)
   NODE(std, uint64_t)
+  NODE_ENUM(common, CUDADataAttr)
+  NODE_ENUM(common, CUDASubprogramAttrs)
   NODE(format, ControlEditDesc)
   NODE(format::ControlEditDesc, Kind)
   NODE(format, DerivedTypeDataEditDesc)
@@ -120,6 +123,8 @@ class ParseTreeDumper {
   NODE(parser, AllocOpt)
   NODE(AllocOpt, Mold)
   NODE(AllocOpt, Source)
+  NODE(AllocOpt, Stream)
+  NODE(AllocOpt, Pinned)
   NODE(parser, Allocatable)
   NODE(parser, AllocatableStmt)
   NODE(parser, AllocateCoarraySpec)
@@ -165,6 +170,7 @@ class ParseTreeDumper {
   NODE(parser, BoundsSpec)
   NODE(parser, Call)
   NODE(parser, CallStmt)
+  NODE(CallStmt, Chevrons)
   NODE(parser, CaseConstruct)
   NODE(CaseConstruct, Case)
   NODE(parser, CaseSelector)
@@ -216,6 +222,9 @@ class ParseTreeDumper {
   NODE(parser, ContinueStmt)
   NODE(parser, CriticalConstruct)
   NODE(parser, CriticalStmt)
+  NODE(parser, CUDAAttributesStmt)
+  NODE(parser, CUFKernelDoConstruct)
+  NODE(CUFKernelDoConstruct, Directive)
   NODE(parser, CycleStmt)
   NODE(parser, DataComponentDefStmt)
   NODE(parser, DataIDoObject)
@@ -610,6 +619,9 @@ class ParseTreeDumper {
   NODE(PrefixSpec, Non_Recursive)
   NODE(PrefixSpec, Pure)
   NODE(PrefixSpec, Recursive)
+  NODE(PrefixSpec, Attributes)
+  NODE(PrefixSpec, Launch_Bounds)
+  NODE(PrefixSpec, Cluster_Dims)
   NODE(parser, PrintStmt)
   NODE(parser, PrivateStmt)
   NODE(parser, PrivateOrSequence)

diff  --git a/flang/include/flang/Parser/message.h b/flang/include/flang/Parser/message.h
index 64d52b2447656..64b1298fb4969 100644
--- a/flang/include/flang/Parser/message.h
+++ b/flang/include/flang/Parser/message.h
@@ -54,6 +54,7 @@ class MessageFixedText {
   constexpr MessageFixedText &operator=(MessageFixedText &&) = default;
 
   CharBlock text() const { return text_; }
+  bool empty() const { return text_.empty(); }
   Severity severity() const { return severity_; }
   MessageFixedText &set_severity(Severity severity) {
     severity_ = severity;

diff  --git a/flang/include/flang/Parser/parse-tree-visitor.h b/flang/include/flang/Parser/parse-tree-visitor.h
index 073e71c6487b3..79ea29f4b7f32 100644
--- a/flang/include/flang/Parser/parse-tree-visitor.h
+++ b/flang/include/flang/Parser/parse-tree-visitor.h
@@ -568,17 +568,33 @@ template <typename M> void Walk(Designator &x, M &mutator) {
     mutator.Post(x);
   }
 }
-template <typename V> void Walk(const Call &x, V &visitor) {
+template <typename V> void Walk(const FunctionReference &x, V &visitor) {
   if (visitor.Pre(x)) {
     Walk(x.source, visitor);
-    Walk(x.t, visitor);
+    Walk(x.v, visitor);
     visitor.Post(x);
   }
 }
-template <typename M> void Walk(Call &x, M &mutator) {
+template <typename M> void Walk(FunctionReference &x, M &mutator) {
   if (mutator.Pre(x)) {
     Walk(x.source, mutator);
-    Walk(x.t, mutator);
+    Walk(x.v, mutator);
+    mutator.Post(x);
+  }
+}
+template <typename V> void Walk(const CallStmt &x, V &visitor) {
+  if (visitor.Pre(x)) {
+    Walk(x.source, visitor);
+    Walk(x.call, visitor);
+    Walk(x.chevrons, visitor);
+    visitor.Post(x);
+  }
+}
+template <typename M> void Walk(CallStmt &x, M &mutator) {
+  if (mutator.Pre(x)) {
+    Walk(x.source, mutator);
+    Walk(x.call, mutator);
+    Walk(x.chevrons, mutator);
     mutator.Post(x);
   }
 }

diff  --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h
index 23f1fafc98ae0..2742bb0fbc09e 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -67,7 +67,7 @@ class DerivedTypeSpec;
 namespace Fortran::evaluate {
 struct GenericExprWrapper; // forward definition, wraps Expr<SomeType>
 struct GenericAssignmentWrapper; // forward definition, represent assignment
-class ProcedureRef; // forward definition, represents a CALL statement
+class ProcedureRef; // forward definition, represents a CALL or function ref
 } // namespace Fortran::evaluate
 
 // Most non-template classes in this file use these default definitions
@@ -253,6 +253,7 @@ struct StmtFunctionStmt; // R1544
 // Directives, extensions, and deprecated statements
 struct CompilerDirective;
 struct BasedPointerStmt;
+struct CUDAAttributesStmt;
 struct StructureDef;
 struct ArithmeticIfStmt;
 struct AssignStmt;
@@ -264,6 +265,7 @@ struct OpenACCDeclarativeConstruct;
 struct OpenMPConstruct;
 struct OpenMPDeclarativeConstruct;
 struct OmpEndLoopDirective;
+struct CUFKernelDoConstruct;
 
 // Cooked character stream locations
 using Location = const char *;
@@ -361,6 +363,7 @@ EMPTY_CLASS(ErrorRecovery);
 //        pointer-stmt | protected-stmt | save-stmt | target-stmt |
 //        volatile-stmt | value-stmt | common-stmt | equivalence-stmt
 // Extension: (Cray) based POINTER statement
+// Extension: CUDA data attribute statement
 struct OtherSpecificationStmt {
   UNION_CLASS_BOILERPLATE(OtherSpecificationStmt);
   std::variant<common::Indirection<AccessStmt>,
@@ -374,7 +377,8 @@ struct OtherSpecificationStmt {
       common::Indirection<SaveStmt>, common::Indirection<TargetStmt>,
       common::Indirection<ValueStmt>, common::Indirection<VolatileStmt>,
       common::Indirection<CommonStmt>, common::Indirection<EquivalenceStmt>,
-      common::Indirection<BasedPointerStmt>>
+      common::Indirection<BasedPointerStmt>,
+      common::Indirection<CUDAAttributesStmt>>
       u;
 };
 
@@ -507,7 +511,8 @@ struct ActionStmt {
 //        action-stmt | associate-construct | block-construct |
 //        case-construct | change-team-construct | critical-construct |
 //        do-construct | if-construct | select-rank-construct |
-//        select-type-construct | where-construct | forall-construct
+//        select-type-construct | where-construct | forall-construct |
+// (CUDA) CUF-kernel-do-construct
 struct ExecutableConstruct {
   UNION_CLASS_BOILERPLATE(ExecutableConstruct);
   std::variant<Statement<ActionStmt>, common::Indirection<AssociateConstruct>,
@@ -524,7 +529,8 @@ struct ExecutableConstruct {
       common::Indirection<OpenACCConstruct>,
       common::Indirection<AccEndCombinedDirective>,
       common::Indirection<OpenMPConstruct>,
-      common::Indirection<OmpEndLoopDirective>>
+      common::Indirection<OmpEndLoopDirective>,
+      common::Indirection<CUFKernelDoConstruct>>
       u;
 };
 
@@ -977,14 +983,15 @@ struct ComponentArraySpec {
 // R738 component-attr-spec ->
 //        access-spec | ALLOCATABLE |
 //        CODIMENSION lbracket coarray-spec rbracket |
-//        CONTIGUOUS | DIMENSION ( component-array-spec ) | POINTER
+//        CONTIGUOUS | DIMENSION ( component-array-spec ) | POINTER |
+// (CUDA) CONSTANT | DEVICE | MANAGED | PINNED | SHARED | TEXTURE
 EMPTY_CLASS(Allocatable);
 EMPTY_CLASS(Pointer);
 EMPTY_CLASS(Contiguous);
 struct ComponentAttrSpec {
   UNION_CLASS_BOILERPLATE(ComponentAttrSpec);
   std::variant<AccessSpec, Allocatable, CoarraySpec, Contiguous,
-      ComponentArraySpec, Pointer, ErrorRecovery>
+      ComponentArraySpec, Pointer, common::CUDADataAttr, ErrorRecovery>
       u;
 };
 
@@ -1337,7 +1344,8 @@ struct IntentSpec {
 //        CODIMENSION lbracket coarray-spec rbracket | CONTIGUOUS |
 //        DIMENSION ( array-spec ) | EXTERNAL | INTENT ( intent-spec ) |
 //        INTRINSIC | language-binding-spec | OPTIONAL | PARAMETER | POINTER |
-//        PROTECTED | SAVE | TARGET | VALUE | VOLATILE
+//        PROTECTED | SAVE | TARGET | VALUE | VOLATILE |
+// (CUDA) CONSTANT | DEVICE | MANAGED | PINNED | SHARED | TEXTURE
 EMPTY_CLASS(Asynchronous);
 EMPTY_CLASS(External);
 EMPTY_CLASS(Intrinsic);
@@ -1352,7 +1360,8 @@ struct AttrSpec {
   UNION_CLASS_BOILERPLATE(AttrSpec);
   std::variant<AccessSpec, Allocatable, Asynchronous, CoarraySpec, Contiguous,
       ArraySpec, External, IntentSpec, Intrinsic, LanguageBindingSpec, Optional,
-      Parameter, Pointer, Protected, Save, Target, Value, Volatile>
+      Parameter, Pointer, Protected, Save, Target, Value, Volatile,
+      common::CUDADataAttr>
       u;
 };
 
@@ -1926,13 +1935,17 @@ struct StatOrErrmsg {
 
 // R928 alloc-opt ->
 //        ERRMSG = errmsg-variable | MOLD = source-expr |
-//        SOURCE = source-expr | STAT = stat-variable
+//        SOURCE = source-expr | STAT = stat-variable |
+// (CUDA) STREAM = scalar-int-expr
+//        PINNED = scalar-logical-variable
 // R931 source-expr -> expr
 struct AllocOpt {
   UNION_CLASS_BOILERPLATE(AllocOpt);
   WRAPPER_CLASS(Mold, common::Indirection<Expr>);
   WRAPPER_CLASS(Source, common::Indirection<Expr>);
-  std::variant<Mold, Source, StatOrErrmsg> u;
+  WRAPPER_CLASS(Stream, common::Indirection<ScalarIntExpr>);
+  WRAPPER_CLASS(Pinned, common::Indirection<ScalarLogicalVariable>);
+  std::variant<Mold, Source, StatOrErrmsg, Stream, Pinned> u;
 };
 
 // R927 allocate-stmt ->
@@ -3033,7 +3046,9 @@ struct ProcedureDeclarationStmt {
 
 // R1527 prefix-spec ->
 //         declaration-type-spec | ELEMENTAL | IMPURE | MODULE |
-//         NON_RECURSIVE | PURE | RECURSIVE
+//         NON_RECURSIVE | PURE | RECURSIVE |
+// (CUDA)  ATTRIBUTES ( (DEVICE | GLOBAL | GRID_GLOBAL | HOST)... )
+//         LAUNCH_BOUNDS(expr-list) | CLUSTER_DIMS(expr-list)
 struct PrefixSpec {
   UNION_CLASS_BOILERPLATE(PrefixSpec);
   EMPTY_CLASS(Elemental);
@@ -3042,8 +3057,11 @@ struct PrefixSpec {
   EMPTY_CLASS(Non_Recursive);
   EMPTY_CLASS(Pure);
   EMPTY_CLASS(Recursive);
+  WRAPPER_CLASS(Attributes, std::list<common::CUDASubprogramAttrs>);
+  WRAPPER_CLASS(Launch_Bounds, std::list<ScalarIntConstantExpr>);
+  WRAPPER_CLASS(Cluster_Dims, std::list<ScalarIntConstantExpr>);
   std::variant<DeclarationTypeSpec, Elemental, Impure, Module, Non_Recursive,
-      Pure, Recursive>
+      Pure, Recursive, Attributes, Launch_Bounds, Cluster_Dims>
       u;
 };
 
@@ -3172,23 +3190,39 @@ struct ActualArgSpec {
   std::tuple<std::optional<Keyword>, ActualArg> t;
 };
 
-// R1520 function-reference -> procedure-designator ( [actual-arg-spec-list] )
+// R1520 function-reference -> procedure-designator
+//         ( [actual-arg-spec-list] )
 struct Call {
   TUPLE_CLASS_BOILERPLATE(Call);
-  CharBlock source;
   std::tuple<ProcedureDesignator, std::list<ActualArgSpec>> t;
 };
 
 struct FunctionReference {
   WRAPPER_CLASS_BOILERPLATE(FunctionReference, Call);
+  CharBlock source;
   Designator ConvertToArrayElementRef();
   StructureConstructor ConvertToStructureConstructor(
       const semantics::DerivedTypeSpec &);
 };
 
-// R1521 call-stmt -> CALL procedure-designator [( [actual-arg-spec-list] )]
+// R1521 call-stmt -> CALL procedure-designator [ chevrons ]
+//         [( [actual-arg-spec-list] )]
+// (CUDA) chevrons -> <<< scalar-expr, scalar-expr [,
+//          scalar-int-expr [, scalar-int-expr ] ] >>>
 struct CallStmt {
-  WRAPPER_CLASS_BOILERPLATE(CallStmt, Call);
+  BOILERPLATE(CallStmt);
+  struct Chevrons {
+    TUPLE_CLASS_BOILERPLATE(Chevrons);
+    std::tuple<ScalarExpr, ScalarExpr, std::optional<ScalarIntExpr>,
+        std::optional<ScalarIntExpr>>
+        t;
+  };
+  explicit CallStmt(ProcedureDesignator &&pd, std::optional<Chevrons> &&ch,
+      std::list<ActualArgSpec> &&args)
+      : call{std::move(pd), std::move(args)}, chevrons{std::move(ch)} {}
+  Call call;
+  std::optional<Chevrons> chevrons;
+  CharBlock source;
   mutable common::ForwardOwningPointer<evaluate::ProcedureRef>
       typedCall; // filled by semantics
 };
@@ -3267,6 +3301,12 @@ struct CompilerDirective {
   std::variant<std::list<IgnoreTKR>, LoopCount, std::list<NameValue>> u;
 };
 
+// (CUDA) ATTRIBUTE(attribute) [::] name-list
+struct CUDAAttributesStmt {
+  TUPLE_CLASS_BOILERPLATE(CUDAAttributesStmt);
+  std::tuple<common::CUDADataAttr, std::list<Name>> t;
+};
+
 // Legacy extensions
 struct BasedPointer {
   TUPLE_CLASS_BOILERPLATE(BasedPointer);
@@ -4213,5 +4253,23 @@ struct OpenACCConstruct {
       u;
 };
 
+// CUF-kernel-do-construct ->
+//     !$CUF KERNEL DO [ (scalar-int-constant-expr) ] <<< grid, block [, stream]
+//     >>> do-construct
+// grid -> * | scalar-int-expr | ( scalar-int-expr-list )
+// block -> * | scalar-int-expr | ( scalar-int-expr-list )
+// stream -> 0, scalar-int-expr | STREAM = scalar-int-expr
+struct CUFKernelDoConstruct {
+  TUPLE_CLASS_BOILERPLATE(CUFKernelDoConstruct);
+  struct Directive {
+    TUPLE_CLASS_BOILERPLATE(Directive);
+    CharBlock source;
+    std::tuple<std::optional<ScalarIntConstantExpr>, std::list<ScalarIntExpr>,
+        std::list<ScalarIntExpr>, std::optional<ScalarIntExpr>>
+        t;
+  };
+  std::tuple<Directive, std::optional<DoConstruct>> t;
+};
+
 } // namespace Fortran::parser
 #endif // FORTRAN_PARSER_PARSE_TREE_H_

diff  --git a/flang/lib/Common/Fortran.cpp b/flang/lib/Common/Fortran.cpp
index e8d8fef9c49db..27ff31ef78da2 100644
--- a/flang/lib/Common/Fortran.cpp
+++ b/flang/lib/Common/Fortran.cpp
@@ -97,4 +97,23 @@ std::string AsFortran(IgnoreTKRSet tkr) {
   return result;
 }
 
+bool AreCompatibleCUDADataAttrs(std::optional<CUDADataAttr> x,
+    std::optional<CUDADataAttr> y, IgnoreTKRSet ignoreTKR) {
+  if (!x && !y) {
+    return true;
+  } else if (x && y && *x == *y) {
+    return true;
+  } else if (ignoreTKR.test(IgnoreTKR::Device) &&
+      x.value_or(CUDADataAttr::Device) == CUDADataAttr::Device &&
+      y.value_or(CUDADataAttr::Device) == CUDADataAttr::Device) {
+    return true;
+  } else if (ignoreTKR.test(IgnoreTKR::Managed) &&
+      x.value_or(CUDADataAttr::Managed) == CUDADataAttr::Managed &&
+      y.value_or(CUDADataAttr::Managed) == CUDADataAttr::Managed) {
+    return true;
+  } else {
+    return false;
+  }
+}
+
 } // namespace Fortran::common

diff  --git a/flang/lib/Frontend/FrontendAction.cpp b/flang/lib/Frontend/FrontendAction.cpp
index d9fe1232e29be..02052fc5ae41c 100644
--- a/flang/lib/Frontend/FrontendAction.cpp
+++ b/flang/lib/Frontend/FrontendAction.cpp
@@ -86,6 +86,10 @@ bool FrontendAction::beginSourceFile(CompilerInstance &ci,
     invoc.collectMacroDefinitions();
   }
 
+  // Enable CUDA Fortran if source file is *.cuf/*.CUF.
+  invoc.getFortranOpts().features.Enable(Fortran::common::LanguageFeature::CUDA,
+                                         getCurrentInput().getIsCUDAFortran());
+
   // Decide between fixed and free form (if the user didn't express any
   // preference, use the file extension to decide)
   if (invoc.getFrontendOpts().fortranForm == FortranForm::Unknown) {

diff  --git a/flang/lib/Frontend/FrontendOptions.cpp b/flang/lib/Frontend/FrontendOptions.cpp
index 504fac6cd6fb9..2141dacc6df7d 100644
--- a/flang/lib/Frontend/FrontendOptions.cpp
+++ b/flang/lib/Frontend/FrontendOptions.cpp
@@ -23,17 +23,22 @@ bool Fortran::frontend::isFixedFormSuffix(llvm::StringRef suffix) {
 
 bool Fortran::frontend::isFreeFormSuffix(llvm::StringRef suffix) {
   // Note: Keep this list in-sync with flang/test/lit.cfg.py
-  // TODO: Add Cuda Fortan files (i.e. `*.cuf` and `*.CUF`).
   return suffix == "f90" || suffix == "F90" || suffix == "ff90" ||
          suffix == "f95" || suffix == "F95" || suffix == "ff95" ||
          suffix == "f03" || suffix == "F03" || suffix == "f08" ||
-         suffix == "F08" || suffix == "f18" || suffix == "F18";
+         suffix == "F08" || suffix == "f18" || suffix == "F18" ||
+         suffix == "cuf" || suffix == "CUF";
 }
 
 bool Fortran::frontend::isToBePreprocessed(llvm::StringRef suffix) {
   return suffix == "F" || suffix == "FOR" || suffix == "fpp" ||
          suffix == "FPP" || suffix == "F90" || suffix == "F95" ||
-         suffix == "F03" || suffix == "F08" || suffix == "F18";
+         suffix == "F03" || suffix == "F08" || suffix == "F18" ||
+         suffix == "CUF";
+}
+
+bool Fortran::frontend::isCUDAFortranSuffix(llvm::StringRef suffix) {
+  return suffix == "cuf" || suffix == "CUF";
 }
 
 InputKind FrontendOptions::getInputKindForExtension(llvm::StringRef extension) {

diff  --git a/flang/lib/Lower/Allocatable.cpp b/flang/lib/Lower/Allocatable.cpp
index 3300b66b5464f..2050ca1ab9d94 100644
--- a/flang/lib/Lower/Allocatable.cpp
+++ b/flang/lib/Lower/Allocatable.cpp
@@ -367,6 +367,12 @@ class AllocateStmtHelper {
               [&](const Fortran::parser::AllocOpt::Mold &mold) {
                 moldExpr = Fortran::semantics::GetExpr(mold.v.value());
               },
+              [&](const Fortran::parser::AllocOpt::Stream &) {
+                TODO(loc, "CUDA ALLOCATE(STREAM=)");
+              },
+              [&](const Fortran::parser::AllocOpt::Pinned &) {
+                TODO(loc, "CUDA ALLOCATE(PINNED=)");
+              },
           },
           allocOption.u);
   }

diff  --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 9d8e2a37cf510..4a8e84e0ece30 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1321,7 +1321,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   void genFIR(const Fortran::parser::CallStmt &stmt) {
     Fortran::lower::StatementContext stmtCtx;
     Fortran::lower::pft::Evaluation &eval = getEval();
-    setCurrentPosition(stmt.v.source);
+    setCurrentPosition(stmt.source);
     assert(stmt.typedCall && "Call was not analyzed");
     mlir::Value res{};
     if (lowerToHighLevelFIR()) {
@@ -1348,7 +1348,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
     llvm::SmallVector<Fortran::parser::Label> labelList;
     int64_t index = 0;
     for (const Fortran::parser::ActualArgSpec &arg :
-         std::get<std::list<Fortran::parser::ActualArgSpec>>(stmt.v.t)) {
+         std::get<std::list<Fortran::parser::ActualArgSpec>>(stmt.call.t)) {
       const auto &actual = std::get<Fortran::parser::ActualArg>(arg.t);
       if (const auto *altReturn =
               std::get_if<Fortran::parser::AltReturnSpec>(&actual.u)) {

diff  --git a/flang/lib/Lower/PFTBuilder.cpp b/flang/lib/Lower/PFTBuilder.cpp
index ec4c7218b58f6..560a9972148ea 100644
--- a/flang/lib/Lower/PFTBuilder.cpp
+++ b/flang/lib/Lower/PFTBuilder.cpp
@@ -726,7 +726,7 @@ class PFTBuilder {
           [&](const parser::CallStmt &s) {
             // Look for alternate return specifiers.
             const auto &args =
-                std::get<std::list<parser::ActualArgSpec>>(s.v.t);
+                std::get<std::list<parser::ActualArgSpec>>(s.call.t);
             for (const auto &arg : args) {
               const auto &actual = std::get<parser::ActualArg>(arg.t);
               if (const auto *altReturn =

diff  --git a/flang/lib/Parser/Fortran-parsers.cpp b/flang/lib/Parser/Fortran-parsers.cpp
index e6198ee651057..d7e01c924c6b3 100644
--- a/flang/lib/Parser/Fortran-parsers.cpp
+++ b/flang/lib/Parser/Fortran-parsers.cpp
@@ -451,13 +451,16 @@ TYPE_PARSER(construct<DataComponentDefStmt>(declarationTypeSpec,
 // R738 component-attr-spec ->
 //        access-spec | ALLOCATABLE |
 //        CODIMENSION lbracket coarray-spec rbracket |
-//        CONTIGUOUS | DIMENSION ( component-array-spec ) | POINTER
+//        CONTIGUOUS | DIMENSION ( component-array-spec ) | POINTER |
+//        CUDA-data-attr
 TYPE_PARSER(construct<ComponentAttrSpec>(accessSpec) ||
     construct<ComponentAttrSpec>(allocatable) ||
     construct<ComponentAttrSpec>("CODIMENSION" >> coarraySpec) ||
     construct<ComponentAttrSpec>(contiguous) ||
     construct<ComponentAttrSpec>("DIMENSION" >> Parser<ComponentArraySpec>{}) ||
     construct<ComponentAttrSpec>(pointer) ||
+    extension<LanguageFeature::CUDA>(
+        construct<ComponentAttrSpec>(Parser<common::CUDADataAttr>{})) ||
     construct<ComponentAttrSpec>(recovery(
         fail<ErrorRecovery>(
             "type parameter definitions must appear before component declarations"_err_en_US),
@@ -677,7 +680,8 @@ TYPE_PARSER(
 //        CODIMENSION lbracket coarray-spec rbracket | CONTIGUOUS |
 //        DIMENSION ( array-spec ) | EXTERNAL | INTENT ( intent-spec ) |
 //        INTRINSIC | language-binding-spec | OPTIONAL | PARAMETER | POINTER |
-//        PROTECTED | SAVE | TARGET | VALUE | VOLATILE
+//        PROTECTED | SAVE | TARGET | VALUE | VOLATILE |
+//        CUDA-data-attr
 TYPE_PARSER(construct<AttrSpec>(accessSpec) ||
     construct<AttrSpec>(allocatable) ||
     construct<AttrSpec>(construct<Asynchronous>("ASYNCHRONOUS"_tok)) ||
@@ -693,7 +697,17 @@ TYPE_PARSER(construct<AttrSpec>(accessSpec) ||
     construct<AttrSpec>(save) ||
     construct<AttrSpec>(construct<Target>("TARGET"_tok)) ||
     construct<AttrSpec>(construct<Value>("VALUE"_tok)) ||
-    construct<AttrSpec>(construct<Volatile>("VOLATILE"_tok)))
+    construct<AttrSpec>(construct<Volatile>("VOLATILE"_tok)) ||
+    extension<LanguageFeature::CUDA>(
+        construct<AttrSpec>(Parser<common::CUDADataAttr>{})))
+
+// CUDA-data-attr -> CONSTANT | DEVICE | MANAGED | PINNED | SHARED | TEXTURE
+TYPE_PARSER("CONSTANT" >> pure(common::CUDADataAttr::Constant) ||
+    "DEVICE" >> pure(common::CUDADataAttr::Device) ||
+    "MANAGED" >> pure(common::CUDADataAttr::Managed) ||
+    "PINNED" >> pure(common::CUDADataAttr::Pinned) ||
+    "SHARED" >> pure(common::CUDADataAttr::Shared) ||
+    "TEXTURE" >> pure(common::CUDADataAttr::Texture))
 
 // R804 object-name -> name
 constexpr auto objectName{name};
@@ -1181,13 +1195,20 @@ TYPE_CONTEXT_PARSER("ALLOCATE statement"_en_US,
 
 // R928 alloc-opt ->
 //        ERRMSG = errmsg-variable | MOLD = source-expr |
-//        SOURCE = source-expr | STAT = stat-variable
+//        SOURCE = source-expr | STAT = stat-variable |
+// (CUDA) STREAM = scalar-int-expr
+//        PINNED = scalar-logical-variable
 // R931 source-expr -> expr
 TYPE_PARSER(construct<AllocOpt>(
                 construct<AllocOpt::Mold>("MOLD =" >> indirect(expr))) ||
     construct<AllocOpt>(
         construct<AllocOpt::Source>("SOURCE =" >> indirect(expr))) ||
-    construct<AllocOpt>(statOrErrmsg))
+    construct<AllocOpt>(statOrErrmsg) ||
+    extension<LanguageFeature::CUDA>(
+        construct<AllocOpt>(construct<AllocOpt::Stream>(
+            "STREAM =" >> indirect(scalarIntExpr))) ||
+        construct<AllocOpt>(construct<AllocOpt::Pinned>(
+            "PINNED =" >> indirect(scalarLogicalVariable)))))
 
 // R929 stat-variable -> scalar-int-variable
 TYPE_PARSER(construct<StatVariable>(scalar(integer(variable))))
@@ -1239,14 +1260,12 @@ TYPE_PARSER(construct<StatOrErrmsg>("STAT =" >> statVariable) ||
 // !DIR$ IGNORE_TKR [ [(tkrdmac...)] name ]...
 // !DIR$ LOOP COUNT (n1[, n2]...)
 // !DIR$ name...
-constexpr auto beginDirective{skipStuffBeforeStatement >> "!"_ch};
 constexpr auto ignore_tkr{
     "DIR$ IGNORE_TKR" >> optionalList(construct<CompilerDirective::IgnoreTKR>(
                              maybe(parenthesized(many(letter))), name))};
 constexpr auto loopCount{
     "DIR$ LOOP COUNT" >> construct<CompilerDirective::LoopCount>(
                              parenthesized(nonemptyList(digitString64)))};
-
 TYPE_PARSER(beginDirective >>
     sourced(construct<CompilerDirective>(ignore_tkr) ||
         construct<CompilerDirective>(loopCount) ||
@@ -1262,6 +1281,12 @@ TYPE_PARSER(extension<LanguageFeature::CrayPointer>(
                          construct<BasedPointer>("(" >> objectName / ",",
                              objectName, maybe(Parser<ArraySpec>{}) / ")")))))
 
+// CUDA-attributes-stmt -> ATTRIBUTES (CUDA-data-attr) [::] name-list
+TYPE_PARSER(extension<LanguageFeature::CUDA>(construct<CUDAAttributesStmt>(
+    "ATTRIBUTES" >> parenthesized(Parser<common::CUDADataAttr>{}),
+    defaulted(
+        maybe("::"_tok) >> nonemptyList("expected names"_err_en_US, name)))))
+
 // Subtle: the name includes the surrounding slashes, which avoids
 // clashes with other uses of the name in the same scope.
 TYPE_PARSER(construct<StructureStmt>(

diff  --git a/flang/lib/Parser/basic-parsers.h b/flang/lib/Parser/basic-parsers.h
index 784bd770fa64f..515b5993d6737 100644
--- a/flang/lib/Parser/basic-parsers.h
+++ b/flang/lib/Parser/basic-parsers.h
@@ -852,6 +852,7 @@ template <LanguageFeature LF, typename PA> class NonstandardParser {
   constexpr NonstandardParser(const NonstandardParser &) = default;
   constexpr NonstandardParser(PA parser, MessageFixedText msg)
       : parser_{parser}, message_{msg} {}
+  constexpr NonstandardParser(PA parser) : parser_{parser} {}
   std::optional<resultType> Parse(ParseState &state) const {
     if (UserState * ustate{state.userState()}) {
       if (!ustate->features().IsEnabled(LF)) {
@@ -860,7 +861,7 @@ template <LanguageFeature LF, typename PA> class NonstandardParser {
     }
     auto at{state.GetLocation()};
     auto result{parser_.Parse(state)};
-    if (result) {
+    if (result && !message_.empty()) {
       state.Nonstandard(
           CharBlock{at, std::max(state.GetLocation(), at + 1)}, LF, message_);
     }
@@ -877,6 +878,11 @@ inline constexpr auto extension(MessageFixedText feature, PA parser) {
   return NonstandardParser<LF, PA>(parser, feature);
 }
 
+template <LanguageFeature LF, typename PA>
+inline constexpr auto extension(PA parser) {
+  return NonstandardParser<LF, PA>(parser);
+}
+
 // If a is a parser for some deprecated or deleted language feature LF,
 // deprecated<LF>(a) is a parser that is optionally enabled, sets a strict
 // conformance violation flag, and may emit a warning message, if enabled.

diff  --git a/flang/lib/Parser/executable-parsers.cpp b/flang/lib/Parser/executable-parsers.cpp
index 92e7d25d93d3b..56ca3ed4991de 100644
--- a/flang/lib/Parser/executable-parsers.cpp
+++ b/flang/lib/Parser/executable-parsers.cpp
@@ -9,6 +9,7 @@
 // Per-type parsers for executable statements
 
 #include "basic-parsers.h"
+#include "debug-parser.h"
 #include "expr-parsers.h"
 #include "misc-parsers.h"
 #include "stmt-parser.h"
@@ -30,29 +31,31 @@ namespace Fortran::parser {
 //        action-stmt | associate-construct | block-construct |
 //        case-construct | change-team-construct | critical-construct |
 //        do-construct | if-construct | select-rank-construct |
-//        select-type-construct | where-construct | forall-construct
-constexpr auto executableConstruct{
-    first(construct<ExecutableConstruct>(CapturedLabelDoStmt{}),
-        construct<ExecutableConstruct>(EndDoStmtForCapturedLabelDoStmt{}),
-        construct<ExecutableConstruct>(indirect(Parser<DoConstruct>{})),
-        // Attempt DO statements before assignment statements for better
-        // error messages in cases like "DO10I=1,(error)".
-        construct<ExecutableConstruct>(statement(actionStmt)),
-        construct<ExecutableConstruct>(indirect(Parser<AssociateConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<BlockConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<CaseConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<ChangeTeamConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<CriticalConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<IfConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<SelectRankConstruct>{})),
-        construct<ExecutableConstruct>(indirect(Parser<SelectTypeConstruct>{})),
-        construct<ExecutableConstruct>(indirect(whereConstruct)),
-        construct<ExecutableConstruct>(indirect(forallConstruct)),
-        construct<ExecutableConstruct>(indirect(ompEndLoopDirective)),
-        construct<ExecutableConstruct>(indirect(openmpConstruct)),
-        construct<ExecutableConstruct>(indirect(accEndCombinedDirective)),
-        construct<ExecutableConstruct>(indirect(openaccConstruct)),
-        construct<ExecutableConstruct>(indirect(compilerDirective)))};
+//        select-type-construct | where-construct | forall-construct |
+// (CUDA) CUF-kernel-do-construct
+constexpr auto executableConstruct{first(
+    construct<ExecutableConstruct>(CapturedLabelDoStmt{}),
+    construct<ExecutableConstruct>(EndDoStmtForCapturedLabelDoStmt{}),
+    construct<ExecutableConstruct>(indirect(Parser<DoConstruct>{})),
+    // Attempt DO statements before assignment statements for better
+    // error messages in cases like "DO10I=1,(error)".
+    construct<ExecutableConstruct>(statement(actionStmt)),
+    construct<ExecutableConstruct>(indirect(Parser<AssociateConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<BlockConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<CaseConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<ChangeTeamConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<CriticalConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<IfConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<SelectRankConstruct>{})),
+    construct<ExecutableConstruct>(indirect(Parser<SelectTypeConstruct>{})),
+    construct<ExecutableConstruct>(indirect(whereConstruct)),
+    construct<ExecutableConstruct>(indirect(forallConstruct)),
+    construct<ExecutableConstruct>(indirect(ompEndLoopDirective)),
+    construct<ExecutableConstruct>(indirect(openmpConstruct)),
+    construct<ExecutableConstruct>(indirect(accEndCombinedDirective)),
+    construct<ExecutableConstruct>(indirect(openaccConstruct)),
+    construct<ExecutableConstruct>(indirect(compilerDirective)),
+    construct<ExecutableConstruct>(indirect(Parser<CUFKernelDoConstruct>{})))};
 
 // R510 execution-part-construct ->
 //        executable-construct | format-stmt | entry-stmt | data-stmt
@@ -525,4 +528,28 @@ TYPE_CONTEXT_PARSER("UNLOCK statement"_en_US,
     construct<UnlockStmt>("UNLOCK (" >> lockVariable,
         defaulted("," >> nonemptyList(statOrErrmsg)) / ")"))
 
+// CUF-kernel-do-construct -> CUF-kernel-do-directive do-construct
+// CUF-kernel-do-directive ->
+//     !$CUF KERNEL DO [ (scalar-int-constant-expr) ] <<< grid, block [, stream]
+//     >>> do-construct
+// grid -> * | scalar-int-expr | ( scalar-int-expr-list )
+// block -> * | scalar-int-expr | ( scalar-int-expr-list )
+// stream -> ( 0, | STREAM = ) scalar-int-expr
+TYPE_PARSER(sourced(beginDirective >> "$CUF KERNEL DO"_tok >>
+    construct<CUFKernelDoConstruct::Directive>(
+        maybe(parenthesized(scalarIntConstantExpr)),
+        "<<<" >>
+            ("*" >> pure<std::list<ScalarIntExpr>>() ||
+                parenthesized(nonemptyList(scalarIntExpr)) ||
+                applyFunction(singletonList<ScalarIntExpr>, scalarIntExpr)),
+        "," >> ("*" >> pure<std::list<ScalarIntExpr>>() ||
+                   parenthesized(nonemptyList(scalarIntExpr)) ||
+                   applyFunction(singletonList<ScalarIntExpr>, scalarIntExpr)),
+        maybe((", 0 ,"_tok || ", STREAM ="_tok) >> scalarIntExpr) / ">>>" /
+            endDirective)))
+TYPE_CONTEXT_PARSER("!$CUF KERNEL DO construct"_en_US,
+    extension<LanguageFeature::CUDA>(construct<CUFKernelDoConstruct>(
+        Parser<CUFKernelDoConstruct::Directive>{},
+        maybe(Parser<DoConstruct>{}))))
+
 } // namespace Fortran::parser

diff  --git a/flang/lib/Parser/io-parsers.cpp b/flang/lib/Parser/io-parsers.cpp
index 538f03dfdbdcf..8c082810d8a46 100644
--- a/flang/lib/Parser/io-parsers.cpp
+++ b/flang/lib/Parser/io-parsers.cpp
@@ -301,11 +301,6 @@ TYPE_PARSER(first(construct<WaitSpec>(maybe("UNIT ="_tok) >> fileUnitNumber),
     construct<WaitSpec>("IOMSG =" >> msgVariable),
     construct<WaitSpec>("IOSTAT =" >> statVariable)))
 
-template <typename A> common::IfNoLvalue<std::list<A>, A> singletonList(A &&x) {
-  std::list<A> result;
-  result.push_front(std::move(x));
-  return result;
-}
 constexpr auto bareUnitNumberAsList{
     applyFunction(singletonList<PositionOrFlushSpec>,
         construct<PositionOrFlushSpec>(fileUnitNumber))};

diff  --git a/flang/lib/Parser/misc-parsers.h b/flang/lib/Parser/misc-parsers.h
index b9d6ca5418023..e9b52b7d0fcd0 100644
--- a/flang/lib/Parser/misc-parsers.h
+++ b/flang/lib/Parser/misc-parsers.h
@@ -52,5 +52,10 @@ constexpr auto pointer{construct<Pointer>("POINTER"_tok)};
 constexpr auto protectedAttr{construct<Protected>("PROTECTED"_tok)};
 constexpr auto save{construct<Save>("SAVE"_tok)};
 
+template <typename A> common::IfNoLvalue<std::list<A>, A> singletonList(A &&x) {
+  std::list<A> result;
+  result.emplace_back(std::move(x));
+  return result;
+}
 } // namespace Fortran::parser
 #endif

diff  --git a/flang/lib/Parser/parse-tree.cpp b/flang/lib/Parser/parse-tree.cpp
index 38f08f139f3aa..7f0899aaa1429 100644
--- a/flang/lib/Parser/parse-tree.cpp
+++ b/flang/lib/Parser/parse-tree.cpp
@@ -132,7 +132,7 @@ static Expr ActualArgToExpr(ActualArgSpec &arg) {
                     },
                     [&](common::Indirection<FunctionReference> &z) {
                       return WithSource(
-                          z.value().v.source, Expr{std::move(z.value())});
+                          z.value().source, Expr{std::move(z.value())});
                     },
                 },
                 y.value().u);
@@ -151,10 +151,10 @@ Designator FunctionReference::ConvertToArrayElementRef() {
       common::visitors{
           [&](const Name &name) {
             return WithSource(
-                v.source, MakeArrayElementRef(name, std::move(args)));
+                source, MakeArrayElementRef(name, std::move(args)));
           },
           [&](ProcComponentRef &pcr) {
-            return WithSource(v.source,
+            return WithSource(source,
                 MakeArrayElementRef(std::move(pcr.v.thing), std::move(args)));
           },
       },
@@ -226,9 +226,10 @@ Statement<ActionStmt> StmtFunctionStmt::ConvertToAssignment() {
   }
   CHECK(*source.end() == ')');
   source = CharBlock{source.begin(), source.end() + 1};
-  FunctionReference funcRef{WithSource(source,
+  FunctionReference funcRef{
       Call{ProcedureDesignator{Name{funcName.source, funcName.symbol}},
-          std::move(actuals)})};
+          std::move(actuals)}};
+  funcRef.source = source;
   auto variable{Variable{common::Indirection{std::move(funcRef)}}};
   return Statement{std::nullopt,
       ActionStmt{common::Indirection{
@@ -242,7 +243,7 @@ CharBlock Variable::GetSource() const {
             return des.value().source;
           },
           [&](const common::Indirection<parser::FunctionReference> &call) {
-            return call.value().v.source;
+            return call.value().source;
           },
       },
       u);

diff  --git a/flang/lib/Parser/parsing.cpp b/flang/lib/Parser/parsing.cpp
index 1af8afef18caf..1f17ed8b69067 100644
--- a/flang/lib/Parser/parsing.cpp
+++ b/flang/lib/Parser/parsing.cpp
@@ -84,6 +84,11 @@ const SourceFile *Parsing::Prescan(const std::string &path, Options options) {
     prescanner.AddCompilerDirectiveSentinel("$omp");
     prescanner.AddCompilerDirectiveSentinel("$"); // OMP conditional line
   }
+  if (options.features.IsEnabled(LanguageFeature::CUDA)) {
+    prescanner.AddCompilerDirectiveSentinel("$cuf");
+    prescanner.AddCompilerDirectiveSentinel("@cuf");
+    preprocessor.Define("_CUDA", "1");
+  }
   ProvenanceRange range{allSources.AddIncludedFile(
       *sourceFile, ProvenanceRange{}, options.isModuleFile)};
   prescanner.Prescan(range);

diff  --git a/flang/lib/Parser/preprocessor.cpp b/flang/lib/Parser/preprocessor.cpp
index 6dea1298e77c6..1efe21ae18fe3 100644
--- a/flang/lib/Parser/preprocessor.cpp
+++ b/flang/lib/Parser/preprocessor.cpp
@@ -622,11 +622,12 @@ void Preprocessor::Directive(const TokenSequence &dir, Prescanner &prescanner) {
       TokenSequence braced{dir, j + 1, k - j - 1};
       include = braced.ToString();
       j = k;
-    } else if ((include = dir.TokenAt(j).ToString()).substr(0, 1) == "\"" &&
-        include.substr(include.size() - 1, 1) == "\"") { // #include "foo"
+    } else if (((include = dir.TokenAt(j).ToString()).substr(0, 1) == "\"" ||
+                   include.substr(0, 1) == "'") &&
+        include.substr(include.size() - 1, 1) == include.substr(0, 1)) {
+      // #include "foo" and #include 'foo'
       include = include.substr(1, include.size() - 2);
-      // #include "foo" starts search in directory of file containing
-      // the directive
+      // Start search in directory of file containing the directive
       auto prov{dir.GetTokenProvenanceRange(dirOffset).start()};
       if (const auto *currentFile{allSources_.GetSourceFile(prov)}) {
         prependPath = DirectoryName(currentFile->path());

diff  --git a/flang/lib/Parser/prescan.cpp b/flang/lib/Parser/prescan.cpp
index 2bbf1d67eb626..c42e8eaab7994 100644
--- a/flang/lib/Parser/prescan.cpp
+++ b/flang/lib/Parser/prescan.cpp
@@ -127,6 +127,17 @@ void Prescanner::Statement() {
       } else {
         SkipSpaces();
       }
+    } else if (directiveSentinel_[0] == '@' && directiveSentinel_[1] == 'c' &&
+        directiveSentinel_[2] == 'u' && directiveSentinel_[3] == 'f' &&
+        directiveSentinel_[4] == '\0') {
+      // CUDA conditional compilation line.  Remove the sentinel and then
+      // treat the line as if it were normal source.
+      at_ += 5, column_ += 5;
+      if (inFixedForm_) {
+        LabelField(tokens);
+      } else {
+        SkipSpaces();
+      }
     } else {
       // Compiler directive.  Emit normalized sentinel.
       EmitChar(tokens, '!');

diff  --git a/flang/lib/Parser/program-parsers.cpp b/flang/lib/Parser/program-parsers.cpp
index 9a74b3b35318b..521ae43097adc 100644
--- a/flang/lib/Parser/program-parsers.cpp
+++ b/flang/lib/Parser/program-parsers.cpp
@@ -81,10 +81,10 @@ TYPE_CONTEXT_PARSER("specification part"_en_US,
 // are in contexts that impose constraints on the kinds of statements that
 // are allowed, and so we have a variant production for declaration-construct
 // that implements those constraints.
-constexpr auto execPartLookAhead{
-    first(actionStmt >> ok, openaccConstruct >> ok, openmpConstruct >> ok,
-        "ASSOCIATE ("_tok, "BLOCK"_tok, "SELECT"_tok, "CHANGE TEAM"_sptok,
-        "CRITICAL"_tok, "DO"_tok, "IF ("_tok, "WHERE ("_tok, "FORALL ("_tok)};
+constexpr auto execPartLookAhead{first(actionStmt >> ok, openaccConstruct >> ok,
+    openmpConstruct >> ok, "ASSOCIATE ("_tok, "BLOCK"_tok, "SELECT"_tok,
+    "CHANGE TEAM"_sptok, "CRITICAL"_tok, "DO"_tok, "IF ("_tok, "WHERE ("_tok,
+    "FORALL ("_tok, "!$CUF"_tok)};
 constexpr auto declErrorRecovery{
     stmtErrorRecoveryStart >> !execPartLookAhead >> skipStmtErrorRecovery};
 constexpr auto misplacedSpecificationStmt{Parser<UseStmt>{} >>
@@ -168,7 +168,8 @@ TYPE_CONTEXT_PARSER("specification construct"_en_US,
 //        codimension-stmt | contiguous-stmt | dimension-stmt | external-stmt |
 //        intent-stmt | intrinsic-stmt | namelist-stmt | optional-stmt |
 //        pointer-stmt | protected-stmt | save-stmt | target-stmt |
-//        volatile-stmt | value-stmt | common-stmt | equivalence-stmt
+//        volatile-stmt | value-stmt | common-stmt | equivalence-stmt |
+// (CUDA) CUDA-attributes-stmt
 TYPE_PARSER(first(
     construct<OtherSpecificationStmt>(indirect(Parser<AccessStmt>{})),
     construct<OtherSpecificationStmt>(indirect(Parser<AllocatableStmt>{})),
@@ -190,7 +191,8 @@ TYPE_PARSER(first(
     construct<OtherSpecificationStmt>(indirect(Parser<VolatileStmt>{})),
     construct<OtherSpecificationStmt>(indirect(Parser<CommonStmt>{})),
     construct<OtherSpecificationStmt>(indirect(Parser<EquivalenceStmt>{})),
-    construct<OtherSpecificationStmt>(indirect(Parser<BasedPointerStmt>{}))))
+    construct<OtherSpecificationStmt>(indirect(Parser<BasedPointerStmt>{})),
+    construct<OtherSpecificationStmt>(indirect(Parser<CUDAAttributesStmt>{}))))
 
 // R1401 main-program ->
 //         [program-stmt] [specification-part] [execution-part]
@@ -422,16 +424,25 @@ TYPE_PARSER(
 TYPE_PARSER(
     "INTRINSIC" >> maybe("::"_tok) >> construct<IntrinsicStmt>(listOfNames))
 
-// R1520 function-reference -> procedure-designator ( [actual-arg-spec-list] )
+// R1520 function-reference -> procedure-designator
+//                               ( [actual-arg-spec-list] )
 TYPE_CONTEXT_PARSER("function reference"_en_US,
-    construct<FunctionReference>(
-        sourced(construct<Call>(Parser<ProcedureDesignator>{},
+    sourced(construct<FunctionReference>(
+        construct<Call>(Parser<ProcedureDesignator>{},
             parenthesized(optionalList(actualArgSpec))))) /
         !"["_tok)
 
-// R1521 call-stmt -> CALL procedure-designator [( [actual-arg-spec-list] )]
+// R1521 call-stmt -> CALL procedure-designator [chevrons]
+///                          [( [actual-arg-spec-list] )]
+// (CUDA) chevrons -> <<< scalar-expr, scalar-expr [, scalar-int-expr
+//                      [, scalar-int-expr ] ] >>>
+TYPE_PARSER(extension<LanguageFeature::CUDA>(
+    "<<<" >> construct<CallStmt::Chevrons>(scalarExpr, "," >> scalarExpr,
+                 maybe("," >> scalarIntExpr), maybe("," >> scalarIntExpr)) /
+        ">>>"))
 TYPE_PARSER(construct<CallStmt>(
-    sourced(construct<Call>("CALL" >> Parser<ProcedureDesignator>{},
+    sourced(construct<CallStmt>("CALL" >> Parser<ProcedureDesignator>{},
+        maybe(Parser<CallStmt::Chevrons>{}),
         defaulted(parenthesized(optionalList(actualArgSpec)))))))
 
 // R1522 procedure-designator ->
@@ -467,7 +478,13 @@ TYPE_PARSER(construct<AltReturnSpec>(star >> label))
 
 // R1527 prefix-spec ->
 //         declaration-type-spec | ELEMENTAL | IMPURE | MODULE |
-//         NON_RECURSIVE | PURE | RECURSIVE
+//         NON_RECURSIVE | PURE | RECURSIVE |
+// (CUDA)  ATTRIBUTES ( (DEVICE | GLOBAL | GRID_GLOBAL | HOST)... ) |
+//         LAUNCH_BOUNDS(expr-list) | CLUSTER_DIMS(expr-list)
+TYPE_PARSER(first("DEVICE" >> pure(common::CUDASubprogramAttrs::Device),
+    "GLOBAL" >> pure(common::CUDASubprogramAttrs::Global),
+    "GRID_GLOBAL" >> pure(common::CUDASubprogramAttrs::Grid_Global),
+    "HOST" >> pure(common::CUDASubprogramAttrs::Host)))
 TYPE_PARSER(first(construct<PrefixSpec>(declarationTypeSpec),
     construct<PrefixSpec>(construct<PrefixSpec::Elemental>("ELEMENTAL"_tok)),
     construct<PrefixSpec>(construct<PrefixSpec::Impure>("IMPURE"_tok)),
@@ -475,7 +492,19 @@ TYPE_PARSER(first(construct<PrefixSpec>(declarationTypeSpec),
     construct<PrefixSpec>(
         construct<PrefixSpec::Non_Recursive>("NON_RECURSIVE"_tok)),
     construct<PrefixSpec>(construct<PrefixSpec::Pure>("PURE"_tok)),
-    construct<PrefixSpec>(construct<PrefixSpec::Recursive>("RECURSIVE"_tok))))
+    construct<PrefixSpec>(construct<PrefixSpec::Recursive>("RECURSIVE"_tok)),
+    extension<LanguageFeature::CUDA>(
+        construct<PrefixSpec>(construct<PrefixSpec::Attributes>("ATTRIBUTES" >>
+            parenthesized(
+                optionalList(Parser<common::CUDASubprogramAttrs>{}))))),
+    extension<LanguageFeature::CUDA>(construct<PrefixSpec>(
+        construct<PrefixSpec::Launch_Bounds>("LAUNCH_BOUNDS" >>
+            parenthesized(nonemptyList(
+                "expected launch bounds"_err_en_US, scalarIntConstantExpr))))),
+    extension<LanguageFeature::CUDA>(construct<PrefixSpec>(
+        construct<PrefixSpec::Cluster_Dims>("CLUSTER_DIMS" >>
+            parenthesized(nonemptyList("expected cluster dimensions"_err_en_US,
+                scalarIntConstantExpr)))))))
 
 // R1529 function-subprogram ->
 //         function-stmt [specification-part] [execution-part]

diff  --git a/flang/lib/Parser/stmt-parser.h b/flang/lib/Parser/stmt-parser.h
index bc0073f487f46..ba647fd60d4ae 100644
--- a/flang/lib/Parser/stmt-parser.h
+++ b/flang/lib/Parser/stmt-parser.h
@@ -105,5 +105,9 @@ constexpr auto progUnitEndStmtErrorRecovery{
     (many(!"END"_tok >> SkipPast<'\n'>{}) >>
         ("END"_tok >> SkipTo<'\n'>{} || consumedAllInput)) >>
     missingOptionalName};
+
+constexpr auto beginDirective{skipStuffBeforeStatement >> "!"_ch};
+constexpr auto endDirective{space >> endOfLine};
+
 } // namespace Fortran::parser
 #endif // FORTRAN_PARSER_STMT_PARSER_H_

diff  --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp
index 4b6c03cd26dfb..304ff96c66620 100644
--- a/flang/lib/Parser/unparse.cpp
+++ b/flang/lib/Parser/unparse.cpp
@@ -820,6 +820,8 @@ class UnparseVisitor {
     common::visit(common::visitors{
                       [&](const AllocOpt::Mold &) { Word("MOLD="); },
                       [&](const AllocOpt::Source &) { Word("SOURCE="); },
+                      [&](const AllocOpt::Stream &) { Word("STREAM="); },
+                      [&](const AllocOpt::Pinned &) { Word("PINNED="); },
                       [](const StatOrErrmsg &) {},
                   },
         x.u);
@@ -1685,19 +1687,26 @@ class UnparseVisitor {
   void Unparse(const IntrinsicStmt &x) { // R1519
     Word("INTRINSIC :: "), Walk(x.v, ", ");
   }
+  void Unparse(const CallStmt::Chevrons &x) { // CUDA
+    Walk(std::get<0>(x.t)); // grid
+    Word(","), Walk(std::get<1>(x.t)); // block
+    Walk(",", std::get<2>(x.t)); // bytes
+    Walk(",", std::get<3>(x.t)); // stream
+  }
   void Unparse(const FunctionReference &x) { // R1520
     Walk(std::get<ProcedureDesignator>(x.v.t));
     Put('('), Walk(std::get<std::list<ActualArgSpec>>(x.v.t), ", "), Put(')');
   }
   void Unparse(const CallStmt &x) { // R1521
-    if (asFortran_ && x.typedCall.get()) {
+    if (asFortran_ && x.typedCall.get() && !x.chevrons /*CUDA todo*/) {
       Put(' ');
       asFortran_->call(out_, *x.typedCall);
       Put('\n');
     } else {
-      const auto &pd{std::get<ProcedureDesignator>(x.v.t)};
-      const auto &args{std::get<std::list<ActualArgSpec>>(x.v.t)};
+      const auto &pd{std::get<ProcedureDesignator>(x.call.t)};
       Word("CALL "), Walk(pd);
+      Walk("<<<", x.chevrons, ">>>");
+      const auto &args{std::get<std::list<ActualArgSpec>>(x.call.t)};
       if (args.empty()) {
         if (std::holds_alternative<ProcComponentRef>(pd.u)) {
           Put("()"); // pgf90 crashes on CALL to tbp without parentheses
@@ -1726,6 +1735,15 @@ class UnparseVisitor {
   void Post(const PrefixSpec::Non_Recursive) { Word("NON_RECURSIVE"); }
   void Post(const PrefixSpec::Pure) { Word("PURE"); }
   void Post(const PrefixSpec::Recursive) { Word("RECURSIVE"); }
+  void Unparse(const PrefixSpec::Attributes &x) {
+    Word("ATTRIBUTES("), Walk(x.v), Word(")");
+  }
+  void Unparse(const PrefixSpec::Launch_Bounds &x) {
+    Word("LAUNCH_BOUNDS("), Walk(x.v), Word(")");
+  }
+  void Unparse(const PrefixSpec::Cluster_Dims &x) {
+    Word("CLUSTER_DIMS("), Walk(x.v), Word(")");
+  }
   void Unparse(const FunctionStmt &x) { // R1530
     Walk("", std::get<std::list<PrefixSpec>>(x.t), " ", " ");
     Word("FUNCTION "), Walk(std::get<Name>(x.t)), Put("(");
@@ -1870,9 +1888,6 @@ class UnparseVisitor {
     Walk(std::get<std::optional<AccDataModifier>>(x.t), ":");
     Walk(std::get<AccObjectList>(x.t));
   }
-  void Unparse(const AccDataModifier::Modifier &x) {
-    Word(AccDataModifier::EnumToString(x));
-  }
   void Unparse(const AccBindClause &x) {
     common::visit(common::visitors{
                       [&](const Name &y) { Put('('), Walk(y), Put(')'); },
@@ -1966,9 +1981,6 @@ class UnparseVisitor {
         x.u);
   }
   void Unparse(const AccObjectList &x) { Walk(x.v, ","); }
-  void Unparse(const AccReductionOperator::Operator &x) {
-    Word(AccReductionOperator::EnumToString(x));
-  }
   void Unparse(const AccObjectListWithReduction &x) {
     Walk(std::get<AccReductionOperator>(x.t));
     Put(":");
@@ -2613,6 +2625,10 @@ class UnparseVisitor {
     Walk("(", std::get<std::optional<ArraySpec>>(x.t), ")"), Put(')');
   }
   void Unparse(const BasedPointerStmt &x) { Walk("POINTER ", x.v, ","); }
+  void Unparse(const CUDAAttributesStmt &x) {
+    Word("ATTRIBUTES("), Walk(std::get<common::CUDADataAttr>(x.t));
+    Word(") "), Walk(std::get<std::list<Name>>(x.t), ", ");
+  }
   void Post(const StructureField &x) {
     if (const auto *def{std::get_if<Statement<DataComponentDefStmt>>(&x.u)}) {
       for (const auto &item :
@@ -2658,8 +2674,12 @@ class UnparseVisitor {
 
 #define WALK_NESTED_ENUM(CLASS, ENUM) \
   void Unparse(const CLASS::ENUM &x) { Word(CLASS::EnumToString(x)); }
+  WALK_NESTED_ENUM(AccDataModifier, Modifier)
   WALK_NESTED_ENUM(AccessSpec, Kind) // R807
+  WALK_NESTED_ENUM(AccReductionOperator, Operator)
   WALK_NESTED_ENUM(common, TypeParamAttr) // R734
+  WALK_NESTED_ENUM(common, CUDADataAttr) // CUDA
+  WALK_NESTED_ENUM(common, CUDASubprogramAttrs) // CUDA
   WALK_NESTED_ENUM(IntentSpec, Intent) // R826
   WALK_NESTED_ENUM(ImplicitStmt, ImplicitNoneNameSpec) // R866
   WALK_NESTED_ENUM(ConnectSpec::CharExpr, Kind) // R1205
@@ -2686,6 +2706,38 @@ class UnparseVisitor {
   WALK_NESTED_ENUM(OmpOrderModifier, Kind) // OMP order-modifier
 #undef WALK_NESTED_ENUM
 
+  void Unparse(const CUFKernelDoConstruct::Directive &x) {
+    Word("!$CUF KERNEL DO");
+    Walk(" (", std::get<std::optional<ScalarIntConstantExpr>>(x.t), ")");
+    Word(" <<<");
+    const auto &grid{std::get<1>(x.t)};
+    if (grid.empty()) {
+      Word("*");
+    } else if (grid.size() == 1) {
+      Walk(grid.front());
+    } else {
+      Walk("(", grid, ",", ")");
+    }
+    Word(",");
+    const auto &block{std::get<2>(x.t)};
+    if (block.empty()) {
+      Word("*");
+    } else if (block.size() == 1) {
+      Walk(block.front());
+    } else {
+      Walk("(", block, ",", ")");
+    }
+    if (const auto &stream{std::get<3>(x.t)}) {
+      Word(",STREAM="), Walk(*stream);
+    }
+    Word(">>>\n");
+  }
+
+  void Unparse(const CUFKernelDoConstruct &x) {
+    Walk(std::get<CUFKernelDoConstruct::Directive>(x.t));
+    Walk(std::get<std::optional<DoConstruct>>(x.t));
+  }
+
   void Done() const { CHECK(indent_ == 0); }
 
 private:

diff  --git a/flang/lib/Semantics/check-allocate.cpp b/flang/lib/Semantics/check-allocate.cpp
index fa1951d770f16..00ee0d0954f91 100644
--- a/flang/lib/Semantics/check-allocate.cpp
+++ b/flang/lib/Semantics/check-allocate.cpp
@@ -179,6 +179,8 @@ static std::optional<AllocateCheckerInfo> CheckAllocateOptions(
               parserSourceExpr = &mold.v.value();
               info.gotMold = true;
             },
+            [](const parser::AllocOpt::Stream &) { /* CUDA coming */ },
+            [](const parser::AllocOpt::Pinned &) { /* CUDA coming */ },
         },
         allocOpt.u);
   }

diff  --git a/flang/lib/Semantics/check-do-forall.cpp b/flang/lib/Semantics/check-do-forall.cpp
index 7f61d2fc148ed..8ba301d773f60 100644
--- a/flang/lib/Semantics/check-do-forall.cpp
+++ b/flang/lib/Semantics/check-do-forall.cpp
@@ -975,7 +975,7 @@ static void CheckIfArgIsDoVar(const evaluate::ActualArgument &arg,
 void DoForallChecker::Leave(const parser::CallStmt &callStmt) {
   if (const auto &typedCall{callStmt.typedCall}) {
     const auto &parsedArgs{
-        std::get<std::list<parser::ActualArgSpec>>(callStmt.v.t)};
+        std::get<std::list<parser::ActualArgSpec>>(callStmt.call.t)};
     auto parsedArgIter{parsedArgs.begin()};
     const evaluate::ActualArguments &checkedArgs{typedCall->arguments()};
     for (const auto &checkedOptionalArg : checkedArgs) {

diff  --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index 9e95411a833e3..28c4ba16ae926 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -2722,8 +2722,8 @@ bool ExpressionAnalyzer::CheckIsValidForwardReference(
 MaybeExpr ExpressionAnalyzer::Analyze(const parser::FunctionReference &funcRef,
     std::optional<parser::StructureConstructor> *structureConstructor) {
   const parser::Call &call{funcRef.v};
-  auto restorer{GetContextualMessages().SetLocation(call.source)};
-  ArgumentAnalyzer analyzer{*this, call.source, true /* isProcedureCall */};
+  auto restorer{GetContextualMessages().SetLocation(funcRef.source)};
+  ArgumentAnalyzer analyzer{*this, funcRef.source, true /* isProcedureCall */};
   for (const auto &arg : std::get<std::list<parser::ActualArgSpec>>(call.t)) {
     analyzer.Analyze(arg, false /* not subroutine call */);
   }
@@ -2736,7 +2736,7 @@ MaybeExpr ExpressionAnalyzer::Analyze(const parser::FunctionReference &funcRef,
               true /* might be structure constructor */)}) {
     if (auto *proc{std::get_if<ProcedureDesignator>(&callee->u)}) {
       return MakeFunctionRef(
-          call.source, std::move(*proc), std::move(callee->arguments));
+          funcRef.source, std::move(*proc), std::move(callee->arguments));
     }
     CHECK(std::holds_alternative<semantics::SymbolRef>(callee->u));
     const Symbol &symbol{*std::get<semantics::SymbolRef>(callee->u)};
@@ -2778,9 +2778,9 @@ static bool HasAlternateReturns(const evaluate::ActualArguments &args) {
 }
 
 void ExpressionAnalyzer::Analyze(const parser::CallStmt &callStmt) {
-  const parser::Call &call{callStmt.v};
-  auto restorer{GetContextualMessages().SetLocation(call.source)};
-  ArgumentAnalyzer analyzer{*this, call.source, true /* isProcedureCall */};
+  const parser::Call &call{callStmt.call};
+  auto restorer{GetContextualMessages().SetLocation(callStmt.source)};
+  ArgumentAnalyzer analyzer{*this, callStmt.source, true /* isProcedureCall */};
   const auto &actualArgList{std::get<std::list<parser::ActualArgSpec>>(call.t)};
   for (const auto &arg : actualArgList) {
     analyzer.Analyze(arg, true /* is subroutine call */);
@@ -2791,7 +2791,7 @@ void ExpressionAnalyzer::Analyze(const parser::CallStmt &callStmt) {
                 analyzer.GetActuals(), true /* subroutine */)}) {
       ProcedureDesignator *proc{std::get_if<ProcedureDesignator>(&callee->u)};
       CHECK(proc);
-      if (CheckCall(call.source, *proc, callee->arguments)) {
+      if (CheckCall(callStmt.source, *proc, callee->arguments)) {
         callStmt.typedCall.Reset(
             new ProcedureRef{std::move(*proc), std::move(callee->arguments),
                 HasAlternateReturns(callee->arguments)},
@@ -3284,7 +3284,7 @@ static bool CheckFuncRefToArrayElement(semantics::SemanticsContext &context,
   } else if (name->symbol->Rank() == 0) {
     if (const Symbol *function{
             semantics::IsFunctionResultWithSameNameAsFunction(*name->symbol)}) {
-      auto &msg{context.Say(funcRef.v.source,
+      auto &msg{context.Say(funcRef.source,
           function->flags().test(Symbol::Flag::StmtFunction)
               ? "Recursive call to statement function '%s' is not allowed"_err_en_US
               : "Recursive call to '%s' requires a distinct RESULT in its declaration"_err_en_US,
@@ -3295,7 +3295,7 @@ static bool CheckFuncRefToArrayElement(semantics::SemanticsContext &context,
     return false;
   } else {
     if (std::get<std::list<parser::ActualArgSpec>>(funcRef.v.t).empty()) {
-      auto &msg{context.Say(funcRef.v.source,
+      auto &msg{context.Say(funcRef.source,
           "Reference to array '%s' with empty subscript list"_err_en_US,
           name->source)};
       if (name->symbol) {

diff  --git a/flang/lib/Semantics/resolve-names.cpp b/flang/lib/Semantics/resolve-names.cpp
index d7ea003bf905a..373aa5527489f 100644
--- a/flang/lib/Semantics/resolve-names.cpp
+++ b/flang/lib/Semantics/resolve-names.cpp
@@ -6869,7 +6869,7 @@ bool ResolveNamesVisitor::Pre(const parser::FunctionReference &x) {
   return false;
 }
 bool ResolveNamesVisitor::Pre(const parser::CallStmt &x) {
-  HandleCall(Symbol::Flag::Subroutine, x.v);
+  HandleCall(Symbol::Flag::Subroutine, x.call);
   return false;
 }
 
@@ -8085,7 +8085,7 @@ class ExecutionPartSkimmer {
     resolver_.NoteExecutablePartCall(Symbol::Flag::Function, fr.v);
   }
   void Post(const parser::CallStmt &cs) {
-    resolver_.NoteExecutablePartCall(Symbol::Flag::Subroutine, cs.v);
+    resolver_.NoteExecutablePartCall(Symbol::Flag::Subroutine, cs.call);
   }
 
 private:

diff  --git a/flang/lib/Semantics/tools.cpp b/flang/lib/Semantics/tools.cpp
index d7ef29951e8cc..cab9b8495cfc4 100644
--- a/flang/lib/Semantics/tools.cpp
+++ b/flang/lib/Semantics/tools.cpp
@@ -933,11 +933,12 @@ class ImageControlStmtHelper {
   }
   bool operator()(const parser::CallStmt &stmt) {
     const auto &procedureDesignator{
-        std::get<parser::ProcedureDesignator>(stmt.v.t)};
+        std::get<parser::ProcedureDesignator>(stmt.call.t)};
     if (auto *name{std::get_if<parser::Name>(&procedureDesignator.u)}) {
       // TODO: also ensure that the procedure is, in fact, an intrinsic
       if (name->source == "move_alloc") {
-        const auto &args{std::get<std::list<parser::ActualArgSpec>>(stmt.v.t)};
+        const auto &args{
+            std::get<std::list<parser::ActualArgSpec>>(stmt.call.t)};
         if (!args.empty()) {
           const parser::ActualArg &actualArg{
               std::get<parser::ActualArg>(args.front().t)};

diff  --git a/flang/test/Parser/cuf-sanity-common b/flang/test/Parser/cuf-sanity-common
new file mode 100644
index 0000000000000..368983653a706
--- /dev/null
+++ b/flang/test/Parser/cuf-sanity-common
@@ -0,0 +1,37 @@
+! Common source for CUF parse tree and unparsing tests.
+!@cuf subroutine atcuf;
+end
+
+#ifdef _CUDA
+    subroutine cudadefd;
+end
+#endif
+
+module m
+  real, allocatable, pinned ::pa(:)
+ contains
+  attributes(device) subroutine devicesub; end
+  attributes(device) real function devicefunc(); devicefunc = 1.; end
+  attributes(global) subroutine globalsub; end
+  attributes(grid_global) subroutine gridglobalsub; end
+  attributes(host) subroutine hostsub; end
+  attributes(global) launch_bounds(1, 2) subroutine lbsub; end
+  attributes(global) cluster_dims(1, 2, 3) subroutine cdsub; end
+  attributes(device) subroutine attrs
+! enable with name resolution:    attributes(device) :: devx1
+    real, device :: devx2
+  end subroutine
+  subroutine test
+    logical isPinned
+    !$cuf kernel do(1) <<<*, *, stream = 1>>>
+    do j = 1, 10
+    end do
+    !$cuf kernel do <<<1, (2, 3), stream = 1>>>
+    do j = 1, 10
+    end do
+    call globalsub<<<1, 2>>>
+    call globalsub<<<1, 2, 3>>>
+    call globalsub<<<1, 2, 3, 4>>>
+    allocate(pa(32), stream = 1, pinned = isPinned)
+  end subroutine
+end module

diff  --git a/flang/test/Parser/cuf-sanity-tree.CUF b/flang/test/Parser/cuf-sanity-tree.CUF
new file mode 100644
index 0000000000000..2ce042bcdbc1b
--- /dev/null
+++ b/flang/test/Parser/cuf-sanity-tree.CUF
@@ -0,0 +1,195 @@
+! RUN: %flang_fc1 -fdebug-dump-parse-tree %s 2>&1 | FileCheck %s
+include "cuf-sanity-common"
+!CHECK: Program -> ProgramUnit -> SubroutineSubprogram
+!CHECK: | SubroutineStmt
+!CHECK: | | Name = 'atcuf'
+!CHECK: | SpecificationPart
+!CHECK: | | ImplicitPart -> 
+!CHECK: | ExecutionPart -> Block
+!CHECK: | EndSubroutineStmt -> 
+!CHECK: ProgramUnit -> SubroutineSubprogram
+!CHECK: | SubroutineStmt
+!CHECK: | | Name = 'cudadefd'
+!CHECK: | SpecificationPart
+!CHECK: | | ImplicitPart -> 
+!CHECK: | ExecutionPart -> Block
+!CHECK: | EndSubroutineStmt -> 
+!CHECK: ProgramUnit -> Module
+!CHECK: | ModuleStmt -> Name = 'm'
+!CHECK: | SpecificationPart
+!CHECK: | | ImplicitPart -> 
+!CHECK: | | DeclarationConstruct -> SpecificationConstruct -> TypeDeclarationStmt
+!CHECK: | | | DeclarationTypeSpec -> IntrinsicTypeSpec -> Real
+!CHECK: | | | AttrSpec -> Allocatable
+!CHECK: | | | AttrSpec -> CUDADataAttr = Pinned
+!CHECK: | | | EntityDecl
+!CHECK: | | | | Name = 'pa'
+!CHECK: | | | | ArraySpec -> DeferredShapeSpecList -> int
+!CHECK: | ModuleSubprogramPart
+!CHECK: | | ContainsStmt
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Device
+!CHECK: | | | | Name = 'devicesub'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> FunctionSubprogram
+!CHECK: | | | FunctionStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Device
+!CHECK: | | | | PrefixSpec -> DeclarationTypeSpec -> IntrinsicTypeSpec -> Real
+!CHECK: | | | | Name = 'devicefunc'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> AssignmentStmt = 'devicefunc=1._4'
+!CHECK: | | | | | Variable = 'devicefunc'
+!CHECK: | | | | | | Designator -> DataRef -> Name = 'devicefunc'
+!CHECK: | | | | | Expr = '1._4'
+!CHECK: | | | | | | LiteralConstant -> RealLiteralConstant
+!CHECK: | | | | | | | Real = '1.'
+!CHECK: | | | EndFunctionStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Global
+!CHECK: | | | | Name = 'globalsub'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Grid_Global
+!CHECK: | | | | Name = 'gridglobalsub'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Host
+!CHECK: | | | | Name = 'hostsub'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Global
+!CHECK: | | | | PrefixSpec -> Launch_Bounds -> Scalar -> Integer -> Constant -> Expr = '1_4'
+!CHECK: | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | Scalar -> Integer -> Constant -> Expr = '2_4'
+!CHECK: | | | | | LiteralConstant -> IntLiteralConstant = '2'
+!CHECK: | | | | Name = 'lbsub'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Global
+!CHECK: | | | | PrefixSpec -> Cluster_Dims -> Scalar -> Integer -> Constant -> Expr = '1_4'
+!CHECK: | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | Scalar -> Integer -> Constant -> Expr = '2_4'
+!CHECK: | | | | | LiteralConstant -> IntLiteralConstant = '2'
+!CHECK: | | | | Scalar -> Integer -> Constant -> Expr = '3_4'
+!CHECK: | | | | | LiteralConstant -> IntLiteralConstant = '3'
+!CHECK: | | | | Name = 'cdsub'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | PrefixSpec -> Attributes -> CUDASubprogramAttrs = Device
+!CHECK: | | | | Name = 'attrs'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | | DeclarationConstruct -> SpecificationConstruct -> TypeDeclarationStmt
+!CHECK: | | | | | DeclarationTypeSpec -> IntrinsicTypeSpec -> Real
+!CHECK: | | | | | AttrSpec -> CUDADataAttr = Device
+!CHECK: | | | | | EntityDecl
+!CHECK: | | | | | | Name = 'devx2'
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | | ModuleSubprogram -> SubroutineSubprogram
+!CHECK: | | | SubroutineStmt
+!CHECK: | | | | Name = 'test'
+!CHECK: | | | SpecificationPart
+!CHECK: | | | | ImplicitPart -> 
+!CHECK: | | | | DeclarationConstruct -> SpecificationConstruct -> TypeDeclarationStmt
+!CHECK: | | | | | DeclarationTypeSpec -> IntrinsicTypeSpec -> Logical
+!CHECK: | | | | | EntityDecl
+!CHECK: | | | | | | Name = 'ispinned'
+!CHECK: | | | ExecutionPart -> Block
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> CUFKernelDoConstruct
+!CHECK: | | | | | Directive
+!CHECK: | | | | | | Scalar -> Integer -> Constant -> Expr = '1_4'
+!CHECK: | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | Scalar -> Integer -> Expr = '1_4'
+!CHECK: | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | DoConstruct
+!CHECK: | | | | | | NonLabelDoStmt
+!CHECK: | | | | | | | LoopControl -> LoopBounds
+!CHECK: | | | | | | | | Scalar -> Name = 'j'
+!CHECK: | | | | | | | | Scalar -> Expr = '1_4'
+!CHECK: | | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | | | Scalar -> Expr = '10_4'
+!CHECK: | | | | | | | | | LiteralConstant -> IntLiteralConstant = '10'
+!CHECK: | | | | | | Block
+!CHECK: | | | | | | EndDoStmt -> 
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> CUFKernelDoConstruct
+!CHECK: | | | | | Directive
+!CHECK: | | | | | | Scalar -> Integer -> Expr = '1_4'
+!CHECK: | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | Scalar -> Integer -> Expr = '2_4'
+!CHECK: | | | | | | | LiteralConstant -> IntLiteralConstant = '2'
+!CHECK: | | | | | | Scalar -> Integer -> Expr = '3_4'
+!CHECK: | | | | | | | LiteralConstant -> IntLiteralConstant = '3'
+!CHECK: | | | | | | Scalar -> Integer -> Expr = '1_4'
+!CHECK: | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | DoConstruct
+!CHECK: | | | | | | NonLabelDoStmt
+!CHECK: | | | | | | | LoopControl -> LoopBounds
+!CHECK: | | | | | | | | Scalar -> Name = 'j'
+!CHECK: | | | | | | | | Scalar -> Expr = '1_4'
+!CHECK: | | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | | | Scalar -> Expr = '10_4'
+!CHECK: | | | | | | | | | LiteralConstant -> IntLiteralConstant = '10'
+!CHECK: | | | | | | Block
+!CHECK: | | | | | | EndDoStmt -> 
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> CallStmt = 'CALL globalsub()'
+!CHECK: | | | | | Call
+!CHECK: | | | | | | ProcedureDesignator -> Name = 'globalsub'
+!CHECK: | | | | | Chevrons
+!CHECK: | | | | | | Scalar -> Expr -> LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | Scalar -> Expr -> LiteralConstant -> IntLiteralConstant = '2'
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> CallStmt = 'CALL globalsub()'
+!CHECK: | | | | | Call
+!CHECK: | | | | | | ProcedureDesignator -> Name = 'globalsub'
+!CHECK: | | | | | Chevrons
+!CHECK: | | | | | | Scalar -> Expr -> LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | Scalar -> Expr -> LiteralConstant -> IntLiteralConstant = '2'
+!CHECK: | | | | | | Scalar -> Integer -> Expr -> LiteralConstant -> IntLiteralConstant = '3'
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> CallStmt = 'CALL globalsub()'
+!CHECK: | | | | | Call
+!CHECK: | | | | | | ProcedureDesignator -> Name = 'globalsub'
+!CHECK: | | | | | Chevrons
+!CHECK: | | | | | | Scalar -> Expr -> LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | | Scalar -> Expr -> LiteralConstant -> IntLiteralConstant = '2'
+!CHECK: | | | | | | Scalar -> Integer -> Expr -> LiteralConstant -> IntLiteralConstant = '3'
+!CHECK: | | | | | | Scalar -> Integer -> Expr -> LiteralConstant -> IntLiteralConstant = '4'
+!CHECK: | | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> AllocateStmt
+!CHECK: | | | | | Allocation
+!CHECK: | | | | | | AllocateObject = 'pa'
+!CHECK: | | | | | | | Name = 'pa'
+!CHECK: | | | | | | AllocateShapeSpec
+!CHECK: | | | | | | | Scalar -> Integer -> Expr = '32_4'
+!CHECK: | | | | | | | | LiteralConstant -> IntLiteralConstant = '32'
+!CHECK: | | | | | AllocOpt -> Stream -> Scalar -> Integer -> Expr = '1_4'
+!CHECK: | | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!CHECK: | | | | | AllocOpt -> Pinned -> Scalar -> Logical -> Variable = 'ispinned'
+!CHECK: | | | | | | Designator -> DataRef -> Name = 'ispinned'
+!CHECK: | | | EndSubroutineStmt -> 
+!CHECK: | EndModuleStmt -> 

diff  --git a/flang/test/Parser/cuf-sanity-unparse.CUF b/flang/test/Parser/cuf-sanity-unparse.CUF
new file mode 100644
index 0000000000000..3bd838a75e3ea
--- /dev/null
+++ b/flang/test/Parser/cuf-sanity-unparse.CUF
@@ -0,0 +1,41 @@
+! RUN: %flang_fc1 -fdebug-unparse %s 2>&1 | FileCheck %s
+include "cuf-sanity-common"
+!CHECK: SUBROUTINE atcuf
+!CHECK: END SUBROUTINE
+!CHECK: SUBROUTINE cudadefd
+!CHECK: END SUBROUTINE
+!CHECK: MODULE m
+!CHECK:  REAL, ALLOCATABLE, PINNED :: pa(:)
+!CHECK: CONTAINS
+!CHECK:  ATTRIBUTES(DEVICE) SUBROUTINE devicesub
+!CHECK:  END SUBROUTINE
+!CHECK:  ATTRIBUTES(DEVICE) REAL FUNCTION devicefunc()
+!CHECK:    devicefunc=1._4
+!CHECK:  END FUNCTION
+!CHECK:  ATTRIBUTES(GLOBAL) SUBROUTINE globalsub
+!CHECK:  END SUBROUTINE
+!CHECK:  ATTRIBUTES(GRID_GLOBAL) SUBROUTINE gridglobalsub
+!CHECK:  END SUBROUTINE
+!CHECK:  ATTRIBUTES(HOST) SUBROUTINE hostsub
+!CHECK:  END SUBROUTINE
+!CHECK:  ATTRIBUTES(GLOBAL) LAUNCH_BOUNDS(1_4, 2_4) SUBROUTINE lbsub
+!CHECK:  END SUBROUTINE
+!CHECK:  ATTRIBUTES(GLOBAL) CLUSTER_DIMS(1_4, 2_4, 3_4) SUBROUTINE cdsub
+!CHECK:  END SUBROUTINE
+!CHECK:  ATTRIBUTES(DEVICE) SUBROUTINE attrs
+!CHECK:   REAL, DEVICE :: devx2
+!CHECK:  END SUBROUTINE
+!CHECK:  SUBROUTINE test
+!CHECK:   LOGICAL ispinned
+!CHECK:   !$CUF KERNEL DO (1_4) <<<*,*,STREAM=1_4>>>
+!CHECK:   DO j=1_4,10_4
+!CHECK:   END DO
+!CHECK:   !$CUF KERNEL DO <<<1_4,(2_4,3_4),STREAM=1_4>>>
+!CHECK:   DO j=1_4,10_4
+!CHECK:   END DO
+!CHECK:    CALL globalsub<<<1,2>>>
+!CHECK:    CALL globalsub<<<1,2,3>>>
+!CHECK:    CALL globalsub<<<1,2,3,4>>>
+!CHECK:   ALLOCATE(pa(32_4), STREAM=1_4, PINNED=ispinned)
+!CHECK:  END SUBROUTINE
+!CHECK: END MODULE

diff  --git a/flang/test/lib/lit.local.cfg b/flang/test/lib/lit.local.cfg
index f2c77f45c1bbe..660319fdf4eae 100644
--- a/flang/test/lib/lit.local.cfg
+++ b/flang/test/lib/lit.local.cfg
@@ -15,7 +15,8 @@ config.suffixes = [
     ".ff95",
     ".fpp",
     ".FPP",
-    ".cuf" ".CUF",
+    ".cuf",
+    ".CUF",
     ".f18",
     ".F18",
     ".f03",

diff  --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py
index ba25cb6c78dc2..dda8ed456c986 100644
--- a/flang/test/lit.cfg.py
+++ b/flang/test/lit.cfg.py
@@ -42,7 +42,8 @@
     ".ff95",
     ".fpp",
     ".FPP",
-    ".cuf" ".CUF",
+    ".cuf",
+    ".CUF",
     ".f18",
     ".F18",
     ".f03",


        


More information about the flang-commits mailing list