Hetero-C++ Language Specification

A Hetero-C++ program consists of Parallel Sections, Parallel Tasks, and Parallel Loops. Parallel Sections correspond to Internal Nodes in the hetero DataFlow Graph (DFG), while Parallel Tasks/Loops may translate to either Internal Nodes or Leaf Nodes. Please refer to Writing a Hetero-C++ Program for a detailed overview of the language semantics.

Buffer Pair

A buffer pair denotes a pointer value followed by the size of the memory it points to. Various Hetero-C++ API calls require specifying both of these arguments together so that the Hetero-C++ intrinsics are inserted correctly. For example: (…, int* ptr1, size_t ptr1Sz, ...)

Launch & Wait API

void* __hetero_launch(void* RootGraph, unsigned ni, …, unsigned no, ...)

Launches the execution of the graph with node function RootGraph. ni is the number of input buffer pairs to the root function. Similarly, no is the number of output buffer pairs to be returned from the root graph. Any memory which is required to be copied back to the host after the DFG completes execution must be listed in the output argument pairs.
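As an illustrative sketch (the function and variable names are hypothetical, and the code requires the Hetero-C++ compiler to build), launching a root node with three input buffer pairs and one output buffer pair might look like:

```cpp
// Hypothetical root node function: all arguments are buffer pairs.
void VecAddRoot(float* A, size_t ASz, float* B, size_t BSz,
                float* C, size_t CSz);

void host_code(float* A, float* B, float* C, size_t Sz) {
  // 3 input buffer pairs; 1 output buffer pair (C is copied back to host).
  void* graph = __hetero_launch((void*)VecAddRoot,
                                3, A, Sz, B, Sz, C, Sz,
                                1, C, Sz);
  __hetero_wait(graph);
}
```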

void __hetero_wait(void* Graph)

Waits for completion of execution of the dataflow graph with handle Graph.

void* __hetero_launch_begin(unsigned ni, …, unsigned no, ...)

Inlined version of the __hetero_launch marker. It internally performs the additional step of extracting the region described by the launch begin marker. ni is the number of input buffer pairs to the root function. Similarly, no is the number of output buffer pairs to be returned from the root graph. Any memory which is required to be copied back to the host after the DFG completes execution must be listed in the output argument pairs.

void __hetero_launch_end(void*)

Used to define the ending point for a __hetero_launch_begin region. It takes the opaque handle returned by the launch begin marker to identify which launch region is being ended.
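A minimal sketch of the inlined launch form (names are hypothetical, and building it requires the Hetero-C++ compiler):

```cpp
void host_code(float* In, size_t InSz, float* Out, size_t OutSz) {
  // 2 input buffer pairs; 1 output buffer pair copied back on completion.
  void* launch = __hetero_launch_begin(2, In, InSz, Out, OutSz,
                                       1, Out, OutSz);
  // ... Parallel Section / Task / Loop markers describing the DFG ...
  __hetero_launch_end(launch);
}
```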

Parallel Section API

void* __hetero_section_begin()

Defines the starting point for a Parallel Section and returns an opaque handle defining the start of this section. There can be no program code or conditional statements directly inside a Parallel Section, except within the Parallel Tasks or Parallel Loops defined inside it. There can be multiple Parallel Tasks and Parallel Loops defined inside a single Parallel Section.

void __hetero_section_end(void*)

Defines the ending point for a specified Parallel Section given by the opaque handle in the first argument.

Parallel Task API

void* __hetero_task_begin(unsigned ni, ..., unsigned no, ..., ["name"])

Defines the starting point for a Parallel Task with the input and output buffer pairs specified. Returns an opaque handle to the starting point of the parallel task.

ni denotes the number of input buffer pairs required inside the Parallel Task. The required buffer pairs will be listed after ni. Similarly, no denotes the number of output buffer pairs, after which the corresponding number of pairs of output pointers and respective pointer sizes must be passed as arguments.

Any memory which needs to be propagated out of the Parallel Task should be considered as an output buffer pair. Note that the same pointer may be an input and output pointer simultaneously.

This marker also takes an optional string literal as a final argument to override the name of the node function which gets generated.

void __hetero_task_end(void*)

Defines the ending point for a Parallel Task given by the opaque handle in the first argument.
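Putting the Section and Task markers together, a hedged sketch of a root node function (names and the computation are illustrative; this requires the Hetero-C++ compiler):

```cpp
// Hypothetical root node: a single Parallel Section containing one task.
void ScaleRoot(float* In, size_t InSz, float* Out, size_t OutSz) {
  void* section = __hetero_section_begin();

  // 1 input pair, 1 output pair; "scale_task" overrides the node name.
  void* task = __hetero_task_begin(1, In, InSz, 1, Out, OutSz, "scale_task");
  for (size_t i = 0; i < InSz / sizeof(float); ++i)
    Out[i] = 2.0f * In[i];
  __hetero_task_end(task);

  __hetero_section_end(section);
}
```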

Parallel Loop API

void __hetero_parallel_loop(unsigned num_loops, unsigned ni, ..., unsigned no, ..., ["name"])

Specifies that a loop can be parallelized by creating an N-dimensional node in the hetero DFG. The first argument num_loops specifies the number of enclosing loops (including the immediately enclosing loop from which it is called) that get converted into an N-dimensional parallel HPVM node. The number of dynamic instances of the resulting node is inferred from each loop’s trip count, which may be computed locally inside the parent internal node function or passed in as an argument to that function. Note that this call must be the first instruction in the body of the loop. The remaining arguments are identical to those of a Parallel Task begin call.

This marker also takes an optional string literal as a final argument to override the name of the node function which gets generated.
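A hedged sketch of a one-dimensional parallel loop (names are hypothetical; this requires the Hetero-C++ compiler):

```cpp
// Hypothetical root node: the loop body becomes a 1-D parallel node.
void CopyRoot(float* A, size_t ASz, float* B, size_t BSz, size_t N) {
  void* section = __hetero_section_begin();
  for (size_t i = 0; i < N; ++i) {
    // num_loops = 1; 2 input pairs; 1 output pair; optional node name.
    __hetero_parallel_loop(1, 2, A, ASz, B, BSz, 1, B, BSz, "copy_loop");
    B[i] = A[i];
  }
  __hetero_section_end(section);
}
```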

Optional API

The following methods are not required for generating correct Hetero-C++ code, but they can help the compiler generate more performant code and can dictate which device a particular computation is offloaded to.

void __hetero_hint(int i)

By default, each node generated by the Hetero-C++ compiler is assigned to execute on the CPU. To override this default behavior, users can call this method inside their Parallel Task or Parallel Loop, where the argument specifies the enum value of the target device. The value mapping is as follows:

CPU_TARGET

GPU_TARGET

FPGA_TARGET
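For instance, assuming the target enum values from the mapping above are in scope, a task could be directed to the GPU like this (a sketch, not a definitive usage):

```cpp
void* task = __hetero_task_begin(1, In, InSz, 1, Out, OutSz);
__hetero_hint(GPU_TARGET); // request that this node execute on the GPU
// ... task body ...
__hetero_task_end(task);
```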

void* __hetero_malloc(size_t)

This method allocates memory on the heap using an aligned allocation. Additionally, it generates specific HPVM runtime calls which inform the HPVM memory tracker to track the allocated memory. In most cases, users can simply use the regular malloc call, as the __hetero_launch_* API calls generate the aforementioned HPVM runtime calls at the launch site of the host code. This API should be used when that behavior is not sufficient and the memory must continue to be tracked even after the dataflow code has completed executing. Note that use of this API overrides the automatic generation of these specific HPVM runtime calls, so they must then be generated for all memory passed into the Hetero-C++ DFG calls (on the host side only).

void __hetero_free(void* p)

Similar to the regular free call, this method frees the memory pointed to by p. Additionally, it inserts HPVM runtime calls which inform the HPVM memory tracker to stop tracking that memory. In most cases, users can simply use the regular free call, as the __hetero_launch_* API calls generate the aforementioned HPVM runtime calls at the launch site of the host code. This API should be used when that behavior is not sufficient, i.e., to free memory allocated with __hetero_malloc whose tracking persists after the dataflow code has completed executing. Note that use of this API overrides the automatic generation of these specific HPVM runtime calls, so they must then be generated for all memory passed into the Hetero-C++ DFG calls (on the host side only).
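A hedged sketch of the allocate/free pairing, for memory that must stay tracked across multiple launches (sizes and names are illustrative):

```cpp
// Tracked allocation that outlives a single DFG launch.
size_t Sz = 1024 * sizeof(float);
float* Data = (float*)__hetero_malloc(Sz);

// ... one or more __hetero_launch calls that pass (Data, Sz) ...

__hetero_free(Data); // frees Data and tells the memory tracker to stop tracking it
```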

void __hetero_request_mem(void*)

This method call informs the HPVM runtime to copy the most up-to-date (‘dirty’) copy of memory passed into any dataflow graph from the device holding its most recently modified copy back to the host (CPU). In most cases, users do not need to generate this call, as the hetero launch markers generate it automatically for any memory specified as an output of the graph being launched. Note that use of this API overrides the automatic generation of these specific HPVM runtime calls, so they must then be generated for all (output) memory passed into the Hetero-C++ DFG calls (on the host side only).
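When copy-back is managed manually (e.g., for memory tracked via __hetero_malloc), a hypothetical sketch of requesting the memory after a graph completes:

```cpp
void copy_back(void* graph, float* Data) {
  __hetero_wait(graph);       // the DFG that modified Data has finished
  __hetero_request_mem(Data); // bring the most recent copy back to the host
}
```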

void __hetero_isNonZeroLoop(long, ...)

When using the hpvm-dse tool, this method call informs the Design Space Exploration tool with what unroll factors to apply. The first argument refers to the loop induction variable, while the remaining are the constant unroll factors to use in the DSE search.
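A sketch of annotating a loop with candidate unroll factors for the DSE search (the factors here are arbitrary examples):

```cpp
for (long i = 0; i < N; ++i) {
  // Suggest unroll factors 2, 4, and 8 for the DSE search over this loop.
  __hetero_isNonZeroLoop(i, 2, 4, 8);
  // ... loop body ...
}
```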

void __hetero_priv(long, ...)

This optional API enables users to specify that certain arguments to tasks or parallel loops are private. The first argument specifies the number of buffer pairs which are to be marked private, while the following arguments list each buffer pair. Note that this requires that the size of the buffers listed be literal constants.
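A hedged sketch of marking a scratch buffer private inside a task (names are hypothetical; note the buffer size is a literal constant, as required):

```cpp
float scratch[16]; // 64 bytes of task-private scratch space
void* task = __hetero_task_begin(2, In, InSz, scratch, 64,
                                 1, Out, OutSz);
__hetero_priv(1, scratch, 64); // mark the (scratch, 64) buffer pair as private
// ... task body using scratch ...
__hetero_task_end(task);
```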