HPVM Targets

There are three core back ends in the HPVM compiler: a CPU back end, a GPU back end, and an FPGA back end. These back ends are separate from the tensor back ends, which can also target the GPU and are described under Tensor Components. The three back ends are described below.

HPVM CPU Back End

The CPU back end in HPVM is responsible for compiling the host binary of the application, which includes generating the kernel code for any leaf nodes that have been targeted to the CPU. The CPU back end consists of two main components: 1. an HPVM-to-CPU code generation pass, which generates the LLVM bitcode for the host module; and 2. LLVM’s architecture-specific back end (x86, Arm, RISC-V), which compiles the host module into an executable binary.

The HPVM-to-CPU code-gen pass performs a bottom-up traversal of the HPVM Data Flow Graph (DFG) (see HPVM IR Specification) and handles the nodes as follows:

  • For each leaf node that has a CPU target, the code-gen pass generates LLVM bitcode for a function that executes the body of that leaf node on the CPU. As part of the translation, nodes with dynamic replication get translated into functions containing loop nests, with one loop per dimension of the dynamic replication factor (see the first sketch after this list).

  • For each internal node that hasn’t been code-gen’d yet (note that internal nodes that contain FPGA or GPU nodes get code-gen’d by those back ends), the pass generates a function in the host module that invokes the functions of any internal or leaf nodes it contains (see the second sketch after this list).

  • For the launch of the top-level DFG, the pass generates launch code that differs depending on whether or not that DFG is streaming.

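As an illustration, the following C sketch shows the shape of the code generated for a leaf node with a two-dimensional dynamic replication factor. This is a minimal sketch, not the code the pass actually emits; all function and parameter names are hypothetical.

    /* Hypothetical leaf node replicated over a dimX x dimY instance space.
     * The node body is outlined into its own function, and the replication
     * factor becomes a loop nest with one loop per dimension; the instance
     * IDs are passed in explicitly. */
    void leaf_node_body(float *in, float *out, long x, long y,
                        long dimX, long dimY);

    void leaf_node_cpu(float *in, float *out, long dimX, long dimY) {
        for (long y = 0; y < dimY; ++y)          /* second dimension */
            for (long x = 0; x < dimX; ++x)      /* first dimension */
                leaf_node_body(in, out, x, y, dimX, dimY);
    }
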
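Similarly, here is a hedged sketch of the host function generated for an internal node with two hypothetical child leaf nodes, A and B, where B consumes A’s output. The generated function simply invokes the children’s (already code-gen’d) functions in dataflow order, passing edge values as ordinary arguments.

    /* Hypothetical internal node function in the host module. */
    struct A_out { float *buf; long n; };

    struct A_out leafA_cpu(float *in, long n);   /* child leaf node A */
    void leafB_cpu(float *buf, long n);          /* child leaf node B */

    void internal_node_cpu(float *in, long n) {
        struct A_out a = leafA_cpu(in, n);  /* run A */
        leafB_cpu(a.buf, a.n);              /* B consumes A's output */
    }
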
The implementation details of the pass can be found in the backend passes developer docs. The output of the code-gen pass is an LLVM module for the host code, which then gets compiled into a binary using LLVM’s back end for the target architecture.

HPVM GPU Back End

The GPU back end in HPVM is responsible for generating the OpenCL kernels for the leaf nodes that have been targeted to the GPU, as well as the corresponding host code that launches these kernels. This back end also consists of two main components: 1. an HPVM-to-GPU code generation pass, which compiles the HPVM IR into an LLVM module for the GPU kernel code and generates the host code that handles these kernels into the host LLVM module; and 2. an LLVM-to-OpenCL back end, which compiles the kernel module into an OpenCL file containing the code for the kernels. The OpenCL kernels get compiled at run time using the Nvidia OpenCL runtime.

The HPVM-to-GPU code-gen pass performs a bottom-up traversal of the HPVM DFG, handling each node as follows:

  • For each leaf node that has a GPU target, the back end generates OpenCL-compliant LLVM bitcode, which then gets compiled using our LLVM-to-OpenCL back end tool. This involves translating our HPVM Graph Querying Intrinsics (see HPVM IR Specification) into corresponding OpenCL intrinsics, which the LLVM-to-OpenCL back end translates into OpenCL function calls. For example, HPVM’s getNodeInstanceID intrinsic gets translated into the corresponding OpenCL get_global_id call (see the first sketch after this list).

  • For each internal node that launches GPU nodes, the code-gen pass generates the corresponding host code for setting up the kernels and their arguments, launching the kernels on the GPU, copying memory between the GPU and the host, etc. (see the second sketch after this list).

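To make the intrinsic translation concrete, here is a sketch of what a generated kernel might look like after lowering; the kernel itself (a simple vector addition) is hypothetical.

    /* Hypothetical OpenCL kernel after translation: the leaf node's
     * getNodeInstanceID query has become a call to OpenCL's get_global_id
     * for the corresponding dimension. */
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c) {
        size_t i = get_global_id(0);  /* was getNodeInstanceID along dim 0 */
        c[i] = a[i] + b[i];
    }
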
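And here is a minimal sketch of the kind of host-side OpenCL calls the pass emits for such an internal node, assuming the vec_add kernel above and a pre-created context, queue, and kernel object; error handling is omitted for brevity.

    #include <CL/cl.h>

    /* Hedged sketch of generated host code for launching one GPU leaf node:
     * copy inputs to the device, set kernel arguments, launch over the
     * node's instance space, and copy the result back. */
    void launch_vec_add(cl_context ctx, cl_command_queue q, cl_kernel k,
                        const float *h_a, const float *h_b, float *h_c,
                        size_t n) {
        size_t bytes = n * sizeof(float);
        cl_mem d_a = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, NULL);
        cl_mem d_b = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, NULL);
        cl_mem d_c = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);
        clEnqueueWriteBuffer(q, d_a, CL_TRUE, 0, bytes, h_a, 0, NULL, NULL);
        clEnqueueWriteBuffer(q, d_b, CL_TRUE, 0, bytes, h_b, 0, NULL, NULL);
        clSetKernelArg(k, 0, sizeof(cl_mem), &d_a);
        clSetKernelArg(k, 1, sizeof(cl_mem), &d_b);
        clSetKernelArg(k, 2, sizeof(cl_mem), &d_c);
        /* The global work size is the leaf node's dynamic replication factor. */
        clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, d_c, CL_TRUE, 0, bytes, h_c, 0, NULL, NULL);
        clReleaseMemObject(d_a);
        clReleaseMemObject(d_b);
        clReleaseMemObject(d_c);
    }
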
The implementation details of the pass can be found in the backend passes developer docs. The kernel module generated by the code-gen pass gets compiled using LLVM-to-OpenCL to produce an OpenCL file of the kernels. The output of the back end is the host module, which still needs to be compiled using the CPU back end, and an OpenCL file for the kernels, which gets loaded and compiled at run time by the host code using the Nvidia OpenCL runtime.

HPVM FPGA Back End

The FPGA back end in HPVM is responsible for generating the FPGA bitstream for the leaf nodes that have been targeted to the FPGA, as well as the corresponding host code that launches these kernels. The back end targets Intel FPGAs using the Intel FPGA SDK for OpenCL, and uses the Single Work Item Kernel (SWIK) programming model recommended by Intel (see the Intel FPGA SDK for OpenCL Programming Guide for more information). This back end consists of three main components: 1. an HPVM-to-FPGA code generation pass, which compiles the HPVM IR into an LLVM module for the FPGA kernel code and generates the host code that handles these kernels into the host LLVM module; as part of this pass, a Node Sequentialization transformation is used to ensure that the generated OpenCL kernels are SWIKs; 2. an LLVM-to-OpenCL back end, which compiles the kernel module into an OpenCL file containing the code for the kernels; and 3. the Intel FPGA (Altera) OpenCL Compiler (AOC), which synthesizes the generated OpenCL kernels into an FPGA bitstream.

The HPVM-to-FPGA code-gen pass performs a bottom-up traversal of the HPVM DFG, handling each node as follows:

  • For each leaf node that has an FPGA target, the node first gets sequentialized using the Node Sequentialization transformation (see the first sketch after this list); the code-gen pass then generates OpenCL-compliant LLVM bitcode, which gets compiled using our LLVM-to-OpenCL back end tool.

  • For each internal node that launches one or more FPGA nodes, the code-gen pass generates the corresponding host code for setting up the kernels and their arguments, launching the kernels on the FPGA, copying memory between the FPGA and the host, etc. Since we support running kernels concurrently on the FPGA, the back end uses multiple OpenCL command queues when this feature is enabled and launches the kernels in parallel accordingly (see the second sketch after this list).

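To illustrate what Node Sequentialization does, here is a before/after sketch with a hypothetical kernel: a data-parallel leaf node that would otherwise become an NDRange kernel is instead emitted as a single work-item kernel whose instance space is an explicit loop, which the Intel FPGA compiler can then pipeline.

    /* Before sequentialization: NDRange-style kernel, one work item per
     * instance of the leaf node. */
    __kernel void scale_ndrange(__global float *data, float f) {
        size_t i = get_global_id(0);
        data[i] *= f;
    }

    /* After sequentialization: a single work-item kernel (SWIK). The whole
     * instance space is one loop; n is the former global work size. */
    __kernel void scale_swik(__global float *data, float f, unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            data[i] *= f;
    }
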
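For the concurrent-kernels case, here is a minimal sketch of the host-side pattern, assuming two independent single work-item kernels kernelA and kernelB (hypothetical names): each kernel gets its own OpenCL command queue so the launches can overlap on the FPGA.

    #include <CL/cl.h>

    /* Hedged sketch: one command queue per kernel so two independent
     * single work-item kernels can execute concurrently. */
    void launch_concurrent(cl_context ctx, cl_device_id dev,
                           cl_kernel kernelA, cl_kernel kernelB) {
        cl_command_queue q0 = clCreateCommandQueue(ctx, dev, 0, NULL);
        cl_command_queue q1 = clCreateCommandQueue(ctx, dev, 0, NULL);
        clEnqueueTask(q0, kernelA, 0, NULL, NULL);  /* launch SWIK A */
        clEnqueueTask(q1, kernelB, 0, NULL, NULL);  /* launch SWIK B */
        clFinish(q0);  /* wait for both kernels to complete */
        clFinish(q1);
        clReleaseCommandQueue(q0);
        clReleaseCommandQueue(q1);
    }
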
The implementation details of the pass can be found in the backend passes developer docs. The kernel module generated by the code-gen pass gets compiled using LLVM-to-OpenCL to produce an OpenCL file of the kernels. That OpenCL file then gets synthesized using AOC to generate the FPGA bitstream. The output of the back end is the host module, which still needs to be compiled using the CPU back end to generate the host binary, and the FPGA bitstream that programs the FPGA.

Note that a user-facing tool, hpvm2fpga, is provided with hpvm to run the FPGA back end seamlessly for the user. Please refer to HPVM User-Facing Tools for more information.

LLVM-to-OpenCL Tool

Both the GPU and FPGA HPVM back ends use an LLVM-to-OpenCL tool to generate OpenCL code for the kernels from the LLVM bitcode that our back end passes generate. Our LLVM-to-OpenCL tool, called llvm-ocl, is provided as part of hpvm. The tool takes an LLVM module containing kernel functions that use intrinsics to represent the OpenCL API calls and LLVM metadata to represent keywords like __kernel, __global, and __local, and translates those into OpenCL syntax. The output of the tool is an OpenCL file.
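
For a concrete picture, the sketch below shows the kind of OpenCL the tool emits for a hypothetical kernel: the metadata-encoded keywords appear as __kernel, __global, and __local qualifiers, and the intrinsics appear as OpenCL built-in calls.

    /* Illustrative (hypothetical) llvm-ocl output: the qualifiers come from
     * LLVM metadata in the input module, and the OpenCL built-in calls come
     * from the intrinsics used in the kernel function. */
    __kernel void stage_local(__global const float *in,
                              __local float *scratch,
                              __global float *out) {
        size_t gid = get_global_id(0);
        size_t lid = get_local_id(0);
        scratch[lid] = in[gid];        /* stage through local memory */
        barrier(CLK_LOCAL_MEM_FENCE);  /* work-group barrier */
        out[gid] = scratch[lid];
    }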