Running General HPVM benchmarks (non-DNN)
HPVM’s general (non-DNN) benchmarks are located under
To build the existing benchmarks, create a new
Makefile.config in
include, based on the existing
Makefile.config.example. This configuration file must set up the following paths:
LLVM_BUILD_DIR: should point to your local build directory of HPVM.
HPVM_BENCH_DIR: should point to this “benchmarks” directory.
CUDA_PATH: should point to your local CUDA installation, if available. Only required for the GPU back end.
OPENCL_PATH: should point to a local OpenCL installation if not using the CUDA libraries. For the FPGA back end, it must point to the Intel FPGA SDK for OpenCL libraries.
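Before building, it can help to confirm that the configuration file actually defines the paths listed above. The following is a minimal shell sketch and is not part of HPVM: the function name `check_hpvm_config` is hypothetical, and it assumes each variable is assigned on its own line in Make syntax (`NAME = value`, `NAME := value`, or `NAME ?= value`). It checks only LLVM_BUILD_DIR and HPVM_BENCH_DIR, because CUDA_PATH and OPENCL_PATH depend on the back end in use.

```shell
#!/bin/sh
# Hypothetical helper (not part of HPVM): report which required variables
# are missing or empty in a Makefile.config-style file.
# Assumption: one assignment per line in Make syntax, e.g. "LLVM_BUILD_DIR = /path".
check_hpvm_config() {
    config_file=$1
    missing=0
    for var in LLVM_BUILD_DIR HPVM_BENCH_DIR; do
        # Match the name, an optional ":" or "?", "=", and at least one
        # non-blank character of value.
        if ! grep -Eq "^[[:space:]]*${var}[[:space:]]*[:?]?=[[:space:]]*[^[:space:]]" "$config_file"; then
            echo "missing or empty: ${var}"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_hpvm_config Makefile.config` from the directory that holds the file prints one line per missing variable and returns a non-zero status if any are missing.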
The Makefile configuration file provides the following
variables, which can be defined when running
make to compile any of the benchmarks:
TARGET=<target>: Must be used to specify the target device we are compiling for. It sets the
DEVICE macro in the application kernel’s
__hpvm__hint() API call to mark the target device for each kernel. The variable accepts the following values:
TARGET=seq: Sets compilation for CPU target. This is the default.
TARGET=gpu: Sets compilation for GPU target.
TARGET=fpga: Sets compilation for FPGA target.
DEBUG=1: When provided, enables a debug compilation so that the HPVM passes output debug prints. To enable debug prints only for specific passes,
--debug-only=<DEBUG_TYPE> can be added to the
FPGA_OPTFLAGS (used in FPGA compilation),
HPVM_OPTFLAGS (used in CPU/GPU compilation),
HCC-OPTS (used for the Hetero-C++ frontend), and
OCLBE_FLAGS (used for the llvm-to-opencl back end tool) variables. For example, debug prints for the FPGA back end pass can be enabled using:
FPGA_OPTFLAGS += --debug-only=DFG2LLVM_FPGA. Note that this only enables compilation debug prints. HPVM does not currently support compiling programs in debug mode (i.e. with
Additionally, the FPGA target supports the following extra variables:
EMULATION=1: Can be used to enable compilation of the FPGA kernels in EMULATION mode. This also generates the necessary host code in the binary that would launch the Intel FPGA Emulator instead of the actual FPGA.
BOARD=<board>: Can be used to set the target FPGA board. Defaults to
a10gx, i.e. Arria 10 GX Development Board.
RTL=1: Can be used to stop compilation after the RTL generation step where Intel AOC pre-synthesis reports get generated.
PROFILE=1: Enables FPGA compilation with profiling (i.e. AOC synthesizes the design with profile registers). Refer to the Intel FPGA SDK for OpenCL Programming Guide for more information.
FPGAOPTS=<opt_list>: Enables the specified optimizations. Separate multiple optimizations with commas.
LU: Loop Unrolling. Additionally include
UF=<unroll_factor> to set the unroll factor.
LF: Greedy Loop Fusion.
BI: Automatic Input Buffering.
PRIV: Automatic Argument Privatization.
NTLP: Disables Automatic Task Level Parallelism (otherwise TLP is enabled).
NF: Node Fusion.
As an example, if we wish to compile an application for the FPGA target, in Emulation mode, with Loop Unrolling and Loop Fusion enabled, and with an unroll factor of 4, we can do that using:
make TARGET=fpga EMULATION=1 FPGAOPTS=LU,LF UF=4
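Since a full FPGA compilation is long, it can be worth catching a mistyped optimization name before invoking make. The sketch below is a hypothetical helper, not part of HPVM: the name `validate_fpgaopts` is made up for illustration, and it assumes the option names are case-sensitive and limited to the six listed above (LU, LF, BI, PRIV, NTLP, NF).

```shell
#!/bin/sh
# Hypothetical helper (not part of HPVM): check that every entry in a
# comma-separated FPGAOPTS list is one of the options documented above.
# Assumption: option names are case-sensitive.
validate_fpgaopts() {
    old_ifs=$IFS
    IFS=,
    # With IFS set to a comma, the unquoted expansion splits the list into entries.
    for opt in $1; do
        case $opt in
            LU|LF|BI|PRIV|NTLP|NF) ;;
            *)
                IFS=$old_ifs
                echo "unknown FPGAOPTS entry: $opt"
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
    return 0
}
```

It could then guard the build, for instance `validate_fpgaopts "LU,LF" && make TARGET=fpga EMULATION=1 FPGAOPTS=LU,LF UF=4`.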
Compiling and Running benchmarks
Once Makefile.config has been created, we can build one of the benchmarks. Let us demonstrate using the Edge Detection Pipeline benchmark,
pipeline, located under
hpvm/benchmarks/general_benchmarks/pipeline. Once in the benchmark folder, we can compile the benchmark for different targets as follows:
To compile and run the benchmark on CPU:
make TARGET=seq
make TARGET=seq run
To compile and run the benchmark on GPU (note that this will require having an NVIDIA GPU and the NVIDIA OpenCL runtime installed):
make TARGET=gpu
make TARGET=gpu run
To compile and run the benchmark on an Intel FPGA (this will require having an Intel FPGA with OpenCL Support and the Intel FPGA SDK for OpenCL installed):
For FPGA we can first run the benchmark in emulation to verify its functionality:
# This is required to ensure that the Intel FPGA emulator
# does not spawn too many threads
export OCL_TBB_NUM_WORKERS=<N>
make TARGET=fpga EMULATION=1
make TARGET=fpga EMULATION=1 run
Once functionality has been verified and when we are ready to synthesize the FPGA design, we can run a full compilation. Note that this will take a few hours to complete:
make TARGET=fpga
make TARGET=fpga run
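The emulate-first workflow described above can be chained so that the multi-hour synthesis starts only if the emulation build and run succeed. This is a minimal sketch, not an HPVM-provided script: the function name `fpga_build_flow` and the `MAKE_CMD` override are hypothetical, it assumes OCL_TBB_NUM_WORKERS has already been exported, and it does not assume anything about whether a clean step is needed between the emulation and full builds.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of HPVM). MAKE_CMD can be overridden,
# e.g. MAKE_CMD="echo make", to print the commands instead of running them.
MAKE_CMD=${MAKE_CMD:-make}

# Build and run in emulation first; each step runs only if the previous
# one succeeded, so a failing emulation run stops the flow before synthesis.
fpga_build_flow() {
    $MAKE_CMD TARGET=fpga EMULATION=1 &&
    $MAKE_CMD TARGET=fpga EMULATION=1 run &&
    $MAKE_CMD TARGET=fpga &&
    $MAKE_CMD TARGET=fpga run
}
```

Run `fpga_build_flow` from the benchmark folder. Setting `MAKE_CMD="echo make"` beforehand gives a dry run that lists the four commands in order.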
Your own project
See template for an example Makefile that you can use in your own project. As with the benchmarks, make sure
Makefile.config is created as described above. In your Makefile, you will need to specify the HPVM source file (i.e. the one that contains the HPVM DFG), any other source files that need to be linked in, the name of your application executable, and any linker and include flags. Make sure you include
heterocc.h in your C/C++ project files to use the Hetero-C++ API functions (found in