Running General HPVM benchmarks (non-DNN)

HPVM’s general (non-DNN) benchmarks are located under hpvm/benchmarks/general_benchmarks. To build the existing benchmarks, create a new Makefile.config in include, based on the existing Makefile.config.example. This configuration file must set the following paths:

  • LLVM_BUILD_DIR: should point to your local build directory of HPVM.

  • HPVM_BENCH_DIR: should point to this “benchmarks” directory.

  • CUDA_PATH: should point to your local CUDA installation, if available. Only required for the GPU back end.

  • OPENCL_PATH: should point to a local OpenCL installation if you are not using the CUDA libraries. For the FPGA back end, it must point to the Intel FPGA SDK for OpenCL libraries.
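
Before building, it can help to confirm that the new Makefile.config actually assigns these paths. The following is a minimal sketch, not part of HPVM: check_hpvm_config is a hypothetical helper, and it assumes the file uses plain make-style assignments such as "VAR = value". The two back-end-specific paths are only reported as warnings, since they are not needed for every target.

```shell
# Hypothetical helper (not part of HPVM): check that a Makefile.config
# assigns the paths listed above. Assumes make-style "VAR = value" lines.
check_hpvm_config() {
  cfg="$1"
  status=0
  # These two paths are needed for every target.
  for var in LLVM_BUILD_DIR HPVM_BENCH_DIR; do
    if ! grep -Eq "^[[:space:]]*${var}[[:space:]]*[?:]?=" "$cfg"; then
      echo "error: ${var} is not set in ${cfg}"
      status=1
    fi
  done
  # These depend on the back end in use, so only warn about them.
  for var in CUDA_PATH OPENCL_PATH; do
    if ! grep -Eq "^[[:space:]]*${var}[[:space:]]*[?:]?=" "$cfg"; then
      echo "warning: ${var} is not set in ${cfg}"
    fi
  done
  return "$status"
}
```

A typical call would be check_hpvm_config include/Makefile.config, run from the benchmarks directory. The function returns a non-zero status only when one of the always-required paths is missing.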

The Makefile configuration file provides the following variables, which can be set on the make command line when compiling any of the applications:

  • TARGET=<target>: Specifies the target device to compile for. It sets the DEVICE macro in the application kernel’s __hpvm__hint() API call to mark the target device for each kernel. The variable accepts the following values:

    • TARGET=seq: Sets compilation for CPU target. This is the default.

    • TARGET=gpu: Sets compilation for GPU target.

    • TARGET=fpga: Sets compilation for FPGA target.

  • DEBUG=1: Enables a debug compilation, in which the HPVM passes output debug prints. To enable debug prints for specific passes only, add --debug-only=<DEBUG_TYPE> to the FPGA_OPTFLAGS (used in FPGA compilation), HPVM_OPTFLAGS (used in CPU/GPU compilation), HCC-OPTS (used for the Hetero-C++ frontend), or OCLBE_FLAGS (used for the llvm-to-opencl back end tool) variables. For example, debug prints for the FPGA back end pass can be enabled using: FPGA_OPTFLAGS += --debug-only=DFG2LLVM_FPGA. Note that this only enables compilation debug prints. HPVM does not currently support compiling programs in debug mode (i.e. with the -g flag).

  • Additionally, the FPGA target supports the following extra variables:

    • EMULATION=1: Can be used to enable compilation of the FPGA kernels in EMULATION mode. This also generates the necessary host code in the binary that would launch the Intel FPGA Emulator instead of the actual FPGA.

    • BOARD=<board>: Can be used to set the target FPGA board. Defaults to a10gx, i.e. the Arria 10 GX Development Board.

    • RTL=1: Can be used to stop compilation after the RTL generation step where Intel AOC pre-synthesis reports get generated.

    • PROFILE=1: Enables FPGA compilation with profiling enabled (i.e. AOC synthesizes the design with profile registers). Refer to the Intel FPGA SDK for OpenCL Programming Guide for more information.

    • FPGAOPTS=<opt_list>: Enables the specified optimizations. To enable multiple optimizations, separate them with commas.

      • LU: Loop Unrolling. Additionally include UF=<unroll_factor> to set the unroll factor.

      • LF: Greedy Loop Fusion.

      • BI: Automatic Input Buffering.

      • PRIV: Automatic Argument Privatization.

      • NTLP: Disables Automatic Task Level Parallelism (otherwise TLP is enabled).

      • NF: Node Fusion.

As an example, if we wish to compile an application for the FPGA target, in Emulation mode, with Loop Unrolling and Loop Fusion enabled, and with an unroll factor of 4, we can do that using:

make TARGET=fpga EMULATION=1 FPGAOPTS=LU,LF UF=4
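
A few further combinations of the variables described above can be sketched in the same way. The hpvm_make wrapper below is hypothetical (it is not part of HPVM) and only prints each command unless HPVM_DRY_RUN=0 is set, so the snippet can be tried outside a benchmark directory. It also assumes that these variables can be combined freely, which has not been checked for every combination.

```shell
# hpvm_make is a hypothetical helper, not part of HPVM. By default it only
# prints the make command line; set HPVM_DRY_RUN=0 to actually invoke make.
hpvm_make() {
  if [ "${HPVM_DRY_RUN:-1}" = "1" ]; then
    echo "make $*"
  else
    make "$@"
  fi
}

# Stop after RTL generation to inspect the Intel AOC pre-synthesis reports.
hpvm_make TARGET=fpga RTL=1

# FPGA build with profiling, Input Buffering, and Argument Privatization.
hpvm_make TARGET=fpga PROFILE=1 FPGAOPTS=BI,PRIV

# CPU build with debug prints from the HPVM passes.
hpvm_make TARGET=seq DEBUG=1
```

Printing the commands first makes it easy to review a long variable list before committing to an FPGA compilation.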

Compiling and Running benchmarks

Once Makefile.config has been created, we can build one of the benchmarks. Let us demonstrate using the Edge Detection Pipeline benchmark (pipeline), located under hpvm/benchmarks/general_benchmarks/pipeline. Once in the benchmark folder, we can compile the benchmark for different targets as follows:

  • To compile and run the benchmark on CPU:

    make TARGET=seq
    make TARGET=seq run
    
  • To compile and run the benchmark on GPU (note that this will require having an NVIDIA GPU and the NVIDIA OpenCL runtime installed):

    make TARGET=gpu
    make TARGET=gpu run
    
  • To compile and run the benchmark on an Intel FPGA (this will require having an Intel FPGA with OpenCL Support and the Intel FPGA SDK for OpenCL installed):

    • For FPGA we can first run the benchmark in emulation to verify its functionality:

      # This is required to ensure that the Intel FPGA emulator
      # does not spawn too many threads
      export OCL_TBB_NUM_WORKERS=<N>
      make TARGET=fpga EMULATION=1
      make TARGET=fpga EMULATION=1 run
      
    • Once functionality has been verified and when we are ready to synthesize the FPGA design, we can run a full compilation. Note that this will take a few hours to complete:

      make TARGET=fpga
      make TARGET=fpga run
      

Your own project

See template for an example Makefile that you can use in your own project. As with the benchmarks, make sure Makefile.config is created as described above. In your Makefile, you will need to specify the HPVM source file (i.e. the one that contains the HPVM DFG), any other source files that need to be linked in, the name of your application executable, and any linker and include flags. Make sure you include heterocc.h (found in include/heterocc.h) in your C/C++ project files to use the Hetero-C++ API functions.