# High-Level Synthesis of Neural Networks for FPGAs

**Christof Schlaak** – Andrej Ivanis – Christophe Dubach

### Motivation

#### Hardware platforms for NN accelerators

- FPGAs are reconfigurable
  - Can exploit different types of NNs
  - Can adapt to evolving NN implementations

### Problem

- FPGAs are not easy to use:
  - Require hardware design expertise
  - Require use of a low-level hardware description language (HDL)
  - Steep learning curve for tools
  - Workflow is not portable



```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.common.all;

architecture behavioral of add is
begin

  process (clk)
  begin
    if rising_edge(clk) then
      data_out <= std_logic_vector(
        unsigned(data_in_1) + unsigned(data_in_2));
      data_out_valid <= data_in_1_valid and
                        data_in_2_valid;
      data_in_1_ready <= data_out_ready and
                         data_in_1_valid and data_in_2_valid;
      data_in_2_ready <= data_out_ready and
                         data_in_1_valid and data_in_2_valid;
    end if;
  end process;

end behavioral;
```
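To make the handshake logic in the VHDL adder easier to follow, here is a minimal behavioural sketch in Python of what the clocked process computes on one rising edge. The bit width `WIDTH`, the `AddOutputs` record and the `clock_edge` function are assumptions introduced for illustration; the VHDL takes its widths from declarations that are not shown, and the sketch assumes both inputs have the same width.

```python
from dataclasses import dataclass

WIDTH = 8  # assumed bit width; the real value comes from declarations not shown


@dataclass
class AddOutputs:
    """Register values after one rising clock edge."""
    data_out: int = 0
    data_out_valid: bool = False
    data_in_1_ready: bool = False
    data_in_2_ready: bool = False


def clock_edge(data_in_1, data_in_1_valid, data_in_2, data_in_2_valid, data_out_ready):
    """Model one rising edge of the 'add' process."""
    both_valid = data_in_1_valid and data_in_2_valid
    return AddOutputs(
        # Unsigned addition of equal-width operands wraps around at 2**WIDTH.
        data_out=(data_in_1 + data_in_2) % (1 << WIDTH),
        # The output is valid only when both inputs are valid.
        data_out_valid=both_valid,
        # An input is consumed only when the consumer is ready and both inputs are valid.
        data_in_1_ready=data_out_ready and both_valid,
        data_in_2_ready=data_out_ready and both_valid,
    )
```

For example, `clock_edge(200, True, 100, True, True).data_out` is `44` under the assumed 8-bit width, because the sum 300 wraps around at 256.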



### Existing solutions

- OpenCL
  - Outperformed by hand-written HDL
- High-level tools provided by vendors (e.g. Xilinx SDAccel)
  - Not as flexible as HDL
  - May not support upcoming NN architectures
- HDL generation based on pre-built RTL components
  - Not flexible enough

### The Lift approach

- Specify behaviour in a high-level functional language, e.g.
  $\mathit{map}(\lambda\,\mathit{arow}.\ \mathit{map}(\lambda\,\mathit{bcol}.\ \mathit{reduce}(+, 0) \circ \mathit{zip}(\mathit{arow}, \mathit{bcol}),\ \mathit{transpose}(B)))$
- Optimise using rewrite rules
  - At the algorithmic level
  - At the hardware-specific level
- Generate a hardware implementation
- Estimate design quality using a performance model
  - Feed results back into the generation of new designs
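The functional expression in the Lift approach has the shape of a matrix multiplication built from `map`, `reduce`, `zip` and `transpose`. The sketch below mirrors that shape with Python built-ins; it is not Lift code. Two things are assumptions: the elementwise multiplication inside `dot`, which is not visible in the expression as extracted, and the outer input `A`. `dot_fused` illustrates what an algorithmic rewrite rule could produce, here fusing the multiplication into the reduction; the specific rule is chosen for illustration only.

```python
from functools import reduce
from operator import add, mul


def transpose(matrix):
    """Turn the rows of a matrix into columns."""
    return [list(col) for col in zip(*matrix)]


def dot(arow, bcol):
    # reduce(+, 0) o map(*) o zip(arow, bcol)
    # Assumption: the elementwise multiply is added here for a complete dot product.
    return reduce(add, map(lambda pair: mul(*pair), zip(arow, bcol)), 0)


def matmul(A, B):
    # map(lambda arow . map(lambda bcol . dot(arow, bcol), transpose(B)), A)
    return list(map(lambda arow: list(map(lambda bcol: dot(arow, bcol), transpose(B))), A))


def dot_fused(arow, bcol):
    # After the illustrative rewrite: the multiply is fused into the reduction,
    # giving one combined multiply-accumulate step per element.
    return reduce(lambda acc, pair: acc + pair[0] * pair[1], zip(arow, bcol), 0)
```

With these definitions, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` evaluates to `[[19, 22], [43, 50]]`, and `dot` and `dot_fused` agree on every input, which is the property a semantics-preserving rewrite rule has to guarantee.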

### Advantages

- Target CPUs, GPUs and FPGAs
- Support arbitrary NN architectures
- Portable across many FPGAs
- Automatically optimised
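Automatic optimisation relies on the loop described in the approach: generate a design, estimate its quality with a performance model, and feed the result back into the next design. The sketch below shows that loop in its simplest form. Everything in it is hypothetical: the single `parallelism` knob, the constants, the toy cost model in `estimate`, and the greedy search in `explore` are illustrative assumptions, not the project's actual model or search strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    parallelism: int  # hypothetical knob: number of multiply-accumulate units


# Hypothetical numbers for illustration only; not measurements of any device.
WORK_ITEMS = 4096      # multiply-accumulate operations to perform
UNITS_AVAILABLE = 64   # resource budget of an imagined FPGA


def estimate(design):
    """Toy performance model: return (cycles, units_used) for a design."""
    cycles = -(-WORK_ITEMS // design.parallelism)  # ceiling division
    return cycles, design.parallelism


def explore(start, steps):
    """Greedy feedback loop: double the parallelism while the estimate improves and fits."""
    best = start
    best_cycles, _ = estimate(best)
    for _ in range(steps):
        candidate = Design(best.parallelism * 2)
        cycles, units = estimate(candidate)
        # Stop when the candidate exceeds the budget or no longer improves.
        if units > UNITS_AVAILABLE or cycles >= best_cycles:
            break
        best, best_cycles = candidate, cycles
    return best, best_cycles
```

Calling `explore(Design(1), 10)` on this toy model stops at `Design(parallelism=64)` with an estimate of 64 cycles, because the next candidate would exceed the assumed resource budget.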





lift-project.org

