Useful Tips
Resource estimation
The BOPs reported by this framework can be used as a good estimator of on-chip resource consumption when the following conditions are met (see the configuration sketch after this list):

- `latency` strategy is used.
- `reuse_factor` is set to 1.
- `parallel_factor` is set to match the number of convolution kernel applications (everything is done in parallel).
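
As a minimal sketch of such a setup (the config keys follow standard hls4ml conventions; the toy model, the layer name `conv1`, and the sizes are placeholders):

```python
import hls4ml
import keras

# Tiny placeholder model; "conv1" is the layer we parallelize fully.
model = keras.Sequential([
    keras.layers.Input((28, 28, 1)),
    keras.layers.Conv2D(4, (3, 3), name='conv1'),
    keras.layers.Flatten(),
    keras.layers.Dense(10),
])

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Strategy'] = 'Latency'  # latency strategy
config['Model']['ReuseFactor'] = 1       # reuse_factor = 1
# Fully parallel convolution: parallel_factor equal to the number of kernel
# applications, i.e. the output feature-map size (26 x 26 here).
config['LayerName']['conv1']['ParallelizationFactor'] = 26 * 26

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, io_type='io_parallel'
)
```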
If `io_parallel` is used, resource consumption can be estimated as a linear combination of LUTs and DSPs: $\mathrm{LUTs} + 55 \cdot \mathrm{DSPs} \sim \mathrm{BOPs}$. The factor of 55 in front of DSPs is rough, but the resulting order-of-magnitude estimate is still useful.
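As a hypothetical worked example: if synthesis reports 20,000 LUTs and 100 DSPs, the combined cost is $20000 + 55 \cdot 100 = 25500$, which should be of the same order of magnitude as the BOPs reported for the model.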
If `io_stream` is used, you will also need to account for the resources used by FIFOs, which cannot be estimated directly from BOPs and depend on the specific implementation (e.g., shift registers vs. BRAM).
Regarding `#pragma HLS DATAFLOW` in Vivado/Vitis
If you are using `io_parallel`, the conditions above are met, and your network contains a convolution layer, you may see much larger resource consumption than expected, together with terrible latency. In this case, please try changing `#pragma HLS DATAFLOW` to `#pragma HLS PIPELINE`, or simply removing it, and re-synthesize the code (a sketch of this edit is shown below).
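
The edit itself is a one-line change to the generated source. A minimal sketch, assuming a typical hls4ml project layout (the path below is hypothetical; point it at your own output directory):

```python
# Sketch only: swap DATAFLOW for PIPELINE in the generated top-level source
# before re-running synthesis.
path = "my-hls-prj/firmware/myproject.cpp"  # hypothetical project path

with open(path) as f:
    src = f.read()

# To remove the pragma instead, replace it with an empty string.
src = src.replace("#pragma HLS DATAFLOW", "#pragma HLS PIPELINE")

with open(path, "w") as f:
    f.write(src)
```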
Regarding `#pragma HLS INLINE RECURSIVE` in Vivado
If you are using `io_parallel` with the `latency` strategy and `vivado_hls`, you may try adding `#pragma HLS INLINE RECURSIVE` to your top function. This may reduce resource consumption for some networks; in many cases resource consumption drops by $\sim 10\%$, and latency may or may not improve. A sketch of this edit is shown below.
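
A minimal sketch of adding the pragma programmatically, again assuming a typical hls4ml project layout; both the path and the top-function name `myproject` are placeholders:

```python
import re

# Sketch only: add INLINE RECURSIVE at the top of the top-level function.
path = "my-hls-prj/firmware/myproject.cpp"  # hypothetical project path

with open(path) as f:
    src = f.read()

# Insert the pragma right after the opening brace of the top function.
src = re.sub(
    r"(void\s+myproject\s*\([^)]*\)\s*\{)",
    r"\1\n    #pragma HLS INLINE RECURSIVE",
    src,
    count=1,
)

with open(path, "w") as f:
    f.write(src)
```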
When using intra-layer heterogeneous quantization
If using the `latency` strategy, it is recommended to use intra-layer heterogeneous weight quantization.

For intra-layer heterogeneous activation quantization, if you are using `io_parallel` with the `latency` strategy, one may enable it as well. For some networks, however, this can lead to severe overfitting, and the resource reduction is not as significant as for the weight counterpart. A model sketch with heterogeneous quantization is shown below.
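
As an illustrative sketch of such a model, following the layer and callback names from the HGQ README (layer sizes, shapes, and the `beta` value are arbitrary placeholders):

```python
import numpy as np
from keras.models import Sequential
from HGQ.layers import HQuantize, HDense
from HGQ import ResetMinMax, FreeBOPs

beta = 3e-6  # arbitrary BOPs-regularization strength, for illustration only

# Per-weight (intra-layer heterogeneous) bitwidths are learned during training.
model = Sequential([
    HQuantize(beta=beta, input_shape=(16,)),
    HDense(64, activation='relu', beta=beta),
    HDense(10, beta=beta),
])
model.compile(optimizer='adam', loss='mse')

x = np.random.rand(256, 16).astype('float32')
y = np.random.rand(256, 10).astype('float32')
# ResetMinMax/FreeBOPs keep activation ranges and the BOPs estimate up to date.
model.fit(x, y, epochs=1, callbacks=[ResetMinMax(), FreeBOPs()], verbose=0)
```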
When using only inter-layer heterogeneous quantization
Disabling intra-layer heterogeneous weight quantization is recommended if and only if the model is planned to be deployed with the `resource` strategy in hls4ml. When intra-layer heterogeneous quantization is not enabled, this framework is equivalent to optimizing bitwidths with approximated gradients, and the obtained resource consumption may be better or worse than the AutoQKeras counterpart.
When doing this, it is strongly recommended to use only L1 and/or L2 regularization on weights and activations (i.e., set `beta=0`), as the training-time BOPs estimate is not accurate in this setting and not relevant. An illustrative sketch of this configuration follows.
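
A hedged sketch of this configuration, assuming HGQ layers forward the standard Keras `kernel_regularizer` argument (the regularization strengths, layer sizes, and shapes are placeholders):

```python
import keras
from keras.models import Sequential
from HGQ.layers import HQuantize, HDense

# beta=0 disables the BOPs penalty; plain L1/L2 regularization takes over.
# Assumes HGQ layers pass `kernel_regularizer` through like standard Keras layers.
model = Sequential([
    HQuantize(beta=0, input_shape=(16,)),
    HDense(64, activation='relu', beta=0,
           kernel_regularizer=keras.regularizers.L1(1e-5)),
    HDense(10, beta=0,
           kernel_regularizer=keras.regularizers.L2(1e-5)),
])
```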