High Granularity Quantization 2
HGQ2 (High Granularity Quantization 2) is a quantization-aware training framework built on Keras v3, targeting real-time deep learning applications on edge devices such as FPGAs. It provides a comprehensive set of tools for creating and training quantized neural networks with minimal effort.
HGQ2 implements a gradient-based algorithm for quantization-aware training with automatic bitwidth optimization. By leveraging gradients, it can optimize bitwidths at arbitrary granularity, up to the per-weight and per-activation level.
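In broad terms, the bitwidths are optimized jointly with the model weights by minimizing a combined objective of the form loss = task_loss + beta * EBOPs, where the EBOP term estimates hardware cost and beta (set via beta0 in the configuration example below) trades accuracy against resources. This formula is a simplified reading of the mechanism based on the features described here, not a statement of HGQ2's exact objective.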
Key Features
- Multi-backend support: Works with TensorFlow, JAX, and PyTorch through Keras v3 (backend selection is shown in the snippet after this list)
- Flexible quantization: Supports different quantization schemes, including fixed-point and minifloat
- Hardware synthesis: Direct integration with hls4ml for FPGA deployment (see the end-to-end sketch after the model example below)
- Trainable quantization parameters: Bitwidths are optimized through gradient-based methods
- Effective Bit-Operations (EBOPs): Accurate resource estimation of the deployed firmware during training
- Advanced layer support: Quantized einsum, einsum dense, and multi-head attention layers, with hardware synthesis support
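Because HGQ2 is built on Keras v3, the compute backend is chosen with the standard Keras mechanism before keras is imported. The snippet below only illustrates that standard mechanism and is not HGQ2-specific.

import os

# Select any Keras v3 backend before importing keras:
# 'tensorflow', 'jax', or 'torch'
os.environ['KERAS_BACKEND'] = 'jax'

import keras  # noqa: E402

# Confirm which backend is active, e.g. 'jax'
print(keras.backend.backend())

With a backend selected, a quantized model can be defined with the Q-prefixed layers, as in the example below.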
import keras
from hgq.layers import QDense, QConv2D
from hgq.config import LayerConfigScope, QuantizerConfigScope

# Setup quantization configuration
# These values are the defaults, just for demonstration purposes here
with (
    # Configuration scopes for setting the default quantization type and overflow mode
    # The second configuration scope overrides the first one for the 'datalane' place
    QuantizerConfigScope(place='all', default_q_type='kbi', overflow_mode='SAT_SYM'),
    QuantizerConfigScope(place='datalane', default_q_type='kif', overflow_mode='WRAP'),
    # Configuration scope for enabling EBOPs and setting the beta0 value
    LayerConfigScope(enable_ebops=True, beta0=1e-5),
):
    model = keras.Sequential([
        QConv2D(32, (3, 3), activation='relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        QDense(10),
    ])
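Once constructed, the model trains like any other Keras model and can then be handed to hls4ml for FPGA synthesis. The sketch below is illustrative only: it uses random stand-in data and a hypothetical output directory, and it assumes hls4ml's generic Keras conversion entry points (config_from_keras_model and convert_from_keras_model); the exact conversion options for HGQ2 models may differ, so consult the hls4ml and HGQ2 documentation.

import numpy as np
import hls4ml

# Standard Keras training; with enable_ebops=True the EBOP estimate is
# expected to enter the loss as a regularization term scaled by beta0 (see above)
model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

# Random stand-in data, only to make the sketch self-contained
x_train = np.random.rand(256, 28, 28, 1).astype('float32')
y_train = np.random.randint(0, 10, size=(256,))
model.fit(x_train, y_train, epochs=1, batch_size=32)

# Convert to an hls4ml project (assumed flow; options may differ for HGQ2 models)
hls_config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=hls_config,
    output_dir='hls4ml_prj',  # hypothetical output directory
    backend='Vitis',          # assumed backend
)
hls_model.compile()  # builds the C simulation for bit-accurate verification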
Index
API Reference:
- hgq package
- hgq.config package
- hgq.constraints package
- hgq.layers package
- Subpackages
- Submodules
- hgq.layers.activation module
- hgq.layers.batch_normalization module
- hgq.layers.conv module
- hgq.layers.einsum_dense_batchnorm module
- hgq.layers.linformer_attention module
- hgq.layers.multi_head_attention module
- hgq.layers.pooling module
- hgq.layers.softmax module
- Module contents
QAdd
QAveragePooling1D
QAveragePooling2D
QAveragePooling3D
QAveragePow2
QAvgPool1D
QAvgPool2D
QAvgPool3D
QBatchNormDense
QBatchNormalization
QConv1D
QConv2D
QConv3D
QDense
QDot
QEinsum
QEinsumDense
QEinsumDenseBatchnorm
QGlobalAveragePooling1D
QGlobalAveragePooling2D
QGlobalAveragePooling3D
QGlobalAvgPool1D
QGlobalAvgPool2D
QGlobalAvgPool3D
QGlobalMaxPool1D
QGlobalMaxPool2D
QGlobalMaxPool3D
QGlobalMaxPooling1D
QGlobalMaxPooling2D
QGlobalMaxPooling3D
QLinformerAttention
QMaxPool1D
QMaxPool2D
QMaxPool3D
QMaxPooling1D
QMaxPooling2D
QMaxPooling3D
QMaximum
QMeanPow2
QMinimum
QMultiHeadAttention
QMultiply
QSoftmax
QSubtract
QSum
QUnaryFunctionLUT
Quantizer
- hgq.quantizer package
- hgq.regularizers package
- hgq.utils package
- qkeras package