da4ml.cmvm package
Subpackages
Submodules
da4ml.cmvm.api module
- da4ml.cmvm.api.jit_solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1) CascadedSolution
Optimized implementation of a CMVM computation using two cascaded matrices.
- Parameters:
kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2. A value of -1 means no constraint (follows hard_dc).
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
- Returns:
A solution containing the optimized implementation of the CMVM computation with cascaded stages.
- Return type:
CascadedSolution
- da4ml.cmvm.api.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)
Fast latency calculation for a given kernel, list of QIntervals, and input latencies. When carry_size=-1 and every input has the same constant latency l, the result equals l + max over all columns of ceiling(log2(max(#CSD bits in the column, 1))).
- Parameters:
kernel (np.ndarray) – The input kernel matrix.
qintervals (list[QInterval]) – List of QIntervals for each input.
latencies (list[float]) – List of latencies for each input
carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)
adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)
- Returns:
The minimal latency for the given kernel, QInterval, and input latencies.
- Return type:
float
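The closed-form expression above can be illustrated with a small, self-contained sketch. It is not da4ml's implementation: the helper names `csd_nonzero_digits` and `closed_form_latency` are hypothetical, the kernel is assumed to hold integers, and "#CSD bits for each column" is assumed to mean the total number of nonzero canonical-signed-digit terms in that column.

```python
import math


def csd_nonzero_digits(n: int) -> int:
    """Count the nonzero digits in the canonical signed digit (CSD) form of n."""
    n = abs(n)
    count = 0
    while n:
        if n & 1:
            # Digit is +1 when n % 4 == 1 and -1 when n % 4 == 3, so that
            # no two adjacent digits are nonzero.
            n -= 2 - (n & 3)
            count += 1
        n >>= 1
    return count


def closed_form_latency(kernel: list[list[int]], l: float) -> float:
    """Evaluate l + max over columns of ceil(log2(max(#CSD bits, 1)))."""
    depths = []
    for j in range(len(kernel[0])):
        bits = sum(csd_nonzero_digits(row[j]) for row in kernel)
        # An adder tree summing `bits` terms has depth ceil(log2(bits)).
        depths.append(math.ceil(math.log2(max(bits, 1))))
    return l + max(depths)


# Column 0 holds 1, 7, 5 (1 + 2 + 2 = 5 terms); column 1 holds 3, 0, 2 (3 terms).
kernel = [[1, 3], [7, 0], [5, 2]]
print(closed_form_latency(kernel, 0.0))  # 3.0
```

The sketch only covers the fixed-latency case (carry_size=-1 with constant input latency); the general case depends on the adder and carry sizes and is not modelled here.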
- da4ml.cmvm.api.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) CascadedSolution
Solve the CMVM problem with two cascaded matrices.
- Parameters:
kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2. A value of -1 means no constraint (follows hard_dc).
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.
- Returns:
A solution containing the optimized implementation of the CMVM computation with cascaded stages.
- Return type:
CascadedSolution
da4ml.cmvm.types module
- class da4ml.cmvm.types.CascadedSolution(solutions: tuple[Solution, ...])
Bases:
NamedTuple
A solution that implements cascaded matrix-vector multiplications through multiple CMVM stages.
CascadedSolution represents a sequence of Solution objects where the output of each stage is fed as input to the next stage.
- solutions
A tuple containing the individual Solution objects for each stage of the cascade.
- Type:
tuple[Solution, …]
- Properties:
- kernel
The overall kernel matrix which the cascaded solution implements: vec @ kernel = solution(vec). This is calculated as the matrix product of all individual solution kernels.
- Type:
NDArray[float32]
- cost
The total cost of the cascaded solution, computed as the sum of the costs of all stages.
- Type:
float
- latency
The minimum and maximum latency of the cascaded solution.
- Type:
tuple[float, float]
- inp_latency
Input latencies
- Type:
list[float]
- inp_shift
Input shifts
- Type:
list[int]
- out_latencies
Output latencies
- Type:
list[float]
- out_shift
Output shifts
- Type:
list[int]
- out_neg
Output signs
- Type:
list[bool]
- shape
The shape of the corresponding kernel matrix.
- Type:
tuple[int, int]
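The relationship between the stage kernels and the overall kernel can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `matmul` and `cascade_kernel` are hypothetical helpers, kernels are nested lists, and the per-stage costs are made-up numbers.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cascade_kernel(stage_kernels):
    """Overall kernel of a cascade: the matrix product of the stage kernels, in order."""
    overall = stage_kernels[0]
    for stage in stage_kernels[1:]:
        overall = matmul(overall, stage)
    return overall


k0 = [[1, 2], [0, 1], [3, 0]]  # stage 0: 3 inputs -> 2 intermediate values
k1 = [[1, 0, 2], [4, 1, 0]]    # stage 1: 2 intermediate values -> 3 outputs
vec = [[1, 2, 3]]

# Feeding the output of stage 0 into stage 1 ...
staged = matmul(matmul(vec, k0), k1)
# ... matches one multiplication by the overall kernel.
direct = matmul(vec, cascade_kernel([k0, k1]))
print(staged, direct)  # [[26, 4, 20]] [[26, 4, 20]]

# The cascade cost is the sum of the stage costs (illustrative values).
total_cost = sum([10.0, 4.0])
```

The overall shape follows the same rule: the input count of the first stage and the output count of the last stage, here (3, 3).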
- property cost
- classmethod deserialize(data: dict)
Load the solution from a dictionary.
- property inp_latency
- property inp_qint
- property inp_shift
- property kernel
- property latency
- classmethod load(path: str)
Load the solution from a file.
- property out_latencies
- property out_neg
- property out_qint
- property out_shift
- property reg_bits
The number of bits used for the register in the solution.
- save(path: str | Path)
Save the solution to a file.
- property shape
- class da4ml.cmvm.types.DAState(shifts: tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[int8]]], expr: list[ndarray[tuple[int, ...], dtype[int8]]], ops: list[Op], freq_stat: dict[Pair, int], kernel: ndarray[tuple[int, ...], dtype[float32]])
Bases:
NamedTuple
Internal state of the DA algorithm.
- expr: list[ndarray[tuple[int, ...], dtype[int8]]]
Alias for field number 1
- kernel: ndarray[tuple[int, ...], dtype[float32]]
Alias for field number 4
- shifts: tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[int8]]]
Alias for field number 0
- class da4ml.cmvm.types.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)
Bases:
NamedTuple
One single operation on the data buffer.
- Parameters:
id0 (int) – index of the first operand
id1 (int) – index of the second operand, or special opcode if negative
opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition
data (int) – Data to be used in the operation
qint (QInterval) – Quantization interval of the resultant buffer
latency (float) – Latency of the data generated by this operation (t_available)
cost (float) – Cost of the operation
- cost: float
Alias for field number 6
- data: int
Alias for field number 3
- id0: int
Alias for field number 0
- id1: int
Alias for field number 1
- latency: float
Alias for field number 5
- opcode: int
Alias for field number 2
- class da4ml.cmvm.types.Pair(id0: int, id1: int, sub: bool, shift: int)
Bases:
NamedTuple
An operation representing data[id0] +/- data[id1] * 2**shift.
- id0: int
Alias for field number 0
- id1: int
Alias for field number 1
- shift: int
Alias for field number 3
- sub: bool
Alias for field number 2
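As a minimal sketch of the expression data[id0] +/- data[id1] * 2**shift, the snippet below evaluates a pair against a small buffer. `PairSketch` and `apply_pair` are hypothetical stand-ins that only mirror the documented field names; they are not part of da4ml.

```python
from typing import NamedTuple


class PairSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as the documented Pair."""
    id0: int
    id1: int
    sub: bool
    shift: int


def apply_pair(data: list[float], pair: PairSketch) -> float:
    """Evaluate data[id0] +/- data[id1] * 2**shift."""
    scaled = data[pair.id1] * 2.0 ** pair.shift
    if pair.sub:
        return data[pair.id0] - scaled
    return data[pair.id0] + scaled


data = [3.0, 5.0]
print(apply_pair(data, PairSketch(0, 1, False, 1)))  # 3 + 5 * 2 = 13.0
print(apply_pair(data, PairSketch(0, 1, True, -1)))  # 3 - 5 / 2 = 0.5
```

A negative shift scales the second operand down, which is why the sketch uses a floating-point power of two rather than an integer bit shift.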
- class da4ml.cmvm.types.Precision(keep_negative: bool, integers: int, fractional: int)
Bases:
NamedTuple
A class representing the precision of a quantized interval.
- fractional: int
Alias for field number 2
- integers: int
Alias for field number 1
- keep_negative: bool
Alias for field number 0
- property qint
- class da4ml.cmvm.types.QInterval(min: float, max: float, step: float)
Bases:
NamedTuple
A class representing a quantized interval: [min, max] with a step size.
- classmethod from_kif(k: int | bool, i: int, f: int)
- max: float
Alias for field number 1
- min: float
Alias for field number 0
- property precision
- step: float
Alias for field number 2
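To make the [min, max]-with-step description concrete, the sketch below counts the representable values of an interval and the number of bits needed to index them. `QIntervalSketch`, `n_levels`, and `bits_needed` are hypothetical names used only for illustration; the real class may expose this information differently (for example through its precision property).

```python
import math
from typing import NamedTuple


class QIntervalSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as the documented QInterval."""
    min: float
    max: float
    step: float


def n_levels(q: QIntervalSketch) -> int:
    """Number of values min, min + step, ..., max."""
    return int(round((q.max - q.min) / q.step)) + 1


def bits_needed(q: QIntervalSketch) -> int:
    """Bits required to enumerate every level of the interval."""
    return math.ceil(math.log2(n_levels(q)))


int8_like = QIntervalSketch(-128.0, 127.0, 1.0)
print(n_levels(int8_like), bits_needed(int8_like))  # 256 8

quarter_steps = QIntervalSketch(0.0, 3.75, 0.25)
print(n_levels(quarter_steps), bits_needed(quarter_steps))  # 16 4
```

The first interval matches the [-128, 127, 1] default mentioned for qintervals, which corresponds to a signed 8-bit integer range.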
- class da4ml.cmvm.types.Solution(shape: tuple[int, int], inp_shift: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int)
Bases:
NamedTuple
Represents a series of operations that can be applied to a vector of data. May represent a CMVM solution or a general neural network.
- shape
(number of inputs, number of outputs)
- Type:
tuple[int, int]
- inp_shift
The shifts that should be applied to the input data.
- Type:
list[int]
- out_idxs
The indices of the output data in the buffer.
- Type:
list[int]
- out_shifts
The shifts that should be applied to the output data.
- Type:
list[int]
- out_negs
The signs of the output data.
- Type:
list[bool]
- carry_size
Size of the carry unit for the adder.
- Type:
int
- adder_size
Elementary size of the adder.
- Type:
int
The core part of the solution is the operations in the ops list. For the exact operation each Op performs, refer to the Op class. After all operations are executed, output i is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i].
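The output step described above can be sketched as follows. This is a hedged illustration, not da4ml's code: `read_outputs` is a hypothetical helper, the buffer is a plain list, and it is an assumption that a True entry in out_negs means the corresponding output is negated.

```python
def read_outputs(
    data: list[float],
    out_idxs: list[int],
    out_shifts: list[int],
    out_negs: list[bool],
) -> list[float]:
    """Gather outputs from the buffer once every operation has been executed."""
    outputs = []
    for idx, shift, neg in zip(out_idxs, out_shifts, out_negs):
        value = data[idx] * 2.0 ** shift  # apply the per-output shift
        # Assumption: True in out_negs flips the sign of the output.
        outputs.append(-value if neg else value)
    return outputs


# Buffer contents after all ops have run (illustrative values).
data = [1.0, 2.0, 6.0, -3.0]
print(read_outputs(data, [2, 3], [1, -1], [False, True]))  # [12.0, 1.5]
```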
- adder_size: int
Alias for field number 7
- carry_size: int
Alias for field number 6
- property cost
Total cost of the solution.
- classmethod deserialize(data: dict)
Load the solution from a dictionary.
- property inp_latency
Latencies of all input elements of the solution.
- property inp_qint
Quantization intervals of the input elements.
- inp_shift: list[int]
Alias for field number 1
- property kernel
The kernel represented by the solution, when applicable.
- property latency
Minimum and maximum latency of the solution.
- classmethod load(path: str | Path)
Load the solution from a file.
- out_idxs: list[int]
Alias for field number 2
- property out_latency
Latencies of all output elements of the solution.
- out_negs: list[bool]
Alias for field number 4
- property out_qint
Quantization intervals of the output elements.
- out_shifts: list[int]
Alias for field number 3
- property ref_count: ndarray
The number of references to the output elements in the solution.
- save(path: str | Path)
Save the solution to a file.
- save_binary(path: str | Path)
Dump the solution to a binary file.
- shape: tuple[int, int]
Alias for field number 0
- to_binary()
Module contents
- class da4ml.cmvm.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)
Bases:
NamedTuple
One single operation on the data buffer.
- Parameters:
id0 (int) – index of the first operand
id1 (int) – index of the second operand, or special opcode if negative
opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition
data (int) – Data to be used in the operation
qint (QInterval) – Quantization interval of the resultant buffer
latency (float) – Latency of the data generated by this operation (t_available)
cost (float) – Cost of the operation
- cost: float
Alias for field number 6
- data: int
Alias for field number 3
- id0: int
Alias for field number 0
- id1: int
Alias for field number 1
- latency: float
Alias for field number 5
- opcode: int
Alias for field number 2
- class da4ml.cmvm.QInterval(min: float, max: float, step: float)
Bases:
NamedTuple
A class representing a quantized interval: [min, max] with a step size.
- classmethod from_kif(k: int | bool, i: int, f: int)
- max: float
Alias for field number 1
- min: float
Alias for field number 0
- property precision
- step: float
Alias for field number 2
- class da4ml.cmvm.Solution(shape: tuple[int, int], inp_shift: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int)
Bases:
NamedTuple
Represents a series of operations that can be applied to a vector of data. May represent a CMVM solution or a general neural network.
- shape
(number of inputs, number of outputs)
- Type:
tuple[int, int]
- inp_shift
The shifts that should be applied to the input data.
- Type:
list[int]
- out_idxs
The indices of the output data in the buffer.
- Type:
list[int]
- out_shifts
The shifts that should be applied to the output data.
- Type:
list[int]
- out_negs
The signs of the output data.
- Type:
list[bool]
- carry_size
Size of the carry unit for the adder.
- Type:
int
- adder_size
Elementary size of the adder.
- Type:
int
The core part of the solution is the operations in the ops list. For the exact operation each Op performs, refer to the Op class. After all operations are executed, output i is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i].
- adder_size: int
Alias for field number 7
- carry_size: int
Alias for field number 6
- property cost
Total cost of the solution.
- classmethod deserialize(data: dict)
Load the solution from a dictionary.
- property inp_latency
Latencies of all input elements of the solution.
- property inp_qint
Quantization intervals of the input elements.
- inp_shift: list[int]
Alias for field number 1
- property kernel
The kernel represented by the solution, when applicable.
- property latency
Minimum and maximum latency of the solution.
- classmethod load(path: str | Path)
Load the solution from a file.
- out_idxs: list[int]
Alias for field number 2
- property out_latency
Latencies of all output elements of the solution.
- out_negs: list[bool]
Alias for field number 4
- property out_qint
Quantization intervals of the output elements.
- out_shifts: list[int]
Alias for field number 3
- property ref_count: ndarray
The number of references to the output elements in the solution.
- save(path: str | Path)
Save the solution to a file.
- save_binary(path: str | Path)
Dump the solution to a binary file.
- shape: tuple[int, int]
Alias for field number 0
- to_binary()
- da4ml.cmvm.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)
Fast latency calculation for a given kernel, list of QIntervals, and input latencies. When carry_size=-1 and every input has the same constant latency l, the result equals l + max over all columns of ceiling(log2(max(#CSD bits in the column, 1))).
- Parameters:
kernel (np.ndarray) – The input kernel matrix.
qintervals (list[QInterval]) – List of QIntervals for each input.
latencies (list[float]) – List of latencies for each input
carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)
adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)
- Returns:
The minimal latency for the given kernel, QInterval, and input latencies.
- Return type:
float
- da4ml.cmvm.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) CascadedSolution
Solve the CMVM problem with two cascaded matrices.
- Parameters:
kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2. A value of -1 means no constraint (follows hard_dc).
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.
- Returns:
A solution containing the optimized implementation of the CMVM computation with cascaded stages.
- Return type:
CascadedSolution