da4ml.cmvm package

Submodules

da4ml.cmvm.api module

da4ml.cmvm.api.jit_solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1) → CascadedSolution

Optimized implementation of a CMVM computation with two cascaded matrices.

Parameters:

kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2 (no constraint, follows hard_dc)
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)

Returns:

A solution containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

CascadedSolution
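The hard_dc parameter is described as the additional latency allowed beyond the minimal latency. A minimal sketch of that arithmetic follows, assuming the budget is simply the sum of the two and that a negative value disables the constraint. The helper `latency_budget` is hypothetical and not part of da4ml.

```python
import math


def latency_budget(minimal_latency: float, hard_dc: int) -> float:
    """Return the largest latency a solution may have under hard_dc.

    Hypothetical helper for illustration; not a da4ml function.
    Assumes a negative hard_dc (documented default: -1) means "no constraint".
    """
    if hard_dc < 0:
        return math.inf
    return minimal_latency + hard_dc


print(latency_budget(3.0, 2))   # budget of 5.0
print(latency_budget(3.0, -1))  # unconstrained
```

With hard_dc=0 the budget equals the minimal latency, which is the tightest setting this reading allows.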

da4ml.cmvm.api.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)

Fast latency calculation for a given kernel, QIntervals, and input latencies. When carry_size=-1 and all inputs share a constant latency l, the result equals l + max(ceiling(log2(max(#CSD bits for each column, 1)))).

Parameters:

kernel (np.ndarray) – The input kernel matrix.
qintervals (list[QInterval]) – List of QIntervals for each input.
latencies (list[float]) – List of latencies for each input
carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)
adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)

Returns:

The minimal latency for the given kernel, QInterval, and input latencies.

Return type:

float
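The closed-form expression above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the kernel is taken to hold integers, "#CSD bits for each column" is read as the total number of nonzero canonical signed digit (CSD) digits in that column, and the helper names `csd_nonzero_digits` and `fixed_cost_latency` are made up for this example.

```python
import math


def csd_nonzero_digits(n: int) -> int:
    """Count nonzero digits in the canonical signed digit (CSD) form of an integer."""
    n = abs(n)
    count = 0
    while n:
        if n & 1:
            # Pick digit +1 when n % 4 == 1 and -1 when n % 4 == 3,
            # so that no two adjacent digits are nonzero.
            n -= 2 - (n & 3)
            count += 1
        n >>= 1
    return count


def fixed_cost_latency(kernel: list[list[int]], l: float = 0.0) -> float:
    """Evaluate l + max over columns of ceil(log2(max(#CSD digits in column, 1)))."""
    n_out = len(kernel[0])
    depths = []
    for j in range(n_out):
        digits = sum(csd_nonzero_digits(row[j]) for row in kernel)
        depths.append(math.ceil(math.log2(max(digits, 1))))
    return l + max(depths)


# Rows are inputs, columns are outputs (vec @ kernel).
kernel = [[7, 1], [3, 0], [5, 2]]
print(fixed_cost_latency(kernel))         # column 0 has 6 CSD digits -> depth 3
print(fixed_cost_latency(kernel, l=1.0))  # same depth, shifted by the input latency
```

The logarithm reflects a balanced adder tree: summing n terms with two-input adders takes ceil(log2(n)) levels when each addition has a fixed latency.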

da4ml.cmvm.api.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) → CascadedSolution

Solve the CMVM problem with two cascaded matrices.

Parameters:

kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2 (no constraint, follows hard_dc)
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.

Returns:

A solution containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

CascadedSolution

da4ml.cmvm.types module

class da4ml.cmvm.types.CascadedSolution(solutions: tuple[Solution, ...])

Bases: NamedTuple

A solution that implements cascaded matrix-vector multiplications through multiple CMVM stages.

CascadedSolution represents a sequence of Solution objects where the output of each stage is fed as input to the next stage.

solutions

A tuple containing the individual Solution objects for each stage of the cascade.

Type: tuple[Solution, …]

Properties:

kernel

The overall kernel matrix that the cascaded solution implements: vec @ kernel = solution(vec). It is computed as the matrix product of the kernels of all individual stages.

Type: NDArray[float32]

cost

The total cost of the cascaded solution, computed as the sum of the costs of all stages.

Type: float

latency

The minimum and maximum latency of the cascaded solution.

Type: tuple[float, float]

inp_qint

Input quantization intervals.

Type: list[QInterval]

inp_latency

Input latencies.

Type: list[float]

inp_shift

Input shifts.

Type: list[int]

out_qint

Output quantization intervals.

Type: list[QInterval]

out_latencies

Output latencies.

Type: list[float]

out_shift

Output shifts.

Type: list[int]

out_neg

Output signs.

Type: list[bool]

shape

The shape of the corresponding kernel matrix.

Type: tuple[int, int]
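The relation between the stage kernels and the overall kernel can be checked with a small plain-Python sketch. The two stage kernels and the per-stage costs below are made-up values, and `matmul` and `vecmat` are helper functions written for this example rather than da4ml calls.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def vecmat(v, m):
    """Row vector times matrix, i.e. vec @ m."""
    return matmul([v], m)[0]


# Hypothetical stage kernels: 3 inputs -> 2 intermediates -> 2 outputs.
k0 = [[1, 0], [2, 1], [0, 3]]
k1 = [[1, 1], [0, 2]]
stage_costs = [4.0, 2.0]  # made-up per-stage costs

overall_kernel = matmul(k0, k1)   # product of the stage kernels, in order
total_cost = sum(stage_costs)     # cost of the cascade is the sum over stages

vec = [1, 2, 3]
staged = vecmat(vecmat(vec, k0), k1)  # feed the stage 0 output into stage 1
direct = vecmat(vec, overall_kernel)  # apply the overall kernel in one step
print(overall_kernel, staged, direct, total_cost)
```

Both evaluation orders give the same vector because matrix multiplication is associative.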

property cost

classmethod deserialize(data: dict): Load the solution from a dictionary.

property inp_latency

property inp_qint

property inp_shift

property kernel

property latency

classmethod load(path: str): Load the solution from a file.

property out_latencies

property out_neg

property out_qint

property out_shift

property reg_bits: The number of bits used for the register in the solution.

save(path: str | Path): Save the solution to a file.

property shape

solutions: tuple[Solution, ...]: Alias for field number 0

class da4ml.cmvm.types.DAState(shifts: tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[int8]]], expr: list[ndarray[tuple[int, ...], dtype[int8]]], ops: list[Op], freq_stat: dict[Pair, int], kernel: ndarray[tuple[int, ...], dtype[float32]])

Bases: NamedTuple

Internal state of the DA algorithm.

expr: list[ndarray[tuple[int, ...], dtype[int8]]]: Alias for field number 1

freq_stat: dict[Pair, int]: Alias for field number 3

kernel: ndarray[tuple[int, ...], dtype[float32]]: Alias for field number 4

ops: list[Op]: Alias for field number 2

shifts: tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[int8]]]: Alias for field number 0

class da4ml.cmvm.types.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)

Bases: NamedTuple

One single operation on the data buffer.

Parameters:

id0 (int) – index of the first operand
id1 (int) – index of the second operand, or special opcode if negative
opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition
data (int) – Data to be used in the operation
qint (QInterval) – Quantization interval of the resultant buffer
latency (float) – Latency of the data generated by this operation (t_available)
cost (float) – Cost of the operation

cost: float: Alias for field number 6

data: int: Alias for field number 3

id0: int: Alias for field number 0

id1: int: Alias for field number 1

latency: float: Alias for field number 5

opcode: int: Alias for field number 2

qint: QInterval: Alias for field number 4
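The opcode values listed in the Op parameters can be given readable names with an enumeration. The sketch below is only a reading aid: the class `OpCode`, its member names, and the `describe` helper are hypothetical and are not defined by da4ml.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """Hypothetical names for the opcode values listed in the Op parameters."""
    ADD = 0
    SUB = 1
    RELU = 2
    QUANTIZE = 3
    CONST_ADD = 4


def describe(opcode: int) -> str:
    """Return a readable label for an opcode value, or 'unknown' if unlisted."""
    try:
        return OpCode(opcode).name.lower()
    except ValueError:
        return "unknown"


print(describe(1))
print(describe(9))
```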

class da4ml.cmvm.types.Pair(id0: int, id1: int, sub: bool, shift: int)

Bases: NamedTuple

An operation representing data[id0] +/- data[id1] * 2**shift.

id0: int: Alias for field number 0

id1: int: Alias for field number 1

shift: int: Alias for field number 3

sub: bool: Alias for field number 2
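The expression data[id0] +/- data[id1] * 2**shift can be evaluated directly. The sketch below uses a stand-in tuple with the same four fields as Pair rather than importing da4ml, and it assumes that sub=True selects the minus sign.

```python
from typing import NamedTuple


class PairSketch(NamedTuple):
    """Stand-in with the same fields as Pair; not imported from da4ml."""
    id0: int
    id1: int
    sub: bool
    shift: int


def apply_pair(data: list[float], pair: PairSketch) -> float:
    """Compute data[id0] +/- data[id1] * 2**shift, subtracting when sub is True."""
    term = data[pair.id1] * 2.0 ** pair.shift
    return data[pair.id0] - term if pair.sub else data[pair.id0] + term


data = [3.0, 5.0]
print(apply_pair(data, PairSketch(0, 1, False, 1)))   # 3 + 5 * 2 = 13.0
print(apply_pair(data, PairSketch(0, 1, True, -1)))   # 3 - 5 / 2 = 0.5
```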

class da4ml.cmvm.types.Precision(keep_negative: bool, integers: int, fractional: int)

Bases: NamedTuple

A class representing the precision of a quantized interval.

fractional: int: Alias for field number 2

classmethod from_qint(qint: QInterval, symmetric: bool = False)

integers: int: Alias for field number 1

keep_negative: bool: Alias for field number 0

property qint

class da4ml.cmvm.types.QInterval(min: float, max: float, step: float)

Bases: NamedTuple

A class representing a quantized interval: [min, max] with a step size.

classmethod from_kif(k: int | bool, i: int, f: int)

classmethod from_precision(prec: Precision)

max: float: Alias for field number 1

min: float: Alias for field number 0

property precision

step: float: Alias for field number 2
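To make the link between (k, i, f) and an interval concrete, here is a sketch under the usual fixed-point convention: k sign bits, i integer bits, f fractional bits. The convention is an assumption, chosen because it reproduces the default interval [-128, 127, 1] quoted for solve when k=1, i=7, f=0. The names `QIntervalSketch` and `qinterval_from_kif` are made up for this example.

```python
from typing import NamedTuple


class QIntervalSketch(NamedTuple):
    """Stand-in with the same fields as QInterval; not imported from da4ml."""
    min: float
    max: float
    step: float


def qinterval_from_kif(k: int, i: int, f: int) -> QIntervalSketch:
    """Interval covered by a fixed-point number with k sign, i integer, f fractional bits."""
    step = 2.0 ** -f
    low = -(2.0 ** i) if k else 0.0   # most negative value, 0 when unsigned
    high = 2.0 ** i - step            # largest representable value
    return QIntervalSketch(low, high, step)


print(qinterval_from_kif(1, 7, 0))   # signed 8-bit integer range
print(qinterval_from_kif(0, 2, 2))   # unsigned, 2 integer and 2 fractional bits
```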

class da4ml.cmvm.types.Solution(shape: tuple[int, int], inp_shift: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int)

Bases: NamedTuple

Represents a series of operations that can be applied to a vector of data. May represent a CMVM solution or a general neural network.

shape

Number of inputs and number of outputs.

Type: tuple[int, int]

inp_shift

The shifts that should be applied to the input data.

Type: list[int]

out_idxs

The indices of the output data in the buffer.

Type: list[int]

out_shifts

The shifts that should be applied to the output data.

Type: list[int]

out_negs

The signs of the output data.

Type: list[bool]

ops

Core list of operations for generating each buffer element.

Type: list[Op]

carry_size

Size of the carry unit for the adder.

Type: int

adder_size

Elementary size of the adder.

Type: int

The core of the solution is the list of operations in ops. For the exact operation each Op performs, refer to the Op class. After all operations are executed, the i-th output is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i].
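The final read-out step can be sketched as follows. The buffer contents are made up, `read_outputs` is a hypothetical helper, and the handling of out_negs is an assumption: the text above only mentions the index and the shift, so treating True as "negate this output" is a guess based on the field being described as the output signs.

```python
def read_outputs(data, out_idxs, out_shifts, out_negs):
    """Gather outputs from the buffer, scale by 2**shift, and negate where flagged."""
    outputs = []
    for idx, shift, neg in zip(out_idxs, out_shifts, out_negs):
        value = data[idx] * 2.0 ** shift
        outputs.append(-value if neg else value)
    return outputs


buffer = [1.0, 4.0, 6.0, 10.0]   # made-up buffer contents after running ops
result = read_outputs(buffer, out_idxs=[3, 1], out_shifts=[-1, 2], out_negs=[False, True])
print(result)
```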

adder_size: int: Alias for field number 7

carry_size: int: Alias for field number 6

property cost: Total cost of the solution.

classmethod deserialize(data: dict): Load the solution from a dictionary.

property inp_latency: Latencies of all input elements of the solution.

property inp_qint: Quantization intervals of the input elements.

inp_shift: list[int]: Alias for field number 1

property kernel: The kernel represented by the solution, when applicable.

property latency: Minimum and maximum latency of the solution.

classmethod load(path: str | Path): Load the solution from a file.

ops: list[Op]: Alias for field number 5

out_idxs: list[int]: Alias for field number 2

property out_latency: Latencies of all output elements of the solution.

out_negs: list[bool]: Alias for field number 4

property out_qint: Quantization intervals of the output elements.

out_shifts: list[int]: Alias for field number 3

property ref_count: ndarray: The number of references to the output elements in the solution.

save(path: str | Path): Save the solution to a file.

save_binary(path: str | Path): Dump the solution to a binary file.

shape: tuple[int, int]: Alias for field number 0

to_binary()

da4ml.cmvm.types.minimal_kif(qi: QInterval, symmetric: bool = False) → Precision

Calculate the minimal KIF for a given QInterval.

Parameters:

qi (QInterval) – The QInterval to calculate the KIF for.
symmetric (bool) – Only relevant if qi may be negative. If True, -2**i will be regarded as forbidden. May be useful in special cases only. Default is False.

Returns:

A named tuple with the KIF values.

Return type:

Precision
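One way to compute such a minimal (k, i, f) triple is sketched below. It is an illustration, not the library's code: it assumes the step is a power of two, takes the interval as three plain numbers instead of a QInterval, and interprets symmetric=True as requiring one extra value on the negative side so that -2**i is never needed. `PrecisionSketch` and `minimal_kif_sketch` are hypothetical names.

```python
import math
from typing import NamedTuple


class PrecisionSketch(NamedTuple):
    """Stand-in with the same fields as Precision; not imported from da4ml."""
    keep_negative: bool
    integers: int
    fractional: int


def minimal_kif_sketch(lo: float, hi: float, step: float, symmetric: bool = False) -> PrecisionSketch:
    """Smallest (k, i, f) whose range covers [lo, hi] with a power-of-two step."""
    if lo == hi == 0:
        return PrecisionSketch(False, 0, 0)
    fractional = round(-math.log2(step))
    int_lo, int_hi = round(lo / step), round(hi / step)
    # Asymmetric case: the most negative code is usable, so the negative
    # side needs one value fewer than the positive side.
    if symmetric:
        bound = max(abs(int_lo) + 1, int_hi + 1)
    else:
        bound = max(abs(int_lo), int_hi + 1)
    bits = (bound - 1).bit_length()   # equals ceil(log2(bound)) for bound >= 1
    return PrecisionSketch(lo < 0, bits - fractional, fractional)


print(minimal_kif_sketch(-128, 127, 1))
print(minimal_kif_sketch(-128, 127, 1, symmetric=True))
print(minimal_kif_sketch(0, 3.75, 0.25))
```

Using `bit_length` instead of a floating-point logarithm avoids rounding trouble for large integer bounds.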

Module contents

class da4ml.cmvm.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)

Bases: NamedTuple

One single operation on the data buffer.

Parameters:

id0 (int) – index of the first operand
id1 (int) – index of the second operand, or special opcode if negative
opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition
data (int) – Data to be used in the operation
qint (QInterval) – Quantization interval of the resultant buffer
latency (float) – Latency of the data generated by this operation (t_available)
cost (float) – Cost of the operation

cost: float: Alias for field number 6

data: int: Alias for field number 3

id0: int: Alias for field number 0

id1: int: Alias for field number 1

latency: float: Alias for field number 5

opcode: int: Alias for field number 2

qint: QInterval: Alias for field number 4

class da4ml.cmvm.QInterval(min: float, max: float, step: float)

Bases: NamedTuple

A class representing a quantized interval: [min, max] with a step size.

classmethod from_kif(k: int | bool, i: int, f: int)

classmethod from_precision(prec: Precision)

max: float: Alias for field number 1

min: float: Alias for field number 0

property precision

step: float: Alias for field number 2

class da4ml.cmvm.Solution(shape: tuple[int, int], inp_shift: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int)

Bases: NamedTuple

Represents a series of operations that can be applied to a vector of data. May represent a CMVM solution or a general neural network.

shape

Number of inputs and number of outputs.

Type: tuple[int, int]

inp_shift

The shifts that should be applied to the input data.

Type: list[int]

out_idxs

The indices of the output data in the buffer.

Type: list[int]

out_shifts

The shifts that should be applied to the output data.

Type: list[int]

out_negs

The signs of the output data.

Type: list[bool]

ops

Core list of operations for generating each buffer element.

Type: list[Op]

carry_size

Size of the carry unit for the adder.

Type: int

adder_size

Elementary size of the adder.

Type: int

The core of the solution is the list of operations in ops. For the exact operation each Op performs, refer to the Op class. After all operations are executed, the i-th output is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i].

adder_size: int: Alias for field number 7

carry_size: int: Alias for field number 6

property cost: Total cost of the solution.

classmethod deserialize(data: dict): Load the solution from a dictionary.

property inp_latency: Latencies of all input elements of the solution.

property inp_qint: Quantization intervals of the input elements.

inp_shift: list[int]: Alias for field number 1

property kernel: The kernel represented by the solution, when applicable.

property latency: Minimum and maximum latency of the solution.

classmethod load(path: str | Path): Load the solution from a file.

ops: list[Op]: Alias for field number 5

out_idxs: list[int]: Alias for field number 2

property out_latency: Latencies of all output elements of the solution.

out_negs: list[bool]: Alias for field number 4

property out_qint: Quantization intervals of the output elements.

out_shifts: list[int]: Alias for field number 3

property ref_count: ndarray: The number of references to the output elements in the solution.

save(path: str | Path): Save the solution to a file.

save_binary(path: str | Path): Dump the solution to a binary file.

shape: tuple[int, int]: Alias for field number 0

to_binary()

da4ml.cmvm.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)

Fast latency calculation for a given kernel, QIntervals, and input latencies. When carry_size=-1 and all inputs share a constant latency l, the result equals l + max(ceiling(log2(max(#CSD bits for each column, 1)))).

Parameters:

kernel (np.ndarray) – The input kernel matrix.
qintervals (list[QInterval]) – List of QIntervals for each input.
latencies (list[float]) – List of latencies for each input
carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)
adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)

Returns:

The minimal latency for the given kernel, QInterval, and input latencies.

Return type:

float

da4ml.cmvm.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) → CascadedSolution

Solve the CMVM problem with two cascaded matrices.

Parameters:

kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2 (no constraint, follows hard_dc)
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.

Returns:

A solution containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

CascadedSolution
