Estimators
NMF Estimators

The espm.estimators module implements different NMF algorithms.

The class espm.estimators.NMFEstimator is an abstract class for all NMF algorithms. It implements the fit and transform methods. The fit method is implemented in the abstract class and calls the _iteration method, which is implemented in the child classes. The transform method is implemented in the child classes.
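Schematically, the split between the abstract class and its children looks like this (a simplified sketch, not the actual espm source; every name except _iteration is illustrative):

>>> import numpy as np
>>> class NMFEstimator:
...     """Simplified sketch of the abstract base class (not the actual espm code)."""
...     def __init__(self, n_components=2, max_iter=200):
...         self.n_components, self.max_iter = n_components, max_iter
...     def _initialize(self, X):
...         # illustrative random initialization
...         n, p = X.shape
...         return np.random.rand(n, self.n_components), np.random.rand(self.n_components, p)
...     def fit(self, X):
...         W, H = self._initialize(X)
...         for _ in range(self.max_iter):
...             W, H = self._iteration(W, H)   # implemented by each child class
...         self.W_, self.H_ = W, H
...         return self
>>> class MyNMF(NMFEstimator):
...     def _iteration(self, W, H):
...         # one multiplicative update of W and H would go here
...         return W, H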
NMF Estimator
- class espm.estimators.NMFEstimator(n_components=2, init=None, tol=0.0001, max_iter=200, random_state=None, verbose=1, debug=False, l2=False, G=None, shape_2d=None, normalize=False, log_shift=1e-14, eval_print=10, true_D=None, true_H=None, fixed_H=None, fixed_W=None, hspy_comp=False, no_stop_criterion=False, simplex_H=False, simplex_W=True)[source]
Abstract class for NMF algorithms.
This abstract class espm.estimators.NMFEstimator is used to implement the different NMF algorithms. It solves problems of the form:
\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} \frac{1}{2} L(X, GWH) + R(W, H)\]where \(X\) is the data matrix, \(G\) is a matrix of known values, \(W\) and \(H\) are the matrices to be learned, \(R\) is a regularization term and \(L\) is a loss function, which by default is the generalized KL divergence. As a reminder, the generalized KL divergence is defined as:
\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]where \(Y = GWH\). Since \(X\) does not depend on \(W\) and \(H\), we obtain the loss function:
\[L(X, Y) = \sum_{i,j} - X_{ij} \log (GWH)_{ij} + (GWH)_{ij}\]

The generalized KL divergence has the advantage of being zero when \(X = Y\), which is not the case for our loss. Therefore, we shift the loss function by a constant \(C\) such that it equals the generalized KL divergence. This constant is stored in the attribute espm.estimators.NMFEstimator.const_KL_.

The loss function can also be selected to be the Frobenius norm. In this case, the loss function is:
\[L(X, Y) = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2\]While the code will work, it is not recommended to use the Frobenius norm as a loss function. This code is optimized for the KL divergence.
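To make the constant shift concrete, a small numpy sketch (illustrative only) checking that the truncated loss plus the constant recovers the generalized KL divergence:

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.random((5, 4)) + 0.1               # strictly positive data
>>> Y = rng.random((5, 4)) + 0.1               # stands in for GWH
>>> gkl = np.sum(X * np.log(X / Y) - X + Y)    # D_GKL(X || Y)
>>> loss = np.sum(-X * np.log(Y) + Y)          # truncated loss L(X, Y)
>>> C = np.sum(X * np.log(X) - X)              # constant C (conceptually what const_KL_ stores)
>>> bool(np.isclose(gkl, loss + C))
True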
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
The columns of the matrices \(H\) and \(X\) are assumed to be images. This is used typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e. shape_2d[0]*shape_2d[1] = p.
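For example, with 64×64-pixel images (an arbitrary choice), the column dimension \(p\) must satisfy shape_2d[0]*shape_2d[1] = p:

>>> shape_2d = (64, 64)
>>> p = shape_2d[0] * shape_2d[1]   # each of the p pixels is one column entry
>>> p
4096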
- Parameters:
- n_components : int, default=2
Number of components, i.e. the dimensionality of the latent space.
- init : str, default=None
Method used to initialize the procedure. It uses the initialization of sklearn.decomposition, which can be imported using:
>>> from sklearn.decomposition._nmf import _initialize_nmf
- tol : float, default=1e-4
Tolerance of the stopping condition.
- max_iter : int, default=200
Maximum number of iterations before timing out.
- random_state : int, RandomState instance, default=None
Controls the randomness of the initialization for reproducible results.
- verbose : int, default=1
The verbosity level.
- debug : bool, default=False
If True, the algorithm will log more and perform more checks.
- l2 : bool, default=False
If True, the algorithm will use the l2 norm instead of the KL divergence.
- G : np.array, function or None, default=None
If np.array, it is the known matrix of the data. If function, it is a function that takes the data matrix as input and returns the known matrix (np.array). If None, G is assumed to be the identity matrix.
- shape_2d : tuple or None, default=None
If not None, it is the image shape of the columns of the matrices \(X\) and \(H\).
- normalize : bool, default=False
If True, the algorithm will normalize the data matrix \(X\).
- log_shift : float, default=1e-14
Lower bound for W and H, i.e. \(\epsilon\).
- eval_print : int, default=10
Number of iterations between each evaluation of the loss function.
- true_D : np.array or None, default=None
Ground truth for the matrix \(GW\). Used for evaluation purposes.
- true_H : np.array or None, default=None
Ground truth for the matrix \(H\). Used for evaluation purposes.
- fixed_H : np.array or None, default=None
If not None, it fixes the non-zero values of the matrix \(H\). Note that convergence is not guaranteed with fixed_H enabled.
- fixed_W : np.array or None, default=None
If not None, it fixes the non-zero values of the matrix \(W\). Note that convergence is not guaranteed with fixed_W enabled.
- no_stop_criterion : bool, default=False
If True, the algorithm will not stop when the stopping criterion is reached and will continue until max_iter is reached.
- hspy_comp : bool, default=False
If True, the algorithm will use the format compatible with hyperspy. Use this option if you run the algorithm with the decomposition method in hyperspy. For example (a fuller sketch follows this list):
>>> est = SmoothNMF(n_components=3, hspy_comp=True)
>>> out = spim.decomposition(algorithm=est, return_info=True)
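Expanding that example, a minimal end-to-end sketch (the file name is hypothetical; spim is assumed to be a hyperspy signal):

>>> import hyperspy.api as hs
>>> from espm.estimators import SmoothNMF
>>> spim = hs.load("spectrum_image.hspy")   # hypothetical file
>>> est = SmoothNMF(n_components=3, hspy_comp=True)
>>> out = spim.decomposition(algorithm=est, return_info=True)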
- fit(X, y=None, **params)[source]
Learn an NMF model for the data X.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Data matrix to be decomposed.
- y : Ignored
- params : dict
Parameters passed to the fit_transform method.
- Returns:
- self
The model.
- fit_transform(X, y=None, W=None, H=None)[source]
Main function of the estimator object. Learn an NMF model for the data X and return the transformed data. This is more efficient than calling fit followed by transform.
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
- Parameters:
- X : {array-like, sparse matrix} of shape (n, p)
Data matrix to be decomposed.
- y : Ignored
Not used, present here for API consistency by convention.
- W : array-like of shape (m, k)
If specified, it is used as an initial guess for the solution.
- H : array-like of shape (k, p)
If specified, it is used as an initial guess for the solution.
- Returns:
- GW : ndarray
Transformed data.
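As an illustration, a minimal sketch on random data (SmoothNMF is used because NMFEstimator itself is abstract; all values are arbitrary):

>>> import numpy as np
>>> from espm.estimators import SmoothNMF
>>> X = np.random.rand(64, 100)   # (n, p) data matrix
>>> est = SmoothNMF(n_components=3)
>>> GW = est.fit_transform(X)     # transformed data, shape (n, k)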
- inverse_transform(W)[source]
Transform data back to its original space.
- Parameters:
- W : {ndarray, sparse matrix} of shape (n_samples, n_components)
Transformed data matrix.
- Returns:
- X : {ndarray, sparse matrix} of shape (n_samples, n_features)
Data matrix of original shape.
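For example, feeding the transformed data back in yields an approximate reconstruction of X (continuing the sketch above; this assumes the output of fit_transform is a valid input here, per the documented shapes):

>>> X_rec = est.inverse_transform(GW)   # approximate reconstruction, shape (n, p)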
- loss(W, H, average=True, X=None)[source]
Loss function.
Compute the loss function for the given matrices W and H.
- Parameters:
- W : np.array
Matrix of shape (m, k)
- H : np.array
Matrix of shape (k, p)
- average : bool, default=True
If True, the loss is averaged over the number of elements of the matrices.
- X : np.array or None, default=None
If not None, it is the data matrix. If None, it is assumed that the data matrix is stored in self.X_.
- Returns:
- loss_ : float
Value of the loss function.
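For instance, continuing the sketch above, the loss can be evaluated at explicit factor matrices (random here, so the value is meaningless except as a demonstration):

>>> k = 3
>>> W0 = np.random.rand(64, k)    # (m, k); m = n here since G is None
>>> H0 = np.random.rand(k, 100)   # (k, p)
>>> val = est.loss(W0, H0)        # averaged loss at (W0, H0)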
- set_inverse_transform_request(*, W: bool | None | str = '$UNCHANGED$') → NMFEstimator
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- W : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the W parameter in inverse_transform.
- Returns:
- self : object
The updated object.
- const_KL_ = None
- loss_names_ = ['KL_div_loss']
SmoothNMF
- class espm.estimators.SmoothNMF(lambda_L=0.0, linesearch=False, mu=0, epsilon_reg=1, algo='log_surrogate', dicotomy_tol=1e-05, gamma=None, **kwargs)[source]
SmoothNMF - NMF with a smooth regularization term
We encourage you to read the example available in the documentation: https://espm.readthedocs.io/en/latest/introduction/notebooks/toy-problem.html
The corresponding notebook is available on GitHub: https://github.com/adriente/espm/blob/main/notebooks/toy-ML.ipynb
The class SmoothNMF implements the regularized NMF algorithm. It solves problems of the form:
\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} D_{GKL}(X || GWH) + \lambda_L tr(H \Delta H^\top) + \mu \sum_{ij} \log(H_{ij} + \epsilon_{reg})\]where:
\(D_{GKL}\) is the generalized KL divergence loss function defined as:
\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]
See the documentation of the class espm.estimators.NMFEstimator for more details.
\(\Delta\) is the Laplacian operator (it can be created using the function create_laplacian_matrix from the utils module).
\(\epsilon_{reg}\) is the slope of the log regularization/sparsity at 0 (you probably want to leave this at 1).
\(\lambda_L\) is a regularization parameter, which encourages smoothness in the columns of \(H\).
\(\mu\) is a regularization parameter, which is similar to an L1 sparsity penalty.
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
The columns of the matrices H and X are assumed to be images, typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e., shape_2d[0]*shape_2d[1] = p.
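As an illustration, a regularized run on synthetic image-shaped data might look like this (a sketch; all values are arbitrary):

>>> import numpy as np
>>> from espm.estimators import SmoothNMF
>>> shape_2d = (10, 10)   # p = 100 pixels
>>> X = np.random.rand(64, shape_2d[0] * shape_2d[1])
>>> est = SmoothNMF(n_components=3, lambda_L=1.0, mu=0.1, shape_2d=shape_2d)
>>> GW = est.fit_transform(X)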
- Parameters:
- lambda_L : float, default=0.0
Regularization parameter for the smooth regularization term.
- linesearch : bool, default=False
If True, use a line search to find the step size.
- mu : float, default=0
Regularization parameter for the log regularization/sparsity term.
- epsilon_reg : float, default=1
Slope of the log regularization/sparsity at 0.
- algo : str, default="log_surrogate"
Algorithm to use for the smooth regularization term. Can be "log_surrogate", "l2_surrogate", or "projected_gradient" (see the sketch after this list).
- simplex_H : bool, default=False
If True, force the solution of H to be in the simplex.
- simplex_W : bool, default=True
If True, force the solution of W to be in the simplex.
- dicotomy_tol : float, default=1e-5
Tolerance for the dichotomy algorithm.
- gamma : float, default=None
Initial value for the step size. If None, it is set to the Lipschitz constant of the gradient.
- **kwargs : dict
Additional parameters for the NMFEstimator class.
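The solver for the smooth term can be swapped via algo; a quick comparison loop might look like this (a sketch, reusing X and shape_2d from the example above; no claim is made about which solver is best for a given problem):

>>> for algo in ("log_surrogate", "l2_surrogate", "projected_gradient"):
...     est = SmoothNMF(n_components=3, lambda_L=1.0, algo=algo, shape_2d=shape_2d)
...     GW = est.fit_transform(X)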
- fit_transform(X, y=None, W=None, H=None)[source]
Fit the model to the data X and return the transformed data.
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
- Parameters:
- X : array-like, shape (n, p)
Data matrix to be decomposed.
- y : Ignored
Not used, present here for API consistency by convention.
- W : array-like, shape (m, k)
If init='custom', it is used as an initial guess for the solution.
- H : array-like, shape (k, p)
If init='custom', it is used as an initial guess for the solution.
- Returns:
- GW : ndarray
Transformed data.
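For example, a custom initialization might look like this (a sketch; it assumes init='custom' with the shapes documented above, and reuses X from the earlier example):

>>> k, n, p = 3, 64, 100
>>> W0 = np.random.rand(n, k)   # (m, k); m = n when G is None
>>> H0 = np.random.rand(k, p)   # (k, p)
>>> est = SmoothNMF(n_components=k, init='custom')
>>> GW = est.fit_transform(X, W=W0, H=H0)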
- set_inverse_transform_request(*, W: bool | None | str = '$UNCHANGED$') → SmoothNMF
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- W : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the W parameter in inverse_transform.
- Returns:
- self : object
The updated object.
- loss_names_ = ['KL_div_loss', 'log_reg_loss', 'Lapl_reg_loss', 'gamma']