espm: The Electron Spectro-Microscopy Python Library
This library contains algorithms to perform non-negative matrix factorization with various regularizations (e.g. Laplacian or L1) and constraints (e.g. simplex).
It is specialized for Electron Microscopy applications. It contains code to create artificial Energy dispersive X-ray spectroscopy (EDXS) data and to perform hyperspectral unmixing on EDXS spectrum images.
Installation
You can install this package from PyPI using:
$ pip install espm
If you want to develop, clone the repository and install it in editable mode with the dev option:
$ git clone https://github.com/adriente/espm.git
$ cd espm
$ pip install cython
$ pip install -e ".[dev]"
Getting started
Generate the synthetic dataset by running the script:
$ python experiments/generate_synthetic_dataset.py
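Alternatively, you can generate the built-in datasets directly from a Python session (the same calls are used in the tutorial below):

>>> import espm.datasets as ds
>>> ds.generate_built_in_datasets(seeds_range=1)
>>> spim = ds.load_particules(sample=0)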
Documentation
The documentation is available at https://espm.readthedocs.io/en/latest/
You can get started with the following notebooks:
TODOs
Here is a list of things that we need to do before version 0.2.0, which will be the first official release of the library. The code is already available on GitHub at the following address: https://github.com/adriente/espm.git. A draft of the documentation is available at: https://espm.readthedocs.io/en/latest/
- Update line 40 of doc/introduction/index.rst (@Adrien)
- Make some doc for the dataset module (just the minimum) (@Adrien)
  Nati: I started doing it. I suggest putting all the functions that create samples in dataset.samples.
- Toy dataset: create model class, change outputs, adapt functions (@Nati) Done!
- Separate the spectral and spatial parts (@Adrien)
  - Move generate_EDXS_phases to models
  - Create a module for weights (Nati: I started doing it)
- Complete the general doc in doc/introduction/index.rst (@Adrien)
- Clarify the code for the estimator: remove the L2 loss (@Nati) Done!
- Add the general problem that we solve in the doc (@Nati) Done!
- Update the ML notebook with more explanations (@Nati) Done!
- Check that the doc is somehow understandable and sufficiently complete (@Sebastian)
- Add the reference to the dataset paper (@Nati)
- Add the reference to the algorithm paper (@Nati)
- @Adrien: Check the code in espm.weights. If you copy this code, you can generate figures for the doc. It seems that you need to have the title Examples for the system to work. Example does not work!
Documentation map
Introduction to espm
This tutorial will show you the basic operations of the toolbox. After installing the package with pip, start by opening a Python shell, e.g. a Jupyter notebook, and import espm. The package espm is built on top of the hyperspy and scikit-learn packages.
The hyperspy package is a Python library for multidimensional data analysis. It provides the base framework to handle our data. The scikit-learn package is a Python library for machine learning. It provides the base framework for the Non-Negative Matrix Factorization (NMF) algorithms developed in this package.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import hyperspy.api as hs
>>> import espm
Datasets
Let us start by creating some data to play with in this tutorial. We can generate artificial datasets using the following lines:
>>> import espm.datasets as ds
>>> ds.generate_built_in_datasets(seeds_range=1)
>>> spim = ds.load_particules(sample = 0)
>>> spim.change_dtype('float64')
The command generate_built_in_datasets will save the generated datasets in the folder defined in the espm.conf.py file. Alternatively, you can also define the path where the data will be saved using the “base_path” argument.
>>> ds.generate_built_in_datasets(base_path="generated_samples", seeds_range=1)
Here the object spim is of the type hyperspy._signals.signal1d.Signal1D.
This object has several useful attributes and methods.
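For example, a few standard hyperspy Signal1D attributes and methods (this is generic hyperspy API, not espm-specific):

>>> spim.metadata          # acquisition and sample metadata
>>> spim.axes_manager      # navigation and signal axes
>>> spim.sum().plot()      # plot the sum spectrum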
Note
Please see the review article espm : a Python library for the simulation of STEM-EDXS datasets for an overview of the simulation methods this package leverages.
Factorization
Computing the non-negative matrix factorization (NMF) is done with the following:
>>> out = spim.decomposition(3)
>>> spim.plot_decomposition_loadings(3)
>>> spim.plot_decomposition_factors(3)


It will use the algorithms developed in this contribution.
These algorithms are an important part of this package. They are specialized to solve regularized Poisson NMF problems. Mathematically, they can be expressed as:

\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} D_{GKL}(X || GWH) + \lambda \, tr(H \Delta H^\top) + \mu \sum_{i,j} \log(H_{ij} + \epsilon_{reg})\]

Here \(D_{GKL}\) is the fidelity term, i.e. the Generalized KL divergence

\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]
The loss is regularized using two terms: a Laplacian regularization on \(H\) and a log regularization on \(H\). \(\lambda\) and \(\mu\) are the regularization parameters. The Laplacian regularization is defined as:

\[tr(H \Delta H^\top)\]
where \(\Delta\) is the Laplacian operator (it can be created using the function espm.utils.create_laplacian_matrix). Note that the columns of the matrices \(H\) and \(X\) are assumed to be images.
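As an illustration, the smoothness penalty can be evaluated directly. This is a minimal sketch assuming that create_laplacian_matrix takes the image shape (n_x, n_y) and returns a (possibly sparse) matrix of shape (p, p):

>>> import numpy as np
>>> from espm.utils import create_laplacian_matrix
>>> delta = create_laplacian_matrix(4, 4)    # assumed signature: (n_x, n_y)
>>> H = np.random.rand(3, 16)                # 3 maps of flattened 4x4 images
>>> penalty = np.trace(H @ (delta @ H.T))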
The log regularization is defined as:

\[\sum_{i,j} \log(H_{ij} + \epsilon_{reg})\]
where \(\epsilon_{reg}\) is the slope of the log regularization at 0. This term acts similarly to an L1 penalty but penalizes larger values less.
Finally, we assume \(W,H\geq \epsilon\) and that the rows of \(H\) sum to 1:

\[\sum_{j} H_{kj} = 1 \quad \forall k\]
The size of:
\(X\) is (n, p)
\(W\) is (m, k)
\(H\) is (k, p)
\(G\) is (n, m)
The columns of the matrices \(H\) and \(X\) are assumed to be images, typically for the smoothness regularization. In terms of shape, we have \(n_x \cdot n_y = p\), where \(n_x\) and \(n_y\) are the number of pixels in the x and y directions.
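For concreteness, a toy shape check with hypothetical values:

>>> import numpy as np
>>> n, m, k, n_x, n_y = 200, 15, 3, 10, 10
>>> p = n_x * n_y
>>> G, W, H = np.random.rand(n, m), np.random.rand(m, k), np.random.rand(k, p)
>>> (G @ W @ H).shape    # X is approximated by G W H
(200, 100)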
A detailed example of the use of these algorithms can be found in this notebook.
List of example notebooks
To go deeper, we invite you to consult the following notebooks.
Analysis of simulated EDX data
In this notebook we showcase the analysis of the built-in simulated dataset. We use the espm EDXS modelling to simulate the data. We first perform a KL-NMF decomposition of the data and then plot the results.
Imports
[1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
# Generic imports
import hyperspy.api as hs
import numpy as np
# espm imports
from espm.models.EDXS_function import print_concentrations_from_W
from espm.estimators import SmoothNMF
import espm.datasets as ds
Generating artificial datasets and loading them
If the datasets were already generated, they are not generated again.
[2]:
# Generate the data
# Here `seeds_range` is the number of different samples to generate
ds.generate_built_in_datasets(seeds_range=1)
# Load the data
spim = ds.load_particules(sample = 0)
# We need to change the data to floats for the decomposition to run
spim.change_dtype('float64')
Building G
The information in the metadata of the spim object is used to build the G matrix. This matrix contains a model of the characteristic X-rays and of the bremsstrahlung.
[3]:
G = spim.build_G("bremsstrahlung")
Problem solving
- 3 components for 3 phases
- Convergence criteria: tol and max_iter. tol is the minimum change of the loss function between two iterations. max_iter is the maximum number of iterations.
- hspy_comp is a required parameter if you want to use the hyperspy API.
[4]:
est = SmoothNMF(n_components=3, tol=1e-6, max_iter=1000, G=G, hspy_comp=True)
/!\ It should take a minute to execute in the case of the built-in dataset.
[5]:
out = spim.decomposition(algorithm = est, return_info=True)
It 10 / 1000: loss 2.060405e-01, 1.349 it/s
It 20 / 1000: loss 2.059418e-01, 1.383 it/s
It 30 / 1000: loss 2.059186e-01, 1.397 it/s
It 40 / 1000: loss 2.059093e-01, 1.404 it/s
It 50 / 1000: loss 2.059045e-01, 1.405 it/s
It 60 / 1000: loss 2.059014e-01, 1.409 it/s
It 70 / 1000: loss 2.058977e-01, 1.411 it/s
It 80 / 1000: loss 2.058942e-01, 1.413 it/s
It 90 / 1000: loss 2.058909e-01, 1.414 it/s
It 100 / 1000: loss 2.058864e-01, 1.415 it/s
It 110 / 1000: loss 2.058828e-01, 1.416 it/s
It 120 / 1000: loss 2.058789e-01, 1.417 it/s
It 130 / 1000: loss 2.058749e-01, 1.417 it/s
It 140 / 1000: loss 2.058706e-01, 1.416 it/s
It 150 / 1000: loss 2.058663e-01, 1.417 it/s
It 160 / 1000: loss 2.058620e-01, 1.417 it/s
It 170 / 1000: loss 2.058577e-01, 1.418 it/s
It 180 / 1000: loss 2.058538e-01, 1.418 it/s
It 190 / 1000: loss 2.058496e-01, 1.419 it/s
It 200 / 1000: loss 2.058448e-01, 1.419 it/s
It 210 / 1000: loss 2.058395e-01, 1.419 it/s
It 220 / 1000: loss 2.058341e-01, 1.420 it/s
It 230 / 1000: loss 2.058288e-01, 1.419 it/s
It 240 / 1000: loss 2.058235e-01, 1.420 it/s
It 250 / 1000: loss 2.058183e-01, 1.420 it/s
It 260 / 1000: loss 2.058132e-01, 1.420 it/s
It 270 / 1000: loss 2.058085e-01, 1.420 it/s
It 280 / 1000: loss 2.058041e-01, 1.421 it/s
It 290 / 1000: loss 2.057997e-01, 1.421 it/s
It 300 / 1000: loss 2.057954e-01, 1.421 it/s
It 310 / 1000: loss 2.057912e-01, 1.421 it/s
It 320 / 1000: loss 2.057874e-01, 1.421 it/s
It 330 / 1000: loss 2.057838e-01, 1.421 it/s
It 340 / 1000: loss 2.057804e-01, 1.421 it/s
It 350 / 1000: loss 2.057772e-01, 1.421 it/s
It 360 / 1000: loss 2.057743e-01, 1.421 it/s
It 370 / 1000: loss 2.057716e-01, 1.422 it/s
It 380 / 1000: loss 2.057691e-01, 1.422 it/s
exits because of relative change < tol: 1.036662026376086e-06
Stopped after 390 iterations in 4.0 minutes and 34.0 seconds.
Decomposition info:
normalize_poissonian_noise=False
algorithm=SmoothNMF()
output_dimension=None
centre=None
scikit-learn estimator:
SmoothNMF()
First cell: printing the resulting concentrations.
Second cell: plotting the resulting spectra.
Third cell: plotting the resulting abundances.
Hyperspy is mainly designed to be used with the qt graphical backend of matplotlib. Thus two plots will appear if you are in inline mode.
[6]:
print_concentrations_from_W(est.W_, elements = spim.metadata.Sample.elements)
Concentrations report
p0 p1 p2
La : 0.0000 0.2355 0.0000
Rb : 0.4849 0.0061 0.5608
W : 0.0423 0.0014 0.0040
Yb : 0.0000 0.1038 0.0000
Pt : 0.0001 0.0214 0.0001
V : 0.0285 0.0018 0.0081
Al : 0.0000 0.1182 0.0000
Ti : 0.0000 0.1065 0.0017
N : 0.0018 0.1239 0.0003
[7]:
spim.plot_decomposition_loadings(3)
[7]:


[8]:
spim.plot_decomposition_factors(3)
[8]:


Generate simulated data
In this notebook we generate a simulated dataset based on experimental data:
- The maps are directly extracted from the experimental data.
- The phases (or spectra) are built using chemical compositions extracted from the literature (citations to be included).
The output of this notebook is saved in the generated_datasets folder.
Imports
[1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
# espm imports
from espm.datasets.generate_weights import generate_weights, chemical_maps_weights
from espm.datasets.generate_EDXS_phases import generate_random_phases, unique_elts, generate_modular_phases
from espm.datasets.base import generate_dataset
from espm.utils import arg_helper
# Generic imports
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import hyperspy.api as hs
Generate weights
We use an experimental dataset to create realistic abundance maps. These maps are based on the chemical mapping of the dataset.
[2]:
def get_repo_path():
    """Get the path to the git repository.

    This is a bit of a hack, but it works.
    """
    this_path = Path.cwd() / Path("generate_data.ipynb")
    return this_path.resolve().parent.parent

# Path of the experimental dataset
path = get_repo_path() / Path("experiments/71GPa_experimental_data.hspy")
# Creation of the weights
weights = chemical_maps_weights(path, ["Fe_Ka", "Ca_Ka"], conc_max=0.7)
[3]:
fig, axs = plt.subplots(1, 3, figsize=(20, 20*3))
for j in range(axs.shape[0]):
    im = axs[j].imshow(weights[:, :, j], vmin=0, vmax=1.0)
    axs[j].tick_params(axis="both", width=0, labelbottom=False, labelleft=False)
    axs[j].set_title("Phase {}".format(j), fontsize=22)
fig.subplots_adjust(right=0.84)
cbar_ax = fig.add_axes([0.85, 0.47, 0.01, 0.045])
fig.colorbar(im, cax=cbar_ax)
cbar_ax.tick_params(labelsize=22)

Generate phases
We generate phases based on values extracted from the literature. The phases we try to simulate here are Ferropericlase, Bridgmanite and Ca-perovskite.
We use dictionaries to input the modelling parameters.
[4]:
# Elemental concentration of each phase
elts_dicts = [
# Pseudo ferropericlase
{
"Mg" : 0.522, "Fe" : 0.104, "O" : 0.374, "Cu" : 0.05
},
# Pseudo Ca-Perovskite
{
"Mg" : 0.020, "Fe" : 0.018, "Ca" : 0.188, "Si" : 0.173, "Al" : 0.010, "O" : 0.572, "Ti" : 0.004, "Cu" : 0.05, "Sm" : 0.007, "Lu" : 0.006, "Nd" : 0.006
},
# Pseudo Bridgmanite
{
"Mg" : 0.445, "Fe" : 0.035, "Ca" : 0.031, "Si" : 0.419, "Al" : 0.074, "O" : 1.136, "Cu" : 0.05, "Hf" : 0.01
}]
# Parameters of the bremsstrahlung
brstlg_pars = [
{"b0" : 0.0001629, "b1" : 0.0009812},
{"b0" : 0.0007853, "b1" : 0.0003658},
{"b0" : 0.0003458, "b1" : 0.0006268}
]
# Model parameters : energy scale, detector broadening, x-ray emission database, beam energy, absorption parameters, detector efficiency
model_params = {
"e_offset" : 0.3,
"e_size" : 1980,
"e_scale" : 0.01,
"width_slope" : 0.01,
"width_intercept" : 0.065,
"db_name" : "default_xrays.json",
"E0" : 200,
"params_dict" : {
"Abs" : {
"thickness" : 100.0e-7,
"toa" : 35,
"density" : 4.5,
"atomic_fraction" : False
},
"Det" : "SDD_efficiency.txt"
}
}
# Miscellaneous parameters: average detected number of X-rays per pixel, phase densities, output folder, model name, random seed
data_dict = {
"N" : 100,
"densities" : [1.2,1.0,0.8],
"data_folder" : "71GPa_synthetic_N100",
"model" : "EDXS",
"seed" : 42
}
[5]:
phases, full_dict = generate_modular_phases(
elts_dicts=elts_dicts, brstlg_pars = brstlg_pars,
scales = [1, 1, 1],
model_params= model_params,
seed = 42
)
# scales: bremsstrahlung parameter modifiers
# Finish building the dictionary of the simulation
data_dict.update(full_dict)
[6]:
fig, axs = plt.subplots(3, 1, figsize=(20, 15))
# Build the energy scale
x = np.linspace(
    full_dict["model_parameters"]["e_offset"],
    full_dict["model_parameters"]["e_offset"] + full_dict["model_parameters"]["e_scale"]*full_dict["model_parameters"]["e_size"],
    num=full_dict["model_parameters"]["e_size"])
for j in range(axs.shape[0]):
    axs[j].plot(x, phases[j])
    axs[j].set_title("Phase {}".format(j), fontsize=22)
axs[-1].set_xlabel("Energy loss (eV)", fontsize=16)
[6]:
Text(0.5, 0, 'Energy loss (eV)')

It will produce one spectrum image per sample in the target folder. You can set seeds_range to the number of samples you want to generate.
[7]:
generate_dataset(seeds_range=1, **data_dict, weights=weights )
100%|██████████| 1/1 [00:04<00:00, 4.17s/it]
The previous command will save the data in the generated_datasets folder defined in the espm.conf.py file. You can also define the path where the data will be saved using the “base_path” argument.
[8]:
# generate_dataset(base_path="generated_samples", seeds_range=1, **data_dict, weights=weights)
Regularized Poisson NMF on a toy dataset
This notebook is part of the espm package. It is available on GitHub.
In this notebook, we show how to solve a regularized Poisson Non-Negative Matrix Factorization (NMF) problem on a toy dataset. The general goal of this problem is to separate a matrix \(X\) into two matrices \(D\) and \(H\) such that \(X \approx DH\). We will assume that \(D\) is the product of two matrices \(G\) and \(W\) such that \(D = GW\). The matrix \(G\) is assumed to be known and therefore \(W\) is the matrix we want to find. In the field of electron microscopy, the matrix \(GW\) represents the phases, the matrix \(H\) is the matrix of maps (or weights) and \(X\) is the data matrix. The maps \(H\) are 2D images. Furthermore, we will assume that \(W, H\) are non-negative or, more generally, greater than a small positive value \(\epsilon\).
The measured data \(X\) follow a Poisson distribution, i.e. \(X_{ij} \sim \text{Poisson}(n_p\dot{X}_{ij})\). In the context of electron microscopy, \(n_p\) is the number of photons per pixel. \(\dot{X} = G \dot{W} \dot{H}\) would correspond to a noiseless measurement.
The size of:
\(X\) is (n, p),
\(W\) is (m, k),
\(H\) is (k, p),
\(G\) is (n, m).
In general, m is assumed to be much smaller than n and \(G\) is assumed to be known. The parameter shape_2d defines the shape of the images for the columns of \(H\) and \(X\), i.e. shape_2d[0]*shape_2d[1] = p.
Mathematically, the problem can be formulated as:

\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} D_{GKL}(X || GWH) + \lambda \, tr(H \Delta H^\top) + \mu \sum_{i,j} \log(H_{ij} + \epsilon_{reg})\]

Here \(D_{GKL}\) is the fidelity term, i.e. the Generalized KL divergence

\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]
The loss is regularized using two terms: a Laplacian regularization on \(H\) and a log regularization on \(H\). \(\lambda\) and \(\mu\) are the regularization parameters. The Laplacian regularization is defined as:

\[tr(H \Delta H^\top)\]
where \(\Delta\) is the Laplacian operator (it can be created using the function espm.utils.create_laplacian_matrix). Note that the columns of the matrices \(H\) and \(X\) are assumed to be images.
The log regularization is defined as:

\[\sum_{i,j} \log(H_{ij} + \epsilon_{reg})\]

where \(\epsilon_{reg}\) is the slope of the log regularization at 0. This term acts similarly to an L1 penalty but penalizes larger values less.
Finally, we assume \(W,H\geq \epsilon\) and that the rows of \(H\) sum to 1:

\[\sum_{j} H_{kj} = 1 \quad \forall k\]
This is done by adding the parameter force_simplex=True to the class espm.estimators.SmoothNMF. This constraint is essential as it prevents the algorithm from finding a solution where all the maps tend to 0 because of the other constraints.
In this notebook, we will use the class espm.estimators.SmoothNMF to solve the problem.
Imports and function definition
Let’s start by importing the necessary libraries.
[1]:
# Some useful modules for notebooks
%load_ext autoreload
%autoreload 2
%matplotlib inline
[2]:
import numpy as np
import matplotlib.pyplot as plt
from espm.estimators import SmoothNMF
from espm.measures import find_min_angle, ordered_mse, ordered_mae, ordered_r2
from espm.datasets import create_toy_sample
from espm.utils import process_losses
Now we define the parameters that will be used to generate the data.
[3]:
seed = 0 # for reproducibility
m = 15 # Number of components
n = 200 # Length of the phases
n_poisson = 150 # Average Poisson number per pixel (this number will be split over the m dimension)
[4]:
sample = create_toy_sample(n, m, n_poisson, seed=seed)
GW = sample["GW"]
G = sample["G"]
H = sample["H"]
X = sample["X"]
Xdot = sample["Xdot"]
shape_2d = sample["shape_2d"]
Let us look at the dimension of our problem.
[5]:
print(f"""
- X: {X.shape}
- Xdot: {Xdot.shape}
- G: {G.shape}
- GW: {GW.shape}
- H: {H.shape}
- shape_2d: {shape_2d}
""")
- X: (200, 10000)
- Xdot: (200, 10000)
- G: (200, 15)
- GW: (200, 3)
- H: (3, 10000)
- shape_2d: (100, 100)
Here Xdot contains the noiseless data, such that the ground truth H and GW satisfy:
[6]:
np.testing.assert_allclose(Xdot, GW @ H)
Let us plot the ground truth maps \(H\) (sometimes also called weights). Here the variable shape_2d is used to reshape the columns of \(H\) into images.
[7]:
vmin, vmax = 0,1
cmap = plt.cm.gray_r
plt.figure(figsize=(10, 3))
for i, hdot in enumerate(H):
    plt.subplot(1,3,i+1)
    plt.imshow(H[i].reshape(shape_2d), cmap=cmap, vmin=vmin, vmax=vmax)
    plt.axis("off")
    plt.title(f"GT - Map {i+1}")

We can also plot the phases corresponding to these maps. The phases correspond to the matrix \(GW\).
[8]:
e = np.linspace(0,1, GW.shape[0])
plt.plot(e, GW)
plt.title("GT - Phases")
plt.xlabel("Frequency [normalized]")
plt.legend([f"phase {i+1}" for i in range(3)]);

The matrix \(G\) is assumed to be known a priori. Here we plot the first 5 columns of \(G\).
[9]:
l = np.linspace(0, 1,n)
plt.plot(l, G[:,:5]);

Note that the function create_toy_sample did not return \(W\) but only \(GW\). Using the ground truth \(GW\), it can be computed as follows:
[10]:
W = np.linalg.lstsq(GW, G, rcond=None)[0].T
W.shape
[10]:
(15, 3)
Solving the problem
Let us define the different hyperparameters of the problem. Feel free to change them and see how the results change.
Let us first define our regularisation parameters.
[11]:
lambda_L = 1 # Smoothness of the maps
mu = 0.2 # Sparsity of the maps
We can additionally add a simplex constraint to the problem by setting force_simplex=True. This will add the constraint that the rows of \(H\) sum to 1. This constraint should be activated for the regularizers to work! Practically, it will prevent the algorithm from simply minimizing \(W\) and increasing \(H\).
[12]:
force_simplex = True
Finally, we can decide to specify \(G\) or not. If the matrix \(G\) is not specified, the algorithm will directly estimate \(GW\) instead of \(W\).
[13]:
Gused = G
Note that without regularisation, i.e. with the parameters
lambda_L = 0
mu = 0
Gused = None
force_simplex=False
we recover the classical Poisson/KL NMF problem. Our algorithm will apply the MU algorithm from Lee and Seung (2001).
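For reference, here is a minimal sketch of one step of these multiplicative updates for the plain KL/Poisson NMF problem \(X \approx WH\) (illustrative only, not the espm implementation):

import numpy as np

def mu_kl_step(X, W, H, eps=1e-12):
    # Multiplicative updates (Lee and Seung, 2001) for the KL divergence
    WH = W @ H + eps
    W *= ((X / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    WH = W @ H + eps
    H *= (W.T @ (X / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
    return W, H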
Let us define the parameters for the algorithm. Here the class espm.estimators.SmoothNMF inherits from scikit-learn's NMF class.
[14]:
K = len(H) # Number of components; let's assume that we know it
params = {}
params["tol"]=1e-6 # Tolerance for the stopping criterion.
params["max_iter"] = 200 # Maximum number of iterations before timing out.
params["hspy_comp"] = False # If should be set to True if hspy data format is used.
params["verbose"] = 1 # Verbosity level.
params["eval_print"] = 10 # Print the evaluation every eval_print iterations.
params["epsilon_reg"] = 1 # Regularization parameter
params["linesearch"] = False # Use linesearch to accelerate convergence
params["shape_2d"] = shape_2d # Shape of the 2D maps
params["n_components"] = K # Number of components
params["normalize"] = True # Normalize the data. It helps to keep the range of the regularization parameters lambda_L and mu in a reasonable range.
estimator = SmoothNMF(mu=mu, lambda_L=lambda_L, G = Gused, force_simplex=force_simplex, **params)
The function fit_transform will solve the problem and return the matrix \(GW\).
[15]:
GW_est = estimator.fit_transform(X)
It 10 / 200: loss 7.852122e-03, 4.185 it/s
It 20 / 200: loss 7.825420e-03, 4.254 it/s
It 30 / 200: loss 7.815601e-03, 4.280 it/s
It 40 / 200: loss 7.810161e-03, 4.293 it/s
It 50 / 200: loss 7.806439e-03, 4.300 it/s
It 60 / 200: loss 7.803648e-03, 4.304 it/s
It 70 / 200: loss 7.801485e-03, 4.307 it/s
It 80 / 200: loss 7.799700e-03, 4.309 it/s
It 90 / 200: loss 7.798369e-03, 4.310 it/s
It 100 / 200: loss 7.797371e-03, 4.311 it/s
It 110 / 200: loss 7.796559e-03, 4.311 it/s
It 120 / 200: loss 7.795796e-03, 4.315 it/s
It 130 / 200: loss 7.794962e-03, 4.319 it/s
It 140 / 200: loss 7.794118e-03, 4.322 it/s
It 150 / 200: loss 7.793349e-03, 4.324 it/s
It 160 / 200: loss 7.792436e-03, 4.326 it/s
It 170 / 200: loss 7.791491e-03, 4.327 it/s
It 180 / 200: loss 7.790487e-03, 4.327 it/s
It 190 / 200: loss 7.789284e-03, 4.328 it/s
exits because max_iteration was reached
Stopped after 200 iterations in 0.0 minutes and 46.0 seconds.
We can plot the different losses to see how the algorithm converges.
[16]:
losses = estimator.get_losses()
values, names = process_losses(losses)
plt.figure(figsize=(10, 6))
for name, value in zip(names[:4], values[:4]):
    plt.plot(value, label=name)
plt.xlabel("iteration")
plt.yscale("log")
plt.legend()
[16]:
<matplotlib.legend.Legend at 0x7f95d86d9590>

[17]:
losses[0]
[17]:
(0.00801227, 0.00703841, 0.00075641, 0.00021746, 8., 0.03137933, 0.34708)
We can easily recover \(W\) and \(H\) from the estimator.
[18]:
H_est = estimator.H_
W_est = estimator.W_
Let us compute some metrics to evaluate the quality of the solution. We will use the metrics defined in the module espm.measures.
We first compute the angles between the phases GW and the estimated phases GW_est. The angles are computed using the function find_min_angle. This function also returns the indices that match the estimations to the ground truths. Indeed, there is no reason for them to have the same order.
[19]:
angles, true_inds = find_min_angle(GW.T, GW_est.T, unique=True, get_ind=True)
print("angles : ", angles)
angles : [5.468256222971067, 30.136582764210953, 47.47868600255621]
Now, we can compute the Mean Squared Error (MSE) between the maps H and the estimated maps H_est.
[20]:
mse = ordered_mse(H, H_est, true_inds)
print("mse : ", mse)
mse : [0.051252249085099934, 0.022205344201354206, 0.05466327775719334]
Finally, let us plot the results.
[21]:
def plot_results(Ddot, D, Hdotflat, Hflat):
    fontsize = 30
    scale = 15
    aspect_ratio = 1.4
    marker_list = ["-o","-s","->","-<","-^","-v","-d"]
    mark_space = 20
    # cmap = plt.cm.hot_r
    cmap = plt.cm.gray_r
    vmax = 1
    vmin = 0
    K = len(H)
    L = len(D)
    angles, true_inds = find_min_angle(Ddot.T, D.T, unique=True, get_ind=True)
    mse = ordered_mse(Hdotflat, Hflat, true_inds)
    mae = ordered_mae(Hdotflat, Hflat, true_inds)
    r2 = ordered_r2(Hdotflat, Hflat, true_inds)
    fig, axes = plt.subplots(K, 3, figsize=(scale/K * 3 * aspect_ratio, scale))
    x = np.linspace(0, 1, num=L)
    for i in range(K):
        axes[2,i].plot(x, Ddot.T[i,:], 'g-', label='ground truth', linewidth=4)
        axes[2,i].plot(x, D[:,true_inds[i]], 'r--', label='reconstructed', linewidth=4)
        axes[2,i].set_title("{:.2f} deg".format(angles[i]), fontsize=fontsize-2)
        axes[2,i].set_xlim(0,1)
        axes[1,i].imshow((Hflat[true_inds[i],:]).reshape(shape_2d), vmin=vmin, vmax=vmax, cmap=cmap)
        axes[1,i].set_title("R2: {:.2f}".format(r2[true_inds[i]]), fontsize=fontsize-2)
        # axes[i,1].set_ylim(0.0,1.0)
        axes[1,i].tick_params(axis="both", labelleft=False, labelbottom=False, left=False, bottom=False)
        im = axes[0,i].imshow(Hdotflat[i].reshape(shape_2d), vmin=vmin, vmax=vmax, cmap=cmap)
        axes[0,i].set_title("Phase {}".format(i), fontsize=fontsize)
        axes[0,i].tick_params(axis="both", labelleft=False, labelbottom=False, left=False, bottom=False)
    axes[2,0].legend()
    rows = ["True maps", "Reconstructed maps", "Spectra"]
    for ax, row in zip(axes[:,0], rows):
        ax.set_ylabel(row, rotation=90, fontsize=fontsize)
    fig.subplots_adjust(right=0.84)
    # put colorbar at desired position
    cbar_ax = fig.add_axes([0.85, 0.5, 0.01, 0.3])
    fig.colorbar(im, cax=cbar_ax)
    # fig.tight_layout()
    print("angles : ", angles)
    print("mse : ", mse)
    print("mae : ", mae)
    print("r2 : ", r2)
    return fig

fig = plot_results(GW, GW_est, H, H_est)
plt.show()
angles : [5.468256222971067, 30.136582764210953, 47.47868600255621]
mse : [0.051252249085099934, 0.022205344201354206, 0.05466327775719334]
mae : [0.1657080234420085, 0.10186348170125903, 0.12053660132901785]
r2 : [0.30287198337896726, 0.08496786424311309, -0.2617123779050492]

API reference
Models
The espm.models module implements the physical modelling that is used both for the creation of the G matrix and for the simulation of data.
Note
For now espm supports only the modelling of EDS data, but we aim to support the modelling of EELS data too.
Models abstract class
Its main purpose is to read databases and set energy parameters.
- class espm.models.base.PhysicalModel(e_offset, e_size, e_scale, params_dict, db_name='default_xrays.json', E0=200, **kwargs)[source]
Abstract class of the models
- build_energy_scale(e_offset, e_size, e_scale)[source]
Build an array corresponding to a linear energy scale.
- Parameters
- e_offset
- float
Offset of the energy scale
- e_size
- int
Number of channels of the detector
- e_scale
- float
Scale in keV/channel
- Returns
- energy scale
- np.array 1D
Linear energy scale based on the given parameters of shape (e_size,)
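For instance, such a linear scale can be reproduced with numpy (the parameter values below are just an illustration):

>>> import numpy as np
>>> e_offset, e_size, e_scale = 0.3, 1980, 0.01
>>> energy = e_offset + e_scale * np.arange(e_size)
>>> energy.shape
(1980,)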
- class espm.models.base.ToyModel(L: int = 200, C: int = 15, K: int = 3, seed: Optional[int] = None)[source]
Toy model.
Simple model with a random G matrix and phases.
The G matrix is generated with a random number of gaussians per column. The phases are generated using a vector drawn from a Laplacian distribution and multiplied by the G matrix.
- Parameters
- Lint
Length of the phases.
- Cint
Number of possible components in the phases.
- Kint
Number of phases.
- seedint, optional
Seed for the random number generator.
Examples
>>> from espm.models import ToyModel
>>> import matplotlib.pyplot as plt
>>> model = ToyModel(L=200, C=15, K=3, seed=0)
>>> model.generate_g_matr()
>>> model.generate_phases()
>>> print(model.G.shape)
(200, 15)
>>> print(model.phases.shape)
(200, 3)
>>> plt.plot(model.phases)
EDXS model
The espm.models.edxs module implements the creation of the G matrix, which contains a model of the characteristic and continuous X-rays.
- class espm.models.edxs.EDXS(*args, width_slope=0.01, width_intercept=0.065, **kwargs)[source]
- generate_g_matr(g_type='bremsstrahlung', norm=True, reference_elt={}, *, elements=[], **kwargs)[source]
Generate the G matrix. With a complete model the matrix is (e_size,n+2). The first n columns correspond to the sum of X-ray characteristic peaks associated to each shell of the elements. The last 2 columns correspond to a bremsstrahlung model.
- Parameters
- g_type
- string
Options of the edxs model to include in the G matrix. The three possibilities are “identity”, “no_brstlg” and “bremsstrahlung”. G is going to be the identity matrix, only characteristic X-ray peaks or characteristic X-rays plus bremsstrahlung model, respectively.
- norm
- boolean
Normalizes the columns of G for computational purposes. This feature will be removed in favor of a better solution.
- reference_elt
- dict
The keys are chemical elements (atomic number) and the values are cut-off energies. This argument is used to split some of the columns of G into 2 columns. The first column corresponds to characteristic X-rays below the cut-off and the second one corresponds to characteristic X-rays above the cut-off. This feature is implemented to enable more accurate absorption correction.
- elements
- list
List of modeled chemical elements. The list can be populated either with atomic numbers or chemical symbols, e.g. “Fe” or 26.
- Returns
- g matrix
- np.array 2D
matrix of the edx model.
Notes
See our paper about the espm package for more information about the equations of this model.
- generate_phases(phases_parameters)[source]
Generate a series of spectra from a list of phase parameters.
- Parameters
- phases_parameters
- list
List of dicts containing the phase parameters.
- Returns
- phase_array
- np.array 2D
Array whose rows correspond to the modelled phases in the input list.
Notes
The absorption correction is done using the average composition of the phases. The same correction is used for each phase.
- generate_spectrum(b0=0, b1=0, scale=1.0, abs_elts_dict={}, *, elements_dict={})[source]
Generate a spectrum from bremsstrahlung parameters and a chemical composition.
- Parameters
- b0
- float
First bremsstrahlung parameter.
- b1
- float
Second bremsstrahlung parameter.
- scale
- float
Scale factor to apply to the bremsstrahlung. Feature to be removed, scaling should be done with b0 and b1.
- abs_elts_dict
- dict
Dictionary of elements and associated concentrations. This dictionary is used to calculate the contribution of the absorption in the spectrum.
- elements_dict
- dict
Dictionary of elements and associated concentrations describing the chemical composition of the modeled sample.
- Returns
- spectrum
- np.array 1D
Output edx spectrum corresponding to the input chemical composition.
Notes
Check EDXS_function for details about the bremsstrahlung model.
Examples
>>> import matplotlib.pyplot as plt
>>> from espm.models.edxs import EDXS
>>> from espm.conf import DEFAULT_SYNTHETIC_DATA_DICT
>>> b0, b1 = 5.5367e-5, 0.00192181
>>> elts_dict = {"Si" : 1.0, "Ca" : 1.0, "O" : 3.0, "C" : 0.3}
>>> model = EDXS(**DEFAULT_SYNTHETIC_DATA_DICT['model_parameters'])
>>> spectrum = model.generate_spectrum(b0, b1, elements_dict = elts_dict)
>>> plt.plot(spectrum)
- espm.models.edxs.G_EDXS(model_params, g_params, part_W=None, G=None)[source]
Update the bremsstrahlung part of the G matrix. This function is used in the NMF decomposition so that the absorption correction is updated between each step.
- Parameters
- model_params
- dict
Parameters of the edxs model. Check espm.conf.DEFAULT_EDXS_PARAMS for an example.
- g_params
- dict
Parameters specific to the creation of the G matrix.
- part_W
- np.array 2D
Concentrations determined during the NMF decomposition. It is used to determine the average composition that will be used to determine the absorption correction.
- G
- np.array 2D
Current G matrix. Setting it to None will initialize the G matrix.
- Returns
- updated G
- np.array 2D
Updated G matrix with a new absorption correction.
EDXS functions
The espm.models.EDXS_function module implements miscellaneous functions required by the espm.models.edxs module.
This module includes bremsstrahlung modelling, database reading, and dict and list manipulation functions.
- espm.models.EDXS_function.G_bremsstrahlung(x, E0, params_dict, *, elements_dict={})[source]
Computes the two-part continuum X-rays for the G matrix. The two parts of the bremsstrahlung are constructed separately so that their parameters can be fitted to data. Absorption and detection are multiplied with each part.
- Parameters
- x
- np.array 1D
Energy scale.
- params_dict
- dict
Dictionary containing the absorption and detection parameters.
- elements_dict
- dict
Composition of the studied sample. It is required for absorption calculation.
- Returns
- continuum_xrays
- np.array 2D
Two-part continuum X-ray model with shape (energy scale size, 2).
- espm.models.EDXS_function.chapman_bremsstrahlung(x, b0, b1, b2)[source]
Calculate the bremsstrahlung as parametrized by Chapman et al.
- Parameters
- x
- np.array 1D
Energy scale.
- b0
- float
First parameter, corresponding to the inverse of the energy.
- b1
- float
Second parameter, corresponding to the energy.
- b2
- float
Third parameter, corresponding to the square of the energy.
- Returns
- bremsstrahlung
- np.array 1D
Bremsstrahlung model
Notes
For details see Chapman et al. Journal of Microscopy, 136, pp. 179-191, (1984)
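From the parameter descriptions above, the parametrization presumably has the following form (a sketch, not the exact espm code):

>>> import numpy as np
>>> def chapman_bremsstrahlung_sketch(x, b0, b1, b2):
...     return b0 / x + b1 * x + b2 * x**2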
- espm.models.EDXS_function.continuum_xrays(x, params_dict={}, b0=0, b1=0, E0=200, *, elements_dict={'Si': 1.0})[source]
Computes the continuum X-rays, i.e. the bremsstrahlung multiplied by the absorption and the detection efficiency.
- Parameters
- x
- np.array 1D
Energy scale.
- params_dict
- dict
Dictionary containing the absorption and detection parameters.
- b0
- float
First parameter.
- b1
- float
Second parameter.
- E0
- float
Energy of the incident beam in keV.
- elements_dict
- dict
Composition of the studied sample. It is required for absorption calculation.
- Returns
- continuum_xrays
- np.array 1D
Continuum X-rays model.
Notes
When an empty dict is provided, the continuum X-rays are set to 0. This is useful to perform calculations without the bremsstrahlung, for example.
For an example structure of the params_dict parameter, check DEFAULT_EDXS_PARAMS in espm.conf.
For a custom detection efficiency, check the spectrum fit notebook.
- espm.models.EDXS_function.elts_dict_from_W(part_W, *, elements=[])[source]
Create a dictionary of the elemental concentrations from a fitted W. It is useful to recompute absorption during our custom NMF calculations.
- Parameters
- part_W
- np.array 2D
W matrix output from the NMF calculation. It only makes sense when the NMF decomposition is performed with G.
- elements
- list
List of elements as atomic number or symbol.
- Returns
- elements_dictionnary
- dict
Dictionary containing the concentration associated to each chemical element of the problem.
Notes
This function should maybe move to another part of espm.
Examples
>>> import numpy as np
>>> from espm.models.EDXS_function import elts_dict_from_W
>>> W = np.ones((4,3))
>>> elts = ["Si","Ca","O","C"]
>>> elts_dict_from_W(W, elements = elts)
{"Si" : 0.25, "Ca" : 0.25, "O" : 0.25, "C" : 0.25}
- espm.models.EDXS_function.elts_dict_from_dict_list(dict_list)[source]
Create a single dictionary of elemental concentrations from a list of dictionaries containing chemical compositions. The new dictionary corresponds to the average chemical composition of the list.
- Parameters
- dict_list
- list [dict]
List of chemical composition dictionaries
- Returns
- elements_dictionnary
- dict
Dictionary containing the concentration associated to each chemical element contained in the input.
Notes
This function should maybe move to another part of espm.
Examples
>>> import numpy as np
>>> from espm.models.EDXS_function import elts_dict_from_dict_list
>>> dicts = [{"Si" : 1.0},{"Ca" : 1.0},{"O" : 1.0},{"C" : 1.0}]
>>> elts_dict_from_dict_list(dicts)
{"Si" : 0.25, "Ca" : 0.25, "O" : 0.25, "C" : 0.25}
- espm.models.EDXS_function.gaussian(x, mu, sigma)[source]
Calculate the gaussian function.
- Parameters
- xnp.array 1D
Energy scale.
- mufloat
Mean of the gaussian.
- sigmafloat
Standard deviation of the gaussian.
- Returns
- gaussiannp.array 1D
Gaussian distribution on x, with mean mu and standard deviation sigma.
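For reference, a normalized Gaussian with these parameters would read as follows (whether espm normalizes it is an assumption):

>>> import numpy as np
>>> def gaussian_sketch(x, mu, sigma):
...     return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))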
- espm.models.EDXS_function.lifshin_bremsstrahlung(x, b0, b1, E0=200)[source]
Calculate the custom parametrized bremsstrahlung inspired by the model of L. Lifshin.
The model is designed to be linear and positive in b0 and b1.
- Parameters
- x
- np.array 1D
Energy scale.
- b0
- float
First parameter.
- b1
- float
Second parameter.
- E0
- float
Energy of the incident beam in keV.
- Returns
- bremsstrahlung
- np.array 1D
Bremsstrahlung model
Notes
For details see L.Lifshin, Ottawa, Proc.9.Ann.Conf.Microbeam Analysis Soc. 53. (1974)
- espm.models.EDXS_function.lifshin_bremsstrahlung_b0(x, b0, E0=200)[source]
Calculate the first part of our bremsstrahlung model.
- espm.models.EDXS_function.lifshin_bremsstrahlung_b1(x, b1, E0=200)[source]
Calculate the second part of our bremsstrahlung model.
- espm.models.EDXS_function.print_concentrations_from_W(part_W, *, elements=[])[source]
Print a report of the chemical concentrations from a fitted W.
- Parameters
- part_W
- np.array 2D
W matrix output from the NMF calculation. It only makes sense when the NMF decomposition is performed with G.
- elements
- list
List of elements as atomic number or symbol.
- Returns
- None
- None
None.
Notes
This function should maybe move to another part of espm.
- espm.models.EDXS_function.read_compact_db(elt, db_dict)[source]
Read the energy and cross section of each line of a chemical element for compact databases.
- Parameters
- elt
- string
Atomic number.
- db_dict
- dict
Dictionary extracted from the json database containing the emission lines and their energies.
- Returns
- energies
- list float
List of energies associated to each emission line of the given element
- cross-sections
- list float
List of emission cross-sections of each line of the given element
- espm.models.EDXS_function.read_lines_db(elt, db_dict)[source]
Read the energy and cross section of each line of a chemical element for explicit databases.
- Parameters
- elt
- string
Atomic number.
- db_dict
- dict
Dictionary extracted from the json database containing the emission lines and their energies.
- Returns
- energies
- list float
List of energies associated to each emission line of the given element
- cross-sections
- list float
List of emission cross-sections of each line of the given element
- espm.models.EDXS_function.shelf(x, height, length)[source]
Calculate the shelf contribution of the EDXS Silicon Drift Detector (SDD) detector.
- Parameters
- x
- np.array 1D
Energy scale.
- height
- float
Height in intensity of shelf contribution of the detector.
- length
- float
Length in energy of shelf contribution of the detector.
- Returns
- shelf
- np.array 1D
SDD shelf model.
Notes
For details see Scholze, F. and Procop, M., X-Ray Spectrom., 38: 312-321. (2009)
- espm.models.EDXS_function.update_bremsstrahlung(G, part_W, model_parameters, elements_list, norm=True)[source]
Update the continuum X-rays part of the G matrix with the current average chemical composition.
- Parameters
- G
- np.array 2D
EDXS modelling matrix.
- part_W
- np.array 2D
Learned chemical concentrations.
- model_parameters
- dict
Parameters of the EDXS model.
- elements_list
- list
List of elements as atomic number or symbol.
- norm
- bool
work in progress
- Returns
- new_G
- np.array 2D
New G matrix with updated continuum X-rays models.
Notes
This function should maybe move to another part of espm.
Absorption functions
The espm.models.absorption_edxs module implements the functions to calculate the contribution of absorption in EDX spectra.
- espm.models.absorption_edxs.absorption_coefficient(x, atomic_fraction=False, *, elements_dict={'Si': 1.0})[source]
Calculate the mass-absorption coefficient of a given composition at a certain energy or over a certain energy range.
- Parameters
- x
- np.array 1D
Energy value or energy scale over which the mass-absorption coefficient is calculated.
- atomic_fraction
- boolean
If True, the atomic concentrations of elements_dict are converted into atomic weight fractions.
- elements_dict
- dict
Dictionary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.
- Returns
- mass-absorption coefficient
- np.array 1D
Calculated value of the mass-absorption coefficient on all the energy range.
Notes
The mass-absorption coefficients are calculated using the database of hyperspy
- espm.models.absorption_edxs.absorption_correction(x, thickness=1e-05, toa=90, density=None, atomic_fraction=False, *, elements_dict={'Si': 1.0}, **kwargs)[source]
Calculate the contribution of the absorption on the EDX spectrum for a thin slab of material with a given composition at a certain energy or over a certain energy range.
- Parameters
- x
- np.array 1D
Energy value or energy scale over which the mass-absorption coefficient is calculated.
- thickness
- float
Thickness of the material slab in meters. If the thickness is set to 0.0, the function will return 1.0.
- toa
- float
Take-off angle in degrees of the x-rays travelling from the sample to the x-ray detectors.
- density
- float
Density of the material in g/m3 (to be checked). If None, an approximation of the density based on the atomic numbers of the elements of the materials is calculated.
- atomic_fraction
- boolean
If True, the atomic concentrations of elements_dict are converted into atomic weight fractions.
- elements_dict
- dict
Dictionary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.
- Returns
- absorption correction
- np.array 1D
Calculated value of the absorption correction on all the energy range.
- espm.models.absorption_edxs.absorption_mass_thickness(x, mass_thickness, toa=90, atomic_fraction=True, *, elements_dict={'Si': 1.0})[source]
Calculate the contribution of the absorption of a mass-thickness map.
- Parameters
- x
- np.array 1D
Energy value or energy scale over which the mass-absorption coefficient is calculated.
- mass_thickness:
- np.array 2D
Mass-thickness map. See the notes for details.
- toa
- float
Take-off angle in degrees of the x-rays travelling from the sample to the x-ray detectors.
- atomic_fraction
- boolean
If True, the atomic concentrations of elements_dict are converted into atomic weight fractions.
- elements_dict
- dict
Dictionary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.
- Returns
- absorption correction
- np.array 3D
Calculated value of the absorption correction on all the energy range for all pixels of the mass-thickness map.
Notes
The mass-thickness map can be extracted from an EDX spectrum image by calculating the concentration of an element for both K and L lines. Since there should be only one concentration for both lines, it is possible to determine the contribution of the absorption for each pixel and thus obtain the mass-thickness map.
- espm.models.absorption_edxs.det_efficiency(x, det_dict)[source]
Simulate the detection efficiency of an EDX detector.
- Parameters
- x
- np.array 1D
Energy value or energy scale over which the mass-absorption coefficient is calculated.
- det_dict
- dict
See Examples for an example of the structure of the dict. One of the layers of the input dictionary should be named “detection” to model the active layer of the detector.
- Returns
- efficiency
- np.array 1D
Calculated detection efficiency based on the input layers model.
Examples
>>> from espm.models.absorption_edxs import det_efficiency
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> det_dict = {"detection" : {"thickness" : 10e-3, "density" : 2.3, "atomic_fraction" : False, "elements_dict" : {"Si" : 1.0}},
...             "dead_layer" : {"thickness" : 50e-7, "density" : 2.7, "atomic_fraction" : False, "elements_dict" : {"Al" : 1.0}}}
>>> x = np.linspace(0.5, 20, num = 1000)
>>> efficiency = det_efficiency(x, det_dict)
>>> plt.plot(x, efficiency)
- espm.models.absorption_edxs.det_efficiency_from_curve(x, filename, kind='cubic')[source]
Interpolate a detection efficiency curve that is stored in ~/espm/tables.
- Parameters
- x
- np.array 1D
Energy value or energy scale over which the mass-absorption coefficient is calculated.
- filename:
- string
Name of the file of the detection efficiency.
- kind
- string
Polynomial order of the interpolation.
- Returns
- Detection efficiency
- np.array 1D
Interpolated detection efficiency.
- espm.models.absorption_edxs.det_efficiency_layer(x, thickness=1e-05, density=None, atomic_fraction=False, *, elements_dict={'Si': 1.0})[source]
Calculate the proportion of absorbed X-rays for one virtual layer of the detector.
- Parameters
- x
- np.array 1D
Energy value or energy scale over which the mass-absorption coefficient is calculated.
- thickness
- float
Thickness of the material slab in meters. If the thickness is set to 0.0, the function will return 1.0.
- density
- float
Density of the material in g/m3 (to be checked). If None, an approximation of the density based on the atomic numbers of the elements of the materials is calculated.
- atomic_fraction
- boolean
If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.
- elements_dict
- dict
Dictionary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.
- Returns
- absorption
- np.array 1D
Calculated absorption of one layer of the detector.
Notes
We assume that the X-rays arrive perpendicular to the detector.
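Conceptually, the absorbed fraction in a single layer follows a Beer-Lambert law; here is a sketch under that assumption (not the exact espm code):

>>> import numpy as np
>>> def layer_absorption_sketch(mass_abs_coeff, density, thickness):
...     # fraction of X-rays absorbed in a slab of the given mass thickness
...     return 1.0 - np.exp(-mass_abs_coeff * density * thickness)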
Weights
Functions to generate weights for synthetic datasets.
- espm.weights.laplacian_weights(shape_2d, n_phases=3, seed=0)[source]
Generates a random weight matrix with a Laplacian distribution.
The random weights are then filtered twice with a median filter to smooth the distribution. Finally, the result is normalized to sum to one.
- Parameters
- shape_2dtuple
Shape of the weight matrix.
- n_phasesint, optional
Number of phases. The default is 3.
- seedint, optional
Seed for the random number generator. The default is 0.
- Returns
- weightsnumpy.ndarray
Weight matrix with laplacian distribution.
Examples
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from espm.weights import laplacian_weights
>>> weights = laplacian_weights((100,100), 3, 0)
>>> plt.figure(figsize=(10,3))
>>> for i in range(3):
...     plt.subplot(1,3,i+1)
...     plt.imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     plt.axis("off")
...     plt.title(f"Map {i+1}")
>>> plt.show()
- espm.weights.load_toy_weights()[source]
Load the toy problem weights.
- Returns
- Hdotnp.ndarray
The weights of the toy problem.
Examples
>>> from espm.weights import load_toy_weights
>>> import matplotlib.pyplot as plt
>>> Hdot = load_toy_weights()
>>> print(Hdot.shape)
(200, 200, 3)
>>> print(Hdot.dtype)
float64
>>> plt.figure(figsize=(10,3))
>>> for i in range(3):
...     plt.subplot(1,3,i+1)
...     plt.imshow(Hdot[:,:,i], cmap=plt.cm.gray_r)
...     plt.axis("off")
...     plt.title(f"Map {i+1}")
>>> plt.show()
- espm.weights.random_weights(shape_2d, n_phases=3, seed=0)[source]
Generates a random weight matrix with a uniform distribution.
The random weights are then normalized to sum up to one.
- Parameters
- shape_2dtuple
Shape of the weight matrix.
- n_phasesint, optional
Number of phases. The default is 3.
- seedint, optional
Seed for the random number generator. The default is 0.
- Returns
- weightsnumpy.ndarray
Weight matrix with uniform distribution.
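An illustrative usage, analogous to the laplacian_weights example above (the output shape is an assumption based on that example):

>>> from espm.weights import random_weights
>>> weights = random_weights((100, 100), 3, 0)
>>> weights.shape
(100, 100, 3)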
Datasets
Sample creation
The espm.datasets.samples module provides a set of functions to create the samples of the datasets.
- espm.datasets.samples.create_toy_sample(L, C, n_poisson, seed=None)[source]
Create a toy sample.
- Parameters
- Lint
Length of the phases.
- Cint
Number of possible components in the phases.
- n_poissonint
Poisson parameter. The higher, the less noise.
- seedint, optional
Seed for the random number generator.
- Returns
- sampledict
Dictionary containing the sample.
- The dictionary contains the following keys:
- model_parameters: dict
Dictionary containing the model parameters.
- misc_parameters: dict
Dictionary containing the misc parameters. Default empty.
- shape_2d: list
List of length 2 containing the shape of the 2D images.
- GW: np.array
The matrix corresponding to the phases.
- H: np.array
The matrix corresponding to the weights.
- X: np.array
The matrix corresponding to the noisy data.
- Xdot: np.array
The matrix corresponding to the noiseless data.
- G: np.array
The matrix corresponding to the G matrix.
Examples
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from espm.datasets import create_toy_sample
>>> L, C, n_poisson = 200, 15, 400
>>> sample = create_toy_sample(L, C, n_poisson, seed=0)
>>> Hdot = sample["H"]
>>> GW = sample["GW"]
>>> G = sample["G"]
>>> X = sample["X"]
>>> Xdot = sample["Xdot"]
>>> shape_2d = sample["shape_2d"]
>>> vmin, vmax = 0, 1
>>> cmap = plt.cm.gray_r
>>> plt.figure(figsize=(10, 3))
>>> for i, hdot in enumerate(Hdot):
...     plt.subplot(1,3,i+1)
...     plt.imshow(Hdot[i].reshape(shape_2d), cmap=cmap, vmin=vmin, vmax=vmax)
...     plt.axis("off")
...     plt.title(f"Map {i+1}")
>>> plt.show()
>>> plt.figure(figsize=(10, 3))
>>> l = np.arange(0, 1, 1/L)
>>> plt.plot(l, G[:,:3])
>>> plt.title("Spectral response of each element")
>>> plt.show()
Estimators
NMF Estimators.
The espm.estimators module implements different NMF algorithms.
The class espm.estimators.NMFEstimator is an abstract class for all NMF algorithms. It implements the fit and transform methods.
The fit method is implemented in the abstract class and calls the _iteration method, which is implemented in the child classes.
The transform method is implemented in the child classes.
NMF Estimator
- class espm.estimators.NMFEstimator(n_components=2, init=None, tol=0.0001, max_iter=200, random_state=None, verbose=1, debug=False, l2=False, G=None, shape_2d=None, normalize=False, log_shift=1e-14, eval_print=10, true_D=None, true_H=None, fixed_H=None, fixed_W=None, hspy_comp=False)[source]
Abstract class for NMF algorithms.
This abstract class espm.estimators.NMFEstimator is used to implement the different NMF algorithms. It solves problems of the form:
\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} \frac{1}{2} L(X, GWH) + R(W, H)\]where \(X\) is the data matrix, \(G\) is a matrix of known values, \(W\) and \(H\) are the matrices to be learned, \(R\) is a regularization term, and \(L\) is a loss function that represents the generalized KL divergence. As a reminder, the generalized KL divergence is defined as:
\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]where \(Y = GWH\). Since \(X\) does not depend on \(W\) and \(H\), we obtain the loss function:
\[L(X, Y) = \sum_{i,j} - X_{ij} \log (GWH)_{ij} + (GWH)_{ij}\]The Generalized KL divergence has the advantage of being zero when \(X = Y\), which is not the case for our loss. Therefore, we shift the loss function by a constant \(C\) such that it equals the Generalized KL divergence. This constant is stored in the attribute
espm.estimators.NMFEstimator.const_KL_
.The loss function can also be selected to be the Frobenius norm. In this case, the loss function is:
\[L(X, Y) = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2\]While the code will work, it is not recommended to use the Frobenius norm as a loss function. This code is optimized for the KL divergence.
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
The columns of the matrices \(H\) and \(X\) are assumed to be images. This is used typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e. shape_2d[0]*shape_2d[1] = p.
- Parameters
- n_componentsint, default=2
Number of components, i.e. dimensionality of the latent space.
- initstr
Method used to initialize the procedure. Default is None. The method uses the initialization of sklearn.decomposition. It can be imported using:
>>> from sklearn.decomposition._nmf import _initialize_nmf
- tolfloat, default=1e-4
Tolerance of the stopping condition.
- max_iterint, default=200
Maximum number of iterations before timing out.
- random_stateint, RandomState instance, default=None
- verboseint, default=1
The verbosity level.
- debugbool, default=False
If True, the algorithm will log more and perform more checks.
- l2bool, default=False
If True, the algorithm will use the l2 norm instead of the KL divergence.
- Gnp.array, function or None, default=None
If np.array, it is the known matrix of the data. If function, it is a function that takes as input the data matrix and returns the known matrix (np.array). If None, it is assumed that G is the identity matrix.
- shape_2dtuple or None, default=None
If not None, it is the image shape of the columns of the matrices \(X\) and \(H\).
- normalizebool, default=False
If True, the algorithm will normalize the data matrix \(X\).
- log_shiftfloat, default=1e-14
Lower bound for W and H, i.e. \(\epsilon\).
- eval_printint, default=10
Number of iterations between each evaluation of the loss function.
- true_Dnp.array or None, default=None
Ground truth for the matrix \(GW\). Used for evaluation purposes.
- true_Hnp.array or None, default=None
Ground truth for the matrix \(H\). Used for evaluation purposes.
- fixed_Hnp.array or None, default=None
If not None, it fixes the non-zero values of the matrix \(H\).
- fixed_Wnp.array or None, default=None
If not None, it fixes the non-zero values of the matrix \(W\).
- hspy_compbool, default=False
If True, the algorithm will use the format compatible with hyperspy. Use this option if you run the algorithm with the decomposition method in hyperspy. For example:
>>> est = SmoothNMF(n_components = 3, hspy_comp = True)
>>> out = spim.decomposition(algorithm = est, return_info=True)
- fit(X, y=None, **params)[source]
Learn a NMF model for the data X.
- Parameters
- X{array-like, sparse matrix} of shape (n_samples, n_features)
Data matrix to be decomposed
- yIgnored
- paramsdict
Parameters passed to the fit_transform method.
- Returns
- self
The model.
- fit_transform(X, y=None, W=None, H=None)[source]
Learn a NMF model for the data X and returns the transformed data. This is more efficient than calling fit followed by transform.
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
- Parameters
- X{array-like, sparse matrix} of shape (n, p)
Data matrix to be decomposed
- yIgnored
Not used, present here for API consistency by convention.
- Warray-like of shape (m, k)
If specified, it is used as initial guess for the solution.
- Harray-like of shape (k, p)
If specified, it is used as initial guess for the solution.
- Returns
- GWndarrays
Transformed data.
- inverse_transform(W)[source]
Transform data back to its original space.
- Parameters
- W{ndarray, sparse matrix} of shape (n_samples, n_components)
Transformed data matrix.
- Returns
- X{ndarray, sparse matrix} of shape (n_samples, n_features)
Data matrix of original shape.
- loss(W, H, average=True, X=None)[source]
Loss function.
Compute the loss function for the given matrices W and H.
- Parameters
- Wnp.array
Matrix of shape (n, k)
- Hnp.array
Matrix of shape (k, p)
- averagebool, default=True
If True, the loss is averaged over the number of elements of the matrices.
- Xnp.array or None, default=None
If not None, it is the data matrix. If None, it is assumed that the data matrix is in self.X_.
- Returns
- loss_float
Value of the loss function.
- const_KL_ = None
- loss_names_ = ['KL_div_loss']
SmoothNMF
- class espm.estimators.SmoothNMF(lambda_L=1.0, linesearch=False, mu=0, epsilon_reg=1, algo='log_surrogate', force_simplex=True, dicotomy_tol=1e-05, gamma=None, **kwargs)[source]
SmoothNMF - NMF with a smooth regularization term
We encourage you to read the example available in the documentation: https://espm.readthedocs.io/en/latest/introduction/notebooks/toy-problem.html
The corresponding notebook is available on github: https://github.com/adriente/espm/blob/main/notebooks/toy-ML.ipynb
The class SmoothNMF implements the regularized NMF algorithm. It solves problems of the form:
\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} D_{GKL}(X || GWH) + \lambda_L \operatorname{tr}(H \Delta H^\top) + \mu \sum_{ij} \log(H_{ij} + \epsilon_{reg})\]
where:
\(D_{GKL}\) is the Generalized KL divergence loss function defined as:
\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]
See the documentation of the class espm.estimators.NMFEstimator for more details.
\(\Delta\) is the Laplacian operator (it can be created using the function create_laplacian_matrix from the utils module).
\(\epsilon_{reg}\) is the slope of the log regularization/sparsity at 0 (you probably want to leave this to 1).
\(\lambda_L\) is a regularization parameter, which encourages smoothness in the columns of \(H\).
\(\mu\) is a regularization parameter, which is similar to an L1 sparsity penalty.
The size of:
X is (n, p)
W is (m, k)
H is (k, p)
G is (n, m)
The columns of the matrices H and X are assumed to be images, typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e., shape_2d[0]*shape_2d[1] = p.
- Parameters
- lambda_L : float, default=1.0
Regularization parameter for the smooth regularization term.
- linesearch : bool, default=False
If True, use a line search to find the step size.
- mu : float, default=0
Regularization parameter for the log regularization/sparsity term.
- epsilon_reg : float, default=1
Slope of the log regularization/sparsity at 0.
- algo : str, default="log_surrogate"
Algorithm to use for the smooth regularization term. Can be "log_surrogate", "l2_surrogate", or "projected_gradient".
- force_simplex : bool, default=True
If True, force the solution to be in the simplex.
- dicotomy_tol : float, default=1e-05
Tolerance for the dichotomy algorithm.
- gamma : float, default=None
Initial value for the step size. If None, it is set to the Lipschitz constant of the gradient.
- **kwargs : dict
Additional parameters for the NMFEstimator class.
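As a sketch of how these parameters combine on image-shaped data (hypothetical sizes; shape_2d is the base-class parameter described earlier, and max_iter is assumed to follow the scikit-learn convention):
>>> import numpy as np
>>> from espm.estimators import SmoothNMF
>>> X = np.random.rand(40, 100)           # (n, p) with p = 10 * 10 pixels
>>> est = SmoothNMF(n_components=2, lambda_L=1.0, mu=0.1,
...                 shape_2d=(10, 10), max_iter=10)
>>> GW = est.fit_transform(X)             # Laplacian smoothing acts on the 10 x 10 maps in H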
- fit_transform(X, y=None, W=None, H=None)[source]
Fit the model to the data X and returns the transformed data.
The size of:
\(X\) is \((n, p)\),
\(W\) is \((m, k)\),
\(H\) is \((k, p)\),
\(G\) is \((n, m)\).
- Parameters
- X : array-like, shape (n, p)
Data matrix to be decomposed
- y : Ignored
Not used, present here for API consistency by convention.
- W : array-like, shape (m, k)
If init='custom', it is used as initial guess for the solution.
- H : array-like, shape (k, p)
If init='custom', it is used as initial guess for the solution.
- Returns
- GW : ndarray
Transformed data.
- loss_names_ = ['KL_div_loss', 'log_reg_loss', 'Lapl_reg_loss', 'gamma']
Measures
The espm.measures module implements different measures for the matrix factorisation problem. In particular, it contains the different losses and regularizers used in the espm.estimators module, as well as different metrics to evaluate the results.
- espm.measures.Frobenius_loss(X, W, H, average=False)[source]
Squared Frobenius norm of the difference between X and WH.
Compute the squared Frobenius norm (elementwise L2 norm of a matrix) given \(X, W, H\):
\[\| X - WH \|_F^2 = \sum_{i,j} \left| X_{ij} - (W H)_{ij} \right|^2\]
- Parameters
X (np.array 2D) – n x m matrix
W (np.array 2D) – n x k matrix
H (np.array 2D) – k x m matrix
average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)
- Returns
the squared Frobenius norm
Examples
>>> import numpy as np
>>> from espm.measures import Frobenius_loss
>>> X = np.array([[1, 1, -1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> Frobenius_loss(X, W, H)
26
- espm.measures.KL(X, Y, log_shift=1e-14, average=False)[source]
Generalized KL (Kullback–Leibler) divergence for two matrices
\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]
- Parameters
X (np.array 2D) – n x m matrix
Y (np.array 2D) – n x m matrix
log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module espm.conf)
average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)
- Returns
the KL divergence
- Return type
float
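As a quick sanity check of the definition above (a sketch; the return value may be a numpy scalar, hence the float conversion):
>>> import numpy as np
>>> from espm.measures import KL
>>> X = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> float(KL(X, X)) < 1e-6                # the divergence of X with itself is zero
True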
- espm.measures.KL_loss_surrogate(X, W, H, Ht, log_shift=1e-14, average=False)[source]
Surrogate loss for the KL divergence.
- espm.measures.KLdiv(X, D, H, log_shift=1e-14, average=False)[source]
Generalized KL (Kullback–Leibler) divergence
Compute the generalized KL divergence given \(X, D, H\):
\[D_{GKL}(X || DH) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{(D H)_{ij}} - X_{ij} + (D H)_{ij}\]
- Parameters
X (np.array 2D) – n x m matrix
D (np.array 2D) – n x k matrix
H (np.array 2D) – k x m matrix
log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module espm.conf)
average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)
- Returns
the KL divergence
- Return type
float
Examples
>>> import numpy as np
>>> from espm.measures import KLdiv
>>> X = np.array([[1, 1, 1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> KLdiv(X, W, H)
2.921251732961556
- espm.measures.KLdiv_loss(X, W, H, log_shift=1e-14, average=False)[source]
Generalized KL (Kullback–Leibler) divergence loss
Compute the loss based on the generalized KL divergence given \(X, W, H\):
\[\sum_{i,j} - X_{ij} \log (W H)_{ij} + (W H)_{ij}\]
This does not contain all the terms of the KL divergence, only the ones depending on W and H.
- Parameters
X (np.array 2D) – n x m matrix
W (np.array 2D) – n x k matrix
H (np.array 2D) – k x m matrix
log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module espm.conf)
average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)
- Returns
the loss value
- Return type
float
Examples
>>> import numpy as np
>>> from espm.measures import KLdiv, KLdiv_loss
>>> X = np.array([[1, 1, 1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> KLdiv_loss(X, W, H)
1.9425903651915402
>>> KLdiv(X, W, H)
2.921251732961556
- espm.measures.find_min_MSE(true_maps, algo_maps, get_ind=False, unique=False)[source]
Compare all the mean squared errors between ground truth spectra and NMF spectra and find the minimum configuration.
- Parameters
true_maps (np.array 2D) – true maps with shape (number of phases, number of pixels)
algo_maps (np.array 2D) – NMF maps with shape (number of phases, number of pixels)
get_ind (boolean) – If True, the function will also return the indices of the NMF maps corresponding to the ground truth
unique (boolean) – If False it will find the global minimum but several maps can be associated to the same ground truth
- Returns
list of mse, (optionally) tuple of indices
- Return type
list, or (list, tuple of indices) if get_ind is True
.. warning::
The output being either a list or a tuple of lists isn't a great idea. It has to change.
- espm.measures.find_min_angle(true_vectors, algo_vectors, get_ind=False, unique=False)[source]
Compare all the angles between ground truth spectra and NMF spectra and find the minimum configuration.
- Parameters
true_vectors (np.array 2D) – true spectra with shape (number of phases, number of energy channels)
algo_vectors (np.array 2D) – NMF spectra with shape (number of phases, number of energy channels)
get_ind (boolean) – If True, the function will also return the indices of the NMF spectra corresponding to the ground truth
unique (boolean) – If False it will find the global minimum but several spectra can be associated to the same ground truth
- Returns
list of angles, (optionally) tuple of indices
- Return type
list, or (list, tuple of indices) if get_ind is True
.. warning::
The output being either a list or a tuple of lists isn't a great idea. It has to change.
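For illustration, a sketch with a perfectly matched but permuted pair of spectra (hypothetical data; the returned angles should be close to zero):
>>> import numpy as np
>>> from espm.measures import find_min_angle
>>> true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
>>> algo = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])   # same spectra, permuted
>>> angles = find_min_angle(true, algo, unique=True)      # expect two angles of 0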
- espm.measures.find_min_config(true_maps, true_spectra, algo_maps, algo_spectra, angles=True)[source]
- espm.measures.log_reg(H, mu, epsilon=1, average=False)[source]
Log regularisation
Compute the regularization loss:
\[R(\mu, H, \epsilon) = \sum_{ij} \mu_i \log \left( H_{ij} + \epsilon \right).\]
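To make the formula concrete, here is a direct numpy evaluation of the expression above with a scalar \(\mu\) (this evaluates the formula by hand, not the library function):
>>> import numpy as np
>>> H = np.array([[1.0, 1.0], [1.0, 1.0]])
>>> mu, epsilon = 1.0, 1.0
>>> round(float(np.sum(mu * np.log(H + epsilon))), 4)   # 4 * log(2)
2.7726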
- espm.measures.log_surrogate(H, Ht, mu, epsilon, average=False)[source]
Surrogate loss for the log function.
- espm.measures.mae(map1, map2)[source]
Mean absolute error
Calculate the mean absolute error between two 2D arrays of the same dimension.
- Parameters
map1 (np.array 2D) – first array
map2 (np.array 2D) – second array
- Returns
the mean absolute error
Examples
>>> import numpy as np
>>> from espm.measures import mae
>>> map1 = np.array([[0, 1], [0, 1]])
>>> map2 = np.array([[1, 1], [1, 1]])
>>> mae(map1, map2)
0.5
- espm.measures.mse(map1, map2)[source]
Mean square error
Calculate the mean squared error between two 2D arrays of the same dimension.
- Parameters
map1 (np.array 2D) – first array
map2 (np.array 2D) – second array
- Returns
the mean squared error
Examples
>>> import numpy as np
>>> from espm.measures import mse
>>> map1 = np.array([[0, 1], [0, 1]])
>>> map2 = np.array([[1, 1], [1, 1]])
>>> mse(map1, map2)
0.5
- espm.measures.ordered_mae(true_maps, algo_maps, input_inds)[source]
Takes true maps of p phases and Npx pixels (p x Npx matrix of floats), reconstructed maps of the same size, and a list of integer indices giving the correspondence between true and reconstructed phases. Returns the mean absolute error of each phase, as a list of floats in ground-truth order.
- espm.measures.ordered_mse(true_maps, algo_maps, input_inds)[source]
Takes true maps of p phases and Npx pixels (p x Npx matrix of floats), reconstructed maps of the same size, and a list of integer indices giving the correspondence between true and reconstructed phases. Returns the mean squared error of each phase, as a list of floats in ground-truth order.
- espm.measures.ordered_r2(true_maps, algo_maps, input_inds)[source]
Takes true maps of p phases and Npx pixels (p x Npx matrix of floats), reconstructed maps of the same size, and a list of integer indices giving the correspondence between true and reconstructed phases. Returns the coefficient of determination of each phase, as a list of floats in ground-truth order.
- espm.measures.r2(map_true, map_pred)[source]
\(R^2\) - Coefficient of determination
Calculate the coefficient of determination between two 2D arrays of the same dimension. This is also called the regression score function. See Wikipedia.
This function is a wrapper for the function sklearn.metrics.r2_score of scikit-learn.
- Parameters
map_true (np.array 2D) – first array (ground truth)
map_pred (np.array 2D) – second array (prediction)
- Returns
the coefficient of determination
- Return type
float
- espm.measures.spectral_angle(v1, v2)[source]
Spectral angle
Calculate the angle between two spectra of the same dimension.
- Parameters
v1 (np.array 1D) – first spectrum
v2 (np.array 1D) – second spectrum
- Returns
the angle in degrees
- Return type
float
Examples
>>> import numpy as np
>>> from espm.measures import spectral_angle
>>> v1 = np.array([0, 1, 0])
>>> v2 = np.array([1, 0, 1])
>>> spectral_angle(v1, v2)
90.0
- espm.measures.squared_distance(x, y=None)[source]
Squared L2 distance between all pairs of column vectors of two matrices.
Calculate the squared L2 distance between all pairs of column vectors of two matrices. If only one matrix is given, the function uses the pairs of column vectors within this matrix.
- Parameters
x (np.array 2D) – n x m matrix of first column vectors
y (np.array 2D) – n x m matrix of second column vectors (optional)
- Returns
the matrix of pairwise squared distances
- Return type
np.array (m x m)
Examples
>>> import numpy as np
>>> from espm.measures import squared_distance
>>> x = np.arange(3)
>>> squared_distance(x, x)
array([[ 0.,  1.,  2.],
       [ 1.,  0.,  1.],
       [ 2.,  1.,  0.]])
- espm.measures.trace_xtLx(L, x, average=False)[source]
Trace of \(X^T L X\)
Compute the following expression \(\text{Tr} (X^T L X)\).
- Parameters
L (np.array 2D) – n x n matrix
x (np.array 2D) – n x k matrix of k n-sized vectors
average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)
- Returns
the trace value
- Return type
float
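As a hand check of the expression (direct numpy evaluation, not the library call):
>>> import numpy as np
>>> L = np.eye(2)
>>> x = np.array([[1.0], [2.0]])
>>> float(np.trace(x.T @ L @ x))   # 1**2 + 2**2
5.0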
- espm.measures.unique_min(matrix)[source]
From a square matrix of float values, finds the combination of elements, one per row with no two elements sharing a column, which minimises the sum of the selected elements.
This is a brute-force algorithm; it is not recommended to input a matrix bigger than 20 x 20.
- Parameters
matrix (np.array 2D) – square matrix
- Returns
list of unique min values and corresponding indices in the same order
- Return type
tuple (list of floats, tuple of ints)
Examples
>>> import numpy as np
>>> from espm.measures import unique_min
>>> matrix = np.array([[1.2, 1.3, 3.5], [4.9, 2.2, 6.5], [9.0, 4.1, 1.8]])
>>> unique_min(matrix)
([1.2, 2.2, 1.8], (0, 1, 2))
Utils
Utils for the ESPM package
- espm.utils.approx_density(atomic_fraction=False, *, elements_dict={})[source]
Wrapper to the density_of_mixture function of hyperspy. Takes a dict of chemical composition expressed in weight fractions (or in atomic fractions if atomic_fraction=True). Returns an approximated density.
- espm.utils.arg_helper(params, d_params)[source]
Check if all parameters of d_params are present in params. If not, they are added to params with their default values.
- Parameters
- params : dict
Dictionary of parameters to be checked.
- d_params : dict
Dictionary of default parameters.
- Returns
- params : dict
Dictionary of parameters with the default parameters added if not present.
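A brief sketch of the described behaviour (assuming the completed dictionary is returned):
>>> from espm.utils import arg_helper
>>> params = arg_helper({'a': 1}, {'a': 0, 'b': 2})   # expected: {'a': 1, 'b': 2}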
- espm.utils.atomic_to_weight_dict(*, elements_dict={})[source]
Wrapper to the atomic_to_weight function of hyperspy. Takes a dict of chemical composition expressed in atomic fractions. Returns a dict of chemical composition expressed in weight fractions.
- espm.utils.bin_spim(data, n, m)[source]
Takes a 3D array of size (x, y, k) [px, py, e] and returns a binned 3D array of size (n, m, k) [new_px, new_py, e].
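For example (shapes only; whether the binning sums or averages is not specified here):
>>> import numpy as np
>>> from espm.utils import bin_spim
>>> data = np.random.rand(4, 4, 10)       # (x, y, k)
>>> bin_spim(data, 2, 2).shape            # (n, m, k)
(2, 2, 10)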
- espm.utils.check_keys(params, d_params, upperkeys='', toprint=True)[source]
Check if all parameters of d_params are present in params. If not, they are added to params with their default values.
- Parameters
- params : dict
Dictionary of parameters to be checked.
- d_params : dict
Dictionary of default parameters.
- upperkeys : str
String of the upper keys.
- toprint : bool
If True, print the warning.
- Returns
- params : dict
Dictionary of parameters with the default parameters added if not present.
Examples
>>> params = {'a': 1, 'b': 2}
>>> d_params = {'a': 1, 'b': 2, 'c': 3}
>>> check_keys(params, d_params)
>>> params
{'a': 1, 'b': 2, 'c': 3}
- espm.utils.create_laplacian_matrix(nx, ny=None)[source]
Helper method to create the laplacian matrix for the laplacian regularization.
- Parameters
- nx : int
Height of the original image.
- ny : int or None
Width of the original image.
- Returns
the n x n laplacian matrix, where n = nx*ny
- Return type
scipy.sparse.csr_matrix
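For example, a 2 x 2 image yields a 4 x 4 matrix:
>>> from espm.utils import create_laplacian_matrix
>>> L = create_laplacian_matrix(2, 2)
>>> L.shape
(4, 4)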
- espm.utils.is_number(i)[source]
Return True if i is a number
- Parameters
i (any) – variable to check
- Returns
True if i is a number
- Return type
bool
- espm.utils.is_symbol(i)[source]
Return True if i is a chemical symbol
- Parameters
i (any) – variable to check
- Returns
True if i is a chemical symbol
- Return type
bool
- espm.utils.isdict(p)[source]
Return True if the variable is a dictionary.
- Parameters
p (any) – variable to check
- Returns
True if p is a dictionary
- Return type
bool
- espm.utils.number_to_symbol_dict(func)[source]
Decorator. Takes a dict of elements (i.e. a chemical composition) with atomic numbers as keys (e.g. 26 for Fe) and returns a dict of elements with symbols as keys (e.g. Fe for iron).
- espm.utils.number_to_symbol_list(func)[source]
Decorator. Takes a dict of elements (i.e. a chemical composition) with symbols as keys (e.g. Fe for iron) and returns a dict of elements with atomic numbers as keys (e.g. 26 for iron).
- espm.utils.process_losses(losses)[source]
Process the losses to be plotted
- Parameters
- losses: np.ndarray
Array of losses (output of espm.estimators.NMFEstimator.get_losses method)
- Returns
- values: np.ndarray
Array of values
- names: list
List of names
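A sketch of the intended use, plotting each named loss curve (random data; max_iter is assumed to follow the scikit-learn convention, and values is assumed to iterate in the same order as names):
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from espm.estimators import SmoothNMF
>>> from espm.utils import process_losses
>>> est = SmoothNMF(n_components=2, lambda_L=0, max_iter=10).fit(np.random.rand(20, 50))
>>> values, names = process_losses(est.get_losses())
>>> for v, name in zip(values, names):
...     _ = plt.plot(v, label=name)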
- espm.utils.rescaled_DH(D, H)[source]
Rescale the matrices D and H such that the columns of H sum approximately to one.
- Parameters
D (np.array 2D) – n x k matrix
H (np.array 2D) – k x m matrix
- Returns
D_rescale, H_rescale
- Return type
np.array 2D, np.array 2D
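A brief usage sketch (hypothetical shapes):
>>> import numpy as np
>>> from espm.utils import rescaled_DH
>>> D = np.random.rand(5, 2)
>>> H = np.random.rand(2, 4)
>>> D_rescale, H_rescale = rescaled_DH(D, H)   # columns of H_rescale sum approximately to one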
Changelog
All notable changes to this project will be documented in this file. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
0.1.1 (2023-02-21)
First unofficial release
First version of the package on PyPi
All main functionality tested
Basic documentation
Contributing
Contributions are welcome, and they are greatly appreciated! The development of this package takes place on GitHub. Issues, bugs, and feature requests should be reported there. Code and documentation can be improved by submitting a pull request. Please add documentation and tests for any new code.
The package can be set up (ideally in a fresh virtual environment) for local development with the following:
$ git clone https://github.com/adriente/espm.git
$ pip install -e ".[dev]"
You can improve or add functionality in the espm
folder, along with
corresponding unit tests in espm/tests/test_*.py
(with reasonable
coverage).
If you have a nice example to demonstrate the use of the introduced
functionality, please consider adding a notebook in doc/introduction/notebooks
.
In general, this can be done as a relative symbolic link to a notebook in the
notebooks
folder.
Update README.rst
and CHANGELOG.rst
if applicable.
After making any change, please run the tests and build the documentation with the following commands:
$ make test
$ make doc
To iterate faster, you can partially run the test suite, at various degrees of granularity, as follows:
$ pytest espm/tests/test_utils.py
$ pytest espm/tests/test_estimators.py
Making a release
Update the version number and release date in setup.py, espm/__init__.py and CHANGELOG.rst.
Create a git tag with git tag -a v0.2.0 -m "espm v0.2.0".
Push the tag to GitHub with git push --tag. The tag should now appear in the releases and tags tab.
Create a release on GitHub and select the created tag.
Build the distribution with make dist and check that the dist/espm-0.2.0.tar.gz source archive contains all required files. The binary wheel should be found as dist/espm-0.2.0-py3-none-any.whl.
Test the upload and installation process:
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
$ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple espm
Build and upload the distribution to the real PyPI with make release.
Repository organization
LICENSE.txt Project license
*.rst Important documentation
Makefile Targets for make
setup.py Meta information about package (published on PyPI)
.gitignore Files ignored by the git revision control system
.travis.yml Defines testing on Travis continuous integration
espm/ Contains the modules (the actual toolbox implementation)
__init__.py Load modules at package import
*.py One file per module
espm/tables/ Contains data tables for the EDXS and EDSS models
__init__.py Load modules at package import
*.json data_files for the package
*.txt data_files for the package
espm/tests/ Contains the test suites (will be distributed to end user)
__init__.py Load modules at package import
test_*.py One test suite per module
test_docstrings.py Test the examples in the docstrings (reference doc)
doc/ Package documentation
conf.py Sphinx configuration
index.rst Documentation entry page
*.rst Include doc files from root directory
doc/reference/ Reference documentation
index.rst Reference entry page
*.rst Only directives, the actual doc is alongside the code
doc/introduction/
index.rst Tutorials entry page
*.rst One file per tutorial
doc/introduction/notebooks/
*.ipynb One notebook per tutorial