espm: The Electron Spectro-Microscopy Python Library

Documentation Status

This library contains algorithms to perform non-negative matrix factorization with diverse regularisation (e.g. Laplacian or L1) and contraints (e.g. simplex).

It is specialized for Electron Microscopy applications. It contains code to create artificial Energy dispersive X-ray spectroscopy (EDXS) data and to perform hyperspectral unmixing on EDXS spectrum images.

Installation

You can install this package from PyPi using:

$ pip install espm

If you want to develop, please use the option:

$ git clone https://github.com/adriente/espm.git
$ cd espm
$ pip install cython
$ pip install -e ".[dev]"

Getting started

Generate the synthetic dataset. Run the script:

$ python experiments/generate_synthetic_dataset.py

Documentation

The documentation is available at https://espm.readthedocs.io/en/latest/

You can get started with the following notebooks:

TODOs

Here is a list of things that we need to do before the version 0.2.0, which will be the first official release of the library. The code is already available on github at the following address: https://github.com/adriente/espm.git A draft of the documentation is available at: https://espm.readthedocs.io/en/latest/

  • Update the line 40 of doc/introduction/index.rst (@Adrien)

  • Make some doc for the dataset module (just the minimum) (@Adrien)
    • Nati: I started doint it. I suggest to put all the function that create samples in dataset.samples.

  • Toy dataset: create model class, change outputs, adapts function (@Nati) Done!

  • Separate the spectral and spacial parts (@Adrien)
    • Move generate_EDXS_phases to models

    • Create a modules for weights (Nati: I started doing it)

  • Complete the general doc in doc/introduction/index.rst (@Adrien)

  • Clarify the code for the estimator: remove the L2 loss (@Nati) Done!

  • Add the general problem that we solve in the doc (@Nati) Done!

  • Update the ML notebook with more explanations (@Nati) Done!

  • Check that the doc is somehow understanable and sufficiently complete (@Sebastian)

  • Add the reference to the dataset paper (@Nati)

  • add the reference to the algorithm paper (@Nati)

@Adrien: Check the code in espm.weights. If you copy this code, you can generate figures for the doc. It seems that you need to have the title Examples for the system to work. Example does not work!

Documentation map

Introduction to the espm

This tutorial will show you the basic operations of the toolbox. After installing the package with pip, start by opening a python shell, e.g. a Jupyter notebook, and import espm. The package espm is built on top of the hyperspy and the scikit-learn packages.

The hyperspy package is a Python library for multidimensional data analysis. It provides the base framework to handles our data. The scikit-learn package is a Python library for machine learning. It provides the base framework to for the Non Negative Matrix Factorization (NMF) algorithms develeoped in this package.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import hyperspy.api as hs
>>> import espm
Datasets

Let us start by creating some data to play with this tutorial. We can generating artificial datasets using the following lines:

>>> import espm.datasets as ds
>>> ds.generate_built_in_datasets(seeds_range=1)
>>> spim = ds.load_particules(sample = 0)
>>> spim.change_dtype('float64')

The command generate_built_in_datasets will save the generated datasets folder defined in the espm.conf.py file. Alternatively, you can also define the path where the data will be saved using the “base_path” argument.

>>> ds.generate_built_in_datasets(base_path="generated_samples", seeds_range=1)

Here the object spim is of the hyperspy._signals.signal1d.Signal1D. This object has different useful attributes and methods. For example, @ADRIEN — summarize here some of them

Note

Please see the review article espm : a Python library for the simulation of STEM-EDXS datasets for an overview of the simulation methods this package leverages.

Factorization

Taking the non-negative matrix factorization (NMF) is done with the following:

>>> out = spim.decomposition(3)
>>> spim.plot_decomposition_loadings(3)
>>> spim.plot_decomposition_factors(3)
_images/index-3_00.png
_images/index-3_01.png

BIt will use the algorithms developped in this contribution.

These algorithms are an important part of this package. They are specialized to solve regularized Poisson NMF problems. Mathematically, they can be expressed as:

\[\dot{W}, \dot{H} = \arg\min_{W\geq\epsilon, H\geq\epsilon, \sum_i H_{ij} = 1} D_{GKL}(X || GWH) + \lambda tr ( H^\top \Delta H) + \mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})$$\]

Here \(D_{GKL}\) is the fidelity term, i.e. the Generalized KL divergence

\[D_{GKL}(X \| Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

The loss is regularized using two terms: a Laplacian regularization on \(H\) and a log regularization on \(H\). \(\lambda\) and \(\mu\) are the regularization parameters. The Laplacian regularization is defined as:

\[\lambda tr ( H^\top \Delta H)\]

where \(\Delta\) is the Laplacian operator (it can be created using the function espm.utils.create_laplacian_matrix). Note that the columns of the matrices :math:`H` and :math:`X` are assumed to be images.

The log regularization is defined as:

\[\mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})\]

where \(\epsilon_{reg}\) is the slope of log regularization at 0. This term acts similarly to an L1 penalty but affects less larger values.

Finally, we assume \(W,H\geq \epsilon\) and that the lines of \(H\) sum to 1:

\[\sum_i H_{ij} = 1.\]

The size of:

  • \(X\) is (n, p)

  • \(W\) is (m, k)

  • \(H\) is (k, p)

  • \(G\) is (n, m)

The columns of the matrices \(H\) and \(X\) are assumed to be images, typically for the smoothness regularization. In terms of shape, we have \(n_x \cdot n_y = p\), where \(n_x\) and \(n_y\) are the number of pixels in the x and y directions.

A detailed example on the use these algorithms can be found in this notebook.

List of example notebooks

To go deeper, we invite you to consult the following notebooks.

Analysis of simulated EDX data

In this notebook we showcase the analysis of the built-in simulated dataset. We use the espm EDXS modelling to simulate the data. We first perform a KL-NMF decomposition of the data and then plot the results.

Imports
[1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

# Generic imports
import hyperspy.api as hs
import numpy as np

# espm imports
from espm.models.EDXS_function import print_concentrations_from_W
from espm.estimators import SmoothNMF
import espm.datasets as ds
Generating artificial datasets and loading them

If the datasets were already generated, they are not generated again

[2]:
# Generate the data
# Here `seeds_range` is the number of different samples to generate
ds.generate_built_in_datasets(seeds_range=1)

# Load the data
spim = ds.load_particules(sample = 0)
# We need to change the data to floats for the decomposition to run
spim.change_dtype('float64')
Building G

The information in the metadata of the spim object are used to build the G matrix. This matrix contains a model of the characteristic X-rays and the bremsstrahlung.

[3]:
G = spim.build_G("bremsstrahlung")
Problem solving
Picking analysis parameters
  • 3 components for 3 phases

  • convergence criterions : tol and max_iter. tol is the minimum change of the loss function between two iterations. max_iter is the max number of iterations.

  • hspy_comp is a required parameter if you want to use the hyperspy api

[4]:
est = SmoothNMF( n_components = 3,tol=0.000001, max_iter = 1000, G = G,hspy_comp = True)
Calculating the decomposition

/! It should take a minute to execute in the case of the built-in dataset

[5]:
out = spim.decomposition(algorithm = est, return_info=True)
It 10 / 1000: loss 2.060405e-01,  1.349 it/s
It 20 / 1000: loss 2.059418e-01,  1.383 it/s
It 30 / 1000: loss 2.059186e-01,  1.397 it/s
It 40 / 1000: loss 2.059093e-01,  1.404 it/s
It 50 / 1000: loss 2.059045e-01,  1.405 it/s
It 60 / 1000: loss 2.059014e-01,  1.409 it/s
It 70 / 1000: loss 2.058977e-01,  1.411 it/s
It 80 / 1000: loss 2.058942e-01,  1.413 it/s
It 90 / 1000: loss 2.058909e-01,  1.414 it/s
It 100 / 1000: loss 2.058864e-01,  1.415 it/s
It 110 / 1000: loss 2.058828e-01,  1.416 it/s
It 120 / 1000: loss 2.058789e-01,  1.417 it/s
It 130 / 1000: loss 2.058749e-01,  1.417 it/s
It 140 / 1000: loss 2.058706e-01,  1.416 it/s
It 150 / 1000: loss 2.058663e-01,  1.417 it/s
It 160 / 1000: loss 2.058620e-01,  1.417 it/s
It 170 / 1000: loss 2.058577e-01,  1.418 it/s
It 180 / 1000: loss 2.058538e-01,  1.418 it/s
It 190 / 1000: loss 2.058496e-01,  1.419 it/s
It 200 / 1000: loss 2.058448e-01,  1.419 it/s
It 210 / 1000: loss 2.058395e-01,  1.419 it/s
It 220 / 1000: loss 2.058341e-01,  1.420 it/s
It 230 / 1000: loss 2.058288e-01,  1.419 it/s
It 240 / 1000: loss 2.058235e-01,  1.420 it/s
It 250 / 1000: loss 2.058183e-01,  1.420 it/s
It 260 / 1000: loss 2.058132e-01,  1.420 it/s
It 270 / 1000: loss 2.058085e-01,  1.420 it/s
It 280 / 1000: loss 2.058041e-01,  1.421 it/s
It 290 / 1000: loss 2.057997e-01,  1.421 it/s
It 300 / 1000: loss 2.057954e-01,  1.421 it/s
It 310 / 1000: loss 2.057912e-01,  1.421 it/s
It 320 / 1000: loss 2.057874e-01,  1.421 it/s
It 330 / 1000: loss 2.057838e-01,  1.421 it/s
It 340 / 1000: loss 2.057804e-01,  1.421 it/s
It 350 / 1000: loss 2.057772e-01,  1.421 it/s
It 360 / 1000: loss 2.057743e-01,  1.421 it/s
It 370 / 1000: loss 2.057716e-01,  1.422 it/s
It 380 / 1000: loss 2.057691e-01,  1.422 it/s
exits because of relative change < tol: 1.036662026376086e-06
Stopped after 390 iterations in 4.0 minutes and 34.0 seconds.
Decomposition info:
  normalize_poissonian_noise=False
  algorithm=SmoothNMF()
  output_dimension=None
  centre=None
scikit-learn estimator:
SmoothNMF()
Getting the losses and the results of the decomposition
  • First cell : Printing the resulting concentrations.

  • Second cell : Ploting the resulting spectra

  • Thrid cell : Ploting the resulting abundances

Hyperspy is mainly designed to be used with the qt graphical backend of matplotlib. Thus two plots will appear if you are in inline mode.

[6]:
print_concentrations_from_W(est.W_, elements = spim.metadata.Sample.elements)
Concentrations report
     p0     p1     p2
La : 0.0000 0.2355 0.0000
Rb : 0.4849 0.0061 0.5608
W  : 0.0423 0.0014 0.0040
Yb : 0.0000 0.1038 0.0000
Pt : 0.0001 0.0214 0.0001
V  : 0.0285 0.0018 0.0081
Al : 0.0000 0.1182 0.0000
Ti : 0.0000 0.1065 0.0017
N  : 0.0018 0.1239 0.0003
[7]:
spim.plot_decomposition_loadings(3)
[7]:
_images/introduction_notebooks_api_13_0.png
_images/introduction_notebooks_api_13_1.png
[8]:
spim.plot_decomposition_factors(3)
[8]:
_images/introduction_notebooks_api_14_0.png
_images/introduction_notebooks_api_14_1.png
Generate simulated data

In this notebook we generate a simulated dataset based on experimental data. - The maps are directly extracted from the experimental data. - The phases (or spectra) are built using chemical composition extracted from the literature (citations to be included)

The output of this notebook is put in the generated_datasets

Imports
[1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

# espm imports
from espm.datasets.generate_weights import generate_weights, chemical_maps_weights
from espm.datasets.generate_EDXS_phases import generate_random_phases, unique_elts, generate_modular_phases
from espm.datasets.base import generate_dataset
from espm.utils import arg_helper
from pathlib import Path

# Generic imports
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import hyperspy.api as hs


Generate weights

We use an experimental dataset to create realistic abundance maps. Theses maps are based from chemical mapping of the dataset.

[2]:
def get_repo_path():
    """Get the path to the git repository.

    This is a bit of a hack, but it works.
    """
    this_path = Path.cwd() / Path("generate_data.ipynb")
    return this_path.resolve().parent.parent

# Path of the experimental dataset
path = get_repo_path() / Path("experiments/71GPa_experimental_data.hspy")

# Creation of the weights
weights = chemical_maps_weights(path,["Fe_Ka","Ca_Ka"],conc_max = 0.7)
/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.1.4/lib/python3.7/site-packages/hyperspy/misc/utils.py:475: VisibleDeprecationWarning: Use of the `binned` attribute in metadata is going to be deprecated in v2.0. Set the `axis.is_binned` attribute instead.
  VisibleDeprecationWarning,
/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.1.4/lib/python3.7/site-packages/hyperspy/io.py:575: VisibleDeprecationWarning: Loading old file version. The binned attribute has been moved from metadata.Signal to axis.is_binned. Setting this attribute for all signal axes instead.
  'signal axes instead.', VisibleDeprecationWarning)
Plot the results
[3]:
fig,axs = plt.subplots(1,3,figsize = (20,20*3))
for j in range(axs.shape[0]) :
    im = axs[j].imshow(weights[:,:,j],vmin = 0, vmax = 1.0)
    axs[j].tick_params(axis = "both",width = 0,labelbottom = False,labelleft = False)
    axs[j].set_title("Phase {}".format(j),fontsize = 22)

fig.subplots_adjust(right=0.84)
cbar_ax = fig.add_axes([0.85, 0.47, 0.01, 0.045])
fig.colorbar(im,cax=cbar_ax)
cbar_ax.tick_params(labelsize=22)
_images/introduction_notebooks_generate_data_6_0.png
Generate phases

We generate phases based on values extracted from the literature. The phases we try to simulate here are Ferropericlase, Bridgmanite and Ca-perovskite.

We use dictionnaries to input the modelling parameters.

[4]:
# Elemental concetration of each phase
elts_dicts = [
    # Pseudo ferropericlase
    {
        "Mg" : 0.522, "Fe" : 0.104, "O" : 0.374, "Cu" : 0.05
    },
    # Pseudo Ca-Perovskite
    {
        "Mg" : 0.020, "Fe" : 0.018, "Ca" : 0.188, "Si" : 0.173, "Al" : 0.010, "O" : 0.572, "Ti" : 0.004, "Cu" : 0.05, "Sm" : 0.007, "Lu" : 0.006, "Nd" : 0.006
    },
    # Pseudo Bridgmanite
    {
        "Mg" : 0.445, "Fe" : 0.035, "Ca" : 0.031, "Si" : 0.419, "Al" : 0.074, "O" : 1.136, "Cu" : 0.05, "Hf" : 0.01
    }]

# Parameters of the bremsstrahlung
brstlg_pars = [
    {"b0" : 0.0001629, "b1" : 0.0009812},
    {"b0" : 0.0007853, "b1" : 0.0003658},
    {"b0" : 0.0003458, "b1" : 0.0006268}
]

# Model parameters : energy scale, detector broadening, x-ray emission database, beam energy, absorption parameters, detector efficiency
model_params = {
        "e_offset" : 0.3,
        "e_size" : 1980,
        "e_scale" : 0.01,
        "width_slope" : 0.01,
        "width_intercept" : 0.065,
        "db_name" : "default_xrays.json",
        "E0" : 200,
        "params_dict" : {
            "Abs" : {
                "thickness" : 100.0e-7,
                "toa" : 35,
                "density" : 4.5,
                "atomic_fraction" : False
            },
            "Det" : "SDD_efficiency.txt"
        }
    }

# miscellaneous paramaters : average detected number of X-rays per pixel, phases densities, output folder, model name, random seed
data_dict = {
    "N" : 100,
    "densities" : [1.2,1.0,0.8],
    "data_folder" : "71GPa_synthetic_N100",
    "model" : "EDXS",
    "seed" : 42
}
[5]:
phases, full_dict = generate_modular_phases(
    elts_dicts=elts_dicts, brstlg_pars = brstlg_pars,
    scales = [1, 1, 1],
    model_params= model_params,
    seed = 42
    )
# scales : bremsstrahlung parameters modifiers

# Finish building the dictionnary of the simulation
data_dict.update(full_dict)
Plot the results
[6]:
fig,axs = plt.subplots(3,1,figsize = (20,15))

# Build the energy scale
x = np.linspace(
    full_dict["model_parameters"]["e_offset"],
    full_dict["model_parameters"]["e_offset"]+full_dict["model_parameters"]["e_scale"]*full_dict["model_parameters"]["e_size"],
    num=full_dict["model_parameters"]["e_size"])

for j in range(axs.shape[0]) :
    axs[j].plot(x,phases[j])
    axs[j].set_title("Phase {}".format(j),fontsize = 22)

axs[-1].set_xlabel("Energy loss (eV)",fontsize = 16)
[6]:
Text(0.5, 0, 'Energy loss (eV)')
_images/introduction_notebooks_generate_data_13_1.png
Generate the data

It will produce 1 spectrum images/sample in the target folder.

You can replace seed_range by the number of samples you want to generate.

[7]:
generate_dataset(seeds_range=1, **data_dict, weights=weights )
100%|██████████| 1/1 [00:04<00:00,  4.17s/it]

The previous command will save the data in the generated_datasets folder defined in the espm.conf.py file.

You can also define the path where the data will be saved using the “base_path” argument

[8]:
# generate_dataset(basepath="generated_samples", seeds_range=1, **data_dict, weights=weights )
[ ]:

Regularized Poisson NMF on a toy dataset

This notebook is part of the ESPM package. It is available on github

In this notebook, we show how to solve a regularized Poisson Non Negative Matrix Factorization (NMF) on a toy dataset. The general goal of this problem is to separate a matrix \(X\) into two matrices \(D\) and \(H\) such that \(X \approx DH\). We will assume that \(D\) is the product of two matrices \(G\) and \(W\) such that \(D = GW\). The matrix \(G\) is assumed to be known and therefore \(W\) is the matrix we want to find. In the field of electro-microscopy, the matrix \(GW\) represent the phases, the matrix \(H\) is the matrix of maps (or weights) and \(X\) is the data matrix. The maps \(H\) are 2D images. Furthermore, we will assume that \(W, H\) are non-negative or more generally greater than a small positive value \(\epsilon\).

The data measured data \(X\) is follow a Poisson distribution, i.e. \(X_{ij} \sim \text{Poisson}(n_p\dot{X}_{ij})\). In the context of electro-microscopy, \(n_p\) is the number of photons per pixel. \(\dot{X} = G \dot{W} \dot{H}\) would correspond a noiseless measurement.

The size of:

  • \(X\) is (n, p),

  • \(W\) is (m, k),

  • \(H\) is (k, p),

  • \(G\) is (n, m).

In general, m is assumed to be much smaller than n and \(G\) is assumed to be known. The parameter shape_2d defines the shape of the images for the columns of \(H\) and \(X\), i.e. shape_2d[0]*shape_2d[1] = p.

Mathematically. the problem can be formulated as:

\[\dot{W}, \dot{H} = \arg\min_{W\geq\epsilon, H\geq\epsilon, \sum_i H_{ij} = 1} D_{GKL}(X || GWH) + \lambda tr ( H^\top \Delta H) + \mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})\]

Here \(D_{GKL}\) is the fidelity term, i.e. the Generalized KL divergence

\[D_{GKL}(X \| Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

The loss is regularized using two terms: a Laplacian regularization on \(H\) and a log regularization on \(H\). \(\lambda\) and \(\mu\) are the regularization parameters. The Laplacian regularization is defined as:

\[\lambda tr ( H^\top \Delta H)\]

where \(\Delta\) is the Laplacian operator (it can be created using the function :mod:espm.utils.create_laplacian_matrix). Note that the columns of the matrices :math:`H` and :math:`X` are assumed to be images.

The log regularization is defined as:

\[\mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})\]

where \(\epsilon_{reg}\) is the slope of log regularization at 0. This term acts similarly to an L1 penalty but affects less larger values.

Finally, we assume \(W,H\geq \epsilon\) and that the lines of \(H\) sum to 1:

\[\sum_i H_{ij} = 1.\]

This is done by adding the parameter force_simplex=True to the class espm.estimators.SmoothNMF. This constraint is essential as it prevent the algorithm to find a solution where all the maps tend to 0 because of the other constraints.

In this notebook, we will use the class espm.estimators.SmoothNMF to solve the problem.

Imports and function definition

Let’s start by importing the necessary libraries.

[1]:
# Some useful modules for notebooks
%load_ext autoreload
%autoreload 2
%matplotlib inline
[2]:
import numpy as np
import matplotlib.pyplot as plt
from espm.estimators import SmoothNMF
from espm.measures import find_min_angle, ordered_mse, ordered_mae, ordered_r2
from espm.datasets import create_toy_sample
from espm.utils import process_losses

Now we define the parameters that will be used to generate the data.

[3]:

seed = 0 # for reproducibility m = 15 # Number of components n = 200 # Length of the phases n_poisson = 150 # Average poisson number per pixel (this number will be splitted on the m dimension)
[4]:
sample = create_toy_sample(n, m, n_poisson, seed=seed)

GW = sample["GW"]
G = sample["G"]
H = sample["H"]
X = sample["X"]
Xdot = sample["Xdot"]
shape_2d = sample["shape_2d"]


Let us look at the dimension of our problem.

[5]:
print(f"""
- X: {X.shape}
- Xdot: {Xdot.shape}
- G: {G.shape}
- GW: {GW.shape}
- H: {H.shape}
- shape_2d: {shape_2d}
""")

- X: (200, 10000)
- Xdot: (200, 10000)
- G: (200, 15)
- GW: (200, 3)
- H: (3, 10000)
- shape_2d: (100, 100)

Here Xdot contains the noisless data such that the ground truth H and GW satisfies:

\[X = GWH\]
[6]:
np.testing.assert_allclose(Xdot, GW @ H)

Let us plot the ground truth maps \(H\) (sometimes also called weights). Here the variable shape_2d is used to reshape the columns of \(H\) into images.

[7]:

vmin, vmax = 0,1 cmap = plt.cm.gray_r plt.figure(figsize=(10, 3)) for i, hdot in enumerate(H): plt.subplot(1,3,i+1) plt.imshow(H[i].reshape(shape_2d), cmap=cmap, vmin=vmin, vmax=vmax) plt.axis("off") plt.title(f"GT - Map {i+1}")
_images/introduction_notebooks_toy-problem_12_0.png

We can also plot the phases corresponding to these maps. The phases corresponds to the matrix \(GW\).

[8]:
e = np.linspace(0,1, GW.shape[0])
plt.plot(e, GW)
plt.title("GT - Phases")
plt.xlabel("Frequency [normalized]")
plt.legend([f"phase {i+1}" for i in range(3)]);
_images/introduction_notebooks_toy-problem_14_0.png

The matrix \(G\) is assumed to be known appriori. Here we plot the first 5 lines of \(G\).

[9]:
l = np.linspace(0, 1,n)
plt.plot(l, G[:,:5]);
_images/introduction_notebooks_toy-problem_16_0.png

Note that the function create_toy_sample did not return \(W\) but only \(GW\).

Using the ground truth \(GW\), it can be computed as follows:

[10]:
W = np.linalg.lstsq(GW, G, rcond=None)[0].T
W.shape
[10]:
(15, 3)
Solving the problem
Parameters

Let us define the different hyperparameters of the problem. Feel free to change them and see how the results change.

Let us first define our regularisation parameters.

[11]:
lambda_L = 1 # Smoothness of the maps
mu = 0.2 # Sparsity of the maps

We can additionally add a simplex constraint to the problem by setting force_simplex=True. This will add the constraint that the line of \(H\) sum to 1. This constraint should be activated for the regularizers to work!. Practically, it will prevent the algorithm from simply minimizing \(W\) and increasing \(H\).

[12]:
force_simplex = True

Finally, we can decide to specify \(G\) or not. If the matrix \(G\) is not specified, the algorithm will directly estimate \(GW\) insead of \(W\).

[13]:
Gused = G

Note that wihtout regularisation, i.e. with the parameters

lambda_L = 0
mu = 0
Gused = None
force_simplex=False

we recover the classical Poisson/KL NMF problem. Our algorithm will apply the MU algorithm from Lee and Seung (2001).

Let us define the parameters for the algorithm. Here the class espm.estimator.SmoothNMF heritates from sckit-learn’s NMF class.

[14]:

K = len(H) # Number of components / let assume that we know it params = {} params["tol"]=1e-6 # Tolerance for the stopping criterion. params["max_iter"] = 200 # Maximum number of iterations before timing out. params["hspy_comp"] = False # If should be set to True if hspy data format is used. params["verbose"] = 1 # Verbosity level. params["eval_print"] = 10 # Print the evaluation every eval_print iterations. params["epsilon_reg"] = 1 # Regularization parameter params["linesearch"] = False # Use linesearch to accelerate convergence params["shape_2d"] = shape_2d # Shape of the 2D maps params["n_components"] = K # Number of components params["normalize"] = True # Normalize the data. It helps to keep the range of the regularization parameters lambda_L and mu in a reasonable range. estimator = SmoothNMF(mu=mu, lambda_L=lambda_L, G = Gused, force_simplex=force_simplex, **params)

The function fit_transform will solve the problem and return the matrix \(GW\).

[15]:
GW_est = estimator.fit_transform(X)
It 10 / 200: loss 7.852122e-03,  4.185 it/s
It 20 / 200: loss 7.825420e-03,  4.254 it/s
It 30 / 200: loss 7.815601e-03,  4.280 it/s
It 40 / 200: loss 7.810161e-03,  4.293 it/s
It 50 / 200: loss 7.806439e-03,  4.300 it/s
It 60 / 200: loss 7.803648e-03,  4.304 it/s
It 70 / 200: loss 7.801485e-03,  4.307 it/s
It 80 / 200: loss 7.799700e-03,  4.309 it/s
It 90 / 200: loss 7.798369e-03,  4.310 it/s
It 100 / 200: loss 7.797371e-03,  4.311 it/s
It 110 / 200: loss 7.796559e-03,  4.311 it/s
It 120 / 200: loss 7.795796e-03,  4.315 it/s
It 130 / 200: loss 7.794962e-03,  4.319 it/s
It 140 / 200: loss 7.794118e-03,  4.322 it/s
It 150 / 200: loss 7.793349e-03,  4.324 it/s
It 160 / 200: loss 7.792436e-03,  4.326 it/s
It 170 / 200: loss 7.791491e-03,  4.327 it/s
It 180 / 200: loss 7.790487e-03,  4.327 it/s
It 190 / 200: loss 7.789284e-03,  4.328 it/s
exits because max_iteration was reached
Stopped after 200 iterations in 0.0 minutes and 46.0 seconds.

We can plot the different losses to see how the algorithm converges.

[16]:

losses = estimator.get_losses() values, names = process_losses(losses) plt.figure(figsize=(10, 6)) for name, value in zip(names[:4], values[:4]): plt.plot(value, label=name) plt.xlabel("iteration") plt.yscale("log") plt.legend()
[16]:
<matplotlib.legend.Legend at 0x7f95d86d9590>
_images/introduction_notebooks_toy-problem_33_1.png
[17]:
losses[0]
[17]:
(0.00801227, 0.00703841, 0.00075641, 0.00021746, 8., 0.03137933, 0.34708)

We can easily recover \(W\) and \(H\) from the estimator.

[18]:
H_est = estimator.H_
W_est = estimator.W_

Let us compute some metrics to evaluate the quality of the solution. We will use the metrics defined in the module espm.metrics.

We first compute the angles between the phases GW and the estimated phases GW_est. The angles are computed using the function compute_angles. This function also return the indices to match the estimations and the ground truths. Indeed, there is no reason for them to have the same order.

[19]:
angles, true_inds = find_min_angle(GW.T, GW_est.T, unique=True, get_ind=True)
print("angles : ", angles)
angles :  [5.468256222971067, 30.136582764210953, 47.47868600255621]

Now, we can compute the Mean Squared Error (MSE) between the maps H and the estimated maps H_est.

[20]:
mse = ordered_mse(H, H_est, true_inds)
print("mse : ", mse)
mse :  [0.051252249085099934, 0.022205344201354206, 0.05466327775719334]

Finally, let us plot the results.

[21]:

def plot_results(Ddot, D, Hdotflat, Hflat): fontsize = 30 scale = 15 aspect_ratio = 1.4 marker_list = ["-o","-s","->","-<","-^","-v","-d"] mark_space = 20 # cmap = plt.cm.hot_r cmap = plt.cm.gray_r vmax = 1 vmin = 0 K = len(H) L = len(D) angles, true_inds = find_min_angle(Ddot.T, D.T, unique=True, get_ind=True) mse = ordered_mse(Hdotflat, Hflat, true_inds) mae = ordered_mae(Hdotflat, Hflat, true_inds) r2 = ordered_r2(Hdotflat, Hflat, true_inds) fig, axes = plt.subplots(K,3,figsize = (scale/K * 3 * aspect_ratio,scale)) x = np.linspace(0,1, num = L) for i in range(K): axes[2,i].plot(x,Ddot.T[i,:],'g-',label='ground truth',linewidth=4) axes[2,i].plot(x,D[:,true_inds[i]],'r--',label='reconstructed',linewidth=4) axes[2,i].set_title("{:.2f} deg".format(angles[i]),fontsize = fontsize-2) axes[2,i].set_xlim(0,1) axes[1,i].imshow((Hflat[true_inds[i],:]).reshape(shape_2d),vmin = vmin, vmax = vmax , cmap=cmap) axes[1,i].set_title("R2: {:.2f}".format(r2[true_inds[i]]),fontsize = fontsize-2) # axes[i,1].set_ylim(0.0,1.0) axes[1,i].tick_params(axis = "both",labelleft = False, labelbottom = False,left = False, bottom = False) im = axes[0,i].imshow(Hdotflat[i].reshape(shape_2d),vmin = vmin, vmax = vmax, cmap=cmap) axes[0,i].set_title("Phase {}".format(i),fontsize = fontsize) axes[0,i].tick_params(axis = "both",labelleft = False, labelbottom = False,left = False, bottom = False) axes[2,0].legend() rows = ["True maps","Reconstructed maps","Spectra"] for ax, row in zip(axes[:,0], rows): ax.set_ylabel(row, rotation=90, fontsize=fontsize) fig.subplots_adjust(right=0.84) # put colorbar at desire position cbar_ax = fig.add_axes([0.85, 0.5, 0.01, 0.3]) fig.colorbar(im,cax=cbar_ax) # fig.tight_layout() print("angles : ", angles) print("mse : ", mse) print("mae : ", mae) print("r2 : ", r2) return fig fig = plot_results(GW, GW_est, H, H_est) plt.show()
angles :  [5.468256222971067, 30.136582764210953, 47.47868600255621]
mse :  [0.051252249085099934, 0.022205344201354206, 0.05466327775719334]
mae :  [0.1657080234420085, 0.10186348170125903, 0.12053660132901785]
r2 :  [0.30287198337896726, 0.08496786424311309, -0.2617123779050492]
_images/introduction_notebooks_toy-problem_42_1.png
[ ]:

API reference

Models

The espm.models module implements the physical modelling that is used both for the creation of the G matrix and for the simulation of data.

Note

For now espm supports only the modelling of EDS data but we aim at supporting the modelling of EELS data too.

Models abstract class

Its main purpose is to read databases and set energy parameters.

class espm.models.base.Model[source]

Abstract class for models.

abstract generate_g_matr(g_parameters)[source]
abstract generate_phases(phase_parameters)[source]
class espm.models.base.PhysicalModel(e_offset, e_size, e_scale, params_dict, db_name='default_xrays.json', E0=200, **kwargs)[source]

Abstract class of the models

build_energy_scale(e_offset, e_size, e_scale)[source]

Build an array corresponding to a linear energy scale.

Parameters
e_offset
float

Offset of the energy scale

e_size
int

Number of channels of the detector

e_scale
float

Scale in keV/channel

Returns
energy scale
np.array 1D

Linear energy scale based on the given parameters of shape (e_size,)

extract_DB(db_name)[source]

Read the cross-sections from the database

Parameters
db_name
string

Name of the json database file

Returns
data
dict

A dictionnary containing the cross-sections in the database

extract_DB_mdata(db_name)[source]

Read the metadata of the database

Parameters
db_name
string

Name of the json database file

Returns
data
dict

A dictionnary containing the metadata related to the database

class espm.models.base.ToyModel(L: int = 200, C: int = 15, K: int = 3, seed: Optional[int] = None)[source]

Toy model.

Simple model with a random G matrix and phases.

The G matrix is generated with a random number of gaussians per column. The phases are generated using a vector drawn from a Laplacian distribution and multiplied by the G matrix.

Parameters
Lint

Length of the phases.

Cint

Number of possible components in the phases.

Kint

Number of phases.

seedint, optional

Seed for the random number generator.

Examples

>>> from espm.models import ToyModel
>>> import matplotlib.pyplot as plt
>>> model = ToyModel(L=200, C=15, K=3, seed=0)
>>> model.generate_g_matr()
>>> model.generate_phases()
>>> print(model.G.shape)
(200, 15)
>>> print(model.phases.shape)
(200, 3)
>>> plt.plot(model.phases)
_images/models-1.png
generate_g_matr(*args, **kwargs) None[source]

Generate G matrix.

Parameters
argsignored
kwargsignored
Returns
None
generate_phases(*args, **kwargs) None[source]

Generate phases.

Parameters
argsignored
kwargsignored
Returns
None
EDXS model

The espm.models.edxs module implements the creation of the G matrix that contains a modelisation of the characteristic and continuous X-rays.

class espm.models.edxs.EDXS(*args, width_slope=0.01, width_intercept=0.065, **kwargs)[source]
generate_g_matr(g_type='bremsstrahlung', norm=True, reference_elt={}, *, elements=[], **kwargs)[source]

Generate the G matrix. With a complete model the matrix is (e_size,n+2). The first n columns correspond to the sum of X-ray characteristic peaks associated to each shell of the elements. The last 2 columns correspond to a bremsstrahlung model.

Parameters
g_type
string

Options of the edxs model to include in the G matrix. The three possibilities are “identity”, “no_brstlg” and “bremsstrahlung”. G is going to be the identity matrix, only characteristic X-ray peaks or characteristic X-rays plus bremsstrahlung model, respectively.

norm
boolean

Norms the columns of G for computation purposes. That feature will be removed, for a better solution.

reference_elt
dict

The keys are chemical elements (atomic number) and the values are cut-off energies. This argument is used to split some of the columns of G into 2 columns. The first column corresponds to characteristic X-rays before the cut-off and second one corresponds to characteristic X-rays before the cut-off. This feature is implemented to enable more accurate absorption correction.

elements
list

List of modeled chemical elements. The list can be populated either with atomic numbers or chemical symbols, e.g. “Fe” or 26.

Returns
g matrix
np.array 2D

matrix of the edx model.

Notes

See our paper about the espm package for more information about the equations of this model.

generate_phases(phases_parameters)[source]

Generate a series of spectra from list of phase parameters

Parameters
phases_parameters
list

List of dicts containing the phase parameters.

Returns
phase_array
np.array 2D

Array which lines correspond to the modelled phases in the input list.

Notes

The absorption correction is done using the average composition of the phases. The same correction is used for each phase.

generate_spectrum(b0=0, b1=0, scale=1.0, abs_elts_dict={}, *, elements_dict={})[source]

Generate a spectrum from bremsstrahlung parameters and a chemical composition.

Parameters
b0
float

First bremsstrahlung parameter.

b1
float

Second bremsstrahlung parameter.

scale
float

Scale factor to apply to the bremsstrahlung. Feature to be removed, scaling should be done with b0 and b1.

abs_elts_dict
dict

Dictionnary of elements and associated concentrations. This dictionnary is used to calculate the contribution of the absorption in the spectrum.

elements_dict
dict

Dictionnary of elements and associated concentrations describing the chemical composition of the modeled sample.

Returns
——-
spectrum
np.array 1D

Output edx spectrum corresponding to the input chemical composition.

Notes

Check EDXS_function for details about the bremsstrahlung model.

Examples

>>> import matplotlib.pyplot as plt
>>> from espm.models.edxs import EDXS
>>> from espm.conf import DEFAULT_SYNTHETIC_DATA_DICT
>>> b0, b1 = 5.5367e-5, 0.00192181
>>> elts_dict = {"Si" : 1.0,"Ca" : 1.0,"O" : 3.0,"C" : 0.3}
>>> model = EDXS(**DEFAULT_SYNTHETIC_DATA_DICT['model_parameters'])
>>> spectrum = model.generate_spectrum(b0,b1, elements_dict = elts_dict)
>>> plt.plot(spectrum)
_images/models-2.png
espm.models.edxs.G_EDXS(model_params, g_params, part_W=None, G=None)[source]

Update the bremsstrahlung part of the G matrix. This function is used for the NMF decomposition so that the absorption correction is updated in between each step.

Parameters
model_params
dict

Parameters of the edxs model. Check espm.conf.DEFAULT_EDXS_PARAMS for an example.

g_params
dict

Parameters specific to the creation of the G matrix.

part_W
np.array 2D

Concentrations determined during the NMF decomposition. It is used to determine the average composition that will be used to determine the absorption correction.

G
np.array 2D

Current G matrix. Setting it to None will initialize the G matrix.

Returns
updated G
np.array 2D

Updated G matrix with a new absorption correction.

EDXS functions

The espm.models.EDXS_function module implements misceallenous functions required for the :mod: espm.models.edxs. This module include bremsstrahlung modelling, database reading and dicts and list manipulation functions.

espm.models.EDXS_function.G_bremsstrahlung(x, E0, params_dict, *, elements_dict={})[source]

Computes the two-parts continuum X-rays for the G matrix. The two parts of the bremsstrahlung are constructed separately so that its parameters can fitted to data. Absorption and detection are multiplied to each part.

Parameters
x
np.array 1D

Energy scale.

params_dict
dict

Dictionnary containing the absorption and detection parameters.

elements_dict
dict

Composition of the studied sample. It is required for absorption calculation.

Returns
continuum_xrays
np.array 2D

Two parts continuum X-rays model with shape (energy scale size, 2).

espm.models.EDXS_function.chapman_bremsstrahlung(x, b0, b1, b2)[source]

Calculate the bremsstrahlung as parametrized by chapman et al.

Parameters
x
np.array 1D

Energy scale.

b0
float

First parameter, corresponding to the inverse of the energy.

b1
float

Second parameter, corresponding to the energy.

b2
float

Third parameter, corresponding to the square of the energy.

Returns
bremsstrahlung
np.array 1D

Bremsstrahlung model

Notes

For details see Chapman et al. Journal of Microscopy, 136, pp. 179-191, (1984)

espm.models.EDXS_function.continuum_xrays(x, params_dict={}, b0=0, b1=0, E0=200, *, elements_dict={'Si': 1.0})[source]

Computes the continuum X-rays, i.e. the bremsstrahlung multiplied by the absorption and the detection efficiency.

Parameters
x
np.array 1D

Energy scale.

params_dict
dict

Dictionnary containing the absorption and detection parameters.

b0
float

First parameter.

b1
float

Second parameter.

E0
float

Energy of the incident beam in keV.

elements_dict
dict

Composition of the studied sample. It is required for absorption calculation.

Returns
continuum_xrays
np.array 1D

Continuum X-rays model.

Notes

  • When an empty dict is provided. The continuum X-rays are set to 0. This is useful to perform calculations without the bremsstrahlung for example.

  • For an example structure of the params_dict parameter, check the DEFAULT_EDXS_PARAMS espm.conf.

  • For a custom detection efficiency, check the spectrum fit notebook.

espm.models.EDXS_function.elts_dict_from_W(part_W, *, elements=[])[source]

Create a dictionnary of the elemental concentration from a fitted W. It useful to recompute absorption during the our custom NMF calculations.

Parameters
part_W
np.array 2D

W matrix output from the NMF calculation. It only makes sense when the NMF decomposition if performed with G.

elements
list

List of elements as atomic number or symbol.

Returns
elements_dictionnary
dict

Dictionnary containing the concentration associated to each chemical element of the problem.

Notes

This function should maybe move to another part of espm.

Examples

>>> import numpy as np
>>> from espm.models.EDXS_function import elts_dict_from_W
>>> W = np.ones((4,3))
>>> elts = ["Si","Ca","O","C"]
>>> elts_dict_from_W(W,elements = elts)
    {"Si" : 0.25, "Ca" : 0.25, "O" : 0.25, "C" : 0.25}
espm.models.EDXS_function.elts_dict_from_dict_list(dict_list)[source]

Create a single dictionnary of the elemental concentration from a list of dictionnary containing chemical compositions. The new dictionnary corresponds to the average chemical composition of the list.

Parameters
dict_list
list [dict]

list of chemical composition dictionnary

Returns
elements_dictionnary
dict

Dictionnary containing the concentration associated to each chemical elements contained in the input.

Notes

This function should maybe move to another part of espm.

Examples

>>> import numpy as np
>>> from espm.models.EDXS_function import elts_dict_from_dict_list
>>> dicts = [{"Si" : 1.0},{"Ca" : 1.0},{"O" : 1.0},{"C" : 1.0}]
>>> elts_dict_from_dict_list(dicts)
    {"Si" : 0.25, "Ca" : 0.25, "O" : 0.25, "C" : 0.25}
espm.models.EDXS_function.gaussian(x, mu, sigma)[source]

Calculate the gaussian function.

Parameters
xnp.array 1D

Energy scale.

mufloat

Mean of the gaussian.

sigmafloat

Standard deviation of the gaussian.

Returns
gaussiannp.array 1D

Gaussian distribution on x, with mean mu and standard deviation sigma.

espm.models.EDXS_function.lifshin_bremsstrahlung(x, b0, b1, E0=200)[source]

Calculate the custom parametrized bremsstrahlung inspired by the model of L.Lifshin.

The model is designed to be linear and positive in b0 and b1.

Parameters
x
np.array 1D

Energy scale.

b0
float

First parameter.

b1
float

Second parameter.

E0
float

Energy of the incident beam in keV.

Returns
bremsstrahlung
np.array 1D

Bremsstrahlung model

Notes

For details see L.Lifshin, Ottawa, Proc.9.Ann.Conf.Microbeam Analysis Soc. 53. (1974)

espm.models.EDXS_function.lifshin_bremsstrahlung_b0(x, b0, E0=200)[source]

Calculate the first part of our bremsstrahlung model.

espm.models.EDXS_function.lifshin_bremsstrahlung_b1(x, b1, E0=200)[source]

Calculate the second part of our bremsstrahlung model.

espm.models.EDXS_function.print_concentrations_from_W(part_W, *, elements=[])[source]

Print a report of the chemical concentrations from a fitted W.

Parameters
part_W
np.array 2D

W matrix output from the NMF calculation. It only makes sense when the NMF decomposition if performed with G.

elements
list

List of elements as atomic number or symbol.

Returns
None
None

None.

Notes

This function should maybe move to another part of espm.

espm.models.EDXS_function.read_compact_db(elt, db_dict)[source]

Read the energy and cross section of each line of a chemical element for compact databases.

Parameters
elt
string

Atomic number.

db_dict
dict

Dictionnary extracted from the json database containing the emission lines and their energies.

Returns
energies
list float

List of energies associated to each emission line of the given element

cross-sections
list float

List of emission cross-sections of each line of the given element

espm.models.EDXS_function.read_lines_db(elt, db_dict)[source]

Read the energy and cross section of each line of a chemical element for explicit databases.

Parameters
elt
string

Atomic number.

db_dict
dict

Dictionnary extracted from the json database containing the emission lines and their energies.

Returns
energies
list float

List of energies associated to each emission line of the given element

cross-sections
list float

List of emission cross-sections of each line of the given element

espm.models.EDXS_function.shelf(x, height, length)[source]

Calculate the shelf contribution of the EDXS Silicon Drift Detector (SDD) detector.

Parameters
x
np.array 1D

Energy scale.

height
float

Height in intensity of shelf contribution of the detector.

length
float

Length in energy of shelf contribution of the detector.

Returns
shelf
np.array 1D

SDD shelf model.

Notes

For details see Scholze, F. and Procop, M., X-Ray Spectrom., 38: 312-321. (2009)

espm.models.EDXS_function.update_bremsstrahlung(G, part_W, model_parameters, elements_list, norm=True)[source]

Update the continuum X-rays part of the G matrix with the current average chemical composition.

Parameters
G
np.array 2D

EDXS modelling matrix.

part_W
np.array 2D

Learned chemical concentrations.

model_parameters
dict

Parameters of the EDXS model.

elements_list
list

List of elements as atomic number or symbol.

norm
bool

work in progress

Returns
new_G
np.array 2D

New G matrix with updated continuum X-rays models.

Notes

This function should maybe move to another part of espm.

Absorption functions

The espm.models.absorption_edxs module implements the functions to calculate the contribution of absorption in edx spectra.

espm.models.absorption_edxs.absorption_coefficient(x, atomic_fraction=False, *, elements_dict={'Si': 1.0})[source]

Calculate the mass-absorption coefficent of a given composition at a certain energy or over a certain energy range.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
mass-absorption coefficient
np.array 1D

Calculated value of the mass-absorption coefficient on all the energy range.

Notes

The mass-absorption coefficients are calculated using the database of hyperspy

espm.models.absorption_edxs.absorption_correction(x, thickness=1e-05, toa=90, density=None, atomic_fraction=False, *, elements_dict={'Si': 1.0}, **kwargs)[source]

Calculate the contribution of the absorption on the EDX spectrum for a thin slab of material with a given composition at a certain energy or over a certain energy range.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

thickness
float

Thickness of the material slab in meter. If the thickness is set to 0.0, the function will return 1.0.

toa
float

Take-off angle in degrees of the x-rays travelling from the sample to the x-ray detectors.

density
float

Density of the material in g/m3 (to be checked). If None, an approximation of the density based on the atomic numbers of the elements of the materials is calculated.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
absorption correction
np.array 1D

Calculated value of the absorption correction on all the energy range.

espm.models.absorption_edxs.absorption_mass_thickness(x, mass_thickness, toa=90, atomic_fraction=True, *, elements_dict={'Si': 1.0})[source]

Calculate the contribution of the absorption of a mass-thickness map.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

mass_thickness:
np.array 2D

Mass-thickness map. See the notes for details.

toa
float

Take-off angle in degrees of the x-rays travelling from the sample to the x-ray detectors.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
absorption correction
np.array 3D

Calculated value of the absorption correction on all the energy range for all pixels of the mass-thickness map.

Notes

The mass-thickness map can be extracted from an EDX spectrum image by calculating the concentration of an element for both K and L lines. Since there should be only one concentration for both lines, it is possible to determine the contribution of the absorption for each pixel and thus obtain the mass-thickness map.

espm.models.absorption_edxs.det_efficiency(x, det_dict)[source]

Simulate the detection efficiency of an EDX detector.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

det_dict
dict

See Examples for an example of the structure of the dict. One of the layer of the input dictionnary should be named “detection” to model a active layer of the detector.

Returns
efficiency
np.array 1D

Calculated detection efficiency based on the input layers model.

Examples

>>> from espm.models.absorption_edxs import det_efficiency
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dict = {"detection" : {"thickness" : 10e-3, "density" : 2.3, "atomic_fraction" : False, "elements_dict" : {"Si" : 1.0}},
>>> "dead_layer" : {"thickness" : 50e-7, "density" : 2.7, "atomic_fraction" : False, "elements_dict" : {"Al" : 1.0}}}
>>> x = np.linspace(0.5,20,num = 1000)
>>> efficiency = det_efficiency(x,dict)
>>> plt.plot(x,efficiency)
_images/models-3.png
espm.models.absorption_edxs.det_efficiency_from_curve(x, filename, kind='cubic')[source]

Interpolate a detection efficiency curve that is stored in ~/espm/tables.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

filename:
string

Name of the file of the detection efficiency.

kind
string

Polynomial order of the interpolation.

Returns
Detection efficiency
np.array 1D

Interpolated detection efficiency.

espm.models.absorption_edxs.det_efficiency_layer(x, thickness=1e-05, density=None, atomic_fraction=False, *, elements_dict={'Si': 1.0})[source]

Calculate the proportion of absorbed X-rays for one virtual layer of the detector.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

thickness
float

Thickness of the material slab in meter. If the thickness is set to 0.0, the function will return 1.0.

density
float

Density of the material in g/m3 (to be checked). If None, an approximation of the density based on the atomic numbers of the elements of the materials is calculated.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
absorption
np.array 1D

Calculated absorption of one layer of the detector.

Notes

We assume that the X-rays arrive perpendicular to the detector.

Weights

Functions to generate weights for synthetic datasets.

espm.weights.laplacian_weights(shape_2d, n_phases=3, seed=0)[source]

Generates a random weight matrix with a laplacian distribution.

The random weight are then filtered with median filter 2 times to smooth the distribution. Eventually, the result is normalized to sum up to one.

Parameters
shape_2dtuple

Shape of the weight matrix.

n_phasesint, optional

Number of phases. The default is 3.

seedint, optional

Seed for the random number generator. The default is 0.

Returns
weightsnumpy.ndarray

Weight matrix with laplacian distribution.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from espm.weights import laplacian_weights
>>> weights = laplacian_weights((100,100), 3, 0)
>>> plt.figure(figsize=(10,3))
>>> for i in range(3):
...     plt.subplot(1,3,i+1)
...     plt.imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     plt.axis("off")
...     plt.title(f"Map {i+1}")
>>> plt.show()
_images/weights-1.png
espm.weights.load_toy_weights()[source]

Load the toy problem weights.

Returns
Hdotnp.ndarray

The weights of the toy problem.

Examples

>>> from espm.weights import load_toy_weights
>>> import matplotlib.pyplot as plt
>>> Hdot = load_toy_weights()
>>> print(Hdot.shape)
(200, 200, 3)
>>> print(Hdot.dtype)
float64
>>> plt.figure(figsize=(10,3))
>>> for i in range(3):
...     plt.subplot(1,3,i+1)
...     plt.imshow(Hdot[:,:,i], cmap=plt.cm.gray_r)
...     plt.axis("off")
...     plt.title(f"Map {i+1}")
>>> plt.show()
_images/weights-2.png
espm.weights.random_weights(shape_2d, n_phases=3, seed=0)[source]

Generates a random weight matrix with a uniform distribution.

The random weights are then normalized to sum up to one.

Parameters
shape_2dtuple

Shape of the weight matrix.

n_phasesint, optional

Number of phases. The default is 3.

seedint, optional

Seed for the random number generator. The default is 0.

Returns
weightsnumpy.ndarray

Weight matrix with uniform distribution.

Datasets
Sample creation

The espm.datasets.samples module provides a set of functions to create the samples of datasets.

espm.datasets.samples.create_toy_sample(L, C, n_poisson, seed=None)[source]

Create a toy sample.

Parameters
Lint

Length of the phases.

Cint

Number of possible components in the phases.

n_poissonint

Poisson parameter. The higher, the less noise.

seedint, optional

Seed for the random number generator.

Returns
sampledict

Dictionary containing the sample.

The dictionary contains the following keys:
  • model_parameters: dict

    Dictionary containing the model parameters.

  • misc_parameters: dict

    Dictionary containing the misc parameters. Default empty.

  • shape_2d: list

    List of length 2 containing the shape of the 2D images.

  • GW: np.array

    The marrix corresponding to the phases.

  • H: np.array

    The matrix corresponding to the weights.

  • X: np.array

    The matrix corresponding to the noisy data.

  • Xdot: np.array

    The matrix corresponding to the noiseless data.

  • G: np.array

    The matrix corresponding to the G matrix.

Examples

>>> import matplotlib.pyplot as plt
>>> from espm.datasets import create_toy_sample
>>> L, C, n_poisson = 200, 15, 400,
>>> sample = create_toy_sample(L, C, n_poisson, seed=0)
>>> Hdot = sample["H"]
>>> GW = sample["GW"]
>>> G = sample["G"]
>>> X = sample["X"]
>>> Xdot = sample["Xdot"]
>>> shape_2d = sample["shape_2d"]
>>> vmin, vmax = 0,1
>>> cmap = plt.cm.gray_r
>>> plt.figure(figsize=(10, 3))
>>> for i, hdot in enumerate(Hdot):
...     plt.subplot(1,3,i+1)
...     plt.imshow(Hdot[i].reshape(shape_2d), cmap=cmap, vmin=vmin, vmax=vmax)
...     plt.axis("off")
...     plt.title(f"Map {i+1}")
>>> plt.show()
_images/datasets-1_00_00.png
>>> plt.figure(figsize=(10, 3))
>>> l = np.arange(0, 1, 1/L)
>>> plt.plot(l, G[:,:3])
>>> plt.title("Spectral response of each elements")
>>> plt.show()
_images/datasets-1_01_00.png
Estimators

NMF Estimators.

The espm.estimators module implements different NMF algorithms.

The class espm.estimators is an abstract class for all NMF algorithms. It implements the fit and transform methods. The fit method is implemented in the abstract class and calls the _iteration method which is implemented in the child classes. The transform method is implemented in the child classes.

NMF Estimator
class espm.estimators.NMFEstimator(n_components=2, init=None, tol=0.0001, max_iter=200, random_state=None, verbose=1, debug=False, l2=False, G=None, shape_2d=None, normalize=False, log_shift=1e-14, eval_print=10, true_D=None, true_H=None, fixed_H=None, fixed_W=None, hspy_comp=False)[source]

Abstract class for NMF algorithms.

This abstract class espm.estimators.NMFEstimator is used to implement the different NMF algorithms. It solves problems of the form:

\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} \frac{1}{2} L(X, GWH) + R(W, H)\]

where \(X\) is the data matrix, \(G\) is a matrix of known values, \(W\) and \(H\) are the matrices to be learned, \(R\) is a regularization term and \(L\) is a loss function. that represent the generalized KL divergence. As a reminder, the generalized KL divergence is defined as:

\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

where \(Y = GWH\). Since \(X\) does not depend on \(W\) and \(H\), we obtain the loss function:

\[L(X, Y) = - \sum_{i,j} X_{ij} \log \frac{GWH_{ij}} + GWH_{ij}\]

The Generalized KL divergence has the advantage of being zero when \(X = Y\), which is not the case for our loss. Therefore, we shift the loss function by a constant \(C\) such that it equals the Generalized KL divergence. This constant is stored in the attribute espm.estimators.NMFEstimator.const_KL_.

The loss function can also be selected to be the Frobenius norm. In this case, the loss function is:

\[L(X, Y) = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2\]

While the code will work, it is not recommended to use the Frobenius norm as a loss function. This code is optimized for the KL divergence.

The size of:

  • \(X\) is \((n, p)\),

  • \(W\) is \((m, k)\),

  • \(H\) is \((k, p)\),

  • \(G\) is \((n, m)\).

The columns of the matrices \(H\) and \(X\) are assumed to be images. This is used typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e. shape_2d[0]*shape_2d[1] = p.

Parameters
n_componentsint, default=2

Number of components, i.e. dimensionality of the latent space.

initstr

Method used to initialize the procedure. Default is None The method use the initialization of sklearn.decomposition. It can be imported using: .. code-block::python

>>> from sklearn.decomposition._nmf import _initialize_nmf
tolfloat, default=1e-4

Tolerance of the stopping condition.

max_iterint, default=200

Maximum number of iterations before timing out.

random_stateint, RandomState instance, default=None
verboseint, default=1

The verbosity level.

debugbool, default=False

If True, the algorithm will log more and perform more checks.

l2bool, default=False

If True, the algorithm will use the l2 norm instead of the KL divergence.

Gnp.array, function or None, default=None

If np.array, it is the known matrix of the data. If function, it is a function that takes as input the data matrix and returns the known matrix (np.array). If None, it is assumed that G is the identity matrix.

shape_2dtuple or None, default=None

If not None, it is the image shape of the columns of the matrices \(X\) and \(H\).

normalizebool, default=False

If True, the algorithm will normalize the data matrix \(X\).

log_shiftfloat, default=1e-10

Lower bound for W and H, i.e. \(\epsilon\).

eval_printint, default=10

Number of iterations between each evaluation of the loss function.

true_Dnp.array or None, default=None

Ground truth for the matrix \(GW\). Used for evaluation purposes.

true_Hnp.array or None, default=None

Ground truth for the matrix \(H\). Used for evaluation purposes.

fixed_Hnp.array or None, default=None

If not None, it fixes the non-zero values of the matrix \(H\).

fixed_Wnp.array or None, default=None

If not None, it fixes the non-zero values of the matrix \(W\).

hspy_compbool, default=False

If True, the algorithm will use the format compatible with hyperspy. Use this option if you run the algorithm with the method decompositio in hyperspy. For example: .. code-block::python

>>> est = SmoothNMF( n_components = 3, hspy_comp = True)
>>> out = spim.decomposition(algorithm = est, return_info=True)
fit(X, y=None, **params)[source]

Learn a NMF model for the data X.

Parameters
X{array-like, sparse matrix} of shape (n_samples, n_features)

Data matrix to be decomposed

yIgnored
paramsdict

Parameters passed to the fit_transform method.

Returns
self

The model.

fit_transform(X, y=None, W=None, H=None)[source]

Learn a NMF model for the data X and returns the transformed data. This is more efficient than calling fit followed by transform.

The size of:

  • \(X\) is \((n, p)\),

  • \(W\) is \((m, k)\),

  • \(H\) is \((k, p)\),

  • \(G\) is \((n, m)\).

Parameters
X{array-like, sparse matrix} of shape (n, p)

Data matrix to be decomposed

yIgnored

Not used, present here for API consistency by convention.

Warray-like of shape (m, k)

If specified, it is used as initial guess for the solution.

Harray-like of shape (k, p)

If specified, it is used as initial guess for the solution.

Returns
GWndarrays

Transformed data.

get_losses()[source]
inverse_transform(W)[source]

Transform data back to its original space.

Parameters
W{ndarray, sparse matrix} of shape (n_samples, n_components)

Transformed data matrix.

Returns
X{ndarray, sparse matrix} of shape (n_samples, n_features)

Data matrix of original shape.

loss(W, H, average=True, X=None)[source]

Loss function.

Compute the loss function for the given matrices W and H.

Parameters
Wnp.array

Matrix of shape (n, k)

Hnp.array

Matrix of shape (k, p)

averagebool, default=True

If True, the loss is averaged over the number of elements of the matrices.

Xnp.array or None, default=None

If not None, it is the data matrix. If None, it is assumed that the data matrix in self.X_.

Returns
loss_float

Value of the loss function.

remove_zeros_lines(X, epsilon)[source]
const_KL_ = None
loss_names_ = ['KL_div_loss']
SmoothNMF
class espm.estimators.SmoothNMF(lambda_L=1.0, linesearch=False, mu=0, epsilon_reg=1, algo='log_surrogate', force_simplex=True, dicotomy_tol=1e-05, gamma=None, **kwargs)[source]

SmoothNMF - NMF with a smooth regularization term

We encourage to read the example available in the documentation: https://espm.readthedocs.io/en/latest/introduction/notebooks/toy-problem.html

The corresponding notebook is available on github: https://github.com/adriente/espm/blob/main/notebooks/toy-ML.ipynb

The class SmoothNMF implements the regularized NMF algorithm. It solves problems of the form:

\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} D_{GKL}(X || GWH) + \lambda_L tr(H \Delta H^\top) + \mu \sum_{ij} \log(H_{ij} + \epsilon_{reg})\]

where:

  • D_{GKL} is the Generalized KL divergence loss function defined as:

    \[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

    See the documentation of the class espm.estimators.NMFEstimator for more details.

  • Delta is the Laplacian operator (it can be created using the function create_laplacian_matrix from the utils module).

  • epsilon_{reg} is the slope of the log regularization/sparsity at 0 (you probably want to leave this to 1).

  • lambda_L is a regularization parameter, which encourages smoothness in the columns of H.

  • mu is a regularization parameter, which is similar to an L1 sparsity penalty.

The size of:

  • X is (n, p)

  • W is (m, k)

  • H is (k, p)

  • G is (n, m)

The columns of the matrices H and X are assumed to be images, typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e., shape_2d[0]*shape_2d[1] = p.

Parameters
lambda_Lfloat, default=1.0

Regularization parameter for the smooth regularization term.

linesearchbool, default=False

If True, use a line search to find the step size.

mufloat, default=0

Regularization parameter for the log regularization/sparsity term.

epsilon_regfloat, default=1

Slope of the log regularization/sparsity at 0.

algostr, default=”log_surrogate”

Algorithm to use for the smooth regularization term. Can be “log_surrogate”, “l2_surrogate”, or “projected_gradient”.

force_simplexbool, default=True

If True, force the solution to be in the simplex.

dicotomy_tolfloat, default=1e-3

Tolerance for the dichotomy algorithm.

gammafloat, default=None

Initial value for the step size. If None, it is set to the Lipschitz constant of the gradient.

**kwargsdict

Additional parameters for the NMFEstimator class.

fit_transform(X, y=None, W=None, H=None)[source]

Fit the model to the data X and returns the transformed data.

The size of:

  • \(X\) is \((n, p)\),

  • \(W\) is \((m, k)\),

  • \(H\) is \((k, p)\),

  • \(G\) is \((n, m)\).

Parameters
Xarray-like, shape (n, p)

Data matrix to be decomposed

yIgnored

Not used, present here for API consistency by convention.

Warray-like, shape (m, k)

If init=’custom’, it is used as initial guess for the solution.

Harray-like, shape (k, p)

If init=’custom’, it is used as initial guess for the solution.

Returns
GWndarrays

Transformed data.

loss(W, H, average=True, X=None)[source]

Compute the loss function.

loss_names_ = ['KL_div_loss', 'log_reg_loss', 'Lapl_reg_loss', 'gamma']
Measures

The espm.measures module implements different measures for the matrix factorisation problem. In particular it contains the different losses and regularizers used in espm.estimator module. It also contains different metrics to evaluate the results.

espm.measures.Frobenius_loss(X, W, H, average=False)[source]

Frobenius norm of the difference between X and WH.

Compute the Froebenius norm (elementwise L2 norm of a matrix) given \(X,W,H\):

\[\| X - WH \|_F = \sum_{ji} \left| X_{ij} - (W H)_{ij} \right|^2\]
Parameters
  • X (np.array 2D) – n x m matrix

  • W (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Examples

>>> import numpy as np
>>> from espm.measures import Frobenius_loss
>>> X = np.array([[1, 1, -1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> Frobenius_loss(X, W, H)
    26
espm.measures.KL(X, Y, log_shift=1e-14, average=False)[source]

Generalized KL (Kullback–Leibler) divergence for two matrices

\[D_KL(X || Y) = \sum_{ji} X_{ij} \log (X / Y)_{ij} + (Y - X)_{ij}\]
Parameters
  • X (np.array 2D) – n x m matrix

  • W (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

  • log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module esppy.conf)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

espm.measures.KL_loss_surrogate(X, W, H, Ht, log_shift=1e-14, average=False)[source]

Surrogate loss for the KL divergence.

espm.measures.KLdiv(X, D, H, log_shift=1e-14, average=False)[source]

Generalized KL (Kullback–Leibler) divergence

Compute the generalized KL divergence given \(X,W,H\):

\[D_KL(X || WH) = \sum_{ji} X_{ij} \log (X / D A)_{ij} + (D A - X)_{ij}\]
Parameters
  • X (np.array 2D) – n x m matrix

  • W (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

  • log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module esppy.conf)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

Examples

>>> import numpy as np
>>> from espm.measures import KLdiv
>>> X = np.array([[1, 1, 1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> KLdiv(X, W, H)
    2.921251732961556
espm.measures.KLdiv_loss(X, W, H, log_shift=1e-14, average=False)[source]

Generalized Generalized KL (Kullback–Leibler) divergence loss

Compute the loss based on the generalized KL divergence given :math: X,W,H:

\[\sum_{ji} X_{ij} \log (D W)_{ij} + (D W)_{ij}\]

This does not contains all the term of the KL divergence, only the ones depending on W and H.

Parameters
  • X (np.array 2D) – n x m matrix

  • Y (np.array 2D) – n x m matrix

  • log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module esppy.conf)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

Examples

>>> import numpy as np
>>> from espm.measures import KLdiv, KLdiv_loss
>>> X = np.array([[1, 1, 1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> KLdiv_loss(X, W, H)
    1.9425903651915402ZSXDerzA
>>> KLdiv(X, W, H)
    2.921251732961556
espm.measures.find_min_MSE(true_maps, algo_maps, get_ind=False, unique=False)[source]

Compare all the mean squared errors between ground truth spectra and NMF spectra and find the minimum configuration.

Parameters
  • true_maps (np.array 2D) – true maps with shape (number of phases, number of pixels)

  • algo_maps (np.array 2D) – NMF maps with shape (number of phases, number of pixels)

  • get_ind (boolean) – If True, the function will also return the indices of the NMF maps corresponding to the ground truth

  • unique (boolean) – If False it will find the global minimum but several maps can be associated to the same ground truth

Returns

list of mse, (optionally) tuple of indices

Return type

(list[float],list[int])

..warning::

The output being either a list or a tuple of list isn’t a great idea. It has to change.

espm.measures.find_min_angle(true_vectors, algo_vectors, get_ind=False, unique=False)[source]

Compare all the angles between ground truth spectra and NMF spectra and find the minimum configuration.

Parameters
  • true_vectors (np.array 2D) – true spectra with shape (number of phases, number of energy channels)

  • algo_vectors (np.array 2D) – NMF spectra with shape (number of phases, number of energy channels)

  • get_ind (boolean) – If True, the function will also return the indices of the NMF spectra corresponding to the ground truth

  • unique (boolean) – If False it will find the global minimum but several spectra can be associated to the same ground truth

Returns

list of angles, (optionally) tuple of indices

Return type

(list[float],list[int])

..warning::

The output being either a list or a tuple of list isn’t a great idea. It has to change.

espm.measures.find_min_config(true_maps, true_spectra, algo_maps, algo_spectra, angles=True)[source]
espm.measures.global_min(matr)[source]
espm.measures.log_reg(H, mu, epsilon=1, average=False)[source]

Log regularisation

Compute the regularization loss:

\[R(\mu, H, \epsilon) = \sum_{ij} \mu_i \log \left( H_{ij} + \epsilon \right).\]
Parameters
  • H (np.array 2D) – n x m matrix

  • mu (np.array 1D) – n vector

  • epsilon (float) – value of \(\epsilon\) (default 1)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

espm.measures.log_surrogate(H, Ht, mu, epsilon, average=False)[source]

Surrogate loss for the log function.

espm.measures.mae(map1, map2)[source]

Mean average error

Calculate the mean average error between two 2D arrays of the same dimension.

Parameters
  • map1 (np.array 2D) – first array

  • map2 (np.array 2D) – second array

Returns

the answer

Examples

>>> import numpy as np
>>> from espm.measures import mae
>>> map1 = np.array([[0, 1][0, 1]])
>>> map2 = np.array([[1, 1][1, 1]])
>>> mae(map1, map2)
    0.5
espm.measures.mse(map1, map2)[source]

Mean square error

Calculate the mean squared error between two 2D arrays of the same dimension.

Parameters
  • map1 (np.array 2D) – first array

  • map2 (np.array 2D) – second array

Returns

the answer

Examples

>>> import numpy as np
>>> from espm.measures import mse
>>> map1 = np.array([[0, 1][0, 1]])
>>> map2 = np.array([[1, 1][1, 1]])
>>> mse(map1, map2)
    0.5
espm.measures.ordered_angles(true_spectra, algo_spectra, input_inds)[source]

See ordered mse

espm.measures.ordered_mae(true_maps, algo_maps, input_inds)[source]

input : p x Npx matrix of floats, p x Npx matrix of floats, list of integers output : list of floats %————————-% Takes true maps of p phases and Npx pixels, reconstructed maps of the same size and indices of the correspondance between true phases and reconstructed phases returns the mean average errors of each phase in truth order.

espm.measures.ordered_mse(true_maps, algo_maps, input_inds)[source]

input : p x Npx matrix of floats, p x Npx matrix of floats, list of integers output : list of floats %————————-% Takes true maps of p phases and Npx pixels, reconstructed maps of the same size and indices of the correspondance between true phases and reconstructed phases returns the mean squared errors of each phase in truth order.

espm.measures.ordered_r2(true_maps, algo_maps, input_inds)[source]

input : p x Npx matrix of floats, p x Npx matrix of floats, list of integers output : list of floats %————————-% Takes true maps of p phases and Npx pixels, reconstructed maps of the same size and indices of the correspondance between true phases and reconstructed phases returns the coefficient of determination of each phase in truth order.

espm.measures.r2(map_true, map_pred)[source]

\(R^2\) - Coefficient of determination

Calculates the coefficient of determination between two 2D arrays of the same dimension. This is also called regression score function. See wikipedia.

This function is a wrapper for the function sklearn.metrics.r2_score of Scikit Learn.

Parameters
  • map1 (np.array 2D) – first array

  • map2 (np.array 2D) – second array

Returns

the answer

Return type

float

espm.measures.residuals(data, model)[source]
espm.measures.spectral_angle(v1, v2)[source]

Spectral angle

Calculate the angle between two spectra of the same dimension.

Parameters
  • v1 (np.array 1D) – first spectrum

  • v2 (np.array 1D) – second spectrum

Returns

the answer

Return type

float

Examples

>>> import numpy as np
>>> from espm.measures import spectral_angle
>>> v1 = np.array([0, 1, 0])
>>> v2 = np.array([1, 0, 1])
>>> spectral_angle(v1, v2)
    90.0
espm.measures.squared_distance(x, y=None)[source]

Squared distance between two between all colon vectors matrices.

Calculate the squared L2 distance between all pairs of vectors of two matrices. If only one matrix is given, the function uses each pair of vector of this matrix.

Parameters
  • x (np.array 2D) – n x m matrix of first colon vectors

  • y (np.array 2D) – n x m matrix of second colon vectors (optional)

Returns

the answer

Return type

np.array (m x m)

Examples

>>> import numpy as np
>>> from espm.measures import square_distance
>>> x = np.arange(3)
>>> square_distance(x, x)
    array([[ 0.,  1.,  2.],
    [ 1.,  0.,  1.],
    [ 2.,  1.,  0.]])
espm.measures.trace_xtLx(L, x, average=False)[source]

Trace of \(X^T L X\)

Compute the following expression \(\text{Tr} (X^T L X)\).

Parameters
  • L (np.array 2D) – n x n matrix

  • mu (np.array 2D) – n x k martrix of k n-sized vector

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

espm.measures.unique_min(matrix)[source]

From a square matrix of float values, finds the combination of elements with different lines which mimises the sum of elements.

It is a brute force algorithm, it is not recommended to input a matrix bigger than 20 x 20

Parameters

matrix (np.array 2D) – square matrix

Returns

list of unique min values and corresponding indices in the same order

Return type

(list, list[int])

Examples

>>> import numpy as np
>>> from espm.measures import unique_min
>>> matrix = np.array([[1.2,  1.3,  3.5],
                    [4.9,  2.2,  6.5],
                   [9.0,  4.1,  1.8]])
>>> unique_min(v1, v2)   
    ([1.2, 2.2, 1.8], (0, 1, 2))
Utils

Utils for the ESPM package

espm.utils.approx_density(atomic_fraction=False, *, elements_dict={})[source]

Wrapper to the density_of_mixture function of hyperspy. Takes a dict of chemical composition expressed in atomic weight fractions. Returns an approximated density.

espm.utils.arg_helper(params, d_params)[source]

Check if all parameter of d_params are in params. If not, they are added to params with the default value.

Parameters
paramsdict

Dictionary of parameters to be checked.

d_paramsdict

Dictionary of default parameters.

Returns
paramsdict

Dictionary of parameters with the default parameters added if not present.

espm.utils.atomic_to_weight_dict(*, elements_dict={})[source]

Wrapper to the atomic_to_weight function of hyperspy. Takes a dict of chemical composition expressed in atomic fractions. Returns a dict of chemical composition expressed in atomic weight fratiom.

espm.utils.bin_spim(data, n, m)[source]

Take a 3D array of size (x,y,k) [px, py, e] Returns a 3D array of size (n,m,k) [new_px, new_py, e]

espm.utils.check_keys(params, d_params, upperkeys='', toprint=True)[source]

Check if all parameter of d_params are in params. If not, they are added to params with the default value.

Parameters
paramsdict

Dictionary of parameters to be checked.

d_paramsdict

Dictionary of default parameters.

upperkeysstr

String of the upper keys.

toprintbool

If True, print the warning.

Returns
paramsdict

Dictionary of parameters with the default parameters added if not present.

Examples

>>> params = {'a':1,'b':2}
>>> d_params = {'a':1,'b':2,'c':3}
>>> check_keys(params,d_params)
>>> params
{'a': 1, 'b': 2, 'c': 3}
espm.utils.close_all()[source]

Close all opened windows.

espm.utils.create_laplacian_matrix(nx, ny=None)[source]

Helper method to create the laplacian matrix for the laplacian regularization

Parameters
:param nx: height of the original image
:param ny: width of the original image
Returns
rtype

scipy.sparse.csr_matrix ..

:return:the n x n laplacian matrix, where n = nx*ny
espm.utils.is_number(i)[source]

Return True if i is a number

Parameters

i (any) – variable to check

Returns

True if i is a number

Return type

bool

espm.utils.is_symbol(i)[source]

Return True if i is a chemical symbol

Parameters

i (any) – variable to check

Returns

True if i is a chemical symbol

Return type

bool

espm.utils.isdict(p)[source]

Return True if the variable a dictionary.

Parameters

p (any) – variable to check

Returns

True if p is a dictionary

Return type

bool

espm.utils.number_to_symbol_dict(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with atomic numbers as keys (e.g. 26 for Fe) returns a dict of elements with symbols as keys (e.g. Fe for iron)

espm.utils.number_to_symbol_list(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with symbols as keys (e.g. Fe for iron) returns a dict of elements with atomic numbers as keys (e.g. 26 for iron)

espm.utils.process_losses(losses)[source]

Process the losses to be plotted

Parameters
losses: np.ndarray

Array of losses (output of espm.estimators.NMFEstimator.get_losses method)

Returns
values: np.ndarray

Array of values

names: list

List of names

espm.utils.rescaled_DH(D, H)[source]

Rescale the matrices D and H such that the columns of H sums approximately to one.

Parameters
  • D (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

Returns

D_rescale, H_rescale

Return type

np.array 2D, np.array 2D

espm.utils.symbol_list()[source]
espm.utils.symbol_to_number_dict(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with symbols as keys (e.g. Fe for iron) returns a dict of elements with atomic numbers as keys (e.g. 26 for iron)

espm.utils.symbol_to_number_list(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with symbols as keys (e.g. Fe for iron) returns a dict of elements with atomic numbers as keys (e.g. 26 for iron)

Changelog

All notable changes to this project will be documented in this file. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

0.1.1 (2023-21-02)

First inoficial release

  • First version of the package on PyPi

  • All main functionality tested

  • Basic documentation

Contributing

Contributions are welcome, and they are greatly appreciated! The development of this package takes place on GitHub. Issues, bugs, and feature requests should be reported there. Code and documentation can be improved by submitting a pull request. Please add documentation and tests for any new code.

The package can be set up (ideally in a fresh virtual environment) for local development with the following:

$ git clone https://github.com/adriente/espm.git
$ pip install -e ".[dev]"

You can improve or add functionality in the espm folder, along with corresponding unit tests in espm/tests/test_*.py (with reasonable coverage). If you have a nice example to demonstrate the use of the introduced functionality, please consider adding a notebook in doc/introduction/notebooks. In general, this can be done as a relative symbolic link to a notebook in the notebooks folder.

Update README.rst and CHANGELOG.rst if applicable.

After making any change, please run the tests, and build the documentation with the following

$ make test
$ make doc

To iterate faster, you can partially run the test suite, at various degrees of granularity, as follows:

$ pytest espm.test.test_utils.py
$ pytest espm.test.test_estimators.py
Making a release
  1. Update the version number and release date in setup.py, espm/__init__.py and CHANGELOG.rst.

  2. Create a git tag with git tag -a v0.2.0 -m "espm v0.2.0".

  3. Push the tag to GitHub with git push --tag. The tag should now appear in the releases and tags tab.

  4. Create a release on GitHub and select the created tag.

  5. Build the distribution with make dist and check that the dist/espm-0.2.0.tar.gz source archive contains all required files. The binary wheel should be found as dist/espm-0.2.0.py3-none-any.whl.

  6. Test the upload and installation process:

    $ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
    $ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple emspy
    
  7. Build and upload the distribution to the real PyPI with make release.

Repository organization
LICENSE.txt         Project license
*.rst               Important documentation
Makefile            Targets for make
setup.py            Meta information about package (published on PyPI)
.gitignore          Files ignored by the git revision control system
.travis.yml         Defines testing on Travis continuous integration

espm/              Contains the modules (the actual toolbox implementation)
 __init.py__        Load modules at package import
 *.py               One file per module
espm/tables/       Contains data tables for the EDXS and EDSS models
 __init.py__        Load modules at package import
 *.json             data_files for the package
 *.txt              data_files for the package

espm/tests/        Contains the test suites (will be distributed to end user)
 __init.py__        Load modules at package import
 test_*.py          One test suite per module
 test_docstrings.py Test the examples in the docstrings (reference doc)

doc/                Package documentation
 conf.py            Sphinx configuration
 index.rst          Documentation entry page
 *.rst              Include doc files from root directory

doc/reference/      Reference documentation
 index.rst          Reference entry page
 *.rst              Only directives, the actual doc is alongside the code

doc/introduction/
 index.rst          Tutorials entry page
 *.rst              One file per tutorial

doc/introduction/notebooks/
 *.ipynb            One notebook per tutorial

References