espm: The Electron Spectro-Microscopy Python Library

Documentation Status

This library contains algorithms to perform non-negative matrix factorization with diverse regularisation (e.g. Laplacian or L1) and contraints (e.g. simplex).

It is specialized for Electron Microscopy applications. It contains code to create artificial Energy dispersive X-ray spectroscopy (EDXS) data and to perform hyperspectral unmixing on EDXS spectrum images.

Installation

You can install this package from PyPi using:

$ pip install espm

If you want to develop, please use the option:

$ git clone https://github.com/adriente/espm.git
$ cd espm
$ pip install cython
$ pip install -e ".[dev]"

Getting started

Try the api.ipynb notebook in the notebooks folder.

Documentation

The documentation is available at https://espm.readthedocs.io/en/latest/

You can get started with the following notebooks:

CITING

If you use this library, please cite the following paper:

@article{teurtrie2023espm,
title={espm: A Python library for the simulation of STEM-EDXS datasets},
author={Teurtrie, Adrien and Perraudin, Nathana{\"e}l and Holvoet, Thomas and Chen, Hui and Alexander, Duncan TL and Obozinski, Guillaume and H{\'e}bert, C{\'e}cile},
journal={Ultramicroscopy},
pages={113719},
year={2023},
publisher={Elsevier}
}

Documentation map

Introduction to the espm

This tutorial will show you the basic operations of the toolbox. After installing the package with pip, start by opening a python shell, e.g. a Jupyter notebook, and import espm. The package espm is built on top of the hyperspy and the scikit-learn packages.

The hyperspy package is a Python library for multidimensional data analysis. It provides the base framework to handles our data. The scikit-learn package is a Python library for machine learning. It provides the base framework to for the Non Negative Matrix Factorization (NMF) algorithms develeoped in this package.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import hyperspy.api as hs
>>> import espm
Datasets

Let us start by creating some data to play with this tutorial. We can generating artificial datasets using the following lines:

>>> import espm.datasets as ds
>>> ds.generate_built_in_datasets(seeds_range=1)
>>> spim = ds.load_particules(sample = 0)
>>> spim.change_dtype('float64')

The command generate_built_in_datasets will save the generated datasets folder defined in the espm.conf.py file. Alternatively, you can also define the path where the data will be saved using the “base_path” argument.

>>> ds.generate_built_in_datasets(base_path="generated_samples", seeds_range=1)

Here the object spim is of the hyperspy._signals.signal1d.Signal1D. This object has different useful attributes and methods. For example, @ADRIEN — summarize here some of them

Note

Please see the review article espm : a Python library for the simulation of STEM-EDXS datasets for an overview of the simulation methods this package leverages.

Factorization

Taking the non-negative matrix factorization (NMF) is done with the following:

>>> out = spim.decomposition(3)
>>> spim.plot_decomposition_loadings(3)
>>> spim.plot_decomposition_factors(3)
_images/index-3_00.png
_images/index-3_01.png

BIt will use the algorithms developped in a coming contribution.

These algorithms are an important part of this package. They are specialized to solve regularized Poisson NMF problems. Mathematically, they can be expressed as:

\[\dot{W}, \dot{H} = \arg\min_{W\geq\epsilon, H\geq\epsilon, \sum_i H_{ij} = 1} D_{GKL}(X || GWH) + \lambda tr ( H^\top \Delta H) + \mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})$$\]

Here \(D_{GKL}\) is the fidelity term, i.e. the Generalized KL divergence

\[D_{GKL}(X \| Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

The loss is regularized using two terms: a Laplacian regularization on \(H\) and a log regularization on \(H\). \(\lambda\) and \(\mu\) are the regularization parameters. The Laplacian regularization is defined as:

\[\lambda tr ( H^\top \Delta H)\]

where \(\Delta\) is the Laplacian operator (it can be created using the function espm.utils.create_laplacian_matrix). Note that the columns of the matrices :math:`H` and :math:`X` are assumed to be images.

The log regularization is defined as:

\[\mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})\]

where \(\epsilon_{reg}\) is the slope of log regularization at 0. This term acts similarly to an L1 penalty but affects less larger values.

Finally, we assume \(W,H\geq \epsilon\) and that the lines of \(H\) sum to 1:

\[\sum_i H_{ij} = 1.\]

The size of:

  • \(X\) is (n, p)

  • \(W\) is (m, k)

  • \(H\) is (k, p)

  • \(G\) is (n, m)

The columns of the matrices \(H\) and \(X\) are assumed to be images, typically for the smoothness regularization. In terms of shape, we have \(n_x \cdot n_y = p\), where \(n_x\) and \(n_y\) are the number of pixels in the x and y directions.

A detailed example on the use these algorithms can be found in this notebook.

List of example notebooks

To go deeper, we invite you to consult the following notebooks.

Analysis of simulated EDX data

In this notebook we showcase the analysis of the built-in simulated dataset. We use the espm EDXS modelling to simulate the data. We first perform a KL-NMF decomposition of the data and then plot the results.

Imports
[1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

# Generic imports
import hyperspy.api as hs
import numpy as np

# espm imports
from espm.estimators import SmoothNMF
import espm.datasets as ds
Generating artificial datasets and loading them

If the datasets were already generated, they are not generated again

[2]:
# Generate the data
# Here `seeds_range` is the number of different samples to generate
ds.generate_built_in_datasets(seeds_range=1)

# Load the data
spim = ds.load_particules(sample = 0)
# We need to change the data to floats for the decomposition to run
spim.change_dtype('float64')
Building G

The information in the metadata of the spim object are used to build the G matrix. This matrix contains a model of the characteristic X-rays and the bremsstrahlung.

[3]:
spim.build_G("bremsstrahlung")
Problem solving
Picking analysis parameters
  • 3 components for 3 phases

  • convergence criterions : tol and max_iter. tol is the minimum change of the loss function between two iterations. max_iter is the max number of iterations.

  • hspy_comp is a required parameter if you want to use the hyperspy api

[4]:
est = SmoothNMF( n_components = 3,tol=0.000001, max_iter = 50, G = spim.G,hspy_comp = True)
Calculating the decomposition

/! It should take a minute to execute in the case of the built-in dataset

[5]:
out = spim.decomposition(algorithm = est, return_info=True)
It 10 / 50: loss 2.047093e-01,  1.363 it/s
It 20 / 50: loss 2.045638e-01,  1.394 it/s
It 30 / 50: loss 2.045228e-01,  1.411 it/s
It 40 / 50: loss 2.045052e-01,  1.418 it/s
exits because max_iteration was reached
Stopped after 50 iterations in 0.0 minutes and 35.0 seconds.
Decomposition info:
  normalize_poissonian_noise=False
  algorithm=SmoothNMF()
  output_dimension=None
  centre=None
scikit-learn estimator:
SmoothNMF()
Getting the losses and the results of the decomposition
  • First cell : Printing the resulting concentrations.

  • Second cell : Ploting the resulting spectra

  • Thrid cell : Ploting the resulting abundances

Hyperspy is mainly designed to be used with the qt graphical backend of matplotlib. Thus two plots will appear if you are in inline mode.

[6]:
spim.print_concentration_report()
Concentrations report
     p0     p1     p2
V  : 0.0246 0.0079 0.0178
Rb : 0.4842 0.0863 0.5199
W  : 0.0390 0.0092 0.0045
N  : 0.0151 0.0851 0.0134
Yb : 0.0009 0.0826 0.0018
Pt : 0.0012 0.0150 0.0013
Al : 0.0005 0.1082 0.0010
Ti : 0.0002 0.0860 0.0007
La : 0.0004 0.2014 0.0012
[7]:
spim.plot_decomposition_loadings(3)
[7]:
_images/introduction_notebooks_api_13_0.png
_images/introduction_notebooks_api_13_1.png
[8]:
spim.plot_decomposition_factors(3)
[8]:
_images/introduction_notebooks_api_14_0.png
_images/introduction_notebooks_api_14_1.png
Generate simulated data

In this notebook we generate a simulated dataset based on experimental data. - The maps are directly extracted from the experimental data. - The phases (or spectra) are built using chemical composition extracted from the literature (citations to be included)

The output of this notebook is put in the generated_datasets

Imports
[1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

# espm imports
from espm.datasets.base import generate_dataset
from espm.weights.generate_weights import chemical_map_weights
from espm.models.generate_EDXS_phases import generate_modular_phases
from espm.models.EDXS_function import elts_list_from_dict_list

# Generic imports
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import hyperspy.api as hs


Generate weights

We use an experimental dataset to create realistic abundance maps. Theses maps are based from chemical mapping of the dataset.

[2]:
def get_repo_path():
    """Get the path to the git repository.

    This is a bit of a hack, but it works.
    """
    this_path = Path.cwd() / Path("generate_data.ipynb")
    return this_path.resolve().parent.parent

# Path of the experimental dataset
path = get_repo_path() / Path("generated_datasets/71GPa_experimental_data.hspy")

# Creation of the weights
weights = chemical_map_weights( file = path, line_list = ["Fe_Ka","Ca_Ka"], conc_list = [0.5,0.5])
/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.2.1/lib/python3.7/site-packages/hyperspy/misc/utils.py:475: VisibleDeprecationWarning: Use of the `binned` attribute in metadata is going to be deprecated in v2.0. Set the `axis.is_binned` attribute instead.
  VisibleDeprecationWarning,
/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.2.1/lib/python3.7/site-packages/hyperspy/io.py:575: VisibleDeprecationWarning: Loading old file version. The binned attribute has been moved from metadata.Signal to axis.is_binned. Setting this attribute for all signal axes instead.
  'signal axes instead.', VisibleDeprecationWarning)
/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.2.1/lib/python3.7/site-packages/hyperspy/misc/utils.py:475: VisibleDeprecationWarning: Use of the `binned` attribute in metadata is going to be deprecated in v2.0. Set the `axis.is_binned` attribute instead.
  VisibleDeprecationWarning,
/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.2.1/lib/python3.7/site-packages/hyperspy/io.py:575: VisibleDeprecationWarning: Loading old file version. The binned attribute has been moved from metadata.Signal to axis.is_binned. Setting this attribute for all signal axes instead.
  'signal axes instead.', VisibleDeprecationWarning)
Plot the results
[3]:
fig,axs = plt.subplots(1,3,figsize = (20,20*3))
for j in range(axs.shape[0]) :
    im = axs[j].imshow(weights[:,:,j],vmin = 0, vmax = 1.0)
    axs[j].tick_params(axis = "both",width = 0,labelbottom = False,labelleft = False)
    axs[j].set_title("Phase {}".format(j),fontsize = 22)

fig.subplots_adjust(right=0.84)
cbar_ax = fig.add_axes([0.85, 0.47, 0.01, 0.045])
fig.colorbar(im,cax=cbar_ax)
cbar_ax.tick_params(labelsize=22)
_images/introduction_notebooks_generate_data_6_0.png
Generate phases

We generate phases based on values extracted from the literature. The phases we try to simulate here are Ferropericlase, Bridgmanite and Ca-perovskite.

We use dictionnaries to input the modelling parameters.

[4]:
# Elemental concetration of each phase
elts_dicts = [
    # Pseudo ferropericlase
    {
        "Mg" : 0.522, "Fe" : 0.104, "O" : 0.374, "Cu" : 0.05
    },
    # Pseudo Ca-Perovskite
    {
        "Mg" : 0.020, "Fe" : 0.018, "Ca" : 0.188, "Si" : 0.173, "Al" : 0.010, "O" : 0.572, "Ti" : 0.004, "Cu" : 0.05, "Sm" : 0.007, "Lu" : 0.006, "Nd" : 0.006
    },
    # Pseudo Bridgmanite
    {
        "Mg" : 0.445, "Fe" : 0.035, "Ca" : 0.031, "Si" : 0.419, "Al" : 0.074, "O" : 1.136, "Cu" : 0.05, "Hf" : 0.01
    }]

# Parameters of the bremsstrahlung
brstlg_pars = [
    {"b0" : 0.0001629, "b1" : 0.0009812},
    {"b0" : 0.0007853, "b1" : 0.0003658},
    {"b0" : 0.0003458, "b1" : 0.0006268}
]

# Model parameters : energy scale, detector broadening, x-ray emission database, beam energy, absorption parameters, detector efficiency
model_params = {
        "e_offset" : 0.3,
        "e_size" : 1980,
        "e_scale" : 0.01,
        "width_slope" : 0.01,
        "width_intercept" : 0.065,
        "db_name" : "default_xrays.json",
        "E0" : 200,
        "params_dict" : {
            "Abs" : {
                "thickness" : 100.0e-7,
                "toa" : 35,
                "density" : 4.5,
                "atomic_fraction" : False
            },
            "Det" : "SDD_efficiency.txt"
        }
    }

# miscellaneous paramaters : average detected number of X-rays per pixel, phases densities, output folder, model name, random seed
data_dict = {
    "N" : 100,
    "densities" : [1.2,1.0,0.8],
    "data_folder" : "71GPa_synthetic_N100",
    "model" : "EDXS",
    "seed" : 42
}
[5]:
phases = generate_modular_phases(
    elts_dicts=elts_dicts, brstlg_pars = brstlg_pars,
    scales = [1, 1, 1],
    model_params= model_params,
    seed = 42
    )
# scales : bremsstrahlung parameters modifiers

elements = elts_list_from_dict_list(elts_dicts)
Plot the results
[6]:
fig,axs = plt.subplots(3,1,figsize = (20,15))

# Build the energy scale
x = np.linspace(
    model_params["e_offset"],
    model_params["e_offset"]+model_params["e_scale"]*model_params["e_size"],
    num=model_params["e_size"])

for j in range(axs.shape[0]) :
    axs[j].plot(x,phases[j])
    axs[j].set_title("Phase {}".format(j),fontsize = 22)

axs[-1].set_xlabel("Energy loss (eV)",fontsize = 16)
[6]:
Text(0.5, 0, 'Energy loss (eV)')
_images/introduction_notebooks_generate_data_13_1.png
Generate the data

It will produce 1 spectrum images/sample in the target folder.

You can replace seed_range by the number of samples you want to generate.

[7]:
generate_dataset( phases = phases,
                  weights = weights,
                  model_params = model_params,
                  misc_params = data_dict,
                  base_seed=data_dict["seed"],
                  sample_number=1,
                  elements = elements)
100%|██████████| 1/1 [00:04<00:00,  4.33s/it]

The previous command will save the data in the generated_datasets folder defined in the espm.conf.py file.

You can also define the path where the data will be saved using the “base_path” argument

Regularized Poisson NMF on a toy dataset

This notebook is part of the ESPM package. It is available on github

In this notebook, we show how to solve a regularized Poisson Non Negative Matrix Factorization (NMF) on a toy dataset. The general goal of this problem is to separate a matrix \(X\) into two matrices \(D\) and \(H\) such that \(X \approx DH\). We will assume that \(D\) is the product of two matrices \(G\) and \(W\) such that \(D = GW\). The matrix \(G\) is assumed to be known and therefore \(W\) is the matrix we want to find. In the field of electro-microscopy, the matrix \(GW\) represent the phases, the matrix \(H\) is the matrix of maps (or weights) and \(X\) is the data matrix. The maps \(H\) are 2D images. Furthermore, we will assume that \(W, H\) are non-negative or more generally greater than a small positive value \(\epsilon\).

The data measured data \(X\) is follow a Poisson distribution, i.e. \(X_{ij} \sim \text{Poisson}(n_p\dot{X}_{ij})\). In the context of electro-microscopy, \(n_p\) is the number of photons per pixel. \(\dot{X} = G \dot{W} \dot{H}\) would correspond a noiseless measurement.

The size of:

  • \(X\) is (n, p),

  • \(W\) is (m, k),

  • \(H\) is (k, p),

  • \(G\) is (n, m).

In general, m is assumed to be much smaller than n and \(G\) is assumed to be known. The parameter shape_2d defines the shape of the images for the columns of \(H\) and \(X\), i.e. shape_2d[0]*shape_2d[1] = p.

Mathematically. the problem can be formulated as:

\[\dot{W}, \dot{H} = \arg\min_{W\geq\epsilon, H\geq\epsilon, \sum_i H_{ij} = 1} D_{GKL}(X || GWH) + \lambda tr ( H^\top \Delta H) + \mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})\]

Here \(D_{GKL}\) is the fidelity term, i.e. the Generalized KL divergence

\[D_{GKL}(X \| Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

The loss is regularized using two terms: a Laplacian regularization on \(H\) and a log regularization on \(H\). \(\lambda\) and \(\mu\) are the regularization parameters. The Laplacian regularization is defined as:

\[\lambda tr ( H^\top \Delta H)\]

where \(\Delta\) is the Laplacian operator (it can be created using the function :mod:espm.utils.create_laplacian_matrix). Note that the columns of the matrices :math:`H` and :math:`X` are assumed to be images.

The log regularization is defined as:

\[\mu \sum_{i,j} (\log H_{ij} + \epsilon_{reg})\]

where \(\epsilon_{reg}\) is the slope of log regularization at 0. This term acts similarly to an L1 penalty but affects less larger values.

Finally, we assume \(W,H\geq \epsilon\) and that the lines of \(H\) sum to 1:

\[\sum_i H_{ij} = 1.\]

This is done by adding the parameter force_simplex=True to the class espm.estimators.SmoothNMF. This constraint is essential as it prevent the algorithm to find a solution where all the maps tend to 0 because of the other constraints.

In this notebook, we will use the class espm.estimators.SmoothNMF to solve the problem.

Imports and function definition

Let’s start by importing the necessary libraries.

[1]:
# Some useful modules for notebooks
%load_ext autoreload
%autoreload 2
%matplotlib inline
[2]:
import numpy as np
import matplotlib.pyplot as plt
from espm.estimators import SmoothNMF
from espm.measures import find_min_angle, ordered_mse, ordered_mae, ordered_r2
from espm.models import ToyModel
from espm.weights import generate_weights as gw
from espm.datasets.base import generate_spim_sample
from espm.utils import process_losses

Now we define the parameters that will be used to generate the data.

[3]:

seed = 42 # for reproducibility m = 15 # Number of components n = 200 # Length of the phases n_poisson = 300 # Average poisson number per pixel (this number will be splitted on the m dimension) densities = np.random.uniform(0.1, 2.0, 3) # Random densities
[4]:
def get_toy_sample():
    model_params = {"L": n, "C": m, "K": 3, "seed": seed}
    misc_params = {"N": n_poisson, "seed": seed, 'densities' : densities, "model": "ToyModel"}

    toy_model = ToyModel(**model_params)
    toy_model.generate_phases()
    phases = toy_model.phases.T
    weights = gw.generate_weights("toy_problem", None)

    sample = generate_spim_sample(phases, weights, model_params,misc_params, seed = seed)
    return sample

def to_vec(X):
    n = X.shape[2]
    return X.transpose(2,0,1).reshape(n, -1)

sample = get_toy_sample()

GW = sample["GW"].T
G = sample["G"]
H = to_vec(sample["H"])
X = to_vec(sample["X"])
Xdot = to_vec(sample["Xdot"])
shape_2d = sample["shape_2d"]


Let us look at the dimension of our problem.

[5]:
print(f"""
- X: {X.shape}
- Xdot: {Xdot.shape}
- G: {G.shape}
- GW: {GW.shape}
- H: {H.shape}
- shape_2d: {shape_2d}
""")

- X: (200, 10000)
- Xdot: (200, 10000)
- G: (200, 15)
- GW: (200, 3)
- H: (3, 10000)
- shape_2d: (100, 100)

Here Xdot contains the noisless data such that the ground truth H and GW satisfies:

\[X = GWH\]
[6]:
np.testing.assert_allclose(Xdot, GW @ H)

Let us plot the ground truth maps \(H\) (sometimes also called weights). Here the variable shape_2d is used to reshape the columns of \(H\) into images.

[7]:

vmin, vmax = 0,1 cmap = plt.cm.gray_r plt.figure(figsize=(10, 3)) for i, hdot in enumerate(H): plt.subplot(1,3,i+1) plt.imshow(H[i].reshape(shape_2d), cmap=cmap, vmin=vmin, vmax=vmax) plt.axis("off") plt.title(f"GT - Map {i+1}")
_images/introduction_notebooks_toy-problem_12_0.png

We can also plot the phases corresponding to these maps. The phases corresponds to the matrix \(GW\).

[8]:
e = np.linspace(0,1, GW.shape[0])
plt.plot(e, GW)
plt.title("GT - Phases")
plt.xlabel("Frequency [normalized]")
plt.legend([f"phase {i+1}" for i in range(3)]);
_images/introduction_notebooks_toy-problem_14_0.png

The matrix \(G\) is assumed to be known appriori. Here we plot the first 5 lines of \(G\).

[9]:
l = np.linspace(0, 1,n)
plt.plot(l, G[:,:5]);
_images/introduction_notebooks_toy-problem_16_0.png

Note that the function create_toy_sample did not return \(W\) but only \(GW\).

Using the ground truth \(GW\), it can be computed as follows:

[10]:
W = np.linalg.lstsq(GW, G, rcond=None)[0].T
W.shape
[10]:
(15, 3)
Solving the problem
Parameters

Let us define the different hyperparameters of the problem. Feel free to change them and see how the results change.

Let us first define our regularisation parameters.

[11]:
lambda_L = 2 # Smoothness of the maps
mu = 0.05 # Sparsity of the maps

We can additionally add a simplex constraint to the problem by setting force_simplex=True. This will add the constraint that the line of \(H\) sum to 1. This constraint should be activated for the regularizers to work!. Practically, it will prevent the algorithm from simply minimizing \(W\) and increasing \(H\).

[12]:
force_simplex = True

Finally, we can decide to specify \(G\) or not. If the matrix \(G\) is not specified, the algorithm will directly estimate \(GW\) insead of \(W\).

[13]:
Gused = G

Note that wihtout regularisation, i.e. with the parameters

lambda_L = 0
mu = 0
Gused = None
force_simplex=False

we recover the classical Poisson/KL NMF problem. Our algorithm will apply the MU algorithm from Lee and Seung (2001).

Let us define the parameters for the algorithm. Here the class espm.estimator.SmoothNMF heritates from sckit-learn’s NMF class.

[14]:

K = len(H) # Number of components / let assume that we know it params = {} params["tol"]=1e-6 # Tolerance for the stopping criterion. params["max_iter"] = 200 # Maximum number of iterations before timing out. params["hspy_comp"] = False # If should be set to True if hspy data format is used. params["verbose"] = 1 # Verbosity level. params["eval_print"] = 10 # Print the evaluation every eval_print iterations. params["epsilon_reg"] = 1 # Regularization parameter params["linesearch"] = False # Use linesearch to accelerate convergence params["shape_2d"] = shape_2d # Shape of the 2D maps params["n_components"] = K # Number of components params["normalize"] = True # Normalize the data. It helps to keep the range of the regularization parameters lambda_L and mu in a reasonable range. estimator = SmoothNMF(mu=mu, lambda_L=lambda_L, G = Gused, force_simplex=force_simplex, **params)

The function fit_transform will solve the problem and return the matrix \(GW\).

[15]:
GW_est = estimator.fit_transform(X)
It 10 / 200: loss 8.199383e-03,  4.182 it/s
It 20 / 200: loss 8.123480e-03,  4.250 it/s
It 30 / 200: loss 8.096584e-03,  4.276 it/s
It 40 / 200: loss 8.080755e-03,  4.291 it/s
It 50 / 200: loss 8.065004e-03,  4.299 it/s
It 60 / 200: loss 8.019184e-03,  4.304 it/s
It 70 / 200: loss 7.967115e-03,  4.308 it/s
It 80 / 200: loss 7.780344e-03,  4.312 it/s
It 90 / 200: loss 7.693353e-03,  4.314 it/s
It 100 / 200: loss 7.678082e-03,  4.316 it/s
It 110 / 200: loss 7.659617e-03,  4.317 it/s
It 120 / 200: loss 7.625445e-03,  4.316 it/s
It 130 / 200: loss 7.574880e-03,  4.316 it/s
It 140 / 200: loss 7.526745e-03,  4.314 it/s
It 150 / 200: loss 7.498068e-03,  4.314 it/s
It 160 / 200: loss 7.483342e-03,  4.315 it/s
It 170 / 200: loss 7.472914e-03,  4.316 it/s
It 180 / 200: loss 7.464122e-03,  4.315 it/s
It 190 / 200: loss 7.455899e-03,  4.315 it/s
exits because max_iteration was reached
Stopped after 200 iterations in 0.0 minutes and 46.0 seconds.

We can plot the different losses to see how the algorithm converges.

[16]:

losses = estimator.get_losses() values, names = process_losses(losses) plt.figure(figsize=(10, 6)) for name, value in zip(names[:4], values[:4]): plt.plot(value, label=name) plt.xlabel("iteration") plt.yscale("log") plt.legend()
[16]:
<matplotlib.legend.Legend at 0x7f2186eae490>
_images/introduction_notebooks_toy-problem_33_1.png
[17]:
losses[0]
[17]:
(0.00883246, 0.00810643, 0.00018791, 0.00053812, 8., 0.20168493, 0.38053404)

We can easily recover \(W\) and \(H\) from the estimator.

[18]:
H_est = estimator.H_
W_est = estimator.W_

Let us compute some metrics to evaluate the quality of the solution. We will use the metrics defined in the module espm.metrics.

We first compute the angles between the phases GW and the estimated phases GW_est. The angles are computed using the function compute_angles. This function also return the indices to match the estimations and the ground truths. Indeed, there is no reason for them to have the same order.

[19]:
angles, true_inds = find_min_angle(GW.T, GW_est.T, unique=True, get_ind=True)
print("angles : ", angles)
angles :  [7.303671408311623, 25.736214105002425, 30.788294617407914]

Now, we can compute the Mean Squared Error (MSE) between the maps H and the estimated maps H_est.

[20]:
mse = ordered_mse(H, H_est, true_inds)
print("mse : ", mse)
mse :  [0.37937082838545216, 0.11121192704478366, 0.15250586376963057]

Finally, let us plot the results.

[21]:

def plot_results(Ddot, D, Hdotflat, Hflat): fontsize = 30 scale = 15 aspect_ratio = 1.4 marker_list = ["-o","-s","->","-<","-^","-v","-d"] mark_space = 20 # cmap = plt.cm.hot_r cmap = plt.cm.gray_r vmax = 1 vmin = 0 K = len(H) L = len(D) angles, true_inds = find_min_angle(Ddot.T, D.T, unique=True, get_ind=True) mse = ordered_mse(Hdotflat, Hflat, true_inds) mae = ordered_mae(Hdotflat, Hflat, true_inds) r2 = ordered_r2(Hdotflat, Hflat, true_inds) fig, axes = plt.subplots(K,3,figsize = (scale/K * 3 * aspect_ratio,scale)) x = np.linspace(0,1, num = L) for i in range(K): axes[2,i].plot(x,Ddot.T[i,:],'g-',label='ground truth',linewidth=4) axes[2,i].plot(x,D[:,true_inds[i]],'r--',label='reconstructed',linewidth=4) axes[2,i].set_title("{:.2f} deg".format(angles[i]),fontsize = fontsize-2) axes[2,i].set_xlim(0,1) axes[1,i].imshow((Hflat[true_inds[i],:]).reshape(shape_2d),vmin = vmin, vmax = vmax , cmap=cmap) axes[1,i].set_title("R2: {:.2f}".format(r2[true_inds[i]]),fontsize = fontsize-2) # axes[i,1].set_ylim(0.0,1.0) axes[1,i].tick_params(axis = "both",labelleft = False, labelbottom = False,left = False, bottom = False) im = axes[0,i].imshow(Hdotflat[i].reshape(shape_2d),vmin = vmin, vmax = vmax, cmap=cmap) axes[0,i].set_title("Phase {}".format(i),fontsize = fontsize) axes[0,i].tick_params(axis = "both",labelleft = False, labelbottom = False,left = False, bottom = False) axes[2,0].legend() rows = ["True maps","Reconstructed maps","Spectra"] for ax, row in zip(axes[:,0], rows): ax.set_ylabel(row, rotation=90, fontsize=fontsize) fig.subplots_adjust(right=0.84) # put colorbar at desire position cbar_ax = fig.add_axes([0.85, 0.5, 0.01, 0.3]) fig.colorbar(im,cax=cbar_ax) # fig.tight_layout() print("angles : ", angles) print("mse : ", mse) print("mae : ", mae) print("r2 : ", r2) return fig fig = plot_results(GW, GW_est, H, H_est) plt.show()
angles :  [7.303671408311623, 25.736214105002425, 30.788294617407914]
mse :  [0.37937082838545216, 0.11121192704478366, 0.15250586376963057]
mae :  [0.5880329885151475, 0.23360594333565174, 0.35517588268316624]
r2 :  [-7.756461302568054, -0.5126935634346239, -5.284422099392397]
_images/introduction_notebooks_toy-problem_42_1.png

API reference

Models

The espm.models module implements the model objects that are used to simulate features of the data (e.g. in EDXS it corresponds to spectra). The models are also used for the creation of the G matrix, which can be later used for the decomposition of data using NMF.

Note

For now espm supports only the modelling of EDS data but we aim at supporting the modelling of EELS data too.

Models base classes

The PhysicalModel is the abstract class from which the EDX model inherits. The ToyModel can be used for testing data analysis algorithms.

class espm.models.base.Model[source]

Abstract class for models.

abstract generate_g_matr(g_parameters)[source]
abstract generate_phases(phase_parameters)[source]
class espm.models.base.PhysicalModel(e_offset, e_size, e_scale, params_dict, db_name='default_xrays.json', E0=200, **kwargs)[source]

Abstract class of the models

build_energy_scale(e_offset, e_size, e_scale)[source]

Build an array corresponding to a linear energy scale.

Parameters
e_offset
float

Offset of the energy scale

e_size
int

Number of channels of the detector

e_scale
float

Scale in keV/channel

Returns
energy scale
np.array 1D

Linear energy scale based on the given parameters of shape (e_size,)

extract_DB(db_name)[source]

Read the cross-sections from the database

Parameters
db_name
string

Name of the json database file

Returns
data
dict

A dictionnary containing the cross-sections in the database

extract_DB_mdata(db_name)[source]

Read the metadata of the database

Parameters
db_name
string

Name of the json database file

Returns
data
dict

A dictionnary containing the metadata related to the database

class espm.models.base.ToyModel(L: int = 200, C: int = 15, K: int = 3, seed: Optional[int] = None)[source]

Toy model.

Simple model with a random G matrix and phases.

The G matrix is generated with a random number of gaussians per column. The phases are generated using a vector drawn from a Laplacian distribution and multiplied by the G matrix.

Parameters
Lint

Length of the phases.

Cint

Number of possible components in the phases.

Kint

Number of phases.

seedint, optional

Seed for the random number generator.

Examples

>>> from espm.models import ToyModel
>>> import matplotlib.pyplot as plt
>>> model = ToyModel(L=200, C=15, K=3, seed=0)
>>> model.generate_g_matr()
>>> model.generate_phases()
>>> print(model.G.shape)
(200, 15)
>>> print(model.phases.shape)
(200, 3)
>>> plt.plot(model.phases)
_images/models-1.png
generate_g_matr(*args, **kwargs) None[source]

Generate G matrix.

Parameters
argsignored
kwargsignored
Returns
None
generate_phases(*args, **kwargs) None[source]

Generate phases.

Parameters
argsignored
kwargsignored
Returns
None
EDXS model

The espm.models.edxs module implements the creation of the G matrix that contains a modelisation of the characteristic and continuous X-rays.

This module is a key component of the EDXS data simulation and the EDXS data analysis.

class espm.models.edxs.EDXS(*args, width_slope=0.01, width_intercept=0.065, **kwargs)[source]
generate_g_matr(g_type='bremsstrahlung', reference_elt={}, *, elements=[], **kwargs)[source]

Generate the G matrix. With a complete model the matrix is (e_size,n+2). The first n columns correspond to the sum of X-ray characteristic peaks associated to each shell of the elements. The last 2 columns correspond to a bremsstrahlung model.

The first n columns of G (noted \(\kappa\)), corresponding to the characteristic X-rays, can be calculated using the following formula:

\[\kappa_{k,Z} = \sum_{ij} x_{ij}(Z) \frac{1}{\Delta(\varepsilon_k) \sqrt{2\pi} } e{^{-\frac{1}{2} {\left( \frac{\varepsilon_k - \varepsilon^{ij}(Z)}{\Delta(\varepsilon_k)} \right)}^2}}\]

where \(\varepsilon_k\) is the energy of the kth energy channel, \(\varepsilon^{ij}(Z)\) is the energy of the ijth line of the Zth element, \(\Delta(\varepsilon_k)\) is the FWHM of the detector at the energy \(\varepsilon_k\) and \(x_{ij}(Z)\) is the intensity ratio of the ijth line of the Zth element. The last term of the equation is a gaussian function centered at \(\varepsilon^{ij}(Z)\) and with a FWHM of \(\Delta(\varepsilon_k)\).

The last two columns of G (noted \(\beta\)), corresponding to the bremsstrahlung model, can be calculated using the following formula:

\[ \begin{align}\begin{aligned}\beta_{k,n+1} = \frac{\varepsilon_0 - \varepsilon_k}{\varepsilon_0 \varepsilon_k} \times \frac{1 - (\varepsilon_0 - \varepsilon_k)}{\varepsilon_0}\\\beta_{k,n+2} = \frac{(\varepsilon_0 - \varepsilon_k)^2}{\varepsilon^2_0 \varepsilon_k}\end{aligned}\end{align} \]

Note that there is no parameter for the bremsstrahlung since it is supposed to be learned with the W matrix (see espm.estimators).

Parameters
g_type
string

Options of the edxs model to include in the G matrix. The three possibilities are “identity”, “no_brstlg” and “bremsstrahlung”. G is going to be the identity matrix, only characteristic X-ray peaks or characteristic X-rays plus bremsstrahlung model, respectively.

reference_elt
dict

The keys are chemical elements (atomic number) and the values are cut-off energies. This argument is used to split some of the columns of G into 2 columns. The first column corresponds to characteristic X-rays before the cut-off and second one corresponds to characteristic X-rays before the cut-off. This feature is implemented to enable more accurate absorption correction.

elements
list

List of modeled chemical elements. The list can be populated either with atomic numbers or chemical symbols, e.g. “Fe” or 26.

Returns
g matrix
np.array 2D

matrix of the edx model.

Notes

See our paper about the espm package [TPH+23] for more information about the equations of this model.

generate_phases(phases_parameters)[source]

Generate a series of spectra from list of phase parameters.

Parameters
phases_parameters
list

List of dicts containing the phase parameters.

Returns
phase_array
np.array 2D

Array which lines correspond to the modelled phases in the input list.

Notes

The absorption correction is done using the average composition of the phases. The same correction is used for each phase.

generate_spectrum(b0=0, b1=0, scale=1.0, abs_elts_dict={}, *, elements_dict={})[source]

Generate a spectrum from bremsstrahlung parameters and a chemical composition. The modelling is done using the same formula as for the generate_g_matr function.

Parameters
b0
float

First bremsstrahlung parameter.

b1
float

Second bremsstrahlung parameter.

scale
float

Scale factor to apply to the bremsstrahlung. Feature to be removed, scaling should be done with b0 and b1.

abs_elts_dict
dict

Dictionnary of elements and associated concentrations. This dictionnary is used to calculate the contribution of the absorption in the spectrum.

elements_dict
dict

Dictionnary of elements and associated concentrations describing the chemical composition of the modeled sample.

Returns
——-
spectrum
np.array 1D

Output edx spectrum corresponding to the input chemical composition.

Notes

Check EDXS_function for details about the bremsstrahlung model.

Examples

>>> import matplotlib.pyplot as plt
>>> from espm.models.edxs import EDXS
>>> from espm.conf import DEFAULT_EDXS_PARAMS
>>> b0, b1 = 5.5367e-5, 0.00192181
>>> elts_dict = {"Si" : 1.0,"Ca" : 1.0,"O" : 3.0,"C" : 0.3}
>>> model = EDXS(**DEFAULT_EDXS_PARAMS)
>>> spectrum = model.generate_spectrum(b0,b1, elements_dict = elts_dict)
>>> plt.plot(spectrum)
_images/models-2.png
espm.models.edxs.G_EDXS(model, g_params, part_W=None, G=None)[source]

Update the bremsstrahlung part of the G matrix. This function is used for the NMF decomposition so that the absorption correction is updated in between each step.

Parameters
model_params
dict

Parameters of the edxs model. Check espm.conf.DEFAULT_EDXS_PARAMS for an example.

g_params
dict

Parameters specific to the creation of the G matrix.

part_W
np.array 2D

Concentrations determined during the NMF decomposition. It is used to determine the average composition that will be used to determine the absorption correction.

G
np.array 2D

Current G matrix. Setting it to None will initialize the G matrix.

Returns
updated G
np.array 2D

Updated G matrix with a new absorption correction.

EDXS functions

The espm.models.EDXS_function module implements misceallenous functions required for the :mod: espm.models.edxs. This module include bremsstrahlung modelling, database reading and dicts and list manipulation functions.

espm.models.EDXS_function.G_bremsstrahlung(x, E0, params_dict, *, elements_dict={})[source]

Computes the two-parts continuum X-rays for the G matrix. The two parts of the bremsstrahlung are constructed separately so that its parameters can fitted to data. Absorption and detection are multiplied to each part.

Parameters
x
np.array 1D

Energy scale.

params_dict
dict

Dictionnary containing the absorption and detection parameters.

elements_dict
dict

Composition of the studied sample. It is required for absorption calculation.

Returns
continuum_xrays
np.array 2D

Two parts continuum X-rays model with shape (energy scale size, 2).

espm.models.EDXS_function.chapman_bremsstrahlung(x, b0, b1, b2)[source]

Calculate the bremsstrahlung as parametrized by chapman et al.

Parameters
x
np.array 1D

Energy scale.

b0
float

First parameter, corresponding to the inverse of the energy.

b1
float

Second parameter, corresponding to the energy.

b2
float

Third parameter, corresponding to the square of the energy.

Returns
bremsstrahlung
np.array 1D

Bremsstrahlung model

Notes

For details see [CNC84]

espm.models.EDXS_function.continuum_xrays(x, params_dict={}, b0=0, b1=0, E0=200, *, elements_dict={'Si': 1.0})[source]

Computes the continuum X-rays, i.e. the bremsstrahlung multiplied by the absorption and the detection efficiency.

Parameters
x
np.array 1D

Energy scale.

params_dict
dict

Dictionnary containing the absorption and detection parameters.

b0
float

First parameter.

b1
float

Second parameter.

E0
float

Energy of the incident beam in keV.

elements_dict
dict

Composition of the studied sample. It is required for absorption calculation.

Returns
continuum_xrays
np.array 1D

Continuum X-rays model.

Notes

  • When an empty dict is provided. The continuum X-rays are set to 0. This is useful to perform calculations without the bremsstrahlung for example.

  • For an example structure of the params_dict parameter, check the DEFAULT_EDXS_PARAMS espm.conf.

  • For a custom detection efficiency, check the spectrum fit notebook.

espm.models.EDXS_function.elts_dict_from_W(part_W, *, elements=[])[source]

Create a dictionnary of the elemental concentration from a fitted W. It useful to recompute absorption during the our custom NMF calculations.

Parameters
part_W
np.array 2D

W matrix output from the NMF calculation. It only makes sense when the NMF decomposition if performed with G.

elements
list

List of elements as atomic number or symbol.

Returns
elements_dictionnary
dict

Dictionnary containing the concentration associated to each chemical element of the problem.

Notes

This function should maybe move to another part of espm.

Examples

>>> import numpy as np
>>> from espm.models.EDXS_function import elts_dict_from_W
>>> W = np.ones((4,3))
>>> elts = ["Si","Ca","O","C"]
>>> elts_dict_from_W(W,elements = elts)
    {"Si" : 0.25, "Ca" : 0.25, "O" : 0.25, "C" : 0.25}
espm.models.EDXS_function.elts_dict_from_dict_list(dict_list)[source]

Create a single dictionnary of the elemental concentration from a list of dictionnary containing chemical compositions. The new dictionnary corresponds to the average chemical composition of the list.

Parameters
dict_list
list [dict]

list of chemical composition dictionnary

Returns
elements_dictionnary
dict

Dictionnary containing the concentration associated to each chemical elements contained in the input.

Notes

This function should maybe move to another part of espm.

Examples

>>> import numpy as np
>>> from espm.models.EDXS_function import elts_dict_from_dict_list
>>> dicts = [{"Si" : 1.0},{"Ca" : 1.0},{"O" : 1.0},{"C" : 1.0}]
>>> elts_dict_from_dict_list(dicts)
    {"Si" : 0.25, "Ca" : 0.25, "O" : 0.25, "C" : 0.25}
espm.models.EDXS_function.elts_list_from_dict_list(dict_list)[source]

Create a list of elements from a list of dictionnary containing chemical compositions. The list contains all the elements contained in the input.

Parameters
dict_list
list [dict]

list of chemical composition dictionnary

Returns
elements_list
list

List of elements as atomic number or symbol.

Examples

>>> import numpy as np
>>> from espm.models.EDXS_function import elts_list_from_dict_list
>>> dicts = [{"Si" : 1.0, "C" : 0.5, "O" : 0.5},{"Ca" : 1.0, "K" : 2.0},{"O" : 1.0, "Si" : 0.5},{"C" : 1.0}] 
>>> elts_list_from_dict_list(dicts)
    ["Si","Ca","O","C","K"]
espm.models.EDXS_function.gaussian(x, mu, sigma)[source]

Calculate the gaussian function according to the following formula:

\[f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}}\]
Parameters
xnp.array 1D

Energy scale.

mufloat

Mean of the gaussian.

sigmafloat

Standard deviation of the gaussian.

Returns
gaussiannp.array 1D

Gaussian distribution on x, with mean mu and standard deviation sigma.

espm.models.EDXS_function.lifshin_bremsstrahlung(x, b0, b1, E0=200)[source]

Calculate the custom parametrized bremsstrahlung inspired by the model of L.Lifshin.

The model is designed to have the same definition set as the original model of L.Lifshin with \(b_0 > 0\) and \(b_1 > 0\).

The model is defined as follows:

\[f(\varepsilon) = b_0 \frac{\varepsilon_0 - \varepsilon}{\varepsilon_0 \varepsilon} \left(1 - \frac{\varepsilon_0 - \varepsilon}{\varepsilon_0}\right) + b_1 \frac{\left( \varepsilon_0 - \varepsilon \right) ^2}{\varepsilon_0^2 \varepsilon}\]

where \(\varepsilon_0\) is the energy of the incident beam, \(\varepsilon\) is the energy of the emitted photon and \(b_0\) and \(b_1\) are the parameters of the model.

Parameters
x
np.array 1D

Energy scale.

b0
float

First parameter.

b1
float

Second parameter.

E0
float

Energy of the incident beam in keV.

Returns
bremsstrahlung
np.array 1D

Bremsstrahlung model

Notes

For details see L.Lifshin, Ottawa, Proc.9.Ann.Conf.Microbeam Analysis Soc. 53. (1974)

espm.models.EDXS_function.lifshin_bremsstrahlung_b0(x, b0, E0=200)[source]

Calculate the first part of our bremsstrahlung model.

espm.models.EDXS_function.lifshin_bremsstrahlung_b1(x, b1, E0=200)[source]

Calculate the second part of our bremsstrahlung model.

espm.models.EDXS_function.read_compact_db(elt, db_dict)[source]

Read the energy and cross section of each line of a chemical element for compact databases.

Parameters
elt
string

Atomic number.

db_dict
dict

Dictionnary extracted from the json database containing the emission lines and their energies.

Returns
energies
list float

List of energies associated to each emission line of the given element

cross-sections
list float

List of emission cross-sections of each line of the given element

espm.models.EDXS_function.read_lines_db(elt, db_dict)[source]

Read the energy and cross section of each line of a chemical element for explicit databases.

Parameters
elt
string

Atomic number.

db_dict
dict

Dictionnary extracted from the json database containing the emission lines and their energies.

Returns
energies
list float

List of energies associated to each emission line of the given element

cross-sections
list float

List of emission cross-sections of each line of the given element

espm.models.EDXS_function.shelf(x, height, length)[source]

Calculate the shelf contribution of the EDXS Silicon Drift Detector (SDD) detector.

Parameters
x
np.array 1D

Energy scale.

height
float

Height in intensity of shelf contribution of the detector.

length
float

Length in energy of shelf contribution of the detector.

Returns
shelf
np.array 1D

SDD shelf model.

Notes

For details see [SP09]

espm.models.EDXS_function.update_bremsstrahlung(G, part_W, model, elements_list)[source]

Update the continuum X-rays part of the G matrix with the current average chemical composition.

Parameters
G
np.array 2D

EDXS modelling matrix.

part_W
np.array 2D

Learned chemical concentrations.

model
espm.models.EDXS

EDXS model.

elements_list
list

List of elements as atomic number or symbol.

Returns
new_G
np.array 2D

New G matrix with updated continuum X-rays models.

Notes

This function should maybe move to another part of espm.

Absorption functions

The espm.models.absorption_edxs module implements the functions to calculate the contribution of absorption in edx spectra.

espm.models.absorption_edxs.absorption_coefficient(x, atomic_fraction=False, *, elements_dict={'Si': 1.0})[source]

Calculate the mass-absorption coefficent of a given composition at a certain energy or over a certain energy range.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
mass-absorption coefficient
np.array 1D

Calculated value of the mass-absorption coefficient on all the energy range.

Notes

The mass-absorption coefficients are calculated using the database of hyperspy [dlPPF+22].

espm.models.absorption_edxs.absorption_correction(x, thickness=1e-05, toa=90, density=None, atomic_fraction=False, *, elements_dict={'Si': 1.0}, **kwargs)[source]

Calculate the contribution of the absorption on the EDX spectrum for a thin slab of material with a given composition at a certain energy or over a certain energy range. The absorption correction is calculated using the following formula :

\[\frac{1 - \exp(-\chi)}{\chi} \]

where \(\chi = \mu \rho t / \sin(\theta)\), \(\mu\) is the mass-absorption coefficient, \(\rho\) is the density of the material, \(t\) is the thickness of the sample and \(\theta\) is the average take-off angle of the x-rays travelling from the sample to the x-ray detectors.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

thickness
float

Thickness of the material slab in meter. If the thickness is set to 0.0, the function will return 1.0.

toa
float

Take-off angle in degrees of the x-rays travelling from the sample to the x-ray detectors.

density
float

Density of the material in g/m3 (to be checked). If None, an approximation of the density based on the atomic numbers of the elements of the materials is calculated.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
absorption correction
np.array 1D

Calculated value of the absorption correction on all the energy range.

espm.models.absorption_edxs.absorption_mass_thickness(x, mass_thickness, toa=90, atomic_fraction=True, *, elements_dict={'Si': 1.0})[source]

Calculate the contribution of the absorption of a mass-thickness map.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

mass_thickness:
np.array 2D

Mass-thickness map. See the notes for details.

toa
float

Take-off angle in degrees of the x-rays travelling from the sample to the x-ray detectors.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
absorption correction
np.array 3D

Calculated value of the absorption correction on all the energy range for all pixels of the mass-thickness map.

Notes

The mass-thickness map can be extracted from an EDX spectrum image by calculating the concentration of an element for both K and L lines. Since there should be only one concentration for both lines, it is possible to determine the contribution of the absorption for each pixel and thus obtain the mass-thickness map.

espm.models.absorption_edxs.det_efficiency(x, det_dict)[source]

Simulate the detection efficiency of an EDX detector. The absorption wihtin one layer of the detector is calculated according to the following equation, corresponding to the Beer-Lambert law:

\[d_k = \exp\left( - \mu(\varepsilon_k)\rho t \right)\]

where \(d_k\) is the detection efficiency at the energy \(\varepsilon_k\), \(\mu(\varepsilon_k)\) is the mass-absorption coefficient at the energy \(\varepsilon_k\), \(\rho\) is the density of the material, and \(t\) is the thickness of the material.

In a detection layer of the detector, an absobed photon counts as a detected photon so that the detection efficiency is equal to 1 minus the absorption. The detection efficiency is calculated as the product of the absorption of each layer of the detector.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

det_dict
dict

See Examples for an example of the structure of the dict. One of the layer of the input dictionnary should be named “detection” to model a active layer of the detector.

Returns
efficiency
np.array 1D

Calculated detection efficiency based on the input layers model.

Examples

>>> from espm.models.absorption_edxs import det_efficiency
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dict = {"detection" : {"thickness" : 10e-3, "density" : 2.3, "atomic_fraction" : False, "elements_dict" : {"Si" : 1.0}},
>>> "dead_layer" : {"thickness" : 50e-7, "density" : 2.7, "atomic_fraction" : False, "elements_dict" : {"Al" : 1.0}}}
>>> x = np.linspace(0.5,20,num = 1000)
>>> efficiency = det_efficiency(x,dict)
>>> plt.plot(x,efficiency)
_images/models-3.png
espm.models.absorption_edxs.det_efficiency_from_curve(x, filename, kind='cubic')[source]

Interpolate a detection efficiency curve that is stored in ~/espm/tables.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

filename:
string

Name of the file of the detection efficiency.

kind
string

Polynomial order of the interpolation.

Returns
Detection efficiency
np.array 1D

Interpolated detection efficiency.

espm.models.absorption_edxs.det_efficiency_layer(x, thickness=1e-05, density=None, atomic_fraction=False, *, elements_dict={'Si': 1.0})[source]

Calculate the proportion of absorbed X-rays for one virtual layer of the detector.

Parameters
x
np.array 1D

Energy value or energy scale over which the mass-absorption coefficient is calculated.

thickness
float

Thickness of the material slab in meter. If the thickness is set to 0.0, the function will return 1.0.

density
float

Density of the material in g/m3 (to be checked). If None, an approximation of the density based on the atomic numbers of the elements of the materials is calculated.

atomic_fraction
boolean

If True, the atomic concentrations of elements_dict are converted in atomic weight fractions.

elements_dict
dict

Dictionnary with chemical elements as keys and atomic weight fractions (or atomic concentrations) as values.

Returns
absorption
np.array 1D

Calculated absorption of one layer of the detector.

Notes

We assume that the X-rays arrive perpendicular to the detector.

espm.models.generate_EDXS_phases.generate_brem_params(seed)[source]

Generate random parameters for the Bremsstrahlung model with a scaling factor that somewhat resembles the real data.

Parameters
seedint

Seed for the random number generator.

Returns
dict

Dictionary containing the parameters for the Bremsstrahlung model.

espm.models.generate_EDXS_phases.generate_elts_dict(seed, nb_elements=3)[source]

Generate a random dictionary of elements and their relative abundance. The elements are limited to the range 6-82.

Parameters
seedint

Seed for the random number generator.

nb_elementsint, optional

Number of elements to generate. The default is 3.

Returns
dict

Dictionary containing the elements and their relative abundance.

espm.models.generate_EDXS_phases.generate_modular_phases(elts_dicts=3, brstlg_pars=None, scales=None, model_params=None, seed=0)[source]

Generate an array of phases of the EDXS model with set model parameters, elemental compositions and bremsstrahlung parameters.

Parameters
elts_dictsint or list, optional

Number of phases to generate or list of dictionnaries containing the chemical compositions. The default is 3.

brstlg_parslist, optional

List of dictionnaries containing the bremsstrahlung parameters. If the parameters are not set, they are generated randomly.

scaleslist, optional

List of scaling factors for the phases. If the scales are not set, they are set to 1.0.

model_paramsdict, optional

Dictionary containing the model parameters. If the parameters are not set, they are set to the default values. See the config file for the default values.

seedint, optional

Seed for the random number generator. The default is 0.

Returns
np.array

Array of phases of the EDXS model.

Examples

>>> from espm.models.generate_EDXS_phases import generate_modular_phases
>>> import matplotlib.pyplot as plt
>>> phases = generate_modular_phases(elts_dicts = [{"8": 0.5, "14": 0.2, "26": 0.3}, {"8" : 0.5, "23" : 0.3, "7" : 0.2}],
... brstlg_pars = [{"b0" : 0.03, "b1" : 0.01}, {"b0" : 0.002, "b1" : 0.0025}])
>>> plt.plot(phases[0])
>>> plt.plot(phases[1])
_images/models-4.png
espm.models.generate_EDXS_phases.generate_random_phases(n_phases=3, seed=0)[source]

Generate a list of phases of the EDXS model with random elemental compositions and Bremsstrahlung parameters. The model parameters are set to the default values.

Parameters
n_phasesint, optional

Number of phases to generate. The default is 3.

seedint, optional

Seed for the random number generator. The default is 0.

Returns
np.array

Array of phases of the EDXS model.

espm.models.generate_EDXS_phases.unique_elts(dict_list)[source]

Generate a list of unique chemical elements from a list of chemical elements dictionnaries.

Parameters
dict_listlist

List of dictionnaries containing the elements and their relative abundance.

Returns
list

List of unique chemical elements.

Examples

>>> unique_elts([{"elements_dict" : {"18" : 0.2, "8" : 0.5, "12" : 0.3}}, {"elements_dict" : {"8" : 0.5, "26" : 0.3, "14" : 0.2}}])
['18', '8', '12', '26', '14']
Weights
Weights

This module contains functions to generate phase distribution maps. Generate simple maps using espm.weights.generate_weights or more complex maps using espm.weights.abundance.

Weights creation class

The espm.weights.abundance module implements the class Abundance which is used to create the weights of the phases. The weights are stored in the attribute weights and are normalized to 1.

class espm.weights.abundance.Abundance(shape_2d, n_phases)[source]
add_chemical_map(file, element_line, conc_min, conc_max, sigma, phase_id, **kwargs)[source]

Function to add a chemical map extracted from a EDS spectrum image as a phase. A threshold and blurring are automatically applied to the map to suppress noise.

Parameters
filestr

Path to the EDS spectrum image.

element_linestr

Element line to be used for the chemical map. The corresponding element has to be in the metadata of the spectrum image. For example, if the element is ‘Cu’ and the line is ‘K’, the element_line parameter should be ‘Cu_K’.

conc_minfloat

Minimum concentration. It has to be between 0.0 and 1.0.

conc_maxfloat

Maximum concentration. It has to be between 0.0 and 1.0.

sigmafloat

Sigma parameter for the Gaussian filter. It has to be greater than 0.0. Note that the automatic threshold is applied to the blurred map.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

**kwargs

Keyword arguments for the get_lines_intensity method of the EDS spectrum image.

Returns
None.
add_gaussian_ripple(center, width, conc_max, phase_id)[source]

Function to define a gaussian ripple spanning over the whole length of the weights abundance of a defined phase. The concentration is max at the center and 0.0 on the edges.

Parameters
centerint

Position on the mean of the gaussian ripple in pixels.

widthint

Full width at half maximum of the gaussian in pixels.

conc_maxfloat

Maximum concentration of the gaussian ripple. It has to be between 0.0 and 1.0.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

Returns
None.

Examples

>>> from espm.weights.abundance import Abundance
>>> from matplotlib import pyplot as plt
>>> shape_2d = (100,100)
>>> n_phases = 2
>>> gaussian_ripple = Abundance(shape_2d, n_phases)
>>> gaussian_ripple.add_gaussian_ripple(50, 15, 1.0, 1)
>>> plt.imshow(gaussian_ripple.weights[:,:,1])
_images/weights-1.png
add_image(image, phase_id, conc_min, conc_max)[source]

Function to add a 2D numpy array as a phase.

Parameters
imagenumpy.ndarray

2D numpy array with the phase. It has to be the same size as the shape_2d parameter.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

conc_minfloat

Minimum concentration. It has to be between 0.0 and 1.0.

conc_maxfloat

Maximum concentration. It has to be between 0.0 and 1.0.

Returns
None.

Examples

>>> from espm.weights.abundance import Abundance
>>> import hyperspy.api as hs
>>> from matplotlib import pyplot as plt
>>> shape_2d = (512,512)
>>> n_phases = 2
>>> image = Abundance(shape_2d, n_phases)
>>> data = hs.datasets.example_signals.object_hologram()
>>> image.add_image(data.data, 1, 0.0, 1.0)
>>> plt.imshow(image.weights[:,:,1])
_images/weights-2.png
add_laplacian(seed, phase_id, conc_min, conc_max, size_x=50, size_y=50)[source]

Function to generate smooth noise. The characteristic length scale of the noise variations is defined by the scale_x and scale_y parameters.

Parameters
seedint

Seed for the random number generator.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

conc_minfloat

Minimum concentration. It has to be between 0.0 and 1.0.

conc_maxfloat

Maximum concentration. It has to be between 0.0 and 1.0.

size_xint, optional

characteristic length scale of the noise variations in the x direction. The default is 50.

size_yint, optional

characteristic length scale of the noise variations in the y direction. The default is 50.

Returns
None.

Examples

>>> from espm.weights.abundance import Abundance
>>> from matplotlib import pyplot as plt
>>> shape_2d = (100,100)
>>> n_phases = 2
>>> laplacian = Abundance(shape_2d, n_phases)
>>> laplacian.add_laplacian(1, 1, 0.0, 1.0, 10, 10)
>>> plt.imshow(laplacian.weights[:,:,1])
_images/weights-3.png
add_random(seed, phase_id, conc_min, conc_max)[source]

Function to generate random noise.

Parameters
seedint

Seed for the random number generator.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

conc_minfloat

Minimum concentration. It has to be between 0.0 and 1.0.

conc_maxfloat

Maximum concentration. It has to be between 0.0 and 1.0.

Returns
None.

Examples

>>> from espm.weights.abundance import Abundance
>>> from matplotlib import pyplot as plt
>>> shape_2d = (100,100)
>>> n_phases = 2
>>> random = Abundance(shape_2d, n_phases)
>>> random.add_random(1, 1, 0.0, 1.0)
>>> plt.imshow(random.weights[:,:,1])
_images/weights-4.png
add_sphere(ind_origin, radius, conc_max, phase_id, asym_x=1.0, asym_y=1.0)[source]

Function to define a sphere abundance of a defined phase. The concentration is max at centre and 0.0 on the edges.

Parameters
ind_origintuple of integers

Coordinates of the centre of the sphere.

radiusfloat

Radius of the sphere in pixels.

conc_maxfloat

Maximum concentration of the sphere.

phase_idint

Index of the phase

asym_xfloat, optional

Asymmetry of the sphere in the x direction. The default is 1.0.

asym_yfloat, optional

Asymmetry of the sphere in the y direction. The default is 1.0.

Returns
None.

Examples

>>> from espm.weights.abundance import Abundance
>>> from matplotlib import pyplot as plt
>>> shape_2d = (100,100)
>>> n_phases = 2
>>> sphere = Abundance(shape_2d, n_phases)
>>> sphere.add_sphere((50,50), 30, 1.0, 1)
>>> plt.imshow(sphere.weights[:,:,1])
_images/weights-5.png
add_wedge(ind_origin, length, width, conc_min, conc_max, phase_id)[source]

Function to define a wedge abundance of a defined phase. The concentration is max at the top and min at the bottom.

Parameters
ind_origintuple of integers

Coordinates of the top left corner of the wedge. It has to be inside the current weights, i.e. ind_origin[0] + length <= self.shape_2d[0] and ind_origin[1] + width <= self.shape_2d[1].

lengthint

Length of the wedge in pixels.

widthint

Width of the wedge in pixels.

conc_minfloat

Minimum concentration of the wedge. It has to be between 0.0 and 1.0.

conc_maxfloat

Maximum concentration of the wedge. It has to be between 0.0 and 1.0.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

Returns
None.

Examples

>>> from espm.weights.abundance import Abundance
>>> from matplotlib import pyplot as plt
>>> shape_2d = (100,100)
>>> n_phases = 2
>>> wedge = Abundance(shape_2d, n_phases)
>>> wedge.add_wedge((0,0), 50, 50, 0.0, 1.0, 1)
>>> plt.imshow(wedge.weights[:,:,1])
_images/weights-6.png
check_add_weights(val, phase_id)[source]

Check if the sum of the weights is below 1. If it is, add the new abundance. If not, print a message.

Parameters
valarray

Array of the new abundance.

phase_idint

Index of the phase. It has to be between 1 and n_phases-1.

Returns
None.
scale_phase(values, conc_min, conc_max)[source]

Scale the values of a phase between conc_min and conc_max. If the values are all zero, the function returns the values without scaling.

Parameters
valuesarray

Array of the abundance to scale.

conc_minfloat

Minimum concentration of the phase. It has to be between 0.0 and 1.0.

conc_maxfloat

Maximum concentration of the phase. It has to be between 0.0 and 1.0.

Returns
scaled_valuesarray

Array of the scaled abundance.

property weights
Predetermined weights

The :mod: espm.weights.generate_weights module implements functions to generate predetermined weights to quickly and easily generate spatial phase distributions. For a more advanced weights creation, see the :mod: espm.weights.abundance module.

espm.weights.generate_weights.chemical_map_weights(file=None, line_list=[], conc_list=[], **kwargs)[source]

Generate a weight matrix based on an experimental EDS spectrum image. For each selected line a map is generated with the corresponding concentration.

Parameters
shape_2dtuple, optional

Shape of the weight matrix. The default is [80, 80].

filestr, optional

Path to the EDS spectrum image. If None, the function does nothing.

line_listlist, optional

List of lines to be used. The default is []. If line_list is empty, the function does nothing. The lines must be in the form of a string, e.g. “Fe_Ka”.

conc_listlist, optional

List of concentrations for each line. The default is []. If conc_list is empty, the concentration of each line is set to 1/len(line_list).

Returns
weightsnumpy.ndarray

Weight matrix with chemical map distribution.

espm.weights.generate_weights.gaussian_ripple_weights(shape_2d, width=1, seed=0, **kwargs)[source]

Generate a weight matrix with a gaussian ripple of the given width and randomly centered.

Parameters
shape_2dtuple

Shape of the weight matrix.

widthint, optional

Width of the gaussian ripple. The default is 1.

seedint, optional

Seed for the random number generator. The default is 0. If seed is 0, the gaussian ripple is centered at half the second dimension of the weight matrix.

Returns
weightsnumpy.ndarray

Weight matrix with gaussian ripple distribution.

Examples

>>> from espm.weights.generate_weights import gaussian_ripple_weights
>>> import matplotlib.pyplot as plt
>>> weights = gaussian_ripple_weights((100,100), 20, 0)
>>> fig = plt.figure(figsize=(5,2))
>>> axs = fig.subplots(1,2)
>>> for i in range(2):
...     axs[i].imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     axs[i].set_title(f"Map {i+1}")
>>> fig.show()
_images/weights-7.png
espm.weights.generate_weights.generate_weights(weight_type, shape_2d, n_phases=3, seed=0, **params)[source]

Generate a weight matrix with the given type. Additional parameters can be passed to the function as keyword arguments.

Parameters
weight_typestr

Type of weight matrix. Accepted types : random, laplacian, sphere, gaussian_ripple, wedge, toy_problem, chemical_map.

shape_2dtuple

Shape of the weight matrix.

n_phasesint, optional

Number of phases. The default is 3. The first phase is the complementary of the other phases.

seedint, optional

Seed for the random number generator. The default is 0.

**paramsdict

Additional parameters for the weight matrix generation. See the documentation of the corresponding function for more details.

Returns
weightsnumpy.ndarray

Weight matrix with the given type.

espm.weights.generate_weights.laplacian_weights(shape_2d, n_phases=3, seed=0, size_x=10, size_y=10, **kwargs)[source]

Generates a random weight matrix with a laplacian distribution.

The random weight are then filtered with median filter 2 times to smooth the distribution. Eventually, the result is normalized to sum up to one.

Parameters
shape_2dtuple

Shape of the weight matrix.

n_phasesint, optional

Number of phases. The default is 3.

seedint, optional

Seed for the random number generator. The default is 0.

Returns
weightsnumpy.ndarray

Weight matrix with laplacian distribution.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from espm.weights.generate_weights import laplacian_weights
>>> weights = laplacian_weights((100,100), 3, 0)
>>> fig = plt.figure(figsize=(10,3))
>>> axs = fig.subplots(1,3)
>>> for i in range(3):
...     axs[i].imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     axs[i].set_title(f"Map {i+1}")
>>> fig.show()
_images/weights-8.png
espm.weights.generate_weights.random_weights(shape_2d, n_phases=3, seed=0)[source]

Generates a random weight matrix with a uniform distribution.

The random weights are then normalized to sum up to one.

Parameters
shape_2dtuple

Shape of the weight matrix.

n_phasesint, optional

Number of phases. The default is 3.

seedint, optional

Seed for the random number generator. The default is 0.

Returns
weightsnumpy.ndarray

Weight matrix with uniform distribution.

Examples

>>> from espm.weights.generate_weights import random_weights
>>> import matplotlib.pyplot as plt
>>> weights = random_weights((100,100))
>>> fig = plt.figure(figsize=(10,3))
>>> axs = fig.subplots(1,3)
>>> for i in range(3):
...     axs[i].imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     axs[i].set_title(f"Map {i+1}")
>>> fig.show()
_images/weights-9.png
espm.weights.generate_weights.spheres_weights(shape_2d=[80, 80], n_phases=3, seed=0, radius=1, **kwargs)[source]

Generate a weight matrix with randomly placed spheres of the given radius.

Parameters
shape_2dtuple, optional

Shape of the weight matrix. The default is [80, 80].

n_phasesint, optional

Number of phases. The default is 3. The first phase is the complementary of the other phases.

seedint, optional

Seed for the random number generator. The default is 0.

radiusint, optional

Radius of the spheres in pixels. The default is 1.

Returns
weightsnumpy.ndarray

Weight matrix with spheres distribution.

Examples

>>> from espm.weights.generate_weights import spheres_weights
>>> import matplotlib.pyplot as plt
>>> weights = spheres_weights((100,100), 3, 0, 20)
>>> fig = plt.figure(figsize=(10,3))
>>> axs = fig.subplots(1,3)
>>> for i in range(3):
...     axs[i].imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     axs[i].set_title(f"Map {i+1}")
>>> fig.show()
_images/weights-10.png
espm.weights.generate_weights.toy_weights(**kwargs)[source]

Load the toy problem weights.

Returns
Hdotnp.ndarray

The weights of the toy problem.

Examples

>>> from espm.weights.generate_weights import toy_weights
>>> import matplotlib.pyplot as plt
>>> weights = toy_weights()
>>> fig = plt.figure(figsize=(10,3))
>>> axs = fig.subplots(1,3)
>>> for i in range(3):
...     axs[i].imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     axs[i].set_title(f"Map {i+1}")
>>> fig.show()
_images/weights-11.png
espm.weights.generate_weights.wedge_weights(shape_2d=[80, 80])[source]

Generate a weight matrix with a wedge of phase 2 and complementary phase 1.

Parameters
shape_2dtuple, optional

Shape of the weight matrix. The default is [80, 80].

Returns
weightsnumpy.ndarray

Weight matrix with wedge distribution.

Examples

>>> from espm.weights.generate_weights import wedge_weights
>>> import matplotlib.pyplot as plt
>>> weights = wedge_weights((100,100))
>>> fig = plt.figure(figsize=(5,2))
>>> axs = fig.subplots(1,2)
>>> for i in range(2):
...     axs[i].imshow(weights[:,:,i], cmap=plt.cm.gray_r)
...     axs[i].set_title(f"Map {i+1}")
>>> fig.show()
_images/weights-12.png
Datasets
Sample creation

The espm.datasets module implements the functions that combines the spatial distributions generated from espm.weights and the spectra generated from espm.models into a 3D dataset. This part of the espm package manages the integration into the hyperspy framework : - the datasets and their metadata are stored as hyperspy signals (.hspy). - the espm.eds_spim module implements the EDS_espm class, which is a subclass of the hyperspy.signals.Signal1D class.

Using the EDS_espm class, the user can easily use most of the hyperspy functionalities (e.g. plotting, fitting, decomposition, etc.) as well as the espm functionalites on their experimental and simulated data.

Note

For now espm supports only the signals modeled as EDS data but we aim at implementing the signals corresponding to EELS data too.

The module espm.datasets.base implements the functions that combines a spatial distribution and associated spectra into a 3D dataset. It also implements the functions to convert the dataset into hyperspy compatible objects.

espm.datasets.base.generate_dataset(base_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/v0.2.1/lib/python3.7/site-packages/generated_datasets'), sample_number=10, base_seed=0, elements=[], *args, **kwargs)[source]

Generate a set of spectrum images files and save them in the generated dataset folder. Each spectrum image is saved in a separate file and was generated using a different seed.

Parameters
base_pathstr, optional

The path to the folder where the samples will be saved. The default is DATASETS_PATH.

sample_numberint, optional

The number of samples to generate. The default is 10.

base_seedint, optional

The seed used to generate the samples. The default is 0.

Returns
None.
espm.datasets.base.generate_spim(phases, weights, densities, N, seed=0, continuous=False)[source]

Generate a noiseless spectrum image as tensor product of the phases and weights. Then, if asked for, a noisy spectrum image is generated by drawing from a Poisson distribution.

The noiseless spectrum image is defined as:

\[Y^{nl} = N D \otimes ( Diag(d) A )\]

where \(D\) is the normalized phases, \(A\) is the weights, \(d\) is the density modifier and \(N\) is the number of counts per pixel.

To obtain the noisy spectrum image, the noiseless spectrum image is drawn from a Poisson distribution.

Parameters
phasesarray_like

The phases of the model. Shape (n, spectral_len).

weightsarray_like

The weights of the model. Shape (shape_2d[0], shape_2d[1], n).

densitiesarray_like

Density modifier of the phases. Shape (n,).

Nint

The number of counts per pixel.

seedint, optional

Seed for the random number generator. The default is 0.

continuousbool, optional

If True, the function returns a noiseless spectrum image. The default is False.

Returns
numpy.ndarray

The spectrum image. Shape (shape_2d[0], shape_2d[1], spectral_len).

Notes

More details about the spectrum image generation can be found in the contribution: [TPH+23].

espm.datasets.base.generate_spim_sample(phases, weights, model_params, misc_params, seed=0, g_params={})[source]

Generate a dictionary containing: the spectrum image (made with the weights and phases), the ground truth, the model parameters and the misc parameters.

Parameters
phasesarray_like

The phases of the model. Shape (n, spectral_len).

weightsarray_like

The weights of the model. Shape (shape_2d[0], shape_2d[1], n). The weights should sum to one along axis 2.

model_paramsdict

The parameters of the model. For examples see the default parameters in espm.conf.

misc_paramsdict

The misc parameters of the model. For examples see the default parameters in espm.conf.

seedint, optional

The seed for the random number generator. The default is 0.

g_paramsdict, optional

The parameters for the g matrix. The default is {}. Note that for EDXS data the g matrix is not used during the creation of the data.

Returns
sampledict

A dictionary containing the spectrum image, the ground truth, the model parameters and the misc parameters.

espm.datasets.base.sample_to_EDS_espm(sample, elements=[])[source]

Convert dataset to a custom hyperspy signal type called EDS_espm containing the noisy spectrum image as data, the ground truth as metadata and other useful information.

Parameters
sampledict

A dictionary containing the noisy spectrum image as data, the ground truth as metadata and other useful information. See espm.datasets.base.generate_spim_sample() for more details.

elementslist, optional

A list of the elements present in the sample. The default is [].

Returns
EDS_espm

The hyperspy compatible signal object of the espm.eds_spim module.

espm.datasets.base.sample_to_Signal1D(sample)[source]

Same as espm.datasets.base.sample_to_EDS_espm() but for non-EDS data such as the toy dataset.

The module espm.datasets.built_in_EDXS_datasets implements the functions that generate two built-in datasets: - A dataset of 2 particles embedded in a matrix. - A dataset with a linear local accumulation of Sr.

espm.datasets.built_in_EDXS_datasets.generate_built_in_datasets(seeds_range=10)[source]

Generate the two built-in datasets if they are not already present in the datasets folder.

Parameters
seeds_rangeint

The number of seeds to use for the generation of the built-in datasets. The built-in datasets are generated with a base_seed, and then the base_seed + 1, base_seed + 2, etc. up to base_seed + seeds_range -1.

Returns
None
espm.datasets.built_in_EDXS_datasets.load_grain_boundary(sample=0)[source]

Load the built-in dataset of a grain boundary.

Parameters
sampleint

The sample number to load.

Returns
spimhyperspy.signals.EDS_espm

The loaded dataset.

espm.datasets.built_in_EDXS_datasets.load_particules(sample=0)[source]

Load the built-in dataset of particles.

Parameters
sampleint

The sample number to load.

Returns
spimhyperspy.signals.EDS_espm

The loaded dataset.

The module espm.eds_spim implements the EDS_espm class, which is a subclass of the hyperspy.signals.Signal1D class. The main purpose of this class is to provide an easy and clean interface between the hyperspy framework and the espm package: - The metadata are organised to correspond as much as possible to the typical metadata that can be found in hyperspy EDS_TEM object. - The machine learning algorithms of espm can be easily applied to the EDS_espm object using the standard hyperspy decomposition method. See the notebooks for examples. - The EDS_espm provides a convinient way to:

class espm.datasets.eds_spim.EDS_espm(*args, **kwargs)[source]
add_elements(*, elements=[])[source]

Add elements to the existing list of elements in the metadata.

Parameters
elementslist, optional

List of the elements to be added to the existing list of elements in the metadata. They have to be chemical symbols (e.g. [‘Si’,’Fe’, ‘O’]).

build_G(problem_type='bremsstrahlung', reference_elt={})[source]

Build the G matrix of the espm.models.EDXS model corresponding to the metadata of the EDS_espm object and stores it as an attribute.

Parameters
problem_typestr, optional
Determines the type of the G matrix to build. It can be “bremsstrahlung”, “no_brstlg” or “identity”. The parameters correspond to:
  • “bremsstrahlung” : the G matrix is a callable with both characteristic X-rays and a bremsstrahlung model.

  • “no_brstlg” : the G matrix is a matrix with only characteristic X-rays.

  • “identity” : the G matrix is None which is equivalent to an identity matrix for espm functions.

reference_eltdict, optional

Dictionary containing atomic numbers and a corresponding cut-off energies. It is used to separate the characteristic X-rays of the given elements into two energies ranges and assign them each a column in the G matrix instead of having one column per element. For example reference_elt = {“26”,3.0} will separate the characteristic X-rays of the element Fe into two energies ranges and assign them each a column in the G matrix. This is useful to circumvent issues with the absorption.

Returns
GNone or numpy.ndarray or callable

The G matrix of the espm.models.EDXS model corresponding to the metadata of the EDS_espm object.

build_ground_truth(reshape=True)[source]

Get the ground truth stored in the metadata of the EDS_espm object, if available. The reshape arguments can be used to get the ground truth in a form easier to use for machine learning algorithms.

Parameters
reshapebool, optional

If False, the ground truth is returned in the form of a 3D array of shape (shape_2d[0],shape_2d[1],n_phases) and a 2D array of shape (n_phases,n_features).

Returns
phasesnumpy.ndarray

The ground truth of the spectra of the phases.

weightsnumpy.ndarray

The ground truth of the spatial distribution of the phases.

print_concentration_report(abs=False)[source]

Print a report of the chemical concentrations from a fitted W.

Parameters
absbool

If True, print the absolute concentrations, if False, print the relative concentrations.

Returns
None

Notes

  • This function is only available if the learning results contain a decomposition algorithm that has been fitted.

  • The “absolute” concentrations correspond to some physical number. To retrieve the number of atoms per unit volume, you need to multiply by the correct pre-factors such as beam current, detector solid angle, etc…

set_additional_parameters(thickness=2e-05, density=3.5, detector_type='SDD_efficiency.txt', width_slope=0.01, width_intercept=0.065, xray_db='default_xrays.json')[source]

Helper function to set the metadata that are specific to the espm package so that it does not overwrite experimental metadata. See the documentation of the set_analysis_parameters() function for the meaning of the parameters.

set_analysis_parameters(beam_energy=200, azimuth_angle=0.0, elevation_angle=22.0, tilt_stage=0.0, elements=[], thickness=2e-05, density=3.5, detector_type='SDD_efficiency.txt', width_slope=0.01, width_intercept=0.065, xray_db='default_xrays.json')[source]

Helper function to set the metadata of the EDS_espm object. Be careful, it will overwrite the metadata of the object.

Parameters
beam_energyfloat, optional

The energy of the electron beam in keV.

azimuth_anglefloat, optional

The azimuth angle of the EDS detector in degrees.

elevation_anglefloat, optional

The elevation angle of the EDS detector in degrees.

tilt_stagefloat, optional

The tilt angle of the sample stage in degrees (usually it correspond to alpha on FEI instruments).

elementslist, optional

List of the elements to be used in the analysis.

thicknessfloat, optional

The thickness of the sample in centimeters.

densityfloat, optional

The density of the sample in g/cm^3.

detector_typestr, optional

The type of the detector. It is either the name of a text file containing the efficiency of

width_slopefloat, optional

The slope of the linear fit of the detector width as a function of the energy.

width_interceptfloat, optional

The intercept of the linear fit of the detector width as a function of the energy.

xray_dbstr, optional

The name of the X-ray emission cross-section database to be used. The default tables are avalaible in the espm/tables folder. Additional tables can be generated by emtables.

set_fixed_W(phases_dict)[source]

Helper function to create a fixed_W matrix. The output matrix will have -1 entries except for the elements (and bremsstrahlung parameters) that are present in the phases_dict dictionary. In the output (fixed_W) matrix, the -1 entries will be ignored during the decomposition using espm.estimator.NMFEstimator are normally learned while the non-negative entries will be fixed to the values given in the phases_dict dictionary. Usually, the easiest is to fix some elements to 0.0 in some phases if you want to improve unmixing results. For example, if you have a phase with only Si and O, you can fix the Fe element to 0.0 in this phase.

Parameters
phases_dictdict

Determines which elements of fixed_W are going to be non-negative. The dictionnary has typically the following structure : phases_dict = {“phase1_name” : {“Fe” : 0.0, “O” : 1.25e23}, “phase2_name” : {“Si” : 0.0, “b0” : 0.05}}.

Returns
——-
Wnumpy.ndarray
set_microscope_parameters(beam_energy=200, azimuth_angle=0.0, elevation_angle=22.0, tilt_stage=0.0)[source]

Helper function to set the microscope parameters of the EDS_espm object. Be careful, it will overwrite the microscope parameters of the object. See the documentation of the set_analysis_parameters() function for the meaning of the parameters.

update_G(part_W=None, G=None)[source]

Update the absortion part of the bremsstrahlung of the G matrix.

property X

The data in the form of a 2D array of shape (n_samples, n_features).

property Xdot

The ground truth in the form of a 3D array of shape (shape_2d[0],shape_2d[1],n_features), if available.

property maps

Ground truth of the spatial distribution of the phases in the form of a 3D array of shape (shape_2d[0],shape_2d[1],n_phases), if available.

property maps_2d

Ground truth of the spatial distribution of the phases in the form of a 2D array of shape (shape_2d[0]*shape_2d[1],n_phases), if available.

property model

The espm.models.EDXS model corresponding to the metadata of the EDS_espm object.

property phases

Ground truth of the spectra of the phases in the form of a 2D array of shape (n_phases,n_features), if available.

property shape_2d

Shape of the data in the spatial dimension.

espm.datasets.eds_spim.build_G(model, g_params)[source]
espm.datasets.eds_spim.get_metadata(spim)[source]

Get the metadata of the EDS_espm object and format it as a model parameters dictionary.

Estimators

NMF Estimators.

The espm.estimators module implements different NMF algorithms.

The class espm.estimators is an abstract class for all NMF algorithms. It implements the fit and transform methods. The fit method is implemented in the abstract class and calls the _iteration method which is implemented in the child classes. The transform method is implemented in the child classes.

NMF Estimator
class espm.estimators.NMFEstimator(n_components=2, init=None, tol=0.0001, max_iter=200, random_state=None, verbose=1, debug=False, l2=False, G=None, shape_2d=None, normalize=False, log_shift=1e-14, eval_print=10, true_D=None, true_H=None, fixed_H=None, fixed_W=None, hspy_comp=False)[source]

Abstract class for NMF algorithms.

This abstract class espm.estimators.NMFEstimator is used to implement the different NMF algorithms. It solves problems of the form:

\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} \frac{1}{2} L(X, GWH) + R(W, H)\]

where \(X\) is the data matrix, \(G\) is a matrix of known values, \(W\) and \(H\) are the matrices to be learned, \(R\) is a regularization term and \(L\) is a loss function. that represent the generalized KL divergence. As a reminder, the generalized KL divergence is defined as:

\[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

where \(Y = GWH\). Since \(X\) does not depend on \(W\) and \(H\), we obtain the loss function:

\[L(X, Y) = - \sum_{i,j} X_{ij} \log \frac{GWH_{ij}} + GWH_{ij}\]

The Generalized KL divergence has the advantage of being zero when \(X = Y\), which is not the case for our loss. Therefore, we shift the loss function by a constant \(C\) such that it equals the Generalized KL divergence. This constant is stored in the attribute espm.estimators.NMFEstimator.const_KL_.

The loss function can also be selected to be the Frobenius norm. In this case, the loss function is:

\[L(X, Y) = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2\]

While the code will work, it is not recommended to use the Frobenius norm as a loss function. This code is optimized for the KL divergence.

The size of:

  • \(X\) is \((n, p)\),

  • \(W\) is \((m, k)\),

  • \(H\) is \((k, p)\),

  • \(G\) is \((n, m)\).

The columns of the matrices \(H\) and \(X\) are assumed to be images. This is used typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e. shape_2d[0]*shape_2d[1] = p.

Parameters
n_componentsint, default=2

Number of components, i.e. dimensionality of the latent space.

initstr

Method used to initialize the procedure. Default is None The method use the initialization of sklearn.decomposition. It can be imported using: .. code-block::python

>>> from sklearn.decomposition._nmf import _initialize_nmf
tolfloat, default=1e-4

Tolerance of the stopping condition.

max_iterint, default=200

Maximum number of iterations before timing out.

random_stateint, RandomState instance, default=None
verboseint, default=1

The verbosity level.

debugbool, default=False

If True, the algorithm will log more and perform more checks.

l2bool, default=False

If True, the algorithm will use the l2 norm instead of the KL divergence.

Gnp.array, function or None, default=None

If np.array, it is the known matrix of the data. If function, it is a function that takes as input the data matrix and returns the known matrix (np.array). If None, it is assumed that G is the identity matrix.

shape_2dtuple or None, default=None

If not None, it is the image shape of the columns of the matrices \(X\) and \(H\).

normalizebool, default=False

If True, the algorithm will normalize the data matrix \(X\).

log_shiftfloat, default=1e-10

Lower bound for W and H, i.e. \(\epsilon\).

eval_printint, default=10

Number of iterations between each evaluation of the loss function.

true_Dnp.array or None, default=None

Ground truth for the matrix \(GW\). Used for evaluation purposes.

true_Hnp.array or None, default=None

Ground truth for the matrix \(H\). Used for evaluation purposes.

fixed_Hnp.array or None, default=None

If not None, it fixes the non-zero values of the matrix \(H\).

fixed_Wnp.array or None, default=None

If not None, it fixes the non-zero values of the matrix \(W\).

hspy_compbool, default=False

If True, the algorithm will use the format compatible with hyperspy. Use this option if you run the algorithm with the method decompositio in hyperspy. For example: .. code-block::python

>>> est = SmoothNMF( n_components = 3, hspy_comp = True)
>>> out = spim.decomposition(algorithm = est, return_info=True)
fit(X, y=None, **params)[source]

Learn a NMF model for the data X.

Parameters
X{array-like, sparse matrix} of shape (n_samples, n_features)

Data matrix to be decomposed

yIgnored
paramsdict

Parameters passed to the fit_transform method.

Returns
self

The model.

fit_transform(X, y=None, W=None, H=None)[source]

Learn a NMF model for the data X and returns the transformed data. This is more efficient than calling fit followed by transform.

The size of:

  • \(X\) is \((n, p)\),

  • \(W\) is \((m, k)\),

  • \(H\) is \((k, p)\),

  • \(G\) is \((n, m)\).

Parameters
X{array-like, sparse matrix} of shape (n, p)

Data matrix to be decomposed

yIgnored

Not used, present here for API consistency by convention.

Warray-like of shape (m, k)

If specified, it is used as initial guess for the solution.

Harray-like of shape (k, p)

If specified, it is used as initial guess for the solution.

Returns
GWndarrays

Transformed data.

get_losses()[source]
inverse_transform(W)[source]

Transform data back to its original space.

Parameters
W{ndarray, sparse matrix} of shape (n_samples, n_components)

Transformed data matrix.

Returns
X{ndarray, sparse matrix} of shape (n_samples, n_features)

Data matrix of original shape.

loss(W, H, average=True, X=None)[source]

Loss function.

Compute the loss function for the given matrices W and H.

Parameters
Wnp.array

Matrix of shape (n, k)

Hnp.array

Matrix of shape (k, p)

averagebool, default=True

If True, the loss is averaged over the number of elements of the matrices.

Xnp.array or None, default=None

If not None, it is the data matrix. If None, it is assumed that the data matrix in self.X_.

Returns
loss_float

Value of the loss function.

remove_zeros_lines(X, epsilon)[source]
const_KL_ = None
loss_names_ = ['KL_div_loss']
SmoothNMF
class espm.estimators.SmoothNMF(lambda_L=1.0, linesearch=False, mu=0, epsilon_reg=1, algo='log_surrogate', force_simplex=True, dicotomy_tol=1e-05, gamma=None, **kwargs)[source]

SmoothNMF - NMF with a smooth regularization term

We encourage to read the example available in the documentation: https://espm.readthedocs.io/en/latest/introduction/notebooks/toy-problem.html

The corresponding notebook is available on github: https://github.com/adriente/espm/blob/main/notebooks/toy-ML.ipynb

The class SmoothNMF implements the regularized NMF algorithm. It solves problems of the form:

\[\dot{W}, \dot{H} = \arg \min_{W \geq \epsilon, H \geq \epsilon} D_{GKL}(X || GWH) + \lambda_L tr(H \Delta H^\top) + \mu \sum_{ij} \log(H_{ij} + \epsilon_{reg})\]

where:

  • D_{GKL} is the Generalized KL divergence loss function defined as:

    \[D_{GKL}(X || Y) = \sum_{i,j} X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\]

    See the documentation of the class espm.estimators.NMFEstimator for more details.

  • Delta is the Laplacian operator (it can be created using the function create_laplacian_matrix from the utils module).

  • epsilon_{reg} is the slope of the log regularization/sparsity at 0 (you probably want to leave this to 1).

  • lambda_L is a regularization parameter, which encourages smoothness in the columns of H.

  • mu is a regularization parameter, which is similar to an L1 sparsity penalty.

The size of:

  • X is (n, p)

  • W is (m, k)

  • H is (k, p)

  • G is (n, m)

The columns of the matrices H and X are assumed to be images, typically for the smoothness regularization. The parameter shape_2d defines the shape of the images, i.e., shape_2d[0]*shape_2d[1] = p.

Parameters
lambda_Lfloat, default=1.0

Regularization parameter for the smooth regularization term.

linesearchbool, default=False

If True, use a line search to find the step size.

mufloat, default=0

Regularization parameter for the log regularization/sparsity term.

epsilon_regfloat, default=1

Slope of the log regularization/sparsity at 0.

algostr, default=”log_surrogate”

Algorithm to use for the smooth regularization term. Can be “log_surrogate”, “l2_surrogate”, or “projected_gradient”.

force_simplexbool, default=True

If True, force the solution to be in the simplex.

dicotomy_tolfloat, default=1e-3

Tolerance for the dichotomy algorithm.

gammafloat, default=None

Initial value for the step size. If None, it is set to the Lipschitz constant of the gradient.

**kwargsdict

Additional parameters for the NMFEstimator class.

fit_transform(X, y=None, W=None, H=None)[source]

Fit the model to the data X and returns the transformed data.

The size of:

  • \(X\) is \((n, p)\),

  • \(W\) is \((m, k)\),

  • \(H\) is \((k, p)\),

  • \(G\) is \((n, m)\).

Parameters
Xarray-like, shape (n, p)

Data matrix to be decomposed

yIgnored

Not used, present here for API consistency by convention.

Warray-like, shape (m, k)

If init=’custom’, it is used as initial guess for the solution.

Harray-like, shape (k, p)

If init=’custom’, it is used as initial guess for the solution.

Returns
GWndarrays

Transformed data.

loss(W, H, average=True, X=None)[source]

Compute the loss function.

loss_names_ = ['KL_div_loss', 'log_reg_loss', 'Lapl_reg_loss', 'gamma']
Measures

The espm.measures module implements different measures for the matrix factorisation problem. In particular it contains the different losses and regularizers used in espm.estimator module. It also contains different metrics to evaluate the results.

espm.measures.Frobenius_loss(X, W, H, average=False)[source]

Frobenius norm of the difference between X and WH.

Compute the Froebenius norm (elementwise L2 norm of a matrix) given \(X,W,H\):

\[\| X - WH \|_F = \sum_{ji} \left| X_{ij} - (W H)_{ij} \right|^2\]
Parameters
  • X (np.array 2D) – n x m matrix

  • W (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Examples

>>> import numpy as np
>>> from espm.measures import Frobenius_loss
>>> X = np.array([[1, 1, -1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> Frobenius_loss(X, W, H)
    26
espm.measures.KL(X, Y, log_shift=1e-14, average=False)[source]

Generalized KL (Kullback–Leibler) divergence for two matrices

\[D_KL(X || Y) = \sum_{ji} X_{ij} \log (X / Y)_{ij} + (Y - X)_{ij}\]
Parameters
  • X (np.array 2D) – n x m matrix

  • W (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

  • log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module esppy.conf)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

espm.measures.KL_loss_surrogate(X, W, H, Ht, log_shift=1e-14, average=False)[source]

Surrogate loss for the KL divergence.

espm.measures.KLdiv(X, D, H, log_shift=1e-14, average=False)[source]

Generalized KL (Kullback–Leibler) divergence

Compute the generalized KL divergence given \(X,W,H\):

\[D_KL(X || WH) = \sum_{ji} X_{ij} \log (X / D A)_{ij} + (D A - X)_{ij}\]
Parameters
  • X (np.array 2D) – n x m matrix

  • W (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

  • log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module esppy.conf)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

Examples

>>> import numpy as np
>>> from espm.measures import KLdiv
>>> X = np.array([[1, 1, 1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> KLdiv(X, W, H)
    2.921251732961556
espm.measures.KLdiv_loss(X, W, H, log_shift=1e-14, average=False)[source]

Generalized Generalized KL (Kullback–Leibler) divergence loss

Compute the loss based on the generalized KL divergence given :math: X,W,H:

\[\sum_{ji} X_{ij} \log (D W)_{ij} + (D W)_{ij}\]

This does not contains all the term of the KL divergence, only the ones depending on W and H.

Parameters
  • X (np.array 2D) – n x m matrix

  • Y (np.array 2D) – n x m matrix

  • log_shift (float) – small constant to ensure the KL divergence does not explode (default value set in module esppy.conf)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

Examples

>>> import numpy as np
>>> from espm.measures import KLdiv, KLdiv_loss
>>> X = np.array([[1, 1, 1], [2, 4, 5]])
>>> W = np.array([[1], [1]])
>>> H = np.array([[1, 2, 3]])
>>> KLdiv_loss(X, W, H)
    1.9425903651915402ZSXDerzA
>>> KLdiv(X, W, H)
    2.921251732961556
espm.measures.find_min_MSE(true_maps, algo_maps, get_ind=False, unique=False)[source]

Compare all the mean squared errors between ground truth spectra and NMF spectra and find the minimum configuration.

Parameters
  • true_maps (np.array 2D) – true maps with shape (number of phases, number of pixels)

  • algo_maps (np.array 2D) – NMF maps with shape (number of phases, number of pixels)

  • get_ind (boolean) – If True, the function will also return the indices of the NMF maps corresponding to the ground truth

  • unique (boolean) – If False it will find the global minimum but several maps can be associated to the same ground truth

Returns

list of mse, (optionally) tuple of indices

Return type

(list[float],list[int])

..warning::

The output being either a list or a tuple of list isn’t a great idea. It has to change.

espm.measures.find_min_angle(true_vectors, algo_vectors, get_ind=False, unique=False)[source]

Compare all the angles between ground truth spectra and NMF spectra and find the minimum configuration.

Parameters
  • true_vectors (np.array 2D) – true spectra with shape (number of phases, number of energy channels)

  • algo_vectors (np.array 2D) – NMF spectra with shape (number of phases, number of energy channels)

  • get_ind (boolean) – If True, the function will also return the indices of the NMF spectra corresponding to the ground truth

  • unique (boolean) – If False it will find the global minimum but several spectra can be associated to the same ground truth

Returns

list of angles, (optionally) tuple of indices

Return type

(list[float],list[int])

..warning::

The output being either a list or a tuple of list isn’t a great idea. It has to change.

espm.measures.find_min_config(true_maps, true_spectra, algo_maps, algo_spectra, angles=True)[source]

Determines the best match between the true and the NMF spectra and maps by finding the configuration that minimizes either the sum of angles or the sum of MSE. The function returns a warning (boolean) if the MSE and the angles disagree on the best configuration.

Parameters
  • true_maps (np.array 2D) – true maps with shape (number of phases, number of pixels)

  • true_spectra (np.array 2D) – true spectra with shape (number of phases, number of energy channels)

  • algo_maps (np.array 2D) – NMF maps with shape (number of phases, number of pixels)

  • algo_spectra (np.array 2D) – NMF spectra with shape (number of phases, number of energy channels)

  • angles (boolean) – If True, the function will minimize the sum of angles between true and NMF spectra. If False, it will minimize the sum of MSE between true and NMF maps.

Returns

list of angles, list of MSE, configuration of the best match, warning

espm.measures.global_min(matr)[source]
espm.measures.log_reg(H, mu, epsilon=1, average=False)[source]

Log regularisation

Compute the regularization loss:

\[R(\mu, H, \epsilon) = \sum_{ij} \mu_i \log \left( H_{ij} + \epsilon \right).\]
Parameters
  • H (np.array 2D) – n x m matrix

  • mu (np.array 1D) – n vector

  • epsilon (float) – value of \(\epsilon\) (default 1)

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

espm.measures.log_surrogate(H, Ht, mu, epsilon, average=False)[source]

Surrogate loss for the log function.

espm.measures.mae(map1, map2)[source]

Mean average error

Calculate the mean average error between two 2D arrays of the same dimension.

Parameters
  • map1 (np.array 2D) – first array

  • map2 (np.array 2D) – second array

Returns

the answer

Examples

>>> import numpy as np
>>> from espm.measures import mae
>>> map1 = np.array([[0, 1][0, 1]])
>>> map2 = np.array([[1, 1][1, 1]])
>>> mae(map1, map2)
    0.5
espm.measures.mse(map1, map2)[source]

Mean square error

Calculate the mean squared error between two 2D arrays of the same dimension.

Parameters
  • map1 (np.array 2D) – first array

  • map2 (np.array 2D) – second array

Returns

the answer

Examples

>>> import numpy as np
>>> from espm.measures import mse
>>> map1 = np.array([[0, 1][0, 1]])
>>> map2 = np.array([[1, 1][1, 1]])
>>> mse(map1, map2)
    0.5
espm.measures.ordered_angles(true_spectra, algo_spectra, input_inds)[source]

See ordered mse

espm.measures.ordered_mae(true_maps, algo_maps, input_inds)[source]

input : p x Npx matrix of floats, p x Npx matrix of floats, list of integers output : list of floats %————————-% Takes true maps of p phases and Npx pixels, reconstructed maps of the same size and indices of the correspondance between true phases and reconstructed phases returns the mean average errors of each phase in truth order.

espm.measures.ordered_mse(true_maps, algo_maps, input_inds)[source]

input : p x Npx matrix of floats, p x Npx matrix of floats, list of integers output : list of floats %————————-% Takes true maps of p phases and Npx pixels, reconstructed maps of the same size and indices of the correspondance between true phases and reconstructed phases returns the mean squared errors of each phase in truth order.

espm.measures.ordered_r2(true_maps, algo_maps, input_inds)[source]

input : p x Npx matrix of floats, p x Npx matrix of floats, list of integers output : list of floats %————————-% Takes true maps of p phases and Npx pixels, reconstructed maps of the same size and indices of the correspondance between true phases and reconstructed phases returns the coefficient of determination of each phase in truth order.

espm.measures.r2(map_true, map_pred)[source]

\(R^2\) - Coefficient of determination

Calculates the coefficient of determination between two 2D arrays of the same dimension. This is also called regression score function. See wikipedia.

This function is a wrapper for the function sklearn.metrics.r2_score of Scikit Learn.

Parameters
  • map1 (np.array 2D) – first array

  • map2 (np.array 2D) – second array

Returns

the answer

Return type

float

espm.measures.residuals(data, model)[source]
espm.measures.spectral_angle(v1, v2)[source]

Spectral angle

Calculate the angle between two spectra of the same dimension.

Parameters
  • v1 (np.array 1D) – first spectrum

  • v2 (np.array 1D) – second spectrum

Returns

the answer

Return type

float

Examples

>>> import numpy as np
>>> from espm.measures import spectral_angle
>>> v1 = np.array([0, 1, 0])
>>> v2 = np.array([1, 0, 1])
>>> spectral_angle(v1, v2)
    90.0
espm.measures.squared_distance(x, y=None)[source]

Squared distance between two between all colon vectors matrices.

Calculate the squared L2 distance between all pairs of vectors of two matrices. If only one matrix is given, the function uses each pair of vector of this matrix.

Parameters
  • x (np.array 2D) – n x m matrix of first colon vectors

  • y (np.array 2D) – n x m matrix of second colon vectors (optional)

Returns

the answer

Return type

np.array (m x m)

Examples

>>> import numpy as np
>>> from espm.measures import square_distance
>>> x = np.arange(3)
>>> square_distance(x, x)
    array([[ 0.,  1.,  2.],
    [ 1.,  0.,  1.],
    [ 2.,  1.,  0.]])
espm.measures.trace_xtLx(L, x, average=False)[source]

Trace of \(X^T L X\)

Compute the following expression \(\text{Tr} (X^T L X)\).

Parameters
  • L (np.array 2D) – n x n matrix

  • mu (np.array 2D) – n x k martrix of k n-sized vector

  • average (boolean) – replace the sum with a mean, i.e., divide the result by n*m (default False)

Returns

the answer

Return type

float

espm.measures.unique_min(matrix)[source]

From a square matrix of float values, finds the combination of elements with different lines which mimises the sum of elements.

It is a brute force algorithm, it is not recommended to input a matrix bigger than 20 x 20

Parameters

matrix (np.array 2D) – square matrix

Returns

list of unique min values and corresponding indices in the same order

Return type

(list, list[int])

Examples

>>> import numpy as np
>>> from espm.measures import unique_min
>>> matrix = np.array([[1.2,  1.3,  3.5],
                    [4.9,  2.2,  6.5],
                   [9.0,  4.1,  1.8]])
>>> unique_min(v1, v2)   
    ([1.2, 2.2, 1.8], (0, 1, 2))
Utils

Utils for the ESPM package

espm.utils.approx_density(atomic_fraction=False, *, elements_dict={})[source]

Wrapper to the density_of_mixture function of hyperspy. Takes a dict of chemical composition expressed in atomic weight fractions. Returns an approximated density.

espm.utils.arg_helper(params, d_params)[source]

Check if all parameter of d_params are in params. If not, they are added to params with the default value.

Parameters
paramsdict

Dictionary of parameters to be checked.

d_paramsdict

Dictionary of default parameters.

Returns
paramsdict

Dictionary of parameters with the default parameters added if not present.

espm.utils.atomic_to_weight_dict(*, elements_dict={})[source]

Wrapper to the atomic_to_weight function of hyperspy. Takes a dict of chemical composition expressed in atomic fractions. Returns a dict of chemical composition expressed in atomic weight fratiom.

espm.utils.bin_spim(data, n, m)[source]

Take a 3D array of size (x,y,k) [px, py, e] Returns a 3D array of size (n,m,k) [new_px, new_py, e]

espm.utils.check_keys(params, d_params, upperkeys='', toprint=True)[source]

Check if all parameter of d_params are in params. If not, they are added to params with the default value.

Parameters
paramsdict

Dictionary of parameters to be checked.

d_paramsdict

Dictionary of default parameters.

upperkeysstr

String of the upper keys.

toprintbool

If True, print the warning.

Returns
paramsdict

Dictionary of parameters with the default parameters added if not present.

Examples

>>> params = {'a':1,'b':2}
>>> d_params = {'a':1,'b':2,'c':3}
>>> check_keys(params,d_params)
>>> params
{'a': 1, 'b': 2, 'c': 3}
espm.utils.close_all()[source]

Close all opened windows.

espm.utils.create_laplacian_matrix(nx, ny=None)[source]

Helper method to create the laplacian matrix for the laplacian regularization

Parameters
:param nx: height of the original image
:param ny: width of the original image
Returns
rtype

scipy.sparse.csr_matrix ..

:return:the n x n laplacian matrix, where n = nx*ny
espm.utils.is_number(i)[source]

Return True if i is a number

Parameters

i (any) – variable to check

Returns

True if i is a number

Return type

bool

espm.utils.is_symbol(i)[source]

Return True if i is a chemical symbol

Parameters

i (any) – variable to check

Returns

True if i is a chemical symbol

Return type

bool

espm.utils.isdict(p)[source]

Return True if the variable a dictionary.

Parameters

p (any) – variable to check

Returns

True if p is a dictionary

Return type

bool

espm.utils.number_to_symbol_dict(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with atomic numbers as keys (e.g. 26 for Fe) returns a dict of elements with symbols as keys (e.g. Fe for iron)

espm.utils.number_to_symbol_list(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with symbols as keys (e.g. Fe for iron) returns a dict of elements with atomic numbers as keys (e.g. 26 for iron)

espm.utils.process_losses(losses)[source]

Process the losses to be plotted

Parameters
losses: np.ndarray

Array of losses (output of espm.estimators.NMFEstimator.get_losses method)

Returns
values: np.ndarray

Array of values

names: list

List of names

espm.utils.rescaled_DH(D, H)[source]

Rescale the matrices D and H such that the columns of H sums approximately to one.

Parameters
  • D (np.array 2D) – n x k matrix

  • H (np.array 2D) – k x m matrix

Returns

D_rescale, H_rescale

Return type

np.array 2D, np.array 2D

espm.utils.symbol_list()[source]
espm.utils.symbol_to_number_dict(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with symbols as keys (e.g. Fe for iron) returns a dict of elements with atomic numbers as keys (e.g. 26 for iron)

espm.utils.symbol_to_number_list(func)[source]

Decorator Takes a dict of elements (a.k.a chemical composition) with symbols as keys (e.g. Fe for iron) returns a dict of elements with atomic numbers as keys (e.g. 26 for iron)

Table utils

The espm.table_utils module implements some methods to manipulate the tables produces by the emtables package.

Using this module it is possible to load the tables and to import custom k-factors into the tables.

Note

The emtables package can be found here: https://github.com/adriente/emtables

espm.tables_utils.get_k_factor(table, mdata, element, line, range=0.5, ref_elt='14', ref_line='KL3', ref_range=0.5)[source]

Obtain the k-factor of a line from an emtables, X-ray emission cross section table.

Parameters
tabledict

The table of the X-ray emission cross sections.

mdatadict

The metadata of the table.

elementint

The atomic number of the element to use.

linestr

The regex of the line to use. It has to correspond to IUPAC notation.

rangefloat

The energy range to use for the integration of the cross section of the line. For example, if range = 0.5, the integration will be done between the energy of the line - 0.5 and the energy of the line + 0.5. We do so that when you select the “KL3” line, it integrates around it and make it correspond to the K-alpha bunch of lines.

ref_eltint

The atomic number of the element to use as a reference for the k-factor. The default reference line is Si “KL3” with an integration range of 0.5.

ref_linestr

The regex of the line to use as a reference for the k-factor. It has to correspond to IUPAC notation.

ref_rangefloat

The energy range to use for the integration of the cross section of the reference line.

Returns
k_factorfloat

The k-factor of the line. It does not take into account the absorption correction.

espm.tables_utils.import_k_factors(table, mdata, k_factors_names, k_factors_values, ref_name)[source]

Modify the X-ray emission cross-sections of the input table using the k-factors input, i.e. imposing cross-sections ratios to correspond to the k-factors. The metadata are modified too to keep track of the modifications.

Parameters
tabledict

The table of the X-ray emission cross sections.

mdatadict

The metadata of the table.

k_factors_nameslist

The list of the names of the k-factors to import. It has to correspond to the nomenclature of the hyperspy X-ray lines.

k_factors_valueslist

The list of the values of the k-factors to import. It has to have the same length and ordering as k_factors_names.

ref_namestr

The name of the X-ray line to use as a reference for the k-factors. It has to correspond to the nomenclature of the hyperspy X-ray lines.

Returns
new_tabledict

The modified table of the X-ray emission cross sections.

new_mdatadict

The modified metadata of the table.

espm.tables_utils.load_table(db_name)[source]

Load the table and metadata of a json table generated by emtables.

Parameters
db_namestr

The file name of the table to load.

Returns
tabledict

The table of the cross sections.

metadatadict

The metadata of the table.

Notes

Call espm.conf.DB_PATH to get the folder of the tables.

espm.tables_utils.modify_table_lines(table, mdata, elements, line, coeff)[source]

Modify the cross section of the lines of the selected elements in the input table.

Parameters
tabledict

The table of the X-ray emission cross sections.

mdatadict

The metadata of the table.

elementslist

The list of the atomic numbers of the elements to modify.

linestr

The regex of the line to modify. It has to correspond to IUPAC notation.

coefffloat

The coefficient to multiply the cross section of the selected lines.

Returns
new_tabledict

The modified table of the X-ray emission cross sections.

new_mdatadict

The modified metadata of the table.

Notes

X-ray line regex examples : input “L” will modify all the L lines, input “L3” will modifiy all the L3 lines, input “L3M2” will modify the “L3M2” line.

espm.tables_utils.save_table(filename, table, mdata)[source]

Saves a table and its metadata in a json file. The structure of the json file is compliant with espm.

Changelog

All notable changes to this project will be documented in this file. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

0.1.4 (2023-22-02)

First inofficial release

  • First version of the package on PyPi

  • All main functionality tested

  • Basic documentation

0.2.0 (2023-20-04)

Second inofficial release

  • All important part of the documentation are present

  • This is a test before the first official release 1.0.0

Contributing

Contributions are welcome, and they are greatly appreciated! The development of this package takes place on GitHub. Issues, bugs, and feature requests should be reported there. Code and documentation can be improved by submitting a pull request. Please add documentation and tests for any new code.

The package can be set up (ideally in a fresh virtual environment) for local development with the following:

$ git clone https://github.com/adriente/espm.git
$ pip install -e ".[dev]"

You can improve or add functionality in the espm folder, along with corresponding unit tests in espm/tests/test_*.py (with reasonable coverage). If you have a nice example to demonstrate the use of the introduced functionality, please consider adding a notebook in doc/introduction/notebooks. In general, this can be done as a relative symbolic link to a notebook in the notebooks folder.

Update README.rst and CHANGELOG.rst if applicable.

After making any change, please run the tests, and build the documentation with the following

$ make test
$ make doc

To iterate faster, you can partially run the test suite, at various degrees of granularity, as follows:

$ pytest espm.test.test_utils.py
$ pytest espm.test.test_estimators.py
Making a release
  1. Update the version number and release date in setup.py, espm/__init__.py and CHANGELOG.rst.

  2. Create a git tag with git tag -a v0.2.0 -m "espm v0.2.0".

  3. Push the tag to GitHub with git push --tag. The tag should now appear in the releases and tags tab.

  4. Create a release on GitHub and select the created tag.

  5. Build the distribution with make dist and check that the dist/espm-0.2.0.tar.gz source archive contains all required files. The binary wheel should be found as dist/espm-0.2.0.py3-none-any.whl.

  6. Test the upload and installation process:

    $ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
    $ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple espm
    
  7. Build and upload the distribution to the real PyPI with make release.

Repository organization
LICENSE.txt         Project license
*.rst               Important documentation
Makefile            Targets for make
setup.py            Meta information about package (published on PyPI)
.gitignore          Files ignored by the git revision control system
.travis.yml         Defines testing on Travis continuous integration

espm/              Contains the modules (the actual toolbox implementation)
 __init.py__        Load modules at package import
 *.py               One file per module
espm/tables/       Contains data tables for the EDXS and EDSS models
 __init.py__        Load modules at package import
 *.json             data_files for the package
 *.txt              data_files for the package

espm/tests/        Contains the test suites (will be distributed to end user)
 __init.py__        Load modules at package import
 test_*.py          One test suite per module
 test_docstrings.py Test the examples in the docstrings (reference doc)

doc/                Package documentation
 conf.py            Sphinx configuration
 index.rst          Documentation entry page
 *.rst              Include doc files from root directory

doc/reference/      Reference documentation
 index.rst          Reference entry page
 *.rst              Only directives, the actual doc is alongside the code

doc/introduction/
 index.rst          Tutorials entry page
 *.rst              One file per tutorial

doc/introduction/notebooks/
 *.ipynb            One notebook per tutorial

References

CNC84

JN Chapman, WAP Nicholson, and PA Crozier. Understanding thin film x-ray spectra. Journal of Microscopy, 136(2):179–191, 1984.

dlPPF+22

Francisco de la Peña, Eric Prestat, Vidar Tonaas Fauske, Pierre Burdet, Jonas Lähnemann, Petras Jokubauskas, Tom Furnival, Magnus Nord, Tomas Ostasevicius, Katherine E. MacArthur, Duncan N. Johnstone, Mike Sarahan, Joshua Taillon, Thomas Aarholt, pquinn-dls, Vadim Migunov, Alberto Eljarrat, Jan Caron, Carter Francis, T. Nemoto, Timothy Poon, Stefano Mazzucco, actions-user, Nicolas Tappy, Niels Cautaerts, Suhas Somnath, Tom Slater, Michael Walls, Florian Winkler, and Håkon Wiik Ånes. Hyperspy/hyperspy: release v1.7.3. October 2022. URL: https://doi.org/10.5281/zenodo.7263263, doi:10.5281/zenodo.7263263.

SP09

F Scholze and M Procop. Modelling the response function of energy dispersive x-ray spectrometers with silicon detectors. X-Ray Spectrometry: An International Journal, 38(4):312–321, 2009.

TPH+23

Adrien Teurtrie, Nathanaël Perraudin, Thomas Holvoet, Hui Chen, Duncan TL Alexander, Guillaume Obozinski, and Cécile Hébert. Espm: a python library for the simulation of stem-edxs datasets. Ultramicroscopy, pages 113719, 2023.