Datasets

Sample creation

The espm.datasets module implements the functions that combine the spatial distributions generated from espm.weights and the spectra generated from espm.models into a 3D dataset. This part of the espm package manages the integration into the hyperspy framework:

  • The datasets and their metadata are stored as hyperspy signals (.hspy).

  • The espm.datasets.eds_spim module implements the EDS_espm class, which is a subclass of the hyperspy.signals.Signal1D class.

Using the EDS_espm class, the user can easily apply most of the hyperspy functionalities (e.g. plotting, fitting, decomposition, etc.) as well as the espm functionalities to their experimental and simulated data.

Note

For now, espm only supports signals modeled as EDS data, but we aim to support signals corresponding to EELS data as well.

The module espm.datasets.base implements the functions that combine a spatial distribution and the associated spectra into a 3D dataset. It also implements the functions that convert the dataset into hyperspy-compatible objects.

espm.datasets.base.generate_dataset(*args, base_path=DATASETS_PATH, sample_number=10, base_seed=0, elements=[], **kwargs)[source]

Generate a set of spectrum image files and save them in the generated_datasets folder. Each spectrum image is saved in a separate file and is generated using a different seed.

Parameters:
base_path : str, optional

The path to the folder where the samples will be saved. The default is DATASETS_PATH.

sample_number : int, optional

The number of samples to generate. The default is 10.

base_seed : int, optional

The seed used to generate the samples. The default is 0.

Returns:
None.

espm.datasets.base.generate_spim(phases, weights, densities, N, seed=0, continuous=False)[source]

Generate a noiseless spectrum image as the tensor product of the phases and weights. Unless continuous is True, a noisy spectrum image is then generated by drawing from a Poisson distribution and returned instead.

The noiseless spectrum image is defined as:

\[Y^{nl} = N D \otimes ( Diag(d) A )\]

where \(D\) contains the normalized phases, \(A\) contains the weights, \(d\) is the vector of density modifiers and \(N\) is the number of counts per pixel.

The noisy spectrum image is obtained by drawing each entry from a Poisson distribution whose mean is the corresponding entry of the noiseless spectrum image.

Parameters:
phases : array_like

The phases of the model. Shape (n, spectral_len).

weights : array_like

The weights of the model. Shape (shape_2d[0], shape_2d[1], n).

densities : array_like

Density modifier of the phases. Shape (n,).

N : int

The number of counts per pixel.

seed : int, optional

Seed for the random number generator. The default is 0.

continuous : bool, optional

If True, the function returns a noiseless spectrum image. The default is False.

Returns:
numpy.ndarray

The spectrum image. Shape (shape_2d[0], shape_2d[1], spectral_len).

Notes

More details about the spectrum image generation can be found in the contribution: [TPH+23].
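The construction described for generate_spim can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the espm implementation: the name sketch_spim is hypothetical, and "normalized phases" is assumed to mean that each phase spectrum sums to one.

```python
import numpy as np


def sketch_spim(phases, weights, densities, N, seed=0, continuous=False):
    """Illustrative stand-in for generate_spim (hypothetical, not espm code)."""
    phases = np.asarray(phases, dtype=float)        # (n, spectral_len)
    weights = np.asarray(weights, dtype=float)      # (h, w, n)
    densities = np.asarray(densities, dtype=float)  # (n,)

    # Assumption: each phase spectrum is normalized to sum to one.
    D = phases / phases.sum(axis=1, keepdims=True)

    # Diag(d) A: scale the weight of each phase by its density modifier.
    scaled_weights = weights * densities

    # Contract the phase axis: (h, w, n) x (n, spectral_len) -> (h, w, spectral_len).
    noiseless = N * np.tensordot(scaled_weights, D, axes=([2], [0]))
    if continuous:
        return noiseless

    # Poisson draw with the noiseless image as the per-entry mean.
    rng = np.random.default_rng(seed)
    return rng.poisson(noiseless)


rng = np.random.default_rng(1)
phases = rng.random((2, 50))
weights = np.full((4, 5, 2), 0.5)  # two phases, equal weights in every pixel

noiseless = sketch_spim(phases, weights, [1.0, 1.0], N=100, continuous=True)
noisy = sketch_spim(phases, weights, [1.0, 1.0], N=100, seed=0)
```

With weights that sum to one and density modifiers equal to one, every pixel of the noiseless image sums to N, which is what "number of counts per pixel" means in this sketch.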

espm.datasets.base.generate_spim_sample(phases, weights, model_params, misc_params, seed=0, g_params={})[source]

Generate a dictionary containing: the spectrum image (made with the weights and phases), the ground truth, the model parameters and the misc parameters.

Parameters:
phases : array_like

The phases of the model. Shape (n, spectral_len).

weights : array_like

The weights of the model. Shape (shape_2d[0], shape_2d[1], n). The weights should sum to one along axis 2.

model_params : dict

The parameters of the model. For examples see the default parameters in espm.conf.

misc_params : dict

The misc parameters of the model. For examples see the default parameters in espm.conf.

seed : int, optional

The seed for the random number generator. The default is 0.

g_params : dict, optional

The parameters for the g matrix. The default is {}. Note that for EDXS data the g matrix is not used during the creation of the data.

Returns:
sample : dict

A dictionary containing the spectrum image, the ground truth, the model parameters and the misc parameters.
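The requirement that the weights sum to one along axis 2 can be met by normalizing any set of non-negative maps per pixel. The sketch below uses hypothetical random maps and only NumPy; it does not call espm.

```python
import numpy as np

rng = np.random.default_rng(0)
shape_2d, n_phases = (8, 8), 3

# Hypothetical unnormalized, strictly positive maps, one per phase.
raw_maps = rng.random((*shape_2d, n_phases)) + 1e-3

# Divide by the per-pixel total so that the weights sum to one along axis 2.
weights = raw_maps / raw_maps.sum(axis=2, keepdims=True)
```

The resulting array has the shape (shape_2d[0], shape_2d[1], n) expected for the weights argument.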

espm.datasets.base.sample_to_EDS_espm(sample, elements=[])[source]

Convert a sample to a custom hyperspy signal type called EDS_espm containing the noisy spectrum image as data, the ground truth as metadata and other useful information.

Parameters:
sample : dict

A dictionary containing the spectrum image, the ground truth, the model parameters and the misc parameters. See espm.datasets.base.generate_spim_sample() for more details.

elements : list, optional

A list of the elements present in the sample. The default is [].

Returns:
EDS_espm

The hyperspy-compatible signal object of the espm.datasets.eds_spim module.

espm.datasets.base.sample_to_Signal1D(sample)[source]

Same as espm.datasets.base.sample_to_EDS_espm() but for non-EDS data such as the toy dataset.

The module espm.datasets.built_in_EDXS_datasets implements the functions that generate two built-in datasets:

  • A dataset of 2 particles embedded in a matrix.

  • A dataset with a linear local accumulation of Sr.

espm.datasets.built_in_EDXS_datasets.generate_built_in_datasets(seeds_range=10)[source]

Generate the two built-in datasets if they are not already present in the datasets folder.

Parameters:
seeds_range : int

The number of seeds to use for the generation of the built-in datasets. The built-in datasets are generated with base_seed, then base_seed + 1, base_seed + 2, etc., up to base_seed + seeds_range - 1.

Returns:
None
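The seed sequence described for seeds_range can be illustrated as follows. The sketch assumes NumPy's default_rng as the random generator, which may differ from what espm uses internally, and the noiseless spectrum is a made-up constant.

```python
import numpy as np

base_seed, seeds_range = 0, 3

# base_seed, base_seed + 1, ..., base_seed + seeds_range - 1
seeds = [base_seed + i for i in range(seeds_range)]

# One Poisson draw per seed from the same hypothetical noiseless spectrum.
noiseless = np.full(200, 5.0)
samples = {seed: np.random.default_rng(seed).poisson(noiseless) for seed in seeds}

# Re-using a seed reproduces the corresponding sample exactly.
repeat = np.random.default_rng(seeds[0]).poisson(noiseless)
```

Each seed yields a different noise realization, while a given seed always yields the same one, which is what makes the generated samples reproducible.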
espm.datasets.built_in_EDXS_datasets.load_grain_boundary(sample=0)[source]

Load the built-in dataset of a grain boundary.

Parameters:
sample : int

The sample number to load.

Returns:
spim : hyperspy.signals.EDS_espm

The loaded dataset.

espm.datasets.built_in_EDXS_datasets.load_particules(sample=0)[source]

Load the built-in dataset of particles.

Parameters:
sample : int

The sample number to load.

Returns:
spim : hyperspy.signals.EDS_espm

The loaded dataset.

The module espm.datasets.eds_spim implements the EDS_espm class, which is a subclass of the hyperspy.signals.Signal1D class. The main purpose of this class is to provide an easy and clean interface between the hyperspy framework and the espm package:

  • The metadata are organised to correspond as much as possible to the typical metadata found in a hyperspy EDS_TEM object.

  • The machine learning algorithms of espm can be easily applied to the EDS_espm object using the standard hyperspy decomposition method. See the notebooks for examples.

  • The EDS_espm class provides a convenient way to:

class espm.datasets.eds_spim.EDS_espm(*args, **kwargs)[source]
add_elements(*, elements=[])[source]

Add elements to the existing list of elements in the metadata.

Parameters:
elements : list, optional

List of the elements to be added to the existing list of elements in the metadata. They have to be chemical symbols (e.g. ['Si', 'Fe', 'O']).

build_G(problem_type='bremsstrahlung', reference_elt={}, stoichiometries=[])[source]

Build the G matrix of the espm.models.EDXS model corresponding to the metadata of the EDS_espm object and store it as an attribute.

Parameters:
problem_type : str, optional

Determines the type of the G matrix to build. It can be 'bremsstrahlung', 'no_brstlg' or 'identity'. The options correspond to:
  • 'bremsstrahlung' : the G matrix is a callable with both characteristic X-rays and a bremsstrahlung model.

  • 'no_brstlg' : the G matrix is a matrix with only characteristic X-rays.

  • 'identity' : the G matrix is None, which is equivalent to an identity matrix for espm functions.

reference_elt : dict, optional

Dictionary mapping atomic numbers to the corresponding cut-off energies. It is used to separate the characteristic X-rays of the given elements into two energy ranges and assign each range its own column in the G matrix, instead of having one column per element. For example, reference_elt = {"26": 3.0} will separate the characteristic X-rays of the element Fe into two energy ranges and assign each of them a column in the G matrix. This is useful to circumvent issues with the absorption.

stoichiometries : list, optional

List of the stoichiometries of the phases in the sample. If the stoichiometry of one of the phases is known, it can be used to improve the accuracy of the decomposition by fixing the ratio between certain elements. Each composition of the list should be given as a string such as "Fe2O3" or "FeO". A corresponding model element will be added in the metadata. This feature is best used in combination with a fixed W matrix, see the EDS_espm.set_fixed_W() method.

Returns:
G : None or numpy.ndarray or callable

The G matrix of the espm.models.EDXS model corresponding to the metadata of the EDS_espm object.
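The effect of reference_elt can be pictured as grouping the lines of an element by energy. The sketch below is only an illustration: the function split_lines, the column labels such as "26_lo" and the rounded line energies (in keV) are assumptions, and espm reads its own X-ray tables.

```python
# Approximate line energies in keV, keyed by atomic number (illustrative values).
lines = {
    "26": {"Ka": 6.40, "Kb": 7.06, "La": 0.70},  # Fe
    "14": {"Ka": 1.74},                          # Si
}
reference_elt = {"26": 3.0}  # split Fe lines at 3.0 keV


def split_lines(lines, reference_elt):
    """Group the lines of each element; one group per hypothetical G column."""
    columns = {}
    for z, energies in lines.items():
        if z in reference_elt:
            cutoff = reference_elt[z]
            # Two columns for this element: below and above the cut-off.
            columns[f"{z}_lo"] = sorted(n for n, e in energies.items() if e < cutoff)
            columns[f"{z}_hi"] = sorted(n for n, e in energies.items() if e >= cutoff)
        else:
            # Default behaviour: a single column per element.
            columns[z] = sorted(energies)
    return columns


columns = split_lines(lines, reference_elt)
```

Here the Fe L line ends up in one group and the Fe K lines in another, so the two groups can be scaled independently, which is how the split helps when absorption affects low-energy lines more strongly.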

build_ground_truth(reshape=True)[source]

Get the ground truth stored in the metadata of the EDS_espm object, if available. The reshape argument can be used to get the ground truth in a form easier to use for machine learning algorithms.

Parameters:
reshape : bool, optional

If False, the ground truth is returned in the form of a 3D array of shape (shape_2d[0], shape_2d[1], n_phases) and a 2D array of shape (n_phases, n_features).

Returns:
phases : numpy.ndarray

The ground truth of the spectra of the phases.

weights : numpy.ndarray

The ground truth of the spatial distribution of the phases.

carto_fixed_W(brstlg_comps=1)[source]

Helper function to create a fixed_W matrix for chemical mapping. The output matrix can be used to perform a decomposition with as many components as there are chemical elements, where each component is allowed to contain only one element. The spectral components are then the characteristic peaks of each element and the spatial components are the associated chemical maps. The bremsstrahlung is calculated separately and added to the other components.

Parameters:
brstlg_comps : int, optional

Number of bremsstrahlung components to add to the decomposition.

Returns:
W : numpy.ndarray

print_concentration_report(abs=False, selected_elts=[], W_input=None)[source]

Print a report of the chemical concentrations from a fitted W.

Parameters:
abs : bool

If True, print the absolute concentrations, if False, print the relative concentrations.

selected_elts : list, optional

List of the elements to be printed. If empty, all the elements will be printed.

W_input : numpy.ndarray, optional

If not None, the concentrations will be computed from this W matrix instead of the one fitted during the decomposition.

Returns:
None

Notes

  • This function is only available if the learning results contain a decomposition algorithm that has been fitted.

  • The “absolute” concentrations correspond to some physical number. To retrieve the number of atoms per unit volume, you need to multiply by the correct pre-factors such as beam current, detector solid angle, etc…
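The difference between absolute and relative concentrations can be illustrated with a small NumPy example. The W values are made up, the layout (rows are elements, columns are phases) is an assumption, and normalizing each phase to sum to one is only one plausible definition of "relative"; the espm report may compute it differently.

```python
import numpy as np

elements = ["Fe", "O", "Si"]

# Hypothetical fitted W restricted to element rows: rows = elements, columns = phases.
W = np.array([
    [2.0e22, 0.0],
    [3.0e22, 4.0e22],
    [0.0, 2.0e22],
])

# Assumed definition: relative concentrations are fractions within each phase.
relative = W / W.sum(axis=0, keepdims=True)

for j in range(W.shape[1]):
    row = ", ".join(f"{el}: {relative[i, j]:.2f}" for i, el in enumerate(elements))
    print(f"phase {j}: {row}")
```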

set_additional_parameters(thickness=2e-05, density=3.5, detector_type='SDD_efficiency.txt', width_slope=0.01, width_intercept=0.065, xray_db='default_xrays.json')[source]

Helper function to set the metadata that are specific to the espm package so that it does not overwrite experimental metadata. See the documentation of the set_analysis_parameters() function for the meaning of the parameters.

set_analysis_parameters(beam_energy=200, azimuth_angle=0.0, elevation_angle=22.0, tilt_stage=0.0, elements=[], thickness=2e-05, density=3.5, detector_type='SDD_efficiency.txt', width_slope=0.01, width_intercept=0.065, xray_db='default_xrays.json')[source]

Helper function to set the metadata of the EDS_espm object. Be careful, it will overwrite the metadata of the object.

Parameters:
beam_energy : float, optional

The energy of the electron beam in keV.

azimuth_angle : float, optional

The azimuth angle of the EDS detector in degrees.

elevation_angle : float, optional

The elevation angle of the EDS detector in degrees.

tilt_stage : float, optional

The tilt angle of the sample stage in degrees (it usually corresponds to alpha on FEI instruments).

elements : list, optional

List of the elements to be used in the analysis.

thickness : float, optional

The thickness of the sample in centimeters (the default, 2e-05 cm, is 200 nm).

density : float, optional

The density of the sample in g/cm^3.

detector_type : str, optional

The type of the detector. It is either the name of a text file containing the efficiency of

width_slope : float, optional

The slope of the linear fit of the detector width as a function of the energy.

width_intercept : float, optional

The intercept of the linear fit of the detector width as a function of the energy.

xray_db : str, optional

The name of the X-ray emission cross-section database to be used. The default tables are available in the espm/tables folder. Additional tables can be generated by emtables.
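The role of width_slope and width_intercept can be sketched as a linear model of the peak width. In this illustration the function names are hypothetical, energies are assumed to be in keV, and the width is assumed to act as the standard deviation of a Gaussian peak; espm may use a different convention (for instance a full width at half maximum).

```python
import numpy as np


def detector_width(energy, width_slope=0.01, width_intercept=0.065):
    """Linear model of the detector width as a function of the energy."""
    return width_slope * energy + width_intercept


def gaussian_peak(energies, center, width):
    """Unit-area Gaussian; `width` is used as the standard deviation (assumption)."""
    return np.exp(-0.5 * ((energies - center) / width) ** 2) / (width * np.sqrt(2 * np.pi))


energies = np.linspace(0.0, 10.0, 2001)  # energy axis with 0.005 keV steps
center = 6.4                             # approximate Fe K-alpha energy in keV
width = detector_width(center)           # 0.01 * 6.4 + 0.065
peak = gaussian_peak(energies, center, width)
```

With a positive slope, peaks at higher energies are broader than peaks at lower energies.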

set_fixed_W(phases_dict)[source]

Helper function to create a fixed_W matrix. The output matrix has -1 entries everywhere except for the elements (and bremsstrahlung parameters) that are present in the phases_dict dictionary. During the decomposition using espm.estimator.NMFEstimator, the -1 entries of fixed_W are ignored, i.e. the corresponding values are learned normally, while the non-negative entries are fixed to the values given in phases_dict. Usually, the easiest way to improve unmixing results is to fix some elements to 0.0 in some phases. For example, if you have a phase with only Si and O, you can fix the Fe element to 0.0 in this phase.

Parameters:
phases_dict : dict

Determines which entries of fixed_W are going to be non-negative. The dictionary typically has the following structure: phases_dict = {"phase1_name": {"Fe": 0.0, "O": 1.25e23}, "phase2_name": {"Si": 0.0, "b0": 0.05}}.

Returns:
W : numpy.ndarray
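The structure of a fixed_W matrix can be sketched with NumPy. This is not the espm implementation: the function sketch_fixed_W and the row_labels list are hypothetical, and the layout (one row per element or bremsstrahlung parameter, one column per phase) is an assumption.

```python
import numpy as np


def sketch_fixed_W(phases_dict, row_labels):
    """Build a matrix of -1 entries, overwritten where phases_dict gives a value."""
    phase_names = list(phases_dict)
    W = -np.ones((len(row_labels), len(phase_names)))
    for j, name in enumerate(phase_names):
        for label, value in phases_dict[name].items():
            # Fixed entry: the decomposition would keep this value unchanged.
            W[row_labels.index(label), j] = value
    return W


# Hypothetical row layout: three elements followed by two bremsstrahlung parameters.
row_labels = ["Fe", "O", "Si", "b0", "b1"]
phases_dict = {
    "phase1_name": {"Fe": 0.0, "O": 1.25e23},
    "phase2_name": {"Si": 0.0, "b0": 0.05},
}
fixed_W = sketch_fixed_W(phases_dict, row_labels)
```

In this example four entries are fixed and the remaining six stay at -1, meaning they would be learned during the decomposition.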
set_microscope_parameters(beam_energy=200, azimuth_angle=0.0, elevation_angle=22.0, tilt_stage=0.0)[source]

Helper function to set the microscope parameters of the EDS_espm object. Be careful, it will overwrite the microscope parameters of the object. See the documentation of the set_analysis_parameters() function for the meaning of the parameters.

update_G(part_W=None, G=None)[source]

Update the absorption part of the bremsstrahlung of the G matrix.

property X

The data in the form of a 2D array of shape (n_samples, n_features).

property Xdot

The ground truth in the form of a 3D array of shape (shape_2d[0],shape_2d[1],n_features), if available.

property maps

Ground truth of the spatial distribution of the phases in the form of a 3D array of shape (shape_2d[0],shape_2d[1],n_phases), if available.

property maps_2d

Ground truth of the spatial distribution of the phases in the form of a 2D array of shape (shape_2d[0]*shape_2d[1],n_phases), if available.

property model

The espm.models.EDXS model corresponding to the metadata of the EDS_espm object.

property phases

Ground truth of the spectra of the phases in the form of a 2D array of shape (n_phases,n_features), if available.

property shape_2d

Shape of the data in the spatial dimension.
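The shape conventions of the properties above can be checked on synthetic arrays. The sketch uses only NumPy with made-up data; it assumes row-major (C order) flattening of the pixels and a plain linear mixing of the phases, ignoring the counts and density factors used in generate_spim.

```python
import numpy as np

rng = np.random.default_rng(0)
shape_2d, n_phases, n_features = (4, 6), 2, 30

maps = rng.random((*shape_2d, n_phases))     # like `maps`: (h, w, n_phases)
phases = rng.random((n_phases, n_features))  # like `phases`: (n_phases, n_features)

# Flatten the two spatial axes, as in `maps_2d`: (h * w, n_phases).
maps_2d = maps.reshape(-1, n_phases)

# Assumed linear mixing: one spectrum per pixel, shape (n_samples, n_features) like `X`.
X_flat = maps_2d @ phases

# Back to the spatial layout, as in `Xdot`: (h, w, n_features).
Xdot = X_flat.reshape(*shape_2d, n_features)
```

The 2D forms are the ones expected by most machine learning algorithms, while the 3D forms are convenient for plotting maps.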

espm.datasets.eds_spim.build_G(model, g_params)[source]

espm.datasets.eds_spim.get_metadata(spim)[source]

Get the metadata of the EDS_espm object and format it as a model parameters dictionary.