Datasets
Sample creation
The espm.datasets module implements the functions that combine the spatial distributions generated by espm.weights and the spectra generated by espm.models into a 3D dataset.
This part of the espm package manages the integration into the hyperspy framework:
- the datasets and their metadata are stored as hyperspy signals (.hspy).
- the espm.eds_spim module implements the EDSespm class, which is a subclass of the hyperspy.signals.Signal1D class.
Using the EDSespm class, the user can apply most of the hyperspy functionalities (e.g. plotting, fitting, decomposition) as well as the espm functionalities to their experimental and simulated data.
Note
For now, espm supports only signals modeled as EDS data, but we aim to support signals corresponding to EELS data as well.
The module espm.datasets.base implements the functions that combine a spatial distribution and the associated spectra into a 3D dataset. It also implements the functions that convert the dataset into hyperspy-compatible objects.
- espm.datasets.base.generate_dataset(*args, base_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/espm/envs/stable/lib/python3.11/site-packages/generated_datasets'), sample_number=10, base_seed=0, elements=[], **kwargs)[source]
Generate a set of spectrum image files and save them in the generated dataset folder. Each spectrum image is saved in a separate file and is generated with a different seed.
- Parameters:
- base_path : str, optional
The path to the folder where the samples will be saved. The default is DATASETS_PATH.
- sample_number : int, optional
The number of samples to generate. The default is 10.
- base_seed : int, optional
The seed used to generate the samples. The default is 0.
- Returns:
- None.
- espm.datasets.base.generate_spim(phases, weights, densities, N, seed=0, continuous=False)[source]
Generate a noiseless spectrum image as a tensor product of the phases and weights. Then, if requested, a noisy spectrum image is generated by drawing from a Poisson distribution.
The noiseless spectrum image is defined as:
\[Y^{nl} = N D \otimes ( \mathrm{Diag}(d) A )\]
where \(D\) contains the normalized phases, \(A\) contains the weights, \(d\) is the density modifier and \(N\) is the number of counts per pixel.
The noisy spectrum image is obtained by drawing each entry from a Poisson distribution whose mean is the corresponding noiseless value.
- Parameters:
- phases : array_like
The phases of the model. Shape (n, spectral_len).
- weights : array_like
The weights of the model. Shape (shape_2d[0], shape_2d[1], n).
- densities : array_like
Density modifier of the phases. Shape (n,).
- N : int
The number of counts per pixel.
- seed : int, optional
Seed for the random number generator. The default is 0.
- continuous : bool, optional
If True, the function returns a noiseless spectrum image. The default is False.
- Returns:
- numpy.ndarray
The spectrum image. Shape (shape_2d[0], shape_2d[1], spectral_len).
Notes
More details about the spectrum image generation can be found in the contribution: [TPH+23].
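The formula above can be sketched in NumPy as follows. This is an illustrative re-implementation, not the espm source: the helper name `sketch_generate_spim` is hypothetical, and it assumes that "normalized phases" means each spectrum sums to one.

```python
import numpy as np

def sketch_generate_spim(phases, weights, densities, N, seed=0, continuous=False):
    """Illustrative version of Y^{nl} = N D (x) (Diag(d) A); not espm's code."""
    phases = np.asarray(phases, dtype=float)        # (n, spectral_len)
    weights = np.asarray(weights, dtype=float)      # (h, w, n)
    densities = np.asarray(densities, dtype=float)  # (n,)

    # Assumption: "normalized phases" means each spectrum sums to one.
    D = phases / phases.sum(axis=1, keepdims=True)
    # Diag(d) A: scale the weight map of phase k by its density modifier d[k].
    A = weights * densities
    # Contract over the phase index to get one spectrum per pixel.
    noiseless = N * np.tensordot(A, D, axes=([2], [0]))  # (h, w, spectral_len)
    if continuous:
        return noiseless
    # Noisy image: Poisson draw whose mean is the noiseless value.
    rng = np.random.default_rng(seed)
    return rng.poisson(noiseless)

phases = np.array([[1.0, 2.0, 1.0], [0.0, 1.0, 3.0]])
weights = np.zeros((2, 2, 2))
weights[..., 0] = 0.25
weights[..., 1] = 0.75
Y_nl = sketch_generate_spim(phases, weights, np.ones(2), N=100, continuous=True)
Y = sketch_generate_spim(phases, weights, np.ones(2), N=100, seed=0)
```

With weights that sum to one and unit densities, every pixel of the noiseless image carries exactly N counts, which is a quick sanity check on the shapes and the normalization.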
- espm.datasets.base.generate_spim_sample(phases, weights, model_params, misc_params, seed=0, g_params={})[source]
Generate a dictionary containing: the spectrum image (made with the weights and phases), the ground truth, the model parameters and the misc parameters.
- Parameters:
- phases : array_like
The phases of the model. Shape (n, spectral_len).
- weights : array_like
The weights of the model. Shape (shape_2d[0], shape_2d[1], n). The weights should sum to one along axis 2.
- model_params : dict
The parameters of the model. For examples, see the default parameters in espm.conf.
- misc_params : dict
The misc parameters of the model. For examples, see the default parameters in espm.conf.
- seed : int, optional
The seed for the random number generator. The default is 0.
- g_params : dict, optional
The parameters for the g matrix. The default is {}. Note that for EDXS data the g matrix is not used during the creation of the data.
- Returns:
- sample : dict
A dictionary containing the spectrum image, the ground truth, the model parameters and the misc parameters.
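The requirement that the weights sum to one along axis 2 can be met by normalizing any non-negative abundance maps. A minimal sketch, in which the random `raw` maps and the chosen shapes are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
shape_2d, n_phases = (4, 5), 3

# Hypothetical raw abundance maps (any non-negative values would do).
raw = rng.random(shape_2d + (n_phases,))
# Normalize so that the phase fractions of each pixel sum to one along axis 2.
weights = raw / raw.sum(axis=2, keepdims=True)
```

The resulting array has the shape (shape_2d[0], shape_2d[1], n) expected for the weights argument.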
- espm.datasets.base.sample_to_EDSespm(sample, elements=[])[source]
Convert a dataset to a custom hyperspy signal type called EDSespm, which contains the noisy spectrum image as data, the ground truth as metadata, and other useful information.
- Parameters:
- sample : dict
A dictionary containing the noisy spectrum image as data, the ground truth as metadata and other useful information. See espm.datasets.base.generate_spim_sample() for more details.
- elements : list, optional
A list of the elements present in the sample. The default is [].
- Returns:
- EDSespm
The hyperspy-compatible signal object of the espm.eds_spim module.
- espm.datasets.base.sample_to_Signal1D(sample)[source]
Same as espm.datasets.base.sample_to_EDSespm(), but for non-EDS data such as the toy dataset.
The module espm.datasets.built_in_EDXS_datasets implements the functions that generate two built-in datasets:
- A dataset of 2 particles embedded in a matrix.
- A dataset with a linear local accumulation of Sr.
- espm.datasets.built_in_EDXS_datasets.generate_built_in_datasets(seeds_range=10)[source]
Generate the two built-in datasets if they are not already present in the datasets folder.
- Parameters:
- seeds_range : int
The number of seeds to use for the generation of the built-in datasets. The built-in datasets are generated with base_seed, then base_seed + 1, base_seed + 2, etc. up to base_seed + seeds_range - 1.
- Returns:
- None
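The seed sequence described for seeds_range can be written out explicitly. A small sketch, where `seeds_for` is a hypothetical helper and not part of espm:

```python
def seeds_for(base_seed, seeds_range):
    """Seeds base_seed, base_seed + 1, ..., base_seed + seeds_range - 1."""
    return list(range(base_seed, base_seed + seeds_range))

seeds = seeds_for(base_seed=0, seeds_range=10)
```

Each seed corresponds to one generated sample of a built-in dataset.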
- espm.datasets.built_in_EDXS_datasets.load_grain_boundary(sample=0)[source]
Load the built-in dataset of a grain boundary.
- Parameters:
- sample : int
The sample number to load.
- Returns:
- spim : hyperspy.signals.EDSespm
The loaded dataset.
- espm.datasets.built_in_EDXS_datasets.load_particules(sample=0)[source]
Load the built-in dataset of particles.
- Parameters:
- sample : int
The sample number to load.
- Returns:
- spim : hyperspy.signals.EDSespm
The loaded dataset.
The module espm.eds_spim implements the EDSespm class, which is a subclass of the hyperspy.signals.Signal1D class.
The main purpose of this class is to provide an easy and clean interface between the hyperspy framework and the espm package:
- The metadata are organised to correspond as closely as possible to the typical metadata found in a hyperspy EDS_TEM object.
- The machine learning algorithms of espm can be easily applied to the EDSespm object using the standard hyperspy decomposition method. See the notebooks for examples.
- The EDSespm class provides a convenient way to:
get the results of espm.estimators.NMFEstimator
access the ground truth in the case of simulated data
estimate the best binning using the method developed by G. Obozinski, N. Perraudin and M. Martinez Ruts
set a fixed W for the espm.estimators.NMFEstimator decomposition
- class espm.datasets.eds_spim.EDSespm(*args, **kwargs)[source]
- build_G(problem_type='bremsstrahlung', ignored_elements=['Cu'], *, elements_dict={})[source]
Build the G matrix of the espm.models.EDXS model corresponding to the metadata of the EDSespm object and store it as an attribute.
- Parameters:
- problem_type : str, optional
Determines the type of the G matrix to build. It can be “bremsstrahlung”, “no_brstlg” or “identity”. The options correspond to:
“bremsstrahlung” : the G matrix is a callable with both characteristic X-rays and a bremsstrahlung model.
“no_brstlg” : the G matrix is a matrix with only characteristic X-rays.
“identity” : the G matrix is None, which is equivalent to an identity matrix for espm functions.
- elements_dict : dict, optional
Dictionary containing atomic numbers and the corresponding cut-off energies. It is used to separate the characteristic X-rays of the given elements into two energy ranges and assign each range its own column in the G matrix, instead of having one column per element. For example, elements_dict = {"26": 3.0} separates the characteristic X-rays of Fe into two energy ranges and assigns each of them a column in the G matrix. This is useful to circumvent issues with the absorption.
- Returns:
- None
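The effect of elements_dict can be illustrated by grouping characteristic lines on either side of a cut-off energy. This is a sketch of the idea only: `split_lines`, the `_lo`/`_hi` suffixes and the approximate line energies are assumptions for illustration, not espm internals.

```python
def split_lines(lines_kev, elements_dict):
    """Group lines per element, splitting at the cut-off energy when one is given."""
    columns = {}
    for z, energies in lines_kev.items():
        cutoff = elements_dict.get(z)
        if cutoff is None:
            # No cut-off: all lines of the element share one column.
            columns[z] = sorted(energies)
        else:
            # Cut-off given: one column below it, one column at or above it.
            columns[z + "_lo"] = sorted(e for e in energies if e < cutoff)
            columns[z + "_hi"] = sorted(e for e in energies if e >= cutoff)
    return columns

# Approximate line energies in keV for Fe (Z=26) and O (Z=8), illustrative only.
lines_kev = {"26": [0.705, 6.40, 7.06], "8": [0.525]}
columns = split_lines(lines_kev, {"26": 3.0})
```

Here the Fe L line and the Fe K lines end up in separate groups, so their relative intensities are no longer tied together, which is what makes the split helpful when absorption affects low-energy lines more strongly.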
- carto_fixed_W(brstlg_comps=1)[source]
Helper function to create a fixed_W matrix for chemical mapping. The output matrix can be used to run a decomposition with as many components as there are chemical elements, with each component allowed to contain a single element. The spectral components are then the characteristic peaks of each element and the spatial components are the associated chemical maps. The bremsstrahlung is calculated separately and added to the other components.
- Parameters:
- brstlg_comps : int, optional
Number of bremsstrahlung components to add to the decomposition.
- Returns:
- W : numpy.ndarray
- decomposition(normalize_poissonian_noise=False, navigation_mask=None, closing=True, *args, **kwargs)[source]
Apply a decomposition to a dataset with a choice of algorithms.
The results are stored in self.learning_results.
Read more in the User Guide.
- Parameters:
- normalize_poissonian_noise : bool, default False
If True, scale the signal to normalize Poissonian noise using the approach described in [*].
- navigation_mask : None or float or boolean numpy array, default None
The navigation locations marked as True are not used in the decomposition. If a float is given, the vacuum_mask method is used to generate a mask with the float value as threshold.
- closing : bool, default True
If True, a morphological closing is applied to the mask obtained by vacuum_mask.
- algorithm : {“SVD”, “MLPCA”, “sklearn_pca”, “NMF”, “sparse_pca”, “mini_batch_sparse_pca”, “RPCA”, “ORPCA”, “ORNMF”, custom object}, default “SVD”
The decomposition algorithm to use. If algorithm is an object, it must implement a fit_transform() method or fit() and transform() methods, in the same manner as a scikit-learn estimator.
- output_dimension : None or int
Number of components to keep/calculate. Default is None, i.e. min(data.shape).
- centre : {None, “navigation”, “signal”}, default None
If None, the data is not centered prior to decomposition.
If “navigation”, the data is centered along the navigation axis. Only used by the “SVD” algorithm.
If “signal”, the data is centered along the signal axis. Only used by the “SVD” algorithm.
- auto_transpose : bool, default True
If True, automatically transposes the data to boost performance. Only used by the “SVD” algorithm.
- signal_mask : boolean numpy array
The signal locations marked as True are not used in the decomposition.
- var_array : numpy array
Array of variance for the maximum likelihood PCA algorithm. Only used by the “MLPCA” algorithm.
- var_func : None or function or numpy array, default None
If None, ignored.
If function, applies the function to the data to obtain var_array. Only used by the “MLPCA” algorithm.
If numpy array, creates var_array by applying a polynomial function defined by the array of coefficients to the data. Only used by the “MLPCA” algorithm.
- reproject : {None, “signal”, “navigation”, “both”}, default None
If not None, the results of the decomposition will be projected in the selected masked area.
- return_info : bool, default False
The result of the decomposition is stored internally. However, some algorithms generate some extra information that is not stored. If True, return any extra information if available. In the case of sklearn.decomposition objects, this includes the sklearn Estimator object.
- print_info : bool, default True
If True, print information about the decomposition being performed. In the case of sklearn.decomposition objects, this includes the values of all arguments of the chosen sklearn algorithm.
- svd_solver : {“auto”, “full”, “arpack”, “randomized”}, default “auto”
- If auto:
The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient “randomized” method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.
- If full:
run exact SVD, calling the standard LAPACK solver via scipy.linalg.svd(), and select the components by postprocessing
- If arpack:
use truncated SVD, calling ARPACK solver via scipy.sparse.linalg.svds(). It requires strictly 0 < output_dimension < min(data.shape)
- If randomized:
use truncated SVD, calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components
- copy : bool, default True
If True, stores a copy of the data before any pre-treatments such as normalization in s._data_before_treatments. The original data can then be restored by calling s.undo_treatments().
If False, no copy is made. This can be beneficial for memory usage, but care must be taken since data will be overwritten.
- **kwargs : extra keyword arguments
Any keyword arguments are passed to the decomposition algorithm.
See also
vacuum_mask
References
Examples
>>> import exspy
>>> import hyperspy.api as hs
>>> s = exspy.data.EDS_TEM_FePt_nanoparticles()
>>> si = hs.stack([s]*3)
>>> si.change_dtype(float)
>>> si.decomposition()
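The “auto” policy described for svd_solver can be sketched as a small decision function. `pick_svd_solver` is a hypothetical helper, and reading "larger than 500x500" as "the largest dimension exceeds 500" is an interpretation of the text, not a statement about the actual implementation.

```python
def pick_svd_solver(data_shape, output_dimension=None):
    """Return "randomized" or "full" following the documented "auto" policy."""
    smallest = min(data_shape)
    if output_dimension is None:
        # Documented default: keep min(data.shape) components.
        output_dimension = smallest
    # Assumption: "larger than 500x500" is read as max(data_shape) > 500.
    large_input = max(data_shape) > 500
    few_components = output_dimension < 0.8 * smallest
    return "randomized" if (large_input and few_components) else "full"

solver_big = pick_svd_solver((1000, 2000), output_dimension=10)
solver_small = pick_svd_solver((100, 100), output_dimension=10)
```

Small inputs, or requests for most of the components, fall back to the exact full SVD.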
- estimate_best_binning(inspect=False)[source]
Estimate the best binning for the dataset based on the method developed by G. Obozinski, N. Perraudin and M. Martinez Ruts. M. Martinez Ruts designed an estimator that compares the binned and unbinned data; its minimum gives the best binning factor.
- Parameters:
- bin_sampling : int, optional
Number of binning factors to sample for the estimation.
- inspect : bool, optional
If True, the function will return the values of the estimator for each binning factor and the estimated best binning factor. If False, it will return only the estimated best binning factor.
- Returns:
- estimated_binning : tuple
The estimated binning for the dataset.
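For context, applying a binning to a spectrum image amounts to summing blocks of neighbouring pixels. The sketch below shows only that operation, not the estimator itself. `bin_spatial` is a hypothetical helper, and it assumes the binning is a pair of spatial factors (rows, columns).

```python
import numpy as np

def bin_spatial(spim, binning):
    """Sum blocks of by x bx pixels; trailing pixels that do not fill a block are dropped."""
    by, bx = binning
    h, w, s = spim.shape
    # Crop so that both spatial dimensions are multiples of the binning factors.
    h2, w2 = (h // by) * by, (w // bx) * bx
    cropped = spim[:h2, :w2]
    # Split each spatial axis into (blocks, block size) and sum over the block axes.
    return cropped.reshape(h2 // by, by, w2 // bx, bx, s).sum(axis=(1, 3))

spim = np.ones((6, 4, 3))
binned = bin_spatial(spim, (2, 2))
```

Binning trades spatial resolution for more counts per pixel, which is why an estimator is needed to pick the factor.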
- estimate_mass_thickness(ignored_elements=['Cu'], tol=1e-08, *, elements_dict={})[source]
Based on the complete metadata of the EDSespm object, this function estimates the mass-thickness of the sample. It first derives the mass-thickness from the characteristic X-rays. The bremsstrahlung parameters are then estimated using that mass-thickness. The process is repeated ten times to ensure convergence. The results are plotted on the spectrum.
Check the metadata to read the estimated mass-thickness.
- Parameters:
- elements_dict : dict, optional
Dictionary containing atomic numbers and the corresponding cut-off energies. It is used to separate the characteristic X-rays of the given elements into two energy ranges and assign each range its own column in the G matrix, instead of having one column per element. This is useful to circumvent issues with the mass-absorption coefficient.
- Returns:
- None
Notes
The mass-thickness \(\rho t\), in g·cm⁻², is estimated using the following formula:
\[\rho t = \frac{H}{I \times 10^{-9} \times \tau \times N_e \times \sigma \times \Omega / (4\pi)}\]
where \(H\) is the intensity of the characteristic X-rays, \(I\) is the beam current in nA, \(\tau\) is the acquisition time in seconds, \(N_e\) is the number of electrons in a Coulomb, \(\sigma\) is the average X-ray emission cross-section, and \(\Omega\) is the geometric efficiency of the detector in sr.
We recommend using the select_background_windows() method to select the background windows before running this method.
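The mass-thickness formula can be evaluated directly. A minimal sketch: `mass_thickness` is a hypothetical helper, and the cross-section is assumed to be expressed in units that make the result come out in g·cm⁻².

```python
import math

# Number of electrons in one Coulomb (inverse of the elementary charge).
ELECTRONS_PER_COULOMB = 1.0 / 1.602176634e-19

def mass_thickness(H, I_nA, tau_s, sigma, omega_sr):
    """rho*t = H / (I * 1e-9 * tau * N_e * sigma * Omega / (4*pi))."""
    # Total number of electrons that hit the sample during the acquisition.
    n_electrons = I_nA * 1e-9 * tau_s * ELECTRONS_PER_COULOMB
    # Fraction of the emitted X-rays that reach the detector.
    collected_fraction = omega_sr / (4.0 * math.pi)
    return H / (n_electrons * sigma * collected_fraction)

# Illustrative inputs only: 1e6 counts, 1 nA, 10 s, arbitrary sigma, 0.7 sr.
rho_t = mass_thickness(H=1e6, I_nA=1.0, tau_s=10.0, sigma=1e-2, omega_sr=0.7)
```

The estimate scales linearly with the measured intensity and inversely with the dose, the cross-section and the collection solid angle.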
- print_concentration_report(selected_elts=[], W_input=None, fit_error=True, disclaimer=True)[source]
Print a report of the chemical concentrations from a fitted W.
- Parameters:
- selected_elts : list, optional
List of the elements to be printed. If empty, all the elements will be printed.
- W_input : numpy.ndarray, optional
If not None, the concentrations will be computed from this W matrix instead of the one fitted during the decomposition.
- fit_error : bool, optional
If True, the statistical errors on the concentrations will be printed.
- disclaimer : bool, optional
If True, a disclaimer will be printed at the end of the report.
- Returns:
- None
Notes
This function is only available if the learning results contain a decomposition algorithm that has been fitted.
- select_background_windows(num_windows=4, ranges=None)[source]
Select the background windows for the bremsstrahlung estimation. The function opens a window with the spectrum, in which the user selects the background windows by clicking and dragging the mouse. Then click ‘Apply’ to validate the selection. A bremsstrahlung model is then estimated and plotted on the spectrum.
- Parameters:
- num_windows : int, optional
Number of background windows to select.
- ranges : list, optional
List of tuples containing the left and right bounds of the background windows. If provided, the function will not open a window and will directly use the provided ranges, bypassing the GUI.
- Returns:
- None
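A list of (left, right) bounds such as the ranges argument selects a subset of the energy channels. The sketch below shows that selection on a plain list of energies. `channels_in_windows` is a hypothetical helper, and the bounds are assumed to be in the same units as the energy axis.

```python
def channels_in_windows(energies, ranges):
    """True for every energy that falls inside at least one (left, right) window."""
    return [any(left <= e <= right for left, right in ranges) for e in energies]

energies = [0.5 * i for i in range(9)]   # 0.0, 0.5, ..., 4.0
ranges = [(1.0, 2.0), (3.5, 4.0)]        # two illustrative background windows
mask = channels_in_windows(energies, ranges)
```

Only the channels flagged True would contribute to a background fit in this picture.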
- set_analysis_parameters(thickness=None, density=None, detector_type=None, width_slope=None, width_intercept=None, geom_eff=None, xray_db=None)[source]
Set the relevant parameters for the analysis in the metadata of the EDSespm object.
- Parameters:
- thickness : float
Thickness of the sample in cm.
- density : float
Density of the sample in g/cm^3.
- detector_type : str
Type of the detector. The default is “SDD_efficiency.txt”.
- width_slope : float
Slope of the width of the peaks in the EDS spectrum.
- width_intercept : float
Intercept of the width of the peaks in the EDS spectrum.
- geom_eff : float
Geometric efficiency of the detector.
- acq_time : float
Acquisition time of the spectrum in seconds.
- probe_current : float
Probe current in A.
- xray_db : str
Path to the xray database file. The default is “200keV_xrays.json”.
- set_fixed_W(phases_dict)[source]
Helper function to create a fixed_W matrix. The output matrix will have -1 entries except for the elements (and bremsstrahlung parameters) that are present in the phases_dict dictionary. In the output (fixed_W) matrix, the -1 entries are left free and are learned normally during the decomposition using espm.estimators.NMFEstimator, while the non-negative entries are fixed to the values given in the phases_dict dictionary. Usually, the easiest approach is to fix some elements to 0.0 in some phases if you want to improve unmixing results. For example, if you have a phase with only Si and O, you can fix the Fe element to 0.0 in this phase.
- Parameters:
- phases_dict : dict
Determines which elements of fixed_W are going to be non-negative. The dictionary typically has the following structure: phases_dict = {"phase1_name": {"Fe": 0.0, "O": 1.25e23}, "phase2_name": {"Si": 0.0, "b0": 0.05}}.
- Returns:
- W : numpy.ndarray
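The -1 convention can be illustrated with a small stand-alone sketch. `sketch_fixed_W` and `row_names` are hypothetical, and the layout (one row per element or bremsstrahlung parameter, one column per phase) is an assumption made for the example.

```python
import numpy as np

def sketch_fixed_W(phases_dict, row_names):
    """Build a matrix of -1 (free entries) and overwrite the entries listed in phases_dict."""
    W = -np.ones((len(row_names), len(phases_dict)))
    for col, entries in enumerate(phases_dict.values()):
        for name, value in entries.items():
            # Non-negative values are the ones kept fixed during the decomposition.
            W[row_names.index(name), col] = value
    return W

row_names = ["Fe", "O", "Si", "b0"]  # hypothetical row order
phases_dict = {
    "phase1_name": {"Fe": 0.0, "O": 1.25e23},
    "phase2_name": {"Si": 0.0, "b0": 0.05},
}
W = sketch_fixed_W(phases_dict, row_names)
```

In this example four entries are fixed and the remaining four stay at -1, so they would be learned as usual.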
- property G
The G matrix of the espm.models.EDXS model corresponding to the metadata of the EDSespm object.
- property X
The data in the form of a 2D array of shape (n_samples, n_features).
- property custom_init
Boolean that sets whether to use the custom_init (see espm.models.EDXS). If True, the custom_init will be used to initialise the decomposition. If False, the default initialisation will be used. If None, it will be set to False.
- property shape_2d
Shape of the data in the spatial dimension.
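The relation between a 3D spectrum image, its spatial shape and a 2D (n_samples, n_features) array can be sketched with plain NumPy. This assumes that samples are pixels and features are spectral channels; it is not the property implementation.

```python
import numpy as np

# Toy spectrum image of shape (rows, columns, spectral_len).
data = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

shape_2d = data.shape[:2]               # spatial shape
X = data.reshape(-1, data.shape[-1])    # one row per pixel, one column per channel
# The spatial shape is all that is needed to fold X back into an image.
restored = X.reshape(shape_2d + (X.shape[1],))
```

Keeping the spatial shape alongside the flattened array is what allows decomposition results to be reshaped into maps.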