direpack.sprm.sprm.sprm

class sprm(n_components=1, eta=0.5, fun='Hampel', probp1=0.95, probp2=0.975, probp3=0.999, centre='median', scale='mad', verbose=True, maxit=100, tol=0.01, start_cutoff_mode='specific', start_X_init='pcapp', columns=False, copy=True)

SPRM: Sparse Partial Robust M Regression

Algorithm first outlined in:

Sparse partial robust M regression, Irene Hoffmann, Sven Serneels, Peter Filzmoser, Christophe Croux, Chemometrics and Intelligent Laboratory Systems, 149 (2015), 50-59.

Parameters
  • eta (float) – sparsity parameter in [0, 1)

  • n_components (int) – number of latent components, at least 1. When fitted to data, n_components must satisfy n_components <= min(x_data.shape)

  • fun (str) – downweighting function. ‘Hampel’ (recommended), ‘Fair’ or ‘Huber’

  • probp1 (float) – probability cutoff for start of downweighting (e.g. 0.95)

  • probp2 (float) – probability cutoff for start of steep downweighting (e.g. 0.975, only relevant if fun=‘Hampel’)

  • probp3 (float) – probability cutoff for start of outlier omission (e.g. 0.999, only relevant if fun=‘Hampel’)

  • centre (str) – type of centring (‘mean’, ‘median’, ‘l1median’ or ‘kstepLTS’). ‘kstepLTS’ is statistically recommended; if it is too slow, switch to ‘median’.

  • scale (str) – type of scaling (‘std’, ‘mad’, ‘scaleTau2’ [recommended] or ‘None’)

  • verbose (bool) – whether to run in verbose mode

  • maxit (int) – maximum number of iterations in the M algorithm

  • tol (float) – tolerance for convergence in the M algorithm

  • start_cutoff_mode (str) – ‘specific’ (preferred) sets starting-value cutoffs specific to X and y; any other value sets identical starting cutoffs for X and y. The latter yields results identical to the SPRM R implementation available from CRAN.

  • start_X_init (str) – ‘pcapp’ includes a PCA/broken-stick projection when calculating the starting weights; any other value calculates the X starting values from the X matrix itself. The latter is less stable for very flat data (p >> n), yet yields results identical to the SPRM R implementation available from CRAN.

  • columns (bool, list, numpy array or pandas Index; default False) – if False, no column names are supplied; if True and X is supplied as a pandas DataFrame, the column names are extracted from the frame (an error is raised for other input types); if a list, array or Index (of length x_data.shape[1]), its entries are taken as the column names of x_data and printed in verbose mode.

  • copy (bool, default True) – whether to copy the data
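To make the roles of fun, probp1, probp2, probp3, centre and scale more concrete, here is a minimal, self-contained sketch of M-type case weighting. It is not direpack's code. It assumes residuals are centred by the median and scaled by the MAD (with the usual 1.4826 consistency factor), and that the probability cutoffs are mapped to standard normal quantiles before the Huber or Hampel weight function is applied; the class itself may use other distributions, for example for X-block distances. The helper names robust_standardize, cutoffs, huber_weight and hampel_weight are hypothetical, and the Fair function is not sketched.

```python
from statistics import NormalDist, median


def robust_standardize(values):
    """Centre by the median and scale by the MAD (hypothetical helper).

    Assumes the MAD is nonzero; a constant sample would divide by zero.
    """
    loc = median(values)
    mad = 1.4826 * median(abs(v - loc) for v in values)  # normal-consistency factor
    return [(v - loc) / mad for v in values]


def cutoffs(probp1=0.95, probp2=0.975, probp3=0.999):
    """Map probability cutoffs to standard normal quantiles (an assumption)."""
    q = NormalDist().inv_cdf
    return q(probp1), q(probp2), q(probp3)


def huber_weight(d, c):
    """Huber weight: 1 up to c, then c / |d|."""
    d = abs(d)
    return 1.0 if d <= c else c / d


def hampel_weight(d, a, b, r):
    """Hampel redescending weight with cutoffs a < b < r.

    Full weight up to a, weight a / |d| up to b; between b and r the
    psi function descends linearly to zero; beyond r the case is omitted.
    """
    d = abs(d)
    if d <= a:
        return 1.0
    if d <= b:
        return a / d
    if d <= r:
        return a * (r - d) / ((r - b) * d)
    return 0.0


# Example: one gross outlier among otherwise small residuals.
residuals = [0.1, -0.3, 0.2, 0.05, -0.1, 0.15, 4.0]
z = robust_standardize(residuals)
a, b, r = cutoffs()
weights = [hampel_weight(v, a, b, r) for v in z]
```

With the default cutoffs, the gross outlier lies beyond the probp3 quantile and receives weight 0, the residual -0.3 falls in the steep-downweighting zone between the probp2 and probp3 quantiles and gets a partial weight, and the remaining cases keep full weight.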

Attributes always provided
  • x_weights_: X block PLS weighting vectors (usually denoted W)

  • x_loadings_: X block PLS loading vectors (usually denoted P)

  • C_: vector of inner relationship between response and latent variables

  • x_scores_: X block PLS score vectors (usually denoted T)

  • coef_: vector of regression coefficients

  • intercept_: intercept

  • coef_scaled_: vector of scaled regression coefficients (when scaling option used)

  • intercept_scaled_: scaled intercept

  • residuals_: vector of regression residuals

  • x_ev_: X block explained variance per component

  • y_ev_: y block explained variance

  • fitted_: fitted response

  • x_Rweights_: X block SIMPLS style weighting vectors (usually denoted R)

  • x_caseweights_: X block case weights

  • y_caseweights_: y block case weights

  • caseweights_: combined case weights

  • colret_: names of variables retained in the sparse model

  • x_loc_: X block location estimate

  • y_loc_: y location estimate

  • x_sca_: X block scale estimate

  • y_sca_: y scale estimate

  • non_zero_scale_vars_: indicator vector of variables in X with nonzero scale
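The effect of the sparsity parameter eta, and hence which variables end up in colret_, can be illustrated with a soft-thresholding step of the kind used in sparse NIPALS-type algorithms. The sketch below assumes, without reference to direpack's source, that each entry of a weighting direction is shrunk towards zero by eta times the largest absolute entry, that entries falling below the threshold are set to zero, and that the result is renormalised; a variable counts as retained when its entry is nonzero. soft_threshold and retained are hypothetical helpers.

```python
import math


def soft_threshold(direction, eta):
    """Shrink each entry by eta * max|entry|, zero the rest, renormalise.

    eta = 0 keeps every variable; eta close to 1 keeps only the largest.
    """
    if not 0 <= eta < 1:
        raise ValueError("eta must lie in [0, 1)")
    threshold = eta * max(abs(v) for v in direction)
    # Keep the sign of each entry, shrink its magnitude, clip at zero.
    shrunk = [math.copysign(max(abs(v) - threshold, 0.0), v) for v in direction]
    norm = math.sqrt(sum(v * v for v in shrunk))
    if norm == 0.0:
        raise ValueError("direction must contain a nonzero entry")
    return [v / norm for v in shrunk]


def retained(names, weights):
    """Names of variables whose sparse weight is nonzero (cf. colret_)."""
    return [n for n, w in zip(names, weights) if w != 0.0]


names = ["x1", "x2", "x3", "x4"]
direction = [0.8, -0.1, 0.3, 0.05]
for eta in (0.0, 0.2, 0.5):
    print(eta, retained(names, soft_threshold(direction, eta)))
# prints:
# 0.0 ['x1', 'x2', 'x3', 'x4']
# 0.2 ['x1', 'x3']
# 0.5 ['x1']
```

Larger eta yields sparser directions; eta = 0 leaves the direction dense, while eta = 1 would zero every entry, consistent with the documented range [0, 1).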

__init__(n_components=1, eta=0.5, fun='Hampel', probp1=0.95, probp2=0.975, probp3=0.999, centre='median', scale='mad', verbose=True, maxit=100, tol=0.01, start_cutoff_mode='specific', start_X_init='pcapp', columns=False, copy=True)

Methods

  • __init__([n_components, eta, fun, probp1, ...])

  • fit(X, y) – Fit an SPRM model.

  • fit_transform(X[, y]) – Fit to data, then transform it.

  • get_params([deep]) – Get parameters for this estimator.

  • predict(Xn) – Predict using an SPRM model.

  • score(X, y[, sample_weight]) – Return the coefficient of determination of the prediction.

  • set_params(**params) – Set the parameters of this estimator.

  • transform(Xn) – Transform input data.

  • valscore(Xn, yn, scoring) – Specific score function for validation data.

  • weightnewx(Xn) – Calculate case weights for new data based on the projection in the SPRM score space.
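The description of score matches scikit-learn's RegressorMixin.score, which returns the coefficient of determination R² = 1 - SS_res / SS_tot of the predictions against y; presumably the method is inherited from there. As a reference for what that number means, the stdlib-only sketch below computes R², optionally with sample weights in the style of scikit-learn's r2_score. The function name r2 is hypothetical and this is not the estimator's own code.

```python
def r2(y_true, y_pred, sample_weight=None):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot.

    With sample_weight, the mean and both sums of squares are weighted.
    A constant y_true (SS_tot = 0) raises ZeroDivisionError here.
    """
    w = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    mean = sum(wi * yi for wi, yi in zip(w, y_true)) / sum(w)
    ss_res = sum(wi * (yi - pi) ** 2 for wi, yi, pi in zip(w, y_true, y_pred))
    ss_tot = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y_true))
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(r2(y, y_hat))  # approximately 0.98
```

A perfect prediction gives 1.0, and predicting the mean of y for every case gives 0.0.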

Attributes