Regression models#

Classes to build models from a metatensor.TensorMap.

The model classes listed here are based on NumPy. Classes based on torch will be added in the future.

class equisolve.numpy.models.Ridge[source]#

Linear least squares with L2 regularization for metatensor.TensorMap objects.

Weights \(w\) are calculated according to

\[w = X^T \left( X \cdot X^T + α I \right)^{-1} \cdot y \,,\]

where \(X\) is the training data, \(y\) the target data and \(α\) is the regularization strength.

Ridge will regress a model for each block in X. If a block contains components, the component values are stacked along the sample dimension for the fit; the corresponding weights are therefore the same for each component.
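The closed-form solution above can be checked with a plain NumPy sketch (assuming a dense array X in place of a TensorMap block):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # 20 samples, 5 properties
y = rng.normal(size=(20, 1))
alpha = 1.0

# dual form from the equation above: w = X^T (X X^T + alpha I)^-1 y
w = X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(X.shape[0]), y)

# the equivalent primal form (X^T X + alpha I) w = X^T y gives the same weights
w_primal = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
assert np.allclose(w, w_primal)
```

The dual form inverts an n_samples × n_samples matrix instead of an n_features × n_features one, which is why it is preferred when there are more properties than samples.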

fit(X: TensorMap, y: TensorMap, alpha: float | TensorMap = 1.0, sample_weight: float | TensorMap | None = None, solver='auto', cond: float | None = None) None[source]#

Fit a regression model to each block in X.

Ridge takes all available values and gradients in the provided TensorMap for the fit. Gradients can be excluded from the fit by removing them from the TensorMap. See metatensor.remove_gradients() for details.

Parameters:
  • X – training data

  • y – target values

  • alpha – Constant α that multiplies the L2 term, controlling the regularization strength. Values must be non-negative floats, i.e. in [0, inf). α can be different for each column in X to regularize each property differently.

  • sample_weight – Individual weights for each sample. For None or a float, every sample will have the same weight of 1 or the given float, respectively.

  • solver

    Solver to use in the computational routines:

    • "auto": If n_features > n_samples in X, the dual problem is solved using cholesky_dual; if this fails, it switches to scipy.linalg.lstsq() on the dual problem. If n_features <= n_samples in X, the primal problem is solved using cholesky; if this fails, it switches to scipy.linalg.eigh() on (X.T @ X) and cuts off eigenvalues below machine precision times the maximal shape of X.

    • "cholesky": using scipy.linalg.solve() on the primal problem (X.T @ X) w = X.T @ y

    • "cholesky_dual": using scipy.linalg.solve() on the dual problem (X @ X.T) w_dual = y; the primal weights are recovered as w = X.T @ w_dual

    • "lstsq": using scipy.linalg.lstsq() on the linear system X w = y

  • cond – Cut-off ratio for small singular values during the fit. For the purposes of rank determination, singular values are treated as zero if they are smaller than cond times the largest singular value in the "weights" matrix.
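The primal and dual routes behind the solver options can be sketched in plain NumPy (ridge_solve is a hypothetical helper, with numpy.linalg.solve standing in for the scipy routines):

```python
import numpy as np

def ridge_solve(X, y, alpha):
    # Sketch of the "auto" routing described above (hypothetical helper).
    n_samples, n_features = X.shape
    if n_features > n_samples:
        # dual problem: (X @ X.T + alpha I) w_dual = y, then w = X.T @ w_dual
        w_dual = np.linalg.solve(X @ X.T + alpha * np.eye(n_samples), y)
        return X.T @ w_dual
    # primal problem: (X.T @ X + alpha I) w = X.T @ y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))  # n_features > n_samples: the dual route is cheaper
y = rng.normal(size=(10,))
w = ridge_solve(X, y, alpha=0.1)
```

Both branches yield the same weights; the routing only changes the size of the linear system that has to be solved.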

predict(X: TensorMap) TensorMap[source]#

Predict using the linear model.

Parameters:

X – samples

Returns:

predicted values
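In plain NumPy terms (assuming dense arrays rather than TensorMap blocks), the fit/predict round trip looks like:

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y_train = X_train @ w_true

# fit with a tiny ridge penalty, then predict on unseen samples
alpha = 1e-10
w = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(4), X_train.T @ y_train)

X_new = rng.normal(size=(5, 4))
y_pred = X_new @ w  # the prediction is just the linear map X @ w
```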

score(X: TensorMap, y: TensorMap, parameter_key: str) float[source]#

Return the root mean squared error (\(\mathrm{RMSE}\)) of the prediction.

Parameters:
  • X – Test samples

  • y – True values for X.

  • parameter_key – Parameter to score for. Examples are "values", "positions" or "cell".

Returns:

\(\mathrm{RMSE}\) for each block in self.predict(X) with respect to y.
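As a plain NumPy sketch of the per-block scoring (rmse here is a hypothetical helper, not part of the API):

```python
import numpy as np

def rmse(y_true, y_pred):
    # root mean squared error, as returned by score() for each block
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

error = rmse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
```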

property weights: TensorMap#

TensorMap containing the weights fitted to the provided training data.