Surrogate

This module contains various surrogate models used in the illumination library.

Fingerprint Surrogate

class illumination.functions.surrogate.Fingerprint_Surrogate(config)

Bases: GP_Surrogate

A surrogate model that predicts fitness values from molecular fingerprints. The model is based on Gaussian process (GP) regression.
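As a rough illustration of what GP regression computes, the sketch below evaluates the posterior mean of a zero-mean GP with an RBF kernel on small numeric feature vectors. It is not the library's implementation: the kernel choice, the `noise` jitter, and the names `rbf`, `solve`, and `gp_posterior_mean` are assumptions made for this example, and the inputs are plain lists of floats rather than real fingerprints.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] without mutating the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior_mean(X_train, y_train, X_new, noise=1e-6):
    """Posterior mean of a zero-mean GP: k(X_new, X) (K + noise * I)^-1 y."""
    K = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X_train)]
        for i, a in enumerate(X_train)
    ]
    alpha = solve(K, y_train)
    return [
        sum(rbf(x, xt) * a for xt, a in zip(X_train, alpha))
        for x in X_new
    ]
```

With a small `noise` value the predicted mean passes close to the training fitness values, and it falls back to the prior mean of zero for inputs far from all training points.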

Attributes:

representation: The type of molecular fingerprint used for encoding molecules.

generator: The fingerprint generator corresponding to the chosen representation.

Methods:

__init__: Initializes the Fingerprint_Surrogate object with the specified fingerprint representation.

calculate_encodings: Calculates fingerprint encodings for a list of molecules.

add_to_prior_data: Adds new molecules and their fitness values to the training data for the GP model.

add_to_prior_data(molecules)

Adds new molecules and their fitness values to the training data for the GP model.

Args:

molecules: A list of new molecules to be added to the training data.

Returns:

None
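The method takes only `molecules`, so the fitness values presumably travel with the molecule objects. The following sketch shows that pattern under stated assumptions: the `Molecule` class, its `smiles` and `fitness` fields, the `ToySurrogate` class, and the placeholder encoding are all hypothetical and are not taken from the library.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical molecule record that carries its own fitness value."""
    smiles: str
    fitness: float


@dataclass
class ToySurrogate:
    """Hypothetical stand-in for a surrogate that accumulates training data."""
    X: list = field(default_factory=list)  # encodings seen so far
    y: list = field(default_factory=list)  # matching fitness values

    def calculate_encodings(self, molecules):
        # Placeholder encoding (string length, number of 'C' characters);
        # a real surrogate would compute fingerprints or string features.
        return [[float(len(m.smiles)), float(m.smiles.count("C"))]
                for m in molecules]

    def add_to_prior_data(self, molecules):
        # Append encodings and fitness values in the same order so that
        # X[i] and y[i] always describe the same molecule.
        self.X.extend(self.calculate_encodings(molecules))
        self.y.extend(m.fitness for m in molecules)
        # A GP-based surrogate would typically refit its model here.
        return None
```

Keeping `X` and `y` aligned by position is the key design point: every call grows both lists by the same number of entries.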

calculate_encodings(molecules)

Calculates fingerprint encodings for a list of molecules.

Args:

molecules: A list of molecules to be encoded.

Returns:

List[np.ndarray]: A list of fingerprint encodings.
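To make the shape of the result concrete, here is a toy sketch that maps each molecule to a fixed-length bit vector. It is only an illustration: it hashes substrings of a SMILES string instead of using a chemistry-aware fingerprint generator, it returns plain Python lists rather than `np.ndarray` objects, and the names `toy_fingerprint`, `n_bits`, and `max_len` are assumptions made for this example.

```python
import zlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of length 1..max_len into a fixed-size bit vector."""
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # zlib.crc32 is deterministic across runs, unlike built-in hash().
            index = zlib.crc32(fragment.encode("utf-8")) % n_bits
            bits[index] = 1
    return bits


def calculate_encodings(molecules, n_bits=64):
    """Return one bit vector per input SMILES string, in input order."""
    return [toy_fingerprint(smiles, n_bits=n_bits) for smiles in molecules]
```

Each input yields one vector of identical length, which is what lets the encodings be stacked into a training matrix for the GP model.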

String Surrogate

class illumination.functions.surrogate.String_Surrogate(config)

Bases: GP_Surrogate

A surrogate model that predicts fitness values from molecular string representations (SMILES or SELFIES). The model is based on Gaussian process (GP) regression.

Attributes:

smiles: A list of SMILES strings representing the molecules.

representation: The type of molecular string representation used (e.g., SMILES or SELFIES).

cv: A CountVectorizer object for converting molecular strings into numerical representations.

Methods:

__init__: Initializes the String_Surrogate object with the specified molecular string representation.

calculate_encodings: Calculates string encodings for a list of molecules.

add_to_prior_data: Adds new molecules and their fitness values to the training data for the GP model.

add_to_prior_data(molecules)

Adds new molecules and their fitness values to the training data for the GP model.

Args:

molecules: A list of new molecules to be added to the training data.

Returns:

None

calculate_encodings(molecules)

Calculates string encodings for a list of molecules.

Args:

molecules: A list of molecules to be encoded.

Returns:

np.ndarray: A 2D array of string encodings.
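The idea behind a count-based string encoding can be sketched in plain Python. This is a simplified stand-in for what a count vectorizer does, not the library's code: it counts character n-grams against a fixed vocabulary and returns nested lists rather than an `np.ndarray`. The function names and the n-gram range of 1 to 2 are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=2):
    """List all character n-grams of length n_min..n_max in a string."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_vocabulary(strings):
    """Map each n-gram seen in the training strings to a column index."""
    vocab = sorted({gram for s in strings for gram in char_ngrams(s)})
    return {gram: idx for idx, gram in enumerate(vocab)}


def calculate_encodings(strings, vocabulary):
    """Return one row of n-gram counts per string (rows x vocabulary size)."""
    rows = []
    for s in strings:
        counts = Counter(char_ngrams(s))
        # Dict order follows the sorted vocabulary, so columns are stable.
        rows.append([counts.get(gram, 0) for gram in vocabulary])
    return rows
```

N-grams that were not present when the vocabulary was fitted are ignored at encoding time, so every row has the same number of columns regardless of the input string.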