Argenomic: Illuminating Chemical Space

Chemists search through the space of molecules to discover new drugs, often aided by search algorithms, which automatically explore a search space to find high-performing solutions. With Argenomic, we provide a novel type of chemical space exploration software, based on a paper by Jeriek Van den Abeele and Jonas Verhellen. Argenomic produces a large and diverse set of high-performing yet qualitatively different molecules, illuminates how molecular fitness varies across chemical space, and improves search efficiency compared with both machine learning and genetic algorithm approaches.

Shining a Light on Chemical Space

Optimization algorithms aim to find the best solution within a given search space. In the case of molecular optimization, where the mathematical evaluation function is often unknown, heuristic search methods are necessary. Genetic algorithms are a popular heuristic method inspired by evolution, where mutations and crossovers are used to generate novel solutions. However, genetic algorithms can stagnate in low-performing valleys or local optima. Argenomic addresses this issue by enforcing molecular diversity through feature-based niches and decoupling mutations from crossovers. By explicitly retaining diverse solutions in far-away niches, Argenomic can escape stagnation and illuminate the relationship between features and performance.
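One reading of "decoupling mutations from crossovers" is that each child is produced by exactly one operator: either a single parent is mutated, or two parents are crossed over and the result is not mutated afterwards. A classical genetic algorithm typically does crossover first and then mutates the offspring. The minimal sketch below assumes that reading. The function `decoupled_variation` and the toy string operators are hypothetical stand-ins for the graph-based operators Argenomic actually uses, not its API.

```python
import random
from typing import Callable, List, Sequence


def decoupled_variation(
    parents: Sequence[str],
    mutate: Callable[[str], str],
    crossover: Callable[[str, str], str],
    batch_size: int,
    rng: random.Random,
) -> List[str]:
    """Produce children in which mutation and crossover never act on the same child.

    The first `batch_size` children are single-parent mutants; the next
    `batch_size` are two-parent crossovers that are not mutated afterwards.
    """
    children: List[str] = []
    for _ in range(batch_size):
        # Mutation branch: one randomly chosen parent, one mutation.
        children.append(mutate(rng.choice(parents)))
    for _ in range(batch_size):
        # Crossover branch: two distinct parents, no follow-up mutation.
        first, second = rng.sample(list(parents), 2)
        children.append(crossover(first, second))
    return children


# Hypothetical toy operators on strings, standing in for graph-based operators.
def toy_mutate(molecule: str) -> str:
    return molecule + "C"  # "append a carbon"


def toy_crossover(first: str, second: str) -> str:
    # Join the first half of one parent to the second half of the other.
    return first[: len(first) // 2] + second[len(second) // 2 :]


rng = random.Random(0)
parents = ["CCO", "CCN", "CCCl"]
children = decoupled_variation(parents, toy_mutate, toy_crossover, batch_size=4, rng=rng)
print(children)
```

Keeping the two branches separate means a successful crossover is never disrupted by a subsequent random mutation, and each operator's contribution can be sized independently.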

Argenomic assigns each candidate solution generated by a genetic algorithm to a niche based on user-defined features, and within each niche only the fittest candidate survives. This feature-based niche assignment contrasts with classical genetic algorithms, which only retain high-scoring solutions regardless of diversity. The enforced variation between niches allows for diverse crossovers and mutations, spreading potent scaffolds to other niches. The resulting diverse solutions in far-away niches can be used to escape stagnation, while the fitness of the molecule surviving in each niche indicates how capable the corresponding region of feature space is of containing high-performing molecules. Argenomic thereby provides insight into how varying features affects performance, both positively and negatively.
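The niche mechanism can be sketched as a small archive that keeps at most one elite per niche. This sketch assumes each niche is represented by a centroid point in a normalized feature space and that a candidate belongs to the niche with the nearest centroid, as in the centroidal Voronoi tessellation variant of MAP-Elites. `NicheArchive`, `Elite` and their methods are hypothetical names, not Argenomic's API.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Elite:
    molecule: str  # e.g. a SMILES string
    fitness: float
    features: Tuple[float, ...]


class NicheArchive:
    """Keeps at most one elite per niche, MAP-Elites style."""

    def __init__(self, centroids: Sequence[Sequence[float]]):
        # Each niche is represented by a centroid in (normalized) feature space.
        self.centroids = [tuple(c) for c in centroids]
        self.elites: Dict[int, Elite] = {}

    def niche_of(self, features: Sequence[float]) -> int:
        # Assign to the niche whose centroid is closest (Euclidean distance).
        return min(
            range(len(self.centroids)),
            key=lambda i: math.dist(self.centroids[i], features),
        )

    def add(self, molecule: str, fitness: float, features: Sequence[float]) -> bool:
        """Insert a candidate; return True if it became its niche's elite."""
        niche = self.niche_of(features)
        current = self.elites.get(niche)
        if current is None or fitness > current.fitness:
            self.elites[niche] = Elite(molecule, fitness, tuple(features))
            return True
        return False


archive = NicheArchive(centroids=[(0.25, 0.25), (0.75, 0.75)])
archive.add("CCO", 0.4, (0.2, 0.3))        # fills niche 0
archive.add("CCN", 0.6, (0.3, 0.2))        # fitter, replaces CCO in niche 0
archive.add("CCC", 0.5, (0.1, 0.1))        # rejected: niche 0 keeps CCN
archive.add("c1ccccc1", 0.1, (0.8, 0.9))   # kept in niche 1 despite low fitness
```

The last call shows the key difference from a classical genetic algorithm: a low-scoring molecule survives because nothing better occupies its distant region of feature space, so it remains available as a parent.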

Handing Over the Reins

In practical terms, users of Argenomic can choose their own features of interest and define relevant ranges of variation to construct a feature space. If, for instance, a user wants to find medicinally relevant molecules, they could construct a feature space based on physicochemical properties like lipophilicity and molecular mass, and on practical concerns like synthetic accessibility. The chosen ranges then delimit the subset of chemical space in which new molecules are generated. To use the Argenomic software, simply open a terminal, activate the conda environment and run:

python3 illuminate.py configuration_file=./configuration/config.yaml generations=50
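The feature space described above can be sketched as a mapping from descriptor values to a point in the unit hypercube, using the user-defined ranges. The numeric ranges below are hypothetical and only for illustration; the real ones come from the configuration file. In practice the descriptor values could be computed with RDKit's `Descriptors` module (which provides `ExactMolWt`, `MolLogP`, `TPSA` and `MolMR`); here they are passed in as plain numbers so the sketch runs without RDKit. The sketch also assumes that molecules outside the ranges are discarded rather than clipped to the boundary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical ranges, for illustration only; the actual ranges are whatever
# the user writes in the configuration file.
FEATURE_RANGES: Dict[str, Tuple[float, float]] = {
    "ExactMolWt": (150.0, 500.0),
    "MolLogP": (-2.0, 5.0),
    "TPSA": (0.0, 140.0),
    "MolMR": (40.0, 130.0),
}


def to_feature_vector(
    descriptors: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]] = FEATURE_RANGES,
) -> Optional[Tuple[float, ...]]:
    """Scale each descriptor to [0, 1] within its range.

    Returns None if any descriptor falls outside its range, i.e. the molecule
    lies outside the chosen subset of chemical space.
    """
    vector = []
    for name, (low, high) in ranges.items():
        value = descriptors[name]
        if not low <= value <= high:
            return None
        vector.append((value - low) / (high - low))
    return tuple(vector)


inside = to_feature_vector({"ExactMolWt": 325.0, "MolLogP": 1.5, "TPSA": 70.0, "MolMR": 85.0})
outside = to_feature_vector({"ExactMolWt": 650.0, "MolLogP": 1.5, "TPSA": 70.0, "MolMR": 85.0})
print(inside)   # -> (0.5, 0.5, 0.5, 0.5)
print(outside)  # -> None
```

Normalizing every feature to the same scale keeps one descriptor with large raw values (such as molecular mass) from dominating the distances used to assign niches.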

After initialization (mainly needed to set up the niches), Argenomic prints the maximum fitness, the mean fitness, and the standard deviation of the fitness every generation, together with the percentage of niches that are filled. The configuration file determines which options of the Argenomic software are used. For example, our default config.yaml file specifies the initial pool of molecules (100 molecules from GuacaMol), the batch size of molecules mutated every generation (40), where to store the results, how many niches are used (150), which properties define the niches (ExactMolWt, MolLogP, TPSA, MolMR), and which structural alerts (Glaxo) are used to filter out unwanted molecules.
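The per-generation statistics could be computed from the archive roughly as follows. This is a sketch under stated assumptions: statistics are taken only over the elites of filled niches, and the population standard deviation is used (the sample standard deviation would be an equally defensible choice). `generation_report` and the printed format are hypothetical, not Argenomic's actual log output.

```python
import statistics
from typing import Dict, Sequence


def generation_report(elite_fitnesses: Sequence[float], n_niches: int) -> Dict[str, float]:
    """Summarize one generation from the fitness values of the current elites.

    Assumes one fitness value per filled niche; empty niches contribute nothing.
    """
    if not elite_fitnesses:
        raise ValueError("archive is empty: no niche has been filled yet")
    return {
        "max": max(elite_fitnesses),
        "mean": statistics.mean(elite_fitnesses),
        # Population standard deviation over the elites (an assumption).
        "std": statistics.pstdev(elite_fitnesses),
        "filled_percent": 100.0 * len(elite_fitnesses) / n_niches,
    }


# 150 niches, matching the default configuration; three of them filled.
report = generation_report([0.2, 0.4, 0.6], n_niches=150)
print(f"max {report['max']:.3f} | mean {report['mean']:.3f} | "
      f"std {report['std']:.3f} | filled {report['filled_percent']:.1f}%")
```

Tracking the filled percentage alongside the fitness statistics shows whether the search is still spreading into new regions of feature space or only improving niches it has already reached.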

Inspiration and References

As in all creative endeavors, the ideas behind Argenomic were not born in a vacuum. The project was kickstarted by the discovery of the multi-dimensional archive of phenotypic elites (MAP-Elites), devised by Jean-Baptiste Mouret and Jeff Clune. MAP-Elites is a simple, efficacious and surprisingly powerful tool developed in the context of soft robot design, which addresses stagnation in genetic algorithms by mimicking biological evolutionary diversity as described above. We chose a graph-based genetic algorithm (GB-GA) as our underlying molecular optimization tool. In GB-GA, candidate molecules are represented by their molecular graphs, and mutations and crossovers are implemented as actions on those graphs. Thanks to Jan H. Jensen, a fully open-source implementation of GB-GA is available. The source code for Argenomic can be found on GitHub.
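To make "mutations as actions on the graph" concrete, the toy sketch below represents a molecule's heavy atoms as nodes and its bonds as edges, and implements one kind of graph mutation: attaching a new atom to an existing atom that still has a free valence. It is an illustration under simplifying assumptions (implicit hydrogens, no charges or aromaticity), not Jensen's GB-GA code, which operates on RDKit molecules with its own set of mutation and crossover operations. `MolGraph`, `MAX_VALENCE` and `append_atom` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

# Maximum number of bonds per element (simplified: no charges, no aromaticity).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


class MolGraph:
    """A toy molecular graph: heavy atoms are nodes, bonds are edges with an order.

    Hydrogens are implicit: any unused valence is assumed to be filled by hydrogen.
    """

    def __init__(self, atoms: List[str], bonds: Dict[Tuple[int, int], int]):
        self.atoms = list(atoms)
        self.bonds = dict(bonds)  # (i, j) with i < j -> bond order

    def free_valence(self, i: int) -> int:
        used = sum(order for (a, b), order in self.bonds.items() if i in (a, b))
        return MAX_VALENCE[self.atoms[i]] - used


def append_atom(mol: MolGraph, element: str, rng: random.Random) -> MolGraph:
    """Graph mutation: single-bond a new atom to a random atom with free valence.

    Always returns a new graph; the input molecule is left unchanged.
    """
    if element not in MAX_VALENCE:
        raise ValueError(f"unsupported element: {element}")
    candidates = [i for i in range(len(mol.atoms)) if mol.free_valence(i) >= 1]
    child = MolGraph(mol.atoms, mol.bonds)  # copy
    if not candidates:
        return child  # fully saturated: nothing can be attached
    anchor = rng.choice(candidates)
    child.atoms.append(element)
    child.bonds[(anchor, len(child.atoms) - 1)] = 1
    return child


ethanol = MolGraph(["C", "C", "O"], {(0, 1): 1, (1, 2): 1})  # heavy atoms only
child = append_atom(ethanol, "N", random.Random(0))
print(child.atoms, child.bonds)
```

Working on the graph rather than on a string representation means every mutation can be checked against chemical rules such as valence before the child is accepted.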
