Introduction

The ExCAPE (Exascale Compound Activity Prediction Engines) project aims to produce state-of-the-art, scalable machine learning algorithms and implementations for predicting compound bioactivity, suitable for future Exascale machines (10^18 floating-point operations per second).

The project is part of the European H2020 (Horizon 2020) initiative, the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020).

The RHUL (Royal Holloway, University of London) team contributes to the area of Uncertainty Quantification with its expertise in Conformal Prediction and Venn-Abers Prediction.

Deliverable reports

  • Report #1: Conformal Predictors. The report summarises some preliminary findings of WP1.4: Confidence Estimation and feature significance. It presents an application of conformal predictors, in both transductive and inductive modes, to the large, high-dimensional, sparse and imbalanced data sets found in Compound Activity Prediction from the PubChem public repository. The report describes a version of conformal predictors called the Mondrian predictor, which keeps validity guarantees for each class. It also briefly describes the parallelization approach that made it possible to distribute the computational load and reduce execution time. Download PDF
  • Report #2: Probabilistic Prediction. The objective of this subpackage is to complement the bare prediction of bioactivity with an estimate of its uncertainty. The previous report described Conformal Prediction and discussed some results of its application to BioAssay data. This report introduces multi-probabilistic prediction, also referred to as Venn prediction. Download PDF
  • Report #3: Integration of Conformal Prediction with ML Algorithms. The objective of this subpackage is to integrate Conformal Prediction (CP) with the Machine Learning methods adopted in ExCAPE. Conformal Prediction was described in the first report, with particular emphasis on its inductive and class-conditional (Mondrian) forms. The deliverable is a Python module that implements Mondrian Inductive CP (MICP). The module is meant to be used as a stage downstream of the ML algorithms in the overall pipeline, which takes the ExCAPE-DB data as input and produces predictions as its final output. The module is offered as a reference implementation; partners in WP2 and WP3 are free to re-implement it. Download PDF
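
The reports describe Mondrian inductive conformal prediction (MICP) only in outline. What follows is a minimal sketch of how such a stage could sit downstream of a classifier. It assumes binary labels (1 = active, 0 = inactive) and a model that outputs a probability of activity for each compound. The nonconformity measure (one minus the predicted probability of the candidate label), the class name `MondrianICP`, and its methods are all hypothetical, and none of them is taken from the ExCAPE module:

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Set


def nonconformity(p_active: float, label: int) -> float:
    """Hypothetical measure: a compound is 'strange' for a candidate label
    when the model gives that label a low probability."""
    return 1.0 - p_active if label == 1 else p_active


class MondrianICP:
    """Sketch of a Mondrian (class-conditional) inductive conformal predictor
    applied to the probabilities produced by an upstream ML model."""

    def __init__(self, labels: Sequence[int] = (0, 1)) -> None:
        self.labels = tuple(labels)
        # One sorted list of calibration scores per class (the Mondrian split).
        self._cal: Dict[int, List[float]] = {}

    def calibrate(self, cal_probs: Sequence[float],
                  cal_labels: Sequence[int]) -> "MondrianICP":
        for y in self.labels:
            self._cal[y] = sorted(
                nonconformity(p, y)
                for p, lab in zip(cal_probs, cal_labels) if lab == y
            )
        return self

    def p_values(self, test_prob: float) -> Dict[int, float]:
        result = {}
        for y in self.labels:
            alphas = self._cal[y]
            a_test = nonconformity(test_prob, y)
            # Count calibration scores of class y that are >= the test score.
            n_ge = len(alphas) - bisect_left(alphas, a_test)
            result[y] = (n_ge + 1) / (len(alphas) + 1)
        return result

    def predict_set(self, test_prob: float, epsilon: float) -> Set[int]:
        """Labels whose p-value exceeds the significance level epsilon."""
        return {y for y, p in self.p_values(test_prob).items() if p > epsilon}


if __name__ == "__main__":
    icp = MondrianICP().calibrate([0.9, 0.8, 0.3, 0.2, 0.1, 0.6],
                                  [1, 1, 1, 0, 0, 0])
    print(icp.p_values(0.7))          # {0: 0.25, 1: 0.5}
    print(icp.predict_set(0.7, 0.3))  # {1}
```

Keeping a separate set of calibration scores for each class is what makes the predictor Mondrian. Each p-value compares a test compound only with calibration compounds of the candidate class, and this is the source of the per-class validity that matters for imbalanced bioassay data. Sorting the scores once means each p-value then costs a single binary search, which helps when the same calibration set scores many test compounds.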
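
Venn (multi-probabilistic) prediction is also only named above. One concrete member of the family is the inductive Venn-Abers predictor, which corresponds to the Venn-Abers expertise mentioned in the introduction. It fits isotonic regression to the calibration scores twice: once with the test compound provisionally labelled inactive (0) and once labelled active (1). The two fitted values at the test score form a pair of probabilities (p0, p1). The sketch below assumes binary labels and real-valued classifier scores. The function names are hypothetical, and this is not the implementation described in Report #2:

```python
from typing import Dict, List, Sequence, Tuple


def isotonic_value_at(scores: Sequence[float], labels: Sequence[int],
                      x_query: float) -> float:
    """Fit a non-decreasing step function to (score, label) pairs with the
    pool-adjacent-violators algorithm; return its value at x_query, which
    must be one of the scores."""
    # Pool tied scores: each distinct score becomes one weighted point.
    agg: Dict[float, Tuple[float, int]] = {}
    for s, y in zip(scores, labels):
        total, count = agg.get(s, (0.0, 0))
        agg[s] = (total + y, count + 1)
    xs = sorted(agg)

    # Each block holds [label_sum, weight, first_index, last_index] over xs.
    blocks: List[list] = []
    for i, x in enumerate(xs):
        total, count = agg[x]
        blocks.append([total, count, i, i])
        # Merge backwards while the block means would decrease.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            t_last, c_last, _, last = blocks.pop()
            blocks[-1][0] += t_last
            blocks[-1][1] += c_last
            blocks[-1][3] = last

    idx = xs.index(x_query)
    for total, count, first, last in blocks:
        if first <= idx <= last:
            return total / count
    raise ValueError("x_query not found among the scores")


def venn_abers_pair(cal_scores: Sequence[float], cal_labels: Sequence[int],
                    test_score: float) -> Tuple[float, float]:
    """Return (p0, p1): the isotonic fits at test_score when the test object
    is provisionally labelled 0 and 1, respectively."""
    scores = list(cal_scores) + [test_score]
    p0 = isotonic_value_at(scores, list(cal_labels) + [0], test_score)
    p1 = isotonic_value_at(scores, list(cal_labels) + [1], test_score)
    return p0, p1


def merge_probability(p0: float, p1: float) -> float:
    """Collapse the pair into a single probability of the active class."""
    return p1 / (1.0 - p0 + p1)


if __name__ == "__main__":
    p0, p1 = venn_abers_pair([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                             [0, 0, 1, 0, 1, 1], 0.45)
    print(p0, p1)  # p0 = 1/3, p1 = 1.0
    print(merge_probability(p0, p1))
```

The gap between p0 and p1 acts as the uncertainty estimate: it shrinks as the calibration set grows and widens for compounds whose scores fall in sparsely populated regions. The `merge_probability` step is one common way to reduce the pair to a single number. The report itself does not state whether it merges the pair.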

Technical reports


Publications

Researchers