Sitemap

A list of all the posts and pages found on the site. For the robots out there, an XML version is also available for digesting.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

projects

Human-Simulatable Additive Classification Models

Published:

Generalised additive classification models provide probabilistic predictions by adding the output of individual predictors to form a raw score $a=a_1+\dots+a_k$ and then mapping this raw score to a probability $p(a)$. This process is considered human-interpretable because of the relative ease with which humans can carry out addition, and this is further enhanced by model fitting methods that restrict the individual predictor output to a small set of round values, e.g., $a_i \in \{-5, \dots, 5\}$. However, even when the resulting raw sum is round and easy to compute, the standard classification machinery based around the logistic transformation $p(a)=e^a/(1+e^a)$ renders it hard for humans to convert raw outputs to probabilities. For instance, a raw model output of $a=1$ corresponds to a probability $e/(1+e) \approx 0.7311$ and $a=2$ corresponds to $e^2/(1+e^2)\approx 0.8808$.
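The difficulty described above is easy to reproduce: even perfectly round raw scores land on probabilities with no obvious mental shortcut. A minimal sketch in Python (the function name `sigmoid` is mine, not from the project):

```python
import math

def sigmoid(a):
    """Logistic transformation p(a) = e^a / (1 + e^a)."""
    return math.exp(a) / (1.0 + math.exp(a))

# Round raw scores map to decidedly non-round probabilities:
for a in range(4):
    print(a, round(sigmoid(a), 4))
# a=1 -> 0.7311 and a=2 -> 0.8808, matching the values quoted above
```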

Graph Neural Networks for Guiding Chemical Synthesis of Metal–Organic Frameworks

Published:

This project will explore how graph neural networks (GNNs) can be used to probabilistically predict successful crystallization conditions in the reticular synthesis of novel metal–organic frameworks (MOFs). MOFs are rich chemical structures with important applications in carbon capture and storage, water harvesting from air, hydrogen storage, and catalysis—areas central to addressing global energy and sustainability challenges.

publications

Identifying domains of applicability of machine learning models for materials science

Published in Nature Communications, 2020

Demonstrates how statistical rule learning enables the discovery of trustworthy input ranges of machine learning models for materials properties.

Recommended citation: C Sutton, M Boley, LM Ghiringhelli, M Rupp, J Vreeken, M Scheffler. (2020). "Identifying domains of applicability of machine learning models for materials science." Nature Communications. 11(1), 4428.
Download Paper

Better Short than Greedy: Interpretable Models through Optimal Rule Boosting

Published in Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), 2021

Improves the accuracy/comprehensibility trade-off of additive rule ensembles by exactly optimising the gradient boosting objective for conjunctive rules.

Recommended citation: M Boley, S Teshuva, P Le Bodic, G Webb. (2021). "Better Short than Greedy: Interpretable Models through Optimal Rule Boosting." SDM.
Download Paper

Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly

Published in Journal of Chemical Information and Modeling, 2023

Provides interpretable models that predict the morphological outcome of polymerization-induced self-assembly with a performance that suffices to reduce time-consuming experimentation by practitioners.

Recommended citation: Y Lu, D Yalcin, PJ Pigram, LD Blackman, M Boley. "Interpretable machine learning models for phase prediction in polymerization-induced self-assembly." Journal of Chemical Information and Modeling 63, no. 11 (2023): 3288-3306.
Download Paper

Bayes beats cross validation: fast and accurate ridge regression via expectation maximization

Published in Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), 2023

Presents a novel method for tuning the regularization hyper-parameter, λ, of ridge regression that is faster to compute than leave-one-out cross-validation (LOOCV) while yielding estimates of the regression parameters of equal, or, particularly in the setting of sparse covariates, superior quality to those obtained by minimising the LOOCV risk.
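For context on the LOOCV baseline that the paper improves upon: for ridge regression the LOOCV risk can be computed in closed form from a single fit per candidate λ, using the well-known shortcut that divides each training residual by one minus the corresponding hat-matrix diagonal. A minimal sketch (the data and λ grid are synthetic, not from the paper, which instead tunes λ via an EM-style Bayesian procedure):

```python
import numpy as np

def ridge_loocv_risk(X, y, lam):
    """Closed-form leave-one-out CV risk for ridge regression.

    Uses the shortcut e_i / (1 - h_ii), where h_ii are the diagonal
    entries of the ridge hat matrix, so no per-fold refitting is needed.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    residuals = y - H @ y
    return np.mean((residuals / (1.0 - np.diag(H))) ** 2)

# Synthetic example: pick lambda by minimising the LOOCV risk over a grid.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ beta + 0.3 * rng.standard_normal(50)

grid = np.logspace(-3, 3, 25)
best_lam = min(grid, key=lambda lam: ridge_loocv_risk(X, y, lam))
```

Even with this shortcut, the grid search requires one matrix solve per candidate λ, which is the cost the paper's single-run EM approach avoids.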

Recommended citation: SY Tew, M Boley, DF Schmidt. (2023). "Bayes beats cross validation: fast and accurate ridge regression via expectation maximization." NeurIPS, 36.
Download Paper

Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles

Published in International Conference on Artificial Intelligence and Statistics, 2024

Improves the accuracy/comprehensibility trade-off of rule ensembles and other additive models via a proper adaptation of boosting with weight correction.

Recommended citation: F Yang, P Le Bodic, M Kamp, M Boley. (2024). "Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles." AISTATS.
Download Paper

talks

An Introduction to Subgroup Discovery

Published:

Subgroup discovery (SGD) is a form of local pattern discovery for labeled data that can help find interpretable descriptors from materials-science data obtained by first-principles calculations. In contrast to global modeling algorithms like kernel ridge regression or artificial neural networks, SGD finds local regions in the input space in which a target property takes on an interesting distribution. These local distributions can potentially reflect interesting scientific phenomena that are not represented in standard machine learning models. In this talk, we go over the conceptual basics of SGD, sketch corresponding search algorithms, and show some exemplary applications to such materials-science data.
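The core idea of finding "local regions with an interesting target distribution" can be illustrated with a toy search over single-feature threshold conditions, scored by a coverage-weighted shift of the subgroup's target mean against the global mean. This is a minimal sketch of one common quality function, not the algorithms presented in the talk; the feature names and data are invented:

```python
import numpy as np

def best_subgroup(X, y, feature_names):
    """Exhaustively search single-feature threshold conditions and return
    the subgroup description maximising coverage * (subgroup mean - global mean)."""
    global_mean = y.mean()
    best_desc, best_quality = None, -np.inf
    for j, name in enumerate(feature_names):
        for t in np.unique(X[:, j]):
            for op, mask in ((">=", X[:, j] >= t), ("<", X[:, j] < t)):
                if not mask.any():
                    continue
                quality = mask.mean() * (y[mask].mean() - global_mean)
                if quality > best_quality:
                    best_desc, best_quality = f"{name} {op} {t:.2f}", quality
    return best_desc, best_quality

# Toy data where the target is elevated exactly when x0 >= 0.5:
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = (X[:, 0] >= 0.5).astype(float)
desc, quality = best_subgroup(X, y, ["x0", "x1"])
```

Real subgroup discovery systems search over conjunctions of such conditions with branch-and-bound pruning rather than this brute-force scan, but the notion of a locally interesting region is the same.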

Download Slides


teaching

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.