An Introduction to Subgroup Discovery

Subgroup discovery (SGD) is a form of local pattern discovery for labeled data that can help find interpretable descriptors in materials-science data obtained from first-principles calculations. In contrast to global modeling algorithms such as kernel ridge regression or artificial neural networks, SGD identifies local regions of the input space in which the distribution of a target property deviates notably from its distribution over the whole data set. These local distributions can reflect scientific phenomena that global machine learning models do not capture. In this talk, we cover the conceptual basics of SGD, sketch the corresponding search algorithms, and show example applications to first-principles materials-science data.
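
To make the idea of a "local region with an unusual target distribution" concrete, here is a minimal sketch in Python. It is an illustration only, not the method presented in the talk. It assumes that subgroups are described by conjunctions of simple threshold conditions on the input features, and it scores each candidate with one common quality function: coverage times the shift of the subgroup mean relative to the global mean. The feature names `x1` and `x2`, the target `y`, the toy data, and all function names are hypothetical. The exhaustive search is only feasible for tiny data sets; practical implementations typically rely on branch-and-bound or beam search.

```python
from itertools import combinations

def quality(subgroup_targets, all_targets):
    """Coverage times mean shift: (n_sub / n_all) * (mean_sub - mean_all)."""
    if not subgroup_targets:
        return 0.0
    coverage = len(subgroup_targets) / len(all_targets)
    mean_sub = sum(subgroup_targets) / len(subgroup_targets)
    mean_all = sum(all_targets) / len(all_targets)
    return coverage * (mean_sub - mean_all)

def candidate_selectors(rows, features):
    """Build threshold conditions 'f <= v' and 'f > v' for every observed value v."""
    selectors = []
    for f in features:
        for v in sorted({r[f] for r in rows}):
            selectors.append((f, "<=", v))
            selectors.append((f, ">", v))
    return selectors

def covers(selector, row):
    """Return True if the row satisfies a single threshold condition."""
    f, op, v = selector
    return row[f] <= v if op == "<=" else row[f] > v

def discover(rows, features, target, max_depth=2):
    """Exhaustively score conjunctions of up to max_depth conditions.

    Returns (best_quality, best_conjunction). Assumption: a larger target
    value is the 'interesting' direction.
    """
    all_targets = [r[target] for r in rows]
    selectors = candidate_selectors(rows, features)
    best = (0.0, ())
    for depth in range(1, max_depth + 1):
        for conj in combinations(selectors, depth):
            sub = [r[target] for r in rows if all(covers(s, r) for s in conj)]
            q = quality(sub, all_targets)
            if q > best[0]:
                best = (q, conj)
    return best

# Hypothetical toy data: y is high only where x1 > 2 and x2 <= 0.
rows = [
    {"x1": 1, "x2": 0, "y": 0}, {"x1": 2, "x2": 0, "y": 0},
    {"x1": 3, "x2": 0, "y": 1}, {"x1": 4, "x2": 0, "y": 1},
    {"x1": 1, "x2": 2, "y": 0}, {"x1": 2, "x2": 2, "y": 0},
    {"x1": 3, "x2": 2, "y": 0}, {"x1": 4, "x2": 2, "y": 0},
]

best_quality, best_conj = discover(rows, ["x1", "x2"], "y")
print(best_quality, best_conj)
```

On this toy data the top-ranked description is the conjunction `x1 > 2` and `x2 <= 0`. A global model would have to fit all eight points at once, whereas the subgroup description states directly, and in readable form, where the target behaves differently.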