Data Science for similarity and crystal-structure prediction in Materials
Mission and Research Topics
With the advent of high-throughput materials science computations, millions of calculated data points are now available to the scientific community (see the Novel Materials Discovery (NOMAD) repository, the NOMAD laboratory and references therein). This represents an unprecedented opportunity for data-driven materials science.
Artist's impression of the application of data science to materials science. The picture on the right was automatically generated by a convolutional neural network starting from the two pictures on the left, so the "artist" is the neural network itself!
The research of this group is focused on developing and implementing scalable and efficient computational methods to automatically (or semi-automatically) extract knowledge from materials science data.
More specifically, we work in the following two areas:
- Materials similarities: We develop methods to assess similarities between materials and to build similarity maps based on structural, mechanical, or chemical properties. On the one hand, such maps can be used to eliminate redundancy in materials science databases. On the other hand, they can reveal which regions of the high-dimensional materials space have not been explored yet but may contain novel materials with unusual properties.
- Crystal-structure classification and prediction (in collaboration with Dr. Luca Ghiringhelli): We use low-dimensional representations of physical systems (descriptors) and supervised learning techniques, in particular neural networks and kernel methods, to automatically classify crystal structures. Moreover, we will apply these approaches to technologically relevant crystal-structure prediction problems.
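To make the two areas above concrete, here is a minimal sketch, not the group's actual method. It assumes each material has already been reduced to a short numeric descriptor vector and uses a Gaussian kernel as the similarity measure. The function names (`gaussian_kernel`, `find_redundant`, `classify`), the threshold, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Similarity in (0, 1]: 1.0 for identical descriptors, decaying with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def find_redundant(descriptors, threshold=0.99, sigma=1.0):
    """Return pairs of entry names whose similarity reaches the threshold."""
    names = sorted(descriptors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if gaussian_kernel(descriptors[a], descriptors[b], sigma) >= threshold:
                pairs.append((a, b))
    return pairs

def classify(query, labeled, sigma=1.0):
    """Kernel-weighted vote: each labeled example votes with weight k(query, x)."""
    votes = {}
    for x, label in labeled:
        votes[label] = votes.get(label, 0.0) + gaussian_kernel(query, x, sigma)
    return max(votes, key=votes.get)

# Hypothetical toy descriptors (made-up numbers, not real materials data).
descriptors = {
    "A": [1.00, 0.50],
    "B": [1.01, 0.50],  # nearly identical to A, so a redundancy candidate
    "C": [3.00, 2.00],
}
# Hypothetical labeled examples: (descriptor, structure class).
labeled = [([1.0, 0.5], "fcc"), ([1.1, 0.6], "fcc"), ([3.0, 2.0], "bcc")]

redundant_pairs = find_redundant(descriptors)
predicted = classify([2.9, 1.9], labeled)
```

The same pairwise similarities serve both purposes: entries above the threshold are candidates for removal from a database, while the kernel-weighted vote is a very simple stand-in for the supervised kernel methods mentioned above. In practice the choice of descriptor and of the kernel width `sigma` matters far more than this toy suggests.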
As part of the Novel Materials Discovery European Center of Excellence, we are making the computational tools that stem from our research available to the scientific community, with both easy-to-use and more advanced tutorials, in the context of the NOMAD Analytics Toolkit.
Our research starts from state-of-the-art data science techniques, such as convolutional and Siamese neural networks, kernel methods, hierarchical clustering algorithms, and various dimensionality reduction methods. On top of this, we integrate our physical insight and domain knowledge into both descriptor identification (how the system is represented) and modeling, modifying current algorithms according to the physical problem that needs to be solved. In fact, we strongly believe that the application of data science to materials should not only lead to transferable models with excellent performance but, more importantly, generate value through real physical and chemical insight.
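As one illustration of how physical insight can enter descriptor identification, consider the requirement (our assumption here, not something stated above) that a representation should not change when a structure is translated, rotated, or has its atoms relabeled. The sketch below uses a sorted list of pairwise interatomic distances, which satisfies all three for a finite cluster of atoms. The function name `sorted_distance_descriptor` is hypothetical, and periodic crystals would need additional treatment that is not shown.

```python
import math
from itertools import combinations

def sorted_distance_descriptor(positions):
    """Sorted pairwise distances: unchanged by translation, rotation, and atom order."""
    return sorted(math.dist(p, q) for p, q in combinations(positions, 2))

# Hypothetical three-atom cluster (arbitrary coordinates, arbitrary units).
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

# The same cluster rotated by 90 degrees about z, shifted by (5, 5, 5),
# and listed in a different atom order.
transformed = [(3.0, 5.0, 5.0), (5.0, 5.0, 5.0), (5.0, 6.0, 5.0)]

d_original = sorted_distance_descriptor(original)
d_transformed = sorted_distance_descriptor(transformed)

# Both descriptors agree up to floating-point rounding.
same = all(math.isclose(a, b) for a, b in zip(d_original, d_transformed))
```

Because the invariances are built into the representation, a model trained on it does not have to learn them from data, which is one way domain knowledge can improve transferability.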