Conference on a FAIR Data Infrastructure for Materials Genomics

3-5 June 2020 | virtual meeting

Open questions from the discussions

During the virtual conference, lively discussions were held and the experts answered the participants' questions.
Below are the speakers' answers to questions that could not be addressed during the sessions due to time constraints:

Tong-Yi Zhang: From Data to Knowledge: Data Driven Discovery of Formulas

Question: I love the point made in the presentation that essentially all of our historical scientific models (like those derived by Newton, Faraday, and many, many others since the scientific revolution) originate from data-driven observations. Do you know of any published literature that discusses these points further?

Answer by Tong-Yi Zhang: Thanks for the question. The following references discuss these points further:

  • Goldstein, E.B., Coco, G., 2015. Machine learning components in deterministic models: hybrid synergy in the age of data. Frontiers in Environmental Science 3, 1–4.
  • Schmidt, M., Lipson, H., 2009. Distilling free-form natural laws from experimental data. Science 324, 81–85.
  • Vaddireddy, H., Rasheed, A., Staples, A.E., San, O., 2020. Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data. Physics of Fluids 32, 015113.
  • Brunton, S.L., Proctor, J.L., Kutz, J.N., 2016. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113, 3932–3937.
  • Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N., 2017. Data-driven discovery of partial differential equations. Science Advances 3, e1602614.
  • Schaeffer, H., 2017. Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473, 20160446.
  • Zhang, J., Ma, W., 2020. Data-driven discovery of governing equations for fluid dynamics based on molecular simulation. Journal of Fluid Mechanics 892, A5.
  • Howison, T., Hughes, J., Iida, F., 2020. Large-scale automated investigation of free-falling paper shapes via iterative physical experimentation. Nature Machine Intelligence 2, 68–75.
  • Iten, R., Metger, T., Wilming, H., del Rio, L., Renner, R., 2020. Discovering physical concepts with neural networks. Physical Review Letters 124, 010508.
  • Udrescu, S.-M., Tegmark, M., 2020. AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6, eaay2631.
  • and our recent review paper: Sun, S., Ouyang, R., Zhang, B., Zhang, T.-Y., 2019. Data-driven discovery of formulas by symbolic regression. MRS Bulletin 44, 559–564.
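
To make the idea behind several of these references more concrete, here is a minimal Python sketch of sparse regression in the spirit of Brunton et al. (2016): build a library of candidate terms, fit the measured derivative by least squares, and repeatedly discard small coefficients. The decay law dx/dt = -0.5x, the polynomial library, and the threshold of 0.1 are assumptions chosen for illustration; this is not the symbolic-regression method discussed in the talk.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out coefficients
    below the threshold, refit on the surviving library terms, repeat."""
    xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        xi[keep], *_ = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)
    return xi


# Synthetic "measurements" of an assumed law, dx/dt = -0.5 x.
t = np.linspace(0.0, 5.0, 200)
x = 2.0 * np.exp(-0.5 * t)
dxdt = np.gradient(x, t, edge_order=2)  # derivative estimated from the data

# Library of candidate terms; the regression must pick the relevant one.
names = ["1", "x", "x^2", "x^3"]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

xi = stlsq(theta, dxdt)
terms = [f"{c:+.3f}*{n}" for c, n in zip(xi, names) if c != 0.0]
print("dx/dt =", " ".join(terms))
```

On this noise-free synthetic data the procedure should keep only the x term, with a coefficient close to -0.5. With real, noisy measurements the threshold and the derivative estimate become the critical choices.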

Mark Greiner: FAIR’ifying Experimental Data Community

Question: Following up on the question concerning LIMS, which open-source ELN would you recommend?

Answer by Mark Greiner: There are a few open-source LIMS available, although not all of them are free. The one that seems to be the most developed and to have the most active community, while also being open source and free, is Senaite. It can easily be set up on a virtual machine and accessed through a web browser. We have it installed on our server.

Patrick Rinke: Smart Materials Science Data Generation with the BOSS Code

Question: Hi Patrick, great overview and insight. A couple of questions on the Gaussian-process-based active learning:

  • How does the method scale with the dimension of the descriptor space? More specifically, if some dimensions of the descriptor are actually irrelevant, would the method efficiently learn to ignore them (in some sense)?
  • Exploration and exploitation are two objectives that are often combined in one cost function with a single relative-weight parameter. Can this relative weight be optimized as a hyperparameter by defining a suitable "human expectations" ranking?

Answer by Patrick Rinke:

  • Concerning question 1: The scaling with dimensions depends on how complicated the landscape is along each dimension. In the worst case, the scaling is exponential; in easier cases, we have found it to be roughly quadratic. In principle, one can wrap another learning algorithm around the Gaussian process that identifies irrelevant dimensions and discards them. We have not explored this yet, but it is on our to-do list.
  • Concerning question 2: Yes, this hyperparameter could be optimized externally. We currently don't do this, because the default setting works well for most of our applications. Also, an external hyperparameter optimization like this would increase the number of acquisitions, which we usually want to avoid when acquisitions are expensive. We are, however, working with computer scientists at Aalto on human-in-the-loop concepts, and this would be a good example of querying human expertise.
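
Both questions can be illustrated with a small NumPy sketch of a zero-mean Gaussian process. The kernel below uses one length scale per input dimension (automatic relevance determination, ARD), so a very long length scale makes the model almost insensitive to that dimension; this is one common way of down-weighting irrelevant descriptors, not necessarily what BOSS does. The acquisition is a generic lower confidence bound, mean minus kappa times standard deviation, where kappa plays the role of the exploration/exploitation weight from question 2. The toy objective, the hand-set length scales, and the kappa values are assumptions; a real implementation would learn the length scales, for example by maximising the marginal likelihood.

```python
import numpy as np


def ard_rbf(a, b, lengthscales, variance=1.0):
    """Squared-exponential kernel with one length scale per dimension (ARD).
    A very long length scale makes the kernel almost constant along that
    dimension, so the GP effectively ignores it."""
    d = (a[:, None, :] - b[None, :, :]) / lengthscales
    return variance * np.exp(-0.5 * np.sum(d**2, axis=-1))


def gp_posterior(x_train, y_train, x_test, lengthscales, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP (unit prior variance)."""
    k = ard_rbf(x_train, x_train, lengthscales) + noise * np.eye(len(x_train))
    k_s = ard_rbf(x_train, x_test, lengthscales)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


def lcb(mean, std, kappa):
    """Lower confidence bound (for minimisation); kappa weights exploration."""
    return mean - kappa * std


# Toy 2D problem (assumed): only x[:, 0] matters, x[:, 1] is irrelevant.
x_train = np.array([[0.0, 0.9], [0.2, 0.1], [0.4, 0.5],
                    [0.6, 0.7], [0.8, 0.3], [1.0, 0.2]])
y_train = (x_train[:, 0] - 0.3) ** 2
lengthscales = np.array([0.2, 100.0])  # set by hand here, normally learned

grid = np.linspace(0.0, 1.0, 21)
cand = np.array([[a, b] for a in grid for b in grid])
mean, std = gp_posterior(x_train, y_train, cand, lengthscales)

for kappa in (0.0, 2.0):  # pure exploitation vs. exploration-weighted
    nxt = cand[np.argmin(lcb(mean, std, kappa))]
    print(f"kappa={kappa}: next point {nxt}")
```

Increasing kappa shifts the choice toward points with larger predictive uncertainty. Tuning kappa externally would mean running such acquisition loops for several values and comparing the outcomes, which is where the extra acquisitions mentioned in the answer come from.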

Runhai Ouyang: The Data-driven Method SISSO: Concept, Applications, and Challenges

Question: Does SISSO include cross-validation (CV), or how can it be combined with CV? And is there a notebook available online?

Answer by Runhai Ouyang: The current SISSO code does not include any CV. Users can easily design their preferred SISSO-powered CV schemes and run them with a script, or even manually. Alternatively, I will soon prepare some useful tools for typical CV and upload them to the 'utilities' folder of the SISSO package on GitHub.
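
A scripted CV scheme of this kind could look like the Python sketch below, which performs k-fold cross-validation. `fit_model` is a hypothetical stand-in (ordinary least squares) for one complete SISSO run, since the sketch does not reproduce SISSO's input/output handling. In a real script, this function would write the training fold to SISSO's input files, run the executable, and read back the selected descriptor and its coefficients. The toy data are assumed.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)


def fit_model(x, y):
    """HYPOTHETICAL stand-in for one full SISSO run on the training fold.
    Here: ordinary least squares with an intercept on the given features."""
    a = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return coef


def predict(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef


def cross_validate(x, y, k=5, seed=0):
    """Return the RMSE on each held-out fold."""
    folds = kfold_indices(len(y), k, seed)
    rmses = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_model(x[train], y[train])  # model sees training data only
        err = predict(coef, x[test]) - y[test]
        rmses.append(float(np.sqrt(np.mean(err**2))))
    return rmses


# Toy data (assumed): y = 1 + 2*x0 - x1 plus small noise.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * x[:, 0] - x[:, 1] + 0.05 * rng.normal(size=50)
scores = cross_validate(x, y, k=5)
print("fold RMSE:", np.round(scores, 3), "mean:", round(float(np.mean(scores)), 3))
```

The key design choice is that the entire SISSO run, including descriptor identification, is repeated inside each fold. Selecting the descriptor once on all data and only refitting coefficients per fold would leak information from the test folds and give overly optimistic errors.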


Page last modified on June 10, 2020, at 03:10 PM EST