Featurization, or feature engineering, is key to the success of machine learning models for chemical and biological systems. In this talk, we will discuss topological data analysis (TDA) and its combination with machine learning models. Unlike traditional graph/network or geometric models, TDA can significantly reduce data complexity and dimensionality by characterizing only intrinsic global information. In topology-based machine learning models, biomolecular topological fingerprints are extracted using persistent homology, the most important tool in TDA. These topological fingerprints are then transformed into feature vectors and fed into machine learning models, including support vector machines (SVM), random forests, and convolutional neural networks (CNN). Recently, topology-based machine learning models have made enormous progress in protein-ligand binding affinity prediction and have consistently delivered some of the best results in the D3R drug design Grand Challenges 2, 3, and 4.
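To make the pipeline in the abstract concrete, here is a minimal sketch of its first two steps, not the speaker's actual method. It assumes a toy set of 3D "atom" coordinates, computes only the 0-dimensional persistent homology of a Vietoris-Rips filtration (connected components, via union-find), and bins the resulting bar death times into a fixed-length feature vector. The function names `h0_barcode` and `barcode_to_features`, the bin count, and the maximum scale are all illustrative assumptions.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death times of the finite H0 bars of a Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies when an edge first
    merges it into another component (Kruskal-style union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the component root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one bar dies
            deaths.append(length)
    return deaths


def barcode_to_features(deaths, max_scale, n_bins):
    """Count bars dying in each of n_bins equal-width bins on [0, max_scale]."""
    features = [0] * n_bins
    width = max_scale / n_bins
    for d in deaths:
        if d <= max_scale:
            features[min(int(d / width), n_bins - 1)] += 1
    return features


# Hypothetical toy coordinates: a cluster of three points and a pair.
points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (5.0, 0.0, 0.0), (6.0, 0.0, 0.0),
]
deaths = h0_barcode(points)
features = barcode_to_features(deaths, max_scale=8.0, n_bins=4)
print(deaths)    # [1.0, 1.0, 1.0, 4.0]
print(features)  # [3, 0, 1, 0]
```

The fixed-length vector `features` is the kind of input a standard classifier or regressor could consume. A realistic workflow would also use higher-dimensional homology (loops and cavities) and a dedicated persistent homology library rather than this hand-rolled routine.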
Dr. Kelin Xia obtained his PhD degree from the Chinese Academy of Sciences in Jan 2013. He was a visiting scholar in the Department of Mathematics, Michigan State University, from Dec 2009 to Dec 2012. From Jan 2013 to May 2016, he worked as a visiting assistant professor at Michigan State University. He joined Nanyang Technological University in Jun 2016. His research focuses on scientific computation, mathematical molecular biology, and topological data analysis (TDA) of complex data in biomolecular systems.