Machine learning with scientific data often requires translating scientific concepts into a form that machine learning models can use. In the case of scikit-learn, models require the same fixed-length set of numbers for each data point, so we will introduce a few approaches for turning a scientific concept like "molecule" into such a set of numbers.
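To make "a fixed set of numbers for each data point" concrete, here is a minimal sketch that turns a molecular formula into a vector of element counts. It is only an illustration, not necessarily one of the approaches introduced later: the helper `composition_vector` and the vocabulary `ELEMENTS` are hypothetical names chosen for this example, and the parser assumes simple formulas without parentheses or charges.

```python
import re

# Hypothetical fixed vocabulary: every molecule is described by one count per element,
# so every data point ends up with the same number of features.
ELEMENTS = ["H", "C", "N", "O"]

def composition_vector(formula, elements=ELEMENTS):
    """Count atoms of each element in a simple formula such as 'C2H6O'."""
    counts = dict.fromkeys(elements, 0)
    # Each match is an element symbol followed by an optional count, e.g. ('H', '6').
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol in counts:  # elements outside the vocabulary are ignored
            counts[symbol] += int(number) if number else 1
    return [counts[element] for element in elements]

formulas = ["H2O", "CH4", "C2H6O"]
# One row per molecule, one column per element in ELEMENTS.
X = [composition_vector(formula) for formula in formulas]
```

Every row of `X` has the same length regardless of how large the molecule is, which is the shape scikit-learn estimators expect for their input when calling `fit`. The trade-off of this particular encoding is that it discards all structural information: isomers with the same formula map to the same vector.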