Alexandre Quemy joined IBM in 2015, where he currently holds a senior engineer position in the Data and AI organization. Prior to joining IBM, he worked at the Inria research center in France, in the Machine Learning and Optimization team, on multi-objective optimization and metaheuristics. In parallel with his activities at IBM, he is pursuing a Ph.D. at PUT, under the supervision of Prof. Wrembel, in the field of AutoML in unstructured space, with a focus on explainability.
For broader adoption and better scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques, collectively known as AutoML, have been developed in this direction. AutoML focuses on solving a black-box optimization problem named CASH, for Combined Algorithm Selection and Hyperparameter optimization. As its name states, CASH consists in selecting the proper algorithm together with its set of hyperparameters.
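To make the CASH formulation concrete, here is a minimal sketch that treats it as a random search over a joint space of algorithms and their hyperparameters. The search space, the algorithm names, and `toy_validation_loss` are hypothetical stand-ins chosen for illustration. A real system would train and validate a model at each step, and would typically use a smarter optimizer than random search.

```python
import random

# Hypothetical search space: each algorithm has its own hyperparameter ranges.
SEARCH_SPACE = {
    "knn": {"k": (1, 30)},
    "tree": {"max_depth": (1, 20)},
}


def toy_validation_loss(algorithm, params):
    """Stand-in for training a model and measuring its validation loss."""
    if algorithm == "knn":
        return 0.20 + 0.01 * abs(params["k"] - 7)
    return 0.15 + 0.02 * abs(params["max_depth"] - 5)


def sample_configuration(rng):
    """Pick an algorithm, then hyperparameters from that algorithm's own space."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {
        name: rng.randint(low, high)
        for name, (low, high) in SEARCH_SPACE[algorithm].items()
    }
    return algorithm, params


def random_search_cash(n_trials, seed=0):
    """Return (loss, algorithm, params) for the best configuration found."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algorithm, params = sample_configuration(rng)
        # The objective is a black box: we only observe its value.
        loss = toy_validation_loss(algorithm, params)
        if best is None or loss < best[0]:
            best = (loss, algorithm, params)
    return best
```

Calling `random_search_cash(200)` returns the lowest loss seen, along with the algorithm and hyperparameters that produced it. The point of the sketch is that the algorithm choice and its hyperparameters are sampled and evaluated jointly, not one after the other.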
However, a rule of thumb about the effort of building a machine learning workflow is that 80% of the time is spent on data preparation. One might therefore wonder why most effort is spent on automating the least time-consuming part. In this talk, we will try to answer this question and show that, under a time constraint, it is often, but not always, more beneficial to allocate time to automated data preparation rather than to hyperparameter tuning. Last, we will briefly present a robust two-stage optimization process to allocate time between data preparation and CASH.
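As an illustration only, one way to split a fixed budget between data preparation and hyperparameter tuning can be sketched as follows. This is not the process presented in the talk. It assumes a budget counted in trials instead of wall-clock time, a hypothetical `prep_fraction` parameter, and a toy loss function in place of a real training run.

```python
import random

# Hypothetical data preparation choices.
SCALERS = ["none", "standard", "minmax"]
IMPUTERS = ["mean", "median"]


def toy_loss(scaler, imputer, learning_rate):
    """Stand-in for the validation loss of a full workflow."""
    scaler_penalty = {"none": 0.10, "standard": 0.0, "minmax": 0.03}[scaler]
    imputer_penalty = {"mean": 0.02, "median": 0.0}[imputer]
    return 0.10 + scaler_penalty + imputer_penalty + abs(learning_rate - 0.1)


def two_stage_search(total_trials, prep_fraction, seed=0):
    """Spend part of the budget on data preparation, the rest on tuning."""
    rng = random.Random(seed)
    prep_trials = max(1, int(total_trials * prep_fraction))
    tuning_trials = total_trials - prep_trials
    default_lr = 0.5  # assumed default hyperparameter used during stage 1

    # Stage 1: search data preparation pipelines with the default model.
    best_prep, best_loss = None, float("inf")
    for _ in range(prep_trials):
        prep = (rng.choice(SCALERS), rng.choice(IMPUTERS))
        loss = toy_loss(prep[0], prep[1], default_lr)
        if loss < best_loss:
            best_prep, best_loss = prep, loss

    # Stage 2: tune the hyperparameter on top of the best pipeline found.
    best_lr = default_lr
    for _ in range(tuning_trials):
        lr = rng.uniform(0.001, 1.0)
        loss = toy_loss(best_prep[0], best_prep[1], lr)
        if loss < best_loss:
            best_loss, best_lr = loss, lr

    return best_loss, best_prep, best_lr
```

Comparing the loss returned for several values of `prep_fraction` under the same `total_trials` gives a rough feel for the trade-off discussed above, within the limits of this toy setting.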