by Michal Valko | Principal Llama Engineer | Meta
Biography:
Michal is the Principal Llama Engineer at Meta, a former machine learning scientist at Google DeepMind Paris, a tenured researcher at Inria, and the lecturer of the master course “Graphs in Machine Learning” at l'ENS MVA at Paris-Saclay. Michal is primarily interested in designing algorithms that require as little human supervision as possible. That is why he works on methods and settings that can deal with minimal feedback, such as deep reinforcement learning, bandit algorithms, self-supervised learning, and self-play. Michal has recently worked on representation learning, world models, and deep (reinforcement) learning algorithms that have some theoretical underpinning. In the past he has also worked on sequential algorithms with structured decisions, where exploiting the structure leads to provably faster learning. Michal is now working on large language models (LLMs), in particular providing algorithmic solutions for their scalable fine-tuning and alignment. He received his Ph.D. in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and was a postdoc of Rémi Munos before getting a permanent position at Inria in 2012 and starting DeepMind Paris in 2018.
Abstract:
Reinforcement learning from human feedback (RLHF) is a go-to solution for aligning large language models (LLMs) with human preferences; it first learns a reward model, which is then used to optimize the LLM's policy. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. We turn to an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF) and give a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.
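The Nash equilibrium mentioned in the abstract can be written down compactly. The following is a sketch under assumed notation that does not appear in the abstract: x is a prompt drawn from a prompt distribution ρ, y and y' are two responses, and π, π' are policies. It shows the plain max-min objective only and leaves out any regularization a practical algorithm such as Nash-MD may add.

```latex
% Assumed notation (not given in the abstract): x is a prompt drawn from a
% prompt distribution \rho; y and y' are responses; \pi and \pi' are policies.
\[
  \mathcal{P}(\pi \succ \pi')
  = \mathbb{E}_{x \sim \rho,\; y \sim \pi(\cdot \mid x),\; y' \sim \pi'(\cdot \mid x)}
    \bigl[ \mathcal{P}(y \succ y' \mid x) \bigr],
  \qquad
  \pi^{*} \in \arg\max_{\pi} \min_{\pi'} \mathcal{P}(\pi \succ \pi').
\]
```

If the preference model is antisymmetric, so that the probabilities of y beating y' and of y' beating y sum to one, the game is symmetric and the equilibrium policy is preferred over any competitor with probability at least 1/2.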
by Björn Schuller | Professor of Artificial Intelligence | ICL | Professor of Health Informatics | TUM
Biography:
Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professor in Machine Intelligence and Signal Processing all in EE/IT from TUM in Munich/Germany where he is Full Professor and Chair of Health Informatics. He is also Full Professor of Artificial Intelligence and the Head of GLAM at Imperial College London/UK, co-founding CEO and current CSO of audEERING amongst other Professorships and Affiliations. Previous stays include Full Professor and Chair at the University of Augsburg/Germany, independent research leader within the Alan Turing Institute as part of the UK Health Security Agency, Full Professor at the University of Passau/Germany, Key Researcher at Joanneum Research in Graz/Austria, and the CNRS-LIMSI in Orsay/France. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the BCS, Fellow of the ELLIS, Fellow of the ISCA, Fellow and President-Emeritus of the AAAC, Elected Full Member Sigma Xi, and Senior Member of the ACM. He (co-)authored 1,400+ publications (60,000+ citations, h-index=100+ ranking him number 7 in the UK for Computer Science), is Field Chief Editor of Frontiers in Digital Health and was Editor in Chief of the IEEE Transactions on Affective Computing amongst manifold further commitments.
Abstract:
In this talk, we will voyage into uncharted territories where the latest in artificial intelligence converges with computational audio understanding and generation. It will unveil how cutting-edge deep learning techniques propel Computer Audition into unexplored dimensions. From unraveling complex audio patterns in healthcare diagnostics to orchestrating audio composition, it illuminates the interstellar applications of automatic end-to-end learning and self-supervised large models in real-world scenarios. The talk further navigates through the stars of current algorithmic innovation, showcasing how this technology increasingly harmonizes with the complexity of human audio treatment. Prepare to embark on a sensorial expedition, where Computer Audition unveils its transformative prowess, propelling us beyond the boundaries of sound, and into an interstellar realm of possibilities.
by Gabriella Pasi | Professor | University of Milano-Bicocca
Biography:
Gabriella Pasi is a professor at the University of Milano-Bicocca and Head of the Department of Informatics, Systems and Communication, where she leads the Information and Knowledge Representation, Retrieval, and Reasoning (IKR3) research Lab. Her main research interests are related to Information Retrieval and Natural Language Processing, Knowledge Representation and Reasoning, User Modelling, and Social Media mining. She has published more than 250 papers, and she has served as Program Chair and Senior Area Chair of several international conferences. She has delivered keynote talks at several international conferences, and she was a panelist at the Panel “Women in IR” at SIGIR 2022. She is Associate Editor of several international journals (including ACM Computing Surveys and the IEEE Transactions on Fuzzy Systems). She recently received an Outstanding Research Contribution Award from the Web Intelligence Consortium. She has been both coordinator and PI of several projects. She has been a member of the Panel of Experts for Computer Science (PE6) of the ERC, for both Starting and Consolidator Grants. Since 2021 she has been co-director of the ELLIS Unit in Milan (European Laboratory for Learning and Intelligent Systems).
Abstract:
In the past few years there has been an increasing interest in the application of Deep Learning techniques to various Information Retrieval tasks, among which Personalized Search.
In this lecture, I will present an overview of the research undertaken in my lab on neural approaches to personalized search. In particular, three contributions will be summarized. The first is an approach to user modeling in the context of Product Search: multiple representations of the user preferences are defined, then combined and exploited in a re-ranking approach. The second contribution addresses query-aware user modeling by proposing a way to decide which user-related information is useful to answer a specific query and whether personalization is required. Finally, a personalized approach to query expansion will be presented.
by Hamed Valizadegan | Senior Researcher (Machine Learning Lead) | NASA
Biography:
Holder of a PhD in computer science with a focus on Machine Learning, Dr. Valizadegan has more than 20 years of experience in Artificial Intelligence and Machine Learning. He is a data science manager and machine learning lead at USRA and a senior machine learning scientist at NASA through NASA Academic Mission Services (NAMS). Dr. Valizadegan has applied machine learning expertise in a diverse set of domains from engineering and biology to medicine and astronomy. At NASA, Dr. Valizadegan has been involved in several machine learning projects related to the Hubble Space Telescope, the Kepler and TESS missions, the James Webb Space Telescope (JWST) mission, Space Biology, and the Orion Vehicle. Dr. Valizadegan has led a team of scientists who developed machine learning models to discover 370 new exoplanets to date.
Abstract:
The Kepler and TESS missions have yielded an astounding 100,000 potential transit signals, paving the way for an intricate process of distillation to identify viable exoplanet candidates. In response to this formidable challenge, we introduce ExoMiner, a groundbreaking deep neural network meticulously designed for the classification of transit signals in the search for exoplanets. ExoMiner played a pivotal role in validating and authenticating 301 previously undiscovered exoplanets. This keynote voyage commences with a sweeping overview of the captivating realm of exoplanetary exploration. As we delve into the heart of our presentation, we embark on an exploration of the distinctive attributes that set ExoMiner apart, unraveling the intricate web of factors that contribute to its remarkable accuracy. Our narrative extends beyond the confines of ExoMiner as we navigate through the integration of additional machine learning models seamlessly layered atop the ExoMiner architecture. This collaborative synthesis has already led to the identification of an additional 69 exoplanets, showcasing our unwavering commitment to pushing the boundaries of exoplanetary discovery. The journey of implementing machine learning in domains of such complexity is riddled with practical challenges. Guaranteeing consistent performance and earning the trust of domain experts in the reliability of results becomes an arduous task. This keynote will intricately delve into the practical intricacies, providing profound insights into the delicate balance between machine intelligence and the discerning eye of scientific scrutiny.
by Laura Bonavera | Associate Professor | University of Oviedo
Biography:
Laura Bonavera obtained her PhD in Astrophysics in 2011, supported by a fellowship from the International School for Advanced Studies (Italy) and a scholarship from CSIRO Astronomy and Space Science and the ATNF (Australia). Until 2015, she devoted her postdoctoral efforts to the ESA Planck mission at the Instituto de Fisica de Cantabria (IFCA, Spain). In 2016, she moved to the University of Oviedo (Spain), where she now serves as an Associate Professor, teaching in Engineering and Physics degrees.
Laura’s primary research interests are in Astrophysics and Cosmology, specifically infrared and radio sources, the Cosmic Microwave Background (CMB), and the analysis of lensing effects on high-z sub-mm galaxies for the estimation of astrophysical and cosmological parameters. Notably, Laura played a significant role in generating crucial Planck CMB maps and in producing and validating various versions of the Planck Catalogue of Compact Sources, which comprises the many galaxies detected in Planck maps. Given her expertise, she is currently leading research that employs neural networks for CMB extraction, point source detection, and foreground characterization. In particular, her group is obtaining promising results using fully convolutional neural networks applied to CMB maps.
Abstract:
The Cosmic Microwave Background (CMB) is a remnant radiation of the Big Bang, which is thought to mark the origin of the universe. The CMB anisotropies provide fundamental information about cosmology and about the initial conditions and the energy contents of the Universe. Therefore, it is very important to measure them with the highest precision, which requires a very precise recovery of the CMB signal.
However, CMB measurements are contaminated by other astrophysical signals called foregrounds. The process of cleaning the data and recovering the CMB signal is called component separation, and it usually exploits multi-frequency sky maps. The foregrounds include diffuse emission from our Galaxy, mostly thermal dust emission and synchrotron emission, and extragalactic foregrounds, which appear as point sources, i.e., galaxy clusters and extragalactic sources.
Developing highly efficient methods for CMB extraction therefore also requires precise point source detection. This becomes particularly relevant for the next generation of high-resolution CMB experiments. Some classical methods have been developed in the past decade; our ongoing research explores the application of Fully Convolutional Neural Networks to enhance the performance of these tasks.
An overview of the challenges posed, our Neural Network approach, and the promising results obtained so far will be presented.
by Wenwu Wang | Professor of Signal Processing and ML | University of Surrey
Biography:
Wenwu Wang is a Professor in Signal Processing and Machine Learning at the University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People Centred AI. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 300 papers in these areas. He is a (co-)author or (co-)recipient of over 15 awards including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, ICAUS 2021 Best Paper Award, DCASE 2023 and 2020 Judge’s Award, DCASE 2019 and 2020 Reproducible System Award, and the LVA/ICA 2018 Best Student Paper Award. He is an Associate Editor (2020-2025) for IEEE/ACM Transactions on Audio Speech and Language Processing. He was a Senior Area Editor (2019-2023) and Associate Editor (2014-2018) for IEEE Transactions on Signal Processing. He is the elected Chair (2023-2024) of the IEEE SPS Machine Learning for Signal Processing Technical Committee, the Vice Chair (2022-2024) of the EURASIP Technical Area Committee on Acoustic Speech and Music Signal Processing, and an elected Member (2022-2024) of the IEEE SPS Signal Processing Theory and Methods Technical Committee. He is a Satellite Workshop Co-Chair for IEEE ICASSP 2024.
Abstract:
Text-to-audio generation aims to produce an audio clip based on a text prompt, which is a language description of the audio content to be generated. It can be used as a sound synthesis tool for film making, game design, virtual reality/metaverse, digital media, and digital assistants for text understanding by the visually impaired. To achieve cross-modal text-to-audio generation, it is essential to comprehend the audio events and scenes within an audio clip, as well as to interpret the textual information presented in natural language. In addition, learning the mapping and alignment of these two streams of information is crucial. Exciting developments have recently emerged in the field of automated audio-text cross-modal generation. In this talk, we will give an introduction to this field, including problem description, potential applications, datasets, open challenges, recent technical progress, and possible future research directions. We will focus on deep generative AI methods for text-to-audio generation. We will start with our earlier work on conditional audio generation published in MLSP 2021, which was used as the baseline system in DCASE 2023. We will then move on to several algorithms that we have developed recently, including AudioLDM, AudioLDM2, Re-AudioLDM, and WavJourney, which are becoming increasingly popular in the signal processing, machine learning, and audio engineering communities.
by Piotr Bojanowski | Research Scientist | Meta
Biography:
Piotr Bojanowski is a Research Scientist at Meta. He is interested in applications of machine learning to Computer Vision and Natural Language Processing. His main research interests are focused on large-scale unsupervised learning. He is a co-creator of the fastText library, designed to help build scalable text representation and classification solutions. Before joining Meta in 2016, he received his Ph.D. in Computer Science while working in the Willow team, where he was involved in Human Action Detection in Videos with Aligned Text. He is a graduate of École Polytechnique, where he obtained a Master's Degree in Mathematics, Machine Learning and Computer Vision (MVA).
Abstract:
Large-scale training of neural networks with self-supervision has allowed us to obtain robust data representations on nearly any domain. Features computed with DINOv2 show very strong performance on many benchmarks, meeting the quality of CLIP-like models on categorization tasks and setting a new bar for dense prediction ones (segmentation, depth estimation). At the same time, those models show outstanding out-of-domain robustness, allowing one to run inference on drastically different inputs.
In this talk, I will present some results we obtained by training dedicated models on specialist data: high-resolution satellite imagery, chest X-rays, and fluorescent microscopy of cells. For all three, we show that a backbone trained with DINO captures exciting properties of the data. These backbones unlock many downstream applications without requiring finetuning and exhibit outstanding robustness. Finally, features trained that way open the path to exciting results driven by data-based discovery.
by Souhaib Ben Taieb | Associate Professor of ML and Data Science | University of Mons
Biography:
Souhaib Ben Taieb is an Associate Professor of Machine Learning and Data Science at the University of Mons (UMONS) in Belgium, leading the Big Data and Machine Learning Lab. His research encompasses various aspects of probabilistic machine learning for time series data, spanning topics such as uncertainty quantification, probabilistic forecasting, anomaly detection, forecast scoring and calibration. Previously, he was a Lecturer in the Department of Econometrics and Business Statistics at Monash University in Australia for approximately four years. He was a postdoctoral researcher in the Spatio-Temporal Statistics and Data Science group at KAUST in Saudi Arabia. Souhaib completed his Ph.D. in Machine Learning at the Free University of Brussels (ULB) in Belgium. He also holds a B.Sc and M.Sc in Computer Science from ULB. Souhaib received the Solvay Award for his Ph.D. thesis, and an IEEE Power & Energy Society award for ranking fifth among hundreds of participating teams in the Kaggle Global Energy Forecasting Competition 2012. He also received the Top Reviewer certificate at ICML 2020. He is an Associate Editor of the International Journal of Forecasting.
Abstract:
Marked Temporal Point Processes (TPPs) are a valuable tool for modeling continuous-time event sequences and predicting the arrival time and type of future events. Neural TPPs leverage the expressive power of neural networks to address the limitations of classical TPP models, such as the Hawkes process, which are prone to misspecification errors due to their strong modeling assumptions. Quantifying uncertainty in these neural models, however, remains a significant challenge. Existing tools for uncertainty quantification often yield unreliable results, primarily because of implicit modeling assumptions and their dependence on asymptotic guarantees. This presentation will demonstrate how to construct well-calibrated, distribution-free neural TPP models from multiple event sequences using conformal prediction. Specifically, we will present methods to build a distribution-free joint prediction region for the event arrival time and type with a finite-sample coverage guarantee. We will first consider a naive but statistically valid approach that combines independent prediction regions for the event arrival time and type. By neglecting potential dependencies between these variables, this method can be overly conservative, resulting in large prediction regions. Then, we will introduce a more refined method based on the highest density regions, derived from the joint predictive density of event arrival time and type. This method effectively addresses the challenge of creating a joint prediction region for a bivariate response that includes both continuous and discrete data types. Additionally, we will assess the validity and efficiency of these methods through a comprehensive set of experiments on both simulated and real-world event sequence datasets.
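The naive approach described above, combining two independent prediction regions, can be sketched with split conformal prediction. This is a minimal illustration and not the speaker's implementation: the function names, the absolute-error score for the arrival time, and the one-minus-probability score for the event type are all assumptions made here. Each region is built at level 1 - alpha/2, so a union bound gives joint coverage of at least 1 - alpha.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    if k > n:
        return float("inf")  # too few calibration points for this level
    return float(np.sort(scores)[k - 1])


def naive_joint_region(cal_time_pred, cal_time_true, cal_mark_probs,
                       cal_mark_true, new_time_pred, new_mark_probs,
                       alpha=0.1):
    """Hypothetical helper: interval for the time, label set for the type."""
    a = alpha / 2  # split the error budget between the two regions

    # Arrival time: absolute-error score (an assumed, simple choice).
    time_scores = np.abs(cal_time_true - cal_time_pred)
    q_time = conformal_quantile(time_scores, a)

    # Event type: one minus the predicted probability of the true type.
    idx = np.arange(len(cal_mark_true))
    mark_scores = 1.0 - cal_mark_probs[idx, cal_mark_true]
    q_mark = conformal_quantile(mark_scores, a)

    # Arrival times are non-negative, so the lower end is clipped at zero.
    interval = (max(0.0, new_time_pred - q_time), new_time_pred + q_time)
    marks = [k for k, p in enumerate(new_mark_probs) if 1.0 - p <= q_mark]
    return interval, marks
```

Because the two regions ignore any dependence between arrival time and type, their product tends to be larger than necessary, which is the conservativeness the abstract attributes to this approach.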
by Halina Kwaśnicka | Professor of Department of Artificial Intelligence | Wroclaw University of Science and Technology
Biography:
Halina Kwaśnicka is a computer scientist specializing in artificial intelligence and a professor at the Department of Artificial Intelligence, Wroclaw University of Science and Technology, Poland.
In 2023, the Rector of Wrocław University of Science and Technology, Prof. Arkadiusz Wójs, granted her the status of professor magnus. Over time, her research interest has evolved from nature-inspired methods, data mining and knowledge-based systems to methods of generating hierarchies of groups of objects and Explainable Artificial Intelligence.
Until 1989, she worked at the Futures Research Centre at Wroclaw University of Science and Technology. From 2004 to 2012, she was the Deputy Director for Scientific Research at the Institute of Informatics. Halina Kwaśnicka founded the Department of Computational Intelligence in 2004 and headed it until the end of 2020 (in 2021, the name was changed to the Department of Artificial Intelligence).
From 2001 to 2003, she was the Lower Silesian Science Festival coordinator at the Wroclaw University of Science and Technology.
Professor Kwaśnicka was invited by the Editorial Board of the London Journal of Research in Science: Natural and Formal (LJRS) as an honorary Rosalind Member of London Journals Press.
She led research projects on applying artificial intelligence methods in medicine, image analysis and natural language processing. She participated in EU projects and managed a Polish-Singapore research project. She is a member of the Program Council of the AI Tech project (invited by the Minister of Digital Affairs). The project aims to conduct high-level graduate studies on Artificial Intelligence.
She was a member of a group of international scientists and industry experts during the first scientific evaluation of the research activities of the Center for Advanced Systems Understanding in Görlitz. She worked as an expert of the European Commission on the review of the project "TRUST-AI," a Transparent, Reliable and Unbiased Smart Tool for AI. Halina Kwaśnicka is a member of the Steering Committee of the National Center for Research and Development - INFOSTRATEG Strategic Program. She was honoured as TOP of the TOP Women in AI 2022 by Perspektywy Women in Tech and received a special award: "Distinguished Mentor in AI."
Abstract:
Object Cluster Hierarchy, a new variant of Hierarchical Cluster Analysis, is presented. Object Cluster Hierarchy (OCH) is an extension of the Hierarchical Clustering (HC) paradigm that takes into account human perception of hierarchical data. Even though OCH is at an early stage of development, an increasing number of new OCH methods are being published. The existing benchmark datasets created to validate HC methods are unsuitable for validating OCH methods, as they do not account for the differences between OCH and HC structures. There is no publicly available set of benchmarking data that would assist OCH development. Therefore, there is a need to establish a systematic benchmarking approach for OCH. A new method of generating hierarchical structures of data with assumed, user-defined properties is presented. A new set of benchmarks consists of hierarchical data structures with the ground truth assignment. The experiments show the usefulness of the data generator, which can produce a wide range of differently structured data. Furthermore, datasets that represent the most common types of hierarchies are generated and made available to the public for benchmarking, along with the developed generator. The implemented generator and the benchmarking datasets are freely available at http://kio.pwr.edu.pl/?page_id=396, along with instructions on how to use them.
by Mohammadamin Barekatain | Former Senior Research Engineer | Google DeepMind
Biography:
Mohammadamin Barekatain is a former senior research engineer at Google DeepMind. Since early 2024, he has been a quantitative developer at Quadrature in London. He earned his master’s degree in computer science with the highest honours from the Technical University of Munich. His research interests span reinforcement learning, deep learning, large language models (LLMs), and their applications in mathematical and algorithmic discovery. He has contributed to several groundbreaking projects, such as AlphaTensor, AlphaDev, AlphaZero, MuZero, and FunSearch. AlphaTensor, featured on the front cover of Nature, discovered new, faster, and exact matrix multiplication algorithms that surpassed a 50-year-old record in computer science and mathematics. AlphaDev, published in Nature in 2023, discovered a new, faster sorting algorithm now open-sourced in the LLVM standard C++ library, the first accepted change in over a decade. His latest work, FunSearch, made the first mathematical discoveries for established open problems using an LLM. FunSearch was published in Nature and covered by The Guardian, New Scientist, and MIT Technology Review. Mohammadamin has published his research in prestigious scientific journals, such as Nature, and top machine learning conferences, such as NeurIPS, ICML, and IJCAI.
Abstract:
Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks. However, they sometimes suffer from confabulations, which can result in incorrect facts. This hinders their ability to contribute to new scientific discoveries. This talk will introduce FunSearch (short for searching in the functional space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. It will demonstrate the surprising effectiveness of this approach in surpassing the best-known results in important theoretical and practical problems. Applied to a central problem in extremal combinatorics, the cap set problem, FunSearch discovers new constructions of large cap sets going beyond the best-known ones, both in finite dimensional and asymptotic cases. These represent the first scientific discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be interpretable, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
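The pairing of a proposer with a systematic evaluator can be illustrated with a toy evolutionary loop. This sketch is not FunSearch: where FunSearch asks an LLM to write new programs, the stand-in `propose` below merely perturbs two numeric weights of a fixed priority rule for online bin packing. All names (`evolve`, `pack`, `propose`, `evaluate`), the item list, and the pool strategy are hypothetical choices made for illustration.

```python
import random

CAPACITY = 10
ITEMS = [6, 4, 7, 3, 5, 5, 8, 2, 9, 1]  # toy instance, total size 50


def pack(weights, items=ITEMS, capacity=CAPACITY):
    """Online bin packing: each item goes to the feasible bin with the
    highest priority; a new bin is opened when nothing fits."""
    w_rem, w_left = weights
    bins = []  # remaining capacity of each open bin
    for item in items:
        feasible = [i for i, rem in enumerate(bins) if rem >= item]
        if feasible:
            best = max(
                feasible,
                key=lambda i: w_rem * bins[i] + w_left * (bins[i] - item),
            )
            bins[best] -= item
        else:
            bins.append(capacity - item)
    return len(bins)


def evaluate(weights):
    """Systematic evaluator: fewer bins is better, so negate the count."""
    return -pack(weights)


def propose(parent, rng):
    """Stand-in for the LLM: a random perturbation of the parent."""
    return tuple(w + rng.gauss(0.0, 1.0) for w in parent)


def evolve(seed, propose, evaluate, iterations=200, pool_size=8, rng_seed=0):
    """Keep a small pool of the best-scoring candidates and repeatedly
    ask `propose` for a variation of a randomly chosen pool member."""
    rng = random.Random(rng_seed)
    pool = [(evaluate(seed), seed)]
    for _ in range(iterations):
        _, parent = rng.choice(pool)
        child = propose(parent, rng)
        pool.append((evaluate(child), child))
        pool.sort(key=lambda pair: pair[0], reverse=True)
        del pool[pool_size:]  # drop everything but the top candidates
    return pool[0]


best_score, best_weights = evolve((0.0, 0.0), propose, evaluate)
```

The evaluator is what keeps the loop honest: a proposal only survives if it scores well on the actual task, which is the role the abstract assigns to the systematic evaluator when the proposer may confabulate.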
by Alessandro Ortis | Assistant Professor of ML and Computer Vision | Università di Catania
Biography:
Alessandro Ortis is an Assistant Professor (RTDb) in Computer Vision at the Department of Mathematics and Computer Science of the University of Catania, where he also teaches Statistical Laboratory and Basics of Computing (MD in Data Science) and Programmazione 2 (BD in Computer Science). He has been working in the field of Computer Vision research since 2012, when he joined the IPLab (Image Processing Laboratory). He obtained a Master's Degree in Computer Science (summa cum laude) from the University of Catania in March 2015. Alessandro was awarded the Archimedes Prize for the excellence of his academic career and research activity, conferred by the University of Catania in 2015. Along the way he completed two research internships, at STMicroelectronics in 2011/2012 and at TIM in 2015. In January 2019 he obtained his PhD in Mathematics and Computer Science, granted by TIM. The PhD thesis investigates several aspects related to Visual Sentiment Analysis applied to crowdsourced images/videos. Part of the PhD research was carried out at Imperial College London, under the supervision of Prof. Catarina Sismeiro, working on the analysis of social media advertising content. His research interests lie in the fields of Computer Vision, Machine Learning and Multimedia. Alessandro is a reviewer for several international conferences and journals, and he is involved in editorial activities of journals in the field of Computer Vision in different roles. He is an active member of several scientific associations and societies, and a member of the organizing committees of several international scientific events, including conferences, challenges, workshops and special issues.
Alessandro is also a member of several international associations/societies: he is an IEEE Senior Member and a member of the IEEE Signal Processing Society, CVPL (Italian association for the research in Computer Vision, Pattern recognition and machine Learning) and REPRISE (expert reviewers for research projects funded by the Italian Ministry of Education, Universities and Research), among others.
Abstract:
In recent years, Artificial Intelligence (AI) algorithms, predominantly leveraging deep neural networks, have garnered significant success across diverse domains, particularly in the realm of Computer Vision applications. The impressive performance exhibited by cutting-edge generative architectures, such as Generative Adversarial Networks (GAN) and Diffusion Models (DM), has piqued researchers' interest in emerging challenges concerning the authenticity and integrity of multimedia content, as well as the susceptibility of AI-driven applications.
The rise of Adversarial Machine Learning algorithms has given rise to various techniques capable of undermining the optimal functioning of machine learning models, leading to instances of misclassification. The amalgamation of Adversarial Machine Learning techniques with the advanced capabilities of generative models has proven to be an exceptionally potent, albeit potentially hazardous, toolset, particularly when wielded for illicit purposes. The imperative for developing innovative forensic algorithms designed to ascertain the authenticity and integrity of multimedia content has become increasingly evident.
This presentation aims to offer an insightful introduction to the realm of Adversarial Machine Learning, shedding light on key techniques employed in specific attacks targeting various types of media content. The discussion will commence with an overview of the evolutionary trajectory of GANs and related models, delving into the challenges posed by the proliferation of deepfakes. Subsequently, the presentation will pivot towards the intricacies of Adversarial Machine Learning, scrutinizing model robustness and vulnerabilities, underscoring the need for proactive measures in the face of evolving threats.
by Halima Bouzidi | PhD | Université Polytechnique Hauts-de-France
Biography:
Halima Bouzidi is a final-year Ph.D. candidate in computer science at the Polytechnic University of Hauts-de-France (UPHF), Valenciennes, France. She received her MSc degree in 2020 from the Higher School of Computer Science of Algiers, Algeria. Her research interests encompass the design automation of static and dynamic neural networks on resource-constrained edge devices and neural network computations mapping on heterogeneous Multiprocessor Systems-On-Chips (MPSoCs). Her Ph.D. thesis focuses on answering questions on how the interdisciplinary fields of machine learning, multi-objective optimization, and edge hardware design can help mitigate the challenges of deploying neural networks on Internet of Things (IoT) systems characterized by computation and memory limitations.
Abstract:
In the ever-evolving realm of Artificial Intelligence (AI), neural networks, ranging from basic Multi-Layer Perceptrons (MLPs) to advanced Large Language Models (LLMs), have become pivotal in shaping our digital landscape. Despite their impressive capabilities, LLMs still need to overcome a significant challenge: their increasing need for computational power and memory capacity makes it difficult to fit them on the resource-constrained hardware that characterizes modern Internet of Things (IoT) systems. In this talk, we delve into the intricate challenges of deploying LLMs within the constraints of IoT devices. We provide a comprehensive review of LLMs' limitations from both the software and hardware perspectives. Additionally, by adopting a holistic approach, we explore various optimization dimensions to unlock the efficiency of LLMs and ease their end-to-end deployment on IoT devices. These dimensions involve architectural designs, inference strategies, and hardware configurations.
by Timothée Darcet | PhD student | Meta and Inria
Show info PresentationBiography:
Timothée Darcet is a PhD student at Inria and at FAIR, Meta. His research work focuses on self-supervised learning and vision transformers, in the context of large-scale visual representation learning and foundation models for vision. His PhD thesis is about advancing self-supervised learning from classification towards deeper image understanding in the hope of solving more complex tasks such as image segmentation or monocular geometry estimation. Prior to that, he studied at École polytechnique and at ENS, in France.
Abstract:
Self-supervised learning (SSL) is the field that studies how to train neural networks without any labels. This approach has been very successful in NLP, leading to BERT, GPT, and all the current large language models, with the success that we all know. What about vision? In recent years, vision has seen many flavors of SSL bloom, leading to interesting emerging properties and raising the hope that SSL can produce a generalist model that makes it easy to solve any task in vision. Vision SSL, however, has struggled to scale up as well as large language models, staying confined to relatively small models. With DINOv2, we recently showed that SSL can scale up very effectively with the right ideas, leading to very good performance on a wide range of tasks. In this talk, we will discuss the what, why, and how of large-scale SSL with DINOv2.
by Adam Narożniak | Data Scientist | Flower Labs
Show info PresentationBiography:
Adam Narożniak is an enthusiast of Federated Learning with high standards for his work. He works for Flower Labs as a Data Scientist. He likes meaningful challenges that have a practical impact.
Abstract:
Rethink assumptions about data collection and model training.
When does centralized (traditional) ML not work, and why? What is the alternative?
Everything you need to know to start Federated Learning ... starting from what it is!
by Paweł Ekk-Cierniakowski | Data Science Domain Lead | SoftwareOne
Show info PresentationBiography:
For almost 10 years he has been professionally involved in data analytics as a data scientist, team leader and project manager. He has participated in many projects in the area of advanced data analytics, such as monitoring production lines, opinion analysis, fraud detection and price forecasting. His main experience and interests are in the pharmaceutical and healthcare industries, but he has been involved in projects in various areas such as finance, retail, energy and agriculture. Currently, he is responsible for designing and implementing data solutions, mainly in the field of machine learning and artificial intelligence. He shares his knowledge as a data science trainer, lecturer and speaker at conferences. Co-author of scientific papers, mainly in the field of medicine and statistics, published e.g. in journals from the Master Journal List.
Abstract:
The aim of the presentation will be to present various projects using large language models and the lessons learned from their implementation. During this session I will answer the following questions:
Is it better to use the existing model as it is, fine-tune it or try to build it from scratch?
How to monitor the performance of large language models?
How to ensure the security of solutions based on LLMs?
What do the answers to the above questions depend on in the context of a specific project?
In the summary, based on the lessons learned, I will also describe when it is worth using large language models and in which cases they will likely not work at all.
by Marcin Kowiel | Team Lead, Machine Learning Engineer | Ryvu Therapeutics
Show info PresentationBiography:
Marcin Kowiel is an accomplished professional with a diverse background in computer science, mathematics, and machine learning. He earned his PhD in small molecule crystallography from Poznan University of Medical Sciences in 2015. Later, he conducted research in protein crystallography at the Institute of Bioorganic Chemistry, Polish Academy of Sciences. Marcin has over 10 years of professional experience working as a software engineer in Python. He worked as a machine learning engineer and since 2018, he has also been working as a manager. He has led teams in both a cyber security company and a target discovery startup that specializes in genomics and biomarkers detection for cancer treatment. Currently, Marcin is serving as the Data Science Team Leader at Ryvu Therapeutics, a clinical-stage drug discovery and development company that focuses on novel small molecule therapies for emerging targets in oncology. Marcin is helping to improve and speed up the drug development process with the use of Artificial Intelligence. His primary focus is on the Hit Identification and Lead Optimization stages.
Abstract:
In the constantly evolving field of drug discovery, the use of machine learning (ML) to predict Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties has become a game-changer. My presentation will explore the intricacies of developing ML models specifically designed for ADMET property prediction, a critical step in identifying promising drug candidates.
This comprehensive process starts with meticulous data collection, followed by rigorous data cleaning and innovative feature generation techniques, such as fingerprints, descriptors, or graph embeddings. The presentation will also highlight the challenges encountered in creating ADMET models, such as data standardization, time-dependent data splitting, cross-validation, and the selection of appropriate evaluation metrics. I will address strategies for managing limited-resolution experimental data, the scarcity of training data points, and the crucial task of estimating model inference limits for new chemical spaces. Finally, I will emphasize the importance of model explainability and how it is reported, ensuring transparency and trustworthiness in ML-driven drug discovery.
Join me to explore how machine learning is transforming the drug discovery process, providing insights into ADMET properties, and ultimately contributing to the development of safer, more effective drugs.
by Marcin Wiktorowski | Artificial Intelligence Team Leader | Spectre Solutions
Show info PresentationBiography:
Marcin Wiktorowski is an Artificial Intelligence Team Leader at Spectre Solutions. His area of interest revolves around computer vision, deep learning, projective geometry, and remote sensing. In 2021, he earned his Master’s degree in Computer Science from Poznan University of Technology (PUT). Following graduation, he remained at PUT, contributing to the Lungs Radiomics Project ([lungs.pl](http://lungs.pl/)) as part of an interdisciplinary team consisting of computer science specialists and medical professionals.
Abstract:
In 2023, Polish roads witnessed 20340 road accidents, resulting in 1669 fatalities and 22944 injuries. Recognizing the urgency of addressing this alarming trend, the concept of an innovative solution emerged – a UAV-based system designed to patrol highways and identify traffic offenders.
In the following talk, a novel approach to the traffic monitoring problem will be presented, explaining how aerial recordings of the road are converted into vehicles’ positions, speeds, and distances between them. The presented solution, based on state-of-the-art computer vision models, stands out from existing ones because it makes no assumptions about the 3D structure of the road or the camera position, which makes it robust to variations in camera pose during the flight. The talk will describe how high-resolution images are processed in real time using the limited computing power available on the drone. Additionally, a dedicated simulation environment serving as the Digital Twin of the road will be showcased.
by Grzegorz Rypeść | PhD Student | Warsaw University of Technology | IDEAS-NCBR
Show info PresentationBiography:
Grzegorz Rypeść obtained a double master's degree in computer engineering from the Warsaw University of Technology and Kyungpook National University in South Korea. He is also a graduate of the Matex class at the 14th High School of Stanisław Staszic in Warsaw, where he achieved the title of finalist in the nationwide Physics Olympiad. He has published at prestigious conferences such as ICLR, IJCAI (A*), as well as less prestigious ones like IEEE CEC and IEEE FedCSIS. His areas of interest include computer vision, computer graphics, continual learning, optimization algorithms, and evolutionary computing. He has worked at companies such as Samsung, Kogifi, Match Trade Technologies, Kool2Play, and Sports Algorithmics and Gaming. As a supporter of evolutionary theory, he loves drawing inspiration for research from nature. In his free time, you can find him actively participating in the WAKK Habazie kayaking club or contemplating the panorama of Warsaw while sipping various herbal infusions.
Abstract:
The talk will explain SEED, an ensemble method for Continual Learning that was accepted for the prestigious ICLR 2024 conference: https://arxiv.org/abs/2401.10191.
Class-incremental learning is becoming more popular as it helps models widen their applicability without forgetting what they already know. A trend in this area is to use a mixture-of-experts technique, where different models work together to solve the task. However, the experts are usually trained all at once using the whole task data, which makes them all prone to forgetting and increases the computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one expert, the optimal one for the considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. Extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.
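The expert selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal Gaussians, measures similarity with a symmetric KL divergence, and picks the expert in whose feature space the new task's classes overlap least. The function names and the toy numbers are hypothetical.

```python
import math
from itertools import combinations


def kl_diag(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal Gaussians given as lists of means and variances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def sym_kl(p, q):
    """Symmetric KL divergence; p and q are (mean, variance) pairs."""
    return kl_diag(*p, *q) + kl_diag(*q, *p)


def mean_class_divergence(class_gaussians):
    """Average pairwise divergence between the class Gaussians of one expert."""
    pairs = list(combinations(class_gaussians, 2))
    return sum(sym_kl(p, q) for p, q in pairs) / len(pairs)


def select_expert(per_expert_gaussians):
    """Assumed criterion: pick the expert whose class distributions overlap least."""
    scores = [mean_class_divergence(g) for g in per_expert_gaussians]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: two new classes as seen in the latent space of two experts.
expert_a = [([0.0, 0.0], [1.0, 1.0]), ([0.1, 0.0], [1.0, 1.0])]  # classes overlap
expert_b = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 0.0], [1.0, 1.0])]  # classes separated
best, scores = select_expert([expert_a, expert_b])
print(best, scores)
```

In this toy setup the second expert (index 1) is chosen, because the two new classes are far apart in its feature space; only that expert would then be fine-tuned on the new task.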
by Dominik Lewy, Karol Piniarski | Principal Data Scientist, Lead Computer Vision Consultant | Lingaro
Show info PresentationBiography:
Dominik has over 9 years of hands-on experience in Machine Learning, Deep Learning, Data Exploration and Business Analysis projects, primarily in the FMCG industry. He is a technical leader setting goals and preparing road maps for projects. He is also a PhD candidate at Warsaw University of Technology, where he focuses on the study of neural networks for image processing. He tries to be a bridge between the commercial and academic worlds. His main research interest is digital image processing as a way to facilitate the adoption of deep learning algorithms in business settings where training data is scarce or non-existent.
Karol has been specialising in computer vision for almost 10 years. He has been involved in many research and commercial projects in the field of generative models, deep learning, object detection, image compression and ADAS. He is a Lead Computer Vision Consultant at Lingaro Group. Additionally, he cooperates on an ongoing basis with the Poznan University of Technology (PUT) as an assistant professor at the Institute of Automation and Robotics. He received his doctoral degree in 2022 at the Poznan University of Technology.
Abstract:
This presentation provides a comprehensive overview of critical considerations for utilizing foundational models within commercial use cases, with a focus on Computer Vision and Natural Language Processing domains. It outlines a systematic framework comprising essential steps for verification. Additionally, the presentation illuminates the process through examples of evaluation protocols, offering practical insights into assessing model performance and applicability in real-world scenarios. The analysis will concern mainly generative models, particularly text-to-image synthesis, and Large Language Models. Through this detailed exploration, participants will gain a deeper understanding of the strategic and technical prerequisites for leveraging foundational models to drive innovation and efficiency in commercial applications.
by Krzysztof Sopyła | AI R&D Manager | Pearson
Show info PresentationBiography:
Krzysztof believes artificial intelligence will enhance our skills and free us from monotonous work. And this credo has guided him throughout his career. He also has 15 years of academic and business experience in machine learning and NLP projects. This includes 9 years as CEO and Chief Data Scientist at Ermlab Software. For the last 10 years, Krzysztof has worked on NLP solutions, managed projects and ML teams in organisations, and helped to create innovative services and products. He has implemented research results for 4 NCBR funded projects, including his startup https://goodwrite.pl (aka Polish Grammarly). He is currently helping to develop the AI R&D department at Pearson.
Abstract:
The presentation aims to show our path towards successful LLM utilisation in the EdTech domain. It will focus on laying proper LLM foundations for projects devoted to quality content generation, tutoring, personalisation, response assessment and feedback generation. I want to present our view, the reasoning behind it, and a short analytical explanation.
The presentation will emphasise the delicate balance between speed, quality, ethics, and costs. We cover topics such as the utilisation of small LLMs for cost-effective deployments, discuss the merits of open-source versus proprietary models, and delve into safeguarding measures, including security enhancements, discussion moderation, and content safety protocols.
Furthermore, we address the importance of aligning LLMs with specific use cases while acknowledging inherent tradeoffs, and explain how we approach risk minimisation through model evaluation. Additionally, we highlight the significance of robust architecture and offer guidance on integrating diverse elements effectively.
by Aleksandra Chrabrowa | Research Engineer | Machine Learning Research, Allegro
Show info PresentationBiography:
Ola Chrabrowa is a Research Engineer at Machine Learning Research at Allegro.
She works with NLP (textual data). She has a couple of years of industry experience in ML/NLP and a background in physics.
Abstract:
Modern system designs for e-commerce search are complex, multi-layered systems that must adhere to various online and offline performance requirements. Retrieval is a key component of search engines. In the face of today's dominance by neural networks and large language models (LLMs), it may be surprising to discover that old-school lexical retrieval remains a tough opponent for dense retrieval, which leverages neural networks. During this talk, you will learn what dense retrieval is and delve into the dense retrieval training processes employed at Allegro.
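The contrast between lexical and dense retrieval mentioned above can be sketched in a few lines. This is an illustration only, not Allegro's system: the token-overlap score is a crude stand-in for lexical retrieval, and the three-dimensional vectors are hand-made placeholders for embeddings that a trained neural encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def lexical_score(query, doc):
    """Number of query tokens that literally occur in the document."""
    doc_tokens = set(doc.lower().split())
    return sum(1 for token in query.lower().split() if token in doc_tokens)


def dense_rank(query_vec, doc_vecs):
    """Document indices sorted by decreasing cosine similarity to the query."""
    return sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )


docs = ["laptop sleeve 15 inch", "notebook computer case", "garden hose 15 m"]
# Hypothetical embeddings; a real system would compute them with an encoder.
doc_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
query = "laptop bag"
query_vec = [0.85, 0.15, 0.05]

lexical = [lexical_score(query, d) for d in docs]
ranking = dense_rank(query_vec, doc_vecs)
print(lexical, ranking)
```

The second document shares no token with the query, so the lexical score cannot separate it from the garden hose, while the dense ranking places it well above the unrelated item. Lexical matching, on the other hand, needs no training and is exact on identifiers and rare terms, which is one reason it remains a strong baseline.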
by Maks Operlejn | Machine Learning Engineer | deepsense.ai
Show info PresentationBiography:
I obtained my master’s degree in Computer Science from Gdańsk University of Technology. Currently, I’m a Machine Learning Engineer at deepsense.ai where my work revolves around Large Language Models and their wide-ranging applications - from implementing personalized RAG systems (using both closed and open-source models) to developing and testing AI Coding Agents. I also collaborated with the authors of the LangChain library, focusing on the topic of privacy of input data for language models. Besides work, I feel a constant need to learn about new cultures - this mainly manifests itself in traveling and learning languages. Additionally, I often find myself compulsively buying books that I most probably won’t have enough time to read.
Abstract:
GitHub Copilot, despite its flaws, is turning into a reliable ally for plenty of developers out there. The big wave of Large Language Models (LLMs) has got us swapping our regular dives into StackOverflow for quick sessions with ChatGPT. Since language models do a pretty decent job of managing individual programming tasks and code snippets, let's aim higher - what if we integrate language models into our code repositories for integral collaboration? Or even higher - would it be feasible to create a software developer agent that can write an entire codebase from scratch, tailored to your specifications?
The domain of AI code-writing tools is quite fresh and, frankly, filled with a fair share of junk. Yet there are some solid foundations out there, and I've made an effort to test them. The goal is simple: to identify the standouts, inspire you with some work-enhancing tools, and outline the future of the AI-enhanced coding process.
by Mateusz Hordyński | Technical Leader | deepsense.ai
Show info PresentationBiography:
I'm a software engineer who specializes in designing data-intensive/big data architectures for both cloud and on-premise infrastructures. Right now, I work as a Technical Leader at deepsense.ai where I create generative AI-powered applications and build data pipelines to support them. In my free time, I enjoy sports and traveling.
Abstract:
Prepare to dive into the exciting world of Large Language Models (LLMs) and structured data sources. In this session, we'll shed light on how to link LLMs to relational databases. We'll explore case studies and share different tried-and-tested approaches to achieve this. You'll understand the benefits, the obstacles, and what's on the horizon for integrating LLMs and structured data. Whether you're a seasoned professional in data science or just starting out, this talk has something to help you improve your skills in working with LLMs and external structured data sources.
by Filip Szatkowski | PhD Student | IDEAS NCBR & Warsaw University of Technology
Show info PresentationBiography:
Filip is a PhD Student at CV Lab at Warsaw University of Technology and IDEAS NCBR. He is passionate about deep learning research and his research interests are centered around efficient machine learning algorithms. In particular, he works on continual learning and adaptive computations in machine learning. He always enjoys expanding his skill set and learning about new things.
Abstract:
The increasing complexity of deep learning models has led to a drastic rise in their computational demands. To effectively address the emerging challenges related to model cost, accessibility, and environmental impact, we need to leverage more efficient methods. The inference cost of the models can be significantly reduced with conditional computation methods, which dynamically adjust the processing path within the model. Such methods use different parts of the model depending on the input data and thus enable computational savings, particularly on easier samples. In this talk, I will provide a brief overview of the most popular conditional computation techniques, including early exits and mixture-of-experts. Additionally, I will present our recent work that converts dense transformer models into mixture-of-experts models. By leveraging activation sparsity and employing a dynamic-k expert selection strategy, our approach significantly reduces the inference costs of the model while preserving its performance.
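A dynamic-k selection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a router that scores how much each expert would contribute for a given input, keeps every expert whose score reaches a fraction `tau` of the highest score, and sums the outputs of the selected experts. The scalar "experts" and the oracle router are hypothetical stand-ins.

```python
def dynamic_k_select(scores, tau=0.5):
    """Indices of experts whose score is at least tau times the highest score."""
    threshold = tau * max(scores)
    return [i for i, score in enumerate(scores) if score >= threshold]


def moe_forward(x, experts, router, tau=0.5):
    """Run only the selected experts and sum their outputs."""
    scores = router(x)
    chosen = dynamic_k_select(scores, tau)
    output = sum(experts[i](x) for i in chosen)
    return output, chosen


# Toy experts: each one is a slice of a dense layer, reduced here to a scalar map.
experts = [
    lambda x: 2.0 * x,
    lambda x: 0.01 * x,
    lambda x: -1.5 * x,
    lambda x: 0.0 * x,
]


def router(x):
    """Oracle router for illustration; in practice a small learned predictor."""
    return [abs(expert(x)) for expert in experts]


output, chosen = moe_forward(1.0, experts, router, tau=0.5)
dense_output = sum(expert(1.0) for expert in experts)
print(chosen, output, dense_output)
```

Unlike top-k routing with a fixed k, the number of executed experts depends on the input: when one expert dominates, a single expert runs, and when the contributions are spread out, more of them do. Here two of the four experts are executed and the result stays close to the dense output.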
by Szymon Marcinkowski | GPU Engineer | Intel
Show info PresentationBiography:
Szymon is a Software Engineer at Intel, where he took part in many AI initiatives, such as OpenVINO, XeSS and DirectML. His main area of expertise and interest is AI optimizations, where he has already authored several projects. He has a strong background in accelerating AI performance on Intel hardware.
In 2018, he earned his Bachelor’s degree in Automatic Control and Robotics from Gdansk University of Technology.
Abstract:
There are many frameworks for building AI models, such as PyTorch or TensorFlow. They are great for training and validating models; however, sometimes we want to integrate these models into games, engines, middleware or other applications and still get great performance. DirectML is a cross-vendor, hardware-accelerated AI library driven by Microsoft that can help you achieve that.
by Michał Mikołajczak | Tech Lead & CEO | Datarabbit
Show info PresentationBiography:
Michal Mikolajczak is a founder and Tech Lead at datarabbit.ai, a data and machine learning focused software house that helps organizations utilize their data and gain competitive advantage by designing, building, and shipping AI and data-driven solutions for their businesses. Having worked there on a variety of projects from different industries, he has a broad range of ML experience, with a focus on productization.
His primary background, however, is image processing: for a couple of years he worked in the medical imaging field, including as CTO of a brain imaging analytics startup that was successfully acquired by a NASDAQ company, and as an advisor and consultant to a number of other companies in the field. Privately, he is a big fan of BI and any kind of data visualization system that allows storytelling with data, and he enjoys Pratchett's works.
Abstract:
Daily, approximately 329 million terabytes of data are created, with social media contributing 13%. In our data-driven era, the amount of valuable content generated and almost immediately available to others is outstanding, but there are downsides to that: malicious users generate harmful content, from racism to gore, posing challenges for communication platforms. The sheer volume overwhelms human moderators.
We tried to address this problem by automating harmful content flagging in one of our projects. Commercial solutions targeting it are already available, but their detection capabilities were rather fixed, and we needed an adjustable rule set, so we tried coming up with a custom solution. As the capabilities of LLMs and generative AI are promising, we tried to employ them for the task. If you would like to know what the results were, how to utilize LLMs to screen both text and image data, or why at some point the solution achieved "lawyer mode", come to this talk for a story.
by Dawid Lipski | Chief Algorithmic Trading Officer | Match-Trade
Show info PresentationBiography:
Dawid brings a solid background in mathematics and quantitative methods to his work in financial technology, having completed his studies simultaneously at the Warsaw University of Technology and the Warsaw School of Economics. His professional journey began at X-Trade Brokers, focusing on risk management, where he gained valuable experience in the financial sector. Later, he joined Match-Trade, where his work transitioned to the development of algorithms for high-frequency trading, a challenging yet rewarding field that combines his interests in probability, finance and technology.
His academic background and professional experience have led him to develop interests in the market microstructure, probability, and the essential infrastructure of finance. These areas continue to influence his work and research, driving his pursuit of practical solutions in algorithmic trading and financial technology. Currently, he leads a team of dozens of quants and traders. Together, they create algorithms that play a crucial role in maintaining the quality of exchanges around the world, ensuring the financial markets remain efficient and reliable.
Abstract:
In the rapidly evolving world of high-frequency trading (HFT), where milliseconds can equate to defeat or success, the deployment of machine learning methods has become a cornerstone for success. However, as these algorithms become increasingly complex, the need for transparency and understanding has never been more critical. This lecture delves into the vital role of explainable machine learning (XAI) within the HFT environment, emphasizing the necessity of market understanding for algorithm creation.
This presentation will explore the foundational concepts of explainable machine learning, shedding light on how these methodologies can help enhance decisions in the context of HFT. Through practical examples, we will review key XAI techniques and tools that facilitate a deeper understanding of market dynamics and algorithmic behaviors.
by Łukasz Sztukiewicz | Student | Carnegie Mellon University & Poznan University of Technology
Show infoBiography:
Łukasz is currently pursuing a Bachelor of Science degree in Artificial Intelligence at Poznan University of Technology. He gained his experience as a research fellow at AutonLab, Carnegie Mellon University, where he contributed to both theoretical and applied projects. Despite his background in research, he possesses an entrepreneurial spirit. What fascinates him about AI is its practical utility in addressing real-world challenges, particularly in healthcare. Łukasz firmly advocates for prioritizing the development of trustworthy, reliable, and cost-effective machine learning solutions to facilitate widespread adoption in the future. His work is showcased at conferences such as MLinPL and ICLR.
Abstract:
Decision trees, a cornerstone of machine learning, have stood the test of time as one of the oldest and most extensively studied techniques. Despite their longevity, they remain highly relevant and effective in modern applications.
One well-known fact is that they are inherently unstable, meaning that small variations in the input data can result in very different output models. While numerous studies have addressed common data challenges such as missing values, outliers, and feature noise, the reliability of data annotations is often overlooked, and in reality, labels are rarely perfect.
This talk will discuss the successes and failures in improving the robustness of decision trees to label noise. We will show that by combining a fuzzy decision tree paradigm with robust splitting criteria, we can learn single, robust, and performant trees even on data with high label noise.
by Piotr Baryczkowski, Sebastian Szczepaniak | Master | Poznań University of Technology
Show info PresentationBiography:
Piotr Baryczkowski is a person who is currently making significant progress in the field of artificial intelligence. As a Deep Learning Software Engineer Intern at Intel, he has immersed himself in the cutting-edge world of technology, contributing his expertise to advance the field. Concurrently, Piotr is pursuing a Master's degree at Poznań University of Technology (PUT), specializing in Computer Vision and TinyML. His academic journey reflects his commitment to exploring the intersection of computer science and machine learning, showcasing a keen interest in the development of compact and efficient machine learning models.
Piotr's passion extends to his current focus on Spiking Neural Networks, an area that holds immense promise for the future of artificial intelligence. His work at Intel and academic pursuits demonstrate his dedication to staying at the forefront of technological advancements.
Sebastian Szczepaniak is a capable Data Scientist currently working at Lingaro. He graduated in computer science from Poznań University of Technology and is now pursuing a Master's in artificial intelligence, showing a strong interest in specialized areas like spiking neural networks and tiny ML.
In his role at Lingaro, Sebastian effectively analyzes complex datasets, combining theoretical knowledge with practical skills. His focus on cutting-edge AI research, particularly in emerging areas, reflects his forward-thinking approach.
Szczepaniak's academic journey highlights his dedication to excellence in both theory and application. His proficiency in advanced concepts positions him to make significant contributions to the ever-evolving field of artificial intelligence, showcasing his expertise in the dynamic realm of data science.
Abstract:
The growing popularity of edge computing goes hand in hand with the widespread use of systems based on artificial intelligence. There are many different technologies used to accelerate AI algorithms in end devices. One of the more efficient is CMOS technology thanks to the ability to control the physical parameters of the device. This article discusses the complexity of the semiconductor implementation of TinyML edge systems in relation to various criteria. In particular, the influence of the model parameters on the complexity of the system is analyzed. As a use case, a CMOS preprocessor device dedicated to detecting heart rate in wearable devices is used. The authors use the current and weak inversion operating modes, which allow the preprocessor to be powered by cells of the human energy harvesting class.
by Piotr Ludynia, Michał Szafarczyk | Students | AGH University of Science and Technology
Show info PresentationBiography:
Piotr Ludynia is a computer science student at AGH University of Science and Technology in Kraków. His main areas of study include machine learning, computer vision, algorithms, and code optimization. His Bachelor of Engineering thesis project centers on creating an open-source Python library for the efficient computation of molecular fingerprints that offers an efficient parallel implementation and scikit-learn compatibility. Having commenced his studies in 2020, he anticipates completing his Bachelor's degree in Engineering in 2024 and pursuing a Master's degree, set for completion in 2025. He works at Intel Technology Poland.
Michał Szafarczyk is a student of medicine at Jagiellonian University in Cracow. He recently acquired a Bachelor's degree in Computer Science from AGH UST. His Bachelor's thesis discussed the topic of molecular fingerprints and the implementation of an effective library for their usage. His current fields of research are chemoinformatics and ML in medical imaging, with some casual projects on other topics such as LLMs or RL. He is also a member and lecturer of the BIT student organization at AGH.
Abstract:
Molecular fingerprints are algorithms commonly used to vectorize graphs of chemical molecules as part of preprocessing in machine learning solutions. Machine learning on graphs is a non-trivial task and an important problem in modern data science. An easy way to perform this task is to encode the graph into a vector consisting of the values of certain descriptors. Molecular fingerprints are algorithms designed to handle this type of preprocessing. Even simple models that incorporate them can yield results comparable to state-of-the-art neural network solutions. We would like to present some popular fingerprint algorithms and their uses in chemoinformatics and machine learning. The talk was inspired by our project, in which we implemented a library for efficiently computing such fingerprints.
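To make the idea of encoding a molecular graph into a vector concrete, here is a toy sketch of a circular, Morgan-style fingerprint. It is not the speakers' library and not a faithful reimplementation of any standard algorithm: the molecule is a bare list of atom symbols and bonds, bond orders and chemistry-specific atom features are ignored, and `zlib.crc32` is used only as a convenient deterministic hash. All names are hypothetical.

```python
import zlib


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Hash atom neighbourhoods of growing radius into a fixed-length bit vector."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Radius 0: each atom is identified by its element symbol alone.
    ids = [zlib.crc32(symbol.encode()) for symbol in atoms]
    bits = [0] * n_bits
    for _ in range(radius + 1):
        for identifier in ids:
            bits[identifier % n_bits] = 1
        # Grow each neighbourhood by one bond: combine an atom's identifier with
        # the sorted identifiers of its neighbours, so atom order does not matter.
        ids = [
            zlib.crc32(
                repr((ids[i], sorted(ids[j] for j in neighbours[i]))).encode()
            )
            for i in range(len(atoms))
        ]
    return bits


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two bit vectors."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


# Ethanol heavy atoms written in two different atom orders: C-C-O and O-C-C.
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_reordered = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
propane = circular_fingerprint(["C", "C", "C"], [(0, 1), (1, 2)])
print(ethanol == ethanol_reordered, tanimoto(ethanol, propane))
```

The resulting fixed-length vectors can be fed to any standard model, such as a random forest, or compared directly with the Tanimoto similarity. Production code would rely on an established chemoinformatics toolkit rather than a hand-rolled hash.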
by Sebastian Chwilczyński | Data Scientist & Student | deepsense.ai & Poznań University of Technology
Show info PresentationBiography:
Sebastian Chwilczyński, a graduate in Artificial Intelligence from Poznan University of Technology, is a member of the GHOST science club and a devoted music enthusiast. He gained his experience, among others, at Intel in the Audio Research team and at PSNC, working on Computer Vision problems. Currently engaged at deepsense.ai, Sebastian tackles challenges across various deep learning domains. He is obsessed with following good practices during the development of deep learning models. He loves to share his knowledge, which is why he has led many groups at the GHOST science club, in both practical and research settings.
Abstract:
The current hype around deep learning (DL) captures the attention of many programmers and researchers. Unfortunately, the lack of a unified research scheme for DL models results in inconsistent methodologies, unclear documentation, difficulties in understanding others' code, and challenges in replicating others' results. Furthermore, the training of neural networks increasingly takes on the form of trial and error, lacking a structured and thoughtful process.
One possible solution to this problem is the automatic imposition of rules and standards by which users should conduct the development of their projects, along with the provision of tools for their implementation. With this intention, we created Actually Robust Training (ART) — a framework inspired by Andrej Karpathy's blog post A Recipe for Training Neural Networks. ART is a Python library containing a set of tools for methodological and effective deep learning model training.
Developing a DL model with ART is like playing a game of Deep Learning, where we must accomplish all lower levels to face the final boss — the test set. And who doesn't enjoy the thrill of a game?
https://github.com/SebChw/Actually-Robust-Training