Analysis of large amounts of data offers new opportunities to understand many processes better. Yet data accumulation often implies relaxing acquisition procedures or compounding diverse sources, leading to many observations with missing features. From questionnaires to collaborative filtering, from electronic health records to single-cell analysis, missingness is at play everywhere and is the norm rather than the exception. Even "clean" data sets are often barely "cleaned" versions of incomplete data sets, with all the unfortunate biases this cleaning process may have created.

Despite this ubiquity, the handling of missing values is often overlooked. Missing values pose many challenges, and the statistical community has produced a vast literature with many available implementations. Yet many open issues remain, and new methods and new points of view are needed: to handle missing values in supervised learning and in deep learning architectures, to adapt available methods to high-dimensional observed data with different types of missing values, and to deal with feature mismatch and distribution mismatch. For Judea Pearl, who brought graphical-model reasoning to bear on some missing-not-at-random problems, missing data is one of the eight pillars of causal wisdom.

The goal of the Art of Learning with Missing Values (ARTEMISS) workshop is to give more momentum and exposure to research on missing values, whether theoretical, methodological, or applied, and to emphasize its connections with other areas of machine learning (e.g. causal inference, semi-supervised learning, generative modelling, uncertainty quantification, transfer learning, and distributional shift). We will also attach importance to discussing the reproducibility problems that missing data can cause, the danger of overlooking missing-value issues, and the importance of providing sound implementations.

Confirmed Speakers

Rich Caruana
Senior Principal Researcher at Microsoft Research.

Mihaela van der Schaar
John Humphrey Plummer Professor at the University of Cambridge and Turing Fellow at The Alan Turing Institute.

Karthika Mohan
Postdoctoral scholar at CHAI in UC Berkeley.

Mauricio Sadinle
Assistant Professor at the University of Washington.

Madeleine Udell
Assistant Professor at Cornell University.

José Miguel Hernández Lobato
University Lecturer in Machine Learning, University of Cambridge.

Confirmed Discussants

Andrew Gelman
Higgins Professor of Statistics at Columbia University.

Geert Molenberghs
Professor at Hasselt University and University of Leuven.

Ilya Shpitser
John C. Malone Assistant Professor at the Johns Hopkins University.

Call for papers

We welcome short papers from both academic and industrial practitioners and researchers. In particular, since missing data is a critical issue in many domains, we would like to bring together industrial and applied know-how with the various academic approaches. We also welcome very applied work from areas other than machine learning and statistics.

Submission details

Authors should submit extended abstracts of no more than four pages (excluding references) using the ICML LaTeX style files. Adding an appendix is permitted but reviewers will not be required to read it.

Papers must be submitted through OpenReview.

Submissions will be reviewed single-blind (reviewers are anonymous), so authors' names and affiliations should be included in the submission. Submissions and reviews are private; only accepted papers will be publicly available on OpenReview. Public commentary is not allowed.

The deadline for submissions has been extended from May 20, 2020 to June 10, 2020, 11:59 PM UTC-0.

Program

TBA