The Art of Learning with Missing Values

First ICML workshop, July 17th, 2020

Analysis of large amounts of data offers new opportunities to understand many processes better. Yet, data accumulation often implies relaxing acquisition procedures or compounding diverse sources, leading to many observations with missing features. From questionnaires to collaborative filtering, from electronic health records to single-cell analysis, missingness is at play everywhere and is the norm rather than the exception. Even "clean" data sets are often barely "cleaned" versions of incomplete data sets, with all the unfortunate biases this cleaning process may have created.

Despite this ubiquity, tackling missing values is often overlooked. Handling missing values poses many challenges, and there is a vast literature in the statistical community, with many implementations available. Yet many open issues remain, and there is a need to design new methods or to introduce new points of view: to handle missing values in a supervised-learning setting and in deep learning architectures, to adapt available methods to high-dimensional observed data with different types of missing values, and to deal with feature mismatch and distribution mismatch. Missing data is one of the eight pillars of causal wisdom for Judea Pearl, who brought graphical-model reasoning to tackle some missing-not-at-random values.

The goal of the Art of Learning with Missing Values (ARTEMISS) workshop is to give more momentum and exposure to research on missing values, whether theoretical, methodological, or applied, and to emphasize the connections with other areas of machine learning (e.g. causal inference, semi-supervised learning, generative modelling, uncertainty quantification, transfer learning, distributional shift). We will also attach importance to discussing the reproducibility problems that can be caused by missing data, the danger of overlooking missing-value issues, and the importance of providing sound implementations.

Confirmed Speakers

Rich Caruana
Senior Principal Researcher at Microsoft Research.

Mihaela van der Schaar
John Humphrey Plummer Professor at the University of Cambridge and Turing Fellow at The Alan Turing Institute.

Karthika Mohan
Postdoctoral scholar at CHAI, UC Berkeley.

Mauricio Sadinle
Assistant Professor at the University of Washington.

Madeleine Udell
Assistant Professor at Cornell University.

José Miguel Hernández Lobato
University Lecturer in Machine Learning, University of Cambridge.

Confirmed Discussants

Andrew Gelman
Higgins Professor of Statistics at Columbia University.

Geert Molenberghs
Professor at Hasselt University and University of Leuven.

Ilya Shpitser
John C. Malone Assistant Professor at the Johns Hopkins University.

Program

The program starts on Friday, July 17 at 10:45 am and ends at 8:10 pm (CEST).

Please see the program in the ICML schedule or at the ICML Virtual Site (ICML registration required).

Accepted papers

The accepted papers are listed below and available at OpenReview. This list does not constitute proceedings for the workshop.

Optimal recovery of missing values for non-negative matrix factorization: A probabilistic error bound
Rebecca Chen, Lav R. Varshney

Visna---Visualising Multivariate Missing Values
Antony Unwin, Alexander Pilhoefer

Causal Discovery in the Presence of Missing Values for Neuropathic Pain Diagnosis
Ruibo Tu, Kun Zhang, Bo Christer Bertilson, Clark Glymour, Hedvig Kjellström, Cheng Zhang

Multi-output prediction of global vegetation distribution with incomplete data
Rita Beigaite, Jesse Read, Indre Zliobaite

A Penalized Likelihood Approach for Statistical Inference in a High-Dimensional Linear Model with Missing Data
Jiwei Zhao

A Random Matrix Analysis of Learning with α-Dropout
Mohamed El Amine Seddik, Romain Couillet, Mohamed Tamaazousti

Path Imputation Strategies for Signature Models
Michael Moor, Max Horn, Christian Bock, Karsten Borgwardt, Bastian Rieck

Clustering Data with nonignorable Missingness using Semi-Parametric Mixture Models
Marie Du Roy de Chaumaray, Matthieu Marbac

Estimating conditional density of missing values using deep Gaussian mixture model
Marcin Przewięźlikowski, Marek Śmieja, Łukasz Struski

Does imputation matter? Benchmark for real-life classification problems.
Katarzyna Woźnica, Przemyslaw Biecek

VAEs in the Presence of Missing Data
Mark Collier, Alfredo Nazabal, Chris Williams

Missing the Point: Non-Convergence in Iterative Imputation Algorithms
Hanne I. Oberman, Stef van Buuren, Gerko Vink

Predicting Feature Imputability in the Absence of Ground Truth
Niamh McCombe, Xuemei Ding, Girijesh Prasad, David P Finn, Stephen Todd, Paula L McClean, Kongfatt Wong-Lin

Variance estimation after Kernel Ridge Regression Imputation
Hengfang Wang, Jae Kwang Kim

Online Mixed Missing Value Imputation Using Gaussian Copula
Eric Landgrebe, Yuxuan Zhao, Madeleine Udell

Imputation of Missing Behavioral Measures in Connectome-based Predictive Modelling
Qinghao Liang, Dustin Scheinost

Handling Missing Data in Decision Trees: A Probabilistic Approach
Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

The Dynamic Latent Block Model for Sparse and Evolving Count Matrices
Giulia Marchello, Marco Corneli, Charles Bouveyron

Missing rating imputation based on product reviews via deep latent variable models
Dingge Liang, Marco Corneli, Pierre Latouche, Charles Bouveyron

Lung Segmentation from Chest X-rays using Variational Data Imputation
Raghavendra Selvan, Erik Dam, Nicki Skafte Detlefsen, Sofus Rischel, Kaining Sheng, Mads Nielsen, Akshay Pai

Inferring Causal Dependencies between Chaotic Dynamical Systems from Sporadic Time Series
Edward De Brouwer, Adam Arany, Jaak Simm, Yves Moreau

The impact of incomplete data on quantile regression for longitudinal data
Anneleen Verhasselt, Alvaro José Flórez, Ingrid Van Keilegom, Geert Molenberghs

Multi-label Learning with Missing Values using Combined Facial Action Unit Datasets
Jaspar Pahl, Ines Rieger, Dominik Seuss

A Study on Intentional-Value-Substitution Training for Regression with Incomplete Information
Takuya Fukushima, Tomoharu Nakashima, Taku Hasegawa, Vicenç Torra

How to miss data? Reinforcement learning for environments with high observation cost
Mehmet Koseoglu, Ayca Ozcelikkale

Processing of incomplete images by (graph) convolutional neural networks
Tomasz Danel, Marek Śmieja, Łukasz Struski, Przemysław Spurek, Lukasz Maziarka

Conditioning on "and nothing else": Simple Models of Missing Data between Naive Bayes and Logistic Regression
David Poole, Ali Mohammad Mehr, Wan Shing Martin Wang

Multi-Time Attention Networks for Irregularly Sampled Time Series
Satya Narayan Shukla, Benjamin Marlin

How to deal with missing data in supervised deep learning?
Niels Bruun Ipsen, Pierre-Alexandre Mattei, Jes Frellsen

VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data
Chao Ma, Sebastian Tschiatschek, Richard E. Turner, José Miguel Hernández-Lobato, Cheng Zhang

Working with Deep Generative Models and Tabular Data Imputation
Ramiro Camino, Christian Hammerschmidt, Radu State

Information Theoretic Approaches for Testing Missingness in Predictive Models
Shreyas A Bhave, Rajesh Ranganath, Adler Perotte

Call for papers

We welcome short papers from both academic and industrial practitioners and researchers. In particular, since missing data is a critical issue in many domains, we would like to bring together industrial and applied know-how and various academic approaches. We also welcome very applied work from areas other than machine learning and statistics.

Submission details

Authors should submit extended abstracts of no more than four pages (excluding references) ~~using the ICML LaTeX style files~~. Adding an appendix is permitted but reviewers will not be required to read it.

Papers must be submitted in OpenReview.

Submissions will be reviewed single-blind (reviewers are anonymous), so authors' names and affiliations should be included in the submission. Submissions and reviews are private; only accepted papers will be publicly available on OpenReview. Public commentary is not allowed.

~~The deadline for submissions is May 20 2020 11:59PM UTC-0.~~
The deadline for submissions has been extended to June 10 2020 11:59PM UTC-0.

Please use the ARTEMISS LaTeX style files when submitting the camera-ready version.

Program Committee (reviewers)

We are thankful to the members of the program committee, who contributed to shaping this workshop:

Aude Sportisse, Université Pierre-et-Marie-Curie
Dingge Liang, Université Côte d'Azur, INRIA
Geneviève Robin, École des Ponts ParisTech, INRIA
George H. Chen, Carnegie Mellon University
Giulia Marchello, Université Côte d'Azur, INRIA
Francois Husson, Agrocampus Ouest
Jes Frellsen, Technical University of Denmark
Julie Josse, École Polytechnique
Marine Le Morvan, INRIA
Michael Moor, Swiss Federal Institute of Technology
Nicole S. Erler, Erasmus Medical Center
Najmeh Abiri, IT University of Copenhagen
Niels Bruun Ipsen, Technical University of Denmark
Nicolas Jouvin, Université Paris 1 Panthéon-Sorbonne
Pierre-Alexandre Mattei, INRIA
Raghavendra Selvan, University of Copenhagen
Shreyas A. Bhave, Columbia University
Sebastian Tschiatschek, University of Vienna
Steffen Moritz, Technische Hochschule Köln
Tomasz Danel, Jagiellonian University
Vincent Audigier, CNAM
Gaël Varoquaux, INRIA

Organizers

The workshop is jointly organized by: