University of Waterloo

Student seminars

Kimathi Kaai, Anand Murugan

Nov 11, 2023, 11:30 am, EC4-2101A

*** 11:30 am Presenter: Kimathi Kaai ***

Domain generalization (DG) focuses on transferring domain-invariant knowledge from multiple source domains (available at train time) to a priori unseen target domain(s). This task implicitly assumes that a class of interest is expressed in multiple source domains (domain-shared), which helps break the spurious correlations between domain and class and enables domain-invariant learning. However, in real-world applications, classes may often be expressed only in a specific domain (domain-linked), which leads to extremely poor generalization performance for these classes. In my thesis, I advocate for this task to the community and develop an algorithm to learn generalizable representations for these domain-linked classes by transferring useful representations from domain-shared classes. Specifically, I propose a **F**air and c**ON**trastive feature-space regularization algorithm for **D**omain-linked DG, **FOND**. I’ll present experiments against baselines across popular DG benchmarks in which FOND achieves state-of-the-art DG results for domain-linked classes, given a sufficient number of domain-shared classes.

*** 12:00 pm Presenter: Anand Murugan ***

As machine learning increasingly influences healthcare predictive models, attention to model performance across patient demographics is imperative. Despite considerable focus on algorithmic biases, the impact of data bias in healthcare machine learning (HML) models remains under-explored. Our study employs a systematic survey that identifies the Medical Information Mart for Intensive Care (MIMIC) database as a prevalent source of data in HML. Subsequent statistical analysis of task-specific data from MIMIC-IV v2.0 reveals a noteworthy association between ethnicity and predictive outcomes. Our empirical modeling of the benchmark/SOTA task-specific models further substantiates the inconsistent performance across ethnic demographics. This divergence underscores the urgent need to address data biases in addition to algorithmic biases in order to foster fair, generalizable, and data-aware HML models. To facilitate this, we introduce a comprehensive datasheet for MIMIC-IV v2.0 CRD, aimed at helping researchers fully understand the underlying data and use it effectively for downstream tasks.