dc.description.abstracten |
Model fairness and stability depend strongly both on the input data and on the choices and practices throughout the machine learning (ML) lifecycle. Errors in the ML pipeline can be classified into exogenous data errors (e.g., outliers, missing values, or incorrect labels) and endogenous modeling or processing errors (e.g., incorrect or suboptimal choices of outlier detection or missing-value imputation techniques). To anticipate and mitigate detrimental downstream consequences, it is crucial to understand the impact of various exogenous data errors on model performance during model development. Moreover, these data errors may also occur in the model serving flow, so model developers need to understand how their models behave under exogenous data errors in order to choose the model that remains robust and fair in a production setting. Thus, the focus of this work is on quantifying the impact of exogenous errors on model performance, both during development and post-deployment.
There is currently no open-source toolkit for measuring the impact of controlled data errors on model performance in terms of both fairness and stability. To address this gap, in this thesis we develop two software libraries, Virny and MLcF. Virny is a dedicated library for auditing model fairness and stability, and it plugs into MLcF, which takes a broader lifecycle view of model performance under different kinds of errors (exogenous vs. endogenous). To showcase the utility of these libraries, we conduct extensive stress-testing of different model architectures by systematically injecting controlled data errors. These experiments allow us to draw insights about the impact of data errors on classifier performance in terms of accuracy, stability, and fairness. |
uk |