Stable and Uncertainty-Aware Local Post-hoc Explanations Using Active Learning

With the increasing deployment of opaque machine learning models, there is a critical need for locally faithful, stable, and reliable model explanations. Existing model-agnostic approaches typically rely on random perturbation sampling, which introduces variance across runs and motivates perturbation strategies that explicitly prioritize informativeness. To alleviate this instability, perturbations should be queried that are maximally informative about the parameters of the local surrogate. We propose the EAGLE framework, which formulates perturbation selection as an active learning problem driven by expected information gain and Bayesian disagreement, and produces feature importance scores with associated confidence estimates, thereby quantifying the reliability of explanations. Experiments on tabular and image datasets show that EAGLE improves reproducibility across runs, balances the exploration-exploitation tradeoff in perturbation selection, and achieves high neighborhood stability relative to state-of-the-art baselines such as Focus Sampling-based BayesLIME and UnRAvEL.
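To make the active-learning view of perturbation selection concrete, the sketch below shows one plausible instantiation, not the authors' actual EAGLE implementation: a Bayesian linear local surrogate whose next perturbation is chosen by expected information gain about the surrogate weights, with posterior standard deviations serving as confidence estimates. The function names (`black_box`, `explain_locally`) and hyperparameters (`alpha`, `beta`, neighborhood width) are assumptions introduced for illustration only.

```python
# Illustrative sketch (assumed, not the paper's code): actively selecting
# perturbations for a Bayesian linear local surrogate by expected information
# gain, then reporting posterior means (importances) and stds (uncertainty).
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical opaque model being explained (stand-in for the real model).
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

def explain_locally(x0, n_queries=30, n_candidates=500, alpha=1.0, beta=25.0):
    d = x0.shape[0]
    # Gaussian prior over surrogate weights: N(0, alpha^{-1} I).
    precision = alpha * np.eye(d)
    Xq, yq = np.empty((0, d)), np.empty(0)
    for _ in range(n_queries):
        # Candidate perturbations drawn around the instance being explained.
        cand = x0 + 0.5 * rng.standard_normal((n_candidates, d))
        S = np.linalg.inv(precision)  # current posterior covariance
        # Expected information gain about the weights from querying x:
        # 0.5 * log(1 + beta * x^T S x) under Bayesian linear regression.
        eig = 0.5 * np.log1p(beta * np.einsum("ij,jk,ik->i", cand, S, cand))
        x_new = cand[np.argmax(eig)]            # most informative perturbation
        y_new = black_box(x_new[None, :])[0]    # label it with the opaque model
        Xq = np.vstack([Xq, x_new[None, :]])
        yq = np.append(yq, y_new)
        precision = alpha * np.eye(d) + beta * Xq.T @ Xq  # posterior update
    S = np.linalg.inv(precision)
    mean = beta * S @ Xq.T @ yq        # posterior mean -> feature importances
    std = np.sqrt(np.diag(S))          # posterior std -> per-feature confidence
    return mean, std

importance, uncertainty = explain_locally(np.array([0.3, -1.2]))
print(importance, uncertainty)
```

The same loop structure would accommodate other acquisition functions, such as disagreement across posterior samples, in place of the information-gain score used here.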