XPROAX

local eXplanations for text classification with
PROgressive neighborhood ApproXimation

Abstract: The importance of the neighborhood for training a local surrogate model to approximate the local decision boundary of a black box classifier has already been highlighted in the literature. Several attempts have been made to construct a better neighborhood for high-dimensional data, like texts, by using generative autoencoders. However, existing approaches mainly generate neighbors by selecting purely at random from the latent space and struggle under the curse of dimensionality to learn a good local decision boundary. To overcome this problem, we propose a progressive approximation of the neighborhood using counterfactual instances as initial landmarks and a careful 2-stage sampling approach to refine counterfactuals and generate factuals in the neighborhood of the input instance to be explained. Our work focuses on textual data, and our explanations consist of both word-level explanations from the original instance (intrinsic) and the neighborhood (extrinsic), and factual and counterfactual instances discovered during the neighborhood generation process that further reveal the effect of altering certain parts of the input text. Our experiments on real-world datasets demonstrate that our method outperforms the competitors in terms of usefulness and stability (for the qualitative part) and completeness, compactness and correctness (for the quantitative part).

XPROAX is a local explanation method designed for text classifiers. Benefiting from a more careful construction of neighborhoods, XPROAX provides high-quality, more detailed explanations for understanding black-box decisions. An explanation consists of four components: (i) intrinsic words, (ii) extrinsic words, (iii) factuals, and (iv) counterfactuals.

One major challenge in explaining text classifiers is neighborhood construction. The frequently used word-dropping method can easily produce incomplete sentences. To address this challenge, the basic idea behind XPROAX is to use a generative model to generate better (semantically meaningful and grammatically correct) neighboring texts. Furthermore, we propose a two-stage progressive neighborhood approximation method in this paper. It helps constrain the neighborhood of a given input based on the local manifold and improves the quality of the constructed neighborhoods.
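The idea of a two-stage approximation can be illustrated with a toy sketch. This is not the implementation from the paper: the 2-D "latent space", the linear `black_box` rule, the random `pool` of candidate points, and the function `approximate_neighborhood` are all assumptions made for illustration. In the actual method, latent vectors would come from a generative autoencoder over texts and the labels from the classifier being explained. The sketch only shows the general pattern of first picking nearby counterfactuals as landmarks and then sampling between the input and those landmarks to collect factuals and closer counterfactuals.

```python
import math
import random


def black_box(z):
    """Toy stand-in for a classifier on latent vectors (assumption)."""
    return int(z[0] + z[1] > 1.0)


def interpolate(a, b, t):
    """Point at fraction t on the straight line from a to b."""
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))


def approximate_neighborhood(z0, pool, n_landmarks=3, n_steps=10):
    """Hypothetical two-stage neighborhood construction around z0."""
    y0 = black_box(z0)

    # Stage 1: the closest pool points with a different label than z0
    # serve as initial counterfactual landmarks.
    candidates = [z for z in pool if black_box(z) != y0]
    landmarks = sorted(candidates, key=lambda z: math.dist(z0, z))[:n_landmarks]

    # Stage 2: sample along the path from z0 to each landmark. Points that
    # keep the label of z0 are factuals; points that flip it are
    # counterfactuals, possibly closer to z0 than the landmark itself.
    factuals, counterfactuals = [], []
    for landmark in landmarks:
        for k in range(1, n_steps + 1):
            z = interpolate(z0, landmark, k / n_steps)
            if black_box(z) == y0:
                factuals.append(z)
            else:
                counterfactuals.append(z)
    return landmarks, factuals, counterfactuals


# Usage with made-up data: an input point and a random pool of latent points.
rng = random.Random(0)
pool = [(rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)) for _ in range(200)]
z0 = (0.2, 0.2)
landmarks, factuals, counterfactuals = approximate_neighborhood(z0, pool)
```

In this toy setting, the refined counterfactuals lie at least as close to the input as the initial landmarks, which is the intuition behind refining the neighborhood progressively rather than sampling the latent space purely at random.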

In this paper, we perform qualitative and quantitative evaluations of XPROAX and compare the proposed method with state-of-the-art local explanation methods. Experimental results show that our method outperforms the competitors. The experiments also illustrate that the quality of the neighborhood has a strong impact on the final explanations. More specifically, the comparison between XPROAX and XSPELLS shows that the careful construction of the neighborhood overcomes the weakness of random sampling in a latent space.
(Please refer to the paper for more details.)

Yi Cai, Arthur Zimek, Eirini Ntoutsi. XPROAX-Local explanations for text classification with progressive neighborhood approximation. In 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA) 2021.