Method overview

FPDE is a post-hoc feature attribution method for classification. It explains a single input by contrasting it with two class prototypes: one for a target class and one for a rival class. The output is an attribution vector with one value per feature. Positive values support the target class relative to the rival class. Negative values support the rival class relative to the target class.

When to use FPDE

Use FPDE when you have:

A feature matrix where each column has a consistent meaning.
Class labels for training samples.
A classification model, usually one that exposes predict_proba.
A need to explain target-versus-rival evidence at the feature level.

FPDE does not retrain or modify the classifier. It uses training data to build prototypes, then uses those prototypes to explain individual inputs.

Core terms

Term	Meaning
Prototype	A representative feature vector for a class. FPDE currently builds one class-mean prototype per class.
Target class	The class being explained, often the classifier’s predicted class.
Rival class	The contrast class, often the second-highest-probability class.
Evidence	The scalar target-versus-rival contrast decomposed by FPDE.
Attribution	A per-feature contribution to the evidence value.
Anchor	A reference vector used by Cos-FPDE before computing cosine contrasts.
Baseline	A replacement vector used for deletion and insertion perturbation curves.

Prototype construction

class_mean_prototypes(X, y) computes one prototype per class by averaging all training rows with that class label. FPDEEngine.fit(X_train, y_train, model) builds the same prototype state and stores it for repeated explanations. For repeated workflows, prefer FPDEEngine because it reuses:

Class-mean prototypes
Prototype labels
Mean and zero anchors
The baseline vector
Label-to-prototype lookup state
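The averaging step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: class_mean_prototypes_sketch and label_to_row are hypothetical names, and the real return types and label ordering may differ.

```python
import numpy as np

def class_mean_prototypes_sketch(X, y):
    """Hypothetical helper, not the library's class_mean_prototypes.

    Returns (prototypes, labels): one mean feature vector per class label,
    with labels in sorted order.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)  # sorted unique class labels
    prototypes = np.stack([X[y == label].mean(axis=0) for label in labels])
    return prototypes, labels

X_train = np.array([[0.0, 1.0], [2.0, 3.0], [10.0, 10.0], [12.0, 14.0]])
y_train = np.array(["a", "a", "b", "b"])
prototypes, labels = class_mean_prototypes_sketch(X_train, y_train)

# Label-to-prototype lookup, analogous to the state a fitted engine can cache.
label_to_row = {label: i for i, label in enumerate(labels)}
p_b = prototypes[label_to_row["b"]]  # mean of the two "b" rows: [11.0, 12.0]
```

Computing the prototypes once and keeping the lookup table is what makes repeated explanations cheap: each new input only needs distance or cosine computations against rows that already exist.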

Target and rival selection

When you use FPDEEngine with a model, FPDE chooses the local contrast from predict_proba.

Select the target

The target class is the highest-probability class.

Select the rival

The rival class is the second-highest-probability class.

Map labels to prototypes

FPDE maps both labels to their fitted class prototypes.
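A minimal sketch of the three steps, assuming a scikit-learn-style model whose predict_proba columns follow model.classes_. The names select_target_and_rival and prototype_by_label are hypothetical, the prototype values are made up, and tie-breaking between equal probabilities is not specified above.

```python
import numpy as np

def select_target_and_rival(proba, classes):
    """Hypothetical helper: (target_label, rival_label) from one predict_proba row.

    ASSUMPTION: classes is ordered like the predict_proba columns
    (scikit-learn exposes this order as model.classes_).
    """
    order = np.argsort(np.asarray(proba, dtype=float))[::-1]  # highest probability first
    return classes[order[0]], classes[order[1]]

classes = np.array(["cat", "dog", "fox"])
target, rival = select_target_and_rival([0.15, 0.60, 0.25], classes)  # "dog", "fox"

# Map both labels to their fitted prototypes (made-up prototype values).
prototype_by_label = {"cat": np.array([0.0, 1.0]),
                      "dog": np.array([1.0, 0.0]),
                      "fox": np.array([1.0, 1.0])}
p_pos, p_neg = prototype_by_label[target], prototype_by_label[rival]
```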

When you call lower-level functions directly, pass positive_label and negative_label. If you omit negative_label, FPDE selects a non-target prototype according to the selected mode.

Diff-FPDE

Diff-FPDE decomposes the difference in squared distances from the input to the rival and target prototypes.

E_diff = ||x - p_neg||^2 - ||x - p_pos||^2
phi_j  = (x_j - p_neg_j)^2 - (x_j - p_pos_j)^2

The attribution values sum exactly to the Diff-FPDE evidence E_diff. A positive value for feature j means that, on that feature, the input is closer to the target prototype than to the rival prototype under this squared-distance contrast.
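The sketch below computes Diff-FPDE directly from the two formulas and checks that the attributions add up to the evidence. diff_fpde is a hypothetical helper, and the input values are made up for illustration.

```python
import numpy as np

def diff_fpde(x, p_pos, p_neg):
    """Hypothetical sketch: per-feature Diff-FPDE attributions and direct evidence."""
    x, p_pos, p_neg = (np.asarray(v, dtype=float) for v in (x, p_pos, p_neg))
    phi = (x - p_neg) ** 2 - (x - p_pos) ** 2
    evidence = np.sum((x - p_neg) ** 2) - np.sum((x - p_pos) ** 2)
    return phi, evidence

phi, evidence = diff_fpde(x=[1.0, 2.0, 0.0], p_pos=[1.0, 1.0, 0.0], p_neg=[0.0, 2.0, 1.0])
# phi = [1, -1, 1], evidence = 1, and phi.sum() - evidence = 0
```

Feature 0 matches the target prototype exactly and gets +1, while feature 1 matches the rival prototype and gets -1.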

Cos-FPDE

Cos-FPDE decomposes a regularized cosine-similarity contrast.

E_cos = cos_eps(x - anchor, p_pos - anchor)
        - cos_eps(x - anchor, p_neg - anchor)

Cos-FPDE is an exact coordinate decomposition of the cosine contrast. It is not a leave-one-feature-out causal effect, because the cosine denominator depends on all coordinates.
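The exact form of cos_eps is not given above. The sketch below assumes cos_eps(u, v) = u·v / (||u||·||v|| + eps) and splits each dot product by coordinate while holding the denominators fixed, which yields per-feature terms that sum to E_cos. It also shows why those terms are not leave-one-feature-out effects. cos_fpde, eps, and the zero anchor are hypothetical choices for illustration.

```python
import numpy as np

def cos_fpde(x, p_pos, p_neg, anchor, eps=1e-12):
    """Hypothetical sketch of Cos-FPDE attributions and evidence.

    ASSUMPTION: cos_eps(u, v) = u.v / (||u|| * ||v|| + eps); the library's
    regularization may differ.
    """
    x, p_pos, p_neg, anchor = (np.asarray(v, dtype=float) for v in (x, p_pos, p_neg, anchor))
    u, a, b = x - anchor, p_pos - anchor, p_neg - anchor
    norm_u = np.linalg.norm(u)
    # Both denominators depend on every coordinate of u.
    denom_pos = norm_u * np.linalg.norm(a) + eps
    denom_neg = norm_u * np.linalg.norm(b) + eps
    # Split each dot product by coordinate, holding the denominators fixed.
    phi = u * a / denom_pos - u * b / denom_neg
    evidence = u @ a / denom_pos - u @ b / denom_neg
    return phi, evidence

zero_anchor = [0.0, 0.0]
phi, evidence = cos_fpde([3.0, 4.0], [1.0, 0.0], [0.0, 1.0], zero_anchor)
# phi is about [0.6, -0.8] and phi.sum() equals evidence (about -0.2).

# Moving feature 1 onto the anchor also changes ||u||, so the new evidence
# (about 1.0) is not evidence - phi[1] (about 0.6).
_, evidence_without_1 = cos_fpde([3.0, 0.0], [1.0, 0.0], [0.0, 1.0], zero_anchor)
```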

Hyb-FPDE

Hyb-FPDE mixes Diff-FPDE and Cos-FPDE attribution vectors.

phi_hyb_j = lambda_hyb * phi_diff_j + (1 - lambda_hyb) * phi_cos_j

lambda_hyb must be in [0, 1].

lambda_hyb=1.0 uses the Diff-FPDE endpoint.
lambda_hyb=0.0 uses the Cos-FPDE endpoint.
Intermediate values blend both attribution vectors.

By default, Hyb-FPDE L1-normalizes each component attribution vector before mixing, so the Diff-FPDE and Cos-FPDE terms enter on a comparable scale. Set normalize="none" to mix the raw component scales instead.
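A sketch of the mixing rule with the normalization switch. Only normalize="none" is named above; the value "l1" for the default mode, the helper name hyb_fpde, and the guard for an all-zero component are assumptions.

```python
import numpy as np

def hyb_fpde(phi_diff, phi_cos, lambda_hyb, normalize="l1"):
    """Hypothetical sketch: mix Diff-FPDE and Cos-FPDE attribution vectors.

    ASSUMPTION: "l1" names the default mode; "none" mixes the raw vectors.
    """
    if not 0.0 <= lambda_hyb <= 1.0:
        raise ValueError("lambda_hyb must be in [0, 1]")
    phi_diff = np.asarray(phi_diff, dtype=float)
    phi_cos = np.asarray(phi_cos, dtype=float)
    if normalize == "l1":
        # Rescale each component to unit L1 norm; an all-zero vector stays zero.
        phi_diff = phi_diff / max(np.abs(phi_diff).sum(), 1e-12)
        phi_cos = phi_cos / max(np.abs(phi_cos).sum(), 1e-12)
    elif normalize != "none":
        raise ValueError(f"unknown normalize mode: {normalize!r}")
    return lambda_hyb * phi_diff + (1.0 - lambda_hyb) * phi_cos

blended = hyb_fpde([2.0, -2.0], [0.5, 0.5], lambda_hyb=0.5)                # [0.5, 0.0]
raw = hyb_fpde([2.0, -2.0], [0.5, 0.5], lambda_hyb=0.5, normalize="none")  # [1.25, -0.75]
```

With raw scales the larger Diff-FPDE vector dominates the blend; after L1 normalization both components carry equal total weight at lambda_hyb=0.5.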

Bayesian-FPDE

Bayesian-FPDE replaces a single selected lambda_hyb with a posterior distribution over the lambda grid. Use it when you want to express uncertainty about the Hyb-FPDE mixture weight, or when you want a Bayesian model-averaged attribution instead of a point-selected one.

FPDEEngine.select_bayesian_lambda evaluates the same held-out deletion and insertion validation score as select_lambda, then converts the scores over the finite lambda grid into a posterior. The default prior is Beta(1, 1), which is uniform over the candidates. For each candidate, the unnormalized log-posterior is the sum of a log-likelihood term and a log-prior term:

log_likelihood = n_eval_samples * validation_score / temperature
log_prior      = (alpha - 1) * log(lambda_hyb)
               + (beta - 1) * log(1 - lambda_hyb)
log_posterior  = log_likelihood + log_prior

Tune alpha and beta to encode prior preferences over lambda_hyb. Smaller temperature values make the posterior sharper around high-scoring candidates; larger values flatten it toward the prior. The normalized posterior weights produce:

posterior_mean_lambda: the Bayesian model-averaged mixture weight.
map_lambda: the highest-posterior lambda candidate.
credible_interval: an equal-tail credible interval on the grid for the chosen credible_mass.
posterior_rows: candidate rows with validation metrics and posterior mass.
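The posterior computation over a finite grid can be sketched as follows, assuming the validation scores are already computed (the deletion and insertion scoring itself is not reproduced). bayesian_lambda_posterior, its dict return value, and the weights key are hypothetical. The equal-tail rule here picks the first grid points whose cumulative mass reaches each tail, which may differ from the library's rule, and the sketch accepts only alpha >= 1 and beta >= 1 so that endpoint log-priors stay well defined.

```python
import numpy as np

def _scaled_log(coef, values):
    """coef * log(values); a zero coefficient contributes 0 instead of 0 * -inf."""
    if coef == 0.0:
        return np.zeros_like(values)
    with np.errstate(divide="ignore"):  # log(0) -> -inf at grid endpoints
        return coef * np.log(values)

def bayesian_lambda_posterior(grid, scores, n_eval_samples, temperature=1.0,
                              alpha=1.0, beta=1.0, credible_mass=0.9):
    """Hypothetical sketch: posterior over a finite lambda_hyb grid.

    ASSUMPTIONS: scores are precomputed validation scores, one per grid value,
    and alpha, beta >= 1 so endpoint log-priors are finite or -inf.
    """
    if alpha < 1.0 or beta < 1.0:
        raise ValueError("this sketch only supports alpha >= 1 and beta >= 1")
    order = np.argsort(grid)
    grid = np.asarray(grid, dtype=float)[order]
    scores = np.asarray(scores, dtype=float)[order]

    log_likelihood = n_eval_samples * scores / temperature
    log_prior = _scaled_log(alpha - 1.0, grid) + _scaled_log(beta - 1.0, 1.0 - grid)
    log_post = log_likelihood + log_prior

    # Normalize in a numerically stable way (softmax over candidates).
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()

    # Equal-tail interval: first grid points whose cumulative mass reaches each tail.
    cdf = np.cumsum(weights)
    tail = (1.0 - credible_mass) / 2.0
    lower = grid[np.searchsorted(cdf, tail)]
    upper = grid[min(np.searchsorted(cdf, 1.0 - tail), len(grid) - 1)]

    return {
        "posterior_mean_lambda": float(weights @ grid),
        "map_lambda": float(grid[np.argmax(weights)]),
        "credible_interval": (float(lower), float(upper)),
        "weights": weights,
    }

result = bayesian_lambda_posterior([0.0, 0.5, 1.0], [0.0, 0.0, 0.0], n_eval_samples=10)
# Equal scores and the Beta(1, 1) prior give uniform weights: mean 0.5, interval (0.0, 1.0).
```

Because the likelihood is scaled by n_eval_samples / temperature, a modest score gap between candidates can already concentrate almost all mass on one lambda_hyb when many validation samples are used.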

engine.explain_one_bayesian, engine.explain_batch_bayesian, and engine.explain_matrix_bayesian use posterior_mean_lambda. Because Hyb-FPDE is linear in lambda_hyb, this is equivalent to the expected attribution vector under the lambda posterior.
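The equivalence can be checked numerically. In this sketch the component vectors and posterior weights are made-up values and hyb is a hypothetical helper; the identity holds because the normalized component vectors do not depend on lambda_hyb and the posterior weights sum to 1.

```python
import numpy as np

def hyb(phi_diff, phi_cos, lam):
    # Hyb-FPDE mix of already-normalized component attributions.
    return lam * phi_diff + (1.0 - lam) * phi_cos

phi_diff = np.array([0.5, -0.5])     # hypothetical normalized Diff-FPDE attributions
phi_cos = np.array([0.2, 0.8])       # hypothetical normalized Cos-FPDE attributions
grid = np.array([0.0, 0.5, 1.0])
weights = np.array([0.2, 0.3, 0.5])  # hypothetical posterior mass per candidate

# Expected attribution under the posterior vs. attribution at the posterior mean.
expected_phi = sum(w * hyb(phi_diff, phi_cos, lam) for w, lam in zip(weights, grid))
phi_at_mean = hyb(phi_diff, phi_cos, weights @ grid)
# Both equal [0.395, -0.045].
```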

Interpreting results

An FPDE explanation contains:

attributions: one contribution per feature
evidence: the sum of the attribution vector for the selected contrast
positive_score and negative_score: the target and rival side scores
exactness_residual: numerical difference between summed attribution and direct evidence
Prototype labels and prototype indices

For visualization, FPDEExplanation.normalized_attributions returns an L1-normalized attribution vector. Use it for plotting, not as the raw evidence scale.
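A minimal sketch of the two derived quantities, using a hypothetical ExplanationSketch class rather than FPDEExplanation itself. The sign convention of the residual (summed attributions minus direct evidence) is an assumption.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ExplanationSketch:
    """Hypothetical stand-in for FPDEExplanation with only the fields used here."""
    attributions: np.ndarray
    evidence: float

    @property
    def exactness_residual(self) -> float:
        # ASSUMED sign convention: summed attributions minus direct evidence.
        return float(np.sum(self.attributions) - self.evidence)

    @property
    def normalized_attributions(self) -> np.ndarray:
        # L1 normalization: absolute values sum to 1 and signs are kept.
        total = np.abs(self.attributions).sum()
        if total == 0:
            return np.zeros_like(self.attributions)
        return self.attributions / total

explanation = ExplanationSketch(attributions=np.array([1.0, -1.0, 1.0]), evidence=1.0)
residual = explanation.exactness_residual       # 0.0
normalized = explanation.normalized_attributions  # [1/3, -1/3, 1/3]
```

Here the normalized vector sums to 1/3 rather than the evidence value 1.0, which is why it suits plotting but not reading the evidence scale.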
