Enabling SQL-based Training Data Debugging for Federated Learning
3 minute read ∼ Filed in: A paper note
Abstract
Background
The SQL-based training data debugging framework has proven effective at fixing logistic regression model bugs in a non-federated learning setting.
- Remove label errors from the training data so that the unexpected behavior disappears in the retrained model.
This paper aims to enable such a framework for federated learning.
Challenge
Develop a security protocol for FL debugging that is provably secure, efficient, and accurate.
Solutions
- FedRain extends Rain, the state-of-the-art SQL-based training data debugging framework, to the federated setting, but it falls short in terms of both efficiency and security.
- The authors propose a novel SQL-based training data debugging framework called FROG, which is more secure, more accurate, and more efficient than FedRain.
Introduction
Problems
When a biased or inaccurate federated learning model mispredicts in a way that affects downstream analysis results, how can we automatically identify the training examples that contributed most to the downstream error while retaining federated learning's security guarantees?
Moreover, in federated learning, training data errors can come from any of the participating data sources.
Solutions
Identify the erroneous training records that directly cause the errors in the query results, then remove that subset of the training data and retrain the model. The candidate approaches for ranking such records are:
- Model Loss:
It removes training examples from highest to lowest training loss until the complaint is resolved.
The loss can be obtained during federated inference.
The method is secure, but the model loss is independent of the complaint, so it likely removes many irrelevant training records (see the loss-ranking sketch after this list).
- Influence Function:
Allows instance-based complaints (the instance is exposed to both parties).
Quickly approximates the influence of removing each training example on the misprediction.
It then iteratively ranks and removes training records until the complaint is resolved.
This approach is not secure.
- Rain:
Allows the complainer to express why a prediction is inaccurate and what it should be.
Company A can directly complain that the ratio is too high and should be zero.
Rain iteratively ranks and removes the training records that most increased the ratio until the complaint is resolved. Unfortunately, Rain is not secure.
- This paper presents a secure debugging framework that supports Rain-style complaints.
FedRain needs to limit the number of stochastic gradient descent iterations to less than the number of features in order to ensure security. However, this is typically far fewer iterations than logistic regression needs to converge, so FedRain often cannot reach high model accuracy without breaking the security guarantee.
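To make the loss-based baseline concrete, here is a minimal sketch (my own illustration, not code from the paper) of ranking training examples by their own training loss and retraining after deleting the top-k. It assumes a plain, non-federated logistic regression with scikit-learn; in the actual FL setting the per-example losses would instead be obtained through federated inference.

```python
# Illustrative sketch of the loss-based baseline (not the paper's code).
# Rank training examples by per-example training loss, delete the top-k,
# and retrain; note that the ranking never looks at the complaint.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rank_by_training_loss(model, X, y):
    """Indices of training examples sorted from highest to lowest loss."""
    proba = model.predict_proba(X)                      # shape (n, 2), labels in {0, 1}
    per_example_loss = -np.log(proba[np.arange(len(y)), y] + 1e-12)
    return np.argsort(per_example_loss)[::-1]

def remove_top_k_and_retrain(X, y, k):
    model = LogisticRegression(max_iter=1000).fit(X, y)
    ranked = rank_by_training_loss(model, X, y)
    keep = np.setdiff1d(np.arange(len(y)), ranked[:k])  # drop the k highest-loss rows
    return LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
```

Because the ranking depends only on the loss and never on the complaint, this baseline tends to delete many records that have nothing to do with the downstream error, which is exactly the weakness noted above.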
Contributions
- Enable SQL-based training data debugging for FL
- Propose FedRain, a security-guaranteed version of Rain
- Propose Frog, a novel FL debugging framework
- Conduct experiments. Results show that Frog outperforms FedRain in terms of efficiency and accuracy.
Problem definition
Our goal is to identify the minimum number of training examples such that, if they were removed and the model was retrained, the updated model would lead to a new query result that satisfies the complaint.
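A rough formalization of this objective (the notation below is mine, not necessarily the paper's): let D be the training set, q(θ) the query result under parameters θ, and C the complaint.

```latex
\begin{aligned}
\min_{\Delta \subseteq D}\quad & |\Delta| \\
\text{s.t.}\quad & \theta^{*}(D \setminus \Delta) = \arg\min_{\theta} \sum_{(x, y) \in D \setminus \Delta} \ell(x, y; \theta), \\
& q\bigl(\theta^{*}(D \setminus \Delta)\bigr) \text{ satisfies the complaint } C.
\end{aligned}
```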
Background
Rain: SQL-based training data debugging framework
Overall
Rain takes the complaint as input and produces a ranked list of the training examples based on how much each one contributes to the complaint. The debugging loop (sketched below) is:
- (1) Generate the ranked list;
- (2) Remove the top-k training examples from the ranked list;
- (3) Retrain an ML model on the new training set;
- Repeat (1)–(3) until the complaint is resolved.
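A schematic sketch of that loop follows. The ranking, retraining, and complaint-check routines are passed in as callables because they stand in for Rain's actual components, which are not reproduced here.

```python
# Schematic version of steps (1)-(3) above; rank_fn, retrain_fn, and
# resolved_fn are placeholders for Rain's ranking, retraining, and
# complaint-resolution checks, not real APIs from the paper.
from typing import Any, Callable, List, Sequence, Tuple

def debug_loop(train_set: List[Any],
               rank_fn: Callable[[Any, List[Any]], Sequence[int]],
               retrain_fn: Callable[[List[Any]], Any],
               resolved_fn: Callable[[Any], bool],
               k: int) -> Tuple[Any, List[Any]]:
    model = retrain_fn(train_set)
    while not resolved_fn(model):
        top_k = set(rank_fn(model, train_set)[:k])     # (1) rank; keep the k worst indices
        train_set = [ex for i, ex in enumerate(train_set)
                     if i not in top_k]                # (2) remove the top-k examples
        model = retrain_fn(train_set)                  # (3) retrain on the reduced set
    return model, train_set
```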
Main contributions
- How to efficiently compute the effect on the query result of deleting each training example.
- How to make the SQL query differentiable with respect to the model parameters, so that continuous optimization techniques can be applied to solve challenge (1).
Solutions
- Convert the SQL query into a formula.
- Relax it into continuous variables so it becomes differentiable (see the sketch below).
- Compare the scores computed for each training example.
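As a toy illustration of the relaxation step (again my own sketch, not the paper's formulation), consider a query that counts how many records the model predicts as positive. Replacing the hard prediction with the model's predicted probability turns the count into a smooth function of the logistic parameters theta, so its gradient with respect to theta exists.

```python
# Toy relaxation of "SELECT COUNT(*) WHERE predicted_label = 1" for a
# logistic regression model; the hard indicator is replaced by the
# predicted probability so the query result is differentiable in theta.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_count(theta, X):
    """Original query: number of rows predicted positive (non-differentiable)."""
    return np.sum((X @ theta) > 0)

def relaxed_count(theta, X):
    """Relaxed query: sum of predicted probabilities (differentiable in theta)."""
    return np.sum(sigmoid(X @ theta))

def relaxed_count_grad(theta, X):
    """d/d(theta) of the relaxed count: sum_i p_i * (1 - p_i) * x_i."""
    p = sigmoid(X @ theta)
    return X.T @ (p * (1 - p))
```

Once the query result is differentiable in the model parameters, its sensitivity to removing each training example can be propagated through the training loss (the influence-function machinery), which yields the per-example scores that are compared in the last step.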
FedRain: Federated Rain