Enabling SQL-based Training Data Debugging for Federated Learning
3 minute read ∼ Filed in: A paper note
Abstract
Background
The SQL-based training data debugging framework has proven effective at fixing logistic regression model bugs in a non-federated learning setting.
- Remove label errors from the training data so that the unexpected behavior disappears in the retrained model.
This paper aims to enable such a framework for federated learning.
Challenge
Develop a security protocol for FL debugging that is provably secure, efficient, and accurate.
Solutions
- FedRain extends Rain, the state-of-the-art SQL-based training data debugging framework, to the federated setting, but it falls short in terms of both efficiency and security.
- The authors propose a novel SQL-based training data debugging framework called FROG, which is more secure, more accurate, and more efficient than FedRain.
Introduction
Problems
When a biased or inaccurate federated learning model mispredicts in a way that affects downstream analysis results, how can we automatically identify the training examples that contributed most to the downstream error while retaining federated learning's security guarantees?
Moreover, in federated learning, training data errors can come from any of the participating data sources.
Solutions
Identify the erroneous training records that directly cause the errors in the query results, then remove that subset of the training data and retrain the model. The candidate approaches for ranking such records are:
- Model Loss:
It removes training examples from highest to lowest training loss until the complaint is resolved.
The loss can be obtained during federated inference.
The method is secure, but the model loss is independent of the complaint, so it likely removes many irrelevant training records (see the loss-ranking sketch after this list).
- Influence Function:
Allows instance-based complaints (the instance is exposed to both parties).
Quickly approximates the influence of removing each training example on the misprediction.
It then iteratively ranks and removes training records until the complaint is resolved.
This approach is not secure.
- Rain:
Allows the complainer to express why a prediction is inaccurate and what it should be.
Company A can directly complain that the ratio is too high and should be zero.
Rain iteratively ranks and removes the training records that most increased the ratio until the complaint is resolved. Unfortunately, Rain is not secure.
- This paper presents a secure debugging framework that supports Rain-style complaints.
FedRain needs to limit the number of stochastic gradient descent iterations to less than the number of features in order to ensure security. However, this is typically far fewer iterations than logistic regression needs to converge, so FedRain often cannot reach high model accuracy without breaking the security guarantee.
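To make the loss-based baseline concrete, here is a minimal sketch (my own illustration, not code from the paper) of ranking training examples by their own training loss and retraining after deleting the top-k. It assumes a plain, non-federated logistic regression with scikit-learn; in the actual FL setting the per-example losses would instead be obtained through federated inference.

```python
# Illustrative sketch of the loss-based baseline (not the paper's code).
# Rank training examples by per-example training loss, delete the top-k,
# and retrain; note that the ranking never looks at the complaint.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rank_by_training_loss(model, X, y):
    """Indices of training examples sorted from highest to lowest loss."""
    proba = model.predict_proba(X)                      # shape (n, 2), labels in {0, 1}
    per_example_loss = -np.log(proba[np.arange(len(y)), y] + 1e-12)
    return np.argsort(per_example_loss)[::-1]

def remove_top_k_and_retrain(X, y, k):
    model = LogisticRegression(max_iter=1000).fit(X, y)
    ranked = rank_by_training_loss(model, X, y)
    keep = np.setdiff1d(np.arange(len(y)), ranked[:k])  # drop the k highest-loss rows
    return LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
```

Because the ranking depends only on the loss and never on the complaint, this baseline tends to delete many records that have nothing to do with the downstream error, which is exactly the weakness noted above.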
Contributions
- Enable SQL-based training data debugging for FL
- Propose FedRain, a security-guaranteed version of Rain
- Propose Frog, a novel FL debugging framework
- Conduct experiments. Results show that Frog outperforms FedRain in terms of efficiency and accuracy.
Problem definition
Our goal is to identify the minimum number of training examples such that, if they were removed and the model was retrained, the updated model would lead to a new query result that satisfies the complaint.
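A rough formalization of this objective (the notation below is mine, not necessarily the paper's): let D be the training set, q(θ) the query result under parameters θ, and C the complaint.

```latex
\begin{aligned}
\min_{\Delta \subseteq D}\quad & |\Delta| \\
\text{s.t.}\quad & \theta^{*}(D \setminus \Delta) = \arg\min_{\theta} \sum_{(x, y) \in D \setminus \Delta} \ell(x, y; \theta), \\
& q\bigl(\theta^{*}(D \setminus \Delta)\bigr) \text{ satisfies the complaint } C.
\end{aligned}
```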
Background
Rain: SQL-based training data debugging framework
Overall
Rain takes the complaint as input and produces a ranked list of the training examples based on how much each one contributes to the complaint. The debugging loop (sketched below) is:
- (1) Generate the ranked list;
- (2) Remove the top-k training examples from the ranked list;
- (3) Retrain an ML model on the new training set;
- Repeat (1)–(3) until the complaint is resolved.
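A schematic sketch of that loop follows. The ranking, retraining, and complaint-check routines are passed in as callables because they stand in for Rain's actual components, which are not reproduced here.

```python
# Schematic version of steps (1)-(3) above; rank_fn, retrain_fn, and
# resolved_fn are placeholders for Rain's ranking, retraining, and
# complaint-resolution checks, not real APIs from the paper.
from typing import Any, Callable, List, Sequence, Tuple

def debug_loop(train_set: List[Any],
               rank_fn: Callable[[Any, List[Any]], Sequence[int]],
               retrain_fn: Callable[[List[Any]], Any],
               resolved_fn: Callable[[Any], bool],
               k: int) -> Tuple[Any, List[Any]]:
    model = retrain_fn(train_set)
    while not resolved_fn(model):
        top_k = set(rank_fn(model, train_set)[:k])     # (1) rank; keep the k worst indices
        train_set = [ex for i, ex in enumerate(train_set)
                     if i not in top_k]                # (2) remove the top-k examples
        model = retrain_fn(train_set)                  # (3) retrain on the reduced set
    return model, train_set
```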
Main contributions
- How to efficiently compute the effect on the query result of deleting each training example.
- How to make the SQL query differentiable with respect to the model parameters, so that continuous optimization techniques can be applied to solve challenge (1).
Solutions
- Convert the SQL query into a formula.
- Relax it into continuous variables so it becomes differentiable (see the sketch below).
- Compare the scores computed for each training example.
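As a toy illustration of the relaxation step (again my own sketch, not the paper's formulation), consider a query that counts how many records the model predicts as positive. Replacing the hard prediction with the model's predicted probability turns the count into a smooth function of the logistic parameters theta, so its gradient with respect to theta exists.

```python
# Toy relaxation of "SELECT COUNT(*) WHERE predicted_label = 1" for a
# logistic regression model; the hard indicator is replaced by the
# predicted probability so the query result is differentiable in theta.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_count(theta, X):
    """Original query: number of rows predicted positive (non-differentiable)."""
    return np.sum((X @ theta) > 0)

def relaxed_count(theta, X):
    """Relaxed query: sum of predicted probabilities (differentiable in theta)."""
    return np.sum(sigmoid(X @ theta))

def relaxed_count_grad(theta, X):
    """d/d(theta) of the relaxed count: sum_i p_i * (1 - p_i) * x_i."""
    p = sigmoid(X @ theta)
    return X.T @ (p * (1 - p))
```

Once the query result is differentiable in the model parameters, its sensitivity to removing each training example can be propagated through the training loss (the influence-function machinery), which yields the per-example scores that are compared in the last step.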
FedRain: Federated Rain