Add dp algo to xgb #443

Closed · wants to merge 36 commits
Commits (36)
b677cc0
20221017
qbc2016 Oct 17, 2022
eb8f7db
Merge branch 'master' of https://github.com/alibaba/FederatedScope
qbc2016 Oct 18, 2022
f1e3b99
refine master
qbc2016 Nov 4, 2022
89f103e
Merge branch 'master' of https://github.com/alibaba/FederatedScope
qbc2016 Nov 7, 2022
60cda5b
fix yaml, need to fix givemesomecredit
qbc2016 Nov 7, 2022
0ef0135
temporary files, need further repair, may work for 'adult', no f…
qbc2016 Nov 8, 2022
228fa72
dataset 'adult' for vertical fl
qbc2016 Nov 8, 2022
78c0d4b
delete redundant
qbc2016 Nov 8, 2022
5db4bbf
fix typo
qbc2016 Nov 8, 2022
1667e6d
minor changes
qbc2016 Nov 8, 2022
60949b9
modified according to the comments
qbc2016 Nov 10, 2022
d44121d
add a parameter 'model' to dataset to decide whether to change the la…
qbc2016 Nov 10, 2022
96614a3
minor changes
qbc2016 Nov 10, 2022
f1af88d
Merge branch 'dev_vertical_data'
qbc2016 Nov 10, 2022
21ffc9e
Merge branch 'master' of https://github.com/alibaba/FederatedScope
qbc2016 Nov 10, 2022
8e2ae16
Merge branch 'master' of https://github.com/alibaba/FederatedScope
qbc2016 Nov 11, 2022
34bde98
add 3 more datasets for xgb_base
qbc2016 Nov 15, 2022
1f7b87a
rm 'test_acc' for Regression
qbc2016 Nov 16, 2022
0c89190
add round 0 logger info
qbc2016 Nov 16, 2022
131acef
refine test procedure (with much annotation)
qbc2016 Nov 17, 2022
e058f38
Merge branch 'master' of https://github.com/alibaba/FederatedScope in…
qbc2016 Nov 17, 2022
9f2d751
refine 3 other datasets .py files for partitioning test data
qbc2016 Nov 17, 2022
29f1404
add feedback during training with much annotation
qbc2016 Nov 18, 2022
6caa28f
add dp noises function
qbc2016 Nov 21, 2022
22ec534
add DP selection in yaml file
qbc2016 Nov 28, 2022
d98c896
minor changes
qbc2016 Nov 28, 2022
12eeaab
Merge branch 'master' of https://github.com/alibaba/FederatedScope in…
qbc2016 Nov 28, 2022
75bb49b
minor changes
qbc2016 Nov 28, 2022
70be765
merge master
qbc2016 Dec 14, 2022
94e1a17
add hyperparameter learning rate for xgb
qbc2016 Dec 14, 2022
19b44cb
minor changes
qbc2016 Dec 14, 2022
232395c
fix xgb distribute mode bugs and add hyperparameter learning rate
qbc2016 Dec 20, 2022
331a358
minor changes
qbc2016 Dec 20, 2022
8247079
add hyperparameter learning rate to fedhpo_vfl.yaml
qbc2016 Dec 20, 2022
bf08385
changed the name from learning_rate to eta in yaml files
qbc2016 Dec 21, 2022
f84450e
Merge branch 'master' into dev_xgb_dp
qbc2016 Dec 26, 2022
Files changed
2 changes: 2 additions & 0 deletions federatedscope/autotune/baseline/fedhpo_vfl.yaml
@@ -17,6 +17,8 @@ train:
     gamma: 0
     num_of_trees: 5
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 xgb_base:
   use: True
   use_bin: False
2 changes: 2 additions & 0 deletions federatedscope/core/configs/cfg_fl_setting.py
@@ -79,6 +79,8 @@ def extend_fl_setting_cfg(cfg):
     cfg.xgb_base = CN()
     cfg.xgb_base.use = False
     cfg.xgb_base.use_bin = False
+    cfg.xgb_base.use_random_noise = False
+    cfg.xgb_base.epsilon = 2
 
     # --------------- register corresponding check function ----------
     cfg.register_cfg_check_fun(assert_fl_setting_cfg)
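With these defaults registered, enabling the DP variant is just two extra config keys on top of the existing xgb_base options. A minimal sketch (assuming the global_cfg entry point used elsewhere in the repo; not part of this PR):

from federatedscope.core.configs.config import global_cfg

cfg = global_cfg.clone()
cfg.xgb_base.use = True
cfg.xgb_base.use_bin = True            # the noise only applies to the bin-based protocol
cfg.xgb_base.use_random_noise = True   # turn on the FederBoost-style perturbation
cfg.xgb_base.epsilon = 2               # privacy budget; larger epsilon means less noise

Note that use_random_noise is only read on the use_bin code path (see the XGBClient change below), so it has no effect with use_bin: False.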
2 changes: 2 additions & 0 deletions federatedscope/vertical_fl/dataloader/dataloader.py
@@ -2,6 +2,7 @@
 
 from federatedscope.vertical_fl.dataset.adult import Adult
 from federatedscope.vertical_fl.dataset.abalone import Abalone
+
 from federatedscope.vertical_fl.dataset.credit \
     import Credit
 from federatedscope.vertical_fl.dataset.blog import Blog
@@ -48,6 +49,7 @@ def load_vertical_data(config=None, generate=False):
                          algo=algo)
         data = dataset.data
         return data, config
+
     elif name == 'credit':
         dataset = Credit(root=path,
                          name=name,
(example yaml, file path not captured)
@@ -20,14 +20,18 @@ trainer:
 train:
   optimizer:
     bin_num: 1000
-    lambda_: 0.1
+    lambda_: 1
     gamma: 0
     num_of_trees: 10
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 vertical_dims: [4, 8]
 xgb_base:
   use: True
   use_bin: True
+  use_random_noise: True
+  epsilon: 8
 eval:
   freq: 5
   best_res_update_round_wise_key: test_loss
(example yaml, file path not captured)
@@ -20,14 +20,18 @@ trainer:
 train:
   optimizer:
     bin_num: 100
-    lambda_: 0.1
+    lambda_: 1
     gamma: 0
     num_of_trees: 10
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 vertical_dims: [7, 14]
 xgb_base:
   use: True
   use_bin: True
+  use_random_noise: True
+  epsilon: 5
 eval:
   freq: 3
   best_res_update_round_wise_key: test_loss
(example yaml, file path not captured)
@@ -20,14 +20,18 @@ trainer:
 train:
   optimizer:
     bin_num: 1000
-    lambda_: 10
+    lambda_: 1
     gamma: 0
-    num_of_trees: 9
+    num_of_trees: 5
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 1
 vertical_dims: [10, 20]
 xgb_base:
   use: True
   use_bin: True
+  use_random_noise: True
+  epsilon: 14
 eval:
   freq: 3
   best_res_update_round_wise_key: test_loss
(example yaml, file path not captured)
@@ -24,10 +24,14 @@ train:
     gamma: 0
     num_of_trees: 10
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 vertical_dims: [5, 10]
 xgb_base:
   use: True
   use_bin: True
+  use_random_noise: True
+  epsilon: 10
 eval:
   freq: 3
   best_res_update_round_wise_key: test_loss
63 changes: 63 additions & 0 deletions federatedscope/vertical_fl/xgb_base/utils/Random_noise.py
@@ -0,0 +1,63 @@
import numpy as np


class Random_noise:
    """
    Add random noise to the feature orders to protect privacy.
    For more details, please see
    FederBoost: Private Federated Learning for GBDT
    (https://arxiv.org/pdf/2011.02796.pdf)
    """
    def __init__(self, epsilon=2, seed=123):
        self.epsilon = epsilon
        self.seed = seed  # stored for reproducibility, not used yet

    def add_perm_noises_to_dict(self, old_dict, epsilon, bin_num):
        """
        Add DP noise to a dict whose values are lists (i.e., bins).
        For each item in a list:
        with prob. p = e^epsilon / (e^epsilon + bin_num - 1),
        it stays in its bin;
        with prob. 1 - p, it moves to another bin picked uniformly at random.
        :param old_dict: dict whose values are lists
        :param epsilon: float
        :param bin_num: int
        :return: dict
        """
        new_dict = dict()
        for key in old_dict.keys():
            new_dict[key] = list()
        tmp = np.power(np.e, epsilon)
        p = tmp / (tmp + bin_num - 1)
        q = (1 - p) / (bin_num - 1)
        prob_list = [p] + [q] * (bin_num - 1)
        for key in old_dict.keys():
            for value in old_dict[key]:
                random_bin = np.random.choice(list(range(bin_num)),
                                              p=prob_list)
                # random_bin == 0 means "stay"; otherwise map the draw to
                # one of the bin_num - 1 bins other than `key`
                if random_bin == 0:
                    new_dict[key].append(value)
                elif random_bin <= key:
                    new_dict[random_bin - 1].append(value)
                else:
                    new_dict[random_bin].append(value)
        # perturb the order of each list
        for key in new_dict.keys():
            new_dict[key] = np.random.permutation(new_dict[key])
        return new_dict

    def add_perm_noised_to_list_of_dict(self, old_list, epsilon, bin_num):
        """
        Apply add_perm_noises_to_dict to each dict in the list.
        :param old_list: list whose items are dicts
        :param epsilon: float
        :param bin_num: int
        :return: list
        """
        length = len(old_list)
        new_list = list()
        for i in range(length):
            new_dict = self.add_perm_noises_to_dict(old_list[i], epsilon,
                                                    bin_num)
            new_list.append(new_dict)
        return new_list
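A quick sanity check of the mechanism (hypothetical snippet, not part of the PR): with bin_num = 10, epsilon = 2 gives p = e^2 / (e^2 + 9) ≈ 0.45, so roughly half of the items leave their bin, while epsilon = 8 gives p ≈ 0.997 and the bins stay almost intact.

import numpy as np
from federatedscope.vertical_fl.xgb_base.utils.Random_noise import Random_noise

bins = {i: [10 * i + j for j in range(10)] for i in range(10)}  # 10 bins of 10 ids
rn = Random_noise()
for eps in (2, 8):
    noised = rn.add_perm_noises_to_dict(bins, epsilon=eps, bin_num=10)
    # fraction of items that stayed in their original bin
    stay_ratio = sum(v in noised[k] for k, vals in bins.items() for v in vals) / 100
    print(f'epsilon={eps}: {stay_ratio:.2f} of items kept their bin')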
20 changes: 17 additions & 3 deletions federatedscope/vertical_fl/xgb_base/worker/Feature_sort_by_bin.py
@@ -5,6 +5,7 @@
 import collections
 
 from federatedscope.core.message import Message
+from federatedscope.vertical_fl.xgb_base.utils.Random_noise import Random_noise
 
 
 class Feature_sort_by_bin:
@@ -14,9 +15,10 @@ class Feature_sort_by_bin:
     of all features, and then partition each order to several bins,
     in each bin, they can do some permutation to protect their privacy.
     """
-    def __init__(self, obj, bin_num=100):
+    def __init__(self, obj, epsilon=2, bin_num=100):
         self.client = obj
         self.total_feature_order_dict = dict()
+        self.epsilon = epsilon
         self.bin_num = bin_num
         self.total_feature_order_list_of_dict = dict()
         self.feature_order_list_of_dict = [
@@ -29,7 +31,19 @@ def partition_to_bin(self, ordered_list):
             for j in range(self.bin_num):
                 self.feature_order_list_of_dict[i][j] = ordered_list[i][
                     j * bin_size:(j + 1) * bin_size]
-        # TODO: add some perturbation in each set
+                # perturb the order within each bin
+                self.feature_order_list_of_dict[i][j] = np.random.permutation(
+                    self.feature_order_list_of_dict[i][j])
+        if self.client.use_random_noise:
+            rn = Random_noise(self.epsilon)
+            self.feature_order_list_of_dict_noised = \
+                rn.add_perm_noised_to_list_of_dict(
+                    self.feature_order_list_of_dict,
+                    epsilon=self.epsilon,
+                    bin_num=self.bin_num)
+        else:
+            self.feature_order_list_of_dict_noised = \
+                self.feature_order_list_of_dict
 
     def preparation(self):
         self.client.register_handlers('feature_order',
@@ -45,7 +59,7 @@ def preparation(self):
                 sender=self.client.ID,
                 state=self.client.state,
                 receiver=self.client.num_of_parties,
-                content=self.feature_order_list_of_dict))
+                content=self.feature_order_list_of_dict_noised))
 
     # label owner
     def callback_func_for_feature_order(self, message: Message):
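For a concrete picture of what now leaves a party on the bin path (toy numbers, hypothetical snippet): the feature order of six sample ids is cut into bins, ranks inside each bin are destroyed by permutation, and Random_noise then moves each id to a wrong bin with probability 1 - p:

import numpy as np

feature = np.array([5.1, 3.2, 7.8, 0.4, 6.6, 2.0])
order = np.argsort(feature)  # sample ids by ascending feature value: [3, 5, 1, 0, 4, 2]
bin_size = 2
bins = {j: list(order[j * bin_size:(j + 1) * bin_size]) for j in range(3)}
# bins == {0: [3, 5], 1: [1, 0], 2: [4, 2]}
bins = {j: list(np.random.permutation(v)) for j, v in bins.items()}
# after within-bin permutation only bin membership is meaningful, and
# add_perm_noises_to_dict makes even that membership plausibly deniable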
6 changes: 5 additions & 1 deletion federatedscope/vertical_fl/xgb_base/worker/Test_base.py
@@ -11,16 +11,19 @@
 class Test_base:
     def __init__(self, obj):
         self.client = obj
+
         self.client.register_handlers(
             'split_lr_for_test_data',
             self.callback_func_for_split_lr_for_test_data)
         self.client.register_handlers('LR', self.callback_func_for_LR)
 
     def evaluation(self):
+
         loss = self.client.ls.loss(self.client.test_y, self.client.test_result)
         if self.client.criterion_type == 'CrossEntropyLoss':
             metric = self.client.ls.metric(self.client.test_y,
                                            self.client.test_result)
+
             metrics = {
                 'test_loss': loss,
                 'test_acc': metric[1],
@@ -42,6 +45,7 @@ def test_for_root(self, tree_num):
     def test_for_node(self, tree_num, node_num):
         if node_num >= 2**self.client.max_tree_depth - 1:
             if tree_num + 1 < self.client.num_of_trees:
+
                 # TODO: add feedback during training
                 logger.info(f'----------- Building a new tree (Tree '
                             f'#{tree_num + 1}) -------------')
@@ -77,7 +81,7 @@ def test_for_node(self, tree_num, node_num):
         elif self.client.tree_list[tree_num][node_num].weight:
             self.client.test_result += self.client.tree_list[tree_num][
                 node_num].indicator * self.client.tree_list[tree_num][
-                    node_num].weight
+                    node_num].weight * self.client.eta
             self.test_for_node(tree_num, node_num + 1)
         elif self.client.tree_list[tree_num][node_num].member:
             self.client.comm_manager.send(
10 changes: 9 additions & 1 deletion federatedscope/vertical_fl/xgb_base/worker/XGBClient.py
@@ -47,6 +47,8 @@ def __init__(self,
         self.federate_mode = config.federate.mode
 
         self.bin_num = config.train.optimizer.bin_num
+
+        self.eta = config.train.optimizer.eta
         self.batch_size = config.dataloader.batch_size
 
         self.data = data
@@ -98,6 +100,12 @@ def _init_data_related_var(self):
         # the second one corresponding to sending the bins of feature order
         if self._cfg.xgb_base.use_bin:
             self.fs = Feature_sort_by_bin(self, bin_num=self.bin_num)
+            self.use_random_noise = self._cfg.xgb_base.use_random_noise
+            if self.use_random_noise:
+                self.epsilon = self._cfg.xgb_base.epsilon
+                self.fs = Feature_sort_by_bin(self,
+                                              epsilon=self.epsilon,
+                                              bin_num=self.bin_num)
         else:
             self.fs = Feature_sort_base(self)
 
@@ -221,7 +229,7 @@ def compute_weight(self, tree_num, node_num):
         if self.tree_list[tree_num][node_num].weight:
             self.z += self.tree_list[tree_num][
                 node_num].weight * self.tree_list[tree_num][
-                    node_num].indicator
+                    node_num].indicator * self.eta
             self.compute_weight(tree_num, node_num + 1)
 
     def callback_func_for_send_feature_importance(self, message: Message):
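eta here is the usual XGBoost shrinkage: every leaf weight is scaled by eta both when accumulating training predictions (compute_weight) and at test time (Test_base), so each tree takes only a damped step and later trees correct the remaining residual. A standalone sketch of that accumulation (hypothetical helper, not the PR's classes):

import numpy as np

def accumulate_prediction(trees, n_samples, eta=0.5):
    """trees: per-tree lists of (indicator, weight) leaf pairs, where
    indicator is a 0/1 vector marking the samples that fall in the leaf."""
    z = np.zeros(n_samples)
    for leaves in trees:
        for indicator, weight in leaves:
            z += eta * weight * indicator  # shrink each leaf's contribution
    return z

# one stump that predicts +2.0 for the first half and -1.0 for the rest
tree = [(np.array([1, 1, 0, 0]), 2.0), (np.array([0, 0, 1, 1]), -1.0)]
print(accumulate_prediction([tree], 4))  # [ 1.   1.  -0.5 -0.5] with eta = 0.5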
(example yaml, file path not captured)
@@ -33,10 +33,12 @@ train:
     gamma: 0
     num_of_trees: 10
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 vertical_dims: [5, 10]
 xgb_base:
   use: True
   use_bin: True
   dims: [5, 10]
 criterion:
   type: CrossEntropyLoss
 trainer:
(example yaml, file path not captured)
@@ -33,10 +33,12 @@ train:
     gamma: 0
     num_of_trees: 10
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 vertical_dims: [5, 10]
 xgb_base:
   use: True
   use_bin: True
   dims: [5, 10]
 criterion:
   type: CrossEntropyLoss
 trainer:
(example yaml, file path not captured)
@@ -31,10 +31,12 @@ train:
     gamma: 0
     num_of_trees: 10
     max_tree_depth: 3
+    # learning rate for xgb model
+    eta: 0.5
 vertical_dims: [5, 10]
 xgb_base:
   use: True
   use_bin: True
   dims: [5, 10]
 criterion:
   type: CrossEntropyLoss
 trainer:
1 change: 1 addition & 0 deletions tests/test_xgb.py
@@ -29,6 +29,7 @@ def set_config(self, cfg):
         cfg.train.optimizer.gamma = 0
         cfg.train.optimizer.num_of_trees = 5
         cfg.train.optimizer.max_tree_depth = 3
+        cfg.train.optimizer.eta = 0.5
 
         cfg.data.root = 'test_data/'
         cfg.data.type = 'adult'