Originally posted by li1553770945 March 13, 2023
I have successfully installed the FedScale framework and downloaded the FEMNIST dataset. I am trying to follow the Deployment page to run my first example. I ran the fedscale driver start conf.yml command, and the contents of my conf.yml are as follows.
# Configuration file of FAR training experiment

# ========== Cluster configuration ==========
# ip address of the parameter server (need 1 GPU process)
ps_ip: localhost

# ip address of each worker:# of available gpus process on each gpu in this node
# Note that if we collocate ps and worker on same GPU, then we need to decrease this number of available processes on that GPU by 1
# E.g., master node has 4 available processes, then 1 for the ps, and worker should be set to: worker:3
worker_ips:
    - localhost:[4]

exp_path: $FEDSCALE_HOME/fedscale/cloud

# Entry function of executor and aggregator under $exp_path
executor_entry: execution/executor.py

aggregator_entry: aggregation/aggregator.py

auth:
    ssh_user: ""
    ssh_private_key: ~/.ssh/id_rsa

# cmd to run before we can indeed run FAR (in order)
setup_commands:
    - source $HOME/anaconda3/bin/activate fedscale

# ========== Additional job configuration ==========
# Default parameters are specified in config_parser.py, wherein more description of the parameter can be found

job_conf:
    - job_name: femnist # Generate logs under this folder: log_path/job_name/time_stamp
    - log_path: $FEDSCALE_HOME/benchmark # Path of log files
    - num_participants: 50 # Number of participants per round, we use K=100 in our paper, large K will be much slower
    - data_set: femnist # Dataset: openImg, google_speech, stackoverflow
    - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/femnist # Path of the dataset
    - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv # Allocation of data to each client, turn to iid setting if not provided
    - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity # Path of the client trace
    - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
    - model: resnet18 # NOTE: Please refer to our model zoo README and use models for these small image (e.g., 32x32x3) inputs
#    - model_zoo: fedscale-torch-zoo
    - eval_interval: 10 # How many rounds to run a testing on the testing set
    - rounds: 1000 # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
    - filter_less: 21 # Remove clients w/ less than 21 samples
    - num_loaders: 2
    - local_steps: 5
    - learning_rate: 0.05
    - batch_size: 20
    - test_bsz: 20
    - use_cuda: False
    - save_checkpoint: False
After that, the following messages were displayed:
starting aggregator on localhost...
Aggregator local PID 2767758. run kill -9 2767758 to kill the job.
Starting workers on localhost ...
Submitted job, please check your logs $FEDSCALE_HOME/benchmark/logs/femnist/0314_015101 for status
I checked the log directory and did not find any log file. I also checked the PID printed in the output, and no such process exists.
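For anyone who wants to repeat these two checks, here is a minimal Python sketch of them. It is not part of FedScale: the helper names pid_is_running and list_log_files are hypothetical, the PID and log path are simply copied from the output above, and the PID check assumes a POSIX system.

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def list_log_files(log_dir):
    """Expand environment variables in log_dir and list every file below it."""
    root = Path(os.path.expandvars(log_dir))
    if not root.is_dir():
        return []  # directory missing, or $FEDSCALE_HOME is not set
    return sorted(str(p) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Values taken from the messages printed by the driver above.
    print("aggregator running:", pid_is_running(2767758))
    print("log files:", list_log_files("$FEDSCALE_HOME/benchmark/logs/femnist/0314_015101"))
```

An empty list from list_log_files can mean either that the directory is empty or that it does not exist at all (for example, if $FEDSCALE_HOME is unset in the current shell), so it may be worth checking the expanded path by hand as well.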
Discussed in #217