Questions About Training the SFace Model and Discrepancies in Model Size, Accuracy, and Output Dimensions #288

Open
sayyid-abolfazl opened this issue Mar 12, 2025 · 0 comments
Labels: help wanted (Extra attention is needed)

sayyid-abolfazl commented Mar 12, 2025

Hello and thank you for your amazing work,

I have some questions regarding the training process of the SFace model. I have been trying to train it using this repository and have tested the result on various datasets, but I am seeing discrepancies compared to the official model, and I would appreciate your guidance on resolving them. My ultimate goal is to first match the accuracy of your pre-trained model and then train on my custom dataset.

Here are the details of my observations and questions:

1. **Model Size Discrepancy**:
   The size of my trained SFace model is 5.1 MB, while the official model provided in the repository is 39 MB. Why is there such a large difference in model size?
2. **Accuracy Discrepancy**:
   The accuracy of my trained SFace model is significantly lower than that of your pre-trained model. What could be causing this gap, and how can I improve the accuracy to match yours?
3. **Output Embedding Size Discrepancy**:
   My trained model produces 512-dimensional embeddings, while the official SFace model outputs 128-dimensional embeddings. Why is there a difference, and how can I configure my training to produce an embedding size of 128 instead of 512?
4. **Training Details**:
   Below is the configuration I used for training. Could you please review it and let me know if any parameters or settings need to be adjusted to achieve results closer to your pre-trained model?

    {
        'SEED': 1337,
        'INPUT_SIZE': [112, 112],
        'EMBEDDING_SIZE': 512,
        'DROP_LAST': True,
        'WEIGHT_DECAY': 0.0005,
        'MOMENTUM': 0.9,
        'GPU_ID': [0],
        'DEVICE': device(type='cuda', index=0),
        'MULTI_GPU': False,
        'NUM_EPOCH': 125,
        'STAGES': [35, 65, 95, 205],
        'LR': 0.1,
        'BATCH_SIZE': 240,
        'DATA_ROOT': '../faces_emore/',
        'EVAL_PATH': '../eval/',
        'BACKBONE_NAME': 'MobileFaceNet',
        'HEAD_NAME': 'SFaceLoss',
        'TARGET': ['cfp_ff', 'cplfw', 'calfw', 'cfp_fp', 'vgg2_fp', 'lfw', 'agedb_30'],
        'BACKBONE_RESUME_ROOT': '',
        'HEAD_RESUME_ROOT': '',
        'WORK_PATH': 'face_empire'
    }

parser.add_argument('--param_s', default=64.0, type=float)
parser.add_argument('--param_k', default=80.0, type=float)
parser.add_argument('--param_a', default=0.87, type=float)
parser.add_argument('--param_b', default=1.22, type=float)
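On question 3, my understanding (an assumption on my part, not verified against the official training code) is that the 512 comes straight from the `EMBEDDING_SIZE` entry above, which should also size the final linear layer in `model_mobilefacenet.py`. A minimal sketch of the change I would try:

```python
# Sketch under the assumption that EMBEDDING_SIZE is the single knob
# controlling the backbone's output dimension (dict name is illustrative).
configuration = {
    # ... all other keys unchanged ...
    'EMBEDDING_SIZE': 128,  # was 512; matches the official model's output
}
```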


5. **Training Logs**:  
Below is a sample of my training logs for reference. If you notice anything unusual or suboptimal, please let me know:  

Epoch 8 Batch 185960 Speed: 797.15 samples/s intra_Loss -25.3453 (-26.1291) inter_Loss 16.8062 (18.1256) Wyi 0.4486 (0.4653) Wj 0.0001 (0.0001) Prec@1 77.917 (82.729)
Epoch 8 Batch 185980 Speed: 696.71 samples/s intra_Loss -26.5736 (-26.1947) inter_Loss 19.4150 (18.5150) Wyi 0.4811 (0.4683) Wj 0.0001 (0.0001) Prec@1 87.500 (82.583)
Epoch 8 Batch 186000 Speed: 709.95 samples/s intra_Loss -26.4986 (-26.2168) inter_Loss 18.4467 (18.5980) Wyi 0.4808 (0.4673) Wj 0.0001 (0.0001) Prec@1 86.250 (82.333)
Learning rate 0.100000
Perform Evaluation on ['cfp_ff', 'cplfw', 'calfw', 'cfp_fp', 'vgg2_fp', 'lfw', 'agedb_30'] , and Save Checkpoints...
(14000, 512)
[cfp_ff][186000]XNorm: 102.98364
[cfp_ff][186000]Accuracy-Flip: 0.98029+-0.00629
[cfp_ff][186000]Best-Threshold: 1.45500
(12000, 512)
[cplfw][186000]XNorm: 85.15097
[cplfw][186000]Accuracy-Flip: 0.78867+-0.02125
[cplfw][186000]Best-Threshold: 1.54200
(12000, 512)
[calfw][186000]XNorm: 103.92467
[calfw][186000]Accuracy-Flip: 0.90883+-0.01038
[calfw][186000]Best-Threshold: 1.49800
(14000, 512)
[cfp_fp][186000]XNorm: 86.52919
[cfp_fp][186000]Accuracy-Flip: 0.80686+-0.02192
[cfp_fp][186000]Best-Threshold: 1.68900
(10000, 512)
[vgg2_fp][186000]XNorm: 89.77735
[vgg2_fp][186000]Accuracy-Flip: 0.84040+-0.01292
[vgg2_fp][186000]Best-Threshold: 1.59500
(12000, 512)
[lfw][186000]XNorm: 104.07785
[lfw][186000]Accuracy-Flip: 0.98400+-0.00642
[lfw][186000]Best-Threshold: 1.43000
(12000, 512)
[agedb_30][186000]XNorm: 100.46037
[agedb_30][186000]Accuracy-Flip: 0.89783+-0.01895
[agedb_30][186000]Best-Threshold: 1.57000
highest_acc: [0.9847142857142857, 0.8046666666666666, 0.9238333333333332, 0.8068571428571429, 0.85, 0.9865, 0.9065000000000001]
Epoch 8 Batch 186020 Speed: 56.99 samples/s intra_Loss -26.4323 (-26.0712) inter_Loss 19.7517 (18.7060) Wyi 0.4774 (0.4650) Wj 0.0001 (0.0001) Prec@1 85.000 (82.271)
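Incidentally, the `(N, 512)` shape lines in the evaluation output above already confirm the 512-dimensional embeddings. A quick hypothetical helper (name and regex are mine) to pull the dimension out of such logs:

```python
import re

def embedding_dim_from_log(log_text):
    """Return the embedding dimension reported by '(num_pairs, dim)' lines."""
    dims = {int(m.group(1)) for m in re.finditer(r"\(\d+,\s*(\d+)\)", log_text)}
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dims: {dims}")
    return dims.pop()

sample = "(14000, 512)\n[cfp_ff][186000]XNorm: 102.98364\n(12000, 512)"
print(embedding_dim_from_log(sample))  # 512
```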



If specific changes are needed in any of the following files, please advise:

- sface_torch/config.py
- sface_torch/train_SFace_torch.py
- sface_torch/backbone/model_mobilefacenet.py

I would be very grateful.
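On question 1, a rough sanity check (assuming uncompressed float32 weights, i.e. 4 bytes per parameter) suggests the size gap is mostly parameter count: MobileFaceNet is on the order of a million parameters (~5 MB), whereas a 39 MB checkpoint implies a backbone roughly ten times larger. The helper below is purely illustrative:

```python
def approx_size_mb(num_params, bytes_per_param=4):
    """Approximate on-disk checkpoint size in MiB for float32 weights."""
    return num_params * bytes_per_param / (1024 ** 2)

# ~1.3M params -> MobileFaceNet-scale checkpoint
print(round(approx_size_mb(1_300_000), 1))   # 5.0
# ~10M params -> roughly the official model's scale
print(round(approx_size_mb(10_000_000), 1))  # 38.1
```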


Thank you so much for your time and assistance!
@fengyuentau added the `help wanted` label Mar 12, 2025