diff --git a/docs/benchmark/benchmark.md b/docs/benchmark/benchmark.md
index 8c0ca3c..2eadd97 100644
--- a/docs/benchmark/benchmark.md
+++ b/docs/benchmark/benchmark.md
@@ -41,7 +41,7 @@ We have attempted to merge different evaluation measures into one radar chart fo
 
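-**Efficiency score (quantitative):** The efficiency score is the calculated through averaging the quantitative score of three sub-metrics, which are the communication rounds, communication amount, and the time consumption. For each of these sub-metrics, we choose one baseline model (*i.e.*, FedSGD), score 1 point to the baseline model, and then compute the score of other methods through comparing to the baseline model. Specifically:
+**Efficiency score (quantitative):** The efficiency score is calculated by averaging the quantitative scores of three sub-metrics: communication rounds, communication amount, and time consumption. For each sub-metric, we choose one baseline model (*i.e.*, FedSGD), assign it a score of 1, and compute the scores of the other methods by comparing them with the baseline. Specifically: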
 
-- If the method A's performance {math}`P_a` (e.g., time consumption) is worse than the baseline model {math}`P_b`, then we give score {math}`e^{1-P_b/P_a}` to A.
+- If method A's performance {math}`P_a` (*e.g.*, time consumption) is worse than the baseline model's performance {math}`P_b`, we give A the score {math}`e^{1-P_b/P_a}`.
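-- Otherwise, if method A's performance {math}`P_a` is better than the baseline model {math}`P_b`, denoting the best performance as {math}`\bar{P}`, then we give score {math}`1 + 4(P_b - P_a)/(P_b - \bar{P})` to A. We set the best performance {math}`\bar{P}=0` for time consumption, {math}`\bar{P}=1` for communication rounds, and {math}`\bar{P}=0` for communication amount.
+- Otherwise, if method A's performance {math}`P_a` is at least as good as the baseline model's performance {math}`P_b`, we denote the best achievable performance as {math}`\bar{P}` and give A the score {math}`1 + 4(P_b - P_a)/(P_b - \bar{P})`, which ranges from 1 (matching the baseline) to 5 (reaching {math}`\bar{P}`). We set {math}`\bar{P}=0` for time consumption, {math}`\bar{P}=1` for communication rounds, and {math}`\bar{P}=0` for communication amount.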
 
 **Robustness score (quantitative):**
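
The efficiency scoring rule in `docs/benchmark/benchmark.md` can be sketched in Python as below. This is a minimal sketch that follows the two formulas exactly as written in the doc, assuming all three sub-metrics are lower-is-better measurements and that the baseline's value differs from the best value {math}`\bar{P}` (otherwise the second formula divides by zero). The helper names `efficiency_subscore` and `efficiency_score`, the metric keys, and the sample numbers are hypothetical and not taken from the benchmark's code base.

```python
import math

# Best achievable values (P-bar) as stated in the docs.
# The dictionary keys are hypothetical labels for the three sub-metrics.
BEST = {"time": 0.0, "rounds": 1.0, "amount": 0.0}


def efficiency_subscore(p_a: float, p_b: float, p_best: float) -> float:
    """Score method A on one lower-is-better sub-metric against baseline B.

    Hypothetical helper; assumes p_b != p_best.
    """
    if p_a > p_b:
        # Worse than the baseline: e^(1 - P_b / P_a), as written in the docs.
        return math.exp(1 - p_b / p_a)
    # Equal to or better than the baseline: linear from 1 (at the baseline)
    # up to 5 (at the best achievable value p_best).
    return 1 + 4 * (p_b - p_a) / (p_b - p_best)


def efficiency_score(method: dict, baseline: dict) -> float:
    """Average the three sub-metric scores (hypothetical helper)."""
    subscores = [efficiency_subscore(method[k], baseline[k], BEST[k]) for k in BEST]
    return sum(subscores) / len(subscores)


# Illustrative numbers only, not benchmark results.
baseline = {"time": 100.0, "rounds": 101.0, "amount": 50.0}
method = {"time": 50.0, "rounds": 51.0, "amount": 100.0}
print(efficiency_score(baseline, baseline))  # the baseline always scores 1.0
print(efficiency_score(method, baseline))
```

In this made-up example, the method halves the time and the number of rounds (3 points each) but doubles the communication amount, so that sub-metric falls into the "worse" branch and the overall score is the mean of the three sub-scores.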