Visual Information Extraction

SOTAs

This page serves as a compilation of the performance metrics achieved by various visual information extraction algorithms on public benchmarks. The data presented here are collected from research papers as well as official code repositories.

🎖️Commonly Used Metrics

F1-score

Given the predictions of the model and the ground-truths, a prediction whose content is exactly consistent with the ground-truth is recorded as a true positive (TP) sample. Let $N_p$ denote the number of predictions, $N_g$ the number of ground-truths, and $N_t$ the number of TP samples; then we have

$$ precision = \frac{N_t}{N_p} $$

$$ recall = \frac{N_t}{N_g} $$

$$ F1 = \frac{2 \times precision \times recall}{precision + recall} $$
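
In code, the computation is only a few lines. A minimal sketch, assuming the three counts have already been accumulated over the whole test set:

```python
def f1_score(num_tp: int, num_pred: int, num_gt: int) -> float:
    """Compute F1 from the counts of TP samples, predictions, and ground-truths."""
    precision = num_tp / num_pred if num_pred else 0.0
    recall = num_tp / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```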

Entity F1 score

The Entity F1 score is a metric used for the Entity Extraction task (also known as Semantic Entity Recognition, SER). It measures the accuracy of the predicted string and its corresponding category with respect to the ground-truth. When both the predicted string and category match the ground-truth, it is considered a TP sample.

If you are using BIO-tagging models, such as LayoutLM, LiLT, etc., you can utilize the seqeval library for metric calculation.
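
A minimal usage sketch with seqeval; the BIO tag sequences below are made up for illustration and should be replaced with your model's token-level outputs:

```python
from seqeval.metrics import classification_report, f1_score

# One list of BIO tags per document (hypothetical example).
y_true = [["B-QUESTION", "I-QUESTION", "B-ANSWER", "O", "B-HEADER"]]
y_pred = [["B-QUESTION", "I-QUESTION", "B-ANSWER", "O", "O"]]

print(f1_score(y_true, y_pred))               # entity-level micro F1
print(classification_report(y_true, y_pred))  # per-category precision/recall/F1
```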

Linking F1 score

Used as the metric for the Entity Linking (or Relation Extraction, RE) task. The model takes the ground-truth entities from Entity Extraction as input and predicts the links between them. A link is counted as a TP if and only if the predicted pair exists among the ground-truth pairs.

Pair F1 score

Used as the metric for the end-to-end Pair Extraction task. A prediction is counted as a TP if and only if the predicted key-value pair exactly matches a ground-truth pair.
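
Both the Linking F1 and the Pair F1 score reduce to exact matching over sets of tuples: entity-index pairs for linking, and key-value string pairs for end-to-end pair extraction. A minimal sketch, assuming predictions and ground-truths are already collected as sets:

```python
def set_f1(pred: set, gold: set) -> float:
    """Exact-match F1 over sets of (head, tail) links or (key, value) pairs."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Linking F1: pairs of entity indices.
print(set_f1({(0, 1), (2, 3)}, {(0, 1), (2, 4)}))  # 0.5
# Pair F1: (key, value) string pairs.
print(set_f1({("total", "46,000")}, {("total", "46,000")}))  # 1.0
```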

QA F1 score

This metric is used specifically for LLM-based models, which are evaluated in a question-answering manner.

For Entity Extraction, two types of operations are employed:

  1. The model takes the text content of an entity as input and predicts its corresponding key category. Used for datasets like FUNSD, where each key category contains multiple entities.
  2. The model takes the key category name as input and predicts the corresponding text content. Used for datasets where each key category contains one or no entity.

For Entity Linking, the model takes the question entity as input, then predicts the corresponding answer entity.

⚠️ It is worth noting that the QA F1 score is a more relaxed metric than the conventional settings, since prior information such as the entity span is provided to the model. Therefore, scores obtained through the QA pipeline cannot be directly compared with scores obtained under the conventional settings. In the following tables, these QA scores are listed separately.

Edit Distance Score

The edit distance score between the prediction string and the ground-truth string of a key category is calculated as follows

$$ score = 1 - \frac{i + d + m}{N} $$

where $i$, $d$, $m$, and $N$ denote the number of insertions, the number of deletions, the number of modifications (substitutions), and the total number of instances occurring in the ground truth, respectively.

The document parsing task in CORD employs a tree-level variant of this metric (TED accuracy, see the CORD section below); the zhang-shasha library can be used to calculate the tree edit distance.
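
As a reference, here is a minimal sketch of the string-level score for a single key category, using a plain Levenshtein distance (unit cost for insertions, deletions, and substitutions) and assuming $N$ is the length of the ground-truth string:

```python
def levenshtein(pred: str, gt: str) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    dp = list(range(len(gt) + 1))
    for i, p in enumerate(pred, 1):
        prev, dp[0] = dp[0], i
        for j, g in enumerate(gt, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (p != g))  # substitution
    return dp[-1]

def edit_distance_score(pred: str, gt: str) -> float:
    """score = 1 - (i + d + m) / N; some implementations clamp negative values to 0."""
    return 1 - levenshtein(pred, gt) / max(len(gt), 1)
```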


🗒️List of Benchmarks


SROIE

The SROIE dataset takes the entity micro F1-score as the evaluation metric. The dataset contains four key categories, each containing one or no entity. Under this metric, if the predicted string of a key category is consistent with the ground-truth string, it is recorded as a true positive (TP) sample. The total numbers of TP samples, predictions, and ground-truth strings over all categories are then used to compute the micro F1-score.

You can find the evaluation scripts for the SROIE dataset on the ICDAR2019 SROIE official page (Download tab, Task 3 Evaluation script).

| Type | Approach | Precision | Recall | F1 | QA F1 |
| --- | --- | --- | --- | --- | --- |
| Grid-based | ViBERTgrid (BERT-base) | - | - | 96.25 | - |
| | ViBERTgrid (RoBERTa-base) | - | - | 96.40 | - |
| GNN-based | PICK | - | - | 96.12 | - |
| | MatchVIE | - | - | 96.57 | - |
| | GraphDoc | - | - | 98.45 | - |
| | FormNetV2 | - | - | 98.31 | - |
| Large Scale Pre-trained | LayoutLM (base) | 94.38 | 94.38 | 94.38 | - |
| | LayoutLM (large) | 95.24 | 95.24 | 95.24 | - |
| | LayoutLMv2 (base) | 96.25 | 96.25 | 96.25 | - |
| | LayoutLMv2 (large) | 99.04 | 96.61 | 97.81 | - |
| | TILT (base) | - | - | 97.65 | - |
| | TILT (large) | - | - | 98.10 | - |
| | BROS (base) | - | - | 95.91 | - |
| | BROS (large) | - | - | 96.62 | - |
| | StrucTexT (eng-base) | - | - | 96.88 | - |
| | StrucTexT (chn&eng-base) | - | - | 98.27 | - |
| | StrucTexT (chn&eng-large) | - | - | 98.70 | - |
| | WUKONG-READER (base) | - | - | 96.88 | - |
| | WUKONG-READER (large) | - | - | 98.15 | - |
| | ERNIE-layout (large) | - | - | 97.55 | - |
| | QGN | - | - | 97.90 | - |
| | LayoutMask (base) | - | - | 96.87 | - |
| | LayoutMask (large) | - | - | 97.27 | - |
| | HGALayoutLM (base) | 99.58 | 99.48 | 99.53 | - |
| | HGALayoutLM (large) | 99.69 | 99.53 | 99.61 | - |
| End-to-End | TRIE (ground-truth) | - | - | 96.18 | - |
| | TRIE (end-to-end) | - | - | 82.06 | - |
| | VIES (ground-truth) | - | - | 96.12 | - |
| | VIES (end-to-end) | - | - | 91.07 | - |
| | Kuang CFAM (end-to-end) | - | - | 85.87 | - |
| | OmniParser | - | - | 85.60 | - |
| | HIP | - | - | 87.60 | - |
| LLM-based | HRVDA | - | - | - | 91.00 |
| | Monkey | - | - | - | 41.90 |
| | TextMonkey | - | - | - | 47.00 |
| | MiniMonkey | - | - | - | 70.30 |
| | UniDoc (224) | - | - | - | 1.40 |
| | UniDoc (336) | - | - | - | 2.92 |
| | DocPedia (224) | - | - | - | 17.01 |
| | DocPedia (336) | - | - | - | 21.44 |
| | LayoutLLM (Llama2-7B-chat) | - | - | - | 70.97 |
| | LayoutLLM (Vicuna-1.5-7B) | - | - | - | 72.12 |
| Other Methods | TCPN (TextLattice) | - | - | 96.54 | - |
| | TCPN (Tag, ground-truth) | - | - | 95.46 | - |
| | TCPN (Tag, end-to-end) | - | - | 91.21 | - |
| | TCPN (Tag&Copy, end-to-end) | - | - | 91.93 | - |

CORD

The authors of the CORD dataset, the Clova-AI team, have not explicitly specified the task type and evaluation metrics for this dataset. However, upon reviewing the source code of Donut, one of Clova-AI's works, it is apparent that they evaluate the model's performance in Document Structure Parsing. In a typical receipt, various details about the purchased items are provided, such as their names, quantities, and unit prices. These entities have a hierarchical relationship, and a receipt can be represented by a JSON-like structure as shown below:

```json
{
    "menu": [
        {
            "nm": "EGG TART",
            "cnt": "1",
            "price": "13,000"
        },
        {
            "nm": "CHOCO CUS ARD PASTRY",
            "cnt": "2",
            "price": "24,000"
        },
        {
            "nm": "REDBEAN BREAD",
            "cnt": "1",
            "price": "9,000"
        }
    ],
    "total": {
        "total_price": "46,000",
        "cashprice": "50,000",
        "changeprice": "4,000"
    }
}
```

The evaluation metric used by Donut is the TED Acc (Tree Edit Distance Accuracy), which measures the similarity between the predicted JSON and the ground-truth.
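
As an illustration, tree edit distance can be computed with the zss (Zhang-Shasha) library; the tiny trees below are made up, and the exact tree construction and normalization used in Donut's evaluation code may differ:

```python
from zss import Node, simple_distance

# Hypothetical ground-truth and predicted trees built from the parsed JSON.
gt = Node("menu", [Node("nm", [Node("EGG TART")]), Node("cnt", [Node("1")])])
pred = Node("menu", [Node("nm", [Node("EGG TART")]), Node("cnt", [Node("2")])])

ted = simple_distance(gt, pred)        # number of node edit operations
norm = simple_distance(Node(""), gt)   # distance from an empty tree to the ground-truth
print(ted, max(0.0, 1 - ted / norm))   # raw distance and a normalized accuracy
```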

In addition to Document Structure Parsing, Donut also evaluates the model's performance on the Entity Extraction task, using the Entity F1 score as the evaluation metric. Most SOTA models follow this evaluation pipeline.

Another Clova-AI work, SPADE, evaluates the model's performance on Document Structure Parsing through a relaxed structured field F1-score. This evaluation measures the accuracy of dependency parsing by computing the F1 score of predicted edges. The task is simplified by ignoring differences between prediction and ground truth in certain fields (such as store name, menu name, and item name) when the edit distance is less than 2 or when the ratio of the edit distance to the ground-truth string length is at most 0.4. Details can be found in their paper (Section 5.3 and A.2).

Some other works, such as BROS, evaluate the model's performance on Entity Linking using the Linking F1 score.

EE = Entity Extraction, EL = Entity Linking, DSP = Document Structure Parsing; P = precision, R = recall.

| Type | Approach | EE P | EE R | EE F1 | EE QA F1 | EL P | EL R | EL F1 | DSP P | DSP R | DSP F1 | TED Acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GNN-based | GraphDoc | - | - | 96.93 | - | - | - | - | - | - | - | - |
| | FormNet | 98.02 | 96.55 | 97.28 | - | - | - | - | - | - | - | - |
| | FormNetV2 | - | - | 97.70 | - | - | - | - | - | - | - | - |
| Large Scale Pre-trained | LayoutLM (base) | 94.37 | 95.08 | 94.72 | - | - | - | - | - | - | - | - |
| | LayoutLM (large) | 94.32 | 95.54 | 94.93 | - | - | - | - | - | - | - | - |
| | LayoutLMv2 (base) | 94.53 | 95.39 | 94.95 | - | - | - | - | - | - | - | - |
| | LayoutLMv2 (large) | 95.65 | 96.37 | 96.01 | - | - | - | - | - | - | - | - |
| | LayoutLMv3 (base) | - | - | 96.56 | - | - | - | - | - | - | - | - |
| | LayoutLMv3 (large) | - | - | 97.46 | - | - | - | - | - | - | - | - |
| | DocFormer (base) | 96.52 | 96.14 | 96.33 | - | - | - | - | - | - | - | - |
| | DocFormer (large) | 97.25 | 96.74 | 96.99 | - | - | - | - | - | - | - | - |
| | TILT (base) | - | - | 95.11 | - | - | - | - | - | - | - | - |
| | TILT (large) | - | - | 96.33 | - | - | - | - | - | - | - | - |
| | BROS (base) | - | - | 96.50 | - | - | - | 95.73 | - | - | - | - |
| | BROS (large) | - | - | 97.28 | - | - | - | 97.40 | - | - | - | - |
| | UDoc | - | - | 96.64 | - | - | - | - | - | - | - | - |
| | UDoc* | - | - | 96.86 | - | - | - | - | - | - | - | - |
| | LiLT ([EN-RoBERTa]base) | - | - | 96.07 | - | - | - | - | - | - | - | - |
| | LiLT ([InfoXLM]base) | - | - | 95.77 | - | - | - | - | - | - | - | - |
| | DocReL | - | - | 97.00 | - | - | - | - | - | - | - | - |
| | WUKONG-READER (base) | - | - | 96.54 | - | - | - | - | - | - | - | - |
| | WUKONG-READER (large) | - | - | 97.27 | - | - | - | - | - | - | - | - |
| | ERNIE-layout (large) | - | - | 96.99 | - | - | - | - | - | - | - | - |
| | QGN | - | - | 96.84 | - | - | - | - | - | - | - | - |
| | GeoLayoutLM | - | - | 97.97 | - | - | - | 99.45 | - | - | - | - |
| | GraphLayoutLM (base) | - | - | 97.28 | - | - | - | - | - | - | - | - |
| | GraphLayoutLM (large) | - | - | 97.75 | - | - | - | - | - | - | - | - |
| | HGALayoutLM (base) | 97.89 | 97.16 | 97.52 | - | - | - | - | - | - | - | - |
| | HGALayoutLM (large) | 97.97 | 97.38 | 97.67 | - | - | - | - | - | - | - | - |
| | DocFormerv2 (base) | 97.51 | 96.10 | 96.80 | - | - | - | - | - | - | - | - |
| | DocFormerv2 (large) | 97.71 | 97.70 | 97.70 | - | - | - | - | - | - | - | - |
| | DocTr | - | - | 98.20 | - | - | - | - | - | - | 94.40 | - |
| | LayoutMask (base) | - | - | 96.99 | - | - | - | - | - | - | - | - |
| | LayoutMask (large) | - | - | 97.19 | - | - | - | - | - | - | - | - |
| End-to-End | Donut | - | - | 84.10 | - | - | - | - | - | - | - | 90.90 |
| | ESP | - | - | 95.65 | - | - | - | - | - | - | - | - |
| | UDOP | - | - | 97.58 | - | - | - | - | - | - | - | - |
| | CREPE | - | - | 85.00 | - | - | - | - | - | - | - | - |
| | OmniParser | - | - | 84.80 | - | - | - | - | - | - | - | 88.00 |
| | HIP | - | - | 85.70 | - | - | - | - | - | - | - | - |
| LLM-based | HRVDA | - | - | - | 89.30 | - | - | - | - | - | - | - |
| | LayoutLLM (Llama2-7B-chat) | - | - | - | 62.21 | - | - | - | - | - | - | - |
| | LayoutLLM (Vicuna-1.5-7B) | - | - | - | 63.10 | - | - | - | - | - | - | - |
| Other Methods | SPADE (♠ CORD, oracle input) | - | - | - | - | - | - | - | - | - | 92.50 | - |
| | SPADE (♠ CORD) | - | - | - | - | - | - | - | - | - | 88.20 | - |
| | SPADE (♠ CORD+) | - | - | - | - | - | - | - | - | - | 87.40 | - |
| | SPADE (♠ CORD++) | - | - | - | - | - | - | - | - | - | 83.10 | - |
| | SPADE (♠ w/o TCM, CORD, oracle input) | - | - | - | - | - | - | - | - | - | 91.50 | - |
| | SPADE (♠ w/o TCM, CORD) | - | - | - | - | - | - | - | - | - | 87.40 | - |
| | SPADE (♠ w/o TCM, CORD+) | - | - | - | - | - | - | - | - | - | 86.10 | - |
| | SPADE (♠ w/o TCM, CORD++) | - | - | - | - | - | - | - | - | - | 82.60 | - |

FUNSD

FUNSD comprises two tasks: Entity Extraction and Entity Linking. The Entity Extraction task requires extracting header, question, and answer entities from the document, and employs Entity F1 Score as the evaluation metric. The Entity Linking task focuses on linking predictions between question and answer entities, and uses Linking F1 Score as the evaluation metric.

It is worth noting that, in most mainstream approaches, these two subtasks are considered independent. For instance, the official Entity Linking implementation of LayoutLM takes the ground-truth question and answer entities as input and predicts the links only, without considering the performance of Entity Extraction.

Real-world applications require extracting all key-value pairs from a document, which involves combining the EE and EL tasks to predict the entire key-value pair content. We term this task End-to-End Pair Extraction. It presents challenges such as error accumulation and text segment aggregation. Regrettably, only a few studies have recognized and addressed these challenges, while the majority of research continues to follow the conventional EE+EL setting. We hope to see more studies that delve into this particular case.

EE = Entity Extraction, EL = Entity Linking, E2E = End-to-End Pair Extraction; P = precision, R = recall.

| Type | Approach | EE P | EE R | EE F1 | EE QA F1 | EL P | EL R | EL F1 | EL QA F1 | E2E P | E2E R | E2E F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grid-based | MSAU-PAF | - | - | 83.00 | - | - | - | - | - | - | - | 75.00 |
| GNN-based | GraphDoc | - | - | 87.77 | - | - | - | - | - | - | - | - |
| | MatchVIE | - | - | 81.33 | - | - | - | - | - | - | - | - |
| | FormNet | - | - | 84.69 | - | - | - | - | - | - | - | - |
| | FormNetV2 | - | - | 92.51 | - | - | - | - | - | - | - | - |
| Large Scale Pre-trained | LayoutLM (base) | 75.97 | 81.55 | 78.66 | - | - | - | - | - | - | - | - |
| | LayoutLM (large) | 75.96 | 82.19 | 78.95 | - | - | - | - | - | - | - | - |
| | LayoutLMv2 (base) | 80.29 | 85.39 | 82.76 | - | - | - | - | - | - | - | - |
| | LayoutLMv2 (large) | 83.24 | 85.19 | 84.20 | - | - | - | - | - | - | - | - |
| | LayoutXLM (base, Language Specific Fine-tuning) | - | - | 79.40 | - | - | - | 54.83 | - | - | - | - |
| | LayoutXLM (large, Language Specific Fine-tuning) | - | - | 82.25 | - | - | - | 64.04 | - | - | - | - |
| | LayoutXLM (base, Multitask Fine-tuning) | - | - | 79.24 | - | - | - | 66.71 | - | - | - | - |
| | LayoutXLM (large, Multitask Fine-tuning) | - | - | 80.68 | - | - | - | 76.83 | - | - | - | - |
| | LayoutLMv3 (base) | - | - | 90.29 | - | - | - | - | - | - | - | - |
| | LayoutLMv3 (large) | - | - | 92.08 | - | - | - | - | - | - | - | - |
| | XYLayoutLM | - | - | 83.35 | - | - | - | - | - | - | - | - |
| | SelfDoc | - | - | 83.36 | - | - | - | - | - | - | - | - |
| | DocFormer (base) | 80.76 | 86.09 | 83.34 | - | - | - | - | - | - | - | - |
| | DocFormer (large) | 82.29 | 86.94 | 84.55 | - | - | - | - | - | - | - | - |
| | StructuralLM-large | 83.52 | - | 85.14 | - | - | - | - | - | - | - | - |
| | BROS (base) | 81.16 | 85.02 | 83.05 | - | - | - | 71.46 | - | - | - | - |
| | BROS (large) | 82.81 | 86.31 | 84.52 | - | - | - | 77.01 | - | - | - | - |
| | StrucTexT (eng-base) | - | - | 83.09 | - | - | - | 44.10 | - | - | - | - |
| | StrucTexT (chn&eng-base) | - | - | 84.83 | - | - | - | 70.45 | - | - | - | - |
| | StrucTexT (chn&eng-large) | - | - | 87.56 | - | - | - | 74.21 | - | - | - | - |
| | UDoc | - | - | 87.96 | - | - | - | - | - | - | - | - |
| | UDoc* | - | - | 87.93 | - | - | - | - | - | - | - | - |
| | LiLT ([EN-RoBERTa]base) | 87.21 | 89.65 | 88.41 | - | - | - | - | - | - | - | - |
| | LiLT ([InfoXLM]base) | 84.67 | 87.09 | 85.86 | - | - | - | - | - | - | - | - |
| | LiLT ([InfoXLM]base, Language Specific Fine-tuning) | - | - | 84.15 | - | - | - | 62.76 | - | - | - | - |
| | LiLT ([InfoXLM]base, Multitask Fine-tuning) | - | - | 85.74 | - | - | - | 74.07 | - | - | - | - |
| | DocReL | - | - | - | - | - | - | 46.10 | - | - | - | - |
| | WUKONG-READER (base) | - | - | 91.52 | - | - | - | - | - | - | - | - |
| | WUKONG-READER (large) | - | - | 93.62 | - | - | - | - | - | - | - | - |
| | ERNIE-layout (large) | - | - | 93.12 | - | - | - | - | - | - | - | - |
| | GeoLayoutLM | - | - | 92.86 | - | - | - | 89.45 | - | - | - | - |
| | KVPFormer | - | - | - | - | - | - | 90.86 | - | - | - | - |
| | GraphLayoutLM (base) | - | - | 93.15 | - | - | - | - | - | - | - | - |
| | GraphLayoutLM (large) | - | - | 94.39 | - | - | - | - | - | - | - | - |
| | HGALayoutLM (base) | 94.84 | 93.80 | 94.32 | - | - | - | - | - | - | - | - |
| | HGALayoutLM (large) | 95.67 | 94.95 | 95.31 | - | - | - | - | - | - | - | - |
| | DocFormerv2 (base) | 89.15 | 87.60 | 88.37 | - | - | - | - | - | - | - | - |
| | DocFormerv2 (large) | 89.88 | 87.92 | 88.89 | - | - | - | - | - | - | - | - |
| | DocTr | - | - | 84.00 | - | - | - | 73.90 | - | - | - | - |
| | LayoutMask (base) | - | - | 92.91 | - | - | - | - | - | - | - | - |
| | LayoutMask (large) | - | - | 93.20 | - | - | - | - | - | - | - | - |
| End-to-End | ESP | - | - | 91.12 | - | - | - | 88.88 | - | - | - | - |
| | UDOP | - | - | 91.62 | - | - | - | - | - | - | - | - |
| | HIP | - | - | 52.00 | - | - | - | - | - | - | - | - |
| LLM-based | Monkey | - | - | - | - | - | - | - | 24.10 | - | - | - |
| | TextMonkey | - | - | - | - | - | - | - | 32.30 | - | - | - |
| | MiniMonkey | - | - | - | - | - | - | - | 42.90 | - | - | - |
| | UniDoc (224) | - | - | - | - | - | - | - | 1.19 | - | - | - |
| | UniDoc (336) | - | - | - | - | - | - | - | 1.02 | - | - | - |
| | DocPedia (224) | - | - | - | - | - | - | - | 18.75 | - | - | - |
| | DocPedia (336) | - | - | - | - | - | - | - | 29.86 | - | - | - |
| | LayoutLLM (Llama2-7B-chat) | - | - | - | - | - | - | - | 78.65 | - | - | - |
| | LayoutLLM (Vicuna-1.5-7B) | - | - | - | - | - | - | - | 79.98 | - | - | - |
| Other Methods | SPADE | - | - | 71.60 | - | - | - | 41.30 | - | - | - | - |

XFUND

XFUND is a multilingual extension of FUNSD, covering 7 languages: Chinese, Japanese, Spanish, French, Italian, German, and Portuguese. It contains 1,393 fully annotated forms, with 199 forms per language: 149 for training and 50 for testing. XFUND also includes two subtasks, Entity Extraction and Entity Linking, and follows the same evaluation protocol as FUNSD.

Note: In the following table, Avg. represents the average score over the 7 non-English subsets. Some methods include the English subset in their reported average scores; to ensure a fair comparison, we adjusted these averages accordingly.

EE = Entity Extraction, EL = Entity Linking.

| Type | Approach | EE ZH | EE JA | EE ES | EE FR | EE IT | EE DE | EE PT | EE Avg. | EL ZH | EL JA | EL ES | EL FR | EL IT | EL DE | EL PT | EL Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Large Scale Pre-trained | LayoutXLM (base, Language Specific Fine-tuning) | 89.24 | 79.21 | 75.50 | 79.02 | 80.02 | 82.22 | 79.03 | 82.40 | 70.73 | 69.63 | 68.96 | 63.53 | 64.15 | 65.51 | 57.18 | 65.67 |
| | LayoutXLM (large, Language Specific Fine-tuning) | 91.61 | 80.33 | 78.30 | 80.98 | 82.75 | 83.61 | 82.73 | 82.90 | 78.88 | 72.25 | 76.66 | 71.02 | 76.91 | 68.43 | 67.96 | 73.16 |
| | LayoutXLM (base, Zero-shot transfer) | 60.19 | 47.15 | 45.65 | 57.57 | 48.46 | 52.52 | 53.90 | 52.21 | 44.94 | 44.08 | 47.08 | 44.16 | 40.90 | 38.20 | 36.85 | 42.31 |
| | LayoutXLM (large, Zero-shot transfer) | 68.96 | 51.90 | 49.76 | 61.35 | 55.17 | 59.05 | 60.77 | 58.14 | 55.31 | 56.96 | 57.80 | 56.15 | 51.84 | 48.90 | 47.95 | 53.56 |
| | LayoutXLM (base, Multitask Fine-tuning) | 89.73 | 79.64 | 77.98 | 81.73 | 82.10 | 83.22 | 82.41 | 82.40 | 82.41 | 81.42 | 81.04 | 82.21 | 83.10 | 78.54 | 70.44 | 79.88 |
| | LayoutXLM (large, Multitask Fine-tuning) | 91.55 | 82.16 | 80.55 | 83.84 | 83.72 | 85.30 | 86.50 | 84.80 | 90.00 | 86.21 | 85.92 | 86.69 | 86.75 | 82.63 | 81.60 | 85.69 |
| | XYLayoutLM | 91.76 | 80.57 | 76.87 | 79.97 | 81.75 | 83.35 | 80.01 | 82.04 | 74.45 | 70.59 | 72.59 | 65.21 | 65.72 | 67.03 | 58.98 | 67.79 |
| | LiLT ([InfoXLM]base, Language Specific Fine-tuning) | 89.38 | 79.64 | 79.11 | 79.53 | 83.76 | 82.31 | 82.20 | 82.27 | 72.97 | 70.37 | 71.95 | 69.65 | 70.43 | 65.58 | 58.74 | 68.53 |
| | LiLT ([InfoXLM]base, Zero-shot transfer) | 61.52 | 51.84 | 51.01 | 59.23 | 53.71 | 60.13 | 63.25 | 57.24 | 47.64 | 50.81 | 49.68 | 52.09 | 46.97 | 41.69 | 42.72 | 47.37 |
| | LiLT ([InfoXLM]base, Multi-task Fine-tuning) | 90.47 | 80.88 | 83.40 | 85.77 | 87.92 | 87.69 | 84.93 | 85.86 | 84.71 | 83.45 | 83.35 | 84.66 | 84.58 | 78.78 | 76.43 | 82.28 |
| | KVPFormer | - | - | - | - | - | - | - | - | 94.27 | 94.23 | 95.23 | 97.19 | 94.11 | 92.41 | 92.19 | 94.23 |
| | HGALayoutLM | 94.22 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| End-to-End | ESP (Language Specific Fine-tuning) | 90.30 | 81.10 | 85.40 | 90.50 | 88.90 | 87.20 | 87.50 | 87.30 | 90.80 | 88.30 | 85.20 | 90.90 | 90.00 | 85.20 | 86.20 | 88.10 |
| | ESP (Multitask Fine-tuning) | - | - | - | - | - | - | - | 89.13 | - | - | - | - | - | - | - | 92.31 |

EPHOIE

EPHOIE consists of 11 key categories for Entity Extraction and takes the Entity F1 as the evaluation metric. If the predicted string of a key category is consistent with the ground-truth string and not empty, it will be recorded as a TP sample.

| Type | Approach | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| Grid-based | MatchVIE | - | - | 96.87 |
| Large-Scale Pre-trained | StrucTexT (chn&eng-base) | - | - | 98.84 |
| | StrucTexT (chn&eng-large) | - | - | 99.30 |
| | LiLT ([InfoXLM]base) | 96.99 | 98.20 | 97.59 |
| | LiLT ([ZH-RoBERTa]base) | 97.62 | 98.33 | 97.97 |
| | QGN | - | - | 98.49 |
| End-to-End | VIES (ground-truth) | - | - | 95.23 |
| | VIES (end-to-end) | - | - | 83.81 |
| Other Methods | TCPN (TextLattice) | - | - | 98.06 |
| | TCPN (Copy Mode, end-to-end) | - | - | 84.67 |
| | TCPN (Tag Mode, end-to-end) | - | - | 86.19 |
| | TCPN (Tag Mode, ground-truth) | - | - | 97.59 |

DeepForm

| Type | Approach | QA F1 |
| --- | --- | --- |
| End-to-End | Donut | 61.60 |
| LLM-based | Qwen-VL | 4.10 |
| | Monkey | 40.60 |
| | mPLUG-DocOwl | 42.60 |
| | mPLUG-DocOwl 1.5 (DocOwl-1.5) | 68.80 |
| | mPLUG-DocOwl 1.5 (DocOwl-1.5 chat) | 68.80 |
| | UReader | 49.50 |

Kleister Charity

Kleister Charity (KLC) contains 8 key categories. It comprises 2,788 financial reports with 61,643 pages in total. This benchmark is commonly used by LLM-based approaches in a QA manner.

| Type | Approach | QA F1 |
| --- | --- | --- |
| End-to-End | Donut | 30.00 |
| LLM-based | Qwen-VL | 15.90 |
| | Monkey | 32.80 |
| | mPLUG-DocOwl | 30.30 |
| | mPLUG-DocOwl 1.5 (DocOwl-1.5) | 37.90 |
| | mPLUG-DocOwl 1.5 (DocOwl-1.5 chat) | 38.70 |
| | UReader | 32.80 |
| | DoCo (Qwen-VL-Chat) | 33.80 |
| | DoCo (mPLUG-Owl) | 32.90 |