Failure to reproduce Table 3 of the e2efold paper #6

Closed
kad-ecoli opened this issue Jun 30, 2020 · 6 comments


kad-ecoli commented Jun 30, 2020

I tried to run https://github.com/ml4bio/e2efold/blob/master/e2efold_productive/e2efold_productive_short.py on the ArchiveII dataset used to benchmark e2efold in Table 3 of the e2efold paper. To save time, I only ran it on the subset of 3911 target RNAs with up to 600 nucleotides, rather than the full set of 3975 RNAs. Nonetheless, I do not think excluding the remaining 64 RNAs (1.6% of 3975) would alter the conclusion of the benchmark. Even though the result of the e2efold pretrained model on this dataset is much better than what was shown in #5, it is still worse than what was reported in the paper:

| Method | F1 | MCC | Predicted base pairs per RNA |
|---|---|---|---|
| e2efold | 0.5540 | 0.5595 | 42.2986 |
| LinearFold | 0.6060 | 0.6085 | 56.8372 |

In particular, e2efold turns out to be even worse than LinearFold, a thermodynamics-based RNA folding program, although according to the original paper e2efold is supposed to outperform all state-of-the-art algorithms on this dataset. Note that, on average, each target among these 3911 RNAs has 59.1386 base pairs; e2efold is therefore clearly under-predicting base pairs.
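
For reference, here is a minimal sketch of how I compute F1 and MCC, treating every candidate base pair as a binary classification; the function and its set-based inputs are my own convention, not code from the e2efold repository:

    from math import sqrt

    def pair_metrics(pred_pairs, true_pairs, seq_len):
        """Score one RNA; pred_pairs and true_pairs are sets of (i, j) tuples, i < j."""
        tp = len(pred_pairs & true_pairs)   # predicted pairs present in the reference
        fp = len(pred_pairs - true_pairs)   # predicted pairs absent from the reference
        fn = len(true_pairs - pred_pairs)   # reference pairs that were missed
        tn = seq_len * (seq_len - 1) // 2 - tp - fp - fn  # candidate pairs absent from both
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        return f1, mcc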

I wonder whether this poor performance is caused by an inconsistency in the packaging details of e2efold_productive. To verify this, could the e2efold team kindly provide a detailed table of per-target F1 scores so that I can check the worst offenders? Thank you.

liyu95 (Member) commented Jul 1, 2020

Thank you for reproducing the Table 3 performance! Yes, e2efold_productive and experiment_archiveii are slightly different.

For your information, the detailed reproduction code for ArchiveII and Table 3 is in the folder experiment_archiveii.
As discussed in the paper, we reported the performance on those RNA types in ArchiveII that also have enough samples in the training dataset. More specifically, the performance of the following RNA types is reported:
['RNaseP', '5s', 'tmRNA', 'tRNA', 'telomerase', '16s']
We did not include grp1 because its sequence lengths in ArchiveII and RNAStralign are completely different. For SRP, we do not have enough training samples: the RNAStralign dataset contains only 468 SRP samples, while ArchiveII has many more (928).

You can run main.sh in the folder experiment_archiveii to reproduce the performance in Table 3.

If you want to go into the details, check e2e_learning_stage3.py and e2e_learning_stage3_rnastralign_all_long.py in the same folder.

kad-ecoli (Author) commented Jul 1, 2020

Can you give the list of ArchiveII targets (i.e., RNA names, not just the types) on which Table 3 was calculated?

liyu95 (Member) commented Jul 1, 2020

Thank you very much for your interest!

Everything (name, sequence length, performance) is stored by the following code (lines 228–244 of e2e_learning_stage3.py):

    # Collect per-target names, lengths, and precision/recall/F1
    # (exact and one-position-shift tolerant) into one dataframe.
    e2e_result_df = pd.DataFrame()
    e2e_result_df['name'] = [a.name for a in test_data.data]
    e2e_result_df['type'] = list(map(lambda x: x.split('_')[0], [a.name for a in test_data.data]))
    e2e_result_df['seq_lens'] = list(map(lambda x: x.numpy(), seq_lens_list))
    e2e_result_df['exact_p'] = pp_exact_p
    e2e_result_df['exact_r'] = pp_exact_r
    e2e_result_df['exact_f1'] = pp_exact_f1
    e2e_result_df['shift_p'] = pp_shift_p
    e2e_result_df['shift_r'] = pp_shift_r
    e2e_result_df['shift_f1'] = pp_shift_f1
    # Keep only the six RNA types reported in Table 3.
    final_result = e2e_result_df[e2e_result_df['type'].isin(
        ['RNaseP', '5s', 'tmRNA', 'tRNA', 'telomerase', '16s'])]
    # Average each metric over the retained targets and print.
    to_output = list(map(str,
        list(final_result[['exact_p', 'exact_r', 'exact_f1', 'shift_p', 'shift_r', 'shift_f1']].mean().values.round(3))))
    print('Number of sequences: ', len(final_result))
    print(to_output)
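
To export the per-target table requested above, you could add one line after the filtering step (this line is not in the repository, and the filename is just an example):

    # hypothetical addition: dump the per-target metrics for offline inspection
    final_result.to_csv('archiveii_per_target_f1.csv', index=False)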

Could you please take a look?

kad-ecoli (Author) commented Jul 1, 2020

I see. So the paper actually only tests on a subset of 2877 (72%) RNAs from the full set of 3975 ArchiveII RNAs. This was not clear in the paper. On this subset, e2efold_productive_short.py (or e2efold_productive_long.py for targets with more than 600 nucleotides; see the small dispatch sketch after the table) indeed outperforms LinearFold. However, the performance is still not as impressive as that reported in the paper: F1 is below 0.7 for e2efold_productive, even though the paper reports F1=0.821.

| Method | F1 | MCC | Predicted base pairs per RNA |
|---|---|---|---|
| e2efold | 0.6894 | 0.6943 | 36.5262 |
| LinearFold | 0.6186 | 0.6216 | 50.6879 |
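
For completeness, a tiny sketch of the length-based script selection mentioned above; the helper function is my own, and only the 600-nucleotide cutoff and the two script paths come from this thread:

    def productive_script(seq_len):
        # choose the productive script by the 600-nt cutoff used above
        if seq_len <= 600:
            return 'e2efold_productive/e2efold_productive_short.py'
        return 'e2efold_productive/e2efold_productive_long.py'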

liyu95 (Member) commented Jul 1, 2020

Thank you for running the code and reproducing the results!
In the section "Test On ArchiveII Without Re-training" of our paper, we made it clear that we tested on the overlapping RNA families in ArchiveII, not on the entire set. In Table 3, we also reported the performance as F1=0.686, which you have now reproduced. We have never claimed that we can reach F1>0.8 on the ArchiveII dataset.
Could you please clarify what your concern is?

kad-ecoli (Author) commented Jul 1, 2020

Sorry, I mixed up Table 2 and Table 3. You are correct that e2efold should have F1=0.69 on this dataset.

The main reason for my misunderstanding was that the original text says "We then test the model on sequences in ArchiveII that have overlapping RNA types (5SrRNA, 16SrRNA, etc) with the RNAStralign dataset", which apparently should have included the "SRP" RNA type, shared by both datasets with hundreds of RNAs.

In fact, "SRP" was excluded from Table 3. This exclusion made e2efold appear better than the state of the art.
