CAT训练数据集的问题 | Issue with datasets utilized in CAT-Protocol #65
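Hello, and thank you for your interest in our work. You are right: for CASIAv2 we used only the tampered images and no authentic images, which may have caused some ambiguity. The results reported in the paper do not include the authentic CASIAv2 images, but because of the 1800-sample strategy the difference should not be large. We will clarify this point in later versions.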
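Thank you for the reply. Was the FantasticReality dataset also used with only its tampered images? If possible, I would like to know how many images each dataset's json file uses. Could you consider open-sourcing the json file for each dataset? That would help more people train under the same standard.
If some requirement prevents you from releasing the json files, could you tell me whether FantasticReality was sampled for training from the tampered images only, or whether 1800 images were sampled from all authentic and tampered images? I would like to unify all the standards so that I can compare my own model and reproduce related models. Finally, sincere thanks for your contributions to the field of image manipulation detection.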
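The two options raised in this question can be made concrete with a minimal sketch. It is not the Benco or CAT-Net implementation: the record layout (`name`, `is_tampered`), the helper names, the toy dataset sizes, and the choice of sampling without replacement are all assumptions made for illustration. Only the figure of 1800 samples comes from the thread.

```python
import random


def build_pool(records, tampered_only):
    """Return the candidate pool: tampered images only, or every image."""
    if tampered_only:
        return [r for r in records if r["is_tampered"]]
    return list(records)


def sample_for_epoch(records, n=1800, seed=None):
    """Draw up to n records without replacement (assumed sampling scheme)."""
    rng = random.Random(seed)
    records = list(records)
    if len(records) <= n:
        return records
    return rng.sample(records, n)


# Toy data; the counts are illustrative, not the real dataset sizes.
records = [{"name": f"tp_{i}", "is_tampered": True} for i in range(3000)]
records += [{"name": f"au_{i}", "is_tampered": False} for i in range(3000)]

# Option A: sample 1800 from the tampered images only.
option_a = sample_for_epoch(build_pool(records, tampered_only=True), seed=0)

# Option B: sample 1800 from authentic and tampered images together.
option_b = sample_for_epoch(build_pool(records, tampered_only=False), seed=0)
```

Under option A every sampled image is tampered. Under option B the share of authentic images in the sample follows their share in the pool, which is why the two settings need to be stated explicitly when comparing models.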
Hello, over the past few days we carefully reviewed the training scripts used in the Benco paper and re-trained to verify the checkpoint results. We identified some issues with the CAT-Net protocol and are making a statement here. We apologize for any inconvenience caused.
For the above three issues, we will emphasize and clarify them in updates to the GitHub homepage and the arXiv version. We sincerely apologize for the inconvenience and confusion caused to all researchers.
In particular, after careful review we can confirm that all models and ablation experiments under the CAT-Net protocol were conducted under the above-mentioned [unified standards], so the results reported in the paper remain fair and meaningful for reference.
Additionally, the IMDLBenco codebase aims to be a PyTorch package that makes it easy, accurate, efficient, and fast to reproduce previous models and develop one's own. This purpose remains unchanged, and we are clarifying these details to ensure that everything aligns with our original intent.
No one is perfect. We should stand firm and learn from criticism. Open source itself is intended to provide a channel for community oversight and error correction. We hope the community will understand, and that future research will also embrace the spirit of open source, supervision, and iteration.
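In addition, we will include all of these json files in a separate area of the repository for reference. Because they all use absolute paths, we do not currently plan to deploy them through the Benco code. You can, however, use the repository to inspect the files listed in each json and confirm that every protocol is fully consistent, unified, and standard with respect to what we report in the paper.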
Best wishes, and feel free to discuss if new questions come up!
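For readers who want to check such json files themselves, the following is a minimal sketch of how the image counts could be summarized and the absolute paths rebased onto a local dataset root. It assumes each file is a list of `[image_path, mask_path]` pairs in which the mask entry is the string `"Negative"` for an authentic image. That layout is an assumption here and should be checked against the actual files. The function names and example paths are hypothetical.

```python
import json
from collections import Counter
from pathlib import PurePosixPath

NEGATIVE = "Negative"  # assumed marker for an authentic (untampered) image


def load_pairs(json_path):
    """Load a json file assumed to hold [image_path, mask_path] pairs."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_pairs(pairs):
    """Count tampered and authentic entries in a list of pairs."""
    counts = Counter()
    for _image_path, mask_path in pairs:
        counts["authentic" if mask_path == NEGATIVE else "tampered"] += 1
    return counts


def rebase_path(path, old_root, new_root):
    """Swap an absolute path prefix; leave unrelated paths and the marker alone."""
    if path == NEGATIVE:
        return path
    try:
        relative = PurePosixPath(path).relative_to(old_root)
    except ValueError:
        return path  # path is not under old_root
    return str(PurePosixPath(new_root) / relative)


def rebase_pairs(pairs, old_root, new_root):
    """Apply rebase_path to both entries of every pair."""
    return [
        [rebase_path(img, old_root, new_root), rebase_path(mask, old_root, new_root)]
        for img, mask in pairs
    ]


# Hypothetical example entries, not taken from the released files.
example = [
    ["/old/root/img/a.jpg", "/old/root/mask/a.png"],
    ["/old/root/img/b.jpg", NEGATIVE],
]
summary = summarize_pairs(example)
local = rebase_pairs(example, "/old/root", "/data/local")
```

Rebasing by prefix keeps the relative layout of each dataset intact, so the same json can be reused on another machine once the new root is known.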
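Thank you very much for your reply. Among all the manipulation detection repositories I have come across so far, I have never seen one with such a detailed training process, or one where every question is answered and clarified so quickly and thoroughly. I will close this issue once the json files are open-sourced. Finally, I sincerely wish you ever greater success in your research!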
Hi, please check the samples in each json file we used for each dataset here. Questions and discussion are welcome!
Because the files are large, they are not suitable for storing in the GitHub repository. Instead, these sample json files are stored on Google Drive: https://drive.google.com/drive/folders/1EQJT9rkJWbDaoVUqHceIwHzBAF4a3jCm?usp=sharing
Thank you for sharing!
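Hello, I am trying to reproduce your dataset setup but have run into a problem. Under the CAT protocol, is CASIA2.0 sampled from only the 5123 tampered images? I ask because the format you provided is as follows: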
[
[
"ManiDataset",
"/mnt/data0/public_datasets/IML/CASIA2.0"
],
[
"JsonDataset",
"/mnt/data0/public_datasets/IML/FantasticReality_v1/FantasticReality.json"
],
whereas the datasets in CAT-Net are organized as follows:
class SplicingDataset(Dataset):
    def __init__(self, crop_size, grid_crop, blocks=('RGB',), mode="train", DCT_channels=3, read_from_jpeg=False, class_weight=None):
        self.dataset_list = []
        if mode == "train":
            self.dataset_list.append(FantasticReality(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/FR_train_list.txt"))
            self.dataset_list.append(FantasticReality(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/FR_auth_train_list.txt", is_auth_list=True))
            self.dataset_list.append(IMD2020(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/IMD_train_list.txt", read_from_jpeg=read_from_jpeg))
            self.dataset_list.append(CASIA(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/CASIA_v2_train_list.txt", read_from_jpeg=read_from_jpeg))
            self.dataset_list.append(CASIA(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/CASIA_v2_auth_train_list.txt", read_from_jpeg=read_from_jpeg))
            # self.dataset_list.append(tampCOCO(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/cm_COCO_train_list.txt"))
            # self.dataset_list.append(tampCOCO(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/sp_COCO_train_list.txt"))
            # self.dataset_list.append(tampCOCO(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/bcm_COCO_train_list.txt"))
            # self.dataset_list.append(tampCOCO(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/bcmc_COCO_train_list.txt"))
            # self.dataset_list.append(compRAISE(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/compRAISE_train.txt"))
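CAT-Net samples authentic and tampered images separately and organizes 10 categories in total, which leaves me a little confused. I hope you can help clarify this. Thank you!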
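One way to check the 5123 figure locally is to count the images in the dataset folder. The sketch below assumes the folder passed to `ManiDataset` contains a `Tp` subfolder of tampered images and a `Gt` subfolder of masks, and that any authentic images sit in a separate `Au` subfolder. These folder names, the suffix list, and the function names are assumptions for illustration and should be checked against the local copy of the dataset.

```python
from pathlib import Path

# Assumed set of image file extensions; extend it if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def count_images(folder):
    """Count image files directly inside folder; a missing folder counts as 0."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(
        1
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def summarize_dataset_folder(root, subfolders=("Tp", "Gt", "Au")):
    """Return image counts per assumed subfolder of a dataset root."""
    root = Path(root)
    return {name: count_images(root / name) for name in subfolders}
```

For example, `summarize_dataset_folder("/mnt/data0/public_datasets/IML/CASIA2.0")` would report how many tampered images, masks, and authentic images are present, which can then be compared with the count quoted in the question.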