Jeffrey Ede edited this page Sep 18, 2020 · 33 revisions

Overview

Electron microscopy datasets are available from a combination of Zenodo and Google Drive storage (mirror 1). They're also available from a publicly accessible University of Warwick dataserver (mirror 2). TEM and STEM Images/Crops datasets were collected by dozens of Warwick scientists working on hundreds of projects and therefore have a diverse constitution. Wavefunctions are for atom columns.

A preprint and a paper provide dataset details and visualizations. Datasets are in the public domain and can be used without restriction. Most datasets are large (100+ GB), so downloads may take a couple of hours or more, depending on your internet connection. In addition, if many users have recently downloaded a dataset from mirror 1, you might get an error saying "download quota exceeded for this file so you can't download at this time". To avoid this, either # to Google Drive or use mirror 2.

Exit Wavefunctions

Multiple datasets containing 98340 wavefunctions simulated with clTEM. In addition, there are 1000 experimental focal series. Wavefunctions are in 64-bit complex (320, 320) numpy array files (.npy) that can be opened with np.load(). Focal series images are in TIFF format. Featured in this preprint.

Datasets include:

  • Wavefunctions (wavefunctions_partitioned_multiple_hq): n=3, multiple materials - 27.8 GB.
  • Wavefunctions Unseen Training (wavefunctions_multiple_unseen_train_hq): n=3, multiple materials, materials in training set - 1.2 GB.
  • Wavefunctions Single (wavefunctions_single_hq): n=3, single material - 3.7 GB.
  • Wavefunctions Restricted (wavefunctions_multiple_forth_hq): n=3, multiple materials, simulation hyperparameter ranges reduced by a factor close to 1/4 - 9.1 GB.
  • Wavefunctions n=1 (wavefunctions): n=1, multiple materials. See dataset_info.txt for partitioning into training, validation and test sets. - 28.6 GB.
  • Wavefunctions n=1 Unseen Training (unseen_train): n=1, multiple materials, materials in training set - 1.1 GB.
  • Wavefunctions n=1 Single (wavefunctions_single): n=1, single material - 3.7 GB.
  • Experimental Focal Series (experimental_focal_series): 1000 experimental focal series. Series have a quadratically increasing defocus sequence; however, they are at different spatial scales - 13.7 GB.
  • CIFs (cifs): Downloaded from the COD and used for clTEM simulations - 203.9 MB.
  • URLs (url_lists): COD URLs that the CIFs were downloaded from.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)
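
As a minimal sketch of opening one of the simulated wavefunction files described above, the snippet below loads a complex (320, 320) array with np.load() and separates it into amplitude and phase. The function name `load_wavefunction` and the example file name are hypothetical and are not part of the datasets.

```python
import numpy as np


def load_wavefunction(path):
    """Load one exit wavefunction and return (amplitude, phase)."""
    psi = np.load(path)  # expected: complex array of shape (320, 320)
    if not np.iscomplexobj(psi):
        raise ValueError(f"expected a complex array, got dtype {psi.dtype}")
    amplitude = np.abs(psi)   # modulus of each pixel
    phase = np.angle(psi)     # argument of each pixel, in radians
    return amplitude, phase
```

Usage might look like `amplitude, phase = load_wavefunction("example_wavefunction.npy")`, where the file name is a placeholder for any wavefunction file in the download.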

Exit Wavefunctions 96x96

Wavefunctions downsampled to 96x96. They are in (dataset_size, 96, 96, 2) numpy array files (.npy), with 32-bit real and imaginary components, that can be opened with np.load(). Python index [...,0] is the real part, and [...,1] is the imaginary part. Training, validation, and test sets are concatenated along the batch axis (training data at low indices).

  • Wavefunctions 96x96 (wavefunctions_n=3): Bilinearly downsampled from wavefunctions_multiple_hq with antialiasing. 36324 wavefunctions: 24530 training, 3399 validation, and 8395 test. - 2.62 GB.
  • Wavefunctions 96x96 Restricted (wavefunctions_restricted_n=3): Bilinearly downsampled from wavefunctions_multiple_forth_hq with antialiasing. 11870 wavefunctions: 8002 training, 1105 validation, and 2763 test. - 855 MB.
  • Wavefunctions 96x96 Single (wavefunctions_single_n=3): Bilinearly downsampled from wavefunctions_single_hq with antialiasing. 4825 wavefunctions: 3861 training, and 964 validation. - 347 MB.

Download mirror 1
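
The layout described above (real and imaginary parts in the last axis, partitions concatenated along the batch axis) could be unpacked roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the validation-then-test order after the training data is assumed, since the page only states that training data sits at low indices.

```python
import numpy as np

# Split sizes quoted in the list above for wavefunctions_n=3.
COUNTS_N3 = {"train": 24530, "val": 3399, "test": 8395}


def to_complex(stacked):
    """Combine [..., 0] (real) and [..., 1] (imaginary) into a complex array."""
    return stacked[..., 0] + 1j * stacked[..., 1]


def split_by_counts(array, counts):
    """Slice the batch axis into consecutive partitions, in dict order.

    Assumption: partitions are stored in the order the dict lists them.
    """
    total = sum(counts.values())
    if array.shape[0] != total:
        raise ValueError(f"expected {total} examples, got {array.shape[0]}")
    parts, start = {}, 0
    for name, n in counts.items():
        parts[name] = array[start:start + n]
        start += n
    return parts
```

For example, `split_by_counts(np.load(path), COUNTS_N3)["train"]` would give the first 24530 examples, which `to_complex` then converts to a (24530, 96, 96) complex array; `path` is a placeholder for the downloaded file.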

Electron Micrographs 96x96

Size 96x96 images intended for rapid development. Images are in numpy array files (.npy) that can be opened with np.load().

  • Full TEM images downsampled to 96x96 with antialiasing. Images are in a (17266, 96, 96, 1) numpy array file (.npy). - 607 MB.
  • Full STEM images downsampled to 96x96 with antialiasing. Images are in a (19769, 96, 96, 1) numpy array file (.npy). - 695 MB.
  • 96x96 crops from full STEM images. Images are in a (19769, 96, 96, 1) numpy array file (.npy). - 695 MB.

Download mirror 1
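
For rapid experiments it may be convenient to avoid reading a whole image file into memory. The sketch below memory-maps a (N, 96, 96, 1) file with np.load(mmap_mode="r") and draws a random batch, dropping the trailing channel axis. The function name `sample_batch` is hypothetical.

```python
import numpy as np


def sample_batch(path, batch_size, rng=None):
    """Return a random (batch_size, 96, 96) batch from a (N, 96, 96, 1) file."""
    rng = np.random.default_rng() if rng is None else rng
    images = np.load(path, mmap_mode="r")  # memory-map; data stays on disk
    # Sorted indices without replacement, so reads move forward through the file.
    idx = np.sort(rng.choice(images.shape[0], size=batch_size, replace=False))
    batch = np.asarray(images[idx])  # copies only the selected images
    return batch[..., 0]             # drop the singleton channel axis
```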

STEM Full Images

Full STEM images in a variety of shapes. Featured in this paper.

Info: 159.4 GB. 16227 images.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)

STEM Crops

Non-overlapping 512x512 crops from images in the STEM full images dataset. Featured in this paper.

Info: 157.3 GB. 110933 training, 21259 validation and 28877 test set crops, totalling 161069 crops.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)
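
To illustrate what non-overlapping 512x512 cropping means, here is a minimal sketch that tiles a 2D image into such crops. It is not necessarily how the crops in this dataset were produced; in particular, discarding the leftover rows and columns at the image edges is an assumption, and the function name is hypothetical.

```python
import numpy as np


def non_overlapping_crops(image, size=512):
    """Tile a 2D image into size x size crops, discarding ragged edges."""
    rows, cols = image.shape[0] // size, image.shape[1] // size
    trimmed = image[:rows * size, :cols * size]
    # Split each axis into (tile index, offset within tile), then group tiles.
    tiles = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return tiles.reshape(rows * cols, size, size)
```

Crops are returned in row-major order, so a 1024x1536 image gives six crops, with the first three taken from the top row of tiles.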

TEM Full Images

Full TEM images. Featured in this paper.

Info: 269.8 GB. 11350 training, 2431 validation and 3486 test images, totalling 17267 images.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)

Contact

Jeffrey Ede: j.m.ede@warwick.ac.uk