Classification in Cryo-Electron Tomograms

SHREC 2021 Track

Motivation

There is a noticeable gap in knowledge about the organization of cellular life at the mesoscopic level. With the advent of direct electron detectors and the associated resolution revolution, cryo-electron tomography (cryo-ET) has the potential to bridge this gap by simultaneously visualizing the cellular architecture and the structural details of macromolecular assemblies in three dimensions. The technique offers insights into key cellular processes and opens new possibilities for rational drug design. However, biological samples are radiation sensitive, which limits the maximal resolution and signal-to-noise ratio. Innovation in computational methods remains key to deriving biological information from the tomograms.

Task

In this SHREC track, we propose the task of localizing and classifying biological particles in a cryo-electron tomogram volume. We provide physics-based simulations of cryo-electron tomograms, with annotations for all of them except the test tomogram. We hope that this will enable researchers to try out different methods, including machine learning and statistical approaches. All 3D object retrieval and 3D electron microscopy experts interested in computational cryo-ET are welcome to participate.

Dataset

To provide participants with as accurate ground truth information as possible, we have created a physics-based simulator to generate cryo-electron tomograms.

The dataset consists of 10 tomograms, each 512x512x512 voxels in size at a resolution of 1 nm/voxel. Each tomogram is packed with up to 1500 particles drawn from 13 different proteins, uniformly distributed and rotated and varying in size and structure, as well as membranes and gold fiducials.

For every tomogram except the test tomogram, we provide:

  1. Ground truth volume
  2. Ground truth tilt angle projections (from which the tomogram was reconstructed)
  3. Text file with the location and PDB ID of each particle
  4. Occupancy volumes (where each voxel contains the particle ID of the particle (w.r.t. the text file), or 0 if the voxel is not part of a particle)
  5. Class mask volumes (where each voxel contains the class ID of the particle (w.r.t. the text file), or 0 if the voxel is not part of a particle)
The dataset also includes Python code examples of data loading and a README file with more detailed information.
The full dataset also includes the evaluation script that was used for the paper results.
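
As a rough illustration of working with the ground truth text file, the sketch below parses rows into particle records. It is not the loading code shipped with the dataset. It assumes each row is whitespace-separated with the PDB ID first, followed by the X, Y and Z center coordinates, and it ignores any further columns; the README should be consulted for the actual layout. The names Particle and parse_particle_lines are hypothetical, and the sample rows are made up.

```python
from typing import Iterable, List, NamedTuple


class Particle(NamedTuple):
    pdb_id: str
    x: float
    y: float
    z: float


def parse_particle_lines(lines: Iterable[str]) -> List[Particle]:
    """Parse rows of the ASSUMED form '<PDB ID> <x> <y> <z> [extra columns]'."""
    particles = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete rows
        pdb_id, x, y, z = fields[:4]
        particles.append(Particle(pdb_id, float(x), float(y), float(z)))
    return particles


# Made-up sample rows; the second one carries extra columns that are ignored.
sample_rows = ["5mrc 120 340 87", "", "4v94 14.5 200 411 0.1 1.2 2.3"]
particles = parse_particle_lines(sample_rows)
```

To read a real file, the same function can be fed an open file handle, for example parse_particle_lines(open(path)) with path pointing at one of the provided text files.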

Important note: Unfortunately, we found a bug in the generated dataset and have had to change the evaluation process and our target schedule. The particles for the simulator are generated by loading their PDB entries and computing their electron density maps. One of the classes, protein 4V94, was generated incorrectly: it always appears as two copies next to each other, and the center of such a particle lies in the empty space between them. The reason is that the PDB upload was a mirrored structure, whereas the protein naturally occurs as a single particle. This class was not used last year, so the issue is not present in the SHREC2020 dataset. Moreover, we have re-checked all other classes and we are sure that only this class is affected. While this is not a problem for semantic segmentation approaches trained on class masks, it is a problem for approaches that use the center locations we provide in particle_locations.txt. To be fair to everyone, we have decided to remove the 4V94 protein from the evaluation completely.
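
For participants who train on the provided center locations, one way to mirror this decision is to drop the affected class before training or scoring. The snippet below is a minimal sketch, not part of the official tooling. It assumes particles are held as tuples whose first element is the PDB ID, and it compares IDs case-insensitively because the class appears on this page as both 4V94 and 4v94. The names EXCLUDED_CLASSES and drop_excluded are hypothetical, and the rows are made up.

```python
EXCLUDED_CLASSES = {"4v94"}  # class removed from the evaluation (see note above)


def drop_excluded(rows, excluded=EXCLUDED_CLASSES):
    """Return the rows whose PDB ID (first element) is not in `excluded`."""
    return [row for row in rows if row[0].lower() not in excluded]


# Made-up rows of the form (pdb_id, x, y, z).
rows = [("5mrc", 10, 20, 30), ("4V94", 40, 50, 60), ("4v94", 70, 80, 90)]
kept = drop_excluded(rows)
```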

Download SHREC2021 contest dataset
Download SHREC2021 diff dataset
Download SHREC2021 full dataset
Pre-print PDF
Registration

If you intend to participate in the track, please send us an email and mention your affiliation and co-authors.
This helps us keep track of the participants and plan accordingly. It also allows us to send you updates about the track.

Submission

We expect participants to submit their results, along with a one-page description of the method used to generate them, no later than the deadline given in the schedule. Results should be presented as a .txt file listing the found particles, in a fashion similar to the ground truth text files. Data should be formatted in 4 columns: predicted class (PDB ID), estimated center X coordinate, estimated center Y coordinate, estimated center Z coordinate.
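
The four-column layout described above could be produced along the following lines. This is only a sketch: the column order comes from the description, but the single-space separator is an assumption, and the ground truth text files remain the reference for the exact formatting. The function name format_submission is hypothetical, and the predictions are made up.

```python
def format_submission(predictions) -> str:
    """Render (pdb_id, x, y, z) tuples as one particle per line.

    Column order follows the track description; the space separator
    is an ASSUMPTION, not a documented requirement.
    """
    lines = [f"{pdb_id} {x} {y} {z}" for pdb_id, x, y, z in predictions]
    return "\n".join(lines) + "\n"


# Made-up predictions: class label followed by the estimated center.
predictions = [("5mrc", 120, 340, 87), ("5mrc", 14, 200, 411)]
submission_text = format_submission(predictions)
```

The resulting string can then be written out with open("submission.txt", "w").write(submission_text), where the file name is likewise just an example.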

Evaluation

The main goal of the track is to localize and classify biological particles in the tomogram. The performance of methods will be evaluated solely on the test tomogram: the only tomogram for which ground truth is not provided. The following metrics will be measured and compared: precision, recall, and F1 score. We intend to compare submitted results in two areas: localization (whether a particle is found or not) and classification (whether a found particle is correctly classified or not).
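
To make the localization metrics concrete, the sketch below computes precision, recall and F1 from match results. It is not the official evaluation script. It assumes that some matching step (not specified here) has already assigned each prediction either the ID of the ground truth particle it hits or None, and that a ground truth particle hit by several predictions counts as found only once, with the surplus predictions counted as false positives. The function names and example values are hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard definitions, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def localization_counts(matched_ids, n_truth: int):
    """Turn per-prediction matches into (tp, fp, fn) counts.

    matched_ids holds, for each prediction, the ID of the ground truth
    particle it was matched to, or None if it matched nothing.
    """
    found = {pid for pid in matched_ids if pid is not None}
    tp = len(found)              # each true particle counts once
    fp = len(matched_ids) - tp   # misses and duplicate hits
    fn = n_truth - tp            # true particles never found
    return tp, fp, fn


# Made-up example: 4 predictions against 5 ground truth particles.
tp, fp, fn = localization_counts([1, 2, 2, None], n_truth=5)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Classification scores can be obtained with the same precision_recall_f1 function by counting, per class, how many found particles received the correct or an incorrect label.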

Changes from 2020

We made some major updates to the simulation process. Firstly, we updated the phase contrast by applying a solvent correction in the generation of the macromolecule's electrostatic potential. We also extended the model with absorption contrast (i.e. electrons absorbed by the specimen), which forms an important contribution to the signal. The image formation process is now improved with DQE/MTF measurements of the K2 Summit and with dose-dependent Poissonian noise. The models in the dataset have more inter-model variation, with varying defocus and electron dose. Finally, we scaled the amplitudes of the simulated projections with different experimental projection images to make the models more representative of experimental data.
We have added one protein class, a ribosome (5mrc), and updated an outdated PDB structure (4d8q -> 4v94). Fiducials (gold beads) and membranes were added to provide a realistic additional challenge.

Promotional image

Slice of an experimental tomogram and a 3D visualization of averaged particles.
Image credits: Pfeffer S, Woellhaf MW, Herrmann JM, Förster F, Organization of the mitochondrial translation machinery studied in situ by cryoelectron tomography. Nature communications 6:6019 (2015)

Organizers

  • Ilja Gubins 1
  • Marten Chaillet 2
  • Gijs van der Schot 2
  • Remco C. Veltkamp 1
  • Friedrich G. Förster 2
1: Utrecht University, Department of Information and Computing Sciences
2: Utrecht University, Department of Chemistry

Contact us

Schedule

The registration and submission deadlines are in the AoE (Anywhere on Earth) time zone.

January 8 Track announcement & registration is open
January 25 Dataset release
February 8 Registration deadline
March 22 (originally March 8) Participants submission deadline
July 10 (originally March 15) Track paper submission to SHREC
July 25 (originally April 15) SHREC: First reviews done, first stage decision on acceptance or rejection
Aug 18 (originally June 30) SHREC: Final version submission
Sept 1 Publication online in the 3DOR proceedings
Sept 3 Presentation at the 3DOR workshop

Previous years: 2019 | 2020
Website last updated: Aug 31 13:00 (CEST)