docs/DAVIS-format.txt

Annotation Format:


The annotations for each frame are stored in PNG format.
Each PNG is stored in indexed (palette) mode, i.e. it has a single channel, and each pixel holds a value from 0 to 254 that indexes into a color palette attached to the PNG file.
It is important to take this into account when decoding the PNG: the decoder output should be a single-channel image of indexes, so that no remapping from RGB back to indexes is needed.
This is crucial for preserving the index of each object so that it can be matched to the correct object during evaluation.
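
The decoding point can be checked on a small synthetic file. The following is a minimal sketch, not part of the dataset code: it builds a tiny indexed PNG in memory (the pixel values and palette colors are made up for illustration), reads it back with PIL, and compares the single-channel index map with what an RGB conversion would produce.

```python
import io

import numpy as np
from PIL import Image

# Made-up 2x2 index map: background (0), two objects (1, 2), one void pixel (254).
indices = np.array([[0, 1], [2, 254]], dtype=np.uint8)

# Start from a grayscale image and attach a palette; putpalette turns an
# "L" image into an indexed "P" image. The colors here are arbitrary.
palette = [0] * 768
palette[3:6] = [128, 0, 0]   # color for index 1
palette[6:9] = [0, 128, 0]   # color for index 2
src = Image.fromarray(indices)
src.putpalette(palette)

# Round-trip through an in-memory PNG, standing in for a file on disk.
buf = io.BytesIO()
src.save(buf, format="PNG")
buf.seek(0)
img = Image.open(buf)

decoded = np.array(img)             # (H, W) array of palette indexes
rgb = np.array(img.convert("RGB"))  # (H, W, 3) array of colors, indexes lost
```

Here `img.mode` is "P" and `decoded` equals the original index map, while `rgb` only contains palette colors and would need a remap to recover the object indexes.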

Every pixel that belongs to a given object has the same value in this PNG map throughout the whole video.
Object values start at 1 for the first object and continue with 2, 3, 4, etc.
The background (not an object) has value 0.
Also note that invalid/void pixels are stored with the value 254.
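
As an illustration of this value convention, here is a minimal sketch that splits a decoded index map into one boolean mask per object. The helper name `split_objects` and the constants are hypothetical and not taken from the dataset code; the sketch only assumes the values described above (0 for background, 254 for void, 1 and up for objects).

```python
import numpy as np

BACKGROUND_ID = 0  # background value, as described in the text
VOID_ID = 254      # invalid/void value, as described in the text


def split_objects(index_map):
    """Return {object_id: boolean mask} for each object in a decoded annotation."""
    index_map = np.asarray(index_map)
    if index_map.ndim != 2:
        raise ValueError("expected a single-channel (H, W) index map")
    ids = np.unique(index_map)
    # Keep only real objects: drop the background and void values.
    object_ids = [int(i) for i in ids if i not in (BACKGROUND_ID, VOID_ID)]
    return {i: index_map == i for i in object_ids}


# Tiny synthetic frame: two objects, some background, one void pixel.
frame = np.array([[0, 1, 1],
                  [2, 2, 254],
                  [0, 0, 1]], dtype=np.uint8)
masks = split_objects(frame)
```

Because object values are constant throughout a video, the same key in `masks` refers to the same object in every frame.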


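The annotations can be read with PIL like this:

import numpy as np
import PIL.Image as Image
# Indexed PNGs open in mode "P"; np.array then yields the (H, W) index map.
img = np.array(Image.open("000005.png"))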


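or with TensorFlow like this:

import tensorflow as tf
# ann_filename is the path to an annotation PNG; in TF 2.x, tf.read_file is tf.io.read_file.
ann_data = tf.read_file(ann_filename)
ann = tf.image.decode_image(ann_data, dtype=tf.uint8, channels=1)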


See the code for loading the DAVIS dataset for more details.