Anatomy of a PSD File

The PSD file format is massive. It's the result of over 20 years of backward-compatible changes across 14 major Photoshop releases. New features have been cleverly tacked on, and as a result, there are some redundancies and oddities. I've been working with the PSD format for over 2 years now, and there are still many details that I haven't explored.

That said, PSD files have a general structure that has existed for a long time. I'm going to attempt to explain the important and interesting parts without going into mundane details.

Let's get to it.

A Little Background on Binary Files

If you have ever worked with reading/writing/parsing binary files, or if you know how they work, just skip this whole section.

Binary files are a bit of a black box. If you try to open them in your text editor, you're going to see lots of strange symbols and characters. It looks like gibberish, but it's actually a bunch of 1's and 0's that follow a very strict specification. Your text editor is trying to interpret the 1's and 0's using a particular character set, such as UTF-8, but is failing miserably because the data isn't supposed to be human readable at all.

In order to read the data in the binary file, you need a specification, or a map of sorts. Of course, if you can't find it online, you can also reverse engineer it, but that's another story. Adobe has a public spec for the PSD file format available online. Unfortunately, despite its latest revision date, it is incomplete and even incorrect in some places. Some incredibly important stuff, which I'll get into below, is even marked as "Undocumented data". That said, if you take a quick glance at the page, you will get a gist of what a file spec looks like.
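To make that concrete, here is a tiny Ruby sketch of the same four bytes seen two ways: as text, and as the number a spec might say they are. The bytes themselves are made up for illustration; they aren't taken from a real PSD.

```ruby
# Four made-up bytes, as they might appear somewhere in a binary file.
bytes = "\x00\x00\x02\x58".b

# Viewed as text, most of these bytes are unprintable control characters.
as_text = bytes.inspect

# Viewed through a spec that says "32-bit unsigned big-endian integer",
# the same bytes are simply a number.
as_number = bytes.unpack1("N")

puts as_text
puts as_number # => 600
```

The "N" directive for String#unpack1 is what encodes the spec here: 4 bytes, unsigned, big-endian.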

The Main Structure

The PSD format is broken up into 5 main components, in this order: the header, color mode data, image resources, layer and mask data, and image data. The header has a fixed size, since it starts at position 0 and always has the same fields. The color mode data, image resources, and layer and mask data sections are variable in size, but luckily the first 4 bytes of each contain their length. The image data section comes last and has no length field; it simply runs to the end of the file.

Since these sections start with their length, it is easy for the parser to skip entire sections if needed. PSD.rb does this if you only need to read the image data. The length values can also be used for validation and error correction purposes. If you have a bug in your parser, you can easily compare where it thinks the end of the section is to where it should be and attempt to gracefully adjust.
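Here is a minimal sketch of both ideas, skipping and self-correcting, run against a toy stream rather than a real PSD. The method names (read_length, skip_section, parse_section) are hypothetical and are not PSD.rb's API.

```ruby
require "stringio"

# Read the 4-byte big-endian length that prefixes a section.
def read_length(io)
  io.read(4).unpack1("N")
end

# Skip a whole length-prefixed section without parsing it.
def skip_section(io)
  length = read_length(io)
  io.seek(length, IO::SEEK_CUR)
  length
end

# Parse a section with the given block, then compare where the parser ended
# up with where the section should end, and jump to the declared end if
# they disagree. Returns the block's result and the drift in bytes.
def parse_section(io)
  length = read_length(io)
  expected_end = io.pos + length
  result = yield(io, length)
  drift = io.pos - expected_end
  io.seek(expected_end) unless drift.zero?
  [result, drift]
end

# Toy stream with two length-prefixed sections (not real PSD data).
stream = StringIO.new([3].pack("N") + "abc" + [2].pack("N") + "hi")

skip_section(stream)
# This "parser" deliberately reads one byte too few.
result, drift = parse_section(stream) { |io, _length| io.read(1) }
```

A nonzero drift is a good thing to log: it tells you exactly which section your parser misread, while still letting the rest of the file parse.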

Descriptors

Before we dive in, there is one important thing to know. Photoshop has an internal data type it refers to as a Descriptor. A Descriptor is very similar to an "object" or a "struct" in many programming languages. It's basically a complex data structure that contains many different types of data and can be nested many levels deep. In Ruby, we can directly represent a Descriptor as a Hash, which is what PSD.rb does.

Every Descriptor starts with a name and ID, although their contents are optional and might be blank. This is followed by the number of items in the Descriptor.

Every data type in a Descriptor has a unique case-sensitive key that tells the parser how to proceed. There are 18 (known) data types in total:

bool = Boolean
type, GlbC = Class name
Objc, GlbO = A nested Descriptor
doub = Double
enum = Enumerated type, read as a String
alis = Alias, read as a String
Pth = File path
long = Long integer
comp = Large integer aka longlong
VlLs = List, can contain any type of data
ObAr = Object array, currently unsure how to parse
tdta = Raw data, simply read as byte string
obj = Reference, can exist as many different forms
- Clss = null value
- Enmr = Enumerated type
- Idnt = Identifier, read as Integer
- indx = Index, read as Integer
- name = Reference name, read as String
- rele = Offset, read as Integer
- prop = Property, read as String
TEXT = Arbitrary String
UntF = Unit double
UnFl = Unit float
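The key-based dispatch can be sketched as a lookup table of readers. This covers only a handful of the simpler types, and the byte layouts in it (1-byte boolean, 4-byte signed integer, 8-byte integer, 8-byte double, length-prefixed raw data, all big-endian) are my reading of Adobe's spec, so treat them as assumptions. READERS and read_value are hypothetical names, not PSD.rb's actual code.

```ruby
require "stringio"

# Readers for a few of the type keys listed above. Each takes an IO
# positioned just after the 4-character key. Layouts are assumptions.
READERS = {
  "bool" => ->(io) { io.read(1).unpack1("C") != 0 },     # 1 byte
  "long" => ->(io) { io.read(4).unpack1("l>") },         # 32-bit signed
  "comp" => ->(io) { io.read(8).unpack1("q>") },         # 64-bit signed
  "doub" => ->(io) { io.read(8).unpack1("G") },          # 64-bit float
  "tdta" => ->(io) { io.read(io.read(4).unpack1("N")) }  # length + bytes
}.freeze

# Read the 4-character type key, then hand off to the matching reader.
def read_value(io)
  type = io.read(4)
  reader = READERS.fetch(type) do
    raise ArgumentError, "unsupported descriptor type: #{type.inspect}"
  end
  reader.call(io)
end

# Toy data: three typed values back to back (not a complete Descriptor).
data = "long" + [-5].pack("l>") + "bool" + "\x01" + "doub" + [1.5].pack("G")
io = StringIO.new(data)
values = 3.times.map { read_value(io) }
```

Because the keys are case-sensitive, a plain Hash lookup is enough; an unknown key fails loudly instead of silently desynchronizing the parser.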

Header

The header is an incredibly important part of the PSD file. It contains information about the color mode, the color depth, the number of color channels, and the dimensions of the whole PSD. In other words, when you set the PSD to be RGB 8-bit color mode, this is stored in the header. This becomes especially important when you go to extract the full preview image from the PSD.

The header data looks like this:

{
 "sig"=>"8BPS",
 "version"=>1,
 "channels"=>3,
 "rows"=>600,
 "cols"=>900,
 "depth"=>8,
 "mode"=>3,
 "color_data_len"=>0
}
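As a rough sketch, a Hash like the one above can be produced with a single String#unpack call. The field order and sizes here follow my reading of Adobe's spec (including 6 reserved bytes after the version), and the last field is really the 4-byte length that opens the color mode data section. parse_header is a hypothetical name, and the sample bytes are built by hand rather than read from a real file.

```ruby
# a4 = 4-byte signature, n = 16-bit unsigned, N = 32-bit unsigned,
# x6 = skip the 6 reserved bytes. Everything is big-endian.
HEADER_FORMAT = "a4 n x6 n N N n n N"
HEADER_KEYS = %w[sig version channels rows cols depth mode color_data_len].freeze

def parse_header(bytes)
  header = HEADER_KEYS.zip(bytes.unpack(HEADER_FORMAT)).to_h
  raise ArgumentError, "not a PSD file" unless header["sig"] == "8BPS"
  header
end

# Hand-built sample matching the example above (not a real PSD).
sample = ["8BPS", 1, 3, 600, 900, 8, 3, 0].pack(HEADER_FORMAT)
header = parse_header(sample)
puts header
```

Checking the signature first is a cheap way to reject files that aren't PSDs before trusting any of the other fields.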

You'll notice that the "mode" is simply a number. This number maps to a specific color mode that is used for the entire document. Here are all of the possibilities, indexed from 0 (so the 3 above is RGBColor):

[
 'Bitmap',
 'GrayScale',
 'IndexedColor',
 'RGBColor',
 'CMYKColor',
 'HSLColor',
 'HSBColor',
 'Multichannel',
 'Duotone',
 'LabColor',
 'Gray16',
 'RGB48',
 'Lab48',
 'CMYK64',
 'DeepMultichannel',
 'Duotone16'
]

The color mode, the depth, and the number of channels determine how we parse the image data. Color channels are the primary color components that are used to describe the color of each pixel in the image. This is a really confusing way of saying R, G, and B are the color channels for RGB. If the channel count is 4 for RGB, then it's really RGBA because it includes alpha transparency data. The same goes for Grayscale, CMYK, and so on.
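Putting the mode table and the channel rule of thumb together, a sketch might look like this. The BASE_CHANNELS table and the describe method are hypothetical helpers I made up for illustration, and the base counts are assumptions for a few common modes only.

```ruby
COLOR_MODES = %w[
  Bitmap GrayScale IndexedColor RGBColor CMYKColor HSLColor HSBColor
  Multichannel Duotone LabColor Gray16 RGB48 Lab48 CMYK64
  DeepMultichannel Duotone16
].freeze

# Assumed number of color channels, without alpha, for a few common modes.
BASE_CHANNELS = {
  "GrayScale" => 1,
  "RGBColor" => 3,
  "CMYKColor" => 4,
  "LabColor" => 3
}.freeze

# Look up the mode name and apply the rule of thumb from the text:
# more channels than the mode needs means there is alpha data.
# has_alpha is nil for modes this sketch does not know about.
def describe(header)
  mode = COLOR_MODES.fetch(header["mode"])
  base = BASE_CHANNELS[mode]
  has_alpha = base ? header["channels"] > base : nil
  { mode: mode, has_alpha: has_alpha }
end

rgb = describe("mode" => 3, "channels" => 3)
rgba = describe("mode" => 3, "channels" => 4)
```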

Color Mode Data

This section only contains data if your PSD file is set to Indexed or Duotone color; otherwise its length is 0, as in the header example above. Because of this, I have not explored this section at all and PSD.rb does not support it yet.

If you have indexed color, this section contains the color table for the image. If you have duotone color, the contents are undocumented.
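For the indexed case, Adobe's spec describes the table as 768 bytes in non-interleaved order: 256 reds, then 256 greens, then 256 blues. Assuming that layout holds, reading it could be sketched like this. parse_color_table is a hypothetical name (remember PSD.rb does not implement this), and the sample data is synthetic.

```ruby
# Turn a 768-byte non-interleaved color table into 256 [r, g, b] triples.
# The layout is an assumption taken from Adobe's published spec.
def parse_color_table(data)
  unless data.bytesize == 768
    raise ArgumentError, "expected 768 bytes, got #{data.bytesize}"
  end

  reds, greens, blues = data.bytes.each_slice(256).to_a
  reds.zip(greens, blues)
end

# Synthetic table: full red, a green ramp from 0 to 255, and no blue.
data = ([255] * 256 + (0..255).to_a + [0] * 256).pack("C*")
table = parse_color_table(data)
puts table[10].inspect # => [255, 10, 0]
```

Each pixel in an indexed image would then be a single byte that is an index into this table.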

Image Resources

The Resources section is mostly boring settings, but there are a few incredibly important parts to pay attention to. It's basically an array of various program settings and metadata that exist on a per-PSD basis. A lot of the settings are things such as "Auto Save File Path" and "Timeline Information".

There is one incredibly important section, however, that stores layer comp information.
