The OTDR (Optical Time Domain Reflectometer) Data Format

(Last Revised 2016-01-04)

This is an old copy (from January 4, 2016) of a posting from my blog. There have been some major changes and corrections since that last update. Please see my blog posting for the up-to-date version.

Introduction

The SOR ("Standard OTDR Record") data format is used to store OTDR optical time-domain reflectometer fiber data. The format is defined by the Telcordia SR-4731, issue 2 standard. While it is a standard, it is unfortunately not open, in that the specifics of the data format are not openly available. You can buy the standards document from Telcordia for $750 US (as of this writing), but this was too much for my budget. (And likely comes with all sorts of licensing restrictions. I wouldn't know; I have never seen the document!)

There are several free OTDR trace readers available for download on the web, but most do not allow exporting the trace data into, say, a CSV file for further analysis, and only one that I am aware of runs natively on Linux (although some will work with the Wine emulator). There have been requests on various Internet forums asking for information on how to extract the trace data, but I am not aware of anyone providing any answers beyond pointing to the free readers and the Telcordia standard.

Fortunately the data format is not particularly hard to decipher. The table of contents on the Telcordia SR-4731, issue 2 page provides several clues, as does the Wikipedia page on optical time-domain reflectometer. Using a binary-file editor/viewer and comparing the outputs from some free OTDR SOR file readers, I was able to piece together most of the encoding in the SOR data format. In this article I will describe my findings, in the hope that they will be useful to other people. But use them at your own risk! The information provided here is based on guesswork from looking at a limited number of sample files. I cannot guarantee that there are no mistakes, or that I have uncovered all possible exceptions to the rules that I have deduced from the sample files. You have been warned!

A Simple SOR File Reader

For the impatient, I have written a simple program pubOTDR (hosted at GitHub) that parses a SOR file and dumps the trace curve into a TAB-delimited data file. The program is written in Perl, and is far from efficient, but it should work! Some day I might write a Python version, but in the meanwhile you are welcome to port the code to whatever programming language is most useful to you.

To run the program (on Linux), open a terminal and type:

  % read_otdr.pl my_otdr_file.sor

where my_otdr_file.sor is the OTDR SOR file. It will print the parsed information on the screen and dump the trace data into the file my_otdr_file-trace.dat.

Organization of the SOR file

There are actually two main versions of the OTDR SOR files: the earlier 1.x versions from Bellcore, and the newer 2.x versions. The files are binary data files; all values are encoded as little-endian signed or unsigned integers, with fractional values represented as scaled integers (i.e., the stored integer is multiplied by some factor, typically a power of 10, to obtain the actual value). Floating-point numbers are not used.

In both versions, the data is arranged in blocks; some are required, some are optional. They are:

  • Map block (required): Map
  • General parameters block (required): GenParams
  • Supplier parameters block (required): SupParams
  • Fixed parameters block (required): FxdParams
  • Key events block (required if data point block is not present): KeyEvents
  • Link Parameters block (optional): LnkParams
  • Data points block (required if key events block is not present): DataPts
  • Special proprietary block (optional): these appear to be vendor specific.
  • Checksum block (optional): Cksum

The Map block is the first block; it contains the format version number and describes the blocks that follow. Each block in the file is described by its own "map", which consists of the name of the block (a string), a version number, and the size of the block (in bytes). These "maps" also specify the order in which the blocks appear in the file; the order can differ from vendor to vendor. However, the checksum block always appears last, an arrangement that makes sense, since the checksum of the file can be calculated and then appended.

After the Map block come the individual blocks that contain the actual data, in the order described in the Map block.

One difference between the older 1.x version and the newer 2.x version is that blocks in the 2.x version are preceded by the name of the block (e.g., GenParams), while in the older version they are not. The preceding block name is redundant, but it affords an extra layer of sanity checking.

The map block

In the newer 2.x version, the Map block starts with the string "Map", followed by a terminating '\0' character. The older 1.x version does not have this 'Map\0' heading. Following the 'Map\0' heading are 8 bytes (in the 1.x version, the block simply starts with these 8 bytes). These are:
  • 0-1: version number
  • 2-5: number of bytes in the Map block
  • 6-7: number of blocks

All numbers are unsigned integers (recall that all are in little-endian order). The version number is encoded as 100 times the version number. For example, version 1.10 would be encoded as the number 110. The number of bytes contained in the Map block includes the 'Map\0' header and the 8 bytes of information that follow.

The 8-byte information is followed by the individual "maps" of the blocks that are to follow. Each block "map" consists of the block name (a string terminated by the '\0' character), followed by 6 bytes. The 6 bytes are:

  • 0-1: version number
  • 2-5: number of bytes in the block

The version numbers are usually the same, especially for the standard/required blocks, but they can differ for the special proprietary blocks. The version number reported as the version of the file appears to be the version number from the FxdParams block.
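As an illustration, below is a minimal Python sketch of parsing the Map block of a 2.x file, based on the layout deduced above. The file name is hypothetical, and whether the block count includes the Map block itself is an assumption on my part (it appears to in my sample files):

  import struct

  def read_string(fh):
      """Read a NUL-terminated string from the file handle fh."""
      chars = bytearray()
      while True:
          b = fh.read(1)
          if b in (b"\x00", b""):
              break
          chars += b
      return chars.decode("latin-1")

  with open("my_otdr_file.sor", "rb") as fh:       # hypothetical file name
      assert read_string(fh) == "Map"              # 2.x files begin with 'Map\0'
      # 8 bytes follow: version (2), Map block size (4), block count (2)
      version, map_size, nblocks = struct.unpack("<HIH", fh.read(8))
      print("format version %.2f, %d bytes, %d blocks" %
            (version * 0.01, map_size, nblocks))
      # one "map" per remaining block: name string, version (2), size (4)
      for _ in range(nblocks - 1):                 # assuming the count includes Map
          name = read_string(fh)
          bver, bsize = struct.unpack("<HI", fh.read(6))
          print("  %-12s v%.2f %8d bytes" % (name, bver * 0.01, bsize))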

The general parameters block

The format of the 1.x version and 2.x version are slightly different. I will describe the newer 2.x version first.

The general parameters block starts with the 'GenParams\0' heading (string followed by a terminating '\0' character), then two bytes that indicate the language (EN for English). These are followed by the fields below (all strings include a terminating '\0' character unless indicated otherwise):

  1. cable ID: string
  2. fiber ID: string
  3. fiber type: 2 byte unsigned integer
  4. wavelength: 2 byte unsigned integer
  5. location A (starting location): string
  6. location B (ending location): string
  7. cable code (or fiber type): string
  8. build condition: 2 characters (no terminating '\0')
  9. unknown field: 8 bytes
  10. operator: string
  11. comments: string

The interpretation of the cable code field (7th field) seems to vary from vendor to vendor, with some using it as the fiber type.

The fiber type (3rd field) is an integer that indicates the type of fiber. The encoding is as follows:

  • 651: ITU-T G.651 (multi-mode fiber)
  • 652: ITU-T G.652 (standard single-mode fiber)
  • 653: ITU-T G.653 (dispersion-shifted fiber)
  • 654: ITU-T G.654 (1550nm loss-minimized fiber)
  • 655: ITU-T G.655 (nonzero dispersion-shifted fiber)

The wavelength (4th field) is encoded as 10 times the wavelength. Thus 13100 means 1310.0 nm. However, I have seen cases (in older 1.x format files) that are off by a factor of 10 (showing 1310 for 1310 nm). I have not been able to determine what the rules are, but the newer 2.x format files all have the factor of 10.

Build condition (8th field) consists of two characters. The encoding is as follows:

  • BC: as-built
  • CC: as-current
  • RC: as-repaired
  • OT: other

The string fields may contain newline or carriage-return characters.

The format for the 1.x version is similar, but it does not have the 'GenParams\0' header, and the fiber type (3rd field) is missing. As far as I can tell, there is no encoding of the fiber type in the 1.x version format.
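To make the layout concrete, here is a hedged Python sketch of reading the 2.x GenParams fields in the order listed above, reusing the read_string helper from the Map block sketch (fh is assumed to be positioned at the start of the block):

  import struct

  def parse_genparams_v2(fh):
      """Parse a 2.x GenParams block, following the field list above."""
      assert read_string(fh) == "GenParams"        # 2.x block header
      lang = fh.read(2).decode("latin-1")          # e.g. 'EN' for English
      cable_id = read_string(fh)
      fiber_id = read_string(fh)
      fiber_type, wavelength = struct.unpack("<HH", fh.read(4))
      location_a = read_string(fh)
      location_b = read_string(fh)
      cable_code = read_string(fh)
      build_cond = fh.read(2).decode("latin-1")    # 'BC', 'CC', 'RC', or 'OT'
      fh.read(8)                                   # unknown 8-byte field
      operator = read_string(fh)
      comments = read_string(fh)
      return {
          "language": lang, "cable ID": cable_id, "fiber ID": fiber_id,
          "fiber type": "G.%d" % fiber_type,       # e.g. G.652
          "wavelength (nm)": wavelength * 0.1,     # 1.x files may omit the 0.1
          "location A": location_a, "location B": location_b,
          "cable code": cable_code, "build condition": build_cond,
          "operator": operator, "comments": comments,
      }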

The supplier parameters block

The supplier parameters block starts with the 'SupParams\0' string in the 2.x version format; this header is absent in the 1.x version format. The fields in the supplier parameters block in the 2.x version format are as follows (all are strings terminated by the '\0' character):
  1. supplier name
  2. OTDR name
  3. OTDR serial number
  4. module name
  5. module serial number
  6. software version
  7. other

The fixed parameters block

The fixed parameters block starts with the 'FxdParams\0' string in the 2.x version format; this header is absent in the 1.x version format.

The fields for the 2.x version format are as follows (all are unsigned integers unless otherwise noted):

  • 0-3: date/time: integer, 4 bytes
  • 4-5: unknown: 2 bytes
  • 6-7: wavelength: integer, 2 bytes
  • 8-17: unknown: 10 bytes
  • 18-19: pulse-width: 2 bytes
  • 20-23: distance spacing: 4 bytes
  • 24-27: number of data points in trace: 4 bytes
  • 28-31: index of refraction: 4 bytes
  • 32-33: backscattering coefficient: 2 bytes
  • 34-37: number of averages: 4 bytes
  • 38-41: range: 4 bytes
  • 42-57: unknown: 16 bytes
  • 58-59: loss threshold: 2 bytes
  • 60-61: reflection threshold: 2 bytes
  • 62-63: end-of-transmission threshold: 2 bytes
  • 64-65: trace type: 2 characters
  • 66-81: unknown: 16 bytes

The date/time field is a 4-byte unsigned integer in Unix (or POSIX) time, i.e., the number of seconds that have elapsed since 00:00:00 UTC on January 1st, 1970. The two bytes that follow it may be related to the time zone (there are cases where the time displayed by the free OTDR readers is off by one hour from what I get by interpreting the field as Unix time), but so far I have not been able to determine how they are encoded.

The wavelength is encoded as an unsigned integer that is 10 times the wavelength in nanometers (similar to the previous wavelength field in the general parameters block).

Pulse-width is an unsigned integer in nanoseconds.

Distance spacing is an unsigned integer. To convert it into meters, multiply by 2×10⁻⁶. However, this number needs to be adjusted by factoring in the refractive index, which will be explained later.

The refractive index is an unsigned integer that is 10⁶ times the value of the index of refraction (IOR).

The backscattering coefficient is an unsigned integer. Multiply the integer by -0.1 to get dB.

The range is an unsigned integer. To convert it into kilometers, multiply by 10⁻⁶. However, similar to the distance spacing, this number needs to be adjusted by factoring in the refractive index, which will be explained later.

The loss, reflection, and end-of-transmission (EOT) thresholds are the specified values for determining when one of the "events" occurs. They are unsigned integers. To convert the integers into dB values, multiply the loss and EOT thresholds by 0.001, and multiply the reflection value by -0.001.

Trace type is represented by two characters. These are:

  • ST: standard trace
  • RT: reverse trace
  • DT: difference trace
  • RF: reference
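
The following Python sketch decodes the 2.x fixed fields according to the offset table and scaling factors above (read_string is the helper from the Map block sketch; treat the details as my best guess rather than gospel). The spacing and range values returned here are still unadjusted for the refractive index; see below.

  import struct
  import datetime

  def parse_fxdparams_v2(fh):
      """Parse a 2.x FxdParams block, following the offset table above."""
      assert read_string(fh) == "FxdParams"        # 2.x block header
      raw = fh.read(82)                            # the fixed bytes, offsets 0-81
      return {
          "date/time": datetime.datetime.fromtimestamp(
              struct.unpack_from("<I", raw, 0)[0], datetime.timezone.utc),
          "wavelength (nm)":     struct.unpack_from("<H", raw, 6)[0] * 0.1,
          "pulse width (ns)":    struct.unpack_from("<H", raw, 18)[0],
          "spacing (m, raw)":    struct.unpack_from("<I", raw, 20)[0] * 2e-6,
          "number of points":    struct.unpack_from("<I", raw, 24)[0],
          "index of refraction": struct.unpack_from("<I", raw, 28)[0] * 1e-6,
          "backscatter (dB)":    struct.unpack_from("<H", raw, 32)[0] * -0.1,
          "number of averages":  struct.unpack_from("<I", raw, 34)[0],
          "range (km, raw)":     struct.unpack_from("<I", raw, 38)[0] * 1e-6,
          "loss threshold (dB)":       struct.unpack_from("<H", raw, 58)[0] * 0.001,
          "reflection threshold (dB)": struct.unpack_from("<H", raw, 60)[0] * -0.001,
          "EOT threshold (dB)":        struct.unpack_from("<H", raw, 62)[0] * 0.001,
          "trace type": raw[64:66].decode("latin-1"),  # 'ST', 'RT', 'DT', or 'RF'
      }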

The format for version 1.x is similar, but lacks some of the fields of the 2.x version. The fields are:

  • 0-3: date/time: integer, 4 bytes
  • 4-5: unknown: 2 bytes
  • 6-7: wavelength: integer, 2 bytes
  • 8-13: unknown: 6 bytes
  • 14-15: pulse-width: 2 bytes
  • 16-19: distance spacing: 4 bytes
  • 20-23: number of data points in trace: 4 bytes
  • 24-27: index of refraction: 4 bytes
  • 28-29: backscattering coefficient: 2 bytes
  • 30-33: number of averages: 4 bytes
  • 34-37: range: 4 bytes
  • 38-47: unknown: 10 bytes
  • 48-49: loss threshold: 2 bytes
  • 50-51: reflection threshold: 2 bytes
  • 52-53: end-of-transmission threshold: 2 bytes

For the 1.x version, distance spacing is handled the same way. Range is similar, but is multiplied by 2×10⁻⁵ to convert into kilometers. There is no trace type encoding in the 1.x version.

I have seen cases where the range value is not consistent with the distance spacing and number of data points. The safer bet seems to be to ignore the range value and calculate it yourself from the distance spacing and the number of data points (the trace data points are equally spaced). However, the distance spacing and range values need to be adjusted by factoring in the refractive index. Apparently the values given are for the distance that light travels in the fiber medium, not in vacuum. The "real" distance (shown by other OTDR SOR file readers) is calculated by the following formula:


  (real/displayed distance) = (raw distance) × 1.498962239 / (refractive index)

The magic number 1.498962239 was deduced (guessed) by comparing the output from the various OTDR SOR file readers with the raw/original values extracted from the SOR files. I have not yet discovered where this magic number (which looks like a refractive index value) comes from, and the distances calculated according to this scaling do not agree with the values from various OTDR SOR file readers down to all decimal points. However, in all cases that I have checked, the values agree within a fraction of a meter.

Regarding the relationship between range and distance spacing: you would expect the range to be the distance spacing times (the number of data points minus one), yet the various OTDR SOR file readers and writers seem to multiply by the number of data points (i.e., off by one).
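
In code, the adjustment boils down to one small function. A sketch (the constant is the guessed value above; the example numbers are made up):

  SPEED_FACTOR = 1.498962239    # empirically deduced; origin unknown

  def adjust_distance(raw_distance, ior):
      """Apply the refractive-index adjustment described above.

      raw_distance: the distance after per-field scaling (e.g. spacing x 2e-6 m)
      ior: the index of refraction as a float (stored integer x 1e-6)
      """
      return raw_distance * SPEED_FACTOR / ior

  # made-up example: raw spacing integer 500000 with IOR 1.4682
  spacing_m = adjust_distance(500000 * 2e-6, 1.4682)    # about 1.021 m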

Key events block

The key events block starts with the 'KeyEvents\0' string in the 2.x version format; this header is absent in the 1.x version format. The formats of the 1.x version and 2.x version are slightly different. I will start with the 2.x version:

The first two bytes following the header are an unsigned integer giving the total number of events. Each event is a fixed 42-byte record followed by a '\0' terminated comment string (which may be empty). The fixed 42 bytes are as follows:

  • 00-01: event number (1, 2, 3, etc.); 2 bytes, unsigned integer
  • 02-05: unadjusted distance; 4 bytes, unsigned integer
  • 06-07: slope; 2 bytes, signed integer
  • 08-09: splice loss; 2 bytes, signed integer
  • 10-13: reflection loss; 4 bytes, signed integer
  • 14-21: event type: 8 characters
  • 22-29: segment 1: 8 bytes (details to follow)
  • 30-37: segment 2: 8 bytes (details to follow)
  • 38-41: unknown: 4 bytes

The event number count starts from 1. The distances are represented as unsigned integers, and are handled similarly to the distances in the fixed parameters block:


  (real/displayed distance in kilometers) = (integer value) × 2×10⁻⁵ × 1.498962239 / (refractive index)

Slope, splice loss, and reflection loss are all signed integers; multiply by 0.001 to get dB/km (for the slope) or dB (for the losses).

Event type is represented by a string of the form nx9999LS, where n and x are single characters. x appears to represent (or correlate with) the "mode" in which the event was added or declared:

  • when x is 'A', the event was added in manual mode; otherwise it was added in auto mode.
  • besides 'A', x can be the characters 'E', 'F', 'M', or 'D'; I have not discovered what they signify, except that 'E' appears to signify the end of the fiber.

The n character is a number: 0, 1, or 2. 0 is a loss or gain in power, 1 is a reflection, and 2 means that it is a "multiple event".

The two segments (segment 1 and segment 2) encode positions related to the event. Each segment is 8 bytes wide, split into two 4-byte integers. The first 4-byte integer of segment 2 represents the end-of-event position; the second represents the starting position of the next event. In all examples that I've seen, segment 2 of an event record is identical to segment 1 of the event record that follows it, so segment 1 is just a repeat of the previous event's end-of-event and start-of-next-event positions. Translation of the integers to kilometers follows the same formula as the event distance above.

Following all of the event records are 22 more bytes, encoded as follows:

  • 00-03: total loss: 4 bytes, unsigned integer
  • 04-07: fiber start position: 4 bytes, signed integer
  • 08-11: fiber length: 4 bytes, unsigned integer
  • 12-13: Optical return loss (ORL): 2 bytes, unsigned integer
  • 14-17: duplicate of 04-07 (fiber start position)
  • 18-21: duplicate of 08-11 (fiber length)

The total loss integer and ORL values are multiplied by 0.001 to become dB. The fiber start position and fiber length are handled the same way as before, namely:


  (real/displayed distance in kilometers) = (integer value) × 2×10⁻⁵ × 1.498962239 / (refractive index)

Note that the fiber start position can be a negative number. In all examples that I've seen, the last 8 bytes are just duplicates of the fiber start position and the fiber length.
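
Putting the 2.x pieces together, here is a hedged Python sketch of walking the key events block (read_string and adjust_distance are the helpers from the earlier sketches; ior is the index of refraction from FxdParams):

  import struct

  def parse_keyevents_v2(fh, ior):
      """Parse a 2.x KeyEvents block, following the record layout above."""
      assert read_string(fh) == "KeyEvents"        # 2.x block header
      (nevents,) = struct.unpack("<H", fh.read(2))
      events = []
      for _ in range(nevents):
          rec = fh.read(42)                        # fixed 42-byte event record
          number, distance = struct.unpack_from("<HI", rec, 0)
          slope, splice = struct.unpack_from("<hh", rec, 6)    # signed
          (reflection,) = struct.unpack_from("<i", rec, 10)    # signed
          etype = rec[14:22].decode("latin-1")     # e.g. '1F9999LS'
          comment = read_string(fh)                # may be an empty string
          events.append({
              "number": number,
              "distance (km)": adjust_distance(distance * 2e-5, ior),
              "slope (dB/km)": slope * 0.001,
              "splice loss (dB)": splice * 0.001,
              "reflection loss (dB)": reflection * 0.001,
              "type": etype, "comment": comment,
          })
      tail = fh.read(22)                           # trailing summary bytes
      summary = {
          "total loss (dB)": struct.unpack_from("<I", tail, 0)[0] * 0.001,
          "fiber start (km)": adjust_distance(     # may be negative
              struct.unpack_from("<i", tail, 4)[0] * 2e-5, ior),
          "fiber length (km)": adjust_distance(
              struct.unpack_from("<I", tail, 8)[0] * 2e-5, ior),
          "ORL (dB)": struct.unpack_from("<H", tail, 12)[0] * 0.001,
      }
      return events, summary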

The format for version 1.x is similar, but each event record is a fixed 22 bytes plus a '\0' terminated comment string:

  • 00-01: event number (1, 2, 3, etc.); 2 bytes, unsigned integer
  • 02-05: unadjusted distance; 4 bytes, unsigned integer
  • 06-07: slope; 2 bytes, signed integer
  • 08-09: splice loss; 2 bytes, signed integer
  • 10-13: reflection loss; 4 bytes, signed integer
  • 14-21: event type: 8 characters

The only difference is that the two segment fields are absent. The trailing 22 bytes in the version 1.x format appear to be the same as in the 2.x version, but the numbers for the fiber starting position do not match or make sense in the examples I've studied.

The data points block

We finally come to the data points block, which encodes the trace curve itself. As with the other blocks, the block starts with a header string 'DataPts\0' in the 2.x version format, but the header is absent in the 1.x version format. Apart from the header, the formats of the 1.x and 2.x versions are the same.

After the header (if applicable), the data points block starts with 12 bytes. The first 4 bytes are an unsigned integer giving the number of data points (this will be the same as the number of data points from the fixed parameters block). This is followed by 2 bytes whose purpose is unknown. The next 4 bytes are a repeat of the number of data points, followed by another 2 bytes that appear to always be the same as the previous mysterious 2 bytes.

After the initial 12 bytes comes the real data. Each data point is a 2-byte unsigned integer. Multiply by -0.001 to translate the value into dB (converting all values to zero or negative). Different OTDR SOR file readers offset the data differently: some offset the data so that the highest reading is 0 dB; others add an offset to make the minimum reading 0 dB.

The data points are equally spaced, by the "distance spacing" value specified in the fixed parameters block (after the adjustment with the refractive index value). One can apply the "fiber start position" value specified in the key events block, but some OTDR SOR file readers do not do this.

There are hints in public discussions suggesting that an additional scaling factor might be applied to the trace curve (since 2 bytes can only give a maximum of 65.535 dB of dynamic range, which might not be enough), but it is not clear where this scaling factor would come from. The unknown 2 bytes in the initial 12-byte segment do not seem to be that scaling factor, and I have yet to encounter a SOR file that uses such a scaling factor.
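
Finally, a minimal Python sketch of dumping the trace itself, in the spirit of pubOTDR's TAB-delimited output (spacing_km is the adjusted distance spacing from FxdParams, and read_string is the helper from the Map block sketch):

  import struct

  def dump_datapts_v2(fh, spacing_km):
      """Parse a 2.x DataPts block and print a distance/dB table."""
      assert read_string(fh) == "DataPts"          # 2.x block header
      npoints, unknown1, npoints2, unknown2 = struct.unpack("<IHIH", fh.read(12))
      assert npoints == npoints2                   # the count is stored twice
      values = struct.unpack("<%dH" % npoints, fh.read(2 * npoints))
      for i, value in enumerate(values):
          print("%.6f\t%.3f" % (i * spacing_km, value * -0.001))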

The checksum block

The checksum block starts with a 'Cksum\0' header in the 2.x format; the header is absent in the 1.x format. The checksum value itself is 2 bytes (16 bits). The checksum is calculated using a particular 16-bit CRC (cyclic redundancy check) algorithm. For the reader who is not familiar with CRC algorithms, please see the excellent article A Painless Guide to CRC Error Detection Algorithms by Ross N. Williams (1993). The specific flavor used for the OTDR SOR format is sometimes known as "CRC-16/CCITT-FALSE" (for a catalog of different CRC-16 algorithms, please see Catalogue of parametrised CRC algorithms with 16 bits).
For a Perl implementation of CRC functions, please see the Digest::CRC module; for a Python implementation, please see the crcmod module.

Since there are several variants of the CRC-16 algorithm, and there is some confusion of the names and exact definitions, I will spell out the exact parameters below, following the convention in the Painless Guide document:

  • Width: 16
  • Poly: 0x1021
  • Init: 0xFFFF
  • RefIn: False
  • RefOut: False
  • XorOut: 0x0000
  • Check: 0x29B1 (with an input string of "123456789")

The last item is useful for checking whether the implementation you use is the correct one: when given the string "123456789", the checksum should come out to be 0x29B1.

The exact algorithm for calculating the checksum (for both version 1.x and 2.x) is as follows: take the whole content of the file, up to and including the 'Cksum\0' header, as one huge binary string, and calculate the checksum on this string. The checksum will be two bytes (16 bits), which are then appended to the file (following the 'Cksum\0' header). However, the two bytes need to be swapped, because the convention in the SOR file is to store numbers in little-endian byte order. For example, if the checksum is 0xD680, the last two bytes of the SOR file are 0x80, 0xD6.

This turns out to be awkward in some ways. A very interesting property of the CRC algorithm is that if you append the two bytes of the 16-bit checksum to the file (or string) and then run the CRC function on the result, the checksum will be zero! But this only works if the checksum is appended in big-endian byte order. Said another way: take the last two bytes of the SOR file, swap them, re-append them, and run the checksum function over the whole thing. If the result is zero, then everything is okay (or rather, is most likely okay; the CRC error-detection code detects most errors, but not all). It would have been nice if the byte-swapping of the two checksum bytes were not necessary.
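
For completeness, here is a self-contained Python implementation of CRC-16/CCITT-FALSE together with the byte-swapped verification described above (one could equally use the predefined 'crc-ccitt-false' function from the crcmod module; the file name is hypothetical):

  def crc16_ccitt_false(data):
      """CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, XorOut 0."""
      crc = 0xFFFF
      for byte in data:
          crc ^= byte << 8
          for _ in range(8):
              crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
              crc &= 0xFFFF
      return crc

  assert crc16_ccitt_false(b"123456789") == 0x29B1    # the standard check value

  with open("my_otdr_file.sor", "rb") as fh:          # hypothetical file name
      data = fh.read()
  # the stored checksum is little-endian: swap the last two bytes, re-append,
  # and the CRC of the whole thing should come out to zero
  if crc16_ccitt_false(data[:-2] + data[-2:][::-1]) == 0:
      print("checksum OK")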

One last word of caution: I have checked this CRC-16 algorithm against many versions of 1.x and 2.x files; with the exception of only two sample files from one particular vendor, all of them check out, so I believe the algorithm is correct.

Closing Remarks

There are still several parts of the SOR format that are unknown to me, but I believe the bulk of the encoding scheme is as I have described in this article, and should be mostly correct. The simple (and extremely inefficient) pubOTDR program is basically an implementation of the findings described above. You are free to use the program and the information in this article as you see fit (although I would appreciate some acknowledgement). But once again, all of this is provided at no cost, with no guarantees and no warranties; use it at your own risk!