Xdispec
This document describes the XAS Data Interchange Format (XDI), version 1.0, a simple file format for a single X-ray Absorption Spectroscopy (XAS) measurement. We are defining this format to accomplish the following goals:
- Establish a common language for transferring data between XAS beamlines, XAS experimenters, data analysis packages, web applications, and anything else that needs to process XAS data.
- Increase the relevance and longevity of experimental data by reducing the amount of data archeology future interpretations of that data will require.
- Enhance the user experience by promoting interoperability among data acquisition systems and data analysis packages.
- Provide a mechanism for extracting and preserving a single XAS-like data set from a related experiment (for example, a DAFS measurement) or from a complex data structure (for example, a database or a hierarchical data file used to store a multi-spectral data set).
This format is intended to encode single-scan data files with metadata. It is not intended to encode relationships between many XAS measurements or between an XAS measurement and other parts of a multi-spectral experiment.
In order to fulfill these goals, XDI files provide a flexible, consistent representation of information common to all XAS experiments. This format is simpler than a format based on XML, HDF, or a database; it yields self-documenting files; and it is easy for both humans and computers to read. Its structure is inspired by that of Internet electronic mail, a plain-text data format which has proven to be robust, extensible, and enduring. Because of these advantages, and because of our intention to develop free software tools and libraries that support XDI, we hope that the file format described in this specification will see wide adoption in the XAS community.
We do not intend this specification to dictate the file formats used by data acquisition systems during XAS experiments, although this is certainly a suitable format for that purpose. Any attempt to do so would be unreasonable due to the number of different data acquisition systems currently deployed at synchrotrons around the world, the variety of experiments performed at these installations, and the continuing development of new experimental techniques. Instead, this specification addresses the representation of a single scan of XAS data after an experiment has been completed.
A beamline which adopts this specification shall either use this format as its native file format or shall provide its users with tools that convert between its native file formats and XDI. In short, users will go home with their XAS data stored in this format. We intend to encourage this practice by developing tools for reading, editing, writing, and validating XDI files. Beamlines may choose to modify their data acquisition systems to write data using this format in situations where that would be appropriate. We plan to assist in this effort by developing libraries for popular programming languages which can read, manipulate, and write XDI files.
With their experiment data stored in XDI files, users may choose data analysis packages which are capable of reading this format. It is our hope that, as this specification gains wider adoption, users will ultimately be freed from the responsibility of understanding file formats. With this aim in mind, we shall assist software developers in supporting XDI files.
XDI files contain two sections: a header with information about one scan of an XAS experiment, and the data collected during that scan. The header consists of versioning information, a series of fields that each contain a single piece of information, an area for users to store comments about the experiment, and a sequence of labels for the columns of data. The data section contains these columns, with each row corresponding to one point of the scan.
Although the header has been designed to contain arbitrary information, the meanings of several fields are explicitly defined. These fields, described below, contain the most common information about XAS experiments. We hope that users will benefit from their existence when using data analysis packages that support XDI files. However, none of the defined fields are required to be present. For example, some of these fields may not be appropriate for certain experiments and should be omitted in that case.
Some examples of header information follow. A complete list of defined headers along with their specifications is found in Sec. 4.1.
- D_spacing: the d-spacing of the monochromator crystal used to collect the data.
- Beamline: the location where the experiment was performed.
- Edge Energy: edge energy value defined by the data acquisition software.
- Source: the type of x-ray source used in the experiment.
- Timestamps: start and end times of this scan.
- Mu expressions: math expressions for calculating experimental spectra from the data columns.
This section of the XDI specification formally describes the structure of XDI files.
The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in RFC 2119, Key words for use in RFCs to Indicate Requirement Levels.
An XDI implementation is not compliant if it fails to satisfy one or more of the must or required level requirements presented in this specification.
All of the representations defined in this document are described both in prose and using an augmented Backus-Naur Form (ABNF). The syntax used in these grammars is defined in RFC 5234, Augmented BNF for Syntax Specifications: ABNF. Software developers who wish to implement support for XDI files themselves will need to be familiar with this notation to understand this specification; section 3 of RFC 5234 is of particular relevance to the notation conventions used in this document. The repetition syntax for grammar rules is a bit peculiar and is described in sections 3.6 and 3.7 of RFC 5234. Some definitions from the RFC:
- DIGIT means one instance of that rule; DIGIT is synonymous with 1DIGIT.
- *DIGIT means zero or more repetitions, 1*DIGIT means one or more repetitions, and 2*DIGIT means two or more repetitions.
- 1*2DIGIT indicates a range of one to two repetitions.
- 3DIGIT means exactly three repetitions.
The basic rules used throughout this section to define parsing constructs are presented in the appendix in B.2 and B.3 as part of the complete grammar. All parsing rules that consist of a sequence of multi-character tokens must be delimited by white space unless the tokens of the sequence may be unambiguously identified.
The header and data sections of an XDI file are composed of structured US-ASCII text. Header field values that are "free-form" or "text" may contain UTF-8 encoded Unicode text, although Unicode support in applications that use XDI files is OPTIONAL. The US-ASCII coded character set is formally defined by ANSI X3.4-1986. The Universal Character Set (Unicode) is defined by ISO/IEC 10646. The UTF-8 transformation format is defined by IETF RFC 3629.
The header section of an XDI file appears at the beginning of the file and is comprised of structured text. Every line of the header must begin with a comment character and must end with an end-of-line sequence, both of which are defined below. There are no multi-line headers. Lines may be of any length. Support for the POSIX, Apple, or Microsoft end-of-line conventions is provided to increase cross-platform portability.
COMM = "#" / ";"
EOL  = CR / LF / CRLF
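A minimal Python sketch of these two rules, assuming the whole file has already been decoded into a str. The helper names split_lines and is_header_line are hypothetical, not part of any XDI library. A regular expression is used rather than str.splitlines, because the latter also breaks on form feeds, vertical tabs and other separators that the EOL rule does not include.

```python
import re

# EOL = CR / LF / CRLF; CRLF is tried first so "\r\n" counts as a single break.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text on POSIX (LF), classic Mac (CR) or Windows (CRLF) line ends."""
    lines = _EOL.split(text)
    # A final EOL leaves one empty string at the end of the list; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def is_header_line(line):
    """True if the line begins with one of the COMM characters, '#' or ';'."""
    return line[:1] in ("#", ";")
```

Because all three conventions are accepted, a file written on one platform and copied to another can be read without first normalizing its line ends.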
Header lines are subdivided into four optional subsections (versioning information, header fields, user comments, and column labels) with two separators that are required when header sections are present. These are organized in the following sequence:
- The required first line of the file is the version line.
- This is followed by zero or more header fields, which can be defined headers or extension headers. These two header types are explained in Sec. 3.4.2.
- The header lines are separated from the user comments by the FIELD-END rule, which is a comment character followed by two or more slashes (/) and then an end-of-line.
- The comment section is for user-supplied, free-format text. Each line begins with a comment character and ends with an end-of-line.
- The comment section ends with the HEADER-END rule, which is a line of dashes which starts with a comment character and ends with an end-of-line.
- The last line before the data is a line of column labels which identify the columns of data. There should be as many labels as there are columns. The label line begins with a comment character and ends with an end-of-line.
All header sections are optional. In a file that does not follow the XDI specification but which contains an obvious header (obvious in the sense that lines begin with a comment character and end with an end-of-line), the obvious header lines should be interpreted as user comments.
The optional status of the headers is to accommodate data files which contain obvious header lines but which are not compliant with this specification. In that case, all header lines are interpreted as user comments. Of course, in that case few of the advantages of the XDI format are realized.
The separator lines (FIELD-END and HEADER-END) serve specific, syntactic purposes in the XDI grammar. The line of dashes is a common visual cue denoting the end of the headers and the beginning of the data. The FIELD-END line separates field lines from freely formatted user comments, which may resemble header fields or other grammatical constructs. Similarly, the HEADER-END line distinguishes the column labels from user comments, which are otherwise grammatically identical elements of the data file.
FIELD-END  = COMM 2*"/" EOL
HEADER-END = COMM 2*"-" EOL
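Because both separators are distinctive, a reader can locate them first and slice the header into its subsections. The following Python sketch assumes the file has already been split into lines with the end-of-line sequences removed; split_header is a hypothetical name. The separator patterns follow the grammar strictly (no white space allowed), and, as the specification asks for non-compliant files, a header missing either separator is returned entirely as user comments.

```python
import re

FIELD_END = re.compile(r"[#;]/{2,}")   # COMM 2*"/"
HEADER_END = re.compile(r"[#;]-{2,}")  # COMM 2*"-"


def split_header(lines):
    """Split the lines of an XDI file (EOLs already removed) into subsections."""
    n = 0
    while n < len(lines) and lines[n][:1] in ("#", ";"):
        n += 1  # the header is the leading run of comment lines
    header, data = lines[:n], lines[n:]
    fe = next((i for i, line in enumerate(header) if FIELD_END.fullmatch(line)), None)
    he = next((i for i, line in enumerate(header) if HEADER_END.fullmatch(line)), None)
    if fe is None or he is None or he < fe:
        # Not compliant: read every header line as a user comment.
        return {"version": None, "fields": [], "comments": header,
                "labels": None, "data": data}
    return {
        "version": header[0] if fe > 0 else None,
        "fields": header[1:fe],
        "comments": header[fe + 1:he],
        "labels": header[he + 1] if he + 1 < len(header) else None,
        "data": data,
    }
```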
The first line of the XDI header contains the XDI version to which the file conforms. XDI represents versions of the file format with a <major>.<minor> numbering scheme. The <minor> version is incremented when changes are made to the format that do not affect compatibility with previous versions, as when new header fields are defined. (A parser compliant with an earlier minor version would treat the newly defined header as an extension field. Propagated to an output file as an extension field, this field would then be interpreted correctly by a more recent parser.) The <major> version is incremented when other changes are made to the format, as when the definition of the contents of a defined header field is altered.
A series of optional version entries, separated by white space, may follow the XDI version. These version entries exist to allow various programs to annotate the file as it proceeds through the collection and analysis process. Such annotation is optional, although software that creates XDI files containing extension fields must include its version information in this sequence (see section 4.2). The order of the optional version entries is undefined but should be preserved to accurately represent the sequence in which applications have manipulated the file.
XDI-VERSION  = "XDI/" 1*DIGIT "." 1*DIGIT
APPLICATIONS = 1*VCHAR
VERSION      = COMM XDI-VERSION *APPLICATIONS EOL
Note that the XDI major and minor version numbers must be treated as integers that may contain more than a single digit. “XDI/1.12” is a higher (more recent) version than “XDI/1.2”.
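A minimal Python sketch of the comparison, with xdi_version_key as a hypothetical helper name: converting both parts to integers and comparing tuples gives the ordering required above, whereas comparing the raw strings would wrongly place "XDI/1.12" before "XDI/1.2".

```python
import re

_XDI_VERSION = re.compile(r"XDI/(\d+)\.(\d+)")


def xdi_version_key(token):
    """Turn 'XDI/<major>.<minor>' into an (int, int) tuple for comparison."""
    m = _XDI_VERSION.fullmatch(token)
    if m is None:
        raise ValueError(f"not an XDI version token: {token!r}")
    # Integers, not strings, so that minor version 12 sorts after minor version 2.
    return int(m.group(1)), int(m.group(2))
```

For example, xdi_version_key("XDI/1.12") > xdi_version_key("XDI/1.2") is true, while the string comparison "XDI/1.12" > "XDI/1.2" is false.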
This specification does not impose a restriction on how applications identify and version themselves. However, a single application must identify and version itself using a single text sequence without white space. Some acceptable examples follow. The first example shows an application which uses the same format as the XDI version rule, which is the recommended format for application versioning; the second shows data acquisition and data processing programs identified by name but without version numbers; the third shows an arbitrary method of versioning an application.
# XDI/1.0 Datacollectatron/7.75
# XDI/1.0 XDAC Athena
# XDI/1.0 XAS!Collect-3000
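Since each application is a single token without white space, the version line can be read by splitting on white space after the XDI version. The sketch below is a hypothetical parse_version_line; it assumes, as in the examples above, that white space between the comment character and "XDI/" is optional.

```python
import re

# COMM, optional white space, XDI/<major>.<minor>, then zero or more tokens.
_VERSION_LINE = re.compile(r"[#;]\s*XDI/(\d+)\.(\d+)((?:\s+\S+)*)\s*")


def parse_version_line(line):
    """Return ((major, minor), [application tokens]) for an XDI version line."""
    m = _VERSION_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"not an XDI version line: {line!r}")
    version = (int(m.group(1)), int(m.group(2)))
    # Each white-space-delimited token names one application, e.g. 'MX/2.0'.
    applications = m.group(3).split()
    return version, applications
```

Applied to the second example, this returns ((1, 0), ["XDAC", "Athena"]); the token order is kept so that the processing history described above is preserved.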
The lines immediately following the version line of the header contain the fields of the header. These fields are arranged in a manner similar to the header of an Internet electronic mail message, although XDI fields shall not span multiple lines. Each field consists of a case-insensitive name, a separating colon, and an associated value. When multiple occurrences of the same field are present, the value of the last occurrence must be used as the value for the field.
Although the values of some fields have a required structure, all values are treated as free-form text in the following rules. Rules for each of the defined fields are given in section 4.1, and the complete definition of the FIELDS rule may be found in section B.5. The generic definition of a field looks like this:
PROPERWORD  = ALPHA *(ALPHA / DIGIT / "_" / "-")
WORD        = *(ALPHA / DIGIT / "_")
FIELD-NAME  = PROPERWORD
FIELD-VALUE = *WORD
FIELD-LINE  = COMM FIELD-NAME ": " FIELD-VALUE EOL
FIELDS      = *FIELD-LINE FIELD-END
The header fields subsection is ended with a FIELD-END line. Note that because no fields are required to be present, this subsection may contain no lines. The dividing line must be present if any header lines are present but may be absent if no header lines are present. If a field is present, it should also contain a value.
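A minimal Python sketch of reading the field lines between the version line and FIELD-END, under two assumptions: white space after the comment character and after the colon is optional (as in the examples), and every field follows the "last occurrence wins" rule stated above. parse_fields is a hypothetical name, and extension fields, which keep all occurrences, are treated differently in section 4.2.

```python
import re

# COMM, optional white space, FIELD-NAME (a PROPERWORD), ":", then the value.
_FIELD = re.compile(r"[#;]\s*([A-Za-z][A-Za-z0-9_-]*):\s*(.*)")


def parse_fields(lines):
    """Map lower-cased field names to values; a later duplicate replaces an earlier one."""
    fields = {}
    for line in lines:
        m = _FIELD.fullmatch(line.rstrip())
        if m is None:
            raise ValueError(f"malformed field line: {line!r}")
        # Names are case-insensitive, so 'Beamline' and 'BEAMLINE' are the same field.
        fields[m.group(1).lower()] = m.group(2)
    return fields
```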
Following the dividing line at the end of the header fields subsection is the area of the header that contains user comments. Please note that this area is reserved for comments supplied by the experimenter and must not be used by software as a place to store other information. Refer to section 4.2 for information about using extension fields for this purpose.
COMMENT-LINE = COMM *VCHAR EOL
COMMENTS     = *COMMENT-LINE HEADER-END
As with the header fields, this subsection may contain no comment lines, or lines that contain no comment text, but it must end with a dividing line. When extracting the comment subsection from an XDI file, software should remove a single leading space and any trailing white space from each comment line but must not otherwise alter the line’s contents.
Applications must preserve all user comments.
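The extraction rule above is small enough to state directly in code. This Python sketch (extract_comments is a hypothetical name) drops the comment character, at most one leading space, and trailing white space, and leaves everything else, including extra indentation, untouched.

```python
def extract_comments(lines):
    """Return the text of COMMENT-LINEs with only the permitted trimming applied."""
    out = []
    for line in lines:
        if line[:1] not in ("#", ";"):
            raise ValueError(f"comment line lacks a comment character: {line!r}")
        text = line[1:]
        if text.startswith(" "):
            text = text[1:]  # remove a single leading space, no more
        out.append(text.rstrip())  # remove trailing white space
    return out
```

Empty comment lines are kept as empty strings, so the user's layout survives a round trip.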
The final line of the XDI header contains the labels for each column of data in the data section of the file, separated by white space. There must be one label present for each column of data present in the data section.
LABEL  = *WORD
LABELS = COMM *LABEL EOL
The number of column labels must equal the number of columns of data in the data section.
Note that each column label must be a word and that white space delimits the labels. For specific column labels which, in natural language, would consist of two or more words, the use of CamelCase or underscores is recommended.
The data section of the file contains white-space-delimited columns of floating-point numbers. If the abscissa is not explicitly identified using the ABSCISSA header, then the first column of this section must contain the abscissa. The remaining columns must correspond to experimental values at that abscissa. If the abscissa is not the photon energy, then the ABSCISSA header must define a math expression for converting the abscissa column to energy.
DATA-LINE = *FLOAT EOL
DATA      = *DATA-LINE
Blank lines in this section must be discarded. The number of columns must be the same for all lines that contain data. Any column containing a measurement of time must be represented as floating-point numbers.
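The label and data rules can be enforced together: checking every data line against the number of labels covers both "one label per column" and "the same number of columns on every line". A minimal Python sketch with hypothetical names parse_labels and parse_data; it assumes Python's float() is acceptable for the FLOAT rule once Fortran-style "d"/"D" exponents are rewritten.

```python
def parse_labels(line):
    """Split the LABELS line into column labels, dropping the comment character."""
    return line[1:].split()


def _to_float(token):
    # FLOAT allows Fortran-style 'd'/'D' exponents; Python's float() does not.
    return float(token.replace("d", "e").replace("D", "E"))


def parse_data(lines, labels):
    """Parse DATA-LINEs into rows of floats, one value per labelled column."""
    rows = []
    for line in lines:
        tokens = line.split()
        if not tokens:  # blank lines must be discarded
            continue
        if len(tokens) != len(labels):
            raise ValueError(
                f"expected {len(labels)} columns, got {len(tokens)}: {line!r}")
        rows.append([_to_float(t) for t in tokens])
    return rows
```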
When present, the following header fields must comply with their associated parsing rules. Any fields which fail to do so must be ignored by preprocessing and analysis software. The text in brackets after each field name gives a quick overview of the expected format, and the line below it is an example of a valid value. The grammar rule for the field follows its description.
-
Abscissa: [math expression]
$1
This field identifies the column containing the abscissa of the data contained in the file. The math expression can be used to specify how to convert the specified column to energy. For instance, if data are recorded as a function of encoder value, then the math expression might be something like
12398.61 / (2 * dspacing) / sin( $1 / (57.29577951 * stpdeg) )
where 12398.61 is hc in eV·Å units, 57.29577951 is the number of degrees in one radian, and dspacing and stpdeg would be replaced by the d-spacing and the number of steps per degree for that monochromator.
ABSCISSA = COMM "Abscissa" ": " MATH EOL
-
Beamline: [text]
APS ID10
The location where the experiment was performed.
BEAMLINE = COMM "Beamline" ": " *WORD EOL
-
D_spacing: [float]
3.13555
The interplanar spacing of the monochromator's crystals, in Ångstroms.
DSPACING = COMM "D_spacing" ": " FLOAT EOL
-
Edge_energy: [float]
5465
The energy reference of the scan -- often the zero-valent edge energy of the absorbing atom -- as defined in the data acquisition software, in electron volts.
EDGEENERGY = COMM "Edge_energy" ": " FLOAT EOL
-
End_time: [timestamp]
2003-04-01T13:01:02
The date and time that this scan ended, using the timestamp format specified in ISO 8601. The example above represents one minute and two seconds after one o'clock in the afternoon of April 1, 2003.
DATETIME = 4DIGIT "-" 2DIGIT "-" 2DIGIT "T" 2DIGIT ":" 2DIGIT ":" 2DIGIT
ENDTIME  = COMM "End_time" ": " DATETIME EOL
-
Mu_fluorescence: [math expression]
$4/$2
The math expression for calculating the μ(E) of fluorescence from this file's data section.
MUFLUOR = COMM "Mu_fluorescence" ": " MATH EOL
-
Mu_reference: [math expression]
ln($3/$5)
The math expression for calculating the μ(E) of the reference from this file's data section.
MUREF = COMM "Mu_reference" ": " MATH EOL
-
Mu_transmission: [math expression]
ln($2/$3)
The math expression for calculating the μ(E) of transmission from this file's data section.
MUTRANS = COMM "Mu_transmission" ": " MATH EOL
-
Start_time: [timestamp]
2003-04-01T13:01:02
The date and time that this scan started, using the timestamp format specified in ISO 8601. The example above represents one minute and two seconds after one o'clock in the afternoon of April 1, 2003.
DATETIME  = 4DIGIT "-" 2DIGIT "-" 2DIGIT "T" 2DIGIT ":" 2DIGIT ":" 2DIGIT
STARTTIME = COMM "Start_time" ": " DATETIME EOL
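Two of the value types above need more than a float conversion: math expressions and timestamps. The Python sketch below handles both under stated assumptions. In evaluate, $N is taken to mean column N counting from 1 (as in "ln($2/$3)"); the expression is parsed with Python's ast module and only arithmetic, column references, and a small whitelist of functions are allowed. The whitelist (ln, exp, sin, cos) is an assumption of this sketch, not part of the specification, and named constants such as dspacing must be substituted before evaluation. parse_timestamp reads the DATETIME form. Both function names are hypothetical.

```python
import ast
import math
import re
from datetime import datetime

# "ln" and "sin" appear in the examples above; "exp" and "cos" are assumptions.
_FUNCS = {"ln": math.log, "exp": math.exp, "sin": math.sin, "cos": math.cos}
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name,
            ast.Load, ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div,
            ast.USub, ast.UAdd)


def evaluate(expr, row):
    """Evaluate an XDI math expression such as 'ln($2/$3)' for one data row."""
    source = re.sub(r"\$(\d+)", r"col_\1", expr)  # $2 -> col_2
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"unsupported syntax in {expr!r}: {type(node).__name__}")
        if isinstance(node, ast.Call) and not (
                isinstance(node.func, ast.Name) and node.func.id in _FUNCS):
            raise ValueError(f"unsupported function in {expr!r}")
    names = {f"col_{i}": value for i, value in enumerate(row, start=1)}
    names.update(_FUNCS)
    # Builtins are removed; only column values and whitelisted functions are visible.
    return eval(compile(tree, "<xdi-math>", "eval"), {"__builtins__": {}}, names)


def parse_timestamp(value):
    """Parse the DATETIME form used by Start_time and End_time."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
```

With these, the scan duration is parse_timestamp(end) minus parse_timestamp(start), and a Mu_transmission of ln($2/$3) becomes evaluate("ln($2/$3)", row) for each data row. A reference to a column beyond the end of the row raises NameError.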
Extension fields are fields present in the header of an XDI file that are not defined in that file’s version of XDI. Such fields are interpreted as having values of free-form text. Any field not defined in section 4.1 must be considered an extension field, providing backwards compatibility between different minor versions of this specification.
EXT-FIELD-NAME = PROPERWORD *("-" PROPERWORD)
EXT-FIELD      = COMM EXT-FIELD-NAME ": " *VCHAR EOL
Data acquisition systems and data analysis packages may embed additional information in an XDI file by adding extension fields to the header. Extension fields created by applications should begin with a form of the application name used in the version line, followed by a hyphen. For example, the data file in appendix A contains extension fields such as MX-SRSS, where MX is the name of the data acquisition software at that beamline. This convention prevents field name collisions between different applications and between applications and future versions of this specification.
Applications that read XDI files may attempt to parse the values of extension fields to extract additional information about the scan. They should propagate these fields into output files they create, and must propagate the associated version information if they do so.
When multiple occurrences of the same extension field are present, the values of all occurrences must be preserved. In this respect, extension fields are interpreted differently from defined headers.
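The difference between the two kinds of field can be captured with two dictionaries: one holding the last value of each defined field, and one holding every value of each extension field in file order. In this Python sketch, collect_fields is a hypothetical name, the set of defined names is taken from section 4.1 of XDI 1.0, and names are lower-cased because field names are case-insensitive.

```python
import re

# COMM, optional white space, field name, ":", value (as in the examples).
_FIELD = re.compile(r"[#;]\s*([A-Za-z][A-Za-z0-9_-]*):\s*(.*)")

# Defined fields of XDI 1.0 (section 4.1), lower-cased.
DEFINED = {"abscissa", "beamline", "d_spacing", "edge_energy", "end_time",
           "mu_fluorescence", "mu_reference", "mu_transmission", "start_time"}


def collect_fields(lines):
    """Return (defined, extensions) from the header field lines.

    Defined fields keep only their last value; extension fields keep
    every value in file order.
    """
    defined, extensions = {}, {}
    for line in lines:
        m = _FIELD.fullmatch(line.rstrip())
        if m is None:
            raise ValueError(f"malformed field line: {line!r}")
        name, value = m.group(1).lower(), m.group(2)
        if name in DEFINED:
            defined[name] = value
        else:
            extensions.setdefault(name, []).append(value)
    return defined, extensions
```

A writer that propagates the extension fields can then emit each stored value in order, alongside the application entries from the version line.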
Related extension fields (i.e. related by being relevant to the data acquisition software or to a particular data analysis package) should be prefixed by a string identifying that software. If that software is identified in the version line, then this string should be the same as the string used in the version line. For example, the XDAC program in wide use at NSLS identifies the seconds per point of a measurement with the string SPP. Thus the extension field would look something like this:
# XDAC-SPP: 2 2 1k
An XDI implementation must recognize the following common extension field namespaces and must process them in a way that preserves their hierarchical relationship. The namespaces are widely relevant to the measurement of XAS data.
- Ring-: This identifies conditions of the storage ring. Specific fields are not defined, but are intended to convey information about the storage ring. Some examples:
  - Ring-energy: Energy of the stored beam
  - Ring-current: Amount of current stored in the ring
- Source-: This identifies properties of the source. Specific fields are not defined, but are intended to convey information about the source of photons. Some examples:
  - Source-harmonic: The harmonic setting of an undulator
  - Source-taper: The extent of taper applied to an undulator
  - Source-gap: The gap setting of a wiggler
  - Source-anode: The anode material of a rotating-anode x-ray source with a Rowland circle
- Optics-: This identifies properties of the optics in use at the beamline. Specific fields are not defined, but are intended to convey information about how photons are conditioned at the beamline. Some examples:
  - Optics-collimation: Describes the state of a collimating mirror
  - Optics-mono: Specifies the monochromator material
  - Optics-detune: Describes the detuning state of the mono second crystal
  - Optics-focusing: Describes the state of a focusing mirror
  - Optics-harmonic_rejection: Describes the state of a harmonic rejection mirror
  Other things conveyed in the Optics- namespace might be the details of a four-bounce monochromator, the use of asymmetric crystals, the details of a polychromator, the state of filters inserted in the beam, and so on.
- Sample-: This identifies properties of the sample for which data is contained in the file. Specific fields are not defined, but are intended to convey information about how the sample is prepared for measurement. Some examples:
  - Sample-name: A description of the sample
  - Sample-formula: The stoichiometric formula of the sample
  - Sample-preparation: How the sample was prepared for mounting in the beam, e.g. "powder on tape"
  - Sample-reference: What material is used in a reference channel
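One way to preserve the hierarchical relationship of these namespaces is to split each extension field name on its hyphens and store the value in a nested dictionary. The Python sketch below is one possible approach, not a required representation; nest_extensions is a hypothetical name, underscores are kept inside a name part, and names are lower-cased because field names are case-insensitive.

```python
def nest_extensions(extensions):
    """Arrange extension fields into a tree keyed by their hyphen-separated parts.

    {"Ring-current": v} becomes {"ring": {"current": v}}.
    """
    tree = {}
    for name, value in extensions.items():
        *parents, leaf = name.lower().split("-")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{name!r} clashes with a shorter field name")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{name!r} clashes with a longer field name")
        node[leaf] = value
    return tree
```

The clash checks matter when one field name is a prefix of another (for example a hypothetical MX-Settling alongside MX-Settling-time), since a single key cannot hold both a value and a sub-tree.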
Having defined the rules of the defined header fields, it is now possible to create a complete version of the FIELDS rule that was provisionally defined in section 3.4.2. The complete XDI grammar is found in appendix B. Section B.5 shows the complete definition of the header fields.
There are several limitations to the current specification and grammar. Errors can be reported and improvements suggested at the XDI issue tracker.
Here is an example of a file conforming to this specification and providing substantial metadata. This was edited by hand from a real data file measured at beamline ID10 at the APS in 2005. The lines beginning MX- are extension fields denoting parameters of the MX data acquisition system in use at the beamline.
# XDI/1.0 MX/2.0
# Beamline: APS 10ID
# Source: undulator a
# Start_time: 2005-03-08T20:08:57
# Edge_energy: 7112.00
# Abscissa: $1
# Mu_transmission: ln($2/$3)
# Mu_fluorescence: $4/$2
# Mu_reference: ln($3/$5)
# Ring-energy: 7.00
# Optics-undulator_harmonic: 3
# Optics-crystal: Si 111
# Optics-collimation: none
# Optics-focusing: none
# Optics-harmonic_rejection: flat Rh-coated mirror
# MX-Num-regions: 1
# MX-SRB: 6900
# MX-SRSS: 0.5
# MX-SPP: 0.1
# MX-Settling-time: 0
# MX-Offsets: 11408.00 11328.00 13200.00 10774.00
# MX-Gains: 8.00 7.00 7.00 9.00
#///
# Fe K-edge, Lepidocrocite powder on kapton tape, RT
# 4 layers of tape
# exafs, 20 invang
#---
# energy mcs3 mcs4 mcs6 mcs5
6899.9609 48120 19430 2250 54540
6900.1421 48390 19540 2260 54860
6900.5449 48520 19610 2250 55110
6900.9678 48930 19780 2280 55650
6901.3806 48460 19590 2250 55110
(....etc....)
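A compact reader for files like this one might look as follows. It is a sketch under simplifying assumptions, not a validating parser: it expects both separators and a label line, skips field lines it cannot match, keeps every occurrence of each field in a list (so a defined field's value is the last element), and uses float() after rewriting "d"/"D" exponents. read_xdi and mu_transmission are hypothetical names, and the "(....etc....)" placeholder line must be removed before the example above can be read.

```python
import math
import re

FIELD_END = re.compile(r"[#;]/{2,}")    # COMM 2*"/"
HEADER_END = re.compile(r"[#;]-{2,}")   # COMM 2*"-"
FIELD = re.compile(r"[#;]\s*([A-Za-z][A-Za-z0-9_-]*):\s*(.*)")


def read_xdi(text):
    """Read a compliant XDI file into version, fields, comments, labels and rows."""
    lines = [line.rstrip() for line in re.split(r"\r\n|\r|\n", text)]
    n = 0
    while n < len(lines) and lines[n][:1] in ("#", ";"):
        n += 1
    header, body = lines[:n], lines[n:]
    fe = next((i for i, line in enumerate(header) if FIELD_END.fullmatch(line)), None)
    he = next((i for i, line in enumerate(header) if HEADER_END.fullmatch(line)), None)
    if fe is None or he is None or he < fe or he + 1 >= len(header):
        raise ValueError("missing separator or label line")
    fields = {}
    for line in header[1:fe]:
        m = FIELD.fullmatch(line)
        if m:  # lines that do not look like fields are skipped in this sketch
            fields.setdefault(m.group(1).lower(), []).append(m.group(2))
    comments = [line[2:] if line[1:2] == " " else line[1:] for line in header[fe + 1:he]]
    labels = header[he + 1][1:].split()
    # 'd'/'D' exponents are Fortran style; Python's float() expects 'e'/'E'.
    rows = [[float(t.replace("d", "e").replace("D", "E")) for t in line.split()]
            for line in body if line.strip()]
    return {"version": header[0][1:].split(), "fields": fields,
            "comments": comments, "labels": labels, "rows": rows}


def mu_transmission(rows, i0=1, it=2):
    """ln(I0/It) per row; the 0-based defaults match 'Mu_transmission: ln($2/$3)'."""
    return [math.log(row[i0] / row[it]) for row in rows]
```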
The XDI grammar, in a single file
XDI = VERSION [FIELDS] [COMMENTS] [LABELS] DATA
OCTET   = %x00-FF           ; 8 bits of data
UPALPHA = %x41-5A           ; upper case letters A - Z
LOALPHA = %x61-7A           ; lower case letters a - z
CHAR    = %x01-7F           ; any 7-bit US-ASCII character, excluding NUL
VCHAR   = %x21-7E           ; visible (printing) characters, 7-bit (US-ASCII)
ALPHA   = UPALPHA / LOALPHA ; US-ASCII letters
DIGIT   = %x30-39           ; digits 0 - 9
CTL     = %x00-1F / %x7F    ; control characters (octets 0 - 31) and DEL (127)
CR      = %x0D              ; carriage return
LF      = %x0A              ; line feed
CRLF    = CR LF             ; MS newline = carriage return line feed
SP      = %x20              ; space
HT      = %x09              ; horizontal tab
WS      = SP / HT           ; white space
EOL     = CR / LF / CRLF    ; cross-platform end-of-line
SIGN       = "+" / "-"
EXPONENT   = ("e" / "E" / "d" / "D") [SIGN] *DIGIT
NUMBER     = *DIGIT ["." *DIGIT] [EXPONENT]
INF        = ("i" / "I") ("n" / "N") ("f" / "F")
NAN        = ("n" / "N") ("a" / "A") ("n" / "N")
FLOAT      = [SIGN] (NUMBER / INF / NAN)
DATETIME   = 4DIGIT "-" 2DIGIT "-" 2DIGIT "T" 2DIGIT ":" 2DIGIT ":" 2DIGIT
TEXT       = %x09 / %x20-FF ; any OCTET except CTLs, including WS
COMM       = "#" / ";"
PROPERWORD = ALPHA *(ALPHA / DIGIT / "_" / "-")
WORD       = 1*(ALPHA / DIGIT / "_")
MATH       = ["ln"] *("-" / "+" / "*" / "$" / "/" / "(" / ")" / DIGIT)
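The FLOAT rule differs from what many languages accept by default, chiefly in allowing Fortran-style "d"/"D" exponents. A Python sketch of a matching reader follows; parse_float is a hypothetical name, and the pattern deliberately requires at least one digit in the mantissa and in any exponent, which is stricter than the *DIGIT repetitions in the rule above (those would also admit an empty string).

```python
import re

# [SIGN] (NUMBER / INF / NAN), with "e", "E", "d" or "D" exponents.
_FLOAT = re.compile(
    r"[+-]?(?:(?:\d+\.?\d*|\.\d+)(?:[eEdD][+-]?\d+)?|[iI][nN][fF]|[nN][aA][nN])")


def parse_float(token):
    """Convert one FLOAT token to a Python float, rejecting anything else."""
    if not _FLOAT.fullmatch(token):
        raise ValueError(f"not an XDI FLOAT: {token!r}")
    # Python's float() accepts inf/nan in any case but only 'e'/'E' exponents.
    return float(token.replace("d", "e").replace("D", "E"))
```

Validating with the pattern first avoids silently accepting spellings such as "infinity" or "1_000" that Python's float() allows but the grammar does not.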
FIELD-END  = COMM 2*"/" EOL
HEADER-END = COMM 2*"-" EOL
XDI-VERSION  = "XDI/" 1*DIGIT "." 1*DIGIT
APPLICATIONS = 1*VCHAR
VERSION      = COMM XDI-VERSION *APPLICATIONS EOL
ABSCISSA      = COMM "Abscissa" ": " MATH
BEAMLINE      = COMM "Beamline" ": " 1*WORD
DSPACING      = COMM "D_spacing" ": " FLOAT
EDGEENERGY    = COMM "Edge_energy" ": " FLOAT
ENDTIME       = COMM "End_time" ": " DATETIME
MUFLUOR       = COMM "Mu_fluorescence" ": " MATH
MUREF         = COMM "Mu_reference" ": " MATH
MUTRANS       = COMM "Mu_transmission" ": " MATH
STARTTIME     = COMM "Start_time" ": " DATETIME
DEFINEDFIELDS = *( ( ABSCISSA / BEAMLINE / DSPACING / EDGEENERGY / STARTTIME / ENDTIME / MUFLUOR / MUREF / MUTRANS / EXT-FIELD / FIELD-LINE ) EOL ) FIELD-END
FIELD-LINE     = DEFINEDFIELDS
EXT-FIELD-NAME = WORD *("-" WORD)
FIELD-LINE     = COMM FIELD-NAME ": " 1*WORD EOL
EXT-FIELD      = COMM EXT-FIELD-NAME ": " 1*VCHAR EOL
FIELDS = *(FIELD-LINE / EXT-FIELD) FIELD-END
COMMENT-LINE = COMM *VCHAR EOL
COMMENTS     = *COMMENT-LINE HEADER-END
LABEL  = PROPERWORD
LABELS = COMM 1*LABEL EOL
DATA-LINE = *FLOAT EOL
DATA      = *DATA-LINE