Xdispec
This document describes the XAS Data Interchange Format (XDI), version 1.0, a simple file format for a single X-ray Absorption Spectroscopy (XAS) measurement. We are defining this format to accomplish the following goals:
- Establish a common language for transferring data between XAS beamlines, XAS experimenters, data analysis packages, web applications, and anything else that needs to process XAS data.
- Increase the relevance and longevity of experimental data by reducing the amount of data archaeology that future interpretations of that data will require.
- Enhance the user experience by promoting interoperability among data acquisition systems and data analysis packages.
- Provide a mechanism for extracting and preserving a single XAS-like data set from a related experiment (for example, a DAFS measurement) or from a complex data structure (for example, a database or a hierarchical data file used to store a multi-spectral data set).
This format is intended to encode single-scan data files with metadata. It is not intended to encode relationships between many XAS measurements or between an XAS measurement and other parts of a multi-spectral experiment.
In order to fulfill these goals, XDI files provide a flexible, consistent representation of information common to all XAS experiments. This format is simpler than a format based on XML, HDF, or a database; it yields self-documenting files; and it is easy for both humans and computers to read. Its structure is inspired by that of Internet electronic mail, a plain-text data format that has proven to be robust, extensible, and enduring. Because of these advantages, and because we intend to develop free software tools and libraries that support XDI, we hope that the file format described in this specification will see wide adoption in the XAS community.
We do not intend this specification to dictate the file formats used by data acquisition systems during XAS experiments, although this is certainly a suitable format for that purpose. Any attempt to do so would be unreasonable due to the number of different data acquisition systems currently deployed at synchrotrons around the world, the variety of experiments performed at these installations, and the continuing development of new experimental techniques. Instead, this specification addresses the representation of a single scan of XAS data after an experiment has been completed.
A beamline that adopts this specification shall either use this format as its native file format or shall provide its users with tools that convert between its native file formats and XDI. In short, users will go home with their XAS data stored in this format. We intend to encourage this practice by developing tools for reading, editing, writing, and validating XDI files. Beamlines may choose to modify their data acquisition systems to write data using this format where appropriate. We plan to assist in this effort by developing libraries for popular programming languages that can read, manipulate, and write XDI files.
With their experiment data stored in XDI files, users may choose data analysis packages which are capable of reading this format. It is our hope that, as this specification gains wider adoption, users will ultimately be freed from the responsibility of understanding file formats. With this aim in mind, we shall assist software developers in supporting XDI files.
XDI files contain two sections: a header with information about one scan of an XAS experiment, and the data collected during that scan. The header consists of versioning information, a series of fields that each contain a single piece of information, an area for users to store comments about the experiment, and a sequence of labels for the columns of data. The data section contains these columns, with each row corresponding to one point of the scan.
Although the header has been designed to contain arbitrary information, the meanings of several fields are explicitly defined. These fields, described below, contain the most common information about XAS experiments. We hope that users will benefit from their existence when using data analysis packages that support XDI files. However, none of the defined fields are required to be present. For example, some of these fields may not be appropriate for certain experiments and should be omitted in that case.
Some examples of header information follow. A complete list of defined headers along with their specifications is found in Sec. 4.1.
- D_spacing: the d-spacing of the monochromator crystal used to collect the data.
- Beamline: the location where the experiment was performed.
- Edge Energy: edge energy value defined by the data acquisition software.
- Source: the type of x-ray source used in the experiment.
- Timestamps: start and end times of this scan.
- Mu expressions: math expressions for calculating experimental spectra from the data columns.
This section of the XDI specification formally describes the structure of XDI files.
The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in RFC 2119, "Key words for use in RFCs to Indicate Requirement Levels".
An XDI implementation is not compliant if it fails to satisfy one or more of the must or required level requirements presented in this specification.
All of the representations defined in this document are described both in prose and using Augmented Backus-Naur Form (ABNF). The syntax used in these grammars is defined in RFC 5234, "Augmented BNF for Syntax Specifications: ABNF". Software developers who wish to implement support for XDI files themselves will need to be familiar with this notation to understand this specification. Section 3 of RFC 5234 is of particular relevance to the notation conventions used in this document. The repetition syntax for grammar rules is a bit peculiar and is described in sections 3.6 and 3.7 of RFC 5234. Repetition is very important in this specification. Here is a short summary of the repetition rules from the RFC:
- DIGIT means exactly one instance of that rule; it is synonymous with 1DIGIT.
- 1*DIGIT means one or more repetitions and 2*DIGIT means two or more repetitions; a bare *DIGIT, with no lower bound, means zero or more repetitions.
- 1*2DIGIT indicates a range of one to two repetitions.
- 3DIGIT means exactly three repetitions.
The basic rules used throughout this section to define parsing constructs are presented in sections B.2 and B.3 of the appendix as part of the complete grammar. Where a parsing rule consists of a sequence of multi-character tokens, the tokens must be delimited by white space unless they can be identified unambiguously.
The header and data sections of an XDI file consist of structured US-ASCII text. Header field values that are "free-form" or "text" may contain UTF-8 encoded Unicode text, although Unicode support in applications that use XDI files is OPTIONAL. The US-ASCII coded character set is formally defined by ANSI X3.4-1986. The Universal Character Set (Unicode) is defined by ISO/IEC 10646. The UTF-8 transformation format is defined by IETF RFC 3629.
The header section of an XDI file appears at the beginning of the file and is comprised of structured text. Every line of the header must begin with a comment character and must end with an end-of-line sequence, both of which are defined below. There are no multi-line headers. Lines may be of any length, but users of XDI should remember that XAS software may be implemented in a programming language without dynamic memory allocation (e.g. Fortran) and so should restrict lines to 2048 characters. Support for the POSIX, Apple, and Microsoft end-of-line conventions is provided to increase cross-platform portability.
    COMM = "#" / ";"
    EOL  = CR / LF / CRLF
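As a sketch of how a reader might honour all three end-of-line conventions, the Python function below splits on CRLF, CR, or LF. The name split_xdi_lines is hypothetical and is not part of any published XDI library. The function deliberately avoids str.splitlines(), because that method also breaks on characters such as form feed, which the EOL rule does not include.

```python
import re

# CRLF is listed first so that "\r\n" is consumed as a single end-of-line
# instead of a CR followed by an LF (which would yield a spurious empty line).
_EOL = re.compile(r"\r\n|\r|\n")


def split_xdi_lines(text):
    """Split XDI text on CR, LF, or CRLF line endings."""
    lines = _EOL.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # the text ended with an end-of-line
    return lines


if __name__ == "__main__":
    sample = "# XDI/1.0\r\n# Beamline.name: APS 10ID\r#///\n"
    print(split_xdi_lines(sample))
    # ['# XDI/1.0', '# Beamline.name: APS 10ID', '#///']
```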
Header lines are subdivided into four optional subsections (versioning information, header fields, user comments, and column labels), with two separators that are required when those subsections are present. These are organized in the following sequence:
- The required first line of the file is the version line.
- This is followed by zero or more header fields, which can be defined headers or extension headers. These two header types are explained in Sec. 3.4.2.
- The header fields are separated from the user comments by the FIELD-END rule, which is a comment character followed by two or more slashes (/) followed by an end-of-line.
- The comment section is for user-supplied, free-format text. Each line begins with a comment character and ends with an end-of-line.
- The comment section ends with the HEADER-END rule, which is a line that starts with a comment character, continues with two or more dashes, and ends with an end-of-line.
- The last line before the data is a line of column labels which identify the columns of data. There should be as many labels as there are columns. The label line begins with a comment character and ends with an end-of-line.
All header sections are optional. In a file that does not follow the XDI specification but which contains an obvious header (obvious in the sense that lines begin with a comment character and end with an end-of-line), the obvious header lines should be interpreted as user comments.
The optional status of the headers is to accommodate data files which contain obvious header lines but which are not compliant with this specification. In that case, all header lines are interpreted as user comments. Of course, in that case few of the advantages of the XDI format are realized.
The separator lines (FIELD-END and HEADER-END) serve specific syntactic purposes in the XDI grammar. The line of dashes is a common visual cue denoting the end of the headers and the beginning of the data. The FIELD-END line separates field lines from free-format user comments, which may resemble header fields or other grammatical constructs. Similarly, the HEADER-END line distinguishes column labels from user comments, which are otherwise grammatically identical elements of the data file.
    FIELD-END  = COMM 2*"/" EOL
    HEADER-END = COMM 2*"-" EOL
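The subsection order and the two separator rules can be followed with a small state machine. The sketch below assumes a compliant file whose lines have already been split on EOL. The function name split_sections and the dictionary keys it returns are illustrative choices. The sketch does not validate individual lines, and it ignores comment lines that appear after the labels.

```python
import re

COMMENT_CHARS = ("#", ";")
FIELD_END = re.compile(r"[#;]/{2,}")   # COMM 2*"/"
HEADER_END = re.compile(r"[#;]-{2,}")  # COMM 2*"-"


def split_sections(lines):
    """Sort the lines of a compliant XDI file into header subsections and data."""
    out = {"version": None, "fields": [], "comments": [], "labels": None, "data": []}
    state = "version"
    for line in lines:
        if not line.strip():
            continue  # blank lines can only occur in the data section
        if not line.startswith(COMMENT_CHARS):
            out["data"].append(line)
            continue
        if state == "version":
            out["version"] = line
            state = "fields"
        elif state == "fields":
            if FIELD_END.fullmatch(line):
                state = "comments"
            else:
                out["fields"].append(line)
        elif state == "comments":
            if HEADER_END.fullmatch(line):
                state = "labels"
            else:
                out["comments"].append(line)
        elif state == "labels":
            out["labels"] = line
            state = "done"  # any further comment lines are ignored
    return out
```

Each separator only advances the state. As a result, a user comment that happens to look like a field (for example "# Sample.name: foo") is still classified as a comment once FIELD-END has been seen.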
The first line of the XDI header contains the XDI version to which the file conforms. XDI represents versions of the file format with a <major>.<minor> numbering scheme. The <minor> version is incremented when changes that do not affect compatibility with previous versions are made to the format, as when new header fields are defined. (A parser compliant with an earlier minor version would treat the newly defined header as an extension field. Propagated to an output file as an extension field, this field would then be interpreted correctly by a more recent parser.) The <major> version is incremented when other changes are made to the format, as when the definition of the contents of a defined header field is altered.
A series of optional version entries, separated by white space, may follow the XDI version. These version entries allow various programs to annotate the file as it proceeds through the collection and analysis process. Such annotation is optional, although software that creates XDI files containing extension fields must include its version information in this sequence (see section 4.2). The order of the optional version entries is undefined but should be preserved to accurately represent the sequence in which applications have manipulated the file.
    XDI-VERSION  = "XDI/" 1*DIGIT "." 1*DIGIT
    APPLICATIONS = 1*VCHAR
    VERSION      = COMM XDI-VERSION *APPLICATIONS EOL
Note that the XDI major and minor version numbers must be treated as integers that may contain more than a single digit. “XDI/1.12” is a higher (more recent) version than “XDI/1.2”.
This specification does not impose a restriction on how applications identify and version themselves. However, a single application must identify and version itself using a single text sequence without white space. Some acceptable examples follow. The first example shows an application that uses the same format as the XDI version rule, which is the recommended format for application versioning. The second shows the data acquisition and data processing programs identified by name but without version numbers. The third shows an arbitrary method of versioning an application.
    # XDI/1.0 Datacollectatron/7.75
    # XDI/1.0 XDAC Athena
    # XDI/1.0 XAS!Collect-3000
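A reader might split the version line on white space, treat the first token as the XDI version, and keep the remaining tokens as application identifiers. The following Python sketch does that; parse_version_line is a hypothetical helper. It converts the major and minor numbers to integers, so that XDI/1.12 compares as newer than XDI/1.2, as the note above requires.

```python
import re

# "XDI/" major "." minor, each part being one or more ASCII digits.
_XDI_VERSION = re.compile(r"XDI/([0-9]+)\.([0-9]+)")


def parse_version_line(line):
    """Return ((major, minor), [application tokens]) for an XDI version line."""
    if not line or line[0] not in "#;":
        raise ValueError("version line must start with a comment character")
    tokens = line[1:].split()
    if not tokens:
        raise ValueError("missing XDI version")
    m = _XDI_VERSION.fullmatch(tokens[0])
    if m is None:
        raise ValueError(f"not an XDI version: {tokens[0]!r}")
    # Integer tuples compare numerically: (1, 12) > (1, 2).
    version = (int(m.group(1)), int(m.group(2)))
    return version, tokens[1:]
```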
The lines immediately following the version line of the header contain the fields of the header. These fields are arranged in a manner similar to the header of an Internet electronic mail message, although XDI fields shall not span multiple lines. Each field consists of a case-insensitive name, a separating colon, and an associated value. When multiple occurrences of the same field are present, the value of the last occurrence must be used as the value for the field.
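Case-insensitive names and the last-occurrence rule map naturally onto a dictionary keyed by the lower-cased field name. This is a minimal sketch, and collect_fields is a hypothetical name. It assumes each line has the form "# Name: value" and splits at the first colon, so that colons inside values such as timestamps survive. It does not check the name against the field grammar.

```python
def collect_fields(field_lines):
    """Map lower-cased field names to values; later duplicates overwrite earlier ones."""
    fields = {}
    for line in field_lines:
        body = line[1:]  # drop the comment character
        name, sep, value = body.partition(":")
        if not sep:
            continue  # no colon: not a field line, skipped in this sketch
        fields[name.strip().lower()] = value.strip()
    return fields
```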
Except in the case of a header whose value has a required structure, values are assumed to be free-form text. The defined fields are described in section 4.1, and the complete definition of the FIELDS rule may be found in section B.5. A complete explanation of header fields is found in section 4.
The header fields subsection ends with a FIELD-END line. This line consists of a comment character, followed by two or more forward slash characters (/), and ends with an end-of-line.
Following the dividing line at the end of the header fields subsection is the area of the header that contains user comments. Please note that this area is reserved for comments supplied by the experimenter and must not be used by software as a place to store other information. Refer to section 4.3 for information about using extension fields for this purpose.
    COMMENT-LINE = COMM *VCHAR EOL
    COMMENTS     = *COMMENT-LINE HEADER-END
This section may contain no lines of commentary, or lines that contain no comment text. It must end with the HEADER-END dividing line, which is a line starting with a comment character, containing two or more dashes, and ending with an end-of-line.
When extracting the comment subsection from an XDI file, software may remove a single leading space and any trailing white space from each comment line but must not further alter the line’s contents, including any interior whitespace.
Applications must preserve all user comments.
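The trimming rule for comment lines can be expressed in a few lines of Python. The helper name extract_comment is hypothetical. The sketch removes the comment character, at most one leading space, and trailing white space, and leaves interior white space untouched.

```python
def extract_comment(line):
    """Return the text of one user-comment line, trimmed as the text allows."""
    text = line[1:]  # drop "#" or ";"
    if text.startswith(" "):
        text = text[1:]  # only a single leading space may be removed
    return text.rstrip()  # trailing white space may be removed
```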
The final line of the XDI header contains the labels for each column of data in the data section of the file, separated by white space. There must be one label present for each column of data present in the data section.
    LABEL  = WORD
    LABELS = COMM 1*LABEL EOL
The number of column labels must equal the number of columns of data in the data section.
Note that each column label must be a word and that white space delimits the labels. For specific column labels which, in natural language, would consist of two or more words, the use of CamelCase or underscores is recommended.
It is recommended that the column labels be those defined in section 4.2 for use with the headers in the Column namespace.
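One way to enforce the label rules is to split the label line on white space, check that each label is a single word, and compare the count with the number of columns in a data line. In the sketch below, check_labels is a hypothetical name. The word pattern [A-Za-z0-9_]+ is an assumption based on the WORD rule in the appendix.

```python
import re

# Assumed word pattern, following the appendix WORD rule: 1*(ALPHA / DIGIT / "_").
_WORD = re.compile(r"[A-Za-z0-9_]+")


def check_labels(label_line, first_data_line):
    """Return the column labels, or raise ValueError if a label is not a word
    or if the label count differs from the number of data columns."""
    labels = label_line[1:].split()
    bad = [lab for lab in labels if not _WORD.fullmatch(lab)]
    if bad:
        raise ValueError(f"labels are not single words: {bad}")
    ncols = len(first_data_line.split())
    if len(labels) != ncols:
        raise ValueError(f"{len(labels)} labels but {ncols} data columns")
    return labels
```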
The data section of the file contains white space delimited columns of floating-point numbers.
    DATA-LINE = *FLOAT EOL
    DATA      = *DATA-LINE
Blank lines in this section must be discarded. The number of columns must be the same for all lines that contain data. Any column containing time measurements must be represented as floating-point numbers.
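A data reader along these lines could validate each token against the FLOAT rule before converting it. Python's float() is not used on its own because it accepts forms such as "infinity" or "1_000" that FLOAT does not, and it rejects Fortran-style exponents such as "1.5d3". The sketch below assumes a slightly stricter reading of FLOAT, requiring at least one digit in the mantissa and in any exponent. It rewrites d/D exponents as e and rejects rows whose column count differs from the first row. parse_float and parse_data are hypothetical names.

```python
import re

# A reading of the appendix FLOAT rule that requires at least one digit in the
# mantissa and in the exponent; "d"/"D" exponents are accepted.
_FLOAT = re.compile(
    r"[+-]?(?:(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eEdD][+-]?[0-9]+)?"
    r"|[iI][nN][fF]|[nN][aA][nN])"
)


def parse_float(token):
    """Convert one data token, rejecting anything outside the FLOAT rule."""
    if not _FLOAT.fullmatch(token):
        raise ValueError(f"not an XDI FLOAT: {token!r}")
    return float(token.replace("d", "e").replace("D", "e"))


def parse_data(lines):
    """Parse data lines into rows of floats, discarding blank lines and
    requiring every row to have the same number of columns."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # blank lines must be discarded
        row = [parse_float(t) for t in tokens]
        if rows and len(row) != len(rows[0]):
            raise ValueError(
                f"line {lineno}: expected {len(rows[0])} columns, got {len(row)}"
            )
        rows.append(row)
    return rows
```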
When present, header fields must comply with the associated parsing rules. Any fields which fail to do so must be ignored by preprocessing and analysis software.
XDI fields use a simple namespace concept as their structure. Here is the grammar:
    PROPERWORD  = ALPHA *(ALPHA / DIGIT / "_" / "-")
    WORD        = *(ALPHA / DIGIT / "_" / "-")
    SEPARATOR   = "."
    VALUE       = *VCHAR
    FIELD-NAME  = PROPERWORD *(SEPARATOR WORD)
    FIELD-VALUE = *VALUE
    FIELD-LINE  = COMM FIELD-NAME ": " FIELD-VALUE EOL
    FIELDS      = *FIELD-LINE FIELD-END
Here are some examples which demonstrate both the format of the XDI field and the namespace concept:
    # Beamline.name: APS 20BM
    # Beamline.source: bend magnet
    # Column.1: energy eV
    # Column.4: i0
The name of the field is one or more words. The first word in the name must start with a letter. Subsequent words in the name must be separated by the separator character (.). The name must be followed by a colon (:), and the colon must be followed by the value of the field.
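The naming rules above can be checked with a single regular expression. In this Python sketch, parse_field_line is a hypothetical name. The sketch reads each WORD after a separator as non-empty and splits the line at the first colon. It returns None for lines that do not follow the grammar, since such fields must be ignored.

```python
import re

# PROPERWORD *(SEPARATOR WORD), reading each WORD as one or more characters.
FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*(?:\.[A-Za-z0-9_-]+)*")


def parse_field_line(line):
    """Split "# Namespace.tag: value" into (name, value), or return None if
    the line does not follow the field grammar."""
    if not line or line[0] not in "#;":
        return None
    name, sep, value = line[1:].partition(":")
    name = name.strip()
    if not sep or not FIELD_NAME.fullmatch(name):
        return None
    return name, value.strip()
```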
The namespaces are used to group related fields. The example above shows two namespaces: one groups together characteristics of the beamline at which the data were measured, and the other groups together hints about interpreting the columns in the data file.
There are two kinds of namespaces. Defined namespaces are defined in this specification. Extension namespaces may be added by application developers to insert metadata into the data file. Fields in both kinds of namespace must observe the grammar shown above.
The following namespaces are defined in this specification and are used to convey information of common interest to many beamlines and applications. Except for Beamline.d-spacing, individual fields are optional. The extensive use of these optional fields to fully identify the provenance of the data is recommended. As discussed in ???, Beamline.d-spacing is required in a valid XDI file.
A header in a defined namespace must not appear more than once in a file.
- Ring.: This namespace identifies conditions of the storage ring. Some examples:
  - Ring.energy: Energy of the stored current
  - Ring.current: Amount of current stored in the ring
- Beamline.: This namespace conveys information about the beamline at which the data were measured. Some examples:
  - Beamline.name: The name used to identify the beamline
  - Beamline.d-spacing: The d-spacing of the monochromator crystal used to measure the data (see Sec. ???)
- Source.: This namespace identifies properties of the source. Individual fields are not prescribed, but they are intended to convey information about the source of photons. Some examples:
  - Source.harmonic: The harmonic setting of an undulator
  - Source.taper: The extent of taper applied to an undulator
  - Source.gap: The gap setting of a wiggler
  - Source.anode: The anode material of a rotating-anode x-ray source with a Rowland circle
- Optics.: This namespace identifies properties of the optics in use at the beamline. Individual fields are not prescribed, but they are intended to convey information about how photons are conditioned at the beamline. Some examples:
  - Optics.collimation: Describes the state of a collimating mirror
  - Optics.mono: Specifies the monochromator material
  - Optics.detune: Describes the detuning state of the mono second crystal
  - Optics.focusing: Describes the state of a focusing mirror
  - Optics.harmonic-rejection: Describes the state of a harmonic rejection mirror
  - Other things conveyed in the Optics namespace might be the details of a four-bounce monochromator, the use of asymmetric crystals, the details of a polychromator, the state of filters inserted in the beam, and so on.
- Sample.: This namespace identifies properties of the sample whose data are contained in the file. Individual fields are not prescribed, but they are intended to convey information about how the sample is prepared for measurement. Some examples:
  - Sample.name: A description of the sample
  - Sample.formula: The stoichiometric formula of the sample
  - Sample.preparation: How the sample was prepared for mounting in the beam, e.g. "powder on tape"
  - Sample.reference: What material is used in a reference channel
- Time.: This namespace is used to specify creation times of the data file. Times must be specified according to the time stamp specification given in ISO 8601. An XDI file is non-compliant if its time stamps are not ISO 8601 compliant.
  - Time.start: The timestamp of the beginning of a data scan
  - Time.end: The timestamp of the end of a data scan
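ISO 8601 permits many forms, but the appendix DATETIME rule accepts only YYYY-MM-DDTHH:MM:SS. The Python sketch below checks that form with a regular expression and then uses datetime.strptime to reject impossible dates; parse_timestamp is a hypothetical name. Time zones and fractional seconds, which ISO 8601 allows, are deliberately not handled here.

```python
import re
from datetime import datetime

# The appendix DATETIME rule: 4DIGIT "-" 2DIGIT "-" 2DIGIT "T" 2DIGIT ":" 2DIGIT ":" 2DIGIT
_DATETIME = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}")


def parse_timestamp(value):
    """Return a datetime for a Time.start / Time.end value, or raise ValueError."""
    if not _DATETIME.fullmatch(value):
        raise ValueError(f"not a DATETIME: {value!r}")
    # strptime additionally rejects impossible values such as month 13.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
```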
The Column namespace is the mechanism by which XDI files provide hints about how to extract useful information from the columns in the data section of the file.
- All fields in this namespace must be of the form Column.N, where N represents an integer. The integer identifies a particular column in the data file, and the value of a Column field indicates the contents of that column.
- There are several defined column labels. These are words that must be used to describe a column when that column is present in the data file and identified among the header fields. The list of defined column labels is given below.
- A header defining the abscissa of the data is required. Data may be stored with reference to any reasonable abscissa, but that abscissa must be identified. Reasonable abscissa choices include energy (in units of eV or keV) or angle (in units of degrees, radians, or motor steps). If units of motor steps are chosen, then adequate information must be provided via headers in the Optics namespace to translate the abscissa into energy units.
- The header identifying the abscissa must provide two values: the column label for the abscissa and the corresponding units. It looks something like this: "# Column.1: energy eV". All other headers in the Column namespace must provide one value: the column label.
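The rules for the Column namespace can be sketched as follows. parse_columns is a hypothetical name, and it expects a dictionary mapping field names to values. It requires the energy or angle abscissa to carry exactly one of the units listed for it in this section, and every other Column field to carry a single label. Unit names are compared case-insensitively. This is an assumption made here so that the lower-case "ev" in the appendix example is accepted.

```python
import re

# Units accepted for each abscissa label, compared case-insensitively (an assumption).
ABSCISSA_UNITS = {"energy": {"ev", "kev"}, "angle": {"degrees", "radians", "steps"}}
_COLUMN = re.compile(r"column\.([0-9]+)", re.IGNORECASE)


def parse_columns(fields):
    """Return {column number: (label, units or None)} from Column.N fields.

    Names outside the Column namespace are ignored. Raises ValueError if a
    Column value is malformed or if no abscissa column is declared.
    """
    columns = {}
    for name, value in fields.items():
        m = _COLUMN.fullmatch(name)
        if m is None:
            continue
        parts = value.split()
        if not parts:
            raise ValueError(f"{name} has no column label")
        label, extra = parts[0], parts[1:]
        if label in ABSCISSA_UNITS:
            # The abscissa takes exactly two values: a label and its units.
            if len(extra) != 1 or extra[0].lower() not in ABSCISSA_UNITS[label]:
                raise ValueError(f"{name}: abscissa {label!r} needs one valid unit")
            columns[int(m.group(1))] = (label, extra[0])
        elif not extra:
            columns[int(m.group(1))] = (label, None)
        else:
            raise ValueError(f"{name}: expected a single column label")
    if not any(units for _, units in columns.values()):
        raise ValueError("no abscissa column declared")
    return columns
```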
Here is a list of defined column labels and their meanings, along with unit definitions for the abscissa. In addition to labels for the abscissa and various detectors, labels are provided for representing EXAFS data at various stages of processing: μ(E), normalized μ(E), χ(k), the Fourier transform of χ(k), and the Fourier-filtered χ(k).
    COL_LABEL   Meaning                              Choice of units (if required)
    ------------------------------------------------------------------------------
    energy      mono energy                          eV / keV
    angle       mono angle                           degrees / radians / steps
    i0          monitor intensity
    itrans      transmission intensity
    ifluor      fluorescence intensity
    irefer      reference intensity
    mutrans     mu transmission
    mufluor     mu fluorescence
    murefer     mu reference
    normtrans   normalized mu transmission
    normfluor   normalized mu fluorescence
    normrefer   normalized mu reference
    k           wavenumber
    chi         EXAFS
    r           radial distance
    chir_mag    magnitude of FT[chi(k)]
    chir_pha    phase of FT[chi(k)]
    chir_re     real part of FT[chi(k)]
    chir_im     imaginary part of FT[chi(k)]
    chi_mag     magnitude of filtered chi(k)
    chi_pha     phase of filtered chi(k)
    chi_re      real part of filtered chi(k)
    chi_im      imaginary part of filtered chi(k)
Extension fields are fields present in the header of an XDI file that are not defined in the XDI specification. Such fields must be structured by the same grammar as a defined field, but are interpreted as having values of free-form text. Any field not defined in section 4.1 must be considered an extension field, providing backwards compatibility between different minor versions of this specification.
    EXT-FIELD-NAME = PROPERWORD *("-" WORD)
    EXT-FIELD      = COMM EXT-FIELD-NAME ": " *VALUE EOL
Data acquisition systems and data analysis packages may embed additional information in an XDI file by adding extension fields to the header. Extension fields created by applications should begin with a form of the application name used in the version line, followed by a hyphen. (The example data file in appendix A shows fields such as MX.SRSS, where MX is the name of the data acquisition software at that beamline.) This practice prevents field name collisions between different applications and between applications and future versions of this specification.
Applications that read XDI files may attempt to parse the values of extension fields to extract the additional information about the scan. They should propagate these fields into output files they create, but must propagate the associated version information if they do so.
While multiple occurrences of the same field are discouraged, when they are present the value of the last occurrence must be preserved.
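The distinction between defined and extension fields, together with the last-occurrence rule, can be sketched as below. split_extension_fields is a hypothetical name. This sketch treats a field as "defined" when the first word of its name is one of the defined namespaces (Ring, Beamline, Source, Optics, Sample, Time, Column); that test is an assumption. Keys are lower-cased because field names are case-insensitive, while the original spelling is kept so that fields can be propagated unchanged.

```python
# Assumed test for "defined": the first word names one of the defined namespaces.
DEFINED_NAMESPACES = {"ring", "beamline", "source", "optics", "sample", "time", "column"}


def split_extension_fields(pairs):
    """Separate (name, value) pairs into defined and extension fields.

    Returns two dicts mapping lower-cased names to (original name, value);
    when a name repeats, only its last occurrence is kept.
    """
    defined, extension = {}, {}
    for name, value in pairs:
        namespace = name.split(".", 1)[0].lower()
        target = defined if namespace in DEFINED_NAMESPACES else extension
        key = name.lower()  # field names are case-insensitive
        target.pop(key, None)  # re-insert so dict order follows the last occurrence
        target[key] = (name, value)
    return defined, extension
```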
Having described the defined header fields, it is now possible to give a complete version of the FIELDS rule that was provisionally defined in section 3.4.2. The complete XDI grammar is found in appendix B, and section B.5 shows the complete definition of the header fields.
There are several limitations to the current specification and grammar. Errors can be reported and improvements suggested at the XDI issue tracker.
Here is an example of a file conforming to this specification and providing substantial metadata. It was edited by hand from a real data file measured at beamline 10ID at the APS in 2005. The lines beginning with MX. are extension fields denoting parameters of the MX data acquisition system in use at the beamline.
    # XDI/1.0 MX/2.0
    # Beamline.name: APS 10ID
    # Beamline.edge-energy: 7112.00
    # Ring.energy: 7.00
    # Source.type: undulator a
    # Source.undulator-harmonic: 3
    # Time.start: 2005-03-08T20:08:57
    # Optics.crystal: Si 111
    # Optics.collimation: none
    # Optics.focusing: none
    # Optics.harmonic-rejection: flat Rh-coated mirror
    # Column.1: energy ev
    # Column.2: i0
    # Column.3: itrans
    # Column.4: ifluor
    # Column.5: iref
    # MX.Num-regions: 1
    # MX.SRB: 6900
    # MX.SRSS: 0.5
    # MX.SPP: 0.1
    # MX.Settling-time: 0
    # MX.Offsets: 11408.00 11328.00 13200.00 10774.00
    # MX.Gains: 8.00 7.00 7.00 9.00
    #///
    # Fe K-edge, Lepidocrocite powder on kapton tape, RT
    # 4 layers of tape
    # exafs, 20 invang
    #---
    # energy mcs3 mcs4 mcs6 mcs5
    6899.9609 48120 19430 2250 54540
    6900.1421 48390 19540 2260 54860
    6900.5449 48520 19610 2250 55110
    6900.9678 48930 19780 2280 55650
    6901.3806 48460 19590 2250 55110
    (....etc....)
The XDI grammar, in a single file
    XDI = VERSION [FIELDS] [COMMENTS] [LABELS] DATA
    OCTET   = %x00-FF            ; 8 bits of data
    UPALPHA = %x41-5A            ; upper case letters A - Z
    LOALPHA = %x61-7A            ; lower case letters a - z
    CHAR    = %x01-7F            ; any 7-bit US-ASCII character, excluding NUL
    VCHAR   = %x21-7E            ; visible (printing) characters, 7-bit (US-ASCII)
    ALPHA   = UPALPHA / LOALPHA  ; US-ASCII letters
    DIGIT   = %x30-39            ; digits 0 - 9
    CTL     = %x00-1F / %x7F     ; control characters (octets 0 - 31) and DEL (127)
    CR      = %x0D               ; carriage return
    LF      = %x0A               ; line feed
    CRLF    = CR LF              ; MS newline = carriage return line feed
    SP      = %x20               ; space
    HT      = %x09               ; horizontal tab
    WS      = SP / HT            ; white space
    EOL     = CR / LF / CRLF     ; cross-platform end-of-line
    SIGN       = "+" / "-"
    EXPONENT   = ("e" / "E" / "d" / "D") [SIGN] *DIGIT
    NUMBER     = *DIGIT ["." *DIGIT] [EXPONENT]
    INF        = ("i" / "I") ("n" / "N") ("f" / "F")
    NAN        = ("n" / "N") ("a" / "A") ("n" / "N")
    FLOAT      = [SIGN] (NUMBER / INF / NAN)
    DATETIME   = 4DIGIT "-" 2DIGIT "-" 2DIGIT "T" 2DIGIT ":" 2DIGIT ":" 2DIGIT
    TEXT       = %x09 / %x20-7E / %x80-FF  ; any OCTET except CTLs, including WS
    COMM       = "#" / ";"
    PROPERWORD = ALPHA *(ALPHA / DIGIT / "_" / "-")
    WORD       = 1*(ALPHA / DIGIT / "_")
    MATH       = ["ln"] *("-" / "+" / "*" / "$" / "/" / "(" / ")" / DIGIT)
    FIELD-END  = COMM 2*"/" EOL
    HEADER-END = COMM 2*"-" EOL
    XDI-VERSION  = "XDI/" 1*DIGIT "." 1*DIGIT
    APPLICATIONS = 1*VCHAR
    VERSION      = COMM XDI-VERSION *APPLICATIONS EOL
    ABSCISSA      = COMM "Abscissa" ": " MATH
    BEAMLINE      = COMM "Beamline" ": " 1*WORD
    DSPACING      = COMM "D_spacing" ": " FLOAT
    EDGEENERGY    = COMM "Edge_energy" ": " FLOAT
    ENDTIME       = COMM "End_time" ": " DATETIME
    MUFLUOR       = COMM "Mu_fluorescence" ": " MATH
    MUREF         = COMM "Mu_reference" ": " MATH
    MUTRANS       = COMM "Mu_transmission" ": " MATH
    STARTTIME     = COMM "Start_time" ": " DATETIME
    DEFINEDFIELDS = *( ( ABSCISSA / BEAMLINE / DSPACING / EDGEENERGY / STARTTIME / ENDTIME / MUFLUOR / MUREF / MUTRANS / EXT-FIELD / FIELD-LINE ) EOL ) FIELD-END
    FIELD-LINE     = DEFINEDFIELDS
    EXT-FIELD-NAME = WORD *("-" WORD)
    FIELD-LINE     = COMM FIELD-NAME ": " 1*WORD EOL
    EXT-FIELD      = COMM EXT-FIELD-NAME ": " 1*VCHAR EOL
    FIELDS = *(FIELD-LINE / EXT-FIELD) FIELD-END
    COMMENT-LINE = COMM *VCHAR EOL
    COMMENTS     = *COMMENT-LINE HEADER-END
    LABEL  = PROPERWORD
    LABELS = COMM 1*LABEL EOL
    DATA-LINE = *FLOAT EOL
    DATA      = *DATA-LINE