Xdispec
Need proper language here to explain the assumption of authority to do this chore.
- Bruce
- Matt
- Armando
- Gerd
- Darren
This document describes the XAS Data Interchange Format (hereafter referred to as XDI), version 1.0, a simple file format for a single X-ray Absorption Spectroscopy (XAS) measurement. We are defining this format to accomplish the following goals:
- Establish a common language for transferring data between XAS beamlines, XAS experimenters, data analysis packages, web applications, and anything else that needs to process XAS data.
- Increase the relevance and longevity of experimental data by reducing the amount of data archeology future interpretations of that data will require.
- Enhance the user experience by promoting inter-operability among data acquisition systems, data analysis packages, and other applications.
- Provide a mechanism for extracting and preserving a single XAS-like data set from a related experiment (for example, a DAFS measurement) or from a complex data structure (for example, a database or a hierarchical data file used to store a multi-spectral data set).
This format is intended to encode single-scan data files with metadata. It is not intended to encode relationships between many XAS measurements or between an XAS measurement and other parts of a multi-spectral experiment.
To fulfill these goals, XDI files provide a flexible, consistent representation of information common to all XAS experiments. This format is simpler than a format based on XML, HDF, or a database; it yields self-documenting files; and it is easy for both humans and computers to read. Its structure is inspired by that of Internet electronic mail, a plain-text data format which has proven to be robust, extensible, and enduring. Due to these advantages, and because of our intention to develop free software tools and libraries that support XDI, we hope that the file format described in this specification will see wide adoption in the XAS community.
We do not intend this specification to dictate the file formats used by data acquisition systems during XAS experiments, although this is certainly a suitable format for that purpose. Any attempt to do so would be unreasonable due to the number of different data acquisition systems currently deployed at synchrotrons around the world, the variety of experiments performed at these installations, and the continuing development of new experimental techniques. Instead, this specification addresses the representation of a single scan of XAS data after an experiment has been completed.
A beamline which adopts this specification shall either use this format as its native file format or shall provide its users with tools that convert between its native file formats and XDI. In short, it will send its users home with their XAS data stored in this format. We intend to encourage this practice by developing tools for reading, editing, writing, and validating XDI files. Beamlines may choose to modify their data acquisition systems to write data using this format in situations where that would be appropriate. We plan to assist in this effort by developing libraries for popular programming languages which can read, manipulate, and write XDI files.
With their experiment data stored in XDI files, users will want data analysis packages and other applications which are capable of reading this format. It is our hope that, as this specification gains wider adoption, users will ultimately be freed from the responsibility of understanding file formats. With these aims in mind, we shall assist software developers in supporting XDI files.
XDI files contain two sections: a header with information about one scan of an XAS experiment, and the data collected during that scan. The header consists of versioning information, a series of fields that each contain a single piece of information, an area for users to store comments about the experiment, and a sequence of labels for the columns of data. The data section contains these columns, with each row corresponding to one point of the scan.
The header has been designed to contain arbitrary metadata describing the contents of the file. This metadata is organized in a way that is easily readable by both humans and computers. These fields, described below, contain information about XAS experiments which is useful for both users and applications. A complete list of defined headers along with their specifications is found in Sec. 4.1.
This section of the XDI specification formally describes the structure of XDI files.
The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in RFC 2119, Key words for use in RFCs to Indicate Requirement Levels.
An XDI implementation is not compliant if it fails to satisfy one or more of the must or required level requirements presented in this specification.
All of the representations defined in this document are described both in prose and using an augmented Backus-Naur Form (BNF). The syntax used in these grammars is defined in RFC 5234, Augmented BNF for Syntax Specifications: ABNF. Software developers who wish to implement support for XDI files themselves will need to familiarize themselves with this notation to understand this specification. Section 3 of RFC 5234 is of particular relevance to the notation conventions used in this document. Repetition syntax for grammar rules is a bit peculiar and is described in sections 3.6 and 3.7 of RFC 5234. Repetition is very important in this specification. Here is a short summary of the repetition rules from the RFC:
- DIGIT means one instance of that rule. DIGIT is synonymous with 1DIGIT.
- *DIGIT means zero or more repetitions; 2*DIGIT means two or more repetitions.
- 1*2DIGIT indicates a range from one to two repetitions.
- 3DIGIT means exactly 3 repetitions.
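For readers more familiar with regular expressions than with ABNF, the repetition prefixes can be mapped onto regex quantifiers. The following is only an illustrative sketch: the helper name abnf_repeat_to_regex is hypothetical and is not part of this specification or of RFC 5234.

```python
import re

def abnf_repeat_to_regex(prefix: str, element_regex: str) -> str:
    """Translate an ABNF repetition prefix ("", "*", "2*", "1*2", "3")
    into an equivalent regular-expression quantifier on element_regex."""
    if prefix == "":
        return element_regex                      # DIGIT: exactly one instance
    if "*" in prefix:
        low, high = prefix.split("*")
        low = low or "0"                          # a missing minimum defaults to 0
        # a missing maximum leaves the upper bound open, e.g. {2,}
        return f"(?:{element_regex}){{{low},{high}}}"
    return f"(?:{element_regex}){{{prefix}}}"     # 3DIGIT: exactly three

DIGIT = "[0-9]"

# *DIGIT accepts the empty string; 2*DIGIT does not accept a single digit.
print(bool(re.fullmatch(abnf_repeat_to_regex("*", DIGIT), "")))
print(bool(re.fullmatch(abnf_repeat_to_regex("2*", DIGIT), "7")))
```

The distinction between *DIGIT (zero or more) and 1*DIGIT (one or more) matters wherever a rule must not match an empty string.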
The basic rules used throughout this section to define parsing constructs are presented in the appendix in B.2 and B.3 (XREF...) as part of the complete grammar. All parsing rules that consist of a sequence of multi-character tokens must be delimited by white space unless the tokens of the sequence may be unambiguously identified.
The header and data sections of an XDI file are composed of structured US-ASCII text. Header field values that are "free-form" or "text" may contain UTF-8 encoded Unicode text, although Unicode support in applications that use XDI files is optional. The US-ASCII coded character set is formally defined by ANSI X3.4-1986. The Universal Character Set (Unicode) is defined by ISO/IEC 10646. The UTF-8 transformation format is defined by IETF RFC 3629.
The header section of an XDI file appears at the beginning of the file and is comprised of structured text. Every line of the header must begin with a comment character and must end with an end-of-line sequence, both of which are defined below. There are no multi-line headers. Lines may be of any length, but users of XDI should remember that XAS software may be implemented in a programming language without dynamic memory allocation (e.g. Fortran) and so should restrict lines to 2048 characters. Support for the POSIX, Apple, and Microsoft end-of-line conventions is provided to increase cross-platform portability.
COMM = "#" / ";"
EOL = CR / LF / CRLF
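As an illustration of the COMM and EOL rules, the sketch below splits file content on any of the three end-of-line conventions and tests whether a line begins with a comment character. The function names split_lines and is_header_line are hypothetical and chosen only for this example.

```python
import re

# CRLF is listed first so that a Microsoft line ending counts as one EOL,
# not as a CR followed by an empty line and an LF.
EOL = re.compile(r"\r\n|\r|\n")
COMM = ("#", ";")

def split_lines(text: str) -> list[str]:
    """Split file content into lines, accepting CR, LF, or CRLF."""
    lines = EOL.split(text)
    if lines and lines[-1] == "":
        lines.pop()          # drop the empty string left after the final EOL
    return lines

def is_header_line(line: str) -> bool:
    """A header line must begin with one of the comment characters."""
    return line.startswith(COMM)

mixed = "# XDI/1.0\r\n; Beamline.name: APS 20BM\r#---\n6899.9 48120\n"
for line in split_lines(mixed):
    print(is_header_line(line), repr(line))
```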
Header lines are subdivided into four subsections -- versioning information, header fields, user comments, and column labels -- with two separators that are required when header sections are present. These subsections must occur in the following sequence:
- The required first line of the file is the version line.
- This is followed by header fields, which can be defined headers or extension headers. These two header types are explained in Sec. 3.4.2. Some headers are required, as explained in Sec. 4.4.
- The header lines are separated from the user comments by the FIELD-END rule, which is a comment character followed by two or more slashes (/) followed by an end-of-line. If the comment section is present, the FIELD-END line must also be present.
- The optional comment section is for user-supplied, free-format text. Each line begins with a comment character and ends with an end-of-line.
- The comment section ends with the required HEADER-END rule, which is a line of two or more dashes that starts with a comment character and ends with an end-of-line.
- The last line before the data is a line of optional column labels which identify the columns of data. There should be as many labels as there are columns. The label line begins with a comment character and ends with an end-of-line.
The separator lines (FIELD-END and HEADER-END) serve specific, syntactic purposes in the XDI grammar. The line of dashes is a common visual cue denoting the end of the headers and beginning of the data. The FIELD-END serves to separate and distinguish field lines from freely-formatted user comments, which may resemble header fields or other grammatical constructs. Similarly, the HEADER-END serves to distinguish column labels from user comments, which are otherwise grammatically identical elements of the data file.
FIELD-END = COMM 2*"/" EOL
HEADER-END = COMM 2*"-" EOL
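One way to use the two separator lines is to locate them first and then slice the header into its subsections. The sketch below assumes the header lines have already been separated from the data and stripped of their end-of-line sequences. The name split_header is hypothetical, and tolerating white space around the slashes and dashes is an assumption of this example, not a requirement of the grammar.

```python
import re

# Assumption: optional spaces or tabs are tolerated around the separators.
FIELD_END = re.compile(r"^[#;][ \t]*/{2,}[ \t]*$")
HEADER_END = re.compile(r"^[#;][ \t]*-{2,}[ \t]*$")

def split_header(header_lines):
    """Return (version, fields, comments, labels) from the header lines.

    Raises StopIteration if the required HEADER-END line is missing.
    """
    version, rest = header_lines[0], header_lines[1:]
    header_end = next(i for i, line in enumerate(rest) if HEADER_END.match(line))
    before, labels = rest[:header_end], rest[header_end + 1:]
    field_end = next(
        (i for i, line in enumerate(before) if FIELD_END.match(line)), None
    )
    if field_end is None:
        # No FIELD-END line, so there is no comment section.
        fields, comments = before, []
    else:
        fields, comments = before[:field_end], before[field_end + 1:]
    return version, fields, comments, labels

header = [
    "# XDI/1.0 MX/2.0",
    "# Beamline.name: APS 10ID",
    "#///",
    "# Fe K-edge, Lepidocrocite powder on kapton tape, RT",
    "#---",
    "# energy mcs3 mcs4 mcs6 mcs5",
]
version, fields, comments, labels = split_header(header)
print(len(fields), len(comments), len(labels))
```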
The first line of the XDI header contains the XDI version to which the file conforms. XDI represents versions of the file format with a <major>.<minor> numbering scheme. The <minor> version is incremented when changes are made to the format that do not affect compatibility with previous versions, as when new defined header fields are defined. (A parser compliant with an earlier minor version would treat the newly defined header as an extension field. Propagated to an output file as an extension field, this field would then be interpreted correctly by a more recent parser.) The <major> version is incremented when other changes are made to the format, as when the definition of the contents of a defined header field is altered.
A series of optional version entries, separated by white space, may follow the XDI version. These version entries exist to allow various programs to annotate the file as it proceeds through the collection and analysis process. Such annotation is optional, although version information must be included in this sequence by software that creates XDI files containing extension fields (see section 4.3). The order of the optional version entries is undefined but should be preserved to accurately represent the sequence in which applications have manipulated the file.
XDI-VERSION = "XDI/" 1*DIGIT "." 1*DIGIT
APPLICATIONS = 1*VCHAR
VERSION = COMM XDI-VERSION *APPLICATIONS EOL
Note that the XDI major and minor version numbers must be treated as integers that may contain more than a single digit. “XDI/1.12” is a higher (more recent) version than “XDI/1.2”.
This specification does not impose a restriction on how applications identify and version themselves. However, a single application must identify and version itself using a single text sequence without white space. Some acceptable examples follow. The first example shows an application which uses the same format as the XDI version rule, which is the recommended format for application versioning; the second shows data acquisition and data processing programs identified by name but without version numbers; the third shows some arbitrary method of versioning an application.
# XDI/1.0 Datacollectatron/7.75
# XDI/1.0 XDAC Athena
# XDI/1.0 XAS!Collect-3000
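A version line can be read with a single regular expression, with the major and minor numbers converted to integers so that they compare numerically rather than as text. This is a minimal sketch; the name parse_version_line is hypothetical, and the white space between the comment character and the version is accepted as the examples above show.

```python
import re

VERSION_LINE = re.compile(r"^[#;]\s*XDI/(\d+)\.(\d+)((?:\s+\S+)*)\s*$")

def parse_version_line(line):
    """Return ((major, minor), [application entries]) for a version line."""
    match = VERSION_LINE.match(line)
    if match is None:
        raise ValueError(f"not an XDI version line: {line!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Each application entry is one run of characters without white space.
    applications = match.group(3).split()
    return version, applications

# Tuples of integers compare element by element, so 1.12 is newer than 1.2.
newer, _ = parse_version_line("# XDI/1.12")
older, _ = parse_version_line("# XDI/1.2")
print(newer > older)

print(parse_version_line("# XDI/1.0 XDAC Athena"))
```

Comparing the versions as text or as floating-point numbers would order 1.12 before 1.2, which is why the integer tuple is used here.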
The lines immediately following the version line of the header contain the fields of the header. These fields are arranged in a manner similar to the header of an Internet electronic mail message, although XDI fields shall not span multiple lines. Each field consists of a case-insensitive name, a separating colon, and an associated value. When multiple occurrences of the same field are present, the value of the last occurrence must be used as the value for the field.
Except in the case of a header whose value has a required structure, values are assumed to be free-form text. The defined fields are defined in section 4.1 and the complete definition of the FIELDS rule may be found in section B.5. A complete explanation of header fields is found in section 4.
The header fields subsection is ended with a FIELD-END line, which consists of a comment character followed by two or more forward slash characters (/) and ending with an end-of-line character.
Following the dividing line at the end of the header fields subsection is the area of the header that contains user comments. Please note that this area is reserved for comments supplied by the experimenter and must not be used by software as a place to store other information. Refer to section 4.3 for information about using extension fields for this purpose.
COMMENT-LINE = COMM *VCHAR EOL
COMMENTS = *COMMENT-LINE HEADER-END
This section may contain no lines of commentary or lines that contain no comment text. This section must end with the HEADER-END dividing line, which is a line starting with a comment character and containing two or more dashes and ending with an end-of-line character.
When extracting the comment subsection from an XDI file, software may remove a single leading space and any trailing white space from each comment line but must not further alter the line’s contents, including any interior whitespace.
Applications must preserve all user comments.
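The permitted clean-up of a comment line is narrow: one leading space and any trailing white space may go, and everything else stays. A minimal sketch, with the hypothetical name extract_comment:

```python
def extract_comment(line: str) -> str:
    """Return the user's comment text from one comment line."""
    body = line[1:]              # drop the comment character
    if body.startswith(" "):
        body = body[1:]          # remove a single leading space only
    return body.rstrip()         # trailing white space may be removed

# Interior spacing and any additional leading spaces are left untouched.
print(repr(extract_comment("#  4 layers   of tape  ")))
```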
The final line of the XDI header contains the labels for each column of data in the data section of the file, separated by white space. There must be one label present for each column of data present in the data section.
LABEL = WORD
LABELS = COMM *LABEL EOL
The number of column labels must equal the number of columns of data in the data section.
Note that each column label must be a word and that white space delimits the labels. For specific column labels which, in natural language, would consist of two or more words, the use of CamelCase or underscores is recommended.
It is recommended that the column labels be those labels defined in section 4.2 for use with the headers in the Column. namespace.
The data section of the file contains white space delimited columns of floating-point numbers.
DATA-LINE = *FLOAT EOL
DATA = *DATA-LINE
Blank lines in this section must be discarded. The number of columns must be the same for all lines that contain data. Any column containing a measurement of time must be represented as floating-point numbers.
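The rules for the data section can be sketched as a small reader that skips blank lines, converts each token to a floating-point number, and rejects rows whose column count differs from the first row. The name parse_data is hypothetical. The handling of "d" and "D" exponents follows the FLOAT rule in appendix B, which is marked there as not up to date, so treat that part as an assumption.

```python
def parse_data(lines):
    """Parse data lines (EOLs already removed) into rows of floats."""
    rows = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()            # columns are white space delimited
        if not tokens:
            continue                     # blank lines are discarded
        # Python's float() rejects Fortran-style "d"/"D" exponents, so map
        # them to "e" first. "inf" and "nan" contain no "d" and pass through.
        row = [float(tok.replace("d", "e").replace("D", "E")) for tok in tokens]
        if rows and len(row) != len(rows[0]):
            raise ValueError(
                f"line {number}: expected {len(rows[0])} columns, found {len(row)}"
            )
        rows.append(row)
    return rows

rows = parse_data(["6899.9609 48120 19430", "", "6900.1421 4.839d4 nan"])
print(len(rows), len(rows[0]))
```

A caller would typically compare len(rows[0]) with the number of column labels to check that the two agree.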
<a name="fields"></a>
When present, header fields must comply with the associated parsing rules. Any fields which fail to do so must be ignored by preprocessing and analysis software.
XDI fields use a simple namespace concept as their structure. Here is the grammar:
PROPERWORD = ALPHA *(ALPHA / DIGIT / "_" / "-")
WORD = 1*(ALPHA / DIGIT / "_" / "-")
SEPARATOR = "."
VALUE = *VCHAR
FIELD-NAME = PROPERWORD *(SEPARATOR WORD)
FIELD-VALUE = *VALUE
FIELD-LINE = COMM FIELD-NAME ": " FIELD-VALUE EOL
FIELDS = *FIELD-LINE FIELD-END
Here are some examples which demonstrate both the format of the XDI field and the namespace concept:
# Beamline.name: APS 20BM
# Beamline.source: bend magnet
# Column.1: energy eV
# Column.4: i0
The name of the field is one or more words. The first word in the name must start with a letter. Subsequent words in the name must be separated by the separator character (.). The name must end with a colon (:). The colon must be followed by the value of the field. A missing value shall be interpreted as a zero-length string.
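The field rules above can be sketched as a regular expression plus a dictionary. Names are lower-cased because field names are case-insensitive, a later occurrence overwrites an earlier one, lines that do not match the grammar are ignored, and a missing value becomes an empty string. The name parse_fields and the choice of a plain dict are assumptions of this example.

```python
import re

FIELD_LINE = re.compile(
    r"^[#;]\s*"
    r"(?P<name>[A-Za-z][A-Za-z0-9_-]*(?:\.[A-Za-z0-9_-]+)*)"   # PROPERWORD *("." WORD)
    r":\s*(?P<value>.*?)\s*$"
)

def parse_fields(field_lines):
    """Map lower-cased field names to values; the last occurrence wins."""
    fields = {}
    for line in field_lines:
        match = FIELD_LINE.match(line)
        if match is None:
            continue                     # non-conforming lines are ignored
        fields[match.group("name").lower()] = match.group("value")
    return fields

fields = parse_fields([
    "# Beamline.name: APS 20BM",
    "# beamline.NAME: APS 10ID",         # same field, later occurrence
    "# Column.4: i0",
    "# Sample.name:",                    # missing value
    "# not a field",
])
print(fields)
```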
The namespaces are used to group related fields. In the example above, two namespaces are shown. One is used to group together characteristics of the beamline at which the data were measured, the other is used to group together hints about interpreting the columns in the data file.
There are two kinds of namespaces. Defined namespaces are defined in this specification. Extension namespaces may be added by application developers to insert metadata into the data file. Fields in both kinds of namespaces must observe the grammar shown above.
<a name="defined_namespaces"></a>
The following namespaces are defined in this specification and are used to convey information of common interest to many beamlines and applications. Except for Beamline.d-spacing, individual fields are optional. The extensive use of these optional fields to fully identify the provenance of the data is recommended. As discussed in Sec. 4.4, Beamline.d-spacing is required in a valid XDI file.
A header in a defined namespace must not appear more than once in a file.
- Beamline. : This namespace conveys information about the beamline at which the data were measured. Beamline.d-spacing is a required header. Some examples:
  - Beamline.name: The name used to identify the beamline
  - Beamline.d-spacing: The d-spacing of the monochromator crystal used to measure the data (see Sec. 4.4)
- Ring. : This namespace identifies conditions of the storage ring. Specific fields in this namespace are not specified. Some examples:
  - Ring.energy: Energy of the stored current
  - Ring.current: Amount of current stored in the ring
- Source. : This identifies properties of the source. Specific fields in this namespace are not specified. Some examples:
  - Source.harmonic: The harmonic setting of an undulator
  - Source.taper: The extent of taper applied to an undulator
  - Source.gap: The gap setting of a wiggler
  - Source.anode: The anode material of a rotating anode x-ray source with a Rowland circle
- Optics. : This identifies properties of the optics in use at the beamline. Specific fields in this namespace are not specified. Some examples:
  - Optics.collimation: Describes the state of a collimating mirror
  - Optics.mono: Specifies the monochromator material
  - Optics.detune: Describes the detuning state of the mono second crystal
  - Optics.focusing: Describes the state of a focusing mirror
  - Optics.harmonic-rejection: Describes the state of a harmonic rejection mirror
  - Other things conveyed in the Optics. namespace might be the details of a four-bounce monochromator, the use of asymmetric crystals, the details of a polychromator, the state of filters inserted in the beam, and so on.
- Sample. : This identifies properties of the sample for which data is contained in the file. Specific fields in this namespace are not specified. Some examples:
  - Sample.name: A description of the sample
  - Sample.formula: The stoichiometric formula of the sample
  - Sample.preparation: How the sample was prepared for mounting in the beam, e.g. "powder on tape"
  - Sample.reference: What material is used in a reference channel
- Time. : This namespace is used to specify creation times of the data file. Note that an indication of a particular moment in time must be specified according to the time stamp specification given in ISO 8601. An XDI file is non-compliant if time stamps are not ISO 8601 compliant.
  - Time.start: The time-stamp of the beginning of a data scan
  - Time.end: The time-stamp of the end of a data scan
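Values in the Time. namespace can be checked with the Python standard library. The sketch below uses datetime.fromisoformat, which accepts the YYYY-MM-DDTHH:MM:SS form but, in Python versions before 3.11, only a subset of what ISO 8601 permits; a stricter or more complete validator may be needed in practice. The name parse_timestamp and the end time used in the example are hypothetical.

```python
from datetime import datetime

def parse_timestamp(value: str) -> datetime:
    """Parse a Time.start or Time.end value; raises ValueError if invalid."""
    return datetime.fromisoformat(value.strip())

start = parse_timestamp("2005-03-08T20:08:57")
end = parse_timestamp("2005-03-08T20:31:12")     # hypothetical end time
print(end - start)                               # duration of the scan

try:
    parse_timestamp("08/03/2005 8:08 PM")        # not an ISO 8601 time stamp
except ValueError:
    print("rejected")
```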
<a name="columns"></a>
The Column namespace is the mechanism by which XDI files provide hints about how to extract useful information from the columns in the data section of the file.
- All fields in this namespace must be of the form Column.N, where N represents an integer. The integer is used to identify a particular column in the data file. The value of a Column field is used to indicate the contents of that column.
- There are several defined column labels. These are words that must be used to describe a column when that column is present in the data file and identified among the header fields. The list of defined column labels is given below.
- A header defining the abscissa of the data is required. Data may be stored using any reasonable units for the abscissa, but that choice of units must be identified. Reasonable abscissa choices include energy (in units of eV or keV) or angle (in units of degrees, radians, or motor steps). eV units are recommended. If units of motor steps are chosen, then adequate information must be provided via headers in the Optics. namespace to translate the abscissa into energy units.
- The header identifying the abscissa must provide two values: the column label for the abscissa and the corresponding units. It looks something like this: # Column.1: energy eV. All other headers in the Column namespace must provide one value -- the column label.
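A reader might collect the Column.N headers into a table keyed by column number, keeping the optional units alongside the label. This sketch assumes the header fields are already available as a mapping from field name to value; the name parse_columns is hypothetical.

```python
def parse_columns(fields):
    """Return {column number: (label, units or None)} from Column.N fields."""
    columns = {}
    for name, value in fields.items():
        namespace, _, index = name.partition(".")
        if namespace.lower() != "column":
            continue
        if not (index.isascii() and index.isdigit()):
            continue                     # N must be an integer
        parts = value.split()
        label = parts[0] if parts else ""
        units = parts[1] if len(parts) > 1 else None   # only the abscissa has units
        columns[int(index)] = (label, units)
    return columns

columns = parse_columns({
    "Column.1": "energy eV",
    "Column.4": "i0",
    "Beamline.name": "APS 20BM",
})
print(columns)
```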
Here is a list of defined column labels and their meanings along with unit definitions for the abscissa. Along with column labels defining the abscissa and various detectors, labels for representing EXAFS data in various stages of data processing (μ(E), normalized μ(E), χ(k), the Fourier transform of χ(k), or the Fourier filter of χ(k)) are provided.
COL_LABEL    Meaning                             Choice of units (if required)
------------------------------------------------------------------------------
energy       mono energy                         eV / keV
angle        mono angle                          degrees / radians / steps
i0           monitor intensity
itrans       transmission intensity
ifluor       fluorescence intensity
irefer       reference intensity
mutrans      mu transmission
mufluor      mu fluorescence
murefer      mu reference
normtrans    normalized mu transmission
normfluor    normalized mu fluorescence
normrefer    normalized mu reference
k            wavenumber
chi          EXAFS
chi_mag      magnitude of filtered chi(k)
chi_pha      phase of filtered chi(k)
chi_re       real part of filtered chi(k)
chi_im       imaginary part of filtered chi(k)
r            radial distance
chir_mag     magnitude of FT[chi(k)]
chir_pha     phase of FT[chi(k)]
chir_re      real part of FT[chi(k)]
chir_im      imaginary part of FT[chi(k)]
Extension fields are fields present in the header of an XDI file that are not defined in the XDI specification. Such fields must be structured by the same grammar as a defined field, but are interpreted as having values of free-form text. Any field not defined in section 4.1 must be considered an extension field, providing backwards compatibility between different minor versions of this specification.
EXT-FIELD-NAME = PROPERWORD *("-" WORD)
EXT-FIELD = COMM EXT-FIELD-NAME ": " *VALUE EOL
Data acquisition systems and data analysis packages may embed additional information in an XDI file by adding extension fields to the header. Extension fields created by applications should begin with a form of the application name used in the version line, followed by a hyphen (the example data file in appendix A shows fields such as MX.SRSS, where MX is the name of the data acquisition software at that beamline). This requirement prevents field name collisions between different applications and between applications and future versions of this specification.
Applications that read XDI files may attempt to parse the values of extension fields to extract the additional information about the scan. They should propagate these fields into output files they create, but must propagate the associated version information if they do so.
Multiple occurrences of the same field are discouraged. When present, the value of the last occurrence (reading linearly from the beginning of the file) must be preserved.
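Software that propagates extension fields needs some way to tell them apart from defined fields. The sketch below classifies a field by its leading namespace, using the namespaces named in this specification (Beamline, Ring, Source, Optics, Sample, Time, Column). Classifying by namespace alone is an assumption of this example, and the name split_extension_fields is hypothetical.

```python
# Namespaces named in this specification, compared case-insensitively.
DEFINED_NAMESPACES = {
    "beamline", "ring", "source", "optics", "sample", "time", "column",
}

def split_extension_fields(fields):
    """Split a field mapping into (defined, extension) by leading namespace."""
    defined, extension = {}, {}
    for name, value in fields.items():
        namespace = name.split(".", 1)[0].lower()
        target = defined if namespace in DEFINED_NAMESPACES else extension
        target[name] = value
    return defined, extension

defined, extension = split_extension_fields({
    "Beamline.name": "APS 10ID",
    "MX.SRB": "6900",
    "MX.Gains": "8.00 7.00 7.00 9.00",
})
print(sorted(extension))
```

The extension mapping can then be written back unchanged to an output file, together with the version entry of the application that created the fields.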
<a name="required"></a>
Correct interpolation of data onto common energy axes over large energy ranges requires knowledge of the crystal d-spacing ...
- In the absence of a header in the Column. namespace, assume the first column is the abscissa and that the units are eV... (although intuiting abscissa units is usually possible).
or
- Absence of an abscissa header makes the file invalid.
Having defined the rules of the defined header fields, it is now possible to create a complete version of the FIELDS rule that was provisionally defined in section 3.4.2. The complete XDI grammar is found in appendix B. Section B.5 shows the complete definition of the header fields.
There are several limitations to the current specification and grammar. Errors can be reported and issues can be suggested at the XDI issue tracker.
<a name="example"></a>
Here is an example of a file conforming to this specification and providing substantial metadata. This was edited by hand from a real data file measured at beamline 10ID at the APS in 2005. The lines beginning MX. are extension fields denoting parameters of the MX data acquisition system in use at the beamline.
# XDI/1.0 MX/2.0
# Beamline.name: APS 10ID
# Beamline.edge-energy: 7112.00
# Beamline.d-spacing: 3.1356
# Ring.energy: 7.00
# Source.type: undulator a
# Source.undulator-harmonic: 1
# Time.start: 2005-03-08T20:08:57
# Optics.crystal: Si 111
# Optics.collimation: none
# Optics.focusing: none
# Optics.harmonic-rejection: flat Rh-coated mirror
# Column.1: energy eV
# Column.2: i0
# Column.3: itrans
# Column.4: ifluor
# Column.5: irefer
# MX.Num-regions: 1
# MX.SRB: 6900
# MX.SRSS: 0.5
# MX.SPP: 0.1
# MX.Settling-time: 0
# MX.Offsets: 11408.00 11328.00 13200.00 10774.00
# MX.Gains: 8.00 7.00 7.00 9.00
#///
# Fe K-edge, Lepidocrocite powder on kapton tape, RT
# 4 layers of tape
# exafs, 20 invang
#---
# energy mcs3 mcs4 mcs6 mcs5
6899.9609 48120 19430 2250 54540
6900.1421 48390 19540 2260 54860
6900.5449 48520 19610 2250 55110
6900.9678 48930 19780 2280 55650
6901.3806 48460 19590 2250 55110
(....etc....)
The XDI grammar, in a single file
This section is not up-to-date BR, 18/04/2011
XDI = VERSION FIELDS [COMMENTS] [LABELS] DATA
OCTET = %x00-FF ; 8 bits of data
UPALPHA = %x41-5A ; upper case letters A - Z
LOALPHA = %x61-7A ; lower case letters a - z
CHAR = %x01-7F ; any 7-bit US-ASCII character, excluding NUL
VCHAR = %x21-7E ; visible (printing) characters, 7-bit (US-ASCII)
ALPHA = UPALPHA / LOALPHA ; US-ASCII letters
DIGIT = %x30-39 ; digits 0 - 9
CTL = %x00-1F / %x7F ; control characters (octets 0 - 31) and DEL (127)
CR = %x0D ; carriage return
LF = %x0A ; line feed
CRLF = CR LF ; MS newline = carriage return line feed
SP = %x20 ; space
HT = %x09 ; horizontal tab
WS = SP / HT ; white space
EOL = CR / LF / CRLF ; cross-platform end-of-line
SIGN = "+" / "-"
EXPONENT = ("e" / "E" / "d" / "D") [SIGN] *DIGIT
NUMBER = *DIGIT ["." *DIGIT] [EXPONENT]
INF = ("i" / "I") ("n" / "N") ("f" / "F")
NAN = ("n" / "N") ("a" / "A") ("n" / "N")
FLOAT = [SIGN] (NUMBER / INF / NAN )
DATETIME = 4DIGIT "-" 2DIGIT "-" 2DIGIT "T" 2DIGIT ":" 2DIGIT ":" 2DIGIT
TEXT = %x09 / %x20-FF ; any OCTET except CTLs, including WS
COMM = "#" / ";"
PROPERWORD = ALPHA *(ALPHA / DIGIT / "_" / "-")
WORD = 1*(ALPHA / DIGIT / "_")
MATH = ["ln"] *("-" / "+" / "*" / "$" / "/" / "(" / ")" DIGIT)
FIELD-END = COMM 2*"/" EOL
HEADER-END = COMM 2*"-" EOL
XDI-VERSION = "XDI/" 1*DIGIT ". " 1*DIGIT
APPLICATIONS = VCHAR
VERSION = COMM XDI-VERSION *APPLICATIONS EOL
ABSCISSA = COMM "Abscissa" ": " MATH
BEAMLINE = COMM "Beamline" ": " 1*WORD
DSPACING = COMM "D_spacing" ": " FLOAT
EDGEENERGY = COMM "Edge_energy" ": " FLOAT
ENDTIME = COMM "End_time" ": " DATETIME
MUFLUOR = COMM "Mu_fluorescence" ": " MATH
MUREF = COMM "Mu_reference" ": " MATH
MUTRANS = COMM "Mu_transmission" ": " MATH
STARTTIME = COMM "Start_time" ": " DATETIME
DEFINEDFIELDS = *( ( ABSCISSA / BEAMLINE
/ DSPACING / EDGEENERGY
/ STARTTIME / ENDTIME
/ MUFLUOR / MUREF / MUTRANS
/ EXT-FIELD / FIELD-LINE
) EOL ) FIELD-END
FIELD-LINE = DEFINEDFIELDS
EXT-FIELD-NAME = WORD *("-" WORD)
FIELD-LINE = COMM FIELD-NAME ": " 1*WORD EOL
EXT-FIELD = COMM EXT-FIELD-NAME ": " 1*VCHAR EOL
FIELDS = *(FIELD-LINE / EXT-FIELD) FIELD-END
COMMENT-LINE = COMM *VCHAR EOL
COMMENTS = *COMMENT-LINE HEADER-END
LABEL = PROPERWORD
LABELS = COMM 1*LABEL EOL
DATA-LINE = *FLOAT EOL
DATA = *DATA-LINE