An Element object is an element of a sequential element stream, or element chain, produced by the Rexx Parser when parsing a Rexx program.
Each element represents an elementary portion of the parsed program, or a parser-generated marker. Each marker is located between two portions and occupies no space.
Every element has an element category
that identifies its syntactic category; additionally,
when the element category is .EL.TAKEN_CONSTANT,
the element also has a subcategory,
which further specifies the type and use of the taken constant.
Markers are used to convey additional meaning to the element
chain. For example, a marker may indicate
that a clause has ended, or that an implicit EXIT
instruction has to be assumed at the end
of a code section. Markers are also called
inserted, implied, or zero-length elements.
Some of the markers are dictated by the Rexx Language
conventions (for example, a semicolon is assumed
after THEN, ELSE and OTHERWISE, or before THEN),
and others are added by the Rexx Parser to enhance
the element stream by ensuring that it has certain
properties (for example, that a clause is always
delimited by two end-of-clause markers).
Portions, or non-inserted elements, represent fragments of the source program. Although they roughly correspond to the Rexx notion of a token, the concept is extended to encompass elements that Rexx does not consider tokens, like comments or non-significant whitespace. This definition is chosen so that the following invariant holds:
A program source is always equivalent to the ordered concatenation of the values of its element chain.
Elements which are non-inserted but do not fall under the Rexx definition of token are ignorable elements.
You can check whether an element is ignorable by using the isIgnorable method of the Element class, and you can make an element ignorable by using the makeIgnorable method.
Please note that the knowledge of the fact that a certain element is or is not ignorable may imply a quite involved syntactical analysis of a relatively large part of the program. Think, for example, of ignorable and non-ignorable whitespace when parsing a template containing complex expressions which include blank operators.
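As a minimal sketch of how higher-level code might "jump over" ignorable elements, the following routine walks the chain with next until it finds an element that is not ignorable. The routine name NextSignificant is hypothetical (it is not part of the Rexx Parser), and the sketch assumes that an element has already been obtained from a parsed program.

```rexx
-- NextSignificant is a hypothetical helper, not part of the Rexx Parser.
-- It returns the first non-ignorable element that follows "element",
-- or .Nil when the end of the element chain is reached.
::Routine NextSignificant Public
  Use Strict Arg element
  element = element~next
  Loop While element \== .Nil
    If \element~isIgnorable Then Return element
    element = element~next
  End
  Return .Nil
```

Note that markers are not skipped by this sketch unless they have been made ignorable, since only the result of isIgnorable is tested.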
An element C representing a compound variable can be managed as a whole,
or decomposed into its constituent elements or parts, by using
the parts instance method of the Element class (see below).
This method is only available when the category of the element is
.EL.COMPOUND_VARIABLE or .EL.EXPOSED_COMPOUND_VARIABLE,
and then it returns an array containing all the parts
(elements) of the compound variable.
The first element of the array is always the stem name,
that is, it is of category .EL.STEM_VARIABLE or .EL.EXPOSED_STEM_VARIABLE,
and it includes the first dot in the compound variable name.
The rest of the components are a sequence of either
simple variables, of category .EL.SIMPLE_VARIABLE or .EL.EXPOSED_SIMPLE_VARIABLE;
signless integers, of category .EL.INTEGER_NUMBER;
pure dotless constant symbols, of category .EL.SYMBOL_LITERAL; or
separator dots, of category .EL.TAIL_SEPARATOR.
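The decomposition described above could be inspected with a sketch like the following. ShowParts is a hypothetical routine name, and the sketch assumes it lives in a program that has already loaded the Rexx Parser, so that the .EL. names and the CategoryName routine are visible.

```rexx
-- ShowParts is a hypothetical helper, not part of the Rexx Parser.
-- It displays the category name and the value of every part of a
-- compound variable element, and does nothing for other elements.
::Routine ShowParts Public
  Use Strict Arg element
  isCompound = element < .EL.COMPOUND_VARIABLE | element < .EL.EXPOSED_COMPOUND_VARIABLE
  If \isCompound Then Return      -- parts is only available for compound variables
  Do part Over element~parts      -- stem first, then the tail components
    Say CategoryName( part~category ) || ":" part~value
  End
```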
A fundamental property of an element E is its
element category, E~category, a one-byte value that identifies the syntactic
category of the element, regardless of whether it is ignorable or not,
or implied or not. The Rexx Parser is able to recognize and assign
a very wide variety of categories; you can browse the
listing of possible categories here.
At initialization time, the Rexx Parser stores a set of symbolic
element names in the global environment. All these names start
with the .EL. prefix.
--------------------------------------------------------------------------------
-- Some sample element categories --
--------------------------------------------------------------------------------
.EL.EXPOSED_STEM_VARIABLE -- A stem variable that has been exposed
.EL.ENVIRONMENT_SYMBOL -- An environment symbol
.EL.DIRECTIVE_START -- The directive start sequence, "::"
.EL.ELLIPSIS -- The ARG instruction ellipsis, "..."
.EL.ASG.PLUS -- The "+=" compound assignment sequence
The CategoryName public routine returns the symbolic
form of an element category value.
Say CategoryName( .EL.ELLIPSIS ) -- EL.ELLIPSIS
A collection of convenient names for several
category sets is also created;
these start with the .ALL. prefix.
More information about the sets of element categories
can be found here.
--------------------------------------------------------------------------------
-- Some sample category sets --
--------------------------------------------------------------------------------
.ALL.SPECIAL_CHARS -- All the special chars
.ALL.STRINGS -- Standard, hexadecimal and binary
.ALL.NUMBERS -- Integers, fractional and exponential
The Element class redefines the < and \< operators to simplify
testing for element categories and sets of categories:
element < category; /* is equivalent to */ element~category == category
element < set; /* is equivalent to */ set~contains(element~category)
element \< category; /* is equivalent to */ element~category \== category
element \< set; /* is equivalent to */ \set~contains(element~category)
"<" can be read as "is", "is a", "belongs to", or "in",
depending on the context:
-- element is...
element < .EL.DIRECTIVE_START -- ...a directive start sequence
element < .ALL.NUMBERS -- ...a numeric element
element < .EL.KEYWORD -- ...a keyword
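As an illustration of the < operator used with a category set, the following sketch counts the numeric elements in a chain. CountNumbers is a hypothetical routine name; the sketch assumes that the Rexx Parser has been loaded and that "element" is the first element of a chain obtained from it.

```rexx
-- CountNumbers is a hypothetical helper, not part of the Rexx Parser.
-- It walks the chain starting at "element" and counts the elements
-- whose category belongs to the .ALL.NUMBERS set.
::Routine CountNumbers Public
  Use Strict Arg element
  count = 0
  Loop While element \== .Nil
    If element < .ALL.NUMBERS Then count += 1   -- "element is a number"
    element = element~next
  End
  Return count
```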
A "taken constant" is "a string or a symbol taken as a constant" (it is an
unfortunate name, but, for lack of a shorter one, it has stuck; see the ANSI
standard, 6.3.2.22, taken_constant). Taken constants appear in several
places in the Rexx language syntax definition. For example, a symbol
appearing in a label position is "taken as a constant", in the sense
that no variable substitution is performed, and, if its syntactical
form is that of a compound variable, no component value is substituted.
Say This.is.a.compound.variable...1234.3ea56...but
This.is.not.a.compound.variable:
Exit
A taken constant element E always has the same element category,
.EL.TAKEN_CONSTANT, but, additionally, it has
an extra attribute, E~subCategory, which further
determines the syntactic category of the element. Like its
element category, the element subcategory is also a one-byte value,
and it also has a set of symbolic names created
at parser initialization time.
In the case of taken constants, element << subcategory
is redefined to mean "the category of the element is .EL.TAKEN_CONSTANT,
and the subcategory of the element is subcategory", and
similarly for the \<< operator.
--------------------------------------------------------------------------------
-- Some sample possible taken constant subCategory values --
--------------------------------------------------------------------------------
.BUILTIN.FUNCTION.NAME -- A BIF name, i.e., not an internal or external
-- routine, nor a ::RESOURCE name
.LABEL.NAME -- A label
.METHOD.NAME -- A method name
.RESOURCE.DELIMITER.NAME -- The optional end delimiter of a ::RESOURCE
Most subcategory names end with the .NAME suffix, except
for a very few, which end with .VALUE.
The ConstantName public routine returns the symbolic
form of a subcategory value.
Say ConstantName( .LABEL.NAME ) -- LABEL.NAME
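The << operator makes it easy to pick out taken constants of one kind. The following sketch collects the labels found in a chain; ListLabels is a hypothetical routine name, and the sketch assumes that the Rexx Parser has been loaded and that "element" is the first element of a chain obtained from it.

```rexx
-- ListLabels is a hypothetical helper, not part of the Rexx Parser.
-- It returns an array with the value of every element that is a
-- taken constant with the .LABEL.NAME subcategory.
::Routine ListLabels Public
  Use Strict Arg element
  labels = .Array~new
  Loop While element \== .Nil
    If element << .LABEL.NAME Then labels~append( element~value )
    element = element~next
  End
  Return labels
```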
The expression element < categories returns .True when
categories~contains(element~category),
and .False otherwise. Please note that when categories
contains only one byte (i.e., it represents a single element category),
element < categories is equivalent to element~category == categories.
The expression element << subcategories returns .True when
element~category == .EL.TAKEN_CONSTANT & subcategories~contains(element~subcategory),
and .False otherwise.
The negation of the < operator, \<, is also overloaded.
The negation of the << operator, \<<, is also overloaded.
The category method returns a one-byte value determining the category of the element. Element categories are described in detail here.
The from method returns a string formatted as "line column".
This is the position of the first character in the element,
when the element has an extent. Otherwise, it is the position
of the first character of the following element in the same line,
if one exists, or the position of the first character
after the previous element in the same line, if one exists, or both.
A semicolon inserted in an empty line will have a from value of "line 1".
See also method to.
The isAssigned method returns .True when the destination element is one of:
- A variable assignment target (i.e., the left-hand side of an assignment).
- An assignment message target object in an assignment instruction.
- A parsing template variable (when this variable will receive a value, not when the variable is used as part of a pattern).
- An argument variable in a USE ARG instruction.
- A variable reference term in a USE ARG instruction.
- An assignment message target object in a USE ARG instruction.
- A counter specified using the COUNTER subkeyword in a DO or LOOP instruction.
- A control variable used in an iterative loop.
- A variable in an EXPOSE, USE LOCAL, PROCEDURE EXPOSE or DROP instruction.
See also method setAssigned.
The isIgnorable method returns .True when the destination element is ignorable.
Comments are always ignorable, as are, in many cases,
whitespace sequences. Higher levels of parsing
"jump over" (i.e., they ignore) ignorable elements.
Please refer to the description of
ignorable elements, above.
See also method makeIgnorable.
The makeIgnorable method makes the destination element ignorable.
See also method isIgnorable.
The next method returns the next element in the parsing stream,
or .Nil if this is the last element in the stream.
See also method prev.
The parts method returns an array containing the parts that constitute a compound variable. This method is only available when the element is a compound variable. You can find more information about this method here.
The prev method returns the previous element in the parsing stream,
or .Nil if this is the first element in the stream.
See also method next.
The setAssigned method marks the element as assigned, so that subsequent calls to
isAssigned will return .True.
The source method returns the contents of the element, as it appears in the source file. This method cannot be used with comments and other element categories that are potentially multi-line, like the resource data of a resource.
The subCategory method returns a one-byte value determining the element subcategory.
This method is only available when the element category is
.EL.TAKEN_CONSTANT. Element subcategories are described in detail
here.
The to method returns a string formatted as "line column".
This is the position of the first character after the element,
when the element has a positive extent;
please note that this can point to the first character
"out of the line" when the element is at the end of the line.
When an element is inserted or implied
(that is, it has a zero-length extent),
to returns the same value as from.
See also method from.
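Since from and to both return a "line column" string, they can be split with PARSE. The following sketch computes the width of an element that sits on a single line; ElementWidth is a hypothetical routine name, not part of the Rexx Parser.

```rexx
-- ElementWidth is a hypothetical helper, not part of the Rexx Parser.
-- It returns the number of columns spanned by an element, or .Nil
-- when the element starts and ends on different lines.
::Routine ElementWidth Public
  Use Strict Arg element
  Parse Value element~from With fromLine fromColumn
  Parse Value element~to   With toLine   toColumn
  If fromLine \= toLine Then Return .Nil
  Return toColumn - fromColumn
```

For an inserted or implied element, to returns the same value as from, so this routine returns 0.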
The value method returns the contents of the element, partially interpreted
(see the description for the source method).
Symbols are translated to uppercase, and strings are interpreted,
i.e., double quotes are deleted in literal strings,
and binary and hexadecimal strings are transformed into byte strings.
As an example, and assuming an ASCII encoding,
this means that the strings "a", "61"X and "0110 0001"B
have different sources, but identical values.
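The equivalence of these three literals can be checked in plain Rexx, without the parser. This only illustrates why their values coincide; it assumes an ASCII system, as stated above.

```rexx
-- Plain Rexx, no parser needed. On an ASCII system, the three literals
-- below denote the same one-character string.
Say "a" == "61"X          -- 1
Say "a" == "0110 0001"B   -- 1
```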