diff --git a/specs/http-client/index.html b/specs/http-client/index.html
index 9cf6cb9..5dcaf40 100644
--- a/specs/http-client/index.html
+++ b/specs/http-client/index.html
@@ -68,7 +68,7 @@
 value: 'GitHub expath/expath-cg',
 href: 'https://github.com/expath/expath-cg/'
 }, {
- value: 'File a bug',
+ value: 'Report an issue',
 href: 'https://github.com/expath/expath-cg/issues/'
 }]
 }],
@@ -174,10 +174,15 @@

- This proposal provides an HTTP client interface for XPath 2.0. It - defines one extension function to perform HTTP requests, and has been - designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any - other XPath 2.0 usage. + This specification defines an HTTP client interface for XPath based + languages. The HTTP client interface is provided through a single + extension function which performs HTTP requests, and associated + error codes which define client error states. +

+

+ It has been designed to be compatible via [[!XPATH20]] with + [[!XQUERY]], and [[!XSLT20]]. It should also be suitable + for any other language which hosts XPath 2.0, such as [[!XPROC]].

@@ -187,7 +192,7 @@

Introduction

Namespace conventions

-

The module defined by this document does define one function in the namespace +

The module defined by this document defines one function in the namespace http://expath.org/ns/http-client. In this document, the http prefix, when used, is bound to this namespace URI.

Error codes are defined in the namespace http://expath.org/ns/error. In this @@ -198,18 +203,18 @@

Error management

Error conditions are identified by a code (a QName). When such an error condition is reached during the execution of the function, a dynamic error is thrown, with the corresponding error code (as if the standard XPath function - error had been called).

-

Error codes are defined through the spec. For too many reasons to enumerate here, the - HTTP protocol layer can raise an error. In this case, if the error condition is not - mentioned explicitly in the spec, the implementation must raise an error with an - appropriate message err:HC001.

+ fn:error had been called).

+

There are many cases where the + HTTP protocol layer may raise an error. In each case, if the error condition is not + mentioned explicitly in the spec, the implementation MUST raise an error with the + error code err:HC001.

The http:send-request function

-

This module defines an XPath extension function that sends an HTTP request and return the - corresponding response. It supports HTTP multi-part messages. Here is the signature of this +

This module defines an XPath extension function that sends an HTTP request and returns the + corresponding response. It also supports HTTP multi-part messages. Here is the signature of this function:

@@ -233,23 +238,23 @@

The http:send-request function

-

Besides the 3-params signature above, there are 2 other signatures that are convenient - shortcuts (corresponding to the full version in which corresponding params have been set +

Besides the arity-three signature above, there are two other signatures that are convenient + shortcuts (corresponding to the full version in which corresponding parameters have been set to the empty sequence). They are:

http:send-request($request as element(http:request)?, $href as xs:string?) as item()+ @@ -261,19 +266,19 @@

The http:send-request function

Sending a request

-

The functions defined in this module make one able to send a request to an HTTP server and - receive the corresponding response. Here is how the request is represented by the - parameters to this function, and how they are used to generate the actual HTTP request to - send.

- -
-

The request elements

-

The http:request element represents all the needed information to send the - HTTP request. So it is always possible to create such an element that will carry over all - the needed info for a particular request. For some of those values though, you can use an - additional param instead. For instance, some signatures define the +

The functions defined in this module allow the transmission of a request to an HTTP server and + the reception of the corresponding response. The request is represented by the + parameters to the function, which define how to generate the actual HTTP request to + transmit.

+ +
+

The Request Element

+

The http:request element represents all the information needed to send the + HTTP request.

+

Some of the values defined for the http:request element can instead be set through + a parameter to the function. For instance, some signatures define the parameter $href. If the value of this parameter is not the empty sequence, - it will then be used instead of the value of the attribute href on + it will override the value of the attribute href on the http:request element.

@@ -294,43 +299,38 @@

The request elements

    -
  • method is the HTTP verb to use, as GET, POST, etc. It is case - insensitive
  • -
  • href is the URI the request has to be sent to. It can be overridden by +
  • method is the HTTP method + to use, e.g.: GET, HEAD, POST, etc. It is case insensitive
  • +
  • href is the URI that the request is made to. It can be overridden by the parameter $href
  • http-version is the version of HTTP to use. It must be either the string 1.0 or 1.1. Default is implementation-defined. An implementation SHOULD support both and the default SHOULD be 1.1. If the value specified is not supported by a specific implementation, - it should throw an appropriate error message err:HC007.
  • -
  • status-only control how the response will look like; if it is true, only - the status code and the headers are returned, the content is not (no http:body nor - http:multipart, nor the interpreted additional value in the returned sequence, see - hereafter).
  • + it MUST throw the error err:HC007. +
  • status-only controls how the response will be parsed; if it is true, only + the status code and the headers are returned, and the content is omitted (no http:body, nor + http:multipart, nor the interpreted additional value in the returned sequence).
  • username, password, auth-method - and send-authorization are used for authentication (see section - below).
  • -
  • override-media-type is a MIME type. It can be used only with - http:request, and will override the Content-Type header returned by the - server.
  • -
  • follow-redirect control whether an HTTP redirect is automatically - followed or not. If it is false, the HTTP redirect is returned as the response. If it - is true (the default) the function tries to follow the redirect, by sending the same + and send-authorization are used for authentication (see ).
  • +
  • override-media-type is a Media Type ([[rfc6838]]). It can be used only with + http:request, and will override the Content-Type + header in the HTTP Response returned by the server.
  • +
  • follow-redirect controls whether an HTTP redirect is automatically + followed or not. If it is false, the HTTP redirect is returned as the response. If it + is true (the default) the function tries to follow the redirect, by sending the same request to the new address (including body, headers, and authentication credentials). Maximum one redirect is followed (there is no attempt to follow a redirect in response to following a first redirect).
  • timeout is the maximum number of seconds to wait for the server to - respond. If this time duration is reached, an error is thrown - err:HC006.
  • -
  • http:header represent an HTTP header, either in the - http:request or in the http:response elements, as defined - below.
  • -
  • http:multipart represents a multi-part body, either in a request or a - response, as defined below.
  • -
  • http:body represents the body, either of a request or a response, as - defined below.
  • + respond. If this time duration is exceeded, the error err:HC006 MUST be raised. +
  • http:header represents an HTTP header, either in the + http:request or in the http:response elements.
  • +
  • http:multipart represents a multi-part body, either in a request or a + response.
  • +
  • http:body represents the body, either of a request or a response.
-

The http:header element represents an HTTP header, either in a request or in +

The http:header element represents an HTTP header, either in a request or a response:

@@ -339,8 +339,8 @@ 

The request elements

<!-- Content: empty --> </http:header>
-

The http:body element represents the body of either an HTTP request or of an - HTTP response (in multipart requests and responses, it represents the body of a single one +

The http:body element represents the body of either an HTTP request or an + HTTP response (in multipart requests and responses, it represents the body of a single part):

@@ -365,23 +365,22 @@ 

The request elements

<!-- Content: any* --> </http:body>
-

The media-type is the MIME media type of the body part. It is mandatory. In - a request it is given by the user and is the default value of the Content-Type header if - it is not set explicitly. In a response, it is given by the implementation from the - Content-Type header returned by the server. The src attribute can be used in +

The media-type is the media type of the body part. It is mandatory. In + a request it is provided by the user and is the default value of the Content-Type header if + it is not set explicitly. In a response, it is provided by the implementation from the + Content-Type header returned by the server. The src attribute can be used in a request to set the body content as the content of the linked resource instead of using - the children of the http:body element. When this attribute is used, only + the children of the http:body element. When this attribute is used, only the media-type attribute must also be present, and there can be neither - content in the http:body element, nor any other attribute, or this is an - error err:HC004.

+ content in the http:body element, nor any other attribute, otherwise the + error err:HC004 MUST be raised.

All the attributes, except src, are used to set the corresponding - serialization parameter defined in [[!xslt-xquery-serialization]], as defined for the - XPath 3.0 function fn:serialize() [[!xpath-functions-30]]. Those attributes - can be given by the user on a request to control the way a part body is serialized. In the + serialization parameters defined in [[!xslt-xquery-serialization]]. Those attributes + can be provided by the user on a request to control the way a part body is serialized. In the response, the implementation can, but is not required to, provide some of them if it has the corresponding information (some of them do not make any sense in a response, therefore - they will never be on a response element, for instance version).

-

The http:multipart element represents an HTTP multi-part request or + they will never be supplied on the response element, for instance version).

+

The http:multipart element represents an HTTP Multipart Type request or response:

@@ -394,31 +393,31 @@ 

The request elements

and has to be a multipart media type (that is, its main type must be multipart). The boundary attribute is the boundary marker used to separate the several parts in the message (the value of the attribute is prefixed with - "--" to form the actual boundary marker in the request; on the other way, + "--" to form the actual boundary marker in the request; conversely, this prefix is removed from the boundary marker in the response to set the value of the attribute).
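The prefix rule for the boundary attribute can be illustrated with a minimal sketch. It is written in Python purely for illustration, not in any language this specification targets, and the function names `boundary_marker` and `boundary_attribute` are hypothetical, not part of the module.

```python
PREFIX = "--"


def boundary_marker(boundary_attr: str) -> str:
    """Build the marker used in the request from the attribute value."""
    return PREFIX + boundary_attr


def boundary_attribute(marker: str) -> str:
    """Recover the attribute value from a marker read in a response."""
    if not marker.startswith(PREFIX):
        # Assumption: a marker without the prefix is treated as malformed.
        raise ValueError("boundary marker does not start with '--'")
    return marker[len(PREFIX):]
```

The two helpers are inverses of each other, which mirrors the symmetry between the request and response handling described above.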

-

Serializing the request content

-

If the request can have content (one body or several body parts), it can be specified by - the http:multipart element, the http:body element, and/or the +

Serializing the Request

+

If the request entity body has content (one body or several body parts), it can be specified by + the http:multipart element, the http:body element, and/or the parameter $bodies. For each body, the content of the HTTP body is generated - as follow.

-

Except when its attribute src is present, a http:request + as follows.

+

Except when its attribute src is present, a http:request element can have several attributes representing serialization parameters, as defined in [[!xslt-xquery-serialization]]. This spec defines in addition the method - 'binary'; in this case the body content must be either an xs:hexBinary or an - xs:base64Binary item, and no other serialization parameter can be set + binary; in this case the body content must be either an xs:hexBinary or an + xs:base64Binary item, and no other serialization parameter can be set besides media-type.

The default value of the serialization method depends on the media-type: it - is 'xml' if it is an XML media type, 'html' if it is an HTML - media type, 'xhtml' if it is application/xhtml+xml, - 'text' if it is a textual media type, and 'binary' for any other + is xml if it is an XML media type, html if it is an HTML + media type, xhtml if it is application/xhtml+xml, + text if it is a textual media type, and binary for any other case.
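The defaulting rule can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not a normative algorithm: the function name is hypothetical, and the caller is assumed to have already determined the media type class (`"xml"`, `"html"`, `"text"` or `"binary"`) according to the rules for processing media types.

```python
def default_serialization_method(media_type: str, media_class: str) -> str:
    """Return the default serialization method for a body.

    media_class is assumed to be one of "xml", "html", "text" or "binary",
    as already determined by the caller.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    # application/xhtml+xml also ends in +xml, so it is tested first.
    if essence == "application/xhtml+xml":
        return "xhtml"
    if media_class in ("xml", "html", "text"):
        return media_class
    return "binary"
```

Testing for `application/xhtml+xml` before consulting the class matters, because that type would otherwise fall into the XML class.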

-

When a body element has an empty content (i.e. it has no child node at all) its content - is given by the parameter $bodies. In a single part request, this param must - have at most one item. If the body is empty, the param cannot be the empty sequence. In a +

When a body element has no content (i.e. no child nodes) its content + is given by the parameter $bodies. In a single part request, this parameter must + have at most one item. If the body is empty, the parameter cannot be the empty sequence. In a multipart request, $bodies must have as many items as there are empty body elements. If there are three empty body elements, the content of the first of them is $bodies[1], and so on. The number of empty body elements must be equal to @@ -429,30 +428,31 @@

Serializing the request content

Authentication

HTTP authentication when sending a request is controlled by the attributes username, password, auth-method and - send-authorization on the element http:request. + send-authorization on the http:request element. If username has a value, password and auth-method must have a value too. And if any one of the three other attributes has been set, username must be set too.
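The consistency rule between the four authentication attributes can be sketched as a simple check. This is an illustrative Python sketch; the function name is hypothetical, `None` stands for an absent attribute, and the error an implementation raises on failure is left out because this paragraph does not name one.

```python
from typing import Optional


def auth_attributes_consistent(
    username: Optional[str],
    password: Optional[str],
    auth_method: Optional[str],
    send_authorization: Optional[bool],
) -> bool:
    """Check the attribute combination described above (None means absent)."""
    if username is None:
        # Without a username, none of the other three may be set.
        return password is None and auth_method is None and send_authorization is None
    # With a username, password and auth-method are both required.
    return password is not None and auth_method is not None
```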

-

The attribute auth-method can be either "Basic" or - "Digest", but other values can also be used, in an implementation-defined - way. The handling of those attributes must be done in conformance to [[!rfc2617]]. - If send-authorization is true (default value is false) and the authentication +

The attribute auth-method can be either Basic or + Digest, but other values can also be used, in an implementation-defined + way. The handling of those attributes must be done in conformance with [[!rfc2617]]. + If send-authorization is true (default value is false) and the authentication method supports generating the header Authorization without challenge, the - request contains this header. The default value is to send a non-authenticated request, - and if the response is an authentication challenge, then only send the credentials in a - second message.

+ request contains this header. The default value is to send a non-authenticated request, + and if the response is an authentication challenge, only then send the credentials in a + second request.

-

Dealing with the response

+

Handling the Response

After having sent the request to the HTTP server, the function waits for the response. - It analyses it and returns a sequence representing this response. This sequence has - an http:response element as first item, which is followed be an additional + The HTTP client parses the raw response and the function returns a representation of the + response as a sequence. The sequence has + an http:response element as the first item, which is followed by an additional item for each body or body part in the response.

-

The result element

+

The Response Element

 <http:response status = integer
@@ -460,86 +460,84 @@ 

The result element

<!-- Content: (http:header*, (http:body|http:multipart)?) --> </http:response>
-

This is the first item returned by the function defined in this module. - The status attribute is the HTTP status code returned by the server, - and message is the message coming with the status on the status line. +

The http:response element is the first item in the sequence returned by the function. + The status attribute is the HTTP Status Code returned by the server, + and message is the Reason Phrase coming with the Status-Line. The http:header elements are as defined for the request, but represent - instead the response headers. The http:body - and http:multipart elements are also like in the request, but + instead the response headers. The http:body + and http:multipart elements are also like in the request, but http:body elements must be empty.

-

Representing the result content

-

Instead of being inserted within the http:response element, the content of - each body is returned as a single item in the return sequence. Each item is in the same - order (after the http:response element) than the http:body - elements. For each body, the way this item is built from the HTTP response is as +

The Response Entity Body

+

Instead of being inserted within the http:response element, the content of + each body is returned as a single item in the returned sequence. Each item is in the same + order (after the http:response element) as the http:body + elements. For each body, the way this item is built from the HTTP response is as follows.

If the status-only attribute has the value true (default is false), the returned sequence will only contain the - http:response element (with the headers, but also the empty - http:body or http:multipart elements, as if - status-only was false), and the following items, representing the bodies + http:response element (with the headers, but also the empty + http:body or http:multipart elements, as if + status-only was false), and the following items, representing the bodies content are not generated from the HTTP response.

-

For each body that has to be interpreted, the following rules apply in order to build the - corresponding item. If the body media type is a text media type, the item is a string, - containing the body content. If the media type is an XML media type, the content is - parsed and the item is the resulting document node. If the media type is an HTML type, +

For each body that has to be parsed, the following rules apply in order to build the + corresponding XDM item. If the body media type is a text media type, the item is an xs:string, + containing the body content. If the media type is an XML media type, the content is + parsed and the item is the resulting XDM document-node. If the media type is an HTML type, the content is tidied up and parsed (this process is - implementation-dependant) and the item is the resulting document node. If this is a - binary media type, the content is returned as a base64Binary item. From the previous - rules, a result item can then be either a document node (from XML or HTML), a string or a - base64Binary.

+ both implementation-dependent and implementation-defined) and the item is the resulting XDM document-node. If this is a + binary media type, the content is returned as an xs:base64Binary item. From the previous + rules, a result item can then be either a document-node (from XML or HTML), an xs:string, or an + xs:base64Binary.
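The mapping from a body to a result item can be sketched in Python. This is a rough analogy under stated assumptions, not an implementation of the function: a Python `str` stands in for xs:string, a DOM document for the XDM document-node, and a Base64 string for xs:base64Binary. The function name, the `media_class` argument and the UTF-8 default charset are assumptions, and HTML tidying is left unimplemented because the process is implementation-specific.

```python
import base64
from xml.dom import minidom


def build_result_item(body: bytes, media_class: str, charset: str = "utf-8"):
    """Build the item returned for one body of the given media type class."""
    if media_class == "text":
        # Text media type: the item is a string holding the body content.
        return body.decode(charset)
    if media_class == "xml":
        # XML media type: the content is parsed into a document node.
        return minidom.parseString(body)
    if media_class == "html":
        # Tidying and parsing HTML is not specified here.
        raise NotImplementedError("HTML tidying is implementation-specific")
    # Any other type is binary and is returned Base64-encoded.
    return base64.b64encode(body).decode("ascii")
```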

When the type of a part is either XML or HTML, its body has to be parsed into a document - node. If there is any error when parsing the content, an error is raised with an - appropriate message err:HC002.

+ node. If an error occurs whilst parsing the content, the error err:HC002 MUST be raised.

If the attribute override-media-type is set on the request, its value is - used instead of the Content-Type returned by the HTTP server. If the Content-Type of the - response is a multipart type, the value of override-media-type can only be a + used instead of the Content-Type header returned by the HTTP server. If the Content-Type header of the + response indicates a multipart type, the value of override-media-type can only be a multipart type, or application/octet-stream (to get the raw entity as a - binary item). If it is not, this is an error err:HC003.

+ binary item). If it is not, the error err:HC003 MUST be raised.
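The constraint on override-media-type for multipart responses can be sketched as follows. This is an illustrative Python sketch; `HttpClientError` and `effective_media_type` are hypothetical names, and `None` stands for an absent override-media-type attribute.

```python
from typing import Optional


class HttpClientError(Exception):
    """Hypothetical exception carrying one of the module's error codes."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def _essence(media_type: str) -> str:
    # Drop parameters such as "; boundary=x" and normalise case.
    return media_type.split(";", 1)[0].strip().lower()


def effective_media_type(response_content_type: str, override: Optional[str]) -> str:
    """Return the media type used to interpret the response entity."""
    if override is None:
        return response_content_type
    if _essence(response_content_type).startswith("multipart/"):
        ov = _essence(override)
        if not (ov.startswith("multipart/") or ov == "application/octet-stream"):
            raise HttpClientError(
                "err:HC003",
                "override-media-type must be multipart or application/octet-stream",
            )
    return override
```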

-

Content types handling

-

In both requests and responses, MIME type strings are used to choose the way the entity - content has to be respectively serialized or parsed. Four different kinds of type are - defined here, which are used in the above text about sending request and receiving response. - The intent is to provide the spirit of the entity content handling regarding its content - type, but an implementation is encouraged to deviate from those rules if it is obvious that - a particular type should be treated in a specific way (normally, that would be the case only - to treat a binary type as another type).

+

Processing Media Types

+

In both requests and responses, Media Type strings are used to choose the way the entity + content has to be serialized or parsed.

+

We define four different classes of Media Type, which are used for sending requests and receiving responses. + The intent is to provide guidance as to handling the entity content with respect to its content + type, but an implementation is permitted to deviate from those rules if it is obvious that + a particular type should be treated in a specific way; typically this is useful for binary types such as [[EXI]].

    -
  • An XML media type has a MIME type of text/xml, +
  • For XML the following media types are appropriate: text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity, as defined in [[!rfc3023]] (except - that application/xml-dtd is considered a text media type). MIME types ending - by +xml are also XML media types.
  • -
  • An HTML media type has a MIME type of text/html.
  • -
  • Text media types are the remaining types beginning with text/.
  • -
  • Binary types are all the other types. An implementation can treat some of those binary + that application/xml-dtd is considered a text media type). Media types ending + with +xml are also considered XML types.
  • +
  • For HTML the media type text/html is suggested.
  • +
  • For Text, a media type beginning with text/ is suggested.
  • +
  • All other media types are treated as binary. An implementation MAY treat some of those binary types as either an XML, HTML or text media type if it is more appropriate (this is implementation-defined).
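The four classes above can be sketched as a small classifier. This is an illustrative Python sketch of the default rules only; the function name and the returned labels are hypothetical, and it does not model the permitted implementation-defined deviations for binary types.

```python
XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}


def classify_media_type(media_type: str) -> str:
    """Return "xml", "html", "text" or "binary" for a media type string."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    # application/xml-dtd is explicitly treated as a text media type.
    if essence == "application/xml-dtd":
        return "text"
    if essence in XML_TYPES or essence.endswith("+xml"):
        return "xml"
    if essence == "text/html":
        return "html"
    if essence.startswith("text/"):
        return "text"
    return "binary"
```

The order of the tests matters: the XML and HTML checks run before the generic `text/` check so that `text/xml` and `text/html` are not classified as plain text.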
-

Summary of Error Conditions

+

Error Codes

err:HC001
An HTTP error occurred.
err:HC002
Error parsing the entity content as XML or HTML.
err:HC003
-
With a multipart response, the override-media-type must be either a multipart media type - or application/octet-stream.
+
A multipart response was received, but the override-media-type was not a multipart media type + or application/octet-stream.
err:HC004
-
The src attribute on the body element is mutually exclusive with all other attribute - (except the media-type).
+
The src attribute on the body element is mutually exclusive with all other attributes + (except the media-type).
err:HC005
-
The request element is not valid.
+
The http:request element is invalid.
err:HC006
A timeout occurred waiting for the response.
err:HC007