S3 is an XML-based API. When you perform a list operation, i.e. GET /<bucket>?list-type=2, you get a response of the following form:

HTTP/1.1 200
x-amz-request-charged: RequestCharged
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult>
   <IsTruncated>boolean</IsTruncated>
   <Contents>
      <ChecksumAlgorithm>string</ChecksumAlgorithm>
      ...
      <ETag>string</ETag>
      <Key>string</Key>
      <LastModified>timestamp</LastModified>
      <Owner>
         <DisplayName>string</DisplayName>
         <ID>string</ID>
      </Owner>
      <RestoreStatus>
         <IsRestoreInProgress>boolean</IsRestoreInProgress>
         <RestoreExpiryDate>timestamp</RestoreExpiryDate>
      </RestoreStatus>
      <Size>long</Size>
      <StorageClass>string</StorageClass>
   </Contents>
   ...
   <Name>string</Name>
   <Prefix>string</Prefix>
   <Delimiter>string</Delimiter>
   <MaxKeys>integer</MaxKeys>
   <CommonPrefixes>
      <Prefix>string</Prefix>
   </CommonPrefixes>
   ...
   <EncodingType>string</EncodingType>
   <KeyCount>integer</KeyCount>
   <ContinuationToken>string</ContinuationToken>
   <NextContinuationToken>string</NextContinuationToken>
   <StartAfter>string</StartAfter>
</ListBucketResult>

The most salient part of a listing is the <Contents> blocks, which enumerate the contents of the bucket. Of particular interest is the <Key> element, which specifies a key you can look up with a GetObject API request.
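As a rough illustration of pulling the <Key> values out of such a response, here is a minimal Python sketch using the standard library's ElementTree. The sample body, the key names in it, and the helper names `local_name` and `list_keys` are made up for this example. The namespace stripping is there because real responses may carry an `xmlns` attribute on the root element, which the template above omits.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response body for illustration only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult>
   <IsTruncated>false</IsTruncated>
   <Contents>
      <Key>photos/cat.jpg</Key>
      <Size>1024</Size>
   </Contents>
   <Contents>
      <Key>notes.txt</Key>
      <Size>12</Size>
   </Contents>
   <KeyCount>2</KeyCount>
</ListBucketResult>
"""


def local_name(tag: str) -> str:
    # ElementTree renders namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def list_keys(body: bytes) -> list:
    """Return the text of every <Contents>/<Key> element, in document order."""
    root = ET.fromstring(body)
    keys = []
    for contents in root:
        if local_name(contents.tag) != "Contents":
            continue
        for child in contents:
            if local_name(child.tag) == "Key":
                keys.append(child.text or "")
    return keys


print(list_keys(SAMPLE.encode("utf-8")))  # ['photos/cat.jpg', 'notes.txt']
```

The keys returned this way are exactly what the XML parser produces. Any further decoding is a separate step.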

Escaping

One of the most notorious difficulties with XML is escaping. Markup characters like < are special and must be replaced with entity references such as &lt;. Slashes / are not special to XML, but S3 conventionally treats them as directory delimiters. Furthermore, XML supports a special block syntax, <![CDATA[...]]>, for character data that does not need to be escaped.
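To make the two XML mechanisms concrete, the following sketch uses Python's standard library to escape a key with entity references and to read the same text back from both an escaped element and a CDATA section. It only illustrates generic XML behavior and is not a claim about what S3 emits.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_key = "foo<Contents>"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw_key)        # 'foo&lt;Contents&gt;'
escaped_amp = escape("&lt")      # '&amp;lt' (a literal "&lt" is itself escaped)

# A parser reverses the escaping, so the original text comes back.
via_entities = ET.fromstring(f"<Key>{escaped}</Key>").text

# Inside a CDATA section the same characters may appear unescaped.
via_cdata = ET.fromstring("<Key><![CDATA[foo<Contents>]]></Key>").text

print(escaped, escaped_amp, via_entities == via_cdata == raw_key)
```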

This raises the question: what does the S3 API return when keys contain special characters? I tried uploading files with the following keys through the S3 console interface:

&lt
<![CDATA[...]]>
foo<Contents>

Interestingly, the CDATA key hit something strange, which manifested as noise in the console UI.

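However, the keys in the underlying XML response are escaped with URL encoding (%26 rather than &amp;, %3C rather than &lt;), not with XML entities. S3 applies this encoding when the request sets encoding-type=url, and it reports that back in the <EncodingType> element. These are tricky cases that should be considered when testing vendor conformance or alternative parsing mechanisms.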

<Key>%26lt</Key>
<Key>%3C%21%5BCDATA%5B...%5D%5D%3E</Key>
<Key>foo%3CContents%3E</Key>
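A client therefore has to URL-decode each <Key> value after XML parsing. The sketch below shows one way to do that in Python. The helper name `decode_key` is hypothetical. Using `unquote_plus` (which also turns + into a space) is an assumption, based on the form-style encoding observed later in this note.

```python
from typing import Optional
from urllib.parse import unquote_plus


def decode_key(raw: str, encoding_type: Optional[str]) -> str:
    """Undo URL encoding of a <Key> value when <EncodingType> is 'url'."""
    if encoding_type == "url":
        return unquote_plus(raw)
    # No encoding was requested: the parsed XML text is already the key.
    return raw


# The three keys as they appeared in the response above.
encoded_keys = [
    "%26lt",
    "%3C%21%5BCDATA%5B...%5D%5D%3E",
    "foo%3CContents%3E",
]

for k in encoded_keys:
    print(decode_key(k, "url"))
```

Decoding these yields the three uploaded key names, `&lt`, `<![CDATA[...]]>` and `foo<Contents>`.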

Amazon key guidelines

Amazon publishes guidelines on S3 key names. Special handling may be required for

&$@=;/:+,?
multiple spaces
ASCII range 0–31

Amazon says to avoid

\{^}%\]">[~#| and backticks
ASCII range 128–255

Trying to upload files with names

&$@=;  :+,?
\{^}%\]">[~#|

yields

<Key>%26%24%40%3D%3B++%3A%2B%2C%3F</Key>
<Key>%5C%7B%5E%7D%25%5C%5D%22%3E%5B%7E%23%7C</Key>

Notice that spaces are converted to + while + is URL-encoded as %2B. This is application/x-www-form-urlencoded encoding, not the similar URI percent-encoding(!), which would render a space as %20.
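The difference matters when choosing a decoder. As a small sketch, Python's `urllib.parse` offers both flavors. `quote_plus`/`unquote_plus` follow the form-encoding convention, while `quote`/`unquote` follow plain percent-encoding. The key below assumes two spaces after the semicolon, as implied by the ++ in the response.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

key = "&$@=;  :+,?"  # assumed: two spaces after ';'
s3_output = "%26%24%40%3D%3B++%3A%2B%2C%3F"  # as returned in <Key>

# Form encoding reproduces the observed output for this key.
form_encoded = quote_plus(key)
# Plain percent-encoding renders each space as %20 instead.
uri_encoded = quote(key, safe="")

# Only the form-style decoder recovers the original key.
right = unquote_plus(s3_output)
# The plain decoder leaves '+' alone, so spaces and pluses get confused.
wrong = unquote(s3_output)

print(form_encoded == s3_output, uri_encoded, repr(right), repr(wrong))

# Caveat: Python's quote_plus leaves '~' unescaped, while the second
# response above shows %7E, so re-encoding may not match byte for byte.
# Decoding is unaffected by that difference.
second = unquote_plus("%5C%7B%5E%7D%25%5C%5D%22%3E%5B%7E%23%7C")
```

With the wrong decoder the key comes back as `&$@=;++:+,?`, which is a different and equally valid object name.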

Unit Test

These cases are exercised in conformance testing here. Note that \x00 should work but doesn't (this might be a runtime issue).
