GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type #438

Draft: wants to merge 1 commit into base `master`
8 changes: 8 additions & 0 deletions LogicalTypes.md
@@ -256,6 +256,14 @@ The primitive type is a 2-byte `FIXED_LEN_BYTE_ARRAY`.

The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT` and `DOUBLE`.

+### VARIABLE_SIZE_LIST
+
+The `VARIABLE_SIZE_LIST` annotation represents a variable-size list of elements
+of a primitive data type. It must annotate a `BYTE_ARRAY` primitive type.
+
+The `BYTE_ARRAY` data is interpreted as a variable-size sequence of elements of
+the same primitive data type.
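
To make the interpretation above concrete, here is a minimal Python sketch of how a reader might turn one `BYTE_ARRAY` value into a list of elements. The proposed text does not say how the elements are laid out inside the byte array, so the sketch assumes fixed-width elements stored back to back in little-endian order (as in Parquet's PLAIN encoding). The function names and the `_FORMATS` table are hypothetical and only cover a few element types.

```python
import struct

# Hypothetical mapping from a Parquet primitive type name to a `struct`
# format code. The fixed-width little-endian layout is an assumption; the
# proposed spec text does not define the element encoding.
_FORMATS = {"INT32": "i", "INT64": "q", "FLOAT": "f", "DOUBLE": "d"}


def decode_variable_size_list(data: bytes, element_type: str) -> list:
    """Interpret one BYTE_ARRAY value as a list of `element_type` values."""
    code = _FORMATS[element_type]
    width = struct.calcsize("<" + code)
    if len(data) % width != 0:
        raise ValueError(
            f"{len(data)} bytes is not a whole number of {element_type} elements"
        )
    count = len(data) // width
    return list(struct.unpack(f"<{count}{code}", data))


def encode_variable_size_list(values, element_type: str) -> bytes:
    """Pack a sequence of values into the bytes of one BYTE_ARRAY value."""
    code = _FORMATS[element_type]
    return struct.pack(f"<{len(values)}{code}", *values)
```

Under these assumptions the list length is not stored separately: it follows from the byte length of each value divided by the element width, which is why a byte array that is not a whole multiple of the width is rejected.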

## Temporal Types

### DATE
18 changes: 12 additions & 6 deletions src/main/thrift/parquet.thrift
@@ -289,6 +289,9 @@ struct ListType {} // see LogicalTypes.md
struct EnumType {} // allowed for BYTE_ARRAY, must be encoded with UTF-8
struct DateType {} // allowed for INT32
struct Float16Type {} // allowed for FIXED[2], must be encoded as raw FLOAT16 bytes
+struct VariableSizeListType { // allowed for BYTE_ARRAY, see LogicalTypes.md
+  1: required Type type;
+}
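
As a rough illustration of what this struct carries, the following Python sketch mirrors it with a dataclass and checks the one constraint stated in the diff: the annotation is only allowed on `BYTE_ARRAY`. The `Type` values follow the existing `Type` enum in `parquet.thrift`; `check_annotation` is a hypothetical helper, not part of any Parquet library.

```python
from dataclasses import dataclass
from enum import IntEnum


class Type(IntEnum):
    """Physical types, numbered as in the `Type` enum of parquet.thrift."""
    BOOLEAN = 0
    INT32 = 1
    INT64 = 2
    INT96 = 3
    FLOAT = 4
    DOUBLE = 5
    BYTE_ARRAY = 6
    FIXED_LEN_BYTE_ARRAY = 7


@dataclass(frozen=True)
class VariableSizeListType:
    """Mirror of the proposed Thrift struct: one required element type."""
    type: Type


def check_annotation(physical_type: Type, logical_type: VariableSizeListType) -> None:
    """Hypothetical validation: the annotation may only sit on BYTE_ARRAY."""
    if physical_type is not Type.BYTE_ARRAY:
        raise ValueError(
            f"VARIABLE_SIZE_LIST must annotate BYTE_ARRAY, got {physical_type.name}"
        )
```

For example, `check_annotation(Type.BYTE_ARRAY, VariableSizeListType(Type.INT32))` passes, while using `Type.INT32` as the physical type raises `ValueError`. Whether every `Type` value is a sensible element type is left open by the diff, so the sketch does not restrict it.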

/**
* Logical type to annotate a column that is always null.
@@ -397,12 +400,15 @@ union LogicalType {
8: TimestampType TIMESTAMP

// 9: reserved for INTERVAL
-  10: IntType INTEGER // use ConvertedType INT_* or UINT_*
-  11: NullType UNKNOWN // no compatible ConvertedType
-  12: JsonType JSON // use ConvertedType JSON
-  13: BsonType BSON // use ConvertedType BSON
-  14: UUIDType UUID // no compatible ConvertedType
-  15: Float16Type FLOAT16 // no compatible ConvertedType
+  10: IntType INTEGER // use ConvertedType INT_* or UINT_*
+  11: NullType UNKNOWN // no compatible ConvertedType
+  12: JsonType JSON // use ConvertedType JSON
+  13: BsonType BSON // use ConvertedType BSON
+  14: UUIDType UUID // no compatible ConvertedType
+  15: Float16Type FLOAT16 // no compatible ConvertedType
+  // 16: reserved for GEOMETRY
+  // 17: reserved for FIXED_SIZE_LIST
+  18: VariableSizeListType VARIABLE_SIZE_LIST // no compatible ConvertedType
}

/**