-
-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Store slim table tags in hstore #692
Comments
Internally |
If you want to go extensions, there's between-row deduplicating |
The version requirements for jsonb would require two different implementations in osm2pgsql, one for 9.4+ and another for previous versions. This rules it out. I do not see requiring the extension as an issue, given we require the postgis extension and many uses will require the hstore extension anyways. This would be true regardless of if I felt if jsonb was a better option for new code. Absent good evidence on performance, I don't. |
As for zson, requiring an extension that's not in contrib and compiled by default (or part of postgis) is a barrier. |
Hi, I am using direct About other (above) comments, nowdays — 2018 and pg v9.3 end of life, the v9.4+ is the reference version — is time to use The
|
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=NUM - (1 for old format, 2 for new format) * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. Note that this commit currently doesn't have a mechanism for detecting which format a database has when doing updates. This will be added later. For the time being you have to use the same command line options for updates that were used on import. This commit doesn't properly do testing. If you want to run all the tests with the new database format, set `middle_database_format = 2` in options.hpp and recompile. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=NUM - (1 for old format, 2 for new format) * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. Note that this commit currently doesn't have a mechanism for detecting which format a database has when doing updates. This will be added later. For the time being you have to use the same command line options for updates that were used on import. This commit doesn't properly do testing. If you want to run all the tests with the new database format, set `middle_database_format = 2` in options.hpp and recompile. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit doesn't properly do testing. If you want to run all the tests with the new database format, you have to set `options.middle_database_format = 2` in osm2pgsql.cpp and recompile. We'll need a better solution for that. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
The database format we have been using for the middle has some problems: * Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import. * Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") (if --extra-attributes) is used. This has the potential of name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored. * The way we store tags as array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database. * The way relation members are stored is rather arcane. * When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags). This commit fixes all those problems introducing a new database structure: * Tags are stored in JSONB columns. * The nodes table gets a new "tags" column. * Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified and the columns can be NULL if not used which makes the overhead tiny. * Relation members are now stored as JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members. * The format for way nodes has been kept as an array of bigints. The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "osm_planet" as default prefix) has been kept, but we might want to change this and get rid of the prefix, schemas are a better mechanism and they have been available for a while. There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit of updating the user name correctly if a user name changes. There are two new command line options: * --middle-database-format=FORMAT - 'legacy' (default) or 'new' * --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file. For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file, all nodes, ways, and relations with all their tags and attributes. A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This is read in append mode and handled appropriately. This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases unless the RapidJSON library we were using before. Closes osm2pgsql-dev#692 Closes osm2pgsql-dev#1170 See osm2pgsql-dev#1502
I did some testing, and the representation of tags in a hstore is more efficient than our current
text[]
array of alternating keys and values. For testing latter can be turned into the former withhstore(tags)
.Doing this on the slim table of a planet and summing the
pg_column_size
gives a size of 38GB fortext[]
representation and 30GB forhstore
. The overall size of the current slim table is 94GB for the table and 190GB for the indexes so this is about a 5% space savings.I considered jsonb, but it is the same size as hstore.
This change would require the hstore extension for osm2pgsql databases. I do not consider this an issue because hstore is a standard part of contrib, and many databases require it already.
In addition to being more efficient, if someone did want to use our internal slim tables to query by tag, it is much easier to do so with hstore.
The text was updated successfully, but these errors were encountered: