From 578a23c1b97eeac9a0f4ec4f2479a229ffd53bbf Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Tue, 1 Oct 2024 17:48:10 +0200
Subject: [PATCH 1/7] Add RFC 102 text: Embedding resource files into libgdal

---
 doc/source/development/rfc/index.rst          |   1 +
 .../rfc/rfc102_embedded_resources.rst         | 170 ++++++++++++++++++
 doc/source/spelling_wordlist.txt              |   6 +
 3 files changed, 177 insertions(+)
 create mode 100644 doc/source/development/rfc/rfc102_embedded_resources.rst

diff --git a/doc/source/development/rfc/index.rst b/doc/source/development/rfc/index.rst
index 73f24bcf634e..fc7714e67617 100644
--- a/doc/source/development/rfc/index.rst
+++ b/doc/source/development/rfc/index.rst
@@ -107,3 +107,4 @@ RFC list
    rfc98_build_requirements_gdal_3_9
    rfc99_geometry_coordinate_precision
    rfc101_raster_dataset_threadsafety
+   rfc102_embedded_resources
diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
new file mode 100644
index 000000000000..1012a6436eb2
--- /dev/null
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -0,0 +1,170 @@
+.. _rfc-102:
+
+===================================================================
+RFC 102: Embedding resource files into libgdal
+===================================================================
+
+============== =============================================
+Author:        Even Rouault
+Contact:       even.rouault @ spatialys.com
+Started:       2024-Oct-01
+Status:        Draft
+Target:        GDAL 3.10 or 3.11
+============== =============================================
+
+Summary
+-------
+
+This RFC uses C23 ``#embed`` pre-processor directive, when available,
+to be able to embed GDAL resource files directly into libgdal. It is also
+intended to be used for PROJ, in particular for its :file:`proj.db` file.
+
+Motivation
+----------
+
+Some parts of GDAL core, but mostly drivers, depend on a number of resource
+files for correct execution. Locating those resource files on the filesystem
+can be painful in some use cases of GDAL, that involve relocating the GDAL
+binary at installation time. One such case could be the GDAL embedded in Rasterio
+or Fiona binary wheels where :config:`GDAL_DATA` must be correctly set currently.
+Web-assembly (WASM) use cases come also to mind as users of GDAL builds where
+resources are directly included in libgdal.
+
+Technical solution
+------------------
+
+The C23 standard includes a `#embed "filename" `__
+pre-processor directive that ingests the specified filename and returns its
+content as tokens that can be stored in an unsigned char or char array.
+
+Getting the content of a file into a variable is as simple as the following
+(which also demonstrates adding a nul-terminating character when this is needed):
+
+.. code-block:: c
+
+    static const char szPDS4Template[] = {
+    #embed "data/pds4_template.xml"
+      , '\0'};
+
+Compiler support
+----------------
+
+Support for that directive is still very new. clang 19.1 is the
+first compiler which has a release including it, and has an efficient
+implementation of it, able to embed very large files with minimum RAM and CPU
+usage.
+
+The development version of GCC 15 also supports it, but in a non-optimized way
+for now. i.e. trying to include large files, of several tens of megabytes could
+cause significant compilation time, but without impact on runtime. This is not
+an issue for GDAL use cases, and there is intent from GCC developers to improve
+this in the future.
+
+Embedding PROJ's :file:`proj.db` of size 9.1 MB with GCC 15dev at time of writing takes
+18 seconds and 1.7 GB RAM, compared to 0.4 second and 400 MB RAM for clang 19,
+which is still reasonable (Generating :file:`proj.db` itself from its source .sql files
+takes one minute on the same system).
+
+There is no timeline for Visual Studio C/C++ at time of writing (it has been
+`requested by users `__)
+
+To be noted that currently clang 19.1 only supports ``#embed`` in .c files, not
+C++ ones (the C++ standard has not yet adopted this feature). So embedding
+resources must be done in a .c file, which is obviously not a problem since
+we can easily export symbols/functions from a .c file to be available by C++.
+
+New CMake options
+-----------------
+
+Resources will only be embedded if the new ``EMBED_RESOURCE_FILES`` CMake option
+is set to ``ON``. This option will default to ``ON`` for static library builds
+and if C23 ``#embed`` is detected to be available. Users might also turn it to ON for
+shared library builds. A CMake error is emitted if the option is turned on but
+the compiler lacks support for it.
+
+A complementary CMake option ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` will also
+be added. It will default to ``OFF``. When set to ON, GDAL will not try to
+locate resource files in the GDAL_DATA directory burnt at build time into libgdal
+(``${install_prefix}/share/gdal``), or by the :config:`GDAL_DATA` configuration option.
+
+Said otherwise, if ``EMBED_RESOURCE_FILES=ON`` but ``USE_ONLY_EMBEDDED_RESOURCE_FILES=OFF``,
+GDAL will first try to locate resource files from the file system, and
+fallback to the embedded version if not found.
+
+The resource files will still be installed in ``${install_prefix}/share/gdal``,
+unless ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` is set to ON.
+
+Impacted code
+-------------
+
+- gcore: embedding LICENSE.TXT, and tms_*.json files
+- frmts/grib: embedding GRIB2 CSV files
+- frmts/hdf5: embedding bag_template.xml
+- frmts/nitf: embedding nitf_spec.xml
+- frmts/pdf: embedding pdf_composition.xml
+- frmts/pds: embedding pds4_template.xml
+- ogr/ogrsf_frmts/dgn: embedding seed_2d.dgn and seed_3d.dgn
+- ogr/ogrsf_frmts/dxf: embedding header.dxf and leader.dxf
+- ogr/ogrsf_frmts/gml: embedding .gfs files and gml_registry.xml
+- ogr/ogrsf_frmts/gmlas: embedding gmlasconf.xml
+- ogr/ogrsf_frmts/miramon: embedding MM_m_idofic.csv
+- ogr/ogrsf_frmts/osm: embedding osm_conf.ini
+- ogr/ogrsf_frmts/plscenes: embedding plscenesconf.json
+- ogr/ogrsf_frmts/s57: embedding s57*.csv files
+- ogr/ogrsf_frmts/sxf: embedding default.rsc
+- ogr/ogrsf_frmts/vdv: embedding vdv452.xml
+
+PROJ specificities
+------------------
+
+Loading of the embedded :file:`proj.db` will involve using the
+`SQLite3 memvfs `__,
+as done by
+`DuckDB Spatial `__
+
+Considered alternatives
+-----------------------
+
+Including resource files into libraries has been a long-wished feature of C/C++.
+Different workarounds have emerged over the years, such as the use of the
+``od -x`` utility, GNU ``ld`` linker ``-b`` mode, or CMake-based solutions such
+as https://jonathanhamberg.com/post/cmake-file-embedding/
+
+We could potentially use the latter to address non-C23 capable compilers, but
+we have chosen not to do that, for the sake of implementation simplicity. And,
+if considering using the CMake trick as the only solution, we should note that
+C23 #embed has the potential for better compile time, as demonstrated by clang
+implementation.
+
+Backward compatibility
+----------------------
+
+Fully backwards compatible.
+
+C23 is not required if EMBED_RESOURCE_FILES is not enabled.
+
+Documentation
+-------------
+
+The 2 new CMake variables will be documented.
+
+Testing
+-------
+
+The existing fedora:rawhide continuous integration target, which has now clang
+19.1 available, will be modified to test the effect of the new variables.
+
+Local builds using GCC 15dev builds of https://jwakely.github.io/pkg-gcc-latest/
+have also been successfully done during the development of the candidate implementation
+
+Related issues and PRs
+----------------------
+
+- https://github.com/OSGeo/gdal/issues/10780
+
+- Candidate implementation (in progress): https://github.com/OSGeo/gdal/compare/master...rouault:gdal:embedded_resources?expand=1
+
+Voting history
+--------------
+
+TBD
diff --git a/doc/source/spelling_wordlist.txt b/doc/source/spelling_wordlist.txt
index 77d3e4e14e6c..bb808f788eb7 100644
--- a/doc/source/spelling_wordlist.txt
+++ b/doc/source/spelling_wordlist.txt
@@ -1353,6 +1353,7 @@ IdentificationTolerance
 identificator
 Identificator
 IDisposable
+idofic
 Idrisi
 iDriver
 idx
@@ -1782,6 +1783,7 @@ minY
 MinZ
 Mipmaps
 MiraD
+miramon
 mis
 mitab
 mkdir
@@ -2409,6 +2411,7 @@ Placemark
 plaintext
 Plessis
 plmosaic
+plscenes
 plscenesconf
 pM
 pnBufferSize
@@ -2841,6 +2844,7 @@ RPF
 rpr
 RRaster
 rrd
+rsc
 rsiz
 rst
 rsync
@@ -3142,6 +3146,7 @@ swi
 Swif
 swiftclient
 swq
+sxf
 sym
 symlinked
 syntaxes
@@ -3413,6 +3418,7 @@ vcpkg
 vct
 vcvars
 vdc
+vdv
 vecror
 VectorInfo
 VectorInfoOptions

From e4982d826b478a6ffd3d548c521c6534f5096dd7 Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Thu, 3 Oct 2024 22:00:22 +0200
Subject: [PATCH 2/7] RFC102 text: PROJ specificities

---
 doc/source/development/rfc/rfc102_embedded_resources.rst | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
index 1012a6436eb2..ffe22a7c8450 100644
--- a/doc/source/development/rfc/rfc102_embedded_resources.rst
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -122,6 +122,10 @@ Loading of the embedded :file:`proj.db` will involve using the
 as done by
 `DuckDB Spatial `__

+Note: acknowledging how critical access to proj.db is, we make an exception of
+also allowing embedding it with non-C23 capable compilers, using a CMake script,
+derived from https://jonathanhamberg.com/post/cmake-file-embedding/.
+
 Considered alternatives
 -----------------------

@@ -162,7 +166,9 @@ Related issues and PRs

 - https://github.com/OSGeo/gdal/issues/10780

-- Candidate implementation (in progress): https://github.com/OSGeo/gdal/compare/master...rouault:gdal:embedded_resources?expand=1
+- GDAL candidate implementation (in progress): https://github.com/OSGeo/gdal/compare/master...rouault:gdal:embedded_resources?expand=1
+
+- PROJ candidate implementation: https://github.com/OSGeo/PROJ/pull/4265

 Voting history
 --------------

From 98cfa1da16dadca1aa7642e08dc8b57cc23f145e Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Wed, 9 Oct 2024 16:27:07 +0200
Subject: [PATCH 3/7] RFC102 text: updates

---
 .../development/rfc/rfc102_embedded_resources.rst | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
index ffe22a7c8450..19572c9d0b6a 100644
--- a/doc/source/development/rfc/rfc102_embedded_resources.rst
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -16,8 +16,9 @@ Summary
 -------

 This RFC uses C23 ``#embed`` pre-processor directive, when available,
-to be able to embed GDAL resource files directly into libgdal. It is also
-intended to be used for PROJ, in particular for its :file:`proj.db` file.
+to be able to optionally embed GDAL resource files directly into libgdal.
+It is also intended to be used for PROJ, in particular for its :file:`proj.db` file.
+For PROJ, a fallback mechanism will be used to not require C23.

 Motivation
 ----------
@@ -88,7 +89,7 @@ locate resource files in the GDAL_DATA directory burnt at build time into libgda
 (``${install_prefix}/share/gdal``), or by the :config:`GDAL_DATA` configuration option.

 Said otherwise, if ``EMBED_RESOURCE_FILES=ON`` but ``USE_ONLY_EMBEDDED_RESOURCE_FILES=OFF``,
-GDAL will first try to locate resource files from the file system, and
+GDAL/PROJ will first try to locate resource files from the file system, and
 fallback to the embedded version if not found.

 The resource files will still be installed in ``${install_prefix}/share/gdal``,
@@ -102,7 +103,7 @@ Impacted code
 - frmts/hdf5: embedding bag_template.xml
 - frmts/nitf: embedding nitf_spec.xml
 - frmts/pdf: embedding pdf_composition.xml
-- frmts/pds: embedding pds4_template.xml
+- frmts/pds: embedding pds4_template.xml and vicar.json
 - ogr/ogrsf_frmts/dgn: embedding seed_2d.dgn and seed_3d.dgn
 - ogr/ogrsf_frmts/dxf: embedding header.dxf and leader.dxf
 - ogr/ogrsf_frmts/gml: embedding .gfs files and gml_registry.xml
@@ -122,6 +123,8 @@ Loading of the embedded :file:`proj.db` will involve using the
 as done by
 `DuckDB Spatial `__

+Embedding of resource files in PROJ is limited to :file:`proj.db`
+
 Note: acknowledging how critical access to proj.db is, we make an exception of
 also allowing embedding it with non-C23 capable compilers, using a CMake script,
 derived from https://jonathanhamberg.com/post/cmake-file-embedding/.
@@ -145,7 +148,7 @@ Backward compatibility

 Fully backwards compatible.

-C23 is not required if EMBED_RESOURCE_FILES is not enabled.
+C23 is not required, unless EMBED_RESOURCE_FILES is enabled in GDAL.

 Documentation
 -------------
@@ -166,7 +169,7 @@ Related issues and PRs

 - https://github.com/OSGeo/gdal/issues/10780

-- GDAL candidate implementation (in progress): https://github.com/OSGeo/gdal/compare/master...rouault:gdal:embedded_resources?expand=1
+- GDAL candidate implementation: https://github.com/OSGeo/gdal/pull/10972

 - PROJ candidate implementation: https://github.com/OSGeo/PROJ/pull/4265


From 87a21b5f66859a7b698d89a2248c116ce53ca1ad Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Wed, 9 Oct 2024 17:05:36 +0200
Subject: [PATCH 4/7] RFC102: also extent to proj.ini

---
 .../development/rfc/rfc102_embedded_resources.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
index 19572c9d0b6a..1c58ecf3e15f 100644
--- a/doc/source/development/rfc/rfc102_embedded_resources.rst
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -9,7 +9,7 @@ Author:        Even Rouault
 Contact:       even.rouault @ spatialys.com
 Started:       2024-Oct-01
 Status:        Draft
-Target:        GDAL 3.10 or 3.11
+Target:        GDAL 3.11, PROJ 9.6
 ============== =============================================

 Summary
@@ -17,8 +17,9 @@ Summary

 This RFC uses C23 ``#embed`` pre-processor directive, when available,
 to be able to optionally embed GDAL resource files directly into libgdal.
-It is also intended to be used for PROJ, in particular for its :file:`proj.db` file.
-For PROJ, a fallback mechanism will be used to not require C23.
+It is also intended to be used for PROJ, for its :file:`proj.db` and
+:file:`proj.ini` files. For PROJ, a fallback mechanism will be used to not
+require C23.

 Motivation
 ----------
@@ -123,7 +124,8 @@ Loading of the embedded :file:`proj.db` will involve using the
 as done by
 `DuckDB Spatial `__

-Embedding of resource files in PROJ is limited to :file:`proj.db`
+Embedding of resource files in PROJ is limited to :file:`proj.db` and
+ :file:`proj.ini`.

 Note: acknowledging how critical access to proj.db is, we make an exception of
 also allowing embedding it with non-C23 capable compilers, using a CMake script,
 derived from https://jonathanhamberg.com/post/cmake-file-embedding/.

From 29823fba6d84f252946c1d1aaa64754f403145ea Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Thu, 10 Oct 2024 15:07:38 +0200
Subject: [PATCH 5/7] RFC102 text: point to PROJ RFC-8

---
 .../rfc/rfc102_embedded_resources.rst         | 29 +++++--------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
index 1c58ecf3e15f..7a324a5ab67b 100644
--- a/doc/source/development/rfc/rfc102_embedded_resources.rst
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -9,7 +9,7 @@ Author:        Even Rouault
 Contact:       even.rouault @ spatialys.com
 Started:       2024-Oct-01
 Status:        Draft
-Target:        GDAL 3.11, PROJ 9.6
+Target:        GDAL 3.11
 ============== =============================================

 Summary
@@ -17,9 +17,9 @@ Summary

 This RFC uses C23 ``#embed`` pre-processor directive, when available,
 to be able to optionally embed GDAL resource files directly into libgdal.
-It is also intended to be used for PROJ, for its :file:`proj.db` and
-:file:`proj.ini` files. For PROJ, a fallback mechanism will be used to not
-require C23.
+
+A similar `PROJ RFC-8 `__ has been
+submitted for PROJ to embed its :file:`proj.db` and :file:`proj.ini` files.

 Motivation
 ----------
@@ -90,7 +90,7 @@ locate resource files in the GDAL_DATA directory burnt at build time into libgda
 (``${install_prefix}/share/gdal``), or by the :config:`GDAL_DATA` configuration option.

 Said otherwise, if ``EMBED_RESOURCE_FILES=ON`` but ``USE_ONLY_EMBEDDED_RESOURCE_FILES=OFF``,
-GDAL/PROJ will first try to locate resource files from the file system, and
+GDAL will first try to locate resource files from the file system, and
 fallback to the embedded version if not found.

 The resource files will still be installed in ``${install_prefix}/share/gdal``,
@@ -116,21 +116,6 @@ Impacted code
 - ogr/ogrsf_frmts/sxf: embedding default.rsc
 - ogr/ogrsf_frmts/vdv: embedding vdv452.xml

-PROJ specificities
-------------------
-
-Loading of the embedded :file:`proj.db` will involve using the
-`SQLite3 memvfs `__,
-as done by
-`DuckDB Spatial `__
-
-Embedding of resource files in PROJ is limited to :file:`proj.db` and
- :file:`proj.ini`.
-
-Note: acknowledging how critical access to proj.db is, we make an exception of
-also allowing embedding it with non-C23 capable compilers, using a CMake script,
-derived from https://jonathanhamberg.com/post/cmake-file-embedding/.
-
 Considered alternatives
 -----------------------

@@ -171,9 +156,9 @@ Related issues and PRs

 - https://github.com/OSGeo/gdal/issues/10780

-- GDAL candidate implementation: https://github.com/OSGeo/gdal/pull/10972
+- `GDAL candidate implementation `__

-- PROJ candidate implementation: https://github.com/OSGeo/PROJ/pull/4265
+- `PROJ RFC-8 Embedding resource files into libproj `__

 Voting history
 --------------

From 585f978f88a68f0a50607f15006a3f60a55f694b Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Sat, 12 Oct 2024 14:36:25 +0200
Subject: [PATCH 6/7] RFC102 text: tweak language

Co-authored-by: Peter A. Jonsson
---
 doc/source/development/rfc/rfc102_embedded_resources.rst | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
index 7a324a5ab67b..21dc9fc01c21 100644
--- a/doc/source/development/rfc/rfc102_embedded_resources.rst
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -24,11 +24,10 @@ submitted for PROJ to embed its :file:`proj.db` and :file:`proj.ini` files.
 Motivation
 ----------

-Some parts of GDAL core, but mostly drivers, depend on a number of resource
-files for correct execution. Locating those resource files on the filesystem
-can be painful in some use cases of GDAL, that involve relocating the GDAL
-binary at installation time. One such case could be the GDAL embedded in Rasterio
-or Fiona binary wheels where :config:`GDAL_DATA` must be correctly set currently.
+Some parts of GDAL core, mostly drivers, require external resource files located
+in the filesystem. Locating these resource files is difficult for use cases where
+the GDAL binaries are relocated during installation time.
+One such case could be the GDAL embedded in Rasterio or Fiona binary wheels where :config:`GDAL_DATA` must be set to the directory of the resource files.
 Web-assembly (WASM) use cases come also to mind as users of GDAL builds where
 resources are directly included in libgdal.

From a644d24f757217fe1d597300f224e1afc8a9a677 Mon Sep 17 00:00:00 2001
From: Even Rouault
Date: Tue, 29 Oct 2024 17:54:52 +0100
Subject: [PATCH 7/7] RFC102 text: update status

---
 doc/source/development/rfc/rfc102_embedded_resources.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
index 21dc9fc01c21..96ed7ade8ae3 100644
--- a/doc/source/development/rfc/rfc102_embedded_resources.rst
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -8,7 +8,7 @@ RFC 102: Embedding resource files into libgdal
 Author:        Even Rouault
 Contact:       even.rouault @ spatialys.com
 Started:       2024-Oct-01
-Status:        Draft
+Status:        Adopted, implemented
 Target:        GDAL 3.11
 ============== =============================================

@@ -162,4 +162,4 @@ Related issues and PRs
 Voting history
 --------------

-TBD
++1 from PSC members JukkaR, JavierJS, KurtS, HowardB and EvenR
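
Reviewer note: the lookup order that the RFC text describes for ``EMBED_RESOURCE_FILES=ON`` combined with ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` (file system first, embedded copy as fallback, or embedded copy only) can be illustrated with a minimal C sketch. It is not taken from the candidate implementation. ``find_resource``, ``embedded_resource`` and the ``only_embedded`` flag are hypothetical names, and a string literal stands in for the array that ``#embed`` would populate, so that the sketch also compiles without C23 support.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a resource embedded at build time. With a C23
 * compiler this array could instead be filled by #embed, as in the RFC. */
static const char embedded_resource[] = "<embedded-default/>";

/* Returns a malloc()'ed, nul-terminated copy of the resource (caller frees),
 * or NULL on allocation failure.
 * data_dir plays the role of the GDAL_DATA directory and may be NULL.
 * only_embedded mimics USE_ONLY_EMBEDDED_RESOURCE_FILES=ON. */
static char *find_resource(const char *data_dir, const char *filename,
                           int only_embedded)
{
    if (!only_embedded && data_dir != NULL)
    {
        char path[1024];
        const int n = snprintf(path, sizeof(path), "%s/%s", data_dir, filename);
        FILE *fp = NULL;
        if (n > 0 && (size_t)n < sizeof(path))
            fp = fopen(path, "rb");
        if (fp != NULL)
        {
            /* A file found on the file system takes precedence. */
            char *content = NULL;
            long size = -1;
            if (fseek(fp, 0, SEEK_END) == 0)
                size = ftell(fp);
            if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0)
            {
                content = (char *)malloc((size_t)size + 1);
                if (content != NULL)
                {
                    const size_t nread = fread(content, 1, (size_t)size, fp);
                    content[nread] = '\0';
                }
            }
            fclose(fp);
            if (content != NULL)
                return content;
        }
    }

    /* Not found (or file system lookup disabled): use the embedded copy. */
    char *copy = (char *)malloc(sizeof(embedded_resource));
    if (copy != NULL)
        memcpy(copy, embedded_resource, sizeof(embedded_resource));
    return copy;
}
```

With no data directory, or with a directory that does not contain the file, ``find_resource`` returns the embedded copy. Passing ``only_embedded = 1`` skips the file system entirely, which is the behaviour the RFC attributes to ``USE_ONLY_EMBEDDED_RESOURCE_FILES=ON``.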