Skip to content
This repository has been archived by the owner on Mar 25, 2024. It is now read-only.

Operation chaining #124

Open
vmora opened this issue Mar 2, 2016 · 17 comments
Open

Operation chaining #124

vmora opened this issue Mar 2, 2016 · 17 comments

Comments

@vmora
Copy link
Contributor

vmora commented Mar 2, 2016

Experimenting with the python API, raises some question about what we want the API to be.

Chaining operations while maintaining smath precision (SP for lazy exact, mp, interval...) proved difficult to port to postgis (@mhugo any update on passing references around). It is not really a problem with the python API where we can expose whatever class we feel like.

The fundamental question to me is: to we need to do that ?

We know now that we need to post-cond output validity when plungin back into double. If we are chaining operations with SP (i.e. operation return a ref to a sfcgal class that may be invalid in double), we could do something like:

double_geometry( operation1( operation2( sp_geometry( input_geom_in_double ) ) ) )

This seem to be a good approach iff:

  • we are actually able to do that (fine in pythin, maybe not in postgis because we need to pass refs)
  • we think that we can gain something significant (e.g. we may be unable to post-cond the validity of intermediates while the final validity is OK, like we do an intersection at the end, and the constructed problematic area is removed by the operation)

The alternative, or complementary (current) interface is:

double_geometry operation1( input_geom_in_double ) = double_geometry( operation1(  sp_geometry( input_geom_in_double ) ) )

There is another complexity arising from the "passing ref to chain": what is the SP precise type that is accepted as input ? Some operations double -> double would be optimally implemented without SP.

My guts feeling is that the simple interface is better (i.e. independant functions that can convert the double input to whatever type is optimal, and return a double geom that is valid in double).

What do you think ?

@mborne
Copy link
Contributor

mborne commented Mar 2, 2016

I think that precision lost is a matter of serialization. We failed to keep "exact computation" between PostGIS function calls because :

  • geometries are "GSERIALIZED" between each function call
  • Coordinates are stored in double in GSERIALIZED format (rational number allows exact representation of the coordinates)

The fact that PostGIS need a serialization of geometries is a PostGIS problem. I don't think that a "pysfcgal" module should suffer from this limitation.

Meanwhile, at the end of the chain, round to double is a requirement. So, instead of exposing the internal representation of the coordinates in all functions ("operation with double geometries"), I think that we should focus on a way to cleanly round Epeck geometries to Epick geometries (aka without creating intersections)

Trying multiple position for points creating intersections could be an alternative. The missing "snap rounding 3D" ensuring a minimal distance between points, segments and triangles in a mesh would be the best way.

@ocourtin
Copy link

ocourtin commented Mar 2, 2016

Python API sounds like a good idea.
And as long as we can, keeping the same interface and abilities for both PostGIS and Python API, too.

For PostGIS, if new PG Headers are not helpful enough, what about something like:
ST_SFCGAL_chained("operation2(%s, operation1(%s))", geometry, int)

@vmora
Copy link
Contributor Author

vmora commented Mar 2, 2016

@mborne then what would be the kernel of the geom exposed to python ? Epeck ?

The second someone ask for an x()... @mhugo what happens then ?

@vmora
Copy link
Contributor Author

vmora commented Mar 2, 2016

ST_SFCGAL_chained("operation2(%s, operation1(%s))", geometry, int)

Nice... this is a serious and functional option. The second question remains: do we gain by doing that, if yes, then what is the kernel for the public geometries.

Once the choices made, I can wrap the wkt reader and start testing with python. Benching unit tests with boost framework .vs. python gonna be fun.

@mborne
Copy link
Contributor

mborne commented Mar 2, 2016

@vmora If we use a single kernel, we should use Epeck. Provided points are not created (situation where Epeck is required), the price should be a factor 2 compared to pure double processing (interval arithmetic)

Optimal approach might requires to switch between serialized, epick and epeck geometries according to the requirements of the algorithms with the following logic :

  • geometry is serialized, read as epick geometry if algorithm doesn't requires epeck, invoke algorithm
  • geometry is epick, algorithm requires epick => invoke epick
  • geometry is epick, algorithm requires epeck => convert geometry to epeck, invoke epeck
  • geometry is epeck, invoke epeck (even if the algorithm only requires epick)

A kind of "Lazy Geometry" behind the opaque pointer (sfcgal_geometry_t) in the CAPI with some helpers :

  • isEpeckGeometry
  • getEpickGeometry : should not be used if the wrapped geometry is epeck
  • getEpeckGeometry
    ?

@Remi-C
Copy link

Remi-C commented Mar 2, 2016

Hey,
even if it doesn't look feasible to keep precision within postgis,
an advanced user could still use plpythonu and pysfcgal to workaround that,
which kind of push toward keeping precision in python.

Yes I do realize that plpython is not going to be used in production.
My 2 uneducated cents.

Cheers,
Rémi-C

2016-03-03 0:32 GMT+01:00 MBorne notifications@github.com:

@vmora https://github.com/vmora If we use a single kernel, we should
use Epeck. Provided points are not created (situation where Epeck is
required), the price should be a factor 2 compared to pure double
processing (interval arithmetic)

Optimal approach might requires to switch between serialized, epick and
epeck geometries according to the requirements of the algorithms with the
following logic :

  • geometry is serialized, read as epick geometry if algorithm doesn't
    requires epeck, invoke algorithm
  • geometry is epick, algorithm requires epick => invoke epick
  • geometry is epick, algorithm requires epeck => convert geometry to
    epeck, invoke epeck
  • geometry is epeck, invoke epeck (event if the algorithm only
    requires epick)

A kind of "Lazy Geometry" behind the opaque pointer (sfcgal_geometry_t) in
the CAPI with some helpers :

  • isEpeckGeometry
  • getEpickGeometry : should not be used if the wrapped geometry is
    epeck
  • getEpeckGeometry ?


Reply to this email directly or view it on GitHub
#124 (comment).

@ocourtin
Copy link

ocourtin commented Mar 3, 2016

@remic Yeap the Pl/Python way is far better/clever than parsing back the chain SFCGAL side (i.e SF_SFCGAL_chained)

Thanks Remi for this input !

@mhugo
Copy link
Contributor

mhugo commented Mar 3, 2016

Could we exhibit a chaining example where we might gain in keeping the precision ? I mean, apart from the serialization / deserialization problem of postgis ? With a pysfcgal API ?

@vmora
Copy link
Contributor Author

vmora commented Mar 3, 2016

@mborne in summary the returned geometry could be epeck or epick, so we need to expose the two types to python, or to have a wrapper... why not an un-templated Geometry (variant) with both epick and epeck cases ?

@mborne
Copy link
Contributor

mborne commented Mar 3, 2016

why not an un-templated Geometry (variant) with both epick and epeck cases ?

The idea is that there is enough dynamic dispatch to manage in the C++ with the differents types Geometry as a parameter.

Meanwhile, your idea of a double variant (kernel and geometry type) seems great. It would ease the write of CAPI and lead to manage the auto-switch at the algorithm level with something like :

// Epick is enough
TriangulatedSurface<K> triangulate2DZ( const Geometry<K> & g ){
    // "pure" CGAL
}

std::unique_ptr< IGeometry > triangulate2DZ( const IGeometry & g ){
    if ( g.isEmpty() ){
          return makeGeometryCollection(); // returns empty geometry collection
    }
    if ( g.isEpeck() ){
          // avoid round to double of previous epeck constructions
          return makeGeometry( triangulate2DZ( g.asEpeckGeometry() ) ) ;
    }else{
          // no need to work with an epeck geometry
          return makeGeometry( triangulate2DZ( g.asEpickGeometry() ) ) ;
    }
}

@vmora
Copy link
Contributor Author

vmora commented Mar 3, 2016

@remic It was my first approach before I learned that plpython is not available in prod...

After I read @ocourtin message, I stoped alfway of proposing to use python to parse the expression :)

@ocourtin
Copy link

ocourtin commented Mar 3, 2016

@vmora @remic
Humm, what kind of issue to not use Pl/Python in production ?

@Remi-C
Copy link

Remi-C commented Mar 3, 2016

plpython is in fact plpythonu, like plpython-unstrusted.

If a user can change plpython function,
he can basically do what he wants with the database,
and even the filesystem (ex : writting/deleting files
https://github.com/Remi-C/PPPP_utilities/blob/master/postgres/write_files_with_plpython.sql
).

Personally I find it very handy.
For real database administrator, it might not look so funny.

Maybe a mail on postgres list inquiring if pysfcgal
could be safely used may give a better idea?

Cheers,
Rémi-C

2016-03-03 14:05 GMT+01:00 Olivier Courtin notifications@github.com:

@vmora https://github.com/vmora @remic
Humm, what kind of issue to not use Pl/Python in production ?


Reply to this email directly or view it on GitHub
#124 (comment).

@vmora
Copy link
Contributor Author

vmora commented Mar 3, 2016

I wonder if you can do that in plpythonu

import shutil
shutil.rmtree("/")

@Remi-C
Copy link

Remi-C commented Mar 3, 2016

I'm not sure you have rights outside of postgres folder on the filesystem
Cheers,
Rémi-C

2016-03-03 14:28 GMT+01:00 Vincent Mora notifications@github.com:

I wonder if you can do that in plpythonu

import shutil
shutil.rmtree("/")


Reply to this email directly or view it on GitHub
#124 (comment).

@ocourtin
Copy link

ocourtin commented Mar 3, 2016

@remic
GRANT EXECUTE give rights to use an already existing function
GRANT CREATE FUNCTION give rights to create a function (inside schema/database...)

So it could be safe in a DBA point of view
(as long as the DBA will and could review the Python function the user wrote)

@vmora
rmtree on / will do, but will impact only files postgres process can write (i.e: o+w files)

@Remi-C
Copy link

Remi-C commented Mar 3, 2016

Launched a postgres mail thread.
http://postgresql.nabble.com/PLPythonu-for-production-server-td5890479.html
Cheers,
Rémi-C

2016-03-03 16:35 GMT+01:00 Olivier Courtin notifications@github.com:

@remic
GRANT EXECUTE give rights to use an already existing function
GRANT CREATE FUNCTION give rights to create a function (inside
schema/database...)

So it could be safe in a DBA point of view
(as long as the DBA will and could review the Python function the user
wrote)

@vmora https://github.com/vmora
rmtree on / will do, but will impact only files postgres process can write
(i.e: o+w files)


Reply to this email directly or view it on GitHub
#124 (comment).

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants