Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Added support for spherical coordinates #23

Merged
merged 7 commits into from
Jan 8, 2017

Conversation

mjziebarth
Copy link
Contributor

Added support for spherical coordinates (standard geographic longitude/latitude coordinate system) to ordinary kriging. Depending on choice of coordinates_type in the respective function, coordinates will be interpreted as either planar 2d coordinates or longitude/latitude coordinates (x being mapped to longitude and y to latitude). Thoughts on the interface?

Also merged latest master branch commits.

Copy link
Contributor

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjziebarth Thanks a lot for contributing!

Below are few general comments on the code.

However I don't have a very good understanding of the impact of this coordinates types changes on the anisotropy option.

@bsmurphy What do you think?

# Calculate dot product of both vectors:
lat1 *= np.pi/180.0
lat2 *= np.pi/180.0
d = np.cos((lon1-lon2)*np.pi/180.0)*np.cos(lat1)*np.cos(lat2) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the transformation to radians np.pi/180.0 already done at the line above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above line, only the latitudes are converted. For the longitudes, I decided to convert only the difference, saving one vector multiplication. This should, however, not be much of a difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aww, you are right I missed that.

# Angle is arccos of dot product. Avoid errors caused
# by numerics:
d[d>1.0] = 1.0
d[d<-1.0]=-1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if these 2 lines are removed? acrcos returns nan because d is outside it's domain definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, NAN is returned by arccos. Some small numerical errors might cause |d|>1.0. even though it should not be possible analytically. I believe I've encountered this problem in similar code I used in another setting, so I decided to add it here as well.

@@ -54,6 +58,26 @@
from scipy.optimize import minimize


def great_circle_distance(lon1, lat1, lon2, lat2):
# Calculate dot product of both vectors:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this comment a bit confusing: the dot product would be np.array([lat1, lon1]).dot(np.array([lat2, lon2]) , or is it related to the computation below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was refering to the euclidean dot product of both vectors. Will clear that up in the doc string!


def euclid3_to_great_circle(euclid3_distance):
"""Convert euclidean distance between points on a unit sphere to
the corresponding great circle distance."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to specify the units of euclid3_distance and of the output.. I also don't understand the 2.0 threshold below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a proper doc string soon. The threshold is again used to prevent arccos from returning NAN (as in the above comment).

self.XCENTER= 0.0
self.YCENTER= 0.0
self.anisotropy_scaling = 1.0
self.anisotropy_angle = 0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't know how this should work. In any case we can't silently overwrite a user-provided value e.g. of anisotropy_angle without warning the user it's happening. If anisotropy is not compatible with this types of coordinate, the doc-string should say so.

@@ -189,7 +189,7 @@ class OrdinaryKriging:
def __init__(self, x, y, z, variogram_model='linear', variogram_parameters=None,
variogram_function=None, nlags=6, weight=False, anisotropy_scaling=1.0,
anisotropy_angle=0.0, verbose=False, enable_plotting=False,
enable_statistics=False):
enable_statistics=False, coordinates_type='euclidean'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add an if test and raise an error is corrdinates_type is not one of the 2 acceptable values.

Copy link
Contributor Author

@mjziebarth mjziebarth Dec 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized I did this already in line 238/239. Moved the definition of self.coordinates_type behind the check so that it will always be valid if set.


diff = np.abs(core.great_circle_distance(lon_ref[i], lat_ref[i], lon, lat) \
-core.euclid3_to_great_circle(np.sqrt(dx**2+dy**2+dz**2)))
self.assertTrue(np.all(diff < 1e-3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use numpy.testing.assert_allclose here I think

@@ -54,6 +58,26 @@
from scipy.optimize import minimize


def great_circle_distance(lon1, lat1, lon2, lat2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a proper doc string, indicating the type and the expected units of lon1, lat1 and of the result. For instance something like,

"""" Calculate the great circle distance between two points

Parameters
----------
lat1 : float
      latitude of point A (degrees)

[...] 
Returns
-------
distance: float
     the great circle distance (degrees)
""""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

@mjziebarth
Copy link
Contributor Author

Thanks for the input! I'll adress the other points in the next few days and do some code adjustments (doc strings etc.)

- added user warning if anisotropy correction is used incombination with
  geographic coordinates
- added doc strings for great_circle_distance and euclid3_to_great_circle in
  core.py
@rth
Copy link
Contributor

rth commented Dec 16, 2016

@mjziebarth I am not sure if you wanted to do more commits in addition to the ones above...

Also could someone with a good understanding of geostatistics please review this PR ? cc. @bsmurphy @basaks

@basaks
Copy link
Collaborator

basaks commented Dec 17, 2016

@rth and @mjziebarth This is of interest to me. In my Kriging and regression kriging, I assumed that lon/lat scales as x/y. I assumed that is a fair assumption when you are Kriging locally (ignores curvature effects) or if you have many observations and they are more or less evenly distributed, especially if you use n_closest_points in OrdinaryKriging. Any one with a different opinion?

This PR is removes these assumptions. I will review over the weekend. Thanks for this @mjziebarth .

@mjziebarth
Copy link
Contributor Author

@rth I will do a merge of the latest PR. Perhaps I can manage to work through UK as well this weekend, but I'm not sure about that. So, as is, this PR will most likely remain OK-only till the end of this weekend. It should, however, not be hard for someone with good understanding of UK and the respective code (which I am not, unfortunately!) to implement changes similar to the OK code if UK also relies on isotropic variogram models (which it looked like it did on a quick glance).

@basaks I think that would be quite okay if you do local kriging close to the equator (might even better than the great-circle approach if you get into domains where numerical errors of the trigonometric functions come into play, but I don't know if a realistic scenario like that exists). The further you deviate from the equator, the more anisotropy there is in how lat/lon coordinate differences translate into great circle distance, which you could still correct using the anisotropy scaling if the latitude range of your sample is small.
What's impossible to do with x/y scaling, I think, is

  • a setting where the periodicity comes into play, like big longitude ranges or points very close to the equator
  • settings with large latitude range because then there is local variation of x/y anisotropy which cannot (I believe?) be corrected by the current code and would mess with the global isotropic variogram.

Just some thoughts off the top of my head, hope I understood and addressed your question correctly!

@basaks
Copy link
Collaborator

basaks commented Dec 19, 2016

I have gone through the PR. Comments:

  1. I would love to see a demonstration of the error when someone uses OrdinaryKriging with and without the spherical correction. Can you please add one in the examples directory?
  2. A test showing that your great circle distance calculation is correct. Or at least provide a reference. You can may be test your version against the geopy package? Just add geopy as a test dependency. Or you could use pyprog.Geod as a reference.

@basaks
Copy link
Collaborator

basaks commented Dec 19, 2016

@mjziebarth Can you enhance this may be?

"""
euclidean vs geographic
"""
import numpy as np
from pykrige.ok import OrdinaryKriging

# dummy data
X = np.random.randint(0, 400, size=(100, 2)).astype(float)
y = 5 * np.random.rand(100)

Xtest = np.random.randint(0, 400, size=(50, 2)).astype(float)
ytest = 5 * np.random.rand(50)

ok = OrdinaryKriging(
            x=X[:, 0],
            y=X[:, 1],
            z=y,
            verbose=True,
            coordinates_type='euclidean'
            )

ok2 = OrdinaryKriging(
            x=X[:, 0],
            y=X[:, 1],
            z=y,
            verbose=True,
            coordinates_type='geographic'
            )

print(ok.execute('points', Xtest[:, 0], Xtest[:, 1],
                 n_closest_points=5, backend='loop'))
print(ok2.execute('points', Xtest[:, 0], Xtest[:, 1],
                  n_closest_points=5, backend='loop'))

@basaks
Copy link
Collaborator

basaks commented Dec 20, 2016

@mjziebarth or anyone with more knowledge of the variogram models,
The variogram models used in either OK and UK in PyKrige, can I just transform my points from lon/lat into x/y by reprojecting using gdal and then use eucledian until this is PR is clarified?

Thinking....this should serve as a test case for this PR. I.e. compare the following:

  1. Transform the lat/lon (a shapefile maybe) using gdal into x-y (can use ogr2ogr for this), perform kriging, and,
  2. krige using lat/lon using coordinates_type=geographic and compare with above.

@mjziebarth
Copy link
Contributor Author

AFAIK there is no projection that preserves all distances (even without taking the periodicity into account), so intuitively there should be differences of the projected kriging compared to native krigin.
But I don't now the magnitude of these influences. Maybe there is a projection that is locally roughly equidistant? This could then be checked against a similar local kriging with this PR.

@basaks
Copy link
Collaborator

basaks commented Dec 20, 2016

@mjziebarth What's the best way to do the following then?
I have a shapefile with lat/lon and z(target) values.
I can use lat/lon to krige, or I can use latlon+coordinates_type=geographic to Krige. How can I say what is correct and why?
I understand that this PR is about saying that the latter is correct. Is that because the veriogram model is isotropic, that I need x and y in the same scale throughout my model?

If that is the case, can you provide a test case that this PR is correctly converting lon/lat to x/y and doing a proper job?

@mjziebarth
Copy link
Contributor Author

Yes, the latter would be my guess on how to do it. Bear in mind that I'm fairly new to kriging (in fact I first heard about it roughly about the time I started my fork).

As far as I understand, the variogram models used in this code are all functions of a single distance coordinate. As such, the choice of coordinates seems rather arbitrary to me, provided that the pairwise distances between the input data points used to determine the variogram are correct. The native distance metric on a sphere, disregarding earth's deformation, would be great circle distance. That's what this PR is about: Accept a different kind of data and make sure that the information needed for kriging, all needed distances, is calculated correctly.

That's why I believe that it is not possible to use a projection to convert lon/lat to x/y and afterwards use usual planar kriging: Because, afaik, there is no projection that satisfies the above condition that the euclidean metric used on the projected x/y coordinates equals the results we would have obtained from applying the great circle distance to the original lon/lat data.

It should be possible to approximate the great circle distance by the euclidean metric for local point sets, but most likely only where the curvature can be neglected over the whole set of points. Such sets could be used to test this PR.

Because lon/lat are never converted to x/y in this PR but handled natively, I can of course not provide a correct conversion to x/y.

Whether the kind of kriging proposed here has some pitfalls, I don't know - I would consider it an intuitive adaption of planar kriging.

@rth rth added this to the v1.4 milestone Dec 21, 2016
@basaks
Copy link
Collaborator

basaks commented Dec 21, 2016

Thanks for that @mjziebarth .

As far as I understand, the variogram models used in this code are all functions of a single distance coordinate. As such, the choice of coordinates seems rather arbitrary to me, provided that the pairwise distances between the input data points used to determine the variogram are correct. The native distance metric on a sphere, disregarding earth's deformation, would be great circle distance. That's what this PR is about: Accept a different kind of data and make sure that the information needed for kriging, all needed distances, is calculated correctly.

OK may be I should simply ask this then:

  1. My observations are in lat lon already, and,
  2. I want to krige in lat/lon.

Is that going to be ok?

From here:

d = np.sqrt(dx**2 + dy**2)
g = 0.5 * dz**2,

I get the feeling that it is ok to use lon/lat as x/y, as I have never changed my reference coordinate system, i.e., I computed my variogram in lon/lat and I also kriged in lon/lat.

Can you clarify?

@mjziebarth
Copy link
Contributor Author

mjziebarth commented Dec 21, 2016

@basaks So if I understand correctly, you have data given in lon/lat and what you do is simply using x=lon, y=lat and do normal kriging where d = np.sqrt(dlon**2 + dlat**2)?

Whether or not this is going to be okay depends on a number of things, foremost on what you want to achieve. I assume you want the variogram to give you information about correlation based on how far away two points are from each other. I further assume that you want that distance to be, as I said above, the 'native' distance of your space, the sphere. On that sphere, the great circle distance tells you exactly how far you would have to travel on that sphere on a perceived straight line between two points.

Given those assumptions, the question whether it is okay to krige using just lat/lon as if x/y boils down to the question how big differences between those two methods are. Some thoughts on that:

  • If your data is confined to a small area at the equator, you can neglect curvature and doing lat/lon kriege as if x/y should work without problems.
  • If your data is confined to a small area away from the equator, you could scale your longitude coordinate differences by 1.0*cos(latitude), which is how a full longitude circle's circumfence scales with latitude.
  • If your data is not confined, but does not span the whole sphere, you might be able to find a projection that keeps your distances correct, but I'm not to sure about that
  • If your data spans the whole sphere, I believe it's impossible to find a project, even if only because of the topological periodicity of the sphere.

Did I assume correctly and did that clarify?

Edit: Mixed / and *.

@basaks
Copy link
Collaborator

basaks commented Dec 21, 2016

I assume you want the variogram to give you information about correlation based on how far away two points are from each other. I further assume that you want that distance to be, as I said above, the 'native' distance of your space, the sphere.

I don't want any of that. I just want my kriged values are correct in lat lon, from reference (observations) in lat/lon. My data (observations or kriged area) does not span the world, or has no discontinuity. My observations are in lat/lon and kriging will also be in lat/lon.

Going by the variogram maths, it still appears to me that if one does not change scale between variogram computation, and kriging, it will be fine. Lat lon just seems to be a different coordinate system.

Now, if you were to ask me, how far do I have to go on the sphere to get to my next lat/lon that would be different as I change my lat/lon.

With this observation, I am afraid, I still don't understand what this PR is achieving, except in the special case of discontinuity in lat/lon that you speak of.

@mjziebarth
Copy link
Contributor Author

Fair enough, that's a different approach. Happy if that works out for you! In that case you do not need this code.

I origininally created the code because I had data in lon/lat coordinates and needed interpolated data. For me, spatial closeness on the sphere was the most important / most intuitive measure in determining the interpolated values. Fairly intuitive first guess, as many physical processes (ignoring more complex or perculiar processes) depend more on distance than on arbitrary coordinates. Say, for instance, a priori isotropic diffusion. Correlation then seems like the best guess of how to interpolate, so I ended up with kriging.

That is what this PR achieves: Krige based on the actual spatial structure or spatial distance and not on choice of coordinate system.

Say, for instance, you have two points at latitude 89°, but longitude 0° and 5°. Then in planar kriging, they would count for a distance of 5°. In reality, their distance is only 0.09°. So under said premise, data at this coordinate pair would count for a very different section of the variogram than expected from their spatial distance.

@basaks
Copy link
Collaborator

basaks commented Dec 21, 2016

@mjziebarth
Very good point. Now I get it.
It's the spherical coordinate mapping of dx=R d\theta etc. Thanks, and yes, I agree that it will have an effect.
On a local scale (small kriging area), where \theta is more or less constant over a small area, this effect will be negligible.
Thank you for clearing that up :)

This bit clarified it for me:

Say, for instance, you have two points at latitude 89°, but longitude 0° and 5°. Then in planar kriging, they would count for a distance of 5°. In reality, their distance is only 0.09°. So under said premise, data at this coordinate pair would count for a very different section of the variogram than expected from their spatial distance.

Did you manage to write some tests for this PR?

@bsmurphy
Copy link
Contributor

From a quick glance, this looks good! Sorry it took me a little while to get this...

@rth
Copy link
Contributor

rth commented Jan 4, 2017

Sorry for late response. This looks good for me too. Should we merge this? unless @basaks you have additional requests?

@rth
Copy link
Contributor

rth commented Jan 8, 2017

Merging this, so that refactoring in PR #23 doesn't break it. If there are any additional concerns about this we could open a new issue about it. Thanks for contributing @mjziebarth and sorry it took so long to merge this.

@rth rth merged commit 0196081 into GeoStat-Framework:master Jan 8, 2017
@bsmurphy
Copy link
Contributor

Thanks for merging this in, @rth. This is super useful.

@mjziebarth
Copy link
Contributor Author

Thanks for merging and comments everyone!

I plan to add a usage example in a seperate PR as soon as I find the time to do so.

@basaks I'll add a reference for the great circle distance function in a seperate PR, possibly rewriting it a bit (for increased numerical accuracy for antipodal points). This requires some research on my part, however, so it will take until I find the time to do so.

@mjziebarth mjziebarth mentioned this pull request Mar 3, 2017
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants