NEARBY appears to not sort correctly #195
Hi Alex, thanks for bringing this to my attention. I've seen fringe cases where points fall slightly out of order, but this usually happens when points are close to the poles and are separated by more distance than you're showing. This behavior is a quirk of translating between R-tree Mercator coordinates and WGS84. In your case the points are pretty close together, and it makes sense that the system should return them in order. I'll need to investigate further and I'll keep you posted. |
Thanks for a quick response! Here's my original issue that seems like a bigger problem, but I might be missing something obvious, so maybe you can clue me in. I'm working on a dataset which has lots of addresses in the US mapped. Within that dataset, if I ask for
However, if I ask for
|
This is normal behavior. When a radius is provided, the system performs a standard INTERSECTS on the outer boundary of the radius. This always returns an accurate list, but the results come back in an undefined order. Adding "LIMIT 3" is basically requesting "give me any 3 points inside the specified radius".
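If nearest-first ordering is needed while a radius is supplied, one possible approach is to fetch every match inside the radius (without a LIMIT) and sort on the client. The Go sketch below assumes the results have already been parsed into latitude/longitude pairs; the `hit` type, `haversine`, `sortByDistance`, and all coordinates are hypothetical and not part of Tile38 or any client library.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// hit is a hypothetical, already-parsed search result.
type hit struct {
	id       string
	lat, lon float64
}

// haversine returns the great-circle distance in meters between two
// lat/lon pairs given in degrees, assuming a spherical Earth.
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const r = 6371008.8 // mean Earth radius in meters (assumption)
	rad := math.Pi / 180
	dLat := (lat2 - lat1) * rad
	dLon := (lon2 - lon1) * rad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*rad)*math.Cos(lat2*rad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}

// sortByDistance orders hits nearest-first relative to the search center.
func sortByDistance(hits []hit, lat, lon float64) {
	sort.Slice(hits, func(i, j int) bool {
		return haversine(lat, lon, hits[i].lat, hits[i].lon) <
			haversine(lat, lon, hits[j].lat, hits[j].lon)
	})
}

func main() {
	// Made-up results, deliberately out of order.
	hits := []hit{
		{"c", 33.003, -112.0},
		{"a", 33.001, -112.0},
		{"d", 33.004, -112.0},
		{"b", 33.002, -112.0},
	}
	sortByDistance(hits, 33.0, -112.0)
	// Applying the limit only after sorting yields the 3 nearest points.
	for _, h := range hits[:3] {
		fmt.Println(h.id)
	}
}
```

The limit has to be applied after sorting; limiting first would reproduce the "any 3 points" behavior described above.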
Something is wrong here. I would like to reproduce the issue on my side asap. You mentioned this is with version 1.9.0. What's the operating system? Thanks |
It's on linux, kernel 3.13.0-123-generic, x86_64. Ubuntu trusty. The dataset is pretty big, some 150 million points. How do we get it to your side? |
Thanks for the OS info. As for data, perhaps just a smaller segment is all I need. Would it be possible to extract a couple thousand random points around Hopefully I can identify the issue with just this data to start. |
@tidwall will the z values be significant for you for this test dataset we'll grab? |
@hctareq yes the Z coordinate is needed too. |
Looks like the z-value is directly responsible. Here's the minimal test case:
produces
|
Interestingly, if I use |
I think you may have identified the root of the bug. The kNN operation works strictly on the coordinates stored in the 3D R-tree. X and Y range from -180 to 180 and -90 to 90 degrees, while the Z axis is unbounded and represented in meters. The kNN algorithm performs a direct line-of-sight distance calculation from point to point in XYZ. Because XY uses degrees and Z uses meters, the result of the calculation doesn't represent the point-to-point distance at ground level. For example, here's an image of a bunch of cities on a 3D R-tree with elevation, but in Mercator projection. You can see that the "height" of the Z axis can sometimes be taller than an entire continent is long. Thus, in its current state, the kNN operation is not reliable when a Z coordinate in meters is used. I consider this a bug in the system, as Z is mostly used for elevation data in meters. I have an idea that might fix the problem for the long term. In the meantime, the only workarounds that come to mind are to remove all Z coordinates or to make Z coordinates tiny, such as millimeters/micrometers. If this is the case then it should be easy to test on my side without the need for extra data. I'll see what I can do to turn out a patch asap. |
Scratch that, I meant to use a measurement unit that is much larger, in order to make the values smaller. For example, using kilometers or maybe x10000 will make the Z coordinate less significant during kNN operations. |
The docs (http://tile38.com/commands/set/#z-coordinate) mention timestamps as an example use of the Z coordinate, but it sounds like you're saying it's really meant to deal with meters as far as implementation details go. Going forward, do you see that staying the case to be able to compare in tandem with the lat/long dimensions? |
What I'm getting at is whether, longer term, the Z coordinate will be treated as a first class citizen in a 3D R-tree or if things will revert to 2D and the Z coordinate devolves to more of a normal "field." The wording in the docs is:
|
Hmm, how can 100 meters of elevation be possibly affecting the distance of several thousand kilometers? In the last test case, the point 3 thousand km away is returned first, while the second point is only 2 km away. The 100 meters of elevation difference should not matter here, right? |
The Z coordinate is used for meters/elevation in many (perhaps most) applications. It's occasionally used for speed or time, though; this is common in fleet-management systems.
Yes, the plan is to keep the Z coordinate as a first class citizen. Though it's likely that there'll be some options with how the kNN operation works on the data. |
This is because the Tile38 R-tree stores XY in degrees while Z is user-defined. The kNN operation assumes that X, Y, and Z all share the same unit. At the equator, one degree is about 111 km.
It matters because the calculation thinks that 100 meters is actually 100 degrees. For example, the 3D distance for point 1:
and then point 2:
So point 1 is coming back closer than point 2. |
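To make the arithmetic concrete, here is a minimal Go sketch of a distance calculation that treats degrees and meters as if they were the same unit, as described above. The `point` type, `mixedDist`, and every coordinate are made up for illustration; they are not the values from the test case in this thread, and this is not Tile38's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// point is a hypothetical XYZ entry: X and Y in degrees, Z in meters.
type point struct {
	name    string
	x, y, z float64
}

// mixedDist computes a plain Euclidean distance over all three axes,
// ignoring the fact that X/Y are degrees and Z is meters.
func mixedDist(a, b point) float64 {
	dx, dy, dz := a.x-b.x, a.y-b.y, a.z-b.z
	return math.Sqrt(dx*dx + dy*dy + dz*dz)
}

func main() {
	// All coordinates below are invented for this example.
	query := point{"query", -112.00, 33.00, 0}
	far := point{"far", -85.00, 33.00, 0}      // 27 degrees of longitude away, same Z
	near := point{"near", -112.02, 33.00, 100} // 0.02 degrees away, Z differs by 100 m

	fmt.Printf("far:  %.2f\n", mixedDist(query, far))
	fmt.Printf("near: %.2f\n", mixedDist(query, near))
}
```

With these made-up values the far point scores 27.00 and the near point scores about 100.00, so a nearest-first sort puts the far point ahead even though it is thousands of kilometers away on the ground and the near point is only about 2 km away.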
Ah, that makes sense now. And it definitely sounds like a bug :-) We're going to work around this by removing z-value and using a separate field, as our values had nothing to do with elevation from the start, just didn't know that affected the distance. Thanks! |
Yes this is a bug, but now I have some good ideas on how to fix it! You're welcome and thank you very much for helping out with diagnosing the issue. |
I'm closing this issue and have added a new bug entry, #196. |
Now that we're clear on the impact of Z, the original issue in the first comment is still a problem, in a purely 2D world. Should I open the new issue with just that, or should we re-open this one? Thanks, |
The current version of Tile38 performs kNN operations in Euclidean space. In order to get accurate ordering we'd need to move to a geodesic structure. This is done by converting 2D points in lat/lon (or 3D points with Z as meters) to a 3D point on a sphere. For example: becomes I've been working on this for v2.0 but unfortunately I'm still a ways out. |
In the meantime there may be ways to improve the results quite a bit, though perfect ordering isn't likely in its current state. |
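As a rough illustration of the conversion mentioned above, the Go sketch below maps latitude, longitude, and elevation onto Cartesian coordinates in meters. It assumes a perfectly spherical Earth with a mean radius of 6371008.8 m; the names `toSphere` and `chord` and the sample coordinates are hypothetical, and this is not the v2.0 implementation.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadius is a mean Earth radius in meters (spherical assumption).
const earthRadius = 6371008.8

// toSphere maps latitude/longitude in degrees and elevation in meters
// to Cartesian coordinates in meters, measured from the sphere's center.
func toSphere(lat, lon, elev float64) (x, y, z float64) {
	phi := lat * math.Pi / 180
	lambda := lon * math.Pi / 180
	r := earthRadius + elev
	x = r * math.Cos(phi) * math.Cos(lambda)
	y = r * math.Cos(phi) * math.Sin(lambda)
	z = r * math.Sin(phi)
	return
}

// chord returns the straight-line distance in meters between two
// positions after converting both to Cartesian coordinates.
func chord(lat1, lon1, e1, lat2, lon2, e2 float64) float64 {
	x1, y1, z1 := toSphere(lat1, lon1, e1)
	x2, y2, z2 := toSphere(lat2, lon2, e2)
	dx, dy, dz := x2-x1, y2-y1, z2-z1
	return math.Sqrt(dx*dx + dy*dy + dz*dz)
}

func main() {
	// Invented coordinates: a nearby point 100 m higher, and a distant one.
	fmt.Printf("near: %.0f m\n", chord(33, -112, 0, 33, -112.02, 100))
	fmt.Printf("far:  %.0f m\n", chord(33, -112, 0, 33, -85, 0))
}
```

Because all three axes are now in meters, a plain Euclidean comparison no longer mixes units, and for points at the same elevation the chord length grows with the great-circle distance, so the nearest-first ordering is preserved.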
I'll investigate further and keep you posted. |
I pushed an update to the master branch that provides better distance ordering. |
Everything looks correct now, thanks! |
That's good news. I'm going to close this issue for now, but feel free to reopen if you run into any problems. Thanks a bunch for your help. |
Came across this while looking for z index support. Wondering if the |
What does "first class citizen" mean? Does it mean that the value of the Z coordinate will be the final sorting criterion? |
I have this simple test case, using version 1.9.0:
this returns:
I was expecting to get the points in the nearest-first order, according to the docs. Instead I get results that are 53, 91, and 81 meters away from the center of the search. Is this a bug, or am I missing something obvious here? Thanks in advance for any help!
Alex