You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
1. Analyzing fishing activity from AIS (Automatic identification system) data
Technologies:
Python
MongoDB
Dataset:
https://zenodo.org/record/1167595#.X_0AltgzaUl
Refers to real data from October 1st,2015 to March 31,2016, in Celtic Sea, Channel and Bay of Biscay (France).
FishingAnalysisAIS.pdf (Greek)
nosql_db.ipynb (Add data to local mongodb )
mongodbProject.ipynb
2. Geospatial Queries on Big Data
Technologies
Apache Spark
Scala
Dataset:
Testing dataset: Hotels and Restaurants Worldwide
Any two geospatial datasets can be used with a few modifications of the code
Data can be downloaded from https://www.dropbox.com/s/dis84tm0r5vyzy7/hotels_restaurants.rar?dl=0
Topics
Geospatial data indexing and partitioning in a cluster
Geospatial join between two datasets given a distance d
i.e.: Find all hotels close to all restaurants by a distance no more than d
Manual creation of a space-partitioning algorithm (for learning purposes. Well known libraries already exist for this task)
For distance metric, Haversine formula is used
Files
GeoQueries_Spark.jar
or
GeoQueries_Spark.scala
Instructions (for jar)
run: spark-submit --master local[*] --class "app" JAR_FILE d n file1 file2
where:
JAR_FILE: jar file location
d: distance in kilometers
n: number of partitions to make
file1,file2: locations of the two datasets to join