
How would you publish geospatial data nowadays if the OGC standards/web services did not exist? The OGC standards were crafted almost two decades ago. Within the geo-information community they fulfil specific needs in an excellent way. However, current web developers have trouble interacting with OGC services. Many reasons are cited, such as the quality of the implementations, their complexity, lack of flexibility, and data/discovery quality issues. The Web has changed considerably since the introduction of WMS, WFS and their contemporaries. In recent years new paradigms have emerged and old ideas, e.g. Linked Data, have matured. Lightweight Application Programming Interfaces (APIs) have become the dominant carriers of web service functionality. What do these developments mean for the delivery of geospatial data? How should geospatial data be published so that it meets the expectations of contemporary web developers and makes effective use of newly crafted or matured Web technologies and paradigms such as Linked Data? How should geospatial data be published so that it is crawlable and indexable by the likes of Google and Bing?

Vision

Where traditionally the end user was identified as the consumer, nowadays the end user is increasingly a developer, app, machine or third-party integration. With more than 3.5 million developers worldwide, data needs to be published in a developer-friendly way. This means that industry-wide accepted and well-documented data-exchange mechanisms should be used to get the most out of data. Developer-friendly APIs (Application Programming Interfaces) are the answer to this, as proven by the rapidly growing API industry in the United States, also known as the API economy.

We help organizations set up and implement an API strategy to ensure that their data is future-proof and consumable by at least the vast majority of the so-called Large Set of Unknown Developers. Following recent developments in the more academic approach of publishing data as Linked Data, we think that every publication mechanism should provide proper content negotiation and backwards compatibility, so that data can be enriched and/or published at a later stage, regardless of which standards are considered ‘the best’ or which type of developer wants to use the data (e.g. geo developers, mobile developers, web developers, etc.).

Expectations

We believe that offering a well-documented, findable and developer-friendly API with content-negotiated media types is the best way to publish data, geospatial data included. By adding semantics using JSON-LD, the data might be picked up by search engines and assistants like Apple Siri, Microsoft Cortana and Google Now, boosting search results and allowing the data to be integrated so that natural-language questions can be answered.
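As an illustration of what ‘adding semantics using JSON-LD’ could look like in practice, the sketch below generates a minimal HTML page with an embedded Schema.org JSON-LD block describing a place and its coordinates. The Schema.org types (Place, GeoCoordinates) are real; the helper function, page markup and example values are hypothetical and only meant to show the mechanism a crawler would pick up.

```python
import json


def html_with_jsonld(name, lat, lon):
    """Render a minimal HTML page with an embedded Schema.org JSON-LD block.

    Hypothetical helper: the markup and values are illustrative only.
    """
    jsonld = {
        "@context": "http://schema.org",
        "@type": "Place",
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": lat,
            "longitude": lon,
        },
    }
    return f"""<!DOCTYPE html>
<html>
  <head>
    <script type="application/ld+json">
{json.dumps(jsonld, indent=2)}
    </script>
  </head>
  <body><h1>{name}</h1></body>
</html>"""


print(html_with_jsonld("Amersfoort", 52.1561, 5.3878))
```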

This research topic focuses on crawlability and on publishing data using the modern ecosystem of the web, which fits our vision perfectly: there is no Holy Grail, so you should publish in as many formats as possible. We expect that bringing together the best of the worlds of content negotiation, JSON-LD, REST and WFS’s GeoJSON in a single publication strategy will do the trick in a generic way for multiple datasets.
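To make the single-endpoint, multi-representation idea concrete, here is a minimal content-negotiation sketch: one resource that returns GeoJSON by default and a JSON-LD-enriched variant when the client asks for it via the Accept header. Flask is used purely as an example framework; the route, the feature data and the context URL are assumptions for illustration, not the eventual API design.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative feature; a real API would read from a (spatial) datastore.
FEATURE = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [5.3878, 52.1561]},
    "properties": {"name": "Amersfoort"},
}


@app.route("/places/amersfoort")
def place():
    accept = request.headers.get("Accept", "")
    if "application/ld+json" in accept:
        # Same data, enriched with a JSON-LD context for Linked Data clients.
        # The context URL is an assumption; any published GeoJSON-LD context would do.
        body = {"@context": "http://geojson.org/geojson-ld/geojson-context.jsonld", **FEATURE}
        return jsonify(body), 200, {"Content-Type": "application/ld+json"}
    # Default: plain GeoJSON for 'ordinary' web and mobile developers.
    return jsonify(FEATURE), 200, {"Content-Type": "application/geo+json"}


if __name__ == "__main__":
    app.run()
```

The same pattern extends to HTML representations for crawlers and to other media types, which is exactly what the deliverables below set out to test.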

Deliverables

At the end of the research period we want to produce the following deliverables:

  • RESTful GeoJSON API.
  • RESTful GeoJSON-LD API.
  • GeoJSON-LD converted to RDF N-Quads (see the conversion sketch after this list).
  • API specification documents which can be submitted to popular API search engines, portals and other API community tools.
  • API documentation that helps consumers understand how the API works.
  • API Software Development Kits (SDKs) and code examples.
  • Different HTML representations that can be monitored through external tools (e.g. Google Analytics) to track crawlability performance (also after the research period, because search engines may need some time to process these pages). These representations will be hosted on different websites to test which variant (HTML + embedded JSON-LD, HTML + embedded RDFa, etc.) scores highest, using the old SEO method of trial and error.
  • Report with all the research outcomes, tested vocabularies and community communication.
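For the N-Quads deliverable mentioned above, the conversion can in principle be done with off-the-shelf tooling. The sketch below assumes rdflib 6+ (which bundles a JSON-LD parser) and uses an illustrative document with an inline context; it only demonstrates the parse-and-serialize round trip, not the final pipeline.

```python
import json

from rdflib import ConjunctiveGraph  # rdflib >= 6 bundles JSON-LD support

# Illustrative JSON-LD document with an inline context; a real GeoJSON-LD feed
# would reference a published context document instead.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "lat": "http://schema.org/latitude",
        "lon": "http://schema.org/longitude",
    },
    "@id": "http://example.org/places/amersfoort",
    "name": "Amersfoort",
    "lat": 52.1561,
    "lon": 5.3878,
}

g = ConjunctiveGraph()
g.parse(data=json.dumps(doc), format="json-ld")

# Serialize the parsed statements as N-Quads, one quad per line.
print(g.serialize(format="nquads"))
```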

Procedure

During the research period we will follow an agile procedure based on rapid development principles. This means our focus will be on results: answering the research questions and exploring possibilities for future expansion of the deliverables mentioned above, rather than on performance, scalability, code quality and other requirements we normally apply to business-critical software in production environments.

Besides avoiding unnecessary work, we also have no direct influence on external crawlers/search engines, so it’s very hard to estimate how much time we need to spend per feature. Using an agile, rapid development approach we can quickly adjust features and code, which gives us the best chance of good results.

Assumptions

Because we depend on external parties like Google, we cannot guarantee certain outcomes. That’s why we rely on the following assumptions:

  • Search engines might take some time to index our test sites, so the results might not be available within the research period.
  • Technical boundaries are out of scope. For example, we believe that performance issues with Linked Data will be solved once it is adopted by the community, or by spending more money on hardware if it really offers a lot of added value; for the purpose of answering this research topic they do not matter.
  • For popular search engines we rely on Schema.org as the main vocabulary. We will approach the Schema.org user group with suggestions for expanding the vocabulary and add the outcome of this communication to the report, but we don’t expect the vocabulary to be expanded within the time frame of the research period.