As a small business bringing new, untested products to market, I have to be mindful of a capped budget and the need to minimize expenses. Therefore, whenever I can acquire something for free, I take it. In this way I’ve acquired software tools, templates, advertising and web analytics. One of the biggest acquisitions has been open data. However, along the way I’ve discovered that what is not for sale will still be paid for in other ways.
What I would like to share with you are insights I’ve gained in the process of investigating free geographic data options to support my own business. I will describe the kind of data I needed for a particular project; why commercial data was not an especially good option (hint – it’s not just $$$); and why open data was a good alternative. This will set up a surprising conclusion – although open (free) data was adopted as a provisional solution, there were a variety of trade-offs in that decision, some of which weren’t immediately obvious. Moreover, adopting free data doesn’t eliminate expenses so much as shift where you spend your money.
The Project Data Requirements
To accommodate project needs I required data that met the following criteria:
- Seamless world-wide coverage – or at least populated land areas including major island groups.
- Suitable for viewing at both small and large scales.
- Licensing that allowed for commercial use.
- Delivery in an acceptable format. In practice this meant:
  - GeoJSON or shapefile.
  - Any format that could be readily converted to GeoJSON or shapefile through open source tools.
  - Images or layers delivered through a Web Map Service (WMS) or Web Map Tile Service (WMTS).
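For readers who have not worked with GeoJSON, the following is a minimal sketch of what the format looks like, built with nothing but Python's standard library. The helper names (`point_feature`, `feature_collection`) and the sample place are illustrative assumptions, not part of any project code; the only fixed points are the GeoJSON field names and the longitude-first coordinate order.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature for one point (hypothetical helper).

    GeoJSON stores coordinates as [longitude, latitude], in that order.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection (hypothetical helper)."""
    return {"type": "FeatureCollection", "features": list(features)}


# Illustrative sample data only.
collection = feature_collection(
    [point_feature(-0.1276, 51.5072, {"name": "London"})]
)

# Serialize to text that GIS tools reading GeoJSON can load.
geojson_text = json.dumps(collection, indent=2)
```

Because GeoJSON is plain JSON text, it is easy to inspect, edit and convert, which is a large part of why it made the list of acceptable formats.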
ESRI Commercial Data
Commercial data meeting these requirements certainly exists. In an earlier venture into entrepreneurship I had purchased a single-user desktop version of ArcGIS 9.3; the software actually came with a data package that closely matched the requirements listed above. Some of the reasons I chose not to seriously consider this dataset for my current needs were:
- The data is more than 10 years out of date, in a file compression format (.sdc) that only ESRI (Environmental Systems Research Institute) products know how to read (at least I haven’t found any third-party tools capable of reading it).
- Even if I could extract the dataset I’m not sure how much effort it would take to bring it up to date and whether I even retain rights to download newer versions under my original license.
- Although I still have this version of ArcGIS on my laptop (the license allowed installation on one desktop and one laptop), I no longer use it as my main GIS tool, and my laptop is rather underpowered for this role anyway.
- The data license is registered under my earlier business name. Although this can probably be changed I’m not even sure if I still have the account login credentials, or if the account is still active.
ESRI also offers data through a free ArcGIS Online subscription – however, data usage is restricted to non-commercial purposes. If you want to use their data in business, you need to either purchase a data package or upgrade to a paid subscription.
Google Data for Commercial Use
Google now offers Google Earth Pro for free, and you can use the data that comes with it without cost and for commercial purposes. The catch is you can only use it within the context of their own software. This limits you to carrying out analysis on a Google platform and exporting results to a static format. It is a violation of terms to take the data and port it to another software platform such as QGIS.
Google does sell its data for commercial use; don’t ask me how much. If you go down that rabbit hole you eventually get to the dreaded “contact our dedicated sales team for a personalized quote today!”. I’ve learned this is code for “if you need to ask, you can’t afford it”.
Open Geographic Data Sources
An alternative to a commercial solution is free, open data. There is actually an abundance of providers offering such information online. Almost every level of government (at least in developed, democratic countries) offers its geographic data for free with almost no restrictions. International agencies, professional societies, non-profits and even individuals also distribute map datasets at no cost.
While much of this information is highly usable, most of it is for geographic extents no larger than the national level. This already violated my first criterion of seamless global coverage. While in theory datasets from multiple providers could be stitched together, in reality this was not practical. Integrating data in multiple formats, assembled to different standards and following conflicting rules for feature classification would be a prohibitively difficult task.
The data provider provisionally chosen was OpenStreetMap (OSM). This is one of the few free geographic data sources that has genuine worldwide coverage. It is a not-for-profit organization started in England with a mission to “provide free geographic data, such as street maps, to anyone.” Data is a mix of government sources and user-provided uploads, integrated by a team of volunteer professionals.
Aside from being free and having a worldwide extent, the dataset had the following virtues:
- A non-restrictive license that allowed commercial use. The only obligations were:
  - Attribution to OSM.
  - A clear statement that the data was licensed under the Open Database License (ODbL) and that users could obtain it themselves from OSM.
  - An obligation to make any derivative database available to all users (although this could be satisfied via instructions on how to replicate the derivative database).
- A regular schedule of updates. A full download of the most up-to-date version is made available on a weekly basis. A smaller package, limited to the features changed since the last full update, is also made available.
- A large community of users and contributors which should reinforce quality and ensure a large knowledgebase to answer questions.
- OSM has existed for more than 10 years. Enough organizations and businesses rely on its continuance that there is plenty of incentive to make sure it remains funded and volunteer staffed.
- The ODbL results in added-value derivative databases also being made available for public consumption. For example, www.Geofabrik.de provides a derivative database that has been chunked into countries and sub-administrative units, some of which is provided in the more malleable shapefile format.
The Cost of Free Data
This brings me to the thesis of this post. Although OSM (and many other open data providers) have the virtues listed above, there are also downsides. Some of these translate into real money expenses (as in you will still be writing someone a cheque). In the case of OSM the following apply:
- Because so much of the content comes from user contributions, the data quality, geographic coverage and freshness can be variable. Generally metro areas anywhere are well covered. Remote locations (even in developed countries) and sparsely populated rural regions may be less up to date or even lack features altogether.
- Determining whether you have an obligation to share derived data can be less than straightforward. The ODbL has a share-alike provision that is triggered if you make public a product that was created from a modified version of the OSM database. Examples of modifying the database include editing existing map features or adding additional attributes to all features; the result is what the ODbL terms a derivative database. The challenge occurs when you have an edge case – for instance, if you change the map projection. This is what would be considered a trivial transformation within the OSM community – an action that technically meets the definition of a derivative database but is not felt to enhance the core value. At the time of writing there is an ongoing discussion on whether to adopt trivial transformations as an exclusion to the share-alike provision, but this is not yet officially adopted.
- OSM servers have finite capacity and the organization limits how many calls any consuming application can make over time. Rather annoyingly they don’t provide any exact quantification such as number of requests per day. Therefore if you are hosting a live web mapping application serving thousands of your own customers, you are best advised to run your own data server (or subscribe to a commercial distributor of OSM data). In either case, this will incur a cash expense.
- The requirement to make derivative databases available to the public means that the derivative database itself cannot be your sole value proposition to customers. You are permitted to charge any fee you like for redistribution, but you must also make the derivative database available for free to the public. Therefore your value proposition might be:
  - Hosting the fee-based derivative database on high-capacity servers, thus supporting high-volume users without degrading performance or contravening OSM quota limits.
  - Bundling the derivative database with other services and products.
  - Acquiring revenue through the sale of produced works such as paper maps or games.
- Although there is no requirement to pay anything to use OSM, the organization (or more precisely the OSM Foundation) accepts donations, charges contributors a membership fee and solicits professionals to volunteer time to maintain the database. There is at least a moral argument to make a contribution if you are a non-trivial user of OSM data. On a more practical side, if your geographic area of interest is not well served by OSM or has errors, you have a business incentive to make the contribution that rectifies the issue.
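On the server capacity point above: if you do call a shared data service directly, one way to stay a polite client is to enforce a minimum gap between your own requests. The following is a minimal sketch under stated assumptions. The `Throttle` class is hypothetical, and the interval is a placeholder you would have to choose yourself, since (as noted) no exact request quota is published.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper).

    min_interval_s is an assumed value, not an official OSM limit.
    clock and sleep are injectable so the logic can be tested without waiting.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until the minimum interval has passed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval_s - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

In use, you would create one `Throttle` per remote service and call `wait()` immediately before each request. This only limits your own traffic; it does not replace running your own data server or paying a commercial distributor once your volume grows.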
In summary, this is what I have learned: there are more benefits to using open source information than just cost savings. But this is balanced by real costs in time and money, and these need to be quantified explicitly before you can make a truly informed cost-benefit analysis.