OSM Data Structure
This post presents a description of the OpenStreetMap (OSM) data structure and some implications for users. References are mostly from that organization’s wiki. What I intend to provide is:
- A short, precise definition of the data elements and how they interact.
- What the pros and cons are to organizing geographical data this way.
- A proposed mapping of OSM feature elements describing base map features.
Hosting an OSM Server
A limitation of using any tile service is that you are stuck with whatever symbolization and feature selection the author has chosen to provide. This is because what is returned to you is an exported image rather than the raw data itself. By maintaining my own server I could present layers in any format I wanted, group or split feature representation as needed, and expose or hide features when required. In particular, I want to create some standard layers that could consistently be used as base map features. To do that I would have to take a deeper dive into exactly how OSM organizes its data so that I could identify what combination of OSM features would need to be extracted to create my proposed base layers.
Structure of OSM Data Elements
Many other geographic data formats offer a way to organize like-features as a homogeneous collection that can be manipulated as an entity. For example GeoJSON has feature collections and ArcGIS uses feature classes. This is not the case with OSM; essentially each feature (i.e. a discrete object on the landscape, such as 1 building or 1 stream tributary) is independent. To get a collection of like features you have to filter against the whole dataset based on selected attribute values. This means you have to become familiar with the OSM data attribute definitions instead of simply navigating a hierarchy of predefined feature classes.
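To make the "filter against the whole dataset" idea concrete, here is a minimal sketch in Python. The feature dictionaries, their ids and the `select_features` helper are all hypothetical stand-ins for illustration; real OSM data would come from an extract or a database rather than a hand-written list.

```python
# Hypothetical in-memory features; real data would be read from an OSM extract.
features = [
    {"id": 1, "type": "way", "tags": {"building": "yes"}},
    {"id": 2, "type": "way", "tags": {"waterway": "stream", "name": "Mill Creek"}},
    {"id": 3, "type": "node", "tags": {"amenity": "cafe"}},
    {"id": 4, "type": "way", "tags": {"waterway": "river"}},
    {"id": 5, "type": "node", "tags": {}},
]


def select_features(features, key, values=None):
    """Return features whose tags contain `key`, optionally limited to `values`."""
    selected = []
    for feature in features:
        value = feature["tags"].get(key)
        if value is None:
            continue  # this feature does not carry the attribute at all
        if values is None or value in values:
            selected.append(feature)
    return selected


# A "collection" of like features is just the result of a filter.
streams_and_rivers = select_features(features, "waterway", {"stream", "river"})
print([f["id"] for f in streams_and_rivers])
```

Note that nothing stops a contributor from leaving the `waterway` tag off a stream entirely, in which case a filter like this silently misses it.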
Essentially OSM has only two basic geometry types: nodes and ways. Nodes are points with a latitude and a longitude. Ways are ordered lists of nodes and are used to model a line. There are also closed ways, where the first node and final node are the same; these represent either a loop in a line feature or an area of two-dimensional space occupied by a feature such as a parcel of land. To identify whether a closed way represents an area, either the area attribute must be provided with a yes or no value, or another attribute that implies area must be attached.
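The node, way and closed-way definitions above can be sketched as a small Python data model. This is an illustration only, not how any OSM tool stores data. The `AREA_IMPLYING_KEYS` set is an assumption made for the example; the real list of tags that imply an area is longer and is a matter of community convention.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: int
    lat: float
    lon: float


@dataclass
class Way:
    id: int
    node_ids: list  # ordered list of node ids
    tags: dict = field(default_factory=dict)

    def is_closed(self):
        """A closed way starts and ends on the same node."""
        return len(self.node_ids) > 1 and self.node_ids[0] == self.node_ids[-1]


# Assumption for illustration: keys that imply an area when found on a closed way.
AREA_IMPLYING_KEYS = {"building", "landuse"}


def is_area(way):
    """Decide whether a way should be treated as a polygon rather than a line."""
    if not way.is_closed():
        return False
    area = way.tags.get("area")
    if area == "yes":
        return True
    if area == "no":
        return False
    # No explicit area tag: fall back to tags that imply an area.
    return any(key in way.tags for key in AREA_IMPLYING_KEYS)


house = Way(10, [1, 2, 3, 4, 1], {"building": "yes"})
loop_road = Way(11, [5, 6, 7, 5], {"highway": "residential"})
plaza = Way(12, [8, 9, 10, 8], {"highway": "pedestrian", "area": "yes"})
print(is_area(house), is_area(loop_road), is_area(plaza))
```

The explicit `area` tag is checked first so that it always overrides the inferred answer.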
OSM Data Elements
|Feature||A single identifiable physical element on the landscape. A feature is typically modelled as a node, a way or a relation with a unique id.|
|Node||A point with a latitude, longitude and unique id.|
|Way||An ordered list of nodes (2 – 2000 nodes) with a unique id. If more than 2000 nodes are required to define a single feature then multiple ways need to be created and joined in a relation.|
|Closed Way||A way in which the first and last node are the same. A closed way represents either a linear feature that loops back on itself or an area feature that occupies ground.|
|Relation||A collection of nodes, ways or other relations (each participating element is a member). It is recommended that no more than 300 members participate in a single relation. Although any one relation can technically support a mix of data element types, in practice relations are collections of ways. Relations are used for features that exceed the 2000 node limit of a single way or that consist of multiple rings (e.g. an area feature that has a hole in it). A member can have a role attribute whose value identifies what function it provides to the rest of the relation, e.g. if a closed way member has role=inner, it defines the perimeter of an internal hole in an area feature.|
|Tag (Attribute)||Tags are the means to associate attributes to features. A tag consists of a key (what kind of attribute is being defined) and a value (the actual attribute value). This generally takes the form: key=value
Features can have any number of tags (or none) but can have only one tag per key. The OSM community maintains a list of commonly used tags with definitions and lists of acceptable values.
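Two of the rules in the table lend themselves to a short sketch: the 2000-node limit on ways and the use of member roles in a relation. The helper name `split_node_list`, the ids and the dictionary layout below are hypothetical. The `outer` role and the `type=multipolygon` tag are the usual OSM counterparts to the `inner` role mentioned above. Storing tags in a dictionary also mirrors the "one tag per key" rule.

```python
MAX_WAY_NODES = 2000  # per-way node limit quoted in the table above


def split_node_list(node_ids, max_nodes=MAX_WAY_NODES):
    """Split an ordered node list into chunks of at most `max_nodes` nodes.

    The last node of each chunk is repeated as the first node of the next
    chunk so that the resulting ways stay connected end to end.
    """
    if max_nodes < 2:
        raise ValueError("a way needs at least 2 nodes")
    if len(node_ids) <= max_nodes:
        return [list(node_ids)]
    chunks = []
    start = 0
    while start < len(node_ids) - 1:
        end = min(start + max_nodes, len(node_ids))
        chunks.append(list(node_ids[start:end]))
        start = end - 1  # reuse the boundary node
    return chunks


# A lake with an island: one outer ring and one inner ring (hypothetical ids).
relation = {
    "id": 100,
    "tags": {"type": "multipolygon", "natural": "water"},
    "members": [
        {"type": "way", "ref": 1, "role": "outer"},
        {"type": "way", "ref": 2, "role": "inner"},
    ],
}

inner_refs = [m["ref"] for m in relation["members"] if m["role"] == "inner"]
print(split_node_list(list(range(5)), max_nodes=3))  # [[0, 1, 2], [2, 3, 4]]
print(inner_refs)  # [2]
```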
Strengths and Weaknesses of OSM Implementation
|Strengths||Weaknesses|
|Flexibility - contributors can add any tag they want, including entirely new ones. This means features can be fully described.||The only quality control is contributors' own vigilance plus review and correction from other members.|
|The structure is fairly simple and does not require sophisticated knowledge in database theory or geographic data formats.||Populating attributes is entirely voluntary, only enforced by norms and best practices. There are no default values or required attributes.|
|Features can be added without all feature attributes being known.||To extract a class of features you have to know what attributes it is defined by (and hope contributors have attributed correctly).|
|Contributions become part of the larger dataset almost immediately.||There is no contours feature to show land relief (hills, cliffs, depressions, flat land). Elevation can be captured in the ele attribute but this is infrequently provided.|
|Anyone can make edits, which means corrections can be made quickly.||As is, the data is not amenable to many standard geographic analysis operations. Reasons for this can include:
• Such operations require nodes wherever lines intersect; OSM does not enforce this.
• Such operations require the geometry type to be explicit. In many cases closed ways do not have the area attribute defined and whether the feature is a line or polygon must be inferred from some other attribute, something that is beyond a software tool.
|A lot of frequently updated data is made available to any user, at no cost, with no restrictions on use other than attributing OSM as data provider. This raw data becomes the starting point for many value-added information products.||There are no pre-defined layers such as “roads”. Every feature is a node, way, closed way or relation and a “layer” is simply any collection of features that share one or more attribute values (and not even necessarily the same element type). This can make it hard to extract the correct, complete set of features you actually want if attributes are misapplied or missing.|
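The point about intersections can be checked mechanically. The sketch below looks for places where two ways cross without sharing a node. It treats coordinates as planar x/y pairs, which is an assumption that only holds over small areas, and the function names and ids are hypothetical. Not every hit is an error: a bridge over a road legitimately has no shared node.

```python
def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segments a-b and c-d cross at a single interior point."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def unnoded_crossings(way_a, way_b, coords):
    """Return segment pairs that cross although they share no node.

    way_a, way_b: ordered lists of node ids; coords: node id -> (x, y).
    """
    found = []
    for a1, a2 in zip(way_a, way_a[1:]):
        for b1, b2 in zip(way_b, way_b[1:]):
            if {a1, a2} & {b1, b2}:
                continue  # the segments already meet at a shared node
            if segments_cross(coords[a1], coords[a2], coords[b1], coords[b2]):
                found.append(((a1, a2), (b1, b2)))
    return found


coords = {1: (0, 0), 2: (2, 2), 3: (0, 2), 4: (2, 0), 5: (1, 1)}
print(unnoded_crossings([1, 2], [3, 4], coords))        # crossing, no shared node
print(unnoded_crossings([1, 5, 2], [3, 5, 4], coords))  # both ways share node 5
```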
Proposed Basemap Feature Layers
One of my objectives is to extract features from OSM that can be used as basemap layers. Basemaps display basic geographic features as a locational reference to other mapped features. As a model I intend to create something similar to the basemap layers historically used in natural resource management. The first table below defines the layers I am trying to create. The second table identifies what the OSM equivalent is and some notes on potential difficulties.
|Basemap Layer||Geometry Type||Description|
|Water Polygon||Polygon||Areas covered by water excluding oceans and non-inland seas (primarily freshwater and salt-water lakes and large rivers).|
|Water Line||Line||Permanent streams and rivers below a certain absolute size; center-line of larger rivers above certain scales.|
|Roads||Line||Anything meant to be driven on by motorized non-off-road vehicles.|
|Rail Lines||Line||Anything with physical rails in place with track suitable for commercial, public transit or passenger traffic.|
|Country – Administrative 1||Polygon||National boundaries.|
|Province - Administrative 2||Polygon||Sub-national boundaries within a nation typically equivalent to a province.|
|Region - Administrative 3||Polygon||Government administrative levels larger than municipalities but subordinate to a province. Examples would be counties.|
|Municipal - Administrative 4||Polygon||Local government typically centered on a city or town plus surrounding rural area.|
|Contours||Polygon||Area polygons where all parts of a polygon fall within the same elevation range.|
|Basemap Layer||OSM Attribute Keys and Values||Notes|
|Water Polygon||natural=water, waterway=riverbank, natural=bay||A “bay” is “water mostly surrounded by land” and can be lake or ocean. There doesn’t seem to be a way to distinguish between the two, which implies a post-export geoprocessing operation would have to be developed to limit features to freshwater/inland seas (something like “only features completely within land boundaries”).|
|Water Line||waterway=river, waterway=stream|
|Country – Administrative 1||boundary=administrative + admin_level=2, boundary=maritime + admin_level=2|
|Province - Administrative 2||boundary=administrative + admin_level=4, boundary=maritime + admin_level=4||Good for US, Canada; works for most other countries.|
|Region - Administrative 3||boundary=administrative + admin_level=6, boundary=maritime + admin_level=6||Good for US, Canada; works for most other countries.|
|Municipal - Administrative 4||boundary=administrative + admin_level=8, boundary=maritime + admin_level=8||Good for US, Canada with some exceptions (very large cities, national and sub-national capitals may have higher admin_level); many less developed countries do not have communities smaller than cities attributed this way.|
|Contours||ele=number||Too incomplete to use in most areas. Because ele is recorded on node (point) features, a post-export geoprocessing operation would be needed to convert the points to polygons.|
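As a rough sketch, the mapping in the table above could be held as data and applied to each feature's tags. The structure and the names `BASEMAP_LAYERS` and `layers_for` are hypothetical. Roads, Rail Lines and Contours are left out because the table does not yet give exact tag values for them. Tag values are kept as strings, as OSM stores them.

```python
# Each layer lists alternative rules; a rule matches when all its tags match.
BASEMAP_LAYERS = {
    "Water Polygon": [
        {"natural": "water"},
        {"waterway": "riverbank"},
        {"natural": "bay"},
    ],
    "Water Line": [{"waterway": "river"}, {"waterway": "stream"}],
    "Country": [
        {"boundary": "administrative", "admin_level": "2"},
        {"boundary": "maritime", "admin_level": "2"},
    ],
    "Province": [
        {"boundary": "administrative", "admin_level": "4"},
        {"boundary": "maritime", "admin_level": "4"},
    ],
    "Region": [
        {"boundary": "administrative", "admin_level": "6"},
        {"boundary": "maritime", "admin_level": "6"},
    ],
    "Municipal": [
        {"boundary": "administrative", "admin_level": "8"},
        {"boundary": "maritime", "admin_level": "8"},
    ],
}


def layers_for(tags):
    """Return the names of the basemap layers a feature's tags qualify it for."""
    matches = []
    for layer, rules in BASEMAP_LAYERS.items():
        for rule in rules:
            if all(tags.get(key) == value for key, value in rule.items()):
                matches.append(layer)
                break  # one matching rule is enough for this layer
    return matches


print(layers_for({"natural": "water", "name": "Example Lake"}))
print(layers_for({"boundary": "administrative", "admin_level": "4"}))
print(layers_for({"boundary": "administrative"}))
```

Keeping the rules as data rather than code should make it easier to revise the definitions over the several iterations they will probably need.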
The above can be considered a preliminary definition that will probably take several iterations to get right. I may have to add coastline or continental boundary as a basemap layer but am hoping I can get away with simply using Country as approximately the same. The plan also envisions increasing numbers of secondary layers – such as urban, wilderness or park lands – being later defined.