OSM Data Structure

 

This post presents a description of the OpenStreetMap (OSM) data structure and some implications for users. References are mostly from that organization’s wiki. What I intend to provide is:

 

  1. A short, precise definition of the data elements and how they interact.
  2. The pros and cons of organizing geographical data this way.
  3. A proposed mapping of OSM features to a set of basemap layers.

 

Hosting an OSM Server

 

Recently I have been considering building my own geographic data server based primarily on OpenStreetMap. I first started working with OSM when prototyping maps for IPT’s YourMapData product. These were implemented with Leaflet (an open-source JavaScript library for mapping), whose scripts typically reference a map image tile service. The tile service provides the basemap that gives context to whatever else you are trying to map. The OSM community maintains such a server, and many tutorials and sample implementations use it by default, so I did too. However, this free resource has limits – OSM will cut off anyone who makes excessive requests, and it does not explicitly publish what those limits are. I therefore knew that sooner or later I would have to switch to a commercial source, and to date I have been using the MapQuest Map Tiles service.
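For context on what a tile request actually asks for: a tile service serves pre-rendered square images addressed by zoom level and x/y tile numbers. The minimal sketch below uses the widely documented "slippy map" Web Mercator tile-numbering formula to find which tile contains a latitude/longitude. The function name and the URL template are illustrative assumptions, not any particular provider's API; each provider documents its own URL pattern and usage terms.

```python
import math
from typing import Tuple


def lat_lon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) slippy-map tile indices containing a WGS84 point.

    Web Mercator tiling: the world is 2**zoom tiles wide and tall, x grows
    eastward from 180°W and y grows southward from about 85.05°N.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    # asinh(tan(lat)) is the Mercator projection of the latitude.
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the antimeridian or beyond the Mercator limits stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


# Hypothetical URL template; real providers each document their own.
TEMPLATE = "https://tiles.example.com/{z}/{x}/{y}.png"
x, y = lat_lon_to_tile(45.0, -90.0, 1)
print(TEMPLATE.format(z=1, x=x, y=y))  # https://tiles.example.com/1/0/0.png
```

Because each response is just an image for one of these tile addresses, the client never sees the underlying features, which is the limitation discussed next.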

 

A limitation of using any tile service is that you are stuck with whatever symbolization and feature selection its provider has chosen, because what is returned to you is a rendered image rather than the raw data itself. By maintaining my own server I could present layers in any format I wanted, group or split feature representations as needed, and expose or hide features when required. In particular, I want to create a set of standard layers that could consistently be used as basemap features. To do that I would have to take a deeper dive into exactly how OSM organizes its data, so that I could identify which combinations of OSM features would need to be extracted to create my proposed basemap layers.

 

Structure of OSM Data Elements

 

Many other geographic data formats offer a way to organize like features into a homogeneous collection that can be manipulated as an entity. For example, GeoJSON has feature collections and ArcGIS uses feature classes. This is not the case with OSM: each feature (i.e. a discrete object on the landscape, such as one building or one stream tributary) is independent. To get a collection of like features you have to filter the whole dataset on selected attribute values. This means you have to become familiar with the OSM attribute (tag) definitions instead of simply navigating a hierarchy of predefined feature classes.
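To make the "filter the whole dataset" point concrete, here is a minimal sketch. It assumes a simplified in-memory representation where each element is a dict with a type, an id and a tags dict; the helper names (filter_layer, has_tag) and the sample elements are made up for illustration and are not part of any OSM library.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumed simplified stand-in for OSM elements: type, id and a dict of tags.
Element = Dict[str, Any]
TagPredicate = Callable[[Dict[str, str]], bool]


def filter_layer(elements: Iterable[Element], predicate: TagPredicate) -> List[Element]:
    """Collect every element whose tags satisfy the predicate.

    OSM has no feature classes, so a "layer" is just the result of scanning
    the whole dataset and testing each element's tags.
    """
    return [e for e in elements if predicate(e.get("tags", {}))]


def has_tag(key: str, *values: str) -> TagPredicate:
    """Build a predicate matching key=value for any of the given values,
    or any value at all when no values are given."""
    def predicate(tags: Dict[str, str]) -> bool:
        if key not in tags:
            return False
        return not values or tags[key] in values
    return predicate


# Made-up sample elements.
elements = [
    {"type": "way", "id": 1, "tags": {"building": "yes"}},
    {"type": "way", "id": 2, "tags": {"waterway": "stream"}},
    {"type": "node", "id": 3, "tags": {"amenity": "cafe"}},
    {"type": "way", "id": 4, "tags": {"waterway": "river"}},
]

streams_and_rivers = filter_layer(elements, has_tag("waterway", "stream", "river"))
print([e["id"] for e in streams_and_rivers])  # [2, 4]
```

Note that the result can mix element types (nodes, ways, relations) if their tags match, which is exactly why an OSM "layer" is less predictable than a feature class.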

 

Essentially OSM has only two basic geometry types – nodes and ways. Nodes are points with a latitude and a longitude. Ways are ordered lists of nodes and are used to model lines. A closed way is one whose first and last nodes are the same; it represents either a loop in a line feature or an area of two-dimensional space occupied by a feature, such as a parcel of land. To indicate whether a closed way is an area feature, either the area tag must be given a value of yes or no, or another tag that implies an area (for example building=yes) must be attached.
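The line-versus-area decision described above can be sketched as follows. The set of area-implying keys is a deliberately short assumption for illustration; real renderers and importers maintain much longer, value-specific rule lists, and the function names here are hypothetical.

```python
from typing import Dict, List

# Assumed, simplified set of keys that usually imply an area when they appear
# on a closed way. Real tools use longer, value-specific rules.
AREA_IMPLYING_KEYS = {"building", "landuse", "leisure", "amenity"}


def is_closed(node_ids: List[int]) -> bool:
    """A closed way starts and ends on the same node.

    At least 4 node references are needed for a real ring (e.g. A-B-C-A).
    """
    return len(node_ids) >= 4 and node_ids[0] == node_ids[-1]


def is_area(node_ids: List[int], tags: Dict[str, str]) -> bool:
    """Decide whether a way should be treated as a polygon rather than a line."""
    if not is_closed(node_ids):
        return False  # open ways are always lines
    explicit = tags.get("area")
    if explicit == "yes":
        return True
    if explicit == "no":
        return False
    # No area tag: infer from other tags, if any imply an area.
    if tags.get("natural") == "water":
        return True
    return any(key in tags for key in AREA_IMPLYING_KEYS)


print(is_area([1, 2, 3, 4, 1], {"building": "yes"}))                       # True
print(is_area([1, 2, 3, 4, 1], {"highway": "residential"}))                # False: a loop road
print(is_area([1, 2, 3, 4, 1], {"highway": "pedestrian", "area": "yes"}))  # True
```

The explicit area tag is checked first so that it always overrides any inference from other tags.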

OSM Data Elements

Data Element | Description
Feature | A single identifiable physical element on the landscape. A feature is typically modelled as a node, a way or a relation, each with an id (ids are unique only within each element type, so a node and a way can share the same number).
Node | A point with a latitude, a longitude and an id.
Way | An ordered list of 2 to 2000 nodes with an id. If more than 2000 nodes are required to define a single feature, then multiple ways need to be created and joined in a relation.
Closed Way | A way in which the first and last node are the same. A closed way represents either a linear feature that loops back on itself or an area feature that occupies ground.
Relation | A collection of nodes, ways or other relations (each participating element is a member). It is recommended that a single relation have no more than 300 members. Although any relation can technically contain a mix of element types, in practice most relations are primarily collections of ways. Relations are used for features too long for a single way (over the 2000-node limit) or for features that consist of multiple rings (e.g. an area feature that has a hole in it). Each member can have a role whose value identifies the function it serves in the relation; e.g. a closed way member with role=inner defines the perimeter of an internal hole in an area feature.
Tag (Attribute) | Tags are the means of associating attributes with features. A tag consists of a key (what kind of attribute is being defined) and a value (the actual attribute value), generally written as key=value. Features can have any number of tags (or none) but only one tag per key. The OSM community maintains a list of commonly used tags with definitions and lists of acceptable values.
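The way-length rule and relation membership from the table above can be sketched with a few small data classes. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, real OSM tooling has its own object models, and the 2000-node limit is taken from the table. Consecutive ways share their boundary node so the pieces remain connected.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_WAY_NODES = 2000  # per-way node limit described in the table


@dataclass
class Way:
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Member:
    type: str       # "node", "way" or "relation"
    ref: int        # id of the member element
    role: str = ""  # e.g. "inner"/"outer" in multipolygons; may be empty


@dataclass
class Relation:
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


def split_into_ways(node_ids: List[int], first_way_id: int,
                    max_nodes: int = MAX_WAY_NODES) -> List[Way]:
    """Split a long node sequence into ways of at most max_nodes nodes.

    Consecutive ways share their boundary node so the pieces stay connected.
    """
    if max_nodes < 2:
        raise ValueError("a way needs at least 2 nodes")
    ways: List[Way] = []
    start, way_id = 0, first_way_id
    while start < len(node_ids) - 1:
        ways.append(Way(way_id, node_ids[start:start + max_nodes]))
        way_id += 1
        start += max_nodes - 1  # the next way begins on this way's last node
    return ways


def relation_for_ways(relation_id: int, ways: List[Way],
                      tags: Dict[str, str]) -> Relation:
    """Join the pieces into one relation so they can be treated as one feature."""
    return Relation(relation_id, [Member("way", w.id) for w in ways], dict(tags))


nodes = list(range(1, 5001))  # 5000 hypothetical node ids along one river
ways = split_into_ways(nodes, first_way_id=100)
print([len(w.node_ids) for w in ways])  # [2000, 2000, 1002]
river = relation_for_ways(1, ways, {"type": "waterway", "waterway": "river"})
print([m.ref for m in river.members])   # [100, 101, 102]
```

Tags are kept in a plain dict, which naturally enforces the one-tag-per-key rule: assigning a key a second time replaces its value.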

Strengths and Weaknesses of OSM Implementation

Pros

• Flexibility: contributors can add any tag they want, including entirely new ones, so features can be fully described.
• The structure is fairly simple and does not require sophisticated knowledge of database theory or geographic data formats.
• Features can be added without all of their attributes being known.
• Contributions become part of the larger dataset almost immediately.
• Anyone can make edits, which means corrections can be made quickly.
• A large amount of frequently updated data is available to any user at no cost and with few restrictions on use: the Open Database License (ODbL) requires attributing OSM as the data provider and, for publicly used derivative databases, sharing them under the same license. This raw data is the starting point for many value-added information products.

Cons

• The only quality control is contributors' own vigilance plus review and correction by other members.
• Populating attributes is entirely voluntary, enforced only by norms and best practices. There are no default values or required attributes.
• To extract a class of features you have to know which attributes define it (and hope contributors have attributed correctly).
• There is no contour feature to show land relief (hills, cliffs, depressions, flat land). Elevation can be captured in the ele attribute, but this is infrequently provided.
• As is, the data is not amenable to many standard geographic analysis operations. Reasons for this include:
    • Many operations (network routing, for example) need a shared node wherever lines intersect; OSM does not enforce this.
    • Many operations need the geometry type to be explicit. In many cases closed loops do not have the area attribute defined, so whether a feature is a line or a polygon must be inferred from some other attribute, which a generic software tool cannot do without tag-specific rules.
• There are no predefined layers such as “roads”. Every feature is a node, way, closed way or relation, and a “layer” is simply any collection of features that share one or more attribute values (not even necessarily of the same element type). This can make it hard to extract the correct, complete set of features you actually want if attributes are misapplied or missing.
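The missing-intersection-node problem can be detected with a standard segment-intersection test. The sketch below is an illustration under assumptions: it treats (lon, lat) as planar x/y, which is adequate for spotting local crossings but not for measuring them, and all names and sample nodes are hypothetical. It reports segment pairs from two ways that properly cross without sharing a node.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar x/y in this sketch


def orientation(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if the segments properly cross; touching or collinear cases are ignored."""
    return (orientation(p1, p2, q1) * orientation(p1, p2, q2) < 0
            and orientation(q1, q2, p1) * orientation(q1, q2, p2) < 0)


def unnoded_crossings(way_a: Sequence[int], way_b: Sequence[int],
                      coords: Dict[int, Point]) -> List[Tuple[int, int]]:
    """Return (segment in a, segment in b) index pairs where two ways cross
    without a shared node at the crossing point."""
    hits = []
    for i in range(len(way_a) - 1):
        for j in range(len(way_b) - 1):
            a1, a2 = way_a[i], way_a[i + 1]
            b1, b2 = way_b[j], way_b[j + 1]
            if {a1, a2} & {b1, b2}:
                continue  # the segments meet at a shared node: properly joined
            if segments_cross(coords[a1], coords[a2], coords[b1], coords[b2]):
                hits.append((i, j))
    return hits


# Hypothetical nodes: two diagonals of a square, plus node 5 at the centre.
coords = {1: (0.0, 0.0), 2: (2.0, 2.0), 3: (0.0, 2.0), 4: (2.0, 0.0), 5: (1.0, 1.0)}
print(unnoded_crossings([1, 2], [3, 4], coords))        # [(0, 0)]: crossing, no shared node
print(unnoded_crossings([1, 5, 2], [3, 5, 4], coords))  # []: both ways pass through node 5
```

Not every reported crossing is an error: a road on a bridge legitimately passes over another road without a shared node, so a practical check would also consult tags such as bridge, tunnel and layer.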

Proposed Basemap Feature Layers

 

One of my objectives is to extract features from OSM that can be used as basemap layers. Basemaps display basic geographic features as a locational reference for other mapped features. As a model, I intend to create something similar to the basemap layers historically used in natural resource management. The first table below defines the layers I am trying to create; the second identifies the OSM equivalent of each layer and notes potential difficulties.

Basemap Layer | Geometry Type | Description
Water Polygon | Polygon | Areas covered by water, excluding oceans and non-inland seas (primarily freshwater and salt-water lakes and large rivers).
Water Line | Line | Permanent streams and rivers below a certain absolute size; center-lines of larger rivers above certain scales.
Roads | Line | Anything meant to be driven on by ordinary (non-off-road) motor vehicles.
Rail Lines | Line | Anything with physical rails in place and track suitable for commercial, public transit or passenger traffic.
Country – Administrative 1 | Polygon | National boundaries.
Province – Administrative 2 | Polygon | Sub-national boundaries within a nation, typically equivalent to a province.
Region – Administrative 3 | Polygon | Government administrative levels larger than municipalities but subordinate to a province, such as counties.
Municipal – Administrative 4 | Polygon | Local government, typically centered on a city or town plus the surrounding rural area.
Contours | Polygon (or Line?) | Area polygons where all parts of a polygon fall within the same elevation range.

Basemap Layer | OSM Attribute Keys and Values | Notes
Water Polygon | natural=water, waterway=riverbank, natural=bay | A “bay” is “water mostly surrounded by land” and can be part of a lake or an ocean. There doesn’t seem to be a way to distinguish between the two, which implies a post-export geoprocessing operation would have to be developed to limit features to freshwater lakes and inland seas (something like “only features completely within land boundaries”).
Water Line | waterway=river, waterway=stream |
Roads | highway={motorway, trunk, primary, secondary, tertiary, unclassified, residential, service, motorway_link, primary_link, secondary_link, tertiary_link, track, road} |
Rail Lines | railway={rail, disused, light_rail, monorail, narrow_gauge, preserved} |
Country – Administrative 1 | boundary=administrative + admin_level=2; boundary=maritime + admin_level=2 |
Province – Administrative 2 | boundary=administrative + admin_level=4; boundary=maritime + admin_level=4 | Good for US, Canada; works for most other countries.
Region – Administrative 3 | boundary=administrative + admin_level=6; boundary=maritime + admin_level=6 | Good for US, Canada; works for most other countries.
Municipal – Administrative 4 | boundary=administrative + admin_level=8; boundary=maritime + admin_level=8 | Good for US, Canada, with some exceptions (very large cities and national and sub-national capitals may be tagged with a different admin_level); many less developed countries do not have communities smaller than cities attributed this way.
Contours | ele=number | Too incomplete to use in most areas. Because these are node (point) features, a post-export geoprocessing operation would be needed to convert them to polygons.
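The tag rules in the second table can be expressed as a simple classifier. This is a minimal sketch, assuming tags arrive as a string-to-string dict; the function name is hypothetical and the layer names are shortened forms of the table's first column. It checks tags only: it does not verify geometry (e.g. that a Water Polygon is a closed way or multipolygon), nor does it apply the inland-water test for natural=bay or the contour polygon conversion noted above.

```python
from typing import Dict, List

ROAD_VALUES = {
    "motorway", "trunk", "primary", "secondary", "tertiary", "unclassified",
    "residential", "service", "motorway_link", "primary_link",
    "secondary_link", "tertiary_link", "track", "road",
}
RAIL_VALUES = {"rail", "disused", "light_rail", "monorail", "narrow_gauge", "preserved"}
# admin_level values are stored as strings in OSM tags.
ADMIN_LAYERS = {"2": "Country", "4": "Province", "6": "Region", "8": "Municipal"}


def basemap_layers(tags: Dict[str, str]) -> List[str]:
    """Return the proposed basemap layers whose tag rules match these tags.

    Tags only: geometry checks and post-export geoprocessing are not done here.
    """
    layers: List[str] = []
    if tags.get("natural") in ("water", "bay") or tags.get("waterway") == "riverbank":
        layers.append("Water Polygon")
    if tags.get("waterway") in ("river", "stream"):
        layers.append("Water Line")
    if tags.get("highway") in ROAD_VALUES:
        layers.append("Roads")
    if tags.get("railway") in RAIL_VALUES:
        layers.append("Rail Lines")
    if tags.get("boundary") in ("administrative", "maritime"):
        admin_layer = ADMIN_LAYERS.get(tags.get("admin_level", ""))
        if admin_layer is not None:
            layers.append(admin_layer)
    if "ele" in tags:
        layers.append("Contours")  # raw elevation source, not yet contours
    return layers


print(basemap_layers({"highway": "residential"}))                          # ['Roads']
print(basemap_layers({"boundary": "administrative", "admin_level": "4"}))  # ['Province']
print(basemap_layers({"highway": "footway"}))                              # []
```

Returning a list rather than a single name leaves room for the occasional feature whose tags satisfy more than one layer rule.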

The above can be considered a preliminary definition that will probably take several iterations to get right. I may have to add coastline or continental boundary as a basemap layer, but I am hoping I can get away with simply using Country as an approximation. The plan also envisions defining more secondary layers later, such as urban, wilderness or park lands.

References

http://wiki.openstreetmap.org/wiki/Map_Features

http://wiki.openstreetmap.org/wiki/Features

http://wiki.openstreetmap.org/wiki/Relation

http://wiki.openstreetmap.org/wiki/Elements

http://wiki.openstreetmap.org/wiki/Tags

http://wiki.openstreetmap.org/wiki/Node

http://wiki.openstreetmap.org/wiki/Way

http://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative#admin_level
