
Libosmium Manual


1. Introduction

The OpenStreetMap project is growing at an enormous rate. Working with the OSM data becomes increasingly difficult, because there is just so much of it and because it gets more complex all the time.

Osmium was developed as an answer to this challenge. After years of developing software to work with OSM data in many programming languages like Perl, Ruby, Java and even XSLT, it became evident that something more was needed to efficiently work with these huge amounts of data. Processing speed was, of course, one big issue, but the other one was available memory. Data processing tasks can be so much faster when their working set fits into memory that it makes sense to think about this. Because Osmium is a C++ library, it can make very efficient use of the main memory on your computer. Primitive values such as integers and doubles, and complex objects as well, need only as much memory as is really necessary. If the data structures are chosen carefully, there is often little management overhead.

Osmium has been in continuous development since it was born in October 2010, and it has changed considerably over time. While the basic premise, to write a low-level, efficient OSM library, still holds, it has become more and more powerful and at the same time easier to use. Osmium has been in production use nearly from day one; some parts of it were taken from earlier production code. Osmium is not an academic exercise: it is used, and it has shown its power many times. And while C++ might not be the easiest programming language to learn and Osmium might not be the easiest library to use, we try to make working with it as simple as possible, as long as this doesn’t compromise efficiency too much.

Header-only Library

Osmium is a header-only library, so there is nothing to compile to build it. Just include the header files you need.

The osmium Namespace

Everything in the Osmium library is in the osmium namespace or in sub-namespaces. You’ll likely encounter the osmium::io namespace for everything related to file input and output and the osmium::geom namespace for geometry-related functionality, but there are some more.

Do not directly use anything in any sub-namespace called detail. Those classes and functions are for internal use only.

Code in any experimental sub-namespace is experimental and might be removed or changed without notice.

License

This manual is available under the Creative Commons Attribution-ShareAlike License version 4.0.

The Osmium Library is available under the very liberal Boost Software License:

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the “Software”) to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

2. Dependencies

Different parts of Libosmium have different dependencies. You do not need to install all of them, just those that you need for whatever you are doing with Libosmium. But for a beginner it is not always easy to see which dependencies are needed and which aren’t. This manual differentiates between important dependencies and extra dependencies to help you out. You should at least install the important dependencies when starting to experiment with Libosmium, but feel free to install all dependencies. Whatever is not needed will not be used anyway; it will not slow down your program or make the binaries bigger.

On Linux systems most of these libraries are available through your package manager, see the list below for the names of the packages. But make sure to check the versions. If the packaged version available is not new enough, you’ll have to install from source. Most likely this is the case for Protozero and Libosmium itself.

On macOS many of these libraries are available through Homebrew.

When building Libosmium tests and examples, CMake will automatically look for these libraries in the usual places on your system. In addition it will look for the Protozero library in the same directory where the Libosmium repository is. So if you are building from the Git repository and want to use the newest Libosmium and Protozero, clone both into the same directory:

mkdir work
cd work
git clone https://github.com/mapbox/protozero
git clone https://github.com/osmcode/libosmium

In addition to the programs listed here, you’ll need a C++ compiler which supports C++11. Clang 3.4 or later and GCC 4.8 or later are known to work.

Important dependencies

CMake and Make

To build the tests, examples, etc. you need the CMake build system. Programs using Libosmium can, of course, be built with any build system you like, but the Libosmium repository as well as many projects based on Libosmium use it.

CMake has an optional curses-based configuration tool called ccmake. It is recommended that you install this also.

CMake usually generates a Makefile for Make, which you will also need.

Expat

Expat is needed for parsing OSM XML files.

ZLib

zlib is needed for reading and writing OSM PBF files and for GZip support when reading and writing XML files.

bz2lib

bz2lib is needed for BZip2 support when reading and writing OSM XML files.

Boost >= 1.55

Boost is used for some (limited) functionality in libosmium. Many programs using libosmium will not actually need Boost, or will only need parts of it.

You need at least Boost version 1.55.

Google Protocol Buffers (until version 2.2)

Not needed any more from version 2.3.0 onwards

Google Protocol Buffers in at least version 2.4.0 is needed for reading and writing OSM PBF files.

OSMPBF (until version 2.2)

Not needed any more from version 2.3.0 onwards

The OSMPBF library is needed for reading and writing OSM PBF files.

Protozero >= 1.6.3 (since libosmium version 2.3.0)

The Protozero header only library is needed for reading and writing OSM PBF files.

You need at least version 1.6.3.

Up to version 2.13 a copy of this library was included in the libosmium repository. For newer versions you need to install either a packaged version or a version from the git repository.

Utfcpp (until version 2.14.0)

Not needed any more from version 2.15.0 onwards

The utf8-cpp library is needed for the OPL output format. A copy of this library is included in the libosmium repository but not installed by default. Either use the packages of your distribution, install it from the source, or use the INSTALL_UTFCPP option of the libosmium CMake configuration to install the bundled version.

Extra dependencies

Google Sparsehash (deprecated)

Google Sparsehash (https://github.com/sparsehash/sparsehash) is used for the sparse-mem-table index map, sometimes used as a node location store. This isn’t usually needed any more, because there are better implementations for the node location store available.

Boost Program Options (until version 2.7.2)

Boost Program Options is needed for parsing command line options in some examples.

GDAL/OGR

GDAL/OGR is needed if you want to convert OSM geometries into OGR geometries.

To use it, compile with the flags printed by the command

gdal-config --cflags

and link with the flags printed by

gdal-config --libs

GEOS

GEOS is needed if you want to convert OSM geometries into GEOS geometries. The GEOS support is deprecated and works only until GEOS 3.5. For details see this commit.

Proj.4

The Proj.4 library is needed if you want to project OSM coordinates into spatial reference systems other than Web Mercator (EPSG 3857, often named Google Mercator).

Only the old proj_api.h based API is supported. If you need this to work with newer versions of Proj.4, have a look at https://github.com/osmcode/osmium-proj for some untested experimental code.

LZ4 (from 2.16.0)

The LZ4 library is needed if you want to use LZ4 compression in PBF files. This is an optional feature available from libosmium version 2.16.0.

Doxygen

The Libosmium API documentation can be built using Doxygen. Usually you do not need to do this, because the API reference is available online. If you want to build it yourself, you need Graphviz in addition to Doxygen.

Installing dependencies on some Linux systems

Debian Stretch, Buster, Bullseye or newer

You can install all dependencies with:

apt-get install -q -y \
    cmake \
    doxygen \
    g++ \
    git \
    graphviz \
    libboost-dev \
    libbz2-dev \
    libexpat1-dev \
    libgdal-dev \
    libgeos++-dev \
    liblz4-dev \
    libproj-dev \
    make \
    ruby \
    ruby-json \
    spatialite-bin \
    zlib1g-dev

Ubuntu 18.04 or newer

You can install all dependencies with:

apt-get install -q -y \
    cmake \
    doxygen \
    g++ \
    git \
    graphviz \
    libboost-dev \
    libbz2-dev \
    libexpat1-dev \
    libgdal-dev \
    libgeos++-dev \
    liblz4-dev \
    libproj-dev \
    make \
    ruby \
    ruby-json \
    spatialite-bin \
    zlib1g-dev

Fedora

You can install all dependencies with:

dnf install --quiet --assumeyes \
    boost-devel \
    bzip2-devel \
    cmake \
    doxygen \
    expat-devel \
    gcc-c++ \
    gdal-devel \
    gdalcpp-static \
    geos-devel \
    git \
    graphviz \
    lz4-devel \
    make \
    proj-devel \
    ruby \
    rubygem-json \
    spatialite-tools \
    zlib-devel

openSUSE 42

You can install all dependencies with:

zypper --non-interactive --no-color install \
    boost_1_61-devel \
    cmake \
    doxygen \
    gcc6-c++ \
    gdal-devel \
    geos-devel \
    graphviz \
    libbz2-devel \
    libexpat-devel \
    libproj-devel \
    proj \
    ruby2.3 \
    ruby2.3-rubygem-json \
    zlib-devel

Arch Linux

You can install all important dependencies with:

sudo pacman -Suy protobuf boost-libs zlib expat cmake make bzip2

and all extra dependencies with:

sudo pacman -Suy boost gdal proj doxygen

3. Building Libosmium

Libosmium is a header-only library, which means that you do not have to build anything. But you might want to build the tests, examples, benchmarks or the documentation. This chapter explains how to do that.

Before building you need to install all the dependencies.

CMake

Libosmium uses the CMake configuration system available on all major platforms. CMake will generate a configuration for a build system of your choice. On Linux and macOS this is usually GNU Make, on Windows Nmake or MSBuild.

Build types

CMake knows several different build types that result in the use of different compiler options and different build options (see below). By default the build type RelWithDebInfo (Release with debug info) will be used, but you can change this either by setting CMAKE_BUILD_TYPE in ccmake or on the command line:

cmake -DCMAKE_BUILD_TYPE=Dev

Here are the build types used for Libosmium:

CMAKE_BUILD_TYPE   Description
Debug              Debug mode, no optimizations.
Dev                For Libosmium developers. All build options are set to ON and very strict compiler warnings are enabled.
MinSizeRel         Release mode, optimize for small binary.
RelWithDebInfo     Release mode with debug information compiled in. Use this unless the binaries generated are too big for you.
Release            Release mode.

Build options

Depending on the build type (see above), different build options are ON or OFF. You can change the settings in ccmake or on the command line with something like

cmake -DBUILD_EXAMPLES=ON

etc.

Build option       Default                 Description
BUILD_BENCHMARKS   OFF (ON in Dev build)   Build the benchmark programs. You only need this if you intend to run the benchmarks.
BUILD_DATA_TESTS   OFF (ON in Dev build)   Build the data tests. These tests need OSM test data from a different repository, so they are a bit more difficult to run. See chapter Running Tests for details.
BUILD_EXAMPLES     ON                      Build the examples in the examples directory.
BUILD_HEADERS      OFF (ON in Dev build)   Only interesting for Libosmium developers. This will build every Libosmium header file by itself to check if the include dependencies are all set correctly.
BUILD_TESTING      ON                      Build the unit tests. See chapter Running Tests for details.

Building on Linux and macOS

Linux: Osmium is developed on Linux and tested best on that system. Debian Jessie (testing) and current Ubuntu systems come with everything needed for Osmium. Debian Wheezy (stable) and the Ubuntu LTS release 12.04 don’t have recent enough compilers. If you are stuck on these systems, use a backported compiler.

macOS: Osmium also works well on macOS with the exception of the parts that need the mremap system call that is not available on macOS.

First clone Libosmium from the git repository (or install it in some other way):

git clone https://github.com/osmcode/libosmium
cd libosmium

Then create a directory in which the build should happen. In this documentation we will use the directory build, but you can choose any other name. You can have several build directories at the same time with different build options and they will not interfere with each other.

mkdir build
cd build

Then call CMake to create an initial configuration:

cmake ..

CMake will check your system, determine locations of programs, include headers, libraries etc. It will also set some default build options. You can then call

ccmake ..

to enter a curses-based tool that allows you to edit any configuration setting. Use the cursor keys to choose any variable and press Enter to change it. Once you are done, press c to configure and handle any errors that might appear. You might have to do this step several times. Then press g to generate the configuration and exit the program. For more advanced usage info, see the ccmake help.

Now you can call

make

to complete the build.

For Mac users: If you have clang 3.2 or newer, use the system compiler. If not you have to build the compiler yourself. See the instructions on https://clang.llvm.org/ .

Building on Windows

You need a rather new Visual C++ compiler for this to work. Visual C++ 2013 (a.k.a. 12.0) is not supported; you’ll need the 2014 CTP or the 2015 Preview. This is due to the limited C++11 support in earlier versions of Visual C++.

The easiest way on Windows is to use the windows-builds repository.

When the prerequisites (Visual Studio 2014/2015, git) are in place, it should not take more than these steps to compile libosmium:

git clone https://github.com/mapbox/windows-builds.git
cd windows-builds
settings.bat
scripts\build_libosmium_deps
scripts\package_libosmium_deps
scripts\build_libosmium vs

Building on 32bit architectures

Osmium works well on 64-bit machines, but on 32-bit machines there are some problems. Be aware that not everything will work on 32-bit architectures. This is mostly due to the 64-bit integers needed for node IDs. Osmium also hasn’t been tested well on 32-bit systems. Here are some issues you might run into:

Please report any issues you have and we might be able to solve them.

Building the reference documentation

To build the documentation you’ll need Doxygen.

After configuring with CMake as described above, call

make doc

to create the reference documentation.

Installing Libosmium

Call make install in the build directory to install the library. By default, this will install the Osmium include files into /usr/local/include/.

The external header-only library gdalcpp is included in the libosmium repository.

If you want this library to be installed along with libosmium itself when calling make install, you have to use the CMake option INSTALL_GDALCPP.

(Libosmium versions 2.13 and before also included protozero which could be included with INSTALL_PROTOZERO. Newer versions of libosmium don’t include this any more.)

(Libosmium versions 2.14.0 and before also included utfcpp which could be included with INSTALL_UTFCPP. Newer versions of libosmium don’t include this any more.)

If something didn’t work

Here are some tips if your build failed:

Advanced CMake configuration

The following variables can be set in the CMake configuration to further change the build. Changes here are usually not necessary though:

Option                   Description
BENCHMARK                If BUILD_BENCHMARKS is ON, this variable contains the semicolon-separated list of all benchmarks that should be built. The prefix osmium_benchmark_ will be added to all executables.
EXAMPLES                 If BUILD_EXAMPLES is ON, this variable contains the semicolon-separated list of all examples that should be built. The prefix osmium_ will be added to all executables.
OSMIUM_WARNING_OPTIONS   C++ compiler warning options used in Dev mode.

Running clang-tidy

To check for problems in the source code not detected by compilers, you can run the clang-tidy command. If it is installed and CMake found it, you can call Make with the clang-tidy target:

make clang-tidy

The configuration for clang-tidy is in the file .clang-tidy. It also contains documentation on why certain warnings are disabled.

Running clang-tidy will take quite a while and might generate a lot of output. You can redirect the output to a file using something like this:

make clang-tidy >clang-tidy.log 2>&1

Running CPPCheck

To check for problems in the source code not detected by compilers, you can run the cppcheck command. If it is installed and CMake found it, you can call Make with the cppcheck target:

make cppcheck

This will check all .hpp and .cpp files and can take a while.

4. Running tests

Libosmium uses version 1 of the Catch unit testing framework and CTest which is part of the CMake suite.

There are three kinds of tests: unit tests, data tests, and example tests. For the details see below.

Tests should never fail. If they do fail in your environment, please report this as a bug. Some tests will be disabled on some platforms if they test functionality that is not available on that platform. Some tests will be disabled on your host if you don’t have the needed dependencies installed.

Running the tests

To run the tests, build the project as described in the Building Libosmium chapter and then run

ctest

which will run all the configured tests. You can run all tests matching a pattern with something like

ctest -R 'io_.*'

or exclude tests from being run with something like

ctest -E io_test_reader

If there is some problem you can enable verbose mode:

ctest -V

See the CTest documentation for more details.

Labels

CTest allows tests to be labeled to categorize them. All unit tests have the label unit and a label for their category (the directory under test/t). All data tests have the label data. In addition all tests are labeled as fast or slow. Fast tests don’t take a noticeable amount of time, slow tests do.

You can run all tests with labels matching a regular expression with -L. So to run only fast tests use

ctest -L fast

You can use

ctest --print-labels

to see all available labels.

Unit tests

Unit tests check small parts of Libosmium. They can be found in the directories under test/t. If you are installing Libosmium, you should probably run these tests to make sure Libosmium works in your environment.

Unit tests are enabled or disabled with the BUILD_TESTING CMake setting. Different tests have different dependencies and CMake will disable all tests that don’t have their dependencies met.

You can also run the unit tests manually without going through CTest. After building they are in the build/test directory. Call them with --help to see options.

Data tests

Data tests need external OSM test data to run. They are enabled or disabled with BUILD_DATA_TESTS, but you have to install the test data for them to work. For this call git submodule update --init in the libosmium repository.

If you have put the test data somewhere else, you can use the OSM_TESTDATA variable in CMake to point to that directory.

The testdata-multipolygon test needs Spatialite and Ruby with the json gem installed. Those dependencies are currently not checked for in the CMake configuration.

Note that older versions of libosmium don’t have the test data installed as a submodule, but expect it to be in the same directory you installed Libosmium in. To do this clone the osm-testdata repository:

git clone https://github.com/osmcode/osm-testdata

Example tests

Some example programs come with tests. Those tests are under test/examples. They run the example programs with some arguments to check basic functionality. Currently these tests are very rudimentary.

5. Using Libosmium in your own projects

Libosmium is generally quite easy to use in your own projects. Just include the specific header files you need for your application and start using Libosmium functions. Because Libosmium is a header-only library, there is nothing to link with. There isn’t one include file for everything, but many include files each only bringing in some specific classes and functions. This way you are not paying for something you don’t use.

Read the manuals

Before you do anything else we recommend you at least skim the Libosmium concepts manual and this manual. This will give you an overview of what’s where and how Libosmium works.

Read the API reference

The API reference documents every class and function in Libosmium. It will tell you which #include directive you need where.

Libosmium uses several other libraries for many of its functions and you have to figure out which libraries to link with when you include specific Libosmium header files. This is documented in the reference and there is a list below for your convenience.

CMake configuration

If you are using CMake to configure your project, using Libosmium is very easy, because a ready-made CMake find module is available. Copy the file FindOsmium.cmake to your project:

cd your-project
mkdir -p cmake
cd cmake
wget https://github.com/osmcode/libosmium/raw/master/cmake/FindOsmium.cmake

and include it in your CMakeLists.txt:

list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
find_package(Osmium REQUIRED)

This will tell CMake to find the Libosmium includes on the build system during the configuration. You can check whether this was successful with something like:

if(NOT OSMIUM_FOUND)
    message(WARNING "Libosmium not found!\n")
endif()

If your code doesn’t work with older versions of Libosmium, you can tell CMake the minimum version number:

find_package(Osmium 2.15.6 REQUIRED)

You can add an optional list of components that should be found also. For example to look for the io and gdal components you extend the find_package command like this:

find_package(Osmium REQUIRED COMPONENTS io gdal)

FindOsmium knows about the following components:

After that add the include directories:

include_directories(${OSMIUM_INCLUDE_DIRS})

You can look at the CMake configuration in the Osmium Tool and Osmium Contrib repositories for some working examples.

Note that you should occasionally check whether you still have a current version of FindOsmium.cmake and update if necessary.

Libraries needed for specific functionality

Also see the dependencies chapter.

XML input

For XML input you need the Expat XML parser, for XML output no special XML library is needed. In any case you need threading enabled. If you want to read or write compressed XML files you need ZLib and BZ2lib.

PBF input and output

For PBF input and output you need several libraries and threading enabled.

For version 2.3.0 and above you don’t need much:

If you want support for lz4 compression in PBF blobs, you also need the LZ4 library.

For versions up to 2.2 you need some more libraries:

GDAL/OGR

The GDAL/OGR library is needed when you want to convert OSM geometries into OGR geometries or report problems building multipolygons into OGR formats.

PROJ

The PROJ library is only needed when you want to project OSM locations into arbitrary coordinate reference systems. If you only want to convert to Web Mercator, use osmium::geom::MercatorProjection instead and you don’t need an extra library.

Note that only PROJ up to version 5 is supported.

Compiler options

You might have to set the C++ version using the compiler option

-std=c++11

When working with OSM data you often have very large files with several gigabytes. This can lead to problems on 32bit systems. Use the options

-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

for the compiler to make sure that large files work.

Sample Compilation String

g++ osm_processor.cpp -std=c++11 -lpthread -lz -lexpat -lbz2

6. Basic Types

All the types and classes described in this chapter are value types, i.e. they are small and can be copied around cheaply.

IDs

Typedef: osmium::object_id_type

Include: <osmium/osm/types.hpp>

For object IDs use the type osmium::object_id_type. It is a 64bit signed integer that can represent the more than 2 billion nodes we already have in OSM. While way and relation IDs could theoretically use a smaller ID type (signed 32 bit are currently enough), for consistency and to be future-proof, they will also use this type in most cases.

OSM objects always have positive IDs. But some software (such as JOSM) uses negative IDs for objects that have not yet been uploaded to the main OSM database. To support these use cases, the object_id_type is a signed integer.

Some parts of Osmium, notably the different index classes, can only work with positive IDs. In those cases the type osmium::unsigned_object_id_type is used. If you know that your data only contains positive IDs or only negative IDs, you can use the positive_id() member function on the osmium::OSMObject class to get IDs of that type. It will return the absolute value of the ID.

If your data contains a mix of positive and negative IDs, this simple approach will fail, because an ID and its negative counterpart (say 5 and -5) map to the same value! In that case you have to use two indexes, one for the positive IDs and one for the negative IDs. The osmium::handler::NodeLocationsForWays class takes this approach.
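The following standalone sketch shows why two indexes are needed. It uses std::unordered_map instead of libosmium's index classes, and SplitIndex is a hypothetical class that only mirrors the idea behind NodeLocationsForWays, not its implementation. With a single map keyed by the absolute value, IDs 5 and -5 would overwrite each other.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical illustration (not a libosmium class): stores one value per
// signed OSM ID in two maps keyed by the unsigned absolute value.
template <typename TValue>
class SplitIndex {
    std::unordered_map<std::uint64_t, TValue> m_positive;
    std::unordered_map<std::uint64_t, TValue> m_negative;

    // Same idea as positive_id(): the absolute value as an unsigned type.
    // Unsigned subtraction avoids overflow for the most negative int64 value.
    static std::uint64_t abs_id(std::int64_t id) noexcept {
        return id < 0 ? std::uint64_t{0} - static_cast<std::uint64_t>(id)
                      : static_cast<std::uint64_t>(id);
    }

public:
    void set(std::int64_t id, TValue value) {
        auto& map = id < 0 ? m_negative : m_positive;
        map[abs_id(id)] = std::move(value);
    }

    // Returns a pointer to the stored value, or nullptr if the ID is unknown.
    const TValue* get(std::int64_t id) const {
        const auto& map = id < 0 ? m_negative : m_positive;
        const auto it = map.find(abs_id(id));
        return it == map.end() ? nullptr : &it->second;
    }
};
```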

Other Primitive Types

Include: <osmium/osm/types.hpp>

There are several other typedefs:

Type                  Description
object_version_type   type for OSM object version number
changeset_id_type     type for OSM changeset IDs
user_id_type          type for OSM user IDs
num_changes_type      type for number of changes in a changeset

All these types are currently 32bit integers. Version numbers, changeset IDs and User IDs are always positive (they start out with 1). The number of changes can be 0 or larger.

Locations

Class: osmium::Location

Include: <osmium/osm/location.hpp>

In Osmium all positions on Earth are stored in objects of the osmium::Location class. Coordinates are stored as 32 bit signed integers after multiplying the coordinates with osmium::coordinate_precision = 10,000,000. This means we can store coordinates with a resolution of one ten-millionth of a degree, or about one centimeter on the ground, which is good enough for OSM use. The main OSM database uses the same system. We do this to save memory: a 32 bit integer uses only 4 bytes, a double uses 8.
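The fixed-point scheme itself is simple arithmetic. The standalone sketch below reproduces it without libosmium. It is not libosmium's code, and to_fixed, to_degrees and fixed_precision are hypothetical names. It also shows why 32 bits are enough: 180 × 10,000,000 = 1,800,000,000, which is below the int32 maximum of 2,147,483,647. The resolution estimate assumes a meridian circumference of about 40,007,863 m.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Illustration only: the same arithmetic as osmium::Location, not its code.
constexpr std::int32_t fixed_precision = 10000000;  // like osmium::coordinate_precision

// +-180 degrees scaled by 10^7 must fit into a signed 32 bit integer.
static_assert(180LL * fixed_precision <= std::numeric_limits<std::int32_t>::max(),
              "scaled longitudes fit into 32 bits");

// Degrees to fixed point, rounding to the nearest step instead of truncating.
std::int32_t to_fixed(double degrees) {
    return static_cast<std::int32_t>(std::lround(degrees * fixed_precision));
}

// Fixed point back to degrees.
double to_degrees(std::int32_t fixed) {
    return static_cast<double>(fixed) / fixed_precision;
}

// Ground distance of one fixed-point step along a meridian (about 1.1 cm),
// assuming a meridian circumference of about 40,007,863 m.
double resolution_in_meters() {
    return 40007863.0 / 360.0 / fixed_precision;
}
```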

Coordinates are not checked when they are set.

To create a location:

osmium::Location location{9.3, 49.7};

or using integers:

osmium::Location location{93000000, 497000000};

Make sure you are using the right number type or you will get very wrong coordinates.

You can also create an undefined location. This is used for instance for coordinates in ways that are not set yet:

osmium::Location location{};

In a boolean context an undefined location returns false, a defined true. So you can write something like:

if (location) {
    // ...defined location here...
}

You can get and set the coordinates using the internal (integer) format with the x() and y() member functions and the external (double) format with the lon() and lat() member functions.

The normal bounds for the longitude and latitude are -180 to 180 and -90 to 90, respectively. But in historic OSM data you can sometimes find locations outside these bounds. Call

location.valid()

to find out if a location is inside those bounds.

The lon() and lat() getter calls will throw an exception if the location is invalid or undefined.

Segments

Class: osmium::Segment

Include: <osmium/osm/segment.hpp>

A segment is the directed connection between two locations. Segments are not OSM objects but are sometimes useful in algorithms.

Undirected Segments

Class: osmium::UndirectedSegment

Include: <osmium/osm/undirected_segment.hpp>

An undirected segment is a connection between two locations without a direction. Undirected segments are not OSM objects but are sometimes useful in algorithms.

Boxes

Class: osmium::Box

Include: <osmium/osm/box.hpp>

A box is a rectangle described by the minimum and maximum longitude and latitude. It is used, for instance, in the header of OSM files and in changesets to describe the bounding box.

osmium::Box box;
box.extend(osmium::Location{3.2, 4.3});
box.extend({4.5, 7.2});
box.extend({3.3, 8.9});
std::cout << box;  // (3.2,4.3,4.5,8.9)

7. OSM Entities

Osmium works with the four basic types of OSM entities: Nodes, Ways, and Relations (which are all OSM Objects) and Changesets. In addition Areas are supported. They are not native OSM objects, but they are treated almost like real OSM objects.

These OSM entities cannot be created like normal C++ objects; they need a buffer to live in. See the next chapter for details. Accessing existing OSM entities, on the other hand, is easy and straightforward.

OSM Objects

Class: osmium::OSMObject

Include: <osmium/osm/object.hpp>

The osmium::OSMObject class is the base class for nodes, ways, and relations. It has accessors for the usual OSM attributes:

osmium::OSMObject& obj = ...;
std::cout << "id=" << obj.id()
          << " version=" << obj.version()
          << " timestamp=" << obj.timestamp()
          << " visible=" << (obj.visible() ? "true" : "false")
          << " changeset=" << obj.changeset()
          << " uid=" << obj.uid()
          << " user=" << obj.user() << "\n";

The changeset() and uid() accessor functions return the IDs of the changeset that created this object version and the User ID of the user creating this version of the object, respectively. They do not link to an object of that type.

The visible flag will always be true for normal OSM data, but for history data or change files it shows whether an object version has been deleted.

In addition each object has a list of tags attached:

const osmium::TagList& tags = obj.tags();

You can iterate over all tags:

for (const auto& tag : obj.tags()) {
    std::cout << tag.key() << '=' << tag.value() << '\n';
}

Or you can find specific tags:

const char* highway = obj.tags().get_value_by_key("highway");  // nullptr if the key is missing
if (highway && !std::strcmp(highway, "primary")) {  // std::strcmp is in <cstring>
    // ...
}

Nodes

Class: osmium::Node

Include: <osmium/osm/node.hpp>

A Node is a kind of OSMObject. In addition to the things you can do with any OSMObject, the Node has a Location.

const osmium::Node& node = ...;
double longitude = node.location().lon();

Ways

Classes: osmium::Way, osmium::NodeRef, osmium::WayNodeList

Include: <osmium/osm/way.hpp>

A Way is a kind of OSMObject. In addition to the things you can do with any OSMObject, a Way has a list of node references:

const osmium::Way& way = ...;
for (const osmium::NodeRef& nr : way.nodes()) {
    std::cout << "ref=" << nr.ref() << " location=" << nr.location() << '\n';
}

Relations

Classes: osmium::Relation, osmium::RelationMember, osmium::RelationMemberList

Include: <osmium/osm/relation.hpp>

A Relation is a kind of OSMObject. In addition to the things you can do with any OSMObject, a Relation has a list of members:

const osmium::Relation& relation = ...;
const osmium::RelationMemberList& rml = relation.members();
for (const osmium::RelationMember& rm : rml) {
    std::cout << rm.type() << rm.ref() << " (role=" << rm.role() << ")\n";
}

Areas

not yet documented

Changesets

Class: osmium::Changeset

Include: <osmium/osm/changeset.hpp>

Changesets contain the metadata for a set of changes to OSM data.


8. Buffers

Include: <osmium/memory/buffer.hpp>

OSM entities have to be stored somewhere in memory. They are complex objects containing an arbitrary number of tags; relations can have any number of members, and so on. If we handled those objects like normal C++ objects, creating them would take lots of small memory allocations, and getting at all the parts of the data would need many pointer indirections. Instead, OSM entities are created inside so-called buffers. Buffers can have a fixed size or grow as needed. New objects are added at the end, and they are stored inside the buffer in a reasonably space-efficient manner while still being easy and quick to access.

Buffers can be moved around between different parts of your program and even between threads. The content of a buffer can even be written to disk, read back in, and used immediately “as is” without any serialization or deserialization step.

But all of this has one drawback: it is slightly more complicated to create these objects, and they cannot simply be instantiated on the stack.

Buffers cannot be copied, because it would be unclear who is responsible for the memory. But they can be moved.

Creating a Buffer

Buffers exist in two different flavours: those with external memory management and those with internal memory management. If you already have some memory with data in it (for instance, read from disk), you create a buffer with external memory management. It is then your job to free the memory once the buffer isn’t used any more. If you don’t have any memory yet, you can create a Buffer object and have it manage the memory internally. It will dynamically allocate memory and free it again after use.

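To create a buffer from existing memory, you give the address and size to the constructor:

constexpr std::size_t buffer_size = 10240; // must be a multiple of osmium::memory::align_bytes
auto* mem = static_cast<unsigned char*>(std::malloc(buffer_size)); // the constructor takes unsigned char*; needs <cstdlib>
osmium::memory::Buffer buffer{mem, buffer_size};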

This will create an empty buffer with buffer_size bytes available for use.

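If the memory already contains some data, you can add the number of bytes already in use as a third parameter to the constructor:

auto* mem = static_cast<unsigned char*>(std::malloc(buffer_size));
const ssize_t num = read(0, mem, buffer_size); // POSIX read() from <unistd.h>; check for -1 in real code
// The data must be complete OSM entities previously written from an Osmium buffer.
osmium::memory::Buffer buffer{mem, buffer_size, static_cast<std::size_t>(num)};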

To create a buffer with internal memory management, you construct it with the number of bytes it should have initially and a flag that tells Osmium whether it should automatically grow the buffer when needed:

const std::size_t buffer_size = 10240;
osmium::memory::Buffer growing_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
osmium::memory::Buffer fixed_buffer{buffer_size, osmium::memory::Buffer::auto_grow::no};

Adding Items to the Buffer

You cannot create OSM objects on the stack; they always have to be stored in buffers. To create OSM objects, special “builder” classes are used:

void add_tags(osmium::memory::Buffer& buffer, osmium::builder::Builder* builder) {
    osmium::builder::TagListBuilder tl_builder{buffer, builder};
    tl_builder.add_tag("amenity", "restaurant");
}

const int buffer_size = 10240;
osmium::memory::Buffer node_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
{
    osmium::builder::NodeBuilder builder{node_buffer};
    builder.add_user("foo");
    osmium::Node& obj = builder.object();
    obj.set_id(1);
    obj.set_version(1);
    obj.set_changeset(5);
    obj.set_uid(140);
    obj.set_timestamp("2016-01-05T01:22:45Z");
    obj.set_location(osmium::Location{9.0, 49.0});
    add_tags(node_buffer, &builder);
}
node_buffer.commit();
// do something with the buffer (e.g. write to file)

Building OSM entities and adding them to a buffer has some pitfalls. A buffer has to be aligned (padded with zeros) before committing. If you try to commit a buffer which is not aligned, your program will fail with Assertion 'buffer.is_aligned()' failed.

Setting the attributes version, changeset, uid, and timestamp is optional, but you have to add the user attribute in order to get an aligned buffer.

If the object contains lists (the tags of any OSM object, the node references of a way, the members of a relation), you need additional builders for these lists. Each of these builders has to be destroyed (for instance, by letting it go out of scope) before another builder writes data to the buffer.

void build_way(osmium::memory::Buffer& buffer) {
    osmium::builder::WayBuilder way_builder{buffer};
    way_builder.object().set_id(1);
    // set attributes version, changeset, uid and timestamp (all optional)
    way_builder.add_user("foo");
    {
        osmium::builder::WayNodeListBuilder wnl_builder{buffer, &way_builder};
        wnl_builder.add_node_ref(osmium::NodeRef (1, osmium::Location()));
        wnl_builder.add_node_ref(osmium::NodeRef (2, osmium::Location()));
    }
    add_tags(buffer, &way_builder);
}

const int buffer_size = 10240;
osmium::memory::Buffer way_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
build_way(way_buffer);
way_buffer.commit();

This creates only the way; the nodes have to be created separately.

Building relations works similarly to building ways. You use an osmium::builder::RelationBuilder instead of the WayBuilder and an osmium::builder::RelationMemberListBuilder instead of the WayNodeListBuilder. The RelationMemberListBuilder has to go out of scope before the TagListBuilder writes the tags to the buffer and vice versa.

Handling a Full Buffer

If a buffer becomes full, there are two different things that can happen:

If the buffer was created with auto_grow::yes, it will reserve more memory on the heap and double its size. This happens without the client code noticing, but it invalidates any pointers into the buffer. This is similar to the behaviour of std::vector, so it should be familiar to C++ programmers.

If the buffer was created with auto_grow::no (or if it is a buffer with external memory management), the exception osmium::buffer_is_full is thrown. In this case you have to catch the exception and either grow the buffer or create a new one. If you grow the buffer, you can keep going where you left off. If you start a new one, the last object you were writing when the exception was thrown was not committed, and you have to write it again into the new buffer.

The CallbackBuffer Class

Include: <osmium/memory/callback_buffer.hpp>

The CallbackBuffer is a small wrapper class around the Buffer class. It tries to keep the size of the internal buffer beneath a maximum buffer size specified in the constructor. If the buffer is “full” a callback is called.

// Initialize a callback buffer with the default initial size (1MB) and default
// max size (800kB). You can change these numbers by giving them to the constructor.
osmium::memory::CallbackBuffer cb;

// Set a callback that knows what to do with the buffer, for instance it can
// write it out to disk.
cb.set_callback([&](osmium::memory::Buffer&& buffer) {
    // ...handle buffer...
});

// Add objects to your buffer, for instance like this (add_node() and _id()
// come from <osmium/builder/attr.hpp>, _id() from namespace osmium::builder::attr):
osmium::builder::add_node(cb.buffer(), _id(9), ...);

// Call `possibly_flush()` after each object added to the buffer to check
// the size and possibly call the callback.
cb.possibly_flush();

// ...

// Force a flush of the buffer when you are finished adding data to the buffer.
cb.flush();

Note that the buffer can grow beyond the initial buffer size if needed. This can happen if a new object doesn’t fit into the rest of the buffer available or if no callback function is set (yet).

9. Input and Output

Libosmium can read and write several different OSM file formats.

Headers

Whenever you want to use Osmium to access OSM files you need to include the right header files and link your program to the right libraries. If you want to support all the different formats you add

#include <osmium/io/any_input.hpp>

and/or

#include <osmium/io/any_output.hpp>

to your C++ files. These headers will pull in all the file formats and all the compression types for input and output, respectively. Usually this is what you want to use. But if you are sure you don’t need all formats or if you don’t have all the libraries needed for all the formats, you can pick and choose formats and compression types.

If you only need some file formats, you can include any combination of the following headers:

#include <osmium/io/pbf_input.hpp>
#include <osmium/io/xml_input.hpp>

#include <osmium/io/debug_output.hpp>
#include <osmium/io/opl_output.hpp>
#include <osmium/io/pbf_output.hpp>
#include <osmium/io/xml_output.hpp>

If you want compression support, you have to add the includes for the different compression algorithms:

#include <osmium/io/gzip_compression.hpp>
#include <osmium/io/bzip2_compression.hpp>

Or, if you want both anyway, you can just use the shortcut:

#include <osmium/io/any_compression.hpp>

Compression

If you want to use compression you have to include the right header files and link to the libz and libbz2 libraries, respectively.

File Formats

XML

For read support you need the expat parser library. Link with:

-lexpat

For write support no special library is needed.

PBF

To build with PBF support you have to compile with threads and need libz:

-pthread -lz

Note that in older versions of libosmium you needed to link with the protobuf and osmpbf libraries. They are not used any more. Instead the protozero header-only library is used.

Reading and Writing OSM Files with Osmium

The osmium::io::File class

Before reading from or writing to an OSM file, you have to instantiate an object of class osmium::io::File. It encapsulates the file name as well as any information about the format of the file. In the simplest case the File class can derive the file format from the file name:

osmium::io::File pbf_file{"planet.osm.pbf"};  // PBF format
osmium::io::File bz2_file{"planet.osm.bz2"};  // XML with bzip2 compression
osmium::io::File osc_file{"planet.osc.gz"};   // XML change file with gzip compression

The constructor of the File class has a second, optional argument giving the format of the file, which can be used if the format can’t be deduced from the file name. In the simplest form the format argument looks the same as the usual file suffixes:

osmium::io::File input_file{"somefile", "osm.bz2"};

This setting of the format is often needed when reading from STDIN or writing to STDOUT. Both an empty string and a single dash as filename signify STDIN/STDOUT:

osmium::io::File input_file{"-", "osm.bz2"};
osmium::io::File output_file{"", "pbf"};

The format string can also take optional arguments separated by commas.

osmium::io::File output_file{"out.osm.pbf", "pbf,pbf_dense_nodes=false"};

It is also possible to change the format after creating a File object using its set_format() function:

osmium::io::File input_file{"some_file.osm"};
input_file.set_format(osmium::io::file_format::pbf);

Reading a File

After you have a File object you can instantiate a Reader object to open the file for reading:

osmium::io::File input_file{"input.osm.pbf"};
osmium::io::Reader reader{input_file};

As a shortcut you can just give a file name to the Reader if you are relying on the automatic file format detection and don’t want to do any special format handling:

osmium::io::Reader reader{"input.osm.pbf"};

Optionally you can add a second argument to the Reader constructor giving the types of OSM entities you are interested in. Sometimes you only need, say, the ways from the file, but not the nodes and relations. If you tell the Reader about it, it might be able to read the file more efficiently by skipping those parts you are not interested in:

osmium::io::Reader reader{"input.osm.pbf", osmium::osm_entity_bits::way};

You can set the following flags:

Flag                                  Description
osmium::osm_entity_bits::nothing      Do not read any entities at all (useful if you are only interested in the file header)
osmium::osm_entity_bits::node         Read nodes
osmium::osm_entity_bits::way          Read ways
osmium::osm_entity_bits::relation     Read relations
osmium::osm_entity_bits::changeset    Read changesets
osmium::osm_entity_bits::all          Read all of the above

You can also “or” several flags together if needed.

You can get the header information from the file using the header() function:

osmium::io::Header header = reader.header();

You read the OSM entities from the file using the read() function, which returns a buffer with the data:

while (osmium::memory::Buffer buffer = reader.read()) {
    ...
}

At the end of the file an invalid buffer is returned which evaluates to false in boolean context.

You can close the file at any time. It will also be automatically closed when the Reader object goes out of scope.

reader.close();

In most cases you do not want to work with the buffers, but with the OSM entities within them. See the Iterators chapter and the Handlers chapter for more convenient methods of working with open files.

The File Header

Some OSM file formats contain a file header. The most popular formats, XML and PBF, have a header, as does the O5M/O5C format. The OPL format doesn’t have a header.

You access the header information of a file you are reading from the Reader object with the header() method:

osmium::io::Header header = reader.header();

When writing a file the header can be set in the constructor of the Writer object, see below.

The header can contain any number of bounding boxes, although usually there is only one (or none). PBF files allow only a single bounding box. XML files can have several, but this is unusual and the semantics are unclear, so creating files with multiple bounding boxes is discouraged.

The header contains a flag telling you whether this file can contain multiple versions of the same object. This is true for history files and for change files, but not for normal OSM data files. Not all OSM file formats can distinguish between those cases, so the flag might be wrong.

In addition the header can contain any number of key-value pairs with additional information. Most often this is used to set the generator, the program that generated the file. Depending on the file format some of these key-value pairs are handled specially. Because there is no generic header option facility in OSM files, you can only read/write options that Osmium recognizes. Unknown options or options not suitable for the file format you are writing are silently ignored.

See the description of the osmium::io::Header and the osmium::util::Options class for details on setting and accessing these options.

These header options are recognized by Osmium:

Format        R/W  Option                                Description
XML, PBF      r/w  generator                             The program that generated this file. If this is not set by an application, Libosmium will set it to libosmium/VERSION on writing.
XML           r/w  xml_josm_upload                       Value of the upload attribute on the osm XML element (true or false) for use in JOSM.
XML           r    version                               File version (currently always set to 0.6).
PBF, O5M/O5C  r    timestamp                             (Replication) timestamp (1).
PBF           r    pbf_dense_nodes                       Set when reading a PBF file with DenseNodes (2).
PBF           r    pbf_optional_feature_#                Set for all optional features specified in the PBF header (3).
PBF           r/w  osmosis_replication_timestamp         Timestamp used in replication (1, 4).
PBF           r/w  osmosis_replication_sequence_number   Sequence number used in replication (4).
PBF           r/w  osmosis_replication_base_url          Base URL for change files used in replication (4).
PBF           r/w  sorting                               Sorting of the file (5).
O5M/O5C       r    o5m_timestamp                         (Replication) timestamp (1).

Notes:

  1. The timestamp field is set to the same value as either osmosis_replication_timestamp or o5m_timestamp (if available). When writing a file, a timestamp option is ignored, you have to use one of the other ones.
  2. To disable DenseNodes when writing a file (they are enabled by default), you have to set this option not on the Header but on the File object.
  3. Example: When there are two optional features named “Foo” and “Bar” set in the PBF header, the options pbf_optional_feature_0=Foo and pbf_optional_feature_1=Bar are set.
  4. See the section “What are the replication fields for?” on https://wiki.openstreetmap.org/wiki/PBF_Format for details.
  5. Read or write the optional header property Sort.Type_then_ID if set to Type_then_ID.

Writing a File

To create an OSM file, create an instance of the osmium::io::Writer class and move buffers with OSM objects into it by calling the writer like a function:

osmium::memory::Buffer buffer{10240, osmium::memory::Buffer::auto_grow::yes};
// Add objects to the buffer (see above) or get a buffer from
// an input file using osmium::io::Reader::read().
osmium::io::File output_file{"output.osm.pbf"};
osmium::io::Writer writer{output_file};
writer(std::move(buffer));
writer.close();

As a shortcut, you can directly give the filename to the Writer if you are relying on the automatic file format detection (the same as for Readers) and don’t need any special handling.

osmium::io::Writer writer{"output.osm.pbf"};

You can give additional arguments to the constructor of the Writer class, for instance a customized header, permission to overwrite an existing file, or a request to sync the file to disk when it is closed:

osmium::io::Header header;
header.set("generator", "FastOSMTool");
osmium::io::Writer writer{"output.osm.pbf",
                          header,
                          osmium::io::overwrite::allow,
                          osmium::io::fsync::yes};

10. Iterators

Every C++ programmer is familiar with iterators and their flexibility. There is no reason we couldn’t take advantage of that and of the many algorithms supplied by the STL. So libosmium supports several different kinds of iterators to access OSM data. You can iterate over all OSM objects in a buffer, or over all objects from a data source (usually a file), or over a bunch of pointers to OSM objects, and there are output iterators to write to files, too. All these different iterators can be used consistently and easily from your code without having to know much about what’s underneath. And because they work just like STL iterators do, you can use all the algorithms from the STL.

Some of these iterators will keep track of underlying buffers and make sure the buffers and the data in them stay around as long as there is an iterator pointing to them. This adds some overhead but makes using the data much easier.

Accessing Data in Buffers

Buffers containing OSM entities support the usual begin(), end(), cbegin(), and cend() functions:

osmium::memory::Buffer buffer = ...;

auto it = buffer.begin();
auto end = buffer.end();

for (; it != end; ++it) {
    std::cout << it->type() << "\n";
}

Of course you can also use the C++11 range-based for loop:

for (auto& item : buffer) {
    ...
}

Accessing Data from Files

osmium::io::Reader reader{"input.osm"};
osmium::io::InputIterator<osmium::io::Reader> in{reader};
osmium::io::InputIterator<osmium::io::Reader> end;

11. Handlers

If you process OSM data with libosmium to do something (e.g. convert to a different format, import into a database, build a routing graph), you will usually create one or more handlers.

Handlers are created by deriving a class from osmium::handler::Handler, which defines (empty) methods for all OSM object types, i.e. a method node(const osmium::Node&) for nodes, a method way(const osmium::Way&) for ways, etc. You only have to implement the methods for the object types you want to process. Libosmium will read the data and feed it object by object into the handler, where you can do whatever you want with it. Your handler may keep state, e.g. if you want to sum up the length of all roads in an OSM file.

#include <iostream>

#include <osmium/handler.hpp>
#include <osmium/io/any_input.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/way.hpp>
#include <osmium/visitor.hpp>

class MyHandler : public osmium::handler::Handler {
public:
    void way(const osmium::Way& way) {
        std::cout << "way " << way.id() << '\n';
        for (const osmium::Tag& t : way.tags()) {
            std::cout << t.key() << "=" << t.value() << '\n';
        }
    }

    void node(const osmium::Node& node) {
        std::cout << "node " << node.id() << '\n';
    }
};

int main() {
    auto otypes = osmium::osm_entity_bits::node | osmium::osm_entity_bits::way;
    osmium::io::Reader reader{"input.osm.pbf", otypes};
    MyHandler handler;
    osmium::apply(reader, handler);
    reader.close();
}

The example above reads an OSM file and writes some information about nodes and ways to STDOUT.

You can define multiple handlers; Osmium will feed each object into the handlers one after another. Just pass the additional handlers to osmium::apply(), which accepts a reader and one or more handlers.

Multiple handlers are needed if you want to access the locations of the nodes referenced by a way, because the way itself only contains references to the nodes. A special handler, NodeLocationsForWays, stores the node locations in an index, looks them up by node ID, and adds them to the node references of each way before the way reaches the handlers after it. The best index type for this handler depends on the size of the file, the available memory, and the operating system. See the Osmium Concepts Manual for details.

#include <iostream>

#include <osmium/handler.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/way.hpp>
#include <osmium/io/any_input.hpp>
#include <osmium/visitor.hpp>
#include <osmium/index/map/sparse_mem_array.hpp>
#include <osmium/handler/node_locations_for_ways.hpp>

class MyHandler : public osmium::handler::Handler {
public:
    void way(const osmium::Way& way) {
        std::cout << "way " << way.id() << '\n';
        for (const auto& n : way.nodes()) {
            std::cout << n.ref() << ": " << n.lon() << ", " << n.lat() << '\n';
        }
    }
};

int main() {
    auto otypes = osmium::osm_entity_bits::node | osmium::osm_entity_bits::way;
    osmium::io::Reader reader{"input.osm.pbf", otypes};

    namespace map = osmium::index::map;
    using index_type = map::SparseMemArray<osmium::unsigned_object_id_type, osmium::Location>;
    using location_handler_type = osmium::handler::NodeLocationsForWays<index_type>;

    index_type index;
    location_handler_type location_handler{index};

    MyHandler handler;
    osmium::apply(reader, location_handler, handler);
    reader.close();
}

You can find lots of examples of how to use handlers in the examples of libosmium and in the osmium-contrib repository.

12. Working with relations

Working with relations is more complicated than working with just nodes and ways. But relations contain a lot of interesting data, first and foremost the multipolygon relations needed for proper area support. To work with relations you usually have to somehow combine the relation objects with their member objects. Libosmium contains a lot of building blocks that can help you do that.

One often used approach looks like this: You read the OSM file containing the data you want to work on (either the planet or some extract) twice. On the first pass only relations are read and kept in main memory. On the second pass nodes, ways, and relations are read and matched to the in-memory relations they are members of. This approach works quite well, because a) libosmium can read OSM data really fast, so reading a file twice isn’t as expensive as you might imagine, and b) there aren’t that many relations in the OSM data compared to the number of nodes and ways. You could instead keep the nodes and ways in memory to match them to the relations later, but this would need a lot more memory. It also can’t properly handle relations that are members of other relations, because you do not know that you might need a member relation before you see the parent relation.

This chapter describes how you can use the RelationsManager class to implement this approach in your own code for any kind of relation you like. It then describes the MultipolygonManager, which does this specifically for multipolygon relations. After that we look at the classes used behind the scenes, in case you need to go deeper.

Note that there are classes used in earlier versions of libosmium for similar work, namely the osmium::relations::Collector and osmium::area::MultipolygonCollector classes. They are still available, but deprecated now. Please use the manager classes instead.

Using the RelationsManager

The RelationsManager class handles the whole process outlined above of storing relations in memory and later matching OSM member objects to their parent relations. Once all the pieces of a relation have been assembled, it will call your code to actually do something with the relation. Internally it uses several other classes, which are described later.

To use the RelationsManager create your own class deriving from it. The RelationsManager uses the Curiously recurring template pattern (CRTP) to call into your code.

#include <osmium/relations/relations_manager.hpp>

class YourManager : public osmium::relations::RelationsManager<YourManager, true, true, false> {
    ...
};

As you can see, the first template parameter of the RelationsManager is your class; the next three template parameters tell the RelationsManager whether you are interested in member nodes, ways, and/or relations, respectively. So the code above says: I only want to handle members of type node and way, but not members of type relation. If a parameter is set to false, the code in the class will behave as if there are no objects of the given type in the input file, and your code will never see them.

Usually you want to override several functions in this class that tell the RelationsManager how to behave:

The new_relation() function is called for every relation encountered in the input data. Usually this function should first decide whether your code is interested in this relation, typically by looking at the type tag. You can then do any processing on the relation that doesn’t require the actual member objects to be available. To “express interest” in this relation, return true; the relation is then “remembered” by the RelationsManager for further processing. Otherwise the RelationsManager ignores this relation.

bool new_relation(const osmium::Relation& relation) noexcept {
    return relation.tags().has_tag("type", "route");
}

If you have expressed an interest in a relation, the new_member() function is called for each member. Again, you should first decide whether you are interested in this member, for instance depending on its type or role. Remember that at this time you only have the member type, id, and role available, not the whole object. You can then do any processing you need and return true or false depending on whether you are interested in this member. The default is to simply return true for all members which is often enough because you already specified which types of members you are interested in using the template parameters of the RelationsManager class.

bool new_member(const osmium::Relation& /*relation*/, const osmium::RelationMember& /*member*/, std::size_t /*n*/) noexcept {
    return true;
}

These two functions are called during the first pass through the data. All remaining functions are called during the second pass. The most important is the complete_relation() function. It is called for each relation you have expressed an interest in once all the members have been found in the input file. So when this is called you have access to the relation object as well as all the member objects.

Here is an example:

void complete_relation(const osmium::Relation& relation) {
    // Iterate over all members
    for (const auto& member : relation.members()) {
        // member.ref() will be 0 for all members you are not interested
        // in. The objects for those members are not available.
        if (member.ref() != 0) {
            // Get the member object
            const osmium::OSMObject* obj = this->get_member_object(member);

            // If you know which type you have, you can also use any of these:
            const osmium::Node* node               = this->get_member_node(member.ref());
            const osmium::Way* way                 = this->get_member_way(member.ref());
            const osmium::Relation* member_relation = this->get_member_relation(member.ref());
        }
    }
}

The pointers returned from the get_member_* functions will be nullptr if the member is not available. If you do the member.ref() != 0 check first, all members are available and you don’t need to check for nullptr.

You have to do all the processing of your relation in this function. Once you return from this function, the relation and its members will be removed from memory to make space for more data. If you will need the data again, you have to store it yourself somewhere.

The RelationsManager keeps an output Buffer for you. You can write objects into this buffer, later to be used in your application or written out to disk. Say you want to write out all member objects with a building tag:

void complete_relation(const osmium::Relation& relation) {
    for (const auto& member : relation.members()) {
        if (member.ref() != 0) {
            const osmium::OSMObject* obj = this->get_member_object(member);
            // get_value_by_key() returns nullptr if there is no building tag
            if (obj && obj->tags().get_value_by_key("building")) {
                this->buffer().add_item(*obj);
                this->buffer().commit();
            }
        }
    }
}

We’ll see later where this buffer ends up and how you can access it.

There are a bunch more functions that you can override if you need to. The functions

are called for each node, way, or relation, respectively, before the member handling is done. The functions

are called for each node, way, or relation, respectively, after the member handling is done. The functions

are called when the node, way, or relation isn’t a member in any relation.

Note that these functions are only called if you have set the corresponding template parameters of the RelationsManager to true.

Here is the sequence of processing for each object:

Now that you have customized your class, you can use it like this:

#include <osmium/io/any_input.hpp>
#include <osmium/relations/relations_manager.hpp>
#include <osmium/visitor.hpp>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        return 1; // expects the name of an OSM input file
    }

    // You'll need some OSM input file
    osmium::io::File input_file{argv[1]};

    // Instantiate your manager class
    YourManager manager;

    // First pass through the file
    osmium::relations::read_relations(input_file, manager);

    // Second pass through the file
    osmium::io::Reader reader{input_file};
    osmium::apply(reader, manager.handler());

    // Access data in output buffer
    osmium::memory::Buffer buffer = manager.read();
    // ...
}

If your manager stores its results internally in some way, or writes only a few objects into the output buffer, the code above is enough. But if the output buffer can grow too large, you have to handle it.

Using the Output buffer

The RelationsManager has an internal instance of the CallbackBuffer class (see chapter 8). In the class derived from the RelationsManager you use the buffer() function to get access to its internal buffer. You can add objects to this buffer as explained above.

The RelationsManager::handler() function can take a callback, which it sets on this buffer. Your callback function is called whenever the internal buffer is full, and it must then do something with the buffer, for instance write it out to disk.

    // Second pass through the file
    osmium::io::Reader reader{input_file};
    osmium::apply(reader, manager.handler([&](osmium::memory::Buffer&& buffer){
        // This will be called whenever the internal buffer is "full".
        // Handle the buffer here, for instance by writing it to disk.
    }));
}

Incomplete Relations

If you work with extracts of the planet, your extract will usually not have all relations complete, i.e. some members of some relations are missing because they are located beyond the boundary of the extract. For these relations you will never get a call to complete_relation(). If you still want to see these relations, call the for_each_incomplete_relation() function on the manager:

manager.for_each_incomplete_relation([&](const osmium::relations::RelationHandle& handle){
    // Access relation from handle:
    const osmium::Relation& relation = *handle;

    // Access members
    for (const auto& member : handle->members()) {
        if (member.ref() == 0) {
            // we did not express interest in the member
        } else {
            const auto* object = manager.get_member_object(member);
            if (object == nullptr) {
                // member was not in input data
            } else {
                // member was in input data
            }
        }
    }
    // do something with the relation
});

The RelationHandle works a bit like a pointer giving you access to the underlying Relation using operator*() and operator->().

MultipolygonManager and MultipolygonManagerLegacy

Multipolygons are a type of relation in OpenStreetMap (they are tagged with type=multipolygon) used to model areas with inner rings and areas with multiple outer rings. Osmium provides a relations manager for multipolygons and boundary relations (which work like multipolygons but are tagged with type=boundary) called osmium::area::MultipolygonManager.

If you are working with older OSM data (before about June 2017) you have to take old-style multipolygons into account. They are not supported by the osmium::area::MultipolygonManager class, but you can use the osmium::area::MultipolygonManagerLegacy class instead.

There are lots of examples of how to use a MultipolygonManager, e.g. the osmium_area_test program in the libosmium examples directory.

If the RelationsManager is not enough

The RelationsManager class has a lot of built-in flexibility allowing you to change its functionality by overriding many of its functions in a derived class. If this is not enough, you can use the following classes as a basis for your own implementation. Look at the RelationsManager code as an example of how they are used and take it from there.

13. Creating Geometries

OSM objects describe where something is and what it is. The what is described by the tags. The where, the “geometry”, is “encoded” in the location (longitude and latitude) of a node for simple points, in the locations of the nodes of a way forming a linestring (or, possibly, a polygon if the first and last node are the same), and in more complex geometrical objects (such as multipolygons) if relations are involved.

For many use cases the geometry of an OSM object (or OSM objects) is important. After all, if you want to render a map, you need the geometry of everything in it. That is why libosmium has many functions to create the different kinds of geometries from OSM objects. The whole exercise is made more difficult because there are many different ways to represent geometries in C++ programs used by different software packages. Osmium knows about several of them.

Example: Creating a point geometry from a node

As an introductory example, we’ll look at how a point geometry can be created from a node.

#include <osmium/geom/wkt.hpp>
const osmium::Node& node = ...; // got this from somewhere

osmium::geom::WKTFactory<> factory;
std::string wkt = factory.create_point(node);

First you need a geometry factory. These factories know how to convert OSM objects into different kinds of geometry representations. The WKTFactory creates geometries in the WKT (Well Known Text) format, which is just a string like POINT(3.567 25.642).

Then you use the factory to create the point from the node and you are done.

Geometry types

Libosmium can create the following geometry types:

Geometry type   From these objects        With function
Point           Node, NodeRef, Location   create_point()
LineString      Way, WayNodeList          create_linestring()
Polygon         Way, WayNodeList          create_polygon()
MultiPolygon    Area                      create_multipolygon()

Notes:

Factories

Libosmium supports the following factories for different geometry formats:

WKT

Well-known text is a simple text-based format with geometries that look like POINT(2.2452 41.3124) or LINESTRING(1.1554 2.5215, 1.1453 2.5663). They can be created like this:

#include <osmium/geom/wkt.hpp>
osmium::geom::WKTFactory<> factory;

The factory constructor takes an optional integer argument with the precision (number of digits after the decimal point), the default is 7, which is enough for OSM.

osmium::geom::WKTFactory<> factory{3}; // three digits after decimal point

All creation functions return a std::string:

std::string point = factory.create_point(node);
std::string line  = factory.create_linestring(way);
...

WKB

Well-known binary is a simple binary format. Create the factory like this:

#include <osmium/geom/wkb.hpp>
osmium::geom::WKBFactory<> factory;

The factory constructor takes two optional arguments. The first decides whether you want WKB (wkb_type::wkb, default) or Extended WKB (EWKB, wkb_type::ewkb), the second decides whether to output in raw binary (out_type::binary, default) or in hex encoded binary (out_type::hex).

To create extended WKB in hex format as used by PostGIS for example:

osmium::geom::WKBFactory<> factory{osmium::geom::wkb_type::ewkb,
                                   osmium::geom::out_type::hex};

All creation functions return a std::string:

std::string point = factory.create_point(node);
std::string line  = factory.create_linestring(way);
...

GEOS

The functions for creating GEOS geometries are deprecated and only work with GEOS versions up to 3.5. If you want to use them with newer versions, contact the libosmium developers by opening an issue on the GitHub repository.

GEOS is an Open Source library with powerful operations to work with and modify geometries. To use it from libosmium:

#include <osmium/geom/geos.hpp>
osmium::geom::GEOSFactory<> factory;

You can also set the SRID used by GEOS (default is -1, unset):

osmium::geom::GEOSFactory<> factory{4326};

If this is not flexible enough for your case, you can also create a GEOS factory yourself and then the libosmium factory from it:

geos::geom::PrecisionModel geos_pm;
geos::geom::GeometryFactory geos_factory{&geos_pm, 4326};
osmium::geom::GEOSFactory<> factory{geos_factory};

Note: GEOS keeps a pointer to the factory it was created from in each geometry. You have to make sure the factory is not destroyed before all the geometries created from it have been destroyed!

All creation functions return a unique_ptr to the GEOS geometry:

std::unique_ptr<geos::geom::Point> point = factory.create_point(node);
std::unique_ptr<geos::geom::LineString> line = factory.create_linestring(way);
...

GDAL/OGR

The GDAL/OGR library is very popular. Almost all Open Source GIS tools use it in one form or another to read or write geometries from/to files or databases in dozens of different formats (Shapefiles, Spatialite, PostGIS, etc.). You can use it from libosmium, too:

#include <osmium/geom/ogr.hpp>
osmium::geom::OGRFactory<> factory;

The factory constructor doesn’t take any special arguments.

All creation functions return a unique_ptr to the OGR geometry:

std::unique_ptr<OGRPoint> point = factory.create_point(node);
std::unique_ptr<OGRLineString> line = factory.create_linestring(way);
...

GeoJSON

The GeoJSON format describes how to encode geometries in JSON.

Libosmium has two different GeoJSON factories. One creates normal std::strings with the JSON data. The other uses the RapidJSON library. Both only create the geometry portion of the JSON structure for you. You have to add the feature structure with the properties yourself as needed for your use case.

The GeoJSONFactory takes an optional precision as argument like the WKT constructor:

#include <osmium/geom/geojson.hpp>

osmium::geom::GeoJSONFactory<> factory{6};
std::string point = factory.create_point(node);

The RapidGeoJSONFactory takes a form of rapidjson::Writer as argument. Here is an example:

#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <osmium/geom/rapid_geojson.hpp>

typedef rapidjson::Writer<rapidjson::StringBuffer> writer_type;
rapidjson::StringBuffer stream;
writer_type writer{stream};
osmium::geom::RapidGeoJSONFactory<writer_type> factory{writer};

Please see the RapidJSON documentation for details about the Writer class.

Using projections

Before creating the geometries, libosmium can convert the coordinates from the OSM objects into different coordinate systems using a projection. The projection is given as a template parameter of the factory class:

osmium::geom::WKTFactory<> factory; // default identity projection (EPSG 4326)

or

osmium::geom::WKTFactory<osmium::geom::IdentityProjection> factory; // same

Often used is the Web Mercator projection (EPSG 3857):

#include <osmium/geom/mercator_projection.hpp>
osmium::geom::WKTFactory<osmium::geom::MercatorProjection> factory;

The identity and Mercator projections are handled internally in libosmium. But you can also use any projection implemented by the Proj.4 library:

#include <osmium/geom/projection.hpp>

osmium::geom::Projection projection{"+init=epsg:31467"}; // Gauss-Krueger GK3
osmium::geom::WKTFactory<osmium::geom::Projection> factory{projection};

You need to link with -lproj if you use this library. See the documentation of the Proj.4 library on the different ways to initialize a projection using a projection string.

Exceptions

Factory functions throw osmium::geometry_error exceptions if something went wrong creating a geometry.

Implementing your own factory

The geometry formats already implemented should cover a lot of uses, but if you need to implement your own format factory, you can do so based on the code in libosmium. You have to implement your own SomeFormatFactoryImpl class that implements the make_point(), linestring_start(), linestring_add_location(), linestring_finish(), polygon_start(), polygon_add_location(), polygon_finish(), multipolygon_start(), multipolygon_polygon_start(), multipolygon_polygon_finish(), multipolygon_outer_ring_start(), multipolygon_outer_ring_finish(), multipolygon_inner_ring_start(), multipolygon_inner_ring_finish(), multipolygon_add_location(), and multipolygon_finish() functions. These functions are usually very small, adapting the data to the desired format. All the real logic is in the provided GeometryFactory class template.

Then all you need to do is define the alias template

template <class TProjection = IdentityProjection>
using SomeFormatFactory = GeometryFactory<SomeFormatFactoryImpl, TProjection>;

and you are done.

Use the other implementations as examples and ask if you have any questions.

14. Storage

Osmium offers several different indexes suitable for different use cases. You have to choose a suitable index type. See the Osmium Concepts Manual for a list of available index types.

If you want to choose the index type at run time, you can use osmium::index::MapFactory. The following code listing shows its usage. location_index_type is a variable you either set based on the preferences of the user of your program or based on your own estimates (e.g. file size).

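#include <string>

#include <osmium/handler/node_locations_for_ways.hpp>
#include <osmium/index/map.hpp>
#include <osmium/index/map/all.hpp> // registers all map types (such as "sparse_mem_array") with the MapFactory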

using index_type = osmium::index::map::Map<osmium::unsigned_object_id_type, osmium::Location>;
using location_handler_type = osmium::handler::NodeLocationsForWays<index_type>;
std::string location_index_type = "sparse_mem_array";
const auto& map_factory = osmium::index::MapFactory<osmium::unsigned_object_id_type, osmium::Location>::instance();
auto location_index = map_factory.create_map(location_index_type);
location_handler_type location_handler{*location_index};

The ItemStash class

Occasionally you want to store OSM objects in main memory and find them again later. To store the data you can use a Buffer (see chapter 8), but that is sometimes not enough. The ItemStash class might help.

An instance of the ItemStash class is used to keep any number of Items in memory. Usually those items are OSM objects, but it will work for any kind of item that can also be stored in a buffer. In fact, internally ItemStash uses an auto-growing buffer for this.

You add objects using add_item(), which returns an opaque handle that can be used later to get the item back (using get_item()) or to remove the item (using remove_item()). The handle remains valid regardless of the operations you do on the stash, until you remove the item, which invalidates its handle. This is different from pointers or references into the ItemStash memory, which are invalidated by calls to add_item().

#include <osmium/storage/item_stash.hpp>

osmium::ItemStash stash;

const osmium::OSMObject& object = ...;
auto handle = stash.add_item(object);

// ...

const auto& stored_object = stash.get_item<osmium::OSMObject>(handle);

// ...

stash.remove_item(handle);

The ItemStash will internally manage the memory needed and occasionally do a garbage collection which will purge all removed items. You can call garbage_collect() to force this.

Note that the ItemStash does not keep any indexes to find objects by ID or by other means (for instance tags). It only finds the items again using the handles. It is your job to keep the handles somewhere and index them if necessary. Handles are small value-type objects; feel free to copy them around. A default constructed osmium::ItemStash::handle_type can be used as an invalid handle.

15. Exceptions

Libosmium uses various C++ standard exceptions and some Osmium-specific exceptions to tell you about problems. All Osmium-specific exceptions are in the osmium namespace; they are all derived from one of the standard C++ exceptions, usually std::runtime_error or std::system_error.

List of Osmium Exceptions

osmium::io_error
Some kind of input/output error. Derived classes describe the error in more detail.

osmium::xml_error (derived from osmium::io_error)
Some kind of XML parser error.

osmium::format_version_error (derived from osmium::io_error)
The OSM file format version was not understood. Osmium currently can only read version 0.6 files.

osmium::geometry_error
Some kind of geometry error.

osmium::projection_error
Thrown when a projection from one coordinate system into another fails in some way. Either the projection can’t be initialized because of invalid parameters or the projection can’t be calculated because the coordinates can’t be transformed into the target coordinate system.

osmium::not_found
This exception is thrown when a key is not found in an index.

osmium::invalid_location
Thrown when the coordinates of an undefined or invalid Location are accessed.

osmium::unknown_type
Thrown by visitors when they encounter an unknown (or in this context unexpected) item type in a buffer. This should not happen in usual circumstances.

Standard Exceptions thrown by Osmium

std::invalid_argument
Thrown by some Osmium functions.

16. Handling of invalid data

Libosmium can, to a certain extent, handle data that is invalid in the sense that it is not allowed in the OSM database or might even be nonsensical, for instance longitudes larger than 180°. This section explains the details and reasons.

There are good reasons for this behaviour:

Generally, lower-level classes and functions (such as the basic classes Location, Node, Tag, etc.) are more lenient for flexibility, while higher-level functions (such as file I/O) might be stricter to support typical use cases.

File input and output

It is possible to encode some data in OSM files that can be considered to be invalid. When reading and writing OSM files libosmium does not care about that. It will give you the data in the form it is in the file and write out data you give to it in that form.

Order of objects in files

OSM objects in OSM files are usually ordered by type and ID (and version for history files). This is a useful convention, but it is not required. All OSM file formats allow the data to be in any order and libosmium can read and write those files. Whenever you read data using libosmium, it will be given to you in the order it is in the file, whatever that is. Whenever you write data, you must give it to libosmium in the order you want it to end up in the file.

Note that the ordering of objects in a file might influence the size of the file. Some file formats (notably PBF) will encode the data better if the same types of objects are together and even better if they are ordered by ID.

IDs

OSM node, way, relation, and changeset IDs are always positive. Zero is allowed by libosmium and understood as the “unset” or “don’t know” value. Negative values are also allowed because some programs (JOSM for instance) use negative IDs as temporary IDs. Not all parts of libosmium will just work with negative IDs, though; you might have to handle them specially in some way. Indexes usually only work with positive IDs. If you have to handle negative IDs, use two indexes: one for positive IDs and one for negative IDs, which you have to transform (for instance negate) first.

OSM uses a different ID space for each entity type (nodes, ways, relations, changesets) and gives out IDs starting from 1. Libosmium allows any kind of ID that fits into an unsigned 64bit int, but some parts, notably the indexes, are optimized for smaller and more or less contiguous integers.

User ID

The user ID has to be zero (“unknown” or “anonymous”) or a positive integer. Negative values are not allowed in libosmium.

Timestamps

Timestamps are stored internally as seconds since the epoch (1970-01-01). Although OSM was founded much later, timestamps are not checked. Libosmium uses a few special values here. Time 0 is the “unknown” value, time 1 is understood to be “before any other time value” and 2^32-1 is understood to be “after any other time value”.

Locations

Locations are given in WGS84 longitude and latitude. Both libosmium and the OSM database store the coordinates internally as signed 32bit integers (degrees multiplied by 10^7). This integer range is somewhat larger than the -180° to 180° longitude and -90° to 90° latitude ranges (it reaches about ±214.7°). Values outside the valid range, but inside the range of signed 32bit integers, are possible, and historic OSM data contains such values. Use the Location::valid() function to check whether a location is in the proper range.

Strings and UTF-8

OSM strings use UTF-8 encoding, but a lot of the libosmium code doesn’t care about that and doesn’t check that a string is valid UTF-8. This is mostly for performance reasons, but it could also allow other character sets in non-OSM uses of the library.

Historically the OSM database sometimes contained non-UTF-8 strings. This should have all been fixed by now.

These parts of the library don’t care about string encoding:

These parts of the library do care about string encoding:

Strings and control characters

OSM strings (user names, tag keys and values, and roles) can not contain certain control characters. The reason is that those control characters can’t be expressed in XML. (XXX More details needed.)

Strings in OSM can only have a maximum length of 256 Unicode characters. Libosmium's input and output routines allow any length up to 2^16 bytes. (XXX More details needed.)

17. Run-time Configuration

Osmium reads some settings from environment variables. This allows you to set configuration options for the library at run-time without any support from the application using the library. Setting these variables is usually not needed in normal operations but could be useful when debugging or tweaking performance.

OSMIUM_POOL_THREADS

The number of threads in the thread pool used for certain input/output operations.

If this is a negative number, it will be set to the actual number of cores on the system plus the given number, i.e. it will leave a number of cores unused. In all cases the minimum number of threads in the pool is 1.

Default: -2
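The rule for negative values is simple arithmetic. A sketch of it as a hypothetical helper (not libosmium's own code, which may treat 0 or very large values differently): with 8 cores, the default of -2 gives 6 threads.

```cpp
#include <algorithm>

// Hypothetical helper mirroring the rule described above.
// Negative settings mean "number of cores plus setting"; the result is at least 1.
int effective_pool_threads(int setting, int cores) {
    const int threads = setting < 0 ? cores + setting : setting;
    return std::max(threads, 1);
}
```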

OSMIUM_USE_POOL_THREADS_FOR_PBF_PARSING

Normally PBF parsing will use the thread pool. You can disable this by setting this variable to false.

Default: true

Queue Sizes

The following environment variables can be used to change the queue sizes used for file IO:

Smaller queue sizes mean that potentially less memory is used, but it also means that the work can’t be parallelized as effectively.

The minimum value for all queue sizes is 2. When set to 0, the default is used.

OSMIUM_CLEAN_PAGE_CACHE_AFTER_READ

Since version 2.17.0, Osmium tells the kernel (using fadvise) when reading files that it can remove pages that are no longer needed from the buffer cache. This is usually beneficial, because the memory can be used for something else. But if you are reading the same OSM file multiple times at the same time or in short succession, it might be better to keep those pages in the cache.

Since 2.17.1 you can set the environment variable OSMIUM_CLEAN_PAGE_CACHE_AFTER_READ to no and Osmium will not call fadvise. Set it to yes or anything else (or not set it at all) to get the default behaviour.

18. Changes from old versions of Osmium

This version has some substantial changes from the “old Osmium” available from https://github.com/joto/osmium, and users of the “old Osmium” will have to rewrite their code. Use the examples provided in the “examples” directory or in the osmium-contrib repository to get an idea of what needs changing. These examples are often similar to the examples provided with the old Osmium, so they should give you an idea of how your code has to change.

Here are some of the more important changes: