
Libosmium Manual

1. Introduction

The OpenStreetMap project is growing at an enormous rate. Working with the OSM data becomes increasingly difficult, because there is just so much of it and because it gets more complex all the time.

Osmium was developed as an answer to this challenge. After years of developing software to work with OSM data in many programming languages like Perl, Ruby, Java and even XSLT, it became evident that something more was needed to work efficiently with these huge amounts of data. Processing speed was, of course, one big issue; the other is available memory. Data processing tasks can be much faster if their working set fits into memory, so it makes sense to think about this. Because Osmium is a C++ library, it can make very efficient use of the main memory on your computer. Primitive objects such as integers and doubles, but also complex objects, need only as much memory as is really necessary. If the data structures are chosen carefully, there is often little management overhead.

Osmium has been in continuous development since it was born in October 2010, and it has changed considerably over time. While the basic premise, to write a low-level efficient OSM library, is still true, it has become more and more powerful and at the same time easier to use. Osmium has been in production use nearly from day one; some parts of it were taken from earlier production code. Osmium is not an academic exercise: it is used in practice and has shown its power many times. And while C++ might not be the easiest programming language to learn and Osmium might not be the easiest library to use, we try to make it as simple as possible to work with, as long as this doesn’t compromise efficiency too much.

Header-only Library

Osmium is a header-only library, so there is nothing to compile to build it. Just include the header files you need.

The osmium Namespace

Everything in the Osmium library is in the osmium namespace or in sub-namespaces. You’ll likely encounter the osmium::io namespace for everything related to file input and output and the osmium::geom namespace for geometry-related functionality, but there are some more.

Do not directly use anything in any sub-namespace called detail. Those classes and functions are for internal use only.

Code in any experimental sub-namespace is experimental and might be removed or changed without notice.

License

This manual is available under the Creative Commons Attribution-ShareAlike License version 4.0.

The Osmium Library is available under the very liberal Boost Software License:

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the “Software”) to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

2. Dependencies

Different parts of Libosmium have different dependencies. You do not need to install all of them, just those that you need for whatever you are doing with Libosmium. But for a beginner it is not always easy to see which dependencies are needed and which aren’t. This manual differentiates between important dependencies and extra dependencies to help you out. You should at least install the important dependencies when starting to experiment with Libosmium, but feel free to install all dependencies. Whatever is not needed will not be used anyway, so it will not slow down your program or make the binaries bigger.

In addition to the programs listed here, you’ll need a C++ compiler which supports C++11. Clang 3.4 or later and GCC 4.8 or later are known to work.

Installing dependencies on Linux

Debian/Ubuntu

You can install all important dependencies with:

sudo apt-get install cmake cmake-curses-gui make \
    libexpat1-dev zlib1g-dev libbz2-dev

and all extra dependencies with:

sudo apt-get install libsparsehash-dev libboost-dev \
    libgdal-dev libproj-dev doxygen graphviz

Arch Linux

You can install all important dependencies with:

sudo pacman -Suy protobuf boost-libs zlib expat cmake make bzip2

and all extra dependencies with:

sudo pacman -Suy sparsehash boost gdal proj doxygen

Important dependencies

CMake and Make

To build the tests, examples, etc. you need the CMake build system. Programs using Libosmium can, of course, be built with any build system you like, but the Libosmium repository as well as many projects based on Libosmium use it.

CMake has an optional curses-based configuration tool called ccmake. It is recommended that you install this also.

CMake usually generates a Makefile for Make, which you will also need.

Google Protocol Buffers (until version 2.2)

Not needed any more from version 2.3.0 onwards

Google Protocol Buffers in at least version 2.4.0 is needed for reading and writing OSM PBF files.

OSMPBF (until version 2.2)

Not needed any more from version 2.3.0 onwards

The OSMPBF library is needed for reading and writing OSM PBF files.

Protozero (since version 2.3.0)

The Protozero header-only library is needed for reading and writing OSM PBF files. A copy of this library is included in the libosmium repository but not installed by default. Either use the packages of your distribution, install it from GitHub, or use the INSTALL_PROTOZERO option of the libosmium CMake configuration to install the bundled version.

Utfcpp

The utf8-cpp library is needed for the OPL output format. A copy of this library is included in the libosmium repository but not installed by default. Either use the packages of your distribution, install it from the source, or use the INSTALL_UTFCPP option of the libosmium CMake configuration to install the bundled version.

Expat

Expat is needed for parsing OSM XML files.

ZLib

zlib is needed for reading and writing OSM PBF files and for GZip support when reading and writing XML files.

bz2lib

bz2lib is needed for BZip2 support when reading and writing OSM XML files.

Boost

Boost Iterator is used for Tag filters and for the Object Pointer Collection. The CRC32 checksum implementation from Boost is needed for calculating checksums over OSM objects. Libosmium versions before 2.6.1 also needed Boost for writing PBF files.

You need at least Boost version 1.55.

Extra dependencies

Google Sparsehash

Google Sparsehash (http://code.google.com/p/google-sparsehash/) is needed for the sparse-mem-table index map, often used as a node location store.

Boost Program Options (until version 2.7.2)

Boost Program Options is needed for parsing command line options in some examples.

GDAL/OGR

GDAL/OGR is needed if you want to convert OSM geometries into OGR geometries.

To use, compile with what the command

gdal-config --cflags

returns and link with what

gdal-config --libs

returns.

GEOS

GEOS is needed if you want to convert OSM geometries into GEOS geometries. The GEOS support is deprecated and works only until GEOS 3.5. For details see this commit.

Proj.4

The Proj.4 library is needed if you want to project OSM coordinates into spatial reference systems other than Web Mercator (EPSG 3857, often named Google Mercator).

Doxygen

The Libosmium API documentation can be built using Doxygen. Usually you do not need to do this, because the API reference is available online. If you want to build it yourself, you need Graphviz in addition to Doxygen.

3. Building Libosmium

Libosmium is a header-only library, which means that you do not have to build anything. But you might want to build the tests, examples, benchmarks or the documentation. This chapter explains how to do that.

Before building you need to install all the dependencies.

CMake

Libosmium uses the CMake configuration system available on all major platforms. CMake will generate a configuration for a build system of your choice. On Linux and Mac OS/X this is usually GNU Make, on Windows Nmake or MSBuild.

Build types

CMake knows several different build types that result in the use of different compiler options and different build options (see below). By default the build type RelWithDebInfo (Release with debug info) will be used, but you can change this either by setting CMAKE_BUILD_TYPE in ccmake or on the command line:

cmake -DCMAKE_BUILD_TYPE=Dev

Here are the build types used for Libosmium:

CMAKE_BUILD_TYPE  Description
Debug             Debug mode, no optimizations.
Dev               For Libosmium developers. All build options are set to ON and very strict compiler warnings are enabled.
MinSizeRel        Release mode, optimize for small binary.
RelWithDebInfo    Release mode with debug information compiled in. Use this unless the binaries generated are too big for you.
Release           Release mode.

Build options

Depending on the build type (see above), different build options are ON or OFF. You can change the settings in ccmake or on the command line with something like

cmake -DBUILD_EXAMPLES=ON

etc.

Build option      Default                Description
BUILD_BENCHMARKS  OFF (ON in Dev build)  Build the benchmark programs. You only need this if you intend to run the benchmarks.
BUILD_DATA_TESTS  OFF (ON in Dev build)  Build the data tests. These tests need OSM test data from a different repository, so they are a bit more difficult to run. See chapter Running Tests for details.
BUILD_EXAMPLES    ON                     Build the examples in the examples directory.
BUILD_HEADERS     OFF (ON in Dev build)  Only interesting for Libosmium developers. This will build every Libosmium header file by itself to check if the include dependencies are all set correctly.
BUILD_TESTING     ON                     Build the unit tests. See chapter Running Tests for details.

Building on Linux and Mac OS/X

Linux: Osmium is developed on Linux and tested best on that system. Debian Jessie (testing) and current Ubuntu systems come with everything needed for Osmium. Debian wheezy (stable) and the Ubuntu LTS release 12.04 don’t have compilers current enough. If you are stuck on these systems, use a backported compiler.

Mac OSX: Osmium also works well on Mac OSX with the exception of the parts that need the mremap system call that is not available on Mac OSX.

First clone Libosmium from the git repository (or install it in some other way):

git clone https://github.com/osmcode/libosmium
cd libosmium

Then create a directory in which the build should happen. In this documentation we will use the directory build, but you can choose any other name. You can have several build directories at the same time with different build options and they will not interfere with each other.

mkdir build
cd build

Then call CMake to create an initial configuration:

cmake ..

CMake will check your system, determine locations of programs, include headers, libraries etc. It will also set some default build options. You can then call

ccmake ..

to enter a curses-based tool that allows you to edit any configuration setting. Use the cursor keys to choose any variable and press Enter to change it. Once you are done, press c to configure and handle any errors that might appear. You might have to do this step several times. Then press g to generate the configuration and exit the program. For more advanced usage info, see the ccmake help.

Now you can call

make

to complete the build.

For Mac users: If you have clang 3.2 or newer, use the system compiler. If not you have to build the compiler yourself. See the instructions on http://clang.llvm.org/ .

Building on Windows

You need a rather new Visual C++ compiler for this to work. Visual C++ 2013 (a.k.a 12.0) is not supported. You’ll need 2014 CTP or the 2015 Preview. This is due to the limited C++11 support in earlier versions of Visual C++.

The easiest way on Windows is to use the windows-builds repository.

When the pre-requisites (Visual Studio 2014/2015, git) are in place, it should not take more than these steps to compile libosmium:

git clone https://github.com/mapbox/windows-builds.git
cd windows-builds
settings.bat
scripts\build_libosmium_deps
scripts\package_libosmium_deps
scripts\build_libosmium vs

Building on 32bit architectures

Osmium works well on 64 bit machines, but on 32 bit machines there are some problems. Be aware that not everything will work on 32 bit architectures. This is mostly due to the 64 bit needed for node IDs. Also Osmium hasn’t been tested well on 32 bit systems. Here are some issues you might run into:

Please report any issues you have and we might be able to solve them.

Building the reference documentation

To build the documentation you’ll need Doxygen.

After configuring with CMake as described above, call

make doc

to create the reference documentation.

Installing Libosmium

Call make install in the build directory to install the library. By default, this will install the Osmium include files into /usr/local/include/.

The following external (header-only) libraries are included in the libosmium repository:

If you want (some of) those libraries to be installed along with libosmium itself when calling make install, you have to use the CMake options INSTALL_GDALCPP, INSTALL_PROTOZERO, and/or INSTALL_UTFCPP.

If something didn’t work

Here are some tips if your build failed:

Advanced CMake configuration

The following variables can be set in the CMake configuration to further change the build. Changes here are usually not necessary though:

Option                  Description
BENCHMARK               If BUILD_BENCHMARKS is ON, this variable contains the semicolon-separated list of all benchmarks that should be built. The prefix osmium_benchmark_ will be added to all executables.
EXAMPLES                If BUILD_EXAMPLES is ON, this variable contains the semicolon-separated list of all examples that should be built. The prefix osmium_ will be added to all executables.
OSMIUM_WARNING_OPTIONS  C++ compiler warning options used in Dev mode.

Running CPPCheck

To check for problems in the source code not detected by compilers, you can run the cppcheck command. If it is installed and CMake found it, you can call Make with the cppcheck target:

make cppcheck

This will check all .hpp and .cpp files and can take a while.

4. Running tests

Libosmium uses the Catch unit testing framework and CTest which is part of the CMake suite.

There are three kinds of tests: unit tests, data tests, and example tests. For the details see below.

Tests should never fail. If they do fail in your environment, please report this as a bug. Some tests will be disabled on some platforms if they are testing functionality that’s not available on that platform. Some tests will be disabled on your host if you don’t have the needed dependencies installed.

Running the tests

To run the tests, build the project as described in the Building Libosmium chapter and then run

ctest

which will run all the configured tests. You can run all tests matching a pattern with something like

ctest -R 'io_.*'

or exclude tests from being run with something like

ctest -E io_test_reader

If there is some problem you can enable verbose mode:

ctest -V

See the CTest documentation for more details.

Labels

CTest allows tests to be labeled to categorize them. All unit tests have the label unit and a label for their category (the directory under test/t). All data tests have the label data. In addition all tests are labeled as fast or slow. Fast tests don’t take a noticeable amount of time, slow tests do.

You can run all tests with labels matching a regular expression with -L. So to run only fast tests use

ctest -L fast

You can use

ctest --print-labels

to see all available labels.

Unit tests

Unit tests check small parts of Libosmium. They can be found in the directories under test/t. If you are installing Libosmium, you should probably run these tests to make sure Libosmium works in your environment.

Unit tests are enabled or disabled with the BUILD_TESTING CMake setting. Different tests have different dependencies and CMake will disable all tests that don’t have their dependencies met.

You can also run the unit tests manually without going through CTest. After building they are in the build/test directory. Call them with --help to see options.

Data tests

Data tests need external OSM test data to run. They are enabled or disabled with BUILD_DATA_TESTS, but you have to install the test data for them to work. In the same directory you installed Libosmium in, clone the osm-testdata repository:

git clone https://github.com/osmcode/osm-testdata

If you have put the test data somewhere else, you can use the OSM_TESTDATA variable in CMake to point to that directory.

The testdata-multipolygon test needs Spatialite and Ruby with the json gem installed. Those dependencies are currently not checked for in the CMake configuration.

Example tests

Some example programs come with tests. Those tests are under test/examples. They run the example programs with some arguments to check basic functionality. Currently these tests are very rudimentary.

5. Using Libosmium in your own projects

Libosmium is generally quite easy to use in your own projects. Just include the specific header files you need for your application and start using Libosmium functions. Because Libosmium is a header-only library, there is no Libosmium library to link with; you only link with the external libraries needed by the headers you include. There isn’t one include file for everything, but many include files each only bringing in some specific classes and functions. This way you are not paying for something you don’t use.

Read the manuals

Before you do anything else we recommend you at least skim the Libosmium concepts manual and this manual. This will give you an overview of what’s where and how Libosmium works.

Read the API reference

The API reference contains a documentation of every class and function in Libosmium. It will tell you which #include directive you need where.

Libosmium uses several other libraries for many of its functions and you have to figure out which libraries to link with when you include specific Libosmium header files. This is documented in the reference and there is a list below for your convenience.

CMake configuration

If you are using CMake to configure your project, using Libosmium is very easy, because complete configuration is available. Copy the file FindOsmium.cmake to your project:

cd your-project
mkdir -p cmake
cd cmake
wget https://github.com/osmcode/libosmium/raw/master/cmake/FindOsmium.cmake

and include it in your CMakeLists.txt:

list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
find_package(Osmium REQUIRED)

This will tell CMake to find the Libosmium includes on the build system during the configuration. You can check whether this was successful with something like:

if(NOT OSMIUM_FOUND)
    message(WARNING "Libosmium not found!\n")
endif()

If your code doesn’t work with older versions of Libosmium, you can tell CMake the minimum version number:

find_package(Osmium 2.10.2 REQUIRED)

You can add an optional list of components that should be found also. For example to look for the io and gdal components you extend the find_package command like this:

find_package(Osmium REQUIRED COMPONENTS io gdal)

FindOsmium knows about the following components:

After that add the include directories:

include_directories(${OSMIUM_INCLUDE_DIRS})

You can look at the CMake configuration in the Osmium Tool and Osmium Contrib repositories for some working examples.

Note that you should occasionally check whether you still have a current version of FindOsmium.cmake and update if necessary.

Libraries needed for specific functionality

Also see the dependencies chapter.

XML input

For XML input you need the Expat XML parser, for XML output no special XML library is needed. In any case you need threading enabled. If you want to read or write compressed XML files you need ZLib and BZ2lib.

PBF input and output

For PBF input and output you need several libraries and threading enabled.

For version 2.3.0 and above you don’t need much:

For versions up to 2.2 you need some more libraries:

GDAL/OGR

The GDAL/OGR library is needed when you want to convert OSM geometries into OGR geometries or report problems building multipolygons into OGR formats.

Proj.4

The Proj.4 library is only needed when you want to project OSM locations into arbitrary coordinate reference systems. If you only want to convert to Web Mercator, use osmium::geom::MercatorProjection instead and you don’t need an extra library.

Compiler options

You might have to set the C++ version using the compiler option

-std=c++11

When working with OSM data you often have very large files of several gigabytes. This can lead to problems on 32bit systems. Use the options

-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

for the compiler to make sure that large files work.
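Whether both settings took effect can be checked from inside the program. The following is a minimal sketch and not part of Libosmium: the helper name large_files_supported is made up for illustration, and the code assumes a POSIX system where off_t is declared in <sys/types.h>.

```cpp
#include <sys/types.h>  // off_t (POSIX)

// Fails to compile if the compiler is not running in C++11 mode or later.
static_assert(__cplusplus >= 201103L, "compile with -std=c++11 or later");

// Hypothetical helper: true if file offsets are at least 64 bits wide,
// which is what -D_FILE_OFFSET_BITS=64 asks for on 32bit systems.
constexpr bool large_files_supported() noexcept {
    return sizeof(off_t) >= 8;
}
```

Putting static_assert(large_files_supported(), "large file support missing") next to your own file handling code turns a forgotten option into a compile error instead of a runtime failure on big files.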

Sample Compilation String

g++ osm_processor.cpp -std=c++11 -lpthread -lz -lexpat -lbz2

6. Basic Types

All the types and classes described in this chapter are value types, i.e. they are small and can be copied around cheaply.

IDs

Typedef: osmium::object_id_type

Include: <osmium/osm/types.hpp>

For object IDs use the type osmium::object_id_type. It is a 64bit signed integer that can represent the more than 2 billion nodes we already have in OSM. While way and relation IDs could theoretically use a smaller ID type (signed 32 bit are currently enough), for consistency and to be future-proof, they will also use this type in most cases.

OSM objects always have positive IDs. But some software (such as JOSM) uses negative IDs for objects that have not yet been uploaded to the main OSM database. To support these use cases, the object_id_type is a signed integer.

Some parts of Osmium, notably the different index classes, can only work with positive IDs. In those cases the type osmium::unsigned_object_id_type is used. If you know that your data only contains positive IDs or only negative IDs, you can use the positive_id() member function of the osmium::OSMObject class to get IDs of that type. It will return the absolute value of the ID.

If your data contains a mix of positive and negative IDs, this simple approach will fail! In that case you have to use two indexes, one for the positive IDs and one for the negative IDs. The osmium::handler::NodeLocationsForWay class takes this approach.
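The two-index idea can be sketched with standard containers only. This is an illustration under stated assumptions, not the way osmium::handler::NodeLocationsForWay is implemented: the class name SignedIdIndex and its set() and get() functions are hypothetical, and the two type aliases merely stand in for the Osmium typedefs.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-ins for osmium::object_id_type and osmium::unsigned_object_id_type.
using object_id_type = std::int64_t;
using unsigned_object_id_type = std::uint64_t;

// Hypothetical index that keeps positive and negative IDs in separate maps,
// so that the IDs 17 and -17 never share a slot.
template <typename TValue>
class SignedIdIndex {
    std::unordered_map<unsigned_object_id_type, TValue> m_positive;
    std::unordered_map<unsigned_object_id_type, TValue> m_negative;

    // Absolute value, computed in unsigned arithmetic so that even the
    // smallest 64 bit value does not overflow.
    static unsigned_object_id_type absolute(object_id_type id) noexcept {
        const auto u = static_cast<unsigned_object_id_type>(id);
        return id < 0 ? unsigned_object_id_type{0} - u : u;
    }

public:
    void set(object_id_type id, const TValue& value) {
        (id < 0 ? m_negative : m_positive)[absolute(id)] = value;
    }

    // Returns a pointer to the stored value or nullptr if the ID is unknown.
    const TValue* get(object_id_type id) const {
        const auto& index = id < 0 ? m_negative : m_positive;
        const auto it = index.find(absolute(id));
        return it == index.end() ? nullptr : &it->second;
    }
};
```

With a single index keyed by the absolute value, setting ID -17 would silently overwrite the entry for ID 17. Here both entries survive side by side.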

Other Primitive Types

Include: <osmium/osm/types.hpp>

There are several other typedefs:

Type                 Description
object_version_type  type for OSM object version number
changeset_id_type    type for OSM changeset IDs
user_id_type         type for OSM user IDs
num_changes_type     type for number of changes in a changeset

All these types are currently 32bit integers. Version numbers, changeset IDs and User IDs are always positive (they start out with 1). The number of changes can be 0 or larger.

Locations

Class: osmium::Location

Include: <osmium/osm/location.hpp>

In Osmium all positions on Earth are stored in objects of the osmium::Location class. Coordinates are stored as 32 bit signed integers after multiplying the coordinates with osmium::coordinate_precision = 10,000,000. This means we can store coordinates with a resolution of better than one centimeter, good enough for OSM use. The main OSM database uses the same system. We do this to save memory: a 32 bit integer uses only 4 bytes, a double uses 8.

Coordinates are not checked when they are set.

To create a location:

osmium::Location location{9.3, 49.7};

or using integers:

osmium::Location location{93000000, 497000000};

Make sure you are using the right number type or you will get very wrong coordinates.
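The relation between the two number formats is plain fixed-point arithmetic: 9.3 degrees corresponds to the integer 93000000. The sketch below shows the conversion with the standard library only. The helper names to_fixed and to_degrees are made up for this illustration and are not Libosmium functions.

```cpp
#include <cmath>
#include <cstdint>

// The scale factor named in the text: 10,000,000 units per degree.
constexpr std::int32_t coordinate_precision = 10000000;

// Hypothetical helper: degrees as double -> internal integer format.
// Rounds to the nearest unit instead of truncating.
inline std::int32_t to_fixed(double degrees) {
    return static_cast<std::int32_t>(std::lround(degrees * coordinate_precision));
}

// Hypothetical helper: internal integer format -> degrees as double.
inline double to_degrees(std::int32_t fixed) {
    return static_cast<double>(fixed) / coordinate_precision;
}
```

Longitudes of up to 180 degrees become integers of up to 1,800,000,000, which still fits into a signed 32 bit integer. Passing an integer where degrees are expected, or the other way around, is off by a factor of ten million.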

You can also create an undefined location. This is used for instance for coordinates in ways that are not set yet:

osmium::Location location{};

In a boolean context an undefined location returns false, a defined true. So you can write something like:

if (location) {
    // location is defined here
}

You can get and set the coordinates using the internal (integer) format with the x() and y() member functions and the external (double) format with the lon() and lat() member functions.

The normal bounds for the longitude and latitude are -180 to 180 and -90 to 90, respectively. But in historic OSM data you can sometimes find locations outside these bounds. Call

location.valid()

to find out if a location is inside those bounds.

The lon() and lat() getter calls will throw an exception if the location is invalid or undefined.
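The difference between undefined, defined and valid can be modelled in a few lines. This sketch is a simplified stand-in and not the osmium::Location implementation: the class name SketchLocation is hypothetical, the choice of the largest 32 bit value as the "not set" marker is an assumption, and std::out_of_range is used here only as a placeholder exception type.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Assumption for this sketch: the largest 32 bit value marks "not set".
constexpr std::int32_t undefined_coordinate = std::numeric_limits<std::int32_t>::max();
constexpr std::int32_t precision = 10000000;

class SketchLocation {
    std::int32_t m_x = undefined_coordinate;
    std::int32_t m_y = undefined_coordinate;

public:
    SketchLocation() = default;  // undefined location
    SketchLocation(std::int32_t x, std::int32_t y) noexcept : m_x(x), m_y(y) {}

    // Defined: both coordinates have been set to something.
    explicit operator bool() const noexcept {
        return m_x != undefined_coordinate && m_y != undefined_coordinate;
    }

    // Valid: inside -180..180 and -90..90 degrees. The undefined marker
    // lies outside these bounds, so an undefined location is never valid.
    bool valid() const noexcept {
        return m_x >= -180 * precision && m_x <= 180 * precision &&
               m_y >= -90 * precision && m_y <= 90 * precision;
    }

    // The getters refuse to hand out coordinates that are out of bounds.
    double lon() const {
        if (!valid()) { throw std::out_of_range{"invalid location"}; }
        return static_cast<double>(m_x) / precision;
    }

    double lat() const {
        if (!valid()) { throw std::out_of_range{"invalid location"}; }
        return static_cast<double>(m_y) / precision;
    }
};
```

A location can therefore be defined but invalid, which is the case for the out-of-bounds coordinates found in historic data.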

Segments

Class: osmium::Segment

Include: <osmium/osm/segment.hpp>

Segments are the directed connection between two locations. They are not OSM objects but sometimes useful in algorithms.

Undirected Segments

Class: osmium::UndirectedSegment

Include: <osmium/osm/undirected_segment.hpp>

Undirected Segments are connections between two locations without a direction. They are not OSM objects but sometimes useful in algorithms.
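One common way to make a segment undirected is to store its end points in a fixed order, so that A-B and B-A compare equal. The sketch below illustrates that idea with stand-in types. It does not claim to mirror how osmium::Segment and osmium::UndirectedSegment are implemented, and all names in it (Point, DirectedSegment, UndirectedSegmentSketch, same) are hypothetical.

```cpp
#include <utility>

// Minimal stand-in for a location.
struct Point {
    int x;
    int y;
};

inline bool operator==(const Point& a, const Point& b) noexcept {
    return a.x == b.x && a.y == b.y;
}

// Orders points by x first, then by y.
inline bool operator<(const Point& a, const Point& b) noexcept {
    return a.x < b.x || (a.x == b.x && a.y < b.y);
}

// Directed: the order of the end points is part of the identity.
struct DirectedSegment {
    Point first;
    Point second;
    DirectedSegment(Point a, Point b) : first(a), second(b) {}
};

// Undirected: the end points are normalized so that first <= second.
struct UndirectedSegmentSketch {
    Point first;
    Point second;
    UndirectedSegmentSketch(Point a, Point b) : first(a), second(b) {
        if (second < first) {
            std::swap(first, second);
        }
    }
};

// Two segments are the same if both end points match in order.
template <typename TSegment>
bool same(const TSegment& a, const TSegment& b) noexcept {
    return a.first == b.first && a.second == b.second;
}
```

Normalizing once in the constructor keeps comparisons cheap, which matters when an algorithm sorts or deduplicates many segments.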

Boxes

Class: osmium::Box

Include: <osmium/osm/box.hpp>

A box is a rectangle described by the minimum and maximum longitude and latitude. It is used, for instance, in the header of OSM files and in changesets to describe the bounding box.

osmium::Box box;
box.extend(osmium::Location{3.2, 4.3});
box.extend({4.5, 7.2});
box.extend({3.3, 8.9});
std::cout << box;  // (3.2,4.3,4.5,8.9)

7. OSM Entities

Osmium works with the four basic types of OSM entities: Nodes, Ways, and Relations (which are all OSM Objects) and Changesets. In addition, Areas are supported. They are not native OSM objects, but they are treated almost like real OSM objects.

These OSM entities can not be created like any normal C++ object, but they need a buffer to live in. See the next chapter for details. Accessing existing OSM entities on the other hand is easy and straightforward.

OSM Objects

Class: osmium::OSMObject

Include: <osmium/osm/object.hpp>

The osmium::OSMObject class is the base class for nodes, ways, and relations. It has accessors for the usual OSM attributes:

osmium::OSMObject& obj = ...;
std::cout << "id=" << obj.id()
          << " version=" << obj.version()
          << " timestamp=" << obj.timestamp()
          << " visible=" << (obj.visible() ? "true" : "false")
          << " changeset=" << obj.changeset()
          << " uid=" << obj.uid()
          << " user=" << obj.user() << "\n";

The changeset() and uid() accessor functions return the IDs of the changeset that created this object version and the User ID of the user creating this version of the object, respectively. They do not link to an object of that type.

The visible flag will always be true for normal OSM data, but for history data or change files it shows whether an object version has been deleted.

In addition each object has a list of tags attached:

const osmium::TagList& tags = obj.tags();

You can iterate over all tags:

for (const auto& tag : obj.tags()) {
    std::cout << tag.key() << '=' << tag.value() << '\n';
}

Or you can find specific tags:

const char* highway = obj.tags().get_value_by_key("highway");  // nullptr if the key is not set
if (highway && !std::strcmp(highway, "primary")) {  // std::strcmp needs <cstring>
    // handle primary highways here
}

Nodes

Class: osmium::Node

Include: <osmium/osm/node.hpp>

A Node is a kind of OSMObject. In addition to the things you can do with any OSMObject, the Node has a Location.

const osmium::Node& node = ...;
double longitude = node.location().lon();

Ways

Classes: osmium::Way, osmium::NodeRef, osmium::WayNodeList

Include: <osmium/osm/way.hpp>

A Way is a kind of OSMObject. In addition to the things you can do with any OSMObject, a Way has a list of node references:

const osmium::Way& way = ...;
for (const osmium::NodeRef& nr : way.nodes()) {
    std::cout << "ref=" << nr.ref() << " location=" << nr.location() << '\n';
}

Relations

Classes: osmium::Relation, osmium::RelationMember, osmium::RelationMemberList

Include: <osmium/osm/relation.hpp>

A Relation is a kind of OSMObject. In addition to the things you can do with any OSMObject, a Relation has a list of members:

const osmium::Relation& relation = ...;
const osmium::RelationMemberList& rml = relation.members();
for (const osmium::RelationMember& rm : rml) {
    std::cout << rm.type() << rm.ref() << " (role=" << rm.role() << ")\n";
}

Areas

not yet documented

Changesets

Class: osmium::Changeset

Include: <osmium/osm/changeset.hpp>

Changesets contain the metadata for a set of changes to OSM data.

osmium::Changeset

8. Buffers

OSM entities have to be stored somewhere in memory. They are complex objects containing an arbitrary number of tags, relations can have any number of members, etc. If we handled those objects like any normal C++ object, creating them would take lots of small memory allocations and many pointer indirections to get at all the parts of the data. Instead OSM entities are created inside so-called buffers. Buffers can have a fixed size or grow as needed. New objects can be added at the end, and they are stored inside those buffers in a reasonably space-efficient manner while still being accessible easily and quickly.

Buffers can be moved around between different parts of your program and even between threads. The content of buffers can even be written to disk as it is and read back in and immediately used “as is” without any serialization or de-serialization step needed.

But all of this has one drawback: It is slightly more complicated to create those objects and they can not just be instantiated on the stack.

Buffers can not be copied, because it is unclear who would be responsible for the memory then. But they can be moved.
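The "move but never copy" rule is a standard C++11 pattern. The following sketch shows it on a made-up class called OwnedBuffer. It only illustrates the ownership idea and says nothing about how osmium::memory::Buffer is written internally.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical move-only buffer: exactly one object owns the memory.
class OwnedBuffer {
    std::unique_ptr<unsigned char[]> m_data;
    std::size_t m_capacity = 0;

public:
    explicit OwnedBuffer(std::size_t capacity)
        : m_data(new unsigned char[capacity]()), m_capacity(capacity) {}

    // Copying is disabled: it would be unclear who frees the memory.
    OwnedBuffer(const OwnedBuffer&) = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;

    // Moving hands the memory over and leaves the source empty.
    OwnedBuffer(OwnedBuffer&& other) noexcept
        : m_data(std::move(other.m_data)), m_capacity(other.m_capacity) {
        other.m_capacity = 0;
    }

    OwnedBuffer& operator=(OwnedBuffer&& other) noexcept {
        if (this != &other) {
            m_data = std::move(other.m_data);
            m_capacity = other.m_capacity;
            other.m_capacity = 0;
        }
        return *this;
    }

    std::size_t capacity() const noexcept { return m_capacity; }

    // True while this object still owns memory.
    explicit operator bool() const noexcept { return m_data != nullptr; }
};

static_assert(!std::is_copy_constructible<OwnedBuffer>::value,
              "buffers must not be copyable");
```

Passing such an object to another function or thread then takes an explicit std::move, which makes every transfer of ownership visible in the code.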

Creating a Buffer

Buffers exist in two different flavours, those with external memory management and those with internal memory management. If you already have some memory with data in it (for instance read from disk), you create a Buffer with external memory management. It is your job then to free the memory once the buffer isn’t used any more. If you don’t have some memory space already, you can create a Buffer object and have it manage the memory internally. It will dynamically allocate memory and free it again after use.

To create a buffer from existing memory you give the address and size to the constructor:

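const std::size_t buffer_size = 10240;
// The Buffer constructor expects a pointer to unsigned char (std::malloc needs <cstdlib>).
auto* mem = static_cast<unsigned char*>(std::malloc(buffer_size));
osmium::memory::Buffer buffer{mem, buffer_size};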

This will create an empty buffer with buffer_size bytes available for use.

If the new buffer already contains some data, you can add the number of bytes already in use as a third parameter to the constructor:

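auto* mem = static_cast<unsigned char*>(std::malloc(buffer_size));
// POSIX read() returns -1 on error; check for that before using num.
const ssize_t num = read(0, mem, buffer_size);
osmium::memory::Buffer buffer{mem, buffer_size, static_cast<std::size_t>(num)};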

To create a buffer with internal memory-management you construct it with the number of bytes it should have initially and a flag that tells Osmium whether it should automatically grow the buffer if it is needed:

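const std::size_t buffer_size = 10240;
// Either a buffer that grows automatically when it is full ...
osmium::memory::Buffer growing_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
// ... or one that keeps its initial size.
osmium::memory::Buffer fixed_buffer{buffer_size, osmium::memory::Buffer::auto_grow::no};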

Adding Items to the Buffer

You cannot create OSM objects on the stack, they always have to be stored in buffers. To create OSM objects special “builder” classes are used:

void add_tags(osmium::memory::Buffer& buffer, osmium::builder::Builder* builder) {
    osmium::builder::TagListBuilder tl_builder{buffer, builder};
    tl_builder.add_tag("amenity", "restaurant");
}

const int buffer_size = 10240;
osmium::memory::Buffer node_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
{
    osmium::builder::NodeBuilder builder{node_buffer};
    builder.add_user("foo");
    osmium::Node& obj = builder.object();
    obj.set_id(1);
    obj.set_version(1);
    obj.set_changeset(5);
    obj.set_uid(140);
    obj.set_timestamp("2016-01-05T01:22:45Z");
    obj.set_location(osmium::Location{9.0, 49.0});
    add_tags(node_buffer, &builder);
}
node_buffer.commit();
// do something with the buffer (e.g. write to file)

Building OSM entities and adding them to a buffer has some pitfalls. A buffer has to be aligned (padded with zeros) before committing. If you try to commit a buffer which is not aligned, your program will fail with Assertion 'buffer.is_aligned()' failed.

The addition of the attributes version, changeset, uid and timestamp may be omitted but you have to add the attribute user in order to have an aligned buffer.
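To make the alignment requirement a bit more concrete, here is a minimal sketch of how a length can be rounded up to the next alignment boundary and how the gap is filled with zeros. It does not use libosmium: the 8-byte boundary and the helper names padded_length and pad_with_zeros are assumptions chosen for illustration, and a plain std::vector stands in for the buffer memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed alignment boundary in bytes (illustrative; must be a power of two).
constexpr std::size_t align_bytes = 8;
static_assert((align_bytes & (align_bytes - 1)) == 0, "alignment must be a power of two");

// Round a length up to the next multiple of align_bytes.
constexpr std::size_t padded_length(std::size_t length) noexcept {
    return (length + align_bytes - 1) & ~(align_bytes - 1);
}

// Append zero bytes until the size of the data is a multiple of align_bytes.
void pad_with_zeros(std::vector<unsigned char>& data) {
    data.resize(padded_length(data.size()), 0);
    assert(data.size() % align_bytes == 0);
}
```

With these assumptions, 13 bytes of data are followed by three zero bytes so that the next item starts at offset 16. In real code the builders take care of the padding for you, as long as they are used as described here.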

If the object has references to other OSM objects (tags of an OSM object, node references of a way, members of a relation), you need additional builders for these reference lists. The destructor of one of these builders has to be called before another builder writes data to the buffer.

void build_way(osmium::memory::Buffer& buffer) {
    osmium::builder::WayBuilder way_builder{buffer};
    way_builder.object().set_id(1);
    // set attributes version, changeset, uid and timestamp (all optional)
    way_builder.add_user("foo");
    {
        osmium::builder::WayNodeListBuilder wnl_builder{buffer, &way_builder};
        wnl_builder.add_node_ref(osmium::NodeRef(1, osmium::Location()));
        wnl_builder.add_node_ref(osmium::NodeRef(2, osmium::Location()));
    }
    // add_tags() expects a pointer to the parent builder
    add_tags(buffer, &way_builder);
}

const int buffer_size = 10240;
osmium::memory::Buffer way_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
build_way(way_buffer);
way_buffer.commit();

This will create only a way, the nodes have to be created separately.

Building relations works similarly to building ways. You use an osmium::builder::RelationBuilder instead of a WayBuilder and an osmium::builder::RelationMemberListBuilder instead of a WayNodeListBuilder. The instance of RelationMemberListBuilder has to go out of scope before the TagListBuilder writes the tags to the buffer and vice versa.

Handling a Full Buffer

If a buffer becomes full, there are three different things that can happen:

If the buffer was created with auto_grow::yes, it will reserve more memory on the heap and double its size. This will happen without the client code noticing, but it will invalidate any pointer pointing into the buffer. This is the same behaviour a std::vector has so it should be familiar to C++ programmers.

If the buffer was created with auto_grow::no (or if it is a buffer with external memory management), the exception osmium::memory::BufferIsFull will be thrown. In this case you have to catch the exception, either grow the buffer or create a new one. If you grow the buffer you can keep going at the point where you left off. If you start a new one, the last object you were writing to the buffer when the exception was thrown was not committed and you have to write it again into the new buffer.

As a third option you can set a callback functor that will be called when the buffer is full. The functor takes a reference to the buffer as argument and returns void:

void full(osmium::memory::Buffer& buffer) {
    std::cout << "Buffer is full\n";
}

osmium::memory::Buffer buffer{buffer_size, osmium::memory::Buffer::auto_grow::no};
buffer.set_full_callback(full);
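The pointer invalidation mentioned for auto_grow::yes is easy to overlook. Because the text compares the behaviour to std::vector, the following sketch uses a std::vector as a stand-in for a growing buffer and shows the usual remedy: remember an offset from the start of the storage instead of a pointer. The helper names are hypothetical and nothing here is libosmium API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append a value and return its offset (index) from the start of the storage.
// Offsets stay meaningful when the storage grows; raw pointers do not.
std::size_t append(std::vector<int>& storage, int value) {
    storage.push_back(value);
    return storage.size() - 1;
}

// Look a value up again through its offset.
int value_at(const std::vector<int>& storage, std::size_t offset) {
    assert(offset < storage.size());
    return storage[offset];
}

// Add elements until the size exceeds twice the current capacity,
// which forces at least one reallocation.
void force_growth(std::vector<int>& storage) {
    const std::size_t target = storage.capacity() * 2 + 1;
    while (storage.size() < target) {
        storage.push_back(0);
    }
}
```

A pointer taken with &storage[0] before force_growth() must not be used afterwards, while the offset returned by append() still finds the same value. The same reasoning applies to pointers and references into a buffer that is allowed to grow.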

9. Input and Output

Libosmium can read several different OSM file formats.

Headers

Whenever you want to use Osmium to access OSM files you need to include the right header files and link your program to the right libraries. If you want to support all the different formats you add

#include <osmium/io/any_input.hpp>

and/or

#include <osmium/io/any_output.hpp>

to your C++ files. These headers will pull in all the file formats and all the compression types for input and output, respectively. Usually this is what you want to use. But if you are sure you don’t need all formats or if you don’t have all the libraries needed for all the formats, you can pick and choose formats and compression types.

If you only need some file formats, you can include any combinations of the following headers:

#include <osmium/io/pbf_input.hpp>
#include <osmium/io/xml_input.hpp>

#include <osmium/io/debug_output.hpp>
#include <osmium/io/opl_output.hpp>
#include <osmium/io/pbf_output.hpp>
#include <osmium/io/xml_output.hpp>

If you want compression support, you have to add the includes for the different compression algorithms:

#include <osmium/io/gzip_compression.hpp>
#include <osmium/io/bzip2_compression.hpp>

Or, if you want both anyway, you can just use the shortcut:

#include <osmium/io/any_compression.hpp>

Compression

If you want to use compression you have to include the right header files and link to the libz and libbz2 libraries, respectively.

File Formats

XML

For read support you need the expat parser library. Link with:

-lexpat

For write support no special library is needed.

PBF

To build with PBF support you have to compile with threads and need libz:

-pthread -lz

Note that in older versions of libosmium you needed to link with the protobuf and osmpbf libraries. They are not used any more. Instead the protozero header-only library is used. This library is included in the libosmium repository.

Reading and Writing OSM Files with Osmium

The osmium::io::File class

Before reading from or writing to an OSM file, you have to instantiate an object of class osmium::io::File. It encapsulates the file name as well as any information about the format of the file. In the simplest case the File class can derive the file format from the file name:

osmium::io::File input_file{"planet.osm.pbf"}; // PBF format
osmium::io::File input_file{"planet.osm.bz2"}; // XML with bzip2 compression
osmium::io::File input_file{"planet.osc.gz"};  // XML change file, gzip compression

The constructor of the File class has a second, optional argument giving the format of the file, which can be used if the format can’t be deduced from the file name. In the simplest form the format argument looks the same as the usual file suffixes:

osmium::io::File input_file{"somefile", "osm.bz2"};

This setting of the format is often needed when reading from STDIN or writing to STDOUT. Both an empty string and a single dash as filename signify STDIN/STDOUT:

osmium::io::File input_file{"-", "osm.bz2"};
osmium::io::File output_file{"", "pbf"};

The format string can also take optional arguments separated by commas.

osmium::io::File output_file{"out.osm.pbf", "pbf,pbf_dense_nodes=false"};
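To show how such a format string is structured, the following sketch splits it into the format name and its comma-separated options. This is not libosmium's parser; ParsedFormat and parse_format are hypothetical names, and treating an option without a value as "true" is an assumption made for this example.

```cpp
#include <map>
#include <sstream>
#include <string>

struct ParsedFormat {
    std::string format;                          // e.g. "pbf"
    std::map<std::string, std::string> options;  // e.g. pbf_dense_nodes -> false
};

// Split "format,key=value,key=value" into its parts.
ParsedFormat parse_format(const std::string& spec) {
    ParsedFormat result;
    std::istringstream stream{spec};
    std::string part;
    bool first = true;
    while (std::getline(stream, part, ',')) {
        if (first) {
            // the first part is the format itself
            result.format = part;
            first = false;
            continue;
        }
        const auto pos = part.find('=');
        if (pos == std::string::npos) {
            result.options[part] = "true";  // assumption: bare key means "true"
        } else {
            result.options[part.substr(0, pos)] = part.substr(pos + 1);
        }
    }
    return result;
}
```

For the string "pbf,pbf_dense_nodes=false" this yields the format "pbf" and one option, pbf_dense_nodes, with the value "false".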

It is also possible to change the format after creating a File object using the accessor functions:

osmium::io::File input_file{"some_file.osm"};
input_file.set_format(osmium::io::file_format::pbf);

Reading a File

After you have a File object you can instantiate a Reader object to open the file for reading:

osmium::io::File input_file{"input.osm.pbf"};
osmium::io::Reader reader{input_file};

As a shortcut you can just give a file name to the Reader if you are relying on the automatic file format detection and don’t want to do any special format handling:

osmium::io::Reader reader{"input.osm.pbf"};

Optionally you can add a second argument to the Reader constructor giving the types of OSM entities you are interested in. Sometimes you only need, say, the ways from the file, but not the nodes and relations. If you tell the Reader about it, it might be able to read the file more efficiently by skipping those parts you are not interested in:

osmium::io::Reader reader{"input.osm.pbf", osmium::osm_entity_bits::way};

You can set the following flags:

Flag                                 Description
osmium::osm_entity_bits::nothing     Do not read any entities at all (useful if you are only interested in the file header)
osmium::osm_entity_bits::node        Read nodes
osmium::osm_entity_bits::way         Read ways
osmium::osm_entity_bits::relation    Read relations
osmium::osm_entity_bits::changeset   Read changesets
osmium::osm_entity_bits::all         Read all of the above

You can also “or” several flags together if needed.

You can get the header information from the file using the header() function:

osmium::io::Header header = reader.header();

You read the OSM entities from the file using the read() function, which returns a buffer with the data:

while (osmium::memory::Buffer buffer = reader.read()) {
    ...
}

At the end of the file an invalid buffer is returned which evaluates to false in boolean context.

You can close the file at any time. It will also be automatically closed when the Reader object goes out of scope.

reader.close();

In most cases you do not want to work with the buffers, but with the OSM entities within them. See the [Iterators] chapter and the [Handlers] chapter for more convenient methods of working with open files.

The Header

Format   Option            Default          Description
all      generator         Osmium/VERSION   The program that generated this file
XML      xml_josm_upload   not set          Set upload attribute in header to given value (true or false) for use in JOSM

Writing a File

To create an OSM file, create an instance of the osmium::io::Writer class and move buffers with OSM objects into its write() function:

osmium::memory::Buffer buffer;
// Add objects to the buffer (see above) or read it from
// an input file using osmium::io::Reader::read().
osmium::io::File output_file{"output.osm.pbf"};
osmium::io::Writer writer{output_file};
writer.write(std::move(buffer));
writer.close();

As a shortcut, you can directly give the filename to the Writer if you are relying on the automatic file format detection (the same as for Readers) and don’t need any special handling.

osmium::io::Writer writer{"output.osm.pbf"};

You can give additional arguments to the constructor of the Writer class, for instance a customized header or to allow writing over an existing file:

osmium::io::Header header;
header.set("generator", "FastOSMTool");
osmium::io::Writer writer{"output.osm.pbf",
                          header,
                          osmium::io::overwrite::allow,
                          osmium::io::fsync::yes};

10. Iterators

Every C++ programmer is familiar with iterators and their flexibility. There is no reason we couldn’t take advantage of that and of the many algorithms supplied by the STL. So libosmium supports several different kinds of iterators to access OSM data. You can iterate over all OSM objects in a buffer, or over all objects from a data source (usually a file), or over a bunch of pointers to OSM objects, and there are output iterators to write to files, too. All these different iterators can be used consistently and easily from your code without having to know much about what’s underneath. And because they work just like STL iterators do, you can use all the algorithms from the STL.

Some of these iterators will keep track of underlying buffers and make sure the buffers and the data in them stay around as long as there is an iterator pointing to it. This adds some overhead but makes using the data much easier.

Accessing Data in Buffers

Buffers containing OSM entities support the usual begin(), end(), cbegin(), and cend() functions:

osmium::memory::Buffer buffer = ...;

auto it = buffer.begin();
auto end = buffer.end();

for (; it != end; ++it) {
    std::cout << it->type() << "\n";
}

Of course you can also use the C++11 for loop:

for (auto& item : buffer) {
    ...
}

Accessing Data from Files

osmium::io::Reader reader{"input.osm"};
osmium::io::InputIterator<osmium::io::Reader> in{reader};
osmium::io::InputIterator<osmium::io::Reader> end;

11. Handlers

If you process OSM data with libosmium to do something (e.g. convert to a different format, import into a database, build a routing graph), you will usually create one or more handlers.

Handlers are created by deriving a class from osmium::handler::Handler which defines methods for all OSM object types, i.e. a method node(const osmium::Node&) for nodes, a method way(const osmium::Way&) for ways etc. You have to implement the methods for the object types you want to process. Libosmium will read the data, feed it object by object into the handler and you can do there whatever you want. Your handler may have temporary storage, e.g. if you want to sum up the length of all roads in an OSM file.

#include <iostream>

#include <osmium/handler.hpp>
#include <osmium/io/any_input.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/way.hpp>
#include <osmium/visitor.hpp>

class MyHandler : public osmium::handler::Handler {
public:
    void way(const osmium::Way& way) {
        std::cout << "way " << way.id() << '\n';
        for (const osmium::Tag& t : way.tags()) {
            std::cout << t.key() << "=" << t.value() << '\n';
        }
    }

    void node(const osmium::Node& node) {
        std::cout << "node " << node.id() << '\n';
    }
};

int main() {
    auto otypes = osmium::osm_entity_bits::node | osmium::osm_entity_bits::way;
    osmium::io::Reader reader{"input.osm.pbf", otypes};
    MyHandler handler;
    osmium::apply(reader, handler);
    reader.close();
}

The example above reads an OSM file and writes some information about nodes and ways to STDOUT.

You can define multiple handlers; Osmium will feed each object into the handlers one after another. Just add the additional handlers to osmium::apply(), which accepts a reader and one or more handlers.

Multiple handlers are necessary if you want to access the locations of the nodes referenced by a way, because the way itself only contains references to the nodes. This requires a special handler that can look up the location of a node by its ID. The best index type for this NodeLocationsForWays handler depends on the size of the file, the available memory and the operating system. See the Osmium Concepts Manual for details.

#include <iostream>

#include <osmium/handler.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/way.hpp>
#include <osmium/io/any_input.hpp>
#include <osmium/visitor.hpp>
#include <osmium/index/map/sparse_mem_array.hpp>
#include <osmium/handler/node_locations_for_ways.hpp>

class MyHandler : public osmium::handler::Handler {
public:
    void way(const osmium::Way& way) {
        std::cout << "way " << way.id() << '\n';
        for (const auto& n : way.nodes()) {
            std::cout << n.ref() << ": " << n.lon() << ", " << n.lat() << '\n';
        }
    }
};

int main() {
    auto otypes = osmium::osm_entity_bits::node | osmium::osm_entity_bits::way;
    osmium::io::Reader reader{"input.osm.pbf", otypes};

    namespace map = osmium::index::map;
    using index_type = map::SparseMemArray<osmium::unsigned_object_id_type, osmium::Location>;
    using location_handler_type = osmium::handler::NodeLocationsForWays<index_type>;

    index_type index;
    location_handler_type location_handler{index};

    MyHandler handler;
    osmium::apply(reader, location_handler, handler);
    reader.close();
}

You can find many examples of how to use a handler in the examples of the libosmium repository and in the osmium-contrib repository.

12. Collectors

If you write your own handler and get a relation callback, you cannot directly access the members of the relation. The only information you can get is the type, ID and role of the members.

To access the tags and geometry of relations members, you have to write your own collector. (If you just want to read multipolygons and boundary relations, head over to the subsection MultipolygonCollector of this manual because Osmium provides a special MultipolygonCollector for that purpose.)

Write Your Own Collector

A collector is a class which is derived from osmium::relations::Collector.

By default, it collects all relations and all their members. You can change this behaviour by overriding methods of the Collector class. See the class documentation of the Collector class for details.

Your collector has to implement at least void complete_relation(osmium::relations::RelationMeta& relation_meta). It is a callback function that is called once a relation and all its members are complete. In it you can do whatever you want with the relation. The following example shows how to access the relation, its members and their tags and references.

void MyRelCollector::complete_relation(osmium::relations::RelationMeta& relation_meta) {
    const osmium::Relation& relation = this->get_relation(relation_meta);
    std::cout << "Working on relation "
              << relation.id()
              << " which has following tags:\n";

    for (const osmium::Tag& tag : relation.tags()) {
        std::cout << tag.key() << " = " << tag.value() << '\n';
    }

    for (const auto& member : relation.members()) {
        switch (member.type()) {
            case osmium::item_type::node : {
                std::cout << "member node "
                          << member.ref()
                          << " with role "
                          << member.role()
                          << '\n';
                const auto& node = static_cast<const osmium::Node&>
                    (this->get_member(this->get_offset(member.type(), member.ref())));
                std::cout << "at "
                          << node.location()
                          << '\n';
                }
                break;
            case osmium::item_type::way :
                std::cout << "member way "
                          << member.ref()
                          << " with role "
                          << member.role()
                          << '\n';
                // accessing tags, node references and node locations
                // works like shown above with nodes, just cast to a
                // different class
                break;
            case osmium::item_type::relation :
                std::cout << "member relation "
                          << member.ref()
                          << " with role "
                          << member.role()
                          << '\n';
                break;
        }
    }
}

Incomplete Relations

If you work with extracts of the planet, your extract will usually not have all relations complete, i.e. some members of some relations are missing because they are located beyond the boundary of the extract. These relations will not be handled by complete_relation(osmium::relations::RelationMeta&). If you still want to work with these relations, you can add a method to your collector which handles these relations after everything has been finished. You have to call this method manually.

void MyRelCollector::handle_incomplete_relations() {
    for (const auto* relation : this->get_incomplete_relations()) {
        // relation is a pointer, so use -> to access its members
        for (const auto& member : relation->members()) {
            std::pair<bool, size_t> offset_pair =
                get_availability_and_offset(member.type(), member.ref());
            if (offset_pair.first) {
                // do what you would do with a relation
            }
        }
    }
}

The example above avoids a common pitfall when working with incomplete relations. Instead of using osmium::relations::Collector::get_offset(osmium::item_type, osmium::object_id_type), you should use std::pair<bool, size_t> osmium::relations::Collector::get_availability_and_offset(osmium::item_type, osmium::object_id_type) to ensure that the member is available. Otherwise your program will fail with Assertion 'range.begin()->is_available()' failed.

MultipolygonCollector

Multipolygons are a type of relation in OpenStreetMap (they are tagged with type=multipolygon) used to model areas with inner rings and areas with multiple outer rings. Osmium provides a collector for multipolygons and boundary relations (which work like multipolygons but are tagged with type=boundary) called osmium::area::MultipolygonCollector.

There are lots of examples how to use a MultipolygonCollector, e.g.

13. Creating Geometries

OSM objects describe where something is and what it is. The what is described by the tags, the where, the “geometry” is “encoded” in the locations (longitude and latitude) of the nodes for simple points, in the locations of the nodes in a way forming a linestring (or, possibly, a polygon if the first and last node are the same), and more complex geometrical objects (such as multipolygons) if relations are involved.

For many use cases the geometry of an OSM object (or OSM objects) is important. After all, if you want to render a map, you need the geometry of everything in it. That is why libosmium has many functions to create the different kinds of geometries from OSM objects. The whole exercise is made more difficult, because there are many different ways to represent geometries in C++ programs used by different software packages. Osmium knows about several of them.

Example: Creating a point geometry from a node

As an introductory example, we’ll look at how a point geometry can be created from a node.

#include <osmium/geom/wkt.hpp>
const osmium::Node& node = ...; // got this from somewhere

osmium::geom::WKTFactory<> factory;
std::string wkt = factory.create_point(node);

First you need a geometry factory. Those factories know how to convert OSM objects into different kinds of geometry representations. The WKTFactory creates geometries in the WKT (Well Known Text) format which is just a string like POINT(3.567 25.642).

Then you use the factory to create the point from the node and you are done.

Geometry types

Libosmium can create the following geometry types:

Geometry type   From these objects        With function
Point           Node, NodeRef, Location   create_point()
LineString      Way, WayNodeList          create_linestring()
Polygon         Way, WayNodeList          create_polygon()
MultiPolygon    Area                      create_multipolygon()

Notes:

Factories

Libosmium supports the following factories for different geometry formats:

WKT

Well-known text is a simple text based format with geometries that look like POINT(2.2452 41.3124) or LINESTRING(1.1554 2.5215, 1.1453 2.5663). They can be created like this:

#include <osmium/geom/wkt.hpp>
osmium::geom::WKTFactory<> factory;

The factory constructor takes an optional integer argument with the precision (number of digits after the decimal point), the default is 7, which is enough for OSM.

osmium::geom::WKTFactory<> factory{3}; // three digits after decimal point

All creation functions return a std::string:

std::string point = factory.create_point(node);
std::string line  = factory.create_linestring(way);
...
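To illustrate what the precision argument means for the resulting string, here is a small sketch that formats a point in WKT by hand. It does not use libosmium; make_wkt_point is a hypothetical helper, and the library's own output may differ in details such as the handling of trailing zeros.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format a point as WKT with a fixed number of digits after the decimal point.
// Coordinates are separated by a space, longitude (x) first.
std::string make_wkt_point(double lon, double lat, int precision = 7) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision)
        << "POINT(" << lon << ' ' << lat << ')';
    return out.str();
}
```

Calling make_wkt_point(3.567, 25.642, 3) gives POINT(3.567 25.642). A lower precision makes the strings shorter at the cost of coordinate accuracy.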

WKB

Well-known binary is a simple binary format. Create the factory like this:

#include <osmium/geom/wkb.hpp>
osmium::geom::WKBFactory<> factory;

The factory constructor takes two optional arguments. The first decides whether you want WKB (wkb_type::wkb, default) or Extended WKB (EWKB, wkb_type::ewkb), the second decides whether to output in raw binary (out_type::binary, default) or in hex encoded binary (out_type::hex).

To create extended WKB in hex format as used by PostGIS for example:

osmium::geom::WKBFactory<> factory{osmium::geom::wkb_type::ewkb,
                                   osmium::geom::out_type::hex};

All creation functions return a std::string:

std::string point = factory.create_point(node);
std::string line  = factory.create_linestring(way);
...
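If you have never looked inside a WKB string, the following sketch may help. It builds the hex encoded form of a plain (non-extended) WKB point by hand: one byte for the byte order, a 32 bit geometry type and two 64 bit doubles. It is independent of libosmium, the function names are hypothetical, and it always writes little-endian output.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Append the `count` low-order bytes of `value` in little-endian order as hex digits.
void append_hex_le(std::string& out, std::uint64_t value, int count) {
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < count; ++i) {
        const unsigned byte = static_cast<unsigned>((value >> (8 * i)) & 0xffU);
        out += digits[byte >> 4];
        out += digits[byte & 0x0fU];
    }
}

// Reinterpret a double as its 64 bit pattern.
std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64 bit doubles");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof(bits));
    return bits;
}

// Hex encoded WKB for a 2D point: byte order marker, geometry type, x, y.
std::string wkb_point_hex(double x, double y) {
    std::string out;
    append_hex_le(out, 1, 1);  // 1 = little-endian byte order
    append_hex_le(out, 1, 4);  // 1 = geometry type Point
    append_hex_le(out, double_bits(x), 8);
    append_hex_le(out, double_bits(y), 8);
    return out;
}
```

The result for a point is always 42 hex digits long (21 bytes). The extended variant (EWKB) additionally carries an SRID, which this sketch leaves out.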

GEOS

The functions for creating GEOS geometries are deprecated and work only up to GEOS 3.5. If you want to use them with later versions, contact the libosmium developers by opening an issue on the GitHub repository.

GEOS is an Open Source library with powerful operations to work with and modify geometries. To use it from libosmium:

#include <osmium/geom/geos.hpp>
osmium::geom::GEOSFactory<> factory;

You can also set the SRID used by GEOS (default is -1, unset):

osmium::geom::GEOSFactory<> factory{4326};

If this is not flexible enough for your case, you can also create a GEOS factory yourself and then the libosmium factory from it:

geos::geom::PrecisionModel geos_pm;
geos::geom::GeometryFactory geos_factory{&geos_pm, 4326};
osmium::geom::GEOSFactory<> factory{geos_factory};

Note: GEOS keeps a pointer to the factory it was created from in each geometry. You have to make sure the factory is not destroyed before all the geometries created from it have been destroyed!

All creation functions return a unique_ptr to the GEOS geometry:

std::unique_ptr<geos::geom::Point> point = factory.create_point(node);
std::unique_ptr<geos::geom::LineString> line = factory.create_linestring(way);
...

GDAL/OGR

The GDAL/OGR library is very popular. Almost all Open Source GIS tools use it in one form or another to read or write geometries from/to files or databases in dozens of different formats (Shapefiles, Spatialite, PostGIS, etc.). You can use it from libosmium, too:

#include <osmium/geom/ogr.hpp>
osmium::geom::OGRFactory<> factory;

The factory constructor doesn’t take any special arguments.

All creation functions return a unique_ptr to the OGR geometry:

std::unique_ptr<OGRPoint> point = factory.create_point(node);
std::unique_ptr<OGRLineString> line = factory.create_linestring(way);
...

GeoJSON

The GeoJSON format describes how to encode geometries in JSON.

Libosmium has two different GeoJSON factories. One creates normal std::strings with the JSON data. The other uses the RapidJSON library. Both only create the geometry portion of the JSON structure for you. You have to add the feature structure with the properties yourself as needed for your use case.

The GeoJSONFactory takes an optional precision as argument like the WKT constructor:

#include <osmium/geom/geojson.hpp>

osmium::geom::GeoJSONFactory<> factory{6};
std::string point = factory.create_point(node);

The RapidGeoJSONFactory takes a form of rapidjson::Writer as argument. Here is an example:

#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <osmium/geom/rapid_geojson.hpp>

typedef rapidjson::Writer<rapidjson::StringBuffer> writer_type;
rapidjson::StringBuffer stream;
writer_type writer{stream};
osmium::geom::RapidGeoJSONFactory<writer_type> factory{writer};

Please see the RapidJSON documentation for details about the Writer class.

Using projections

Before creating the geometries, libosmium can convert the coordinates from the OSM objects into different coordinate systems using a projection. This projection is given as a template parameter to the factory class:

osmium::geom::WKTFactory<> factory; // default identity projection (EPSG 4326)

or

osmium::geom::WKTFactory<osmium::geom::IdentityProjection> factory; // same

Often used is the Web Mercator projection (EPSG 3857):

#include <osmium/geom/mercator_projection.hpp>
osmium::geom::WKTFactory<osmium::geom::MercatorProjection> factory;
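For readers who want to see what the Web Mercator projection does to a coordinate, here is a sketch of the standard forward formulas for the spherical Mercator projection used by EPSG 3857. It is written from the well-known formulas, not copied from libosmium; the names Coordinates and lonlat_to_web_mercator are hypothetical.

```cpp
#include <cmath>

struct Coordinates {
    double x;
    double y;
};

constexpr double earth_radius_m = 6378137.0;  // WGS84 semi-major axis in metres
constexpr double pi = 3.14159265358979323846;

inline double deg_to_rad(double degrees) {
    return degrees * pi / 180.0;
}

// Forward spherical Mercator: longitude/latitude in degrees to x/y in metres.
Coordinates lonlat_to_web_mercator(double lon, double lat) {
    const double x = earth_radius_m * deg_to_rad(lon);
    const double y = earth_radius_m * std::log(std::tan(pi / 4.0 + deg_to_rad(lat) / 2.0));
    return Coordinates{x, y};
}
```

Longitude 180° maps to about 20037508.34 m. The projection is not defined at the poles, which is why Web Mercator maps are usually cut off at roughly 85.0511° latitude, where y reaches the same value.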

The identity and Mercator projection are handled internally in libosmium. But you can also use any projection implemented by the Proj.4 library:

#include <osmium/geom/projection.hpp>

osmium::geom::Projection projection{"+init=epsg:31467"}; // Gauss-Krueger GK3
osmium::geom::WKTFactory<osmium::geom::Projection> factory{projection};

You need to link with -lproj if you use this library. See the documentation of the Proj.4 library on the different ways to initialize a projection using a projection string.

Exceptions

Factory functions throw osmium::geometry_error exceptions if something went wrong creating a geometry.

Implementing your own factory

The geometry formats already implemented should cover a lot of uses, but if you need to implement your own format factory, you can do so based on the code in libosmium. You have to implement your own SomeFormatFactoryImpl class that implements the make_point(), linestring_start(), linestring_add_location(), linestring_finish(), polygon_start(), polygon_add_location(), polygon_finish(), multipolygon_start(), multipolygon_polygon_start(), multipolygon_polygon_finish(), multipolygon_outer_ring_start(), multipolygon_outer_ring_finish(), multipolygon_inner_ring_start(), multipolygon_inner_ring_finish(), multipolygon_add_location(), and multipolygon_finish() functions. These functions are usually very small and only adapt the data to the desired format. All the real logic is in the provided GeometryFactory parent class.

Then all you need is to define the alias template

template <class TProjection = IdentityProjection>
using SomeFormatFactory = GeometryFactory<SomeFormatFactoryImpl, TProjection>;

and you are done.

Use the other implementations as examples and ask if you have any questions.

14. Storage

Osmium offers several different indexes suitable for different use cases. You have to choose a suitable index type. See the Osmium Concepts Manual for a list of available index types.

If you want to choose the index type at runtime, you can use osmium::index::MapFactory. The following code listing shows its usage. location_index_type is a variable you either set based on the preferences of the user of your program or based on your own estimates (e.g. file size).

#include <string>

#include <osmium/handler/node_locations_for_ways.hpp>
#include <osmium/index/map.hpp>

using index_type = osmium::index::map::Map<osmium::unsigned_object_id_type, osmium::Location>;
using location_handler_type = osmium::handler::NodeLocationsForWays<index_type>;
std::string location_index_type = "sparse_mem_array";
const auto& map_factory = osmium::index::MapFactory<osmium::unsigned_object_id_type, osmium::Location>::instance();
auto location_index = map_factory.create_map(location_index_type);
location_handler_type location_handler{*location_index};

15. Exceptions

Libosmium uses various C++ standard exceptions and some Osmium-specific exceptions to tell you about problems. All Osmium-specific exceptions are in the osmium namespace, they are all derived from one of the standard C++ exceptions, usually std::runtime_error or std::system_error.

List of Osmium Exceptions

Exception Derived from Description
osmium::io_error   Some kind of input/output error. Derived classes describe the error in more detail.
osmium::xml_error io_error Some kind of XML parser error.
osmium::format_version_error io_error The OSM file format version was not understood. Osmium currently can only read version 0.6 files.
osmium::geometry_error   Some kind of geometry error.
osmium::projection_error   Thrown when a projection from one coordinate system into another fails in some way. Either the projection can’t be initialized because of invalid parameters or the projection can’t be calculated because the coordinates can’t be transformed into the target coordinate system.
osmium::not_found   This exception is thrown when a key is not found in an index.
osmium::invalid_location    
osmium::unknown_type   Thrown by visitors when they encounter an unknown (or in this context unexpected) item type in a buffer. This should not happen in usual circumstances.

Standard Exceptions thrown by Osmium

std::invalid_argument
Thrown by some Osmium functions.

16. Handling of invalid data

Libosmium can, to a certain extent, handle data that is invalid in the sense that it is not allowed in the OSM database or even might be nonsensical, for instance longitudes larger than 180°. This section explains the details and reasons.

There are good reasons for this behaviour:

Generally more low-level classes and functions (such as basic classes Location, Node, Tag, etc.) are more lenient for flexibility, while higher level functions (such as file I/O) might be more strict to support typical use cases.

File input and output

It is possible to encode some data in OSM files that can be considered to be invalid. When reading and writing OSM files libosmium does not care about that. It will give you the data in the form it is in the file and write out data you give to it in that form.

Order of objects in files

OSM objects in OSM files are usually ordered by type, ID (and version for history files). This is a useful convention, but it is not necessarily so. All OSM file formats allow the data to be in any order and libosmium can read and write those files. Whenever you read data using libosmium, it will be given to you in the order it is in the file, whatever that is. Whenever you write data, you must give it to libosmium in the order you want it to end up in the file.

Note that the ordering of objects in a file might influence the size of the file. Some file formats (notably PBF) will encode the data better if the same types of objects are together and even better if they are ordered by ID.
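The conventional order described above (type, then ID, then version) can be sketched as a comparison function. This is a minimal illustration only: `ItemType`, `ObjectRef`, `conventional_order`, `is_conventionally_ordered` and `sort_conventionally` are hypothetical names and not part of libosmium, and IDs are assumed to compare by plain numeric value, which may differ from how libosmium itself orders negative IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the header of an OSM object (not a libosmium type).
enum class ItemType { node = 1, way = 2, relation = 3 };

struct ObjectRef {
    ItemType type;
    std::int64_t id;
    std::uint32_t version;
};

// Conventional file order: by type first, then by ID, then by version.
inline bool conventional_order(const ObjectRef& a, const ObjectRef& b) {
    return std::tie(a.type, a.id, a.version) <
           std::tie(b.type, b.id, b.version);
}

// True if the sequence already follows the conventional order.
inline bool is_conventionally_ordered(const std::vector<ObjectRef>& objects) {
    return std::is_sorted(objects.begin(), objects.end(), conventional_order);
}

// Bring a sequence into the conventional order before writing it out.
inline void sort_conventionally(std::vector<ObjectRef>& objects) {
    std::sort(objects.begin(), objects.end(), conventional_order);
    assert(is_conventionally_ordered(objects));
}
```

A check like `is_conventionally_ordered` can be useful before running code that relies on the convention, because the file formats themselves do not guarantee it.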

IDs

OSM node, way, relation, and changeset IDs are always positive. Zero is allowed by libosmium and understood as the “unset” or “don’t know” value. Negative values are also allowed because some programs (JOSM for instance) use negative IDs as temporary IDs. Not all parts of libosmium will work with negative IDs out of the box, though; you might have to handle them specially in some way. Indexes usually only work with positive IDs. If you have to handle negative IDs, use two indexes: one for positive IDs and one for negative IDs, which you have to transform first.

OSM uses a different ID space for each entity type (nodes, ways, relations, changesets) and gives out IDs starting from 1. Libosmium allows any kind of ID that fits into a signed 64-bit integer, but some parts, notably the indexes, are optimized for smaller and more or less contiguous integers.
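The two-index approach for negative IDs could look roughly like the following sketch. `DenseIndex` is a hypothetical stand-in for an index that only accepts non-negative keys, and `SignedIdIndex` is an illustrative wrapper; neither is a libosmium class, and the sketch throws `std::out_of_range` rather than any libosmium exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical dense index that only works with non-negative keys.
template <typename T>
class DenseIndex {
    std::vector<T> m_values;
    std::vector<bool> m_used;

public:
    void set(std::uint64_t key, const T& value) {
        if (key >= m_values.size()) {
            m_values.resize(key + 1);
            m_used.resize(key + 1, false);
        }
        m_values[key] = value;
        m_used[key] = true;
    }

    const T& get(std::uint64_t key) const {
        if (key >= m_values.size() || !m_used[key]) {
            throw std::out_of_range{"key not found"};
        }
        return m_values[key];
    }
};

// Routes each signed ID to one of two indexes, transforming negative
// IDs into their magnitude first.
template <typename T>
class SignedIdIndex {
    DenseIndex<T> m_positive; // IDs >= 0
    DenseIndex<T> m_negative; // IDs < 0, stored under their magnitude

    static std::uint64_t magnitude(std::int64_t id) noexcept {
        // Negate in unsigned arithmetic to avoid signed overflow.
        const auto u = static_cast<std::uint64_t>(id);
        return id < 0 ? (0 - u) : u;
    }

public:
    void set(std::int64_t id, const T& value) {
        (id < 0 ? m_negative : m_positive).set(magnitude(id), value);
    }

    const T& get(std::int64_t id) const {
        return (id < 0 ? m_negative : m_positive).get(magnitude(id));
    }
};
```

Keeping the two ID ranges in separate indexes means that temporary IDs such as -1, -2, -3 stay small and contiguous after the transformation, which is the kind of key distribution a dense index handles well.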

User ID

The user ID has to be zero (“unknown” or “anonymous”) or a positive integer. Negative values are not allowed in libosmium.

Timestamps

Timestamps are stored internally as seconds since the epoch (1970-01-01). Although OSM was founded much later, timestamps are not checked. Libosmium uses a few special values here. Time 0 is the “unknown” value, time 1 is understood to be “before any other time value” and 2^32-1 is understood to be “after any other time value”.
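As an illustration of how these special values can be used, here is a small sketch. It assumes the timestamp is held in an unsigned 32-bit integer; the constant and function names are hypothetical and not taken from libosmium.

```cpp
#include <cassert>
#include <cstdint>

// Special values as described in the text (names are illustrative).
constexpr std::uint32_t timestamp_unknown       = 0;
constexpr std::uint32_t timestamp_start_of_time = 1;
constexpr std::uint32_t timestamp_end_of_time   = 0xFFFFFFFFu; // 2^32-1

enum class TimestampKind { unknown, before_all, after_all, regular };

// Classify a raw "seconds since the epoch" value.
inline TimestampKind classify(std::uint32_t seconds) noexcept {
    if (seconds == timestamp_unknown) {
        return TimestampKind::unknown;
    }
    if (seconds == timestamp_start_of_time) {
        return TimestampKind::before_all;
    }
    if (seconds == timestamp_end_of_time) {
        return TimestampKind::after_all;
    }
    return TimestampKind::regular;
}

// With the special values as defaults, an open-ended time range needs
// no extra flags: every regular timestamp lies between the two bounds.
inline bool in_range(std::uint32_t seconds,
                     std::uint32_t from = timestamp_start_of_time,
                     std::uint32_t to = timestamp_end_of_time) noexcept {
    return from <= seconds && seconds <= to;
}
```

Note that the “unknown” value 0 falls outside the default range in this sketch, so code using it has to decide separately how to treat objects without a timestamp.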

Locations

Locations are given in WGS84 longitude and latitude. Both libosmium and the OSM database store the coordinates internally as signed 32-bit integers. The range of a signed 32-bit integer is somewhat larger than what is needed for longitudes from -180° to 180° and latitudes from -90° to 90°. Values outside the valid range that still fit into a signed 32-bit integer are therefore possible, and historic OSM data contains such values. Use the Location::valid() function to check whether a location is in the proper range.
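The following sketch illustrates why out-of-range coordinates can be stored at all. It assumes that coordinates are scaled by 10^7 before being stored as integers, which is the usual OSM convention but is not stated in this section. `FixedLocation`, `to_fixed` and `is_valid` are hypothetical names; in real code you would use Location::valid() instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: degrees are scaled by 10^7 and stored as signed 32-bit integers.
constexpr double coordinate_precision = 10000000.0;

constexpr std::int32_t max_longitude = 1800000000; // 180° * 10^7
constexpr std::int32_t max_latitude  = 900000000;  //  90° * 10^7

// Hypothetical fixed-point location: x is the longitude, y the latitude.
struct FixedLocation {
    std::int32_t x;
    std::int32_t y;
};

// Convert degrees to the fixed-point representation. With this scaling a
// signed 32-bit integer can hold roughly -214.7° to 214.7°.
inline std::int32_t to_fixed(double degrees) {
    assert(std::fabs(degrees) <= 214.7);
    return static_cast<std::int32_t>(std::llround(degrees * coordinate_precision));
}

// Range check in the spirit of Location::valid().
inline bool is_valid(const FixedLocation& loc) noexcept {
    return loc.x >= -max_longitude && loc.x <= max_longitude &&
           loc.y >= -max_latitude && loc.y <= max_latitude;
}
```

Under this assumption a longitude of 200° is representable but not valid, which is exactly the kind of value the range check is meant to catch.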

Strings and UTF-8

OSM strings use UTF-8 encoding, but a lot of the libosmium code doesn’t care about that and doesn’t check that a string is valid UTF-8. This is mostly for performance reasons, but it could also allow other character sets in non-OSM uses of the library.

Historically the OSM database sometimes contained non-UTF-8 strings. This should have all been fixed by now.

These parts of the library don’t care about string encoding:

These parts of the library do care about string encoding:

Strings and control characters

OSM strings (user names, tag keys and values, and roles) cannot contain certain control characters. The reason is that those control characters can’t be expressed in XML. (XXX More details needed.)

Strings in OSM can only have a maximum length of 256 Unicode characters. Libosmium’s input and output routines allow any length up to 2^16 bytes. (XXX More details needed.)
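Because one limit is given in Unicode characters and the other in bytes, the two have to be measured differently. The sketch below counts code points in a string that is assumed to be valid UTF-8 and compares both measures against limits supplied by the caller. The names `count_code_points`, `LengthCheck` and `check_length` are hypothetical and not part of libosmium.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a string that is assumed to be valid UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
inline std::size_t count_code_points(const std::string& utf8) noexcept {
    std::size_t count = 0;
    for (const unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

struct LengthCheck {
    bool within_character_limit;
    bool within_byte_limit;
};

// Compare a string against a limit in characters and a limit in bytes.
inline LengthCheck check_length(const std::string& utf8,
                                std::size_t max_characters,
                                std::size_t max_bytes) noexcept {
    return LengthCheck{count_code_points(utf8) <= max_characters,
                       utf8.size() <= max_bytes};
}
```

For example, the string “Zürich” has 6 characters but occupies 7 bytes in UTF-8, so a string can pass a character limit while exceeding a byte limit of the same numeric value.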

17. Run-time Configuration

Osmium reads some settings from environment variables. This allows you to set configuration options for the library at run-time without any support from the application using the library. Setting these variables is usually not needed in normal operations but could be useful when debugging or tweaking performance.

OSMIUM_POOL_THREADS

The number of threads in the thread pool used for certain input/output operations.

If this is a negative number, it will be set to the actual number of cores on the system plus the given number, i.e. it will leave that many cores unused. In all cases the minimum number of threads in the pool is 1.

Default: -2
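The rule described above can be sketched as follows. This is an illustration of the documented behaviour only, not libosmium's actual implementation: `effective_pool_size` and `pool_size_from_environment` are hypothetical helpers, and falling back to the default on an unparsable value is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <exception>
#include <string>
#include <thread>

// Negative settings mean "number of cores plus the setting";
// the result is never smaller than 1.
inline int effective_pool_size(int setting, int cores) noexcept {
    const int threads = setting < 0 ? cores + setting : setting;
    return std::max(1, threads);
}

// Read OSMIUM_POOL_THREADS, using the documented default of -2
// if the variable is unset or (an assumption) cannot be parsed.
inline int pool_size_from_environment() {
    int setting = -2;
    if (const char* env = std::getenv("OSMIUM_POOL_THREADS")) {
        try {
            setting = std::stoi(env);
        } catch (const std::exception&) {
            // Keep the default.
        }
    }
    // hardware_concurrency() may return 0 if the value is unknown.
    const int cores = static_cast<int>(std::thread::hardware_concurrency());
    return effective_pool_size(setting, cores);
}
```

With the default of -2, a machine with 8 cores would get 6 pool threads under this rule, while a machine with 1 or 2 cores would still get the minimum of 1.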

OSMIUM_USE_POOL_THREADS_FOR_PBF_PARSING

Normally PBF parsing will use the thread pool. You can disable this by setting this variable to false.

Default: true

18. Changes from old versions of Osmium

This version has some substantial changes from the “old Osmium” available from https://github.com/joto/osmium, and users of the “old Osmium” will have to rewrite their code. Use the examples provided in the “example” directory or in the osmium-contrib repository to get an idea of what needs changing. These examples are often similar to the examples provided with the old Osmium, so they should give you an idea of how your code has to change.

Here are some of the more important changes: