Table of Contents
- 1. Introduction
- 2. Dependencies
- 3. Building Libosmium
- 4. Running tests
- 5. Using Libosmium in your own projects
- 6. Basic Types
- 7. OSM Entities
- 8. Buffers
- 9. Input and Output
- 10. Iterators
- 11. Handlers
- 12. Working with relations
- 13. Creating Geometries
- 14. Storage
- 15. Exceptions
- 16. Handling of invalid data
- 17. Run-time Configuration
- 18. Changes from old versions of Osmium
1. Introduction
The OpenStreetMap project is growing at an enormous rate. Working with the OSM data becomes increasingly difficult, because there is just so much of it and because it gets more complex all the time.
Osmium was developed as an answer to this challenge. After years of developing software to work with OSM data in many programming languages like Perl, Ruby, Java and even in XSLT, it became evident that something more was needed to efficiently work with these huge amounts of data. Processing speed was, of course, one big issue here, but the other one is available memory. Data processing tasks can be so much faster when their working set fits into memory that it pays to think about memory use. Because Osmium is a C++ library, it can make very efficient use of the main memory on your computer. Primitive objects such as integers and doubles, but also complex objects, need only as much memory as is really necessary. If the data structures are chosen carefully, there isn’t much management overhead in many cases.
Osmium has been in continuous development since it was born in October 2010, and it has changed considerably over time. While the basic premise, to write a low-level efficient OSM library, is still true, it has become more and more powerful and at the same time easier to use. Osmium has been in production use nearly from day one; some parts of it have been ripped from earlier production code. Osmium is not an academic exercise, but it is used and it has shown its power many times. And while C++ might not be the easiest programming language to learn and Osmium might not be the easiest library to use, we try to make it as simple as possible to work with it, as long as this doesn’t compromise efficiency too much.
Header-only Library
Osmium is a header-only library, so there is nothing to compile to build it. Just include the header files you need.
The osmium Namespace
Everything in the Osmium library is in the osmium namespace or in sub-namespaces. You’ll likely encounter the osmium::io namespace for everything related to file input and output and the osmium::geom namespace for geometry-related functionality, but there are some more.
Do not directly use anything in any sub-namespace called detail. Those classes and functions are for internal use only.
Code in any experimental sub-namespace is experimental and might be removed or changed without notice.
License
This manual is available under the Creative Commons Attribution-ShareAlike License version 4.0.
The Osmium Library is available under the very liberal Boost Software License:
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the “Software”) to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:
The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2. Dependencies
Different parts of Libosmium have different dependencies. You do not need to install all of them, just those that you need for whatever you are doing with Libosmium. But for a beginner it is not always easy to see which dependencies are needed and which aren’t. This manual differentiates between important dependencies and extra dependencies to help you out. You should at least install the important dependencies when starting to experiment with Libosmium, but feel free to install all dependencies. Whatever is not needed will not be used anyway; it will not slow down your program or make the binaries bigger.
On Linux systems most of these libraries are available through your package manager, see the list below for the names of the packages. But make sure to check the versions. If the packaged version available is not new enough, you’ll have to install from source. Most likely this is the case for Protozero and Libosmium itself.
On macOS many of these libraries are available through Homebrew.
When building Libosmium tests and examples, CMake will automatically look for these libraries in the usual places on your system. In addition it will look for the Protozero library in the same directory where the Libosmium repository is. So if you are building from the Git repository and want to use the newest Libosmium and Protozero, clone both into the same directory:
mkdir work
cd work
git clone https://github.com/mapbox/protozero
git clone https://github.com/osmcode/libosmium
In addition to the programs listed here, you’ll need a C++ compiler which supports C++11. Clang 3.4 or later and GCC 4.8 or later are known to work.
Important dependencies
CMake and Make
To build the tests, examples, etc. you need the CMake build system. Programs using Libosmium can, of course, be built with any build system you like, but the Libosmium repository as well as many projects based on Libosmium use it.
CMake has an optional curses-based configuration tool called ccmake. It is recommended that you install this also.
CMake usually generates a Makefile for Make, which you will also need.
- Debian/Ubuntu: cmake, cmake-curses-gui, make
- Fedora/CentOS: cmake, make
- openSUSE: cmake, make
Expat
Expat is needed for parsing OSM XML files.
- Debian/Ubuntu: libexpat1-dev
- Fedora/CentOS: expat-devel
- openSUSE: libexpat-devel
- Homebrew: expat
ZLib
zlib is needed for reading and writing OSM PBF files and for GZip support when reading and writing XML files.
- Debian/Ubuntu: zlib1g-dev
- Fedora/CentOS: zlib-devel
- openSUSE: zlib-devel
bz2lib
bz2lib is needed for BZip2 support when reading and writing OSM XML files.
- Debian/Ubuntu: libbz2-dev
- Fedora/CentOS: bzip2-devel
- openSUSE: libbz2-devel
Boost >= 1.55
Boost is used for some (limited) functionality in libosmium. Many programs using libosmium will not actually need boost or only need parts of it.
- Boost Iterator is used for Tag filters, and for the Object Pointer Collection. (needed until libosmium 2.17)
- The CRC32 checksum implementation from boost can be used for calculating checksums over OSM objects. The implementation in zlib is used otherwise.
- Libosmium versions before 2.6.1 needed Boost for writing PBF files.
- Libosmium from version 2.12.0 uses boost::variant in the osmium::StringMatcher class.
You need at least Boost version 1.55.
- Debian/Ubuntu: libboost-dev
- Fedora/CentOS: boost-devel
- openSUSE: boost-devel
- Homebrew: boost
Google Protocol Buffers (until version 2.2)
Not needed any more from version 2.3.0 onwards
Google Protocol Buffers in at least version 2.4.0 is needed for reading and writing OSM PBF files.
- Debian/Ubuntu: libprotobuf-dev, protobuf-compiler
- openSUSE: protobuf-devel
- Homebrew: protobuf
OSMPBF (until version 2.2)
Not needed any more from version 2.3.0 onwards
The OSMPBF library is needed for reading and writing OSM PBF files.
- Debian/Ubuntu: libosmpbf-dev (The package in Ubuntu 14.04 and older is too old, install from source instead.)
- Homebrew: osm-pbf
Protozero >= 1.6.3 (since libosmium version 2.3.0)
The Protozero header only library is needed for reading and writing OSM PBF files.
You need at least version 1.6.3.
Up to version 2.13 a copy of this library was included in the libosmium repository. For newer versions you need to install either a packaged version or a version from the git repository.
- Debian/Ubuntu: libprotozero-dev
- Fedora/CentOS: protozero-devel
Utfcpp (until version 2.14.0)
Not needed any more from version 2.15.0 onwards
The utf8-cpp library is needed for the OPL output format. A copy of this library is included in the libosmium repository but not installed by default. Either use the packages of your distribution, install it from the source, or use the INSTALL_UTFCPP option of the libosmium CMake configuration to install the bundled version.
- Debian/Ubuntu: libutfcpp-dev
- Fedora/CentOS: utf8cpp-devel
- openSUSE: utfcpp
Extra dependencies
Google Sparsehash (deprecated)
Google Sparsehash (https://github.com/sparsehash/sparsehash) is used for the sparse-mem-table index map, sometimes used as a node location store. This isn’t usually needed any more, because there are better implementations for the node location store available.
- Debian/Ubuntu: libsparsehash-dev
- Fedora/CentOS: sparsehash-devel
- openSUSE: sparsehash-devel
- Homebrew: google-sparsehash
Boost Program Options (until version 2.7.2)
Boost Program Options is needed for parsing command line options in some examples.
- Debian/Ubuntu: libboost-program-options-dev
- Fedora/CentOS: boost-program-options
- openSUSE: boost-devel
GDAL/OGR
GDAL/OGR is needed if you want to convert OSM geometries into OGR geometries.
- Debian/Ubuntu: libgdal-dev
- Fedora/CentOS: gdal-devel
- openSUSE: gdal-devel
- Homebrew: gdal
To use, compile with what the command gdal-config --cflags returns and link with what gdal-config --libs returns.
GEOS
GEOS is needed if you want to convert OSM geometries into GEOS geometries. The GEOS support is deprecated and works only until GEOS 3.5. For details see this commit.
- Debian/Ubuntu: libgeos++-dev
- Fedora/CentOS: geos-devel
- openSUSE: geos-devel
- Homebrew: geos
Proj.4
The Proj.4 library is needed if you want to project OSM coordinates into spatial reference systems other than Web Mercator (EPSG 3857, often named Google Mercator).
Only the old proj_api.h based API is supported. If you need this to work with newer versions of Proj.4, have a look at https://github.com/osmcode/osmium-proj for some untested experimental code.
- Debian/Ubuntu: libproj-dev
- Fedora/CentOS: proj-devel, proj-epsg
- openSUSE: libproj-devel, proj
LZ4 (from 2.16.0)
The LZ4 library is needed if you want to use LZ4 compression in PBF files. This is an optional feature available from libosmium version 2.16.0.
- Debian/Ubuntu: liblz4-dev
Doxygen
The Libosmium API documentation can be built using Doxygen. Usually you do not need to do this, because the API reference is available online. If you want to build it yourself, you need Graphviz in addition to Doxygen.
- Debian/Ubuntu: doxygen, graphviz
- Fedora/CentOS: doxygen, graphviz, xmlstarlet
- openSUSE: doxygen, graphviz
Installing dependencies on some Linux systems
Debian Stretch, Buster, Bullseye or newer
You can install all dependencies with:
apt-get install -q -y \
cmake \
doxygen \
g++ \
git \
graphviz \
libboost-dev \
libbz2-dev \
libexpat1-dev \
libgdal-dev \
libgeos++-dev \
liblz4-dev \
libproj-dev \
make \
ruby \
ruby-json \
spatialite-bin \
zlib1g-dev
Ubuntu 18.04 or newer
You can install all dependencies with:
apt-get install -q -y \
cmake \
doxygen \
g++ \
git \
graphviz \
libboost-dev \
libbz2-dev \
libexpat1-dev \
libgdal-dev \
libgeos++-dev \
liblz4-dev \
libproj-dev \
make \
ruby \
ruby-json \
spatialite-bin \
zlib1g-dev
Fedora
You can install all dependencies with:
dnf install --quiet --assumeyes \
boost-devel \
bzip2-devel \
cmake \
doxygen \
expat-devel \
gcc-c++ \
gdal-devel \
gdalcpp-static \
geos-devel \
git \
graphviz \
lz4-devel \
make \
proj-devel \
ruby \
rubygem-json \
spatialite-tools \
zlib-devel
openSUSE 42
You can install all dependencies with:
zypper --non-interactive --no-color install \
boost_1_61-devel \
cmake \
doxygen \
gcc6-c++ \
gdal-devel \
geos-devel \
graphviz \
libbz2-devel \
libexpat-devel \
libproj-devel \
proj \
ruby2.3 \
ruby2.3-rubygem-json \
zlib-devel
Arch Linux
You can install all important dependencies with:
sudo pacman -Suy protobuf boost-libs zlib expat cmake make bzip2
and all extra dependencies with:
sudo pacman -Suy boost gdal proj doxygen
3. Building Libosmium
Libosmium is a header-only library, that means that you do not have to build anything. But you might want to build the tests, examples, benchmarks or the documentation. This chapter explains how to do that.
Before building you need to install all the dependencies.
CMake
Libosmium uses the CMake configuration system available on all major platforms. CMake will generate a configuration for a build system of your choice. On Linux and macOS this is usually GNU Make, on Windows Nmake or MSBuild.
Build types
CMake knows several different build types that result in the use of different compiler options and different build options (see below). By default the build type RelWithDebInfo (Release with debug info) will be used, but you can change this either by setting CMAKE_BUILD_TYPE in ccmake or on the command line:
cmake -DCMAKE_BUILD_TYPE=Dev ..
Here are the build types used for Libosmium:
CMAKE_BUILD_TYPE | Description |
---|---|
Debug | Debug mode, no optimizations. |
Dev | For Libosmium developers. All build options are set to ON and very strict compiler warnings are enabled. |
MinSizeRel | Release mode, optimize for small binary. |
RelWithDebInfo | Release mode with debug information compiled in. Use this unless the binaries generated are too big for you. |
Release | Release mode. |
Build options
Depending on the build type (see above), different build options are ON or OFF. You can change the settings in ccmake or on the command line with something like cmake -DBUILD_EXAMPLES=ON etc.
Build option | Default | Description |
---|---|---|
BUILD_BENCHMARKS | OFF (ON in Dev build) | Build the benchmark programs. You only need this if you intend to run the benchmarks. |
BUILD_DATA_TESTS | OFF (ON in Dev build) | Build the data tests. These tests need OSM test data from a different repository, so they are a bit more difficult to run. See chapter Running Tests for details. |
BUILD_EXAMPLES | ON | Build the examples in the examples directory. |
BUILD_HEADERS | OFF (ON in Dev build) | Only interesting for Libosmium developers. This will build every Libosmium header file by itself to check if the include dependencies are all set correctly. |
BUILD_TESTING | ON | Build the unit tests. See chapter Running Tests for details. |
Building on Linux and macOS
Linux: Osmium is developed on Linux and tested best on that system. Debian Jessie (testing) and current Ubuntu systems come with everything needed for Osmium. Debian wheezy (stable) and the Ubuntu LTS release 12.04 don’t have compilers current enough. If you are stuck on these systems, use a backported compiler.
macOS: Osmium also works well on macOS with the exception of the parts that need the mremap system call that is not available on macOS.
First clone Libosmium from the git repository (or install it in some other way):
git clone https://github.com/osmcode/libosmium
cd libosmium
Then create a directory in which the build should happen. In this documentation we will use the directory build, but you can choose any other name. You can have several build directories at the same time with different build options and they will not interfere with each other.
mkdir build
cd build
Then call CMake to create an initial configuration:
cmake ..
CMake will check your system, determine locations of programs, include headers, libraries etc. It will also set some default build options. You can then call
ccmake ..
to enter a curses-based tool that allows you to edit any configuration setting. Use the cursor keys to choose any variable and press Enter to change it. Once you are done, press c to configure and handle any errors that might appear. You might have to do this step several times. Then press g to generate the configuration and exit the program. For more advanced usage info, see the ccmake help.
Now you can call
make
to complete the build.
For Mac users: If you have clang 3.2 or newer, use the system compiler. If not you have to build the compiler yourself. See the instructions on https://clang.llvm.org/ .
Building on Windows
You need a rather new Visual C++ compiler for this to work. Visual C++ 2013 (a.k.a 12.0) is not supported. You’ll need 2014 CTP or the 2015 Preview. This is due to the limited C++11 support in earlier versions of Visual C++.
The easiest way on Windows is to use the windows-builds repository.
When the pre-requisites (Visual Studio 2014/2015, git) are in place, it should not take more than these steps to compile libosmium:
git clone https://github.com/mapbox/windows-builds.git
cd windows-builds
settings.bat
scripts\build_libosmium_deps
scripts\package_libosmium_deps
scripts\build_libosmium vs
Building on 32bit architectures
Osmium works well on 64 bit machines, but on 32 bit machines there are some problems. Be aware that not everything will work on 32 bit architectures. This is mostly due to the 64 bit needed for node IDs. Also Osmium hasn’t been tested well on 32 bit systems. Here are some issues you might run into:
- Google Sparsehash does not work on 32 bit machines in our use case. Support for it is deprecated and will be removed in a future version.
- The mmap system call is called with a size_t argument, so it can’t give you more than 4GByte of memory on 32 bit systems. This might be a problem.
Please report any issues you have and we might be able to solve them.
Building the reference documentation
To build the documentation you’ll need Doxygen.
After configuring with CMake as described above, call
make doc
to create the reference documentation.
Installing Libosmium
Call make install in the build directory to install the library. By default, this will install the Osmium include files into /usr/local/include/.
The following external (header-only) library is included in the libosmium repository:
- gdalcpp
If you want this library to be installed along with libosmium itself when calling make install, you have to use the CMake option INSTALL_GDALCPP.
(Libosmium versions 2.13 and before also included protozero which could be included with INSTALL_PROTOZERO. Newer versions of libosmium don’t include this any more.)
(Libosmium versions 2.14.0 and before also included utfcpp which could be included with INSTALL_UTFCPP. Newer versions of libosmium don’t include this any more.)
If something didn’t work
Here are some tips if your build failed:
- Make sure you have all dependencies installed. Sometimes it is not easy to see from the error message which dependency is missing.
- Usually CMake and Make are quite good at tracking what needs rebuilding when you change configurations etc. But sometimes they get confused. Try restarting from scratch with an empty build directory.
- Check the cmake output to see if there are any warnings.
- Try cmake with -DOsmium_DEBUG=ON to see some more debug information.
- Run make VERBOSE=1 to see the commands Make is calling.
- Check the advanced CMake configuration section below.
Advanced CMake configuration
The following variables can be set in the CMake configuration to further change the build. Changes here are usually not necessary though:
Option | Description |
---|---|
BENCHMARK | If BUILD_BENCHMARKS is ON, this variable contains the semicolon-separated list of all benchmarks that should be built. The prefix osmium_benchmark_ will be added to all executables. |
EXAMPLES | If BUILD_EXAMPLES is ON, this variable contains the semicolon-separated list of all examples that should be built. The prefix osmium_ will be added to all executables. |
OSMIUM_WARNING_OPTIONS | C++ compiler warning options used in Dev mode. |
Running clang-tidy
To check for problems in the source code not detected by compilers, you can run the clang-tidy command. If it is installed and CMake found it, you can call Make with the clang-tidy target:
make clang-tidy
The configuration for clang-tidy is in the file .clang-tidy. It also contains documentation on why certain warnings are disabled.
Running clang-tidy will take quite a while and might generate a lot of output. You can redirect the output to a file using something like this:
make clang-tidy >clang-tidy.log 2>&1
Running CPPCheck
To check for problems in the source code not detected by compilers, you can run the cppcheck command. If it is installed and CMake found it, you can call Make with the cppcheck target:
make cppcheck
This will check all .hpp and .cpp files and can take a while.
4. Running tests
Libosmium uses version 1 of the Catch unit testing framework and CTest which is part of the CMake suite.
There are three kinds of tests: unit tests, data tests, and example tests. For the details see below.
Tests should never fail. If they do fail in your environment, please report this as a bug. Some tests will be disabled on some platforms if they are testing functionality that’s not available on that platform. Some tests will be disabled on your host if you don’t have the needed dependencies installed.
Running the tests
To run the tests, build the project as described in the Building Libosmium chapter and then run
ctest
which will run all the configured tests. You can run all tests matching a pattern with something like
ctest -R 'io_.*'
or exclude tests from being run with something like
ctest -E io_test_reader
If there is some problem you can enable verbose mode:
ctest -V
See the CTest documentation for more details.
Labels
CTest allows tests to be labeled to categorize them. All unit tests have the label unit and a label for their category (the directory under test/t). All data tests have the label data. In addition all tests are labeled as fast or slow. Fast tests don’t take a noticeable amount of time, slow tests do.
You can run all tests with labels matching a regular expression with -L. So to run only fast tests use
ctest -L fast
You can use
ctest --print-labels
to see all available labels.
Unit tests
Unit tests check small parts of Libosmium. They can be found in the directories under test/t. If you are installing Libosmium, you should probably run these tests to make sure Libosmium works in your environment.
Unit tests are enabled or disabled with the BUILD_TESTING CMake setting. Different tests have different dependencies and CMake will disable all tests that don’t have their dependencies met.
You can also run the unit tests manually without going through CTest. After building they are in the build/test directory. Call them with --help to see options.
Data tests
Data tests need external OSM test data to run. They are enabled or disabled with BUILD_DATA_TESTS, but you have to install the test data for them to work. For this call git submodule update --init in the libosmium repository.
If you have put the test data somewhere else, you can use the OSM_TESTDATA variable in CMake to point to that directory.
The testdata-multipolygon test needs Spatialite and Ruby with the json gem installed. Those dependencies are currently not checked for in the CMake configuration.
Note that older versions of libosmium don’t have the test data installed as a submodule, but expect it to be in the same directory you installed Libosmium in. To do this clone the osm-testdata repository:
git clone https://github.com/osmcode/osm-testdata
Example tests
Some example programs come with tests. Those tests are under test/examples. They run the example programs with some arguments to check basic functionality. Currently these tests are very rudimentary.
5. Using Libosmium in your own projects
Libosmium is generally quite easy to use in your own projects. Just include the specific header files you need for your application and start using Libosmium functions. Because Libosmium is a header-only library, there is nothing to link with. There isn’t one include file for everything, but many include files each only bringing in some specific classes and functions. This way you are not paying for something you don’t use.
Read the manuals
Before you do anything else we recommend you at least skim the Libosmium concepts manual and this manual. This will give you an overview of what’s where and how Libosmium works.
Read the API reference
The API reference contains documentation of every class and function in Libosmium. It will tell you which #include directive you need where.
Libosmium uses several other libraries for many of its functions and you have to figure out which libraries to link with when you include specific Libosmium header files. This is documented in the reference and there is a list below for your convenience.
CMake configuration
If you are using CMake to configure your project, using Libosmium is very easy, because complete configuration is available. Copy the file FindOsmium.cmake to your project:
cd your-project
mkdir -p cmake
cd cmake
wget https://github.com/osmcode/libosmium/raw/master/cmake/FindOsmium.cmake
and include it in your CMakeLists.txt:
list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
find_package(Osmium REQUIRED)
This will tell CMake to find the Libosmium includes on the build system during the configuration. You can check whether this was successful with something like:
if(NOT OSMIUM_FOUND)
message(WARNING "Libosmium not found!\n")
endif()
If your code doesn’t work with older versions of Libosmium, you can tell CMake the minimum version number:
find_package(Osmium 2.15.6 REQUIRED)
You can add an optional list of components that should be found also. For example to look for the io and gdal components you extend the find_package command like this:
find_package(Osmium REQUIRED COMPONENTS io gdal)
FindOsmium knows about the following components:
- pbf: include libraries needed for PBF input and output
- xml: include libraries needed for XML input and output
- io: include libraries needed for any type of input/output
- gdal: include if you want to use any of the OGR functions
- proj: include if you want to use any of the Proj.4 functions
- sparsehash: include if you use the sparsehash index map (sparse_mem_table), deprecated
After that add the include directories:
include_directories(${OSMIUM_INCLUDE_DIRS})
You can look at the CMake configuration in the Osmium Tool and Osmium Contrib repositories for some working examples.
Note that you should occasionally check whether you still have a current version of FindOsmium.cmake and update if necessary.
Libraries needed for specific functionality
Also see the dependencies chapter.
XML input
For XML input you need the Expat XML parser, for XML output no special XML library is needed. In any case you need threading enabled. If you want to read or write compressed XML files you need ZLib and BZ2lib.
- Dependencies: Expat, Zlib, BZ2lib
- Link with: libexpat, enable multithreading
- Classes: osmium::io::Reader, osmium::io::Writer
- Include files: osmium/io/any_input.hpp, osmium/io/any_output.hpp, osmium/io/xml_input.hpp, osmium/io/xml_output.hpp, osmium/io/any_compression.hpp, osmium/io/gzip_compression.hpp, osmium/io/bzip2_compression.hpp
PBF input and output
For PBF input and output you need several libraries and threading enabled.
For version 2.3.0 and above you don’t need much:
- Dependencies: Zlib
- Link with: libz, ws2_32 (Windows only), enable multithreading
- Classes: osmium::io::Reader, osmium::io::Writer
- Include files: osmium/io/any_input.hpp, osmium/io/any_output.hpp, osmium/io/pbf_input.hpp, osmium/io/pbf_output.hpp
If you want support for lz4 compression in PBF blobs, you also need the LZ4 library.
For versions up to 2.2 you need some more libraries:
- Dependencies: Google Protocol Buffers, OSMPBF, Zlib
- Link with: libprotobuf-lite, libosmpbf, libz, ws2_32 (Windows only), enable multithreading
- Classes: osmium::io::Reader, osmium::io::Writer
- Include files: osmium/io/any_input.hpp, osmium/io/any_output.hpp, osmium/io/pbf_input.hpp, osmium/io/pbf_output.hpp
GDAL/OGR
The GDAL/OGR library is needed when you want to convert OSM geometries into OGR geometries or report problems building multipolygons into OGR formats.
- Link with: libgdal
- Classes: osmium::geom::OGRFactory
- Include files: osmium/geom/ogr.hpp, osmium/area/problem_reporter_ogr.hpp
PROJ
The PROJ library is only needed when you want to project OSM locations into arbitrary coordinate reference systems. If you only want to convert to Web Mercator, use osmium::geom::MercatorProjection instead and you don’t need an extra library.
- Link with: libproj
- Classes: osmium::geom::Projection
- Include files: osmium/geom/projection.hpp
Note that only PROJ up to version 5 is supported.
Compiler options
You might have to set the C++ version using the compiler option
-std=c++11
When working with OSM data you often have very large files with several gigabytes. This can lead to problems on 32bit systems. Use the options
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
for the compiler to make sure that large files work.
Sample Compilation String
g++ osm_processor.cpp -std=c++11 -lpthread -lz -lexpat -lbz2
6. Basic Types
All the types and classes described in this chapter are value types, i.e. they are small and can be copied around cheaply.
IDs
Typedef: osmium::object_id_type
Include: <osmium/osm/types.hpp>
For object IDs use the type osmium::object_id_type. It is a 64bit signed integer that can represent the more than 2 billion nodes we already have in OSM. While way and relation IDs could theoretically use a smaller ID type (signed 32 bit are currently enough), for consistency and to be future-proof, they will also use this type in most cases.
OSM objects always have positive IDs. But some software (such as JOSM) uses negative IDs for objects that have not yet been uploaded to the main OSM database. To support these use cases, the object_id_type is a signed integer.
Some parts of Osmium, notably the different index classes, can only work with positive IDs. In those cases the type osmium::unsigned_object_id_type is used. If you know that your data only contains positive IDs or only negative IDs, you can use the positive_id() member function on the Object class to get IDs of that type. It will return the absolute value of the ID.
If your data contains a mix of positive and negative IDs, this simple approach will fail! In that case you have to use two indexes, one for the positive IDs and one for the negative IDs. The osmium::handler::NodeLocationsForWay class takes this approach.
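The reason a single index fails for mixed data can be sketched without the library. The snippet below is a minimal illustration using only the standard library; the names object_id, unsigned_object_id, abs_id, IndexKey and split_id are hypothetical stand-ins chosen for this example and are not part of Libosmium.

```cpp
#include <cstdint>

using object_id = std::int64_t;            // stand-in for osmium::object_id_type
using unsigned_object_id = std::uint64_t;  // stand-in for osmium::unsigned_object_id_type

// Absolute value of an ID as an unsigned key, similar in spirit to what
// the text describes for positive_id(). Unsigned arithmetic is used so
// that the negation is well defined for every input.
inline unsigned_object_id abs_id(object_id id) noexcept {
    const auto u = static_cast<unsigned_object_id>(id);
    return id < 0 ? (0 - u) : u;
}

// Key for a scheme with two separate indexes (hypothetical helper):
// which index to use, and the key inside that index.
struct IndexKey {
    bool negative;           // true: look in the index for negative IDs
    unsigned_object_id key;  // absolute value of the ID
};

inline IndexKey split_id(object_id id) noexcept {
    return IndexKey{id < 0, abs_id(id)};
}
```

Here abs_id(5) and abs_id(-5) produce the same key, so two different objects would overwrite each other in one index. Keeping the sign as a separate selector, as split_id does, keeps them apart.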
Other Primitive Types
Include: <osmium/osm/types.hpp>
There are several other typedefs:
Type | Description |
---|---|
object_version_type | type for OSM object version number |
changeset_id_type | type for OSM changeset IDs |
user_id_type | type for OSM user IDs |
num_changes_type | type for number of changes in a changeset |
All these types are currently 32bit integers. Version numbers, changeset IDs and User IDs are always positive (they start out with 1). The number of changes can be 0 or larger.
Locations
Class: osmium::Location
Include: <osmium/osm/location.hpp>
In Osmium all positions on Earth are stored in objects of the osmium::Location class. Coordinates are stored as 32 bit signed integers after multiplying the coordinates with osmium::coordinate_precision = 10,000,000. This means we can store coordinates with a resolution of about one centimeter (one unit is roughly 1.1 cm at the equator), good enough for OSM use. The main OSM database uses the same system. We do this to save memory: a 32 bit integer uses only 4 bytes, a double uses 8.
Coordinates are not checked when they are set.
To create a location:
osmium::Location location{9.3, 49.7};
or using integers:
osmium::Location location{93000000, 497000000};
Make sure you are using the right number type or you will get very wrong coordinates.
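The conversion between the two representations can be sketched as follows. This is a minimal illustration of the fixed-point scheme described above, not the library's implementation. The helper names `to_fixed` and `to_degrees` are hypothetical, and rounding to the nearest integer is an assumption.

```cpp
#include <cmath>
#include <cstdint>

// Same factor as osmium::coordinate_precision described above.
constexpr std::int32_t coordinate_precision = 10000000;

// Hypothetical helper: degrees -> fixed-point integer, rounded to nearest.
inline std::int32_t to_fixed(double degrees) {
    return static_cast<std::int32_t>(std::lround(degrees * coordinate_precision));
}

// Hypothetical helper: fixed-point integer -> degrees.
inline double to_degrees(std::int32_t fixed) {
    return static_cast<double>(fixed) / coordinate_precision;
}
```

This also shows why the number type matters: an integer 9 passed where 9.0 degrees was meant would be read as the internal value 9, which is 0.0000009 degrees.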
You can also create an undefined location. This is used for instance for coordinates in ways that are not set yet:
osmium::Location location{};
In a boolean context an undefined location evaluates to false, a defined one to true. So you can write something like:
if (location) {
...defined location here...
}
You can get and set the coordinates using the internal (integer)
format with the x()
and y()
member functions and the external (double)
format with the lon()
and lat()
member functions.
The normal bounds for the longitude and latitude are -180 to 180 and -90 to 90, respectively. But in historic OSM data you can sometimes find locations outside these bounds. Call
location.valid()
to find out if a location is inside those bounds.
The lon()
and lat()
getter calls will throw an exception if the location is
invalid or undefined.
Segments
Class: osmium::Segment
Include: <osmium/osm/segment.hpp>
Segments are the directed connection between two locations. They are not OSM objects but sometimes useful in algorithms.
Undirected Segments
Class: osmium::UndirectedSegment
Include: <osmium/osm/undirected_segment.hpp>
Undirected Segments are connections between two locations. They are not OSM objects but sometimes useful in algorithms.
Boxes
Class: osmium::Box
Include: <osmium/osm/box.hpp>
A box is a rectangle described by the minimum and maximum longitude and latitude. It is used, for instance, in the header of OSM files and in changesets to describe the bounding box.
osmium::Box box;
box.extend(osmium::Location{3.2, 4.3});
box.extend({4.5, 7.2});
box.extend({3.3, 8.9});
std::cout << box; // (3.2,4.3,4.5,8.9)
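To make the effect of the `extend()` calls above easier to follow, here is a minimal stand-alone sketch of a box that grows to cover every point added to it. `SimpleBox` is a hypothetical stand-in that works on plain doubles; it is not how `osmium::Box` is implemented.

```cpp
#include <algorithm>

// Hypothetical stand-in for a bounding box; not the osmium::Box implementation.
struct SimpleBox {
    bool defined = false;
    double min_lon = 0.0;
    double min_lat = 0.0;
    double max_lon = 0.0;
    double max_lat = 0.0;

    // Grows the box so that it also covers the given point.
    void extend(double lon, double lat) {
        if (!defined) {
            // The first point defines both corners.
            min_lon = max_lon = lon;
            min_lat = max_lat = lat;
            defined = true;
            return;
        }
        min_lon = std::min(min_lon, lon);
        min_lat = std::min(min_lat, lat);
        max_lon = std::max(max_lon, lon);
        max_lat = std::max(max_lat, lat);
    }
};
```

Feeding it the three points from the example gives the corners (3.2, 4.3) and (4.5, 8.9), matching the output shown above.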
7. OSM Entities
Osmium works with the four basic types of OSM entities: Nodes, Ways, and Relations (which are all [OSM Objects]) and Changesets. In addition Areas are supported. They are not native OSM objects, but they are treated almost like real OSM objects.
These OSM entities cannot be created like any normal C++ object; they need a buffer to live in. See the next chapter for details. Accessing existing OSM entities on the other hand is easy and straightforward.
OSM Objects
Class: osmium::OSMObject
Include: <osmium/osm/object.hpp>
The osmium::OSMObject
class is the base class for nodes, ways, and relations.
It has accessors for the usual OSM attributes:
osmium::OSMObject& obj = ...;
std::cout << "id=" << obj.id()
<< " version=" << obj.version()
<< " timestamp=" << obj.timestamp()
<< " visible=" << (obj.visible() ? "true" : "false")
<< " changeset=" << obj.changeset()
<< " uid=" << obj.uid()
<< " user=" << obj.user() << "\n";
The changeset()
and uid()
accessor functions return the IDs of the changeset
that created this object version and the User ID of the user creating this version
of the object, respectively. They do not link to an object of that type.
The visible
flag will always be true for normal OSM data, but for history data
or change files it shows whether an object version has been deleted.
In addition each object has a list of tags attached:
const osmium::TagList& tags = obj.tags();
You can iterate over all tags:
for (const auto& tag : obj.tags()) {
std::cout << tag.key() << '=' << tag.value() << '\n';
}
Or you can find specific tags:
const char* highway = obj.tags().get_value_by_key("highway");
if (highway && !std::strcmp(highway, "primary")) {
...
}
Nodes
Class: osmium::Node
Include: <osmium/osm/node.hpp>
A Node
is a kind of OSMObject
. In addition to the things you can do with any
OSMObject, the Node has a Location.
const osmium::Node& node = ...;
double longitude = node.location().lon();
Ways
Classes: osmium::Way
, osmium::WayNode
, osmium::WayNodeList
Include: <osmium/osm/way.hpp>
A Way
is a kind of OSMObject
. In addition to the things you can do with any
OSMObject, a Way has a list of node references:
const osmium::Way& way = ...;
for (const osmium::NodeRef& nr : way.nodes()) {
std::cout << "ref=" << nr.ref() << " location=" << nr.location() << '\n';
}
Relations
Classes: osmium::Relation
, osmium::RelationMember
, osmium::RelationMemberList
Include: <osmium/osm/relation.hpp>
A Relation
is a kind of OSMObject
. In addition to the things you can do with any
OSMObject, a Relation has a list of members:
const osmium::Relation& relation = ...;
const osmium::RelationMemberList& rml = relation.members();
for (const osmium::RelationMember& rm : rml) {
std::cout << rm.type() << rm.ref() << " (role=" << rm.role() << ")\n";
}
Areas
not yet documented
Changesets
Class: osmium::Changeset
Include: <osmium/osm/changeset.hpp>
Changesets contain the metadata for a set of changes to OSM data.
osmium::Changeset
8. Buffers
Include: <osmium/memory/buffer.hpp>
OSM entities have to be stored somewhere in memory. They are complex objects containing an arbitrary number of tags, relations can have any number of members, etc. If we handled those objects like any normal C++ object, creating them would take lots of small memory allocations and many pointer indirections to get at all the parts of the data. Instead OSM entities are created inside so-called buffers. Buffers can have a fixed size or grow as needed. New objects can be added at the end, and they are stored inside those buffers in a reasonably space-efficient manner while still being accessible easily and quickly.
Buffers can be moved around between different parts of your program and even between threads. The content of buffers can even be written to disk as it is and read back in and immediately used “as is” without any serialization or de-serialization step needed.
But all of this has one drawback: It is slightly more complicated to create those objects and they cannot just be instantiated on the stack.
Buffers cannot be copied, because it would be unclear who is responsible for the memory then. But they can be moved.
Creating a Buffer
Buffers exist in two different flavours, those with external memory management and those with internal memory management. If you already have some memory with data in it (for instance read from disk), you create a buffer with external memory management. It is your job then to free the memory once the buffer isn’t used any more. If you don’t have some memory space already, you can create a Buffer object and have it manage the memory internally. It will dynamically allocate memory and free it again after use.
To create a buffer from existing memory you give the address and size to the constructor:
const int buffer_size = 10240;
// The Buffer constructor expects a pointer to unsigned char
auto* mem = static_cast<unsigned char*>(std::malloc(buffer_size));
osmium::memory::Buffer buffer{mem, buffer_size};
This will create an empty buffer with buffer_size
bytes available for use.
If the new buffer already contains some data, you can add the number of bytes already in use as a third parameter to the constructor:
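auto* mem = static_cast<unsigned char*>(std::malloc(buffer_size));
const ssize_t num = read(0, mem, buffer_size);
// check num for errors (-1) before using it
osmium::memory::Buffer buffer{mem, buffer_size, static_cast<std::size_t>(num)};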
To create a buffer with internal memory-management you construct it with the number of bytes it should have initially and a flag that tells Osmium whether it should automatically grow the buffer if it is needed:
const int buffer_size = 10240;
osmium::memory::Buffer buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
osmium::memory::Buffer buffer{buffer_size, osmium::memory::Buffer::auto_grow::no};
Adding Items to the Buffer
You cannot create OSM objects on the stack, they always have to be stored in buffers. To create OSM objects special “builder” classes are used:
void add_tags(osmium::memory::Buffer& buffer, osmium::builder::Builder* builder) {
osmium::builder::TagListBuilder tl_builder{buffer, builder};
tl_builder.add_tag("amenity", "restaurant");
}
const int buffer_size = 10240;
osmium::memory::Buffer node_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
{
osmium::builder::NodeBuilder builder{node_buffer};
builder.add_user("foo");
osmium::Node& obj = builder.object();
obj.set_id(1);
obj.set_version(1);
obj.set_changeset(5);
obj.set_uid(140);
obj.set_timestamp("2016-01-05T01:22:45Z");
obj.set_location(osmium::Location{9.0, 49.0});
add_tags(node_buffer, &builder);
}
node_buffer.commit();
// do something with the buffer (e.g. write to file)
Building OSM entities and adding them to a buffer has some pitfalls. A buffer has to be
aligned (padding with zeros) before committing. If you try to commit a buffer which is
not aligned, your program will fail with Assertion 'buffer.is_aligned()' failed
.
The addition of the attributes version
, changeset
, uid
and timestamp
may be
omitted but you have to add the attribute user
in order to have an aligned buffer.
If the object has references to other OSM objects (tags of an OSM object, node references of a way, members of a relation), you need additional builders for these reference lists. The destructor of one of these builders has to be called before another builder writes data to the buffer.
void build_way(osmium::memory::Buffer& buffer) {
osmium::builder::WayBuilder way_builder{buffer};
way_builder.object().set_id(1);
// set attributes version, changeset, uid and timestamp (all optional)
way_builder.add_user("foo");
{
osmium::builder::WayNodeListBuilder wnl_builder{buffer, &way_builder};
wnl_builder.add_node_ref(osmium::NodeRef (1, osmium::Location()));
wnl_builder.add_node_ref(osmium::NodeRef (2, osmium::Location()));
}
add_tags(buffer, &way_builder);
}
const int buffer_size = 10240;
osmium::memory::Buffer way_buffer{buffer_size, osmium::memory::Buffer::auto_grow::yes};
build_way(way_buffer);
way_buffer.commit();
This will create only a way, the nodes have to be created separately.
Building relations works similarly to building ways. You use an osmium::builder::RelationBuilder instead of the WayBuilder and an osmium::builder::RelationMemberListBuilder instead of the WayNodeListBuilder. The instance of RelationMemberListBuilder has to go out of scope before the TagListBuilder writes the tags to the buffer and vice versa.
Handling a Full Buffer
If a buffer becomes full, there are two different things that can happen:
If the buffer was created with auto_grow::yes
, it will reserve more memory
on the heap and double its size. This will happen without the client code
noticing, but it will invalidate any pointer pointing into the buffer. This
is similar to the behaviour of a std::vector
so it should be familiar to C++
programmers.
If the buffer was created with auto_grow::no
(or if it is a buffer with
external memory management), the exception osmium::buffer_is_full
will be
thrown. In this case you have to catch the exception, either grow the buffer or
create a new one. If you grow the buffer you can keep going at the point where
you left off. If you start a new one, the last object you were writing to the
buffer when the exception was thrown was not committed and you have to write it
again into the new buffer.
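Because growing invalidates pointers, one common way to refer to data inside a growing store is to remember offsets from its start and resolve them only when needed. The following sketch shows the idea with a plain `std::vector`. `GrowingStore` is a hypothetical stand-in and makes no claim about how the Osmium buffer works internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical growable byte store standing in for an auto-growing buffer.
struct GrowingStore {
    std::vector<unsigned char> data;

    // Appends one byte and returns its offset from the start of the store.
    std::size_t append(unsigned char value) {
        data.push_back(value);  // may reallocate and move all bytes
        return data.size() - 1;
    }

    // Resolves an offset to the current address. Call this again after
    // every operation that might have grown the store.
    unsigned char* at(std::size_t offset) {
        return data.data() + offset;
    }
};
```

An offset obtained before the store grew still identifies the same byte afterwards, whereas a raw pointer taken at that time might dangle.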
The CallbackBuffer
Class
Include: <osmium/memory/callback_buffer.hpp>
The CallbackBuffer
is a small wrapper class around the Buffer
class. It
tries to keep the size of the internal buffer beneath a maximum buffer size
specified in the constructor. If the buffer is “full” a callback is called.
// Initialize a callback buffer with default size (1MB) and default max
// size (800kB). You can change those numbers by giving them to the constructor.
osmium::memory::CallbackBuffer cb;
// Set a callback that knows what to do with the buffer, for instance it can
// write it out to disk.
cb.set_callback([&](osmium::memory::Buffer&& buffer) {
...handle buffer...
});
// Add objects to your buffer, for instance like this:
osmium::builder::add_node(cb.buffer(), _id(9), ...);
// Call `possibly_flush()` after each object added to the buffer to check
// the size and possibly call the callback.
cb.possibly_flush();
// ...
// Force a flush of the buffer when you are finished adding data to the buffer.
cb.flush();
Note that the buffer can grow beyond the initial buffer size if needed. This can happen if a new object doesn’t fit into the rest of the buffer available or if no callback function is set (yet).
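The flush-on-threshold behaviour described above can be sketched without Osmium as follows. `SimpleCallbackBuffer` is a hypothetical stand-in that stores bytes in a `std::string`. It only mirrors the function names used above and is not the library class.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the flush-on-threshold idea; not the Osmium class.
class SimpleCallbackBuffer {
    std::string m_data;
    std::size_t m_max_size;
    std::function<void(std::string&&)> m_callback;

public:
    explicit SimpleCallbackBuffer(std::size_t max_size) : m_max_size(max_size) {}

    void set_callback(std::function<void(std::string&&)> callback) {
        m_callback = std::move(callback);
    }

    void add(const std::string& bytes) { m_data += bytes; }

    // Hands the content to the callback and starts over with an empty buffer.
    // Without a callback the data simply stays where it is.
    void flush() {
        if (m_callback && !m_data.empty()) {
            std::string full;
            full.swap(m_data);
            m_callback(std::move(full));
        }
    }

    // Flushes only once the content has reached the maximum size.
    void possibly_flush() {
        if (m_data.size() >= m_max_size) {
            flush();
        }
    }

    std::size_t size() const { return m_data.size(); }
};
```

As in the description above, data keeps accumulating while no callback is set, so the content can exceed the maximum size until a callback is available.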
9. Input and Output
Libosmium can read several different OSM file formats.
Headers
Whenever you want to use Osmium to access OSM files you need to include the right header files and link your program to the right libraries. If you want to support all the different formats you add
#include <osmium/io/any_input.hpp>
and/or
#include <osmium/io/any_output.hpp>
to your C++ files. These headers will pull in all the file formats and all the compression types for input and output, respectively. Usually this is what you want to use. But if you are sure you don’t need all formats or if you don’t have all the libraries needed for all the formats, you can pick and choose formats and compression types.
If you only need some file formats, you can include any combinations of the following headers:
#include <osmium/io/pbf_input.hpp>
#include <osmium/io/xml_input.hpp>
#include <osmium/io/debug_output.hpp>
#include <osmium/io/opl_output.hpp>
#include <osmium/io/pbf_output.hpp>
#include <osmium/io/xml_output.hpp>
If you want compression support, you have to add the includes for the different compression algorithms:
#include <osmium/io/gzip_compression.hpp>
#include <osmium/io/bzip2_compression.hpp>
Or, if you want both anyway, you can just use the shortcut:
#include <osmium/io/any_compression.hpp>
Compression
If you want to use compression you have to include the right header files and
link to the libz
and libbz2
libraries, respectively.
File Formats
XML
For read support you need the expat parser library. Link with:
-lexpat
For write support no special library is needed.
PBF
To build with PBF support you have to compile with threads and need libz
:
-pthread -lz
Note that in older versions of libosmium
you needed to link with the
protobuf
and osmpbf
libraries. They are not used any more. Instead the
protozero header-only library is used.
Reading and Writing OSM Files with Osmium
The osmium::io::File class
Before reading from or writing to an OSM file, you have to instantiate an object of class osmium::io::File. It encapsulates the file name as well as any information about the format of the file. In the simplest case the File class can derive the file format from the file name:
osmium::io::File input_file{"planet.osm.pbf"}; // PBF format
osmium::io::File input_file{"planet.osm.bz2"}; // XML with bzip2 compression
osmium::io::File input_file{"planet.osc.gz"}; // XML change file, gzip compression
The constructor of the File class has a second, optional argument giving the format of the file, which can be used if the format can’t be deduced from the file name. In the simplest form the format argument looks the same as the usual file suffixes:
osmium::io::File input_file{"somefile", "osm.bz2"};
This setting of the format is often needed when reading from STDIN or writing to STDOUT. Both an empty string and a single dash as filename signify STDIN/STDOUT:
osmium::io::File input_file{"-", "osm.bz2"};
osmium::io::File output_file{"", "pbf"};
The format string can also take optional arguments separated by commas.
osmium::io::File output_file{"out.osm.pbf", "pbf,pbf_dense_nodes=false"};
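To illustrate the shape of such a format string, the following sketch splits it into the format name and its options. This is not Osmium's parser. `ParsedFormat` and `parse_format` are hypothetical names, and treating an option without `=` as `true` is an assumption made for the example.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type: the format name plus its key=value options.
struct ParsedFormat {
    std::string format;
    std::map<std::string, std::string> options;
};

// Splits a string such as "pbf,pbf_dense_nodes=false" at the commas.
// The first part is the format, all further parts are options.
inline ParsedFormat parse_format(const std::string& spec) {
    ParsedFormat result;
    std::istringstream in{spec};
    std::string part;
    bool first = true;
    while (std::getline(in, part, ',')) {
        if (first) {
            result.format = part;
            first = false;
            continue;
        }
        const auto pos = part.find('=');
        if (pos == std::string::npos) {
            result.options[part] = "true";  // assumption: bare option means true
        } else {
            result.options[part.substr(0, pos)] = part.substr(pos + 1);
        }
    }
    return result;
}
```

For the string used above this yields the format `pbf` and a single option `pbf_dense_nodes` with the value `false`.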
It is also possible to change the format after creating a File object using the accessor functions:
osmium::io::File input_file{"some_file.osm"};
input_file.set_format(osmium::io::file_format::pbf);
Reading a File
After you have a File object you can instantiate a Reader object to open the file for reading:
osmium::io::File input_file{"input.osm.pbf"};
osmium::io::Reader reader{input_file};
As a shortcut you can just give a file name to the Reader if you are relying on the automatic file format detection and don’t want to do any special format handling:
osmium::io::Reader reader{"input.osm.pbf"};
Optionally you can add a second argument to the Reader constructor giving the types of OSM entities you are interested in. Sometimes you only need, say, the ways from the file, but not the nodes and relations. If you tell the Reader about it, it might be able to read the file more efficiently by skipping those parts you are not interested in:
osmium::io::Reader reader{"input.osm.pbf", osmium::osm_entity_bits::way};
You can set the following flags:
Flag | Description |
---|---|
osmium::osm_entity_bits::nothing | Do not read any entities at all (useful if you are only interested in the file header) |
osmium::osm_entity_bits::node | Read nodes |
osmium::osm_entity_bits::way | Read ways |
osmium::osm_entity_bits::relation | Read relations |
osmium::osm_entity_bits::changeset | Read changesets |
osmium::osm_entity_bits::all | Read all of the above |
You can also “or” several flags together if needed.
You can get the header information from the file using the header()
function:
osmium::io::Header header = reader.header();
You read the OSM entities from the file using the read()
function, which returns a
buffer with the data:
while (osmium::memory::Buffer buffer = reader.read()) {
...
}
At the end of the file an invalid buffer is returned which evaluates to false in boolean context.
You can close the file at any time. It will also be automatically closed when the Reader object goes out of scope.
reader.close();
In most cases you do not want to work with the buffers, but with the OSM entities within them. See the [Iterators] chapter and the [Handlers] chapter for more convenient methods of working with open files.
The File Header
Some OSM file formats contain a file header. The most popular formats, XML and PBF, have a header, as does the O5M/O5C format. The OPL format doesn’t have a header.
You access the header information of a file you are reading from the Reader
object with the header()
method:
osmium::io::Header header = reader.header();
When writing a file the header can be set in the constructor of the Writer
object, see below.
The header can contain any number of bounding boxes, although usually there is only a single one (or none). PBF files only allow a single bounding box, but XML files can have multiple ones, although it is unusual and the semantics are unclear, so it is discouraged to create files with multiple bounding boxes.
The header contains a flag telling you whether this file can contain multiple versions of the same object. This is true for history files and for change files, but not for normal OSM data files. Not all OSM file formats can distinguish between those cases, so the flag might be wrong.
In addition the header can contain any number of key-value pairs with
additional information. Most often this is used to set the generator
, the
program that generated the file. Depending on the file format some of these
key-value pairs are handled specially. Because there is no generic header
option facility in OSM files, you can only read/write options that Osmium
recognizes. Unknown options or options not suitable for the file format you
are writing are silently ignored.
See the description of the osmium::io::Header
and the osmium::util::Options
class for details on setting and accessing these options.
These header options are recognized by Osmium:
Format | R/W | Option | Description |
---|---|---|---|
XML,PBF | r/w | generator | The program that generated this file. If this is not set by an application, Libosmium will set it to libosmium/VERSION on writing. |
XML | r/w | xml_josm_upload | Value of the upload attribute on the osm XML element (true or false) for use in JOSM. |
XML | r | version | File version (currently always set to 0.6). |
PBF, O5M/O5C | r | timestamp | (Replication) timestamp (1). |
PBF | r | pbf_dense_nodes | Set when reading a PBF file with DenseNodes (2). |
PBF | r | pbf_optional_feature_# | Set for all optional features specified in PBF header (3). |
PBF | r/w | osmosis_replication_timestamp | Timestamp used in replication (1, 4). |
PBF | r/w | osmosis_replication_sequence_number | Sequence number used in replication (4). |
PBF | r/w | osmosis_replication_base_url | Base URL for change files used in replication (4). |
PBF | r/w | sorting | Sorting of the file (5). |
O5M/O5C | r | o5m_timestamp | (Replication) timestamp (1). |
Notes:
1. The timestamp field is set to the same value as either osmosis_replication_timestamp or o5m_timestamp (if available). When writing a file, a timestamp option is ignored; you have to use one of the other ones.
2. To disable DenseNodes when writing a file (they are enabled by default), you have to set this option not on the Header but on the File object.
3. Example: When there are two optional features named “Foo” and “Bar” set in the PBF header, the options pbf_optional_feature_0=Foo and pbf_optional_feature_1=Bar are set.
4. See the section “What are the replication fields for?” on https://wiki.openstreetmap.org/wiki/PBF_Format for details.
5. Read or write the optional header property Sort.Type_then_ID if set to Type_then_ID.
Writing a File
To create an OSM file, create an instance of the osmium::io::Writer
class
and move buffers with OSM objects into its write()
function:
osmium::memory::Buffer buffer;
// Add objects to the buffer (see above) or read it from
// an input file using osmium::io::Reader::read().
osmium::io::File output_file{"output.osm.pbf"};
osmium::io::Writer writer{output_file};
writer.write(std::move(buffer));
writer.close();
As a shortcut, you can directly give the filename to the Writer if you are relying on the automatic file format detection (the same as for Readers) and don’t need any special handling.
osmium::io::Writer writer{"output.osm.pbf"};
You can give additional arguments to the constructor of the Writer class, for instance a customized header or to allow writing over an existing file:
osmium::io::Header header;
header.set("generator", "FastOSMTool");
osmium::io::Writer writer{"output.osm.pbf",
header,
osmium::io::overwrite::allow,
osmium::io::fsync::yes};
10. Iterators
Every C++ programmer is familiar with iterators and their flexibility. There is no reason we couldn’t take advantage of that and of the many algorithms supplied by the STL. So libosmium supports several different kinds of iterators to access OSM data. You can iterate over all OSM objects in a buffer, or over all objects from a data source (usually a file), or over a bunch of pointers to OSM objects, and there are output iterators to write to files, too. All these different iterators can be used consistently and easily from your code without having to know much about what’s underneath. And because they work just like STL iterators do, you can use all the algorithms from the STL.
Some of these iterators will keep track of underlying buffers and make sure the buffers and the data in them stay around as long as there is an iterator pointing to it. This adds some overhead but makes using the data much easier.
Accessing Data in Buffers
Buffers containing OSM entities support the usual begin()
, end()
, cbegin()
,
and cend()
functions:
osmium::memory::Buffer buffer = ...;
auto it = buffer.begin();
auto end = buffer.end();
for (; it != end; ++it) {
std::cout << it->type() << "\n";
}
Of course you can also use the C++11 for
loop:
for (auto& item : buffer) {
...
}
Accessing Data from Files
osmium::io::Reader reader{"input.osm"};
osmium::io::InputIterator<osmium::io::Reader> in{reader};
osmium::io::InputIterator<osmium::io::Reader> end;
11. Handlers
If you process OSM data with libosmium to do something (e.g. convert to a different format, import into a database, build a routing graph), you will usually create one or more handlers.
Handlers are created by deriving a class from osmium::handler::Handler
which defines
methods for all OSM object types, i.e. a method node(const osmium::Node&)
for nodes, a
method way(const osmium::Way&)
for ways etc.
You have to implement the methods for the object types you want to process. Libosmium
will read the data and feed it object by object into the handler, where you can do whatever
you want. Your handler may have temporary storage, e.g. if you want to sum up the length of
all roads in an OSM file.
#include <iostream>
#include <osmium/handler.hpp>
#include <osmium/io/any_input.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/way.hpp>
#include <osmium/visitor.hpp>
class MyHandler : public osmium::handler::Handler {
public:
void way(const osmium::Way& way) {
std::cout << "way " << way.id() << '\n';
for (const osmium::Tag& t : way.tags()) {
std::cout << t.key() << "=" << t.value() << '\n';
}
}
void node(const osmium::Node& node) {
std::cout << "node " << node.id() << '\n';
}
};
int main() {
auto otypes = osmium::osm_entity_bits::node | osmium::osm_entity_bits::way;
osmium::io::Reader reader{"input.osm.pbf", otypes};
MyHandler handler;
osmium::apply(reader, handler);
reader.close();
}
The example above reads an OSM file and writes some information about nodes
and ways to STDOUT
.
You can define multiple handlers. Osmium will feed the objects into the
handlers one after another. Just add the additional handlers to
osmium::apply()
which accepts a reader and one or multiple handlers.
Multiple handlers are necessary if you want to access the locations of the
nodes referenced by a way because the way itself only contains references to
the nodes. A special handler has to offer methods to look up the location by
the ID of a node. The best index type for this NodeLocationsForWays
handler
depends on the size of the file, the available memory and the operating system.
See the Osmium Concept Manual for details.
#include <iostream>
#include <osmium/handler.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/way.hpp>
#include <osmium/io/any_input.hpp>
#include <osmium/visitor.hpp>
#include <osmium/index/map/sparse_mem_array.hpp>
#include <osmium/handler/node_locations_for_ways.hpp>
class MyHandler : public osmium::handler::Handler {
public:
void way(const osmium::Way& way) {
std::cout << "way " << way.id() << '\n';
for (const auto& n : way.nodes()) {
std::cout << n.ref() << ": " << n.lon() << ", " << n.lat() << '\n';
}
}
};
int main() {
auto otypes = osmium::osm_entity_bits::node | osmium::osm_entity_bits::way;
osmium::io::Reader reader{"input.osm.pbf", otypes};
namespace map = osmium::index::map;
using index_type = map::SparseMemArray<osmium::unsigned_object_id_type, osmium::Location>;
using location_handler_type = osmium::handler::NodeLocationsForWays<index_type>;
index_type index;
location_handler_type location_handler{index};
MyHandler handler;
osmium::apply(reader, location_handler, handler);
reader.close();
}
You can find lots of examples of how to use a handler in the examples directory of libosmium and in the osmium-contrib repository.
12. Working with relations
Working with relations is more complicated than working with just nodes and ways. But relations contain a lot of interesting data, first and foremost the multipolygon relations needed for proper area support. To work with relations you usually have to somehow combine the relation objects with their member objects. Libosmium contains a lot of building blocks that can help you do that.
One often used approach looks like this: You read the OSM file containing the data you want to work on (either the planet or some extract) twice. On the first pass only relations are read and kept in main memory. On the second pass nodes, ways, and relations are read and matched to the in-memory relations they are a member of. This approach works quite well, because a) libosmium can read OSM data really fast, so reading a file twice isn’t as expensive as you might imagine, and b) because there aren’t that many relations in the OSM data compared to the number of nodes and ways. You could keep the nodes and ways in memory to later match them to the relations, but this would need a lot more memory. And it can’t handle the case properly where there are relations that are members of other relations, because you do not know that you might need a member relation before you see the parent relation.
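The two-pass idea can be sketched independently of Osmium with a deliberately simplified data model. All names below (`SimpleRelation`, `SimpleWay`, `collect_wanted_ways`, `match_members`) are hypothetical and exist only for this illustration. Real relations can also have node and relation members, which this sketch ignores.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical simplified data model, for illustration only.
struct SimpleRelation {
    std::int64_t id;
    std::vector<std::int64_t> member_way_ids;
};

struct SimpleWay {
    std::int64_t id;
};

// Pass 1: read only the relations and remember which way IDs they refer to.
inline std::set<std::int64_t> collect_wanted_ways(const std::vector<SimpleRelation>& relations) {
    std::set<std::int64_t> wanted;
    for (const auto& relation : relations) {
        wanted.insert(relation.member_way_ids.begin(), relation.member_way_ids.end());
    }
    return wanted;
}

// Pass 2: read the ways and keep only those some relation asked for.
inline std::map<std::int64_t, SimpleWay> match_members(const std::vector<SimpleWay>& ways,
                                                       const std::set<std::int64_t>& wanted) {
    std::map<std::int64_t, SimpleWay> found;
    for (const auto& way : ways) {
        if (wanted.count(way.id) != 0) {
            found.emplace(way.id, way);
        }
    }
    return found;
}
```

Only the relations and the ways they actually reference are kept in memory. All other ways are looked at once and dropped.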
This chapter describes how you can use the RelationsManager
class to
implement this approach in your code that can handle any kind of relation
you like. It will then describe how you can use the MultipolygonManager
that specifically does this for multipolygon relations. And after that we
look at the classes used behind the scenes if you need to go deeper.
Note that there are classes used in earlier versions of libosmium for similar
work, namely the osmium::relations::Collector
and
osmium::area::MultipolygonCollector
classes. They are still available, but
deprecated now. Please use the manager classes instead.
Using the RelationsManager
The RelationsManager
class handles the whole process outlined above of
storing relations in memory and later matching OSM member objects to their
parent relations. Once all the pieces of a relation have been assembled it
will call your code to actually do something with the relation. Internally
it uses several other classes described in the next chapter.
To use the RelationsManager
create your own class deriving from it. The
RelationsManager
uses the Curiously recurring template pattern
(CRTP)
to call into your code.
#include <osmium/relations/relations_manager.hpp>
class YourManager : public osmium::relations::RelationsManager<YourManager, true, true, false> {
...
};
As you can see the first template parameter of the RelationsManager
is your
class, the next three template parameters tell the RelationsManager whether
you are interested in member nodes, ways, and/or relations, respectively. So
the code above says: I only want to handle members of type node and way, but
not members of type relation. If a parameter is set to false
the code in
the class will behave as if there are no objects of the given class in the
input file, your code will never see them.
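For readers unfamiliar with CRTP, here is a minimal stand-alone sketch of the mechanism. `ManagerBase`, `RouteManager`, and `AcceptAllManager` are hypothetical names, and the `new_relation()` signature is simplified to take a string. The sketch only shows how a base class template can call into the class derived from it without virtual functions.

```cpp
#include <string>

// Hypothetical base class: calls into the derived class without virtual functions.
template <typename TDerived>
class ManagerBase {
public:
    // Default behaviour, used if the derived class does not define its own.
    bool new_relation(const std::string& /*type*/) const { return true; }

    bool handle(const std::string& type) const {
        // The cast is safe because TDerived derives from ManagerBase<TDerived>.
        return static_cast<const TDerived&>(*this).new_relation(type);
    }
};

// Defines its own new_relation(); ManagerBase::handle() will call this version.
class RouteManager : public ManagerBase<RouteManager> {
public:
    bool new_relation(const std::string& type) const { return type == "route"; }
};

// Defines nothing, so the default from the base class is used.
class AcceptAllManager : public ManagerBase<AcceptAllManager> {};
```

This is why the derived class itself has to be the first template argument: the base class needs the type to cast to.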
Usually you want to override several functions in this class that tell
the RelationsManager
how to behave:
The new_relation()
function is called for every relation encountered in the
input data. Usually this function should first decide whether your code is
interested in this relation, typically by looking at the type
tag. You can
then do any processing on the relation that doesn’t require the actual member
objects to be available. To “express interest” in this relation, return true
,
the relation is then “remembered” by the RelationsManager
for further
processing, otherwise the RelationsManager
ignores this relation.
bool new_relation(const osmium::Relation& relation) noexcept {
return relation.tags().has_tag("type", "route");
}
If you have expressed an interest in a relation, the new_member()
function is
called for each member. Again, you should first decide whether you are
interested in this member, for instance depending on its type or role. Remember
that at this time you only have the member type, id, and role available, not
the whole object. You can then do any processing you need and return true
or
false
depending on whether you are interested in this member. The default
is to simply return true
for all members which is often enough because you
already specified which types of members you are interested in using the
template parameters of the RelationsManager
class.
bool new_member(const osmium::Relation& /*relation*/, const osmium::RelationMember& /*member*/, std::size_t /*n*/) noexcept {
return true;
}
These two functions are called during the first pass through the data. All
remaining functions are called during the second pass. The most important
is the complete_relation()
function. It is called for each relation you
have expressed an interest in once all the members have been found in the
input file. So when this is called you have access to the relation object
as well as all the member objects.
Here is an example:
void complete_relation(const osmium::Relation& relation) {
// Iterate over all members
for (const auto& member : relation.members()) {
// member.ref() will be 0 for all members you are not interested
// in. The objects for those members are not available.
if (member.ref() != 0) {
// Get the member object
const osmium::OSMObject* obj = this->get_member_object(member);
// If you know which type you have you can also use any of these:
const osmium::Node* node = this->get_member_node(member.ref());
const osmium::Way* way = this->get_member_way(member.ref());
const osmium::Relation* member_relation = this->get_member_relation(member.ref());
}
}
}
The pointers returned from the get_member_* functions will be nullptr if the member is not available. If you do the member.ref() != 0 check first, all members are available and you don’t need to check for nullptr.
You have to do all the processing of your relation in this function. Once you return from this function, the relation and its members will be removed from memory to make space for more data. If you will need the data again, you have to store it yourself somewhere.
The RelationsManager keeps an output Buffer for you. You can write objects into this buffer, later to be used in your application or written out to disk. Say you want to write out all member objects with a building tag:
void complete_relation(const osmium::Relation& relation) {
for (const auto& member : relation.members()) {
if (member.ref() != 0) {
// Get the member object using the accessor shown in the previous example
const osmium::OSMObject* obj = this->get_member_object(member);
if (obj != nullptr && obj->tags().has_key("building")) {
this->buffer().add_item(*obj);
this->buffer().commit();
}
}
}
}
We’ll see later where this buffer ends up and how you can access it.
There are a bunch more functions that you can override if you need to. The functions before_node(), before_way(), and before_relation() are called for each node, way, or relation, respectively, before the member handling is done. The functions after_node(), after_way(), and after_relation() are called for each node, way, or relation, respectively, after the member handling is done. The functions node_not_in_any_relation(), way_not_in_any_relation(), and relation_not_in_any_relation() are called when the node, way, or relation isn’t a member in any relation.
Note that these functions are only called if you have set the corresponding template parameters of the RelationsManager to true.
Here is the sequence of processing for each object:
- Call before_node/way/relation()
- Have we expressed any interest in this object for any relation?
  - If yes, store the object in memory. Call complete_relation() for all relations that were “completed” by this object, i.e. where this object was the last missing member.
  - If no, call node/way/relation_not_in_any_relation()
- Call after_node/way/relation()
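The bookkeeping behind this sequence can be illustrated with a toy model that uses only the standard library. It is not how libosmium implements the RelationsManager: the ToyManager type, its member names, and the bare integer IDs are all invented for this illustration. The first pass records which members each relation of interest is waiting for; the second pass ticks them off and reports a relation as complete when its last missing member arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model only: all names here are hypothetical, not libosmium API.
struct ToyManager {
    // relation ID -> number of members still missing
    std::unordered_map<std::int64_t, std::size_t> missing;
    // member ID -> relations waiting for that member
    std::unordered_map<std::int64_t, std::vector<std::int64_t>> wanted_by;
    // relations reported as complete, in order of completion
    std::vector<std::int64_t> completed;
    // objects that no relation asked for
    std::vector<std::int64_t> not_in_any_relation;

    // First pass: remember a relation and the members it needs.
    void first_pass_relation(std::int64_t relation_id,
                             const std::vector<std::int64_t>& member_ids) {
        missing[relation_id] = member_ids.size();
        for (const auto id : member_ids) {
            wanted_by[id].push_back(relation_id);
        }
    }

    // Second pass: called once for every object in the input.
    void second_pass_object(std::int64_t object_id) {
        const auto it = wanted_by.find(object_id);
        if (it == wanted_by.end()) {
            not_in_any_relation.push_back(object_id);
            return;
        }
        for (const auto relation_id : it->second) {
            assert(missing[relation_id] > 0);
            if (--missing[relation_id] == 0) {
                completed.push_back(relation_id); // this was the last missing member
            }
        }
    }
};
```

A relation whose members never all show up in the second pass simply stays in the missing map with a non-zero count, which corresponds to the incomplete relations discussed further down.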
Now that you have customized your class, you can use it like this:
int main(int argc, char* argv[]) {
// You'll need some OSM input file
osmium::io::File input_file{argv[1]};
// Instantiate your manager class
YourManager manager;
// First pass through the file
osmium::relations::read_relations(input_file, manager);
// Second pass through the file
osmium::io::Reader reader{input_file};
osmium::apply(reader, manager.handler());
// Access data in output buffer
osmium::memory::Buffer buffer = manager.read();
...
}
If your manager stores its results internally in some way, or writes no objects or only a few objects into the output buffer, the code above is enough. But if the output buffer can grow too large, you have to handle it while the data is being read.
Using the Output buffer
The RelationsManager has an internal instance of the CallbackBuffer class (see chapter 8). In the class derived from the RelationsManager you use the buffer() function to get access to its internal buffer. You can add objects to this buffer as explained above.
The RelationsManager::handler() function sets the callback for this buffer. Your callback function is called whenever the internal buffer is full and you must do something with the buffer from there, for instance write it out to disk.
// Second pass through the file
osmium::io::Reader reader{input_file};
osmium::apply(reader, manager.handler([&](osmium::memory::Buffer&& buffer){
// This will be called whenever the buffer is "full" (see below).
// Handle the buffer here.
}));
}
Incomplete Relations
If you work with extracts of the planet, your extract will usually not have all relations complete, i.e. some members of some relations are missing because they are located beyond the boundary of the extract. For these relations you will never get a call to complete_relation(). If you still want to see these relations, call the for_each_incomplete_relation() function on the manager:
manager.for_each_incomplete_relation([&](const osmium::relations::RelationHandle& handle){
// Access relation from handle:
const osmium::Relation& relation = *handle;
// Access members
for (const auto& member : handle->members()) {
if (member.ref() == 0) {
// we did not express interest in the member
} else {
const auto* object = manager.get_member_object(member);
if (object == nullptr) {
// member was not in input data
} else {
// member was in input data
}
}
}
// do something with the relation
});
The RelationHandle works a bit like a pointer giving you access to the underlying Relation using operator*() and operator->().
MultipolygonManager and MultipolygonManagerLegacy
Multipolygons are a type of relation at OpenStreetMap (they are tagged with type=multipolygon) to model areas with inner rings and areas with multiple outer rings. Osmium provides a relations manager for multipolygons and boundary relations (which work like multipolygons but are tagged with type=boundary) called osmium::area::MultipolygonManager.
If you are working with older OSM data (before about June 2017) you have to take old-style multipolygons into account. They are not supported by the osmium::area::MultipolygonManager class, but you can use the osmium::area::MultipolygonManagerLegacy class instead.
There are lots of examples of how to use a MultipolygonManager, e.g.
- osmium_area_test.cpp in the Osmium examples
- OSM Area Tools
If the RelationsManager is not enough
The RelationsManager class has a lot of built-in flexibility allowing you to change its functionality by overriding many of its functions in a derived class. If this is not enough, you can use the following classes as a basis for your own implementation. Look at the RelationsManager code as an example of how they are used and take it from there.
- The RelationsManagerBase class is the base class of the RelationsManager class. It mostly is a convenient container for the ItemStash, RelationsDatabase, MembersDatabase and CallbackBuffer classes, but it doesn’t contain much more.
- The ItemStash (see chapter XXX) is used to store relations and member objects in memory.
- The RelationsDatabase is used to keep track of all relations we are interested in. It uses the ItemStash for internal storage.
- The MembersDatabase is used to keep track of all member objects we are interested in. It must always be used together with a RelationsDatabase. It uses the ItemStash for internal storage.
13. Creating Geometries
OSM objects describe where something is and what it is. The what is described by the tags, the where, the “geometry” is “encoded” in the locations (longitude and latitude) of the nodes for simple points, in the locations of the nodes in a way forming a linestring (or, possibly, a polygon if the first and last node are the same), and more complex geometrical objects (such as multipolygons) if relations are involved.
For many use cases the geometry of an OSM object (or OSM objects) is important. After all, if you want to render a map, you need the geometry of everything in it. That is why libosmium has many functions to create the different kinds of geometries from OSM objects. The whole exercise is made more difficult because different software packages represent geometries in C++ programs in many different ways. Osmium knows about several of them.
Example: Creating a point geometry from a node
As an introductory example, we’ll look at how a point geometry can be created from a node.
#include <osmium/geom/wkt.hpp>
const osmium::Node& node = ...; // got this from somewhere
osmium::geom::WKTFactory<> factory;
std::string wkt = factory.create_point(node);
First you need a geometry factory. Those factories know how to convert OSM objects into different kinds of geometry representations. The WKTFactory creates geometries in the WKT (Well Known Text) format which is just a string like POINT(3.567 25.642).
Then you use the factory to create the point from the node and you are done.
Geometry types
Libosmium can create the following geometry types:
Geometry type | from these objects | with function |
---|---|---|
Point | Node, NodeRef, Location | create_point() |
LineString | Way, WayNodeList | create_linestring() |
Polygon | Way, WayNodeList | create_polygon() |
MultiPolygon | Area | create_multipolygon() |
Notes:
- LineStrings can also be created in reverse and with or without duplicate nodes. See the reference documentation for details.
- Polygons can only be created from closed ways or way node lists.
Factories
Libosmium supports the following factories for different geometry formats:
WKT
Well-known text is a simple text based format with geometries that look like POINT(2.2452 41.3124) or LINESTRING(1.1554 2.5215, 1.1453 2.5663). They can be created like this:
#include <osmium/geom/wkt.hpp>
osmium::geom::WKTFactory<> factory;
The factory constructor takes an optional integer argument with the precision (number of digits after the decimal point), the default is 7, which is enough for OSM.
osmium::geom::WKTFactory<> factory{3}; // three digits after decimal point
All creation functions return a std::string:
std::string point = factory.create_point(node);
std::string line = factory.create_linestring(way);
...
WKB
Well-known binary is a simple binary format. Create the factory like this:
#include <osmium/geom/wkb.hpp>
osmium::geom::WKBFactory<> factory;
The factory constructor takes two optional arguments. The first decides whether you want WKB (wkb_type::wkb, default) or Extended WKB (EWKB, wkb_type::ewkb), the second decides whether to output in raw binary (out_type::binary, default) or in hex encoded binary (out_type::hex).
To create extended WKB in hex format as used by PostGIS for example:
osmium::geom::WKBFactory<> factory{osmium::geom::wkb_type::ewkb,
osmium::geom::out_type::hex};
All creation functions return a std::string:
std::string point = factory.create_point(node);
std::string line = factory.create_linestring(way);
...
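Hex output is the same byte sequence with every byte written as two hexadecimal digits, which is convenient when the geometry has to travel through a text channel such as an SQL statement. The helper below is a stand-alone sketch of that encoding, not the library's code; the function name to_hex and the choice of upper-case digits are assumptions made for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not libosmium API): encode raw bytes as hexadecimal text.
std::string to_hex(const std::string& binary) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(binary.size() * 2);
    for (const char c : binary) {
        const auto byte = static_cast<unsigned char>(c);
        out += digits[byte >> 4];   // high nibble
        out += digits[byte & 0x0F]; // low nibble
    }
    assert(out.size() == binary.size() * 2);
    return out;
}
```

For example, the five bytes 01 01 00 00 00 that start a little-endian WKB point (byte order marker followed by geometry type 1) become the text 0101000000.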
GEOS
The functions for creating GEOS geometries are deprecated and work only until GEOS 3.5. If you want to use them beyond that, contact the libosmium developers by opening an issue on the GitHub repository.
GEOS is an Open Source library with powerful operations to work with and modify geometries. To use it from libosmium:
#include <osmium/geom/geos.hpp>
osmium::geom::GEOSFactory<> factory;
You can also set the SRID used by GEOS (default is -1, unset):
osmium::geom::GEOSFactory<> factory{4326};
If this is not flexible enough for your case, you can also create a GEOS factory yourself and then the libosmium factory from it:
geos::geom::PrecisionModel geos_pm;
geos::geom::GeometryFactory geos_factory{&geos_pm, 4326};
osmium::geom::GEOSFactory<> factory{geos_factory};
Note: GEOS keeps a pointer to the factory it was created from in each geometry. You have to make sure the factory is not destroyed before all the geometries created from it have been destroyed!
All creation functions return a unique_ptr to the GEOS geometry:
std::unique_ptr<geos::geom::Point> point = factory.create_point(node);
std::unique_ptr<geos::geom::LineString> line = factory.create_linestring(way);
...
GDAL/OGR
The GDAL/OGR library is very popular. Almost all Open Source GIS tools use it in one form or another to read or write geometries from/to files or databases in dozens of different formats (Shapefiles, Spatialite, PostGIS, etc.). You can use it from libosmium, too:
#include <osmium/geom/ogr.hpp>
osmium::geom::OGRFactory<> factory;
The factory constructor doesn’t take any special arguments.
All creation functions return a unique_ptr to the OGR geometry:
std::unique_ptr<OGRPoint> point = factory.create_point(node);
std::unique_ptr<OGRLineString> line = factory.create_linestring(way);
...
GeoJSON
The GeoJSON format describes how to encode geometries in JSON. Libosmium has two different GeoJSON factories. One creates normal std::strings with the JSON data. The other uses the RapidJSON library. Both only create the geometry portion of the JSON structure for you. You have to add the feature structure with the properties yourself as needed for your use case.
The GeoJSONFactory takes an optional precision as argument like the WKT constructor:
#include <osmium/geom/geojson.hpp>
osmium::geom::GeoJSONFactory<> factory{6};
std::string point = factory.create_point(node);
The RapidGeoJSONFactory takes a form of rapidjson::Writer as argument. Here is an example:
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <osmium/geom/rapid_geojson.hpp>
typedef rapidjson::Writer<rapidjson::StringBuffer> writer_type;
rapidjson::StringBuffer stream;
writer_type writer{stream};
osmium::geom::RapidGeoJSONFactory<writer_type> factory{writer};
Please see the RapidJSON documentation for details about the Writer class.
Using projections
Before creating the geometries, libosmium can convert the coordinates from the OSM objects into different coordinate systems using a projection. This projection is given as a template parameter to the factory class:
osmium::geom::WKTFactory<> factory; // default identity projection (EPSG 4326)
or
osmium::geom::WKTFactory<osmium::geom::IdentityProjection> factory; // same
Often used is the Web Mercator projection (EPSG 3857):
#include <osmium/geom/mercator_projection.hpp>
osmium::geom::WKTFactory<osmium::geom::MercatorProjection> factory;
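For orientation, the spherical Web Mercator mapping can be written down directly. The sketch below uses the usual formulas x = R·λ and y = R·ln(tan(π/4 + φ/2)) with R = 6378137 m and angles in radians. It is an independent illustration rather than libosmium's implementation, and the names mercator_x and mercator_y are chosen here for the example.

```cpp
#include <cassert>
#include <cmath>

constexpr double earth_radius_m = 6378137.0;
constexpr double pi = 3.14159265358979323846;

double deg_to_rad(double degrees) {
    return degrees * pi / 180.0;
}

// Longitude in degrees to Web Mercator x in metres.
double mercator_x(double lon_deg) {
    return earth_radius_m * deg_to_rad(lon_deg);
}

// Latitude in degrees to Web Mercator y in metres.
double mercator_y(double lat_deg) {
    assert(lat_deg > -90.0 && lat_deg < 90.0); // the poles map to infinity
    return earth_radius_m * std::log(std::tan(pi / 4.0 + deg_to_rad(lat_deg) / 2.0));
}
```

With these formulas a longitude of 180° maps to roughly 20037508.34 m, which is the familiar half-width of the Web Mercator world.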
The identity and Mercator projection are handled internally in libosmium. But you can also use any projection implemented by the Proj.4 library:
#include <osmium/geom/projection.hpp>
osmium::geom::Projection projection{"+init=epsg:31467"}; // Gauss-Krueger GK3
osmium::geom::WKTFactory<osmium::geom::Projection> factory{projection};
You need to link with -lproj if you use this library. See the documentation of the Proj.4 library on the different ways to initialize a projection using a projection string.
Exceptions
Factory functions throw osmium::geometry_error exceptions if something went wrong creating a geometry.
Implementing your own factory
The geometry formats already implemented should cover a lot of uses, but if you need to implement your own format factory, you can do so based on the code in libosmium. You have to implement your own SomeFormatFactoryImpl class that implements the make_point(), linestring_start(), linestring_add_location(), linestring_finish(), polygon_start(), polygon_add_location(), polygon_finish(), multipolygon_start(), multipolygon_polygon_start(), multipolygon_polygon_finish(), multipolygon_outer_ring_start(), multipolygon_outer_ring_finish(), multipolygon_inner_ring_start(), multipolygon_inner_ring_finish(), multipolygon_add_location(), and multipolygon_finish() functions. These functions are usually very small, adapting the data to the desired format. All the real logic is in the provided GeometryFactory parent class.
Then all you need to do is define the alias template
template <class TProjection = IdentityProjection>
using SomeFormatFactory = GeometryFactory<SomeFormatFactoryImpl, TProjection>;
and you are done.
Use the other implementations as examples and ask if you have any questions.
14. Storage
Osmium offers several different indexes suitable for different use cases. You have to choose a suitable index type. See the Osmium Concepts Manual for a list of available index types.
If you want to choose the index type at run time, you can use osmium::index::MapFactory. The following code listing shows its usage. location_index_type is a variable you either set based on the preferences of the user of your program or based on your own estimates (e.g. file size).
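#include <string>
#include <osmium/handler/node_locations_for_ways.hpp>
#include <osmium/index/map.hpp>
// Include the header of every index type you want the factory to know about
#include <osmium/index/map/sparse_mem_array.hpp>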
using index_type = osmium::index::map::Map<osmium::unsigned_object_id_type, osmium::Location>;
using location_handler_type = osmium::handler::NodeLocationsForWays<index_type>;
std::string location_index_type = "sparse_mem_array";
const auto& map_factory = osmium::index::MapFactory<osmium::unsigned_object_id_type, osmium::Location>::instance();
auto location_index = map_factory.create_map(location_index_type);
location_handler_type location_handler{*location_index};
The ItemStash class
Occasionally you want to store OSM objects in main memory and find them again later. To store the data you can use a Buffer (see chapter 8), but that is sometimes not enough. The ItemStash class might help.
An instance of the ItemStash class is used to keep any number of Items in memory. Usually those items are OSM objects, but it will work for any kind of item that can also be stored in a buffer. In fact, internally ItemStash uses an auto-growing buffer for this.
You add objects using add_item() which returns an opaque handle that can be used to later get the item back (using get_item()) or remove the item using remove_item(). The handle remains valid regardless of the operations you are doing on the stash (until you remove the item, which invalidates its handle). This is different from any pointers or references into the ItemStash memory, which are invalidated by calls to add_item().
osmium::ItemStash stash;
const osmium::OSMObject& object = ...;
auto handle = stash.add_item(object);
// ...
const auto& stored_object = stash.get<osmium::OSMObject>(handle);
// ...
stash.remove_item(handle);
The ItemStash will internally manage the memory needed and occasionally do a garbage collection which will purge all removed items. You can call garbage_collect() to force this.
Note that the ItemStash does not keep any indexes to find objects by ID or by other means (for instance tags). It only finds the items again using the handles. It is your job to keep the handles somewhere and index them if necessary. Handles are small value-type objects, feel free to copy them around.
A default constructed osmium::ItemStash::handle_type can be used as an invalid handle.
15. Exceptions
Libosmium uses various C++ standard exceptions and some Osmium-specific exceptions to tell you about problems. All Osmium-specific exceptions are in the osmium namespace, they are all derived from one of the standard C++ exceptions, usually std::runtime_error or std::system_error.
List of Osmium Exceptions
Exception | Derived from | Description |
---|---|---|
osmium::io_error | | Some kind of input/output error. Derived classes describe the error in more detail. |
osmium::xml_error | io_error | Some kind of XML parser error. |
osmium::format_version_error | io_error | The OSM file format version was not understood. Osmium currently can only read version 0.6 files. |
osmium::geometry_error | | Some kind of geometry error. |
osmium::projection_error | | Thrown when a projection from one coordinate system into another fails in some way. Either the projection can’t be initialized because of invalid parameters or the projection can’t be calculated because the coordinates can’t be transformed into the target coordinate system. |
osmium::not_found | | This exception is thrown when a key is not found in an index. |
osmium::invalid_location | | |
osmium::unknown_type | | Thrown by visitors when they encounter an unknown (or in this context unexpected) item type in a buffer. This should not happen in usual circumstances. |
Standard Exceptions thrown by Osmium
std::invalid_argument
- Thrown by some Osmium functions.
16. Handling of invalid data
Libosmium can, to a certain extent, handle data that is invalid in the sense that it is not allowed in the OSM database or even might be nonsensical, for instance longitudes larger than 180°. This section explains the details and reasons.
There are good reasons for this behaviour:
- Libosmium is trying to be a generic library and, while some values might be considered invalid in the OSM context, they might be considered valid in other contexts.
- Sometimes data is not checked for performance reasons. You are expected to give correct data to libosmium. If you don’t, the result is undefined. Usually data will be checked in debug builds anyway using the assert macro.
- Invalid values do creep in and libosmium wants to be able to help you fix them. But unless it can read and handle the invalid data, you can’t write a program that takes this data and fixes it in some way.
- The OSM database historically was more lenient than it is today. But libosmium must work with OSM files created years ago and history dumps of the database. So it allows those invalid values we know used to exist, but are not allowed any more today.
Generally more low-level classes and functions (such as basic classes Location, Node, Tag, etc.) are more lenient for flexibility, while higher level functions (such as file I/O) might be more strict to support typical use cases.
File input and output
It is possible to encode some data in OSM files that can be considered to be invalid. When reading and writing OSM files libosmium does not care about that. It will give you the data in the form it is in the file and write out data you give to it in that form.
Order of objects in files
OSM objects in OSM files are usually ordered by type, ID (and version for history files). This is a useful convention, but it is not necessarily so. All OSM file formats allow the data to be in any order and libosmium can read and write those files. Whenever you read data using libosmium, it will be given to you in the order it is in the file, whatever that is. Whenever you write data, you must give it to libosmium in the order you want it to end up in the file.
Note that the ordering of objects in a file might influence the size of the file. Some file formats (notably PBF) will encode the data better if the same types of objects are together and even better if they are ordered by ID.
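If you need to bring objects into the conventional order yourself before writing them, the comparison is a plain lexicographic one on type, ID, and version. The following is a sketch with stand-in types: ToyType, ToyObject, and type_id_version_less are invented for this example, and plain numeric order is assumed for the IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-ins for the relevant parts of an OSM object.
enum class ToyType { node = 0, way = 1, relation = 2 };

struct ToyObject {
    ToyType type;
    std::int64_t id;
    std::uint32_t version;
};

// Strict weak ordering by type, then ID, then version.
bool type_id_version_less(const ToyObject& a, const ToyObject& b) {
    return std::tie(a.type, a.id, a.version) < std::tie(b.type, b.id, b.version);
}
```

A function like this can be passed to std::sort or std::is_sorted as the comparison argument.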
IDs
OSM node, way, relation, and changeset IDs are always positive. Zero is allowed by libosmium and understood as the “unset” or “don’t know” value. Negative values are also allowed because some programs (JOSM for instance) use negative IDs as temporary IDs. Not all parts of libosmium will just work with negative integers, though, you might have to handle them specially in some way. Indexes usually only work with positive IDs, if you have to handle negative IDs, use two indexes, one for positive IDs and one for negative IDs that you have to transform first.
OSM uses a different ID space for each entity type (nodes, ways, relations, changesets) and gives out IDs starting from 1. Libosmium allows any kind of ID that fits into an unsigned 64bit int, but some parts, notably the indexes, are optimized for smaller and more or less contiguous integers.
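The two-index approach for negative IDs mentioned above could look roughly like this. The sketch uses plain std::vector storage in place of a real libosmium index, and the class name SignedIdIndex and its members are invented for the example. Negative IDs are negated before they are used as a position in the second store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper, not libosmium API: one dense store per sign.
template <typename TValue>
class SignedIdIndex {
    std::vector<TValue> m_positive; // slot i holds the value for ID i
    std::vector<TValue> m_negative; // slot i holds the value for ID -i

public:
    void set(std::int64_t id, const TValue& value) {
        assert(id != 0); // 0 means "unset" and is not stored
        auto& store = id > 0 ? m_positive : m_negative;
        const auto slot = static_cast<std::size_t>(id > 0 ? id : -id);
        if (store.size() <= slot) {
            store.resize(slot + 1);
        }
        store[slot] = value;
    }

    // Returns a default-constructed value for IDs that were never set.
    TValue get(std::int64_t id) const {
        const auto& store = id > 0 ? m_positive : m_negative;
        const auto slot = static_cast<std::size_t>(id > 0 ? id : -id);
        return slot < store.size() ? store[slot] : TValue{};
    }
};
```

The dense vectors are only reasonable for small, more or less contiguous IDs, which matches the remark about what the indexes are optimized for.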
User ID
The user ID has to be zero (“unknown” or “anonymous”) or a positive integer. Negative values are not allowed in libosmium.
Timestamps
Timestamps are stored internally as seconds since the epoch (1970-01-01). Although OSM was founded much later, timestamps are not checked. Libosmium uses a few special values here. Time 0 is the “unknown” value, time 1 is understood to be “before any other time value” and 2^32-1 is understood to be “after any other time value”.
Locations
Locations are given in WGS84 longitude and latitude. Both libosmium and the OSM database store the coordinates internally as signed 32bit integers. 32bit integers have a range somewhat larger than the -180° to 180° longitude and -90° to 90° latitude. Values outside this range, but inside the signed 32bit integers are possible and historic OSM data contains such values. Use the Location::valid() function to check whether a location is in the proper range.
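The arithmetic behind this can be sketched in a few lines. The sketch assumes that coordinates are stored as degrees multiplied by 10^7 (seven digits after the decimal point). The names to_fixed and in_valid_range are made up for this illustration; real code should call Location::valid() instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr std::int32_t precision = 10000000; // assumed: 7 decimal digits

// Hypothetical helper: degrees to the fixed-point integer representation.
std::int32_t to_fixed(double degrees) {
    return static_cast<std::int32_t>(std::lround(degrees * precision));
}

// Range check on the fixed-point values (x = longitude, y = latitude).
bool in_valid_range(std::int32_t x, std::int32_t y) {
    return x >= -180 * precision && x <= 180 * precision
        && y >=  -90 * precision && y <=  90 * precision;
}
```

Under this assumption a signed 32bit integer can hold values up to about ±214.7°, so a longitude of 200° is representable but fails the range check.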
Strings and UTF-8
OSM strings use UTF-8 encoding, but a lot of the libosmium code doesn’t care about that and doesn’t check that a string is valid UTF-8. This is mostly for performance reasons, but it could also allow other character sets in non-OSM uses of the library.
Historically the OSM database sometimes contained non-UTF-8 strings. This should have all been fixed by now.
These parts of the library don’t care about string encoding:
- PBF input and output
These parts of the library do care about string encoding:
- OPL and debug output expect UTF-8 encoded data and escape non-printable characters accordingly
Strings and control characters
OSM strings (user names, tag keys and values, and roles) can not contain certain control characters. The reason is that those control characters can’t be expressed in XML. (XXX More details needed.)
Strings in OSM can only have a maximum length of 256 unicode characters. Libosmium’s input and output routines allow any length up to 2^16 bytes. (XXX More details needed.)
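The character limit and the byte limit measure different things, because a single character can take several bytes in UTF-8. The sketch below counts characters (code points) by skipping UTF-8 continuation bytes. The function name utf8_length is invented here, and the input is assumed to be valid UTF-8, which libosmium itself does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of code points in a valid UTF-8 string.
std::size_t utf8_length(const std::string& text) {
    std::size_t count = 0;
    for (const char c : text) {
        // Continuation bytes have the bit pattern 10xxxxxx and are not counted.
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    assert(count <= text.size());
    return count;
}
```

The string "café" encoded as UTF-8, for instance, has 4 characters but occupies 5 bytes.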
17. Run-time Configuration
Osmium reads some settings from environment variables. This allows you to set configuration options for the library at run-time without any support from the application using the library. Setting these variables is usually not needed in normal operations but could be useful when debugging or tweaking performance.
OSMIUM_POOL_THREADS
The number of threads in the thread pool used for certain input/output operations.
If this is a negative number, it will be set to the actual number of cores on the system plus the given number, i.e. it will leave a number of cores unused. In all cases the minimum number of threads in the pool is 1.
Default: -2
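The rule for negative values can be sketched as follows. The function name effective_pool_threads is invented for this example and the core count is passed in so that the logic is easy to check. How the library treats a value of 0 is not covered by the text above; the sketch simply clamps it to 1, and the library's own code may apply further limits.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the rule described in the text.
int effective_pool_threads(int requested, int cores) {
    assert(cores >= 1);
    // Negative values mean "leave that many cores unused".
    const int threads = requested < 0 ? cores + requested : requested;
    return std::max(threads, 1); // never fewer than one thread
}
```

With the default of -2, an 8-core machine would get 6 pool threads and a 2-core machine would get 1. In a real program the core count could come from std::thread::hardware_concurrency().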
OSMIUM_USE_POOL_THREADS_FOR_PBF_PARSING
Normally PBF parsing will use the thread pool. You can disable this by setting this variable to false.
Default: true
Queue Sizes
The following environment variables can be used to change the queue sizes used for file IO:
- Raw input data queue: OSMIUM_MAX_INPUT_QUEUE_SIZE (default 20)
- Parsed OSM input data queue: OSMIUM_MAX_OSMDATA_QUEUE_SIZE (default 20)
- Output data queue: OSMIUM_MAX_OUTPUT_QUEUE_SIZE (default 20)
- Worker threads input queue: OSMIUM_MAX_WORK_QUEUE_SIZE (default 10)
Smaller queue sizes mean that potentially less memory is used, but it also means that the work can’t be parallelized as effectively.
The minimum value for all queue sizes is 2. When set to 0, the default is used.
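The fallback and minimum rules can be expressed in a small helper. This is a sketch of the rules as stated here, not the library's parsing code: the function name queue_size_from is hypothetical, and treating text that does not start with a digit like an unset variable is an assumption of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: turn the text of a queue size setting into a size.
std::size_t queue_size_from(const char* text, std::size_t default_value) {
    assert(default_value >= 2);
    std::size_t value = 0;
    if (text != nullptr && std::isdigit(static_cast<unsigned char>(*text))) {
        value = static_cast<std::size_t>(std::strtoull(text, nullptr, 10));
    }
    if (value == 0) {
        return default_value; // unset, explicitly 0, or not starting with a digit
    }
    return std::max<std::size_t>(value, 2); // enforce the minimum of 2
}
```

It would be called with the result of std::getenv, for instance queue_size_from(std::getenv("OSMIUM_MAX_INPUT_QUEUE_SIZE"), 20).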
OSMIUM_CLEAN_PAGE_CACHE_AFTER_READ
Since 2.17.0 Osmium will, when reading files, tell the kernel using fadvise that it can remove pages from the buffer cache that are not needed any more. This is usually beneficial, because the memory can be used for something else. But if you are reading the same OSM file multiple times at the same time or in short succession, it might be better to keep those buffer pages.
Since 2.17.1 you can set the environment variable OSMIUM_CLEAN_PAGE_CACHE_AFTER_READ to no and Osmium will not call fadvise. Set it to yes or anything else (or don’t set it at all) to get the default behaviour.
18. Changes from old versions of Osmium
This version has some substantial changes from the “old Osmium” available from https://github.com/joto/osmium and users of the “old Osmium” will have to rewrite their code. Use the examples provided in the “example” directory or in the osmium-contrib repository to get an idea what needs changing. These examples are often similar to the examples provided with the old Osmium so they should give you an idea how your code has to change.
Here are some of the more important changes:
- Osmium now needs C++11. It will not work with older compilers. You need at least GCC 4.8 or clang (LLVM) 3.4.
- Namespaces are now all lower case. Everything is in the “osmium” namespace or sub-namespaces of it. Many classes and functions have been moved to different, more logical or shorter namespaces.
- You can’t just instantiate OSM objects such as Nodes, Ways, or Relations. You need a Buffer first to hold them and use the Builder classes to create them. This is a bit more cumbersome, but greatly reduces the need for memory management and makes Osmium faster and easier to parallelize.
- Usually you don’t act on single OSM objects any more, but on groups of them in a Buffer.
- Reading and writing of OSM data is much simpler. Use the Reader and Writer classes as they hide much of the detail and have much nicer interfaces than the old Input/Output classes.
- The class Osmium::OSM::Position was renamed to osmium::Location. This better reflects that it is a location on the planet we are talking about. The word “position” has many meanings and is, for instance, often used to denote a position in a file or buffer.
- The dependency on Boost has been greatly reduced. C++11 offers many features that used to be only available with Boost, such as shared_ptr. Libosmium now uses the C++11 versions of these.
- Osmium now makes use of the new C++11 threading support when reading and writing OSM files.