OPL File Format

1. Introduction

The OPL (“Object Per Line”) format was created to allow easy access to and manipulation of OpenStreetMap data with typical UNIX command line tools such as grep, sed, and awk, or typical scripting languages such as Python, Ruby or Perl. It is also great for writing compact test cases.

In an OPL file each OSM object is on its own line with a newline character at the end. Each line contains fields separated by spaces.

This makes ad-hoc manipulation of OSM data easy, but it is not as fast as a specialized tool.

Uncompressed OPL files are only about half the size of OSM XML files. When compressed (with gzip or bzip2), the two formats are about the same size.

Osmium can read and write OPL files.

2. File Format

Each line of the file contains one OSM object (a node, way, or relation) or an OSM changeset. Lines end in a newline character.

Each line is made up of several fields separated by a space character. Each field is introduced by a specific character defining the type of the field.

When OPL files are written by Osmium, the fields always appear in the same order in a line and are always all present (except when the file is written without metadata, see below).

When Osmium parses a line, fields can appear in any order and all fields except the first one are optional.

Fields in OSM data files

One of these fields is always the first:

n - Node ID (nodes only)
w - Way ID (ways only)
r - Relation ID (relations only)

Then in the given order:

v - Version
d - Deleted flag ('V' - visible or 'D' - deleted)
c - Changeset ID
t - Timestamp (ISO Format)
i - User ID
u - Username
T - Tags
x - Longitude (nodes only)
y - Latitude (nodes only)
N - Nodes (ways only)
M - Members (relations only)

If the file was written without metadata (using the option add_metadata=false in Osmium), the fields v, d, c, t, i, and u are missing.

The t, N, M, and T fields can be empty. If the user is anonymous, the ‘User ID’ will be 0 and the ‘Username’ field will be empty: ... i0 u .... If the node is deleted, the ‘Longitude’ and ‘Latitude’ fields are empty. All other fields always contain data.
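
As a minimal sketch of the field layout described above, the following Python function splits one line of an OSM data file into its fields. The function name split_fields and the dictionary keys "type" and "id" are hypothetical names chosen for this illustration; this is not how Osmium parses the format. Field values are returned as raw, still-escaped text.

```python
def split_fields(line):
    """Split one OPL line into a dict mapping field letter -> raw field text.

    Fields may appear in any order and may be missing, so nothing except
    the first field (object type and ID) is assumed to be present.
    """
    parts = line.rstrip("\n").split(" ")
    fields = {
        "type": parts[0][0],       # 'n', 'w', or 'r'
        "id": int(parts[0][1:]),   # decimal object ID
    }
    for part in parts[1:]:
        if part:                   # ignore empty pieces from stray spaces
            fields[part[0]] = part[1:]
    return fields


# Anonymous user: user ID 0 and an empty username field ("u").
line = "n1 v2 dV c3 t2020-01-01T00:00:00Z i0 u Thighway=bus_stop x8.5 y47.3\n"
node = split_fields(line)
print(node["type"], node["id"], node["T"])
```

Empty fields such as the username of an anonymous user come back as empty strings, so callers can tell "present but empty" apart from "missing".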

Fields in OSM changeset files

For changesets the fields are:

c - Changeset ID
k - num_changes
s - created_at (start) timestamp (ISO Format)
e - closed_at (end) timestamp (ISO Format)
d - number of comments in the discussion
i - User ID
u - Username
x - Longitude (left bottom corner, min_lon)
y - Latitude (left bottom corner, min_lat)
X - Longitude (right top corner, max_lon)
Y - Latitude (right top corner, max_lat)
T - Tags

The field e is empty when the changeset is not closed yet. The fields x, y, X, Y can be empty when no bounding box could be derived. The field k can be 0. The field T can be empty if there are no tags.
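
The empty-field rules for changesets can be handled as in the following sketch. The function name parse_changeset and the keys of the returned dictionary are assumptions made for this example; the username is left in its escaped form.

```python
def parse_changeset(line):
    """Parse one changeset line, mapping empty fields to None or 0."""
    raw = {part[0]: part[1:] for part in line.rstrip("\n").split(" ") if part}

    def to_float(text):
        # An empty coordinate means no bounding box could be derived.
        return float(text) if text else None

    return {
        "id": int(raw["c"]),
        "num_changes": int(raw.get("k") or 0),
        "created_at": raw.get("s") or None,
        "closed_at": raw.get("e") or None,    # None while still open
        "num_comments": int(raw.get("d") or 0),
        "uid": int(raw.get("i") or 0),
        "user": raw.get("u", ""),
        "bbox": tuple(to_float(raw.get(k, "")) for k in ("x", "y", "X", "Y")),
        "tags": raw.get("T", ""),
    }


# An open changeset without changes, bounding box, or tags.
changeset = parse_changeset("c42 k0 s2020-01-01T00:00:00Z e d0 i7 ualice x y X Y T\n")
print(changeset["closed_at"], changeset["bbox"])
```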

Changeset discussions do not appear in the OPL format!

3. Encoding

Numbers

Numbers, such as IDs and version numbers, are written as decimal digits.

Timestamps

Timestamps are written in the same ISO 8601 format used in OSM XML files: yyyy-mm-ddThh:mm:ssZ. The time zone is always Z.

If the timestamp is not set, it will show up empty.
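
A small sketch of reading and writing such timestamps in Python, using only the standard library. The function names are hypothetical; an unset (empty) timestamp is represented as None here, which is a choice made for this example.

```python
from datetime import datetime, timezone

OPL_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(text):
    """Convert the content of a timestamp field to an aware UTC datetime."""
    if not text:
        return None                      # timestamp not set
    return datetime.strptime(text, OPL_TIME_FORMAT).replace(tzinfo=timezone.utc)


def format_timestamp(ts):
    """Convert an aware datetime (or None) back to the OPL representation."""
    if ts is None:
        return ""
    return ts.astimezone(timezone.utc).strftime(OPL_TIME_FORMAT)


ts = parse_timestamp("2012-10-14T18:49:37Z")
print(format_timestamp(ts))
```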

Deleted flag

The ‘Deleted flag’ shows whether an object version has been deleted (dD) or is visible (dV). For normal OSM data files this is always dV, but change files and OSM history files can contain deleted objects.

Text in user names, tags, and roles

User names, tags, and relation member roles can contain any valid Unicode character. Any characters that have special meaning in OPL files (space, newline, ‘,’ (comma), ‘=’ (equals), ‘@’, and ‘%’) must be escaped, as must any non-printing characters.

Escaped characters are written as %xxxx%, i.e. a percent sign followed by the hex code of the Unicode code point followed by another percent sign. The number of digits in the hex code is not fixed, but must be between 1 and 6, because all Unicode code points can be expressed in not more than 6 hex digits.

Do not write two percent characters directly after one another (%%); the result is currently undefined.

Any code reading OPL files has to cope with encoded and non-encoded characters (except that characters used in the OPL file with special meaning will always be encoded).

Currently there is a hard-coded list in the Osmium source of all the characters that don’t need escaping. This list is incomplete and subject to change. Currently two hex digits are used for code points less than 256 and at least four hex digits for code points above that.

(An older version of OPL tried to encode characters as %xxxx with always 4 hex digits, but this doesn’t work because Unicode code points can need more digits.)
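
The escaping rules above can be sketched in Python as follows. The function names opl_escape and opl_unescape are hypothetical. The set of characters escaped on writing is an assumption: this sketch escapes the listed special characters plus everything Python's str.isprintable() rejects, which is not the same as the hard-coded list in Osmium. The decoder accepts both escaped and unescaped characters, as any reader must.

```python
import re

# One escape sequence: '%', 1 to 6 hex digits, '%'.
_ESCAPE = re.compile(r"%([0-9a-fA-F]{1,6})%")

# Characters with special meaning in OPL files.
_SPECIAL = set(" \n,=@%")


def opl_unescape(text):
    """Replace every %xxxx% sequence by the code point it names.

    Raises ValueError if the hex value is not a valid Unicode code point.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def opl_escape(text):
    """Escape special and non-printing characters (assumed character set)."""
    out = []
    for ch in text:
        if ch in _SPECIAL or not ch.isprintable():
            cp = ord(ch)
            width = 2 if cp < 256 else 4   # minimum number of hex digits
            out.append("%{:0{}x}%".format(cp, width))
        else:
            out.append(ch)
    return "".join(out)


print(opl_escape("Main Street"))
print(opl_unescape("Main%20%Street"))
```

Because a literal percent sign is itself escaped on writing, every '%' in well-formed escaped text belongs to an escape sequence, which is what lets the decoder use a single regular expression.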

Tags

Tags are written in the form key=value. Several tags are joined by commas (,). Any equal sign or comma in the key or value is escaped.
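
Because equal signs and commas inside keys and values are always escaped, the content of a T field can be split on the raw characters first and unescaped afterwards. A minimal sketch, with hypothetical function names and a small unescape helper that follows the rules in the section on text encoding:

```python
import re

_ESCAPE = re.compile(r"%([0-9a-fA-F]{1,6})%")


def unescape(text):
    """Decode %xxxx% escape sequences (helper for this example)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_tags(field):
    """Parse the content of a T field (without the leading 'T') into a dict."""
    tags = {}
    if not field:                  # the T field can be empty
        return tags
    for item in field.split(","):
        key, _, value = item.partition("=")
        tags[unescape(key)] = unescape(value)
    return tags


tags = parse_tags("highway=residential,name=Main%20%Street")
print(tags)
```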

Nodes in ways

Nodes in ways are written as a comma-separated list of nID combinations.

Optionally node locations can also appear in the node list in ways. In this case they are encoded as nIDxLONyLAT.
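
Both forms of the node list can be read with one regular expression, as in this sketch. The function name parse_way_nodes is hypothetical. Accepting negative IDs and treating empty coordinates as a missing location are assumptions of this example, not rules stated by the format description.

```python
import re

# "n" + ID, optionally followed by "x" + longitude and "y" + latitude.
_WAY_NODE = re.compile(r"^n(-?[0-9]+)(?:x([-0-9.]*)y([-0-9.]*))?$")


def parse_way_nodes(field):
    """Parse the content of an N field into (node_id, location) pairs.

    location is a (lon, lat) tuple, or None if the list carries no location.
    """
    nodes = []
    if not field:                  # the N field can be empty
        return nodes
    for item in field.split(","):
        match = _WAY_NODE.match(item)
        if match is None:
            raise ValueError("not a valid way node: " + item)
        lon, lat = match.group(2), match.group(3)
        location = (float(lon), float(lat)) if lon and lat else None
        nodes.append((int(match.group(1)), location))
    return nodes


print(parse_way_nodes("n1,n2,n1"))
print(parse_way_nodes("n1x8.5y47.3,n2x-0.1y51.5"))
```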

Relation members

Relation members consist of the type n, w, or r, the ID, an at-sign (@) and the role. Several members are joined by commas (‘,’). Any at-sign or comma in the roles is escaped.
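
A member list can be taken apart in the same way as a tag list: split on the raw commas, then on the first at-sign, and unescape the role last. The names in this sketch are hypothetical, and the unescape helper follows the rules in the section on text encoding.

```python
import re

_ESCAPE = re.compile(r"%([0-9a-fA-F]{1,6})%")


def unescape(text):
    """Decode %xxxx% escape sequences (helper for this example)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_members(field):
    """Parse the content of an M field into (type, ref, role) tuples."""
    members = []
    if not field:                  # the M field can be empty
        return members
    for item in field.split(","):
        ref, _, role = item.partition("@")
        members.append((ref[0], int(ref[1:]), unescape(role)))
    return members


members = parse_members("n1@via,w2@from,w3@to,r4@")
print(members)
```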

4. Format Overview

Some lines have been broken in this description for easier reading. In the file format they are not broken: each object is on a single line.

NODE:
    n(OBJECT_ID) v(VERSION) d(V|D) c(CHANGESET_ID) t(yyyy-mm-ddThh:mm:ssZ)
    i(USER_ID) u(USERNAME) T(TAGS) x(LONGITUDE) y(LATITUDE)

WAY:
    w(OBJECT_ID) v(VERSION) d(V|D) c(CHANGESET_ID) t(yyyy-mm-ddThh:mm:ssZ)
    i(USER_ID) u(USERNAME) T(TAGS) N(WAY_NODES)

RELATION:
    r(OBJECT_ID) v(VERSION) d(V|D) c(CHANGESET_ID) t(yyyy-mm-ddThh:mm:ssZ)
    i(USER_ID) u(USERNAME) T(TAGS) M(MEMBERS)

CHANGESET:
    c(CHANGESET_ID) k(NUM_CHANGES) s(yyyy-mm-ddThh:mm:ssZ) e(yyyy-mm-ddThh:mm:ssZ)
    d(NUM_COMMENTS) i(USER_ID) u(USERNAME)
    x(LONGITUDE) y(LATITUDE) X(LONGITUDE) Y(LATITUDE) T(TAGS)

TAGS:
    (KEY)=(VALUE),...

WAY_NODES:
    n(NODE_REF),...
    or
    n(NODE_REF)x(LONGITUDE)y(LATITUDE),...

MEMBERS:
    [nwr](MEMBER_REF)@(MEMBER_ROLE),...
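
The NODE layout above is also easy to produce, which is what makes OPL handy for small test cases. The following sketch builds one node line in the field order shown. The function names and parameters are hypothetical, the set of escaped characters is an assumption (see the section on text encoding), and coordinates are written with Python's default float formatting rather than whatever precision Osmium uses.

```python
def escape(text):
    """Escape special and non-printing characters (assumed character set)."""
    out = []
    for ch in text:
        if ch in " \n,=@%" or not ch.isprintable():
            cp = ord(ch)
            out.append("%{:0{}x}%".format(cp, 2 if cp < 256 else 4))
        else:
            out.append(ch)
    return "".join(out)


def format_node(node_id, version, visible, changeset, timestamp,
                uid, user, tags, lon, lat):
    """Return one OPL node line (without the trailing newline)."""
    tag_text = ",".join(
        "{}={}".format(escape(k), escape(v)) for k, v in tags.items()
    )
    fields = [
        "n{}".format(node_id),
        "v{}".format(version),
        "dV" if visible else "dD",
        "c{}".format(changeset),
        "t{}".format(timestamp),                  # may be empty
        "i{}".format(uid),
        "u{}".format(escape(user)),
        "T{}".format(tag_text),
        "x{}".format("" if lon is None else lon),  # empty for deleted nodes
        "y{}".format("" if lat is None else lat),
    ]
    return " ".join(fields)


print(format_node(1, 2, True, 3, "2020-01-01T00:00:00Z", 0, "",
                  {"name": "Main Street"}, 8.5, 47.3))
```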

5. Usage Examples

Here are some examples of how the OPL format can be used to easily get some data out of an OSM file.

Note that some of these commands generate quite a lot of output. You might want to add a | less or redirect into a file. For larger OSM files some of these commands might take quite a while, so try them out on small files first.

Find all objects tagged highway=...:

egrep "( T|,)highway=" data.osm.opl

Find all IDs of ways tagged highway=...:

egrep '^w' data.osm.opl | egrep "( T|,)highway=" | cut -d' ' -f1 | cut -c2-

Find all nodes with version > 9:

egrep '^n' data.osm.opl | egrep -v ' v. '

Find the first fields of the relation with the highest version number:

egrep '^r' data.osm.opl | sort -b -n -k 2.2,2 | tail -1 | cut -d' ' -f1-7

Find all objects with changeset ID 123:

egrep ' c123 ' data.osm.opl

Count how many objects were created in each hour of the day:

egrep ' v1 ' data.osm.opl | cut -d' ' -f5 | cut -dT -f2 | \
    cut -d: -f1 | sort | uniq -c

Find all closed ways:

egrep '^w' data.osm.opl | egrep ' N(n[0-9]+),(.*,)?\1( |$)'

Find all ways tagged with area=yes that are not closed:

egrep '^w' data.osm.opl | egrep 'area=yes' | egrep -v ' N(n[0-9]+),(.*,)?\1( |$)'

Find all users who have created post boxes:

egrep ' v1 ' data.osm.opl | egrep 'amenity=post_box' | \
    cut -d' ' -f7 | cut -c2- | sort -u

Find all node IDs used in via roles in relations:

egrep '^r' data.osm.opl | sed -e 's/^.* M\([^ ]*\).*$/\1/' | egrep '@via(,|$)' | \
    sed -e 's/,/\n/g' | egrep '^n.*@via$' | cut -d@ -f1 | cut -c2- | sort -nu

Find all nodes that have any tags, ignoring created_by tags:

egrep '^n' data.osm.opl | egrep -v ' T( |$)' | \
    sed -E -e 's/ Tcreated_by=[^, ]+,?/ T/' -e 's/,created_by=[^, ]+//' | egrep -v ' T( |$)'

Count tag key usage:

sed -e 's/^.* T\([^ ]*\).*$/\1/' data.osm.opl | egrep -v '^$' | sed -e 's/,/\n/g' | \
    cut -d= -f1 | sort | uniq -c | sort -nr
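
For comparison, the same count can be sketched in Python, one of the scripting languages mentioned in the introduction. The function name count_tag_keys is hypothetical. Keys are counted in their escaped form, and the T field is looked up by its leading letter instead of its position, so the sketch also works for files written without metadata.

```python
from collections import Counter


def count_tag_keys(lines):
    """Count how often each tag key occurs in an iterable of OPL lines."""
    counts = Counter()
    for line in lines:
        # Skip the first field (object type and ID) and look for "T...".
        for part in line.rstrip("\n").split(" ")[1:]:
            if part.startswith("T") and len(part) > 1:
                for tag in part[1:].split(","):
                    counts[tag.partition("=")[0]] += 1
    return counts


sample = [
    "n1 v1 dV c1 t i1 ua Thighway=bus_stop,name=A x1 y2\n",
    "w2 v1 dV c1 t i1 ua Thighway=residential Nn1,n3\n",
    "n3 v1 dV c1 t i1 ua T x1 y2\n",
]
for key, count in count_tag_keys(sample).most_common():
    print(count, key)
```

To run it on a real file, pass an open file object instead of the sample list, for example count_tag_keys(open("data.osm.opl", encoding="utf-8")).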

Order by object type, object ID, and version (i.e. the usual order for OSM files):

sed -e 's/^r/z/' data.osm.opl | sort -b -k1.1,1.1 -k1.2,1n -k2.2,2n | sed -e 's/^z/r/'

Create statistics on number of nodes in ways:

egrep '^w' data.osm.opl | cut -d' ' -f9 | tr -dc 'n\n' | \
    awk '{a[length]++} END {for(i=1;i<=2000;++i) { print i, a[i] ? a[i] : 0 } }'