osrm-backend/doc/advanced.md
# Protozero Advanced Topics
This documentation covers a mix of advanced topics for Protozero users.
Read the [tutorial](tutorial.md) first if you are new to Protozero.
## Limitations of Protozero
* A protobuf message has to fit into memory completely, otherwise it cannot
  be parsed with this library. There is no streaming support.
* The length of a string, bytes, or submessage cannot exceed 2^31-1 bytes.
* There is no specific support for maps, but they can be used as described in
  the "Backwards compatibility" section of
  https://developers.google.com/protocol-buffers/docs/proto3#maps.
## Checking the Protozero version number
If `protozero/version.hpp` is included, the following macros are set:

| Macro | Example | Description |
| -------------------------- | ------- | ---------------------------------------------- |
| `PROTOZERO_VERSION_MAJOR` | 1 | Major version number |
| `PROTOZERO_VERSION_MINOR` | 3 | Minor version number |
| `PROTOZERO_VERSION_PATCH` | 2 | Patch number |
| `PROTOZERO_VERSION_CODE` | 10302 | Version (major * 10,000 + minor * 100 + patch) |
| `PROTOZERO_VERSION_STRING` | "1.3.2" | Version string |
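The version code formula from the table can be checked with a small standalone sketch. The function name `version_code` is hypothetical and not part of Protozero; it only mirrors the arithmetic given in the table.

```cpp
#include <cassert>

// Hypothetical helper (not part of Protozero): combines the three version
// components the same way the table describes PROTOZERO_VERSION_CODE.
constexpr int version_code(int major, int minor, int patch) {
    return major * 10000 + minor * 100 + patch;
}

// The example row from the table: version 1.3.2 gives 10302.
static_assert(version_code(1, 3, 2) == 10302, "1.3.2 should map to 10302");
```

Because the code is a single integer, a comparison such as `#if PROTOZERO_VERSION_CODE >= 10302` should be enough to guard code that needs a minimum version.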
## Changing Protozero behaviour with macros
The behaviour of Protozero can be changed by defining the following macros.
They have to be set before including any of the Protozero headers.
### `PROTOZERO_STRICT_API`
If this is set, you will get some extra warnings or errors during compilation
if you are using an old (deprecated) interface to Protozero. Enable this if
you want to make sure your code will work with future versions of Protozero.
### `PROTOZERO_USE_VIEW`
Protozero uses the class `protozero::data_view` as the return type of the
`pbf_reader::get_view()` method and a few other functions take a
`protozero::data_view` as parameter.
If `PROTOZERO_USE_VIEW` is unset, `protozero::data_view` is Protozero's own
implementation of a *string view* class.
Set this macro if you want to use a different implementation such as the C++17
`std::string_view` class. In this case `protozero::data_view` will simply be
an alias to the class you specify.

    #define PROTOZERO_USE_VIEW std::string_view
## Repeated fields in messages
The Google Protobuf spec documents that a non-repeated field can actually
appear several times in a message and that the implementation is required to
return the value of the last occurrence of that field in this case.
`pbf_reader.hpp` does not enforce this. If you need this behaviour, you have to
implement it yourself.
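The "last occurrence wins" rule can be sketched without Protozero at all. The following standalone example is only an illustration under simplifying assumptions: the message contains nothing but varint fields, and the helper names `read_varint` and `last_value_of` are hypothetical. With `pbf_reader` the same idea would amount to overwriting a variable each time the field is seen in the `next()` loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes one varint starting at `pos` and advances `pos`.
// Returns false if the input ends before the varint is complete.
inline bool read_varint(const std::string& data, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (unsigned shift = 0; shift < 64 && pos < data.size(); shift += 7) {
        const auto byte = static_cast<unsigned char>(data[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7FU) << shift;
        if ((byte & 0x80U) == 0) {
            return true; // high bit clear: this was the last byte
        }
    }
    return false;
}

// Hypothetical helper: scans a message that consists only of varint fields and
// stores the value of the LAST field with the given tag in `result`.
// Returns false if the tag was not found or the input is outside the scope
// of this sketch (truncated data or a non-varint field).
inline bool last_value_of(const std::string& message, std::uint32_t wanted_tag,
                          std::uint64_t& result) {
    bool found = false;
    std::size_t pos = 0;
    while (pos < message.size()) {
        std::uint64_t key = 0;
        std::uint64_t value = 0;
        // The key holds the field tag in the upper bits and the wire type
        // in the lowest three bits (0 means varint).
        if (!read_varint(message, pos, key) || (key & 0x7U) != 0 ||
            !read_varint(message, pos, value)) {
            return false;
        }
        if ((key >> 3U) == wanted_tag) {
            result = value; // later occurrences overwrite earlier ones
            found = true;
        }
    }
    return found;
}
```

For a message that encodes field 1 = 5, field 2 = 7, field 1 = 9, calling `last_value_of(message, 1, result)` leaves 9 in `result`.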
The [spec also
says](https://developers.google.com/protocol-buffers/docs/encoding#packed)
that you must be able to read a packed repeated field where a not-packed
repeated field is expected and vice versa. There can also be several (packed
or not-packed) repeated fields with the same tag, and their contents must be
concatenated. This is your responsibility; Protozero does not do it for you.
### Using `tag_and_type()`
The `tag_and_type()` free function and the method of the same name on the
`pbf_reader` and `pbf_message` classes can be used to access both packed and
unpacked repeated fields. (It can also be used to check that you have the
right type of encoding for other fields.)
Here is the outline:
```cpp
enum class ExampleMsg : protozero::pbf_tag_type {
    repeated_uint32_x = 1
};

std::string data = ...
pbf_message<ExampleMsg> message{data};
while (message.next()) {
    switch (message.tag_and_type()) {
        case tag_and_type(ExampleMsg::repeated_uint32_x, pbf_wire_type::length_delimited): {
                auto xit = message.get_packed_uint32();
                ... // handle the repeated field when it is packed
            }
            break;
        case tag_and_type(ExampleMsg::repeated_uint32_x, pbf_wire_type::varint): {
                auto x = message.get_uint32();
                ... // handle the repeated field when it is not packed
            }
            break;
        default:
            message.skip();
    }
}
```
All this works on `pbf_reader` in the same way as with `pbf_message` with the
usual difference that `pbf_reader` takes a numeric field tag and `pbf_message`
an enum field.
If you only want to check for one specific tag and type you can use the
two-argument version of `pbf_reader::next()`. In this case `17` is the field
tag we are looking for:
```cpp
std::string data = ...
pbf_reader message{data};
while (message.next(17, pbf_wire_type::varint)) {
    auto foo = message.get_int32();
    ...
}
```
See the test under `test/t/tag_and_type/` for a complete example.
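To see why a combined tag and type value can be used as a `case` label, it may help to look at how a protobuf field key is laid out. The sketch below is standalone and assumes that the combined value follows the protobuf key encoding (field tag in the upper bits, wire type in the lowest three bits); the name `combine_tag_and_type` and the wire type constants are hypothetical stand-ins, not Protozero identifiers.

```cpp
#include <cstdint>

// Wire type numbers as defined by the protobuf encoding.
constexpr std::uint32_t wire_varint = 0;
constexpr std::uint32_t wire_length_delimited = 2;

// Hypothetical stand-in for a tag_and_type()-style helper: packs the field
// tag and the wire type into one integer that is a compile-time constant.
constexpr std::uint32_t combine_tag_and_type(std::uint32_t tag, std::uint32_t wire_type) {
    return (tag << 3U) | wire_type;
}

// The packed and the not-packed encoding of the same field get different
// values, so both can appear as labels in one switch statement.
static_assert(combine_tag_and_type(1, wire_length_delimited) == 10, "packed field 1");
static_assert(combine_tag_and_type(1, wire_varint) == 8, "not-packed field 1");
```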
## Reserving memory when writing messages
If you know beforehand how large a message will become or can take an educated
guess, you can call the usual `std::string::reserve()` on the underlying string
before you give it to a `pbf_writer` or `pbf_builder` object.
Or you can (at any time) call `reserve()` on the `pbf_writer` or `pbf_builder`.
This will reserve the given number of bytes *in addition to whatever is already
in that message*. (Note that this behaviour is different from what `reserve()`
does on `std::string` or `std::vector`.)
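The difference between the two `reserve()` semantics can be illustrated on a plain `std::string`. This is a minimal sketch, not Protozero code; the function name `reserve_additional` is used here only for illustration.

```cpp
#include <cstddef>
#include <string>

// Illustration only: reserves room for `extra` more bytes on top of what the
// buffer already holds. std::string::reserve() itself takes the TOTAL
// capacity, so the current size has to be added explicitly.
inline void reserve_additional(std::string& buffer, std::size_t extra) {
    buffer.reserve(buffer.size() + extra);
}
```

After `reserve_additional(buffer, 100)` on a buffer that already holds 3 bytes, the capacity is at least 103, whereas `buffer.reserve(100)` would only guarantee 100.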
In the general case it is not easy to figure out how much memory you will need
because of the varint packing of integers. But sometimes you can make at least
a rough estimate. Still, you should probably only use this facility if you have
benchmarks proving that it actually makes your program faster.
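For a rough estimate it can help to know how many bytes a given integer needs as a varint. The following standalone sketch computes that length; the name `varint_length` is hypothetical and the function is not taken from Protozero.

```cpp
#include <cstdint>

// Hypothetical helper: number of bytes needed to encode `value` as a varint.
// Each byte carries 7 bits of payload, so the length grows by one for every
// additional 7 bits that are in use.
inline int varint_length(std::uint64_t value) {
    int length = 1;
    while (value >= 0x80U) {
        value >>= 7U;
        ++length;
    }
    return length;
}
```

Values up to 127 take one byte, values up to 16383 take two, and the largest 64-bit value takes ten. Summing these lengths over the integers you plan to write gives a lower bound for the space needed.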
## Using the low-level varint and zigzag encoding and decoding functions
Protozero gives you access to the low-level functions for encoding and
decoding varint and zigzag integer encodings, because these functions can
sometimes be useful outside the Protocol Buffer context.
### Using low-level functions
To use the low-level functions, add this include to your C++ program:
```cpp
#include <protozero/varint.hpp>
```
### Functions
The following functions are then available:
```cpp
decode_varint()
write_varint()
encode_zigzag32()
encode_zigzag64()
decode_zigzag32()
decode_zigzag64()
```
See the reference documentation created by `make doc` for details.
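To make the two encodings concrete, here is a standalone sketch of what varint and zigzag encoding do. It deliberately does not use the Protozero functions listed above, whose exact signatures are described in the reference documentation; all function names below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends `value` as a varint: 7 payload bits per byte, least significant
// group first, with the high bit set on every byte except the last one.
inline void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80U) {
        out.push_back(static_cast<char>((value & 0x7FU) | 0x80U));
        value >>= 7U;
    }
    out.push_back(static_cast<char>(value));
}

// Reads one varint starting at `pos` and advances `pos`.
// Returns false if the input ends before the varint is complete.
inline bool parse_varint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (unsigned shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const auto byte = static_cast<unsigned char>(in[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7FU) << shift;
        if ((byte & 0x80U) == 0) {
            return true;
        }
    }
    return false;
}

// Zigzag maps signed to unsigned so that values with a small magnitude stay
// small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// All bit operations are done on unsigned integers.
inline std::uint32_t zigzag_encode32(std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);
    return (u << 1U) ^ (0U - (u >> 31U));
}

inline std::int32_t zigzag_decode32(std::uint32_t value) {
    return static_cast<std::int32_t>((value >> 1U) ^ (0U - (value & 1U)));
}
```

Zigzag encoding is what makes small negative numbers cheap to store as varints: without it, `-1` would occupy the maximum varint length.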
## Vectored input for length-delimited fields
Length-delimited fields (like string fields, byte fields and messages) are
usually set by calling `add_string()`, `add_message()`, etc. These functions
have several forms, but they basically all take a *tag*, a *size*, and a
*pointer to the data*. They write the length of the data into the message
and then copy the data over.
Sometimes the data is not in one place but spread over several buffers. In
this case you have to consolidate those buffers first, which needs an extra
copy. Say you have two very long strings that should be concatenated into a
message:
```cpp
std::string a{"very long string..."};
std::string b{"another very long string..."};
std::string data;
protozero::pbf_writer writer{data};
a.append(b); // expensive extra copy
writer.add_string(1, a);
```
To avoid this, the function `add_bytes_vectored()` can be used which allows
vectored (or scatter/gather) input like this:
```cpp
std::string a{"very long string..."};
std::string b{"another very long string..."};
std::string data;
protozero::pbf_writer writer{data};
writer.add_bytes_vectored(1, a, b);
```
`add_bytes_vectored()` will add up the sizes of all its arguments and copy over
all the data only once.
The function takes any number of arguments. The arguments must be of a type
supporting the `data()` and `size()` methods, such as `protozero::data_view`,
`std::string` or the C++17 `std::string_view`.
Note that there is only one version of the function which can be used for any
length-delimited field including strings, bytes, messages and repeated packed
fields.
The function is also available in the `pbf_builder` class.
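The "add up the sizes, then copy once" idea can be sketched independently of Protozero. The example below only shows the gathering step on a `std::string`; it does not write a tag or length prefix, and the names `total_size`, `append_parts` and `append_vectored` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Sum of the sizes of all parts (recursion ends with the empty overload).
inline std::size_t total_size() {
    return 0;
}

template <typename T, typename... Ts>
std::size_t total_size(const T& first, const Ts&... rest) {
    return first.size() + total_size(rest...);
}

// Appends each part in order, using only its data() and size() members.
inline void append_parts(std::string&) {
}

template <typename T, typename... Ts>
void append_parts(std::string& out, const T& first, const Ts&... rest) {
    out.append(first.data(), first.size());
    append_parts(out, rest...);
}

// Gathers any number of parts into `out`: one reservation for the combined
// size, then each part is copied exactly once.
template <typename... Ts>
void append_vectored(std::string& out, const Ts&... parts) {
    out.reserve(out.size() + total_size(parts...));
    append_parts(out, parts...);
}
```

Calling `append_vectored(out, a, b)` leaves `a` and `b` untouched, in contrast to the `a.append(b)` approach, which first grows `a` and then copies the combined data again.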
## Internal handling of varints
When varints are decoded, they are always decoded as 64-bit unsigned integers
and then cast to the type you are requesting (using `static_cast`). This means
that if the protocol buffer message was created with a different integer type
than the one you are reading it with, you might get wrong results without any
warning or error. The Google Protocol Buffers library behaves the same way.

In normal use this should never matter, because presumably you write and read
the data with the same types. A mismatch can still occur if the data has been
corrupted, intentionally or unintentionally. But this cannot be used to feed
you any data that could not have been fed to you anyway, so it does not open up
any additional problems. You always have to check that integers are in the
range you expect whenever that range differs from the range of the integer
type. This is especially true for enums, which Protozero returns as `int32_t`.
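The effect of the cast, and the kind of range check recommended above, can be shown with a small standalone sketch. Both function names are hypothetical and not part of Protozero; an unsigned target type is used so that the result of the cast is fully defined.

```cpp
#include <cstdint>

// Mirrors the cast applied after decoding: the 64-bit value is simply
// narrowed, so any bits above the target width are silently dropped.
inline std::uint32_t narrow_to_uint32(std::uint64_t decoded) {
    return static_cast<std::uint32_t>(decoded);
}

// Hypothetical range check for an enum whose valid values are the
// consecutive integers from `first` to `last` (inclusive).
inline bool is_valid_enum_value(std::int32_t raw, std::int32_t first, std::int32_t last) {
    return raw >= first && raw <= last;
}
```

A value written as the 64-bit integer `0x100000001` reads back as `1` when requested as a 32-bit type, with no error reported. Checking the raw `int32_t` before converting it to an enum type avoids ending up with an enumerator that does not exist.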
## How many items are there in a repeated packed field?
Sometimes it is useful to know how many values there are in a repeated packed
field. For instance when you want to reserve space in a `std::vector`.
```cpp
protozero::pbf_reader message{...};
message.next(...);
const auto range = message.get_packed_sint32();

std::vector<int> myvalues;
myvalues.reserve(range.size());

for (auto value : range) {
    myvalues.push_back(value);
}
```
How expensive the `size()` call is depends on the type of range. For ranges
derived from packed repeated fixed-size values the effort is constant. For
ranges derived from packed repeated varints the effort is linear, but still
considerably cheaper than decoding the varints. You have to benchmark your use
case to see whether the `reserve()` (or whatever you are using the `size()`
for) is worth it.
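Counting varints is linear but cheap because no value has to be reassembled: the last byte of every varint is the only one with its high bit clear. The following standalone sketch shows that idea on a raw byte string; the name `count_varints` is hypothetical and this is not necessarily how Protozero implements `size()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts the varints in a packed buffer by counting the bytes whose
// continuation bit (the high bit) is not set. Assumes the buffer holds
// complete, well-formed varints.
inline std::size_t count_varints(const std::string& packed) {
    return static_cast<std::size_t>(
        std::count_if(packed.begin(), packed.end(), [](char c) {
            return (static_cast<unsigned char>(c) & 0x80U) == 0;
        }));
}
```

For fixed-size values no scan is needed at all, since the count is just the byte length divided by the size of one value.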
## Using a different buffer class than std::string
Normally you are using the `pbf_writer` or `pbf_builder` classes which use a
`std::string` that you supply as their buffer for building the actual protocol
buffers message into. But you can use a different buffer implementation
instead. This might be useful if you want to use a fixed-size buffer for
instance.
The `pbf_writer` and `pbf_builder` classes are actually only aliases for the
`basic_pbf_writer` and `basic_pbf_builder` template classes:
```cpp
using pbf_writer = basic_pbf_writer<std::string>;
template <typename T>
using pbf_builder = basic_pbf_builder<std::string, T>;
```
If you want to use a different buffer type, use the `basic_*` form of the
class and use the buffer class as template parameter. When instantiating the
`basic_pbf_writer` or `basic_pbf_builder`, the only parameter to the
constructor must always be a reference to an object of the buffer class.
```cpp
some_buffer_class buffer;
basic_pbf_writer<some_buffer_class> writer{buffer};
```
For this to work you must supply template specializations for some static
functions in the `protozero::buffer_customization` struct, see
`buffer_tmpl.hpp` for details.
Protozero already supports two buffer types:
* `std::string` (to use it, include `protozero/buffer_string.hpp`)
* `std::vector<char>` (to use it, include `protozero/buffer_vector.hpp`)
There is a class `protozero::fixed_size_buffer_adaptor` you can use as adaptor
for any fixed-sized buffer you might have. Include `protozero/buffer_fixed.hpp`
to use it:
```cpp
#include <protozero/buffer_fixed.hpp>
your_buffer_class some_buffer;
protozero::fixed_size_buffer_adaptor buffer_adaptor{some_buffer.data(), some_buffer.size()};
basic_pbf_writer<protozero::fixed_size_buffer_adaptor> writer{buffer_adaptor};
```
The buffer adaptor can be initialized with any container if it supports the
`data()` and `size()` member functions:
```cpp
protozero::fixed_size_buffer_adaptor buffer_adaptor{some_buffer};
```