Siarhei Fedartsou 563c04ae2a Squashed 'third_party/protozero/' changes from d5d8debf1..f379578a3


2024-07-13 15:52:32 +02:00


Protozero Advanced Topics

This document covers a mix of advanced topics for Protozero users. If you are new to Protozero, read the tutorial first.

Limitations of Protozero

A protobuf message has to fit into memory completely, otherwise it cannot be parsed with this library. There is no streaming support.
The length of a string, bytes, or submessage field can't be more than 2^31-1 bytes.
There is no specific support for maps, but they can be used as described in the "Backwards compatibility" section of https://developers.google.com/protocol-buffers/docs/proto3#maps.
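To illustrate the map workaround: according to the linked protobuf documentation, a map field is encoded on the wire like a repeated submessage whose key is field 1 and whose value is field 2. The standalone sketch below writes one such entry for a hypothetical map<string, uint32> field using only the standard library; append_varint() and append_map_entry() are invented for this example. With Protozero you would write each entry as a submessage instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append `value` to `out` as a base-128 varint.
inline void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80U) {
        out.push_back(static_cast<char>((value & 0x7FU) | 0x80U));
        value >>= 7U;
    }
    out.push_back(static_cast<char>(value));
}

// Append one entry of a hypothetical `map<string, uint32> field = tag;`
// as the wire-equivalent submessage { string key = 1; uint32 value = 2; }.
inline void append_map_entry(std::string& out, std::uint32_t tag,
                             const std::string& key, std::uint32_t value) {
    std::string entry;
    append_varint(entry, (1U << 3U) | 2U); // field 1, length-delimited
    append_varint(entry, key.size());
    entry += key;
    append_varint(entry, (2U << 3U) | 0U); // field 2, varint
    append_varint(entry, value);

    // The entry itself is a length-delimited field carrying the map's tag.
    append_varint(out, (static_cast<std::uint64_t>(tag) << 3U) | 2U);
    append_varint(out, entry.size());
    out += entry;
}
```

Calling append_map_entry() once per key/value pair produces the "repeated entries" layout that a protobuf map reader expects.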

Checking the Protozero version number

If protozero/version.hpp is included, the following macros are set:

Macro	Example	Description
`PROTOZERO_VERSION_MAJOR`	1	Major version number
`PROTOZERO_VERSION_MINOR`	3	Minor version number
`PROTOZERO_VERSION_PATCH`	2	Patch number
`PROTOZERO_VERSION_CODE`	10302	Version (major * 10,000 + minor * 100 + patch)
`PROTOZERO_VERSION_STRING`	"1.3.2"	Version string
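The version code is plain integer arithmetic, so it can be checked at compile time. The following standalone sketch reproduces the formula from the table; version_code() is a hypothetical helper, not part of Protozero.

```cpp
#include <cassert>

// Hypothetical helper reproducing the documented formula for
// PROTOZERO_VERSION_CODE: major * 10,000 + minor * 100 + patch.
constexpr int version_code(int major_version, int minor_version, int patch) {
    return major_version * 10000 + minor_version * 100 + patch;
}

// The example row of the table: version 1.3.2 has code 10302.
static_assert(version_code(1, 3, 2) == 10302, "formula matches the table");
```

In real code you would include protozero/version.hpp and test the macro directly, for instance with #if PROTOZERO_VERSION_CODE < 10700 followed by an #error directive.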

Changing Protozero behaviour with macros

The behaviour of Protozero can be changed by defining the following macros. They have to be set before including any of the Protozero headers.

`PROTOZERO_STRICT_API`

If this is set, you will get some extra warnings or errors during compilation if you are using an old (deprecated) interface to Protozero. Enable this if you want to make sure your code will work with future versions of Protozero.

`PROTOZERO_USE_VIEW`

Protozero uses the class protozero::data_view as the return type of the pbf_reader::get_view() method, and a few other functions take a protozero::data_view as a parameter.

If PROTOZERO_USE_VIEW is not defined, protozero::data_view is Protozero's own implementation of a string view class.

Set this macro if you want to use a different implementation such as the C++17 std::string_view class. In this case protozero::data_view will simply be an alias to the class you specify.

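#include <string_view>
#define PROTOZERO_USE_VIEW std::string_view // must come before any Protozero header
#include <protozero/pbf_reader.hpp>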
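The alias mechanism can be sketched without Protozero itself. The following assumes a hypothetical macro MYLIB_USE_VIEW standing in for PROTOZERO_USE_VIEW and a minimal fallback class. It shows the general pattern, not the contents of Protozero's own header, and it needs C++17 for std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for PROTOZERO_USE_VIEW. As with the real macro, it
// must be defined before the code that declares the view type is seen.
#define MYLIB_USE_VIEW std::string_view

#ifdef MYLIB_USE_VIEW
// The user selected a class: data_view becomes a plain alias for it.
using data_view = MYLIB_USE_VIEW;
#else
// Fallback: the library's own minimal view class.
class data_view {
    const char* m_data = nullptr;
    std::size_t m_size = 0;

public:
    constexpr data_view() noexcept = default;
    constexpr data_view(const char* ptr, std::size_t len) noexcept
        : m_data{ptr}, m_size{len} {}
    constexpr const char* data() const noexcept { return m_data; }
    constexpr std::size_t size() const noexcept { return m_size; }
};
#endif

// Library code written against data() and size() works with either choice.
inline std::size_t view_length(data_view view) noexcept {
    return view.size();
}
```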

Repeated fields in messages

The Google Protobuf spec states that a non-repeated field can appear several times in a message, in which case the implementation must return the value of the last occurrence of that field. pbf_reader.hpp does not enforce this. If you need this behaviour, you have to implement it yourself.
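A minimal sketch of the "last one wins" rule for a non-repeated scalar field, assuming the fields have already been decoded into hypothetical (tag, value) pairs in wire order. With Protozero you would apply the same logic inside your own pbf_reader::next() loop. For embedded message fields the protobuf spec asks for merging instead, which this sketch does not cover.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, already-decoded view of a message: (field tag, varint value)
// pairs in the order they appeared on the wire.
using decoded_field = std::pair<std::uint32_t, std::uint64_t>;

// Non-repeated scalar field: if the tag occurs more than once, the last
// occurrence wins. Returns `fallback` if the tag does not occur at all.
inline std::uint64_t last_value(const std::vector<decoded_field>& fields,
                                std::uint32_t tag, std::uint64_t fallback) {
    std::uint64_t result = fallback;
    for (const auto& field : fields) {
        if (field.first == tag) {
            result = field.second; // later occurrences overwrite earlier ones
        }
    }
    return result;
}
```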

The spec also says that you must be able to read a packed repeated field where a non-packed repeated field is expected, and vice versa. In addition, a message can contain several (packed or non-packed) repeated fields with the same tag, and their contents must be concatenated. Protozero does not do this for you; it is your responsibility.

Using `tag_and_type()`

The tag_and_type() free function and the method of the same name on the pbf_reader and pbf_message classes can be used to access both packed and unpacked repeated fields. (It can also be used to check that you have the right type of encoding for other fields.)

Here is the outline:

enum class ExampleMsg : protozero::pbf_tag_type {
    repeated_uint32_x = 1
};

std::string data = ...
protozero::pbf_message<ExampleMsg> message{data};
while (message.next()) {
    switch (message.tag_and_type()) {
        case protozero::tag_and_type(ExampleMsg::repeated_uint32_x, protozero::pbf_wire_type::length_delimited): {
            auto xit = message.get_packed_uint32();
            ... // handle the repeated field when it is packed
            break;
        }
        case protozero::tag_and_type(ExampleMsg::repeated_uint32_x, protozero::pbf_wire_type::varint): {
            auto x = message.get_uint32();
            ... // handle the repeated field when it is not packed
            break;
        }
        default:
            message.skip();
    }
}

All of this works with pbf_reader in the same way as with pbf_message, with the usual difference that pbf_reader takes a numeric field tag and pbf_message an enum value.
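The switch in the outline compiles only because each case label is a constant expression. A protobuf field key is (field_tag << 3) | wire_type, so it can be computed at compile time. The following standalone sketch mirrors that layout with its own wire_type enum and a hypothetical key_of() helper instead of Protozero's types.

```cpp
#include <cassert>
#include <cstdint>

// Wire types from the protobuf encoding spec (a local copy for this sketch;
// Protozero has its own pbf_wire_type enum).
enum class wire_type : std::uint32_t {
    varint = 0,
    fixed64 = 1,
    length_delimited = 2,
    fixed32 = 5
};

// A field key is (tag << 3) | wire_type. Because this is constexpr,
// the result can be used as a case label.
template <typename T>
constexpr std::uint32_t key_of(T tag, wire_type type) noexcept {
    return (static_cast<std::uint32_t>(tag) << 3U) | static_cast<std::uint32_t>(type);
}

enum class ExampleMsg : std::uint32_t { repeated_uint32_x = 1 };
enum class field_kind { packed, unpacked, other };

// Decide how to read a key taken from the wire.
inline field_kind classify(std::uint32_t key) noexcept {
    switch (key) {
        case key_of(ExampleMsg::repeated_uint32_x, wire_type::length_delimited):
            return field_kind::packed;
        case key_of(ExampleMsg::repeated_uint32_x, wire_type::varint):
            return field_kind::unpacked;
        default:
            return field_kind::other;
    }
}
```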

If you only want to check for one specific tag and type, you can use the two-argument version of pbf_reader::next(). In this case 17 is the field tag we are looking for:

std::string data = ...
protozero::pbf_reader message{data};
while (message.next(17, protozero::pbf_wire_type::varint)) {
    auto foo = message.get_int32();
    ...
}
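To make the filtering behaviour concrete, here is a standalone sketch of roughly what such a loop has to do: decode each key, skip fields whose tag or wire type doesn't match, and keep the matching varints. read_varint(), skip_bytes(), and collect_varints() are hypothetical helpers, not Protozero's implementation. With Protozero you read each value inside the loop instead of collecting them into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode one varint at `pos` and advance `pos`; throws on bad input.
inline std::uint64_t read_varint(const std::string& buf, std::size_t& pos) {
    std::uint64_t value = 0;
    for (unsigned shift = 0; shift < 64U; shift += 7U) {
        if (pos >= buf.size()) {
            throw std::runtime_error{"truncated varint"};
        }
        const auto byte = static_cast<unsigned char>(buf[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7FU) << shift;
        if ((byte & 0x80U) == 0) {
            return value;
        }
    }
    throw std::runtime_error{"varint too long"};
}

// Advance `pos` by `n` bytes, refusing to run past the end of `buf`.
inline void skip_bytes(const std::string& buf, std::size_t& pos, std::uint64_t n) {
    if (n > buf.size() - pos) {
        throw std::runtime_error{"truncated field"};
    }
    pos += static_cast<std::size_t>(n);
}

// Collect the values of all fields with the given tag AND varint wire type;
// every other field is skipped according to its wire type.
inline std::vector<std::uint64_t> collect_varints(const std::string& buf, std::uint32_t tag) {
    std::vector<std::uint64_t> values;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::uint64_t key = read_varint(buf, pos);
        const std::uint64_t field_tag = key >> 3U;
        switch (key & 0x07U) {
            case 0: { // varint
                const std::uint64_t value = read_varint(buf, pos);
                if (field_tag == tag) {
                    values.push_back(value);
                }
                break;
            }
            case 1: skip_bytes(buf, pos, 8); break; // fixed64
            case 2: { // length-delimited
                const std::uint64_t length = read_varint(buf, pos);
                skip_bytes(buf, pos, length);
                break;
            }
            case 5: skip_bytes(buf, pos, 4); break; // fixed32
            default: throw std::runtime_error{"unsupported wire type"};
        }
    }
    return values;
}
```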

See the test under test/t/tag_and_type/ for a complete example.

Reserving memory when writing messages

If you know beforehand how large a message will become, or can make an educated guess, you can call the usual std::string::reserve() on the underlying string before you give it to a pbf_writer or pbf_builder object.

Or you can (at any time) call reserve() on the pbf_writer or pbf_builder. This reserves the given number of bytes in addition to whatever is already in the message. (Note that this behaviour is different from what reserve() does on std::string or std::vector.)

In the general case it is not easy to figure out how much memory you will need because of the varint packing of integers. But sometimes you can make at least a rough estimate. Still, you should probably only use this facility if you have benchmarks proving that it actually makes your program faster.
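A rough estimate can be as simple as an upper bound. This sketch bounds the size of one packed repeated uint32 field (at most 5 bytes per 32-bit varint, plus key and length prefix) and shows the "reserve in addition" semantics with a plain std::string. varint_length(), packed_uint32_upper_bound(), and reserve_additional() are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Number of bytes a value occupies as a varint (7 payload bits per byte).
inline std::size_t varint_length(std::uint64_t value) noexcept {
    std::size_t n = 1;
    while (value >= 0x80U) {
        value >>= 7U;
        ++n;
    }
    return n;
}

// Upper bound for a packed repeated uint32 field with `count` elements:
// key + length prefix + at most 5 bytes per element.
inline std::size_t packed_uint32_upper_bound(std::uint32_t tag, std::size_t count) noexcept {
    const std::size_t payload = count * 5;
    const std::size_t key = varint_length((static_cast<std::uint64_t>(tag) << 3U) | 2U);
    return key + varint_length(payload) + payload;
}

// "Reserve in addition to what is already there", expressed with std::string.
inline void reserve_additional(std::string& buffer, std::size_t extra) {
    buffer.reserve(buffer.size() + extra);
}
```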

Using the low-level varint and zigzag encoding and decoding functions

Protozero gives you access to the low-level functions for encoding and decoding varint and zigzag integer encodings, because these functions can sometimes be useful outside the Protocol Buffer context.
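As a reference point for what zigzag encoding does, here is a standalone sketch of 32- and 64-bit encoders and decoders. The function names are hypothetical, not Protozero's. The shifts are done on unsigned values so that negative inputs don't cause undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// ZigZag maps signed to unsigned so that values of small magnitude stay
// small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
constexpr std::uint32_t zigzag_encode32(std::int32_t value) noexcept {
    return (static_cast<std::uint32_t>(value) << 1U) ^
           static_cast<std::uint32_t>(-static_cast<std::int32_t>(static_cast<std::uint32_t>(value) >> 31U));
}

constexpr std::int32_t zigzag_decode32(std::uint32_t value) noexcept {
    return static_cast<std::int32_t>((value >> 1U) ^
                                     static_cast<std::uint32_t>(-static_cast<std::int32_t>(value & 1U)));
}

constexpr std::uint64_t zigzag_encode64(std::int64_t value) noexcept {
    return (static_cast<std::uint64_t>(value) << 1U) ^
           static_cast<std::uint64_t>(-static_cast<std::int64_t>(static_cast<std::uint64_t>(value) >> 63U));
}

constexpr std::int64_t zigzag_decode64(std::uint64_t value) noexcept {
    return static_cast<std::int64_t>((value >> 1U) ^
                                     static_cast<std::uint64_t>(-static_cast<std::int64_t>(value & 1U)));
}
```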

Using low-level functions

To use the low-level functions, add this include to your C++ program:

#include <protozero/varint.hpp>

Functions

The following functions are then available:

decode_varint()
write_varint()
encode_zigzag32()
encode_zigzag64()
decode_zigzag32()
decode_zigzag64()

See the reference documentation generated by make doc for details.
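For the varint side, a standalone sketch of an encoder and a decoder follows. It is meant to be comparable to write_varint() and decode_varint(), but the names encode_varint() and decode_varint_at(), the exception-based error handling, and the exact signatures are assumptions of this sketch; check the reference documentation for the real functions.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>

// Write `value` as a varint through output iterator `out`: 7 bits per byte,
// least significant group first, high bit set on all but the last byte.
// Returns the number of bytes written.
template <typename OutputIt>
int encode_varint(OutputIt out, std::uint64_t value) {
    int n = 1;
    while (value >= 0x80U) {
        *out++ = static_cast<char>((value & 0x7FU) | 0x80U);
        value >>= 7U;
        ++n;
    }
    *out = static_cast<char>(value);
    return n;
}

// Decode a varint from [*data, end) and advance *data past it.
inline std::uint64_t decode_varint_at(const char** data, const char* end) {
    std::uint64_t value = 0;
    const char* p = *data;
    for (unsigned shift = 0; shift < 64U; shift += 7U) {
        if (p == end) {
            throw std::runtime_error{"truncated varint"};
        }
        const auto byte = static_cast<unsigned char>(*p++);
        value |= static_cast<std::uint64_t>(byte & 0x7FU) << shift;
        if ((byte & 0x80U) == 0) {
            *data = p;
            return value;
        }
    }
    throw std::runtime_error{"varint too long"};
}
```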

Vectored input for length-delimited fields

Length-delimited fields (like string fields, byte fields and messages) are usually set by calling add_string(), add_message(), etc. These functions have several forms, but they basically all take a tag, a size, and a pointer to the data. They write the length of the data into the message and then copy the data over.

Sometimes the data is not in one place but spread over several buffers. In that case you have to consolidate those buffers first, which needs an extra copy. Say you have two very long strings that should be concatenated into a message:

std::string a{"very long string..."};
std::string b{"another very long string..."};

std::string data;
protozero::pbf_writer writer{data};

a.append(b); // expensive extra copy

writer.add_string(1, a);

To avoid this, you can use add_bytes_vectored(), which allows vectored (or scatter/gather) input like this:

std::string a{"very long string..."};
std::string b{"another very long string..."};

std::string data;
protozero::pbf_writer writer{data};

writer.add_bytes_vectored(1, a, b);

add_bytes_vectored() will add up the sizes of all its arguments and copy over all the data only once.

The function takes any number of arguments. The arguments must be of a type supporting the data() and size() methods, such as protozero::data_view, std::string, or the C++17 std::string_view.

Note that there is only one version of this function; it can be used for any length-delimited field, including strings, bytes, messages, and repeated packed fields.

The function is also available in the pbf_builder class.
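The mechanism can be sketched in a few lines of standalone C++17: compute the total size first, write the key and length prefix, then copy each part exactly once. add_bytes_vectored_sketch() and append_varint() are hypothetical stand-ins for illustration, not Protozero's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append `value` to `out` as a varint.
inline void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80U) {
        out.push_back(static_cast<char>((value & 0x7FU) | 0x80U));
        value >>= 7U;
    }
    out.push_back(static_cast<char>(value));
}

// Write one length-delimited field whose payload is the concatenation of
// all `parts`; each part only needs data() and size().
template <typename... Ts>
void add_bytes_vectored_sketch(std::string& out, std::uint32_t tag, const Ts&... parts) {
    const std::size_t total = (std::size_t{0} + ... + parts.size()); // sum of sizes
    append_varint(out, (static_cast<std::uint64_t>(tag) << 3U) | 2U); // key, wire type 2
    append_varint(out, total);                                        // length prefix
    out.reserve(out.size() + total);
    (out.append(parts.data(), parts.size()), ...); // one copy per part
}
```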

Internal handling of varints

When varints are decoded, they are always decoded as 64-bit unsigned integers and then cast to the type you are requesting (using static_cast). This means that if the protocol buffer message was created with a different integer type than the one you are reading it with, you might get wrong results without any warning or error. The Google Protocol Buffers library behaves the same way.

In normal use this should never matter, because presumably you are using the same types to write the data as you use to read it later. It can matter if the data has been corrupted, accidentally or intentionally. But this behaviour does not let anyone feed you data that they could not have fed you without it, so it does not open up any new problems. You always have to check that integers are in the range you expect if that range differs from the range of the integer type. This is especially true for enums, which Protozero returns as int32_t.
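Both points can be shown with a short standalone sketch: a narrowing static_cast silently keeps only the low bits, and an enum value should be range-checked before use. The color enum and the as_uint32() and to_color() helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A decoded varint is a 64-bit unsigned value; narrowing it with
// static_cast keeps only the low 32 bits and never reports an error.
inline std::uint32_t as_uint32(std::uint64_t decoded) noexcept {
    return static_cast<std::uint32_t>(decoded);
}

// Hypothetical enum. A value read as int32_t can be anything, so check the
// range explicitly before converting.
enum class color : std::int32_t { red = 0, green = 1, blue = 2 };

inline bool to_color(std::int32_t raw, color& out) noexcept {
    if (raw < 0 || raw > static_cast<std::int32_t>(color::blue)) {
        return false; // unknown or corrupted value; `out` is left unchanged
    }
    out = static_cast<color>(raw);
    return true;
}
```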

How many items are there in a repeated packed field?

Sometimes it is useful to know how many values there are in a repeated packed field, for instance when you want to reserve space in a std::vector.

protozero::pbf_reader message{...};
message.next(...);
const auto range = message.get_packed_sint32();

std::vector<int> myvalues;
myvalues.reserve(range.size());

for (auto value : range) {
    myvalues.push_back(value);
}

How expensive the size() call is depends on the type of range. For ranges over packed repeated fixed-size values the cost is constant; for ranges over packed repeated varints it is linear, but still considerably cheaper than decoding the varints. Benchmark your use case to see whether the reserve() (or whatever you use size() for) is worth it.
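Counting varints is cheaper than decoding them because every varint ends with exactly one byte whose high bit is clear, so counting those bytes gives the number of elements. The following standalone sketch shows both cases; count_packed_varints() and count_packed_fixed() are hypothetical helpers, not Protozero's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Packed varints: one terminating byte (high bit clear) per element, so the
// element count is the number of such bytes. Linear, but no decoding needed.
inline std::size_t count_packed_varints(const std::string& payload) {
    return static_cast<std::size_t>(
        std::count_if(payload.begin(), payload.end(), [](char c) {
            return (static_cast<unsigned char>(c) & 0x80U) == 0;
        }));
}

// Packed fixed-size values (fixed32, double, ...): constant time.
inline std::size_t count_packed_fixed(std::size_t payload_size, std::size_t element_size) {
    assert(element_size > 0);
    return payload_size / element_size;
}
```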

Using a different buffer class than std::string

Normally you use the pbf_writer or pbf_builder classes, which build the protocol buffers message into a std::string that you supply as their buffer. But you can use a different buffer implementation instead, for instance if you want to use a fixed-size buffer.

The pbf_writer and pbf_builder classes are actually only aliases for the basic_pbf_writer and basic_pbf_builder template classes:

using pbf_writer = basic_pbf_writer<std::string>;

template <typename T>
using pbf_builder = basic_pbf_builder<std::string, T>;

If you want to use a different buffer type, use the basic_* form of the class and use the buffer class as template parameter. When instantiating the basic_pbf_writer or basic_pbf_builder, the only parameter to the constructor must always be a reference to an object of the buffer class.

some_buffer_class buffer;
protozero::basic_pbf_writer<some_buffer_class> writer{buffer};

For this to work you must supply template specializations for some static functions in the protozero::buffer_customization struct, see buffer_tmpl.hpp for details.
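The idea behind such a customization point can be sketched with a hypothetical traits struct: the writer calls static functions of the struct rather than member functions of the buffer, and a specialization adapts a buffer type with a different interface. The names my_buffer_traits and tiny_writer are invented for this sketch; the actual struct and the functions it requires are defined in buffer_tmpl.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Primary template: assumes the buffer has size() and append(ptr, count).
template <typename TBuffer>
struct my_buffer_traits {
    static std::size_t size(const TBuffer* buffer) { return buffer->size(); }
    static void append(TBuffer* buffer, const char* data, std::size_t count) {
        buffer->append(data, count);
    }
};

// Specialization for a type without append(): std::vector<char>.
template <>
struct my_buffer_traits<std::vector<char>> {
    static std::size_t size(const std::vector<char>* buffer) { return buffer->size(); }
    static void append(std::vector<char>* buffer, const char* data, std::size_t count) {
        buffer->insert(buffer->end(), data, data + count);
    }
};

// A writer that only talks to the traits; its constructor takes a reference
// to the buffer object.
template <typename TBuffer>
class tiny_writer {
    TBuffer* m_buffer;

public:
    explicit tiny_writer(TBuffer& buffer) : m_buffer{&buffer} {}
    void add_raw(const char* data, std::size_t count) {
        my_buffer_traits<TBuffer>::append(m_buffer, data, count);
    }
    std::size_t size() const { return my_buffer_traits<TBuffer>::size(m_buffer); }
};
```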

Protozero already supports two buffer types:

std::string (include protozero/buffer_string.hpp to use it)
std::vector<char> (include protozero/buffer_vector.hpp to use it)

The class protozero::fixed_size_buffer_adaptor can be used as an adaptor for any fixed-size buffer you might have. Include protozero/buffer_fixed.hpp to use it:

#include <protozero/buffer_fixed.hpp>

your_buffer_class some_buffer;
protozero::fixed_size_buffer_adaptor buffer_adaptor{some_buffer.data(), some_buffer.size()};
protozero::basic_pbf_writer<protozero::fixed_size_buffer_adaptor> writer{buffer_adaptor};

The buffer adaptor can also be initialized with any container that supports the data() and size() member functions:

protozero::fixed_size_buffer_adaptor buffer_adaptor{some_buffer};
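A simplified, standalone version of such an adaptor is sketched below. fixed_buffer_sketch is a hypothetical class, not fixed_size_buffer_adaptor; in particular, throwing std::length_error when the capacity is exhausted is an assumption of this sketch, so check buffer_fixed.hpp for the real behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical adaptor: wraps memory owned elsewhere, tracks how much of it
// is used, and refuses to grow past its capacity.
class fixed_buffer_sketch {
    char* m_data;
    std::size_t m_capacity;
    std::size_t m_size = 0;

public:
    fixed_buffer_sketch(char* buffer, std::size_t capacity) noexcept
        : m_data{buffer}, m_capacity{capacity} {}

    // Convenience constructor for containers with data() and size(); the
    // enable_if keeps it from hijacking copy construction.
    template <typename Container,
              typename std::enable_if<!std::is_same<Container, fixed_buffer_sketch>::value, int>::type = 0>
    explicit fixed_buffer_sketch(Container& container)
        : fixed_buffer_sketch(container.data(), container.size()) {}

    const char* data() const noexcept { return m_data; }
    std::size_t size() const noexcept { return m_size; }
    std::size_t capacity() const noexcept { return m_capacity; }

    // Copy `count` bytes to the end of the used region.
    void append(const char* bytes, std::size_t count) {
        if (count > m_capacity - m_size) {
            throw std::length_error{"fixed-size buffer exhausted"};
        }
        std::memcpy(m_data + m_size, bytes, count);
        m_size += count;
    }
};
```

With a std::array<char, 64> named storage, fixed_buffer_sketch buf{storage}; would wrap all 64 bytes through the container constructor.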
