Protozero Advanced Topics
This document covers assorted advanced topics for Protozero users. Read the tutorial first if you are new to Protozero.
Limitations of Protozero
- A protobuf message has to fit into memory completely; otherwise it cannot be parsed with this library. There is no streaming support.
- The length of a string, bytes, or submessage can't be more than 2^31-1 bytes.
- There is no specific support for maps but they can be used as described in the "Backwards compatibility" section of https://developers.google.com/protocol-buffers/docs/proto3#maps.
Checking the Protozero version number
If protozero/version.hpp is included, the following macros are set:
Macro | Example | Description |
---|---|---|
PROTOZERO_VERSION_MAJOR | 1 | Major version number |
PROTOZERO_VERSION_MINOR | 3 | Minor version number |
PROTOZERO_VERSION_PATCH | 2 | Patch number |
PROTOZERO_VERSION_CODE | 10302 | Version (major * 10,000 + minor * 100 + patch) |
PROTOZERO_VERSION_STRING | "1.3.2" | Version string |
Changing Protozero behaviour with macros
The behaviour of Protozero can be changed by defining the following macros. They have to be set before including any of the Protozero headers.
PROTOZERO_STRICT_API
If this is set, you will get some extra warnings or errors during compilation if you are using an old (deprecated) interface to Protozero. Enable this if you want to make sure your code will work with future versions of Protozero.
PROTOZERO_USE_VIEW
Protozero uses the class protozero::data_view as the return type of the pbf_reader::get_view() method, and a few other functions take a protozero::data_view as a parameter.
If PROTOZERO_USE_VIEW is unset, protozero::data_view is Protozero's own implementation of a string view class.
Set this macro if you want to use a different implementation such as the C++17 std::string_view class. In this case protozero::data_view will simply be an alias to the class you specify.
#define PROTOZERO_USE_VIEW std::string_view
Repeated fields in messages
The Google Protobuf spec documents that a non-repeated field can actually appear several times in a message and the implementation is required to return the value of the last occurrence of that field in this case. pbf_reader.hpp does not enforce this. If you need this behaviour, you have to implement it yourself.
The spec also says that you must be able to read a packed repeated field where a not-packed repeated field is expected and vice versa. Also there can be several (packed or not-packed) repeated fields with the same tag and their contents must be concatenated. It is your responsibility to do this; Protozero doesn't do it for you.
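To make the concatenation rule concrete, here is a minimal standalone sketch that does not use Protozero at all. It walks a raw message by hand and collects every value of one repeated uint32 field, whether the values arrive packed (wire type 2) or not packed (wire type 0). The function names read_varint() and collect_repeated_uint32() are made up for this illustration, and only those two wire types are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode one varint from buf[pos, end) and advance pos past it.
inline std::uint64_t read_varint(const std::string& buf, std::size_t& pos, std::size_t end) {
    std::uint64_t value = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= end) {
            throw std::runtime_error{"truncated varint"};
        }
        const auto byte = static_cast<unsigned char>(buf[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7FU) << shift;
        if ((byte & 0x80U) == 0) {
            return value;
        }
    }
    throw std::runtime_error{"varint too long"};
}

// Collect all values of the repeated uint32 field wanted_tag, concatenating
// packed and not-packed occurrences in the order they appear.
inline std::vector<std::uint32_t> collect_repeated_uint32(const std::string& msg, std::uint32_t wanted_tag) {
    std::vector<std::uint32_t> values;
    std::size_t pos = 0;
    while (pos < msg.size()) {
        const std::uint64_t key = read_varint(msg, pos, msg.size());
        const auto tag = static_cast<std::uint32_t>(key >> 3U);
        const auto wire_type = static_cast<unsigned>(key & 7U);
        if (wire_type == 0) { // single varint: one not-packed value
            const std::uint64_t value = read_varint(msg, pos, msg.size());
            if (tag == wanted_tag) {
                values.push_back(static_cast<std::uint32_t>(value));
            }
        } else if (wire_type == 2) { // length-delimited: possibly a packed run
            const std::uint64_t length = read_varint(msg, pos, msg.size());
            if (length > msg.size() - pos) {
                throw std::runtime_error{"length exceeds message"};
            }
            const std::size_t end = pos + static_cast<std::size_t>(length);
            if (tag == wanted_tag) {
                while (pos < end) {
                    values.push_back(static_cast<std::uint32_t>(read_varint(msg, pos, end)));
                }
            }
            pos = end; // skip the payload of fields we are not interested in
        } else {
            throw std::runtime_error{"wire type not handled in this sketch"};
        }
    }
    return values;
}
```

For a message that contains field 1 first as a packed run of 1 and 2 and then as a single not-packed 3, the function returns the three values in that order. A "last one wins" rule for a non-repeated field could be built the same way by overwriting a single variable instead of appending to a vector.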
Using tag_and_type()
The tag_and_type() free function and the method of the same name on the pbf_reader and pbf_message classes can be used to access both packed and unpacked repeated fields. (It can also be used to check that you have the right type of encoding for other fields.)
Here is the outline:
enum class ExampleMsg : protozero::pbf_tag_type {
    repeated_uint32_x = 1
};
std::string data = ...
pbf_message<ExampleMsg> message{data};
while (message.next()) {
    switch (message.tag_and_type()) {
        case tag_and_type(ExampleMsg::repeated_uint32_x, pbf_wire_type::length_delimited): {
                auto xit = message.get_packed_uint32();
                ... // handle the repeated field when it is packed
            }
            break;
        case tag_and_type(ExampleMsg::repeated_uint32_x, pbf_wire_type::varint): {
                auto x = message.get_uint32();
                ... // handle the repeated field when it is not packed
            }
            break;
        default:
            message.skip();
    }
}
All this works on pbf_reader in the same way as with pbf_message with the usual difference that pbf_reader takes a numeric field tag and pbf_message an enum field.
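Why a single switch can distinguish both the tag and the encoding becomes clearer when you look at how the protobuf wire format stores a field key: the tag sits in the upper bits and the wire type in the lowest three bits. The following standalone sketch shows that combination. The names wire_type, combine() and describe() are hypothetical, and that Protozero's tag_and_type() yields exactly this value is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Wire type numbers as defined by the Protocol Buffers encoding.
enum class wire_type : std::uint32_t {
    varint = 0,
    fixed64 = 1,
    length_delimited = 2,
    fixed32 = 5
};

// Hypothetical stand-in for a tag_and_type()-style helper: the field tag goes
// into the upper bits, the wire type into the lowest three bits.
constexpr std::uint32_t combine(std::uint32_t tag, wire_type type) noexcept {
    return (tag << 3U) | static_cast<std::uint32_t>(type);
}

// Because combine() is constexpr, its result can be used as a case label.
inline std::string describe(std::uint32_t key) {
    switch (key) {
        case combine(1, wire_type::length_delimited):
            return "field 1, packed";
        case combine(1, wire_type::varint):
            return "field 1, not packed";
        default:
            return "something else";
    }
}
```

The same tag therefore produces two different keys depending on the encoding, which is what lets the switch in the outline above tell the packed case from the not-packed one.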
If you only want to check for one specific tag and type you can use the two-argument version of pbf_reader::next(). In this case 17 is the field tag we are looking for:
std::string data = ...
pbf_reader message{data};
while (message.next(17, pbf_wire_type::varint)) {
    auto foo = message.get_int32();
    ...
}
See the test under test/t/tag_and_type/ for a complete example.
Reserving memory when writing messages
If you know beforehand how large a message will become or can take an educated guess, you can call the usual std::string::reserve() on the underlying string before you give it to a pbf_writer or pbf_builder object.
Or you can (at any time) call reserve() on the pbf_writer or pbf_builder. This will reserve the given number of bytes in addition to whatever is already in that message. (Note that this behaviour is different from what reserve() does on std::string or std::vector.)
In the general case it is not easy to figure out how much memory you will need because of the varint packing of integers. But sometimes you can make at least a rough estimate. Still, you should probably only use this facility if you have benchmarks proving that it actually makes your program faster.
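A rough estimate can be built from the fact that each varint byte carries 7 payload bits. The sketch below is standalone and does not call into Protozero; varint_length() and estimate_packed_uint32() are hypothetical helpers written for this illustration, and the estimate is a deliberately pessimistic upper bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes the varint encoding of value needs. Each encoded byte
// carries 7 bits of the value.
inline std::size_t varint_length(std::uint64_t value) noexcept {
    std::size_t n = 1;
    while (value >= 0x80U) {
        value >>= 7U;
        ++n;
    }
    return n;
}

// Upper bound for a packed repeated uint32 field with count values:
// key + length prefix + at most 5 bytes per value.
inline std::size_t estimate_packed_uint32(std::uint32_t tag, std::size_t count) noexcept {
    const std::size_t payload = count * 5;
    return varint_length(static_cast<std::uint64_t>(tag) << 3U) + varint_length(payload) + payload;
}
```

For field tag 1 with 100 values the bound is 503 bytes: one byte for the key, two for the length prefix and 500 for the values. If most values are small, the real size will be well below that, which is one reason to benchmark before relying on such an estimate.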
Using the low-level varint and zigzag encoding and decoding functions
Protozero gives you access to the low-level functions for encoding and decoding varint and zigzag integer encodings, because these functions can sometimes be useful outside the Protocol Buffer context.
Using low-level functions
To use the low-level functions, add this include to your C++ program:
#include <protozero/varint.hpp>
Functions
The following functions are then available:
decode_varint()
write_varint()
encode_zigzag32()
encode_zigzag64()
decode_zigzag32()
decode_zigzag64()
See the reference documentation created by make doc for details.
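For readers who have not met these encodings before, the following standalone sketch shows what zigzag and varint encoding do. It deliberately uses its own names (zigzag_encode32(), zigzag_decode32(), append_varint()) rather than the Protozero functions listed above, whose exact signatures are documented in the reference documentation, so nothing here should be read as the library's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// ZigZag maps signed integers to unsigned ones so that values of small
// magnitude get small codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// All bit operations are done on unsigned values.
inline std::uint32_t zigzag_encode32(std::int32_t value) noexcept {
    const auto u = static_cast<std::uint32_t>(value);
    return (u << 1U) ^ (0U - (u >> 31U));
}

// Inverse of zigzag_encode32().
inline std::int32_t zigzag_decode32(std::uint32_t value) noexcept {
    return static_cast<std::int32_t>((value >> 1U) ^ (0U - (value & 1U)));
}

// Varint encoding: 7 bits of the value per byte, least significant group
// first; the high bit of a byte is set if more bytes follow.
inline void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80U) {
        out.push_back(static_cast<char>((value & 0x7FU) | 0x80U));
        value >>= 7U;
    }
    out.push_back(static_cast<char>(value));
}
```

As a worked example, the value 300 is written as the two bytes 0xAC 0x02, and the signed value -2 becomes 3 after zigzag encoding, so it fits into a single varint byte.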
Vectored input for length-delimited fields
Length-delimited fields (like string fields, byte fields and messages) are usually set by calling add_string(), add_message(), etc. These functions have several forms, but they basically all take a tag, a size, and a pointer to the data. They write the length of the data into the message and then copy the data over.
Sometimes you have the data not in one place, but spread over several buffers. In this case you have to consolidate those buffers first, which needs an extra copy. Say you have two very long strings that should be concatenated into a message:
std::string a{"very long string..."};
std::string b{"another very long string..."};
std::string data;
protozero::pbf_writer writer{data};
a.append(b); // expensive extra copy
writer.add_string(1, a);
To avoid this, the function add_bytes_vectored() can be used which allows vectored (or scatter/gather) input like this:
std::string a{"very long string..."};
std::string b{"another very long string..."};
std::string data;
protozero::pbf_writer writer{data};
writer.add_bytes_vectored(1, a, b);
add_bytes_vectored() will add up the sizes of all its arguments and copy over all the data only once.
The function takes any number of arguments. The arguments must be of a type supporting the data() and size() methods like protozero::data_view, std::string or the C++17 std::string_view.
Note that there is only one version of the function which can be used for any length-delimited field including strings, bytes, messages and repeated packed fields.
The function is also available in the pbf_builder class.
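The gather idea itself is small enough to sketch without the library: add up the sizes first, write the key and the total length, then copy each part exactly once. The code below is a standalone illustration under that assumption. The names total_size(), append_parts(), append_varint() and add_length_delimited() are invented for the sketch and are not Protozero functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Base cases for the recursive helpers below.
inline std::size_t total_size() noexcept { return 0; }
inline void append_parts(std::string&) {}

// Sum of size() over all parts.
template <typename T, typename... Rest>
std::size_t total_size(const T& first, const Rest&... rest) {
    return first.size() + total_size(rest...);
}

// Append the bytes of every part, in order.
template <typename T, typename... Rest>
void append_parts(std::string& out, const T& first, const Rest&... rest) {
    out.append(first.data(), first.size());
    append_parts(out, rest...);
}

// Write value as a varint (7 bits per byte, high bit marks continuation).
inline void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80U) {
        out.push_back(static_cast<char>((value & 0x7FU) | 0x80U));
        value >>= 7U;
    }
    out.push_back(static_cast<char>(value));
}

// Hypothetical vectored add: one length-delimited field built from several
// buffers, each copied once and never concatenated beforehand.
template <typename... Parts>
void add_length_delimited(std::string& out, std::uint32_t tag, const Parts&... parts) {
    const std::size_t length = total_size(parts...);
    append_varint(out, (static_cast<std::uint64_t>(tag) << 3U) | 2U); // key with wire type 2
    append_varint(out, length);
    out.reserve(out.size() + length);
    append_parts(out, parts...);
}
```

Calling add_length_delimited(out, 1, a, b) with a holding "ab" and b holding "cde" produces the key byte 0x0A, the length byte 0x05 and then the five characters, without ever building the string "abcde" separately.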
Internal handling of varints
When varints are decoded they are always decoded as 64-bit unsigned integers and after that cast to the type you are requesting (using static_cast). This means that if the protocol buffer message was created with a different integer type than what you are reading it with, you might get wrong results without any warning or error. This is the same behaviour as the Google Protocol Buffers library has.
In normal use, this should never matter, because presumably you are using the same types to write that data as you are using to read it later. It can happen if the data is corrupted intentionally or unintentionally in some way. But this can't be used to feed you any data that it wasn't possible to feed you without this behaviour, so it doesn't open up any potential problems. You always have to check anyway that the integers are in the range you expected them to be in if the expected range is different than the range of the integer type. This is especially true for enums, which Protozero will return as int32_t.
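The effect of the cast, and the kind of range check the text asks for, can be shown in a few lines of standalone code. The enum Color and the helpers as_uint32() and to_color() are hypothetical and only serve as an example of validating a value after it has been read.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// What the text describes: the value is decoded as 64 bit and then cast to
// the requested type. Bits that do not fit are dropped without any error.
inline std::uint32_t as_uint32(std::uint64_t decoded) noexcept {
    return static_cast<std::uint32_t>(decoded);
}

// Hypothetical application-level enum with three valid values.
enum class Color : std::int32_t { red = 0, green = 1, blue = 2 };

// Validate the raw integer before converting it to the enum type.
inline Color to_color(std::int32_t raw) {
    if (raw < 0 || raw > 2) {
        throw std::out_of_range{"unknown Color value"};
    }
    return static_cast<Color>(raw);
}
```

A value such as 2^32 + 5 written as a 64-bit integer comes back as 5 when read as a 32-bit one, which is why the range check has to happen in your own code.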
How many items are there in a repeated packed field?
Sometimes it is useful to know how many values there are in a repeated packed field, for instance when you want to reserve space in a std::vector.
protozero::pbf_reader message{...};
message.next(...);
const auto range = message.get_packed_sint32();
std::vector<int> myvalues;
myvalues.reserve(range.size());
for (auto value : range) {
    myvalues.push_back(value);
}
It depends on the type of range how expensive the size() call is. For ranges derived from packed repeated fixed-size values the effort will be constant; for ranges derived from packed repeated varints the effort will be linear, but still considerably cheaper than decoding the varints. You have to benchmark your use case to see whether the reserve() (or whatever you are using the size() for) is worth it.
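The difference in cost follows from the encodings. The standalone sketch below shows one plausible way to count values in each case. It works on a raw payload held in a std::string, the function names are made up, and whether Protozero implements size() exactly like this is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Fixed-size values: the count follows directly from the byte length, so the
// effort is constant.
inline std::size_t count_fixed32(const std::string& payload) noexcept {
    return payload.size() / sizeof(std::uint32_t);
}

// Varints: every value ends with exactly one byte whose high bit is clear.
// Counting those bytes counts the values without decoding them, but it has
// to look at every byte, so the effort is linear.
inline std::size_t count_varints(const std::string& payload) noexcept {
    return static_cast<std::size_t>(
        std::count_if(payload.begin(), payload.end(), [](char c) {
            return (static_cast<unsigned char>(c) & 0x80U) == 0;
        }));
}
```

For the four payload bytes 0x01 0xAC 0x02 0x7F, which encode the values 1, 300 and 127, count_varints() finds three terminating bytes and therefore three values.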
Using a different buffer class than std::string
Normally you use the pbf_writer or pbf_builder classes, which use a std::string that you supply as their buffer for building the actual protocol buffers message into. But you can use a different buffer implementation instead. This might be useful if you want to use a fixed-size buffer, for instance.
The pbf_writer and pbf_builder classes are actually only aliases for the basic_pbf_writer and basic_pbf_builder template classes:
using pbf_writer = basic_pbf_writer<std::string>;
template <typename T>
using pbf_builder = basic_pbf_builder<std::string, T>;
If you want to use a different buffer type, use the basic_* form of the class and use the buffer class as template parameter. When instantiating the basic_pbf_writer or basic_pbf_builder, the only parameter to the constructor must always be a reference to an object of the buffer class.
some_buffer_class buffer;
basic_pbf_writer<some_buffer_class> writer{buffer};
For this to work you must supply template specializations for some static functions in the protozero::buffer_customization struct, see buffer_tmpl.hpp for details.
Protozero already supports two buffer types:
- std::string (to use include protozero/buffer_string.hpp)
- std::vector<char> (to use include protozero/buffer_vector.hpp)
There is a class protozero::fixed_size_buffer_adaptor you can use as adaptor for any fixed-sized buffer you might have. Include protozero/buffer_fixed.hpp to use it:
#include <protozero/buffer_fixed.hpp>
your_buffer_class some_buffer;
protozero::fixed_size_buffer_adaptor buffer_adaptor{some_buffer.data(), some_buffer.size()};
basic_pbf_writer<protozero::fixed_size_buffer_adaptor> writer{buffer_adaptor};
The buffer adaptor can be initialized with any container if it supports the data() and size() member functions:
protozero::fixed_size_buffer_adaptor buffer_adaptor{some_buffer};