This fixes issues #3952. The new approach pre-computes masks for fast
access. Since elements can potentially span multiple words we need masks
and offsets for each upper and lower word.
Due to a bug in the C++14 standart the mask computation is not
recognized as constexpr, but would work on C++17.