Make entry points of individual pipeline stages responsible for
configuring the task scheduler with requested number of threads
passed in corresponding configuration bundle (ie. follow extractor).
Rename module partition to partitioner.
This cultivates naming used in existing modules like extractor,
customizer, etc. - noun vs verb (word partition is both though).
Here's all I could get out of a instrumented `osrm-partition`; debug build,
with all the bells and whistles I could think of to make it more verbose:
```
==17928==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1560 byte(s) in 3 object(s) allocated from:
#0 0x7f4244185b30 in operator new[](unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cc:62
#1 0x7f4242a788b3 (/usr/lib/libtbb.so.2+0x208b3)
SUMMARY: AddressSanitizer: 1560 byte(s) leaked in 3 allocation(s).<Paste>
``
Symbolizing the address results in
```
echo "/usr/lib/libtbb.so 0x7f4242a788b3" | llvm-symbolizer
_fini
```
Looks like a crt finalizer => static global dtor "leaking" from tbb.
Which turned out to be a missing `tbb::task_scheduler_init` on our end:
> Using task_scheduler_init is optional in Intel® Threading Building
> Blocks (Intel® TBB) 2.2. By default, Intel TBB 2.2 automatically creates
> a task scheduler the first time that a thread uses task scheduling
> services and destroys it when the last such thread exits.
https://www.threadingbuildingblocks.org/docs/help/hh_goto.htm?index.htm#reference/task_scheduler/task_scheduler_init_cls.html
Without an explicit instanz the first call to a tbb algorithm seem to initialize
a global scheduler singleton which then "leaks" until the program exits.
Phew.
Implements parallel recursion for the partitioner
Fixes osrm-extract's -dump-partition-graph: accept no further tokens
References:
- http://www.boost.org/doc/libs/1_55_0/doc/html/boost/program_options/bool_switch.html
Pulls parameters through to make them configurable from the outside
Defaults are equivalent to:
./osrm-partition \
berlin-latest.osrm \
--max-cell-size 4096 \
--balance 1.2 \
--boundary 0.25 \
--optimizing-cuts 10
Fixes parallel_do call for Intel TBB 4.2 (Trusty): no range-based overload