Created
December 24, 2016 10:49
A Good, Idiomatic, STL-like split algorithm?
#include <algorithm>
#include <iostream>
#include <string>
#include <tuple>
#include <utility>

/*
   This split impl has a nice prototype and isn't callback / continuation
   oriented. However, it has two major drawbacks:

   1) Iterating over it is a bit painful. See below.
   2) If the first split-range is empty, it's ignored. The following ones are
      not. See below how the range where "e" is missing actually produces an
      iteration, while the one where "g" is missing doesn't. That's not
      consistent, and I see no way to fix this with this interface.
*/
template <class FwdIt, class Separator>
std::pair<FwdIt, FwdIt> split(FwdIt b, FwdIt e, const Separator& sep) {
    if (b == e) {
        return std::make_pair(e, e);
    }
    // Skip the separator left over from the previous call. This is also what
    // swallows an empty first range.
    if (*b == sep) {
        ++b;
    }
    return std::make_pair(b, std::find(b, e, sep));
}

int main() {
    std::string csv =
        "a,b,c\n"
        "d,,f\n"
        ",h,i\n";
    auto line = split(csv.begin(), csv.end(), '\n');
    while (line.first != csv.end()) {
        std::cout << "line: " << std::string(line.first, line.second) << "\n";
        auto val = split(line.first, line.second, ',');
        while (val.first != line.second) {
            std::cout << " val: " << std::string(val.first, val.second)
                      << "\n";
            val = split(val.second, line.second, ',');
        }
        line = split(line.second, csv.end(), '\n');
    }
}
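One possible way around the inconsistency described in the comment above, at the cost of giving up the plain pair-returning prototype, is to keep an explicit "done" flag next to the current range, so that an empty field and the end of the input are no longer both encoded in the iterators. The following is only a sketch under that assumption: `split_cursor` and `collect_fields` are hypothetical names introduced for illustration, not part of the gist or of any library.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: walks the fields of [b, e) separated by sep.
// The explicit done_ flag lets empty fields (leading, middle, or trailing)
// be reported the same way as non-empty ones.
template <class FwdIt, class Separator>
class split_cursor {
public:
    split_cursor(FwdIt b, FwdIt e, const Separator& sep)
        : first_(b), last_(std::find(b, e, sep)), end_(e), sep_(sep),
          done_(false) {}

    bool done() const { return done_; }
    FwdIt field_begin() const { return first_; }
    FwdIt field_end() const { return last_; }

    // Advance to the next field, or mark the cursor as finished.
    void next() {
        if (last_ == end_) {
            done_ = true;
            return;
        }
        first_ = std::next(last_);  // step over the separator
        last_ = std::find(first_, end_, sep_);
    }

private:
    FwdIt first_, last_, end_;
    Separator sep_;
    bool done_;
};

// Illustrative use: copy every field of s into a vector.
std::vector<std::string> collect_fields(const std::string& s, char sep) {
    std::vector<std::string> out;
    for (split_cursor<std::string::const_iterator, char> c(s.begin(), s.end(), sep);
         !c.done(); c.next()) {
        out.emplace_back(c.field_begin(), c.field_end());
    }
    return out;
}
```

With this sketch, `collect_fields(",h,i", ',')` yields an empty field followed by "h" and "i", so the missing "g" is reported just like the missing "e". The trade-off is that the cursor is a stateful object rather than a free function returning a pair.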
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>

/*
   This implementation is consistent and works. However:

   1) CPS isn't very STL-idiomatic.
   2) It also forces you to visit every split. You can certainly play with
      the b/e iterators to discard some, but that sucks.
*/
template <class FwdIt, class T, class F>
void split(FwdIt b, FwdIt e, const T& x, F&& f) {
    while (true) {
        auto found = std::find(b, e, x);
        f(b, found);
        if (found == e) {
            return;
        }
        // std::next instead of found + 1, so that plain forward iterators
        // work as the FwdIt name promises.
        b = std::next(found);
    }
}

int main() {
    std::string csv =
        "a,b,c\n"
        "d,,f\n"
        ",h,i\n";
    split(csv.begin(), csv.end(), '\n', [&](auto lb, auto le) {
        std::cout << "line: " << std::string(lb, le) << "\n";
        split(lb, le, ',', [&](auto tb, auto te) {
            std::cout << " val: " << std::string(tb, te) << "\n";
        });
    });
}
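The second drawback noted in the comment above (every split gets visited) could be softened by letting the callback return a bool that says whether to continue. This is a sketch of that variation, not the gist's own code: `split_while` and `first_fields` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical variant of the callback-based split: f returns true to keep
// going and false to stop early.
template <class FwdIt, class T, class F>
void split_while(FwdIt b, FwdIt e, const T& x, F&& f) {
    while (true) {
        auto found = std::find(b, e, x);
        if (!f(b, found) || found == e) {
            return;
        }
        b = std::next(found);
    }
}

// Illustrative use: keep at most n leading fields of s.
std::vector<std::string> first_fields(const std::string& s, char sep,
                                      std::size_t n) {
    std::vector<std::string> out;
    if (n == 0) {
        return out;
    }
    split_while(s.begin(), s.end(), sep,
                [&](std::string::const_iterator b,
                    std::string::const_iterator e) {
                    out.emplace_back(b, e);
                    return out.size() < n;  // stop once n fields are stored
                });
    return out;
}
```

It remains continuation-passing, so the first drawback (not very STL-idiomatic) still applies; only the forced full traversal goes away.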
Hi Guillaume,
What do you think of the split view adaptor in range-v3 as a solution to this problem?
std::string input = "split me into words";
auto splitRange = view::split(input, [](char c){ return c == ' '; });
which produces a view on the initial range over which you can iterate.
If you want to keep the results after the initial string is destroyed, you can copy splitRange into a container, or alternatively use Boost's split:
std::string input = "split me into words";
std::vector<std::string> results;
boost::split(results, input, [](char c){ return c == ' '; });
Although I think boost's interface clarity could be improved by returning the results by value:
std::string input = "split me into words";
std::vector<std::string> results = split(input, [](char c){ return c == ' '; });
which shouldn't impact performance thanks to move semantics.
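To make the return-by-value idea concrete, here is a minimal sketch of what such a function might look like. It is not Boost's implementation and does not reproduce Boost's options; `split_by_value` is a hypothetical name, and the predicate-based signature is assumed from the snippet above.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical return-by-value split: copies each field of input into a
// vector, using is_sep to recognise separator characters.
template <class Pred>
std::vector<std::string> split_by_value(const std::string& input, Pred is_sep) {
    std::vector<std::string> results;
    auto b = input.begin();
    while (true) {
        auto found = std::find_if(b, input.end(), is_sep);
        results.emplace_back(b, found);  // empty fields are kept
        if (found == input.end()) {
            break;
        }
        b = std::next(found);
    }
    return results;  // moved or elided, so no extra copy of the vector
}
```

Calling `split_by_value("split me into words", [](char c){ return c == ' '; })` gives the four words as separate strings. Each field is still copied out of the input once, which is the price of owning the results independently of the original string.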
References:
split view adaptor in range-v3:
https://github.com/ericniebler/range-v3/blob/ca997df10962c482274e6be37fdbe39add8664c9/test/view/split.cpp
boost split:
http://www.boost.org/doc/libs/1_57_0/doc/html/string_algo/usage.html#idp430824992