Chapter 42. A C++-ish strtok(); strsplit(), join(), and trim()

Two template functions implement strtok()-like functionality that's more C++-like than the standard C library version:

#include <x/strtok.H>

std::string str;

std::list<std::string> container;

x::strtok_str(str, " \t\r\n", container);

strtok_str()'s second argument sets the list of delimiter characters. The function extracts each consecutive sequence of non-delimiter characters from the first argument and adds it to the container passed as the third argument, which can be any container that implements push_back(). The above example extracts whitespace-delimited words from the string and appends them to the container.
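The behavior described above can be approximated in standard C++. The following is a minimal sketch, not the library's implementation: split_on_delimiters() is a hypothetical name, and the sketch assumes that runs of consecutive delimiters never produce empty tokens, as with the C library's strtok().

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for illustration only; not part of the library.
// Appends each run of non-delimiter characters in str to container.
template<typename Container>
void split_on_delimiters(const std::string &str,
                         const std::string &delimiters,
                         Container &container)
{
    std::string::size_type pos = 0;

    // Skip leading delimiters; stop when only delimiters (or nothing) remain.
    while ((pos = str.find_first_not_of(delimiters, pos))
           != std::string::npos)
    {
        // The token ends at the next delimiter, or at the end of the string.
        std::string::size_type end = str.find_first_of(delimiters, pos);

        // substr() clips the count when end is npos.
        container.push_back(str.substr(pos, end - pos));
        pos = end;
    }
}
```

Any container with push_back() works as the third argument, so the same sketch fills a std::list, a std::vector, or a std::deque.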

#include <x/strtok.H>

std::string str;

std::list<std::string> container;

x::strtok_if(str,
             [] (char c)
             {
                return c == ' '|| c == '\t' || c == '\n' ||
                    c == '\r';
             }, container);

The strtok_if() function uses a functor or a predicate, rather than a literal string, to define the delimiter characters. The functor takes one parameter, a single character, and returns true if the character should be considered a delimiter character.
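A predicate-driven tokenizer of this kind can be sketched with the standard algorithms. This is an illustration under assumptions, not the library's code: split_if() is a hypothetical name, and empty tokens are assumed to be skipped.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; not part of the library.
// is_delimiter(c) returns true for characters that separate tokens.
template<typename Predicate, typename Container>
void split_if(const std::string &str,
              Predicate is_delimiter,
              Container &container)
{
    auto b = str.begin();
    auto e = str.end();

    while (true)
    {
        // Skip over any delimiters in front of the next token.
        b = std::find_if_not(b, e, is_delimiter);

        if (b == e)
            break;

        // The token runs up to the next delimiter or the end of the string.
        auto p = std::find_if(b, e, is_delimiter);

        container.push_back(std::string(b, p));
        b = p;
    }
}
```

A predicate makes it possible to express delimiter sets that are awkward to spell out as a literal string, such as "any punctuation character".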

x::join() is the opposite of strtok(), and combines a collection of strings together, with a separator string sandwiched in between:

#include <x/join.H>
#include <iostream>
#include <filesystem>
#include <set>
#include <string>

int main()
{
    std::set<std::string> files;

    for (auto &d: std::filesystem::directory_iterator("."))
    {
        files.insert(d.path().filename());
    }

    std::cout << x::join(files, "\n") << std::endl;
    return 0;
}

x::join() is heavily overloaded. The collection of strings can be specified as a container, or as an input sequence defined by a beginning and an ending iterator. The separator string can be a string object or a literal string. The resulting joined string can be written to an output iterator, or returned as a single string.
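The relationship between the overloads can be sketched as one iterator-based routine that writes to an output iterator, with a container-based convenience wrapper on top. The names join_to() and join_strings() are hypothetical, and the sketch assumes the sequence holds std::string values; the library's actual overload set may differ.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical stand-in for illustration only; not part of the library.
// Writes the strings in [first, last) to out, separated by separator.
template<typename InputIt, typename OutputIt>
OutputIt join_to(InputIt first, InputIt last,
                 const std::string &separator, OutputIt out)
{
    bool need_separator = false;

    for (; first != last; ++first)
    {
        // The separator goes between elements, never before the first one.
        if (need_separator)
            out = std::copy(separator.begin(), separator.end(), out);
        need_separator = true;

        const std::string &s = *first;
        out = std::copy(s.begin(), s.end(), out);
    }
    return out;
}

// Convenience wrapper: joins a whole container and returns one string.
template<typename Container>
std::string join_strings(const Container &c, const std::string &separator)
{
    std::string result;

    join_to(c.begin(), c.end(), separator, std::back_inserter(result));
    return result;
}
```

Writing to an output iterator avoids building an intermediate string when the result goes straight to a stream, for example through std::ostreambuf_iterator<char>.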

#include <x/strsplit.H>

std::vector<std::string> words;

std::string line;

x::strsplit_if(line.begin(), line.end(), words,
   []
   (char c)
   {
       // isspace() requires a value representable as unsigned char.
       return isspace(static_cast<unsigned char>(c)) != 0;
   },
   []
   (char quote)
   {
       return quote == '"'; // ASCII 34
   });

x::strsplit_if() implements string splitting with quoting. This example splits a string on whitespace, except that the quote character delimits quoted content: whitespace inside quotes becomes part of the split word. If two consecutive quote characters appear inside a quoted string, the first one gets dropped and the second one gets included in the split word.
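The quoting rules can be made concrete with a small state machine. The sketch below is an interpretation of the description above, not the library's implementation: split_quoted() is a hypothetical name, and behavior the text does not specify (an unterminated quote, or an empty quoted string, which this sketch emits as an empty word) is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; not part of the library.
template<typename Iter, typename Container,
         typename IsDelimiter, typename IsQuote>
void split_quoted(Iter b, Iter e, Container &words,
                  IsDelimiter is_delimiter, IsQuote is_quote)
{
    std::string word;
    bool in_word = false;   // a word has been started
    bool in_quotes = false; // currently inside quoted content

    for (; b != e; ++b)
    {
        char c = *b;

        if (in_quotes)
        {
            if (!is_quote(c))
            {
                word.push_back(c); // delimiters are literal inside quotes
                continue;
            }

            Iter next = b;
            ++next;

            if (next != e && is_quote(*next))
            {
                word.push_back(*next); // doubled quote: keep the second one
                b = next;
            }
            else
            {
                in_quotes = false; // closing quote
            }
        }
        else if (is_quote(c))
        {
            in_quotes = true; // opening quote is not part of the word
            in_word = true;
        }
        else if (is_delimiter(c))
        {
            if (in_word)
            {
                words.push_back(word);
                word.clear();
                in_word = false;
            }
        }
        else
        {
            word.push_back(c);
            in_word = true;
        }
    }

    if (in_word)
        words.push_back(word);
}
```

With a space as the delimiter and a double quote as the quote character, the input `one "two  three" "say ""hi""" four` splits into four words: `one`, `two  three`, `say "hi"`, and `four`.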

x::strsplit(line.begin(), line.end(), words, " \t\r", 34);

x::strsplit() is a wrapper for x::strsplit_if() that builds the two predicates from a literal string of delimiter characters and a quote character.

std::string str;
std::string trimmed=x::trim(str);

An overloaded x::trim() removes all leading and trailing whitespace from a std::string or std::u32string, and returns the trimmed string. The two-argument overload of x::trim() takes a beginning and an ending iterator of a character sequence. Both iterators are passed by reference, and x::trim() updates them in place to skip all leading and trailing whitespace.
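The two-iterator form can be sketched as follows, with the string-returning form built on top of it. The names trim_range() and trim_copy() are hypothetical, the sketch handles char sequences only, and it assumes that "whitespace" means whatever std::isspace() reports; the library's own definition may differ.

```cpp
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical stand-in for illustration only; not part of the library.
// Narrows [b, e) in place so that it excludes surrounding whitespace.
template<typename Iter>
void trim_range(Iter &b, Iter &e)
{
    auto is_space = [](char c)
    {
        // std::isspace() requires a value representable as unsigned char.
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };

    // Advance the beginning iterator past leading whitespace.
    while (b != e && is_space(*b))
        ++b;

    // Move the ending iterator back over trailing whitespace.
    while (b != e && is_space(*std::prev(e)))
        --e;
}

// String-returning form, built on the iterator form.
inline std::string trim_copy(const std::string &s)
{
    auto b = s.begin();
    auto e = s.end();

    trim_range(b, e);
    return std::string(b, e);
}
```

Because the iterator form only moves the two iterators, it trims a sequence without copying or modifying the underlying characters.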