Do not try to force boys to be masculine. You will only break them.

Ben Key: Ben.Key@YekNeb.com

September 18, 2018

 

When I was a child, every attempt to beat masculinity into me was made. I was never given a chance to even suspect that there was an alternative.

I was constantly told the following.

  • Men do not cry.
  • Men do not show emotions.
  • Men accept whatever comes stoically, without complaint.
  • Men do not show when they are in pain.

I was an adult before I began to realize that I did not fit in with other men. I was an adult before I realized that I preferred dresses over jeans and a tee-shirt.

I was embarrassed. I did not understand why I preferred to wear a dress. I was confused. There was no one I could go to.

I was married before I even heard the term “transgendered.” I still would have married a woman, but I would have chosen a woman who likes other women, not a woman who likes men.

When I finally accepted that I am transgendered, it ruined my marriage. The first thing my wife said to me was “I did not marry a woman.” She then accused me of marrying her under false pretenses.

The thing is, she is wrong. I was in denial. I was ashamed. I had not accepted who and what I am. I had never even heard the word “transgendered.”

I blame those who tried to pound masculinity into me with a belt or, when that did not prove to be effective, a piece of two by four for ruining my marriage and my life.

I thought of this when I read the article Many Ways to Be a Girl, but One Way to Be a Boy: The New Gender Rules on NYTimes.com.

Please. Do not destroy the life of your child. Do not crush their soul. I am begging you.

 


Musings on the Formatting of C and C++ Code

Ben Key: Ben.Key@YekNeb.com

October 4, 2013; October 26, 2018

 

I was watching the video The Care and Feeding of C++’s Dragons and found it very interesting. The Clang-based program that reformats code struck me as especially promising. However, some things about the tool disturb me. It seems to use some formatting patterns that I think are huge mistakes.

To illustrate, some of the sample code looks like the following.

int functionName(int a, int b, int c,
                 int d, int e, int f,
                 int g, int h, int i) const {
  /* Code omitted. */
  if (test()) {
    /* Code omitted. */
  }
}

There are two problems that I see with formatting code in this way. The first is that on the line where the list of variables is continued, there is a great deal of what I see as entirely unnecessary white space. In my opinion it should be as follows.

int functionName(int a, int b, int c,
  int d, int e, int f,
  int g, int h, int i) const {

Or, my personal preference would be the following.

int functionName(
  int a, int b, int c,
  int d, int e, int f,
  int g, int h, int i) const {

My reasoning is quite simple.

Examine what happens when the name of the function is changed to something longer and the code is then reformatted. The code becomes as follows.

int aMuchLongerFunctionName(int a, int b, int c,
                            int d, int e, int f,
                            int g, int h, int i) const {

If you do a diff of the original version of the code and the new code, it will show that three lines have changed instead of showing that only one line has changed. I am aware of the fact that many diff tools have an ignore white space option and that if this option is set it will only show one line as having changed. However, not all diff tools have that option. In addition, some companies and organizations have a strict policy that every line of code that changes must be changed for a purpose associated with a given task. And these companies do not accept changes that are only related to reformatting as being associated with a given task. In essence, if the changed line does not affect the functionality of the code, it is not an acceptable change. And these organizations will deliberately not turn on the ignore white space option and will turn a deaf ear to the argument that they should just enable that option (can you tell that I am speaking from experience?).

If you are in such a situation, and you change the name of a function that was initially formatted with the parameter list aligned with the end of the function name, and you adhere to a strict “every changed line must have a functional purpose” rule, you will inevitably end up with the following.

int aMuchLongerFunctionName(int a, int b, int c,
                 int d, int e, int f,
                 int g, int h, int i) const {

This just looks wrong!

There is also another reason for not aligning the parameters with the end of the function name. Consider the following.

int aVeryLongFunctionNameThatGoesBeyondTheEdgeOfTheScreen(int a, int b, int c,
                                                          int d, int e, int f,
                                                          int g, int h, int i) const {

In this case, you cannot see the parameters at all without wasting your time scrolling across the screen.

If you always begin the parameter list on its own line that is indented one level deep as follows, you would not ever have to scroll the screen just to see the parameter list.

int aVeryLongFunctionNameThatGoesBeyondTheEdgeOfTheScreen(
  int a, int b, int c,
  int d, int e, int f,
  int g, int h, int i) const {

The second issue I have is with putting braces at the end of the line. In C and C++ braces are optional for some statements such as if. And let’s face the facts, C and C++ code is often inconsistently indented. Putting braces at the end leads to more work in the following scenario.

if (aVeryLongTestThatGoesPastTheEdgeOfTheScreen()) {
  /*
       Thousands
 of
         inconsistently indented
    lines of
 code */
}

Putting the brace at the end of the if line forces someone who is reading the code to hit the end key to determine how much code will only be called if the condition is true, one line or thousands of lines, when they might not give a damn about seeing the end of the test because the beginning of it is enough to tell them whether or not the condition can be true in the scenario they are working on. What if the person knows that, in the case they are working on, the function “aVeryLongTestThatGoesPastTheEdgeOfTheScreen” will return false? They really do not need to see the end of the test in this case except to find out how many lines they need to skip past in order to get to code that is relevant to their task. Why not just put the brace on a line by itself and make everyone’s life so much easier? Why force someone to hit the end key just so they can answer the following question: how many lines do I need to skip to get to code that is relevant to my task?

Until C and C++ do as they did in the Go language and make the braces mandatory, I believe braces should never be at the end of the line.

In Go, where the braces are mandatory, it does not matter as much to me because I know that if the code compiles the brace is there and I do not care if I cannot see it. But in C and C++, I do not want you to force me to find and hit the end key just so I can tell where your if statement ends. Of course, that does not mean that I think that the decision to put the brace at the end was a good one for Go. I often use a paren matching feature to skip past the irrelevant code in the scenario I have described. That requires that the caret be on the opening paren. In Go I need to hit the end key anyway just to get the caret on the brace so I can use the paren matching feature to skip past code I do not care about. Why? If the brace were on a line by itself, I would not need to locate and hit the end key. I could just arrow down to the brace line and use the paren matching feature.

I know that these arguments are only relevant to the placement of braces for conditional statements and that they are not relevant to the placement of braces at the beginning of functions. However, I still feel that the opening brace of a function should be on a line by itself for the sake of consistency.

I cannot believe other people have adopted code formatting patterns that to me are so obviously mistakes. Is there something I am missing that makes my arguments invalid?

And before you say, “just hit the end key, it is not that hard,” consider the fact that some people are hunt and peck typists. For some people, any extra key they need to hunt for unnecessarily is an aggravation that interrupts their work flow. I am certain that for some people who are touch typists, hitting one additional key is no big deal, but for hunt and peck typists, it can be.

I for one am a hunt and peck typist despite the fact that I began using computers in 1985, and for me finding the end key just to find out how many lines of code will only get called in the “condition is true” case is enough of a disruption that I find it extremely annoying.

When I first wrote this article back in 2013, I was not aware of the many options for customizing the behavior of clang-format.

Fortunately, you can easily customize the behavior of clang-format. There are numerous Clang-Format Style Options available. For example, you can instruct clang-format to “always break after an open bracket, if the parameters don’t fit on a single line” by setting AlignAfterOpenBracket to AlwaysBreak.

When you use clang-format to format a file it will search for a “.clang-format file located in one of the parent directories of the source file” and load the various formatting options from there. Clang-format also has a number of predefined coding styles to choose from: LLVM, Google, Chromium, Mozilla, and WebKit. You can use the -fallback-style and -style command line arguments to specify the coding style you wish to use. For more information see the ClangFormat manual.

I have begun using clang-format for my own open source projects, and I am pleased with the results. If you are interested, you can take a look at my SnKOpen .clang-format file.

There are various websites that will help you to generate the perfect .clang-format file for your project. One of the best is the clang-format configurator. The Unformat project, which generates a .clang-format file from an example codebase, may also be worth investigating.

To Brace Or Not To Brace

Summary

In this article I discuss my opinions on when braces should be used to delineate blocks of code in C and C++. In addition I discuss my views on where in the code the braces should be placed. I use examples from thirteen years of experience in C and C++ programming to back up my opinions.

Discussion

One commonly debated topic in C and C++ programming is whether or not braces should be used with if, while, and for statements. The debate stems from the fact that if the if, while, or for statement controls exactly one statement after the test, braces are not required. For example, the following is allowed in the C and C++ language specifications:

if ({test})
    {statement}

while ({test})
    {statement}

for ({start}; {test}; {next})
    {statement}

According to the C and C++ language specifications braces are only considered mandatory if more than one statement is to be executed when the {test} evaluates to true. In fact, the C and C++ language specifications allow {statement} to be on the same line as the {test}.

However, it is my professional opinion that {statement} should never be placed on the same line as the {test}. In addition, braces should be considered mandatory.

First I will discuss the basis for my opinion that {statement} should never be placed on the same line as the {test} in if, for, and while statements.

Consider the following code snippet:

if (foo()) bar();
    baz();

When tracing through this code snippet in a debugger the debugger will stop on the

if (foo()) bar();

line. When the user uses the “step over” command, the debugger stops on the

baz();

line. The debugger gives no indication of whether or not the function bar was ever called.

If this code snippet were written as follows,

if (foo())
    bar();
baz();

the following will happen as the user steps through the code. First, the debugger will stop on the

if (foo())

line. When the user uses the “step over” command, the debugger will stop on the bar line if foo returned true. Otherwise the debugger will stop on the baz line next. By simply changing the formatting so that the call to bar is on its own line the code becomes much easier to debug and the user no longer has any doubt about whether or not the function bar was called. For this reason, the {statement} should never be placed on the same line as the {test} of an if, for, or while statement.

Some will argue that the user can use the step in command to determine if bar is called in the original version of the if statement. However, it is not practical to do so. This is because the first time the step in command is used on the

if (foo()) bar();

line, the debugger will step into foo. The user will then have to use the step out command to return to the function containing the if statement and use the step in command again to determine whether or not bar is called.

Matters are worse if the {test} of the if statement is more complicated such as the following:

if ((foo1() || foo2() || foo3()) && foo4()) bar();

In this case the user will need to use the step in, step out, step in sequence as many as four times just to find out if bar is called. Expecting someone to go to this much trouble to determine if a single function is called is simply unreasonable.

Next I will discuss the basis for my opinion that braces should be considered mandatory.

First, changes over time are easier to track if braces are considered to be mandatory. Consider the following function in which the if statement is written without braces:

/* revision 1 */
void fun()
{
    if (foo())
        bar();
    baz();
}

Every application changes over time. Let’s say that the function changes so that the function bar1 needs to be called in addition to the function bar if foo returns true. The function fun becomes as follows:

/* revision 2 */
void fun()
{
    if (foo())
    {
        bar();
        bar1();
    }
    baz();
}

If you use a tool such as diff to determine what changed between revisions 1 and 2 of this function, it will indicate that three lines of code changed. The first change is the addition of the open brace. The second change is the addition of the call to bar1 after the call to bar. The third change is the addition of the closing brace. However, there was only one line of code that changed the actual functionality of the function fun.

If revision 1 of fun were written as follows:

/* revision 1 */
void fun()
{
    if (foo())
    {
        bar();
    }
    baz();
}

then diff would indicate that only one line had changed.

Next, considering braces to be mandatory protects you from possible mistakes by developers making changes to your code when they are in a hurry and under a lot of pressure. Consider the original version of the function fun listed above. Let’s assume that a developer wishes to modify the function so that it writes a message to a log file when fun is about to call bar, but they are in a hurry, or perhaps have just finished a task in Python, which uses indentation and not braces to delineate code blocks, and they forget to add the braces. Then the function becomes as follows:

/* revision 2 */
void fun()
{
    if (foo())
        log("fun calling bar because foo returned TRUE.");
        bar();
    baz();
}

This code will compile, and unless a diagnostic such as GCC’s -Wmisleading-indentation is enabled, it will do so without warnings. However, it changes the behavior of fun in an obviously unwanted way: fun now calls bar even if foo does not return TRUE. Fortunately, it is easy to tell that this change in behavior was not intended in this case.
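To make the problem concrete, the following sketch shows how the compiler groups the statements in the unbraced revision 2. The stub bodies for foo, bar, and baz, the logMessage function (standing in for log), and the barCalls counter are assumptions added purely for illustration.

```cpp
#include <string>

// Hypothetical stubs, for illustration only.
static int barCalls = 0;

bool foo() { return false; }              // the test fails
void logMessage(const std::string&) {}    // stand-in for a real logger
void bar() { ++barCalls; }
void baz() {}

// Revision 2 as the compiler sees it. Without braces, only the
// first statement after the if belongs to the if.
void fun()
{
    if (foo())
    {
        logMessage("fun calling bar because foo returned TRUE.");
    }
    bar();   // always executed, regardless of what foo returned
    baz();
}
```

Calling fun once with these stubs leaves barCalls at 1 even though foo returned false, which is exactly the unintended change in behavior.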

The problem becomes more complicated in situations in which, instead of adding code to log the function call, the task is to have a function be called before bar if foo returns TRUE. Again, let’s assume that the developer is in a hurry or still has Python on his mind, so he forgets to add the braces. Then the function fun becomes as follows:

/* revision 2 */
void fun()
{
    if (foo())
        fun1();
        bar();
    baz();
}

This code will also compile, typically without any warnings. However, determining if this change is in error is not as easy as in the first case, in which the change was the addition of a line of code intended for logging. By just looking at the code, can you tell with 100% certainty, without asking the developer who made the change, that the developer did not intend to change the function fun so that bar is called all the time? If you are using a source control tool such as Subversion to track changes to your software over time and the developer provides detailed change descriptions, it is possible that you could. However, under most circumstances, you could not be 100% certain that the change in behavior was not intentional without talking to the developer who made the change. Then what will you do if the developer has died or is unavailable for some other reason?

If braces are considered mandatory, this problem will never come up in your project.

The final reason that braces should be considered mandatory is that it eases code navigation in modern text editors. Most modern text editors have a brace matching capability that allows you to jump to the matching brace. In if, for, and while statements this lets you jump to the end of the statement with a single command. For simple if, for, and while statements this makes no difference. However, there are cases in which the “braces are optional for a single statement” rule is misused and code is written like this.

if ({test})
    if ({test1})
        if ({test2})
            for ({start}; {test3}; {next})
            {
                /*
                several thousand lines of code
                */
            }

In the case that you are reading through this code and you know that {test} does not return TRUE, you do not care about what happens if {test} returns TRUE. You want to move past the for loop to find out what happens if {test} returns FALSE. If braces were present for the “if ( {test} )” statement, you could simply press down arrow once and then use the move to matching brace command to move on to that section of code. However, there are no braces so you have to arrow down four times before using the move to matching brace command. If this same code were written as follows, the extra three keystrokes would not be necessary.

if ({test})
{
    if ({test1})
    {
        if ({test2})
        {
            for ({start}; {test3}; {next})
            {
                /*
                several thousand lines of code
                */
            }
        }
    }
}

There is also a debate about where the braces should be placed in code. In all my examples the opening brace is located on its own line. However, many programmers prefer to place the opening brace at the end of the if, for, or while line as follows:

if ({test}) {
    {statement}
}

This is perfectly legal according to the C and C++ language specifications. However it is my opinion that this should never be done, that the opening brace should always be placed on its own line. Consider the following:

if ({AVeryLongAndComplicatedTestThatGoesOffTheRightEdgeOfTheScreen}) {
    {statement}
    {statement1}
    /*
    several thousand more lines of code
    */
}

In this case, assuming that the test actually does go off the right edge of the screen, can you tell with absolute certainty that {statement1} and the several thousand additional lines of code will only get called if the test returns TRUE without going to the trouble of using the end key to determine whether or not the if line ends in a brace? Simply depending on indentation is not an accurate indicator. This is because the C and C++ language specification allows for different levels of indentation to be used in the same block of code. For example the following is legal in C and C++.

if ({test})
{
    {statement}
        {statement1}
{statement3}
    {statement4}
}

The fact is that in code where braces are placed at the end of the if, for, or while line, someone reading the code must go to the trouble of using the end key every time an if, for, or while line is encountered that goes off the right edge of the screen in order to determine whether multiple lines of code or a single line of code gets called when the test returns TRUE. This simply makes the job of reviewing the code much more difficult.

To summarize, braces should be considered mandatory in if, for, and while statements in order to make tracking changes over time easier, to protect you from the harried programmer phenomenon, and to make navigating through your code easier. In addition, the {statement} should never be placed on the same line as the {test} in if, for, or while statements in order to make it easier to debug your code. Finally, braces should never be placed at the end of the if, for, or while line in order to make it easier to determine whether one statement or many statements get called if the {test} returns TRUE when the test is long enough that it actually goes off the right edge of the screen.

 

Splitting a string in C++


Ben Key:

June 11, 2013; Updated November 25, 2018

Introduction

A common task in programming is to split a delimited string into an array of tokens. For example, it may be necessary to split a string containing spaces into an array of words. This is one area where programming languages like Java and Python surpass C++ since both of these programming languages include support for this in their standard libraries while C++ does not. However, this task can be accomplished in C++ in various ways.

In Java this task could be accomplished using the String.split method as follows.


String Str = "The quick brown fox jumped over the lazy dog.";
String[] Results = Str.split(" ");

In Python this task could be accomplished using the str.split method as follows.


Str = "The quick brown fox jumped over the lazy dog."
Results = Str.split()

In C++ this task is not quite so simple. It can still be accomplished in a variety of different ways.

Using the C runtime library

One option is to use the C runtime library. The following C runtime library functions, all of which appear in the code below, can be used to split a string.

  • strtok
  • strtok_r
  • strtok_s
  • wcstok
  • wcstok_s

Unfortunately, there are various differences between platforms. This necessitates the use of C preprocessor directives to determine what is appropriate for the current platform.

The following code demonstrates the technique.


// Narrow character overload. Uses a reentrant tokenizer where one is available.
char* FindToken(
    char* token, const char* delim, char** saveptr)
{
#if (_SVID_SOURCE || _BSD_SOURCE || _POSIX_C_SOURCE >= 1 \
  || _XOPEN_SOURCE || _POSIX_SOURCE)
    return ::strtok_r(token, delim, saveptr);
#elif defined(_MSC_VER) && (_MSC_VER >= 1800)
    return strtok_s(token, delim, saveptr);
#else
    // Fallback: std::strtok is not reentrant and ignores saveptr.
    return std::strtok(token, delim);
#endif
}

wchar_t* FindToken(
    wchar_t* token, const wchar_t* delim, wchar_t** saveptr)
{
#if ( (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)) \
  || (defined(__cplusplus) && (__cplusplus >= 201103L)) )
    return std::wcstok(token, delim, saveptr);
#elif defined(_MSC_VER) && (_MSC_VER >= 1800)
    return wcstok_s(token, delim, saveptr);
#else
    return std::wcstok(token, delim);
#endif
}

char* CopyString(char* destination, const char* source)
{
    return std::strcpy(destination, source);
}

wchar_t* CopyString(wchar_t* destination, const wchar_t* source)
{
    return std::wcscpy(destination, source);
}

template <class charType>
size_t splitWithFindToken(
    const std::basic_string<charType>& str,
    const std::basic_string<charType>& delim,
    std::vector< std::basic_string<charType> >& tokens)
{
    std::unique_ptr<charType[]> ptr = std::make_unique<charType[]>(str.length() + 1);
    memset(ptr.get(), 0, (str.length() + 1) * sizeof(charType));
    CopyString(ptr.get(), str.c_str());
    charType* saveptr;
    charType* token = FindToken(ptr.get(), delim.c_str(), &saveptr);
    while (token != nullptr)
    {
        tokens.push_back(token);
        token = FindToken(nullptr, delim.c_str(), &saveptr);
    }
    return tokens.size();
}
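As a rough illustration of how this family of functions behaves, here is a minimal sketch that uses only std::strtok, the portable but non-reentrant fallback. The helper name splitWithStrtok is hypothetical and is not part of the code above.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: splits a narrow string
// using std::strtok, the non-reentrant fallback used by FindToken.
std::vector<std::string> splitWithStrtok(
    const std::string& str, const std::string& delim)
{
    // std::strtok writes null characters into its input, so work on a
    // modifiable, null terminated copy rather than on str itself.
    std::vector<char> buffer(str.begin(), str.end());
    buffer.push_back('\0');
    std::vector<std::string> tokens;
    char* token = std::strtok(buffer.data(), delim.c_str());
    while (token != nullptr)
    {
        tokens.push_back(token);
        // Passing nullptr continues scanning the same buffer.
        token = std::strtok(nullptr, delim.c_str());
    }
    return tokens;
}
```

With this sketch, splitting "a,,b;c" on the delimiters ",;" yields three tokens: a, b, and c. The strtok family treats a run of delimiters as a single separator, so it never returns an empty token.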

Using the basic_istringstream class

Solution 1: std::istream_iterator

Perhaps the simplest method of accomplishing this task is using the basic_istringstream class as follows.


template <class charType>
size_t splitWithStringStream(
    const std::basic_string<charType>& str,
    std::vector< std::basic_string<charType> >& tokens)
{
    typedef std::basic_string<charType> my_string;
    typedef std::vector< std::basic_string<charType> > my_vector;
    typedef std::basic_istringstream<
        charType, std::char_traits<charType> >
        my_istringstream;
    typedef std::istream_iterator<
        std::basic_string<charType>, charType,
        std::char_traits<charType> >
        my_istream_iterator;
    tokens.clear();
    if (str.empty())
    {
        return 0;
    }
    my_istringstream iss(str);
    std::copy(
        my_istream_iterator{iss}, my_istream_iterator(),
        std::back_inserter<my_vector>(tokens));
    return tokens.size();
}

The splitWithStringStream function can be used as follows.


std::string str("The quick brown fox jumped over the lazy dog.");
std::vector<std::string> tokens;
size_t s = splitWithStringStream(str, tokens);

The splitWithStringStream function has the advantage of using nothing beyond functions that are part of the C++ standard library. To use it you just need to include the following C++ standard library headers: algorithm, iterator, sstream, string, and vector.

An alternate version of the function is as follows.


template <class charType>
size_t splitWithStringStream1(
    const std::basic_string<charType>& str,
    std::vector< std::basic_string<charType> >& tokens)
{
    typedef std::basic_string<charType> my_string;
    typedef std::vector< std::basic_string<charType> > my_vector;
    typedef std::basic_istringstream<
        charType, std::char_traits<charType> >
        my_istringstream;
    typedef std::istream_iterator<
        std::basic_string<charType>, charType,
        std::char_traits<charType> >
        my_istream_iterator;
    tokens.clear();
    if (str.empty())
    {
        return 0;
    }
    my_istringstream iss(str);
    std::vector<my_string> results(
        my_istream_iterator{iss}, my_istream_iterator());
    tokens.swap(results);
    return tokens.size();
}

The splitWithStringStream and the splitWithStringStream1 functions do have two drawbacks. First, the functions are potentially inefficient and slow since the entire string is copied into the stream, which takes up just as much memory as the string. Second, they only support whitespace delimited strings: the stream extraction operator always splits on spaces, tabs, and newlines, and no other delimiter can be specified.
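The delimiter limitation can be seen in a small experiment. The following is a minimal sketch for narrow strings only; the helper name splitOnWhitespace is hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: a narrow character
// version of the stream based approach.
std::vector<std::string> splitOnWhitespace(const std::string& str)
{
    std::istringstream iss(str);
    // operator>> skips leading whitespace and stops at the next
    // whitespace character, so tabs and newlines separate tokens too.
    return std::vector<std::string>(
        std::istream_iterator<std::string>(iss),
        std::istream_iterator<std::string>());
}
```

Given " The  quick\tbrown\nfox ", this sketch produces four tokens, because runs of spaces, tabs, and newlines are all skipped. Given "a,b,c", it produces a single token, because a comma is not whitespace.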

Solution 2: std::getline

The following function makes it possible to use a character other than space as the delimiter.


template<typename charType>
size_t splitWithGetLine(
    const std::basic_string<charType>& str,
    const charType delim,
    std::vector< std::basic_string<charType> >& tokens)
{
    typedef std::basic_string<charType> my_string;
    typedef std::basic_istringstream<
        charType, std::char_traits<charType> >
        my_istringstream;
    tokens.clear();
    if (str.empty())
    {
        return 0;
    }
    my_istringstream iss(str);
    my_string token;
    while (std::getline(iss, token, delim))
    {
        tokens.push_back(token);
    }
    return tokens.size();
}

This function can be used as follows.


std::wstring str(L"This is a test.||This is only a test.|This concludes this test.");
std::vector<std::wstring> tokens;
size_t s = splitWithGetLine(str, L'|', tokens);

Note that this solution does not skip empty tokens, so the above example will result in tokens containing four items, one of which would be an empty string.
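If empty tokens are not wanted, one possible adjustment is to discard them as they are read. The following is a minimal sketch for narrow strings only; the function name splitWithGetLineSkipEmpty is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant, for illustration only: same std::getline loop,
// but empty tokens are dropped.
std::vector<std::string> splitWithGetLineSkipEmpty(
    const std::string& str, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream iss(str);
    std::string token;
    while (std::getline(iss, token, delim))
    {
        // Two adjacent delimiters produce an empty token; skip it.
        if (!token.empty())
        {
            tokens.push_back(token);
        }
    }
    return tokens;
}
```

Applied to the narrow character equivalent of the example string above, this variant returns three tokens instead of four.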

This function, like the splitWithStringStream and splitWithStringStream1 functions, is potentially inefficient and slow. It does allow the delimiter character to be specified. However, it only supports a single delimiting character. This function does not support strings in which several delimiting characters may be used.

To use the splitWithGetLine function you just need to include the following C++ standard library headers: sstream, string, and vector.

Using only members of the basic_string class

Solution 1

It is possible to accomplish this task using only member functions of the basic_string class. The following function allows you to specify the delimiting character and uses only the find_first_not_of, find, and substr members of the basic_string class. The function also has optional parameters that allow you to specify that empty tokens should be ignored and to specify a maximum number of segments that the string should be split into.


template<typename charType>
size_t splitWithBasicString(
    const std::basic_string<charType>& str,
    const charType delim,
    std::vector< std::basic_string<charType> > &tokens,
    const bool trimEmpty = false,
    const size_t maxTokens = (size_t)(-1))
{
    typedef std::basic_string<charType> my_string;
    typedef typename my_string::size_type my_size_type;
    tokens.clear();
    if (str.empty())
    {
        return 0;
    }
    my_size_type len = str.length();
    // Skip delimiters at beginning.
    my_size_type left = str.find_first_not_of(delim, 0);
    size_t i = 1;
    if (!trimEmpty && left != 0)
    {
        tokens.push_back(my_string());
        ++i;
    }
    while (i < maxTokens)
    {
        my_size_type right = str.find(delim, left);
        if (right == my_string::npos)
        {
            break;
        }
        if (!trimEmpty || right - left > 0)
        {
            tokens.push_back(str.substr(left, right - left));
            ++i;
        }
        left = right + 1;
    }
    if (left < len)
    {
        tokens.push_back(str.substr(left));
    }
    return tokens.size();
}

This function does not suffer from the same potential performance issues as the stream based functions and it allows you to specify the delimiting character. However, it only supports a single delimiting character. This function does not support strings in which several delimiting characters may be used.

To use the splitWithBasicString function you just need to include the following C++ standard library headers: string and vector.
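The way find and substr cooperate is easier to see with the options stripped away. The following is a simplified sketch, not a replacement for splitWithBasicString: the helper name splitOnChar is hypothetical, it handles narrow strings only, and it keeps every token, including empty ones.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: keeps every token,
// including empty ones, and has no trimEmpty or maxTokens options.
std::vector<std::string> splitOnChar(const std::string& str, char delim)
{
    std::vector<std::string> tokens;
    std::string::size_type left = 0;
    for (;;)
    {
        // Find the next delimiter at or after left.
        const std::string::size_type right = str.find(delim, left);
        if (right == std::string::npos)
        {
            // No more delimiters: the rest of the string is the last token.
            tokens.push_back(str.substr(left));
            break;
        }
        tokens.push_back(str.substr(left, right - left));
        left = right + 1;
    }
    return tokens;
}
```

For the input "a||b|" and the delimiter '|', this sketch returns four tokens: "a", an empty string, "b", and a trailing empty string. Whether leading, repeated, or trailing delimiters should produce empty tokens is a design choice, which is what options such as trimEmpty are for.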

Solution 2

Sometimes the string that is to be split uses several different delimiting characters. At other times it may simply be impossible to know for certain in advance what delimiting characters are used. In these cases you may know that the delimiting character could be one of several possibilities. In this case it is necessary for the function to be able to accept a string containing each possible delimiting character. This too can be accomplished using only member functions of the basic_string class.

The following function allows you to specify a set of delimiting characters and uses only the find_first_not_of, find_first_of, and substr members of the basic_string class. The function also has optional parameters that allow you to specify that empty tokens should be ignored and to specify a maximum number of segments that the string should be split into.


template<typename charType>
size_t splitWithBasicString(
    const std::basic_string<charType>& str,
    const std::basic_string<charType>& delim,
    std::vector< std::basic_string<charType> >& tokens,
    const bool trimEmpty = false,
    const size_t maxTokens = (size_t)(-1))
{
    typedef std::basic_string<charType> my_string;
    typedef typename my_string::size_type my_size_type;
    tokens.clear();
    if (str.empty())
    {
        return 0;
    }
    my_size_type len = str.length();
    // Skip delimiters at beginning.
    my_size_type left = str.find_first_not_of(delim, 0);
    size_t i = 1;
    if (!trimEmpty && left != 0)
    {
        tokens.push_back(my_string());
        ++i;
    }
    while (i < maxTokens)
    {
        my_size_type right = str.find_first_of(delim, left);
        if (right == my_string::npos)
        {
            break;
        }
        if (!trimEmpty || right - left > 0)
        {
            tokens.push_back(str.substr(left, right - left));
            ++i;
        }
        left = right + 1;
    }
    if (left < len)
    {
        tokens.push_back(str.substr(left));
    }
    return tokens.size();
}
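
As a usage illustration, the following is a condensed, non-template sketch of the same technique. It is not a replacement for the function above: the names splitAnyOf and demoSplitAnyOf are hypothetical, and the helper always discards empty tokens and has no token limit. It shows how find_first_not_of and find_first_of cooperate when the input mixes several delimiting characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical condensed helper: splits str on any character found in delims,
// always discarding empty tokens.
std::vector<std::string> splitAnyOf(const std::string& str, const std::string& delims)
{
    std::vector<std::string> tokens;
    // Locate the first character that is not a delimiter.
    std::string::size_type left = str.find_first_not_of(delims, 0);
    while (left != std::string::npos)
    {
        // Locate the delimiter that ends the current token.
        std::string::size_type right = str.find_first_of(delims, left);
        if (right == std::string::npos)
        {
            // No more delimiters: the rest of the string is the last token.
            tokens.push_back(str.substr(left));
            break;
        }
        tokens.push_back(str.substr(left, right - left));
        // Skip the run of delimiters that follows the token.
        left = str.find_first_not_of(delims, right);
    }
    return tokens;
}

// Example use: the sample sentence mixes spaces and a tab.
void demoSplitAnyOf()
{
    const std::vector<std::string> tokens =
        splitAnyOf(" The  quick brown fox\tjumped over the lazy dog.", " \t");
    assert(tokens.size() == 9);
    assert(tokens.front() == "The");
    assert(tokens.back() == "dog.");
}
```

Because the delimiter set is passed as a string, supporting another delimiting character only requires adding it to the second argument.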

Using Boost

Boost is a collection of peer-reviewed, cross-platform, open source C++ libraries that are designed to complement and extend the C++ standard library. Boost provides at least two methods for splitting a string.

Solution 1

One option is to use the boost::algorithm::split function in the Boost String Algorithms Library.

In order to use the split function simply include boost/algorithm/string.hpp and then call the function as follows.


std::string str(" The  quick brown fox\tjumped over the lazy dog.");
std::vector<std::string> strs;
boost::split(strs, str, boost::is_any_of("\t "));

Solution 2

Another option is to use the Boost Tokenizer Library. In order to use the Boost Tokenizer Library simply include boost/tokenizer.hpp. Then you can use the Boost Tokenizer as follows.


typedef boost::char_separator<char> my_separator;
typedef boost::tokenizer<my_separator> my_tokenizer;
std::string str(" The  quick brown fox\tjumped over the lazy dog.");
my_separator sep(" \t");
my_tokenizer tokens(str, sep);
my_tokenizer::iterator itEnd = tokens.end();
for (my_tokenizer::iterator it = tokens.begin(); it != itEnd; ++it)
{
    std::cout << *it << std::endl;
}

Using the C++ String Toolkit Library

Another option is to use the C++ String Toolkit Library. The following example shows how the strtk::parse function can be used to split a string.


std::string str("The quick brown fox jumped over the lazy dog.");
std::vector<std::string> tokens;
strtk::parse(str, " ", tokens);

Other Options

Of course, there are many other options. Feel free to refer to the web pages listed in the references section below for more of them.

Summary

In this article I discussed several options for splitting strings in C++.

The code for the basic_istringstream class, the basic_string class, and Boost, along with a complete sample demonstrating the use of the functions, can be found on Ideone.

References

Nogard

Fire Breathing Dragon

Image source: https://flic.kr/p/vjc4Rk

A man comes running up to a group of people, shouting “Run, run for your lives! There is a nogard right behind me!”

They reply, “Nogard, what is a nogard?”

In response he says “You know, an enormous, previously thought to be mythological, beast that flies and breathes fire.”

Laughing, the group of people reply, “Do you mean a dragon?”

“Yes, that is it, that is what I meant, a dragon. A dragon is right behind me! Now you really should be running for your lives!”

“What do you mean by saying a dragon is right behind you? There is no such thing as a dragon.”

He looks over his shoulder, sees the dragon approaching, screams, and runs away. Meanwhile, the group of people are laughing and joking with each other, discussing the “crazy” stranger. Seconds later they notice that they are now in a deep shadow, something that is very unusual out in the open on a cloudless, sunny day. They look up just in time to see a great blast of fire coming from the dragon’s mouth…

Meanwhile, a mile away, the “crazy” man mutters, “I tried to warn you” as he retreats deeper into the hiding place he found, hoping that the dragon would not think to search there.

 

Why the Department of Health and Human Services’ New Definition of Gender is Foolish and Short-sighted

Introduction

The Department of Health and Human Services is attempting to establish a legal definition of sex under Title IX, the federal civil rights law that bans gender discrimination in education programs that receive government financial assistance. They claim that the purpose of the effort is to adopt an explicit and uniform definition of gender as determined “on a biological basis that is clear, grounded in science, objective and administrable.” The problem is that they are rejecting established science that proves that gender often cannot be so clearly defined. The proposed definition would define sex as either male or female, unchangeable, and determined by the genitals that a person is born with.

More information on this proposal can be found at the following locations.

The problem is that gender is not black or white. There are times when the genitals that a person is born with are not reliable indicators of gender. Some examples of such situations are as follows.

Androgen Insensitivity Syndrome

Androgen insensitivity syndrome is a condition that affects sexual development before birth and during puberty. People with this condition are genetically male, with one X chromosome and one Y chromosome in each cell. Because their bodies are unable to respond to certain male sex hormones (called androgens), they may have mostly female external sex characteristics or signs of both male and female sexual development.

Source: Genetics Home Reference: Androgen insensitivity syndrome

Chimera

It’s possible for one person to have two different sets of DNA. In some cases a person may have both male and female DNA in their bodies. In other words, some cells are male and other cells are female.

For more information see the following.

Congenital Adrenal Hyperplasia

Congenital adrenal hyperplasia (CAH) is a group of inherited genetic disorders that affect the adrenal glands, a pair of walnut-sized organs above your kidneys. A person with CAH lacks one of the enzymes the adrenal glands use to produce hormones that help regulate metabolism, the immune system, blood pressure and other essential functions.

Source: Mayo Clinic: Congenital adrenal hyperplasia

Intersex

There are various conditions that lead to a child being born with both male and female characteristics. These children were once referred to as hermaphrodites. Currently they are often referred to as intersex.

For more information see the following.

Swyer Syndrome

Swyer syndrome is a rare disorder characterized by the failure of the sex glands (i.e., testicles or ovaries) to develop. For more information see National Organization for Rare Disorders: Swyer syndrome.

Conclusion

The proposed definition of gender is not based on science as the Department of Health and Human Services claims. It is in fact based on science denial and right-wing religious ideology.

In addition, there have been some studies that suggest that there are structural differences in the brains of transgendered individuals.

Face it. It’s a mixed-up, muddled-up, shook-up world where gender is concerned, except for Lola.

 

Are Entitlement Programs Responsible for Rising Deficits?

In the article McConnell Blames Entitlements, Not GOP, for Rising Deficits, it is revealed that Senate Majority Leader Mitch McConnell recently “blamed rising federal deficits and debt on a bipartisan unwillingness to contain spending on Medicare, Medicaid and Social Security.”

Does his claim hold water, or is he lying?

Evidence

I Googled for “do entitlements programs contribute to debt” and “money borrowed from social security.” The following is a summary of what I found.

 


  • Budget Deficit and Entitlements: The Grand Delusion

    A key quote from this article is as follows.

    What this means is that cutting entitlements will not be a major part of closing the nation’s very formidable looming budget deficits unless Americans are prepared to renege on the commitment to assure the elderly and disabled basic income and health coverage.


  • A debt crisis is coming. But don’t blame entitlements.

    Key quotes from this article are as follows.

    The deficit, of course, reflects the gap between spending and revenue. It is dishonest to single out entitlements for blame. The federal budget was in surplus from 1998 through 2001, but large tax cuts and unfunded wars have been huge contributors to our current deficit problem. The primary reason the deficit in coming years will now be higher than had been expected is the reduction in tax revenue from last year’s tax cuts, not an increase in spending. This year, revenue is expected to fall below 17 percent of gross domestic product — the lowest it has been in the past 50 years with the exception of the aftermath of the past two recessions.

    There is some room for additional spending reductions in these programs, but not to an extent large enough to solve the long-run debt problem. The Social Security program needs only modest reforms to restore its 75-year solvency, and these should include adjustments in both spending and revenue.

    Medicare has been a leader in bending the health-care cost curve. Reforms to payments and reformed benefit structures in Medicare could do more to hold down its future costs.

    This article mentions that “Medicare has been a leader in bending the health-care cost curve.” The article What does it mean to “bend the health-care cost curve?” explains that without Medicare things would be far worse because health care costs would be even higher than they are now.

     

  • National debt: Why entitlement spending must be reined in

    This article seems to support the claim made by Mitch McConnell, but it primarily focuses on Medicaid as an expensive program. Here are two quotes that support the claim made by the aforementioned Washington Post article that Social Security is not the problem.

    Social Security is another story. It has not contributed to the accrual of the country’s current debt load.

    In fact, Social Security has helped to keep federal deficits lower than they otherwise would have been because the federal government borrowed the surplus revenue paid into the program since the 1980s. And federal spending on the program is expected to grow much more slowly than on Medicare.

  • Key Drivers of the Debt

    This article mentions two key factors that contribute to debt: America’s demographics and rising healthcare costs. It makes only very brief mention of Social Security. It focuses primarily on Medicare, Medicaid, and health care costs in general.

    The following quote tells the truth about exactly why Medicare and Medicaid are problems.

    America has one of the most wasteful healthcare systems among advanced nations. Combined with the demographic realities of rapidly growing elderly population, America’s healthcare system leaves us with an unsustainable fiscal future.

 

From these articles we can conclude that Social Security is not a major contributing factor in rising deficits. Medicaid apparently is a major contributing factor, but that is not due to a flaw in Medicaid itself. It is because the American healthcare system is broken!

The following articles discuss the reasons why the American healthcare system is so expensive.


  • 6 Reasons Healthcare Is So Expensive in the U.S.

    The following are two very important quotes from this article.

    In most countries the government negotiates drug prices with the drug makers, but when Congress created Medicare Part D, it specifically denied Medicare the right to use its power to negotiate drug prices. The Veteran’s Administration and Medicaid, which can negotiate drug prices, pay the lowest drug prices.

    Most other developed countries control costs, in part, by having the government play a stronger role in negotiating prices for healthcare. Their healthcare systems don’t require the high administrative costs that drive up pricing in the U.S. As the global overseers of their country’s systems, these governments have the ability to negotiate lower drug, medical equipment and hospital costs. They can influence the mix of treatments used and patients’ ability to go to specialists or seek more expensive treatments.


  • How Price Transparency Can Control the Cost of Health Care

The following is a meme that is often shared in response to discussions about Social Security being a major contributing factor in the deficit.

The above meme makes the following claim. Next time a Republican tells you that ‘Social Security is broke,’ remind them that Pres. Bush ‘borrowed’ $1.37 trillion of Social Security surplus revenue to pay for his tax cuts for the rich and his war in Iraq and never paid it back.

The article Did George W. Bush ‘borrow’ from Social Security to fund the war in Iraq and tax cuts? ranks this claim as mostly false.

The following quotes are relevant.

For about 50 years, Social Security was a “pay-as-you-go” system, meaning annual payroll taxes pretty much covered that year’s benefits checks. Then in 1982, President Ronald Reagan enacted a payroll tax hike to prepare for the impending surge of retiring baby boomers, and a surplus began to build.

By law, the U.S. Treasury is required to take the surplus and, in exchange, issue interest-accruing bonds to the Social Security trust funds. The Treasury, meanwhile, uses the cash to fund government expenses, though it has to repay the bonds whenever the Social Security commissioner wants to redeem them.

Experts told us there’s no question that the Treasury will repay the Social Security surplus (including what was accumulated during the Bush years) when the trust fund starts redeeming the bonds in 2020.

Thus, this is not a contributing factor.

Final analysis

I rank the claim made by Mitch McConnell as liar, liar, pants vaporized.

While Medicare does contribute to the deficit, it is because of larger problems related to the cost of healthcare. Simply requiring greater healthcare price transparency and allowing the United States government greater control over the cost of drugs would deal with much of that.

As for Social Security, there are numerous less drastic options for addressing the long-range solvency problem that do not involve cutting benefits. Some of these options are as follows.

  • Increase the Payroll Tax Cap
  • Eliminate the Payroll Tax Cap
  • Reduce Benefits for Higher Earners
  • Increase the Payroll Tax Rate
  • Apply Payroll Tax to All Salary Reduction Plans

A more detailed discussion of suggested reforms to Social Security can be found in the following locations.

 

On the perils of assuming file path manipulation is easy

I recently worked on a bug in which a product developed by my employer was no longer finding user settings files. Everything worked correctly in a prior version of the product but failed in the current version.

The following is a simplified version of the old code.

BOOL GetUserProfilePath(LPWSTR profilePathName)
{
    if (profilePathName == nullptr) return FALSE;
    profilePathName[0] = static_cast<wchar_t>(0);
    std::wstring userPath = GetUserPath();
    if (userPath.empty()) return FALSE;
    TCHAR tempPath[MAX_PATH];
    GetProgramPath(tempPath);
    ::PathAppend(tempPath, userPath.c_str());
    ::PathCanonicalize(profilePathName, tempPath);
    if (profilePathName[0])
    {
        return TRUE;
    }
    return FALSE;
}

The following is a simplified version of the new code.

BOOL GetUserProfilePath(std::wstring& profilePathName)
{
    profilePathName.clear();
    std::wstring userPath = GetUserPath();
    if (userPath.empty()) return FALSE;
    std::wstring tempPath;
    GetProgramPath(tempPath);
    Path::Append(tempPath, userPath);
    wchar_t temp[MAX_PATH];
    PathCanonicalize(temp, tempPath.c_str());
    profilePathName = temp;
    if (!profilePathName.empty())
    {
        return TRUE;
    }
    return FALSE;
}

At first glance these two implementations appear to be equivalent.

Now, here are a few additional details that reveal why they are not in any way equivalent.

First, due to a previously unknown bug, the GetUserPath function returned a complete path, not a relative path. It was, in fact, the program path. Thus if the program path was “c:\MyApp,” then the value returned by the GetUserPath function was “c:\MyApp.”

Second, the GetProgramPath function also obtains the program path, thus in this example its value would also be “c:\MyApp.”

Thus, when the ::PathAppend Win32 API function was called, it was asked to append one full path onto another full path. Microsoft, in their infinite wisdom, chose to protect you from this mistake: the function tries to *just do the right thing* and generates a valid path anyway. Thus, the bug in the GetUserPath function was harmless.

The person who wrote the Path::Append function was not aware of this. The following was their implementation of the function.

namespace Path
{
    // Various details left out.
    bool Append(std::wstring& dest, const std::wstring& source)
    {
        Path::AddBackslash(dest);
        dest += source;
        return !dest.empty();
    }
}

Therefore, the end result of this change was that the otherwise harmless bug in the GetUserPath function suddenly became a big deal. Before this change, the GetUserProfilePath function returned the string “c:\MyApp” regardless of the bug in the GetUserPath function. After this change the GetUserProfilePath function returned the string “c:\MyApp\c:\MyApp,” which is an invalid path.

I was asked to fix this with the smallest change possible. The GetUserPath function happens to be part of a deprecated component that we are trying to phase out. Therefore, I was not allowed to touch it. As a result, I chose to fix it by simply modifying the Path::Append function so that it uses the ::PathAppend Win32 API function instead of attempting to do the work itself.
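
The difference between the two Append behaviors can be shown without the Win32 API. The following is a minimal, portable sketch, not the actual fix (which delegates to ::PathAppend). The helper names isFullyQualified, naiveAppend, appendPath, and demoAppend are hypothetical, and the check recognizes only drive-letter and UNC-style prefixes. It assumes, as the behavior described above suggests, that a complete source path should simply replace the destination.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical check: true for paths that start with a drive letter and a
// colon (L"c:\\MyApp") or with two backslashes (L"\\\\server\\share").
bool isFullyQualified(const std::wstring& path)
{
    const bool hasDrive =
        path.size() >= 2 && std::iswalpha(path[0]) && path[1] == L':';
    const bool isUnc =
        path.size() >= 2 && path[0] == L'\\' && path[1] == L'\\';
    return hasDrive || isUnc;
}

// Naive append, in the spirit of the original Path::Append:
// add a separator if needed, then concatenate.
std::wstring naiveAppend(std::wstring dest, const std::wstring& source)
{
    if (!dest.empty() && dest.back() != L'\\')
    {
        dest += L'\\';
    }
    dest += source;
    return dest;
}

// Defensive append: a source that is already a complete path replaces dest.
std::wstring appendPath(const std::wstring& dest, const std::wstring& source)
{
    if (isFullyQualified(source))
    {
        return source;
    }
    return naiveAppend(dest, source);
}

// Example use with the paths from this article.
void demoAppend()
{
    // The naive version reproduces the invalid path.
    assert(naiveAppend(L"c:\\MyApp", L"c:\\MyApp") == L"c:\\MyApp\\c:\\MyApp");
    // The defensive version tolerates the GetUserPath bug.
    assert(appendPath(L"c:\\MyApp", L"c:\\MyApp") == L"c:\\MyApp");
}
```

In production code the Win32 function remains the safer choice, since it also handles cases this sketch ignores.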

I am sharing this in the hopes that it could spare you the difficulties I had over the three days it took me to diagnose and fix this bug.