Using Character or String Constants in a Template function

Ben Key:

October 1, 2013; November 10, 2018

 


This article provides several techniques that may be used when creating template functions in which the character type is a template parameter and you must use character or string constants. It also provides the source code for several template functions and macros that may be used to implement one of the proposed solutions.

Problem Description

There are various data types that may be used to represent characters in C++. The most common of these are char and wchar_t. It is often necessary to write code that can handle either type of character.

One alternative is to simply implement the function once for each character type that needs to be supported. There are obvious problems with this approach. The first problem is that this approach causes unnecessary code duplication. It also opens up the possibility that the different implementations of the function will diverge over time as changes are made in one implementation, for example in order to fix bugs, but not in the other implementations of the function.

Consider the following example.

The CommandLineToArgVector function parses a command line string such as the one returned by the GetCommandLine function. It depends on a number of character constants, specifically NULCHAR ('\0'), SPACECHAR (' '), TABCHAR ('\t'), DQUOTECHAR ('"'), and SLASHCHAR ('\\').

In order to support both char and wchar_t, it is necessary to implement the function twice, as follows.


inline size_t CommandLineToArgVector(
    const char* commandLine,
    std::vector<std::string>& arg_vector)
{
    arg_vector.clear();
    /* Code omitted. */
    return static_cast<size_t>(arg_vector.size());
}

inline size_t CommandLineToArgVector(
    const wchar_t* commandLine,
    std::vector<std::wstring>& arg_vector)
{
    arg_vector.clear();
    /* Code omitted. */
    return static_cast<size_t>(arg_vector.size());
}

The obvious solution is to implement this as a template function that accepts the character type as a template parameter as follows.


template<typename CharType>
size_t CommandLineToArgVector(
    const CharType* commandLine,
    std::vector< std::basic_string<CharType> >& arg_vector)
{
    arg_vector.clear();
    /* Code omitted. */
    return static_cast<size_t>(arg_vector.size());
}

The only thing that prevents us from doing that is the existence of the character constants. How do you represent the constants so that they can be of either data type?
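
For string constants the difficulty is easy to demonstrate. The following is a minimal sketch, not part of the original source code; AcceptsNarrowLiteral is a hypothetical helper name introduced only for illustration. It checks at compile time that a narrow string literal can initialize a std::string but not a std::wstring, which is why a literal such as "String" cannot simply be written once inside the template.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: true when std::basic_string<CharType> can be
// constructed from a narrow (const char*) string literal.
template <typename CharType>
struct AcceptsNarrowLiteral
    : std::is_constructible<std::basic_string<CharType>, const char*>
{
};

// A narrow literal works for char based strings...
static_assert(AcceptsNarrowLiteral<char>::value,
    "std::string can be built from a narrow literal");
// ...but not for wchar_t based strings, which need an L"..." literal.
static_assert(!AcceptsNarrowLiteral<wchar_t>::value,
    "std::wstring cannot be built from a narrow literal");
```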

Possible Solutions

As is often the case, there are many possible solutions to this problem. This article discusses several of them.

Widen

One possible solution is to make use of std::ctype::widen as follows.


template <typename CharType>
CharType widenChar(
    const char ch, const std::locale& loc = std::locale())
{
    const auto& cType = std::use_facet<std::ctype<CharType>>(loc);
    return cType.widen(ch);
}

This same technique can be extended for use with string literals as follows.


template<typename CharType>
std::basic_string<CharType> widenString(
    const char* str, const std::locale& loc = std::locale())
{
    std::basic_string<CharType> ret;
    if (str == nullptr || str[0] == 0)
    {
        return ret;
    }
    const auto& cType = std::use_facet<std::ctype<CharType>>(loc);
    auto srcLen = std::strlen(str);
    auto bufferSize = srcLen + 32;
    auto tmpPtr = yekneb::make_unique<CharType[]>(bufferSize);
    auto tmp = tmpPtr.get();
    cType.widen(str, str + srcLen, tmp);
    ret = tmp;
    return ret;
}
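
As a possible variation, the temporary buffer can be avoided by widening directly into the result string. The sketch below assumes C++11 or later, where std::basic_string storage is contiguous. The name widenStringDirect is hypothetical and is not one of the functions in the article's source files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <locale>
#include <string>

// Hypothetical variant of widenString that writes straight into the result.
template <typename CharType>
std::basic_string<CharType> widenStringDirect(
    const char* str, const std::locale& loc = std::locale())
{
    if (str == nullptr || str[0] == '\0')
    {
        return std::basic_string<CharType>();
    }
    const auto& cType = std::use_facet<std::ctype<CharType>>(loc);
    const std::size_t srcLen = std::strlen(str);
    // Size the result up front so widen() can fill it in place.
    std::basic_string<CharType> ret(srcLen, CharType());
    cType.widen(str, str + srcLen, &ret[0]);
    return ret;
}
```

With this sketch, an expression such as assert(widenStringDirect<wchar_t>("abc") == L"abc") should hold. The per-character conversion through the locale facet is still performed, so the performance concern discussed in this section applies to this variant as well.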

Then in the template function you can use the character constants in a character type neutral way as follows.


const CharType NULCHAR = widenChar<CharType>('\0');
const CharType SPACECHAR = widenChar<CharType>(' ');
const CharType TABCHAR =  widenChar<CharType>('\t');
const CharType DQUOTECHAR = widenChar<CharType>('\"');
const CharType SLASHCHAR =  widenChar<CharType>('\\');

The problem is that this requires a run-time narrow to wide character conversion, performed through a locale facet, for the wchar_t case. This may not be much of an issue for functions that are rarely called and are not performance critical. However, the cost of the conversion would not be acceptable in performance critical situations.

The question is how to solve this problem without these performance issues.

Algorithm Traits

Another option is to create an algorithm traits class that includes functions that return each of the constants as follows.


template <typename CharType>
struct CommandLineToArgVector_Traits
{
    CharType NULCHAR();
    CharType SPACECHAR();
    CharType TABCHAR();
    CharType DQUOTECHAR();
    CharType SLASHCHAR();
    const CharType* STRING();
};

template<>
struct CommandLineToArgVector_Traits<char>
{
    char NULCHAR()
    {
        return '\0';
    }
    char SPACECHAR()
    {
        return ' ';
    }
    char TABCHAR()
    {
        return '\t';
    }
    char DQUOTECHAR()
    {
        return '\"';
    }
    char SLASHCHAR()
    {
        return '\\';
    }
    const char* STRING()
    {
        return "String";
    }
};

template<>
struct CommandLineToArgVector_Traits<wchar_t>
{
    wchar_t NULCHAR()
    {
        return L'\0';
    }
    wchar_t SPACECHAR()
    {
        return L' ';
    }
    wchar_t TABCHAR()
    {
        return L'\t';
    }
    wchar_t DQUOTECHAR()
    {
        return L'\"';
    }
    wchar_t SLASHCHAR()
    {
        return L'\\';
    }
    const wchar_t* STRING()
    {
        return L"String";
    }
};

As you can see, it is a simple matter to incorporate string constants in the algorithm traits structure.

The algorithm traits structure can then be used as follows in your template function.


CommandLineToArgVector_Traits<CharType> traits;
const CharType NULCHAR = traits.NULCHAR();
const CharType SPACECHAR = traits.SPACECHAR();
const CharType TABCHAR =  traits.TABCHAR();
const CharType DQUOTECHAR = traits.DQUOTECHAR();
const CharType SLASHCHAR =  traits.SLASHCHAR();
const CharType* STRING = traits.STRING();
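
The same idea can be sketched more compactly with static constexpr member functions, which removes the need to create a traits object. This is an illustrative sketch that assumes C++11 or later. ExampleCharTraits and countLeadingBlanks are hypothetical names and are not part of the article's source code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical traits class: only the specializations are defined.
template <typename CharType>
struct ExampleCharTraits;

template <>
struct ExampleCharTraits<char>
{
    static constexpr char space() { return ' '; }
    static constexpr char tab() { return '\t'; }
};

template <>
struct ExampleCharTraits<wchar_t>
{
    static constexpr wchar_t space() { return L' '; }
    static constexpr wchar_t tab() { return L'\t'; }
};

// Counts the spaces and tabs at the start of a null-terminated string.
template <typename CharType>
std::size_t countLeadingBlanks(const CharType* text)
{
    typedef ExampleCharTraits<CharType> Traits;
    std::size_t count = 0;
    // The terminating null character is neither a space nor a tab,
    // so the loop stops at the end of the string at the latest.
    while (text[count] == Traits::space() || text[count] == Traits::tab())
    {
        ++count;
    }
    return count;
}
```

For example, countLeadingBlanks("  \tabc") evaluates to 3 and countLeadingBlanks(L" x") evaluates to 1. One specialization per character type is still required, so the maintenance trade-off of the traits approach is unchanged.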

This solution does not have the performance issues of the widenChar solution, but the gain comes at the cost of a significant increase in the complexity of the code. In addition, there are now two implementations of the algorithm traits structure to maintain, which puts us right back where we started. Admittedly, maintaining two implementations of the algorithm traits structure is a lot less work than maintaining two implementations of the algorithm.

It would be ideal to develop a solution that has the convenience and simplicity of the widenChar solution without the performance issues. The question is how.

Preprocessor and Template Magic

Fortunately, it is possible to solve this problem using some preprocessor and template magic. I found this solution in the Stack Overflow question "How to express a string literal within a template parameterized by the type of the characters used to represent the literal".

The solution is as follows.


template<typename CharType>
CharType CharConstantOfType(const char c, const wchar_t w);

// Explicit specializations of function templates are not implicitly inline.
// They are marked inline so that this code can live in a header that is
// included from more than one source file.
template<>
inline char CharConstantOfType<char>(const char c, const wchar_t /*w*/)
{
    return c;
}
template<>
inline wchar_t CharConstantOfType<wchar_t>(const char /*c*/, const wchar_t w)
{
    return w;
}

template<typename CharType>
const CharType* StringConstantOfType(const char* c, const wchar_t* w);

template<>
inline const char* StringConstantOfType<char>(const char* c, const wchar_t* /*w*/)
{
    return c;
}
template<>
inline const wchar_t* StringConstantOfType<wchar_t>(const char* /*c*/, const wchar_t* w)
{
    return w;
}

#define TOWSTRING_IMPL(x) L##x
#define TOWSTRING(x) TOWSTRING_IMPL(x)
#define CHAR_CONSTANT(TYPE, STRING) CharConstantOfType<TYPE>(STRING, TOWSTRING(STRING))
#define STRING_CONSTANT(TYPE, STRING) StringConstantOfType<TYPE>(STRING, TOWSTRING(STRING))

Then in the template function you can use the character constants in a character type neutral way as follows.


const CharType NULCHAR = CHAR_CONSTANT(CharType, '\0');
const CharType SPACECHAR = CHAR_CONSTANT(CharType, ' ');
const CharType TABCHAR =  CHAR_CONSTANT(CharType, '\t');
const CharType DQUOTECHAR = CHAR_CONSTANT(CharType, '\"');
const CharType SLASHCHAR =  CHAR_CONSTANT(CharType, '\\');
const CharType* STRING = STRING_CONSTANT(CharType, "String");

Abracadabra. The problem is solved without any code duplication or performance issues.
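
To show the pieces working together, here is a small self-contained sketch. The function countWords is a hypothetical example and is not taken from the article's source files. The helper macro is named TOWSTRING_IMPL in this sketch because identifiers that begin with an underscore followed by a capital letter are reserved for the implementation, and the specializations are marked inline on the assumption that the code lives in a header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

template <typename CharType>
CharType CharConstantOfType(const char c, const wchar_t w);

template <>
inline char CharConstantOfType<char>(const char c, const wchar_t /*w*/)
{
    return c;
}
template <>
inline wchar_t CharConstantOfType<wchar_t>(const char /*c*/, const wchar_t w)
{
    return w;
}

#define TOWSTRING_IMPL(x) L##x
#define TOWSTRING(x) TOWSTRING_IMPL(x)
#define CHAR_CONSTANT(TYPE, LITERAL) \
    CharConstantOfType<TYPE>(LITERAL, TOWSTRING(LITERAL))

// Hypothetical example: counts runs of characters separated by
// spaces or tabs, for either char or wchar_t strings.
template <typename CharType>
std::size_t countWords(const std::basic_string<CharType>& text)
{
    const CharType SPACECHAR = CHAR_CONSTANT(CharType, ' ');
    const CharType TABCHAR = CHAR_CONSTANT(CharType, '\t');
    std::size_t words = 0;
    bool inWord = false;
    for (CharType ch : text)
    {
        const bool blank = (ch == SPACECHAR || ch == TABCHAR);
        if (!blank && !inWord)
        {
            ++words; // first character of a new word
        }
        inWord = !blank;
    }
    return words;
}
```

For example, countWords(std::string("a b\tc")) evaluates to 3 and countWords(std::wstring(L"  hello  world ")) evaluates to 2. Both literals are written only once at each use, and the choice between the narrow and wide form is made by template specialization rather than by a run-time conversion.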

Article Source Code

The source code for this article can be found on SullivanAndKey.com. The relevant files may be found at the following locations.

  • widen.h: Contains the source of widenChar and widenString.
  • ConstantOfType.h: Contains the source of CharConstantOfType and StringConstantOfType.
  • CmdLineToArgv.h: The header file for the CommandLineToArgVector function.
  • CmdLineToArgv.cpp: The source code for the CommandLineToArgVector function.

 
