Function std.utf.byCodeUnit

Iterate a range of chars, wchars, or dchars by code unit.

auto byCodeUnit(R) (
  R r
)
if (isAutodecodableString!R || isInputRange!R && isSomeChar!(ElementEncodingType!R) || is(R : const(dchar[])) && !isStaticArray!R);

The purpose is to bypass the special-case decoding that front performs on character arrays. As a result, code that iterates with byCodeUnit can be nothrow, whereas front throws a UTFException when it encounters an invalid Unicode sequence.
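The following is a minimal sketch of the nothrow point, assuming a Phobos where std.range.primitives.front auto-decodes narrow strings. The helper countCodeUnits is hypothetical and exists only to show that a loop over byCodeUnit compiles under nothrow, while calling front on the same invalid bytes throws.

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.utf : byCodeUnit, UTFException;

// Hypothetical helper: counts code units without decoding.
// It compiles as nothrow because the byCodeUnit range never decodes or validates.
size_t countCodeUnits(string s) nothrow
{
    size_t n;
    foreach (c; s.byCodeUnit)
        ++n;
    return n;
}

void main()
{
    // 0xFF never appears in well-formed UTF-8; build the bytes explicitly.
    immutable(ubyte)[] raw = [0x61, 0x62, 0xFF];
    string bad = cast(string) raw;

    // Auto-decoding front tries to decode the invalid byte and throws.
    assertThrown!UTFException(bad[2 .. $].front);

    // byCodeUnit hands back the raw code unit instead of decoding it.
    auto units = bad.byCodeUnit;
    assert(units.length == 3);
    assert(units[2] == 0xFF);

    assert(countCodeUnits(bad) == 3);
}
```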

A code unit is a building block of the UTF encodings. In general, an individual code unit does not represent what is perceived as a full character (a grapheme cluster, in Unicode terminology), and many characters are encoded with multiple code units. For example, the UTF-8 code units for ø are 0xC3 0xB8. This means that an individual element of byCodeUnit often does not form a character on its own, and treating it as one while iterating over the resulting range gives nonsensical results.
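A short sketch of how the number of code units per character depends on the encoding. It uses ø (U+00F8) from the paragraph above plus U+1D11E (musical symbol G clef), chosen here only because it lies outside the Basic Multilingual Plane.

```d
import std.utf : byCodeUnit;

void main()
{
    // U+00F8 'ø' takes two UTF-8 code units: 0xC3 0xB8.
    auto o8 = "ø".byCodeUnit;
    assert(o8.length == 2);
    assert(o8[0] == 0xC3 && o8[1] == 0xB8);

    // The same character is a single UTF-16 code unit.
    assert("ø"w.byCodeUnit.length == 1);

    // U+1D11E lies outside the BMP:
    // 4 UTF-8 units, a 2-unit UTF-16 surrogate pair, 1 UTF-32 unit.
    assert("\U0001D11E".byCodeUnit.length == 4);
    assert("\U0001D11E"w.byCodeUnit.length == 2);
    assert("\U0001D11E"d.byCodeUnit.length == 1);
}
```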

Parameters

Name    Description
r       an input range of characters (including strings) or a type that implicitly converts to a string type.

Returns

If r is not an auto-decodable string (i.e. a narrow string or a user-defined type that implicitly converts to a string type), then r is returned.

Otherwise, r is converted to its corresponding string type (if it is not already a string) and wrapped in a random-access range whose element type is the string's element encoding type (its code unit), and that range is returned. The range supports slicing.
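A minimal sketch of the wrapping case for a UTF-16 string. It assumes, by analogy with the immutable char element type shown for string in the first example below, that a wrapped wstring yields immutable wchar elements; slicing counts code units, and source returns the slice of the underlying string.

```d
import std.range.primitives : ElementType, hasLength, hasSlicing, isRandomAccessRange;
import std.utf : byCodeUnit;

void main()
{
    // wstring is a narrow (UTF-16) string, so byCodeUnit wraps it.
    auto w = "hello"w.byCodeUnit;
    static assert(isRandomAccessRange!(typeof(w)));
    static assert(hasLength!(typeof(w)) && hasSlicing!(typeof(w)));

    // The element type is the string's code unit, not dchar.
    static assert(is(ElementType!(typeof(w)) == immutable wchar));

    // Slicing is by code unit; source exposes the underlying wstring.
    auto mid = w[1 .. 4];
    assert(mid.length == 3);
    assert(mid.source == "ell"w);
}
```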

If r is quirky enough to be a struct or class which is an input range of characters on its own (i.e. it has the input range API as member functions), and it's implicitly convertible to a string type, then r is returned, and no implicit conversion takes place.
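A sketch of the "quirky" case, assuming the current Phobos behavior described above. CharRange is a hypothetical type that is both an input range of chars (through its own member functions) and implicitly convertible to string (through alias this), so byCodeUnit should return it unchanged.

```d
import std.range.primitives : isInputRange;
import std.utf : byCodeUnit;

// Hypothetical type: an input range of chars that also converts to string.
struct CharRange
{
    string s;
    alias s this;

    @property bool empty() const { return s.length == 0; }
    @property char front() const { return s[0]; }
    void popFront() { s = s[1 .. $]; }
}

void main()
{
    static assert(isInputRange!CharRange);

    // byCodeUnit returns the CharRange itself; no conversion to string happens.
    auto r = CharRange("abc").byCodeUnit;
    static assert(is(typeof(r) == CharRange));
    assert(r.front == 'a');
}
```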

If r is wrapped in a new range, then that range has a source property for returning the string that's currently contained within that range.
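The pass-through rule at the start of this section can be illustrated with a sketch: a dstring is UTF-32 and not auto-decodable, and a non-array range of chars (here built with std.range.only) is not a string at all, so neither is wrapped and neither gains a source property.

```d
import std.range : only;
import std.utf : byCodeUnit;

void main()
{
    // dstring is not auto-decoded, so byCodeUnit returns it as-is.
    dstring d = "hello"d;
    auto rd = d.byCodeUnit;
    static assert(is(typeof(rd) == dstring));
    assert(rd is d); // same slice: same pointer and length

    // A non-array input range of chars is also returned unchanged.
    auto chars = only('h', 'i');
    auto rc = chars.byCodeUnit;
    static assert(is(typeof(rc) == typeof(chars)));
}
```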

See Also

Refer to the std.uni docs for a reference on Unicode terminology.

For a range that iterates by grapheme cluster (written character), see std.uni.byGrapheme.
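A small sketch contrasting the three levels of iteration on the same string: code units via byCodeUnit, code points via auto-decoding, and grapheme clusters via std.uni.byGrapheme. The string spells noël with e followed by U+0308 COMBINING DIAERESIS, which occupies two UTF-8 code units.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "noël" written as 'e' + U+0308 COMBINING DIAERESIS.
    string s = "noe\u0308l";

    assert(s.byCodeUnit.walkLength == 6); // UTF-8 code units (U+0308 takes 2)
    assert(s.walkLength == 5);            // code points, via auto-decoding
    assert(s.byGrapheme.walkLength == 4); // user-perceived characters
}
```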

Example

import std.range.primitives;
import std.utf : byCodeUnit;

auto r = "Hello, World!".byCodeUnit();
static assert(hasLength!(typeof(r)));
static assert(hasSlicing!(typeof(r)));
static assert(isRandomAccessRange!(typeof(r)));
static assert(is(ElementType!(typeof(r)) == immutable char));

// contrast with the range capabilities of standard strings
auto s = "Hello, World!";
static assert(isBidirectionalRange!(typeof(s)));
static assert(is(ElementType!(typeof(s)) == dchar));

static assert(!isRandomAccessRange!(typeof(s)));
static assert(!hasSlicing!(typeof(s)));
static assert(!hasLength!(typeof(s)));

Example

byCodeUnit does no Unicode decoding

import std.stdio : writeln;
import std.utf : byCodeUnit;

string noel1 = "noe\u0308l"; // noël using e + combining diaeresis
assert(noel1.byCodeUnit[2] != 'ë');
writeln(noel1.byCodeUnit[2]); // 'e'

string noel2 = "no\u00EBl"; // noël using a precomposed ë character
// Because string is UTF-8, the code unit at index 2 is just
// the first of a sequence that encodes 'ë'
assert(noel2.byCodeUnit[2] != 'ë');

Example

byCodeUnit exposes a source property when wrapping narrow strings.

import std.algorithm.comparison : equal;
import std.range : popFrontN;
import std.stdio : writeln;
import std.utf : byCodeUnit;
{
    auto range = byCodeUnit("hello world");
    range.popFrontN(3);
    assert(equal(range.save, "lo world"));
    string str = range.source;
    writeln(str); // "lo world"
}
// source only exists if the range was wrapped
{
    auto range = byCodeUnit("hello world"d);
    static assert(!__traits(compiles, range.source));
}

Authors

Walter Bright and Jonathan M Davis

License

Boost License 1.0.