View source code
Display the source code in std/utf.d from which this page was generated on github.
Report a bug
If you spot a problem with this page, click here to create a Bugzilla issue.
Improve this page
Quickly fork, edit online, and submit a pull request for this page. Requires a signed-in GitHub account. This works well for small changes. If you'd like to make larger changes you may want to consider using local clone.

Alias std.utf.byUTF

Iterate an input range of characters by char type C by encoding the elements of the range.

alias byUTF(C, Flag!("useReplacementDchar") useReplacementDchar = Yes.useReplacementDchar) = byUTF!UC;

UTF sequences that cannot be converted to the specified encoding are either replaced by U+FFFD per "5.22 Best Practice for U+FFFD Substitution" of the Unicode Standard 6.2 or result in a thrown UTFException. Hence byUTF is not symmetric. This algorithm is lazy, and does not allocate memory. @nogc, pure-ity, nothrow, and @safe-ty are inferred from the r parameter.

Parameters

NameDescription
C char, wchar, or dchar
useReplacementDchar UseReplacementDchar.yes means replace invalid UTF with replacementDchar, UseReplacementDchar.no means throw UTFException for invalid UTF

Throws

UTFException if invalid UTF sequence and useReplacementDchar is set to UseReplacementDchar.yes

GC

Does not use GC if useReplacementDchar is set to UseReplacementDchar.no

Returns

A bidirectional range if R is a bidirectional range and not auto-decodable, as defined by isAutodecodableString.

A forward range if R is a forward range and not auto-decodable.

Or, if R is a range and it is auto-decodable and is(ElementEncodingType!typeof(r) == C), then the range is passed to byCodeUnit.

Otherwise, an input range of characters.

Example

import std.algorithm.comparison : equal;

// hellö as a range of `char`s, which are UTF-8
assert("hell\u00F6".byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));

// `wchar`s are able to hold the ö in a single element (UTF-16 code unit)
assert("hell\u00F6".byUTF!wchar().equal(['h', 'e', 'l', 'l', 'ö']));

// 𐐷 is four code units in UTF-8, two in UTF-16, and one in UTF-32
assert("𐐷".byUTF!char().equal([0xF0, 0x90, 0x90, 0xB7]));
assert("𐐷".byUTF!wchar().equal([0xD801, 0xDC37]));
assert("𐐷".byUTF!dchar().equal([0x00010437]));

Example

import std.algorithm.comparison : equal;
import std.exception : assertThrown;

assert("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.yes).equal("hello\uFFFDetty"));
assertThrown!UTFException("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.no).equal("hello betty"));

Example

import std.range.primitives;
wchar[] s = ['ă', 'î'];

auto rc = s.byUTF!char;
static assert(isBidirectionalRange!(typeof(rc)));
writeln(rc.back); // 0xae
rc.popBack;
writeln(rc.back); // 0xc3
rc.popBack;
writeln(rc.back); // 0x83
rc.popBack;
writeln(rc.back); // 0xc4

auto rw = s.byUTF!wchar;
static assert(isBidirectionalRange!(typeof(rw)));
writeln(rw.back); // 'î'
rw.popBack;
writeln(rw.back); // 'ă'

auto rd = s.byUTF!dchar;
static assert(isBidirectionalRange!(typeof(rd)));
writeln(rd.back); // 'î'
rd.popBack;
writeln(rd.back); // 'ă'

Authors

Walter Bright and Jonathan M Davis

License

Boost License 1.0.