Alias std.utf.byUTF
Iterate an input range
of characters by char type C
by encoding the elements of the range.
UTF sequences that cannot be converted to the specified encoding are either
replaced by U+FFFD per "5.22 Best Practice for U+FFFD Substitution"
of the Unicode Standard 6.2 or result in a thrown UTFException.
Hence byUTF is not symmetric.
This algorithm is lazy, and does not allocate memory.
@nogc
, pure
-ity, nothrow
, and @safe
-ty are inferred from the
r
parameter.
Parameters
Name | Description |
---|---|
C | char , wchar , or dchar |
useReplacementDchar | UseReplacementDchar.yes means replace invalid UTF with replacementDchar ,
UseReplacementDchar.no means throw UTFException for invalid UTF |
Throws
UTFException
if invalid UTF sequence and useReplacementDchar
is set to UseReplacementDchar
GC
Does not use GC if useReplacementDchar
is set to UseReplacementDchar
Returns
A forward range if R
is a range and not auto-decodable, as defined by
isAutodecodableString
, and if the base range is
also a forward range.
Or, if R
is a range and it is auto-decodable and
is(ElementEncodingType!typeof(r) == C)
, then the range is passed
to byCodeUnit
.
Otherwise, an input range of characters.
Example
import std .algorithm .comparison : equal;
// hellö as a range of `char`s, which are UTF-8
"hell\u00F6" .byUTF!char() .equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]);
// `wchar`s are able to hold the ö in a single element (UTF-16 code unit)
"hell\u00F6" .byUTF!wchar() .equal(['h', 'e', 'l', 'l', 'ö']);
// 𐐷 is four code units in UTF-8, two in UTF-16, and one in UTF-32
"𐐷" .byUTF!char() .equal([0xF0, 0x90, 0x90, 0xB7]);
"𐐷" .byUTF!wchar() .equal([0xD801, 0xDC37]);
"𐐷" .byUTF!dchar() .equal([0x00010437]);
Example
import std .algorithm .comparison : equal;
import std .exception : assertThrown;
assert("hello\xF0betty" .byChar .byUTF!(dchar, UseReplacementDchar .yes) .equal("hello\uFFFDetty"));
assertThrown!UTFException("hello\xF0betty" .byChar .byUTF!(dchar, UseReplacementDchar .no) .equal("hello betty"));