Module `std.utf`

Encode and decode UTF-8, UTF-16 and UTF-32 strings.

UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'.
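
As a small illustration of that range, the sketch below probes its boundaries with `isValidDchar`. The surrogate check relies on the assumption, not stated above, that surrogate code points (U+D800 through U+DFFF) are also rejected.

```d
import std.utf : isValidDchar;

void main()
{
    assert(isValidDchar('\u0000'));            // lower bound of the supported range
    assert(isValidDchar('\U0010FFFF'));        // upper bound of the supported range
    assert(!isValidDchar(cast(dchar) 0x110000)); // one past the upper bound
    assert(!isValidDchar(cast(dchar) 0xD800));   // surrogate (assumption noted above)
}
```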


Category	Functions
Decode	`decode` `decodeFront`
Lazy decode	`byCodeUnit` `byChar` `byWchar` `byDchar` `byUTF`
Encode	`encode` `toUTF8` `toUTF16` `toUTF32` `toUTFz` `toUTF16z`
Length	`codeLength` `count` `stride` `strideBack`
Index	`toUCSindex` `toUTFindex`
Validation	`isValidDchar` `isValidCodepoint` `validate`
Miscellaneous	`replacementDchar` `UseReplacementDchar` `UTFException`

Functions

Name	Description
`byCodeUnit(r)`	Iterates a range of `char`, `wchar`, or `dchar` by code unit.
`codeLength(c)`	Returns the number of code units that are required to encode the code point `c` when `C` is the character type used to encode it.
`codeLength(input)`	Returns the number of code units that are required to encode `input` in a string whose character type is `C`. This is particularly useful when slicing one string with the length of another and the two string types use different character types.
`count(str)`	Returns the total number of code points encoded in `str`.
`decode(str, index)`	Decodes and returns the code point starting at `str[index]`. `index` is advanced to one past the decoded code point. If the code point is not well-formed, then a `UTFException` is thrown and `index` remains unchanged.
`decodeBack(str, numCodeUnits)`	`decodeBack` is a variant of `decode` which specifically decodes the last code point. Unlike `decode`, `decodeBack` accepts any bidirectional range of code units (rather than just a string or random access range). It also takes the range by `ref` and pops off the elements as it decodes them. If `numCodeUnits` is passed in, it gets set to the number of code units which were in the code point which was decoded.
`decodeFront(str, numCodeUnits)`	`decodeFront` is a variant of `decode` which specifically decodes the first code point. Unlike `decode`, `decodeFront` accepts any input range of code units (rather than just a string or random access range). It also takes the range by `ref` and pops off the elements as it decodes them. If `numCodeUnits` is passed in, it gets set to the number of code units which were in the code point which was decoded.
`encode(buf, c)`	Encodes `c` into the static array, `buf`, and returns the actual length of the encoded character (a number between `1` and `4` for `char[4]` buffers and a number between `1` and `2` for `wchar[2]` buffers).
`encode(str, c)`	Encodes `c` in `str`'s encoding and appends it to `str`.
`isValidCodepoint(c)`	Checks if a single character forms a valid code point.
`isValidDchar(c)`	Checks whether the given Unicode code point is valid.
`stride(str, index)`	Calculates the length of the UTF sequence starting at `index` in `str`.
`strideBack(str, index)`	Calculates the length of the UTF sequence ending one code unit before `index` in `str`.
`toUCSindex(str, index)`	Given `index` into `str` and assuming that `index` is at the start of a UTF sequence, `toUCSindex` determines the number of UCS characters up to `index`. So, `index` is the index of a code unit at the beginning of a code point, and the return value is how many code points into the string that code point is.
`toUTF16(s)`	Encodes the elements of `s` to UTF-16 and returns a newly GC allocated `wstring` of the elements.
`toUTF16z(str)`	`toUTF16z` is a convenience function for `toUTFz!(const(wchar)*)`.
`toUTF32(s)`	Encodes the elements of `s` to UTF-32 and returns a newly GC allocated `dstring` of the elements.
`toUTF8(s)`	Encodes the elements of `s` to UTF-8 and returns a newly allocated string of the elements.
`toUTFindex(str, n)`	Given a UCS index `n` into `str`, returns the UTF index. So, `n` is how many code points into the string the code point is, and the array index of the code unit is returned.
`validate(str)`	Checks whether `str` is well-formed Unicode.
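
The following is a minimal sketch of how several of the functions above fit together on one UTF-8 string. The helper `codePoints` is hypothetical (it is not part of `std.utf`), and the sample string is chosen only so that each code point has a different encoded length.

```d
import std.utf : codeLength, count, decode, decodeFront, encode,
    stride, strideBack, toUCSindex, toUTFindex;

/// Hypothetical helper (not part of std.utf): collects the code points of `s`.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances `i` past the decoded sequence
    return result;
}

void main()
{
    // 'a' (1 code unit), U+00E9 (2 code units), U+20AC (3 code units)
    string s = "a\u00E9\u20AC";
    assert(s.length == 6);                 // length counts code units
    assert(count(s) == 3);                 // count counts code points
    assert(codePoints(s) == "a\u00E9\u20AC"d);

    assert(stride(s, 1) == 2);             // sequence starting at index 1
    assert(strideBack(s, s.length) == 3);  // sequence ending at the end of `s`
    assert(toUCSindex(s, 3) == 2);         // code unit 3 starts code point 2
    assert(toUTFindex(s, 2) == 3);         // and the reverse mapping

    char[4] buf;
    assert(encode(buf, '\u20AC') == 3);       // three UTF-8 code units
    assert(codeLength!wchar('\u20AC') == 1);  // one UTF-16 code unit

    string rest = s;
    size_t n;
    assert(decodeFront(rest, n) == 'a');   // pops the decoded code units off `rest`
    assert(n == 1 && rest.length == 5);
}
```

Note that `decode` leaves the string untouched and moves only the index, while `decodeFront` consumes the front of the range it is given.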

Classes

Name	Description
`UTFException`	Exception thrown on errors in std.utf functions.
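
As a hedged illustration, `validate` can be wrapped so that the exception becomes a boolean result. The helper name `isWellFormed` is hypothetical, and the sketch assumes that `validate` signals malformed input by throwing `UTFException`.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper (not part of std.utf): true if `s` is well-formed UTF-8.
bool isWellFormed(const(char)[] s)
{
    try
    {
        validate(s); // assumed to throw UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    assert(isWellFormed("hello"));

    // 0xC3 opens a two-unit sequence, but no continuation unit follows it.
    char[] truncated = ['a', cast(char) 0xC3];
    assert(!isWellFormed(truncated));
}
```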

Templates

Name	Description
`toUTFz`	Returns a C-style zero-terminated string equivalent to `str`. `str` must not contain embedded `'\0'`'s as any C function will treat the first `'\0'` that it sees as the end of the string. If `str.empty` is `true`, then a string containing only `'\0'` is returned.
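
The sketch below shows the embedded `'\0'` caveat in practice by measuring the result with C's `strlen`. The helper `cLength` is hypothetical and exists only for this illustration.

```d
import core.stdc.string : strlen;
import std.utf : toUTFz;

/// Hypothetical helper (not part of std.utf): the length a C function sees for `s`.
size_t cLength(string s)
{
    const(char)* p = toUTFz!(const(char)*)(s);
    return strlen(p);
}

void main()
{
    assert(cLength("hello") == 5);
    assert(cLength("") == 0);        // empty input still yields a terminated string
    assert(cLength("ab\0cd") == 2);  // C stops at the first embedded '\0'
}
```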

Manifest constants

Name	Type	Description
`replacementDchar`		Inserted in place of invalid UTF sequences (`'\uFFFD'`).

Aliases

Name	Type	Description
`byChar`	`byUTF!char`	Iterates an input range of characters by `char`, `wchar`, or `dchar`. These aliases simply forward to `byUTF` with the corresponding `C` argument.
`byDchar`	`byUTF!dchar`	Iterates an input range of characters by `char`, `wchar`, or `dchar`. These aliases simply forward to `byUTF` with the corresponding `C` argument.
`byUTF`	`byUTF!UC`	Iterates an input range of characters by char type `C` by encoding the elements of the range.
`byWchar`	`byUTF!wchar`	Iterates an input range of characters by `char`, `wchar`, or `dchar`. These aliases simply forward to `byUTF` with the corresponding `C` argument.
`UseReplacementDchar`	`Flag!("useReplacementDchar")`	Whether or not to replace invalid UTF with `replacementDchar`.
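
A minimal sketch of the lazy ranges follows, comparing code-unit iteration with re-encoding to another character type. The last assertion assumes that `byDchar` substitutes `replacementDchar` for a malformed sequence by default. It checks only that the replacement appears, not how many elements are produced.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.utf : byChar, byCodeUnit, byDchar, byWchar, replacementDchar;

void main()
{
    string s = "h\u00E9"; // 'h' followed by U+00E9 (two UTF-8 code units)

    assert(s.byCodeUnit.walkLength == 3); // raw code units, no decoding
    assert(s.byDchar.walkLength == 2);    // code points
    assert(s.byWchar.walkLength == 2);    // UTF-16 code units
    assert(s.byDchar.equal("h\u00E9"d));

    // Re-encoding the UTF-16 form as UTF-8 gives the same code units as `s`.
    assert("h\u00E9"w.byChar.equal(s.byCodeUnit));

    // Malformed input: 0xC3 is a lead byte with no continuation (assumption above).
    assert("a\xC3".byDchar.canFind(replacementDchar));
}
```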

Authors

Walter Bright and Jonathan M Davis

License

Boost License 1.0.
