Module rt.util.utf

Encode and decode UTF-8, UTF-16 and UTF-32 strings.

On Win32 systems, the C wchar_t type is UTF-16 and corresponds to the D wchar type. On Posix systems, the C wchar_t type is UTF-32 and corresponds to the D dchar type.
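The mapping described above can be checked at compile time. The following is a minimal sketch, assuming the `wchar_t` alias exported by druntime's `core.stdc.stddef` binding is the one being described; platforms other than Windows and Posix are not covered.

```d
import core.stdc.stddef : wchar_t;
import std.stdio : writeln;

// Assumption: core.stdc.stddef.wchar_t mirrors the C type described above.
version (Windows)
    static assert(is(wchar_t == wchar)); // UTF-16 code unit, 2 bytes
else version (Posix)
    static assert(is(wchar_t == dchar)); // UTF-32 code unit, 4 bytes

void main()
{
    writeln("wchar_t is ", wchar_t.sizeof, " bytes on this platform");
}
```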

UTF character support is restricted to (\u0000 <= character <= \U0010FFFF).
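As an illustration of the range restriction above, here is a small sketch. `inSupportedRange` is a hypothetical helper, not part of this module. Because the `rt` package is runtime-internal and may not be importable from user code, the sketch uses `isValidDchar` from Phobos's `std.utf` as a stand-in; whether this module's `isValidDchar` behaves identically is an assumption.

```d
import std.utf : isValidDchar; // stand-in for rt.util.utf (assumption)

// Hypothetical helper: the literal range test stated above.
bool inSupportedRange(dchar c)
{
    return c <= 0x10FFFF;
}

void main()
{
    assert(inSupportedRange('\U0010FFFF'));
    assert(!inSupportedRange(cast(dchar) 0x110000));

    // std.utf.isValidDchar is stricter than the plain range test:
    // it also rejects the surrogate code points U+D800 through U+DFFF.
    assert(isValidDchar('a'));
    assert(!isValidDchar(cast(dchar) 0xD800));
    assert(!isValidDchar(cast(dchar) 0x110000));
}
```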

See Also

Wikipedia
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335

Functions

Name Description
codeLength(c) Returns the number of code units needed to encode c, where the template parameter C is the code unit type of the target encoding. The length is counted in code units of type C, not in bytes.
decode(s, idx) Decodes and returns the character starting at s[idx]. idx is advanced past the decoded character. If the character is not well formed, a UtfException is thrown and idx remains unchanged.
encode(s, c) Encodes character c and appends it to array s[].
isValidDchar(c) Test if c is a valid UTF-32 character.
stride(s, i) Returns the length, in code units, of the UTF-8 sequence starting at index i in string s.
stride(s, i) Returns the length, in code units, of the UTF-16 sequence starting at index i in string s.
stride(s, i) Returns the length, in code units, of the UTF-32 sequence starting at index i in string s.
toUCSindex(s, i) Given an index i into an array of characters s[], and assuming that index i is at the start of a UTF character, determine the number of UCS characters up to that index i.
toUTF16(s) Encodes string s into UTF-16 and returns the encoded string. toUTF16z() is suitable for calling the 'W' functions in the Win32 API that take an LPWSTR or LPCWSTR argument.
toUTF16z(s) Encodes string s into UTF-16 and returns the encoded, zero-terminated string. toUTF16z() is suitable for calling the 'W' functions in the Win32 API that take an LPWSTR or LPCWSTR argument.
toUTF32(s) Encodes string s into UTF-32 and returns the encoded string.
toUTF8(s) Encodes string s into UTF-8 and returns the encoded string.
toUTFindex(s, n) Given a UCS index n into an array of characters s[], return the UTF index.
validate(s) Checks whether string s is well formed, where s can be an array of char, wchar, or dchar. Throws a UtfException if s is not well formed. Use it to check all untrusted input for correctness.
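
The functions listed above can be exercised together in a short sketch. Because the `rt` package is runtime-internal and may not be importable from user code, this example assumes Phobos's `std.utf` as a stand-in: it exposes functions with the same names, but its exception type is spelled `UTFException`, and any behavioral difference from this module is not covered here. `countCodePoints` and `codePoints` are hypothetical helpers written for the illustration.

```d
import std.utf; // stand-in for rt.util.utf (assumption, see above)
import std.exception : assertThrown;

// Hypothetical helper: count code points by stepping with stride().
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; i += stride(s, i))
        ++n;
    return n;
}

// Hypothetical helper: collect code points with decode(), which advances idx.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t idx = 0;
    while (idx < s.length)
        result ~= decode(s, idx);
    return result;
}

void main()
{
    // U+0061, U+00E9, U+20AC take 1, 2 and 3 UTF-8 code units respectively.
    string s = "a\u00E9\u20AC";
    assert(s.length == 6);
    assert(stride(s, 0) == 1 && stride(s, 1) == 2 && stride(s, 3) == 3);
    assert(countCodePoints(s) == 3);
    assert(codePoints(s) == "a\u00E9\u20AC"d);

    // Index conversion: the third code point starts at UTF-8 index 3.
    assert(toUCSindex(s, 3) == 2);
    assert(toUTFindex(s, 2) == 3);

    // Transcoding round trip.
    assert(toUTF16(s).length == 3);
    assert(toUTF32(s).length == 3);
    assert(toUTF8(toUTF32(s)) == s);

    // encode() appends the encoded form of a code point to a buffer.
    char[] buf;
    encode(buf, cast(dchar) 0x20AC);
    assert(buf == "\u20AC");

    // codeLength is parameterized on the code unit type.
    assert(codeLength!char(cast(dchar) 0x20AC) == 3);
    assert(codeLength!wchar(cast(dchar) 0x1F600) == 2);

    // A lead byte with no continuation byte is not well formed.
    char[] bad = [cast(char) 0xC3];
    validate(s); // well formed: does not throw
    assertThrown!UTFException(validate(bad));

    // On failure, decode() leaves the index unchanged.
    size_t j = 0;
    assertThrown!UTFException(decode(bad, j));
    assert(j == 0);
}
```

Stepping with stride() avoids decoding when only a count is needed, while decode() is the natural choice when the code point values themselves are required.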

Authors

Walter Bright, Sean Kelly

License

Boost License 1.0.