Module rt.util.utf
Encode and decode UTF-8, UTF-16 and UTF-32 strings.
On Win32 systems, the C wchar_t type is UTF-16 and corresponds to the D wchar type. On Posix systems, the C wchar_t type is UTF-32 and corresponds to the D dchar type.
UTF character support is restricted to (\u0000 <= character <= \U0010FFFF).
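The difference between the three encodings, and the upper bound on code points, can be illustrated with a short sketch. It assumes the public Phobos module std.utf, which provides functions with the same names as those documented here; rt.util.utf is a runtime-internal module and may not be importable from user code. CodeUnitCounts and codeUnitCounts are hypothetical helpers introduced only for this illustration.

```d
import std.utf : isValidDchar, toUTF8, toUTF16, toUTF32;

/// Code unit counts of one string in each encoding.
/// (Hypothetical helper type, for illustration only.)
struct CodeUnitCounts
{
    size_t utf8;
    size_t utf16;
    size_t utf32;
}

/// Transcodes s into all three encodings and records each length.
CodeUnitCounts codeUnitCounts(string s)
{
    return CodeUnitCounts(toUTF8(s).length, toUTF16(s).length, toUTF32(s).length);
}

unittest
{
    // An ASCII character is a single code unit in every encoding.
    assert(codeUnitCounts("A") == CodeUnitCounts(1, 1, 1));

    // U+1F600 lies outside the Basic Multilingual Plane: 4 UTF-8 code
    // units, a surrogate pair in UTF-16, one code unit in UTF-32.
    assert(codeUnitCounts("\U0001F600") == CodeUnitCounts(4, 2, 1));

    // \U0010FFFF is the largest supported code point.
    assert(isValidDchar('\U0010FFFF'));
    assert(!isValidDchar(cast(dchar) 0x110000));
}
```

Compiling with `-unittest` runs the assertions. Note that std.utf.isValidDchar also rejects the surrogate range, which may be stricter than the range stated above.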
See Also
Wikipedia
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335
Functions
Name | Description |
---|---|
codeLength(c) | Returns the number of code units needed to encode c when C is the code unit type. The length is counted in code units (characters of type C), not in bytes. |
decode(s, idx) | Decodes and returns the character starting at s[idx]. idx is advanced past the decoded character. If the character is not well formed, a UtfException is thrown and idx remains unchanged. |
encode(s, c) | Encodes character c and appends it to array s[]. |
isValidDchar(c) | Tests whether c is a valid UTF-32 character. |
stride(s, i) | Returns the length of the UTF-8 sequence starting at index i in string s. |
stride(s, i) | Returns the length of the UTF-16 sequence starting at index i in string s. |
stride(s, i) | Returns the length of the UTF-32 sequence starting at index i in string s. |
toUCSindex(s, i) | Given an index i into an array of characters s[], and assuming that index i is at the start of a UTF character, determines the number of UCS characters up to that index i. |
toUTF16(s) | Encodes string s into UTF-16 and returns the encoded string. toUTF16z() is suitable for calling the 'W' functions in the Win32 API that take an LPWSTR or LPCWSTR argument. |
toUTF16z(s) | Encodes string s into UTF-16 and returns the encoded string. toUTF16z() is suitable for calling the 'W' functions in the Win32 API that take an LPWSTR or LPCWSTR argument. |
toUTF32(s) | Encodes string s into UTF-32 and returns the encoded string. |
toUTF8(s) | Encodes string s into UTF-8 and returns the encoded string. |
toUTFindex(s, n) | Given a UCS index n into an array of characters s[], returns the UTF index. |
validate(s) | Checks whether string s is well formed. S can be an array of char, wchar, or dchar. Throws a UtfException if s is not well formed. Use it to check all untrusted input for correctness. |
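As a rough sketch of how decode, stride, the index conversions and validate fit together, the example below walks a UTF-8 string one character at a time. It assumes the public Phobos module std.utf, whose functions share the names listed in the table; rt.util.utf is runtime-internal and may not be importable from user code. std.utf reports errors with UTFException, which may differ from the UtfException named above. codePoints and isWellFormed are hypothetical helpers introduced only for this illustration.

```d
import std.utf : decode, stride, toUCSindex, toUTFindex, validate, UTFException;

/// Collects the code points of a UTF-8 string by calling decode()
/// repeatedly. (Hypothetical helper, for illustration only.)
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t idx = 0;
    while (idx < s.length)
    {
        // decode() returns the code point at s[idx] and advances idx past it.
        result ~= decode(s, idx);
    }
    return result;
}

/// Wraps validate() so that malformed input yields false instead of
/// an exception. (Hypothetical helper, for illustration only.)
bool isWellFormed(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    string s = "a\u00E9\U0001F600"; // 1 + 2 + 4 UTF-8 code units

    assert(codePoints(s) == "a\u00E9\U0001F600"d);

    // stride() reports how many code units the sequence at an index occupies.
    assert(stride(s, 0) == 1);
    assert(stride(s, 1) == 2);
    assert(stride(s, 3) == 4);

    // Code unit index 3 starts the third character, which has UCS index 2.
    assert(toUCSindex(s, 3) == 2);
    assert(toUTFindex(s, 2) == 3);

    assert(isWellFormed(s));

    // The byte 0xFF can never appear in well-formed UTF-8.
    char[] bad = [cast(char) 0xFF];
    assert(!isWellFormed(bad));
}
```

Wrapping validate in a boolean helper is only one possible design; letting the exception propagate is equally reasonable when malformed input should abort processing.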
Authors
Walter Bright, Sean Kelly
License
Copyright © 1999-2022 by the D Language Foundation