Module std.utf
Encode and decode UTF-8, UTF-16 and UTF-32 strings.
UTF character support is restricted to code points in the range `'\u0000' <= character <= '\U0010FFFF'`.
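The following minimal D sketch, which is not taken from the Phobos documentation, converts one string between the three encodings with `toUTF16`, `toUTF32` and `toUTF8`. The sample text and assertions are illustrative choices: they show that the same four code points occupy 10, 5 and 4 code units respectively, and that `isValidDchar` rejects values above `'\U0010FFFF'`.

```d
import std.stdio : writeln;
import std.utf : isValidDchar, toUTF16, toUTF32, toUTF8;

void main()
{
    // 'a' (U+0061), U+00E9, U+20AC and U+1F600 (outside the BMP)
    string s8 = "a\u00E9\u20AC\U0001F600";

    wstring s16 = toUTF16(s8);
    dstring s32 = toUTF32(s8);

    // Same four code points, different numbers of code units.
    assert(s8.length == 1 + 2 + 3 + 4);  // UTF-8: 10 code units
    assert(s16.length == 1 + 1 + 1 + 2); // UTF-16: 5 (U+1F600 needs a surrogate pair)
    assert(s32.length == 4);             // UTF-32: one code unit per code point

    // Converting back to UTF-8 reproduces the original string.
    assert(toUTF8(s16) == s8);
    assert(toUTF8(s32) == s8);

    // The supported range ends at U+10FFFF.
    assert(isValidDchar('\U0010FFFF'));
    assert(!isValidDchar(cast(dchar) 0x110000));

    writeln(s8.length, " ", s16.length, " ", s32.length); // prints "10 5 4"
}
```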
| Category | Functions | 
|---|---|
| Decode | `decode`, `decodeFront` |
| Lazy decode | `byCodeUnit`, `byChar`, `byWchar`, `byDchar`, `byUTF` |
| Encode | `encode`, `toUTF8`, `toUTF16`, `toUTF32`, `toUTFz`, `toUTF16z` |
| Length | `codeLength`, `count`, `stride`, `strideBack` |
| Index | `toUCSindex`, `toUTFindex` |
| Validation | `isValidDchar`, `validate` |
| Miscellaneous | `replacementDchar`, `UseReplacementDchar`, `UTFException` |
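A short sketch of the eager decode and encode functions, assuming a reasonably recent Phobos in which `decodeBack` is available. The string contents and variable names are illustrative only; the comments give the code-unit lengths the assertions depend on.

```d
import std.utf : decode, decodeBack, decodeFront, encode;

void main()
{
    string text = "a\u00E9\u20AC"; // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes)

    // decode: read the code point at an index and advance the index past it.
    size_t i = 1;
    dchar c = decode(text, i);
    assert(c == '\u00E9' && i == 3);

    // decodeFront: consume the first code point of a range taken by ref.
    string fromFront = text;
    size_t n;
    assert(fromFront.decodeFront(n) == 'a' && n == 1);
    assert(fromFront == "\u00E9\u20AC");

    // decodeBack: consume the last code point instead.
    string fromBack = text;
    assert(fromBack.decodeBack(n) == '\u20AC' && n == 3);
    assert(fromBack == "a\u00E9");

    // encode into a static buffer: returns the number of code units written.
    char[4] buf;
    size_t len = encode(buf, '\u20AC');
    assert(len == 3 && buf[0 .. len] == "\u20AC");

    // encode onto a dynamic array: appends in that array's own encoding.
    wchar[] w;
    encode(w, '\U0001F600');
    assert(w.length == 2); // a UTF-16 surrogate pair
}
```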
See Also
- Wikipedia
- http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
- http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335
Functions
| Name | Description | 
|---|---|
| `byCodeUnit(r)` | Iterates a range of `char`, `wchar`, or `dchar` by code unit. |
| `codeLength(c)` | Returns the number of code units that are required to encode the code point `c` when `C` is the character type used to encode it. |
| `codeLength(input)` | Returns the number of code units that are required to encode `input` in a string whose character type is `C`. This is particularly useful when slicing one string with the length of another and the two strings use different character types. |
| `count(str)` | Returns the total number of code points encoded in `str`. |
| `decode(str, index)` | Decodes and returns the code point starting at `str[index]`. `index` is advanced to one past the decoded code point. If the code point is not well-formed, a `UTFException` is thrown and `index` remains unchanged. |
| `decodeBack(str, numCodeUnits)` | `decodeBack` is a variant of `decode` that specifically decodes the last code point. Unlike `decode`, `decodeBack` accepts any bidirectional range of code units (rather than just a string or random-access range). It also takes the range by `ref` and pops off the elements as it decodes them. If `numCodeUnits` is passed in, it is set to the number of code units in the decoded code point. |
| `decodeFront(str, numCodeUnits)` | `decodeFront` is a variant of `decode` that specifically decodes the first code point. Unlike `decode`, `decodeFront` accepts any input range of code units (rather than just a string or random-access range). It also takes the range by `ref` and pops off the elements as it decodes them. If `numCodeUnits` is passed in, it is set to the number of code units in the decoded code point. |
| `encode(buf, c)` | Encodes `c` into the static array `buf` and returns the actual length of the encoded character (a number between `1` and `4` for `char[4]` buffers and between `1` and `2` for `wchar[2]` buffers). |
| `encode(str, c)` | Encodes `c` in `str`'s encoding and appends it to `str`. |
| `isValidDchar(c)` | Checks whether the given Unicode code point is valid. |
| `stride(str, index)` | Calculates the length of the UTF sequence starting at `index` in `str`. |
| `strideBack(str, index)` | Calculates the length of the UTF sequence ending one code unit before `index` in `str`. |
| `toUCSindex(str, index)` | Given `index` into `str`, and assuming that `index` is at the start of a UTF sequence, `toUCSindex` determines the number of UCS characters up to `index`. That is, `index` is the index of the code unit that begins a code point, and the return value is the number of code points that precede it in the string. |
| `toUTF16(s)` | Encodes the elements of `s` to UTF-16 and returns a newly GC-allocated `wstring` of the elements. |
| `toUTF16z(str)` | `toUTF16z` is a convenience function for `toUTFz!(const(wchar)*)`. |
| `toUTF32(s)` | Encodes the elements of `s` to UTF-32 and returns a newly GC-allocated `dstring` of the elements. |
| `toUTF8(s)` | Encodes the elements of `s` to UTF-8 and returns a newly allocated `string` of the elements. |
| `toUTFindex(str, n)` | Given a UCS index `n` into `str`, returns the UTF index. That is, `n` counts code points from the start of the string, and the return value is the array index of the first code unit of that code point. |
| `validate(str)` | Checks whether `str` is well-formed Unicode, throwing a `UTFException` if it is not. |
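The length and index functions can be illustrated on a small UTF-8 string whose second code point takes two code units. This sketch uses made-up sample data; the comments give the code-unit layout that each assertion relies on.

```d
import std.utf : codeLength, count, stride, strideBack, toUCSindex, toUTFindex;

void main()
{
    // Code units: 'a' | 0xC3 0xA9 (U+00E9) | 'b'  => 4 code units, 3 code points
    string s = "a\u00E9b";

    // codeLength!C(c): code units needed for one code point in encoding C.
    assert(codeLength!char('\u00E9') == 2);
    assert(codeLength!wchar('\U0001F600') == 2);

    // codeLength!C(input): code units needed to re-encode a whole string as C.
    assert(codeLength!wchar(s) == 3);

    // count: number of code points.
    assert(count(s) == 3);

    // stride / strideBack: length of the sequence at, or just before, an index.
    assert(stride(s, 1) == 2);     // U+00E9 starts at index 1
    assert(strideBack(s, 3) == 2); // the sequence ending at index 2 is U+00E9

    // toUCSindex / toUTFindex: convert between code-unit and code-point indices.
    assert(toUCSindex(s, 3) == 2); // code unit 3 ('b') is code point 2
    assert(toUTFindex(s, 2) == 3); // code point 2 starts at code unit 3
}
```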
Classes
| Name | Description | 
|---|---|
| `UTFException` | Exception thrown on errors in `std.utf` functions. |
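One way to use `UTFException` is to turn the exception thrown by `validate` into a boolean result. The helper `isWellFormed` below is hypothetical and exists only for this sketch; it is not part of `std.utf`.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper (not in std.utf): true if `s` is well-formed UTF-8.
bool isWellFormed(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    assert(isWellFormed("a\u00E9"));

    // 0xFF can never appear in UTF-8, so validation fails.
    char[] bad = ['a', cast(char) 0xFF];
    assert(!isWellFormed(bad));
    assertThrown!UTFException(validate(bad));
}
```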
Templates
| Name | Description | 
|---|---|
| `toUTFz` | Returns a C-style zero-terminated string equivalent to `str`. `str` must not contain embedded `'\0'`s, as any C function will treat the first `'\0'` it sees as the end of the string. If `str.empty` is `true`, a string containing only `'\0'` is returned. |
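A sketch of `toUTFz` and `toUTF16z`, assuming the caller only needs read-only C strings (hence the `const` pointer targets). The inputs are arbitrary examples, and `strlen` from `core.stdc.string` is used only to check the terminator.

```d
import core.stdc.string : strlen;
import std.utf : toUTF16z, toUTFz;

void main()
{
    // toUTFz!P encodes for the character type P points to and appends '\0'.
    const(char)* p = toUTFz!(const(char)*)("hello");
    assert(strlen(p) == 5 && p[5] == '\0');

    // An empty input yields a string containing only the terminator.
    const(char)* e = toUTFz!(const(char)*)("");
    assert(e[0] == '\0');

    // toUTF16z is shorthand for toUTFz!(const(wchar)*).
    const(wchar)* w = toUTF16z("h\u00E9");
    assert(w[0] == 'h' && w[1] == '\u00E9' && w[2] == 0);
}
```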
Manifest constants
| Name | Type | Description | 
|---|---|---|
| `replacementDchar` | `dchar` | Inserted in place of invalid UTF sequences; its value is U+FFFD, the Unicode replacement character. |
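The effect of `replacementDchar` together with the `UseReplacementDchar` flag can be seen by decoding a byte that is never valid in UTF-8. This sketch selects the replacing behaviour with `std.typecons.Yes`; the sample bytes are made up, and only the returned code point (not the index advance) is checked.

```d
import std.exception : assertThrown;
import std.typecons : Yes;
import std.utf : UTFException, decode, replacementDchar;

void main()
{
    char[] bad = [cast(char) 0xFF, 'a']; // 0xFF is never valid in UTF-8

    // Default (No.useReplacementDchar): invalid input throws.
    size_t i = 0;
    assertThrown!UTFException(decode(bad, i));

    // Yes.useReplacementDchar: invalid input decodes to U+FFFD instead.
    i = 0;
    dchar c = decode!(Yes.useReplacementDchar)(bad, i);
    assert(c == replacementDchar && replacementDchar == '\uFFFD');
}
```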
Aliases
| Name | Type | Description | 
|---|---|---|
| `byChar` | `byUTF!char` | Iterates an input range of characters by `char`. Forwards to `byUTF` with `C = char`. |
| `byDchar` | `byUTF!dchar` | Iterates an input range of characters by `dchar`. Forwards to `byUTF` with `C = dchar`. |
| `byUTF` | `byUTF!(Unqual!C)` | Iterates an input range of characters by character type `C`, encoding the elements of the range. |
| `byWchar` | `byUTF!wchar` | Iterates an input range of characters by `wchar`. Forwards to `byUTF` with `C = wchar`. |
| `UseReplacementDchar` | `Flag!("useReplacementDchar")` | Whether to replace invalid UTF with `replacementDchar`. |
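A sketch of the lazy ranges, using `std.range.walkLength` and `std.algorithm.comparison.equal` to consume them. It assumes, as in current Phobos, that `byUTF` (and therefore `byChar`, `byWchar` and `byDchar`) defaults to `Yes.useReplacementDchar`, so invalid input is substituted rather than thrown.

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.utf : byChar, byCodeUnit, byDchar, byWchar;

void main()
{
    // 1 + 2 + 4 = 7 UTF-8 code units, 3 code points
    string s = "h\u00E9\U0001F600";

    // byCodeUnit: no decoding, one element per char.
    assert(s.byCodeUnit.walkLength == 7);

    // byDchar / byWchar / byChar lazily re-encode on the fly.
    assert(s.byDchar.walkLength == 3);
    assert(s.byWchar.walkLength == 4); // U+1F600 becomes a surrogate pair
    assert("h\u00E9"d.byChar.walkLength == 3);

    // Iterating by dchar yields the code points themselves.
    assert(s.byDchar.equal("h\u00E9\U0001F600"d));

    // Invalid input is replaced with U+FFFD by default.
    char[] bad = ['x', cast(char) 0xFF];
    assert(bad.byDchar.equal("x\uFFFD"d));
}
```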
Authors
License
Copyright © 1999-2022 by the D Language Foundation | Page generated by ddox.