Module std.encoding
Classes and functions for handling and transcoding between various encodings.
For cases where the encoding is known at compile-time, functions are provided for arbitrary encoding and decoding of characters, arbitrary transcoding between strings of different types, as well as validation and sanitization.
Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252.
| Category | Functions |
|---|---|
| Decode | codePoints, decode, decodeReverse, safeDecode |
| Conversion | codeUnits, sanitize, transcode |
| Classification | canEncode, isValid, isValidCodePoint, isValidCodeUnit |
| BOM | BOM, BOMSeq, getBOM, utfBOM |
| Length & Index | firstSequence, encodedLength, index, lastSequence, validLength |
| Encoding schemes | encodingName, EncodingScheme, EncodingSchemeASCII, EncodingSchemeLatin1, EncodingSchemeLatin2, EncodingSchemeUtf16Native, EncodingSchemeUtf32Native, EncodingSchemeUtf8, EncodingSchemeWindows1250, EncodingSchemeWindows1251, EncodingSchemeWindows1252 |
| Representation | AsciiChar, AsciiString, Latin1Char, Latin1String, Latin2Char, Latin2String, Windows1250Char, Windows1250String, Windows1251Char, Windows1251String, Windows1252Char, Windows1252String |
| Exceptions | INVALID_SEQUENCE, EncodingException |
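
When the encoding is fixed at compile time, it is selected by the element type passed to each function. The following is a minimal sketch of the classification, length, and sanitization functions on UTF-8 and single-byte encodings; `cleaned` is a hypothetical helper written for this example and is not part of the module.

```d
import std.encoding;

/// Returns a validly encoded copy of `s`; valid input is returned unchanged.
/// (Hypothetical helper, not part of std.encoding.)
string cleaned(string s)
{
    return isValid(s) ? s : sanitize(s);
}

unittest
{
    // The encoding is chosen by the template argument.
    assert(canEncode!AsciiChar('A'));
    assert(!canEncode!AsciiChar('\u00A0')); // outside the ASCII range
    assert(canEncode!Latin1Char('\u00A0'));

    assert(encodedLength!char('A') == 1);
    assert(encodedLength!char('\u20AC') == 3); // three UTF-8 code units

    // "\xF0\x80" is a malformed UTF-8 sequence.
    string bad = "hello \xF0\x80world";
    assert(!isValid(bad));
    assert(validLength(bad) == 6); // "hello " is the valid prefix
    assert(isValid(cleaned(bad)));
}
```

The `unittest` block runs when the file is compiled with `-unittest`.
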
For cases where the encoding is not known at compile-time, but is
known at run-time, the abstract class EncodingScheme
and its subclasses are provided. To construct a run-time encoder/decoder,
one does e.g.
auto e = EncodingScheme.create("utf-8");
This library supplies EncodingScheme subclasses for ASCII,
ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250,
WINDOWS-1251, WINDOWS-1252, UTF-8, and (on little-endian architectures)
UTF-16LE and UTF-32LE; or (on big-endian architectures) UTF-16BE and UTF-32BE.
This library provides a mechanism whereby other modules may add EncodingScheme subclasses for any other encoding.
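
As an illustration of the run-time interface, the sketch below looks up a scheme by name, encodes one code point into a byte buffer, and decodes it back. `roundTrip` is a hypothetical helper introduced for this example; it assumes the code point can be represented in the chosen encoding.

```d
import std.encoding;

/// Encodes `c` with the scheme registered under `name`, then decodes it back.
/// (Hypothetical helper; assumes `c` is representable in that encoding.)
dchar roundTrip(string name, dchar c)
{
    // create throws EncodingException if no scheme matches `name`.
    EncodingScheme e = EncodingScheme.create(name);

    // Run-time schemes work on raw bytes rather than typed code units.
    auto buffer = new ubyte[e.encodedLength(c)];
    immutable n = e.encode(c, buffer);

    const(ubyte)[] bytes = buffer[0 .. n];
    return e.decode(bytes); // advances `bytes` past the decoded sequence
}

unittest
{
    assert(roundTrip("utf-8", '\u20AC') == '\u20AC');
    assert(roundTrip("iso-8859-1", '\u00E9') == '\u00E9');

    auto e = EncodingScheme.create("utf-8");
    assert(e.encodedLength('\u20AC') == 3);
    assert(!EncodingScheme.create("ascii").canEncode('\u20AC'));
}
```
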
Functions
| Name | Description |
|---|---|
| canEncode(c) | Returns true iff it is possible to represent the specified codepoint in the encoding. |
| codePoints(s) | Returns a foreachable struct which can bidirectionally iterate over all code points in a string. |
| codeUnits(c) | Returns a foreachable struct which can bidirectionally iterate over all code units in a code point. |
| decode(s) | Decodes a single code point. |
| decodeReverse(s) | Decodes a single code point from the end of a string. |
| encode(c) | Encodes a single code point. |
| encode(c, array) | Encodes a single code point into an array. |
| encode(c, dg) | Encodes a single code point to a delegate. |
| encode(s, range) | Encodes the contents of s in units of type Tgt, writing the result to an output range. |
| encodedLength(c) | Returns the number of code units required to encode a single code point. |
| encodingName() | Returns the name of an encoding. |
| firstSequence(s) | Returns the length of the first encoded sequence. |
| getBOM(input) | Returns a BOMSeq for a given input. If no BOM is present, the BOMSeq for BOM.none is returned. The BOM sequence at the beginning of the range will not be consumed from the passed range. If you pass a reference type range, make sure that save creates a deep copy. |
| index(s, n) | Returns the array index at which the (n+1)th code point begins. |
| isValid(s) | Returns true if the string is encoded correctly. |
| isValidCodePoint(c) | Returns true if c is a valid code point. |
| isValidCodeUnit(c) | Returns true if the code unit is legal. For example, the byte 0x80 would not be legal in ASCII, because ASCII code units must always be in the range 0x00 to 0x7F. |
| lastSequence(s) | Returns the length of the last encoded sequence. |
| safeDecode(s) | Decodes a single code point. The input does not have to be valid. |
| sanitize(s) | Sanitizes a string by replacing malformed code unit sequences with valid code unit sequences. The result is guaranteed to be valid for this encoding. |
| transcode(s, r) | Convert a string from one encoding to another. |
| validLength(s) | Returns the length of the longest possible substring, starting from the first code unit, which is validly encoded. |
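
The decoding and iteration functions in the table can be combined as in the following sketch. `countCodePoints` is a hypothetical helper written for this example; it assumes its input is already valid UTF-8, since decode requires valid input.

```d
import std.encoding;

/// Counts code points by decoding one at a time.
/// (Hypothetical helper; assumes `s` is valid UTF-8.)
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    while (s.length > 0)
    {
        decode(s); // removes one code point from the front of `s`
        ++n;
    }
    return n;
}

unittest
{
    string s = "a\u00E9\u20AC"; // 1 + 2 + 3 UTF-8 code units
    assert(s.length == 6);
    assert(countCodePoints(s) == 3);

    // codePoints yields each code point as a dchar.
    dchar[] seen;
    foreach (dchar c; codePoints(s))
        seen ~= c;
    assert(seen == "a\u00E9\u20AC"d);

    // index(s, n): array index at which the (n+1)th code point begins.
    assert(index(s, 2) == 3);

    // codeUnits yields the code units that encode a single code point.
    char[] units;
    foreach (c; codeUnits!char('\u20AC'))
        units ~= c;
    assert(units.length == 3);
}
```
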
Classes
| Name | Description |
|---|---|
| EncodingException | The base class for exceptions thrown by this module |
| EncodingScheme | Abstract base class of all encoding schemes |
| EncodingSchemeASCII | EncodingScheme to handle ASCII |
| EncodingSchemeLatin1 | EncodingScheme to handle Latin-1 |
| EncodingSchemeLatin2 | EncodingScheme to handle Latin-2 |
| EncodingSchemeUtf16Native | EncodingScheme to handle UTF-16 in native byte order |
| EncodingSchemeUtf32Native | EncodingScheme to handle UTF-32 in native byte order |
| EncodingSchemeUtf8 | EncodingScheme to handle UTF-8 |
| EncodingSchemeWindows1250 | EncodingScheme to handle Windows-1250 |
| EncodingSchemeWindows1251 | EncodingScheme to handle Windows-1251 |
| EncodingSchemeWindows1252 | EncodingScheme to handle Windows-1252 |
Enums
| Name | Description |
|---|---|
| AsciiChar | Defines various character sets. |
| BOM | Definitions of common Byte Order Marks. The elements of the enum can be used as indices into bomTable to get the matching BOMSeq. |
| Latin1Char | Defines a Latin1-encoded character. |
| Latin2Char | Defines a Latin2-encoded character. |
| Windows1250Char | Defines a Windows1250-encoded character. |
| Windows1251Char | Defines a Windows1251-encoded character. |
| Windows1252Char | Defines a Windows1252-encoded character. |
Manifest constants
| Name | Type | Description |
|---|---|---|
| INVALID_SEQUENCE | | Special value returned by safeDecode |
| utfBOM | | Constant defining a fully decoded BOM |
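
A short sketch of how INVALID_SEQUENCE is typically checked: safeDecode removes code units from the front of its argument even when they are malformed, so a loop over untrusted input always terminates. `countInvalid` is a hypothetical helper introduced only for this example.

```d
import std.encoding;

/// Counts malformed sequences in UTF-8 input of unknown validity.
/// (Hypothetical helper, not part of std.encoding.)
size_t countInvalid(const(char)[] s)
{
    size_t bad;
    while (s.length > 0)
    {
        // Each call consumes at least one code unit from the front of `s`.
        if (safeDecode(s) == INVALID_SEQUENCE)
            ++bad;
    }
    return bad;
}

unittest
{
    assert(countInvalid("abc") == 0);
    assert(countInvalid("a\xFFb") == 1); // 0xFF never appears in UTF-8
}
```
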
Global variables
| Name | Type | Description |
|---|---|---|
| bomTable | immutable(Tuple!(std.encoding.BOM,"schema",ubyte[],"sequence")[]) | Mapping of a byte sequence to Byte Order Mark (BOM) |
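
To show how a BOMSeq is used in practice, the sketch below detects a byte order mark with getBOM and skips it. `stripBOM` is a hypothetical helper written for this example; it relies on the `schema` and `sequence` fields named in the tuple type above.

```d
import std.encoding;

/// Returns `data` with any leading byte order mark removed.
/// (Hypothetical helper, not part of std.encoding.)
const(ubyte)[] stripBOM(const(ubyte)[] data)
{
    // getBOM does not consume the input; without a BOM the returned
    // sequence is empty, so nothing is skipped.
    immutable seq = getBOM(data);
    return data[seq.sequence.length .. $];
}

unittest
{
    const(ubyte)[] withBom = [0xEF, 0xBB, 0xBF, 0x68, 0x69]; // UTF-8 BOM + "hi"
    assert(getBOM(withBom).schema == BOM.utf8);
    assert(stripBOM(withBom).length == 2);

    const(ubyte)[] plain = [0x68, 0x69];
    assert(getBOM(plain).schema == BOM.none);
    assert(stripBOM(plain).length == 2);
}
```
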
Aliases
| Name | Type | Description |
|---|---|---|
| AsciiString | immutable(AsciiChar)[] | Defines various character sets. |
| BOMSeq | Tuple!(std.encoding.BOM,"schema",ubyte[],"sequence") | The type stored inside bomTable. |
| Latin1String | immutable(Latin1Char)[] | Defines a Latin1-encoded string (as an array of immutable(Latin1Char)). |
| Latin2String | immutable(Latin2Char)[] | Defines a Latin2-encoded string (as an array of immutable(Latin2Char)). |
| Windows1250String | immutable(Windows1250Char)[] | Defines a Windows1250-encoded string (as an array of immutable(Windows1250Char)). |
| Windows1251String | immutable(Windows1251Char)[] | Defines a Windows1251-encoded string (as an array of immutable(Windows1251Char)). |
| Windows1252String | immutable(Windows1252Char)[] | Defines a Windows1252-encoded string (as an array of immutable(Windows1252Char)). |
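
The string aliases above serve as source and destination types for transcode, which picks the encodings from the element types of its arguments. A minimal sketch, in which `toLatin1` is a hypothetical helper introduced for this example:

```d
import std.encoding;

/// Converts UTF-8 text to ISO-8859-1.
/// (Hypothetical helper, not part of std.encoding.)
Latin1String toLatin1(string s)
{
    Latin1String result;
    transcode(s, result);
    return result;
}

unittest
{
    // U+00E9 takes two UTF-8 code units but one Latin-1 code unit (0xE9).
    Latin1String ls = toLatin1("caf\u00E9");
    assert(ls.length == 4);
    assert(ls[3] == 0xE9);

    // Transcoding back to UTF-8 restores the original text.
    string back;
    transcode(ls, back);
    assert(back == "caf\u00E9");

    // UTF-8 to UTF-16 works the same way.
    wstring ws;
    transcode("hello world", ws);
    assert(ws == "hello world"w);
}
```
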
						
Authors
Janice Caron