Arrays
Kinds
There are four kinds of arrays:
Syntax | Description |
---|---|
type* | Pointers to data |
type[integer] | Static arrays |
type[] | Dynamic arrays |
type[type] | Associative arrays |
Pointers
int* p;
A pointer to type T has a value which is a reference (address) to another object of type T. It is commonly called a pointer to T.
If a pointer contains a null value, it is not pointing to a valid object.
When a pointer to T is dereferenced, it must either contain a null value, or point to a valid object of type T.
- The behavior when a null pointer is dereferenced. Typically the program will be aborted.
- Dereferencing a pointer that is not null and does not point to a valid object of type T.
Static Arrays
int[3] s;
Static arrays have a length fixed at compile time.
The total size of a static array cannot exceed 16Mb.
A static array with a dimension of 0 is allowed, but no space is allocated for it.
Static arrays are value types. They are passed to and returned by functions by value.
- Use dynamic arrays for larger arrays.
- Static arrays with 0 elements are useful as the last member of a variable length struct, or as the degenerate case of a template expansion.
- Because static arrays are passed to functions by value, a larger array can consume a lot of stack space. Use dynamic arrays instead.
Dynamic Arrays
int[] a;
Dynamic arrays consist of a length and a pointer to the array data. Multiple dynamic arrays can share all or parts of the array data.
- Use dynamic arrays instead of pointers to arrays as much as practical. Indexing of dynamic arrays are bounds checked, avoiding buffer underflow and overflow problems.
Array Declarations
Declarations appear before the identifier being declared and read right to left, so:
int[] a; // dynamic array of ints int[4][3] b; // array of 3 arrays of 4 ints each int[][5] c; // array of 5 dynamic arrays of ints. int*[]*[3] d; // array of 3 pointers to dynamic arrays of pointers to ints int[]* e; // pointer to dynamic array of ints
Array Usage
There are two broad kinds of operations to do on an array - affecting the handle to the array, and affecting the contents of the array.
The handle to an array is specified by naming the array, as in p, s or a:
int* p; int[3] s; int[] a; int[] b; p = s.ptr; // p points to the first element of the array s. p = a.ptr; // p points to the first element of the array a. a = p; // error, since the length of the array pointed // to by p is unknown a = s; // a is initialized to point to the s array a = b; // a points to the same array as b does
Indexing
See also IndexExpression.
Slicing
Slicing an array means to specify a subarray of it. An array slice does not copy the data, it is only another reference to it. For example:
void foo(int); int[10] a; // declare array of 10 ints int[] b; b = a[1..3]; // a[1..3] is a 2 element array consisting of // a[1] and a[2] foo(b[1]); // equivalent to foo(0) a[2] = 3; foo(b[1]); // equivalent to foo(3)
The [] is shorthand for a slice of the entire array. For example, the assignments to b:
int[10] a; int[] b; b = a; b = a[]; b = a[0 .. a.length];
are all semantically equivalent.
Slicing is not only handy for referring to parts of other arrays, but for converting pointers into bounds-checked arrays:
int* p; int[] b = p[0..8];
See also SliceExpression.
Array Copying
When the slice operator appears as the left-hand side of an assignment expression, it means that the contents of the array are the target of the assignment rather than a reference to the array. Array copying happens when the left-hand side is a slice, and the right-hand side is an array of or pointer to the same type.
int[3] s; int[3] t; s = t; // the 3 elements of t are copied into s s[] = t; // the 3 elements of t are copied into s s[] = t[]; // the 3 elements of t are copied into s s[1..2] = t[0..1]; // same as s[1] = t[0] s[0..2] = t[1..3]; // same as s[0] = t[1], s[1] = t[2] s[0..4] = t[0..4]; // error, only 3 elements in s s[0..2] = t; // error, operands have different lengths
Overlapping Copying
Overlapping copies are an error:
s[0..2] = s[1..3]; // error, overlapping copy s[1..3] = s[0..2]; // error, overlapping copy
Disallowing overlapping makes it possible for more aggressive parallel code optimizations than possible with the serial semantics of C.
If overlapping is required, use std.algorithm.mutation.copy:
import std.algorithm; int[] s = [1, 2, 3, 4]; copy(s[1..3], s[0..2]); assert(s == [2, 3, 3, 4]);
Array Setting
If a slice operator appears as the left-hand side of an assignment expression, and the type of the right-hand side is the same as the element type of the left-hand side, then the array contents of the left-hand side are set to the right-hand side.
int[3] s; int* p; s[] = 3; // same as s[0] = 3, s[1] = 3, s[2] = 3 p[0..2] = 3; // same as p[0] = 3, p[1] = 3
Array Concatenation
The binary operator ~ is the cat operator. It is used to concatenate arrays:
int[] a; int[] b; int[] c; a = b ~ c; // Create an array from the concatenation // of the b and c arrays
Many languages overload the + operator to mean concatenation. This confusingly leads to, does:
"10" + 3 + 4
produce the number 17, the string "1034" or the string "107" as the result? It isn't obvious, and the language designers wind up carefully writing rules to disambiguate it - rules that get incorrectly implemented, overlooked, forgotten, and ignored. It's much better to have + mean addition, and a separate operator to be array concatenation.
Similarly, the ~= operator means append, as in:
a ~= b; // a becomes the concatenation of a and b
Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array, so:
a = b; // a refers to b a = b ~ c[0..0]; // a refers to a copy of b
Appending does not always create a copy, see setting dynamic array length for details.
Array Operations
Many array operations, also known as vector operations, can be expressed at a high level rather than as a loop. For example, the loop:
T[] a, b;
...
for (size_t i = 0; i < a.length; i++)
a[i] = b[i] + 4;
assigns to the elements of a the elements of b with 4 added to each. This can also be expressed in vector notation as:
T[] a, b; ... a[] = b[] + 4;
A vector operation is indicated by the slice operator appearing as the left-hand side of an assignment or an op-assignment expression. The right-hand side can be an expression consisting either of an array slice of the same length and type as the left-hand side or a scalar expression of the element type of the left-hand side, in any combination.
T[] a, b, c; ... a[] -= (b[] + 4) * c[];
The slice on the left and any slices on the right must not overlap. All operands are evaluated exactly once, even if the array slice has zero elements in it.
The order in which the array elements are computed is implementation defined, and may even occur in parallel. An application must not depend on this order.
Implementation note: Many vector operations are expected to take advantage of any vector math instructions available on the target computer.
Pointer Arithmetic
int[3] abc; // static array of 3 ints int[] def = [ 1, 2, 3 ]; // dynamic array of 3 ints void dibb(int* array) { array[2]; // means same thing as *(array + 2) *(array + 2); // get 3rd element } void diss(int[] array) { array[2]; // ok *(array + 2); // error, array is not a pointer } void ditt(int[3] array) { array[2]; // ok *(array + 2); // error, array is not a pointer }
Rectangular Arrays
Experienced FORTRAN numerics programmers know that multidimensional "rectangular" arrays for things like matrix operations are much faster than trying to access them via pointers to pointers resulting from "array of pointers to array" semantics. For example, the D syntax:
double[][] matrix;
declares matrix as an array of pointers to arrays. (Dynamic arrays are implemented as pointers to the array data.) Since the arrays can have varying sizes (being dynamically sized), this is sometimes called "jagged" arrays. Even worse for optimizing the code, the array rows can sometimes point to each other! Fortunately, D static arrays, while using the same syntax, are implemented as a fixed rectangular layout:
double[3][3] matrix;
declares a rectangular matrix with 3 rows and 3 columns, all contiguously in memory. In other languages, this would be called a multidimensional array and be declared as:
double matrix[3,3];
Array Length
Within the [ ] of a static or a dynamic array, the symbol $ represents the length of the array.
int[4] foo; int[] bar = foo; int* p = &foo[0]; // These expressions are equivalent: bar[] bar[0 .. 4] bar[0 .. $] bar[0 .. bar.length] p[0 .. $] // '$' is not defined, since p is not an array bar[0]+$ // '$' is not defined, out of scope of [ ] bar[$-1] // retrieves last element of the array
Array Properties
Static array properties are:
Property | Description |
---|---|
.init | Returns an array literal with each element of the literal being the .init property of the array element type. |
.sizeof | Returns the array length multiplied by the number of bytes per array element. |
.length | Returns the number of elements in the array. This is a fixed quantity for static arrays. It is of type size_t. |
.ptr | Returns a pointer to the first element of the array. |
.dup | Create a dynamic array of the same size and copy the contents of the array into it. The copy will have any immutability or const stripped. If this conversion is invalid the call will not compile. |
.idup | Create a dynamic array of the same size and copy the contents of the array into it. The copy is typed as being immutable. If this conversion is invalid the call will not compile. |
Dynamic array properties are:
Property | Description |
---|---|
.init | Returns null. |
.sizeof | Returns the size of the dynamic array reference, which is 8 in 32-bit builds and 16 on 64-bit builds. |
.length | Get/set number of elements in the array. It is of type size_t. |
.ptr | Returns a pointer to the first element of the array. |
.dup | Create a dynamic array of the same size and copy the contents of the array into it. The copy will have any immutability or const stripped. If this conversion is invalid the call will not compile. |
.idup | Create a dynamic array of the same size and copy the contents of the array into it. The copy is typed as being immutable. If this conversion is invalid the call will not compile. |
Examples:
int* p; int[3] s; int[] a; p.length; // error, length not known for pointer s.length; // compile time constant 3 a.length; // runtime value p.dup; // error, length not known s.dup; // creates an array of 3 elements, copies // elements s into it a.dup; // creates an array of a.length elements, copies // elements of a into it
Setting Dynamic Array Length
The .length property of a dynamic array can be set as the left-hand side of an = operator:
array.length = 7;
This causes the array to be reallocated in place, and the existing contents copied over to the new array. If the new array length is shorter, the array is not reallocated, and no data is copied. It is equivalent to slicing the array:
array = array[0..7];
If the new array length is longer, the remainder is filled out with the default initializer.
To maximize efficiency, the runtime always tries to resize the array in place to avoid extra copying. It will always do a copy if the new size is larger and the array was not allocated via the new operator or resizing in place would overwrite valid data in the array. For example:
char[] a = new char[20]; char[] b = a[0..10]; char[] c = a[10..20]; char[] d = a; b.length = 15; // always reallocates because extending in place would // overwrite other data in a. b[11] = 'x'; // a[11] and c[1] are not affected d.length = 1; d.length = 20; // also reallocates, because doing this will overwrite a and // c c.length = 12; // may reallocate in place if space allows, because nothing // was allocated after c. c[5] = 'y'; // may affect contents of a, but not b or d because those // were reallocated. a.length = 25; // This always reallocates because if c extended in place, // then extending a would overwrite c. If c didn't // reallocate in place, it means there was not enough space, // which will still be true for a. a[15] = 'z'; // does not affect c, because either a or c has reallocated.
To guarantee copying behavior, use the .dup property to ensure a unique array that can be resized. Also, one may use the .capacity property to determine how many elements can be appended to the array without reallocating.
These issues also apply to appending arrays with the ~= operator. Concatenation using the ~ operator is not affected since it always reallocates.
Resizing a dynamic array is a relatively expensive operation. So, while the following method of filling an array:
int[] array; while (1) { import core.stdc.stdio : getchar; auto c = getchar; if (!c) break; ++array.length; array[array.length - 1] = c; }
int[] array; array.length = 100; // guess int i; for (i = 0; ; i++) { import core.stdc.stdio : getchar; auto c = getchar; if (!c) break; if (i == array.length) array.length *= 2; array[i] = c; } array.length = i;
Picking a good initial guess is an art, but you usually can pick a value covering 99% of the cases. For example, when gathering user input from the console - it's unlikely to be longer than 80.
Also, you may wish to utilize the reserve function to pre-allocate array data to use with the append operator.
int[] array; size_t cap = array.reserve(10); // request array ~= [1, 2, 3, 4, 5]; assert(cap >= 10); // allocated may be more than request assert(cap == array.capacity);
Functions as Array Properties
See Uniform Function Call Syntax (UFCS).
Array Bounds Checking
It is an error to index an array with an index that is less than 0 or greater than or equal to the array length. If an index is out of bounds, a RangeError exception is raised if detected at runtime, and an error if detected at compile time. A program may not rely on array bounds checking happening, for example, the following program is incorrect:
import core.exception; try { auto array = [1, 2]; for (auto i = 0; ; i++) { array[i] = 5; } } catch (RangeError) { // terminate loop }
auto array = [1, 2]; for (auto i = 0; i < array.length; i++) { array[i] = 5; }
Implementation Note: Compilers should attempt to detect array bounds errors at compile time, for example:
int[3] foo; int x = foo[3]; // error, out of bounds
Insertion of array bounds checking code at runtime should be turned on and off with a compile time switch.
An out of bounds memory access will cause undefined behavior, therefore array bounds check is normally enabled in @safe functions. The runtime behavior is part of the language semantics.
See also Safe Functions.
Disabling Array Bounds Checking
Insertion of array bounds checking code at runtime may be turned off with a compiler switch -boundscheck.
If the bounds check in @system or @trusted code is disabled, the code correctness must still be guaranteed by the code author.
On the other hand, disabling the bounds check in @safe code will break the guaranteed memory safety by compiler. It's not recommended unless motivated by speed measurements.
Array Initialization
Default Initialization
- Pointers are initialized to null.
- Static array contents are initialized to the default initializer for the array element type.
- Dynamic arrays are initialized to having 0 elements.
- Associative arrays are initialized to having 0 elements.
Void Initialization
Void initialization happens when the Initializer for an array is void. What it means is that no initialization is done, i.e. the contents of the array will be undefined. This is most useful as an efficiency optimization. Void initializations are an advanced technique and should only be used when profiling indicates that it matters.
Static Initialization of Statically Allocated Arrays
Static initalizations are supplied by a list of array element values enclosed in [ ]. The values can be optionally preceded by an index and a :. If an index is not supplied, it is set to the previous index plus 1, or 0 if it is the first value.
int[3] a = [ 1:2, 3 ]; // a[0] = 0, a[1] = 2, a[2] = 3
This is most handy when the array indices are given by enums:
enum Color { red, blue, green }; int value[Color.max + 1] = [ Color.blue :6, Color.green:2, Color.red :5 ];
These arrays are statically allocated when they appear in global scope. Otherwise, they need to be marked with const or static storage classes to make them statically allocated arrays.
Special Array Types
Strings
A string is an array of characters. String literals are just an easy way to write character arrays. String literals are immutable (read only).
char[] str1 = "abc"; // error, "abc" is not mutable char[] str2 = "abc".dup; // ok, make mutable copy immutable(char)[] str3 = "abc"; // ok immutable(char)[] str4 = str1; // error, str4 is not mutable immutable(char)[] str5 = str1.idup; // ok, make immutable copy
The name string is aliased to immutable(char)[], so the above declarations could be equivalently written as:
char[] str1 = "abc"; // error, "abc" is not mutable char[] str2 = "abc".dup; // ok, make mutable copy string str3 = "abc"; // ok string str4 = str1; // error, str4 is not mutable string str5 = str1.idup; // ok, make immutable copy
char[] strings are in UTF-8 format. wchar[] strings are in UTF-16 format. dchar[] strings are in UTF-32 format.
Strings can be copied, compared, concatenated, and appended:
str1 = str2;
if (str1 < str3) { ... }
func(str3 ~ str4);
str4 ~= str1;
with the obvious semantics. Any generated temporaries get cleaned up by the garbage collector (or by using alloca()). Not only that, this works with any array not just a special String array.
A pointer to a char can be generated:
char* p = &str[3]; // pointer to 4th element char* p = str; // pointer to 1st element
Since strings, however, are not 0 terminated in D, when transferring a pointer to a string to C, add a terminating 0:
str ~= "\0";
or use the function std.string.toStringz.
The type of a string is determined by the semantic phase of compilation. The type is one of: char[], wchar[], dchar[], and is determined by implicit conversion rules. If there are two equally applicable implicit conversions, the result is an error. To disambiguate these cases, a cast or a postfix of c, w or d can be used:
cast(immutable(wchar) [])"abc" // this is an array of wchar characters "abc"w // so is this
String literals that do not have a postfix character and that have not been cast can be implicitly converted between string, wstring, and dstring as necessary.
char c; wchar w; dchar d; c = 'b'; // c is assigned the character 'b' w = 'b'; // w is assigned the wchar character 'b' //w = 'bc'; // error - only one wchar character at a time w = "b"[0]; // w is assigned the wchar character 'b' w = "\r"[0]; // w is assigned the carriage return wchar character d = 'd'; // d is assigned the character 'd'
Strings and Unicode
Note that built-in comparison operators operate on a code unit basis. The end result for valid strings is the same as that of code point for code point comparison as long as both strings are in the same normalization form. Since normalization is a costly operation not suitable for language primitives it's assumed to be enforced by the user.
The standard library lends a hand for comparing strings with mixed encodings (by transparently decoding, see std.algorithm.cmp), case-insensitive comparison and normalization.
Last but not least, a desired string sorting order differs by culture and language and is usually nothing like code point for code point comparison. The natural order of strings is obtained by applying the Unicode collation algorithm that should be implemented in the standard library.
C's printf() and Strings
printf() is a C function and is not part of D. printf() will print C strings, which are 0 terminated. There are two ways to use printf() with D strings. The first is to add a terminating 0, and cast the result to a char*:
str ~= "\0"; printf("the string is '%s'\n", str.ptr);or:
import std.string; printf("the string is '%s'\n", std.string.toStringz(str));
String literals already have a 0 appended to them, so can be used directly:
printf("the string is '%s'\n", "string literal".ptr);
So, why does the first string literal to printf not need the .ptr? The first parameter is prototyped as a const(char)*, and a string literal can be implicitly cast to a const(char)*. The rest of the arguments to printf, however, are variadic (specified by ...), and a string literal typed immutable(char)[] cannot pass to variadic parameters.
The second way is to use the precision specifier. The length comes first, followed by the pointer:
printf("the string is '%.*s'\n", str.length, str.ptr);
The best way is to use std.stdio.writefln, which can handle D strings:
import std.stdio; writefln("the string is '%s'", str);
Void Arrays
There is a special type of array which acts as a wildcard that can hold arrays of any kind, declared as void[]. Void arrays are used for low-level operations where some kind of array data is being handled, but the exact type of the array elements are unimportant. The .length of a void array is the length of the data in bytes, rather than the number of elements in its original type. Array indices in indexing and slicing operations are interpreted as byte indices.
Arrays of any type can be implicitly converted to a void array; the compiler inserts the appropriate calculations so that the .length of the resulting array's size is in bytes rather than number of elements. Void arrays cannot be converted back to the original type without using a cast, and it is an error to convert to an array type whose element size does not evenly divide the length of the void array.
void main() { int[] data1 = [1,2,3]; long[] data2; void[] arr = data1; // OK, int[] implicit converts to void[]. assert(data1.length == 3); assert(arr.length == 12); // length is implicitly converted to bytes. //data1 = arr; // Illegal: void[] does not implicitly // convert to int[]. int[] data3 = cast(int[]) arr; // OK, can convert with explicit cast. data2 = cast(long[]) arr; // Runtime error: long.sizeof == 8, which // does not divide arr.length, which is 12 // bytes. }
Void arrays can also be static if their length is known at compile-time. The length is specified in bytes:
void main() { byte[2] x; int[2] y; void[2] a = x; // OK, lengths match void[2] b = y; // Error: int[2] is 8 bytes long, doesn't fit in 2 bytes. }
While it may seem that void arrays are just fancy syntax for ubyte[], there is a subtle distinction. The garbage collector generally will not scan ubyte[] arrays for pointers, ubyte[] being presumed to contain only pure byte data, not pointers. However, it will scan void[] arrays for pointers, since such an array may have been implicitly converted from an array of pointers or an array of elements that contain pointers. Allocating an array that contains pointers as ubyte[] may run the risk of the GC collecting live memory if these pointers are the only remaining references to their targets.
Implicit Conversions
A pointer T* can be implicitly converted to one of the following:
- void*
A static array T[dim] can be implicitly converted to one of the following:
- T[]
- const(U)[]
- const(U[])
- void[]
A dynamic array T[] can be implicitly converted to one of the following (U is a base class of T):
- const(U)[]
- const(U[])
- void[]