- Removing Keys
- Testing Membership
- Using Classes as the KeyType
- Using Structs or Unions as the KeyType
- Construction or Assignment on Setting AA Entries
- Inserting if not present
- Advanced updating
- Static Initialization of AAs
- Runtime Initialization of Immutable AAs
- Construction and Reference Semantics
- Properties
Associative Arrays
Associative arrays have an index that is not necessarily an integer, and can be sparsely populated. The index for an associative array is called the key, and its type is called the KeyType.
Associative arrays are declared by placing the KeyType within the [ ] of an array declaration:
int[string] aa; // Associative array of ints that are // indexed by string keys. // The KeyType is string. aa["hello"] = 3; // set value associated with key "hello" to 3 int value = aa["hello"]; // lookup value from a key assert(value == 3);
Neither the KeyTypes nor the element types of an associative array can be function types or void.
Removing Keys
Particular keys in an associative array can be removed with the remove function:
aa.remove("hello");
remove(key) does nothing if the given key does not exist and returns false. If the given key does exist, it removes it from the AA and returns true.
All keys can be removed by using the method clear.
Testing Membership
The InExpression yields a pointer to the value if the key is in the associative array, or null if not:
int* p; p = ("hello" in aa); if (p !is null) { *p = 4; // update value associated with key assert(aa["hello"] == 4); }
Using Classes as the KeyType
Classes can be used as the KeyType. For this to work, the class definition must override the following member functions of class Object:
- size_t toHash() @trusted nothrow
- bool opEquals(Object)
Note that the parameter to opEquals is of type Object, not the type of the class in which it is defined.
For example:
class Foo { int a, b; override size_t toHash() { return a + b; } override bool opEquals(Object o) { Foo foo = cast(Foo) o; return foo && a == foo.a && b == foo.b; } }
- If toHash must consistently be the same value when opEquals returns true. In other words, two objects that are considered equal should always have the same hash value. Otherwise, undefined behavior will result.
- Use the attributes @safe, @nogc, pure, const, and scope as much as possible on the toHash and opEquals overrides.
Using Structs or Unions as the KeyType
If the KeyType is a struct or union type, a default mechanism is used to compute the hash and comparisons of it based on the binary data within the struct value. A custom mechanism can be used by providing the following functions as struct members:
size_t toHash() const @safe pure nothrow; bool opEquals(ref const typeof(this) s) const @safe pure nothrow;
For example:
import std.string; struct MyString { string str; size_t toHash() const @safe pure nothrow { size_t hash; foreach (char c; str) hash = (hash * 9) + c; return hash; } bool opEquals(ref const MyString s) const @safe pure nothrow { return std.string.cmp(this.str, s.str) == 0; } }
The functions can use @trusted instead of @safe.
- If toHash must consistently be the same value when opEquals returns true. In other words, two structs that are considered equal should always have the same hash value. Otherwise, undefined behavior will result.
- Use the attributes @nogc as much as possible on the toHash and opEquals overrides.
Construction or Assignment on Setting AA Entries
When an AA indexing access appears on the left side of an assignment operator, it is specially handled for setting an AA entry associated with the key.
string[int] aa; string s; s = aa[1]; // throws RangeError in runtime aa[1] = "hello"; // handled for setting AA entry s = aa[1]; // succeeds to lookup assert(s == "hello");
If the assigned value type is equivalent with the AA element type:
- If the indexing key does not yet exist in AA, a new AA entry will be allocated, and it will be initialized with the assigned value.
- If the indexing key already exists in the AA, the setting runs normal assignment.
struct S { int val; void opAssign(S rhs) { this.val = rhs.val * 2; } } S[int] aa; aa[1] = S(10); // first setting initializes the entry aa[1] assert(aa[1].val == 10); aa[1] = S(10); // second setting invokes normal assignment, and // operator-overloading rewrites it to member opAssign function. assert(aa[1].val == 20);
If the assigned value type is not equivalent with the AA element type, the expression could invoke operator overloading with normal indexing access:
struct S { int val; void opAssign(int v) { this.val = v * 2; } } S[int] aa; aa[1] = 10; // is rewritten to: aa[1].opAssign(10), and // throws RangeError before opAssign is called
However, if the AA element type is a struct which supports an implicit constructor call from the assigned value, implicit construction is used for setting the AA entry:
struct S { int val; this(int v) { this.val = v; } void opAssign(int v) { this.val = v * 2; } } S s = 1; // OK, rewritten to: S s = S(1); s = 1; // OK, rewritten to: s.opAssign(1); S[int] aa; aa[1] = 10; // first setting is rewritten to: aa[1] = S(10); assert(aa[1].val == 10); aa[1] = 10; // second setting is rewritten to: aa[1].opAssign(10); assert(aa[1].val == 20);This is designed for efficient memory reuse with some value-semantics structs, eg. std.bigint.BigInt.
import std.bigint; BigInt[string] aa; aa["a"] = 10; // construct BigInt(10) and move it in AA aa["a"] = 20; // call aa["a"].opAssign(20)
Inserting if not present
When AA access requires that there must be a value corresponding to the key, a value must be constructed and inserted if not present. The require function provides a means to construct a new value via a lazy argument. The lazy argument is evaluated when the key is not present. The require operation avoids the need to perform multiple key lookups.
class C{} C[string] aa; auto a = aa.require("a", new C); // lookup "a", construct if not present
Sometimes it is necessary to know whether the value was constructed or already exists. The require function doesn't provide a boolean parameter to indicate whether the value was constructed but instead allows the construction via a function or delegate. This allows the use of any mechanism as demonstrated below.
class C{} C[string] aa; bool constructed; auto a = aa.require("a", { constructed=true; return new C;}()); assert(constructed == true); C newc; auto b = aa.require("b", { newc = new C; return newc;}()); assert(b is newc);
Advanced updating
Typically updating a value in an associative array is simply done with an assign statement.
int[string] aa; aa["a"] = 3; // set value associated with key "a" to 3
Sometimes it is necessary to perform different operations depending on whether a value already exists or needs to be constructed. The update function provides a means to construct a new value via the create delegate or update an existing value via the update delegate. The update operation avoids the need to perform multiple key lookups.
class C{} C[string] aa; C older; C newer; aa.update("a", { newer = new C; return newer; }, (ref C c) { older = c; newer = new C; return newer; });
Static Initialization of AAs
NOTE: Not yet implemented.immutable long[string] aa = [ "foo": 5, "bar": 10, "baz": 2000 ]; unittest { assert(aa["foo"] == 5); assert(aa["bar"] == 10); assert(aa["baz"] == 2000); }
Runtime Initialization of Immutable AAs
Immutable associative arrays are often desirable, but sometimes initialization must be done at runtime. This can be achieved with a constructor (static constructor depending on scope), a buffer associative array and assumeUnique:
immutable long[string] aa; static this() { import std.exception : assumeUnique; import std.conv : to; long[string] temp; // mutable buffer foreach(i; 0 .. 10) { temp[to!string(i)] = i; } temp.rehash; // for faster lookups aa = assumeUnique(temp); } unittest { assert(aa["1"] == 1); assert(aa["5"] == 5); assert(aa["9"] == 9); }
Construction and Reference Semantics
An Associative Array defaults to null, and is constructed upon assigning the first key/value pair. However, once constructed, an associative array has reference semantics, meaning that assigning one array to another does not copy the data. This is especially important when attempting to create multiple references to the same array.
int[int] aa; // defaults to null int[int] aa2 = aa; // copies the null reference aa[1] = 1; assert(aa2.length == 0); // aa2 still is null aa2 = aa; aa2[2] = 2; assert(aa[2] == 2); // now both refer to the same instance
Properties
Properties for associative arrays are:
Property | Description |
---|---|
.sizeof | Returns the size of the reference to the associative array; it is 4 in 32-bit builds and 8 on 64-bit builds. |
.length | Returns number of values in the associative array. Unlike for dynamic arrays, it is read-only. |
.dup | Create a new associative array of the same size and copy the contents of the associative array into it. |
.keys | Returns dynamic array, the elements of which are the keys in the associative array. |
.values | Returns dynamic array, the elements of which are the values in the associative array. |
.rehash | Reorganizes the associative array in place so that lookups are more efficient. rehash is effective when, for example, the program is done loading up a symbol table and now needs fast lookups in it. Returns a reference to the reorganized array. |
.clear | Removes all remaining keys and values from an associative array. The array is not rehashed after removal, to allow for the existing storage to be reused. This will affect all references to the same instance and is not equivalent to destroy(aa) which only sets the current reference to null |
.byKey() | Returns a forward range suitable for use as a ForeachAggregate to a ForeachStatement which will iterate over the keys of the associative array. |
.byValue() | Returns a forward range suitable for use as a ForeachAggregate to a ForeachStatement which will iterate over the values of the associative array. |
.byKeyValue() | Returns a forward range suitable for use as a ForeachAggregate to a ForeachStatement which will iterate over key-value pairs of the associative array. The returned pairs are represented by an opaque type with .key and .value properties for accessing the key and value of the pair, respectively. Note that this is a low-level interface to iterating over the associative array and is not compatible with the Tuple type in Phobos. For compatibility with Tuple, use std.array.byPair instead. |
.get(Key key, lazy Value defVal) | Looks up key; if it exists returns corresponding value else evaluates and returns defVal. |
.require(Key key, lazy Value value) | Looks up key; if it exists returns corresponding value else evaluates value, adds it to the associative array and returns it. |
.update(Key key, Value delegate() create, Value delegate(Value) update) | Looks up key; if it exists applies the update delegate else evaluates the create delegate and adds it to the associative array |
Associative Array Example: word count
Let's consider the file is ASCII encoded with LF EOL. In general case we should use dchar c for iteration over code points and functions from std.uni.
import std.file; // D file I/O import std.stdio; import std.ascii; void main (string[] args) { ulong totalWords, totalLines, totalChars; ulong[string] dictionary; writeln(" lines words bytes file"); foreach (arg; args[1 .. $]) // for each argument except the first one { ulong wordCount, lineCount, charCount; foreach(line; File(arg).byLine()) { bool inWord; size_t wordStart; void tryFinishWord(size_t wordEnd) { if (inWord) { auto word = line[wordStart .. wordEnd]; ++dictionary[word.idup]; // increment count for word inWord = false; } } foreach (i, char c; line) { if (std.ascii.isDigit(c)) { // c is a digit (0..9) } else if (std.ascii.isAlpha(c)) { // c is an ASCII letter (A..Z, a..z) if (!inWord) { wordStart = i; inWord = true; ++wordCount; } } else tryFinishWord(i); ++charCount; } tryFinishWord(line.length); ++lineCount; } writefln("%8s%8s%8s %s", lineCount, wordCount, charCount, arg); totalWords += wordCount; totalLines += lineCount; totalChars += charCount; } if (args.length > 2) { writefln("-------------------------------------\n%8s%8s%8s total", totalLines, totalWords, totalChars); } writeln("-------------------------------------"); foreach (word; dictionary.keys.sort) { writefln("%3s %s", dictionary[word], word); } }
Associative Array Example: counting Tuples
An Associative Array can be iterated in key/value fashion using a foreach statement. As an example, the number of occurrences of all possible substrings of length 2 (aka 2-mers) in a string will be counted:
import std.range : dropOne, only, save, zip; import std.stdio : writefln; import std.typecons : Tuple; import std.utf : byCodeUnit; // avoids UTF-8 auto-decoding int[Tuple!(immutable char, immutable char)] aa; // The string `arr` has a limited alphabet: {A, C, G, T} // Thus, for better performance, iteration can be done _without_ decoding auto arr = "AGATAGA".byCodeUnit; // iterate over all pairs in the string and observe each pair // ('A', 'G'), ('G', 'A'), ('A', 'T'), ... foreach (window; arr.zip(arr.save.dropOne)) aa[window]++; // iterate over all key/value pairs of the Associative Array foreach (key, value; aa) { // the second parameter uses tuple-expansion writefln("key: %s [%s], value: %d", key, key.expand.only, value); }
% rdmd count.d key: Tuple!(immutable(char), immutable(char))('A', 'G') [AG], value: 2 key: Tuple!(immutable(char), immutable(char))('G', 'A') [GA], value: 2 key: Tuple!(immutable(char), immutable(char))('A', 'T') [AT], value: 1 key: Tuple!(immutable(char), immutable(char))('T', 'A') [TA], value: 1