D 2.0 FAQ
The same questions keep cropping up, so the obvious thing to do is prepare a FAQ.
D 2.0 FAQ
- Where is my simple language?
- What other cool features are in the plans for D 2.0?
- I suggested a great feature. Why was the suggestion ignored?
- Why const and immutable?
- Why the names const and immutable?
- How exactly is immutable related to multicores?
- Ok, I'm fine with immutable for safe data sharing among threads. But why do we need the uninformative const?
- Why are immutable strings favored in D 2.0?
- I want to contribute to D 2.0. How can I effect that?
- Why doesn't the case range statement use the case X..Y: syntax?
- What guarantees is shared supposed to provide?
- What does shared have to do with synchronization?
- What does shared have to do with memory barriers?
- What are the semantics of casting FROM unshared TO shared?
- What are the semantics of casting FROM shared TO unshared?
- Why does a large static array bloat my executable file size?
General D FAQ
- The D wiki FAQ page with many more questions answered
- What does D have that C++ doesn't?
- Why the name D?
- Could you change the name? D is kind of hard to search for on search engines.
- Where can I get a D compiler?
- Is there a linux port of D?
- Is there a GNU version of D?
- How do I write my own D compiler for CPU X?
- Where can I get a GUI library for D?
- Where can I get an IDE for D?
- Why is [expletive deleted] printf left in?
- Is D open source?
- Why does the standard library use the boost license? Why not public domain?
- Why no fall through on switch statements?
- Why should I use D instead of Java?
- Doesn't C++ support strings, etc. with STL?
- Can't garbage collection be done in C++ with an add-on library?
- Can't unit testing be done in C++ with an add-on library?
- Why have an asm statement in a portable language?
- What is the point of 80 bit reals?
- How do I do anonymous struct/unions in D?
- How do I get printf() to work with strings?
- Why are floating point values default initialized to NaN rather than 0?
- Why is overloading of the assignment operator not supported?
- The ‘~’ is not on my keyboard?
- Can I link in C object files created with another compiler?
- Why not support regular expression literals with the /foo/g syntax?
- Why aren't all Digital Mars programs translated to D?
- When should I use a foreach loop rather than a for?
- Why doesn't D have an interface to C++ as well as C?
- Why doesn't D use reference counting for garbage collection?
- Isn't garbage collection slow and non-deterministic?
- Can't a sufficiently smart compiler figure out that a function is pure automatically?
- Why allow cast(float) if it isn't supposed to work?
- Why can't nested functions be forward referenced?
Why doesn't the case range statement use the case X..Y: syntax?
See the case range statement.
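The usages of .. would then be:
- case X..Y:
- foreach(e; X..Y)
- array[X..Y]
The first would include Y, as a case range does, while the other two exclude Y. A construct with a different meaning is better served by a visibly different syntax, hence case X: .. case Y:.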
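The difference in bounds can be seen in a small sketch. The helper names classify and countRange are made up for illustration and are not part of the FAQ; the point is only that the case range includes its upper bound while foreach ranges and slices do not.

```d
import std.stdio : writeln;

// Hypothetical helper, not from the FAQ: classifies a character.
string classify(char c)
{
    string kind;
    switch (c)
    {
        case '0': .. case '9':   // case range: '9' is included
            kind = "digit";
            break;
        case 'a': .. case 'z':   // 'z' is included as well
            kind = "lower";
            break;
        default:
            kind = "other";
            break;
    }
    return kind;
}

// Hypothetical helper: counts the iterations of foreach over lo .. hi.
int countRange(int lo, int hi)
{
    int n;
    foreach (i; lo .. hi)        // hi is excluded
        ++n;
    return n;
}

void main()
{
    int[] arr = [10, 20, 30, 40];
    writeln(classify('9'));      // the upper bound of the case range matches
    writeln(countRange(0, 3));   // visits 0, 1 and 2
    writeln(arr[1 .. 3]);        // elements at index 1 and 2 only
}
```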
What guarantees is shared supposed to provide?
Shared means that multiple threads can access the data. The guarantee is that data which is neither shared nor immutable can be seen only by the current thread.
What does shared have to do with synchronization?
Only shared data can be synchronized. It makes no sense to synchronize thread local data.
What does shared have to do with memory barriers?
Currently the compiler does not insert memory barriers around shared variables.
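As a rough illustration of the three answers above, the sketch below contrasts a thread local variable with a shared one. The names localCounter, sharedCounter, worker and runWorkers are invented for this example. Because no barriers are inserted automatically, the update to the shared variable is requested explicitly through core.atomic.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;

int localCounter;          // thread local by default: each thread has its own copy
shared int sharedCounter;  // a single instance visible to every thread

void worker()
{
    ++localCounter;                   // changes only the worker thread's own copy
    atomicOp!"+="(sharedCounter, 1);  // atomic read-modify-write, requested explicitly
}

// Hypothetical helper: runs n worker threads and returns the shared total.
int runWorkers(int n)
{
    atomicStore(sharedCounter, 0);
    Thread[] threads;
    foreach (i; 0 .. n)
    {
        auto t = new Thread(&worker);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return atomicLoad(sharedCounter);
}

void main()
{
    assert(runWorkers(4) == 4);  // every thread updated the same variable
    assert(localCounter == 0);   // the main thread's copy was never touched
}
```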
What are the semantics of casting FROM unshared TO shared?
Make sure there are no other unshared references to that same data.
What are the semantics of casting FROM shared TO unshared?
Make sure there are no other shared references to that same data.
Why does a large static array bloat my executable file size?
Given the declaration:
char[1024 * 1024] arr;
the executable size increases by a megabyte. In C, this would not happen, as arr would be stored in the BSS segment. In D, arr is not stored in the BSS segment because:
- The char type is initialized to 0xFF, not 0. Non-zero data cannot be placed in BSS.
- Statically allocated data is placed in thread local storage. The BSS segment is not thread local, and there is no thread local equivalent of BSS.
The following will be placed in BSS:
__gshared byte[1024 * 1024] arr;
as bytes are 0 initialized and __gshared puts it in the global data.
There are similar issues for float, double, and real static arrays. They are initialized to NaN (Not A Number) values, not 0.
The easiest way to deal with this issue is to allocate the array dynamically at run time rather than statically allocate it.
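One way to do the run time allocation is sketched below, assuming a module constructor is an acceptable place for it. The name bufferSize is made up for the example. Only the slice (a pointer and a length) is stored statically, so the executable does not carry the megabyte of initializer data.

```d
import std.stdio : writeln;

enum size_t bufferSize = 1024 * 1024;  // hypothetical name for the array size

char[] arr;  // only the slice itself (pointer + length) is stored statically

// Module constructor: runs at start-up for each thread, since arr is thread local.
static this()
{
    arr = new char[bufferSize];  // elements are set to char.init (0xFF) at run time
}

void main()
{
    writeln(arr.length);
}
```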
Why the name D?
The original name was the Mars Programming Language. But my friends kept calling it D, and I found myself starting to call it D. The idea of D being a successor to C goes back at least as far as 1988, as in this thread.
Could you change the name? D is kind of hard to search for on search engines.
No. We understand it can be frustrating but it's far too late for a name change at this point. We recommend using "dlang", "d programming", "d language", or "d programming language" in your search terms. Doing so should yield substantially better search results.
Most publicly available D code has "// Written in the D programming language" as its first comment.
Is there a linux port of D?
Yes, the D compiler includes a linux version.
Is there a GNU version of D?
Yes, gdc - the D frontend with GCC.
How do I write my own D compiler for CPU X?
Burton Radons has written a back end. You can use it as a guide.
Where can I get a GUI library for D?
Since D can call C functions, any GUI library with a C interface is accessible from D. Various D GUI libraries and ports can be found at the D wiki.
Where can I get an IDE for D?
Lists of editors and IDEs that support D can be found on the D wiki.
Why is printf in D?
printf is not part of D, it is part of C's standard runtime library which is accessible from D. D's standard runtime library has std.stdio.writefln which is as powerful as printf but is much easier to use.
Is D open source?
The dmd D compiler and the runtime library are completely open source using the Boost License 1.0. All development takes place publicly on github. There are also the gdc and ldc D compilers, which pair the DMD front end with the GCC and LLVM back ends, respectively.
Why does the standard library use the boost license? Why not public domain?
Although most jurisdictions use the concept of Public Domain, some (eg, Japan) do not. The Boost License avoids this problem. It was chosen because, unlike almost all other open source licenses, it does not demand that the license text be included on distributions in binary form.
Why no fall through on switch statements?
Many people have asked for a requirement that there be a break between cases in a switch statement, arguing that C's behavior of silently falling through is the cause of many bugs.
In D2, implicit fall through is disallowed. You have to add a goto case; statement to explicitly state the intention of falling through.
There was a further request that the break statement be made implicit. D doesn't make this change for the same reason that integral promotion rules and operator precedence rules were kept the same - to make code that looks the same as in C operate the same. If it had subtly different semantics, it would cause frustratingly subtle bugs.
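The explicit fall through required in D2 can be sketched as follows. The function score and its values are invented for illustration; the relevant part is the goto case; statement, which jumps to the next case label.

```d
import std.stdio : writeln;

// Hypothetical example: level 2 earns its own points plus those of level 1.
int score(int level)
{
    int total;
    switch (level)
    {
        case 2:
            total += 10;
            goto case;   // explicit fall through into case 1
        case 1:
            total += 1;
            break;       // break is still written out, as in C
        default:
            break;
    }
    return total;
}

void main()
{
    writeln(score(2));
    writeln(score(1));
}
```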
Why should I use D instead of Java?
D is distinct from Java in purpose, philosophy and reality. See this comparison.
Java is designed to be write once, run everywhere. D is designed for writing efficient native system apps. Although D and Java share the notion that garbage collection is good and multiple inheritance is bad, their different design goals mean the languages have very different feels.
Doesn't C++ support strings, etc. with STL?
In the C++ standard library are mechanisms for doing strings, dynamic arrays, associative arrays, and bounds-checked arrays.
Sure, all this stuff can be done with libraries, following certain coding disciplines, etc. But object oriented programming can also be done in C (it's been done). Isn't it incongruous that something like strings, supported by the simplest BASIC interpreter, requires a very large and complicated infrastructure to support? Just the implementation of a string type in STL is over two thousand lines of code, using every advanced feature of templates. How much confidence can you have that this is all working correctly, how do you fix it if it is not, what do you do with the notoriously inscrutable error messages when there's an error using it, how can you be sure you are using it correctly (so there are no memory leaks, etc.)?
D's implementation of strings is simple and straightforward. There's little doubt about how to use it, no worries about memory leaks, error messages are to the point, and it isn't hard to see if it is working as expected or not.
Can't garbage collection be done in C++ with an add-on library?
Yes, I use one myself. It isn't part of the language, though, and requires some subverting of the language to make it work. Using gc with C++ isn't for the standard or casual C++ programmer. Building it into the language, like in D, makes it practical for everyday programming chores.
GC isn't that hard to implement, either, unless you're building one of the more advanced ones. But a more advanced one is like building a better optimizer - the language still works 100% correctly even with a simple, basic one. The programming community is better served by multiple implementations competing on quality of code generated rather than by which corners of the spec are implemented at all.
Can't unit testing be done in C++ with an add-on library?
Sure. Try one out and then compare it with how D does it. It'll be quickly obvious what an improvement building it into the language is.
Why have an asm statement in a portable language?
An asm statement allows assembly code to be inserted directly into a D function. Assembler code will obviously be inherently non-portable. D is intended, however, to be a useful language for developing systems apps. Systems apps almost invariably wind up with system dependent code in them anyway, inline asm isn't much different. Inline asm will be useful for things like accessing special CPU instructions, accessing flag bits, special computational situations, and super optimizing a piece of code.
Before the C compiler had an inline assembler, I used external assemblers. There was constant grief because many, many different versions of the assembler were out there, the vendors kept changing the syntax of the assemblers, there were many different bugs in different versions, and even the command line syntax kept changing. What it all meant was that users could not reliably rebuild any code that needed assembler. An inline assembler provided reliability and consistency.
What is the point of 80 bit reals?
More precision enables more accurate floating point computations to be done, especially when adding together large numbers of small real numbers. Prof. Kahan, who designed the Intel floating point unit, has an eloquent paper on the subject.
How do I do anonymous struct/unions in D?
import std.stdio;

struct Foo
{
    union { int a; int b; }
    struct { int c; int d; }
}

void main()
{
    writefln(
        "Foo.sizeof = %d, a.offset = %d, b.offset = %d, c.offset = %d, d.offset = %d",
        Foo.sizeof,
        Foo.a.offsetof,
        Foo.b.offsetof,
        Foo.c.offsetof,
        Foo.d.offsetof);
}
How do I get printf() to work with strings?
In C, the normal way to printf a string is to use the %s format:
char s[8];
strcpy(s, "foo");
printf("string = '%s'\n", s);
Attempting this in D, as in:
char[] s;
s = "foo";
printf("string = '%s'\n", s);
usually results in garbage being printed, or an access violation. The cause is that in C, strings are terminated by a 0 character. The %s format prints until a 0 is encountered. In D, strings are not 0 terminated; the size is determined by a separate length value. So, strings are printf'd using the %.*s format, passing the length (as an int) followed by the pointer:
string s = "foo";
printf("string = '%.*s'\n", cast(int) s.length, s.ptr);
which will behave as expected. Remember, though, that printf's %.*s will print until the length is reached or a 0 is encountered, so D strings with embedded 0's will only print up to the first 0.
Of course, the easier solution is just use std.stdio.writefln which works correctly with D strings.
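A minimal sketch of the writefln route follows. The helper describe is made up for the example and uses std.format.format so the result can be inspected; both functions take the length from the slice, so an embedded 0 does not cut the output short.

```d
import std.format : format;
import std.stdio : writefln;

// Hypothetical helper: builds the same text that writefln would print.
string describe(const(char)[] s)
{
    return format("string = '%s'", s);
}

void main()
{
    char[] s = "foo".dup;          // a mutable D string, not 0 terminated
    writefln("string = '%s'", s);  // the length comes from the slice itself
    assert(describe("a\0b") == "string = 'a\0b'");  // the embedded 0 is kept
}
```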
Why are floating point values default initialized to NaN rather than 0?
A floating point value, if no explicit initializer is given, is initialized to NaN (Not A Number):
double d; // d is set to double.nan
NaNs have the interesting property in that whenever a NaN is used as an operand in a computation, the result is a NaN. Therefore, NaNs will propagate and appear in the output whenever a computation made use of one. This implies that a NaN appearing in the output is an unambiguous indication of the use of an uninitialized variable.
If 0.0 were used as the default initializer for floating point values, its effect could easily go unnoticed in the output, and so if the default initializer was unintended, the bug may go unrecognized.
The default initializer value is not meant to be a useful value; it is meant to expose bugs. NaN fills that role well.
But surely the compiler can detect and issue an error message for variables used that are not initialized? Most of the time, it can, but not always, and what it can do is dependent on the sophistication of the compiler's internal data flow analysis. Hence, relying on such is unportable and unreliable.
Because of the way CPUs are designed, there is no NaN value for integers, so D uses 0 instead. It doesn't have the advantages of error detection that NaN has, but at least errors resulting from unintended default initializations will be consistent and therefore more debuggable.
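The propagation described above can be sketched with a small example. The function average is made up for illustration; isNaN comes from std.math.

```d
import std.math : isNaN;
import std.stdio : writeln;

// Hypothetical computation that consumes its inputs.
double average(double a, double b)
{
    return (a + b) / 2;
}

void main()
{
    double d;                        // no initializer: d is double.nan
    int i;                           // no initializer: i is 0

    assert(isNaN(d));
    assert(isNaN(average(d, 1.0)));  // the NaN propagates into the result
    assert(i == 0);                  // integers have no NaN, so 0 is used

    writeln(average(d, 1.0));        // the uninitialized input shows up in the output
}
```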
Why is overloading of the assignment operator not supported?
Overloading of the assignment operator for structs is supported in D 2.0.
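As an illustration, a struct can define opAssign to accept assignment from another type. The struct Celsius below is a made-up example, not something from the standard library.

```d
import std.stdio : writeln;

// Hypothetical struct used only to demonstrate opAssign.
struct Celsius
{
    double degrees = 0;

    // Called for "c = value" when value is a double.
    void opAssign(double value)
    {
        degrees = value;
    }
}

void main()
{
    Celsius c;
    c = 21.5;            // rewritten by the compiler to c.opAssign(21.5)
    writeln(c.degrees);
}
```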
The ‘~’ is not on my keyboard?
On PC keyboards, hold down the [Alt] key and press the 1, 2, and 6 keys in sequence on the numeric pad. That will generate a ‘~’ character.
Can I link in C object files created with another compiler?
DMD produces OMF (Microsoft Object Module Format) object files while other compilers such as VC++ produce COFF object files. DMD's output is designed to work with DMC, the Digital Mars C compiler, which also produces object files in OMF format.
The OMF format that DMD uses is a Microsoft defined format based on an earlier Intel designed one. Microsoft at one point decided to abandon it in favor of a Microsoft defined variant on COFF.
Using the same object format doesn't mean that any C library in that format will successfully link and run. There is a lot more compatibility required - such as calling conventions, name mangling, compiler helper functions, and hidden assumptions about the way things work. If DMD produced Microsoft COFF output files, there is still little chance that they would work successfully with object files designed and tested for use with VC. There were a lot of problems with this back when Microsoft's compilers did generate OMF.
Having a different object file format makes it helpful in identifying library files that were not tested to work with DMD. If they are not, weird problems would result even if they successfully managed to link them together. It really takes an expert to get a binary built with a compiler from one vendor to work with the output of another vendor's compiler.
That said, the linux version of DMD produces object files in the ELF format which is standard on linux, and it is specifically designed to work with the standard linux C compiler, gcc.
There is one case where using existing C libraries does work - when those libraries come in the form of a DLL conforming to the usual C ABI interface. The linkable part of this is called an "import library", and Microsoft COFF format import libraries can be successfully converted to DMD OMF using the coff2omf tool.
Why not support regular expression literals with the /foo/g syntax?
There are two reasons:
- The /foo/g syntax would make it impossible to separate the lexer from the parser, as / is the divide token.
- There are already 3 string types; adding the regex literals would add 3 more. This would proliferate through much of the compiler, debugger info, and library, and is not worth it.
Why aren't all Digital Mars programs translated to D?
There is little benefit to translating a complex, debugged, working application from one language to another. But new Digital Mars apps are implemented in D.
When should I use a foreach loop rather than a for?
Is it just performance or readability?
By using foreach, you are letting the compiler decide on the optimization rather than worrying about it yourself. For example - are pointers or indices better? Should I cache the termination condition or not? Should I rotate the loop or not? The answers to these questions are not easy, and can vary from machine to machine. Like register assignment, let the compiler do the optimization.
for (int i = 0; i < foo.length; i++)
or:
for (int i = 0; i < foo.length; ++i)
or:
for (T* p = &foo[0]; p < &foo[length]; p++)
or:
T* pend = &foo[length];
for (T* p = &foo[0]; p < pend; ++p)
or:
T* pend = &foo[length];
T* p = &foo[0];
if (p < pend)
{
    do
    {
        ...
    } while (++p < pend);
}
and, of course, should I use size_t or int?
for (size_t i = 0; i < foo.length; i++)
Let the compiler pick!
foreach (v; foo)
...
Note that we don't even need to know what the type T needs to be, thus avoiding bugs when T changes. I don't even have to know if foo is an array, or an associative array, or a struct, or a collection class. This will also avoid the common fencepost bug:
for (int i = 0; i <= foo.length; i++)
And it also avoids the need to manually create a temporary if foo is a function call.
The only reason to use a for loop is if your loop does not fit in the conventional form, like if you want to change the termination condition on the fly.
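To make the comparison concrete, here is a small sketch in which the same foreach form walks an array and an associative array without naming an index type or a termination condition. The function names total and totalOfValues are invented for the example.

```d
import std.stdio : writeln;

// Sums an array; no index variable, no length test, no element type spelled out.
int total(int[] values)
{
    int sum;
    foreach (v; values)
        sum += v;
    return sum;
}

// The same loop form works for an associative array.
int totalOfValues(int[string] table)
{
    int sum;
    foreach (key, value; table)
        sum += value;
    return sum;
}

void main()
{
    writeln(total([1, 2, 3]));
    writeln(totalOfValues(["a": 1, "b": 2]));
}
```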
Why doesn't D have an interface to C++ as well as C?
D 2.0 does have a limited interface to C++ code.
Here are some reasons why it isn't a full interface:
Attempting to have D interface with C++ is nearly as complicated as writing a C++ compiler, which would destroy the goal of having D be a reasonably easy language to implement. People with an existing C++ code base that they must work with are stuck with C++ (they can't move it to any other language, either).
There are many issues that would have to be resolved in order for D code to call some arbitrary C++ code that is presumed to be unmodifiable. This list certainly isn't complete, it's just to show the scope of the difficulties involved.
- D source code is unicode, C++'s is ASCII with code pages. Or not. It's unspecified. This impacts the contents of string literals.
- std::string cannot deal with multibyte UTF.
- C++ has a tag name space. D does not. Some sort of renaming would have to happen.
- C++ code often relies on compiler specific extensions.
- C++ has namespaces. D has modules. There is no obvious mapping between the two.
- C++ views source code as one gigantic file (after preprocessing). D sees source code as a hierarchy of modules and packages.
- Enum name scoping rules behave differently.
- C++ code, despite decades of attempts to replace macro features with inbuilt ones, relies more heavily than ever on layer after layer of arbitrary macros. D does not always have an analog for token pasting and stringizing.
- Macro names have global scope across #include files, but are local to the gigantic source files.
- C++ has arbitrary multiple inheritance and virtual base classes. D does not.
- C++ does not distinguish between in, out and ref (i.e. inout) parameters.
- The C++ name mangling varies from compiler to compiler.
- C++ throws exceptions of arbitrary type, not just descendants of Object.
- C++ overloads based on const and volatile. D overloads based on const and immutable.
- C++ overloads operators in significantly different ways - for example, operator[]() overloading for lvalue and rvalue is based on const overloading and a proxy class.
- C++ overloads operators like < completely independently of >.
- C++ does not distinguish between a class and a struct object.
- The vtbl[] location and layout is different between C++ and D.
- The way RTTI is done is completely different. C++ has no classinfo.
- D does not have two phase lookup, nor does it have Koenig (ADL) lookup.
- C++ relates classes with the 'friend' system, D uses packages and modules.
- C++ class design tends to revolve around explicit memory allocation issues, D's do not.
- D's template system is very different.
- C++ has 'exception specifications'.
- C++ has global operator overloading.
- C++ name mangling depends on const and volatile being type modifiers. D name mangling depends on const and immutable being type modifiers. D's const is also transitive, unlike C++. One cannot have a const pointer to mutable in D.
The bottom line is the language features affect the design of the code. C++ designs just don't fit with D. Even if you could find a way to automatically adapt between the two, the result will be about as enticing as the left side of a honda welded to the right side of a camaro.
Why doesn't D use reference counting for garbage collection?
Reference counting has its advantages, but some severe disadvantages:
- Cyclical data structures won't get freed.
- Every pointer copy requires an increment and a corresponding decrement - including when simply passing a reference to a function.
- In a multithreaded app, the incs and decs must be synchronized.
- Exception handlers (finally blocks) must be inserted to handle all the decs so there are no leaks. Contrary to assertions otherwise, there is no such thing as "zero overhead exceptions."
- In order to support slicing and interior pointers, as well as supporting reference counting on arbitrary allocations of non-object data, a separate "wrapper" object must be allocated for each allocation to be ref counted. This essentially doubles the number of allocations needed.
- The wrapper object will mean that all pointers will need to be double-dereferenced to access the data.
- Fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C.
- Ref counting can fragment the heap thereby consuming more memory just like the gc can, though the gc typically will consume more memory overall.
- Ref counting does not eliminate latency problems, it just reduces them.
The proposed C++ shared_ptr<>, which implements ref counting, suffers from all these faults. I haven't seen a heads up benchmark of shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> turned out to be a significant loser in terms of both performance and memory consumption.
That said, D may in the future optionally support some form of ref counting, as rc is better for managing scarce resources like file handles. Furthermore, if ref counting is a must, Phobos has the std.typecons.RefCounted type which implements it as a library, similar to C++'s shared_ptr<>.
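A minimal sketch of std.typecons.RefCounted follows. The function ownersAfterCopy is a made-up name; the count is read through the refCountedStore.refCount property.

```d
import std.stdio : writeln;
import std.typecons : RefCounted;

// Hypothetical helper: reports how many owners exist after one copy.
size_t ownersAfterCopy()
{
    auto first = RefCounted!int(5);  // payload allocated, count is 1
    auto second = first;             // copying increments the count
    return first.refCountedStore.refCount;
    // both owners go out of scope here and the payload is released
}

void main()
{
    writeln(ownersAfterCopy());
}
```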
Isn't garbage collection slow and non-deterministic?
Yes, but all dynamic memory management is slow and non-deterministic, including malloc/free. If you talk to the people who actually do real time software, they don't use malloc/free precisely because they are not deterministic. They preallocate all data. However, the use of GC instead of malloc enables advanced language constructs (especially, more powerful array syntax), which greatly reduce the number of memory allocations which need to be made. This can mean that GC is actually faster than explicit management.
Can't a sufficiently smart compiler figure out that a function is pure automatically?
The compiler infers purity (and safety, and nothrow) for delegate and function literals. It doesn't do this for normal functions for several reasons:
- Most functions call other functions, which call other functions, until one of them calls a library routine whose source is not available to the compiler. So it will have to assume it is not pure. With a pure function attribute, external library functions that are pure can be marked as pure, and then the analysis can work for enough cases to be useful.
- Since virtual functions (and by extension delegates and function pointers) can be extended by the user at times the compiler won't see it, they have to be assumed to be impure.
- If the programmer intends for a particular function to be pure and the compiler detects it is not, this may go unnoticed by the programmer. Even worse, if the programmer does notice it, it may be arbitrarily difficult to determine why the compiler thinks it is impure - is it a programming mistake or a compiler bug?
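The split between explicit and inferred purity can be sketched as below. The names square and applyPure are invented for the example. The named function carries an explicit pure attribute, while the function literal passed in main has its purity inferred, which is why it is accepted where a pure function pointer is required.

```d
import std.stdio : writeln;

// Explicitly marked pure: the compiler checks the body against the attribute.
int square(int x) pure
{
    return x * x;
}

// Accepts only pure function pointers, so it can be pure itself.
int applyPure(int function(int) pure f, int x) pure
{
    return f(x);
}

void main()
{
    writeln(applyPure(&square, 3));
    // The literal has no attribute written out; its purity is inferred.
    writeln(applyPure((int a) => a + 1, 2));
}
```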
Why allow cast(float) if it isn't supposed to work?
The floating point rules are such that transforming cast(real)cast(float) to cast(real) is a valid transformation. This is because the floating point rules are written with the following principle in mind:
An algorithm is invalid if it breaks if the floating point precision is increased. Floating point precision is always a minimum, not a maximum.
Programs that legitimately depended on maximum precision are:
- (1) compiler/library validation test suites
- (2) ones trying to programmatically test the precision
(1) is not of value to user programming, and there are alternate ways to test the precision.
(2) D has .properties that take care of that.
Programs that rely on a maximum accuracy need to be rethought and reengineered.
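The properties mentioned above can be queried directly on the floating point types. The sketch below prints a few of them; the values for real depend on the target, so only the relation to double is checked.

```d
import std.stdio : writefln;

void main()
{
    // mant_dig: bits in the mantissa; dig: decimal digits of precision.
    writefln("float:  %s mantissa bits, %s decimal digits", float.mant_dig, float.dig);
    writefln("double: %s mantissa bits, %s decimal digits", double.mant_dig, double.dig);
    writefln("real:   %s mantissa bits, %s decimal digits", real.mant_dig, real.dig);

    // real is at least as precise as double; how much more is target dependent.
    static assert(real.mant_dig >= double.mant_dig);
}
```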
Why can't nested functions be forward referenced?
Declarations within a function are different from declarations at module scope. Within a function, initializers for variable declarations are guaranteed to run in sequential order. Allowing arbitrary forward references to nested functions would break this, because nested functions can reference any variables declared above them.
int first() { return second(); }
int x = first(); // x depends on y, which hasn't been declared yet.
int y = x + 1;
int second() { return y; }
But, forward references of nested functions are sometimes required (eg, for mutually recursive nested functions). The most general solution is to declare a local, nested struct. Any member functions (and variables!) of that struct have non-sequential semantics, so they can forward reference each other.
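The nested struct approach can be sketched as follows, using a deliberately simple pair of mutually recursive functions. The names isEvenDemo and Parity are made up for the example.

```d
import std.stdio : writeln;

bool isEvenDemo(uint n)
{
    // Members of a nested struct may reference each other in any order.
    static struct Parity
    {
        static bool isEven(uint k) { return k == 0 ? true : isOdd(k - 1); }
        static bool isOdd(uint k)  { return k == 0 ? false : isEven(k - 1); }
    }

    return Parity.isEven(n);
}

void main()
{
    writeln(isEvenDemo(10));
    writeln(isEvenDemo(7));
}
```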