
On comp.lang.c++.moderated, Anand Hariharan recently asked:
I recently attended a talk given by a .NET evangelist. Surprisingly, the speaker was quite sincere and explained why Garbage collection is no panacea (citing examples such as database connections and file handles), taking his presentation through muddy waters of Dispose, Close, etc.
At one point he showed how C# chose to overload the "using" keyword in a completely unrelated context viz., to specify that the variables within a block defined by "using" should be destroyed as soon as they leave the scope.
At that point I asked why C# could not have simply incorporated those semantics as a part of the language rather than requiring the programmer explicitly request it at specific places. His response: "Deterministic de-construction is generally expensive, especially if one has several small objects sporadically strewn all over."
Is there a merit (statistical/empirical) to his assertion?
Short answer: No, but it's a common misconception even among experts. Having deterministic destruction can incur a minor performance penalty, and it is this he is thinking about. But deterministic destruction also gains a significant practical performance advantage, and is otherwise desirable. (See further below.)
Anand continued:
I thought C++ went great lengths for RAII to be possible, eschewing runtime guzzlers (such as mark-and-sweep) largely on performance grounds.
Actually, you want both; destructors are essential, and modern GCs are very sophisticated and high-performance and desirable for many reasons. It's a matter of using each where it's appropriate.
Jared Finder responded, adding:
This just seems crazy. I can't see how C#'s using, Java's try-finally, or C++'s automatic destruction would generate code that is different in any way.
Right, but it's useful to understand the actual issue. Here it is:
Let's say you have a stack frame containing one or more conceptually local objects. If any such object should be cleaned up at the end of the function (or more local scope), either for performance reasons or for correctness reasons, you need to express that. Depending on the language you're using, you express that essentially identically as one of the following:
- in C++, a stack-based object with a nontrivial destructor
- in C#, a using statement for an IDisposable object
- in Java, the hand-coded Dispose pattern
In each case, you incur the overhead of an implicit or explicit "try/finally" for the first local object that will need the cleanup -- and it is that try/finally that the people who worry about performance are talking about.
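To make the C++ case concrete, here is a minimal sketch of a stack-based object with a nontrivial destructor. The `Connection` class, the `events` log, and the `query` and `run` functions are hypothetical names invented for illustration; the point is only that the destructor runs on both the normal and the exceptional exit path, which is the implicit "try/finally".

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log, used only so the cleanup order can be observed.
std::vector<std::string> events;

// Hypothetical resource wrapper: the nontrivial destructor is the cleanup.
class Connection {
public:
    explicit Connection(std::string name) : name_(std::move(name)) {
        assert(!name_.empty());            // a connection needs a name
        events.push_back("open " + name_);
    }
    ~Connection() { events.push_back("close " + name_); }
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
private:
    std::string name_;
};

// ~Connection runs on every exit path out of this function,
// whether it returns normally or throws.
void query(bool fail) {
    Connection db("db");
    if (fail) throw std::runtime_error("query failed");
    events.push_back("query ok");
}

// Calls query() and returns the recorded events, whether or not it threw.
std::vector<std::string> run(bool fail) {
    events.clear();
    try {
        query(fail);
    } catch (const std::runtime_error&) {
        events.push_back("caught");
    }
    return events;
}
```

Calling `run(false)` records the open, the query, and the close in that order; calling `run(true)` records the close before the handler's "caught" entry, because the stack is unwound before the handler body executes.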
Note, however, that:
- The constructs for expressing this are essentially the same in all languages; the only question is ease of use, and the winners there are C++, C#, and Java, in that order. (To be complete, I should acknowledge that there are other areas where C# and Java win on ease of use, but in this particular case it is C++ that is the simpler language.)
- Generally it's wrong NOT to write the deterministic destruction when objects are conceptually local. If the object needs to be Dispose'd, you need to Dispose it. So it's usually a red herring to say this incurs some potential overhead, because to avoid the overhead would be to write an incorrect program (and/or a less well-performing one, see below).
- There are offsetting performance advantages to early destruction. In particular, you incur a local try/finally for the first local variable in a given scope that requires the cleanup (additional ones are essentially free because you already have the try/finally in place), but you often get great performance benefits later by reducing finalizer pressure and GC work. (In one example I cite in talks, the microsoft.com website uses .NET widely but at one point the team found that the site was spending 70% of total system time(!) in the GC. It wasn't .NET's fault or the GC's fault, but rather the fault of the way the GC was being used. The CLR performance team analyzed the problem and told the app team to make one change: Before making a server-to-server call, clean up (Dispose) all the objects you don't need any more. With that one change, time in the GC went down to 1%. I submit that the problem would never have occurred if the app had been written in C++, which uses deterministic destruction by default. C# and Java have it off by default, and if you forget to write "using" or the Dispose pattern then your code will still compile, but will have either a correctness bug or a latent performance problem.)
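The ease-of-use point can be sketched in C++ alone by writing the same cleanup twice: once by hand, the way a language without destructors makes you spell it out, and once with a destructor. Everything named here (`acquire`, `release`, `work`, `Handle`, `leaked_after`) is a hypothetical stand-in, not a real API, and the handle counter exists only so a leak would be visible.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical C-style resource API; open_handles only counts live handles.
int open_handles = 0;
int acquire() { return ++open_handles; }
void release(int) {
    assert(open_handles > 0);   // releasing more than was acquired is a bug
    --open_handles;
}

// Hypothetical work that may fail partway through.
void work(bool fail) {
    if (fail) throw std::runtime_error("work failed");
}

// Hand-coded cleanup: every exit path has to be written out explicitly.
void manual(bool fail) {
    int h = acquire();
    try {
        work(fail);
    } catch (...) {
        release(h);   // cleanup on the exceptional path
        throw;
    }
    release(h);       // cleanup on the normal path
}

// Destructor-based cleanup: the same guarantee, stated once in the type.
struct Handle {
    int h = acquire();
    Handle() = default;
    ~Handle() { release(h); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

void automatic(bool fail) {
    Handle a;
    Handle b;     // both are released, in reverse order, on any exit path
    work(fail);
}

// Returns how many handles are still open after calling f; 0 means no leak.
int leaked_after(void (*f)(bool), bool fail) {
    open_handles = 0;
    try {
        f(fail);
    } catch (const std::runtime_error&) {
    }
    return open_handles;
}
```

Both versions leave no handle open on either path. The difference is that forgetting one of the two `release` calls in `manual` still compiles, while `automatic` has nothing to forget.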
Otherwise, if none of the conceptually local objects requires cleanup, you express that essentially identically as one of the following:
- in C++, a stack-based object with a trivial destructor (or, a heap-based object)
- in C#, no using clause
- in Java, no Dispose pattern
In each case, you avoid adding the exception handling to do the cleanup. Again, it's the same in all languages. C++ happens to turn cleanup on by default for stack-based objects, and it automatically avoids the overhead when the cleanup work is trivial.
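As a small illustration of the trivial case, standard C++ lets you ask the compiler whether a type's destructor is trivial. This sketch uses `std::is_trivially_destructible` from `<type_traits>`; the `Point` and `Label` types and the `needs_cleanup` and `length_squared` helpers are hypothetical examples, and how much code a particular compiler actually emits at scope exit is an implementation detail this does not measure.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical plain-data type: nothing to clean up, so the destructor is trivial.
struct Point { double x; double y; };

// Hypothetical type owning a std::string: its destructor must free memory.
struct Label { std::string text; };

// True when destroying a T has to run some cleanup code.
template <class T>
constexpr bool needs_cleanup() {
    return !std::is_trivially_destructible<T>::value;
}

static_assert(!needs_cleanup<Point>(), "Point needs no cleanup at scope exit");
static_assert(needs_cleanup<Label>(), "Label needs cleanup at scope exit");

double length_squared() {
    Point p{3.0, 4.0};    // trivial destructor: nothing to run for p at scope exit
    Label l{"origin"};    // nontrivial destructor: ~Label frees the string
    assert(!l.text.empty());
    return p.x * p.x + p.y * p.y;
}
```

The two `static_assert` lines are checked at compile time, so the distinction costs nothing at run time.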
So the argument really doesn't boil down to what some people often say, namely whether deterministic destruction of conceptually local objects is a good thing or not -- clearly it is important, otherwise you wouldn't have C++ auto semantics, C# using statements, and Java Dispose patterns! The argument really boils down to this: When you do need deterministic destruction, you really do need it regardless of the language you're using, and to avoid the overhead would be to write an incorrect program (and often one with more overhead in other places).
Jared continued:
I can see there being problems with an old ABI that requires each function to register itself as having a cleanup step, but standards can't prevent all stupid implementations. In addition, using garbage collection will remove much of the work done in destructors since most of the resources used in a program tend to be memory.
The latter is true for trivial destructors. In short, finalizers (often but incorrectly called "destructors that run at GC time," which they are not) are fundamentally flawed and extremely complex. (See for example Chris Brumme's blog article about finalizers.)
I have personally come to the conclusion that destructors and GC are completely separate and must be kept completely separate. Trying to conflate the two ideas is the root of most of the problems with current GC systems in my opinion; in particular, this manifests most notably in the case of finalizers which exactly try to tie those two things together, and in the fact that all major current GC systems attempt to do GC instead of destructors, rather than in addition to destructors (with the notable exception of C++/CLI).
Of course, C++/CLI exposes what the CLI does (including finalizers) and what C++ does (destructors) and by bringing them together shows how beneficial destructors are even for today's GC systems. I think that C++/CLI is the best it can be in this regard and is really compelling over the current alternatives. I also think this approach could be taken further and further improved upon; I have definite ideas, not ready for publication, on potential improvements in GC by removing finalizers outright (which could be viewed as somewhat radical and I agree that departing from longtime practice is something that should never be done lightly).
I'd be interested in what Herb Sutter had to say about this, considering that one of the big advantages of C++/CLI over C# is the automatic calling of destructors.
Yes. Of course, C# has other advantages; I drool over anonymous delegates (a restricted but very useful form of lambda functions / closures). It would be cool to have those in C++... but that's another release...
Herb
posted on Tuesday, November 23, 2004 2:33 PM