Carlo Pescio: February 2011

Sunday, February 27, 2011

Notes on Software Design, Chapter 14: the Enumeration Law

In my previous post on this series, I used the well-known Shape problem to explain polymorphism from the entanglement perspective. We've seen that moving from a C-like approach (with a ShapeType enum and a union of structures) to an OO approach (with a Shape interface implemented by concrete shape classes) creates new centers, altering the entanglement between previous centers.
This time, I'll explore the landscape of entanglement a little more, using simple, well-known problems. My aim is to provide an intuitive grasp of the entanglement concept before moving to a more formal definition. In the end, I'll come up with a simple Software Law (one of many). This chapter borrows extensively from the previous one, so if you didn't read chapter 13, it's probably better to do so before reading further.

Shapes, again
Forget about the Shape interface for a while. We now have two concrete classes: Triangle and Circle. They represent geometric shapes, this time without graphical responsibilities. A triangle is defined by 3 points, a Circle by center and radius. Given a Triangle and a Circle, we want to know the area of their intersection.
As usual, we have two problems to solve: the actual mathematical problem and the software design problem. If you're a mathematician, you may rightly think that solving the first (the function!) is important, and the latter is sort of secondary, if not irrelevant. As a software designer, however, I'll allow myself the luxury of ignoring the gory mathematical details, and tinker with form instead.

If you're working in a language (like C++) where functions exist [also] outside classes, a very reasonable form would be this:

double Intersection( const Triangle& t, const Circle& c )
  {
  // something here
  }

Note that I'm not trying to solve a more general problem. First, I said "forget the Shape interface"; second, I may not know how to solve the general problem of shapes intersection. I just want to deal with a circle and a triangle, so this signature is simple and effective.

If you're working in a class-only language, you have several choices.

1) Make that an instance method, possibly changing an existing class.

2) Make that a static method, possibly changing an existing class.

3) If your language has open classes, or extension methods, etc, you can add that method to an existing class without changing it (as an artifact).

So, let's sort this out first. I hope you can feel the ugliness of placing that function inside Triangle or Circle, either as an instance or static method (that feeling could be made more formal by appealing to coupling, to the open/closed principle, or to entanglement itself).
So, let's say that we introduce a new class, and go for a static method. At that point is kinda hard to find nice names for class and method (if you're thinking about ShapeHelper or TriangleHelper, you've been living far too long in ClassNameWasteLand, sorry :-). Having asked this question a few times in real life (while teaching basic OO thinking), I'll show you a relatively decent proposal:

class Intersection
  {
  public static double Area( Triangle t, Circle c )
    {
    // something here
    }
  }

It's quite readable on the calling site:

double a = Intersection.Area( t, c ) ;

Also, the class name is ok: a proper noun, not something ending with -er / -or that would make Peter Coad (and me :-) shiver. An arguably better alternative would be to pass parameters in a constructor and have a parameterless Area() function, but let's keep it like this for now.

Whatever you do, you end up with a method/function body that is deeply coupled with both Triangle and Circle. You need to get all their data to calculate the area of the intersection. Still, this doesn't feel wrong, does it? That's the method purpose. It's ok to manipulate center, radius, and vertexes. Weird, coupling is supposed to be bad, but it doesn't feel bad here.

Interlude
Let me change problem setting for a short while, before I come back to our beloved shapes. You're now working on yet another payroll system. You got (guess what :-) a Person class:

class Person
  {
  Place address;
  String firstName;
  String lastName;
  TelephoneNumber homePhone;
  // …
  }

(yeah, I know, some would have written "Address address" :-)

You also have a Validate() member function (I'll simplify the problem to returning a bool, not an error message or set of error messages).

bool Validate()
  {
  return 
    address.IsValid() && 
    ValidateName( firstName ) && 
    ValidateName( lastName ) && 
    homePhone.IsValid() ;
  }

Yikes! That code sucks! It sucks because it's asymmetric (due to the two String-typed members) but also because it's fragile. If I get in there and add "Email personalEmail ;" as a field, I'll have to update Validate as well (and more member functions too, and possibly a few callers). Hmmm. That's because I'm using my data members. Well, I'm using Triangle and Circle data member as well in the function above, which is supposed to be worse; here I'm using my own data members. Why was that right, while Validate "feels" wrong? Don't use hand-waving explanations :-).

Back to Shapes
A very reasonable request would be to calculate the intersection of two triangles or two circles as well, so why don't we add those functions too:

class Intersection
  {
  public static double Area( Triangle t, Circle c )
    {     // something here    }
  public static double Area( Triangle t1, Triangle t2 )
    {     // something here    }
  public static double Area( Circle c1, Circle c2 )
    {     // something here    }
  }

We need 3 distinct algorithms anyway, so anything more abstract than that may look suspicious. It makes sense to keep the three functions in the same class: the first function is already coupled with Triangle and Circle. Adding the other two doesn't make it any worse from the coupling perspective. Yet, this is wrong. It's a step toward the abyss. Add a Rectangle class, and you'll have to add 4 member functions to Intersection.

Interestingly, we came to this point without introducing a single switch/case, actually without even a single "if" in our code. Even more interestingly, our former Intersection.Area was structurally similar to Person.Validate, yet relatively good. Our latter Intersection is not structurally similar to Person, yet it's facing a similar problem of instability.

Let's stop and think :-)
A few possible reactions to the above:

- I'm too demanding. You can do that and nothing terrible will happen.
Probably true, a small scale software mess rarely kills. The same kind of problem, however, is often at the core of a large-scale mess.

- I'm throwing you a curve.
True, sort of. The intersection problem is a well-known double-dispatch issue that is not easily solved in most languages, but that's only part of the problem.

- It's just a matter of Open/Closed Principle.
Yeah, well, again, sort of. The O/C is more concerned with extension. In practice, you won't create a new Person class to add email. You'll change the existing one.

- I know a pattern / solution for that!
Me too :-). Sure, some acyclic variant of Visitor (or perhaps more sophisticated solutions) may help with shape intersection. A reflection+attribute-based approach to validation may help with Person. As usual, we should look at the moon, not at the finger. It's not the problem per se; it's not about finding an ad-hoc solution. It's more about understanding the common cause of a large set of problems.

- Ok, I want to know what is really going on here :-)
Great! Keep reading :-)

C/D-U-U Entanglement
Everything that felt wrong here, including the problem from Chapter 13, that is:

- the ShapeType-based Draw function with a switch/case inside

- Person.Validate

- the latter Intersection class, dealing with more than one case

was indeed suffering from the same problem, namely an entanglement chain: C/D-U-U. In plain English: Creation or Deletion of something, leading to an Update of something else, leading to an Update of something else. I think that understanding the full chain, and not simply the U-U part, makes it easier to recognize the problem. Interestingly, the former Intersection class, with just one responsibility, was immune from C/D-U-U entanglement.

The chain is easier to follow on the Draw function in Chapter 13, so I'll start from there. ShapeType makes it easy to understand (and talk about) the problem, because it's both Nominative and Extensional. Every concept is named: individual shape types and the set of those shape types (ShapeType itself). The set is also providing through its full extension, that is, by enumerating its values (that's actually why it's called an enumerated type). Also, every concrete shape is given a name (a struct where concrete shape data are stored).

So, for the old C-like solutions, entanglement works (unsurprisingly) like this:

- There is a C/D-C/D entanglement between members of ShapeType and a corresponding struct. Creation or deletion of a new ShapeType member requires a corresponding action on a struct.

- There is a C/D-U entanglement between members of ShapeType and ShapeType (this is obvious: any enumeration is C/D-U entangled with its constituents).

- There is U-U entanglement between ShapeType and Draw, or every other function that is enumerating over ShapeType (a switch/case being just a particular case of enumerating over names).

Consider now the first Intersection.Area. That function is enumerating over two sets: the set of Circle data, and the set of Triangle data. There is no switch/case involved: we're just using those data, one after another. That amounts to enumeration of names. The function is C/D-U-U entangled with any circle and triangle data. We don't consider that to be a problem because we assume (quite reasonably) that those sets won't change. If I change my mind and represent a Circle using 3 points (I could) I'll have to change Area. It's just that I do not consider that change any likely.

Consider now Person.Validate. Validate is enumerating over the set of Person data, using their names. It's not using a switch/case. It doesn't have to. Just using a name after another amounts to enumeration. Also, enumeration is not broken by a direct function call: moving portions of Validate into sub-functions wouldn't make any difference, which is why layering is ineffective here.
Here, Validate is U-U entangled with the (unnamed) set of Person data, or C/D-U-U entangled with any of its present or future data members. Unlike Circle or Triangle, we have all reasons to suspect the Person data to be unstable, that is, for C and/or D to occur. Therefore, Validate is unstable.

Finally, consider the latter Intersection class. While the individual Area functions are ok, the Intersection class is not. Once we bring in more than one intersection algorithm, we entangle the class with the set of shape types. Now, while in the trivial C-like design we had a visible, named entity representing the set of shapes (ShapeTypes), in the intersection problem I didn't provide such an artifact (on purpose). The fact that we're not naming it, however, does not make it any less real. It's more like we're dealing with an Intensional definition of the set itself (this would be a long story; I wrote an entire post about it, then decided to scrap it). Intersection is C/D-U-U entangled with individual shape types, and that makes it unstable.

A Software Law
One of the slides in my Physics of Software reads "software has a nature" (yeah, I need to update those slides with tons of new material). In natural sciences, we have the concept of Physical law, which despite the name doesn't have to be about physics :-). In physics, we do have quite a few interesting laws, not necessarily expressed through complex formulas. The Laws of thermodynamics , for instance, can be stated in plain English.

When it comes to software, you can find several "laws of software" around. However, most of them are about process (like Conway's law, Brooks' law, etc), and just a few (like Amdhal's law) are really about software. Some are just principles (like the Law of Demeter) and not true laws. Here, however, we have an opportunity to formulate a meaningful law (one of many yet to be discovered and/or documented):

- The Enumeration Law
every artifact (a function, a class, whatever) that is enumerating over the members of set by name is U-U entangled with that set, or said otherwise, C/D-U-U entangled with any member of the set.

This is not a design principle. It is not something you must aim for. It's a law. It's always true. If you don't like the consequences, you have to change the shape of your software so that the law no longer applies, or that entanglement is harmless because the set is stable.

This law is not earth-shattering. Well, the laws of thermodynamics don't look earth-shattering either, but that doesn't make them any less important :-). Honestly, I don't know about you, but I wish someone taught me software design this way. This is basically what is keeping me going :-).

Consequences
In Chapter 12, I warned against the natural temptation to classify entanglement in a good/bad scale, as it has been done for coupling and even for connascence. For instance, "connascence of name" is considered the best, or least dangerous, form. Yet we have just seen that entanglement with a set of names is at the root of many problems (or not, depending on the stability of that name set).

Forces are not good or evil. They're just there. We must be able to recognize them, understand how they're influencing our material, and deal with them accordingly. For instance, there are well-known approaches to mitigate dependency on names: indirection (through function pointers, polymorphism, etc) may help in some cases; double indirection (like in double-dispatch) is required in other cases; reflection would help in others (like Validate). All those techniques share a common effect, that we'll discuss in a future post.

Conclusions
Speaking of future: recently, I've been exploring different ways to represent entanglement, as I wasn't satisfied with my forcefield diagram. I'll probably introduce an "entanglement diagram" in my next post.

A final note: visit counters, sharing counters and general feedback tell me that this stuff is not really popular. My previous rant on design literature had way more visitors than any of my posts on the Physics of Software, and was far easier to write :-). If you're not one of the usual suspects (the handful of good guys who are already sharing this stuff around), consider spreading the word a little. You don't have to be a hero and tell a thousand people. Telling the guy in front of you is good enough :-).

Thursday, February 03, 2011

Is Software Design Literature Dead?

Sometimes, my clients ask me what to read about software design. Most often than not, they don't want a list of books – many already have the classics covered. They would like to read papers, perhaps recent works, to further advance their understanding of software design, or perhaps something inspirational, to get new ideas and concepts. I have to confess I'm often at loss for suggestions.

The sense of discomfort gets even worse when they ask me what happened to the kind of publications they used to read in the late '90s. Stuff like the C++ Report, the Journal of Object Oriented Programming, Java Report, Object Expert, etc. Most don't even know about the Journal of Object Technology, but honestly, the JOT has taken a strong academic slant lately, and I'm not sure they would find it all that interesting. I usually suggest they keep current on design patterns: for instance, the complete EuroPLoP 2009 proceedings are available online, for free. However, mentioning patterns sometimes furthers just another question: what happened to the pattern movement? Keep that conversation going for a while, and you get the final question: so, is software design literature dead?

Is it?
Note that the question is not about software design - it's about software design literature. Interestingly, Martin Fowler wrote about the other side of the story (Is Design Dead?) back in 2004. He argued that design wasn't dead, but its nature had changed (I could argue that if you change the nature of something, then it's perhaps inappropriate to keep using the same name :-), but ok). So perhaps software design literature isn't dead either, and has just changed nature: after all, looking for "software design" on Google yields 3,240,000 results.
Trying to classify what I usually find under the "software design" chapter, I came up with this list:

- Literature on software design philosophy, hopefully with some practical applications. Early works on Information Hiding were as much about a philosophy of software design as about practical ways to structure our software. There were a lot of philosophical papers on OOP and AOP as well. Usually, you get this kind of literature when some new idea is being proposed (like my Physics of Software :-). It's natural to see less and less philosophy in a maturing discipline, so perhaps a dearth of philosophical literature is not a bad sign.

- Literature on notations, like UML or SysML. This is not really software design literature, unless the rationale for the notation is discussed.

- Academic literature on metrics and the like. This would be interesting if those metrics addressed real design questions, but in practice, everything is always observed through the rather narrow perspective of correlation with defects or cost or something appealing for manager$. In most cases, this literature is not really about software design, and is definitely not coming from people with a strong software design background (of course, they would disagree on that :-)

- Literature on software design principles and heuristics, or refactoring techniques. We still see some of this, but is mostly a rehashing of the same old stuff from the late '90s. The SOLID principles, the Law of Demeter, etc. In most cases, papers say nothing new, are based on toy problems, and are here just because many programmers won't read something that has been published more than a few months ago. If this is what's keeping software design literature alive, let's pull the plug.

- Literature on methods, like Design by Contract, TDD, Domain-Driven Design, etc. Here you find the occasional must-read work (usually a book from those who actually invented the approach), but just like philosophy, you see less and less of this literature in a maturing discipline. Then you find tons of advocacy on methods (or against methods), which would be more interesting if they involved real experiments (like the ones performed by Simula labs) and not just another toy projects in the capable hands of graduate students. Besides, experiments on design methods should be evaluated [mostly] by cost of change in the next few months/years, not exclusively by design principles. Advocacy may seem to keep literature alive, but it's just noise.

- Literature on platform-specific architectures. There is no dearth of that. From early EJB treatises to JBoss tutorials, you can find tons of papers on "enterprise architecture". Even Microsoft has some architectural literature on application blocks and stuff like that. Honestly, in most cases it looks more like Markitecture (marketing architecture as defined by Hohmann), promoting canned solutions and proposing that you adapt your problem to the architecture, which is sort of the opposite of real software design. The best works usually fall under the "design patterns" chapter (see below).

- Literature for architecture astronauts. This is a nice venue for both academics and vendors. You usually see heavily layered architectures using any possible standards and still proposing a few more, with all the right (and wrong) acronyms inside, and after a while you learn to turn quickly to the next page. It's not unusual to find papers appealing to money-saving managers who don't "get" software, proposing yet another combination of blueprint architectures and MDA tools so that you can "generate all your code" with a mouse click. Yeah, sure, whatever.

- Literature on design patterns. Most likely, this is what's keeping software design literature alive. A well-written pattern is about a real problem, the forces shaping the context, an effective solution, its consequences, etc. This is what software design is about, in practice. On the other hand, a couple of conferences every year can't keep design literature in perfect health.

- Literature on design in the context of agility (mostly about TDD). There is a lot of this on the net. Unfortunately, it's mostly about trivial problems, and even more unfortunately, there is rarely any discussion about the quality of the design itself (as if having tests was enough to declare the design "good"). The largest issue here is that it's basically impossible to talk about the design of anything non-trivial strictly from a code-based perspective. Note: I'm not saying that you can't design using only code as your material. I'm saying that when the problem scales slightly beyond the toy level, the amount of code that you would have to write and show to talk about design and design alternatives grows beyond the manageable. So, while code-centric practices are not killing design, they are nailing the coffin on design literature.

Geez, I am usually an optimist :-)), but this picture is bleak. I could almost paraphrase Richard Gabriel (of "Objects have failed" fame) and say that evidently, software design has failed to speak the truth, and therefore, software design narrative (literature) is dying. But that would not be me. I'm more like the opportunity guy. Perhaps we just need a different kind of literature on software design.

Something's missing
If you walk through the aisles of a large bookstore, and go to the "design & architecture" shelves, you'll all the literary kinds above, not about software, but about real-world stuff (from chairs to buildings). except on the design of physical objects too. Interestingly, you can also find a few morelike:

- Anthologies (presenting the work of several designers) and monographs on the work of famous designers. We are at loss here, because software design is not immediately visible, is not interesting for the general public, is often kept as a trade secret, etc.
I know only one attempt to come up with something similar for software: "Beautiful Architecture" by Diomidis Spinellis. It's a nice book, but it suffers from a lack of depth. I enjoyed reading it, but didn't come back with a new perspective on something, which is what I would be looking for in this kind of literature.
It would be interesting, for instance, to read more about the architecture of successful open-source projects, not from a fanboy perspective but through the eyes of experienced software designers. Any takers?

- Idea books. These may look like anthologies, but the focus is different. While an anthology is often associated with a discussion or critics of the designer's style, an idea book presents several different objects, often out of context, as a sort of creative stimulus. I don't know of anything similar for software design, though I've seen many similar book for "web design" (basically fancy web pages). In a sense, some literature on patterns comes close to being inspirational; at least, good domain-specific patterns sometimes do. But an idea book (or idea paper) would look different.

I guess anthologies and monographs are at odd with the software culture at large, with its focus on the whizz-bang technology of the day, little interest for the past, and often bordering on religious fervor about some products. But idea books (or most likely idea papers) could work, somehow.

Indeed, I would like to see more software design literature organized as follows:

- A real-world, or at least realistic problem is presented.

- Forces are discussed. Ideally, real-world forces.

- Possibly, a subset of the whole problem is selected. Real-world problems are too big to be dissected in a paper (or two, or three). You can't discuss the detailed design of a business system in a paper (lest you appeal only to architecture astronauts). You could, however, discuss a selected, challenging portion. Ideally, the problem (or the subset) or perhaps just the approach / solution, should be of some interest even outside the specific domain. Just because your problem arose in a deeply embedded environment doesn't mean I can't learn something useful for an enterprise web application (assuming I'm open minded, of course :-). Idea books / papers should trigger some lateral thinking on the reader, therefore unusual, provocative solutions would be great (unlike literature on patterns, where you are expected to report on well-known, sort of "traditional" solutions).

- Practical solutions are presented and scrutinized. I don't really care if you use code, UML, words, gestures, whatever. Still, I want some depth of scrutiny. I want to see more than one option discussed. I don't need working code. I'm probably working on a different language or platform, a different domain, with different constraints. I want fresh design ideas and new perspectives.

- In practice, a good design aims at keeping the cost of change low. This is why a real-world problem is preferable. Talking about the most likely changes and how they could be addressed in different scenarios beats babbling about tests and SOLID and the like. However, "most likely changes" is meaningless if the problem is not real.

Funny enough, there would be no natural place to publish something like that, except your own personal page. Sure, maybe JOT, maybe not. Maybe IEEE Software, maybe not. Maybe some conference, maybe not. But we don't have a software design journal with a large readers pool. This is part of the problem, of course, but also a consequence. Wrong feedback loop :-).

Interestingly, I proposed something similar years ago, when I was part of the editorial board of a software magazine. It never made a dent. I remember that a well-known author argued against the idea, on the basis that people were not interested in all this talking about ins-and-outs, design alternatives and stuff. They would rather have a single, comprehensive design presented that they could immediately use, preferably with some source code. Well, that's entirely possible; indeed, I don't really know how many software practitioners would be interested in this kind of literature. Sure, I can bet a few of you guys would be, but overall, perhaps just a small minority is looking for inspiration, and most are just looking for canned solutions.

Or maybe something else...
On the other hand, perhaps this style is just too stuck in the '90s to be appealing in 2011. It's unidirectional (author to readers), it's not "social", it's not fun. Maybe a design challenge would be more appealing. Or perhaps the media are obsolete, and we should move away from text and graphics and toward (e.g.) videos. I actually tried to watch some code kata videos, but the guys thought they were like this, but to an experienced designer they looked more like this. (and not even that funny). Maybe the next generation of software designers will come up with a better narrative style or media.

Do something!
I'm usually the "let's do something" kind of guy, and I've entertained the idea of starting a Software Design Gallery, or a Software Design Idea Book, or something, but honestly, it's a damn lot of work, and I'm a bit skeptical about the market size (even for a free book/paper/website/whatever). As usual, any feedback is welcome, and any action on your part even more :-)

Carlo Pescio

Pages

Sunday, February 27, 2011

Notes on Software Design, Chapter 14: the Enumeration Law

Thursday, February 03, 2011

Is Software Design Literature Dead?

some of my websites / projects