On code being model – maybe not what you think

I have heard the mantra ‘code is model’ several times. Although I always thought I understood what it meant, only now did I do some research to find out where it came from. It turns out that it originated in a blog post that Microsoft’s Harry Pierson wrote back in 2005. It is a very good read, insightful, and to the point.

The idea that gave Harry’s post its title is that whenever we use a simpler representation to build something that is more complex and detailed than we want to care about, we are creating models. 3GL source code is a model for object code. Byte code is a model for actual CPU-specific executable code. Hence code is model.

He then goes on to ask: if we have been successfully reaping the benefits of increased levels of abstraction by using 3GLs for decades now, what prevents us from taking the next step and using even higher-level (modeling) languages? He makes several good points that are at the very foundations of true model-driven development:

  • “models must be precise” – models must be amenable to automatic transformation. Models that cannot be transformed into running code are “useless as development artifacts”. If you like them for conceiving or communicating ideas, that is fine, but those belong to a totally different category, one that plays a very marginal role in software development and has nothing to do with model-driven development. Models created using the TextUML Toolkit are necessarily precise, and can include behavior in addition to structure.
  • “models must be intrinsic to the development process” – models need to be “first class citizens of the development process” or they will become irrelevant. That means: everything that makes sense to be modeled is modeled, and running code is generated from models without further manual elaboration, i.e., no manually changing generated code and taking it from there. As a rule, you should refrain from reading generated code, or limit yourself to reading its API, unless you are investigating a code generation bug. There is nothing really interesting to see there – that is the very reason why you wanted to generate it in the first place. Build, read, and evolve your models! Generated code is object code.
  • “models aren’t always graphical” – of course not. I have written about that before. The TextUML Toolkit is only one of many initiatives that promote textual notations for modeling (and I mean modeling, not diagramming – see next point).
  • “explicitly call out models vs. views” – in other words, always keep in mind that diagrams != models. Models are the real thing; diagrams are just views into them. Models can admit an infinite number of notations, be they graphical, textual, tabular, etc. Models don’t need notations. We (and tools) do. Unfortunately, most people don’t really get this.
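To make the model/view distinction a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the `Entity` and `Attribute` classes, the two notations, and the generator); it is not how the TextUML Toolkit or any other tool actually works. It only tries to show one model being rendered through two different views and being transformed into running code without manual elaboration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attribute:
    name: str
    type: str


@dataclass(frozen=True)
class Entity:
    """The model itself: plain data, independent of any notation."""
    name: str
    attributes: Tuple[Attribute, ...]


def textual_view(entity: Entity) -> str:
    """One view: a textual notation (syntax made up for this sketch)."""
    lines = [f"class {entity.name}"]
    lines += [f"    attribute {a.name} : {a.type};" for a in entity.attributes]
    lines.append("end;")
    return "\n".join(lines)


def tabular_view(entity: Entity) -> str:
    """Another view of the same model: one table row per attribute."""
    rows = [f"{entity.name} | {a.name} | {a.type}" for a in entity.attributes]
    return "\n".join(["entity | attribute | type"] + rows)


def generate_code(entity: Entity) -> str:
    """Automatic transformation: emit Python source for the entity."""
    params = "".join(f", {a.name}" for a in entity.attributes)
    body = [f"        self.{a.name} = {a.name}" for a in entity.attributes]
    body = body or ["        pass"]  # keep the output valid for empty entities
    lines = [f"class {entity.name}:", f"    def __init__(self{params}):"] + body
    return "\n".join(lines) + "\n"


# A hypothetical model, used only as sample input.
account = Entity("Account", (Attribute("number", "str"), Attribute("balance", "float")))

# The generated source is executed as-is; nobody edits it by hand.
namespace = {}
exec(generate_code(account), namespace)
instance = namespace["Account"]("123", 10.0)
```

Neither `textual_view` nor `tabular_view` is the model; both are derived from it, and either could be dropped or replaced without touching `account` or the generated class.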

The funny thing is that most of the time I see someone citing Harry’s mantra, it is misused.

One misinterpretation of the “code is model” mantra is that we don’t need higher-level modeling languages, as current 3GLs are enough to “model” an application. The fact is: 3GLs do not provide an appropriate level of abstraction for most kinds of applications. For example, for enterprise applications, 4GLs are usually more appropriate than 3GLs. Java (EE) and C# are horrible choices, as shown by the profusion of frameworks needed to make them workable as languages for enterprise software – they are much better suited to writing system software.

Another unfortunate conclusion people often draw from the mantra is that if code is model, then model is code, and thus it should always be possible to translate between them in both directions (round-trip engineering, or RTE). Round-trip engineering goes against the very essence of model-driven development, as source code often loses important information that can only exist in higher-level models. The only reason people need RTE is that they use models to start a design and generate code, but then switch to evolving and maintaining the application by directly manipulating the generated code. That is a big no-no in true model-driven development – it implies the models are not precise or complete enough for full code generation.
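The information loss can be illustrated with a small sketch, again in Python and again entirely hypothetical: the `Property` class and the `to_java_field` generator are made up for this example and do not describe any real tool. The assumption is a generator that maps every multi-valued property to a Java `List` field, which is a common but by no means universal choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    """A model-level property, carrying details a plain Java field cannot."""
    name: str
    type: str
    lower: int = 1                # minimum number of values
    upper: Optional[int] = 1      # maximum number of values; None means "*"
    composite: bool = False       # whether the owner owns the values


def to_java_field(p: Property) -> str:
    """Forward transformation (assumed mapping): single value or List."""
    if p.upper == 1:
        return f"private {p.type} {p.name};"
    return f"private List<{p.type}> {p.name};"


# Two different models: one requires at least one owned item,
# the other allows zero items and does not own them.
owned_items = Property("items", "Item", lower=1, upper=None, composite=True)
shared_items = Property("items", "Item", lower=0, upper=None, composite=False)

# Both collapse to the same generated declaration.
same_code = to_java_field(owned_items) == to_java_field(shared_items)
```

Since two distinct models yield the same source text, no reverse-engineering step that looks only at that text can tell which model it came from. A generator could of course embed the missing details as annotations or comments, but at that point the generated code is merely carrying a serialized copy of the model.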

So, what is your view? How do you interpret the “code is model” mantra?

9 thoughts on “On code being model – maybe not what you think”

  1. Peter Friese

    May 4, 2009 at 12:16am

    Hi Rafael,

    I agree with your opinion that you just cannot use round-trip engineering in MDSD, as computers (still) are not able to abstract. Model driven software development is a forward-only process for this very reason. Fortunately, most people understand this point as soon as you tell them.

    I also think that the mantra “code is a model, too” does not help very much, because it makes people think that code and model are on the same level of abstraction. Of course, there are situations in which code and model are on the same level. Just think of traditional UML modeling (a class diagram is just a different view of the code).

    To me, modeling is a means to express concepts of the real world in a way that both a developer and a business person can understand. It’s really a meet-in-the-middle approach. Domain Specific Languages (DSLs) can help to achieve just that. I am not convinced that UML can deliver on this, which is why I usually use Xtext (http://www.xtext.org) instead.

  2. rafael.chaves

    May 4, 2009 at 12:33am

    Thanks for your comment, Peter.


    “I also think that the mantra ‘code is a model, too’ does not help very much, because it makes people think that code and model are on the same level of abstraction.”

    I think what Harry really meant was that code at one level will be a model to a corresponding artifact in the next (lower) level.


    “Of course, there are situations in which code and model are on the same level. Just think of traditional UML modeling (a class diagram is just a different view of the code).”

    Well, even in the case of UML class diagrams, there is an increase in the level of abstraction. Think of things like associations (with shared/composite aggregation), multiplicities (with ordering/uniqueness), subsetting/redefinition of properties. None of those things can be represented in any of the OO languages I know of.

  3. Ed Merks

    May 4, 2009 at 7:33am

    This is a very interesting post. I completely agree.

  4. Ersin ER

    May 5, 2009 at 9:02am

    Generally “the code” is created by developers’ mental work (and the motivations and reasons stay in those heads). The more frameworks or similar reusable artifacts are used, the more “model” we get, and the system starts making more sense as it represents more abstract things than machine code. A software system should be developed using languages and environments which can express it and drive its realization in the most abstract form possible, so that we can close the gap generally filled by developers’ mental work. This does not mean that we should go crazy with defining models to cover every aspect of systems; it’s a matter of balance.

    On the other hand, a basic distinction between code and model for me is that the latter is/should be built in a more declarative than imperative manner.

  5. Marc

    May 22, 2009 at 11:29am

    “Round-trip engineering goes against the very essence of model-driven development, as source code often loses important information that can only exist in higher level models.”

    An interesting statement; one could compare http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology .

    The idea of RTE seems to be that models can be built inductively by looking at existing solutions as implemented, warts and all. The idea of code as a mere instance of an abstract model – another kind of view – is appealing, but it leads to a kind of Platonic ideal of the model. I want my models to be living things that benefit from the successes and failures of implementation – shaped by the metainformation of interaction with the real world. I do agree that one instance isn’t enough to build a high-resolution model, but I’m not sure that a lot can’t be learned by examining many instances as part of the processes you’re modelling.

    You’re entirely correct that that’s not at all the same thing as equating model and code, of course.

    Great point about the appropriate application of 3GLs btw.

  6. Vladimir

    June 1, 2009 at 2:06pm

    “code is model” is a really good mantra :)
    Just consider all app code, like: java code, java annotations, configs, DDL, SQL, …
    All of them are aspects of software. It should be enough to restore the initial model. Different aspects can be expressed in the same model using different views of the model or by annotating the model with appropriate profile (or equally DSL) definitions. If the model can’t be restored from generated artifacts then it seems like it contains superfluous information.

    For example, I have a UML domain model. It contains annotations for: DB schema (tables, columns), ORM code (Hibernate/JPA/Transactions), Web pages (UI artifacts customization). It can be used to generate Java code, DB schema, services code, and Web pages. This code can then be reverse engineered back to the model (at least at the DB/ORM level, since it’s based on strictly defined specifications).

  7. rafael.chaves

    July 19, 2009 at 11:14am

    Marc, that analogy is quite interesting, thanks a lot. I feel like writing a post about that…
