Thursday, January 13, 2011

The case of the wise flamer: In Memoriam Edsger W. Dijkstra

If you go to my desk at my current workplace, you'll find a pile of books. As of today, the topmost of them is a Hungarian edition of Structured Programming by Ole-Johan Dahl, Edsger W. Dijkstra and Sir C. A. R. Hoare, which was originally published in 1972, although the articles in it are much older.

Most people would think that a book from 1972 has nothing to say about our current affairs. Nothing could be further from the truth. In fact, the things which have stood for nearly 40 years might be the ones that are universal to our profession. And believe me, a lot of them have!

On one hand, Dijkstra could easily be one of the greatest flamers you've ever met. It's enough to look at his article about programming languages - don't forget he got a Turing Award for his achievements in that field! - to know what I'm talking about.

Most people will remember him for Dijkstra's algorithm, or maybe for the disappearance of the goto statement from modern programming techniques. And yet we subconsciously use a lot of his findings whenever we're writing programs.

One of his greatest misunderstandings, I believe, was his rejection of versions and version control systems. He said in an interview in 2001 that "[programs have version numbers] even with decimals like version 2.6 or 2.7, that's nonsense; while version 1 should have been the final product". He didn't see the need to track changes to software, as it was to be implemented only once, perfectly.

He even wrote his articles in this style: on paper, by hand. (In fact, he practiced a kind of blogging this way, which I'm sometimes tempted to take up myself.) He gave photocopies of these handwritten notes to friends and colleagues, who may have distributed them further.

Never forget that he was the strongest advocate of TDD in the early 70s, when Kent Beck, the inventor of unit tests, was most probably still occupied with more childish things (he was born in 1961).

Still, I do believe that his mistake was based on the perception that customers - or whoever orders the software from computer scientists; he used that title consistently - have a perfect, clear idea of what the product should look like, even if it is expressed vaguely. Maybe later we will be able to return and reveal the truths behind this Platonic thinking; still, it's probably not the way most of our industry experiences software specification on a day-to-day basis.

(I do believe this thought has some ground to stand on, although I unfortunately feel too inexperienced yet to prove its pillars.)

Another of his mistakes was that he didn't count on externalities: it may be that the environment of the system changes aggressively during the lifetime of the product, whether in the development or the production phase. To new needs, one must respond with a set of changes, the quicker the better (or at least faster than the competition does).

Also, it may be that our users have different needs than we thought: although we have scientific measurements to guard against this, as with the social sciences, there's always a large margin of error and room for unexpected things to happen.

His style sometimes makes it harder to understand the deep messages behind his writings. Truly a genius of his age, Dijkstra was able to shape our then-newborn industry, yet his findings may have a much longer-term effect than we are able to perceive, even today.

So, go and find some of his influential papers and interviews, maybe even those linked in this article. Then think about it: what if the really important things stand regardless of the centuries, even if a flamer says them first?

Sunday, January 9, 2011

FPA: an old, forgotten method for estimation

(Originally comment on InfoQ)
I don't really use new stuff just because it has an "agile" label on it, even in an agile environment.

An FPA estimation spreadsheet

In the most agile team I ever had - a real agile one (own company, only developers, own product, full commitment, cycles and so on) - and even afterwards, I used a modified version of Function Point Analysis.

What does it say?
- Every user story will surely have some data sources to get data from. Let's say these are Input Forms (IFs), and that creating such a form takes 5 points.

- Every user story will surely have some outputs. Let's call them Output Forms (OFs), and say it takes 4 points to create one.

- Every user story will have to deal with some internal data structures - tables, files and so on. Let's see how many distinct classes (tables, file types) seem to be involved; let's say developing the handling of such a type is about 9 points, and call them Internal Logical Files (ILFs).

- Some user stories will have to deal with systems not within reach; these could be unknown libraries, web service APIs, maybe even the mail system. Let's call them External Systems (EXs), and say handling such stuff is 12 points.

Then, for each of the stories, we have a formula, like:
Creating a login screen = 1 IF, 1 OF, 2 ILF, 0 EX

Add them together, multiplied by their weights:
1 x 5 + 1 x 4 + 2 x 9 + 0 x 12 = 27 FP (function points)

We could also have additional measures, but these basics are enough to make the estimates work.

Now we have to determine how much time it takes to create a single function point. This takes quite some effort to measure; my agile team ran at about 0.5 hours per point, while the enterprise team was sometimes as slow as 2 hours. So, creating a login form with all the bells and whistles takes somewhere between a day or two and more than a week, depending on whether you are in a small, enthusiastic company or in an enterprise cubicle. Makes a lot of sense, even if you don't like the results.
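The whole scoring can be sketched in a few lines. Here is a minimal illustration in Python rather than a spreadsheet; the weights come from the bullet points above, while the function and variable names are my own illustrative choices:

```python
# Sketch of the modified Function Point Analysis described above.
# Weights per category as given in the bullet points.
WEIGHTS = {"IF": 5, "OF": 4, "ILF": 9, "EX": 12}

def function_points(counts):
    """Weighted sum of a story's counts, e.g. {"IF": 1, "OF": 1, "ILF": 2, "EX": 0}."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())

def estimate_hours(counts, hours_per_fp):
    """Convert function points to hours using the team's measured pace."""
    return function_points(counts) * hours_per_fp

login_screen = {"IF": 1, "OF": 1, "ILF": 2, "EX": 0}
print(function_points(login_screen))       # 1*5 + 1*4 + 2*9 + 0*12 = 27
print(estimate_hours(login_screen, 0.5))   # agile team (0.5 h/FP): 13.5 hours
print(estimate_hours(login_screen, 2.0))   # enterprise team (2 h/FP): 54.0 hours
```

Changing the factors is then just a matter of editing the WEIGHTS table.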

Also the factors could be changed.

It's not agile, but this method was able to provide ±10% accuracy; so, after the second iteration, a two-week sprint's deliverables were not expected to be late by more than a day.

Archive: Software Engineering as an engineering discipline

Cross-posted from old blog

I have told this story multiple times; maybe it's worth a blog post of its own. It's mainly a reaction to the recent posts of Dave West on InfoQ and the discussions formed beneath them. Although I can feel what makes someone think this way, I have a different opinion which I would like to share.

I have heard the question many times: is what programmers do engineering, science, art, or something else? When I was about 18, even I had some thoughts that building software is not engineering, but now, perhaps just because I officially became a master of software engineering (OK, in Hungarian it's called "engineer-informatician"), I do think it is.

Why does the question arise? Because our daily job - at least, for a lot of us - is not based on science, but rather on some chaotic finding-your-way process. It rarely involves drawings and science-based (not gut-based) calculations unless you explicitly insist on them, especially not in the enterprise world.

Some say 60 years ought to be enough for an engineering discipline to form, and that therefore this isn't one; I think otherwise. I think it will take us a lot more time to find out what this thing is, even though we have come this far, and even though our profession has roots in ancient Egyptian civilization (have you ever considered that the basis of Egyptian tax administration was a series of calculations based on water-level sensors and other aggregated data?).

Let's start with two questions: what is engineering? What is software engineering?

Let me answer the first question from a personal point of view, and the second with an official one.

Archive: Testing legacy web applications using a black-box method

Cross-posted/archive:Originally a comment on an InfoQ article, later became an article on my personal blog

Recently, a whole module of a legacy web application, written in PHP4 around 10 years ago (and constantly "maintained" since), needed new features.

We're talking about thousands of lines of code within one function, or even within one case of a switch statement.

Most of the time, people don't refactor maintained legacy applications; as somebody told me, "the first rule of support development is: don't change anything other than requested, just add your stuff."

I haven't been able to track the application back to its beginnings, but it's evident to me that if-else branches don't grow to 600 lines by themselves, without human intervention. Somebody has to mess these up, and someone has to have that kind of thinking. This is pretty general in enterprise programming:

  • most of the tasks are about legacy applications

  • people fear to clean things up

  • it's not about development, but adding requested features and fixing bugs.

Also, PHP is a dynamic language, and therefore formal refactoring tools are usually unavailable. For example, PHP refactoring support in NetBeans is basically non-existent.

So, what would you do here?

I reasoned that a system's answer depends only on its input and its context. This seems pretty straightforward:

System ( input, context ) -> output

OK, what is the input of a web application? Of course, its HTTP request! In PHP, it's hard to think of any other input.

What's the context? The context is basically given by two components: the underlying platform, whatever it is (no matter whether you have a framework or just common libraries, together we call these the platform), and the persistent data layer. So:

Web app (request, persistent data) -> answer

(I know, I left out the platform, but in refactoring scenarios the platform should stay the same anyway. In case you change platforms too, there are other complications which we won't talk about this time.)

What's the answer? First, it's an HTML (or XML, JSON, etc.) output. We didn't have to care about that in this particular case. The other output is: changes to the persistence layer. It's unusual for web applications to change anything other than their database and cache layers. So:

Web app(request, persistent data) -> (written-out response, persistent data' )

OK, what to do? We had an old system and we wanted to refactor it into a new system, and the question was: are they equal in functionality?

Question is: Web app == Web app' ?

Let's see what I did:

  • Asked a manual tester to go through every possible combination on the user interface
  • Recorded these into files (serialize($_REQUEST)), or, even better, (serialize($GLOBALS))
  • Asked the DB layer NOT to write anything to the DB (an ugly global variable hack: if the flag is present, only SELECT queries are executed); this way we ensure that we keep a consistent state
  • Recorded every writing operation (so, instead of executing them, took note of them)
  • Ran an algorithm:

    1. load the serialized request,

    2. start recording db,

    3. run the original controller,

    4. collect db recordings,

    5. re-load the request (in case it was modified by the original controller - we could never know)

    6. run the new controller

    7. collect db recordings

    8. see if the two are equal

This way, I could be sure, that in all of the scenarios a manual tester could come up with, both of the controllers behave the same way.
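The eight steps above can be sketched in a compact harness. This is a hypothetical Python reconstruction (the original was PHP), and the RecordingDb class and the controller signature are my own assumptions:

```python
# Sketch of the "blackbox-harness" idea: replay a captured request through
# the old and the new controller, and compare their recorded DB writes.
import pickle

class RecordingDb:
    """Fake persistence layer: reads pass through conceptually, writes are
    recorded instead of executed (the 'ugly global variable hack' above)."""
    def __init__(self):
        self.writes = []
    def execute(self, query):
        if query.strip().lower().startswith("select"):
            return []              # only SELECTs would really run
        self.writes.append(query)  # take note of the write, don't execute it

def run_recorded(controller, serialized_request):
    """Load the serialized request, run the controller, collect DB recordings."""
    request = pickle.loads(serialized_request)  # a fresh copy on every run,
    db = RecordingDb()                          # since controllers may mutate it
    controller(request, db)
    return db.writes

def behave_the_same(old_controller, new_controller, serialized_request):
    """Steps 1-8: same request through both controllers, equal writes?"""
    return (run_recorded(old_controller, serialized_request)
            == run_recorded(new_controller, serialized_request))
```

In the real setup, serialize($_REQUEST) played the role of pickle here, and equality of the recorded write lists is exactly the "see if the two are equal" step.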

After the original recordings, I added a few additional steps:

  1. re-load the request again
  2. enable db writing
  3. run the new controller
  4. display result.

This way I could create an - albeit slower - but seemingly normally functioning version of the software, which did everything it had done previously, and it was verified that the functionality hadn't changed with the new controller.

I called this a blackbox-harness test.

What do you think?

What's this blog?

Back in 2009, I started a category on my personal blog called "Architect Things". The posts covered some of the everyday problems I faced while working on projects. This blog is the continuation of that effort.

Who's this guy?

My name is Adam Nemeth. I'm a software engineer (MSc). You can see my CV on LinkedIn. I usually go by the name @aadaam (which is both my Twitter and my Facebook account).

My main field of interest is applying software engineering theories in real-life situations. Normally I'm on the frontend side (where the users are), using dynamic languages like JavaScript, PHP or Python, which makes my adventures a bit weirder.

Nonetheless, UML doesn't care which language you will implement in, nor do real principles. In fact, it's a good test of them to check whether they're really that universal :)

As you've probably guessed, in the last few years I have been doing software development team leading and large-scale software design for large companies.

What could I expect here?

High-level thoughts on software engineering, including, but not limited to real-life modeling practices, architectural and design patterns, and some general ideas which are not directly about implementation. There are too many people speaking just about those...