Automatic programming from informal specifications

One of my worries about genetic programming is that we may just be shifting the problem of programming to the (equally difficult?) task of problem specification in the fitness function. This would be similar to the move made from assembly language to higher level languages such as Fortran in the 1950s. At the time this was seen as essentially solving the problem of programming, but that clearly isn’t the case, the problem was merely shifted to a different (albeit easier to understand) paradigm. In the 50s there were advantages such as easier to read and debug code, and shorter programs. Now, if we make the complete shift to formal specification for an evolving generator, what will the benefits be?

Initially it seemed to me that this would be a no-win situation. By requiring a formal definition of the problem, non-trivial programs are still going to require non-trivial specifications. Further, the skill of writing non-ambiguous and correct formal specifications is a challenging task requiring a reasonably in-depth knowledge of logic and the sort of mathematics that many shy away from.

What would be the ideal situation? Ideally any average non-programmer would be able to generate a program by doing no more than describing what it should do. That is still infeasible, but if we could automatically generate programs from an informal requirements document, then that would be a useful step! This has more practical potential than is immediately obvious. There has been a rather large amount of research, going back as far as the 1980s, into automatically generating formal specifications from informal, natural language definitions. Some of the results appear very impressive. So, if we are able to generate formal specifications from something close to natural language, and if we are able to automatically generate computer programs that satisfy those specifications, then it immediately becomes obvious that we can combine these two steps. We can generate programs from an informal description with very little expert interference. That is the theory at least.

This is not trivial in practice since there are difficulties in expressing formal specifications such that they provide a fitness function which rewards partial solutions and the generation of the specifications from natural language is farĀ  from perfect. However, these problems are certainly solvable and the solution would be a major step towards an admirable aim.

Tagged with: , ,
Posted in Genetic Programming
2 comments on “Automatic programming from informal specifications
  1. Michael Cox says:

    I was just thinking about your post and thinking about a fitness function for solving encryption, for example.

    It would indeed be very tricky, but more so than working out the decryption? – Hard to say.

    You’d want real words, as found in the dictionary, but also with spelling mistakes. Also slang and other things.
    Then those words would have to make grammatical sense.

    The problem then comes that a genetic solution for one script almost certainly wouldn’t work on another.

    Perhaps that implies that either that multiple inputs would be needed for the training, or that decryption isn’t a good candidate for GAs.

    • tom says:

      This is something I have considered actually, and might write a post about at some point.

      The important thing when designing the fitness function is that there needs to be some way of rewarding partial solutions, in order for the solution to build up gradually. So, assuming we’re just talking about a basic monoalphabetic substitution (cryptogram), then ideally a candidate solution which had one letter correct, should receive a better fitness score than one with none correct. This might make a fitness function based solely upon words quite difficult, since it would require whole words to be correct before rewarding a solution – which would mean you would have to get very lucky with your randomised initial populations in order to decrypt anything. A better approach might be to base the fitness function on a combination of letter frequency, digrams, trigrams and words.

      I believe there have been some attempts at this before, but I haven’t read them in any real detail.

Leave a Reply

Your email address will not be published. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">