Last.FM: Recently listened

Monday, August 27, 2007

Lexer rules need ordering...

I was trying to parse lines like

import <foo/bar.h>

using ANTLR, when I found out that ANTLR's lexer rules have a precedence: rules are tried in the order they appear in the .g file. In my case, I had a rule matching a keyword such as "import" and a rule matching anything that is a word. Playing around with rules and their order, I found that my grammar works, if I write the keyword rule before the word rule, otherwise I get a "rule not reachable"-error.

This is how you get the file name out of the line above:

fragment DIGIT : '0'..'9';
fragment CHAR : 'a'..'z' | 'A'..'Z';

IMPORT : 'import' ;
GT : '>' ;
LT : '<' ;
WORD : CHAR (CHAR|DIGIT)*;
WS : (' '|'\t'|'\n'|'\r')+ { self.skip(); } ;

filename : WORD ('/' WORD)* '.' WORD ;

import_r : IMPORT LT f=filename GT {print $f.text};


Monday, August 20, 2007

Bastei, ANTLR

I haven't posted for a while, time for an update. When not hacking L4Env, I am currently pursuing new roads.

First, I have been playing around with the Bastei OS Framework a little. Bastei is a C++-based framework for managing resources in a multi-server operating system. It has been implemented as an experiment by Norman Feske and Christian Helmuth. My first little contribution was a small Hello-World client-server example that can be used as a Getting-Started-Tutorial. Since Bastei is not available to the public up to now, I don't post it here. But the time will come...

Secondly, I am playing with ANTLR (ANother Tool for Language Recognition), a shiny parser-generator written in Java, which is able to generate parsers in many different languages such as C, C++, C#, Java, Python, ... you name it. I am currently working through Terence Parr's ANTLR Reference which I can definitely recommend for anyone interested in using ANTLR 3.0. Using ANTLR I did two things up to now:
  • I wrote an IDL parser for our L4Env IDL files that generates an abstract syntax tree from those files. Now that the parser works, I could easily write an IDL compiler - given ANTLR's strong template engine this should be pretty nice. I'm however not sure that this is something really new so I'll have a look at other things.
  • For debugging my grammars I wrote Tree2Dot, a tool to convert ANTLR's string representation of ASTs into GraphViz' Dot format. Sure enough this was before I reached page 191 in Parr's ANTLR reference, where I found out that this is already a builtin feature of ANTLR and I could have used it pretty easily. ;)
Now it's time to find out what I can use my newly learned parsing skills for. Several things come to mind:
  • As a first experiment I'm probably going to write a tool that takes IDL files and checks whether all the functions have Doxygen comments attached. If not, we'll add templates to the file. If the tool gets really cool, it will also use existing Doxygen comments and adapt them if parameters change etc.
  • Starting with the IDL parser I'd like to investigate how far we can go generating unit tests for specific interfaces. Typical approaches use programmer annotations to find out what specific parameters to a function are, what the parameters' domains are, and in which order functions of an interface need to be called. However I'm not convinced of the concept of annotations, because programmers are lazy. Therefore I'd like to minimize the amount of additional work posed on them and try to automate this work using static analysis.
  • Norman Feske gave me some other directions related to the Bastei framework:
    • Bastei uses statically allocated message buffers for its IPC framework. One could use a C++ parser to generate a callgraph from Bastei code and therefore try to determine the necessary message buffer size.
    • Bastei's IPC framework implements IPC using C++ streams (which results in code that can be heavily optimized by the C++ compiler). However in contrast to IDL there is no compile-time checking for function parameters anymore, servers need to trust clients to provide the right arguments in the right order and vice versa. Using a static analysis tool, we can look at clients and servers and detect if arguments are passed in a wrong way - this will mainly invoke counting the arguments passed given a certain opcode as well as having a look at the types of arguments.