Friday, February 6, 2015

The shape of things to come


Our new compiler is here!! The stats:

  • Compiler: 3500 loc
  • Extensions: 1000 loc, 15 extensions, about 66 loc per extension
  • Tests: 12 passed
  • Dev time: 3 weeks flat
  • Awesomeness level: not for me to say (the awesomenest level)

But first things first: what was wrong with the old compiler? A lot. What was there was basically me learning Roslyn and amalgamating features whether they were round or square. In other words, the beauty of writing prototypes: they show you all that was wrong with your initial thinking.

And the main thing I learned was not to be overly reliant on Roslyn. Excess is at its core a substitution engine: write the extension, wait for the compiler to complain, modify the syntax tree of the extended code so it's valid C# and Roslyn is all smiles.

Problem is, Roslyn is inconsistent at detecting defects. And don't get me wrong: Roslyn tries real hard and it's absolutely awesome at this. But it still goes haywire in certain cases, and in ways that make the extension writer's job a living hell. It is all about patterns: if you can get Roslyn to report your extension's errors consistently, you are golden. But sometimes it is awful. Take, for example, the current match extension up on the website:

           match(x)
           {
               case 0: do_something();
               case > 10: do_something_else();
           }

In this case Roslyn scoffs at cases without a matching switch and produces a stream of nodes that makes little sense, which forces a lot of parsing, chasing expressions and whatnot. We were promised better. The solution, obviously, is to help Roslyn a bit by feeding it a parse tree with less distortion.

What if we could simply substitute the match keyword with a switch and then process it syntactically? That would certainly make Roslyn all warm and fuzzy, we would be processing a sound, predictable parse tree, and hence we would live happily ever after. Well, fear no evil my friend:

            var lexical = compiler.Lexical();

            lexical
                .match()
                    .token("match", named: "keyword")
                    .enclosed('(', ')')
                    .token('{')
                    .then(lexical.transform()
                        .replace("keyword", "switch")
                        .then(ProcessMatch));

This shouldn't require much explanation: during the lexical pass (i.e., before compiling a parse tree), match a pattern corresponding to our match keyword (no relation). When found, replace the keyword with "switch" and only then call ProcessMatch, which is a Roslyn function taking a switch statement and transforming it into a series of ifs. You can see that code here; it is much simpler and more solid.
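To make the idea of a lexical pass concrete, here is a minimal sketch in plain C# with no Roslyn involved. It is not how Excess implements it: LexicalSketch and ReplaceMatchKeyword are hypothetical names, and the scanner works on raw characters rather than real tokens, so it knows nothing about strings or comments. It only shows the shape of the rule: the word match, then a parenthesized group, then an opening brace, gets its keyword swapped for switch.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Excess or Roslyn.
static class LexicalSketch
{
    // Rewrites `match (...) {` into `switch (...) {` and leaves the rest untouched.
    public static string ReplaceMatchKeyword(string source)
    {
        const string keyword = "match";
        var output = new StringBuilder();
        int i = 0;
        while (i < source.Length)
        {
            if (IsKeywordAt(source, i, keyword) &&
                FollowedByParensAndBrace(source, i + keyword.Length))
            {
                output.Append("switch");
                i += keyword.Length;
            }
            else
            {
                output.Append(source[i]);
                i++;
            }
        }
        return output.ToString();
    }

    static bool IsIdentifierChar(char c) => char.IsLetterOrDigit(c) || c == '_';

    // True when `keyword` starts at `pos` as a whole word (not inside "rematch" or "matcher").
    static bool IsKeywordAt(string s, int pos, string keyword)
    {
        if (string.CompareOrdinal(s, pos, keyword, 0, keyword.Length) != 0) return false;
        int end = pos + keyword.Length;
        bool startsWord = pos == 0 || !IsIdentifierChar(s[pos - 1]);
        bool endsWord = end >= s.Length || !IsIdentifierChar(s[end]);
        return startsWord && endsWord;
    }

    // True when the text at `pos` is a balanced (...) group followed by '{'.
    static bool FollowedByParensAndBrace(string s, int pos)
    {
        pos = SkipWhitespace(s, pos);
        if (pos >= s.Length || s[pos] != '(') return false;
        int depth = 0;
        for (; pos < s.Length; pos++)
        {
            if (s[pos] == '(') depth++;
            else if (s[pos] == ')' && --depth == 0) { pos++; break; }
        }
        if (depth != 0) return false; // ran out of input before the group closed
        pos = SkipWhitespace(s, pos);
        return pos < s.Length && s[pos] == '{';
    }

    static int SkipWhitespace(string s, int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
        return pos;
    }
}
```

Calling LexicalSketch.ReplaceMatchKeyword on "match(x) { case 0: f(); }" gives back "switch(x) { case 0: f(); }", while something like "match = 3;" is left alone because no parentheses and brace follow the word.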
 
The first thing to notice is that the process is now completely declarative: you tell the compiler what to do without having to worry too much about how it is done. Very easy to read and understand. The plumbing (i.e., tree transforming) gets confined to specific methods and only deals with specific parts of the tree.
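As a rough picture of what one of those confined transformations might produce, here is a sketch of lowering match-style cases into a chain of ifs. It is an assumption-laden toy, not the real ProcessMatch: it works on strings instead of Roslyn nodes, MatchLoweringSketch and its members are hypothetical names, and the rule that a label starting with a comparison operator is used as-is while anything else becomes an equality test is my guess at the intended semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; the real transformation works on Roslyn syntax nodes.
static class MatchLoweringSketch
{
    // Turns a case label into a boolean test against the matched expression.
    // Assumed rule: "> 10" becomes "x > 10", a plain value "0" becomes "x == 0".
    static string ToCondition(string subject, string label)
    {
        label = label.Trim();
        bool isComparison = label.StartsWith(">", StringComparison.Ordinal)
                         || label.StartsWith("<", StringComparison.Ordinal);
        return isComparison ? $"{subject} {label}" : $"{subject} == {label}";
    }

    // Emits "if (...) { ... } else if (...) { ... }" for the given cases, in order.
    public static string Lower(string subject, IEnumerable<(string Label, string Body)> cases)
    {
        var result = new StringBuilder();
        foreach (var (label, body) in cases)
        {
            if (result.Length > 0) result.Append(" else ");
            result.Append($"if ({ToCondition(subject, label)}) {{ {body} }}");
        }
        return result.ToString();
    }
}
```

Feeding it the subject "x" with the cases ("0", "do_something();") and ("> 10", "do_something_else();") produces: if (x == 0) { do_something(); } else if (x > 10) { do_something_else(); }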

Second: the compiling process is now divided into passes. First you transform the token stream (lexical), then the syntax tree, and finally you apply semantically-derived changes where needed. And the syntax you use is consistent throughout your entire extension. For instance, a constructor construct looks like this:

            sintaxis
                .match<MethodDeclarationSyntax>(
                    method => method.ReturnType.IsMissing &&
                              method.Identifier.ToString() == "constructor")
                .then(ProcessConstructor);

Or: find me a method missing its return type and with the name "constructor", and apply a transformation to it. It doesn't get any simpler, does it? The result of all this is my old xs language being written in about 600 loc, including the following extensions to C#:

1- JS-style function as lambdas 
2- JS-style function as method, with return type inference 
3- JS-style function inside of code blocks 
4- function as types 
5- method construct with return type inference and automatic visibility 
6- property construct, with type inference, automatic visibility, initialization 
7- constructor construct 
8- Delegate-less event declaration 
9- Syntactic event handlers: on click(args) {} 
10- JS-style arrays: var x = [1, 2, 3]; 
11- typedef construct, supporting generic parameters, etc. 2 variations. 

But more importantly, it shows that the ability of users (not owners) to extend their languages of choice is real, and real easy.
