I've been working frantically on this swf>.Net>swf compiler (work on the horizon), and right now I'm parsing swf bytecode into the CodeDom. The plan is to be able to round trip from both the CodeDom (actually it will have to be a modified CodeDom) and IL; each has its uses. The problem with the CodeDom is that it is incomplete. There are two categories of things missing - some stuff at the 'top' end, like nested classes etc, and some expression stuff from the 'bottom' end, which is a much bigger pain. This post deals with the bottom end stuff - mostly unary operators and a few binary ones.

To be fair here, all schemes to generically map random high level languages to random high level languages will almost certainly be incomplete. Here is the problem - what do you do when language X has a feature that language Y just doesn't have? For example Visual Basic can't do shifts (like >>, although I've heard VB 2003 can, but anyway). So for a 'metalanguage' like the CodeDom, you have three choices:

1) allow shift operators, VB programs will just have to deal with it.
2) don't allow shift operators, C#, JScript etc programs will not be allowed to use them.
3) don't allow shift operators, and pretend that 'CodeSnippets' (literal text snippets) are a nice compromise.

Personally I like the idea of number one. VB programmers are such smarmy bastards anyway, let 'em hang I say. Ok, that's a joke. VB programmers seem to be very sensitive people, especially sensitive to slights, so I thought I'd try one. In fact I learned to program in BASIC on a Vic20 (great language for kids.. err, 'kids too' I mean). Anyway, I like the idea, especially in .Net, of having a standard library that will emulate every CodeDom expression/tricky concept. That way, languages that are mentally challenged can always fall back on calling something like:

CodeDom.EmulationForDummies.RightShift(leftExpr, rightExpr);

Oh yeah, no semicolon, sorry. Not that I'm singling out VB here, I'm sure there are many languages that can't shift bits. Spanish and English come immediately to mind. Sure that call will be slower than a real >>, but it allows full compatibility for a very common construct. Besides, if it is possible to write programs for your entire life and never need to use a bit shift operator, then there is nothing to worry about, as it won't ever come up.
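To make the fallback idea concrete, here is a minimal sketch of what such an emulation library could look like. It is written in Python only to keep it short, and every name in it is made up for illustration; nothing like it ships with the CodeDom. It assumes shifts can be rebuilt from multiplication and floor division, which holds for plain integers.

```python
# Hypothetical emulation helpers: the names are illustrative only and are
# not part of the CodeDom or any real library.

def right_shift(value: int, count: int) -> int:
    """Arithmetic right shift without using >>: floor-divide by 2**count."""
    return value // (2 ** count)


def left_shift(value: int, count: int) -> int:
    """Left shift without using <<: multiply by 2**count."""
    return value * (2 ** count)


def unsigned_right_shift(value: int, count: int, bits: int = 32) -> int:
    """Zero-filling right shift, treating value as a fixed-width integer."""
    unsigned = value % (2 ** bits)  # reinterpret the bit pattern as unsigned
    return unsigned // (2 ** count)
```

Python integers never overflow, so left_shift here does not wrap around the way a 32-bit int would; a real emulation layer would also have to pin down the width and the handling of oversized shift counts.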

Number two, lowest common denominator only, has obvious problems. This isn't so bad for the higher end functionality - at the conceptual end you can often juggle things around a bit, and you can also require 'CLS compatible' which is a nice standard subset target. At the lower end though, it is real painful. You pretty much have to just reject sections of code. Running a program with missing sections of code tends to lead to problems (though that isn't mathematically provable). Essentially number 2 means you can't (always/usually/ever) map valid code from language X to language Y -- which we may remember, is the point.

Option number three is the CodeDom solution - CodeSnippets, and it is very similar to number two in the end. The solution is to emit text, so instead of

(expr)(rightShiftOp)(expr)

you have

(expr)(">>")(expr)

Well, first thing to notice, VB is still hosed. Before your welling tears interfere with reading, there is a second, important thing. Second thing is (">>") doesn't have a lot of metadata, to say the least. What if you are going to a third language that can do shifts, but it uses a different symbol, or a call for it? It probably won't be attempting a parse of every snippet, just like you wouldn't for their CodeCompileUnits. You can put your own metadata in the CodeSnippetExpression.UserData property, but there is still no way another person's code will ever digest your CodeCompileUnits without tweaking their code to fit your ideas. So really you lose your original goal again, a portable description of code. And you don't solve the VB red-headed bastard stepchild problem.

There is a third more subtle problem here too. The missing binary expressions should fit into the CodeDom's CodeBinaryOperatorExpression class. This essentially is a LeftExpression, an operator, and a RightExpression. The operator is an Enum, which is naturally sealed, so you won't be extending that to include your missing operator. Instead you need LeftExpression, Snippet, RightExpression, which of course is no longer compatible with what the CodeBinaryOperatorExpression is expecting - so you have to convert all three to a snippet. However, when you are generating your CompileUnit, you may need to fill things in later, swap things, read metadata to derive types etc. Hard to do when all you really have is the string "x >> y". It is worth noting that while you can generate C#, VB or JS code from CodeCompileUnits, and you can compile and run CodeCompileUnits, there is nothing in the .Net Framework that actually makes CodeCompileUnits. I assume that is because pretty much every real program out there wouldn't work in the current state.
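A small model of that third problem, in Python rather than against the real CodeDom classes. The BinaryOp enum below is a deliberately tiny stand-in for the real operator enum, and all class names are assumptions made for the sketch. It shows two things: a populated enum can't be extended, and once the expression is flattened to a snippet there are no operands left to inspect.

```python
from dataclasses import dataclass
from enum import Enum


class BinaryOp(Enum):
    """A deliberately incomplete operator set, standing in for the real enum."""
    ADD = "+"
    BITWISE_OR = "|"


@dataclass
class Variable:
    name: str


@dataclass
class BinaryExpr:
    left: object
    op: BinaryOp  # only members of the closed enum fit here
    right: object


@dataclass
class Snippet:
    text: str  # opaque target-language text, no child nodes


def try_extend_enum() -> bool:
    """Return True if the operator enum could be extended with a new member."""
    try:
        class MoreOps(BinaryOp):  # an Enum that has members cannot be subclassed
            RIGHT_SHIFT = ">>"
    except TypeError:
        return False
    return True


structured = BinaryExpr(Variable("x"), BinaryOp.ADD, Variable("y"))
flattened = Snippet("x >> y")
```

With structured you can still reach structured.left and swap or retype it later; with flattened all that remains is the string.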

The solution then, is to use your own AST (eg your own CodeDom, but made more friendly to intermediate code representation). This would be complete regarding your target language(s), and generally easier to work with anyway. You can then map your AST to your target languages, as well as the CodeDom. You still have the problem of an incomplete CodeDom here though, but you do have an easier structure to map from at least. If you extend the CodeDom enough, it could probably become usable as an AST.

What I've done so far in my work, is add the missing classes (derived from CodeDom classes) in a separate namespace, and then just before generation, the compile unit is cloned, and all the custom classes are replaced with CodeSnippets for the (current) target language. It leads to the question, if you have an AST you're happy with, and the CodeDom you generate isn't portable anyway, why bother? Well, you sort of get portability - just you get multiple CodeCompileUnits that are language specific. People can still edit and run that without needing to know about your CodeDom extensions, so it is something. You can still round trip with your own program. You also get the IL code generated by the Framework. In this case I'm mostly interested in bytecode>IL>bytecode so that is a pretty big consideration. Microsoft generates better IL than I do, hard to believe I know.
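The clone-and-replace step can be sketched roughly like this. This is a Python model, not my actual code: ShiftExpr, the SHIFT_TEMPLATES table and the Emulation.* helper spellings are all invented for the example, and the rendering is only as good as it needs to be for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str


@dataclass
class BinaryExpr:
    """Stands in for a base node whose operator the generator understands."""
    left: object
    op: str
    right: object


@dataclass
class ShiftExpr(BinaryExpr):
    """Hypothetical extension node; op is '<<' or '>>'."""


@dataclass
class Snippet:
    text: str


# Assumed per-target spellings; the 'vb' entries call an invented helper.
SHIFT_TEMPLATES = {
    "cs": {"<<": "({l} << {r})", ">>": "({l} >> {r})"},
    "vb": {"<<": "Emulation.LeftShift({l}, {r})",
           ">>": "Emulation.RightShift({l}, {r})"},
}


def render(node) -> str:
    """Render an already-lowered tree as text (enough for this sketch)."""
    if isinstance(node, Variable):
        return node.name
    if isinstance(node, Snippet):
        return node.text
    if isinstance(node, BinaryExpr):
        return f"({render(node.left)} {node.op} {render(node.right)})"
    raise TypeError(f"cannot render {type(node).__name__}")


def lower(node, target: str):
    """Return a copy of the tree with extension nodes replaced by snippets."""
    if isinstance(node, ShiftExpr):  # must be checked before BinaryExpr
        left = render(lower(node.left, target))
        right = render(lower(node.right, target))
        template = SHIFT_TEMPLATES[target][node.op]
        return Snippet(template.format(l=left, r=right))
    if isinstance(node, BinaryExpr):
        return BinaryExpr(lower(node.left, target), node.op,
                          lower(node.right, target))
    return node  # leaves are reused as-is in this sketch


tree = BinaryExpr(ShiftExpr(Variable("x"), ">>", Variable("n")), "+", Variable("y"))
```

Calling lower(tree, "cs") and lower(tree, "vb") gives two language-specific trees, while tree itself keeps its ShiftExpr node, so the structured version is still there for round tripping.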

The way .Net gets around this whole multi language problem is with the IL layer (that is, lower level Intermediate Language). It is a pretty brilliant system actually - you consume programs written in other languages via their interfaces, which conform to fairly generic 'lowest common standards' (and no, I don't mean VB here, cripes, don't be so sensitive!), the CLS. You only have to follow those minimal standards (eg no publicly exposed uints) if you want other languages to consume your code (ok, and you have to rewrite VB to bring it up to this minimal level, but I didn't say that). Better yet, you only need the minimal standards on the face of it - inside a method you can shift left until you run out of bits, and then some, because other languages only need to call code, not run it. The IL that all languages compile to is a generic pcode kind of thing, with tons of metadata. It gets compiled just before it's run, optimized to your machine, so it is very fast. The IL has the ability to do most things asked of it, a superset of most languages at a lower level (though it doesn't inherently do everything - eg no multiple inheritance). Just because IL can shift left, doesn't mean VB has to of course, just it has that option. So a language can produce any IL it is comfortable with, and consume off generic interfaces. It's like sex - the trick to having it with many different people is to avoid any specific commitments. Well, there's also the issue of gaining FullTrust for interop, but we can't cover everything here.

For the record, what are the missing 'low level' things from the CodeDom? The following binary operator expressions:

LeftShift (<<)
RightShift (>>)
UnsignedRightShift (>>>)
ExclusiveOr (^)

All unary operator expressions (I have no idea why these aren't in, is there a VB lite I don't know about?):

Increment (++)
Decrement (--)
UnaryPlus (+expr)
UnaryMinus (-expr)
LogicalNegation (!)
BitwiseComplement (~)

There are some higher level things too - nested classes, readonly etc, but these mostly seem to have fairly simple workarounds. I can say this bravely because I'm not doing that part yet.

If you want to read a most excellent book about .Net compiler construction, I can't recommend ('enough' coming after title) John Gough's "Compiling for the .NET Common Language Runtime (CLR)" enough. It is a fantastic book. Most compiler books seem to have 13 chapters dedicated to scanning and generating AST's, and then when it comes to actual design decisions, that is "left as an exercise for the reader, but here is an example that is great for addition of integers". This book however, covers everything important and skips everything that (in fact) has almost nothing to do with writing a compiler. It is based on writing a Pascal compiler, which I thought I wouldn't like (another one of those god damn languages I don't use), but it is actually perfect. Writing about a C# compiler wouldn't help much, because IL is so C# already. With Pascal there are enough tricky mappings that you really get a feel for the art, as well as the science of it. I'm assuming of course that if you read this far, you are interested in the subject. If you are just scanning, hoping for one more sex joke, well sorry I don't have one. But Redd Foxx does. A woman walking with a friend sees her husband coming out of the florist's with a dozen roses. "Damn, now I'm going to have my legs up in the air all week.", to which her friend replies, "Why don't you just get a vase?".

PS I know there are many VB programmers out there wanting to comment on what an idiot I am. For sure. VB is faster, it invented the word rad, it's used by millions, even chickens, it can do all this stuff, and all that stuff too, my facts are just wrong etc etc. I know that, I'm just being silly. I totally respect VB, and VB programmers, really. I say this because I'm somewhat fearful of full bore VB wrath overrunning the comment section in here. Fortunately I just installed that spam guard thingy. The copy-the-number-into-the-textbox step should keep the majority of them at bay.

posted on Thursday, November 06, 2003 2:36 AM
Feedback
  • # re: The CodeDom
    darshan
    Posted @ 11/6/2003 4:14 AM
    Now that was a fun read. lol!

    Personally I have used a flavor of VB, VBScript quite a bit with ASP. And yeah the lack of unary operators is REALLY frustrating. Also you forgot the funky On Error Resume Next error handling!

    Btw, What on earth are you doing posting an essay on VB bashing (erm, the CodeDom) at 2:36 A.M.?! :)

    cheers,
    darshan

  • # re: The CodeDom
    Robin Debreuil
    Posted @ 11/6/2003 4:44 AM
    Doing? Why drinking of course : ). What are you doing reading a blog of ill repute, in what is probably a beautiful afternoon? You have one of the most brilliant minds I know of Darshan, I'd avoid polluting it here if I were you. Then again, if it is this or VBScript, crack a beer and pull up a chair...

    They didn't have 'Resume Next Error' last time I programmed VB, it just happened automatically in those days. I did teach a VBA Excel course last year though. A sure sign of a crappy language is when nothing you write in front of a class full of 'students that once respected you' will compile.

    There is the worst thing about VB - it is laughably childish, yet I can't understand it anymore. What else does that leave me, but to make fun of it?

    Dirty pictures coming soon.

    Your friend,
    Robin

  • # re: The CodeDom
    Ahmet Zorlu
    Posted @ 11/6/2003 6:47 AM
    a swf > clr > swf compiler-decompiler is a quite powerful concept. MS engineers also might be working on a similar project. Who knows maybe Sparkle (which comes with Longhorn) will export (and/or import) swf files.
    I have used CodeDom just for generating class stub code. Code generated with JScript .NET CodeDom provider can be converted to AS 2.0 code with some text manipulation. I haven't dealt with all those low-level complexities (just used code snippets when things get complicated). For sure, one can do a lot with System.CodeDom and xsd.exe.
    Best of luck with your project,


  • # re: The CodeDom
    Robin Debreuil
    Posted @ 11/6/2003 3:19 PM
    Thanks for the comments Ahmet, I came across your excellent overview of the codeDom many times while working with all this:
    http://www.zoode.org/index.php?m=200309#52

    I hadn't thought of just transforming jScript, great idea... One of the other things that is going in to this is a stub library for flash (eg movieclip class, Sound class etc), that way you can get intellisense (and testing, docs, vss, etc etc) in IDE's that support it, and you can compile to IL and then just go bytecode to bytecode. I really want to do Flash work in C#, in Visual Studio - that is the real motivation here ; ).

  • # re: The CodeDom
    darshan
    Posted @ 11/7/2003 6:22 AM
    I guess that was the server timestamp, cause your other comment came at unbelievably 4:44 A.M.! :)

    Looking forward to the dirty pictures, of the err, Vic20 :)

    P.S: Twice now i have forgotten to enter the spam guard number. An alert reminder would be useful...

  • # re: The CodeDom
    Robin Debreuil
    Posted @ 11/7/2003 4:33 PM
    Hey Darshan,

    A reminder for a blank entry would be a great idea, I'll try to put that in. Thanks : ).

    PS. I generally stay up nights and go to bed once the kids are off to school. I always thought it was just being in the wrong timezone, but then it was the same in China. Daytime is just too interesting to get any work done I guess (although not in Manitoba - the only noticeable difference between day and night here is a solar calculator works better in the daytime).

  • # re: The CodeDom
    darshan
    Posted @ 11/7/2003 9:35 PM
    Wow, that adds a new meaning to the expression, "Coding in the dark" :) I am usually asleep by 11. I think i'll try that sometime...

    cheers,
    darshan



  • # re: The CodeDom
    Joe Ward
    Posted @ 3/10/2004 7:21 PM
    Actually you can use CodeDom to create nested classes. CodeTypeDeclaration has a Members property which is the collection of members of the class. To create a nested class, simply create a second CodeTypeDeclaration and add it to the Members of the first CodeTypeDeclaration.

    I have used this with C#, but I suspect it works with other generated languages as well.

    Joe

  • # re: The CodeDom
    Robin Debreuil
    Posted @ 3/11/2004 4:15 AM
    Thanks Joe,

    I was just working on members and types in the last few days and noticed CodeTypeDecl inherited from CodeTypeMember - oops. Well I'll have to take that part back then ; ).

    I've actually kind of given up on using the codeDom for the AST - there are also a lot of tiny details missing (like some specific places to attach custom attributes for example). Anyway I've just gone with a kind of 'csharp dom' and I figure if I really want the code dom it is close enough that it shouldn't be a chore to get. The CodeDom is a great concept though, I hope it fleshes out a bit in the coming years...

    Thanks again,
    Robin

  • # re: The CodeDom
    Robin Debreuil
    Posted @ 3/11/2004 4:55 AM
    I was also thinking that it should be possible to get all the unary operators by wrapping the expressions in a method that returns a value. So ++x becomes PreInc(ref x), and x++ becomes PostInc(ref x) (where it returns the original x value, not the incremented one). Perhaps you could set that up in such a way that C# would get unary operators - maybe with #if or something.

    At least it should be possible to take any C# code and generate a valid codedom structure that way. Then again, I might well be forgetting something...
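A rough sketch of the wrapper idea from the last comment, in Python. Python has no 'ref' parameters, so a small mutable cell (the hypothetical Ref class below) stands in for one; pre_inc and post_inc are the PreInc and PostInc of the comment under that assumption.

```python
class Ref:
    """A one-slot mutable cell standing in for a C# 'ref' parameter."""

    def __init__(self, value):
        self.value = value


def pre_inc(cell: Ref):
    """Emulates ++x: increment, then return the new value."""
    cell.value += 1
    return cell.value


def post_inc(cell: Ref):
    """Emulates x++: increment, but return the value from before."""
    original = cell.value
    cell.value += 1
    return original
```

Since both are ordinary method calls that return a value, they fit wherever an expression does, which is what would let a generator express them without a dedicated unary operator node.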
