Sunday, January 27, 2008

DSL Design - that's Domain Specific Language

A Domain Specific Language - for those who don't know - is a bunch of functions named so that writing function calls 'reads' like natural language. DSLs seem to be sprouting wildly in Ruby - most probably because Ruby doesn't require parentheses around function arguments and Ruby programmers are kind of rebels anyway.

For example:
rabbit_jumps_in_the_hole :hole_size => 10

Even if you don't understand Ruby Syntax, you know what the function call does.
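
For the curious, here is a minimal sketch of what might sit behind a call like that. Only the method name and the :hole_size option come from the example; the default size and the returned sentence are made up for illustration.

```ruby
# Hypothetical implementation behind the one-line DSL call.
# The default of 5 and the message text are invented for this sketch.
def rabbit_jumps_in_the_hole(options = {})
  hole_size = options.fetch(:hole_size, 5)
  "The rabbit jumps in a hole of size #{hole_size}"
end

# Parentheses and hash braces are optional, so the call reads like a phrase.
message = rabbit_jumps_in_the_hole :hole_size => 10
puts message
```

The 'natural language' feel comes entirely from the naming and from Ruby letting you drop the punctuation; underneath it is an ordinary method taking an options hash.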

Think of it this way:
  1. Function/methods are really 'verbs'.
  2. Function Options are 'adverbs'
  3. Object Identifiers are 'nouns'
  4. Object Attributes are 'adjectives'
  5. Class Names are 'class names' [gotcha!!!!]
I think we should think of learning one of these DSL things the same way we think about learning a new Language.

This can be either a good thing or very bad.

Size Matters

Which is easier to learn: A language with 10 verbs or one with 1,000?

Just for the heck of it, I recently tried to get a count of the 'verbs' in Rails 2.0.2. I ran egrep -r 'def [a-z]' on all lib directories and came up with:
  • actionmailer/lib 694
  • actionpack/lib 1393
  • activerecord/lib 1134
  • activeresource/lib 125
  • activesupport/lib 577

  • Total 3923
In contrast, the Merb Framework is much smaller:
  • merb 713
  • merb.rb 19
  • tasks.rb 0

  • Total 732
Of course, this isn't fair because Merb doesn't come with an ORM [Object Relational Mapper library (bunch of database access functions - for those really out of it)] [or Active Record Pattern Implementation, for those . . . - well, you know who you are], so you have to add that in.

But Merb gives you a choice of ActiveRecord - with its 1,100 verbs; DataMapper - with about 500 methods; or Sequel - with about 600.

So learning Merb should be easier than Rails because the vocabulary is about 1/3 to 1/2 the size.
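
If you want to repeat the count without egrep, something like the following should get close. It is a rough sketch, not the exact procedure used for the numbers above: the helper names are made up, and it assumes only .rb files matter, whereas egrep -r looks at every file.

```ruby
# Same pattern as the egrep: 'def', a space, then a lowercase letter.
# Like egrep, it also matches 'def' inside comments and strings.
DEF_PATTERN = /def [a-z]/

# Counts the lines of a source string that look like method definitions.
def count_verbs_in_source(source)
  source.each_line.count { |line| line.match?(DEF_PATTERN) }
end

# Sums the count over every .rb file under a directory.
# File.binread avoids encoding errors on files that are not valid UTF-8.
def count_verbs_in_dir(dir)
  Dir.glob(File.join(dir, '**', '*.rb'))
     .select { |path| File.file?(path) }
     .sum { |path| count_verbs_in_source(File.binread(path)) }
end
```

Calling count_verbs_in_dir('activerecord/lib') from a Rails source checkout would be the equivalent of one line of the list above. Expect the totals to drift a little from the egrep numbers because of the .rb-only assumption.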

Synonyms are Bad

Which is easier to Learn: a language with one word for each concept or with two or more?

A programming language or environment isn't meant for composing poetry, novels, or movies. It's supposed to precisely express a procedure. Period. It should be concise. That makes it easier for Programmers to understand.

Case closed. DSLs should be concise, singular, and boring - but very, very accurate.

Corollary:

The Rails Inflector is a mistake in every possible way:
  • It Expands rather than Tightens the vocabulary of the Rails DSL
  • It injects confusion because Programmers now have to worry about singular and plural forms depending on context
  • It doesn't work:
    • 'XMLClass'.underscore => 'xml_class'
    • 'XMLClass'.underscore.camelize => 'XmlClass'
    • 'slave'.pluralize => 'slaves'
    • 'slave'.pluralize.singularize => 'slafe'
  • It wastes lots of cycles doing it - machine, programmer, and learning
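
The XMLClass round trip fails for a structural reason, and a few lines of Ruby show it. The two helpers below are simplified stand-ins written for this example only; they are not the Rails Inflector's actual code, though they reproduce the behavior listed above for this input.

```ruby
# Simplified stand-in for underscore (NOT the Rails implementation).
def simple_underscore(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')  # "XMLClass" -> "XML_Class"
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')      # "UserAccount" -> "User_Account"
      .downcase                                # throws away the original casing
end

# Simplified stand-in for camelize (NOT the Rails implementation).
def simple_camelize(name)
  name.split('_').map(&:capitalize).join
end

puts simple_underscore('XMLClass')                   # xml_class
puts simple_underscore('XmlClass')                   # xml_class
puts simple_camelize(simple_underscore('XMLClass'))  # XmlClass
```

Two different class names collapse to the same underscored form, so no camelize, however clever, can get both of them back. The information is gone after the downcase.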
Distance is Good

I'm talking about the distance between words. For example frog is very close to frogs but far from toads. That makes it easier to tell a frog from a toad in print than in real life.

Good DSL design should not only use expressive and concise identifiers, but should also keep them far apart, especially when the referents do significantly different things.

Again, picking on Rails, the methods update_attribute(name, value) and update_attributes(attributes) are very close together, but one bypasses attribute validation. Can you tell which one by the names? Don't you think it's important to know?
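
One way to put a number on 'distance' is edit distance: the count of single-character insertions, deletions, and substitutions needed to turn one word into another. That choice of measure is an assumption made here for illustration, and the function below is a plain textbook sketch rather than anything taken from Rails.

```ruby
# Levenshtein edit distance between two strings, computed row by row.
def edit_distance(a, b)
  # previous[j] = distance from the first i-1 characters of a
  # to the first j characters of b
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,          # delete from a
                  current[j - 1] + 1,       # insert into a
                  previous[j - 1] + cost].min  # substitute or keep
    end
    previous = current
  end
  previous.last
end

puts edit_distance('frog', 'frogs')                          # 1
puts edit_distance('frog', 'toads')                          # 5
puts edit_distance('update_attribute', 'update_attributes')  # 1
```

By this measure the two Rails methods are as close as frog and frogs, even though they behave differently.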

DSL is Not Documentation

Most DSLs seem to grow more or less organically. The Ruby universe is filled with a lot of apparently useful packages with virtually no documentation beyond an API reference. Almost all of them have fairly reasonable API documentation - which allows 'one' to learn what each of the 'verbs' in the DSL does, but that's like learning to drive a car by reading a Glossary of the Parts! It Just Don't Work.

It's hard as hell to learn a system without some sense of what the thing is supposed to be doing and how it's put together.

Don't believe me?

Figure out a Car from stuff like this:

Wheel - 1. circular object in contact with ground; 2. circular object interfacing driver to directional controls.
Nut - 1. Device for attaching wheel; 2. driver in other automobile; 3. nutritious snack
etc.

That's API doc and that's what you've got when all there is is the DSL.

'nuff for now

2 comments:

Anonymous said...

Why do you feel that a DSL has to read like a natural language? While this can be the case with some DSLs, it does not seem to me to be a requirement.

Mike Howard said...

Actually, that's the point: DSLs shouldn't be 'natural language' and attempts to make them 'more natural' by adding synonyms, tenses, plurality, etc. make them difficult to use.

In other words, I think DSLs should be similar to natural languages, but not as flexible.