Wednesday, May 18, 2011

Tuesday, April 26, 2011

Creating a Rails 3.0 Gem

Building a Gem for Rails 3

This is a 'work in progress' - any and all corrections, comments, suggestions
welcome!


Believe it or not, it’s not that hard.

Here’s the basic outline:

  • lay out a basic Ruby gem, with the normal directory structure
  • decide how you need to hook into Rails
    • if you need to patch some of Rails internal structures - such as
      add some functionality to ActiveController - then you need to write
      a Railstie
    • if you need to add a controller, a model, view, rake task or a generator -
      then you probably should use an Engine. [the difference between a Railtie
      and an Engine is that an Engine is a Railtie with more stuff]
    • Or, if you want to embed an entire Application into another, you can use
      subclass Rails::Application. If you need to write an Application, then
      you need to find somebody who knows how to do it.
    • Or, if you want to make a Plugin, you should forget it and just build a gem
      and either implement a Railtie or an Engine
  • build, test as usual. Below I’ll show how you can hook your local, development
    gem into a locally run Rails app.
  • package and ship out to github.com and rubygems.org

In everything which follows, I’m assuming that you are adding functionality to
Rails which requires some sort of ‘initialization’.

What does ‘requires initialization’ mean?


Ok - when does your gem ‘require initialization’?

  • if you need to run a rake task or a generator to install some stuff in order for
    your gem to work - then it ‘requires initialization’. In fact, you should probably
    create an Engine.
  • if you’re adding controllers, views, etc, then you need an Engine
  • if you want to monkey patch ActiveController or one of the other basic Rails
    classes, then you will want to ‘include MyGemModule’ into that Rails class when
    it is autoloaded. For that you need to insert yourself into the autoload sequence
    and so you ‘require initialization’. Specifically, you need at least a Railtie.
  • finally, if you’re not doing any of this and there is some way for your gem
    to provide services without any startup initialization and without living in any
    namespace Rails knows about
    , then you don’t need initialization.

So, the answer is - pretty much all gems which will extend Rails are going to
‘require initialization’ and your initialization stuff goes into your Railtie
or Engine.

So …

What goes into your Railtie / Engine


Surprisingly little - but figuring out what that ‘little’ is can be
daunting.

Here’s the Railtie I wrote for my manage_meta gem:

module ManageMeta
  class Railtie < Rails::Railtie
    initializer "application_controller.initialize_manage_meta" do
      ActiveSupport.on_load(:action_controller) do
        include ManageMeta
      end
    end
  end
end

There’s a lot going on here in 9 lines (typical Ruby compactness!)

Before going into this code, it will help to put it in the context of how
this gem is structured.

  • the gem and github repository are both named manage_meta
  • all the code which does something is in manage_meta/lib
  • the file which is required is named manage_meta/lib/manage_meta.rb. All it does
    is require files from manage_meta/lib/manage_meta/.
    • It always requires manage_meta/lib/manage_meta/manage_meta.rb: this allows
      me to write unit tests which are independent of Rails
    • It conditionally requires the Railtie.

Here it is:

require 'manage_meta/manage_meta'
require 'manage_meta/railtie' if defined? Rails

OK - so all we need to do is get our Rails app to include manage_meta/lib/manage_meta.rb.
But this is a distraction right now. We’ll get back to it when we go
over the Rails boot process.

By the way, the names are significant!

  • The root directory manage_meta, the top-level require file manage_meta/lib/manage_meta.rb,
    and the subdirectory of lib manage_meta/lib/manage_meta/ need to all be
    the gem name.
  • I think that the Railtie (or Engine) really needs to be named railtie.rb
    (or engine.rb) in order for Rails to find it. [this is a guess which I may
    never resolve because: 1. it works when I do it like this and 2. there’s no reason
    not to. If anybody can confirm this, I’ll remove this caveat and give them credit]
  • module ManageMeta - I’m extending my module with some Rails specific code.
    While not visible here, this is conditionally included in lib/manage_meta.rb

OK, back to the Railtie:

  • module ManageMeta

    Your gem must be namespaced to a Module and you have to define your Railtie (or Engine)
    within that module.
  • class Railtie < Rails::Railtie (or class Engine < Rails::Engine)

    Sure, you can call it Bob if you want, but why bother? It’s real name is
    ManageMeta::Railtie or MyGem::Railtie - which is safely namespaced, so there
    won’t be any conflict here.

    The important thing is that this get’s you all the stuff in Railtie - most importantly,
    for what we’re talking about here, this is where you get the ability to initialize
    by calling initializer [defined in rails/lib/rails/inializable.rb (see the pattern?
    everything is a gem)]
  • initializer “applicationcontroller.initializemanage_meta” do …

    initializer takes a name, a block, and (optionally) a couple of options. The
    option keys are :before and :after and we’re going to ignore them for now.

    It builds an Initializer instance and stuffs it into the array-like object named
    initializers. I say it’s array-like because all the Initializer instances are
    in sequence, but the options allow anyone in the know to place their specific
    Initializer.

    Let’s just take it on faith that as Rails boots, it goes through initializers
    and runs the code blocks we pass in.

Here’s the initializer code:

def initializer(name, opts = {}, &blk)
  raise ArgumentError, "A block must be passed when defining an initializer" unless blk
  opts[:after] ||= initializers.last.name unless initializers.empty? || initializers.find { |i| i.name == opts[:before] }
  initializers << Initializer.new(name, nil, opts, &blk)
end

  • ActiveSuport.onload(:actioncontroller) do ; include ManageMeta ; end

    Finally we get to the place we were going.

    ActiveSupport seems to be pretty much a support library which just lays around like
    a sleeping dog until you poke it to do something. Then it does it and goes back to
    sleep.

    One of the things ActiveSupport provides is support for creating queues of code
    blocks which can be ‘run later’. This support is defined in
    active_support/lib/active_support/lazy_load_hooks.rb - which defines 3 class
    methods:

    • on_load(name, options, &block) - which is we use to add our code to a hook
    • runloadhooks(name, base = Object) - which runs all the defined hooks. We don’t
      touch this - but it’s really important.
    • execute_hook(base, options, block) - which we never touch - it’s used by
      run_load_hooks to actually run the code block on the object

So, here’s the short version of what those 9 lines do:

It adds the statement include ManageMeta to the autoload sequence which is triggered
when ActionController is first loaded.

It does it by adding that code to the on_load chain keyed to the name :action_controller.

Let’s Try to Generalize This


OK - so how did I know how to do this?

Code crawling - of course.

I grep’ed the Rails 3.0.6 gems for calls to run_load_hooks and found the following
keys defined:

:action_mailer
:action_controller
:action_view
:active_record

:i18n
:before_initialize
:before_eager_load
:after_initialize
:before_configuration

The first four (action_mailer, action_controller, action_view, and active_record)
all call run_load_hooks at the end of ::Base. So if you want or need
to hack these base classes, you just copy the code above and use the appropriate
name.


So far I haven’t found a need to hook into any of the other 5, so I’ll just leave
those as an exercise for the reader. [send me a link to what you find and I’ll
add it here]

But but but … What about Engines?


Well, one place you need an Engine if you have some stuff in the app directory.

So what you do is add an app directory to your gem, structure it like a Rails
app directory - and it “just works”.

Another place is if you want some rake tasks.

What you do then is create a directory called manage_meta/lib/tasks/ and put
your rake tasks in files called a_task.rake - and it “just works” too.

Another place is when you need a generator.

Here you need to call the generate

I haven’t messed with building a generator yet - but it’s on the list.

As soon as I get to it (and have at least a less vague idea of what to do), I’ll
add some stuff here

But … But … But … How does Rails Find my Gem and All this Goodness?


Glad you asked.

The answer is “The bundler did it!”

If you look in your Rails app in config/boot.rb, you will find a file which
contains something like:

require 'rubygems'

# Set up gems listed in the Gemfile.
ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)

require 'bundler/setup' if File.exists?(ENV['BUNDLE_GEMFILE'])

Here’s an easy way to see what this does.

This is my Ruby environment in the top directory of a Rails site I’m working
on. [I’m using rvm, so it shows the ruby and stuff I have set up for this site]
This is just simply running irb, so the bundler setup stuff isn’t run.

Here’s the default $LOAD_PATH

mike:clovetech2 mike$ irb
ruby-1.9.2-p180 :001 > puts $:.join "\n"
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1/x86_64-darwin10.7.3
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/vendor_ruby/1.9.1
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/vendor_ruby/1.9.1/x86_64-darwin10.7.3
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/vendor_ruby
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/x86_64-darwin10.7.3
 => nil

Here’s what $LOAD_PATH looks like after bundler/setup is required. All those
extra files are pulled in from the Gemfile.

Most of the new stuff are gems which I’ve pulled from rubygems.org and which
are stuffed rvm’s gemset for the Ruby I’m using. It also adds in my development
gems. More below

mike:clovetech2 mike$ rails c
Loading development environment (Rails 3.0.6)
ruby-1.9.2-p180 :001 > puts $:.join "\n"
/Users/mike/Rails/clovetech2/lib
/Users/mike/Rails/clovetech2/vendor
/Users/mike/Rails/clovetech2/app/controllers
/Users/mike/Rails/clovetech2/app/helpers
/Users/mike/Rails/clovetech2/app/mailers
/Users/mike/Rails/clovetech2/app/models
/Users/mike/Rails/Mikes-Gems/use_tinymce/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/webrat-0.7.3/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/sqlite3-ruby-1.3.3/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/sqlite3-1.3.3/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/rails-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/railties-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/thor-0.14.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/nokogiri-1.4.4/lib
/Users/mike/Rails/Mikes-Gems/manage_meta/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/database_cleaner-0.6.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/autotest-rails-4.1.0/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/annotate-models-1.0.4/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/activeresource-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/activerecord-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/arel-2.0.9/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/actionmailer-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/mail-2.2.15/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/treetop-1.4.9/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/polyglot-0.3.1/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/mime-types-1.16/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/actionpack-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/tzinfo-0.3.26/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/rack-test-0.5.7/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/rack-mount-0.6.14/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/rack-1.2.2/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/erubis-2.6.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/activemodel-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/i18n-0.5.0/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/builder-2.1.2/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/activesupport-3.0.6/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/abstract-1.0.0/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/ZenTest-4.5.0/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@global/gems/rake-0.8.7/lib
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/bundler-1.0.10/lib
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1/x86_64-darwin10.7.3
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/vendor_ruby/1.9.1
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/vendor_ruby/1.9.1/x86_64-darwin10.7.3
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/vendor_ruby
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1
/Users/mike/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/x86_64-darwin10.7.3
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/actionpack-3.0.6/lib/action_controller/vendor/html-scanner
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/rack-mount-0.6.14/lib/rack/mount/vendor/multimap
/Users/mike/.rvm/gems/ruby-1.9.2-p180@clovetech2/gems/rack-mount-0.6.14/lib/rack/mount/vendor/regin
 => nil

Take a look at /Users/mike/Rails/Mikes-Gems/use_tinymce/lib
and /Users/mike/Rails/Mikes-Gems/manage_meta/lib. Those are my development
gems.

bundler has hooked my development gems into the load path.
I pulled that off by specifying those gems using:

`gem "manage_meta", :path => "~/Rails/Mikes-Gems/manage_meta"`

Take a look at the bundler site’s
for a discussion of the Gemfile
options.

And Wait … There’s More


Notice that only the lib directories for each included gem are added to
the $LOAD_PATH.

So how does the Engine get to the app directory and how does rake find
rake tasks?

The answer - as usual - is convention.

Directory structure is important. app has to be locatable in a standard way
using File.expand_path. For example, I strongly suspect that Rails::Engine
contains something which effectively does:

$LOAD_PATH << File.expand_path('../../../app', __FILE__)

(it doesn’t look like this - Rails is far more generic and self-programming than
something this straightforward)

Anyway - the thing to realize is that - for mere mortals - it’s best to stick
to the “correct” directory structure and naming conventions.

So if you build it and it doesn’t work - check your names and directory structure.

A Little about Initializers


railties/lib/rails/initializable.rb defines two classes, some class methods,
and an instance method:

The classes are:

  • Initializer - which is a named container for a block of code. It has provision
    for a context [which turns out to be a binding in which to run the code block]
    and before and after properties which are used to provide a sorting order.
    (more below)
  • Collection < Array - which is an array which can be sorted using the tsort
    method in the standard Ruby library. Sorting uses the before and after
    properties.

The class methods are:

  • initializers - which is a Collection
  • initializers_chain - which constructs an initializers instance for the class
    which calls it. Specifically, it consists of all ancestor classes and modules
    which respond_to? initializer [in reverse order, so the the initializations
    are processed from the top of the class/module hierarchy down to the lowest
    descendent (which is self)]
  • initializers_for(binding) - returns an initializers chain where each
    Initializer instance is bound to binding
  • initializer(name, opts = {}, &blk) - whom we’ve seen before up here
The instance method is:
  • initializers - constructs and caches the initializer_chain constructed for this
    instance of the class which mixes in Rails::Initializable.
All in all it’s really clever, remarkably flexible and hellish to unravel.

But, once you know it’s there and how to find the keys you need, it makes a lot of
sense and is clearly very efficient, flexible, and powerful

Saturday, December 25, 2010

Infinite Recursion in Parser Generators

Well, I've stuck my foot in it again and doing something which doesn't make a lot of sense.

Skipping the details, I decided I need to write a parser in PHP and the language I'm designing is embedded in PHP - which has a complex syntax and . . . anyway, one thing led to another and I've ended up writing kind of a parser generator in PHP.

It's not really a parser generator - it's more like a programmable parser where the program is a grammar specification.

So, I broke out the Dragon book and started reading, built a programmable recursive descent parser framework object and a hand coded parser for language grammars so I can can program it and a programmable lexical scanner - all in PHP and it all works pretty well.

And then . . .

I couldn't solve my problem with it.

Why, you ask?

Well, the problem I have cannot be solved by a parse tree created from a right-recursive grammar - which is what the book says a recursive descent parser needs to process.

Why?

Because when a recursive descent parser hits a left recursive production (which is what I need for my problem) it goes into an infinitely deep recursion.

Why does it do that?

It's stupid.

It turns out that no only will simple productions like: a : a TERMINAL ; create infinite recursions, but various, well hidden, mutual recursions will as well.

So - having faith in the Book - I decided maybe I need something which handles left recursive grammars. So I read and read and thought and thought and - as usually happens - I got tired, went to bed, and woke up this morning with a realization:

"It's not the recursion dummy, it's because processing non-terminals don't eat tokens!!!!!"

If that doesn't mean much to you - that's OK. The rest of this post is a boring explanation of what's happening and how to fix it.

First of all - why isn't it obvious from the book? Because it's not in there because:
  1. The book defines a mathematical formalism to describe language structure and parsing
  2. Like good mathematicians, they then ignore the actual problem and get buried in the formalism. And then . . .
  3. They come up with ways to solve problems in the formalism using programming techniques and computer constraints available at the time they are working
  4. The 2nd, 3rd, etc generation of 'students' become teachers and so they just teach the formalism in the computing context of the time of the original work
My dragon book is copyright 1977. Torben Ægidius Mogensen's "Basics of Compiler Design" is copyright 2000 through 2010 [nicely written, by the way] and the syntax analysis is a rehash of the stuff in the Dragon book [to be fair, I didn't read it all, but this is true to the margin of error inherent with a quick skim]

Believe it or not, things have changed.

The Apple 2 computer didn't exist in 1977 (I don't think it did. I got mine in 1979 or 1980) and it maxed out a a whopping 64 Kilobytes of RAM [that's 1024 bytes]. The processor executed one instruction about every couple of microseconds. In other words, both memory and speed were very very limited, so a lot of work went into algorithm design - at the expense of clarity and simplicity of code.

As a result, the compiler generators tend to avoid recursion ["function calls are expensive and take a lot of RAM"], but rather tended towards memory and speed efficient algorithms. As a result, the compiler generator section of the Dragon book is heavy into table driven parsers using conventional, non-recursive, non-functional programming techniques.

And - finally getting to the point - they are so deep into formalism and computing environment, they never actually get to the point of "what causes infinite recursion in parsers".

Well, here's the answer: any algorithm which revisits the same non-terminal without consuming a terminal symbol will infinitely recurse.

Huh?

This highlights another problem in understanding compiler generation: the compiler-eze terminology stinks. It emphasizes the algorithms, not the problem we're trying to solve.

So, here's what the Parsing Problem is:

Given a string of characters, does it make sense in a specific, grammatical language?

OK - that's not specific enough to answer. So let's make it more concrete:

First we will define a bunch of things we will call words and symbols. A word will be a string of 'letters' without any spaces in them. In English we also allow hyphens, so 'cat-food' could be classified as a word. In PHP a word might be a reserved word - 'foreach' or 'if' - or something like that. Anyway, we decide how to find these things in a string of characters.

We're going to call the things we find 'tokens' and it's the job of the 'lexical analyzer' to eat the string of characters and spit out an ordered list of 'tokens'.

These tokens are what the Language Grammarians call 'terminals' or 'terminal symbols'.

I'd rather call them 'tokens' or 'words' because that puts the focus back on what they are in the language. The term 'terminal' puts the focus on the activity of the parser - which we haven't gotten to yet.

Now, you might try to build a grammar description using only 'tokens', but it would get pretty large pretty fast and it would be really limited.

So you need something else. You need things which represent parts of the language. For example, you might need something called a 'sentence' [starts with a capitalized word and ends in a terminating punctuation mark: . or ! or ?] and maybe a 'paragraph' and maybe . . . well you get the idea.

These things which represent parts of the language can be composed of tokens or other parts of languages. In fact, in order to be really useful, these parts need to be able to refer to themselves as part of their definition - that is 'be recursively defined'.

For example, lets say I have only four words: A, B, C, and D. I also have a couple of symbols, say AND and OR. That's my whole vocabulary.

Now let's say I want to construct sentences. I might say something like:
sentence : A AND B | A OR B | C | D ;

where I'm using ':' to mean 'defined as' and '|' as 'or it might be' and ';' for 'that's all folks'.

But this is kind of limiting. So let's say I want to build more sentences than I can list using only words and symbols.

word : A | B | C | D;
sentence : sentence AND sentence | sentence OR sentence | word ;

In compiler-eze, these parts of sentences are called 'non-terminals' - again, putting the emphasis on the process of parsing [the parser can't stop on a non-terminal] rather than on the structure of the language. I'm going to call them 'fragments'.

Now, there are two ways I can use a grammar:
  1. I can build sentences using it - which you do all the time: writing, speaking, creating programs, etc.
  2. I can transform strings of characters (or sounds) into sentences so I can understand them - this is called 'parsing'
Before we get to parsing, let's look at how we can use the grammar to create a sentence.

Let's say I want to build a sentence - but I really don't care what it means, only that's it's grammatically correct.

I'll start with the fragment sentence. But this doesn't get me a string of characters. Grammars can only build sequences of 'fragments' and 'tokens'. Tokens are made up of sequences of characters - which is what I want - but 'fragments' aren't: they are made up of 'fragments' and 'tokens'.

So, in order to build a character string - or say something in the language - I have to get rid of all the 'fragments' so that I have a string of 'tokens' which I can (at least theoretically) feed to the un-lexical un-scanner which will produce a string of characters - which I can then print in a book.

So how do I proceed? (the arrow (->) means 'replace a 'fragment' on the left with one of the alternatives of the right side of the definition of the 'fragment' in the sequence and write it on the right side of the arrow.) (which is easier to do than say)

sentence -> sentence AND sentence -> A AND sentence -> A AND D

and now I'm done. I have 'produced' a sequence of 'tokens' [TERMINALS in compiler-eze]
which I can un-lexical analyze to produce a sequence of characters.

Now in compiler-eze, the alternatives on the right side of the definition of 'sentence' are called 'productions', because replacing a 'fragment' by one of them 'produces' something which is grammatically correct.


Ok - this is pretty straight forward, if boring. So let's turn to the 'parsing'. That is, given a string of characters, is it a grammatically correct sentence?

The mathematicians would say 'it's grammatically correct if (and only if) there is a sequence of replacement operations I can find using productions which will generate the sentence'. So - as they would have it - they have 'reduced' the problem of 'parsing' to finding a sequence of productions which will produce the sentence.

How do we do that? The Dragon book starts by analyzing algorithms, but let's take a different approach: let's look at what we do when 'parsing' a sentence somebody says or that we've just read.

What I think you do (or we do) is look over the sentence and divide it up into chunks which make some sort of sense. Like 'Joe ran through the forrest'. Well, what's this about? 'Joe'
What did he do? 'ran' Where did he do it? 'through the forrest'. Stuff like that.

Let's formalize this procedure:

First we'll lexically analyze the sentence: for 'Joe ...' this amounts to classifying each word according to its possible uses:
  1. 'Joe' - is a noun and a name. It can be used in a subject or the object of a phrase
  2. 'ran' - is a verb. It can be used as a 'verb', as part of a predicate, part of a compound verb, or a phase ['seen to run']
  3. etc
Then we start parsing by examining the first token: Joe. Some sentences start with a noun, so we put 'Joe' on the shelf and look at the next word to see if it fits with how sentences which start with nouns are constructed. etc.

The point is, we are scanning from left to right and trying sentence forms to see if they fit
the noise we just heard or read. [left to right, right to left, up to down - doesn't matter so much as the fact that it's really focusing on one word at a time in a consistent order].

So, in parsing we have two scans going on:
  1. we are scanning the token stream
  2. we are also scanning across a production to see if these tokens fit into it
The 'parse' terminates when the token stream is exhausted and all the tokens have been stuffed into 'fragments' OR something won't fit into any fragment. This is controlled by the sequence of scans across productions. Each time we start scanning, we start with some 'fragment' definition and exhaustively try all of it's productions to find a fit with the token stream - remember that we are scanning the stream left to right. So the only way to get into an infinite recursion is to find a production scan which does not terminate.

Scanning a production terminates on one of three ways:
  1. a segment of the token stream matches the entire production - then the production is accepted. Accepting means that we don't have to look at those tokens any more and we can make a record of the fragment we recognized. [in compiler-eze we then 'reduce' by replacing the production by it's non-terminal in the non-terminal definition (again, emphasis on algorithm rather than process]
  2. a token doesn't fit, in which case the production is rejected.
  3. the production can be empty - and so it's trivially satisfied. [I forgot this early and have to think some more about it. Golly! that's meat for another post on this topic]
So if - in our scan across the production - we never look at any 'tokens', we will never terminate the scan. How can this happen?

Here's an artificial example:

frag1 : frag2 | WORD1 ;
frag2 : frag1 | WORD2 ;

No matter what I scan, my production scan will first look for frag2 which will look for frag1 which will look for frag2 which will . . . and I will never examine a token, so I will never reach the end of the token stream.

To go to a less artificial example, let's go back to my A, B, C, D language.

I'm given with a sentence A AND C and I want to see if it can be produced by the grammar. I decide to 'run the grammar backward' to see if I can find a sequence of substitutions which work.

OK, I start by guessing it's a 'sentence', so I write down:

sentence

Now I say - 'what production might this be? Let's try the first one!', so I grab:

sentence AND sentence

Now you can look at the whole sentence and say 'Yep!!! It fits', but the computer will only look at what it's programmed to do. So, lets say that I've programmed up a recursive descent parser, which works by defining a function for each 'fragment' which it calls when it sees it's name in a production.

So my 'parser' will see 'sentence' and call the 'sentence' function which will then look at the first production and will see sentence and will call the 'sentence' function and . . .

And there you are - infinite recursion.

So we can't use a recursive descent parer. Right? Well, . . .

The recursion isn't caused by the parsing method, it's caused by any algorithm which attempts to match the same 'fragment' twice without recognizing and moving past a 'token'.

So infinite recursion in parsing results from designing an algorithm (any algorithm) which can cycle through a sequence of 'fragments' without ever recognizing (and using) a 'token'.

So, can I patch up a 'recursive descent parser' so that it handles 'right recursion' and other forms of infinite recursion?

Sure. I just have to keep track of my progress through the token stream and reject any production in which a 'fragment' occurs which I'm in the (recursive) process of examining AND which is at the same place in the token stream as it was before. Again, this will be easier to code than to write.

I'll post a note when I've finished fixing this thing - in case you want to look at the code

Mike



Monday, August 2, 2010

Change - again

Just about everyone I know would rather die than change something they believe.

That's too vague.

Let's say I believe I'm too fat. That can make sense if I look in the mirror and see somebody who looks like a sphere. But for an anorexic, when they look in the mirror they see somebody who looks like a stick.

The reason we say they are 'anorexic' isn't because they look like a stick. It's because they look like a stick and think that they are too fat AND they won't change what they believe.

So how do we react to this?

We call something like this a 'disease' and look for something to do to them to make them change. Probably some chemical we can put in a pill or an injection or a patch or a suppository.

Does it really make sense that an inert chemical can cause someone to have a specific idea? Isn't an 'idea' or a 'belief' more complex and specific than a single chemical?

So what can these 'drugs' really do? - other than slow down or speed up thinking?

If that's all they can do, then 'drugging' people just changes their ability to think - their 'thinking environment' - not their beliefs.

So 'drugs' can't 'cure their disease', although they may make it possible for them to think about it differently. Maybe they it makes them think more ssssllloooowwwwwlllllyyyyy. Maybe it makes them stop thinking at all. Or maybe it just makes them passive so we don't have to think about them at all. Or maybe - as my friend who knows these things says - they generally don't work.

But that's not the point.

The point is: if an anorexic didn't believe he/she was fat, she wouldn't be an anorexic. She'd be a skinny person who knew she was too skinny and would do something about it - like eat some more.

So how do you change a belief?

Take football for example. The team which wins consistently believes that they can win. Not only that, they believe they can win this game. Right now. If they think they can't, they always lose.

What makes them believe this?

It's pretty simple: they have a slogan, a mantra, a rallying cry, a whatever to repeat over and over again. So as long as they can keep telling to themselves they can win, they will win, they're going to win - then they believe they can, will and are going to win.

Is a belief anything more than something we keep repeating to ourselves?

What happens when we stop talking to ourselves about one specific belief? Doesn't the alcoholic or a smoker keep reminding himself that he needs a drink or a cigarette? What would happen if he - instead - reminded himself that he needs an ice cream cone? (Besides getting fat and maybe getting diabetes) Wouldn't he eventually go from being an alcoholic to an ice cream-aholic?

A belief is just a thought. It's not made out of stone or steel or even jello. It's 'mind stuff'. There's two kinds of 'mind stuff'. There are memories and there's 'what I'm thinking now'.
All you can do with a memory is either lose it or drag it out to 'think about it now'. Everything you do and experience is the 'what I'm thinking now' stuff. That's where the anorexic and the alcoholic and the smoker 'belief' exists.

There isn't any automated thought loader which pushes thoughts into your 'thinker' and makes you think them. You get to pick and choose.

Don't believe it? Close your eyes and try to count the thoughts which come up over the next 10 seconds.

If you're like I am, there were a lot of them. Ten, a hundred, I don't know. Just lots and lots of them. I'll bet you 'thought about' just a couple - maybe one or two. What happened to the rest of them? They're like the kids you didn't pick to be on your team: they just wandered off.

The stuff you and I believe - about life, goodness, and - especially - ourselves - are just these familiar little thoughts we keep repeating. And by repeating them, we think their real. And that's all a belief is.

So really, how hard is it to change a belief?

It's easy - if you want to and are brave enough to give it up.

Sunday, June 20, 2010

Human Development - kind of

So somehow I went from wherever I was as a young guy to this worn out husk who had to deal with with raising six kids. Was nothing I planned to do, but, well, there it is.

That's not the point - it's just to give you my credentials, assert my authority, bolster my claims, etc etc.

This first point is that I'm writing from my experience - not text book theory or some other academic point of view [I have those also, but not about kids]

Here's the Second Point:

When a kid is just born, they don't do a lot and the way they interact is really different from how they are later on. They're kind of like little animals that just crawl or run around. They don't really argue about anything. Their complaints are very personal and immediate - hungry, wet, cold, lonesome, tired.

After a couple of years something happens. You've been going along saying things like: Please don't do that or Please do this - and almost getting used to the fact that they never really pay much attention and don't seem to remember what you said from one second to the next. But then the thing happens: the kid turns around and looks you straight in the eye and says - very firmly and usually pretty loudly - "NO"!

This is a major breakthrough.

The kid hasn't turned into a monster or become a 'Terrible Two' (as I was told by my mother, over and over again). The kid just now finally noticed that you're trying to get him (or her) to do something that wasn't what he wanted to do.

That's the point when you can finally start teaching the kid how to 'not get run over by a car' and 'to clean your plate' and 'to pick up your toys'. It won't work - at least not for the first 10 or 20 years - but you can now get started and be (relatively) happy with the knowledge that you're not just being ignored. You're being Actively ignored and - believe it or not - all of you talking, pleading, reasoning, and (most of all) the example you set is sinking into that little mind.

After mulling this over for a bunch of years - I've come to the conclusion that this is a pivotal point in all of our developments. It's the point where we realize that we live in a context over which we do not have total control and with which we must learn to interact.

And for a Third Point:

Lora (we're married and four of the kids are her fault) is a certified Montessori method pre-primary instructor. She went to school for a year, did a year internship, read all of Montessori's books and practiced on our kids. Turns out that's enough for her - she prefers ponies and dogs to humans.

Anyway, she has creds as well.

Montessori discovered - by observing kids - that kids go through various phases of development where they are really different. She called them sensitive periods and it's fascinating stuff.

But - to get to the point - there's one thing I wanted to write about here. It's the thing which happens when the fish in the classroom fish tank dies.

The pre-primary Montessori classes have kids in the 3 to 5/6 year range. They generally have a fish tank. At some time during the year one or more of the fish will die.

Here's where it gets interesting.

The kids come in and quickly divide into two groups.

The younger kids look at the fish and say 'the fish is dead' and then go do something.

The older kids look at the fish and say 'the fish is dead. I wonder why the fish died. Was the water too hot? Was it too cold? Did another fish kill it? Wasn't the food right? I think that ...'

You get the difference?

Before something changes, it's ok to just see the dead fish and register it as a fact.

After 'the change' we have to 'know Why'.

I think this is the origin of Why.

You should listen to some of the silly 'reasons' 5 and 6 year old kids come up with. Their logic is not all that bad, but the 'facts' they start from are pretty lame. But that makes sense because the don't have much experience.

So do that for a while - - - and then - if you're brave - listen to some of your friends and neighbors talking about stuff. Things like 'the problem with ... is ... because ...'

When I started listening to adults and comparing it with the stuff 5 and 6 year olds come up with, I got really upset. It's the same stuff!!!

I think everybody is acting like 6 year olds - including the guys who run the countries and big companies.

Doesn't that explain a lot of what's going on?

I'm telling you - the problem is the whole thing is being run by 6 year olds and that's why nothing works. The answer is . . .

ORM or Not? Part Two - Definitely Not

Maybe that's a little too harsh - but I don't think so.

Here's the background:

One of the biggest recurring programming problems I've had to deal with in site and application development has been - (drum roll) - developing the database. Not populating it - that's just boring. Developing and refining the data definitions.

The "Conventional Wisdom" promulgated by Software Gurus is basically this:
  1. Databases are Good With Data
  2. Data Objects Are Good With Data
  3. A Programmer Must Define a Mapping Between these Two Good Things and then All Will Be Good
Not my experience.

Software Gurus don't write application. Software Gurus write Books and (occasionally) toy examples. So they really don't understand that Good does not generalize because the Good that Databases are with data is all about safety and accessing it in great and small piles. The Good that Data Objects are about is manipulation, fine structure, and flexibility.

My experience with Object Relational Data Mapping - which is just a fancy phrase for how you get the data from the database into an object and back - is this: when I Did it Their Way, I had to hand-coordinate both my Objects and my Database.

So when I wanted to add a field or change a field or change a datatype, I had to do everything twice.

Now Rails has Active Record - which is a stupid Software Guru name for an ORM which creates the Data Objects automatically (my drivers license is an active record - meaning it's current so I can legally drive), but that doesn't 'solve the problem' it just moves it. [maybe Rails 3 is smarter,
I dropped out just as Rails 2 was coming out]

Specifically, the Rails solution moved the problem to 'migrations' which turned out to be fragile. I know: I tried it.

So, a couple of years ago I proposed Not building an ORM, but rather loosening the coupling between the Database and the Data Objects.

Here's what I've done:
  1. Defined an PHP class which implements a data abstraction which includes most of the types of data and methods I want in my CMS [see http://www.yasitekit.org] and knows how to map those data types into database fields. It also knows how to create the database structures, create, update, delete, and retrieve those values.
  2. Defined a PHP class which makes it easy (relatively speaking) to create objects which are instances of the data abstraction class. These classes come with a bunch of methods which I've found useful as well as a management class which automatically provides interactive data administration.
  3. Some automated analysis tools which work with this stuff so it's possible - and relatively painless and safe - to add, delete, and modify the derived objects (point 2) and reload the database. All done by hacking the derived PHP objects, but never ever touching SQL.
  4. As a side benefit, I threw together a database adaptor which abstracts the 8 essential database operations [create/destroy database;create/drop table;insert,update,delete,select] at a level high enough that we never have to muck with SQL. This makes it possible to augment it with non-SQL databases - such as Mongodb. In it's present form it allows painless migration between sqlite, mysql, and postgresql (it currently handles 5 different PHP database interfaces - automatically figuring out what's available - yada yada yada)
So, at this point, I feel more than comfortable saying No to ORM's.

BTW - if you go to the YASiteKit site, you may be a little dissapointed because all of this good stuff is not available yet. But It Will Be - Real Soon Now (no kidding). And a whole lot more.

What is on the site is a pretty-much full set of documentation - both about the system design and about how each of the parts work [I write doc as I go, so it's not all pretty - but it's useful (I know because I use it)].

So take a look. If you have any comments - let me know.

Also, if you like it enough to want to help - let me know sooner - I'm almost ready to turn it loose to see if it gets any traction.

Tuesday, June 1, 2010

Choices [Rant Warning]

I've been thinking a lot about the idea of Choice lately. Not so much any particular choice, but the whole idea itself and how it differentiates us from rocks and water and stuff like that.

It seems to me that one of the distinguishing features of 'life' verses 'non-life' things is that 'living' things get to choose what they do rather than just follow the 'laws of nature'. For example, rocks don't do much of anything except just sit where they are or roll down hill or get knocked someplace else by something else, but a bug can get up and go someplace else - even against gravity, if it wants to.

So what? Well, I've been wondering for a long time why we (that is us 'humans') do such stupid things and - even when we realize they are stupid and we are hurting ourselves - keep doing them. We even go so far as to tell ourselves - and everyone who will listen - how hard it is for us to not do them.

Take eating too much. When I eat too much for a while, I get fat and crabby. Then I feel sorry for myself because (I claim) it's oh, so hard to not eat too much. It sounds like there is an army of bad guys stuffing food into my mouth - but what's really happening is that I'm choosing to pig out. It takes real effort to keep eating when I'm full or when I'm not really hungry.

That one's simple and easy to see. And it's easy to see that if I'm fat, I made myself that way and I did it because I wanted to eat more than I wanted to feel good.

But it goes a lot further than just eating.

I think we've made up all kinds of religious garbage to excuse our disgusting behavior. We make up and believe in gods and devils and genetic forces and team loyalty and goals and success and all kinds of other junk to take the heat off of ourselves. 'The Devil tempted me - and I was weak',
'It's in my Genes - I can't help myself', 'We've got to do it for the Team', ' .. or the Company', '... or our Kids', '... or for Honor', '... or Because That's the Kind of Man/Woman I Am'. etc

It's all a smoke screen.

We do it because that's what we choose to do - just like those guys who strap bombs on themselves and kill a bunch of people.

It's a personal, misguided decision.

The hell of it is, it isn't hard to Stop. It doesn't take Effort to not do something. There really isn't any force making us do this stuff. Want proof? Just anethesize your mind somehow - and you won't be doing all that stuff you think is so hard to not do.

So let's get over it and stop kidding ourselves.

We muck things up, piss each other off, get mad about stuff, drill oil wells places where we can't plug leaks, kill people, destroy the whole bloody planet and everything on it because we choose to.

That's all there is to it.

And that's what makes us different from a nice, peaceful rock.