Friday, May 1, 2009

PHP, Language Design, and Confusion


While writing some iterator code that returns a reference to an internal array in an object, I discovered a bug that is remarkably easy to introduce. Without getting into the details, I left out a '&' and clobbered my underlying data structure.

Here's the offending loop:

for ($row=&$obj->firstRow();$row;$row=$obj->nextRow()) {
other stuff
}

where firstRow() and nextRow() are iterator methods defined on the object which do exactly what they sound like. [oh yeah, they both are defined as returning references]

Spot the bug?

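I left off the '&' between the '=' and the '$' in the update part of the for loop. Because $row is still bound by reference to the first row, the plain assignment copies each value returned by $obj->nextRow() into that first row and overwrites it.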
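To make the failure concrete, here is a minimal sketch that reproduces it. The RowSet class, its $rows and $pos properties, and the walkBuggy()/walkFixed() helpers are hypothetical stand-ins I made up for illustration; only firstRow() and nextRow() come from the loop above, and I'm assuming they return a reference to null when there are no more rows.

```php
<?php
// Hypothetical stand-in for the object in the post. Only firstRow() and
// nextRow() come from the original; everything else is assumed.
class RowSet
{
    public $rows;
    private $pos = 0;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Returns a reference to the first row, or to null if there is none.
    public function &firstRow()
    {
        $this->pos = 0;
        if (isset($this->rows[$this->pos])) {
            return $this->rows[$this->pos];
        }
        $none = null;
        return $none;
    }

    // Advances and returns a reference to the next row, or to null at the end.
    public function &nextRow()
    {
        $this->pos++;
        if (isset($this->rows[$this->pos])) {
            return $this->rows[$this->pos];
        }
        $none = null;
        return $none;
    }
}

// The loop from the post, bug included: the update clause assigns by value,
// so every assignment writes through $row into the first row.
function walkBuggy(RowSet $set)
{
    $seen = array();
    for ($row = &$set->firstRow(); $row; $row = $set->nextRow()) {
        $seen[] = $row['id'];
    }
    return $seen;
}

// Same loop with '=&' in the update clause: $row is rebound each time.
function walkFixed(RowSet $set)
{
    $seen = array();
    for ($row = &$set->firstRow(); $row; $row = &$set->nextRow()) {
        $seen[] = $row['id'];
    }
    return $seen;
}

$buggy = new RowSet(array(array('id' => 1), array('id' => 2), array('id' => 3)));
$fixed = new RowSet(array(array('id' => 1), array('id' => 2), array('id' => 3)));

$seenBuggy = walkBuggy($buggy); // visits ids 1, 2, 3, so the loop looks fine
$seenFixed = walkFixed($fixed); // visits ids 1, 2, 3

var_dump($buggy->rows[0]); // NULL: the first row was overwritten
var_dump($fixed->rows[0]); // still array('id' => 1)
```

The nasty part is that both loops visit the same rows in the same order. The damage only shows up later, when something else reads the first row.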

Strangely enough, if I wrote this in C I wouldn't have made the error. The equivalent C code would be:

for (ptr=obj->firstRow();ptr;*ptr=*obj->nextRow()) {
ptr->whatever
}

Notice that I have to explicitly dereference the pointer ptr in order to clobber the initial element.

So this covers PHP and the Confusion.

Here's where the Language Design comes in:

C makes a syntactic distinction between accessing a pointer and dereferencing it.

PHP doesn't. If a variable contains a reference then assignment dereferences it implicitly and silently.

Consequently, you can't really understand a chunk of PHP without reading all the definitions of the variables that precede the chunk you're looking at. This is a bad thing and it violates that tried, true and mostly forgotten maxim of 'code locality'.
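A small sketch of what I mean. The two functions below are made-up names for illustration; the point is that the statement `$item = 99;` is character-for-character identical in both, yet what it does depends entirely on how $item was introduced on the line before.

```php
<?php
// Hypothetical helper: $item holds a copy, so the assignment stays local.
function assignAfterCopy(array $data)
{
    $item = $data[0];   // plain copy of the first element
    $item = 99;         // changes only $item
    return $data;
}

// Hypothetical helper: $item is bound by reference, so the same
// assignment silently writes into the array.
function assignAfterBind(array $data)
{
    $item = &$data[0];  // $item now refers to the first element
    $item = 99;         // identical statement, but it modifies $data
    return $data;
}

$untouched = assignAfterCopy(array(1, 2, 3)); // array(1, 2, 3)
$clobbered = assignAfterBind(array(1, 2, 3)); // array(99, 2, 3)
```

Looking at `$item = 99;` alone, there is no way to tell which of the two you are in.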

This also leads to a lot of bugs - and 'Bogus' bug reports - involving references. [just check out the 655 Bogus bugs relating to 'reference']

I think PHP would be a lot easier to understand if there were a syntactic difference between assigning to a variable and assigning to the referent of a variable.

Practically speaking, I don't think it will happen because of 'backward compatibility' and 'NIH'.

What I'd like to see is a dereference prefix which is accepted in PHP 5.x and becomes mandatory in PHP 6. There should also be an option which generates a warning whenever a variable containing a reference is assigned a value and the variable is not prefixed.

Here's my first (and only) choice: '&'

This is currently used to denote a reference when used on the right side of an assignment. It should be a simple matter to extend this to dereferencing on the left side and in expressions.