2008-11-15

PHP5 Iterators. MySQL iterator example.

PHP is stupid, enough said. Recently I wanted to abstract a table-printing function so it could work with either arrays or MySQL result sets. In Python this screams iterator, and since I had heard PHP5 supported iterators I had always wanted to write one. So before I get to the PHP, let me explain the Python way first:

The Pythonic Iterator Protocol:
  1. Take the object to traverse and call its '__iter__()' method to obtain an iterator
  2. Call the iterator's 'next()' to obtain the next item ('__next__()' in Python 3)
  3. Exit from the iteration when 'next()' raises 'StopIteration'
Simple, isn't it? All the work is done in 'next()', and all it has to do is return a value or raise 'StopIteration'.


The PHP Iterator Protocol:


  1. Call 'rewind()' to make sure we are iterating from the beginning.
  2. Call 'valid()'; if it returns false, exit from the iteration.
  3. Take the first element by calling 'current()'.
  4. Optionally get the key of the first element by calling 'key()'.
  5. Call 'next()' to do whatever is necessary to fetch the next item; ignore the return value.
  6. Call 'valid()'; if it returns false, exit from the iteration.
  7. Take the next element by calling 'current()'.
  8. Optionally get the key of the next element by calling 'key()'.
  9. Call 'next()' to do whatever is necessary to fetch the next item; ignore the return value.
  10. Repeat steps from 6 to 9.
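
Written out by hand, the steps above come down to the loop below. This is only a sketch, assuming PHP 7 or later for the type hints; 'walk()' is a made-up helper name, and the SPL 'ArrayIterator' stands in for any class implementing 'Iterator'.

```php
<?php
// Hand-written equivalent of: foreach ($it as $key => $value) { ... }
// walk() is a hypothetical helper, not part of PHP.
function walk(Iterator $it): array {
    $seen = [];
    $it->rewind();                    // step 1
    while ($it->valid()) {            // steps 2 and 6
        $value = $it->current();      // steps 3 and 7
        $key = $it->key();            // steps 4 and 8
        $seen[] = [$key, $value];     // the loop body would go here
        $it->next();                  // steps 5 and 9, return value ignored
    }
    return $seen;
}

$pairs = walk(new ArrayIterator(['a' => 1, 'b' => 2]));
```

Note that 'next()' never hands anything back to the loop; the item always arrives through 'current()'.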

"Wait a minute!" you say, "steps 2-5 are the same as steps 6-9!" No, they aren't. Steps 6-9 operate on the "next" item, the one 'next()' fetched for us. Steps 2-5 operate on some ghostly "first" item that nobody has fetched yet.

So 'valid()', 'current()' and 'key()' have to behave differently on the first run. In practice it is enough to call 'next()' from within 'valid()' the first time. But this is horrible, for two reasons, because...

OOP and semantic purity are like M. Night Shyamalan and plot twists:

One implies the other, and it hurts when it doesn't match our expectations. In OOP, methods are named so that you know what they do just from looking at their names. The boolean method 'valid()' suggests a simple check that the currently selected item is part of the iteration; you don't expect it to also fetch the first item. The other problem is efficiency: for an array with N elements, 'valid()' has to test N times whether the first item was already fetched, and that test gives the same answer every time except the very first.
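
To make the complaint concrete, here is a rough sketch of the 'valid()'-fetches-the-first-item workaround. It assumes PHP 8 for the type declarations, uses a plain array in place of a MySQL result, and the class name 'LazyIter' and the '$checks' counter are invented for the illustration.

```php
<?php
// Hypothetical iterator where valid() primes the first item on demand.
final class LazyIter implements Iterator {
    public int $checks = 0;          // how often the "primed yet?" test ran
    private int $cursor = 0;         // next source index to read
    private int $pos = -1;
    private bool $primed = false;
    private bool $valid = false;
    private mixed $curval = null;

    public function __construct(private array $source) {}

    public function rewind(): void {
        $this->cursor = 0;
        $this->pos = -1;
        $this->primed = false;       // nothing fetched yet
    }
    public function valid(): bool {
        $this->checks++;             // this test runs on every single call
        if (!$this->primed) {
            $this->primed = true;
            $this->next();           // the surprising side effect
        }
        return $this->valid;
    }
    public function next(): void {
        $this->valid = $this->cursor < count($this->source);
        if ($this->valid) {
            $this->curval = $this->source[$this->cursor++];
            $this->pos++;
        }
    }
    public function current(): mixed { return $this->curval; }
    public function key(): mixed { return $this->pos; }
}

$it = new LazyIter(['x', 'y', 'z']);
$seen = [];
foreach ($it as $key => $value) {
    $seen[$key] = $value;
}
```

The loop sees all three items, but '$it->checks' ends up counting one "primed yet?" test per call to 'valid()', and only the first of those tests ever does anything.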

No, we have to take the initialization out of the loop. OOP principles tell us the constructor is the place for this kind of setup. But there is a problem: 'rewind()' is called just before the iteration begins! So we find ourselves in a dilemma:

  1. Fetch the first item in '__construct()', make 'rewind()' do nothing.
  2. Fetch the first item in 'rewind()', that is, call 'next()' after rewinding.
Either way 'rewind()' is a liar, because it doesn't do what its name says it does. Now, if I have to choose the lesser evil, option 2 is the way to go, because it makes the iterator reusable, which is the purpose of calling 'rewind()' in the first place. And so hereby I present:

A simple PHP MySQL Iterator:


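// Note: the mysql_* functions belong to PHP5's ext/mysql, which was removed
// in PHP 7; mysqli offers mysqli_fetch_assoc(), mysqli_data_seek() and
// mysqli_num_rows() as replacements.
class mysqlIter implements Iterator, Countable {
    private $resource;        // result resource, as returned by mysql_query()
    private $pos = -1;        // zero-based index of the current row
    private $valid = false;   // true while $curval holds a fetched row
    private $curval;          // current row as an associative array

    public function __construct($resource) {
        $this->resource = $resource;
    }

    // Fetch the next row; once the rows run out, valid() turns false.
    public function next() {
        if ($value = mysql_fetch_assoc($this->resource)) {
            $this->valid = true;
            $this->curval = $value;
            $this->pos++;
        } else {
            $this->valid = false;
        }
    }

    public function valid() {
        return $this->valid;
    }

    public function current() {
        return $this->curval;
    }

    public function key() {
        return $this->pos;
    }

    // Go back to the first row and fetch it (option 2 above).
    public function rewind() {
        $this->pos = -1;      // restart the keys so a second loop counts from 0
        // mysql_data_seek() raises a warning on an empty result set.
        if ($this->count() > 0) {
            mysql_data_seek($this->resource, 0);
        }
        $this->next();
    }

    public function count() {
        return mysql_num_rows($this->resource);
    }
}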
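
For what it's worth, this is roughly how the table-printing function that started all this can stay ignorant of where its rows come from. A minimal sketch, assuming PHP 7.1 or later for the 'iterable' type; 'renderTable()' is a made-up name, and an 'ArrayIterator' stands in for 'mysqlIter' so the snippet runs without a database.

```php
<?php
// Hypothetical helper: renders each row as one tab-separated line.
// Accepts a plain array or anything that foreach can traverse.
function renderTable(iterable $rows): string {
    $lines = [];
    foreach ($rows as $row) {
        $lines[] = implode("\t", $row);
    }
    return implode("\n", $lines);
}

$data = [
    ['id' => 1, 'name' => 'ann'],
    ['id' => 2, 'name' => 'bob'],
];

$fromArray = renderTable($data);
$fromIterator = renderTable(new ArrayIterator($data));  // stand-in for mysqlIter
```

Both calls produce the same text, which was the whole point of writing the iterator.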


Aftermath.

At first I wasn't aware that 'next()' was not going to get called until the second pass, and then 'rewind()' started to mess up the results, so it took me a little longer than expected to implement the iterator. I blame the PHP way and its documentation.

A php-head will tell me that this is a case of PHP just being a different language, not a stupid one, but the devil is in the details. For instance, it is a good argument to say that there is nothing inconsistent about 'rewind()' calling 'next()', because it means a manually rewound iterator always points to its first item. But this raises the question: why would you manually access the first item in an iterator? The answer is that you aren't exactly handling an iterator but a data structure that is iterable. In PHP, iteration happens directly on the object. In Pythonland most iterables actually use a proxy iterator object (that's the purpose of '__iter__()'), which means, among other things, that iterable objects don't need to contain iterator-related attributes or methods.

Iterable objects in Python don't usually carry an internal pointer or implement 'next()'; they simply have an '__iter__()' method that returns an object that does so.

Another implication is that, because a new iterator is instantiated on demand every time, the same data structure can be traversed by multiple clients without conflicts, unlike PHP iterators.
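
The conflict is easy to provoke by nesting two loops over the same iterator object. The following is a sketch only, assuming PHP 8 for the type declarations; 'ListIter' is a made-up array-backed class whose single '$pos' pointer plays the role of the MySQL result cursor.

```php
<?php
// Hypothetical minimal iterator with one shared internal pointer.
final class ListIter implements Iterator {
    private int $pos = 0;
    public function __construct(private array $items) {}
    public function rewind(): void { $this->pos = 0; }
    public function valid(): bool { return $this->pos < count($this->items); }
    public function current(): mixed { return $this->items[$this->pos]; }
    public function key(): mixed { return $this->pos; }
    public function next(): void { $this->pos++; }
}

$it = new ListIter([1, 2, 3]);
$pairs = [];
foreach ($it as $a) {
    foreach ($it as $b) {        // rewinds, then exhausts, the shared pointer
        $pairs[] = [$a, $b];
    }
}
// With a plain array this would collect 9 pairs. Here the outer loop
// finds valid() false as soon as the inner loop has finished.
```

The outer loop gets exactly one pass, because the inner loop has already walked the only pointer there is to the end.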

But there are other problems with the argument that 'rewind()' calling 'next()' ensures the internal pointer is at the right position. One of them is that, if directly accessing an iterable is so desirable, then one would expect people to access freshly instantiated iterators too. That means '__construct()' should also call 'next()', just in case.

But if an iterator is instantiated and then used (a very common pattern) then the first item would have been fetched twice!!
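
Here is a sketch of that double fetch, assuming PHP 8 for the type declarations. 'EagerIter' is an invented class, an array stands in for the MySQL result, and the '$log' property exists only to record which source index each successful fetch read.

```php
<?php
// Hypothetical iterator that fetches in BOTH __construct() and rewind().
final class EagerIter implements Iterator {
    public array $log = [];       // source indexes read, in order
    private int $cursor = 0;      // next source index to read
    private int $pos = -1;
    private bool $valid = false;
    private mixed $curval = null;

    public function __construct(private array $source) {
        $this->next();            // "just in case" nobody calls rewind()
    }
    public function rewind(): void {
        $this->cursor = 0;
        $this->pos = -1;
        $this->next();            // fetches the first item again
    }
    public function next(): void {
        $this->valid = $this->cursor < count($this->source);
        if ($this->valid) {
            $this->log[] = $this->cursor;
            $this->curval = $this->source[$this->cursor++];
            $this->pos++;
        }
    }
    public function valid(): bool { return $this->valid; }
    public function current(): mixed { return $this->curval; }
    public function key(): mixed { return $this->pos; }
}

$it = new EagerIter(['a', 'b']);
$seen = [];
foreach ($it as $value) {
    $seen[] = $value;
}
// $it->log lists index 0 twice: once from the constructor, once from rewind().
```

The loop still yields each item once, but the source was read three times for two items.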

In short, iterators in PHP5 suck.