Monday, September 22, 2008

More on Functional Queues in Scala .. More Memoization

In my last post, I talked about a Functional Queue implementation in Scala that used the laziness of Streams to improve upon the worst case bound of the dequeue operation. In reality, with the initial benchmarks, the algorithm didn't perform as well, and was, in fact, in some cases slower than the strict List based implementation. However, the figures that came out of the benchmarks indicated that the strict List based implementation suffered from rough edges due to the O(n) reverse operations performed on the List during dequeue. The performance curves of the lazy implementation were much smoother.

However, as I mentioned in the post, the lazy implementation also has overheads, in the form of hidden constants, which came out in benchmarking with large numbers. This is mainly due to increased heap usage from thunk creation, which can be quite large. This, in turn, can eat up the heap and kick off garbage collection, thereby affecting the overall runtime of the algorithm.

Still, the lazy implementation, being O(log n) worst case and O(1) amortized for dequeue, proved to have more predictable numbers.

In the same paper that I referred to in the last post, Okasaki goes on to present an improved version of the lazy algorithm that takes more advantage of memoization. The trick is to ensure that whenever we have a call of the form rotate(xs.tail, ..), the referred tail has already been pre-evaluated and memoized. This leads to an O(1) worst case for the dequeue operation as well.
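To see the memoization this relies on, here is a small standalone sketch (independent of the queue code, using the pre-2.13 Stream API that this post uses) showing that a Stream runs each tail thunk at most once; a second access returns the cached cell:

```scala
// Counts how many times the tail thunk actually runs.
// Stream memoizes its cells, so forcing the same tail
// twice runs the thunk only once.
var forced = 0

def from(n: Int): Stream[Int] =
  Stream.cons(n, { forced += 1; from(n + 1) })

val s = from(0)

s.tail.head   // first access forces the thunk
s.tail.head   // second access hits the memoized cell

assert(forced == 1)
```

This is exactly the property the improved queue exploits: once a tail of front has been walked over, any later rotate that touches it pays nothing.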

Incrementally .. Again ..

Once again we turn back to incremental operation. We introduce one more Stream into the implementation ..


class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {
  //..
  //..
}



pending points to the portion of front whose tails are yet to be evaluated. That is, when pending is empty, every tail of front has been evaluated and memoized. How do we take advantage of this?

Assume we have just had a rotate ..

  • front points to a non-empty Stream

  • rear points to a Nil List

  • pending points to the same Stream as front


Until the next rotate is called, every time we call makeQ, we walk one step down front, evaluating one more tail and advancing pending, thereby ensuring that every tail of front has been pre-evaluated and memoized by the time of the next rotate. Here is the modified makeQ ..


class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {

  //..

  protected def makeQ[A](f: Stream[A], r: List[A], p: Stream[A]): MemoizedQueue[A] = {
    if (p isEmpty) {

      // pending is empty, i.e. all tails evaluated
      // time for the next rotate

      val fr = rotate(f, r, Stream.empty)
      new MemoizedQueue[A](fr, Nil, fr)
    } else {

      // evaluate one tail in pending
      new MemoizedQueue[A](f, r, p.tail)
    }
  }
  //..
  //..
}



An O(1) worst case Functional Queue in Scala

Since front.tail is incrementally evaluated and memoized, the cost of dequeue comes down to O(1) worst case. Here is the final version of the implementation that does every queue operation in worst case O(1) ..


object MemoizedQueue {
  val Empty: MemoizedQueue[Nothing] =
    new MemoizedQueue(Stream.empty, Nil, Stream.empty)
}

class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {

  def this() {
    this(Stream.empty, Nil, Stream.empty)
  }

  private def rotate[A](xs: Stream[A], ys: List[A], rys: Stream[A]): Stream[A] = ys match {
    case y :: ys1 =>
      if (xs isEmpty) Stream.cons(y, rys)
      else
        Stream.cons(xs.head, rotate(xs.tail, ys1, Stream.cons(y, rys)))
    case Nil =>
      // rotate is only invoked when rear is one longer than front,
      // so ys can never be empty here
      throw new IllegalStateException("rotate called with empty rear")
  }

  protected def makeQ[A](f: Stream[A], r: List[A], p: Stream[A]): MemoizedQueue[A] = {
    if (p isEmpty) {

      // pending is empty, i.e. all tails evaluated
      // time for the next rotate

      val fr = rotate(f, r, Stream.empty)
      new MemoizedQueue[A](fr, Nil, fr)

    } else {

      // evaluate one tail in pending
      new MemoizedQueue[A](f, r, p.tail)
    }
  }

  def isEmpty: Boolean = front isEmpty

  def enqueue[B >: A](elem: B): MemoizedQueue[B] = {
    makeQ(front, elem :: rear, pending)
  }

  def dequeue: (A, MemoizedQueue[A]) = {
    if (front isEmpty) throw new NoSuchElementException("empty queue")
    (front.head, makeQ(front.tail, rear, pending))
  }

  // invariant: |front| = |pending| + |rear|
  def length = (pending.length + 2 * rear.length)
}
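As a quick sanity check, here is a condensed, self-contained copy of the queue, exercised with a few operations. The original post does not show a usage example, so this is a sketch; the identifiers mirror the class above, and plain asserts check FIFO order and the length invariant:

```scala
// Condensed copy of MemoizedQueue, with a small usage run below.
class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {

  def this() = this(Stream.empty, Nil, Stream.empty)

  private def rotate[B](xs: Stream[B], ys: List[B], rys: Stream[B]): Stream[B] = ys match {
    case y :: ys1 =>
      if (xs.isEmpty) Stream.cons(y, rys)
      else Stream.cons(xs.head, rotate(xs.tail, ys1, Stream.cons(y, rys)))
    case Nil =>
      throw new IllegalStateException("rotate called with empty rear")
  }

  protected def makeQ[B](f: Stream[B], r: List[B], p: Stream[B]): MemoizedQueue[B] =
    if (p.isEmpty) {
      // all tails evaluated: time for the next rotate
      val fr = rotate(f, r, Stream.empty)
      new MemoizedQueue[B](fr, Nil, fr)
    } else {
      // force (and memoize) one more tail in pending
      new MemoizedQueue[B](f, r, p.tail)
    }

  def isEmpty: Boolean = front.isEmpty

  def enqueue[B >: A](elem: B): MemoizedQueue[B] = makeQ(front, elem :: rear, pending)

  def dequeue: (A, MemoizedQueue[A]) =
    if (front.isEmpty) throw new NoSuchElementException("empty queue")
    else (front.head, makeQ(front.tail, rear, pending))

  // invariant: |front| = |pending| + |rear|
  def length: Int = pending.length + 2 * rear.length
}

val q3 = new MemoizedQueue[Int]().enqueue(1).enqueue(2).enqueue(3)
assert(q3.length == 3)

val (a, q2) = q3.dequeue
val (b, q1) = q2.dequeue
val (c, _)  = q1.dequeue
assert((a, b, c) == (1, 2, 3))   // FIFO order preserved
```

Note that every operation goes through makeQ, so each enqueue or dequeue either performs a constant amount of incremental tail forcing or a rotate whose suffix has already been memoized.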
