While building a website, understanding what changes were made helps with advanced editing and Version Control.

The problem? How do we know what was changed between one publish of a world and the next?

My solution? Our very own diff checker algorithm tailored specifically to our worlds. I wrote a function called diff that would run every time a creator published their world. What this would do is find which parts of the world were added and removed, and save those id’s respectively.

My first challenge was trying to understand how I would be able to efficiently find the difference between two arrays. The easiest way to find the difference is to first figure out what’s the same- and that sounded like a popular programming problem!

I read up on diff algorithms, and came across Longest Common Subsequence- LCS.

LCS works by recursively using a hash map to keep a running track of similarities between two strings, so I figured that I could use that with our worlds (which were arrays) as well.

# frozen_string_literal: true

module TreeUtil
  def self.max(arg1, arg2)
    arg1 > arg2 ? arg1 : arg2
  end

  def self.lcs(old_tree, new_tree, m, n, hash, res)
    return 0 if m.negative? || n.negative?

    if hash[[m, n]] == -1
      result = -1
      if old_tree[m] == new_tree[n]
        result = 1 + lcs(old_tree, new_tree, m - 1, n - 1, hash, res)
        res << old_tree[m]
      else
        result = max(lcs(old_tree, new_tree, m - 1, n, hash, res), lcs(old_tree, new_tree, m, n - 1, hash, res))
      end
      hash[[m, n]] = result
    else
      result = hash[[m, n]]
    end
    result
  end

After finding the LCS, I knew that to find the diff it would be as simple as doing a set difference from the old and new versions of the world.

In short:

removed = old - LCS