The Inverted Index game

Here is a cool game I invented to teach a class the basic idea of how an Inverted Index works (the fundamental data structure of all search algorithms). It is also a fun way to get to know the class.

The rules:

  1. You want to find a number of students.
  2. You only know their names but not their faces.
  3. They are not allowed to say their names.
  4. When asked for a specific name they can reply with Yes or No whether this is their name.

Ask the students to come up with ideas on the fastest way to find the students in question. The most obvious way of course is to just go person by person and ask them for each name on the list. At O(number of students) this is quite inefficient.

Some optimizations come to my mind: Cross names of the list once you have correctly identified someone. Skip names that do not match the person’s gender. Still, this will take a long time.

Someone might have the idea for the class to line up alphabetically. They are on to something! This works pretty well for small group. Still, in the worst case this takes a while if one of the names in question starts with a late letter like X. Starting from the back of the line is also not helpful since there may be many students with names starting with Z, who knows. And it will definitely take a long time for a dozens or hundreds of students.

Inverted Index to the rescue: How about students form lines according to the first letter of their name? This way you just need to ask the first person in the line for the name until you have reached the line for the letter you are looking for. Then just go through this line until you have found the right person!

Say you are looking for Michael. People shuffle around and slowly form into lines. You start asking the first line: Anna… No. Next line: Eric… No. Next: Maria… Got the right line! Martin… Mary… Michael, gotcha!

In the worst case finding the right line takes 26 steps (one for every letter in the alphabet) and then following the line should not take too long either, unless all students have the same first letter which is unlikely.

This approach lends itself well to explain some of the concepts of the Inverted Index:

  • You need to do some pre-processing before you can start searching, this takes a while. However, this will pay off with a much faster search.
  • For a small number of students = documents this is probably not necessary and it is more effective to just look through all documents, akin to Unix’ grep.
  • During pre-processing documents are parsed (picking the first letter of the name in this case) and then added to a line or bucket depending on the parsed criteria.
  • You need to put some thought in the pre-processing to get it right. Which line is right for student whose names do not start with letters in the ASCII set, such as Ølaf. Do they go into the O lane or Ø, ie. do you do ASCII folding? This would reduce the number of lines but make the lines longer.
  • Sorting the lines alphabetically makes it easier to find the right line (in software the lines would be hashed for faster lookup but this is hard to do in real life).
  • Once you have identified the right line finding the requested document is easy since the line is relatively small.

Hope you like it!

Das Bedingungslose Grundeinkommen

Wir leben in der Krise. Sie begleitet uns nun schon seit Jahren, ohne dass ein Ende in Sicht ist. An manchen Tagen ist vom Ende der Krise die Rede, gleich darauf folgt schon die nächste Schreckensnachricht. Es ist also genau die richtige Zeit, sich über alternative Wirtschaftsmodelle Gedanken zu machen.

Ein Modell, das in den letzten Jahren zunehmend an Aufmerksamkeit gewonnen hat, ist das Bedingungslose Grundeinkommen (BGE). Bisheriger Höhepunkt ist eine erfolgreiche Volksinitiative in der Schweiz, die zu einer Volksabstimmung bezüglich der Einführung 2016 führen wird.

(more…)

Import XSPF playlists into Google Play Music All Access

I had a wonderful list of favorite songs on last.fm that I sadly could not listen to after I abandoned last.fm a long time ago. But now that I’m using Google Music, which has an unofficial API, I figured the time was right to revive that playlist.

So I quickly whipped up a small tool to import an XSPF playlist. The kicker is that it uses Google Music’s All Access library, so no uploading the songs required. On the other hand that also means that not all songs from your playlist may be available.

Go try it out, the code is at https://github.com/georgms/google-music_xspf-importer

JMS basics

I’m currently integrating JMS (Java Messaging Service) into a project. Being a true enterprise component it has a very complex nomenclature and consists itself of many modules to cover all kinds of topologies. To aid other newcomers here is an overview of what I’ve learned so far.

(more…)

How to install CyanogenMod 7 on your Samsung Galaxy Ace

I just successfully installed CyanogenMod 7 on my Samsung Galaxy Ace. Finding all the right instructions was a bit tricky so here’s my personal recollection on how to get you up and running with CM7.

(more…)

Zendesk widget for Thunderbird

I just released a small extension to integrate Zendesk with Thunderbird.

Run PostgreSQL in a ram disk

At a a project I am heavily relying on PostgreSQL in the unit tests. This works fine but creating and dropping databases takes quite a while which goes against the purpose of unit tests. One simple solution to this is to run PostgreSQL in a RAM disk which will make IO-heavy operations a snap.

(more…)

Accessing bundled resources in Java packages

This is sort of a continuation of a previous post about how to integrate your own Java classes into the Maven build lifecycle. This entry described how to generate files for inclusion in the final project artifact (ie. .jar, .war etc.). It took me some time to figure out how to access these files using Java’s resource mechanism so this post will summarize my findings.

(more…)

Calling Java classes from the Maven build lifecycle

I love Maven for its flexibility, but this flexibility at times makes it hard to figure out how to achieve certain tasks. In my case I wanted to execute a certain Java class that is included in the project at some point during the build lifecycle. While I can see many uses in my case the class itself compiles a SQL dump out of several source files so it can be included in the final artifact.

(more…)

Spring Security authentication with a Neo4j backend

I am currently working on a Spring-based Neo4j (a graph database) application and needed to add user authentication. Since Spring Security allows integrating custom authentication providers it felt only natural to implement a provider based on Neo4j. A data structure to support users and groups is already provided in the Neo4j Wiki. I used it as the basis for my implementation with some wrapping so it could be integrated with Spring Security and provides administrative methods.

(more…)

Go to Top