Introduction
Tag clouds (or word clouds) are very trendy in the fashionable web 2.0.
But very often those clouds are rendered via HTML / CSS and are quite ugly.
I recently discovered the website http://www.wordle.net and was fascinated by the beautiful images it can generate. However, they are rendered through a Java applet, and the only way to use them is to take a screenshot and extract the image. I wanted clickable tag clouds, which is not possible with Wordle unless you manually create an HTML image map. So I decided to create my own tag cloud generator using PHP.
I started crawling the web for information and found an interesting post on StackOverflow from someone asking how to implement "something like Wordle". Surprisingly, Jonathan Feinberg, the creator of Wordle, replied to the post explaining the basic idea:
Each word "wants" to be somewhere, such as "at some random x position in the vertical center". In decreasing order of frequency, do this for each word:
place the word where it wants to be
while it intersects any of the previously placed words
move it one step along an ever-increasing spiral
That was all I needed to start coding a proof of concept, but a lot of problems still needed to be solved...
Bounding Boxes
In his reply Jonathan Feinberg says: "The hard part is in doing the intersection-testing efficiently, for which I use last-hit caching, hierarchical bounding boxes, and a quadtree spatial index".
Well, that was a bit too much for me; I needed a less efficient but simpler way to test for intersection.
So I came up with this idea:
- Each time a word is drawn, store its bounding box in an array
- To test whether a new box intersects the boxes already drawn:
  - For each bounding box in the array:
    - If the new box intersects the bounding box, there is an intersection
This leads to another problem to solve: how to test if two rectangles intersect.
Rectangle Collision Detection
The script I wrote can only draw words either horizontally or vertically. Thus we only need to test the collision of axis-aligned boxes. This is quite simple to do.
Two axis-aligned boxes do not intersect when their projections on one of the axes are disjoint. This is not the case for rotated boxes!
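// These comparisons assume the y axis points up (top > bottom numerically).
// With image coordinates, where y grows downward, flip the two vertical comparisons.
if ($box1->bottom > $box2->top) return false;   // box1 is entirely above box2
if ($box1->top < $box2->bottom) return false;   // box1 is entirely below box2
if ($box1->right < $box2->left) return false;   // box1 is entirely left of box2
if ($box1->left > $box2->right) return false;   // box1 is entirely right of box2
return true;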
For arbitrarily rotated boxes you will need some more 2D geometry to test the collision, but for now let's keep it simple.
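To make the two ideas above concrete, here is a minimal sketch of a box class and of the "test the new box against every stored box" loop. The class name Box, the function intersects_any() and the coordinate convention are assumptions made for illustration: the sketch uses GD-style image coordinates, where y grows downward and therefore top <= bottom, so the vertical comparisons are written for that convention. It is not necessarily how the actual script is organised.

```php
<?php
// Hypothetical helper class, not taken from the original script.
// Convention assumed here: image coordinates, y grows downward (top <= bottom).
class Box
{
    public $left;
    public $top;
    public $right;
    public $bottom;

    public function __construct($left, $top, $right, $bottom)
    {
        $this->left = $left;
        $this->top = $top;
        $this->right = $right;
        $this->bottom = $bottom;
    }

    // Two axis-aligned boxes are disjoint as soon as their projections
    // on the x axis or on the y axis are disjoint.
    public function intersects(Box $other)
    {
        if ($this->right < $other->left) return false;   // entirely to the left
        if ($this->left > $other->right) return false;   // entirely to the right
        if ($this->bottom < $other->top) return false;   // entirely above
        if ($this->top > $other->bottom) return false;   // entirely below
        return true;
    }
}

// Linear scan over every box drawn so far: simple, but O(n) per test.
function intersects_any(Box $new_box, array $placed_boxes)
{
    foreach ($placed_boxes as $box) {
        if ($new_box->intersects($box)) {
            return true;
        }
    }
    return false;
}
```

With only a few dozen words per cloud, the linear scan is usually fast enough; the caching and spatial-index tricks mentioned by Jonathan Feinberg only start to matter with many more boxes.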
Searching for a place for the new word
We now have all the pieces needed to write the routine that searches for a free space in which to draw a new word. We start in the center of the image and move the word along a spiral until it no longer intersects the words already drawn.
$i = 0;
$x = <image_center_x>;
$y = <image_center_y>;
$place_found = false;
while (! $place_found) {
    // Step along the spiral; the step length grows with $i
    $x = $x + ($i / 2 * cos($i));
    $y = $y + ($i / 2 * sin($i));
    $new_box = <place the word at x,y >;
    $place_found = <the new word does not overlap with existing words>;
    $i += 1;
}
return array($x, $y);
Changing the center of the spiral or its equation will lead to another distribution of the words in the image.
Since the PHP functions that draw text and measure its dimensions use the top left corner as their reference point, the above algorithm tends to place all the vertical words on the left of the image. To prevent this I added a little bit of noise (random numbers) when selecting the center of the spiral for the vertical words.
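The spiral search can be sketched as a self-contained function. This is only an illustration under a few assumptions: boxes are plain arrays with left/top/right/bottom keys in image coordinates (y grows downward), the word's width and height are passed in rather than measured, and the names find_free_place(), boxes_intersect() and the $max_steps safety limit are hypothetical additions that do not come from the original script.

```php
<?php
// Axis-aligned overlap test on array boxes (y grows downward, top <= bottom).
function boxes_intersect(array $a, array $b)
{
    return !($a['right'] < $b['left'] || $a['left'] > $b['right']
          || $a['bottom'] < $b['top'] || $a['top'] > $b['bottom']);
}

// Move a $width x $height box along the spiral, starting at ($cx, $cy),
// until it overlaps none of the $placed boxes.
// Returns the free box, or null if nothing was found within $max_steps.
function find_free_place(array $placed, $cx, $cy, $width, $height, $max_steps = 10000)
{
    $x = $cx;
    $y = $cy;
    for ($i = 0; $i < $max_steps; $i++) {
        // Same step as in the article: the first step ($i = 0) stays on the center.
        $x += $i / 2 * cos($i);
        $y += $i / 2 * sin($i);

        // (x, y) is the top left corner of the candidate box.
        $candidate = array(
            'left' => $x,
            'top' => $y,
            'right' => $x + $width,
            'bottom' => $y + $height,
        );

        $free = true;
        foreach ($placed as $box) {
            if (boxes_intersect($candidate, $box)) {
                $free = false;
                break;
            }
        }
        if ($free) {
            return $candidate;
        }
    }
    return null;
}

// Usage: place two words of the same size around the same center.
$first = find_free_place(array(), 200, 150, 40, 12);
$second = find_free_place(array($first), 200, 150, 40, 12);
```

The step limit is there only so that the sketch cannot loop forever; this version also does not check that the box stays inside the image.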
Clickable images?
As stated at the top of this post, I wanted the generated tag clouds to be clickable. In other words I needed a mechanism to detect which word was clicked.
Since we store the bounding boxes of all the words we draw to detect the collisions, we can use this data to generate an HTML image map.
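As an illustration, the stored boxes could be turned into an image map along these lines. The function name build_image_map() and the array layout (word, url, left, top, right, bottom) are assumptions made for this sketch; only the map and area elements themselves are standard HTML.

```php
<?php
// Build an HTML image map with one rectangular <area> per word.
// Each entry of $words is assumed to look like:
//   array('word' => ..., 'url' => ..., 'left' => ..., 'top' => ..., 'right' => ..., 'bottom' => ...)
function build_image_map($name, array $words)
{
    $html = '<map name="' . htmlspecialchars($name) . '">' . "\n";
    foreach ($words as $w) {
        // A rect area expects integer pixel coordinates: left,top,right,bottom.
        $coords = implode(',', array(
            (int) round($w['left']),
            (int) round($w['top']),
            (int) round($w['right']),
            (int) round($w['bottom']),
        ));
        $html .= sprintf(
            '  <area shape="rect" coords="%s" href="%s" alt="%s" />' . "\n",
            $coords,
            htmlspecialchars($w['url']),
            htmlspecialchars($w['word'])
        );
    }
    return $html . '</map>';
}
```

The map name must match the usemap attribute of the image, without the leading "#".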
The problem is that each generation of a tag cloud produces a different image. This is caused by the noise added when searching for the position of the words, but also by some randomness I added to the calculation of the font sizes.
That means that the tag cloud image and the image map must be rendered and sent to the client in a single call. Unfortunately, a single response cannot send the browser an encoded image file and some HTML at the same time.
To solve this issue I used the data: URI scheme, which allows embedding a base64-encoded image directly in the src attribute of the HTML img tag.
First render the image in a temporary file and encode its content in base 64:
$file = tempnam(getcwd(), 'img');
imagepng($cloud->get_image(), $file);
$img64 = base64_encode(file_get_contents($file));
unlink($file);
Then set the data as the image URL:
<img usemap="#mymap" src="data:image/png;base64,<?php echo $img64 ?>"
border="0" alt="" />
Unfortunately this does not work in ... well, as usual ... Internet Explorer. This is outside the scope of this article, but you can find more information on how to work around the problem here:
Embedding Base64 Image Data into a Webpage
Since we now return HTML instead of a PNG image, we may as well send back the HTML image map in the same response.
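Putting the pieces together, the img tag can be produced by a small helper that takes the raw PNG bytes and the name of the image map. The function name png_img_tag() is hypothetical; the sketch only relies on base64_encode(), htmlspecialchars() and sprintf().

```php
<?php
// Wrap raw PNG bytes in an <img> tag using a data: URI,
// linked to the image map called $map_name.
function png_img_tag($png_bytes, $map_name)
{
    $src = 'data:image/png;base64,' . base64_encode($png_bytes);
    return sprintf(
        '<img usemap="#%s" src="%s" border="0" alt="" />',
        htmlspecialchars($map_name),
        $src
    );
}
```

The bytes can come from the temporary file shown earlier. The response then simply contains this tag followed by the map element whose name matches.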
See it in action
Sorry IE users, this will not work... ;-(
Get the code
The source code of the complete script can be found on GitHub.