The making of:
Hay Fever Map The Hague
By John Hoogstrate - 10 March 2013
My first real cartography project
After putting markers on Google Maps for many years, it is finally time for something new and exciting in the field of online cartography. This is the story of how I made my own online map, with overlays based on open data, using open source software. I host it on my own server and completed it in my spare time over the course of just two weeks. This article is intended for a technical audience, with the aim to educate and inspire.
The municipality of The Hague has been releasing datasets as open data for a little over a year. The amount of released data is still relatively small, but we need to start somewhere. The open data can be used for the benefit of society, and at the same time makes the government a little more transparent.
The Hague is the third-largest city in the Netherlands, after Amsterdam and Rotterdam. It has a population of just over 500,000 and an area of around 100 km². It is in the west of the country on the North Sea. About 1 million people live in the urban area of The Hague, which covers about 405 km².
In February 2013 the data store for The Hague was opened. I was able to download the tree data without creating an account, but that does not seem possible anymore. It is slightly ironic that the "open" data is locked away in a store that requires registration to access it.
The tree data was provided as a CSV file with no other information attached. After downloading the 13 megabyte file, it turned out that it contained information on over 120,000 trees. With some guesswork regarding the column names, it was relatively easy to determine that it included the scientific Latin name of the species, the Dutch name, stem size, and some coordinates. There was also a non-unique number column, a date column, as well as a column for monument ID, which was not used for most trees. I did not make use of these three columns, as there was not enough context.
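Loading such a file can be sketched in a few lines of Python. This is only an illustration: the column names, the semicolon delimiter, and the two sample rows below are made up, since the real header had to be worked out by guesswork.

```python
import csv
import io

# Made-up sample with hypothetical column names and delimiter.
# The real file's header must be inspected before adapting this.
SAMPLE = """latin_name;dutch_name;stem_size;x;y
Betula pendula;Ruwe berk;30;81234.5;455678.9
Quercus robur;Zomereik;55;80111.0;454222.0
"""

def load_trees(handle, delimiter=";"):
    """Read tree rows and keep only the columns used for the map."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    trees = []
    for row in reader:
        trees.append({
            "latin_name": row["latin_name"].strip(),
            "dutch_name": row["dutch_name"].strip(),
            "stem_size": float(row["stem_size"]),
            "x": float(row["x"]),  # RD easting
            "y": float(row["y"]),  # RD northing
        })
    return trees

trees = load_trees(io.StringIO(SAMPLE))
print(len(trees), trees[0]["latin_name"])
```

For the real download, the in-memory sample would be replaced by an opened file, and the unused columns (number, date, monument ID) are simply ignored.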
The coordinates presented a bit of a problem. They were not the usual latitude and longitude that one would expect. The file name contained a clue: the letters "RD". After some research, I was able to determine that RD is the abbreviation for Rijksdriehoekmeting, a coordinate system that is used exclusively in the Netherlands. It was invented long before satellites could be used to determine a location.
My first reaction was to try to find a formula to convert the coordinates to latitude and longitude. Thankfully, I was not the first to encounter this problem. The software that I already had running can work with RD. The only thing needed was the right "SRS projection string".
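For anyone who wants to check a few coordinates without such software, the conversion can also be approximated directly. The following is a minimal Python sketch and not the approach used for this map. It assumes the polynomial approximation for RD with the coefficients as they are commonly published, so verify them against an authoritative source before relying on the results. The function name rd_to_wgs84 is made up for illustration.

```python
# Approximate RD -> WGS84 conversion with a truncated polynomial series.
# Coefficients are assumed from commonly published tables; verify before use.
X0, Y0 = 155000.0, 463000.0            # RD reference point (Amersfoort), meters
PHI0, LAM0 = 52.15517440, 5.38720621   # the same point in WGS84 degrees

# Each term: (power of dX, power of dY, coefficient in arc seconds)
LAT_TERMS = [
    (0, 1, 3235.65389), (2, 0, -32.58297), (0, 2, -0.24750),
    (2, 1, -0.84978), (0, 3, -0.06550), (2, 2, -0.01709),
    (1, 0, -0.00738), (4, 0, 0.00530), (2, 3, -0.00039),
    (4, 1, 0.00033), (1, 1, -0.00012),
]
LON_TERMS = [
    (1, 0, 5260.52916), (1, 1, 105.94684), (1, 2, 2.45656),
    (3, 0, -0.81885), (1, 3, 0.05594), (3, 1, -0.05607),
    (0, 1, 0.01199), (3, 2, -0.00256), (1, 4, 0.00128),
    (0, 2, 0.00022), (2, 0, -0.00022), (5, 0, 0.00026),
]

def rd_to_wgs84(x, y):
    """Return an approximate (latitude, longitude) in degrees for RD x, y."""
    dx = (x - X0) * 1e-5   # offsets scaled to units of 100 km
    dy = (y - Y0) * 1e-5
    lat = PHI0 + sum(c * dx**p * dy**q for p, q, c in LAT_TERMS) / 3600.0
    lon = LAM0 + sum(c * dx**p * dy**q for p, q, c in LON_TERMS) / 3600.0
    return lat, lon

# A point roughly in the center of The Hague (illustrative RD coordinates).
print(rd_to_wgs84(81000.0, 455000.0))
```

Letting the map software reproject the data, as described above, remains the simpler route when the whole dataset has to be converted.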
A sick idea
After plotting the trees on the map, I still had no idea what to do with them. A few people might find it interesting to see what the greenest part of the city is, or even what kind of trees are growing where. But with 390 different species in the dataset, it would only appeal to the most hardcore of tree enthusiasts. Another idea that crossed my mind was finding the ideal spot for a hammock. This involves finding two firm-stemmed trees spaced between 3 and 4 meters apart. The issues of traffic and privacy need to be taken into account, but this data was not available. It had also become obvious that only trees managed by the municipality were included, and thus many parks seemed not to contain any trees at all.
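The hammock search is easy to sketch because RD coordinates are in meters, so plain Euclidean distance works. The Python below is an illustration only: the function name, the tuple layout, and the stem-size threshold of 30 are assumptions, and the unit of the stem size column is not known from the data.

```python
import math
from collections import defaultdict

def hammock_pairs(trees, min_dist=3.0, max_dist=4.0, min_stem=30.0):
    """Return pairs of sturdy trees whose distance lies in [min_dist, max_dist].

    `trees` is a list of (x, y, stem_size) tuples with x and y in meters.
    The min_stem threshold is a hypothetical value.
    """
    sturdy = [t for t in trees if t[2] >= min_stem]

    # Bucket trees into square cells of max_dist meters, so each tree only
    # has to be compared with trees in its own and the eight adjacent cells.
    grid = defaultdict(list)
    for i, (x, y, _) in enumerate(sturdy):
        grid[(int(x // max_dist), int(y // max_dist))].append(i)

    pairs = []
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # report each pair once
                            d = math.dist(sturdy[i][:2], sturdy[j][:2])
                            if min_dist <= d <= max_dist:
                                pairs.append((sturdy[i], sturdy[j]))
    return pairs

demo = [(0.0, 0.0, 40.0), (3.5, 0.0, 40.0), (10.0, 10.0, 40.0), (0.0, 3.5, 10.0)]
print(hammock_pairs(demo))
```

The grid keeps the search fast enough for a dataset of this size, where comparing every tree with every other tree would mean billions of distance calculations.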
Hay fever (allergic rhinitis) is an allergic inflammation of the nasal airways. It is triggered by the pollen of specific seasonal plants and is commonly known as "hay fever" because it is most prevalent during haying season. In Western countries, between 10 and 25% of people are affected annually.
While I was showing the trees projected on the map, my friend Öz came up with the idea that people who suffer from hay fever might find this interesting. Having never suffered from this affliction myself, I had not yet thought of it.
Some quick research revealed that there are certain kinds of trees that cause the most allergic reactions. In the Netherlands these are alder, birch, hazel, and oak. As such, the species in these groups should be highlighted on the map.
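Because the dataset includes the scientific name, picking out these groups comes down to looking at the genus, the first word of the Latin name. The Python sketch below shows one way to do that. The function name and the assumption that the genus always comes first in the name column are mine, not something the dataset documents.

```python
# Genera of the four tree groups named above.
ALLERGENIC_GENERA = {
    "Alnus": "alder",
    "Betula": "birch",
    "Corylus": "hazel",
    "Quercus": "oak",
}

def allergen_group(latin_name):
    """Return the common group name for an allergenic tree, or None."""
    parts = latin_name.strip().split()
    if not parts:
        return None
    genus = parts[0].capitalize()  # tolerate inconsistent capitalization
    return ALLERGENIC_GENERA.get(genus)

for name in ("Betula pendula", "quercus robur", "Tilia cordata"):
    print(name, "->", allergen_group(name))
```

Grouping by genus keeps the list short: the hundreds of species in the file collapse into four highlighted groups plus everything else.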
The data here cannot guarantee that a certain tree is not in the area, but if you happen to be allergic to a certain species, the map can give you a pretty good indication of streets to avoid when looking for a home.
Styling of the map
The goal of the map is to show the locations of the trees that are the biggest causes of hay fever symptoms. Any map features not relevant to this have been reduced as much as possible, while still trying to make it easy to determine what location is being shown.
To style the map, I used TileMill. The latest version of the software is 0.10.1, and as the version number indicates, it is not quite ready for prime time. Although it crashed a few times, you can work around many of the bugs and missing features. The online manual is of exceptional quality, and was very useful.
The data for the map is supplied by OpenStreetMap. Anybody can edit it and you can download the data of any country. It is like the Wikipedia of maps. The datasets from OpenStreetMap are a real game-changer; with this anybody can set up a map server.
Styles are applied using CartoCSS, a dialect of CSS made specifically for cartography. It works almost like regular CSS, but is quite different in many subtle ways that you would not expect. What I missed the most is an inspector which you can use to hover over an element and find out what layer it is on. That would save a lot of time toggling the visibility of layers and reloading the map over and over.
First tests with importing the tree data were done on top of the OSM Bright stylesheet. This is a general purpose stylesheet that includes many features, and uses different colors to highlight them.
Because we want to focus less on the geographical features and more on the data, I chose to use the Toner stylesheet as a basis. This is a high contrast stylesheet that uses mainly black and white and various bitmap patterns to differentiate between features. However, I found the high contrast and the patterns drew too much attention to themselves. For my map, the contrast has been greatly reduced. Patterns have been removed, and features and labels are in a shade of gray. The only colors are a light blue for water and a light green for parks. By removing irrelevant details and reducing color and contrast it becomes easier to focus on the data.
The city is divided into neighborhoods, which are known by most inhabitants and are therefore visible on the lowest zoom level. No other text labels are visible at this level. Because some labels were automatically positioned over clusters of trees, I had to adjust the size of many of them manually to keep them readable.
Trees are displayed as semi-transparent dots of different colors. The trees have different blend modes applied. This means that they are displayed more intensely when the dots overlap. This makes areas with lots of trees clustered together stand out visually. The colors of the trees have been chosen to stand out from the rest of the map, be distinguishable from each other, yet fit together as a group.
As we zoom in, the neighborhood labels become less opaque. Instead, the major streets start showing labels. The dots representing trees turn into larger circles. It becomes easier to distinguish the individual trees now that they are spaced further apart. The stem size is known, and as bigger trees produce more pollen, the stem size is reflected in the size of the dot.
Serving the map
TileMill is made by a company called MapBox that offers web hosting for interactive maps. But I think it's more fun to do it yourself. The map is exported as an MBTiles file. It is an SQLite database with a fixed schema and a whole lot of PNG images stored in it. There are over a dozen implementations of tile servers for it. I started with one made in PHP, because that is what I could get running the quickest. Then I modified it to support caching, and extended it to serve the metadata, and thus TilePhlox was created.
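The core of any MBTiles server is a single database lookup. The sketch below is in Python rather than PHP and is not TilePhlox's code. It only illustrates the lookup, assuming the table and column names from the MBTiles specification; the function names and the tiny in-memory database are made up so that the example runs on its own.

```python
import sqlite3

def read_tile(conn, z, x, y):
    """Return the image bytes for an XYZ tile, or None if it is missing."""
    # MBTiles numbers rows in TMS order (origin at the bottom), while most
    # web maps request XYZ tiles (origin at the top), so the row is flipped.
    tms_row = (2 ** z - 1) - y
    cur = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, tms_row),
    )
    row = cur.fetchone()
    return row[0] if row else None

def read_metadata(conn):
    """Return the name/value pairs of the metadata table as a dict."""
    return dict(conn.execute("SELECT name, value FROM metadata"))

# Stand-in for a real export: an in-memory database with the same layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
    "tile_row INTEGER, tile_data BLOB)"
)
conn.execute("INSERT INTO metadata VALUES ('name', 'demo')")
conn.execute("INSERT INTO tiles VALUES (1, 0, 1, ?)", (b"fake-png",))

print(read_metadata(conn), read_tile(conn, 1, 0, 0))
```

A real server would open the exported file instead and send the bytes back with an image/png content type. Caching helps because the tiles never change once the map is exported.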
I got the domain name "plekjes.com", which is Dutch for "places.com". The great thing about this name is that it is not only short, but I can also prefix it with all kinds of subdomains for future geographic projects. In case you were wondering, "hooikoorts.plekjes.com" means "hayfever.places.com".
Displaying a map requires a lot of tiles. It doesn't matter that it is just one city. Just opening the map in the default view requires 24 tiles (depending on viewport size). That means 24 requests to the web server. The current configuration with Apache and PHP will most likely not perform very well under a high load. But that is something to worry about later.
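The tile count follows from simple arithmetic on the viewport size. The Python sketch below assumes the usual 256-pixel tiles and a hypothetical 1280 by 800 pixel viewport; neither figure comes from the actual map configuration.

```python
TILE_SIZE = 256  # pixels; assumed, the usual size for web map tiles

def tiles_needed(width, height, offset_x=0, offset_y=0):
    """Count the tiles touched by a viewport.

    offset_x and offset_y give the pixel position of the viewport's
    top-left corner within the tile grid.
    """
    cols = (offset_x + width - 1) // TILE_SIZE - offset_x // TILE_SIZE + 1
    rows = (offset_y + height - 1) // TILE_SIZE - offset_y // TILE_SIZE + 1
    return cols * rows

print(tiles_needed(1280, 800))          # viewport aligned with the tile grid
print(tiles_needed(1280, 800, 100, 0))  # viewport shifted 100 pixels right
```

With these assumed numbers the aligned viewport needs 20 tiles and the shifted one needs 24. Every pan or zoom step triggers a fresh batch of requests on top of that.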
The tree data was released in early 2013 and the map is frozen in time at that point. Unless the municipality makes a commitment to release an updated dataset periodically, the map will slowly become out of date and will gradually lose its usefulness.
It has been a very educational experience to craft a map out of raw data. Not always enjoyable, but rewarding in the end. The ability we have to create our own maps is a huge leap forward. The OpenStreetMap data is a good enough foundation for most projects, and has worldwide coverage. Personal computers are now fast enough that you can render a map of a city like The Hague at street level in less than 30 minutes. The software is not quite there yet, but the whole stack is open source. It is under active development and shows great promise. Some amazing stuff is coming!
But to make amazing stuff, the amount of available open data needs to grow, and that takes time. Governments still need to learn that there is no such thing as "half open". There is no point in locking data up in stores. That is just silly.