by Martin Dobias
The code has been merged to SVN trunk and will be a part of QGIS v1.4.
This page summarizes the work done on my summer of code project. OSGeo asks participating students to prepare weekly reports, so mine will be posted here.
For brief introduction of this project check out the my blog post on QGIS blog.Important dates (see timeline):
- May 23: summer of code begins
- July 6-13: mid-term evaluation
- August 17-24: final evaluation
- mid-term - labeling with PAL integrated into QGIS with some essential options
- final - dynamic/static labeling, GUI for manual editing of labels, new options
|#1: 23.5. - 29.5.|
|[[#Week #1: May 23-29|report]]|
|#2: 30.5. - 5.6.|
|[[#Week #2: May 30 - June 5|report]]|
|#3: 6.6. - 12.6.|
|[[#Week #3: June 6-12|report]]|
|#4: 13.6. - 19.6.|
|[[#Week #4: June 13-19|report]]|
|#5: 20.6. - 26.6.|
|[[#Week #5: June 20-26|report]]|
|#6: 27.6. - 3.7.|
|[[#Week #6: June 27 - July 3|report]]|
|#7: 4.7. - 10.7.|
|[[#Week #7: July 4-10|report]]|
|#8: 11.7. - 17.7.|
|[[#Week #8: July 11-17|report]]|
|#9: 18.7. - 24.7.|
|[[#Week #9: July 18-24|report]]|
|#10: 25.7. - 31.7.|
|[[#Week #10: July 25-31|report]]|
|#11: 1.8. - 7.8.|
|[[#Week #11: August 1-7|report]]|
|#12: 8.8. - 14.8.|
|[[#Week #12: August 8-14|report]]|
So this is the end, dear readers.In this last week I've done some bugfixing:
- fixed problem with misplaced or missing labels for features due incorrect clipping by actually removing the code responsible for clipping. This might be a bit controversial change, however I had good reasons for doing so: it was giving wrong results nearly all the time, being a big chunk of code, hard to debug. It can be relatively easily replaced with GEOS routines. Moreover the clipping should be done when registering the features and not when actually extracting candidate labels.
- fixed misplaced labels with higher rotation
- fixed occasionally flipped characters in curved labels
I've added the same options for curved labels as used in parallel ones: multiple choice above/on/below line, for labels above and below line distance from the line.
All the work is available in the symbology-ng branch of QGIS source repository.
Conclusion: I consider my summer of code project as a successful one - although I haven't done everything I wanted to do within this project. The most important objectives have been met: I've integrated PAL library into QGIS for label placement, created a graphical interface where the user can easily set how vector layers should be labeled. And I've added a bunch of new functionality to PAL together with some bugfixes and refactoring.
I'm happy that this project has created some momentum around PAL and label placement in general and it's very likely that in near future PAL will get under the umbrella of OSGeo. This might make it more available for interation in other projects and might attract new contributors.
During the summer I've been splitting all my time between the work on GSoC and qgis-mapper project. That's a project we've been developing in a group of students at university and aims to create a platform for mobile mapping, useful especially for street mapping. (To be released soon.)
I've added saving of layer labeling settings into project file. For
this I've added "custom properties" for map layers - basically a
key-value map of values assigned by plugins to the layers. This map is
saved to project and loaded back during the project load.
Another feature I've added is the ability to remove duplicate labels
of connected lines. If turned on, the features with equal label text
are grouped and the algorithm tries to merge the lines into one: this
helps also to generate better label candidate positions.
Among other developments I've tweaked cost evaluation of label candidates:
- candidates further from the line's center point are penalized so the
label is now mostly put on center (for better readability)
- short lines / small polygons are penalized to increase possibility
of labeling greater (and thus more important) map features
The SoC is rapidly coming to the end, but there's still quite some
work that could be done, which I won't manage to do during the last
days of the program. The next week I'd like to finish some smaller
features, squash some bugs and maybe try to do some speed improvements
and documentation. Shortly after the end of SoC 2009 I'm travelling
abroad for several weeks, then I'm going to bring the labeling (and
new symbology) ready for QGIS 1.3 release.
This week I've done some refactoring and various cleanups in the PAL library. I've simplified the process of registration of features in a layer and other related stuff. This allowed me to add possibility for user to have labelled either every part of a multi-part feature or place only one label for feature. When using one label per feature, the engine selects the biggest/longest part that will be labelled.
While playing with the labeling I've found out that we should probably add some tweaks to the costs of labels regarding feature's size. In the attached map of world you can see that small countries in Europe like Andorra or Lichtenstein are labelled, while other much larger countries are left without a label.
Unfortunately I haven't done much more as I've spent few days out of office. I hope to catch up the next week.
Most importantly I've managed to get the curved labels working! The implementation still needs a lot of love, but at least there's something that can be built upon. I've borrowed some code from Mapnik to avoid spending hours with getting the placement right. To make curved labels possible in PAL, I've added support for multi-part labels and (optional) advanced label information which contains each character's width to calculate the curved labels correctly.
I've also added an option to show labels for all features - despite the conflicts they have with other labels.
Next week I'd like to add some options to remove repeated labels and do some improvements to the curved labels.
I've started to look into the topic of labels that follow the lines, i.e. curved labels. PAL library doesn't offer this functionality - the labels can be only rotated. But for good cartographic result it isn't always sufficient.
Curved labels can be produced by splitting the text to characters and finding out the right angle for each character to fit the line direction as much as possible. The attached picture shows a simple experiment of such fitting (a test app I've done in Python).
I'll probably reuse Mapnik placement finder code which seems to be relatively simple and has all the math inside (so I don't have to derive everything again).
I've done some work on making the LabelPosition class (from PAL) more universal so that label could be not only straight but possibly also curved.
Next week I'd like to get this to a usable state, though we're finishing a big school project next week (qgis-mapper) so I'm not yet sure how much time I'll be able to allocate.
First of all I've been exploring more precisely how the cost is assigned to the candidate labels. I may post the details later, this is going to be simplified a bit. First of all: We'll be looking for the cheapest solution, i.e. lower cost is better. The cost is assigned when the candidate is created, ranging from 0.0001 to 0.0021. Then, candidates are checked against obstacles (other features), scoring +1 for every overlap. Finally, the candidates are ordered by their cost and only few cheapest ones are kept. The only exception is with polygon candidates - their cost evaluation is very computationally intensive so the cost is assigned after filtering against obstacles.
Costs for points are assigned so that top-right candidate is the best, the bottom-left candidate is the worst, the rest somewhere between these two. I'd like to customize this, so one could prefer some positions or disable some of them. For line label candidates there's calculation of distance from label's begin to end on line and the cost is real line distance vs straight line distance ratio (using label's start and end points), candidates on straight segments are prefered.
I've done some refactoring - got rid of of many "friend class" declarations (that pretty much exploit the encapsulation), moved most of the cost calculation code to one place and removed some dead code.New functionality:
- "over point" placement that tries to place the label directly over the point rather than placing it around the point. Applicable also for polygons as "over centroid" option.
- "horizontal" placement for lines: this is prefered placement option when it comes to e.g. drawing of road shields.
- ability to select where exactly place parallel line labels - either above, below or on the line. Or any combination of these options. The above/below side is determined either from map orientation or from line direction.
- fixed calculation of point-label distance, resolving a problem with points not behaving as obstacles
- fixed cost evaluation where all candidates for polygons with lat/lon coordinates got equal cost
Next week I'd like to focus on improving the label placement options.
I've been poking a bit more into the PAL library and allowed retrieval
of candidate label positions, In engine configuration there's now an
option that displays all candidates as frames when the labeling is
done. It helps to see what is actually going on in the library. I've
created a simple map canvas tool that allows inspection of costs of
the label candidates. When the tool is active, clicking a candidate
shows its cost in a tooltip. This helped to find some problems, e.g.
incorrect cost calculation for polygons with numerically small
coordinates (e.g. degrees).
Another new features are scale-based visibility of labeling and
support for buffer (mask) around the rendered font for better
readability. With the increased number of settings, I've expanded
labeling configuration dialog. Finally, some bugs were fixed here and
We're going to need some more changes in PAL library, mainly to have
better control when creating candidate labels. The PAL code probably
should get some refactoring: there are quite many places where
classes/functions access directly private members of other classes
making the code harder to understand and modify.
Next week I'd like to continue with adding features to point and line labeling.
- registration of features - features are checked and inserted into R tree
- extraction of the problem - generation of label candidates, pruning of candidates and creation of collision graph
- problem solving - creation of initial solution (FALP) and its optimization (POPMUSIC)
Registration of polygons was slow due inefficient check for self-intersecting rings (in splitGeom function). I've changed it to isValid check from GEOS and the time for rendering loop (where the registration takes place) went from 4.5 secs to about 0.7 for my test layer. Generation of candidates when labeling around centroid was taking 5.9 seconds with much time spent calculating centroid. Actually it wasn't calculating centroid ("center of mass"): using a grid 10x10 points (in bounding box) it was looking for the point farthest from the polygon ring. Unfortunately sometimes the "best" point was somewhere in the corner of bounding box, thus the problem with misplaced labels I had. Using a correct centroid algorithm I've got rid of misplaced labels and the processing time down to 0.6 seconds. Of course centroid is not necessarily located inside the polygon (think of U shaped polygon) but the performance and position are fine.
Somehow worse situation is with horizontal and free positioning for polygons. 91% of the time is spent in calculation of cost of candidates that is very expensive. Additionally there's a large amount of candidates for some features: for 118 polygons there are 33K candidates, that's too much I think.
The exams period is over, next week I'd like to continue addressing the speed and do some user features.
This week was quite weak when speaking about SoC: I've been mostly studying for my exam from statistical methods (done!) and other school duties. I've done just few more measurements and profiling of PAL performance. Labeling of polygons with "Horizontal" and "Free" candidate generation methods has very poor performance: it took nearly 60(!) seconds to generate the candidates for my cca 100 polygons layer. The polygons aren't really simple, but that's just too long. These two methods contain more complex geometry handling like splitting the polygons into convex polygons which are then used for generation of candidate labels.
The next week will be very busy too: there's last exam I have to take (analyses of data structures). It's a really tough one, so there will be probably no progress next week, but I hope to catch up in following weeks when the summer begin (why the "summer" of code starts in May?)
I've added more options to the labeling settings: layer priorities, whether the layer features should be considered as obstacles and distance of label from the feature (applies only to points and lines). There's also a new dialog for configuration of PAL labeling engine: how many label candidates show be generated and what search method should be used for the placement.
The main issues from the last week are fixed: segfault due incorrect object deletion is gone, exception from PAL is gone too (happened with invalid lines).
Labeling speed is a serious issue in some cases, so I started to address it: I've added a labeling hook into vector layer's rendering routines that eliminates the need to iterate the layer second time just for the labeling. Then I've started to watch which parts of the labeling take most of the time. My conclusion is that usually most of the time the bottleneck is in extraction of the optimization problem from the data. The solution itself is usually quite quick if there isn't a high number of conflicts. The worst performance is with polygon layers (points and lines are relatively fine). For example it takes about 10(!) seconds to label my test layer with 100 polygons. I've decided to fire up callgrind and measure a bit. The results are fairly clear: when labeling polygons (using centroids), roughly 90% of time is spent in two functions: splitGeom and getCentroid. Hopefully that's something that can be optimized!
Next week I'd like to do some more profiling, explore the geometry code in PAL a bit and think about possible optimizations. The idea is to make the labeling suitable for ordinary map browsing. Having to wait several seconds on every map refresh is barely acceptable for the users. Apart from these issues I'd like to start with some user-visible features, e.g. defining classes of features in layer.
Currently the biggest problem is the exams period during June, one exam every week.
I've made some good progress with integration of my labeling tool into QGIS: I have created a simple plugin where you can set labeling options for every layer (in a dialog). Currently the options include placement method (depends on layer type), field for labeling, font and color. PAL supports few more options which will be added soon. The plugin hooks into the map renderer so that every time the map is rendered, plugin lets PAL do the calculation of labeling and draws the resulting labels on the map. The labeling can be done with more layers at once and the labels are now properly rotated.
I commit my changes to symbology-ng branch where I've been doing already some symbology/renderer improvements. The plugin is located in src/plugins/labeling directory.Next week I'll work on implementing the rest of the options PAL library offers for free (layer priorities, obstacles, engine configuration). Besides that I have a growing TODO list of various related tasks:
- store labeling information in vector layer (so it gets saved in project)
- extract the geometries for labeling in rendering loop to save one traverse of layer
- calculate labeling in a separate thread and draw it when it's ready -> user doesn't have to wait until the labeling is done
I've came across some problems already: sometimes when an exception is thrown PAL locks itself on an already locked mutex, sometimes I get a segfault with a dense layer (not sure yet what's the problem). Polygon labeling is sometimes painfully slow and sometimes the label can be placed outside the polygon. I'd like to address these problems in following weeks.
Like this week, the next one I have another exam so please bear with me :-)
I've downloaded sources of PAL (library for automatic label placement) from svn trunk, built it and started with some experiments. With configure option --enable-svgmap turned on it creates a .svg file that shows bounding boxes of the labels. I've managed to get it working and finally today I was able to create a simple command line tool (based on QGIS libraries) that opens a layer consisitng of points, draws them to an image and draw labels returned by PAL.
Take a look at the result: there are no collisions between the labels. Good. You might still see some problems - sometimes it's unclear to which point the label belongs.
Besides the labeling I've been busy with exams and the next week will be in similar fashion. I'd like to continue with my experiments in form of a plugin for QGIS that would do the labeling. So the next week I'll prepare some foundations of the plugin and GUI for configuration. Once the plugin will be working well, I'll consider putting the functionality into core library.
Currently I'm not blocked on anything. I'd like to continue with tests with PAL library and study some papers covering the algorithms implemented.