Migrating from Wordpress to Jekyll

posted by on 28 Apr 2012

I just switched this blog from an ancient version of wordpress running on a VPS to a static-file jekyll bootstrap site (hosted by github). Let me know if you experience any weirdness on the site or feeds. I’ve taken good measures to make sure links don’t break (old URLs should get a 301 permanent redirect to blog.perrygeo.net) but let me know if you get any 404s.

So why do it?

  1. Having a PHP-MySQL app running on a VPS just to serve up a bunch of blog posts seemed excessive. I don’t have the desire to maintain that sort of infrastructure for a simple blog!
  2. Wordpress’ editing and admin interface suck. I prefer vim and bash.
  3. Markdown is a great language for quickly banging out blog posts.
  4. Static files just make sense for what is basically static content.
  5. Github pages provides the hosting for me and even handles CNAMEs for DNS.
  6. Managing revisions with git.

The conversion process

The transition was not entirely smooth, and most of the trouble can be traced directly to dumb decisions on my part. I won’t recount the entire process (there are plenty of guides on the internets) but I’ll outline the major steps here:

  1. Export the wordpress blog to an xml file. I had to use xmllint to clean it up a bit.
  2. Set up a disqus account and import my wordpress file. Disqus will handle all the comments which are the only dynamic content on the page.
  3. Use exitwp.py to convert the xml to jekyll markdown files. This worked OK. Not great. Tags and formatting did not come through as expected and I had to wrestle the script a bit. Tables were destroyed and some iframes (youtube links) were lost.
  4. Forked Jekyll Bootstrap and brought in my posts.
  5. Started tweaking the css and markdown to get formatting right. Still have a ways to go on this front - let me know if there is any content you’d like me to restore faster than others.
  6. Had to write a little web service to redirect posts; the old blog stupidly used the default wordpress URLs like /wordpress/?p=4 which needed to go to /2010/01/01/blah.
  7. My images were all over the place; some I had in wordpress uploads, others on various servers, some were absolute links, others relative. Gathering them all in one place and using some sed-fu to get the paths right was essential.
  8. Retagged some posts - still working on tags.
  9. Set up Google Analytics to track usage.
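
The redirect service from step 6 can be sketched roughly as follows. This is a minimal illustration, not the service actually used for this blog: the `POST_PATHS` table, the `redirect_for` function and the `http://` scheme are assumptions, and only the example post id and path come from the step above.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup table: old WordPress post id -> new Jekyll path.
POST_PATHS = {
    "4": "/2010/01/01/blah",
}


def redirect_for(old_url, post_paths=POST_PATHS):
    """Return (status, location) for an old WordPress-style URL."""
    # Pull the ?p=<id> query parameter out of the old URL.
    query = parse_qs(urlparse(old_url).query)
    post_id = query.get("p", [None])[0]
    new_path = post_paths.get(post_id)
    if new_path is None:
        # Unknown or missing post id: nothing to redirect to.
        return 404, None
    # Known post: answer with a permanent redirect to the new site.
    return 301, "http://blog.perrygeo.net" + new_path


status, location = redirect_for("/wordpress/?p=4")
```

A real service would wrap this lookup in whatever web framework is at hand and send the status and a Location header back to the browser.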

I think that’s about it. There are still some big formatting problems on older posts (mostly due to the fact that I used blockquotes for code). And tables are still destroyed. I’ll be working on cleaning these up as I go along.

Overall impression of Jekyll-Bootstrap and hosting with Github pages? Awesome. I would highly recommend it to anyone starting a new blog or converting a smaller/better-behaved wordpress site. It is so much better than having to deal with PHP and MySQL (hopefully the last time I’ll ever see them!). But the conversion was a bit tricky and took way more of my Friday and Saturday than I’d like to admit. I would not want to do that again… But I’m glad I did.

What do you think of the new digs?

Working with mbtiles in python

posted by on 25 Mar 2012

python-mbtiles. Check it out.

I’ve been working a bit with Tilemill lately and love the Carto css styling, interactivity through UTFGrids and being able to export the whole deal as a single mbtiles sqlite database. But when it comes to working with the mbtiles databases, I’ve found both Tilestache and Tilestream to be fairly limiting:

Tilestache serves images but does not (yet) serve up UTFGrids directly from mbtiles while Tilestream hardcodes a “grid()” JSONP callback around the returned json data making it fairly specific to Wax client libraries.

So I went down two paths, first trying to export all the tiles out of mbtiles to json and png files (for those times when you just want to serve static files), then trying to write a simple server that would do dynamic jsonp callbacks. Turns out that in the process, I was able to abstract a lot of the python<->sqlite interaction into some generic python classes.

Thus python-mbtiles was born. It provides a simple mbtiles web server, a conversion script, and some python classes to work with. No frills, no anything really at this point. More an experiment gone right that might be useful to someone out there in GeoPython land. Enjoy and let me know if you have any ideas!
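
To give a feel for the python<->sqlite interaction involved, here is a minimal sketch of reading one tile with the standard sqlite3 module. It is not the python-mbtiles API; `read_tile` is a hypothetical helper, and the in-memory table stands in for a real .mbtiles file, assuming the `tiles` table layout from the MBTiles spec.

```python
import sqlite3


def read_tile(conn, zoom, col, row_xyz):
    """Return the raw tile blob for an XYZ (top-left origin) tile, or None."""
    # MBTiles stores rows in the TMS scheme, so flip the y axis.
    row_tms = (2 ** zoom - 1) - row_xyz
    cur = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, col, row_tms),
    )
    found = cur.fetchone()
    return found[0] if found else None


# Tiny in-memory stand-in for a real .mbtiles file (made-up tile bytes).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
    "tile_row INTEGER, tile_data BLOB)"
)
conn.execute("INSERT INTO tiles VALUES (1, 0, 1, ?)", (b"fake-png-bytes",))

tile = read_tile(conn, 1, 0, 0)
```

With a real file you would pass its path to `sqlite3.connect` instead of `":memory:"`.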

Average Aspect

posted by on 18 Mar 2012

Ever try to figure out what the average aspect of an area is? i.e.

What direction does this hillside face?

Let’s say we want to determine the average elevation of an area based on a raster DEM. Just take the arithmetic mean of all the elevation cells contained in the area - a simple zonal statistics problem.

Turns out that aspect is not quite as straightforward. True, we can easily use gdaldem to create an aspect map.

gdaldem aspect elevation.tif aspect.tif

This gives a raster with values in degrees: 0 is north, 90 is east, 180 is south, etc… but note that 360 is north as well. We’re dealing with angular units, not linear units.

For example, take a nearly North facing hillside; the left edge is facing slightly NW (350 degrees) while the right edge faces slightly NE (10 degrees).

The arithmetic mean of the aspect values = (350+350+10+10)/4 = 180°. Due south? That’s entirely wrong! It doesn’t take into account the angular units. For that we need to create grids representing the sin and cos of the aspect.

Luckily you can use the handy gdal_calc.py utility that comes with recent versions of gdal. This allows you to apply numpy’s trigonometric functions to a raster…

gdal_calc.py -A aspect.tif --calc "cos(radians(A))" --format "GTiff" --outfile cos_aspect.tif  
gdal_calc.py -A aspect.tif --calc "sin(radians(A))" --format "GTiff" --outfile sin_aspect.tif

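Now we can look at the sum of the cos/sin grid cells for our area and take the arctangent according to this python code:

import math
# cos_cells and sin_cells hold the cos/sin raster values that fall inside the area
# atan2 takes (y, x), so the sine sum goes first for a compass bearing
avg_aspect_rad = math.atan2(sum(sin_cells), sum(cos_cells))
# wrap negative angles into the 0-360 range
avg_aspect_deg = math.degrees(avg_aspect_rad) % 360
print(avg_aspect_deg)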

In our example avg_aspect_deg comes out to an aspect of 0 degrees (due north) which is exactly what we’d expect.
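
The four-cell example can be checked without any rasters. The sketch below is a self-contained illustration of the same circular-mean idea; the `mean_aspect` function name and the rounding step are my own additions, and the input is just the list of aspect values from the example.

```python
import math


def mean_aspect(aspects_deg):
    """Circular mean of compass aspects in degrees (0 = north, clockwise)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in aspects_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in aspects_deg)
    # atan2 takes (y, x): the sine sum goes first.
    mean = math.degrees(math.atan2(sin_sum, cos_sum))
    # Round away floating-point noise, then wrap into [0, 360).
    return round(mean, 6) % 360.0


cells = [350, 350, 10, 10]
naive = sum(cells) / float(len(cells))  # arithmetic mean: points due south
circular = mean_aspect(cells)           # circular mean: points north
```

The rounding is only there so that a result a hair below zero does not wrap around to just under 360.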

Thanks to Dan Patterson for his forum post which clued me into this approach.

UTFGrids with OpenLayers and Tilestache

posted by on 24 Feb 2012

A while back, the Development Seed team developed the UTFGrid spec to provide

a standard, scalable way of encoding data for hundreds or thousands of features alongside your map tiles.

The basics

In more detail, the UTFGrids are invisible “ASCII Art” and attribute data embedded in json. They sit “behind” your map tiles (they are not rendered visually) and allow quick attribute lookups without going back to the server. This allows a high degree of real-time map interactivity in an HTML web map - something that has typically been the strong point of plugin-based maps.

So take this tile image…

and its corresponding “utfgrid” …

          !######$$$$%%% %%%% % 
          !#######$$$$%%%    %%%
         !!#####   $$$%%%    %%%
         !######  $$$$%%% %% %%%
        !!!####  $$$$$%%%%  %%%%
      ! !###### $$$$$$%%%%%%%%%%
     ! !!#####  $$$$$$$%%%%%%%%%
    !!!!!####   $$$$$$%%%%%%%%%%
    !!!!!####   $$$$$$%%%%%%%%%%
    !!!!!####   $$$$$%%%%%%%%%%%
    !!!!!#####% $$   %%%%%%%%%%%
    !!!!!### #      %%%%%%%%%%%%
    !!! #####   ''''%%%%%%%%%%%%
     !   ###      ('%%%%%%%%%%%%
       ) ### #  ( ((%%%%%%%%%%%%
      ))  ##   (((((%%%%%%%%%%%%
      ))  #    ****(+%%%%%%%%%%%
       )        %**++++%%%%%%%%%
       , , ------*+++++%%%%%%%%%
.     ,,,,,------+++++++%%%%%%%%
..  /,,,,,,------++++++%%%%%%%%%
.  //,,,,,,------000++000%%%%%%%
  211,,,,,33------00000000%%%%%%
 2221,,,,33333---00000000000%%%%
222222,,,,3635550000000000000%%%
222222,,,,6665777008900000000%%%
22222::66666777788889900000 %%%%
22222:;;;;%%=7%8888890  0   %%%%
22222;;;; ==??%%888888  00 %%%%%
222222 ;;  =??%%%8888       %%%%
222     ;;   ?A>>@@@          B%
CCC      ;;   DEE@@@          BB

You can see how each character corresponds with a country. The character’s code is used as a lookup key to retrieve the data associated with that feature (which is also included in the json tile).
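
As a rough sketch of that lookup, the character-to-key decoding described in the UTFGrid spec can be written in a few lines of python. The tiny `tile` dictionary is made-up sample data in the grid/keys/data layout, and the function names are hypothetical.

```python
def utfgrid_key_index(char):
    """Decode one UTFGrid character into an index into the "keys" list."""
    code = ord(char)
    if code >= 93:  # the encoder skips the backslash (92)
        code -= 1
    if code >= 35:  # the encoder skips the double quote (34)
        code -= 1
    return code - 32


def utfgrid_lookup(tile, row, col):
    """Return the attribute dict under a grid cell, or None if it is empty."""
    key = tile["keys"][utfgrid_key_index(tile["grid"][row][col])]
    return tile["data"].get(key)


# Made-up 2x2 grid: a space means "no feature here".
tile = {
    "grid": [" !", "#!"],
    "keys": ["", "10", "20"],
    "data": {"10": {"NAME": "A"}, "20": {"NAME": "B"}},
}

hit = utfgrid_lookup(tile, 1, 0)
miss = utfgrid_lookup(tile, 0, 0)
```

Because the whole lookup is a string index plus two dictionary reads, the client can do it on every mouse move without a server round trip.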

If you want to dig in, check out the mapbox demo.

The Server side

I’m going to assume you have Tilestache and Mapnik 2+ already installed (if not, you should!). The steps to configuring your server for UTFGrids are fairly simple.

First, set up a mapnik xml file pointing to your data source.

<?xml version="1.0"?>

<!-- An ultra simple Mapnik stylesheet -->

<!DOCTYPE Map [
<!ENTITY google_mercator "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs +over">
]>

<Map srs="&google_mercator;">
    <Style name="style">
        <Rule>
            <PolygonSymbolizer>
                <CssParameter name="gamma">.65</CssParameter>
                <CssParameter name="fill">green</CssParameter>
                <CssParameter name="fill-opacity">0.5</CssParameter>
            </PolygonSymbolizer>
            <LineSymbolizer>
                <CssParameter name="stroke">#666</CssParameter>
                <CssParameter name="stroke-width">0.3</CssParameter>
            </LineSymbolizer>
        </Rule>
    </Style>
    <Layer name="layer" srs="&google_mercator;">
        <StyleName>style</StyleName>
        <Datasource>
            <Parameter name="type">shape</Parameter>
            <Parameter name="file">sample_data/world_merc.shp</Parameter>
        </Datasource>
    </Layer>
</Map>

Next, set up a tilestache configuration file

{
"cache": {
           "name": "Disk",
           "path": "/tmp/stache"
},
"layers": {
    "world":
    {
        "provider": {"name": "mapnik", "mapfile": "style.xml"}
    },
    "world_utfgrid":
    {
        "provider":
        {
            "class": "TileStache.Goodies.Providers.MapnikGrid:Provider",
            "kwargs":
            {
                "mapfile": "style.xml",
                "fields": ["NAME", "POP2005"],
                "layer_index": 0,
                "scale": 4
            }
        }
    }
}
}

Finally, you’re ready to run the tilestache server…

tilestache-server.py -c your.cfg -i localhost -p 7890

Now you should be serving utfgrids at http://localhost:7890/world_utfgrid/
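
To poke at a single grid tile you need its z/x/y address. Here is a small sketch, assuming the usual spherical mercator tile numbering and the .json extension used for the grid tiles; `tile_url` is a hypothetical helper, not part of Tilestache.

```python
import math


def tile_url(lon, lat, zoom, base="http://localhost:7890/world_utfgrid"):
    """Build the URL of the grid tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still land on a valid tile.
    x = min(n - 1, max(0, x))
    y = min(n - 1, max(0, y))
    return "%s/%d/%d/%d.json" % (base, zoom, x, y)


url = tile_url(0.0, 0.0, 1)
```

Opening the resulting URL in a browser while the server is running should return the json for that tile.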

The Client side

Now we need something to consume the UTFGrid tiles and interact with them in an HTML/JS environment. The original client implementation of UTFGrid support is provided by Wax which sits atop mapping clients like Modest Maps and Leaflet. Wax is very slick and easy to use but doesn’t work so well for more complex arrangements or with OpenLayers-based maps.

Rather than clog up Wax with the complex UTFGrid use cases that we envisioned, we decided to implement a UTFGrid client in native OpenLayers. Hence my project for the OSGEO code sprint was born.

olexample.PNG

The result was a new OpenLayers Layer which loads up the json “tiles” behind the scenes…

        var grid_layer = new OpenLayers.Layer.UTFGrid( 
            'Invisible UTFGrid Layer', 
            "./utfgrid/world_utfgrid/${z}/${x}/${y}.json"
        );
        map.addLayer(grid_layer);

and an OpenLayers Control that handles how the mouse events interact with the grid. In this example, as the mouse moves over the map, a custom callback is fired off which updates a div with some attribute information.

        var callback = function(attributes) {
            // Look up the target div once so both branches can use it
            var element = OpenLayers.Util.getElement('attrsdiv');
            if (attributes) {
                var msg  = "<strong>In 2005, " + attributes.NAME;
                    msg += " had a population of " + attributes.POP2005 + " people.</strong>";
                element.innerHTML = msg;
                return true;
            } else {
                // No feature under the cursor: clear the message
                element.innerHTML = '';
                return false;
            }
        };

        var control = new OpenLayers.Control.UTFGrid({
            'handlerMode': 'move',
            'callback': callback
        });
        map.addControl(control);

Overall the design goal was to decouple the loading/tiling of the UTFGrids from the interactivity/control. I think this works out nicely and, while a bit more cumbersome than the method used by Wax, it is more flexible and integrates well with existing OpenLayers apps.

You can see them in action on the examples pages:

* Demonstrating the use of different event handlers (click, hover, move)

* Demonstrating multiple interactivity layers (the interactivity layer need not be visible in the map tiles!)

And feel free to check out the code at my github fork.

What do you think? Let me know…

Optimizing KML for hierarchical polygon data

posted by on 18 May 2011

For all the benefits of KML, it is decidedly a step backwards for handling large vector datasets. Most KML clients, including the canonical Google Earth application, experience debilitating slow-down when viewing a couple dozen MB of vector data - datasets that I could easily open on a Pentium 4 in ArcView 3.2 10 years ago!

The unfortunate reality is that optimizing the performance of KML datasets is conflated with the structure of the data and is thus the responsibility of the data publisher. The wisdom of combining styling, performance-related structure, organizational structure, geometry and attributes into a single file format may be questionable, but KML has become the de facto geographic markup language due to its other benefits.

Anyways, back to performance enhancements on big vector datasets… The concept of “regionation” is used by several KML tools to improve performance. From the Google LatLong Blog:

You can think of Regionation as a hierarchical subdivision of points or tiles, which shows less detail from afar, and more detail as you zoom in to the globe. This dynamic loading creates clearer visualizations by minimizing clutter, while simultaneously speeding up the rendering process.

In most implementations, there is a generic strategy for determining this hierarchy based on attributes or geometry size (in the case of vectors) or by a tile system. Neither is ideal when you want to preserve the vector nature of the data, split it into small, easily-loadable files and determine its view based on the natural hierarchy that is built into the data structure.

Specifically I am thinking about watersheds here - the US Hydrologic Units. Hydrologic units are watershed boundaries that are organized in a nested hierarchy; higher levels contain smaller watersheds that are contained within a single watershed from a “parent” level. The unique identifiers (hydrologic unit codes or HUCs) are rather ingenious as well; each level is represented by 2 digits, and the levels are concatenated to form a single identifier that can be used to determine its “parent”. For example:

Level 4 HUCs: e.g. 17090011

Level 5 HUCs: e.g. 1709001104

Level 6 HUCs: e.g. 170900110403
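
Since each level just appends two digits, finding a parent is a matter of slicing the string. A quick sketch (the function names are mine, and 1709001105 is a made-up sibling code used only for illustration):

```python
from collections import defaultdict


def huc_parent(code):
    """Drop the last two digits to get the parent; None at the top level."""
    return code[:-2] if len(code) > 2 else None


def group_by_parent(codes):
    """Map each parent code to the list of its direct children."""
    children = defaultdict(list)
    for code in codes:
        children[huc_parent(code)].append(code)
    return dict(children)


groups = group_by_parent(["1709001104", "1709001105", "170900110403"])
```

Each key in `groups` would correspond to one "children" file, holding only the units one level down from that parent.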

Instead of fabricating a hierarchy of features, why not just use this natural hierarchy to structure the KML documents?

hucs-1.png

Or as KML markup:

    <Placemark>
        <name>17090009</name>
        <styleUrl>#HUC_8-default</styleUrl>
        <Polygon><outerBoundaryIs><LinearRing><coordinates>...
        </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </Placemark>

    <NetworkLink>
    <name>17090009_children</name>
    <Region>
      <LatLonAltBox>
        <west>-123.001645628</west>
        <south>44.8300083641</south>
        <east>-122.203351254</east>
        <north>45.298653051</north>
      </LatLonAltBox>
      <Lod>
        <minLodPixels>256</minLodPixels>
        <maxLodPixels>1600</maxLodPixels>
      </Lod>
    </Region>
    <Link>
      <href>./17090009_children.kml</href>
      <viewRefreshMode>onRegion</viewRefreshMode>
    </Link>
    </NetworkLink>

The advantages of this design are that you don’t have to break the geometries up to fit into a square tiling pattern, data loads and renders in a logical pattern, and there will always be 100 or fewer (usually far fewer) placemarks per file, since each level adds only two digits to the HUC code. File sizes stay low, network links load quickly and request/rendering occurs only when they come into view. For this example dataset totaling 300 MB of shapefiles, there are several hundred resulting kmz files without any repeated features, each smaller than about 150 KB. In essence, it achieves optimal performance by its very design.
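
For a sense of how little code the region-based link needs, here is a minimal sketch that builds one network link element with python's ElementTree. It is not the script used for this dataset; `network_link` and its parameters are hypothetical, the element names use the standard KML capitalization, and the default pixel thresholds are the ones from the markup above.

```python
import xml.etree.ElementTree as ET


def network_link(huc, west, south, east, north, min_lod=256, max_lod=1600):
    """Build a region-based NetworkLink pointing at a HUC's children file."""
    link = ET.Element("NetworkLink")
    ET.SubElement(link, "name").text = huc + "_children"

    # The region is the parent's bounding box plus level-of-detail limits.
    region = ET.SubElement(link, "Region")
    box = ET.SubElement(region, "LatLonAltBox")
    for tag, value in (("west", west), ("south", south),
                       ("east", east), ("north", north)):
        ET.SubElement(box, tag).text = str(value)
    lod = ET.SubElement(region, "Lod")
    ET.SubElement(lod, "minLodPixels").text = str(min_lod)
    ET.SubElement(lod, "maxLodPixels").text = str(max_lod)

    # The children file is only fetched once the region becomes active.
    target = ET.SubElement(link, "Link")
    ET.SubElement(target, "href").text = "./%s_children.kml" % huc
    ET.SubElement(target, "viewRefreshMode").text = "onRegion"
    return ET.tostring(link, encoding="unicode")


kml = network_link("17090009", -123.001645628, 44.8300083641,
                   -122.203351254, 45.298653051)
```

Looping this over every parent unit, with the bounding box taken from its geometry, would give one link per children file.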

Here’s a video of it in action:

This was all done with a fairly “hackish” python script. I’ll continue to refine it as needed for this particular application but, at this time, it’s not intended to be a reusable tool - if you want to use it, be prepared to dig through the source code and get your hands dirty. The same concept could theoretically be applied to any spatially-hierarchical vector data (think geographic boundaries … country > state > county > city).
