Thoughts on #csvconf

Last week I attended and spoke at csv,conf here in Berlin. The event was a fringe event of the much larger Open Knowledge Festival, and billed itself as “a conference for data makers everywhere”. At Lokku we don’t so much make data as consume it, and my talk focused on the things I’ve learned in eight years of parsing real estate data at Nestoria. The slides are embedded below.

The organizers chose a good venue and all the logistical details were well taken care of. Most of all, though, what stood out for me was the amazing diversity of the speakers. Usually when someone comments on conference diversity they mean the gender mix of the speakers, and that was well balanced, but what I mean in this case is the wide professional and geographic diversity of the speakers. As an example, I went from a talk by a guy working for the Belgian army about their data management challenges to a talk by a guy trying to get the Mexican government to open their schools data.

Consistent across all speakers, though, was the visible belief that free and open data will lead to a better world. Many people around the world are working on this. Impressive tools (see for example csvlint.io, which was presented, amongst others) and best practices are emerging, small victories are becoming more frequent, and there was a sense that slowly but surely governments and large multinationals are listening and in some cases “get it”. Slowly the conversation is becoming “how?” rather than “why?”. Nevertheless, it is also clear that change, as ever, is hard, and that a complete paradigm shift, in which data becomes an essential right and the key tool by which truth is held up to power, will probably take the work of at least a generation.

The main point of my talk, beyond providing a few humorous examples of the pain of working with data suppliers, was that the path to solving problems is often not just technical, but rather via communication with the partner (though please note, I don’t mean to imply this is easy).

One topic I didn’t cover in my talk, but wish in hindsight I had, is that the great difficulty lies not in building data manipulation systems, but rather in maintaining them. This is something we fight with continually at Nestoria as team members flow through the project.

The cost of maintenance is high, and just like during my recent attendance at the UK ODI’s pitch event, I am left wondering what the business model is that allows open data to thrive and be maintained. With benefits often widely distributed but costs centralized, I see a challenge in creating a sustainable model. Will society (and thus government) simply have to see this as part of the cost of operating? Food for a future blog post perhaps …

A living hell - lessons learned in eight years of parsing real estate data from lokku

Thanks again to the organizers for making the event happen and to all the speakers for the enjoyable talks.

Ed Freyfogle