Zalando explores the Hadoop Summit 2016

by Simon McGloin, Anthony Brew - 6 May 2016

With this year being the 10th birthday of Apache Hadoop, Dublin saw 1,400 members of the tech community gather for the 4th Hadoop Summit Europe. These days, the word Hadoop has a somewhat negative connotation in the minds of some people, but this Summit proved that it is an all-encompassing word to describe a diverse ecosystem of technologies. With our Fashion Insights Centre located in Dublin, where our engineers and data scientists use several Hadoop technologies, it made perfect sense for Zalando to be in attendance at the two-day event.

The week started with a meetup organised by the Hadoop User Group in the vibrant Silicon Docks, where Zalando’s Dublin office is also located. The topic of the day was graph databases and graph processing. A talk on OrientDB by Fabrizio Fortino showed the value of the NoSQL document database. The following night a second meetup, again organised by the Hadoop User Group, was hosted in the heart of Dublin's city centre. This event saw six speakers from around the world talk on a variety of topics. One interesting use case was presented by Vincent de Stoecklin and showed how Dataiku DSS was used to create a predictive application for one of its clients, which enables drivers to find parking spaces faster.

After an entertaining display of some modern Irish dancing for the attendees, the conference itself began with keynotes from some of the leaders in the Hadoop community. The themes set out included enterprise readiness, the value being created, and the growing and thriving state of the Hadoop ecosystem. One of the key takeaways was how Hadoop enables you to work with your data at rest in a variety of ways.

For us, one of the more interesting showcases of the Summit was Apache NiFi, a project that was originally open-sourced by the US National Security Agency. The tool allows users to create bidirectional and complex dataflows from a multitude of sources and outputs, which gives us some new ideas for our current projects. Throughout the conference, Hortonworks were on hand to demonstrate their distribution of NiFi, called DataFlow.


The Apache Flink project also took centre stage for much of the conference. With this platform in use by a number of teams within Zalando, we were obviously very interested in the talks on it. A very interesting, funny, and somewhat controversial talk by Slim Baltagi praised Flink as being the 4th Generation of data processing, with Apache Spark being left in its dust as an older and outdated tool.

And of course Apache Spark was also a huge topic throughout the event, with several talks focused on the subject. One of the more interesting demos was a video of an advanced execution visualiser for Spark jobs. This UI tool, created by the Hungarian Academy of Sciences in collaboration with Ericsson, could prove useful for investigating bottlenecks and gaining a better understanding of the physical execution of your Spark jobs.


For the data science enthusiasts in attendance there was plenty of action. A witty demo of TensorFlow by Google’s Ram Ramanathan called “Can I hug that?” classified images as huggable or not. This, along with other demos, displayed the power of deep learning, which can be applied to everything from text to images. Bill Porto gave an upbeat presentation on the current shortcomings of some Machine Learning approaches and how to improve accuracy using real-world examples, giving all of us some food for thought.


During the final day's keynote, David McCandless spoke of how data can be abstract, but how visualising it aids communication and understanding. A striking example David gave was the comparison of the billions estimated to fund the Iraq war and the final cost.

We learnt a lot from the Summit and made some great connections, too much to condense into a single blog post. Perhaps by visualising the notes we took at a macro level you will see that the Hadoop Summit is more than just Hadoop: it's an ecosystem.


