At Zalando, we strive to help our customers find the most relevant fashion they can imagine. Zalando is known for its great fashion assortment and its huge selection of products. To make sure that our customers are not overwhelmed by this vast selection, the Recommendations Team builds systems that help them find products in an easy and convenient way.
Zalando offers its customers the possibility to subscribe to email newsletters, where subscribers receive a weekly email detailing the latest trends, Zalando news, sale announcements, and a selection of personalised product recommendations. It is extremely important in this context to provide recommendations that are truly personalised to each customer.
In this post I will describe the challenges that we faced while implementing personalised recommendations in the newsletter and elaborate on the technical decisions we made. I will also touch on some particularities we ran into while migrating the service to Amazon Web Services.
Emails are a bit particular
A newsletter email typically contains four to ten recommended products. The email itself is a rich HTML document that gets rendered at the moment the user opens the Zalando newsletter in their email client. The recommendations are displayed as product images with brand and price captions. These all link to the respective product detail pages in Zalando’s Fashion Store.
The recommendations in the email require only an image URL and a destination page URL to be displayed, as shown in the diagram below. When the email is sent out, it only contains placeholders. In other words, the recommendations’ URLs do not link directly to a specific product. The products that will be shown are selected when the email is opened for the first time. Once it is opened, multiple requests are effectively made simultaneously to render the recommendations’ images.
Never recommend the same thing twice
At this point we are faced with a challenge. The recommendations need to be a meaningful selection of products, but requests for each one are made independently. Even with a stable set of rules for choosing the recommendations, the selection made in two separate requests can differ due to products selling out or new products coming into stock.
Inconsistent recommendation selection may not be an issue in itself, but it can lead to product duplication, as shown below. In the diagram we can see that the recommendations at two different positions can turn out to be duplicates if they come from two different selections.
Duplicate product recommendations in the same email are completely unacceptable, so we need to eliminate this possibility.
The process of selecting recommendations can be computationally intensive, so in order to save on computing, and to avoid duplicates, we opted for building a solution that would select the products only once, when the first request is received. These would then be cached and the remaining requests would use the cached recommendations.
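As a sketch of this idea, the following minimal Python cache (the names and structure are my own illustration, not Zalando's actual implementation) computes the selection only once per parameter set and serves every position from the cached list:

```python
import threading

class RecommendationCache:
    """Compute the recommendations list once per email, then serve
    every position from the cache. Illustrative sketch only."""

    def __init__(self, compute_fn):
        self._compute = compute_fn  # the expensive selection logic
        self._store = {}            # params -> list of products
        self._lock = threading.Lock()

    def product_at(self, params, position):
        with self._lock:
            if params not in self._store:
                # The first request for this email triggers the computation.
                self._store[params] = self._compute(params)
        return self._store[params][position]
```

Subsequent requests carrying the same parameters hit the cache and never recompute the selection, so all positions come from one consistent list.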
Load balancers, load balancers
Part of the challenge is the fact that our recommendations service needs to scale horizontally and therefore sits behind a load balancer, as depicted in the image below.
If we don’t intervene and the requests are left to be distributed by the LB, we will incur many cache misses and redundant computation of recommendations.
In the next sections I will outline two solutions to this challenge, and we will see how load balancing comes into the spotlight. One solution is based on load-balancing hardware that we control in our own data center; the other is based on the less configurable AWS Elastic Load Balancer.
Load balancing in the data center
The requests from an email come with only a few milliseconds between them and first hit a load balancer. They all request a product at a certain position of a recommendations selection. Other than the position, all requests coming from the same email share the same parameters.
Each request gets the full recommendations list, either by computing it or by retrieving it from the cache. Then it extracts the product at the relevant position and returns it to the client. The order in which the requests are received by the servers is unknown and some requests might even come very late.
Configuring your own bare metal
To take advantage of a trivial in-memory cache, some order needs to be introduced into how the requests are load-balanced across our machines. We need to force all the requests coming from the same client onto the same machine.
We configured our hardware load balancer to load-balance on OSI layer 7, i.e. to balance based on a header value extracted from the HTTP message.
Since we were able to use headers to direct the LB to send requests to a certain machine, a simple cache was all we needed. This solution works well, but dictates that the load balancer must be able to forward all requests onto the same machine. As we will see, this is not always the case.
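For illustration, a header-based sticky configuration could look like the following HAProxy sketch. HAProxy is shown only as an example and the header name is hypothetical; our actual hardware and configuration differ:

```
backend recommendations
    # Hash a request header so that all requests carrying the same
    # value are forwarded to the same backend server.
    balance hdr(X-Recommendation-Id)
    server recos-1 10.0.0.1:8080 check
    server recos-2 10.0.0.2:8080 check
```

Because the hash of the header value deterministically picks a server, every request from the same email lands on the machine that holds the cached selection.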
Load balancing on AWS
Since last year’s introduction of Radical Agility, Zalando’s delivery teams have been enjoying complete team autonomy. Through the adoption of AWS, delivery teams became even more empowered. This was because every team was given complete responsibility and freedom within their AWS accounts. Since AWS is our cloud provider of choice, we looked into the load balancing functionalities offered there.
Not all load balancers provide customisable layer-7 load balancing, as I discovered while researching the features offered by Amazon Web Services’ Elastic Load Balancer. The ELB does offer some application-level load balancing features, but none that fit our use case, where requests are fired off in simultaneous bursts.
AWS ELB’s skimpy stickiness options
As Amazon’s official documentation puts it, “... a load balancer routes each request independently to the registered instance with the smallest load.” So by default, stickiness is disabled. The stickiness options that are offered are tied to HTTP cookies and therefore biased toward browser-like clients and a sequential request pattern, just like when a person surfs a web site.
Here’s an overview of ELB’s offerings:
Load Balancer Generated Cookie Stickiness
To bind multiple requests to the same machine, Load Balancer Generated Cookie Stickiness can be enabled. It is also sometimes called duration-based stickiness. This makes the ELB generate a cookie itself and send it together with the response to the first request made by a client. On subsequent requests, that cookie is re-sent by the client and the ELB knows this request is bound to a specific machine, according to the cookie’s content.
This is great for use cases such as a human user browsing the pages of a web site, but does not work when multiple requests are made at the same time.
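For completeness, this is how duration-based stickiness is enabled on a classic ELB via the AWS CLI; the load balancer and policy names below are placeholders:

```shell
# Create a duration-based (LB-generated) cookie stickiness policy
# with a 60-second expiration period.
aws elb create-lb-cookie-stickiness-policy \
    --load-balancer-name my-load-balancer \
    --policy-name my-duration-cookie-policy \
    --cookie-expiration-period 60

# Attach the policy to the listener on port 80.
aws elb set-load-balancer-policies-of-listener \
    --load-balancer-name my-load-balancer \
    --load-balancer-port 80 \
    --policy-names my-duration-cookie-policy
```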
Application Generated Cookie Stickiness
With this option, the ELB handles stickiness via a cookie that our application generates, instead of relying on the duration-based AWSELB cookie. As long as the application’s cookie is present, an AWSELB cookie will still be generated and added to the response by the ELB.
However, if the app cookie is removed or expires, the AWSELB cookie is no longer added to responses by the ELB, and the same email’s remaining requests are again spread out across all machines.
As we can see, even though the ELB offers application-layer load balancing, it does not support headers as a means of achieving it, only cookies. Requests cannot be made sticky to a certain machine a priori, so we are forced to abandon the in-memory cache and need to consider a separate coordination entity to:
- Decide which request will trigger the recommendations list computation and,
- Cache the recommendations until all positions in the recommendations list have been read and displayed in the email.
At first, we discussed implementing such a coordination service from scratch. Eventually we decided to go with a system that already implements the features we require. This way, we ensured that we would only need to test a far narrower scope: the specifics of our own domain.
After a colleague of mine, Hrvoje Torbašinović, pointed out Redis’ interesting HSETNX command, a command that would become essential in the final solution, we decided to explore the functionality offered by Redis more deeply. A few designs were made, reviewed, and iteratively improved. Ultimately, we got to an implementation that easily serves 800-1000 requests per second during our email campaigns without reaching the system’s upper limit.
Instead of forcing every request onto the same machine, we let them go freely to any machine, at the ELB’s discretion, as depicted in previous diagrams.
The different machines receiving requests for products at different positions in the email will contact Redis and execute commands to decide which one will compute the recommendations list.
Three Redis operations are necessary:
- Set the lock to win the right to compute the recommendations,
- Write the recommended products into the cache and,
- Read computed recommendations from the cache.
These operations can be seen in the flow diagram below.
Basically, every request will attempt to get a lock in Redis, but only one should succeed. The lock winner will compute product recommendations, while the others will proceed to do a blocking read on the Redis cache. After the lock-winning request is done computing recos, it will write the list into the cache. Finally, the requests blocking on the read will get the recommended products.
An additional complexity is that a huge number of emails will be opened concurrently. Every email’s lock and cached recommendations are identified by a unique, personalised parameter combination. I will refer to these Redis keys as lock_key(params) and recos_key(params).
The system is described in more detail in the following diagram. Entry expirations are mentioned for the first time, as are the return values of the lock-setting commands. These are explained in the implementation section.
The keys are generated with a rule similar to:
lock_key(params) = "lock-" + concat(params)
recos_key(params) = "list-" + concat(params)
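In Python form, the key scheme might look like this; concat and the example parameter set are my own illustrative choices, and the real scheme may differ:

```python
def concat(params):
    # Hypothetical: join the personalisation parameters in a
    # deterministic order, so the same email always yields the same key.
    return "-".join(str(params[name]) for name in sorted(params))

def lock_key(params):
    return "lock-" + concat(params)

def recos_key(params):
    return "list-" + concat(params)
```

Determinism matters here: every request from the same email must derive exactly the same pair of keys, or the coordination falls apart.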
A request thread will attempt to take the lock by setting an arbitrary value into lock_key(params) with the following commands:
HSETNX lock_key(params) "lock" "got it"
EXPIRE lock_key(params) lock_ttl
The HSETNX command writes a field “lock” with the value “got it” and returns 1 only if the field does not already exist. Only the first write gets a 1; any subsequent ones get a 0. This decides who will calculate the list.
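This first-writer-wins behaviour can be sketched with a small in-process stand-in for HSETNX (a simulation for illustration, not a Redis client):

```python
def hsetnx(store, key, field, value):
    """Simulate Redis HSETNX: set the field only if it is absent,
    returning 1 on a successful (first) write and 0 otherwise."""
    fields = store.setdefault(key, {})
    if field in fields:
        return 0
    fields[field] = value
    return 1
```

The request whose write returns 1 becomes the lock winner and computes the recommendations; every other request sees 0 and proceeds to the blocking read.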
After the lock-winning thread is done selecting the recommendations, it writes them with these commands:
LPUSH recos_key(params) recos_json
EXPIRE recos_key(params) recos_ttl
The non-lock-winning threads carry on and try to execute a blocking read on the list. Since it is not there yet, they wait. With multiple requests trying to read, we cannot use the typical Redis pop commands: the first thread would remove the list, and the remaining threads would fail with a timeout. Instead, we keep the recommendations list in a one-element circular list using BRPOPLPUSH. Every thread reads the list with the following commands:
recos_json = BRPOPLPUSH recos_key(params) recos_key(params)
EXPIRE recos_key(params) recos_ttl
Both the lock and list entries have their expiry times explicitly set, with lock_ttl < recos_ttl, as explained in the following section.
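Putting the three operations together, here is a single-process Python simulation of the whole flow. It is a sketch against an in-memory stand-in for Redis, not the production Java code, and TTL handling is omitted for brevity:

```python
import collections
import json
import threading
import time

class MiniRedis:
    """In-memory stand-in for the three Redis commands used here."""

    def __init__(self):
        self._hashes = {}
        self._lists = collections.defaultdict(collections.deque)
        self._cond = threading.Condition()

    def hsetnx(self, key, field, value):
        with self._cond:
            fields = self._hashes.setdefault(key, {})
            if field in fields:
                return 0
            fields[field] = value
            return 1

    def lpush(self, key, value):
        with self._cond:
            self._lists[key].appendleft(value)
            self._cond.notify_all()

    def brpoplpush(self, source, destination, timeout=5.0):
        # Pop from the tail of source, push onto the head of destination.
        # With source == destination this acts as a one-element circular
        # list: every reader ends up seeing the same value.
        deadline = time.monotonic() + timeout
        with self._cond:
            while not self._lists[source]:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._cond.wait(remaining)
            value = self._lists[source].pop()
            self._lists[destination].appendleft(value)
            self._cond.notify_all()
            return value

def handle_request(redis, params, position, results):
    lock_key = "lock-" + params
    recos_key = "list-" + params
    if redis.hsetnx(lock_key, "lock", "got it") == 1:
        # Lock winner: compute the full selection once and cache it.
        recos = ["product-%d" % i for i in range(4)]
        redis.lpush(recos_key, json.dumps(recos))
    # Everyone, the winner included, reads via the circular list.
    recos_json = redis.brpoplpush(recos_key, recos_key)
    results[position] = json.loads(recos_json)[position]
```

Running one thread per email position against the same params shows that exactly one thread computes the list while all of them receive the same, consistent selection.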
You can check out a sample Java implementation here.
The table below lists key features the implementation enforces and describes what happens when they are not enforced.
Additionally, we assume that the network capacity is not fully saturated and that the EC2 instances involved are not under heavy load. If these conditions are not met, Redis operations can time out, resulting in one or more products not being shown.
I hope this article has given you a bit of insight into how we handle the delivery of millions of personalised emails on AWS. If you’ve got any questions about the process, feel free to contact me on Twitter.