Introduction

This page discusses the latencies that can occur when your app makes an ad serving request (by calling the Decision API). To ensure a smooth user experience, platform owners often set a latency threshold for serving ads: a circuit breaker is configured to respond differently when an ad request exceeds a predefined time limit.
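A minimal sketch of the time-limit behavior described above, using Python's `asyncio.wait_for`. The names `with_latency_threshold` and `fake_decision_api_call`, the 200 ms threshold, and the empty-list fallback are hypothetical placeholders, not part of any MCM SDK. A full circuit breaker usually also counts repeated failures and pauses requests for a cool-down period; this sketch covers only the timeout-and-fallback part.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_latency_threshold(
    request: Callable[[], Awaitable[T]],
    threshold_s: float,
    fallback: T,
) -> T:
    """Run an ad request, returning `fallback` if it takes longer than `threshold_s` seconds."""
    try:
        # wait_for cancels the pending request once the timeout expires.
        return await asyncio.wait_for(request(), timeout=threshold_s)
    except asyncio.TimeoutError:
        # Over the time limit: respond differently (here, no ads) instead of blocking the page.
        return fallback


async def fake_decision_api_call() -> list[str]:
    # Hypothetical stand-in for your backend's Decision API call; deliberately slow.
    await asyncio.sleep(0.5)
    return ["ad-1", "ad-2"]


ads = asyncio.run(with_latency_threshold(fake_decision_api_call, threshold_s=0.2, fallback=[]))
print(ads)  # -> []
```

What the fallback should be (no ads, cached ads, organic content) is a product decision on your side.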

Overview of latencies during an ad serving request (Decision API)

While integrating your systems with Moloco Commerce Media (MCM), it is crucial to understand the sources of end-to-end (e2e) latency. To facilitate this, we have divided the latency-generating areas into three sections:

Ad Serving Request: This is the API call to your backend from your web and mobile clients.
Decision API Call: This is the call sent from your backend to MCM.
MCM Decision API Processing Time: This is the time it takes MCM to process a Decision API call, including real-time machine learning (ML) inference.
Understanding these sections will help you identify and address potential latency issues during the integration process.
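One way to apply this breakdown is to record per-request timings and see which segment dominates. The sketch assumes you already collect three durations: the whole ad serving request as timed on the client, the Decision API call as timed on your backend, and MCM's processing time (assumed here to be known or estimated; how you obtain it is up to you). Because the segments are nested, the sketch derives each one by subtraction. The `LatencyBreakdown` class, its fields, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    """Timings for one ad serving request, in milliseconds (hypothetical fields)."""

    client_total_ms: float    # timed on the web/mobile client around the ad serving request
    decision_call_ms: float   # timed on your backend around the Decision API call
    mcm_processing_ms: float  # MCM-side processing time (assumed known or estimated)

    def segments(self) -> dict[str, float]:
        # Each timer contains the next one, so each segment is a difference of two timers.
        return {
            # Includes your own backend processing (auth, DB access, ...), not just the network.
            "client <-> backend": self.client_total_ms - self.decision_call_ms,
            "backend <-> MCM network": self.decision_call_ms - self.mcm_processing_ms,
            "MCM processing": self.mcm_processing_ms,
        }

    def dominant_segment(self) -> str:
        # The segment contributing the most time is the first place to look.
        seg = self.segments()
        return max(seg, key=seg.get)


b = LatencyBreakdown(client_total_ms=320, decision_call_ms=140, mcm_processing_ms=95)
print(b.dominant_segment())  # client <-> backend
```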

Latency between Web/Mobile clients and Platform backend

The latency here depends on where your clients and your backend are located.
If your backend is hosted in a single datacenter (or cloud region), latency increases the farther your users are from that location. One way to reduce it is to host your backend in multiple locations and use a global load balancer (GCP Global Load Balancer or AWS Global Accelerator) to route each client request to the closest region, reducing the latency between your clients and your backend.

Note
Beyond this network latency, your own server-side processing (authentication, authorization, database access, etc.) adds latency. Depending on where your database is located relative to your backend servers, you might not get the full benefit of a multi-region server architecture, for example if every request must still query a database hosted in a single region.

Latency between your Platform backend and MCM

This is the network latency between your backend and MCM services. It varies with where your backend is hosted and where we have provisioned your MCM services.
Moloco Commerce Media is hosted entirely on Google Cloud. Whenever we provision an environment for you, you can choose the GCP region closest to your backend. A full list of GCP regions is available here.

To measure the latency to each GCP region from your location (or from a VM running in your datacenter), you can use a tool such as www.gcping.com to find the region with the lowest latency.
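If you collect your own round-trip measurements instead (for example, from a VM in your datacenter), a simple way to compare regions is to take the median per region and pick the lowest. The region names below are real GCP regions used only as examples; the sample values are made up, and how you collect them is left to you.

```python
import statistics


def closest_region(rtt_samples_ms: dict[str, list[float]]) -> str:
    """Return the region with the lowest median round-trip time."""
    # The median is less sensitive than the mean to an occasional slow probe.
    return min(rtt_samples_ms, key=lambda region: statistics.median(rtt_samples_ms[region]))


# Hypothetical measurements taken from a VM in your datacenter.
samples = {
    "us-east4": [12.1, 11.8, 30.5, 12.4],
    "us-central1": [28.0, 27.5, 29.1, 27.9],
    "europe-west1": [88.3, 90.2, 87.9, 89.0],
}
print(closest_region(samples))  # us-east4
```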

Estimated MCM Decision API processing latency

The last component of end-to-end latency is the time the Decision API takes to process an ad serving request. Each request triggers real-time ML inference.
The latency of Moloco's ad decision responses varies with several factors, but we typically observe a server-side p95 latency of around 100 ms.

Note
The latency figure above is not an SLA; we share it to help you choose an appropriate circuit breaker threshold, as described at the beginning of this article.
It is also important to remember that this figure includes the latency of our real-time ML personalization engine, which enables our unique differentiator: 1:1 personalization of your ads. If you want to learn more about how to navigate ad blindness in retail media, we strongly recommend reading this blog post.
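Because the ~100 ms figure is a server-side estimate rather than an SLA, one reasonable approach is to measure the full Decision API round trip from your own backend (which also includes network latency) and derive the threshold from that. The sketch computes a p95 with Python's `statistics.quantiles`; the `suggest_threshold_ms` rule, the 50 ms headroom, and the sample data are illustrative assumptions, not Moloco guidance.

```python
import statistics


def p95(latencies_ms: list[float]) -> float:
    """95th percentile of observed latencies (use at least two samples)."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(latencies_ms, n=100)[94]


def suggest_threshold_ms(latencies_ms: list[float], headroom_ms: float = 50.0) -> float:
    """Hypothetical rule: observed p95 of the Decision API round trip plus fixed headroom."""
    return p95(latencies_ms) + headroom_ms


# Hypothetical round-trip times (ms) measured on your backend around the Decision API call.
observed = [float(x) for x in range(80, 180)]
print(f"p95 = {p95(observed):.1f} ms, threshold = {suggest_threshold_ms(observed):.1f} ms")
```

Recomputing the threshold periodically from fresh measurements keeps it aligned with your actual traffic and region.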

Summary

While it may be difficult for Moloco to drastically reduce the latency of our server-side ad request processing, which relies on real-time ML inference to deliver high-performance targeted ads, there are other areas where latency can be improved. For instance, we can work with you to provision your MCM environment in a GCP region close to your backend systems. If you would like advice on your integration architecture from Moloco’s customer engineering team, please reach out to your Moloco representative.