Building a Control Plane for Lyft’s Shared Development Environment | by Michael Meng

Even with all the benefits of Context IDs, developers still needed to inspect requests/responses in this new workflow. To accomplish this, we built another Envoy filter, henceforth referred to as the MITM filter. The essence of the MITM filter is that it relays HTTP and gRPC requests from itself to the ProxyApp server using a bidirectional gRPC stream. For the curious Envoy-oriented reader, we considered using the external processing OSS filter, but it didn’t quite fit our needs. We wanted more control at each hop: for instance, the ext_proc filter always calls the ext_proc server, but we only wanted to relay requests when users explicitly tell us they want to tap the traffic of a specific service.

A request is only intercepted if the Context ID contains metadata opting in to interception; otherwise, the MITM filter acts as a passthrough. We refer to the act of opting-in as ConnectContext. From the Lyft engineer’s perspective, you specify what requests to intercept based on the source (downstream) and destination (upstream) services. For example, the following config would intercept requests egress-ing to service_B or ingress-ing from service_A.

    ConnectContext({
      // Opt-in requests tagged with this Context ID HTTP header
      context: 'mmeng',
      intercept_to: ['service_B'],
      intercept_from: ['service_A'],
    })

ConnectContext can intercept requests/responses for any service, at any hop, in real-time. Contrast this to the previous generation of staging overrides: to see what upstream requests are made by your offloaded facet, you had to comb through historical logs in Kibana.

Figure: Routing requests back to ProxyApp from the MITM Filter

MITM interception also extends to mocking applications. In the previous staging overrides workflow, testing what happens if an upstream service gives back a certain response meant one needed to run the downstream service locally and hardcode the upstream call response. With ConnectContext, mocking the response for an upstream request can be a one-liner.
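Before the mocking example, here is a rough sketch of the opt-in rule described above: a hop is intercepted only when the request's Context ID has a matching opt-in and either the destination is listed in intercept_to or the source is listed in intercept_from. Everything else passes through. The type names, field names, and the decide function are hypothetical and only illustrate the rule. The real MITM filter is an Envoy filter whose implementation is not shown in this post.

```typescript
// Hypothetical shape of the opt-in metadata attached to a Context ID.
interface InterceptConfig {
  context: string;
  interceptTo: string[];   // destination (upstream) services to tap
  interceptFrom: string[]; // source (downstream) services to tap
}

// Hypothetical description of a single hop in the service mesh.
interface Hop {
  contextId: string | undefined; // undefined when the request carries no Context ID
  source: string;
  destination: string;
}

type Decision = "intercept" | "passthrough";

// Intercept only when an opt-in for this Context ID matches the hop's
// destination or source; every other request is passed through.
function decide(hop: Hop, optIns: InterceptConfig[]): Decision {
  for (const cfg of optIns) {
    if (cfg.context !== hop.contextId) continue;
    const toMatch = cfg.interceptTo.indexOf(hop.destination) !== -1;
    const fromMatch = cfg.interceptFrom.indexOf(hop.source) !== -1;
    if (toMatch || fromMatch) return "intercept";
  }
  return "passthrough";
}

// Mirrors the ConnectContext config from the text.
const optIns: InterceptConfig[] = [
  { context: "mmeng", interceptTo: ["service_B"], interceptFrom: ["service_A"] },
];

// service_C is an invented name used only to exercise the rule.
const hops: Hop[] = [
  { contextId: "mmeng", source: "service_C", destination: "service_B" },
  { contextId: "mmeng", source: "service_A", destination: "service_C" },
  { contextId: "mmeng", source: "service_C", destination: "service_C" },
  { contextId: undefined, source: "service_A", destination: "service_B" },
];

for (const hop of hops) {
  console.log(hop.source + " -> " + hop.destination + ": " + decide(hop, optIns));
}
```

In this sketch the first two hops are intercepted and the last two pass through, which matches the passthrough-by-default behavior described for requests that have not opted in.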
Here’s an example of mocking one’s upstream response code to 500 to test fault handling:

    '/v1/post-endpoint', async (c) => {
      // Example mock: Set the foo header to value bar
      c.request.headers.set('foo', 'bar')
      // Fetch the real upstream response
      await c.fetchApiResponse()
      // Example mock: Setting response code to 500
      c.response.status = 500
    })

Lastly, moving the context injection responsibilities from ProxyApp to Envoy (Edge Gateway and Sidecar Envoy) made routing overrides more extensible, because we are now able to attach routing overrides to requests that originate from within the service mesh. With this, we unlocked several new infrastructure capabilities for safer, automated testing.

The first infrastructure piece we improved was Lyft’s automated acceptance tests (henceforth abbreviated as ATs). We moved them from running post-PR-merge to running pre-PR-merge to improve the reliability of our shared staging environment. Previously, when ATs ran post-PR-merge, they ran against the main staging deployment, which meant bad code was sometimes deployed before the tests could catch it, causing staging outages. Moving to pre-PR-merge ATs meant that the microservice worker that runs ATs would instead route to an offloaded deployment, so bad changes would be caught before they are deployed to staging. Before, when x-ot-span-context was strictly handled by ProxyApp, requests that originated from within the service mesh were unable to attach routing overrides to offloaded deployments. But now, with Sidecar Envoy natively understanding Context IDs, we were able to implement this design.

The second integration point arose when a customer team, Dispatch, wanted to predict how PRs would alter marketplace metrics. The idea was: create an offloaded deployment of the PR under test, have a service that sends historical dispatch cycles from S3 to both the offloaded deployment and the main staging deployment, and compare the outputs.
Similar to pre-PR-merge ATs, the Dispatch team was able to have their mesh-originated request attach a Context ID upon egress and trust that it’ll be routed to the intended offloaded deployment.
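
The replay-and-compare idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Dispatch team's implementation: each deployment is modeled as a plain function, whereas in the real system both would be network calls, with the offloaded one selected by the Context ID attached on egress. The Cycle type, the two toy deployments, and all values are invented for the example.

```typescript
// Hypothetical model: a deployment maps one historical input to an output.
type Deployment<I, O> = (input: I) => O;

// One input whose outputs differ between the two deployments.
interface Diff<I, O> {
  input: I;
  baseline: O;
  candidate: O;
}

// Send every historical input to both deployments and collect the cases
// where the outputs disagree.
function replayAndCompare<I, O>(
  inputs: I[],
  baseline: Deployment<I, O>,
  candidate: Deployment<I, O>,
  equal: (a: O, b: O) => boolean,
): Diff<I, O>[] {
  const diffs: Diff<I, O>[] = [];
  for (const input of inputs) {
    const baseOut = baseline(input);
    const candOut = candidate(input);
    if (!equal(baseOut, candOut)) {
      diffs.push({ input: input, baseline: baseOut, candidate: candOut });
    }
  }
  return diffs;
}

// Toy data, invented for illustration only.
interface Cycle {
  riders: number;
  drivers: number;
}

const cycles: Cycle[] = [
  { riders: 3, drivers: 5 },
  { riders: 4, drivers: 2 },
];

// Stand-ins for the main staging deployment and the offloaded PR deployment.
const mainStaging: Deployment<Cycle, number> = (cycle) =>
  Math.min(cycle.riders, cycle.drivers);
const offloaded: Deployment<Cycle, number> = (cycle) =>
  Math.min(cycle.riders, cycle.drivers + 1);

const diffs = replayAndCompare(cycles, mainStaging, offloaded, (a, b) => a === b);
console.log(diffs.length + " of " + cycles.length + " cycles differ");
```

Keeping the comparison function as a parameter leaves room for tolerance-based or metric-level comparisons instead of strict equality.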