Python Upgrade Playbook. How Lyft upgrades Python at scale—1500+… | by Aneesh Agrawal | Mar, 2024

The Backend Language Tooling (BLT) team at Lyft is responsible for the Python and Go experience for our engineers and drives each upgrade as a key part of our remit (60% of our repos have Python code!). Funnily enough, working on a Python upgrade is usually the first project for most new hires joining our team.

Keeping Python up to date has many benefits:

- keeps us secure and in compliance with patching targets
- enables us to access the latest versions of tools (e.g. linters) and libraries as they drop old Pythons or only work on the newest Pythons (e.g. LLM libraries), a frequent ask from our engineering teams
- especially with recent Pythons, brings efficiency gains to our fleet to control our costs

Given breaking changes in Python and its ecosystem, we can’t do everything by ourselves. We have a few key principles for engaging with our platform partners and service teams:

- Communicate early and often about upcoming upgrades so teams can plan for them and know what is needed at each stage.
- Build central infrastructure and automation first; only ask teams to jump in once we have exhausted what we can do for them.
- Focus on the critical chain: identify “long pole” repos up front to prevent delays at the end, and prioritize based on what will unblock the most repos. Using this principle, our Airflow infrastructure went from the last repo to upgrade to one of the very first!

Let’s walk through our steps after sending an initial “get ready for the next upgrade” email.

Everything starts with data.

The first time we did a Python upgrade, we kept running into an unexpected problem: teams would come to us and proudly let us know they were done, but we’d later realize we’d missed something and they were still running multiple Python versions. There are many ways a single repo can leverage Python (running locally, in builds, for linting, during tests, as a dependency, at runtime, or even via a sidecar) and we weren’t aware of all of them!
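To make the "done, but still running multiple Python versions" problem concrete, here is a minimal sketch of the kind of per-repo check such data allows. The context names, version strings, and function names are illustrative assumptions, not Lyft's actual schema or tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One observation: (context where Python was seen, version seen there).
# Contexts and versions here are made-up examples.
Usage = Tuple[str, str]


def versions_by_context(usages: Iterable[Usage]) -> Dict[str, List[str]]:
    """Group observations into {version: sorted list of contexts using it}."""
    grouped = defaultdict(set)
    for context, version in usages:
        grouped[version].add(context)
    return {version: sorted(contexts) for version, contexts in grouped.items()}


def is_fully_upgraded(usages: Iterable[Usage], target: str) -> bool:
    """True only if every observed context runs the target version.

    A repo with no observations is not reported as upgraded.
    """
    return {version for _, version in usages} == {target}


observed = [("build", "3.10"), ("lint", "3.8"), ("runtime", "3.10")]
report = versions_by_context(observed)
done = is_fully_upgraded(observed, "3.10")
```

In this made-up example the repo builds and runs on the new version, yet the check still fails because linting lags behind, which is exactly the kind of gap a team would not notice on its own.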
We’ve invested in data to solve this, building two dashboards:

- First, a report for each repo showing which Python versions they use (and where), along with a clear call to action: a link to an open PR they could fix and merge (from our automation in the next section). It also has links to the upgrade docs, an FAQ, and our support Slack channel for any questions.
- Second, a separate, aggregate report showing overall Python usage across the company, additionally sliced by each part of the upgrade: dependency compatibility, linter/code rewriter adoption, build/test vs deploy progress, etc. This drives our planning and prioritization: which infrastructure to build first (e.g. if there are more Airflow users or more Streaming users) and which common problems to forecast and solve up front (e.g. if a particular widely-used dependency needs a major version bump).

We built seven different data pipelines to power these dashboards, extending the Security team’s setup for Spark-based reporting (though we have simplified the setup since that post):

- linter configurations, extracted from the .pre-commit-config.yaml file in each repo. Many non-Python repos use linters written in Python and thus are in-scope to upgrade!
- dependency graph, built on Cartography and reused from the Security team’s post above
- two pipelines for dependency resolution metadata, as it is sensitive to Python version: one reading the standard python-requires metadata field (for libraries), the other using a custom comment in the requirements.in file used by pip-tools to lock our dependencies for applications
- build and deploy configuration, sourced from a Lyft-specific YAML metadata file in each repo
- OS packages present in each Docker image, leveraging existing image scanning infrastructure. This gave us a big surprise: it turns out the nodejs package (and thus all our frontend services) depends on Python!
- Python process invocations in staging/production, leveraging an existing osquery deployment.
This last pipeline was tricky to build and we got help from our Data Engineer: it runs Trino SQL queries in Airflow (instead of Spark jobs) to efficiently extract data from osquery.

Anything that can go wrong will go wrong.

The biggest challenge we had, and the biggest change we’ve made, was learning the true depths of Murphy’s Law. We ran into issues we didn’t expect, like totally missing data, duplicate rows, wrongly-typed data, orphan/phantom rows, truncated data, uniqueness violations, and more. This made us true believers in Data Quality checks: we adopted tinsel and typedload to write our schemas once (as dataclasses) and apply them to every stage (extract, transform, load), all with runtime type checking.

Having squashed these issues, we provide the reports as web pages (example from a previous upgrade):

[Screenshot: example report from a previous upgrade]

The reports are rendered via Voila from Python scripts that we usually iterate on as Jupyter notebooks (thanks to jupytext) or run directly in a shell for scripting purposes. (We also use offloaded deployments to share and review proposed changes!)

With data in hand, we’re ready to make some preliminary PRs.

First: all the repos we don’t need to upgrade, because we can deprecate them instead! We find opportunities to do so by joining the datasets mentioned above with other internal datasets:

- services with no traffic can be turned off
- workers and pipelines with unused output can also be turned off
- libraries with only one consumer can be re-composed into the consuming repo

Beyond that, we help teams that have custom repo setups move to internal standards; this lets us avoid building automation that would only benefit one or two repos.

Second: having something to upgrade to. We add support for the new Python version in our internal tools in parallel with the existing Python version(s), as well as updating our docs. It’s fairly mechanical (copy/paste), enough said.

Third: having something to actually change in a PR.
We heavily leverage industry-standard OSS tools for our developer experience, many of which use config files to define their Python version. In the beginning, many of the versions were implicit, e.g. the “unspecific pre-commit version” from the screenshot above; we made these all explicit so they can actually be updated. Given the aforementioned confusion of multiple Pythons, we enforce that each repo uses a single Python version at a time, with guardrails that run on each PR to check that all the config files are in sync.

Finally: automating the actual upgrade steps for a given repo. Given our polyrepo setup, the BLT team owns an internal tool that makes such wide-sweeping changes at scale. Essentially, the tool does the following for each repo: create a local git checkout, run an arbitrary Python function we call a “fixer” to make the desired changes (e.g. update a dependency), and finally create and track a PR from the changes. (Fun fact: this tool is how we implemented the aforementioned guardrails! Once fixers are completed, they are marked as “enforced” so they run in CI to prevent regressions. Importantly, this means guardrails don’t just print an error message, but actually fix any issues and print a diff for the engineer to apply to their PR.)

We have a fixer that knows how to upgrade Python itself. Let’s dive into it now!

The Python upgrade fixer is the most complicated fixer at Lyft, with over fifteen component sub-fixers that separate the logic for testability. We can put them into three groups:

Dependency Management: Generally the hardest part of any upgrade is not due to changes in Python itself, but updating dependencies to recent versions, which often bring their own breaking changes. Newer dependency versions are needed when:

- the library publishes wheels, since older versions won’t have wheels for newer Pythons
- the library requires code changes for newer Pythons, e.g.
if it interacts with the AST

We end up updating almost all dependencies, and dependency resolution can be slow. To solve this, we built a simple service wrapping pip-tools. It brought p50 resolution times down ~50x (5–10 minutes to 5–10 seconds) by using a shared Redis cache! Beyond powering the fixer, it also doubled as a carrot to upgrade when we first created it: only repos on the new Python version were eligible to use the service day-to-day (outside the fixer). To simplify finding the right versions to update to, we configure our internal package repository to only host wheels for many packages: older versions lacking wheels for the new Python are entirely missing, so CI can fail fast instead of trying to build from source (which usually takes a long time and then fails). Beyond updating dependencies, this set of sub-fixers also updates config files to use the new Python version for dependency resolution.

Linters: Beyond needing to be updated like any other dependency, linters are a powerful part of the upgrade because they automatically rewrite code. We leverage (and make OSS contributions to) tools like pyupgrade and reorder-python-imports that can drop legacy back-compat code, add forwards compatibility, and modify logic to use newer Python APIs. We have a sub-fixer for each linter. Keeping the code-editing smarts in the individual linters lets engineers run them as part of their normal flow, while each sub-fixer only has to manage a linter’s configuration/version and run the linter to apply the actual autofixes. There’s also a sub-fixer using libCST to apply some very simple changes not worth creating a whole linter for.

Build/Test/Run: The rest of the sub-fixers handle updating build, test, and run/deploy configurations, as well as any other miscellaneous changes (e.g. bumping the version if the repo we are fixing is a library).
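The idea of one fixer composed of small, individually testable sub-fixers can be sketched as follows. This is only an illustration under stated assumptions: the two sub-fixers, the `.python-version` file, the `FROM python:X.Y` Dockerfile line, and all function names are hypothetical and not taken from Lyft's internal tool.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# A sub-fixer edits files under `repo` and reports whether it changed anything.
SubFixer = Callable[[Path, str], bool]


def _rewrite(path: Path, new_text: str) -> bool:
    """Write new_text only if it differs, so reruns produce no spurious diffs."""
    if path.exists() and path.read_text() == new_text:
        return False
    path.write_text(new_text)
    return True


def pin_version_file(repo: Path, target: str) -> bool:
    """Hypothetical sub-fixer: record the target version in `.python-version`."""
    return _rewrite(repo / ".python-version", target + "\n")


def bump_docker_base(repo: Path, target: str) -> bool:
    """Hypothetical sub-fixer: point `FROM python:X.Y` lines at the target."""
    dockerfile = repo / "Dockerfile"
    if not dockerfile.exists():
        return False  # nothing to do for repos without a Dockerfile
    updated = re.sub(
        r"^(FROM python:)\d+\.\d+",
        lambda match: match.group(1) + target,
        dockerfile.read_text(),
        flags=re.MULTILINE,
    )
    return _rewrite(dockerfile, updated)


def run_fixer(repo: Path, target: str, sub_fixers: List[SubFixer]) -> List[str]:
    """Run each sub-fixer in order; return the names of those that made changes."""
    return [fix.__name__ for fix in sub_fixers if fix(repo, target)]


SUB_FIXERS: List[SubFixer] = [pin_version_file, bump_docker_base]


def demo() -> Tuple[List[str], List[str], str]:
    """Run the fixer twice on a throwaway repo to show the rerun is a no-op."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "Dockerfile").write_text("FROM python:3.8-slim\nRUN pip install .\n")
        first = run_fixer(repo, "3.10", SUB_FIXERS)
        second = run_fixer(repo, "3.10", SUB_FIXERS)
        return first, second, (repo / "Dockerfile").read_text()
```

Because each sub-fixer is a plain function over a directory, it can be unit-tested against a temporary checkout, and a second run that reports no changes is a cheap signal that the repo is already in the desired state.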
Two key libraries are ruamel.yaml and ConfigUpdater: we use these to preserve comments and avoid making spurious formatting-only changes when editing YAML and setup.cfg files respectively, something our engineers greatly appreciate.

Rollout is simple in theory: we run the fixer on a cron to generate PRs for all repos and keep them up to date, and work with teams to merge. However, there’s a lot of nuance here!

One level deep: We don’t do the whole upgrade in one PR! We generally run the dependency and linter update fixers as separate fixers ahead of the main upgrade fixer. This creates smaller diffs, increasing the chance they pass tests and can automerge while simplifying debugging in case of failure. (They’re also included in the main fixer for completeness.)

Two levels deep: We can’t merge every PR at once! While dependency and linter upgrades are generally safely automergeable fleet-wide, the actual Python upgrade could cause breakage. After testing with a few early adopters, we work with our Infrastructure Operations team to automerge the generated PRs in batches, ordered by tier of criticality to the business (after which they are autodeployed). For PRs that have minor issues, we evaluate whether the fix can be added to the automation. If it can’t, we fix the PRs ourselves for teams where the cost of communicating the needed fix would outweigh the time spent fixing it.

Three levels deep: We can’t create PRs for all repos at once! There are essentially three phases:

1. partially updating libraries first to ensure compatibility with the new version, with new CI test suites as confirmation
2. updating the services using those libraries
3. updating the original libraries to drop compatibility with the older Python

These three phases overlap as we have fine-grained data from the dependency graph dataset mentioned above.
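One way to picture how the three phases can overlap is a per-repo eligibility check driven by the dependency graph. The sketch below is an assumption-laden illustration, not Lyft's cron job: the state names (`old`, `dual`, `new`), the step names, and the repo names are all invented for the example.

```python
from collections import defaultdict
from typing import Dict, Set

# Repo states (hypothetical names):
#   "old"  - only supports the old Python
#   "dual" - library tested on both old and new Python (phase 1 done)
#   "new"  - runs only on the new Python


def next_steps(
    kinds: Dict[str, str],      # repo -> "library" or "service"
    deps: Dict[str, Set[str]],  # repo -> internal libraries it depends on
    state: Dict[str, str],      # repo -> "old" / "dual" / "new"
) -> Dict[str, str]:
    """Return the upgrade step each repo is eligible for right now, if any."""
    # Invert the dependency edges to find each library's consumers.
    consumers = defaultdict(set)
    for repo, libs in deps.items():
        for lib in libs:
            consumers[lib].add(repo)

    steps = {}
    for repo, kind in kinds.items():
        # A repo can move once every library it depends on supports the new Python.
        libs_ready = all(state[lib] in ("dual", "new") for lib in deps.get(repo, ()))
        if state[repo] == "old" and libs_ready:
            steps[repo] = "add-compat" if kind == "library" else "upgrade"
        elif kind == "library" and state[repo] == "dual":
            # Old-Python support can go only after every consumer has moved on.
            if all(state[consumer] == "new" for consumer in consumers[repo]):
                steps[repo] = "drop-old"
    return steps


kinds = {"core-lib": "library", "util-lib": "library", "api": "service", "worker": "service"}
deps = {"util-lib": {"core-lib"}, "api": {"core-lib", "util-lib"}, "worker": {"core-lib"}}

start = next_steps(kinds, deps, dict.fromkeys(kinds, "old"))
midway = next_steps(
    kinds, deps, {"core-lib": "dual", "util-lib": "old", "api": "old", "worker": "old"}
)
```

In the `midway` snapshot, `worker` is already eligible for its phase 2 upgrade while `util-lib` is still in phase 1, which is the overlap the fine-grained graph data makes possible.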
Practically speaking, the single cron job will mark services and libraries eligible to upgrade (and generate a PR) as soon as their specific dependencies/consumers have been upgraded.

While most repos can be fully upgraded automatically, some always require a human touch. Once the automation has done all that it can, we file JIRA tasks against individual teams to track the remaining work; in most cases they only need to make a few fixes to the auto-generated PR.

Throughout the upgrade, we send monthly update emails to create and share a heartbeat of progress, and have Slack channels to answer questions (with dedicated channels for teams with complex upgrades). We also gate new features to only work with newer Pythons as additional carrots to incentivize upgrades, e.g. test/lint output colorization in CI and faster local venv updates.

We’re consistently able to upgrade 1500+ Lyft repos and have never had any major issues: our excellent CI and staging environment catch them. (The only downside: fewer fun incident stories to tell!) We’re getting faster every time, from years to, now, months, all amidst other major initiatives, e.g. moves to k8s and ARM, overhauling dev workflows to be fully local, changes to our overall lines of business, refactoring our builds, and more. Most importantly, we achieve our perpetual goals: developers are unblocked from the latest libraries and functionality, we stay up to date and secure, and we are able to continually improve our hosting cost efficiency.

The work we’ve done has paid other dividends as well: standardization has sped up overall development flows, and the datasets we built are widely used for project tracking and ad-hoc exploration. We’re also looking at making the tooling we’ve built here reusable across Infrastructure to track all upgrades and the rollout of best practices.

From the Lyft BLT family to yours: the Python Upgrade Playbook is one of our favorite recipes and we hope you enjoy it as much as we do!