How to Configure Multiple Databases in Django the Simple Way
I was recently tasked with a project that required adding a second database to the Strongarm dashboard, which is a web application powered by Django. There are a number of reasons why having multiple databases in Django might be necessary an application. In our case, we’re using the database to store DNS requests that Strongarm processes.
This is a huge dataset (on the order of billions of records per year), and it is secondary to the primary functionality of our service. It made sense for us to isolate this data away from the rest of the Strongarm data to ensure that it never interferes with more important functionality, such as user logins or infection alerts.
What I Learned About How to Configure Multiple Databases in Django
Django’s documentation walks users through most of the key steps for configuring a second database, but there were a couple specific parts that I found confusing or unclear. Below, I’ll explain what I learned in figuring this out for Strongarm and share some takeaways that others can use as they address similar issues. All of the explanations below assume that you have already figured out how to define your second database with the
DATABASE setting (see Django’s documentation for details on this).
When using multiple databases, Django provides two different ways for you to specify exactly which database to read/write from. First, there is a QuerySet method available called
using which can be added to any QuerySet (as well as a
using parameter that can be passed to
Model.save()). These will instruct Django to use a specific database for a single operation, allowing you to easily specify which database to use on a per-operation basis, but they quickly become tiring. Using this method requires a developer to understand exactly which database is used for each model, and you quickly end up with a lot of repeated code.
The better option for handling multiple databases is called Database Routing. Django’s Database Routing allows you to specify a list of router classes, which can be used to determine when each database will be used. Once the routers are configured, you can query for and save records like normal, and Django will automatically determine which database to use based on the logic in the router classes.
Each database router class can provide four methods that will help Django determine which database to use for various functionality – reading, writing, relationships, and migrating. When trying to write my own router, I found the descriptions in Django’s documentation a little confusing and had a hard time figuring out best practices. Newcircle’s blog post on this same topic was helpful to me because they provide a router example, but it was written for Django 1.3, so it’s a bit outdated.
Django Router Example
Below is an example of the router that I ended up with. Currently we are only using one router. The logic is built so that all models from one specific app will be saved to our second database and everything else is saved to the default database. More complex configurations are possible, either within a single router or by using multiple routers. Care should be taken when writing the router logic as it can quickly get confusing. For example, we had slightly misconfigured logic initially and ended up with some tables being created in the wrong database.
class ExampleDatabaseRouter(object): """ Determine how to route database calls for an app's models (in this case, for an app named Example). All other models will be routed to the next router in the DATABASE_ROUTERS setting if applicable, or otherwise to the default database. """ def db_for_read(self, model, **hints): """Send all read operations on Example app models to `example_db`.""" if model._meta.app_label == 'example': return 'example_db' return None def db_for_write(self, model, **hints): """Send all write operations on Example app models to `example_db`.""" if model._meta.app_label == 'example': return 'example_db' return None def allow_relation(self, obj1, obj2, **hints): """Determine if relationship is allowed between two objects.""" # Allow any relation between two models that are both in the Example app. if obj1._meta.app_label == 'example' and obj2._meta.app_label == 'example': return True # No opinion if neither object is in the Example app (defer to default or other routers). elif 'example' not in [obj1._meta.app_label, obj2._meta.app_label]: return None # Block relationship if one object is in the Example app and the other isn't. return False def allow_migrate(self, db, app_label, model_name=None, **hints): """Ensure that the Example app's models get created on the right database.""" if app_label == 'example': # The Example app should be migrated only on the example_db database. return db == 'example_db' elif db == 'example_db': # Ensure that all other apps don't get migrated on the example_db database. return False # No opinion for all other scenarios return None
Test Fixtures for the Second Database
We use test fixtures throughout our Django app to load test data into the default database during unit tests. We needed to add fixtures for the second database as well, but this took some effort for me to figure out. The Django testing documentation explains that you need to set
multi_db = True in test classes to enable all test databases. That’s easy, but the tricky part was figuring out how to specify fixture data that should be loaded.
Fixture data for the default database in Django is loaded by specifying a list of filenames where the data is located (i.e.
fixtures = [‘test’]). The
.json extension is assumed if it is not included, and each file specified is loaded automatically into the default database.
After much searching, I finally figured out that it’s important to follow a specific filename convention in order to load fixture data into a different database than the default one. To load fixture data for a database named
example_db, you need to create a fixture file with that database name included in the right spot, such as
test.example_db.json. Then you can load that data using
fixtures = [‘test’] combined with
multi_db = True. With that command, Django will automatically load
test.json into the default database, and
test.example_db.json into the second database.
I was surprised to find that this is Django’s convention for handling fixtures for multiple databases. I would much prefer an option to explicitly declare the data that gets loaded instead of an automatic solution such as this.
Migrating Multiple Databases
Migrating multiple databases in Django is easy, but redundant: The
migrate command only operates on one database at a time, so you must run it separately for each database. To migrate the second database, you simply run
python manage.py migrate --database=example_db. This is fine in production, when you aren’t often migrating, but during active development on your local environment, it can get tiring. To solve this, we use Fabric to simplify complex tasks. In this specific instance, we have one command that runs the migrations on both of our development databases.
Migrations can also be confusing at first, because even if your router dictates that a migration should not be run on a database, Django will still appear to run the migration. The
migrate command will display output that looks like the migration was run, and the migration will show up in that database’s
django_migrations table. You can rest assured that, if your router is configured correctly, then the database won’t actually be affected by the migration because Django keeps track of migrations that were run against a database even if the migration wasn’t actually applied to that database.
While it was tricky to set this up at first, these key findings made it easy for us to configure multiple databases in Django. If you have any tips for running multiple databases, or would like to learn more, be sure to let us know on social media. We’re on Twitter @strongarmio.