#Integrating Existing Databases


honest willow
#

I have a project with several existing databases (let’s say about 100).

My UI in Django will need to be able to query every database. Each user will probably only access 2-3 databases at a time while they’re logged in. Is it better to connect/disconnect at each query or does it make more sense to somehow cache the connections that are being used?

Lastly (and I’m guessing the answer is no), does Django have any built-in settings or configuration to help with this situation? Thanks!

prime timber
#

are these databases supposed to be the db for the Django app, or just some generic datastore that you are talking to using non-django tooling?

honest willow
#

They’re generic data stores. They will need to be read from and updated, but the Django ORM will not be managing any of them

#

Clarification: when I say “updated” I mean the Django app will insert data and modify records, not have anything to do with the structure of the databases

boreal merlin
#

Do they all have different data structures or are they all the same?

prime timber
#

okay. I have no experience setting up dynamic database connections in a Django backend, but I would assume there are libraries to help with this. the tricky part would be keeping connections open and reusing them across incoming requests over some time. Django has this solved internally for its own databases, but I don't think that machinery is made to be reused for arbitrary connections to non-Django databases.

#

what database engine are you targeting? all the same, e.g. Postgres or MySQL, or is it mixed?

prime timber
#

from what I understand, Django reuses connections within a process, but not across processes. depending on how you run it, e.g. gunicorn with 4 worker processes, each process keeps its own open connections and reuses them as requests come in. as far as I know these are persistent connections kept per thread rather than a true shared pool, so a request handled in a new thread would open its own connection.

I assume most db libraries like psycopg will have some sort of pooling API you can use for that.
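A rough sketch of the "cache the connections in use" idea, independent of any particular driver. The class name `ConnectionCache` and the alias-to-factory mapping are made up for illustration; each factory would wrap whatever connect call the driver provides. The cache is kept per thread, on the assumption that sharing one connection object between threads is not safe for every driver.

```python
import threading
from typing import Any, Callable, Dict


class ConnectionCache:
    """Hypothetical per-thread cache of open connections, keyed by alias."""

    def __init__(self, factories: Dict[str, Callable[[], Any]]) -> None:
        # alias -> zero-argument callable that opens a new connection
        self._factories = factories
        self._local = threading.local()

    def _conns(self) -> Dict[str, Any]:
        # Each thread lazily gets its own dict of open connections.
        conns = getattr(self._local, "conns", None)
        if conns is None:
            conns = {}
            self._local.conns = conns
        return conns

    def get(self, alias: str) -> Any:
        """Return this thread's connection for alias, opening it on first use."""
        conns = self._conns()
        if alias not in conns:
            conns[alias] = self._factories[alias]()
        return conns[alias]

    def close_all(self) -> None:
        """Close and forget every connection opened by the current thread."""
        conns = self._conns()
        for conn in conns.values():
            conn.close()
        conns.clear()
```

With about 100 databases and only 2-3 in use per user, a cache like this only ever opens the connections that are actually requested. It does not handle stale or dropped connections, which a real pooling library would.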

#

by default connection reuse seems to be disabled: CONN_MAX_AGE=0 just closes the connection at the end of every request. to keep connections open it must be set to a positive number of seconds (or None for no limit)
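For reference, this is roughly where `CONN_MAX_AGE` lives in a settings file. The database name and the 60-second value are placeholders. Note that the setting only affects databases listed in Django's own `DATABASES` setting, so it would not apply to connections opened separately with another library.

```python
# settings.py (fragment); the name and timeout below are placeholder values
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app_db",  # placeholder database name
        # 0 (the default) closes the connection at the end of each request;
        # a positive number keeps it open for that many seconds;
        # None keeps it open with no time limit.
        "CONN_MAX_AGE": 60,
    }
}
```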

honest willow
#

It is mixed, SQL Server and Postgres. There are about 5 different schemas that hold similar data. My plan to deal with that is to have a shared interface with methods like “read_all_locations” that can change depending on the source. I will abstract away all of the differences in the databases before Django needs to deal with it.
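One way the shared interface could look, as a minimal sketch. Only `read_all_locations` comes from the description above; the class names, table names, and column names are invented for illustration, and `conn` is assumed to be any DB-API 2.0 style connection (for example one opened with a Postgres or SQL Server driver).

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class LocationSource(ABC):
    """Hypothetical common interface over the different source schemas."""

    def __init__(self, conn: Any) -> None:
        self.conn = conn  # assumed to be a DB-API 2.0 connection

    @abstractmethod
    def locations_query(self) -> str:
        """Return the schema-specific SQL that lists (id, name) rows."""

    def read_all_locations(self) -> List[Tuple]:
        # Same calling code for every schema; only the SQL differs.
        cur = self.conn.cursor()
        try:
            cur.execute(self.locations_query())
            return cur.fetchall()
        finally:
            cur.close()


class SchemaASource(LocationSource):
    def locations_query(self) -> str:
        return "SELECT id, name FROM locations"  # invented table and columns


class SchemaBSource(LocationSource):
    def locations_query(self) -> str:
        return "SELECT site_id, site_name FROM sites"  # invented table and columns
```

The views would then only ever call `read_all_locations()` and never see which of the 5 or so schemas sits behind it.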

If there’s not a ton of overhead in re-opening a database connection within every view, I might just try that as a first cut and see if there are any issues.
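The "re-open in every view" first cut could be sketched like this. The helper name `request_connection` is made up, and `connect` stands for any zero-argument callable that opens a DB-API style connection with `commit`, `rollback`, and `close` methods.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def request_connection(connect: Callable[[], Any]) -> Iterator[Any]:
    """Open a connection for one request and always close it afterwards."""
    conn = connect()
    try:
        yield conn
        conn.commit()  # commit only if the body finished without an error
    except Exception:
        conn.rollback()  # undo partial writes, then re-raise for the caller
        raise
    finally:
        conn.close()
```

Inside a view this would be used as `with request_connection(open_source_db) as conn: ...`, where `open_source_db` is a hypothetical function wrapping the driver's connect call. Timing that block for a few of the databases would show whether the connect overhead is worth optimizing.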

prime timber
#

yea I would start with just connecting once per request and see how it goes. it's an interesting problem you have, one I haven't had to deal with yet. keep us posted 🙂