#Q

1 messages · Page 1 of 1 (latest)

analog mesa Sep 1, 2024, 8:16 PM

You cant have truly relational data in nosql

If you nest data to avoid applicatio -level joins what ends up happening is that you will duplicate a lot of your data

tired stone Sep 2, 2024, 2:10 AM

In industry, the go-to is usually a SQL DB (specifically Postgres). It's used by default because although you have to define table schemas, it's relatively easy to learn, has a lot of community support, and is very mature. If you don't know how your data models will be queried ahead of time, relational DBs offer a lot of flexiblity when it comes to creating/updating tables and having relations between each.

Sure, non-relational DBs like Mongo and DynamoDB are schemaless, but they should be used when you know the query patterns ahead of time (and said query patterns will not change often) so that you can optimize the table design: https://aws.amazon.com/blogs/compute/creating-a-single-table-design-with-amazon-dynamodb/

Amazon Web Services

Creating a single-table design with Amazon DynamoDB | Amazon Web Se...

This post looks at implementing common relational database patterns using DynamoDB. Instead of using multiple tables, the single-table design pattern can use adjacency lists to provide many-to-many relational functionality.

I've noticed a lot of places do use NoSQL databases because of the reasons you described, @queen patio . But, they end up writing and creating tables with a relational mindset (ie a new table for each different entity, like tables for Book, User, Account, etc.). If they want to do any joins between tables, they'd have to write it in the application code, which is way more annoying to deal with than having the DB do the joins for you

So, if you wanna make use of the high scalability and availability of NoSQL DBs effectively, you need to think with a very nonrelational mindset that minimizes the amounts of tables created by not having each entity be its own table, but by instead nesting data as you mentioned

Any "relations" in your NoSQL DB should be considered and built in within your single table

If you have no relations in your data models, the joining matters less and honestly it's up to you whether to use a SQL vs noSQL DB. Personally, I'd still start off with a SQL DB just in case there's some feature down the line that would require creating relations between existing data models, or creating new tables that have relations with existing data models (eg. an audit table).

queen patio Sep 4, 2024, 5:34 AM

This makes so much sense @tired stone , thanks so much for this info. I just have one final question in regards to SQL scalability. So for the longest time I thought that SQL was only able to vertically scale, however I've learned that SQL databases can also scale vertically but that this is usually "harder". Why is this harder and how would sharding for a SQL database work?

tired stone Sep 4, 2024, 6:59 AM

Any computer, whether they be a database or not, can scale vertically. However, there is a limit to how much memory or CPU that a given server instance can give a process based on the instance's specs. On the other hand, you can have multiple instances of a process (eg. an application, a database application, etc.) be ran on N instances to divide the load (ie horizontal scaling)

In which N is limited based on how much money you have to pay for your hardware and how much literal physical space there is to house your server instances

Sharding is a technique to split a table into smaller tables, in which all the tables have the same columns. For example, you can shard a table based on some datetime ranges, location, etc.

The point is to make overall query/insert time shorter by dealing with smaller tables than dealing than one large/mega table

This is useful especially if, for example, the application in one region doesn't care about the data in another region

Like, let's say we have a big table that has 1 million rows in us-east-1, and our application is in us-east-1, us-east-2, us-west-2, eu-central-1, and eu-west-1. Of course, the application in us-east-1 would have the lowest latency because it's closest to the DB in us-east-1. However, eu-west-1 and eu-central-1 would have a noticeable lag compared to us-east-1 due to their distance from the main DB. So, if we have each region having their own shard of the big table (eg. 200k rows per region), they'd minimize the read/write time of their DB and have fewer rows to search through.