#Q

1 messages · Page 1 of 1 (latest)

analog mesa
#

You cant have truly relational data in nosql

#

If you nest data to avoid applicatio -level joins what ends up happening is that you will duplicate a lot of your data

tired stone
#

In industry, the go-to is usually a SQL DB (specifically Postgres). It's used by default because although you have to define table schemas, it's relatively easy to learn, has a lot of community support, and is very mature. If you don't know how your data models will be queried ahead of time, relational DBs offer a lot of flexiblity when it comes to creating/updating tables and having relations between each.

#

Sure, non-relational DBs like Mongo and DynamoDB are schemaless, but they should be used when you know the query patterns ahead of time (and said query patterns will not change often) so that you can optimize the table design: https://aws.amazon.com/blogs/compute/creating-a-single-table-design-with-amazon-dynamodb/

Amazon Web Services

This post looks at implementing common relational database patterns using DynamoDB. Instead of using multiple tables, the single-table design pattern can use adjacency lists to provide many-to-many relational functionality.

#

I've noticed a lot of places do use NoSQL databases because of the reasons you described, @queen patio . But, they end up writing and creating tables with a relational mindset (ie a new table for each different entity, like tables for Book, User, Account, etc.). If they want to do any joins between tables, they'd have to write it in the application code, which is way more annoying to deal with than having the DB do the joins for you

#

So, if you wanna make use of the high scalability and availability of NoSQL DBs effectively, you need to think with a very nonrelational mindset that minimizes the amounts of tables created by not having each entity be its own table, but by instead nesting data as you mentioned

#

Any "relations" in your NoSQL DB should be considered and built in within your single table

#

If you have no relations in your data models, the joining matters less and honestly it's up to you whether to use a SQL vs noSQL DB. Personally, I'd still start off with a SQL DB just in case there's some feature down the line that would require creating relations between existing data models, or creating new tables that have relations with existing data models (eg. an audit table).

queen patio
#

This makes so much sense @tired stone , thanks so much for this info. I just have one final question in regards to SQL scalability. So for the longest time I thought that SQL was only able to vertically scale, however I've learned that SQL databases can also scale vertically but that this is usually "harder". Why is this harder and how would sharding for a SQL database work?

tired stone
#

Any computer, whether they be a database or not, can scale vertically. However, there is a limit to how much memory or CPU that a given server instance can give a process based on the instance's specs. On the other hand, you can have multiple instances of a process (eg. an application, a database application, etc.) be ran on N instances to divide the load (ie horizontal scaling)

#

In which N is limited based on how much money you have to pay for your hardware and how much literal physical space there is to house your server instances

#

Sharding is a technique to split a table into smaller tables, in which all the tables have the same columns. For example, you can shard a table based on some datetime ranges, location, etc.

#

The point is to make overall query/insert time shorter by dealing with smaller tables than dealing than one large/mega table

#

This is useful especially if, for example, the application in one region doesn't care about the data in another region

#

Like, let's say we have a big table that has 1 million rows in us-east-1, and our application is in us-east-1, us-east-2, us-west-2, eu-central-1, and eu-west-1. Of course, the application in us-east-1 would have the lowest latency because it's closest to the DB in us-east-1. However, eu-west-1 and eu-central-1 would have a noticeable lag compared to us-east-1 due to their distance from the main DB. So, if we have each region having their own shard of the big table (eg. 200k rows per region), they'd minimize the read/write time of their DB and have fewer rows to search through.