Higher Storage usage by convex compared to turso | Convex Community | Page 1

rotund lintel Apr 6, 2024, 7:51 AM

#

I imported the same json (22kb) to turso and convex (no indexes in convex, some indexes in turso but that makes no difference for the argument)

Turso requires about 27 MB, compared to 144.06 MB for Convex.

This problem gets exponentially worse if you add indexes which duplicate the data.

Why does Convex take so much more space?

#

tidal breach Apr 7, 2024, 3:32 AM

#

two reasons:

1. Convex adds `_id` and `_creationTime` fields to each document, so it's not just the raw json you're storing
2. Document sizes are rounded up to the nearest kilobyte https://docs.convex.dev/production/state/limits#database

The reason for (1) is that accessing your documents via `db.get` requires a unique id field, and querying documents in the default order requires a creation time field.

The reason for (2) is that there is some per-document overhead within Convex, e.g. when reading a small document you often need to do IO for an entire 4kb page of SSD storage. However, we're rethinking this policy internally and may remove the rounding soon.

basically, the accounting policy is trying to be honest about how expensive small documents are, and incentivize good patterns. But also look forward to changes here soon 😊
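
The rounding rule described above can be sketched in a few lines. This is only an illustration of "round each document up to the nearest kilobyte", not Convex's actual accounting code: the function name `billed_size`, the choice of 1024 bytes per kilobyte, and the sample document sizes are all assumptions.

```python
KB = 1024  # assumption: the thread does not say whether "1kb" means 1000 or 1024 bytes


def billed_size(raw_bytes: int, unit: int = KB) -> int:
    """Round a document size up to the next whole multiple of `unit`."""
    units = -(-raw_bytes // unit)  # ceiling division using integers only
    return max(units, 1) * unit    # assumption: even an empty document counts as one unit


for raw in (150, 1024, 1025):  # hypothetical document sizes in bytes
    print(raw, "->", billed_size(raw))
```

Under these assumptions a 150-byte document is accounted as a full kilobyte, which is why many small documents can use far more storage than the raw JSON suggests.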

rotund lintel Apr 7, 2024, 2:21 PM

#

tidal breach two reasons: 1. Convex adds `_id` and `_creationTime` fields to each document, s...

I understand each of your decisions, but can just two new fields be the whole reason for this over 5x difference in storage?

tidal breach Apr 7, 2024, 2:21 PM

#

How big are your documents? i would guess that rounding each document up to 1kb is a bigger factor

rotund lintel Apr 7, 2024, 2:40 PM

#

tidal breach How big are your documents? i would guess that rounding each document up to 1kb ...

I have 147 thousand documents (rounded 😂) and the json/jsonl takes about 15 - 26 kb, so it may be a very big factor

tidal breach Apr 7, 2024, 2:41 PM

#

147 thousand documents * 1kb per document comes out to 147mb, so that explains it 😝
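
The arithmetic in this message can be checked with a short sketch. It assumes every document is smaller than 1 kb and is therefore counted as exactly one kilobyte. The thread does not say whether the units are decimal (1000-based) or binary (1024-based), so both are shown; the helper name `total_mb` is hypothetical.

```python
DOCUMENTS = 147_000  # "147 thousand documents (rounded)", as stated in the thread


def total_mb(documents: int, billed_bytes_per_doc: int, bytes_per_mb: int) -> float:
    """Total accounted storage when every document is billed the same size."""
    return documents * billed_bytes_per_doc / bytes_per_mb


# Assumption: decimal units (1 kb = 1000 bytes, 1 mb = 1,000,000 bytes).
decimal_total = total_mb(DOCUMENTS, 1000, 1_000_000)

# Assumption: binary units (1 kb = 1024 bytes, 1 mb = 1024 * 1024 bytes).
binary_total = total_mb(DOCUMENTS, 1024, 1024 * 1024)

print(f"decimal units: {decimal_total:.2f} mb")
print(f"binary units:  {binary_total:.2f} mb")
```

Either way the result lands close to the 144.06 mb reported at the top of the thread, which supports the view that rounding, rather than the two extra fields, accounts for most of the difference.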

rotund lintel Apr 7, 2024, 2:41 PM

#

tidal breach 147 thousand documents * 1kb per document comes out to 147mb, so that explains i...

Yes lol

#

Lmao

tidal breach Apr 7, 2024, 2:42 PM

#

I wouldn't worry about refactoring, since we're probably going to remove the 1kb thing soon

#

(i'm bumping the internal discussion now)

rotund lintel Apr 7, 2024, 2:44 PM

#

tidal breach I wouldn't worry about refactoring, since we're probably going to remove the 1kb...

Nice. This is a huge deal for me, since I can't do anything about my storage issues myself

rotund lintel Apr 9, 2024, 2:13 PM

#

tidal breach I wouldn't worry about refactoring, since we're probably going to remove the 1kb...

Hi, when will this be removed? Sorry for pinging you here, but my project is down (we got the "plan exceeded" error) until you either lift that limit for a short time or fix the problem.

tidal breach Apr 9, 2024, 3:10 PM

#

curious, is an extra 100mb pushing you over the limit? i believe the limit is 500mb

rotund lintel Apr 9, 2024, 3:10 PM

#

tidal breach curious, is an extra 100mb pushing you over the limit? i believe the limit is 50...

Yes, but I have to create a dev branch for me and my coworker, plus indexes (mainly the indexes; in the worst case we can share a dev account)

#

this problem is not a big deal initially but it's growing exponentially

tidal breach Apr 9, 2024, 3:14 PM

#

ok i'll re-enable your team. even without the 1kb rounding, it's around 40mb to store the data, multiplied by indexes and deployments. if you want to keep usage low, i would recommend using smaller datasets in dev deployments.
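
A rough sketch of how "multiplied by indexes and deployments" might add up, and why smaller dev datasets help. Everything here is an assumption for illustration: the helper `team_usage_mb`, the idea that each index costs about one extra full copy of the data, the index count, and the 10% dev sample. The 40 mb base and the 500 mb limit are the approximate figures quoted in the thread.

```python
BASE_MB = 40          # approximate data size without 1kb rounding, per the thread
PLAN_LIMIT_MB = 500   # the limit as recalled in the thread ("i believe the limit is 500mb")


def team_usage_mb(base_mb: float, index_count: int, deployment_fractions: list[float]) -> float:
    """Estimate team-wide storage across deployments.

    Assumption: each index stores roughly one extra copy of the data.
    `deployment_fractions` gives the share of the full dataset loaded
    into each deployment (1.0 means a full copy).
    """
    per_full_copy = base_mb * (1 + index_count)
    return sum(per_full_copy * fraction for fraction in deployment_fractions)


# Hypothetical setup: 2 indexes, production plus two dev deployments.
full_copies = team_usage_mb(BASE_MB, 2, [1.0, 1.0, 1.0])
small_dev = team_usage_mb(BASE_MB, 2, [1.0, 0.1, 0.1])

print(f"full data everywhere: {full_copies:.0f} mb of {PLAN_LIMIT_MB} mb")
print(f"10% sample in dev:    {small_dev:.0f} mb of {PLAN_LIMIT_MB} mb")
```

With these made-up numbers, loading only a sample into the dev deployments cuts the estimate by more than half, which is the point of the recommendation above.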

rotund lintel Apr 9, 2024, 3:15 PM

#

tidal breach ok i'll re-enable your team. even without the 1kb rounding, it's around 40mb to ...

that's a good idea with the smaller datasets, I will do that immediately

oblique grail Apr 9, 2024, 4:09 PM

#

@tidal breach - does creating a new branch (i.e. "preview deployment" https://docs.convex.dev/production/hosting/preview-deployments) make a full copy and so double the storage used?

tidal breach Apr 9, 2024, 4:37 PM

#

oblique grail @tidal breach - does creating a new branch (i.e. "preview deployment" h...

new dev & preview deployments are empty. if you create data in them (e.g. https://stack.convex.dev/seeding-data-for-preview-deployments ) then they would hold that data. there's no automatic copy

oblique grail Apr 9, 2024, 4:43 PM

#

Thanks @tidal breach. The reason I ask is because of financial planning. It seems from your comment and the article that branching in Convex is fundamentally about branching the schema, and not the business data. So just to be clear: if you branch production, and then seed the new branch with the same data as in production, would a Convex customer be billed for double the storage usage?

#

Also @tidal breach, a lower priority question: there's no automatic copy to a preview deployment today, but is it on the roadmap to make a copy-on-write branch that is a clone of the production Convex deployment?

steel ingot Apr 9, 2024, 5:06 PM

#

Each deployment is essentially a blank new database. So whatever data you put in a new deployment (dev, preview, prod) will count towards your overall quota for the team.

#

We don't have copy-on-write branches on the roadmap yet. It's something we've discussed but haven't put on any list.
