# Does Cloudflare offer a database solution for analytics-type data?


safe willow
#

I'll have a stream of events, potentially terabytes per month. Does Cloudflare offer any NoSQL/SQL solution that could store this much data? I'll also need to do some basic querying on it later.


dusk veldt
#

Though it does sample at some point

slim gazelle
#

Yeah, that's pretty much the closest option. It's based on ClickHouse, but there are a few things to keep in mind:

  • Limited to 91 days of retention
  • Currently in beta and free for now (pricing yet to be announced)

prisma hull
#

There's sampling both on write (iirc above three-digit requests per second) and on read.

safe willow
#

What do you mean by "sample"?

#

Ah, I get it: it's a way of not recording every single event.

#

Seems a little heavy-handed. I already have all of that written; I just need somewhere to push the data.

slim gazelle
#

It's not quite as simple as sampling everything above a set amount; there's a document here explaining it all if you're interested: https://docs.google.com/document/d/17iThVQ3wr40JlRJ_8Y0qQr2YP1S8Cbn3FujQ5WYLG8E/edit#heading=h.9hfgsp5qfkgp
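The real scheme is more adaptive than a fixed rate, but the core reason sampled data can still answer "how many?" questions is the same. Below is a minimal Python sketch of fixed-rate sampling on write, assuming each stored row records how many original events it stands for; the names are hypothetical, though Workers Analytics Engine (which this thread appears to be discussing) exposes a comparable `_sample_interval` column. Counts are estimated by summing those weights instead of counting rows.

```python
import random
from dataclasses import dataclass


@dataclass
class StoredRow:
    event_id: int
    sample_interval: int  # how many original events this stored row represents


def sample_on_write(event_ids, interval, rng):
    """Keep roughly 1 in `interval` events, tagging each kept row with its weight.

    Fixed-rate sampling only; an assumption for illustration, not Cloudflare's scheme.
    """
    return [
        StoredRow(event_id, interval)
        for event_id in event_ids
        if rng.random() < 1.0 / interval
    ]


def estimated_count(rows):
    """Estimate the original event count by summing weights, not counting rows."""
    return sum(row.sample_interval for row in rows)


if __name__ == "__main__":
    rng = random.Random(42)
    true_count = 100_000
    rows = sample_on_write(range(true_count), interval=10, rng=rng)
    print(f"stored {len(rows)} rows, estimated {estimated_count(rows)} events (true: {true_count})")
```

Only about a tenth of the rows are stored, yet the weighted sum lands close to the true total; the price is that rare events and small groups get noisier estimates.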

safe willow
#

OK wow, checking out ClickHouse now... there's definitely way more out there than NoSQL vs. SQL when it comes to this stuff. I need to do some learning.

#

I'm using Postgres right now, but it's just incredibly slow to query at 1 million records (with proper indexing).

slim gazelle
safe willow
#

I have events grouped by "session_id", so I figured a NoSQL solution would be better, since I can just keep the events in the session document.

dusk veldt
safe willow
#

I'm guessing it gets its performance boost from having well-defined time bounds?

narrow field
safe willow
#

OK, excellent. I knew I was missing something. Thanks, all!

slim gazelle
safe willow
#

Is ClickHouse its own DB? What are they running underneath, SQL?

slim gazelle
#

Its own DB.

narrow field
#

"SQL" is not really a database fwiw its just a query language. You do use SQL for clickhouse

safe willow
#

Ah, of course.

slim gazelle
#

Yeah, ClickHouse also supports the Postgres wire protocol, an HTTP interface, and a few others.
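As a rough illustration of the HTTP interface, here is a standard-library-only Python sketch, assuming a ClickHouse server on its default HTTP port 8123 with the default user; the host and the `events` table are hypothetical. The query goes in the POST body, and appending `FORMAT JSONEachRow` makes the server return one JSON object per line.

```python
import json
import urllib.request

# Assumption: a local ClickHouse server listening on the default HTTP port.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_query_request(sql, url=CLICKHOUSE_URL):
    """Build a POST request that carries the SQL text as the request body."""
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def parse_json_each_row(body):
    """Parse a JSONEachRow response: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def run_query(sql, url=CLICKHOUSE_URL):
    """Send a query (without a trailing semicolon) and return rows as dicts.

    Needs a live server; network errors propagate as urllib exceptions.
    """
    request = build_query_request(sql + " FORMAT JSONEachRow", url)
    with urllib.request.urlopen(request) as response:
        return parse_json_each_row(response.read().decode("utf-8"))
```

With a server running, `run_query("SELECT bucket_id, count() AS events FROM events GROUP BY bucket_id")` would return one dict per tenant. A dedicated client library is the more usual choice in production; this just shows that plain HTTP is enough.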

safe willow
#

This is wild. I think I'll need to set up multiple partition keys: I've got bucket_id for different tenants, visitor_id for different people, and session_id for different sessions

#

all within the "events" table, which I'd like to grow to infinity

slim gazelle
#

One of the big advantages of ClickHouse is materialized views. You can create a view that aggregates by time, tenant, etc. The magic is that, unlike normal views, which are computed when you query them, materialized views are updated as rows are inserted into the table they're built on (much like a trigger), so they're cheap to query.
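A toy Python model of that idea (not ClickHouse's implementation; all names are hypothetical): each insert also updates a per-tenant, per-day count, so reading the aggregate is a lookup, whereas a plain view would rescan every raw row. Because the aggregate is fed only by inserts, which is also how ClickHouse materialized views behave, editing a row that is already stored would not be reflected in it.

```python
from collections import defaultdict
from datetime import datetime


class EventsWithDailyView:
    """Toy append-only table with one insert-time aggregate attached."""

    def __init__(self):
        self.rows = []                        # raw events
        self.daily_counts = defaultdict(int)  # (bucket_id, day) -> event count

    def insert(self, batch):
        self.rows.extend(batch)
        # Like a trigger, the "view" processes only the newly inserted batch;
        # rows already in the table are never re-read.
        for row in batch:
            self.daily_counts[(row["bucket_id"], row["ts"].date())] += 1

    def daily_count(self, bucket_id, day):
        """Materialized read: a dictionary lookup, however many raw rows exist."""
        return self.daily_counts.get((bucket_id, day), 0)

    def daily_count_by_scan(self, bucket_id, day):
        """What a plain view does instead: recompute from every raw row."""
        return sum(
            1
            for row in self.rows
            if row["bucket_id"] == bucket_id and row["ts"].date() == day
        )


if __name__ == "__main__":
    table = EventsWithDailyView()
    table.insert([
        {"bucket_id": "acme", "ts": datetime(2023, 5, 1, 9, 30)},
        {"bucket_id": "acme", "ts": datetime(2023, 5, 1, 17, 5)},
        {"bucket_id": "globex", "ts": datetime(2023, 5, 2, 8, 0)},
    ])
    day = datetime(2023, 5, 1).date()
    print(table.daily_count("acme", day), table.daily_count_by_scan("acme", day))
```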

safe willow
#

OK, whoa, so they're like indexes... but views?

#

I wonder what happens if you modify a row that was already materialized. Does the entire materialized view get rebuilt?

slim gazelle
#

Eh, I think it's easier to just think of them as views where you do more work on insert instead of at query time.

slim gazelle
safe willow
#

That makes sense.

#

Events should be append-only anyway.

#

So I'm good with that.

#

This seems like just what I needed, so glad.

slim gazelle
#

One of the things that makes ClickHouse fast is that the partition key you define (let's say, by month) determines how the data is grouped into parts (immutable chunks of data on disk). Inserts are super quick because each one just writes a new part. In the background, ClickHouse merges these parts (only parts with the same partition key can be merged together). Deleting, updating, or otherwise mutating data therefore requires rewriting parts, which is expensive.
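A toy Python model of those mechanics, for intuition only; it is not how ClickHouse is implemented, and the monthly partition function, row layout, and class name are hypothetical. Inserts append new sorted parts, merges combine parts only within one partition, a delete has to write replacement parts, and a time-bounded query can skip partitions entirely.

```python
import heapq
from collections import defaultdict
from datetime import datetime


def month_partition(ts):
    """Hypothetical monthly partition key, in the spirit of toYYYYMM(ts)."""
    return ts.year * 100 + ts.month


class ToyMergeTree:
    """Each partition holds a list of immutable parts; each part is a sorted row list."""

    def __init__(self, sort_key):
        self.sort_key = sort_key
        self.parts = defaultdict(list)  # partition -> [part, part, ...]

    def insert(self, rows):
        # Inserts never modify existing parts: they add one new sorted part
        # per partition the batch touches, which is why writes are cheap.
        batches = defaultdict(list)
        for row in rows:
            batches[month_partition(row["ts"])].append(row)
        for partition, batch in batches.items():
            self.parts[partition].append(sorted(batch, key=self.sort_key))

    def merge(self):
        # Background merge: sorted parts are combined, but only within a partition.
        for partition, parts in self.parts.items():
            if len(parts) > 1:
                self.parts[partition] = [list(heapq.merge(*parts, key=self.sort_key))]

    def delete_where(self, predicate):
        # No in-place edits: a mutation writes a replacement for every part.
        for partition, parts in self.parts.items():
            self.parts[partition] = [
                [row for row in part if not predicate(row)] for part in parts
            ]

    def part_count(self):
        return sum(len(parts) for parts in self.parts.values())

    def partitions_for_range(self, start, end):
        """Partition pruning: a time-bounded query reads only overlapping partitions."""
        first, last = month_partition(start), month_partition(end)
        return sorted(p for p in self.parts if first <= p <= last)


if __name__ == "__main__":
    table = ToyMergeTree(sort_key=lambda r: (r["bucket_id"], r["ts"]))
    table.insert([{"bucket_id": "b", "ts": datetime(2023, 5, 3)},
                  {"bucket_id": "a", "ts": datetime(2023, 6, 1)}])
    table.insert([{"bucket_id": "a", "ts": datetime(2023, 5, 2)}])
    print(table.part_count())  # parts before merging
    table.merge()
    print(table.part_count())  # one part per partition after merging
```

Within a part, ClickHouse additionally uses the sorted `ORDER BY` key to skip ranges of rows, which this toy does not model.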

safe willow
#

So there would be no need to have separate tables per tenant_id? I can simply define a partition key and let ClickHouse do its magic?

slim gazelle
#

I think partitioning by tenant is generally not recommended, since you really want to keep the number of parts down; maybe use materialized views for that instead.
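Pulling the suggestions in this thread together, here is one possible schema, kept as SQL strings in Python so it can be sent through any client (for example, as the body of a POST to the HTTP interface). It is a sketch under assumptions: the column names and types are hypothetical; the table is partitioned by month only; and the tenant, visitor, and session IDs go into the sorting key (`ORDER BY`) rather than the partition key, which keeps the part count low. The daily rollup is a materialized view stored in a `SummingMergeTree`, which adds up rows with the same sorting key only when parts merge, so reads should still `sum()`.

```python
# Raw, append-only events: monthly partitions, tenant-first sorting key.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events
(
    bucket_id  LowCardinality(String),  -- tenant
    visitor_id String,
    session_id String,
    event_type LowCardinality(String),
    ts         DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (bucket_id, visitor_id, session_id, ts)
"""

# Per-tenant daily counts, maintained on every insert into `events`.
# Without POPULATE, rows inserted before the view existed are not included.
DAILY_ROLLUP_DDL = """
CREATE MATERIALIZED VIEW IF NOT EXISTS events_daily
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (bucket_id, day)
AS SELECT
    bucket_id,
    toDate(ts) AS day,
    count() AS events
FROM events
GROUP BY bucket_id, day
"""

# Rows for one (bucket_id, day) may sit in several unmerged parts,
# so aggregate again at read time. {name:Type} are ClickHouse query
# parameters; over HTTP they are supplied as param_bucket / param_since.
DAILY_QUERY = """
SELECT bucket_id, day, sum(events) AS events
FROM events_daily
WHERE bucket_id = {bucket:String} AND day >= {since:Date}
GROUP BY bucket_id, day
ORDER BY day
"""
```

Whether `visitor_id` and `session_id` belong in the sorting key depends on which filters the queries actually use; a key that leads with the columns you filter on most is what lets ClickHouse skip data.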

safe willow
#

Oh wow, so good.

slim gazelle
#

There's a lot of content out there about ClickHouse; this is a good watch:
https://www.youtube.com/watch?v=6WICfakG84c
I explored it a fair bit a while ago and I'm sure I've forgotten quite a few things by now lol

Here's the link to the updated webinar: https://youtu.be/1TGGCIr6dMY

ClickHouse is famous for speed. That said, you can almost always make it faster! In this webinar (September 2019), Robert Hodges and Altinity Engineering Team use examples to teach you how to deduce what queries are actually doing by reading the system log and system tables. ...

safe willow
#

How have I not heard of this? Is this a fairly new product? Is Cloudflare using this for all of their analytics?

dusk veldt
slim gazelle
#

It's been around for a while. Cloudflare uses it for all the major analytics (DNS, HTTP, Radar, etc.), and they have a bunch of blog posts about it, e.g.:
https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/

The Cloudflare Blog

One of our large scale data infrastructure challenges here at Cloudflare is around providing HTTP traffic analytics to our customers. HTTP Analytics is available to all our customers via two options: