foggy tartan Aug 26, 2023, 12:12 PM

#

Dodge Charger
Type: Muscle
Seats: 5 Doors: 4

#

any idea

fading patrol Aug 26, 2023, 12:38 PM

#

I googled it for you: https://www.freecodecamp.org/news/how-to-get-started-with-firebase-using-python/

freeCodeCamp.org

How to Get Started with Firebase Using Python

This article a detailed guide that'll help you set up your Firebase database and perform simple CRUD operations on it using Python. Firebase, as you might know, is a platform provided by Google to accelerate app development. It offers BaaS or backend as a service, which means that Firebase

#

Read the data and print it. For example if that's SQLite then something like this https://stackoverflow.com/questions/17015980/python-how-to-print-sqlite-table

If that doesn't solve it for you try a help channel #❓｜how-to-get-help

Stack Overflow

Python - How to print Sqlite table

Here I have some simple python code to query a sqlite3 database.

import sqlite3 as lite

conn = lite.connect('db/posts.db')
cur = conn.cursor()

def get_posts():
cur.execute("SELECT * FROM Pos...

white gyro Aug 26, 2023, 12:43 PM

#

fading patrol I googled it for you: https://www.freecodecamp.org/news/how-to-get-started-with-...

Much thanks!

foggy tartan Aug 26, 2023, 1:05 PM

#

fading patrol Read the data and print it. For example if that's SQLite then something like thi...

can u help me with doing it with mysql.connector

fading patrol Aug 26, 2023, 1:35 PM

#

foggy tartan can u help me with doing it with mysql.connector

Very similar but you'll need to look up the exact syntax. And like I said if you're stuck get a help channel

foggy tartan Aug 26, 2023, 1:36 PM

#

Mhm

silk raft Aug 26, 2023, 2:24 PM

#

Working with roles in postgres,

I dont fully understand the implication of one role being a member of another role, say I have something like this setup would this mean that public role inherits permissions from its parent role ?

#

I initially created a database through aditya then I attempted to create/insert tables into that database and always got a permission error. It ended up getting fixed when I declared projects was a member of aditya, I wanted to reduce the amount of permissions given to projects role since I would most likely be sharing it with a bunch of people

pearl lodge Aug 26, 2023, 9:10 PM

#

okay soo i made tests here and for some reason i don't know why it's not returning me the document, it's just sending me an empty list:

    async def quick_search_default_character(self, *, user_id: int, name: str | None = None, prompt_prefix: str | None = None) -> None:
        documents = list()
        init = True
        if name and prompt_prefix:
            cursor = self.db.characters.find({'user_id': user_id, 'name': name,'prompt_prefix': prompt_prefix}, no_cursor_timeout = True)
            while (await cursor.fetch_next):
                data = cursor.next_object()
                documents.append(data)
        elif prompt_prefix:
            cursor = self.db.characters.find({'user_id': user_id, 'prompt_prefix': prompt_prefix}, no_cursor_timeout = True)
            while (await cursor.fetch_next):
                data = cursor.next_object()
                documents.append(data)
        elif name:
            cursor = self.db.characters.find({'user_id': user_id, 'name': name}, no_cursor_timeout = True)
            while (await cursor.fetch_next):
                data = cursor.next_object()
                documents.append(data)
        else:
            cursor = self.db.characters.find({'user_id': user_id}, no_cursor_timeout = True)
            while (await cursor.fetch_next):
                data = cursor.next_object()
                documents.append(data)
        
        print(documents)

        return documents if len(documents) > 0 else None

#

like, this code is suppose to make a quick search

#

!paste

delicate fieldBOT Aug 26, 2023, 9:13 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

pearl lodge Aug 26, 2023, 9:14 PM

#

https://paste.pythondiscord.com/E4CA

pearl lodge Aug 26, 2023, 11:03 PM

#

okay i realize it was my stupid misstake

tepid osprey Aug 26, 2023, 11:34 PM

#

Hi, I'm trying to figure out a way to get a Text widget to auto update in Tkinter to show all the values in a SQLite3 database, and the code runs without errors, but the contents of the database aren't inserted into the Text widget. Here is the code concerning just the auto-update and the database:

import tkinter as tk
import sqlite3 as sql
root = tk.Tk()

conn = sql.connect('database_store.db')
c = conn.cursor()
c.execute("CREATE TABLE dataStore (data text)")
conn.commit()
conn.close()

data_show = tk.Text(root, wrap='word') #other params for height/width that i didn't add
data_show.grid(row=1, column=1, padx=(100, 0), pady=20, sticky='ew')

def update_textbox():
  conn = sql.connect('database_store.db')
  c = conn.cursor()
  updated_db = c.fetchall()
  for data in updated_db:
    data_show.insert('end', f'{data}\n')

  conn.commit()
  conn.close()
  data_show.after(1000, update_textbox)

update_textbox()

And here is the full code if that wasn't enough to find any issues: https://paste.pythondiscord.com/6GTA

Thank you in advance for helping me out!

coral wasp Aug 26, 2023, 11:41 PM

#

tepid osprey Hi, I'm trying to figure out a way to get a Text widget to auto update in Tkinte...

Is this a db question or a tkinter question? I didn’t look at the code since it sounds like you’re saying the db part is fine?

#

But, now that I look at it, you have no query. You’re creating a cursor then calling c.fetchall() on an empty cursor

#

You should add some print statements so you can see / debug what’s happening. For example, add a print statement in the for loop to print the data.

tepid osprey Aug 26, 2023, 11:44 PM

#

coral wasp Is this a db question or a tkinter question? I didn’t look at the code since it ...

I think its a db question since all the tkinter code would seem to work for inserting something like a list or string, but maybe its just how i called the database over

tepid osprey Aug 26, 2023, 11:44 PM

#

coral wasp You should add some print statements so you can see / debug what’s happening. Fo...

yeah I'll try that out, thanks

coral wasp Aug 26, 2023, 11:46 PM

#

tepid osprey yeah I'll try that out, thanks

Well, you should do that in general. But in this case, you're not making a query: you need to query something. Like: c.execute('SELECT * FROM your_table')

tepid osprey Aug 26, 2023, 11:46 PM

#

coral wasp Well, you should do that in general. But in this case, you're not making a query...

Oh so the issue is that I am bringing in everything rather than specifically mentioning the table?

coral wasp Aug 26, 2023, 11:47 PM

#

You’re bringing in nothing

#

You opened a cursor to the db, then did fetchall, but without a query, the cursor returns nothing.

tepid osprey Aug 26, 2023, 11:49 PM

#

coral wasp You opened a cursor to the db, then did fetchall, but without a query, the curso...

ah alright. Thanks for explaining since I only recently started learning how to use databases for data handling while also applying it to a GUI

tepid osprey Aug 26, 2023, 11:52 PM

#

coral wasp Well, you should do that in general. But in this case, you're not making a query...

This seems to fix everything. Thank you again for helping me out and have a nice day/night 🙏

grim vault Aug 27, 2023, 7:08 AM

#

tepid osprey This seems to fix everything. Thank you again for helping me out and have a nice...

Also: .fetchall() will return a list of tuples (or an empty list).

  for data in updated_db:
    data_show.insert('end', f'{data}\n')

So data will be a tuple. I guess you want data[0] in that case.

silver robin Aug 27, 2023, 7:34 AM

#

Hello, has anyone had access setting up a connection to an MS access database? I have seen a few methods but struggling to get anything to work proper in the past.

#

I should mention I am not on windows

wise goblet Aug 27, 2023, 7:53 AM

#

silver robin I should mention I am not on windows

https://gifdb.com/images/high/buffering-confused-thinking-man-meme-5pd0isrj1gc25k53.gif

#

does it mean you are at MacOS, or at Linux, or at Android, or Chrome OS or at something else 😅

silver robin Aug 27, 2023, 7:54 AM

#

Acorn

#

You remember Archimedes machines?

wise goblet Aug 27, 2023, 7:56 AM

#

silver robin Acorn

not recognized

silver robin Aug 27, 2023, 7:56 AM

#

I am on MacOS, but will potentially run on linux, but most importantly it means I dont have access to native MSaccess drivers.

#

I know UCanAccess JDBC works with DBeaver on Mac, but had issues getting this to be recognised in Python, I tried using JayDeBeApi but I dont know if it hit my skill ceiling or if there was an actual issue with thoes drivers.

wise goblet Aug 27, 2023, 8:04 AM

#

silver robin I am on MacOS, but will potentially run on linux, but most importantly it means ...

i can suggest crazy alternative

#

What if u connect your MSaccess to sqlite3 as external source

#

and easily work in Macos/Linux through python with sqlite3?

#

which is natively supported in std libs

#

TLDR: selecting some more open source database engine as common denominator between your two environment

#

Postgresql, MySQL, Mariadb, Sqlite3

#

https://docs.devart.com/odbc/sqlite/access.htm quick googling says, Access supports sqlite3

silver robin Aug 27, 2023, 8:06 AM

#

ok thats interesting, thank you for the suggestion I will investigate

tepid osprey Aug 27, 2023, 1:41 PM

#

grim vault Also: `.fetchall()` will return a `list` of `tuples` (or an empty list). ```py ...

Makes sense, thanks for clarifying. I think the solution for selecting everything from the table is more readable though, so I might just keep using that

pale sinew Aug 27, 2023, 3:19 PM

#

Hello everyone, i have a problem with "RETURNING" stantment in python, I use sqlite3 .3.43 and databases[aiosqlite] databases==0.7.0

Traceback (most recent call last):
  File "/home/yosaki/.local/lib/python3.10/site-packages/discord/app_commands/commands.py", line 827, in _do_call
    return await self._callback(self.binding, interaction, **params)  # type: ignore
  File "/home/yosaki/main/cogs/suggest.py", line 206, in suggest_
    suggest_id = await db.add_suggest(i.user.display_name, consiglio)
  File "/home/yosaki/main/utility/db.py", line 66, in add_suggest
    return await database.execute(query=query, values=SuggestObject(member, 0, 0, suggest=suggest).to_dict())
  File "/home/yosaki/.local/lib/python3.10/site-packages/databases/core.py", line 169, in execute
    return await connection.execute(query, values)
  File "/home/yosaki/.local/lib/python3.10/site-packages/databases/core.py", line 288, in execute
    return await self._connection.execute(built_query)
  File "/home/yosaki/.local/lib/python3.10/site-packages/databases/backends/sqlite.py", line 133, in execute
    await cursor.execute(query_str, args)
  File "/home/yosaki/.local/lib/python3.10/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/home/yosaki/.local/lib/python3.10/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
  File "/home/yosaki/.local/lib/python3.10/site-packages/aiosqlite/core.py", line 133, in _execute
    return await future
  File "/home/yosaki/.local/lib/python3.10/site-packages/aiosqlite/core.py", line 106, in run
    result = function()
sqlite3.OperationalError: near "RETURNING": syntax error

any idea? this error only appear in my remote VPS Ubuntu 20.04.6 LTS x86_64

#

query = "INSERT INTO suggest(member, suggest, positive_vote, negative_vote, status) VALUES (:member, :suggest, :positive_vote, :negative_vote, :status) RETURNING *"
return await database.execute(query=query, values=SuggestObject(member, 0, 0, suggest=suggest).to_dict())

paper flower Aug 27, 2023, 3:21 PM

#

pale sinew ```py query = "INSERT INTO suggest(member, suggest, positive_vote, negative_vote...

I guess returning * is not a thing?

pale sinew Aug 27, 2023, 3:22 PM

#

paper flower I guess returning * is not a thing?

Why? but locally I don't have this problem....

paper flower Aug 27, 2023, 3:31 PM

#

pale sinew Why? but locally I don't have this problem....

Ah, it's supported 🤔

#

Not sure what's wrong then

pearl lodge Aug 27, 2023, 3:40 PM

#

https://paste.pythondiscord.com/X4AA

#

it's creating but not displaying

#

why?

#

it was working normally in Alpha version

#

but then in beta version it's not working like intended

waxen finch Aug 27, 2023, 3:53 PM

#

pale sinew Hello everyone, i have a problem with "RETURNING" stantment in python, I use sql...

is sqlite 3.43.0 the version reported by python itself or something else? could you import sqlite3 and run print(sqlite3.sqlite_version)" to double check?

pale sinew Aug 27, 2023, 3:55 PM

#

waxen finch is sqlite 3.43.0 the version reported by python itself or something else? could ...

I installed sqlite3 not from pip but using the source code, and to manage the dbs and queries I use databases[aiosqlite]

waxen finch Aug 27, 2023, 4:00 PM

#

pale sinew I installed sqlite3 not from pip but using the source code, and to manage the db...

well aiosqlite wraps the built-in sqlite3 library, and that links to some version of sqlite not necessarily the same as what is currently installed (ive had several issues with the wrong sqlite lib being used, and i dont know enough about the linking process to explain why they happened but i did eventually get it working)

pale sinew Aug 27, 2023, 4:00 PM

#

waxen finch well aiosqlite wraps the built-in sqlite3 library, and that links to some versio...

so I have to update it?

waxen finch Aug 27, 2023, 4:01 PM

#

iunno, what was the sqlite version shown by python?

pale sinew Aug 27, 2023, 4:01 PM

#

waxen finch iunno, what was the sqlite version shown by python?

'2.6.0'

waxen finch Aug 27, 2023, 4:02 PM

#

not sqlite3.version, i mean sqlite3.sqlite_version

pale sinew Aug 27, 2023, 4:02 PM

#

waxen finch not `sqlite3.version`, i mean `sqlite3.sqlite_version`

'3.31.1'

#

hm

#

Now the problem is, I uninstalled this version to install 3.43

waxen finch Aug 27, 2023, 4:04 PM

#

you could try recompiling python? might be a bad guess though

pale sinew Aug 27, 2023, 4:05 PM

#

but although, I have no idea where it gets the old version, maybe the sqlite library is compiled based on the one installed and made embed, now I try the recompilation

waxen finch Aug 27, 2023, 4:08 PM

#

some time ago with my raspberrypi, after make installing my sqlite into /usr/local/lib, i had to fidget with /etc/ld.so.conf and sudo ldconfig as i believed the system sqlite was being prioritized over my local install

pale sinew Aug 27, 2023, 4:12 PM

#

waxen finch some time ago with my raspberrypi, after `make install`ing my sqlite into `/usr/...

sqlite3.sqlite_version
still '3.31.1'

#

I don't know what to do really

waxen finch Aug 27, 2023, 4:14 PM

#

how did you upgrade sqlite? where were the .so libraries placed?

pale sinew Aug 27, 2023, 4:15 PM

#

Collecting pysqlite
Downloading pysqlite-2.8.3.tar.gz (80 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80.6/80.6 kB 10.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
pysqlite is not supported on Python 3. When using Python 3, use the sqlite3 module from the standard library.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details. @waxen finch

pale sinew Aug 27, 2023, 4:16 PM

#

waxen finch how did you upgrade sqlite? where were the .so libraries placed?

I completely reinstalled sqlite and python

waxen finch Aug 27, 2023, 4:18 PM

#

but how did you install sqlite? are you talking about that pysqlite module? if so, thats the wrapper which links to the sqlite program and is not sqlite itself, plus its already built into python 3, hence the error

pale sinew Aug 27, 2023, 4:19 PM

#

waxen finch but how did you install sqlite? are you talking about that pysqlite module? if s...

I'm talking about both

#

sqlite3 system and python sqlite

#

only that system sqlite is up to date, python's is not

waxen finch Aug 27, 2023, 4:22 PM

#

i checked the ubuntu package index, 20.04 only comes with sqlite 3.31.1 (see libsqlite3-0)

#

personally i would compile sqlite 3.43.0 from source using the autoconf distribution

#

unpack it, enter the source directory, ./configure then sudo make install

pale sinew Aug 27, 2023, 4:27 PM

#

already done nothing changes, I will update the whole VPS at this point.

waxen finch Aug 27, 2023, 4:29 PM

#

if you want to then sure, jammy 22.04 has sqlite 3.37.2 and the minimum version for RETURNING syntax is 3.35

noble oar Aug 27, 2023, 6:12 PM

#

what db do you recommend for python?

carmine tiger Aug 27, 2023, 6:31 PM

#

anyone expert on peewee please message me im so stuck and noone seems to be able to help

wise goblet Aug 27, 2023, 6:33 PM

#

noble oar what db do you recommend for python?

for backend? Postgresql for is very good default for real work

#

Sqlite3 if it is just pet project level

pale sinew Aug 27, 2023, 7:01 PM

#

waxen finch if you want to then sure, jammy 22.04 has sqlite 3.37.2 and the minimum version ...

Sorry for the ping, I solved it, I just forgot to run ./configure again after upgrading sqlite3

ember badge Aug 27, 2023, 9:28 PM

#

Hello! I have a question. I have an algorithm that gathers data on financial performance, keeps it and yields it to an AI. However, im facing problems feeding the ai bc the amount of data is too large. To solve this, I am trying with langchain models, but i dont know which one and how to use it. Does anyone know about it ? If not, do you know any alternatives?
langchain agents*

bold copper Aug 28, 2023, 12:00 AM

#

ember badge Hello! I have a question. I have an algorithm that gathers data on financial per...

You may be better off asking in #data-science-and-ml

rich violet Aug 28, 2023, 9:24 AM

#

Hi everyone, i am trying for a scholar project to do a multi-tenancy application, i did a quite "good" architecture for my backend following the MVC structure (Model, Vue, Controller) and also DAOS to access the data to avoid having this in the model part.

I am now trying to migrate my tables from a single tenant structure to have a 1 PostgreSQL schema per "user" so he can have all his tables for himself.

My project is in FastApi, SqlModel and PostgreSQL. I need some help because i can't figure out a way to create schemas directly and i would love some help for this one, would be very thankful, thanks in advance!

wise goblet Aug 28, 2023, 9:35 AM

#

rich violet Hi everyone, i am trying for a scholar project to do a multi-tenancy application...

Model, Vue, Controller.
That is wild

wise goblet Aug 28, 2023, 9:36 AM

#

rich violet Hi everyone, i am trying for a scholar project to do a multi-tenancy application...

I recommend using some library meant for migrations. Alembic is meant to integrate with SqlAlchemy, may be it will work in standalone way too

#

Django ORM could be abused for library migrating purposes. Just using it's migrating module

#

I would recommend probably Django ORM if u will not find thing with better integration. It has quite comfy migrating paths

wise goblet Aug 28, 2023, 9:40 AM

#

rich violet Hi everyone, i am trying for a scholar project to do a multi-tenancy application...

https://github.com/tiangolo/sqlmodel/issues/85
People wrote guide and examples how to do that with Alembic for SQLModel

GitHub

How to run a database migration when using SQLModel (such as Alembi...

First Check I added a very descriptive title to this issue. I used the GitHub search to find a similar issue and didn't find it. I searched the SQLModel documentation, with the integrated searc...

#

I googled it for you 😂

hollow oar Aug 28, 2023, 11:02 AM

#

https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
have a read here.
you want BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW probably

turbid quarry Aug 28, 2023, 11:10 AM

#

Wich datebase here use?

hollow oar Aug 28, 2023, 11:32 AM

#

correct.

if you want to optimise, imo you should always refer to the query planner's plan.

hollow oar Aug 28, 2023, 11:50 AM

#

potentially using recursive CTE if you want to short circuit it.

this is just a toy problem so i wouldn't have an issue with writing something like that, but if it's for work then i would really think twice and see if it's absolutely necessary.

also short circuiting it wouldn't necessarily mean it's faster in practice.

harsh pulsar Aug 28, 2023, 1:08 PM

#

fwiw i think a "sufficiently smart" query optimizer should be able to recognize this 🙂

#

i don't know if any real optimizers actually do that

#

i wouldn't be surprised if some olap-oriented databases implemented optimizations like this

hexed estuary Aug 28, 2023, 1:44 PM

#

harsh pulsar i wouldn't be surprised if some olap-oriented databases implemented optimization...

try it in duckdb, I guess :p

coral wasp Aug 28, 2023, 2:09 PM

#

(did someone summon me?)

#

Oh, it's a leetcode, can't use duckdb 😦

opal holly Aug 28, 2023, 5:23 PM

#

Having clause for speed

#

The DBA always gets mad at me when I do stuff like this without a having

opal holly Aug 28, 2023, 5:42 PM

#

I'm not proposing a code diff I'm answering the question of " how would I stop computing weights after hitting sum>1000" from ur first post.

#

Join on a.turn>=b.turn

#

Group by a.turn

#

Having sum(b.weight)<1000

#

Ur DBA may get mad that it's a quarter of the way to a cross join but he will be happy about the having clause and probaly overlook it

#

I don't know if it's better I'm just answering the question about how to quit counting after sum>1000

#

Write your MySQL query statement below

SELECT person_name FROM Queue
WHERE turn = (
SELECT MAX(turn) as turn FROM
(
SELECT q1.turn
FROM Queue q1 JOIN QUEUE q2
ON q1.turn >= q2.turn
GROUP BY q1.turn
HAVING SUM(q2.weight) <= 1000
) as table1
);

#

But there's the full working query if u wanna play around with it

#

And as for the prose " hello my name is Doug,
I don't use recursion,
Cuz it makes me need Tylenol;
Tylenol is bad for the liver
Having clause make ur DBA very very happy,
Greetings traveler "

coral wasp Aug 28, 2023, 6:55 PM

#

opal holly Having sum(b.weight)<1000

I doubt this would be optimized out, but who knows

opal holly Aug 28, 2023, 6:55 PM

#

Me neither it's a quarter of a cross join

#

Cursed join

valid sentinel Aug 28, 2023, 7:09 PM

#

Is there not just a general chat or am I just overlooking that channel

coral wasp Aug 28, 2023, 7:10 PM

#

valid sentinel Is there not just a general chat or am I just overlooking that channel

In lobby, discussion

harsh pulsar Aug 28, 2023, 9:06 PM

#

valid sentinel Is there not just a general chat or am I just overlooking that channel

#python-discussion

coral wasp Aug 29, 2023, 2:08 PM

#

I think the idea was whether a database could shortcut a cumulative sum operation once it reached a certain threshold. I don't know of any that would. The recursive CTE would shortcut the evaluation, but I doubt a window function would/could be shortcutted.

verbal bridge Aug 29, 2023, 2:09 PM

#

i use notes as my database

coral wasp Aug 29, 2023, 2:09 PM

#

If the cost of the cumulative calculation is prohibitively expensive, perhaps employ a search strategy to narrow the search space? @brazen lava

harsh pulsar Aug 29, 2023, 2:54 PM

#

coral wasp I think the idea was whether a database could shortcut a cumulative sum operatio...

part of what i wonder about is whether the overhead of the CTE version is so great that it outweighs the "short-circuiting" benefit

#

e.g. you see this in pandas code a lot, where the algorithmically-efficient code is not any faster than, or significantly slower than, the inefficient but vectorized version

coral wasp Aug 29, 2023, 2:55 PM

#

harsh pulsar part of what i wonder about is whether the overhead of the CTE version is so gre...

Recursive ctes are really iterative, there’s little room for optimization

harsh pulsar Aug 29, 2023, 2:55 PM

#

i guess it depends a lot on what the query optimizer can do

harsh pulsar Aug 29, 2023, 2:55 PM

#

coral wasp Recursive ctes are really iterative, there’s little room for optimization

right, but i don't know if internally they take a slower code path

coral wasp Aug 29, 2023, 3:15 PM

#

harsh pulsar e.g. you see this in pandas code a lot, where the algorithmically-efficient code...

Oh sorry, misread. Yah, agree.

coral wasp Aug 29, 2023, 4:40 PM

#

Oh, I'd perhaps accumulate just from the first 1million rows, and if I haven't reached 10k, use the max cumulative and get the next million, and so on.

#

Nah, just separate queries

#

I don't know if I could do it in a recursive CTE: accumulating the next 1 million rows. I guess depends on whether the engine allows a dynamic limit/offset clause.

#

You'd check the last row of each search

#

import duckdb
target = 100000
last = 0
iteration = 0
batchsize = 10000
with duckdb.connect() as con:
    results = []
    con.execute("create table abc as select * from range(10000000) t(i)")
    while last < target: 
        df = con.execute(f"select i, sum(i) over (order by i) as cumsum from abc limit {batchsize * (iteration+1)} offset {batchsize * iteration}").df()
        results.append(df)
        last = df.iloc[-1, -1]
        iteration+=1

print(df.tail())

coral wasp Aug 29, 2023, 5:11 PM

#

Yah, I don't think this approach can be used in a recursive cte... at least not in duckdb (limits/offset aren't allowed in recursive ctes). I could probably hack something together that used the max value from the last iteration and got the next n rows, but I can't imagine that performing very well

fading owl Aug 29, 2023, 9:12 PM

#

Good day guys it's my first time here!

My question is can someone suggest to me a video tutorial or a step by step guide on how to host twisted python to internet to work in real life not in local host

Thanks in advance

lunar cargo Aug 29, 2023, 9:15 PM

#

Hey guys, Im struggling with my database conception

player_close = sqlalchemy.Table('player-close', SqlAlchemyBase.metadata,
                                Column('player_id', Integer, ForeignKey('player.id')),
                                Column('close_id', Integer, ForeignKey('close.id'))
                                )


class Player(SqlAlchemyBase, SerializerMixin):
    __tablename__ = 'player'
    id = Column(Integer, primary_key=True, autoincrement=True)
    coins = Column(Integer, nullable=True)


class Close(SqlAlchemyBase, SerializerMixin):
    __tablename__ = 'close'
    id = Column(Integer, primary_key=True, autoincrement=True)
    winner = Column(Boolean)
    team1 = relationship(Player, secondary=player_close, backref='closes')
    team2 = relationship(Player, secondary=player_close, backref='closes')
    timestamp = Column(DateTime)

#

I want to connect two column to the same backref

#

but getting property of that name exists

paper flower Aug 29, 2023, 9:16 PM

#

lunar cargo I want to connect two column to the same backref

I don't think that's possible

#

But maybe I'm wrong : 🤔

#

You can make a third relationship with custom join

lunar cargo Aug 29, 2023, 9:18 PM

#

?

#

Im little bit suck at sql

#

could you describe more

paper flower Aug 29, 2023, 9:19 PM

#

Do you want that relationship to be read only btw?

lunar cargo Aug 29, 2023, 9:19 PM

#

read and write

#

I was thinking about creating third table Team with just 5 columns of player and connect to Closers table

#

and dont flex on it

#

like this

#

but it look mess to be honest

paper flower Aug 29, 2023, 9:23 PM

#

1 sec

lunar cargo Aug 29, 2023, 9:24 PM

#

paper flower 1 sec

check dm

paper flower Aug 29, 2023, 9:28 PM

#

lunar cargo like this

Maybe you can use a schema like this?

lunar cargo Aug 29, 2023, 9:28 PM

#

paper flower Maybe you can use a schema like this?

oh yeah

#

never thought about it...

paper flower Aug 29, 2023, 9:29 PM

#

https://docs.sqlalchemy.org/en/20/orm/basic_relationships.html#association-object

lunar cargo Aug 29, 2023, 9:29 PM

#

thanks mate

frosty mirage Aug 30, 2023, 12:45 AM

#

To paraphrase Jamie Zawinski: “Programmer has database problem. Thinks: ‘I know, I’ll use an ORM.’ Now programmer has two problems.”

paper flower Aug 30, 2023, 2:14 AM

#

frosty mirage To paraphrase Jamie Zawinski: “Programmer has database problem. Thinks: ‘I know,...

ORM is a tool, if you can't use it well - don't

shut tiger Aug 30, 2023, 5:21 AM

#

Programmer has a problem, he thinks "I'll build SQL stringly", now he has three problems.

frosty mirage Aug 30, 2023, 5:59 AM

#

shut tiger Programmer has a problem, he thinks "I'll build SQL stringly", now he has three ...

SQL is just another language. You learn to use it properly, just like you learn to use Python or JavaScript or HTML/CSS properly.

paper flower Aug 30, 2023, 5:59 AM

#

frosty mirage SQL is just another language. You learn to use it properly, just like you learn ...

What's the problem with orms?

shut tiger Aug 30, 2023, 6:05 AM

#

frosty mirage SQL is just another language. You learn to use it properly, just like you learn ...

That was a total non-sequitur

keen minnow Aug 30, 2023, 6:08 AM

#

shut tiger That was a total non-sequitur

it may help to also state what you think is the right approach to prevent common issues with people using SQL.

For instance relying on parametrized queries, ORMs and tooling (lint, etc.)

shut tiger Aug 30, 2023, 6:57 AM

#

keen minnow it may help to also state what you think is the right approach to prevent common...

Yes. I did assume Lawrence actually knows about all that.

frosty mirage Aug 30, 2023, 7:26 AM

#

Show me an ORM that handles this:

    def escape_sql_wild(s, escch) :
        "escapes SQL pattern wildcards in s with escch. The same escch" \
        "  needs to be passed to the ESCAPE clause for the LIKE" \
        " operator."
        if not isinstance(s, str) :
            raise TypeError("expecting s to be a string")
        #end if
        if not isinstance(escch, str) or len(escch) != 1 :
            raise TypeError("expecting escch to be a single-character string")
        #end if
        result = []
        for ch in s :
            if ch == escch or ch == "%" or ch == "_" :
                result.append(escch)
            #end if
            result.append(ch)
        #end for
        return "".join(result)
    #end escape_sql_wild

shut tiger Aug 30, 2023, 8:54 AM

#

frosty mirage Show me an ORM that handles this: ``` def escape_sql_wild(s, escch) : ...

Handles what? Show the problem, not the implementation of the supposed solution.

#

(And jesus, you really comment #end at the end of all the blocks?)

steady creek Aug 30, 2023, 12:37 PM

#

Question, does API is in the cloud not in your 'database'? Am I understanding it right? And if anyone knows your API Key, anyone can change it?

I don't know if this is related.

shut tiger Aug 30, 2023, 1:26 PM

#

steady creek Question, does API is in the cloud not in your 'database'? Am I understanding it...

Sorry but that made no sense

paper flower Aug 30, 2023, 1:31 PM

#

frosty mirage Show me an ORM that handles this: ``` def escape_sql_wild(s, escch) : ...

from sqlalchemy import String, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import DeclarativeBase, mapped_column, Mapped


class Base(DeclarativeBase):
    pass


class Book(Base):
    __tablename__ = "book"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(255), unique=True)


stmt = select(Book).where(Book.title.ilike("a;[]\\'/\""))
print(stmt.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))

And you can pass escape parameter to ilike

#

SELECT book.id, book.title 
FROM book 
WHERE book.title ILIKE 'a;[]\\''/"'

shut tiger Aug 30, 2023, 4:16 PM

#

He's asking if an ORM can do escaping properly? I hope not.

paper flower Aug 30, 2023, 4:57 PM

#

shut tiger He's asking if an ORM can do escaping properly? I hope not.

🙃

quartz lava Aug 30, 2023, 5:13 PM

#

guys maybe this is too much to ask for but can anyone write a program for me to make a database for online flight reservation using mysql and tkinter(I need a UI for it) if possible maybe by tomorrow before 4:30 AM GMT?

paper flower Aug 30, 2023, 5:19 PM

#

quartz lava guys maybe this is too much to ask for but can anyone write a program for me to ...

Sorry, but nobody would do that for free, also with such a tight deadline
Also this server is not a place for paid requests

quartz lava Aug 30, 2023, 5:20 PM

#

paper flower Sorry, but nobody would do that for free, also with such a tight deadline Also t...

I understand. thx for responding

paper flower Aug 30, 2023, 5:21 PM

#

People would be more than happy to help you with your code, but they won't code for you

frosty mirage Aug 30, 2023, 9:20 PM

#

paper flower ```py from sqlalchemy import String, select from sqlalchemy.dialects import post...

Umm, so where exactly do you supply the user-specified string to match?

paper flower Aug 30, 2023, 9:28 PM

#

frosty mirage Umm, so where exactly do you supply the user-specified string to match?

What do you mean?

frosty mirage Aug 30, 2023, 9:30 PM

#

paper flower What do you mean?

You’ll note that the function I posted took arguments, to be supplied by the caller. Your code does not.

paper flower Aug 30, 2023, 9:30 PM

#

frosty mirage You’ll note that the function I posted took arguments, to be supplied by the cal...

...

#

Just pass it into ilike

frosty mirage Aug 30, 2023, 9:32 PM

#

paper flower ...

OK, here’s the problem: I want to do a partial match on what the user enters. So I will use their input in, say “like '%input%'”. But I want the user input to always match literally, even if it contains “%” and "_” characters. That’s what my escaping function achieves. How do you do it with an ORM?

paper flower Aug 30, 2023, 9:35 PM

#

It doesn't sound like a db problem at all

#

Just a part of your business logic

coral wasp Aug 30, 2023, 9:36 PM

#

frosty mirage OK, here’s the problem: I want to do a partial match on what the user enters. So...

Oh, sorry, gotcha...

frosty mirage Aug 30, 2023, 9:36 PM

#

coral wasp Oh, sorry, gotcha...

Like I said: “partial match”.

frosty mirage Aug 30, 2023, 9:36 PM

#

paper flower It doesn't sound like a db problem at all

In other words, an ORM is no help. QED.

coral wasp Aug 30, 2023, 9:36 PM

#

? You'd escape and still use an ORM for a partial match.

paper flower Aug 30, 2023, 9:36 PM

#

frosty mirage In other words, an ORM is no help. QED.

Becauese that's not what orm does 😂

coral wasp Aug 30, 2023, 9:36 PM

#

Don't see why this is the ORMs problem.

frosty mirage Aug 30, 2023, 9:37 PM

#

coral wasp Don't see why this is the ORMs problem.

So what is the ORM’s problem?

coral wasp Aug 30, 2023, 9:37 PM

#

frosty mirage So what _is_ the ORM’s problem?

Are you asking how you'd do a partial match on the exact user input?

frosty mirage Aug 30, 2023, 9:37 PM

#

coral wasp Are you asking how you'd do a partial match on the exact user input?

Yeah.

coral wasp Aug 30, 2023, 9:37 PM

#

What database? PG?

frosty mirage Aug 30, 2023, 9:37 PM

#

coral wasp What database? PG?

Standard SQL.

coral wasp Aug 30, 2023, 9:37 PM

#

msft sql server?

frosty mirage Aug 30, 2023, 9:38 PM

#

coral wasp msft sql server?

You’ll note that I did not use any DBMS-specific SQL syntax, just standard SQL.

paper flower Aug 30, 2023, 9:38 PM

#

frosty mirage So what _is_ the ORM’s problem?

If you don't want your user to fiddle with your query in some way (sql injections aren't possible btw) you could implement a functon like this: https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/functions/database.html#escape_like

frosty mirage Aug 30, 2023, 9:40 PM

#

paper flower If you don't want your user to fiddle with your query in some way (sql injection...

If the user types “%”, and it happens to match an arbitrary substring instead of a literal “%”, then that’s a bug, whether you class it as “SQL injection” or not. Your ORM is of no help here.

paper flower Aug 30, 2023, 9:40 PM

#

frosty mirage If the user types “%”, and it happens to match an arbitrary substring instead of...

Again, it's not an orm problem

#

Who hurt you?

frosty mirage Aug 30, 2023, 9:40 PM

#

paper flower Again, it's not an orm problem

Fine. That’s conceding my point, that ORMs are useless for solving this sort of problem.

paper flower Aug 30, 2023, 9:41 PM

#

frosty mirage Fine. That’s conceding my point, that ORMs are useless for solving this sort of ...

Yeah, solving your specific problem

coral wasp Aug 30, 2023, 9:42 PM

#

I'm not even sure, it's just: replace userinput's wildcard variables (% and _), append a '%' then pass that as a parameter to the like function.

paper flower Aug 30, 2023, 9:42 PM

#

def escape(text: str, escape_char: str) -> str:
    return (
        text
        .replace(escape_char, escape_char * 2)
        .replace('%', f"{escape_char}%")
        .replace("_", f"{escape_char}_")
    )


user_input = "a/b%c"
stmt = select(Book).where(Book.title.ilike("%{}%".format(escape(user_input, "/")), escape="/"))
print(stmt.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))

SELECT book.id, book.title 
FROM book 
WHERE book.title ILIKE '%%a//b/%%c%%' ESCAPE '/'

coral wasp Aug 30, 2023, 9:42 PM

#

hah, yes, that

#

It's still parameterized, even though there's some f-strings involved.

paper flower Aug 30, 2023, 9:54 PM

#

coral wasp hah, yes, that

I don't know how I would have made anything remotely complex without an orm/query builder

#

But "orm bad"

coral wasp Aug 30, 2023, 9:56 PM

#

I hate ORMs... they almost always made the complex stuff harder.

#

But, they made the easy/medium stuff super easy

paper flower Aug 30, 2023, 9:57 PM

#

coral wasp I hate ORMs... they almost always made the complex stuff harder.

It depends on the orm

coral wasp Aug 30, 2023, 9:57 PM

#

but I'm a db person, so that didn't slow me down... I just use SQL for what SQL is good at.

paper flower Aug 30, 2023, 9:57 PM

#

What orms did you try? pithink

coral wasp Aug 30, 2023, 9:57 PM

#

Oh, this is my past Java life... so mostly hibernate

paper flower Aug 30, 2023, 9:57 PM

#

You should take a look at sqlalchemy

#

Pretty much everything you can write in sql you can write in it

#

e.g. subqueries, ctes, aggregates, window functions, etc

coral wasp Aug 30, 2023, 9:59 PM

#

Well, I do mostly analytic stuff, so there's no reason for an ORM

#

And, I'd rather write in SQL than in chained pythonic APIs

#

A single declarative statement is far more readable and debuggable, imo.

#

but I know people will fight me over that

paper flower Aug 30, 2023, 10:00 PM

#

coral wasp Well, I do mostly analytic stuff, so there's no reason for an ORM

Personally if I interact with DB from code I would use orm/query builder in 99% of cases

paper flower Aug 30, 2023, 10:00 PM

#

coral wasp A single declarative statement is far more readable and debuggable, imo.

sqlalchemy is almost exactly like sql, but in python syntax, I don't think there's much to debug 🤔

#

stmt = (
    select(Model.name, func.count(Model.id)
    .where(Model.name.ilike("a%"))
    .group_by(Model.name)
)

I think that's pretty similar to how SQL is written

coral wasp Aug 30, 2023, 10:02 PM

#

There's dozens of APIs like that, which is part of the problem. It's yet another thing to learn, when all I need to know is SQL.

paper flower Aug 30, 2023, 10:03 PM

#

coral wasp There's dozens of APIs like that, which is part of the problem. It's yet another...

Yeah, in case of analytical queries it might now matter as much, but when you need to dynamically create queries, maintain them, etc. it's much easier 🤔

coral wasp Aug 30, 2023, 10:03 PM

#

That's where dbt comes in

paper flower Aug 30, 2023, 10:03 PM

#

What's dbt?

paper flower Aug 30, 2023, 10:03 PM

#

paper flower Yeah, in case of analytical queries it might now matter as much, but when you ne...

E.g. I change/delete a model field, I would immediatelly get an error from my linter

coral wasp Aug 30, 2023, 10:04 PM

#

https://github.com/dbt-labs/dbt-core

GitHub

GitHub - dbt-labs/dbt-core: dbt enables data analysts and engineers...

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. - GitHub - dbt-labs/dbt-core: dbt enables data analysts a...

paper flower Aug 30, 2023, 10:05 PM

#

Ah, that's not really would fit a usecase of a generic app that needs a DB imo

#

E.g. some kind of web application

coral wasp Aug 30, 2023, 10:07 PM

#

no, not at all

glacial current Aug 31, 2023, 1:27 AM

#

coral wasp Oh, this is my past Java life... so mostly hibernate

Ah, yeh i had worked with someone that used hibernate that crushed the MSSQL server. this was because of threading and although i think that they just made a mistake, it was compounded by java philosophy that java is all you need to know. the actual result set should have been 100K rows. So i think you are more aware of a naked query result set than an ORM result set.

#

btw that was an interesting discussion you guys had and i agree with Billy that its sometimes simpler to bypass the ORM layer.

#

but ORM is nice too at times for the simple and medium stuff Billy said it well

harsh pulsar Aug 31, 2023, 1:50 AM

#

coral wasp https://github.com/dbt-labs/dbt-core

fwiw dbt doesn't actually check if you change model fields, it doesn't parse your sql at all and is surprisingly "dumb"

#

also yeah their attitude to jinja templating is a little... cavalier

#

you'd think they'd at least offer a quote jinja macro, but nope. '{{ foo }}'

coral wasp Aug 31, 2023, 1:51 AM

#

It’s data engineering, everything is cavalier:) a lot of duct tape and glue

harsh pulsar Aug 31, 2023, 1:51 AM

#

it doesn't need to be. it's that way because all the tooling is designed that way

#

there's no reason dbt couldn't at least try to make inserting strings into sql slightly safe

#

it works because 99% of the time users are well-behaved

mild crow Aug 31, 2023, 1:52 AM

#

hey @harsh pulsar i know you think i don't know anything or perhaps should learn a bit more, if you dpon't mind ? would you see this https://paste.pythondiscord.com/QLHA

harsh pulsar Aug 31, 2023, 1:52 AM

#

it's all the honor system

harsh pulsar Aug 31, 2023, 1:52 AM

#

mild crow hey <@389497659087650836> i know you think i don't know anything or perhaps shou...

i actually meant to look at this, i got busy with work

mild crow Aug 31, 2023, 1:53 AM

#

thank you? its ok im not tripping but i do understand better then most, when you do have time, only because you are the only person that understand it.

harsh pulsar Aug 31, 2023, 1:54 AM

#

let's move this back to #data-science-and-ml

mild crow Aug 31, 2023, 1:54 AM

#

sure

opal holly Aug 31, 2023, 3:04 AM

#

coral wasp but I'm a db person, so that didn't slow me down... I just use SQL for what SQL ...

I'm a business intelligence person so I just use SQL for what SQL is bad at. I'm just in it for the witchcraft

shut tiger Aug 31, 2023, 6:36 AM

#

coral wasp Oh, this is my past Java life... so mostly hibernate

It's fun that every time someone says they hate ORMs it's somehow always Hibernate they've worked with.

#

And when people like ORMs they've never used Hibernate :P

tame parcel Aug 31, 2023, 6:57 AM

#

so i want to take data which is saved in database and show it on frontend

#

but this not working

#

import { useState, useEffect } from 'react';
import { Container, Table } from 'react-bootstrap';

export default function Admintable() {
  const [ordersData, setOrdersData] = useState([]);

  useEffect(() => {
    const fetchData = async () => {
      const response = await fetch('/api/admintable');
      const data = await response.json();
      setOrdersData(data);
    };
    fetchData();
  }, []);

  return (
    <Container className=" container-fluid d-flex justify-content-center align-items-center">
      <div className='m-5'>
        <Table responsive>
          <thead>
            <tr>
              <th>Email</th>
              <th>Price</th>
              <th>Date</th>
            </tr>
          </thead>
          <tbody>
            {ordersData.map(order => (
              <tr key={order.email}>
                <td>{order.email}</td>
                <td>{order.totalprice}tk</td>
                <td>{order.date}</td>
              </tr>
            ))}
          </tbody>
        </Table>
      </div>
    </Container>
  );
}

shut tiger Aug 31, 2023, 6:59 AM

#

tame parcel but this not working

"Doesn't work" is not an error description

#

also, this is #databases but that looks like #web-development

tame parcel Aug 31, 2023, 6:59 AM

#

router.post('/admintable', async (req, res) => {
try {
// Attempt to save a new Admintable object to the database
let { email, totalprice, date } = req.body;
const newAdmin = new Admintable({ email, totalprice, date });
await newAdmin.save();
res.json({ message: "Data saved successfully" });
} catch (error) {
// If an error occurs, send a 500 (Internal Server Error) response to the client
res.status(500).send("Server Error");
}
});

tame parcel Aug 31, 2023, 7:01 AM

#

shut tiger "Doesn't work" is not an error description

Failed to load resource: the server responded with a status of 404 (Not Found)
admin:1 Uncaught (in promise) SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

shut tiger Aug 31, 2023, 7:03 AM

#

tame parcel Failed to load resource: the server responded with a status of 404 (Not Found) a...

again: move this to #web-development

paper otter Aug 31, 2023, 8:28 AM

#

hi ppl

hollow oar Aug 31, 2023, 8:35 AM

#

harsh pulsar fwiw dbt doesn't actually check if you change model fields, it doesn't parse you...

fwiw dbt doesn't actually check if you change model fields, it doesn't parse your sql at all and is surprisingly "dumb"
what does "change model fields" entail?

also yeah their attitude to jinja templating is a little... cavalier
ha yes indeed.

'{{ foo }}'
ha - personal taste i guess, i am completely fine with this.

floral basin Aug 31, 2023, 2:26 PM

#

(Windows)Can someone help? I'm trying install mysqlclient for my Django project but error pop up:

Failed to build mysqlclient
ERROR: Could not build wheels for mysqlclient, which is required to install pyproject.toml-based projects

harsh pulsar Aug 31, 2023, 2:33 PM

#

hollow oar > fwiw dbt doesn't actually check if you change model fields, it doesn't parse y...

by "change" i meant adding, removing, or renaming a column

#

at least as far as i know?

hollow oar Aug 31, 2023, 2:46 PM

#

harsh pulsar by "change" i meant adding, removing, or renaming a column

hmm yeah, the default behaviour is to ignore that.
that's an artifact of backward compat i guess - there are some documentation on this matter for incremental models at least (that's the model that matters the most in this case imo) https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change

fading river Aug 31, 2023, 7:19 PM

#

Question: Is there a way to use a well supported package to use schema definitions with mongodb with asyncio? Currently I don't think there is. The way I got to that conclusion was that mongoengine seems to be the standard way to do schemas for mongo, but mongoengine doesn't work with the motor package which provides mongodb compatibility with asyncio. there is a package called motorengine to get the same functionality. I looked up this package and it only supports Tornado which was the original use case of Motor before it was ported to asyncio. There is another package called AIOMotorEngine which has only 27 stars on GitHub and proudly states that it was ported from motorengine and that "All tests are rewritten too and seem to pass." this is pretty far from well supported in my view, and since this is for work, I want to use packages that I know have been extensively used and tested.

#

This seems a bit weird to me though because schemas, mongodb, and asyncio are all extremely popular and it seems like this combination would be better supported, so maybe there is an obvious approach that I'm missing.

brazen charm Aug 31, 2023, 7:40 PM

#

fading river Question: Is there a way to use a well supported package to use schema definitio...

Realistically I think if you want schemas with mongo... You should just use postgres

#

A large reason why nothing exists is because it's not really common for businesses and people who have the time to write those libraries to use mongo when they need a schema

torn sphinx Aug 31, 2023, 9:20 PM

#

hey i have got a question, should i store id's as bigint type or rather string and cast it to int in python?

model Guild {
  id             BigInt  @id @unique
  welcomeMessage String?
  welcomeChannel BigInt?
  leaveMessage   String?
  leaveChannel   BigInt?
}

#

its postgres btw

waxen finch Aug 31, 2023, 9:37 PM

#

torn sphinx hey i have got a question, should i store id's as bigint type or rather string a...

by IDs you specifically mean discord snowflakes? if so bigints seem to be pretty common, despite being signed integers

torn sphinx Aug 31, 2023, 9:39 PM

#

waxen finch by IDs you specifically mean discord snowflakes? if so bigints seem to be pretty...

yeah i mean snowflakes

waxen finch Aug 31, 2023, 9:39 PM

#

fwiw 2^63-1 converted to a discord timestamp would be <t:3619093655>, or <t:3619093655:R>, so i probably wouldnt worry about it overflowing yet

torn sphinx Aug 31, 2023, 9:40 PM

#

is bigint 2^63?

waxen finch Aug 31, 2023, 9:42 PM

#

the min value is -2^63, then another spot for 0, and the rest is 2^63-1
wiki: https://en.wikipedia.org/wiki/Integer_(computer_science)
psql numeric type ranges: https://www.postgresql.org/docs/15/datatype-numeric.html

torn sphinx Aug 31, 2023, 9:42 PM

#

yeah

harsh pulsar Aug 31, 2023, 10:34 PM

#

hollow oar hmm yeah, the default behaviour is to ignore that. that's an artifact of backwar...

i see, im still learning all the more "advanced" features

harsh pulsar Aug 31, 2023, 10:34 PM

#

fading river Question: Is there a way to use a well supported package to use schema definitio...

isn't that what odmantic is?

#

it's an "ODM", like sqlalchemy for mongodb, using motor for mongodb interaction and pydantic to define types and validate inputs/outputs

#

i've used it before, it's not bad at all

#

https://art049.github.io/odmantic/

ODMantic

AsyncIO MongoDB ODM (Object Document Mapper) using python type hinting

fading river Aug 31, 2023, 11:24 PM

#

harsh pulsar it's an "ODM", like sqlalchemy for mongodb, using motor for mongodb interaction ...

I'm already using pydantic in this codebase for API interaction, so I will definitely look at this, thank you!

broken charm Sep 1, 2023, 9:35 AM

#

Hi does anyone knows if there is a database with company wise production data for agriculture sector (crops livestock dairy)? Preferably europe

fading patrol Sep 1, 2023, 1:11 PM

#

broken charm Hi does anyone knows if there is a database with company wise production data fo...

At the individual company level? If such data exists I assume it will be expensive. If the data on Eurostat isn't detailed enough you can try https://www.reddit.com/r/datasets/

r/datasets

pale canyon Sep 1, 2023, 1:23 PM

#

Ive created a login and registration system for a project but im unable to get a working database using SQLite, can anybody help me with this please?

shut tiger Sep 1, 2023, 1:36 PM

#

pale canyon Ive created a login and registration system for a project but im unable to get a...

Just post the problem you are having. Don't ask to ask.

fading patrol Sep 1, 2023, 1:44 PM

#

pale canyon Ive created a login and registration system for a project but im unable to get a...

#❓｜how-to-get-help

broken charm Sep 1, 2023, 3:44 PM

#

fading patrol At the individual company level? If such data exists I assume it will be expensi...

Ill look into eurostats

torn sphinx Sep 1, 2023, 4:55 PM

#

async def run():
    conn = await psy.connect(user='postgres', password='root',
                                 database='postgres', host='localhost')
    #values = await conn.fetch(
    #    'SELECT * FROM mytable WHERE id = $1',
    #    10,
    #)
    await conn.execute("CREATE TABLE IF NOT EXISTS test1(id SERIAL PRIMARY KEY, number INT)")
    await conn.execute("CREATE TABLE IF NOT EXISTS test2(id SERIAL PRIMARY KEY, number INT)")
    await conn.execute("CREATE TABLE IF NOT EXISTS test3(id SERIAL PRIMARY KEY, number INT)")
    await add(conn)

async def add(co):
    await asyncio.create_task(a1(co))
    await asyncio.create_task(a2(co))
    await asyncio.create_task(a3(co))

async def a1(con):
    while True: # test1
        n = random.randint(100000, 199999)
        await con.execute(f"INSERT INTO test1 (number) VALUES ({n})")

async def a2(con):
    while True: # test2
        n = random.randint(200000, 299999)
        await con.execute(f"INSERT INTO test1 (number) VALUES ({n})")
async def a3(con):
    while True: # test3
        n = random.randint(300000, 399999)
        await con.execute(f"INSERT INTO test1 (number) VALUES ({n})")```

How come this isn't running all three of them at once, it's only doing the first one. (AKA a1)

## async pg

brazen charm Sep 1, 2023, 5:12 PM

#

it should run them concurrently, but your conn object might be holding a mutex or the database itself is limiting the concurrency if you are not seeing the values inserting

shut tiger Sep 1, 2023, 5:18 PM

#

I wonder if anything is awaiting

paper flower Sep 1, 2023, 5:42 PM

#

brazen charm it should run them concurrently, but your `conn` object might be holding a mutex...

If I'm not wrong postgres connections don't support concurrent operations

#

@torn sphinx Also you're not running your coroutines concurrently

grim vault Sep 1, 2023, 5:44 PM

#

And shouldn't a2() use test2 as table (same for 3)?

paper flower Sep 1, 2023, 5:45 PM

#

tasks = [a1(co), a2(co), a3(co)]
await asyncio.gather(*tasks)

Or in python 3.11:

async with asyncio.TaskGroup() as tg:
    for fn in (a1, a2, a3):
        tg.create_task(fn(co))

brazen charm Sep 1, 2023, 5:50 PM

#

oh yeah i forgot about the immediate await lol

torn sphinx Sep 1, 2023, 6:03 PM

#

paper flower <@456226577798135808> Also you're not running your coroutines concurrently

Ah so I need to group them then run all of them. In previous code I had two task and I did it pretty much the same as my other code and it worked, figured it worked again.

torn sphinx Sep 1, 2023, 6:09 PM

#

paper flower <@456226577798135808> Also you're not running your coroutines concurrently

  + Exception Group Traceback (most recent call last):
   File "c:\Users\m\Desktop\Test\V1\New\main.py", line 38, in <module>
  |     loop.run_until_complete(run())
  |   File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "c:\Users\m\Desktop\Test\V1\New\main.py", line 16, in run
  |     await add(conn)
  |   File "c:\Users\m\Desktop\Test\V1\New\main.py", line 19, in add
  |     async with asyncio.TaskGroup() as tg:
  |   File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\asyncio\taskgroups.py", line 147, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (2 sub-exceptions)

#

  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "c:\Users\m\Desktop\Test\V1\New\main.py", line 31, in a2
    |     await con.execute(f"INSERT INTO test1 (number) VALUES ({n})")
    |   File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\site-packages\asyncpg\connection.py", line 317, in execute
    |     return await self._protocol.query(query, timeout)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "asyncpg\protocol\protocol.pyx", line 323, in query
    |   File "asyncpg\protocol\protocol.pyx", line 707, in asyncpg.protocol.protocol.BaseProtocol._check_state
    | asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "c:\Users\m\Desktop\Test\V1\New\main.py", line 35, in a3
    |     await con.execute(f"INSERT INTO test1 (number) VALUES ({n})")
    |   File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\site-packages\asyncpg\connection.py", line 317, in execute
    |     return await self._protocol.query(query, timeout)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "asyncpg\protocol\protocol.pyx", line 323, in query
    |   File "asyncpg\protocol\protocol.pyx", line 707, in asyncpg.protocol.protocol.BaseProtocol._check_state
    | asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress
    +------------------------------------```

torn sphinx Sep 1, 2023, 6:18 PM

#

brazen charm it should run them concurrently, but your `conn` object might be holding a mutex...

Not sure what a mutex is

torn sphinx Sep 1, 2023, 6:18 PM

#

grim vault And shouldn't a2() use test2 as table (same for 3)?

I have no idea what you're talking about

grim vault Sep 1, 2023, 6:22 PM

#

torn sphinx I have no idea what you're talking about

async def a2(con):
    while True: # test2  <-------------------
        n = random.randint(200000, 299999)
        await con.execute(f"INSERT INTO test1 (number) VALUES ({n})")
                                        ^^^^^

harsh pulsar Sep 1, 2023, 6:42 PM

#

brazen charm it should run them concurrently, but your `conn` object might be holding a mutex...

i think without any further configuration this will all run inside one big transaction https://www.psycopg.org/psycopg3/docs/basic/transactions.html, but idk why that would cause this particular error

torn sphinx Sep 1, 2023, 6:57 PM

#

harsh pulsar i think without any further configuration this will all run inside one big trans...

It's asyncpg

harsh pulsar Sep 1, 2023, 6:57 PM

#

torn sphinx It's asyncpg

i see.. i saw psy in your code and assumed psycopg

torn sphinx Sep 1, 2023, 6:57 PM

#

harsh pulsar i see.. i saw `psy` in your code and assumed psycopg

lol yeah I always change it to psy when using libraries like this

#

Well mainly because I did use pyscopg

harsh pulsar Sep 1, 2023, 6:57 PM

#

that's guaranteed to confuse people

torn sphinx Sep 1, 2023, 6:58 PM

#

Fair enough

#

I'll change it in the message

torn sphinx Sep 1, 2023, 7:18 PM

#

grim vault ```py async def a2(con): while True: # test2 <------------------- n...

Ah I see, fixed it. It still doesn't do both though

waxen finch Sep 1, 2023, 8:22 PM

#

torn sphinx Ah I see, fixed it. It still doesn't do both though

afaik one connection isnt capable of doing more than one query at a time, so you'd normally open multiple connections to your database to run each of your queries in separate transactions, e.g. ```py
async with asyncpg.create_pool(...) as pool, asyncio.TaskGroup() as tg:
tasks = [tg.create_task(run_query(pool)) for _ in range(3)]

async def run_query(pool):
async with pool.acquire() as conn:
await conn.execute("INSERT INTO my_table VALUES ('foo')")```

#

(by default create_pool() will immediately open 10 connections, but you can adjust that with min/max_size= if you want less)

torn sphinx Sep 2, 2023, 1:08 AM

#

googleapis has a way to pull youtube data

torn sphinx Sep 2, 2023, 4:01 AM

#

waxen finch (by default [`create_pool()`](<https://magicstack.github.io/asyncpg/current/api/...

Is there a way to set unlimited? I plan on allowing alot of executions that could take 5 - 10 seconds to happen all the time. So It probably will go into the hundreds of executions

waxen finch Sep 2, 2023, 4:20 AM

#

torn sphinx Is there a way to set unlimited? I plan on allowing alot of executions that coul...

i dont think literally setting an unlimited number of connections would be a good idea because at some point your client/server won't have the resources to handle it, not to mention your database has a limit on the number of connections too (100 by default)
what specifically about your "executions" is taking up so much time? is it some kind of processing on the database, or on your client? if its the client, i would suggest batching your queries into a few transactions so you can release the connection during the heaviest parts, e.g. ```py
async with pool.acquire() as conn:
all_the_data = await conn.fetch("SELECT ...")

Work with data now...

async with pool.acquire() as conn:
await conn.execute("INSERT INTO ...")```

fallen vault Sep 2, 2023, 4:40 AM

#

I have a user level column in my database. How do i select all users where the user level is greater tahn or equal to 3?

#

Please ping me if you respond.

torn sphinx Sep 2, 2023, 4:46 AM

#

waxen finch i dont think literally setting an unlimited number of connections would be a goo...

I'll be storing billions of records in hundreds possibly thousands of tables. Each record will have to be found by going through each table and checking if there is a result that matches the query. This causes a lengthy process.

#

When I was doing it before with I think a couple hundred million records it took ~30 seconds

#

But that was using pyscopg so it could be faster with asyncpg

#

I fully coded the tool before but the only issue was it wasn't asynchronous so if one user called it, the next would have to wait the time left for that transaction + the time for the transaction they made

torn sphinx Sep 2, 2023, 4:49 AM

#

waxen finch afaik one connection isnt capable of doing more than one query at a time, so you...

Also I'm confused on what's happeing here, for _ in range(3). So this is specifically if I have 3 concurrent executions?

waxen finch Sep 2, 2023, 5:13 AM

#

torn sphinx I'll be storing billions of records in hundreds possibly thousands of tables. Ea...

if all that selecting/inserting needs to be done for each user request then i guess you would need to increase your pool size, but are you sure the schema is optimized for your use case? having thousands of tables sounds wrong to me

waxen finch Sep 2, 2023, 5:14 AM

#

torn sphinx Also I'm confused on what's happeing here, `for _ in range(3)`. So this is speci...

thats merely a list comprehension shorthand to create 3 different query tasks

keen minnow Sep 2, 2023, 5:14 AM

#

torn sphinx I'll be storing billions of records in hundreds possibly thousands of tables. Ea...

you can't improve what you can't measure.
So I would suggest to monitor and dive into it so you understand where it spends the most time

torn sphinx Sep 2, 2023, 5:27 AM

#

waxen finch if all that selecting/inserting needs to be done for each user request then i gu...

The only other way to do it is if I had one table with a shit ton of different column names (since I'd have possibly thousands of different tables there would be a lot of different names). I'd then need to have a column name specifically for what that table would be, I suppose that could work but I don't think it's worth it. I'm not sure though.

torn sphinx Sep 2, 2023, 5:27 AM

#

keen minnow you can't improve what you can't measure. So I would suggest to monitor and dive...

How can I monitor it?

#

I'm not sure if it is slow due to the resources available or if it's just the way psycopg was. I'm going to have a similar testing situation as before so I can actually grasp what's taking majority of the time.

#

Based off of their testing I should be seeing somewhere near a 5x speed difference, this is a big difference. That means if it took 30 seconds before it would take ~6 seconds now.

keen minnow Sep 2, 2023, 5:36 AM

#

torn sphinx How can I monitor it?

there are monitoring libraries, there are also ways to understand your queries better like the explain keyword

#

If you wonder if it's slow due to the resources available, then you ought to monitor the resources available

#

be it resources on the db, the thread or connection pool from the app, etc.

torn sphinx Sep 2, 2023, 5:45 AM

#

keen minnow there are monitoring libraries, there are also ways to understand your queries b...

Any library you'd recommend for this case? Also I'll look into the explain.

keen minnow Sep 2, 2023, 5:46 AM

#

torn sphinx Any library you'd recommend for this case? Also I'll look into the explain.

I would look into opentelemetry in case or some APM

torn sphinx Sep 2, 2023, 5:46 AM

#

keen minnow I would look into opentelemetry in case or some APM

Question, are these specifically for postgresql related things or is this a one in all type of library

keen minnow Sep 2, 2023, 5:47 AM

#

torn sphinx Question, are these specifically for postgresql related things or is this a one ...

it's applicable to everything and anything

torn sphinx Sep 2, 2023, 5:48 AM

#

keen minnow it's applicable to everything and anything

Alright cool, I'll jot it down and use it once I have the same data set as I had before.

keen minnow Sep 2, 2023, 5:49 AM

#

torn sphinx Alright cool, I'll jot it down and use it once I have the same data set as I had...

But overall, the answer is: science works, bitches

So make sure you can observe the variables at play, and don't hesitate to do load testing if needed

torn sphinx Sep 2, 2023, 5:50 AM

#

keen minnow But overall, the answer is: science works, bitches So make sure you can observe...

What's load testing? Also yeah with what you've guys taught me I'm much more knowledgeable now than I was 2 months ago when making the non asynchronous version of this. I have good hope for making this a bazillion times more efficient.

keen minnow Sep 2, 2023, 5:52 AM

#

torn sphinx What's load testing? Also yeah with what you've guys taught me I'm much more kno...

You may want to peruse through https://www.brendangregg.com/overview.html

But in a sentence, it's about creating test scenarios and making a bunch of requests against your service so that you understand when it starts falling apart and where it starts falling apart

torn sphinx Sep 2, 2023, 5:53 AM

#

Ah okay that makes sense, appreciate it.

#

I'll look into it

keen minnow Sep 2, 2023, 5:54 AM

#

At the end of the day, it's all about going through hypothesis and validating them

#

You should never have to guess

#

also, it's a very valuable and difficult skill to get

#

so it's great for your career too

elder sand Sep 2, 2023, 10:36 AM

#

Hey, can someone help me out with this question : I’m running 3 websites on a linux instance. For each website I have a sql database server running using docker container. So 2 postgresql docker containers & 1 mysql docker container. What I want to do is to migrate these db’s to a new postgres docker container in real-time, i.e copy the current data and keep copying the new data until its same. There are tools like pg_dump for posgres, but those are manual methods. Is there an automated approach to solve this problem, for eg : bash/ python scripts.

hexed estuary Sep 2, 2023, 11:36 AM

#

torn sphinx Based off of their testing I should be seeing somewhere near a 5x speed differen...

twice as fast as golang?? 👀

#

oh, I see, asyncpg is 25% cython. cool.

brazen charm Sep 2, 2023, 12:32 PM

#

yeah most of the heavy lifting is done in cython

#

not that those numbers matter much most of the time, since you need a massive DB to start hitting those numbers

fading patrol Sep 2, 2023, 12:59 PM

#

elder sand Hey, can someone help me out with this question : I’m running 3 websites on a li...

For pg to pg this looks like it might work out of box but I haven't tried it: https://github.com/ankane/pgsync

MySQL to pg will be a little more complicated I imagine.

GitHub

GitHub - ankane/pgsync: Sync data from one Postgres database to ano...

Sync data from one Postgres database to another. Contribute to ankane/pgsync development by creating an account on GitHub.

narrow shore Sep 2, 2023, 3:30 PM

#

Let's say you are in a position where you can't use Spark etc to process a huge chunk of data from a RDBMS (Postgres / MySQL) to calculate some value, but you need all the data from multiple tables to calculate it. What would your approach be? (stored procedures is a no-no)

west sentinel Sep 2, 2023, 7:12 PM

#

what does this error mean from pymongo?
SSL handshake failed: ac-qb0gg1h-shard-00-00.2lrs6ai.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:992),SSL handshake failed: ac-qb0gg1h-shard-00-01.2lrs6ai.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:992),SSL handshake failed: ac-qb0gg1h-shard-00-02.2lrs6ai.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:992), Timeout: 30s, Topology Description: <TopologyDescription id: 64f388b2e820f7223e1abc5d, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('ac-qb0gg1h-shard-00-00.2lrs6ai.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('SSL handshake failed: ac-qb0gg1h-shard-00-00.2lrs6ai.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:992)')>, <ServerDescription ('ac-qb0gg1h-shard-00-01.2lrs6ai.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('SSL handshake failed: ac-qb0gg1h-shard-00-01.2lrs6ai.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:992)')>, <ServerDescription ('ac-qb0gg1h-shard-00-02.2lrs6ai.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('SSL handshake failed: ac-qb0gg1h-shard-00-02.2lrs6ai.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:992)')>]>

brazen charm Sep 2, 2023, 7:15 PM

#

it means it was unable to connect to the address given due to it failing to complete a handshake within the timeout

west sentinel Sep 2, 2023, 7:20 PM

#

do you know what could be causing the problem

brazen charm Sep 2, 2023, 7:25 PM

#

are you running some really old version of pymongo?

#

or is atlas having any service issues rn?

west sentinel Sep 2, 2023, 7:26 PM

#

my pymongo version is 4.4.1

brazen charm Sep 2, 2023, 7:29 PM

#

can you run python -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])" in your terminal

#

(assumes you have requests installed)

#

And/Or run openssl version

west sentinel Sep 2, 2023, 7:30 PM

#

i ran this
from urllib.request import urlopen
print(urlopen('https://www.howsmyssl.com/a/check').read())

#

tls version is 1.3

brazen charm Sep 2, 2023, 7:31 PM

#

hmm

#

That should be new enough, only v1.0 or 1.1 should really be disabled

west sentinel Sep 2, 2023, 7:32 PM

#

openssl version gives me LibreSSL 2.8.3

brazen charm Sep 2, 2023, 7:32 PM

#

I assume you're using atlas btw?

west sentinel Sep 2, 2023, 7:33 PM

#

yes

brazen charm Sep 2, 2023, 7:33 PM

#

What if you run python -c "import ssl; print(ssl.OPENSSL_VERSION)"

west sentinel Sep 2, 2023, 7:34 PM

#

OpenSSL 1.1.1s 1 Nov 2022

brazen charm Sep 2, 2023, 7:34 PM

#

right

#

so it looks like the openssl version python is using is outdated

#

well, outdated enough to not be able to connect to atlas

#

What python version are you using?

west sentinel Sep 2, 2023, 7:35 PM

#

i think 3.11

#

3.11.2

brazen charm Sep 2, 2023, 7:35 PM

#

did you compile it from source?

west sentinel Sep 2, 2023, 7:35 PM

#

i used pip

brazen charm Sep 2, 2023, 7:35 PM

#

hmm?

west sentinel Sep 2, 2023, 7:36 PM

#

oh shit

brazen charm Sep 2, 2023, 7:36 PM

#

For the interpreter itself?

west sentinel Sep 2, 2023, 7:36 PM

#

nah im being stupid

#

no i didnt

brazen charm Sep 2, 2023, 7:37 PM

#

can you run python -c "import ssl; print(ssl.HAS_TLSv1_2, ssl.HAS_TLSv1_3)"

#

Just realised thats the openssl version not the tls version

west sentinel Sep 2, 2023, 7:37 PM

#

true true

brazen charm Sep 2, 2023, 7:38 PM

#

okay, well thats weird

#

you have the right TLS versions

#

pithink So why is mongo using 1.1

#

what is your connection URI? (Remove the auth stuff if that's in there aka the mongodb://foo:bar@

west sentinel Sep 2, 2023, 7:42 PM

#

"mongodb+srv://x:x@cluster0.2lrs6ai.mongodb.net/?retryWrites=true&w=majority"

brazen charm Sep 2, 2023, 7:42 PM

#

can you add &tls=true to the end of the uri?

west sentinel Sep 2, 2023, 7:43 PM

#

it worked

#

?? how

brazen charm Sep 2, 2023, 7:44 PM

#

pymongo having some seriously poor error handling and no TLS upgrade support apparently

#

So it's not really a TLS version error, the error is just super miss-leading

#

pymongo just doesn't bother attempting to setting up TLS unless you explicitly tell it to

west sentinel Sep 2, 2023, 7:45 PM

#

oh ok

#

thank you very much

pure mortar Sep 2, 2023, 10:31 PM

#

SeattleDataGuy:
https://www.linkedin.com/posts/benjaminrogojan_in-the-last-12-months-i-have-dealt-with-activity-7103463847377735680-MD3M

Benjamin Rogojan on LinkedIn: In the last 12 months, I have dealt w...

In the last 12 months, I have dealt with three different projects that involved moving companies' analytical workloads, from MongoDB to analytical platforms… | 27 comments on LinkedIn

harsh pulsar Sep 3, 2023, 2:26 AM

#

pure mortar SeattleDataGuy: https://www.linkedin.com/posts/benjaminrogojan_in-the-last-12-mo...

i worked on a system where mongodb was the only database, it was horrible

pure mortar Sep 3, 2023, 2:30 AM

#

harsh pulsar i worked on a system where mongodb was the only database, it was horrible

oof. and i guess you were on the analytics side of the house too

#

my condolences

#

blobpray 🕯️

#

mongo: complex queries over ton of data -> falls over

deep jetty Sep 3, 2023, 7:49 AM

#

Yes. You can use AND or OR to combine multiple conditions.
You only need one WHERE at the beginning.

brazen charm Sep 3, 2023, 10:02 AM

#

pure mortar mongo: complex queries over ton of data -> *falls over*

mongo: any query which isn't a lookup on a hashtable -> falls over

potent spire Sep 3, 2023, 10:09 AM

#

How I can store data in mongo db?

torn sphinx Sep 3, 2023, 10:29 AM

#

Hey guys. Need a little suggestion. So I need create a nosql db (dynamoDb). The purpose is to store ads data in there. And the queries will be like filtering on a certain keyword and getting data based in a date range.

What would be the appropriate nosql schema for this task. The number of rows would be high likely over a billion.

coral wasp Sep 3, 2023, 10:35 AM

#

Can you explain more about what ads detail is?

torn sphinx Sep 3, 2023, 10:56 AM

#

coral wasp Can you explain more about what ads detail is?

Google ads

#

When you search a keyword you will get multiple ads

coral wasp Sep 3, 2023, 10:56 AM

#

But what about the ads? What kinda of information? Can’t design a schema without know the objects and attributes

torn sphinx Sep 3, 2023, 10:57 AM

#

Hmm title, description, datetime, keyword

coral wasp Sep 3, 2023, 10:57 AM

#

4 attributes? Why nosql?

#

And I don’t really understand the question then, if just four attributes, the schema is just one table four attributes?

torn sphinx Sep 3, 2023, 10:59 AM

#

Yes I meant as in partition and sort key and all that kind of stuff

#

I have only worked with RDS so far

torn sphinx Sep 3, 2023, 10:59 AM

#

coral wasp 4 attributes? Why nosql?

I think because of scale maybe. I didn't make that decision but that's what the team wants to use

coral wasp Sep 3, 2023, 11:00 AM

#

If you’re only searching for keyword and date range, I’d think you’d partition in keyword with a secondary on date to support a range query

torn sphinx Sep 3, 2023, 11:01 AM

#

So will that be a sort_column?

#

coral wasp Sep 3, 2023, 11:03 AM

#

Yah, or if you are searching across entire, you could use a global secondary index… one sec find a ref

#

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html#

torn sphinx Sep 3, 2023, 11:04 AM

#

Ah, global secondary is like if you want ads in past 30 days irrespective keyword bin they belong to yeah?

coral wasp Sep 3, 2023, 11:06 AM

#

Yah

torn sphinx Sep 3, 2023, 11:08 AM

#

So what will I put as a partition key in global index

#

in this case

#

Since we essentially just need a sort key and no partition

coral wasp Sep 3, 2023, 11:09 AM

#

torn sphinx I think because of scale maybe. I didn't make that decision but that's what the ...

Personally, I find most people overthink this stuff early on before they know their query usage. Unless the team truly understands the queries they’ll need

coral wasp Sep 3, 2023, 11:09 AM

#

torn sphinx Since we essentially just need a sort key and no partition

I’d think you’d partition by keyword

torn sphinx Sep 3, 2023, 11:10 AM

#

coral wasp I’d think you’d partition by keyword

Even for global secondary index?

coral wasp Sep 3, 2023, 11:10 AM

#

Thinking more generally: I’m not an ad-guy, so I don’t know if this is optimal or even makes semantic sense (don’t searches have multiple keywords). Im just commenting on the schema you described

torn sphinx Sep 3, 2023, 11:11 AM

#

Gpt says make date the partition in gsi

coral wasp Sep 3, 2023, 11:11 AM

#

You said you were searching keyword and date, not merely date.

torn sphinx Sep 3, 2023, 11:11 AM

#

Yeah but this is secondary index.

#

In the case where we only want to filter on date

coral wasp Sep 3, 2023, 11:12 AM

#

Sure, you can put an index on any additional thing you want to filter on

torn sphinx Sep 3, 2023, 11:13 AM

#

Awesome. And do you see any use case of a local secondary index in this case?

#

Just relating to keywords and date

coral wasp Sep 3, 2023, 11:13 AM

#

There are limits and costs. Locals are good when you’re doing an and search: keyword=Foo and date…

torn sphinx Sep 3, 2023, 11:14 AM

#

Are secondary indexes slower than normal ones?

coral wasp Sep 3, 2023, 11:14 AM

#

Don’t think so, but they add overhead because they need to be updated during inserts

torn sphinx Sep 3, 2023, 11:16 AM

#

Cost wise?

coral wasp Sep 3, 2023, 11:16 AM

#

No idea, been a while since I look at that

#

Aws charges you for everything so certainly there’s a cost for everything and every io

#

Overall tho, I’d really make sure you understand the queries and whether dynamodb makes sense here. If there’s queries are aggregate stuff like: average ad hits per day over the last year, I dunno if it’s the right solution

brazen charm Sep 3, 2023, 11:19 AM

#

Secondary indexes don't have a read performance cost but it makes writes potentially slower/more expensive

#

As the cluster has to update the indexes across each node seperately

#

you dont want Dynamo for this though btw

#

It'll cost you an absolute fortune

torn sphinx Sep 3, 2023, 11:21 AM

#

Uniqueness: The combination of Partition Key and Sort Key (if present) must be unique for each item in the table.

Is this true?

brazen charm Sep 3, 2023, 11:21 AM

#

torn sphinx ```Uniqueness: The combination of Partition Key and Sort Key (if present) must b...

No, at least not when I last checked

torn sphinx Sep 3, 2023, 11:21 AM

#

Silly gpt

brazen charm Sep 3, 2023, 11:21 AM

#

partition keys and sort keys are not like primary keys

torn sphinx Sep 3, 2023, 11:22 AM

#

brazen charm It'll cost you an absolute fortune

We doing serverless

coral wasp Sep 3, 2023, 11:23 AM

#

Me, I’d probably dump the ad data into parquet files and do roll up analysis into a separate summary db

brazen charm Sep 3, 2023, 11:23 AM

#

You have 3 fairly reasonable options for this depending on what latency you want:

Clickhouse for the lowest latency possible, it'll be the most expensive option though and require manual deployment of clickhouse cloud.
Redshift a bit cheaper than Dynamo but still expensive
Athena, which is basically S3 datalake querying, your queries are in the seconds at best though, it is by far the cheapest option though

torn sphinx Sep 3, 2023, 11:24 AM

#

Ah dynamoDb even expensive that redshift

brazen charm Sep 3, 2023, 11:24 AM

#

Both are pretty expensive for a lot of data

#

but redshift is more tailored for analytic queries

#

Depending on the queries you could also potentially use https://quickwit.io (Although disclaimer I used to work there and still work on the project) which will probably one of the cheapest options and still give you sub second latency ish

Search more with less | Quickwit

Sub-second search & analytics engine on cloud storage

torn sphinx Sep 3, 2023, 11:26 AM

#

We are stuck with aws

coral wasp Sep 3, 2023, 11:26 AM

#

Is this db part of customer facing app? Like interactive? Or for internal analytics?

torn sphinx Sep 3, 2023, 11:27 AM

#

Hmm, yes it will query ads both to show user straight away. But also to do something in the backend

brazen charm Sep 3, 2023, 11:27 AM

#

torn sphinx We are stuck with aws

All of the mentioned options run on AWS, just ClickHouse or QW require manual management to be deployed with something like a k8s cluster.

torn sphinx Sep 3, 2023, 11:28 AM

#

But queries are simple like select * from table where keyword='djsj' and dat in past 30 days

brazen charm Sep 3, 2023, 11:28 AM

#

Or cassandra/scylla would work for the lowest latency possible

#

since these seem like single point click queries

stiff radish Sep 3, 2023, 11:30 AM

#

torn sphinx Hey guys. Need a little suggestion. So I need create a nosql db (dynamoDb). The ...

Your primary issue is that you need to be able to search over billions of rows of text quickly, not necessarily NoSQL yeah?

torn sphinx Sep 3, 2023, 11:30 AM

#

yes

#

Talking to gpt, it said maintaining indexes at that scale is difficult in an rds

brazen charm Sep 3, 2023, 11:31 AM

#

I kind of agree with GPT that a non-relational DB is probably well suited for this

#

What sort of queries per second are you expecting?

torn sphinx Sep 3, 2023, 11:32 AM

#

Not much I reckon, not many users

#

Internal uni usage by staff to post ads

#

And also to see competitor ads

brazen charm Sep 3, 2023, 11:33 AM

#

and what is your budget for this roughly? On a monthly basis

torn sphinx Sep 3, 2023, 11:34 AM

#

That's a talk we haven't had. But ig not super low

#

If dynamo costs you per time then the queries shouldn't take long and that can't be a fortune like you said

brazen charm Sep 3, 2023, 11:36 AM

#

It charges by just about everything, i.e. per op, per time, per GB, etc...

#

but if your read rate is very low (I.e. less than 1k QPS) and your data is like < 500GB then you are probably fine i guess

#

i think it'll be a couple hundred dollars a month ish

coral wasp Sep 3, 2023, 11:38 AM

#

torn sphinx Internal uni usage by staff to post ads

More or less repeating myself but: So why would you expect billions of hits? I’d aggregate, and create static roll up reports, rather than ad-hoc queries

torn sphinx Sep 3, 2023, 11:38 AM

#

brazen charm i think it'll be a couple hundred dollars a month ish

that's no big problem

brazen charm Sep 3, 2023, 11:38 AM

#

I am curious about this though this Since low QPS, and the queries seem to be very direct I.e. "Get n matches for keyword 'XY' in date range Z"

torn sphinx Sep 3, 2023, 11:39 AM

#

coral wasp More or less repeating myself but: So why would you expect billions of hits? I’d...

Aggregate what?

#

We need titles and descriptions

#

Of indivudal ads

brazen charm Sep 3, 2023, 11:39 AM

#

How many results do you expect per query? Just one?

coral wasp Sep 3, 2023, 11:40 AM

#

torn sphinx Aggregate what?

(And how many distinct ads would there be? Dozens? Hundreds? Thousands?)

brazen charm Sep 3, 2023, 11:40 AM

#

I think they said upto around 1B rows

coral wasp Sep 3, 2023, 11:40 AM

#

Yah but that’s events: ad hits

torn sphinx Sep 3, 2023, 11:40 AM

#

brazen charm How many results do you expect per query? Just one?

No probably more. Since the scrape for now is being done once per hour, for 20 regions. That's 480 per day

And say for past 30 days, that would be 480*30=About 15k

stiff radish Sep 3, 2023, 11:41 AM

#

Your biggest issue is how the DB handles indexing and search over such a big dataset.

torn sphinx Sep 3, 2023, 11:41 AM

#

Actually 15k and each search might have returned 3 ads on average. So that will be about 45k rows

coral wasp Sep 3, 2023, 11:41 AM

#

Like, could you just store two tables: ad and ad_hit? Ad_hit could be a rollup

brazen charm Sep 3, 2023, 11:42 AM

#

Hang on a sec, they just want to get the ad title and description though, this isn't really an aggregation or searching thing

#

it's Keyword, daterange -> rows

#

Right?

torn sphinx Sep 3, 2023, 11:42 AM

#

yep

brazen charm Sep 3, 2023, 11:43 AM

#

yeah Dynamo should be fine

stiff radish Sep 3, 2023, 11:43 AM

#

So you index over time but then you still need something that searches over all the records in that timeframe somewhat efficiently

#

I'm lazy so my reflex would be to explore possible postgres plug-ins.

torn sphinx Sep 3, 2023, 11:43 AM

#

stiff radish So you index over time but then you still need something that searches over all...

yeah making a Global secondary index on dat

brazen charm Sep 3, 2023, 11:43 AM

#

Your partition key is the keyword
Your sort key would be the datetime

If you have multiple keywords per ad, you're better off duplicating the rows

stiff radish Sep 3, 2023, 11:43 AM

#

How would the index over the text look like?

torn sphinx Sep 3, 2023, 11:43 AM

#

Yep they are duplicated already

stiff radish Sep 3, 2023, 11:44 AM

#

Do you have duplicates?

#

Ah ok.

torn sphinx Sep 3, 2023, 11:44 AM

#

one row per retrieved ad even though they came in same request

coral wasp Sep 3, 2023, 11:44 AM

#

brazen charm Hang on a sec, they just want to get the ad title and description though, this i...

Hah, I’m biased towards my analytical world… I assumed hits per day was the type of metric they wanted.

brazen charm Sep 3, 2023, 11:44 AM

#

Dynamo should be fine then, as long as you can target the specific partition key without having dynamo filter, it's fine

torn sphinx Sep 3, 2023, 11:44 AM

#

Awesome

brazen charm Sep 3, 2023, 11:45 AM

#

The aim of the game with Dynamo/Cassandra/Scylla is you always want to filter to a specific partition (because the drivers can work out exactly what node the data lives on) so you're not scanning loads of data

#

So if you're doing that, should be fine

brazen charm Sep 3, 2023, 11:45 AM

#

coral wasp Hah, I’m biased towards my analytical world… I assumed hits per day was the type...

Yeah initial I was a bit confused coming from ad tech where we have petabytes of data doing queries of that 😅

torn sphinx Sep 3, 2023, 11:45 AM

#

And should I be worried about having a gsi on date. That query wouldn't be as common, that is to query with a keyword or a list of keywords. So will that gsi be a lot of cost for lot of rows?

brazen charm Sep 3, 2023, 11:45 AM

#

Do you need a GSI on date?

#

like you want a list of keywords that appeared within a range?

torn sphinx Sep 3, 2023, 11:46 AM

#

Well ads* irrespective of the keywords they belong to

#

Though rare query

brazen charm Sep 3, 2023, 11:47 AM

#

You will probably be fine, it's a rare query and you don't have a huge amount of data or query rate.

#

If I remember right GSIs are built upon the materialized view system, so you do effectively double your data size though

stiff radish Sep 3, 2023, 11:48 AM

#

I'd still look into how text is indexed with that GSI thing.

brazen charm Sep 3, 2023, 11:48 AM

#

It's a hash

#

from a very high level view, a GSI effectively creates a new table with the field(s) you want to build the index on as the partition key

stiff radish Sep 3, 2023, 11:49 AM

#

Which is fine for text cause you only need equality anyway.

torn sphinx Sep 3, 2023, 11:51 AM

#

stiff radish Which is fine for text cause you only need equality anyway.

might need to pass a list of keywords as well so that is also just a list of equalities, yeah?

brazen charm Sep 3, 2023, 11:51 AM

#

yes

#

internally the DB hashes each key and then works out what partition the key lives in based on the hash

torn sphinx Sep 3, 2023, 11:52 AM

#

stiff radish Which is fine for text cause you only need equality anyway.

Wouldn't be equality in case dat though

coral wasp Sep 3, 2023, 11:53 AM

#

stiff radish Which is fine for text cause you only need equality anyway.

When I hear requirements like this, I can only imagine the other requests that will come in post implementation: ‘tell me all the keywords that this ad hit for’, or ‘which ads with xyz in the title had the most hits’

brazen charm Sep 3, 2023, 11:53 AM

#

https://opensource.docs.scylladb.com/stable/using-scylla/secondary-indexes.html has a good overview of GSIs, even though it as a DB is different to Dynamo, they all use the same principle

Global Secondary Indexes | ScyllaDB Docs

ScyllaDB is an Apache Cassandra-compatible NoSQL data store that can handle 1 million transactions per second on a single server.

torn sphinx Sep 3, 2023, 11:53 AM

#

It would be a range on the gsi

may<dat<june

stiff radish Sep 3, 2023, 11:53 AM

#

torn sphinx It would be a range on the gsi may<dat<june

I haven't used Dynamo. Imo indexes in DBs are a bit of a leaky abstraction. You should read the documentation to verify if it's up to spec for you.

brazen charm Sep 3, 2023, 11:54 AM

#

coral wasp When I hear requirements like this, I can only imagine the other requests that w...

In those cases you go beyond regular DB systems and enter search engines

#

If you're doing things like which ads with xyz in the title had the most hits in your postgres or athena queries you're going to be there a while

stiff radish Sep 3, 2023, 11:55 AM

#

Well, @coral wasp asked the right questions already "is this an operational or analytical system"

#

If it's really just searching for keywords my broke solution might've been to use an RDBMS and do it somewhat properly:

Before inserting split all the text into distinct words.
Have an ad table: ad(PK, date, title)
Have a bridge table: ad_keyword_bridge(FK_ad, FK_key)
Have a keyword table: keyword(Keyword). The keyword can just be the PK. You hash this.

#

Your query plan would look quite good; If you put a good ol' B-tree or similar index on date and a hash on the keyword you'll be fine. You're also storing way way less data this way because your keyword table contains a unique set of keywords.

Maybe I misunderstood a requirement but it aint that deep, no?

coral wasp Sep 3, 2023, 12:03 PM

#

stiff radish If it's really just searching for keywords my broke solution might've been to us...

There are specialized indices that do this stuff in most databases, PostGres has two or three different indices, including full text and trigrams (for partial matching)… but agree this doesn’t scale so well

stiff radish Sep 3, 2023, 12:04 PM

#

It's not just about full text search imo but also just storage size.

coral wasp Sep 3, 2023, 12:05 PM

#

stiff radish It's not just about full text search imo but also just storage size.

Yup, just pointing out you don’t necessarily need to split the words out.

stiff radish Sep 3, 2023, 12:05 PM

#

Depending on how many ads there are per 30 days (if that's truly their timeframe) this is a trivial solution. It depends on the type of query they're doing. What I'm suggesting might scale gracefully and is conceptually very simple and DB-wise idiomatic, I'd benchmark it 🤷

brazen charm Sep 3, 2023, 12:05 PM

#

I think you two might be slightly overengineering this 😅

coral wasp Sep 3, 2023, 12:06 PM

#

Now we need to decide on a caching layer

stiff radish Sep 3, 2023, 12:06 PM

#

I'm old school so I think No-SQL is overengineering

torn sphinx Sep 3, 2023, 12:06 PM

#

pithink

stiff radish Sep 3, 2023, 12:06 PM

#

3 tables in a RDBMS is not overengineering

brazen charm Sep 3, 2023, 12:06 PM

#

That is probably not a great way of going about things

coral wasp Sep 3, 2023, 12:07 PM

#

stiff radish I'm old school so I think No-SQL is overengineering

I feel that way about mongo (or perhaps it’s under engineering), but dynamodb is really good at what it does

stiff radish Sep 3, 2023, 12:08 PM

#

I'm exaggerating a bit but this is a very simple schema imo. I can't count the amount of times I've gotten Mongo from web devs that needed to become relational downstream.

brazen charm Sep 3, 2023, 12:09 PM

#

That's just because people fan boy over using Mongo because it's easy to do something quick and dirty VS actually thinking about the schema.

stiff radish Sep 3, 2023, 12:10 PM

#

I like nosql when it has a clear advantage but I'm pretty sure their schema is "fixed" and if they're not doing this on their entire DB it'll scale while likely having a lower storage cost so idk what the advantage of dynamo is.

#

This might be a me-problem, I'm curious to know why using 3 tables is worse than dynamo. I haven't used it because I'm squarely in an Azure country

coral wasp Sep 3, 2023, 12:12 PM

#

Dynamodb isn’t nosql in the sense that mongo is

#

(Lacking the right words right now to express this, I’m going to get some coffee)

brazen charm Sep 3, 2023, 12:16 PM

#

It is serverless point -> click access.

Dynamo is the same as Cassandra and Scylla if you've used either of those.

But the base concept is the partitioning of data and querying of data is done such that a query only ever needs to hit on database node/shard. Sometimes even down to what CPU core the data lives on.

So rather than your relational DB which might have several indexes and table to JOIN on which get expensive to scale, Dynamo and that go Keyword 'x' lives on partition 'Y' which is one node 'Z' which is a single round trip and much faster.

They operate on the "Many cheap queries per second is better than fewer expensive queries" I.e. issuing 40 queries for 40 keywords is cheaper than issuing one query, because the client can run all 40 queries concurrently and only talk to the DB node which actually has that keyword, rather than randomly picking one node, and then that node having to work out what keyword lives on what node and then sending the query to each node that has the data, which has a much higher number of round trips which increases latency.

stiff radish Sep 3, 2023, 12:17 PM

#

On Azure you have MPP versions of SQL server where you determine your hash and storage strategy

brazen charm Sep 3, 2023, 12:18 PM

#

Now for this amount of data, it doesn't matter that much because relational DBs can still realistically handle the data fairly easily, but in this case you're on serverless AWS instances where accessing dynamo is miles faster and easier than RDS for this workload

stiff radish Sep 3, 2023, 12:18 PM

#

Like round robin over all nodes or "store everything of one node together" etc.

#

It's also serverless

brazen charm Sep 3, 2023, 12:18 PM

#

I have some 😅 less than positive opinions of SQL server xD

stiff radish Sep 3, 2023, 12:18 PM

#

On-prem or cloud?

brazen charm Sep 3, 2023, 12:18 PM

#

Both

torn sphinx Sep 3, 2023, 12:20 PM

#

#

This is how it's actually behaving as well when I am pushing a df. It says contains duplicate rows

brazen charm Sep 3, 2023, 12:22 PM

#

You might need to do PK: keyword, SK: (dt, ad_id)

torn sphinx Sep 3, 2023, 12:22 PM

#

oh okay, gpt suggests just a random id

torn sphinx Sep 3, 2023, 12:23 PM

#

brazen charm You might need to do `PK: keyword, SK: (dt, ad_id)`

not much point of it though. Should just allow to put this directly

brazen charm Sep 3, 2023, 12:23 PM

#

It doesn't really have any use other than stopping the conflict

delicate fieldBOT Sep 3, 2023, 12:32 PM

#

Rules

The rules and guidelines that apply to this community can be found on our rules page. We expect all members of the community to have read and understood these.

#

Rules

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

gentle zinc Sep 3, 2023, 2:10 PM

#

is there any incentive with using the cli client over the workbench? i mean the workbench is so much more intuitive

fading patrol Sep 3, 2023, 2:29 PM

#

gentle zinc is there any incentive with using the cli client over the workbench? i mean the ...

Use whatever works for you.

gentle zinc Sep 3, 2023, 2:31 PM

#

fading patrol Use whatever works for you.

question from a novice here is mysql meant to be clunky , it seems unintuitive and inefficient to use ,excuse my ignorance as i'm only a 12th grader

#

or am i oblivious to it's greatness i mean there ought to be a reason for it's popularity

#

cause there's plenty of alternatives out there

fading patrol Sep 3, 2023, 2:34 PM

#

gentle zinc or am i oblivious to it's greatness i mean there ought to be a reason for it's p...

Depending what you want to do sometimes CLI is the only option or can be just easier. But even advanced professionals generally spend a lot of time in DBeaver or another GUI tool

gentle zinc Sep 3, 2023, 2:35 PM

#

oh and quick question if we dont mention the size what value will it take by default

#

it's 1 right

paper flower Sep 3, 2023, 2:36 PM

#

gentle zinc is there any incentive with using the cli client over the workbench? i mean the ...

My opinion is the same as @fading patrol - use whatever works for you
I use postgresql and rarely use it's cli directly, but I often write raw sql

gentle zinc Sep 3, 2023, 2:36 PM

#

paper flower My opinion is the same as <@382671972829495298> - use whatever works for you I u...

got it mate

paper flower Sep 3, 2023, 2:36 PM

#

Specific tool shouldn't matter

gentle zinc Sep 3, 2023, 2:37 PM

#

paper flower Specific tool shouldn't matter

well in this scenario it kinda does my school is forcing me to use the msql cli and it's really annoying cause i cant really move up lines and edit my text

#

that's why i've resorted to using the workbench for now

paper flower Sep 3, 2023, 2:38 PM

#

Yeah, that makes sense

#

If you're learning SQL - use a tool that allows you to write it more comfortably pithink

gentle zinc Sep 3, 2023, 2:39 PM

#

paper flower If you're learning SQL - use a tool that allows you to write it more comfortably...

mind answering this question for me bud?

paper flower Sep 3, 2023, 2:39 PM

#

Don't ask to ask

gentle zinc Sep 3, 2023, 2:40 PM

#

i've heard this numerous times lmao

#

oh and is there a way to "lock" queries in the workbench for future references?

#

for example i would like to lock the output to another pane

paper flower Sep 3, 2023, 2:40 PM

#

I don't really use mysql and whatever tool you're using too

#

But you can just write them to files 🤔

gentle zinc Sep 3, 2023, 2:42 PM

#

paper flower I don't really use mysql and whatever tool you're using too

it's the workbench which is installed alongside mysql

gentle zinc Sep 3, 2023, 2:42 PM

#

paper flower But you can just write them to files 🤔

eh might just do that

#

@fading patrol i need help :/

coral wasp Sep 3, 2023, 3:13 PM

#

gentle zinc well in this scenario it kinda does my school is forcing me to use the msql cli ...

Also, for school work and the like, you can also use: https://dbfiddle.uk/

db<>fiddle

a free online environment to experiment with SQL and Node.js

gentle zinc Sep 3, 2023, 3:21 PM

#

coral wasp Also, for school work and the like, you can also use: https://dbfiddle.uk/

thanks bud

drifting thunder Sep 3, 2023, 3:45 PM

#

anyone here work with sql alchemy?

coral wasp Sep 3, 2023, 3:48 PM

#

drifting thunder anyone here work with sql alchemy?

I don't really (rarely), but lots of people here do. Ask your question, someone might be able to answer it.

drifting thunder Sep 3, 2023, 3:51 PM

#

coral wasp I don't really (rarely), but lots of people here do. Ask your question, someone ...

I don't either, but I think I just spaghettited a fix

pure mortar Sep 3, 2023, 4:40 PM

#

brazen charm mongo: any query which isn't a lookup on a hashtable -> falls over

ahh dead 💀

paper flower Sep 3, 2023, 4:43 PM

#

drifting thunder anyone here work with sql alchemy?

I do, also don't ask to ask 😛

waxen finch Sep 3, 2023, 5:09 PM

#

@halcyon dew i saw your message about aiosqlite connection pooling yesterday #discord-bots message, and i was curious about it so ive written up a few benchmarks with asqlite to see how multiple connections perform
https://github.com/thegamecracks/asqlite-bench

performance apparently degrades very quickly: ```py

py -m asqlite_bench -c 1 queries\2-selects.toml
Finished in 0.512s
py -m asqlite_bench -c 2 queries\2-selects.toml
Finished in 0.969s
py -m asqlite_bench -c 10 queries\2-selects.toml
Finished in 7.106s
py -m asqlite_bench -c 100 queries\2-selects.toml
Finished in 7.643s``` the numbers suggest that asqlite performs better with a limited number of connections, though it doesnt make sense because SQLite is supposed to allow concurrent readers...

halcyon dew Sep 3, 2023, 5:48 PM

#

waxen finch <@454357482102587393> i saw your message about aiosqlite connection pooling yest...

Ah thanks for testing this. I didn't realize that

waxen finch Sep 3, 2023, 6:50 PM

#

similar results came out of manually running the select statements in different threads, although it was overall faster
https://paste.pythondiscord.com/YTLQ
1 conn: 0.212s
2 conns: 0.589s
10 conns: 6.600s
100 conns: 6.892s

waxen finch Sep 3, 2023, 7:28 PM

#

what in particular are you stuck on? i think your suggested table is fine, although the ordering of user1 and user2 might be ambiguous

#

what will you use for the row's primary key? if its (cmd, user1, user2), then (cmd, user2, user1) would be considered a separate interaction

shut tiger Sep 3, 2023, 7:46 PM

#

Sort the pks of user1 and user2.

waxen finch Sep 3, 2023, 7:59 PM

#

^ although as a derivative question, how would the query be written if it was a many-interactions to many-users relationship? for example: sql CREATE TABLE interaction ( id INTEGER PRIMARY KEY, cmd TEXT NOT NULL, count INTEGER NOT NULL DEFAULT 0 ); CREATE TABLE interaction_user ( interaction_id INTEGER REFERENCES interaction (id), user_id INTEGER, PRIMARY KEY (interaction_id, user_id) ); im thinking of a self-referential join but im not sure if there's a better way: sql SELECT i.cmd, i.count FROM interaction_user iu1 JOIN interaction_user iu2 ON iu1.interaction_id = iu2.interaction_id JOIN interaction i ON i.id = iu1.interaction_id WHERE iu1.user_id = ? AND iu2.user_id = ?;

hybrid marsh Sep 3, 2023, 10:11 PM

#

Need some help with cassandra, if anyone's around. Got a list<text> type, and when I call for cassandra to return it, it only returns the first ten items in the list, regardless of what I set the LIMIT to

brazen charm Sep 3, 2023, 10:22 PM

#

hybrid marsh Need some help with cassandra, if anyone's around. Got a list<text> type, and wh...

What is your query

#

Also in general it's recommended to avoid the list datatype if you can, really you want a set unless you really need duplicates or sorting

hybrid marsh Sep 3, 2023, 10:43 PM

#

Need duplicates

hybrid marsh Sep 3, 2023, 10:44 PM

#

brazen charm What is your query

SELECT wallet FROM madeas LIMIT 1300;
returns only 10 items

brazen charm Sep 3, 2023, 10:47 PM

#

take a screenshot of the rows returned

hybrid marsh Sep 3, 2023, 10:48 PM

#

Only 1 row

brazen charm Sep 3, 2023, 10:48 PM

#

hybrid marsh Need duplicates

Aight, that can be a bit risky though since you can create duplicates without knowing under normal usage

hybrid marsh Sep 3, 2023, 10:48 PM

#

but thats expected because there is only 1 row

brazen charm Sep 3, 2023, 10:48 PM

#

What are you using to query the data?

hybrid marsh Sep 3, 2023, 10:48 PM

#

my problem is the fact that the row thats returned, the list has 12 items, but the query only returns 10

hybrid marsh Sep 3, 2023, 10:48 PM

#

brazen charm What are you using to query the data?

Python datastax driver

brazen charm Sep 3, 2023, 10:49 PM

#

Can i see the code?

brazen charm Sep 3, 2023, 10:49 PM

#

hybrid marsh my problem is the fact that the row thats returned, the list has 12 items, but t...

How do you know it has 12 items btw?

hybrid marsh Sep 3, 2023, 10:51 PM

#

Proper weird. Now I only have 10 items. The table must not have updated in my client (TablePlus). Appreciate the help anyway

torn sphinx Sep 4, 2023, 3:17 AM

#

Had some help with this a couple of days ago but I'm still confused so I'm gonna post again.

import asyncpg as psy
import asyncio

tables = ('test1', 'test2', 'test3')

async def start():
    conn = await psy.connect(user='postgres', password='root', database='postgres', host='localhost')
    async with asyncio.TaskGroup() as tg:
        for fn in tables:
            tg.create_task(table_check(fn))
    async with asyncio.TaskGroup() as tg:
        for fn in tables:
            tg.create_task(search(fn))


async def table_check(i):
    conn = await psy.connect(user='postgres', password='root', database='postgres', host='localhost')
    await conn.execute(f"CREATE TABLE IF NOT EXISTS {i} (value INT)")

async def search(i):
    conn = await psy.connect(user='postgres', password='root', database='postgres', host='localhost')
    print(i)
    a = await conn.execute(f"SELECT COUNT(id) FROM {i}")
    print(a)

loop = asyncio.get_event_loop()
loop.run_until_complete(start())```

#

test1
test3
SELECT 1
SELECT 1
test2
SELECT 1```

#

^ Output

#

It should be printing how many records there are

torn sphinx Sep 4, 2023, 4:22 AM

#

@torn sphinx didn't work g

#

+ Exception Group Traceback (most recent call last):
  |   File "c:\Users\m\Desktop\Test\V1\New\search.py", line 27, in <module>
  |     loop.run_until_complete(start())
  |   File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 
653, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "c:\Users\m\Desktop\Test\V1\New\search.py", line 11, in start
  |     async with asyncio.TaskGroup() as tg:
  |   File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\asyncio\taskgroups.py", line 147, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "c:\Users\m\Desktop\Test\V1\New\search.py", line 24, in search
    |     print(a[0]['count'])
    |           ~~~~^^^^^^^^^
    | TypeError: string indices must be integers, not 'str'```

harsh pulsar Sep 4, 2023, 4:37 AM

#

pure mortar oof. and i guess you were on the analytics side of the house too

actually not in that particular case, but i was responsible for developing something akin to an analytics database... in mongodb 🫠 . lots of $lookup

#

admittedly that's because i had/have poor intuition for mongodb system design and i basically translated it to a relational model. because realistically it was relational data

#

i spent a long time thinking about how much i could precompute to take advantage of the "document" thing, but the computations all involved lookups across entities so the pre-computation upon insert would end up getting expensive, and i thought it was a lot less bad to have slightly slower lookups than to run the risk of accumulating backpressure due to a large number of updates/inserts, because the thing was being fed by a webhook triggered by updates in the cms

#

in hindsight i was probably too worried about that because the group of users who were actively editing things in the cms was small. with the benefit of hindsight i probably could have done without the giant pile of lookups in the main query

#

fortunately the lookups were all lookups on unique keys so we just added some indexes and the performance was good enough that we didnt need to pursue further optimization

pure mortar Sep 4, 2023, 4:41 AM

#

whew wow a happy ending with mongo

#

@brazen charm would maybe like to see this

pure mortar Sep 4, 2023, 4:43 AM

#

harsh pulsar in hindsight i was probably too worried about that because the group of users wh...

sometimes its hard to figure this stuff out in the moment tbh

harsh pulsar Sep 4, 2023, 4:43 AM

#

indeed

pure mortar Sep 4, 2023, 4:44 AM

#

funny enough, one of the products in the portfolio atm only has a small number of users but theyre kinda optimizing too much for them as well but what do i know

#

Oopsies

harsh pulsar Sep 4, 2023, 4:44 AM

#

yeah it takes discipline to not spend too long optimizing something that already has 50ms round trip response time and serves 10 concurrent users max per day

pure mortar Sep 4, 2023, 4:44 AM

#

its taking up two dev teams but not like im in charge

harsh pulsar Sep 4, 2023, 4:44 AM

#

damn ok

pure mortar Sep 4, 2023, 4:45 AM

#

harsh pulsar yeah it takes discipline to *not* spend too long optimizing something that alrea...

damn thats already good

harsh pulsar Sep 4, 2023, 4:45 AM

#

i thought i was bad taking an extra week to polish my data science code for posterity

pure mortar Sep 4, 2023, 4:45 AM

#

i would take 300ms

#

heck yknow how ML stuff is sometimes, sometimes we're looking at SECONDS in latency

harsh pulsar Sep 4, 2023, 4:45 AM

#

pure mortar damn thats already good

python with asyncio and gunicorn+uvicorn is surprisingly fast

pure mortar Sep 4, 2023, 4:45 AM

#

pure mortar heck yknow how ML stuff is sometimes, sometimes we're looking at SECONDS in late...

but usually that stuff is async or internal so its fine

#

~~for now~~

harsh pulsar Sep 4, 2023, 4:45 AM

#

oh yeah for sure

#

and in my case the 50 ms is just within the AWS VPC

pure mortar Sep 4, 2023, 4:46 AM

#

always aws vpc smh

harsh pulsar Sep 4, 2023, 4:46 AM

#

but that's all we need, the current app im finishing up is for an internal api only

#

i like it that way

#

keep that frontend away from me

pure mortar Sep 4, 2023, 4:47 AM

#

harsh pulsar keep that frontend away from me

they will have you redo and redo until the wheels come off the wagon in that world

#

Running

harsh pulsar Sep 4, 2023, 4:48 AM

#

lol

#

i dont understand it

#

this is getting offtopic now but the frontend tooling churn is really insane

pure mortar Sep 4, 2023, 4:49 AM

#

oh yeah i forgot we were in db. oops

harsh pulsar Sep 4, 2023, 4:49 AM

#

anyway currently something like 50% of our latency is just database lookups. id actually be curious about this

#

our main query is something like

select *
from records
where group_id = :1 and created_time <= :2
order by created_time desc
limit 1

and the schema for the table is something like this in timescale:

create table records (
  group_id uuid,
  created_time timestamp,
  -- other attributes ...
  primary key (group_id, created_time)
);
select create_hypertable('records', 'created_time');

and im wondering when i know if i need to add an index on group_id, like what tea leaves should i be reading in the explain output and/or what kinds of metrics should i be collecting about postgres and timescale

#

or does postgres actually create an index for lookups because it's part of the primary key?

#

i can never remember how that works in different databases

#

my intuition was that within each chunk of the hypertable, table scans should be reasonably fast, so i didn't feel like we desperately needed anything special here. i don't have a query plan handy but it seemed pretty "normal"

pure mortar Sep 4, 2023, 4:59 AM

#

hmm yeah if you ever need to optimize, those would be good places to look/investigate first. lmk if you end up doing so

harsh pulsar Sep 4, 2023, 5:02 AM

#

pure mortar hmm yeah if you ever need to optimize, those would be good places to look/invest...

sure. that or i might never touch it again ducky_skull

pure mortar Sep 4, 2023, 5:04 AM

#

||that may be for the best. go solve other problems, salt||

#

Running

paper flower Sep 4, 2023, 5:59 AM

#

harsh pulsar our main query is something like ```sql select * from records where group_id = :...

I guess since group_id, created_time is a composite PK/Index it does not allow you to efficiently filter and sort by created_time

#

Could you try slapping an index on created_time?

torn sphinx Sep 4, 2023, 6:12 AM

#

import asyncpg as psy
import asyncio

tables = ('test1', 'test2', 'test3')

async def start():
    conn = await psy.connect(user='postgres', password='root', database='postgres', host='localhost')
    async with asyncio.TaskGroup() as tg:
        for fn in tables:
            tg.create_task(table_check(fn))
    async with asyncio.TaskGroup() as tg:
        for fn in tables:
            tg.create_task(search(fn))


async def table_check(i):
    conn = await psy.connect(user='postgres', password='root', database='postgres', host='localhost')
    await conn.execute(f"CREATE TABLE IF NOT EXISTS {i} (value INT)")

async def search(i):
    conn = await psy.connect(user='postgres', password='root', database='postgres', host='localhost')
    print(i)
    a = await conn.execute(f"SELECT COUNT(*) FROM {i}")
    print(a)

loop = asyncio.get_event_loop()
loop.run_until_complete(start())```

#

test1
test3
SELECT 1
SELECT 1
test2
SELECT 1```

#

^ This is the output, I want it to be outputting the count, which I tried count(id) (mentioned by someone previously)

waxen finch Sep 4, 2023, 6:32 AM

#

torn sphinx ^ This is the output, I want it to be outputting the count, which I tried `count...

see the API reference:
https://magicstack.github.io/asyncpg/current/api/index.html#asyncpg.connection.Connection.execute
conn.execute() returns the status of the query (which is apparently an sql string?), but what you want is the row/values of that query, so you should use fetch() or fetchrow() or fetchval() instead: ```py
rows = await conn.fetch("SELECT ...")
row = rows[0]
count = row[0]

OR

row = await conn.fetchrow("SELECT ...")
count = row[0]

OR

count = await conn.fetchval("SELECT ...")```

torn sphinx Sep 4, 2023, 6:42 AM

#

waxen finch see the API reference: <https://magicstack.github.io/asyncpg/current/api/index.h...

Appreciate it mate

torn sphinx Sep 4, 2023, 6:59 AM

#

waxen finch see the API reference: <https://magicstack.github.io/asyncpg/current/api/index.h...

Do you know if you're able to add columns to a table?

#

Like after it's created

waxen finch Sep 4, 2023, 7:00 AM

#

torn sphinx Do you know if you're able to add columns to a table?

there's a lot of things you can alter about a table post-creation
https://www.postgresql.org/docs/current/sql-altertable.html

#

i mostly toy around with postgresql through a pgadmin container though, so i wouldnt be aware of specific details

torn sphinx Sep 4, 2023, 7:06 AM

#

waxen finch there's a lot of things you can alter about a table post-creation <https://www.p...

Appreciate it g. Also some other questions, I'm storing a lot of data. I planned on doing in separate tables as some have similar data but others have a lot different, I could end up with about 50 different columns in total. Should I do this all in one table (storing no more than 50b records) or in hundreds possibly thousands of tables.

waxen finch Sep 4, 2023, 7:13 AM

#

i think having that many tables would be cumbersome, but i don't know enough about the data you're modelling to give an appropriate answer (im no professional database designer either, so i probably wouldnt know anyway)

#

actually why would you have thousands of tables if the total number of columns is ~50?

#

or did you mean ~50 columns for each table

#

if you dont need to enforce any constraints on the data or access/modify individual columns (i.e. selecting the entire row), perhaps you'd be interested in denormalizing your schema with json[b] columns, arrays, etc.

torn sphinx Sep 4, 2023, 7:27 AM

#

waxen finch or did you mean ~50 columns for each table

If I do one table, I'd have a lot of columns do to the different types of data being stored. The issue is I'll need to do searches through all of the tables if I have multiple. meaning to find results (anything with related data needs to be pulled) I'd have to loop through each table.

torn sphinx Sep 4, 2023, 7:27 AM

#

waxen finch if you dont need to enforce any constraints on the data or access/modify individ...

I've never used constraints, not sure how they work.

waxen finch Sep 4, 2023, 7:30 AM

#

torn sphinx I've never used constraints, not sure how they work.

er, a constraint generally defines some rule/assertion about the data, like a column being NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, etc.

torn sphinx Sep 4, 2023, 7:33 AM

#

waxen finch er, a constraint generally defines some rule/assertion about the data, like a co...

I won't really have those, the data is all pretty much csv files I'm dumping. Most of the columns will probably be text aswell (to avoid issues)

waxen finch Sep 4, 2023, 7:56 AM

#

torn sphinx I won't really have those, the data is all pretty much csv files I'm dumping. Mo...

oh, so you're debating between giving each csv file their own table vs making it available in one? if you again dont want to constrain the data or modify it, i imagine a table with an indexed filename and an array column would be okay for querying sql CREATE TABLE datasets (filename TEXT, "row" TEXT[]); CREATE INDEX ON datasets (filename); though the lack of a schema makes it hard to work with the COPY command, so perhaps that's not a good idea

torn sphinx Sep 4, 2023, 7:57 AM

#

waxen finch oh, so you're debating between giving each csv file their own table vs making i...

What's an indexed filename?

torn sphinx Sep 4, 2023, 7:57 AM

#

waxen finch oh, so you're debating between giving each csv file their own table vs making i...

Not sure what the copy command is either, I made something like what I'm trying to make like 2 months ago but it was using pyscopg

#

I used pandas to convert csv files into tables which would then dump the data, I typed in the columns that needed to be created and it would create the table based off of the csv file's name

waxen finch Sep 4, 2023, 7:58 AM

#

torn sphinx What's an indexed filename?

i just mean using the filename of each CSV you're importing as a way to distinguish which rows come from which file

torn sphinx Sep 4, 2023, 7:58 AM

#

And by an array column do you mean I'd have one column which would be the entire row rather than having a column for each value?

torn sphinx Sep 4, 2023, 7:59 AM

#

waxen finch i just mean using the filename of each CSV you're importing as a way to distingu...

That's what I planned to do

waxen finch Sep 4, 2023, 8:04 AM

#

torn sphinx Not sure what the copy command is either, I made something like what I'm trying ...

its there to import/export files on the server directly to/from tables, csv format included

waxen finch Sep 4, 2023, 8:04 AM

#

torn sphinx And by an array column do you mean I'd have one column which would be the entire...

yeah

waxen finch Sep 4, 2023, 8:07 AM

#

waxen finch oh, so you're debating between giving each csv file their own table vs making i...

amending the schema to store the filename in a single place (less duplication/smaller storage requirement), along with its columns: sql CREATE TABLE dataset ( id SERIAL PRIMARY KEY, filename TEXT UNIQUE, columns TEXT[] ); CREATE TABLE dataset_row ( dataset_id INTEGER REFERENCES dataset (id), "row" TEXT[] ); CREATE INDEX ON dataset_row (dataset_id);

torn sphinx Sep 4, 2023, 8:12 AM

#

waxen finch amending the schema to store the filename in a single place (less duplication/sm...

What's the reason for creating two different tables?

waxen finch Sep 4, 2023, 8:14 AM

#

less duplication, the filename is stored once and a number is used in-place of it instead of storing 5 billion filenames

#

same goes for the columns column

#

alternatively you can look at it and say one csv file has many rows, a.k.a. a one-to-many relationship

torn sphinx Sep 4, 2023, 8:25 AM

#

waxen finch less duplication, the filename is stored once and a number is used in-place of i...

Basically relearning this so I'll lyk what I'm thinking is happening. You're creating a table which has a SERIAL PRIMARY KEY column under the name id. You then create a column called filename (text), which is unique. Meaning that only one of whatever text is inside of it can exist, not sure how it manages that though. Then you create another column which is test, I think the [] turns it into an array. Then you create another table which is linking a column it creates to dataset id, I think this means that its grabbing each unique row for each table name aka filename. Then a row with another column called text which is an array.

#

Not sure what the index thing is, also half certain about what's happening

waxen finch Sep 4, 2023, 8:40 AM

#

torn sphinx Not sure what the index thing is, also half certain about what's happening

i need some sleep now, but for the time being you can check out postgres's docs about indexing and foreign keys

PostgreSQL Documentation

11.1. Introduction

11.1. Introduction Suppose we have a table similar to this: CREATE TABLE test1 ( id integer, content varchar ); and the …

PostgreSQL Documentation

5.4. Constraints

5.4. Constraints 5.4.1. Check Constraints 5.4.2. Not-Null Constraints 5.4.3. Unique Constraints 5.4.4. Primary Keys 5.4.5. Foreign Keys 5.4.6. Exclusion Constraints Data …

waxen finch Sep 4, 2023, 2:55 PM

#

each interaction gets its own identifier, and the interaction_user table associates many users with many interactions (in other words a user can be in many interactions, and an interaction can have many users)

torn sphinx Sep 4, 2023, 3:02 PM

#

Hey guys. I saw that dynamoDb returns paginated responses. Isn't that gonna slow down the data retrieval rate if there are a lot of rows

#

it's only giving 700 rows at a time. So could be 100 calls to the api

stiff radish Sep 4, 2023, 3:08 PM

#

50/50 between here and #software-architecture but for those that use raw SQL, how do you guys keep it organised?

Having queries everywhere is not DRY whatsoever on the other hand stored procs or views place business logic in the DB.

I'm a bit "tired" of ORMs and buying into a unit of work and a repository pattern on top of MVC but a compromise would be a query builder a la SQlalchemy core.

paper flower Sep 4, 2023, 3:11 PM

#

stiff radish 50/50 between here and <#782713858615017503> but for those that use raw SQL, how...

Honestly I just use ORM where it's possible, using a query builder would essentially provide you with the same result as writing raw sql

#

And ORM really just maps your results to objects, something you'd probably do yourself anyway pithink

hoary onyx Sep 4, 2023, 3:31 PM

#

Yall know hwere can i find a massive list of websites?

coral wasp Sep 4, 2023, 3:41 PM

#

stiff radish 50/50 between here and <#782713858615017503> but for those that use raw SQL, how...

Are you familiar with dbt? I primarily do analytical stuff, so almost all straight sql

stiff radish Sep 4, 2023, 3:42 PM

#

coral wasp Are you familiar with dbt? I primarily do analytical stuff, so almost all straig...

Yes but it's a different use case

pearl lodge Sep 4, 2023, 3:42 PM

#

guys, i need help settling up my database:

#

i don't know why my username and password is not working

#

i'm trying to settup atlas

pure mortar Sep 4, 2023, 4:08 PM

#

coral wasp Are you familiar with dbt? I primarily do analytical stuff, so almost all straig...

oh hey congrats on the helper role Praise

stiff radish Sep 4, 2023, 4:29 PM

#

paper flower Honestly I just use ORM where it's possible, using a query builder would essenti...

Yeah this makes the most sense for my use case. You told me last time and I've decided I'll do that

#

For analytics type stuff I don't mind managing raw SQL because code there doesn't really really have to be DRY because the same concept is used over and over. If it's reused I just version control it 🤷

paper flower Sep 4, 2023, 4:46 PM

#

stiff radish For analytics type stuff I don't mind managing raw SQL because code there doesn'...

It would still make sense to use core/orm becaues that way your query results are typed and if something changes in your model (e.g. column name) it would change in that query too

stiff radish Sep 4, 2023, 5:05 PM

#

paper flower It would still make sense to use core/orm becaues that way your query results ar...

For me speed and memory usage are a big big requirement like I mentioned. I use https://github.com/sfu-db/connector-x so I don't go OOM.

GitHub

GitHub - sfu-db/connector-x: Fastest library to load data from DB t...

Fastest library to load data from DB to DataFrames in Rust and Python - GitHub - sfu-db/connector-x: Fastest library to load data from DB to DataFrames in Rust and Python

#

For the web stuff I do this isn't so much of a requirement but honestly, using connector-x was an incredible speedup.

torn sphinx Sep 4, 2023, 5:11 PM

#

waxen finch i need some sleep now, but for the time being you can check out postgres's docs ...

"If there are many rows in test1 and only a few rows (perhaps zero or one) that would be returned by such a query, this is clearly an inefficient method." Should I not use an index as I can expect multiple returns, possibly hundreds?

paper flower Sep 4, 2023, 5:15 PM

#

stiff radish For me speed and memory usage are a big big requirement like I mentioned. I use ...

I think you already mentioned, and it really depends on how you used your orm

#

Nothing would be able to load a large dataset into memory for example

stiff radish Sep 4, 2023, 5:16 PM

#

I do manage loading tens of millions of rows, which I all need at the same time so I can't do it in batch (or trust me I would). It's worth looking at the project because you never know, I don't think the current ORMs could handle this.

waxen finch Sep 4, 2023, 5:18 PM

#

torn sphinx ["If there are many rows in test1 and only a few rows (perhaps zero or one) that...

having a lot of rows to select does not necessarily mean that an index is inappropriate, its the ratio of rows you're looking for compared to the total rows being searched through - if you had a pool of 5 billion rows and you specifically want a million rows associated with a CSV file, not having an index would mean scanning all 5 billion rows to pick out the million rows you want

#

so relatively speaking, you still only want a few rows at a time

torn sphinx Sep 4, 2023, 5:20 PM

#

I see okay, so I should 100% be using indexes. What I don't get is how they work, and what they effect.

#

In what ways will it change?

long leaf Sep 4, 2023, 5:24 PM

#

lmao

torn sphinx Sep 4, 2023, 5:25 PM

#

why is this being posted here

long leaf Sep 4, 2023, 5:25 PM

#

wrong channel

torn sphinx Sep 4, 2023, 5:25 PM

#

ah okok

long leaf Sep 4, 2023, 5:25 PM

#

thought this was regular chat

waxen finch Sep 4, 2023, 5:57 PM

#

torn sphinx I see okay, so I should 100% be using indexes. What I don't get is how they work...

there are several algorithms/data structures involved in indexing, but binary trees are a pretty common one so you can think about that first - if you were to convert this list of numbers, [1, 1, 2, 3, 5, 8, 13, 21], into a sorted binary tree, you end up with something like this: 3 / \ 1 8 / \ / \ 1 2 5 13 \ 21 how do you find 5 in the binary tree? you start from the root and work your way down: 3 is less than 5, so you move right, and 8 is more than 5, so you move left, and you find that 5 exists
this only required 3 comparisons, whereas if you were to find 5 in the list, assuming you didn't know the list was sorted, you would need to start from the beginning and check every number until you found 5
in the worst case that would take 8 comparisons, which would be 2.6x slower than the btree

when you create an index on dataset_row.dataset_id, every row gets a corresponding entry in your index containing dataset_id and a pointer to where the row is stored on disk, so out of 5 billion rows, you'd only need to make at most 32 comparisons to find a row matching a specific dataset_id

torn sphinx Sep 4, 2023, 6:01 PM

#

waxen finch there are several algorithms/data structures involved in indexing, but binary tr...

That makes sense, that's pretty cool thanks for writing this lol. But yeah I get the idea now, I'll use it because that is a huge difference, especially when dealing with billions of records.

waxen finch Sep 4, 2023, 6:02 PM

#

though after the index match, postgres doesn't make any guarantees about how it orders rows with the same dataset_id, unless you specify a column with the ORDER BY clause - if you do care about the order of your data, you might want to add a position column and add it to your index so you can efficiently sort by (dataset_id, position)

torn sphinx Sep 4, 2023, 6:07 PM

#

waxen finch though after the index match, postgres doesn't make any guarantees about how it ...

Order won't really matter, It'll all be one table so as long as it matches the order isn't important

waxen finch Sep 4, 2023, 6:09 PM

#

when you want to write down an interaction between two users, you insert three rows; first row goes into interaction, then one row for each user participating in that interaction sql INSERT INTO interaction (id, cmd, count) VALUES (1234, 'rock-paper-scissors', 1); INSERT INTO interaction_user (interaction_id, user_id) VALUES (1234, 9876), (1234, 5432);

coral wasp Sep 4, 2023, 6:38 PM

#

torn sphinx That makes sense, that's pretty cool thanks for writing this lol. But yeah I get...

There’s a tradeoff with indices: lookups become faster but writes/updates become slower. In most common cases, the tradeoff is worth it…. You don’t notice any slowdown? But at high throughput rates, the cost becomes an issue

torn sphinx Sep 4, 2023, 6:47 PM

#

coral wasp There’s a tradeoff with indices: lookups become faster but writes/updates become...

Wdym by "high throughput rates"? Curious on this, wondering if it's worth it.

coral wasp Sep 4, 2023, 6:51 PM

#

torn sphinx Wdym by "high throughput rates"? Curious on this, wondering if it's worth it.

Usually the cost of an index is so small it’s not noticeable. Just be aware that the cost is not zero: so adding indexes everywhere is not a good idea. It’s somewhat hard to quantify.

torn sphinx Sep 4, 2023, 6:52 PM

#

Interesting okay, I'll keep that in mind thank you.

torn sphinx Sep 4, 2023, 6:53 PM

#

waxen finch there are several algorithms/data structures involved in indexing, but binary tr...

Doing what you said, having an array with the data, I'm not sure how I'd go about it. I plan on being able to let my co workers search through this data using filters. For example, first name, last name, middlename, age, email, occupation, phone number, experience, etc

harsh pulsar Sep 4, 2023, 7:29 PM

#

paper flower I **guess** since `group_id, created_time` is a composite PK/Index it does not a...

Well that's what I was wondering, that seems to be kind of what timescale helps with

#

We're always filtering on both, never just one

#

The data is already chunked/partitioned on creation time, and then we do a primary key look up within each partition. seems like it can't get better than that in theory

shut tiger Sep 4, 2023, 7:31 PM

#

There is an implicit return None too. Maybe you should look at an ORM like django or sqlalchemy.

paper flower Sep 4, 2023, 7:54 PM

#

harsh pulsar Well that's what I was wondering, that seems to be kind of what timescale helps ...

But your pk order is "wrong" I'd say

#

Can you change it to time, group_id?

#

See if there's a difference

#

In the end your order by could be slow, did you try profiling that query?

waxen finch Sep 4, 2023, 7:58 PM

#

torn sphinx Doing what you said, having an array with the data, I'm not sure how I'd go abou...

hmm, with the current schema you can do that, for example given the dataset us-counties-2023.csv: ```sql
-- columns = {date,county,state,fips,cases,deaths}

-- To sort rows by highest cases:
SELECT dr."row" FROM dataset_row dr
JOIN dataset d ON d.id = dr.dataset_id
WHERE d.filename = 'us-counties-2023.csv'
ORDER BY dr."row"[array_position(d.columns, 'cases')]::integer DESC``` you can also consider adding a GIN index on row to improve comparisons on individual elements within it but only if you find your query's performance is too slow, as it'll be a very expensive index for your database to maintain when you have a lot of rows

harsh schooner Sep 4, 2023, 8:01 PM

#

is anybody willing to code review my some 250 line long sqlite abstraction?? 🙃

trail rune Sep 4, 2023, 10:43 PM

#

Where do I put attributes that two entities share?
To start, I had a many-to-many relationship between CARD and CARDPARTS. A card can be related to zero, one, or many card parts. A card part can be related to one or many cards. To solve this relationship I tried putting an intersection entity between them: a card to card part mapping. Where my question comes from is that cards and card parts have some unique attributes and some that they share. For example, booster is a unique attribute of card; component is a unique attribute of card parts; and they both have the attribute name. Do I put the shared attributes in the intersection entity? Do I keep them in the CARD and CARDPARTS entities? Do I do something else?
If you need more information, please don't hesitate to ask.
Thank you.

harsh pulsar Sep 4, 2023, 11:47 PM

#

paper flower In the end your order by could be slow, did you try profiling that query?

Timescaledb has some kind of storage / query planner optimization and this is how they recommend querying the "most recent" item that matches a criterion

#

I can definitely try some benchmarks with the PK order swapped, ill also have to look carefully at the explain output after the data is more populated and there are realistic table stats

#

I think someone mentioned a pg profiling tool above? Or maybe that was somewhere else. I can look into it too

harsh pulsar Sep 4, 2023, 11:50 PM

#

trail rune Where do I put attributes that two entities share? To start, I had a many-to-ma...

the "name" on a card part seems like it's only coincidentally and superficially the same as the "name" on a card

coral wasp Sep 4, 2023, 11:54 PM

#

harsh pulsar our main query is something like ```sql select * from records where group_id = :...

The first thing you should look at in the explain output is whether the indexes are actually used. That’s 90% of what I look for in explains. Then the question is why not: was the index not usable for the query or simply not helpful (no benefit). Sometimes a missing compound index or something similar is the root problem. (Apologize if this is obv/basic, but most of the time I see people who are just missing the ‘correct’ compound index)

#

And if your main query is looking up the last record for each group, perhaps that could be denormalized and maintained separately (ie: on the group table). Or, perhaps a partitioning strategy

harsh pulsar Sep 5, 2023, 1:35 AM

#

coral wasp The first thing you should look at in the explain output is whether the indexes ...

good advice, thank you

harsh pulsar Sep 5, 2023, 1:36 AM

#

coral wasp And if your main query is looking up the last record for each group, perhaps tha...

the partitioning i believe is what timescale does automatically with its "hypertable"

#

maintaining this as two separate tables would be kind of a nightmare, it would have to be kept up to date in real time

coral wasp Sep 5, 2023, 1:38 AM

#

And could you use a last() aggregate, rather than a sort and limit?

coral wasp Sep 5, 2023, 1:39 AM

#

harsh pulsar the partitioning i believe is what timescale does automatically with its "hypert...

An aggregate last() might be optimized diff than what you’re doing (filter-sort-limit)

obsidian basin Sep 5, 2023, 1:39 AM

#

I am getting the error caused by the foreign key being NOT NULL but I made the Foreign key NULLABLE. in flask-sqlalchemy. How do I fix this? Thanks for the help. Please ping on reply.

class User(UserMixin, db.Model):
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(120), unique=True) 
    rel_payments = db.relationship('Payments', backref='profileinfo', lazy=True) 

class Payments(db.Model):
    '''
    One to many relationship
    This is the Many relationship. 
    '''

    id = db.Column(db.Integer, primary_key=True)
    item_name = db.Column(db.String(80))
    price_of_donation = db.Column(db.Integer)
    # How do I turn email into the foreign key? todo.
    email = db.Column(db.String(120))
    fk_user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=True)

Here is the code in routes.py
https://paste.pythondiscord.com/MIDA

Here is the error.
https://paste.pythondiscord.com/37TQ

harsh pulsar Sep 5, 2023, 1:40 AM

#

coral wasp An aggregate last() might be optimized diff than what you’re doing (filter-sort-...

yeah the timescale docs actually specifically advise not to use last() because it doesn't use the hypertable chunking or indexing effectively, that's only meant for groupby aggreation. apparently order + limit is their recommended idiom here

coral wasp Sep 5, 2023, 1:40 AM

#

Oh, interesting.

harsh pulsar Sep 5, 2023, 1:40 AM

#

https://docs.timescale.com/api/latest/hyperfunctions/last/

The last and first commands do not use indexes, they perform a sequential scan through the group. They are primarily used for ordered selection within a GROUP BY aggregate, and not as an alternative to an ORDER BY time DESC LIMIT 1 clause to find the latest value, which uses indexes.

Timescale Docs

Get the last value in one column when rows are ordered by another column

harsh pulsar Sep 5, 2023, 1:41 AM

#

obsidian basin I am getting the error caused by the foreign key being NOT NULL but I made the F...

can you check the actual table schema and confirm that it's nullable in the database?

obsidian basin Sep 5, 2023, 1:43 AM

#

@harsh pulsar You were correct let me try again and see if the error still appears , if this is the case sorry for such a dumb question

coral wasp Sep 5, 2023, 1:43 AM

#

harsh pulsar https://docs.timescale.com/api/latest/hyperfunctions/last/ > The last and first ...

Maybe reduce the search window? Instead of “before the end date”, add a start date. And, this could be computed / materialized infrequently. Like: once a day, compute the last event for each group.

harsh pulsar Sep 5, 2023, 1:44 AM

#

obsidian basin <@389497659087650836> You were correct let me try again and see if the error sti...

can't hurt to consider using a migration framework here. alembic integrates with sqlalchemy for example

harsh pulsar Sep 5, 2023, 1:45 AM

#

coral wasp Maybe reduce the search window? Instead of “before the end date”, add a start da...

interesting. i wonder if that actually makes a difference. i have a feeling that internally they're able to do an efficient lookup on the timestamp alone. but i can mess around with it and benchmark some things. as for precomputing, no, this needs to happen on-line

#

(to be clear, the current performance is more than acceptable)

coral wasp Sep 5, 2023, 1:46 AM

#

My suggestion was to precompute something that just narrows the space; even if it’s out of date, it might cut the query from ‘forever until today’ to a few days.

#

So, it’s online/real-time, with the crutch of a narrower range

#

The real test is: how well will this system respond to queries as the table gets larger: will the query remain responsive.

#

Im genuinely curious, haven’t worked with this in pg.

harsh pulsar Sep 5, 2023, 1:49 AM

#

coral wasp So, it’s online/real-time, with the crutch of a narrower range

my limited intuition about how these datetime indexes work is that they don't really need that kind of coaxing, but i can definitely experiment with it. can also maybe reach out to timescale since we're enterprise customers

#

basically we just need the "most recent previous" record for a given entity, but that could be arbitrarily far in the past or half a second ago

coral wasp Sep 5, 2023, 1:50 AM

#

Yah, if it’s an ~O(1) operation than yah, o worries

harsh pulsar Sep 5, 2023, 1:50 AM

#

yeah no idea, would love to know

coral wasp Sep 5, 2023, 1:51 AM

#

Me too! (Let me know if you do find out)

harsh pulsar Sep 5, 2023, 1:51 AM

#

coral wasp The real test is: how well will this system respond to queries as the table gets...

that's what i'm interested in too. my week coming up is mostly dedicated to QAing this app anyway, so i can hopefully work in a load test

#

won't be that realistic but at least blasting the system with a few hundred concurrent requests from my laptop should at least give us a rough sense of stability under load

#

the db is definitely the biggest perf bottleneck rn. the python app is maybe 1/3 of the time, and most of that is spent inside fastapi

#

ive been wanting to try pypy as well for this

obsidian basin Sep 5, 2023, 1:54 AM

#

@harsh pulsar I tried deleting the migration folder and app.db. Then recreating them by step 1 flask db init. Step 2 flask db migrate -m "Initial migration." . Step 3 flask db upgrade. Then I ran the code in the terminal. Here is the documentation on flask migrate. https://flask-migrate.readthedocs.io/en/latest/

coral wasp Sep 5, 2023, 1:58 AM

#

harsh pulsar won't be *that* realistic but at least blasting the system with a few hundred co...

Yah. My two cents is: consider data generation of longer histories… and maybe measure query performance vs db size.

harsh pulsar Sep 5, 2023, 2:14 AM

#

coral wasp Yah. My two cents is: consider data generation of longer histories… and maybe me...

good thinking on these. i was doing the former but not for the specific intention of load testing

paper flower Sep 5, 2023, 5:34 AM

#

harsh pulsar my limited intuition about how these datetime indexes work is that they don't re...

I think since there's no lower bound it may be checking all of your partitions, if you often need to retrieve latest values I'd store them in a separate table

craggy gust Sep 5, 2023, 5:48 AM

#

Sqlalchemy 1.4 and postgress, is there a way to load a row from different tables in a single query? With joins and such? Theh have foreingkeys within them to sort of mapping tables, no relationships, yet.

#

I mean like
Obj1, obj2, obj3 = db.query(table1).join(table2) etc

potent spire Sep 5, 2023, 7:04 AM

#

is there a way to edit same database from diffrent projects?

paper flower Sep 5, 2023, 7:13 AM

#

craggy gust I mean like Obj1, obj2, obj3 = db.query(table1).join(table2) etc

Query is legacy api

#

But yes

#

stmt = select(Table1, Table2).join(Table1.table2)
for table1, table2 in session.execute(stmt):
    ...

#

Also I'd rather just use relationships pithink

wise goblet Sep 5, 2023, 8:03 AM

#

trail rune Where do I put attributes that two entities share? To start, I had a many-to-ma...

still lacking understanding what u describe
try to write example in terms of dictionaries/classes/pure python code here that provides 2 examples of data

craggy gust Sep 5, 2023, 8:10 AM

#

paper flower Also I'd rather just use relationships <:pithink:652247559909277706>

Ye it's moving towards that for sure. It's a legacy system. Thanks for the info.

paper flower Sep 5, 2023, 8:11 AM

#

craggy gust Ye it's moving towards that for sure. It's a legacy system. Thanks for the info.

You can use 2.0 style querying with 1.4 version of sqlalchemy

#

And relationships should be the same too

craggy gust Sep 5, 2023, 8:13 AM

#

Yeah but trying to keep the codebase consistent before actually updating stuff. Adding relations should fix what I need to happen for now should be fine.

nocturne depot Sep 5, 2023, 11:00 AM

#

Guys I am making a program which will connect with remote mysql server and fetch data. I am utilizing mysql-connector-python and sqlalchemy. Is there a way to add proxy from my side to connect with database

harsh pulsar Sep 5, 2023, 11:15 AM

#

paper flower I think since there's no lower bound it may be checking all of your partitions, ...

interesting, ill look into this. i would hope that it's wise enough to avoid that though. also i don't think the separate table is worth the maintenance burden, it also would place much stricter requirements on transaction isolation, no? possibly requiring full serializability. might be significantly slower under load unless there's some efficient design idiom for that

paper flower Sep 5, 2023, 12:21 PM

#

harsh pulsar interesting, ill look into this. i would hope that it's wise enough to avoid tha...

Really depedns on how you use it, if you often write to specific group - then probably not 🤔

#

I would still try adding lower bound, changing query itself (e.g. removing some of it's parts), adding indexes, changing pk order

harsh pulsar Sep 5, 2023, 12:50 PM

#

paper flower Really depedns on how you use it, if you often write to specific `group` - then ...

hmmm good point. i think there is still somewhat of a concurrency concern within each group but maybe not so severe

paper flower Sep 5, 2023, 12:51 PM

#

harsh pulsar hmmm good point. i think there is still somewhat of a concurrency concern within...

Yeah, I understand

#

I had a similar situation, but with significantly lower database load - we were storing historing records in the same table, which becomes hard to query latest values from

harsh pulsar Sep 5, 2023, 12:51 PM

#

my bigger concern would be that the cost of maintaining that extra table would add more latency than we would save by optimizing the query... but i love all the suggestions, this is giving me a lot of things i can try this week

paper flower Sep 5, 2023, 12:51 PM

#

e.g. status of some kind of object

paper flower Sep 5, 2023, 12:52 PM

#

harsh pulsar my bigger concern would be that the cost of maintaining that extra table would a...

Yeah, I'd just fiddle with the query, indexes and primary key pithink

slender hollow Sep 5, 2023, 12:58 PM

#

Hey guys, I trying to create a csv file with pandas that import multiple csv and concatenate into one dataframe

I already tried every way I know, I dont know too much 🥲

coral wasp Sep 5, 2023, 1:01 PM

#

slender hollow Hey guys, I trying to create a csv file with pandas that import multiple csv and...

Share what you've got? If it's complicated, maybe open a help thread #❓｜how-to-get-help

harsh pulsar Sep 6, 2023, 12:28 AM

#

@coral wasp @paper flower reporting back from the world of query optimization: my lookup is reduced to a single index scan, but it does appear to be scanning every hypertable. had to work on other things today so i didn't get a chance to experiment and see if pre-filtering reduced the number of hypertables to be scanned. i will have a much more realistic amount of data loaded into the database after i get through my load testing tomorrow, so i will revisit this then

paper flower Sep 6, 2023, 12:31 AM

#

harsh pulsar <@738234281146712084> <@216540595403882496> reporting back from the world of q...

What did you change though? 🤔

coral wasp Sep 6, 2023, 12:32 AM

#

Hmm, kinda what i was expecting, I don't know hypertables but I do know snake oil 😉 (also known as tanstaafl)

torn sphinx Sep 6, 2023, 2:37 AM

#

waxen finch hmm, with the current schema you can do that, for example given the dataset [us-...

What if I choose to insert the data from the csv file into the database, then retrieve the information?

waxen finch Sep 6, 2023, 3:01 AM

#

torn sphinx What if I choose to insert the data from the csv file into the database, then re...

you'll need to insert one row into dataset with your filename/column names, and then all of the rows from that CSV into dataset_row, where dataset_row.dataset_id matches whatever id was given for your dataset

torn sphinx Sep 6, 2023, 3:13 AM

#

waxen finch you'll need to insert one row into `dataset` with your filename/column names, an...

I'm confused on how I should go about the column. I'm making one then how should I input the data inside? I'd need to know what the data is so wouldn't I need to store the format or something.

waxen finch Sep 6, 2023, 3:14 AM

#

im not sure what else you need to know about the "data", assuming that refers to individual rows

#

once you've parsed the csv into a bunch of lists (see csv.reader()), you can insert each of them directly into the dataset.columns/dataset_row."row" column and asyncpg will convert it into the appropriate array type

torn sphinx Sep 6, 2023, 3:29 AM

#

waxen finch once you've parsed the csv into a bunch of lists (see [csv.reader()](<https://do...

I don't get how the data is being formatted. Am I taking the string and turning it into a list, if I am am I keeping it one string? Or am I taking each value and putting it in as a different item. If I do that how do I what the values belong to once I retrieve it from pg

waxen finch Sep 6, 2023, 3:31 AM

#

#databases message
one column is storing the entire row as a TEXT[] array, so what you should be inserting is a list of values

#

if you did want to insert each value individually, you would need another column to identify the row number that value is associated with alongside the dataset_id (and also a column number to help define the order of those values)

torn sphinx Sep 6, 2023, 3:39 AM

#

waxen finch https://canary.discord.com/channels/267624335836053506/342318764227821568/114816...

Issue with this is I wouldn't know what values were what. I'd have to write a script that checks the csv file name then sets it up from there

waxen finch Sep 6, 2023, 3:41 AM

#

you do, dataset_row.dataset_id lets you check what dataset its from

#

every row in dataset_row can be associated with a dataset because of that foreign key

torn sphinx Sep 6, 2023, 3:45 AM

#

So in dataset_row I'm storing the format name, age, occupation, blah blah2. I'm so confused rn,

waxen finch Sep 6, 2023, 3:45 AM

#

the column names? no, you can store that once in the dataset.columns TEXT[] column

#

maybe we should walk through the data you're going to store and understand how to describe it in two tables

torn sphinx Sep 6, 2023, 3:52 AM

#

waxen finch the column names? no, you can store that once in the `dataset.columns TEXT[]` co...

Okay I get it now

torn sphinx Sep 6, 2023, 3:52 AM

#

waxen finch maybe we should walk through the data you're going to store and understand how t...

Yeah sure

torn sphinx Sep 6, 2023, 4:21 AM

#

Should I use pandas?

waxen finch Sep 6, 2023, 4:25 AM

#

thats a really vague question so its hard to answer, but if its more specific someone else who uses pandas could help you out

waxen finch Sep 6, 2023, 4:31 AM

#

torn sphinx Yeah sure

To start, in a relational database you normally have a fixed number of tables, and those tables should be able to describe your data in a way that makes it hard to store duplicate or invalid information. For your data, you have a bunch of CSV files, where each file has a filename, a list of column names, and a list of rows. Every row should be associated with a filename and column names, otherwise your row would be meaningless, so you might think of putting them in one table: sql CREATE TABLE dataset_row ( filename TEXT, columns TEXT[], "row" TEXT[] ); However, when you want to insert a bunch of rows from one file, you have to duplicate the filename and columns for every row: sql INSERT INTO dataset_row VALUES ('my.csv', '{col1,col2,col3}', '{123,456,789}'); INSERT INTO dataset_row VALUES ('my.csv', '{col1,col2,col3}', '{321,654,987}'); ... That increases wasted storage space, and makes it possible to say, have two rows in the same file with entirely mismatched column names. Ideally you would define the filename and columns one time, and then have each row refer to that same piece of information, such as: sql INSERT INTO dataset (id, filename, columns) VALUES (1, 'my.csv', '{col1,col2,col3}'); INSERT INTO dataset_row (dataset_id, "row") VALUES (1, '{123,456,789}'); INSERT INTO dataset_row (dataset_id, "row") VALUES (1, '{321,654,987}'); Each row in dataset now represents a CSV file, and each row in dataset_row corresponds to one of your CSV files

torn sphinx Sep 6, 2023, 4:34 AM

#

waxen finch To start, in a relational database you normally have a fixed number of tables, a...

So dataset will be a row with a unique filename and column. Then dataset_row will be all the rows

waxen finch Sep 6, 2023, 4:46 AM

#

As a sidenote, not all relational databases have array types, and an array column can often be represented with another table, for example: sql CREATE TABLE dataset_column ( dataset_id INTEGER REFERENCES dataset (id), name TEXT ); This conforms to 1NF and is normally what you want to design, but arrays tend to be useful when you have a bunch of values that you will often work with as a single item

#

i probably should have explained that earlier because i initially thought you wouldnt need to access/sort by individual elements in each row, which in hindsight made it kind of pointless

#

however postgres provides various operators and functions for working with arrays, so its not a deadend to be using it here

torn sphinx Sep 6, 2023, 4:53 AM

#

waxen finch As a sidenote, not all relational databases have array types, and an array colum...

What is the dataset_id INTERGER REFERENCES dataset (id) doing?

waxen finch Sep 6, 2023, 4:54 AM

#

REFERENCES dataset (id) is a constraint, or more specifically a foreign key constraint, which says that dataset_id must refer to an existing row in the dataset table matching the id column

#

its how you associate tables with other tables, and is often needed to safely join two tables together: sql -- Select the rows of any file which has a 'date' column SELECT d.filename, dr."row" FROM dataset AS d JOIN dataset_column AS dc ON d.id = dc.dataset_id JOIN dataset_row AS dr ON d.id = dr.dataset_id WHERE dc.name = 'date';

torn sphinx Sep 6, 2023, 5:16 AM

#

waxen finch its how you associate tables with other tables, and is often needed to safely jo...

I've never seen something like this before. Can you explain what's happening?

waxen finch Sep 6, 2023, 5:25 AM

#

JOIN table lets you combine the columns of another table, and in the example ON d.id = dr.dataset_id specifies that rows should be joined when they have the same dataset ID
For example, if you have dataset (1, 'birth-rates.csv'), a dataset column (1, 'year'), and another dataset column (1, 'rate'), joining dataset with the dataset columns by ID will give you (1, 'birth-rates.csv', 'year') and (1, 'birth-rates.csv', 'rate')

#

see also this interactive tutorial on joins https://sqlbolt.com/lesson/select_queries_with_joins

torn sphinx Sep 6, 2023, 5:28 AM

#

waxen finch `JOIN table` lets you combine the columns of another table, and in the example `...

Alright cool thanks. I'm gonna head to bed now when I'm free I'm gonna come back I'll probably respond as I have short term memory loss lol. But yeah I'm gonna get started tomorrow, thanks.

sonic plaza Sep 6, 2023, 9:52 AM

#

Hey where can I learn sqlite3?

fading patrol Sep 6, 2023, 11:27 AM

#

sonic plaza Hey where can I learn sqlite3?

On your computer. Just dive in. There are endless tutorials online. Which one is best depends on what you're trying to build and how you like to learn.

potent spire Sep 6, 2023, 1:18 PM

#

so how I edit a number in json file?

fading patrol Sep 6, 2023, 1:34 PM

#

potent spire so how I edit a number in json file?

#❓｜how-to-get-help

fallen vault Sep 6, 2023, 4:14 PM

#

What’s the easiest way to offload a local database to a cloud database?

fallen vault Sep 6, 2023, 4:14 PM

#

potent spire so how I edit a number in json file?

Use str() to change the number to a string.

#

Or use the json package to dump a dict into a json file.

storm mauve Sep 6, 2023, 4:28 PM

#

fallen vault What’s the easiest way to offload a local database to a cloud database?

depends on which database and which provider - most providers will have quite a few different options to import data though

#

the most straightforward and applicable to most might be just dumping and restoring (or exporting to something like csv/json and importing if not the exact same kind of database), but some may have more efficient options

potent spire Sep 6, 2023, 4:35 PM

#

fallen vault Or use the json package to dump a dict into a json file.

both not working

fallen vault Sep 6, 2023, 4:49 PM

#

storm mauve the most straightforward and applicable to most might be just dumping and restor...

I didn’t think about the database aspect of it. Just went on the assumption the data was already in a json.

potent spire Sep 6, 2023, 5:08 PM

#

how I can check if a json file is empty or no?

pure mortar Sep 6, 2023, 5:15 PM

#

harsh pulsar <@738234281146712084> <@216540595403882496> reporting back from the world of q...

oh so you did follow up on this?

#

PsyduckPopcorn

hollow oar Sep 6, 2023, 7:13 PM

#

ooo. a timescaledb optimisation question - can't believe i missed that.

select create_hypertable('records', 'created_time'); - this doesn't have space partition, worth a try after you have exhausted other options imo
what's the index on the chunks? is it a multi-column btree index? anyway - the ordering of group_id, time makes sense to me.
is there anything that uses JIT in the explain analyze plan? i recall i got bitten once by that, JIT'ing for absolutely no reason and wasting compute over and over.

torn sphinx Sep 6, 2023, 11:24 PM

#

waxen finch `REFERENCES dataset (id)` is a constraint, or more specifically a [foreign key](...

You're specifying the names of the variables after they're being called right? idk if I'm tripping or it's just a sql thing I don't get

waxen finch Sep 6, 2023, 11:46 PM

#

torn sphinx You're specifying the names of the variables after they're being called right? i...

im not sure what you mean by variables, those are completely different things from that foreign key syntax

#

<new_column> <column_datatype> REFERENCES <referenced_table> (<referenced_column>)

harsh pulsar Sep 7, 2023, 12:44 AM

#

paper flower What did you change though? 🤔

nothing yet, but until i have the table fully populated there isn't much else to do. had to deal with some unrelated stuff the last couple days, going to actually do it tomorrow

harsh pulsar Sep 7, 2023, 12:45 AM

#

torn sphinx You're specifying the names of the variables after they're being called right? i...

be careful here, the syntax superficially looks like a python function call but it's not at all the same thing. likewise with column and table names, they kind of look like python variables but they are not really the same thing

slender hollow Sep 7, 2023, 3:06 AM

#

Hey guys

#

Does have a good knowledge on pandas?

#

I'm struggling trying to merge CSV files.

torn sphinx Sep 7, 2023, 4:32 AM

#

waxen finch im not sure what you mean by variables, those are completely different things fr...

Variables as in the names you're assigning to the actual value, for example, "from dataset as d". You call d before dataset was referenced as d. You can see that where you say "d.filename".

waxen finch Sep 7, 2023, 4:41 AM

#

torn sphinx Variables as in the names you're assigning to the actual value, for example, "fr...

oh those are table aliases, its just to make the query easier to read
https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-TABLE-ALIASES

fluid quail Sep 7, 2023, 5:03 AM

#

potent spire so how I edit a number in json file?

Male the number a string

#

Make

potent spire Sep 7, 2023, 5:46 AM

#

fluid quail Male the number a string

when I do that it telling that unwritable

slate basin Sep 7, 2023, 6:02 AM

#

How can i use independent db with a discord bot I want to use sql?

halcyon dew Sep 7, 2023, 6:46 AM

#

SQLite?

#

Or postgres. Bc with postgres u host the server, make an role and password for it, and then create a database that is the owner of the new account you just created

slate basin Sep 7, 2023, 6:57 AM

#

ic

fluid quail Sep 7, 2023, 8:08 AM

#

potent spire when I do that it telling that unwritable

I think in json you can just put the numbers. I dont think you need the “” or ‘’

potent spire Sep 7, 2023, 8:35 AM

#

I solved it

#

but I have a question

marsh otter Sep 7, 2023, 12:11 PM

#

Just wanted to understand industry standards, what's the benchmark for getting 100K recrods from table and having it in-memory. I have to apply couple of functions on top of this tabular data. Right now it takes 15 Seconds for me to get the data from snowflake, fetch it as Pandas DataFrame.
Is it normal in industry to do this kinda of thing?
Any one who have previously done, can you elobrate your tech stack.
I wanted to do this in a request response cycle. I'll send very minimal data to the frontend after computation. (sub second performance)

coral wasp Sep 7, 2023, 12:24 PM

#

marsh otter Just wanted to understand industry standards, what's the benchmark for getting 1...

Benchmark? Like, how long should it take? I'd guess that most of that 15 seconds is snowflake executing/processing the query, not anything on the client side. And, I doubt the snowflake database connector is the bottleneck here. What I'm assuming is: it takes nearly 15s for the query to get the 100k rows to complete. I'd first check that, such as doing the same operation on another system or snowflake interface or whatever.

brazen charm Sep 7, 2023, 12:36 PM

#

overall I would say it's slower than i'd expect

#

But I wouldn't expect it to be less than a few seconds

marsh otter Sep 7, 2023, 1:15 PM

#

coral wasp Benchmark? Like, how long should it take? I'd guess that most of that 15 seconds...

I have checked the query performance , query is executing in 100ms. It's simple select query in snowflake.
Client side snowflake connector is what taking its time to give it back as pandas dataframe.

#

Just to give more context, the data has 15 cols, 5 integer, 5 string and 5 dates. that's it

coral wasp Sep 7, 2023, 1:17 PM

#

marsh otter I have checked the query performance , query is executing in 100ms. It's simple ...

I'd start with normal troubleshooting: check versions of connector, pyarrow and pandas. Test a smaller query (ie: 1 row), etc.

#

15 sec is obviously unexpectedly slow.

marsh otter Sep 7, 2023, 1:18 PM

#

Got it. Python connector is in latest version, pandas is in 2.1.x

coral wasp Sep 7, 2023, 1:19 PM

#

And pyarrow?

marsh otter Sep 7, 2023, 1:19 PM

#

Pyarrow is in 10.0.1

coral wasp Sep 7, 2023, 1:19 PM

#

Oh, that's crazy old

#

Sure not 12.0.1?

#

They just released 13

marsh otter Sep 7, 2023, 1:21 PM

#

yeah! but the snowflake python connector restricted it to v10. 😦
https://github.com/snowflakedb/snowflake-connector-python/blob/351b2dca61dc2a685a2ef2dd419b3e94c0c6e595/pyproject.toml#L9

delicate fieldBOT Sep 7, 2023, 1:21 PM

#

pyproject.toml line 9

"pyarrow>=10.0.1,<10.1.0",```

marsh otter Sep 7, 2023, 1:22 PM

#

I'm still not able to figure out this 10 to 15 seconds for 100K. when I checked the memory size of the pandas dataframe or list of tuple. It is just around 100MB

marsh otter Sep 7, 2023, 1:23 PM

#

coral wasp * 15 sec is obviously unexpectedly slow.

From your experience, what's the approx time it takes for this. Just converting the query results to native python type

coral wasp Sep 7, 2023, 1:23 PM

#

<1sec on the client side.

marsh otter Sep 7, 2023, 1:24 PM

#

Man! I don't know what am I even doing wrong! 15 Seconds it takes for me 😦

coral wasp Sep 7, 2023, 1:27 PM

#

Maybe compare fetchall performance vs fetch_pandas?

marsh otter Sep 7, 2023, 1:35 PM

#

coral wasp Maybe compare fetchall performance vs fetch_pandas?

Yes I did that. fetch_pandas_all takes 15 Seconds. fetchall() takes 10 to 15 seconds.

coral wasp Sep 7, 2023, 1:43 PM

#

marsh otter Yes I did that. fetch_pandas_all takes 15 Seconds. fetchall() takes 10 to 15 sec...

Reading a 100k row csv takes me 400ms: ```py
import pandas as pd
import numpy as np

Dataframe creation

df = pd.DataFrame({
'int_col1': np.random.randint(1, 1000, 100000),
'int_col2': np.random.randint(1, 1000, 100000),
'int_col3': np.random.randint(1, 1000, 100000),
'int_col4': np.random.randint(1, 1000, 100000),
'int_col5': np.random.randint(1, 1000, 100000),
'str_col1': np.random.choice(['A', 'B', 'C', 'D', 'E'], 100000),
'str_col2': np.random.choice(['F', 'G', 'H', 'I', 'J'], 100000),
'str_col3': np.random.choice(['K', 'L', 'M', 'N', 'O'], 100000),
'str_col4': np.random.choice(['P', 'Q', 'R', 'S', 'T'], 100000),
'str_col5': np.random.choice(['U', 'V', 'W', 'X', 'Y'], 100000),
'date_col1': pd.date_range(start='1/1/2020', periods=100000, freq='H'),
'date_col2': pd.date_range(start='1/1/2020', periods=100000, freq='H'),
'date_col3': pd.date_range(start='1/1/2020', periods=100000, freq='H'),
'date_col4': pd.date_range(start='1/1/2020', periods=100000, freq='H'),
'date_col5': pd.date_range(start='1/1/2020', periods=100000, freq='H'),
})

df.to_csv("mytest.csv")

%timeit pd.read_csv("mytest.csv")

df2 = pd.read_csv("mytest.csv")
print(len(df2))
display(df2)

marsh otter Sep 7, 2023, 1:44 PM

#

coral wasp Reading a 100k row csv takes me 400ms: ```py import pandas as pd import numpy as...

Um but is this right measure. I mean I'm getting data from snowflake table.

coral wasp Sep 7, 2023, 1:46 PM

#

You asked about "converting the query results to native python type". Converting from CSV should be even slower than converting a database result.

shut tiger Sep 7, 2023, 1:47 PM

#

marsh otter Um but is this right measure. I mean I'm getting data from snowflake table.

You really shouldn't crosspost. And please just run a profiler.

narrow shore Sep 7, 2023, 3:13 PM

#

marsh otter I'm still not able to figure out this 10 to 15 seconds for 100K. when I checked ...

ugh maybe because pandas sucks?

shut tiger Sep 7, 2023, 3:14 PM

#

narrow shore ugh maybe because pandas sucks?

read #web-development for the thrilling conclusion

#

(it's not pandas)

hollow oar Sep 7, 2023, 3:25 PM

#

does snowflake not have a dedicated command for exporting data to a flat file which you can pull much quicker?
in my expereience using these commands is much faster than whatever fetch_all is doing.

(also yes - there is no need to crosspost)

brazen charm Sep 7, 2023, 4:55 PM

#

typically you just pull straight from the object store

torn sphinx Sep 7, 2023, 4:57 PM

#

how would I handle multiple updates to the db for something like likes in sqlite? The race condition would make it kind of tricky to aggregate the requests

brazen charm Sep 7, 2023, 5:01 PM

#

a lock

#

although sqlite already holds a lock on writers so it should be fine

waxen finch Sep 7, 2023, 5:53 PM

#

speaking of sqlite locks, im getting some really poor performance when running concurrent selects with this benchmark using sqlite 3.43.0
https://github.com/thegamecracks/asqlite-bench ```py

py -m asqlite_bench queries\2-selects.toml --connections 10
Finished in 7.080s
py -m asqlite_bench queries\2-selects.toml --connections 1
Finished in 0.513s``` for context, asqlite runs each connection in its own thread and implicitly sets conn.isolation_level = None and PRAGMA journal_mode=WAL

looking at the profiling results, most of the time is spent in Cursor.fetchall() so i believe it is dealing with locks, but i dont know why running ten concurrent selects would make it perform worse

brazen charm Sep 7, 2023, 6:54 PM

#

what happens if you try open the DB in read only mode?

#

you can also try setting nolock to 1

#

Or immutable to 1

#

https://www.sqlite.org/uri.html#uriimmutable

waxen finch Sep 7, 2023, 7:12 PM

#

brazen charm what happens if you try open the DB in read only mode?

tried ?immutable=1, ?nolock=1, and ?mode=ro (nolock required removing the WAL pragma from asqlite), and results were all around 7s, still waiting on fetchall()

#

oh i forgot to manually set WAL too, that made ?nolock=1 raise OperationalError: unable to open database file while the other two had no issue

#

so i guess that means my assumption was wrong, and something in the pysqlite wrapper is blocking...

trail current Sep 7, 2023, 7:34 PM

#

cursor.execute('''
    CREATE TABLE IF NOT EXISTS enlistments (
        user_id INTEGER PRIMARY KEY,
        discord_name TEXT,
        application_name TEXT,
        division TEXT,
        rank TEXT,
        gamertag TEXT
    )
''')

I have already got my table created, how would I go about adding an additional row to the table once it has been created or is this not possible?

pure mortar Sep 7, 2023, 8:05 PM

#

marsh otter I have checked the query performance , query is executing in 100ms. It's simple ...

simple select query? just in case, if youre doing select *, youll want to avoid that. this is a columnar db so the way you write sql here vs. rbdms is similar but dif.

#

this also leads to increased compute and/or cost/snowflake bills btw

#

how do i know? we have dbas over here that have lived in sql server world for so long but now need to do things in snowflake

#

the result?

#

take_my_money

brazen charm Sep 7, 2023, 8:43 PM

#

The cost of using a datalake for data which could just work on a regular machine

fallen vault Sep 7, 2023, 9:42 PM

#

Whats a good way to scrub user input to prevent sql injection for a desktop app?

storm mauve Sep 7, 2023, 9:43 PM

#

usually the library will take care of that for you as long as you specify the parameters separately from the query itself
(in which case the query itself should be hardcoded, and the user should only provide the parameters)

#

!sql-fstring

delicate fieldBOT Sep 7, 2023, 9:43 PM

#

SQL & f-strings

Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.

Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.

For example, the sqlite3 package supports using ? as a placeholder:

query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)

Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.

See Also
• Python sqlite3 docs - How to use placeholders to bind values in SQL queries
• PEP-249 - A specification of how database libraries in Python should work

fallen vault Sep 7, 2023, 9:44 PM

#

Yeah im using cur.execute(sql, (userdata,)) where sql is something like '''SELECT * FROM table WHERE column=?'''

storm mauve Sep 7, 2023, 9:45 PM

#

quite much no risk of injection in that case

fallen vault Sep 7, 2023, 9:45 PM

#

Alright, thank you. All i gotta do is check the data against the data tables then.

paper flower Sep 8, 2023, 12:26 AM

#

waxen finch speaking of sqlite locks, im getting some really poor performance when running c...

I never heard of asqlite but used aiosqlite for a bit, did you try it? 🤔

waxen finch Sep 8, 2023, 12:39 AM

#

paper flower I never heard of `asqlite` but used `aiosqlite` for a bit, did you try it? 🤔

i looked at the implementations some time ago and to my understanding they both follow the same principles, but ill try to adapt the tests for it too

#

fwiw ive also done a simpler ThreadPoolExecutor version #databases message

paper flower Sep 8, 2023, 1:03 AM

#

So it's likely a sqlite issue? pithink

waxen finch Sep 8, 2023, 1:29 AM

#

paper flower I never heard of `asqlite` but used `aiosqlite` for a bit, did you try it? 🤔

implemented a simple aiosqlite pool (source) and results were basically the same ```py

py -m asqlite_bench queries\2-selects.toml -c 1
Finished in 0.527s
py -m asqlite_bench queries\2-selects.toml -c 1 -m aiosqlite
Finished in 0.447s
py -m asqlite_bench queries\2-selects.toml -c 10
Finished in 6.786s
py -m asqlite_bench queries\2-selects.toml -c 10 -m aiosqlite
Finished in 6.968s```

torn sphinx Sep 8, 2023, 1:50 AM

#

waxen finch i probably should have explained that earlier because i initially thought you wo...

I'm getting ready to start coding as I have some free time, looking now what would be the speed difference if I did something like having all the columns?

#

Rather than storing the row in a single column and trying to access it

waxen finch Sep 8, 2023, 2:01 AM

#

torn sphinx I'm getting ready to start coding as I have some free time, looking now what wou...

im not sure of the performance difference between 100+ tables with custom columns vs. a few tables with the same schema, that you'll probably have to test yourself with some real data

paper flower Sep 8, 2023, 2:01 AM

#

waxen finch implemented a simple aiosqlite pool ([source](<https://github.com/thegamecracks/...

So it's probably sqlite thing pithink

torn sphinx Sep 8, 2023, 2:02 AM

#

waxen finch im not sure of the performance difference between 100+ tables with custom column...

Not 100+ tables, but one table with every column I'd need.

#

VS

#

The two tables linked together and putting all the text into one column

waxen finch Sep 8, 2023, 2:03 AM

#

as in a union of every column from every dataset you might store?

torn sphinx Sep 8, 2023, 2:03 AM

#

Indeed

waxen finch Sep 8, 2023, 2:04 AM

#

hmm i would guess that might lead to inefficient queries if every row has to store a bunch of NULL columns

paper flower Sep 8, 2023, 2:05 AM

#

waxen finch implemented a simple aiosqlite pool ([source](<https://github.com/thegamecracks/...

Maybe it has something to do with what query you're actually running?

#

Ah, nvm 😅

waxen finch Sep 8, 2023, 2:07 AM

#

oh yeah i did test SELECT 1 instead of selecting 1000 single-column rows from a table, and more connections marginally improved the performance rather than degrading it

paper flower Sep 8, 2023, 2:08 AM

#

select 1 wouldn't do disk IO though 🤔

#

in theory

torn sphinx Sep 8, 2023, 2:11 AM

#

waxen finch hmm i would guess that might lead to inefficient queries if every row has to sto...

Do you still think the method you explained before is the best way now knowing I want to be able to select specific data out of rows?

waxen finch Sep 8, 2023, 2:14 AM

#

paper flower `select 1` wouldn't do disk IO though 🤔

here were the results for that: py -c 1: Finished in 0.160s -c 10: Finished in 0.117s i assumed this CTE alone wouldnt have done any IO either, but it performed similarly to creating the table beforehand: ```py

query = """

WITH RECURSIVE generate_series(value) AS (

SELECT 1

UNION ALL

SELECT value+1 FROM generate_series

WHERE value+1<=1000

)

SELECT value FROM generate_series

"""

repeat = 1000

-c 1: Finished in 0.690s
-c 10: Finished in 6.925s

waxen finch Sep 8, 2023, 2:17 AM

#

torn sphinx Do you still think the method you explained before is the best way now knowing I...

the array columns could be further normalized, but i think it'll be fairly easy to use - try creating the tables yourself, inserting some data, and writing up queries to do what you need

torn sphinx Sep 8, 2023, 2:17 AM

#

waxen finch the array columns could be further normalized, but i think it'll be fairly easy ...

Can you link me to documentation for arrays in columns

paper flower Sep 8, 2023, 2:18 AM

#

waxen finch here were the results for that: ```py -c 1: Finished in 0.160s -c 10: Finished ...

I think the issue is that it just becomes cpu bound? pithink

waxen finch Sep 8, 2023, 2:18 AM

#

torn sphinx Can you link me to documentation for arrays in columns

https://www.postgresql.org/docs/current/arrays.html
if you find the need to do really complicated queries, that can be an indication that the schema isnt suitable for your data model

torn sphinx Sep 8, 2023, 2:22 AM

#

waxen finch <https://www.postgresql.org/docs/current/arrays.html> if you find the need to do...

What's considered a really complicated query?

waxen finch Sep 8, 2023, 2:26 AM

#

paper flower I think the issue is that it just becomes cpu bound? <:pithink:65224755990927770...

now that you mention it my cpu usage gets fairly high, even with in-memory databases... seems weird for merely selecting 10K numbers

paper flower Sep 8, 2023, 2:28 AM

#

Runtime seems to scale fairly linearly for me with the amount of connections

#

Makes me think it's probably cpu bound

#

Adding a lock in there makes queries run faster though 🙂

#

async def run_query(iterations: int, lock: AbstractAsyncContextManager[None]):
    stmt = select(Model).where(Model.c > 1000, Model.c < 2000).limit(100)
    async with async_session_factory() as session:
        for _ in range(iterations):
            async with lock:
                await session.execute(stmt)


async def main() -> None:
    iterations = 100
    start = time.perf_counter()
    lock = asyncio.Semaphore(1)
    async with asyncio.TaskGroup() as tg:
        for _ in range(100):
            tg.create_task(run_query(iterations=iterations, lock=lock))
    print(time.perf_counter() - start)

About 16s without lock and 7s with lock

waxen finch Sep 8, 2023, 2:32 AM

#

torn sphinx What's considered a really complicated query?

i dont think there's a definitive answer, but i'd consider a query too complicated if i need to write some verbose query like: sql -- Select all data from xyz 2021-2023 datasets with a population greater than 500 SELECT population FROM csv_xyz_2021 WHERE population > 500 UNION ALL SELECT population FROM csv_xyz_2022 WHERE population > 500 UNION ALL SELECT population FROM csv_xyz_2023 WHERE population > 500; or use esoteric features for something that should be simple

torn sphinx Sep 8, 2023, 2:34 AM

#

waxen finch i dont think there's a definitive answer, but i'd consider a query too complicat...

Possibly could be like this

waxen finch Sep 8, 2023, 2:36 AM

#

paper flower ```py async def run_query(iterations: int, lock: AbstractAsyncContextManager[Non...

wtf that worked for me too

#databases

Write your MySQL query statement below

Work with data now...

OR

OR

hmm, with the current schema you can do that, for example given the dataset us-counties-2023.csv: ```sql -- columns = {date,county,state,fips,cases,deaths}

Dataframe creation

query = """

WITH RECURSIVE generate_series(value) AS (

SELECT 1

UNION ALL

SELECT value+1 FROM generate_series

WHERE value+1<=1000

)

SELECT value FROM generate_series

"""

repeat = 1000

hmm, with the current schema you can do that, for example given the dataset us-counties-2023.csv: ```sql
-- columns = {date,county,state,fips,cases,deaths}