#databases
1 messages · Page 24 of 1
That’s something I’m thinking about but I don’t think they would want extra things to do (manage a flask website over paying for a database manager)
i'll vote that this is one of the top horrors i saw for the last year. Not really maintainable at all.
full of magical values solution
Easier isn’t always better.
Show us how you would do it better.
This is called “data-driven programming”. You put all the cases into tables, and this keeps the actual code part much simpler.
https://discordapp.com/channels/267624335836053506/342318764227821568/1169082360457855097 this one will work good enough
i don't like in it only that it returns still SQLqlalchemy statement a bit
would be nice wrapping this logic entirely into Pydantic returns for input and output 😅
The point is to build a query, right?
Or sqlalchemy dataclasses
yeah, and that makes it valid solution in current requirements
I would rather use sqlalchemy models instead of returning pydantic objects
i hate ORM code propagating through my code base further than from pydantic wrapper 🙈 it is the most type unsafe code usually. Or at least difficult to type without Any stuff
Depends on where the model is going
Usually you'd have some kind of adapter to your specific api (rest, cli, etc) and map a model there
class Book(MappedAsDataclass, Base, kw_only=True):
__tablename__ = "books"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
title: Mapped[str]
It's already a mapped table, no need to map it again
Unless you're returning some kind of aggregate
🤔 MappedAsDataclass? looks interesting
haven't seen this thing
Basically a dataclass, works with mypy too
i will note this usage. then there is posibility to utilize this code outside of repository wrapper yeah
Book(id=42)
src\app\db\models\_user.py:28:1: error: Unexpected keyword argument "id" for "Book" [call-arg]
Book(id=42)
Book()
src\app\db\models\_user.py:28:1: error: Missing named argument "title" for "Book" [call-arg]
Book()
still not liking though fragility of input params into SQL queries though (since they can be Any kwargs.) (or at least haven't seen input param checking)
there is no type checking anyway
so default behavior to wrap into repository thing is still valid solution
Because a lot of dialects do coercing implicitly iirc
It should be possible to add that sqlalchemy though, they should have their reasons to not do that
Do we do product name shaming, here? https://m.media-amazon.com/images/I/51+dCSsKH6L._AC_UX450_UY450_SR450,450_.jpg
NaN
Because Mapped[int] is typed, you could forbid using __eq__ with non-int objects
!off
There are three off-topic channels:
• #ot0-fear-of-python
• #ot1-this-regex-is-impossible
• #ot2-the-original-pubsta
The channel names change every night at midnight UTC and are often fun meta references to jokes or conversations that happened on the server.
See our off-topic etiquette page for more guidance on how the channels should be used.
in off topic channels
Yes, because I'm unaware of the offtopic channels.
@wise goblet
In SQLCoreOperations they're typed as Any
def __eq__(self, other: Any) -> ColumnElement[bool]: # type: ignore[override] # noqa: E501
...
def __ne__(self, other: Any) -> ColumnElement[bool]: # type: ignore[override] # noqa: E501
...
But it should be possible to specify stricter types, since the generic is there:
class SQLCoreOperations(Generic[_T_co], ColumnOperators, TypingOnly):
I've heard that some databases attempt to be "helpful" in implicit conversion of types. I thought it might amuse.
Didn't really get the reference 
@frosty mirage When was the last time you tried an orm in python or any other language?
You could give SQLAlchemy a go, and in C# there's Entity Framework Core, which works on top of LINQ and expressions
Also if you don't want to use an orm you can always at least build queries with sqlalchemy core
LINQ doesn’t seem to allow variables in place of names/operators/criteria.
You could substitute a model property
That's a more safe option
This is what I mean about adding all this extra mechanism on top of mechanism, just to avoid something that is so simple to do.
For example like this:
options = {
"1": Book.id,
"2": Book.title,
}
select(options[...])
It helps you to build more maintainable applications
And simplifies query building
That’s what we call a “hand-waving argument”.
Why? 😅
So far you’ve only been able to offer more complicated alternatives, contrary to your claim of simplifying things.
Why is my solution more complicated/less maintainable than yours?
You haven’t even offered an actual solution yet.
I sent one
Flesh it out to deal with all the fields I listed, and you’ll see what I mean.
That's what I call a hand-waving argument 😅
Precisely. You have to answer specifics with specifics.
If you want me to replicate your code could you explain what it does first?
What fields are there, how does it search them?
I kept it simple: all it does is assemble the part after the “where” in the query. It uses “and” to join together checks of all the fields that have nonempty values.
There are other things that are nice about an ORM though that makes the code simpler. I used to do this sort of thing you are doing but when i would look back on it later it took a while to comprehend what it was actually doing. but the ORM method is pretty straight forward pattern. plus you can use other tools on top of it like alembic that will take the model and play an alter condition into it with no effort.
@frosty mirage Maybe like this?
def escape_ilike(text: str, escape_char: str) -> str:
return (
text
.replace(escape_char, escape_char * 2)
.replace('%', f"{escape_char}%")
.replace("_", f"{escape_char}_")
)
class Model(Base):
__tablename__ = "model"
id: Mapped[int] = mapped_column(primary_key=True)
make: Mapped[str]
model: Mapped[str]
details: Mapped[str]
class_name: Mapped[str]
allocation: Mapped[str]
location_name: Mapped[str]
@dataclasses.dataclass
class ModelFilter:
make: str | None = None
model: str | None = None
details: str | None = None
class_name: str | None = None
allocation: str | None = None
location_name: str | None = None
def build_query(filter_: ModelFilter) -> Select[tuple[Model]]:
stmt = select(Model)
for field, value in (
(Model.make, filter_.make),
(Model.model, filter_.model),
(Model.details, filter_.details),
):
if value is None:
continue
stmt = stmt.where(
field.ilike(escape_ilike(value, escape_char="/"), escape="/"))
for field, value in (
(Model.class_name, filter_.class_name),
(Model.allocation, filter_.allocation),
(Model.location_name, filter_.location_name),
):
if value is None:
continue
stmt = stmt.where(field == value)
return stmt
Builds a query like
SELECT
model.id,
model.make,
model.model,
model.details,
model.class_name,
model.allocation,
model.location_name
FROM model
WHERE model.make ILIKE 'Maker ' ESCAPE '/'
AND model.model ILIKE 'a////:42' ESCAPE '/'
AND model.details ILIKE 'b|////\\\\' ESCAPE '/'
AND model.class_name = 'cls'
Does anyone know of a good Postgres client app for iOS?
For iOS? 🤔 Not really
Alright. Didn’t want to pay for something I wasn’t sure about.
hiiiiiiiiiii y'all
Sounds like a nightmare but maybe pgadmin would be usable in Safari?
It’s not for anything serious. Just to play around and learn while I’m not near a computer.
I have Pythonista. I could write a script, just wanted something prepackaged.
There are learning apps like DataCamp
Does it have Postgres specific courses?
does anyone have reading suggestions on Data manipulation languages?
any who can help tell me why this is only making 1 table then stopping
lines = []
for line in file:
line = line.strip()
lines.append(line)
name = str(lines[0])
name = name.replace('b', '')
name = name.replace("'", '')
cur.execute("CREATE TABLE IF NOT EXISTS " + name +" (Date DATE,Open float4, High float4, Low float4, Close float4)")
conn.commit()
cur.close()
conn.close()```
Try some basic debugging... line or another variable may not be doing what you think it's doing
I think its because its all one line. try
for line in file.readlines():
the same coed works perfectly fine with mysql database but it's not working with postgres
but looks like with postgres it just loops the first line of the .txt file
wait lol nvm
i realized what it was lol
i was looping through them all but i was setting name to lines[0]
suppose a table undergoes a set number of changes during a period of a year. is it common place to make a new table each time a change is made, and begin making new entries into this new table, instead of modifying the previously existing table?
Also, a more normal practice would be to create one table, and use a ticker column. One table per ticker is sort of an anti pattern.
no
There's tools like sql compre but you may manually alter the table. you don't recreate prod tables
What kind of changes? 🤔
i agree i just thought having 7000 tables would look funny
Hello. I hope you are well. Could you tell me about your industry experience? How did you get into the professional world?
Please don’t cross post in multiple channels. You already posted in careers. That’s the right place
With alembic what is best practice in the alembic.ini for the database connection url? sqlalchemy.url = driver://user:pass thing? Obviously don't want to commit that file. But what is normally done with that?
😉 environment variables are cool
https://alembic.sqlalchemy.org/en/latest/tutorial.html
read section
sqlalchemy.url - A URL to connect to the databa...
Yeh I already set that up in the last project. And checked that file into the project with passwords and all. My question is more like should that file contain the naked password or do you usually set pass a different way? Are you saying I should set %(pass)s for the variable in the url?
It's not in env by default
Well I guess that you just set the raw pass in there when you need to configure it. What I usually do is have some configuration set the credit in $HOME/.config/appname/config.ini
what's the preferred way to select an entry from a table using a hash of some columns?
for example, say i have a table with two columns, id bigint, name varchar(20), and i want to be able to find a row using the hash of the id and name combined in some way (say, md5(str(id) + name))
i could hash it before i insert and store it as another field, but is that the best solution?
That would work, but to do it simpler... I dimly remember it being possible in some DBs to make an index on an expression like md5(str(id) + name) - essentially the same solution but an index, unlike a real column, would update automatically when you alter one of the columns. maybe try that?
that sounds perfect, ill look into it, thanks
You can just get it from env variables in alembic's env.py
config.set_main_option("sqlalchemy.url", get_settings(DatabaseSettings).url)
OK yeh so that string is probably not necessary at all in that alembic.ini file if I do that. Can just generate it in that env.py
Yeah, you can leave it empty in alembic.ini
And get it from your environment
OK thanks.
I really like the supervisor.d conf env interpolation. That could be useful in alembic
!e looks like it works for sqlite. I used a user-defined function for md5 here, but for performance one'd want to use a sqlite extension instead.
import sqlite3
import hashlib
# in reality you can load an extension for that I think, but I don't know offhand how to do that from python
def md5sum(t: str):
return hashlib.md5(t.encode("utf-8")).hexdigest()
con = sqlite3.connect(":memory:")
# we need to promise to sqlite that this function is deterministic for it to let us use it in an index
# again, this won't be a problem for a built-in md5 from some sqlite extension
con.create_function("md5", 1, md5sum, deterministic=True)
with con:
con.execute("CREATE TABLE mytable (id bigint, name varchar(20))")
con.execute("INSERT INTO mytable values (?, ?)", (512, "yay"))
lst = con.execute("SELECT *, md5(id || name) FROM mytable").fetchall()
print(lst)
val = lst[0][2]
with con:
con.execute("CREATE INDEX mytable_hash ON mytable(md5(id || name))")
with con:
print(con.execute("SELECT * FROM mytable WHERE md5(id || name) == ?", (val,)).fetchall())
print(con.execute("EXPLAIN QUERY PLAN SELECT * FROM mytable WHERE md5(id || name) == 5").fetchall())
con.close()
@hexed estuary :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | [(512, 'yay', 'ff2b984f07eac019cc3694eb1e3fc6d0')]
002 | [(512, 'yay')]
003 | [(3, 0, 0, 'SEARCH mytable USING INDEX mytable_hash (<expr>=?)')]
im using postgres which has md5 built-in, but with slight modifications this works great! thanks
hi could anyone help me on how to get specific data
what do you mean?
I am trying to find a dataset that is within the past 3 years and has a lot of house features. It also needs the house price and location. Something like this https://www.openintro.org/data/index.php?data=ames. I just need newer data. That link is data from Ames, Iowa from 2006-2010, but I need features just like that
OpenIntro's mission is to make educational products that are free, transparent, and lower barriers to education. We're a registered 501(c)(3) nonprofit.
:incoming_envelope: :ok_hand: applied timeout to @torn sphinx until <t:1698999406:f> (10 minutes) (reason: role mentions spam - sent 4 role mentions).
The <@&831776746206265384> have been alerted for review.
!pban 1168331268551888896
:incoming_envelope: :ok_hand: applied ban to @tired flax permanently.
<@&831776746206265384>
Hello. I want to implemet Postgres query
select * from delivery.client_contacts_notification ccn
where array_to_string(ccn.phones, ';') like '%22298749%';
into SQLAlchemy ORM query
filter(any_(ClientContactsNotification.phones.like(f"%{client_phone}%"))
Because any_ doesn't work as I expected. Each item should be compared to expression with like operator
in json file like this
{'id': '1165181902852984913', 'name': 'SwesRa Members', 'icon': '2a5a93228b81b999e24122db3b7ccc27'}
how to get name?
You can use json.load to load a JSON file as a dictionary
I'm alarmed you've posted this message in the #databases channel
json is database
It's a text format
database can refer to either a collection of data (which json is), or database management system (which json isn't)
It's not a memory efficient data collection format either
I suppose it was made to be convenient for web communication
I don't think you're writing they query correctly? 
class Book(MappedAsDataclass, Base):
__tablename__ = "book"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
title: Mapped[str] = mapped_column(String(255), unique=True)
elements: Mapped[list[str]] = mapped_column(ARRAY(String))
stmt = select(Book).where(Book.elements.contains("target"))
print(stmt.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))
SELECT book.id, book.title, book.elements
FROM book
WHERE book.elements @> 'target'
Maybe this works for your case?
I guess ccn.phones is an array
Right. But if I use strictly contains in 2.0 { "error_msg": "(sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.UndefinedFunctionError'>: operator does not exist: character varying[] @> character varying\nHINT: No operator matches the given name and argument types. You might need to add explicit type casts." }
Cloud based Postgres 14, so I can't modify config
And why any doesn't work?
Plain SQL -> at least 1 result in set
select * from delivery.client_contacts_notification ccn
where '+79*****749' like any(ccn.phones) ;
SQLAlchemy empty result set
FROM delivery.post_operation JOIN delivery.client_contacts_notification ON delivery.post_operation.id = delivery.client_contacts_notification.id_post_operation
WHERE $1::VARCHAR LIKE ANY (delivery.client_contacts_notification.phones)
Shouldn't it be any(field) ilike 'target'?
I honestly don't know if order matters here
Also you're doing a join here 
Does it return anything if you remove it and select directly from delivery.client_contacts_notification table?
https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.ARRAY.Comparator.contains yep, looks like it should work, but throws errors. contains even can use escapes %
Looks like it only works on array arguments
I mean, again, shouldn't any work?
SELECT book.id, book.title, book.elements
FROM book
WHERE ('target' LIKE '%%' || (ANY (book.elements)) || '%%')
I think the only reason it's possibly not working is your join in the second query you shared
select(Book).where(any_(Book.elements).contains("target"))
ef any_(expr: _ColumnExpressionArgument[_T]) -> CollectionAggregate[bool]:
"""Produce an ANY expression.
For dialects such as that of PostgreSQL, this operator applies
to usage of the :class:`_types.ARRAY` datatype, for that of
MySQL, it may apply to a subquery. e.g.::
# renders on PostgreSQL:
# '5 = ANY (somearray)'
expr = 5 == any_(mytable.c.somearray)
# renders on MySQL:
# '5 = ANY (SELECT value FROM table)'
expr = 5 == any_(select(table.c.value))
Comparison to NULL may work using ``None`` or :func:`_sql.null`::
None == any_(mytable.c.somearray)
The any_() / all_() operators also feature a special "operand flipping"
behavior such that if any_() / all_() are used on the left side of a
comparison using a standalone operator such as ``==``, ``!=``, etc.
(not including operator methods such as
:meth:`_sql.ColumnOperators.is_`) the rendered expression is flipped::
# would render '5 = ANY (column)`
any_(mytable.c.column) == 5
Right, I have no unittest for that model, may be event selectinload changes behaviour
selectinload doesn't change the original query at all 🤔
I gave up and opened issue on GitHub 
async def get_player_item_value_by_name(user_id,items:list):
"""returns int value of items from database"""
async with sql.connect(DATABASE_PATH) as db:
cursor = await db.cursor()
list_of_item_values = []
for item in items:
query = f"SELECT {item} FROM users WHERE user_id = ?"
await cursor.execute(query,(user_id,))
value = await cursor.fetchone()
list_of_item_values.append(value[0])
return list_of_item_values
async def get_player_item_value_by_name(user_id,items:list):
"""returns int value of items from database"""
async with sql.connect(DATABASE_PATH) as db:
cursor = await db.cursor()
final_items = ""
for item in items:
final_items = final_items + f"{item} , "
query = f"SELECT {final_items} FROM users WHERE user_id = ?"
await cursor.execute(query,(user_id,))
value_list = await cursor.fetchone()
return value_list
which one is the best method that will actually work? becoz i havent tested them yet
cast argument to an integer?
But the thing i want save its a str
then why are you asking your db to increment it?
Is there any interface that generates the actual Python sqlalchemy Base code for the declaration of a table from a table already existing in a database?
i got my first bot(first bot use mongodb) worked, basically
if the data found ->
replace_one
elif not found ->
insert_one
And im working on another bot rn
I want make a game leaderboard (also with mongodb)
and I received an error while i add the if the data found…… stuff into my @client.event(this event hold the minigame fuction, and i add the db fuction at the place hold winner and loser) but I received this error
@client.event
TypeError: ‘Database’ object is not callable```
is that mean the db can not use able in @client.event?
Hello community!
o
how heavy is the table-binding operation in sqlalchemy? resource-wise
python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py) for easy invocation for any code.
python -m profile [-o output_file] [-s sort_order] (-m module | myscript.py)
NICE! i didn't know of this, thanks a lot
File "/home/runner/NOA-sqlite/.pythonlibs/lib/python3.10/site-packages/discord/ext/commands/bot.py", line 1350, in invoke
await ctx.command.invoke(ctx)
File "/home/runner/NOA-sqlite/.pythonlibs/lib/python3.10/site-packages/discord/ext/commands/core.py", line 1029, in invoke
await injected(*ctx.args, **ctx.kwargs) # type: ignore
File "/home/runner/NOA-sqlite/.pythonlibs/lib/python3.10/site-packages/discord/ext/commands/core.py", line 244, in wrapped
raise CommandInvokeError(exc) from exc
discord.ext.commands.errors.CommandInvokeError: Command raised an exception: InterfaceError: Error binding parameter 0 - probably unsupported type.```
What's going on here?
In case it isn't clear, I'm trying to use a command on discord that adds an id unto the database
And when I try to add an id number, say "12345" that comes out
usage of bad framework is going on here 😅
i refactored my code out of discord.py, and it was best decision i made
huh?
im not gonna not use discord.py for that, it's literally the bot's function lol
Can anyone help me with this issue? ><
I think you're trying to use parameter of a wrong type in your sql query
Stop giving people bad advice, error literally has nothing to do with discord.py
I set it to real/integer (i forgot which one but those should be close enough) and tried to input a number
INTEGER ?
i tried both but it doesn't work
Can you share your query though?
This is the table
This is the command that adds an id unto the database
name='add',
brief='Adds user id into the user list')
async def _add_(ctx, user_ids: commands.Greedy[int]):
if user_ids:
conn.execute("INSERT INTO users (id) VALUES (?)", (user_ids,))```
And what was the full error?
File "/home/runner/NOA-sqlite/.pythonlibs/lib/python3.10/site-packages/discord/ext/commands/bot.py", line 1350, in invoke
await ctx.command.invoke(ctx)
File "/home/runner/NOA-sqlite/.pythonlibs/lib/python3.10/site-packages/discord/ext/commands/core.py", line 1029, in invoke
await injected(*ctx.args, **ctx.kwargs) # type: ignore
File "/home/runner/NOA-sqlite/.pythonlibs/lib/python3.10/site-packages/discord/ext/commands/core.py", line 244, in wrapped
raise CommandInvokeError(exc) from exc
discord.ext.commands.errors.CommandInvokeError: Command raised an exception: InterfaceError: Error binding parameter 0 - probably unsupported type.```
That's not the full error
wait hold on
here
well
user_ids is a list
are you sure you didnt want this?
@client.command(aliases=['a'],
name='add',
brief='Adds user id into the user list')
async def _add_(ctx, user_ids: commands.Greedy[int]):
for u_id in user_ids:
conn.execute("INSERT INTO users (id) VALUES (?)", (u_id,))
thanks, it works now
neat
is that command supposed to take multiple user ids?
or just one?
yep, that's why it's greedy
Didn't know Greedy[int] would be a list
You should change it from a loop of conn.execute to one conn.executemany:
conn.executemany("INSERT INTO users (id) VALUES (?)", tuple(zip(user_ids)))
conn.commit()
The tuple(zip(...)) is a trick to turn an iterable with items to a tuple of 1-item-long tuples.
On another note, consider using an asynchronous database API because your function is asynchronous and you wouldn't want your database to block your client. There's aiosqlite if you come from sqlite3
1-item-long tuples?
I'll consider using an asynchronous database
Tuples are the stuff that aren't meant to be modified right? Kinda goes against my intentions of it being modifiable at will so I think not using tuples is the best for it
Hey, I have a problem. Does anyone know how to fix this error?
The update method itself works with other values, only with the solution it doesn't. The insert function works as well.
ERROR
C:\Users\vhipn\Desktop\Eject[RECODE] V1.1\venv\Lib\site-packages\aiomysql\cursors.py:458: Warning: Truncated incorrect DECIMAL value: 'Hochzeit'
await self._do_get_result()```
# **QUERY**
```py
async def update_solution(self, guild_id: int, solution: str) -> None:
return await MariaDB.execute(("UPDATE emojiquiz SET solution = %s WHERE gid = %s", (solution, guild_id)))```
tuple(zip(user_ids)) just turns the list of user IDs into a differently structured object which executemany needs. It doesn't modify user_ids in any way
user_ids is a list of user IDs, but executemany requires an iterable of arguments. If it's only one argument, it needs an iterable of 1-item-long argument iterables
You can do [[user_id] for user_id in user_ids] if you really want both iterable types to be list, but I'm just saying my conversion trick is shorter
Have you tried it with gid = %d? That column's data type is BIGINT, not VARCHAR
Why "gid = %d"? It works normally everywhere with "%s," but for some reason, I get an error in the only function at the top.
The letter after the % says its data type
d for (decimal) integers, s for variable-length character strings
return await MariaDB.execute(("UPDATE emojiquiz SET solution = %s WHERE gid = %d", (solution, guild_id)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\vhipn\Desktop\Eject[RECODE] V1\database\database.py", line 22, in execute
await cursor.execute(*operation)
File "C:\Users\vhipn\Desktop\Molly V1\venv\Lib\site-packages\aiomysql\cursors.py", line 237, in execute
query = query % self._escape_args(args, conn)
~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: %d format: a real number is required, not str```
BIGINT
gid = guild_id
Before the updating
In any case
If guild_id is an ID string, just put it in int(...)
guild_id: int, solution: str
Apparently guild_id: str
I managed to fix it, but thanks anyway.
👍
Suggest best database language to learn for python
What's a "database language"? Are you asking what flavor of SQL database you should use? Start with SQLite if it's just a small personal project but if you need something more robust then Postgres
Aight, I'll try this
Hi
So lets say i have a many to many relationship
A table that stores sports and a table that stores people
Many people can play a sport and each person can Play multiple sports
And i have an app that lets people register their name and the sport they play and inserts that in the database
So if a person comes and makes a sport named football
And then another person comes and does the same
Does it create a new data?
Or use the same footbal?
This is happening in postgres
If that makes a difference
Theoritically you'd only want 1 of the same sport, e.g Not two footballs, so your sports table would have a unique column for the sport name or however you want to clearly identify the sport
If your talking about predefined sports in the database (not letting the user create a sport), you would just take the user imported sport, query your table for said sport, then put the new row in your peoples table where sport = sport.id (sport id being your foreign key)
What I'm asking is
If two users input the same name
Does it store it twice?
That depends, are you creating the sport via the user input?
Yes
Then yes it would, unless you add a unique column for the sport name, then what would happen is you would get an error when the second user tries
Is there a reason your not using pre-defined sports in the sports table?
It's not actually sports
It's a web project
And the subject is too big for me to predefine all the possibilities
I'd probably have to make billions if cases, which wouldn't be good for me 🙂
if you dont have any relation between the tables in that case, an SQL database might not be the best option
There are relations
Multiple actually
Thank you
I have a specific question, I would like to transfer my .db database to .sql to use it with postgres instead of sqlite3 as I did before but without losing any data
hi can someone help me connect my database with pycharm using psycopg2, it says “no data sources are configured to run this SQL and provide advanced code assistance”. I’ve been trying to fix it all day and it’s making me go crazy atp, feel free to dm me
'default': {
'NAME': 'myDB',
'ENGINE': 'mysql.connector.django',
'HOST': '127.0.0.1',
'PORT': 3306,
'USER': 'root',
'PASSWORD': 'my_password',
'OPTIONS': {
'autocommit': False,
'use_oure': True,
'init_command': "SET foo='bar';"
},
}
}``` What should I change my options too when connecting mySQL server to django
when I tried to runserver it threw an error
could you try 'django.db.backends.mysql' instead of 'mysql.connector.django'
I'm not sure I follow based on the replies, but you'd have three tables: person, sport, person_sport, where person sport was the link between sport and person.
The sports would be unique in sport, and people unique in person, but person_sport would be unique on (person_id, sport_id)
yall pls
Why it store nothing?
Code:
When using command:
It do have data in database
Anyone please help me solve this issue 🙏
Ask in #python-discussion , lots of people can help with pip install problems.
cheers
typically, is there a one-to-one mapping between a Pydantic model and a database table?
What do you mean?
for one Pydantic model, there is a unique database table assigned to it
That is if you want to map your tables that way
But pydantic isn't very suitable for that by itself
when would we have a Pydantic model that maps to more than one database table?
Because you don't always select all columns or relationships
e.g. you may have a model
class User(BaseModel):
id: int
username: str
But what if you only select id?
I see. So for instance, if I have a questionnaire about someone's nutritional habits and exercise habits, we'd have a single Pydantic model correspond to it, but may have the tables Exercise and Meals
I just think pydantic is a bad choice here, since you can just use an ORM that would do the conversion into python object for you
e.g. SQLAlchemy?
Yep
in which case, there actually is a one-to-one relationship between an ORM model and the SQL table
Usually, yes
noted, thanks!
Why are you asking about this by the way? 
There could be some edge-cases with inheritance, etc
my questionnaire takes time in the form "hours minutes", and then I need to transform it into a valid datetime object by inferring year, month, day, and only then it becomes a valid database entry
so I was wondering, what is the usual way this sort of cohersion is done on incoming data
Well, certainly not in your db models, where are you getting that data from?
user input
What kind of?
prompt-answer CLI
You can just pass user input into a method/function and do preprocessing/mapping into an intermediate data object or directly into a model
First option may be overengineered if you don't need to support other types of inputs
E.g. you may have a CLI and Web API
yeah, but the data normalization/cohersion procedure is not done usually done through the ORM model itself? or in this case would a Pydantic model make sense?
Depends 🤷
I personally prefer to not do any complex validation in pydantic
@dataclasses.dataclass
class QuestionaireCreateDTO:
a: int
b: str
c: datetime
def map_cli_args_to_dto(data: dict[str, Any]) -> QuestionaireCreateDTO:
# Map them how you want
...
def main():
user_input: dict[str, Any] = get_cli_args()
dto = map_cli_args_to_dto()
Then you could pass that DTO into a method/function that creates your db record
DTO: data transfer object?
Yep
You can map everything straight into your model, but that would be less "clean" imo
the QuestionnaireCreateDTO has all validation/cohersion methods attached to itself? or do you do them in map_cli_args_to_dto?
You do them in that function, it doesn't really have to be a function really
DTO is just that - a DTO, it doesn't have any methods, just data
and you would create a different function in the case of incoming data from, e.g., an API, correct?
Hm, if I have some simple validation it's usually done in the pydantic model, something more complex (e.g. logic that needs access to db) is done in code
so in the case of complex validation, you would do it at the map_cli_args_to_dto level, right? thanks a lot btw for taking the time
Hm, no, that's just a func to map everything to dto
If you have some specific data format coming from that API you do that in said API
So that logic doesn't pollute anything else
You mentioned that you have to infer time
yeah, the time inference is based on finding which time-tracking database entry the currently defined object is related to, at which point it will use that primary entry's datetime as it's primary key which is also a foreign key
async def books_create(
schema: BookCreateSchema,
command: Annotated[BookCreateCommand, Inject],
) -> BookSchema:
book = await command.execute(dto=BookCreateDTO(title=schema.title))
if isinstance(book, Err):
match book.err_value:
case BookAlreadyExistsError(): # pragma: no branch
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST)
return BookSchema.model_validate(book.ok_value)
class BookService:
def __init__(
self,
repository: BookRepository,
db_context: DBContext,
) -> None:
self._repository = repository
self._db_context = db_context
async def create(
self,
dto: BookCreateDTO,
) -> Result[Book, BookAlreadyExistsError]:
if await self._repository.get(title=dto.title) is not None:
return Err(BookAlreadyExistsError())
book = Book(
title=dto.title,
)
self._db_context.add(book)
await self._db_context.flush()
return Ok(book)
(Exampe from fastapi app)
I do mapping here but I usually make a to_dto method on Schema classes which maps them to dtos
book = await command.execute(dto=BookCreateDTO(title=schema.title))
I prefer to not use **kwargs since they won't be type-safe
e.g.
class BookCreateSchema(BaseSchema):
model_config = ConfigDict(title="BookCreate")
title: str = pydantic.Field(max_length=255)
def to_dto(self) -> BookCreateDTO:
return BookCreateDTO(title=self.title)
It's a simple example though
hm, i'll study this for a while. thanks a lot for sharing, very helpful stuff
BookService sends a book to the database, books_create creates the book Python object fromBookSchema?
Yes, it's a fastapi endpoint
There's also BookCreateCommand class but in this case it just passes dto into BookService.create method
so validation is split multiple ways here: BookService makes sure the book doesn't already exist in the database, books_create calls for the validation requirements stored in BookSchema
Yes, though it makes more sense to make a BookCreateDTO into a BaseModel too and duplicate validation here 
You need to leave some validation in fastapi's schema here just so it's built-in validation works
But some frameworks I integrated with don't have any built-in validation
I see, seems like you gotta keep a close eye to prevent poluting the code with bits of validation logic here and there
I'm trying to streamline it into a clearly visible high-level structure
Hello everyone, i'm currently needing help with my safe chat i want to have a database and i need to have one for users and messages but sadly nothing is working as efficient and instead i have json errors, i woud like some help from anyone that have the time to do so : https://paste.pythondiscord.com/SWDQ
Here's another example ( strawberry-graphql doesn't provide any built-in validation):
@strawberry.input(name="DashboardCreateInput")
class DashboardCreateInputGQL:
name: str
def to_dto(self) -> DashboardCreateDTO:
return DashboardCreateDTO(
name=self.name,
)
@strawberry.type
class DashboardMutations:
@strawberry.mutation
@inject
async def create_dashboard(
self,
info: Info,
command: Annotated[DashboardCreateCommand, Inject],
input_: Annotated[
DashboardCreateInputGQL,
strawberry.argument(name="input"),
],
) -> DashboardCreatePayloadGQL:
dto = validate_callable(input_.to_dto)
if isinstance(dto, ValidationErrors):
return DashboardCreatePayloadGQL(error=dto)
result = await command(
dto=dto,
user=await info.context.user_model,
)
return DashboardCreatePayloadGQL(
dashboard=DashboardGQL.from_orm_(result.ok_value),
error=None,
)
validate_callable is pretty simple:
def validate_callable(callable_: Callable[[], T]) -> T | ValidationErrors:
try:
return callable_()
except pydantic.ValidationError as exc:
return _to_graphql_error(exc)
Would this sort of structure make sense?
Questionnary method
Obtains a set of user imputs in a prompt-answer format
Cohersion method
Coherses questionnary strings into determinate data types
Validation method
Complex validation (Does validation which requires server access)
Simple validation (Simple Pydantic)
Model-creation method
Creates one or more ORM models from the incoming cohersed and validated data
To database method
Sends models to database
Yeah, I would split your application into "adapters" and your core/domain
For example you could have a cli adapter and rest api adapters, both could use the same methods from core
Adapters could use core, but not the other way around, keeping the core more or less isolated
cohersion, validation, model-creation, and transfer to database would be part of the core, right? the above Questionary method would be the part where we code the adapters
cohersion? Depends
Adapter should map it's data into the format that core understands
I see. so validation, model-creation and transfer-to-database would be the elements of the core
Yep
I think you can use pydantic models in core too, so the data received is guaranteed to be validated
noted! thank you very much for the help seriously, this has done a great deal in orienting me for finishing my application
tests/test_imports.py lines 10 to 16
def test_core_cant_import_adapters() -> None:
(
archrule("core_imports")
.match("app.core.**")
.should_not_import("app.adapters.*")
.check("app.core")
)```
coool!
@paper flower
Why ping me?
If you are not busy
I posted a thread in python help if you have time
the DTO is the end product of the adapter segment, right?
yes
nice! this all makes nice structural sense. thanks a lot again
@floral surge Generally you want your adapters to interact with core of your applications via pre-defined ports, I use Command/Query classes for that
You could use services directly but I think that exposes too much for the adapters
e.g. BookService.create returns Book or BookAlreadyExistsError, if we would want to add permission validation it wouldn't really be the right place to do that
hm, a service is a distinct conceptual unit in your program structure, right?
Yes
So I would put that into the command class:
class BookCreateCommand:
def __init__(
self,
book_service: BookService,
permission_service: PermissionService, # <-- New class
) -> None:
self._book_service = book_service
self._permission_service = permission_service
async def execute(self, dto: BookCreateDTO) -> Result[Book, BookAlreadyExistsError | PermissionDeniedError]:
if not await self._permission_service.can(Action.create, obj=Book):
return Err(PermissionDeniedError())
return await self._book_service.create(dto=dto)
Existing code that uses BookCreateCommand and properly handles error values would fail type-checking, since PermissionDeniedError is a new error
Like this:
src\app\adapters\api\books\_router.py:39:38: error: Item "Err[BookAlreadyExistsError | PermissionError]" of "Ok[Book] | Err[BookAlreadyExistsError | PermissionError]" has no attribute "ok_value"
[union-attr]
return BookSchema.model_validate(book.ok_value)
^~~~~~~~~~~~~
src\app\adapters\graphql\apps\books\_mutations.py:37:37: error: Item "Err[BookAlreadyExistsError | PermissionError]" of "Ok[Book] | Err[BookAlreadyExistsError | PermissionError]" has no attribute
"ok_value" [union-attr]
result=BookGQL.from_orm(result.ok_value),
hm, BookCreateCommand is instanciated by a user to initiate the creation of a Book object, which is then uploaded to the database with BookService?
Yep
Say if we are making a database where attendance of students comes into place
Over there we have to get date wise attendance
Would it be better to have a separate table for each date, where the student serves as the primary key and it just has two columns (student and present/absent column)
Or have a single table with the date column as primary key and each student has their own column (date, student 1, student 2..... student n)
@tough kiln Never go for the last option
Could you describe in a bit more detail what data you want to store and how you want to use it? 
There may be a better design
I want to store whether a student is present or absent on X date
Probably a table like this would do:
student_id, date, is_present
I would use a surrogate pk here and put an unique constraint on (student_id, date) 
Composite pkeys are a pain to deal with in most cases
I was thinking on that, seems like a better implementation
Just to configure pulling records now from a gui
alright i got a fairly hard-ish question for who ever can help.
how would one store data into a db that resembles the github activity calendar basically. Like just the logic behind it, how would be the best way to store that activity data to an account. Im fairly new to data modelling etc, sorry!
That activity calendar could theoretically just be based on a table of edits. You may not need to now anything more than user and date of each edit
What would be the best way to store around 10+ billion records, around 5 - 10 TB of data total
All of the cloud solutions like aws / google cloud / etc seem to cost hundreds if not thousands per month which I can't afford for a hobby
I'm assuming the best solution would be to buy a physical server and self host it and then just worry about electricity costs, but wondering if anyone has any experience on this front
You'd need to explain more about your use case, such as access pattern, change rate, etc.
Maybe explain what you're trying to do, what kind of data it is, etc.
The data would be queried / updated constantly, but wouldn't have to be high performance, and it's just 1 billion user profiles / archived profile info, and maybe like 20 billion relationships / friendships
oh, so graph data?
like as long as it could return a query within 10 seconds then that's fine tbh and it doesn't need to update the data quickly
no it's just user profiles. Like imagine parsing every instagram profile in existence or something like that (but for another website) as well as relationships like friends, profile summaries, avatar history, etc.
My usual thought process for this type of problem is first "Can I do it with parquet files?" (hive partitioned ofc)
Then second: Can I fit it into memory? If so, duckdb (is my personal choice)
would you store them on amazon s3 or something? cuz i think i looked at it before and it still seemed to be like $200 per month which isn't terrible but yeah
Do you know how well this data will compress?
If you don't, I would create a test case (like 1 million records into a parquet file)
no i just sorta calculated the end would be like 4 - 5 tb once everything's indexed etc in a typical sql database
Yah but you wouldn't use a typical sql database for this. You'd use an OLAP / columnar database, for sure.
This is static data, right?
Or are you appending? Or are you modifying?
there's multiple tables but there's say like 10+ billion relationships that tells you the date when x and y user were last seen as friends. New records would be added and old records would be updated with more recent timestamps like last_seen
And maybe 1 bill profile records with info such as summary, avatar hash, and whatnot, that might also be updated
i'm guessing it'd best to like store the relationships in a sql database but then individual profiles in parquet files or some other way
You'd just put everything in a columnar store.
Oh, sorry, read closer: new records added / old records update. I dunno, you'd have to test... if it's constantly updating, you might be better off with a key-value store, rather than a rdbms
do you think it'd be ok to buy like one of those 32cpu 256gb ram servers and then i'd mostly just worry about electricity costs and network stuff
idk that much about databases
Candidly, I think you're trying to run before you crawl. Get more data before spending $$.
Do some experiments on a smaller dataset.
Parquet, for instance, is especially nice when you can accumulate the data. ie: having a daily file for changes, etc.
Or, when you can do something like run a periodic job to update the "last seen" value... rather than computing it in real time.
yea there's also definitely ways i can optimize the db to make it smaller but i'll test it more later, ty
oh but: yah, usually buying the equipment yourself will save you in the long-term... cloud is great for availability/etc, but the $$'s add up.
The entire project still has to be planned carefully cuz you can't periodically just update 1 billion profile records for example, it has to only be active / relevant ones which i'm also taking into consideration so it's like only updating / tracking the top 10 million profiles for example
But, a 32core / 128GB system would cost about $3k usd.
i just saw like rly good ones on ebay for less than half that
granted they're used tho and stuff
I just bought one, top of the line, so that's what I paid for absolute latest i9-14th gen, so yah, agree.
(plus GPU)
oh mhm i haven't rly bothered to check the gen of the specs or anything, i think i even saw like 512gb ram 44 cpu for $2400 but that was without hard drives or a gpu and stuff
i'm assuming u need a bunch of ram to query rly large databases
if you index it properly, it shouldn't require that much RAM I believe?
I'd be more worried about the disk usage personally
(agree with etrotta) I doubt you'll need anywhere near that... but I don't know your workload, ofc.
ty for your response, i will look into it
Hi all this is more of a mysql python scripting issue.
But currently with this function, I get an addition of extra quotes for the parameters I pass in:
cur.execute("CREATE DATABASE IF NOT EXISTS `%s`;", (db, ))```
Example: ```createDB('supplies')```ends up being `''supplies''`, I've done this with f-strings but I know that it's best practice to avoid using string formatting... does anyone have any idea how I overcome the addition of extra quotes?
It’s that the db is adding quotes already when you use a parameter. I believe you’d just drop the quotes around the %s
Im not 💯 sure tho about mysqls support for parameterized database names: I understand the quote issue but was a little surprised this even worked.
If I take off those `` i get "mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''chatgpt'' at line 1"
with them CMySQLCursor: SHOW DATABASES; ("'chatgpt'",) ('information_schema',) ('mysql',) ('performance_schema',) ('sys',)
Yah, see, that’s what I -expected- would happen. Usually you’d need to prepare a dynamic statement or something
So, i am surprised your first thing worked at all
How would a dynamic statement be formated?
Oh I see, and I would would have to set the conn.cursor(prepared=True) and then setup the dynamic sql phase.
In your experience would there be a better way of doing this? Other than dynamic sql? Maybe use cursor.execute in a different way? like more directly?
Most dbs I work with don’t play well with dynamic table or column names. Query parameters are meant for values.
This is a common request/problem
Right, so the process of architecting the DB should be done with specifying values in the sql instead of passing them in. Sounds annoying xD
Very annoying indeed. It’s such a common problem: wanting dynamic table names.
I guess I could always just run the sql from a file, specific to each use case.
and use the function to call that specific action. Hand change it if I need to.
Why do you need to create db dynamically?
Im new to db creation, so I thought it would be more convenient and possibly efficient if I could run most of what I need using functions that I can change on the go If need be, especially If I'm linking it to another python script that might pass in different values.
Im not really used to setting up scripts for Database creation, and I want to have an understanding of how it works at scale and eventually fit for production.
If we're talking about a typical app usually a developer or a devops person, whoever has access or operates a db sets up a logical/physical db for you and provides you with it's name and credentials
Seeing as This is more of my own project, I will be doing that myself, I think you gave a good point out to the approach I should take for planning my db and structuring it for implementation. How would you go about setting up logical/physical db? and also how do you setup the script for it? How do you automate the sql for it to be easily integrated into the whole program? Hopefully I am not asking too many questions 😛
https://dbdiagram.io/ to plan ur database
could use an ORM or write raw sql for ur app to interact with the database
Thanks, I did consider using such tools, but didn't follow through yet. I believe MySQL comes with Workbench tool which does that. I was trying to get the functions coded before I setup the database, but that might be the wrong order... : )
I don't automate setting up the db
You just create a db and use some kind of migration tool to create all the tables and other db object
e.g. alembic
When time goes on you don't need to think about designing a db/tables that much becaues you already have experience with that, unless is's some kind of complex relationship, so you just write/create them
For example:
I need to store authors and their books: well, that's simple one-many relationship, create author and book tables, add author_id to books, done
Turns out there could be co-authors, well, add a third table to facilitate many-to-many relationship, migrate existing data with simple sql, you're done 😉
I see that would make more sense, than functions that are inherently used for creation and recreation. Thankfully I have got backups working so that will be a tick off the boxes. Also thanks for the tool recommendation, I am currently checking it out.
Also the example is appreciated, sometimes it's hard to see through the logic and abstract a situation for me!!!
Yeah, you should be able to create a copy of your production database schema locally, and you shouldn't make any change manually, so it's reproducable
Example for what specifically? 
Oh no the example you talked about, "store authors and their books".
What do you need exactly?
I like how you go to the point with it
Was just thanking you for the example lol 😛
Ah, I see 😅
I thought you wanted me to provide a concrete example (e.g. code) so I was a bit confused
I usually use sqlalchemy as an orm and alembic to handle migrations, it can generate migrations for most of the changes in sqlalchemy models automatically
But some things it doesn't handle well, e.g. column renames, it would think that you deleted and added a new column, you have to modify such migrations manually
You can take a look at django orm too, but I don't really like it
Haha no that would just be over the top.
Oh interesting, and here i thought the ORM was for visualization purposes (literally seeing it).
Haha, no
Thanks for all the info, I'm gonna read up and document what I need for a working framework for understanding this process!! Have a good one 
sqlalchemy itself is quite easy to bootstrap
But you may need to fiddle a bit with alembic
I'm still not at that stage, but def will keep it in mind
Hi All,
I've a query regarding an issue we've been facing lately in our MongoDB v4.4.18 cluster running over AWS EC2 instances and managed by Cloud Mongo service.
We are running with a Primary, 2 RR and 1 delayed hidden. The delayed hidden in 6 hours behind current time.
Lately, the replication lag between our delayed hidden instance and primary has gone over 300+ secs 2-3 times.
Once, the issue was due to our EC2 volume configuration, it was not able to handle the throughput of incoming data.
We checked it with AWS and fixed it and it went to normal.
2 weeks later, we again start getting the alerts of lag and the metric which we checked before for volume was normal.
So we don't have any clue what could be the issue here. Any open ideas?
hey guys
I know sqlite3 has some JSON support, do I need to install the extension or how does it work when using it trough Python?
oh interesting
but it still has a some builtin JSON functions: https://www.sqlite.org/json1.html
ok, so I'm rewriting an API that deals with a graph, and the most common operation I need to do is find the descandants of a given node, so instead of storing two columns as parent: str, children: list[str] I will simply store the parent once for each child, some thing like this:
create table parent_child (
parent integer,
child integer,
constraint parent_child_pk primary key (parent, child)
);
insert all
into parent_child values (1, 2)
into parent_child values (1, 3)
into parent_child values (3, 4)
into parent_child values (3, 5)
into parent_child values (2, 6)
into parent_child values (7, 8)
into parent_child values (7, 9)
select 1 from dual;
commit;
then I can use smth like this query to navigate it
with descendants (node) as (
select 1 from dual -- root
union all
select child from descendants inner join parent_child on parent = node
)
select node from descendants order by node;
what do you guys think of the design? I plan on using an in process SQLite3 DB
We recently published article about Cassandra..
Cassandra: Use Cases and Python CRUD Operations
https://www.blackslate.io/articles/cassandra-use-cases-and-python-crud-operations
can I index a column with duplicate values in SQLite3?
Yes, just don't use the UNIQUE keyword on index creation.
How long is altering a table to add a foreign key in one of your columns after you created an index for that column suppose to take? It’s been processing for 10 minutes so far.
I think altering a table never takes that long
depends on how large the tables are and which database you're using I think?
I believe it creates a new table behind the scenes, thats what I've heared
I did that thanks
I tried this query and it work great, until it finds a closed loop.
is there a way I can modify this query to keep track of the nodes that were already seen:
with descendants (node) as (
select 1 from dual -- root
union all
select child from descendants inner join parent_child on parent = node
)
select node from descendants order by node;
Even if this is the case, I think it happens way faster than ten minutes
its all relative, it depends how large the table
I had a MySQL table that had more than 50GB of data, and when I try to drop it and it took a few minutes and eventually the DB crashed...
any ideas anybody?
I dont want to use Python 😭
Hi guys, i'm using python firebase admin sdk and uploading csv file to firebase storage, now i'm trying to store the public downloadble and also setting that url in user collection document... but when i tap on the generated link, it throws permission error: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).
firebase storage permissions:
service firebase.storage {
match /b/{bucket}/o {
match /{allPaths=**} {
allow read: if true;
allow write: if request.auth != null;
}
}
}
python code to generate public url:
from firebase_admin import storage, firestore
bucket = storage.bucket()
blob = bucket.blob(file_path)
blob.upload_from_string(df.to_csv())
ref = db.collection(f"users/{user.uid}/history")
ref.add({
"query": str(search_query),
"result_file_url": blob._get_download_url(storage._StorageClient),
"created_at": firestore.firestore.SERVER_TIMESTAMP,
})
howdy all, i was given the task of creating a very barebones ERM in python, im having a problem with getting everything onto one window, any suggestions?
Hello, what value type do I use so that sqlite doesn't use decimals? Trying to print the database that has numbers in it results in the numbers ending with .0 (those are whole numbers)
Altering a table can indeed take a long time, lots of ‘it depends’ here
Adding a column in MySQL would result in rewriting the entire table, for instance
In most cases, tables are considered unordered… results can be returned in any order, unless you specify an order by. PerhPs it’s more precise to say: query results are unordered, unless order is specified in the query
guys, any ideas how I can keep track of visited nodes in a recursive CTE query?
I wonder if its possible to create a temporary table inside the CTE query that has one column, and I simply INSERT each node I visited
~~Why does SELECT DATE_ADD("2020-9-22", INTERVAL 2 MONTH); fail to run? I've seen this as the syntax on multiple websites? ~~
The error I get is: "Wrong arguments '2': Date_ADD() only takes TINYINT/SMALLINT/INT types as second argument, got INTERVAL_YEAR_MONTH. Looks like we use HiveQL 
This SELECT DATE_ADD("2020-9-22", 2); works just fine, by adding 2 days.
yoo how fu**ing clean is this:
conn.execute("""
WITH descendants (node) AS (
SELECT 149505
UNION ALL
SELECT child FROM descendants INNER JOIN categories ON parent = node AND NOT node_seen(node)
)
SELECT node FROM descendants ORDER BY node;
""").fetchall()
and its super fast
The magic number is bugging me
thats the ID of the root-node
the crazy part is actually the Python function node_seen()
but I think I will still use a different method
Is there a way I can optimize storage and performance if a copumn has many duplicate values?
I thought about using an ENUM but Mysql has a limit of just 64k unique values, is there an alternative?
Would be nice if there was something similar to pd.Categorical so I can hot encode the column
What do you want to optimize for? I don't quite understand
If there are 64k+ unique values it certainly shouldn't be any kind of enum
also, this is why your database has compression... The compression algorithms will almost certainly be removing all the duplicate data if it is sane to do so
mostly for storage, but also comparing an integer to another integer is much quicker than with a string
OLAP engines are good for this, but traditional row structured db’s won’t reduce across rows.
I suppose I will store the INTEGER to STRING mapping in a separate table, and in the other I will just keep the IDs/integers
what like Duckdb?
Yah, and parquet files too… column orientation unlocks columnar compression
I need indexed columns tho, with O(1) lookup
I dont think there is a way to avoid a full-scan with parquets
It’s not quite that simple: Parquet is organized in row groups, and when you scan, you’re only scanning the target column.
And row groups have pages/etc.
But, columnar comes with other tradeoffs, and I was just pointing out the diff.
I'm aware of that, also file metadata can help to know if to scan that file ...
but what I'm doing is too performance intensive, graph traversal
so I will manually hot-encode the values
What are you storing that enum-like values as right now?
@paper flower this
its this kind of data
I don't see the actual type 🙂
The most optimal type would be something like int32/int64, check how much storage you'll save and if it's worth it
Also how many records do you have?
the underlying type is a string, int32 is exactly what I need. the ENUM is a uint16 I believe (in MySQL)
And how many records do you have?
Mybe just use an int32 + foreign key if you can 🤷
@queen rose ?
idk man, the unique values are about 20% of all the values
How many rows?
Because if it's, say 200k rows it's not work optimizing at all
You don't know how much data you have and yet you want to optimize something?
If we assume average size of your string is 10 chars, 8 bytes each, then for 10m rows it would be roughly 700mb of storage, but I'm not aware of how db would actually store it
>>> 8*9 * 10_000_000 / 1024 / 1024
686.6455078125
Just think if it's worth optimizing for or not 🤷
You could definitely use a foreign key here but if you'd need to filter by name you'd have to join to a different table
Which would be slower
why 8 * 9?
yeah utf8
I'll use integers, I just gotta check how to configure that with SQLalchemy
Don't 😅
Do you actually need those string values?
If yes - where would you be storing them?
its definetly much much higher, even with SQLite I had a file of 500mb
Probably not because of the strings
what other columns do you have?
Also db could have been not truncated too
half of that was indexes (on the two columns)
anyway dont worry, I like optimizing
hello
does anybody know how to define an int32 with sqlalchemy?
I ended up using from sqlalchemy.dialects.mysql.types import MEDIUMINT with MEDIUMINT(unsigned=True)
god, I hate how SQLalchemy requires a PK in each table
What stops u from writing reusable Mixin class if necessary 🙂
With your desired default pk and type
wdym create my own mixin?
but I dont want a default PK
I want some tables without any PK
make all columns into a composite PK
but I dont want to spend that extra storage?
why should I have to have a PK if I dont use the ORM interface
its very annoying
Are there any library which allow us to load a mysql database in redis?
thanks man
It does not 
pk could be "virtual", but yeah, it requires it
does not have to be an actual pk though
Oh, etrotta already linked the relevant docs 😅
There are libraries to read from MySQL and others to write to Reddis. I doubt you'll find one that does both.
Ohh ok ok got it! Thanks
Hello, I'm trying to combine two crosstab in postgres but stuck it will be great if someone guide me
-- rainfall CROSS TAB
SELECT * FROM
CROSSTAB(
'SELECT block_id, forecast_date, rainfall_mm
FROM weather.imd_forecast
ORDER BY 1'
)
AS rainfall_crosstab(
block_id int, rainfall_day_1 numeric, rainfall_day_2 numeric,
rainfall_day_3 numeric, rainfall_day_4 numeric, rainfall_day_5 numeric
);
-- temp_max CROSS TAB
SELECT * FROM
CROSSTAB(
'SELECT block_id, forecast_date, temp_max_deg_c
FROM weather.imd_forecast
ORDER BY 1'
)
AS temp_max_crosstab(
block_id int, temp_max_day_1 numeric, temp_max_day_2 numeric,
temp_max_day_3 numeric, temp_max_day_4 numeric, temp_max_day_5 numeric);
my aim is to make one crosstab of rainfall and temp
so I benchmarked my graph traversal with MySQL and SQLite, and with SQLite it only took 1.2 seconds to scan the whole graph from the root-node, while with MySQL, with the same exact data and query it took 10 seconds
10X slower
SQLite really is the GOAT db
after adding indexes it only takes 6 seconds, still 5x slower
That is pretty suspicious
but also idk why you are using SQL for graph traversal unless you are creating connected components
Graphs in SQL is a hugeeeeeeeeeeee inefficient mess
There is modern shiny this lib to have it working nicely.
https://github.com/feincms/django-tree-queries
U could try this instead of whatever you are doing
this solution is optimal from the point of writing at least
while having nicely clean.... from the point of running sql code for reading
There are also alternate solutions which are very optimized to reading operations... but they are...mmmm in a very messy state.
They're not using django
As someone who's been developing restful apps for quite awhile, how/where do I start learning about graphql?
how trivial is it to close a cursor (not connection) when working with a database?
!rule paid
Is there a way to remove commas on integer when I fetch the database? When I use a command to get a database the output is like this
[(123,)]
that might be the reason why a certain function isn't working but I'm not too sure
that is
- a list containing a
- tuple containing a
- integer
you could usething[0]to select the first element of the list, orthing[0][0]to select the integer in that case
what the
how xD
probably something with the add command on my program
Hey all friends! I want to grab the coordinate data of a large number of data points from a graph, but my graph is not complete, I need to do mathematical fitting to get the missing data points and complete the curve, and I want to import the coordinate data into Excel or a database for subsequent point fetching tasks. Can this be done using Matlab OR Python and what tools or libraries do I need to use?
Can I use sqlite on production?
If anyone here is a MariaDB user, I could use your help! I'm running a bot that uses a MariaDB MySQL database connection, but it's not very active, so it times out rather frequently. Is there a good way to prevent this happening? I've thought of checking to see if the server is active, and if not, reinstating the connection, but I don't know how to check whether or not the connection is still active. Any help is appreciated!
Not if you're using sqlite3 or aiosqlite. fetchall might fetch values from multiple records, so it puts them in a list. Each of these values is put in a tuple because you may have fetched multiple values, like with fetchone
Yes, depending on your production environment 😉
I work with a legacy customer who does and it's been running that way for years but a) it's a fairly low-traffic app and b) maintenance is pretty annoying and time consuming.
So depending on the use case I'd say you might get away with it, but would avoid it if possible
You can, but isn't the best option in long term, even on my small projects I had to change to MySQL due to the limitations
There are also many non-python applications that use an embedded sqlite database. It's a fairly old/proven technology.
guys, im having a column in a table for storing email ids
for that, i've decided to add a constraint to check whether they match the email regex before inserting the data
is this ok to do? like, im just validating the input one step prior
is it really worth it?
im using postgres and this is the regex ^(?!\\.)[\\w.-]*[^.]@\\w+(?:\\.\\w+){1,2}[^.\\W]$
but another problem is, its not working
pls ping me on reply
Does anyone know if catch an sqlalchemy error like this one
<bound method SQLAlchemyError._message of DataError('(pymysql.err.DataError) (3263, "NULL supplied to NOT NULL column \'future_releases\' at row 1")')>
How do i actually use that obj to get at the message itself "NULL supplied to NOT NULL column \'future_releases\' at row 1" ?
what do you mean storing ids is a problem? How?
Dude: This is def the wrong channel for that 😄
Hello. I would like to get rid of the escape characters format in my database. They were updated like this in Python but I changed the code to stop that from happening. Now I would like to change the formatting of the JSON string in the table. How can I do that?
I tried ```sql
SELECT *
FROM jb_projects
WHERE JSON_UNQUOTE(JSON_EXTRACT(obj, '$')) LIKE '[______,%';
It says
[22001][3140] Data truncation: Invalid JSON text: "The document root must not be followed by other values." at position 4256 in value for column 'jb_projects.obj'.
We can calculate if object has children as
SELECT 1 AS "a" FROM "zone_zone" WHERE "zone_zone"."parent_id" = '728046be-0df2-432c-0001-ac781965844f'::uuid LIMIT 1
How to rewrite it in a good way to be optimal for postgres for multiple zone objects in bulk to check?
Some kind of window function i would guess to apply? 🤔
or alternatively we could go with just subquery perhaps
SELECT *, (SELECT 1 AS "a" FROM "zone_zone" WHERE "zone_zone"."parent_id" = ZONE.PK::uuid LIMIT 1)
FROM ZONE
WHERE ZONE IN desired list of PKs
As a dumb way i can think of 🤔
(i am trying to affirm that zone is not having any children (parent_id is a thing from adjucency list to represent graph)
SELECT *,
count/exists() OVER (PARTITION BY zone_zone.parent_id)
FROM zone_zone
WHERE ZONE IN desired list of PKs
something like this may be possible to write
or it is less efficient than handling as a subquery above 🤔
Essentially i am trying to write query: SQL check if foreign key exists for every row
The error here is not the query, but the value of "obj". Can you share an example row or reproducible example?
for example: sql with mydata as (select 'somejsontext' as obj) select * from mydata where JSON_UNQUOTE(JSON_EXTRACT(obj, '$')) LIKE '[______,%';
SELECT z.pk
FROM zone_zone z
WHERE NOT EXISTS (SELECT 1 AS "a" FROM "zone_zone" WHERE "zone_zone"."parent_id" = z.pk::uuid LIMIT 1))
in theory perhaps should be efficient way to check SQL for every row, check that no row has parent id belonging to it
subquery way with abusing EXISTS to maximum
This is one of those areas that annoys me endlessly when optimizing. It's often hard to predict which query is going to be optimal.
[367603, "{\"id\": 367603, \"name\": \"name_goes_here\", \"short_description\": null, \"custom_id\": \"12345\", \"version\": \"1.4\", .....
Is this what you’re asking?
Yes, so put it all together in a single query that reproduces your error.
Graphs in relational db 😅 problems by default
But I only want the first 8 characters removed.
Also, what database is this? Postgres?
MySQL
If you just want to remove 8 characters, why json_extract/unquote? You could just take the substring.
always an option to try writing dummy performance test though
Yah, I always seem to need to. Keeps me in business.
I don’t get what you mean by put it all together
My query above, ```sql
with mydata as (select 'somejsontext' as obj)
select * from mydata
where JSON_UNQUOTE(JSON_EXTRACT(obj, '$')) LIKE '[______,%';
Replace somejsontext with an example of your JSON text.
tuple.rstrip(',')?
technically i can always run them both with EXPLAIN ANALYZE first on our staging or production db i guess
before i try writing final ORM code i am asked to write
with jb_projects as (SELECT '[' as obj) SELECT * FROM jb_projects WHERE JSON_UNQUOTE(JSON_EXTRACT(obj, '$')) LIKE '[______,%';
This is my error
[22001][3141] Data truncation: Invalid JSON text in argument 1 to function json_extract: "Invalid value." at position 1.
Well, that's easy. '[' is invalid JSON. Give it valid JSON.
Not really. Because then I can’t use JSON keywords in my query then?
If ‘[‘ isn’t JSON, then I can’t remove the first 8 characters that are in the obj column
with jb_projects as (SELECT '{"a": 2}' as obj) SELECT * FROM jb_projects WHERE JSON_EXTRACT(obj, '$');
If you just want to remove the first 8 characters of a string, then: ```sql
select substring('123456789abcdef', 9)
JSON_EXTRACT requires the input to be valid JSON.
Ok. So then any JSON keywords won’t work then.
I’m trying to remove “[______,” that are in my obj column. I have some rows like this in my obj column.
Use substring to remove the text you want to remove.
Ok
My goodness.
[22001][3140] Data truncation: Invalid JSON text: "The document root must not be followed by other values." at position 4256 in value for column 'jb_projects.obj'.
My query
UPDATE jb_projects SET jb_projects.obj = SUBSTRING(obj, 9) WHERE obj LIKE '[______,%';
let me know if i understand the problem correctly. you inserted a list [a_number, json_string], and you want to just change it to be the json object that json_string represents
Correct.
you can use a list comprehension to make it into a list
result = [(11,),(222,)]
>>> [r[0] for r in result]
[11, 222]
or if you know you have 1 row you can just get it directly from the result
result[0][0]
11
result[1][0]
222
which is the same as saying
val = result[row][col]
You really should start with a basic question: Do you have valid JSON?
Just open the data in VSCode, in a .json file. and see what errors pop up.
I assume this particular error is because you didn't remove the trailing ']' after removing the leading '[', but who knows what other issues there are.
the original question didn't make sense to me. the error wouldn't have been caused by escape characters, rather trying to insert a string with 2 json objects concatenated together i think
Per #databases message, I think the same: the text is just something like: '[1234, "{escapedjsontext}"]`, which... even after removing the first element is still not valid json.
Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
it should be valid json though, right? as long as the string was escaped properly
Nah, it should look like this https://dbfiddle.uk/vE_RIkoa: ```sql
WITH jb_projects AS (
SELECT '{"id": 367603, "name": "name_goes_here", "short_description": null, "custom_id": "12345", "version": "1.4"}' as obj
)
SELECT JSON_EXTRACT(obj, '$') AS extracted_id
FROM jb_projects;
| extracted_id |
|---|
| {"id": 367603, "name": "name_goes_here", "version": "1.4", "custom_id": "12345", "short_description": null} |
So, need to extract the second element from the list, and also unquote it
https://dbfiddle.uk/8jkSLQZJ will unquote.
| json_extract(json_unquote(obj), '$') |
|---|
| {"id": 367603} |
hmm. i thought the data in the message you linked was a print from python
i'm a bit out of my depth (breadth?) with mysql though
json and databases is usually a weird area.. I never like working with it.
i think there are some good use cases, but it's very icky from a theoretical perspective
hi guys what is this error:
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 7932089088 and this is thread id 6135279616.
The values in the obj column that start with “[“ followed by 6 digits that end with a comma is not valid json.
That’s why I’m trying to remove it.
columns got updated that way with Python
My question was/is: is it valid if you remove the first 8 characters?
Yes (I think)
I will also want to remove the escape characters
Hint: it’s not.
But I have been told that with escape characters, it’s valid JSON syntax
the error was not pointing to an invalid string, it mentioned that you have two json objects in the column
So how can I remove the first object?
i'm not sure. i don't know how you would do it in mysql, and i'm honestly not sure how you would do it with python
You can substring and remove the leading and trailing character(s). That why I suggested copying to vscode and seeing what’s required to make the string valid json
Ohhh ok I see
yeah if it's just one entry, then you can just do it manually
I was thinking of just slicing (substring is same as a Python sliceb
yeah
its not a problem actually
but i've just decided to implement a regex check constraint to verify that its indeed receiving an email as a value for its column
chat GPT suggests this type
CREATE TABLE employees (
id serial PRIMARY KEY,
first_name varchar(50),
last_name varchar(50),
email varchar(100) CHECK (email ~* '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')
);
otherwise both your regex and chatGPT regex seem to work on my tests with re.match
so, i cant use \w
i just have to expand them it seems...
i'll try this and report it later
thre are a few forms. but yours worked for a few i tried but as the stackoverflow showed a+b@whatever.com is also a valid mail address. so you possibly could just test the entries with re.match(patt, email) on all entries in the DB to see if it holds up
guys, how can i encrypt the entire postgres db at rest?
i tried using capture groups, negative lookbehind
but it failed
how did my regex work for u?
r1 = '^(?!\.)[\w.-]*[^.]@\w+(?:\.\w+){1,2}[^.\W]$'
re.match(r1, 'abc@domain.com')
<re.Match object; span=(0, 14), match='abc@domain.com'>
but its not very good
re.match(r1, 'a.bc@domain.com')
<re.Match object; span=(0, 15), match='a.bc@domain.com'>
no i dont use postgres. did the stack overflow help ?
yep, it did
the problem was postgres regex didn't support positive lookahead groups, and even flags like \w, \d, \s
guys, im thinking of encrypting the whole db at rest using pgcrypto to encrypt each column
but i read somewhere that if there's any error occurred while running the sql query, and if the error statements gets logged with the query, it'll expose my secret key that i use for encryption/decryption
so, is there any better way to encrypt the db?
guys, ive decided to make a personal password manager
and i've also decided to store all the info that i get from a site, like username, api keys, even the stuff like recovery codes, etc. in it, in a separate table
im creating a table for each one like this
CREATE Table IF NOT EXISTS emails (
email_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
email TEXT UNIQUE NOT NULL
CONSTRAINT valid_email
-- CHECK (email ~* '^[a-z0-9_-][a-z0-9._-]*@[a-z0-9_]+(\.[a-z0-9_]+){1,2}[a-z0-9]$')
CHECK (email ~* '^([^<>()[\]\.,;: \r\n\t@"]+(\.[^<>()[\]\\.,;: \r\n\t@"]+)*|".+")@(([a-z0-9-]+\.)+[a-z]{2,})$')
);
CREATE TABLE IF NOT EXISTS sites (
site_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
email TEXT,
site TEXT UNIQUE NOT NULL,
username TEXT,
CONSTRAINT fk_email
Foreign Key(email)
REFERENCES emails(email)
ON DELETE No ACTION
);
CREATE TABLE IF NOT EXISTS api_keys (
api_key_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
site TEXT NOT NULL,
apikeys JSON NOT NULL,
CONSTRAINT fk_site
Foreign Key (site)
REFERENCES sites(site)
ON DELETE NO ACTION
);
CREATE TABLE IF NOT EXISTS recovery_vault (
recovey_key_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
site TEXT NOT NULL,
recovery_data JSON NOT NULL,
CONSTRAINT fk_site
Foreign Key (site)
REFERENCES sites(site)
ON DELETE NO ACTION
);
now the question is, is the schema for creating a structured relational database is good here?
im keeping emails, sites, pretty much a separate table for each attribute
but now the problem i see is, if i wanted to insert a new entry in sites table, or in apikeys table, and the email id doesnt exist, then i have to run several queries to update the whole db
plus im doing full db encryption
what should i have to do here?
- just store everything in a single table, and fill null to empty values
- make separate constraints to add values to email table if not exists
- just make those extra queries (again, i've done a full db encryption)
guys, ive decided to make a personal password manager
Genius idea. I want to make mine too now
guys, should I create my own graph database in Rust even tho I've never used a proper graph db like Neo4J ?
very simply and probably dumb question
what is the point of a database, or where is it useful
a database is for storing (and accessing data), most real-world applications use a database of some sort
okay, can you give an example of such?
i see how in huge scale things such as a search engine or hosting sites need that, but what else
for example if you have an e-commerce website you want to store all the products that you have avaialble in a database (with their stock, price, description, etc.), you might also want to save some data on your customers; whenre they live, whats their email, how many orders have they made, etc.
it depends how you define "large scale projects", but even if your e-commerce business doesnt have nay customers, you still want to save that data somewhere
okay
I mean it depends by what your doing ..
in the simplest case, a database is for storing information persistently, i.e., after your app stops and starts, the information will still be there
Ah okay, ive done that a bit by litteraly changing the code lol
I will point out too, that there are many ways to get at data in a database. All programming languages support it in some way, and can intercommunicate though the database on that data. Some tasks can be updating and others displaying. Most databases have a common language SQL that in itself is very powerful data hunter that a few words can almost always, get you the answer. SQL is the single most important aspect of the top databases. On top of all that people have developed ORM which is a way to access all that data, with class objects that most languages support, making it a compelling choice of data storage even for a small project.
Okay
And today Python ( which i assume you know python ) which is rather slow, is becoming the center of data analysis because its easy to learn. But because its a slow language, people use databases to get results and update data faster. So, if you get a job doing anything accounting or data related, chances are someone has a database doing a critical job in the middle. So its important enough to learn.
I’ll disagree with a few things here, and I I don’t want to get into the ‘is Python slow’ debate. But: we use databases with every language. The language isn’t a factor.
Your experience and mine differ. Perhaps ive seen more systems than you but i have seen large systems and seen how the database is the only thing that really ties every language into that system in any unified way.
The python is slow. yes it is. getting several million rows from a database could introduce 3 seconds if you grab those rows with a pure python api. you can chop that down to a thenth of a second if you grab the rows from a call to the db with a the c client mysql -u user "select * from whatever" so what i have found is that you do as little work in python itself. Its the same code but you jsut made an improvement to you code by taking the workload off python
that is the patterrn over and over that i see on a day to day basis with python devs that i work with
But because its a slow language, people use databases to get results and update data faster.
IMO there are better alternatives in Python
Depends on the data
when you are joining 10 million rows to a billion .. definitlly doing that in sql not in python
I wouldn't want an SQL database for a fluid simulation
as an example
I wouldn't use an SQL database for training a neural network
as another example
Every program manages its data in the best suitable way, which isn't always in an SQL database
Tables are suitable when you want to use table records to begin with
Also, SQL tables can have constraints and relations to other tables
It's just a data management system which is suitable for one scenario
Then you are agreeing with me: database choice and language choice have very little to do with each other.
I believe it was you who raised the disagreement with my observations over the years that a database becomes the core of many systems written in many different languages. At the heart of all those systems is a common language SQL. You can say there is a side case that dosnt match. Fine. 99% of all systems i have seen meet the SQL standard. Perl used to be the Python 20 years ago. they come and go. SQL is still here at the heart of those systems. that was my point.
AH. I understood your point differently then. You said: "Because [Python is] a slow language, people use databases to get results and update data faster", which I took to mean: "We use databases because Python is slow"... which is the point I was disagreeing with. Understand and agree with your point now!
The python is slow business is not so much important for me. There are times when im told "this is too slow, make it faster" which inevitably comes down to doing some trick of getting the work done externally to python and taking delivery of those results. a very simple example would be if i have a 400Meg binary and i need to copy that. Its definitely too slow to do a open("file").read() then write that file out.. instead in python you make a file system call to cp file file2. im excited to see python 3.11 and 3.12 trying to make it faster though.
How strict are data types in Postgres when passing data from Python? For example, if I have a date data type column in Postgres, can I pass the date as a string from Python?
very strict, it will fail if it is invalid date, but at the same time u can pass it as a string, as long as it is a valid string u converted to date
STRING::DATE or something is available to cast value from string to date in sql
What if I put everything down as a string and convert it in python lol.
Mostly as a quick solution for a proof of concept. I signed up for CS50 last night and plan on learning Sql from start to end.
it is a bad practice
why would u wish hack strict static typing into Serialized Strings?
you could have not used Postgres then, and just used Sqlite3 then
you are creating yourself pits to fail on your own with this decision
I’m using SQLite3 currently but have been asked if I could make a mock software where the information can be accessed from anywhere. I want to get it working to prove I can do it and use the extra time given to rewriting for the new version to learn Postgres correctly.
What's the problem exactly? Migrating to postgres?
Because queries should look basically the same
Kinda. I currently don’t have extensive data validation as I’m racing someone else to get this done lol. SQLite doesn’t care it seems.
I think postgres would try to coerce string values but I'm not sure
You can always explicitly cast them
Also, you could use an orm
Not sure how to do that but will definitely be looking into it, either when this goes well or whenever I finish cs50.
So the question is how to make it so someone can access your SQLite DB from anywhere? You could share the file, you could use django, a lot depends on your use case and how much time you have
Is that possible? I figured postgresql through a database manager service (in using vultr) is the best option.
You could also do that, again it depends on your situation
How would I host a sqlite file so that people on different networks can access it?
uh, if you only care about reading you could literally copy/paste the file, but if you want for multiple people to be able to edit it and keep it synced, definitely use a "proper" database like postgres or mysql instead of sqlite
see also https://sqlite.org/whentouse.html
... SQLite will work over a network filesystem, but because of the latency associated with most network filesystems, performance will not be great. Also, file locking logic is buggy in many network filesystem implementations (on both Unix and Windows). If file locking does not work correctly, two or more clients might try to modify the same part of the same database at the same time, resulting in corruption. Because this problem results from bugs in the underlying filesystem implementation, there is nothing SQLite can do to prevent it.
A good rule of thumb is to avoid using SQLite in situations where the same database will be accessed directly (without an intervening application server) and simultaneously from many computers over a network.
Alright. Get Postgres “functional” and worry about doing it correctly when I get full approval.
ay, SQLite is superior to Postgres
also SQLite has a STRICT statement for creating strongly typed tables
interesting stuff:
https://www.sqlite.org/wal.html
What is the best method of handling database migrations for a containerized python application? Should the migrations be ran when the container is initialized or can/should they be automated externally? Examples appreciated. Thank you.
I have set up a Docker Django/PostgreSQL app closely following the Django Quick Start instructions on the Docker site.
The first time I run Django's manage.py migrate, using the command sudo docker-
Thank you, I will take a look
If anyone has experience with PySpark/Spark, I have a question in #data-science-and-ml that people here might find interesting
is it normal to be angry at a person who inserted
"Map(key->value)" as a single string value in database? my query is flooded with regexp_replace, because i need to use presto sql.
It is normal to ask what made the person choosing such design
It is normal to ask yourself, who approved Pull Request review to this (and if u have reviews in the first place)
It is normal to ask what is the database are u using where this is normally possible to do in the first place
Me: yes i finally fixed "Map(key->value)" to the format I need it to be. Feels great!
5 mins later...
Also me: nooooooo who tf put these data. What the hell is "Map(val1->val2->val3->val4)" or "Map(val1->val2->val3, val4->val5->val6)" or "Map(val1->val2, val3, val4->val5->val6)"?!
Looking for some assistance with SQLAlchemy. I have also asked on IRC in the #sqlalchemy channel. Can anyone help with this? https://pb.recoilnetworks.com/?48789a5e14a0cc3c#m+PKunPXY/XwNGADe3Qs4p0E/YOvp5EycexmUKOHq1o=
If you do not wish to click the link, here's a screenshot:
For some reason, the data set returned is a simple dictionary with only the first database column requested.
not particularly familiar with sqlalchemy, but doesnt scalars() only return the first column? you might want execute() instead
Hm, if I change the query to look like the one below, it works fine but returns columns I do not want:
query = select(User).order_by(User.firstname)
https://docs.sqlalchemy.org/en/20/orm/queryguide/select.html#selecting-orm-entities
When selecting a list of single-element rows containing ORM entities, it is typical to skip the generation of Row objects and instead receive ORM entities directly. This is most easily achieved by using the Session.scalars() method to execute, rather than the Session.execute() method ...
so if you wanted to your query to beselect(User)it would make sense to use the scalars shorthand, but otherwise it would throw away the rest of your columns
Man, I badly want this to work so I can use Python for APIs but NodeJS is so much easier (maybe perhaps because I have been writing it for so many years). We're using Python in many other ways with Lambda functions but not in any real APIs. I just cannot seem to grasp this. I tried changing it to session.execute(query) and it just throws up everywhere.
Which is weird because according to https://docs.sqlalchemy.org/en/20/orm/queryguide/select.html#selecting-individual-attributes it should work fine.
did you still want to retrieve the results as a list? cause .all() exists for Result too
Oh wait, maybe it does work. It's throwing up trying to serialize the result from what I can tell
Ah, it isn't returning key value pairs
Why don't you select an orm object?
Also rows should be named tuples iirc
Yeah, I tried that as well but that threw up on me, too.
Do you have an example? The documentation is pretty large and to be honest, it seems like there's 10 ways to do the same thing but for some reason, it doesn't work for me. For example, there seems to be multiple ways to define a table model. For what reason?
It was already mentioned to you -select(User)
And there's only one way to define a model right now
Okay, now I am really confused. I already mentioned if I define the select like that, it returns all the columns for each row in the database, which I do not want. I do not want to return the password column which means I have to select which columns I want returned. Right now, that works however it is returning a tuple instead of dict.
I think you could do select().options(defer(Model.password))
Forgive me my formatting, I'm on a phone
No worries, I appreciate you trying to help from your phone. That's definitely an interesting concept but can get real ugly, in my opinion. I am trying to find a more elegant way of doing it based on the documentation but nothing seems to be working and most search results are for SQLAlchemy versions prior to 2.0
I remember there was "only" function too
What are you looking for specifically?
With SQLAlchemy, there's actually two ways to define tables: directly by using the Table constructor, or indirectly by using ORM Mapped classes.
The result I am looking for is this:
[
{
"created_at": "2023-01-01 00:00:00",
"email": "johnny@appleseed.com",
"firstname": "Johnny",
"id": 1,
"is_active": 1,
"is_admin": 0,
"lastname": "Appleseed",
"updated_at": "2023-01-01 00:00:00",
"uuid": "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF"
}
]
If I use select(User).order_by(User.firstname) I get that exact result but with the password column, which I do not want.
If it's important, I am using FastAPI and have Pydantic models defined which I was expecting the model to be applied and the columns not defined in the model to be stripped but that doesn't happen. Haha
I know, I have been told in the past that FastAPI + SQLAlchemy + Pydantic is overkill and difficult to learn and I should use something like Django instead because it has tons of features built-in but I like doing things the hard way and Django has massive overhead.
Well, table does not define a model, but a table
Orm model defines, well, a model and a table underneath
So, from my understanding, I am using ORM Mapped Classes. I could be wrong though.
You can attach your own table instance to it if you want too
Yep
This appears to be the Table constructor, which I am not using:
from sqlalchemy import Table, Column, Integer, String
user_table = Table(
"user_account",
metadata_obj,
Column("id", Integer, primary_key=True),
Column("name", String(30)),
Column("fullname", String),
)
I am wondering if I should be, though.
I tend to see a lot of references to a column like this user_table.c.name in the documentation which is not available when using mapped classes
You don't have to use it really, sqlalchemy can and does create a table automatically for you
I definitely like the mapped class approach over the table constructor.
@marsh ledge I think using Column in orm class definitions is deprecated, you should use mapped_column
Yeah, I changed it. My class now looks like this:
from datetime import datetime
from lib.database import Base
from sqlalchemy import String, DateTime, Boolean
from sqlalchemy.orm import Mapped, mapped_column
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
uuid: Mapped[str] = mapped_column(String(36))
firstname: Mapped[str] = mapped_column(String(32))
lastname: Mapped[str] = mapped_column(String(32))
email: Mapped[str] = mapped_column(String(128))
password: Mapped[str] = mapped_column(String(128))
is_admin: Mapped[bool] = mapped_column(Boolean(False))
is_active: Mapped[bool] = mapped_column(Boolean(True))
created_at: Mapped[datetime] = mapped_column(DateTime)
updated_at: Mapped[datetime] = mapped_column(DateTime)
Which to me, is clean and understandable.
Also as of your original question, I think: https://docs.sqlalchemy.org/en/20/orm/queryguide/columns.html#sqlalchemy.orm.defer
And there's a guide for that too: https://docs.sqlalchemy.org/en/20/orm/queryguide/columns.html#orm-queryguide-column-deferral
Thank you, I appreciate it. I am going through the unified tutorial in the documentation and will have a look at those as well.
So you would use something like
stmt = select(User).order_by(User.firstname).options(defer(User.password))
And probably pass a raiseload to it too, if you don't use asyncio with sqlalchemy
i.e. defer(column, raiseload=True) if I'm not mistaken
Cool, I think sqlalchemy is a great library but it definitely has a high barrier of entry, especially since 2.0 migration happened and some stuff got deprecated
Yes sir, I am quickly finding that out. If I can get close to mastering it, this API will pretty much write itself. I am pretty good with logic having a programming background for 20 years but in other languages so I am just struggling with new concepts and what not. I have only been writing Python for a year now and have nearly 20 AWS Lambda functions and some other code that works great but definitely still learning every day. We do a lot with GIS and drone mapping so it has definitely been a fun learning experience.
I've used sqlalchemy pretty actively for the past year or two, so if you have any problems - you can ping me
Thank you, that is much appreciated. I will admit, what I am trying to build is definitely more advanced that just something basic which is probably not a good idea, but I am definitely learning.
Honeslty if you know how to use sql - you'll manage it with sqlalchemy
The API is pretty similar to sql, and it adds some helpful things on top, like relationships
Yeah, I can definitely write raw SQL queries, I am not a master but I do well with joins and grouping and what not.
If you have a relationship and must join to it for some reason (explicitly join, not to load it into memory, but to filter by a related object for example) you can actually specify a relationship:
select(User).join(User.address)
And it would build a correct join condition for you
If you want to load relationships - there's a guide for that https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html
Yeah, I have seen that in the docs, which makes complete sense. Just for some reason without using scalars(), the data set is always a tuple, even the documentation shows this 😦
Unfortunately, this does not work either though
results = session.execute(select(
User.uuid,
User.firstname,
User.lastname,
User.email,
User.is_active,
User.is_admin,
User.created_at,
User.updated_at
).order_by(User.firstname)).scalars().all()
You can read about the differences between scalar, scalars, execute in the docs
But you mostly use scalars
So, how do you return a dict from the database? LoL
Yeah, in this case only User.uuid would be in your result
Right, and that is what is happening.
scalar and scalars just takes the first column of your select
If you select an orm model it would be that model 👌
Yeah, I read that in the docs.
I just do not understand why returning a dict seems so difficult. Haha
scalars may be impossible to use in a case like this:
stmt = select(User.id, Address.street).join(User.address)
for id_, street in session.execute(stmt):
print(id_, street)
That's weird: https://docs.sqlalchemy.org/en/20/orm/queryguide/select.html#selecting-individual-attributes - That's pretty much what I am after.
I'd use orm object though
It's easier to work with
And you're selecting almost all fields
But again, I am getting tuples and not dicts so I cannot reference row.firstname
Oh, yes I can! It does work. LoL
Super weird. So now, how to convert that to a JSON object / dict so I can return it from my API?
What are you using for your api?
FastAPI?
I have to go for like 1 hour though 😉
In case of fastapi create according pydantic model
I have pydantic models created, just not sure what the deal is with those.
So, create pydantic models with fields you need, then if field names match you can use PydanticModel.model_validate method or map all the fields manually
And then return that from your API endpoint
Here's my pydantic models for the users: https://pb.recoilnetworks.com/?7b63d0ccbeeb6bcc#gyckHzcJkt4y3KilYMw5SGOgjvK3jOWXozTk36RoXn8= - I cannot paste the code here because it is too long
And then this is the route I am working with right now:
@routes.get(
"/",
summary="Get All Users",
response_model=List[User],
status_code=status.HTTP_200_OK,
responses={
status.HTTP_200_OK: {
"model": List[User],
"description": "OK",
}
}
)
async def get(session: Session = Depends(get_session)):
result = service_user.get(session)
return JSONResponse(
content=jsonable_encoder(result)
)
Just trying to return all the users in the database.
I wish that would just strip the columns from the result set that are not in the User model, but that is not the case.
Well, this works:
results = session.execute(select(
User.uuid,
User.firstname,
User.lastname,
User.email,
User.is_active,
User.is_admin,
User.created_at,
User.updated_at
).order_by(User.firstname))
return [dict(r._mapping) for r in results]
It isn't pretty, if you ask me. But it works.
And, I am accessing a protected member of a class which PyCharm hates
I know you're away but when you get a minute, if I am using Pydantic models correctly, should the password column be stripped from the result set before it is returned since the Pydantic model I am using as the response_model does not include that column?
Only fields that are in that model would be returned
Okay, so something is not working properly then with those.
I am a bad developer, I have never written test cases for code. That's something I need to learn.
I've always been in startups in the last 20 years where I am the only developer and server administrator doing everything so I did not get time to learn to be a real developer, if you will. Haha
Well, based on the models defined:
class UserBase(BaseModel):
firstname: constr(strip_whitespace=True) = Field(
max_length=32,
min_length=2
)
lastname: constr(strip_whitespace=True) = Field(
max_length=32,
min_length=2
)
email: EmailStr = Field(
max_length=128
)
is_admin: bool = Field(
default=0
)
is_active: bool = Field(
default=1
)
model_config = ConfigDict(extra="ignore", from_attributes=True, json_schema_extra={
"example": {
"firstname": "Johnny",
"lastname": "Appleseed",
"email": "johnny@appleseed.com",
"is_admin": 0,
"is_active": 1
}
})
class User(UserBase, CreatedUpdatedAtModel, IDModel, UUIDModel):
model_config = ConfigDict(extra="ignore", from_attributes=True, json_schema_extra={
"example": {
"id": 1,
"uuid": "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF",
"firstname": "Johnny",
"lastname": "Appleseed",
"email": "johnny@appleseed.com",
"is_admin": 0,
"is_active": 1,
"created_at": "2023-01-01 00:00:00",
"updated_at": "2023-01-01 00:00:00"
}
})
The result set should include all columns in the database except the password column. I tell FastAPI that the response model for this particular request is response_model=List[User] but without explicitly not selecting the password column from the database, it is still returned so not sure what the deal is there.
The example is correct
But the actual response has the password column
Ah ha, I GOT IT!
What was the problem?
Don't laugh! LoL - I had the following code:
async def get(session: Session = Depends(get_session)):
Instead of
async def get(session: Session = Depends(get_session)) -> list[User]:
Not sure what the Python term is for the end there but it wasn't mapping to the return model?
Okay, now to do some really fun stuff. Time to write a depend to ensure a user exists before trying to update or delete it!
From a newbie Python developer to a seasoned veteran, it would be awesome to get your feedback on my structure and code. Is this something you might be up for doing? It wouldn't have to be for free, I can compensate you in someway.
I would usually try to differentiate between pydantic models and sqlalchemy models by naming them in the following convention:
All db models - just a model name, e.g. User
Pydantic models - {ModelName}Schema, e.g. UserSchema
No need to compensate me, if you want any feedback on your code you can probably post it in #software-architecture or #web-development, I'll try to help too
Right on! I chose a different route. My SQLAlchemy models use the ObjectSchema convention. For example, UserSchema because to me a schema is a table in a database where my Pydantic models follow the ObjectModel convention for example UserBaseModel or UserCreateModel.
Ah, kind of difficult to post all of it but it's all good.
I am pretty organized, at least from my perspective. The only thing I am not doing (yet) is API versioning. This is a pretty small test project that I am hoping to be able to apply to larger projects.
You can start doing versioning when you need to
A lot of changes could be incremental too, also versioning is not as important if it's an internal API or it serves your frontend specifically
Just an example of my structure
Public API's are a different story
Honestly it's not bad but I personally found some problems with this structure, I'd separate your API from your core logic/services, etc
And db models too
Yeah, this is going to be just to serve my front end. It's a website for a softball team that I built. I want it to be dynamic because it's all static right now
I try to do something like this:
.
└── src/
├── adapters/
│ └── api/
│ ├── users/
│ │ ├── _router.py
│ │ └── _schema.py
│ ├── books/
│ │ ├── _router.py
│ │ └── _schema.py
│ ├── app.py
│ └── schema.py (common pydantic models)
├── db/
│ ├── engine.py
│ └── models.py (Could be a package too)
└── domain/
├── users/
│ ├── repositories.py
│ └── services.py
└── books/
└── ...
Yup, everything is separate. I do the same thing in my NodeJS APIs as well.
I dislike the src directory name so I used app. LoL
In this case they're not separated since your application logic and framework (FastAPI)/api specific code reside in the same package
src/app, change my mind
I have a fastapi template here: https://gitlab.com/ThirVondukr/fastapi-cookiecutter
Haha, the only reason I have is that src is overused and honestly, there could be code outside the src directory. For example. my main.py is not inside my app directory. Probably bad structure, but it works for me.
src is very convenient for simply copying all the code in dockerfile 👌
Right on! I'll have a look. I've been using https://github.com/zhanymkanov/fastapi-best-practices#4-chain-dependencies with some changes
Well shoot, never thought about that. I do plan to use a Docker container and AWS Lambda + AWS CloudFront to host this.
So, I guess I am confused. How are they not separated when it comes to Python? To me, everything is separated into different files.
They're in the same package
So, they're not really separated, imo
With my structure I can just nuke adapters/api for example 😅
I thought they had to be a package so I coud do things like from app.auth.services import AuthService? Should I be doing something else?
Also such structure just gets bloated when you need other "adapters" - maybe you want to add some other protocol like graphql or grpc, cronjobs, rabbitmq, kafka, etc...
All of this code gets piled up in the same directory
Haha, funny story. I wanted to use GraphQL instead of REST but I couldn't find any easy tutorials. Everything I tried didn't work so I gave up after about 8 hours. Haha
https://strawberry.rocks/ is a pretty good framework
For the company I work for, I would prefer to go the GraphQL route so I will have a look at that when I have time (it's 2AM where I am)
As for things like cronjobs. queueing services, etc, I have come accustomed to using AWS services. For example, if I need to do something periodically, I write an AWS Lambda function in Python + Docker that is deployed via AWS ECR and then uses AWS EventBridge Scheduler to execute it. For queueing, I rely on AWS SQS and use an AWS Lambda function that is kicked off when a message is entered into the queue.
I prefer as much serverless as possible. That's not saying I do not have AWS EC2 instances and even physical servers because I do. When processing drone imagery, that's not cost effective on AWS and cannot be done with a Lambda function. Those applications are Python deployed via Docker containers on physical servers that simply listen to the AWS SQS queue.
I just use k8s 🥴
Yeah, I do not have that luxury. Working in a startup has its drawbacks.
Hey quick question, with FastAPI, can I secure the routes at the root level instead of having to secure them using Depends individually? For example, I import the routes into a routes.py file which is then loaded in my main.py. The routes.py looks like this:
from fastapi import APIRouter
from app.auth.routes import routes as auth
from app.users.routes import routes as users
routes = APIRouter()
routes.include_router(auth, prefix="/auth", tags=["Authentication"])
routes.include_router(users, prefix="/users", tags=["Users"])
Can I use a Depends here to secure the users routes?
Hm, looks like a middleware would be the best bet except I do not want to block the auth routes
Well, my authentication system works! 🙂
You can attach depends to a router
Well, thank you for the help @paper flower , I appreciate it.
:incoming_envelope: :ok_hand: applied timeout to @analog shell until <t:1700075684:f> (10 minutes) (reason: burst spam - sent 8 messages).
The <@&831776746206265384> have been alerted for review.
@analog shell I’m going to let that stick. If you’re trying to spam for voice verification, don’t. It’ll get you voice muted.
so before I was using SQLAlchemy + SQLite3 and I was doing many queries like this: SELECT .. WHERE col IN (very_big_list_of_values), and everything worked fine.
now I removed SQLalchemy (because I was having issues with the dependencies) and I'm passing the parameters outside the query using qmarks: SELECT .. WHERE col IN ({','.join('?' for _ in range(len(values)))}) and I keep getting the error:
sqlite3.OperationalError: too many SQL variables
I understand that SQLite has a max of 999 values which you can pass as parameters, but how was SQLalchemy able to bypass that restriction?
or perhaps there is a way to make the number higher?
according to https://www.sqlite.org/limits.html the default compile-time limit got bumped to 32766 in SQLite 3.22.0 (<t:1516615200:R>), which sqlalchemy is aware of here
https://github.com/sqlalchemy/sqlalchemy/blob/rel_2_0_23/lib/sqlalchemy/dialects/sqlite/base.py#L2037-L2039
although that attribute doesn't apply to select statements...?
lib/sqlalchemy/dialects/sqlite/base.py lines 2037 to 2039
if self.dbapi.sqlite_version_info < (3, 32, 0):
# https://www.sqlite.org/limits.html
self.insertmanyvalues_max_parameters = 999```
idk if SQLalchemy is using that limit for SELECT queries as well
but I did a quick test and I was actually able to pass just below 250k parameters using sqlite3 directly
iirc sqlalchemy actually does its own client-side parameter binding by default
maybe not in all cases, but i remember reading about it and being surprised
Hello,
I am pretty new to sql alchemy and im trying to use it in my fast api app. I am getting a type error when trying to call this get all referrals function.
def get_all_referrals(self):
try:
referrals = (
self.db.query(DbReferral.id, DbReferral.code, DbReferrer.name, DbProduct.name)
.join(DbReferrer, DbReferral.referrer_id == DbReferrer.id)
.join(DbProduct, DbReferral.product_id == DbProduct.id)
.all()
)
print(referrals)
return referrals
except Exception as e:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail = f"error retrieving referrals"
)
ValueError: [TypeError('cannot convert dictionary update sequence element #0 to a sequence'), TypeError('vars() argument must have __dict__ attribute')]
any idea into why this might happen?
wel I do self.db.query(DbReferral).all() this has no problem :
def get_all_referrals(self):
try:
referrals = (
self.db.query(DbReferral).all()
)
print(referrals)
return referrals
except Exception as e:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail = f"error retrieving referrals"
)
https://stackoverflow.com/questions/76322342/fastapi-sqlalchemy-cannot-convert-dictionary-update-sequence-element-0-to-a-seq - You have a TypeError meaning that you passed a value that is not a dict since it is expecting a dict
Are you using SQLAlchemy 2.0? If so, a lot has changed since the 1.x versions. Look back in this chat from last night between Doctor and I.
yeah i am using 2.0
I have yet do any joins but what I can tell you is to ensure your models are defined properly and that you pay close attention to the documentation
gotcha. I am pretty sure they are. When i run normal sql queries against my db, the right results come up, but i dont know how to translate the sql queries to a sql alchemy query
Have you tried using scalars?
self.db.query(DbReferral.id, DbReferral.code, DbReferrer.name, DbProduct.name)
.join(DbReferrer, DbReferral.referrer_id == DbReferrer.id)
.join(DbProduct, DbReferral.product_id == DbProduct.id)
.scalars().all()
Also, the query function is legacy at this point. I would switch to using execute
oh really? ok. thats good to know.
Ill try that.
@marsh ledge got it working. Thank you for the assistance.
No worries, glad you figured it out!
I didnt realize I was using out dated commands and the select statement made things much easier to understand
Haha, yeah, I was falling into the same issues last night.
Hey @paper flower you available right now for a question?
May I private message you? It doesn't really belong in this channel.
I would prefer if you don't
No problem. It relates to FastAPI not necessarily SQLAlchemy or databases, but it definitely could. Is there a better place?
Maybe #web-development
that would exaplain how it manages to do those big queries
should I attempt to do that as well? SQL injection is not a problem in this case, I'm more afraid of breaking the string if one of the values has a quote ...
if I do:
SELECT 1, 2, 4
it will create a single record: [(1, 2, 3)]
but how can I make it create one column that has 3 values: [(1,), (2,), (3,)] ?
(dynamically)
i would guess it depends on the database, for example postgresql has arrays '{1,2,3}'::int[] and composite types (1,2,3)
oh wait, did you mean you wanted 3 rows?
hmm for sqlite/psql at least, you could write VALUES (1), (2), (3);
In addition to thegamecracks: there's the simple: sql select 1 union select 2 or using unnest: ```sql
select unnest(array[1,2,3])
All three ways: https://dbfiddle.uk/6WkBepBe
Thanks, this is definitely my favorite way out of the 3
Gotta bookmark that website...
I've got a novice question--I (at least in part) come from data compression, and there's a rich world which approaches in data compression through a lens of information theory. Is there a resource for SQL or databases in general that teach how to organize and design a database (and pick an architecture) from that perspective?
I went through the pinned comments and they largely focus on *how* to use sql (or other types of dbs), but I'm curious about the decision making when it comes to organizing a database:
- if I have time series data of multiple sensors, which I'll typically select first by a date range, then by a sensor subset, should the rows or columns be the timestamps?
- If I then take that time series data for a couple of sensors out of the database, transform it into some combined value, and want to load it back into a database, should I put it into the original table or create a new table, especially if the transformation operates on the full length of time series and the transformation needs to be re-performed as often as once each hour (when the data is added to the db at 1hz)
What resources are there which talk about database design from these perspectives? Should I just live in the docs for postgress, mongodb, redis, graphql for a few weeks or is there something more general that I should start with?
you're better off asking in a community related to whichever database you are using
there are a bunch of generic tools for relational databases, but niche NoSQL databases are a bit harder to find (/ build) generic tools for
should the rows or columns be the timestamps
typically you'll want to have a fixed amount of tables and columns, not dynamic table names / column names
... combined value, and want to load it back into a database, should I put it into the original table ...
how would an aggregate/combined value even fit in the original table? duplicate it for every row used to calculate it? do these rows need to have it?
The first thing I would do is probably try to see if other companies that operate with data similar to yours have blog posts explaining what their infrastructure looks like.
Other than that... idk, maybe ask in the data engineering server
Uh, any tools you find for manipulating a database are quite much by definition database administration tools.
You could use Roles or other permission systems to limit what users are allowed to do, but abstracting away how users interact with the database is literally your job
at best you can try to use an open source project as a base/reference I guess
hello, i am trying to create a discord bot which can run few servers without using few bots a bot for every server so i decided to use a json file to store data on each server. well i got to a point which i struggel and someone told me that json file is not that good and he suggested that i got to here and talk to you. I would like some guidness on how I can create a database for my bot. https://discord.com/channels/267624335836053506/1174793064099889284
Many modern analytical databases are column oriented, for a few reasons, one of which is compressibility. Columnar formats apply data appropriate compression which have a huge impact. Generally your question should be; do I need a row oriented db or a columnar? There are also specialized time series databases, too, which is a separate family
This is an example of duckdb’s columnar compression algorithms (not unusual in the family of OLAP db’s): https://duckdb.org/2022/10/28/lightweight-compression.html
There’s a PostGres timeseriesdb engine, along with influx and other dbs, but hard to talk about applicability without knowing your particulars