#databases
1 messages ยท Page 22 of 1
There are many ways to do auth tokens
Also #web-development may be a better place to ask this
Oh, right! Thank you
I'll figure it out. Don't worry
I'm done here. Thanks to all those who helped out. Y'all the real VIPs :)
Is learning how to interact with databases in python an important knowledge in general? I have been wanting to look into it but i have no idea on where to start or if i should start. Thanks in advance to anyone who answer me!
All sorts of developers need to interact with databases. Not everyone needs to be an expert, but I think itโs a fundamental skill that everyone should be introduced to. Hereโs a handy sql intro: https://selectstarsql.com
Really thanks! I will look into it
it is part of generic university education. (ability using raw SQL)
In any case, ability using at least Sqlite3 is universal skill useful pretty much to all types of devs.
Could be cool to learn up to using Postgresql in addition. This postgresql is like... univerally powerful to cover 97% usage cases at server side
learning more obscure databases on another hand is mostly field of Backend engineers, DevOps engineers and Data engineers. So at postgresql most devs could finish up their education regarding databases ๐
Thanks for the additional info! I will start with raw SQL first then i will keep in mind what you said to learn later on when i'm more advanced
Guys, one LAST question, am I supposed to modify the revision files generated by alembic myself inorder to create database tables???
it has autogenerating feature ๐ to autobuild difference between current db and defined ORM models in sqlalchemy
it assumes even in this case u will recheck and correct if necessary though
But I had run the revision command and it just had the pass keyword in the upgrade/downgrade functions
What am I doing wrong?
Like.. where is the code or errors man?
where is link to github what are you doing ๐
Error looks like this:
raise translated_error from error
sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.UndefinedTableError'>: relation "users" does not exist
[SQL: SELECT users.id, users.email, users.password, users.created, users.modified
FROM users
WHERE users.email = $1::VARCHAR]
[parameters: ('test@email.com',)]
(Background on this error at: https://sqlalche.me/e/20/f405)
It's not on GH yet unfortunately
"""fixed models
Revision ID: 1fe65c3d80f5
Revises: bc4522ef3b3d
Create Date: 2023-09-24 22:20:43.507352
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '1fe65c3d80f5'
down_revision: Union[str, None] = 'bc4522ef3b3d'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
pass
def downgrade() -> None:
pass
That's the most recent revision file
๐ check your ORM declaring files now
something is missing to integrate ORM with alembic
also u can check alembic config
may be it is just not pointing to defined ORM model
Which file is this specifically, the env.py file?
The env.py file is autogenerated by alembic though
Or you mean my regular model files?
everything matters here.
alembic file is supposed to be somehow connected to your model files ๐
otherwise it will not have its autogenerating migrations command working
I'm quite confused.
I can see the .ini file from alembic and its env.py file. Other than that, it's my regular model files which I defined by subclassing DeclarativeBase class
Do you have a public code I could look at?
My model is similar to this
Ah, wait, am I supposed to use the --autogenerate flag??
Works now :)
For searching purposes in mongodb which is faster , splitting one big db in multiple smaller db or making smaller collections within the big db and query to collection under specific conditions.
I know that this is a python and not SQL related discord but usually python for analysis uses SQL and vice versa - how easy is SQL at a basic to intermediate level to learn?
basics aren't that hard to learn
intermediate is also probably not that hard either
the harder part is probably coming up with a proper database design for your needs
A basic or intermediate level of SQL isn't hard to achieve. It has a low skill floor and a very high ceiling. So long as you're not becoming a DBA you don't need to reach that ceiling
Any ideas why this may not work?
Sqlite3
It doesn't raise any errors
And yet the values don't appear in the table
The table exists
When I try to execute the same thing through the DB browser, it works
Make sure you con.commit() afterwards.
Also, don't use string interpolation here. Use placeholders: https://docs.python.org/3/library/sqlite3.html#how-to-use-placeholders-to-bind-values-in-sql-queries
is there a way to check if you are currently connect to a sqlite3 databse?
Could use https://www.sqlite.org/lang_corefunc.html#sqlite_version, perhaps?
That returns the value. Maybe Iโm looking to see if a SQLite3.connect() has been made? Does the close() function destroy the connect object created with connect()?
You're wondering if the db connection is still alive?
Yes.
If you have a connection object, you could run a test query. Without a connection object, I dunno. I guess you could check the db file for open file handles (like lsof)
Any idea why the materialised view of a table consumes much less disk space than the actual physical table?
Also, reltuples is 0 for a materialized view?
guys how can i fix this sqlalchemy errror?
sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got Uuid()
id: Mapped[Uuid] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid4, nullable=False, unique=True)
Mapped[UUID]
UUID from uuid
dont i do from sqlalchemy.dialects.postgres?
No
Also you can shorten it to
id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid4)
is my import wrong?
rom sqlalchemy import ForeignKey
from sqlalchemy import String
from sqlalchemy import DateTime
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.sql import func
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import relationship
from uuid import uuid4
uuid.UUID
primary keys are already not nullable, indexed and type is inferred from your Mapped argument
oh
So just this should be enough
oh, thanks
are primary keys unqiue by default?
Of course
If anyone here
Can someone tell
How is mongodb scheme is different than other dbs scheme ( according to some google search results And website )
And how can I build a schema in mongodb? ( have no idea how to build a schema )
if you reply turn on ping please
if you are trying to build a schema in mongodb you are using the wrong database
As CF8 said, you don't. MongoDB is schemaless
Ofc ik that
But I was wondering if there is any way to make a schema in mongodb
fun fact: I don't even need schema for my project. I'm just curious can we do it or not
no, because that would defeat the purpose of mongo
but also, If you data can be correctly represented in a schema
why use mongo at all ๐
I mean I suppose you can enforce the equivalent of a schema with application logic but why not just use a real DB?
Ping when replying
you did SELECT guild FROM antinuke WHERE v1 = 1 AND guild = ... and it returned a row? outside of a database transaction?
It returned None only
i mean, execute that query directly against your database, not using python
it might be the case that the table does not contain the data you think it contains, or you're using the wrong values to query it
But I'm using database in python application
I fetched table and it has data
yes, it has data, but the correct data?
right, you should be able to query from the database another way, using the official command line tool, or a database GUI tool like dbever, squirrel sql, etc.
or even in a python script, try writing out the query without any query parameters, just put the values directly into the query
Real database?? Isn't mongodb a real database??
no
It's not a relational database
nope. Mongo is just remotely accessable way to store json files ๐ It is kind of close to S3 buckets (which allow to store files too)
Real data databases are relational dbs like Postgres (or least Mariadb/MySQL)
As long as I know Postgres or MySQL is considered SQL based database. While mongodb is nosql database ๐
difference in Relational database providing typing validation of structure
and giving access to migrate tables from one state to another one
it gives
a) protection from having code that uses db with wrong data
b) gives data integrity, you are sure all your data reflects set structure and linked with data from other tables
c) allows smooth transition to new data structure, when deploying new application versions
with mongodb u a basically making... application that will not scale in terms of your code growth
one time app, which will be able to live only in short amount of tables and only its first version prod version
when u will start adding new features and making new releases -> you are going to be growing in complexity exponentially very quickly
especially bad it happens when people not having unit tests. Relational databased code can survive dozens of times longer without them, mongo db code can not
Mongo has it uses but it shouldn't be the default
really, it is by design enforcing for you that every column will be present for all rows in your table and having specified data type?
you aren't able to insert records without certain columns missing?
Btw I need speed and performance at database so I think nosql db is faster generally
๐
No
What kind of scale r u running lel
Discord runs on scylladb iirc
Which is built on top of postgres
Nvm Scylla is nosql
XD
๐๐
Theory proven: if you have no idea how your database will be look like then pick nosql database
๐
Point is I doubt the speed and performance u need justifies nosql compared to what SQL can offer u
U never mentioned anything about not knowing ur schema beforehand?
In most cases nosql is faster
Which I doubt again
Cough
*ik mongodb is schemaless db
-> if u are lazy and not capable to design database and code that will be maintainable, pick nosql as main database ๐
-> if application code is one time written and will not be maintained or extended in features, pick nosql as main database
๐
Btw I got got some objections in 2nd one
Code will be still maintained in future
By November 2015, as they reached 100 million messages, various issues were observed like data and indexes exceeded available RAM. This prompted the transition to a more suitable database, ensuring a high-quality user experience. Considering the above points Cassandra was chosen as the successor of MongoDB at Discord.Sep 12, 2023
https://www.linkedin.com โบ pulse
Discord's journey from MongoDB to Cassandra - LinkedIn
Looks like mongodb will be good choice for begining to long run
๐
yeah, and u are 100 million messanging application, capable to actually unit test your code so well that u will not suffer issues araised from using mongodb
everyone forges their own fate. u was warned of problems u will suffer
I audit code projects, regular people suffer from mongo problems after creating just 4 tables
Personal opinion:
They changed mongodb to another db cause mongodb documents size must be in 16mb
How much data discord stores in one document ๐คจ
All is left to insert k8s and 10+ microservices into your 100 messages application
and not writing even single unit test (or having even single type)
And u will have all problems of 100 million messages app ๐ + not even bit of quality to reach it
U keep talking about scale as justification for nosql
Well this isnt really true xD
Just because a DB isn't relational doesnt mean its not a DB
it just means it is not a relational db
That is also not how that works btw, NoSQL != More fast and tbh, if you are thinking that way, it's probably a good indicator that you are fine with a regular relational DB if your data is relational. Because what you consider 'fast' probably isn't actually that fast, and your data is small.
i know, the problem with quality of application still stands though
MongoDB is much less forgivable to mistakes
from the point of view of choosing it as main database for beginners, it is not really db for them
Well, its much easier to shoot yourself in the foot and make less performant queries
idek how to write a query in mongo HAHAHA
what people often don't realise is relational DBs are actually really fast when querying data they don't have indexes. A lot of NoSQL/NewSQL databases suffer quite hard if you are missing a index which means it can't do a hashtable lookup.
anyway, the main problem of mongodb usage for beginners
is that... if they had poor code quality (they usually all have poor code quality), mongodb will multiply their problems further with having unstructured untyped database
SQL database gives a more room for application growth
IDK what you mean by that, no one is storing 16MB values, in fact in Cassandra the limit is closer to 1MB (which is still treated as huge and hurts performance)
The biggest issue with mongo is all indexes are kept purely in memory, and that gets really expensive once you have a couple indexes and a lot of docs.
and if you not have indexes, your performance falls off a cliff as mongo starts doing linear scans across the data
I need help my customer is getting this error with my compiled app he cant install pysqlite3 on windows
might wanna try installing a specific version
maybe give a requirements.txt tile to them or smth
why on earth are you using pysqlite3
as its not a massive thing, and its only for a small sqlite3 database
you know that a) it hasnt been maintained for over a year and b) is already included within windows?
so you dont actually need to install it
y sqlite in the first place?
as what else would I use its just for a small program for storing confi
and customer? hmmm
its a tool
called scrapemate. it stores what amazon stores they want to monitor for new products
and stores around 90 products per store

https://discord.com/channels/267624335836053506/1156652152140341259
If anyone knows anything about peewee SQLite databases and threads, help please ๐ญ
For some reason i get this error from my flask app
Traceback (most recent call last):
File "/home/reikimann/coding/awesome_recipes/main.py", line 3, in <module>
app = create_app()
^^^^^^^^^^^^
File "/home/reikimann/coding/awesome_recipes/app/__init__.py", line 19, in create_app
db_parser.add_user(new_user)
File "/home/reikimann/coding/awesome_recipes/app/models/db_parser.py", line 28, in add_user
self.cursor.execute(query, values)
sqlite3.IntegrityError: UNIQUE constraint failed: user.username
Even if I delete the database I get this error. It runs for one second then crashes.
from app import create_app
app = create_app()
if __name__ == "__main__":
app.run()
app.init() file:
def create_app():
...
db_parser = SqliteParser("test.db", app.instance_path)
new_user = ("asdfjasoifjayeeet", "yet")
db_parser.add_user(new_user)
db_parser.close_connection()
...
SqliteParser file:
class SqliteParser:
""" Class containing all the DB interaction methods. """
def __init__(self, database, instance_path):
self.dbName = database
self.db = sqlite3.connect(database)
self.cursor = self.db.cursor()
# Temp
self.db_url = Config(instance_path).get_db_url()
self.config = Config(instance_path)
with open(self.config.schema_path, "r") as f:
schema = f.read()
self.cursor.executescript(schema)
def add_user(self, values):
""" Creates a new user """
query = "INSERT INTO user (username, password) VALUES(?, ?)"
self.cursor.execute(query, values)
self.db.commit()
Schema.sql file:
CREATE TABLE IF NOT EXISTS user (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT UNIQUE NOT NULL,
password TEXT NOT NULL
);
consider what happens if create_app runs more than once...
when create_app() runs, it adds the user ("asdfjasoifjayeeet", "yet"). Its likely that this user already exists in the database.
Hence the error that its not unique
what is this erver
Yep, I saw that it was restarting once when I started the server
Recipe collection server
sql injections?
I have this schema in PostgreSQL:
message(id, content)
message_tags(message_id, tag)
I know that I can find all messages that feature both foo and bar tags with where tag = all('{foo,bar}'), or where tag = all($1).
But what if I want to select all messages which match (foo AND bar) or (foo AND fizz AND buzz) or (fizz AND other)? I know I can combine three all()s with an OR, but is it possible without dynamically constructing a query?
wait, I think all() is exactly wrong here
So this is a query I came up with:
select id, tags from
(
select m.id, array_agg(t.tag) as tags
from messages m
join message_tags t on t.message_id = m.id
group by m.id
) subq
where (%s::TEXT[]) <@ tags;
Here's the code for generating sample data: ```py
def genstring(size: int) -> str:
return "".join([random.choice("0123456789abcdef") for _ in range(size)])
messages = [(i, genstring(100)) for i in range(1, 100_001)]
tags = [genstring(10) for _ in range(100)]
message_tags: set[tuple[int, str]] = set()
for tag in tags:
for _ in range(500):
message_id, _ = random.choice(messages)
message_tags.add((message_id, tag))
The query does work, but it's absolutely pathetic
GroupAggregate (cost=5533.12..6090.08 rows=1 width=40) (actual time=72.131..72.133 rows=0 loops=1)
Group Key: m.id
Filter: ('{5032b7c164,f605eff885,ab362709ba,0f142381da,d08c93cff3,87ec1815f9,647004d3a2,870b369760}'::text[] <@ array_agg(t.tag))
Rows Removed by Filter: 39376
-> Sort (cost=5533.12..5595.00 rows=24754 width=19) (actual time=48.828..51.552 rows=49854 loops=1)
Sort Key: m.id
Sort Method: quicksort Memory: 3703kB
-> Hash Join (cost=2941.12..3726.65 rows=24754 width=19) (actual time=26.542..40.114 rows=49854 loops=1)
Hash Cond: (t.message_id = m.id)
-> Seq Scan on message_tags t (cost=0.00..720.54 rows=24754 width=19) (actual time=0.010..4.589 rows=49854 loops=1)
-> Hash (cost=2482.72..2482.72 rows=36672 width=8) (actual time=26.368..26.369 rows=100000 loops=1)
Buckets: 131072 (originally 65536) Batches: 1 (originally 1) Memory Usage: 4931kB
-> Seq Scan on messages m (cost=0.00..2482.72 rows=36672 width=8) (actual time=0.006..13.915 rows=100000 loops=1)
Planning Time: 0.478 ms
Execution Time: 72.600 ms
I think unnest is what youโre looking for, but I havenโt used Postgresโs version of it in a while. Iโll take a closer look later
this smells LATERAL potentially
good question though, i can work up a sql fiddle thing
Yah, in duckdb Iโd write a lateral against the unnested array
that or ive been brain poisoned by snowflake's LATERAL FLATTEN
yep it's the columnar brain poison
actually i think snowflake is like... columnar, but each column is an entire row? or something like that
I changed the table to have a tags text[] not null column, and the query is extremely simple: select id from messages where (%s::TEXT[]) <@ tags
Bitmap Heap Scan on messages (cost=837.43..841.44 rows=1 width=8) (actual time=17.145..17.146 rows=0 loops=1)
Recheck Cond: ('{83dde7b967,367177e827,0ebeecc7ae,df70048773,e50b676be8,d3d74f1285,444cc44eb2,b2966b3689}'::text[] <@ tags)
-> Bitmap Index Scan on messsage_tags_idx (cost=0.00..837.42 rows=1 width=0) (actual time=17.144..17.144 rows=0 loops=1)
Index Cond: (tags @> '{83dde7b967,367177e827,0ebeecc7ae,df70048773,e50b676be8,d3d74f1285,444cc44eb2,b2966b3689}'::text[])
Planning Time: 0.217 ms
Execution Time: 17.165 ms
test data: ```py
tags = [genstring(10) for _ in range(100)]
messages = [(i, genstring(100), random.choices(tags, k=random.randint(0, 10))) for i in range(1, 1_000_001)]
curious what your duckdb solution would look like @coral wasp if you have the chance to write it up
I'm actually not married to postgresql, it's just for prototyping mostly
yeah but that's maybe not the best way to store the data
that's basically what you were creating with your subquery, right?
is this a good interview question or no
(assuming it's live and they are allowed access to the postgres docs)
I do not understand SQL and I hate postgres documentation, so please don't interview me
you can already use any() on an array in postgres tho
yeah maybe postgres is just too good
why?
Oh anyway, what was the problem? I forgot
Oh, yah: so you want unnest so you can join the unnested values on the tags: without having to do a string comparison)
can't add per-tag metadata like creation time
I need to select all messages that are tagged with all of foo, bar, and baz at the same time
Yah, so itโs joining message tags on the array of tags youโre looking for is the first step
I think sticking with an array could be okay for my purposes ๐
I know that it's a bit of a poo poo, but I will need to research this further as I don't understand SQL at all
Arrays are fine. Itโs just a relatively new thing in sql land
One option in PostGres is to normalize the tags (have a separate tag table). Rather than storing as strings.
Thatโll shrink the message_tags association table to just two ids.
Tried it on 1 million messages: ```
Bitmap Heap Scan on messages (cost=1387.16..1391.17 rows=1 width=8) (actual time=30.443..30.444 rows=0 loops=1)
Recheck Cond: ('{37d2f,2b84e,34608,8712f,b9ff6,fd099,cce90,fb20d}'::text[] <@ tags)
-> Bitmap Index Scan on messsage_tags_idx (cost=0.00..1387.16 rows=1 width=0) (actual time=30.441..30.441 rows=0 loops=1)
Index Cond: (tags @> '{37d2f,2b84e,34608,8712f,b9ff6,fd099,cce90,fb20d}'::text[])
Planning Time: 0.709 ms
Execution Time: 30.461 ms
Or: store the ids as an array in the message table and make it completely flat (at least, Iโd consider that)
What was that query?
Same array query with <@ and 8 tags
Table ```sql
create table messages (
id bigserial primary key,
author_id bigint not null,
created_at timestamp with time zone not null default NOW(),
content text not null,
tags text[] not null,
unique(id)
);
create index messsage_tags_idx on messages using GIN (tags);
i guess this is what GIN is for lmao
Do you have a data generator? Might try it on duckdb later
Here's the entire script https://paste.pythondiscord.com/6KLQ
this sort of thing is where text-search engines take over
I think for Postgres' case they could make it more efficient by creating a DFA over the btree
but I guess realistically if you're trying to do those queries, a DFA is not going to help you that much, compared to an actual index
The tags are completely arbitrary if that matters
The issue you tend to have is that postgres is having to do the job of building an inverted index, on every query.
Since effectively the inverted index takes that whole scan and instead goes "Create this sets and either perform a union or intersection with other terms"
I wonder if I can keep a read model in RAM with just this index, if I need some kind of extreme performance
For PG just adding a GIST index would be more than enough
what's the difference between GIST and GIN?
or at least it will be better
Basically size and performance is most of the difference
GIST indexes are technically lossy and can produce false positives (which PG then filters out when it validates the row) but they are much smaller than GIN indexes
and much faster to update
so if you are in a situation where you have a large number of inserts coming in
a GIN index might be too expensive for what you really want out of it
GIN indexes are actual inverted indexes, GISTs are a bit different
Did you spot check the results from your run? I don't think <@ is what you were looking for, based on your original question... i think you're looking for tags && %s::text[] array operators
And, fyi, you can cut your load time significantly with: psycopg2.extras.execute_values(cursor, "INSERT INTO messages(id, content, tags) VALUES %s", messages, page_size=1000)
so im facing a small issue, i need a certain button that when pressed does 2 things, calculates the sum of all items in a table and then deletes all the records from the table. im doing the delete part first but it doesnt seem to be working so am i missing something here
No that's exactly what I want. I want to find posts that match every tag in my query
&& finds posts that have at least one tag in my query
ok i managed to solve it, forgot to execute with my cursor
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.InvalidSchemaName) no schema has been selected to create in
LINE 2: CREATE TABLE alembic_version (
^
[SQL:
CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL,
CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
)
this happens when i try to run an alembic.
did somebody encouter this and how did you fix it?
Hi guys! Please I'm working with sqlalchemy, how can I implement bulk deletion using the core API?
I guess I should just use a for loop??
Tip: Create one global database connection and pass it in the functions that require it or put a global conn line in them. You can create cursors as you need, and you can probably use a with statement so you don't have to close them yourself. Connecting to the database and disconnecting is a waste of time and may even result in being locked from using the database because there is an established connection in use
That does not seem right. In Django at least you can do Foo.objects.filter(...).delete()...
Do you have some kind of article that explains how they're implemented? I understand the basics of how b-trees work, but anything beyond that is a big mystery to me
The official docs were not very helpful
Mmm maybe the Postgres source code? ๐ I'm not really aware of anyone using the indexes outside of postgres
maybe I should look at the RUM index? https://github.com/postgrespro/rum
It sounds like it suppots the kind of nested and-or queries
Though it is even slower to create
Anyway, I think I'm really overthinking stuff at this stage. <1s to search a million messages will serve me very well
and I can always implement some kind of eventually consistent read model
(because I only need one kind of query, maybe I could optimize it with an extremely specific index)
clearly I know about data access more than the people behind Postgres ||/s||
RUM seems to be able to store additional stuff like timestamps, that could be useful
to avoid a heap scan
RUM looks like a more traditional FTS index
so i would probably say it's a solid option
I was listening to a talk from Oleg Bartunov and he said that RUM was more of an improvement upon GIN for fts purposes
ah ok will do i was planning on refining the code once i got done with the initial workings, i had a slighter more complex issue and that is the app im making is for a restaraunt so i have add to cart buttons next to the items. When the buttons are pressed i need them to be added to an sql table with default price values which is the issue so is there any way to do that since i only know how to set price values for everything at once
You want to delete multiple records or all records?
You could do something like
delete(Record)
or
delete(Record).where(Record.id.in_(ids))
hello guys
Im a little bit confused. My context manager doesn't work properly it doesn't close the session each time Im calling it
Base = declarative_base()
engine = create_async_engine('sqlite+aiosqlite:///my_url?check_same_thread=False', echo=False)
__session = async_sessionmaker(engine, expire_on_commit=False,)
class Player(Base, SerializerMixin):
data...
async def global_init():
async with engine.begin() as connection:
await connection.run_sync(Base.metadata.create_all)
async def create_session() -> AsyncSession:
async with __session() as session:
async with session.begin():
return session
async def main():
data = [370996968811397122, 423231892943536139, 428188503390814212, 453197176714166273]
session = await create_session()
for player in data:
result = await session.execute(select(Player).where(Player.id == player))
p = result.scalars().fetchall()
for user in p:
print(user)
if __name__ == '__main__':
asyncio.run(main())
# The garbage collector is trying to clean up non-checked-in connection <AdaptedConnection <Connection(Thread-1, stopped daemon 12448)>>, which will be dropped, as it cannot be safely terminated. Please ensure that SQLAlchemy pooled connections are returned to the pool explicitly, either by calling ``close()`` or by using appropriate context managers to manage their lifecycle.
I think I did something wrong
this is a good point that i didn't think of, if you denormalize this you get to add an index
this is because the context manager closes right when the function returns, so you're returning a session object that's already been closed by the time you get a chance to use it in main
you need @asynccontextmanager and yield instead of return
also i strongly suggest not tying together creating a session with opening a transaction
that's asking for trouble
@lunar cargo
Base = declarative_base()
engine = create_async_engine('sqlite+aiosqlite:///my_url?check_same_thread=False', echo=False)
session_factory = async_sessionmaker(engine, expire_on_commit=False)
class Player(Base, SerializerMixin):
data...
async def global_init():
async with engine.begin() as connection:
await connection.run_sync(Base.metadata.create_all)
@contextlib.asynccontextmanager
async def create_session() -> AsyncSession:
async with session_factory() as session:
yield session
async def main():
data = [370996968811397122, 423231892943536139, 428188503390814212, 453197176714166273]
async with create_session() as session:
async with session.begin():
for player in data:
result = await session.execute(select(Player).where(Player.id == player))
p = result.scalars().fetchall()
for user in p:
print(user)
if __name__ == '__main__':
asyncio.run(main())
I'd consider a few things here... date-based partitioning, as I'd assume your message searches would probably have some sort of time component... and with partitions, may address long-term gradation problems as the tables grow massive. I'd also think about factoring out the tags, rather than storing them as strings in the messages table. Storing an array of int[] (tag ids) may be more efficient than an text[]
I have sort of a weird idea i need to implement, im creating an app right now using python and tkinter. So when a button is pressed i need a record to be added to a mysql table and ofc if multiple are pressed i need them to be added in there but the records have to come from a seperate table with pre existing data i was thinking with a key. is there someway to implement this?
This looks like deleting just one instance. I'm talking of bulk deletion (multiple instances).
Ps: what I have (for loop) currently works but I was wondering if it had any impact on perf
I'll have to test out storing tags separately. It will certainly occupy less space, but will require a join when creating a message and retrieving a message
Though reading the row also takes up time, so the space might actually increase the performance
also, I think I completely botched my syntetic test tbh
because I tested on uniform data, not a real-world distribution
i.e. there will be a few extremely popular tags and a lot of unpopular tags
If I understand correctly, to find a post that matches every one of foo, bar, baz a GIN index literally does a linear scan, like ```py
relevant_indices: list[set[MessageId]]: ...
relevant_indices.sort(key=len) # traverse the least popular tags first
final_message_ids = reduce(lambda a, b: a & b, relevant_indices)
(if not, I think that's what I am missing in how GIN works)
So even if the intersection itself is small, if all three tags are "popular", it will have to do some hefty scanning
So, has anyone ever considered a markdown to SQlite script? I document my schema before data modeling, and figured I could just parse it to create a DB. Working on inserts now.
Why not work backwards, and print out the schema after you created it?
Also you usually have some SQL file that you use to initialize. Or a migrations folder
Though to understand the result of a series of migrations you'll have to do a fold manually...
My dev process is Data Modeling,UI/UX,Business Logic. When I am doing data modeling I usually document it using markdown (usually with Markdown tables with "column Name" and "Column description" as the headers. I was wondering if I could go write from my "notes" about the data directly to the database.
Another alternative is using graphviz, if thereโs a lot of relations
Hi. Can someone tell me which database would be suitable. I want to store daily changes in about a million products. I get the data as
{id:some, field1:value ...........field33: value}
It is not guaranteed that all fields will change everyday. Think of it like Description of the product which might change in a year, number of sold which will change daily.
What is the best setup for this
Thank you!
Looks like a NOSQL database?
I did think using mongo but how would I go about recording the changes? Edit in place or append new record?
If you have many different kinds of products with differing fields, you have two options in a relational database:
- Entity-Attribute-Value
- Storing the dynamic parts of the product as a JSON if the database supports it
Fields are always the same for all products
Values will change daily which i need to capture
Then you can just make a table with columns like (id, name, description, foo, bar, baz, ...)
Got it. So rdms is preferred way for this instead of nosql
Though having too many columns can be bad in some cases, I don't think having 100 or even 50 columns is good. In that case you can store a JSONB field in postgresql, it can index them just fine (if you use the right index)
Thank you! @brave bridge
NoSQL (which is a very questionable name tbh) does have some applications, but for most applications a SQL database is a good default
I think this quote is good from a website:
The five critical differences between SQL and NoSQL are:
SQL databases are relational, and NoSQL databases are non-relational.
SQL databases use structured query language (SQL) and have a predefined schema. NoSQL databases have dynamic schemas for unstructured data.
SQL databases are vertically scalable, while NoSQL databases are horizontally scalable.
SQL databases are table-based, while NoSQL databases are document, key-value, graph, or wide-column stores.
SQL databases are better for multi-row transactions, while NoSQL is better for unstructured data like documents or JSON.
Like fix error said, you can also use SQL database with a JSONB field
Thank you @simple barn @brave bridge
Looks like postgres with jsonb it is
well, this list is oversimplifying stuff a bit
"NoSQL" is a very broad term that can mean different things
For example, Cassandra has schemas
Document db to column stores
that point in particular
SQL databases are vertically scalable, while NoSQL databases are horizontally scalable.
is oversimplying wayyy too much
I think ChatGPT wrote that post, haha. Disregard.
I guess SQLite is not horizontally scalable
But if you have a SQLite database and you need horizontal scaling, you made some extremely backwards design decisions lol
Oh no lol. These same 5 points have been going around since ages
I think defining terms as a negative of something usually turns out in confusion and vagueness
Like, is EdgeDB a "NoSQL" database?
as it's just a fancy frontend on top of postgresql, but doesn't use SQL for querying
also "non-functional requirements"
Or "clipless pedals" which is not about IT at all, but has exactly the same problems.
I've sharded the heck outta mysql in the past... anything's horizontally scalable if you try hard enough ๐
Well, I guess you can partition it "manually"
Filter is for many
Many with same attribute(s) iirc
Not different attributes
That made no sense. Filter can do IN filters and OR.
Oh yeah, I forgot the Q class
It's been long I did that kind of query in django
but in django, calling delete() on the queryset can delete all the objects
I'm not sure that's the case with sqlalchemy
I'm using sqlalchemy orm. Seems django orm already abstracts that part
Pretty sure it can. Sqlalchemy is generally better at mapping to any SQL than djangos orm
Oh..
You don't need Q for this anyway.
Well, I haven't seen a solution yet. Maybe I need to keep searching
In django? how else?
Foo.filter().delete() will delete all rows.
thanks for helping me mate!
In defense of sqlite, scarily you can actually make it horizontally scalable with some wrapping
It would actually be a pretty cool concept to see, you could do a fairly abstract version and a more efficient version which modifies some more of the sqlite internals
But personally, I would like even to see a POC of a DB built on top of a set of sqlite dbs
I'm sure you could make a more efficient system
But damn sqlite setup right could probably do pretty solid
I did something like this with mysql years ago with great results, even did some partition magic to parallelize some otherwise unparallelizable queries. Totally agree on the sqlite part, would be interesting.
The key part of all of that is writing a sql rewriter/interpreter to subdivide the work/etc.
Is this function correct because even after executing this, it is still retuning None
def enable_v1(self, value_1: int):
self.cursor.execute("UPDATE table_1 SET v1 = ? WHERE guild = ?", (1, value_1,))
self.conn.commit()
what is still returning none?
also, cursors should be short lived
you should make a new one effectively for each query
rather than as a class var
def check_v1(self, value_1) -> bool:
cursor.execute('''SELECT guild FROM table_1 WHERE v1 = ? AND guild = ?''', (1, value_1))
value = cursor.fetchone()
print(value)
if value is None:
return False
return True
This ^
I just tried to make all the different sql queries in same file so multiple databases are connected in different classes inside a single file
Always returning false even if i use this [function](#databases message)
All, trying to deploy something and noticed my tests were failing. Apparently its a DROP DATABASE command. Using Postgresql. Noticed I was on 12 so updated to 16
psql (16.0 (Ubuntu 16.0-1.pgdg20.04+1), server 12.16 (Ubuntu 12.16-0ubuntu0.20.04.1))
Type "help" for help.
amelia=> DROP DATABASE IF EXISTS foo WITH (FORCE);
ERROR: syntax error at or near "WITH"
LINE 1: DROP DATABASE IF EXISTS foo WITH (FORCE);
^
amelia=>
If you run the SQL directly you get a different result?
Any ideas why I am getting syntax errors?
Uhh i didn't test
But it was working in my previous code
Seems like it isn't fetching correctly or not inserting values into table
Do you need the IF EXISTS part or could you try without that?
I think I'll need it but I can create a blank and try
yeah it just errors on WITH FORCE each time
if I remove it its fine
but wondering why
https://www.postgresql.org/docs/12/sql-dropdatabase.html - you are using PG 12 so you should be reading PG 12 docs not PG 16
yeah just realized that. I saw the 16 in the front and thought I was good to go
ah 16 is your client version
in fact it should have warned you saying you are connecting to PG 12 with a newer client version
Hi,
im working on a flask app which uses sqlalchemy.
Locally when testing, im using sqlite without problems. I even set up a local mariadb 11 server to test the production config, without problems.
When i deploy it to my server, which also runs mariadb 11, i get errors that the table already exists.
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1050, "Table 'events' already exists")
from .config import Config
from flask_sqlalchemy import SQLAlchemy # noqa
from sqlalchemy import create_engine, MetaData
from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker
engine = create_engine(
Config.SQLALCHEMY_DATABASE_URI, pool_pre_ping=True, pool_recycle=3600
)
db_session = scoped_session(
sessionmaker(autocommit=False, autoflush=False, bind=engine)
)
class CustomBase:
@declared_attr
def __tablename__(self):
return self.__name__.lower()
Base = declarative_base(
metadata=MetaData(
naming_convention={
"ix": "ix_%(column_0_label)s",
"uq": "uq_%(table_name)s_%(column_0_name)s",
"ck": "ck_%(table_name)s_%(constraint_name)s",
"fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
"pk": "pk_%(table_name)s",
}
),
cls=CustomBase,
)
def init_db():
import apps.models # noqa
Base.metadata.create_all(bind=engine, checkfirst=True)
def update(self, data):
# Update an object with the data provided
for key, value in data.items():
if hasattr(self, key) and value is not None:
setattr(self, key, value)
Base.query = db_session.query_property()
Base.update = update
I dont understand why this can happen.
- why does my local mariadb server work, but the remote mariadb server (fresh install, dockerized) does not?
Base.metadata.create_all(bind=engine, checkfirst=True)should add IF NOT EXISTS afaik - how can it even try to re-create the tables
:
moved on from #ot0-fear-of-python
If there is a caching mechanism for admin panel dashboards, do people use Redis? And if they do, what happens if a cached data gets deleted by LRU tactic and admin starts seeing old events?
like how all does that shit work
Hmm, i don't know about your situation, but the admin panels that update live, just pull thier data according to timestamp, if they are live updating
so they dont do cache at all?
just trust on the main database?
It depends,
Case 1: Data is not frequently updating in real-time
eg: records of user data
-
admin panel do cache, but, everytime before UI updates, or in a fixed interval it would check for updates.
-
or.., there would be a webhook, which notifies, if there is an update in database
Case 2: Data is updating real-time frequently
eg: stock market
- it's based on, like I said, timestamps, and it always updates
Probably case 2? I expect a lot of records, users, events, transactions, ban appeals, reports etc etc
And they get updated frequently
there isnt one admin, there will be more than one (expected, might be one)
Well, do you have access to database?
You can check for yourself which case it comes under
Yes, i host the admin panel from the backend directly, I just mount the panel to FastAPI app.
i can connect to redis or postgresql
I am just planning atm
like there is 2-ish month development time in ahead of us,
Try to test it with a test data, I'm sure it will take utmost an hour or 2 to get it done
It's better to be safe than sorry ๐
frontend is not ready at all, dunno what can I test against
i can populate with bunch of random data
Even in case 2, sometimes data needs to be pre-computed and pushed to a cache rather than live-queried. ie: If you're drawing stock market candles with a 1 minute granularity, you could pre-compute (denormalize) and cache the OHLCV candles, rather than querying/constructing the results directly. Or, adding a materialized view that has a refresh interval or refreshes on certain events, etc.
That makes sense, but i am worried about how will I cache things.
Are you perhaps thinking about premature optimization? Dropping caching in is relatively easy, once you ahve an understanding of your bottlenecks.
You'd be, perhaps, surprised at how well databases perform under reasonable load: often caching is not needed.
Maybe yeah, main PostgreSQL database might be more than enough
With the admin panel and the backend itself, we will do like, what, 500k row reads monthly? (again, expected)
There's also lots of things you can do within the database to optimize (namely: indexing, partitioning, materialization, etc), before you even get to caching.
Alright, I will worry about caching later i guess.
Thanks a lot
Materialized views are kind of already form of caching
Otherwise it is common to use in python redis lib for caching
Plus there is caching available at the level of reverse proxy Nginx
Plus there is caching available at the level of CDN
Too many caching layers, at this point worry becomes how to invalidate them ๐
hey can anyone explain this diagram a little please
@coral wasp thought you would be interested. i found this insightful:
Oh, that's nice actually. Altho I think they're being generous to Redshift (at least with regards to cost!)
oh yeah? i dont have exp with redshift nor do i know anyone personally that uses it. snowflake on the other hand... 
Fair. Would like to see clickhouse on that, more affordable than snowflake but larger scale than duckdb.
oh yeah that would be dope
MySQL be like: https://benchmark.clickhouse.com/
Im trying to count records based on a partial information from a column. I have a date column with the format mm/dd/yyyy and i want to pull data based on a specific year. I know i need count and in but im having trouble with the parsing the year out.
Or should i just do this with a python loop?
what sql engine are you using?
Sqlite3 .
I'm just refreshing my memory on how sqlite3 lets you work with dates.
Iโm using strings
so you can parse it into the date type and then format it to how you need it.
nice. i mean the nuance obv is the specific workload theyre looking at
SELECT strftime('%Y', you_col) FROM your_table;
I donโt think sqlite3 has a date data type.
At least I donโt remember seeing it in their website.
It does not. U should store date like year/mm/DD
Preferably just use isoformat already
Then u will be able using syntax Like properly
I think even with current format u can write
WHERE data LIKE %s-2023 though
Iโll have to look into like, thatโs a new statement for me. And I just realize Iโm using test data, I donโt remember if the software generated it or I dumped it. If I dumped it, the software may create a different format.
Thank you.
is it difficult to link py with sql for databeses and stuff
not really, there's a sqlite3 library built-in you can use from the moment you install python and most databases have python drivers
Easy peasy lemon squeazy. People very far from proper software engineering manage to do it ๐
It is entirely another question how good, maintainable or performant result will be though
Look into ORMs because that's generally the best way to do it and prevents you from using raw SQL instead of pure Python. Peewee is super easy to start with but SQL Alchemy is generally better, and Django has its own ORM built in too
Thats what I thought too though consulted https://www.sqlite.org/lang_datefunc.html
My code does format the dates properly, I input the text data with the wrong format. Used the like key word (function?) and everything is working. This ended up being perfect for the next step as I needed to gather data from a range of dates.
is this the general form of doing bulk updates for sqlalchemy orm?
session.execute(update(magicInfo),bulk,)# bulk is [{id:1,"set_names":"abc,def"},{id:2,"set_names":"hif,klm"}
i suspect session is a transaction ๐ค
oh... also... how the hell to make SQL update in bulk if raw SQL is not supporting it ๐
ORM Bulk UPDATE by Primary Key
update()
List of dictionaries to Session.execute.params
https://docs.sqlalchemy.org/en/20/orm/queryguide/dml.html SQLAlchemy update yeah allows bulk update nevertheless.
https://docs.sqlalchemy.org/en/20/orm/queryguide/dml.html#bulk-update-by-primary-key-for-joined-table-inheritance
available ๐ค
from sqlalchemy import update
session.execute(
update(User),
[
{"id": 1, "name": "scheeks", "manager_name": "Sandy Cheeks, President"},
{"id": 2, "name": "eugene", "manager_name": "Eugene H. Krabs, VP Marketing"},
],
)
hehe, under the hood it is not very bulky as expected
UPDATE employee SET name=? WHERE employee.id = ?
[...] [('scheeks', 1), ('eugene', 2)]
UPDATE manager SET manager_name=? WHERE manager.id = ?
[...] [('Sandy Cheeks, President', 1), ('Eugene H. Krabs, VP Marketing', 2)]
Probably at least has advantage of submitting query over single network query
Hi
I need a help to make database of song
How to Start?
So I can play Multiples of songs
This is my website, and I'm working on it
Help me please
I donโt think songs are usually stored in a database. You might store a link to a song in the database, with the link going to AWS S3, or to Google drive or to Dropbox etc.
I am using sqlalchemy using uselist = False should have made it one-to-one?
but on erd diagram(pgadmin 4) it still shows one-to-many
i want it to be one-to-one with out using the unique constraint in UserSettigs.user_id
class Users(Base):
__tablename__ = "user"
id: Mapped[int] = mapped_column(primary_key=True)
tg_id: Mapped[int] = mapped_column(nullable=False, index=True, unique=True)
settings = relationship(
"UserSettings", backref="user", uselist=False, cascade="all, delete-orphan"
)
class UserSettings(Base):
__tablename__ = "user_settings"
id: Mapped[int] = mapped_column(primary_key=True)
is_premium: Mapped[bool] = mapped_column(Boolean, default=False)
is_banned: Mapped[bool] = mapped_column(Boolean, default=False)
enable_notifications: Mapped[bool] = mapped_column(Boolean, default=True)
user_id: Mapped[int] = Column(Integer, ForeignKey("user.id"),unique=True)
)```
settings > edit environment varriables for system > environment varriables > you are here
thanks
does this mean i did the PATH thing correctly?
look like i forgot the password ๐ how do i change it
i reset my password and its still wrong =.=
i use the change scram-sha-256 to trust method
i try finding but cant find it where do i change it
Do you mean master password?
Currently I don't see anything wrong, just expand "Servers" tab
i think so
when i try to create a table it ask me for a password
and idk how to reset the password ;-;
There should be reset password button
How to create 1 to 1 and 1 to many relationship using sqlalchemy
Good Afternoon. Im trying to ceate an application which stores internships and gradschemes im applying to. Im using Pandas and openpyxl to store the database in an excel spreadsheet.
My issue is whenever i create a file, it is always corrupted, and i have no idea why
ill send the relevant code below:
i dont find it ;-;
Could you send screenshot of password prompt
What do you want to do
oo ok
#this checks to see if the file exists or not:
try:
oldData = pd.read_excel("Tracker.xlsx", sheet_name="Data")
existingReminders = pd.read_excel("Tracker.xlsx", sheet_name="Reminders")
mode = 'a'
except FileNotFoundError:
oldData = pd.DataFrame()
existingReminders = pd.DataFrame()
mode = 'w'
#creates the writer:
with pd.ExcelWriter("Tracker.xlsx", engine="openpyxl", mode=mode) as writer:
toExit = False
while not toExit:
choice = int(input("1, 2 or 3: "))
if choice == 1:
newEntry(writer)
#uploads data:
if correct in ["YES", "Y"]:
toTransfer = pd.DataFrame(data, index=[0])
toSubmit = pd.concat([toTransfer, oldData])
toSubmit.to_excel(writer, sheet_name = "Data")
#I am certain a sheet is being created, its just the file itself does not open
#Ive reached the cap, so will be putting the error after this message
#Here is the error:
Traceback (most recent call last):
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\Grad Tracker.py", line 166, in <module>
newEntry(writer)
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\Grad Tracker.py", line 154, in newEntry
updateEntry(data, data["Name"], writer)
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\Grad Tracker.py", line 97, in updateEntry
if (not data["Second Stage"]) or data["Second Stage"] == "No Response":
~~~~^^^^^^^^^^^^^^^^
TypeError: string indices must be integers, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Necus\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\232.9559.58\plugins\python\helpers\pydev\pydevconsole.py", line 364, in runcode
coro = func()
^^^^^^
File "<input>", line 1, in <module>
File "C:\Users\Necus\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\232.9559.58\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Necus\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\232.9559.58\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py",
line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\Grad Tracker.py", line 161, in <module>
with pd.ExcelWriter("Tracker.xlsx", engine="openpyxl", mode=mode) as writer:
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\pandas\io\excel\_base.py", line 1370, in __exit__
self.close()
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\pandas\io\excel\_base.py", line 1374, in close
self._save()
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 110, in _save
self.book.save(self._handles.handle)
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\workbook\workbook.py", line 386, in save
save_workbook(self, filename)
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\writer\excel.py", line 294, in save_workbook
writer.save()
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\writer\excel.py", line 89, in write_data
archive.writestr(ARC_WORKBOOK, writer.write())
^^^^^^^^^^^^^^
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\workbook\_writer.py", line 150, in write
self.write_views()
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\workbook\_writer.py", line 137, in write_views
active = get_active_sheet(self.wb)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Necus\OneDrive\Documents\Grad Tracker\venv\Lib\site-packages\openpyxl\workbook\_writer.py", line 35, in get_active_sheet
raise IndexError("At least one sheet must be visible")
IndexError: At least one sheet must be visible
any solution would be beneficial
how detailed can you get with sql? Ive seen someone use it to do advent of code.
but does anyone have experience with making Servers, Databases and User Authentiction / Accounts
https://paste.pythondiscord.com/USUA
I made this query.
Yet i have a contract that had their last visit from attendancelist last week or in the last month that get state_id = 4.
I don't see what triggers this behavior.
Does anybody has any pointers?
Only pretty much every first backend dev ๐
lol
It is common as dirt under foot thing for backend devs
#web-development kind of better for this topic also
yeah, just asking if anyone can help really im making a game (unity with C#) and im trying to make server for it to send user creds too so if they need to reset password its possible
If we would have used Python and wished as less development as possible
It would have been Django with (sqlite3? Or postgres)
Plugged with django-ninja or DRF
Deployed with preferably docker to some Linux machine
I strongly recommend developing in .Net core with aim for deployment at Linux if u make it for C# game though
Because then u can reuse server code at desktop app game code, less complexity, automated typing validation
dm me
Not doing dms, ask in public
alr i was gonna ask if you would like to help with the game but alr
anyone good at databases
explain
lets supposed we make CRUD backend API that can accept JSON requests and answers with them at server.
If we use C#(same language) at server and backend
We could define Struct of data that is possible input for Request body, and struct for answer
https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/how-to?pivots=dotnet-8-0
using System.Text.Json;
namespace SerializeBasic
{
public class WeatherForecast
{
public DateTimeOffset Date { get; set; }
public int TemperatureCelsius { get; set; }
public string? Summary { get; set; }
}
public class Program
{
public static void Main()
{
var weatherForecast = new WeatherForecast
{
Date = DateTime.Parse("2019-08-01"),
TemperatureCelsius = 25,
Summary = "Hot"
};
string jsonString = JsonSerializer.Serialize(weatherForecast);
Console.WriteLine(jsonString);
}
}
}
With using same code for backend and desktop app
Desktop app could import same Structs that server app uses
and to send correct data from desktop app with validation at the level of compiler that u used correct input fields, and u can reuse backend code for Struct answer to deserialize recieved in request answer from json back to C# struct
this alone simplifies situation and making u using all the time correct input/output fields to communicate between desktop and server
Making possible refactoring easily (renaming fields), adding/removing freely fields for communications, and all without super need for unit tests
yeah i would but my friend knows c# and i only know python
so thats why i want the server in Python so i can make it
shrugs. then make it in Django-Ninja https://django-ninja.rest-framework.com/
Best option i think as slightly less option
Django Ninja - Django REST framework with high performance, easy to learn, fast to code.
it will autogenerate documentation for your API
available at some endpoint...
https://django-ninja.rest-framework.com/guides/api-docs/
/docs
/redoc
Django Ninja - Django REST framework with high performance, easy to learn, fast to code.
if u will use framework at a good capacity, friend will always know automatically possible input and output and possible endpoints
https://pyjwt.readthedocs.io/en/stable/
Btw, this is great library to make password resets ๐
thanks, can i add you i can ask for advice whenever rather then having to check this server all the time
Django-ORM comes with out of the box with working best boilerplated ORM, great for both SQlite3 and postgres. solid option for least amount of hustle at the cost of decreased potential flexibility which u will never need
nope. I am too often answering questions to people. i can spare time only when i am free at work pause and willing
Ask everyone at public in #web-development or #databases , may be i will notice your question and answer too
alr
I have this query
SELECT SUM(pulls) from toning group by job that returns a list of numbers (pulls). How do i average the data thats returned?
like:
SELECT AVG(pulls_sum) FROM (SELECT job, SUM(pulls) AS pulls_sum FROM toning GROUP BY job)
Thank you.
Didnโt realize Sql could get so complex. And this is the shallow end lmao
which database best for web developmnenr
Postgresql for work. Excellent database covering 98% usage cases.
Sqlite3 good for pet projects
oh thanks man
how to check table is exists or not in sqlite3
define 'pet projects'
I do it like this:
def table_exists(conn, tbl_name):
return bool(
conn.execute(
"SELECT COUNT(*) FROM sqlite_master WHERE type = ? AND name = ?",
("table", tbl_name)
).fetchone()[0]
)
i usually depend on the error lol
try:
for svinfo in range(0,3):
cur.execute(f"CREATE TABLE sv{svinfo}({', '.join(str(v) for v in svbase)})")
# This means that the db exists
except sql.OperationalError:
pass
but that looks reliable, thank you
The CREATE TABLE statement does have IF NOT EXISTS if you want that, like:
CREATE TABLE IF NOT EXISTS mytablename
even better! thanks
Any project u do in your own time for learning purposes and just for resume to show off is definitely a pet project
oh ok
any SQL God here?
Nope, just mortals, but you could ask your question and see what happens
Thou walkest among sql atheists
As a sanity check, I'll ask here. I have a program that uses SQL to access a MariaDB database. I have a few users that use the program, sometimes simultaneously. After one user commits, the other cannot see the new data until they either commit, or when they restart the program.
My understanding on this is that the default isolation level, repeatable read, runs on a snapshot grabbed at the first connection and updated with each commit. My options appear to be to have the program constantly commit wihout any changes, or change the isolation level. This is where my question lies.
If I am understanding correctly, I can use Read Commited to update the snapshot with each read. Is there a downside to doing this?
Thanks!
Sounds like you're using long-running connections with implicit transactions. Rather than opening a connection for the life of the app, open a conneciton for specific purposes. Try switching to using a new cursor for each query (or batch of queryies) first.
Thanks for getting back to me.
Does that work better than changing the isolation level?
Well, what I said is something you should do anyway. I think you're keeping a transaction open for a long period, which is never a good thing.
Oh I see. They use the program for (usually) minutes at a time, unless they are mass updating inventory. I do open a single connection and then use the same cursor for all transactions. Out of curisoity, and the fact that I'm new to SQL, what is the issue when keeping a transaction open?
The problem with a long-lived transaction is: the database is constantly changing.
Eh, there's lots of issues, actually. The way databases handle locks and the like, etc.
Generally, best practice is to do things in small chunks.
Use a cursor/transaction for when you need to update. Usually get in and get out.
Otherwise you end up with the isolation issues you're describing, as one side effect.
Isolation is a good thing, makes your code a lot more complex without it.
So should I open and close the overall connection as well as the cursor, each time I do a read or write to the database?
You don't need to reopen the connection, just use a fresh cursor for each db transaction, and keep the transactions relatively short.
I see. I think the transactions are short. I mainly grab the information from the database I need to initialize the program, and then only hit the database for when doing searches and saving to the database.
Hey guys I got a Q. I have a column of lists in my dataframe, and I'm trying to pull the first value from these lists to make another column. The problem is that when it runs df['col_2'] = df['col_1'] [0] it uses the first value in the column, instead of the first value in every list in the column. How do I make it look at each row individually?
Learning sqlite im a beginner do u guys have any tips or other light weight data bases
!e feels pretty cursed but .str[0] works iirc: py import pandas as pd lsts = [ [1, 2, 3], [4, 5, 6], [7, 8, 9], ] df = pd.DataFrame({"A": lsts, "B": ['a', 'b', 'c']}) print(df) print(df["A"].str[0]) or just .apply(lambda x: x[0]) / .map(lambda x: x[0])
@storm mauve :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | A B
002 | 0 [1, 2, 3] a
003 | 1 [4, 5, 6] b
004 | 2 [7, 8, 9] c
005 | 0 1
006 | 1 4
007 | 2 7
008 | Name: A, dtype: int64
class Comment(Base):
__tablename__ = "comment"
id: Mapped[int] = mapped_column(primary_key=True)
content: Mapped[str] = mapped_column(String(4000))
parent_id: Mapped[int] = mapped_column(ForeignKey("comment.id", ondelete="CASCADE"), nullable=True)
parent: Mapped["Comment"] = relationship("Comment", remote_side=[id], back_populates="children")
children: Mapped[list["Comment"]] = relationship("Comment", remote_side=[parent_id], back_populates="parent")
idea_id: Mapped[int] = mapped_column(ForeignKey("idea.id", ondelete="CASCADE"))
idea: Mapped[Idea] = relationship(back_populates="comments")
author_id: Mapped[int] = mapped_column(ForeignKey("user.id", ondelete="CASCADE"))
author: Mapped[User] = relationship(back_populates="comments")
reply_count: Mapped[int] = column_property(
select(func.count(text("comment.id")))
.select_from(text("comment"))
.where(text("comment.parent_id = comment.id"))
.correlate_except(text("comment"))
.scalar_subquery()
)
am i doing this right? (the reply_count) part
it's always returning 0
unlike this
class Idea(Base):
__tablename__ = "idea"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(String(70))
description: Mapped[str] = mapped_column(String(4000))
author_id: Mapped[int] = mapped_column(ForeignKey("user.id", ondelete="CASCADE"))
author: Mapped[User] = relationship(back_populates="ideas")
likes: Mapped[list["IdeaLike"]] = relationship(back_populates="idea")
likes_count: Mapped[int] = column_property(
select(func.count(text("idea_like.id")))
.select_from(text("idea_like"))
.where(text("idea_like.idea_id = idea.id"))
.correlate_except(text("idea_like"))
.scalar_subquery()
)
comments: Mapped[list["Comment"]] = relationship(back_populates="idea")
comment_count: Mapped[int] = column_property(
select(func.count(text("comment.id")))
.select_from(text("comment"))
.where(text("comment.idea_id = idea.id"))
.correlate_except(text("comment"))
.scalar_subquery()
)
which return the correct count of comments and likes
I think you should not be committing SELECT and PRAGMA queries, but for every other transaction you must
hey i am trying a login interface for my site and it says in the tutorial i need a server side script how do i create on in python?
if you are just wishing to learn more stuff in a raw way, go through flask mega tutorial for that
https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
If you are doing smth for work... https://www.djangoproject.com/start/
I would recommend actually using django for that.
More boilerplated and error safe, and more... devproof in general
(if u need rest API, it has ready to use plugins too)
yes i need to create a site in 2 days for a competition
๐ค tempted to say go django in this case. higher chances not to screw up in a process
is it easy
sure, it is easy. Quite not dev-people manage to make it work in django.
result is messy, but it works
thats exactly what i need.2 days is such a short time that if i am able to finish it all i will be blessed
thanks !
The principle of django is everything boilerplated
it has already out of the box ready to use ORM, database, login system for you
you only need to figure out how to use it and turn on
Flask on another hand is more raw approach, there are a lot of small things to figure out and setup from zero.
I don't believe u have good chances to be in time of 2 days if u start from zero with Flask
Django i think is safer bet
i think you are right but i am gonna try and not sleep lol
ok so flask more complicated time consuming djiango easier shitier result less time
Flask is more raw, time consuming to setup from zero correctly, full flexibility to make everything. easy to screw up completely beyond recoverability.
Django is more idiotproof, everything is boilerplated (configured out of the box). At low skill level Django is easier to make something good enough in a fast rocket way (assuming u make it in Django way). Django cost is more rigid solution in a super long run (if you are expert level dev, u can feel contrained by it. Not really issue for your 2 days timeline at all)
ok
From personal experience, half of time will be consumed by figuring out how to use correctly SQLAlchemy if starting with Flask :/
Although u can bypass this issue by actually just using raw SQL.
Django ORM has out of the box ready to be used with SQLite3(or could be switched to another engine)
What kind of SQL. If SQLite like this. If something else, the equivalent is easy to find with a web search. https://stackoverflow.com/questions/1601151/how-do-i-check-in-sqlite-whether-a-table-exists
dumb universal way could be SELECT 1 FROM table_name ๐
if u got 1, then it exists.
Should work for literally every SQL engine.
If u get error, then it does not exist.
i was hoping for a condition statement but that sounds good since it hypothetically can work for any ver of sqlite
it will work even for postgres, and mysql, mariadb and probably cassandra and all other SQL like engines ๐ full cross platformness.
Lmao, so try my code and get ready for an exception?
Sure, good enough approach for python and its error handling. As long as you specified precise exception type in the catching (that matches exactly the error u get if table does not exist)
Thank you
Optional boolean route ๐ค
Hey, using sqlite in an async application. Is there a popular async ORM that any of y'all would recommend (or is an async even a requirement for an ORM)?
https://github.com/Rapptz/asqlite is a thing
there are a few others too but tbh sqlite is so 'lite' that it probably wouldn't make much of a difference if you just used the normal standard library sync api for it, assuming that you do not have like dozens/hundreds of things trying to access it at the same time, in which case you really should consider using a bigger database like postgres
but in general yes, you'll want to use async ORMs in async projects
with sqlite3, the database is on the same machine as your program so it shouldn't hurt way too much to use it, but with other databases you'll often have to make network requests which could be a problem
thanks, was looking for something more high-level tho
I think that SQLAlchemy has an async mode?
I'm not sure what pragma is, but I don't commit when I'm doing select statements. The only time I commit is when I'm saving a new asset or a change to an existing asset.
im making a game, would it make sense to use a .txt file as a logbook (boot ups, log offs, sessions) and a sqlite table for game saves?
why not just use sqlite for everything?
im actually coming off of sqlite and using csv for my project
You might not even need SQLite for game saves
Yeah lol
Would anybody know where to retrieve data about revenue per square m of a specific restaurant?
Im trying to find out how the position of the restaurant correlates with its revenue, and I was wondering where I can find such data.
proly annual reports of restaurant businesses
I doubt you'll find that in the public domain but maybe ask r/datasets on Reddit. Location details will be highly relevant (we don't even know what country you are in)
for a single restaurant? or for a brand of restaurant?
if your meant the brand as a whole and that they are listed company then as mentioned above - quarterly/annual report is your best bet.
if you meant revenue broken down by individual restaurant of the brand, you are probably looking for spend data, you are unlikely to find spend data for free (if you found it, please ping me as i am also interested)
(also for location of the restaurants, open street map will likely have a almost complete list)
if u would have used Java, u could have saved Java objects directly into files. Java has cool feature to have all stuff like that saveable
Saving Objects in Java ๐
Otherwise in python... closest achievable thing with Pydantic. All pydantic objects are serializable/deserializable too.
Database has its own advantages and disadvantages... i am not sure if using Sqlite3 is actually good choice in general for game dev
Using serializable structs(like pydantic BaseModel) i think fits better game development
because i will be able to achieve saveability at a lesser code effort and same code is bidirectional to save and load itself
TLDR: try to use pydantic first https://docs.pydantic.dev/latest/concepts/models/ if u a in python. and use no databases ๐
Data validation using Python type hints
isn't pickle the equivalent for serializing objects? but in both cases malicious code can be executed from loading an untrusted file, so pydantic would be a safer choice
wow. Quiclkly checking pickle... it is may be what i needed ๐ค
looks like the closest analog to java inbuilt serializer
going to help a lot in one tricky place at my library
i need to serialize deserialize very complex object of uknown structure. Pickle looks like will manage to do a better job there
it works for now with just json lib though, hopefully i will not need pickle may be
Pydantic has some runtime type validation, that is why it is cooler by default
i would say though picke and pydantic serve different roles
pydantic is using json under the hood
it could work with pickle too
pydantic is better structs, json and pickes are different serializing engines
there is some interpolability i can admit, but still they serve different intentions
There's many issues with pickle. I'd suggest using dataclasses, which have built in JSON serializers.
It depends what you mean by "making a crud" but you probably want an ORM like SQL Alchemy plus the database driver like SQLite3 or whatever
An ORM is not strictly necessary though
And then what kind of interface? Is this a web backend, CLI or GUI
GUI
GUI or web
to make a crud using cli I use mysql.connector
how long have you been working with this?
Django Rest Framework
Or Django Ninja should serve as best CRUD
Very CRUD boilerplateful
thx for the resource
Web? django+iommi
just woke up to almost all my mongodb collections droppped and a single object in my database asking for 0.0125 BTC to decrypt my data ๐ณ
I was just running this docker-compose.yml:
version: '3.8'
services:
mongodb:
command: mongod
image: mongo:latest
ports:
- '27017:27017'
volumes:
- data:/data/db
volumes:
data:
I was shocked to discover that even though I didn't set my firewall, ufw, to allow remote connections on 27017, some thing did anyway. I am still not sure how this happens, but I believe docker just bypassese/overwrites whatever ufw configurations exist and just edits iptables by itself? I don't know. But I want to disallow remote connections to my docker container. I tried adding --bind_ip 127.0.0.1 to the mongod command above (in the hopes of explicitly not binding to 0.0.0.0), the service starts up, it waits for connections, but when I try to connect with connection string mongodb://localhost:27017, it times out, not even refuses connection.
I am at a loss here. How do I setup a mongodb instance with a docker-compose file such that only local connections are allowed?
thankfully this is just a toy project of mine, but it still feels like almost being hit by a bus. I can totally see some poor unaware soul just blindly running this docker compose file in production expecting to be safe.
ah yes the good old docker ufw ๐
Basically
Docker bypasses your firewall, specifically it bypasses UFW by default
you need to configure your iptables to have it not route to the outside world beyond the firewall
but also i'd recommend doing
ports:
- '127.0.0.1:27017:27017'
as a standard practice though since it's rare you actually want your container exposed to the outside world
spooky!
I was gonna add some username/password authentication but this seems way more reasonable!
thanks for the heads up i didn't know about this ufw/docker thing
side note, but you should definitely also add authentication to you database anyway
noted
that's exactly what I was looking for though, it works! thanks again!
Also it is smart idea actually to change default passwords if u use for prod ๐
i just need to live dangerously.
I just read this article and I want to know if this seems like a good idea to anyone else?
An alternate to traditional deleted_at soft deletion that stays out of the way and minimizes bugs.
how I can make a list in MySQL
like I want to store some ids
and then check if an Id in it
and also add Ids to list
I can do that with MySQL?
make every list record as separate row in a table.
link rows of new tables with foreign key to the record u wish
don't try to make list ๐
I have too many rows and I want to add list for every row
I think Json for this is better
i will repeat again. Don't try doing that if u don't understand SQL databases and what are levels of normalizations
let SQL engine work for you properly
don't break it with usage of list/json stuff if u don't understand what u are doing
create separate table and link values via foreign keys
that is best recommended way, good default working
in SQL databases u need to think trice if u really want lists and jsons
they break proper functionality of database design and must be avoided unless there is some specific usage case that can justify it...
...the trick it is very rarely when it can justify it
Hi im new, somebody knows where can i learn python for data analysis?, especifically, pandas, numpy. and matplotlib
The options are endless so try whatever you like. This intro course is totally free. https://www.freecodecamp.org/news/how-to-analyze-data-with-python-pandas/
I also like interactive apps like DataCamp and DataQuest
Ultimately there's no substitute for real projects though. Once you know some basics try Kaggle competitions and other projects like that
Data Analysis is an in-demand field but it can be hard to get into as a beginner. We've just released a 10-hour beginner-friendly video course to teach people how to analyze data with Python, Pandas, and Numpy. This course offers a coding-first introduction to data analysis. Besides the video content,
Is it generally a better approach to store images as blobs in a database, or store them locally on the file system, for a bot? I am developing a discord bot that will hold items with images, that will be sent as embeds on a discord. I just don't know if the images are stored permanently, until the message is deleted. The database would be PostgreSQL.
in general it is better storing images only as files in filesystem
and if u wish to go cloud native, store them in S3 buckets (optionally in self hosted Minio)
tldr: use Blob storage databases
keep only links for relational databases
it is great antipattern trying to store images in relational db directly
but in blob storage like S3 this is fine and meant to be
otherwise if your app is super is super simple, just store in filesystem (and keep link in relational)
Nice answer, thank you. ๐
I swear I remember seeing some research by Microsoft that I can't find right now showing that if the images are quite small, the performance advantage of storing them in a DB can be considerable.
But I think for the overwhelming majority of use-cases it's probably a bad idea and you should follow Darkwind's advice
@golden vector also if u store them in S3, u don't even need to take them out of database. S3 is possible to show as static assets web site out of the box ๐ So u can just literally link file to view into website
If images are under 1MB they are typically more performant in a wide column DB than blob store yes, I don't thing relational DBs are as good at is because they (at least in postgres) start using TOAST pages which require additional hops around the disk to fetch each row. But things like Cassandra can effectively be told "Here's this blob, dont both compressing it" which ends up being pretty damn efficient.
Interesting. Cassandra is pseudo SQL like db right? Could be interesting option
CQL ๐
Incidentally, ntfs stores small files fully in the MFT for similar reasons, which is why disk space doesnโt change when you write lots of small files
the space complexity of the query: SELECT DISTINCT col FROM table LIMIT :N is always the limit-value, regardless of the cardinality or the COUNT(*) of col, correct?
cause I'm doing this query with datafusion on an S3 dataset (parquet) and the query is taking a long time and 3-5GB of RAM, even tho I'm only selecting the first 10 values,
and I know for a fact that any parquet file has atleast 10 unique values, so it should just read the first file and return instead of doing a full scan ...
it will keep going until N distinct values are gathered
but I suspect datafusion is probably making a very intelligent query plan
Probably test with duckdb and see how that performs?
guys whats best way of using db in fastapi,i am using asyncpg as cdriver
Define "best"? Personally I would use SQL Alchemy unless I have a very good reason not to.
I benchmarked datafusion vs duckdb, and they take about the same (17.93 vs 16.45 seconds), and also the memory usage is the same
with duckdb I can at least reduce memory usage trough the settings, with datafusion I havent been able to configure it
So that probably suggests your parquet files may be difficult to only do partial reads with
how big are the files?
the thing that drives me nuts, is that all 10 values are present in the first file fo the Pyarrow dataset ((x in pd.read_parquet(dataset.files[0])['col'] for x in out['col']))
it did a full scan when all the necessary values could be found in the first partition/file
Does data fusion actually give you a query plan?
wdym? even if it loads each parquet file at once; loading just the first is more than enough for the reasosn mentioned above
less than 2MB each
Why use orm , when you have sql
I found the the query plan generated by datafusion, for the statment: SELECT DISTINCT my_column FROM dataset LIMIT 10 (
# sql.logical_plan()
Limit: skip=0, fetch=10
Distinct:
Projection: dataset.my_column
TableScan: dataset
# sql.optimized_logical_plan()
Limit: skip=0, fetch=10
Aggregate: groupBy=[[dataset.my_column]], aggr=[[]]
TableScan: dataset projection=[my_column]
# sql.execution_plan()
GlobalLimitExec: skip=0, fetch=10
CoalescePartitionsExec
LocalLimitExec: fetch=10
AggregateExec: mode=FinalPartitioned, gby=[my_column@0 as my_column], aggr=[]
CoalesceBatchesExec: target_batch_size=8192
RepartitionExec: partitioning=Hash([my_column@0], 8), input_partitions=568
AggregateExec: mode=Partial, gby=[my_column@0 as my_column], aggr=[]
DatasetExec: number_of_fragments=568, projection=[my_column]
@brazen charm
Iโd suggest asking this question over on the duckdb discord; if selecting only certain columns, it should retrieve only the data youโre querying not a full retrieval. I havenโt really read the full issue, but you shouldnโt need a full retrieval for a single column.
Anyone ever try pysondb? It's a JSON database for python. https://github.com/pysonDB/pysonDB
The fact they call it a database really annoys me somewhat
It's be better off being called a gloried set of text files
Don't use this is what I'd say, if you really really want schema less stuff, use mongo
pyson is:
- not a database
- not atomic
- not crash safe
- not efficient
- not concurrency safe
- not storage efficient
Its not a package I would use in production, there are better alternatives
Btw the plan was from datafusion, but they are both behaving the same
I opened an issue: https://github.com/apache/arrow-datafusion/issues/7781
But you said duckdb. What query were you running against duckdb?
Bruh
.
I tried the same query on both, and they are both slow as shit
All I see from@that bug is โlots of timeโ and โmost of my ramโ. How many rows/how large is the parquet file? How does it perform differently if you run against a local parquet file?
(I doubt anyone will pick up that bug as it lacks anything tangible or comparative or reproducible)
Why use a mess of SQL embedded in Python when you can just use Python? ORMs exist for good reason and having cleaner code is generally worth any performance hit.
I would never use an ORM for analytics workโฆ I think itโs a very different space than transactional and object data that orms are good for (my opinion, I know people disagree)
There is always a temptation to use at minimum SQL query builder which will help to chain/utilize programming language features to more full capacities (and may be even making it more typed)
SqlAlchemy Core 2.0+ stuff is more SQL query builder to me with some optional ORM elements integration
What's tough for me is I'm often trying to do stuff that's just hard in SQL to begin with: stuff where I might be using some esoteric features, new functions, etc, where I'm running ahead of what's reasonable to expect from an ORM. This is maybe a side effect of OLAP stuff tending to be on the bleeding edge.
What in your opinion would be better? I'm actually avoiding mongo.
I would go with Postgres, I've heared SQLite also has some JSON extensions
Better to normalize you data but if you just want to dump JSON somewhere as is, MongoDB makes it stupid easy to do so
ะป
Is it bad to store data in 150k json files
it depends on what u store. if datasets, then csv is great
if application data, then better using stuff like postgres (or at least sqlite3, or smth like duckdb... u get the idea)
Otherwise... 150k json files i guess are acceptable for datasets (but not for app data)
BigQuery SQL: Can someone more knowledgable than me explain why I often see examples using ROW_NUMBER() OVER or RANK() OVER in conjuction with PARITION BY to get the most recent row instead of doing a GROUP BY and choosing MAX(ROW_CHANGED_DATE) (of course we have a row_changed_date column)
when should I create sessions? persist it throughout the whole app or one for every request
(sqla)
One reason is; what if you want top 3 by group? Or, what if you want, for each group, the most recent row? Compare the sql via groupby vs using a window function. Window functions are far more flexible.
I am learning SQL , so why I skip it?
If your purpose is learning just do whatever is helpful in that respect. But you should probably learn how to use an ORM too once you have a grasp on SQL basocs
Hi everybody, I'm trying to create unit tests for a function which uses a pymongo db. My idea was to use an embed db by mocking it with mongomock but nothing of what I tried works. I want to do something like this:
def setUp(self):
self.mongo_client = mongomock.MongoClient()
self.db = self.mongo_client['test_db']
def tearDown(self):
self.mongo_client.close()
@patch('app.config.database.db')
def test_payment_body_creates_successfully(self, mock_db):
mock_db.return_value = self.db```
This is the config for my db
```from pymongo import MongoClient
client = MongoClient(host=mongo_host, port=mongo_port)
db = client[str(db_name)]```
The idea is for the tests to be runnable without having mongo up locally
@fringe sundial as both ORM and raw SQL user i encourage to learn raw SQL first throughly and then to dive into ORMs
ORMs are a mess... and if u know how it works under the hood... u gain... a lot of productivity (10-1000X times)
not every ORM is a mess though (i am mostly talking about Django ORM as a super abstractional ORM mess)
Python SQLALchemy or Golang Bun are a different story
they are faaar less abstracting raw SQL from its usage
they just compliment it with language features to do it in a more static typed/nicer/structured way
despite its issues, Django ORM is still awesome (hey, a lot of boilerplating for CRUD) (But as i mentioned again... u really have to know how operates raw SQL first highly preferably)
does this belong to normal level, good, or bad (or genius)?
concat('{"someKey":', case substr(regexp_replace(someKey, 0, 2) when '[[' then regexp_replace (regexp_replace(regexpt_replace(someKey, '\\\\|\"\\[|\\]\"', ''), '@@@', '\\\\') when'["' then concat('[', regexp_replace(regexp_replace(regexp_replace(someKey, '\\\\{3}', '@@@'), '\\\\|\"\\[|\\]\"', ''), '@@@', '\\\\'), ']') else '[""]' end
it's in hql but just wanted to share. We are migrating database and need to update all our queries, and this is what I need to work on.
EDIT:
I guess it's normal, since my new query isn't too far from this...
Has anyone worked with graphdb and graphql? Something like go gqlgen but not for rdb
Dgraph.io looks very interesting with native gql support.
Any opinions on how we can manage protobufs and their relationship with graph databases? wouldn't it be better to have a single source of truth instead of marshaling and unmarshaling via json?
what the hell is this
Hey all does anyone here know DAX?
what help u need in that
ive got a powerbi question.. iv made this on excel... this creates a running total counting the amont of start dates and end dates on each calender date... eg there are 9 start dates from 01/07/22 to 05/07/22... anyways how do i do this in powerbi? pref using DAX
@torn sphinx
the most recent row?
This is exactly what I want. With what I'm working on we always want the most recent row and wouldn't ever change it to be top 3 or something else
Yes, show me how youโd do that with a group by, and Iโll explain why window is (usually) better
Well the problem I have with the window function is that it seems to return different number of rows every time I run it
whilst the group by does
which I don't get
I think it's because of hte order by clause in the window function
but I would have assumed that group by would have the same problem
Could you share an example? That doesnโt make sense to me.
This is how I did it with the window function:
WITH FOO AS (
SELECT
RANK() OVER (
PARTITION BY name
ORDER BY date_time DESC
) AS rank,
OTHER_FIELDS,
...
FROM FOO_TABLE
)
SELECT ...
FROM FOO
WHERE rank = 1 and state = 'CREATED'
And this would return different results every time I run in. I believe it's because there is a tie in the ORDER BY clause and BigQuery just picks one randomly
Yah, well, that was what I was going to say is one problem with the groupby method
In some cases the state is CREATED other times it's DELETED and the WHERE clause filters out those
You can get the max value, but what if itโs not unique
But when I use Group by I always the same number of rows
Secondly: youโre filtering on created in the select but not window. So if most recent state is not created, youโll get nothing
WITH FOO AS (
SELECT
name,
MAX(date_time) AS most_recent
FROM FOO_TABLE
GROUP BY name
)
SELECT ... FROM FOO
INNER JOIN FOO_TABLE
ON FOO.name = FOO_TABLE.name
AND FOO.most_recent = FOO.date_time
WHERE state = 'CREATED'
That's fine this table is a log of the state of files in a google cloud storage bucket. A single file can have states CREATED, DELETED, CREATED, etc. but I always want the most recent entry so that I know whether it's in the bucket or not
I mean, if thatโs what you want, then you could just get the last() value via window and skip the whole rank thing and drop the cte
Oh, nm, you just want the last entry, so yah, nm
Is there anybody who had problem at connecting mysql with python3 in Mac M1 and got NameError: name 'mysql' is not defined ?
looks like typing problem to me ๐
if u would have used strict mypy/pyright, it could be not an issue to you
oh you are right 
Hello, I have three list of dicts. called vpc, po, and eth,
Now, these are somewhat related. An item in vpc, can be related to a po, a po can be related to an eth.
I want to visualize this somehow, maybe networkx or similar.
all the dicts in each list is unique, and the are related by name. like vpc_member: 'po2', or similar.
where do I even start? ๐ suggestions
Unclear what youโre asking, are you asking how to visualize this? Or put it in a db?
Visualise it. Not sure why I put it in databases channel,
Check out graphviz. Itโs the best (by far) of charting tools: you do have to install the executables separate from the pip install.
Yeah, thanks a lot. and quite much easier than expected! now I just have to make some more filters etc.
G = graphviz.Digraph('Nexus relationship', filename='Nexus-VPC', format='png')
for vpc in vpcs:
G.edge(vpc['id'], vpc['port'])
for lag in lags:
# if lag has 1 or more members
if len(lag['phys_iface']) >= 1:
for iface in lag['phys_iface']:
G.edge(lag['bundle_iface'], iface)
G.render()
Anybody know where I can get a psycopg2 wheel for python 3.12???? For windows 10.
the official psycopg2-binary package seems to host them
I can't seem to get that to work. ร Getting requirements to build wheel did not run successfully.
โ exit code: 1
โฐโ> [5 lines of output]
running egg_info
writing psycopg2_binary.egg-info\PKG-INFO
writing dependency_links to psycopg2_binary.egg-info\dependency_links.txt
writing top-level names to psycopg2_binary.egg-info\top_level.txt
error: [WinError 2] The system cannot find the file specified
[end of output]
oh i didnt look closely enough, all the 3.12 wheels there were for linux/macos
Should I post some where else?
What exactly do you mean by that
If you're doing something super generic you can easily find some schema to plagiarize as a starting point, but if another app already has the DB schema you need, then you probably want to use that app instead of reinventing the wheel
I don't other than using Django which does require a set of tables pre-defined by Django.
The important part is keeping as a code for serious custom applications
Files of migrations in your git repository.
How they will be written it does not matter. They should be present as defined library versions or anything.
Django ORM migrations, SqlAlchemy Alembic migrations.
In Django we keep our migrations as a files of code of our applications
And third party installed libraries can run their own migrations from installed django applications (those are premade tables).
hey guys!
I've started to make this project a few days ago
my goal is to make a weight tracker
and the required qualities are as below:
1-ask the user for weight
2-save the date of the added weight
3-show a diagram for the user progress
4-calculate bmi and ....
since I need to store the data long term I was loking for a way to do so
but I'm getting overwhelmed by the variety of file formats and their uses
until now I have csv,json and pickling in mind .
any suggestion?
Sqlite3
๐
- It will have strong data integrity between all data (if foreign check integrity is on)
- Enforced data structure
- Ability to migrate already existing data to new state (works well if u use solutions to keep migrations between app versions as a git commit files)
- enforcing rules/constraints on saved data
- easiest out of all Relational dbs
is there an specific module i need to leanr to work with it?
SQLite3 is the standard Python library for working with SQLite databases directly
You will need to learn basic SQL syntax to use it but it's not hard and very useful
Ultimately you may want to use an ORM like Peewee or SQL Alchemy but that will be a later step if you're new to SQL
inbuilt https://docs.python.org/3/library/sqlite3.html sqlite3 lib is sufficient to work with in minimal capacity.
ORM like SQLAlchemy+Alembic(migrating lib) or Django ORM (just hook it in standalone mode), or any other solution is desirable to work easier with it through python classes and having migrating autoversioned
- (I like in general Django ORM for migrating system, it works)
- (SQLalchemy has better syntax to make queries for custom projects may be. I like its migrating a bit less but it is good enough too and somewhat more reliable perhaps)
- haven't tried Peewee, having no opinions. Important to have good migrating system
- checked docs... no support for introspection in migrating system (and smth else named is missing, database "versioning"?). i would not have chosen Peewee
Hello guys, do u use Python in Excel?
I'm not aware of aby way to use Python in Excel but there are libraries to work with Excel files in Python.
in the Microsoft Beta version now u can use Python in Excel
the libary for sure is Pandas
@primal sparrow in the end i will recommend using just Django ORM. for simple projects it is all that need it and it has everything configured out of the box
Django ORM simplifies dealing with SQL significantly more than any other ORM.
- U get good enough code code structure (very life saving point for novices)
- migrating system (that u don't need to configure like in SQLAlchemy) ๐
- unit testing framework (again not need to configure anything, it just works)
- integrations with other unit testing solutions to simplify it even further.
- highly abstracted python class dealing syntax
- ability to integrate with other boilerplating stuff
Dealing with SQLAlchemy will be kind of pain in the ass to make it all working correctly i think
(Or just using sqlite3 inbuilt library alone can be an option as well ๐
)
would not use ever in my life (hopefully)
Because there is no Python in Excel ๐
It just calls remotely Microsoft cloud for its functionality.
Using Python to operate Excel from outside is acceptable idea to me
Or preferably putting data into DuckDB/Sqlite3 which is even better
This isn't really in Python, but I feel like a more general understanding is needed here.
I am trying to setup Redis. I keep getting this error and I don't understand what it means. I don't get any errors in my editor, but when the code executues i just get this.
Not much of a java dev but it looks like the jar file does not have the class Jedis? You might do better on some java forum, but be aware that there are several versions of redis server that are not compatible right now. So a newer client may fail if the server is ver 3 . redis-server --version should be 5. But that dosnt seem to fit your issue.
I know your not much of a java dev, but would the class being written in Java 7 be a problem while im in Java 21? I opened the package to see if Jedis was in there and it is, but its in Java 7.
Sorry dude, it's been years and I'm a novice.
It's alright, I'll go on to some forum site. This is just tiring, I've been at this for 4 days and I don't get how its not working.
There's a few cases where I think I'll use it, where data is generated or refreshed client side, and I need to encapsulate some logic + data transformation. If I can do it server-side, I will, but in these client-side cases, it's nice because I may not need to setup a separate app to locally preprocess the data.
im triyng to use sqlite3 to compare elements from a column of jsons to a given element. I want to fetch all rows where the element is somewhere in the json. can I do that? I tried using json_extract but it doesnt seem to be working for me
staffTask = cursor.fetchone()
and then i have a while statement which manipulates that row and then goes to next row until the staffTask = None. the issue seems to be that the staffTask is never getting set to anything
guys i have this in db.py
class Connection:
DB_URL = f"postgresql://{getenv('USER')}:{getenv('PASSWORD')}@{getenv('HOST')}:{getenv('PORT')}/{getenv('DATABASE')}"
DB_POOL = asyncpg.create_pool(DB_URL)
async def get_db():
async with Connection.DB_POOL.acquire() as db:
try:
yield db
finally:
await db.close()
i am not getting intellicense or i dont see a method like fetchrow, etc on the DB ,how to fix it?
@user_router.post("/register")
async def register(
request: Request,
DB: Annotated[Connection, Depends(get_db)]
):
body: dict = await request.json()
if not check(body):
raise HTTPException(400, createError("Missing required fields",400))
user = RawUser(
email = body["user"]["email"],
name = body["user"]["name"],
password = body["user"]["password"]
)
I'm not familiar with asyncpg API but maybe you need to create a cursor first?
I have created pool instead
Cursor is acquired from a connection
Ye and it is yielded, I guess I have type hint wrong on DB
Actually there is a fetchrow method on a connection
I think you type hinted it as your custom Connection clas
i got an issue with mongodb
basically i want use it to make the fuction that could save userโs friend code id
and i want it delete old id user signed before when user signed again
but it isnโt deleted old documents
guys what best of using sqlalchemy in fastapi? using asyncpg as driver
best of what? what are options to choose?
1: spread the curd
2: don't spread curd
how coagulating milk is relevant to SQLAlchemy?
i was using metaphor to tell
not known to me metaphore for programming.
i mean like
1: just spread whole code and use bad practices
2: organise code into classes and use good practices
at minimum i highly recommend using covering your database interactive code with pytest
highly preferably with running --cov module and ensuring having at least 80% coverage.
Factory boy integration with SQLALchemy is also great to simplify testing
If u want to increase quality of your code further, make sure to run strict mypy with SQLalchemy stubs
Enforce unit testing coverage and mypy via CI check for every commit.
If you are specially frisky. Consider wrapping database code into pydantic base model structs, with organizing code in... classes somewhat yeah... in order to define minimal strictly defined interface for your other code parts in terms what it needs from database.
The trick is not in using classes, the trick is minimizing amount of strictly typed variables your code needs in order to interact with database. That makes kind of nicer architecture considering how much ORM/database code is kind of fragile. (Pydantic variables are good)
That's more optional thing, consider to be not using it. Use unit tests first, and go for mypy first as thing that will bring you more benefit.
organizing code is usually more difficult for people so it is not very obvious how to achieve good and bad results in it
(In order for classes to work better, u need to minimize amount of local variables mutations u have. Don't change preferably __init__ configured instance attributes of a class once instance of a class was created)
oh, k ,
thansk you so much
What is the preferred method of doing migrations in sqlalchemy ? Ive only used flask-migrate so far. Looking at alembic and https://sqlalchemy-migrate.readthedocs.io/ anyone have any recomendatioins/preferences on this ?
I would prefer simple over full featured because i have to make this pallet able to others on my team that are more ruby based. Ruby has a very straight forward migrate system.
does anyone have an SQL cheat sheet somewhere, to reference for syntax when I'm fuzzy on details?
niiiiice, thank you!
Alembic https://alembic.sqlalchemy.org/en/latest/
Specifically made for SQLAlchemy
Alembic it shall be then ,ty
also, its feature to autogenerate SQL code of migrations, isn't experimental in comparison to tech docs of the tool you provided ๐
The Alembic doc seems to be easier to digest to me, which is important.
Hi everyone!
Is there anyone working on Databases here
I need a small interview for my masters assignment. Please DM me ASAP if possible.
can someone help me? #1162820834717597799
normally my code would do Base.metadata.create_all(engine) somewhere, but with alembic workflow, do you create migration step instead for table create?
Yes, reserve create_all reserve for unit tests only, preferably run migrations in session scope there too
Hello all, Iโm looking to schedule a Python script to run, scrape a website, and store the data. Any advice on the scheduling/storing part of this?
U can use excel to store the data i scrape imbd website top movies in which after scraping the whole data my py script would open the excel file and set the data
Any advice on the scheduling?
Sorry i dont have
It depends on your use case. If you just need to run a single script at a certain frequency then a cron job may do. For more complex stuff you've got Celery, Airflow, who knows what else
it depends on size of a program, what it does and your infrastructure skills
- dumb solution is just using Cron. Zero monitoring, easy to make. Well, u can have some measure of low monitoring with logging at least, but logging in a good way is hard.
- infrastructure profficient choice to run rare to work jobs via AWS Lambda in event bridge (Great option if u know Terraform or Pulumi and how to make them work with AWS. book
Terraform up and runningis great to start learning it)
u pay only for time of your application running then, highly optimized in terms of money spending (can be less than 10 cents at a month for all your runs ๐
)
Comes with inbuilt AWS monitoring)
- programmatic python cool choice to go with Celery. this option is good if u learned docker-compose at least preferably (book
Docker Deep Diveis great to learn compose). Celery comes with periodic scheduler Beat. and capable to run its jobs through redis broker. has easy to install Flower Monitoring (it is important to have observability to your script runs!)
- there are more of course advanced options... AWS batch jobs, or mentioned airflow and etc, but their complexity jumps further and not really justified unless u have matching level of complexity. At somewhat limited capacity you can chain task perfectly fine just with Celery
there is also another option...
Just having While True and some solution fully built on using shared memory of your application and smart usage of multithreading and multiprocessing, optionally asyncio stuff too. It has its own advantages and disadvantages. Works great for me in golang with goroutines and its shared memory without GIL lock at least.
Great option in terms of... simplicity with full programmatic control to have if necessary multiple in memory periodic jobs. Somewhat lacking proper monitoring, but good enough logging can compensate here.
https://pypi.org/project/APScheduler/#description
I use APScheduler a lot. It covers all basis (cron, one off, regular interval, event driven) , and its super simple to add to any project.
basically i want use it to make the fuction that could save userโs friend code id
and i want it delete old id user signed before when user signed again
but it isnโt deleted old documents
How can i solve it?
@client.tree.context_menu(name='Save NNID')
async def save_nnid(interaction: discord.Interaction, message: discord.Message):
user = message.author.id
key = {'ussr': user}
data = {
"NNID": message.content,
"user": message.author.id,
}
user_file = {f'nnid_id.{secrets.token_hex(6)}': data}
db.nnid_log.update_one(key, {'$set': user_file}, True)
await interaction.response.send_message("NNID saved!")
@client.tree.context_menu(name='Find NNID')
async def find_nnid(interaction: discord.Interaction, message: discord.Message):
user = message.author.id
key = {'user': user}
mes = ''โ
for m in db.nnid_log.find(key):
mes = mes + "\n" + m["NNID"]
print(m['NNID'])
await interaction.response.send_message(f"NNID found\n{mes}")```
when i open a scripts on python it close auto
Not sure what that means but it seems like you're in the wrong channel. If you're asking for help with something check this: #โ๏ฝhow-to-get-help
I'm wanting to get the index of a record based on its row.
when i use SELECT base_name FORM bases I get the following data Mixing Black Base Mixing Blue Mixing Green Mixing Orange Mixing Red Mixing Violet Mixing Yellow I want to return 4 for Mixing Orange. Any suggestions?
Or would i be better off converting the list to python and getting the index from that?
i'd use the row_number() window function to enumerate the rows, e.g.: sql SELECT row_number() OVER (), some_column FROM some_table;
hmm that seems to auto sort my data. I guess im going to have to add a pid column. Thanks
Yes u should almost always have an idex column for each table
A primary key you mean?
Why not? ๐ค
It's recommended to have a primary key in all tables, and they're indexed
My primary key was the base name.
What did you mean by pid columnt?
Primary id. Itโs just what I call any column I auto-increment.
primary key shouldn't necessarily be an auto incrementing integer
But in a lot of cases surrogate primary keys are easier/better choice than a natural key
Thatโs why I had it set to my base_name so I could also autocheck for duplicate bases.
can sql alchemy update non primitive arrays?
i have an array in class 'User' and whenever i try updating the order of it, the array order won't save
but if i try to add to it, it works
has anyone ever used mongoose.SchemaTypes.Buffer as a schema type for storing the password and salt hashes? the stored values are in a different format than i'm used to (both String and Blob). They are hashed because they went thru a hashing function. that much i'm sure but i'm not sure about the end result.. it's not plain text but it's also not not plain text
๐ค
Sqlalchemy most likely marks models as dirty on assignment to the fields
You can either use this extension or assign to your array field when you're mutating it
%%sql
select count(community_area_number) as Total_Reports, (select community_area_name from census_data where chicago_crime_data.community_area_number = census_data.community_area_number) as Community_Area_Name from chicago_crime_data
group by community_area_name
order by Total_reports DESC limit 1 ;
does this count as using a sub query?
Yah, it's a correlated subquery
Normally tho, if someone is asking for a subquery, you'd write a join
had another way to solve it without using subqueries spent last 2 hours trying to rewrite it
The problem probably wanted you to write: ```sql
select ...
from chicago_crime_data
join census_data
on ...
i dont recall learning something like that
? How could they teach subqueries without teaching joins?
actually, they probaly wanted this: ```sql
select ...
from census_data
join (select count(community_area_number), ... from chicago_crime_data group by ...)
on ...
we have never actually used the word join in our work
i can send you some example files to show you what ive been taught
if implicit, you're prob used to something like this: ```sql
select ...
from census_data,
(select count(community_area_number), ... from chicago_crime_data) crime_query
where ...
this question absolutely ate into my supposed break time on my timetable. Scheduled to break at 12 for 1 hour its now 13:30
Has anyone else noticed an increase in AWS S3 5xx errors within the last month or two?
Recently everything seems to slowly be catching fire and all AWS is saying is
Amazon S3 can't handle the request at that time
idk if this is the right chat but i have a homework problem do in 4 hours, could someone hop in a call with me and help me with this shit, need to Determine Distance of Robot Arm Movement from givin points
rad
It keeps on throwing the same error untill I terminate it manually while executing the command to convert the standalone mongodb instance to replica sets.
Does anybody have a fix?
sqlite3 Vs Redis
do you guys think the former could beat Redis for SELECT * WHERE pk_col = x LIMIT 1 queries?
like for a table that has about 1M records
Redis by default is in RAM memory db, with periodic saving onto disk.
Sqlite3 is pure file in filesystem, with zero ram usage.
Difference will be in latencies by definition
https://gist.github.com/jboner/2841832
(Hint: Redis will be faster ๐
)
Curiously enough if u would have compared Postgres vs Redis, potentially you would not have seen difference
Because postgres has very intriciate Caching that uses RAM too, that starts somewhere at 1GB or 4GB and can be regulated being higher
or at least somewhere from second request when it was already automatically cached, then there would not have been difference
Addition to answer @queen rose
There exists also File system caching in Linux. for small dbs out of just 1M records it would have potentially equalized Sqlite3 with Redis, despite them being in unequal situations
File system caching in Linux is a mechanism that allows the kernel to store frequently accessed data in memory for faster access. The kernel uses the page cache to store recently-read data from files and file system metadata.22 ัะตะฒั. 2023โฏะณ.
to this equation both dbs have also indexes, but that's probably not important detail at this step
the important step that Redis is not usable as main database (because data can be lost if power surge happened)
and because it is noSQL crap not usable to be main db ๐
a usual Postgres W
ok I think I will still try SQLite3 because its simpler
do you know by chance if sqlite3 actually uses indexes/primary keys like real databases do?
Yes
Honestly postgres is easy to set up, either in docker or on your pc 
But you could go with sqlite
I love sqlite's simplicity
no need to add any dependencies, drivers, or start any CLI processes
Depends on if you need async or not
I dont need async
but I will be reading (not writing) data from multiple proccesses cuncorrently
should be fine, right?
Also personally I just use ORM these days
I think so, but I would personally stick with postgres ๐ค
it does, but you may need to turn on its PRAGMA settings for Foreign key integrity i think
Yep, they're disabled by default
also, if you are going to use stuff like max_length in varchars of ORM, it will be ignored. Sqlite3 is more simple
It does have constraints at least though
ugh I hate ORMs, I like SQLalchemy but I use the 'core' instead of ORM.
I basically enjoy the OOP way to create queries (select(Person).where(Person.name == 'abc')) but I dislike how the ORM will try to sync the DB to the state in the program and vice versa
orm is quite nice, since you can use all the core constructs with it
And in a lot of applications you'd want to map everything into an object anyway
You could write something like this btw:
select(Person.id, Person.name, Person.email).where(Person.name == 'abc')
Basically works like core
I never understood if this is considered Core or ORM?
Eh, both? Kind of?
ok got it
You're using ORM model
no why?
And queries are unified since 1.4/2.0
I mean in this example
why? because you create a table/class Person?
Person is an orm model. tables are a bit more annoying to work with
is that considered ORM?
class Person(Base):
...
Is orm
from sqlalchemy import Table, Column, Integer, String
user = Table(
"user",
metadata_obj,
Column("user_id", Integer, primary_key=True),
Column("user_name", String(16), nullable=False),
Column("email_address", String(60)),
Column("nickname", String(50), nullable=False),
)
Is core
What do they really mean by Core? Because these are class objects, (Table Column, Integer) that are describing table creation by sending
create table user (
user_id integer,
...
I mean it has to translate that into different dialects of SQL, but isnt that, in itself, taking a set of objects and translating that into a a dialect of declarative table creation which would in some aspect be ORM?
ORM models use core classes under the hood
like Table
ORM provides features like declarative model/table definition, relationships, state management, etc
With core you can just declare tables and make queries to them, everything else is on you
So you mean the the actual query for data is naked sql without object translation? Like
conn.execute(text("select * from user ...."))
It's always naked sql in the end, but no, not necessarily
You could use sqlalchemy's query builder: select(table.c.id).where(table.c.username == "Doctor")