#databases

1 messages · Page 38 of 1

tepid thicket
#

Ah, It's better to understand it now

tepid basalt
#

If you have a web dev project, the Django ORM provides a completely different approach that is very popular.

unreal pumice
icy glacier
#

Best orm in all the lands

glacial ether
#

how do i make it so that when i change the name of a class on 1 file it changes the name on all other files?

tepid basalt
hidden creek
#

so i have an sqlalchemy orm like this

in project/settings.py:

class BaseDatabase(AsyncAttrs, DeclarativeBase):
    pass

in posts/models.py:

from sqlalchemy import String, Text
from sqlalchemy.orm import Mapped, mapped_column

from starlette_project.settings import BaseDatabase


class Post(BaseDatabase):
    __tablename__ = "post"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(50))
    body: Mapped[str] = mapped_column(Text)

but BaseDatabase.metadata.tables is empty
am i missing something?

paper flower
#

Usually via import 🤔

hidden creek
tepid basalt
hidden creek
#

don't remember seeing that around
I'll check it out, thanks!

delicate fieldBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

cunning tangle
#

anyone here know how to use pandas

fading patrol
hidden creek
#

trying to learn sqlqlchemy with alembic

i wan to open a session, how do i get the engine from alembic?
or do i have to make an engine for my uses?

#

meaning, alembic is creating an engine to do migration, right?
do i get that for my own uses? or make a different one?

west hill
#

alembic is mostly just for migration standalone scripts

hidden creek
#

fair enough
thanks

brave bluff
#

I'd love some feedback on loading a SQLAlchemy relationship. I'm basically doing this:

purchase = await purchase_repository.get_by_id(purchase_id, get_store=True)
return dict_from_schema(purchase, schemas.Purchase)

where Purchase has

class Purchase(Base):
    // other fields
    store_id: Mapped[int] = mapped_column(ForeignKey("stores.id"))
    store: Mapped["Store"] = relationship(back_populates="purchases")

Because it has the linked store, but the store isn't accessed, it's not loaded and I get

Error extracting attribute: MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

So I changed my query to do this:

        statement = select(Purchase).filter(Purchase.id == id).limit(1)
        if get_store:
            statement.options(joinedload(Purchase.store))

I thought by making it a joinedload, I'd solve the problem? What have I missed?

brave bluff
#

So it's worth noting, the model is using AsyncAttrs, and if I add await purchase.awaitable_attrs.store before the return, it works since the data is loaded. Which makes it seem like the eager load doesn't work?

#

Of course, a SUPER dumb mistake

#

In the if block, I never stored the updated statement back into statement

brave bluff
#

I got an integrity error, which was expected:

sqlalchemy.exc.IntegrityError: (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.ForeignKeyViolationError'>: insert or update on table "purchases" violates foreign key constraint "purchases_item_id_fkey"
DETAIL:  Key (item_id)=(5) is not present in table "items".

Does anyone know how I can pull that "DETAIL" string? I see that IntegrityError has a detail value, but every time I print it, it comes back as empty.

#

Best I can find so far is `str(e.orig), which I then need to parse via regex

humble cloud
#

Hi are there any MS SQL resources in this channel?

delicate crag
#

How can I query multiple tables parallely

tepid basalt
#

Databases support concurrent reads. Are you running into a particular error?

distant carbon
#

When connecting to databases with Python, is there a recommended way of connection within a function. An example below is how I currently think it should be done, but wondering if there's a way to take advantage of context manager's or some other way of doing it that's better.

def connect_to_snowflake():
  conn = snowflake.connector.connect(
    user=USER,
    password=PASSWORD,
    account=ACCOUNT)
  return conn


conn = connect_to_snowflake()
cur = conn.cursor()
cur.execute("SELECT * FROM TABLE")
cur.close()
conn.close()```
tepid basalt
#

A context manager is a good idea, here's how you could create one using the contextlib.

from contextlib import contextmanager

@contextmanager
def snowflake_cursor():
    conn = connect_to_snowflake()
    cur = conn.cursor()
    try:
        yield cur
    finally:
        cur.close()
        conn.close()

with snowflake_cursor() as cur:
    cur.execute("SELECT * FROM TABLE")
distant carbon
tepid basalt
#

I'll try my best! 🙂 A function that uses yield instead of return, returns a generator. A generator can be iterated over. yield specifically returns an instance of the iteration. When with snowflake_cursor() as cur is called it loops over the first iteration of the generator. When the block is closed it also reaches the end of the iterator, because there is only one value. Then the finally block is executed.

distant carbon
#

Ah okay. I think I understand. So with a return it would exit the function, whereas with yield is sort of pausing the function, so once the with statement is finished, the function will resume and close the connection within the finally block?

tepid basalt
#

I think that's fair to say!

timid pelican
#

Hi Yall

#

Could I speak?

brave bluff
#

I'm starting tests for a FastAPI project. With Postgres, should I create a separate db for the tests, or a schema?

small geode
#

im making a bot to pull blockchain data and predict rugpulls so im wondering if i should just pay for an api to store my database or go through the trouble of using flask

tepid basalt
brave bluff
severe vine
#

hi

brave bluff
#

With SQLAlchemy, anyone know if I get two db objects and modify them both, is there a way to only update one of them? If I do a flush/commit, both are sent.

lament parcel
floral adder
#

Damn

sour rune
brave bluff
#

Honestly, I'm considered dropping using ORMs entirely.

sour rune
#

Can be a little tricky building complex model structure with lots of relationships ill give you that

brave bluff
paper flower
sour rune
#

Unsure why that would be a bad practice

#

Im not particularly fan of sqlmodel, but the whole point of it is to merge API schemas and db models

glacial ether
#

what percent of android phones globally run on android 10 and lower vs android 10 and higher
not just the us and eu

torn sphinx
#

saw doc here and forgot which server it was lol

glacial cargo
torn sphinx
#

not redefining, they are just passing the values
in this case USER will typically be a string constant, this value is passed to the function

glacial cargo
#

ok

paper flower
junior light
#

Anyone have experience with dbt?

#

Or any data quality check really

coral wasp
junior light
# coral wasp Yes, why?

I am reading up on some tools to implement some data quality checks and decided to give dbt a try. Is there some way to implement dbt testing in an existing database/projet?

#

So far it's all about creating a new project

coral wasp
#

Dbt isn't specifically around testing, it's around structuring your data pipeline in a particular way, which includes testing

junior light
#

yeah that's what i thought

#

maybe great expectations or soda would be more appropriate for the task

floral forge
#

hey guys, so im currently working on a discord bot with some challenges.

bot overview:
tracks message activity across different "lounges" in a large discord server
currently handling around 700k members
tracks user message counts in real-time
needs to handle frequent writes (every second-ish)

current setup:
using json files for data storage
periodically saving message counts
im concerned about file i/o performance and potential data corruption

my main questions:
is json still a viable option for this scale?
would sqlite be a better alternative?
are there any performance concerns i should keep in mind with 700k members?
i've already added some optimizations like atomic writes and error handling, but i want to make sure i'm using the best possible approach

tepid basalt
#

SQLite was designed for your use case. Would it be a lot of work to switch?

floral forge
#

yeah it would, i’ve already fully implemented the json files

#

i also have to migrate the data

#

for some of the files

#

and completely rework my database file

waxen finch
# floral forge hey guys, so im currently working on a discord bot with some challenges. bot ov...

a JSON file and SQLite do share the property that neither can safely perform parallel writes to the file/database, however, while you had to implement your own locks to ensure you didn't corrupt your file, SQLite has its own locking mechanisms built-in, and additionally can distinguish between read and write transactions to allow concurrent access, including reading while writing if write-ahead logging is used, aka WAL mode

im concerned about file i/o performance and potential data corruption
that is indeed an issue, there are benefits over performing full reads/writes to a JSON file in that SQLite does not need to read the entire database to retrieve information, nor does it need to re-write the entire database to change a value, though there are some caveats to that last point:

  • by default, SQLite uses a "rollback journal" to maintain durability, which essentially means that every write transaction requires SQLite to copy parts of the database as a backup before modifying it, in case the program/system crashes during the write (see also Atomic Commit In SQLite)
  • if you switch to WAL mode, writes are appended to a separate file instead of being written to the database directly, which is very fast for writing, but worsens read performance over time until a "checkpoint" is performed to move data from the WAL file to the database (see Performance Considerations)
    • SQLite automatically checkpoints when the last connection closes, or during COMMITs if the WAL is large enough, at the cost of a writer taking a bit longer to commit
#

all of this info above is assuming you're going to use SQLite with multiple connections - others may suggest to use one connection for your entire application which, from my understanding, is fast because SQLite won't have to contend with transactions from other connections, but now you're back to JSON because you're now responsible for locking the connection every time you need to use it for reading/writing, so i'm not a fan of it (FWIW this was how i used SQLite for my first discord bot, but my newer ones have moved towards simply starting connections on demand and letting SQLite handle locking)

and last thing, for an async application, you probably don't want to use the standard library's sqlite3 module since it will block whenever it needs to wait on a pending transaction; it's possible to carefully workaround that, but it is easier to use an async wrapper which handles SQLite in a separate thread, like aiosqlite or asqlite
(the latter was written by the same author as discord.py and is my personal preference because i like its default configuration; it sets WAL for you, enables foreign key checks, and disables implicit transactions so you're always managing it)

tepid basalt
tepid basalt
waxen finch
#

if you say, made an API request while inside an write transaction, another writer can't butt in front of the line to commit their own changes, so that's what im referring to

#

(technically you can start a transaction but not actually acquire a lock until executing your first query, but im ignoring that)

tepid basalt
#

Sure, but that's not fair to compare the disk level constraints of SQLite to the filesystem level constraints of JSON.

#

This comparison makes sense in the context of a client-server DBMS for example.

waxen finch
waxen finch
tepid basalt
#

Fair enough. I would just note that SQLite is much faster than writing directly to a flat file, and if OP wants concurrency then there are relatively simple multi-thread solutions available for SQLite.

floral forge
#

thank you too @tepid basalt

proven turtle
#

Nice

versed temple
#

guys any resources to learn SQL

grave kraken
#

Does making a column UNIQUE NOT NULL automatically make it a primary key (MySQL)? I didn't add a primary key explicitly

floral forge
#

hey guys, could use some implementation advice for my discord bot. i'm currently storing user interaction stats (hugs, kisses, etc) in a json file that's around 2mb with a structure like {user_id: {"hug": 5, "kiss": 2}}. my concern is i/o performance, every time someone uses a social command, the bot needs to read the entire json, update a counter, then write the whole thing back to disk

i've already implemented sqlite for my lounge activity tracking (tracks every message in the lounge channels) and that's working great with proper indexing. wondering if it's worth migrating these interaction stats to sqlite too, even though the file size is relatively small? the server i'm implementing this to has 700k members, but the interaction commands are mainly reserved for donors

what's bugging me is the read/write cycle for json feels inefficient, especially when sqlite would just update the specific row. but is 2mb small enough that the migration effort outweighs the benefits? i know sqlite would be more "correct" for this use case, but im trying to be pragmatic here. has anyone done similar migrations or have thoughts on the json vs sqlite tradeoff at this scale? would appreciate any insights, thanks!

runic fiber
#

I don't think that query should work

#

Or am i wrong and gpt is right?

lament parcel
tepid basalt
# runic fiber

You're right. This query would no return any results. The HAVING clause is equivalent to this WHERE clause:

WHERE salary > salary

Of course there are no rows where the salary is greater than itself.

final tulip
#

Is anyone here really good at using sql, pandas and python? I’m really struggling with doing etl stuff. I would post here but the files are too large

frank patrol
#

Hellow guys and gals. My 3 years old flask-sqalch-marshmallow backend is growing unmaintainable. Specific point of pain is marshmallow schemas. I have a lot of relations and tables, and as a result, by the time staff gets serialized for GET requests, a top level schema has few field.Nested members serialized, each with their own field.Nested, and the tree goes up to 5 nested deep, with some schemas in that member tree having their own inheritance tree. The mix of nesting and inheritance makes it hard to tune things for api views. I end up having to drill request params with context.args through the tree, or write out custom schemas for business requirement, just to tune output a little bit. This starts to look ugly, as some deeply nested children in their schema.py definition file have 6 different versions for different goals, even though some are very similar, and most are used only once.

How do I scale and future-proof that shit: Rewrite the thing with mixins (How do i then pass customization params down the tree)? Dump complete model from every model at every layer, and filter out garbage with only= or final, pre-response processing?

Related question: do you know any open source mature projects using marshmallow that I can learn from?
Kind regards ☕

boreal storm
#

json is used for?

floral forge
#

a lot of lightweight applications use json

tepid basalt
#

Probably most frequently used as a standard format to pass data around. Think json API

stone bronze
#

Hello! Anybody need help in SQL

paper flower
tepid basalt
digital rune
#

hey i'm using SQLite (learning it) and i'm wondering what are the limit of :memory: storage ? can i really store a lo ton it ?

tepid basalt
#

Depends how much memory (RAM) you have. But just to be sure, you understand all the data is gone once you close the connection right?

digital rune
digital rune
#

let's say if i wanna load multiple discussion from a server...

tepid basalt
#

I wonder if DuckDB might be better for your use case?

digital rune
digital rune
tepid basalt
#

Ahhh, probably not much

#

Are you talking discord server?

digital rune
#

not really i'm making a chat app

#

basicly

#

and i would like to store the data exchanged on the server and have them temporaly on the pc when loaded, i wonder how much it could hold up, as i could delete data when new are comming and the defined cache is full but i don't want this to be too recurent or limitating.

digital rune
tepid basalt
#

I see, with gigabytes you can store millions

thorny anchor
#

yeah, most data is pretty small

dawn shard
#

Does anyone have any recommendations for databases for write heavy workloads (lots of small writes to relatively few values) -- I want to maximize the throughput on a fairly low-end machine.

floral forge
#

use WAL mode on SQlite

tepid basalt
#

What is lots? <50,000/s? Then SQLite as barney said 🙂

dawn shard
#

Interesting, I'll test WAL.

dawn shard
tepid basalt
paper flower
dawn shard
#

batching writes is not possible here (outside of writing my own write-ahead log)

paper flower
dawn shard
#

and each action has to write to the DB

#

it's not going to be too complex of an environment, so I am willing to sacrifice some comfort for cost savings

paper flower
#

so agents pick up jobs, but what exactly do they write to the db?

dawn shard
#

an agent can say things like "pick up ball", "move left", "switch this valve" etc. And they receive the environment state/the change in the environment this caused.

#

normally, this would be entirely in-memory, maybe dumped to a flatfile periodically, but here I'd like it to be properly in a DB.

paper flower
#

If you're not receiving literally thousands of rps it would probably be fine

#

Since it doesn't seem that would be a limiting factor

dawn shard
#

In practice, the limiting factor will probably be networking, a gigabit connection can't bring in enough data to saturate an SSD, but it is probably somewhat important to check that I'm not going to hit like two second pauses on 99.9% latencies

paper flower
#

Again, I don't know what exactly you're doing so it's hard to judge what exactly may be the bottleneck in your architecture 😅
Generally dbs are relatively slow to write to, but I think sqlite may be better, also if you could use a stream-like structure (kafka/redis streams) for a job queue-like structure - redis streams can probably handle ~10k rps or more

dawn shard
#

There isn't a lot to it, receive request - read O(1) data from db - run logic (I can get this more or less arbitrarily fast, it's fairly basic number crunching) - write O(1) data to db.

#

and agents will be running roughly 24/7, so traffic should be pretty flat.

paper flower
#

I don't think reads are O(1), but it depends

dawn shard
#

O(1) data as in a constant amount data

paper flower
#

I thoughs you're talking about time

dawn shard
#

yea, but it's important that read size doesn't scale with rps/number of agents

#

tho it kind of isn't IG, it's still probably just going to be scylladb

tepid basalt
#

I wonder if a simple key-value db would be right for your use case. This is out of my wheelhouse though. A quick search reveals some interesting embedded options.

dawn shard
#

yeah, I was looking at LMDB and rocksdb as well

#

sqlite gets 0.4s worst case latency on an SSD, and processes 100M commands in ~73s (using the C API directly). So the only angle for improvement is the tail latency, which embedded key-value stores should do well (tho they do also need to do compaction).

dawn shard
#

hm, lmdb really sucks here, or I am using it wrong.

dawn shard
dawn shard
#

1.2s worst case latency, 10M commands in 14m. Let's see if LMDB can be meaningfuly better.

dawn shard
dawn shard
#

Go's badgerDB did it in 2m

#

I'll verify the results tomorrow, but it looks like I'll be writing some Go

wise goblet
#

hmmmm, cute. By default Badger can be embedded into my app binary file, no extra processes are needed

#

That's very powerful and interesting for me

#

may be a thing i needed to cache some data / unload some of my data from RAM

digital rune
#

Hey i'm making my first database using SQLite for an chat app with account, i wanted to know if anyone had any tips about what is important to store from the user, i have the usual credential and id but after that i was wondering about other data i could need in the future, and same for my channel Table.

tepid basalt
#

I would add created_at and updated_at fields. Along with created_by and updated_by. Maybe a last_login_at field. Beyond that just add new fields as you need them.

digital rune
#

Also i plan on storing the message each user send to each other. Now there are two posibilities :
First i make one tab per user and store all his message... on his table, or i focus on a channel based table and i make a table per channel. I tend to prefer the channel based but the user based storing message certainly need to query from other user when reading a channel but would make less tables overall...

digital rune
tepid basalt
tepid basalt
digital rune
#

oh

#

but wouldn't be too much ?

#

dammn

#

really

#

lol

#

SQLite is that strong

tepid basalt
#

It's not a question of performance but ease of development.

digital rune
#

ok

digital rune
tepid basalt
#

A primary key is the unique identifier of a record, used in other tables to define a relationship between both tables.

digital rune
#

what does that mean

tepid basalt
#

If you have an author table and a book table. Your book table might have an author_id field, this field has the value of the primary key of that book's author.

tepid basalt
#

I recommend going through the course above. It will explain this more clearly than I am.

nova hawk
#

A primary key is a unique value, this can be a single column or multiple columns, that identifies a record in a table.

tepid basalt
#

And no, I suggest not using username or password as the primary key. Instead stick with the sqlite default.

brave bridge
tepid basalt
brave bridge
#

yeah, it can be potentially puzzling if you're looking at a table structure and it's not clear how a column is uniquely identified

#

also, when you do SELECT * FROM table;, rowid is excluded (generally SELECT * is not a great idea for production code, but it's kinda useful for inspection/debugging)

#

Primary keys are kinda confusing as a concept. At least in PostgreSQL, you don't have to reference a primary key in a foreign key constraint, it just needs a unique constraint.

#

From what I've gathered:

  • a primary key cannot contain nullable columns
  • a table cannot have more than one primary key
#

https://www.postgresql.org/docs/current/ddl-constraints.html

A table can have at most one primary key. (There can be any number of unique and not-null constraints, which are functionally almost the same thing, but only one can be identified as the primary key.) Relational database theory dictates that every table must have a primary key. This rule is not enforced by PostgreSQL, but it is usually best to follow it.

Primary keys are useful both for documentation purposes and for client applications. For example, a GUI application that allows modifying row values probably needs to know the primary key of a table to be able to identify rows uniquely. There are also various ways in which the database system makes use of a primary key if one has been declared; for example, the primary key defines the default target column(s) for foreign keys referencing its table.

tepid basalt
#

I can't think of a good reason to reference something other than a primary key.

#

This rule is not enforced by PostgreSQL, but it is usually best to follow it.
+2

#

I work with a db where a lot of tables are missing PKs and it is a hassle

brave bridge
#

what I'm saying is, I'm not quite sure what the difference between just a unique non-null key and a primary key is

#

maybe it's just about semantics

tepid basalt
#

Practically nothing, it's a theory thing as the docs you quoted mention.

#

Many tables might have unique non-null fields eligible to be a primary key and used for foreign key constraints. But it is confusing if relationships are defined using different fields.

paper flower
digital rune
lament parcel
paper flower
#
The parent key is the column or set of columns in the parent table that the foreign key constraint refers to. This is normally, but not always, the primary key of the parent table. The parent key must be a named column or columns in the parent table, not the rowid. 
tepid basalt
brave bridge
lament parcel
#

Ahh i see, I assumed since most implementations use physical order

paper flower
brave bridge
#

I recently started reading the PostgreSQL 14 Internals book, it seems like storage is quite complicated in postgres

tepid basalt
#

Oh cool, any interesting tidbits?

paper flower
brave bridge
brave bridge
#

yes

#

as in beer

paper flower
paper flower
brave bridge
brave bridge
digital rune
digital rune
#

quick question about SQLite again, i want to have a row so for each user i can write each channel they belong to, but how could i for exemple when a channel is created and the user name is in this channel automaticly link it to the user page in question and also have multiple channel linked to one user. I would like to know if it's possible in SQLite or i do that in python even tho i would prefer it linked

tepid basalt
#

To be clear, a channel has many users and a user can be in many channels. This is called a many-to-many relationship. It requires a third table that has a foreign key to both the user table and the channel table.

junior light
#

what would you guys recommend for data quality checking tools?

digital rune
tepid basalt
#

Yep, you got it 😄

digital rune
# tepid basalt Yep, you got it 😄

ok nice, just quick question how could i implement multiple value into one case only, like having multiple channel into the channel row of one user ?

#

is there something already exsiting into SQLite

late jungle
#

Integration of tens of a model with a back end of ask and with integration of react native mobile app is quite challenging

tepid basalt
digital rune
#

or i should dedicate the database for message and json to store account information

#

or i guess i could organise in an other manner my database

tepid basalt
#

The simplest thing to do is stick to relational database best practices. Once you start hitting scaling or performance issues, that's when you can look at designing for those. Trying to do that before you run into those issues, will certainly be more work.

wise goblet
#

The only reason to hesitate picking or not picking SQL/Sqlite3 can be a question if u can afford also preferably getting ORM in (in python world it is SQLAlchemy with Alembic, or Django ORM, or some other ones)

#

default SQL is meh, ORMs rule

#

if no ORMS, for simple apps i would prefer storing my data structs in nosql instead

tepid basalt
#

ORMs are good. But learn SQL first.

wise goblet
digital rune
wise goblet
digital rune
wise goblet
wise goblet
wise goblet
wicked mist
#

Guys does my ERD (entity relationship diagram) look ok? I'm very new to db design

#

For context I'm trying to create a basic role-based access control

wise goblet
wicked mist
#

Like this?

#

So 'role' can be given to users and groups (one-to-many?)

wise goblet
#

i would imagine though basing RBAC on example of postgres for example

#

in this situation we could be wishing asigning Roles to users

#

Many to Many relationship i can imagine. Many users, can be in many roles i guess.

wicked mist
#

Well yeah I'm planning to use Postgres

#

Made some changes

#

@wise goblet I just want to know if it looks right lol

#

Like in terms of valid ERD design

#

Because this is literally my first time creating one

wise goblet
wicked mist
#

And I gotta show it to my peers

wicked mist
#

And yes I did write it on my own

wise goblet
wicked mist
#

Can you elaborate on what that means?

#

Foreign Keys defining relationships?

wise goblet
wicked mist
#

What should it rather be?

#

The Group-User table for example has FK1 group_id NOT NULL for its relationship with Group

wise goblet
#

or head first series for a bit smth more serious, yet very brain friendly too

#

i am not comfortable spoon feeding primitive answers. Not good thing to do for both parties

wicked mist
#

My man, I'm not sure if I have to read 55 pages worth just to know what you mean lol. I don't want you to spoonfeed me just tell me what you mean by "Your last all iterations are meaningless for some reason"

brave bluff
#

With SQLAlchemy, if I do something like

connection.execute(text("SELECT * FROM items"))

can I get the column names associated with each result set? Best I can see, the answer is no?

torn sphinx
wicked mist
#

Real

#

Bro just made a claim and didn't bother elaborating on it

torn sphinx
#

Maybe he had good intentions bro, but maybe he type to fast so came out little rude 🤔

#

My advice is it depends how complex of a permission management system you want.

#

It is easier to first know the requirements and then do the ERD.

deft apex
#

why is 55 pages punishment

#

i guess it depends on how much content there is on each page but

#

when it comes to an average amount of material per page for a textbook, 55 pages doesn't seem too bad

torn sphinx
#

@deft apex You don't need 55 pages to learn the normalisation rules. And, mostly up to 4NF is fine. That book is teaching more than this person needs to know right now.

#

Take baby 👶 steps, it is better

#

🙂

deft apex
#

yeah normalization rules are pretty chill

#

i don't know them but i'm aware that they're a thing and i'm aware of their comprehensiveness

wicked mist
torn sphinx
#

Hot take: SQLite is more than enough. I will die on this hill.

torn sphinx
#

I mean, It won’t host a billion dollar company but it’s enough for 80 percent of people.

#

It only slows down when you have a couple hundred thousand records to query through

#

Not really hundred thousand records is actually peanuts for modern hardware

#

Sqlite is usually limited when you need to write. Especially with write heavy applications.

dawn shard
#

SQLite is indeed excellent, tho being dynamically typed is a bit concerning.

torn sphinx
#

It also has few other weird quirks as to how it implements SQL

dawn shard
#

Also, SQLite doesn't fare all that well when you have multiple distinct machines handling requests.

tepid basalt
torn sphinx
#

I tried using mongodb once and I don’t know if I liked it or not

#

Though it was my first experience with databases or anything close to that so maybe I was just awful at using it

#

Maybe MongoDB is an absolute legend with strength I am not blessed to handle

torn sphinx
#

One is relational other is not

karmic pond
#

But postgreySQL is more powerful then any database

dawn shard
#

postgres covers most database usecases, but it would be inadequate e.g. for discord messages

thorny anchor
#

honestly it's probably fine for a decently long while

foggy fractal
#

Is table bloat still an issue for PG(high'ish frequency updates)? I recall hearing about some mitigations some years ago, but haven't been keeping up.

#

google bots says yes

granite geyser
#

Hello guys, how are y'all doing? I have a question about SupaBase in Gemini API 2.0 Flash. I want to integrate the AI with SupaBase using MCP, but I'm not sure which method is efficient and optimized in terms of performance. Because I want to use it on our website that will be available for a company. Is it a good idea? Or if there is a way that is easy to do, please let me know.

Thank you a lot, and I wish you all a good life!

amber owl
foggy fractal
# amber owl probably better to @ a postgres mantainer

Casual interest. The bot's conclusion looks accurate enough. HOT was the feature that I was recalling, and it would appear to, still, provide a fairly strong bloat counter for common UPDATE cases such as counter increments/decrements.

fathom pilot
#

hello guys i just started working with flask and currently im learning sql with python.So can u all suggest some courses or books

torn sphinx
#

to learn sql i think basic w3 school is fine. but for more indepth check this
https://roadmap.sh/sql

roadmap.sh

Comprehensive roadmap to learn SQL from scratch in 2025. From basic syntax to advanced querying, this step-by-step guide will equip you with the skills needed to excel in database management and data analysis.

small meteor
#

Hello, I have a 200 gb csv file, I need to open and take a small part of it, chatgpt suggest using database, what do you advice

tepid basalt
#

I would give DuckDB a shot

storm mauve
#

(e.g. a bunch of parquet files with 1 gigabyte each)

tepid basalt
small meteor
#

I know R and familiar with python but I am not coder

#

so with your guidance I will learn the way to make it work

#

I will be reading some DUCKDB now, you mentioned apis I only know them from gaming 😄 . splitting is good also but the problem is it creates more documents and takes too much space

#

I will buy a 2tb external waiting some good deal

dusty knoll
small meteor
# tepid basalt I would give DuckDB a shot

I am starting ot learn it, but had hard time in python so switched to R, but the funny thing is my csv file used ; semicolon instead of commas and even though I learned how to do it which works in my test.csv now I get errors for the real data which has ; semicolons 😄

tepid basalt
small meteor
tepid basalt
#

Can you share what you tried please?

small meteor
tepid basalt
#

Are you able to paste the code?

small meteor
#

someone had the same issue, my solution is if I can delete column names it will work with default names

small meteor
tepid basalt
small meteor
#

yes I installed all today

#

let me show u

#

I will ping u when I do

#

duckdb.sql("SELECT 'ORDER NO' FROM 'C:/Users/icewind/Desktop/aaa/bigdata.csv'")

it returns

#

@tepid basalt but when I look it normally order no has data in it this just replicates

#

when I try one word column it works

tepid basalt
#

Have you tried to SELECT *?

small meteor
tepid basalt
#

And did you try specifying the delimiter?

grim vault
#

Single quotes are used for string literals use double quotes "

tepid basalt
#
SELECT "ORDER NO" FROM ...
small meteor
#

nope it doesnt work I tried it

#

I wil ltry something else 1 sec

#

ok this worked thx
duckdb.sql('SELECT "ORDER NO" FROM "C:/Users/icewind/Desktop/aaa/bigdata.csv"')

#

changed the outside ones to ' and inside ones " as u suggested

tepid basalt
#

👍

small meteor
#

but the opposite doesnt work

#

the outside " and two inside '

tepid basalt
#

That's right

#

As expected

grim vault
#

SQL-Standard: 'string' and "identifier"

small meteor
#

I am new to python normally I use R but duckdb is very confusing in it

#

I read all documentation in website but one thing confuses me that, it says "To use DuckDB, you must first create a connection to a database. " but I use direct commands to access the csv file in pc without any connection

#

in R it forces you, thats where I failed no space left since creates a huge temp

peak delta
#

Hello World ! I'm trying to scrape a website using an 2 years old tuto and I encounter this problem. Can someone help me please ?

lime current
small meteor
#

hmm seems I can only use duckdb.sql("SELECT once , tried selecting on selected didnt work

small meteor
dreamy fox
#

How do I right test from my project

coral wasp
small meteor
coral wasp
#

But I'm happy to answer any Sql related q's, I don't use R

#

(It is moderately active on weekdays, but many of the devs are in Europe timezones)

small meteor
# coral wasp Ask in the R channel.

UPDATE: I switched to python from R since its very confusing and very weird. Python was really easier. After tons of reading and trying I become a bit familiar, for today I needed to filter 2 columns and I achieved it by this code:
x1 = duckdb.sql("SELECT * FROM read_csv('bigdata.csv', header= False) WHERE column00='aaa' ")
x2 = x1.filter(" column04='bbb' ")
x2.write_csv(" out.csv ")
edit: forgot to mention in python I have no problem with semicolon

coral wasp
small meteor
coral wasp
small meteor
coral wasp
#

Use 'offset 1'

small meteor
#

this didnt work

#

dunno where to put 😄

near wyvern
#

hey guys is this job position for a backend developer or for data engineer ?

Job Duties and Responsibilities:

Writing new services to collect and process data from external vendors and public APIs
Writing internal APIs to assist with the processing of card applications and their ongoing underwriting.
Integrating new research from the data team into our production processes.
Writing ad-hoc scripts in order to bulk-process historical data.
May be required to perform other tasks and duties reasonably related to job responsibilities.
Experience/Knowledge, Skills & Abilities:

Bachelor’s degree in a relevant field required, Graduate degree in relevant field a plus
Expertise working with one or more relational databases systems (e.g. SQL Server/Postgres).
Strong knowledge of Python
Data processing experience and analysis using Pandas/Numpy
Experience working with consumption of external REST/SOAP APIs
Experience in writing new APIs (preferably using Python/Flask)

tranquil aspen
near wyvern
#

yeah that's what deepseek told me also but it is like data engineer and back end because of creating api's

orchid zenith
#

nice tips

near wyvern
coral wasp
near wyvern
#

so according to you this is a backend position or just data engineer

coral wasp
near wyvern
#

ohh great thanks for clearing that out

#

i was a little confuse because i have just a little experience with node js and i want to get this job which title was junior python dev

#

i also have some experience with python on automating simple stuff

lime current
#

Can you double check? This import error has to be investigated

peak delta
#

I'm sure

#

This is the beginning of the error message

#

And when I do it with the other link it pass

#

the link that works is the one of the tuto, but cause I'm interested in MLB I done try with another link.

#

maybe it's because it is a shtml link

grim vault
#

Well, I get: ImportError: Missing optional dependency 'html5lib'. Use pip or conda to install html5lib.
So I installed html5lib, after that I got ImportError: Missing optional dependency 'beautifulsoup4'. Use pip or conda to install beautifulsoup4.
So I installed beautifulsoup4 and after that I get ValueError: No tables found

peak delta
grim vault
#

I don't know. You have to install the html5lib package before you can continue.

peak delta
#

Okay thank you

#

I’ll try it

peak delta
#

And when I try to import both of them directly on my jupyter file they can't be found

grim vault
#

Sorry, I can't be of help any more. I never worked with jupyter. Maybe try a help channel, it's not realy database related.

peak delta
#

Okay I was looking for the webscrape channel but I didn’t found it. Thank you have a nice day

proper wedge
#

Hey guys

#

I have some problem with a sql application

tepid basalt
#

Hello, what's the problem?

proper wedge
#

I'm using "SQL Server Management Studio 20", I'm unable to open the containing folder for my database diagram. It gives a warning that says "Unable to open the folder. It might have been deleted or existing permissions might be insufficient."

#

I'm trying to send the database diagram file to another person

tepid basalt
#

Are you able to view the folder with file explorer?

proper wedge
#

In the application, I can but not the windows file explorer

tepid basalt
#

I'm not familiar with windows permissions, sorry. Sounds like SSMS is installed with more permissions than your account has

orchid zenith
#

it should have some permissions set automatically so check in chatgpt how to remove it

proper wedge
#

Thanks guys

orchid zenith
ripe ore
#

where you guys put scripts that insert hard coded data into dbs?

#

In my case all data of one specific SQL tabel is hard coded (for now), but I want to use the same API to all persistent data

#

the matter here is project structure

#
migrations/
  <today>.py             # here?
scripts/
  db/
    data/
      that_one_table.py  # here?
src/
  db/
    scripts/
      that_one_table.py  # here?
#

I'm using sqlalchemy so wherever it'll be, I'll have to import src.db.* stuff

ripe ore
#

is there a consensus about this?

cedar tiger
digital rune
#

hey i'm making a chat app and i want to make a table that stores for each channel the users that are in, but i wonder if it is a good idea to add a row for each user that is added to a channel. i don't think it is real efficient but idk how to do better. If someone as any idea for it. i'm taking it

proven ore
#

Does someone know how to prevent in a TRIGGER function checking columns for a table that does not exist?
I have a trigger for 2 tables and I use TG_TABLE_NAME check to check if it correct name, however the actual check for that table was also executed and because column does not exist on other table it fail to execute query:sql IF (TG_TABLE_NAME = 'api_transfer' AND (SELECT ts.name = 'transferred_to_account' FROM api_transfer_statuses ts WHERE ts.id = NEW.status_id)) THEN Here is NEW - record of table that called in trigger function and status_id exist only on api_transfer table, but not the other, however it still being called and trigger crash exit.

ripe ore
rich nymph
#

Hey everyone , what do you think about firebase as database?

marsh robin
#

Hello. Im trying to learn good practices using postgresql(psycopg2) and python. Like folder structure, how to separate different sql statements etc. You guys recommend any good article/video/github project?

tepid basalt
#

Hello, can you share a little more context about your project? Are they DDL statements?

marsh robin
tepid basalt
#

Just write your raw SQL statements in the Python file. Keep it simple.

ripe ore
#

I'm writing a test to test an event that checks and create the schema on the first ever connection due to async structuring

#

And I wrote a NOTE comment stating "Pre-runtime tests of schema validation are a migration related matter"

#

That's because, since it is SQL, the safest runtime check is just to see if the tables already exist and create them if they don't

#

Is this code comment correct? Is ensuring schemas upon first and posterior runs a matter of migration, thus, needing to tested pre product launch?

#

And by production launch I also mean maintenance pauses for SaaS

brave bluff
#

Anyone have experience using the alembic api to use it in a script? I'm trying to get alembic to run on my test database in a pytest fixture:

@pytest.fixture(scope="session", autouse=True)
def setup_database():
    database_url = f"postgresql+psycopg://{configs.DATABASE_USER}:{configs.DATABASE_PASSWORD}@{configs.DATABASE_HOST}"
    engine: Engine = create_engine(database_url)

    alembic_cfg = AlembicConfig()
    alembic_cfg.set_main_option("script_location", "/app/src/alembic")
    alembic_cfg.set_main_option(
        "sqlalchemy.url", f"{database_url}/{configs.DATABASE_DATABASE}_test"
    )

    with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as connection:
        connection.execute(
            text(f"DROP DATABASE IF EXISTS {configs.DATABASE_DATABASE}_test")
        )
        connection.execute(text(f"CREATE DATABASE {configs.DATABASE_DATABASE}_test"))

        alembic_cfg.attributes["connection"] = connection
        alembic_command.upgrade(alembic_cfg, "head")

        yield

        connection.execute(text(f"DROP DATABASE {configs.DATABASE_DATABASE}_test"))

But when I run tests, I get the error that the relation I'm testing against doesn't exist, which seems to indicate alembic never ran? Also, I don't love that this will modify my logging settings...

brave bluff
#

I also can't find if there's some way to give alembic a different database at the same host to run against (in the cli).

indigo socket
#

Is anyone good at using flask ?

brave bluff
#

Given this is a Python discord, you're better off asking the question than asking to ask.

hallow brook
#

/rant I wish a good DB ORM existed in Python. sqla should've forked for 2.0 and dropped their tech debt, gone all-in on mapping/typing

hallow brook
#

and then export that engine from my db package

#

the engine is then imported by env.py and everything just works

tepid basalt
hallow brook
#

I actually really like sqlx from the Rust ecosystem, but it's not actually an orm

#

I like sqla well enough, but it's needlessly verbose and relationship/mapped_column are poorly documented

#

because it's insane and does runtime eval() metaprogramming to find types

tepid basalt
#

fair enough haha

hallow brook
#

the metaprogramming is probably one of the biggest issues I have with sqla because it doesn't play nicely with inheritance which it advises you to use as well

#

so it feels at odds

#

does anyone know how to use alembic to "reset" your DB?

#

just take the base revision and delete everything else?

brave bluff
hallow brook
#

so for tests, I run sqlite in memory, localhost I run sqlite-on-disk or postgres, and in prod I run cloud postgres

brave bluff
brave bluff
# hallow brook I am, yes

You haven't run into any logging issues? It seems that's a complaint on every post I've seen about using the API. But regardless, I don't know why I can't even get it working.

hallow brook
brave bluff
#

But I can't even run an upgrade, with or without the logging issue, that was just an aside.

hallow brook
#

are you getting an error?

brave bluff
#

I'm not even sure how to turn this into a more testable scenario

hallow brook
#

so to run alembic programmatically, you pass it an alembic.config.Config pointing at your alembic.ini and then use alembic.command.upgrade(cfg, "tag")

#

it's rather simple

brave bluff
#

I'd hope it's that simple, heh, but you can see from my code above, I have that

#
        alembic_cfg = AlembicConfig()
        alembic_cfg.attributes["connection"] = connection
        alembic_cfg.set_main_option("script_location", "/app/src/alembic")
        alembic_cfg.set_main_option(
            "sqlalchemy.url", f"{database_url}/{configs.DATABASE_DATABASE}_test"
        )
#

I got the main option modifcation from a stack overflow answer

#

I just ran this:

from sqlalchemy import Engine, create_engine, text

from alembic import command as alembic_command
from alembic.config import Config as AlembicConfig
from app.configs import configs


def main():
    database_url = f"postgresql+psycopg://{configs.DATABASE_USER}:{configs.DATABASE_PASSWORD}@{configs.DATABASE_HOST}"
    engine: Engine = create_engine(database_url)
    with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as connection:
        alembic_cfg = AlembicConfig()
        alembic_cfg.attributes["connection"] = connection
        alembic_cfg.set_main_option("script_location", "/app/src/alembic")
        alembic_cfg.set_main_option(
            "sqlalchemy.url", f"{database_url}/{configs.DATABASE_DATABASE}_test"
        )
        alembic_command.upgrade(alembic_cfg, "head")


main()

It took a few seconds to run, but no output and no change to my db

hallow brook
brave bluff
#

See, that's what I thought, but when I pointed it to the migrations dir (versions), it complained that it couldn't find env.py

hallow brook
#

I suggest saving your alembic.ini to disk and generating it from alembic itself so you can modify it as needed rather than just generating it in code

brave bluff
#
Can't find Python file /app/src/alembic/versions/env.py
hallow brook
#

ah yeah, so your alembic dir is your migrations dir

brave bluff
#

well, the parent folder

#

And I do have an ini, but for my app

#

You're suggesting a separate ini for each env?

#

I guess as it wont' be updated too often, that's not too big a deal

#

Oh, but I do some modification work in my env.py, as per other answers I got on this discord

#

Frick, why is something so basic so difficult

hallow brook
brave bluff
#

Same, at the top of my alembic.env.py, I did

DATABASE_HOST = os.getenv("DATABASE_HOST")
DATABASE_USER = os.getenv("DATABASE_USER")
DATABASE_PASSWORD = os.getenv("DATABASE_PASSWORD")
DATABASE_DATABASE = os.getenv("DATABASE_DATABASE")

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

config.set_main_option(
    "sqlalchemy.url",
    f"postgresql+psycopg://{DATABASE_USER}:{DATABASE_PASSWORD}@{DATABASE_HOST}/{DATABASE_DATABASE}",
)

Unless you mean something different

hallow brook
#

I do mean something different

#

I just set the url in my engine directly since that's what you're importing into env.py

#

I tried to set it in different alembic.ini headers, but that feature is very clunky and I gave up on it since it doesn't merge "defaults"

#

and the programmatic use doesn't even let you reference different headers

#

example and then I use a match os.getenv("DB_ENV"): block at the end of the module to determine which function is called to export as engine

brave bluff
#

Oh, but then you can't do it via the cli, I'm guessing

#

Which I guess you could just replace the cli with script

hallow brook
#

right, you do it via the CLI by using the correct envvar

#

so you set DB_ENV to select the engine

brave bluff
#

oh, I see

hallow brook
#

I do this because my cloud db requires me to set up a complex event listener for password rotations on the engine

brave bluff
#

Makes sense

hallow brook
#

so I set up the listeners inside the engine functions

#

you can see that in set_sqlite_pragma

brave bluff
#

I don't suppose this is something you could throw onto a gist for me to look at? I'm definitely not following the full flow. If not, I understand, I'll see if I can piece it together

hallow brook
#

mmm I think I can illustrate

#

give me a moment

brave bluff
#

Take your time

#

I've been bashing my head against this long enough

hallow brook
#

@brave bluff

def test_engine():
  e = create_engine(some_sqlite_str)
  return e

def local_engine():
  e = create_engine(local_pg_str)
  return e

def cloud_engine():
  e = create_engine(cloud_pg_str)
  @event.listens_for(e, "do_connect")
  def receive_do_connect(dialect, conn_rec, carsg, cparams):
    cparams["password"] = get_rotated_password()

match os.getenv("DB_ENV"):
  case "local":
    engine = local_engine()
  case "prod":
    engine = cloud_engine()
  case _:
    engine = test_engine()

Session = sessionmaker(engine)
brave bluff
#

And this is in your app, not the alembic/env.py, right?

hallow brook
#

this is yeah, in my app.db.engine.py

#

then in env.py you import the engine

brave bluff
#

Oh, so so far this has nothing to do with the alembic side

#

Ok

#

That makes sense

hallow brook
#

yeah, it's used as your connectable

brave bluff
#

Instead of the engine_from_config

#

Yah

#

Hm, ok

hallow brook
#

yeah, I had bad luck using the alembic tools in a configurable way

brave bluff
#

Yah, same here looks like

#

This feels so fundamental

#

And yet is so frustrating

hallow brook
#

I spent like a week to figure it out and realizing that what I wanted to do wasn't possible or required a lot of code after reading through too many github issues

brave bluff
#

Ok, so not just me

#

How the heck is this such a big player in the market?!

hallow brook
#

because it's the only one

#

migrations are hard

brave bluff
#

This really makes me consider swapping to atlas

#

But for now, I just need to get my test db structured

hallow brook
#

same reason that the JVM only really has Hibernate despite being also terrible

brave bluff
#

Yah, I definitely don't mean to make light of the difficulty

hallow brook
#

I think it's the way it is because it has to deal with the tech debt inside sqlalchemy as well

brave bluff
#

But I'm also shocked at the entire lack of answers/lack of knowledge, given it's the only player

#

Yah, SQLA should have 100% done a break at v2 and tossed the debt

hallow brook
#

and both frameworks have a lot of customers/clients consuming their code, so breaking changes are hard

brave bluff
#

Yah

#

But it means forever debt

#

And v2 already has a bunch of breaking changes

hallow brook
brave bluff
#

So much stuff I'd like to be able to do, but then throws errors

hallow brook
#

or being able to use alternatives like msgspec but you can't because it doesn't use the same API as dataclasses.dataclass

brave bluff
#

SQLA and Alembic have been making me think I'm a much worse engineer than I probably am, so it's nice to hear someone else with similar opinions

hallow brook
#

so you're kind of just stuck using MappedAsDataclass or whatever

brave bluff
#

(I've had various people try to convince me the SQLA docs are fine, if you just read them carefully)

hallow brook
#

the docs kind of are fine? but the code docstrings definitely are not

#

I shouldn't need several browser tabs of documentation open to use the framework

brave bluff
#

I donno, as someone who's using it for the first time, I think fine is giving it a lot of credit 😛

#

Yes, 100%

#

And for example, this all came up because I'm trying to use postgres's citext

hallow brook
#

it actually is better than using pre-2.0 sqla

brave bluff
#

Yes, that's for sure

hallow brook
#

I wanted to use the AWS-provided RDS Postgres client with sqlalchemy, but that was a headache and a half because they themselves are importing sqlalchemy for connection pools with no instruction on how to integrate with sqlalchemy ORM 🤣

brave bluff
#

As someone who's just starting to use sqla... I couldn't figure out how to set a column as citext, because it and other db specific columns CITEXT (all caps) where as the base colums are Text (snake case)

hallow brook
#

AWS just assumes you're using raw connections everywhere which is absurd

brave bluff
#

It didn't occur to me that CITEXT was a class, because it was all caps

#

And nothing in the docs indicated it was just a swap

hallow brook
#

yeah, you're talking about type_annotation_map stuff now, btw

brave bluff
#

Yah

#

Sorry, I went off the rails a bit

#

heh

#

The docs aren't terrible, but I do think they're bad

hallow brook
#

I found out that the feature I wanted, which was using NewType UUIDs for type safety, was only implemented like a week before I was trying to use them recently

brave bluff
#

Haha, I had similar experience!

hallow brook
#

but I have to iterate EVERY ID NEWTYPE into that annotationmap which is absurd

brave bluff
#

So many design choices I can't imagine making...

hallow brook
#

at least my base class is beautiful

#

DbId is a union of all the newtypes, so it automatically implements the correct id primary key for each table

#

I'm quite proud of it because you can't misuse it

brave bluff
#

Nice

#

I do have to learn how to use generics better

#

Anyway, for now, I'm gonna try to get this testing working... given what you said about building an engine per env, while I'm doing my ini in memory, i'm basically doing the same thing in code, so the fact that it's not working is frustrating

hallow brook
#

yeah, dynamic connection_string in the ini wasn't working out for me which lead me down this path and it's easier to reason through

brave bluff
#

Does your script output something when alembic actually runs?

#

The fact that there are no outputs also bothers me

hallow brook
brave bluff
#

Thanks. I've gotten as far as alembic spitting out

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.

in my script, which means now it's just not finding the actual migrations...

#

Which I guess is forward movement, but still not actually a result...

hallow brook
#

if it's not versions you need to configure version_locations

brave bluff
hallow brook
#

so I would verify you are specifying the command correctly

#

are you using upgrade(cfg, "head")?

brave bluff
#

This is my ini: https://gist.github.com/rohitsodhia/af3e4b20bdd6ad14cea47914996a7036
This is the script:

from sqlalchemy import Engine, create_engine, text

from alembic import command as alembic_command
from alembic.config import Config as AlembicConfig
from app.configs import configs


def main():
    database_url = f"postgresql+psycopg://{configs.DATABASE_USER}:{configs.DATABASE_PASSWORD}@{configs.DATABASE_HOST}"
    engine: Engine = create_engine(database_url)
    with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as connection:
        alembic_cfg = AlembicConfig("/app/src/alembic.ini")
        alembic_cfg.attributes["connection"] = connection
        alembic_command.upgrade(alembic_cfg, "head")


main()
hallow brook
#

is attributes["connection"] correct?

#

I think your script is including code that should exist in your env.py instead

brave bluff
#

I plan on moving it there, I'm just going by what's in the alembic docs

#
with engine.begin() as connection:
    alembic_cfg.attributes['connection'] = connection
    command.upgrade(alembic_cfg, "head")
#

I'm trying to get it working with the minimum changes to start

#

Because I already have a feeling if it can't find the versions in my script, it won't find the versions in my tests

#

Hell, I'll be shocked if it works at all in my tests

hallow brook
#

so you'll run into conflicts by doing it explicitly without changing the setting

brave bluff
#

Ok, I can sort through that

hallow brook
#

what are you building for?

brave bluff
#

This was meant to be a small crud app to learn fastapi and sqla off of, and improve my testing skills

#

This has turned into a nightmare that makes me question if I'm fit to be anything of an engineer after 10+ years if I cant even get a simple crud api going

hallow brook
#

ah yeah, so put your migrations into your lifespan startup

#

and then you can invoke them in your test by using the testclient as a ctx mgr

brave bluff
#

I don't mind that as an ultimate solution, though I'd still like to figure out what's wrong with the current test script... if it can't find the migrations here, why would it find the migrations at startup?

hallow brook
#

my suspicion is your env.py script

brave bluff
#

Which besides the portion I showed you earlier, is the same as the default. I get the connection is something I have to address, but that doesn't explain the lack of finding versions

#

It's making a db connection to the right database, I confirmed that

hallow brook
#

or bump your alembic logger to DEBUG

brave bluff
#

Let me bump the logger to start, and I only have one migration right now; I'll throw it on gist. As mentioned before, it works via the cli on my app db, just can't get the api to work

hallow brook
#

oh, I see

#

you probably don't need all that ceremony then

delicate tinsel
#

Hey guys, Im tryna make a dynamic website using flask but idk what kind of database I should use. I've asked my teacher abt it and he said it's easier to start off with mongodb but from what I've seen MySQL or SQLite are much more popular (i guess?) so which one do I actually want to use? I'm a total newbie in database and I just wanna know which one is much more beginner friendly (it doesn't have to be limited to only the options I've mentioned above). Thx guys, much appreciated 🙏

grim vault
#

SQLite will be the easyiest one and it already comes with python, no extra setup or server required.

delicate tinsel
#

alr sounds great! im definitely going to try it out thx!

hallow brook
delicate tinsel
hallow brook
brave bluff
#

I also don't think theres an "easier" database. There are databases that are easier to set up, and come with downsides. There are more complicated to set up databases that can make your life easier down the line. For a quick simple project, and not knowing what else to do, sqlite is great, but choosing between a relational database like mysql vs a non-relational like mongo requires knowing what else does and why you'd want to pick each one. I'd say none are more "beginner friendly". If you're using SQL based databases, you need to learn SQL. If you're doing a NoSQL, then you need to learn the syntax for it (which are largely javascript based, but not always). Either way, there's something for you to learn.

brave bluff
#

I'm having trouble with SLQAlchemy 2. I have this model:

class Item(Base):
    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(), unique=True)

For which I wrote this repository method:

    async def update(self, id: int, name: str | None = None, notes: str | None = None):
        item = await self.get_by_id(id)
        if not item:
            raise NotFound(Item)

        if name is not None:
            item.name = name
        if notes is not None:
            item.notes = notes

        db_check = await self.db_session.scalar(
            select(Item).where(func.lower(Item.name) == item.name.lower()).limit(1)
        )
        if db_check and db_check.id != item.id:
            raise AlreadyExists(db_check)

        await self.db_session.commit()
        return item

And have this test:

    async def test_update_duplicate_item(self, db_session: AsyncSession):
        item_repository = ItemRepository(db_session)
        items_inserted = await db_session.scalars(
            insert(Item).returning(Item), [{"name": "test1"}, {"name": "test2"}]
        )
        item1, item2 = items_inserted.all()

        with pytest.raises(AlreadyExists) as e:
            await item_repository.update(id=item2.id, name="test1")

        item1 = await item_repository.update(
            id=item1.id, name="test1", notes="changing notes"
        )
        assert item1.notes == "changing notes"

If I run each assertion block (the with and the line below it, and the bottom few lines) on it's own, it works. But when I run them together, I get an error on the second update that says there's an integrity error for trying to update id = 2, which means it's still trying to update the first attempt. I fixed it with session.expunge, but I don't understand why that's happening? In the update method, item is being overwritten each time.

hallow brook
brave bluff
delicate fieldBOT
#
Read-Eval-Print Loop (REPL)

A REPL is an interactive shell where you can execute individual lines of code one at a time, like so:

>>> x = 5
>>> x + 2
7
>>> for i in range(3):
...     print(i)
...
0
1
2
>>>

To enter the REPL, run python (py on Windows) in the command line without any arguments. The >>> or ... at the start of some lines are prompts to enter code, and indicate that you are in the Python REPL. Any other lines show the output of the code.

Trying to execute commands for the command-line (such as pip install xyz) in the REPL will throw an error. To run these commands, exit the REPL first by running exit() and then run the original command.

brave bluff
#

I have another test:

    async def test_update(self, db_session: AsyncSession):
        item_repository = ItemRepository(db_session)
        item = Item(name="test", notes="test")
        db_session.add(item)
        await db_session.commit()

        item = await item_repository.update(id=item.id, name="test2", notes="test more")
        assert item.name == "test2", item.notes == "test more"

        item_from_db = await db_session.scalar(select(Item).limit(1))
        assert item_from_db
        assert item.id == item_from_db.id
        assert item.name == item_from_db.name

That shows, even without adding it to the session, the db changes

hallow brook
brave bluff
hallow brook
#

the db_check seems redundant. you can just try the insert and catch the integrityerror for non-unique names

brave bluff
#

Which I'll do

#

But I'm hoping someone can help me understand what's going wrong with the update in my situation

#

If I change the method to

    async def update(self, id: int, name: str | None = None, notes: str | None = None):
        item = await self.get_by_id(id)
        if not item:
            raise NotFound(Item)

        if name is not None:
            item.name = name
        if notes is not None:
            item.notes = notes

        try:
            await self.db_session.commit()
        except SQLAIntegrityError as e:
            await self.db_session.rollback()
            raise AlreadyExists(e)
        return item

the error moves to a greenlet spawn issue in the second update?

hallow brook
brave bluff
#

Yah, the docs are BAD about orm updates

hallow brook
#

how is it this bad, it's so common

brave bluff
#

That's what I say about most of SQLAs docs

#

They're docs are better now with v2

hallow brook
#

the session automatically tracks changes

#

re-adding is not necessary

brave bluff
#

Yah, and that's an obscure example, heh

brave bluff
#

So I guess that tracked change persists in the session even if the instance itself doesn't?

#

Which is WEIRD behavior IMO

#

And they talk about rolling back there, but rolling the session back for the update didn't work for me

#

I had to expunge the item from the session

brave bluff
#

So... the session.get document says it's legacy, but the v2 tutorial says it's not

hallow brook
brave bluff
hallow brook
#

oh no

brave bluff
#

No?

hallow brook
#

you need one session per request

#

they aren't intended to be durable and long-living

brave bluff
#

See, this is the frustrating thing... some folks say "one session all the way!" while others tell me "commit commit commit"... I don't know which way to go!

hallow brook
#

each session is basically its own transaction

brave bluff
#

Yah, I read the transaction docs

hallow brook
#

one engine per app, but session per request

brave bluff
#

Heh, it's a pretty common attitude though, and as someone just getting deeper into SQLA, there's no reason for me not to believe it until I get to an issue like this.

#

Oh, as I think about it, I think for a app request, treating it as one transaction makes sense

#

But for tests

hallow brook
#

oh

#

I found your issue

brave bluff
#

Each attempt should be it's own transaction

hallow brook
brave bluff
#

Oh?

hallow brook
brave bluff
#

I have autoflush disabled though...

#

I hate auto flush

hallow brook
#

I've disabled autobegin and gotten errors still, too 🤷

brave bluff
#

Ok, well, I think with your transaction advice and knowing a bit more about what's going on, I can fix this

#

Thanks

hallow brook
#

this is confusing

#

so it's flushed (emitted to the db) but not committed

brave bluff
#

SQLA docs confusing? ::shocked::

#

And I think I'm going to have to look into nested transactions for a test like this

#

Because I don't want to commit any transaction, so my db is clean between tests

#

But in some tests, I'm gonna set up data

hallow brook
brave bluff
tepid basalt
#

Use save points and roll back your transactions.

hallow brook
hallow brook
tepid basalt
hallow brook
tepid basalt
#

Well, per class, not per test

tepid basalt
hallow brook
tepid basalt
#

That's unfortunate, I wonder how Django works around that

wild nymph
#

I have a database with these four entities:

  • Property
  • Department
  • Municipality
  • RuralSettlement

Each property must be associated with a specific Department, Municipality, and RuralSettlement.
The relationships are:
Department one-to-many with Municipality
Municipality one-to-many with RuralSettlement

What foreign keys should I have in Property?

tepid basalt
#

Just one foreign key field for RuralSettlement. Because the other tables are already connected by their relationships.

wild nymph
tepid basalt
#

Like in what case?

#

That would be confusing because if you want to know a Property's Municipality how do you know if it's property.rural_settlement.municipality or property.municipality.?

wild nymph
tepid basalt
#

hmmm

#

You could make Department, Municipality, and RuralSettlement into one hierarchal entity, so you have only one foreign key field on Property.

#

But that might be more complicated than having nullable foreign key fields on Property...

#

That's all I have off of the top of my head

wild nymph
# tepid basalt hmmm

Uhmm, this is a good idea. But I think I need each entity to be independent because, for example, there is a feature where the user creates a property, and they select the department. Then, they need to see the available municipalities for that department.

tepid basalt
#

Add a type field so you know what's a Municipality

#

Once again, this might be more complicated than it's worth

wild nymph
#

okok i will think about it

brave bluff
#

I've got a SQLA issue I can't figure out. I have this test:

    async def test_update_duplicate_item(self, db_session: AsyncSession):
        item_repository = ItemRepository(db_session)
        items_inserted = await db_session.scalars(
            insert(Item).returning(Item), [{"name": "test1"}, {"name": "test2"}]
        )
        item1, item2 = items_inserted.all()

        with pytest.raises(AlreadyExists) as e:
            await item_repository.update(id=item2.id, name="test1")

        item1 = await item_repository.update(
            id=item1.id, name="test1", notes="changing notes"
        )
        assert item1.notes == "changing notes"

Along with the update method:

    async def update(self, id: int, name: str | None = None, notes: str | None = None):
        item = await self.get_by_id(id)
        if not item:
            raise NotFound(Item)

        if name is not None:
            item.name = name
        if notes is not None:
            item.notes = notes

        try:
            await self.db_session.commit()
        except SQLAIntegrityError as e:
            self.db_session.expire(item)
            await self.db_session.rollback()
            raise AlreadyExists(e)
        return item

When I run the test, it fails on the second update. I realized from feedback here it's because SQLA still has the first item in memory and tries to commit them both (but it somehow doesn't get caught in the try block?). So I thought adding the expunge would work, but then I get an error that I need to rollback. When when I add the rollback here, I get a greenlet error. I can't figure out what's going on.

brave bluff
#

BTW @hallow brook ,i did do a nested transaction, which ultimately fixed it, but now I'm hoping to learned what's going on in this situation to be better able to use SQLA.

floral forge
#

ive been running into a lot of database performance issues with my discord bot, and I’m looking for a better alternative to SQLite, something that’s still easy to set up but handles high-concurrency writes better.

what im looking for:
easier to set up than PostgreSQL (I don’t want the hassle of full DB server management)
more scalable than SQLite without major config headaches
handles high-concurrency writes well
minimal setup & lightweight

any suggestions?

#

im also very unfamiliar with postgresql

sinful moat
hard current
#

Mysql seems a good alternative for database.

sinful moat
# sinful moat
async def route(request):
    async with create_session() as session:
        # db logic
        ...

async def route(request, session):
    # route & db logic
    ...
floral forge
sinful moat
# sinful moat

You just need to call setup_database() to configure the database, then you can forget about it.

floral forge
#

i don’t want to use something that needs to run on a server like postgresql

sinful moat
#

Do you mean you need to store the data locally?

sinful moat
floral forge
sinful moat
#

That's the best current setup I can have for a DB.
It is safe due to its good ORM handling, and supports concurrency (async), and also combines the Pydantic SDK.

floral forge
sinful moat
#

Obviously, it is an asynchronous session

#

You just need to familiarize yourself with SQLAlchemy, for example this one for update

item = await session.get(objects.Item, itemId, with_for_update=True)
item.price += 15
await session.commit()
paper flower
floral forge
#

i already enabled WAL mode

paper flower
#

Yeah, I'd try something like postgres there

floral forge
#

im not very familiar with postgres, and the setup seems to be really difficult

#

ill be running the code on a vps as well

paper flower
#

As Viktor mentioned sqlalchemy as an orm may be a good choice too

paper flower
floral forge
#

i was considering runninig it in docker

waxen finch
floral forge
#

ive decided to use mysql

waxen finch
#

was it easier to setup over postgresql? i thought both had docker images

#

also not familiar with how they handle contention between transactions, i'd like to read a bit about that...

floral forge
#

okay now im in a dilemma again

#

honestly, i cant decide what to use

#

its killing me

#

im about to pull out my hair

lament parcel
waxen finch
floral forge
# lament parcel Mind sharing how you are using sqlite to write ? also you can try playing around...

i'm using a custom connection pool built on top of aiosqlite rather than sqlachemy.

this is my current approach to database writes:

connection pooling: I've implemented a DatabasePool class that manages between 3-8 connections with configurable min/max settings
write locking: i use a dedicated asyncio.Lock to ensure only one write operation runs at a time, while still allowing concurrent reads
transaction management: my implementation supports proper transaction nesting and has safeguards to prevent leaving transactions open.
retry logic: I have exponential backoff with jitter when encountering "database is locked" errors.
WAL mode: im using sqlite's WAL journaling mode with optimized settings

the current issue im facing isn't the connection pool itself but rather sqlite's fundamental limitation with concurrent writes. no matter how good the pool is, sqlite still locks the database during writes which is extremely frustrating

waxen finch
# floral forge okay now im in a dilemma again

i did a benchmark of sqlite's write performance a while ago: #databases message
in short, the results were that one connection committed 1000 writes in 0.2s, while ten connections committed 1000 total writes in 6.6s, so around 150 writes/second

oh whoops, i didn't actually report the write performance in this message, i'll need to rerun the benchmark...

edit: results were 1000 writes in 0.6s with 1 conn, and with 10 conns it increased to 1.5s, so it was actually quite fast for writing

tepid basalt
waxen finch
sinful moat
tepid basalt
floral forge
#

my write transactions are generally short lived and don't wait on external operations. most of my transactions execute a single database operation or a small batch, then commit immediately. the issue isn't transaction duration, but rather multiple concurrent transactions attempting to write to the same tables, but i do have proper pooling and transaction management with automatic commits/rollbacks, but the contention happens when multiple requests hit simultaneously

#

what do you guys think about sharding my database?

#

one for user data, one for chat data, etc

#

honestly, if i were to migrate to postgres or mysql, id have to make several adjustments

waxen finch
tepid basalt
lament parcel
sinful moat
#
SELECT * FROM orders WHERE id = 123 FOR UPDATE
#

This locks certain rows to prevent access to old values ​​that have not yet been updated in the database but are being executed.

floral forge
#

based on my understanding, sqlite will effectively lock the entire database during write transactions anyway due to its file-based locking mechanism. so while FOR UPDATE would make my intent clearer, it won't fundamentally solve the concurrency issues im currently experiencing, correct me if i'm wrong though

waxen finch
# floral forge it hasn't increased

then i would suspect that the duration of one or more write transactions is the problem, not necessarily the quantity of concurrent writes

sinful moat
#

SQLite is used for local databases, your machine should have a good disk write speed, the more writes per second you have,

sinful moat
waxen finch
# waxen finch then i would suspect that the duration of one or more write transactions is the ...

my thinking is that say you had 100 writes to finish immediately, but each transaction only lasts for say, 20ms, then that would take 2 seconds to complete, since it goes through them sequentially (as you've understood correctly)

in contrast, if you had 10 write transactions to complete, but one of them needed to stay open for 5 seconds, then even though there are fewer writes being done, the total time needed to finish those writes is 5.18s, just long enough for sqlite3's default 5 second timeout to raise "database is locked" exceptions on some of those transactions

#

can you reproduce the exact transactions that causes those locked errors?

sinful moat
#

In that case it would not be good to handle updates as a future, so when you want to access them you should make sure that the future has ended, although you should almost always assume that the transaction will be successful.

floral forge
#

there are a few operations where i might be holding transactions open longer than necessary.
i have a matchmaking system within my code, this is what it does when processing matches:

  1. removes users from queue table
  2. creates a new chat record
  3. updates user records
  4. calls discord api to create thread
  5. updates some more records with the thread id

that discord api call happens within the transaction in some places, and that's potentially problematic. even with my retry logic, a 200-300ms api delay during high activity could be causing the lock contentions

sinful moat
#

Does the session remain active throughout the entire process?

waxen finch
floral forge
#

honestly, i dont think using sqlite is a viable solution anymore, this bot is being implemented in a server with 700k members

so my options are;

  1. shard my database
  2. completely migrate to either postgres or mysql
sinful moat
#

try using a decorator that displays the response time

import time
import collections.abc
import functools


def timer[**P, T](
    func: collections.abc.Callable[P, T],
) -> collections.abc.Callable[P, T]:
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
        s = time.perf_counter_ns()
        r = func(*args, **kwargs)
        print(func.__qualname__, time.perf_counter_ns() - s)
        return r

    return wrapper

# example
@timer
def sum(a: int, b: int):
    return a + b


sum(1, 2)
sinful moat
lament parcel
paper flower
# floral forge ive decided to use mysql

I'd personally prefer postgres, but I think they're equally difficult to set up 😅
By the way, sharding some of your database tables seems like a nice solution, depending on what you're doing

floral forge
#

i tried installing it, its all so complicated

#

this is so frustrating

paper flower
#

What exactly was complicated? I'm really curious pithink

floral forge
#

so i managed to connect to the port, but i couldn’t view the tables for some reason?? i tried watching a few tutorials as well

paper flower
#

I don't use psql directly that much if you're talking about this, but there are commands to see tables or tables in current schema. If you have a db client/browser like pycharm's built-in one I'd recommend that

floral forge
paper flower
#

I mean, if you didn't create anything there wouldn't be any? 🤔

floral forge
#

i did

#

i was tryna see if i could configure it

#

i just gave up after a while because i was on the verge of pulling out my hair

tepid basalt
#

I tried to see how far I could push WAL mode before things started breaking:

| Connections | Transactions | Journal | Time  |
| ----------- | ------------ | ------- | ----- |
| 5           | 1,000        | DELETE  | 4.64s |
| 5           | 1,000        | WAL     | 0.19s |
| 10          | 1,000        | DELETE  | LOCK  |
| 5           | 10,000       | DELETE  | LOCK  |
| 10          | 1,000        | WAL     | 0.45s |
| 50          | 1,000        | WAL     | 4.63s |
| 10          | 10,000       | WAL     | LOCK  |

I'd still like to try a few more things like increasing the timeout.

#

Each transaction is an insert to the same table btw.

plucky aurora
#

Hi anyone alive here?

undone frost
#

Im currently looking for more efficient ways to update a different table based on changes from a settings table

How does one implement this in a way that won't crash a db?
For example, a different table (T1) references a central settings(T2) table, and when that setting changes, ALL rows in T1 must update with that new setting
To make it more complicated, T1 has rows which use the setting for calculations, so those values will have to be recalculated as well


For context, i'm currently creating a payroll system for a small company

I currently implement the solution to what i mentioned by having a 'bulk regenerate calculations' button (for example for the employee timesheets). Only problem is, with just around 100 of those, supabase already cuts the connection/times out

what would be a better way to approach this?

floral forge
#

oh my god

tepid basalt
#

What are you using for pooling?

tepid basalt
undone frost
#

how'd you suggest i go about the first one tho in a way that doesn't recompute on every read?

#

oh what do you mean by perform the calculation btw? recompute then store at the DB level? or just pull the data then compute at either the server or frontend?

tepid basalt
#

Let's say you want to know the total value of an order. That would be annoying to store in the database because each time the order, quantity or price changes, it needs to be updated. Instead when you're reading the data, add a field in the select statement to calculate the total value.

tepid basalt
undone frost
#

some of the fields are too complicated to be generated/computed at the DB level (mostly because the values themselves depend on values from other tables), but i got a lot of ideas from what you said, thank you!

dull coyote
#

Hello people, I'm looking for an sql database client that autorefreshes when changes occur, I'm using dbeaver but when I make changes in the tables or so it's not autorefreshed (i need to click in refresh to see the difference), can you recommend me one that have this feature enabled?

hallow brook
floral forge
#

now i just need to figure out how to host it on my vps

tepid basalt
#

Docker is probably easiest if that's an option.

dull lintel
#

could someone tell me how to add a csv dataset into vs code. Im new to python and im using pandas.

storm mauve
# dull lintel could someone tell me how to add a csv dataset into vs code. Im new to python an...

VS Code is a glorified text editor, it is not a database in itself nor has any built-in tools for managing data like that besides "open files in a text editor"

working with dataframe libraries like pandas or polars, you'll just need to write code to read that file - there is no concept of datasets

Read the User Guide of the library you are using, it covers basic operations like reading from and writing to files, as well as all sorts of basic operations

topaz widget
storm mauve
#

usually you'll want to avoid having to read the entire file to test each change you make though
you can use something like Jupyter Notebooks, IPython, marimo or even just running code in the terminal to interactively run your python code one line (or block/"cell") at a time, keeping the entire data loaded in memory in between

TL;DR watch https://www.youtube.com/watch?v=suAkMeWJ1yE then read and follow along https://pandas.pydata.org/docs/user_guide/index.html

In this video, you'll learn how to create your very first Jupyter Notebook in VS Code, including setting up your environment, running and debugging code, and visualizing data. Getting started with Jupyter Notebooks has never been easier!

🔎 Chapters:
00:00 Install
00:50 Setting up your environment
02:29 Running text, code and visuals in one n...

▶ Play video
dull lintel
#

but Im using jupyter notebook for this

topaz widget
dull lintel
#

thanks for the help I got everything working now

amber geyser
#

Im also pretty new to python, and just started with pandas a few days ago. This might sound trivial but should i try to integrate jupyter nbs into vsc or can i just use the anaconda browser localhost thingy

tepid basalt
amber geyser
#

alright thank!

floral forge
#

i already figured it out

#

i have not seen a single deadlock yet

tepid basalt
#

That's awesome, let's go!

delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied timeout to @tulip kite until <t:1743525234:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

brave bluff
#

Ok, finally a SQLA "how would you do this" rather than a "how is this done"!
I have a SQL object with a hybrid property:

class Purchase(Base):
    _price: Mapped[int] = mapped_column("price", Integer())

    @hybrid_property
    def price(self):
        return self._price / 100

    @price.inplace.setter
    def set_price(self, value):
        self._price = round(value, 2) * 100

But when you do a bulk insert (https://docs.sqlalchemy.org/en/20/orm/queryguide/dml.html#orm-queryguide-bulk-insert), you pass in a dict of values, which means the hybrid part never triggers. So if I try to insert {"price": 1.50}, it just fails. And I'd have to set _price to 150 for it to work, which means data manipulation anyway.
Right now, I'm doing this:

    purchases_data: list[dict] = []
    for purchase in purchase_input.purchases:
        purchases_data.append(
            Purchase(price=price).__dict__
        )

Any thoughts on another mechanisms?

topaz widget
#

Possibly a dumb statement but just making sure you've considered session.add_all()? That isn't as efficient but it will run your setters

#

Oh I guess you could also do an event listener?

#
@sqlalchemy.event.listens_for(Purchase, "before_insert")
def process_price(mapper, connection, target):
    if hasattr(target, "_price") and target._price is None:
        target._price = round(target.price * 100, 2)
brave bluff
#

Instead of the hybrid property? I guess that is another way to handle it. I'd need to add a second listener for the other way around.

hallow brook
#

you want a @validates

brave bluff
# hallow brook you want a `@validates`

Sorry, not sure how I'd apply that? And I know it's not applied at initialization, but it's like any other getter/setter property, using the value in the obj to calculate the result. But since that's not how bulk works, my code doesn't work. What would I be validating?

hallow brook
#

just add to your session and commit at the end

brave bluff
#

Yah, I considered that. I just know that'll run multiple inserts. Not a HUGE deal, specially at the small scale of my app, but at a larger scale, that'd be a poor solution.

hallow brook
#

huh?

brave bluff
#

I do see there's an update_expression for bulk updates, but I don't see anything for bulk inserts...

opaque crypt
#

What are the git problems you have run into and felt like - someone should have taught me / or told me about this

#

I need to conduct session for Git - I know basic commands only..

#

need to know advance level

topaz widget
#

I guess I would have liked a better walkthrough of “rerere” when trying to fix rebase conflicts. Not sure this is the channel though it’s sassy to think of git as a db

opaque crypt
#

Idk where to ask 😭

ripe blaze
#

hi i am trying to do a little data bases exercice where ive made a list of movies and next to it i have "acting", "soundtrack" ... and "good", "mediocre", "bad" as a first row and i will have to check one of the 3 for each criteria

#

i want to code a python script to note the movies using pandas, how do you suggest i proceed?

topaz widget
#

I would have a "rating" column that took values like "good", "mediocre", "bad", rather than having them be separate.

ripe blaze
#

this is exactly an exercice im trying to learn how to convert check signs into a rating

copper heron
#

Dm me if you can dump of a db

copper heron
#

Dm me now

floral forge
#

i'm currently using ubuntu 24.04.2 on my vps and i'm wondering how to add my setup.sql into my postgres database using the command line

#

or should i integrate the schema directly into my code

wise goblet
floral forge
wise goblet
wise goblet
# floral forge yup

well, a thing to check, but alembic is probably usable standalone as migrational lib.
I know for sure it should be able to handle Raw SQL migrations

wise goblet
#

but technically nothing forbids having file in Git repo without migrational lib, that is already enough to bootstrap server

#

still better doing all actions through alembic for versioning purposes though

#

=> At minimum ensure your SQL Schema is always stored in git repo.
Usage of tools like alembic will ensure u aren't doing manual things outside of git repo code

#

====
It depends on size your development. if it is micro pet project with small amount of code, it can survive with whatever shit u make. Just going to be less pleasant shit

If u have many devs and too much code, better using migrational versioning lib and reviewing changes in pull requests

hushed smelt
#

Hey I had a doubt we use databases to store data and perform olap operations

#

But I stacked my data in an array

#

And created a python script to query it

#

And used dask to parallelize it

#

What's the exactly the diff between both ?

#

What will be the advantage to use database here

storm mauve
#

how are you storing your data?

#

if you only need to perform olap operations, there aren't that many benefits - to the point you see tools like https://duckdb.org/ gaining popularity lately

if you need to perform updates, run queries that benefit from indexes, manage a lot of data in a single place etc. then you might rather use a proper database, but for some simple use cases storing the data in parquet files* and using libraries like duckdb, polars or dask+pandas could be enough
-# *or even just plain CSV

hushed smelt
#

I have raster data/gridded data

#

So like for each time stamp it's an image

storm mauve
#

there might be some PostgreSQL Plugin or even some NoSQL databases better suited for that, but depending on which kind of processing you're doing it might be fine to just not use an actual database

hushed smelt
#

Will it scale to if data is in tbs?

#

Currently I got only 1month data and it's 100gb

#

For 4 variables

#

If it add rest of the remaining 4 it will be around 200gb

#

So for whole year it will be 200 x 12 = around 2.4tb

hushed smelt
pallid tartan
#

hi

thorny anchor
#

yes postgis can handle raster data quite well

timid brook
#

I think this is most likely a general issue with my lack of python knowledge but I am unsure why I am getting this error when trying to attach to a sqlite db.

    dbconnect = db_handlers.createDB()
    cur = dbconnect.cursor()
    db_handlers.addExpenses(cur, rawimport, "2024")```
throws an error AttributeError: 'str' object has no attribute 'cursor'

This I have two handler files basically.  handlers.py and db_handlers.py.  This function is in handlers.py and is being called by an upper level script for choice handling.

The actual function createDB works perfectly fine in that file along with cursor but not outside it
*edit* well apparently the issue is createDB fails when its being called outside of db_handlers.py
*edit2* ok yea it is me being new at python.  the db path is currently hard coded and I was using relative path, which basically fails when I execute from a source file because the top level is in a different folder.  time to convert to absolute path after all.  Thanks for the rubber ducky moment ![Duck](https://cdn.discordapp.com/emojis/1018788794008358942.webp?size=128 "Duck")
topaz widget
#

dbconnect = db_handlers.createDB() that suggests createDB() is returning a string, which is probably not what you want.

timid brook
# topaz widget `dbconnect = db_handlers.createDB()` that suggests `createDB()` is returning a s...

yea, it was due to my sloppy error handling that was very basic. Which basically createDB() returns error text if it fails. Which worked fine for direct call but not as a library so that was the actual indicator I had it wrong and it came down to trying to do like mydb="data/mydb.db" vs "../data/mydb", etc. So I just need to go ahead and do absolute path and that will solve that problem in the future too.

Basically ive dabbled in python for a while but I wanted to try and like really learn it so in addition to taking a class im making my own program from the ground up and trying to not just look up a code snippet to copy paste but actually figure it out. Fun but also challenging

topaz widget
#

If you start adding 'type hints' a bit at a time to your functions, that will catch a lot of these errors for you automatically.

#

e.g. it would have complained that your createDB function returns two different types conditionally

timid brook
#

hmm good point, Ive done that in a few cases before for troubleshooting but worth doing here. I also just need better error handling for the initial db setup in the first place because if that fails for some reason nothing else really matters 🙂

strange harness
#

j'ai une petite question sur les bases de donnés sql y aurai t il des connaisseur qui pourrai m'aider (probleme de com avec une aplication vbnet)

#

I have a quick question about SQL databases — is there anyone knowledgeable who could help me? (I'm having a communication issue with a VB.NET application.)

timid brook
#

Since this is a Python discord probably not the right place for VB.NET help you might get lucky

topaz widget
timid brook
#

huzzah, db import totally works finally. dealing with preventing duplicates is going to be a whole thing though heh...

topaz widget
#

If you can manage it, it's nice to prevent duplicates via uniqueness constraints on the columns.

timid brook
topaz widget
#

In a perfect world where you have super precise timestamps, you can use that as a proxy for identity.. but life isn't always that easy.

timid brook
digital rune
#

hey i'm using sqlite3, is this the right way of writing this : ```py
cur.execute('SELECT account_name, password from account WHERE account_name =? AND WHERE password =?', (username,password))

terse viper
#

You don't need to do WHERE another time

#

Just AND etc

hallow brook
#

so I found this before and can't find it again. How does sqlalchemy interpret string annotations / expressions? I know it tries to literally eval() them against a namespace, but somewhere they explain this in their docs and I can't find it

timid brook
#

I feel like im overthinking but do I really have to call connect = sqlite3.connect("db") in every instance that I do something with the database? I currently have that wrapped in a function for error handling but it still feels like I'm doing it wrong

storm mauve
restive elm
#

I'm trying to create a project for my resume that would show my ability to use SQL and do some ETL. I'm taking some nested json and putting it into a database. I wanted to get it into a tabular format to put into a staging table and then normalize it. But is this an odd thing to do? Should I not be trying to put it into relational database if its nested SQL (not in a records/row format, more of a keys with dictionaries as values that can potentially have their own dictionaries). Ive heard NOSQL may fit it better. But I'm down to make a relational database out of it as long as its not seen as "bad".

#

{'name': 'Not in Jojo but should be a stand name', 'lastModifiedDate': '2024-11-20', 'items': [{'track': {'trackName': 'Can You Stand The Rain', 'artistName': 'New Edition', 'albumName': 'Heart Break', 'trackUri': 'spotify:track:1vbe9sh7U9vdMceAhitWr2'}, 'episode': None, 'audiobook': None, 'localTrack': None, 'addedDate': '2024-11-20'}, {'track': {'trackName': 'This Place Hotel (a.k.a. Heartbreak Hotel)', 'artistName': 'The Jacksons', 'albumName': 'Triumph', 'trackUri': 'spotify:track:2eADspwGh5Ad8tDoJ2bHsR'}, 'episode': None, 'audiobook': None, 'localTrack': None, 'addedDate': '2024-11-20'}], 'description': None, 'numberOfFollowers': 0}

This is an example of just one piece of data from the json I have. Its basically a bunch of these.

topaz widget
#

It’s not “bad” it just requires analysis to turn into a good SQL schema

#

For example do you want an “albums” table or just a label on “tracks” etc

restive elm
topaz widget
#

Yeah, makes sense. A playlist has many songs, a song has many streaming events/listens.

restive elm
topaz widget
#

(By “many” I mean 0 or more)

#

Now, if each of these songs had different attributes, that’s when NoSQL might make more sense

restive elm
#

Oh nah, they very consistently have the same stuff. Its just the formatting being a little annoying to read in.

#

Thank you for your time and help 🙂

topaz widget
#

No problem, good luck!

junior mulch
#

Not sure where to ask this question, I think here is the correct topic:

I have 3 folk stories that I want to create an inverted index for (attach word to files that the word is present in). To do that, I created a dictionary that has words as keys and indices of those stories as values. Now I want to save it as .json, so that my programme would not run it all the time.

The only way I know how to do that is by writing the dict into pl.DataFrame and then .write_json on the DF. Is there any more "meta" way to do it that you would recommend?

#

This is the current iteration (cast_to_df actually creates the dict)

topaz widget
#

You could do json.dumps(word_index.items()) to get back the JSON string and then write that to disk.

junior mulch
timid brook
# storm mauve depends on what you are doing, some projects can reuse the same connection or re...

That makes sense and I did also realize I had stuff too spread out and too specialized when I didn't need it to be so consolidating to like a single function for writing and then called by others to handle different types of writes helped a lot with keeping the db interaction cleaner. Now just need to figured out what I'm doing wrong that results in no data for a query lol.
Been over a decade since I last did any SQL and that was with PHP so basically I remember nothing

storm mauve
#

make sure you're using transactions and committing at the end of each

timid brook
sudden torrent
#

yo

delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied timeout to @sudden torrent until <t:1744296396:f> (10 minutes) (reason: burst spam - sent 8 messages).

The <@&831776746206265384> have been alerted for review.

timid brook
#

good bot

And I do in theory know im doing this right as from what I see in the documentation if I was querying a nonexistent table my result.fetchall() would return None instead of an empty tuple

timid brook
#

welllll I tore it all down and started again. and directly in py I got it working now so it must be the queries themselves. what is odd is even directly in the interpreter previously it wouldnt work either
update well I figured it out though not sure on the correct adaptation going forward but it seems that when using cursor.executescript() it must be a script, a single command doesnt appear to work. I was only using that as the docs said that cursor.execute() was being deprecated.

update2 ok that is not entirely correct though I am not sure why. I have an INSERT query that operates just fine with executescript but the SELECT does not.

grim vault
#

What? cursor.execute() is not deprecated. You just get a warning if you do not use a dictionary with named parameters. And as for .fetchall() will return an empty list [] if no rows are found and .fetchone() will return None if no row is found.

timid brook
#
[('Jan 5', ' Applebees', 123.12, ' ', ' '), ('Feb 4', ' Best Buy', 999.95, ' ', ' '), ('Apr 19', 'King Soopers', 225.33, ' ', ' '), ('Aug 10', 'INTERNETCHARLOTTESVILVA', 89.0, ' ', ' '), ('Aug 10', 'BEER & WINE', 17.29, ' ', ' '), ('Aug 10', 'GOOGLE *YouTubePremiumg.co/helppay#CA', 14.94, ' ', ' '), ('Aug 12', 'STEAMGAMES.COM ellevueWA', 30.74, ' ', ' '), ('Aug 13', 'THE HOME DEPOT', 84.93, ' ', ' ')]

>>> print(readCursor.executescript("SELECT * FROM expenses2024").fetchall())
[]
grim vault
#

I'm wondering that you can fetchall() on an executescript at all. A script normally consists of more than one sql statement. For which one should a result list be done?

timid brook
#

And yea using executescript instead of execute was my mistaken reading of what the deprecation refered to

grim vault
#

I would say just like for executemany() any result list is discarded for executescript()

timid brook
#

ok so basically I was right in thinking I used the wrong function just not the reason why. there is no return to fetchxxx() if I am executing something wiht multiple commands since it doesnt know what I want to get back
edit and I didnt notice until now because executescript for insert I didnt care about a return value outside of it working so there was no fetch() there to tell me it was wrong

hot finch
#

Hi, I've faced very strange issue:

def create_metering(
    metering: metering.MeteringCreate,
    db: Session,
    current_user: user.User,
    account_id: int,
    background_tasks: BackgroundTasks
):
    # Validate metering record
    valid, rec = is_metering_record_valid(account_id=account_id, request=metering, db=db)
    if not valid:
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail=f"Please submit record with Customer: {metering.customer},"
            + f" Product: {metering.product_code}, "
            + f"Dimension: {metering.dimension_quantity} in next hour, "
            + f"current hour has a record submitted at UTC {str(rec.created)}",
        )

    details = metering.details.__dict__ if metering.details else None
    mtrng = Metering(
        account_id=account_id,
        customer=metering.customer,
        customer_aws_account_id=metering.customer_aws_account_id,
        dimension_quantity=metering.dimension_quantity,
        org_id=metering.org_id,
        metering_type=metering.metering_type,
        product_code=metering.product_code,
        updated_by=current_user.email,
        details=details,
    )

    if metering.agreement_id:
        agreement = db.query(Agreement).filter(Agreement.identifier == metering.agreement_id).first()
        if not agreement:
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="Agreement not found",
            )
        mtrng.agreement = agreement

    db.add(mtrng)
    db.flush()
    print("------------------------------------------")
    print(mtrng.__dict__)
    # if mtrng.details is None:
    #     account = db.query(Account).filter(Account.id == account_id).first()
    #     if account:
    #         mtrng.no_of_times_submitted += 1
    #         background_tasks.add_task(get_metering_processor(account=account, db=db).process_metering(mtrng))
#

This code is really strange

#
def create_meterings(
    account_id: int,
    request: list[metering.MeteringCreate],
    db: Session,
    currentUser: user.User,
    background_tasks: BackgroundTasks,
):
    account = db.query(Account).filter(Account.id == account_id).first()
    log.info(
        REPO_LOG_FMT,
        account.org.name,
        account.name,
        currentUser.email,
        ActionEnum.ADD.value,
        request,
    )
    if account.cloud == Cloud.AWS:
        if not (account.status == "UPDATING" or account.status == "OK"):
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="Invalid Role Arn",
            )

    errors = []
    passed_data = []

    for req in request:
        try:
            create_metering(req, db, currentUser, account_id, background_tasks)
            passed_data.append(req.dimension_quantity)
        except HTTPException as e:
            errors.append(str(e.detail))

    db.commit()
    data = db.query(Metering).filter(Metering.account_id == account_id, Metering.product_code == request[0].product_code).first()
    print("+++++++++++++++++++++++++++++++++++++++")
    print(data.__dict__)
    if errors:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail={"errors": errors, "passed_data": passed_data},
        )
    
    return {"Message": "Meterings Created Successfully"}
#

When I use this data for metering creation i can eventually see customer_aws_account_id in db

[
  {
    "org_id": "2",
    "agreement_id": "agmt-8dxk1wcle1p3bznb3krotuqkk",
    "account_id": 1,
    "dimension_quantity": "[{\"Dimension\":\"dim1GbpsAPI\",\"Quantity\":222}]",
    "metering_type": "individual",
    "product_code": "2co1o9ighmy1kkuivy6rnbtga",
    "customer_aws_account_id": "607485323993"
  }
]
#

But when I use below data for metering creation I noticed that customer_aws_account_id isn't saved db even other fields are saved in db

[
  {
    "account_id": "1",
    "org_id": "2",
    "metering_type": "individual",
    "agreement_id": "agmt-8dxk1wcle1p3bznb3krotuqkk",
    "customer_aws_account_id": "607485323993",
    "product_code": "2co1o9ighmy1kkuivy6rnbtga",
    "dimension_quantity": "[{\"Dimension\":\"dim1GbpsAPI\",\"Quantity\":11}]",
    "details": {
      "is_schedule": true,
      "is_recurring": false,
      "schedule_at": "2025-04-16T15:00:00-03:00"
    }
  }
]
#

I'm using FastAPI with SQLAlchemy

topaz widget
#

You aren’t capturing the return value of “create_metering”; what is that function doing exactly?

hot finch
#

@topaz widget I've already solved problem. Thanks for your help.

hollow iron
#

Hi all, can someone please recommend a good python library which allows me to test my postgresql queries made in my python module? I had a look at few and they looked like they required a postgresql server to be running in the background. I don't want to keep a postgresql server running to test, rather I would like the library to spin off the server during testing and direct any psql qeuries made by my module to this test server

#

Appreciate any help 🙂

topaz widget
#

Hmm, that's a lot easier to do with something like SQLite than it is with PG, just because the PG server infrastructure is fairly large.. Would you be OK with a solution that used Docker?

#

The github page for the python version has sqlalchemy/postgres as its example in the README

#

(Huh, this even supports NATS. Neat.)

hollow iron
#

Thanks for the quick response

wise goblet
#

docker-compose defines stuff in yaml, which postgres, quickly on up to raise, and on docker-compose down --volumes complete destruction (if not used volume mapping to host directly)

#

testcontainers in comparison allow a more flexible control through, u could be cleaning postgresql between your modules for examples, having it reraised more than once for testing session

#

nobody forbids running drop db between tests with docker-composed postgres too though

#

i am hesitant to use testcontainers just for the sake of not putting more libs to my apps -_- than less libs is used then better

topaz widget
#

testcontainers would be nicer if you have multiple dependencies to worry about, vs. having to deal with every combination you might want in docker compose. If you just have a single big daemon to care about, I agree docker compose might be less trouble.

wise goblet
topaz widget
#

Yeah that's pragmatic

restive elm
#

hey again ppl. So for context, I've got json data, and im working with python and sqlite. The goal is to practice some ETL. Im trying to understand proper procedure for loading data into a database. My question is, do people typically load an entire CSV into a database all at once, or are things inserted one record at a time?

I've transformed the raw json data I'm working with into a proper table. But I'm wondering if I should loop through it and INSERT one at a time, or is it better to just load it entirely via Sqlite3's import. I've heard of batch vs streaming.

deft apex
#

is it better to use docker or to install my own server for any dbms

hollow iron
wise goblet
#

Docker very strongly simplifies all that

#

Baremetal postgres installation is a big pain in comparison

hollow iron
#

Ahh I see

#

Okay will read up on docker more

#

I have a question, maybe I’ll figure it out while reading, but docker would be heavy to run? I’ve 16gb ram laptop. If it is, then it seems testing setup is heavy, but it also seems inevitable. Have to figure out how to run docker on GitHub actions for my testing also

wise goblet
#

So the weight of running postgres in docker is equal to regular postgres for 99.9%

#

But only if u a using Docker Engine, real docker runs without VMs

#

Docker Desktop is entirely another story. Windows/macOS abomination made to support those oses with GUI in proprietary way. It comes with VM and heavy

#

Possible should be to use even in wsl2 docker engine normally

#

I just work from Linux and normal for me running dozens containers

#

At some point went crazy and had running even more than half hundred at least