#databases

1 messages ยท Page 34 of 1

maiden verge
#

@real jungle

midnight karma
#

im sorry i dont know these things

maiden verge
#

Can u tag

#

Who know

#

@stray gulch

#

Hello brother

#

@livid solstice

livid solstice
#

please don't ping random people.

maiden verge
livid solstice
#

no

maiden verge
brisk flame
#

What do you think about an ORM dedicated only to creating tables, validating and displaying those tables via GUI? I want to create such a tool for fun, because I think ORM is mainly helpful for table management and validation, but imo creating queries in raw SQL is easy enough that I don't need this functionality.

keen minnow
mint wharf
#

I'm trying to learn AWS features in prep for an interview, but I don't know how to create a database without costing me. It says it's in free tier, but both options for configuring Dynamo and RDS are saying they'll cost me. Any help?

wise goblet
wise goblet
#

i don't use any ORMs only if i develop tool... which as basis i build around smth which has docs providing me with Raw SQL only

#

in this case i am lazy to translate to ORM, and can go raw, since usually this tool is small enough anyway and i would not benefit from using ORM

#

otherwise if it is any even small tool that has a lot of CRUD stuff... then ORM is obligatory for me to simplify things and having type safety

#

TLDR: The essential problem is with Raw SQL that it is unsafe.

#

but at the same time it could be nice having tool like Atlas that defines tables, helps with migrations

hallow pelican
#

Hello,

I'm going to start a project with FastApi, and it would also be my first Python project, so I would like some advice about the stack.
I was going to use SQL Alchemy because it was what I had seen a lot, but then I read about Prisma Python Client, and I've already used Prisma but in Node and it's great, how is it with Python?

wise goblet
#

Atlas kind of sucks in having almost everything behind paywall

#

would be nice having similar tool as FOSS

wise goblet
#

at least that's how my Prisma experience went i tried to check its usage for golang client

#

better use Python native solutions like SQLAlchemy or Django ORM

#

they will provide with far better experience, because both natively intergrate with Python

#

and provide even full static typing capabilities (in case of SQLAlchemy ORM)

#

Prima will double suck, because its models aren't even described in Python, but in its own custom language/files

hallow pelican
wise goblet
#

and as far as i see Python Prisma Client has not really good level of... type safety or providing intellisence support in general

#

so it will be... full of runtime surprises thing

#

that is at least what i see from short Prima offered doc examples, and is suspect there is nothing more to it ๐Ÿค”

#

Although Prisma does have Pyright integration

#

may be it is not that bad

#
    post = await db.post.create(
        {
            'title': 'Hello from prisma!',
            'desc': 'Prisma is a database toolkit and makes databases easy.',
            'published': True,
        }
    )

Still

#

Prisma example has DICTIONARY as input

#

that's nightmare

#

i can't seem them being capable to catch it with Pyright

hallow pelican
#

I'd rather go the safe route then since this api will move money hehe

wise goblet
#

so your python devs will be way more surprised too if u go with it

#

SQLAlchemy has nice module called Alembic to manage migrations and even can auto generate them based on introspection of a database for you
Should be quite good

hallow pelican
#

thank you for taking the time to reply dude

candid mica
#

hello does anyone know to link sqlserver with django python

maiden verge
maiden verge
cedar tiger
delicate fieldBOT
last panther
#

ok so i'm in a debate with this guy at work for large mysql tables. ๐Ÿ˜ฆ

#

to either partition large table or create explicit new archive table

coral wasp
#

One big reason is recovery: I don't want to have to restore massive tables to get my app back online.

last panther
coral wasp
#

No, why would I need to union? If I have some operational data that I also need a history for, I keep it separate. So I only need to query one side

#

I'm eating the cost of two writes for the benefit of query time

last panther
#

We have an app that would have to query the current and archive .

coral wasp
#

We're just talking abstractly right now, could you explain more?

last panther
#

we have a large table. and we want to either put records < 2020 in a different table or create a partition on the table by the date field with <2020

#

We have a use case that would have to scan all partitions or union all tables.

#

a query by only id and not a date

coral wasp
#

Nothing wrong with just throwing a partition on the year of a history table. Makes it easy to drop a year, or recover specific years.

#

I've done both tho, but problem with a year specific table is needing another one in a few years,
And so on

#

Lately, I've just been putting history in hive partitioned parquet files, but that's a different story

last panther
#

Models don't have to be updated on all the applications.

coral wasp
#

MySQL also isn't so great at parallelizing queries, so I don't love it for large historical analytical queries

last panther
#

This company has been using mysql since the 90's

coral wasp
#

A lot of my MySQL work was working around its scale limitations, a lot of denormalization and 'manual' parallelization

#

Oh, I ran a SaaS backend on MySQL, sharded the heck out of it. We got it done, just needed a lot of love

last panther
#

Thanks for help.

brisk flame
torn sphinx
#

is this stupid

#

i e slow

#
class DuckDBLogger:
    def __init__(self):
        self.con = duckdb.connect()
        self.con.sql("CREATE TABLE metrics (name VARCHAR PRIMARY KEY)")
        self.con.sql("CREATE TABLE steps (step INTEGER PRIMARY KEY)")
        self.con.sql("CREATE TABLE logs (metric VARCHAR REFERENCES metrics(name), step INTEGER REFERENCES steps(step), value FLOAT)")
        
    def log(self, step, metric, value):
        self.con.sql(f"INSERT OR IGNORE INTO metrics VALUES ('{metric}')")
        self.con.sql(f"INSERT OR IGNORE INTO steps VALUES ({step})")
        self.con.sql(f"INSERT INTO logs VALUES ('{metric}', {step}, {value})")
#

i mean yes it is slow but is it slow because it is badly written or because databases won't help with that

#

its actually 5688.87785007 times slower than a nested python dictionary

coral wasp
#

It's the wrong kind of database. Use almost any other database.

#

Preferably, just stream to a .csv file.

#

Also, never write SQL queries with f-strings like that:

#

!sql

delicate fieldBOT
#
SQL & f-strings

Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.

Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.

For example, the sqlite3 package supports using ? as a placeholder:

query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)

Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.

See Also

  • Python sqlite3 docs - How to use placeholders to bind values in SQL queries
  • PEP-249 - A specification of how database libraries in Python should work
fallen vault
#

Would it be bad practice to have my database open to all IP addresses when setting up a azure Postgres database?

thorny anchor
#

yes

frank cloak
#

table.sql is like this:

CREATE TABLE IF NOT EXISTS BLACKLIST(
  user_id BIGINT NOT NULL PRIMARY KEY,
  reason TEXT,
);

vs something like this:

https://paste.pythondiscord.com/I2QA

I am trying to convert my old schema.sql to the original table.sql

#

I could probaly find the old things

#

but it would be way easier to conver it from the schema.sql to table.sql

#

and I'll send the schema.sql if someone replies with ping only
(so I know you responded)

torn sphinx
#

sqlite is still 16.4110429335 times slower than python nested dictionary for logging

torn sphinx
#

next up I try redis

#

nooooo they dont have a redis tutorial on w3s how am I going to be able to understand it

#

"Redis is not officially supported on Windows"

#

we will see about that

torn sphinx
paper flower
torn sphinx
paper flower
#

How would dictionary be useful for logging though? ๐Ÿค”

torn sphinx
paper flower
#

Well, would they be persistent anywhere?

#

How would you fetch them?

#

What operations would you perform on these logs? E.g. finding averages, means, etc

#

Writing logs/metrics isn't that useful unless you can do something with them later

torn sphinx
#

fetching is not an issue, its instant, only writing

paper flower
#

If your application crashes or just exists you lose all of your data though

torn sphinx
#

oh yeah I save logs into numpy savez compressed

paper flower
#

Also memory could be a concern depending on how long your application runs for and how many metrics you collect

torn sphinx
#

true as well

#

but I am trying to find an alternative and everything I tried is much slower

paper flower
torn sphinx
paper flower
#

I mean, you'd have to know how that was packaged, etc, and it would only be accessible with python

#

sql is much easier to use

#

Also regarding write performance - you didn't share how you actually benchmarked it
also I'm not completely sure if you want metrics or logs

torn sphinx
#

thats sort of true about sql, how its packaged depends on database and databases can have different column names

#

this is my benchmarking code

dlogger = DictLogger()
from glio.python_tools import perf_counter_context
with perf_counter_context():
    for i in range(1_000_000):
        dlogger.log(i, "loss", 1.)
        dlogger.log(i, "accuracy", 1.)
        dlogger.log(i, "dice", 1.)
        dlogger.log(i, "iou", 1.)
        if i % 2 == 0: dlogger.log(i, "loss2", 1.)
        if i % 2 == 0: dlogger.log(i, "acc2", 1.)
        if i % 4 == 0: dlogger.log(i, "dice4", 1.)
        if i % 4 == 0: dlogger.log(i, "iou4", 1.)
        if i % 10 == 0: dlogger.log(i, "loss10", 1.)
        if i % 10 == 0: dlogger.log(i, "dice10", 1.)
paper flower
#

So, it's just logs

#

not some sort of metrics?

torn sphinx
#

well it logs metrics

#

dictionary is 2.6316637999843806 perf_counter seconds
sqlite is 44.37015440012328 perf_counter seconds
and duckdb I had to reduce number of iterations to 1000 and it still took 11.377755700144917 perf_counter seconds

paper flower
#

Imo there's a difference between metrics and logs, logs usually are just strings, when metrics could be something more structured ๐Ÿค” Honestly I don't have a good explanation for that

#

For logs you can just use logging

torn sphinx
#

but 1 is replaced by the value of the metric

#

dlogger.log(i, "accuracy", 1.) means at step i metric accuracy was 1

paper flower
#

Seeing your duckdb code - I don't think you need two of those tables at all? ๐Ÿค”

#

Also you're not batching your operations in any way

#

Why insert records one after the other in different sql operations when you can insert like 10 thousands of them at once

torn sphinx
paper flower
#

In your code you just insert the same values into metrics and steps tables, so they're kind of redundant, no?

paper flower
#

Can you share your dict code too?

torn sphinx
#
class DictLogger:
    def __init__(self):
        self.logs = {}

    def log(self, step, metric, value):
        if metric not in self.logs: self.logs[metric] = {step: value}
        else: self.logs[metric][step] = value
near tapir
#

Miscellaneous Data Types

Transact-SQL supports several data types that do not belong to any of the data type groups described previously:

  • Binary data types
  • BIT
  • Large object data types
  • CURSOR (discussed in Chapter 8)
  • UNIQUEIDENTIFIER
  • SQL_VARIANT
  • TABLE (discussed in Chapters 5 and 8)
  • XML (discussed in the previous edition of this book)
  • Spatial (e.g., GEOGRAPHY and GEOMETRY) data types (discussed in Chapter 30)
  • HIERARCHYID
  • TIMESTAMP
  • User-defined data types (discussed in Chapter 5)

Why is TIMESTAMP included in miscellaneous? I thought it would be a temporal data type

torn sphinx
#

is this correct in sqlite "INSERT INTO logs (step, ?) VALUES (?, ?)"? It says OperationalError: near "?": syntax error, even though I passed it a length 3 list

#

before I had "?" not being replaced because of quotes

paper flower
#

Also you can always send logs in the background thread or make use of asyncio

torn sphinx
paper flower
#

(I was using an orm for this example, so this may be the reason for it being slow ๐Ÿ˜…)

torn sphinx
#

still weird, why would this not work self.cur.execute("ALTER TABLE logs ADD COLUMN ? FLOAT", [metric, ]), it says OperationalError: near "?": syntax error, even though metric = "a", why would it not subsititute

#

this works fine self.cur.execute(f"ALTER TABLE logs ADD COLUMN {metric} FLOAT")

#

man substituting is so weird in sqlite

paper flower
#

Just tested, and simply inserting all records into a sqlite db shouldn't take that long

#
import time
from collections import defaultdict

from sqlalchemy import create_engine, insert
from sqlalchemy.orm import Mapped, mapped_column, \
    sessionmaker, DeclarativeBase


class Base(DeclarativeBase):
    pass


class Metric(Base):
    __tablename__ = "metric"

    id: Mapped[int] = mapped_column(primary_key=True)
    iteration: Mapped[int]
    name: Mapped[str]
    value: Mapped[float]


engine = create_engine("sqlite:///db.sqlite3")
session_factory = sessionmaker(bind=engine)


class DictLogger:
    def __init__(self):
        self.logs = defaultdict(dict)

    def log(self, step, metric, value):
        self.logs[metric][step] = value


def main() -> None:
    Base.metadata.create_all(bind=engine)

    t = time.perf_counter()
    with engine.begin() as connection:
        logger = DictLogger()
        for i in range(100_000):
            logger.log(i, "loss", 1.)
            logger.log(i, "accuracy", 1.)
            logger.log(i, "dice", 1.)
            logger.log(i, "iou", 1.)
            if i % 2 == 0: logger.log(i, "loss2", 1.)
            if i % 2 == 0: logger.log(i, "acc2", 1.)
            if i % 4 == 0: logger.log(i, "dice4", 1.)
            if i % 4 == 0: logger.log(i, "iou4", 1.)
            if i % 10 == 0: logger.log(i, "loss10", 1.)
            if i % 10 == 0: logger.log(i, "dice10", 1.)

        values = []
        for metric, obj in logger.logs.items():
            for step, value in obj.items():
                values.append({"name": metric, "iteration": step, "value": value})

        connection.execute(insert(Metric), values)
    print(time.perf_counter() - t)


if __name__ == "__main__":
    main()
#

You can swith to just sqlite here, but I'm just used to sqlalchemy

#

It still takes around 10 times longer, which is to be expected, but you can parallelize your operations and writes

torn sphinx
#

I just tried 1 table batched yeah it takes 10 times longer

#

I need to try asyncio now

paper flower
#

asyncio would be beneficial if you just do that in the background

coral wasp
torn sphinx
coral wasp
pastel vale
#

Someone here, much smarter than me, will explain it better

whole mica
#

I am currently using MongoDB for saving data and PyMongo in Python.
Here is what my data look like:

{
    "_id": 123,
    "weapons": [
        { "name": "Sword", "durability": 100 },
        { "name": "Shield", "durability": 50 }
    ]
}

{
    "_id": 456,
    "weapons": [
        { "name": "Sword", "durability": 100 },
        { "name": "Shield", "durability": 0 }
    ]
}

{
    "_id": 789,
    "weapons": [
        { "name": "Sword", "durability": 50 },
        { "name": "Hammer", "durability": 50 }
    ]
}

how can I filter out a list of "_id" that does not have a "name": "Hammer" in its "weapons" ( i.e. [ 123, 456 ] )?

coral wasp
#

The trailing comma makes a difference

cedar tiger
whole mica
#

or i think it can be a cursor using find() because i would like to add a hammer to each of those who dont have it

cedar tiger
#

then loop over each dictionary ids and loop over each weapons name to find the Hammer. If its not Hammer, add it to a new list

#

oh you wanna add a hammer to ids that don't have it

whole mica
cedar tiger
#

yeah then I guess do that double loop I mentioned and then append hammer to it after

cedar tiger
whole mica
grim vault
# whole mica oof alright thanks lemme have a try

!e

data_list = [
  {"_id": 123, "weapons": [{"name": "Sword", "durability": 100}, {"name": "Shield", "durability": 50}]},
  {"_id": 456, "weapons": [{"name": "Sword", "durability": 100}, {"name": "Shield", "durability": 0}]},
  {"_id": 789, "weapons": [{"name": "Sword", "durability": 50}, {"name": "Hammer", "durability": 50}]}
]


def check_for_weapon(weapons_list, weapon_name):
    for weapon in weapons_list:
        if weapon["name"].upper() == weapon_name.upper():
            return True
    return False


weapon_all_should_have = "Hammer"
for row in data_list:
    if not check_for_weapon(row["weapons"], weapon_all_should_have):
        print(f"{row['_id']} needs a {weapon_all_should_have}")
delicate fieldBOT
whole mica
#

so it still needs a double loop

paper flower
whole mica
paper flower
#

I think you can add an and condition, too there

#

No idea how to do it ๐Ÿ˜…

#

I don't use mongodb at all, mostly just sql

#

To only get ids you can add a projection:

db.collection.find({
  "weapons.name": {
    "$nin": [
      "Hammer"
    ]
  }
},
{
  "_id": 1
},
)
paper flower
#

e.g. sword with durability 50

paper flower
#

Just

db.collection.find({
  "weapons.durability": 50
},
)

?

#

It's kind of the same

#

Seems like in firs query you can do the same instead of using nin operator

#

Ah, no, nvm ๐Ÿ˜…

whole mica
#

i thought like weapons.name and weapons.durability wont work if weapons is a list of objects, but came out i was wrong

paper flower
#

If you want a specific weapon youl could use $elemMatch:

db.collection.find({
  "weapons": {
    "$elemMatch": {
      "name": "Sword",
      "durability": 50
    }
  }
},
)
#

And you should be able to use comparison operators too instead of just 50 if you need that

whole mica
#

thanks a lot!! ๐Ÿ™Œ

west blaze
#

Hi! Can someone tell me if it makes sense in context of django/psql to create a generic model class with fields like created_at, deleted_at and perhaps some helper classmethods like create/delete for other actual models to inherit from?

paper flower
#

Django's managers already have create/delete methods, and created_at is supported by django itself

#

For deleted objects you can just mark them as deleted instead of calling delete

west blaze
paper flower
keen ravine
#

I am starting to work on a timescaledb project(db built on top of postgres for time series). I am working on my wsl2. I just pulled the official image. I want to know if I am not using python, how do I work on package management and dependency issues? My PC is trash so I do not want to trash it further by keep installing on my local, and I do not want to use virtual machine.

storm mauve
#

Which "package management and dependency issues" exactly?

It depends heavily on which manager you are using

fading patrol
keen ravine
fading patrol
# keen ravine Someone told me to use vscode devcontainers, I want to develop, scale and deploy...

Why should you run a container? Precisely because you said you're having trouble managing dependencies and that you don't want to install things locally. What other solution could there be?

I'm assuming you've already looked at venv. If that's good enough, then that's what you should use. Maybe layer Poetry on top of that if you really want. But to me Devcontainers is just simple and easy.

#

BTW, this has absolutely nothing to do with this channel, please move it over to #tools-and-devops if you have more questions

dry cipher
#

how can I use python to create an edit a txt file, for example I can put save data. however the open() wont let me do it is there a way to do it without root access

fierce kite
dry cipher
fierce kite
#

Looks like permissions for the location you are saving the file. I haven't run into that issue.

dry cipher
#

hmmm its in a foulder that i created

fierce kite
#

Maybe it is the permission level of the python interpreter

#

Is the file being saved in the same location as the .py file?

dry cipher
#

yea

obsidian basin
#

Hello I am using flask and flask sqlachemy. To create my db I am using flask migrate I am wondering if there are better and more advance ways like postgresSQL. I am going to research postgresSQL more but will it work with flask and create the db with flask sqlachemy ? Also is it easy to setup in flask.

wise goblet
#

but your migrations created for Sqlite3 highly likely will not work for it ๐Ÿ˜‰

#

so you would have to "regenerate" migrations

obsidian basin
#

If I am understanding correctly that okay I haven't launched a site so it is nothing important

#

Thanks

#

Do you think postresql is the best tool to use

wise goblet
# obsidian basin Do you think postresql is the best tool to use

that depends on a project and somewhat even programming language.
for web related pet projects which have user database, Sqlite3 is kind of more lazy option to go for as default
if you utilize a good programming language and project is simple, depending on its data... less conventional dbs like redis or mongodb can fit too, or even just storing in memory of a program (+using some of its native storing in file) and that's enough too. some projects run on stuff like etcd and etc, it is very app specific.
if you develop backend api for production for work (once again we have user database we wish to persist), then postgres is good default.

TLDR: it depends on project requirements.

#

Postgresql is nice in allowing to bruteforce its usability for everything

#

but... for simple enough pet projects it is too much, if u are lazy to maintain it, sqlite3 is better then

obsidian basin
#

Thanks it is just a pet project which I hope grows but might not , I am just an amateur

paper flower
#

But postgres ones if you postgres specific data types won't

tawny orchid
#

What should I do to make this query go fast?

SELECT * FROM words
WHERE
    (book_number, chapter_number, verse_number) IN (SELECT book_number, chapter_number, verse_number FROM words WHERE word LIKE 'jeesus%')
AND
    (book_number, chapter_number, verse_number) IN (SELECT book_number, chapter_number, verse_number FROM words WHERE word LIKE 'elรครค%');

Db is sqlite, and it takes over 200ms to respond from backend. I mean to run it on every key-stroke, though, but at least once second (or two ๐Ÿ˜„ ). ChatGPT suggested some bizarre join-gadget which was meant to eliminate subqueries, but it just worsened it. :O.

storm mauve
tawny orchid
#

i don't use regex. But I would need to transform the database

storm mauve
#

How many rows do you have in that table?

tawny orchid
#

bible words

storm mauve
#

How many rows does that amounts to in your db?

tawny orchid
#

some 54k

#

oops

#

*10

storm mauve
#

so you expect to perform around a million string comparisons for each keystroke?

idk if there is some simpler way to create a text index, but I am pretty sure you will have to transform the database

grim vault
#

This could be faster: deleted doesn't work like I thought.

tawny orchid
#

Mmm... i really like the idea of of having flexible search i get with this. One can write words, and the wording need not be exact

storm mauve
#

also, side note: storing each word as a separate row sounds like a pretty bad idea in first place but idk the best/right way to do it besides "use FTS"

tawny orchid
#

full text search doesn't give that?

#

fts?

storm mauve
#

the thing I linked earlier

tawny orchid
#

ah

#

yes

#

i mean word counts if you start it correctly. It has to be 3 or more chars (or it won't generate sql for that). Words can be out of order too

#

๐Ÿ˜ฎ So i prefer not to have full text search, because this is more flexible

#

maybe i just don't send every keystroke, but instead set a timer and after a time make query

#

maybe im just imagining things and i should change the db ;D

storm mauve
#

at least consider de-duplicating if you haven't yet I guess

tawny orchid
#

(just making sure)

#

i haven't :< I just edited something and plรถrรคytin it from pandas to db ;D

waxen finch
waxen finch
tawny orchid
#

i forgot --release ๐Ÿ˜ฎ It's still slow, but 3x faster still than before. Thanks for fts links ๐Ÿ˜„ I just kind of assumed it wasn't suitable and thus made the db one word per line. Why is it always I only learn after the mistakes. ๐Ÿ˜„

paper flower
#

Hm, you could check how long it takes to just do a select with specific book_number, chapter_number and verse_number?

#

Maybe filtering is just slow

tawny orchid
#

It searches bible verses that contain both search terms

cursive phoenix
#

Hello All, I have a challenge in implementing a scenario. I have 2 databases. first is updating with data for every 10 min interval and the second database is updating 3 times a day. I have to compare data and similarities between these 2 databases. which tools fit for this scenario and how to do it

delicate fieldBOT
#

:warning: Your 3.12 eval job has completed with return code 0.

[No output]
fading patrol
# cursive phoenix Hello All, I have a challenge in implementing a scenario. I have 2 databases. fi...

If these don't absolutely need to be two separate databases, they probably shouldn't be. Why are they separate?

If you really need to sync two unrelated databases with two different schemas there are discussions online about the possibilities like: https://softwareengineering.stackexchange.com/questions/288370/best-way-to-synchronize-data-between-two-different-databases

west blaze
#

Hey everyone! I got a quick question about the json module.


def populateLocations(apps, schema_editor):
    file = open('./data/geo_list_1.json')
    locations_data = json.load(file)

    for location_data in locations_data:      
        
        location = location_data
        Location.populate(**location_data)
    file.close()

Do I need to .close() the file after the loop, as shown above, or can I .close() the file right after it loads?
Also are there any constraints keeping the file "open" for a while?

grim vault
#

You can close it after the load but maybe use the with statement:

def populateLocations(apps, schema_editor):
    with open('./data/geo_list_1.json') as file:
        locations_data = json.load(file)

    for location_data in locations_data:      
        Location.populate(**location_data)
tranquil island
#

writes full stack app in lua wishing I could use python

near tapir
#

Hey guys I understood the basic concept of CLUSTERED and NONCLUSTERED in MS SQL but I am unsure when's the right time to use either of the two. And I'm wondering if there are any disadvantages between them?

coral wasp
#

Clustered indices?

near tapir
#

In Transact-SQL, the names of all data types and system functions, such as CHARACTER and
INTEGER, are not reserved keywords. Therefore, they can be used to denote objects. (Do not
use data types and system functions as object names! Such use makes Transact-SQL statements
difficult to read and understand.)

They say it isn't a reserved keyword but why is it highlighted in my case? (I'm using SQL Server)

torn sphinx
#

mongoDb is the best ngl

fading patrol
brazen charm
#

old but gold

cinder spear
thorny anchor
near tapir
#

Hey guys, I'm trying to use MS SQL but there are so many variations? I don't know which one to choose.

astral harbor
#

Hello there

#

I hope you are all doing well.

#

I am new and I need help, can someone please help me? I have a problem with relationships in my sqlalchemy database.

fading patrol
fading patrol
# astral harbor

It's warning you that whatever is at line 31 of your file is depricated. It should still work for now but you should put that error into a web search to find the relevant documentation, and then look at your code

fading patrol
astral harbor
deep heron
#

I could use a pointer, if someone knows, on how to properly handle excluding zero dates (0000-00-00) and NULL dates in MariaDB using SQLAlchemy.

def _build_search_query(self, table, fields, term, like):
        term_string = str(term)
        search_conditions = []
        query = smc_database.sql.select(table)
        
        for field_name in fields:
            term = term_string
            field_attribute = getattr(table, field_name, None)
            if field_attribute:
                field_type = self._get_field_type(table, field_name).lower()
                if "int" in field_type or "float" in field_type or "decimal" in field_type:
                    try:
                        term = float(term) if "." in term else int(term)
                        search_conditions.append(field_attribute == term)
                    except ValueError:
                        continue
                elif "date" in field_type or "time" in field_type:
                    term = self._parse_date(term)
                    if term:
                        search_conditions.append(
                            smc_database.sql.and_(
                                field_attribute == term,
                                field_attribute != None,
                                field_attribute != "0000-00-00"
                            )
                        )
                else:
                    if like:
                        search_conditions.append(field_attribute.like(f"%{term}%"))
                    else:
                        search_conditions.append(field_attribute == term)
        
        if search_conditions:
            query = query.where(smc_database.sql.or_(*search_conditions))#.limit(500)
        print(f"\n{query}\n")
        return query
#

The part in question is elif date. If I understand sql correctly, I think the generated sql query is correct. (Excerpt)

WHERE `SMC_Billing`.`Customer_No` = ? OR `SMC_Billing`.`Invoice_Date` = ? AND `SMC_Billing`.`Invoice_Date` IS NOT NULL AND `SMC_Billing`.`Invoice_Date` != ? OR `SMC_Billing`.`Item_No` = ?   
2024-08-19 12:11:41,935 INFO sqlalchemy.engine.Engine [generated in 0.00169s] ('2016-09-23', datetime.date(2016, 9, 23), '0000-00-00', '2016-09-23')
paper flower
deep heron
#

I might break it up a bit once it's all working. This is a general search program where the user can type a string, number, date, or date object, and it has the ability to search every column for something like it. It's complex because it's attempting to match data type as best as possible.

deep heron
#

Funny enough, I think it was working the whole time, but I didn't realize it. Pointers in general would still be great though.

deep heron
paper flower
deep heron
#

It's for any model.

#

The idea is that I can set it up so if I need to give a department access to edit fields of our core database, I can just make a little program that imports the module that includes this. Then I call with the necessary fields, which fields have which permissions, and then display a search followed by an output display.

#

For very simple views and edits, I should add. Anything more complicated deserves more program, but can still use this module for whatever would be common. It's part of a set of core modules I'm writing for the company.

coral wasp
#

The or and and there makes me wonder if you got the order of operations right

paper flower
#

Maybe like this:

from datetime import date

from sqlalchemy import inspect, select

from app.db.models.manga import Manga

mapper = inspect(Manga)
stmt = select(Manga)
for column in mapper.columns:
    try:
        python_type = column.type.python_type
    except NotImplementedError:
        continue
    if issubclass(python_type, date):
        stmt = stmt.where(column != "0000-00-00")

print(stmt)

change the model for your own here

#

But honestly I'd work on fixing your db to exclude these zero dates ๐Ÿ‘€

#

Honestly I'd make a separate set of searchable columns for each model which you'd apply your search term to

stray cipher
#

class User(UserMixin, db.Model):
# ...
following: so.WriteOnlyMapped['User'] = so.relationship(
secondary=followers, primaryjoin=(followers.c.follower_id == id),
secondaryjoin=(followers.c.followed_id == id),
back_populates='followers')

#

I don't get the logic of the secondary join, it makes sense that we're joining the user table with the assoiciation table where the follower_id == id. But I'm not sure what the secondaryjoin does

deep heron
#

If it works, that's the beauty of it.

#

And the database is full of garbage data from poor imports when the tables were created, so I need to account for that, too. Including the zero dates.

coral wasp
paper flower
#

In case you're using a secondary table secondaryjoin establishes how you join from your secondary table to a child model, and primaryjoin how you join from this model to a secondary table

#

In most cases you don't need to use primary or secondary join attributes as sqlalchemy can figure out how a typical relationship should work

outer yacht
#

I'm testing some python automation for some reports I am tasked with. I have some excel sheets that connect to our MS-SQL databases, but I am not too sure about how to connect to them with my python environment. It's health data, so the security can be weird. Does anybody know how I can generate a sqlalchemy connection strings? I have the connection string that works in excel but I don't exactly understand how to transfer that to something python can understand

deep heron
#

I wish I could just clear them out, but there's so many garbaged up tables with tens of thousands of rows that I can't possibly go through and fix them.

coral wasp
deep heron
#

๐Ÿค” I might have to take a look at it. I thought it just checked to make sure you didn't return results with repeated information. I might have misunderstood the reading.

coral wasp
#

The way I'd do this in sql is: id write a cte to filter out irrelevant records, then query against that, like: with q1 as (select * from mytable where date is not null and date != 0000....) select * from q1 where conditions...

pastel vale
#

Could someone link me a database project theyve done that can still be understood by a beginner (as in the code is understandable to someone like me, even if parts of the code is above my level)?

#

Also itd help if the code itself has a lot of comments so I can understand what each part functions as? Thank you

deep heron
#

I finally found why I needed the protections against the zero date and null entries. If I am searching for something that isn't a date, when it gets to the date field, I get a "match" with those, even though they don't match.

#

So I added a few checks to block the search and block against those returns as a sort of multi level protection against false positives.

#

So for example, If I search for "WSB-BX-LGCOVER#2" with the protections, I get one result. Without the protections, I get a segmentation fault (I think). There's so many results that the program just kind of... gives up. It exits with no error to catch. I had to create and pull a minidump to find that error.

paper flower
#

You can just add a limit to see the small subset of what you get back from db

deep heron
#

To make sure the results I would get back were actually bad if it weren't for the error. They were.

#

On a related note, is it better to query a date field with a date object? Or the string of said date object? I'm leaning towards object since it would be a closer match in my mind.

paper flower
#

But date would be preferred

deep heron
#

I thought so. Thanks ๐Ÿ˜„

paper flower
deep heron
#

I use this to parse a given date from one of two formats.

def _parse_date(self, date):
        try:
            if "/" in date:
                return datetime.datetime.strptime(date, "%m/%d/%Y").date()
            elif "-" in date:
                return datetime.datetime.strptime(date, "%Y-%m-%d").date()
            else:
                return None
        except ValueError:
            return date
paper flower
#

And you can use stmt = stmt.where(...) instead of collection all the where clauses in a list

paper flower
#

Can't you handle that with orm?

deep heron
#

lol why?

paper flower
#

are you getting these values from db?

deep heron
#

The date to search? Or the date returned?

paper flower
#

Or is that what you're getting from your client?

deep heron
#

The date being parsed is from the user.

#

So yeah, client.

#

And the source of part of my headache with item "numbers" like given before ๐Ÿ˜„

paper flower
#

It's not that bad then ๐Ÿ˜… I would be scared If I had dates in my db as strings and in different formats

deep heron
#

Nono. That at least is one of the correct things about this database. Dates are in date type.

paper flower
#

Honestly if you can - use ISO dates/datetimes

#

And force your client to use that ducky_devil

deep heron
#

I'd love to. I don't have that control. The joke is that some of the data in this database is older than I am.

#

Imported from other sources of course.

#

I'm curious though, why ISO?

obsidian basin
#

Please ping on reply.

I have a few questions about sqlachemy + flask sqlalchemy could someone please answer?

In my database/models I have the RouteToken table and the a column called token.
Can someone think of a better name for RouteToken and the column token.
For example RouteToken and the token column is used when when I have a route like

@email_password_reset.route('/verify_email_token/<username_db>/<token_db>', methods = ['GET', 'POST'])

The reason I want a different name is because RouteToken and token are so similar. Or do you think my reasoning is incorrect.

Another question I have is in RouteToken I want to add a column called ย  ย  attempts_token_tried: so.Mapped[int] = so.mapped_column(sa.Integer, default=0). It would reset to zero if someone tried to create more then 5 tokens. Should I create a 1 to many relationship or just have a 1 to 1 and delete the token each time and keep track with attempts_token_tried?

https://pastebin.com/hYRJgXti

Also how is the 2nd example in the 1 to many link different then the 1st example in 1 to 1 link?

https://docs.sqlalchemy.org/en/20/orm/basic_relationships.html#one-to-many

https://docs.sqlalchemy.org/en/20/orm/basic_relationships.html#one-to-one

fallen vault
#

@coral wasp figured we should move to a dedicated channel. I havenโ€™t played with the idea much yet since it was just meantioned yesterday, but the software would probably have to save locally and preform a scheduled online backup rather than saving in both locations every time a record is created.

coral wasp
#

This can get complicated quickly, just warning; having a central database with local onsite replicas raises a lot of issues. The other approach is many onsite databases that 'push' to a central database. A third approach is a dual db, where transactions are applied across two db's.

pastel vale
simple pendant
#

Hii ,
I am dishant. I work with startups & businesses to find suitable freelancers they require. If you are a web developer or editor you can contact me with your portfolio and pricing

sudden delta
#

is there any way to fix the thrift error when trying to make a connection to a hive database ?

fading patrol
simple barn
#

I need to store JSON in an SQLITE table column. JSON1 extension or Blob? Pros and Cons?

waxen finch
simple barn
waxen finch
#

i think if you want to use BLOBs for the slight performance benefit of sqlite's jsonb format, that's fine

#

but otherwise it's probably more convenient to use TEXT for json strings

(to clarify, converting {"Hello": "world!"} into a BLOB has no benefits over storing it as TEXT, but in 3.45.0+ you can convert it into sqlite's JSONB format using the jsonb() function if you want slightly more compact storage and/or other JSON1 functions like jsonb_insert() to run more efficiently)

deep heron
#

Is there a trick to matching a python float to a MariaDB float? I know there's a slight difference (python 64 bit, MariaDB 32 bit?) and I can't change it to decimal.

#

I confirmed an issue exists when I even ran the query in MariaDB through Heidi, same result. I have a row with a float value, and when I put that float value in a WHERE clause, it returns no results.

grim vault
delicate fieldBOT
deep heron
#

It's already a float by default, right?

#

If you don't specify?

brazen charm
#

python floats are double precision so 64 bit

deep heron
#

Ah. I'm thinking backwards.

brazen charm
#

the driver should cast it correctly tho

deep heron
#

What's weird is that using a mariadb program also doesn't seem to find it.

deep heron
# grim vault !e You could try to convert the double to a float, like: ```py import struct fro...

I was having trouble with that, but I tried this out and it's hitting the matches, at least. It's not the most precise...

def _build_numeric_query(self, attribute, value, conditions):
        try:
            if isinstance(value, int):
                numeric_value = int(value)
            elif isinstance(value, float) or "." in value:
                numeric_value = float(value)
                conditions.append(attribute.between(numeric_value - 0.005, numeric_value + 0.005))
                return None
            else:
                numeric_value = int(value)
        except (ValueError, TypeError) as error:
            numeric_value = None

        if numeric_value or numeric_value == 0:
            conditions.append(attribute == numeric_value)
#

(Right before the return None)

#

This is using SqlAlchemy, btw.

green yew
#

I'm using SQLAlchemy and i think i may have written my app pretty wrong, are you suppost to do:

engine = create_engine(get_db_url())

def nameFromID(ID:int):
    with Session(engine) as session:
        stmt = select(Customer).where(Customer.id == ID)
        result = session.scalar(stmt)
    return result.name

def userFromID(ID:int):
    with Session(engine) as session:
        stmt = select(Customer).where(Customer.id == id)
        result = session.scalar(stmt)
    if not result:
        raise Exception('invalid id!')
    return result

or

engine = create_engine(get_db_url())
session = Session(engine)

def nameFromID(ID:int):
    stmt = select(Customer).where(Customer.id == ID)
    result = session.scalar(stmt)
    return result.name

def userFromID(ID:int):
    stmt = select(Customer).where(Customer.id == ID)
    result = session.scalar(stmt)
    if not result:
        raise Exception('invalid id!')
    return result
outer tulip
#

hi, db masters. In sqlalchemy I got a model SKU that has decimal/number col price and category (str) . However different clients have their discount config in ClientCo.disc_config (json) field.

    def client_price(self, clientco_disc_config: dict[str, dict]) -> Decimal:
        '''returns client price with computed discount for client catalog'''
        cat_config_dict: dict = clientco_disc_config.get(self.category, {})
        cat_disc_str = cat_config_dict.get(CatConfigKey.DISCOUNT.value, '0')
        disc_fraction = Decimal(cat_disc_str) / 100
        return round(self.price * (1 - disc_fraction), 2)

This method works well. but I would like to sort paginated catalog of SKUs based on this computed-disc_config based client_price. As per my learnings in recent day, hybrid_property does not take any extra arguments. What are my options?

outer tulip
#

appears, solved it. Not quite nice, but works.

from sqlalchemy.sql import case
from sqlalchemy import Select, asc, desc

def apply_client_price_sort(stmt: Select, clientco_disc_config, sort_asc: bool = True) -> Select:
    cases = []
    for category, config in clientco_disc_config.items():
        cat_disc_str = config.get(CatConfigKey.DISCOUNT.value, '0')
        disc_fraction = Decimal(cat_disc_str) / 100
        cases.append(
            (SKU.category == category, SKU.price * (1 - disc_fraction))
        )
    client_price_expression = case(*cases, else_=SKU.price)
    sort_order = asc(client_price_expression) if sort_asc else desc(client_price_expression)
    return stmt.order_by(sort_order)
...
skus_stmt = apply_client_price_sort(skus_stmt, clientco_disc_config, sort_asc=False)
uneven lynx
#

Heyo people! Earlier I have only worked with xampp and mysql for a earlier project. But now my current project is getting close to needing a database to store some data. What do you say about using xampp and host it with AWS server? All ideas are appreciated

#

The database needs to hold a number (Id) and under every Id there will be three columns of data (temperature, time and sensors)

#

So I believe mysql would work fine for it but I have no experience having a database cloud hosted. Earlier it was just local

near tapir
#

how can i fix this

pastel vale
#

any book recommendations for learning SQL (particularly books that have projects to immediately apply your knowledge)? Prefer reading books as it gives more structure to my learning. Been using sqlite and python but feel ive hit a wall as, I'm very much a beginner

pastel vale
#

The owencampbell one looks like itll be the perfect resource for me as Ive essentially been learning sql through python

jagged garden
#

hi

#

I have about select data using pandas

#

this line not filter

#

aprovados = escola_selecionada[escola_selecionada['situacao_nome'] ==
'APROVADO']

sudden delta
#

How do I connect to a hive database ?

fallen vault
wicked flame
#

Can I ask a question about vector databases here?

storm mauve
# wicked flame Can I ask a question about vector databases here?

yes, also preferably ask your question instead of asking to ask

if it's off-topic, you'll just get redirected elsewhere
if it's on topic, someone may answer it without the extra delay/compromise of responding to the original message and having you reply with the actual question

blissful yoke
#

Is this a good chat to get help with creating a data logger?

#

I mean, itโ€™s more just a functionality in Python rather than purely this but just wanna ask just in case

#

But basically

#

I have created a data logger for a module system for a raspberry pi to track various modules for a car. This is just context

#

My issue is, Iโ€™m creating a framework for it, so that it would act as a library, you create a logger object, and you have the options to add sources, information you want to log, and those will be consistently updated with a update_source() function

#

There are 2 types of updates, updating a specific source, and updating all sources of the logger, both with the same problem

#

I want it to be as easy as possible, so by simply specifying the name of the source, or just running the update_all_sources() function, it will update and add the values into the list storing each update

#

This is going to be difficult to explain, but how I was thinking was to instead of getting the value, to instead store where the data is being changed, so you would simply just have to read the stored variable and then obtain the value

#

Because if not you would have to plug these variables into the parameters of update_all_sources and that ruins convenience to an extent. Instead do it once when you create the source and then run said code. Just wanted to know if it was possible

#

Sorry for the long post, itโ€™s hard to describe my thought process, hopefully it helps

keen minnow
#

A datalogger is a start, but then how much retention do you need? Who would need access to that? How often would they need access to that? Would they go straight to the RPI or would it go to a DB or a cloud or something else?

halcyon slate
#

how do i make an er diagram for students submit issues

#

and normalize it

hidden creek
#

hi

so i have a many to many relationship
using postgres

#

say a table called records
and a table called meds

#

when the users create a record
they can make any number of med for it
the med can be new or one if the meds already in the database

#

so i need a query the first checks if each med exists or not
if it exists, make the relationship in the middle table
otherwise create it then make the relationship

#

if anyone knows how this can be done in an efficient way

wispy cloud
#

Guys i forgot abt the cursor what does it do

coral wasp
wispy cloud
#

Ooooo

#

Thankuuu

fair coral
#

anyone know any good books on implementing databases? namely looking into learning how distributed keyvalue stores work.
could even just be theory, just a decent book on how databases work

coral wasp
thorny anchor
mental cove
#

Hi, I'm making a login system as a part of a app project I'm working on and I want the app to be accessed publicly. I plan to code it in python using the sqlite3 library and a sqlite3 database to store all of the users's information (passwords, usernames, etc...). Anyway, I was wondering if I have to code it differently as a result of wanting the app to be public ? Do I have to import libraries other than sqlite3 and consider things like protocols or am I ok to just code it normally ?

torn sphinx
#

hi. in my python project i used pure python mysql connector. why do ppl use sqlalchemy or other stuff to work with database??

storm mauve
# torn sphinx hi. in my python project i used pure python mysql connector. why do ppl use sqla...

they abstract SQL away into Python classes and methods, are sometimes more portable than SQL with how each database has a different dialect and can help make database migrations and versioning easier

there are many people that prefer to use raw SQL queries instead of using an ORM though
(the reason varies per person but in general extra tool, extra complexity, extra dependency, more things that can go wrong and that you have to manage, sometimes their queries are less efficient than hand-written queries etc)

wise goblet
# torn sphinx hi. in my python project i used pure python mysql connector. why do ppl use sqla...

SQLAlchemy stuff + Adding things like Alembic + Factory boy can take care for you of

  • Creating SQL migrations
  • Applying not applied to db
  • Easier to unit test code logic (thanks to Factory boy)
  • Having DECLARATIVE table/columns definitions, where u define them once and they are used to autogenerate migrations
  • Having type safety, you could is using recognizable attributes of those ORM class instances
  • Increased uniformness/flexibility to implement more complex in terms of Python syntax itself code logic (raw SQL is pretty much a mess)
  • You can use raw SQL if necessary too, but only if u exhausted all ORM lib capabilities
    Good to choose if u develop fully fledged backend api/service (or just big in code size app. Lets say over 20-40k code lines u are drastically needing that)
    And also IMP very good for any CRUD api
#

On another hand using raw SQL has advantages of:

  • Having increased flexibility to SQL syntax (At the cost of almost no integration with a programming language itself, and no typing safety)
  • Less dependencies involved (u depend only on a single minimal lib!)
    Good to choose if u develop very minimalistic tool that will be maintained only once in several months. Haven't encountered usage cases when else it would be good to use raw only, since u can use it if u need for ORM too anyway
torn sphinx
torn sphinx
wise goblet
torn sphinx
#

why put extra stuff in it

#

even tho i havent learned them yet

wise goblet
#

anyway your app is simple enough to live without ORM

#

student level project after all

#

i would be more excited to see unit tests first in it than ORM

torn sphinx
#

tnx for the advice

ashen crown
#

hack

#

hacking

delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied timeout to @gusty scroll until <t:1724626385:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

quaint mauve
#

Anyone know if My SQL can be used in Jupiter noteboooks on Mac

quaint mauve
#

How do I connect

#

This is what w3 schools says

#

But I donโ€™t know my username

thorny anchor
#

did you run MySQL locally?

quaint mauve
#

Wdym

quaint mauve
visual bough
#

bluds, in my face attendance system, I am using sheety API to store the data in google sheets, it is working well but it can't save images in google sheets, any alternative ?

robust grotto
#

HI, does MariaDB Connector support async operations ? I try to create SQLALchemy async engine with valid URL started with mariadb+mariadbconnector:// using create_async_engine(...) , then SQLAlchemy raises following exception :

File "/path/to/my-project/.venv/lib/python3.12/site-packages/sqlalchemy/ext/asyncio/engine.py", line 1031, in __init__
    raise exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: The asyncio extension requires an async driver to be used. The loaded 'mariadbconnector' is not async

Should I switch to other connector packages such as asyncmy (looks like the only option) ?

[update]
I switched to asyncmy , async things work well

visual bough
#

not blud creating anchors ๐Ÿ’€

somber ember
#

if i have a problem that I was able to resolve, but not able to comprehend the log messages, should I move on?

figured out...

fading patrol
ashen wraith
#

Use a database like MongoDB, MySQL, or Postgres.

#

And store files in a server, store a file as a file.

fathom beacon
#

:D

ashen wraith
#

DDoS?

#

Did I understand you well?

#

You mean Distributed Denial-of-Service (DDoS) Attack?

#

You want malicious code..

#

And the server ToS doesn't agree with you.

#

Rule 5
Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.

fathom beacon
#

Do I need to delete messages?

ashen wraith
#

idk

fathom beacon
#

I think I need to delete

ashen wraith
#

Okay

fathom beacon
#

Why would a person write code if he isn't going to do such things?

ashen wraith
#

To make a billion dollar company and be rich?

fathom beacon
#

Coding a calculator is seriously boring

ashen wraith
#

To take $100K a year?

ashen wraith
#

Did someone tell you that's coding?

fathom beacon
#

Be realistic my friend.

ashen wraith
#

Realistic?

fathom beacon
thorny anchor
ashen wraith
fathom beacon
#

I don't think it's right to continue this discussion from here.

ashen wraith
#

Google Top Results:
The average salary for Software Developer is USD 104,440 per year in the India, MS

fathom beacon
#

:D

#

joke

ashen wraith
#

bad joke D:

#

๐Ÿ˜ก

fathom beacon
#

I know :D

#

where are you from

ashen wraith
#

if you know my github page, Lives in the terminal

fathom beacon
ashen wraith
ashen wraith
fathom beacon
#

:D

#

My goal is not to enter the bank

ashen wraith
fathom beacon
#

That's why my friend went to court today.

ashen wraith
fathom beacon
ashen wraith
#

?

fathom beacon
#

He was released on the condition that he pay back the money he stole to its owner.

#

2.058$

#

I'm not looking for money

#

where are you from @ashen wraith

ashen wraith
#

ok, if you want to be a programmer, I recommend Flask & HTML/CSS, then Django, then beautifulsoup to autosearch for jobs

#

From earth

fathom beacon
ashen wraith
fathom beacon
#

I think yes

#

Do you want to help me with this?

ashen wraith
#

no

fathom beacon
#

:(

ashen wraith
#

it's illegal

fathom beacon
#

you will just teach

ashen wraith
#

i'm wondering deleted messages are viewable by maintainers

fathom beacon
#

anyway thank you

#

o7

ashen wraith
#

thank me for what?

next shadow
fathom beacon
fathom beacon
#

see you buddy

ashen wraith
fathom beacon
#

I think I'm about to get banned

next shadow
#

Yup, and they have now been informed of the rules (thanks for that by the way) and have discontinued their search here.

ashen wraith
#

What do you mean by " discontinued their search"?

fathom beacon
next shadow
#

this That is what I mean

ashen wraith
#

i don't understand, do you mean you told them to not open a topic like this again?

fathom beacon
#

yeah

ashen wraith
#

ok

next shadow
#

They have understood that we will not allow it again on the server. Leniency is usually a good policy if someone is not aware of the server rules (even though everyone clicks the button saying they have read and understood the server rules when they join)

ashen wraith
pastel vale
#

How do I try to level up my sql skills (in terms of learning and projects). Like Ive got a good grasp in the basics but I want to go beyond that. Using sqlite atm before I go onto other sql variants

cedar tiger
pastel vale
#

Ive been working on a ufc database but its getting kinda boring now. Going to move onto a new project thatll properly test me

#

I swear this shit is so difficult because Im self learning everything. Im not going to spend money on courses unless i really want to

pastel vale
coral wasp
pastel vale
#

Ive been focusing more on the analytical side as thats what I'm more interested in. just dont know what to do to further my skills

coral wasp
#

You'd want to balance it with some transactional stuff, if you want to be "expert" you need to know both sides.

#

But, I'd suggest working through some of those sites I linked above. It looks like you're just scratching the surface, and haven't yet worked with CTEs, aggregates, subqueries, and window functions.

pastel vale
#

Ill do that then for the next month or so

somber ember
#

when i was in game industry i was constantly told that when i write code it should be future ready, that if something unforeseen happens, it should be able to handle with minimal intervention. But looks like same practice doesn't apply in data engineering. I was told, that instead of writing dynamic code, i should only apply changes when user request since table is a fixed structure.

in general is this a standard in data engineering or should i see this as case by case?

fathom beacon
#

@next shadow Hey man, do you mind if I find teammates here?

next shadow
#

We don't allow recruitment or advertising

fathom beacon
cunning plover
#

Guys, im a beginner

#

what database should i install in my pc

#

like postgresql

tulip cove
#

maybe it will work for you.

fading patrol
# cunning plover like postgresql

If you've never touched SQL at all before, I recommend SQLite first. Postgres is also a fine choice but SQLite is just a bit easier to dive into quickly and start building stuff with.

fallow otter
#

Is it normal to have SQL query like this? Or I should have do more logic in python instead

WITH search AS (SELECT id.id         as item_id,
             td.scalar_min as scalar_min,
             td.scalar_max as scalar_max,
             id.type       as type
      FROM item_held
               INNER JOIN main.item_log il on il.item_id = item_held.item_id
               INNER JOIN main.item_def id on il.source_item_id = id.id
               INNER JOIN main.tool_def td on id.id = td.item_id
      WHERE id.type in ('tool_dirt', 'tool_fish')
        and holder_user_id = 0),
search2 AS (SELECT scalar_min, scalar_max, type
    FROM search
    WHERE (
        (scalar_max = (SELECT max(scalar_max) FROM search WHERE type = 'tool_dirt') and type = 'tool_dirt')
        or
        (scalar_max = (SELECT max(scalar_max) FROM search WHERE type = 'tool_fish') and type = 'tool_fish')
)),
search3 AS (SELECT scalar_min, scalar_max, type
FROM(
    SELECT scalar_min, scalar_max, type
    FROM search2
WHERE (scalar_min = (SELECT max(scalar_min) FROM search2 WHERE type = 'tool_dirt') and type = 'tool_dirt')
    or (scalar_min = (SELECT max(scalar_min) FROM search2 WHERE type = 'tool_fish') and type = 'tool_fish')
)),
dirt_out AS (
    SELECT scalar_min, scalar_max, type
    FROM search3
    WHERE type = 'tool_dirt'
    ORDER BY scalar_max DESC, scalar_min DESC
    LIMIT 1),
fish_out AS (
    SELECT scalar_min, scalar_max, type
    FROM search3
    WHERE type = 'tool_fish'
    ORDER BY scalar_max DESC, scalar_min DESC
    LIMIT 1
)
SELECT * FROM dirt_out
UNION ALL
SELECT * FROM fish_out;
cedar tiger
fallow otter
#

Ok

coral wasp
fallow otter
visual bough
fallow otter
fading patrol
cunning plover
#

can anyone help me choose which one to download

coral wasp
#

Sqlite is in the Python stdlib , you don't need to install

fading patrol
cunning plover
#

oh.....

#

i already installed it

#

i thought itwas like postgresql

fading patrol
cunning plover
#

Like it was jus 5mb to download.

#

I was surprised

#

cause postgresql gave me a lot to download

fading patrol
#

Just like writing to JSON or a text file basically

cunning plover
#

does that mean i can uninstall what i installed

fading patrol
cunning plover
fading patrol
waxen finch
#

yeah, python's windows installer bundles its own sqlite library so it works out of the box, but if desired, you can replace the sqlite3.dll inside your installation with a newer version from that SQLite downloads page

#

those sqlite executables are just convenient CLI tools you can use on your database files (they have some neat formatting options too)
https://sqlite.org/cli.html

spark bluff
#

Yes

teal hare
#

Hey guys, I've found this cool open-source project for visualizing databases. I want to test it, but I've never used PostgreSQL before, and I'm using it as a Docker container. Can anyone tell me how to make it work?

Here's the link: https://app.chartdb.io/diagrams/diagramexample01

I tried adding the script mentioned in my PostgreSQL terminal, but it didn't return any JSON at all.

fading patrol
teal hare
#

This is the result

 fk_info 
---------

(1 row)
teal hare
#

Here's how I run

postgres=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# WITH fk_info AS (
postgres(#   SELECT json_agg(row_to_json(t)) AS fk_info
postgres(#   FROM (
postgres(#     SELECT
postgres(#       schema_name,
postgres(#       table_name,
postgres(#       fk_column,
postgres(#       foreign_key_name,
postgres(#       reference_table,
postgres(#       reference_column
postgres(#     FROM (
postgres(#       SELECT
postgres(#         tc.constraint_schema AS schema_name,
postgres(#         tc.table_name,
postgres(#         kcu.column_name AS fk_column,
postgres(#         ccu.table_name AS reference_table,
postgres(#         ccu.column_name AS reference_column,
postgres(#         tc.constraint_name AS foreign_key_name
postgres(#       FROM
postgres(#         information_schema.table_constraints AS tc
postgres(#         JOIN information_schema.key_column_usage AS kcu
postgres(#           ON tc.constraint_name = kcu.constraint_name
postgres(#           AND tc.table_schema = kcu.table_schema
postgres(#         JOIN information_schema.constraint_column_usage AS ccu
postgres(#           ON ccu.constraint_name = tc.constraint_name
postgres(#           AND ccu.table_schema = tc.table_schema
postgres(#       WHERE
postgres(#         tc.constraint_type = 'FOREIGN KEY'
postgres(#     ) AS sub
postgres(#   ) AS t
postgres(# )
postgres-# SELECT * FROM fk_info;
 fk_info 
---------

(1 row)
fading patrol
gusty flint
#

is there a way to connect and execute HSQLDB commands through python script ?

coral wasp
#

Looks like they connect through an odbc connection, so it's really about python -> pyodbc -> hsqldb

fading patrol
teal hare
#

if it works

lone raft
#

Will work

unreal hemlock
#

look like just a filter

#

it will only show rows where that column value is "UDF111"

#

np

obsidian basin
#

Hey I am using back_populate because the modern docs doesn't include backref? Also if I have a 1 to many relationship in the many if I have a column in the many table how do I get a column in the one table in flask sqlalchemy?
https://docs.sqlalchemy.org/en/20/orm/basic_relationships.html#one-to-many
The link below is using backref + back_populate. But the example below are from the old docs. Is the link below accurate with the new docs? I can't find anything with the modern docs involving my previous question. Any help would be appreciated.
https://stackoverflow.com/questions/39869793/when-do-i-need-to-use-sqlalchemy-back-populates

Please ping on reply.

mental cove
#

Can somebody help me with this please ?

#

I'm making a create account screen using sqlite3 and hashlib in pycharm. Whenever I create an account it says "Database locked"

waxen finch
#

that can happen from leaving an unclosed connection in your script, but also from external programs like database viewers, or just having your script running twice - can't say for sure what's causing it in your case without seeing the rest of the code

slate lichen
#

any naming usggestions for these methods??

storm mauve
# slate lichen

get_guild_modules, (get|list|find|search)_guilds_with_module

slate lichen
#

Finally, i can continue coding ๐Ÿ˜‚

mental cove
paper flower
#

Maybe create more generic methods with related filter objects? So if you have to fetch a module/guild by different property you won't have to make a new method?

@dataclasses.dataclass
class SomeFilter:
    module_name: str | None = None

async def guilds(self, filter: SomeFilter) -> list[Guild]: ...
delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied timeout to @delicate lodge until <t:1725227150:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

unreal hemlock
#

does anyone have a database class which I could get some inspiration from

fading patrol
unreal hemlock
#

and wanted to get some inspiration from others

west hill
#

asyncpg?

unreal hemlock
fluid mortar
#

I'm starting a new back-end project with fairly loose persistence requirements. I have experience with SQLAlchemy but it feels like a sledgehammer to crack a nut, plus it seems like there are so many different query syntaxes these days it's hard to remember how everything works.
So, does anyone have any alternative suggestions? I don't necessarily need SQL, but I do need to be able to have trivial local hosting on Windows and Linux, even better if it's self-contained like sqlite. Looking for something easy to use from Python but powerful if needed.

wise goblet
wise goblet
#

Python ORMs are too powerful, they make it all a class, with easy to use abstractions and ecosystem around for easy unit testing
And they are able to autogenerate migrations and have inbuilt migrating versioning accordinly

#

Like...Other options would be very hard to refactor later if u will pick anything else

#

ORMs are abstracting away which SQL engine u use and later if u will need to make the project more powerful than at its start u will be able easily switching Sqlite3 to postgres or other fully fledged engine

wise goblet
# fluid mortar I'm starting a new back-end project with fairly loose persistence requirements. ...

So u can use SQLAlchemy with SQlite3 as mentioned. and to make it even more powerful u could ensure u use its mypy integration ๐Ÿ˜‰
https://docs.sqlalchemy.org/en/20/orm/extensions/mypy.html the the code will be awesome.
Because despite u using leaky abstraction like orm, it will have all the validations for typing and correct usage

from sqlalchemy import Column, Integer, String, select
from sqlalchemy.orm import Mapped
from sqlalchemy.orm.decl_api import DeclarativeMeta


class Base(metaclass=DeclarativeMeta):
    __abstract__ = True


class User(Base):
    __tablename__ = "user"

    id: Mapped[Optional[int]] = Mapped._special_method(
        Column(Integer, primary_key=True)
    )
    name: Mapped[Optional[str]] = Mapped._special_method(Column(String))

    def __init__(self, id: Optional[int] = ..., name: Optional[str] = ...) -> None: ...


some_user = User(id=5, name="user")

print(f"Username: {some_user.name}")

select_stmt = select(User).where(User.id.in_([3, 4, 5])).where(User.name.contains("s"))

Ensuring u use only existing attributes to a model, including during a syntax to SQL query commands

#

There are some other options... but they are meh in average
like if you are going to maintain your project only once in year, then good choice can be just using std lib sqlite3 and nothing else

paper flower
paper flower
fluid mortar
paper flower
#

"Just read the docs"

#

They cover all the relevant techniques

fluid mortar
#

Like I said, the docs are far, far too long.

paper flower
#

You don't need to read everything

fluid mortar
#

I've been using SQLAlchemy for something like 15 years, on and off. I'm not a newbie at this. I'm just looking for something a bit more convenient.

paper flower
#

quickstart section is pretty short

wise goblet
paper flower
wise goblet
paper flower
#

Can you link it please?

paper flower
#

Deprecated since version 2.0: The SQLAlchemy Mypy Plugin is DEPRECATED

wise goblet
# paper flower `Deprecated since version 2.0: The SQLAlchemy Mypy Plugin is DEPRECATED`

ORM Declarative Models
SQLAlchemy 1.4 introduced the first SQLAlchemy-native ORM typing support using a combination of sqlalchemy2-stubs and the Mypy Plugin. In SQLAlchemy 2.0, the Mypy plugin remains available, and has been updated to work with SQLAlchemy 2.0โ€™s typing system. However, it should now be considered deprecated, as applications now have a straightforward path to adopting the new typing support that does not use plugins or stubs.
๐Ÿค” https://docs.sqlalchemy.org/en/21/changelog/whatsnew_20.html#whatsnew-20-orm-declarative-typing

#

okay mapped_column and Mapped[int] is latest

fluid mortar
#

This is exactly why Iโ€™m considering other options. Almost every example of SqlAlchemy out there uses old syntax, whether itโ€™s for queries, declaring ORM fields, or something else. It gets tiring trying to mentally translate it all the time. So Iโ€™m just interested in alternatives.

paper flower
#

Depends on the queries you want to run really and how far you're ready to stray away from sql

fluid mortar
#

Not sure I want to try and integrate Django ORM into a non-Django project ๐Ÿ™‚ Tempted to consider MongoDB again but that doesnโ€™t seem popular any more

wise goblet
#

Since MongoDB is not sql, data is migratable to new format only manually by self written scripts

#

Sql (and especially popular python orms) take care about it for you

#

Also if u use mongo, u usually need to be not caring about complex syntax quering with data joins.
Supposedly you are okay your db having simple operations only

#

Tldr: mongodb is even more poor alternative than Sqlite3, but can play nice enough if u have very simple needs and it is one time development application without future growth in terms of code and data complexity

wise goblet
#

Like mongo, except inbuilt std lib and bd as simple file ๐Ÿ™‚

fluid mortar
#

I hadn't thought about using Pydantic with Mongo before, which is an interesting idea. At some point I will probably need the data to be readable from multiple processes in a fairly reliable way so I don't think shelve would be good enough, but thanks for the reminder that it exists

wise goblet
#

As long as u wrote unit tests involving their running against local (preferably in docker compose) instance of mongo

thorny anchor
wise goblet
#

it is not present in Django ORM ๐Ÿ˜‰

#

trully automatically

obsidian heart
#

kind of an odd question but i want to make a webapp with python that would show some charts on an analysis of data that i have in a csv. this data has tags but it is currently just comma separated in a single cell for each item which feels so wrong. I am not sure if I am gong to move the data to a db or use pandas, but i am struggling to find the best way to store tags that will make it easy to read and sort by (e.x. click on a tag in the webapp and it will show all items that have that tag, and each item can have multiple tags). anyone have experience with this?

#

ik in something like bigquery i could use something like an array in a column for each line item but i also dont want to get that involved if it is a personal passion project that is to just track books i am reading and the tags i am adding to them

#

i have also considered the option of having a denormalized set where there is an entry for each book and each tag it has but that also feels kind of icky to me and could cause issues with latency (i sometimes have more than 20 tags on a book and there will be at least 1500 books with the list growing each day)

wise goblet
# obsidian heart kind of an odd question but i want to make a webapp with python that would show ...

kind of an odd question but i want to make a webapp with python that would show some charts on an analysis of data that i have in a csv. this data has tags but it is currently just comma separated in a single cell for each item which feels so wrong. I am not sure if I am gong to move the data to a db or use pandas, but i am struggling to find the best way to store tags that will make it easy to read and sort by (e.x. click on a tag in the webapp and it will show all items that have that tag, and each item can have multiple tags). anyone have experience with this?
Relational database like Postgres has several solutions for that.

  1. Just using Array and store as it is tags https://www.postgresql.org/docs/current/arrays.html
    U already again ability to query
    Postgres has all the syntax to treat them correctly and return u query if only specific tag is present

https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#array-types

from sqlalchemy.dialects import postgresql

mytable = Table("mytable", metadata,
        Column("data", postgresql.ARRAY(Integer, dimensions=2))
)
mytable.c.data.contains([1, 2])

SQLAlchemy with its Postgresql dialect can use this functionality for example

#
  1. More traditional SQL solution...
    using Normalization. Creating extra table with unique tags for example

And then creating third table for many to many relationship
where each record links your row from main db with specific tag.
Repeat records for linking multiple tags with specific row from main table
Django ORM probably has it implemented as Many To Many relationship shortcut or smth like that

#
  1. Also can be just used some pervert solution like using "LIKE" syntax for searching values in any pseudo array for any SQL engine (including Sqlite3)
wise goblet
#

if it is personal project as it is, i would have crossed postgres away. too heavy for pet projects

wise goblet
paper flower
cedar tiger
#

-1 would mean the last character. 6 would mean the 6th position. -1:6 means last character and up to the 6th character, but it doesn't work that way

mossy swift
#

I think it's because name[-1] (the last element) is to the right of name[6] and the default step for slicing is 1, so you move to the right
You can specify your own "step" for slicing, for example:

#

!e
name='Jennifer'
print(name[-1:6:-1])

delicate fieldBOT
cedar tiger
#

If you just wanted the last 2 characters, just do -2:

#

!e

name='Jennifer'
print(name[-2:])
delicate fieldBOT
torn sphinx
#

Hey guys I have 20tb of images to store

#

I need to store it to train a model later (yolov8)

#

Where should I store it? I need it long term too

#

If I use amazon s3/glacier, will I be able to train models with the stored data directly or will I have to download it again

weak yoke
torn sphinx
coral wasp
#

Are you training locally or in AWS?

torn sphinx
#

I heard that sagemaker can read data directly from s3 but Iโ€™m not sure

coral wasp
formal latch
#

More of a conceptual and generic question.

Best or good practice: how should I handle my database connection code? Where should I place it, how should I structure it, etc.

fading patrol
storm mauve
#

depends on the framework, not that generic

for FastAPI, you should use Dependencies to create connections that draw from a global pool
for Flask, iirc you typically just save the pool as part of the app configs

formal latch
fluid mortar
#

To be honest, structuring good DB code within a non trivial FastAPI app is a pain in the ass. Like Flask, itโ€™s too keen on using globals.

#

The example above is okay but it relies on a module level โ€˜engineโ€™ variable. But at least it gets the settings from the environment

storm mauve
#

at least for FastAPI, you can and should use their dependency injection overwrites if you need to mock it for testing

if you need to use a different engine depending on the request, that feels pretty cursed but could be done by having another dependency instead of leaving it global I guess

fluid mortar
#

I donโ€™t want logic being executed at import time. The engine global has side-effects, so you have to be careful when importing that file.

Their dependency override system for testing is nasty because it uses the original object as the key. Which means you get the side effects from importing it even though you wonโ€™t use it

rustic remnant
#

Hey all, I have an eight month long class for designing a project with 2-3 people that fills some need in CS that hasn't been filled. I really am trying to orient my career path towards databases (with a little bit of DS), but for the life of me can't think of a project that would take that long to make with that many people. Of course it wouldn't be full time work on the project, but I would need to get something presentable and just really don't know what I could even present related to databases. Any ideas?

storm mauve
#

collect usage analytics in whichever project you do, and look for ways to store it efficiently and analyse them

databases are part of nearly any CS project ever, but not a project on their own unless you're creating your own (which I wouldn't really recommend)

severe basin
#

Do you have any idea how I can add exceptions for the database calls?

#

I'm using AsyncIOMotorClient

potent quartz
#

do i need a conn pool with async conn?

#

wont like the async-ness of the conn act as a paralelism mechanism by itself?

#

just like with conn pool

waxen finch
potent quartz
#

i would just pass single conn via app state

potent quartz
#

however

#

i think i should just like test

#

with pool vs no pool

#

beacuse about high concurrency

#

i am going to store binary data in db

#

so perhaps separating would help

waxen finch
potent quartz
#

or more like process conn limit

waxen finch
#

not to mention there's the overhead of handshaking every new connection

potent quartz
#

i read up a bit of docs actually

#

and

#

it seems like i misunderstood how it works internally a bit pithink

potent quartz
#

because i cant use same conn from different threads at the same time without a lock..?

#

i use rust btw, just to clear up some possible misunderstandings

waxen finch
potent quartz
#

and pg's pipeline mode

#

seems more like just batch request

#

rather than some automatic magic

waxen finch
#

seems about right, its described as just client-side so i'd assume the database was already able to queue up queries (from one connection) before the feature was added

thorny anchor
#

the model in relational databases is multiple concurrent connections, not multiple concurrent transactions on one connection

tender tree
#

hey guys why are dbt docs such a mess

torn sphinx
#

MongoDB is best

oak pumice
#

This may be a silly question but it's Monday morning and i'm only two sips into my coffee (That's my story and I'm sticking to it):

I have two tables [PartsInventory] & [PartHistory] that pull just under 60k items every morning from our software's DB via API calls. I also have three more tables [CurrentDayInventory] CDI, [PreviousDayInventory] PDI, and [ArchivedInventory] AI. THe current process is to dump PDI's data into AI. Then dump CDI into PDI. Then dump the up to date data from a View I created into CDI.

I recently began working on this piece again and decided to compare the View to CDI and which ever Part required updating to update it instead of deleting all data in CDI and then inserting 60k items.

My question, is whether or not it is more efficient to simply delete the data in PDI, then insert all of CDI into it. Then delete CDI data and insert all of my View into it. OR iterate over each item and update where necessary? The latter method appears to have increased the run time on the script.

Thanks!

#

I should also add that I am using SQLAlchemy within my Python script to do all of this.

clever topaz
#

I wonder also if you could get away with renaming the tables (at least for CDI -> PDI). Should be pretty fast : https://dba.stackexchange.com/a/53850

oak pumice
#

I have CDI and PDI set up so that I can compare yesterday's items with today's items which I use for a couple of reports for the managers.
Changing their names wouldn't help i nthis scenario. As for the update, I will grab a code sample and post it

clever topaz
spark aurora
#

wsg ppl

oak pumice
#

Here is the function that updates iterates through my View and updates CDI where necessary. Previously, I simply deleted the data in CDI and inserted everything from the View into CDI. I figured updating would be quicker but it seems to have lengthed the process.

#

That might be easier to read

clever topaz
#

I would expect a query like this to take on the order of 10s of seconds, a couple mins max. How long is it taking at the moment?

paper flower
clever topaz
paper flower
#

And also why?

#

Let's say you have a code like this to add new entities:

session: Session

for chunk in chunked(scraped_data, 1000): # Probably would be a good idea to chunk your data there
    for schema in chunk:
         model = SomeModel(id=shcema.id, name=schema.name, ...)
         session.add(model)
    session.flush()

You can just modify it a bit to update existing records:

session: Session

for chunk in chunked(scraped_data, 1000):
    stmt = select(SomeModel).where(SomeModel.id.in_([schema.id for schema in chunk]))
    existing_models = {model.id: model for model in session.scalars(stmt)}
    for schema in chunk:
         model = existing_models.get(schema.id) or SomeModel(id=schema.id)
         model.name = schema.name
         session.add(model)
    session.flush()
#

That's if you're using orm of course

#

With core you could just use on conflict do update?

clever topaz
paper flower
#

Honestly if you're after raw speed you probably shouldn't use python should just benchmark all the options ๐Ÿค”

#

Maybe yeah, just call delete/truncate on a table and insert all the records again, see if that would be better

#

But generally sqlalchemy should try to optimize updates, if model didn't change nothing should be emitted into db

slate lichen
#

Hmm, I will either get praise or a tonne of backlash due to this code

class DictWrapper:
    def __init__(self, dictionary):
        self._dict = dictionary

    def __getattr__(self, key):
        if key in self._dict:
            value = self._dict[key]
            if isinstance(value, dict):
                return DictWrapper(value)
            return value
        else:
            return None

    def __setattr__(self, key, value):
        if key == "_dict":
            super().__setattr__(key, value)
        else:
            if isinstance(value, dict):
                self._dict[key] = DictWrapper(value)
            else:
                self._dict[key] = value

    def __repr__(self):
        return repr(self._dict)

The only reason I made it is so that I can easily read and manupulate the dictionary without getting random index errors. The code looks prettier as well.

||This code was motivated by the javascript objects||

clever topaz
slate lichen
next shadow
#

Hello, your messages have been removed for violating rule 6

slate lichen
pearl lodge
#

guys why my table content is not deleting?

#
Template.py is ready
result = [(None, 1130268405333753857, 'Hikaru', 'Hikaru:', None)]
deleted
#

the process is occuring but the deletation si not happening

#
async def delete_default_character(
        self, *, user_id: int, name: str | None = None, prompt_prefix: str | None = None
    ) -> None | str | list:
        cursor = await self.db.execute_fetchall(
            "SELECT * FROM default_characters WHERE user_id = ? and (prompt = ? or char_name = ?)",
            (user_id, prompt_prefix, name),
        )

        result = list(i for i in cursor)

        print(f"{result = }")

        if len(result) > 0:
            if len(result) == 1:
                if name:
                    await self.db.execute(
                        "DELETE FROM default_characters WHERE user_id = ? and char_name = ?",
                        (user_id, name),
                    )
                    await self.db.commit()

                    print("deleted")

                    return "SUCESS"
                elif prompt_prefix:
                    await self.db.execute(
                        "DELETE FROM default_characters WHERE user_id = ? and char_name = ?",
                        (user_id, name),
                    )

                    print("deleted")

                    await self.db.commit()
                return "SUCESS"
            else:
                result_list = list()

                for i in result:
                    data = {
                        "name": i[2],
                        "prompt_prefix": i[3],
                        "image_url": i[4],
                    }
                    result_list.append(data)

                return result_list
        else:
            return "ERROR"
#

why it's not deleting from my database?

oak pumice
mossy whale
harsh pulsar
delicate fieldBOT
#

dbt/include/global_project/macros/materializations/snapshots/helpers.sql line 62

nullif({{ strategy.updated_at }}, {{ strategy.updated_at }}) as dbt_valid_to,```
finite edge
#

What is the fastest way to check "in" on integer on dbs?

Get the whole list from JSON and placing it in a set?

For example, a list of channels ids

coral wasp
finite edge
# coral wasp Can you give an example of what you mean?

channels ids in a json file

[
    16382728238381982384,
    16382728238381982386,
    16382728238381982385,
    16382728238381982383,
    16382728238381982387,
    16382728238381982388,
    16382728238381983389,
]

on message event for example

white_channels = getallwhitechannelsjson() #returns the whole json in set

if message.channel.id in white_channels:
     ...
coral wasp
#

And what do you want to query from a db? (And what type of db?)

finite edge
finite edge
coral wasp
thorny anchor
finite edge
thorny anchor
#

maybe

#

json could be a good choice for other reasons. it's not well suited for this task

coral wasp
# finite edge So is json the best way to do something like that?

Since this is the db channel, I'll give you my ducky_drawing solution: ```py
with open('myfile.json', 'w') as f:
f.write("""
[
"16382728238381982384",
"16382728238381982386",
"16382728238381982385",
"16382728238381982383",
"16382728238381982387",
"16382728238381982388",
"16382728238381983389"
]
""")

import duckdb
df = duckdb.sql("select * from 'myfile.json' where json = '16382728238381982384' ").df()
print(df)

thorny anchor
#

well that's impressive

slate lichen
coral wasp
thorny anchor
#

that is indeed cool

harsh pulsar
#

and duckdb is cool but you might want to consider just using regular python for this... json.load interprets json arrays as python lists. so once your data is loaded, you just have a normal list, and you can use normal python code on it

last perch
#

I set up a db but have no idea how to work it. Still using json because we are less than 5 users so far

fading patrol
spare fox
#

Can I set automatic attachment downloading of specific email sender that I receive every day and set it to be also automatically saved on my pc when I receive it?

spare fox
#

Hi ! Tnx...Is there a software for that or I have to write code ?

spare fox
cedar tiger
torn sphinx
#

which database offers the most storage for free with the most read/write cycles?

fading patrol
torn sphinx
fading patrol
fading patrol
lime dagger
torn sphinx
#

i was thinking of upgrading to serverless mongodb if i have to

torn sphinx
ornate wave
#

Hi, I have a question about SQLAlchemy's cascade deletion, specifically when using a secondary join.

If I declare cascade="all, delete", does it delete rows from the association (middle) table only, or does it also delete rows from the related table on the other side of the relationship? Or does it delete from both?

If I declare onDelete="CASCADE" on the middle table, do I also need to declare cascade="all, delete" on the left and right tables?

gilded ferry
#

i feel dumb looking at all of this

warm igloo
#

I have a question. So I appear to be having an issue with my connection string pyodbc. I mean the code runs perfectly on my computer, but when this other person tries running it, she gets the following error:

Login failed for user 'janedoe'

Here's the code:

`
server = 'servername'
database = 'dbname'
username = 'readonly'
password = 'xxxxxxxxxx'

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password, trusted_connection='yes', autocommit=True)
`

fading patrol
fading patrol
warm igloo
#

Now to wait for a response.

warm igloo
#

I guess I'll contact someone to try and get her access.

sly acorn
#

hello all, is it bad practice to name certain keys in the database as "name.bio" instead of name_bioi? The original source where I'm getting the data from has keys names with both underscore and dots as well.

Should I make the schema according to that or rename the incoming data to match the database with all keys in underscore?

paper flower
#

On python side if you use orm it would certainly be easier to use an underscore

#

Also regarding the field name.bio itself, can't it be just bio?

coral wasp
#

Periods are for scoping, and you'll be forced (as Doctor said) to always double quote the names

paper flower
#

Yeah, e.g. schema.table, table.field

#

Would be much easier to write table.name_fio than table."name.fio"

static badger
#

Not sure if this is the right channel but Iโ€™m curious for dealing with local data files as part of Jupyter hosted data analysis Iโ€™ve found my three most useful external modules are pandas, pyarrow and duckdb. Am I missing anything thatโ€™s better than these?

clever topaz
harsh pulsar
#

Other options include Xarray for when you want >2-dimensional Pandas data frames, or Dask for when you want to do Spark-style partitioned parallel batch processing without the fuss of Spark (arguably this is less useful for local computing than it used to be, now that we have higher-performance multithreaded stuff like Duckdb that can handle out-of-core processing)

#

Pandera is good if you are building applications around data frames

sly ember
#

Hi, not sure if this is the right channel to ask this in but I was having some trouble so I figured I'd try, https://github.com/vn-ki/film-raffle-bot I'm attempting to recreate a discord bot that uses python, I think I've got the main elements in the config file down but I'm having trouble with setting up a database for it since I have no experience with that. Any suggestions or recommendations for what to use?

GitHub

Contribute to vn-ki/film-raffle-bot development by creating an account on GitHub.

harsh pulsar
#

(The sqlite database is just a file)

sly ember
#

iโ€™m not really sure how hosting solutions work

fading patrol
sly ember
#

i created the file in the same folder as the rest of the files, edited the config with the necessary info but the main.py isnt working no matter what and im not sure what the issue is

burnt maple
#

I have a question how can I make subfolders for my gui? My apologies

fading patrol
sly ember
#

oh i got that fixed but now itโ€™s just the commands not working, iโ€™ll post in a bit

spring palm
#

why am I get floating point errors with my float columns using asyncpg? i'm not doing anything exotic with my data except from loading it into my class

paper flower
spring palm
paper flower
#

I don't understand what kind of errors are you getting since these seem like normal values to me

#

Also is that war thunder?

spring palm
#

also i should've clarified, i'm getting this

paper flower
#

Are you getting this from your api or from warthunder's api?

spring palm
paper flower
spring palm
paper flower
#

That's what I'm asking you to simply understand what's the cause of that problem ๐Ÿ˜…

#

If that's a screenshot from warthunder's api for example then you can't really fix that

#

other thatn just round it to nearest N digits after coma yourself

spring palm
#

and i guess i could round it, but i would prefer a more robust solution

paper flower
#

Depending on how your client works with these values you may want to use decimal.Decimal here and a numeric type in your db ๐Ÿค”

#

Float values can't be accurately represented with IEEE754 floats if that's what you're going for

#

But generally that should be fine if your client can handle formatting correctly

spring palm
#

i'll just round it for now until i work out something better

hasty helm
spring palm
coral wasp
faint marlin
grim quest
#

hey guys i got a job interview for a data analyst position in two days

#

what topics should i go over

fading patrol
static badger
# harsh pulsar Ibis might be interesting to you: it's a common interface over several data fram...

Iโ€™ve also been using AWSwrangler + boto for non remote stuff. (Mostly for integrating Athena queries into my notebooks or to dynamically generate glue records for S3 hosted data..)

Ibis has been on my radar to checkout, as was Polars but i actually ran into an issue with Polars and would have had to compile it myself to handle datasets with more than 4 billion (or so) records, which ironically pandas has no issue with if you have the ram. (64 bit index).

My largest dataset had 300+ billion rows. I melted my laptop trying to analyze it, so I ended up spinning up my notebook on a 1TB ec2 instance with Vscode remote. Believe it or not pandas (with Pyarrow) was able to handle it fine. (But took 12 hours.) in hindsight they t would have made sense to use a distributed processing engine but I was trying to contain my costs. (Was about $70 for the ec2 instance)

Polars is still on my list for its Iceberg support.. but Iโ€™m more and more attracted to duckdbโ€™s native format for those cases I need an extendable file based data store that supports schema evolution.

Ibia remains on my list.. Iโ€™ll learn more about out it at the very least. I kind of like learning to work directly with the different libraries so may wait to actually use ibis.

coral wasp
static badger
#

Yeah Iโ€™m in there. Duckdb was a recent discovery. I love it.. refactored a bunch of parquet code to use it.

#

Itโ€™s not as fast with parquet as its native format but I canโ€™t give up the space efficiency of parquet. I ended up writing a simple query caching function which more than made up for the performance difference.

#

(Itโ€™s gross but I instantiate a global variable dictionary which uses a tuple of the query parameters as the cache key.)

It speeds up the first run and since I live in Jupyter I get the benefits on subsequent runs as long as I do t restart the kernel

#

Next time I have to analyze a large dataset Iโ€™m going to try duckdb

coral wasp
#

I use a dbt pipeline that mostly starts from parquet, too

#

(Which also creates various tables for the intermediate steps)

#

And, if you want to survive restarts, then make it an on disk db rather than :memory:

static badger
#

All this data is data to help me. (I do Finops and Iโ€™m largely left to fend for myself.. I had to teach myself python and enough data analysis/engineering to get it done.

And I love it!

coral wasp
#

Nice, I'm on data engineering side of fintech

static badger
#

Ah neat! My background is devops/cloudops/linux.

coral wasp
#

One important skill: understanding SQL window functions. Windows are the answer to many complex queries.

static badger
#

(Iโ€™ve had to do ops support for de teams in the past so I wasnโ€™t scared to get my hands dirty)

#

Ok good to know. Is this particularly to duckdb queries or in general?

#

As of now I largely only use sql with Athena and duckdb. (In both cases I dynamically generate the queries in code)

#

I also have to use psql sql for generating cloud watch metrics in lambda. (Much more lightweight as itโ€™s for tracking database metrics like length of longest running query)

coral wasp
static badger
harsh pulsar
#

Duckdb does better

#

I think Dask can do some out of core stuff but meh, Duckdb just works most of the time

coral wasp
# harsh pulsar Duckdb does better

I think they're all somewhat 'incomplete': it's an area of active work and optimization for all of them... I haven't compared them side by side, I generally can just throw RAM at a problem and partition the dataset.

harsh pulsar
#

They all have areas where they are good and less good

coral wasp
#

Yah, i guess what I really wanted to say is that this stuff is the state of the art, and as such, will have bumps along the path of using

static badger
#

Looking at window functions. Can they kinda replace the groupby functions in pandas? (Among other things.. really just looking g to see why I would need them)

#

Oh nvm I see.. seems like it could be useful.. but maybe not for any problems Iโ€™ve had to solve YET.

coral wasp
#

Windows are good for things like: give me the top three products by country, or: show me rolling 7 day sales average

#

Or, let's say you want to compare sales this quarter to sales 4 quarters ago

#

(These are all things you can also do in pandas, just explaining why windows are useful)

torn sphinx
#

I'm in a bit of a predicament and i need help please because i need to deploy
so first of all every time i need to execute sql on my webapp i open a new connection but this is a very expensive operation to do like opening new tcp connection to postgres db in the cloud so what people do is that they create a connection pool and pull connections from it so that there is no opening and closing overhead but the problem is that the appache webserver makes my code in process and each process has 5 threads so my server code gets ran num_ processes * 5 times which results in initializing a number of connections higher than the available (5)

coral wasp
#

You can use a connection pool with only one connection.

#

This just means that each process will reuse the same connection, rather than making new connections each time

paper flower
#

Honestly I was thinking that maybe don't use apache or use pgbouncer so it does the pooling instead pithink

austere badge
#

Is it bad to run a database server and a website on the same machine?

#

I only have 1 machine and I am trying to run both so any recommendations which database would be fine running on the same machine as my website

thorny anchor
#

it's fine

barren abyss
#

what happen if i index on column with alphabetical data/ char type in SQL ?

#

like name for example

paper flower
#

Index would get created pithink

barren abyss
#

But idk if possible for char type

paper flower
#

btree indexes (which are default in most databases, I believe), so anything that can be compared can be indexed

#

chars can be indexed (if we're talking about single characters), strings can be too

obsidian basin
#

@weak yoke , hopefully it is okay if I ping you

I created a simpler example

Like stated in the post

I am having trouble inserting and then querying data in flask sqlalchemy 2.0. Does anyone know how?

Also can someone please show an an example?
Can someone use sqlalchemy 2.0 querying when answering?

Also if I need to add other columns at another time how would I add the columns in a many to many relationship? I assume this would be equivalent to updating columns even though the columns are optional.

Here is the error.
https://paste.pythondiscord.com/TPWQ
app.py
https://paste.pythondiscord.com/TPTQ

home.html
https://paste.pythondiscord.com/B6FA

create_book.html
https://paste.pythondiscord.com/WBUQ

Thanks

craggy hornet
#

Hi, I'm looking for someone to give me a hand with some facebook scraper, I have to collect phone numbers of a particular location.

fading patrol
acoustic lava
#

Hey everyone!
Can anyone tell me how can I store 3d models in sql database and how I can retrieve to render it on website.

fading patrol
# acoustic lava Hey everyone! Can anyone tell me how can I store 3d models in sql database and h...

Why do you think you want to store 3D models in your database? It's probably better to store the URLs of the files in your DB instead, but if they are small blobs that should be trivial

The normal way to retrieve any data from a DB for a website is an API which you can build with whatever framework you like

For the rendering part, pick a library https://medium.com/lai4d/how-to-embed-an-interactive-3d-model-on-a-web-page-9bd631046927

Medium

Thanks to the HTML5 standard nowadays it is possible to visualize 2D and 3D vector graphics in any web browser and in any device withoutโ€ฆ

jade gorge
whole mica
#

How can I update all values in an object in MongoDB?
For example, the data look like ```
{
"_id": ...,
"myObject": {
"a": 1, "b": 2, "c": 3
}
}

and I would like to change all 1, 2 and 3 to 0s
jade gorge
#

You can use the $set operator

#
query = {"_id": your_document_id}

update = {
    "$set": {
        "myObject.a": 0,
        "myObject.b": 0,
        "myObject.c": 0
    }
}

collection.update_one(query, update)

sacred shell
#

Hello, Ive been trying to make this work for the past 2 days, I honestly dont know if this is the correct place to come, But my bot keeps "closing" the databases connection

Ive learned that this is because of my init_database_schema function that loads the schema.sql and gets it contents and executes the commands idk, But, Nothing is wrong there, I load my schema.sql and execute, Weirdly it shows that error, But fun fact, When i do return <asyncpg.Pool>._closed, It for some reason returns, False. Even if the error occured, Can someone help me?

paper flower
#

Honestly I didn't work with asyncpg directly but sqlalchemy's pool includes a pre_ping functionality which checks if the connection is alive before sending any requests

delicate lodge
#

B

#

Mysql or posgrrs

tender kernel
#

Alright... here's the thingy...

I'm building a database in PostgreSQL of raw trade data from the Kraken cryptocurrency exchange. There are 724 .csv files. Each one represents a unique trading pair. Each files contains anywhere from 10,000 rows up to around 50 million rows of data. The data in each row always consists of unix time stamp, price, volume like so:

1628609725,2.50300,50.36358000
1628609725,3.07400,2.00200000
1628609725,3.07400,2.00200000
1628609725,3.07400,2.00200000
1628609733,2.50300,7.87232000
1628609739,2.50300,355.43768000
1628609750,2.50300,3.93616000
1628609757,2.50300,2.00400000```

As you can see, entries 3-5  are identical, which is fairly common in raw trade data due to trading algorithms executing trades in rapid succession. Herein lies the problem...

This following link is to the kraken exchange support page
https://support.kraken.com/hc/en-us/articles/360047543791-Downloadable-historical-market-data-time-and-sales-
There you will find two links within, on for the bulk of their entire historical trade data in its most granular form and the second like is to quarterly data dumps. So the problem is, I am currently aggregating the bulk of the data into my database upto and including Q2 2024. Sometime in early Oct they will release the Q3 datadump.

My plan is to be consistently running a datacenter on my raspberry pi to capture live trade data and add it to my database via websocket, however, I'm considering the fact that this data stream can and will be interrupted at some point in the future. For example, if I started the data stream now, there would be  gap of missing data from the end of the bulk data up to the end of Q2 all the way until I start the websocket for live data. 

So If I added the Q3 data to the database, there is no way to differentiate between identical entries, and duplicate entries. I can't seem to figure out how to 'fill in the gaps' without duplicating trade data.
#

So, the best I can come up with is giving a unique trade id's to the bulk data that identifies as a data dump, and giving unique trade id's to the live data that identifies as live trade data so if there are identical entries that already exist from one side or the other, then processing will skip over adding it as a duplicate?

Does that make sense?

clever topaz
#

Also lots of other ways to approach this.

thorny anchor
coral wasp
#

Maybe compute a rolling checksum (ie rabin) over last N entries to find the overlap points?

tender kernel
paper flower
cedar tiger
#

!pip dataset is interesting. Avoids the whole sql injection and the methods looks easy to use.

delicate fieldBOT
#

Toolkit for Python-based database access.

Released on <t:1689152052:D>.

storm mauve
# cedar tiger !pip dataset is interesting. Avoids the whole sql injection and the methods look...

"Avoids the whole sql injection"?
SQL Injection is not a problem if you are using any library correctly, and I bet that if you use it wrong enough you'll still end up with code vulnerable to sql injections

For extremely simple projects it might be fine, but for anything with a non-trivial schema I would recommend using SQLAlchemy (which that library is built on top off) instead of using it

placid plume
#

Hey guys. I'm new to this channel and our team recently built a SQLAlchemy extension for VS Code. It's basically an AI chat (RAG) system built out around the SQLAlchemy docs that allows you to ask questions about the project and get relevant answers without leaving VS Code (it's also completely free). Wanted to share here in case anyone found it useful! https://marketplace.visualstudio.com/items?itemName=buildwithlayer.sqlalchemy-integration-expert-jYSzG&

fervent stump
#

hello everyone, apologies if the answer is obvious, but i need some assistance identifying my server name.
i am trying to connect to my local sql server inside my .py script. however, i get the following error:
Error connecting to database: ('HYT00', '[HYT00] [Microsoft][ODBC SQL Server Driver]Login timeout expired (0) (SQLDriverConnect)')
when running select @@SERVERNAME , it shows me the default DESKTOP-###### server name.

how do i setup my connection string?
prior to this i was using a docker instance, however, the image for that was lost so i am no longer using it.

tender kernel
paper flower
tender kernel
#

recurring trades from algo bots

paper flower
#

I'm just saying that you can for example:

  1. Get latest timestamp from your db
  2. Delete everything with that timestamp
  3. Start receiving data from the timestamp you're deleted
#

so you won't get any duplicates

#

I don't think there's really a need to overcomplicate, how many trades per second are you expecting to get on average?

#

Correct me if I'm wrong, but I think something like

# Pseudocode
last_record = select(Something).order_by(Something.timestamp.desc()).limit(1)
delete(Something).where(Something.timestamp == last_record.timestamp)

for record in data_stream(from_=last_record.timestamp):
   add_to_db(record)

should work?

#

If there's not a lot of records that could be in the same second/timestamp then impact of that delete statement is pretty minimal, so there shouldn't be any obvious problems?

tender kernel
# paper flower I'm just saying that you can for example: 1. Get latest timestamp from your db 2...
  1. Timestamp is placed into the raw .csv files by the exchange not my db.
  2. Can't delete duplicates because they aren't duplicates, they are identical trades that occur in rapid succession on the excahnge within the same timestamp. It's down to the second so multiple trades can occur within the same second, in fact they commonly do.
  3. The data dumps occur about 1-2 weeks after each quarter ends. So I would be using a data dump to fill in the gaps between the last data dump and whatever gets missed by the live websocket. but if the entire entry from the live data is identical to the data dump, there is no way to differentiate between a duplicate and an identical trade.
#

there are hundreds of millions of trades

paper flower
#
  1. There would be duplicates if you accidentally go over the same timestamp twice
tender kernel
paper flower
#

that's what I was trying to avoid

tender kernel
paper flower
#

sorry, I'm not really in the mood

tender kernel
#

so if live data is recorded....

obtuse halo
#

That's not typically how you do this

tender kernel
#

hang on

obtuse halo
#

You add a 4th column for batch identity

#

Which records which input task inserted that row

tender kernel
#

so if live data is recorded... and then I add the data dump from the whole quarter to fill in any gaps, the data dump and the live trade data that exists incrementally, there is no way to tell the dumplicates from the identical trades

obtuse halo
#

Right

#

You keep track of the job that inserted each row