#databases
1 messages ยท Page 25 of 1
Also, for sensor data, consider parquet rather than a database. Parquet is a great format for storing batches of data in a high compressed, columnar format, supporting partial reads. You might store data in daily batches, etc.
people do it, e.g. in data engineering you often see people abusing jinja templates to inject data into sql queries. it's not impossible, usually the only real concern in a non-hostile scenario is unexpected quotation marks. but if you're injecting a ton of data into a query, you might want to consider instead creating a temporary table (if your db supports them) and inserting data into that.
if I have time series data of multiple sensors, which I'll typically select first by a date range, then by a sensor subset, should the rows or columns be the timestamps?
surely it has to be rows, otherwise you'd infinitely add columns as you go forward in time. or am i misunderstanding?
If I then take that time series data for a couple of sensors out of the database, transform it into some combined value, and want to load it back into a database, should I put it into the original table or create a new table, especially if the transformation operates on the full length of time series and the transformation needs to be re-performed as often as once each hour (when the data is added to the db at 1hz)
you'll have to evaluate this in the context of your particular app and performance needs, but the default imo should always be to separate "raw" ("staging") and "processed" ("transformed") data
parquet is an interesting storage choice, but i don't see much value in using that as your primary storage mechanism. however it's an excellent format for storage in a data lake, or importing/exporting between various systems.
for example the "delta lake" system is built on carefully-constructed parquet files
and many query engines (drill, datafusion, duckdb, etc) can query from parquet files more or less transparently as if they were a database
Reading this but I don't understand what is meant by transient state. Anyone know?
You don't need to be in root to make a virtual environment. But its failing because the files are owned by root and pip install requests would need sudo to install the files because the dir is owned by root. My advice is to start again as the normal user. Create a user virtual env with. python3 -m venv markve that will make a dir markve in the current dir. Look in it and you see a bin/python3. And a bin/pip. You can use that pip to install modules into markve and it doesn't need root
To activate the markve use source markve/bin/activate
That will put markve/bin/pip ahead of all other pip versions in your PATH and just install requests that way with pip install requests
You don't actually need to activate an environment to use one either. You can just call ./markve/bin/pip install requests without activating the env. Any time you want to execute a acript with that env just use the python in that dir ./markve/bin/python3 main.py
If you setup a job in cron you may want to use this method of selecting an env
it's referring to "object states" as defined at the top of that document https://docs.sqlalchemy.org/en/20/orm/session_state_management.html#quickie-intro-to-object-states
any instance in the "persistent" state will transition to the "detached" state, and any instance in the "pending" state will transition to the "transient" state
it isn't very clearly emphasizing that those words are specific states, not just general adjectives. i think that sentence could be clarified
if an object is detached and has an update pending like obj.row2 = 5 if you reconnect to that database with a new conn, and the row is still there, is that able to update that row properly when you commit() ?
i suppose i should gust go try it ๐
that i don't know, i've never needed expunge before
i suppose it's possible to "re-attach" an object to a session
well i was thinking about this bcs I have several dbs that i thought it might be useful to do something like this.
with eng.connect() as conn:
stuff = conn.scalars(select(StateObj).where(...)).fetchall()
# process stuff list and send updates on to a long running task but dont commit yet.
# expung all objects
updated_stuff = long_running_task(stuff, ... )
with env.connect() as conn:
for state in updated_stuff:
conn.merge(state)
conn.commit()
the idea is that you dont hang no to a connection (because it might fail) but you can use full objects in the long_running_task function that attaches to another db to do some cross referencing and updates fields in the stuff list of objs. Not sure if its a good idea to do in general though
Ello, I need some help :<
I have two different sql query, and for some reason, both of them returns different kind of data in case if the nothing was found in database. First query returns an empty list( [ ] ) and second one returns None. Could someone explain me how it works scince, what Im trying to achive is to get None in first query
Im using Postgresql and asyncpg to create connection with database
#first query
result = await conn.fetch(
'select * '
'from usersrelation '
'where '
'(parent_id = $1 and child_id = $2) or '
'(parent_id = $2 and child_id = $1)', user_a, user_b)
if not result:
return result
#second query
result = await conn.fetch(
f"select {', '.join(fields)} from users where email ilike $1",
f'%{email}%'
)
if not result:
return result
Also could it be that there is something wrong with Tables in DB?
This will seem dumb to python pros but can someone explain what it is when someone says envoirment in python?
Ask in #python-discussion . Itโs a good question just not a db question
@coral wasp sorry it was for my database project but placement is important i get it
can anyone check out my question about installing mysql? I can't find anything online to help i've tried everything being said to me
wait i might bes tupid
don't "ask to ask". just state your question. you're in the right channel.
Run as ./venv/bin/python3 main.py
is there another way to do this? i feel like i'm doing something wrong. i want to read all rows from a list of args using aiomysql
@classmethod
async def read_many(cls, query: str, args: List[dict]) -> List[Dict[str, Any]]:
async with connect(user=cls.user(), password=cls.password(), db=cls.database()) as connection:
connection: aiomysql.Connection
async with connection.cursor(aiomysql.DictCursor) as cursor:
cursor: aiomysql.DictCursor
ret: List[Dict[str, Any]] = []
for arg in args:
await cursor.execute(query, arg)
result: Dict[str, Any] = await cursor.fetchall()
if result:
ret.append(result)
return ret
Looking for some help: https://stackoverflow.com/questions/77506005/sqlalchemy-using-a-join-table
Sessions hold a connection under the hood, so you should use one per request, it cannot be used concurrently, same as a db connection
As of how to use it - it's up to you/your app architecture
so ehm...
I have a table of classes
I have a table of subclasses
I have a table of features, however a feature can either belong to a class or subclass
should I split it into two table of features? or how to properly set this up?
Two tables in total: classes and features. Subclasses are just classes, or not?
well, hm
a class has several subclasses
subclasses share the features with their parent class (like they have every feature that their parent class has as well), however they have their own separate features as well
it could work with classes, features and a class_feature_connector table maybe?
I remember my teacher talking smth about having connectors where I could list which ids are paired from two different tables if the connections are not singular one way
A subclass can reference its parent class, that would be a self referencing table. So you are able to select all features up to the root parent. You only need a connector table if different classes can reference the same feature.
A subclass can reference its parent class, that would be a self referencing table.
a self referencing table.... damn, alright
do I just...
add a foreign key and make it reference it's own primary key?
that sounds... kinda reasonable?
Yes, connector tables are the standard way to handle many-to-many relationships.
Aside from its own id column, the connector table would have (at least) a column for the class Id and another for the feature id
somewhere I read that it is possible to just make it have the class_id and feature_id be the primary key togther and no need for 3rd column (it's own id), is that a bad idea?
No, thats ok. It doesn't need its own id.
Yeah, something like:
CREATE TABLE classes
(
class_id INTEGER PRIMARY KEY,
name TEXT,
parent_id INTEGER,
FOREIGN KEY (parent_id) REFERENCES classes(class_id)
ON UPDATE CASCADE ON DELETE RESTRICT
);
-- features are unique to a class:
CREATE TABLE features
(
class_id INTEGER NOT NULL,
name TEXT,
FOREIGN KEY (class_id) REFERENCES classes(class_id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- or different classes can have the same feature:
CREATE TABLE features
(
feature_id INTEGER PRIMARY KEY,
name TEXT,
);
CREATE TABLE class_features
(
class_id INTEGER NOT NULL,
feature_id INTEGER NOT NULL,
PRIMARY KEY (class_id, feature_id),
FOREIGN KEY (class_id) REFERENCES classes(class_id)
ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (feature_id) REFERENCES features(feature_id)
ON UPDATE CASCADE ON DELETE CASCADE
);
what are these lines?
ON UPDATE CASCADE ON DELETE RESTRICT
(or should I just google around with it for a bit? :D)
That's how it should behave when the referenced entry is updated or deleted.
eg. if the id of the root class is updated all references in the subclasses will be updated with the new id.
ooooh, that is nice
but if you try to delete a root class which is still referenced in an subclass you get an error because it's restricted.
I have so much to learn about sql goddamn
alright, I'll fidget around with it for a while
and also still writing the code to pull everything nicely from my google sheets databases (I know... but I was young and stupid + too lazy to learn sql (+it's personal project so I'm not endangering anyone ๐
))
Nothing wrong with spreadsheets. It's pretty much application specific.
is there website or something I can use to learn more about regexp matching pattern? more specifically, with the use of special characters such as \ or " " or []?
Tbh, I think the best place to study this is the documentation: https://docs.python.org/3/library/re.html.
You can skip over the irrelevant parts but itโs nice to know that things like lookaheads and lookbehinds exist (for instance)
Is it possible to import google docs data into sqlite3 database automatically?
At Regular-Expressions.info you will find a wide range of in-depth information about a powerful search pattern language called regular expressions.
I googled it for you: https://www.xlwings.org/tutorials/connect-googlesheets-to-sqlite
if i am running multiple sql queries like so during migration
-- Create the table with two columns
CREATE TABLE test (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
-- Alter the table to add a new column
ALTER TABLE test
ADD COLUM new_column INTEGER;
but the second query has an error so none of the queries are ran right? i dont need to use transactions with asyncpg that means?
Supposedly a pickle file isn't the best choice for "large datasets" - does anyone know how large we're talking?
I'm writing app to manage file metadata and there will be several hundred thousand files being managed on many users' systems, and each file will have metadata, is this too large for pickle to handle efficiently?
or are they talking like millions or billions of rows type of large?
I'm also considering using a flat file with JSON data but then I have to provide serialization formatting for every single object when Im likely never going to specifically need the JSON portability anyway
pickle is not the best choice for anything that will live on multiple systems if you consider the security risks
Just use JSON. You can use something like pydantic to serialize and parse the objects for you.
Will pickling the data be more space-efficient than JSON data? Im not sure which type of compression if any is used, but I could always gzip or 7zip the JSON anyway
btw, my use case here is a desktop app with no networking needed, its just a way for a user to manage some files locally only
Luckily I forgot I actually already prepared most of my data for JSON serialization, so I could def go that route, thanks for the pydantic rec as well I hadnt heard of that
have you considered using sqlite to store your data? if not to normalize your data, then at least integrity guarantees to avoid corruption while you write JSON to it
oh yeah their documentation advocates for it too https://sqlite.org/appfileformat.html
Ive thought about it... I just felt it may be kinda heavy for this MVP im building
Thanks, reading that page
Youre right though there is nothing saying I need an elaborate table structure, but could get some benefits with just 1 table and column technically lol
i guess it'd be different to work with compared to parsing/dumping a JSON file, but the overall complexity shouldn't be much higher
i would still try to take advantage of SQL, for example having a table with rows for each file and their metadata so you can more easily search and filter them
if you still want to store some fields as JSON, newer versions of sqlite provide JSON functions/operators by default to help with more complex queries
this sounds like a bad case for pickle, because pickle will be really fragile across python versions
strong +1 for sqlite both for performance reasons and for avoiding the mess of pickling
and another +1 for json-in-sqlite, it works pretty well
HELP PLEASE
Hi, how to solve sqlite3 thread error, can anyone help me for it
How to find object that causes exception?
Neither 'BinaryExpression' object nor 'Comparator' object has an attribute 'c'
self = <sqlalchemy.sql.elements.BinaryExpression object at 0x7fee0f673b50>
key = 'c'
By the stacktrace?
Nah, SQLAlchemy points only to internals, doesn't show object string id. Bug was found anyway, had to drop relationships and turn it on one by one
I mean, you're just accessing c property on a comparator
Share you stacktrace
Or are you getting this error when trying to serlialize it to a pydantic model? ๐ค
(That's the only explanation I can find of why you can't debug it)
i use mac and for some weird reason my pgadmin 4 wont open ? has anyone else experienced this ?
Looks like strict primaryjoin expression caused this error. Moved a lot of things to backref and back_populates and not it works. Magic... Anyway, I couldn't find breakpoint where I can see all objects, I thought SQLAlchemy has a debug flag for such cases not only echo
You shouldn't really do explicit joins unless you need to filter by your related entities ๐ค
sqlalchemy can do relationship loading with it's loading options
Yeah, but if you want to access to model B from A but not vice versa, why not to use primaryjoin and make code cleaner?
Why wouldn't you want to do it vice-versa?
joins are just not made for that (at least in sqlalchemy)
model B could be in another schema and not even related by business processes. Not my architecture, don't blame me
It's easier to do select(User).options(joinedload(User.address)) than doing a join manually though ๐
wait. I meant lazyloading with relationship() and selectinload
- Don't do lazy loading
- Don't do lazy loading
- Disable lazy loading ๐
I always do explicit eager loading on query level, because lazy loading is bad in 99% of cases
Select IN loading
In most cases, selectin loading is the most simple and efficient way to eagerly load collections of objects. The only scenario in which selectin eager loading is not feasible is when the model is using composite primary keys, and the backend database does not support tuples with IN, which currently includes SQL Server.
That's official documentation for 2.0 and strategies. joineload uses JOIN strategy. If I need just one parent with collections selectinload works faster for me
I mean, use whatever loading is more suitable for your selationship
If it's 1:1 relationship then joined is probably better
Use dbeaver hehe
can import my databse on there ?
It's same as pgadmin
But better imo
Was a pgadmin4 user then swtiched to dbeaver
i looked into it and i would still prefer pgadmin
๐ฎ
Hey guys! So I have a Postgres db hosted on ElephantSQL and using it with Django REST framework but whenever I start my app using manage.py with the Django default dev server it gives me a lot of broken pipe errors in the console log
what is the convention on models, to pluralize the classname
class Houses(Base):
__tablename__ = "houses"
or not
class House(Base):
__tablename__ = "houses"
__tablename__ = "houses"```
k thanks
Hey all, I'm having some serious trouble constructing this SQLAlchemy 1.4 query. I have a mixin ORM object that defines a relationship on everything it's mixed in to, and I want to write some type of computed property that allows me to query a subset of that relationship. This is what I've got -- it all works except for the narrowed_children column attribute
class HasChildren:
@classmethod
@property
def children_join_tablename(cls):
return f"{cls.__tablename__}_roles"
@classmethod
@property
def children_backref_name(cls):
name = cls.__name__
return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
@declared_attr
def children(cls):
return relationship(Child, secondary=cls.children_join_tablename, backref=cls.children_backref_name)
@declared_attr
def narrowed_children(cls):
# tried this with lazy='dynamic' on children
# return column_property(select(cls.children).filter(Child.special_property == True))
# this also fails
return column_property(
select(Child).
join(cls).
filter(Child.special_property == True).
where(Child.id.in_(cls.children.id))
)
The error is
sqlalchemy.exc.ArgumentError: Join target, typically a FROM expression, or ORM relationship attribute expected, got <class 'User'>.
where SQLBase is my declarative base and User is given by class User(SQLBase, HasChildren): ...
Afaict it fails because the mixin is processed before the mapper knows about the User class? How do I even rectify that?
changing out column_property for deferred produces the same error
Please do let me know if I've XY'ed this as well, or if 2.0 provides a feature here I can use
So... I have a hobby project where there is a postgres docker container in production that I have to manually admin. I know that's a reallyl poor choice for an enterprise database and most people prefer to just pay for a managed db, but this is a hobby that is very unlikely to make me any money ever. There is a very limited amount of sensitive data. So its not THAT big of a deal imo. The thing is, I would still like tto avoid losing my user data if possible, just so that they can continue using the app in the future.
I already have the volumes set up like this so that the data persists
volumes:
- ~/srv/docker/template-postgres/data:/var/lib/postgresql/data:rw
My question is... how should I be approaching backing this thing up? What other things would you consider thinking about if you were doing this way? Again, I know there are reasons people do not do this. In this case, the risk is worth it to me, and im not afraid to learn some things about db admin as i go.
Do I need to install the sqlite3 library in Python or is it included?
it's included
import sqlite3
class Database:
def __init__(self, database, connection):
self.connection = sqlite3.connect("rmc.db")
self.cur = connection.cursor()
self.RMC_TABLE = '''
CREATE TABLE IF NOT EXISTS user_data (
discord_id TEXT PRIMARY KEY,
game_id TEXT DEFAULT NULL,
security_level INTEGER DEFAULT 0,
staff_role TEXT DEFAULT NULL,
event_log TEXT DEFAULT NULL,
activity TEXT DEFAULT NULL,
affiliation TEXT DEFAULT NULL
);
'''
try:
self.cur.execute(self.RMC_TABLE)
self.cur.commit()
except sqlite3.Error as e:
print(e.message)
finally:
self.cur.close()
Will this function as intended? I want it to initiate the database when called upon (unless it already exists).
almost. some things to note:
-
you're using
commiton the cursor, but that method is on the connection. that will result in an error. -
you're using
commitbut notbegin. either you need both (autocommit=False) or neither (autocommit=True). see https://docs.python.org/3/library/sqlite3.html#transaction-control -
don't just print an error if the operation fails. handle it appropriately in your application. usually converting "errors" to "informative error messages" should happen at the edges of the application, not deep inside. consider: what happens if there is in fact an error? you print an error message, but your application continues running in a broken state, likely resulting in more errors later. is that really what you want? #software-architecture is a good place to discuss this topic.
-
don't re-use cursors. one query per cursor. weird things happen otherwise.
-
you can get a cursor directly by calling
executeon the connection. you can just write this:
connection.execute(query)
- if you do want to create and close cursors, you can use
contextlib.closinginstead of try/finally:
from contextlib import closing
cursor = connection.cursor()
with closing(cursor):
cursor.execute(query)
results = cursor.fetchall())
The face that you're using Postgres in docker doesn't change much https://simplebackups.com/blog/docker-postgres-backup-restore-guide-with-examples/
Thank you
this isnt python related but i dont know where else i could ask this, im trying to make a database for an uber-like website and its my first time using databases, i usually just use json and files but i think its a very bad practice so i want to stop, what improvements can i make to this database? its the first version but im inexperienced so i cant see any flaws
the backend is written in python though, im just using phpmyadmin to visualise it
i don't like idea of using different length varchar as Foreign Key between those tables
can we have a normal ID autogenerated/autoincremented/ingeterish one for them
something like bigserial
you mean like you think all of the primary keys should be the same length?
i mean they should be different data type
ah i see
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-primary-key/
ID SERIAL PRIMARY KEY
this is a good default
In this tutorial, we will show you what is the primary key is and how to manage PostgreSQL primary key constraints through SQL statements.
it will be autoincremented some sort of interger
https://www.tutorialsteacher.com/postgresql/serial-type
SMALLSERIAL 2 bytes 1 to 32,767
SERIAL 4 bytes 1 to 2,147,483,647
BIGSERIAL 8 bytes 1 to 9,223,372,036,854,775,807
here u go. SERIAL is int32 under the hood
ty
@wise goblet but apart from the primary key type, the database is fine? is not using any sort of bad practices?
ah i see its apparently faster to lookup and compare
I don't think you nede to be that strcit with your types,
journeys.date and time could be a timestamp, journeytime could be a timedelta
Generally don't use strings for foreign and primary keys unless necessary
is the reasoning because its faster or is there more?
Year manufactured could probably be a date too
Reasoning behind what?
using integers over strings
ok ty
and they dont need to be the max length right
e.g i can use 1 instead of 0001
and it wont cause any problems
fare is only up to 99.99?
change journeytime to timedelta
In postgres it's called interval
I think it makes more sense, because it's a type specifically designed for that
From your current column I can't tell what it is - minutes, seconds, milliseconds, etc
And If you store date and time for both departure and arrival times you won't need that ๐ค
i dont store arrival times
The main reason is your key is not tied to your implementation details and you will not encounter in a future a problem of inability to insert a record because you exhausted your primary key limits
It is a key u will never (probably )have need to change or improve
Hey I'm trying to query a Postgres database using Sqlalchemy. I can't express this filter clause for some reason. Basically, I want to express A(B+C) (where A,B,C are boolean expressions), so some options I've tried(sqlalchemy => generated sql): query(Model).filter(A).filter(or_(B,C)) => WHERE A AND or(B,C)
query(Model).filter(and_(A, or_(B,C))) => WHERE A AND and(or(B,C))
query(Model).filter(or_(and_(A,B), and_(A,C))) => WHERE or(and(A,B), and(A,C))
All of these are invalid. How do I express this predicate with sqlalchemy?
A(B+C)?
I don't remember such boolean operations to be possible. / To exist.
This does not make sense.
It is like trying to summate carrots and potatos
Write us expected this equation in pure python booleans first and print result?
How would sql you want look like? It should be easy to convert to sqlalchemy
query(Model).filter(and_(A == True, text("B = 1 or C =1"))
When ORM fails use the raw query ||Luke||
and use echo=True, becaue in my case or_ was translated to AND. Nested or_ filter again
You need need raw sql here
You could use use or_
Hi i am using mysql and sqlalchemy, i have the following two tables:
class DbUsers(Base):
__tablename__: str = 'users'
id = Column(String(50), primary_key=True, default=lambda: str(uuid.uuid4()), nullable=False)
is_deleted = Column(Boolean, default=False)
company = relationship('DbCompanies', back_populates='user', cascade='all, delete-orphan')
class DbCompanies(Base):
__tablename__: str = 'companies'
id = Column(String(50), primary_key=True, default=lambda: str(uuid.uuid4()), nullable=False)
is_deleted = Column(Boolean, default=False)
user = relationship('DbUsers', back_populates='company', cascade='all, delete-orphan', single_parent=True)
Until now i used to delete them normally with cascade, when i delete the company, user gets deleted as well. But i want to make change to this, instead of deleting them i want to just change the is_deleted column to True. Is there a way to set it up so when i change the is_deleted value in one of the tables for it to automatically change in the related table as well ?
what are my options to turn my local database into a database that can be queried over the internet?
What do you want exactly?
where A and (B or C) this works. A,B,C are boolean expressions like before.
I have an org-roam database I would like to be able to query, but when I'm outside my house
where(a, or_(b, c))
Just use a db that can be accessed over the network? Like postgresql or mysql
will begin learning about these, am literally at step 0 and am used to local sqlite3. thanks for the keywords!
I would just install postgres locally, or locally in docker if you're familiar with it
this generates WHERE A AND or(B,C) which is invalid sql as well
I'm using 1.4 btw, could that be the issue? Can't upgrade with this particular project
Why can't you? ๐ค
1.4 queries should work with 2.0 but I'll try to help you a bit later, I'm not at home at this moment
It's an ancient project with tons of code that I'd probably have to change for any breaking upgrade
sure thanks
right i'll try that
I may be wrong though
Worked with query(Model).filter(A & (B|C)). Seems off that and_/or_ produce different results from &/|
They should do the same though
Where did you import or from?
sqlalchemy.func
Import them from sqlalchemy
They're not functions
Did you do func.or_?
Func allows you to call arbitrary sql functions, but as I said or and and are not functions but operators
yeah I guess this is where I went wrong
sqlalchemy.func.or_ instead of sqlalchemy.or_
thanks
can confirm. What a dumb rabbit hole.
nice, thanks for the suggestion!
A true Python uber-like service ๐
Does anyone have experience working with shapefiles (.shp) and GeoTiff data (.tif)?
Hey guys, I am making a database for some sprites I want to use in my actual code with sqllite 3
how do i start?
I would start with the sqlite3 tutorial in the documentation https://docs.python.org/3/library/sqlite3.html#sqlite3-tutorial
yup thats what im following
Yo guys, I wanna link two MySQL databases to my python code on Visual Studio Code. How do I do that?
It has nothing to do with vs code, you just need a Python library that supports MySQL https://dev.mysql.com/doc/connector-python/en/
Where does the conversion of python / templating -> raw sql happen within SQLAlchemy ? I'm trying to get the raw sql for some fairly complex dynamic queries and it's proving to be far more difficult than expected... but presumably somewhere it must convert from execute(text(query), params) into a raw sql string ? I just can't find where to hook into this
seems that it's here : https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/engine/base.py#L1416
though when i drop a breakpoint I just get:
(Pdb) type(meth)
<class 'method'>
which isn't that helpful.
lib/sqlalchemy/engine/base.py line 1416
return meth(```
๐
Any resources to learn datavases
Having an issue with my program, basically, I call sqlite3 to check whether a user is whitelisted using their game_name, I have added robust logging to ensure that variables are all right, which they are, just getting a bit confused at this rate as it could be an issue with the script that checks the database to see if they are whitelisted, or the one that adds them to the whitelist.
Figured it out, didn't have the right path set in the one that updates the database with the game names ๐ญ
documentations
Can someone point out the difference between 2 NF and 3 NF ? I got confused due to the similarity .
reading the 2NF and 3NF wiki, the quote from Bill Kent seems to be a good summary:
[every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key.
the "whole key" represents 2NF, meaning attributes can't depend on only a subset of the primary key, and "nothing but the key" represents 3NF, where attributes can't transitively depend on the primary key
see also the examples: Electric toothbrush models Manufacturer | Model | Manufacturer country ===========================---------------------- Forte | X-Prime | Italy Forte | Ultraclean | Italy Dent-o-Fresh | EZbrush | USA Brushmaster | SuperBrush | USA Kobayashi | ST-60 | Japan Hoch | Toothmaster | Germany Hoch | X-Prime | Germany Tournament Winners Tournament | Year | Winner | Winner's date of birth ============================----------------------------------------- Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975 Cleveland Open | 1999 | Bob Albertson | 28 September 1968 Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975 Indiana Invitational | 1999 | Chip Masterson | 14 March 1977 the first table violates 2NF because Manufacturer country does not depend on Model, and the second table violates 3NF because Winner's date of birth directly depends on Winner rather than the primary key
peewee is godd!
i hope it does have SQL version migrating system then ๐
XXDD
I think it's not commonly used migrating.......well for me)
im fine using peewee
with ORM of the peewee is simple to do my project
it looks like it does support migrations:
https://docs.peewee-orm.com/en/latest/peewee/playhouse.html#migrate
?XDD
but it no support all type of database
if u want migration to db noSQL
u can't
migrations can more simply be changing something about your model for an existing database, like adding new columns or altering constraints
Anyone know what's going on here?
File "/home/runner/ADV/advProject/main.py", line 5, in <module>
from gameapp import models
File "/home/runner/ADV/advProject/gameapp/models.py", line 4, in <module>
class Player(models.Model):
File "/home/runner/ADV/.pythonlibs/lib/python3.10/site-packages/django/db/models/base.py", line 129, in __new__
app_config = apps.get_containing_app_config(module)
File "/home/runner/ADV/.pythonlibs/lib/python3.10/site-packages/django/apps/registry.py", line 260, in get_containing_app_config
self.check_apps_ready()
File "/home/runner/ADV/.pythonlibs/lib/python3.10/site-packages/django/apps/registry.py", line 137, in check_apps_ready
settings.INSTALLED_APPS
File "/home/runner/ADV/.pythonlibs/lib/python3.10/site-packages/django/conf/__init__.py", line 102, in __getattr__
self._setup(name)
File "/home/runner/ADV/.pythonlibs/lib/python3.10/site-packages/django/conf/__init__.py", line 82, in _setup
raise ImproperlyConfigured(
django.core.exceptions.ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.```
apparently you need to define your settings before you can use models?
https://docs.djangoproject.com/en/4.2/intro/tutorial02/
https://docs.djangoproject.com/en/4.2/topics/db/models/#using-models
well bro give u docs XDD
i dont see a .models file
This seems exactly like django orm ๐
What are Querys are they like sending commands to be executed on the database itself or are query's only requesting information from the database?
Whatโs the difference? Either way, youโre โaskingโ the db to do something: return the data you wanted.
For example, I might say: โgive me all people who are older than 18 years oldโ select * from people where age>18
like are querys read only or can i manipulate add change and removestuff?
Yes, there are read queries (selects) but also delete and update queries. I could write: update person set type=โboomerโ where age > 25
ohh tysm
anyone used hudi before to join table on hive in spark sql? is it possible? i'm running a query and nothing is coming up. it just shows it ran for some time. I'm able to query specific data from hive table as well as hudi, but joining them altogether gives me nothing.
Use Pyspark

Let's suppose i have a table where i have three rows (Room, Username, Status) where Room can have duplicates
Status is 0 for everyone and i want to make it 1 for random username from specific room
are you asking how to accomplish this in sql? or something else?
Yes how to do this
Hey guys how I implement regression
Did you mean to ask in #data-science-and-ml ?
Anyway: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Examples using sklearn.linear_model.LinearRegression: Principal Component Regression vs Partial Least Squares Regression Plot individual and voting regression predictions Comparing Linear Bayesian ...
Regression #data-science-and-ml
hello, let's say you have a relational database running in prod, maybe an instance of postgres, and it's been running for a while storing a bit of data. on your local environment with your mock data, you can change/modify the tables etc.. maybe adding a new row field, how would you apply these changes to the production instance without losing the existing data?
You apply migrations if you are using an ORM, or you run the SQL commands to update the schema. Either way, adding new rows should not threaten existing data, but you obviously want backups to be sure.
Is there any good guide on joins, I don't understand them ๐
https://sqlbolt.com/lesson/select_queries_with_joins is where i learned it (also the lesson after that which mentions outer/full joins), though there are some alternative syntaxes not included
Ty
How does cardinality affect the effectiveness of indexes? I've read that higher cardinality columns benefit most from indexing. Is that accurate?
can you be a bit more accurate? cardinality describes relations between tables in databases
one to many
many to one
one to one
many to many
I meant it in the meaning of.. basically the number of unique values. Like cardinality of a set
The more unique values you have the more it makes sense to create indexes. It speeds up querys by a lot, just be careful with them
assuming we are talking about sql based dbs
Can I take that the other way around and say that if, say, u have only 3 unique values - indexing makes little sense in such a case?
Yes, do you know how an index works?
Only roughly. I'm not very good with data structures
Letโs keep it simple, an index is basically a shadow table ( you canโt access it) thatโs sorted specially. He parses the alternate table which is faster, then compares it with your actual table. This is slow if you have small amounts of data however itโs quick on lots of data
Yeah, I get that much.. but like. I know btree is generally the default, but I'll take a hash version, for example. We got our value in the key, and the IDs in memory in a set as the value. I don't really see how the number of unique values has anything to do with the effectiveness of the index. I mean, accessing the key is O(1) either way.. whereas without an index a value search would be O(n).. so if you have, say, a billion rows, I don't quite see how those billion rows being 3 unique values vs a million unique values should make a difference, but apparently it does
Youโre thinking about it the right way, itโs just the point that the query and the data (including cardinality) all have to be considered when thinking about whether an index helps or not (and how much)
Sometimes a database will โdecideโ to just do a full table scan instead of using an index (assuming one exists), as they have a cost function to decide which query execution plan is โbestโ
For instance, if you had a 1 million row table, with an indexed column A with three distinct values randomly distributed: {1,2,3}. If you want all rows where A=1, it may be faster for the database to just read the entire table (in pages) and pick out the rows where A=1 iteratively, rather than consulting the index and retrieving the rows. The reason is: the database would likely need to read each page into memory anyway... and a sequential scan in this case is probably faster than a random scan. (I'm waving my hands at the details here, just giving you the general idea)
Ig that kinda makes sense. Ig I don't quite understand the underlying implementation of exactly how records are searched over and retrieved. Cus, in my understanding, even so, with 3 randomly distributed values, with an index you'd only need to 1) access the index to get the locations (O(1)), 2) retrieve the rows from the location (O(1) for the retrieval, but O(m) to loop over the locations, so O(m), where m is the frequency of the value, which by definition will be lower than n (the total number of records). And.. a sequential scan would still be the entire database.. O(n) ๐
There are probably just some optimizations I'm not thinking about. Anyway, I'll just take it at face value. Thx for the explanation
The optimization part is pretty straight forward, but has to do with memory and storage access. Sequential is faster than random access. So, sometimes it's better just to do things sequentially rather than randomly. (again, a gross simplification, but just pointing out the principle at play)
Got it ๐
guys, does anyone know of a good database which i can use? i require it for some business and idk which one to choose. tell me of one which is free
PostgreSQL
In terms of libraries to integrate PostgreSQL in a python app, service, program, you could look at these libraries:
- https://www.sqlalchemy.org/
- https://alembic.sqlalchemy.org/en/latest/ (for automating schema management)
Just in case you happen to write an asyncio (or perhaps FastAPI) based python app, service, program, it could be for example: - https://magicstack.github.io/asyncpg/current/
hmm ok
can u explain a bit how to use it in my program? like i wanted to add a chat availability to my code
im atually quite a noob to coding lol
uh ehm I am a bit busy, but let me Google real quick if I find some useful tutorial
ok thx
A bunch of links not directly related to database choice, but more the application layer:
https://gealber.com/simple-chat-app-websockets-fastapi
https://fastapi.tiangolo.com/advanced/websockets/
https://www.youtube.com/watch?v=OorurBA6x4w
Take these with a grain of salt.
alr rlly thx sven
If you want to really get serious and have a database full of millions of chat messages, consider using TimescaleDB as the database backing your application layer. It is "just" an add-on to PostgreSQL for massive data that usually does not change after having been written. There, you make a PostgreSQL table and then convert it into a TimescaleDB hypertable. You can define data retention policies (to wipe old chat messages), data compression, etc.
A good first glance intro to TimescaleDB is https://www.youtube.com/watch?v=vm_hcP4CDbA&list=PLsceB9ac9MHScvW5NBuCaYafW87hP-Gi2&index=1 but you'd certainly also need to consult the official documentation and research some stuff. It is not that self-explanatory, but one gets used to it. You can talk to is using for example asyncpg (link above) if you are in an asyncio (or FastAPI) context.
hmm alr ima look into it thx sven. uve rlly helped me out here
@grave ruin Allow me to mention one last important thing. If you go that road, definitely consider the PostgreSQL / TimescaleDB data type JSONB as in this example, to store your chat message as a JSON document. https://docs.timescale.com/use-timescale/latest/schema-management/json/ I am sure Discord has a much different backend and tricky optimizations for even more efficiency (you could explore their developer blogs), but the links I provided to you are, advanced, serious stuff ๐
alr ill try it
hope it works ive tried so many databases but for some reason they werent working incorporatedly with my code
I think you mentioned my instead of someone else ๐
hi does anyone know about LZ77
What about it? Also #algos-and-data-structs is probably more appropriate.
ok. thank you
Sorry I'm stupid but how do I need to change this SQL query such that it will return all the tags and values and not just the one mentioned in the WHERE clause:
SELECT images.image_path, tags.tag_name, tag_values.value
FROM images
LEFT JOIN tag_values ON images.id = tag_values.image_id
LEFT JOIN tags ON tag_values.tag_id = tags.id
WHERE (tags.tag_name = 'sport' AND tag_values.value = 'soccer') ORDER BY images.id
if I don't have the WHERE clause it will return me all the tags and values, I just want to filter them but it should still return all of the associated tags and values per row
select * ....'remaining query'
no?
nvm, I dont even know what left join is
I'm quite oblivious to this SQL stuff but this is stumping me right now
all I know is cross, natural and equijoin
me too
if I have the WHERE clause then I'll just get that specific tag+value back with the image, not all of them
do you know how to connect python to mysql
then remove it
if I don't have the WHERE clause I get all the data for all the images but I just want to select which ones I get back
I mean there are other tags.tag_name + tag_values.value combinations I'm not getting back with this query
then choose another condition
which ones of what? are you trying to get the tag names/values for a specific image, or the tags of all images that have the sport: soccer tag?
I want to select all images which have tags.tag_name = 'sport' AND tag_values.value = 'soccer' but for every image that matches that I want ALL of the tags, not just the tags.tag_name = 'sport'
so there are other tags like 'league' etc. I'd like to get them too
but I don't want images which have lets say tags.tag_name = 'sport' AND tag_values.value = 'hockey'
also I'm using PostgreSQL, I'm not sure is it relevant
the original query I pasted just returns ('foo\\bar\\000000.png', 'sport', 'soccer')
hello
do you know the module mysql connector?
i know mysql-connector-python exists, but i dont use it or mysql
what specific question do you have about it? if you ask that upfront, i or someone else can try to answer it right away
so the tags of all images that contain sport: soccer, i think i'd write a CTE to filter the images first, then select the tags for those images: sql WITH soccer_images AS ( SELECT image_id FROM tag_values WHERE tag_id = (SELECT id FROM tags WHERE tag_name = 'sport') AND value = 'soccer' ) SELECT images.image_path, tags.tag_name, tag_values.value FROM tag_values JOIN soccer_images ON soccer_images.image_id = tag_values.image_id JOIN images ON images.id = tag_values.image_id JOIN tags ON tags.id = tag_values.tag_id;
assuming the schema and data looks something like this: https://paste.pythondiscord.com/WVPA
Yes I think that works, thanks a lot! At least the following seems to return all the tags for images with the matching tag:
WITH soccer_images AS (
SELECT image_id FROM tag_values
WHERE tag_id = (SELECT id from tags WHERE tags.tag_name = 'sport' AND tag_values.value = 'soccer')
)
SELECT images.image_path, tags.tag_name, tag_values.value FROM tag_values
JOIN soccer_images ON soccer_images.image_id = tag_values.image_id
JOIN images ON images.id = tag_values.image_id
JOIN tags ON tags.id = tag_values.tag_id;
I'll need the ability to filter for multiple tags too but I think that should be easy to add
this SQL stuff is kinda annoying because it is seemingly a bit difficult to even google for
select image_id
from tag_values v
join tags t
on v.tag_id = t.id and t.tag_name in ('sport', 'something else') and v.value in ('soccer', 'basketball')
This is just an alternative, without using a correlated subquery
thanks, I'll try to get this one working, I'm having a helluva time figuring out how do brackets and stuff work in SQL
for multiple filters
What do you mean? Just show an example of what you're trying to do, there's many ways to solve sql problems.
is (SELECT id FROM tags WHERE tag_name = 'sport') a correlated subquery? i haven't referenced anything from tag_values
You could do group by + having
select * from image
join tag on image.id = tag.image_id
where tag.name in ('sport', 'soccer')
group by image.id
having count(tag.id) == :AMOUNT_OF_TAGS_SELECTED:
Hyvok took yours and added "tag_values.value" to it, making it correlated... this line: WHERE tag_id = (SELECT id from tags WHERE tags.tag_name = 'sport' AND tag_values.value = 'soccer')
ah right
soo this again returns just the filtered tags, not all of them, the one by @waxen finch I adjusted earlier worked for one condition at least (the one I pasted before your solution)
Oh, if you want all the tags for the image too: ```sql
select i.image_path, t.tag_name, v.value
from images i
join tag_values v
on v.image_id=i.id
join tags t
on t.id=v.tag_id
where i.id in (select image_id
from tag_values
join tags on tags.id=tag_values.tag_id
and tag_values.value = 'soccer'
and tags.tag_name = 'sport')
That said, this really isn't different than the CTE example, just showing it written differently.
Also, in this case, I'd probably be looking to concat the tags, rather than one row per tag, such as: ```sql
select i.id, i.image_path, group_concat(distinct t.tag_name order by t.tag_name) tag_names, group_concat(distinct v.value order by v.value) values
from images i
join tag_values v
on v.image_id=i.id
join tags t
on t.id=v.tag_id
where i.id in (select image_id
from tag_values
join tags on tags.id=tag_values.tag_id
and tag_values.value = 'soccer'
and tags.tag_name = 'sport')
group by i.id, i.image_path
yes ok that works again, thanks, so what if I have multiple tags I want to match lets say "sport":"soccer" and "league":"la_liga"
If you know the number of conditions, you could do:
select i.id, i.image_path, group_concat(distinct t.tag_name order by t.tag_name) tag_names, group_concat(distinct v.value order by v.value) values
from images i
join tag_values v
on v.image_id=i.id
join tags t
on t.id=v.tag_id
where i.id in (select image_id
from tag_values
join tags on tags.id=tag_values.tag_id
and tag_values.value = 'soccer'
and tags.tag_name = 'sport')
and i.id in (select image_id
from tag_values
join tags on tags.id=tag_values.tag_id
and tag_values.value = 'la_liga'
and tags.tag_name = 'league')
group by i.id, i.image_path
I have the conditions in a Python dict so yeah I can add them there, thanks!
Yah, if you have arbitrary numbers of conditions (sometimes 2, sometimes 3, sometimes 4), you could either have a query for each case... or do what Doctor suggested before: #databases message
this doesn't seem to work, no group_concat(): Error fetching data from PostgreSQL: function group_concat(text) does not exist
Oh, use string_agg(column, ',') https://www.postgresqltutorial.com/postgresql-aggregate-functions/postgresql-string_agg-function/
uhh like this string_agg(distinct t.tag_name, "," order by t.tag_name)?
I think so? Not sure if distinct will work there
Just keep it simple and do string_agg(t.tag_name, โ,โ) first
The order and distinct just help clean it up
I don't seem to get anything back with that at least
I'm not sure in general should I even try to have a single query for all of this or just separate queries for each tag
in the final use case I'd be querying quite a large amount of data per query since this is for video frame archival
though performance is not a concern really since this is used for machine learning training, just forming the initial dataset
Best bet is whatever makes the most sense to you: whatever is easier for you to edit/fix/update.
Even if it's multiple queries and doing an intersection in Python, that's not bad for starting out.
yeah I mean each model training takes x days or whatever on the GPU cluster so it doesn't really matter if it takes 0.1 or 50 seconds for the queries to form the dataset, I just dislike doing something I don't quite understand the implications of
yeah idk I can't figure out the logic with this thing, this works:
select i.image_path, t.tag_name, v.value
from images i
join tag_values v
on v.image_id=i.id
join tags t
on t.id=v.tag_id
where i.id in (select image_id
from tag_values
join tags on tags.id=tag_values.tag_id
where (tag_values.value = 'soccer' and tags.tag_name = 'sport')
)
but this doesn't
SELECT i.image_path, t.tag_name, v.value
FROM images i
JOIN tag_values v ON v.image_id = i.id
JOIN tags t ON t.id = v.tag_id
WHERE i.id IN (
SELECT image_id
FROM tag_values
JOIN tags ON tags.id = tag_values.tag_id
WHERE
(tags.tag_name = 'sport' AND tag_values.value = 'soccer')
AND
(tags.tag_name = 'league' AND tag_values.value = 'serie_a')
)
adding parenthesis like
SELECT i.image_path, t.tag_name, v.value
FROM images i
JOIN tag_values v ON v.image_id = i.id
JOIN tags t ON t.id = v.tag_id
WHERE i.id IN (
SELECT image_id
FROM tag_values
JOIN tags ON tags.id = tag_values.tag_id
WHERE
((tags.tag_name = 'sport' AND tag_values.value = 'soccer')
AND
(tags.tag_name = 'league' AND tag_values.value = 'serie_a'))
)
doesn't seem to help, then it just doesn't return anything
(correction: below query selects images with either tag, not images containing both tags as OP intended)
@vast trench oh yeah, grouping makes it easy to filter images with the HAVING clause: sql SELECT images.image_path, string_agg( tags.tag_name || ': ' || tag_values.value, ', ' ) AS tags FROM tag_values JOIN images ON images.id = tag_values.image_id JOIN tags ON tags.id = tag_values.tag_id GROUP BY images.id HAVING bool_or( tags.tag_name = 'sport' AND tag_values.value = 'soccer' OR tags.tag_name = 'league' AND tag_values.value = 'National Hockey League' ); assuming there's an aggregate ANY function, which i couldn't find while i was testing it in sqlite...
oh right, i could do sum(...) > 0 in sqlite
btw your AND operator means you're trying to find a tag that both has the name sport and league at the same time, which isn't possible
SELECT
images.image_path,
string_agg(
tags.tag_name || ': ' || tag_values.value,
', '
) AS tags
FROM tag_values
JOIN images ON images.id = tag_values.image_id
JOIN tags ON tags.id = tag_values.tag_id
GROUP BY images.id
HAVING bool_or(
tags.tag_name = 'sport' AND tag_values.value = 'soccer'
AND tags.tag_name = 'league' AND tag_values.value = 'la_liga'
)
returns nothing again
^
sec
doesn't seem to work either :/
with
SELECT
images.image_path,
string_agg(
tags.tag_name || ': ' || tag_values.value,
', '
) AS tags
FROM tag_values
JOIN images ON images.id = tag_values.image_id
JOIN tags ON tags.id = tag_values.tag_id
GROUP BY images.id
HAVING bool_or(
tags.tag_name = 'sport' AND tag_values.value = 'soccer'
OR tags.tag_name = 'league' AND tag_values.value = 'la_liga'
);
I get rows with "league":"serie_a" etc.
if I change the OR to AND then I get nothing
SELECT
images.image_path,
string_agg(
tags.tag_name || ': ' || tag_values.value,
', '
) AS tags
FROM tag_values
JOIN images ON images.id = tag_values.image_id
JOIN tags ON tags.id = tag_values.tag_id
GROUP BY images.id
HAVING bool_or(
(tags.tag_name = 'sport' AND tag_values.value = 'soccer')
AND (tags.tag_name = 'league' AND tag_values.value = 'la_liga')
)
results in nothing but OR (tags.tag_name = 'league' AND tag_values.value = 'la_liga') returns (wrong) stuff
hmm starting to lean back towards this CTE (except with a join on tags like billy showed), but INTERSECTing multiple selects to find the image IDs that have all the desired tags
yeah that worked but I couldn't figure out how to extend that to multiple filters either
by intersecting selects?
yeah not sure how to do that
https://sqlite.org/lang_select.html#compound_select_statements
https://www.postgresql.org/docs/current/queries-union.html
have one SELECT to find the images with sport: soccer, another SELECT to find the images with league: la_liga, and INTERSECT them so you only get images found in both sets
this just seems super complicated for such a basic operation, is my DB layout exceptionally dumb or something?
I have no fixed layout yet so I can change whatever
from a normalization standpoint the schema seems fine to me and i believe satisfies at least 4NF, assuming there isn't any other expectation about your data model like value being dependent on tag_id, which could violate BCNF (though i could be wrong since i'm not particularly versed in normalization)
as for querying it, there's no query i can think of that doesn't involve multiple SELECTs, and i think organizing it as an intersect query is the simplest to understand (so yes, between this and billy's suggestion referenced below by berndulas comes down to personal preference for me)
fwiw the problem here is also similar to this stackoverflow post where their table(MTYPE, RNO, VAL) is roughly equivalent to tag_values(tag_id, image_id, value) in your situation
You need two separate IN lists if you want to check for two different tags, see: <#databases message>
Now you can use AND or OR for the two in-lists according to your needs.
yeah trying to add the INTERSECT now, this at least returns nothing:
WITH soccer_images AS (
SELECT image_id FROM tag_values
WHERE tag_id = (SELECT id FROM tags WHERE tags.tag_name = 'sport' AND tag_values.value = 'soccer'
INTERSECT SELECT id FROM tags WHERE tags.tag_name = 'league' AND tag_values.value = 'la_liga')
)
SELECT images.image_path, tags.tag_name, tag_values.value FROM tag_values
JOIN soccer_images ON soccer_images.image_id = tag_values.image_id
JOIN images ON images.id = tag_values.image_id
JOIN tags ON tags.id = tag_values.tag_id;
Ah yeah I needed this, seems to work now!!
WITH soccer_images AS (
SELECT image_id FROM tag_values
WHERE tag_id = (SELECT id FROM tags WHERE tags.tag_name = 'sport' AND tag_values.value = 'soccer')
INTERSECT SELECT image_id FROM tag_values
WHERE tag_id = (SELECT id FROM tags WHERE tags.tag_name = 'league' AND tag_values.value = 'bundesliga')
)
SELECT images.image_path, tags.tag_name, tag_values.value FROM tag_values
JOIN soccer_images ON soccer_images.image_id = tag_values.image_id
JOIN images ON images.id = tag_values.image_id
JOIN tags ON tags.id = tag_values.tag_id;
Hey guys, Im working on a mini graph database, was wondering where I can find very big datasets with billions of relationships that I can test it on, like trees or DAGs ?
kaggle graph dataset to google
https://www.kaggle.com/datasets/startupsci/awesome-datasets-graph
and etc
although u know, for such usage case, i would have probably just wrote my auto generator ๐ค
not a rocket science, just one-two functions with some loops and random int choices
You sure your parenthesis for the inner sub-select is right?
WITH soccer_images AS (
SELECT image_id FROM tag_values
WHERE tag_id = (SELECT id FROM tags WHERE tags.tag_name = 'sport') AND tag_values.value = 'soccer'
-- --------------------------------------------------------------^
INTERSECT SELECT image_id FROM tag_values
WHERE tag_id = (SELECT id FROM tags WHERE tags.tag_name = 'league') AND tag_values.value = 'bundesliga'
-- ---------------------------------------------------------------^
)
...
at least it seems to work that way
the parentheses for this unnecessarily makes the inner select a correlated subquery whereas the parentheses shown by berndulas (and #databases message ) isn't correlated
though arguably the inner join that billy showed for the CTE is what i should have written the first time, i.e. sql SELECT image_id FROM tag_values tv JOIN tags t ON t.id = tv.tag_id WHERE t.tag_name = 'sport' AND tv.value = 'soccer' INTERSECT SELECT image_id FROM tag_values tv JOIN tags t ON t.id = tv.tag_id WHERE t.tag_name = 'league' AND tv.value = 'bundesliga' either way, you might want an index on (tag_id, value) for this query if you haven't added one
ok changed the parenthesis, seems to still work ๐
Is there a way to convert the formatting of a parameterised SQL query from SQLA -> Psycop? ie i have something like:
select x from tbl where x > :val
and would rather:
select x from tbl where x > {val}
so that I can use sql.SQL(str) from psycop
I could regex replace, and maybe that's necessary, but am curious if there's an easier approach I'm unaware of
lol
any sysadmin resources for maintaining servers that hold databases? like regular integrity checks and such
It probably depends on what kind of databases
I need help with this badly,
This is my database
TABLE1 ----> TABLE2 <------ TABLE 3
BETWEEN TABLE 1 AND TABLE 3 IS MANY TO MANY RELATIONSHIP AND THE RELATIONSHIP HAS AN ATTRIBUTE, SO THE FOREIGN KEYS ARE STORED IN TABLE 2 , PRIMARY KEYS FROM TABLE 1 AND TABLE 3
MY QUESTION IS, IN MY ASSIGNMENT IT SAYS MAKE IT HARD TO delte the relationship between table 1 and table 2 ('the client is afriad someone might delete the relationship bymistake and wants to ensure that it cannoy be easily deleted.')
So how do i do this? my theory is i could add on delete resitrct, but that also does total partiipation which is something i dont want.
So how?
?
Close that terminal and do it again. You're in Python's REPL
hi there
can't figure out how to make migrations with alemibc
i have metadata defined in root.database and table defined in root.service.database
from sqlalchemy.orm import registry, relationship
from src.database import metadata
mapper_registry = registry(metadata=metadata)
auth_user_t = Table(
"auth_user",
mapper_registry.metadata,
Column("id", Integer, Identity(), primary_key=True),
Column("email", String, nullable=False),
Column("password", LargeBinary, nullable=False),
Column("is_admin", Boolean, server_default="false", nullable=False),
Column("created_at", DateTime, server_default=func.now(), nullable=False),
Column("updated_at", DateTime, onupdate=func.now()),
)
refresh_tokens_t = Table(
"auth_refresh_token",
mapper_registry.metadata,
Column("uuid", UUID, primary_key=True),
Column("user_id", ForeignKey("auth_user.id", ondelete="CASCADE"), nullable=False),
Column("refresh_token", String, nullable=False),
Column("expires_at", DateTime, nullable=False),
Column("created_at", DateTime, server_default=func.now(), nullable=False),
Column("updated_at", DateTime, onupdate=func.now()),
)
class AuthUserTable(object):
pass
class RefreshTokensTable(object):
pass
mapper_registry.map_imperatively(AuthUserTable, auth_user_t)
mapper_registry.map_imperatively(
RefreshTokensTable,
refresh_tokens_t,
properties={
"auth_users": relationship(AuthUserTable, backref="auth_user")
},)
and heres env.py alembic
from src.database import metadata
from src.auth.database import mapper_registry as auth_reg
from src.plants.database import mapper_registry as plants_reg
from src.auth.database import AuthUserTable, RefreshTokensTable
from src.plants.database import AnnualEmissionsTable, PlantTable
config = context.config
if config.config_file_name is not None:
fileConfig(config.config_file_name)
target_metadata = [metadata]
the migrations are coming out empty
how to get a collection in MongoDB from an information in it
is anyone here into data engineering?
Lots of us are, I am. Whatโs up?
I'm doing a project using pymongo and tkinter and wanted some guidance if possible
Those arenโt my specialty, but whatโs your question? Youโll get better responses with a direct question here, a lot of people lurk
I will do so tomorrow I'm afk rn
hey guys im doing a project and want to use nosql (mongodb) as database, do you guys have any odm to recommend?
iโve been looking into beanie, odmantic, mongoengine
should i go with the most stars in github? which is mongoengine
if i happend to do it in mysql or any sql database i will create new table for questions and make foreing key to the topic. But now is this the way to do it in firebase? like seen in the screenshot. I have topics and each topics have questions that will grow in time. each question is a multiple choice question. What am i doing is this a way to do it in firebase. Now i am doing like a sql database. How can i organize the data?
Anyone have any tips for sql? Iโve been doing questions for weeks, but I cant seem to memorize the statement details completely..
async def read_dbs(sos_event_record):
async with database.transaction():
query = select([vivid_event]).where(
and_(
func.text(vivid_event.c.localdate) == sos_event_record["date"],
func.similarity(vivid_event.c.venue, sos_event_record["venue"]) >= similarity_threshold
)
)
results = await database.fetch_all(query)
for result in results:
g.append(result)
print(
f"Match found: {sos_event_record['name']}-{result['name']} - {result['localdate']} - {result['venue']}"
)
async def main():
await database.connect()
sos_events = await database.fetch_all(sos_event.select())
print("start")
for i in sos_events:
await asyncio.create_task(read_dbs(i))
print(len(g))
anyway to make this thing way faster? i need to take 200k records from one db and then compare with second one. its databases orm and postgres sql
. slow with gather too
Can u do the comparing in the db instead?
They're different dbs I guess
Maybe loading both into some kind of temporary storage would be faster if it's not a one-off script
Anyone well versed with alembic?
what are you stuck on specifically?
instead of doing this in a loop, maybe load the data from one database into the other? maybe using a temporary table if your db supports that
can someonw tell me where did i go wrong https://paste.pythondiscord.com/ESIQ
so i load and then what? im udsing postgres builtin fuzzy algo
how wouldi do that
The main problem is that you're running a db query for each sos_event. If you had a single condition, you could just pass a list of values (ie: pass a list of dates) and query everything on those dates, and then filter by similarity. I think in this case, assuming date range is suitably narrow, and I just wanted to get a quick answer, I'd write the query to filter on a fixed date range, and pass a list of venues something like: (event.c.venue in (venues)), and return all hits w/ similarity score... then do a final filter in Python to get the exact date/venue matches.
The most flexible approach is to create a table in the target db, and join the new table with the data, then you can adjust the criteria however you want later. You could also filter this to the date(s) you're interested in. ** It might even be faster/easier to just query on the date range and pull down all the records, and compare in python... compared to querying one by one.
I want to learn mongo can anyone help me?
Hey people, not sure if this is the right sub-channel - i am currently in the process of profiling a few scripts, and have been using cProfile which is fine, but what I actually just simple want is an overview of how long each line takes to run, cum'ed given that there are loops etc. - are there any libraries for this?
a better question for #python-discussion
Hey @lethal bronze , you may want to check out https://pypi.org/project/line-profiler/ ๐
Thanks man! It was perfect
hey i have a question how do i raise a error if studennumber is not in the database: student_number = input("What student do you want to change(give student_number?")
change_class = input("Which class do you want to assign him.")
c = con.cursor()
c.execute("UPDATE students SET class = ? WHERE studentnumber = ?", (change_class, student_number))
After running the query, you can check the .rowcount attribute on the cursor to see how many rows were affected. If it's 0, no update was done
I'm running a bunch of operations on a SQL db using peewee, in this it doesn't add any records, but does delete a bunch (as of right now 96,295) I made a backup of the database before running through this script but it has the exact same amount of disk space as the original. Why is this?
which database?
postgres?
sqlite3
if you delete a row in SQLite the space occupied by the row is just marked as "free" to sqlite itself, and not returned to the operating system. As to why, if you have 20 tables and delete half of some table, and the location of the table in SQLite would be somewhere in the middle of the file, how would SQLite return it to the operating system, since it's a single file? You should be able to use VACUUM (https://www.sqlite.org/lang_vacuum.html) to free up disk space
Ah okay makes sense, thank you!
If I have done rows = conn.scalars(select(MyTable).where(....)).fetchall(), what is the correct method to find the col names?
This does not look right list(rows[0]__dict__.keys()) as it uses an internal var and includes _sa_instance_state
I'm super new to this and just looking for some help. When using sqlalchemy, can an association object in combination with association proxy have two relationships pointing to the same table. Example: let's say I have a table of characters in a book and the association object keeps track of who has met who and a third column on the association object tracks the date of them meeting. Is this possible?
There's a section in the documentation titled "Association Object"... It looks like exactly what you are asking for
@fading patrol thank you for the response. Ive been reading through it all day but am unsure how to make it work with 2relationships to the same table. It keeps throwing back an ambiguousforeignkey error cause error.
If some one has the tome to go over in the future with examples, I would be grateful. I can share my current code and everything
i assume you're using sqlalchemy? in this case each row would be a MyTable instance. you can get a list of instance attributes dynamically of course, but normally you'd be expected to just look them up on the class definition. or are you using dynamic reflection?
You are correct in sqla assumption. Yeh I was thinking at one point that I could look up the class def. And I had actually made a to_dict() method that would do that , but i am also using alembic (which is realy helpful) so i would like the plain simple way of intergetating the row of its columns
what are you actually trying to do?
Bcs the to_dict is wrong unless I update it
this sounds like an XY problem https://xyproblem.info/
Asking about your attempted solution rather than your actual problem
No. I'm simply asking. What is the method call to list out the colum names of a thing that was returnd by conn.scalar or conn.scalars
the answer is that there isn't one, because internally sqlalchemy turns the columns into instance attributes on the class
maybe there's a sqlalchemy method for it that i'm not aware of. you might have to dig around in the docs. but it's not how sqlalchemy is normally used.
there might be some method on the class itself to enumerate all mapped fields
if you check the sqla docs, you'll see that Connection.scalars returns a ScalarResult which is documented here https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.ScalarResult
and that's based on the Result class https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Result
aha, you might be in luck
it looks like that might not be an exact listing of column names, but it might be a step in the direction you're looking for, without needing to introspect into __dict__
The documentation of sql alchemy is renound to be one of the worst example of documentation. So search bar in sqlalchemy gives you a bunch of stuff that you have to then go and find the ref. It's not great at all.
no, but in this case you can start with the specific function you're calling and look at the result types
the docs have improved quite a bit. the search is still very bad, but the reference doc is well cross-linked now
https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Result.mappings calling that method gives you a https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.MappingResult which yields dictionaries
so that might be what you want?
rows = conn.scalars(select(MyTable).where(....)).mappings().fetchall()
if i'm reading the docs right, that should return a list of dicts (or at least things that look and behave like dicts)
if that doesn't work, you'll have to keep following similar threads
browse methods, look at result types, browse those methods, repeat
I do recognize that but I still asking the question here you must understand that I have tried and gave up on docs. I have only so much time to live ๐ช
well i just found at least two things that might help, so try those and see if you can make progress
if not, at least you can start where i left off
Thank you. Now teach me how you can get there with a search. Teach me how to fish please?
i wanted to find the docs for Connection.scalars.
so i started with "sqlalchemy cursor" in duckduckgo. that got me to this page, which i've seen many times but i think would be impossible to find from the sqla home page: https://docs.sqlalchemy.org/en/20/core/connections.html
from there, i clicked the link in the first paragraph to Connection because i knew i wanted a method on that class: https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Connection -- it turned out to be a link to an anchor further down on the same page
then i clicked the scalars method in the gray box and got to here: https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Connection.scalars
so on and so on
finding that connections.html page is the hard part. from there it's just following links (open in new tab so you don't lose your place in that giant page) until you find what you need, or if it looks like you're stuck or going in a bad direction, you can backtrack and try something else
Hmm. Totally appreciate your dedication. But you started with searching for cursor. I'm using Scarar
right. i happened to know that the result of scalars is still something resembling a cursor, and i was fairly sure sqlalchemy used that word in their docs a few times
this also worked: sqlalchemy connection scalars site:docs.sqlalchemy.org
the 2nd search query should work better for you
I'm just saying that we need pepole like you that are willing to supliment docs. So docs are insufficient. It's not just me, it's a trove of people who have said sqlalchemy docs is the worst documentation, and yet the api is the most widely used.
anyone know sql?
im unsure if what i wrote is correct
i get confused when to have a sum() or not
Seems like you answered your own question then, tinydb is the module
We don't know what you wrote, or what you are trying to do. #โ๏ฝhow-to-get-help
Does anyone have any ideas on how I could easily import data from a query executed with a python script into tableau automatically?
Want a workaround for more complicated filtering and transforming vs running it via sql making the query take longer than using an efficient algorithm for it. I don't think tabpy is available for me to use in this case
that wrapper around a json file?.. should be fine for small projects, but be aware that it's not any better/more error proof than using a json file
["object1", "object2", "object3"]
if i wanted to add an "object4" to this json list from a python file, how would i do that? sorry its been years since ive coded and i couldnt find any examples :/
open the file, read the list into memory, append to the list, write back to the file?
assuming that it's stored in a normal something.json file
import json
with open(filename, 'r') as file:
data = json.load(file)
# modify data
with open(filename, 'w') as file:
json.dump(data, file)
Hello, I'm particularly using flask in python as its backend with database, I just wanted to ask if anyone could recommend a free domain I could use to host it.
replit?
tysm, i had code that used it a while ago but i think i deleted it so i wasnt sure how to do it anymore ๐
@bot.command()
async def test(ctx, channelid):
with open('channels.json', 'r') as file:
data = json.load(file)
if str(channelid) in data:
await ctx.send("1")
else:
await ctx.send("0")
so im trying to make a thing where it tests to see if the variable "channelid" (sent by the user in the command) is in channels.json and if it is send a 1, and if not send a 0. it keeps returning a 0 indicating it can't find the item even though it is in the list. the id is stored as a string in the file as well but it still doesnt work for whatever reason, is there any obvious reason this is broken?
what does print(repr(str(channelid)), repr(data)) shows
Are you checking if it's a key in the dictionary or a value?
If its a value you need to check using data.values(), you said in a list, so I'm assuming the list is a value of a key
the file just has this ["1183131046586359908"]
thats a value correct?
That's what's in your json file?
@bot.command()
async def test(ctx, channelid):
with open('channels.json', 'r') as file:
data = json.load(file)
if str(channelid) in data[0]:
await ctx.send("1")
else:
await ctx.send("0")
That should work. But json isn't the format you should be using then. Prolly just make a .txt file with each id on a new line
whats the [0] after the data for
The first item in the array
ok
Just check and see if it works
type of argument "int" is not iterable
can i edit a txt file?
with commands
from the main .py file
Yes you can
ok
OK so data is an integer
This is unquoted in the file
It has just [1183131046586359908]
so remove the quotes?
The problem is there are no quotes
yes there are
But you don't need them, you should be fine if you remove the str
See the โช ? It means the file is unsaved
bruh
Damn good catch
Vsc auto save FTW
Not a great idea
Depends if you like flipping a coin on deleting all your code ๐ญ
still not working
The problem is accidentally deleting, autosaving and accidentally closing
But sure
Error?
You must show another screenshot and the code you currently have
This shows you're checking for a string in a list of integers
Remove [0]
i tried that
This isn't even a #databases problem
Retry that and save
true
Not really related to anything
Are you sure it's the same ID?
i just copied the id in the file and ran the command
Have you restarted the bot
every time i make a change
Otherwise the command isn't reregistered
ok issue
i printed the two things
it thinks the value in the json file includes the brackets
No shit. Doing channelid in data should work
data basically equals [channelid] if I understand correctly
channelid is in [channelid] therefore channelid is in data
ok is there a better way to do this than json files
bcs i can just switch if it would be easier
Overwrite the file with [5] and see if !test 5 works
Make sure channelid: int
That was probably the issue now that I'm thinking about it
so delete the value in the json file and replace it with [5] and typehint the channelid value
just typehinting?
i know how it works
it just takes "channelid" and reads it as an int instead of a string, which is the default since the message sent by the user is a string
I assumed it worked
๐
anyone ever venture into information science?
waht do
Any suggestions for building a data base for a behavioral enumeration node
Can anyone help me get a free website template
how do i make a cheap or free database for a discord bot?
SQLite's a good place to start off, it's a file-based relational database which doesn't require setting up a server and is well suited for the structured data you might store in a discord bot (guild configuration, member stats, etc.)
Python comes with the sqlite3 module built-in but if you need async support, you can use asqlite https://github.com/Rapptz/asqlite written by the same author as discord.py
and if you don't know much about SQL, you can learn the language with https://sqlbolt.com/ and from SQLite docs https://sqlite.org/index.html
alternatively if you want a client-server database, PostgreSQL is another good choice too
aye
would this work for a larger server?
sqlite can handle larger workloads when you manage your connections properly, but there are other factors to consider like whether you need to access the database from a different computer (which is arguably the most important consideration)
you can look through their appropriate uses section and see if it'll suit your needs
it depends how you expect your discord bot to be used. sqlite can't handle concurrent writes, only concurrent reads
if i want it to track reactions and such
how many writes (insertions/updates/deletions) do you expect to happen in a second?
not many, maybe 2-3 at max
then writing definitely won't be a bottleneck
^ to clarify this is true with sqlite's write-ahead log mode which the asqlite package enables by default, if you used something else like aiosqlite and forgot to execute PRAGMA journal_mode = WAL;, writing connections would also block reader connections
you still can't handle concurrent writes, only that a writer won't block readers
I'll come back if i have any questions
oh yeah, it's not mentioned in asqlite's readme but they have connection pools if you want to use that instead of creating connections directly: ```py
async with asqlite.create_pool("mybot.db") as pool:
bot.pool = pool
# in some other code:
async with bot.pool.acquire() as conn:
cursor = await conn.execute("SELECT ...", 1, 2, 3)
...```
https://leaderboard.cowbel.ly/ would somthing like this be possible? its from a server im in, the owner is shutting down the bot so we're making a cheaper to mantain copy ( we have permisson to)
yeah sqlite would be fine for that
gotcha
well thanks a bunch guys, im not the best at bot dev so your help is really appereciated
#discord-bots message
@vital garnet your initial comment didn't give many reasons for not using sqlite in real applications, and what i listed are its typical benefits over bare files / writing a custom storage solution
I'm not sure, what you're arguing about. You've said yourself, that concurrency is an issue. There is just such an incomprehensible gap between a fully fledged mature RDBMS and sqlite. It doesn't even compare, and it is nonsensical to try
correct, thats why it depends on their needs
Whether you take my word for it or not, it doesn't require much digging to figure that out yourself.
Yes, and your needs should not include hosting a service as I've said
Sqlite3 concurrency becomes not an issue after turning on WAL
sqlite.Open(string(dbpath)+"?cache=shared&mode=rwc&_journal_mode=WAL")
Configurable at the level of opening connection string at least
I haven't been in industry for over almost 3 years now, so I haven't really kept up with databases
Concurrency is absolutely always an issue. Dealing with concurrency related problems gives millions of people a job. What you may refer to, is that more recent versions can handle concurrency to a certain extend
im still quite confounded by this last benchmark i did on sqlite though #databases message , one connection managed to do on-disk selects faster than 10 connections
10 connections still managed 142 queries/second though, so not terrible
Technically, connections reopening is not intended for any database. All databases assume u will reuse connection/other pool of connections.
So I would say that is out of scope of any database driver to care about
I just don't know, what to make of such results. What type of queries? More importantly, connection management is an entirely different issue, where you also have to factor in other elements you'll have in practice e.g., Connection Pools
using WAL in sqlite allows you to have writes and reads at the same time, but only 1 writer still
yes, the benchmark was reusing connections which went into an asyncio.Queue (following asqlite's implementation)
You'll, also never beat fully fledged RDBMS with sqlite for non-trivial queries, due to low level optimizations, graph optimizations, etc.
In sufficiently large databases"
YAGNI. You aren't always needing to care about parallelism if your end users receive results fast enough as it is.
Lack of parallelism helps to keep more sane code logic. Which easier to unit test predictably
Since I use golang, which has ultra powerful simple parallelism, I do abuse it a lot though. I still often make switches to ensure synchronous executions for unit testing or debug runs
You absolutely always need concurrency for any service, and this has absolutely nothing to do with "sane code logic." You're still writing the same queries, we're talking about the underlying engine you'll be using to serve your queries
this is the decision-making normally made between sqlite and other databases, enough that's there's an appropriate uses page
how much concurrency is expected? and could there be enough traffic that the service needs to be split across multiple systems (and therefore require network access to the database)?
fyi mentioned in that page is:
Websites
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
Gotta love the weasle words
Anyway, I'm out of this discussion. Not really sure what we're arguing about, and honestly I don't care. The past 5+ messages have provided no additional insights. Nothing here contradicts with what I've said earlier. Neither did I contradict anything you've said
how do I install rapptz asqlite? @waxen finch
assuming you have git installed, pip install git+https://github.com/Rapptz/asqlite is how you would get it
TY
anyone know any simple ways for converting data in csv file to a db file or should i just remake the db file from scratch?
Most databases have a method to read from a csv. What db are you using?
ah okay i didnt know if I was losing performance by doing that, im using the async sqlite which is just sqlite3
Nah, I think theyโre more optimized than anything Iโd do outside the db.
do u know what pandas is in comparison to sqlite? is reading csv files with it slower?
Pandas isnโt particularly fast for csv reading, I use pyarrow for that.
No idea for SQLite. What kind of data are you working with?.
just for a game
to output the data in embed messages
w discord py
maybe edit the data sometimes
Oh, then why so worried about out performance? I thought you had some big dataset.
my first time ever using sql, I usually do pandas to read csv for personal bot, but then I was gunna move bot to the gaming server and I didnt know how the code would differ if multiple people called commands at the same time trying to read or write to the csv file
Do people have good suggestions for resources and libraries that aid in SQLite migrations? I just redeployed my application that uses a SQLite db with a bunch of schema changes and it was ... kinda miserable to fix manually (unfortunately I cannot just drop and recreate the table like I would in dev because I have production data stored).
I am not using an ORM if that makes any difference (I hope not).
The most commonly used is https://alembic.sqlalchemy.org/en/latest/ , it is easier with an orm like sqlalchemy since all examples use it, but you can still do it by connecting to the database with whatever engine you are using. For production data i think you can also do it using alembic, but an alternative could be to use a seed command depending on your framework (something like flask manage.py seed) with the command having some logic to creates tables and load data in your database.
i've been using Liquibase at work, it's pretty heavy-duty but it has a lot of features
i was not a big fan of alembic when i tried it. lots of manual fussing around compared to liquibase
+1 for alembic, you can use it without sqlalchemy 
Is there a way to make postgre queries in sql alchemy not case sensitive?
Wdym? ๐ค
I want to find names such as Kaiba or like Dark Magician but I don't want to care about the case sensitivity i found something called ilike that seems to do it but idk what the trade off it has
lower on the query or lower on the search term?
On both
wouldn't that in both instances just look for the element kaiba and not find it since the element in the db is Kaiba?
I just said that you'd have to use lower on both
select * from table
where lower(name) = lower(:search_term)
You're going to need a library with an async/await interface
Other than that, it's a matter of efficiency over simplicity
guys why i getting this error, weird
near "(": syntax error
CREATE TABLE IF NOT EXISTS data (
project_id VARCHAR(100) NOT NULL,
schema_id VARCHAR(100) NOT NULL,
content_id VARCHAR(100) PRIMARY KEY NOT NULL,
content BLOB NOT NULL,
type TEXT NOT NULL
UNIQUE (content_id)
)
I think you're missing a comma after type TEXT NOT NULL
Btw, why not use content_ID VARCHAR(100) UNIQUE PRIMARY KEY NOT NULL
Thinking about it, why use the UNIQUE constraint on a primary key
It's already unique
I'd understand if you used it together with other columns, but it's just the primary key of the table :/
Also varchar(100) doesn't seem to be the best choice for an ID datatype.
Right
ill insert uuid in that, i using sqlite
Why do you want to use uuid? ๐ค
Also uuids are a lot shorter
i second that it isn't the best data type for uuid, you can do shorter or use the GUID extension
nvm found it outt
then post the solution here so others can find it
I have the following classes and was wondering what the best approach to populating them from a database would be. The approach I thought of was to just have a get() method, but I was wondering if there was a more elegant approach
class User():
def __init__(self, user_id, name):
self.id = user_id
self.name = name
podcasts = List['Podcast']
``` and
```python
class Podcast():
def __init__(self, podcast_id: int, db_instance: Database):
self.id = int(postcast_id)
self.podcast= db_instance.fetchone("SELECT * FROM Podcasts WHERE id = ?", (podcast_id, ))
if self.podcast:
self.host_id = self.podcast[1]
self.name = self.podcast[2]
self.platform = self.podcast[3]
When working with sqlite3, does the database file type matter? Example .db, .sqlite, .sqlite3
I would recommend just using sqlite3
You can find some options and arguments for each of them in https://stackoverflow.com/questions/808499/does-it-matter-what-extension-is-used-for-sqlite-database-files
what interfaces exist for creating a secondary database whose entries are based and modified according to the state of a primary database? (in sqlite3)
do we just deploy some python-fu, sqlite3-fu, and sqlalchemy-fu?
entries are based on: just copy the file
entries are modified - as in if you edit in db 1, it will also be in db2?
entries are modified: data is processed from db1 to make an entry in db2
so you don't have the same modifications in both dbs? you want to read from some table in one, and write to another?
yeah, and db1 will often be modified, and db2 must keep up with it's processing of db1 data to make its entries. doesn't need to be realtime, just, whenever I want to consult db2
seems to me atm that this is just a combination of python, sqlite3 and sqlalchemy right?
yep, i would say the same
nice, thanks!
hi all! i have an sqlite database where one of the rows is a timestamp. does anyone know how i could do a query where i gate data from a certain date?
Something like this: ```py
import sqlite3
from datetime import datetime, timedelta
timestamps = [(datetime.now() + timedelta(hours=12*i),) for i in range(5)]
with sqlite3.connect(':memory:') as conn:
cursor = conn.cursor()
cursor.execute('CREATE TABLE data (id INTEGER PRIMARY KEY, ts DATETIME)')
cursor.executemany('INSERT INTO data (ts) VALUES (?)', timestamps)
cursor.execute('SELECT *, date(ts) FROM data where date(ts) = ?', ('2023-12-13',))
r = cursor.fetchall()
print(r)
cheers! i'll give it a shot
The key thing is the date(ts) part. It converts the timestamp to a date so you can check equality
yeah, I was using DATE, but I couldn't get it working. turns out i just had to add 'unixepoch'
like DATE(timestamp, 'unixepoch')
well that was annoying... been using more time than i'd like to admit figuring that out ๐
8,15 add degree of both the tables multiply the cardinality of both the tables
I need help with an SQL Model design using SQLAlchemy, trying to create an API for an application that automatically marks attendance of teachers once they enter a class room.
Admins can add schedules for staff members/teachers, which will be nothing but class timings.
A class/schedule can be a single time one, or reoccurring as well.
Below is my model:
__tablename__ = "schedules"
id = Column(Integer, primary_key=True, index=True)
staff_member_id = Column(
Integer, ForeignKey("users.id"), unique=True, nullable=False
)
location_id = Column(
Integer, ForeignKey("locations.id"), unique=True, nullable=False
)
title = Column(String, nullable=False)
is_reoccuring = Column(Boolean, default=True)
# Date will be null for reoccuring classes
date = Column(Date, nullable=True)
day = Column(Enum(DaysEnum), nullable=False)
start_time = Column(Time, nullable=False)
end_time = Column(Time, nullable=False)
created_at = Column(
DateTime(timezone=True), nullable=True, server_default=func.now()
)
updated_at = Column(DateTime(timezone=True), nullable=True, onupdate=func.now())
class AttendanceModel(Base):
__tablename__ = "attendances"
id = Column(Integer, primary_key=True, index=True)```
Now the problem is that how do I mark attendance for each class?
I thought of creating a JSON object for each user against their schedule and storing it in ```AttendanceModel```, but what if a schedule gets changed in between, then I will have to recreate the entire JSON object, and iterate over previously stored records which will be a huge pain.
Any suggestions will help.
May I request if you could please share the link of the DB/SQL course that you are working on? I would like to look into it.
just a random sample question paper from cbse
Hi ! is there a channel where i can get help with an sql issue ? ^^'
here, if you ask your actual question
(assuming that someone knows the answer and has time to spare)
oh , okay thanks, this is a very basic thing,
I m trying to import an excel file as database, but i have this error :
The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local computer. (System.Data)
i tried uninstalling and reinstalling mssms but doesnt fix it, i trired the developper one and the express one still nothing
Since i m a beginner in this field, i m a bit lost haha
you mean a .xlsm / .xls file or just .csv?
.xlsx
I don't know if the database you are using has tools for importing that directly, but I would probably try converting it to a csv first, probably manually
i will convert it and try but as far as i know and of what i red online, the error isnt comming from the tpye of file , but ill try
it seems that it can't take csv files, only xls
TITLE: Assistant Importation et Exportation SQL Server
The file path contains an invalid Excel file. Provide a file with an .xls, .xlsx, xlsx, .xlsm or .xlsb extension.
BUTTONS:
OK
the full error message is :
Program Location:
ร System.Data.OleDb.OleDbServicesWrapper.GetDataSource(OleDbConnectionString constr, DataSourceWrapper& datasrcWrapper)
ร System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)
ร System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)
ร System.Data.ProviderBase.DbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
ร System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)
ร System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) ร System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource1 retry, DbConnectionOptions userOptions)
ร System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
ร System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
ร System.Data.OleDb.OleDbConnection.Open()
ร Microsoft.SqlServer.Dts.DtsWizard.DTSWizard.GetOpenedConnection(WizardInputs wizardInputs, String connEntryName)
ร Microsoft.SqlServer.Dts.DtsWizard.Step1.OnLeavePage(LeavePageEventArgs e)
can I name an env file like token.env?
you can name an env file what you want
please don't ping random people for help
Hello , i m still struggling with my Microsoft sql server issue , where i cant input data from an xlsx file, can someone help ? here is the issue, i dont even have an error code anymore .. thanks
Quick question what if I want to make a leaderboard for my bot and it has to be large enough to be added to multiple servers with the largest having over 11,000 members. Is there any chance of having a site host that large of a amount for a low cost or something
Ping me if you have an answer please
hello python people of python land, I am working on a tiny bit with a database, and what does isolation level do?
Is it possible to migrate data from one database to another using SQLAlchemy without removing the foreign keys?
yes, this isn't a lot of data
which database? postgres?
Well idk, seems like a lot to me, never made a database/leaderboard before so Iโm still learning
unless you're storing a ton of data for each user, pretty much any host will do
https://leaderboard.cowbel.ly/ Iโm trying to sort of copy this in a way
Also the amount of mutes a person has
is this something that counts for every time someone sends a message?
yeah, that should be fine
The thing is what and how can I achieve that
Keep in mind Iโm new to this so sorry if Iโm a bit slow
(dw volcyy i suggested them asqlite a couple days ago)
fetchall() returns a list of sqlite3.Row objects, and each row contains one set of values for the columns you've selected
you can read more about rows here https://docs.python.org/3/library/sqlite3.html#sqlite3-howto-row-factory, but basically you can retrieve a value from the row either by 0-based index, column name, or in a for-loop by tuple unpacking: ```py
c = await conn.execute("SELECT user_id, item FROM inventory")
rows = await c.fetchall() # list[Row]
first_user_id = rows[0][0]
or:
third_item = rows[2]["item"]
or:
for row in rows:
for user_id, item in row:
...```
the rough idea is that you wait for reactions and then insert or update an entry into your database every time someone clicks it
even in larger servers, you can still reduce bandwidth and memory consumption by carefully selecting the types of events you want to receive via gateway intents, as well as limiting any unnecessary caching like message/member objects; database performance won't be that important as it'll take a lot more members to see traffic close to hundred reactions per second
Fortunately most of the servers getting it will be mostly inactive
Iโm just wondering how it works and what hosting site it can use
Like how would you do that?
If given the task
a cheap vps running some linux distro would be a good idea, here's a list of hosts:
https://www.pythondiscord.com/pages/guides/python-guides/vps-services/
and some digital ocean articles about what you might do on a VPS:
Thanks
Thanks @lament parcel @harsh pulsar and @paper flower for your suggestions. I'll take a look at my options with those included and decide on a solution when I return to my project!
FWIW, there's the first-party sqldiff program which can detect some changes to your schema, and i've also spent the last couple of days writing up my own sqlitediff package for more detailed comparisons, but if you prefer keeping your migrations low-tech, you might want to track changes using PRAGMA user_version (or a regular table) and apply hand-crafted SQL scripts accordingly
can someone explain mysql calls to me?
I think that's essentially what alembic and most migration tools do? 
Hey guys, I'm working towards reaching an intermediate level in SQL. So far, I've read up on W3Schools and practiced on HackerRank. Where should I go from here?
Build apps 
https://www.sqlite.org/isolation.html the official docs will probably explain it best, the short form is that the isolation level determines how other connections see uncommitted data
^ i also found wikipedia useful for understanding isolation levels in general https://en.wikipedia.org/wiki/Isolation_(database_systems)
but for python's sqlite3 module in particular, their isolation_level attribute has a different meaning:
https://docs.python.org/3/library/sqlite3.html#transaction-control-via-the-isolation-level-attribute
If the connection attribute
isolation_levelis notNone, new transactions are implicitly opened beforeexecute()andexecutemany()executesINSERT,UPDATE,DELETE, orREPLACEstatements ...
in other words, you need to commit() whenever you use one of the above statements:py conn = sqlite3.connect(...) conn.execute("INSERT INTO my_table VALUES (?, ?, ?)", (1, 2, 3)) conn.commit()unless you setisolation_levelto None, then you don't need to commit:py conn.isolation_level = None conn.execute("INSERT INTO my_table VALUES (?, ?, ?)", (4, 5, 6))though in python 3.12 they've introduced conn.autocommit as the preferred way to handle transactions going forward: ```py
conn = sqlite3.connect(..., autocommit=False)
conn.execute("CREATE TABLE my_table (x, y, z)")
Unlike isolation_level, all statements will be in a transaction
conn.commit()```
yeah probably, and alembic's source code seems a little bit familiar to my own implementation (just way more advanced), though alembic in particular requires sqlalchemy models to use for comparisons
For comparison and autogeneration - I think yes
hello i need help and im a noob at coding python. I trade stocks and need help backtesting
This is half serious advice but: try to solve an AOC puzzle in SQL
um... what's aoc?
anyone created a admin dashboard/user management with roles like admin and general user using firebase. what is the idea implementing it.
sqlalchemy doesn't support columns with no type, right?
what database supports that ๐ค
sqlite, but granted that's basically all about dynamic typing and type affinities
never really thought about defining sqlalchemy columns without a type, but i havent used it in a while
i see, ty. am getting obliterated by this database i need to work with, will make an intermediary table with types.
also known as: everything is a string
Hey, I've been using pony.orm and playing around with Mixins, inheritance, etc., and have a few questions related to that:
My current structure is this:
T = TypeVar('T', bound='EntityListMixin')
class EntityListMixin:
@classmethod
@db_session
def get_list(cls: Type[T], user_id: int, guild_id: int) -> List[T]:
return select(e for e in cls if e.user_id == user_id and e.guild_id == guild_id)[:]
@classmethod
@db_session
def delete_if_exists(cls: Type[T], user_id: int, entity_id: int, guild_id: int):
if(record := cls.get(user_id=user_id, entity_id=entity_id, guild_id=guild_id)):
record.delete()
@classmethod
@db_session
def create_or_replace(cls: Type[T], user_id: int, entity_id: int, guild_id: int):
cls(user_id=user_id, entity_id=entity_id, guild_id=guild_id)
class Whitelists(db.Entity, EntityListMixin):
user_id = Required(int, size=64)
entity_id = Required(int, size=64)
guild_id = Required(int, size=64)
composite_key(user_id, entity_id, guild_id)
class Blacklists(db.Entity, EntityListMixin):
user_id = Required(int, size=64)
entity_id = Required(int, size=64)
guild_id = Required(int, size=64)
composite_key(user_id, entity_id, guild_id)
I noticed that if I make the EntityListMixin a parent Entity class containing all the Required fields and have the classes merely "pass", then it creates a single EntityListMixin table, where "class" is an added field.
My question is whether this is preferable or not.
so I search a table to pull up a result, if theres more than one result thats similar to the name searched, it says "be more specific", if its unique it pulls up the name. Now Im trying to combine it with a second table, to search through, but I keep getting the "be specific" error even though its a unique name. Can anyone help me on how to join the tables together through code so it can search it like one?
Works: await connection.execute("SELECT * FROM vip_melee")
Doesnt: await connection.execute("SELECT * FROM vip_melee, vip_ranged")
async with asqlite.connect(vip_db) as connection:
database = await connection.execute("SELECT * FROM vip_melee, vip_ranged")
tables = await database.fetchall()
search=[]
s2=[]
for row in tables:
if name.lower() in row[0].lower():
search.append(row[0])
shop_item=row
if name == row[0]:
s2.append(row[0])
shop_item = row
if len(search) != 1:
if len(s2) == 1:
search = s2
if len(search) > 1:
embed = discord.Embed(color = discord.Color.red())
embed.set_author(name="Did you mean:")
x=1
for ting in search:
embed.add_field(name=str(x)+".", value=ting, inline=False)
x+=1
await interaction.response.send_message("That search exceeds the limit ({} cards were returned). Please be more specific.".format(str(len(search))), ephemeral=True, delete_after=5)
if len(search) == 1: print("The Results")
The long version
it repeats the answer as an error when I add the second table in sqlite3
in sqlalchemy, can I get the rows of a table as a list?
if you're using Connection.execute(), it returns a CursorResult which has an .all() method to return the rows as a list
is connection.execute part of Core or ORM?
oh, you've linked it
thanks a lot, lol, missed that
was reading it like syntax highlighting lol
hm. this is confusing
I create the engine, then I autoload a table, then I need to query the database?
feels like I could skip autoloading the table
can anyone help with the telegram bot when the telegram bot does not save messages sent to the telegram channel in the database and the console does not get errors?
generally it's a better idea to do the searching inside your SQL query for simplicity/performance reasons, for example SELECT * FROM user WHERE lower(name) = 'name', but regardless the reason why SELECT * FROM vip_melee, vip_ranged doesn't do what you expect is because that syntax performs a cartesian product on both tables
to clarify, if you had the following tables: sql CREATE TABLE foo (a, b); INSERT INTO foo VALUES (1, 2), (3, 4); CREATE TABLE bar (x, y); INSERT INTO bar VALUES (11, 12), (13, 14); a cartesian product of those tables would be: ```sql
SELECT * FROM foo, bar;
a b x y
1 2 11 12
1 2 13 14
3 4 11 12
3 4 13 14in your case you might want to [UNION](<https://sqlite.org/lang_select.html#compound_select_statements>) two select statements instead:sql
SELECT a, b FROM foo UNION ALL SELECT x, y FROM bar;
a b
1 2
3 4
11 12
13 14``` or consider using one table with a column to distinguish between 'melee' and 'ranged', if your tables' columns are similar
is the proper way to access a FacadeDict object, e.g., metadata.tables, with metadata.tables['elm']?
for some reason it's just returning 'elm' lol
ah nevermind, I was not doing it right
nevertheless, the documentation for sqlalchemy is an absolute pain
at least it's documented, but i have no clue why they structure it the way they do lol
autoloading referring to table reflection right? ive never tried it but i assume sqlalchemy's query builder can't do much if it doesn't know what the schema looks like beforehand
The following is in SQLAlchemy:
class Link(Base):
__tablename__ = "links"
pos: Mapped[int] = mapped_column(primary_key=True)
new_entry = Link(pos=100)
Any way to return new_entry with some method on metadata?
(1) so this works well for combing them both and listing all the values of each BUT I dont have access to the other columns[1],[2] etc.
SELECT NAME_0 FROM vip_melee UNION ALL SELECT NAME_0 FROM vip_ranged
(2) this doesnt work because they have dont have the same number of columns(Part of the reason I dont combine into one table cause the columns vary in #)
SELECT * FROM vip_melee UNION ALL SELECT * FROM vip_ranged
(3) this works well for giving me the entire row BUT I cant do it with two tables (below)
SELECT * FROM vip_melee WHERE lower(NAME_0) = '{name}
(4) when I try something like this it doesnt work out
SELECT * FROM vip_melee, vip_ranged WHERE lower(NAME_0) = '{name}
ERROR: ambiguous column name: NAME_0
I want a combination of them all something like this (psuedo code), this way I can search the exact column and still refer to other columns of that row after my search while using two tables. Ofc I tried a couple combinations but no luck
SELECT * FROM vip_melee UNION ALL SELECT * FROM vip_ranged WHERE lower(NAME_0) = '{name}
Please and thank you SQL GOD
there might be a better way to structure your tables depending on what you need, but assuming you have to keep them separate, it'd be easier to execute your selects separately, i.e. py c = await conn.execute("SELECT * FROM vip_melee WHERE lower(?) IN lower(name)", name) rows = await c.fetchall() ... c = await conn.execute("SELECT * FROM vip_ranged WHERE ...") rows = await c.fetchall()
the idea is to search for a name and find it in either table /info senpaisensei => search both tables -> output row with name snepaisensei
fyi there are also full-text search tables provided by sqlite, but it's a bit complicated to set up and only matters if you have a lot of cards to search through
Is there anyone who can help me with Prestashop development?
I want to know about the data flow in the prestashop
why is this not recommended?
seems like the cleanest way of making a central connection pool to put anywhere and run queries in
not just cleanest but most readable
vs a context manager every time i wanna do a basic query
Because context manager is the preferred way of working with similar resources
So you don't see the forget to close them, and it's also shorter
usually you can still use a context manager just before you run whatever code that needs it, e.g. py async def main(): app = MyApp() async with asyncpg.create_pool(...) as pool: app.pool = pool await app.run() # pool remains open for as long as the app is running
ooh that checks out with my use case
its for a discord bot btw. tracking message count and stuff
so i want it to always be the same pool
for discord.py, i override the Bot.start() method to handle connecting the pool
https://github.com/thegamecracks/thestarboard/blob/main/src/thestarboard/bot.py#L62-L66
src/thestarboard/bot.py lines 62 to 66
async def start(self, *args, **kwargs) -> None:
async with self.config.db.create_pool() as pool:
self.pool = pool
self.query = DatabaseClient(pool)
return await super().start(*args, **kwargs)```
mainly so i can use their synchronous bot.run() method instead of putting a bunch of bot-related setup in main()
i dont understand your DatabaseClient class at all
@contextlib.asynccontextmanager what is this
nvm dont need to answer that
oh wait i see now. its to put the create_pool in diff file
another option would be to pass an existing connection pool to the bot's init method
i personally prefer that pattern, it makes unit testing very easy
that particular example of passing a config object does a similar job too
also it allows you to separate concerns: your bot code can focus on being a discord bot, you don't need worry about setting up a database connection inside the bot code
it feels like overriding start is just another way to do an async init cause asyncpg is used
sure, otherwise you'd need a separate async def run(): ... or something like that
Databases have record limits right? What happens when you reach that limit? I know it takes a substantial time but years or decades of record, youโll eventually hit that limit right?
either "undefined" or "everything falls apart and breaks", but the limits are extremely high, and in the absurd multi-billion / trillion dollar companies that might actually have to worry about them, their engineers probably take precautions to avoid reaching them (for example, horizontal scaling)
using sqlite3 for example:
A 281 terabytes database can hold no more than approximately 2e+13 rows, and then only if there are no indices and if each row contains very little data
2e13 = 20 trillion
Even if you added one row every second, it would take at least tens of thousands of years to reach the database limit
Thank you.
I have a column of IDs, that are unfortunately in the format of something like 200000200000. Excel turns this automatically to 2E+12 When I read this into pandas, I end up with 2E+12 As a fix, I'm using:
df['col'] = df['col'].astype(str)
df['col'] = df['col'].str.split('.').str[0]
It looks correct but I just want to check the logic.
Is there any potential risks I should watch out for using this method?
I'm working on a webcrawler and it subsequently needs to store all URLs that it finds, I want this project to be scalable to use multiple servers sending the requests. The question I have is how should I store the queue of URLs to go to, I don't just want to dump it into a text file, but using something like a SQL db sounds very overkill. With how I'm currently writing it the data about the response is saved in a SQLite database.
i might be missing something here, but how does .str.split('.').str[0] convert "2E+12" back into "200000200000"?
Hi guys!!
I want to use a in memory db to store skills in heirarchy order, meaning one skill may come under other skills. The skills will have name and description. I want do in-casesensitive prefix search on the name field for now but may require more sophisticated searching later. What approach you think is best?
- What database do you recommend? I need to make the queries super fast so in-memory db would be best
- How shall I structure this type of data in the db?
- Ideally I should be able to scale it easily and the skills will be in millions
if you want a pure in-memory "database", you want a cache, not a database
if you want a good and performant database, use postgresql
for hierarchic data you could use the ltree module https://www.postgresql.org/docs/current/ltree.html
or have a nullable parent field on every skill row
The df['col'].astype(str) converts it to 1234.0, then the 2nd line removes the .0
well that's not the same thing at all as the string "2E+12"
def fix_excel_id(val):
if isinstance(val, str):
return val
return format(val, "d")
df["col"] = df["col"].map(fix_excel_id)
if you want leading zeros, you can adjust the d format specification accordingly
something like the above is going to give you much better control over the output
what about sqlite?
or in this case maybe a purpose-built data structure like SortedList
not sure what "hierarchical" means in their case however
CC @sudden snow โ๏ธ
I checked the dtype, Pandas read it in as a float64 (No nulls). But if you do a .head() you will see 2.0000020e+12 .
I guess there are 2 cases here:
- Excel display scientific notation, which you can end up saving as scientific notation.
- Pandas reads in and displays as scientific notation.
Either way, looks like it's working.
excel is also saving the actual number, and the scientific notation is only for display, otherwise 2E+12 would lose any digits after the first
i'm not sure I would say sqlite is an in-memory database, it still persists to disk
or rather: its main medium of storage is the disk
it can run completely in memory though
so for example if you have some static data that you don't need to modify on disk, you load it up-front into a sqlite database upon app startup
i was actually planning on doing precisely that for an upcoming project
Thanks for your valuable input. Appreciated. Will take it from here
If I have any doubts I will ask back. Are you expert in databases?
Thanks
Also any idea how google search results are so fast? ik they rank them already but still
hey how do i connect my bot to a database?
wut. give more context please :))
what kind of database do you want
well i got a MySQL
just gotta figure out how to get the bot to connect
im sorry im just smoothbrained when it comes to this
your bot is async?
yea
use this
and you alose need a driver for it
i think this aiomysql is for mysql
as the name suggest
thank ya
I've got an old professor who want's me to help him transition some of his work away from excel. He's got hourly temp and water level data from multiple sites going back around 10 years and he's been using excel for everything. He's heard of R and wants to start using it. Should I try to get him to use Python or just let him decide what to do? He's a bit old fashioned....
If he doesn't know ruby already I'd say python, syntactically they're somewhat similar, however from what I know python is a lot better for data science.
With that, I don't know much about ruby so if I'm wrong please correct me
In memory database is not a cache 
Cache could be stored on disk, then why isn't postgres a cache too?
if u want pure in memory "database", u may be just wishing to store everything in process memory(in heap) in your language variables/structs ๐ works wonders at least in Golang which has ultra powerful parallelism tooling.
hi
i have Q
how can i link mysql database that create in xampp with python program
meh. you are just a newbie. im a db dev and i dont feel that way it all.
hello , can anyone help me with indexing in sql after 1 hour ?
my database is giving me a pain
so i'm making a discord bot with a database, but whatever i try to do to the data base, it says its locked
I only use with sql3.connect statements so it should be closed on its own
Task exception was never retrieved
future: <Task finished name='discord-ui-modal-dispatch-bf1fa053c8c336d76a0371c53cbec37b' coro=<Modal._scheduled_task() done, defined at C:\Users\pokem\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\ui\modal.py:179> exception=NotFound('404 Not Found (error code: 10062): Unknown interaction')>
Traceback (most recent call last):
File "C:\Users\pokem\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\ui\modal.py", line 188, in _scheduled_task
await self.on_submit(interaction)
File "c:\Users\pokem\Downloads\Cubiverse\Cubiverse.py", line 89, in on_submit
cursor.execute(f"INSERT INTO applications VALUES({message.id}, {interaction.user.id}, '{self.mc_name.value}')")
sqlite3.OperationalError: database is locked
yes, somebody here will know PHP, but this is the Python server, so it's the wrong place to ask
look for a PHP community instead
do you know any link?
no
i am facing this error
object(SoapFault)#4 (14) { ["message":protected]=> string(33) "looks like we got no XML document" ["string":"Exception":private]=> string(0) "" ["code":protected]=> int(0) ["file":protected]=> string(51) "C:\xampp\htdocs\dubai_books\admin\order_process.php" ["line":protected]=> int(56) ["trace":"Exception":private]=> array(1) { [0]=> array(6) { ["file"]=> string(51) "C:\xampp\htdocs\dubai_books\admin\order_process.php" ["line"]=> int(56) ["function"]=> string(6) "__call" ["class"]=> string(10) "SoapClient" ["type"]=> string(2) "->" ["args"]=> array(2) { [0]=> string(21) "proceedNewRequestBook" [1]=> array(6) { [0]=> string(0) "" [1]=> string(1) "1" [2]=> string(1) "5" [3]=> string(8) "89787978" [4]=> string(6) "jkjjbl" [5]=> array(2) { ["trace"]=> int(1) ["exception"]=> int(0) } } } } } ["previous":"Exception":private]=> NULL ["faultstring"]=> string(33) "looks like we got no XML document" ["faultcode"]=> string(6) "Client" ["faultcodens"]=> string(41) "http://schemas.xmlsoap.org/soap/envelope/" ["faultactor"]=> NULL ["detail"]=> NULL ["_name"]=> NULL ["headerfault"]=> NULL }
i removed the comma
form php.ini file from apache xampp server
this is still a python server, and this is a database channel
i havent slept in the last 32 hours
then please do so for the sake of your health
What postgres connector do you guys recommend?
Recommend for what?
If you're just trying to execute queries then psycopg2
Oh, thought there were more options. Alright.
I wonder how long will it take until 3 becomes the default, if ever
hi guts, can i ask for help here? im usnig sqlite3 and have this issue... some tables exist but others dont and the .tables command doesnt work ๐ฆ
this channel also exists for help with their related topics
and mysql.connector seems to not work, or somthin
whats the import command again?
Right, but I was sharing it as a polite hint instead of saying "nobody wants your fugly screenshots" ๐
is there a legit reason why attr mapping in sqlalchemy is not for SELECT queries?
it just feels odd that i need to restate the default in the return here
class MessageCount(QueryManager, Base):
__tablename__ = "message_count"
member_id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
count: Mapped[int] = mapped_column(Integer, default=0)
@classmethod
async def get(cls, member_id: int) -> int:
query = await cls.session.execute(
select(cls).where(cls.member_id == member_id)
)
result = query.scalar_one_or_none()
return result.count if result else 0
what seems to make more sense to me would be something like
result = query.scalar_one()
return result.count
and it would return 0 because thats stated as the default at the top
is there another ORM lib that works with asyncpg that does that?
anyone has thoughts/references on validating that a Python object has been properly "absorbed" into a database? We're talking about an object that would itself be subdivided into ORM objects
i.e., said Python object is not necessarily structured in a relational manner
in sqlalchemy, what do you call the result of a query? a row/rows? am trying to figure out if I can build a table from a list of such results
Honestly that doesn't make any sense 
In your get method you select a model and expect it to return something even if db is empty?
Also mixing your db model and querying methods is a bad idea imo
yes since the messagecount class is an sqlalchemy base and the session is for ORM operations. makes sense to me
How does cls.session works?
its just the session from async with asyncssession
What if you need to run queries that use your get method concurrently?
it cant be concurrent because im not gonna code it that way
What's the point of using async then? ๐ค
wait doesnt concurrent mean at the same time
No, I meant what would use it
Then concurrent requests are possible
each update will have its own session
okay this is the kinda convo i wanted to have cause i dont exactly trust myself, since im doing async sql operations for the first time ever
1 sec
A "proper" of doing what you want would be to make a repository:
class MessageRepository:
def __init__(self, session: AsyncSession) -> None:
self._session = session
async def count(self, member_id: int) -> int:
stmt = select(MessageCount.count).where(MessageCount.member_id == member_id)
return await self._session.scalar(stmt) or 0
And remove all querying methods from your base
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from .models import MessageCount model
class MessageRepository:
def init(self, session: AsyncSession) -> None:
self._session = session
async def count(self, member_id: int) -> int:
async with self._session.begin():
stmt = select(MessageCount.count).where(MessageCount.member_id == member_id)
result = await self._session.execute(stmt)
count = result.scalar_one_or_none()
return count or 0
try is
Why would I do it that way?
Because the code you shared with us had an error ๐
The code you've provided looks mostly correct, assuming that MessageCount is a valid SQLAlchemy model and AsyncSession is a valid asynchronous session from SQLAlchemy's asyncio support. However, there are a few things you might want to consider:
Import Statements: Make sure that you have imported the necessary classes and functions correctly. For example, you need to import AsyncSession from sqlalchemy.ext.asyncio and select from sqlalchemy.
Error Handling: It's a good practice to handle potential errors that might occur during the execution of the query. For example, if there's an error in executing the query, you might want to catch the exception and handle it appropriately.
Session Management: Ensure that the session is being used and closed correctly. Depending on your application's context, you might need to manage the session's lifecycle more explicitly, especially if it's part of a larger transaction.
We added a try/except block to catch any exceptions that might occur during the execution of the query.
In the except block, you can handle the exception, log the error, or take any appropriate action.
In the finally block, we ensure that the session is closed after the query is executed. This is important for proper resource management.
Please replace .models import MessageCount with the actual import statement for your MessageCount model, and adjust the code further based on your specific requirements and project setup.
gpt?
!rule 10
Is edit OpenI
then heres how i expect to use it
async def run(self):
async with ConnectionPool.create() as pool:
self.pool = pool
self.start(TOKEN)
@listen(MessageCreate)
async def on_message(self, event: MessageCreate):
member_id = event.message.author.id
async with self.pool.connect() as query:
await query.message_count.add(member_id)
Somehow I used this in production for years and now it shouldn't work ๐
You just need to create your session:
async with async_session_factory() as session:
repository = MessageRepository(session)
...
If you want to run it in a transaction you can use async_session_factory.begin, repositories shouldn't manage transactions themselves
I think everything you did sqlalchemy already can do out of the box
i need to look up repository and factory cause im lost lol
async_session_factory is just a sessionmaker
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from settings import DatabaseSettings, get_settings
_settings = get_settings(DatabaseSettings)
engine = create_async_engine(
_settings.url,
pool_size=20,
pool_pre_ping=True,
pool_use_lifo=True,
echo=_settings.echo,
)
async_session_factory = async_sessionmaker(
bind=engine,
expire_on_commit=False,
)
No, it's more akin to functools.partial to construct your session with a preset of parameters
With some helper methods, like .begin
oh shit i just noticed i forgot to remove transaction logic. transaction was something else related to asyncpg when i was using that
but yeah arent i doing that already?
I mean, you don't need to do this for example:
@classmethod
@asynccontextmanager
async def connect(
cls,
*,
transaction: bool = True,
) -> AsyncGenerator["QueryManager", None]:
async with AsyncSession(cls._engine, expire_on_commit=False) as session:
if transaction:
await session.begin()
try:
yield QueryManager(session)
if transaction:
await session.commit()
except:
if transaction:
await session.rollback()
raise