#databases
1 messages · Page 18 of 1
- you need good concurrency control
- you need to handle large amounts of data
- you have multiple clients that need to access the data
- you want to execute more complex queries
"complex queries" wouldn't necessarily require a client/server relationship
the main reason are concurrency, scaling and being able to access it from multiple different machines
right, I meant that for the examples given
I wonder if security is also a consideration here
privacy can be a consideration, but security not as much imo
you might need to make considerations about security, but client/server can still be made safe
hi, everyone! i need a way to extract real time market data from the London Stock Exchange and host it in my server, or store in db, any suggestions?
Thank you for the response. A few questions:
- Could you quantify roughly what "large" here means?
- From a quick google search, it seems sqlite is great at concurrent reads but locks the db while writing. Is this really a problem in most cases? I imagine you're reading 95% of the time
- What do you mean by multiple clients? Like end users of the product or in the technical sense?
2/3) if you have a website that is likely to be used actively by many users at the same time, SQLite can become a problem pretty quickly.
It was brought to my attention recently that sqlite is also tested incredibly extensively
Depends on what you mean by "real time market data": what frequency do you mean? I wouldn't recommend sqlite for market data storage and analysis (except for some simple/small-scale problems), but the exact recommendation would need a bit more information (how many tickers, frequency, duration, etc). Sorry, tagged wrong person, meant to tag: @south heath
Yes, but there are many other databases to choose from too. There's are many cases where sqlite is great, and certain cases where it's not appropriate.
agreed
sqlite will be my entry to databases, i just have to get used to working with tables, columns, rows etc, and get used to the queries and what I can do with it.
based on my initial 'research', most databases rely on it. So i'm quite sure even if the query language is different, what experience I gain would certainly be transferrable
Every database has differences in their SQL implementation, but if you're good with one, it's really easy to switch to another (with a little Googling to fix a few syntax differences). There are also transpilers, like sqlglot, that help translate SQL between database engines... but the differences aren't big.
import sqlite3
con=sqlite3.connect('sqlite3.db')
u=input('enter the value :')
e=con.execute("select * from students where name like '%"+u+"%' ")
for i in e:
print(i)```
Is there any easier way to search instead of doing this _'%"+u+"%' "_
i'm a little confused on how am I going to insert this into a new table in a database
i wrote a generator with intention to use it to do dml into sqlite3 database:
yield {
"rarity": int(key),
"name": name,
"baseTypeText": piece_part,
"maxLevel": max_level,
"mainAffixes": main_stat,
}
my first thought is: the key in this case would be the column name and the values are each result of the iteration.
is this right? or should i make this into a tuple, and do INSERT INTO table_name VALUES(?, ?, ..., ?) with the values positionally arranged
or maybe put the result of the generator into a list and do cursor.executemany?
yes, emphatically there is a better way
import sqlite3
from contextlib import closing
user_input = input('enter the value :')
user_pattern = f'%{user_input}%'
with closing(sqlite3.connect('sqlite3.db')) as con:
cursor = con.execute("select * from students where name like ?", (user_pattern,))
for row in cursor:
print(row)
I haven't learnt about cursor yet
the result of execute is called a "cursor"
Oh
you can ignore the details of what a cursor is supposed to represent, but it's useful to know that it's an iterable over result rows
you can also use the fetchone() and fetchall() methods on it
https://docs.python.org/3/library/sqlite3.html see here, search for the string "placeholder" to learn about the ? thing
After taking a look it seems hard to understand to me Maybe because i dont know much about contextlib,closing
I will learn about them soon
it's in the docs. it's not necessary, but it's a good tool for making sure the connection is closed when you are done with it. otherwise you need to manually call .close on it. if you never close it, you can cause problems if your application runs for a long time. it will always be closed at the end of your script, but it's important to be in the habit of managing external "resources" in your code. a sqlite database is one such resource.
that won't work, you'll need to close it after execution
!d contextlib.closing
contextlib.closing(thing)```
Return a context manager that closes *thing* upon completion of the block. This is basically equivalent to:
```py
from contextlib import contextmanager
@contextmanager
def closing(thing):
try:
yield thing
finally:
thing.close()
```...
Should i refer this?
For documents
what do you mean?
docs.python.org this is the official python documentation
am i allowed to post code here i need some help creating a database converter
an sqlite converter
hey, i'm using redis-py(async), and can't figure out why 1 ex doesn't print hello, however 2 ex do print len
1 ex:
if await redis.llen('array') > 1:
print('hello')```
2 ex:
```py
if (l := await redis.llen('array')) > 1:
print(f'{l=}')```
i was looking for a solution to store lists/dicts into a sqlite database, and it seems i can use JSON serialization? does it mean i use json.dumps?
almost always you dont need to store JSON inside sql data base, if you have some struct of your json, which repeats from data to data, make another table for data and pair them via primary keys
if you really need to store json, try to use non-sql db, e.g. mongodb
third option is to just store json as a string in db, but never do it
i'd personally rather use sqlite with json1 than mongodb for most things
This is doable, i guess next question is on convention
So this database contains multiple items where each item is of dict type. Its easy enough to simply create a table with columns named as per key
However, one key of the dict is of type list as value, and this is what I intend to store as a JSON serialized string. So say I extract each member of the list and store it in another table where these two tables are linked through primary keys, now it means the database has two tables for each item.
How should I name this second table? Won't it be messy? (I havent used primary keys yet)
question
when to use sqlite over postgres?
I've really only used mysql or postgres but sqlite is nice and lightweight but can't think of when to use it
when you need of something that works offline with little setup required or otherwise can't be bothered to setup postgres
I see
is it a dict because it's convenient, or is it a dict because it actually has an unknown and varying structure? adding fields, removing fields, lists inside fields, etc.
if the structure is consistent and predictable, you might want to consider flattening or "normalizing" the data into multiple tables
json should be a fallback only for when the structure of the data is not consistent or predictable, or if you have other specific needs that probably aren't relevant here
I need help learning the basic and CRUD operations
Please Ping me, Thank you
https://sqlbolt.com/
gl hf
@sly garden is sql databases free?
yes
can it be used thru websites
you're asking the wrong questions I think
why would you want to use it through websites?
you can have a local database file or have a database free online, sure...
no because im on a school chromebook rn and it doesnt let me download and access those type of files so can i use databases on browser or only applications like mobile apps
like do i need to download it on my chromebook to use the database or can i use it through browser
you can connect to a database hosted online
it depends on the database. there are web-based admin interfaces for postgresql and mysql
what's better postgresql or mysql
if one was strictly better than the other, the worst one would be long dead by now
may as well just go with postgres though
postgres has more features overall, mysql has better support for some particular high-performance uses. i'd say postgres is probably a safe default if you're building a hobby project or learning for the first time.
what is the best NoSQL database?
no one. use Sqlite3 for start 😆 or postgresql for real work. best default choices
I dunno, what's the best vegetable? They're all a bit different but most people don't get particularly excited about any of them.
What makes you think you need one? Use case would shape which (if any) you might want to use.
Mongo is popular and stupid-easy but I agree with Darkwind that NoSQL is probably not actually what you want.
i've kinda noticed that a lot of people fawn over nosql and especially mongo for whatever reason
but to answer your question
if we go by the nutritionist definition, tomatoes are the best vegetables
Botanically, tomatoes are fruits.
can i find out if someone is interested in generating a flask rest api from a postgresql or mysql database? Or am I in the wrong channel?
how i can display database table
with SQlite and FastAPi, is it common to directly connect an endpoint to the database? so an endpoint /endpoint/user would promptly query a database and return the result?
you mean having returned results in json format?
yes
then yes. it is called building api usually (sometimes rest api)
proper separation of front and backend
then i'm on the right path
it is also common to build CRUD API
GET method to get data on /endpoint/user
POST method to create user
PUT to replace
PATCH to change
DELETE to delete
this api would probably only have GET requests, its like those info API endpoints, but in this case it's for game data
but i think a POST api for some endpoints would lead to a better experience, and easier data retrieval with various parameters
the point of GET, that it is cachable individually per each requested data
i believe this is a plus for a particular endpoint that will have high traffic?
My SQL brain has gone off a cliff, how to I effectively merge multiple rows into one based on a shared key?
(There is another key cluster_id here which rows can shared in order to group the rows into what should be merged)
I dont really want to use MAX here or another aggregate :/
what flavour of db?
That's more or less what you need though.
Unless your db has an unpivot/etc.
Athena
Yeah I think it's probably going to have to be :/ Technically if this was postgres I could probably avoid the whole situation to begin with but meh
yikes, there isn't a freely avaiable playground for athena right?
don't feel particularly inclined spinning up presto (or whatever the heck it is called today) just to test 🙀
Normally I just write things out in postgres first just sticking to the more standard SQL tooling tho
If you can unpivot it (convert from wide to narrow), you could then re-pivot it back to a flat version.
but I don't know athena.
I don't think (at least im not aware of it) I can
https://gist.github.com/shotahorii/6b710c902a8a6ef184987ca787d329d9
maybe relevant, but i really dislike it
im using a file.pkl to store a list of objects
when i try to read what i stored without stoping the python script it works
the problem is when i stop the python script and i start it again it should load the data from the file but is not reading any data despite it being there
could you send over your code?
i solved the issue defining the variable in the func as global
i thought i just had to declare it outside to be global
and it would make the association by itself
since the name its the same
but i guess i was wrong
The variable is already in the global scope outside of functions. Name scope declarations must be found where the name is referenced. So if you want to write to a variable defined outside of a function, you need to declare the variable as a global so Python knows where to look.
from pymongo import MongoClient
client = MongoClient('mongodb+srv://username:pass@cluster0.yy14chf.mongodb.net/')
db = client["Onepiece"]
collection = db["Server_info"]
data = collection.find_one({"_id": 1120040588339843200})
print(data)```
the output of the following code is **__none__** but there is data stored in it whihc i wana retrive any suggestions how cna i
I'm working with AWS Athena and was wondering if I should convert col IN ('some_value') to col = 'some_value' dynamically when I only have one element in the set/tuple (I can only know that at runtime), the question is really if this optimization needs to be done manually or its already done by Athena?
I'm sure every regular DB already optimizes that, but Athena make more money when the query takes longer (I think)
Not able to install it..!
I have had installed it before but when i try to make connection in pycharm it says 'pymysql module not exist'
pip install pymysql
Hello dear Python lovers. I have built app using Flask and it should communicate with MySQL database. Today I tried to deploy the app on linux server to make it public but I got some problems. I deployed it using WSGI and Apache2. The front-end (html & css) is loaded but there is no interaction with DB.
I would get it if I see any errors but there is not even one.
What to do next?
Hi, please I have a problem with SQL syntax but I don't know why
def create_database_and_table(self, name):
conn = sqlite3.connect("database.db")
cursor = conn.cursor()
request ='''CREATE TABLE IF NOT EXISTS ? (
user_name TEXT,
ip TEXT,
port INT
)'''
cursor.execute(request, (name,))
conn.close()```
class database_test():
def __init__(self):
self.database = database.Database()
def create_database(self, name):
self.database.create_database_and_table(name)```
objet = database_test()
objet.create_database('test_database')```
problem is sqlite3.OperationalError: near "?": syntax error
I don't believe you can parameterize the table name.
I thnik it is because my variable is inserted like that python '''CREATE TABLE IF NOT EXISTS 'name'....
I'm not a sqlite expert, but generally you can't parameterize the table name and have to do some dynamic stuff to achieve that.
hmm okey thank you so much
Hello! I am used to work with SQLite, but I need a noSQL DB for my next project. Is there a "serverless" easy solution like SQLite?
(I know noSQL but I never tried it with python)
- SQLite is not "serverless" in the meaning "serverless" is typically used in networking (see: AWS Lambda, CF Workers). What you are looking for sounds more like local / self-hosted.
- """noSQL""" is used to refer collectively to dozens of different databases types that do not fit within the SQL model, not to one specific type of database. Which kind of database exactly are you looking for? (some examples include object/document storage like Mongodb, as well as more exotic things like time series, graph databases and vector databases)
depending on what exactly you are doing, you could host Mongodb locally in a Docker container, or depending on the scale might as well just straight way use JSON files if you're 10000% certain that it could not fit well in SQLite and cannot use a proper database
side note: SQLite offers some support for storing JSON data in columns, though I haven't used it myself
Basically I want to store a huge dict(). Would do it in Json, but it is possible that 2 modifications are done at the same time. So i'm better with a database.
mongo or one of it's alternatives might work, but overall you probably should try splitting up instead of having anything be 'one huge' thing
cur = conn.cursor()
conn.autocommit = True
cur.execute('''
SELECT EXISTS (SELECT 1 FROM pg_namespace
WHERE nspname = 'crimes')
;''')
exists = cur.fetchone()
if not exists:
cur.execute('CREATE SCHEMA crimes;')
conn.autocommit = False
qu_schema = """SELECT schema_name
FROM information_schema.schemata;
"""
cur.execute(qu_schema)
schema_name = cur.fetchall()
cur.execute("SELECT current_database();")
dbname = cur.fetchall()
print(exists)
print("Schema name = {} \n DB name = {}".format(schema_name, dbname))```
I used this code to create a schema, but when I try to reference the ‘crimes’ schema later, it says no such schema exists. So I added code to see what the conditional is holding. The conditional is false, so the schema should be created, but it’s not🤔
Here’s the output
(False,)
Schema name = [('pg_toast',), ('pg_temp_1',), ('pg_toast_temp_1',), ('pg_catalog',), ('public',), ('information_schema',)]
DB name = [('crime_db',)]
I have a table with columns A B C D Is it possible to make a simple query that returns:
A1 B1 C1
A1 B1 D1
A2 B2 C2
A2 B2 D2
...
```?
Im using BigQuery for what it's worth
something like SELECT (A, B, C) FROM ... UNION ALL SELECT (A, B, D) FROM ... might work, but I would recommend just doing that kind of logic in Python tbh - otherwise in this case you'll end up wasting resources a bit
(every 2 out of 6 values you get would be duplicated)
what database server would anyone recomend for me I have been using the wamp sever for sometime now and something just feels a little off about it
Is it good practice to concatenate sql query string
like that python request =''' SELECT ip'''+''' FROM test_database WHERE test_database.port = 5000 AND test_database.user_name = 'Thomas' '''
no, not at all. Why would you want to do that?
I have a database of 3 columns, user_name, IP, and Port
I would like to create a function called searching
That will return the results of the research
I know how to do it when the number of conditions is fixed
and the type of result as well is fixed
but now when it is not fixed
basically, it looks like that in my brain python def searching (result1, result2,...., condition1, condition2,...)
number of result and conditions can change
Fo exemple I would like use this function to get user_name and IP of each users that has 5000 as number port
But it must work also if I want only user_name...
so yeah... my suggestion is just: don't
So I can do it with concatenation
create one function for each condition you want to search
for concatenation ?
no concatenation
def search_by_username(username):
...
def search_by_username_and_port(username, ip_address):
...
yes ok i can do like that but it too easy
I d like to have only one function I can call every where you know
don't.
Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.
Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.
For example, the sqlite3 package supports using ? as a placeholder:
query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)
Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.
See Also
• Python sqlite3 docs - How to use placeholders to bind values in SQL queries
• PEP-249 - A specification of how database libraries in Python should work
it would not as bad if you were doing something likepy += "WHERE col = ?" but definitely do not do + "WHERE col = " + thing
I guess that ```py
def search(*, a = None, b = None):
query = 'select ... from ...'
args = []
if a is not None:
query += 'where A = ?'
args.append(a)
# same for B
execute(query, args)
def searching(*args, **kwargs):
conn = sqlite3.connect("database.db")
cursor = conn.cursor()
select_request = '''SELECT ?'''
if len(args) > 1:
for i in range(len(args)-1):
select_request += ''',?'''```same
not sure if select ? works, in a bunch of places the names of the tables/columns must be hardcoded
I don't want to have too many functions even if most porgrammers say that
several small functions are better than one.
I prefer to have a single function that will take care of research and that can do absolutely everything in research
I find it clearer in the code
Don't know bro but I am gonna try
the reason why we recommend small things is that the more explicit your code is, the fewer things you have to keep in mind / look at when modifying each part of your code
Agree, op looks like they’re trying to build a dynamic query, rather than pass a parameter
if you know the full extent of what a function might do, you know what to pass to it and what you should expect for it to return, and can verify these assumptions somewhat easily
if a single function can do way too many different things, it is easy to get lost, both inside of the function as well as in any other piece of code that relies on that funciton
okey you right
I keep like that for this one but for the others i'll follow your advice bro
some tools will even warn you if your code exceeds a certain degree of complexity
Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
Cyclomatic complexity is computed using the control-flow graph of the program: the nodes of the graph...
oh nice I didn't know thank you bro
!code
If there is sense to write unit tests for smaller parts of your code, commercial devs will not ever write single function only, because it will be impossible to test smaller parts of a code in a sane way
about database normalization (SQlite):
i have a couple of tables that, in it, there isn't any data that can become a primary key due to the nature of the data inside it being a supplementary info to another main table that has a primary key (which one of the columns references).
do i just create an auto_increment primary key column in it and leave it at that?
Yah, most people do, even if there's a natural key.
cur = conn.cursor()
conn.autocommit = True
cur.execute('''
SELECT EXISTS (SELECT 1 FROM pg_namespace
WHERE nspname = 'crimes')
;''')
exists = cur.fetchone()
if not exists:
cur.execute('CREATE SCHEMA crimes;')
conn.autocommit = False
qu_schema = """SELECT schema_name
FROM information_schema.schemata;
"""
cur.execute(qu_schema)
schema_name = cur.fetchall()
cur.execute("SELECT current_database();")
dbname = cur.fetchall()
print(exists)
print("Schema name = {} \n DB name = {}".format(schema_name, dbname))```
(False,)
Schema name = [('pg_toast',), ('pg_temp_1',), ('pg_toast_temp_1',), ('pg_catalog',), ('public',), ('information_schema',)]
DB name = [('crime_db',)]
Anyone know why the conditional statement isn’t executing correctly?
Does the exists() query work directly against the db?
This is suggesting that there are no rows in pg_namespace where nspname=‘crimes’
Not sure what you mean but trying to create a schema named crimes
got it
thanks
Do yall know a database for an ai in python?
I have no idea how those three words fit together. What do you mean?
Database for what?
For AI training data (what types of data u need to store?), for storing AI models or user submitted new data to process (of what kind), smth else?
Ai
Can you explain more about what type of data you want to store? (please don't say AI, it doesn't really tell us anything)
after using sqlite for a couple of days, working with a relatively small amount of data and data types, i can see myself moving to mysql or postgres much quicker than i thought in the future
How would I rollback a model instance in SQLAlchemy 2.0.16? I'm writing a command-line editor for my models, Quiz and QuizQuestion, where the editor is operating on detached instances of those models. Presumably Session.refresh() would have done this given its description:
https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.refresh
The selected attributes will first be expired as they would when using Session.expire(); then a SELECT statement will be issued to the database to refresh column-oriented attributes with the current value available in the current transaction.
However when I tried it, the attribute changes did not revert after having done anadd()+refresh().
MRE: https://paste.pythondiscord.com/2BRCW
To clarify, there are two ways I want to revert my changes:
- Rollback just the question, not affecting other modifications
- Rollback the quiz including its questions (even though the example only modifies one at a time)
i did initially consider fetching new instances but i might have multiple references to the same models, plus swapping out a quiz's question in-place seems iffy to me
Perhaps since it doesn't have persisted state it's now refreshed?
Also you think refresh works with relationships? 🤔
https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.refresh
SELECT statement will be issued to the database to refresh column-oriented attributes
Uh, refresh api is a bit tricky
print("Initial commit:")
with sessionmaker.begin() as session:
session.execute(delete(Quiz))
quiz = Quiz(title="Quiz 1")
question = QuizQuestion(title="Question 1")
quiz.questions.append(question)
session.add(quiz)
pprint(quiz)
quiz.title = "Quiz 1 Modified"
print("Rolling back Quiz:")
with sessionmaker.begin() as session:
session.add(quiz)
session.refresh(quiz, ("questions",))
pprint(quiz)
You're trying to refresh questions here, did you mean to refresh title?
im still unsure of how it would handle relationships, and thats only sounding more complicated - although the docs does say explicitly naming the relationship will work if its lazy-loaded...?
It mentions that you can explicitly refresh them 🤔
doesnt adding a detached instance move it to the persisted state?
or do you mean that it forgot what changed after being moved to a different session
I don't know, in your example you don't detach it 🤔
Also you could just check
oh yeah, inspect()
afaik closing the session detaches it each time
Ah, I confused it with transient 🥴
Detached should work fine
whoops, ("questions",) was there to prevent lazy loading but i forgot it excluded the rest of the attributes; sure enough refresh(quiz) + refresh(quiz, ("questions",)) did revert the title, although when also modifying the question it didnt revert that
i prob shouldve read the link more closely cause it does point to the populate_existing execution option for more complex refreshing, but im trying it right now without success: ```py
quiz.title = "Quiz 1 Modified"
with sessionmaker.begin() as session:
session.add(quiz)
session.scalars(
select(Quiz)
.where(Quiz.id == quiz.id)
# Not sure how to get the relationship loading correct:
.options(selectinload(Quiz.questions).selectinload(QuizQuestion.quiz))
# Not mentioned in docs but quiz.title needs autoflush=False to be rolled back?
.execution_options(autoflush=False, populate_existing=True)
).all()
print(quiz.title)
Quiz 1
pprint(quiz)
InvalidRequestError: 'Quiz.questions' is not available due to lazy='raise'```
selectinload(Quiz.questions).options(selectinload(QuizQuestion.quiz))
autoflush=false doesn't make sense here
actually nvm
You can use session.no_autoflush block instead
You aslo probably commit here because you use .begin()? 🤔
that was another issue i noticed in my actual code where trying to refresh() the question not only didnt revert its changes but also got committed
https://docs.sqlalchemy.org/en/20/orm/queryguide/api.html#populate-existing
Also it fully erases data 🤔
.begin() commits when you exit context manager block, Session() doesn't
So if you just want to fetch data and not manipulate it use Session(), otherwise Session.begin()
Or you can always commit manually
derp, i completely forgot about calling sessionmaker even though the tutorial uses that syntax
might have gotten a bit too used to using begin() for everything
well that at least fixed my pending changes being committed, and also revealed that refreshing quiz/question's column attributes works
as in .options(selectinload(Quiz.questions).options(selectinload(QuizQuestion.quiz)))? same error
i think i got what i was looking for after combining everything, although the relationships still trigger lazy loads when being printed
https://paste.pythondiscord.com/FFJJ6 py Initial commit: Quiz 1 / Question 1 Pending changes: Quiz 1 Modified / Question 1 Modified Rolling back Quiz and QuizQuestion: Quiz 1 / Question 1 Pending changes: Quiz 1 Modified / Question 1 Modified Rolling back QuizQuestion: Quiz 1 Modified / Question 1 (just realized i forgot to replace the begin() for the last rollback, but example still works)
import pymysql as p
import pymysql.cursors
obj-pm.connect(host=localhost',
user='root',
passude=' ',
database='newDb,
)
cursorobj obj.cursor()
tc='create table abhishek (s_id int auto increment primary key, s.name varchar(50),s class varchar(10), s.email varchar(70))" I
cursorobj.execute(tc)
Can someone tell how do i apply try and except in this code
@torn sphinx
Thanks for replying
does anyone know how i would input the data that has been inputted into a modal into a database but the modal has been answered multiple times?
would the data be transferred to all of the occurances or just one of them?
Is it true that in pymysql '?' questions mark isn't supported as placeholder?
judging from the codebase yes, the library manually escapes arguments rather than using prepared statements https://github.com/PyMySQL/PyMySQL/blob/main/pymysql/cursors.py#L110-L131
see also PyMySQL/PyMySQL#202
Alright thanks
Hey guys'
FROM Carts_3NF
;```
I'm using SQLite3 through Jupyter but I cannot extract the date. Does anyone know which function I could use?
A POST endpoint is being used to save stuff to my DB. For particular types of bodies in request, I want it to additionally insert a very similar looking record with just one field changed
Is this best handled using server-side logic or SQL?
Provide examples (with shown how request body changes) and difference in sql
'%M' are the minutes not the year? How does the content of the column InvoiceDate look like?
!e
import sqlite3
conn = sqlite3.connect(":memory:")
print(conn.execute("SELECT strftime('%Y', '2023-07-15 08:30')").fetchone())
@grim vault :white_check_mark: Your 3.11 eval job has completed with return code 0.
('2023',)
This is what it looks like.
The problem is the single digit hour, which is not ISO-8601 conform.
!e
import sqlite3
conn = sqlite3.connect(":memory:")
print(conn.execute("SELECT strftime('%Y', '2023-07-15 08:30')").fetchone())
print(conn.execute("SELECT strftime('%Y', '2023-07-15 8:30')").fetchone())
@grim vault :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | ('2023',)
002 | (None,)
You could use substr(InvoiceDate, 1, 4)
^^
what are databases? im new
That’s a very broad question. Have you ever used Excel or Google Sheets?
no
It's a way to store data permanently such that you can use the data for later use
@safe glacier database is a more advanced alternative to notepad in order to store data for application
Difference in:
- usually database is reachable over network
- database had its own interface to submit and request data
- you don't need to know how database saves final data to disk and loads back
- database can be having its own powerful quering language and engine that takes most of application code logic onto itself
Can a database in Azure SQL Server "crash" or """run out of memory""" when using pyodbc?
I've been having issues when inserting lots of rows for some reason.
Also, can a server hosting multiple databases have issues when querying and/or inserting from multiple databases at the same time (no operation between the databases take place)?
I'm looking for a database to use for my project, which processes thousands of data objects
A quick overview, I have each data object being a person, with specific attributes like their generation number, names, aliases, birth and death dates. Currently as a skeleton, I'm just using objects & classes. But you can't really use objects and classes in a data base.
Does anyone know a database that takes in objects? If not, does anyone have suggestions on which data struct to use? I was thinking of dictionaries
longblob should do the trick for storing a object a table, in every database, or do you plan to use the objects for data manipulation, then look at stored procedures, nearly every database has its own stored procedures
I'm just looking to compare these objects, the point of the project is essentially to determine whether person X and person Y are reliable, and if so, if they ever met each other, (if person X has a generation number of 3 and Y of 7, its impossible for them to have met each other)
understand, so you need a database which is capable to index those object with their attributes, so I think MongoDB was such a database
if you willing to code more and you use sql instead of nosql, you can split your object into JSON then you can use PostgreSQL for indexing those JSON attributes?!
huh
You can go with any SQL database
can anyone tell me how to handle data in #1035199133436354600 in this post https://discord.com/channels/267624335836053506/1130101048070119595
how would i go about storing a list in a database
You can use One-to-many relation or use DBMS that allows storing arrays
I would prefer to use the first case, but it depends on scenario
.
okay thanks
can go the NoSQL route
if you don't need to do anything with said list except store, a json column would work.
(in fact, you even kinda sorta can work on these, e.g. sqlite's ->)
Does anyone know how to do an UPDATE JOIN with sqlalchemy?
#async-and-concurrency message @digital coral if you're interested in the further details on how SQLite transactions block each other, you can read through the various resources on SQLite docs
read vs. write transactions: https://sqlite.org/lang_transaction.html#read_transactions_versus_write_transactions
connection's locking states: https://sqlite.org/lockingv3.html
how locks are acquired + OS operations: https://sqlite.org/atomiccommit.html#_single_file_commit
ive also written up an example demonstrating blocked connections via sqlite3 and threading, which should behave similarly for aiosqlite
https://paste.pythondiscord.com/ISJA
it basically just demonstrates what i said, the "database is locked" error only occurs after exceeding the timeout (plus some odd threshold), but committing releases that lock so another transaction can continue
also worth noting that SQLite is capable of queuing any number of connections so you can be liberal with opening/closing them: ```py
async with new_connection() as conn:
c = await conn.execute("SELECT name, id FROM pet")
pets = {name: id for name, id in await c.fetchall()}
Release the connection/transaction while doing something else
pet_id = await input_choice("Which pet do you want?", pets)
async with new_connection() as conn:
await conn.execute("INSERT INTO pet_owner (owner_id, pet_id) VALUES (?, ?)", (owner_id, pet_id))
await conn.commit()```
Oh man! Thanks for this! Ill read through those 
flask + graphQL or django + graphQL
I don't get what that the following error means and google didn't help either: asyncpg.exceptions.FeatureNotSupportedError: cannot use column reference in DEFAULT expression
The table i'm trying to create:
CREATE TABLE IF NOT EXISTS welcomer (
type SMALLINT,
guild_id BIGINT NOT NULL,
content TEXT NOT NULL,
text_color TEXT NULL DEFAULT "white",
template SMALLINT NOT NULL DEFAULT 6,
background TEXT NOT NULL DEFAULT "night"
):
non of my defaults reference the any column as far as i can see.
why do u have a colon and not semicolon
besides that :, it seems to be valid
Single quotes, not double quotes, right?
yea single quotes too
“White” means column ref. Single quote means literals
It’s a common mistake, esp coming from Python
ALTER DATABASE test READ ONLY = 1;
why this MySql command is not working?
[42000][1064] You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'READ ONLY = 1' at line 1
i am using phpmyadmin
and jetbrains data grid IDE
It’s probably something like set read_only. Where did you get that from?
from mysql tutorial
it is valid syntax
It’s not valid syntax, it’s certainly not ‘read only’ with a space. Edit; I’m wrong, new feature
but i am new in databases idk it is depends on version or somthing
#MySQL #SQL #database
MySQL CREATE USE ALTER DROP a database tutorial example explained
00:00:00 What is a database?
00:00:22 CREATE
00:01:42 USE
00:02:06 DROP
00:02:40 READ ONLY
00:03:40 ENCRYPTION
00:04:03 summary
How to enable dark theme: https://sebhastian.com/mysql-workbench-dark-theme/
see 2:57
and i wonder i am using mysql but in the error there is mariaDB wtf?
mariaDb is plugin or somthing like that?
and why this mysql website is bullshit XD
Oh, I’m thinking the global ‘set read_only=true’.
So, I stand corrected: https://dev.mysql.com/doc/refman/8.0/en/alter-database.html
Perhaps your version is older than 8.0.22?
probably
some commands works
some not
Database client version: libmysql - mysqlnd 8.2.4
bruh
That’s your client, not the server
Server: 127.0.0.1 via TCP/IP
Server type: MariaDB
Server connection: SSL is not being used Documentation
Server version: 10.4.28-MariaDB - mariadb.org binary distribution
Protocol version: 10
User: root@localhost
Server charset: UTF-8 Unicode (utf8mb4)
Oh, Maria db is different than MySQL.
I think it’s considered a fork
phpmyadmin uses mysql
i am think i am gonna use traditional way to use mysql
idk why phpmyadmin do not work
I’m just saying: Maria db is not MySQl (although it is very similar and is a fork), so that documentation isn’t relevant. I don’t see any read only alter database statement for mariadb, but you could use set global or set
okay thanks
I don't understand what's the problem here
I m trying to update it but ut show errors
As the error messages said: The column state.cid is referencing country.c_id but fails, which means there is no country where c_id = '0'
Ofc it's not that's why i m updating it
You cant update a referencing column if the reference does not exists, that's the whole point of an foreign key, to keep integrity.
How can i make it exists
Reference
There needs to be a country with c_id '0' (the value you try to reference). But 0 isn't a good value for an id.
If state.cid does allow NULL values you can set it to that if there should be no reference.
Idk how this guy did that
@grim vault
See he just updated it as 0
Even tho i was country_id reference of country (c_id)
Maybe he has a country with id 0? Or he switched off the reference checking?
Idk he didn't told this, he just open mysql and updated it but now i m trying to do it shows erros
Can i switch off it?
Then it will work?
I don't know mySQL but I guess it can switch it off. You'll need to look in the docs.
Ah
You normaly don't switch it of, you'll run into trouble if you want to switch it back on if you messed up.
Hey, I have a question about Linux permissions and python. I am building a Flask login and registration app and I am using PyCharm for my IDE. I am having some trouble figuring out how to give sudo permissions to my app so that it can access the MySQL database. Also I guess I would add that I am using venv virtual environment for the python interpreter. Any help would be much appreciated thanks!
cant u just run it with sudo?
sudo path/to/venv/python main.py
oh you're using flask
why you need sudo to connect MySQL
you almost certainly shouldn't need root user access to connect to mysql from your app
Hello! My web scraping project that checks for restocks/new items on funko.com is having some issues saving scraped data to the json database and im not sure why. I have googled this issue and asked chatGPT for help but still no luck. Can someone check and see if I'm missing something simple?
my support thread is: #1130732206445301790
what did i do wrong?
First create and activate the environment, then install libraries
so i need to redo everything ?
now the environment is active
it shows Requirement already satisfied:
ohh bro, i just restarted VS-Code. Its normal now, no underlines
thank you
How do I learn SQL send me a course in google
https://www.sqltutorial.org/ and https://sqlbolt.com/ are both great resources I've been recommended and have used used personally to learn SQL
This SQL tutorial helps you get started with SQL quickly and effectively through many practical examples. After the tutorial, you will become an SQL Pro.
SQLBolt provides a set of interactive lessons and exercises to help you learn SQL
Date base?
?
Course datebase
I don't have courses for SQL or databases unfortunately, the second is a bit like like a mini SQL course though
SQL is the query language used with (most) databases, SQL != databases
so backend programming is, html, java, sql, linux & unix? & these are all databases?
sql is a language used in some databases ^
linux is OS and idk how is it related to this list
Its because i have 2 folders titled frontend and another titled backend, trying to decide where to put all the programs in which folders ._.
it sounds like many can be used for both front or back end?? but most of the time which are each used for? front or back
Any (almost) programming language can be used for backend (It only should be able to send requests)
Front is only html and css (and javascript maybe, i am not sure how does it counts)
thanks! just re arranged my folders.. i had it all backwards.. ._.
thoughts around having a local DB for speed purposes and then doing a daily backup to Google Cloud? (Which my data team will access for reporting/Power Bi etc)
Right now I’m just making a lot of fast api requests and storing the responses.
When I used SQLAlchemy/MySQL on my own server, the calls were going through A LOT faster
Now that I’m using Google Cloud, it just seems so slow
Are they in different datacenters?
If so it obviously wouldn't perform that well
I use local databases for in-process analytics, and it works fairly well... but, there's a bunch of things to consider... like, could you fix your "speed" purposes with batching (fewer/larger requests) and caching?
Or, do you move your local server to be co-resident with the database (in google cloud).
Yeah different data centres, I’ll look into batching
I’m quite new to handling this amount of data, so would be good to get it right
What would be a.. "better", more traditional approach? Say I want to store stock price data (OHLC). Should I just store the data with an interval of, say, a day, and calculate the OHLC for the week, month or whatever greater interval I want (but that means spending more time on processing the data each time I make a request), or would it be better to make separate tables to store for a day, week and month intervals (no extra processing when requesting data, but extra, redundant memory usage)?
pretty sure you can just use a materialized view if this is on any reasonably modern DB
Clickhouse and other timeseries DBs also have the ability to roll up old data into pre-computed tables
I do use a local duckdb instance for my local analytics, too, if you're doing OLAP type work.
If you change the initial table, does your materialized view also change, or stay the same?
normally it depends on the database
some materialized views can update when the main table updates, others are manually triggered
Since materialized views are cached and occupy memory.. how are they different from just a new table?
I mean everything occupies memory... It'll still be storing it on disk though, it doesn't hold the data purely in memory.
Normally your DB is plenty smart enough though to manage it's cache pools.
They're not different to new tables, they are just tools for doing things like pre-computation on a table and effectively caching it, saving you the additional computation from running the query over and over
curious if anyone knows of a good data engineering server?
i'm a member of a server called "Data & Dev Ops" but i never go in there so i can't say if it's good or not
this channel + #data-science-and-ml are the appropriate channels here, as well as #software-architecture maybe
I'm open it it if you want to send me an invite
there is also https://discord.gg/GEG8R7u7
Hello, I need help.
I'm trying to host my bot in sparked hosting, but apparently is not getting my database. I imported my database, but it catches as a file.
Does any one knows how to install the bot correctly with database in sparked host.
??
hello, can somebody help me? if I want to execute a query like so:
what SQL indexes should I create for the snapshot table?
the way I would do it in DynamoDB is create a custom field that concatenates the "shopId" and the "timestamp" columns and make that field an index
Which is it database engine?
Anyway, default indexes onto
Shop id and Shop snapshot shopid are sufficient
Should be valid for all relational database engines
right. but someone from my team said the query is faster if we make the timestamp an index as well. i don't quite get it
And they are right. Order by does searching table too.. but
It's just SQL in general. It's from a systems design interview, i didnt get why they ask sql questions either
There is room to optimize. There is special index type more better for timestamps (in postgres at least)
i see
Not remembering name from a spot
Point of better index type = less disk size will be taken / optionally faster queries
i see. thanks
It is observable if u will run your query as
EXPLAIN query
Database like postgres will build u plan how it will process it
Where sequence scans, it needs to check full table to gather data for this field
If it is backend position, truly valid questions. Relational database is corner stone making 50%+ of a job required knowledge
For system design obviously valid too. Can't design system without databases usually
Relational database are good default that yield acceptable result in terms of application quality (mongo is horrible in most of cases)
I see. but systematically, how does the Binary Search Tree work when working with Two types of Indexes in 1 query?
I can fully understand if it's just one index which is the "ShopId" in the "ShopSnapshot" table
Relational database are better because they are.. relational, data typed, and having data migrations between app versions possible. They are easily consistent in stored data, data is having integrity of being validly linked across tables
Observe EXPLAIN plan of a query
Learning mysql and mangodb ia important? Both
First is necessary, second is optional that may never be needed in entire career (or at least never can be justified)
Plus MySQL i would have replaced with postgresql.
Depends on country/company though
Thoughts on SQL Server?
good evening everbody, i've the problem on VPS centos 7, i've already installed python and i'm tryng to connect the database mysql but don't connect. when i've used on terminal, work perfectly. it's same host, user, password and name database. same on remote
`import mysql.connector
try:
# Apri la connessione al database
db = mysql.connector.connect(
host='xxx.xxx.xxx.xxx',
user='root',
password='xxxxx',
database='xxxxxx'
)
# Utilizza la connessione per le operazioni sul database
finally:
# Chiudi la connessione
if db.is_connected():
db.close()`
By same host you mean same machine?
yes
`Traceback (most recent call last):
File "/root/anaconda3/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 291, in _open_connection
self._cmysql.connect(**cnx_kwargs)
_mysql_connector.MySQLInterfaceError: Access denied for user 'root'@'localhost'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "testconn.py", line 9, in <module>
database='nuoto_argentina'
File "/root/anaconda3/lib/python3.7/site-packages/mysql/connector/pooling.py", line 293, in connect
return CMySQLConnection(*args, **kwargs)
File "/root/anaconda3/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 120, in init
self.connect(**kwargs)
File "/root/anaconda3/lib/python3.7/site-packages/mysql/connector/abstracts.py", line 1181, in connect
self._open_connection()
File "/root/anaconda3/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 298, in _open_connection
) from err
mysql.connector.errors.ProgrammingError: 1698 (28000): Access denied for user 'root'@'localhost'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "testconn.py", line 16, in <module>
if db.is_connected():
NameError: name 'db' is not defined`
Seems like invalid username+password to me
it's same user and passw when i've entered with mariadb, and external same
and sqlbrowser, same user and password, it's correct
work on WSL connecting mysql external
Looks like permission issues. Ran into similar errors when J first started working with sql server. Had a big headache from getting all the permissions set up prior to doing anything.
Anyone here is familiar with SQLModel ?
I have an issue where the Relationship field is not being populated
My issue has been closed due to not receiving any answer, so I'm using it to share the problem instead of re writing it again another time.
How are you querying your data?
Also I'd personally recommend to use sqlalchemy instead
This look ok?
Looks good! A few thoughts that pop up:
- Does it make sense for
dateto be a separate table? priceseems to be an OHLC candlestick, maybecandlestickwould be a better name?- Does the candlestick not have a resolution? I.e. minute, hour, day, ...
- I'd expect the currency to be an attribute of the ticker, rather than the exchange
- Wouldn't it make more sense for
change_52_weekto be an attribute of the ticker? - It may be practical for the
tickerto have a spot price attribute. Is the current price is the close of the latest candle? - Have you considered something like a unique constraint over (
ticker.exchange_id,ticker.symbol)? - Have you considered having a
currencytable?
Does it make sense for
dateto be a separate table?
I think so, because the date will repeat for different tickers in the price table, so I made it in order to not repeat the date
maybe
candlestickwould be a better name?
maybe, but it also includes change_52_week, and that's not part of a candlestick, so.. idk, I think this is fine.
Would price*_data* be better or worse, considering that change_52_week isn't exactly the price, but the change, and volume is the number of units traded?
Does the candlestick not have a resolution?
It's day. I was thinking to also add separate tables for week and month with triggers to fill them when the day table is updated, but will probably do that later (sticking with sqlite for this project, so materialized views aren't an option)
currency to be an attribute of the ticker, rather than the exchange
Will look into that
Wouldn't it make more sense for
change_52_weekto be an attribute of the ticker?
I don't think so, because it changes every day, as does the price. The ticker table pretty much doesn't change. I could make a separate table for it, but I see no point as it changes with the price (one less table to insert rows to)
It may be practical for the
tickerto have a spot price attribute. Is the current price is the close of the latest candle?
Good idea, might implement that in the future, but don't need it for now. I suppose it would have to be constantly updated.. maybe later
Have you considered something like a unique constraint over (
ticker.exchange_id,ticker.symbol)?
Yeah, all non foreign keys are unique in all tables except price
Have you considered having a
currencytable?
Good point, again, dunno y I don't think about that
Is it fine that ticker_id and date_id are the joint primary key of the price table, while also being foreign keys?
At first I had a separate id column as the primary key, but then I thought.. what for, if I don't use it anyway as a foreign key, and I can just set a joint key on those two. Is that fine?
I'm using FastAPI as my backend
I meant the actual query that you use to query your data
async def get_all():
async with AsyncSession(engine) as sess:
stm = select(Shape).options(joinedload(Shape.vertices))
fetch = await sess.execute(stm)
data = fetch.unique().all()
logger.info(data[0])
return jsonable_encoder(data)
Here's the full get_all() method
Hm, the problem is with Shapes.vertices, correct?
Yes
vertices: List["Vertex"] = Relationship(
back_populates="shape", sa_relationship_kwargs={"lazy": "joined"}
)
I would refactor your code a bit if you don't mind
Go ahead!
class ShapeService:
def __init__(self, session: Annotated[AsyncSession, Depends(get_session)) -> None:
self._session = session # You can use FastAPI's dependency injection to provide session into your service class
async def get_all(self):
stmt = select(Shape).options(joinedload(Shape.vertices))
return (await self._session.scalars(stmt)).all()
@router.get("/shapes")
async def shapes_list(
service: Annotated[ShapeService, Depends()]
) -> list[Shape]:
return await service.get_all()
Probably like this
Main change is with how you manage your session and how you use it
.execute method is only useful if you fetch multiple objects per row, but you fetch one model ,so you can use .scalars, it would provide you with a flat list
Also you can use DI to provide session into your classes and endpoints without creating it manually
I will modify accordinly
Let me know if you want me to explain something more in depth
Alright thank you !
Can you help me with the get_session() ? Sorry but I'm fairly new to FastAPI & SQLModel
Yep, it should just return a session:
async def get_session():
async with get_session() as session:
yield session
Don't try to commit in there, since it could lead to some weird side-effects
engine = create_async_engine(url=f"{DB_URL}", echo=True, future=True)
I have have this as my engine
I would personally recommend to use sqlalchemy + create your schema objects manually
But that's just my opinion
also aren't you doing an infinte loop here ?
o.o
my brain stopped working
from sqlalchemy.ext.asyncio import create_async_engine
from sqlmodel.ext.asyncio.session import AsyncSession
from .env import DB_URL
engine = create_async_engine(url=f"{DB_URL}", echo=True, future=True)
async def get_session():
async with AsyncSession(engine) as session:
yield session
I think it should be like this
xd
idk
Oh, i actually meant to use a different thing instead of get_sesion 😅
You can create a sessionmaker object, it would work kind of similar to functools.partial, so you don't have to specify engine each time
But yeah, your code is more correct
Thanks ^^
Okay so I solved my problem
# imports here...
class Shape(ShapeBase, table=True):
id_shape: Optional[int] = Field(default=None, primary_key=True)
vertices: List["Vertex"] = Relationship(
back_populates="shape", sa_relationship_kwargs={"lazy": "joined"}
)
class Vertex(VertexBase, table=True):
vert_id: Optional[int] = Field(default=None, primary_key=True)
shape: Optional[Shape] = Relationship(
back_populates="vertices", sa_relationship_kwargs={"lazy": "joined"}
)
I modified the models as follows, the Base version is where all the data attrs are
class ShapesCrud:
def __init__(self, session: Annotated[AsyncSession, Depends(get_session)]) -> None:
self._sess = session
async def get_all(self):
return (await self._sess.exec(select(Shape))).unique().all() # type: ignore
# Rest of the CRUD ops...
@router.get("/all", response_model=List[ShapeWithVertices])
async def get_all_shapes(*, map_crud: Annotated[ShapesCrud, Depends()]):
return await map_crud.get_all()
Following the docs of SQLModel, it should be like this
class ShapeBase(SQLModel):
file_name: str = Field(max_length=100)
district: str = Field(max_length=100)
program: str = Field(max_length=100)
governorat: str = Field(max_length=100)
commune: str = Field(max_length=100)
residents: str = Field(max_length=100)
area: str = Field(max_length=100)
class VertexBase(SQLModel):
lat: float = Field(default=0)
long: float = Field(default=0)
shape_id: Optional[int] = Field(default=None, foreign_key="shape.id_shape")
class ShapeRead(ShapeBase):
id_shape: int
class VertexRead(VertexBase):
vert_id: int
class ShapeWithVertices(ShapeRead):
vertices: List[VertexRead] = []
you can use type annotation instead of response_model
let me try
Also I belive it should be scalars instead of exec 🤔
Yeah, but there's only one object in each list
same result xD
when you do exec select * from (1, 1, 1); you'd get
[
(1,),
(1,),
(1,),
}
with scalars it's just [1, 1, 1]
[
{
"file_name": "",
"district": "",
"program": "",
"governorat": "",
"commune": "",
"residents": "",
"area": "",
"id_shape": 2,
"vertices": [
{
"lat": 123,
"long": 123,
"shape_id": 2,
"vert_id": 10
},
{
"lat": 114,
"long": 457,
"shape_id": 2,
"vert_id": 11
},
... // rest of the vertices data in the list and the next shape object
something like this
I get the same result with both exec and scalars
probably fastapi or sqlmodel does some "magic"
Hey guys, please help me or push me in the right direction
I have a DateTime column with the format
Example: "2020-08-05 0:00:00". (Please notice hh has only 1 digit)
SELECT strftime('%m', InvoiceDate) To extract the date. But it always returns none.
Works when I do it like this
Does not work when I do it as a column
This is the Data Type
I've also tried casting to Date as the YYYY-MM-DD format. but then it returns the year and nothing else.
Does anyone know what I could be doing wrong? by the way, I'm using SQLite through Jupyter
where is that %%read_sql ipython magic command from?
from sql_magic
wait what? why do you only have one digit for the hour
there's a fairly high chance that it's breaking because of that
# >>> con = sqlite3.connect(':memory:')
>>> con.execute('select strftime("%m", "2020-01-01 10:20:30")').fetchall()
[('01',)]
>>> con.execute('select strftime("%m", "2020-01-01 5:20:30")').fetchall()
[(None,)]
yeah, that's the odd thing I cannot understand myself. The data is like that throughout the column.
Do you know how I could "change" that?
where did you get that data from? a csv you imported once and will never have to update again, your own program inserted into it, someone else's program or something else
Imported from a CSV, however the data is the same there too. let me show you.
However, when I split the data using Excel it inserts the correct date format. as HH not just H
I'd just str.replace(" 0:00:00", " 00:00:00") before importing
I thought of that too. However, I'm not allowed to change certain parts of this project, And the importing part is one of those.
might as well take a substring / slice at this point
I’m currently doing an sql course and doing a oracle, spreadsheet and currently transaction. I barely understand the definition.
that's easy. transaction just means => do all actions or nothing. if any error is encountered before transaction is finished => then everything will be reversed
and then we have transactions locking each other and different transaction levels with different level of data dirtiness. the fun gets stronger
I understand. Do all command and if one fail, reverse it in that transaction
The book I’m reading has like bunch of lock transactions like deadlock, binary lock, row lock etc. i guess the main idea to understand the basic knowledge of transactions
Question about indexing. To give some context, there will be joins, and where clauses. So exactly which columns should I index? All columns that are referred to in the where clauses? All columns that are joined on (in both tables, or only in the largest one? maybe only in the tables that are updated less often? or how?)? All of the above?
Yah, was wondering same, I use jupysql usually.
Does something like SELECT to_timestamp(_string_column, 'YYYY-MM-DD H:MI:SS') AS ts FROM tbl ; work?
Otherwise, you could just create a new text column and replace like etrotta recommended, or replace in a subquery.
What are the best sources of databases you guys use?
Sqlite3 for simple pet projects, Postgresql as an average default for everything else that fits 95%+ usage cases 😆
That question doesn't make any sense though
If you want some idk random data to test something then you can use one of the many random data generators similar to faker
But you don't normally just randomly download database 'sources' full of random data that you don't actually need
If you’re looking for research datasets, Kaggle is one place
exactkly what i needed ty
I have a little issue trying to figure out the best way how to handle this case. May be someone have better idea how can it be structured.
For simplicity I remove other tables and unnecessary columns.
I have 3 tables: user - user accounts, organization, departments - multiple departments within organization
org columns:
id | name
dep columns:
id | org_id | name
I think it self-explanatory, one org can have multiple deps.
Now, the problem created in user table. For user I need to store which org it form and which deps he is in, user can have multiple deps.
At the moment I handle it like this:
user columns:
id | ... | org_id
and add intermediate table:
user_department columns
id | user_id | dep_id
Now, this solution is not ideal because theoretically you can have department that from other organization.
One idea was to use columns like this:
id | user_id | org_id | dep_id
But it not ideal as well, because with this solution user now can be in multiple organizations.
So, any idea how can it be better organized?
PS. I using PostgreSQL
In theory I could also use something like text array for deps in user table, but this remove constraint checks, which is not ideal either.
yes
but it can be in 1 or more departments
so I need to store user org and deps
The problem that if there would be department from other organization it will create issues, because later on when report for organization is issued it will make a report for all users and then pull some extra data per department.
I wish to store departments with constraint to organization.
If organization id was saved in this table it is possible to do and check, but then it create a problem that if someone save two different organization for user, then user will be in multiple organizations, which is wrong.
Could it be implemented within insert statement? (INSERT INTO ...)
to throw an error or not add if wrong id was passed
If I at least make safe insert, then I may use this solution
I'm currently looking at one idea that I just found with foreign key, but with multiple columns. I'll see if it will work first. I find it for MySQL, but I think that Postrge should have similar thing.
I find solution with only tables, it uses 3 tables for storing user org data:
- first is user table itself
- 2nd is 2-column
user_id | org_idwith both being PK and first being unique - 3rd with 3 columns
user_id | org_id | dep_idwith FK from uid, oid to 2nd table and FK oid and did to dep table (it has one change that id and org_id columns are unique in order to work)
In SQL it is like that, 2nd table:sql CREATE TABLE IF NOT EXISTS public.user_org ( user_id integer NOT NULL, org_id integer NOT NULL, CONSTRAINT user_org_pkey PRIMARY KEY (org_id, user_id), CONSTRAINT user_id_unique UNIQUE (user_id), CONSTRAINT org_id FOREIGN KEY (org_id) REFERENCES public.org (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE, CONSTRAINT user_id FOREIGN KEY (user_id) REFERENCES public.user (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE )3rd table:sql CREATE TABLE IF NOT EXISTS public.user_org_dep ( user_id integer NOT NULL, org_id integer NOT NULL, dep_id integer NOT NULL, CONSTRAINT user_org_dep_pkey PRIMARY KEY (user_id, org_id, dep_id), CONSTRAINT org_dep FOREIGN KEY (dep_id, org_id) REFERENCES public.dep (id, org_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION NOT VALID, CONSTRAINT user_org FOREIGN KEY (user_id, org_id) REFERENCES public.user_org (user_id, org_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE )
So if we had data like:
- user 1, 2, 3
- org: 1, 2, 3
- dep's with columns id, org_id:
1, 1,2, 1,3, 2,4, 3 - then if we assign to user 1 org 2, in table
user_org_dep: - we can't insert
1, 1, 1because org is 2 - we can't insert
1, 2, 1,1, 2, 2because dep is not for org 2 - we can insert
1, 2, 3
Hi! Is it possible to get SQLAlchemy to return the proper types for rows here?
rows = (await session.scalars(
select(Scan.name, Scan.version)
.where(tuple_(Scan.name, Scan.version).not_in(valid_packages))
)).all()
rows is just Sequence[Any] which, I'm not a fan of. Both Scan.name and Scan.version are strs so I was under the impression it could infer that rows should be a Sequence[tuple[str, str]] or something of that sort.
Thanks!
do you use SQLAlchemy 2.0?
2.0 added new syntax for creating tables, like:```py
class User(Base):
username: Mapped[str] = mapped_column(String, unique=True, index=True, nullable=False)
this Mapped will return types in the answer
https://docs.sqlalchemy.org/en/20/orm/declarative_tables.html you can read more here
Yup, I'm using SQLA 2.0 and the relavant parts of my model look like so:
class Scan(Base):
name: Mapped[str]
version: Mapped[str]
if Scan from above is SQLA model, then it should return proper types
Do you use Pydantic in a project overall?
Yeah, this is inside of FastAPI so I am using Pydantic models
for example, FastAPI used it out of a box
then next thing I usually do is:```py
result = await session.execute(select(User).where(User.username == username.lower()))
user = result.scalar_one_or_none()
return User.model_validate(user) if user else None
with that after that it would be User model (Pydantic one)
I solved it by myself
I just told you how I solved it
er?
no
oh, you wish to see how it would look like?
I did that simply in a testing environment
creating temp tables and see for an outcome
constraints on one company per user and multiple departments from that company, but checking that department actually assigned to this company and not any other
I have re read it already five times as still not understanding your issue
Could you provide example of wrong query, wrong results you are going to query and expected results instead?
That is easy
User table has company id
- company_id should have index
Table company departments:
Id, company id, department id
As a source of truth of existing departments in company (feel free to normalize department if it has repeative data or not)
- Unique combined constraint onto (company id, department id) can be added
- Obviously u need indexes onto both foreign keys individually
Then table
User company departments
User_id, company_department_id
- Unique combined constraint onto (user_id, company_department_id) can be added
- U need indexes individually on both foreign keys
Should make everything auto validated
Did anyone know about vector databases?
hii, where can i find a gpd per capita by state dataset?
im looking for a database containing ethical questions, cant find any on kaggle or anything simillar
Hi everyone, i am new here. Can someone provide me some advice, on how to get cndidate keys from this data?
here is my attempt, but it is wrong.
P.S. *Candidatekeys
candidate_keys1 = []
for i in range(1, df.shape[1] + 1):
combinations = df.drop_duplicates(subset=df.columns[:i])
if combinations.shape[0] == df.shape[0]:
candidate_keys1.append(tuple(df.columns[:i]))
print("Candidate Keys:")
print(candidate_keys1)
from typing import Iterator
from dataclasses import dataclass
@dataclass
class Order:
candidate_code: str
def __init__(self, *args: list[str]) -> None:
self.candidate_code = args[3]
def get_lines() -> Iterator[str]:
with open("file.csv", "r") as file:
file.readline() # skip headers
for line in file:
yield line
def get_orders() -> Iterator[Order]:
for line in get_lines():
yield Order(*line.split(","))
if __name__=="__main__":
unique_candidate_codes = set()
for order in get_orders():
unique_candidate_codes.add(order.candidate_code)
print(unique_candidate_codes)
here you go, if to be fabulous about it
feel free to fix, instead of args[3], taking whatever value was necessary in number
Use Iterator instead of Generator 
fair. thanks 😊
I've tried to switch to MySQL server on my Ubuntu VPS.
I've been able to install it, open port 3306, add it to my PY Flask config and create the DB/Tables, etc. (All good on that side)
However, I'm unable to connect to it via MySQL Workbench on my local PC.
I've checked my VPS' firewall and can see it's allowed, I checked UFW on the VPS and it's allowed.
The MySQL User has the following:
| user | host |
+------------------+-----------------+
| data | % |
I've tried restarting the VPS, my local machine, resetting the UFW rules, no luck.
I am able to use the VPS' IP via HTTP, SSH and FTP, so i'm not entirely sure why i'm having no luck with port 3306!
Any ideas would be greatly appreciated
Edit:
MySQLWorkbench simply says "Unable to connect to localhost"
With our without a password provided, so doesn't seem to be the user
sudo lsof -i -P -n | grep LISTEN returns:
mysqld 57774 mysql 23u IPv4 332687 0t0 TCP MYSERVERSIP:3306
How do I correctly create document models in different files where BackLink and Link are used in odm beanie for mongoDB?
I don't know if it's possible to do the same as on Tortoise ORM
Does join order matter (performance-wise, not asking about the result) when there is a composite index on foreign keys participating in the join (many tables are joined together on primary-foreign keys; one table has a composite index on several foreign keys)? Or will the query optimizer take care of things? It doesn't take care of things in the where clause.. order of conditions matters for performance.. thus my question
are we what, a sql relational database query engine optimizer to you? 🤣
run EXPLAIN on_query1 and EXPLAIN on_query2
optionally EXPLAIN ANALYZE query_too
read and compare results
oh. btw, i actually remembered, order matters on composite indexes i think. remembered i already encountered that explicitely told i think.
not 100% sure though. my memory can be faulty. so best EXPLAIN ANALYZE stuff 🙂
I was hoping there was some theory behind it, as how conditions on composite indexes should have an equality on the leftmost condition, and don't work after an inequality condition. I'd never deduce such a thing from EXPLAIN 💀
dissregard my bubbling above. i actually reread official docs for postgresql
https://www.postgresql.org/docs/current/indexes-multicolumn.html
Currently, only the B-tree, GiST, GIN, and BRIN index types support multiple-key-column indexes. Whether there can be multiple key columns is independent of whether INCLUDE columns can be added to the index. Indexes can have up to 32 columns, including INCLUDE columns. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)
A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
actually read it whole. it looks like order matters.
Yeah, order does matter. But I was specifically wondering if the order of joins matters when joining on fks with a composite index
of course it would matter. because depending on your order of joins, different sized sub tables can be participating in next queries.
So... exact answer only EXPLAIN ANALYZE can give you, if it matters for your queries or not.
it depends :'3
async def create_one(
data,
model,
db: Session,
):
query_statement = insert(model).values(**dict(data))
try:
db.execute(query_statement)
db.commit()
except Exception as e:
db.rollback()
return False, e
return "Success", None
Rather than returning "Success" I want to return the newly inserted row with the updated data. How do I do it?
Or should I just simply rely on session.add() here ?
You could use orm instead of using core here
Even then, you could use .returning method of Select
So:
- Ideally you shouldn't manage transactions inside of your functions, since
sessionis passed in by the caller he should manage it, e.g.:
async with async_session_factory.begin() as session:
await create_one(session=session, ...)
- Rollback is managed by the session itself, if you use context manager
e.g.:
async with async_session_factory() as session:
# If exception happens here session would automatically rollback
As to solve your initial problem I'd say work with orm, but I don't think you need that create_one function at all:
model = Model(**data)
session.add(model) # This adds ORM model to session, it would track all it's changes and emit according SQL to db when you flush
await session.flush() # This flushed session changes to db
print(model.id) # model.id should be populated at this point
as an alternative you should be able to do something like this:
stmt = insert(model).values(**data).returning(model)
instance = await session.scalar(stmt)
But I would generally prefer to work with orm, not core in this case
@paper flower got it thanks
Also read sqlalchemy docs, but they could be hard/confusing at first 😅
@paper flower Exactly I am confused with this version updates. Query has been legacy so we got to use select. Other than that am I good to go with other crud operations like update and delete?
Using the orm and not relying on core
You could use orm for deletes and updates too
model = ... # Some model
model.name = "New Name"
session.add(model) # if wasn't already added
await session.flush()
await session.delete(model)
For bulk deletes you can always use delete statement
Got it!
query = db.query(model).filter_by(id=id)
rating = query.first()
if not rating:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Not Found",
)
query.update(
model.model_dump(), synchronize_session=False
)
db.commit()
return query.first()
Does this work for the update? @paper flower
Don't use query
What do you want to update here?
So there are like 6 fields in the row including an id i.e primary key auto-generated. Except this one (obviously) a user using this endpoint can update any field!
So one sec let me send the whole method. I’m using fastapi bydway
@router.put("/ratings/{id}", status_code=status.HTTP_200_OK)
async def update_ratings(
id: int,
updated_rating: RatingUpdate,
db: Session = Depends(get_db),
):
"""
Updates a rating
"""
query = db.query(Rating).filter_by(id=id)
rating = query.first()
if not rating:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Rating with id {id} not found!",
)
query.update(
updated_rating.model_dump(), synchronize_session=False
)
db.commit()
return query.first()
This is the only place I am having problem and I had no other option left other than using query. Documentation was hell confusing!
You can destructure your pydnatic model via model.model_dump(), but I personally don't like it because type checkers won't see anything wrong with it, though, the choice is yours
You can update your db record like this:
@router.put("/ratings/{id}", status_code=status.HTTP_200_OK)
async def update_ratings(
id: int,
schema: RatingUpdate,
db: Session = Depends(get_db),
):
"""
Updates a rating
"""
record = await db.get(Record, id)
if not rating:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Rating with id {id} not found!",
)
for key, value in schema.model_dump().items():
setattr(record, key, value)
db.add(record)
db.commit()
return record
Personally I'd explicitly assign all fields
e.g.
record.something = schema.something
record.something1 = schema.something1
...
But that's just me
Does it like play role with performance or something ?
This works. Would you check my create, read, delete route ?
This? No, but type checkers won't work with setattr
ORM should be somewhat less performant, but you wouldn't really notice it
Sure
e.g.
I can have
class BookCreate(BaseModel):
title: int
class Book(Base):
title: str
title: int is obviously a typo, and if I use setattr or **model.model_dump() mypy or pyright wouldn't be able to check that
But if I assign that property directly an error should be raised by a type checker
Any how this will not be a user facing api. This will be like for internal use.
Again, I think this shouldn't really matter
(performance-wise)
query = select(Ratings).limit(limit).offset(offset)
ratings = db.scalars(query).all()
return ratings
Get all ratings
Unless you're updating thousands of records
Nop just for single entries
Maybe
(await db.scalars(query)).all()
Also add order_by, without it order of returned records is not guaranteed to be stable
query = select(Ratings).where(
Ratings.id == id
)
rating = db.scalars(query).first()
if not age_rating:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Rating with id {id} not found!",
)
db.delete(rating)
db.commit()
return Response(status_code=status.HTTP_204_NO_CONTENT)
Delete
You can use Session.get instead of explicitly creating a select statement, but it may fetch data from session's cache, though, cache is local to a specific session and IIRC is deleted when you commit 🤔
And you can use db.scalar() instead of db.scalars().first()
new_rating = Rating(
**rating.model_dump()
)
db.add(new_rating)
db.commit()
db.refresh(new_rating)
return new_rating
Create rating
if you want to fetch only one record, and raise an exception otherwise, use
(await db.scalars(query)).one()
Make is async 🙂
If I remember correctly you no longer should need to call refresh after inserts 🤔
Yes! I had to add that, but really thanks for reminding 😅
I will check that! and also refresh
Thanks a lot @paper flower for helping me out. I am coming to sqlalchemy like after 1.5 years 🥲.
Would KeyError make the most logical sense to raise as an error for items that aren't found in a DB?
You always can create your custom one.
But i think yeah, KeyError is.
I think it could make sense only if you're trying to find a model by pk
But custom exception should be better, or just return None
I have to refresh after insert or else when I am returning the record after committing without refreshing I am getting an empty dictionary. refreshing makes it work. Don't know if it something specific to db.
Dictionary?
Can you share your code again?
I mean the record
Also what version of sqlalchemy you're using?
new_rating = Rating(
**rating.model_dump()
)
db.add(new_rating)
db.commit()
db.refresh(new_rating)
return new_rating
sqlalchemy version '2.0.19'
Works fine by me without refresh. Maybe you are trying to use it outside the session?
Can i do
value = cursor.fetchall()
print(len(value))
To get lenght of existing values
How do i use count?
For example i want to count
Where category and type is 1
"SELECT type FROM table WHERE category = ? AND type = ?", (1, 1)
It works same?
Return integer right?
Using SQLite btw
Wdym by null values?
I am a beginner in programming. Currently I want to create a bot and have to download discord.py but I can't download . can you show me.
sorry to bother
Look i want to count rows where both column has value equal to my condition
Thanks
Gotcha
Thanks
SELECT COUNT(type) FROM table WHERE type = ? AND value = ?; (1, 1)
Right
Thanks
Understood it now
i think COUNT(*) is better described as counting all rows that the query matched, although in this case it doesnt matter because type = NULL will never be true anyway
Modelling nosql data in situations where location matters weather data
If I have an entity whose fields could vary maybe weather/news. Users from different locations will follow different news items. Also based on locations different news content will be prioritized. So having a single news collection for all articles wont be feasible. What could be best approach for schema design in these cases?
daily reminder to remember OLTP vs. OLAP
Data architecture
OLAP database architecture prioritizes data read over data write operations. You can quickly and efficiently perform complex queries on large volumes of data. Availability is a low-priority concern as the primary use case is analytics.On the other hand, OLTP database architecture prioritizes data write operations. It’s optimized for write-heavy workloads and can update high-frequency, high-volume transactional data without compromising data integrity.
Ngl these statements are so general you can never really say that they're true. They're the catch all statements except that most DBs just dont fit into them, everything is too varied
true
still helpful as a heuristic of sorts in some cases
I feel like posting my hw question and answer I got
how do get into python using im new !
No, not bad. Very normal. We often have pipelines that chain complex queries. This is where dbt pipelines, for instance, come into play.
But also, are you familiar with CTEs (with clauses)?
No, more like a way of organizing your pipelines. Like, "query1 -> query2 -> query3"
Especially important when query1 might be used in different ways
A CTE is just a "with" clause: ```sql
WITH q1 as (...query1...), q2 as (...query2)
SELECT * FROM q1 JOIN q2 on q1.id=q2.id
"Common Table Expressions". Life sucked before them.
CTEs are necessary when you need to reuse the same subquery multiple times. It sounds weird, but you'll eventually run into this kind of problem.
Technically sqlalchemy isn't an orm 
Some popular orms don't support CTEs btw, e.g. django, tortoise, pony
ORM is a part of sqlalcemy
It's completely optional to use
fwiw, I don't use ORMs for most of my work... not very relevant for my types of use cases (reporting/analytics)
Yah, if it's logic that's used throughout multiple queries, you generally write it as a view. (dbt is just a tool to help with that stuff).
What do you mean by "pure" orm?
If it's logic that's used multiple times within a single query, you'd use a CTE. Or, you also use a CTE to simplify the structure of your query... so instead of sql SELECT * FROM (... subquery1 ...) join (SELECT * FROM subquery2 join (SELECT * FROM subquery3) on true) on true you'd write sql with q1 as (... subquery1... ), q3 as (...subquery3...), q2 as ..... select * from ...
sqlalchemy.orm is an actual orm 🙂
Sqlalchemy is an sql toolkit / query builder, but you can use ORM with any of it's features
I do use orms
https://paste.pythondiscord.com/SZ3Q
I am trying to extract the earlier upload date
fyi, I think you're just parsing an iso8601 string here: datetime.strptime(v, "%Y-%m-%dT%H:%M:%S.%f%fZ")
figured it out tysm though
Ok, there's builtin format codes for that
anyone tried out duckdb. your exp?
someone said it was basically sqlite but for olap and that clicked for me

Hi, I'm trying to web scrap from a website and have it displayed in a google sheet. Im able to get the data and able to get it onto the sheet, it just it doesnt stay int he collum I want it to stay in
sh.append_row([title, full_href], table_range="C3:C")
thats the code I'm using, and it starts off in the right collum, but then switches to collum A and B after the first iteratiion
How do I get it to stay in C and D
I use it every day. It’s amazing. We use it for a few purposes, including within notebooks, in some of our pipelines, and for quick prototyping
hi guys, I'm making requests to an external api i found on github, and the JSON response i'm getting from the api itself has the following formatting error:
**json continues above w/o issue**
{
"name": "foo",
"url": "https://foo.com",
},
{
"name": "bar",
"url": "https://bar.com",
{
"name": "foobar",
"url": "https://foobar.com",
},
**json continues below w/o issue**
this is the response i'm getting as it is from the api, and as you can see, in between the "bar" and "foobar" objects there is an open bracket { instead of a closed bracket }
i'm not really sure how to fix this, since all of the stackoverflow responses regarding errors with json formatting like this just say to change the json string into raw text using r{'string'} however, i need the api response to be returned as a coroutine in order to be used in an asyncio.run(response) function later on in the program
thanks for any help you guys have!
appropriate options
- opening Github Issue at their side to fix this stuff at this third party API
- not using this third party since they can't even build proper JSON (The most recommended option)
- downloading data in raw way, and parsing on your own with some regex magic
yeah look idk how they managed to mess up the json but cheers anyway
i reckon ill just create my own (actually valid) db and run it off that instead 🤷♂️
thats awesome to hear — gonna have to give it a whirl myself

Nice, one of their coolest features is being able to link Python functions as sql functions (udf’s).
i saw that and i was like neat!
If the problem were just the trailing commas (and not the missing tag) One option to consider is a yaml parser: ```py
import yaml
data = """
{data: [{
"name": "foo",
"url": "https://foo.com",
},
{
"name": "bar",
"url": "https://bar.com",
},
{
"name": "foobar",
"url": "https://foobar.com",
}
]}"""
d = yaml.safe_load(data)
print(d)
you fixed the missing closing one here, though
hilarious advice though :p
first rule of data engineering: make it work. Then make it good 🙂
i'm trying to use mongodb to watch a change on my database with
for change in mongoconnection.database.collection.watch([{"$match": { "operationType": "delete"}}]):
# do something here when a document be deleted
the biggest problem is i'm trying to do that inside a discord bot, so when someone type a /command it execute this loop, but if the person type the command 2 times it dont work, because he be watching the previous loop, so i was thinking if there is a way to break the previous loop when someone type the same command twice...
can u provide more accurate user scenario expected interactions?
how it should look from user point when he executes your commands
nothing special, the person use the command, the loop execute and wait for a change, if the change dont happen in one minute so the bot should stop waiting, if the person type the command twice the bot should ignore the first command and execute the last one
Oh wow, I missed that... I thought the OP was complaining about the trailing commas, never noticed the missing }
you describe programming logic you write to fit some unknown user interacting scenario
focus on user
describe what are expected results of commands
from the point of user
- User writes command x
user receives stuff y
nothing special could be just a await ctx.send('hello')
sigh, okay giving answer to totally hypotetical question then
your user should not interact/influence with this stuff start up at all.
you should have background process running all the time for all users catching event changes and sending appropriate stuff to users based on those reactions
separation of application into api that offers one off actions fired based on user input
and application part worker that process events
i solved it with one interesting way
endTime = datetime.datetime.now() + datetime.timedelta(seconds=60)
if collection.find_one({'user_id_here': ctx.author.id}):
collection.delete_many({'user_id_here': ctx.author.id})
collection_db.insert_one({'user_id_here': ctx.author.id})
while collection.find_one({'user_id_here': ctx.author.id}):
if not collection.find_one({'user_id_here': ctx.author.id}) or datetime.datetime.now() >= endTime:
break
await asyncio.sleep(1)
Trying to figure out pynamoDB and it's driving me nuts.
if not dbModels.pynamoUser.exists():
print("How the hell doesn't the table exist?")
else:
user_model = dbModels.pynamoUser()
user_model.id = 2
user_model.username = 'test'
dbModels.pynamoUser.save(user_model)
username = 'test'
# user_model = dbModels.pynamoUser.scan(dbModels.pynamoUser.username==username)
print(dbModels.pynamoUser.get(1))
user_model = dbModels.pynamoUser.scan()
print(dbModels.pynamoUser.count())
Table exists, user gets added. get returns the user. Scan returns nothing?! And count is 0 even though the table has 4 users...
Am I dumb or is something really weird going on here?
could anyone help me out, I'm trying to set the params in my response variable with the get_character variable, but it's not working it's returning a list and has no attributes
File "C:\Users\kylem\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\app_commands\commands.py", line 828, in _do_call
return await self._callback(interaction, **params) # type: ignore
File "c:\Users\kylem\OneDrive\Desktop\Python\DiscordBot\main.py", line 103, in get_character
f"Name: {get_Character.name}\n"
AttributeError: 'list' object has no attribute 'name'```
```py
class getCharacter:
def __init__(self, name):
self.name = name
@bot.tree.command(name="get_character")
@app_commands.describe(name = 'Name')
async def get_character(interaction: discord.Interaction, name: str):
conn = sqlite3.connect('character.db')
c = conn.cursor()
character = getCharacter(name)
c.execute("SELECT * FROM characters WHERE name=(?)", (character.name,))
print(c.fetchall()[0])
get_Character = c.fetchone()
response = (
f"{interaction.user.name} created a new character:\n"
f"Name: {get_Character.name}\n"
f"Level: {get_Character.level}\n"
f"Race: {get_Character.race}\n"
f"Class: {get_Character.role}\n"
f"Gender: {get_Character.gender}\n"
)
await interaction.response.send_message(response)
conn.commit()
conn.close()```
It looks like you’re calling fetchone after calling fetchall? Why are you doing both?
ah this seems to be a good rule of thumb. especially for looking at higher level patterns
this channel is about databases. data modeling falls under that topic
you can design a database without data modeling. you can also end up in a world of hurt later too

so i have 3 tables message, channel and chat
msg and channel are related by the channel_id foreign key in message and
channel and chat are related by the chat_id foreign key in channel
when inserting a message is it possible to get the chat_id in a single query using RETURNING?
Yeah in a CTE I'm quite sure you will pull this off.
Something like this would work in postgres. Not sure about other engines.
with new_message AS (
INSERT INTO messsage (id, channel_id, content)
VALUES id, channel_id, content
RETURNING *
)
SELECT nm.*, c.chat_id
FROM new_message nm
INNER JOIN channel c ON nm.channel_id = c.id;
How can I use a cell value as a parameter for a SQL query?
This seems relevant though I had to make many assumptions. For the record googled
“excel param in datasource odbc”
Please provide more details next time. Though this is database related I feel this has nothing to do with python and should find another server for this.
Hi there, Via ODBC I am connecting to a Pervasive (version 13) database. Is there a way for me to somehow make use of a parameter? For example, in the below..
Thanks, I have resolved it earlier and forgot to update here. I had to pick up some skills with power query to fully understand the solution necessary.
As a side note, the scope of this channel is "Questions and discussion about databases". This falls under that scope.
When defining SQLModel classes, how can I give a default value to all of the fields in the class without having to explicitly assign = "" to every field? FastAPI will give these fields a default value of string
class Item(SQLModel):
company_name: str
source: str
customer: str
has anyone ever had a "unable to connect to 'localhost'" connection error before? it worked fine before but i restarted my vps and now i can’t log in
hey, i want to store skins that have:
id, owner_id, etc
and i have an user that can have those skins and have them equipped or not, how would you make a conceptual model of that?
this is how i'm doing it, is it the best way? Or at least a good way?
attention that entity "knives" for example, stores all knives from all players
hi everyone, I wrote the code to view the vi files, who can rate?
this is a test version
guys, is there any database similar to mongodb but for android?
for android? wut?
the only thing "for android" would be firebase
90% of the databases are just server-side, platform agnostic
Hello! I just wanted to ask a quick question:
I'm running Python code on a machine with 32gb RAM, AMD Ryzen 9 5900x.
I'm using it to load hundreds of thousands of rows onto an SQL Database on Azure. I'm currently using a database with 5 DTUs, the most basic one since it's free. I don't care how fast it is as of now. What I need it for right now is enough.
However, I was wondering if the performance of loading so many rows would be better with a better SQL database instance. Basically, if the performance of this process is not only dependant on my machine, but also the database.
Currently, I'm loading around 10k rows/second with my machine and the basic SQL database of 5 DTUs. Could that be improved with a, say, 50 DTU database? How much could I gain, if so? I understand that 50 DTUs is basically x10 the resources, but I don't know if this would affect performance
i have no idea, look simple but im overthinking it
it sounds like DTUs are more optimised for Read operations than Write operations?.. If you just want to import one time, there might be more efficient ways than what you're doing rn
I'd probably check your network bandwidth before anything else though
Bandwidth is fine. I just want to know if, in general, "in theory", getting a database with more DTUs would improve performance, just with that alone.
I think ultimately you’d first need to construct a test to determine where the bottleneck is, I don’t any of us could tell you with the information you’ve provided.
Perhaps you're simply using whole thread on your cpu? 
In case your code doesn't use multiprocessing/threading
There’s a bazillion questions jd ask, like; where’s the data coming from? Are they inserting one row at a time? Is it a single table, and how ‘wide’? Latency between end points? Cpu load on source system, etc
There's no bottleneck. Currently it's fine for my use case. Out of curiosity I was wondering if database resources could also affect performance, and not just those from my PC, since after all, it's a VM with an SQL server instance
There’s always a bottleneck. Otherwise it would be infinitesimally quick
So the first thing is to determine if you’ve ‘optimized’ your local side such that the remote side is the bottleneck. Then, and only then, would the database matter
Say everything was perfect on the local side, and the remote side is the bottleneck. Would more DTUs improve the row loading speed?
I mean yah, if the bottleneck is server side and since dtu represents cpu, memory and io limits, more dtu’s would mean more capacity for whatever was your bottleneck in the database (cpu, memory or io).
Generally speaking though, a bulk load direct to db will be the fastest vs loading individual rows
(And server sizing is certainly a factor) there’s probably (certainly) some limits where dtu’s stop mattering… such as when single core speed is maxed or io limits for the storage layouts
So, I’d expect a performance plateau for a single writer to a single table at some number of dtu’s.
i see
thought is there a way to know my mongodb URI?
i need it to continue my project
if you mean like "is there a risk of someone reverse engineering my app", not sure, I've never made apps for mobile
nope
you mean like you literally don't know your own URI?
yeah
which hosting service are you using? Mongodb Atlas? or self-hosting?
i believe self-hosting
did you set it up at all?
look up their documentation then
oki doki
Hello i have a question its a program for ripping Streamz for Belgium?
Hello is it possible to provide sslcert argumets to pg_dump in order to generate database backup?
I can only connect to the database by means of SSL certs. Telling to generate backup from dbeaver prompts an error saying that tlsv13 alert certificate required
It also tells me the command that executes behind the scenes using pg_dump and it doesn't include arguments for rootcert, key, client cert...
I wonder if there is another way to create a backup in which you can provide SSL certificate. Thank you in advance. Any help is very welcome
Hello Guys, I am using dynamoDB on AWS to store data about posts for my application. A example of a table in my application would be the Post Likes table. Where i store the id of the user and post but also a created_at variable as to when the post was liked. I need store the following:
- user_id
- post_id
- created_at
The following queriies will need to be made on the table: - get the liked information based on user_id and sort it by created_at
-get the liked information based on post_id and sort it by created_at.
I am new to dynamoDB and was wondering how I could set up the keys on my table. I seen ways of using GSIs, so for example:
PK: user_id
SK: created_at
GPK: post_id
GSK: created_at
But i was wondering whether there is a way to avoid using GSIs as they are an additional cost to using them. I would appreciate some help. Thank you!
@storm mauve what do i do here?
hi everyone, I wrote the code to view the vi files, who can rate?this is a test versionhttps://nekobin.com/wozecatemi.py
hey there guys, I have a question regrading connecting postgresql with django, so I know there is an staight-forward way to connect django with postgres by modifying the Databases dictionary and making postgres as the default database and adding all of the connection params into the code...but I believe that there is a more convenient way to store the connection params explained in the Django docs but I can't seem to understand it, can anybody help ?
Anybody help pls 😦
I have a DB with people's emails and passwords for their logins to a site. I want to secure the info I have there but I don't know how to secure the emails without making it useless for myself.
I've hashed/salted the passwords generated but if I do the same to the emails they'll become useless to me. Is there a way to protect or even obfuscate this data in the table to do this or is this me worrying about the wrong things and I should just concentrate on making the dB and dB connections secure?
you can use something like a fernet to encrypt the emails with a secret key but which allows you to decrypt them
didn't you pepered the password?
hash > encription
salted hash > hash
salted papered hash > salted hash
CAN ANYONE TELL ME WHICH MONGODB PRODUCT IS BEST FOR E COMMERCE TYPE WEBSITE
I have no idea what you're trying to convay here
i think i wrote that in way to short.... my fault i guess......
still i have a question which mongodb product is best for e commerce type website
Just normal mongo?
.....
Again I'm not really sure what you mean by that because... The products mongo offer outside of the standard DB are wildly different and not doing the same job
my reaction rn:

i meant which one would be great for a e commerce website:
MongoDB Community Server
or
MongoDB Enterprise Server
or
MongoDB Atlas?
Well how big is your E commerce website, and how much are you willing to pay. Thats basically the only difference there.
- The Enterprise server is for well, enterprises who want to self-host their database locally or on a different cloud provider, it's mostly support and legal related things like licensing.
- Atlas is mongo's DBAAS so it's whether you want to run mongo locally yourself, or if you're happy to pay and just let mongo manage the DB for you.
It's the same database though 😅
Are you sure that you even need mongodb? 🤔
I mean this tbh
Ngl for most things a regular relational DB is probably better
Pepper doesn't make your hashing algorithm more secure really 
@brazen charm and @paper flower
im back
do you think i would ask that question if i had a big e commerce website 😭 😂
it doesn't make your alogrithim secure but it improves your scurity
let me explain if
How so?
let me explain please
btw im having a internet connection error so dont mind if left in middle of talking
so
lets assume your database got breached
your database had user email and password
now the hacker knows your users password and email
let assume those password are in salted hash form
now the hacker knows your hash password
we know hash password can't be reverse
but
we can predict the password
like if a password hash is 123fs
then we will keep guessing password until we match that hash right?
You can't "predict" it
by predict i meant keep guessing.....
Depending on algorithm it will take a lot of time
ye there is algorithim that makes the hashing slow on purpose
Assuming your db is breached then in 99% of cases they have access to your app server and can retrieve your secret key/pepper
And it should be slow
thx for that
now since the database was breached so the hacker copied the password and store it somewhere
then he kept guessing offline
I think recommended time for web apps is 100-250ms to hash, on your hardware
OFFLINE
i would say 1 second 💀
This is not feasible
in whatever way let just assume he did......
It's not feasible to guess a password, just check how much time it would take
im not talking about timing attack
It's not
then?
Calculate how much time it would take to guess a single password
by guessing i meant he keep hashing random password and matching it hash to the entire breached database password
You can only match it with a single password, because they're salted
ye my bad
now do you notice something in that proccess?
@paper flower did you notice somethng?
ok let me tell then looks like you are offline
the hacker is guessing the password offline
using peper it let us to make sure that hacker uses the password online
and if we notice that someone is trying to guess so many accounts password then we can stop them right there
peper is actually a another "secret_key" that is attached to the password and peper is same for all users
and peper is implented on the website so the hacker cant guess offline
@brazen charm typing something?
2 things, 1) This isn't really related to databases 2) It doesn't stop them guessing password offline with a breached DB entirely, it just means they need to also get the secret key, which happens more often than you think.
And finally, this conversation extended from your comparison of two things which are completely different and should be (are) used for for different things.
as long as i studied how is it differnt??
Going back to this, just use Postgres or something then, Mongo isnt giving you anything useful here
i dont know postgres...
Excellent time to learn 😄
You mean why is symmetric/asymmetric encryption different from hashing?
nope i meant how pepering doesn't improve the secruity?
I never said that...
what is this then??
Your original message is jumping in saying that hashing is better than encryption, and then tails off into "Oh salted is better than nothing, but peppered is better than salted"
anyway, this doesnt really belong in databases
i actually meant
hashing is great for password security and salted hash is greater than only hashing and salted and pepered hash greater than only salted hash
okay, but that wasn't relevant to the conversation previous 😅
.....
which one???
still i think we both misunderstood something
I would assume if either db or app are breached - the whole system is
xD
imagine the whole system got hacked
Your entire system security right now:
ok bye gtg
guys can help me, I get an error when I want to display the table. Here is the error: Process finished with exit code -1073740791 (0xC0000409)
my friend tried to help in some way, but he didn't succeed
<@&267628507062992896> Hello. Would you, please, consider to add SQLAlchemy to topical chat or open source projects? I haven't found any good community to discuss topics and solutions with this library and even though it's great, the documentation is a pain (as you can often read around the internet). Thanks for considering...
You can already talk about SQLAlchemy in this channel
@blissful narwhal what do you think about SQLAlchemy?
Leaving me hanging
@gusty mesa do you have opinions about SQLAlchemy?
Regardless, I just changed the channel description to
Questions and discussion about databases
+ (both SQL and others) and using them in Python
Hopefully that makes it more clear
uhh, i know very little about it, and thus hold not much of an opinion 😅
I know...I was thinking of creating dedicated place for this one. Another databases and py modules usually have their communities around, but not this one. There is an official room on Matrix network, but it's quite quite 🙂 Another reason is that comparing for example to Mongo or Elastic, this one is a little bit complicated and it would be nice to have a place where you can read about others problems/solution etc.
But I understand, it's not possible to make a channel for every library, but...:D
the open source channels here are for projects where the devs themselves made this server their communication hub. we don't have that arrangement with the SQLAlchemy devs.
sqla is aight
Honestly It would be great to have a SQLAlchemy channel/ Server on Discord as I helped some people with it and generally interested in SQLA itself, though, Discord has one disadvantage - It's not indexable
SQLAlchemy devs and Mike Bayer himself respond quite fast in GH Discussions, If you have a specific question it would be better to ask it there 👀
i specifically don't want sqla on discord because GH discussions are so much better, being an actual persistent non-login-gated forum
the best help chats are links to forum posts imo
I have to say im a bit lost with SQLAlchemy documentation, So in SQLAlchemy 2.0 conn.execute("select * from blah;") was discontinued so you gotta use result = conn.execute(text('show databases;')) now?
Also im unsure if that create_engine is the way to go for long running connections. should i be grabbing a session object to do queries on
engine = create_engine(f"mysql+mysqldb://{username}:{password}@{hostname}")
conn = engine.connect()
# long runnung session
while True:
sleep(1000):
conn.execute(text( some_query ))
is this the correct method for long running sessions or do i have to get a new session each time?
and i guess last one: what is the difference between a session and a connection ?
Hey,
Can someone help with SQLite3 syntax ?
I have a table PRODUCT that has 5 columns (date, seller, product, cartons, weight)
I am trying to select with these specific conditions:
group by date and seller,
sum of the cartons and sum of weight
having a bit problem with the syntax and would appreciate any help!
select sum(cartons) as qty, sum(weight) as sum_weight, date, seller ... from table where product like "%widgets%" group by date, seller;
does that help? Not sure if it at all works but that is general syntax
did it, but it sums all products, into one.
I want to sum per product
say i have
2x product 1 and 5x product 2 in date y
I want the sum per each product in that day
then you need to add product in the group by group by date, seller, product
where do i add the group by in the syntax ?
it wil be at the end always
hey guys, what's the best channel to share open-source packages/projects made for postgres+python?
:incoming_envelope: :ok_hand: applied timeout to @soft basin until <t:1690796951:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).
The <@&831776746206265384> have been alerted for review.
Hi
II have problem with mysql connector python i use query
e = cursor.execute("SELECT * FROM userbalance WHERE ip = '127.0.0.1'");
but it returns None even though i can see there are rows in gui app also query works in gui app
What do you get when you remove the where clause?
the cursor itself shouldnt return anything after executing the query, you have to follow up with .fetchone() or .fetchall() to retrieve the results
(technically it can return something but the DB-API 2.0 spec doesn't require it)
hey guys. I'm currently completing an assignment in which I create a Movie Theatre Booking System, we have to allow the system to Receive and Send information to a CSV file. The CSV file needs to store the booked seats(with name, date ETC). It must be able when opened to load the CSVs layout (booked seats and non booked)
would that be considered under databases?
probably not right
Right, a CSV is not a database. If you have a question about the assignment though, see #❓|how-to-get-help
CSV is very dumb database. Totally not scalable and not maintainable. Usually used by students and by data scientists for data sets. Good for data sets exports/imports
A normal backend developer would have used SQLite
Or if he wishes to go for dumb solution... Well file in JSON format sometimes can work too and kind of easier to read than CSV. It will take more space tough, but we would not go for large size usage in JSON anyway. We would use SQLite or postgres for normal stuff
It would be more true to say that CSV is acceptable exporting/importing format. Equally deserves to exist like JSON
Not exactly database, but can be used like one with a stretch 😄
Hey folks, I want to write a program where I can import a Table class and create the table. How would I do that? I tried this but it didn't work:
from tables_class_define_file import TableName
from sqlalchemy.orm import declarative_base
Base = declarative_base()
TableName(Base)
# Code related to DB & Schema creation
Base.metadata.create_all(bind=engine)
I want to have a separate file for Table defines and its related functions. Another file to create the DB, schema, tables & default values.
https://stackoverflow.com/questions/74623946/sqlalchemy-separation-of-table-classes-by-different-files
Found this but can someone give an example on how this is?
All of your python modules (files) should be executed (imported) in order for models to initialize
If you don't import them it's like they didn't exist at all
Also maybe this would be helpful https://gitlab.com/ThirVondukr/task-tracker/-/tree/main/src/db/models?ref_type=heads
I have imported TableName, yet table isn't created
Are you using ORM or Tables?
ORM, TableName is a class with Column definitions
I'm using Postgres btw
It should work 🤷
If you're planning to run this project in any "serious" environment you should use migrations btw
And not create_all
Oh ok, I'll check it out. Thanks!
This is going to be very hard to admit, but it's a significant hole in my CS understanding..
can someone explain to me what the difference is between sql, nosql, mysql, sqlite, mariadb, mongodb, etc?
my current understanding is that you have a database that uses sql to retrieve information. Is sql a "family" of languages that each have their own implementation specific to their database?
mysql, postgresql, sqlite, oracle sql, microsoft sql server, et alia are "relational databases" (sometimes called "RDBMS" = "relational database management system"). what exactly makes a database "relational" is a good question that i won't answer right now, but it's very useful to understand.
mariadb is a fork of mysql. that is, a separate company took the open-source mysql code and are now developing a separate product using that code.
sql is the standard programming language used to query and otherwise interact with relational databases.
nosql is a silly umbrella term used for non-relational databases, which generally use query languages that aren't sql, or are not quite standard sql.
mongodb is a "document database", which works differently from a relational database, has different tradeoffs, and does not use sql for queries.
And yes, while sql is a language with well defined standards, every database has its own sql variant; it’s mostly the same but there are idiosyncrasies.
SQL databases are family of relational databases kind of. Postgresql / Mysql / MariaDB / Sqlite3.
Good to use by default database engine, because it has higher quality by design because
- Database is in a good way possible to normalize =>
- It allows to store in a such way, that any unique data is stored is only once, without storing unnecessary amount of denormalized/copied data
- means least possible disk usage happens
- Data structure is typed! makes great safety and quality of your application by design
- Data structure is possible to migrate from state A to state B in how it looks, along side with all already stored data
- Allows sane transition of application versions into future, with easily updating database structure
- Relational databases having powerful quering engines, that allows to join any types of data and extract any complexity data results. That kind of makes a lot of backend logic thrown onto relatioal db.
TLDR relational database is number one choice for many reasons in general.
NoSQL:
- MongoDB is part of NoSQL family. Document type db to be specific
- There are many types of NoSQL databases -> Key Value storages, blob databases, different distributed databases, and around 10 more at least types.
- Each one noSQL database has its own specific usage case when it could be useful for super scaled application.
- But using it for fresh application from the start as main database? is very often wrong because they have serious some kind of limitations
- Nevertheless MongoDB is popular in MERN stack i think.
- Anyway, MongoDB is not enforcing data structure, not able to migrate application data, has very limited quering capabilities, rarely justified to be used. Useful for unstructured data, which by default is wrong to have unless u really know what you are doing and experienced already in relational databases. Nevertheless used as default database in MERN, we can only pity them 😅
hey, I'm trying to make a many to many association table, using this models.py and sqlalchemy
from sqlalchemy import Column, Integer, String, Numeric, Boolean, DateTime, Table, ForeignKey
from sqlalchemy.sql import func
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
cs_skin_case = Table('cs_skin_case', Base.metadata,
Column('cs_case_id', Integer, ForeignKey('cs_cases.id')),
Column('cs_skin_id', Integer, ForeignKey('cs_skins.id')))
class CS_SKINS(Base):
__tablename__ = 'CS_SKINS'
id = Column(Integer, primary_key=True)
cs_cases = relationship("CS_CASES", secondary = cs_skin_case, back_populates= "cs_skins")
class CS_CASES(Base):
__tablename__ = 'CS_CASES'
id = Column(Integer, primary_key=True)
skins = relationship("CS_SKINS", secondary = cs_skin_case, back_populates= "cs_cases")
but it's throwing this error
sqlalchemy.exc.NoForeignKeysError: Could not determine join condition between parent/child tables on relationship CS_SKINS.cs_cases - there are no foreign keys linking these tables via secondary table 'cs_skin_case'. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify 'primaryjoin' and 'secondaryjoin' expressions.```
when I try to append
```py
skin.cs_cases.append(case)
session.commit()
any ideia what's happening or how to solve this?
sqlalchemy is case sensitive
you're using uppercase in your tablenames but not in foreign keys
I am using mongodb
I try to connect mongodb atlas with mongodb compass and its connected but not showing database in local
Hi guys, I have a question. I want to save two different documents in two different collections. but the condition is either both of them are saved and if one of them fails due to any reason then other document should also not save in its collection. can it be possible in mongoose?
https://www.mongodb.com/docs/manual/core/transactions/
Transactions are available
ohk let me check. thanks
what should one do if session.abortTransaction() also throws an error in catch block in mongoose?
