#databases
1 messages · Page 3 of 1
you should really use query parameterisation rather than string formatting
yeah that makes sense, I wrote a lot of this when I was fresh to python (coding in general) and I'm now coming back to it
Is the reason for that mostly around preventing SQL injection?
in SQL dbs specifically yeah, pretty much just injection attacks
I would probably be better to use an ORM too, but sqlalchemy specifically could take some time to learn 
https://docs.sqlalchemy.org/en/14/
one thing to note though is some NoSQL Dbs actually use parameterised queries as optimisations, so generally always use them if they're available
In there much of a performance trade off of using an ORM?
There is, but you shouldn't really notice any
I do remember when writing a REST api using django for the same database it took some learning to understand why the queries were taking so long to execute
N+1 problems? 🙂
sqlalchemy also has this problem if you use ORM part and not Core, but it's pretty configurable
JOINs are pretty much the easiest way to destroy performance of your DB
which i think people don't entirely realise until they hit the hard limit where it's too late to change it
What would you use instead?
😅 Generally speaking, I dont use them :P
Most of my stuff is NoSQL wide columns now
The issue is slightly un avoidable with SQL, because most existed back when there was alot less data to handle
I think in some cases it would be better to use something like select in but it really depends on the data 🤔
but normalization is very heavily encourages within SQL, which unfortunately means most people have their 'hot' queries, spanning multiple tables, across potentially multiple scans. And if the query planner can't optimise it, you can potentially make a very expensive query without knowing.
My problem was I was trying to server the api to people that already expected a particular format, eg for some football match
the end user would expect a table with headers match_num, home_team, away_team, home_score, away_score, player_name, goals, assists, etc
I had one table for matches, one for teams and one for player_stats. I was joining the player stats for a particular match to the match and matching the teams across too
If that makes sense
Largely speaking they become the same behaviour in most DBs.
A good example is postgres, which arguably has probably the best query planner around. But in some cases if you join two tables + a sub query WHERE clause, it should be able just to execute the sub query, and cache it. But the planner gets confused and you end up with a plan cost of 5000 and go from 10ms to 100ms 😅
Can't remember the exact query unfortunately but doesn't take a whole lot in alot of applications with a reasonable amount of data.
I generally try to use selectinload with sqlalchemy unless it's a 1-1 relationship 🤔
not sure how alchemy builds those queries, although ig thats one trade off of an orm is sometimes you lack the transparency like that.
Most complex queries I had to write are just some aggregations and they have to do full scan anyway
It issues separate query with select ... where id in (id1, id2, id3, ...), like django's prefetch_related
It actually described in documentation afaik + you can simply print any query or enable echo on your engine
yeah ik you can echo it out, iirc if you have logging enabled in python by default it'll log it on the INFO level.
Also sentry does log sqlalchemy queries 🤔
I usually don't look at queries because I roughly know what query it would make, I might if I query something more complicated
A join between all these tables was what caused me the biggest issues
But if the api was serving a front end I think it'd be fine
Between all of them? 👀
Yeah basically the pre-existing R dataframe people were being given was in a format which included all the information row by row, so I was trying to replicate that
Mmmm those are some thiccc rows
I soon realised it was better to just serve a static csv for the people that can't use R, and then keep the database limited to api calls for each table needed as requiredfor when i eventually build a front end
So for example if I write something like
select(User)
.options(
joinedload(User.products).options(selectinload(Product.tags))
)
.where(
User.username == username,
Product.shipped_at > now() - timedelta(day=1)
)
I know how query would look like, because it's basically sql
tbh I know nothing about database creation, eg I dont fully understand the trade off in storing values that can be calculated vs calculating them at runtime essentially
eg margin between two teams is abs(Team 1 Score - Team 2 Score) but you could also store that
it's essentially storage vs compute time trade off. Unless your storage is slow in which case the compute is probably cheaper
Depends on the values, there are some things that you can easily store in database via generated as 🤔
Which django doesn't support 🙂
it's been a long time since i've touched Django, but I swear it had something similar
I can't find anything similar, maybe it can do that on python side, but i'd rather prefer db to do that
Also I just don't like django orm since it limits you a lot
Is there any other options for serving a restful API in python
FastAPI 🤔
FastAPI would probably be my go to if i did an API in python
does protocol buffer ever makes it's way into web dev? or is it only really used in service to service comms?
Well, there's also https://github.com/Neoteroi/BlackSheep but I don't know how I feel about dataclasses 🤔
not really a frontend - client end thing really
fair enough, it was my first exposure to API's so almost most familiar with it
Using MySQL 8, I'm attempting to get the entry in the table Session with the highest score, but when trying to also retrieve that entry's id, I'm greeted with an error. This is the query I'm attempting:
SELECT id, MAX(score) FROM `Session`;```
To reiterate, if I change the query to ``SELECT MAX(score) FROM \`Session\` ``, it works just fine
The error can be avoided by changing ``id`` to ``ANY_VALUE(id)`` but I feel like that shouldn't be necessary
? Did u turn on your pc 24*7 ?
Found it!
you have to sign up to read which then you have only 5 times to read as limit so better save all those links though
You can just:
SELECT id, score FROM `Session` ORDER BY score DESC LIMIT 1;
I was just looking at a stackoverflow explaining that, however, I do find it a bit silly since at this point, what is even the point of MAX?
I guess if you only need the highest value without any context behind it but I can't see any real use case for that
But, I'm just rambling at this point, thanks for the help
You can have more than one row which the same value. In my example you will only get one of the rows with the max value.
If there are multiple entries with the same score, it doesn't matter which one it picks, as long as it picks one of them instead of mixing values between both of them
SELECT id, score FROM `Session` WHERE score = (SELECT MAX(score) FROM `Session`);
will give you all, or use some window functions.
You seem to be quite knowledgeable about this, do you know of any resource where I can learn more about how MySQL (or SQL in general) gets interpreted to find the data as efficiently as possible?
Like, in the example you just gave, I would assume that it doesn't actually fetch data twice but actually just interprets it down to one more specific query
No, sorry. I learned on the job. And I think the example will scan the table twice, so not very efficient.
That does sounds pretty inefficient but I guess computers aren't exactly omniscient
anyone knows how to build games
I'm stuck with Postgres replication with docker
Master pg_hba.conf has this line
host replication replicator 172.22.0.0/16 trust
as well as the replicator user
CREATE ROLE replicator WITH LOGIN REPLICATION CONNECTION LIMIT 5 PASSWORD 'password';
And when I attempt to run pg_basebackup from slave, I'm getting this on master:
2022-08-17 15:02:09.008 UTC [41] FATAL: no pg_hba.conf entry for replication connection from host "172.22.0.4", user "replicator", no encryption
which means that the slave actually reaches it through common docker network
I've even tried adding the specific 172.22.0.4 host to master's pg_hba.conf, but it changed nothing
And pg_hba.conf changes are applied, I restart the containers every time, I even checked the file inside
What could be the problem here? I'm starting to lose it
It should, unless there's an index 🤔
With index you could also use order by + limit
The solution to ERROR 1524 (HY000): Plugin 'mysql_old_password' is not loaded that I used (if anyone else ever needs it): https://gist.github.com/vitorbritto/0555879fe4414d18569d
I scrolled down and found this in the comments for Apple Silicon devices:
brew remove mysql
rm -rf /opt/homebrew/var/mysql
brew install mysql
mysql.server start
mysql -uroot
Is there a particular reason I should grant the permission?
I believe this is from the PGAdmin app
i'm surprised that it made a separate user account, i assume that's for running the postgres server daemon itself?
maybe try it without the permission first and see if it works
Yeah I think you're right about it being for the daemon. I've restarted multiple times since installing it and haven't had any trouble accessing the db. I just wasn't sure if there was a really good reason to do this
otherwise this just seems like potentially a big security issue
Yeah, I have a NUC with a few self hosted service and code and DB
tornado
fastapi is probably the best choice nowadays imo
is there anyway to (assume that i have five datas - data1,data2,data3,data4,data5) if i delete data3 from database (data3's id=3) re id the other datas again (like data4 id=4 after delete operation data1 id=1, data2 id=2, data4 id = 3, data5 id = 4 because data3 deleted)
in sqlite3
python
technically it's possible but what's the use case? it seems like strange thing to do
class AddOrder(DataBase):
def __init__(self) -> None:
super().__init__()
#ADD
def add_user_tg_data(self, user_name: str, user_id: int):
self.cursor.execute(f"INSERT INTO orders (user_name, user_id) VALUES ('{user_name}', {user_id});")
def add_user_real_name(self, real_name: str):
self.cursor.execute(f"INSERT INTO orders (user_real_name) VALUES ('{real_name}');")
def add_user_phone_number(self, number: str):
self.cursor.execute(f"INSERT INTO orders (user_number) VALUES ('{number}');")
def add_bot_type(self, bot_type: str):
self.cursor.execute(f"INSERT INTO orders (bot_type) VALUES ('{bot_type}');")
def add_bot_funcs(self, bot_funcs: str):
self.cursor.execute(f"INSERT INTO orders (bot_funcs) VALUES ('{bot_funcs}');")
def add_bot_desc(self, bot_desc: str):
self.cursor.execute(f"INSERT INTO orders (bot_description) VALUES ('{bot_desc}');")
def add_deadline(self, time: str):
self.cursor.execute(f"INSERT INTO orders (order_time) VALUES ('{time}');")
async def last_step(message, state: FSMContext):
await state.update_data(time = message.text)
db = AddOrder()
data = await state.get_data()
db.add_bot_type(data["bot_type"])
db.add_user_tg_data(data["user_name"], data["user_id"])
db.add_bot_desc(data["desc"])
db.add_bot_funcs(data["funcs"])
db.add_user_real_name(data["name"])
db.add_user_phone_number(data["number"])
db.add_deadline(data["time"])
await message.answer("Все готово")
del db
``` main file
why this do this (photo)
hey guys, i'm trying to get the values back from a query using fetchall as a list but whenever i try to access the list i get an index out of range error. if i print mycursor.fetchall, it prints the values in list form though.
myresult = mycursor.fetchall()
print(myresult[9])
Same error if i run the code below:
myresult = list(mycursor.fetchall())
print(myresult[9])
I've tried different variations of accessing the list but they all seem to return the same index out of range error. So i'm using the simplest one for this example. I'm hoping to use the results of the query and insert them into entry fields. Thanks for the help.
why are you using [9]? "index out of range" means that there aren't 9 results
i'm trying to get a certain item in the list, [9]
how can i go about to get a specific item in it?
myresult is a list of rows
each element in the list is a tuple representing one row
if you want the 9th element from each row, you need to loop over myresult
field9s = [row[9] for row in myresult]
then field9s will be a list of all the 9th fields from all the results
if you only want one result, use fetchone which just returns a tuple, not a list of tuples
i actually need 3 results from it, so i can use:
~~fields = [row[9], row[10], row[11], for row in myresult]~~
? -- Nevermind, i guess i can use three seperate loops and assign them to variables. Thanks!
results = [row[9:11] for row in myresult]
or did you mean this?
field1 = []
field2 = []
field3 = []
for row in myresult:
field1.append(row[9])
field2.append(row[10])
field3.append(row[11])
I am currently testing to insert 600 million rows, mainly emails and meta data (2 integer fields) into mongodb (select to check if exists and an insert) but I am only reaching around 40 inserts/s and 40 queries/s the database is almost empty (40.000 docs). Any idea what is going wrong here?
are you using batch operations and asyncio (motor)?
I think even without batching it should be a lot faster? 🤔
Thats also what I think
it might also have to do with your computer, your network, and/or your mongodb configuration
I assume your mongodb is local?
Its running on the same server
so this is single-threaded, non-async (pymongo?), not using bulk insert, and connecting over localhost. right?
Currently localhost (mongo and insert script running on the same server) and pymongo (nothing fancy, 1 find_one, 1 insert_one), correct.
Wait, so you're not just inserting but also fetching at the same time? 🤔
yeah 40/sec sounds about right for that scenario if you're doing 1 find and 1 insert over and over. especially considering that this is probably not a super powerful machine (physical server? vps?)
40/sec sounds terrible. So terrible that I am buffled 😄

I am sure I did something fundamental wrong here?
seems like an issue with the index, first time user so not sure what got borked.
But still quite low for an empty database. Any recommendations what kind of database I should use for that workload?
You can try using bulk inserts first 
show us your code
also if this is some low-spec vps i would not be surprised at all about bad performance
@harsh pulsar Do you personally use nosql dbs? 🤔
i used mongodb at my last job
my current company uses it for something but nothing that i need to bother with
lookup = db.emails.find_one({'email':email})
if not lookup:
db.emails.insert_one({'email':email})
I do understand that bulk insert would be faster for the initial insert, but if I cannot go beyond 10k selects/edits/inserts per second, it is not viable to use in production. It would just take to long to parse 700 million entries.
Oh, that probably makes a full scan here on your db 
because email is probably not indexed
That's what you probably need
@harsh pulsar So you mostly work with SQL? 🤔
Still, use upsert
@paper flower , thanks let me read up on that (I used that before, but need to check if I can do this on sub arrays).
In general I don't mind switching if mongodb does not fit here. I am not invested into a db ecosystem for now. I just prepared all my data for batch processing.
I don't really have enough knowledge to recommend you a database, SQL/NoSQL have their tradeoffs, it all depends on what you need from that database and how much data you have
At the moment I have around 120 million rows and 5 references for each row (around 700 million total rows in sql and 120 million documents in nosql)
I can probably run a benchmark on my local machine if you're interested but it's mot powerful than your vps 🙂
If you already have your data stored why do you need to change anything?
Its in csv/pickle files
yes. but i'm a data scientist, not an application developer
Did you ever consider using json/jsonb in sql?
I think it might be useful in some cases 
i've done it and it works pretty well. i can't speak to how well it performs "at scale" compared to mongodb though
I think postgres scales well enough for 90% of the apps
in general i agree, but i wouldn't know about jsonb specifically
i mean, i wouldn't know about the scaling properties of json or jsonb operations in postgres
I will test postgres after this little insert. I also have yugabytedb on my list.
@torn sphinx what is your actual objective here anyway?
are you doing an initial data load for an application, or trying to benchmark the database?
You mean what my "project" is about?
I am trying to see how I can handle my amount of data, to fit it into a database in a reasonable amount of time.
How wide is your data/tables?
You mean the amount of columns? Sorry not so familiar with database lingo
"can handle" is vague, and if you just want to load a lot of data in bulk, then every modern database will have some bulk insert option that you should be using
I think you should be concerned about database size too 🤔
let me throw up a sql schema real quick
just brief description would do
int, int, string(64), etc...
- string(320), int, int <-- 120 million rows
- int(id), int(id) <-- 720 million rows mainly references/foreign keys
How much space do your raw files take? 🤔
Well, it should take around same space in sql db I think
What are your vps specs? (If you're using one)
8 vCPU, 32GB ram, 240GB ssd
Should be ok for database of that size I think
But its a cloud VPS so "adjustable".
You can try sql 🤔
Yeah will give it a try
I think the op/s can be multiplied by 4 if I use threads, looking at the current server load. So probably around 4000 op/s for now.
@torn sphinx With code like this it took me around 2 minutes to insert 1 million of rows, python seems to be the bottleneck here
engine = create_engine(
"postgresql://username:password@localhost:5432/database-name",
future=True,
# echo=True,
)
Session = sessionmaker(
future=True,
bind=engine,
)
Base = declarative_base()
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
email = Column(String(320), nullable=False)
some_int = Column(Integer, nullable=False)
def main():
with engine.begin() as con:
Base.metadata.drop_all(con)
Base.metadata.create_all(con)
with Session.begin() as session:
for _ in range(100):
for _ in range(10_000):
user = User(
email="".join(random.choices(string.ascii_letters, k=320)),
some_int=random.randint(1, 1_000_000)
)
session.add(user)
session.flush()
So you can use multiprocessing here for sure
8 processes, 11k ops/s
If it stays constant, will just take 12 hours 👀 to parse 20GB of my data.
Well, amount of inserts seems to be only 3k/s?
Yeah but that is fine, I care more about the total ops/s for this workload.
I'm not sure but I thnk you could get more speed with postgres here 🤔
I will try that next today, any recommendations about that?
Can't give any concrete recommendations, it should be pretty straightforward
Read your files, send data to your db
Batch your inserts 😉
Hello 👋, i have a fast API app, and I'm now planning my permissions system. I was thinking that a user could have roles, and roles could have permissions. From which one could calculate the permissions. Of the user.
This would then be 3 database queries for one single request a user makes.
I could also calculate the permissions once, and then embed them into the JWT, but then I would need to log users out whos permissions changed, in order for them to update their permissions.
Is there a Standart way to handle this? I tried googling, but it didn't yield the results I was looking for.
Please ping on reply
You could use oauth scopes
?
The inserts are slowly progressing 🙂
Slowly...
Try postgres 😉
I saw around 8k writes per second on my machine, which would be different on a vps
But I was using a single core (Ryzen 2700)
And was mainly CPU limited by python itself
any specific setup?
I remember tuning some parameters but I think default installation should do well too
My postgres instance doesn't break a sweat, I'm mainly limited by python
It takes a bit less time (100s, so ~10k inserts per second) when using sqlalchemy core but it would depend on your vps 🤔
A lot of time is spent on random.choices here 😅
I finally figured this out. Although the conf files were loaded correctly, they weren't applied because they were put into /etc/postgresql/ while postgres was applying the ones in /var/lib/postgresql/data/ Solved by adding the command to compose
command: postgres -c config_file=/etc/postgresql/postgresql.conf```
And by adding the path to hba in postgres.conf
```conf
hba_file = '/etc/postgresql/pg_hba.conf'```
Damn I've wasted so much time on this
@paper flower getting the row based on a indexed column is not expensive right? As long as the where clause element is indexed?
Or is the row data retrieval dependent also on index (after the row has been found)?
Usually indexes use b-tree, so its log(N) to get something by that column
async def register(ctx):
db = sqlite3.connect("userinfo.db")
cursor = db.cursor()
cursor.execute(f"INSERT INTO userinfo (userid, balance) VALUES ({ctx.author.id}, {0})")
db.commit()
cursor.close()
em=discord.Embed(title=f"You've successfully registered!", description=f"Now you can use currency commands :)")
await ctx.send(embed=em)```
How to get this thing working? database was already created but it just cant connect or insert?? this already wasted so much of my time :/
What's the error message?
What do you expect it to do?
How do you know it's not working?
im checking it using DB Browser for SQLite
when a user runs register command,
i want the bot to put their userid in the userid column,
and put 0 for balance column , next to their userid
Are you sure there's no errors? Also did you create your table?
So I would need to give new tokens once the permissions change?
Yes, your access tokens should be short-lived
Hmmm. Why should it be?
How would I handle long term logins then?
Refresh tokens 
The refresh token is long lived?
Oh. Yes you are right.
Do I send a new refresh token once it has been used?

you can use mariadb.connect() with a cursor
https://mariadb.com/resources/blog/how-to-connect-python-programs-to-mariadb/
Thanks
Hello. I was thinking of modifing a sql server database with a python script, what is the best way(module) to do so?
I've use it before, I thought it only worked with sqlite databases
Im using microsoft sql server as database managment system
oh nice, I'll give it a try. Thanks
I am trying to create a view with the query below:
SELECT *
FROM [1.1.1.1].database_name.Dbo.table_name
but unable to do so because getting incorrect syntax error and SSMS is deleting the brackets. On the other hand, I can run this query on a query window and see the data just fine. I tried to disable intellisense but that didn't help either.
ok so basically this is a .json file and say i want to update 6552254 under values and change what it is equal to which is "+0" right now to "+10" then save the json file can anyone help
What's the problem here? 🤔
i want to keep the indents
not make it on the same line
json.dump(..., indent=4)
Can someone tell some good module for mysql?
Heard of pymysql/mysqlclient/mysql-connector
Need something which supports pooling too ( instead of creating and closing each time, reuse previous ones)
I think majority of dbapi's should support that 🤔
Ik but need some good module
There's a lot on pypi 🗿
Need for API
And web development
sqlalchemy + any driver
For mysql?
Gonna try this thnx
I'm importing a .sql file to a DB instance on Google cloud - the file size is ~5gb
The import has been going for nearly 12hrs now.... this can't be right, can it?
Sounds bad... Did you do a query to see if any of the new data has been added?
I connected from my local machine using sql server mgmt and there's nothing
something looks to be occuring
ERROR: (gcloud.sql.connect) HTTPError 409: Operation failed because another operation was already in progress. Try your request after the current operation is complete.
getting that if I try to connect via the shell @fading patrol
What does this mean?
The database is locked (in use). This is a limitation of SQLite. You need to close the existing connection before you open a new one
Yep I already did, thanks for the help anyways.
can i work with databases in python?
Yes
can you recommend a module?
Depends on what database you want to use
just sql
MySQL or SQLite but i dont quite know which one to choose
Postgres is also a good one
you can start with sqlite since it's just a file and you don't have to install anything
And sqlite3 module is built-in: https://docs.python.org/3/library/sqlite3.html
i will check that out
im making a database which pulls info from the web so 'item:price' (about 10k lines) but ive got little experience/knowledge on datases what sort of database would be best suited and quickest to use?
trying to understand the difference between postgres, redis and mongodb just starts getting confusing 😄
As you can see from the constructor of the User model, the created_at has the default value 2020. Later on when I create some default users and leave the created_at as is (without specifying it), the user table doesnt store 2020 at the created_at column
For quick and easy, I like MongoDB. I've never tried Redis but Postgres is probably more complicated then you need
Hi, what is the Technical term for "Occurs when different values are stored in the associated data elements for the same atrribute"
Maybe you mean a concurrent update? 
imo mongodb is a trap. it's "quick and easy" for really small projects, but it becomes a huge pain in the ass for anything of medium scale, especially with multiple developers working on it. it's good for exactly one purpose: high volume of writes of data for which you do not know the schema or for which there is no consistent schema. that's it. otherwise use a relational database, the minor setup complexity is worth it.
and sqlite requires basically no setup. if you don't need high concurrency writes and don't need to run the database over a network, just use sqlite and ignore all other options
postgresql is a relational database. redis is a "key-value store", not suitable for regular application data storage. mongodb is a "document database" which lets you store blobs of json without any additional structure imposed.
if you are just storing item id's with their prices, i suggest creating a sqlite database with a table item_prices with two columns, item_id and price
there are a bunch of resources in the pinned messages for this channel that will help you learn how to use sqlite and sql
great meme from jordan lewis (cockroach DB)
i like how NULL is both above the water line and below the abyss
literally at the bottom haha
i don't know why denormalization is in the 4th level though
belongs in the 2nd or 3rd at most. it's a pretty fundamental technique, definitely more fundamental than GROUP BY CUBE
maybe its because its coming from a traditional DBA point of view, where denormalization is heresy, while columnar DBs optimize for denorm.
and utf8mb4 definitely also needs to be higher up, it's obnoxious but it's not exactly uncommon to run into it

ehhhh that's a rationalization 😛 the rest of this is great and hilarious

TIL about "allballs" https://www.postgresql.org/message-id/20050124200645.GA6126%40winnie.fuhr.org
hahahaha the explanation is hilarious
you know something is obscure when its only reference online is a 2005 mailing list thread
Tbh idk
This is one of the questions from my Homework....
I agree, mongo is a trap 😅
yes and then if you need to analyze any of that data later...🕯️
What is the form of your data? Uhh ehh...
Any help
I'm not reading a blurry photo. Please paste formatted code
And yet many people use tools like https://art049.github.io/odmantic/ or similar, what's the point? 😅
AsyncIO MongoDB ODM (Object Document Mapper) using python type hinting
there isn't, and if you're using odmantic you probably fucked up and should have used a relational database from the beginning
the one thing that's nice about mongo is that it's trivially easy to "inline" subdocuments that would typically end up as separate tables
which is 99% of the time a premature optimization of course
but it is faster to look up and get data from a document like {"id": 1234, "tags": ["a", "b", "c"]} than it is to do a JOIN tags ON user.id = tags.user_id
of course if you need to actually get any info about those tags, you're back to doing joins (except shittier) using $lookup
or you just inline the whole damn tag object, but then you basically need to have the foresight of a god in order to design a schema like that
unless you take advantage of the other alluring aspect of mongo: it's trivially easy to add fields without doing a migration to add a bunch of "foo": null to the existing data
Well, I WOULD be comfortable throwing some data (even structured data) into mongo if I just have a lot of it
Any critical data? Nah
all that is to say, mongo is an extremely alluring trap for a small group of devs working on a relatively new application where maybe requirements aren't so clear but there is a concern from day 1 about request latency
so it's tempting to say "no joins, no migrations? sign me up!"
might i go so far as to suggest that mongodb is the ruby of databases
SQL is fine for 90% of apps, you'd never have enough data to need another server, and migrations are easy to deal with
(jk thats not fair to ruby)
(people probably were saying stuff like this about python back in the 90s)
You don't even need to do any joins yourself, most people use ORMs 🤨
@paper flower one more newbie question, in mongodb I will not be able to fetch a document quicker if all the document content is indexed, right? It is enough if the lookup key is indexed? Once it found the lookup key, it will read from ram (working set, if I am lucky) or read directly from disk right?
I will play with postgres today
If you're looking it up by certain key you only need that key to be indexed
_id is indexed by default afaik
The way I understood it the index just "points" to a specific byte position in a big file that contains that document, right?
It depends on the implementation I think 
But generally it works like that
So technically if all my indexes fit in RAM but only a super small portion into the "working set", if I have a disk only being able to handle 1 byte/s and my document is 10 byte/s it will take 10 seconds to get my document?
Mongodb stores indexes in ram (where it can I think, SQL dbs do the same), so searching by it would be pretty fast, if your documents aren't cached in ram they would be fetched from disk
@paper flower can you recommend me an online resource to calculate index storage size for postgres?
I don't really know any 🙂
class User(db.Model):
id = db.Column(db.Integer, primary_key = True)
name = db.Column(db.String(30), unique = True)
password = db.Column(db.String(30))
bio = db.Column(db.String(30))
created_at = db.Column(db.Integer)
def __init__(self, name: str, password: str, bio: str, created_at: int = 2020) -> None:
self.name = name
self.password = password
self.bio = bio
self.created_atv= created_at
You don't need custom init with sqlalchemy 
Worked for me.
!e
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///:memory:'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
db = SQLAlchemy(app)
class User(db.Model):
id = db.Column(db.Integer, primary_key = True)
name = db.Column(db.String(30), unique = True)
password = db.Column(db.String(30))
bio = db.Column(db.String(30))
created_at = db.Column(db.Integer)
def __init__(self, name: str, password: str, bio: str = None, created_at: int = 2020) -> None:
self.name = name
self.password = password
self.bio = bio
self.created_at = created_at
db.create_all()
db.session.add(User("Name_1", "Pass_1"))
db.session.add(User("Name_2", "Pass_2", created_at = 2022))
db.session.commit()
for user in User.query.all():
print(f"{user.id = }, {user.name = }, {user.created_at = }")
user.id = 1, user.name = 'Name_1', user.created_at = 2020
user.id = 2, user.name = 'Name_2', user.created_at = 2022
But as the doctor said, you normally don't overwrite the __init__ function. You could just do:
created_at = db.Column(db.Integer, default = 2020)
***Hey! Does anyone know how to make a flask api?
i want to be able to send a request to the api when i have it hosted.
like this example.com/key/{code}
and it checks if the code is in a txt document stored on my pc. if its not i want it to respond in json. Failed Attempt
and if it works i want it to display Success. ***
Alright thx. Idk why it doesnt work ill look into it
Yes this is relatively simple and there is loads of info online. Maybe start here: https://pythonbasics.org/flask-rest-api/
!rules 9
@hasty magnet no
okay sorry
Although when i create an instancec i get a type error. Default jnit only acdepts hte instance(self)
You'll need to use the keywords if you use the standard init, eg: admin = User(name = "admin")
Yes.
@flint salmon
class User(db.Model):
id = db.Column(db.Integer, primary_key = True)
name = db.Column(db.String(30), unique = True)
password = db.Column(db.String(30))
bio = db.Column(db.String(30))
created_at = db.Column(db.Integer)
user = User(name="SomeName", password="...")
Should work
You also need to specify nullable=False if you won't want them to be, well, nullable
Passed default at created at still doesnt work though.
Got a unique constraint fail error
Although none of the usernames are th same
Created at suddenly works
Because there probably was no problem with it
Im getting a integrity error, user.name has a unique check but failed ofr some reason
Because there's user with same name in your db
Is there a way to bind an engine to a SQLAlchemy object after it's been initialized. I'm trying to implement a multi-file MVC scheme but am lazy loading the alchemy object
# project/app/databases/app_db.py
from flask_sqlalchemy import SQLAlchemy
engine = create_engine(...) # contains the connection information
db = SQLAlchemy() # notice no `app` passed in
db.engine = engine # ???
My models import db to leverage db.Model:
# project/app/models/Customer.py
from app.databases.app_db import db
class Customer(db.Model):
...
# project/app/models/Store.py
from app.databases.app_db import db
class Store(db.Model):
...
In top level file:
# project/app.py
from app.databases.app_db import db
app = Flask(__name__)
app.config.from_object('config') # config.py includes non-db configurations
db.init_app(app) # attempts to connect to sqlite and not my default
If you want you can use sqlalchemy directly, without flask-sqlalchemy wrapper 🤔
But with flask_sqlalchemy your config (including connection string) is provided in your app.config iirc
In postgres using sqlalchemy, if you know the string type column is less than 100 characters but it varies. Will you use String type or String(100)? Which is better in terms of storage and search?
Strings in postgres would take as much space as they need, so 20 characters for example won't take all the space
@prime chasm If you try storing larger string you'll get an exception 
Right. That's what I'm looking at to debug. I have to say I'm a bit of a Python novice and am unsure how I can take the engine from create_engine() and integrate it back into the sqlalchemy object created from invoking SQLAlchemy(). Is there a way? To highlight my ignorance, assignment (i.e. db.engine = create_engine()) doesn't seem to work.
Again, I didn't use flask_sqlalchemy, but I use sqlalchemy itself often
I can help you to set it up
Thank you. How would you propose? I'm guessing I may need to publish an example repo to demonstrate what I have so far.
LibreOffice and I’m assuming excel uses , as a separator for functions and array columns. So I'm using ,, to separate values in each field, (like a list of numbers). Would seeing this annoy you or is it not a big deal?
Can you expand a bit more on what problem you are trying to solve?
Definitely. When i opened my csv file in LibreOffice Calc, the fields made up of numbers separated by commas are read as formulas I guess?
For example this field 125,126,133,163,131,139,132 turns into 1.25126133163131E+020
I wanted to remove the comma as a seperator in the settings but for some reason I cant.
when you open a csv file, you can configure the separator. It's also common to wrap fields with , with a "" as well to avoid these problems
well, you do need a separator
Using "" will solve my problem thanks. I didn't originally have a problem exactly, I just feel uncomfortable changing the separator characters to whatever I want and wondered if there was an accepted set of characters for each case.
Besides the obvious comma to separate values.
I actually changed the default separator in most of my tools to | so I could use a comma in a list of numbers. But now idk if I should use that. Just in case someone else looks at my spreadsheet and says "why are you using this instead of this?"
It's called Comma Separated Values for a reason :p
Tabs are also another commonly used separator
thanks im just going to do what everyone else does instead of making random changes lol
CSVs are typically used as an interchange format, not a working format.
To that end, I would keep the CSV as simple and stupid as I can and then leverage the formatting options in libreoffice and use its native format
i personally do think it's a good idea to us a different character for the "internal" separator
often i've seen things like using tab \t to separate fields, and comma , to separate values within fields
or pipe | and comma , in some configuration. or semicolons ; or even colons : in some unix-ey cli programs
I kinda hate having options it makes my indecision worse.
But i do like the way pipe looks
pipe is good as the "primary" field separator imo because it looks like table columns
a|b|1,2,3|x
c|d|4,5|y
I created a [non-functional] repo here to demonstrate the point. https://github.com/mike-usa/flask
It is influenced by:
- https://python.plainenglish.io/flask-crud-application-using-mvc-architecture-3b073271274f
- https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
- https://github.com/svieira/Budget-Manager/
The idea is for the app to eventually have models that use multiple databases. To reduce technical debt, I tried to keep those database configuration in the ./app/databases directory instead of ./config.py, ./init.py, or ./main.api.
This is primarily a demonstration, but to run:
git clone https://github.com/mike-usa/flask.git
cd flask
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
flask run
It should output the results of app config at main.py#10, demonstrating SQLALCHEMY_DATABASE_URI is not set.
Tag: @paper flower
I'm looking for help regarding JSONB querying in postgres with sqlalchemy when using jsonb_array_elements, posted here:
#help-falafel message
Anybody has an idea on how to turn this func into a JSONB?
Maybe you can use op method 
jsonb_array_elements(...).op("->>")("Field name goes here")
Oooooooh thanks, it works 😄
I wonder if there's a more "direct" way of writing that though, without ending up writing the operator ourselves
It feels like a pretty "standard" use-case for JSONB querying
Hmm, maybe this? data_table.c.data['some key'].astext == 'some value' 🤔
Also func.jsonb_array_elements() might have some custom methods too
This doesn't work when combined with the func unfortunately.
Sometimes there's a dialect-specific way of expressing those kinda of things though
Hey
I'm running a flask app on google cloud, but I'm having no luck at all using pyodbc which throws the following pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'SQL Server' : file not found (0) (SQLDriverConnect)")
so it's missing the necessary drivers. I ran a query to add pyodbc.drivers() which returned an empty list, confirming no drivers.
but it imports pyodbc as a package, as per my reuqirements file.
does anyone have any experience here?
Would it be better to create multiple db files or different table on same file for different types of things say for example a db of people and how much money they own and a database of a discord server and id of its general channel?
You should definitely create tables instead of dbs
Any specific reason?
That's how dbs are meant to be used 
Otherwise you won't be able to use foreign keys and joins to name a few
And you'd have to make separate connection if you want to use another table
i am going to use this database for my discord bot
so it'd have things like the welcome channel of each server and currency system for each user
Also would it be better to leave the db open all the time or just open use and close it everytime needed
I didn't use sqlite much but I think it's ok to keep it open since you're running a single process
what's a recommended DBMS for something large scale? it will mostly be handling reading/querying so I'm looking for something that's efficient and fast in that area
Depends on the scale and types of operations you want to do 🤔
well as i mentioned, it'll be pretty large scale and will be used for querying, a lot
consider more than 30 million entries
where i want the querying for it later to be as fast as possible
30m entries isn't much
true but it's my first time dealing with a dataset of this scale and im just looking for something that's fast in retrieving this info
Well, it depends on type of query you want to run anyway 🤔
If you're just selecting it by id then any rdbms will do
not sure if i have enough knowledge in this field but it'll mostly be searching for text in the database that (even partially) match my query text
it'll be returning the entire rows where the text matches
There are some indexes that support LIKE and ILIKE operations which you could try 🤔
PostGres/MSSQL Server/Oracle, any of them can do it but the server will have to be right sized
The type of operations matters. If you're doing simple aggregate compilations for terabytes-to-petabytes of data, you may need something like mapReduce, and there are dialects/platforms designed for breaking out operations across servers. If you're crunching time-series, it requires more time-oriented framing.
For most simple, straight-forward use cases; speed is going to really come down to your design & hardware; how much RAM, how fast your disk reads are, your CPU/GPU, etc. A good upper-midrange (and free) database is PostgreSQL.
In your case, PostgreSQL comes with a module called FTS (full text search), which means you'll have the ability to take a text-based document and the database will break it down, canonicalize the words, and build a relationship map between the words/phrases. That relationship model (think google, which yields a score for search terms) is important because you no longer are searching for exact string match. You'll be able to search for similar words ("pet" may reveal "cat" results) and words/phrases in different order (searching for "president kennedy" may reveal documents that have "John F. Kennedy was the 35th President of the United States")... with other potential features of misspelling forgiveness, etc all depending on your search algorithm and the data you feed it.
IP's are generally not good long-term data. Most are DHCP, which means they may change fairly frequently. Nefarious actors will more than likely have work arounds through private VPN channels or other traffic obfuscators. The question is if you want to go NoSQL or go relational. Only you know how the characteristics of your database, how much data, intended use, performance, QoS, etc.
You could stick them all in an array on the table or if you want to go relational:
- User: id <pk>, email, etc
- IP: user_id <fk><ck>, ip_address<ck>, date, etc
Consider:
- Different countries have different privacy laws. When storing this information, be sure you're adhering to your userbase. For instance, Europe has fairly restrictive laws about storing user information, including IP addresses.
- Web traffic (request, IP, etc) is generally stored in logs
that is actually really informational, appreciate it
Instead of using an array you can just create 2 or more records in your Ip table 🤔
Also I don't think you can insert into two tables using a single statement, but they can be in the same transaction
Do you need full text search or just partial matching? 🤔
Because full text search depends on a specific language you're using for example
The array was for a single table (User) with IPs field, or as a NoSQL approach -- I generally detest arrays for any searchable content as conceptually, they're like storing a table inside a table, w/ tradeoffs.
I think if they want to search by it and/or attach additional information then storing it as a separate table is better
e.g.
class LoginAttempts:
created_at: datetime
user: User
ip: str
...
Exactly. Anything other than data regurgitation
how do you make those color fonts?
!code
ohhhhh
I see
!e
print("hello world")
Is there module or smthn that allows you to make fancy fonts like
@paper flower
Could i use sqllite to automitcally update with webhoooks from a webapi? or do ihave to get the json data first and then update sqlite?
Well, webhook in most cases assumes receiving data into some sort of API, usually in JSON format
right, but can sqlite receive data directly like this? right now im using the requests library to get the json data first and then i send it to the database.
im wondering if i can just directly cut out requests in the first place
As I said, you can't use sqlite directly:
- It's not secure to give full access to your db
- Sqlite doesn't have capabilities to connect to remote machines
In most cases some sort of HTTP based API serves as a "glue" here
And webhooks usually assume using http + json, so no direct db connection
When building a website, should you keep track of how many people/users viewed each page?
Website is for internal use.
It doesn't hurt to use Google Analytics or Statcounter if you think that data will be useful to know, and it probably is
Would a big company be in favor of using Google Analytics for internal tooling systems?
I dunno, why not?
It's an internal tool. I dont know how comfortable a big company would be giving away internal tooling info out to Google.
So, when im connecting to my postgresql server in python, is there a way to do
conn = psycopg2.connect(login_info)
without actually putting the actually login info there? (for pushing to github)
.env files maybe
But you probably don't want it to be cloned by anybody but you either
.env files, and use something like python-dotenv to load and read it. make sure that your .env file is on your .gitignore if you are pushing to a git repo
Oh, right, .gitignore exists
Depends on how the stakeholders / your company feels about it. If there are no strict measures in place for how data is handled transferred outside the business then you probably can use it.
If you want to build your own internal simple tracking for just page views then that’s also possible and implementation would not be very difficult.
I am using PySpark, trying to filter only the maximum records for each country. So far I have the dataframe grouped by country and sorted by cat_sums (sum of that category for that country):
+----------------+--------+----------+
|product_category|cat_sums| country|
+----------------+--------+----------+
| Gear| 1090| Angola|
| Weapons| 581| Angola|
| Armor| 315| Angola|
| Armor| 49735| Argentina|
| Gear| 16319| Argentina|
| Weapons| 6349| Argentina|
| Armor| 13835| Australia|
| Gear| 947| Australia|
| Weapons| 838| Australia|
| Gear| 337| Austria|
| Weapons| 250| Austria|
| Armor| 64045|Bangladesh|
| Gear| 12353|Bangladesh|
| Weapons| 4542|Bangladesh|
| Weapons| 1050| Bolivia|
| Gear| 1014| Bolivia|
| Armor| 740| Bolivia|
| Armor| 92635| Brazil|
| Gear| 20791| Brazil|
| Weapons| 18330| Brazil|
+----------------+--------+----------+
only showing top 20 rows
How can I filter this so it only contains the maximum category for each country?
There are about 300 more countries iirc
This will be the same thing as the first row for each country.
I can bullshit this with modular arithmetic (maybe) but I'm going to be in trouble
I am having a lot of trouble running MAX() on a dataframe column and returning the entire record.
Please help?
hi if anyone here has experience with prisma could help me in #help-pear
Hello, I'm not sure how you solve this in python, but in sql you could write something like this:
select distinct on (countries.country) * from countries
order by cat_sums desc;
Which would essentially do following things:
- Order whole collection by
cat_sums - Select rows with
countrythat wasn't selected yet
I think you can do the same in python using set
pySpark uses sql queries and I have experience with mySQL and said queries in PySpark. I have not seen this format before though
Sadly distinct on is a postgresql specific construct and I'm not sure if it works with pyspark
But it can be replaced with CTEs
I think countries.country would be <table/view name>.country
Yep
Or just country would do too
lemme try something here
no, I couldn't really make that one work
This is what generated the table
*the dataframe
df_sum_of_categories = spark.sql('SELECT product_category, SUM(totals) AS cat_sums, country FROM category_totals GROUP BY country, product_category ORDER BY country, cat_sums DESC')
@paper flower
I've never seen it and it seems ambiguous
Better than using CTEs/joins 
You're using mysql, correct?
Or... What db are you querying?
it's one that another group made with fake online store data
15000 records
this one only has about 300
But what rdbms is it?
*900
Like postgres, mysql, etc
ok
@boreal oak Can I get some data? Either a dump or csv 
Pastebin it
Not the best idea
Pastebin was elite for sending large quantity of code, especially whole databases
and it formatted it perfectly
database != code
what?
Oh btw im new to python so dont scrutinise my lack of lingo/terminology when it comes to python
Databases are not code
ik
Yep, also dumps can be huge, talking gigabytes+
but isnt pastebin a dump
You can't use pastebin unless you store your data in a textual file format
Which is not great
Storage wise
No, it's mostly used for sharing code/text and not in such large quantities
That was my point from the get go
Oh
that we are parsing with pySpark
gimme
this is why you understand convo before you speak ig
it's made up data
But they need a category too
Hey @boreal oak!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
Does this not work
SELECT * FROM countries
GROUP BY country
HAVING MAX(cat_sums)
it's not called countries
guys
cmon
it would be called "table" by default
in sql
I don't know, I used the other person's table name
select *
from countries
inner join (
select max(cat_sums) as max_sum, country
from countries
group by country
) as max_categories
on max_categories.max_sum = countries.cat_sums
and max_categories.country = countries.country;
Maybe something like this
Could use a cte too if that's more readable
What is a CTE
common table expression
This uses cte:
with max_categories as (
select max(cat_sums) as max_sum, country
from countries
group by country
)
select *
from countries
inner join max_categories
on max_categories.max_sum = countries.cat_sums
and max_categories.country = countries.country;
Oh this probably fetches you only one record
Hi, I'm trying to use FastAPI with PostgresSQL using SQLAlchemy, I have a database with a column:
analysis_id = Column(UUID(as_uuid=True), primary_key=True, index=True, default=uuid.uuid4)
However, when I try to create an entry, it says:
`sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: character varying = uuid
LINE 3: WHERE analyses.analysis_id = '6f511b59-7b4b-42f5-8cee-c9c0e2...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
[SQL: SELECT analyses.analysis_id, analyses.started_timestamp, analyses.last_updated_timestamp, analyses.status, analyses.status_detail_messege, analyses.percentage
FROM analyses
WHERE analyses.analysis_id = %(pk_1)s]
[parameters: {'pk_1': UUID('6f511b59-7b4b-42f5-8cee-c9c0e29f538e')}]
(Background on this error at: https://sqlalche.me/e/14/f405)`
Any leads?
Could also be written as this but it uses subquery 
select * from countries
where (
select max(c_inner.cat_sums) = countries.cat_sums
from countries as c_inner
where c_inner.country = countries.country
)
How did you generate your schema/db?
this didn't work
class Analysis(Base):
__tablename__ = "analyses"
analysis_id = Column(UUID(as_uuid=True), primary_key=True, index=True, default=uuid.uuid4)
started_timestamp = Column(DateTime, default=func.now())
last_updated_timestamp = Column(DateTime, default=func.now())
status = Column(String)
status_detail_messege = Column(String)
percentage = Column(Integer)
class Config:
orm_mode = True # Make compatible with ORM's
#################################################################################################################
#
# 1b) What is the top selling category of items per country?
#
#################################################################################################################
df_category_totals = df_typecast.select('product_category', (df_typecast.qty*df_typecast.price).alias('totals'), 'country')
df_category_totals.createOrReplaceTempView('category_totals')
df_sum_of_categories = spark.sql('SELECT product_category, SUM(totals) AS cat_sums, country FROM category_totals GROUP BY country, product_category ORDER BY country, cat_sums DESC')
df_sum_of_categories.show()
columns = ['product_category', 'gross', 'country']
listed_soc = df_sum_of_categories.collect()
grossest_list = []
for i in range(len(listed_soc)):
if i%3 == 0:
grossest_list.append(listed_soc[i])
del listed_soc
df_sorted = spark.createDataFrame(data=tuple(grossest_list), schema = columns)
df_sorted.orderBy('gross')
df_sorted.show(100)
# in console type spark-submit pythonfilename.py
spark.stop()
this gives this result
sqlmodel?
It's an existing postgres table
I think id in your existing table is declared as varchar then? 
I delcared it as a uuid though
@boreal oak I've sent you 3 potential solutions, could you try them?
you can see that even though I said to sort by 'gross', it's sorted haphazardly
did you run those on the data?
on very small sample, yes
oh, ok
@paper flower i.e i created it in a sqlscript using analysis_id uuid PRIMARY KEY
Can you share your python query?
Also are you using sqlmodel or sqlalchemy?
Sure, pretty simple -
def create_analysis(db: Session, analysis: Analysis):
db.add(analysis)
db.commit()
db.refresh(analysis)
return analysis
I see, how did you create analysis model here? Maybe you passed string into it's id, not sure if it matters 
I was able to use strings in place of uuids in postgres, so, that's weird
I tried at first passing uuid.uuid4() object, didn't work - afterwards I just didn't pass and used the default I defined, but still the same problem
That it cannot cast between string <-> uuid
Although both are defined as uuid 😦
what is c_inner
alias for a table so I can compare their columns
do I need to create a temporary view called 'c_inner' from something?
I still think that your id on db side might be declared as string, I just don't see another explanation here
Check this out though -
This is taken from the db
🤷♂️
you used 2 tables to do this query?
Seems like even if analysis_id was a string you still should be able to compare it with uuids
No
What is [PK] in the uuid name?
Primary Key
You should try generating your tables via alembic
@hard sandal I have an example fastapi setup with migrations, you could try using it https://gitlab.com/ThirVondukr/fastapi-cookiecutter
what is the difference between c_inner.cat_sums and countries.cat_sums?
Well, I'm making an inner query to the same table
To be able to address correct table (since they have same name) I should alias one of them
so c_inner is countries_inner 
where did you assign the alias?
as <something>
from countries as c_inner
you said you ran this query on the data?
On very very small sample, yes
can you copy-paste that setup, please?
in its entirety?
so I can follow the nomenclature and make adjustments
I just have a small table:
Gear,1090,Angola
Weapons,581,Angola
Armor,315,Angola
Armor,49735,Argentina
Gear,16319,Argentina
table name (countries) doesn't matter, you can change it as you wish
?
can I please see the complete query?
It's just a table
create table countries (
product_category varchar not null,
cat_sums varchar not null,
country varchar not null
);
@paper flower Before i try using Alembic for my DB migration, do you have another suspects? I'm confused, I'm certain now that the entry in the DB is indeed of type uuid
I really have no idea 😵💫
You probably should use alembic anyway, It's good
It's my first project using databases & fastapi, so I did it manually
You could try to copy my setup and remove everything you don't need
Perhaps I should delete this table and let SQLAlchemy create it for me? I noticed later that I didn't have to create it manually
Yep, but don't use sqlalchemy's create_tables
I currently use - models.Base.metadata.create_all(bind=engine)
Yep, don't do that
FastAPI framework, high performance, easy to learn, fast to code, ready for production
Alembic is a tool from same author that you can use to migrate your schema (add/remove tables,columns etc)
I think fastapi docs use it because it's an example 
I see, do you have a lightweight tutorial for adding alembic to your project?
I'm not sure where do you define your DB in the repo you sent
Mmm, I might actually write something like that 😅
install alembic and run alembic init
wow...so this ran, but it's jumbled:
(Just to clarify) - My use case is to create a db with known column types, and reach it using my apis
Order it i think
df_new = spark.sql('WITH max_categories AS ( \
SELECT MAX(cat_sums) as max_sum, country \
FROM category_totals \
GROUP BY country\
) \
SELECT * FROM category_totals \
INNER JOIN max_categories \
ON max_categories.max_sum = category_totals.cat_sums \
AND max_categories.country = category_totals.country;')
df_new.show()
add order by at the end 
@paper flower How do I init my alembic at a specific DB Schema? My DB has two schemas, one is populated that I don't want to touch named public, and one is a new black one that I created
so now when I initialized my alembic it did lots of stuff with the other schema that I don't want
You can attach models to specific schemas, I'm in the middle of writing a guide for you, so give me ~15 min 😉
Thanks! I'm trying to find how to do that in their documentaiton
Unless you'll add it in your guide 
Hey @paper flower!
It looks like you tried to attach file type(s) that we do not allow (.md). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
😦
Are you using Mongo compass?
nvm i got it
Hello everyone
any suggestions on where to start learning sqlite?
i'd like to connect my DS18S20 temp sensor to a SQL database on my VPS but have no experience with it
I have a raspberry pi
Just do a web search like "sqlite3 tutorial" and you'll see loads of beginner-friendly options
so
how
do
i
connect
oracle database
software
thing's
tables
and work on them
using
PYTHON
🙂
here are several ways
https://www.oracletutorial.com/python-oracle/connecting-to-oracle-database-in-python/
ok tq
How can I exclude the smallest and keep all the others on a sorted numerical column using t-sql?
YEARS
2021
2020
2016
2015
2013
2011
the query will return the following:
YEARS
2021
2020
2016
2015
2013
a bit of topic but does anyone know php mysql database
how to display phpmyadmin on a web page
do u have any resources I could look at for how to do this?
These are relevant https://pypi.org/project/python-dotenv/#getting-started
I have json file, wanted to store keys in one column, values in other column, how it can be possible
Json files don't have columns 
Do you mean to put them in the columns of a database table? This is trivial but exactly how depends on what db
Yes, i knew , it is a nested json file, so I wanted to store some keys and values in different columns of data frame
Don't worry done it. 😀
If you're storing multiple things/entities in single json file then I think it would better to use a db instead
Yes but, it was already a json file, that i got through api
Typo in passAngers?
Hello, I'm creating tests with pytest, I've created a mock DB using SQLAlchemy, I'm trying to create a test to use a get endpoint of my API to query the db, however, I need to populate the DB beforehand, I've tried using a python fixture, but it doesnt seem to get executed, code example:
SQLALCHEMY_DATABASE_URL = "sqlite:///./test.db"
engine = create_engine(
SQLALCHEMY_DATABASE_URL, connect_args={"check_same_thread": False}
)
TestingSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base.metadata.create_all(bind=engine)
@pytest.fixture
def mock_db():
db = TestingSessionLocal()
analysis = Analysis(
analysis_id='mock_id',
status=ApiStatus.Accepted,
status_detail_messege=ApiStatus.Accepted,
percentage=0
)
create_analysis(db, analysis)
@pytest.mark.usefixtures("mock_db")
def test_analysis_status(analysis_id="mock_id", locale="en"):
response = client.get(f'{ROUTE_PREFIX}/analysis/{analysis_id}/status?locale={locale}')
assert response.status_code == 202
assert response.status == ApiStatus.Accepted
But it looks like my fixture is not being invoked
Is there a recommended way in pytest to add an entry to a DB before exeucting the test?
I'm pretty sure It's being executed, you just didn't commit
any reccommendations for where i can look for datasets to practice sql?
Hey there so I'm trying to make ca calculator in python and tkinter an I've added almost everything that I've wanted except one thing; being able to add custom powers to equations, what I mean by custom powers is not just being able to do x to the power of 2, so you could do x to the power of y; you would choose both numbers, I've tried updating each label and updating the out put but I just can't figure out how to do it, please help me in #help-bread.
The function create_analysis() commits though:
def create_analysis(db: Session, analysis: Analysis):
db.add(analysis)
db.commit()
db.refresh(analysis)
return analysis
Ok I'd be very surprised if anybody could help, but does anyone know how to translate that query to SQLAlchemy?
SELECT
player->>'deck_code',
player->>'factions',
COUNT(*) as games,
COUNT(CASE WHEN (player->>'win')::boolean THEN 1 END) as wins
FROM lor_game,
jsonb_array_elements(data->'info'->'players') AS players(player)
WHERE data->'info'->>'game_version' = 'live-green-3-13-42'
GROUP BY
player->>'deck_code',
player->>'factions'
ORDER BY games DESC
It's pretty simple to write in SQL but:
- SQLAlchemy doesn't have good support for unpacking
JSONBarrays - Casting things is a mess with
JSONBas the SQLAlchemy API already changed 3 times for it...
Ok it seems .column_valued("elem") is what I was looking for
It helps, but it doesn't mean you never have to think about doing things the correct way. https://stackoverflow.com/questions/6501583/sqlalchemy-sql-injection
SELECT "meme_price"."id", "meme_price"."product_id", "meme_price"."floor_price", "meme_price"."creation_time" FROM "meme_price" WHERE ("meme_price"."product_id" = 121 AND "meme_price"."creation_time" > '2022-08-22T14:27:43.785750+00:00'::timestamptz AND "meme_price"."creation_time" < '2022-08-23T14:27:43.785735+00:00'::timestamptz) ORDER BY "meme_price"."creation_time" DESC LIMIT 1
I was trying to make some efficiency on this query. This query is used in one API call .../products/.
This query is taking 55.08 ms
Is there any possibility make query more efficient? I am not getting any way
Do i transfer sorting which is by ORDER_BY in query to Client side(Android)?
I don't know what DB you are using but Postgres is what I'm familiar with and I would start with EXPLAIN to check the query plan
You might need to add an index to more or one of those fields and see what works
I don't see anything blatantly inefficient there but I'm pretty much a noob
Check if you have indexes on relevant fields 
I'm getting this error after creating an elephantsql instance, any idea guys?
Thanks.
I think that no one will do it for free. I recommend you writing about some issues while coverting
Hi. Yes
Why do you use sorting with limit 1? Use MAX() instead. It can speed up this
Check login data
Who knows database replication. I needed help with database replication. I need a good resource to learn about database replication
I couldn't find a good source on the internet
There are too many issues to list tbh. SQLAlchemy is so far behind modern ORMs like Prisma it's not even funny.
Hey i have a trouble
I have installed mysql python connecor and my visual studio code detects the module but when i try to run it sends me that is not a package
What's the problem with sqlalchemy
I am facing problems on logic building.
How can i do a work in one time DB open and close instead of multiple call?
for hour in hours: r = ( Price.objects.filter( product__in=chart_products_ids, creation_time__gte=hours[i] ) .order_by("product__id", "-creation_time") .distinct("product__id") ) print(r)
I am thinking not to do Price.objects.filter for all loop
only do once then loop
Are you sure you installed it in the venv or whatever you're using? You could check with pip freeze
ok thanks
Yes, that's probably what you should do
@fading patrol Yes i have installed the packages but, what package is missing?
what do you mean by check the login data? What kind of login data?
I am not getting any logic to do
😦
Strange, no idea, did you do a web search with the error message?
Not sure what you mean... Select all and then loop through the results
First of all Price.objects.filter( product__in=chart_products_ids, )
Then loop over hours[i]
I don't know off the top of my head whether that's the right syntax for whatever ORM you're using but yes I think that's the right concept
Thank you for cooperating
You can probably group by hour 🤔
Though it's quite tricky to write in django orm
In query itself
oh only in one query call 😮
@paper flower but the time should be high if i use group_by in query
What do you mean?
because i have thousands data in query for each loop
so doing query for each time in a loop will call DB again
Grouping is a good suggestion because it doesn't require a loop at all, unless I'm missing something
what would be a good audit table format for association tables? Basically what we have here is association happening between 2 tables. Let's say Team and User table objects in many to many relationship.
Let me try
is it gonna be True if the table is exists or is it gonna be false if the table is exists?
How can I check the table is exists or not
this is what I'm trying to doing.
Your file is named mysql.py, that's the problem. Rename it to mysql_test.py or something.
can you check my sql code?
I don't know postgresql but I don't think your SQL will raise an exeption if not found. So looking at the code that will always return False, mno matter what.
Ok thanks
so then what do you think, how can I catch it? Like it is true or not?
You just need to fetchone() the result. An EXISTS select should always return a value true/false and just use that.
And use parameter binding, not f-strings.
for example:
test = cur.fetchone()
print(test) # true/false
?
why?
I cannot believe it, it works 🤣🤣🤣🤣
The result will be a tuple, you'll need test[0]
!sql-f-string
SQL & f-strings
Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.
Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.
For example, the sqlite3 package supports using ? as a placeholder:
query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)
Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.
See Also
• Extended Example with SQLite (search for "Instead, use the DB-API's parameter substitution")
• PEP-249 - A specification of how database libraries in Python should work
Thanks
Yes, avoid naming sourcecode files after package names, that doesn't work with python because you can import from another file in the same directory like that.
I get this error, the post_data has already existed in the db. The existing variable was false(that is why I tried to create a new table), but it exists. What do you think? @grim vault
this thing return False
@grim vault I think I got it
the upper case was wrong
because in the pgadmin everything is lowercase.
@grim vault Thanks I got it now.
Hi
What are some good uses for dealing with a SQLite database using python? Is it worth my time if I'm just running a SQL script here and there? Or is Python generally used for other purposes with a database?
if you need to store data, no matter which kind of data, it's good practice to use a database over just random files
SQLite is a reasonable solution for small/middle sized scripts if the only thing ever communicating with the database is that script, and is much easier (and depending on the scenario cheaper) to setup than anything bigger
I am copying data from oracle dB (source) to ms Sql (destination). I am using a pandas DF as the intermediate layer with Sqlalchemy. It's large tables, so I read select * from Sql in chunks. The source table has millions of rows. And the connection can be reset or interrupted. Is there a way for me to continue from the last iteration or do I need to drop the destination table every time? I want to understand the chunks part of SQLALCHEMY better.
I'm pretty sure that you should not be using python, specially not pandas, for that.
isn't there some more direct way? or at least an specialised tool for data transfer / importing from another database
Provably. Thanks for the heads up.
I use SQLite all the time with python. Just limit multiple read/write connections at a time to the same file. Other than that, it's been stable for me for years.
Hello
I’m new to databases
This is the method but I don’t understand it
Here’s the question and what I’ve already tried
Are you missing that part?
np
But Ty for trying to help
no worry, that was easy 🙂
What do you all think? I just got caught by the dropped support for postgres:. I happened to be in the middle of moving off Heroku (of course) and so it wasn't obvious where the problem was. https://github.com/sqlalchemy/sqlalchemy/issues/8436
Can someone recommend an API that follows semver? 😫
Hello I’ve made a discord bot
But there is one problem
I want to get the users argument to use it as a variable sort of thing but I need to know sql I only know the basics, and when I say basics I mean like the very beginning of sql
So can someone help
Learn sql?
Is it possible to make a database table through replit
Of course. Maybe this will help: https://replit.com/talk/learn/How-to-create-an-SQLite3-database-in-Python-3/15755
📂 How to create an SQLite3 database in Python 3 📂 PLEASE NOTE: This article assumes that you are familiar with importing modules in Python and SQL syntax/datatypes. If not, please read the following articles How to import modules in Python 3 SQL Datatypes SQL Syntax Knowledge of Python's class methods such as , and (Most commonly known one ...
mongo
Hi, I'm getting this error after I tried to connect to my database.
def __conn_singleton(self) -> Connection:
self.conn = ""
conf = self.conf
try:
self.conn = connect(
host=conf['host'],
dbname=conf['dbname'],
user=conf['user'],
password=conf['password'])
except errors.ConnectionDoesNotExist as e:
logger.exception(f'db connect failed {conf}')
raise e
except errors.OperationalError as e:
logger.exception(f'db connect failed {conf}')
raise e
return self.conn
def __initiate_database(self) -> None:
"""
Database initialization, add the tables or drop it.
"""
with self.__conn_singleton() as conn:
logger.info("Connecting to DB")
if self.conf['should_initialize_database']:
self.__initiate_database()
This is what I'm trying to do.
what should I do?
Thanks I’ll check it out
@sterile pelican where do i start, do you have a good video ?
Dites moi vos retours dans les commentaires et si vous avez eu des problèmes afin que nous puissions les résoudre ;) Vous pouvez également rejoindre mon serveur discord ou il sera plus facile de vous aider.
😎 Rejoignez la chaine pour débloquer des avantages et me supporter:
https://www.youtube.com/channel/UChDVo_Uqomuk7KnMVp-Lhhw/join
✅ CLIQUE...
@twilit arch check the pins of this channel
@sterile pelican how do i store a player that has an inventory and each item has stats for exemple ?
Store items in another table and for player use column with array of items IDs or smth
anyone can help?
I have been failed 😦 to use group by with the list of hours
Sadly django orm is hard to work with when doing any analytical queries
Including your case i think
🥲🥲
does django have a "union" operator? also isn't there a way to join conditions with or?
i was thinking join but can not get logic of syntax
what are you actually trying to do here anyway?
looks like django natively supports UNION on querysets https://stackoverflow.com/q/4411049/2954547
from functools import reduce
from operator import or_ # the "|" operator
qsets = []
for hour in hours:
qsets.append(
Price.objects
.filter(
product__in=chart_products_ids, creation_time__gte=hour
)
.order_by("product__id", "-creation_time")
.distinct("product__id")
)
qset_all_hours = reduce(or_, qsets)
also your code was wrong, you had =hours[i] but you meant =hour
I think they want to make a graph of some sorts? 🤔
I assume hours are adjacent
hm, i guess this is some kind of rolling filter then
You usually can use group by and aggregate whatever metric you want
how would you use group by for this?
select case when creation_time > hour1 then 'hour1' when creation_time > hour2 then 'hour2' ... else 'hourN' AS hour_min_group and group by hour_min_group?
even then i think you'd need group by grouping sets
maybe some kind of trickery with lateral or rolling joins, that stuff all makes my brain hurt
For example
select sum(sale.cost)
group by date_trunc('hour', sale.sold_at)
where sale.sold_at > :after
and sale.sold_at < :before
that's not the same as
select sum(sale.cost)
where sale.sold_at > @hour1
union
select sum(sale.cost)
where sale.sold_at > @hour2
union
...
which is more like what their code does (although idk if that's what they intended it to do)
I don't even fully understand their problem tbh 
Looking at the original post they want to make some sort of chart?
for hour in hours:
r = (
Price.objects.filter(
product__in=chart_products_ids, creation_time__gte=hours[i]
)
.order_by("product__id", "-creation_time")
.distinct("product__id")
)
Group by is perfectly fine for this
but that is literally the query
select distinct product.id
from price join product on price.product_id = product.id
where product.id in @chartProductsIds and price.creation_time >= @hour1
order by product.id, price.creation_time desc
union
select distinct product.id
from price join product on price.product_id = product.id
where product.id in @chartProductsIds and price.creation_time >= @hour2
order by product.id, price.creation_time desc
union
...
which you can't easily do with group by as far as i can tell
maybe you can do it with several joins onto the same table and aggregating the values to array if you're in postgres or some other dbms that supports arrays
Yeah, it's weird
or again, some kind of rolling or lateral monkey business
That query doesn't even have to use unions:
where product.creation_time >= :min_hour
they are pulling the list of distinct products in a table after each hour
each result will be a subset of the former, but they don't appear to be interested in just combining them like you said
perhaps my query should have been
select distinct product.id, @hour1 as min_hour
from price join product on price.product_id = product.id
where product.id in @chartProductsIds and price.creation_time >= @hour1
order by product.id, price.creation_time desc
union
select distinct product.id, @hour2 as min_hour
from price join product on price.product_id = product.id
where product.id in @chartProductsIds and price.creation_time >= @hour2
order by product.id, price.creation_time desc
union
...
i have a pandas dataframe that contain information about games
and user insertes ean
and idk how can i search for value in ean cloumn and then return whole row of the item
barcode name
1234 idk
5879 idk 1
5874 idk 2
and user inserts barcode and it should search barcode column and return entire row where is the value located
how can i do this
tanks
thanks
Yes right bro
Eactly. For each hour calculating latest price for multiple products
How to use this wirhout loop in django filter
🤧🤧🤧
I'm still struggling with this.
i showed you. you still need to write a loop, but you can combine querysets together with .union
i even linked to the docs
the queryset itself doesn't contain any data. so you can do the loop, combine the querysets together, and then start querying. the database will do its best to optimize
although i think this is something you might want to consider doing a different way... i just thought of how
@abstract socket you can do this an entirely different way.
first, make this query:
select product.id, min(price.creation_time) as first_creation_time
from price join product on price.product_id = product.id
group by product.id
order by first_creation_time desc
then, in python code (not in sql), walk through the items and start hours, collecting the results into lists as needed
hm, that logic is actually a bit tricky
took me a while to figure this out
!e ```python
from collections import defaultdict
query_results = [
('a', 10),
('b', 8),
('c', 7),
('d', 6),
('e', 4),
]
start_hours = [8, 6, 4, 2]
product_groups = defaultdict(list)
start_hours_rev = list(reversed(sorted(start_hours)))
start_hours_iter = iter(start_hours_rev)
start_hour = next(start_hours_iter)
for product_id, first_creation_time in query_results:
while first_creation_time < start_hour:
start_hour = next(start_hours_iter, None)
if start_hour is None:
break
product_groups[start_hour].append(product_id)
product_groups = dict(product_groups)
leftover_start_hours = list(start_hours_iter)
for start_hour in leftover_start_hours:
product_groups[start_hour] = []
for later_start_hour, earlier_start_hour in zip(start_hours_rev, start_hours_rev[1:]):
product_groups[earlier_start_hour].extend(product_groups[later_start_hour])
print(product_groups)
@harsh pulsar :white_check_mark: Your 3.11 eval job has completed with return code 0.
{8: ['a', 'b'], 6: ['c', 'd', 'a', 'b'], 4: ['e', 'c', 'd', 'a', 'b'], 2: ['e', 'c', 'd', 'a', 'b']}
this technique does all the grouping in a single pass over the group by output (here simulated with the query_results list), maintaining the start_hour as a kind of "cursor"
hey all, i need some help with my sql query
i have two tables:
TABLE 1 - Meals
Cols
Recipe_1_id Recipe_2_id meal_id
---------------
TABLE 2 - Recipes
Cols
Recipe_id name
if i wanted to get both names for recipe_1_id and recipe_2_id, should i make one query which joins both tables?
so that the output would be:
recipe_1_id recipe_1_name recipe_2_id recipe_2_name meal_id
or is it better to separate the queries, making one query to get a meal and another one to get the respective recipe names?
I think your design is kind of wrong here in the first place unless you want at most 2 recipes to be linked to a meal 
hm what would a better design be?
meals will usually be formed by 1-2 recipes
It looks like you need a many-to-many relationship
what would the columns look like in that case? sorry, it's been a while since i took databases
i assume we'd have a junction table for meals_recipes, linking each recipe with each meal right?
and then querying meals_recipes with the meal_id would output all the linked recipe_ids
i have a good idea on how to approach this problem now, thank you!
drop table if exists meal cascade ;
drop table if exists recipe cascade ;
drop table if exists meal__recipe cascade ;
create table meal
(
id serial primary key,
name varchar(250) not null
);
create table recipe
(
id serial primary key,
name varchar(250) not null
);
create table meal__recipe
(
meal_id integer references meal(id),
recipe_id integer references recipe(id),
primary key (meal_id, recipe_id)
);
insert into meal(name)
values ('Meal');
insert into recipe(name)
values ('Recipe1'), ('Recipe2'), ('Recipe3');
insert into meal__recipe(meal_id, recipe_id) values
(1, 1),
(1, 2),
(1, 3);
select meal.*, recipe.* from meal
join meal__recipe mr on meal.id = mr.meal_id
join recipe on mr.recipe_id = recipe.id;
meal.id;meal.name;recipe.id;recipe.name;
1,Meal,1,Recipe1
1,Meal,2,Recipe2
1,Meal,3,Recipe3
Here meal and recipe tables are linked via meal__recipe table
Now any amount of recipes could be linked to a meal and vice-versa
Yep
would it be okay to add additional columns to meal__recipe?
ah, no, inner should be default 
It depends
for instance if i'd like to specify quantity of each recipe, etc?
Yep, that's the use case I want to talk about
Some data might benefit from that, for example:
You can have a recipe and ingredient tables, and recipe__ingredient table linking them
recipe_ingerdient can contain data about that link - for example amount of that ingredient required by the recipe
Ingredient itself can contain name, nutrient values, etc
For example knowing amount of each ingredient added you can calculate nutritional value of a recipe
for instance, if i want to add a different ingredient quantity for each [meal, recipe] pair, i'd need to create one-to-many relationship between [meal,recipe] and ingredient?
I think ingredient should be linked to recipe but not to meal 🤔
hmm, but maybe i can have something like: "MEAL 001", "Pizza", "500g" and then it'd need to have different ingredient quantity values
not sure if im explaining myself properly haha
If recipe has different ingredients then it's a different recipe 
I serving size is a different you can add it to meal__recipe table
hmm not sure if i'm explaining myself well
like for instance, a meal could be formed by pizza (lets say the quantity is 350g based on this specific pair)
meal_recipes would have that meal instance in it
and so in order to link the right ingredient quantities for 350g, maybe i could have meal_recipe_ingredient like "MEAL 1", "PIZZA", "Cheese", "72.412g"
Yep, it would have meal and recipe ids, you can add additional info into that intermediary table, for example portion_size
If you need to calculate amount of ingredients later you can multiply thir amounts by meal__recipe.portion_size / recipe.portion_size
if i couldn't do that hypothetically speaking, would it be correct to create a junction table of meal_recipe_ingredient?
so if you have meal__recipe.portion_size = 200 and recipe.portion_size = 150 and your recipe requires 100g of cheese then portion of 200g would require 133.33g of cheese
I don't think it would be worth to duplicate data here
okay, thank you! i will have a look at common database designs,, definitely need a refresher on that
Hello, I'm using the SQLAlchemy ORM and I found a problem in my query; I'm storing tracks with info like its title and artist each time a user listens to it
When I query this data (namely, right now, to count how many times they've listened to each song), I am counting two songs with the same name by different artists as the same song. How can I make this func.count method count songs with the same name, but different artists, distinctly?
with Session.begin() as session:
user_id_query = (
session.query(User.id).filter_by(last_fm_user=lfm_user).subquery()
)
tracks: list[Scrobble] = (
session.query(Scrobble)
.filter_by(user_id=user_id_query.c.id)
.filter(Scrobble.unix_timestamp > after_unix_timestamp)
.filter(Scrobble.unix_timestamp < before_unix_timestamp)
.group_by(Scrobble.title)
.order_by(desc(func.count(Scrobble.title)))
.limit(num_tracks)
.all()
)
Scrobble is my table storing the songs and title and artist are columns in it that store the song's title and song's artist respectively
I think you can use a subquery here 🤔
Not sure if there are any better/more optimized options, but that would essentially require your query to do a full seq scan
I actually didn't pay attention to your query and didn't see you were using group_by here, but I still don't quite understand what exactly you want to query
I can try to explain it a little more thoroughly
Essentially what is a Scrobble? It's not a Song but a song listening "event"? 🤔
A scrobble is last.fm's verbage, it's pretty much an 'event', yeah
Each time you listen to a track at least halfway through they 'scrobble' it and pretty much store the timestamp you listened to it at along with the track itself, like artist/title/album of course
Doesn't it have a song id?
I'm not storing any IDs of the songs, just the track's title and artist which is enough to distinguish it - and that's the problem I'm having
for example, the most minimal reproduction would be this (imagine these are entries in my database):
Gasoline - Halsey
Gasoline - Halsey
Gasoline - Halsey
Gasoline - The Weeknd
I want Gasoline - Halsey to be first in the query result, with func.count() returning 3 for it. Gasoline - The Weeknd should have a count of 1 because the track was only listened to once
Can you share your scrobble model?
However I just get a count of 4 for "Gasoline"
class Scrobble(Base):
__tablename__ = "scrobble"
id = Column(Integer, primary_key=True)
title = Column(String, nullable=False)
artist = Column(String, nullable=False)
album = Column(String)
lfm_url = Column(String)
unix_timestamp = Column(Integer, nullable=False)
user_id = Column(Integer, ForeignKey("user_account.id"))
# each scrobble belongs to one single user
user = relationship("User", uselist=False)
forgive me if any of this is redundant or not best practice because this is my first time using SQLAlchemy but I set this specific table up a month or two ago
It looks fine, you should probably store time as datetime though
Also if there's song_id available it could make everything a bit easier
yeah that may make more sense
I'll check really quick if I can get song ids, I'm in the testing stages where it's not much of a problem to change my model so I suppose that may be the way to go
You can't select all data when using group by too, that's just how it works
You can only select aggregates and columns you use in group by, that's why song_id would be perfect here
Otherwise you could group by title, artist and album, and select these coulmns + number of scrobbles
I'm not going to lie I don't even know if my group by statement is necessary lol
It all depends on what you want to do 
Also it's looking like the API wrapper I'm using doesn't give song IDs which sort of makes sense, since I've gotten the impression that the platform is a bit weird and allows several 'versions'/listings of the same song. maybe something related to users being able to store listening data for songs that aren't on streaming services or something, idk
I see, group by it is then 🙃
I'll try to work with this for the time being
@dim cedar
scrobble_count = func.count("*").label("scrobble_count")
stmt = (
select(
Scrobble.title,
Scrobble.album,
Scrobble.artist,
scrobble_count,
)
.where(
Scrobble.user_id == user_id,
Scrobble.unix_timestamp > after_unix_timestamp,
Scrobble.unix_timestamp < before_unix_timestamp,
)
.group_by(
Scrobble.title,
Scrobble.album,
Scrobble.artist,
)
.order_by(scrobble_count.desc())
.limit(num_tracks)
)
You could write something like this 🤷♂️
You can't select Scrobble model itself because there's no id to reference a concrete row, just keep that in mind
I'm a little confused mainly because I'm not comfortable with SQL stuff yet, but also impressed that this worked
I just added the artist to the group_by line and it seems to be working now
.group_by(Scrobble.title, Scrobble.artist)
You shouldn't be able to select models when using group by
thank you very much for the help! seeing this example also helps because I hadn't seen the select method before but looks pretty similar to query, and your formatting is nicer to read
so session.query(SomeModel).group_by(SomeModel.non_pk_field) shouldn't work 
i wish i could explain why it worked but i've been at this for a while and my brain is a bit perplexed by what is going on 🥴
You should get result similar to this
[
('Black Static', 'VOL. 4 Slaves of Fear', 'HEALTH', 3),
('Flat on the Floor', 'The Long Road', 'Nickelback', 3),
('Slaves of Fear', 'VOL. 4 Slaves of Fear', 'HEALTH', 1),
]
What db are you using?
Yep, sqlite is kind of weird when it comes to that
Try postgres 😵💫
It's easy to install on any os
I'll check it out, I definitely wanted to learn a fancier/more commonly used database in production settings in the future. I went with SQLite for this project to start off with because I've used it in the past with the sqlite3 module, first project using SQLAlchemy though
I have a bad taste in my mouth from trying to set up some microsoft SQL server app but that was also quite a while ago
and was probably too much for my needs at the time
looks like an inner join to me, not an outer one.
Ah, didn't see that 👍
@grim vault That query should be fine with just group by, right? Since it lacks a different appropriate column like a song_id
Also sqlite is allowing them to select any field when using group by for some reason
I guess there's some kind of flag as that's the case with foreign keys
SQLite doesn't enforce you to use all non aggregate columns in the group by: https://www.sqlite.org/lang_select.html#bare_columns_in_an_aggregate_query
The ability to include bare columns in a query is an SQLite-specific extension.
any resources on how I can access my postgresql database remotely?
for me i mainly have been using Postgres on production for quite some time now. still works just fine for me
Anyone can help? What could be wrong?
So no idea how it would actually work with aggregates other than min/max 😃
Exciting
heyy, how do i save items from a player ?, i need to store multiples stats for my items
Whats the best way to store class objects in a database to call them after I restart the program?
Using Sqlmodel as my ORM - how do I remove this warning output in the Terminal:
/home/hlynge/dev/property/venv/lib/python3.8/site-packages/sqlmodel/orm/session.py:101: SAWarning: Class Select will not make use of SQL compilation caching as it does not set the 'inherit_cache' attribute to ``True``. This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions. Set this attribute to True if this object can make use of the cache key generated by the superclass. Alternatively, this attribute may be set to False which will disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
return super().execute( # type: ignore
200
I Googled it for you: https://www.netiq.com/documentation/identity-manager-47/setup_windows/data/connecting-to-a-remote-postgresql-database.html
But are you sure this is actually what you want to do? More than likely this is an unnecessary security risk
Really? I’m new to this, is there a way for multiple people to access a database from different places? And thanks!
Yes, best to put it behind a firewall and/or VPN, but if it's going to be totally public make sure the username and password are very complex
Ssh with private key would be fine. Depends how you're hosting
Someone can help? Or do you have any tips?
tmsg = await ctx.send(embed=embed, view=endutton())
await tmsg.add_reaction(":giveaways:")
f_msg = await ctx.channel.fetch_message(my_msg.id)
reaction = [r for r in f_msg.reactions if str(r) == ":giveaways:"][0]
sleeptime = duration.to_seconds()
await asyncio.sleep(sleeptime)
users = [user async for user in reaction.users() ]
``` how can i mention the `users` ?
simply use users.mention?
Hey, what as per you is the fastest file format to read and write in python? Or is there a way to interact with two scripts in different python versions?
await message.channel.send(f"{user.mention} is the best")
which kind of things do you have to store?
arbitrary python objects, unstructured data (e.g. json) or structured data (e.g. csv)?
i have constant rocket telemetry data
floats, vectors
structured I guess
maybe look into parquet
do you have to stream it in real time, or just write and read?
real time
yikes
hehe
maybe look into websockets or subprocess then
ok thanks for the advice. is there any way to like.. make a class in one script running python 3.10 and accessing one running python 2.7?
perhaps needless to say, it would be better to just update that 2.7 to 3.10 if possible
other than that, these might work
actuallyy the library i use doesnt work for py3
i think it would be easier to access the script and just run an update() function, but I cant figure out if it is even possible to make a class inside 3.10 and access 2.7
probably not - this isn't on topic for this channel at this point though
yes thank you though
