#databases
1 messages Β· Page 75 of 1
sqlite3..if you did not see that from the code in the link
Everything worked just fine until I got a error saying the same database is locked FROM ANOTHER COG......I will give that cog code,too
I always did the databases on a timeout(1 second) and I even tryed deleteing and recreating the database serversettings.db
A database cannot be accessed by 2 threads at once
You need to use something that can handle that for you
Also disconnecting/reconnecting from the database each time a command is executed isn't a great idea if you need good performances
@torn sphinx
I had that problem once, but now I use sqlachemy with a session factory and it's working just fine π
π€
bruh
so if I run a command that is connecting to the database#1 and THEN on_message happens and connects to the database #1, that equals ERROR, database locked?
Did I understand you correctly?
But But I tried just typing a message and same error happened..good thing I make frequent back_ups of my code
@crystal sapphire
Yeah this is the idea
Since discord.py is an async library, there is a probability that 2 parts of your code can access the database at the same time, causing the error
Β―_(γ)_/Β―
Hey guys it would mean a lot if someone who knew what they were doing would please help me
heres my post
@crystal sapphire I never heard/worked with sqlalchemy,any tips?
please feel free to comment ;//
I don't have anything special, if you need some help, here is my repo https://github.com/Renondedju/ISARTBot/tree/rework-2.0
Just keep in mind that this is a WIP so things might no work as it should here and there
but it should get you started
@alpine heath query = "UPDATE inventory SET name=?, type=?, vendor=?, stock=?. price=?, sellingprice=? WHERE id=?" should be query = "UPDATE inventory SET name=?, type=?, vendor=?, stock=?, price=?, sellingprice=? WHERE id=?" instead
you used a . instead of a , somwhere
Read the error tho, everything is written
sqlite3.OperationalError: near ".": syntax error
Bruh
I will go trough 5 databases until I found the perfect one
database types*
This is a nightmare
@torn sphinx here what the discord.py guys recommends:
https://www.postgresql.org/ - Widely used, open source database with good data integrity and feature set. The typical recommendation for bots.
https://www.mysql.com/ - The de facto SQL standard in software development. Usually not recommended here due to the decreased feature set in comparison to PostgreSQL, but strongly time-tested.
https://www.sqlite.org/index.html - File-based DB, no authentication. Lack of networking means query-by-query time is relatively fast, but tends to chug past a certain amount of load.
Back to REWRITING MY WHOLE DATABASE CODE ;-;
Thanks @crystal sapphire ..At least I am on the best way(to Hell) π

When you drink because of problem...and while you are drunk,the problem gets massivly bigger :'d
Over 3000 lines of code
in one to many relationship one contains foreign key and many contains the relationship code?
what do u mean by relationship code?
its easier to under stand this with an example, Lets say you have 2 tables User and another called post. A user can have many posts but a post can have one author, i.e a User.
So you will define a foregin key relationship on the User table from the post table
Hey ! I have a problem, I can't find how to edit an already existing data in a json file, can someone please help me ?
how do you go about designing a complex SQL database? do you use some kind of software to help the process?
I've never built as big a database as I'll have to in my current project and I don't even know where to start
Hey, im experiencing some issues about hosting a postgresql database on my new raspberry pi 4 4gb, the setup is fine and i can interact with the database locally from the pi. When trying to connect through pgAdmin on a different computer is where the errors come. The spesific error im having trouble with is this SSL error:
If there is more information i can provide to get help please let me know
Im going to sleep now so dont expect a answer right away :/
Which of these would be fastest
Update users set username =null where userid =2525
Replace?
Drop users where users =2525
@torn sphinx I use https://dbdiagram.io/home. First, group your data into tables. Just how you think it is practical. Even if it is duplicate. Then, remove the duplicates by deciding which it's main table is and insert references into the other tables it has to be in. When I have done all that, then I go and use DBeaver to write a SQL file for the db and table creation and insert some example values to get started programming.
@proven wagon Have you allowed access for your computer in /etc/postgresql/%version%/main/pg_hba.conf ? E.g. something like host all all IP/Subnet md5?
@mystic surge One query is always faster than 2
@tawny sail class Theatre(db.Model):
""" Model for storing Theatre related details """
__tablename__ = "theatre"
__table_args__ = {'extend_existing': True}
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(50), nullable=False)
public_id = Column(String(100), unique=True)
description=Column(String(255))
base_price=Column(Integer,nullable=False,default=200)
audi_id = Column(Integer, ForeignKey('audi.id'))
Audi(db.Model):
""" Model for storing auditorium related details """
__tablename__ = "audi"
__table_args__ = {'extend_existing': True}
id=Column(Integer, primary_key=True,autoincrement=True)
audi_name= Column(String(20),nullable=False)
rows = Column(Integer, nullable=False)
columns = Column(Integer, nullable=False)
theatre = relationship('Theatre', backref='audi',lazy='dynamic')
dont mind the syntax
one theatre has many audi. So is it correct way to define it
?
u can wrap it in python markdown highlighting
how
!syntax
lol
so one theater has many audi, so u define a foreign key in the audi table which links to the theater table
foreign key will be in audi table ? and not in theatre table?
okay foreign key in many's table and what about relationship?
theatre = relationship('Theatre', backref='audi',lazy='dynamic')
this line....will be changed to audi ?
and has to be put in theatre table?
yes
oh im not sure how you do it with sqlalchemy but you get the concept right?
i am confused only about that
in the audi table, define a foregin key on the theater table
on where to put foreign key and where to put relationship
wait what
class Theatre(db.Model):
""" Model for storing Theatre related details """
__tablename__ = "theatre"
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(50), nullable=False
audi_id = Column(Integer, ForeignKey('audi.id'))
this is right?
theater VARCHAR(100) REFERENCES Theater(id)
)```
you put the code for the foreign key in the audi table
hope u can convert that to sql alchemy
class Audi(db.Model):
theater = Column(String(100), ForeignKey(table.column)```
something like this i belive
where to put relationship?
what relationship?
need to define relationship also na
theatre = relationship('Theatre', backref='audi',lazy='dynamic')
can change this to audi
i guess, idk sql alchemy
yw
In an online booking system, how do we make sure if someone is in the process of booking a reservation, we don't allow another user to try to book it?
Temporary tables are dropped when the connection that created it is dropped
So, set a table with a boolean and an id of the reservation as a temp table and check for it before allowing booking
I see
make sure the connection stays active until the session of the user ends
so a 2nd table, query this table before querying the main booking table
emphasis on temporary tables
how come temporary table?
Temporary tables are dropped when the connection that created it is dropped
and make sure there is a fallback timer
otherwise I can block your business by leaving my browser open
I'd allow 5-15 minutes until the reservation is cancelled, depending on how much there is to fill out etc
I am confused as to why we are creating a temporary table. I would understand if its maybe a temporary row in a table
because there is no such thing as a temporary row
There are temporary tables though, which you can create through sql within a select statement
oh
@rain wagon thanks!
@gilded stag what are you working on?
i will have to deal with the same problem later on
@rain wagon Yes, and also changed some the listen_addresses
Also tried turning SSL off, changing the port etc and always get the same error
I have a feeling its not the postgresql configs that are wrong
@rain wagon thank you for the suggestion
Seat.query.with_entities(Seat.id).filter_by(audi_id=(Showing.query.with_entities(Showing.audi_id).filter_by(id=data['showing_id']))).first()
will this get me a list of id in seat table?
i am getting error : none type object is not iterable....it means it is returning nothing right?
solved π
Hi,
Is there a way to keep data somewhere in database and let many tables point to that data. Like if i have some text and i want to distribute that data in multiple location can i make the data once and then let the location point to that data
@rich trout will this query give me a list of IDs in table Seat
seat_id=Seat.query.with_entities(Seat.id).filter_by(audi_id=(Showing.query.with_entities(Showing.audi_id).filter_by(id=data['showing_id']).first())).all()
if it works, yes
but when i run a for loop over it i get only one entry
def save_a_reservation(data):
rsrv=Reservation.query.filter_by(showing_id=data['showing_id']).first()
if not rsrv:
seat_id=Seat.query.with_entities(Seat.id).filter_by(audi_id=(Showing.query.with_entities(Showing.audi_id).filter_by(id=data['showing_id']).first())).all()
seat_id=[id for id in seat_id]
for i in seat_id:
new_rsrv=Reservation(
seat_id=i,
showing_id=data['showing_id']
)
save_changes(new_rsrv)
response_object = {
'status': 'success',
'message': 'Reservation detail successfully saved ',
}
return response_object, 201
else:
response_object = {
'status': 'fail',
'message': 'Reservation already exists.',
}
return response_object, 409
which is first id of seat table
i did list comprehension if in case i was getting a tuple instead of a list
there was some error in code .....corrected that and it worked
Error when connecting to the database hosted on my raspberry pi.
SSL error: wrong version number
expected authentication request from server, but received S```
I have:
- Changed listen_address to `'*'`
- Turned SSL off
- appended `host all all my_ip md5`
- Changed port to same as pi is hosted off
- Tried to re-install (twice)
Please help π€·
How can i check? π€ Only got my Pi this week @ionic pecan e: sorry for late ping π€·
no worries, friend, i don't mind it at all
for checking server version, you can either ask postgres or ask your package manager
$ sudo -u postgres psql -c 'SELECT version();'
[sudo] password for myself:
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.5 (Debian 11.5-3.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
(1 row)
or
dpkg --list | fgrep postgresql-
ii postgresql-10 10.10-1.pgdg100+1 amd64 object-relational SQL database, version 10 server
ii postgresql-11 11.5-3.pgdg100+1 amd64 object-relational SQL database, version 11 server
but as you see that may display other versions as well, so the top one is more reliable, as it uses the default one (based on postgres' port)
for client library - how are you connecting to postgres from your python app?
chances are you're using psycopg2, right?
No im connecting using pgAdmin
For some reason that query doesent work for me
I have 11.5
@ionic pecan
wait, so that error is from pgadmin?
I dont think sop
I have always been able to connect to localhost and databases hosted online with pgAdmin
oh yeah nvm yes, the error is coming when connecting in pgAdmin @ionic pecan didnt really see what you meant
hey guys
can anyone please help me
i have this code
await ctx.send('player has won')
async with conn.cursor() as mycursor:
userint = ctx.author.id
check2 = "SELECT discordCoin FROM discordcoin WHERE discordID = %s"
await mycursor.execute(check2, (userint,))
myresult2 = await mycursor.fetchone()
addCoin = (myresult2[0] + (coinAmount + coinAmount))
sql = "UPDATE discordcoin SET discordCoin = %s WHERE discordID = %s"
val = (addCoin, ctx.author.id)
async with get_conn() as conn:
async with conn.cursor() as cur:
await cur.execute(sql, val)
await conn.commit()
conn.close()
else:
conn.close()
await ctx.send('NPC has won')```
just wondering why it aintworking >
Why are you getting a new connection when you already have one?
or a new cursor for that matter
@bot.command()
#@commands.has_any_role("[JA] Junior Agent", "[BA] Border Agent", "[BC] Border Commander", "[BI] Border Inspector", "[DD] Department CO/XO", "[DBC] Deputy Border Chief", "[BC] Border Chief", "[BA] Board of Advisors", "[DBD] Deputy Commissioner", "[C] Commissioner")
async def createFile(ctx):
member = ctx.author
role = member.roles
await ctx.send('`Enter your ROBLOX name.`')
def check(m):
return m.author == ctx.author
robloxIgn = await bot.wait_for('message', check=check)
await ctx.send("`Enter your callsign.`")
callsign = await bot.wait_for('message', check=check)
hs = discord.utils.get(ctx.author.roles, id='634073747195822090') #630370651537932308
ofo = discord.utils.get(ctx.author.roles, id='634129126655000586') #630370583959306287
if hs in role:
hs_var = {HS}
c.execute("INSERT INTO this (roblox, call, depname) VALUES (?, ?, ?)", (robloxIgn.content, callsign.content, hs_var))
conn.commit()
elif ofo in role:
ofo_var = {OFO}
c.execute("INSERT INTO this (roblox, call, depname) VALUES (?, ?, ?)", (robloxIgn.content, callsign.content, ofo_var))
conn.commit()
else:
await ctx.send('ok')
await ctx.send(f"`Successfully created a user file for {robloxIgn.content}`")
This code does not output anything onto the SQL, not sure whats wrong nor does it output any errors.
@vocal moon hs and ofo are always None, because discord.py's ids are integers, not strings. There will be no role with id "634073747195822090", but there may be one with id 634073747195822090
with pgadmin 4 how do you make a sub table from a column?
i have a column guild id and i want to have a set of preferences for the guild
you can make a table and connect it
foreign keys to connect tables
take a look at table relationships, fundamental for database management and creation
Postgresql
Is it bad if I directly assign a value when selecting things from the database? Security wise
I know in sqlite3 we do SOMETHING = ?
http://initd.org/psycopg/docs/usage.html
>>> cur.execute("INSERT INTO numbers VALUES (%s, %s)" % (10, 20)) # WRONG
>>> cur.execute("INSERT INTO numbers VALUES (%s, %s)", (10, 20)) # correct
That is if you use psycopg. Other libs may have other approaches.
Thank you very much!
There is a statement for that in table schema query.
I think in postgres there is a datatype called SERIAL.
PostgreSQL - AUTO INCREMENT - PostgreSQL has the data types smallserial, serial and bigserial; these are not true types, but merely a notational convenience for creating unique identifier co
You are such a nice person!! Thanks
You can make your own sequence if you like, but generally serial should be enough.
Cool
I get too many connections on the role??
the port is too busy or what?
nevermind I just had to refresh my database browser -_-
Error when connecting to the database hosted on my raspberry pi.
I am using pgAdmin to connect.
SSL error: wrong version number
expected authentication request from server, but received S```
I have:
- postgresql version 11.5 on my pi
- Changed listen_address to `'*'`
- Tried turning SSL off
- appended `host all all my_ip md5`
- Changed port to same as pi is hosted off
- Tried to re-install (twice)
Please help π€·
For anyone looking back at this. I solved it.
did everything again, but with sudo apt-get update && sudo apt-get upgrade -y before installing postgresql
π glad you fixed it
I store datetime.datetime.utcnow() on my database on a column that's TIMESTAMP WITH TIME ZONE,
when i fetch this value from the database (i get it as utc, not sure if i can change this behavior with asyncpg)
i try to compare it again with datetime.datetime.utcnow() and it complains that i can't deal with offset-naive and offset-aware datetimes?
datetime.utcnow does not return any timezone offset
while the result from a database will have explicit UTC offset
Hi guys, I am trying to set up a python DB using python DB-API I am having issues inserting values into my DB using parameters when I want to insert integers and not just strings.
I tried to do it this way - but it makes sense that I get an error since I have designed my tables columns to be integers other than "Name".
What is the correct paramstyle for inserting integers I already tried %int
?
should I use multiple databases rather than different tables only for POSTGRESQL ? I was using sqlite3 before,but I got databases locked..thus why I am switching
@pure scroll So it is wise to do: datetime.datetime.now(datetime.timezone.utc) instead of utcnow()? I want to handle everything in UTC and convert as it's needed (if it is)
Also if I want to compare the UTC time I have on my database on that table (from that datetime.datetime.now()) I should compare it with something like CURRENT_TIMESTAMP AT TIME ZONE 'UTC'?
depends on the database, but most of the times databases are able to handle time offsets
e.g if you compare UTC vs GMT database will know how to convert one to another
datetime.datetime.now(datetime.timezone.utc) yeah that will work
Well all I know is that what I will be inserting into the table is utc all the time
yes, but you can query for it using any timezone
db will convert your query timestamp to UTC
depends on your configuration you can make it store in any other timezone
but it's recommended to do it in UTC
hmm i never saw any kind of setting to change how it' stored (other than setting current time zone)
which database you have?
PSQL
yeah, it stores in UTC and only utc
Yeah I was just checking that
I'm curious if the only way to change what time zone it returns is by doing select current_timestamp at time zone 'anything other than UTC';
or if it can be i.e. set up by asyncpg
I think it's up to you, I personally prefer to have as less logic on DB layer as possible, so I'd do that in my python code
but there is nothing wrong with doing that on DB and most likely DB will make it faster
oh mostly curiosity as i'll handle everything in utc and dont need to show any other time zone
just want to be sure nothing messes up
multiple databases to solve what problem?
Postgres is able to hold lots of concurrent requests and connections to it (depends on your server and queries of course but can go up to thousands)
COOL
I am asking because I am scared the database gets locked or something like sqlite3 did π
So to summarize:
create table testing (timedata timezone with timestamp);
# then insert with
await db.execute('insert into time(timedata) values($1)', datetime.datetime.now(datetime.timezone.utc))
is fine?
yeah, should work
I'm not really sure if this is correct behaviour or i'm doing something wrong but if i do:
now = datetime.datetime.now(datetime.timezone.utc)
jst = now.astimezone(pytz.timezone('Japan'))
await db.execute('INSERT INTO testing(timedata) VALUES($1)', jst)
result = await db.fetchval('SELECT timedata AT TIME ZONE \'Japan\' FROM testing')
print(result == jst) # This gives False
# Japan Time: 2019-10-22 20:50:55.351206+09:00
# Japan Time (Database): 2019-10-22 20:50:55.351206
I'm not sure what's causing this
But if I return it with no timezone (I assume it should be returning as UTC since the value is 2019-10-22 12:13:27.724365+00:00 the comparison works fine
select timedata AT TIME ZONE 'Japan' from testing;
timedata
----------------------------
2019-10-22 21:13:27.724365
(1 row)
Hey! I would need to interact with a MySQL DB in one of my python project but I don't know which module to use for that. Should I just use the MySQL Connector for python or would you rather advise others? Which? Why?
@tawny plank look up sqlalchemy
Im learning sql. Any good certs to prove to employers I know what Im doing?
@rough onyx ok thanks. Why would you recommend it over the others though? What are your experiences with DB modules for python?
@tawny plank sqlalchemy is extremely popular by itself and you will find lots of plugins to make it interact with libraries like flask, graphene, etc...
also it will be a bit more "abstract" than something like mysql-connector-python, so it will probably lead to quicker development and code that's easier to port for other rdbms
How is it more abstract? (not very experienced with DBs)
basically mysql-connector-python is about sending raw mysql queries
ok
sqlalchemy let's you use the db with python function
like
s = select([table.c.col1]).where(table.c.col2==5)
So it create sother models? Like makes a table a dict or something of the kind?
ok I see
kinda yeah
ok
columns = [seller,buyer,item,price]
c.execute("""INSERT INTO auctions (seller,buyer,item,price)
values(?,?,?,?)""" % (columns))```
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 4, and there are 0 supplied.
c.execute("""INSERT INTO auctions (seller,buyer,item,price)
VALUES(?,?,?,?)""" % (seller,buyer,item,price))```
`TypeError: not all arguments converted during string formatting`
what am i missing?
How many columns does the table auctions have? If there are missing columns do they have default values?
c.execute("""CREATE TABLE auctions (seller,buyer,item,price)""")
!e
"""INSERT INTO auctions (seller,buyer,item,price) VALUES (%d, %s, %s, %s)""" % (1, "hello", "world", 2)
Sorry, but you may only use this command within #bot-commands.
@elfin geyser you need to use the old printf-style formatting placeholders if you're using % for string formatting
>>> """INSERT INTO auctions (seller,buyer,item,price) VALUES (%d, %s, %s, %s)""" % (1, "hello", "world", 2)
'INSERT INTO auctions (seller,buyer,item,price) VALUES (1, hello, world, 2)'
try this
never try this with database queries
also true ^
just never use string formatting as this is a subject to sql injection
c.execute("INSERT INTO auctions (seller,buyer,item,price) VALUES(?,?,?,?)" , (seller,buyer,item,price))
I think the proper syntax is a , not %.
but @twilit swan was half way there, you most likely need to use %s instead of ?, even though we have a database API standart in python, none of db drivers do really care about it
using ? is the recommended way of executing sql statements I believe.
yeah, this is a standard but again, many libraries don't care about it
@twilit swan when i try to run your example:
c.execute("""INSERT INTO auctions (seller,buyer,item,price) VALUES (%d, %s, %s, %s)""" % (1, "hello", "world", 2))
sqlite3.OperationalError: no such column: hello```
try % (1, "'hello'", "'world'", 2)
" 'hello' "
@elfin geyser it thinks hello is a column name because i didn't put it in single quotes
did it work? you mean you want to store the " 'hello' " into a variable?
yes
a - "hello" c.execute("""INSERT INTO auctions (seller,buyer,item,price) VALUES (%d, %s, %s, %s)""" % (1, 'a', "'world'", 2))
a also needs to be put into quotation marks else it looks for a column called a instead of the string a.
with the ? you don't need to worry about putting the right quotes. I also don't know if single quotes are accepted.
yeah, ? is the better way compared to manually constructing queries and worrying about escaping
This is why you should stick to ? when possible:
c.execute("""INSERT INTO auctions (seller,buyer,item,price) VALUES (%s, %s, %s, %s)""" % ('"sql"','"injection"','"careful"','"lmao"); DROP TABLE auctions;--'))
it inserts a new row into auctions and deletes the table auctions. With ? the last value would be the string "lmao"); DROP TABLE auctions;. Without ? it would evaluate the SQL statement DROP TABLE auctions
-- is a comment. With that it will ignore the closing bracket ) that would normally follow.
What are the main differenced between asyncpg.pool.Pool and asyncpg.Connection
a pool is a set of connections
Because establishing connection with a database server takes time it makes sense too keep it alive.
So Pool is abstraction of set of connections, whenever you take a connection from it, it will start a transaction and give one available connection to you.
Whenever you close it, it will actually return it back to pool and also would execute rollback statement so that connection is not dirty, and has no pending transaction running on it.
Thanks π
Hey!
I must be completly blind but I've been looking at how to install a DBAPI for sqlalchemy
I wonder, which is the recommended one?
@pure scroll reading what you said about, since a pool is a set of connections,
whats the difference between doing:
mypool.execute('myquery')
vs
async with mypool.acquire() as test:
test.execute('myquery')
what is the purpose of the second and when should i use it?
in the first example pool will take the first available connection and execute statement on it.
so e.g if you do
mypool.execute('query1')
mypool.execute('query2')
``` its not guaranteed that both sttatement will happen within the same db connection
while in the second example yoou explicitly aquire a connection and pool will never give it to any other thread/coroutine until you release it back @torn sphinx
@pure scroll oh, as i thought makes sense. what "real life" use cases are a good example of when needing to acquire()?
Can't come up with any apart from a transaction, as long as you don't need a transaction, you can stick to executing things on pool directly
hey, I've tried asking this in web-dev channel... How do I make each name return it's own database... right now, when I print my database in html, it returns every database ...
I need some help with choosing a data type for postgres sql
I basically have some json that's in a format
the string appears multiple times, like tags from an album
and some cells may have more than one of these items
Basically Im planning to make arrays of hstores
is there any better data type I should use?
the key can be dense_ranked and made into a one to many relationship
how to make changes into and existing record in SQLAlchemy?
is it generally bad practice to access data (csv's, sql tables) in a classes constructor? that is to say, I pass a name to the constructor and then have the constructor find the relevant data in a file to set it's attributes
import discord
from discord.ext import commands
import sqlite3
class RPG(commands.Cog):
def __init__(self, bot):
self.bot = bot
global conn
global c
conn = sqlite3.connect('main.db')
c = conn.cursor()
def create_table():
c.execute('CREATE TABLE IF NOT EXISTS info(name BLOB, ammount BLOB)')
create_table()
@commands.command()
async def create(self, ctx):
value = 100.00
value = float(value)
author = ctx.message.author
c.execute("INSERT INTO this (name, ammount) VALUES (?, ?)", (author, value))
conn.commit()
await ctx.send(f"`Created a file for {author}`")
def setup(bot):
bot.add_cog(RPG(bot))
I have this code, the main error is in the line where I insert the two variables, I am getting no errors and no output
anyone?
@vocal moon
I just started Python 2 weeks ago, so I'm not rlly sure, but maybe...
The output of ctx.message.author is type abc.User. So it's not just a simple string, not sure if your database can store that datatype? You can try changing it to ctx.message.author.id (Data type integer).
:p
Np ;)
Idk dude, I'm not using your kind of database.
o thanks ill google it also can i dm you real quick
Ofc you can dm me
thanks
can someone explain how this is possible in sql server and what dumb mistake I am making?
the only difference between the left and right query is removing the degenerate groupby fields. The output is the same. But the left query runs in 8s while the right in 45s...
What exactly is the issue?
the fact that removing a bunch of degenerate groupby makes the thing run 5x slower
Execution time difference? Grouping may take quite a while depending on the nature and distribution of the data.
Oh, so it's the other way around.
Lol. I see.
slower with groupby removed
and key point is the groupby are degenerate, they have a cardinality of 1, they match the where clause predicates
Then I would guess there are different optimizations and essentially different low level queries are being executed.
You may want to do EXPLAIN or whatever the equivalent of your server for it is and inspect the output.
yeah i'm currently locked out from showplan/explain so just left to guess at the random magic sql server must be doing under the hood to act like this
But yeah, the behavior is puzzling to say the least.
I would be interested to know the specifics too if you ever get to it. =]
only guess is the groupby stack might be aligned to the clustered index order and the reduced stack is not, so it seeks more..? Idk
I suspect it's difficult for the database to compare A_Location_ID
That, or Type_ID is indexed and A_Location_ID is not, which would lead to the differing query speeds
The one on the left, then, would involve grouping first by type_Id, then by whatever else it has, but the other would involve grouping over a parameter that may be much more awkward to section over
Especially if it treats the index as a precomputed binary tree, at which point the grouping is already essentially computed at the beginning of the query itself
hrm
...I read your query again
If group_by occurs before the WHERE statement, this makes sense
that does not seem like an optimized query plan though
I found a good explanation of your exact circumstances
due to bad luck and a hedging optimizer, it does indeed seem that some of your filters are being pushed behind the join
I would expect only adding type_id to the group_by would return you to the fast speed
Or, as the article suggests, in some cases () might work too
i'll give that a whirl and ping you back if it works
nope
and it seems like removing any of the groupby stages from the original stack causes a perf hit (not sure if linear)
hm.
A portion of the performance is lost for each one?
If it's splitting the where clause, that might happen
Each group_by could force the where to be on the other side of the join
thanks for helping out, im gonna give up for now and wait for them to give me showplan so at least this isn't blind guessing
Hi guys, I am getting a syntax error that I can not fix.
I am typing this statement: """CREATE OR REPLACE TABLE item (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(128) NOT NULL,β¨ type VARCHAR(128) NOT NULL,β¨ amount INT NOT NULL,β¨ cost INT NOT NULL);
I get this error:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'type VARCHAR(128) NOT NULL,β¨ amount INT NOT NULL,β¨ cost INT NOT NULL)' at line 1
I also tried to change "type" to a different name in case it was a reserved word
!sucribe
import sqlite3
conn = sqlite3.connect('users.db')
c = conn.cursor()
def create_table():
c.execute("CREATE TABLE IF NOT EXISTS this(id BLOB, value BLOB)")
create_table()
@bot.command()
async def create(ctx):
member = ctx.author.id
value = 100.00
c.execute("INSERT INTO this (id, value) VALUES (?, ?, ?)", (member, value))
conn.commit()
await ctx.send(f"`Created a file for {ctx.author}`")
It does not seem to be working or outputting any errors
INSERT INTO this (id, value) VALUES (?, ?, ?)
is wrong, the number of ?'s should match the number of inputs
You should be getting an error
Do take note to sanitize the inputs before inserting them into the SQL query.
@rich trout so turns out at least some facet of it is caching of some sort. :/
oh that sounds peachy
every time the query string is changed, it takes 45s to run, then 8s on the subsequent run. The groupby stack was a red herring
but that makes the actual perf problem even more fun to diagnose...
I had to deal with an http cache that ignored cache-control: no cache
man was that fun debugging cached authentication
mumble
You are grouping by A_Location_ID, correct?
Do you have permission to create an index?
location id, geometry
I also spot a distict(stringvalue), how big is stringvalue?
yeah I think I can do pretty much anything except showplan, because our itsec guys are super smart and don't need to have their hand held for each individual permission on each db
it's a varchar 100
we'll be reducing it to a numeric hash
it walks 10m, the total table is about 40m
i'm shocked it takes 8 seconds to do so little work
that would do it
we must still be doing something really wrong
It's not doing little work
I typically use something like clickhouse for these kinds of workloads and it'd do this in 8 milliseconds
its comparing 10 million rows of 100 length strings to each other
see value
check against seen values
is it new? add, incr count
move on
or, I suppose, O(n) hashes
yeah
but 100 characters has a very large number of possibilities
i want to try another agg that's not count d to verify this, not sure what i can use though
sum/max/avg/etc are all precalculated in column stores
so not a fair comparison
I'd just try Count(stringvalue)
precalc'd
Is it though? Since it's per group?
yes because it can then sum the precalc group counts without actually evaluating them per row
count d is unique here in having to walk every row
why not do an == comparison
I just tried count anyway. Same 8 seconds as count d actually
so that's not the bottleneck
i mean maybe 8 sec is just how long sql server takes to traverse 10m rows, i don't know what to expect from it
but something like clickhouse would do this closer to 8-80ms
Well, since it pulled 8 seconds I think we can deduce that it is the hashing of uniques that's triggering the 5x time increase
let me verify that..
good point
yes, the caching behaviour and 5x slowdown on first run seems to be gone. That's one down
Good, at least we know where that's coming from
I guess it retains the hashmap of seen items between runs and dumps it if the query changes
Try a run with just counts(), no actual rows returned
That ought to give us a view on how fast it iterates and filters through it all
ok so
1: the caching behaviour is back, so I guess it's just an unpredictable force for now
2:
FROM
(
SELECT
count(*) as Num_Records
FROM dbo.FACT_AssetAttributes_Col
where Attr_URN = 'urn:asset:ci_name'
and TypeId = 5
and Aggregation = 'COUNTD'
and DataFormat = 'numeric'
and DisplayName = 'CI Name'
and GroupDisplayName = 'Asset'
group by
Geography
,A_Location_ID
) t```
slow time with groupbys, instant without
Is Geography a string?
try group_by type_id
you win
Instant again?
anything that's not a string in the groupbys is instant
Yep
So the slowdown is occurring because it builds a massive list of unique strings and has to filter on them
In both cases
i've been spoilt by analytics stores that intern strings as a matter of fact
good to know, note to self: all that extra work you did turning string flags into foreignkeys was a very good idea
I don't know, but I hope mysql does that, but your strings are probably too long
they're pretty long but low cardinality. But yeah this all makes sense now π

thanks again
May I point you towards an alternate solution instead of integer hashing?
interning?
foreign keys
how's that work
I've noticed you have more than a few string flags involved
yes
Generally, you'd create a table for each column of flag
so, you'd have an Aggregations table
and it would have an id, and the aggregation label, and anything else specific to it
then in this table you would simply have the integer referencing that aggregation
it does end up making queries a little bit more complicated, but as you can see it is very efficient
so interning with extra steps?
i was thinking of just chucking in an additional column for each string flag with an integer version of it, no need for hashing, just inrc an int or something for each uniq
It's basically that except db side
It should be basically free
(again I mostly live on data-science and work with very different tooling)
It only needs to be done once, and if you do a JOIN WHERE then it should filter first
well, I have time. So might as well try both and see if there's a perf cost
regardless, even if you don't want to add that to the query, you can simply use the ID's as your integer interning and load them manually into your application at the beginning
yeah
π
good luck!
sorry, join on
ugh
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
is an example
That should filter prior to joining
noted
i think as long as an index exists it'll be o(1) slower than storing the intern'd/key value explicitly alongside the string
1 being very small
F
F to respect our itsec team that's going to get an ass whooping on monday
hiii! is there any way to store list of strings in a sqlite3 db
list of strings per row, or just list of strings in general?
per row
apparently not, SO recommends storing a long string with a separator per item and deserialising from that
right on, thanks
@bot.command()
async def flip(ctx, choice, bet: int):
c.execute(f"SELECT * FROM this WHERE id = {ctx.author.id}")
data = c.fetchall()
value = data[0][1]
choice = choice.upper()
x = random.choice(["HEADS", "TAILS"])
if choice == x:
y = value+bet
c.execute("UPDATE this SET value = {} WHERE id = {}".format(y, ctx.author.id))
conn.commit()
await ctx.send("yess boi u won")
else:
await ctx.send(x)
await ctx.send("`(test)`")
Not updating and it's not presenting any errors
Β¬
Could it be the curser isn't set properly or something?
I think it should still be the same tho, right ?
Yeah
Can you look into the database using the gui and see if the values are updating or what's going on?
It should have some record of changes
That's all I can really offer π¦
are you using sql alchemy?
could it be because there is one row im not inserting into?
sqlite
already done that
wait i think i found my issue
nvm
yeh
no
wait
lemme try
nop
nope*
I FOUND THE ISSUE
i had an on_error event that i did not even know about and it was throwing the errors away
??
Wait
what was causing the on-error event?
How was it being handled that you couldn't see it?
what was the issue ?
None Type + int
lol
lmao
Hey, how to use a cursor in a multi-file project without open and close it for each operation?
yes, why?
In one db
yes
Or separate
same db
Just rerun you sql code to the table that you want
And it should work with out re-declaring
Unfortunately
You do need to reopen or redo the cursor i think
Just gimme a sec to check my notes
cur.execute(sql, params)
That
Will be needed to be runned again
Otherwise it should work
Since I'm pretty new with db I'll try to explain clearly what I've in mind:
Let's say I've 2 file
-db.py
-other.py
In my db.py I define all my functions related to my database, and then I import these functions in other.py to use them. Question is: How to avoid opening and closing the database in each function of db.py? Or is it ok to do so?
Use a getter function that checks if the object already exists
look at setattr and getattr as well
especially if using Flask or Django
It is also fine to open it each time
but you also don't need to close the db if you still need it
commit() should be enough and your data is safe
I don't use any framework, but I'll look into getter and setattr
Okay so then your using sqlite3???
discord bot with a db
Okay
I don't really understand how to have a cursor that can be used from different files ,that's why I thought I needed to open and close it each time.
@rain wagon You said it was possible not to close it, how? each function is separated from the others
I've assumed you are making a normal program or web app
Ok, thank you both anyway!
Hey all, I'm having trouble understanding how to structure my database such that users can select their interests from a pre-defined list as well as add their own custom interests
Well, you are making a table with some predefined interests and then have a form where user can enter their own
also add the ability to remove the predefined stuff
so, when a user registers, prefill the table and let them manage it
how does that look as far as the database? Like I wouldn't make a column in my interests table for every interest would I?
And this would be a many-to-many relationship between users and interests?
there are multiple ways to approach this: One way is to have a table where you resolve the n:m relationship, insert the foreign keys for the predefined interests and the user id as another foreign keys. It get's tricky when users make their own interests, because you need to save them somewhere as well
so I'd say have 2 different tables, one with predefined, one with user defined and resolve it in a third table
hm okay. If I try to mock this schema up would you take a look?
hold on, need to think again π
@mystic ivy
I think this is kind of a solution
Add all interests to ONE table, reference the creator, then resolve the chosen interests in a new table. All predefined interests get id 1, so make sure to never delete ID 1
This way you can also offer those interests to other users as they type
Thanks!! So when I say predefined I don't mean pre-selected, I mean when they're looking at this page there are checkboxes for preset options but if they're option isn't there, they can type this in
that might simplify things
also confused how all predefined interests get id 1, wouldn't every interest have its own ID?
So I'm guessing I would just have checkboxes, then pull the values from the checkboxes that are selected, then commit them to t_interests with the current_user.id?
Ah right so the resolve table is used to capture those interests that weren't created by the current user? So I would query the resolve table instead. But if I go the route I mentioned above, I could eliminate the resolve table right?
When you have a n:m relationship, you need a resolve table
Oh right. But I could get rid of the created_by_id field?
I wouldn't.
So you can delete stuff created by a specific user
and considering DSGVO, you may have to
sorry, GDPR in English
Okay, thanks so much for all your help π
I have table of polls.
When i insert the polls into the table i do it in newest to oldest.
Which makes it so that my serial id of the newest becomes 1.
This causes trouble if i keep inserting polls to the table since than it will go 3 -> 2 -> 1 -> 46 -> 47... when i output the polls to users.
Is there a way to reset the serial id in the opposite order or a custom order? (i got a column for poll creation i can sort by)
there's most likely a way to reset a serial id (on postgres it's possible if that's what you're using)
but I wouldn't do that
if it's just an output issue, I think a simple for loop will allow you to reindex things easily
I have table1, I want to add a colum to it from table2 with value depending on some condition. I currently have done this but I did it by creating a table3 which is made from JOIN of table1 and table2. What is a better way of just updating table1 so I don't need to have table3 ?
def Test():
c.execute("DROP TABLE IF EXISTS TEST")
Query = "CREATE TABLE if not exists TEST AS SELECT a.Symbol, a.FieldName, a.Value1, a.Value2, b.Value as Permission FROM FieldDifferences a JOIN '{}' as b USING(Symbol) WHERE b.FieldName = 'P'".format(tablename1)
c.execute(Query)```
so now I want to have table3 (TEST) replace table1 (FieldDifferences)
which does not seem like good practice for this
nvm i did this by doing a select of a select
Hi all - does anyone know if there's a way to either a) cascade deletion down to foreign keys when deleting their related rows or b) refuse to delete if a conflict arises? I'm running into an issue where running raw SQL against my postgres DB doesn't seem to be obeying the foreign key constraints I've put in place.
The query I'm using is
WITH updated AS ( INSERT INTO screener_ticker SELECT * FROM screener_ticker_temp ON CONFLICT (ticker) DO UPDATE SET "table" = EXCLUDED."table", permaticker = EXCLUDED.permaticker, ticker = EXCLUDED.ticker, name = EXCLUDED.name, [...other fields] RETURNING id ) DELETE FROM screener_ticker CASCADE WHERE id NOT IN ( SELECT id FROM updated );
and I have a related table screener_incomestatement which has a FK back to screener_ticker.id but for some reason this is deleting the ticker but not the related income statement.
I think I found the problem. I think my FK constraints got deleted somehow as I was messing around here
If I have a string that is repeated multiple times in a table is it better to have a lookup table and use a constant (integer) to reference to the string in the lookup table?
yes
any free database cloud ?
hello, I am new to learning sql in class and I've tried googling this but haven't really found what I'm looking for
I'm wondering if SUM has to be by itself (not nested in anything) or if I can use it in the following way:
SELECT (SUM(col1)+SUM(col2)) col3
(also same question for COUNT)
(please tag if answering)
Quick question, what's the interest of disabling auto commit in a DB? (Postgres)
autocommit causes every statement to "commit" if it finishes successfully
With autocommit off, you need to issue a COMMIT or ROLLBACK command to finalize your instructions
I know what it does, ty, but I am more questioning the goal of this practice
Here's an example use:
rows = cursor.execute("Massive expensive query")
if not validate(rows):
cursor.execute("ROLLBACK")
raise Error()
more_rows = cursor.execute("Modify half of the data we need to")
if not validate(more_rows):
cursor.execute("ROLLBACK")
raise Error()
param1, param2 = learning_model_operate(rows, more_rows)
finish = cursor.execute("Finish modifications, %, %", param1, param2)
cursor.execute("COMMIT")
Essentially, it lets you "pause" a query while you perform external operations, without losing your state or leaving the database "halfway" between valid states
depending on your isolation level, this "pause" could be visible or not
this transaction would preserve atomicity depending on your isolation level, whereas with autocommit on you may overwrite new data with results from old data
Hmm right, but why checking the data after executing the query and not before?
I mean, checking if the query is right before sending it to the db
You could, but with autocommit off you can interface with things outside the database to ensure the data is correct, such as a REST api
You can do the same by prepending your queries with "BEGIN TRANSACTION"
@elfin geyser if you're looking for a free database you won't really find a good stable and reliable one, other than Firebase (made by google), it uses NoSQL tho
postgres definitely is stable and reliable lol
@ionic salmon afaik INTEGER PRIMARY KEY should reuse deleted ids
but maybe that's only when the id count reaches the max
i'm not a sqlite user, but do you have AUTOINCREMENT turned on?
https://www.sqlite.org/autoinc.html
If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.
@ionic salmon
anyone familiar w/ postgres cursor objects and how to execute more advanced statements safely w/ them? particularly w/ parameterized SQL .... for example how can I safely do something like this in a cursor.execute()?
CREATE TABLE "table_name" ("id" serial NOT NULL PRIMARY KEY, "field1" varchar(255) NULL, "field2" varchar(255) NULL, [... other fields]);
I'm reading that escaping the quotes is "doing it wrong" so I'm trying to understand how to form this string properly
yeah I'm seeing that right now actually π
removing the quotes as we speak lol
what about if I have a column called "table"?
might I need to quote that?
don't use reserved names
it's a requirement to have this field called table. changing it isn't currently an option
table is a sql keyword
I understand that. I agree w/ you on not using reserved names, just saying in this situation I don't have a choice
I need to make the schema match the data going into it which has a column called "table" referencing which table the data comes from at the source
then call it from_table
If it is homework, then your teacher needs a lesson
if it's work, then tell your boss it does not work this way
It's not homework, it's a professional codebase w/ 200k lines of code and I am my own boss, and this is beyond my control.
I don't really wanna argue about the semantics of it though - I agree w/ you, but I'm very aware of the need to stay away from reserved keywords and I know for a fact this will not be a problem in this case that's all
anyway you answered my original question so it's all good!
thanks
as an aside though I do agree w/ you...the issue is that the API I get the data from calls the field "table" (which I agree is dumb and bad practice) but I don't want to deal w/ translating fields that are named differently on my end just for this one column.
If you know there are going to be conflicts or are potentially going to be conflicts with reserved names, I would suggest having an easy-add easy-remove shared prefix/postfix applied to every table and column name
something like project_
so you can have project_table
If you're using SQLite, here's a guide on string quoting:
The standard quoting for identifiers is the double quote
thanks - I'm using Postgres and you're right that's not a bad idea but it's really not a big deal in my situation π
just quoting that one field works fine and won't cause any problems. This is a nightly management operation I'm writing anyway so I don't need to worry about SQL injection here
I'd be more inclined to avoid escaping and stuff if it was user facing
Postgres has some interesting string escaping features sqlite doesn't have, too, but I think "" will work fine for your needs
yea that worked fine.
at the moment the issue I'm struggling w/ is how to handle duplicate IDs that are inserted in a temp table
for example in my temp table:
----------------------------------------------------------------------------
(1,A,"Agilent Technologies Inc")
(2,AA,"Alcoa Corp")
(3,AAAB,"Admiralty Bancorp Inc")
(4,AAAGY,"Altana Aktiengesellschaft")
(5,AAAP,"Advanced Accelerator Applications SA")
(6,AABC,"Access Anytime Bancorp Inc")
(7,AAC,"AAC Holdings Inc")
(8,AAC1,"Arcadia Financial Ltd")
(9,AACB,"Alliance Atlantis Communications Inc")
(10,AACC,"Asset Acceptance Capital Corp")
(11,AACE,"Ace Cash Express Inc")
(12,AACG,"ATA Inc") <---------------------------------note this guy
(13,AACPF,"Australia Acquisition Corp")
(14,AAI,"Airtran Holdings Inc")
(15,AAI1,"Aqua Alliance Inc")
(16,AAII,"Aaipharma Inc")
(17,AAIIQ,"Alabama Aircraft Industries Inc")
and in the table I want to "merge" with this one:
----------------------------------------------------------------------------
(1,A,"Agilent Technologies Inc Updated")
(3,AAAB,"Admiralty Bancorp Inc")
(4,AAAGY,"Altana Aktiengesellschaft")
(5,AAAP,"Advanced Accelerator Applications SA")
(6,AABC,"Access Anytime Bancorp Inc")
(7,AAC,"AAC Holdings Inc")
(8,AAC1,"Arcadia Financial Ltd")
(9,AACB,"Alliance Atlantis Communications Inc")
(10,AACC,"Asset Acceptance Capital Corp")
(11,AACE,"Ace Cash Express Inc")
(12,AACPF,"Australia Acquisition Corp") <-------causes an ID conflict on INSERT INTO
(13,AAI,"Airtran Holdings Inc")
(14,AAI1,"Aqua Alliance Inc")
(15,AAII,"Aaipharma Inc")
is it possible to have multiple ON CONFLICT solutions?
The question is why
Well the reason is this query throws an integrity error cause id 12 exists already in the current table:
CREATE TABLE deleted_tickers_temp AS (
WITH updated AS (
INSERT INTO screener_ticker SELECT * FROM screener_ticker_temp ON CONFLICT (ticker) DO UPDATE SET
ticker = EXCLUDED.ticker,
name = EXCLUDED.name,
[...] RETURNING id
), deleted_tickers as (
SELECT id FROM screener_ticker WHERE id NOT IN (
SELECT id FROM updated
)
) select * from deleted_tickers
);
cause I create the temp tickers table from a CSV which has its own id sequence
Is there a reason not to discard one of the two id sequences?
yes, the IDs are FK'd from another table
so I don't want to lose the original sequence
I delete the tickers that aren't in the new temp table, which cascades down to their related objects, but for new ones I probably want to just create them at the end of the table
this is what I've got so far which works up to the point of that ID conflict (these are just SQL statements imported from another file called ops to keep the code reasonable but it should give you an idea of the process:
with connection.cursor() as cursor:
cursor.execute(ops.DROP_EXISTING_TEMP_TICKERS_TABLE)
cursor.execute(ops.CREATE_TEMP_TICKERS_TABLE)
cursor.execute(ops.COPY_CSV_TO_TEMP_TICKERS_TABLE.format(filename))
cursor.execute(ops.DROP_EXISTING_DELETED_TICKERS_TABLE)
cursor.execute(ops.INSERT_UPDATE_AND_CREATE_DELETED_TICKERS_TABLE)
I think it would help to split this into two steps
ok - how so??
alternatively, I wonder if instead of selecting * from the temp table I select everything except ID will that let the real one handle the auto-increment?
I'm finding it difficult to untangle the logic and references in that statement, which is why I suggested separating it
which statement?
The big create table statement
ok I can try to simplify it ...
working from the inside out, the WITH updated AS part just takes everything from the screener_ticker_temp table and inserts it into screener_ticker (the current/existing table) but if it hits a conflict on the ticker column, rather than insert, it updates that row
it saves that result as a CTE called "updated"
The syntax is ON CONFLICT (column) DO THING
What I would do is create two temporary tables
right, so the ON CONFLICT (ticker) looks for any tickers that exist in both tables and rather than insert a duplicate just update the existing row
one for insert, one for update, and on conflict insert it into another table
You're missing an ON CONFLICT (id), then, I think
unless, since I don't really care about the ID I just don't select it to insert
You'd still need an ON CONFLICT (id) DO NOTHING, or w.e
not if I'm not inserting that column though?
true
but you wouldn't get an id conflict if you weren't inserting that column?
From what I can tell, it looks like you've got a conflict handler for the ticker column, but not the ID column
yeah you're right - that's why I was wondering if there's a way to have multiple
cause if there's an ID conflict I don't want to update, I want to insert, lol
but generate a new ID π
hmm, it doesn't seem to like me skipping the ID column either.
django.db.utils.ProgrammingError: column "id" is of type integer but expression is of type record
LINE 3: INSERT INTO screener_ticker SELECT ("table",
^
HINT: You will need to rewrite or cast the expression.
I think writing the on conflict handler for ID is trivial, the question is just what do I actually want to do on that conflict π
Good luck!
thanks haha
Perhaps try splitting it out a little bit more. If you manually separate your sequences out into "conflicts on id" and "conflicts on ticker", then it could be more cleanly handled
yeah that's what I'm trying to figure out
Something like SELECT id WHERE table1id = table2id
Where you filter proactively instead of reactively
I think basically what I want to do is
- select everything from temp table
- insert it into existing table
2a) if there's a conflict on 'ticker', update the row
2b) if there's a conflict on 'id', toss the temp ID and insert a new record in the existing table after the last current ID
but I don't know how to do that π
might have to just do nothing and then make a second pass
or insert into another temp table
Why "if there's a conflict"
Why "insert, and if there's a conflict" and not "if there is a conflict, update"
You can SELECT conflicts easily, so you could explicitly handle each case individually
without dealing with the conflict machinery
because the goal is to insert new tickers as well as update existing ones, and delete ones that have been removed
hence try to insert, if there's a conflict, update
but that doesn't work when it tries to insert a ticker w/ an ID that already exists
and if I just add an on conflict (id) do update set .... then it will overwrite the existing ID w/ the new ticker's info
which effectively changes the PK
Here:
CREATE deleted_tickers_temp AS ( SELECT * FROM screener_ticker_temp );
INSERT INTO deleted_tickers_temp ( SELECT * from screener_ticker ON CONFLICT (ticker, id) DO NOTHING);
INSERT INTO deleted_tickers_temp ( SELECT all but id FROM screener_ticker WHERE ticker not in (SELECt ticker from deleted_tickers_temp))
UPDATE deleted_tickers_temp ...from.. (SELECT * from screener_ticker WHERE ticker in ( SELECT ticker from deleted_tickers_temp))
You can use better syntax, this was just to get the idea out
you can use UPDATE FROM on the conflicting tickers, and the insert into is self-explanatory
Or you could chain them
I don't think this does what I need though ... particularly this:
INSERT INTO deleted_tickers_temp ( SELECT * from screener_ticker ON CONFLICT (ticker, id) DO NOTHING);
That step inserts all the non conflicting rows
The next step inserts new rows that have conflicting ids but not tickers
but there will be like 99% conflicting rows
the final step updates rows with conflicting tickers
Even better to handle them explicitly then
hmmm, I'm trying to visualize how this works.
Divide the job into three different instructions
if (row is good) -> row
if (row should be updated) -> update row
if (row should be adjusted and added) -> insert row with new id
so here's the problem with it...
I just did the first step and it creates a table which is basically a dupe of screener_tickers_temp (ok fine, no worries) ... but this is what's in it ...
----------------------------------------------------------------------------
(1,A,"Agilent Technologies Inc")
(2,AA,"Alcoa Corp")
(3,AAAB,"Admiralty Bancorp Inc")
(4,AAAGY,"Altana Aktiengesellschaft")
(5,AAAP,"Advanced Accelerator Applications SA")
(6,AABC,"Access Anytime Bancorp Inc")
(7,AAC,"AAC Holdings Inc")
(8,AAC1,"Arcadia Financial Ltd")
(9,AACB,"Alliance Atlantis Communications Inc")
(10,AACC,"Asset Acceptance Capital Corp")
(11,AACE,"Ace Cash Express Inc")
(12,AACG,"ATA Inc")
(13,AACPF,"Australia Acquisition Corp")
(14,AAI,"Airtran Holdings Inc")
(15,AAI1,"Aqua Alliance Inc")
note in particular id 13 (AACPF)
now I do: INSERT INTO deleted_tickers_temp ( SELECT * from screener_ticker ON CONFLICT (ticker, id) DO NOTHING);
That's not quite right
it was an example, so there's bits of syntax missing
that on-conflict is in the wrong ()
that's not the problem though - regardless of the syntax when that query works, it will pull from this table:
----------------------------------------------------------------------------
(1,A,"Agilent Technologies Inc Updated")
(3,AAAB,"Admiralty Bancorp Inc")
(4,AAAGY,"Altana Aktiengesellschaft")
(5,AAAP,"Advanced Accelerator Applications SA")
(6,AABC,"Access Anytime Bancorp Inc")
(7,AAC,"AAC Holdings Inc")
(8,AAC1,"Arcadia Financial Ltd")
(9,AACB,"Alliance Atlantis Communications Inc")
(10,AACC,"Asset Acceptance Capital Corp")
(11,AACE,"Ace Cash Express Inc")
(12,AACPF,"Australia Acquisition Corp")
(13,AAI,"Airtran Holdings Inc")
(14,AAI1,"Aqua Alliance Inc")
(15,AAII,"Aaipharma Inc")
and when it gets to id 12 (AACPF) it will do nothing right?
yep
that's no bueno
Why?
because then I have the ticker AACPF w/ an id of 13 and it's supposed to be 12
it got shifted in the temp tickers table cause there was another ticker inserted above it
But the tickers still conflict, no?
yea ... so what does that mean?
Ok
You select in from your first table
that defines your initial, old data
you select in from your second table
if the id's are duplicate, don't insert, if the tickers dupe, don't insert
hmm, ok, I think you've got the tables backwards
probably
screener_ticker is the one that will have the correct existing IDs
temp is the one that will have new stuff
then create your temp from your valid base data
doing that now. So now I have basically a dupe of the current (valid) data, and I need to take everything from the temp table and stuff it in, handling the various conflicts.
lemme go back to your query again π
π
ugh, apparently you can't use multiple columns in the on conflict clause
Meh, just make it a where
INSERT INTO deleted_tickers_temp ( SELECT * from new_data_table WHERE id not in (SELECT id from deleted_tickers_temp) AND ticker not in (select ticker from deleted_Tickers_Temp));
doesn't that search the whole table repeatedly?
depends on the optimizer
also, try just a plain ON CONFLICT DO NOTHING
you may not have to specify a constraint
welp ... that ain't right lol
lfa_pg_test=# drop table deleted_tickers_temp;
DROP TABLE
lfa_pg_test=# CREATE TABLE deleted_tickers_temp AS ( SELECT * FROM screener_ticker );
SELECT 14376
lfa_pg_test=# INSERT INTO deleted_tickers_temp SELECT * from screener_ticker_temp ON CONFLICT DO NOTHING;
INSERT 0 14390
lfa_pg_test=# select count(*) from deleted_tickers_temp ;
count
-------
28766
(1 row)
lfa_pg_test=#
What's wrong about it?
it created 14,000+ duplicates π
yikes
Where'd the constraints go?
oh
Your query doesn't have ()
it's possible the on conflict is applying to the inner query
looks like CREATE TABLE AS didn't copy the constraints
I think I know how to resolve that
I need to be going, but good luck!
no prob - thanks for your help!
oye this query will be the death of me.
I was wondering if anyone can help me with how to search or lookup in SQLite3 with python
@rich trout that 45s query down to 700ms or so after removing all strings, cheers
still slow asf for 10m rows but yeah i guess
I've posted a question in help-7 that i think i maybe should have posted here, but any help is appreciated π
Is it a bad idea to have two different type of databases in one same project? I.e. Postgres and mongo.
Reason: My project right now uses postgres for it's data. But I have a specific type of data which comes from another source that i'm formatting that has been a huge pain until today to organize in something like postgres, a lot of tables, references, etc which breaks my brain.
can you use a jsonb column for that data?
I can, but I'm not sure if it's a good idea. As I need to query specific values of that data
It's not a great idea to mix them, because it significantly increases your operational complexity and causes you to have to manage two different datastores separately. But if it's going to be an extreme pain to manage in a RDBMS then perhaps a secondary "document store" could solve your problem. Code that works is better than pretty code that does not.
But I would suggest taking a second look or opinion on how hard it is to organize the data into postgres, as there could be some easy savings overlooked
please what is the equivalent for decimal(8,2) in SqlAlchemy?
I'm declaring models for tables
Would
db.Numeric(8,2)
suffice?
@rich trout Well the data in both databases don't really interact with each other at all, they're separated
It's just a lot of data that sounds painful to keep updated, lot of tables too
Anyone able to help me get mongodb working. I am having issues connecting to my cluster from python. this is the rror.
pymongo.errors.ServerSelectionTimeoutError: SSL handshake failed: cluster0-shard-00-01-ltipa.gcp.mongodb.net:27017: [WinError 10054] An existing connection was forcibly closed by the remote host,SSL handshake failed: cluster0-shard-00-02-ltipa.gcp.mongodb.net:27017: [WinError 10054] An existing connection was forcibly closed by the remote host,SSL handshake failed: cluster0-shard-00-00-ltipa.gcp.mongodb.net:27017: _ssl.c:1039: The handshake operation timed out
figured it out, guess my work is blocking port 27017f or outbound.
hi can someone please explain this sql query to me
Select A., B. From A Full Join B on A.key = B.key Where B.Key is null or A.key is null```
specifically, the last bit. I don't understand what checking if A.key or B.key is null does?
For context, A and B have identical table schema but values for the primary key may or may not be different between them
this is meant to find differences?
If memory serves me right, you may have two columns where the data is null, and it could happen a lot. so you wouldn't want it to match one very one, as null is not very unique.
how can I refer to my tables composite primary key? it's made of two fields
so how do I join on it?
e.g join on table1.primarykey = table2.primarykey ?
ON Table1.Key1 = Table2.Key1 AND Table1.Key2 = Table2.Key2
would this work? by .Key1/.Key2 is just that the column?
Query1 = "SELECT Symbol, Fieldname, a.Value as Value1, b.Value as Value2 FROM '{}' as a JOIN '{}' as b USING(Symbol,FieldName) WHERE a.Value != b.Value".format(tablename1,tablename2)```
Can someone explain to me why this has the same query speed whether I make Symbol,FieldName my primary composite key in the table or not?
I have two Symbol,FieldName,Value tables where Symbol+FieldName is unique (primary key)
and i want to find the different values for Value
how can I make this query faster, I thought it would be if I define these as my composite key in table?
optimizing querys depends on a lot of things, like what other data is stored in the table, the order of the columns, the amount of data etc.
But I do see something that should be addressed
using .format() to format SQL queries puts you at rist of SQL injection attack unless you're doing aggressive input validation yourself
You should use a library that can do parametrized queries with prepared statements
The pymysql library has a simple api for this, here's an example: https://stackoverflow.com/questions/775296/mysql-parameterized-queries
Almost all other databases and client libraries support similar things if you're not using mysql
Yes
ok ive used that before on other statements, ill make sure im consistent
If your application doesnt accept input from users there's no danger, but it's good practice anyway
im just trying to work out whether SQL is what I want for my project
and im not sure if its too slow or if my queries are just bad
since im new to it
a major slowdown with SQLite has to do with the order of columns
if you have a column that is type BLOB in the middle of the table and the objects are very large it will hurt performance as it has to scan over those object every time
my columns are all VARCHAR(255) at the moment
How many columns?
5, 4 of which are unique and can make a composite key
(BLOB was just an example btw, any large field will impact performance)
I would just recommend using an AUTOINCREMENT as PK anyway
im using a Attribute Entity Value schema someone recommended me
or maybe its entity attribute value
well
its basically
I have two tables, both of which should be identical but in reality aren't due to various issues
they're all for stock market symbols btw
each stock market symbol (lke AAPL, MSFT)
has fields
so
AAPL 52WeekHigh 34.5
AAPL 52WeekLow 23.4
MSFT Close 30.3
so I have these in tabe1, but I also have them in table2 but the values might be slightly different
I want to get the differences all in one table
like
AAPL 52WeekHigh 34.5 56.4
so
symbol field value1 value2
make sense?
and symbol field is a primary key
its unique
Sure so you want to join on the identical PKs and detect discrepancies?
yes
but
my question is
I was stupid and didn't define my primary KEy in my table schema
and I was using this query: python Query1 = "SELECT Symbol, Fieldname, a.Value as Value1, b.Value as Value2 FROM '{}' as a JOIN '{}' as b USING(Symbol,FieldName) WHERE a.Value != b.Value".format(tablename1,tablename2)
and I changed it so I defined my primary key as a composite of Symbol,FieldName
but there was no difference in speed
so is that query already using the primary key to index ?
and its not needed?
but yes, what I want to do is join on the identical PK's and detect discrepancies
thats exactly it
now I'm wondering if for an example table with potentially 120,000,000 rows this is way too slow and not feasible?
because that's a potential max use case i may run into
and i have to build my app based on that just in case
I mean 60mil rows is a bit big for SQLite, are you considering putting the data into postres/mysql for the large table?
because in theory it should just iterate through table1 then use index lookup on the PK to see if the same or not
I am not, I wanted to use SQLite so its running locally and not on a port/etc on my box
when ive found the differences, I will be making some stats on them e.g how many are different, what are the main differences
and sending this over the network to a remote MySQL database
the SQLITE DB is meant to be temporary - it will get deleted after I generate stats off of the differences
I was using dictionaries before this but I ran out of memory when I got to the large datasets
so now im trying sqlite
as its disk based
I'm not sure then, if it doesn't take too long (less than days) and its a one time ETL it might be "good enough"
it needs to be run once a day every day on a large number of servers π
(which are also doing other things)
i guess I will just download my max use case dataset and try it on there and see how long it takes..
Hmmm, then I might look at something more productionalized, but I'm not sure how to optimize that query as is
yeah its a bit above my pay grade to be honest but if no one else does it it wont get done
its that kind of situation
Β―_(γ)_/Β―
i've got a discord bot that currently uses sqlite with an ORM (peewee) but i want to migrate to postgres and do everything with queries rather than an ORM.
can anyone recommend a sort of best practices guide for this? i have no prior experience with postgres and my other DB experience is with mssql using exclusively sprocs.
related question, looking at some examples of other bots using dbs like these, they seem to have a lot of boiler plate stuff like:
class DateSQL(SQLType):
python = datetime.date
def to_sql(self):
return 'DATE'
defined for all types. is this a requirement?
anyone know if it's possible to capture conflicts when doing an INSERT INTO blah_table ... ON CONFLICT DO UPDATE SET blah=EXCLUDED.blah?
I want to know which rows were updated vs which ones weren't touched.
actually, on that note, I think my query here is just updating all rows since the conflict isn't actually on the row data but rather on whether it's trying to insert a dupe ... in other words, most times when there's a conflict on the constraint in question, the row won't change so I want to DO NOTHING, however if something has changed I want to update it.
Here's my query (Postgres):
CREATE TABLE updated_income_statements AS (
WITH updated AS (
INSERT INTO screener_incomestatement (
...
) SELECT
...
FROM screener_incomestatement_temp ON CONFLICT ON CONSTRAINT unique_income_statement DO UPDATE SET
col1 = EXCLUDED.col1,
col2 = EXCLUDED.col2,
...
RETURNING ticker_id, dimension_id
) SELECT * FROM updated
)
figured this out by adding a WHERE clause ... now I am going the other way, trying to figure out how to also log which ones were inserted lol
anyone know why a 3.8 GB text file inserted into a sqlite3 table is making my DB more than 40 GB ??
is that normal?
What exactly are you using to store it? Also, I would consider simply storing the file path to it in the DB that's on an FTP server, rather than the file itself. But if that's not an option you should be able to store it as a BLOB.
I need to store datas (json/python dictionaries) temporarily (datas that are used in combination with my Postgres db), I don't want to add a column/table to store them because after a short period of time I'll delete them. Is storing them in a local file (using pickle ?) is okay or a bad idea?
Depends on the load. It may be slow if you want to store a lot at once.
Also pickle has it's own issues. It may make sense to use something like raw json.
the datas would be a couple of ids
Then it should be fine.
ok ty
just use json if the data is easily serializable.
that said, there are also things like dataset that lets you literally write to an actual sql db (sqlite or postgres) with the ease of using a dict, or tinydb that uses a flat json file as a "database" but has the ability to query it like you might a true db, etc
those options are probably way overkill for what you are doing currently but nonetheless they are fantastic tools to be aware of for down the road
if I have two tables with identical schema and have 30 columns, there's one primary key. What would bet he most efficient way of finding all column values where the primary key differs between to two tables?
e.g primary key BOB in table 1 might have the 10th column with value 10, but in table 2 its 20
I guess I can do a join on primary key where a.column1 != b.column1 or a.column2 != b.column2 .... a.column30 != b.column30
but that seems a bit silly?
would anyone be willing to help me design a small database from scratch? i know whats going to be stored in it etc. however i need a design write up. So something that shows how i developed the tables from no normal form up to 3rd normal form and an ERD
also wont be online for a couple of hours so if you are willing could you private message me thanks π
I'm using debian 10 and trying to set up mysql (mariadb) on it wjth Python aiomysql. So I can connect to it in CLI with
sudo mysql -u root -h localhost -p (empty password for now just want to connect to it)
but when trying to connect with python I get "cant connect to database" alltho I'm using the same data as in sudo mysql -u root -h localhost -p aka user root host localhost port 3306 passeword empty
ping for reply
Aka I know the code is working because I used it on win. But it wont conmect on linhx vps
Sec
if your code is connecting into a remote host you'll obviously need to specify that
plus you'll need to GRANT ALL ON *.* (or whatever database/table) TO 'user'@'clientip'
no no it's all on local
could be the hostname of your VPS isn't coming through as you'd expect
if you cat /etc/hostname on your VPS what do you get?
your python could be trying to access it as 'user'@'hostname'
rather than being viewed as local host or any
so my config
{
"host": "127.0.0.1",
"port": 3306,
"user": "root",
"password": "",
"database_name": "spectro"
}
I tried "localhost" too no bueno.
The database "spectro" as created trough CLI sudo mysql -u root -h localhost -p and then using mysql queries.
My connect command in python looks like:
async def _get_pool(self):
host = self.database_config.get_key("host")
port = self.database_config.get_key("port")
user = self.database_config.get_key("user")
password = self.database_config.get_key("password")
password = password if password else None
database_name = self.database_config.get_key("database_name")
return await aiomysql.create_pool(host=host, port=port, user=user, password=password,
db=database_name, autocommit=True)
And I run the script with sudo python3 bot.py
Id on't have that much epx with setting databases in linux, first one in fact
I can't test that command as it's not my VPS (it's a client for which I'm creating the script but the database won't connect)
SELECT user,host FROM mysql.user;
It google cloud btw
and see what the host is set to
if you set the host to 'user'@'%' then that user can log in from any IP
I though local is always enabled
create a new user for your app to use rather than root too
yeah but it could be specified as "localhost" and your code is trying to connect in as "<hostname of server>"
yes I planned that but I was wondering why I can't connect with root
oki
oke thanks π
ok im making a database. which stores usernames (unique key), whether they are an admin, whether its their first time logging on, and what parts of a programme they have access too. with this i notice their is a many-to-many relationship that needs resolving with a link table as many users can have access to many different programmes. On top of this different users may have access to different parts e.g. USER1 has access to part A and B whereas USER2 just has access to part A. So how would i make the link in an Entity relationship diagram
would something like this work?
ignore the lists in the Users table