#databases
1 messages Β· Page 28 of 1
No worries.
ok, many things are similar across the board to be honest
are you interested in being an analyst/data scientist? Or more of a database admin/data engineer?
or just use databases in software development?
Both but more Data Engineer now
ok you want to be learning DDL/DML features of SQL, creating tables and views etc.
and also understanding the performance of different queries, different joins, what happens when you join on int vs join on string, how the join differs between two big tables, vs one small table and one big table
also ETL, processing data to make it ready to then load onto a database/data warehouse, e.g. python on unstructured data to turn it into csv
I'm an analyst
so I won't pretend I know too much about data engineering
you can start getting into Terraform etc.
Oh well thank you so much anyway, I will use this as my learning guide
There are books as well as courses
Any in particular you'd recommend?
https://www.oreilly.com/library/view/google-bigquery-the/9781492044451/ I know more about Google Cloud Platform
but see if you can find an equivalent book on AWS
Got it, I'll get to it now
Good luck!
You're welcome!
anyone has a tutorial to get oracle db up and running for remote connections in azure vms i tried setting it up though azure cli but towards the last steps it asks for a password for the db but i have no idea what its set to based on the tutorial Microsoft has on this
sudo su - It asks for a password when i am logged in as oracle user and try to invoke the above command
What's the problem with using enums? π€
does anyone know of an opensource relational database that I can use to practice with?
mysql or postgres
not the system software, a database example that I can use
you mean some example of data itself?
yes
There are endless possibilities a web search away. Here's a section of the MySQL docs with several example datasets for download. But whatever you're looking for probably exists in multiple places. https://dev.mysql.com/doc/index-other.html
guys i need to choose between two database to store a thousands of records in a single table like a table containing thousands of names in a single column. I need help for what to use Mysql or Mongo for such purposes? so that its faster to store and fetch at the same time
Mongo is for lazy devs, making app that will not be (ideally) ever improved later or having next versions. There is less than 1% specific usage case when it could be needing for some stuff beyond relational dbs though. Mongo can flare managably somewhat if u utilize Pydantic very strictly, but eventually u will still face consequences if it is constantly developed app.
Mysql is needed in 97% cases to build a backend that will be of adequate quality and improved later. Fitting for millions/terrabytes of data
thanks for the help bro π
usually people choose between Sqlite and fully fledged db like Mariadb/Postgres/Mysql π
sqlite3 can operate fully locally
Thanks for that info as well.
Not sure if this is the right place, but I'm having some issues with my code I've wrote for interpretting and plotting datasets for my uni physics work. Is anyone able to maybe help me out? I imagine its's fairly basic stuff but I've only been coding a few months.
Looks like they deleted the original message? They wanted the thing as a string instead of as an enum element
sqlalchemy.Enum(native_enum=False) should do the trick
since python 3.11, they is ultra nice option from enum import StrEnum that operates way better as strings, yet serving as enum
yet offers enum functionality. Could have been nice to offer to that person
it is a real string in almost everything, and writeable and comparable and etc as a string, without need for .name/.value stuff
it may fail only in case of exact isinstance(str) smth comparisons, there is some warning written about it in docs
Sounds unrelated to this channel topic, but #python-discussion is the main Python channel. Or open a help thread #βο½how-to-get-help
or maybe #data-science-and-ml
Can anyone with knowledge in SQLAlchemy 2.0 help me with some queries? I'm really really really lost here. The docs only makes me more confused
I have a SQLite table and I want to fzf/fuzzy search any number of columns. There could be multiple readers, but itβs unlikely. Should I ad-hoc return all columns or implement some caching, for example with a local file and flock
All readers are local, I can just pass a file handle with the cached file
Hey guys! I'm looking for good workflows for building databases in Python. I've found working with SQLAlchemy to be typing the same stuff over and over again and kind of a nightmare to change things, and Django is a dream, but understanding concepts like unit of work and some other software architecture concepts that should supposably "Make things easier" hasn't been so fun.
I'm looking for the best place to learn the following as quickly as possible:
- Database migrations
- Organizing and deciding what makes a "unit of work" for SQLAlchemy
I find database migrations to take forever when it seems like it should be something that should be easy (Add a column in an excel sheet, no? π€·ββοΈ)
Trying to find a good general workflow for Python projects, but like, Domain model, sqlalchemy orm, pydantic validators, 3 things to create the same concept in fast api? Seems a little bit like overkill imo π
Wondering how you guys are experiencing this and how to get to a point where it's not confusing as hell.
I only know databses through the Python ORM but need to know how to keep things simple.
Does anyone really know these things like the back of their hand?
I've only done one project each with SQL Alchemy and Django but am finding Django substantially easier when it comes to migrations just working as expected
As for the "typing the same stuff over and over", that just sounds like an IDE issue. In the age of Copilot (better) or CodeWhisperer (completely free) I no longer find that much of a problem
As for organizing and deciding what makes a unit of work, I don't really see a problem there specific to using an ORM so maybe you want to elaborate
Hi i am new to this community, I am currently working on a project from work that involves PySpark. I have some implementations issues, can I ask them here or am I in the wrong channel?
there's a #data-science-and-ml channel too, you might get more people familiar with PySpark there
ohh ok thanks π
hi, i was planning on making an application like "medium" where you can create your own articles. Of course, in a much smaller scale since i believe i would be the only one using it. Which type of database should I use? Because I know SQL but i don't know if it would be appropriate because i don't know how would i query each article and then style it. My first thought was of doing a row for every paragraph but.... i don't think that's a very good idea. Could anyone give me a hand? also i was planning on using django
!pypi wagtail
a cms seems good for this endeavour
If you don't commit with sqlalchemy, does it rollback all the changes you made to the database?
Somewhat off-topic. We have Grafana in our stack and I am trying to find out if there exists a Discord for discussions on Grafana, or perhaps if not, a more general Discord for discussions around data visualization and or "NoSQL"'ish systems.
I would then have a question regarding Variables in Dashboards in Grafana, in particular, whether or not it is possible to use a Variable within the definition of another Variable, and if yes, whether or not Grafana is smart enough to resolve the first Variable before resolving the second Variable.
There is a notion of chained variables in Grafana, but that is not quite it, it seems that chained variables are more a thing revolving around namespace'ish kinda property-path'ish logic.
I found https://docs.sqlalchemy.org/en/20/orm/session_basics.html#what-does-the-session-do and https://docs.sqlalchemy.org/en/20/core/connections.html to be helpful on this question. If you visit these two pages, make sure that you do read the very first paragraphs on either page as well, they already contain key information that points in the right directions, I mean just don't TL;DR-skip over the first paragraphs π
I read it
Just making sure
class BaseCount(models.Model):
property = models.CharField(max_length=32)
subgroup = models.CharField(max_length=16, null=True)
end_time = models.DateTimeField()
value = models.BigIntegerField()
class Meta:
abstract = True
class UserCount(BaseCount):
user = models.ForeignKey(UserProfile, on_delete=models.CASCADE)
realm = models.ForeignKey(Realm, on_delete=models.CASCADE)
class Meta:
# Handles invalid duplicate UserCount data
constraints = [
UniqueConstraint(
fields=["user", "property", "subgroup", "end_time"],
condition=Q(subgroup__isnull=False),
name="unique_user_count",
),
UniqueConstraint(
fields=["user", "property", "end_time"],
condition=Q(subgroup__isnull=True),
name="unique_user_count_null_subgroup",
),
]
# This index dramatically improves the performance of
# aggregating from users to realms
indexes = [
models.Index(
fields=["property", "realm", "end_time"],
name="analytics_usercount_property_realm_id_end_time_591dbec1_idx",
)
]
INSERT INTO analytics_usercount(property, subgroup, end_time, value, realm_id, user_id)
VALUES ('messages_read::hour', NULL, '2024-02-24T20:00:00+00:00'::timestamptz, 4, 2, 11)
ON CONFLICT (property, subgroup, end_time, realm_id, user_id)
WHERE subgroup is NULL
DO UPDATE SET
value = analytics_usercount.value + excluded.value
psycopg2.errors.InvalidColumnReference: there is no unique or exclusion constraint matching the ON CONFLICT specification
It depends on connecting settings.
If u did not turn on auto commit as true, then yes
Thanks for clarifying
Oh I need to ask ChatGPT about my above question, too. Thanks for the reminder.
hey yall im using sqlite to save some values and stuff and for some reason this code gives me this error and i do not understand why...
code:
self.db['cursor', database].execute(f"SELECT {values} FROM {table}{filters}{order_by}")
Error:
Traceback (most recent call last):
File "D:\Python Project\Origins Updater\main.py", line 430, in <module>
open_second_gui()
File "D:\Python Project\Origins Updater\main.py", line 282, in open_second_gui
app = App()
File "D:\Python Project\Origins Updater\main.py", line 229, in __init__
for item in self.forks[self.sqdb.dict_getone('settings', 'fork')]:
File "D:\Python Project\Origins Updater\libs\database.py", line 163, in dict_getone
data = self.fetchone(table, filters=f'itemkey={key}')
File "D:\Python Project\Origins Updater\libs\database.py", line 135, in fetchone
self.db['cursor', database].execute(f"SELECT {values} FROM {table}{filters}")
sqlite3.OperationalError: no such column: fork
Process finished with exit code 1
the query that gets generated if i take f"SELECT {values} FROM {table}{filters}{order_by}" into a print statement:
SELECT * FROM settings WHERE itemkey=fork
chatgpt cant figure it out either
okay i just figured it out. if i put fork in "" it works
Also, please be aware of the dangers of f strings to build dynamic sql
!sql
Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.
Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.
For example, the sqlite3 package supports using ? as a placeholder:
query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)
Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.
See Also
- Python sqlite3 docs - How to use placeholders to bind values in SQL queries
- PEP-249 - A specification of how database libraries in Python should work
(note: using sqlite for this project)
I really feel like I should be using several .db files to keep things organised
for example one with general data that the application uses and one with user data but realistically I don't see why would I, it's just my organisational spirit telling me to
is there a good reason to use seperate databases sometimes? what are benefits of it? (as in cons I see that I lose out on possible joins or whatnot if it is ever needed)
i would keep it in the same database most of the time, otherwise you can't do joins
i think it depends on how useful it is to your project, being able to export and operate on individual partitions without the presence of the other files
and as BillyBobby mentioned, it is generally a very bad idea (and a major security hazard) to do dynamic SQL like that unless you sanitize inputs like crazy and really know what you are doing, this can't be overstated
DBs are already internally organized, you have tables, some dbs (like postgres) also have schemas, I don't think you need a separate file
alright, I'll bite my lips and keep it in one file
How can I make a select in SQLAlchemy 2.0 ORM in table which have a relationship and the attribute of the relationship is a list?
So I have this abstract idea I need to implement for my web app written in Python:
1. Upload various time series as CSV/polars DataFrame from user input to database - done
2. Catalogue each of these inputs in a separate table including metadata on the uploads - in progress
3. Allow user to make a join plan between different time series and define new/derivative columns.
4. Define a domain-specific schema for use in my application that dictates how the data will be treated internally (e.g., there are special columns that my application needs to be defined for certain operations to be valid on it, certain columns/series will be treated differently based on tags in this schema)
5. Reify the specified joined + transformed dataset and save it for later use
6. Save the schema so that it can be passed to downstream objects that will operate on the data
7. Map the reified dataset to the defined schema so that they are always used together when the dataset is selected from the DB
I was wondering if anyone has any suggestions for how to implement 2-4 in particular and whether you think I should enforce a schema definition as part of 1 or defer the schema definition until the data is being prepared for use in a job as the numbering suggests
if anyone has any experience representing database schemas as UI inputs from a web app thatwould be helpful I guess
this sounds like a frontend for the database engine like snowflake, big query etc
what prompted you to make something like this?
Tools for creating datasets for use with reinforcement training models
I'm trying to create a web app that show training and results of reinforcement training applied to time series problems
part of it would be allowing people to define their own datasets from data they upload to the app
I think it makes it a lot more usable
have you tried some existing solution?
Not sure what I can integrate with
I'm developing local for now because GPU costs on cloud too much
I have a home server with a threadripper and a few RTX gpus
you can host a db locally
I do have a postgres db
that's what the app uses
I'm blobbing dataframes and just unblobbing them for joins/column definitions
but I need a way to make it easy to join and apply transformations to the data after the user uploads them
basically I'm trying to generalize a particular use case I had myself and so I need to go back and parameterize things so it can work on arbitrary data with user provided schemas
all the use cases you mentioned can be done with a db and a database manager, don't you think?
You mean if I actually have users do all the DB operations themselves to define the tables they want to use for a job?
Not sure I follow you
I mean yes I have the capability to do things I'm trying to figure out how to abstract it so it can be user configurable
people who want to upload stock time series and other exogeneous data and try to fit reinforcement models for them
so I'm guessing mixed levels of sophistication with DB use
I'm seeing the value provided as being a place to ingest data, prepare it for time series analysis, fit a reinforcement model to it to try to find a trading strategy, and give analytics on the results
the thing is financial data is kind of BYOB
because it's all very commercialized and has proprietary licenses
and people might want to try different types of exogeneous data
so I need to make it a BYO data system
all this sound like what snowflake does, tbh
i'm not affiliated with snwoflake or anything, like i have used it the past
what does snowflake provide besides the cloud DB
easy of use basically
data and compute is seperated
have you heard about datalake architecture?
snowflake is basically that
that's more about how the DB's state is structured isn't it
like how they serve the shards of the DB etc.
you can make something similar with apache iceberg, if you want to do it yourself
but if I replace postgresql with snowflake what does that get me in terms of something like a representation of how to join tables together to create a deferred result table
that's kind of what I'm talking about with making the data user configurable
they upload a bunch of CSVs let's say, then they have a UI to define how to join them
postgres is not a warehouse, the other one is
to summarize upload data, analyze, visualize and make models?
yes
there are tools to already do this, why make one yourself, that was the initial question
- Not sure what's out there
- Don't have a lot of money to spend on development
- low-code or code?
- making one cost $ too
just you use it then?
for now yes
I'm planning on making something usable more generally
like I said I implemented something for my particular use case
with specific logic
now I want to parameterize it to make it more generally useful
and part of that is being able to replace my specific dataset with configurable datasets
also it will save me time in the future too tbh
preparing a dataset can be a timesink
so to upload files to some place you need a file server, something like minio
and make schema, tables on them, maybe apache iceberg
connect to it via some database manager, jupyter or make a custom ui
what does minio do that replaces something like this
@main_app.post("/upload_csv")
async def upload_csv(request: Request):
"""
Uploads a CSV file, saves it to a specified output path,
and inserts its contents into a database as a raw table.
Args:
request (Request): The request object containing the file path,
output path, parse_dates flag, and index_col.
Returns:
dict: A dictionary with the status of the upload process.
"""
if not request.files:
return json({"status": "error"})
input_file = request.files.get("file")
if not input_file:
return json({"status": "error"})
if input_file.type != "text/csv":
return json({"status": "error"})
output_path = request.args.get("output")
parse_dates = bool(int(request.args.get("parse_dates", 0)))
file_path = os.path.join(app_path, "data", output_path)
logger.info("Uploading %s to %s", input_file.name, file_path)
with open(file_path, "w", encoding="utf-8") as f:
f.write(input_file.body.decode("utf-8"))
logger.info("Uploaded %s to %s", input_file.name, file_path)
df = pl.read_csv(file_path, try_parse_dates=parse_dates)
db.RawData.insert(df, output_path)
logger.info("Inserted %s into database as raw table", output_path)
return json({"status": "success"})
the db here is postgres?
ya
how would you deal with files with bad schema?
I need to get rid of the output path as an arg too that's just a testing thing mostly
but good question
the common pattern for this would be to save the raw data as it is and then separately treat them afterwards
I see
this is where datalake comes in
this could be useful
https://dataengineering.wiki/Concepts/Medallion+Architecture
thanks
ignore the products and branding
even if it doesn't turn out to be useful for anyone but me I still think this is a valuable learning exercise for practical skills
I think knowing how to do this stuff yourself is a valuable skillset
seeing as everyone wants to sell it
I'm looking into MinIO now though
yeah i am aware but in this case i am not too worried about it since it will be used as a local only database for the user itself to save settings and stuff. And using f strings to fromat just makes my code easier in this case π
I see so I will replace Postgresql with MinIO and use Kubernetes
it's still a bad habit to get in to, you really should avoid it as far as you can, but you do you π€·
agree, its just about how much time you have at hand
either you solve a original problem or you build a platform
unlikely you have time to do everything at once
do look at alternatives too, i said minio because it was part of kubeflow and i have used it before
Looks like you are trying to save files.
While storing the file itself is commonly done on a blob/object store, you may still need postgres around to manage their metadata
CHROME_PATH_LOCAL_STATE = os.path.normpath(
r"%s\AppData\Local\Microsoft\Edge\User Data\Local State" % (os.environ['USERPROFILE']))
CHROME_PATH = os.path.normpath(r"%s\AppData\Local\Microsoft\Edge\User Data" % (os.environ['USERPROFILE']))
``` why is it this somehow causes a file to get written to with the information as it should, but when the file is sent to a webhook it sends an empty file with the filename
i have the EXACT same copy paste code for google chrome and opera gx, same send to discord function but they both send the full files
CHROME_PATH_LOCAL_STATE = os.path.normpath(
r"%s\AppData\Roaming\Opera Software\Opera GX Stable\Local State" % (os.environ['USERPROFILE']))
CHROME_PATH = os.path.normpath(r"%s\AppData\Roaming\Opera Software" % (os.environ['USERPROFILE']))
CHROME_PATH_LOCAL_STATE = os.path.normpath(
r"%s\AppData\Local\Google\Chrome\User Data\Local State" % (os.environ['USERPROFILE']))
CHROME_PATH = os.path.normpath(r"%s\AppData\Local\Google\Chrome\User Data" % (os.environ['USERPROFILE']))
``` this is the google chrome and opera gx versions and they both work fine
Are you familiar with https://pypi.org/project/platformdirs/?
(might make some of that path stuff easier)
Also, this seems nothing to do with this channel. #python-discussion would be a better place to start.
Any one here good with sqlmodel? I'm having a lot of problems getting multiple Relationships in the same table working
you should ask, somebody might know
class User(SQLModel, table=True):
__tablename__ = "botsettings_user"
id: Optional[str] = Field(default=None, primary_key=True)
servers: List["Guild"] = Relationship(back_populates="owner", sa_relationship_kwargs={'cascade': 'all, delete'})
premium_servers: List["Guild"] = Relationship(back_populates="premium_owner", sa_relationship_kwargs={"cascade": ""})
class Guild(SQLModel, table=True):
__tablename__ = "botsettings_guild"
id: Optional[str] = Field(default=None, primary_key=True)
owner_id: str = Field(foreign_key="botsettings_user.id")
owner: User = Relationship(back_populates="servers", sa_relationship_kwargs=dict(foreign_keys=["owner_id"]))
premium_owner_id: Optional[str] = Field(foreign_key="botsettings_user.id")
premium_owner: Optional[User] = Relationship(back_populates="premium_servers", sa_relationship_kwargs=dict(foreign_keys=["premium_owner_id"]))
Im getting sqlalchemy.exc.AmbiguousForeignKeysError: Could not determine join condition between parent/child tables on relationship User.servers - there are multiple foreign key paths linking the tables.
so like each user has a guild and a guild should have an owner?
ya each user can have many guilds and each guild requires an owner
also the list of guilds in the user, would violate the primary key condition right?
i guess a mapping table to store user and guild info would makes sense
trying to convert from a Django model
something like User -> UserGuild <- Guild
user guild would save foreign keys to user and guild
db is already made using django, just now trying to make a sqlmodel maping
the django model looked like that? 
Looks like:
class User(models.Model):
id = models.CharField(max_length=20, primary_key=True)
class Guild(models.Model):
id = models.CharField(max_length=20, primary_key=True)
owner = models.ForeignKey("User", on_delete=models.CASCADE)
premium_owner = models.ForeignKey("User", related_name="premium_owner", null=True, blank=True, on_delete=models.SET_NULL)
this looks good, why not maintain the same model in sqlmodel?
Not sure what you mean
maintain this same schema in sqlmodel
this looks totally different
what im asking help to do
Hello!
I need someone who can build an Asterisk server.
Please DM me.
it all works happily in sqlmodel if i dont care about the Relationship's and remove them, but then i cant delete User's do to the ForeignKey's
class User(SQLModel, table=True):
__tablename__ = "botsettings_user"
id: Optional[str] = Field(default=None, primary_key=True)
class Guild(SQLModel, table=True):
__tablename__ = "botsettings_guild"
id: Optional[str] = Field(default=None, primary_key=True)
owner: User = Relationship(
back_populates="servers", sa_relationship_kwargs=dict(foreign_keys=["owner_id"])
)
premium_owner: Optional[User] = Relationship(
back_populates="premium_servers",
sa_relationship_kwargs=dict(foreign_keys=["premium_owner_id"]),
)
something like this would be the equivalent right?
i haven't used sqlmodel yet, might have to lookup the docs, but this seem like a valid schema
hmm, how did this work in django then?
ah i see, in the django model it'll become null on user delete
I have an interesting issue. Async reads were working on my native install of my application and postgres but when I try to run my app in a container I get a message like this:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/src/sanic_vec_env.py", line 61, in _worker
env = _patch_env(env_fn_wrapper.var())
File "/workspace/src/trader.py", line 135, in __init__
self.data = db.StdCVData.read(table_name, chunk).sort("date")
File "/workspace/src/db.py", line 588, in read
blob = cls.get(table_name, chunk)
File "/workspace/src/db.py", line 610, in get
fetched = cls.conn.execute(query).fetchone()
...
File "/opt/conda/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
self.dialect.do_execute(
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) lost synchronization with server: got message type "5", length 808464432
Do I need to use asyncio and the async sqlalchemy API here so I don't lose synchronization or could it be something else?
show your code if possible.
without more details, i am guessing you create a connection that's shared between different multiprocessing owned process, just a guess.
It's hard to share the full context without a huge dump but yes that's probably happening
let me post some bits that I think are relevant
class CVData:
conn = sa.create_engine(CONN).connect()
metadata = sa.MetaData()
metadata.reflect(bind=conn)
try:
table = sa.Table("cv_data", metadata, autoload_with=conn)
except sa.exc.NoSuchTableError:
table = sa.Table(
"cv_data",
metadata,
sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
sa.Column("table_name", sa.String),
sa.Column("chunk", sa.Integer),
sa.Column("data", sa.LargeBinary),
)
@classmethod
def get(cls, table_name: str, i: int) -> bytes:
"""
Retrieve CV data by table name and chunk index.
Args:
table_name (str): The name of the table.
i (int): The chunk index.
Returns:
bytes: The CV data.
Raises:
AssertionError: If data is not found for the given table name and
chunk index.
"""
table = cls.table
query = (
sa.select(table.c.data)
.where(table.c.table_name == table_name)
.where(table.c.chunk == i)
)
fetched = cls.conn.execute(query).fetchone()
assert fetched is not None, f"Data not found for {table_name} chunk {i}"
res = fetched[0]
return res
@classmethod
def read(cls, table_name: str, i: int) -> pl.LazyFrame:
"""
Read a chunked table from the database.
Args:
table_name (str): The name of the table.
i (int): The index of the chunk.
Returns:
pd.DataFrame: The chunked table as a pandas DataFrame.
"""
blob = cls.get(table_name, i)
df = cloudpickle.loads(blob)
return df.lazy()
just lower your multiprocessing worker count to 1 and retry, if it doesn't blow up then it's like 90% confirmed
ah yeah I already didt hat and it worked
just trying to figure out how to fix it
so this is what's happening:
def __init__(
self,
env_fns: List[Callable[[], gym.Env]],
sanic_app: Sanic,
jobname: str,
start_method: Optional[str] = None,
):
self.waiting = False
self.closed = False
n_envs = len(env_fns)
if start_method is None:
start_method = "spawn"
ctx = sanic_app.shared_ctx.mp_ctx
pipes = [ctx.Pipe() for _ in range(n_envs)]
self.remotes, self.work_remotes = zip(*pipes)
for work_remote, remote, env_fn in zip(
self.work_remotes, self.remotes, env_fns
):
hex_name = md5(
str(uniform(0, 1)).encode(), # nosec
usedforsecurity=False,
).hexdigest()
worker_name = f"RLVecEnv{hex_name[:10]}"
kwargs = {
"remote": work_remote,
"parent_remote": remote,
"env_fn_wrapper": CloudpickleWrapper(env_fn),
}
sanic_app.m.manage(
ident=worker_name,
func=_worker,
kwargs=kwargs,
workers=1,
restartable=True,
)
env_fn is actually a class definition and the callable returns the instance. the init connects to the db to read some data
this is happening in async across n cpus
so it spawns like 64 processes asynchronously let's say
they all start reading from the DB
when the env_fn is actually called
Sanic is an async http server
all this stuff worked NATIVE though
it's only when I moved the app (but not postgres) to a container and connected to the host postgres service that I started getting this error
and in native is the worker count > 1?
yea
that's odd..
agree, I thought it was a postgres security for external connections at first
then I read the traceback
in any case, you can always defer the creation of connection until your worker actually uses it.
right now you are creating the connection upon class definition.
there are probably other ways of fixing this (e.g. make all process go through the import/class definition process, iirc that's possible), but i think the above is cleaner.
I wonder if it "didn't lose synchronization" because the hardware was all running on the same kernel
but when I switched to the container the app has its own kernel now right
can I fix this using async though?
I know you can do async engines with sqlalchemy
i wouldn't even try, since you are using multiprocessing already π€·
i am curious, can you do a little multiprocessing.get_start_method() and report what it shows you?
my guess is native shows spawn, dockerised shows fork.
i wouldn't even try, since you are using multiprocessing already
i say this because by using multiprocessing, multiple python interpeters would be spun up, and then subsequently potentially using the same python object for db connection, i don't think this is particularly safe, nor asyncio will help you in anyway here, because the interpreters would just fight for control of that python object, with 0 coordination or intent of collaboration. (well this is just my hand wavy way of explaining this without me actually knowing EXACTLY what's happening under the hood)
dfs = asyncio.gather(*[cls.read(table_name, i) for i in range(chunk)])
std_df = pl.concat([df.collect() for df in await dfs])
I didn't know that you could use await like this
nice
what is happening here? you're using polars in async with batching?
doesn't polars automatically do a connection pool and get it in parallel?
just specualting, now sure
again reading this code, the collect is in the second line, so the actual work happens only after that, the async section is not likely doing anything, other than batching or creating connections, so idk π
you should just make something single thread, measure performance and then try to optimize it
I was trying to fix an issue with concurrency in the container
that only happens when using the container
(the db is still on the host)
so i wanted to see if using the async API would help with losing synchronization with the db on the host
even in containers everything runs on the same kernel as the host os
if it's running on it's own kernel no longer a container but rather virtualization through a hypervisor of some kind
how i can input the name of table and the other columns by ask user do it
con = sqlite3.connect('data.db')
nameT = self.ui.table.text()
col_1 = self.ui.culonm_1.text() #+ self.typeC_1.text()
col_2 = self.ui.culonm_2.text() #+ self.typeC_1.text()
col_3 = self.ui.culonm_3.text() #+ self.typeC_1.text()
col_4 = self.ui.culonm_4.text() #+ self.typeC_1.text()
st = f""" CREATE TABLE '{nameT}' (
id INTEGER PRIMARY KEY,'{col_1}', '{col_2}', '{col_3}', '{col_4}');"""
crs = con.cursor()
#crs.execute(cle)
crs.execute(st)
i can do this ?
!sql-fstring
Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.
Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.
For example, the sqlite3 package supports using ? as a placeholder:
query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)
Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.
See Also
- Python sqlite3 docs - How to use placeholders to bind values in SQL queries
- PEP-249 - A specification of how database libraries in Python should work
iirc it doesn't allows for you to specify table/column names though, so you may as well just verify that it is within a set of allowed values then use normal string formatting techniques
that said, this is mostly for when you are running untrusted user input against a database they do not have direct access to.
If you're using a local SQLite file as the database, and the user is inputting directly on the same device + has full access to the database file anyway, it is not as important, but you must be careful not to create a bad habit of it
this is something that is very dangerous to do and you need to sanitize input data rigorously and know exactly what you are doing if you even consider going this route
one way is to match the input as an exact match against a key in a dictionary or an element in a tuple or a list of allowed values for that exact item/part of the SQL statement and use that as the string in to the SQL statement
SQL injection π
So reading a single table/query concurrently?
yes
well the reason it's doing it concurrently is that all those threads are doing work on the data concurrently
I was originally passing the data to the spawned threads via pipes
that was super slow
so now they read concurrently from the DB
it worked without container but when put in a container vs host OS DB it started to have issues
so I went in and made alot of changes
I probably should have changed less and tested it without async first
but it's working now so will probably just keep it
maybe improve the async over time
Could we take a step back and understand what you're trying to do?
I'm using vectorized environments to train a reinforcement learning model in parallel on an asynchronous webserver that will provide a UI for data selection and processing as well as analytics on the model fitting and the out of sample results
the environments need to get data from the db to do their job
they run in parallel
the reason they are running in parallel is so that I can do a parallel search of the parameter space of the model
the action space to be more precise
Each activity is a sequence of steps which are independent from each other? They all run in parallel and do their own stuff?
Why does this sound like a data pipeline to me?
they are running independent searches of the action space for each batch and then their results get aggregated when calculating the policy gradient
there's a vectorized learning environment API from stable baselines 3 that I'm using
Ya like it can do it's own work
yea
Whatever that could, get data, pre process, train model, evaluate, predict? Something in these lines?
yes
Bruh thats a data pipeline
but what the solution if i want to ask user to input the name of table and columns
Specifically it becomes an ml pipeline here
yeah but it's managed through a web server UI
so it's interactive
you can define different models interactively
that's the point
Ya that can be done
for a new table or for one of the existing tables that you know exist already?
Yes that can be done too
new
is your point that I should be doing something else?
Look at some pipeline orchestrating tool like prefect, dagster, airflow, kubeflow etc
You don't have to do this by hand, plenty of tools available
will do although I will say I'm having fun
waiting to start new job and have some time
This reminds of myself making an etl tool when IT said you can't use external packages when I started out
Totally knock yourself out
so what do those pipeline orchestration tools provide per se
what in my code would they replace
I haven't implemented the full general case yet btw that's what I'm working on now
Makes the pipeline process easier, making the dag itself, isolation of process, concurrency and parallelism etc
Most have nice ui, track previous runs stats etc
Just try something small like prefect, it's just a pip install away
now you are really treading into extra dangerous ground as you can no longer match the name against known good/safe value π¬
yeah I'm looking at it
ty
if you decide to continue on this path regardless of our repeated warnings you need to take several steps to try to safeguard your security
@vernal shore what database are you planing to use and is it for administrating a database on the users own system or on a remote system?
on the users own system (local application) or on a remote system relative to the system the user is sitting in front of or owns?
local
wait i will tell u
i create a simple app with pyqt5 and sqlite3
to create a table and ask the user to input the name of table and columns
wait i will show u
oh, then it's much less of an issue as it's local on the users own system where they have full access to the database anyways and if they would successfully exploit the application they would only be exploiting themself and do what they could have done directly in the database if they have the knowledge
just don't make it a habit so that you forget about the dangers of SQL injection when developing other types of applications and do something even remotely similar there
sa you are working with an embedded database that runs within the application itself you don't have to worry that the user might use the database to elevate privileges on the local system either
so i think you are rather safe in this instance π
now u have solutin for me ?
as security is much less of an issue in this specific scenario you can do what you where planing on
or you can use a "SQL Query Builder" like SQLBuilder or PyPika (there are others as well) that hopefully has defenses against SQL injection and takes care of protecting you from such things already
ty
u can come to voice chat 0 ?
no, sorry, i don't do voice chat and generally not DMs or FRs either
ok ty
are you using windows?
which is better
make a lot of small sql queries to get different data
or make one big query then filter the data in python
?
by big i mean the amount if data it would return
Depends on all the details. Why not test and find out?
the container and the db are running on a fedora server on my LAN, I am sending requests from a Windows machine to the container
I got the container working in docker but when I moved the app and db to kubernetes now I'm getting connection forcibly closed by peer
so the config now is db and app are both running on minikube cluster hosted on Fedora server
everything works, but long requests get forcibly closed
gotcha.
if you were curious why it worked on your windows machine but not linux after it's been dockerised, it's because the default start method of multiprocessing on windows/mac is spawn, meaning your db connection is never shared between workers. whereas on linux, the default is fork which means whatever you have defined prior to starting the workers could be shared.
and sharing a db connection that's not designed to be thread/process-safe would potentially yield such weird errors without a clear explaination of what actually went wrong - that's what you were seeing when running in a dockerised environment
thanks
I was running it in Linux without the container though
I am just using windows as the web client
the web host went from being the baremetal linux host to the container running on linux while the db host remained on baremetal linux
most of the time you don't want to dump massive amounts of data out of your database just to throw most of it away in the application
it's often better if you can at least pre-filter it as much as possible in the database
maybe you can come up with a way to do it so that you don't need to do lots and lots of small queries either
right, i assumed wrongly. i assumed originally when you meant "native" or "host", it meant windows, and anything else is just dockerised/linux.
anyway, next time you see an issue that might be related to how multirpcoessing spawn process , you can chuck a multiprocessing.get_start_method() in and see what it gets.
i believe you will see fork instead of a spawn.
though again if it were me, i would just defer the creation of db connection to after the worker processes has started.
(i think you might have a different issue if you are seeing long requests getting forcibly closed, so you can very well just ignore me)
that's what I ended up doing
I think they were two separate issues
I face different sets of issues when going from baremetal linux -> container -> kubernetes
think the last one is some kubernetes config issue
but not sure
do you have 2+ machines?
I have 2 machines yes
the windows one I'm typing on now and a headless homeserver running fedora
hmm all of them are used for compute?
no just the fedora one is being used for compute
π
whether it's in the form of baremetal, docker container, or kubernetes cluster
the thing that changes when I go from kubernetes are #1 I'm using the db in a pod instead of on the baremetal host #2 using kubernetes pod network instead of host network
but I think the latest error is more likely to do with the latter
maybe also an nginx thing since I'm reverse proxying the cluster to the host network
so I can access it from windows
I kinda want to post everything for a code review but it's not "complete yet" and I doubt people would want to spend the time understanding my application architecture or why I did things the way I did them
how long are the long requests btw?
long requests in what sense as well?
DB-related or your HTTP API?
so what I'm doing right now and should probably change to a lazy representation is having the training of a model be a request and it returns when it's done training
so it's like a 30 minute request
I should probably just make it so it creates a process that is managing the training in the background
and returns when it's done with that
it works on baremetal and container though
I'm sure it's some default setting I'm running afoul of in kubernetes but that means that I'm also doing things well out of the ordinary
you would need to keep the connection alive with some periodic traffic if you are running that long of an operation or you risk getting the connection killed after for example five minutes or some other arbitrary timer
I'm thinking the more elegant pattern is to have the request just start a process and having an API for checking in on the process
rather than waiting for it to finish to return
many times you would want to do this asynchronously and just let the client poll at intervals to see if the job with a certain job id has completed or maybe implement webhooks or something to send a notification when it's done
right I think that's what it should be
I'm learning web dev though so starting from very simple implementations and a place of ignorance
like I know it's not good yet but I'm not sure what good looks like
have some ideas
but need to figure out how to implement them
many people use celery for such jobs on the server, even if i don't care much for celery myself
why don't you like it?
it's probably just because the implementations i've seen it used in that i really didn't like
@main_app.get("/start")
async def start_handler(request: Request):
"""
Handles the start request for training.
Args:
request (Request): The request object containing the run parameters.
Returns:
A redirect to the summary page.
"""
args = request.args
logger.info("Starting training")
table_name = args.get("table_name")
jobname = args.get("jobname", "default")
start_i = int(args.get("i", 0))
cv_periods = await db.CVData.get_cv_no_chunks(table_name)
await db.Jobs.add(jobname)
logger.info(
"Starting training for %s with cv_periods=%s",
table_name,
cv_periods,
)
for i in range(start_i, cv_periods - 1):
logger.info("Starting training on fold %i of %i", i, cv_periods)
await train_cv_period(args, i, request)
await validate_cv_period(args, i)
redirect_summary = request.app.url_for(
"get_job_summary", jobname=jobname, table_name=table_name
)
return redirect(redirect_summary)
so this is the bad boy request that takes a very long time to return
I want to move this part out to a managed process or coroutine:
for i in range(start_i, cv_periods - 1):
logger.info("Starting training on fold %i of %i", i, cv_periods)
await train_cv_period(args, i, request)
await validate_cv_period(args, i)
redirect_summary = request.app.url_for(
"get_job_summary", jobname=jobname, table_name=table_name
)
return redirect(redirect_summary)
and just have the original return some handle for it that gets stored
an async task or a subprocess etc.
I prefer async task
jobname can handle the labeling
I'm not sure if I need anything besides the event loop I already have
i would have it in an external process rather then a coroutine so that you can restart the web server when ever you want without killing the long running job
then the server cant manage its resources though
or can it
I guess what I'd have to do is keep the record of what's running in persistent storage
depends on what you mean with "the server" and "manage its resources"
just check if the GPU is already being used for a job that was started previously
I guess that's where the DB comes in
yes, a database with backing on disk comes to mind
I want to have a more sophisticated GPU VRAM management system eventually too
so it knows if a job can fit before it tries to put it in the GPU
but that's pie in the sky for now
you could have a job queue
I don't know how to get any of the existing resource management stuff to work with GPUs
like I think the GPU will just eagerly try to put everything into VRAM
@main_app.get("/start")
async def start_handler(request: Request):
args = request.args
logger.info("Starting training")
table_name = args.get("table_name")
jobname = args.get("jobname", "default")
cv_periods = await db.CVData.get_cv_no_chunks(table_name)
await db.Jobs.add(jobname)
logger.info(
"Starting training for %s with cv_periods=%s",
table_name,
cv_periods,
)
coro = manage_training(args, request, cv_periods)
asyncio.get_event_loop().create_task(coro)
return json({"status": "success"})
async def manage_training(args, request, cv_periods):
table_name = args.get("table_name")
jobname = args.get("jobname", "default")
start_i = int(args.get("i", 0))
for i in range(start_i, cv_periods - 1):
logger.info("Starting training on fold %i of %i", i, cv_periods)
await train_cv_period(args, i, request)
await validate_cv_period(args, i)
redirect_summary = request.app.url_for(
"get_job_summary", jobname=jobname, table_name=table_name
)
return redirect(redirect_summary)
so I guess the question is now what happens with that redirect
anyway think this is decently offtopic for dbs now, thanks everyone for the help
maybe #async-and-concurrency ?
for x in characters:
name = x[0].replace(" ", "_")
def column_exists():
c.execute("SELECT * FROM pragma_table_info('users') WHERE name = ?", (name,))
return c.fetchone() is not None
if not column_exists():
c.execute(f"ALTER TABLE users ADD COLUMN ? INTEGER PRIMARY KEY AUTOINCREMENT", (name,))
gives an error:
sqlite3.OperationalError: near "?": syntax error
SQLite does not allows for you to use placeholders for table and column names, these should be fixed, static, ideally hardcoded
is there any way i could add a column to an existing table then?
i mean i can but
i also am forced to check if the column already exists because il be constantly adding more
the "professional" solution for that would typically be database migration tools
oh, so sqlite just doesnt work with this?
just throw it in a separate file and run it manually
You should not have to update your tables often?
you'll have to edit the code whenever you update the tables anyway
fair
@jade wing i get the solution of my problam
in PyQt5 we have qtsql
there is many statement there for sqlite
ty for help me
In this situation you can use an f-string to insert the table name. But be very careful that the string is safe. VERY
im aware ab SQL injections, but this is a very small project for me and my friends only
Cool. Plus if you can audit that the strings come from your own code it's fine.
I want to know if this is conceptually possible. Say you are using a buying/selling third party website. You list a bunch of items you want to sell from your account. Is there a way for your account on this third party website to interact with a database that you created? Such as when you create a list of items you want to sell, those items get inserted to a table. When you remove an item from the front-end, it also reflects that deletion on the database.
It would depend on the website and how their APIs work. Your description seems to assume that the third party website would trigger the call outs which is unlikely but not impossible. More likely, your own app would have to poll or scrape to periodically update your database
If the third party sends detailed update emails, that's another approach but not ideal
Polling was my idea too. I have a function that would send a GET request to grab a list of information and it would populate a table. I would get an error for a duplicate entry. Do you think writing a query that deletes all entires the table and then re-populate would be a great idea?
No, don't delete data, you can either UPSERT or just check for the data before trying to write
Yeah but a user may delete an item from the front-end so that has to be reflected in the database
Thatβs why I delete and repopulate in case a user deletes something from the front-end
You can just delete what needs to be deleted instead of deleting everything
If you're going to constantly delete your whole database then you probably don't need a database
This is the crux of the problem
In practicality, yes I could manually delete a row that has to be deleted. The secondary issue is if I go on vacation, my co-workers donβt know how to access our database and delete data
But yes you are right.
Why manually and not programatically?
To be more explicit... If a record exists in your data but not in the latest pull, then you know it needs to be deleted, right?
Yeah but I donβt know how to write a query for that
And when I pull and insert data I get the SQL duplicate primary key error anyway
These are both trivial issues. If ChatGPT can't solve them, put your best shot in a help channel
Good Morning,
regarding TimescaleDB I read this article https://www.timescale.com/blog/time-series-analytics-for-postgresql-introducing-the-timescale-analytics-project/ and just copy pasted the query SELECT lttb_downsample(time, value, num_buckets=>500) FROM data; into our Grafana and adjusted the name of our time series hypertable.
It fails with the error function lttb_downsample(timestamp with time zone, real, num_buckets => integer) does not exist. Does someone know a reason for this?
There's a similar function in timescaledb, maybe you needed that? https://docs.timescale.com/api/latest/hyperfunctions/downsampling/#lttb
Thank you for this hint!
Guys, is it a good idea to reuse ID's for user id i the user record was deleted?
No, not a good idea. Try to use UUIDs instead. Databases like PostgreSQL have native support for UUIDs. Also most NoSQL systems are used to deal with them efficiently.
PostgreSQL does have an SERIAL datatype which should do for a unique id. https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-SERIAL
A question about constructing a SQL query using python (Oracle dialect if it matters):
Say i have a column named PROVISIONED_AT which can be either a timestamp or null (not provisioned yet). A query needs to filter out the active requests, which means that column is null. How can I construct a sql statement that selects the active/expired rows based on the boolean value of the filter active?
Ideally I believe it's best to not use string concatenation to construct the query, as it will hinder DB optimization (correct me if I am wrong)
Hopefully this is the right channel to ask in:
Does anyone have any suggestions on a means of using SQLAlchemy in a similar means to MERGE?
I am trying to update a table that contains approximately 50,000 rows of Parts with a table that contains only Updated Parts. If I were to write raw SQL I could execute a merge statement and be done with it, but I'd like to try and do this by means of ORM.
Any suggestions?
Select * from table where column is null is answer to first part of question. Didnβt follow the last part
I donβt use sqlalchemy, but if you want a better search criteria, search for sqlalchemy upsert (the broader term for this operation)
No, what I meant is that, the python variable active can have 3 possible values: None, True, or False. If it's None, then this where clause is not needed. If it's True, then the where clause should be column is null. If it's False, the where clause should be column is not null. How can I alter this where clause based on the value of the inbind variable active?
I am not writing 3 separate queries because there are many other where conditions and I hope to chain this one with and
So you want β(field = value and field2 is null) or (field=value and field2 is not null)β
I am guessing where ... and ((:active is null) or (:active = 1 and column is null) or (:active = 0 and column is not null))
Can a bind variable be mapped to both null and 1/0 like that?
I guess thatβd work. A bit weird, normally Iβd hand that condition in code, in something like this
Hello chat , I need suggestions for a sqlachemy database for flask . Iβm working on a mechanic scheduler for a project and I was wondering what type of model tables I should have . I currently have a user model for users . User can book appointments based on mechanic availability slot . User can input their zip code and when user submits zip code a list of mechanics pop up with their profile .I want to integrate the Google booking api . I was wondering what other model classes I should have . I have already a login and register page functionality with the sqlalchemy database working . Suggestions thanks..
is it possible in SQLAlchemy load relationship with selectinload() technique but assign attribute with alias?
person: Mapped["Person"] = relationship(
)
packaging_works = relationship(
"PackagingWork",
uselist=True,
viewonly=True,
)
client_packaging_works = relationship(
"PostOperationExtractClientChild",
uselist=True,
viewonly=True,
)
client_person = relationship(
"ClientContactsNotification",
uselist=False,
viewonly=True,
)
and alias client_person as person in statement, because it's a second case and relationship person not loaded
Hey , can i have a suggestion for postgresql book?
New "data scientist" with a messy question.
What are some potential strategies to reverse-engineer the primary-foreign keys that a massive obscure set of databases logically have, when these pk-fk constraints are not actually present in the database, and I do not have access to the code of most of the applications that use the database?
Got a bunch of big old legacy databases to work with, a lot of which is profoundly abstract data with no data definitions that are meaningful. These databases have no foreign keys (or at least available to me in my test/production views of it.). It is part of my role to query these databases for data. I want to know some tactics to reverse-engineer the relationships that the tables should have. At least some of the tables have what are clearly intended to be primary-foreign key relationships, it's just these constraints are not imposed on the database. I am hoping to build an interactive ER diagram in Python to help start addressing this.
My not-experienced self is considering starting like this: compiling a Dataframe in Python with every table and all of their columns. Not all of the PK-FK columns are identically named, but most (80-90%) I've found (so far) are. I would begin by building relationships by checking if column names are identical and having it draw a connection (maybe working with Mermaid ?) For each connection, I would check a random sample of data, and see if there is matching data on the other side. Then, it would check the matching data to see the # of occurrences of that data in each identically named column, to see if I can imply any one-to-one, many-to-many, or one-to-many relationships for it.
I would then collect notes from interacting with the applications and interviewing the developers that maintain them to see if I can include contextual data and snip/mend/update connections.
123
The idea of trying to build a model with no domain/semantic knowledge sounds terrible. But, with domain knowledge, I might then do something like you suggested; explore the dataset to see what the relationships look like and confirm my expectations.
.describe() of each df (or subsets of a df) would be one early step , paying attention to both count and unique
anyone worked with below scenario:
I have source table, and final output csv file. The final output csv file is transformed. I have user who's requesting to have same data as csv file to be inserted into new table. But csv file is already transformed. Source table has 150k rows while csv file only holds 900 rows. I can track how some of transformation is done, but there are some columns in csv file that gives me headache. For example, in csv file, there's columnA that has value of "Yes." But in source table, the key with same name has a nested json as value, and none of kv pairs inside that nested json has value of "Yes".
Is it possible to figure out how final output was created from source table even with these attributes? How did you solve this situation? I also do not know who worked on transformation logic (no one who's involved in this task knows about it)
How is it transformed? If you have the code to transform it, then it should be possible to rewrite it in SQL as long as it's nothing too absurd
If it was transformed by hand, then you'll have to either use trial-and-error or ask someone that understand how the data is structured before/after
I gathered that from all research I make on the topic, but I appreciate the (experiential) validation. And thank you for letting me know that this would be a reasonable direction to try
i think someone else coded to transform the file; the problem is, no one know who did that. And I'm told to just "figure it out"
just found out who did it. Now the question is, how do I persuade the team that has code to let me see the code as well..
whats something that is different for every chrome browser user so like an ID that I can get and verify?
There's no one thing. Read up about browser fingerprinting.
^ this
Also you can issue a unique id to each client/session which connects to your server and persist it in the client via browser local store or cookies.
I guess this would be off topic for this channel.
Maybe #web-development ?
can issue a unique id to each client/session which connects
Seems like what open telementry are talking about in https://opentelemetry.io/docs/concepts/signals/traces/
And I agree that it is off topic.
hey guys,,
I get this error RuntimeError
RuntimeError: The session is unavailable because no secret key was set. Set the secret_key on the application to something unique and secret.
but this code seems to work
`@app.route('/authorize/<provider>')
def oauth2_authorize(provider):
if not current_user.is_anonymous:
return redirect(url_for('index'))
provider_data = current_app.config['OAUTH2_PROVIDERS'].get(provider)
if provider_data is None:
abort(404)
# generate a random string for the state parameter
session['oauth2_state'] = secrets.token_urlsafe(16)
# create a query string with all the OAuth2 parameters
qs = urlencode({
'client_id': provider_data['client_id'],
'redirect_uri': url_for('oauth2_callback', provider=provider,
_external=True),
'response_type': 'code',
'scope': ' '.join(provider_data['scopes']),
'state': session['oauth2_state'],
})
# redirect the user to the OAuth2 provider authorization URL
return redirect(provider_data['authorize_url'] + '?' + qs)`
Does anybody know what the hell?
RuntimeError: The session is unavailable because no secret key was set. Set the secret_key on the application to something unique and secret.
ok... needed to add
app.secret_key = 'super secret key'
for flask
So.. exactly what the error said? π€£
A dumb question
I need to store a python class object inside a db
Stackoverflow recommends pickling it and storing it as a blob
That's a bad idea
sqlite3.binary(pickle.dumps(Object))
And what is considered a good idea?
I highly urge you to reconsider that premise. Surely you can serialize it to something. Like proper tables or at least json?
Well, JSON could be another option
But writing data properly to the db as rows in tables is much better.
I fear there will be too many rows
I have never had a problem with postgresql
Though this could be a motivation to funally learn sqlalchemy
at least from it
Okay, I'll consider switching to postgresql
Lol. If that's the case then storing it as a pickle or json will FOR SURE make it much much worse.
What are we talking here? A billion rows?
Trillion?
For school I have to create a fullstack webshop, so a database aswell. Do I create one whole database for all the data (customer, products, orders etc). Or do I do them separate?
One database
It can contain multiple tables, if you create a db per table then you wouldn't be able to use foreign keys or do joins
Okay thank you for making it clear.
Like how do I set up a database from scratch? Do I first collect all the data I am working with and make relevant tables with it. Or are there templates?
What is the proces
You generally just design db from scratch
Depending on what you want to store and how
is it possible to get help for csv files
This isn't the right channel. But you should just ask the question.
really
ty
its a stupid question, im stupid so dont be surprised why i need help with that
import csv
filnavn = "..."
with open(filnavn, "r") as fil:
reader = csv.reader(fil)
for linje in reader:
print(linje[1])
why is it telling me list index out of range
[1] means the second item. So one line at least has just one column (for example just a line that is an empty line at the end)
yeah i know
My csv file looks like that: ",""id"",""track_name"",""size_bytes"",""currency"",""price"",""rating_count_tot"",""rating_count_ver"",""user_rating"",""user_rating_ver"",""ver"",""cont_rating"",""prime_genre"",""sup_devices.num"",""ipadSc_urls.num"",""lang.num"",""vpp_lic""";
"1,""281656475"",""PAC-MAN Premium"",100788224,""USD"",3.99,21292,26,4,4.5,""6.3.5"",""4+"",""Games"",38,5,10,1";
"2,""281796108"",""Evernote - stay organized"",158578688,""USD"",0,161065,26,4,3.5,""8.2.2"",""4+"",""Productivity"",37,5,23,1";
just add print(repr(linje)) to check what it prints before the crash
hmm.. that's a new error. Did you remove the print(linje[1]) line without telling me?
or any other change?
i removed it without telling you yeah sowwy
btw, you should REALLY write all your code in English. It makes it a lot easier to get help.
shit
import csv
filename = "..."
with open(filename, "r") as file:
reader = csv.reader(file)
for line in reader:
print(repr(line))
No. That's two changes.
import csv
filename = "..."
with open(filename, "r") as file:
reader = csv.reader(file)
for line in reader:
print(repr(line))
print(line[2])
nah wait
ok
it still gives error
import csv
filename = "-"
with open(filename, encoding="utf-8-sig") as file:
reader = csv.reader(file)
for line in reader:
print(line[0])
why does that print the whole data
im tryna get the first indexes from the lines
well... yes. I'm trying to help you. Please calm down. One step at a time. You didn't show me what the print showed the data was.
please don't sent pixels of text. Send text. https://unofficial-django-discord.github.io/rules/markup.html
why are you destroying random parts of your code in the middle of all this? It's impossible to help you.
its right above
wait
import csv
filename = "..."
with open(filename, "r") as file:
reader = csv.reader(file)
for line in reader:
print(repr(line))
print(line[2])
You wrote "..." as the filename. That's clearly nonsense.
import csv
filename = "..."
with open(filename, "r") as file:
reader = csv.reader(file)
for line in reader:
print(repr(line))
print(line[2])
That's what I mean by randomly destroying the code
You are flailing around. You need to take a deep breath. Think about things. Go slow. If you can't, then go sleep and come back to it tomorrow.
I have to be done with this today, and that isnt even part of the assignment
Next time don't do it at the last second π€£
π¦ Ik
Stop with the pixels.
Anyway. The point of the print I told you to add was that you can look at the last line before the crash. Then you don't have to guess in panic.
yeah right
i tried a few alternatives but it keeps saying out of range
Might there something off with the file it self?
Look. At. The. Print.
If you don't want help then leave. Don't ignore help when it is given.
You have been ignoring what I say for 3.5 hours. How much of a hurry are you in? Clearly you are ok with wasting 3.5 hours.
Do this.
ONLY THIS. Then look at what it prints before it crashes.
Then copy paste it so we can all see.
You told me to give me the output and I did
I didnt ignore it Im just stupid I dont understand your requests since Im not good english, neither am I at coding
-
Send text. This is unreadable on mobile.
-
No it's not what I asked for. The line BEFORE the traceback. BEFORE means above.
I cant print the output
Ok. Tell me what you see in that output.
O
k
[",""id"",""track_name"",""size_bytes"",""currency"",""price"",""rating_count_tot"",""rating_count_ver"",""user_rating"",""user_rating_ver"",""ver"",""cont_rating"",""prime_genre"",""sup_devices.num"",""ipadSc_urls.num"",""lang.num"",""vpp_lic""";]
Traceback (most recent call last):
File "C:/Users/sha01021/OneDrive - Vestfold og Telemark fylkeskommune/IT2/BlaBlaBla/csv/AppleStore.csv", line 10, in <module>
print(line[2])
IndexError: list index out of range
there is an error line under print(line[2])
And I didn't know you couldn't see it, would've helped if u told me
The first line there. What does it mean?
its the first line from my csv file
Not really. It has [ at the start. Try again.
Look at it. Carefully.
A list I know
Good. A list of what?
Your csv file consists of only one column:
",""id"",""track_name"",...,""vpp_lic""";
^^ ^^ ^^ ^^ ^^ ^^ two quotes is equal to one escaped quote
^ opening quote closing quote^
So the content is one column with the value: ,"id","track_name",...,"vpp_lic"
Yes, somehow your csv file only consists of one column. Looks like a failure at generation time.
but if i could explain it that way, the csv shows information for each app in app store. So each line represents an app, and has its own information
that was only one line
oh wait
I'm talking about your example, it's every line.
Alguien que sepa hablar espaΓ±ol para enseΓ±arme Python en directo
sorry for wasting your time
I didnt make the file so I didnt recognize the problem
That's what a help channel is about: help you find the problem and solve it. ;)
yeah thanks
but cant I still use delimeter ?
because I asked, and the teach said it is supposed to be in one column
import csv
filename = "the name is there"
with open(filename, encoding="utf-8-sig") as file:
reader = csv.reader(file, delimiter=",")
for line in reader:
print(line[2])
You wasted your own mostly. I hope you learned a lot about debugging and asking for help.
Nuh uh I didn't
you do realize I didn't solve my problem right? Cuz even if they're in one column, there should be a solving. And I actually tried to send another code that might be right, but you said I'm going too fast
I do realize the elements are in the same colunm now, great. But shouldn't the commas between the elements help to show indexes
At this rate I'm not even gonna ask for help
It's not really a database problem now, is it?
format1 = ff.FortranRecordReader('(F8.4,F9.4,ES12.4, F8.4, 1x, 1x, 1x, I3, I3, ES12.4, F8.3, F8.3, 1x, ES12.4, 1x, ES12.4, ES12.4, 1x, ES12.4, ES12.4 1x, ES12.4, 1x, 1x,I3,I2,I3,I2,I3,I2)')
Im trying to read a fortran formatted output file using this command but I get the following error
InvalidFormat:
Token: type=ED7, value=ES has invalid neighbouring token
at first I had this error but with value=x, I fixed it by putting a 1 in front of every x. But when I try this fix with ES, I still get the same error
anyone know what Im doing wrong?
No, but it wasnt from the very beggining
he told me its not the right place and i knew it wasnt
but he said I should just ask right away and I did
so it wasnt a database problem from the beginning
A CSV file is some kind of database. We did find the error: Your csv file is not really a csv file. Now I would just correct the csv file and than your code would work.
Ah. Sorry I thought you did solve it. Ok, so next thing seems to be that there are two " around each string in the headers? And each line separated by ;? Very strange.
!e Just write out the read in line into a new file, that new file will be a correct csv file:
import io
import csv
old_file = io.StringIO('''",""id"",""track_name"",""size_bytes"",""currency"",""price"",""rating_count_tot"",""rating_count_ver"",""user_rating"",""user_rating_ver"",""ver"",""cont_rating"",""prime_genre"",""sup_devices.num"",""ipadSc_urls.num"",""lang.num"",""vpp_lic""";
"1,""281656475"",""PAC-MAN Premium"",100788224,""USD"",3.99,21292,26,4,4.5,""6.3.5"",""4+"",""Games"",38,5,10,1";
"2,""281796108"",""Evernote - stay organized"",158578688,""USD"",0,161065,26,4,3.5,""8.2.2"",""4+"",""Productivity"",37,5,23,1";
''')
old_reader = csv.reader(old_file, delimiter=";")
new_file = io.StringIO()
for line in old_reader:
new_file.write(line[0] + "\n")
new_file.seek(0)
new_reader = csv.reader(new_file)
for line in new_reader:
print(line[2])
@grim vault :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | track_name
002 | PAC-MAN Premium
003 | Evernote - stay organized
Ive found a selotuion, the line being in one colunm wasn't the problem, but thank you for your help
What was?
Pythong was treating every column as an index, but in reality I want every element in that column as an index, so I had to strip and split. ```py
import csv
from collections import Counter
filename = "I think everyone knows my location because of that already"
categories = []
with open(filename, encoding="utf-8-sig") as file:
reader = csv.reader(file, delimiter=",")
for line in file:
line = line.strip().split(",")
category = line[12]
now I have no idea why the delimeter method didnt work, since it worked with my friend, but this method did work at least
One last time: Because your csv file is faulty. Your friend might have the correct one.
well my friend used the same one cuz he has no idea what he's doing so I dont think he'd find what the right csv file would look like, but I guess there's a problem
At least delete the reader = ... line. And then remove the import csv. You're not using either
Im using it for the rest
really? There's no use of csv in the code you showed
maybe because I didnt need to show my whole code idk
does anyone know of a thing like tinydb but it better supports nested arrays
like if you wanna store a dict as a value of a key in tinydb, you might as well not use it and make something with raw json and open()
what kinds of operations are you looking for?
sqlite with json columns can be surprisingly effective for that kind of thing
the kinds of things that would make sense with a json file. im still gonna use postgres for main db stuff, but there's some stuff id like to still use a json file for
stuff like storing last restart state etc
are you doing lots of things like in-place updates? or just storing whole json-ish things? are you doing any kind of sophisticated queries? what are the performance constraints?
have you considered just using postgres jsonb?
or heck, actual json files in cloud storage with uri/url stored in postgres
if you want in-memory data, then maybe an actual dict or list of dicts could be good enough
oh tbh i was just wondering if there's any package recommendation like tinydb. any more complex than that and id be better off finishing my postgres module, ive just been slacking cause concurrency issues 
What kind of concurrency issues? π€
doing +/- 1 math on every function call, ensuring that the count is correctly applied in the end, even when there's mixed concurrent operations on the same piece of data. ie concurrent -1 and +1 for the same data
I'm not exactly sure what you're doing though
Is it a python or postgres issue?
i think both idk
There are some reasons to deal with it like pessimistic and optimistic locking
Or using atomic operations, e.g. incrementing your counter only on db side (I think that should work)
that feels so confusing
all of that stuff
i feel the need to pay someone whos done it before cause i dont trust myself to make something production ready
Just use normal tables
the json thing i talked about earlier isnt related to this at all
Oh. I thought it was the same thing. Anyway: for the json thing use tables :)
Storing unstructured data is almost always a mistake.
im looking for a non db solution still. cause when my postgres module is all done, im still gonna have reasons to store a plain json file
ill show a real world example in a sec
async def killbot_cmd(message,args):
with open(os.path.join(sys.path[0], 'bot_state.json'), 'r+') as f:
state = json.load(f)
state['just_killed'] = True
state['killed_in'] = message.channel.id
state['killed_by'] = message.author.id
f.seek(0)
json.dump(state, f)
f.truncate()
await client.send_file(message.channel, 'images/killbot.gif', filename='killbot.gif')
sys.exit()
i guess in the end to meet my needs im gonna have to make something. which will basically be a wrapper for with open and maybe subclassing dict for custom getting/setting
the sys.path[0] thing looks mighty flaky. Maybe you want something more robust like Path(__file__).parent / 'bot_state.json'?
Do you have any other data in that file than those three state variables? If not, then you don't need to write the old..
the fact its a dev function is my whole point
its the stuff i havent coded an interface for yet
and dont need a whole table
like i write good code, and then i write code in every weird way when im debugging or testing new ideas. then sometimes i wanna store that data in a flat file for further debugging
like a scratch pad. is there a programming word for that
But why not?
cause i need to be connected to the db to see the data
Why that's a problem though?
id rather have a vector saved as an .svg file so i can open it literally anywhere, rather than an .ai file so i it doesnt require a computer with adobe illustrator installed
idk maybe a poor analogy but it should be obvious esp when bringing up testing / debugging
You can query a sql db from anywhere
I think you may be just making your life harder by working with files 
why are we debating using 10+ commands to get there when i can just open 1 file and use my eyes
It's not 10+ commands though
Also in this particular case - if that "state" is global, why not simply store it in memory?
Or is it unique per server?
do you have to write or update data or just read static data?
ah, yea ok
both
rule of thumb is that you are going to have a hard time trying to maintain a consistent state for your data without a DB without loosing any data.
You can look at sqlite if you really want a file.
yeah idk if anyone here gets what im doing
you are writing a bot?
unrelated, my original question was asking for a tinydb alternative that better supported nested structures
Sounds like a https://xyproblem.info/
yeah
I do :P
I still think you would be better off simplifying that code with using __file__ and opening the file twice, one for reading and one for writing.
that's also why people are trying to dig deeper into the problem you are trying to solve. Otherwise it has all the signs of a xy problem
doesnt matter tho cause in the end im not even going to have that function. its basically to ping me in the same channel i ran the command in when the bot restarts, which im only restarting cause im actively developing it
there is no problem here
im just seeking a library recommendation thats all
if there isnt one, i have zero issues writing something in house
Fantastic! Then I hope the recommendations are helpful
well no cause people are just telling me to use a database when the whole point is to not use a database
that's because you missed the point
?
If you want something simple and dirty then a json dump seems ok. Maybe add indent?
yeah im just gonna write something like tinydb but way more rudimentary
i like you can just do "db = TinyDB("name")" and then immediately do "db.search(...)"
MongoDB? Just put JSON and keep calm
that would be the mistake yea
Unstructured data or Mongo?
Both.
There are use cases where it's ok, like historical facts where you specifically want the schema of old things to be incompatible with the new stuff. But that's rare.
i mean json rn works right? any other relational database needs you to sql yadda yadda. maybe try googling for a json wrapper library?
hello
would like to create a dataset of images with each image having transparent background.
id like to add #tags to each of these image files within the dataset but i dont know how to add #tags on the images.
We need more context. Are we talking Django?
what library do you use for postgresql
Hey, So I made a Terminal Based Python Program which basically asks like 240 questions and take user input for each. I store the question number and user input in a dictionary (let it be dictionary1) I then use another python program to store the data (The dictionary1) using MySQL locally into a new table every time. Now I have two queries:
-
I will be hosting the python program with PythonAnywhere. How do I store this data locally on my pc.
Can I automate the process? Like storing that data (dictionary1) locally and also adding it to my database using the python program I made exclusively to take this dictionary and store it in a table. -
Since this is 240 questions long is there any way to save the user's progress? Something like how google forms work.
Note: If this query of mine belongs to some other channel please direct me to that channel so I can have solutions. Also, I am doing this as an project to improve my skills in programming as a final year highschooler.
Whatever django tells me to use :)
First you should have just one table. Second you should store the answers as they come in. And last if you want to have others use it I would guess you want to make it a web app. I recommend django.
what did you mean
So then I will need like a two coloumn primary key where one is the userid and other one is the question id and then store responses. This will be better? And I have never used django do you have any good tutorial that is concise?
Psycopg3 is the new good thing I believe.
I would recommend a separate primary key that is just an auto increment int. Then fk for user, question.
The official Django tutorial is quite short. It requires you to know SQL, html, css, some python, and how http works roughly. If that's too much the Django Girls tutorial is good.
Thank you
sqlalchemy.exc.ArgumentError: Could not parse SQLAlchemy URL from string 'SQLALCHEMY_DATABASE_URL'
I get that error when I try to put in the correct login info using sqlalchemy.
SQLALCHEMY_DATABASE_URL = 'mysql+mysqlconnector://root:PASSWORD@localhost/api'
The docs say not to use mysqlconnector: https://docs.sqlalchemy.org/en/20/dialects/mysql.html
Are you sure you're not passing string "SQLALCHEMY_DATABASE_URL"? Just looks weirdly similar to that π€
guys i have a project coming up soon in which i have to use SQL and python , i have basic knowledge of both of them if you guys have an idea which is not a game and unique please share it with me
I would like a nice little web app frontend for https://github.com/open-spaced-repetition/py-fsrs
#1214594873366282250 message @torn sphinx also for something that long-running, you should consider an asynchronous driver like aiosqlite to avoid potentially blocking your event loop
https://aiosqlite.omnilib.dev/en/stable/
#1214594873366282250 message ah, sqlite does flush your writes automatically during the transaction: ```py
import contextlib, sqlite3
def setup_tables(conn):
conn.execute("CREATE TABLE IF NOT EXISTS person (name, eye_color)")
def insert_rows(conn):
for _ in range(1000000):
conn.execute(
"INSERT INTO person VALUES (?, ?)",
("Foobardian Hamspamming", "lavender")
)
with contextlib.closing(sqlite3.connect("test.db")) as conn:
with conn:
setup_tables(conn)
with conn:
insert_rows(conn)
input("Press enter to commit")```
speaking of which, @torn sphinx will there be more than one connection to the database happening concurrently? with sqlite's rollback journal (the default), a write transaction blocks all other read transactions, while using a write-ahead log (PRAGMA journal_mode = WAL) allows one writer to run concurrently with other readers
if it matters that inserting your data is atomic (all or nothing), and/or you don't care about concurrency with other connections, then yes
I'm reading up on DynamoDB and trying to get started with it. I'm a little confused as to how broad or granular my table needs to be
Like, do i create a "customers" table and create a Partition key called customerID and Sort key called inventory::carts?
my table is to store configuration data about these carts
this necessarily has to return a lot of rows right? i suspect the performance is going to be poor regardless, but i guess another way to do it is to select the users and exclude guild owners, i.e. sql SELECT id FROM user EXCEPT SELECT owner_id FROM guild
Is this standard sql? π
Let me quickly read your mind (Tell us what your problem is π )
sorry
i have no idea, i just know it exists in sqlite and postgresql
so i have had this code for awhile its sherlock and its just now saying error raw file or something
Seems to be pretty widespread, but I think you could use a join too
how do i fix it
I guess this? @waxen finch
select id from user
left join guild on guild.owner_id = user.id
where guild.id is null
You'd have to share your code, also probably would be better to post it in #1035199133436354600
ok thanks
interestingly sqlite does create different query plans for it: sql sqlite> CREATE TABLE user (id INTEGER PRIMARY KEY); sqlite> CREATE TABLE guild (id INTEGER PRIMARY KEY, owner_id INTEGER REFERENCES user (id)); sqlite> EXPLAIN QUERY PLAN SELECT id FROM user EXCEPT SELECT owner_id FROM guild; QUERY PLAN `--COMPOUND QUERY |--LEFT-MOST SUBQUERY | `--SCAN user `--EXCEPT USING TEMP B-TREE `--SCAN guild sqlite> EXPLAIN QUERY PLAN SELECT user.id FROM user LEFT JOIN guild ON user.id = guild.owner_id WHERE guild.id IS NULL; QUERY PLAN |--SCAN user |--BLOOM FILTER ON guild (owner_id=?) `--SEARCH guild USING AUTOMATIC COVERING INDEX (owner_id=?) LEFT-JOIN
Postgres generates a different plan too, I guess that's expected π€
Do you have index on owner_id?
just tried it, only the latter query changes with its presence: sql sqlite> CREATE INDEX ix_guild_owner_id ON guild (owner_id); sqlite> EXPLAIN QUERY PLAN SELECT user.id FROM user LEFT JOIN guild ON user.id = guild.owner_id WHERE guild.id IS NULL; QUERY PLAN |--SCAN user `--SEARCH guild USING COVERING INDEX ix_guild_owner_id (owner_id=?) LEFT-JOIN
select id from "user" except select created_by_id from "group";
HashSetOp Except (cost=0.00..26.60 rows=110 width=20)
-> Append (cost=0.00..26.00 rows=240 width=20)
" -> Subquery Scan on ""*SELECT* 1"" (cost=0.00..12.20 rows=110 width=20)"
" -> Seq Scan on ""user"" (cost=0.00..11.10 rows=110 width=16)"
" -> Subquery Scan on ""*SELECT* 2"" (cost=0.00..12.60 rows=130 width=20)"
" -> Seq Scan on ""group"" (cost=0.00..11.30 rows=130 width=16)"
select "user".id from "user"
left join public."group" g on "user".id = g.created_by_id
where g.created_by_id is null;
Hash Anti Join (cost=12.93..24.44 rows=1 width=16)
" Hash Cond: (""user"".id = g.created_by_id)"
" -> Seq Scan on ""user"" (cost=0.00..11.10 rows=110 width=16)"
-> Hash (cost=11.30..11.30 rows=130 width=16)
" -> Seq Scan on ""group"" g (cost=0.00..11.30 rows=130 width=16)"
huh, looks pretty similar...
SELECT "user".id
FROM "user"
WHERE NOT EXISTS
(SELECT "group".id FROM "group" WHERE "group".created_by_id = "user".id);
Generated she same query as join
I'm curious how would that work with larger second table, because that hash set could become big π€
but for that particular case it's more than fine
Good day! I'm working with a database using SQLAlchemy. Two tables are connected, but they dont use foreign keys (just ids as plain int fields). I cannot change this database. Is it possible to still get a relationship between these two tables in SQLAlchemy? If not, do you have a suggestion on how I can otherwise work on these tables in an intuitive way?
I get very worried when you say you can't change it.
Anyway, I think you might be able to just map it as foreign keys in SQLAlchemy like normal and it will work, just that you won't get referential integrity so you might need to handle errors in a lot of places due to data corruption.
Ok, I think I'll do a simpler solution uusing joins then.
It's a database from a nieche software called Hoteldruid. I'm making an integration for it in our software. Therefore, I can't change the DB π
Thanks for the help.
You can create a join condition without a FK'
It sounds like you didn't read what I wrote.
It just sounded complicated, and for something that'll just be used in one script, a simple solution is fine I think.
Here is the query if you're interested.
periodi_start = aliased(Periodi)
periodi_end = aliased(Periodi)
query = (
select(Prenota, periodi_start, periodi_end)
.join(periodi_start, periodi_start.idperiodi == Prenota.iddatainizio)
.join(periodi_end, periodi_end.idperiodi == Prenota.iddatafine)
.where(or_(periodi_end.datafine > cmp_time))
)
Its in italian because the Hoteldruid database is in italian.
prenota=booking
periodi=period
inizio=start
fine=end
So I'm basically getting the booking table, and the start and end period.
But anyways, your help was helpful, even if I didn't use it π
I don't use SQLAlchemy but isn't that already how you do it WITH fks?
So it's basically the same?
I'm not good with SQLAlchemy myself. But I know there are some ways to avoid writing SQL like syntax like this, and do something that looks more OO like.
Maybe not in this case though, as the query is a tiny bit complicated? Idk.
that's not a complicated query
Alright then π
Does anyone have any suggestions for books or reference material (besides the docs) for SQLAlchemy?
I'm trying to avoid writing with raw SQL to merge two tables or basically do and update if item exists or insert if it does not and struggling to get it to work (I believe it's my pk that is the issue). So if anyone has experience with it or has any reading material they recommend I'd be forever grateful! β₯οΈ
Sounds like a fairly basic problem, if you show your code I expect someone can help #βο½how-to-get-help
As ashamed as I am to admit it, I don't believe I have noticed the help channel prior to your post. So thank you for pointing it out!
I'll get a snippet of my code and do as suggested. Thanks again!!
I have a few SQL related recommendations books (and I realized I misread the question, thought it was for SQL)
hi, how can i execute read queries to sqlite3 from the main thread and execute write queries on a separate thread without getting a "database locked" error?
Two options come to mind: a global lock, and maybe you need to go to a normal full size db like postgres
when you're using multiple connections, you should commit your read and write transactions early so they release sqlite's locks quickly
https://sqlite.org/lang_transaction.html#read_transactions_versus_write_transactions
https://sqlite.org/whentouse.html
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock [should last] for more than a few dozen milliseconds.
and/or if feasible, turn on the write-ahead log so one write transaction can run concurrently with other read transactions
https://sqlite.org/wal.html
hello , i need help , i make container oracledb local , i want copy my database from my original db to container
Im preparing to use a Generative Adversarial Network with my own images. Will be using pytorch (i think).
Is there a way that i can create the datasets using django or other database languages?
I want to create a database or library or dictionary to exist where each image asset in the library has any number of #tags that are associated with the image.
Doesn't have to be django
Im visualizing a table with 1 column and each square (vertical) is an image and each image is followed by their own row consisting of #tags
I wouldnβt use a database to store the images themselves
Images are good to store in S3 buckets, blob storages, self hosted Minio (it is all same stuff)
Or just straight in filesystem
between parent/child tables on relationship Users.dms - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument, providing a list of those columns which should be counted as containing a foreign key reference to the parent table.```
class Users(Base):
__tablename__ = "users"
username: Mapped[str] = mapped_column(primary_key=True)
email: Mapped[str] = mapped_column()
password: Mapped[str] = mapped_column()
profile: Mapped[str] = mapped_column(default="https://firebasestorage.googleapis.com/v0/b/discord-83cd2.appspot.com/o/default.png?alt=media&token=c27e7352-b75a-4468-b14b-d06b74839bd8")
dms: Mapped[List["Dms"]] = relationship(back_populates="user")
def __repr__(self):
return f"Users(username={self.username},email={self.email},profile={self.profile})"```
class Dms(Base):
__tablename__ = "dms"
id: Mapped[int] = mapped_column(primary_key=True)
sender: Mapped[str] = mapped_column(ForeignKey("users.username", ondelete="CASCADE"))
receiver: Mapped[str] = mapped_column(ForeignKey("users.username", ondelete="CASCADE"))
user: Mapped["Users"] = relationship(back_populates="dms")
def __repr__(self):
return f"Dms(id={self.id},sender={self.sender},receiver={self.receiver})"```
Could someone help me get a better understanding about sqlalchemy relationships?
You have multiple foreign keys, sqlalchemy can't determine what FK to use for Dms.user relationship
Also probably don't use username as PKs 
You should add FK explicitly in this case:
user: Mapped["Users"] = relationship(back_populates="dms", foreign_keys=[receiver]) # Depends on what FK you want to use here (receiver or sender)
it's just a fun side project
Would this work?
dms: Mapped[List["Dms"]] = relationship(back_populates="user",foreign_keys="Dms.receiver")
I mean, yes, but only for one of the foregin keys
I think it would be better to add FK to Dms.user
I assume dms relationship on User would know what FK to use since you specify it on the other side of your relationship
Like this user: Mapped["Users"] = relationship(back_populates="dms",foreign_keys="Dms.receiver")
in the Dms class
I think so
You may need to use list for foreign_keys though, I don't remember the exact API
is Dms.receiver the same as [receiver]
Yes
Does relationship make sqlalchemy perform a join or select statement when accessing a row from the table with a relationship?
I read the documentation, but I want to make sure I undertand what they're saying with relationships being somewhat confusing.
Depends on how you choose to load it
By default if you don't use async I believe it has the same behavior as djang orm, it would do a query per each access to a relationship field you didn't access before
So N+1 problem
Could you give me an example on when a relationship would be loaded. I know there are different methods to loading it. What loads can you not use for the Async engine?
This will do a join:
select(User).options(joinedload(User.posts))
This will issue a separate query with where post.user_id in (...) filter
select(User).options(selectinload(User.posts))
Would using separate queries for relationships be faster than using a join for one query?
It depends on query complexity and width
I'd say
If you have many to many relationship it would be better to use selectinload
But if you have a 1:1 join should perform the best
Is there a default load or do I have to explicitly state what type of relationship load I want for sqlalchemy to even load the related objects?
I wouldn't recommend that
I'd use default load as raise so it throws an exception π
If I did just select(User) would it load the related objects?
No
And using async, I have to explicitely state what type of load I want?
or is it for both sync and async
Yep, I'd do the same with sync
With sync if you access user.groups it will try to load that relationship, and in async it's simply not possible because it can't await that operation, so it errors out
You can override that behavior in sync sqlalchemy though
Okay thanks for all the info. Relationships are a tough concept to grasp
like how do i do this thing
This doesn't sound like a database question, it sounds like a pandas thing or something
sorry i will remove this
can someone help me with this mongodb question? https://discord.com/channels/267624335836053506/1215319657615986770
@paper flower py friends = await db.execute(select(models.Users.profile, models.Users.username,models.Dms.id).join(models.Friends, or_(and_( models.Friends.receiver == current_user.username, models.Friends.sender == models.Users.username), and_( models.Friends.sender == current_user.username, models.Friends.receiver == models.Users.username))).outerjoin(models.Dms,or_(and_(models.Dms.sender == current_user.username,models.Dms.receiver == models.Users.username),and_(models.Dms.receiver ==current_user.username,models.Dms.sender == models.Users.username)))) any recommendations on how to improve this query?
- Would be good to know what you're trying to achieve this
- Let's fix the formatting so it's easier to read (and probably get rid of
models.priefix as it takes too much space IMO):
friends = await db.execute(
select(Users.profile, Users.username, Dms.id)
.join(
Friends,
or_(
and_(
Friends.receiver == current_user.username,
Friends.sender == Users.username,
),
and_(
Friends.sender == current_user.username,
Friends.receiver == Users.username,
),
),
)
.outerjoin(
Dms,
or_(
and_(
Dms.sender == current_user.username,
Dms.receiver == Users.username,
),
and_(
Dms.receiver == current_user.username,
Dms.sender == Users.username,
),
),
)
)
So are you just selecting Dms.id essentially? π€
I think if you use relationships on your models it could be simplified a bit
Hi
How to create alembic migrations for PostgreSQL database with different schemas and ORM-based sqlalchemy client?
I can do it without schemas supply
But canβt do with them. ORM classes of tables canβt see table names in query
<@&831776746206265384>
Is anyone aware of a way to create a relationship using a hybrid property that references a relationship? On the model TaskCircuitBuildSubtask below, I've defined the relationship 'nni_list', but this is causing the below error. The relationship itself is referencing 'supplier', which is a hybrid property. The hybrid property itself also references the relationship 'task', which I think is specifically causing the issue.
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with TaskCircuitBuildSubtask.task has an attribute 'interface'
The hybrid property itself works as expected and returns the correct value.
`class Nni(Base):
tablename = 'nnis'
id = Column(Integer, autoincrement=True, nullable=False, unique=True)
parent_id = Column(Integer, ForeignKey("interfaces_juniper.id", ondelete="CASCADE"), index=True, primary_key=True)
nni_type = Column(String(32))
supplier = Column(String(32))
nni_ref = Column(String(128))
description = Column(String(64))
platform = Column(String(32))`
`class TaskCircuitBuild(Base):
tablename = 'task_circuit_build'
id = Column(Integer, autoincrement=True, primary_key=True)
customer_code = Column(String(16), nullable=False)
support_ref = Column(String(16), nullable=False)
created_by = Column(String(255), nullable=True)
date_created = Column(DateTime, default=datetime.datetime.now)
date_completed = Column(DateTime, nullable=True)
status = Column(Integer, default=0)
family_type = Column(String(255), nullable=True)
speed = Column(Integer, nullable=False)
supplier = Column(String(255), nullable=False)
layer = Column(Integer, nullable=False)
as_target = Column(Integer)
vrf_target = Column(Integer)
config_holder_set_id = Column(Integer, ForeignKey("config_holder_set_juniper.id"), index=True)
subtasks = relationship("flex.models.TaskCircuitBuildSubtask", viewonly=True, lazy='joined', backref="task")
routes = relationship("flex.models.TaskCircuitBuildSubtaskRoute", viewonly=True, lazy='joined', backref="parentTask")
config_holder_set = relationship("juniperConfigHolderSet", lazy='joined', back_populates="task_circuit_build")
task_queue = relationship("TaskQueueDeploy", lazy='select', backref="build_task")`
`class TaskCircuitBuildSubtask(Base):
tablename = 'task_circuit_build_subtask_unit'
id = Column(Integer, primary_key=True)
parent_id = Column(Integer, ForeignKey("task_circuit_build.id", ondelete="CASCADE"), index=True, nullable=False)
handoff_type = Column(String(20), nullable=False)
node_id = (Column(Integer))
interface = Column(String(64))
unit_num = Column(Integer)
vlan_id = Column(Integer)
inner_vlan_id = Column(Integer)
ip_address = Column(String(20))
configuration = Column(String(4294000000))
preference = Column(Integer, nullable=True)
bfd_enabled = Column(Boolean, nullable=True)
active = Column(Boolean, nullable=True, default=True)
@hybrid_property
def supplier(self):
if (self.handoff_type=="CUSTOMER" and self.task.layer==2):
return self.task.customer_code
else:
return self.task.supplier
nni_list = relationship('Nni', foreign_keys=[Nni.supplier], primaryjoin="Nni.supplier == TaskCircuitBuildSubtask.supplier", uselist=True, viewonly=True)`
Okay
!code see here for instructions on formatting in a "code block", it's a lot nicer than inline formatting like you did
I have the user_genres and a database table with party_genres, i want to return the party with maximum intersection, how do i write the mongodb query?
i started playing with prisma and im impressed
dont really know the advantages it has yet, but im impressed solely by the fact it generates a python library that's tailored to my db
and that you have full control over the name of the attributes for your objects too, via mapping in the prisma schema
I don't think using the relationship in the query would simplify it
can you clarify in more detail what you're trying to do?
i haven't needed graphql in a while but i remember also being very impressed with the graphql interop that it offers
I'm trying to get the current user's friends which the friend being the models.Users and the dm associated with the current user and its friend
yeah just so many things you can do with it. seems like a full stack dev's best friend
indeed. for a pure backend project i'm not sure i feel the need for it, but if you know you're going to use graphql and js/ts on the frontend then it seems like a great idea
i get a little leery because the prisma project itself seems less open to the idea of a broad ecosystem built around their language. there isn't really even a mention of it in the docs. seems like a second-class citizen thing. they seem to think of themselves primarily as a javascript/typescript framework
im only using it for its ORM features, but i think so far it provides a pretty nice interface for connecting to the db
Wouldn't it be easier to do this?
select(User).join(User.friends)
that could make it easier
didn't think of that
You can also specify a condition in that join too
sqlalchemy never ceases to amaze and confuse
π
im ditched sqlalchemy for it being way too hard to figure out
that query I sent hurts my brain
select(User).join(User.friends.and(User.username.ilike("something")))
Though not sure if it will use correct alias here
Since User.friends are Users too
sqlalchemy is great for reminding me of how i felt when i was first learning other complicated tools like git and even python. i got very used to deeply understanding my tools. with sqlalchemy i feel like i'm back at the beginning, picking the small subset that i understand and ignoring the rest.
Yeah, same, because you can have a "friend" relationship go either way
Maybe there's a way to simplify it, what db are you using?
postgres
Maybe tuples could work π€
it wouldnt be that bad if every stackoverflow post wasnt outdated and their docs were better designed
(a, b) in ((a, b), (b, a)) You could do an operation like this
@paper flower is it true that you're one of the top contributors of FastAPI?
Is it just about Design or just docs structure?
I didn't contribute a line into fastapi
oh
i mean both when i say design
Honestly you shouldn't care about design, and their docs structure isn't bad
design means accessibility so
like what readability means
Bad design != not readable to be honest
gotta disagree there
This is pretty readable to me 
I've been reading the documentation whenever I have free time
I feel like I have a better understanding
Sqlalchemy docs look bad at first, you get used to them in like a week or two of active reading
Worth it tbh
look at how bad the search is
but then again I knew the basics of sqlalchemy when I read the documentation
I think I may know a solution
the search doesn't do a good job. they do a poor job of separating "very important" information from "arcane miscellany". they tend to jump right into examples without spending some time to clarify the data model of the thing you're trying to read about. the api reference pages being huge massive all-in-one affairs makes it very hard to find things.
you can absolutely learn from the docs. but sometimes it will take you a while to find what you're looking for, or you might never even realize you should be looking for it.
from sqlalchemy import select, tuple_
from sqlalchemy.dialects import postgresql
from app.db.models import Manga, MangaTag
stmt = select(Manga.id).join(Manga.tags).where(
tuple_(Manga.is_public, MangaTag.name).in_([
(True, "name1"),
(False, "name2")
])
)
print(stmt.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))
SELECT manga.id
FROM manga
JOIN manga__manga_tag__secondary AS manga__manga_tag__secondary_1
ON manga.id = manga__manga_tag__secondary_1.manga_id
JOIN manga_tag
ON manga_tag.id = manga__manga_tag__secondary_1.tag_id
WHERE (manga.is_public, manga_tag.name) IN ((true, 'name1'), (false, 'name2'))
I honestly don't understand the code
but I guess it's a learning opportunity
Here be dragons
i actually do appreciate those. i shouldn't imply that it's universally bad. for example the pandas user guides are still worse.
the fact that a complete and completely type-hinted reference doc exists is amazing
and with near 100% docstring coverage too
It did go a long way since 1.3-1.4
and the info is all there. it's just a questionable UI and information hierarchy
yeah i definitely remember it being worse.
like matplotlib. i was pleasantly surprised when some time around the 3.x release the docs started to become useful
Optional type hints were introduces in 1.4.something, started using them immediatelly
And in 2.0 you can declare a column with just name: Mapped[int], awesome
yup, i love that
however db.execute(select().scalar()) vs. db.scalars(select()) is confusing
I'm seeing select().scalar() for the first time tbh
Maybe
execute().scalar/s()?
They're the same, but Session.scalar/s are just helper methods
wait maybe i have that backwards
yeah, this.
see, i can't keep them straight π
You can turn scalar() into
execute().scalars().one_or_none() π
They should do the same thing
yup. too much convenience...
execute returns everything you selected in select, in case there are multiple elements (ORM models "count" as one though)
scalar returns first element and a single row
scalars returns first element and all rows
E.g. if you want to fetch all users you'd do
session.scalars(select(User))
but there's no session.scalar π
One user:
session.scalar(select(User).where(User.username="abc"))
There is
also .where and .filter being synonymous
wait really? i definitely tried it the other day and got AttributeError
maybe i wrote saclar or something
well that's convenient
i would hope not but anything is possible
i take back all criticism, sqlalchemy is perfect
fix it 
i don't have that power
It's cool that sqlalchemy allows you to use any custom operators too
if filter.search_term:
stmt = stmt.join(Manga.alt_titles, isouter=True).where(
AltTitle.title.op("&@~")(filter.search_term),
)
idk, everything is so complicated with there being multiple foreign keys in one table and having to perform multiple joins
It kidna hurts my brain too π
Let me google a bit, maybe there's a better way to model that relationship
Seems like a good article: https://minimalmodeling.substack.com/p/modeling-mutual-friendship
Prisma Editor: Prisma Schema Editor, Prisma Schema visualization, visualize and edit Prisma schemas.
im trying to query out only parties having 2 or more matching genres
Hi, I need an advice..
So am working on an app that basically needs to perform alot of inputs, the user will create and I will save his filtered data and it will affect multible tables, then in update and delete, I will have to go through those tables to update the changes and it becomes a bit complicated..
I want to make a non-relation table that stores everything and deal with the creating and changing in a simple way, and then make a schedule task to take this raw table's data and store them properly in other tables to make the reports, what do you think?
I think that sounds like a bad idea. Not enforcing the data to be the correct shape causes all kinds of problems.
My Instagram account was ban please help me for unban instagram account
This is not instagram support
Please help me
<@&831776746206265384>
alright he's using an alt
You're right it's silly, to put my faith in some scheduled task sounds terrible..
Maybe I should stick with the original models and deal with the complexity since there are no more than 100 users to manage..
You're putting the big update in a transaction I hope?
What do u mean?
Do you know what a transaction is?
with atomic() creates a transaction yes.
if you put your code inside a with atomic() block yes. Otherwise no.
And this allows me to do what?
Like undo the transaction if it fails or something?
This is my first big project so I still have no idea about the best practices
You will either get all changes inside the atomic block or none. This is very nice if you're modifying many tables at the same time.
Wow, thanks man.
This is the first time I hear about this.
Are there any sources that I could learn this type of things from?
I would assume the Django Girls tutorial mentions this all
I found a really nice book π called the temple of django database, but thank u anyways
I found a good package https://github.com/bruin-data/ingestr
I feel like there is no "better solution" to the query I sent first. The complexity of the other queries would be the same.
tuple looks a bit better IMO, but it's a postges specific thing
SQLite too
I kind of want to leave it the way it is because it's easier for me to read the first query I sent
It's kind of hard taking advantage of relationships when there are multiple foreign keys in one table.
hello guys im using sqlite3 for a python/pygame game project and ive ran into a problem where my score is not getting updated into my database. i have aleady debugged and found that the problem lies somewhere within the data being updated into the database. not sure if theres something specific im mssing out
There isn't anything for us to go by in that.
import sqlite3
from login_form import LoginForm
score = round(end_time, 2)
print(score)
if LoginForm.logged_in_user:
db = sqlite3.connect("main.db")
cursor = db.cursor()
update_score = 'UPDATE user_scores SET score = ? WHERE username = ?'
cursor.execute(update_score, [score, LoginForm.logged_in_user])
db.commit()
db.close()```
sorry i realise that was vague but this is my logic for updating the database it just doesnt do it
im new to this as well id like to get that out there
You need to create an instance of that LoginForm class.
guys anyone expert with google cloud services ?
i believe i have
oh sorry i misinterpreted no i havent i will attempt that thanks
You should ask your actual question. But probably not in this channel if it's not specifically about a database issue.
I have a MySQL database setup on an external host, I have the host, user, password, and port. Do I have to use a Python package or can I manually send requests to the database?
friends = select(
case(
(models.Friends.sender != current_user.username, models.Friends.sender),
(models.Friends.receiver != current_user.username, models.Friends.receiver)
).label("username")
).select_from(models.Users).options(joinedload(models.Users.friends)).join(models.Users.friends).filter(models.Users.username == current_user.username).subquery()
friends_information = await db.execute(select(models.Users.profile,models.Users.username,models.Dms.id).join(friends,friends.c.username == models.Users.username)
.outerjoin(models.Dms,or_(and_(models.Dms.sender == current_user.username,models.Dms.receiver == models.Users.username),
and_(models.Dms.receiver == current_user.username,models.Dms.sender == models.Users.username))))
friends_information = friends_information.all()``` @paper flower I spent hours just to produce this
there's no hope