#databases


paper flower
timid brook
#

Hmm, I have seen some converter components already as part of pandas and Qt6, so maybe I could go that route and just let SQL do its basic thing, since one part of the issue is display and the other part is the import. But for the import I am already massaging the data so much that it is easy to just adjust the input type to be an actual date, for example.

coral wasp
naive otter
#

anyone know anything about AWS DMS endpoint creation?

#

when selecting an RDS instance nothing shows up, but I have an RDS Aurora PostgreSQL DB

civic cloak
#

I was pondering this exact issue and decided to hop on here to ask. Saw this from a few days ago and it answered my question in full. Cheers mate!

prime cypress
#

Guys, I am creating a legal document app using Django.

I have permissions based on organization and on subcategories within the document. A user is assigned to manage a subcategory.

Right now, my document is created using many foreign keys. I can't denormalize it because of permissions etc.

I am using Django with PostgreSQL.

I am not sure which method to follow:

  • Temporal database
  • django-simple-history (relation with history_id?)
  • or any other, better approach; suggestions would be appreciated!
tepid basalt
prime cypress
#

I could share it, but I have an app for authentication and the project itself.

prime cypress
#

I'll paste the model code in a github repo for you to have a look into

prime cypress
tepid basalt
# prime cypress https://github.com/abubakar20-02/code

Thanks for sharing! I haven't ever built anything similar to this, but it looks like you have some clearly defined requirements and constraints. A couple of notes; it might be worth adding the answers to some of these as docstrings:

  • A User's organization can be null. Can a user exist without being a part of an organization?
  • The TenantOwnedMixin defines an organization and a user field. A user can also belong to an organization. Can the organization ever be different from the user's organization? If not, consider a boolean field for whether it is public or not instead.
  • There are a lot of RFP models. What is the relationship cardinality between them? For example, if a GeneratedRFP can only have one SubmittedRFP, consider combining them into one model.
  • Can GeneratedRFP.organization and FinalizedRFP.organization be different? If not, just keep one.
  • On FinalizedRFP, if there can only be one per GeneratedRFP, consider using an is_finalized boolean field on GeneratedRFP instead.
  • In ResponseRFP there are three boolean fields, and a status field. Are these all distinct pieces of data? Perhaps they can be combined into one status field.
  • Instead of naming your foreign key fields to User `user`, use a more descriptive name like created_by or owner.
prime cypress
#

Thanks for looking into my app.

  • A standard user cannot exist without being part of an organization.
  • As for your second point, if a piece of data does not have an organization_id attached to it, it becomes publicly available to all organizations.
  • For your third point, I was experimenting with different approaches, and at that time I had tried to create a denormalized copy. There is another tracking feature I was going to work on which required SQL-like behaviour.
  • You are right here.
  • I wanted the form to be immutable, which is why I created another table.
  • You are right about the other 2 points.

Now for what I am trying, which does seem to be a better solution than temporal or history tables, imo:
I have tried to containerize all relations in the RFP document: each one has its own copy of the relational data, which can then be queried using relations.

Do you think this would be a good approach at scale? I believe this should get rid of the mutability issue.

tepid basalt
# prime cypress Thanks for looking into my app. - a standard user cannot exisit without being...

I think the different tables make it complicated by introducing a lot of redundancy. Instead I would use just one table for RFP, if you need to be able to view the same document at different steps of the process (submitted, accepted, finalized), then I would add one other table to hold the unique IDs and any constant data (user, organization). You can use composite unique constraints to enforce one finalized version per RFP for example.

I know I don't fully understand all your requirements and constraints, but maybe that's helpful.
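A minimal sketch of that single-table idea, using Python's built-in sqlite3 as a stand-in for Django/PostgreSQL. The table and column names (`rfp`, `rfp_version`, `stage`, `body`) are hypothetical; `UNIQUE (rfp_id, stage)` is the composite unique constraint that allows at most one finalized version per RFP, and a NULL `organization_id` models the "public to all organizations" rule mentioned above.

```python
import sqlite3

# Hypothetical schema: constant data lives on "rfp", per-stage snapshots on "rfp_version".
SCHEMA = """
CREATE TABLE rfp (
    id INTEGER PRIMARY KEY,
    created_by INTEGER NOT NULL,
    organization_id INTEGER  -- NULL: visible to every organization
);
CREATE TABLE rfp_version (
    id INTEGER PRIMARY KEY,
    rfp_id INTEGER NOT NULL REFERENCES rfp(id),
    stage TEXT NOT NULL CHECK (stage IN ('submitted', 'accepted', 'finalized')),
    body TEXT NOT NULL,
    UNIQUE (rfp_id, stage)  -- at most one version per stage per RFP
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO rfp (id, created_by, organization_id) VALUES (1, 42, 7)")
conn.execute("INSERT INTO rfp_version (rfp_id, stage, body) VALUES (1, 'submitted', 'draft text')")
conn.execute("INSERT INTO rfp_version (rfp_id, stage, body) VALUES (1, 'finalized', 'final text')")

rejected = False
try:
    # A second finalized version of the same RFP violates UNIQUE (rfp_id, stage).
    conn.execute("INSERT INTO rfp_version (rfp_id, stage, body) VALUES (1, 'finalized', 'edited')")
except sqlite3.IntegrityError:
    rejected = True
print("second finalized version rejected:", rejected)
```

In Django the same rule would be declared in the model's `Meta.constraints` with `models.UniqueConstraint(fields=[...], name=...)`.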

wicked geyser
prime kelp
#

I'm trying to use SQLAlchemy and I'm really struggling with the typing from an IDE perspective

#

I'm on SQLAlchemy version 2.0.42

#

Same thing when I just try this simple getting-started script

#

From what I can tell there is no maintained types lib for SQLAlchemy?

wheat basalt
#

Guys, I need some advice on something. Basically, I'm building a side project that involves storing a huge number of chat messages (encrypted and embedded) for semantic search using Elasticsearch. My laptop isn't very powerful (16 GB of RAM and 512 GB of storage). From a system design perspective, what is the optimal way to store a huge number of messages at once, within seconds, so that the user doesn't have to wait long?

finite kelp
finite kelp
sharp temple
#

I want to learn some SQL. I have MySQL installed on WSL Ubuntu, but are there any massive databases I can connect to for practice? It would save me making loads of tables and stuff, unless there is an easy way to do that

coral wasp
fallow sapphire
willow eagle
#

Hi, I want to use dbdiagram.io for my Django webapp, but I don't know how. Can someone help me? Once I've coded the tables, how do I link them?

tepid basalt
lofty summit
#

looking at deploying a Python package directly from GitHub Actions, PyPI recommends publishing a new version to its test server for every commit? :monkacoffee: https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/#separate-workflow-for-publishing-to-testpypi

seems like a waste

And it’ll publish any push to TestPyPI which is useful for providing test builds to your alpha users as well as making sure that your release pipeline remains healthy!

I don't get it, alpha users can download it from the repo directly with pip install git+...

storm mauve
# lofty summit looking at deploying a python package directly from github action, pypi recommen...

that is more on topic for #packaging-and-distribution than this channel

that said, for projects with multiple contributors you should never commit directly to the main branch, only merge PRs into it, and it is common to run tests and (when applicable) check that building, packaging and such work on each merge

having those versions available in a package index can make it a lot easier for users to help find regressions (bugs, performance issues, etc) without needing to manually install and bisect - pip install git+ requires building the project locally, which can be complicated for non-pure-python projects

floral willow
#

Hi guys, I need help with a MySQL replication setup. Is there a dedicated channel?

floral willow
#

Is there any dedicated server for MySQL?

timid brook
# floral willow Is there any dedicated server for MYSql

Possibly, but you would have to google for it, as this is the Python one. There are certainly channels for common topic overlaps, though: if you are using Python and MySQL, you could still just ask here. Otherwise, you can try looking for another server. Your best bet for finding one is likely the MySQL subreddit, to see if they have a Discord link. They probably do.

silent mountain
#

should I use sqlite3 for my project that keeps track of all your expenses?

north quail
#

I've heard database-type questions have been going out of date for interviews and similar environments. Does that mean LeetCode should be less useful for the newer generation of coders, such as myself?

valid umbra
slow condor
#

Yall how do I perform a ddos attack vía ftp protocol

verbal loom
#

Wrong channel and we can't answer anyways

jade wing
river wyvern
#

I get a zip file with a bunch of data every day, and I want to load it into a database in a good way.

The zip file is about 2-3 GB of data, spread across a bunch of .json files.

Currently, loading it takes me about 45-60 minutes. Is this reasonable?
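Whether 45-60 minutes is reasonable depends on the database and hardware, but two things that commonly slow this kind of load down are extracting everything to disk first and committing after every row. A minimal sketch of the alternative: read members straight out of the archive and insert in batches inside one transaction. It assumes each `.json` member is a list of flat objects with hypothetical `company`, `year` and `revenue` keys, uses SQLite as a stand-in for whatever database you load into, and `BATCH_SIZE` is an arbitrary starting point.

```python
import io
import json
import sqlite3
import zipfile
from typing import Iterator

BATCH_SIZE = 10_000  # rows per executemany() call; an arbitrary starting point

def iter_records(zip_source) -> Iterator[tuple]:
    """Yield rows from every .json member, reading members straight from the archive."""
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue
            with zf.open(name) as fh:
                # Assumption: each member is a JSON list of flat objects.
                for rec in json.load(fh):
                    yield (rec["company"], rec["year"], rec["revenue"])

def load_zip(conn: sqlite3.Connection, zip_source) -> int:
    conn.execute("CREATE TABLE IF NOT EXISTS filings (company TEXT, year INTEGER, revenue REAL)")
    total, batch = 0, []
    with conn:  # a single transaction, committed once at the end
        for row in iter_records(zip_source):
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                conn.executemany("INSERT INTO filings VALUES (?, ?, ?)", batch)
                total += len(batch)
                batch.clear()
        if batch:
            conn.executemany("INSERT INTO filings VALUES (?, ?, ?)", batch)
            total += len(batch)
    return total

# Usage with a small in-memory archive of placeholder records (not real figures).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.json", json.dumps([{"company": "ExampleCo", "year": 2023, "revenue": 1.0}]))
    zf.writestr("b.json", json.dumps([{"company": "OtherCo", "year": 2022, "revenue": 2.0}]))
conn = sqlite3.connect(":memory:")
loaded = load_zip(conn, buf)
print(loaded)
```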

jade wing
river wyvern
jade wing
river wyvern
# jade wing everything installed locally at least prevents any latency that the network coul...

I'm just messing around for now; there's a loose plan that it could become a public API at some point.

Then this question might apply.

If I have a zip with a bunch of data.

if I want to access data, I assumed I should load this data into a Database for fast access. Is this assumption correct?

As an example If I want the revenue of Apple 2023 i figured it would be faster to find that from within a database rather than sort of parsing through the .json in python.

timid brook
jade wing
# river wyvern Im just more messing around, there's some loose plan it could be a public api at...

consider that compressed data will generally be faster to read from disk than uncompressed data, since there is less data to read from disk
but it will use more CPU to decompress the data each time it needs to be read, and many compression formats and algorithms (but not all) require you to read the file from the beginning and decompress it until you reach the chunk that contains the data you are looking for
some algorithms and formats can decompress chunks directly without reading and decompressing all of the preceding chunks, but then you need to know where in the file the data you are looking for resides, through some kind of index or similar, and that is a challenge in itself

generally a database of some kind (one that is suitable for the sort of data and data structures you have) will probably be better to find data, especially if you can apply the right kind of index (for the sort of data in question) on the data
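To make the index point concrete for the "revenue of Apple in 2023" example: a sketch in Python's sqlite3 with a hypothetical `filings` table and placeholder figures (not real data). A composite index on `(company, year)` lets the database jump straight to the matching row instead of scanning every row, and `EXPLAIN QUERY PLAN` shows whether the index is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filings (company TEXT, year INTEGER, revenue REAL)")
# Hypothetical placeholder rows, not real figures.
conn.executemany(
    "INSERT INTO filings VALUES (?, ?, ?)",
    [("ExampleCo", 2022, 10.0), ("ExampleCo", 2023, 12.5), ("OtherCo", 2023, 3.0)],
)
# A composite index matching the lookup pattern: company first, then year.
conn.execute("CREATE INDEX idx_filings_company_year ON filings (company, year)")

LOOKUP = "SELECT revenue FROM filings WHERE company = ? AND year = ?"

def revenue_for(company: str, year: int):
    row = conn.execute(LOOKUP, (company, year)).fetchone()
    return None if row is None else row[0]

# Ask SQLite how it would run the lookup; the last column of each row is a text description.
plan = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("ExampleCo", 2023)).fetchall()
print(revenue_for("ExampleCo", 2023))
print([row[-1] for row in plan])
```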

river wyvern
timid brook
river wyvern
timid brook
# river wyvern ITs looking at forms submitted, so quarterly and annaual reports, however, there...

I am not sure whether you are not fully understanding what I am saying or just not sharing enough information, but what is the nature of the relationship between each day's data? Is every day a copy of every single form ever submitted, i.e. all the forms from the previous day plus the new ones, or something completely different? Is each form just a bunch of free-form text fields, or explicit details plus maybe a 'comment' field or something? What is the goal of the data, i.e. what exactly do you want to do with it?

timid brook
dense ravine
#
```python
# Creating local temp table
source_conn.execute(f"""DROP TABLE if exists {temp_table};""")
source_conn.execute(f"""CREATE TABLE {temp_table} (LIKE {table} INCLUDING ALL);""")
source_conn.execute(f"""INSERT INTO {temp_table} SELECT * FROM {table};""")
```

Guys, I want to create a copy of a table in Postgres

#

Is this the best way?

#

Seems quite slow

#

The goal is to send a table from local to remote without creating any downtime, to make it transactional

#

So I create a temporary table.

tepid basalt
#

You can format code with triple backticks:
```python
code
```

tepid basalt
# dense ravine Goal is to send a table from local to remote. Without creating any downtime. To ...

In the past I've used DuckDB to copy a table between databases. That worked pretty well.

```python
import duckdb

# Connect to the databases (source_db / target_db hold the connection settings)
duckdb.install_extension("postgres")
duckdb.load_extension("postgres")
duckdb.execute(f"ATTACH 'host={source_db.host} port={source_db.port} dbname={source_db.db} user={source_db.user} password={source_db.password}' AS source_db (TYPE POSTGRES);")
duckdb.execute(f"ATTACH 'host={target_db.host} port={target_db.port} dbname={target_db.db} user={target_db.user} password={target_db.password}' AS target_db (TYPE POSTGRES);")

# Copy table
duckdb.execute(f"DROP TABLE IF EXISTS target_db.{table};")
duckdb.execute(f"CREATE TABLE target_db.{table} AS FROM source_db.{table};")
```
dense ravine
#

Okay so this is just a copy

#

Does DuckDB have a full solution to send a table from one DB to another in a transaction?

tepid basalt
#

As in a database transaction? Yeah that's supported by duckdb.
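For the "no downtime" goal, one common pattern is to fill a staging table first and then swap names inside a single transaction, so readers see either the complete old table or the complete new one. A sketch using Python's sqlite3 as a stand-in: PostgreSQL also supports transactional DDL, so the idea carries over, but the SQL here is SQLite's. `replace_table`, the `_staging`/`_old` suffixes and the two-column `prices` table are hypothetical.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/COMMIT statements below are the only transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE prices (item TEXT, price REAL)")
conn.execute("INSERT INTO prices VALUES ('old', 1.0)")

def replace_table(conn: sqlite3.Connection, table: str, rows) -> None:
    # Table names are trusted constants here; never build them from user input.
    staging, backup = f"{table}_staging", f"{table}_old"

    # 1. Fill a staging table first; the slow copy happens before any swap.
    conn.execute("BEGIN")
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} AS SELECT * FROM {table} WHERE 0")
    conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)  # two columns, as in `prices`
    conn.execute("COMMIT")

    # 2. Swap names atomically: either all three statements apply or none do.
    conn.execute("BEGIN")
    try:
        conn.execute(f"ALTER TABLE {table} RENAME TO {backup}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute(f"DROP TABLE {backup}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

replace_table(conn, "prices", [("new", 2.0), ("newer", 3.0)])
print(conn.execute("SELECT * FROM prices ORDER BY item").fetchall())
```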

jade wing
harsh pulsar
timid brook
harsh pulsar
#

will you drop your database? unlikely. will your app crash with an error? likely.

timid brook
icy glacier
harsh pulsar
harsh pulsar
#
```python
with db.cursor() as c:
    c.execute("DROP TABLE IF EXISTS identifier(%s)", (table_fqn,))
```

it's very very useful for this kind of thing
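When a driver or database has no way to bind identifiers (no `identifier()` equivalent), a common fallback is to validate the name against a strict allow-pattern and quote it yourself; psycopg users also have `psycopg.sql.Identifier` for this. A sketch with a hypothetical `quote_identifier` helper; the pattern is an assumption and is deliberately stricter than what most databases accept.

```python
import re

# Assumption: only plain names made of letters, digits and underscores,
# optionally schema-qualified as "schema.table".
_NAME_PART = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def quote_identifier(name: str) -> str:
    """Validate a possibly schema-qualified name and return it double-quoted."""
    parts = name.split(".")
    for part in parts:
        if not _NAME_PART.fullmatch(part):
            raise ValueError(f"refusing suspicious identifier: {name!r}")
    # Double quotes are the SQL-standard identifier quote (PostgreSQL, SQLite, ...).
    return ".".join(f'"{part}"' for part in parts)

sql = f"DROP TABLE IF EXISTS {quote_identifier('analytics.daily_sales')}"
print(sql)

try:
    quote_identifier("users; DROP TABLE users")
    injection_blocked = False
except ValueError:
    injection_blocked = True
```

Note that quoting preserves case in PostgreSQL, so `"Sales"` and `sales` refer to different tables; names must match exactly.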

icy glacier
storm mauve
#

security, access control, data ownership, centralized storage, easier replication, syncing between devices?

Though many apps do store data locally, and Firebase is a thing

#

who has access to which parts of the data (be it per tables, columns or rows)

#

for single user applications, sure
for applications that must interact with data produced by others, you would either end up with a centralized database of some sorts or need to use some federation model, which can be annoying to work with

#

and yeah, by ownership I mean "(some) companies want to keep the user's data hostage, stored exclusively in their servers"
see also: discord messages

#

maybe take a look at ActivityPub and AtProto if you haven't heard about them before

not databases, but federation models to share data between different applications without one centralized server
-# the data is still stored in servers though, just not one server

#

(also don't forget that client-only applications exist, i.e. ones that do not interact with any external servers at all, in which case you can just store everything in a SQLite file... or just store data locally and use APIs for whatever)

#

as long as you write it in a cross platform language/framework, you don't need to rewrite for each type of device?

no need to separate into a server

torn sphinx
#

Is there a way to check a range of numbers?

#

I'm using a stack of if statements to check if int(input()) is 1-4 by doing
```py
if input == 1:
    ...
elif input == 2:
    ...
elif input == 3:
    ...
# etc....
```

What is the shortcut?

#

if input = (1-4)
?

#

Oh pshshhh

#
```py
    print("Valid Output")
else:
    pass
```
jade wing
# torn sphinx Is there a way to check a range of numbers?

yes, several ways, but this doesn't sound like a question that has anything to do with databases, which is the topic of this channel

but since I'm here I'll answer you; if you have follow-up questions or further questions of this general nature, though, it's better to take them to #python-discussion or your own help channel #❓|how-to-get-help

with all that out of the way, the two most common ways of checking whether an integer in a variable (let's call it x) is between two integers (inclusive), let's say 1 and 4 as in your example, are:
```py
if x >= 1 and x <= 4:
    print(f"{x} is between 1 and 4 inclusive")
```
or a more mathematical and compact (and in my opinion more concise) way:
```py
if 1 <= x <= 4:
    print(f"{x} is between 1 and 4 inclusive")
```
and one last thing: don't call a variable `input`, as that would "hide" the built-in function with the same name that is part of the core Python language (or at least make it much less accessible)
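One more of the "several ways", for completeness: membership in a `range`. The stop value of `range` is exclusive, so 1 to 4 inclusive is `range(1, 5)`. `is_valid_choice` is just an illustrative name.

```python
def is_valid_choice(x: int) -> bool:
    # range(1, 5) contains 1, 2, 3 and 4: the stop value is exclusive.
    # For ints, `in range(...)` is computed arithmetically, not by looping.
    return x in range(1, 5)

for candidate in (0, 1, 4, 5):
    print(candidate, is_valid_choice(candidate))
```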
#

but it's also stuck on that device, and many people use more than one device nowadays, so synchronizing data between that person's different devices, and knowing which instance is the latest and correct one, might be a big issue
also, backing up the data will be entirely up to the person. Some people will be responsible with their data and back it up properly, but a lot of people today are so used to having that taken care of for them: if something happens to their device, they can recover most of their data just by logging in to their apps

keen minnow
#

how would you solve synchronization?

Dealing with conflicts is a tough problem. Other interesting problems:

  • How do your devices A and B find each other?
  • You log in with device A and do some work. Then A disconnects completely. Now you log in with device B, but device A is no longer available to make your latest changes available
  • Migrating schemas, and dealing with version differences in general, is going to take more work to ensure a compatible path
#

All in all, you are creating a lot more work for yourself without a well-defined value

#

Nextcloud is privacy-first and they do use a server model

paper flower
#

You can look into projects like Anytype. They use a server for data sync, but it's optional, and all of your notes are encrypted.

jade wing
#

Signal only allows one device to be logged in to an account at a time, and if you change device it is up to you to first migrate your information to the new device if you want to keep it

#

it's cumbersome to say the least

#

no, since you can't be logged in to any second device other than via a web browser, which is kind of a hack with some security problems of its own

#

oh, yes the other party, they have their own copy for their account

#

it's not a sync: you send and they receive, as it's messages

#

if one party deletes something it will still remain for the other party, it's not synced at all

#

no, it's not at all the same if we are talking databases

#

it's vastly different from messages

#

one needs consistency and the other doesn't

#

in that case you will have to look into the "eventual consistency" that some database clusters use, but it can be extremely complex, especially for the kind of scenarios we have been outlining above, and it can also make for a less than satisfactory (and extremely confusing) user experience

#

and if a device which had the last update breaks before it can sync with one of your other devices that data is just gone forever
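To make the data-loss trade-off concrete: a common (and lossy) conflict resolution rule in eventually consistent systems is last-writer-wins, where each replica keeps a timestamped value and merging keeps whichever looks newer. A minimal sketch with hypothetical names; real systems also have to deal with clock skew, which this ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time of the write on the device that made it
    device_id: str    # tie-breaker when timestamps are equal

def merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep whichever write looks newer; the other is discarded."""
    return max(a, b, key=lambda v: (v.timestamp, v.device_id))

# Both devices edited the same note while offline.
on_phone = Versioned("groceries: milk, eggs", timestamp=100.0, device_id="phone")
on_laptop = Versioned("groceries: bread", timestamp=101.0, device_id="laptop")

merged = merge(on_phone, on_laptop)
print(merged.value)  # the phone's edit is silently lost
```

The merge gives the same answer in either order, which is why replicas converge, but the phone's edit is simply gone; that is exactly the kind of quiet data loss described above.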

#

and this is very different from a database cluster, where all nodes are expected to be connected to one another and exchanging data continuously the vast majority of the time, with disconnections limited to edge cases and failure scenarios

#

that isn't at all true for user devices

#

also, Signal and WhatsApp transfer information between the parties in a conversation via each platform's servers, where it is stored temporarily in an asymmetrically encrypted form until the other party can pick it up, so it is not strictly local in that sense

#

if you want to keep it only local i mean

#

you also mentioned processing of the information server-side, then there is still the issue of user trust, gdpr and all of that

#

unless you strictly use homomorphic encryption (look it up) for the information that is sent to the server, which is a really cool technology but that is still being researched and can't be used for just any processing yet

#

GDPR compliance officers and lawyers would probably beg to differ: as long as you process the information, even if you only hold it in RAM, it can still be leaked, and you would then be liable for that

#

if there is no PII (if not even by reference) you should be safe from GDPR (but don't take my word for it, it's not legal advice)

#

yeah, discord isn't really a good reference, and they do store PII, e-mail, IP addresses and even the username or user ID (even the numerical one) counts as such

#

either way, I think an eventual consistency scheme with so few, so sporadically interconnected nodes is ripe for data loss, with extended periods of inconsistent data and only partial data being available. That will be the trade-off here; the question is whether it is acceptable to users

glacial nymph
#

two years ago

#

I use

#

sqlite3

#

because

#

easy

#

comments

#

but

#

now

#

I use

#

mysql

#

because

#

faster (:

timid brook
#

both of them are faster than one word per message I bet

hardy tapir
#

Hey guys, I want to learn databases from scratch, for both data science and machine learning. I have a good programming background: I first learned C, and I have been working with Python and its libraries for the past 4 months. I'd like some suggested sources to learn DBs

waxen burrow
hardy tapir
#

Thanks bro

coral wasp
lean flint
#

Hey.
I have a bit of a specific scenario on my hands regarding a database and how to handle it, and I would like input from a greater audience 😄

#

Is anyone up for a quick discussion?

#

(I guess i will just get to it so if someone reads it later they are up to speed)

#

So. I am making a website for a friend.
It needs to hold several tests and their questions, plus data about the questions.

As of right now, I opted to use one big table that holds all of the questions, with these columns:

  • question-id
  • language
  • text (The question itself)
  • test (Which test the question belongs to)
  • tags (Future expandability/test specific things if any)
  • revers (Boolean. Important for result calculation only)
  • min_val
  • max_val
  • domain
  • style
#

There is localization on the site (or at least it's planned), and the question ID is going to be the same for the same question in different languages.

#

I am pretty happy with this setup so far, but I talked with a friend and they did not like it at all. They recommended that every test be its own table and that the localized texts be held in the same row, just in different columns.

#

And I am torn.
Because my version holds a lot of metadata that is shared between questions (localized questions are basically the same except for 2 columns)

#

But his proposal makes adding a new language kind of a nightmare, and a new test means a new table.

silent bison
lean flint
#

You mean something like:

A QuestionMeta table that holds the metadata that is shared between localized questions

and

A Questions table that holds the language and the text of the question itself, with both sharing a question ID as a link?

lean flint
#

I am a bit lost chief lol

silent bison
#

this is following DNF rules
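A minimal sketch of that two-table layout in Python's sqlite3. The table names `question_meta` and `question_text` are hypothetical, the columns come from the original list, and the question texts are placeholders. The composite primary key `(question_id, language)` allows exactly one translation per language per question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question_meta (
    question_id INTEGER PRIMARY KEY,
    test TEXT NOT NULL,
    tags TEXT,
    revers INTEGER NOT NULL DEFAULT 0,  -- boolean, only used for result calculation
    min_val INTEGER,
    max_val INTEGER,
    domain TEXT,
    style TEXT
);
CREATE TABLE question_text (
    question_id INTEGER NOT NULL REFERENCES question_meta(question_id),
    language TEXT NOT NULL,
    text TEXT NOT NULL,
    PRIMARY KEY (question_id, language)  -- one translation per language
);
""")
conn.execute("INSERT INTO question_meta (question_id, test, min_val, max_val) VALUES (1, 'test_a', 1, 5)")
conn.executemany(
    "INSERT INTO question_text VALUES (?, ?, ?)",
    [(1, "en", "Question 1 (English)"), (1, "de", "Frage 1 (Deutsch)")],
)

def questions_for(test: str, language: str):
    """All questions of one test in one language, with their shared metadata."""
    return conn.execute(
        """SELECT m.question_id, t.text, m.min_val, m.max_val
           FROM question_meta AS m
           JOIN question_text AS t USING (question_id)
           WHERE m.test = ? AND t.language = ?
           ORDER BY m.question_id""",
        (test, language),
    ).fetchall()

print(questions_for("test_a", "de"))
```

Adding a language then means inserting rows into `question_text`, and adding a test means inserting rows into `question_meta`; neither needs a schema change.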

lean flint
lean flint
#

Appreciate the input

silent bison
hybrid dirge
paper flower
hybrid dirge
#

Document databases are not always correct, nor should my statement imply that, it's just a knee jerk response to your given scenario since transforming and mapping within a python ecosystem is much more user-friendly since you just apply your core methods to literal objects.

paper flower
#

Certainly not worth using a different DB in their case, imo.
By the way, the recommended solution looks fine.
If you need multiple types of answers, you'd probably expand on this by embedding JSON in the Question table to hold the question type and its parameters, e.g.:

```json
{
    "type": "number",
    "expected": 42
}
{
    "type": "string",
    "expected": ["any", "of", "these"]
}
```

but you'll still bump into localization problem if you want your answers to be localized too
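The JSON only describes the question; something still has to interpret it. A minimal sketch of that dispatch with a hypothetical `check_answer` helper, assuming the two `type` values shown and that string answers are matched case-insensitively (the original doesn't specify that).

```python
import json

def check_answer(spec_json: str, answer: str) -> bool:
    """Validate a raw user answer against a question-type spec stored as JSON."""
    spec = json.loads(spec_json)
    if spec["type"] == "number":
        try:
            return float(answer) == spec["expected"]
        except ValueError:
            return False  # the answer is not a number at all
    if spec["type"] == "string":
        # "expected" lists every accepted answer; compare case-insensitively.
        accepted = {s.casefold() for s in spec["expected"]}
        return answer.strip().casefold() in accepted
    raise ValueError(f"unknown question type: {spec['type']!r}")

number_spec = '{"type": "number", "expected": 42}'
string_spec = '{"type": "string", "expected": ["any", "of", "these"]}'
print(check_answer(number_spec, "42"), check_answer(string_spec, " Of "))
```

Adding a new question type is then a new branch here plus a new `type` value, with no schema change.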

paper flower
#

In my case, MongoDB was recommended by the system architect for some reason, but I ended up just using Postgres and SQLAlchemy, embedding the document I needed in the SQLAlchemy model/table and wrapping it in a pydantic model.

hybrid dirge
# paper flower My experience with mongodb was mostly negative, maybe I wasn't using it correctl...

I had a similar time walking into mongo from a die hard sql background myself.

It took multiple projects leveraging both SQL and document db styles to realize the benefits of both.

It's actually really reminiscent of the paradigm of dynamic vs static programming as you get similar problems. Document drift is real and will indeed hurt you if your migrations do not respect previous versions, where in SQL you can process migrations with a stronger sense of confidence that you won't blow up your entire schema.

paper flower
#

I don't see a lot of reason to use both SQL and MongoDB in the same project, at least if you have to interact with both within a single transaction

hybrid dirge
#

Indexing and searching

#

Document databases will always outperform SQL matching

paper flower
#

I feel like that's completely irrelevant to probably 95%+ of projects

#

Until you have hundreds of gigabytes of data you should probably be fine; then you could think about using specific databases for your search functionality, if you end up needing them.

hybrid dirge
#

Depends on your field, and specific use case. For example building a search engine or needing to ad-hoc reference historical data on the fly that isn't cached. You can use a document style db for a lookup map to help optimize your SQL set.

#

There's a lot of use cases, but it really boils down to experience

#

I've also been doing data science for about 20 years

coral wasp
#

Also, all problems are ducky_blurp problems to me, anyway. (And I'll take parquet over anything else)

hybrid dirge
# coral wasp I'm very happy with using DuckDB to query json files, fwiw, rather than going th...

Yeah that's totally fair man - it does boil down to what you and your team prefer.

In the context of simplicity a lot of headaches for relational databases can be solved in document databases, but they swing on a pendulum of generating inverse problems due to being un-marshaled.

I can't say I use Mongo all that often myself; I just live in cloud infrastructure these days and choose the path of least resistance when designing a schema for a database problem.

brazen charm
hybrid dirge
placid crag
#

```c
#include <stdio.h>
#include "sqlite3.h"
//#include <sqlite3.h>
//main.c:2:10: error: 'sqlite3.h' file not found with <angled> include; use "quotes" instead
// 2 | #include <sqlite3.h>
// | ^~~~~~~~~~~
// | "sqlite3.h"
//1 error generated.

int main(void) {
    sqlite3 *db; // pointer to the database handle
    // open a connection to the database
    int result = sqlite3_open("test.db", &db);
    // if the connection was established successfully
    if (result == SQLITE_OK)
    {
        printf("Connection established\n");
    }
    else
    {
        // print the error message
        fprintf(stderr, "Error: %s\n", sqlite3_errmsg(db));
    }
    // close the connection
    sqlite3_close(db);
    return result == SQLITE_OK ? 0 : 1;
}

//LINK : fatal error LNK1181: cannot open input file 'sqlite3.lib'
//clang: error: linker command failed with exit code 1181 (use -v to see invocation)
```

old portal
placid crag
placid crag
placid crag
jade wing
#

yeah 🙂

torn sphinx
#

Hi

torn sphinx
#

Russian text + C is a bad combination.

placid crag
placid crag
torn sphinx
# placid crag Why?

Because combining completely different letters can confuse someone, especially beginners. And the difference between Russian and English writing is significant.

#

No, just in order, although I'm using Python and I don't understand much because I hardly focus on the language.

torn sphinx
placid crag
torn sphinx
#

I use English and my own language, Spanish, for my code. @placid crag

#

It's easier

placid crag
torn sphinx
#

Sometimes my code gets a bit long, and since there's sometimes one part I need to review and then another, honestly it's easier to write it that way.

#

Some lines are connected to other lines, and it turns into an ordering problem

blazing gate
#

Anyone experienced with Neo4j? I have some questions to ask

pastel vale
#

Experimenting on a UFC database to practise SQL queries and how to visualise data. The UFC database, I got from kaggle, contains the fight record and body statistics (i.e. height, reach, weight class) for every fighter. I want to make a query that sums up the wins accumulated per weight class. How would I go about it?

thorny anchor
#

what have you tried

pastel vale
#

Not tried anything yet. I'm thinking I'll create a view to join two tables (as one table contains the fight record and another contains the weight class). Then maybe create a window function to calculate the sum of wins per weight class. I could do this with Python, but I want to improve my SQL skills
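For a single total per weight class, a JOIN plus GROUP BY is usually enough; a window function becomes useful when you want that per-class total shown next to each fighter's own row. A sketch in Python's sqlite3 with hypothetical `fighters` and `fights` tables and placeholder rows (the Kaggle dataset's actual table and column names will differ). Window functions need SQLite 3.25 or newer, which current Python builds bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fighters (fighter_id INTEGER PRIMARY KEY, name TEXT, weight_class TEXT);
CREATE TABLE fights (fight_id INTEGER PRIMARY KEY, winner_id INTEGER REFERENCES fighters(fighter_id));
INSERT INTO fighters VALUES (1, 'Fighter A', 'Lightweight'), (2, 'Fighter B', 'Lightweight'), (3, 'Fighter C', 'Heavyweight');
INSERT INTO fights VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Total wins per weight class: join each win to its fighter, then aggregate.
WINS_PER_CLASS = """
SELECT f.weight_class, COUNT(*) AS wins
FROM fights AS w
JOIN fighters AS f ON f.fighter_id = w.winner_id
GROUP BY f.weight_class
ORDER BY wins DESC
"""

# Per-fighter wins with the class total alongside: this is where a window function fits.
WINS_WITH_WINDOW = """
WITH per_fighter AS (
    SELECT f.fighter_id, f.name, f.weight_class, COUNT(w.fight_id) AS fighter_wins
    FROM fighters AS f
    LEFT JOIN fights AS w ON w.winner_id = f.fighter_id
    GROUP BY f.fighter_id, f.name, f.weight_class
)
SELECT name, weight_class, fighter_wins,
       SUM(fighter_wins) OVER (PARTITION BY weight_class) AS class_wins
FROM per_fighter
ORDER BY fighter_id
"""

per_class = conn.execute(WINS_PER_CLASS).fetchall()
per_fighter = conn.execute(WINS_WITH_WINDOW).fetchall()
print(per_class)
print(per_fighter)
```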

pastel vale
# thorny anchor what have you tried

/\ btw, I've managed to get an engineering apprenticeship, but I still want to bring what I've learnt over these past 2 years about data analysis and databases into the apprenticeship. Any advice on how I can keep improving?

tepid basalt
lean olive
#

The website and content looks completely generated with AI. Are you affiliated with it?

jade wing
#

this almost sounds/looks like advertising
besides, this is probably not even the right channel for this
#data-science-and-ml is more Pandas/Polars and data analytics territory
and don't go advertising in that channel either

hollow nacelle
jade wing
hollow nacelle
spiral geode
#

Hello everyone,
I'm seeking assistance and feedback on my new open-source package, fastjson-db. It's a lightweight NoSQL database designed for speed and simplicity.
I would greatly appreciate any help with testing, code review, or general feedback on its functionality and usability. Your insights would be invaluable!

This is my first open-source package, so mistakes are expected. Feel free to open issues or pull requests.
Check it out here: https://github.com/MauricioReisdoefer/fastjson-db
Thank you in advance for your time and support!


storm mauve
# spiral geode Hello everyone, I'm seeking assistance and feedback on my new open-source packag...

I feel like if using SQLite is "overkill" for your project, you should either just use shelve or pydantic (serializing to and deserializing from JSON files directly), or use something like SQLModel

Your examples do not feel very convincing either, pretty sure it would be simpler to do the same thing with other libraries? In particular:

  • having to flush feels awkward, at least consider adding a context manager that does it automatically
  • Either use a decorator or inheritance, requiring both for the user to inherit your model class and for them to use a decorator in the same class is odd
  • Optionally combine "define a Table" and "add table to the Registry" into a single function call
#

also comments in Portuguese still make sense if all developers involved speak it, but keep the docstrings in English if you hope for international users to use it

#

if it were a serious project, I'd also recommend an expression-based way of querying, so that all the filters you need are chained before any data is loaded, instead of calling get_all() in each JsonQuerier method

(see polars for a practical example)
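A minimal sketch of the expression-based idea, using a hypothetical `Query` class (neither fastjson-db's API nor polars'): chaining only records filters, and `collect()` applies all of them in a single pass over the rows, stopping early once a limit is reached.

```python
import operator
from dataclasses import dataclass
from typing import Any, Iterable, Optional

_OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}

@dataclass(frozen=True)
class Query:
    """Hypothetical lazy query: chaining records filters, collect() runs them."""
    predicates: tuple = ()
    limit_n: Optional[int] = None

    def where(self, field: str, op: str, value: Any) -> "Query":
        compare = _OPS[op]

        def pred(row: dict) -> bool:
            return field in row and compare(row[field], value)

        # Return a new Query; nothing is evaluated yet.
        return Query(self.predicates + (pred,), self.limit_n)

    def limit(self, n: int) -> "Query":
        return Query(self.predicates, n)

    def collect(self, rows: Iterable[dict]) -> list:
        out = []
        for row in rows:  # one pass over the data, however many filters were chained
            if all(p(row) for p in self.predicates):
                out.append(row)
                if self.limit_n is not None and len(out) >= self.limit_n:
                    break  # stop reading as soon as we have enough rows
        return out

users = [{"name": "ana", "age": 31}, {"name": "bo", "age": 17}, {"name": "cy", "age": 45}]
first_adult = Query().where("age", ">", 18).limit(1).collect(users)
print(first_adult)
```

Because `collect()` accepts any iterable, a storage layer could stream rows from disk into it instead of materializing a full `get_all()` list first.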

#

lastly, that fixture is very weird: https://github.com/MauricioReisdoefer/fastjson-db/blob/main/tests/jsonquerier_test.py#L20

you should keep the code inside the with tempfile context manager: as long as you yield inside of it, it'll only exit once the fixture is no longer being used, so there is no reason to use delete=False and then delete the file yourself. You should also use the Path it gives you directly instead of getting its name

(if it was for some reason like your library only accepting str and not Path objects, do add support for Path objects)
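A sketch of a fixture shaped that way. It assumes a temporary directory fits the tests (the original fixture's exact tempfile call isn't shown here, so swapping in `TemporaryDirectory` is an assumption), and the file name is hypothetical. The generator is left undecorated so the sketch runs without pytest; in the test suite it would carry `@pytest.fixture`, and pytest's built-in `tmp_path` fixture is another option that hands you a `Path` directly.

```python
import tempfile
from pathlib import Path
from typing import Iterator

# In the test suite this would be decorated with @pytest.fixture.
def db_path() -> Iterator[Path]:
    # The directory is removed when the with block exits, which only happens
    # after the test has finished and the generator resumes past `yield`.
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield Path(tmp_dir) / "test_db.json"  # hypothetical file name

# Driving the generator by hand, the way pytest does:
gen = db_path()
path = next(gen)          # setup, before the test body runs
path.write_text("[]")
existed_during_test = path.exists()
next(gen, None)           # teardown: resume past `yield`, cleaning up the directory
print(existed_during_test, path.parent.exists())
```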

spiral geode
# storm mauve I feel like if using SQLite is "overkill" for your project, you should either ju...

First of all, thank you for the feedback!

About the comparisons:

Using / substituting shelve or pydantic is not exactly the focus of my library. My goal is to build a lightweight NoSQL ORM/Database, closer in spirit to something like TinyDB (or even pymongo on a smaller scale). But your point is valid — other libraries already exist, and this project is mainly for my own learning process, including packaging and distribution. Who knows, maybe one day it could be genuinely useful.

About the structure:

Flush: I agree, having to flush manually is awkward. A possible solution is to create a base class that automatically flushes and also handles table definition + registry in one place.

Decorator + inheritance: only inheritance is required. I've been using the decorator mostly for style, but I'll probably remove it from the examples to avoid confusion. Looking at the documentation now, I see I mentioned both there. It was mostly meant to say that models use dataclasses: because JsonModel is already a dataclass, you don't need to add the decorator again, but I didn't know exactly how to phrase that. I'll probably change that comment.

Comments/docstrings: the Portuguese comments were for my teachers, but I’ll soon translate everything to English to make it more accessible.

Query chaining: this is something I’d really like to add in the future, though I’m still figuring out how to design it. I’ll study projects like polars (like you said) for inspiration.

Testing/fixtures: you’re right, the fixture I made is unnecessarily complex. I’ll fix it by keeping the tempfile context and adding support for Path objects directly in the library.

Thanks again for the thoughtful feedback! I’ll work on these improvements and come back later with updates.

#

I came up with the flush system when I saw how terrible the performance was. So I changed it to keep everything in a cache and later "flush" it to .json files, something like C buffers. It actually helped performance, which is why I kept it, but hiding this complexity would be good
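Here is a minimal sketch of the "base class that flushes automatically" idea. It assumes a table is stored as a JSON list of dicts. `BufferedTable`, `insert`, `flush` and `max_pending` are hypothetical names, not FastJson-DB's current API. Writes are buffered in memory, much like a C stdio buffer. They are flushed when the buffer fills or when the `with` block exits. Each flush writes a temp file and swaps it in with `os.replace`, so a crash mid-write is less likely to leave a half-written file (the rename is atomic on POSIX filesystems).

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any


class BufferedTable:
    """Hypothetical sketch, not FastJson-DB's API: buffers writes, flushes in batches."""

    def __init__(self, path: str | os.PathLike, max_pending: int = 100) -> None:
        self.path = Path(path)  # accepts str or Path
        self.max_pending = max_pending
        self._rows: list[dict[str, Any]] = (
            json.loads(self.path.read_text()) if self.path.exists() else []
        )
        self._pending = 0  # writes not yet on disk

    def insert(self, row: dict[str, Any]) -> None:
        self._rows.append(row)
        self._pending += 1
        if self._pending >= self.max_pending:  # buffer full: write it out
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(self._rows))
        os.replace(tmp, self.path)  # swap the new file in with a single rename
        self._pending = 0

    def __enter__(self) -> BufferedTable:
        return self

    def __exit__(self, *exc: object) -> None:
        self.flush()  # don't leave buffered writes behind
```

With this shape, users never call `flush()` themselves unless they want to, and the batching that fixed the performance problem stays in place.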

harsh pulsar
#

"JSON + lightweight kv store" isn't new, e.g. TinyDB or doing it manually with shelve or dbm. But there aren't exactly a lot of options either, so I'm not going to doubt that there is room for innovation here. But it might be interesting to describe upfront how your thing differs from those things.

#

For example shelve and dbm are very low level. And TinyDB is a Mongo-alike and that's not everyone's cup of tea

#

Alternatively, maybe the value proposition is precisely that your thing is a higher level wrapper over established lower level things. "I wrote this library so you don't have to write this boilerplate code yourself" that kind of a project

#

Provide some opinionated structure over what would otherwise be an open ended task

#

That's not a bad thing either

spiral geode
# harsh pulsar Does this handle concurrent write and/or read?

Thank you for the comment!

Currently, FastJson-DB doesn’t handle concurrent read/write operations yet — that’s planned for an upcoming release.

About positioning: you’re right — JSON + lightweight KV isn’t new. Libraries like shelve and dbm already cover the very low-level use case, while TinyDB is a more flexible “Mongo-like” approach. The idea behind FastJson-DB is to provide something in between: a lightweight NoSQL ORM-like system that offers some opinionated structure (dataclasses, typed models, organized tables) without being as heavy as a full database system.

I’m currently restructuring the core so it can be simpler, faster, and easier to use, while sparing users from having to deal with complexity. The roadmap includes a journaling + garbage collector system (not on GitHub yet), which should keep performance high while still ensuring file safety and consistency.

If you’re interested, I’d be happy to share how the next update will work so you can provide feedback.

Thanks again — your feedback helps a lot

(Note: the current version is just an early prototype and will mostly be discarded in favor of a new structure. I even built a working forum on top of it, but it still needs a major reworking. That’s what I’m working on now.)

sharp tinsel
#

Hey folks, I have a very open-ended question. I've been using SQLAlchemy on and off for about 10 years, and the structure and features of the API just never click. I can't remember any of the semantics and I end up copy-pasting stuff from documentation (recently with the help of LLMs). I think I should understand all the constituent parts by now -- sockets, SQL, serialization, modeling, async, etc, but the way SQLAlchemy is put together just doesn't make sense to me.

It all ends up feeling more like Java than Python, with a never-ending pyramid of wrapping classes and indirection. Maybe I don't understand why the abstractions are laid out the way they are, which creates a lot of cognitive overhead, and I find that I just want to drop down to a familiar raw SQL socket connection all the time. Reading the source code has helped me understand many other libraries, but trying that with SQLAlchemy kind of leaves me even more confused.

I don't mean to say the library is bad, I'm probably just a bit dumb. Did you guys have any point in time when SQLAlchemy finally "clicked"? Like, are there any deeper points I seem to be missing, or is it a matter of going wide enough?

jade wing
# sharp tinsel Hey folks, I have a very open ended question. I've been using SQLAlchemy on and ...

sorry to say, i've never ever liked a single ORM i've encountered in any programming language through my years of programming, and i just think that SQL is so much easier (not that it is always easy, but still easier than the same functionality from most ORMs). i only use ORMs very rarely; mostly i use raw SQL with bind variables/placeholders (to keep security intact)
but i'm also guessing it's not the fault of the ORMs but rather the failing is probably on my part 🤷
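For anyone wondering what bind variables/placeholders buy you, here is a small sketch with the standard-library `sqlite3` module. The `users` table and its rows are made up. The placeholder style depends on the driver: `sqlite3` uses `?` or `:name`, while psycopg uses `%s` or `%(name)s`. The unsafe function is there only for contrast.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    # Safe: the value travels separately from the SQL text via the ? placeholder
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list[tuple]:
    # DON'T do this: string formatting lets the input rewrite the query
    return conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

evil = "nobody' OR '1'='1"  # classic injection payload
```

With the placeholder, `evil` is just an odd username that matches nothing. Formatted into the string, it turns the WHERE clause into one that matches every row.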

silent bison
tepid basalt
visual prism
#

I have a problem retrieving the datasets from my SQL account

minor cradle
#

Hey guys, I'm new to programming and I've reached the database part in Flask. Could you recommend a stack to choose for the database, along with sources, please?

jade wing
jade wing
# minor cradle I have just started

then just build something like a prototype using something simple like sqlite3 to start with
first to learn and also to discover what requirements the project may have

hollow spindle
cerulean flume
#

Hey guys, I just made ljobx, a Python CLI tool to explore LinkedIn jobs using public guest endpoints - no login needed. Fast, async, filterable, proxy/rotation support … built to make finding your next opportunity way easier.

PyPI: https://pypi.org/project/ljobx/

jade wing
obtuse galleon
#

I've started learning about SQL Databases :D

warm cedar
autumn hornet
#

How do I learn to become a Python backend developer?

cinder saffron
north pumice
visual prism
#

```php
<?php
// deploy_ftp.php
// Minimal PHP FTP deploy for a static site

$server     = 'ftp.example.com';
$username   = 'ftp_user';
$password   = 'ftp_pass';
$localRoot  = __DIR__ . '/dist';  // change to your build/output folder
$remoteRoot = '/public_html';     // change to your web root on the server

$ftp = @ftp_connect($server);
if (!$ftp) {
    fwrite(STDERR, "Could not connect to $server\n");
    exit(1);
}
if (!@ftp_login($ftp, $username, $password)) {
    fwrite(STDERR, "FTP login failed\n");
    ftp_close($ftp);
    exit(1);
}

// Passive mode is often required behind NAT/firewalls
ftp_pasv($ftp, true);

// Create every segment of $path on the server, one level at a time
function ensureRemoteDir($ftp, $path) {
    // 'strlen' drops empty segments but keeps a directory literally named "0"
    $parts = array_filter(explode('/', trim($path, '/')), 'strlen');
    $curr = '';
    foreach ($parts as $p) {
        $curr .= '/' . $p;
        // Attempt to create; ignore the failure if it already exists
        @ftp_mkdir($ftp, $curr);
    }
}

// Recursively mirror $localDir into $remoteDir
function uploadDir($ftp, $localDir, $remoteDir) {
    if (!is_dir($localDir)) {
        throw new RuntimeException("Local directory not found: $localDir");
    }
    ensureRemoteDir($ftp, $remoteDir);

    $items = scandir($localDir);
    foreach ($items as $item) {
        if ($item === '.' || $item === '..') continue;

        $localPath  = $localDir . DIRECTORY_SEPARATOR . $item;
        $remotePath = rtrim($remoteDir, '/') . '/' . $item;

        if (is_dir($localPath)) {
            uploadDir($ftp, $localPath, $remotePath);
        } else {
            $ok = @ftp_put($ftp, $remotePath, $localPath, FTP_BINARY);
            if (!$ok) {
                throw new RuntimeException("Failed to upload: $localPath");
            }
            echo "Uploaded: $remotePath\n";
        }
    }
}

try {
    uploadDir($ftp, $localRoot, $remoteRoot);
    echo "Deploy complete.\n";
} catch (Throwable $e) {
    fwrite(STDERR, "Error: " . $e->getMessage() . "\n");
    ftp_close($ftp);
    exit(1);
}

ftp_close($ftp);
```

#

Ready to deploy

jade wing
shadow sandal
#

Hi, I’m Francis 👋

Aspiring Data Engineer learning Python & SQL, currently building my first projects.

Excited to learn & connect 🚀

ashen socket
icy ruin
#

What's the most lightweight ORM you guys would recommend? I'm just writing simple queries but don't wanna write raw SQL in code

spiral geode
stray viper
#

I tried opening another one and answering it on here but the bot also closed that one

stray viper
#

the fix was to add an __all__ list containing the model classes to the module where the models are defined, and to stop using the FastAPI integration (it doesn't seem to be up to date: it just silently blows up, and the previous method doesn't work either)

final forge
#

Hello i am new

coral wasp
near locust
#

I’m a Full-Stack Developer ready to take on new projects.
Let’s connect!
If you have a project that needs a reliable developer, message me

jade wing
#

!rule ads

delicate fieldBOT
#

6. Do not post unapproved advertising.

old island
#

Ammmmm

storm mauve
orchid niche
uneven pagoda
thorn flint
#

Would anyone be able to help me put together logic for this sql exercise?

I have a table called company_table

3 columns: department, num_of_emps, types_of_devices

types_of_devices has these values: phone, laptop, camera, tablet

In one table view I want the count of accounts and count of employees who have:
1) only phone
2) phone and laptop
3) phone and other devices (not laptop)
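Here is one possible shape for this. Treat it with caution, because the question doesn't pin down the data layout. The sketch makes four assumptions:

- there is one row per (department, device) pair, with `num_of_emps` repeated on each of a department's rows;
- each department counts as one "account";
- "phone and laptop" counts whether or not other devices are present;
- "phone and other devices" requires that there is no laptop.

The technique is conditional aggregation. First, collapse each department into has_phone/has_laptop/has_other flags. Then label it with a CASE and count. It runs on SQLite via Python here, and the SQL is plain enough to port to other databases.

```python
import sqlite3

QUERY = """
WITH per_dept AS (
    SELECT department,
           MAX(num_of_emps) AS emps,
           MAX(CASE WHEN types_of_devices = 'phone'  THEN 1 ELSE 0 END) AS has_phone,
           MAX(CASE WHEN types_of_devices = 'laptop' THEN 1 ELSE 0 END) AS has_laptop,
           MAX(CASE WHEN types_of_devices NOT IN ('phone', 'laptop')
                    THEN 1 ELSE 0 END) AS has_other
    FROM company_table
    GROUP BY department
),
bucketed AS (
    SELECT emps,
           CASE
               WHEN has_phone = 1 AND has_laptop = 0 AND has_other = 0 THEN 'only phone'
               WHEN has_phone = 1 AND has_laptop = 1 THEN 'phone and laptop'
               WHEN has_phone = 1 AND has_other = 1 THEN 'phone and other (no laptop)'
           END AS bucket
    FROM per_dept
)
SELECT bucket, COUNT(*) AS accounts, SUM(emps) AS employees
FROM bucketed
WHERE bucket IS NOT NULL          -- departments with no phone fall out here
GROUP BY bucket
ORDER BY bucket
"""


def device_buckets(conn: sqlite3.Connection) -> list[tuple]:
    """Return (bucket, account_count, employee_count) rows."""
    return conn.execute(QUERY).fetchall()
```

If the real table instead stores a comma-separated device list per row, the flag step would change, but the "flags, then CASE, then count" structure would stay the same.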

thorny anchor
#

what have you tried so far?

analog gust
#

Hi everyone, my name is Konstantin.

I have a huge favor to ask the server: my best friend recently got into an accident and now he can't finish his university thesis. It's a small Python game that also uses Q-learning. I will present his thesis in his place, and I also have to finish it. But unfortunately I'm not very good at this stuff. If someone with a kind heart could help me, I would appreciate it so much!!

I have most of it ready, i need small help with the database and with the AI.

coral wasp
analog gust
#

I finished with the database, but thank you very much!!!

#

I'm now trying to figure out why the car moves weirdly, and to sort out the Q-learning agent

coral wasp
analog gust
#

yes sir!!

earnest warren
#

What is a database?

cedar tiger
# earnest warren What is database?

It's what we use to store data in. It allows applications to query the saved data, create new records, update records, and delete records.
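The four operations mentioned are create, read (query), update and delete, often shortened to CRUD. Here is what they look like with Python's built-in `sqlite3` module. The `notes` table is just an example.

```python
import sqlite3


def crud_demo() -> list:
    """Walk one record through create, read, update, delete; return what each read saw."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    seen = []
    with conn:  # commits at the end of the block (rolls back on error)
        # Create
        note_id = conn.execute("INSERT INTO notes (body) VALUES (?)", ("buy milk",)).lastrowid
        # Read
        seen.append(conn.execute("SELECT body FROM notes WHERE id = ?", (note_id,)).fetchone())
        # Update
        conn.execute("UPDATE notes SET body = ? WHERE id = ?", ("buy oat milk", note_id))
        seen.append(conn.execute("SELECT body FROM notes WHERE id = ?", (note_id,)).fetchone())
        # Delete
        conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
        seen.append(conn.execute("SELECT body FROM notes WHERE id = ?", (note_id,)).fetchone())
    conn.close()
    return seen
```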

earnest warren
#

So, its just a storage place for online services?

#

Damn! It sounds professional

long pendant
topaz widget
# earnest warren What is database?

Just the word on its own is so vague as to mean pretty much anything that plans to remember some data for you and give it back later. (Even if that place is just in RAM, and would go away if you rebooted, actually.)

Where it gets interesting is when you dive down into a "kind" of database. By default people usually mean "relational database" when they say the word on its own.
There are also "key/value" databases, "column-oriented" databases, etc, etc.

SQLite is a popular open source relational database to get started with, and it has easy Python libraries.

The "programming language" you talk to many databases with is called SQL, but there are others, especially for the more specialized kinds of databases.

The first database I worked with at a job was called IMS, an old IBM thing of a type we call a "hierarchical database".

topaz widget
#

PostgreSQL 18 can do Index Skip Scans now. Pretty useful sometimes.
I'm no longer really a massive pgsql fan for my own work, but it's all over the place, and it might be worth mentioning.

So, like, let's say you have:

```sql
CREATE INDEX idx_laser_face
ON staff_members (status, retires_at);
```

...and you want to do:

```sql
SELECT * FROM staff_members WHERE retires_at <= NOW();
```

...before this, the idx_laser_face index would be ignored, because the query doesn't constrain status, the index's leading column.

Before PostgreSQL 18, this query was a TABLE SCAN!

In particular, FinTech db tables often need multi-column indexes to get any kind of plausible performance, and often end up extremely expensive to ALTER without complicated games. Always be careful setting up multi-column indexes in a table that might end up growing. Measure twice, cut once, etc.
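PostgreSQL 18's skip scan can't be shown without a Postgres server. However, the underlying "leftmost prefix" rule from the idx_laser_face example can be seen locally with SQLite's `EXPLAIN QUERY PLAN`, used here as a stand-in. The extra `name` column and the status values are made up. SQLite has a skip-scan optimization of its own, but as far as I know it only kicks in after `ANALYZE` has gathered statistics. This fresh table has none, so the second query falls back to a full scan. The third query shows the old manual workaround of listing the leading column's values yourself. That only makes sense when the set of values is small and known.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_members (
        id INTEGER PRIMARY KEY, name TEXT, status TEXT, retires_at TEXT
    );
    CREATE INDEX idx_laser_face ON staff_members (status, retires_at);
""")

# Leading column constrained: the index can be searched
with_prefix = plan(conn, "SELECT * FROM staff_members "
                         "WHERE status = 'active' AND retires_at <= '2025-01-01'")
# Only the second column constrained: no usable prefix, so a full table scan
without_prefix = plan(conn, "SELECT * FROM staff_members WHERE retires_at <= '2025-01-01'")
# Manual "skip scan": enumerate the (hypothetical) status values yourself
manual_skip = plan(conn, "SELECT * FROM staff_members "
                         "WHERE status IN ('active', 'retired') AND retires_at <= '2025-01-01'")
```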

topaz widget
tough condor
#

hi

#

what is a good vehicle detection dataset that is easily manageable

harsh pulsar
topaz widget
#

and actually, languages in the K family are basically THAT, and are used in finance land

#

K is wild

harsh pulsar
#

and i never thought of awk as being like an array language

#

but in a way it kind of is

#

that's interesting too

topaz widget
#

I believe there's some kind of family tree relationship between Awk and APL, but maybe I am misremembering?

harsh pulsar
#

i'm not aware of one. i usually think of AWK as being an offshoot of the C/Unix development ecosystem

#

and I think of APL as its own distinct lineage

#

unless Iverson was connected to Bell Labs or something like that?

topaz widget
#

Indeed, Bell Labs is I guess the root of my memory here.

#

Both vectorized over interesting collections, both funky terse syntax, but I guess no direct ancestry.

harsh pulsar
#

yeah i never thought of AWK as vectorized

#

it's maybe a bit of a stretch but i think it's fair

topaz widget
#

Interesting, to me it's always been the vector tool kinda, just with lines as the default 'first class citizen'

#

Also with jq though I guess JSON top-levels can be {} as well as []

harsh pulsar
#

i was actually thinking about awk for the first time in a while today because there was a lobsters thread about the sam editor, and some of the commenters brought up awk

deep jetty
topaz widget
deep jetty
#

Are there parsers that don’t?
Python’s does: #bot-commands message

topaz widget
#

But I've run into more than one where {} and [] were the only valid toplevels, especially early ones to support 'streaming' processing modes.

#

I actually like that interpretation because IMO if you just want a number, use a different format

#

and same with null, what's the point of having a text file with 32 bytes to say null?

#

Anything that isn't an object or a list of things is degenerate IMO

#

they should have chosen nothingheretoseemovealong instead of null to save space in the repr.

#

mu would have been 1000x cooler.

harsh pulsar
#

It's one of those things where the spec imposes basically an arbitrary restriction that in practice costs everyone nothing to support

#

And while normally supporting something off spec opens up a risk of incompatibility across implementations, this one is so unambiguous that it's hard to argue against implementing it, just as a courtesy to users

#

Postel's law and all

#

I feel the same way about comments, or at least line comments. But apparently that opinion is not widely shared enough to get the same treatment

topaz widget
harsh pulsar
#

While we are still kind of on the topic of AWK, you know what feature is amazing and should be in every database and data frame library? Snowflake match_recognize

#

It's regex for rows

#

It's an absolutely amazing feature

#

And it's something I personally have wanted for such a long time

#

Once I wanted it so badly that I almost resorted to manually translating row patterns to strings so I could run a regex search on it

#

I guess it's not that useful for transactional data

#

But for analytics, oh man
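Here is a sketch of the "translate rows into a string and regex it" workaround mentioned above. It works on a single ordered series of prices (hypothetical data). Each step between consecutive rows becomes one character, and `D+U+` then finds "falls, then rises" runs. That is roughly what a MATCH_RECOGNIZE pattern of "down rows followed by up rows" expresses. Unlike MATCH_RECOGNIZE, this needs the data pulled into Python and has no PARTITION BY, so you'd have to run it per group yourself.

```python
import re


def find_v_shapes(prices: list[float]) -> list[tuple[int, int]]:
    """Return (first_row, last_row) index pairs for each 'falls, then rises' run."""
    # Encode each step between consecutive rows as one character:
    # D = down, U = up, F = flat.
    steps = "".join(
        "D" if b < a else "U" if b > a else "F" for a, b in zip(prices, prices[1:])
    )
    # One or more downs followed by one or more ups: the row-level "regex".
    # Step i sits between rows i and i+1, so a match over steps [s, e)
    # covers rows s through e inclusive.
    return [(m.start(), m.end()) for m in re.finditer(r"D+U+", steps)]
```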

topaz widget
harsh pulsar
#

It definitely has some challenges when it comes to performance optimization

#

they keep coming out with various solutions for various cases, but sometimes you just have to deal with the fact that there are no indexes and your only optimization tool is clustering, which is not conceptually that different from dumb hive partitioning

#

But I'm curious what other peoples struggles are because I'm sure they are not the same as mine

topaz widget
# harsh pulsar What was the workload where you struggled with performance?

I’m struggling to remember exactly, but what I can reveal of the application was a pretty scaled-up Dask Cloud-based worker setup, where we had a temporary very fast MySQL as working space, and were every morning trying to compute our “financial graph” data into “pre-computed” form so that the kinds of common answers our customers wanted didn’t have to be calculated on the fly. Lots of data though, $5T of financial records.
I think it was around how long Snowflake was taking to ingest that, but I might be misremembering and it was the other way around, getting it back out? I do like their security model and ability to “share” data with customers.

harsh pulsar
#

It's like Spark in that regard

#

The slowest thing you can do is move lots of data around

topaz widget
#

Yeah definitely a weird application, pushing a lot of limits to get inside the necessary “pre-trade” T+1 window, which is why I don’t go around saying Snowflake is bad. I would still use it for a lot of stuff.

#

Especially “selling” datasets to customers.

harsh pulsar
#

My favorite way to export data from snowflake is by saving it directly to S3 via "external stage"

topaz widget
#

Definitely. I wish AWS gave us Hyperplane access so we didn’t have to use their panoply of garbage private connection options.

#

I should be able to “know about” the relationship with Snowflake in the software defined networking layer

#

Privatelink and most of the others are awful

#

And the fully routable gateway option just pours money into their hands like crazy, you end up paying for the traffic like three times

#

I just want a dual-plane fully connected network of low-level shit to run something better than TCP/IP on with no overhead.

#

Time-Sensitive Networking (TSN) is a set of standards under development by the Time-Sensitive Networking task group of the IEEE 802.1 working group. The TSN task group was formed in November 2012 by renaming the existing Audio Video Bridging Task Group and continuing its work. The name changed as a result of the extension of the working area of ...

harsh pulsar
#

All I gotta say is, I have never even come close to thinking I might need anything resembling that 😆

topaz widget
#

At scale it’s easy to get on the chat with your AWS specialist and come up with an elegant design that turns out to cost more money than you could possibly justify

#

And sometimes the workaround you need is crazy complicated just to save money

#

Transit Gateways are like this. They do everything, but they multiply your network costs by a constant factor forever

#

I built a multi account design for our org at work a couple of years ago, initially around transit gateways, using really careful and IMO cool IP ranges so we didn’t have to coordinate between teams ever again

#

And we had to basically cripple it completely when we looked at the billing reports

#

At the time I had admin on $1.3M/month of AWS spending and we were like “oh naaaaaaah”

#

This is what I think we wish we would have done at that job I described above, but it wasn’t really mature enough at the time

#

(Though not applicable to OLAP as far as I can see)

#

But respectively I guess those are my current fave OLAP and OLTP dbs. CockroachDB is good too.

harsh pulsar
topaz widget
#

Yeah, but this implementation is good and fast etc

harsh pulsar
#

And a lot of programmers like it too because you can kind of use it as a replacement for a "data frame", but with a more familiar API (SQL)

#

It's also really good at dealing with a whole variety of data formats

topaz widget
#

I’m about to start helping with a Rust “in process” config management library that looks great, in-process is often really useful

harsh pulsar
#

JSON, Hive, Iceberg, local, network, whatever

#

And it can be pretty fast

topaz widget
#

I like Cassandra for a bunch of reasons but administering it is awful

harsh pulsar
topaz widget
harsh pulsar
#

Ah

#

What does "in process" mean then?

topaz widget
#

Instead of a CLI, it’s a library with modules you can pick and choose from, and use from whatever piece of code you already have that makes sense to enhance.

#

And it can do every paradigm, from agentless like Ansible to fanout fancy “pull” agents with brokers and everything in between.

#

Which is cool because now you can use the same tool for simple stuff and silly hyper scale stuff

#

Used to be called “duxcore” but I joined his Discord as the 3rd member and now it is “regent” haha

#

Wait sorry there’s a better page with diagrams

#

This is a bad description because it is far from being as limited as Ansible in terms of “paradigm”

#

(Temporary logo I made, some kind of fun stylized crown seems like the right idea eventually)

#

(I think his robot arm looks like it is having a bad day and is a little depressed)

harsh pulsar
#

I like the circuit starburst logo

coral wasp
topaz widget
hushed smelt
topaz widget
harsh pulsar
#

It's a column oriented database meant for "analytics" type workloads

#

Imagine a workload where you have some business process that emits a JSON status output to S3 every 30 minutes from a fleet of 20,000 devices, assigned across dozens of customers. You want to generate a nightly report that, for each customer, displays a time series in a 6 month window as well as a few different moving averages and a moving standard deviation, and computes some aggregate per customer metrics.

#

You could do that all in Duckdb
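To give a taste of the SQL involved, here is a per-customer moving average written with a standard window function. DuckDB isn't in the standard library, so this sketch runs the same style of query on SQLite, which needs version 3.25+ for window functions. The `readings` table and its columns are hypothetical stand-ins for the parsed JSON status files. The moving standard deviation is left out because SQLite has no built-in STDDEV aggregate.

```python
import sqlite3

QUERY = """
SELECT customer,
       ts,
       value,
       AVG(value) OVER (
           PARTITION BY customer          -- restart the window for each customer
           ORDER BY ts
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3
FROM readings
ORDER BY customer, ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (customer TEXT, ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("a", 1, 1.0), ("a", 2, 2.0), ("a", 3, 3.0), ("a", 4, 4.0),
     ("b", 1, 10.0), ("b", 2, 20.0)],
)
rows = conn.execute(QUERY).fetchall()
```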

#

There is a geospatial extension but I don't think it handles raster data

hushed smelt
harsh pulsar
#

points polygons etc

hushed smelt
#

Correct

#

I was working on creating an OLAP system for raster data

#

So I was exploring different options

#

Have you explored this side of the domain?

harsh pulsar
#

I have not, I almost never work with raster data

desert wadi
#

i'll be learning DataBases in my 2nd semester - starting 13th Oct

strange elm
rose mortar
#

Do atomic operations (updates) guarantee concurrency safety? For example, if multiple calls are made to update a DB, will there still be a race condition?
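Here is a sketch of the distinction the question is getting at. It uses SQLite with two connections to one file and forces a deterministic interleaving. The `counter` table is made up.

- A single `UPDATE ... SET n = n + 1` does its read and its write inside one statement, so the database applies it atomically and neither increment is lost.
- The race appears when application code reads the value, computes, and writes it back. Both clients read 0, and both write 1.

For multi-step logic you'd typically reach for transactions with a suitable isolation level or row locks, e.g. `SELECT ... FOR UPDATE` in PostgreSQL or MySQL.

```python
import sqlite3
import tempfile
from pathlib import Path


def run(atomic: bool) -> int:
    """Simulate two clients each adding 1 to a counter; return the final value."""
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "counter.db"
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, n INTEGER)")
        setup.execute("INSERT INTO counter VALUES (1, 0)")
        setup.commit()

        a, b = sqlite3.connect(path), sqlite3.connect(path)
        if atomic:
            # Read and write happen inside one statement
            for conn in (a, b):
                conn.execute("UPDATE counter SET n = n + 1 WHERE id = 1")
                conn.commit()
        else:
            # Read-modify-write in application code, interleaved:
            # both clients read 0 before either writes back.
            seen_a = a.execute("SELECT n FROM counter WHERE id = 1").fetchone()[0]
            seen_b = b.execute("SELECT n FROM counter WHERE id = 1").fetchone()[0]
            a.execute("UPDATE counter SET n = ? WHERE id = 1", (seen_a + 1,))
            a.commit()
            b.execute("UPDATE counter SET n = ? WHERE id = 1", (seen_b + 1,))
            b.commit()

        result = setup.execute("SELECT n FROM counter WHERE id = 1").fetchone()[0]
        for conn in (setup, a, b):
            conn.close()
        return result
```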

desert wadi
digital yew
shell perch
#

I have a community project called "Portfolio do <dev>". Our main goal is to bring together people who want to participate in complex projects that are difficult to complete alone and start developing projects as a team. This is not a paid work, it's a voluntary work, in the end of each project you will be able to show off yours skills and put that project on your portfolio. We're strongly focused on networking and team work, interacting and having fun while developing. Feel free to chat me on private

digital yew
#

!rule 6

delicate fieldBOT
#

6. Do not post unapproved advertising.

steel pilot
#

.

tender vortex
#

which of those are supported by Python?

cedar tiger
tender vortex
#

I mean, I want to know which ones are, and then search for the most efficient DB for Python

cedar tiger
cedar tiger
tender vortex
#

I mean fast with Python

cedar tiger
#

PostgreSQL or MySQL are usually the two.

#

Then flip a coin

cedar tiger
#

Don't overcomplicate it for yourself bro

tender vortex
#

all are supported?

cedar tiger
tender vortex
#

also which one do u recommend for Python?

tender vortex
#

Alright ty

grim vault
#

I would start with SQLite. Python supports it out of the box with a standard-library module (sqlite3).

red matrix
#

I'm working on a project with a Postgres database and there is a 'pubdate' column with the type TIMESTAMP. Is there a way to make the DB convert this to a string in the query? The docs have different kinds of data conversion functions but I got kind of lost since I don't do db stuff much.

topaz widget
#

SELECT to_char(colname, 'YYYY-MM-DD"T"HH24:MI:SS') I think? (If it's a timestamptz column, use to_char(colname AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS"Z"') to get UTC.)

#

something like that? I've been in MySQL land instead recently

#

SELECT to_json(now())#>>'{}' zomg

#

which does force iso8601 so.. wild

#

thems the haxx, lads and lasses, wow

red matrix
#

Lol yeah there is zero chance I would have figured that out on my own

topaz widget
#

the JSON stuff they added to pgsql is actually potent and worth learning

#

You can query for deeply-nested keys etc.

#

(I don't actually totally recommend this as a from-scratch idea, but it can make dumb things easy once you get there)

winged stump
#

I use sqlalchemy with aiosqlite. Is there a way to completely ensure this doesn't happen? Or must I just implement retry logic?

#

Or is it my code's problem? Let me know and I'll send snippets

waxen finch
winged stump
#

Alright thanks

topaz widget
#

When do we rename this channel to #json-storage-engines? ABBATH

red matrix
#

This isn't really a python thing, just sql, but I have a table with a bunch of TEXT columns I want to convert to VARCHAR.

ALTER TABLE Inventory MODIFY Description VARCHAR(MAX);

That's an error because mysql apparently doesn't like MAX.

ALTER TABLE Inventory MODIFY Description VARCHAR(65535);

Still no go, but progress. It is telling me the max size for a varchar is 16383. I guess that makes sense because it is 65535/4.

ALTER TABLE Inventory MODIFY Description VARCHAR(16383);

Well this isn't going to work either.

Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

Now we all know nobody reads documentation, but I tried to find out how to fix this but I found nothing that made sense.

So I have several questions:

Why is the max row size 65535? Can that be made larger?

I have several other fields (a few ints, decimals, etc). How do I know how much space I have left over for VARCHAR fields?

I read somewhere that it is better to use VARCHAR(n) than TEXT. Is that true? I'm trying to switch because mysql is (for some inscrutable reason) returning BLOB types when I select a TEXT column.
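Here is a back-of-the-envelope helper for the "how much space do I have left" question. It is based on my reading of the MySQL manual's row-size rules:

- The 65,535-byte limit covers every column in the row together.
- TEXT/BLOB columns count only about 9 to 12 bytes, because their contents live elsewhere.
- A VARCHAR longer than 255 bytes needs 2 length bytes.
- `utf8mb4` reserves up to 4 bytes per character.

The NULL-bitmap cost is an approximation, and the storage engine can impose further limits of its own. Treat the result as an upper bound rather than a guarantee.

```python
ROW_LIMIT = 65_535  # bytes; MySQL's max row size, not counting TEXT/BLOB contents


def max_varchar_chars(other_bytes: int, nullable_cols: int = 0, bytes_per_char: int = 4) -> int:
    """Rough upper bound on n for one more VARCHAR(n) column.

    other_bytes: bytes already used by the other columns (e.g. INT = 4;
      each TEXT/BLOB column counts roughly 9 to 12 bytes).
    nullable_cols: approximated as one bit each in a NULL bitmap, rounded up.
    bytes_per_char: 4 for utf8mb4, 3 for utf8mb3, 1 for latin1.
    """
    null_bitmap = (nullable_cols + 7) // 8
    length_prefix = 2  # a VARCHAR whose max byte length exceeds 255 needs 2 length bytes
    budget = ROW_LIMIT - other_bytes - null_bitmap - length_prefix
    return max(budget // bytes_per_char, 0)
```

For example, take three INT columns (4 bytes each) plus a DECIMAL(10,2), which I believe packs into 5 bytes. That gives `max_varchar_chars(17)`, i.e. 16379 characters for one more utf8mb4 VARCHAR. Every additional VARCHAR has to share that same budget.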

red matrix
#

Yeah I was just really trying to get away from the TEXT types because they've been giving me trouble, but I might just have to work around it

autumn hornet
#

Since I know MySQL, I now want to learn to use it with Python. Should I learn MySQL-python+connector or the sqlite3 library?

rustic pasture
#

I'm using psycopg3. If I want to run a large number of transactions, I currently find myself opening a context manager each time. If I call psycopg.connect().execute(), will it close the connection automatically after committing the transaction, as if it were a context manager?

#

Also I'd like to have a DB interface class, where I expose various operations through methods, and from a performance standpoint I think it'd be better to have a connection pool property of the instance. Since the use of a connection pool is through a context manager, how can I have a pool going as long as the instance exists ?

winged stump
waxen finch
winged stump
harsh pulsar
#

You can try contextlib.closing if you want to use a context manager to open and close the connection
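Here is a small sketch of the `contextlib.closing` suggestion. It uses `sqlite3` as a stand-in, since psycopg needs a running server. It also happens to be a good example, because sqlite3's own `with conn:` only commits or rolls back and never closes the connection. psycopg 3's connection context manager, by contrast, does close the connection on exit. As far as I know, a bare `psycopg.connect().execute(...)` neither commits nor closes for you.

```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close() on exit; the inner `with conn:` is the transaction.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:  # commit on success, rollback on error (does NOT close)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
    total = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]
```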

harsh pulsar
#

The best way to set this up kind of depends on how your application is structured and how it actually works

#

I love context manager syntax and I love the context manager protocol, it's one of my favorite things about Python. But actually using context manager syntax is not always the best choice for long lived resources like a connection pool

#

Often you end up wanting something more like a dynamic scoped variable and passing around your resource as a function or class init argument gets clunky or infeasible
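One way to keep a pool open for as long as an object lives is to enter the pool's context manager once with `contextlib.ExitStack`. You then close the stack explicitly, or from `__exit__`. `tiny_pool` below is a hypothetical stand-in built on `sqlite3` and a queue, so the sketch runs without a server. `Database` and `scalar` are made-up names. psycopg_pool's `ConnectionPool` also has explicit `open()` and `close()` methods, so with it you might simply call those instead.

```python
import contextlib
import queue
import sqlite3
from collections.abc import Iterator
from typing import Any


@contextlib.contextmanager
def tiny_pool(path: str, size: int = 2) -> Iterator[queue.Queue]:
    """Hypothetical stand-in for a pool that is only usable as a context manager."""
    conns = [sqlite3.connect(path, check_same_thread=False) for _ in range(size)]
    q: queue.Queue = queue.Queue()
    for c in conns:
        q.put(c)
    try:
        yield q
    finally:
        for c in conns:
            c.close()


class Database:
    """Owns one pool for its whole lifetime instead of opening one per call."""

    def __init__(self, path: str) -> None:
        self._stack = contextlib.ExitStack()
        # Enter the pool's context manager once; it stays open until close()
        self._pool = self._stack.enter_context(tiny_pool(path))

    @contextlib.contextmanager
    def _connection(self) -> Iterator[sqlite3.Connection]:
        conn = self._pool.get()  # blocks if every connection is checked out
        try:
            with conn:  # commit on success, roll back on error
                yield conn
        finally:
            self._pool.put(conn)

    def scalar(self, sql: str, params: tuple = ()) -> Any:
        with self._connection() as conn:
            return conn.execute(sql, params).fetchone()[0]

    def close(self) -> None:
        self._stack.close()  # runs the pool's exit logic, closing all connections

    def __enter__(self) -> "Database":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()
```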

bronze jackal
#

Hello everyone. I want to ask for advice on getting started with databases. For context, I only know that a database is a way to store data in a structured way. I have been using Python for games, Tkinter applications and some maths problem solving/applications. I also know a bit of C and C++

chilly zealot
# bronze jackal Hello everyone. I want to ask for advice to start with databases. For context, I...

First you should learn how to model databases, look into E/R modelling tutorials, there are plenty free resources online

After you practice a lot and feel confident with them, you can start learning whichever SQL dialect you like most; I highly recommend PostgreSQL though. Then start turning the E/R diagrams you made in the previous steps into real database tables, and after that, learn how to insert, delete and retrieve data from the DB using SQL
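Here is a small sketch of that E/R-to-tables step. It uses made-up `author`/`book` entities and Python's built-in `sqlite3`, so it runs without installing anything. The DDL would need only minor changes for PostgreSQL, e.g. an identity column instead of SQLite's INTEGER PRIMARY KEY. Entities become tables. A many-to-many relationship becomes a junction table whose foreign keys point at both sides.

```python
import sqlite3

SCHEMA = """
PRAGMA foreign_keys = ON;  -- SQLite leaves FK enforcement off unless asked

-- Entities become tables
CREATE TABLE author (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- A many-to-many relationship ("author WRITES book") becomes a junction table
CREATE TABLE writes (
    author_id INTEGER NOT NULL REFERENCES author(id),
    book_id   INTEGER NOT NULL REFERENCES book(id),
    PRIMARY KEY (author_id, book_id)
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:  # insert sample rows in one transaction
        conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
        conn.executemany("INSERT INTO book (id, title) VALUES (?, ?)", [(1, "Notes"), (2, "Compilers")])
        conn.executemany("INSERT INTO writes VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])
    return conn


def books_by(conn: sqlite3.Connection, name: str) -> list[str]:
    # Retrieve through the junction table with two joins
    rows = conn.execute(
        """SELECT b.title FROM book b
           JOIN writes w ON w.book_id = b.id
           JOIN author a ON a.id = w.author_id
           WHERE a.name = ? ORDER BY b.title""",
        (name,),
    )
    return [title for (title,) in rows]
```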

#
roadmap.sh

Comprehensive roadmap to learn SQL from scratch in 2025. From basic syntax to advanced querying, this step-by-step guide will equip you with the skills needed to excel in database management and data analysis.

#

And remember this: never forget your WHERE clause when deleting
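Here is a sketch of a safety net for that. It assumes Python's `sqlite3` with its default transaction handling, which opens a transaction before a DELETE, so the delete can still be rolled back. `guarded_delete` is a hypothetical helper. It refuses statements without a WHERE (a crude regex check) and rolls back if more rows were deleted than expected.

```python
import re
import sqlite3


def guarded_delete(conn: sqlite3.Connection, sql: str, params: tuple, max_rows: int) -> int:
    """Run a DELETE, but roll it back if it touched more rows than expected."""
    if not re.search(r"\bwhere\b", sql, re.IGNORECASE):
        raise ValueError("refusing to run a DELETE without a WHERE clause")
    cur = conn.execute(sql, params)  # sqlite3 opens a transaction implicitly here
    if cur.rowcount > max_rows:
        conn.rollback()  # undo the delete
        raise RuntimeError(f"DELETE matched {cur.rowcount} rows, expected at most {max_rows}")
    conn.commit()
    return cur.rowcount
```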

bronze jackal
#

Thank you very much @chilly zealot

chilly zealot
#

You are welcome, feel free to dm me any questions you have

coral wasp
bronze jackal
#

Thank you @coral wasp

cursive hatch
#

Hiii

unborn kettle
#

Any good resources to learn DBMS?

bronze jackal
#

I came across TiDB on youtube. How is it?

topaz widget
#

New to me, but interesting; we actually looked for something like this at my last job when picking tools for what we built pipeline-wise, and I guess it didn't exist yet.

#

Basically "MySQL plus non-sucky columnar mode". Never tried it, but it looks intriguing.

coral wasp
harsh pulsar
#

S3 iceberg tables (possibly snowflake-managed/created) that you can then query from either snowflake or duckdb

coral wasp
#

I'm curious how much I'll actually care. My workflows are well served by parquet

harsh pulsar
#

as far as I understand Iceberg is just a collection of parquet files, in a particular layout + particular metadata

#

but maybe I misunderstand what Iceberg is

topaz widget
#

To me the coolest thing about parquet is the zero-cost concatenation. Cool trick.

coral wasp
topaz widget
#

I've been in the "larger than any amount of RAM you can put in a server" scene recently, kinda fun also.

#

Not big data but "medium interesting data"

#

Which turns out to be, for example, what FinTech is about

coral wasp
topaz widget
#

Lots of 'data products' to compute in parallel

#

The existing app was 'boiling the ocean' to calculate everything on the fly

#

So we built a thing that was three orders of magnitude cheaper to run haha

coral wasp
#

How large was the working set?

topaz widget
#

Dask is cool

coral wasp
#

Yah, I do like Dask

topaz widget
#

Uhh, can't quite remember but like.. a couple TB?

#

Tedious to restore but not impossible etc

coral wasp
#

Yah. Single digit TB is workable

coral wasp
topaz widget
#

His gig was going to places like Dow Chemical, after finding them in the EPA's Toxics Release Inventory dataset and finding out how much they were paying to get rid of toxic waste.

coral wasp
topaz widget
#

And then he'd say "I can save you $X million per year on Process Z in exchange for a small percentage"

coral wasp
#

lol, my father did that with utility bills

topaz widget
#

He did that for Dow, Reliant, a bunch of other big things

#

These days what he was doing is called P2: "Pollution Prevention"

brazen charm
#

it's great right up until it isn't, where some parts and features just don't work

brazen charm
# coral wasp Tell me more!

Most of the AWS tooling doesn't work properly with Iceberg on S3 tables; for example, Athena, and in particular views, break when using arrays.
Compaction is not very customisable. My co-worker knows more about the issues in detail, as there were more, but we ended up just using Iceberg manually on S3 with Spark and friends, because we could actually control it and debug things when it broke.

#

I also can't remember if this was a thing or not but I think we also had some issues of just not being able to view the buckets with normal CLI tooling and that caused other issues, something along those lines

coral wasp
lean walrus
#

How do I change an existing table structure by providing a new table definition? For example,

```sql
CREATE TABLE `accounts` (
  `username` VARCHAR(100) NOT NULL,
  `email` VARCHAR(255) NOT NULL
);
```

I have this table already in the db, and now I want to change the `username` field to `VARCHAR(120)`. How do I do that with a new definition rather than using `ALTER TABLE`?
pure grove
#

first, create the new table with the new structure

#

second, copy the data from the old table to the new table

#

then delete the old table,
and finally rename the new table to the old table's name
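Here are the four steps above, sketched in SQLite via Python so they can be run locally. `widen_username` is a hypothetical name. Wrapping the steps in one transaction means a failure partway leaves the original table untouched. SQLite allows DDL inside a transaction, whereas MySQL implicitly commits on DDL, so there the steps are not atomic. Anything not spelled out in the new CREATE TABLE (indexes, triggers, other constraints) has to be recreated by hand.

```python
import sqlite3


def widen_username(conn: sqlite3.Connection) -> None:
    """Rebuild `accounts` with username widened to VARCHAR(120), in one transaction."""
    conn.executescript("""
        BEGIN;
        -- 1. create the new table with the new structure
        CREATE TABLE accounts_new (
            username VARCHAR(120) NOT NULL,
            email    VARCHAR(255) NOT NULL
        );
        -- 2. copy the data across
        INSERT INTO accounts_new (username, email)
            SELECT username, email FROM accounts;
        -- 3. drop the old table
        DROP TABLE accounts;
        -- 4. give the new table the old name
        ALTER TABLE accounts_new RENAME TO accounts;
        COMMIT;
    """)
```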

#

do u understand? @lean walrus

#

hey, r u there? @lean walrus

#

if you want more detail, send me a DM

lean walrus
pure grove
topaz widget
#

(I would ALTER TABLE accounts -> accounts_old vs. dropping it, myself, but yeah)
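The create/copy/rename steps above can be sketched with Python's built-in `sqlite3`, keeping the old table renamed rather than dropped. This is a minimal sketch that assumes SQLite rather than MySQL. SQLite runs DDL inside transactions, so a failed copy rolls back cleanly. It does not enforce `VARCHAR` lengths, though, so only the mechanics carry over.

```python
import sqlite3


def rebuild_accounts(con: sqlite3.Connection) -> None:
    """Swap in a new `accounts` definition via create, copy and rename.

    Assumes `con` was opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below control the whole transaction, DDL included.
    """
    con.execute("BEGIN")
    try:
        # 1. Create the table with the new structure.
        con.execute("""
            CREATE TABLE accounts_new (
                username VARCHAR(120) NOT NULL,
                email    VARCHAR(255) NOT NULL
            )""")
        # 2. Copy the existing rows across.
        con.execute("INSERT INTO accounts_new (username, email) "
                    "SELECT username, email FROM accounts")
        # 3. Keep the old table as a backup instead of dropping it.
        con.execute("ALTER TABLE accounts RENAME TO accounts_old")
        # 4. Give the new table the original name.
        con.execute("ALTER TABLE accounts_new RENAME TO accounts")
        con.execute("COMMIT")
    except Exception:
        con.execute("ROLLBACK")
        raise


con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE accounts (username VARCHAR(100) NOT NULL, "
            "email VARCHAR(255) NOT NULL)")
con.execute("INSERT INTO accounts VALUES ('alice', 'alice@example.com')")
rebuild_accounts(con)
print(con.execute("PRAGMA table_info(accounts)").fetchall()[0][2])  # VARCHAR(120)
```

In MySQL itself, DDL statements commit implicitly. There, the swap is usually done as a single atomic `RENAME TABLE accounts TO accounts_old, accounts_new TO accounts;`.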

lean walrus
# pure grove this is full code <@650447110402998302> ```sql CREATE TABLE accounts_new ( us...

yea that probably works for a simple table schema, what about a more complex definition:

```sql
CREATE TABLE product_order (
    id INT NOT NULL AUTO_INCREMENT,
    product_category INT NOT NULL,
    product_id INT NOT NULL,

    PRIMARY KEY (id),
    INDEX (product_category, product_id),

    FOREIGN KEY (product_category, product_id)
      REFERENCES product(category, id)
      ON UPDATE CASCADE ON DELETE RESTRICT
) ENGINE=INNODB;
```

The foreign keys don't get copied, since they're defined as separate constraints rather than as part of the column definitions.
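Here SQLite stands in for MySQL, so the syntax differs slightly. A shortcut like `CREATE TABLE ... AS SELECT` copies the rows but none of the constraints. Writing out the full new definition, `FOREIGN KEY` included, keeps them. This is a minimal sketch; the `note` column is a hypothetical new column.

```python
import sqlite3


def fk_count(con: sqlite3.Connection, table: str) -> int:
    """Count foreign keys on `table` (a composite key reports one row per column)."""
    rows = con.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return len({row[0] for row in rows})  # row[0] is the foreign key's id


con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (
    category INTEGER NOT NULL,
    id INTEGER NOT NULL,
    PRIMARY KEY (category, id)
);
CREATE TABLE product_order (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_category INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    FOREIGN KEY (product_category, product_id)
      REFERENCES product(category, id)
      ON UPDATE CASCADE ON DELETE RESTRICT
);
INSERT INTO product VALUES (1, 10);
INSERT INTO product_order (product_category, product_id) VALUES (1, 10);
""")

# Shortcut: copies the rows, but no primary key, index or foreign key.
con.execute("CREATE TABLE po_copy AS SELECT * FROM product_order")
print(fk_count(con, "po_copy"))  # 0

# Rebuild: restate the whole definition, constraints included, then swap.
con.executescript("""
CREATE TABLE product_order_new (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_category INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    note TEXT,  -- hypothetical new column
    FOREIGN KEY (product_category, product_id)
      REFERENCES product(category, id)
      ON UPDATE CASCADE ON DELETE RESTRICT
);
INSERT INTO product_order_new (id, product_category, product_id)
    SELECT id, product_category, product_id FROM product_order;
DROP TABLE product_order;
ALTER TABLE product_order_new RENAME TO product_order;
-- Indexes are separate objects too, so recreate them explicitly.
CREATE INDEX idx_product_order_product
    ON product_order (product_category, product_id);
""")
print(fk_count(con, "product_order"))  # 1
```

The MySQL docs note that `CREATE TABLE ... LIKE` does not copy foreign key definitions either. Either way, they have to be restated in the new definition.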

lean walrus
topaz widget
#

Well, I've never seen that ever so yeah, don't do it then

#

Personally I would stand up a whole different database for such an evolution

lean walrus
#

I like how ORMs like Django's handle migrations. IDK how to implement them using plain SQL

topaz widget
#

It works the same way; you just do two steps:

#

first migration adds new columns etc but doesn't break the existing deployed code

#

next thing is a deploy of code that's aware of the new thing

pure grove
#

you mean the primary key (id)?

topaz widget
#

then a second migration to drop the old etc

#

a clean migration needs to be decoupled from a 'cold deploy' that you might not ever be able to find the downtime for

lean walrus
topaz widget
#

If you have very complex changes to make, an "ETL" tool might be the right choice vs. just migrations

topaz widget
#

Rails and things inspired by it have an explicit framework for doing this

#

But you can script it yourself

lean walrus
#

do u have any references I can look at?

topaz widget
#

OK I found two that seem decent enough.

#

"expand and contract" seems to be a popular name for the pattern now

#

(the two videos there seem to have very similar lists of things they cover, not sure which is better)

lean walrus
#

damn thanks a lot, I'll definitely look through them

topaz widget
#

the tl;dr is just..

  • Expand schema with new stuff
  • Deploy code that writes to both for now
  • Switch the app to the new stuff as primary once the "backfill" finishes
  • Deploy code/migration that drops the old stuff, now no longer used by your code, removing the 'legacy' if/else branch
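The four steps above can be sketched in Python's `sqlite3`. This is a hypothetical example, not taken from either video. It moves data from an assumed `email` column to an assumed `contact_email` column, and backfills in small batches so no single statement holds locks for long.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT)")
con.executemany("INSERT INTO accounts (email) VALUES (?)",
                [(f"user{i}@example.com",) for i in range(10)])
con.commit()

# 1. Expand: add the new column; already-deployed code simply ignores it.
con.execute("ALTER TABLE accounts ADD COLUMN contact_email TEXT")


# 2. Dual write: newly deployed code writes both the old and the new column.
def create_account(con: sqlite3.Connection, email: str) -> None:
    con.execute("INSERT INTO accounts (email, contact_email) VALUES (?, ?)",
                (email, email))
    con.commit()


create_account(con, "new@example.com")


# 3. Backfill existing rows in small batches, committing between batches.
def backfill(con: sqlite3.Connection, batch_size: int = 3) -> None:
    while True:
        cur = con.execute("""
            UPDATE accounts SET contact_email = email
            WHERE id IN (SELECT id FROM accounts
                         WHERE contact_email IS NULL AND email IS NOT NULL
                         LIMIT ?)""", (batch_size,))
        con.commit()
        if cur.rowcount == 0:  # nothing left to copy
            break


backfill(con)
# Once this is 0, the app can switch to reading contact_email.
print(con.execute("SELECT count(*) FROM accounts "
                  "WHERE contact_email IS NULL").fetchone()[0])  # 0

# 4. Contract: drop the old column once no deployed code reads it.
# (ALTER TABLE ... DROP COLUMN needs SQLite 3.35.0 or newer.)
if sqlite3.sqlite_version_info >= (3, 35, 0):
    con.execute("ALTER TABLE accounts DROP COLUMN email")
```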
#

It's work but it's the only thing you can do once you have enough data that the ALTER TABLE downtime isn't something you can fit in.

#

It would have taken two weeks of downtime to ALTER TABLE any indexed column in the main table at my last job.

#

(MySQL is super bad at online schema changes, even today)

lean walrus
lean walrus
topaz widget
#

moving from MongoDB to PostgreSQL at Code Climate tripled our infrastructure costs.

#

(Admittedly we were using them pretty hard etc)

lean walrus
topaz widget
#

So we had to run 3x as many servers to do what we'd been doing

#

clustering is powerful, you can say "I don't care what the primary key is, lay this table out on disk in employee ID order", or whatever

#

Microsoft SQL Server is fantastic at it.

#

so is Oracle

#

clustering lets you achieve memory-locality, a thing that RDBMS normally suck at

#

but is crucial to speed on modern hardware

#

I don't love MySQL and never have, but I have seen it stand up for sure. The company I left earlier this year (I'd been there since 2018) had $5 trillion of customer financial records in the single main MySQL db.

lean walrus
topaz widget
#

(I worked on the team using Python/Dask/NumPy/SQL to "offline" process that before the next day's market opened etc)

#

(Before that I was Senior Site Reliability there etc)

lean walrus
topaz widget
#

No partitions

#

(you need all the data at once to do certain calculations)

lean walrus
#

but I think MySQL's popularity is decreasing and everyone is starting to use Postgres instead

topaz widget
#

I haven't seen that myself

#

But maybe it's happening in certain sectors

#

I mostly see people moving from MySQL to Databricks rather than to another RDBMS

lean walrus
lean walrus
topaz widget
#

Cockroach is way better than pgsql to me

lean walrus
topaz widget
#

If you use it carefully you can process truly shocking amounts of data in parallel.

#

It's cloud but you can host it inside your own private AWS etc and just use a 'connector' out to their API as needed.

#

So they don't inherently 'see' your data unless you upload the results of your computations to them

#

It all runs on Spark, and I don't love Spark, but if you "phrase your problem right" to Spark, it's so fast.

#

The trick is figuring out how to do that, which isn't always obvious at all.

lean walrus
#

damn u really naming out every data platform 😭

topaz widget
#

(Snowflake is pretty nice actually)

#

Databricks offered us a huge discount though so we switched

#

Snowflake has a cool thing where you can create datasets and then sell them to customers through Snowflake directly without handling payments

#

Databricks is more powerful but I think Snowflake’s “product management” is better

#

Databricks seems to just constantly be adding 70 new features, no idea how they are going to maintain it all.

topaz widget
#

Just the AWS accounts I had admin on were $1.3m/month, so when Databricks shows up and is like "Why don't you spend a bunch on us and move things off Amazon", executives start losing their minds

#

AWS networking is such a scam, it's SO expensive to plumb your databases up securely.

#

Often you end up paying for each packet 3 or 4 times.

#

especially if you need their Transit Gateway offering

#

"database behind transit gateway" probably makes a screen in AWS Sales's command center light up, and they pour a round of whiskey shots.

lean walrus
#

that's bad

#

btw, how long have u been through this data ecosystem thing

topaz widget
# lean walrus why would they do that...

Each NAT Gateway costs money per packet, each Transit Gateway is another, and a typical account-to-account setup using Transit Gateway is: [NAT in Account A] <-> [Transit Gateway in global networking account] <-> [NAT in Account B]

topaz widget
#

Under the hood of AWS networking is a software-defined-networking stack they call Hyperplane, and it could solve this pricing problem easily, but they don't let customers access it.

#

Hyperplane would just let you directly connect account services together etc.

#

So the network pricing thing is an obvious scam once you're aware of that etc.

#

Hyperplane would let you 'auto zero-trust' your whole networking layer etc.

#

It's aggravating. I don't use clouds for anything unless forced anymore.

topaz widget
#

Everything we were paying that $1.3m/month for runs easily on one rack-scale machine from Oxide.

#

And you can buy one for less than we were paying AWS every month etc.

#

They almost fired me for pitching Oxide instead of AWS haha

lean walrus
topaz widget
#

blah blah "capex is harder than opex"

#

Eat it, this is where we keep the treasure

#

This is what CapEx is FOR

lean walrus
#

hahahah truee

desert wadi
#

hello

#

any resources to learn databases?

#

i have to connect it with flask

#

but i never used it

#

it's my first time using flask and database

#

i have to learn a lot

#

how much time would it take me to master both?

cedar tiger
desert wadi
#

thank you mate

gloomy blade
topaz widget
gloomy blade
gloomy blade
topaz widget
past yarrow
#

anyone a data or bi expert here

#

and from india

#

we need one person in this role for an agentic ai hackathon

worthy eagle
#

Guys

#

any of u use HF?

#

or GEMINI

#

I need some working APIs

topaz widget
#

HF?

topaz widget
#

Video about the thing this is based on https://www.youtube.com/watch?v=-8QMqSWU76Q

harsh pulsar
#

also something i will probably never use in practice 😂

#

and of course the weird and wonderful worlds of prolog and datalog, which i always wished i had more time to study

jade igloo
#

Which do you ultimately use for django

cedar tiger
jade igloo
#

Same here,
Was considering oracle for a while

cedar tiger
torn sphinx
jade igloo
desert wadi
#

if i learn sqlite3 with localhost

#

do i still need to learn other like MySQL, MongoDB etc?

#

or server based?

inland thunder
#

Your question is a little confusing... for most database systems you would learn using localhost with your own instance running on your machine, regardless of whether that's an in-memory database or a relational one saved to disk.

Connecting to an instance of a database on a server isn't much different really, beyond the connection string and any authentication you need to do against a different address.

Your reason for learning different database setups would mostly depend on what you intended to use them for/build

cedar tiger
pale crest
# desert wadi if i learn sqlite3 with localhost

sqlite is an SQL database, and so is MySQL, so most of your SQL knowledge would carry over between those. But there are differences in how it's set up, configured and maintained that you would need to study.

#

MongoDB is not an SQL database, so the differences are larger there.

#

Deploying a production database, rather than a local dev installation, is a different matter, and requires additional knowledge about configuration and networking.

desert wadi
#

i learned CRUD operations in SQL using localhost, so am i already halfway there? Security, Configuration, and Networking - will it be difficult or a piece of cake like SQL?

#

i created a table, inserted, updated, indexed, deleted and other functions as well using sqlite3

tepid hollow
#

is sql good or bad

coral wasp
#

Or maybe chaotic good. I guess depends.

quaint scaffold
#

well, since it is the language of the actual database (relational databases, at least), it is definitely "good" and not "evil". And since it really boils down to "just a string", that would mean "chaotic" for sure. Except there are "named parameters", and some people even just use sprocs. So anywhere on the "good" scale, I'd say.
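On the "just a string" vs. named parameters point: with Python's `sqlite3`, the query text stays a fixed string and the values travel separately as named parameters. User input is never spliced into the SQL. A minimal sketch with a hypothetical `users` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, role TEXT)")


def add_user(con: sqlite3.Connection, name: str, role: str) -> None:
    # :name and :role are placeholders filled from the dict by the driver,
    # not by string formatting.
    con.execute("INSERT INTO users (name, role) VALUES (:name, :role)",
                {"name": name, "role": role})


# A hostile-looking value is stored as plain data, not executed as SQL.
add_user(con, "Robert'); DROP TABLE users;--", "student")
print(con.execute("SELECT name FROM users WHERE role = :role",
                  {"role": "student"}).fetchall())
```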

topaz widget
#

SQL feels more Lawful Neutral to me.

#

Not sure. Tough one.

#

Lawful Neutral is the "alignment of the dutiful sentry"

polar ruin
cerulean turtle
#

I'm trying to learn SQL and MongoDB but I don't know where to start (I don't have a laptop, I'm getting one in December). Can anyone suggest resources I can learn from on my phone?

dark violet
# cerulean turtle Im trying to learn SQL and MongoDB but i dont know where to start (i dont have a...

Learn how to make a Discord bot that tracks data with MongoDB Atlas (cloud) (last time I checked, an account included a small free tier to use)

https://www.mongodb.com/products/platform/atlas-database

#

PyCord or Nextcord (discord related frameworks for making bot projects) and SQL queries triggered with python

#

A few of the gotchas will be Discord bot related, so if you want to do something simpler, just make an account on Atlas and write a Python program that uses SQL and Tkinter (a Python library for simple UI stuff), or just a terminal program that lets you change the data on MongoDB Atlas via an API key.

desert wadi
#

or any other language for it

dark violet
#

Very basic SQL is supported, but MongoDB uses its own NoSQL query language, MQL

#

But the patterns of interacting with data are similar. Learning those patterns of interaction for ETL (extract, transform, load) and beyond will adapt you to just about any data-oriented language you desire

meager brook
#

Would anyone have a recommendation for a legitimate SQL course to learn and implement alongside Python? Straightforward, covering the main fundamentals.

idle owl
coral wasp
idle owl
cerulean turtle
cerulean turtle
severe moat
#

All relational databases can run ACID transactions

#

But not all of them have the postgreSQL sauce

young trail
#

Hey, I have a question about databases. I have a process that reads rows from a pandas dataframe and processes them in a FIFO manner via a queue. The process needs to loop over each row to decide whether it goes into the queue or not (i.e. gets marked with a status). I am doing this in a dataframe, but I'm thinking I could make such operations I/O bound by querying for a single database row at a time in a FIFO manner?

tight grove
severe moat
#

Same reason why most companies use dinosaur languages

#

Like C, Java and PHP

fervent charm
#

why did nobody tell me about pgcli sooner, I've been stuck using psql like a caveman

coral wasp
#

you can, it's just usually missing the entire point of having a dataframe. ie: If there's a status column, then do it in one operation.
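Whether the rows live in a dataframe or a database table, the decision can usually be expressed as one set-based operation rather than a per-row loop. Here is a minimal `sqlite3` sketch. It assumes a hypothetical `jobs` table and a hypothetical rule: an amount at or above a threshold gets queued, and everything else is skipped. FIFO order then comes from ordering by the insertion id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,  -- grows with insertion order, so it doubles as FIFO order
    amount REAL NOT NULL,
    status TEXT)""")
con.executemany("INSERT INTO jobs (amount) VALUES (?)", [(5,), (50,), (12,), (80,)])

THRESHOLD = 10  # hypothetical decision rule

# One statement marks every undecided row, instead of looping in Python.
con.execute("UPDATE jobs SET status = CASE WHEN amount >= ? THEN 'queued' "
            "ELSE 'skipped' END WHERE status IS NULL", (THRESHOLD,))
con.commit()

# Consume the queue in FIFO order.
queue = [row[0] for row in con.execute(
    "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id")]
print(queue)  # [2, 3, 4]
```

In pandas, the same idea is a single vectorised assignment, such as `df["status"] = np.where(df["amount"] >= 10, "queued", "skipped")`.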

brave bluff
#

I'm wondering if someone could help me diagnose a MySQL connection issue with a Python app? I'm building a v2 of a site I currently run. The old site uses MySQL, whereas I'm moving to Python/Postgres. Both versions are developed on docker compose. So when developing locally, the MySQL server is up on one docker network, with port 3306 exposed, while the Python app is on a separate docker network. As I'm on Linux, I have the extra_hosts config set up

```yaml
extra_hosts:
    - host.docker.internal:host-gateway
```

When I try to connect, I get a "Can't connect to MySQL server on 'host.docker.internal' ([Errno 111] Connection refused)" error. I am able to connect to the MySQL server via DBeaver (a DB GUI) on localhost, using the same credentials I'm feeding to Python. The MySQL server is set up to listen on any address. Python is attempting to make an async connection via SQLAlchemy and asyncmy, but as far as I can tell, there is no config I'm supposed to set to support the connection. I'm using MySQL 8.4 and Python 3.13. I checked the grants, and they're `ON *.* TO user@'%' WITH GRANT OPTION`, which seems like what it should be?
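One way to narrow this down is to check plain TCP reachability from inside the Python container, before involving SQLAlchemy or `asyncmy`. "Connection refused" means nothing accepted a connection at that host and port. That points at networking rather than at credentials or grants. For example, the port might be published only on 127.0.0.1, or the server might be listening on a different interface. A minimal stdlib sketch:

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


# e.g. run can_reach("host.docker.internal", 3306) inside the app container
```

If this returns False while DBeaver on the host connects fine, the problem sits between the two Docker networks and the host, not in the Python driver.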

young trail
coral wasp
young trail
waxen finch
# fervent charm why did nobody tell me about pgcli sooner, I've been stuck using psql like a cav...

im not a frequent user of pg, but nice to know this exists! FWIW im hosting a postgres server behind SSH, and apparently paramiko v4 broke sshtunnel which pgcli depends on:
https://github.com/pahaz/sshtunnel/issues/299

ended up working around it by using the linked pull request, #300:
```sh
$ uv tool install pgcli[keyring,sshtunnel] --with git+https://github.com/lglines/sshtunnel@7030d0c76c679c2934bdc27adc48ff5a84d1ae9a
$ pgcli postgres://... --ssh-tunnel ...
```
i also tried connecting to it through my reverse proxy with SSL termination, but pgcli didn't work with that, and i couldn't find any "direct TLS" option to solve it...

formal current
#

I'm a bit confused.. shouldn't timediff() exist in Python's sqlite3?

#

Are not all functions available?

#

Oh.. It's sqlite version 2.6.0... then my next question would be: how do I find out which functions exist there?

coral wasp
#

hmm

delicate fieldBOT
#

:white_check_mark: Your 3.14 eval job has completed with return code 0.

[('pow', 1, 's', 'utf8', 2, 2099200), ('group_concat', 1, 'w', 'utf8', 1, 2097152), ('group_concat', 1, 'w', 'utf8', 2, 2097152), ('json_type', 1, 's', 'utf8', 1, 2048), ('json_type', 1, 's', 'utf8', 2, 2048), ('julianday', 1, 's', 'utf8', -1, 2099200), ('ntile', 1, 'w', 'utf8', 1, 2097152), ('nullif', 1, 's', 'utf8', 2, 2099200), ('sqlite_compileoption_get', 1, 's', 'utf8', 1, 2097152), ('json_valid', 1, 's', 'utf8', 1, 2048), ('json_quote', 1, 's', 'utf8', 1, 2048), ('json_patch', 1, 's', 'utf8', 2, 2048), ('->', 1, 's', 'utf8', 2, 2048), ('json_array', 1, 's', 'utf8', -1, 2048), ('current_timestamp', 1, 's', 'utf8', 0, 2097152), ('power', 1, 's', 'utf8', 2, 2099200), ('sqlite_compileoption_used', 1, 's', 'utf8', 1, 2097152), ('json_remove', 1, 's', 'utf8', -1, 2048), ('json_object', 1, 's', 'utf8', -1, 2048), ('json_insert', 1, 's', 'utf8', -1, 2048), ('->>', 1, 's', 'utf8', 2, 2048), ('sin', 1, 's', 'utf8', 1, 2099200), ('sum', 1, 'w', 'utf8', 1, 2097152), ('quote', 1, 's', 'utf8',
... (truncated - too long)

Full output: https://paste.pythondiscord.com/DF3COG3WBRJBGW7AUHNT7UAQ4Q
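The `2.6.0` is most likely `sqlite3.version`, the version of the Python module itself (deprecated since Python 3.12). The SQLite library version, which decides which SQL functions exist, is `sqlite3.sqlite_version`. As far as I know, `timediff()` only arrived in SQLite 3.43.0, so an older library won't have it. The bot's query above lists functions via `pragma_function_list`. Here is a minimal sketch of a helper around that query:

```python
import sqlite3


def sqlite_functions(con: sqlite3.Connection) -> set[str]:
    """Names of the SQL functions this connection's SQLite library provides."""
    # pragma_function_list is built in by default since SQLite 3.30, as far as I know.
    return {name for (name,) in con.execute(
        "SELECT DISTINCT name FROM pragma_function_list")}


con = sqlite3.connect(":memory:")
print("SQLite library:", sqlite3.sqlite_version)  # not the Python module's version
funcs = sqlite_functions(con)
print("timediff available:", "timediff" in funcs)
```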

formal current
hard dragon
#

Anyone mind pitching in on whether using JSON files as a digital assistant's database format is an effective idea, please?

paper osprey
#

@jade wing jsyk i despise the Method HOWEVER it has resulted in this so thank you for that
i'm still trying to figure out how to get the owner data to fill in

jade wing
# paper osprey <@936769916072259654> jsyk i despise the Method HOWEVER it has resulted in this ...

what method are you referring to?
for storage, i would model this as two tables:
one named attacks (if that is what it is, if i remember it right) with the fields id, name, owner_id, damage, cooldown, damage_rate_ticks and statuses
and another named owners with the fields id, name, love, badge, health, dodges and secret
and then join the two tables on attacks.owner_id and owners.id
possibly also create a statuses table and a many-to-many connection table between attacks and statuses if needed, so several statuses can be associated with each attack, if that is a thing (i lack context here)
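The two-table layout described above could look like this in `sqlite3`. This is a minimal sketch. The column names come from the message; the types and sample rows are assumptions, and the many-to-many `statuses` part is left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE owners (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    love INTEGER,
    badge TEXT,
    health INTEGER,
    dodges INTEGER,
    secret TEXT
);
CREATE TABLE attacks (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES owners(id),
    damage INTEGER,
    cooldown INTEGER,
    damage_rate_ticks INTEGER
);
-- Hypothetical sample data.
INSERT INTO owners (id, name, health) VALUES (1, 'Sample Owner', 100);
INSERT INTO attacks (name, owner_id, damage, cooldown) VALUES ('Sample Attack', 1, 12, 3);
""")

# Fill in each attack's owner data by joining attacks.owner_id to owners.id.
rows = con.execute("""
    SELECT attacks.name, attacks.damage, owners.name, owners.health
    FROM attacks
    JOIN owners ON attacks.owner_id = owners.id
""").fetchall()
print(rows)  # [('Sample Attack', 12, 'Sample Owner', 100)]
```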

paper osprey
rigid pike
#

guys i want help

#

i have bought a brand new pc

#

for

#

a database

#

but i only wanna use it for a database

#

using windows

#

and i want it to be available on my network + a vpn tunnel

glass thunder
#

could anyone help me with making a data flow diagram?

winged stump
#

If you want a machine to function only as a database or similar, not for general usage, you should've gone for a Raspberry Pi-like machine IMO

#

Not a PC with Windows (though you can always just switch to Linux on it)

#

All of this is from my knowledge. I'm not stating super duper reliable facts but it's what I know

jade wing
jade wing
fierce copper
#

Hi, there!
I have a database which contains millions and billions as text 'Mln' and 'Md'.
I need to transform 'Mln' and 'Md' into numbers.
Python always gives me the wrong numbers.
Any suggestions, please?

#

The main problem is converting the decimal comma to a dot.

#

In my country the decimal separator is a comma, not a dot, and thousands (millions, billions) are separated with dots

#

When I run it, I get 22.12.00

#

I could use spreadsheets but I'd like to know any solution in python
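One way to do this in Python, assuming the values look like `22,12 Mln` or `1.234,5 Md`: dots group thousands, the comma marks decimals, and `Mln`/`Md` mean millions/billions (milioni/miliardi). Remove the thousands dots *before* turning the decimal comma into a dot. Doing it the other way round is what produces strings like `22.12.00`. `Decimal` avoids float rounding in the multiplication. This is a sketch and has not been tested against the actual data:

```python
import re
from decimal import Decimal

# Assumed suffixes: Mln = milioni (10**6), Md/Mld = miliardi (10**9).
MULTIPLIERS = {"": Decimal(1), "mln": Decimal(10) ** 6,
               "md": Decimal(10) ** 9, "mld": Decimal(10) ** 9}

# A number with '.' thousands groups and optional ',' decimals, then an optional suffix.
PATTERN = re.compile(r"^\s*([-+]?[\d.]*\d(?:,\d+)?)\s*([A-Za-z]*)\.?\s*$")


def parse_amount(text: str) -> Decimal:
    match = PATTERN.match(text)
    if not match:
        raise ValueError(f"unrecognised amount: {text!r}")
    number, suffix = match.groups()
    # Remove thousands separators first, then turn the decimal comma into a dot.
    normalized = number.replace(".", "").replace(",", ".")
    try:
        multiplier = MULTIPLIERS[suffix.lower()]
    except KeyError:
        raise ValueError(f"unknown suffix: {suffix!r}") from None
    return Decimal(normalized) * multiplier


print(parse_amount("22,12 Mln"))  # 22120000.00
```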

marble swan
#

Which are currently considered the best ORMs in Python?

long pendant
jade wing
fierce copper
#

I want to get:
Fatturato
Utile
Costo personale
Numero dipendenti

#

The data I get is in the following picture

spark vector
#

took me long enough...

zealous spire
spark vector
#

do you want to find out

#

??

delicate fieldBOT
spice wedge
#

Well, my detective skills tells me it's a group chat that's all fun and games.

jade wing
spark vector
velvet panther
#

I'm coding a VCS with a command that encrypts a file using XOR encryption. It stores each user's credentials in a separate JSON file. The only problem is that I can't link a file to a specific user, so once the file is encrypted, any other user could 'encrypt' it with another key and corrupt it. How can I fix this?
P.S - I have user verification system already

jade wing
# velvet panther I'm coding a VCS with a command that encrypts a file using XOR encryption. It st...

"XOR encryption" (a one-time pad) is the strongest there is and has been mathematically proven to be unbreakable, if and only if the key is as long as the data you encrypt, is made of true randomness, and is never reused to encrypt anything else. If you can't meet those three stringent requirements, "XOR encryption" isn't any good and can even be one of the weakest forms of encryption (in the worst case it's almost like "Caesar encryption" or "ROT13").
With all of that out of the way, you need an ownership/permission system where you can tag a file with an owner and only give the owner write access to it, as anyone with write access can otherwise overwrite the file with whatever they want (encrypting it again with another key is just another way to overwrite it).
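Here is a minimal sketch of that ownership idea. It assumes a hypothetical in-memory registry; in the VCS it would be persisted next to the users' JSON files and tied to the existing user verification. The first user to encrypt a file becomes its owner, and anyone else is refused before any bytes are rewritten. The repeating-key XOR is only a placeholder and, per the caveats above, is not real protection.

```python
class FileRegistry:
    """Hypothetical record of which verified user owns which tracked file."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def claim_or_check(self, path: str, user: str) -> None:
        """The first writer becomes the owner; anyone else is rejected."""
        owner = self._owners.setdefault(path, user)
        if owner != user:
            raise PermissionError(f"{user!r} may not modify {path!r} (owner: {owner!r})")


def encrypt_file(registry: FileRegistry, path: str, user: str,
                 data: bytes, key: bytes) -> bytes:
    registry.claim_or_check(path, user)  # ownership check happens before any change
    # Placeholder repeating-key XOR: NOT secure unless the key is a true one-time pad.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


registry = FileRegistry()
ciphertext = encrypt_file(registry, "notes.txt", "alice", b"hello", b"k1")
```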

jade topaz
#

anyone here good with postgis? i've got a bunch of data that's using HK1980 northing/easting, and i'm under the impression that it should be stored in postgis by converting it to WGS84 (srid=4326) before storing it. is this correct? i want to be able to find the rows whose location is within a given radius of a point

#

(disclaimer: this is my first time working with GIS)

marble umbra
thorny anchor
#

if your data will all be in HK1980 it would be more accurate to keep it in HK1980
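The HK1980 grid (assumed here to be EPSG:2326, Hong Kong 1980 Grid System) is a projected system in metres. That means a radius search can use the eastings/northings directly with plain Euclidean distance, which is reasonable at city scale. Below is a minimal pure-Python sketch of the idea with made-up coordinates. In PostGIS, the equivalent would be `ST_DWithin` on `geometry` values stored with SRID 2326, where the distance argument is in the SRID's units (metres).

```python
import math


def within_radius(points, center, radius_m):
    """Return ids whose (easting, northing) lies within radius_m of center.

    points: iterable of (id, easting, northing) in HK1980 grid metres.
    """
    cx, cy = center
    return [pid for pid, e, n in points
            if math.hypot(e - cx, n - cy) <= radius_m]


points = [
    ("a", 836_000.0, 818_000.0),
    ("b", 836_300.0, 818_400.0),  # exactly 500 m away (a 300-400-500 triangle)
    ("c", 840_000.0, 820_000.0),
]
print(within_radius(points, (836_000.0, 818_000.0), 500))  # ['a', 'b']
```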

pale crest
tight grove
#

Can someone enlighten me on redis

rigid pike
long pendant
#

if I'm not wrong, they even changed their license a couple of years ago to one that restricts some commercial uses

#

even though they were open source

tight grove
#

From what I understood, yeah: for caching, rate limiting... not for complex queries
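The rate-limiting use usually follows the fixed-window pattern shown in the Redis docs for `INCR`. You increment a per-client counter key and set an expiry on it when the count is 1, so the counter resets when the window ends. Below is a minimal pure-Python stand-in for that pattern: a dict instead of Redis, with an injectable clock so it can be tested. It is not Redis itself, and the `limit`/`window` values are hypothetical.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key.

    Mirrors the Redis INCR + EXPIRE pattern with a plain dict.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._counters: dict[str, tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:  # window over: behaves like the key expiring
            count, expires_at = 0, now + self.window
        count += 1  # like INCR
        self._counters[key] = (count, expires_at)
        return count <= self.limit


limiter = FixedWindowLimiter(limit=100, window=60)
```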

torn sphinx
#

Hi everyone.
I'm new here and a python rookie. Been learning for months now.
Just finished building a simple Expense Tracker in Python.
It handles income, expenses, and displays running balance using basic file operations.
Would love any feedback or improvement suggestions 👇

https://github.com/Variant1740/Expense_Tracker/blob/main/Expense_tracker.py

ionic pecan
#

you could probably get rid of the time.sleep or at least replace multiple calls to it by one long sleep. Also, in view_expense, you could use string alignment to make sure the output is neatly columnized
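For the column alignment suggestion, f-string format specs can pad each field to a fixed width. A small sketch with made-up expense rows (the tracker's actual data structure may differ):

```python
expenses = [("Groceries", 42.5), ("Rent", 950.0), ("Coffee", 3.2)]


def format_expenses(rows):
    lines = [f"{'Item':<12}{'Amount':>10}"]  # '<' left-aligns, '>' right-aligns
    for name, amount in rows:
        lines.append(f"{name:<12}{amount:>10.2f}")  # always 2 decimal places
    return "\n".join(lines)


print(format_expenses(expenses))
```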

torn sphinx
somber fractal
#

whats database

hallow pine
paper quest
#

What's the best (free) database I can use to store JSON data?

#

Can MongoDB store JSON? If so, can someone show how to do that?

keen minnow
rough otter
#

Encountered two children with the same key, null. Keys should be unique so that components maintain their identity across updates. Non-unique keys may cause children to be duplicated and/or omitted — the behavior is unsupported and could change in a future version.

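```python
import sqlite3

from fastapi import FastAPI

app = FastAPI()
# check_same_thread=False: FastAPI runs plain `def` endpoints in a thread pool
con = sqlite3.connect("hangouts.db", check_same_thread=False)  # file name assumed
con.row_factory = sqlite3.Row  # makes dict(row) below work


@app.post('/hangouts')
def create_hangout(hangout: dict):
    cur = con.cursor()
    cur.execute("""
            INSERT INTO hangouts (activity, hour, minute, maxAttendees, location, description)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (
            hangout['activity'],
            hangout['hour'],
            hangout['minute'],
            hangout['maxAttendees'],
            hangout['location'],
            hangout.get('description', '')
            # no need for attendees because the default is one
        )
    )
    con.commit()  # for safety and control, think of this like git commit
    # update to database and then give it to the frontend
    new_id = cur.lastrowid

    # fetch newly added hangout; parameters must be a sequence, hence (new_id,)
    cur.execute("SELECT * FROM hangouts WHERE id = ?", (new_id,))
    new_hangout = cur.fetchone()
    return dict(new_hangout)
```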

getting this error in my Python code. think it is this function

rough otter
#

figured it out

runic goblet
#

I am trying to log-transform data, but I can't be sure if something is wrong

#

are these values normal for log-transformed data?

#

I kinda get how skew() works, but I didn't expect negative values
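Negative values are expected in two places. First, the log of any value between 0 and 1 is negative (for example, ln 0.5 is about -0.693). Second, the skewness itself comes out negative if the transform over-corrects a right skew. Here is a small stdlib sketch with made-up data. It computes the simple moment-based skewness, which differs slightly from pandas' bias-corrected `skew()`.

```python
import math


def skewness(xs):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


raw = [0.5, 1, 2, 3, 4, 100]              # right-skewed, with one value below 1
logged = [math.log(x) for x in raw]       # log(0.5) < 0, so negatives appear
logged1p = [math.log1p(x) for x in raw]   # log(1 + x) stays >= 0 for x >= 0

print(min(logged) < 0, min(logged1p) >= 0)  # True True
print(skewness(raw) > skewness(logged))     # True: the log pulls in the long tail
```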

lean walrus
#

If I have an API token stored in the database hashed with a salt (using bcrypt), how do I retrieve that row using the token in plain text form?

storm mauve
#

you don't

either you hash then retrieve based on the hash, or store some metadata in the token after hashing (e.g. account/token id) then fetch based on it

keen minnow
lean walrus
lean walrus
keen minnow
lean walrus
#

then there is no way I can store the token in hashed form?

keen minnow
lean walrus
#

ok, pretend I have a table with two columns (user_id[str], api_token[bytes]). How would I get the user id from the db query if I only have the token in string?

keen minnow
#

If you store the hashed value of the token in the DB in that table, then you need to hash the incoming token and look for any row that contained that hashed value

lean walrus
#

the problem is that bcrypt needs salt to hash the token, I'll just probably take your solution and add a "salt" column in the table
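A per-row salt column still means running bcrypt against every row to find a match. The second option mentioned above avoids that. You put a non-secret token id inside the token itself, look the row up by that id, and only then verify the secret part.

This is a minimal sketch under a few assumptions. It swaps bcrypt for SHA-256 from the standard library, which is commonly considered acceptable for long random tokens (unlike short human passwords). The `id.secret` token format and the `api_tokens` table are also assumptions.

```python
import hashlib
import hmac
import secrets
import sqlite3
from typing import Optional

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE api_tokens (
    token_id TEXT PRIMARY KEY,   -- public lookup key, embedded in the token
    user_id TEXT NOT NULL,
    secret_hash BLOB NOT NULL    -- only a hash of the secret part is stored
)""")


def issue_token(con: sqlite3.Connection, user_id: str) -> str:
    token_id = secrets.token_hex(8)
    secret = secrets.token_urlsafe(32)  # never contains '.', so '.' is a safe separator
    con.execute("INSERT INTO api_tokens VALUES (?, ?, ?)",
                (token_id, user_id, hashlib.sha256(secret.encode()).digest()))
    return f"{token_id}.{secret}"  # shown to the user once, never stored


def user_for_token(con: sqlite3.Connection, token: str) -> Optional[str]:
    token_id, _, secret = token.partition(".")
    row = con.execute("SELECT user_id, secret_hash FROM api_tokens WHERE token_id = ?",
                      (token_id,)).fetchone()
    if row is None:
        return None
    user_id, stored_hash = row
    candidate = hashlib.sha256(secret.encode()).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return user_id if hmac.compare_digest(candidate, stored_hash) else None


token = issue_token(con, "user-42")
print(user_for_token(con, token))  # user-42
```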

hollow pebble
#
```text
  1. ts_in_delta
    Select timestamp column (1-N): 1
    2025-11-20 20:58:58,123 - INFO - Using ts_col='ts_event'
    2025-11-20 20:58:58,123 - INFO - PASS 1: Creating UNSORTED parquet (no RAM usage)...
    2025-11-20 21:03:37,466 - INFO - Pass 1 complete G:\quant\data\raw\datatbbo\datatbbo\converted\merged\ES_TBBO_CONTINUOUS_(04-12-2024- 09-22-2025)_UNSORTED.parquet
    2025-11-20 21:03:37,467 - INFO - PASS 2: External disk-based sort...
    2025-11-20 21:03:37,738 - ERROR - Merge failed: Invalid Input Error: Cannot change enable_external_access setting while database is running

❌ Merge failed: Invalid Input Error: Cannot change enable_external_access setting while database is running

❌ Merge failed
```

mr duckdb king

#

what does this mean

coral wasp
#

Seems like you're setting a bunch of DuckDB settings (especially from the thread where you said you set the memory limit too).

warm fern
#

When starting a backend app, what's your process like? I know you have to create an ER diagram before coding and write documentation after it's done, but what else?

keen minnow
#

Your data model will be impacted by the access patterns and requirements

icy glacier
#

For example, I might add a comment to the locking code that says "it uses a table and SELECT ... FOR UPDATE instead of advisory locks because we want to use CockroachDB, which doesn't support them", or highlight that somewhere another developer can read it quickly.

balmy pier
#

Hey there, I'm planning to start learning PostgreSQL. Could you guys recommend a tutorial that will help me get into web development (Python) and data analysis?

prime palm
# balmy pier Hey there, I'm planning to start learning postgresql could you guys recommend me...

it depends on your current SQL knowledge and your tasks in web development. I would recommend reading the introductory docs first (Part I, Tutorial) and learning about MVCC in Postgres; if you don't know SQL well, you may also need the SQL Language part of the docs. If you wanna learn the Postgres architecture, you may watch this video. However, the best way to learn specific topics while avoiding tutorial hell is to practise them in your own web project: e.g. if you are dealing with concurrency in your web application, you might wanna read more about Transaction Isolation, Explicit Locking, etc. So read the docs, solve a concrete problem in your project, and watch Hussein's channel

gaunt meteor
#

the main drive for the project is SPEED, but it limits you to unique key-value pairs; it's primarily a caching system to speed up requests

#

rather than work as a database

#

and it works 100% in memory

#

so you can't use it when you want a cache that you expect to have on-disk lookups

tight grove
gaunt meteor
#

mostly it's used as a load balancer or head node, but you can use it as a raw DB; just kinda don't, as power loss makes you lose all data

storm mauve
#

this is not relevant for this channel whatsoever

and that smells more like self-promotion than anything else

devout nova
jade wing
#

!rule ad

delicate fieldBOT
#

6. Do not post unapproved advertising.

jade wing
formal current
#

!warn 1379302063292547155 Don't advertise, and listen to staff instructions.

delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied warning to @devout nova.

waxen acorn
#

!rule

delicate fieldBOT
#

The rules and guidelines that apply to this community can be found on our rules page. We expect all members of the community to have read and understood these.

warped brook
#

Hi guys

#

I'm developing an algorithmic trading bot, so I'm wondering what you guys use in VS Code to visualize/analyse large datasets

ripe oasis
#

hi, im new here and also in python, I know some basics, and im trying to make a code to help me with a study im making for my math class, can someone help me please?

lucid valve
long pendant
#

its free and open source

#

and has support for many kinds of db formats

fair pasture
#

Hi guys, is posting an open source Python project allowed here or is it considered advertising?

long pendant
prime palm
#

Hi there, how would Postgres react to an INSERT ... ON CONFLICT clause if we are doing a bulk insert and there are identical rows in the insert statement? I suppose that if there are no unique constraint violation errors, it will insert only one copy of the row and ignore the other duplicates, am I right?
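A quick way to see the `DO NOTHING` behaviour is SQLite, whose UPSERT syntax is modelled on PostgreSQL's. Behaviour is not guaranteed to be identical, so confirm on Postgres itself. With a unique constraint, the first duplicate in the batch is kept and later ones are skipped. Without a unique constraint there is nothing to conflict on, so every copy is inserted. On PostgreSQL, `ON CONFLICT ... DO UPDATE` with the same key twice in one statement fails instead, with "ON CONFLICT DO UPDATE command cannot affect row a second time". A minimal sketch with hypothetical `items` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (sku TEXT UNIQUE, price INTEGER)")
con.execute("CREATE TABLE items_no_key (sku TEXT, price INTEGER)")

# Constant SQL text (no user input), reused for both inserts.
batch = "VALUES ('A1', 10), ('A1', 20), ('B2', 30)"

# With a unique constraint: later duplicates in the same statement are skipped.
con.execute(f"INSERT INTO items (sku, price) {batch} ON CONFLICT (sku) DO NOTHING")
# Without one, there is nothing to conflict on, so every row goes in.
con.execute(f"INSERT INTO items_no_key (sku, price) {batch} ON CONFLICT DO NOTHING")

print(con.execute("SELECT sku, price FROM items ORDER BY sku").fetchall())
# [('A1', 10), ('B2', 30)]
print(con.execute("SELECT count(*) FROM items_no_key").fetchone()[0])  # 3
```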

fair pasture
# long pendant if you want some suggestions or feedback you can send your github link here

Thanks! Here is the github link: https://github.com/manoss96/onlymaps

It's a micro-ORM, which means that you still get to write plain SQL queries to interact with a database either synchronously or asynchronously, while it takes care of mapping the results to Python objects. It also supports all major relational databases.

I believe that a micro-ORM was missing from the python ORM landscape, as most ORMs are pretty bloated feature-wise and come with their own OOP-like API.

Any questions/feedback is welcome.

fair pasture
lean walrus
#

I heard that for document-like data, e.g.

```json
{
  "id": "page1",
  "background": "#FFFFFF",
  "elements": [
    {
      "id": "shape342",
      "type": "rect",
      "x": 100,
      "y": 200
    },
    {
      "id": "text123",
      "type": "text",
      "content": "Hello world",
      "x": 200,
      "y": 150
    }
  ]
}
```
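A common middle ground for document-shaped data like this is to keep each page as a single JSON document in a relational table, then query inside it with the database's JSON functions. Below is a minimal sketch using SQLite's built-in `json_each`/`json_extract`; the `pages` table is an assumption. PostgreSQL's `jsonb` fills the same role.

```python
import json
import sqlite3

page = {
    "id": "page1",
    "background": "#FFFFFF",
    "elements": [
        {"id": "shape342", "type": "rect", "x": 100, "y": 200},
        {"id": "text123", "type": "text", "content": "Hello world", "x": 200, "y": 150},
    ],
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pages (id TEXT PRIMARY KEY, doc TEXT NOT NULL)")
con.execute("INSERT INTO pages (id, doc) VALUES (?, ?)", (page["id"], json.dumps(page)))

# json_each expands the elements array into one row per element (key = array index).
rows = con.execute("""
    SELECT json_extract(e.value, '$.id'), json_extract(e.value, '$.type')
    FROM pages, json_each(pages.doc, '$.elements') AS e
    WHERE pages.id = ?
    ORDER BY e.key
""", ("page1",)).fetchall()
print(rows)  # [('shape342', 'rect'), ('text123', 'text')]

# The whole document also round-trips unchanged.
stored = json.loads(con.execute("SELECT doc FROM pages").fetchone()[0])
```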

prime palm
#

Hi there, I am building a web scraper for many websites, and each website has its own time zone, e.g. there could be both European and American websites with different time zones. I am using Postgres as the database, for now a local server with the pgAdmin app. As I understand it, the preferred way is to store data in UTC; however, my default value in Postgres is a specific time zone like Europe/(some city). So the question is: do I need to change the default value somewhere, and if yes, how? (SET TIME ZONE doesn't help, as it is reset with each new connection to the database.)
And the preferred way, when I know the user's time zone, is to use Postgres date operators to show the data in that time zone.
Example: date in the database
1999-01-08 04:05:06-8:00

query if user is from los angeles:
select created_at at time zone 'utc' at time zone 'america/los_angeles' from users;

Am I right with a concept? Or there are some misunderstandings?

gaunt meteor
#

there is an ISO standard for this

#

ISO 8601 is an international standard covering the worldwide exchange and communication of date and time-related data. It is maintained by the International Organization for Standardization (ISO) and was first published in 1988, with updates in 1991, 2000, 2004, and 2019, and an amendment in 2022. The standard provides a well-defined, unambiguou...

#

don't worry about the city, just do the offset

jade wing
#

the "continent/city" notation is the most correct when it comes to DST, but it's also the hardest to extrapolate unless you have that information about the user from a reliable source
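A small sketch of why the named zone matters for DST, using the standard-library zoneinfo module (Python 3.9+; it assumes the system time-zone database or the tzdata package is installed). A fixed offset like UTC-8 matches Los Angeles only in winter, while America/Los_Angeles switches to UTC-7 in summer. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LOS_ANGELES = ZoneInfo("America/Los_Angeles")      # "continent/city" zone, knows DST rules
FIXED_UTC_MINUS_8 = timezone(timedelta(hours=-8))  # "just do the offset"

winter_utc = datetime(2024, 1, 15, 20, 0, tzinfo=timezone.utc)
summer_utc = datetime(2024, 7, 15, 20, 0, tzinfo=timezone.utc)

for moment in (winter_utc, summer_utc):
    named = moment.astimezone(LOS_ANGELES)
    fixed = moment.astimezone(FIXED_UTC_MINUS_8)
    # In winter both agree; in summer the fixed offset is an hour behind local clocks.
    print(moment.date(), "named zone:", named.strftime("%H:%M %Z"), "| fixed offset:", fixed.strftime("%H:%M"))
```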

gaunt meteor
#

generally you don't care on the backend what's going on, and everything should pretty much be done in 100% UTC

#

you can do customer stuff on their customer ID, or in their VIEW settings

#

if you 100% must put customer-specific customization in your DB

#

but don't, as almost any app handles DST for you; hell, the endpoint OS handles it

jade wing
# gaunt meteor thats done client side

that depends on the context, on the web you can most of the time ignore it and just offload it to the client side
but if implementing this in a local application one might need to take care of that one way or another

gaunt meteor
#

and another reason you don't wanna deal with it in a DB: it changes year to year for a lot of places

gaunt meteor
jade wing
#

but generally, yes, i would just store everything in UTC in the DB and let the client application code deal with the local time if necessary
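A minimal sketch of "store UTC, convert in the application". It uses sqlite3 as a stand-in for Postgres and ISO 8601 text as the storage format; both are assumptions made to keep the example self-contained. Incoming timestamps with any offset are normalized to UTC before they are written, and conversion to the viewer's zone happens only when reading. The `scraped_items` table and both helper names are hypothetical. The example value from the question is written with a two-digit offset hour (`-08:00`), which `fromisoformat` expects.

```python
import sqlite3
from datetime import datetime, timedelta, timezone, tzinfo


def to_utc_iso(text: str) -> str:
    """Parse an offset-aware ISO 8601 timestamp and return it normalized to UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # A naive value has no offset, so converting it would be a guess.
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return parsed.astimezone(timezone.utc).isoformat()


def for_display(stored_utc: str, viewer_zone: tzinfo) -> datetime:
    """Convert a stored UTC value into the viewer's zone, only when reading."""
    return datetime.fromisoformat(stored_utc).astimezone(viewer_zone)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scraped_items (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute(
    "INSERT INTO scraped_items (created_at) VALUES (?)",
    (to_utc_iso("1999-01-08 04:05:06-08:00"),),
)

stored = conn.execute("SELECT created_at FROM scraped_items").fetchone()[0]
print(stored)  # 1999-01-08T12:05:06+00:00
print(for_display(stored, timezone(timedelta(hours=-8))))
```

In a real application the viewer zone would more likely be a named zone such as `ZoneInfo("America/Los_Angeles")`, so DST is handled as well; the fixed offset is used here only so the example runs without a time-zone database.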

gaunt meteor
#

even if you're doing this with a pure SQLite flat file, you would do this in the program/app, not the DB itself

#

you'd just set that sucker to ISO UTC

#

because EVERY programming language has stuff to handle it for you

#

any library expects ISO 8601

jade wing
# gaunt meteor even if your doing this pure sqlite flat file you would do this in the program/a...

i totally agree with you there
it's just that you might need to implement things in the presentation layer or business logic yourself if it is on you to implement that side of things (I have had to do this a few times, the last time less than a month ago, but it is only in very special/specific cases that you have to deal with this, for example when you're dealing not with timestamps right now but with ones in the past or future)

gaunt meteor
#

one of the only ones I've come across in a DB is an extra GPS field

#

but that is kind of sketchy now due to data protection laws

prime palm
jade wing
prime palm
gaunt meteor
#

yeah, do all your customization in the business logic

prime palm
#

OK, thanks. I will convert all datetimes to UTC, store them in UTC, and display them according to the user's time zone. BTW, a banal but still interesting fact: timestamptz takes 8 bytes in Postgres, while date (without time) takes only 4 bytes. Earlier I used to store things like birthdays as timestamptz, but today I looked into the key difference. Maybe this will be useful for someone too; you can find more about it in the Postgres docs.

torn sphinx
#

Professional Data Analysis Example

Coded by Aiko Lark

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# -----------------------------
# 1️⃣ Load dataset (example: sales data)
# -----------------------------
data = {
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"],
    "Sales": [1500, 1800, 1700, 1900, 2200, 2100, 2300, 2500],
    "Expenses": [800, 900, 850, 950, 1000, 980, 1100, 1200]
}

df = pd.DataFrame(data)

# -----------------------------
# 2️⃣ Data overview
# -----------------------------
print("=== Data Overview ===")
print(df.head())
print("\nSummary Statistics:")
print(df.describe())

# -----------------------------
# 3️⃣ Add new calculated column
# -----------------------------
df["Profit"] = df["Sales"] - df["Expenses"]

# -----------------------------
# 4️⃣ Analysis
# -----------------------------
print("\n=== Profit Analysis ===")
print(df[["Month", "Profit"]])

# Find month with highest profit
best_month = df.loc[df["Profit"].idxmax(), "Month"]
print(f"\nMonth with highest profit: {best_month}")

# -----------------------------
# 5️⃣ Visualizations
# -----------------------------
sns.set(style="whitegrid")

# Line plot for Sales & Expenses
plt.figure(figsize=(10, 6))
plt.plot(df["Month"], df["Sales"], marker='o', label="Sales")
plt.plot(df["Month"], df["Expenses"], marker='o', label="Expenses")
plt.title("Monthly Sales vs Expenses")
plt.xlabel("Month")
plt.ylabel("Amount ($)")
plt.legend()
plt.show()

# Bar plot for Profit
plt.figure(figsize=(10, 6))
sns.barplot(x="Month", y="Profit", data=df, palette="coolwarm")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit ($)")
plt.show()
# This is a tutorial for beginners

delicate fieldBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

jade wing
torn sphinx
#

guys, what app is best for data analysis in Python that also works on a low-end PC?

stone gull
#

you can use Jupyter Notebooks with Pandas/Polars/PySpark

snow widget
#

Hey guys, good evening. I'm just starting out with Python. Can anyone give me some advice? Thanks

lucid valve
warped brook
#

does anyone use lightweight_chart in Python?

obsidian basin
#

In SQLAlchemy, in a many-to-many relationship, do I add rows to the association/connection table only after both tables of the M:N relationship have values? Please ping on reply
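At the SQL level, every association-table row references both sides, so with foreign keys enforced the parent rows have to exist first. Below is a minimal sqlite3 sketch of that ordering; the students/courses/enrollments names are hypothetical. With SQLAlchemy, when the association table is used as `relationship(secondary=...)`, you normally just append objects to the relationship collection, and the ORM's unit of work emits the parent INSERTs before the association rows at flush time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Association (junction) table: one row per student/course pair.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(id),
    course_id  INTEGER NOT NULL REFERENCES courses(id),
    PRIMARY KEY (student_id, course_id)
);
""")

# Linking first fails: the referenced student and course do not exist yet.
try:
    conn.execute("INSERT INTO enrollments (student_id, course_id) VALUES (1, 10)")
    early_link_failed = False
except sqlite3.IntegrityError:
    early_link_failed = True

# Working order: insert both sides, then the association row that joins them.
conn.execute("INSERT INTO students (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO courses (id, title) VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollments (student_id, course_id) VALUES (1, 10)")
conn.commit()

print(early_link_failed)
print(conn.execute(
    "SELECT s.name, c.title FROM enrollments e "
    "JOIN students s ON s.id = e.student_id JOIN courses c ON c.id = e.course_id"
).fetchall())
```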

jade igloo
#

Any pro dev in the house

jade wing
jade igloo
jade wing
rigid pike
wanton ruin
#

What's the best free provider for databases? Just in case I want to make a side project or something.

lean olive
wanton ruin
#

I didn't know about AWS having free tiers. I assumed Amazon were thirsty for money.

storm mauve
lean olive
#

for databases, Amazon Aurora DSQL and Amazon DynamoDB have an always-free offer

sonic jacinth
#

Hi guys, I need help connecting to a database. It won't let me connect; it keeps saying "module not found"

sonic jacinth
#

They're both in the same folder, but it doesn't work

cedar tiger
sonic jacinth
cedar tiger
#

here pls

sonic jacinth
# cedar tiger here pls

% npm run test:db
> node -r dotenv/config -r ts-node/register lib/testConnection.ts
Error: Cannot find module '/Users//Desktop//lib/mongodb' imported from /Users//Desktop//lib/testConnection.ts
    at finalizeResolution (node:internal/modules/esm/)
    at moduleResolve (node:internal/modules/esm/)
    at defaultResolve (node:internal/modules/esm/)
    at ModuleLoader.#cachedDefaultResolve (node:internal/modules/esm/)
    at ModuleLoader.resolve (node:internal/modules/esm/)
    at ModuleLoader.getModuleJobForImport (node:internal/modules/esm/)
    at ModuleJob.#link (node:internal/modules/esm/)
{ code: 'ERR_MODULE_NOT_FOUND', url: 'file:///Users//Desktop//lib/mongodb' }

cedar tiger
junior frost
#

Hey 👋

wanton ruin
#

I know this is probably a dumb question but oh well.

Let's say I have a database which stores information about Discord guilds, and I want to display that information on a website.

How would I go about updating the information? (For example, server changes their icon and I want it to update on the site also)

(This is assuming that I have both the guild_id and icon_hash saved)

hexed estuary
#

Well, the same way you got the info into the database in the first place, pretty much. You'd probably do an appropriate update query.

wanton ruin
dense elk
hexed estuary
#

Constantly looking for updates would be a bad idea (it's not even resources you should be worrying about, but API limits), yeah

wanton ruin
#

Gotcha

hexed estuary
#

Consider storing, for each thing you look up (e.g. a guild), the last lookup time, and periodically selecting the records that have gone longest without an update and refreshing those
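A minimal sqlite3 sketch of that idea. Each guild row keeps a `last_checked` timestamp, and a periodic job refreshes only the few rows that have gone longest without a check, which keeps the number of API calls per run bounded. `fetch_icon_hash` is a hypothetical stand-in for whatever API call you actually make; it is not a real Discord library function.

```python
import sqlite3
import time
from typing import Callable, List


def refresh_stalest(
    conn: sqlite3.Connection,
    fetch_icon_hash: Callable[[int], str],
    batch_size: int = 2,
) -> List[int]:
    """Re-fetch the `batch_size` guilds that were checked least recently."""
    stale = conn.execute(
        "SELECT guild_id FROM guilds ORDER BY last_checked ASC LIMIT ?",
        (batch_size,),
    ).fetchall()
    refreshed = []
    for (guild_id,) in stale:
        new_hash = fetch_icon_hash(guild_id)  # one API call per stale guild
        conn.execute(
            "UPDATE guilds SET icon_hash = ?, last_checked = ? WHERE guild_id = ?",
            (new_hash, time.time(), guild_id),
        )
        refreshed.append(guild_id)
    conn.commit()
    return refreshed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guilds (guild_id INTEGER PRIMARY KEY, icon_hash TEXT, "
    "last_checked REAL NOT NULL DEFAULT 0)"
)
# An index keeps "ORDER BY last_checked" cheap as the table grows.
conn.execute("CREATE INDEX idx_guilds_last_checked ON guilds (last_checked)")
conn.executemany(
    "INSERT INTO guilds (guild_id, icon_hash, last_checked) VALUES (?, ?, ?)",
    [(101, "old_a", 300.0), (102, "old_b", 100.0), (103, "old_c", 200.0)],
)

refreshed = refresh_stalest(conn, lambda gid: f"new_{gid}")
print(refreshed)
```

Calling `refresh_stalest` on a timer (say, every few minutes) cycles through all guilds over time without hammering the API.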

marsh kernel
#

good

rough grove
#

Hi all, I have been facing a persistent issue with SQLAlchemy/PostgreSQL and multithreading, specifically SQLAlchemy throwing an “instance not persistent within this session” exception. Has anyone faced a similar issue and found a robust solution? This issue also occasionally happens under normal operation, primarily when long processes have been running for a while.

#

To clarify a bit, I just use a session factory to connect to the database at runtime, i.e. I import the factory function in each file and call it within the file to create a session, then use it as needed. This project may switch to Flask or Django in the near future, so this issue may end up being moot, but I have been banging my head against the wall with this for a while

bitter sparrow
#

Hi all, as a newcomer to Python programming, I was wondering if you know of some nice websites with script templates for data science? I'm thinking of a premade script that will generate graphs or carry out multiple t-tests just by copying in datasets and defining them. Thanks in advance 🙂

ionic pewter
#

Anyone wanna help? DMs please

gilded sinew
fierce sundial
#

Can anyone advise me before I try to make a program (I'm new to Python and still learning) that converts CSV to BAT to TXT, etc., and vice versa? Along with that, it should support an SQL database and save the data from those different file types into it. I'd appreciate any advice that helps. FYI, this is a one-shot project to prepare for my exams.
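One possible starting point, sketched with only the standard library: read the CSV with the csv module and store each row in SQLite via sqlite3, so the database becomes the single place every format is imported into and exported from. The `records` table, the helper names, and the sample data are hypothetical. Other formats (TXT, etc.) would each need their own small reader and writer around the same table.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    """Quote a header name for use as an SQL identifier (doubling any embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn: sqlite3.Connection, csv_text: str, table: str = "records") -> int:
    """Load CSV text into `table`, creating one TEXT column per header field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    column_defs = ", ".join(f"{quote_ident(c)} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(record[c] for c in columns) for record in reader]
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


def export_csv(conn: sqlite3.Connection, table: str = "records") -> str:
    """The 'vice versa' direction: dump the table back out as CSV text."""
    cursor = conn.execute(f"SELECT * FROM {quote_ident(table)}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    return out.getvalue()


sample = "name,score\nAda,90\nLinus,85\n"  # hypothetical input file contents
conn = sqlite3.connect(":memory:")
print(import_csv(conn, sample))
print(export_csv(conn))
```

Reading real files is then just `open(path, newline="").read()` passed to `import_csv`. Storing everything as TEXT keeps the first version simple; column types can be added once the formats are settled.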

cedar tiger
cedar tiger
#

alright np

hard dragon
#

Any thoughts on using JSON as a categorical database for a digital assistant?

dawn shard
hard dragon
chilly glen
#

In what cases should I use SQLAlchemy?

cedar tiger
chilly glen
cedar tiger
rapid mason
#

Hey guys, I'm new here. I started learning Python some time ago and created my first mini-app in FastAPI, just to have something to show off and to ask for advice on what to improve. If you could tell me what's wrong with my code, what I should change, or what I should never actually do, I'd be really glad. And please be kind haha, I don't want to get discouraged.
https://github.com/dawidw-km/Mini-Library

cedar tiger
tardy latch
#

Hi guys, please check out dbzero. It's an open-source Python package which allows building blazing-fast apps without any external database, ORM or cache. It's based on the principle of "infinite memory": just use your objects however you like. It's already been tested in production on several medium-sized projects. We'd love to hear your feedback. https://github.com/dbzero-software/dbzero

DISTIC (Durable, Infinite, Shared, Transactional, Isolated, Composable) memory system for Python 3.x

undone lagoon
#

Hello guys, I started learning Django a few days ago. I have some experience with Flask; I built a quiz website with it back then. Now I want to build an appointment booking website for a local haircut studio. Customers will be able to book only one appointment, and each appointment will have only one user. Furthermore, customers will be able to select the haircut types they want.

It is my first time designing database logic on my own. I know it might be very simple, but I am still a beginner and wanted to get your opinions in case I need to add or delete something. The problem is that I am so confused: when I think about the database logic, the way I have divided the entities seems alright. But then I wonder how customers are going to select a haircut if the two aren't connected somehow in the database. And what should I do with the form so that the selected haircuts are saved to my database while using ModelForms in Django? The last thing that keeps confusing me is how I am going to keep the appointments clean in case an appointment is already booked and another user books the same slot at the same time. It all seems like a mess in my head, so if anyone can help I would appreciate it!

spiral breach
#

Polish guys! please DM me if you are free to chat.

fallen zenith
# undone lagoon Hello guys, I have started learning Django some days now.I have some experience ...

Assuming you are using Django to create the tables, this is a great start. I would consider having the Customer model extend Django's abstract user model so customers can create logins/passwords.

One question I’d have is about the appointments: how are you blocking out available and unavailable times?

As far as holding the appointment time goes, I would maybe create that in your form logic first, and then, when the scheduling is verified and completed, set an is_taken bool so you can easily query for open slots.

cedar tiger
#

If you do need to identify the hair stylist, you will need to also check to make sure that the hair stylist doesn't get double booked at the same time for multiple customers
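For the race in the original question, where two users book the same slot at the same moment, one common approach is to let the database enforce it with a unique constraint rather than relying only on form logic or an is_taken flag checked beforehand. A minimal sqlite3 sketch follows; the table, columns, and `book` helper are hypothetical. In Django, the rough equivalent would be a `models.UniqueConstraint` on the stylist and start-time fields in the model's `Meta.constraints`, with the view catching `IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE appointments (
    id          INTEGER PRIMARY KEY,
    stylist_id  INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    starts_at   TEXT NOT NULL,       -- ISO 8601 slot start, e.g. 2025-03-01T10:00
    UNIQUE (stylist_id, starts_at)   -- one booking per stylist per slot
)
""")


def book(conn: sqlite3.Connection, stylist_id: int, customer_id: int, starts_at: str) -> bool:
    """Return True if the slot was booked, False if someone got there first."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO appointments (stylist_id, customer_id, starts_at) VALUES (?, ?, ?)",
                (stylist_id, customer_id, starts_at),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = book(conn, stylist_id=1, customer_id=7, starts_at="2025-03-01T10:00")
second = book(conn, stylist_id=1, customer_id=8, starts_at="2025-03-01T10:00")
other_stylist = book(conn, stylist_id=2, customer_id=8, starts_at="2025-03-01T10:00")
print(first, second, other_stylist)
```

Because the check and the insert happen in one statement inside the database, two simultaneous requests cannot both win. This only covers slots with identical start times; overlapping appointments of different lengths need more than an exact-match constraint (Postgres exclusion constraints are one option).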

civic cargo
#

Can anyone suggest an embedded document-based database with a good Python API? Like if SQLite and MongoDB had a baby. I'll be doing more reads than writes. Long-term persistence isn't an issue, I'd be using it as an index to speed up some file generation

civic cargo
# keen minnow postgres?

Can I use it without expecting someone to spin up a Postgres server? And can I use JSON documents (as opposed to table rows) as the primary unit of data?

storm mauve
#

not fully sure but I don't think that anything embedded works great for that
if it's small enough to keep in memory, may as well just use a dictionary?
you can also store JSON in SQLite

civic cargo
# storm mauve not fully sure but I don't think that anything _embedded_ works great for that i...

Let me go into a bit more details about my problem, then. If you search my name on this server, you can see past discussions I've had, but I'll keep it to the salient points:

  • I'm fetching data from a couple of web APIs and matching it locally with some data that's already in a repository. Specifically I'm matching game ROM data (e.g. the checksum of a ROM file) to game metadata (e.g. game genres, features, tags).
  • The ultimate goal is to generate .dat files with this data to use within an app.
  • I was previously doing exactly what you describe, and pickling the Python objects to disk and back. But profiling revealed that that was a huge bottleneck, which was getting in the way of me iterating on the .dat file generation -- even loading the pickled index I built manually took several minutes.
  • So I decided to store this info in an SQLite database in the hopes that querying would be faster. But dealing with SQLModel (Pydantic + SQLAlchemy) is proving to be a massive pain in the ass. Right now I'm just trying to store all the data I'm fetching from one service (including many-to-many relationship tables) in it.
  • I don't want to require a database process because my use of databases isn't for any kind of persistent state; just as a cache to speed up local processing.
  • I'm asking about document-based alternatives in the hopes that I can find something that's easier for me to get up and running
#

Also I'm trying to make this maintainable for people who aren't already heavy users of databases and Python

storm mauve
#

I'd just use raw SQLite tbh

civic cargo
#

I tried that but it got way too verbose specifying all those tables

#

The CREATE TABLE strings I hardcoded were longer than the classes themselves

#

So I hand-rolled some logic to create tables and column definitions based on type annotations, but I couldn't get relationships right. It then hit me that I was unwittingly creating my own ORM

storm mauve
#

do you need all of the columns you were creating? you can include only the actually important ones, and leave the rest as part of a JSON object or even a binary BLOB
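A minimal sqlite3 sketch of that hybrid layout: promote only the field you actually look things up by (here a ROM checksum, following the use case above) to a real, indexed column, and keep everything else as a JSON text blob. The table, field names, and records are hypothetical. `json_extract` assumes SQLite's JSON functions are available; they are built in by default since SQLite 3.38 and were commonly compiled in before that.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives the checksum column an index; the rest stays schemaless.
conn.execute("CREATE TABLE games (checksum TEXT PRIMARY KEY, metadata TEXT NOT NULL)")

fetched = [  # hypothetical API results
    {"checksum": "a1b2", "title": "Game One", "genres": ["Platformer"], "players": 2},
    {"checksum": "c3d4", "title": "Game Two", "genres": ["RPG", "Strategy"], "players": 1},
]
conn.executemany(
    "INSERT INTO games (checksum, metadata) VALUES (?, ?)",
    [(game["checksum"], json.dumps(game)) for game in fetched],
)


def lookup(conn: sqlite3.Connection, checksum: str):
    """Indexed lookup by checksum; the rest of the record comes back as a dict."""
    row = conn.execute("SELECT metadata FROM games WHERE checksum = ?", (checksum,)).fetchone()
    return json.loads(row[0]) if row else None


# Occasional ad-hoc queries can still reach inside the blob with json_extract.
two_player_titles = [
    title
    for (title,) in conn.execute(
        "SELECT json_extract(metadata, '$.title') FROM games "
        "WHERE json_extract(metadata, '$.players') >= 2"
    )
]
print(lookup(conn, "c3d4"), two_player_titles)
```

Only one CREATE TABLE line has to be maintained, and new fields from the API need no schema change.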

civic cargo
lavish osprey
civic cargo
#

Why do you ask?

quaint sinew
#

it sounds like you just want to «stage» the data: just dump the tables raw, without foreign keys, etc. That would solve your issue. If you want faster querying, create denormalized versions where the tables are already joined. Also, DuckDB is faster for this than SQLite
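A minimal sketch of "dump raw, then denormalize" using sqlite3. The staging tables are created directly from the fetched dicts with untyped columns and no foreign keys (SQLite permits column definitions without a type). A single CREATE TABLE ... AS SELECT join then builds the flat table that lookups actually hit. All table and field names are hypothetical, and `stage` assumes the dict keys are plain identifiers.

```python
import sqlite3
from typing import Dict, List


def stage(conn: sqlite3.Connection, table: str, records: List[Dict]) -> None:
    """Dump a list of dicts into a raw table: no types, no keys, no constraints."""
    columns = list(records[0])
    quoted = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [tuple(record[c] for c in columns) for record in records],
    )


conn = sqlite3.connect(":memory:")
stage(conn, "raw_roms", [
    {"checksum": "a1b2", "game_id": 1},
    {"checksum": "c3d4", "game_id": 2},
])
stage(conn, "raw_games", [
    {"game_id": 1, "title": "Game One", "genre": "Platformer"},
    {"game_id": 2, "title": "Game Two", "genre": "RPG"},
])

# Denormalized, query-ready table: the join is paid for once, not on every lookup.
conn.execute("""
CREATE TABLE roms_flat AS
SELECT r.checksum, g.title, g.genre
FROM raw_roms AS r
JOIN raw_games AS g ON g.game_id = r.game_id
""")
conn.execute("CREATE INDEX idx_roms_flat_checksum ON roms_flat (checksum)")

print(conn.execute("SELECT title, genre FROM roms_flat WHERE checksum = ?", ("c3d4",)).fetchone())
```

When new data arrives, the raw tables can be dropped and re-staged and the flat table rebuilt, which avoids hand-writing a CREATE TABLE per source.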

#

if you are trying to integrate data, you may want to create a star schema, which is highly denormalized and optimized for querying. You still dump the data in a raw fashion, but then you update the star schema using MERGE

#

if you have a budget, and we are talking about larger amounts of data, you may want to consider Apache Spark (distributed processing) and Delta Lake

#

if you are fetching data and just want to dump it, you don’t have to go through the pain of writing «create table». For example, you can use the pandas method DataFrame.to_sql(), which will create the table for you

undone lagoon
undone lagoon
cedar tiger
#

Thoughts on https://sqlite.ai ?

SQLite AI transforms SQLite into a distributed AI-native database for the Edge—combining the simplicity of SQLite with cloud-powered scalability, fault tolerance, automatic backups, and powerful new extensions like SQLite-AI, SQLite-Vector, SQLite-Sync, and SQLite-JS to enable intelligent applications across devices, IoT, and mobile platforms.

storm mauve
#

ew
vector search is great, but the rest of their AI features sound awful
do they even have the permission to use the SQLite name that way?

#

I cannot imagine any compelling reason to have LLMs integrated directly in the database, especially on edge devices

TTS and ASR are useful technologies, but again, keep it in the application layer instead of having the database deal with audio blobs or whatever you must do to get ASR working within SQLite

uneven bear
#

Hello, DM me, I need a database

lofty nimbus
brave bridge
#

OSI does not allow restricting fields of endeavor (such as "you can't use this commercially"), so it is impossible to incorporate this project into an open-source project. So the license is misleading at best.

frosty vale
tardy latch
tardy latch
# civic cargo Let me go into a bit more details about my problem, then. If you search my name ...

in dbzero you just annotate your classes with @db0.memo and that's all - it handles persistence and caching transparently, allowing you to expand your program's working memory to virtually any scale. And for indexing purposes use the "tags" feature or db0.index then use db0.find to locate your data objects. Visit our documentation website for details and examples: https://docs.dbzero.io/

tardy latch
# frosty vale Looks really interesting! Would I be able to use pydantic with a custom class? I...

Thank you fsade. Absolutely, you can use Pydantic. dbzero is Python-native, so we built it to support most of the language's features seamlessly (more extensions, such as pandas integration, are coming soon). It can be used either schema-on-write (when you enforce strict type validation) or schema-on-read (no need to constrain yourself to rigid schemas). You can also use abstractions and polymorphism with your data objects.

tardy latch
# frosty vale Looks really interesting! Would I be able to use pydantic with a custom class? I...

State is persisted on disk automatically (the default autocommit interval is 367 ms, so roughly 3 times every second). When you gracefully close the program, you can continue from where you left off (so indefinitely running programs are possible). If it crashes, it restarts from the last consistent state. If you write a REST API and want to prevent data loss, use the async locked feature to block the response until persistence of the data is confirmed (for high-stakes requests such as new signups or financial transactions); for most requests we don't recommend it.

tardy latch
tardy latch
frosty vale
frosty vale
tardy latch
# frosty vale That sounds impressive...how did you implement that? How do you handle polymorph...

It's tightly integrated with CPython: when you wrap your class with @memo, dbzero takes control of all operations on the object. The references are stored as 64-bit identifiers, and we manage ref-counts in exactly the same way Python does (so an object stays alive for as long as it's referenced; note that tags also "reference" objects). The internals are quite complex, with many layers of C++ code, but the source is available if you want to go deep 🙂

tardy latch
# frosty vale Okay, so I need to use python? I cannot use a backup program like restic or simi...

Yes, you need to use Python (but the function can be executed from another process, and it's just a few lines of code). By default everything goes to one file (we call it a prefix), but you can split it into many files by specifying the prefix either at the class level, e.g. @memo(prefix = "settings") or @memo(prefix = "data"), or dynamically using the db0.set_prefix function (in which case it must be the first instruction of the init method)

frosty vale
#

That's cool, I will try it out, thank you 🙂

tardy latch
#

Great, thank you :-). Let me know if you have any questions or problems.

frosty vale
#

In your intro video you mention that you are exploring other technologies? I presume other languages? Do you have a roadmap for this already?

tardy latch
#

we have a prototype version for JS, and a C/C++ SDK could be exposed quite easily I guess, but there's no specific roadmap yet. We'll listen to what the community says and navigate towards their needs (we still need to figure out what the best use cases for dbzero would be)

scarlet horizon
#

Does anyone know good YouTube video guides for MySQL?

manic tide
#

MUMPS (Massachusetts General Hospital Utility Multi-Programming System, aka M) has a whitespace-aware syntax where each line of code is syntactically significant. It integrates a built-in NoSQL hierarchical database, accessed using the same commands as local variables. Here it's MUMPS + RAG + Python, just for fun I guess, idk; as seen in the screenshot, it's running in my own HTML/Skulpt-based terminal.
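For readers unfamiliar with M: its "globals" are persistent, sparse, hierarchical key-value trees addressed by subscripts, e.g. `^PATIENT(42,"name")`. The sketch below is only a rough, in-memory Python analogy of that data model (a dict keyed by subscript tuples plus a helper listing the next level of children). It does not implement real M semantics such as persistence, M's collation rules, or `$ORDER`, and the `Global` class is hypothetical.

```python
from typing import Any, Dict, List, Tuple


class Global:
    """Toy stand-in for an M global: sparse nodes keyed by subscript paths."""

    def __init__(self) -> None:
        self._nodes: Dict[Tuple[Any, ...], Any] = {}

    def set(self, *path_and_value: Any) -> None:
        """set(42, "name", "Doe,Jane") stores "Doe,Jane" at subscripts (42, "name")."""
        *path, value = path_and_value
        self._nodes[tuple(path)] = value

    def get(self, *path: Any, default: Any = None) -> Any:
        return self._nodes.get(tuple(path), default)

    def children(self, *prefix: Any) -> List[Any]:
        """Distinct subscripts one level below `prefix`; numbers sort before strings."""
        depth = len(prefix)
        found = {
            key[depth]
            for key in self._nodes
            if len(key) > depth and key[:depth] == prefix
        }
        return sorted(found, key=lambda s: (isinstance(s, str), s))


patient = Global()
patient.set(42, "name", "Doe,Jane")
patient.set(42, "visits", 1, "2024-01-05")
patient.set(42, "visits", 2, "2024-03-17")
patient.set(7, "name", "Roe,Rick")  # sparse: patient 7 has no visits node at all

print(patient.children(), patient.children(42), patient.children(42, "visits"))
```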

delicate fieldBOT
coral wasp
manic tide
#

idk, I just wanted people's thoughts. Using this syntax in Python with RAG has potential advantages:
  • AI-driven healthcare: RAG on patient globals for semantic queries in telemedicine.
  • Legacy integration: bridge old MUMPS systems (e.g., VA hospitals) with Python ML for predictive analytics.
  • Knowledge graphs: store/retrieve hierarchical AI data in edge computing for IoT.
  • Hierarchical, schema-less storage for sparse data (e.g., medical records), with fast key-based access and global persistence.
  • It integrates MUMPS-like efficiency into Python apps without a full DB switch.

I have no experience with MUMPS or medical billing, but what interested me was seeing someone mention that Epic has been using it since the 70s as a reliable DB that exists outside modern syntax. Epic uses MUMPS because it's a powerful, integrated language and database ideal for high-volume healthcare data, offering rapid development, scalability, reliability, and efficient, real-time data handling, which has been crucial for Electronic Health Records (EHRs) since its early days in hospital systems. Pairing this with Python and RAG is the innovation I've added. I'm just trying to find anyone familiar with MUMPS to maybe test it or weigh in. I'm trying to figure out what to build it into; it's the template for something larger (no name yet; MUMPS and RAG are just the DBs it uses within Python).

#

I think the M (MUMPS) and RAG databases exist independently of each other right now. I want to enhance it with MCP functionality and API integration, and possibly medical DB dataset training using a custom model

#

another advantage: niche syntaxes are less prone to cyber threats (not that AI couldn't decipher them instantly, just that people writing malicious software might overlook M syntax)

tardy latch
# manic tide idk I just wanted peoples thoughts, using this syntax in py with rag has potenti...

right, Epic is loyal to MUMPS, but likely for legacy reasons (they've been on the market since the late '70s), the same way COBOL is still prevalent in banking systems. Rewrites of such systems might cost billions, so it's easier to train people to code in them than to take on reimplementation and migration. Is there any specific reason you chose this particular technology? Aren't there any modern alternatives? I understand the use cases; I just wanted to understand your motivation.

manic tide
#

well, the idea is to make MUMPS modern by supporting running it in Python with RAG; that's essentially all I'm going for atm. Old doesn't mean bad: it's known for its short CLI handling, which (conceptually) could speed up model training over larger datasets with less whitespace and fewer characters, while running in a RAG/Python pipeline, etc. That's my vision anyway

tardy latch
manic tide
#

I thought about doing the same thing for COBOL. I figure it's not a way to eliminate old syntax but a way to revive it for modern use by more developers (idk if the companies that rely on it will be happy about this, though)

#

embedded

tardy latch