#databases

1 messages · Page 32 of 1

finite crow
#

Oh

#

Okay. But it lists them in age order in their example.

coral wasp
#

Sure, that's a database decision what order to provide. If you want an order, add it as an order by to the outer query.

#

Otherwise, most databases just say the result is non-deterministic (it could come back in any order)

finite crow
#

It's implementation defined?

coral wasp
#

It depends on how the database decided to execute the query, which may depend on the data. So, you always put an order by if you depend on the order.

finite crow
#

Alright, that makes sense.
Thanks for helping me understand this.

wooden topaz
#

ok. @coral wasp. did like you said.
first i tried below query

SELECT category, product, total_spend,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend 
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query

got the correct result. From this result, i proceeded to next step of filtering only those rows whose row_number <=2, then the error comes.
To put it in simple way, i have added only last line of the below query.....

SELECT category, product, total_spend,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend 
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query
WHERE row_number <= 2

Error:
column "row_number" does not exist. FYI: in PostgreSQL all identifiers (including column names) that are not double-quoted are folded to lowercase. So if you're using any uppercased letters in the names and want to specify them in SQL-query, please double-quote them first. (LINE: 14)

coral wasp
#

Ok, so row_number is not a column. You can't filter on something that isn't a column.

#

Look at the results from the first query. What columns do you see?

wooden topaz
coral wasp
#

Share a screenshot of the result

wooden topaz
coral wasp
#

Yes: but row_number isn't a column in sub_query.

#

The where clause filters the sub_query, before the "select" statement runs. If you want to filter by something you created in the outer select, then wrap it again as a subquery and filter it.

wooden topaz
coral wasp
#

The where clause applies to columns in the sub_query.

#

Your query says: "select <stuff> from subquery where subquery.row_number <= 2"

wooden topaz
coral wasp
#

The where clause is before the select.

#

I know it's confusing.

wooden topaz
coral wasp
wooden topaz
# coral wasp Yes. that's why your code doesn't work.

yeah..now i get it. sorry for bothering you.
So, this is what happening.
First WHERE clause executes & whatever the result comes out of that WHERE clause, SELECT clause executes --> Becasue of this, When WHERE clause is getting executed, the row_number column is not yet there.

coral wasp
#

Yes, that's right.

#

The reason is:

#

If the where clause was after the select, it might change the results. What if you said:

#

select category, row_number() over (order by total_spend) from sub_query where total_spend > 100 and row_number < 3

#

This would be an impossible query... because you'd be filtering on both an input to the row_number (total_spend) AND the result of the row_number. (ok, not impossible, just really weird)

wooden topaz
#

ok. so, what could be the best way to filter out only those rows whose row_number <=2 from the below query:

SELECT category, product, total_spend,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend 
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query
coral wasp
#

select * from (yourquery) where row_number <= 2

wooden topaz
coral wasp
#

The way I would've written this, using a CTE, is:

#
with subquery as (...),
  sq_rownumber as (select .... from subquery)
select * from sq_rownumber 
where row_number <= 2
#

Much more readable, imo.

wooden topaz
coral wasp
wooden topaz
# coral wasp Yes. With a cte or without, it's the same query.

without CTE, it looks like below 🙂

SELECT category, product, total_spend
FROM
(
SELECT *,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend
WHERE EXTRACT(YEAR FROM transaction_date) = '2022'
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query_1
)AS sub_query_2
WHERE row_number <= 2
wooden topaz
coral wasp
wooden topaz
coral wasp
wooden topaz
# coral wasp In this example, it's just a nicer way of writing sub queries, yes. But, in othe...

I prepared notes for CTE as below. pls correct, if you have few minutes, could you pls correct, if there is any wrong.

What is CTE?

CTE stands for common table expressions.
It is advanced version of sub query which differs from sub query in 2 ways.

  1. It is much more readable compared to sub query.
  2. When a subquery has to be used multiple times, you can't use with the help of alias name defined. So, going with CTE is the best option here.

Syntax

WITH my_cte AS (
  SELECT a,b,c
  FROM T1
)
SELECT a,c
FROM my_cte
WHERE ....
coral wasp
#

Yes, that looks correct

wooden topaz
#

I think, a stand alone CTE will give error.
There must be a SELECT statement which selects from that CTE

coral wasp
#

Correct, yes.

wooden topaz
coral wasp
#

You use "with" once:

#
with q1 as (...),
  q2 as (...),
  q3 as (...)
select * from ...
``` (corrected to remove trailing comma)
coral wasp
#

You have a trailing comma

#

Oh sorry, my example had a trailing comma.

wise wind
#

Sqlalchemy was updating instead of deleting the children rows because I didn't specify a delete cascade with my user to dm relationship.

#

I ended up changing my relationship configurations to this: py sent_dms: Mapped[List["Dms"]] = relationship(foreign_keys='Dms.sender', back_populates='sender_user',cascade="all, delete") received_dms: Mapped[List["Dms"]] = relationship(foreign_keys='Dms.receiver',back_populates='receiver_user',cascade="all, delete")

#

Sqlalchemy has some weird behavior.

finite crow
#
SELECT candidate_id
FROM candidates
WHERE (
    (candidate_id, 'Python') IN candidates
    AND (candidate_id, 'Tableau') IN candidates
    AND (candidate_id, 'PostgreSQL') IN candidates
)
;

What's wrong with the syntax here?
syntax error at or near ""candidates"" (LINE: 4)

#

I'm trying to do the 2nd datalemur problem.

#

And how do I do what I'm trying to do? What's the right syntax to do this?

wise wind
finite crow
#

Okay that sort of makes sense, but now it's showing me some insane error that my alias doesn't exist.

SELECT candidate_id AS this_candidate_id
FROM candidates
WHERE (
    (this_candidate_id, 'Python') IN (SELECT (candidate_id, skill) FROM candidates)
    AND (this_candidate_id, 'Tableau') IN (SELECT (candidate_id, skill) FROM candidates)
    AND (this_candidate_id, 'PostgreSQL') IN (SELECT (candidate_id, skill) FROM candidates)
)
;

column "this_candidate_id" does not exist. FYI: in PostgreSQL all identifiers (including column names) that are not double-quoted are folded to lowercase. So if you're using any uppercased letters in the names and want to specify them in SQL-query, please double-quote them first. (LINE: 4)

#

Of course it doesn't exist, it's an alias.

#

I don't get it.

wise wind
coral wasp
#

Hold up a sec

#

Your query is a bit insane

#

Your query says: "Give me all candidates where any candidate knows Python and any candidate knows Tableau and any candidate knows Postgresql?

wise wind
#

It is redundant.

coral wasp
#

Oh soryr, the tuples... nm

#

Who writes like this... don't write queries like this

#
select * from candidates where skill in (x,y,z)
finite crow
#

No, they need to know python, tableau, and postgresql

#

all three of those skills

coral wasp
#

Well, what the heck. What is your data model?

#

Show an example of the data.

finite crow
coral wasp
#

Ah, it's a weird name for the table. I'd call taht "candidate_skills", not "candidates".

finite crow
coral wasp
#

The naive thing to do is something like: ```sql
with python_candidates as (....),
tableau_candidates as (...),
sql_candidates as (...)
select ... from ...

wise wind
coral wasp
#

The fancy version is what Blaziken just posted... you could count the number

#

fwiw, I prefer people understand the first version first, because it's more extensible to different types of problems. The second version (counting) fits only certain cases that are exactly designed for this kind of solution.

finite crow
#

I just want to check if the three tuples are in the table.

wise wind
#

Yeah starting simple is the way to go.

coral wasp
#
select *
from (select distinct candidate_id from candidates) distinct_cands
where distinct_cands.candidate_id in (select * from candidates where skill='Python') ...
#

I don't like that... tho

#

I would've written, if need be:

#
select *
from (select distinct candidate_id from candidates) distinct_cands
left outer join (select * from candidates where skill='Python') python_cands
  on python_cands.candidate_id = ...
#

Or maybe,

#
select candidate_id, candidate_id in (select * from candidates where skill='Python') as knows_python, ...
from (select distinct candidate_id from candidates) distinct_cands
#

(still, professionally, I would've done what Blaziken did)

wise wind
finite crow
#
SELECT DISTINCT candidate_id AS this_candidate_id
FROM candidates AS c1
WHERE (
    EXISTS (SELECT candidate_id FROM candidates AS c2 WHERE c1.candidate_id = c2.candidate_id AND c2.skill = 'Python')
    AND EXISTS (SELECT candidate_id FROM candidates AS c2 WHERE c1.candidate_id = c2.candidate_id AND c2.skill = 'Tableau')
    AND EXISTS (SELECT candidate_id FROM candidates AS c2 WHERE c1.candidate_id = c2.candidate_id AND c2.skill = 'PostgreSQL')
)
;
finite crow
coral wasp
#

Think of how messy this query is

#

The number of times you're visiting candidates

#

My duckdb version: ```py
import duckdb

result_df = duckdb.sql("""
create or replace table candidates as
select * from (values
(123,'Python'),
(123,'Tableau'),
(123,'PostgreSQL'),
(234,'R'),
(234,'PowerBI'),
(234,'SQL Server'),
(345,'Python'),
(345,'Tableau')
) t(candidate_id, skill)
;

select candidate_id, list_has_all(list(skill), ['Python', 'Tableau']) from candidates group by candidate_id
""").df()

print(result_df)

wise wind
coral wasp
finite crow
#

I thought the query optimizer is supposed to deal with stuff like that.

coral wasp
finite crow
#

yes

coral wasp
#

It tries to map your query to the equivalent simplest version, but your query isn't the same as the inner join version.

#

I guess it'd probably map it to an equivalent of:

#
select * from candidates a
join candidates b
  on a.id=b.id
    and b.skill='Python'
join candidates c
  on ...
#

But, things get hard to fathom when you start looking at the cardinalities of each step and how many rows are returned/etc.

#

I'm going to bet that the count solution will beat any of these solutions easily.

coral wasp
finite crow
#

Ok, I thought it could optimize it to not be redundant.

finite crow
#
SELECT candidate_id
FROM candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(skill) = 3
ORDER BY candidate_id;
coral wasp
#

That said, the "distinct skill" isn't needed because the question (I see now) says: "There are no duplicates in the candidates table."

finite crow
#

I want to complete datalemur. But I've done a few of the easy level questions now, and they're about the difficulty of the questions I dealt with on my final exam. Maybe slightly less, but only slightly.

#

I feel like it's one of those situations where the class only actually gave us a surface level of knowledge/skill.

coral wasp
wise wind
#

It's better to do something on your own even though it isn't going to be totally correct or it takes a ton of time.

finite crow
#

My plan is to try to do the problems, THEN look at their solution.

#

And hopefully end up absorbing their way of doing things.

coral wasp
finite crow
torn sphinx
#

Dynamic

wise wind
wooden topaz
#

https://datalemur.com/questions/sql-highest-grossing
I have solved this problem using ROW_NUMBER()

But, the official solution contains RANK()

I analysed for some time & for this problem -- Either you solve with ROW_NUMBER() / RANK() / DENSE_RANK, doesn't matter. The reason is; we are partitioning based on SUM(spend) & then taking top 2 SUM(spend) -- Right?

Amazon SQL Interview Question: Write a query to identify highest-grossing products within each category.

coral wasp
wooden topaz
#

ROW_NUMBER

No matter what the salary is, irrespective of that, ROW_NUMBER() will produce sequence of numbers one after the other.

RANK()

It's like ranking students based on the marks they got. When two students get same marks, they will be allocated same rank, isn't it. In the same way, RANK() works.
But the point to note down is, when it allocates same rank to different persons, it will skip the next rank, which can be better explained with above example.

DENSE_RANK()

It's like ranking students based on the marks they got. When two students get same marks, they will be allocated same rank, isn't it. In the same way, RANK() works.
But the point to note down is, when it allocates same rank to different persons, it won't skip the next rank, which can be better explained with above example.

coral wasp
#

The problem specification does not say what to do when there are ties, therefore any of these options are arguably right.

wooden topaz
#

When to use what?

For example, if you were reporting winners, and always needed to report a first, second, and third place, regardless of ties for each place, you would use DENSE_RANK.

Otherwise, you would use RANK, which might mean that there is no second or third place. If you were certain that there could never be duplicates, then you could just use ROW_NUMBER.

coral wasp
#

I rarely use Rank. row_number is usually what I select first, and then dense_rank if I must consider ties.

#

But, that's all a matter of preference .

wooden topaz
#

When solving question https://datalemur.com/questions/top-fans-rank
Answer i came up with

SELECT a.artist_name, COUNT(a.artist_name) no_of_time_below_10
FROM artists a
JOIN songs s 
ON a.artist_id= s.artist_id
JOIN global_song_rank r  
on s.song_id = r.song_id
WHERE r.rank <= 10
GROUP BY(a.artist_name)
ORDER BY no_of_time_below_10, artist_name
LIMIT 5

The official solution in the solution tab was solved with the help of DENSE_RANK(). Where could have been my query deviated from the official solution? pls..

coral wasp
wooden topaz
#

Corresponding output:

#

official answer in solution tab

WITH top_10_cte AS (
  SELECT 
    artists.artist_name,
    DENSE_RANK() OVER (
      ORDER BY COUNT(songs.song_id) DESC) AS artist_rank
  FROM artists
  INNER JOIN songs
    ON artists.artist_id = songs.artist_id
  INNER JOIN global_song_rank AS ranking
    ON songs.song_id = ranking.song_id
  WHERE ranking.rank <= 10
  GROUP BY artists.artist_name
)

SELECT artist_name, artist_rank
FROM top_10_cte
WHERE artist_rank <= 5;
#

corresponding output:

coral wasp
#

And what's wrong with your answer?

wooden topaz
coral wasp
#

What's your result vs their result? Not the query, the output of the query

wooden topaz
coral wasp
#

I agree the specification is ambiguous here: top 5 ranks vs top 5 artists.

wooden topaz
coral wasp
#

Aren't you ordering ascending? Ariana has the least number, not most.

#

And limit 5 will never work, because it'll truncate ties.

#

But your first issue is the sort order

wooden topaz
coral wasp
#

That makes perfect sense because you're sorting in wrong order.

#

And your artist_rank isn't a rank, it's a count of top 10 hits

coral wasp
#

What does your artist rank represent?

wooden topaz
# coral wasp What does your artist rank represent?

i see this logic. But, my thought process was like this:
Let's consider a class room with 2 students as below

name  marks
peter  10
heidi  20

If i want to rank the students, i have 2 options

  1. RANK()
  2. sort based on RANK which is nothing but, the student with highest marks is 1st rank & the second highest marks will become 2nd rank.
#

both are giving same kind of result. isn't it?

coral wasp
#

What does your artisst_rank represent? It's not a rank.

wooden topaz
coral wasp
#

There you go, that's how you debug!

wooden topaz
# coral wasp There you go, that's how you debug!

no_of_times an artist appeared in the top 10 is different from the rank.
i was thinking no_of_times as artist appeared it self as their rank which is the difference between my answer & official answer.
Thank you very much to be there & help me while i debug...
you are gem of SQL...
So, for an data science interview perspective, apart from the below topic, could you pls list down some more important things, based on your experience?

  1. Aggregate fn's
  2. Window fn's -- Especially ROW_NUMBER(), RANK() & DENSE_RANK()
  3. Cluster index & non Cluster index concept
  4. CTE
coral wasp
#

What I'd say is start with understanding every part of the SELECt statement: https://www.postgresql.org/docs/current/sql-select.html Indices are advanced topics: it's very database specific, though, so you'll really be learning about Postgres.

coral wasp
#

Third, there's other advanced topics, like: what's the difference between different databases. How does Dynamo differ from Snowflake/Duckdb/Clickhouse?

#

Fourth: learning how to express the same query in Pandas vs SQL : all of the problems above could be written in Pandas. You should learn how (if you want to be in a data role)

wooden topaz
#

Thank you BillyBobby. I sent you an friend request. pls accept it(If you are ok)

coral wasp
finite crow
finite crow
#

I finished a hard difficulty one on sql-practice.

#
WITH docs_years (doctor_id, `year`) AS (
    SELECT DISTINCT doctor_id,
        YEAR(admission_date)
    FROM admissions
        JOIN (
            SELECT DISTINCT doctor_id
            FROM doctors
        )
), adm_map AS (
    SELECT *
    FROM docs_years
        LEFT JOIN admissions AS a
            ON docs_years.doctor_id = a.attending_doctor_id AND docs_years.`year` = YEAR(a.admission_date)
        JOIN doctors AS d
            ON docs_years.doctor_id = d.doctor_id
)
SELECT doctor_id,
    first_name || ' ' || last_name                           AS doctor_full_name,
    specialty,
    `year`,
    CASE WHEN patient_id IS NULL THEN 0 ELSE COUNT(*) END    AS total_admissions
FROM adm_map
GROUP BY doctor_id,
    `year`
;
#

But I overcomplicated it because it looks like there aren't any doctors with no admissions.

coral wasp
#

Let's take apart the first subquery: ```sql
SELECT DISTINCT doctor_id,
YEAR(admission_date)
FROM admissions
JOIN (
SELECT DISTINCT doctor_id
FROM doctors
)

#

What you're looking for is (something like):

#
select doctor_id, YEAR(admission_date) year, count(*) num_admissions
from doctors
  left outer join admissions
  on doctors.doctor_id = admissions.doctor_id
group by doctors.doctor_id, year
coral wasp
finite crow
#

Thanks, but would that still work if there were doctors who had no admissions in certain years?

coral wasp
#

That's why it's a left outer join

finite crow
#

Oh right, I see now.

#

would num_admissions be null?

#

Or would it be 1?

#

Oh yeah, I was thinking that if year was null, then COUNT(*) would end up being 1

wooden topaz
#

With related to https://datalemur.com/questions/signup-confirmation-rate
Below query produces result of 0.33 which is correct answer.

WITH basic_data AS(
SELECT e.email_id d, t.email_id n
FROM emails e  
LEFT JOIN texts t
ON e.email_id = t.email_id
AND t.signup_action = 'Confirmed')

SELECT ROUND(COUNT(n)::DECIMAL / COUNT(d),2)
FROM basic_data

If i remove the DECIMAL from last but one line, then it results in 0. what could be the reason, pls..

hollow oar
# wooden topaz With related to https://datalemur.com/questions/signup-confirmation-rate Below q...
finite crow
#

Except on the last one, I can't figure out why it's SUM(unit_price * quantity * discount) instead of just SUM(quantity * discount)

coral wasp
#

How could you debug this? How would you write a query that gives more insight into what's happening?

unkempt prism
#

at least one product from each product category.

I don't see any reference to product_category in your attempt.

finite crow
#

Are you expected to be really knowledgeable in statistics for most jobs that use SQL?

unkempt prism
#

by summing the product count you are discarding the category

A better way is to use distinct. This should work in postgres. I'm unsure with SQL Server though I assume there is some sort of equivalent syntax.

SELECT cc.customer_id,
FROM customer_contracts cc  
JOIN products p   
ON cc.product_id = p.product_id
GROUP BY cc.customer_id
HAVING COUNT(distinct product_category) >=3 
ORDER BY customer_id
#

Saying that you may get the right answer if you use COUNT and not SUM here. to make it SELECT customer_id, COUNT(product_count) That way if one category has 2 products its not over represented in the result as the question doesn't ask for it.

finite crow
#

Why would the expected output not include month 6? It shows right there that user 445 was active that month and the previous month. I don't get it.

coral wasp
finite crow
#

No that is my output

finite crow
#

I see now.

#

Write a query to obtain number of monthly active users (MAUs) in July 2022

#

I missed the "July". I thought it wanted the whole year.

#

That seems easier.

wooden topaz
#

The solution i came up with is:

#
WITH basic_data_with_count AS(
SELECT cc.customer_id, p.product_category, COUNT(product_name) product_count
FROM customer_contracts cc  
JOIN products p   
ON cc.product_id = p.product_id
GROUP BY cc.customer_id, p.product_category)

SELECT customer_id, SUM(product_count)
FROM basic_data_with_count
GROUP BY customer_id
HAVING SUM(product_count) >=3
ORDER BY customer_id
#

not getting where i went wrong...

coral wasp
# wooden topaz not getting where i went wrong...

Think about how to rewrite this query to get more information about what it's returning. For instance: which products and product categories did each customer see? If you do that, you'll see your mistake.

reef ermine
coral wasp
#

A lot of new developers stare at their code trying to find the bug, rather than getting more information.

echo rampart
#

what is this

wooden topaz
#
SELECT cc.customer_id, p.product_category, p.product_name
FROM customer_contracts cc  
JOIN products p  
ON cc.product_id = p.product_id  
#

now i have basic data in my hand.
what i need to do is: find whether customer is buying at least one product per each product_category. I am thinking on this now.

#

came up with this:

#
SELECT cc.customer_id, p.product_category,
CASE
  WHEN COUNT(p.product_name) >= 1 THEN 1
  ELSE 0
END AS product_buyed
FROM customer_contracts cc  
JOIN products p  
ON cc.product_id = p.product_id 
GROUP BY cc.customer_id, p.product_category
wooden topaz
#

pls leave above...i came up with new approach

#

SELECT cc.customer_id, p.product_category
FROM customer_contracts cc  
JOIN products p  
ON cc.product_id = p.product_id 
GROUP BY cc.customer_id, p.product_category
HAVING COUNT(cc.customer_id) >= 3
ORDER BY cc.customer_id
coral wasp
#

It's really hard to debug when you compute the count in the having clause. Move the count to the select, and reference it in the having: select count(...) as mycount from ... having mycount>3. Then, use something like group_concat(...) to inspect the values.

coral wasp
wooden topaz
#

finally....

wooden topaz
wooden topaz
#

below is still more improved answer

WITH basic_data AS(
SELECT cc.customer_id, p.product_category
FROM customer_contracts cc
JOIN products p
ON cc.product_id = p.product_id 
GROUP BY cc.customer_id, p.product_category)

SELECT customer_id
FROM basic_data
GROUP BY customer_id
HAVING COUNT(product_category) >= (SELECT COUNT(DISTINCT product_category)
FROM products)


wooden topaz
#

may be the measurement_id is potraied as integer in question, but it could be char behind the scenes

half gyro
wise wind
# wooden topaz

I recommend taking the time to figure the problem out on your own. You're not always going to have someone help you.

half gyro
wise wind
grim quest
#

you guys know any good SQL interview prep sources? i'm looking for something that at least sorts its problems by category, like tagging them as joins for example

#

this is the only one i've found but i just wanna see if i can find some more variety in the questions

coral wasp
finite crow
#

How much statistics do you need to know for most jobs that use SQL?

#

I tried to do one of the easy statistics questions and couldn't do it.

#

I didn't understand the terminology or what it was asking.

coral wasp
finite crow
#

Basically ANY statistics

coral wasp
coral wasp
finite crow
#

I basically only know mean, median, and mode

finite crow
coral wasp
#

What major?

finite crow
#

CS

coral wasp
#

I've never seen one that doesn't.

finite crow
#

No, I've never taken stats

#

I took calculus that I've mostly forgot since it's been awhile.

coral wasp
#

That's weird. Well, I think it's expected that SWEs should have taken engineering stats. That's my experience, at least. And this question is like chapter 1 of engineering stats

#

Even business majors cover this in their lighter business stats course (non calc based)

finite crow
#

Is it hard?

#

Could I just look at khan academy?

coral wasp
finite crow
#

Lmk if you know of any resources to learn it quickly. I don't know any statistics at all.

#

I have summer off. Then I need to go back and I won't have any time.

coral wasp
#

Off the top of my head, no, but it's a normal course that is probably very similar at every Uni. Find a syllabus/cirriculum and maybe ask in #data-science-and-ml

#

(More math ppl there than here, I think)

finite crow
#

Alright, I'll ask there. Thanks for telling me about this.

echo rampart
#

so good

finite crow
# finite crow That seems easier.

Got it.

WITH months AS (
    SELECT DISTINCT user_id,
        EXTRACT(MONTH FROM event_date) AS "month"
    FROM user_actions
    WHERE (
        (
            EXTRACT(YEAR FROM event_date) = 2022
            AND (EXTRACT(MONTH FROM event_date) = 6 OR EXTRACT(MONTH FROM event_date) = 7)
        )
    )
)
SELECT current_month.month,
    count(*) AS monthly_active_users
FROM months AS current_month
WHERE EXISTS (
    SELECT months.user_id,
        months.month
    FROM months
    WHERE months.month = current_month.month - 1 AND months.user_id = current_month.user_id
)
GROUP BY current_month.month
;
pliant compass
#

Hello everyone,
I have a quick question. I’m building an app and I’m trying to connect it to an online database, but I actually don’t know how to do it. I keep getting an error saying I don’t have privileges, but gave my user all the privileges. Also, how do I setup an online database for my app. Can I use regular website hosting plan and just create a database?

wise goblet
#

Databases can be very different

#

And it also sounds like u use mysterious provider of some managed databases. Which once again we can't know which one u use

pliant compass
#

I'm using Python and the provider is Hostinger

#

Hostinger for web hosting

#

So what I did, was I created a tic tac toe game, but I need to connect it to an API, which means I need a database

#

So I used a web hosting platform because it has a database, but to me that doesn't seem like the right solution

wise goblet
wise goblet
wise goblet
wise goblet
#

no library installing is necessary in python for its usage

#

but using it with ORM is ergh... more simple if necessary

#

i would say, using migrating system in python is kind of obligatory to have simple usage of SQL

#

SQLAlchemy + Alembic is strongly encouraged to use in python to have very simple life with SQL
(or Django ORM for Djangoists)

pliant compass
#

Yeah, I was using SQL. What other database is there? I thought almost all used SQL

#

I don't know oif there is difference between Sql and sqlite3, but the setup I used was very confusing 😁

storm mauve
#

SQL is a structured query language, used to interact with relational databases

relational databases store data in a format similar to spreadsheets, with well defined rows and columns, and nearly all of them do use SQL, but there are also a lot of other databases which store data in less structured formats, such as documents (MongoDB) or even graphs (Neo4J)

pliant compass
#

Oh, well, yes mine is a relational database for sure

storm mauve
#

SQLite, PostgreSQL and others are databases, which you can interact with using the SQL language

#

SQL is only used for queries though, you still need of a programming language like Python to interact with other things, and you need of a program that actually sends the query to the database

pliant compass
#

Yeah, so I'm doing that now using Python, but the problem lies in the query

#

It just doesn't work 😕

#

I keep getting an error saying my user doesn't have privileges, so I don't know if it's the database or the provider at this point

storm mauve
#

I would recommend using SQLite to make sure it works, before you try to debug connecting to an online database

pliant compass
#

OK 👍

storm mauve
wise goblet
#

Nice for pet projects

storm mauve
#

there is:

  • The database itself
  • Whatever is hosting it and managing external connections to the database (their network -> their database) *****
  • Whatever you are using to connect to it (your network -> their network) *
  • Your program itself (your program -> your network)

Right now it is probably on that * level aka 'fix your credentials'

wise goblet
finite crow
#
SELECT profit.year,
    profit.product_id,
    profit.curr_year_spend,
    CASE
        WHEN (
            profit.product_id = LAG(product_id, 1) OVER (ORDER BY profit.year, profit.product_id)
            THEN LAG(product_id, 1) OVER (ORDER BY profit.year, profit.product_id)
        )
    ELSE
        NULL
    END AS prev_year_spend,

Why is this invalid syntax?

#

syntax error at or near "THEN"

#

no wait

finite crow
#

nvm, apparently it was the parentheses

finite crow
# coral wasp See lag() window function

Thanks again, I got one with it:

WITH profit (product_id, "year", curr_year_spend, prev_year_spend) AS (
    SELECT product_id,
        EXTRACT(YEAR from transaction_date),
        SUM(spend),
        CASE
            WHEN
                product_id = LAG(product_id, 1, NULL)
                OVER (ORDER BY product_id, EXTRACT(YEAR from transaction_date))
                THEN LAG(SUM(spend), 1, NULL)
                OVER (ORDER BY product_id, EXTRACT(YEAR from transaction_date))
        ELSE
            NULL
        END AS prev_year_spend
    FROM user_transactions
    GROUP BY product_id,
        EXTRACT(YEAR from transaction_date)
    ORDER BY product_id,
        EXTRACT(YEAR from transaction_date)
)
SELECT profit.year,
    profit.product_id,
    profit.curr_year_spend,
    profit.prev_year_spend,
    CASE
        WHEN
            profit.prev_year_spend IS NOT NULL
            THEN ROUND(((profit.curr_year_spend - profit.prev_year_spend) / profit.prev_year_spend) * 100, 2)
        ELSE
            NULL
        END AS yoy_rate
FROM profit
ORDER BY profit.product_id,
    profit.year
;

https://datalemur.com/questions/yoy-growth-rate

coral wasp
#

A few things: DRY: don't repeat yourself. You have extract(year..) four times.

#

I don't understand that case in the CTE. Looks like a hack that could be replaced with PARTITION BY product_id

#

well, I'll eave that there.

finite crow
coral wasp
#

That's not the only reason we don't repeat ourselves.

finite crow
#

Would you give it an alias and then use the alias?
I remember trying to do that at a certain point, and it said the column doesn't exist, so I've just been repeating expressions instead.

coral wasp
#

Push it into a subquery and give it an alias there.

mild gyro
#

do i have to install mongodb to access it or there is another way

storm mauve
mild gyro
#

so i dont have to ..
thanks!

wooden topaz
storm mauve
#

oh wait, never mind

wooden topaz
#

Currently, you're analyzing the highest and lowest open prices for each FAANG stock by month over the years.

input data is for every month, highest stock price & lowest stock price
from that input data, i need to pick up below per stock
which month has highest stock price & what is that month
which month has lowest stock price & what is that month

storm mauve
#

I'd say just try doing it without grouping by and see what you get

#

I am not signing up to see their Solution though

finite crow
wooden topaz
#

This is actually correct output. Now, all i want it what is the corresponding month where this highest_open & lowerst_open are happening.
But, i try to include blow in the SELECT statement, it says it needs to be included in the GROUP BY & on the other hand, i should not GROUP BY month. How to deal with this dichotomy?

TO_CHAR(date, 'Mon-YYYY') mth
coral wasp
coral wasp
storm mauve
#

not sure if you can do something like WHERE/HAVING min/max

coral wasp
wooden topaz
coral wasp
#

Another approach is to use row_number to rank the values, then filter on rows where rank = 1

wooden topaz
coral wasp
wooden topaz
coral wasp
#

I don't understand. If you want to group by the month, then you need to group by the month?

wooden topaz
coral wasp
#

Didn't the question ask for which month has the highest or lowest?

wooden topaz
#

i could be wrong also. in that case, pls tell we where i mis undertood the question....

coral wasp
wooden topaz
#

but, when you dont GROUP BY, answer is
march 50

coral wasp
wooden topaz
coral wasp
#

Don't think of it as a "single query". Think of it as a sequence of questions:

#

Start at the end: What information do you need to know which months are the highest?

#

The idea is to work backwards from the result, and construct subqueries that make it easier

finite crow
#

does x != y evaluate to FALSE if y is NULL and x isn't?

#

WHERE user_id = lag2 AND (user_id != lag3 OR lag3 IS NULL)
There was a WHERE clause that didn't work until I wrote it like that.

remote leaf
coral wasp
coral wasp
#

(after you knwo answer, I'm happy to explain)

finite crow
coral wasp
finite crow
#

I know that null isn't equal to null. But I didn't check if it's inequal to null.

#

You know what I mean?

coral wasp
#

You should check. Very important.

finite crow
#
SELECT *
FROM candidates
WHERE (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
)
LIMIT 1
;

This gets no data.

#
SELECT *
FROM candidates
WHERE NOT (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
)
LIMIT 1
;

This also gets no data.

#
SELECT *
FROM candidates
WHERE NOT (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
) IS NULL
LIMIT 1
;

This also gets no data.

#
SELECT *
FROM candidates
WHERE (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
) IS NULL
LIMIT 1
;

This DOES get data.

#

That seems really weird.

coral wasp
#

You should build a truth table

finite crow
#

Is there an existing truth table?

#

Or a set of rules for this?

#

It seems like any comparison involving NULL evaluates to NULL.

coral wasp
#

The idea is this - null represents absence of knowledge.

#

You can't reason about two unknowns.

#

Does "I don't know" equal "I don't know?

#

Does "I don't know" not equal "I don't know"?

finite crow
coral wasp
#

Test it

finite crow
#

It seems like it does, but I'm not really sure.

#

And google isn't telling me yes or no.

#

Idk if there are exceptions to it evaluating to NULL.

coral wasp
finite crow
#

I mean if just not sure if the tests I did are comprehensive enough.

#

I just did some comparison tests, like less than, greater than, equal to, inequal to

#

You know what I mean?

coral wasp
brazen charm
#

Normally NULL handling differs from DB to DB as well

finite crow
#

Even for something like THAT?

coral wasp
finite crow
#

I hope so

coral wasp
#

there are differences in some of the edge cases, like concatenation with nulls, for instance

finite crow
#

So basically think of NULL as "it could be anything" instead of "nothing"?

#

That's what it seems like.

coral wasp
#

Yes exactly

finite crow
#

Alright then, that clears that up. Thank you.

coral wasp
thorny anchor
#

sqlserver 🗿

elfin cradle
#

!paste

#

what do i even do

#

db = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="root"
)```
#

im just tryna setup mysql 😭

elfin cradle
#

i dont how this even works tbh im just tryna figure it out as i go

elfin cradle
#

im an idiot

#

whoops

elfin cradle
#

ty

cedar tiger
elfin cradle
#

sorry i know im asking to be spoonfed but im confused

cedar tiger
elfin cradle
#

yes

#

works fine

#

did it thru workbench

cedar tiger
#

OK then run the commands or alter your user to use the mysql_native_password. Not sure how that looks like in workbench

#

But its something to edit your user in mysql

elfin cradle
#

wdym alter user

cedar tiger
#

alter also means "change"

elfin cradle
#

ik but how

cedar tiger
#

🤷‍♂️

#

Idk what that looks like in workbench

#

But

#

The command stated in that answer shows it can be done via CLI

elfin cradle
#

so would i need to run those commands in command prompt?

#

cli?

cedar tiger
#

If you can login to your mysql server via command prompt/cli, then you can alter your user from there (by following the commands shared)

torn sphinx
#

Can someone help me complete the following process? I am trying to download Postgres to do this but I dont know which version I need and what packages to use?
Setting up a Django project locally.
Setting up a MySQL or Postgres database locally, ideally using Docker.
Make sure things can run locally, all connected.
Understand what properties are being edited to control the connection between the Django app and database.
Use ENV variables to define those properties.
Figure out how to Dockerize the Django app
Deploy a cloud database. (AWS RDS? but that costs money? other suggestions)
Set up users and grants in the cloud database.
Deploy the Django app in a cloud provider with the right ENV variables so it points to your cloud database

fading patrol
torn sphinx
fading patrol
torn sphinx
fading patrol
finite crow
#
WITH cte1 AS (
    SELECT SUM(num_users) OVER (ORDER BY searches) user_count,
        searches
    FROM search_frequency
    WHERE num_users > 0
    ORDER BY searches
), cte2 AS (
    SELECT user_count                                 AS uc_high,
        searches                                      AS s_high,
        LAG(user_count) OVER (ORDER BY user_count)    AS uc_low,
        LAG(searches) OVER (ORDER BY user_count)      AS s_low
    FROM cte1
    ORDER BY user_count
), cte3 AS (
    SELECT MAX(uc_high) AS uc_total
    FROM cte2
), cte4 AS (
    SELECT *
    FROM cte3
        CROSS JOIN cte2
    WHERE uc_low <= (CAST(uc_total AS DECIMAL) / 2) AND uc_high > (CAST(uc_total AS DECIMAL) / 2)
), cte5 AS (
    SELECT
        CASE
            WHEN MOD(uc_total, 2) = 1 THEN s_low
        ELSE
            CASE
                WHEN uc_low = (CAST(uc_total AS DECIMAL) / 2) THEN (CAST(s_low AS DECIMAL) + s_high) / 2
            ELSE
                s_low
            END
        END AS median_not_rounded
    FROM cte4
)
SELECT ROUND(median_not_rounded, 1) AS median
FROM cte5
;

https://datalemur.com/questions/median-search-freq
Their solution did this in ten lines. It's kind of frustrating.

finite crow
#
WITH searches_expanded AS (
  SELECT searches
  FROM search_frequency
  GROUP BY 
    searches, 
    GENERATE_SERIES(1, num_users))

SELECT 
  ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY searches)::DECIMAL, 1) AS  median
FROM searches_expanded;

The searches_expanded part makes no sense at all to me. grouping by searches, GENERATE_SERIES(1, num_users)) should be exactly the same as grouping by searches, num_users.

#

Because it just returns a series of numbers from 1 to num_users.

#

Also, when I put GENERATE_SERIES(1, num_users) in the SELECT clause, it evaluates to integers, and I can't figure out where they're coming from. I don't get what it's doing.

#

The documentation doesn't explain it either.

hollow oar
#

you can disregard the group by abuse that the "model answer" uses tbh.

#

i.e. just use

WITH searches_expanded AS (
  SELECT searches, GENERATE_SERIES(1, num_users)
  FROM search_frequency
)

SELECT 
  ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY searches)::DECIMAL, 1) AS  median
FROM searches_expanded;
finite crow
hollow oar
#

of which

This section describes functions that possibly return more than one row.
possibly return more than one row. is the key point

finite crow
#

Ok, I think I see now.

#

Doesn't that blow up the number of rows?

#

It seems like it would be kind of inefficient.

hollow oar
#

yeah i would assume so.

finite crow
#

Yeah wtf, it would end up generating the trillions of rows like you're supposed to be avoiding.

#

I don't get what their idea is.

finite crow
finite crow
#

I kept having a hard time with stuff like not being able to nest window functions, or not being able to look at the maximum in a column in the WHERE clause. Stuff like that made it so I couldn't figure out how to reduce the number of CTEs

hollow oar
#

i think this is just a quiz on your knowledge on window function, if you have written SUM(...) OVER(...) then i am already happy as an interviewer tbh.

#

the rest is just an execrise of how careful you are

#

i would have used a bunch of scalar query instead of cross joins

WITH cumsum AS (
  SELECT
    SUM(num_users) OVER(ORDER BY searches) AS running_sum, searches
  FROM
    search_frequency
), total_users AS (
  SELECT SUM(num_users) AS val FROM search_frequency
), mid_point AS (
  SELECT val::decimal / 2.0 AS val FROM total_users
), aux AS (
  SELECT
    (SELECT searches FROM cumsum WHERE running_sum <= (SELECT val FROM mid_point) ORDER BY running_sum DESC LIMIT 1) AS bot
    , (SELECT searches FROM cumsum WHERE running_sum > (SELECT val FROM mid_point) ORDER BY running_sum ASC LIMIT 1) AS top
    , MOD((SELECT val FROM total_users), 2) = 0 AS should_avg 
)
SELECT 
  CASE WHEN should_avg THEN (bot + top)::decimal / 2.0
  ELSE bot
  END
FROM aux
#

(not sure if the above is correct as well tbh.)

finite crow
#
SELECT ROUND(
  CASE WHEN should_avg THEN (bot + top)::decimal / 2.0
  ELSE bot
  END,
  1
)
FROM aux
#
WITH cte as(
    SELECT *,
        SUM(num_users) OVER(ORDER BY searches)    AS cumsum_users,
        SUM(num_users) OVER()                     AS tot_users
    FROM search_frequency
)
SELECT ROUND(AVG(searches), 1) AS median
FROM cte
WHERE (
    tot_users <= cumsum_users * 2
    AND tot_users >= (cumsum_users - num_users) * 2
)
;

Here's one someone posted in the discussions. The first comment posted there, the one with this answer, points out how inefficient the recommended answer is.

finite crow
#

SUM(something) OVER ()
That's one of the things I was looking for.

#

I could have done that instead of the cte that just joins with the MAX(uc_high) cte.

finite crow
#

I don't have a very good idea of how smart query optimizers are.

coral wasp
coral wasp
#

Why not use percentile_cont?

finite crow
#

Because then you need to generate two trillion rows.

coral wasp
#

Why would it generate two trillion rows?

finite crow
#

Because it says there are two trillion users, so using GENERATE_SERIES for each num_users would create that number of rows.

finite crow
#
SELECT CAST(ROUND(CAST(SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END), FLOAT) / COUNT(*), 2), TEXT) || '%' AS percent_of_male_patients
FROM patients
;

near ",": syntax error
Why is there a syntax error in this?

finite crow
#
SELECT CAST(ROUND((CAST(SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) AS TEXT) || '%' AS percent_of_male_patients
FROM patients
;

This fixed it.

#
SELECT round(100 * avg(gender = 'M'), 2) || '%' AS percent_of_male_patients
FROM patients
;

Apparently this works too.

#

Is giving a boolean expression to an aggregate function only possible on certain SQL implementations? It lets me do it on sql-practice but not on datalemur.

coral wasp
torn sphinx
# torn sphinx i did it!

now how do i use this to make a database? should i make it in docker or use docker to make it somewhere else?

#

I am trying to figure out how to deploy a django REST api with a MySQLdatabase on an AWS EC2 server
In order to do this should I still use Postgres/Docker to access MySQL or should I use AWS RDS or something?

finite crow
finite crow
#

I completed all the available hard difficulty SQL questions on datalemur, sql-practice, and leetcode. But I still find that I need to look at the documentation a lot.

wild lotus
#

hello if someone could help me that would be great. its stupid but i need help merging a csv file into my pycharm ide file

#

oh wow this indian lady just helped me on youtube, 1 hour of trying solved in 30 seconds, #indian people are great

#

does this look accurate? now i can edit and manipulate the data is that correct?

waxen finch
#

@woeful coral #discord-bots message
yes, there are a few different sqlite extensions that provide encryption support

personally i found SQLite3MultipleCiphers the most convenient mainly because it provides prebuilt windows DLLs (see their releases page), but also because its completely free compared to SQLCipher and sqlite's official encryption extension, SEE

fading patrol
finite crow
coral wasp
torn sphinx
#

Anyone experienced with setting up local mongodb?

#

I’m trying to decide if I should do my apps rate limiting/usage tracking through local mongodb or redis and shared mongodb which gets periodically synced.

Obviously I want speed, the reason I’m doing this in the first place, however if the speed isn’t THAT different from local mongodb and redis id gladly take the little slowdown of mongodb to make it easy and no syncing needed. But if the difference is substantial or not even that far from shared mongodb time then might as well go through the work of redis and syncing

#

I tried installing local mongodb got ton of errors that seemed common but all the common fixes did nothing for me…

Because of this, I wasn’t able to run any benchmarks to see what my best choice would be.

So I’m asking for either help setting it up locally on Ubuntu or with giving me your experiences using local mongodb to get an idea of the speed. Thanks

keen minnow
#

<@&831776746206265384> shitposting

radiant elbow
#

!cban 1245012707523170399 You seem to have joined just to troll.

delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied ban to @bleak plover permanently.

fading patrol
daring laurel
#

Howdy there, having some issues with my sqlalchemy set up. I am getting a reference error. I have tried forward referencing, and using the typing's List type rather than the build in list. I am still getting blocked and am very confused. Any advice would be much appreciated. I have included the code below as well.

sqlalchemy.exc.ArgumentError: Could not locate SQLAlchemy Core type for Python type typing.List[services.host.ip.ip_record_sqlalchemy.IpRecordSqlAlchemy] inside the 'ip_history' attribute Mapped annotation

ip_record_sqlalchemy.py

from datetime import datetime
from typing import TYPE_CHECKING
from uuid import uuid4

from services.automatic_fiesta_base import AutomaticFiestaBase
from services.host.ip.ip_common import IpDTO
from sqlalchemy.orm import Mapped, mapped_column, relationship

if TYPE_CHECKING:
    from services.host.host_record_sqlalchemy import HostRecordSqlAlchemy


class IpRecordSqlAlchemy(AutomaticFiestaBase):
    __tablename__ = "ips"

    id: Mapped[int] = mapped_column(primary_key=True, default=uuid4().int)
    address: Mapped[str] = mapped_column()
    last_seen: Mapped[datetime] = mapped_column()
    host: Mapped["HostRecordSqlAlchemy"] = relationship(back_populates="ip_history")


def ip_record_to_dto(ip_record: IpRecordSqlAlchemy) -> IpDTO:
    return IpDTO(
        address=ip_record.address,
        last_seen=ip_record.last_seen,
    )

host_record_sqlalchemy.py in thread.

daring laurel
# daring laurel Howdy there, having some issues with my sqlalchemy set up. I am getting a refere...

oops, i don't see an option for threads, so here it goes:

host_record_sqlalchemy.py

from datetime import datetime
from typing import TYPE_CHECKING, List
from uuid import uuid4

from pydantic import PositiveInt
from services.automatic_fiesta_base import AutomaticFiestaBase
from services.host.host_common import (
    HostDTO,
    HostRecordNotFoundException,
)
from services.host.host_record_base import HostRecordDatastoreABC
from services.host.ip.ip_common import IpDTO
from services.host.ip.ip_record_sqlalchemy import IpRecordSqlAlchemy, ip_record_to_dto
from services.scan.directory_scan.directory_scan_record_sqlalchemy import (
    directory_scan_record_to_dto,
)
from services.sqlalchemy.sqlalchemy_database import SqlAlchemyDatabase
from sqlalchemy import select
from sqlalchemy.orm import Mapped, mapped_column, relationship

if TYPE_CHECKING:
    from services.scan.directory_scan.directory_scan_record_sqlalchemy import (
        DirectoryScanRecordSqlAlchemy,
    )


class HostRecordSqlAlchemy(AutomaticFiestaBase):
    __tablename__ = "hosts"

    id: Mapped[int] = mapped_column(primary_key=True, default=uuid4().int)
    hostname: Mapped[str] = mapped_column()
    os_type: Mapped[str] = mapped_column()
    os_version: Mapped[str] = mapped_column()
    os_full_type: Mapped[str] = mapped_column()
    ip_history: Mapped[List["IpRecordSqlAlchemy"]] = mapped_column(back_populates="host")
    directory_scans: Mapped[List["DirectoryScanRecordSqlAlchemy"]] = relationship(
        back_populates="host"
    )
    check_ins: Mapped[List[datetime]] = mapped_column()
daring laurel
# daring laurel oops, i don't see an option for threads, so here it goes: host\_record\_sqlalch...

I realized that I did not do a forward ref in host_record_sqlalchemy.py. I have updated the type checking to this:

if TYPE_CHECKING:
    from services.scan.directory_scan.directory_scan_record_sqlalchemy import (
        DirectoryScanRecordSqlAlchemy,
    )
    from services.host.ip.ip_record_sqlalchemy import IpRecordSqlAlchemy

However, getting similar error

sqlalchemy.exc.ArgumentError: Could not resolve all types within mapped annotation: "sqlalchemy.orm.base.Mapped[typing.List[ForwardRef('IpRecordSqlAlchemy')]]".  Ensure all types are written correctly and are imported within the module in use.
daring laurel
torn sphinx
#

I am running this command correctly by using two lines?

jagged light
#

bash-4.4# is part of the prompt. Don't copy it when following guides.

torn sphinx
jagged light
#

you can't. aws databases are hosted, so you can't access the console.

#

run mysql on a ec2 instance using the host of the database

torn sphinx
jagged light
#

Is your database exposed to the internet?

torn sphinx
jagged light
#

then you have to do it from an ec2 instance.

#

This is all aws, right?

torn sphinx
torn sphinx
jagged light
#

install the mysql-client

#

sudo yum install mysql

#

you might also be able to use dnf instead of yum depending on the os version

#

It's dnf on 2023

torn sphinx
torn sphinx
jagged light
#

Looks like you need to manually add the mysql repo

fading patrol
torn sphinx
fading patrol
torn sphinx
#

they time out or otherwise have errors so i have to restart them (unless i am doing something wrong?)

torn sphinx
#

"Install the downloaded release package with the following command, replacing platform-and-version-specific-package-name with the name of the downloaded RPM package:" What does this mean, after I download the package I have to manually edit it through powershell or something?

torn sphinx
jagged light
#

amazon linux uses dnf, not apt

torn sphinx
#

Hi, i'm trying to get mysql working with my flask application but i keep getting AttributeError: 'NoneType' object has no attribute 'cursor' when i try to get any data from my table. I've double checked the config details.

fading patrol
west blaze
#

Is there a way to generate and write migration files for postgres in pure python anyone can recomend?

#

Like, I'd like to make a migrationfile that creates a couple of tables. Any packages that would allow me to generate such a file and ensure that the migration is applied to the db?

fading patrol
wooden topaz
#

Now, need one more column TO_CHAR(date, 'Mon-YYYY') mth in the above query. could any one pls tell me how to do this.

west blaze
finite crow
#

When you're using MIN and/or MAX, is there a way to list values associated with the minimum and maximum rows aside from the minimum and maximum values?

finite crow
fading patrol
coral wasp
coral wasp
finite crow
dawn moss
#

yo currently using postgresql with fastapi & sqlalchemy

#

was wondering how should I implement RLS or RBAC

wooden topaz
# coral wasp If you're still stuck: if you know the highs for every stock, then you can join ...

finally: after lot of tries & tries since 3 days. I mean, there is a solution in the site, but looking at that & proceeding in the same way would gives the answer, but i thought of trying on my own, step by step by understanding...

SELECT 
  ticker,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MAX(t1.open)) highest_mth,
  MAX(t1.open) highest_open,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MIN(t1.open)) lowest_mth,
  MIN(t1.open) lowest_open
FROM stock_prices t1
GROUP BY ticker
ORDER BY ticker 
#

By the way, i did it using corelated sub query, on the other hand you suggested to use JOINS right. I did not get that.
could you pls modify the above query using JOINS & show me once(Nothing urgent, whenever you have few minutes). Thx

coral wasp
coral wasp
wooden topaz
wise wind
#
        dm = Dms(sender=username_1,receiver=username_2)
        dm_message = Dm_Messages(text="hi",username=username_1,date=datetime.now())
        dm.dm_messages.append(dm_message)```
#

UPDATE dm_messages SET dm=$1::INTEGER WHERE dm_messages.id = $2::INTEGER

#

[generated in 0.00020s] (None, 1)

#

When I try to create a related sqlalchemy model via relationship, it runs this query under the hood.

#

This is leading to a not null constraint error. Is there anyway to avoid this behavior?

coral wasp
wooden topaz
#
SELECT 
  ticker,
  MAX(t1.open) highest_open,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MAX(t1.open)
    LIMIT 1) highest_mth,
  MIN(t1.open) lowest_open,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MIN(t1.open)
    LIMIT 1) lowest_mth
FROM stock_prices t1
GROUP BY ticker
ORDER BY ticker
coral wasp
#

Thats doesn't fix the problem correctly: what if two tickers had the same max in different days?

wooden topaz
# coral wasp Thats doesn't fix the problem correctly: what if two tickers had the same max in...

any ways, i tried new approach:

WITH highest_values AS(
SELECT ticker,
MAX(open) highest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS row_num
FROM stock_prices
GROUP BY ticker, open, date 
ORDER BY ticker),

lowest_values AS(
SELECT ticker,
MIN(open) lowest_open,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS row_num
FROM stock_prices
GROUP BY ticker, open, date 
ORDER BY ticker)

SELECT 
highest_values.ticker,
highest_values.highest_mth, 
highest_values.highest_open,
lowest_values.lowest_mth,
lowest_values.lowest_open
FROM highest_values
JOIN lowest_values
ON highest_values.row_num = lowest_values.row_num
  AND highest_values.row_num = 1
  AND lowest_values.row_num = 1
#

look at the last two lines alone.....
i am only joining the rows where the row number is 1, right?
but at the output, every row is coming.

coral wasp
#

Inspect each of your subqeuries first

#

Are they returning what you want?

wooden topaz
coral wasp
#

I don't believe they are: I'm talking about highest values and lowest values.

wooden topaz
coral wasp
wooden topaz
coral wasp
#

So why are you asking about the row number? You're only getting row numbers 1, but your subqueries are wrong.

wooden topaz
#

whats wrong with it?

#

for each and every ticker, this sub query is presenting all the rows from highest to lowest with corresponding row_num's

coral wasp
#

Also, secondly, look at your on clause: you're not comparing tickers

coral wasp
wooden topaz
coral wasp
#

That's what the window partition does.

#

Row_number() over (partition by ticker order by open) ranks each row by open, for each ticker.

coral wasp
#

So what's the group by doing?

wooden topaz
#

how about this: this MAX(OPEN) --> this aggregate needs to be executed per GROUP of ticker.

coral wasp
#

Why? Row number is already telling you the largest value

#

The way you wrote it, that's the max value for each (ticker, open, date), which is the same as the open (since only one value per date). Max does nothing.

wooden topaz
coral wasp
#

You're computing the max(open) over sets of exactly 1 value, since there's only one open for each (ticker, open, date)

wooden topaz
# coral wasp Your aggregate is doing nothing in this query.

right & i also corrected the mistake at JOIN ON clause & below is the perfect working query

WITH highest_values AS(
SELECT ticker,
open highest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open DESC) AS row_num
FROM stock_prices),

lowest_values AS(
SELECT ticker,
open lowest_open,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS row_num
FROM stock_prices)

SELECT 
highest_values.ticker,
highest_values.highest_mth, 
highest_values.highest_open,
lowest_values.lowest_mth,
lowest_values.lowest_open
FROM highest_values
JOIN lowest_values
ON highest_values.ticker = lowest_values.ticker
  AND highest_values.row_num = 1
  AND lowest_values.row_num = 1
coral wasp
#

You know you don't need two subqueries now

coral wasp
#

And I'd optimize by pulling the to_char into the outer select. You only need it for the result

wooden topaz
coral wasp
#

Or just refactoring to have one window query, then a highest and lowest query that uses that, and then a final query to join them

wooden topaz
# coral wasp Yes (same query)
WITH data AS(
SELECT ticker,
open highest_open,
open lowest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open DESC) AS highest_row_num,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS lowest_row_num
FROM stock_prices)

SELECT 
d1.ticker,
d1.highest_mth, 
d1.highest_open,
d2.lowest_mth,
d2.lowest_open
FROM data d1
JOIN data d2
ON d1.ticker = d2.ticker
  AND d1.highest_row_num = 1
  AND d2.lowest_row_num = 1
#

i know below things can be written only once, but atleast from readabiliy perspective, it looks easy to understand like this. thats why left like this

#
open highest_open,
open lowest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
coral wasp
#

Now, there's a trick you can use to eliminate the join, using an aggregate

#

You could get the first() filtered where row num is 1

#

And I would move the t o_char to the outer query.

wooden parrot
#

Hello! I want to ask in MongoDb how to update or push the array by matching a key and value pair?
So when I used $push operator it always inserting new array in test. But I want this to behave like, it should match with the user_id if same user_id already exist in the test array just update that else it will push an new array to test. How can I do it in a single operation.

data = {
    "test": [
        {
            "user_id": 1,
            "key": "value"
        },
        {
            "user_id": 2,
            "key": "value"
        }
    ]
}
wooden topaz
#

When i am reading about AWS step functions, it is always referred as serverless fn orchestrator. So, an step function can not orchestrate services like EC2, because it is server based service?
There is one more AWS service named AWS Glue, which is called as server less Data pipeline provided by AWS. Through AWS we can define whole work flow such as data loading, data cleaning & data loading. In fact we can even schedule jobs.
So, isn't it both AWS Glue & Step fn's are doing same job?

mortal solstice
#

using .json file as my discord bot database? Is this good or suitable if i plan to make my bot high scale?!

storm mauve
#

it is a very bad idea if you want to scale at all

fading patrol
# wooden topaz When i am reading about AWS step functions, it is always referred as serverless ...

You can orchestrate multiple Glue crawlers in parallel with Step Functions. https://aws.amazon.com/blogs/compute/orchestrating-aws-glue-crawlers-using-aws-step-functions/

Amazon Web Services

This blog post is written by Justin Callison, General Manager, AWS Workflow. Organizations generate terabytes of data every day in a variety of semistructured formats. AWS Glue and Amazon Athena can give you a simpler and more cost-effective way to analyze this data with no infrastructure to manage. AWS Glue crawlers identify the schema of […]

kind burrow
#

Hi, Does anyone know how to tackle this error ?
js exceptional error 503

coral wasp
torn sphinx
#

how deep does the database rabbit hole go? Wha am I in for?

coral wasp
torn sphinx
wise goblet
# torn sphinx how deep does the database rabbit hole go? Wha am I in for?

the regular non web developer usually just learns Raw SQL, able to use Sqlite3 if needed, and calls it a day.
Usually learn designing relational tables and writing complex queries to query anything

for backend developers (data engineers) journey only begins with raw SQL in depth, preferably for specific also engine details (like postgresql, or optionally mysql/mariadb)
Also of course learn designing and querying in all ways
They learn how to debug it and performance optimize it (with operating millions of row records), with solving its different issues from at query level to maintenance level a bit
small enough app can be using relational db for almost everything.

As they need to go further, it is not uncommon learning Elastic Search or its equvalients. (to feel gaps in functionality)
working with blob storage databases for designing dynamic assets stateless (for horizontal scaling) backend apps
and working with other dbs depending on needs (than more data, more scale of app, then more specific usage cases can be needed)
redis/memcache for caching stuff
redis/rabbitmq or alterantives for message queue stuff (or even Kafka is common for event streaming db)
due to managed databases simplifying life a lot, it is also common choosing cloud provider to specialize in (AWS, GCP or Azure)

Maintanance of databases is usually in hands of System Administrators.
but they can be having also Database Administrators, specialized guys responsible for it with expertise in depth
If they have DevOps engineers, they can be usually responsible for infrastructure as a code approach to keep running managed databases at cloud provider, or having running dbs in kubernetes like https://cloudnative-pg.io/

CloudNativePG - PostgreSQL Operator for Kubernetes

#

fifty shades of databases, with its self hosted options, and cloud provider specific managed alternatives

wise goblet
#

also databases are one of main reasons (but not all of them) why frontend devs learning small bit of backend and calling themselves full stack devs are heavily frowned upon sometimes. They have to learn dealing with databases properly, otherwise app is greatly screwed up in terms of performance and storage consumptions. At beginner level there are plenty of rookie mistakes that can make 100 times difference.

manic gyro
#

I have this sqlite3 database for a bot for a VR game that adds a leveling system, but when I try to use this function I get None returned even if the user id is in the database, does anyone know why this would happen?

def get_level(user_id):
    c.execute('''SELECT level FROM users WHERE id = ?''', (user_id,))
    result = c.fetchone()
    if result:
        return result[0]
    else:
        return None
wise goblet
wise goblet
torn sphinx
#

is sqlite better on python or vscode?

wise wind
coral wasp
cosmic surge
#

Does anyone know how to transfer 1 postgres db thats in a docker to another host? But without losing the data in it..?

#

Just needing docs or something on it?

torn sphinx
coral wasp
coral wasp
torn sphinx
fading patrol
fading patrol
waxen finch
# torn sphinx no, like two years ago I downloaded some sqlite extension for VS code

which is better in terms of interactively exploring sqlite databases? writing a python script can be cumbersome for that depending on what you need, as for your vscode extension i cant say how good the experience is

personally i recommend the SQLiteStudio desktop app for interacting with databases, you can easily see the schema, get an overview of a table's data, and of course run queries on it
https://sqlitestudio.pl/

if however you need to programmatically access your database or automate some work, only python can do that

cosmic surge
#

We already got it, but thank you.

fading patrol
torn sphinx
wise goblet
cosmic surge
fading patrol
slim garden
#

does this schema make sense to yall

#

is it even normalized

cosmic surge
fading patrol
# slim garden is it even normalized

room_number could be a little dicey but maybe acceptable. Would you ever want to store anything else about a room, like how many students it accommodates? If yes, make it a table

slim garden
fading patrol
delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied timeout to @true zinc until <t:1719351915:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

hushed wren
#

I tried to import large CSV files to MySQL Workbench database but I got this error. I wonder if there's any chance I coudl fix?

#

You mean, change the data type?

#

Should I change the data type in MySQL or with Python?

#

The file is 2.7 GB, I think it may take plenty of time for me to process the file in Python.

wise wind
hushed wren
#

What tools should I use and free?

wise wind
#

I should've asked that earlier.

wise wind
hushed wren
#

Sure, lemme upload the file to Google Drive and I'll send you the link

#

Lmk what you think

wise wind
hushed wren
#

I was able to read this file in Python

#

Should I load this to a dataframe then load to MySQL database?

wise wind
#

Is it on a cloud server?

hushed wren
#

I use Community Edition

wise wind
#
user = await db.execute(select(Users).where(Users.username == usernames[index]).options(selectinload(Users.sent_dms)
                                                                                                    .selectinload(Dms.dm_messages)
                                                                                                    .joinedload(Dm_Messages.user))
                                                                                                     .execution_options(populate_existing=True))```
#

Does anyone know why the relationships aren't loading? The relationships load when I run this query:

#
user = await db.execute(select(Users).where(Users.username == usernames[index]).options(selectinload(Users.sent_dms)
                                                                                                    .selectinload(Dms.dm_messages)
                                                                                                    )
                                                                                                    .execution_options(populate_existing=True))```
#

but not when I run the top query

hushed wren
chilly gulch
#

anyone here famiiliar with structure charts?

unreal obsidian
#

Can anyone provide me dataset for generators(of thermal power plant).

torn sphinx
#

how do I run this sql lite project through an AWS EC2 server, should i sudo apt install sqllite?

torn sphinx
coral wasp
torn sphinx
warm igloo
#

I have a question. Is anyone here familiar with the WISKI software? I'm trying to take data stored on there and then importing that data into Python.

warm igloo
#

One library I found when looking was called kiwis_pie. Sadly there's not much information about it other than the documentation.

warm igloo
#

Never mind. I'm likely going to be able to query the data in Python via SQL Server thankfully.

neat umbra
#

can i ask for php here?

coral wasp
#

!topic

delicate fieldBOT
torn sphinx
#

Can anyone help me with executing postgres query using psycopg ?

torn sphinx
#

It's been a while since I messed with databases, so I was wondering how to structure a database for a simple project of mine?

#

I'm making a bot that tracks scores of users on a per server basis.

#

The idea being that your score is specific to only your server.

#

So I think it only has userID, serverID, and score, but I'm not sure how to design it to seperate the scores.

torn sphinx
#

Guess this channel isn't useful, I am leaving.

#

*left

wise goblet
brave tundra
#

for row in cur.execute("select distinct(player_id) from results order by match_no"):
list.append(row[0])

#

Why does the none appear at last?

#

I only have 8 rows

#

or if I remove the 'distinct' it became like this.. more none

wise goblet
#

Deselect nones with Where condition if u wish not seeing those rows

brave tundra
#

here's the db

#

there really is only 8 rows

wise goblet
#

U may have filter in place to ignore some rows

brave tundra
#

oh I'll check it thanks

#

nope no filter

#

nvm I addded 'is not null'

#

thanks

fresh grail
#

Which database package should I use with python?

#

The databases I do with sqlite3 malfunction all the time

hexed estuary
#

Malfunction? How?

torn sphinx
#

I'm actually reading up on sqlite3 now being how I should probably use it.

#

Where would you guys store your database?

fresh grail
#

I'd need to say, the database gets created by the program

brazen charm
#

You are definitely doing something wrong if that is the case

#

It is almost certainly not an sqlite issue

hexed estuary
torn sphinx
fresh grail
#

I'll send the relevant code in

#
try:
    # Verbindung zur SQLite-Datenbank herstellen
    conn = sqlite3.connect('battle_data.db')
    cursor = conn.cursor()

    # Tabelle erstellen (falls nicht bereits vorhanden)
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS Battle (
       BattleID VARCHAR(255) PRIMARY KEY,
       BattleTime VARCHAR(255),
       BattleMode VARCHAR(255),
       BattleMap VARCHAR(255),
       BattleResult VARCHAR(255),
       PlayerName1 VARCHAR(255),
       PlayerName2 VARCHAR(255),
       PlayerName3 VARCHAR(255),
       PlayerName4 VARCHAR(255),
       PlayerName5 VARCHAR(255),
       PlayerName6 VARCHAR(255),
       Player1 VARCHAR(255),
       Player2 VARCHAR(255),
       Player3 VARCHAR(255),
       Player4 VARCHAR(255),
       Player5 VARCHAR(255),
       Player6 VARCHAR(255),
       Brawler1 VARCHAR(255),
       Brawler2 VARCHAR(255),
       Brawler3 VARCHAR(255),
       Brawler4 VARCHAR(255),
       Brawler5 VARCHAR(255),
       Brawler6 VARCHAR(255)
    )
    ''')

    # Daten in die Tabelle einfügen
    cursor.execute('''
    INSERT INTO Battle (
        BattleID, BattleTime, BattleMode, BattleMap, BattleResult, 
        PlayerName1, PlayerName2, PlayerName3, PlayerName4, PlayerName5, PlayerName6,
        Player1, Player2, Player3, Player4, Player5, Player6, 
        Brawler1, Brawler2, Brawler3, Brawler4, Brawler5, Brawler6
    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ''', battle_data)

    # Änderungen speichern
    conn.commit()

except sqlite3.DatabaseError as e:
    # Fehlerprotokollierung und detaillierte Fehlermeldung
    print(f"Failed to insert battle data: {e}")
    raise sqlite3.DatabaseError(f"Failed to insert battle data: {e}")

finally:
    # Verbindung schließen
    if conn:
        conn.close()

hexed estuary
# torn sphinx Yeah, pretty much.

Depends on what's in it - e.g. if it's something like settings, store it in a platform-specific app folder (e.g. on Windows it'd be in %APPDATA%, and in general you can use appdirs). If in doubt, can always just use the folder your program is in - that'd be a portable installation of sorts.

torn sphinx
torn sphinx
#

has anybody used sqlalchemy alembic? is it a common pattern to create migrations as models are defined?

fading patrol
spare fox
#

hi guys! so my company is copy pasting all the data from google manually..... to get real time data like working hours every week. Is it possible to scrape that from the website? and not actually write it in the table

coral wasp
fervent charm
#

anyone used kdb+

coral wasp
#

What's up tho?

fervent charm
# coral wasp What's up tho?

looking to store daily trade data at minute resolution. also I am broke lol. Also storing daily reddit info from api, so far have about 250 parquet files for it, so getting unwieldy and some database is probably in order. doing all processing of reddit data with polars because pandas was too slow

#

kdb has like a one year free trial but honestly whos trying to deal with that and renewing and stuff (and I think I have to learn their weird q language).. probably gonna go with more ML approach rather than try to compete with HFT which is what kdb tries to cater to

coral wasp
#

1min candles are usually manageable without something like KDB

#

I'm a DuckDB shill but you can get pretty far querying a directory of partitioned parquet files. Might need to compute some rollups first/etc, but good partitioning leads to efficient queries

fervent charm
coral wasp
#

That's (partly) what the partitioning is for.

fervent charm
#

so I should just read in everything and rewrite it to more organized folders?

coral wasp
dim hull
#

hey anyone here watched CMU Intro to Database Systems course because i tried to watch it and lot of things i didn't understand i already have some database knowledge with sql but this is the first course i found it hard to understand so is this normal or what

coral wasp
fervent charm
#

stuff like timescale db has automatic partitioning, researching that rn

dim hull
coral wasp
coral wasp
coral wasp
brave tundra
#

for i in semi_finalist(): cur.execute('update player set seed = 1 where player_id = {i}')

#

how do I check the player_id is = to the i

#

the semi_finalist() is a list

coral wasp
#

!sql

delicate fieldBOT
#
SQL & f-strings

Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.

Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.

For example, the sqlite3 package supports using ? as a placeholder:

query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)

Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.

See Also

  • Python sqlite3 docs - How to use placeholders to bind values in SQL queries
  • PEP-249 - A specification of how database libraries in Python should work
coral wasp
#

You are missing the 'f' to make it an fstring, but you never want to do that. Do what the above text says to do instead.

brave tundra
#

okay thanks

coral wasp
#

And, for efficiency's sake, you should run a single update instead of a loop. You can do this in a single query with a list of ids.

brave tundra
#

alright I'll give it a try

#

thanks

coral wasp
#

Can't do = a list

#

You want 'in (?)' instead

#

Ie: Give me all rows where the id is in the id_list

brave tundra
#

[1, 2, 3, 4]

fervent charm
#

timescale also seems nice for adding data in real time from a stream while duckdb not so much for that

coral wasp
#

HFT vs non is a big divide. I live on the non side, so my tech stack is more throughput vs latency focused

fervent charm
#

and timescale seems flexible enough that I can use it for financial metrics, economic indicators, as well as the typical time series

coral wasp
fervent charm
coral wasp
#

Or perhaps said differently: you'd still need a production pipeline to retrieve (or receive) and store the data.

brave tundra
coral wasp
#

One option is timescale, another is parquet, etc

coral wasp
brave tundra
#

Thanks a lot

cedar tiger
fresh grail
#

My databases keep getting "malformed" but it successed in inserting data

#

I use sqlite3

#

When I download the database and want a website to read it for me, it says it can't

#

Does anyone knows about how to fix it

fading patrol
brave ingot
#

hey! I'm trying to avoid hard-coding relationship strings in sqlalchemy to make everything more type safe

for example, if I have such models:

class User(Base):
    id = Column(Integer, primary_key=True)
    is_active = Column(Boolean, default=True)

    items = relationship(lambda: Item, back_populates="owner")


class Item(Base):
    id = Column(Integer, primary_key=True)
    owner_id = Column(Integer, ForeignKey(User.id))

    owner = relationship(lambda: User, back_populates="items")

how do I avoid hard-coding values like "owner" or "items" of back_populates?

astral current
#

Best database for beginners doing dpy?

wise wind
fervent charm
coral wasp
wise goblet
wise wind
#
await db.execute(select(Users).options(selectinload(Users.sent_dms).selectinload(Dms.dm_messages).selectinload(Dm_Messages.user)))``` Whenever I try to load in `Dm_Messages.user`, nothing loads leading to a greenlet error, but whenever I don't load it, the relationships load like normal. Does anyone know what could be causing this issue?
slim ingot
#

got a thing going

fervent charm
#

ended up using an lru cache and just dealth with the first run being kinda slow

#

So if questdb makes reads faster at the cost of just not having a DELETE operation available...

#

though now I am seeing someone on reddit say: "I tried to migrate from numpy .npy files to a timeseries database. I saw a degradation of 10x in read performance. I scrapped the idea and stuck with flat files."

#

for the web app I actually tried using clickhouse thinking itd be faster but it wasnt (I was using pandas' read_parquet() instead of polars or duckdb )

coral wasp
#

Databases are a frustrating specialty.... I aim for good enough and predictably scalable, over perfect/etc

astral current
astral current
wise goblet
astral current
wise goblet
astral current
#

I need resources I code on mobile

astral current
wise goblet
#

Coding on mobile is not intended 😕

astral current
wise goblet
astral current
wise goblet
# astral current You got a plan?

I am not you too know your conditions.
Choices can vary from asking parents/relatives to buy one
To getting summer job to get the money

astral current
#

I thought you were talking about sqlite3

#

Don't worry I'm gonna buy one next month

astral current
brave tundra
#

match = 'delete from matches' result = 'delete from results' cur.execute(match) cur.execute(result)

#

why isn't the record in the table deleted

coral wasp
#

Did you commit?

bright sigil
#

I wanna make a voting system using python, and html/css/js what's the best db to use? around 1000 voters, I thought I'm gonna use SQLite, but how do I handle multiple simultaneous votes? Cause I looked it up it and SQLite can only have 1 write simultaneously.
I'm pretty new to this db and networking stuff, and concurrency (if it's needed)

#

Help? Please 🥺

#

Guidance for what I should learn

waxen finch
# bright sigil I wanna make a voting system using python, and html/css/js what's the best db to...

sqlite can handle concurrency reasonably well if you ensure your connections/transactions are short-lived, and sqlite also has a WAL mode to remove contention between readers and writers

see also: https://sqlite.org/whentouse.html

Situations Where SQLite Works Well

  • Websites
    SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
  • Server-side database
    With this pattern, the overall system is still client/server: clients send requests to the server and get back replies over the network. But instead of sending generic SQL and getting back raw table content, the client requests and server responses are high-level and application-specific. The server translates requests into multiple SQL queries, gathers the results, does post-processing, filtering, and analysis, then constructs a high-level reply containing only the essential information.
#

you should learn how to design good database schemas, like what tables you should create, what primary/foreign keys you should use, normal forms you should follow, using indexes to optimize particular queries, etc. and then get used to making connections and executing queries to retrieve/insert into your database

(see also the SQLBolt interactive tutorial if you want practice with writing queries, and the SQLite docs to learn their SQL syntax like INSERT, SELECT, UPDATE, etc.)

brave tundra
fading patrol
#

I see some really old references to people using SQLite and other SQL databases as the message broker for Celery. The Celery documentation doesn't mention this possibility.

Does anyone happen to know if it is still possible?

storm mauve
steady epoch
#

should i use sqlalchemy with asyncpg? or just sqlachemy alone , i want good performance
is using it is overkill?

wise goblet
#

I also don't recommend such extra over compleixities to people who did not cover their database interacting code with unit tests and added their execution into CI

opal fiber
#

Guys i need help

why does python deleted my " HELP " from the " python-help " ??

opal fiber
#

Anyway i will post my problem here 😁😁😁

coral wasp
opal fiber
#

BRUH ☠️

coral wasp
opal fiber
coral wasp
opal fiber
#

i'm in need of help to finish my term paper i know this is for simple help, but i really need the help of a good soul so i can finish it, so here goes;

My project consists of a control of entry and exit of students using the QR-CODE of the student card when entering you must show the card to the laptop* and take the photo of the card (The QR-CODE shows the student's enrollment that will be previously registered with the student's data that I have already done).

At this point, the system must record the time in the MySQL database, and the same thing applies to the student's departure.

I'm doing my project in Python and MySQL

The problem I have is this enrollment data that I can't transport and save the time and the link to the database.

I need to present it on July 2nd now 😢

opal fiber
coral wasp
#

What have you done already for the database? Do you have a database? Have you created a database table?

opal fiber
#

I created the database

#

And the tables

#

I just need help to put these things usable

coral wasp
opal fiber
#

The alter table and the relationtip with the main python code

coral wasp
#

Please share the relevant code and ask a specific question, I still don't know what you need help with.

opal fiber
#

Ok

coral wasp
#

The more specific the better

fervent charm
#

Is it bad to have thousands of tables

coral wasp
fervent charm
coral wasp
#

Why would you put each ticker in a separate table?

fervent charm
#

I dont have any yet, just trying to see how I should do this

fervent charm
coral wasp
#

Just use one table and have a ticker column.

#

What you're trying to do is partitioning, and partitioning can be done within a table.

#

The idea is sound, just the wrong tool

fervent charm
#

Like, I don't put all my price data into a single parquet file, there's a separate praquet file for each. So naturally I didnt think that putting everything into a single table would be correct

fervent charm
#

My experience with databases has been limited to school busy-work until now and definitely never worked with time series data in a db before

coral wasp
#

Many tables would require dynamically selecting tables in the sql
Query, something sql doesn't allow (can't parametrize table names)

#

Most database optimizations come down to: indexing, partitioning, and orientation (columnar vs row)

#

(And roll ups and sharding and denormalization and caching)

fervent charm
#

I'm going with questdb so far, seems pretty good. Its column oriented and as a result faster than timescale, but pretty sure the catch is you can't delete rows easily

#

It described itself as OLAP, while describing timescaledb being more OLTP

opal fiber
#

Yes i selected all the stuff

coral wasp
#

'Unknown MySQL server host'

opal fiber
coral wasp
#

Your server host should probably be localhost, not pou

opal fiber
#

Yeah LMAO

tulip estuary
steady epoch
#
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
import asyncpg
from config import DatabaseConfig
from sqlalchemy import MetaData

DATABASE_URL = f"postgresql+asyncpg://{DatabaseConfig.username}:{DatabaseConfig.password}@{DatabaseConfig.hostname}:{DatabaseConfig.db_port}/{DatabaseConfig.db_name}"
print(DATABASE_URL)
DEFAULT_DATABASE_URL = f"postgresql://{DatabaseConfig.username}:{DatabaseConfig.password}@localhost:5432/postgres"
print(DEFAULT_DATABASE_URL)
# Base for declarative class configuration
Base = declarative_base(metadata=MetaData())


async def create_database_if_not_exists():
    try:
        # Connect to the default database
        default_engine = await asyncpg.connect(DEFAULT_DATABASE_URL)
        # Check if the target database exists
        result = await default_engine.fetchrow(f"SELECT 1 FROM pg_database WHERE datname = '{DatabaseConfig.db_name}'")
        print(result)
        if not result:
            # Create the target database if it doesn't exist
            await default_engine.execute(f'CREATE DATABASE "{DatabaseConfig.db_name}"')
            print(f"Database '{DatabaseConfig.db_name}' created successfully")
            await default_engine.commit()
        await default_engine.close()
    except Exception as e:
        print(e)


async def create_tables():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
        await conn.run_sync(Base.metadata.create_all)

        print("Tables created successfully")


async def initialize_database():
    try:
        await create_database_if_not_exists()
        await create_tables()
    except Exception as e:
        print(f"Error during database initialization: {e}")

# Create the async engine for the target database
engine = create_async_engine(
    DATABASE_URL,
    echo=True,            # Set to True for debugging SQL queries (optional)
    # Enable pool_pre_ping to check for stale connections (recommended)
    pool_pre_ping=True,
    pool_size=20,         # Maximum number of connections in the pool
    max_overflow=0        # Maximum number of connections that can be created beyond pool_size
)

# Create a configured "Session" class
AsyncSessionLocal = sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False
)

why tables are not getting created : (

#

literally i cant figure out where i am doing wrong everything works but then it doesnt create any table at all

coral wasp
#

Add more print statements and narrow it down.

hexed estuary
#
except Exception as e:
    print(e)

don't ever do this. at the very least do traceback.format_exc. otherwise you're discarding the valuable traceback.

spare hemlock
#

unsure if better here or elsewhere but someone knows if it's possible to read a JS JSON in python? like it isn't plain JSON but a JS file meant to emulate JSON (along with variable names, etc…). right now made it able to read file from internet, now only to read content from it 🙂

hexed estuary
#

manually strip the variable stuff and load the rest as 3 different json files (or as python code for that matter)

spare hemlock
#

it have 3 "tables"

#

aka - 3 variables

#

and i want them to be loaded into separate variables in python (number of "tables" in JS should be same as number of variables in python)

hexed estuary
#

so, why not just change let initial = ["fire", "water", "air", "earth"]; to initial = ["fire", "water", "air", "earth"]; and similarly for the other two, and then just import that as a python file?

#

or must it be automatic?

spare hemlock
#

?

#

JS is broken then

#

(i want one file for 2 apps - one in JS and second - local client in python)

coral wasp
#

I dunno if I'd say narrow is easier

#

Maybe easier to stick data in

#

But often performs terribly, and dealing with data types is problematic

#

It also compromises various compression and optimizations, and managing data integrity can be harder

wooden topaz
#

--> Created an .tf file where i wrote the code to create an lambda fn
--> After executing the series of commands init, plan & apply, the lambda got created on AWS console.
--> Now, if i want to use that lambda, i need run that lambda from AWS console, pass the input & see the output.
--> But, i want to execute this lambda from my console & see the output also in my console if self without touching AWS portal. Can terraform do this?

lavish hull
#

Hello everyone just a quick question:
I have a get request to one of my models lets say Departments and im passing a column value from a dataset which is typically a short string like 'La county' but sometimes it is a string with multiple counties as one string like 'La county, multnomah county, idaho county' and my get method call fails. what would be the best way cut up that string into seperate strings and run the get method on each?
would making it into a list with split(',') work and then run a loop on that list with each string making its own get call work? Thanks

fervent charm
serene thorn
#

EDIT: Problem solved! See below.
Hey everyone, I am having problems creating a session context manager. I have some generic database methods in a class like bulk add. Sometimes I do not want to commit changes unless all method calls execute properly, therefore I was thinking of creating a session context manager. The code executes without errors but the changes are not written to the database. I tried creating it:

async def transaction():
    try:
        async with async_session() as session:
            yield session
            if session.dirty or session.new or session.deleted:
                await session.commit()
    except SQLAlchemyError:
        await session.rollback()
        raise

async def bulk_update_objects(
    self,
    objects: List[Any],
    update_fields: List[str],
    model: Type[Any],
    session: AsyncSession = None
):
    primary_key = inspect(model).primary_key[0].name
    updates = [{primary_key: getattr(obj, primary_key), **{field: getattr(obj, field) for field in update_fields}} for obj in objects]

    # Perform the bulk update
    async with session or async_session() as sess:
        try:
            await sess.execute(
                update(model),
                updates
            )
            if session is None:
                await sess.commit()
        except SQLAlchemyError:
            if session is None:
                await sess.rollback()
            raise

async def example():
    async with self._db_operator.transaction() as session:
        await self._db_operator.bulk_update_objects(
            objects=examples,
            update_fields=["example_flag"],
            model=Example,
            session=session
        )```

Even though the session object has the same ID in all places the changes are never actually written to the PostgreSQL database when using the yielded session. The bulk_update_objects method works fine when the session argument is not provided. Anyone have any idea why that might be?
#

I have also unsuccessfully used the begin method since I thought it might commit the changes after the session is closed automatically, thereby forcing a commit after the yield. But that didn't work either. For example:

    @asynccontextmanager
    async def transaction():
        session = async_session()
        try:
            async with session.begin():
                yield session
        except SQLAlchemyError:
            await session.rollback()
            raise
serene thorn
#

Nevermind, it was stupid to create a new context using "async with" like so:

async with session or async_session() as sess:

Instead I can make it work by having the method constructed like this instead:

        context = async_session() if session is None else nullcontext(session)

        async with context as sess:
            try:
                await sess.execute(
                    update(model),
                    updates
                )
                if session is None:
                    await sess.commit()
            except SQLAlchemyError:
                if session is None:
                    await sess.rollback()
                raise

Problem solved.

cloud sedge
#

my code has an error and idk what it means

coral wasp
opal river
#

This is a random question but can you make a model that for examples tries to draw trendlines on financial data and on each output it gets a rating from a human and the correct example to learn from

fading patrol
opal river
#

Ohh sorry

#

I thought this was the channel

#

My mistake

deep heron
#

I am in the process of making a central company package that we can import into our projects for common dev stuff. It's only for internal use and is being stored on a local hosted gitlab, and the users are locked into a menu through a putty session and as soon as their process completes, the session gets closed. So they can't view the actual coding. One thing I realized is that we have a single username and password into our database for general use, and this is something I have to work around (out of my control). The thing is, all of our current projects hardcode that user password combination.
Being that I can't adjust the database, what is the best way to handle database access? Is there a way to get away from hardcoding that? Currently I use it in a JSON.

#

Packages are being used with Poetry, btw.

fading patrol
#

But to be clear you absolutely should not be hard coding DB credentials in your repo

deep heron
#

Git lab is locally hosted, is that what you mean by on-prem servers? We're not using any containers, unless you count the Putty remote sessions each user uses. I've raised the question about a system environment variable for the "Business Box" as we call it, as each user uses a remote session into it. I've not heard a good answer back yet other than, "rather not if it can be avoided" so I'm unsure on that. ("Small" company problems, I know...)

#

Should I move the question?

#

It was a toss up for this question and my other one on which would be the proper channels. I chose poorly 😂

#

And we use SQLAlchemy through a connector for MariaDB, if that's the engine.

fading patrol
# deep heron Git lab is locally hosted, is that what you mean by on-prem servers? We're not ...

This is a good summary of the options as I understand them. If you're stuck with the infra you're on, I do think environment variable is probably the way to go

https://medium.com/twodigits/keep-passwords-out-of-source-code-why-and-how-e84f9004815a

Medium

A very common security misbehavior I see in my daily work is that credentials are checked into source code control (like git). This is often referred to as hard-coded credentials. This article…

deep heron
#

Ok, thanks 😄 I will give them a read.

deep heron
#

Storing a password as an environmental variable would still be open to anyone who could get access to those, right?

deep heron
#

But this is still more secure than having them in the program, and much better than having them in the repo.

#

I'm currently researching the hashicorp option, but it seems like a bit much for just one user/pass

small bramble
#

A database can use both data storage layout row and columb based together?

#

In a system

coral wasp
delicate fieldBOT
#

:incoming_envelope: :ok_hand: applied timeout to @torn sphinx until <t:1720106129:f> (10 minutes) (reason: emoji spam - sent 24 emojis).

The <@&831776746206265384> have been alerted for review.

obsidian basin
#

When a site is live or phrased better connected to the internet do people use flask migrate to update a database?

small bramble
fervent charm
#

Is this an alright design:

CREATE TABLE IF NOT EXISTS estimates (
    timestamp_accessed TIMESTAMPTZ NOT NULL,
    ticker_id INTEGER NOT NULL, -- AAPL
    line_item TEXT NOT NULL, -- net_sales
    quarter VARCHAR(6) NOT NULL, -- 2022Q1
    released INTEGER,  -- 27908
    forecast INTEGER, --28233
    spread NUMERIC, -- -1.15%
    announcement_date DATE, -- 2024/09/06
    PRIMARY KEY (timestamp_accessed, ticker_id, line_item, quarter),
    CONSTRAINT fk_ticker_id FOREIGN KEY (ticker_id)
        REFERENCES tickers (id)
        ON UPDATE CASCADE,
    CONSTRAINT check_quarter_format CHECK (quarter ~ '^\d{4}Q[1-4]$')
);
coral wasp
delicate fieldBOT
#

failmail :ok_hand: applied timeout to @tardy bolt until <t:1720190228:f> (10 minutes) (reason: mentions spam - sent 6 mentions).

The <@&831776746206265384> have been alerted for review.

obsidian basin
#

@fading patrol Are there other options with flask sqlalchemy, thanks for the previous response

obsidian basin
#

@fading patrol to add and delete columns in a table without losing the data in the db

fading patrol
obsidian basin
#

I just mean wihthout losing data in other columns

#

that still exist

fading patrol
obsidian basin
#

So again is flask migrate the best tool,

#

or are there other options?

queen sleet
#

Hi everyone,does anyone know how to get the unused table in MySQL other than using count(For MySQL)

coral wasp
#

Wdym unused?

torn sphinx
#

is big data even there yet?

coral wasp
torn sphinx
# coral wasp Where?

is spark and Hadoop even good? Is the technology even there yet in terms of quality?

coral wasp
#

You're asking about old tech that's been around for quite a while.

#

So, yes.

torn sphinx
#

ok

coral wasp
#

Spark is like 10 years old.

#

Hadoop almost 20

torn sphinx
#

don't they require like a nonsense amount of RAM?

coral wasp
brazen charm
#

I would say Hadoop is mostly on its way out tbh

#

I do not see it much really outside of legacy systems or hidden away by spark's handling

#

by "on its way out" I mean more along the lines of it isn't being used directly on new project anymore

sacred widget
#

Hello guys ! I need tips about Sqlite3 functions management. I'm having some exercises with Python and now I'm facing the database connection with Sqlite3.

My question is this " How can i organize well all my function for connect, disconnect, insert, select, update, etc... ? " I tried to organize as best I could but I've always a lot of issue with connection and cursor object thru all the functions i created.

It's better to write always clean code customized every time in this case or someone of you can help me with some example how the manage all these database function ?

Thank you very much !!!

fading patrol
sacred widget
willow mortar
#

Hi guys. I need some suggestions.

My project currently has two tables in the database. staff_id = Column(Integer, primary_key=True, autoincrement=True) but the autoincrement is relative to individual table. But I would love it to be global for the database. How can I achieve this?

Example: Intern employed on July 7th gets a Staff ID of 1092 but a manager employed the following day could be getting a Staff ID of 723.

coral wasp
#

What database are you using?

willow mortar
#

SQLite3

#

SQLAlchemy

keen minnow
#

What benefit do you see in having a global id for the database?

coral wasp
#

One option is another base table for 'person'.

willow mortar
keen minnow
willow mortar
#

...

#

Can't think of any. Probably because that how I imagined it during the idea conception.

keen minnow
# willow mortar Can't think of any. Probably because that how I imagined it during the idea conc...

Your problem sounds a lot like a https://xyproblem.info/ to me. So I am trying to understand the root problem here and the trade offs you are willing to make.

There might be reasons to make the number sequential within an organization, but there are also reasons to make the numbers not predictable as well (ex: security measure so no one can't predict the hiring rate). Both having very different trade offs

#

A concrete example of trade off is that having a global id means you will run out of ID faster since it's now shared across all the tables 🙂
And note that's not something generally advisable to do

willow mortar
#

However, the idea behind having unique IDs is such that I can query database with the ID, and get a maximum of 1 value. If I get lucky enough, the two tables might get as many entries that there might be an overlap at some point.

fading patrol
willow mortar
#

I got a similar suggestion (using a single table) earlier. I would just use that.