#databases | Python | Page 32

finite crow Jun 14, 2024, 7:29 PM

#

Oh

#

Okay. But it lists them in age order in their example.

coral wasp Jun 14, 2024, 7:29 PM

#

Sure, that's a database decision what order to provide. If you want an order, add it as an order by to the outer query.

#

Otherwise, most databases just say the result is non-deterministic (it could come back in any order)

finite crow Jun 14, 2024, 7:30 PM

#

It's implementation defined?

coral wasp Jun 14, 2024, 7:30 PM

#

It depends on how the database decided to execute the query, which may depend on the data. So, you always put an order by if you depend on the order.

finite crow Jun 14, 2024, 7:31 PM

#

Alright, that makes sense.
Thanks for helping me understand this.

wooden topaz Jun 14, 2024, 7:57 PM

#

ok. @coral wasp. did like you said.
first i tried below query

SELECT category, product, total_spend,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend 
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query

got the correct result. From this result, i proceeded to next step of filtering only those rows whose row_number <=2, then the error comes.
To put it in simple way, i have added only last line of the below query.....

SELECT category, product, total_spend,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend 
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query
WHERE row_number <= 2

Error:
column "row_number" does not exist. FYI: in PostgreSQL all identifiers (including column names) that are not double-quoted are folded to lowercase. So if you're using any uppercased letters in the names and want to specify them in SQL-query, please double-quote them first. (LINE: 14)

coral wasp Jun 14, 2024, 7:58 PM

#

Ok, so row_number is not a column. You can't filter on something that isn't a column.

#

Look at the results from the first query. What columns do you see?

wooden topaz Jun 14, 2024, 7:59 PM

#

coral wasp Look at the results from the first query. What columns do you see?

I see row number there!
pls just execute the query here once https://datalemur.com/questions/sql-highest-grossing

coral wasp Jun 14, 2024, 7:59 PM

#

Share a screenshot of the result

#

Also, your code will look better if you format it first: https://sql-formatter-org.github.io/sql-formatter/

wooden topaz Jun 14, 2024, 8:00 PM

#

coral wasp Jun 14, 2024, 8:00 PM

#

Yes: but row_number isn't a column in sub_query.

#

The where clause filters the sub_query, before the "select" statement runs. If you want to filter by something you created in the outer select, then wrap it again as a subquery and filter it.

wooden topaz Jun 14, 2024, 8:02 PM

#

coral wasp The where clause filters the sub_query, before the "select" statement runs. If y...

i am adding where clause at the outer query, not inside the sub query

coral wasp Jun 14, 2024, 8:02 PM

#

The where clause applies to columns in the sub_query.

#

Your query says: "select <stuff> from subquery where subquery.row_number <= 2"

wooden topaz Jun 14, 2024, 8:06 PM

#

coral wasp The where clause applies to columns in the sub_query.

hmm..so i am thinking like this.

SELECT a,b,c,d,e
FROM(
SELECT a,b,c
FROM so_and_so_table
)

Now, at the output i have 5 columns on which i can execute WHERE clause. Pls correct me, where is wrong in my thinking.

coral wasp Jun 14, 2024, 8:06 PM

#

wooden topaz hmm..so i am thinking like this. SELECT a,b,c,d,e FROM( SELECT a,b,c FROM so_a...

No, you ahve 3 columns on which you can execute the where clause

#

The where clause is before the select.

#

I know it's confusing.

wooden topaz Jun 14, 2024, 8:07 PM

#

coral wasp Your query says: "select <stuff> from subquery where subquery.row_number <= 2"

?? row_number column is produced by super query, not sub query

coral wasp Jun 14, 2024, 8:07 PM

#

wooden topaz ?? row_number column is produced by super query, not sub query

Yes. that's why your code doesn't work.

wooden topaz Jun 14, 2024, 8:09 PM

#

coral wasp Yes. that's why your code doesn't work.

yeah..now i get it. sorry for bothering you.
So, this is what happening.
First WHERE clause executes & whatever the result comes out of that WHERE clause, SELECT clause executes --> Becasue of this, When WHERE clause is getting executed, the row_number column is not yet there.

coral wasp Jun 14, 2024, 8:09 PM

#

Yes, that's right.

#

The reason is:

#

If the where clause was after the select, it might change the results. What if you said:

#

select category, row_number() over (order by total_spend) from sub_query where total_spend > 100 and row_number < 3

#

This would be an impossible query... because you'd be filtering on both an input to the row_number (total_spend) AND the result of the row_number. (ok, not impossible, just really weird)

wooden topaz Jun 14, 2024, 8:13 PM

#

ok. so, what could be the best way to filter out only those rows whose row_number <=2 from the below query:

SELECT category, product, total_spend,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend 
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query

coral wasp Jun 14, 2024, 8:13 PM

#

select * from (yourquery) where row_number <= 2

wooden topaz Jun 14, 2024, 8:14 PM

#

coral wasp `select * from (yourquery) where row_number <= 2`

so, totally 3 SELECT statements?

coral wasp Jun 14, 2024, 8:14 PM

#

wooden topaz so, totally 3 SELECT statements?

Yes

#

The way I would've written this, using a CTE, is:

#

with subquery as (...),
  sq_rownumber as (select .... from subquery)
select * from sq_rownumber 
where row_number <= 2

#

Much more readable, imo.

wooden topaz Jun 14, 2024, 8:22 PM

#

coral wasp ```sql with subquery as (...), sq_rownumber as (select .... from subquery) sel...

yes....i will write with CTE.
but, here also 3 SELECT stataments will be there, right?

coral wasp Jun 14, 2024, 8:23 PM

#

wooden topaz yes....i will write with CTE. but, here also 3 SELECT stataments will be there, ...

Yes. With a cte or without, it's the same query.

wooden topaz Jun 14, 2024, 8:23 PM

#

coral wasp Yes. With a cte or without, it's the same query.

without CTE, it looks like below 🙂

SELECT category, product, total_spend
FROM
(
SELECT *,
ROW_NUMBER()
OVER(
PARTITION BY category
ORDER BY total_spend DESC
)
FROM
(
SELECT category, product, SUM(spend) total_spend
FROM product_spend
WHERE EXTRACT(YEAR FROM transaction_date) = '2022'
GROUP BY category, product
ORDER BY category, total_spend DESC
)AS sub_query_1
)AS sub_query_2
WHERE row_number <= 2

wooden topaz Jun 14, 2024, 8:25 PM

#

coral wasp Yes. With a cte or without, it's the same query.

i just read about CTE. it's the neat way of writing sub query. right?

coral wasp Jun 14, 2024, 8:26 PM

#

wooden topaz i just read about CTE. it's the neat way of writing sub query. right?

In this example, it's just a nicer way of writing sub queries, yes. But, in other examples where you need to use the subquery multiple times, it's very important.

wooden topaz Jun 14, 2024, 8:28 PM

#

coral wasp In this example, it's just a nicer way of writing sub queries, yes. But, in othe...

Even when i have to use single sub query multiple times, we can give alias name to sub query through AS keyword & use, right?
I mean, without using CTE also, we can write sub query only once & use multilpe times...

coral wasp Jun 14, 2024, 8:29 PM

#

wooden topaz Even when i have to use single sub query multiple times, we can give alias name ...

Show me an example of using a subquery multiple times. [It won't work without a CTE. ]

wooden topaz Jun 14, 2024, 8:30 PM

#

coral wasp Show me an example of using a subquery multiple times. [It won't work without a ...

oh..ok.

wooden topaz Jun 14, 2024, 8:40 PM

#

coral wasp In this example, it's just a nicer way of writing sub queries, yes. But, in othe...

I prepared notes for CTE as below. pls correct, if you have few minutes, could you pls correct, if there is any wrong.

What is CTE?

CTE stands for common table expressions.
It is advanced version of sub query which differs from sub query in 2 ways.

It is much more readable compared to sub query.
When a subquery has to be used multiple times, you can't use with the help of alias name defined. So, going with CTE is the best option here.

Syntax

WITH my_cte AS (
  SELECT a,b,c
  FROM T1
)
SELECT a,c
FROM my_cte
WHERE ....

coral wasp Jun 14, 2024, 8:41 PM

#

Yes, that looks correct

wooden topaz Jun 14, 2024, 8:41 PM

#

coral wasp Yes, that looks correct

thx

#

#

I think, a stand alone CTE will give error.
There must be a SELECT statement which selects from that CTE

#

coral wasp Jun 14, 2024, 8:50 PM

#

Correct, yes.

wooden topaz Jun 14, 2024, 8:54 PM

#

coral wasp Jun 14, 2024, 8:54 PM

#

You use "with" once:

#

with q1 as (...),
  q2 as (...),
  q3 as (...)
select * from ...
``` (corrected to remove trailing comma)

wooden topaz Jun 14, 2024, 8:57 PM

#

coral wasp ```sql with q1 as (...), q2 as (...), q3 as (...) select * from ... ``` (cor...

followed same syntax:

coral wasp Jun 14, 2024, 9:03 PM

#

You have a trailing comma

#

Oh sorry, my example had a trailing comma.

wise wind Jun 14, 2024, 9:29 PM

#

Sqlalchemy was updating instead of deleting the children rows because I didn't specify a delete cascade with my user to dm relationship.

#

I ended up changing my relationship configurations to this: py sent_dms: Mapped[List["Dms"]] = relationship(foreign_keys='Dms.sender', back_populates='sender_user',cascade="all, delete") received_dms: Mapped[List["Dms"]] = relationship(foreign_keys='Dms.receiver',back_populates='receiver_user',cascade="all, delete")

#

Sqlalchemy has some weird behavior.

finite crow Jun 14, 2024, 10:22 PM

#

SELECT candidate_id
FROM candidates
WHERE (
    (candidate_id, 'Python') IN candidates
    AND (candidate_id, 'Tableau') IN candidates
    AND (candidate_id, 'PostgreSQL') IN candidates
)
;

What's wrong with the syntax here?
syntax error at or near ""candidates"" (LINE: 4)

#

I'm trying to do the 2nd datalemur problem.

#

And how do I do what I'm trying to do? What's the right syntax to do this?

wise wind Jun 14, 2024, 10:34 PM

#

finite crow ```postgresql SELECT candidate_id FROM candidates WHERE ( (candidate_id, 'Py...

I think putting just IN candidates won't work because you haven't selected anything from the candidates table therefore no value (tuple) is being returned.

finite crow Jun 14, 2024, 10:37 PM

#

Okay that sort of makes sense, but now it's showing me some insane error that my alias doesn't exist.

SELECT candidate_id AS this_candidate_id
FROM candidates
WHERE (
    (this_candidate_id, 'Python') IN (SELECT (candidate_id, skill) FROM candidates)
    AND (this_candidate_id, 'Tableau') IN (SELECT (candidate_id, skill) FROM candidates)
    AND (this_candidate_id, 'PostgreSQL') IN (SELECT (candidate_id, skill) FROM candidates)
)
;

column "this_candidate_id" does not exist. FYI: in PostgreSQL all identifiers (including column names) that are not double-quoted are folded to lowercase. So if you're using any uppercased letters in the names and want to specify them in SQL-query, please double-quote them first. (LINE: 4)

#

Of course it doesn't exist, it's an alias.

#

I don't get it.

wise wind Jun 14, 2024, 10:38 PM

#

finite crow Okay that sort of makes sense, but now it's showing me some insane error that my...

I assume you have to double quote the label for candidate_id

coral wasp Jun 14, 2024, 10:38 PM

#

Hold up a sec

#

Your query is a bit insane

#

~~Your query says: "Give me all candidates where any candidate knows Python and any candidate knows Tableau and any candidate knows Postgresql?~~

wise wind Jun 14, 2024, 10:39 PM

#

It is redundant.

coral wasp Jun 14, 2024, 10:39 PM

#

Oh soryr, the tuples... nm

#

Who writes like this... don't write queries like this

#

select * from candidates where skill in (x,y,z)

finite crow Jun 14, 2024, 10:40 PM

#

No, they need to know python, tableau, and postgresql

#

all three of those skills

coral wasp Jun 14, 2024, 10:40 PM

#

finite crow No, they need to know python, tableau, and postgresql

Well, that your current will never work.

#

Well, what the heck. What is your data model?

#

Show an example of the data.

finite crow Jun 14, 2024, 10:41 PM

#

I'm doing this one.
https://datalemur.com/questions/matching-skills

coral wasp Jun 14, 2024, 10:42 PM

#

Ah, it's a weird name for the table. I'd call taht "candidate_skills", not "candidates".

finite crow Jun 14, 2024, 10:42 PM

#

wise wind I assume you have to double quote the label for candidate_id

I don't get what you mean

coral wasp Jun 14, 2024, 10:43 PM

#

The naive thing to do is something like: ```sql
with python_candidates as (....),
tableau_candidates as (...),
sql_candidates as (...)
select ... from ...

wise wind Jun 14, 2024, 10:43 PM

#

finite crow Okay that sort of makes sense, but now it's showing me some insane error that my...

SELECT candidate_id
FROM candidates
WHERE skill IN ('Python','Tableau','PostgresSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT(skill)) = 3```

coral wasp Jun 14, 2024, 10:43 PM

#

The fancy version is what Blaziken just posted... you could count the number

#

fwiw, I prefer people understand the first version first, because it's more extensible to different types of problems. The second version (counting) fits only certain cases that are exactly designed for this kind of solution.

finite crow Jun 14, 2024, 10:45 PM

#

I just want to check if the three tuples are in the table.

wise wind Jun 14, 2024, 10:45 PM

#

Yeah starting simple is the way to go.

coral wasp Jun 14, 2024, 10:46 PM

#

finite crow Okay that sort of makes sense, but now it's showing me some insane error that my...

A simpler version of this original idea is:

#

select *
from (select distinct candidate_id from candidates) distinct_cands
where distinct_cands.candidate_id in (select * from candidates where skill='Python') ...

#

I don't like that... tho

#

I would've written, if need be:

#

select *
from (select distinct candidate_id from candidates) distinct_cands
left outer join (select * from candidates where skill='Python') python_cands
  on python_cands.candidate_id = ...

#

Or maybe,

#

select candidate_id, candidate_id in (select * from candidates where skill='Python') as knows_python, ...
from (select distinct candidate_id from candidates) distinct_cands

#

(still, professionally, I would've done what Blaziken did)

wise wind Jun 14, 2024, 10:49 PM

#

finite crow I just want to check if the three tuples are in the table.

The problem is that it's redundant. Imagine if you wanted to check if a candidate had 20 distinct skills.

finite crow Jun 14, 2024, 10:53 PM

#

SELECT DISTINCT candidate_id AS this_candidate_id
FROM candidates AS c1
WHERE (
    EXISTS (SELECT candidate_id FROM candidates AS c2 WHERE c1.candidate_id = c2.candidate_id AND c2.skill = 'Python')
    AND EXISTS (SELECT candidate_id FROM candidates AS c2 WHERE c1.candidate_id = c2.candidate_id AND c2.skill = 'Tableau')
    AND EXISTS (SELECT candidate_id FROM candidates AS c2 WHERE c1.candidate_id = c2.candidate_id AND c2.skill = 'PostgreSQL')
)
;

finite crow Jun 14, 2024, 10:54 PM

#

wise wind The problem is that it's redundant. Imagine if you wanted to check if a candidat...

Won't it optimize that away for you?

coral wasp Jun 14, 2024, 10:56 PM

#

Think of how messy this query is

#

The number of times you're visiting candidates

#

My duckdb version: ```py
import duckdb

result_df = duckdb.sql("""
create or replace table candidates as
select * from (values
(123,'Python'),
(123,'Tableau'),
(123,'PostgreSQL'),
(234,'R'),
(234,'PowerBI'),
(234,'SQL Server'),
(345,'Python'),
(345,'Tableau')
) t(candidate_id, skill)
;

select candidate_id, list_has_all(list(skill), ['Python', 'Tableau']) from candidates group by candidate_id
""").df()

print(result_df)

wise wind Jun 14, 2024, 10:58 PM

#

finite crow ```postgresql SELECT DISTINCT candidate_id AS this_candidate_id FROM candidates ...

SELECT DISTINCT candidate_id AS this_candidate_id
FROM candidates AS c1
WHERE (
    (SELECT COUNT(DISTINCT(skill))
     FROM candidates c2
     WHERE (
            c1.candidate_id = c2.candidate_id
            AND c2.skill IN ('Python','Tableau','PostgreSQL')
      ) = 3
)
;

coral wasp Jun 14, 2024, 11:00 PM

#

finite crow ```postgresql SELECT DISTINCT candidate_id AS this_candidate_id FROM candidates ...

At minimum, you're querying candidates once too many times. You have one query to get all Python candidates, then Tableau, then Postgresql... why bother with the from clause? You just want the inner join of all three subqueries.

finite crow Jun 14, 2024, 11:01 PM

#

I thought the query optimizer is supposed to deal with stuff like that.

coral wasp Jun 14, 2024, 11:01 PM

#

finite crow I thought the query optimizer is supposed to deal with stuff like that.

Deal with what? Bad queries?

finite crow Jun 14, 2024, 11:01 PM

#

yes

coral wasp Jun 14, 2024, 11:02 PM

#

It tries to map your query to the equivalent simplest version, but your query isn't the same as the inner join version.

#

I guess it'd probably map it to an equivalent of:

#

select * from candidates a
join candidates b
  on a.id=b.id
    and b.skill='Python'
join candidates c
  on ...

#

But, things get hard to fathom when you start looking at the cardinalities of each step and how many rows are returned/etc.

#

I'm going to bet that the count solution will beat any of these solutions easily.

coral wasp Jun 14, 2024, 11:06 PM

#

wise wind ```sql SELECT DISTINCT candidate_id AS this_candidate_id FROM candidates AS c1 W...

~~Altho, Blaziken's solution has one bug which I'll leave an an exercise to the reader.~~ Sorry, no, they corrected for it (the distinct)

finite crow Jun 14, 2024, 11:09 PM

#

Ok, I thought it could optimize it to not be redundant.

finite crow Jun 14, 2024, 11:10 PM

#

wise wind ```sql SELECT candidate_id FROM candidates WHERE skill IN ('Python','Tableau','P...

The solution is close to what he showed here.

#

SELECT candidate_id
FROM candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(skill) = 3
ORDER BY candidate_id;

coral wasp Jun 14, 2024, 11:15 PM

#

wise wind ```sql SELECT candidate_id FROM candidates WHERE skill IN ('Python','Tableau','P...

That is this.

#

That said, the "distinct skill" isn't needed because the question (I see now) says: "There are no duplicates in the candidates table."

finite crow Jun 14, 2024, 11:27 PM

#

I want to complete datalemur. But I've done a few of the easy level questions now, and they're about the difficulty of the questions I dealt with on my final exam. Maybe slightly less, but only slightly.

#

I feel like it's one of those situations where the class only actually gave us a surface level of knowledge/skill.

coral wasp Jun 14, 2024, 11:30 PM

#

finite crow I feel like it's one of those situations where the class only actually gave us a...

Yup, it's worth doing, but just accept that you don't know many of the "tricks" and "patterns", so you should accept "it works" over "it's identical to their solution".

wise wind Jun 14, 2024, 11:31 PM

#

It's better to do something on your own even though it isn't going to be totally correct or it takes a ton of time.

finite crow Jun 14, 2024, 11:31 PM

#

My plan is to try to do the problems, THEN look at their solution.

#

And hopefully end up absorbing their way of doing things.

coral wasp Jun 15, 2024, 12:17 AM

#

finite crow My plan is to try to do the problems, THEN look at their solution.

Personally, I find other peoples solutions demotivating. THe important thing for a beginner/intermediate is that you can solve the problem. Worry about the optimal solution some other time 🙂

finite crow Jun 15, 2024, 4:43 AM

#

https://datalemur.com/questions/sql-average-post-hiatus-2
Do any of you know what this one is? Is it to find the average amount of time between posts through 2021 for each user?

torn sphinx Jun 15, 2024, 5:05 AM

#

Dynamic

coral wasp Jun 15, 2024, 12:45 PM

#

finite crow <https://datalemur.com/questions/sql-average-post-hiatus-2> Do any of you know w...

See lag() window function

wise wind Jun 15, 2024, 1:57 PM

#

torn sphinx Dynamic

How many rows are you loading?

wooden topaz Jun 15, 2024, 2:05 PM

#

https://datalemur.com/questions/sql-highest-grossing
I have solved this problem using ROW_NUMBER()

But, the official solution contains RANK()

I analysed for some time & for this problem -- Either you solve with ROW_NUMBER() / RANK() / DENSE_RANK, doesn't matter. The reason is; we are partitioning based on SUM(spend) & then taking top 2 SUM(spend) -- Right?

Amazon SQL Interview Question | DataLemur

Amazon SQL Interview Question: Write a query to identify highest-grossing products within each category.

coral wasp Jun 15, 2024, 2:07 PM

#

wooden topaz https://datalemur.com/questions/sql-highest-grossing I have solved this problem ...

Do you know the difference between rank and row_number?

#

If not, read this: https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-ver16 "ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK provides the same numeric value for ties (for example 1, 2, 2, 4, 5)."

wooden topaz Jun 15, 2024, 2:08 PM

#

coral wasp Do you know the difference between rank and row_number?

Let's consider that you are selecting salaries column from department table.
When all the salaries are unique, there will be no difference between ROW_NUMBER(), RANK() and DENSE_RANK().

Difference comes when there are same salaries

#

#

ROW_NUMBER

No matter what the salary is, irrespective of that, ROW_NUMBER() will produce sequence of numbers one after the other.

RANK()

It's like ranking students based on the marks they got. When two students get same marks, they will be allocated same rank, isn't it. In the same way, RANK() works.
But the point to note down is, when it allocates same rank to different persons, it will skip the next rank, which can be better explained with above example.

DENSE_RANK()

It's like ranking students based on the marks they got. When two students get same marks, they will be allocated same rank, isn't it. In the same way, RANK() works.
But the point to note down is, when it allocates same rank to different persons, it won't skip the next rank, which can be better explained with above example.

coral wasp Jun 15, 2024, 2:10 PM

#

The problem specification does not say what to do when there are ties, therefore any of these options are arguably right.

wooden topaz Jun 15, 2024, 2:11 PM

#

coral wasp The problem specification does not say what to do when there are ties, therefore...

I have generalized when to use when. If you have any points to add to this, pls...

#

When to use what?

For example, if you were reporting winners, and always needed to report a first, second, and third place, regardless of ties for each place, you would use DENSE_RANK.

Otherwise, you would use RANK, which might mean that there is no second or third place. If you were certain that there could never be duplicates, then you could just use ROW_NUMBER.

coral wasp Jun 15, 2024, 2:12 PM

#

I rarely use Rank. row_number is usually what I select first, and then dense_rank if I must consider ties.

#

But, that's all a matter of preference .

wooden topaz Jun 15, 2024, 3:50 PM

#

When solving question https://datalemur.com/questions/top-fans-rank
Answer i came up with

SELECT a.artist_name, COUNT(a.artist_name) no_of_time_below_10
FROM artists a
JOIN songs s 
ON a.artist_id= s.artist_id
JOIN global_song_rank r  
on s.song_id = r.song_id
WHERE r.rank <= 10
GROUP BY(a.artist_name)
ORDER BY no_of_time_below_10, artist_name
LIMIT 5

The official solution in the solution tab was solved with the help of DENSE_RANK(). Where could have been my query deviated from the official solution? pls..

coral wasp Jun 15, 2024, 4:39 PM

#

wooden topaz When solving question https://datalemur.com/questions/top-fans-rank Answer i cam...

Just to repeat my question in pydis: is your result correct? If not, in what way is it not correct?

wooden topaz Jun 15, 2024, 4:40 PM

#

coral wasp Just to repeat my question in pydis: is your result correct? If not, in what way...

my query:

SELECT a.artist_name, COUNT(a.artist_name) artist_rank
FROM artists a
JOIN songs s 
ON a.artist_id= s.artist_id
JOIN global_song_rank r  
on s.song_id = r.song_id
WHERE r.rank <= 10
GROUP BY(a.artist_name)
ORDER BY artist_rank, artist_name
LIMIT 5

#

Corresponding output:

#

#

official answer in solution tab

WITH top_10_cte AS (
  SELECT 
    artists.artist_name,
    DENSE_RANK() OVER (
      ORDER BY COUNT(songs.song_id) DESC) AS artist_rank
  FROM artists
  INNER JOIN songs
    ON artists.artist_id = songs.artist_id
  INNER JOIN global_song_rank AS ranking
    ON songs.song_id = ranking.song_id
  WHERE ranking.rank <= 10
  GROUP BY artists.artist_name
)

SELECT artist_name, artist_rank
FROM top_10_cte
WHERE artist_rank <= 5;

#

corresponding output:

#

coral wasp Jun 15, 2024, 4:47 PM

#

And what's wrong with your answer?

wooden topaz Jun 15, 2024, 4:51 PM

#

coral wasp And what's wrong with your answer?

😦 that's my question billybobby. I am trying to understand where is my answer deviated from the official. could not find out, so came here for help.

coral wasp Jun 15, 2024, 4:59 PM

#

What's your result vs their result? Not the query, the output of the query

wooden topaz Jun 15, 2024, 5:05 PM

#

coral wasp What's your result vs their result? Not the query, the output of the query

first of all, the number of rows itself different....

coral wasp Jun 15, 2024, 5:07 PM

#

wooden topaz first of all, the number of rows itself different....

Yes, because of ties. The problem statement says: "two or more artists have the same number of song appearances, they should be assigned the same ranking, and the rank numbers should be continuous (i.e. 1, 2, 2, 3, 4, 5)"

#

I agree the specification is ambiguous here: top 5 ranks vs top 5 artists.

wooden topaz Jun 15, 2024, 5:09 PM

#

coral wasp Yes, because of ties. The problem statement says: "two or more artists have the ...

i also covered this aspect in my query.

#

coral wasp Jun 15, 2024, 5:14 PM

#

Aren't you ordering ascending? Ariana has the least number, not most.

#

And limit 5 will never work, because it'll truncate ties.

#

But your first issue is the sort order

wooden topaz Jun 15, 2024, 5:16 PM

#

coral wasp And limit 5 will never work, because it'll truncate ties.

i see your point. because of this, i am getting less no of rows in my output. but, within the output i got, yet the data is mismatching..

#

coral wasp Jun 15, 2024, 5:18 PM

#

That makes perfect sense because you're sorting in wrong order.

#

And your artist_rank isn't a rank, it's a count of top 10 hits

wooden topaz Jun 15, 2024, 5:21 PM

#

coral wasp That makes perfect sense because you're sorting in wrong order.

wrong sorting?

#

coral wasp Jun 15, 2024, 5:21 PM

#

What does your artist rank represent?

wooden topaz Jun 15, 2024, 5:25 PM

#

coral wasp What does your artist rank represent?

i see this logic. But, my thought process was like this:
Let's consider a class room with 2 students as below

name  marks
peter  10
heidi  20

If i want to rank the students, i have 2 options

RANK()
sort based on RANK which is nothing but, the student with highest marks is 1st rank & the second highest marks will become 2nd rank.

#

both are giving same kind of result. isn't it?

coral wasp Jun 15, 2024, 5:31 PM

#

What does your artisst_rank represent? It's not a rank.

wooden topaz Jun 15, 2024, 5:42 PM

#

coral wasp What does your artisst_rank represent? It's not a rank.

This query gave me clear picture

#

#

coral wasp Jun 15, 2024, 5:42 PM

#

There you go, that's how you debug!

wooden topaz Jun 15, 2024, 5:47 PM

#

coral wasp There you go, that's how you debug!

no_of_times an artist appeared in the top 10 is different from the rank.
i was thinking no_of_times as artist appeared it self as their rank which is the difference between my answer & official answer.
Thank you very much to be there & help me while i debug...
you are gem of SQL...
So, for an data science interview perspective, apart from the below topic, could you pls list down some more important things, based on your experience?

Aggregate fn's
Window fn's -- Especially ROW_NUMBER(), RANK() & DENSE_RANK()
Cluster index & non Cluster index concept
CTE

coral wasp Jun 15, 2024, 5:52 PM

#

What I'd say is start with understanding every part of the SELECt statement: https://www.postgresql.org/docs/current/sql-select.html Indices are advanced topics: it's very database specific, though, so you'll really be learning about Postgres.

PostgreSQL Documentation

SELECT

SELECT SELECT, TABLE, WITH — retrieve rows from a table or view Synopsis [ WITH [ RECURSIVE ] with_query [, …

coral wasp Jun 15, 2024, 5:53 PM

#

wooden topaz no_of_times an artist appeared in the top 10 is different from the rank. i was t...

Second, practice debugging. What you just did above (modifying query to see debug info) is the most important part. I don't write perfect sql: I'm just good at debugging

#

Third, there's other advanced topics, like: what's the difference between different databases. How does Dynamo differ from Snowflake/Duckdb/Clickhouse?

#

Fourth: learning how to express the same query in Pandas vs SQL : all of the problems above could be written in Pandas. You should learn how (if you want to be in a data role)

wooden topaz Jun 15, 2024, 5:58 PM

#

Thank you BillyBobby. I sent you an friend request. pls accept it(If you are ok)

coral wasp Jun 15, 2024, 6:01 PM

#

wooden topaz Thank you BillyBobby. I sent you an friend request. pls accept it(If you are ok)

Your persistence here is going to pay off!

finite crow Jun 15, 2024, 6:19 PM

#

coral wasp See lag() window function

Thanks, I didn't know about that.

#

https://www.tutorialspoint.com/explain-division-operation-in-relational-algebra-dbms
Does postgresql have this?

finite crow Jun 15, 2024, 8:11 PM

#

I finished a hard difficulty one on sql-practice.

#

WITH docs_years (doctor_id, `year`) AS (
    SELECT DISTINCT doctor_id,
        YEAR(admission_date)
    FROM admissions
        JOIN (
            SELECT DISTINCT doctor_id
            FROM doctors
        )
), adm_map AS (
    SELECT *
    FROM docs_years
        LEFT JOIN admissions AS a
            ON docs_years.doctor_id = a.attending_doctor_id AND docs_years.`year` = YEAR(a.admission_date)
        JOIN doctors AS d
            ON docs_years.doctor_id = d.doctor_id
)
SELECT doctor_id,
    first_name || ' ' || last_name                           AS doctor_full_name,
    specialty,
    `year`,
    CASE WHEN patient_id IS NULL THEN 0 ELSE COUNT(*) END    AS total_admissions
FROM adm_map
GROUP BY doctor_id,
    `year`
;

#

But I overcomplicated it because it looks like there aren't any doctors with no admissions.

coral wasp Jun 15, 2024, 8:28 PM

#

Let's take apart the first subquery: ```sql
SELECT DISTINCT doctor_id,
YEAR(admission_date)
FROM admissions
JOIN (
SELECT DISTINCT doctor_id
FROM doctors
)

#

What you're looking for is (something like):

#

select doctor_id, YEAR(admission_date) year, count(*) num_admissions
from doctors
  left outer join admissions
  on doctors.doctor_id = admissions.doctor_id
group by doctors.doctor_id, year

coral wasp Jun 15, 2024, 8:43 PM

#

finite crow I finished a hard difficulty one on sql-practice.

Above was for you

finite crow Jun 15, 2024, 9:27 PM

#

Thanks, but would that still work if there were doctors who had no admissions in certain years?

coral wasp Jun 15, 2024, 9:28 PM

#

That's why it's a left outer join

finite crow Jun 15, 2024, 9:31 PM

#

Oh right, I see now.

#

would num_admissions be null?

#

Or would it be 1?

#

Oh yeah, I was thinking that if year was null, then COUNT(*) would end up being 1

wooden topaz Jun 15, 2024, 9:35 PM

#

With related to https://datalemur.com/questions/signup-confirmation-rate
Below query produces result of 0.33 which is correct answer.

WITH basic_data AS(
SELECT e.email_id d, t.email_id n
FROM emails e  
LEFT JOIN texts t
ON e.email_id = t.email_id
AND t.signup_action = 'Confirmed')

SELECT ROUND(COUNT(n)::DECIMAL / COUNT(d),2)
FROM basic_data

If i remove the DECIMAL from last but one line, then it results in 0. what could be the reason, pls..

hollow oar Jun 15, 2024, 9:38 PM

#

wooden topaz With related to https://datalemur.com/questions/signup-confirmation-rate Below q...

have a read here - https://www.postgresql.org/docs/14/functions-math.html#:~:text=Division (for integral types%2C division truncates the result towards zero)
that's a very common mistake when dealing with division.

PostgreSQL Documentation

9.3. Mathematical Functions and Operators

9.3. Mathematical Functions and Operators Mathematical operators are provided for many PostgreSQL types. For types without standard mathematical conventions (e.g., date/time …

finite crow Jun 15, 2024, 9:43 PM

#

finite crow I finished a hard difficulty one on sql-practice.

Now I got the two hard difficulties from the northwind.db problems.

#

Except on the last one, I can't figure out why it's SUM(unit_price * quantity * discount) instead of just SUM(quantity * discount)

coral wasp Jun 15, 2024, 10:39 PM

#

How could you debug this? How would you write a query that gives more insight into what's happening?

unkempt prism Jun 16, 2024, 12:08 AM

#

at least one product from each product category.

I don't see any reference to product_category in your attempt.

finite crow Jun 16, 2024, 1:11 AM

#

Are you expected to be really knowledgeable in statistics for most jobs that use SQL?

unkempt prism Jun 16, 2024, 2:22 AM

#

by summing the product count you are discarding the category

A better way is to use distinct. This should work in postgres. I'm unsure with SQL Server though I assume there is some sort of equivalent syntax.

SELECT cc.customer_id,
FROM customer_contracts cc  
JOIN products p   
ON cc.product_id = p.product_id
GROUP BY cc.customer_id
HAVING COUNT(distinct product_category) >=3 
ORDER BY customer_id

#

Saying that you may get the right answer if you use COUNT and not SUM here. to make it SELECT customer_id, COUNT(product_count) That way if one category has 2 products its not over represented in the result as the question doesn't ask for it.

finite crow Jun 16, 2024, 2:33 AM

#

Why would the expected output not include month 6? It shows right there that user 445 was active that month and the previous month. I don't get it.

#

https://datalemur.com/questions/user-retention

coral wasp Jun 16, 2024, 4:41 AM

#

finite crow Why would the expected output not include month 6? It shows right there that use...

I think it's just saying that the output for June 2022 is that, not that it's the entire output

finite crow Jun 16, 2024, 4:46 AM

#

No that is my output

finite crow Jun 16, 2024, 8:01 AM

#

I see now.

#

Write a query to obtain number of monthly active users (MAUs) in July 2022

#

I missed the "July". I thought it wanted the whole year.

#

That seems easier.

wooden topaz Jun 16, 2024, 2:58 PM

#

https://datalemur.com/questions/supercloud-customer

Microsoft SQL Interview Question | DataLemur

Microsoft SQL Interview Question: Write a query to identify the Azure Supercloud customers.

#

The solution i came up with is:

#

WITH basic_data_with_count AS(
SELECT cc.customer_id, p.product_category, COUNT(product_name) product_count
FROM customer_contracts cc  
JOIN products p   
ON cc.product_id = p.product_id
GROUP BY cc.customer_id, p.product_category)

SELECT customer_id, SUM(product_count)
FROM basic_data_with_count
GROUP BY customer_id
HAVING SUM(product_count) >=3
ORDER BY customer_id

#

#

not getting where i went wrong...

coral wasp Jun 16, 2024, 3:06 PM

#

wooden topaz not getting where i went wrong...

Think about how to rewrite this query to get more information about what it's returning. For instance: which products and product categories did each customer see? If you do that, you'll see your mistake.

reef ermine Jun 16, 2024, 3:14 PM

#

wooden topaz not getting where i went wrong...

The count(product_name) in your CTE will also count cases when 1 product_category that has multiple product names - consider using COUNT(DISTINCT )

coral wasp Jun 16, 2024, 3:17 PM

#

wooden topaz ```python WITH basic_data_with_count AS( SELECT cc.customer_id, p.product_catego...

What you need to learn next is: Stop trying to debug a query by analyzing the code. Start debugging by adding information or modifying the query to understand what is happening.

#

A lot of new developers stare at their code trying to find the bug, rather than getting more information.

echo rampart Jun 16, 2024, 3:19 PM

#

what is this

wooden topaz Jun 16, 2024, 3:44 PM

#

coral wasp What you need to learn next is: Stop trying to debug a query by analyzing the co...

ok. i deleted everything & staring a fresh...
i will write querry step by step & paste here.....when you see somehting is going wrong. just show me red flag

#

SELECT cc.customer_id, p.product_category, p.product_name
FROM customer_contracts cc  
JOIN products p  
ON cc.product_id = p.product_id

#

now i have basic data in my hand.
what i need to do is: find whether customer is buying at least one product per each product_category. I am thinking on this now.

#

came up with this:

#

SELECT cc.customer_id, p.product_category,
CASE
  WHEN COUNT(p.product_name) >= 1 THEN 1
  ELSE 0
END AS product_buyed
FROM customer_contracts cc  
JOIN products p  
ON cc.product_id = p.product_id 
GROUP BY cc.customer_id, p.product_category

wooden topaz Jun 16, 2024, 4:12 PM

#

pls leave above...i came up with new approach

#


SELECT cc.customer_id, p.product_category
FROM customer_contracts cc  
JOIN products p  
ON cc.product_id = p.product_id 
GROUP BY cc.customer_id, p.product_category
HAVING COUNT(cc.customer_id) >= 3
ORDER BY cc.customer_id

coral wasp Jun 16, 2024, 4:29 PM

#

It's really hard to debug when you compute the count in the having clause. Move the count to the select, and reference it in the having: select count(...) as mycount from ... having mycount>3. Then, use something like group_concat(...) to inspect the values.

coral wasp Jun 16, 2024, 4:29 PM

#

wooden topaz pls leave above...i came up with new approach

This is a weird formulation of the question: show me all customers that have three or more contracts?

wooden topaz Jun 16, 2024, 5:45 PM

#

finally....

wooden topaz Jun 16, 2024, 5:46 PM

#

coral wasp It's really hard to debug when you compute the count in the having clause. Move ...

WITH basic_data AS(
SELECT cc.customer_id, p.product_category
FROM customer_contracts cc
JOIN products p
ON cc.product_id = p.product_id
GROUP BY cc.customer_id, p.product_category)

SELECT customer_id
FROM basic_data
GROUP BY customer_id
HAVING COUNT(product_category) >= 3
finally....

wooden topaz Jun 16, 2024, 5:46 PM

#

coral wasp It's really hard to debug when you compute the count in the having clause. Move ...

but, this one i did not get. could you pls show me small snippet, when you have few minutes

#

below is still more improved answer

WITH basic_data AS(
SELECT cc.customer_id, p.product_category
FROM customer_contracts cc
JOIN products p
ON cc.product_id = p.product_id 
GROUP BY cc.customer_id, p.product_category)

SELECT customer_id
FROM basic_data
GROUP BY customer_id
HAVING COUNT(product_category) >= (SELECT COUNT(DISTINCT product_category)
FROM products)

wooden topaz Jun 16, 2024, 6:28 PM

#

#

may be the measurement_id is potraied as integer in question, but it could be char behind the scenes

half gyro Jun 16, 2024, 6:51 PM

#

@wooden topaz why do you ask SQL questions in #python-discussion?

wise wind Jun 16, 2024, 6:53 PM

#

wooden topaz

I recommend taking the time to figure the problem out on your own. You're not always going to have someone help you.

half gyro Jun 16, 2024, 6:56 PM

#

wise wind I recommend taking the time to figure the problem out on your own. You're not al...

i just wish they would ever respond to me when I ask them not to post these questions in the wrong channel. I've never gotten a reply.

wise wind Jun 16, 2024, 7:08 PM

#

half gyro i just wish they would ever respond to me when I ask them not to post these ques...

Yeah I just think he should learn to solve things on his own because that how you learn. Also, he has a lot of messages from previous conversations that hold valuable information that could solve his problems.

grim quest Jun 16, 2024, 9:58 PM

#

you guys know any good SQL interview prep sources? i'm looking for something that at least sorts its problems by category, like tagging them as joins for example

#

https://leetcode.com/studyplan/top-sql-50/

#

this is the only one i've found but i just wanna see if i can find some more variety in the questions

coral wasp Jun 16, 2024, 10:10 PM

#

grim quest you guys know any good SQL interview prep sources? i'm looking for something tha...

Scroll up a bit, you'll see a few like datalemur

finite crow Jun 16, 2024, 10:32 PM

#

How much statistics do you need to know for most jobs that use SQL?

#

I tried to do one of the easy statistics questions and couldn't do it.

#

I didn't understand the terminology or what it was asking.

coral wasp Jun 16, 2024, 10:36 PM

#

finite crow I tried to do one of the easy statistics questions and couldn't do it.

What question?

coral wasp Jun 16, 2024, 10:37 PM

#

finite crow How much statistics do you need to know for most jobs that use SQL?

'How much' is probably not quantifiable, nor is 'for most jobs'

finite crow Jun 16, 2024, 10:38 PM

#

coral wasp What question?

The first one

#

Basically ANY statistics

coral wasp Jun 16, 2024, 10:38 PM

#

finite crow The first one

Link?

finite crow Jun 16, 2024, 10:39 PM

#

coral wasp Link?

https://datalemur.com/questions/coin-fairness-test

coral wasp Jun 16, 2024, 10:39 PM

#

finite crow <https://datalemur.com/questions/coin-fairness-test>

Are you in Uni?

finite crow Jun 16, 2024, 10:39 PM

#

I basically only know mean, median, and mode

finite crow Jun 16, 2024, 10:40 PM

#

coral wasp Are you in Uni?

Yes but not for math

coral wasp Jun 16, 2024, 10:40 PM

#

What major?

finite crow Jun 16, 2024, 10:40 PM

#

CS

coral wasp Jun 16, 2024, 10:43 PM

#

finite crow CS

Doesn't your Cs program require a stats course?

#

I've never seen one that doesn't.

finite crow Jun 16, 2024, 10:43 PM

#

No, I've never taken stats

#

I took calculus that I've mostly forgot since it's been awhile.

coral wasp Jun 16, 2024, 10:45 PM

#

That's weird. Well, I think it's expected that SWEs should have taken engineering stats. That's my experience, at least. And this question is like chapter 1 of engineering stats

#

Even business majors cover this in their lighter business stats course (non calc based)

finite crow Jun 16, 2024, 10:46 PM

#

Is it hard?

#

Could I just look at khan academy?

coral wasp Jun 16, 2024, 10:47 PM

#

finite crow Is it hard?

I dunno. It's a 1 semester class, usually. It's not easy, but everyone takes it, ime

finite crow Jun 16, 2024, 10:48 PM

#

Lmk if you know of any resources to learn it quickly. I don't know any statistics at all.

#

I have summer off. Then I need to go back and I won't have any time.

coral wasp Jun 16, 2024, 10:49 PM

#

Off the top of my head, no, but it's a normal course that is probably very similar at every Uni. Find a syllabus/cirriculum and maybe ask in #data-science-and-ml

#

(More math ppl there than here, I think)

finite crow Jun 16, 2024, 10:50 PM

#

Alright, I'll ask there. Thanks for telling me about this.

echo rampart Jun 17, 2024, 1:19 AM

#

so good

finite crow Jun 17, 2024, 1:25 AM

#

finite crow That seems easier.

Got it.

WITH months AS (
    SELECT DISTINCT user_id,
        EXTRACT(MONTH FROM event_date) AS "month"
    FROM user_actions
    WHERE (
        (
            EXTRACT(YEAR FROM event_date) = 2022
            AND (EXTRACT(MONTH FROM event_date) = 6 OR EXTRACT(MONTH FROM event_date) = 7)
        )
    )
)
SELECT current_month.month,
    count(*) AS monthly_active_users
FROM months AS current_month
WHERE EXISTS (
    SELECT months.user_id,
        months.month
    FROM months
    WHERE months.month = current_month.month - 1 AND months.user_id = current_month.user_id
)
GROUP BY current_month.month
;

pliant compass Jun 17, 2024, 12:26 PM

#

Hello everyone,
I have a quick question. I’m building an app and I’m trying to connect it to an online database, but I actually don’t know how to do it. I keep getting an error saying I don’t have privileges, but gave my user all the privileges. Also, how do I setup an online database for my app. Can I use regular website hosting plan and just create a database?

wise goblet Jun 17, 2024, 1:01 PM

#

pliant compass Hello everyone, I have a quick question. I’m building an app and I’m trying to c...

It sounds like u use we can't possibly know what

#

Databases can be very different

#

And it also sounds like u use mysterious provider of some managed databases. Which once again we can't know which one u use

pliant compass Jun 17, 2024, 1:02 PM

#

I'm using Python and the provider is Hostinger

#

Hostinger for web hosting

#

So what I did, was I created a tic tac toe game, but I need to connect it to an API, which means I need a database

#

So I used a web hosting platform because it has a database, but to me that doesn't seem like the right solution

wise goblet Jun 17, 2024, 1:58 PM

#

pliant compass So I used a web hosting platform because it has a database, but to me that doesn...

Looks very complicated

#

I will offer using the good provider Hetzner https://www.hetzner.com/
That just provides you with Linux OS machine at which u can run anything

Dedicated Server, Cloud, Storage & Hosting

Your partner for system-wide hosting, from cloud to dedicated servers. We deploy the latest tech at the best price in minutes.

wise goblet Jun 17, 2024, 1:59 PM

#

pliant compass So I used a web hosting platform because it has a database, but to me that doesn...

still dunno which db u are using, but Sqlite3 is a solid start, that can be run at the same server

wise goblet Jun 17, 2024, 2:03 PM

#

pliant compass So I used a web hosting platform because it has a database, but to me that doesn...

For especially perveted people, there is an option to just use shelve 😄 https://docs.python.org/3/library/shelve.html
Sqlite3 is good default for pet projects. But sometimes makes sense dumb solution like shelve too i guess

wise goblet Jun 17, 2024, 2:05 PM

#

pliant compass So I used a web hosting platform because it has a database, but to me that doesn...

https://docs.python.org/3/library/sqlite3.html here is tutorial to sqlite3

Python documentation

sqlite3 — DB-API 2.0 interface for SQLite databases

Source code: Lib/sqlite3/ SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard ...

#

no library installing is necessary in python for its usage

#

but using it with ORM is ergh... more simple if necessary

#

i could recommend using it with https://sqlitebrowser.org/ which is very nice GUI to traverse it

DB Browser for SQLite

#

i would say, using migrating system in python is kind of obligatory to have simple usage of SQL

#

SQLAlchemy + Alembic is strongly encouraged to use in python to have very simple life with SQL
(or Django ORM for Djangoists)

pliant compass Jun 17, 2024, 2:21 PM

#

Yeah, I was using SQL. What other database is there? I thought almost all used SQL

#

I don't know oif there is difference between Sql and sqlite3, but the setup I used was very confusing 😁

storm mauve Jun 17, 2024, 2:23 PM

#

SQL is a structured query language, used to interact with relational databases

relational databases store data in a format similar to spreadsheets, with well defined rows and columns, and nearly all of them do use SQL, but there are also a lot of other databases which store data in less structured formats, such as documents (MongoDB) or even graphs (Neo4J)

pliant compass Jun 17, 2024, 2:24 PM

#

Oh, well, yes mine is a relational database for sure

storm mauve Jun 17, 2024, 2:25 PM

#

SQLite, PostgreSQL and others are databases, which you can interact with using the SQL language

#

SQL is only used for queries though, you still need of a programming language like Python to interact with other things, and you need of a program that actually sends the query to the database

pliant compass Jun 17, 2024, 2:26 PM

#

Yeah, so I'm doing that now using Python, but the problem lies in the query

#

It just doesn't work 😕

#

I keep getting an error saying my user doesn't have privileges, so I don't know if it's the database or the provider at this point

storm mauve Jun 17, 2024, 2:27 PM

#

I would recommend using SQLite to make sure it works, before you try to debug connecting to an online database

pliant compass Jun 17, 2024, 2:27 PM

#

OK 👍

storm mauve Jun 17, 2024, 2:27 PM

#

pliant compass I keep getting an error saying my user doesn't have privileges, so I don't know ...

right now it is probably an issue with the provider

wise goblet Jun 17, 2024, 2:29 PM

#

pliant compass Yeah, I was using SQL. What other database is there? I thought almost all used S...

SQL is good default, working for 95%+ usage cases.
Do use its engine version Sqlite3, it is just a file in filesystem

#

Nice for pet projects

storm mauve Jun 17, 2024, 2:29 PM

#

there is:

The database itself
Whatever is hosting it and managing external connections to the database (their network -> their database) *****
Whatever you are using to connect to it (your network -> their network) *
Your program itself (your program -> your network)

Right now it is probably on that * level aka 'fix your credentials'

wise goblet Jun 17, 2024, 2:30 PM

#

pliant compass Yeah, I was using SQL. What other database is there? I thought almost all used S...

In general case noSQL databases should be avoided used unless u became professional enough to know when they are justified (very very rarely)

finite crow Jun 17, 2024, 4:57 PM

#

SELECT profit.year,
    profit.product_id,
    profit.curr_year_spend,
    CASE
        WHEN (
            profit.product_id = LAG(product_id, 1) OVER (ORDER BY profit.year, profit.product_id)
            THEN LAG(product_id, 1) OVER (ORDER BY profit.year, profit.product_id)
        )
    ELSE
        NULL
    END AS prev_year_spend,

Why is this invalid syntax?

#

syntax error at or near "THEN"

#

no wait

finite crow Jun 17, 2024, 5:00 PM

#

finite crow `syntax error at or near "THEN"`

yeah, still this.

#

nvm, apparently it was the parentheses

finite crow Jun 17, 2024, 5:18 PM

#

coral wasp See lag() window function

Thanks again, I got one with it:

WITH profit (product_id, "year", curr_year_spend, prev_year_spend) AS (
    SELECT product_id,
        EXTRACT(YEAR from transaction_date),
        SUM(spend),
        CASE
            WHEN
                product_id = LAG(product_id, 1, NULL)
                OVER (ORDER BY product_id, EXTRACT(YEAR from transaction_date))
                THEN LAG(SUM(spend), 1, NULL)
                OVER (ORDER BY product_id, EXTRACT(YEAR from transaction_date))
        ELSE
            NULL
        END AS prev_year_spend
    FROM user_transactions
    GROUP BY product_id,
        EXTRACT(YEAR from transaction_date)
    ORDER BY product_id,
        EXTRACT(YEAR from transaction_date)
)
SELECT profit.year,
    profit.product_id,
    profit.curr_year_spend,
    profit.prev_year_spend,
    CASE
        WHEN
            profit.prev_year_spend IS NOT NULL
            THEN ROUND(((profit.curr_year_spend - profit.prev_year_spend) / profit.prev_year_spend) * 100, 2)
        ELSE
            NULL
        END AS yoy_rate
FROM profit
ORDER BY profit.product_id,
    profit.year
;

https://datalemur.com/questions/yoy-growth-rate

coral wasp Jun 17, 2024, 7:19 PM

#

A few things: DRY: don't repeat yourself. You have extract(year..) four times.

#

I don't understand that case in the CTE. Looks like a hack that could be replaced with PARTITION BY product_id

#

well, I'll eave that there.

finite crow Jun 17, 2024, 7:33 PM

#

coral wasp A few things: DRY: don't repeat yourself. You have extract(year..) four times.

The query optimizer wouldn't fix THAT?

coral wasp Jun 17, 2024, 7:34 PM

#

That's not the only reason we don't repeat ourselves.

finite crow Jun 17, 2024, 7:35 PM

#

Would you give it an alias and then use the alias?
I remember trying to do that at a certain point, and it said the column doesn't exist, so I've just been repeating expressions instead.

coral wasp Jun 17, 2024, 7:38 PM

#

Push it into a subquery and give it an alias there.

mild gyro Jun 17, 2024, 7:46 PM

#

do i have to install mongodb to access it or there is another way

storm mauve Jun 17, 2024, 7:48 PM

#

mild gyro do i have to install mongodb to access it or there is another way

what do you mean?

you have to install the driver to connect to the database, but for the database itself, you can either run it locally or in some cloud.
in Mongodb's case, they have a free cloud tier under their MongoDB Atlas offering

mild gyro Jun 17, 2024, 7:50 PM

#

so i dont have to ..
thanks!

wooden topaz Jun 17, 2024, 7:52 PM

#

https://datalemur.com/questions/sql-bloomberg-stock-min-max-1
In official solution present inside the Solutions tab, data is GROUPED BY month. The questions doesn't ask to provide data month wise. still why to GROUP BY date? pls...

storm mauve Jun 17, 2024, 7:54 PM

#

oh wait, never mind

wooden topaz Jun 17, 2024, 7:55 PM

#

Currently, you're analyzing the highest and lowest open prices for each FAANG stock by month over the years.

input data is for every month, highest stock price & lowest stock price
from that input data, i need to pick up below per stock
which month has highest stock price & what is that month
which month has lowest stock price & what is that month

storm mauve Jun 17, 2024, 7:55 PM

#

I'd say just try doing it without grouping by and see what you get

#

I am not signing up to see their Solution though

finite crow Jun 17, 2024, 7:57 PM

#

coral wasp I don't understand that `case` in the CTE. Looks like a hack that could be repla...

You're right, this worked.

WITH profit (product_id, "year", curr_year_spend, prev_year_spend) AS (
    SELECT product_id,
        EXTRACT(YEAR from transaction_date),
        SUM(spend),
        LAG(SUM(spend), 1, NULL) OVER (PARTITION BY product_id ORDER BY EXTRACT(YEAR from transaction_date)) AS prev_year_spend

wooden topaz Jun 17, 2024, 7:58 PM

#

storm mauve I'd say just try doing it without grouping by and see what you get

I am trying to solve the problem step by step without looking at solution.
Progressed up to here..

SELECT ticker,
MAX(open) highest_open,
MIN(open) lowest_open
FROM stock_prices
GROUP BY ticker
ORDER BY ticker

#

#

This is actually correct output. Now, all i want it what is the corresponding month where this highest_open & lowerst_open are happening.
But, i try to include blow in the SELECT statement, it says it needs to be included in the GROUP BY & on the other hand, i should not GROUP BY month. How to deal with this dichotomy?

TO_CHAR(date, 'Mon-YYYY') mth

coral wasp Jun 17, 2024, 8:00 PM

#

finite crow You're right, this worked. ```postgresql WITH profit (product_id, "year", curr_y...

You don't need to do the aliases in the with part:

coral wasp Jun 17, 2024, 8:00 PM

#

finite crow You're right, this worked. ```postgresql WITH profit (product_id, "year", curr_y...

WITH profit AS (
    SELECT product_id,
        EXTRACT(YEAR from transaction_date) as "year", --- use a different name for this, don't use a keyword/function name/etc
        SUM(spend) as curr_year_spend,
        LAG(SUM(spend), 1, NULL) OVER (PARTITION BY product_id ORDER BY EXTRACT(YEAR from transaction_date)) AS prev_year_spend

storm mauve Jun 17, 2024, 8:01 PM

#

wooden topaz This is actually correct output. Now, all i want it what is the corresponding mo...

as far as I understand:
The database cannot guess what you want to happen when rows inside of the same ticker group have different dates

#

not sure if you can do something like WHERE/HAVING min/max

coral wasp Jun 17, 2024, 8:02 PM

#

wooden topaz I am trying to solve the problem step by step without looking at solution. Prog...

" all i want it what is the corresponding month where this highest_open & lowerst_open are happening": You want to know the month of the highest open, and separately, the month of the lowest open? Two different months?

wooden topaz Jun 17, 2024, 8:06 PM

#

coral wasp " all i want it what is the corresponding month where this highest_open & lowers...

this red colour data is needed

coral wasp Jun 17, 2024, 8:08 PM

#

wooden topaz this red colour data is needed

Yah, normal pattern: one query to get the minimums, then a second query to find rows where the value = that minimum.

#

Another approach is to use row_number to rank the values, then filter on rows where rank = 1

wooden topaz Jun 17, 2024, 8:10 PM

#

coral wasp Yah, normal pattern: one query to get the minimums, then a second query to find ...

actually, in the solution which is written in solution tab, there is GROUP BY month.
But, the qn dont ask us about month wise data, right? it is asking across whole time......

coral wasp Jun 17, 2024, 8:11 PM

#

wooden topaz actually, in the solution which is written in solution tab, there is GROUP BY mo...

Then turn the data into a monthly summary first.

wooden topaz Jun 17, 2024, 8:11 PM

#

coral wasp Then turn the data into a monthly summary first.

no. my question is about why GROUP BY month needs to be there?

#

coral wasp Jun 17, 2024, 8:15 PM

#

I don't understand. If you want to group by the month, then you need to group by the month?

wooden topaz Jun 17, 2024, 8:20 PM

#

coral wasp I don't understand. If you want to group by the month, then you need to group by...

pls forget about the solution. it may make to look through hind sight.

When we read the qn, it no where asks that the results shoud be grouped by month.. So, then why GROUP BY month is used in the solution?

coral wasp Jun 17, 2024, 8:21 PM

#

Didn't the question ask for which month has the highest or lowest?

wooden topaz Jun 17, 2024, 8:25 PM

#

coral wasp Didn't the question ask for which month has the highest or lowest?

suppose, below is the sample data...
The questions doesn't ask as below
with in Jan months group, what is the highest?
with in feb months group, what is the highest?
with in march months group, what is the highest?

It asks among whole data, which month has the highest & what is that month?

#

i could be wrong also. in that case, pls tell we where i mis undertood the question....

coral wasp Jun 17, 2024, 8:30 PM

#

wooden topaz suppose, below is the sample data... The questions doesn't ask as below with in ...

Yes. One way to calculate the highest and lowest is to summarize the data by month.

wooden topaz Jun 17, 2024, 9:11 PM

#

coral wasp Yes. One way to calculate the highest and lowest is to summarize the data by mon...

????? That completely gives wrong results right?
when you GROUP BY month, the answer will be:
jan 20

feb 40

march 50

#

but, when you dont GROUP BY, answer is
march 50

coral wasp Jun 17, 2024, 9:15 PM

#

wooden topaz ????? That completely gives wrong results right? when you GROUP BY month, the an...

If you group by month, you'd then use a second query to find the month with the largest value.

wooden topaz Jun 17, 2024, 9:27 PM

#

coral wasp If you group by month, you'd then use a second query to find the month with the ...

not getting what to do from here, but any way, pls leave it. i am wasting your time also..

SELECT ticker,
MAX(open) highest_open,
MIN(open) lowest_open
FROM stock_prices
GROUP BY ticker
ORDER BY ticker

coral wasp Jun 17, 2024, 9:28 PM

#

Don't think of it as a "single query". Think of it as a sequence of questions:

#

Start at the end: What information do you need to know which months are the highest?

#

The idea is to work backwards from the result, and construct subqueries that make it easier

finite crow Jun 17, 2024, 10:36 PM

#

does x != y evaluate to FALSE if y is NULL and x isn't?

#

WHERE user_id = lag2 AND (user_id != lag3 OR lag3 IS NULL)
There was a WHERE clause that didn't work until I wrote it like that.

remote leaf Jun 17, 2024, 10:37 PM

#

coral wasp Don't think of it as a "single query". Think of it as a sequence of questions:

Not disagreeing. This approach works. But what works me as well is breaking it up. Get the first small thing because I can verify that works. Then build complexity onto it. Just 2 cents.

coral wasp Jun 17, 2024, 10:47 PM

#

remote leaf Not disagreeing. This approach works. But what works me as well is breaking it u...

fair, I'm not actually sure which way I do it. end to beginning or beginning to end. I guess depends if I already know the pattern.

coral wasp Jun 17, 2024, 10:47 PM

#

finite crow does `x != y` evaluate to FALSE if y is NULL and x isn't?

This is super important to understand, and so important that I won't tell you the answer. Test it yourself.

#

(after you knwo answer, I'm happy to explain)

finite crow Jun 17, 2024, 11:12 PM

#

coral wasp This is super important to understand, and so important that I won't tell you th...

It seems like it evaluates to false.

coral wasp Jun 17, 2024, 11:13 PM

#

finite crow It seems like it evaluates to false.

Even better, what if both are null?

finite crow Jun 17, 2024, 11:14 PM

#

I know that null isn't equal to null. But I didn't check if it's inequal to null.

#

You know what I mean?

coral wasp Jun 17, 2024, 11:14 PM

#

You should check. Very important.

finite crow Jun 17, 2024, 11:32 PM

#

SELECT *
FROM candidates
WHERE (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
)
LIMIT 1
;

This gets no data.

#

SELECT *
FROM candidates
WHERE NOT (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
)
LIMIT 1
;

This also gets no data.

#

SELECT *
FROM candidates
WHERE NOT (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
) IS NULL
LIMIT 1
;

This also gets no data.

#

SELECT *
FROM candidates
WHERE (
    NULL = NULL
    OR NULL != NULL
    OR NOT (NULL = NULL)
    OR NOT (NULL != NULL)
) IS NULL
LIMIT 1
;

This DOES get data.

#

That seems really weird.

coral wasp Jun 17, 2024, 11:35 PM

#

You should build a truth table

finite crow Jun 17, 2024, 11:40 PM

#

Is there an existing truth table?

#

Or a set of rules for this?

#

It seems like any comparison involving NULL evaluates to NULL.

coral wasp Jun 17, 2024, 11:44 PM

#

finite crow Is there an existing truth table?

You should build one... but:

#

The idea is this - null represents absence of knowledge.

#

You can't reason about two unknowns.

#

Does "I don't know" equal "I don't know?

#

Does "I don't know" not equal "I don't know"?

finite crow Jun 17, 2024, 11:46 PM

#

finite crow It seems like any comparison involving NULL evaluates to NULL.

Then is this accurate?

coral wasp Jun 17, 2024, 11:48 PM

#

Test it

finite crow Jun 17, 2024, 11:50 PM

#

It seems like it does, but I'm not really sure.

#

And google isn't telling me yes or no.

#

Idk if there are exceptions to it evaluating to NULL.

coral wasp Jun 17, 2024, 11:56 PM

#

finite crow It seems like it does, but I'm not really sure.

Why aren't you sure? What test isn't clear?

finite crow Jun 17, 2024, 11:58 PM

#

I mean if just not sure if the tests I did are comprehensive enough.

#

I just did some comparison tests, like less than, greater than, equal to, inequal to

#

You know what I mean?

coral wasp Jun 18, 2024, 12:00 AM

#

finite crow You know what I mean?

Yup, I understand... so, now google: "sql three valued logic". That's what you're experiencing.

brazen charm Jun 18, 2024, 12:00 AM

#

Normally NULL handling differs from DB to DB as well

finite crow Jun 18, 2024, 12:01 AM

#

Even for something like THAT?

coral wasp Jun 18, 2024, 12:01 AM

#

finite crow Even for something like THAT?

The stuff you're looking at I think is well standardized.

finite crow Jun 18, 2024, 12:02 AM

#

I hope so

coral wasp Jun 18, 2024, 12:02 AM

#

there are differences in some of the edge cases, like concatenation with nulls, for instance

finite crow Jun 18, 2024, 12:08 AM

#

So basically think of NULL as "it could be anything" instead of "nothing"?

#

That's what it seems like.

coral wasp Jun 18, 2024, 12:10 AM

#

Yes exactly

finite crow Jun 18, 2024, 12:11 AM

#

Alright then, that clears that up. Thank you.

coral wasp Jun 18, 2024, 12:13 AM

#

finite crow Alright then, that clears that up. Thank you.

Also many dbs have null safe comparisons. Some db's use 'is distinct from', MySQL has operators like <=> https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_equal-to

thorny anchor Jun 18, 2024, 12:16 AM

#

sqlserver 🗿

elfin cradle Jun 18, 2024, 7:53 AM

#

!paste

#

https://paste.pythondiscord.com/ACFA

#

what do i even do

#


db = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="root"
)```

#

im just tryna setup mysql 😭

cedar tiger Jun 18, 2024, 7:59 AM

#

elfin cradle ```import mysql.connector db = mysql.connector.connect( host="localhost", ...

Did it work?

elfin cradle Jun 18, 2024, 7:59 AM

#

cedar tiger Did it work?

the error is the the paste

#

i dont how this even works tbh im just tryna figure it out as i go

cedar tiger Jun 18, 2024, 8:00 AM

#

elfin cradle ```import mysql.connector db = mysql.connector.connect( host="localhost", ...

passwd != password

elfin cradle Jun 18, 2024, 8:00 AM

#

im an idiot

#

whoops

cedar tiger Jun 18, 2024, 8:00 AM

#

elfin cradle i dont how this even works tbh im just tryna figure it out as i go

Follow the docs: https://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html

elfin cradle Jun 18, 2024, 8:00 AM

#

ty

elfin cradle Jun 18, 2024, 8:00 AM

#

cedar tiger Follow the docs: https://dev.mysql.com/doc/connector-python/en/connector-python-...

getting sam error

cedar tiger Jun 18, 2024, 8:02 AM

#

elfin cradle getting sam error

Try this: https://stackoverflow.com/a/50027851

Stack Overflow

"Authentication plugin 'caching_sha2_password'

I'm new to MySql environment and installed :
MySQL with the following commands:
sudo apt-get update
sudo apt-get install mysql-server
mysql_secure_installation

and also installed mysql workbench.
...

elfin cradle Jun 18, 2024, 8:05 AM

#

cedar tiger Try this: https://stackoverflow.com/a/50027851

i dont understand that page, does that code go into command prompt?

#

sorry i know im asking to be spoonfed but im confused

cedar tiger Jun 18, 2024, 8:05 AM

#

elfin cradle i dont understand that page, does that code go into command prompt?

have you tried logging into your mysql without python?

elfin cradle Jun 18, 2024, 8:05 AM

#

yes

#

works fine

#

did it thru workbench

cedar tiger Jun 18, 2024, 8:06 AM

#

OK then run the commands or alter your user to use the mysql_native_password. Not sure how that looks like in workbench

#

But its something to edit your user in mysql

elfin cradle Jun 18, 2024, 8:06 AM

#

wdym alter user

cedar tiger Jun 18, 2024, 8:06 AM

#

alter also means "change"

elfin cradle Jun 18, 2024, 8:07 AM

#

ik but how

cedar tiger Jun 18, 2024, 8:07 AM

#

🤷‍♂️

#

Idk what that looks like in workbench

#

But

#

The command stated in that answer shows it can be done via CLI

elfin cradle Jun 18, 2024, 8:07 AM

#

so would i need to run those commands in command prompt?

#

cli?

cedar tiger Jun 18, 2024, 8:08 AM

#

If you can login to your mysql server via command prompt/cli, then you can alter your user from there (by following the commands shared)

torn sphinx Jun 18, 2024, 1:50 PM

#

Can someone help me complete the following process? I am trying to download Postgres to do this but I dont know which version I need and what packages to use?
Setting up a Django project locally.
Setting up a MySQL or Postgres database locally, ideally using Docker.
Make sure things can run locally, all connected.
Understand what properties are being edited to control the connection between the Django app and database.
Use ENV variables to define those properties.
Figure out how to Dockerize the Django app
Deploy a cloud database. (AWS RDS? but that costs money? other suggestions)
Set up users and grants in the cloud database.
Deploy the Django app in a cloud provider with the right ENV variables so it points to your cloud database

fading patrol Jun 18, 2024, 2:17 PM

#

torn sphinx Can someone help me complete the following process? I am trying to download Pos...

Use the official Docker image https://hub.docker.com/_/postgres

postgres - Official Image | Docker Hub

The PostgreSQL object-relational database system provides reliability and data integrity.

torn sphinx Jun 18, 2024, 3:23 PM

#

fading patrol Use the official Docker image https://hub.docker.com/_/postgres

I am reading this, does it say anything anywhere about what Postgres version to download, also is it true I do not need to techncially download anything for Docker?

fading patrol Jun 18, 2024, 3:29 PM

#

torn sphinx I am reading this, does it say anything anywhere about what Postgres version to ...

By default Docker will pull the latest version.

If you haven't installed Docker, you may need to download that

torn sphinx Jun 18, 2024, 4:39 PM

#

fading patrol By default Docker will pull the latest version. If you haven't installed Docker...

i have download docker I keep getting this error

fading patrol Jun 18, 2024, 7:30 PM

#

torn sphinx i have download docker I keep getting this error

Is that Amazon Linux? Have you tried sudo yum install docker? https://www.cyberciti.biz/faq/how-to-install-docker-on-amazon-linux-2/

finite crow Jun 18, 2024, 7:49 PM

#

WITH cte1 AS (
    SELECT SUM(num_users) OVER (ORDER BY searches) user_count,
        searches
    FROM search_frequency
    WHERE num_users > 0
    ORDER BY searches
), cte2 AS (
    SELECT user_count                                 AS uc_high,
        searches                                      AS s_high,
        LAG(user_count) OVER (ORDER BY user_count)    AS uc_low,
        LAG(searches) OVER (ORDER BY user_count)      AS s_low
    FROM cte1
    ORDER BY user_count
), cte3 AS (
    SELECT MAX(uc_high) AS uc_total
    FROM cte2
), cte4 AS (
    SELECT *
    FROM cte3
        CROSS JOIN cte2
    WHERE uc_low <= (CAST(uc_total AS DECIMAL) / 2) AND uc_high > (CAST(uc_total AS DECIMAL) / 2)
), cte5 AS (
    SELECT
        CASE
            WHEN MOD(uc_total, 2) = 1 THEN s_low
        ELSE
            CASE
                WHEN uc_low = (CAST(uc_total AS DECIMAL) / 2) THEN (CAST(s_low AS DECIMAL) + s_high) / 2
            ELSE
                s_low
            END
        END AS median_not_rounded
    FROM cte4
)
SELECT ROUND(median_not_rounded, 1) AS median
FROM cte5
;

https://datalemur.com/questions/median-search-freq
Their solution did this in ten lines. It's kind of frustrating.

finite crow Jun 18, 2024, 8:06 PM

#

WITH searches_expanded AS (
  SELECT searches
  FROM search_frequency
  GROUP BY 
    searches, 
    GENERATE_SERIES(1, num_users))

SELECT 
  ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY searches)::DECIMAL, 1) AS  median
FROM searches_expanded;

The searches_expanded part makes no sense at all to me. grouping by searches, GENERATE_SERIES(1, num_users)) should be exactly the same as grouping by searches, num_users.

#

Because it just returns a series of numbers from 1 to num_users.

#

Also, when I put GENERATE_SERIES(1, num_users) in the SELECT clause, it evaluates to integers, and I can't figure out where they're coming from. I don't get what it's doing.

#

The documentation doesn't explain it either.

hollow oar Jun 18, 2024, 8:12 PM

#

finite crow Also, when I put GENERATE_SERIES(1, num_users) in the SELECT clause, it evaluate...

it's an abuse of group by.
think of it this way

WITH tmp AS (
SELECT searches, GENERATE_SERIES(1, num_users) AS searches_user_id
  FROM search_frequency
)

SELECT
  searches
FROM
  tmp
GROUP BY
  searches, searches_user_id

#

you can disregard the group by abuse that the "model answer" uses tbh.

#

i.e. just use

WITH searches_expanded AS (
  SELECT searches, GENERATE_SERIES(1, num_users)
  FROM search_frequency
)

SELECT 
  ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY searches)::DECIMAL, 1) AS  median
FROM searches_expanded;

finite crow Jun 18, 2024, 8:23 PM

#

hollow oar i.e. just use ```sql WITH searches_expanded AS ( SELECT searches, GENERATE_SER...

I don't get what GENERATE_SERIES(1, num_users) is doing

hollow oar Jun 18, 2024, 8:23 PM

#

finite crow I don't get what GENERATE_SERIES(1, num_users) is doing

https://www.postgresql.org/docs/current/functions-srf.html

PostgreSQL Documentation

9.25. Set Returning Functions

9.25. Set Returning Functions # This section describes functions that possibly return more than one row. The most widely used functions …

#

of which

This section describes functions that possibly return more than one row.
possibly return more than one row. is the key point

finite crow Jun 18, 2024, 8:24 PM

#

Ok, I think I see now.

#

Doesn't that blow up the number of rows?

#

It seems like it would be kind of inefficient.

hollow oar Jun 18, 2024, 8:26 PM

#

yeah i would assume so.

finite crow Jun 18, 2024, 8:31 PM

#

Yeah wtf, it would end up generating the trillions of rows like you're supposed to be avoiding.

#

I don't get what their idea is.

finite crow Jun 18, 2024, 8:39 PM

#

finite crow ```postgresql WITH cte1 AS ( SELECT SUM(num_users) OVER (ORDER BY searches) ...

Anyways, are there any ways that this could be written more concisely?

finite crow Jun 18, 2024, 8:56 PM

#

I kept having a hard time with stuff like not being able to nest window functions, or not being able to look at the maximum in a column in the WHERE clause. Stuff like that made it so I couldn't figure out how to reduce the number of CTEs

hollow oar Jun 18, 2024, 9:04 PM

#

i think this is just a quiz on your knowledge on window function, if you have written SUM(...) OVER(...) then i am already happy as an interviewer tbh.

#

the rest is just an execrise of how careful you are

#

i would have used a bunch of scalar query instead of cross joins

WITH cumsum AS (
  SELECT
    SUM(num_users) OVER(ORDER BY searches) AS running_sum, searches
  FROM
    search_frequency
), total_users AS (
  SELECT SUM(num_users) AS val FROM search_frequency
), mid_point AS (
  SELECT val::decimal / 2.0 AS val FROM total_users
), aux AS (
  SELECT
    (SELECT searches FROM cumsum WHERE running_sum <= (SELECT val FROM mid_point) ORDER BY running_sum DESC LIMIT 1) AS bot
    , (SELECT searches FROM cumsum WHERE running_sum > (SELECT val FROM mid_point) ORDER BY running_sum ASC LIMIT 1) AS top
    , MOD((SELECT val FROM total_users), 2) = 0 AS should_avg 
)
SELECT 
  CASE WHEN should_avg THEN (bot + top)::decimal / 2.0
  ELSE bot
  END
FROM aux

#

(not sure if the above is correct as well tbh.)

finite crow Jun 18, 2024, 9:39 PM

#

hollow oar (not sure if the above is correct as well tbh.)

Yeah it was

#

SELECT ROUND(
  CASE WHEN should_avg THEN (bot + top)::decimal / 2.0
  ELSE bot
  END,
  1
)
FROM aux

#

WITH cte as(
    SELECT *,
        SUM(num_users) OVER(ORDER BY searches)    AS cumsum_users,
        SUM(num_users) OVER()                     AS tot_users
    FROM search_frequency
)
SELECT ROUND(AVG(searches), 1) AS median
FROM cte
WHERE (
    tot_users <= cumsum_users * 2
    AND tot_users >= (cumsum_users - num_users) * 2
)
;

Here's one someone posted in the discussions. The first comment posted there, the one with this answer, points out how inefficient the recommended answer is.

hollow oar Jun 18, 2024, 9:51 PM

#

finite crow ```postgresql WITH cte as( SELECT *, SUM(num_users) OVER(ORDER BY se...

that's neat

finite crow Jun 18, 2024, 10:54 PM

#

SUM(something) OVER ()
That's one of the things I was looking for.

#

I could have done that instead of the cte that just joins with the MAX(uc_high) cte.

finite crow Jun 18, 2024, 10:59 PM

#

finite crow `SUM(something) OVER ()` That's one of the things I was looking for.

It only calculates SUM(something) once, right?

#

I don't have a very good idea of how smart query optimizers are.

coral wasp Jun 18, 2024, 11:20 PM

#

finite crow ```postgresql WITH cte as( SELECT *, SUM(num_users) OVER(ORDER BY se...

What was the original question?

finite crow Jun 18, 2024, 11:23 PM

#

coral wasp What was the original question?

https://datalemur.com/questions/median-search-freq
This one.

finite crow Jun 18, 2024, 11:23 PM

#

finite crow ```postgresql WITH cte1 AS ( SELECT SUM(num_users) OVER (ORDER BY searches) ...

I came up with this.

coral wasp Jun 18, 2024, 11:26 PM

#

Why not use percentile_cont?

finite crow Jun 18, 2024, 11:28 PM

#

Because then you need to generate two trillion rows.

coral wasp Jun 18, 2024, 11:31 PM

#

Why would it generate two trillion rows?

finite crow Jun 18, 2024, 11:34 PM

#

Because it says there are two trillion users, so using GENERATE_SERIES for each num_users would create that number of rows.

finite crow Jun 19, 2024, 5:05 AM

#

SELECT CAST(ROUND(CAST(SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END), FLOAT) / COUNT(*), 2), TEXT) || '%' AS percent_of_male_patients
FROM patients
;

near ",": syntax error
Why is there a syntax error in this?

finite crow Jun 19, 2024, 5:22 AM

#

SELECT CAST(ROUND((CAST(SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) AS TEXT) || '%' AS percent_of_male_patients
FROM patients
;

This fixed it.

#

SELECT round(100 * avg(gender = 'M'), 2) || '%' AS percent_of_male_patients
FROM patients
;

Apparently this works too.

#

Is giving a boolean expression to an aggregate function only possible on certain SQL implementations? It lets me do it on sql-practice but not on datalemur.

coral wasp Jun 19, 2024, 11:21 AM

#

finite crow ```SQL SELECT round(100 * avg(gender = 'M'), 2) || '%' AS percent_of_male_patien...

Use a sql formatter like https://sql-formatter-org.github.io/sql-formatter/

coral wasp Jun 19, 2024, 11:22 AM

#

finite crow Is giving a boolean expression to an aggregate function only possible on certain...

Yes, I think that works in MySQL but not Postgres for instance. See dbfiddle.uk to test on different dbs. ** but it's normally not allowed by SQL standard: use an if statement or case to convert to binary, or use count(*) filter .... instead

torn sphinx Jun 19, 2024, 3:47 PM

#

fading patrol Is that Amazon Linux? Have you tried `sudo yum install docker`? https://www.cybe...

i did it!

torn sphinx Jun 19, 2024, 3:48 PM

#

torn sphinx i did it!

now how do i use this to make a database? should i make it in docker or use docker to make it somewhere else?

#

I am trying to figure out how to deploy a django REST api with a MySQLdatabase on an AWS EC2 server
In order to do this should I still use Postgres/Docker to access MySQL or should I use AWS RDS or something?

finite crow Jun 19, 2024, 5:39 PM

#

coral wasp Yes, I think that works in MySQL but not Postgres for instance. See dbfiddle.uk ...

Thanks. I hate CASE for ternaries, it takes up so much space. IF(expression, 1, 0) is okay, but it doesn't work on datalemur.

finite crow Jun 19, 2024, 6:31 PM

#

I completed all the available hard difficulty SQL questions on datalemur, sql-practice, and leetcode. But I still find that I need to look at the documentation a lot.

coral wasp Jun 19, 2024, 8:03 PM

#

finite crow I completed all the available hard difficulty SQL questions on datalemur, sql-pr...

So do I

wild lotus Jun 19, 2024, 9:55 PM

#

hello if someone could help me that would be great. its stupid but i need help merging a csv file into my pycharm ide file

#

#

oh wow this indian lady just helped me on youtube, 1 hour of trying solved in 30 seconds, #indian people are great

#

does this look accurate? now i can edit and manipulate the data is that correct?

waxen finch Jun 20, 2024, 1:13 AM

#

@woeful coral #discord-bots message
yes, there are a few different sqlite extensions that provide encryption support

personally i found SQLite3MultipleCiphers the most convenient mainly because it provides prebuilt windows DLLs (see their releases page), but also because its completely free compared to SQLCipher and sqlite's official encryption extension, SEE

fading patrol Jun 20, 2024, 1:39 AM

#

torn sphinx I am trying to figure out how to deploy a django REST api with a MySQLdatabase o...

You seem very confused and might need to take a few steps back,. follow a basic tutorial or two, etc.

For starters, Postgres and MySQL are both DB engines so you want to pick one or the other and not use both.

You can host either one via RDS instead of a Docker container which is more convenient and reliable for production with automatic backups and such , but does cost money.

finite crow Jun 20, 2024, 2:01 AM

#

coral wasp So do I

But would they let you look it up in an interview?

coral wasp Jun 20, 2024, 2:54 AM

#

finite crow But would they let you look it up in an interview?

I guess it depends on what 'it' is. I don't really know: I see very few candidates who can solve basic sql problems, but I'm also good at teaching it, so I don't really know.

torn sphinx Jun 20, 2024, 3:35 AM

#

Anyone experienced with setting up local mongodb?

#

I’m trying to decide if I should do my apps rate limiting/usage tracking through local mongodb or redis and shared mongodb which gets periodically synced.

Obviously I want speed, the reason I’m doing this in the first place, however if the speed isn’t THAT different from local mongodb and redis id gladly take the little slowdown of mongodb to make it easy and no syncing needed. But if the difference is substantial or not even that far from shared mongodb time then might as well go through the work of redis and syncing

#

I tried installing local mongodb got ton of errors that seemed common but all the common fixes did nothing for me…

Because of this, I wasn’t able to run any benchmarks to see what my best choice would be.

So I’m asking for either help setting it up locally on Ubuntu or with giving me your experiences using local mongodb to get an idea of the speed. Thanks

keen minnow Jun 20, 2024, 4:07 AM

#

<@&831776746206265384> shitposting

radiant elbow Jun 20, 2024, 4:09 AM

#

!cban 1245012707523170399 You seem to have joined just to troll.

delicate fieldBOT Jun 20, 2024, 4:09 AM

#

:incoming_envelope: :ok_hand: applied ban to @bleak plover permanently.

fading patrol Jun 20, 2024, 10:49 AM

#

torn sphinx I tried installing local mongodb got ton of errors that seemed common but all th...

Spinning up mongodb with the official Docker container should be trivial, and highly performant as well

daring laurel Jun 20, 2024, 1:20 PM

#

Howdy there, having some issues with my sqlalchemy set up. I am getting a reference error. I have tried forward referencing, and using the typing's List type rather than the build in list. I am still getting blocked and am very confused. Any advice would be much appreciated. I have included the code below as well.

sqlalchemy.exc.ArgumentError: Could not locate SQLAlchemy Core type for Python type typing.List[services.host.ip.ip_record_sqlalchemy.IpRecordSqlAlchemy] inside the 'ip_history' attribute Mapped annotation

ip_record_sqlalchemy.py

from datetime import datetime
from typing import TYPE_CHECKING
from uuid import uuid4

from services.automatic_fiesta_base import AutomaticFiestaBase
from services.host.ip.ip_common import IpDTO
from sqlalchemy.orm import Mapped, mapped_column, relationship

if TYPE_CHECKING:
    from services.host.host_record_sqlalchemy import HostRecordSqlAlchemy


class IpRecordSqlAlchemy(AutomaticFiestaBase):
    __tablename__ = "ips"

    id: Mapped[int] = mapped_column(primary_key=True, default=uuid4().int)
    address: Mapped[str] = mapped_column()
    last_seen: Mapped[datetime] = mapped_column()
    host: Mapped["HostRecordSqlAlchemy"] = relationship(back_populates="ip_history")


def ip_record_to_dto(ip_record: IpRecordSqlAlchemy) -> IpDTO:
    return IpDTO(
        address=ip_record.address,
        last_seen=ip_record.last_seen,
    )

host_record_sqlalchemy.py in thread.

daring laurel Jun 20, 2024, 1:23 PM

#

daring laurel Howdy there, having some issues with my sqlalchemy set up. I am getting a refere...

oops, i don't see an option for threads, so here it goes:

host_record_sqlalchemy.py

from datetime import datetime
from typing import TYPE_CHECKING, List
from uuid import uuid4

from pydantic import PositiveInt
from services.automatic_fiesta_base import AutomaticFiestaBase
from services.host.host_common import (
    HostDTO,
    HostRecordNotFoundException,
)
from services.host.host_record_base import HostRecordDatastoreABC
from services.host.ip.ip_common import IpDTO
from services.host.ip.ip_record_sqlalchemy import IpRecordSqlAlchemy, ip_record_to_dto
from services.scan.directory_scan.directory_scan_record_sqlalchemy import (
    directory_scan_record_to_dto,
)
from services.sqlalchemy.sqlalchemy_database import SqlAlchemyDatabase
from sqlalchemy import select
from sqlalchemy.orm import Mapped, mapped_column, relationship

if TYPE_CHECKING:
    from services.scan.directory_scan.directory_scan_record_sqlalchemy import (
        DirectoryScanRecordSqlAlchemy,
    )


class HostRecordSqlAlchemy(AutomaticFiestaBase):
    __tablename__ = "hosts"

    id: Mapped[int] = mapped_column(primary_key=True, default=uuid4().int)
    hostname: Mapped[str] = mapped_column()
    os_type: Mapped[str] = mapped_column()
    os_version: Mapped[str] = mapped_column()
    os_full_type: Mapped[str] = mapped_column()
    ip_history: Mapped[List["IpRecordSqlAlchemy"]] = mapped_column(back_populates="host")
    directory_scans: Mapped[List["DirectoryScanRecordSqlAlchemy"]] = relationship(
        back_populates="host"
    )
    check_ins: Mapped[List[datetime]] = mapped_column()

daring laurel Jun 20, 2024, 1:41 PM

#

daring laurel oops, i don't see an option for threads, so here it goes: host\_record\_sqlalch...

I realized that I did not do a forward ref in host_record_sqlalchemy.py. I have updated the type checking to this:

if TYPE_CHECKING:
    from services.scan.directory_scan.directory_scan_record_sqlalchemy import (
        DirectoryScanRecordSqlAlchemy,
    )
    from services.host.ip.ip_record_sqlalchemy import IpRecordSqlAlchemy

However, getting similar error

sqlalchemy.exc.ArgumentError: Could not resolve all types within mapped annotation: "sqlalchemy.orm.base.Mapped[typing.List[ForwardRef('IpRecordSqlAlchemy')]]".  Ensure all types are written correctly and are imported within the module in use.

daring laurel Jun 20, 2024, 1:53 PM

#

daring laurel I realized that I did not do a forward ref in `host_record_sqlalchemy.py`. I hav...

I am silly,

ip_history: Mapped[List["IpRecordSqlAlchemy"]] = mapped_column(back_populates="host")

should be

ip_history: Mapped[List["IpRecordSqlAlchemy"]] = relationship(back_populates="host")

need to be assigned to a relationship not a mapped column. my bad...

torn sphinx Jun 20, 2024, 2:14 PM

#

I am running this command correctly by using two lines?

jagged light Jun 20, 2024, 2:27 PM

#

bash-4.4# is part of the prompt. Don't copy it when following guides.

torn sphinx Jun 20, 2024, 2:46 PM

#

jagged light bash-4.4# is part of the prompt. Don't copy it when following guides.

new plan, i am using aws instead of powershell, but this happened. How do i write commands once i run mysql in aws?

jagged light Jun 20, 2024, 2:59 PM

#

you can't. aws databases are hosted, so you can't access the console.

#

run mysql on a ec2 instance using the host of the database

torn sphinx Jun 20, 2024, 3:53 PM

#

jagged light run mysql on a ec2 instance using the host of the database

ok I am trying to get it running on powershell first but somehow the password is not example_pass ideas?

jagged light Jun 20, 2024, 4:17 PM

#

Is your database exposed to the internet?

torn sphinx Jun 20, 2024, 4:29 PM

#

jagged light Is your database exposed to the internet?

no

jagged light Jun 20, 2024, 4:29 PM

#

then you have to do it from an ec2 instance.

#

This is all aws, right?

torn sphinx Jun 20, 2024, 4:43 PM

#

jagged light This is all aws, right?

yes

torn sphinx Jun 20, 2024, 4:45 PM

#

jagged light then you have to do it from an ec2 instance.

how do i access my docker through AWS EC2?

jagged light Jun 20, 2024, 4:45 PM

#

install the mysql-client

#

sudo yum install mysql

#

you might also be able to use dnf instead of yum depending on the os version

#

It's dnf on 2023

torn sphinx Jun 20, 2024, 4:52 PM

#

torn sphinx Jun 20, 2024, 4:52 PM

#

jagged light `sudo yum install mysql`

do I need to run anything else before that?

jagged light Jun 20, 2024, 4:53 PM

#

Looks like you need to manually add the mysql repo

#

https://dev.mysql.com/downloads/repo/yum/

#

Follow instructions here. https://dev.mysql.com/doc/refman/8.0/en/linux-installation-yum-repo.html

fading patrol Jun 20, 2024, 4:56 PM

#

torn sphinx how do i access my docker through AWS EC2?

I suspect the complete lack of context for your question may be causing confusion.... Where is docker running? On your local machine?

torn sphinx Jun 20, 2024, 4:58 PM

#

fading patrol I suspect the complete lack of context for your question may be causing confusio...

i have it downloaded so i think so

fading patrol Jun 20, 2024, 4:58 PM

#

torn sphinx how do i access my docker through AWS EC2?

Wait, I see the earlier screenshot that shows MySQL running on a different ec2?

torn sphinx Jun 20, 2024, 4:58 PM

#

they time out or otherwise have errors so i have to restart them (unless i am doing something wrong?)

torn sphinx Jun 20, 2024, 5:04 PM

#

fading patrol Spinning up mongodb with the official Docker container should be trivial, and hi...

there wouldnt be any latency overhead from docker? - how does speed compare against similar actions via redis (obviously redis is faster, just trying to see by how much)

#

"Install the downloaded release package with the following command, replacing platform-and-version-specific-package-name with the name of the downloaded RPM package:" What does this mean, after I download the package I have to manually edit it through powershell or something?

torn sphinx Jun 20, 2024, 5:07 PM

#

torn sphinx "Install the downloaded release package with the following command, replacing pl...

will it only understand these commands after I install MySQL?

jagged light Jun 20, 2024, 5:21 PM

#

amazon linux uses dnf, not apt

torn sphinx Jun 20, 2024, 5:43 PM

#

Hi, i'm trying to get mysql working with my flask application but i keep getting AttributeError: 'NoneType' object has no attribute 'cursor' when i try to get any data from my table. I've double checked the config details.

fading patrol Jun 20, 2024, 7:28 PM

#

torn sphinx there wouldnt be any latency overhead from docker? - how does speed compare agai...

Negligible in theory but best to do your own testing. If you're comparing a local container vs a remote cloud instance, that added network latency is what's going to matter

west blaze Jun 20, 2024, 8:12 PM

#

Is there a way to generate and write migration files for postgres in pure python anyone can recomend?

#

Like, I'd like to make a migrationfile that creates a couple of tables. Any packages that would allow me to generate such a file and ensure that the migration is applied to the db?

fading patrol Jun 20, 2024, 8:17 PM

#

west blaze Is there a way to generate and write migration files for postgres in pure python...

Alembic? I'd rather use Django though

wooden topaz Jun 20, 2024, 8:38 PM

#

Hello,
I am currently solving https://datalemur.com/questions/sql-bloomberg-stock-min-max-1

As a first step, i came up with below query

SELECT ticker,
MAX(open) highest_open,
MIN(open) lowest_open
FROM stock_prices
GROUP BY ticker
ORDER BY ticker

corresponding output is:

#

#

Now, need one more column TO_CHAR(date, 'Mon-YYYY') mth in the above query. could any one pls tell me how to do this.

west blaze Jun 20, 2024, 9:14 PM

#

fading patrol Alembic? I'd rather use Django though

Why? Also is it possible to use Django migration functionality without applying the framework overall in the broader scope?

finite crow Jun 21, 2024, 12:17 AM

#

wooden topaz Hello, I am currently solving https://datalemur.com/questions/sql-bloomberg-stoc...

I had that in a CTE and then joined it with stock_prices twice. Once for the highest dates and again for the lowest dates.

#

When you're using MIN and/or MAX, is there a way to list values associated with the minimum and maximum rows aside from the minimum and maximum values?

finite crow Jun 21, 2024, 12:30 AM

#

wooden topaz Hello, I am currently solving https://datalemur.com/questions/sql-bloomberg-stoc...

It would make this easier and probably faster.

finite crow Jun 21, 2024, 12:30 AM

#

finite crow I had that in a CTE and then joined it with stock_prices twice. Once for the hig...

Compared to this way.

fading patrol Jun 21, 2024, 12:34 AM

#

west blaze Why? Also is it possible to use Django migration functionality without applying ...

I always had frustrating problems with Alembic I couldn't understand... Django's migrations just work.

You would need to declare and maintain your models.py but otherwise I think you could ignore the rest.

coral wasp Jun 21, 2024, 1:33 AM

#

finite crow When you're using MIN and/or MAX, is there a way to list values associated with ...

Some databases have argmax/arguing functions. Search that.

#

Here's a Postgres implementation: https://github.com/bashtanov/argm

#

And I'd be a bad DuckDB shill if I didn't: https://duckdb.org/docs/sql/aggregates.html#arg_maxarg-val

coral wasp Jun 21, 2024, 3:02 AM

#

wooden topaz Hello, I am currently solving https://datalemur.com/questions/sql-bloomberg-stoc...

If you're still stuck: if you know the highs for every stock, then you can join back to the original data where the high = value

finite crow Jun 21, 2024, 4:12 AM

#

coral wasp Some databases have argmax/arguing functions. Search that.

So you couldn't use it on datalemur? That sucks. They should add it to the vanilla version of postgresql, it seems like it would be useful.

dawn moss Jun 21, 2024, 1:03 PM

#

yo currently using postgresql with fastapi & sqlalchemy

#

was wondering how should I implement RLS or RBAC

west blaze Jun 21, 2024, 3:26 PM

#

fading patrol I always had frustrating problems with Alembic I couldn't understand... Django's...

Ok, thanks!

wooden topaz Jun 21, 2024, 5:47 PM

#

coral wasp If you're still stuck: if you know the highs for every stock, then you can join ...

finally: after lot of tries & tries since 3 days. I mean, there is a solution in the site, but looking at that & proceeding in the same way would gives the answer, but i thought of trying on my own, step by step by understanding...

SELECT 
  ticker,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MAX(t1.open)) highest_mth,
  MAX(t1.open) highest_open,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MIN(t1.open)) lowest_mth,
  MIN(t1.open) lowest_open
FROM stock_prices t1
GROUP BY ticker
ORDER BY ticker

#

By the way, i did it using corelated sub query, on the other hand you suggested to use JOINS right. I did not get that.
could you pls modify the above query using JOINS & show me once(Nothing urgent, whenever you have few minutes). Thx

coral wasp Jun 21, 2024, 6:04 PM

#

wooden topaz By the way, i did it using corelated sub query, on the other hand you suggested ...

Try something like: FROM stock_prices t1 JOIN ( SELECT ticker, MAX(open) max_open, MAX(date) date FROM stock_prices GROUP BY ticker ) highest ON t1.ticker = highest.ticker AND t1.open = highest.max_open

coral wasp Jun 21, 2024, 6:52 PM

#

wooden topaz finally: after lot of tries & tries since 3 days. I mean, there is a solution in...

Your version is almost right, but the correlated subqueries are wrong (you need to also join on ticker, and you need to handle case where multiple dates match the high)

wooden topaz Jun 21, 2024, 6:57 PM

#

coral wasp Your version is almost right, but the correlated subqueries are wrong (you need ...

not right? The solution got accepted...

wise wind Jun 21, 2024, 7:17 PM

#

        dm = Dms(sender=username_1,receiver=username_2)
        dm_message = Dm_Messages(text="hi",username=username_1,date=datetime.now())
        dm.dm_messages.append(dm_message)```

#

UPDATE dm_messages SET dm=$1::INTEGER WHERE dm_messages.id = $2::INTEGER

#

[generated in 0.00020s] (None, 1)

#

When I try to create a related sqlalchemy model via relationship, it runs this query under the hood.

#

This is leading to a not null constraint error. Is there anyway to avoid this behavior?

coral wasp Jun 21, 2024, 7:35 PM

#

wooden topaz not right? The solution got accepted...

What would your query do if two tickets had the same high?

wooden topaz Jun 21, 2024, 7:37 PM

#

coral wasp What would your query do if two tickets had the same high?

already thought about this & enhanced the query....

#

SELECT 
  ticker,
  MAX(t1.open) highest_open,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MAX(t1.open)
    LIMIT 1) highest_mth,
  MIN(t1.open) lowest_open,
  (SELECT TO_CHAR(date, 'Mon-YYYY') 
    FROM stock_prices as t2 
    where t2.open = MIN(t1.open)
    LIMIT 1) lowest_mth
FROM stock_prices t1
GROUP BY ticker
ORDER BY ticker

coral wasp Jun 21, 2024, 7:37 PM

#

Thats doesn't fix the problem correctly: what if two tickers had the same max in different days?

wooden topaz Jun 21, 2024, 8:14 PM

#

coral wasp Thats doesn't fix the problem correctly: what if two tickers had the same max in...

any ways, i tried new approach:

WITH highest_values AS(
SELECT ticker,
MAX(open) highest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS row_num
FROM stock_prices
GROUP BY ticker, open, date 
ORDER BY ticker),

lowest_values AS(
SELECT ticker,
MIN(open) lowest_open,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS row_num
FROM stock_prices
GROUP BY ticker, open, date 
ORDER BY ticker)

SELECT 
highest_values.ticker,
highest_values.highest_mth, 
highest_values.highest_open,
lowest_values.lowest_mth,
lowest_values.lowest_open
FROM highest_values
JOIN lowest_values
ON highest_values.row_num = lowest_values.row_num
  AND highest_values.row_num = 1
  AND lowest_values.row_num = 1

#

look at the last two lines alone.....
i am only joining the rows where the row number is 1, right?
but at the output, every row is coming.

coral wasp Jun 21, 2024, 8:27 PM

#

Inspect each of your subqeuries first

#

Are they returning what you want?

wooden topaz Jun 21, 2024, 8:48 PM

#

coral wasp Are they returning what you want?

yes

coral wasp Jun 21, 2024, 8:50 PM

#

I don't believe they are: I'm talking about highest values and lowest values.

wooden topaz Jun 21, 2024, 8:54 PM

#

coral wasp I don't believe they are: I'm talking about highest values and lowest values.

irrespective of highest & lowest, as per join condition, only those rows whose row number = 1 should be matched, right?

coral wasp Jun 21, 2024, 8:55 PM

#

wooden topaz irrespective of highest & lowest, as per join condition, only those rows whose r...

Your where clause does filter on row-num = 1

wooden topaz Jun 21, 2024, 9:00 PM

#

coral wasp Your where clause does filter on row-num = 1

correct. thats because i only want the 1st row.
by the way, it's ON clause

coral wasp Jun 21, 2024, 9:01 PM

#

So why are you asking about the row number? You're only getting row numbers 1, but your subqueries are wrong.

wooden topaz Jun 21, 2024, 9:03 PM

#

coral wasp So why are you asking about the row number? You're only getting row numbers 1, b...

sub queries are worng? For example, if you see the sub query which gets the highest values

#

whats wrong with it?

#

#

for each and every ticker, this sub query is presenting all the rows from highest to lowest with corresponding row_num's

coral wasp Jun 21, 2024, 9:08 PM

#

Also, secondly, look at your on clause: you're not comparing tickers

coral wasp Jun 21, 2024, 9:08 PM

#

wooden topaz sub queries are worng? For example, if you see the sub query which gets the high...

If this is your intent, why do you even had a group by?

wooden topaz Jun 21, 2024, 9:12 PM

#

coral wasp If this is your intent, why do you even had a group by?

There are many tickers, but the values need to be calculated per ticker group

coral wasp Jun 21, 2024, 9:12 PM

#

That's what the window partition does.

#

Row_number() over (partition by ticker order by open) ranks each row by open, for each ticker.

wooden topaz Jun 21, 2024, 9:14 PM

#

coral wasp `Row_number() over (partition by ticker order by open) ` ranks each row by open,...

right.

coral wasp Jun 21, 2024, 9:14 PM

#

So what's the group by doing?

wooden topaz Jun 21, 2024, 9:14 PM

#

how about this: this MAX(OPEN) --> this aggregate needs to be executed per GROUP of ticker.

#

coral wasp Jun 21, 2024, 9:15 PM

#

Why? Row number is already telling you the largest value

#

The way you wrote it, that's the max value for each (ticker, open, date), which is the same as the open (since only one value per date). Max does nothing.

wooden topaz Jun 21, 2024, 9:18 PM

#

coral wasp The way you wrote it, that's the max value for each (ticker, open, date), which ...

yes. but unless you keep ticker & date inside the GROUP BY, it wont work.

coral wasp Jun 21, 2024, 9:19 PM

#

wooden topaz yes. but unless you keep ticker & date inside the GROUP BY, it wont work.

Your aggregate is doing nothing in this query.

#

You're computing the max(open) over sets of exactly 1 value, since there's only one open for each (ticker, open, date)

wooden topaz Jun 21, 2024, 9:21 PM

#

coral wasp Your aggregate is doing nothing in this query.

right & i also corrected the mistake at JOIN ON clause & below is the perfect working query

WITH highest_values AS(
SELECT ticker,
open highest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open DESC) AS row_num
FROM stock_prices),

lowest_values AS(
SELECT ticker,
open lowest_open,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS row_num
FROM stock_prices)

SELECT 
highest_values.ticker,
highest_values.highest_mth, 
highest_values.highest_open,
lowest_values.lowest_mth,
lowest_values.lowest_open
FROM highest_values
JOIN lowest_values
ON highest_values.ticker = lowest_values.ticker
  AND highest_values.row_num = 1
  AND lowest_values.row_num = 1

coral wasp Jun 21, 2024, 9:24 PM

#

You know you don't need two subqueries now

coral wasp Jun 21, 2024, 9:24 PM

#

wooden topaz right & i also corrected the mistake at JOIN ON clause & below is the perfect wo...

You could compute both ranks in one query

#

And I'd optimize by pulling the to_char into the outer select. You only need it for the result

wooden topaz Jun 21, 2024, 9:30 PM

#

coral wasp You could compute both ranks in one query

you mean joining same table with itself?

coral wasp Jun 21, 2024, 9:31 PM

#

wooden topaz you mean joining same table with itself?

Yes (same query)

#

Or just refactoring to have one window query, then a highest and lowest query that uses that, and then a final query to join them

wooden topaz Jun 21, 2024, 9:36 PM

#

coral wasp Yes (same query)

WITH data AS(
SELECT ticker,
open highest_open,
open lowest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open DESC) AS highest_row_num,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY open) AS lowest_row_num
FROM stock_prices)

SELECT 
d1.ticker,
d1.highest_mth, 
d1.highest_open,
d2.lowest_mth,
d2.lowest_open
FROM data d1
JOIN data d2
ON d1.ticker = d2.ticker
  AND d1.highest_row_num = 1
  AND d2.lowest_row_num = 1

#

i know below things can be written only once, but atleast from readabiliy perspective, it looks easy to understand like this. thats why left like this

#

open highest_open,
open lowest_open,
TO_CHAR(date, 'Mon-YYYY') highest_mth,
TO_CHAR(date, 'Mon-YYYY') lowest_mth,

coral wasp Jun 21, 2024, 10:27 PM

#

Now, there's a trick you can use to eliminate the join, using an aggregate

#

You could get the first() filtered where row num is 1

#

And I would move the t o_char to the outer query.

wooden parrot Jun 22, 2024, 12:32 PM

#

Hello! I want to ask in MongoDb how to update or push the array by matching a key and value pair?
So when I used $push operator it always inserting new array in test. But I want this to behave like, it should match with the user_id if same user_id already exist in the test array just update that else it will push an new array to test. How can I do it in a single operation.

data = {
    "test": [
        {
            "user_id": 1,
            "key": "value"
        },
        {
            "user_id": 2,
            "key": "value"
        }
    ]
}

fading patrol Jun 22, 2024, 1:34 PM

#

wooden parrot Hello! I want to ask in `MongoDb` how to update or push the array by matching a ...

I believe upsert is what you're asking for: https://www.mongodb.com/docs/drivers/node/current/fundamentals/crud/write-operations/upsert/

Insert or Update in a Single Operation

wooden topaz Jun 22, 2024, 9:02 PM

#

When i am reading about AWS step functions, it is always referred as serverless fn orchestrator. So, an step function can not orchestrate services like EC2, because it is server based service?
There is one more AWS service named AWS Glue, which is called as server less Data pipeline provided by AWS. Through AWS we can define whole work flow such as data loading, data cleaning & data loading. In fact we can even schedule jobs.
So, isn't it both AWS Glue & Step fn's are doing same job?

mortal solstice Jun 22, 2024, 9:34 PM

#

using .json file as my discord bot database? Is this good or suitable if i plan to make my bot high scale?!

storm mauve Jun 22, 2024, 10:36 PM

#

it is a very bad idea if you want to scale at all

#

I would recommend starting with SQLite + https://github.com/Rapptz/asqlite but you can look into MongoDB if you want to stick with an unstructured data format

GitHub

GitHub - Rapptz/asqlite: A simple async wrapper for sqlite3

A simple async wrapper for sqlite3. Contribute to Rapptz/asqlite development by creating an account on GitHub.

fading patrol Jun 22, 2024, 11:12 PM

#

wooden topaz When i am reading about AWS step functions, it is always referred as serverless ...

You can orchestrate multiple Glue crawlers in parallel with Step Functions. https://aws.amazon.com/blogs/compute/orchestrating-aws-glue-crawlers-using-aws-step-functions/

Amazon Web Services

Orchestrating AWS Glue crawlers using AWS Step Functions | Amazon W...

This blog post is written by Justin Callison, General Manager, AWS Workflow. Organizations generate terabytes of data every day in a variety of semistructured formats. AWS Glue and Amazon Athena can give you a simpler and more cost-effective way to analyze this data with no infrastructure to manage. AWS Glue crawlers identify the schema of […]

kind burrow Jun 23, 2024, 2:11 PM

#

Hi, Does anyone know how to tackle this error ?
js exceptional error 503

coral wasp Jun 23, 2024, 3:03 PM

#

kind burrow Hi, Does anyone know how to tackle this error ? js exceptional error 503

This is a db channel, not js. You can ask in off topic.

torn sphinx Jun 24, 2024, 2:03 AM

#

how deep does the database rabbit hole go? Wha am I in for?

coral wasp Jun 24, 2024, 2:12 AM

#

torn sphinx how deep does the database rabbit hole go? Wha am I in for?

All the way to Australia

torn sphinx Jun 24, 2024, 2:37 AM

#

coral wasp All the way to Australia

Get me a kangaroo and let’s go to the outback then.

wise goblet Jun 24, 2024, 4:55 AM

#

torn sphinx how deep does the database rabbit hole go? Wha am I in for?

the regular non web developer usually just learns Raw SQL, able to use Sqlite3 if needed, and calls it a day.
Usually learn designing relational tables and writing complex queries to query anything

for backend developers (data engineers) journey only begins with raw SQL in depth, preferably for specific also engine details (like postgresql, or optionally mysql/mariadb)
Also of course learn designing and querying in all ways
They learn how to debug it and performance optimize it (with operating millions of row records), with solving its different issues from at query level to maintenance level a bit
small enough app can be using relational db for almost everything.

As they need to go further, it is not uncommon learning Elastic Search or its equvalients. (to feel gaps in functionality)
working with blob storage databases for designing dynamic assets stateless (for horizontal scaling) backend apps
and working with other dbs depending on needs (than more data, more scale of app, then more specific usage cases can be needed)
redis/memcache for caching stuff
redis/rabbitmq or alterantives for message queue stuff (or even Kafka is common for event streaming db)
due to managed databases simplifying life a lot, it is also common choosing cloud provider to specialize in (AWS, GCP or Azure)

Maintanance of databases is usually in hands of System Administrators.
but they can be having also Database Administrators, specialized guys responsible for it with expertise in depth
If they have DevOps engineers, they can be usually responsible for infrastructure as a code approach to keep running managed databases at cloud provider, or having running dbs in kubernetes like https://cloudnative-pg.io/

CloudNativePG - PostgreSQL Operator for Kubernetes

#

#

fifty shades of databases, with its self hosted options, and cloud provider specific managed alternatives

wise goblet Jun 24, 2024, 5:03 AM

#

torn sphinx how deep does the database rabbit hole go? Wha am I in for?

So at certain level every dev learns it
But for backend (+data engineers + database experts) devs databases make the major part of technology fluff they need to learn

#

also databases are one of main reasons (but not all of them) why frontend devs learning small bit of backend and calling themselves full stack devs are heavily frowned upon sometimes. They have to learn dealing with databases properly, otherwise app is greatly screwed up in terms of performance and storage consumptions. At beginner level there are plenty of rookie mistakes that can make 100 times difference.

manic gyro Jun 24, 2024, 5:09 AM

#

I have this sqlite3 database for a bot for a VR game that adds a leveling system, but when I try to use this function I get None returned even if the user id is in the database, does anyone know why this would happen?

def get_level(user_id):
    c.execute('''SELECT level FROM users WHERE id = ?''', (user_id,))
    result = c.fetchone()
    if result:
        return result[0]
    else:
        return None

wise goblet Jun 24, 2024, 5:13 AM

#

manic gyro I have this sqlite3 database for a bot for a VR game that adds a leveling system...

Try viewing db with https://sqlitebrowser.org/ for easier seeing what is inside
Add print/logging functioning verifying what you insert as user_id
Try to query same queries manually in the sqlitebrowser ui

DB Browser for SQLite

wise goblet Jun 24, 2024, 5:23 AM

#

manic gyro I have this sqlite3 database for a bot for a VR game that adds a leveling system...

also, try using fetchall instead of fetchone ^_^ just for debug to see a difference
and give comparison with removed WHERE condition and fetchall used too

torn sphinx Jun 24, 2024, 1:36 PM

#

wise goblet So at certain level every dev learns it But for backend (+data engineers + datab...

thank you mate

torn sphinx Jun 24, 2024, 7:52 PM

#

is sqlite better on python or vscode?

wise wind Jun 24, 2024, 7:53 PM

#

torn sphinx is sqlite better on python or vscode?

vscode

coral wasp Jun 24, 2024, 9:26 PM

#

torn sphinx is sqlite better on python or vscode?

The question is nonsensical. What do you mean?

cosmic surge Jun 24, 2024, 9:51 PM

#

Does anyone know how to transfer 1 postgres db thats in a docker to another host? But without losing the data in it..?

#

Just needing docs or something on it?

torn sphinx Jun 24, 2024, 9:55 PM

#

coral wasp The question is nonsensical. What do you mean?

so which is better?

coral wasp Jun 24, 2024, 9:58 PM

#

torn sphinx so which is better?

Which what?

coral wasp Jun 24, 2024, 9:59 PM

#

torn sphinx so which is better?

Did you mean pycharm vs VSCode?

torn sphinx Jun 24, 2024, 10:44 PM

#

coral wasp Did you mean pycharm vs VSCode?

no, like two years ago I downloaded some sqlite extension for VS code

fading patrol Jun 24, 2024, 11:15 PM

#

cosmic surge Does anyone know how to transfer 1 postgres db thats in a docker to another host...

https://www.postgresql.org/docs/current/backup-dump.html

PostgreSQL Documentation

26.1. SQL Dump

26.1. SQL Dump # 26.1.1. Restoring the Dump 26.1.2. Using pg_dumpall 26.1.3. Handling Large Databases The idea behind this dump method …

fading patrol Jun 24, 2024, 11:16 PM

#

torn sphinx so which is better?

One is a programming language and one is a IDE

waxen finch Jun 24, 2024, 11:16 PM

#

torn sphinx no, like two years ago I downloaded some sqlite extension for VS code

which is better in terms of interactively exploring sqlite databases? writing a python script can be cumbersome for that depending on what you need, as for your vscode extension i cant say how good the experience is

personally i recommend the SQLiteStudio desktop app for interacting with databases, you can easily see the schema, get an overview of a table's data, and of course run queries on it
https://sqlitestudio.pl/

if however you need to programmatically access your database or automate some work, only python can do that

cosmic surge Jun 24, 2024, 11:17 PM

#

fading patrol https://www.postgresql.org/docs/current/backup-dump.html

Ty.

#

We already got it, but thank you.

fading patrol Jun 24, 2024, 11:18 PM

#

torn sphinx no, like two years ago I downloaded some sqlite extension for VS code

I'd use that for convenience sometimes if I'm just checking something. For more indepth database work I'd use DBeaver. What's "better" is always dependent on context and preference

torn sphinx Jun 25, 2024, 4:38 AM

#

fading patrol One is a programming language and one is a IDE

I don’t remember. It was 2 years ago. It was SQLite that someone made and you just made it a environment variable and you could use it in VSCODE. I only brought it up because I learned SQLite that way first

wise goblet Jun 25, 2024, 5:49 AM

#

cosmic surge Does anyone know how to transfer 1 postgres db thats in a docker to another host...

An easier approach than SQL dump, can be... Just moving docker volumes to new host
For large DBS that can be faster at least

cosmic surge Jun 25, 2024, 5:50 AM

#

wise goblet An easier approach than SQL dump, can be... Just moving docker volumes to new ho...

"We already got it, but thank you."

fading patrol Jun 25, 2024, 12:49 PM

#

cosmic surge "We already got it, but thank you."

Deleting your question might be a more appropriate response than wasting multiple people's time with it and then being snarky about it. 😅

slim garden Jun 25, 2024, 5:45 PM

#

#

does this schema make sense to yall

#

is it even normalized

cosmic surge Jun 25, 2024, 5:54 PM

#

fading patrol Deleting your question might be a more appropriate response than wasting multipl...

"We already got it, but thank you."

fading patrol Jun 25, 2024, 5:55 PM

#

slim garden is it even normalized

room_number could be a little dicey but maybe acceptable. Would you ever want to store anything else about a room, like how many students it accommodates? If yes, make it a table

slim garden Jun 25, 2024, 6:20 PM

#

fading patrol `room_number` could be a little dicey but maybe acceptable. Would you ever want ...

thats fair. i was more curious about the enrollments relation referencing 3 tables

fading patrol Jun 25, 2024, 6:27 PM

#

slim garden thats fair. i was more curious about the enrollments relation referencing 3 tabl...

I see... Maybe this is better: https://dba.stackexchange.com/questions/278459/school-database-schema

Database Administrators Stack Exchange

School Database Schema

Im working on a simple school system which handles student registration and schedule. Furthermore, the system should handle different type of schools Such as Kindergarten, Primary, Secondary, and H...

delicate fieldBOT Jun 25, 2024, 9:35 PM

#

:incoming_envelope: :ok_hand: applied timeout to @true zinc until <t:1719351915:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

hushed wren Jun 25, 2024, 11:42 PM

#

I tried to import large CSV files to MySQL Workbench database but I got this error. I wonder if there's any chance I coudl fix?

Screen_Shot_2024-06-25_at_4.41.48_PM.png

#

You mean, change the data type?

#

Should I change the data type in MySQL or with Python?

#

The file is 2.7 GB, I think it may take plenty of time for me to process the file in Python.

wise wind Jun 25, 2024, 11:54 PM

#

hushed wren The file is 2.7 GB, I think it may take plenty of time for me to process the fil...

always do the processing out of Python if you can because usually the third party tool is usually 100x faster

hushed wren Jun 25, 2024, 11:55 PM

#

What tools should I use and free?

wise wind Jun 25, 2024, 11:55 PM

#

hushed wren I tried to import large CSV files to MySQL Workbench database but I got this err...

Could you send more context?

#

I should've asked that earlier.

wise wind Jun 25, 2024, 11:57 PM

#

hushed wren What tools should I use and free?

MySQL (what you're using) is pretty good since it runs a C compiler.

hushed wren Jun 25, 2024, 11:57 PM

#

Sure, lemme upload the file to Google Drive and I'll send you the link

#

Link: https://drive.google.com/file/d/1dm-XXWkq_H_yn9GrkfP6DkYZIqge20W2/view?usp=share_link

#

Lmk what you think

wise wind Jun 26, 2024, 12:12 AM

#

hushed wren Link: https://drive.google.com/file/d/1dm-XXWkq_H_yn9GrkfP6DkYZIqge20W2/view?usp...

Dang. I can see why mysql raised that error. That's a lot of data.

hushed wren Jun 26, 2024, 12:13 AM

#

I was able to read this file in Python

#

Should I load this to a dataframe then load to MySQL database?

wise wind Jun 26, 2024, 12:14 AM

#

hushed wren Should I load this to a dataframe then load to MySQL database?

Can MySQL take in that much data?

#

Is it on a cloud server?

hushed wren Jun 26, 2024, 12:14 AM

#

I use Community Edition

wise wind Jun 26, 2024, 2:54 AM

#

user = await db.execute(select(Users).where(Users.username == usernames[index]).options(selectinload(Users.sent_dms)
                                                                                                    .selectinload(Dms.dm_messages)
                                                                                                    .joinedload(Dm_Messages.user))
                                                                                                     .execution_options(populate_existing=True))```

#

Does anyone know why the relationships aren't loading? The relationships load when I run this query:

#

user = await db.execute(select(Users).where(Users.username == usernames[index]).options(selectinload(Users.sent_dms)
                                                                                                    .selectinload(Dms.dm_messages)
                                                                                                    )
                                                                                                    .execution_options(populate_existing=True))```

#

but not when I run the top query

hushed wren Jun 26, 2024, 3:30 AM

#

hushed wren I tried to import large CSV files to MySQL Workbench database but I got this err...

Update: I imported the data successfully. I just created a table in the database, then write SQL to load the CSV files, much faster than using Data Import Wizard and Python

chilly gulch Jun 26, 2024, 9:03 AM

#

anyone here famiiliar with structure charts?

unreal obsidian Jun 26, 2024, 10:11 AM

#

Can anyone provide me dataset for generators(of thermal power plant).

torn sphinx Jun 26, 2024, 12:58 PM

#

how do I run this sql lite project through an AWS EC2 server, should i sudo apt install sqllite?

coral wasp Jun 26, 2024, 1:15 PM

#

torn sphinx how do I run this sql lite project through an AWS EC2 server, should i sudo apt ...

SQLite is in the Python stdlib.

#

https://docs.python.org/3/library/sqlite3.html

torn sphinx Jun 26, 2024, 1:31 PM

#

coral wasp SQLite is in the Python stdlib.

I am confused, is thought_sync_django-main.zip accessible to the EC2 server, how do I fix this?

#

coral wasp Jun 26, 2024, 1:33 PM

#

Can you open a help thread? #❓｜how-to-get-help

torn sphinx Jun 26, 2024, 1:37 PM

#

coral wasp Can you open a help thread? <#704250143020417084>

sure

torn sphinx Jun 26, 2024, 1:50 PM

#

coral wasp Can you open a help thread? <#704250143020417084>

ready when you are : ]

warm igloo Jun 26, 2024, 3:17 PM

#

I have a question. Is anyone here familiar with the WISKI software? I'm trying to take data stored on there and then importing that data into Python.

warm igloo Jun 26, 2024, 3:37 PM

#

One library I found when looking was called kiwis_pie. Sadly there's not much information about it other than the documentation.

warm igloo Jun 26, 2024, 6:34 PM

#

Never mind. I'm likely going to be able to query the data in Python via SQL Server thankfully.

neat umbra Jun 26, 2024, 9:52 PM

#

can i ask for php here?

coral wasp Jun 26, 2024, 10:22 PM

#

neat umbra can i ask for php here?

We don't have a php channel. You can ask in one of the off topic channels.

#

!topic

delicate fieldBOT Jun 26, 2024, 10:22 PM

#

Off-topic channels

There are three off-topic channels:

The channel names change every night at midnight UTC and are often fun meta references to jokes or conversations that happened on the server.

See our off-topic etiquette page for more guidance on how the channels should be used.

torn sphinx Jun 27, 2024, 1:17 AM

#

Can anyone help me with executing postgres query using psycopg ?

torn sphinx Jun 27, 2024, 8:38 AM

#

It's been a while since I messed with databases, so I was wondering how to structure a database for a simple project of mine?

#

I'm making a bot that tracks scores of users on a per server basis.

#

The idea being that your score is specific to only your server.

#

So I think it only has userID, serverID, and score, but I'm not sure how to design it to seperate the scores.

torn sphinx Jun 27, 2024, 9:16 AM

#

Guess this channel isn't useful, I am leaving.

#

*left

wise goblet Jun 27, 2024, 9:20 AM

#

torn sphinx So I think it only has userID, serverID, and score, but I'm not sure how to desi...

Score table
Value, foreign key to User, foreign key to Server
Shrimple as that 🦐

brave tundra Jun 27, 2024, 9:20 AM

#

for row in cur.execute("select distinct(player_id) from results order by match_no"):
list.append(row[0])

#

Why does the none appear at last?

#

I only have 8 rows

#

or if I remove the 'distinct' it became like this.. more none

wise goblet Jun 27, 2024, 9:24 AM

#

brave tundra I only have 8 rows

Query results show you have more than 8 rows ^-^

#

Deselect nones with Where condition if u wish not seeing those rows

brave tundra Jun 27, 2024, 9:25 AM

#

here's the db

#

there really is only 8 rows

wise goblet Jun 27, 2024, 9:26 AM

#

U may have filter in place to ignore some rows

brave tundra Jun 27, 2024, 9:26 AM

#

oh I'll check it thanks

#

nope no filter

#

nvm I addded 'is not null'

#

thanks

fresh grail Jun 27, 2024, 9:38 AM

#

Which database package should I use with python?

#

The databases I do with sqlite3 malfunction all the time

hexed estuary Jun 27, 2024, 9:44 AM

#

Malfunction? How?

torn sphinx Jun 27, 2024, 9:46 AM

#

I'm actually reading up on sqlite3 now being how I should probably use it.

#

Where would you guys store your database?

fresh grail Jun 27, 2024, 9:48 AM

#

hexed estuary Malfunction? How?

The database cannot be read after working because it's corrupt

#

I'd need to say, the database gets created by the program

brazen charm Jun 27, 2024, 9:58 AM

#

You are definitely doing something wrong if that is the case

#

It is almost certainly not an sqlite issue

hexed estuary Jun 27, 2024, 10:00 AM

#

torn sphinx Where would you guys store your database?

You mean, like, in what directory?

torn sphinx Jun 27, 2024, 10:02 AM

#

hexed estuary You mean, like, in what directory?

Yeah, pretty much.

fresh grail Jun 27, 2024, 10:08 AM

#

I'll send the relevant code in

#

try:
    # Verbindung zur SQLite-Datenbank herstellen
    conn = sqlite3.connect('battle_data.db')
    cursor = conn.cursor()

    # Tabelle erstellen (falls nicht bereits vorhanden)
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS Battle (
       BattleID VARCHAR(255) PRIMARY KEY,
       BattleTime VARCHAR(255),
       BattleMode VARCHAR(255),
       BattleMap VARCHAR(255),
       BattleResult VARCHAR(255),
       PlayerName1 VARCHAR(255),
       PlayerName2 VARCHAR(255),
       PlayerName3 VARCHAR(255),
       PlayerName4 VARCHAR(255),
       PlayerName5 VARCHAR(255),
       PlayerName6 VARCHAR(255),
       Player1 VARCHAR(255),
       Player2 VARCHAR(255),
       Player3 VARCHAR(255),
       Player4 VARCHAR(255),
       Player5 VARCHAR(255),
       Player6 VARCHAR(255),
       Brawler1 VARCHAR(255),
       Brawler2 VARCHAR(255),
       Brawler3 VARCHAR(255),
       Brawler4 VARCHAR(255),
       Brawler5 VARCHAR(255),
       Brawler6 VARCHAR(255)
    )
    ''')

    # Daten in die Tabelle einfügen
    cursor.execute('''
    INSERT INTO Battle (
        BattleID, BattleTime, BattleMode, BattleMap, BattleResult, 
        PlayerName1, PlayerName2, PlayerName3, PlayerName4, PlayerName5, PlayerName6,
        Player1, Player2, Player3, Player4, Player5, Player6, 
        Brawler1, Brawler2, Brawler3, Brawler4, Brawler5, Brawler6
    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ''', battle_data)

    # Änderungen speichern
    conn.commit()

except sqlite3.DatabaseError as e:
    # Fehlerprotokollierung und detaillierte Fehlermeldung
    print(f"Failed to insert battle data: {e}")
    raise sqlite3.DatabaseError(f"Failed to insert battle data: {e}")

finally:
    # Verbindung schließen
    if conn:
        conn.close()

hexed estuary Jun 27, 2024, 10:16 AM

#

torn sphinx Yeah, pretty much.

Depends on what's in it - e.g. if it's something like settings, store it in a platform-specific app folder (e.g. on Windows it'd be in %APPDATA%, and in general you can use appdirs). If in doubt, can always just use the folder your program is in - that'd be a portable installation of sorts.

torn sphinx Jun 27, 2024, 10:30 AM

#

hexed estuary Depends on what's in it - e.g. if it's something like settings, store it in a pl...

Sounds good. Thanks for the advice.

torn sphinx Jun 27, 2024, 1:03 PM

#

has anybody used sqlalchemy alembic? is it a common pattern to create migrations as models are defined?

fading patrol Jun 27, 2024, 1:49 PM

#

torn sphinx has anybody used sqlalchemy alembic? is it a common pattern to create migrations...

I only have a little bit, but I think yes, that's probably a good idea.

spare fox Jun 27, 2024, 8:25 PM

#

hi guys! so my company is copy pasting all the data from google manually..... to get real time data like working hours every week. Is it possible to scrape that from the website? and not actually write it in the table

coral wasp Jun 27, 2024, 8:28 PM

#

spare fox hi guys! so my company is copy pasting all the data from google manually..... to...

Ask over in #python-discussion , but you may run into a Terms of Use issue (see #rules / rule 5) because Google doesn't like scraping

fervent charm Jun 27, 2024, 10:51 PM

#

anyone used kdb+

coral wasp Jun 27, 2024, 11:40 PM

#

fervent charm anyone used kdb+

Only briefly, it's somewhat on the other end of what I tend to focus on

#

What's up tho?

fervent charm Jun 27, 2024, 11:56 PM

#

coral wasp What's up tho?

looking to store daily trade data at minute resolution. also I am broke lol. Also storing daily reddit info from api, so far have about 250 parquet files for it, so getting unwieldy and some database is probably in order. doing all processing of reddit data with polars because pandas was too slow

#

kdb has like a one year free trial but honestly whos trying to deal with that and renewing and stuff (and I think I have to learn their weird q language).. probably gonna go with more ML approach rather than try to compete with HFT which is what kdb tries to cater to

coral wasp Jun 27, 2024, 11:57 PM

#

fervent charm looking to store daily trade data at minute resolution. also I am broke lol. Als...

What's the problem with the parquet files? Hive partitioned? Have you tried something like DuckDB to query the hive partitioned parquets?

#

1min candles are usually manageable without something like KDB

#

I'm a DuckDB shill but you can get pretty far querying a directory of partitioned parquet files. Might need to compute some rollups first/etc, but good partitioning leads to efficient queries

fervent charm Jun 28, 2024, 12:02 AM

#

coral wasp What's the problem with the parquet files? Hive partitioned? Have you tried some...

Not hive partitioned, just learning about that. One problem for example, if I want to access data from Feb 4th 2022 I'd have to iterate through all of them until I find the one with the right data

coral wasp Jun 28, 2024, 12:05 AM

#

That's (partly) what the partitioning is for.

#

Plus: https://duckdb.org/docs/data/parquet/overview.html#partial-reading

fervent charm Jun 28, 2024, 12:10 AM

#

so I should just read in everything and rewrite it to more organized folders?

coral wasp Jun 28, 2024, 12:13 AM

#

fervent charm so I should just read in everything and rewrite it to more organized folders?

If you can partition the data in a manner aligned with your queries, that's worth an experiment.

dim hull Jun 28, 2024, 12:42 AM

#

hey anyone here watched CMU Intro to Database Systems course because i tried to watch it and lot of things i didn't understand i already have some database knowledge with sql but this is the first course i found it hard to understand so is this normal or what

coral wasp Jun 28, 2024, 12:45 AM

#

dim hull hey anyone here watched CMU Intro to Database Systems course because i tried to ...

Not familiar with it, but what did you find hard?

fervent charm Jun 28, 2024, 12:59 AM

#

stuff like timescale db has automatic partitioning, researching that rn

dim hull Jun 28, 2024, 1:10 AM

#

coral wasp Not familiar with it, but what did you find hard?

the course overall i saw the first 3 lectures and there are somethings i couldn't understand and relate like database storage

coral wasp Jun 28, 2024, 1:18 AM

#

fervent charm stuff like timescale db has automatic partitioning, researching that rn

My preference lately for parquet is the portability and ergonomics: it's easy to move, access via other engines, and you can get it to work reasonably well: sure, other systems might be faster, but depends on what you're trying to do.

coral wasp Jun 28, 2024, 1:18 AM

#

dim hull the course overall i saw the first 3 lectures and there are somethings i couldn'...

If you can ask something specific, I'm sure someone will try to help

coral wasp Jun 28, 2024, 1:21 AM

#

fervent charm stuff like timescale db has automatic partitioning, researching that rn

Timescale is pretty cool, though. Applicable to a lot of use cases.

brave tundra Jun 28, 2024, 1:26 AM

#

for i in semi_finalist(): cur.execute('update player set seed = 1 where player_id = {i}')

#

how do I check the player_id is = to the i

#

the semi_finalist() is a list

coral wasp Jun 28, 2024, 1:28 AM

#

brave tundra `for i in semi_finalist(): cur.execute('update player set seed = 1 where...

Don't do what you're trying to do.

#

!sql

delicate fieldBOT Jun 28, 2024, 1:28 AM

#

SQL & f-strings

Don't use f-strings (f"") or other forms of "string interpolation" (%, +, .format) to inject data into a SQL query. It is an endless source of bugs and syntax errors. Additionally, in user-facing applications, it presents a major security risk via SQL injection.

Your database library should support "query parameters". A query parameter is a placeholder that you put in the SQL query. When the query is executed, you provide data to the database library, and the library inserts the data into the query for you, safely.

For example, the sqlite3 package supports using ? as a placeholder:

query = "SELECT * FROM stocks WHERE symbol = ?;"
params = ("RHAT",)
db.execute(query, params)

Note: Different database libraries support different placeholder styles, e.g. %s and $1. Consult your library's documentation for details.

See Also

Python sqlite3 docs - How to use placeholders to bind values in SQL queries
PEP-249 - A specification of how database libraries in Python should work

coral wasp Jun 28, 2024, 1:29 AM

#

You are missing the 'f' to make it an fstring, but you never want to do that. Do what the above text says to do instead.

brave tundra Jun 28, 2024, 1:29 AM

#

okay thanks

coral wasp Jun 28, 2024, 1:31 AM

#

And, for efficiency's sake, you should run a single update instead of a loop. You can do this in a single query with a list of ids.

brave tundra Jun 28, 2024, 1:33 AM

#

alright I'll give it a try

#

thanks

coral wasp Jun 28, 2024, 1:34 AM

#

Can't do = a list

#

You want 'in (?)' instead

#

Ie: Give me all rows where the id is in the id_list

brave tundra Jun 28, 2024, 1:37 AM

#

[1, 2, 3, 4]

fervent charm Jun 28, 2024, 1:39 AM

#

coral wasp Timescale is pretty cool, though. Applicable to a lot of use cases.

yeah apparently kdb actually kinda sucks so not gonna put in extra effort to use them

#

timescale also seems nice for adding data in real time from a stream while duckdb not so much for that

coral wasp Jun 28, 2024, 1:41 AM

#

HFT vs non is a big divide. I live on the non side, so my tech stack is more throughput vs latency focused

fervent charm Jun 28, 2024, 1:41 AM

#

and timescale seems flexible enough that I can use it for financial metrics, economic indicators, as well as the typical time series

coral wasp Jun 28, 2024, 1:41 AM

#

fervent charm timescale also seems nice for adding data in real time from a stream while duckd...

DuckDB (OLAP in general) is for the consumption side

fervent charm Jun 28, 2024, 1:44 AM

#

coral wasp DuckDB (OLAP in general) is for the consumption side

As the only one involved in my project I am consumption and production lol

coral wasp Jun 28, 2024, 1:45 AM

#

Or perhaps said differently: you'd still need a production pipeline to retrieve (or receive) and store the data.

brave tundra Jun 28, 2024, 1:45 AM

#

coral wasp You want 'in (?)' instead

thanks I used this and it worked

coral wasp Jun 28, 2024, 1:45 AM

#

One option is timescale, another is parquet, etc

coral wasp Jun 28, 2024, 1:46 AM

#

brave tundra thanks I used this and it worked

In general, when working with data, try to avoid loops. Theres usually a better way.

brave tundra Jun 28, 2024, 3:07 AM

#

Thanks a lot

cedar tiger Jun 28, 2024, 5:53 AM

#

Need help with generating proper SQL statements or at least save SQL statements into a SQL file so that I can execute them into the database. https://discord.com/channels/267624335836053506/1256124450139668530

fresh grail Jun 28, 2024, 7:24 AM

#

My databases keep getting "malformed" but it successed in inserting data

#

I use sqlite3

#

When I download the database and want a website to read it for me, it says it can't

#

Does anyone knows about how to fix it

fading patrol Jun 28, 2024, 11:41 AM

#

fresh grail Does anyone knows about how to fix it

https://stackoverflow.com/questions/18259692/how-to-recover-a-corrupt-sqlite3-database

Stack Overflow

How to recover a corrupt SQLite3 database?

This is a follow up question to a previously answered post: Is there a command line utility for validating SQLite databases in Linux?
If a database is producing the following error:
$ sqlite3 mydat...

brave ingot Jun 28, 2024, 7:06 PM

#

hey! I'm trying to avoid hard-coding relationship strings in sqlalchemy to make everything more type safe

for example, if I have such models:

class User(Base):
    id = Column(Integer, primary_key=True)
    is_active = Column(Boolean, default=True)

    items = relationship(lambda: Item, back_populates="owner")


class Item(Base):
    id = Column(Integer, primary_key=True)
    owner_id = Column(Integer, ForeignKey(User.id))

    owner = relationship(lambda: User, back_populates="items")

how do I avoid hard-coding values like "owner" or "items" of back_populates?

astral current Jun 28, 2024, 9:59 PM

#

Best database for beginners doing dpy?

wise wind Jun 28, 2024, 10:25 PM

#

astral current Best database for beginners doing dpy?

postgres

fervent charm Jun 28, 2024, 10:32 PM

#

coral wasp One option is timescale, another is parquet, etc

not sure if youve ever heard of it but Im leaning towards questdb. seems to be maybe half as popular as timescale though

coral wasp Jun 28, 2024, 10:34 PM

#

fervent charm not sure if youve ever heard of it but Im leaning towards questdb. seems to be m...

Do you really have this kind of workload though? 1 minute candles isn't super high frequency.

wise goblet Jun 28, 2024, 10:34 PM

#

astral current Best database for beginners doing dpy?

What is dpy?

wise wind Jun 28, 2024, 10:53 PM

#

await db.execute(select(Users).options(selectinload(Users.sent_dms).selectinload(Dms.dm_messages).selectinload(Dm_Messages.user)))``` Whenever I try to load in `Dm_Messages.user`, nothing loads leading to a greenlet error, but whenever I don't load it, the relationships load like normal. Does anyone know what could be causing this issue?

slim ingot Jun 28, 2024, 11:01 PM

#

#

got a thing going

fervent charm Jun 28, 2024, 11:10 PM

#

coral wasp Do you really have this kind of workload though? 1 minute candles isn't super hi...

I did have a bottleneck of reading data in for a web app I made using daily price data

#

ended up using an lru cache and just dealth with the first run being kinda slow

#

So if questdb makes reads faster at the cost of just not having a DELETE operation available...

#

though now I am seeing someone on reddit say: "I tried to migrate from numpy .npy files to a timeseries database. I saw a degradation of 10x in read performance. I scrapped the idea and stuck with flat files."

#

for the web app I actually tried using clickhouse thinking itd be faster but it wasnt (I was using pandas' read_parquet() instead of polars or duckdb )

coral wasp Jun 29, 2024, 12:07 AM

#

Databases are a frustrating specialty.... I aim for good enough and predictably scalable, over perfect/etc

astral current Jun 29, 2024, 1:36 PM

#

wise goblet What is dpy?

Discord python

astral current Jun 29, 2024, 1:37 PM

#

wise wind postgres

Never heard of that are you sure bro ?

wise goblet Jun 29, 2024, 1:41 PM

#

astral current Best database for beginners doing dpy?

Sqlite3 for the win for simple beginner projects

astral current Jun 29, 2024, 1:41 PM

#

wise goblet Sqlite3 for the win for simple beginner projects

Is that a website ?

wise goblet Jun 29, 2024, 1:41 PM

#

astral current Is that a website ?

No, it is database

astral current Jun 29, 2024, 1:41 PM

#

I need resources I code on mobile

astral current Jun 29, 2024, 1:41 PM

#

wise goblet No, it is database

O I thought sqlite3 fir the win for simple beginner projects was the name of the website

wise goblet Jun 29, 2024, 1:42 PM

#

Coding on mobile is not intended 😕

astral current Jun 29, 2024, 1:42 PM

#

wise goblet Coding on mobile is not intended 😕

I dont have a pc 😭

wise goblet Jun 29, 2024, 1:43 PM

#

astral current I dont have a pc 😭

Start with a plan to get one then -_-.

astral current Jun 29, 2024, 1:43 PM

#

wise goblet Start with a plan to get one then -_-.

You got a plan?

wise goblet Jun 29, 2024, 1:44 PM

#

astral current You got a plan?

I am not you too know your conditions.
Choices can vary from asking parents/relatives to buy one
To getting summer job to get the money

astral current Jun 29, 2024, 1:44 PM

#

wise goblet I am not you too know your conditions. Choices can vary from asking parents/rel...

Oh

#

I thought you were talking about sqlite3

#

Don't worry I'm gonna buy one next month

astral current Jun 29, 2024, 1:45 PM

#

wise goblet I am not you too know your conditions. Choices can vary from asking parents/rel...

I'm 20 so I can get one by myself 🙂

brave tundra Jun 29, 2024, 1:51 PM

#

match = 'delete from matches' result = 'delete from results' cur.execute(match) cur.execute(result)

#

why isn't the record in the table deleted

coral wasp Jun 29, 2024, 2:32 PM

#

Did you commit?

bright sigil Jun 29, 2024, 3:39 PM

#

I wanna make a voting system using python, and html/css/js what's the best db to use? around 1000 voters, I thought I'm gonna use SQLite, but how do I handle multiple simultaneous votes? Cause I looked it up it and SQLite can only have 1 write simultaneously.
I'm pretty new to this db and networking stuff, and concurrency (if it's needed)

#

Help? Please 🥺

#

Guidance for what I should learn

waxen finch Jun 29, 2024, 5:14 PM

#

bright sigil I wanna make a voting system using python, and html/css/js what's the best db to...

sqlite can handle concurrency reasonably well if you ensure your connections/transactions are short-lived, and sqlite also has a WAL mode to remove contention between readers and writers

Situations Where SQLite Works Well

Websites
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.

Server-side database
With this pattern, the overall system is still client/server: clients send requests to the server and get back replies over the network. But instead of sending generic SQL and getting back raw table content, the client requests and server responses are high-level and application-specific. The server translates requests into multiple SQL queries, gathers the results, does post-processing, filtering, and analysis, then constructs a high-level reply containing only the essential information.

#

you should learn how to design good database schemas, like what tables you should create, what primary/foreign keys you should use, normal forms you should follow, using indexes to optimize particular queries, etc. and then get used to making connections and executing queries to retrieve/insert into your database

(see also the SQLBolt interactive tutorial if you want practice with writing queries, and the SQLite docs to learn their SQL syntax like INSERT, SELECT, UPDATE, etc.)

brave tundra Jun 29, 2024, 7:28 PM

#

coral wasp Did you commit?

Oh shoot yes thanks

fading patrol Jun 29, 2024, 9:37 PM

#

I see some really old references to people using SQLite and other SQL databases as the message broker for Celery. The Celery documentation doesn't mention this possibility.

Does anyone happen to know if it is still possible?

storm mauve Jun 29, 2024, 9:53 PM

#

fading patrol I see some really old references to people using SQLite and other SQL databases ...

this I think?

#

only as a result backend though, not as a message broker?

storm mauve Jun 29, 2024, 10:05 PM

#

fading patrol I see some really old references to people using SQLite and other SQL databases ...

taking a look at the update history: https://github.com/celery/celery/blob/main/docs/history/whatsnew-4.0.rst#features-removed-for-lack-of-funding

GitHub

celery/docs/history/whatsnew-4.0.rst at main · celery/celery

Distributed Task Queue (development branch). Contribute to celery/celery development by creating an account on GitHub.

fading patrol Jun 29, 2024, 10:26 PM

#

storm mauve taking a look at the update history: https://github.com/celery/celery/blob/main/...

Good find, thanks!

steady epoch Jun 30, 2024, 10:22 AM

#

should i use sqlalchemy with asyncpg? or just sqlachemy alone , i want good performance
is using it is overkill?

wise goblet Jun 30, 2024, 2:44 PM

#

steady epoch should i use sqlalchemy with asyncpg? or just sqlachemy alone , i want good perf...

Since we don't know how serious your app (and what kind of quering usage cases it has), we can't recommend if u should use it or not.
By default i don't recommend, because u bring extra code complexity to code with having async stuff

#

I also don't recommend such extra over compleixities to people who did not cover their database interacting code with unit tests and added their execution into CI

steady epoch Jun 30, 2024, 2:55 PM

#

wise goblet Since we don't know how serious your app (and what kind of quering usage cases i...

its not very big

opal fiber Jun 30, 2024, 6:15 PM

#

Guys i need help

why does python deleted my " HELP " from the " python-help " ??

opal fiber Jun 30, 2024, 6:32 PM

#

Anyway i will post my problem here 😁😁😁

coral wasp Jun 30, 2024, 7:41 PM

#

opal fiber Anyway i will post my problem here 😁😁😁

Threads are closed after 1 hour of inactivity.

opal fiber Jun 30, 2024, 8:21 PM

#

BRUH ☠️

coral wasp Jun 30, 2024, 8:31 PM

#

opal fiber BRUH ☠️

Do you have a question?

opal fiber Jun 30, 2024, 8:57 PM

#

coral wasp Do you have a question?

I have a big problem

coral wasp Jun 30, 2024, 8:59 PM

#

opal fiber I have a big problem

...

opal fiber Jun 30, 2024, 9:00 PM

#

i'm in need of help to finish my term paper i know this is for simple help, but i really need the help of a good soul so i can finish it, so here goes;

My project consists of a control of entry and exit of students using the QR-CODE of the student card when entering you must show the card to the laptop* and take the photo of the card (The QR-CODE shows the student's enrollment that will be previously registered with the student's data that I have already done).

At this point, the system must record the time in the MySQL database, and the same thing applies to the student's departure.

I'm doing my project in Python and MySQL

The problem I have is this enrollment data that I can't transport and save the time and the link to the database.

I need to present it on July 2nd now 😢

opal fiber Jun 30, 2024, 9:09 PM

#

coral wasp ...

.

coral wasp Jun 30, 2024, 9:11 PM

#

opal fiber i'm in need of help to finish my term paper i know this is for simple help, but ...

You want to write a record to a database table when the student enters, and write a record when the student exits.

#

What have you done already for the database? Do you have a database? Have you created a database table?

opal fiber Jun 30, 2024, 9:12 PM

#

I created the database

#

And the tables

#

I just need help to put these things usable

coral wasp Jun 30, 2024, 9:16 PM

#

opal fiber I just need help to put these things usable

Where are you stuck?

opal fiber Jun 30, 2024, 9:17 PM

#

The alter table and the relationtip with the main python code

coral wasp Jun 30, 2024, 9:18 PM

#

Please share the relevant code and ask a specific question, I still don't know what you need help with.

opal fiber Jun 30, 2024, 9:18 PM

#

Ok

coral wasp Jun 30, 2024, 9:18 PM

#

The more specific the better

fervent charm Jun 30, 2024, 9:33 PM

#

Is it bad to have thousands of tables

coral wasp Jun 30, 2024, 9:34 PM

#

fervent charm Is it bad to have thousands of tables

My short answer: probably. Why do you have so many?

fervent charm Jun 30, 2024, 9:34 PM

#

coral wasp My short answer: probably. Why do you have so many?

One per stock / etf (minute level data)

coral wasp Jun 30, 2024, 9:35 PM

#

Why would you put each ticker in a separate table?

fervent charm Jun 30, 2024, 9:35 PM

#

I dont have any yet, just trying to see how I should do this

fervent charm Jun 30, 2024, 9:35 PM

#

coral wasp Why would you put each ticker in a separate table?

Something tells me it'd be more efficient

coral wasp Jun 30, 2024, 9:35 PM

#

Just use one table and have a ticker column.

#

What you're trying to do is partitioning, and partitioning can be done within a table.

#

The idea is sound, just the wrong tool

fervent charm Jun 30, 2024, 9:37 PM

#

Like, I don't put all my price data into a single parquet file, there's a separate praquet file for each. So naturally I didnt think that putting everything into a single table would be correct

fervent charm Jun 30, 2024, 9:37 PM

#

coral wasp What you're trying to do is partitioning, and partitioning can be done within a ...

ok

#

My experience with databases has been limited to school busy-work until now and definitely never worked with time series data in a db before

coral wasp Jun 30, 2024, 9:38 PM

#

Many tables would require dynamically selecting tables in the sql
Query, something sql doesn't allow (can't parametrize table names)

#

Most database optimizations come down to: indexing, partitioning, and orientation (columnar vs row)

#

(And roll ups and sharding and denormalization and caching)

fervent charm Jun 30, 2024, 9:41 PM

#

I'm going with questdb so far, seems pretty good. Its column oriented and as a result faster than timescale, but pretty sure the catch is you can't delete rows easily

#

It described itself as OLAP, while describing timescaledb being more OLTP

opal fiber Jun 30, 2024, 10:55 PM

#

xd

#

XAMPP

#

Yes i selected all the stuff

coral wasp Jun 30, 2024, 11:27 PM

#

'Unknown MySQL server host'

opal fiber Jun 30, 2024, 11:41 PM

#

coral wasp 'Unknown MySQL server host'

Thanks bro i didn't drink coffee this afternoon

i will add more stuff to let complete

coral wasp Jun 30, 2024, 11:41 PM

#

Your server host should probably be localhost, not pou

opal fiber Jun 30, 2024, 11:41 PM

#

Yeah LMAO

tulip estuary Jul 1, 2024, 10:24 AM

#

coral wasp You want to write a record to a database table when the student enters, and writ...

u can use context managers , to do an action , right after a connexion is closed

steady epoch Jul 1, 2024, 1:00 PM

#

from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
import asyncpg
from config import DatabaseConfig
from sqlalchemy import MetaData

DATABASE_URL = f"postgresql+asyncpg://{DatabaseConfig.username}:{DatabaseConfig.password}@{DatabaseConfig.hostname}:{DatabaseConfig.db_port}/{DatabaseConfig.db_name}"
print(DATABASE_URL)
DEFAULT_DATABASE_URL = f"postgresql://{DatabaseConfig.username}:{DatabaseConfig.password}@localhost:5432/postgres"
print(DEFAULT_DATABASE_URL)
# Base for declarative class configuration
Base = declarative_base(metadata=MetaData())


async def create_database_if_not_exists():
    try:
        # Connect to the default database
        default_engine = await asyncpg.connect(DEFAULT_DATABASE_URL)
        # Check if the target database exists
        result = await default_engine.fetchrow(f"SELECT 1 FROM pg_database WHERE datname = '{DatabaseConfig.db_name}'")
        print(result)
        if not result:
            # Create the target database if it doesn't exist
            await default_engine.execute(f'CREATE DATABASE "{DatabaseConfig.db_name}"')
            print(f"Database '{DatabaseConfig.db_name}' created successfully")
            await default_engine.commit()
        await default_engine.close()
    except Exception as e:
        print(e)


async def create_tables():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
        await conn.run_sync(Base.metadata.create_all)

        print("Tables created successfully")


async def initialize_database():
    try:
        await create_database_if_not_exists()
        await create_tables()
    except Exception as e:
        print(f"Error during database initialization: {e}")

# Create the async engine for the target database
engine = create_async_engine(
    DATABASE_URL,
    echo=True,            # Set to True for debugging SQL queries (optional)
    # Enable pool_pre_ping to check for stale connections (recommended)
    pool_pre_ping=True,
    pool_size=20,         # Maximum number of connections in the pool
    max_overflow=0        # Maximum number of connections that can be created beyond pool_size
)

# Create a configured "Session" class
AsyncSessionLocal = sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False
)

why tables are not getting created : (

#

literally i cant figure out where i am doing wrong everything works but then it doesnt create any table at all

coral wasp Jul 1, 2024, 7:48 PM

#

Add more print statements and narrow it down.

hexed estuary Jul 1, 2024, 7:54 PM

#

except Exception as e:
    print(e)

don't ever do this. at the very least do traceback.format_exc. otherwise you're discarding the valuable traceback.

spare hemlock Jul 1, 2024, 8:40 PM

#

unsure if better here or elsewhere but someone knows if it's possible to read a JS JSON in python? like it isn't plain JSON but a JS file meant to emulate JSON (along with variable names, etc…). right now made it able to read file from internet, now only to read content from it 🙂

#

like i want this file to be read (and putted to variables) by python: https://hacknorris.codeberg.page/alchemy/db.js

hexed estuary Jul 1, 2024, 8:46 PM

#

manually strip the variable stuff and load the rest as 3 different json files (or as python code for that matter)

spare hemlock Jul 1, 2024, 8:46 PM

#

it have 3 "tables"

#

aka - 3 variables

#

and i want them to be loaded into separate variables in python (number of "tables" in JS should be same as number of variables in python)

hexed estuary Jul 1, 2024, 8:48 PM

#

so, why not just change let initial = ["fire", "water", "air", "earth"]; to initial = ["fire", "water", "air", "earth"]; and similarly for the other two, and then just import that as a python file?

#

or must it be automatic?

spare hemlock Jul 1, 2024, 8:49 PM

#

?

#

JS is broken then

#

(i want one file for 2 apps - one in JS and second - local client in python)

fervent charm Jul 1, 2024, 9:09 PM

#

#

Is this accurate

#

https://www.timescale.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables

coral wasp Jul 1, 2024, 11:31 PM

#

I dunno if I'd say narrow is easier

#

Maybe easier to stick data in

#

But often performs terribly, and dealing with data types is problematic

#

It also compromises various compression and optimizations, and managing data integrity can be harder

wooden topaz Jul 2, 2024, 12:01 AM

#

--> Created an .tf file where i wrote the code to create an lambda fn
--> After executing the series of commands init, plan & apply, the lambda got created on AWS console.
--> Now, if i want to use that lambda, i need run that lambda from AWS console, pass the input & see the output.
--> But, i want to execute this lambda from my console & see the output also in my console if self without touching AWS portal. Can terraform do this?

lavish hull Jul 2, 2024, 1:21 AM

#

Hello everyone just a quick question:
I have a get request to one of my models lets say Departments and im passing a column value from a dataset which is typically a short string like 'La county' but sometimes it is a string with multiple counties as one string like 'La county, multnomah county, idaho county' and my get method call fails. what would be the best way cut up that string into seperate strings and run the get method on each?
would making it into a list with split(',') work and then run a loop on that list with each string making its own get call work? Thanks

fervent charm Jul 2, 2024, 10:03 AM

#

coral wasp But often performs terribly, and dealing with data types is problematic

I was going to say, seems like they missed "performance" as a measure to compare with

serene thorn Jul 2, 2024, 1:23 PM

#

EDIT: Problem solved! See below.
Hey everyone, I am having problems creating a session context manager. I have some generic database methods in a class like bulk add. Sometimes I do not want to commit changes unless all method calls execute properly, therefore I was thinking of creating a session context manager. The code executes without errors but the changes are not written to the database. I tried creating it:

async def transaction():
    try:
        async with async_session() as session:
            yield session
            if session.dirty or session.new or session.deleted:
                await session.commit()
    except SQLAlchemyError:
        await session.rollback()
        raise

async def bulk_update_objects(
    self,
    objects: List[Any],
    update_fields: List[str],
    model: Type[Any],
    session: AsyncSession = None
):
    primary_key = inspect(model).primary_key[0].name
    updates = [{primary_key: getattr(obj, primary_key), **{field: getattr(obj, field) for field in update_fields}} for obj in objects]

    # Perform the bulk update
    async with session or async_session() as sess:
        try:
            await sess.execute(
                update(model),
                updates
            )
            if session is None:
                await sess.commit()
        except SQLAlchemyError:
            if session is None:
                await sess.rollback()
            raise

async def example():
    async with self._db_operator.transaction() as session:
        await self._db_operator.bulk_update_objects(
            objects=examples,
            update_fields=["example_flag"],
            model=Example,
            session=session
        )```

Even though the session object has the same ID in all places the changes are never actually written to the PostgreSQL database when using the yielded session. The bulk_update_objects method works fine when the session argument is not provided. Anyone have any idea why that might be?

#

I have also unsuccessfully used the begin method since I thought it might commit the changes after the session is closed automatically, thereby forcing a commit after the yield. But that didn't work either. For example:

    @asynccontextmanager
    async def transaction():
        session = async_session()
        try:
            async with session.begin():
                yield session
        except SQLAlchemyError:
            await session.rollback()
            raise

serene thorn Jul 2, 2024, 3:01 PM

#

Nevermind, it was stupid to create a new context using "async with" like so:

async with session or async_session() as sess:

Instead I can make it work by having the method constructed like this instead:

        context = async_session() if session is None else nullcontext(session)

        async with context as sess:
            try:
                await sess.execute(
                    update(model),
                    updates
                )
                if session is None:
                    await sess.commit()
            except SQLAlchemyError:
                if session is None:
                    await sess.rollback()
                raise

Problem solved.

cloud sedge Jul 2, 2024, 4:47 PM

#

my code has an error and idk what it means

coral wasp Jul 2, 2024, 5:16 PM

#

cloud sedge my code has an error and idk what it means

Ask in #python-discussion , not database related... but, you're downloading a .json.gz, but treating it as a .json

opal river Jul 3, 2024, 5:01 AM

#

This is a random question but can you make a model that for examples tries to draw trendlines on financial data and on each output it gets a rating from a human and the correct example to learn from

fading patrol Jul 3, 2024, 11:03 AM

#

opal river This is a random question but can you make a model that for examples tries to dr...

Of course. But probably the more normal way would be to get all the human input together first and then train on that

If you have more questions #data-science-and-ml is probably the better place

opal river Jul 3, 2024, 12:00 PM

#

Ohh sorry

#

I thought this was the channel

#

My mistake

deep heron Jul 3, 2024, 7:24 PM

#

I am in the process of making a central company package that we can import into our projects for common dev stuff. It's only for internal use and is being stored on a local hosted gitlab, and the users are locked into a menu through a putty session and as soon as their process completes, the session gets closed. So they can't view the actual coding. One thing I realized is that we have a single username and password into our database for general use, and this is something I have to work around (out of my control). The thing is, all of our current projects hardcode that user password combination.
Being that I can't adjust the database, what is the best way to handle database access? Is there a way to get away from hardcoding that? Currently I use it in a JSON.

#

Packages are being used with Poetry, btw.

fading patrol Jul 3, 2024, 7:35 PM

#

deep heron I am in the process of making a central company package that we can import into ...

Looks like GitLab is still working on their Secret Manager solution. 😢

Are you using on-prem servers or what? No containers? Would environment variables work?

This is more of a #tools-and-devops question BTW, unless your DB engine offers some sort of non-password authentication

#

But to be clear you absolutely should not be hard coding DB credentials in your repo

deep heron Jul 3, 2024, 7:38 PM

#

Git lab is locally hosted, is that what you mean by on-prem servers? We're not using any containers, unless you count the Putty remote sessions each user uses. I've raised the question about a system environment variable for the "Business Box" as we call it, as each user uses a remote session into it. I've not heard a good answer back yet other than, "rather not if it can be avoided" so I'm unsure on that. ("Small" company problems, I know...)

#

Should I move the question?

#

It was a toss up for this question and my other one on which would be the proper channels. I chose poorly 😂

#

And we use SQLAlchemy through a connector for MariaDB, if that's the engine.

fading patrol Jul 3, 2024, 7:45 PM

#

deep heron Git lab is locally hosted, is that what you mean by on-prem servers? We're not ...

This is a good summary of the options as I understand them. If you're stuck with the infra you're on, I do think environment variable is probably the way to go

https://medium.com/twodigits/keep-passwords-out-of-source-code-why-and-how-e84f9004815a

Medium

Keep passwords out of source code — why and how

A very common security misbehavior I see in my daily work is that credentials are checked into source code control (like git). This is often referred to as hard-coded credentials. This article…

fading patrol Jul 3, 2024, 7:46 PM

#

deep heron Git lab is locally hosted, is that what you mean by on-prem servers? We're not ...

Oh actually, here's a doc from GitLab https://docs.gitlab.com/ee/ci/secrets/

Using external secrets in CI | GitLab

deep heron Jul 3, 2024, 7:53 PM

#

Ok, thanks 😄 I will give them a read.

deep heron Jul 3, 2024, 8:30 PM

#

Storing a password as an environmental variable would still be open to anyone who could get access to those, right?

fading patrol Jul 3, 2024, 8:33 PM

#

deep heron Storing a password as an environmental variable would still be open to anyone wh...

Yes

deep heron Jul 3, 2024, 8:34 PM

#

But this is still more secure than having them in the program, and much better than having them in the repo.

#

I'm currently researching the hashicorp option, but it seems like a bit much for just one user/pass

small bramble Jul 4, 2024, 12:34 PM

#

A database can use both data storage layout row and columb based together?

#

In a system

coral wasp Jul 4, 2024, 2:34 PM

#

small bramble A database can use both data storage layout row and columb based together?

Some databases allow different storage engines on a per table basis, so some tables might use a row oriented engine, and others might use a columnar engine.

delicate fieldBOT Jul 4, 2024, 3:05 PM

#

:incoming_envelope: :ok_hand: applied timeout to @torn sphinx until <t:1720106129:f> (10 minutes) (reason: emoji spam - sent 24 emojis).

The <@&831776746206265384> have been alerted for review.

obsidian basin Jul 5, 2024, 4:13 AM

#

When a site is live or phrased better connected to the internet do people use flask migrate to update a database?

small bramble Jul 5, 2024, 6:34 AM

#

coral wasp Some databases allow different storage engines on a per table basis, so some tab...

Hii Billy long time no see, hru
Can it is possible we save ur data in 2 database one is row another column and a query selector who select which query is executed by which database column or row?

fading patrol Jul 5, 2024, 11:17 AM

#

obsidian basin When a site is live or phrased better connected to the internet do people use ...

Yes, people do that

fervent charm Jul 5, 2024, 12:43 PM

#

Is this an alright design:

CREATE TABLE IF NOT EXISTS estimates (
    timestamp_accessed TIMESTAMPTZ NOT NULL,
    ticker_id INTEGER NOT NULL, -- AAPL
    line_item TEXT NOT NULL, -- net_sales
    quarter VARCHAR(6) NOT NULL, -- 2022Q1
    released INTEGER,  -- 27908
    forecast INTEGER, --28233
    spread NUMERIC, -- -1.15%
    announcement_date DATE, -- 2024/09/06
    PRIMARY KEY (timestamp_accessed, ticker_id, line_item, quarter),
    CONSTRAINT fk_ticker_id FOREIGN KEY (ticker_id)
        REFERENCES tickers (id)
        ON UPDATE CASCADE,
    CONSTRAINT check_quarter_format CHECK (quarter ~ '^\d{4}Q[1-4]$')
);

coral wasp Jul 5, 2024, 1:05 PM

#

small bramble Hii Billy long time no see, hru Can it is possible we save ur data in 2 database...

That doesn't sound practical. Two tables with the same data but different storage engines?

delicate fieldBOT Jul 5, 2024, 2:27 PM

#

failmail :ok_hand: applied timeout to @tardy bolt until <t:1720190228:f> (10 minutes) (reason: mentions spam - sent 6 mentions).

The <@&831776746206265384> have been alerted for review.

obsidian basin Jul 5, 2024, 11:02 PM

#

@fading patrol Are there other options with flask sqlalchemy, thanks for the previous response

fading patrol Jul 5, 2024, 11:21 PM

#

obsidian basin <@382671972829495298> Are there other options with flask sqlalchemy, thanks for ...

Other options? To do what?

obsidian basin Jul 5, 2024, 11:23 PM

#

@fading patrol to add and delete columns in a table without losing the data in the db

fading patrol Jul 5, 2024, 11:28 PM

#

obsidian basin <@382671972829495298> to add and delete columns in a table without losing the da...

You cannot delete columns without losing data.

Alembic is a standard tool for managing migrations with SQL Alchemy if that's what you're asking for

obsidian basin Jul 5, 2024, 11:29 PM

#

I just mean wihthout losing data in other columns

#

that still exist

fading patrol Jul 5, 2024, 11:30 PM

#

obsidian basin that still exist

It's not a bad idea to backup your DB before performing any migrations

obsidian basin Jul 5, 2024, 11:30 PM

#

So again is flask migrate the best tool,

#

or are there other options?

queen sleet Jul 6, 2024, 5:13 AM

#

Hi everyone,does anyone know how to get the unused table in MySQL other than using count(For MySQL)

coral wasp Jul 6, 2024, 11:55 AM

#

Wdym unused?

torn sphinx Jul 6, 2024, 8:55 PM

#

is big data even there yet?

coral wasp Jul 6, 2024, 8:58 PM

#

torn sphinx is big data even there yet?

Where?

torn sphinx Jul 6, 2024, 9:00 PM

#

coral wasp Where?

is spark and Hadoop even good? Is the technology even there yet in terms of quality?

coral wasp Jul 6, 2024, 9:00 PM

#

You're asking about old tech that's been around for quite a while.

#

So, yes.

torn sphinx Jul 6, 2024, 9:01 PM

#

ok

coral wasp Jul 6, 2024, 9:01 PM

#

Spark is like 10 years old.

#

Hadoop almost 20

torn sphinx Jul 6, 2024, 9:02 PM

#

don't they require like a nonsense amount of RAM?

coral wasp Jul 6, 2024, 9:02 PM

#

torn sphinx don't they require like a nonsense amount of RAM?

Depends what you're doing. But, ram is also pretty cheap.

brazen charm Jul 6, 2024, 9:30 PM

#

I would say Hadoop is mostly on its way out tbh

#

I do not see it much really outside of legacy systems or hidden away by spark's handling

#

by "on its way out" I mean more along the lines of it isn't being used directly on new project anymore

sacred widget Jul 6, 2024, 9:44 PM

#

Hello guys ! I need tips about Sqlite3 functions management. I'm having some exercises with Python and now I'm facing the database connection with Sqlite3.

My question is this " How can i organize well all my function for connect, disconnect, insert, select, update, etc... ? " I tried to organize as best I could but I've always a lot of issue with connection and cursor object thru all the functions i created.

It's better to write always clean code customized every time in this case or someone of you can help me with some example how the manage all these database function ?

Thank you very much !!!

fading patrol Jul 6, 2024, 10:12 PM

#

sacred widget Hello guys ! I need tips about Sqlite3 functions management. I'm having some exe...

If you can share some code it might clarify what you're having a problem with.

It sounds like you're starting to experience one reason that using an ORM can help

sacred widget Jul 6, 2024, 10:18 PM

#

fading patrol If you can share some code it might clarify what you're having a problem with. ...

I'll write again some code for let you understand what I mean. Asap it's ready I'll sent to you !
Thank you very much

willow mortar Jul 6, 2024, 11:41 PM

#

Hi guys. I need some suggestions.

My project currently has two tables in the database. staff_id = Column(Integer, primary_key=True, autoincrement=True) but the autoincrement is relative to individual table. But I would love it to be global for the database. How can I achieve this?

Example: Intern employed on July 7th gets a Staff ID of 1092 but a manager employed the following day could be getting a Staff ID of 723.

coral wasp Jul 6, 2024, 11:42 PM

#

willow mortar Hi guys. I need some suggestions. My project currently has two tables in the da...

Your question is: 'how to share a sequence across tables' (a sequence being the counter used to implement the auto increment)

#

What database are you using?

willow mortar Jul 6, 2024, 11:43 PM

#

SQLite3

#

SQLAlchemy

keen minnow Jul 6, 2024, 11:43 PM

#

willow mortar Hi guys. I need some suggestions. My project currently has two tables in the da...

What problem are you trying to solve?

#

What benefit do you see in having a global id for the database?

coral wasp Jul 6, 2024, 11:44 PM

#

One option is another base table for 'person'.

willow mortar Jul 6, 2024, 11:45 PM

#

keen minnow What problem are you trying to solve?

Just so the next employee gets the next Staff_ID in the organization, rather than in the table.

keen minnow Jul 6, 2024, 11:45 PM

#

willow mortar Just so the next employee gets the next `Staff_ID` in the organization, rather t...

And what benefit does it convey?

willow mortar Jul 6, 2024, 11:48 PM

#

...

#

Can't think of any. Probably because that how I imagined it during the idea conception.

keen minnow Jul 6, 2024, 11:52 PM

#

willow mortar Can't think of any. Probably because that how I imagined it during the idea conc...

Your problem sounds a lot like a https://xyproblem.info/ to me. So I am trying to understand the root problem here and the trade offs you are willing to make.

There might be reasons to make the number sequential within an organization, but there are also reasons to make the numbers not predictable as well (ex: security measure so no one can't predict the hiring rate). Both having very different trade offs

#

A concrete example of trade off is that having a global id means you will run out of ID faster since it's now shared across all the tables 🙂
And note that's not something generally advisable to do

willow mortar Jul 6, 2024, 11:54 PM

#

keen minnow Your problem sounds a lot like a <https://xyproblem.info/> to me. So I am trying...

Valid points.

#

However, the idea behind having unique IDs is such that I can query database with the ID, and get a maximum of 1 value. If I get lucky enough, the two tables might get as many entries that there might be an overlap at some point.

fading patrol Jul 7, 2024, 1:49 AM

#

willow mortar However, the idea behind having unique IDs is such that I can query database wit...

Your DB design is the problem.

If it is logical for staff and interns to have employee IDs in a single sequence, then it's probably logical for them to be on the same table

If there is really a good reason to put them in two different tables, then the employee number sequence can be a third table with fk relations to the other two

willow mortar Jul 7, 2024, 1:52 AM

#

I got a similar suggestion (using a single table) earlier. I would just use that.