#data-science-and-ml | Python | Page 409

iron basalt Jun 8, 2022, 5:44 AM

#

In the equation what is being updated? Write the expression here.

plush jungle Jun 8, 2022, 5:46 AM

#

Q(action) = Q(action) + learning_rate * (reward_t + discount_factor * MaxQ(s_t+1,a) - Q(Action))

#

?

#

the value in the table that corresponds to the action that just got us to the goal

#

is being added to because we just got a reward

iron basalt Jun 8, 2022, 5:47 AM

#

Q takes two arguments.

plush jungle Jun 8, 2022, 5:47 AM

#

a state and an action?

iron basalt Jun 8, 2022, 5:47 AM

#

Yes. As shown in the link.

plush jungle Jun 8, 2022, 5:47 AM

#

so that means that this part

#

MaxQ(s_t+1,a)

iron basalt Jun 8, 2022, 5:47 AM

#

No.

#

Q always takes two arguments.

plush jungle Jun 8, 2022, 5:48 AM

#

ok Q(state_t, action_t) is being incremented

#

based on both the current reward

#

and the future predicted reward?

iron basalt Jun 8, 2022, 5:48 AM

#

Not incremented necessarily, just updated.

plush jungle Jun 8, 2022, 5:48 AM

#

right

#

in this part MaxQ(s_t+1,a) what does a represent?

#

the action we just took?

#

and s_t+1 is the state we got as a result of action at time t

iron basalt Jun 8, 2022, 5:49 AM

#

Ok, so from the beginning, we are at s_t and take action a_t, we are now at s_t+1. And according to the equation we are updating Q(s_t, a_t), which is not a value at the current state Q(s_t+1, ...).

plush jungle Jun 8, 2022, 5:50 AM

#

not a value?

#

oh right

#

because we haven't gotten a reward yet

#

so everything in the tables is just 0

iron basalt Jun 8, 2022, 5:51 AM

#

Q(s_t, a_t) does not hold the value at the current state s_t+1, it's the previous state (and action from there).

plush jungle Jun 8, 2022, 5:52 AM

#

oh... so s_t+1 is the current state? the state after we took the action at time t?

iron basalt Jun 8, 2022, 5:52 AM

#

Yes.

plush jungle Jun 8, 2022, 5:52 AM

#

ok

#

so what is a in this MaxQ(s_t+1,a)

iron basalt Jun 8, 2022, 5:53 AM

#

If you are at s_t, and take some action, you are now at s_t+1. Or from a different POV (looking into the past), you are now at s_t, and were at s_t-1.

#

You could, if you wanted to, rewrite the equation from that POV, but it's the same thing.

#

(Just trivial change of variable names)

#

If you look at wikipedia for example: https://en.wikipedia.org/wiki/Q-learning

Q-learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.
For any finite Markov decision process (FMDP), Q-learning finds a...

#

In the algorithm section: ```
Before learning begins, Q is initialized to a possibly arbitrary fixed value (chosen by the programmer). Then, at each time t the agent selects an action a_t, observes a reward r_t, enters a new state s_{t+1} (that may depend on both the previous state s_t and the selected action), and Q is updated. The core of the algorithm is a Bellman equation as a simple value iteration update, using the weighted average of the old value and the new information

plush jungle Jun 8, 2022, 5:54 AM

#

but is this correct then?
MaxQ(s_t+1, a_t)

iron basalt Jun 8, 2022, 5:55 AM

#

Yes.

#

But note that at the end goal there are no more actions to make.

plush jungle Jun 8, 2022, 5:56 AM

#

so the first time it reaches the end of the maze, that MaxQ is no value, but when it reaches the second to last square, the MaxQ gets passed the last square as its state?

iron basalt Jun 8, 2022, 5:56 AM

#

The reward is given for the transition from s_t to s_t+1 (r(s_t, a_t)).

#

Since you are reaching the end goal, there is a reward.

#

Which is used to update the value.

#

But Q-learning uses the reward given and the values. So for ones where there is some next possible action (not at the terminal / final state), there is some max value.

plush jungle Jun 8, 2022, 5:58 AM

#

OH

#

I think i'm starting to understand why it's QMax

#

when updating the weights for state t

#

it doesn't just add the reward

iron basalt Jun 8, 2022, 5:59 AM

#

Remember that the values (not rewards), are taking into account the future in this back chaining way.

#

And they take into account the max possible on the next.

plush jungle Jun 8, 2022, 5:59 AM

#

it also adds the best move from state t+1

#

the one it thinks is the best move anyway

iron basalt Jun 8, 2022, 5:59 AM

#

(greedy)

plush jungle Jun 8, 2022, 5:59 AM

#

so if you get a temporary reward, but the highest move from there is only negative, like dead ends

#

it'll punish it

iron basalt Jun 8, 2022, 6:01 AM

#

Punishment comes in the form of the reward (e.g. negative or just less), value is trying to take into account future discounted rewards.

plush jungle Jun 8, 2022, 6:01 AM

#

but is it recursive?

#

or is it just the t+1 move Q value that gets added in

#

not t+2

iron basalt Jun 8, 2022, 6:07 AM

#

So let's say I'm playing chess and I take a queen, huge reward. But my opponent wanted that and sacrificed it. So they make a move that leaves me with only like 3 moves and all of them are bad. They have bad value, but I will still pick the best one (the max given some action). So now next time I'm in that first state, I know that while my current next reward for taking the queen is good, taking into account my future options (even the best one / max), it's not worth it and I need to update my values to reflect that.

plush jungle Jun 8, 2022, 6:11 AM

#

oh, because every state takes into account the state right after it, it's not recursive, but it has essentially the same effect

#

so three moves before taking the queen, that updated weight gets factored in implicitly because each move considers the Q value of the move right after it as well

#

and it's like links in a chain

iron basalt Jun 8, 2022, 6:15 AM

#

Single step here, going from s to s' by taking action a, resulting in reward r. Now you can update Q(s, a) (update your table if it's tabular).
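
A minimal sketch of that single-step update for the tabular case. The function name `q_update`, the state/action labels, and the `alpha`/`gamma` values are illustrative assumptions, not taken from any particular library. Note that the max ranges over every action available in the next state, not only the action that was just taken.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to Q[(s, a)] and return the new value."""
    # Best value reachable from s_next; 0.0 when s_next has no actions (terminal).
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    td_target = r + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)            # every table entry starts at 0
Q[("s1", "left")] = 2.0           # made-up values already learned for s1
Q[("s1", "right")] = 5.0

# We were in s0, took "right", received reward 1.0 and landed in s1.
new_value = q_update(Q, "s0", "right", r=1.0, s_next="s1",
                     next_actions=["left", "right"])
print(new_value)
```

Only `Q[("s0", "right")]` changes here; the entries for `s1` are read but not modified, which matches the point that the updated value belongs to the previous state, not the current one.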

mint palm Jun 8, 2022, 6:15 AM

#

which python version is best for all D

#

Dl, Comp vision stuff, and visualisation and all

iron basalt Jun 8, 2022, 6:16 AM

#

plush jungle and it's like links in a chain

"trajectories", yes.

mint palm Jun 8, 2022, 6:16 AM

#

3.7.X?

iron basalt Jun 8, 2022, 6:24 AM

#

Another way to look at the terminal state in chess is that in that case you only have 1 action, surrender, and so the max of the options is surrender (only item to do max of), and it's a really low max.

plush jungle Jun 8, 2022, 6:26 AM

#

I really appreciate you taking the time to explain this

iron basalt Jun 8, 2022, 6:27 AM

#

But you don't actually need a future value for the terminal state, because there is no future after that.

#

The bad reward for getting there will propagate on its own. (max of 0 actions to take, can just let it give 0)

#

(Or just not have the terminal state be a special case and treat surrender as an action, whatever, same thing, depends on how you prefer to code it)
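
A small sketch of how the goal reward propagates backwards one link per pass when the terminal state contributes a future value of 0. The corridor environment, the constants, and the fixed sweep order are assumptions chosen to keep the arithmetic easy to follow; a real agent would update from sampled episodes instead.

```python
GOAL = 4                          # terminal state: no actions available there
ACTIONS = {"left": -1, "right": +1}
GAMMA, ALPHA = 0.9, 1.0           # alpha=1 so each update jumps straight to the target

Q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}

def step(s, a):
    """Deterministic corridor: reward 1 only for the move that reaches GOAL."""
    s_next = min(max(s + ACTIONS[a], 0), GOAL)
    return s_next, (1.0 if s_next == GOAL else 0.0)

def sweep():
    """Update every (state, action) pair once, in ascending state order."""
    for s in range(GOAL):
        for a in ACTIONS:
            s_next, r = step(s, a)
            # Max over zero actions at the terminal state is treated as 0.
            best_next = 0.0 if s_next == GOAL else max(Q[(s_next, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

history = []
for _ in range(4):
    sweep()
    history.append([round(Q[(s, "right")], 3) for s in range(GOAL)])

for row in history:
    print(row)   # Q(s, "right") for s = 0..3 after each sweep
```

After the first sweep only the square next to the goal has a non-zero value; each later sweep pushes the discounted value one square further back, which is the "links in a chain" effect described above.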

stark saddle Jun 8, 2022, 6:39 AM

#

I need To know how to make image matching in python like in gta 4 or police verification systems

#

Like how to achieve that ?

vital torrent Jun 8, 2022, 6:43 AM

#

Hello guys! Nice to meet you all :), I was wondering if you could help me with something. I'm just within my Data Science master's program, and I'm still a little bit new to this field. My inquiry is, is there something such as a "nested time series regression" model using Keras? Like, a model using time series data, but instead of having a defined "batch size" for the input of the model, the input of the model depends on the entries available for each "ID" or "Patient" in the dataset? (like treating each batch/sample as an independent one for the input of the time series model)

#

I hope I'm making myself clear, I'm just struggling with a project right now 🙏

arctic wedgeBOT Jun 8, 2022, 8:53 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

haughty pewter Jun 8, 2022, 10:07 AM

#

to find a correlation between the two, is it right to say:

since satisfied column is left-skewed, scores of 41 and 43 are the most frequent scores of being satisfied with their flight, or lower scores (22 or less) exponentially decreases with the decrease in scores

as neutral or satisfied column is normally distributed, it is very close to the median Final Score of 37.5 compared to the satisfied columns, with its most common value being 36.

we can conclude that the highest values of both categories of satisfaction are considerably close to the median Final Score of 37.5, making the Median having a correlation between both satisfaction categories, even if the Neutral or Dissatisfied Columns are left-skewed, it does not deviate much

urban lance Jun 8, 2022, 12:14 PM

#

I'd like to access each individual group of a pandas group by. (there are about 277k of those).
The goal is to pass each individual group though a function, what would be the most efficient way to go about this?

haughty topaz Jun 8, 2022, 12:17 PM

#

So what 277k groups which you each want to put through the same function?

urban lance Jun 8, 2022, 12:25 PM

#

(data on) users of a website @haughty topaz

rose agate Jun 8, 2022, 1:32 PM

#

urban lance I'd like to access each individual group of a pandas group by. (there are about ...

If it's a simple function then it's probably best to use the inbuilt method if it exists, or lambda. A loop would probably be slower but easier to implement

for name, groupdf in df.groupby('a'):
    print(name)
    print(groupdf)

what is the function doing?

urban lance Jun 8, 2022, 1:33 PM

#

rose agate If it's a simple function then it's probably best to use the inbuilt method if i...

Dividing users into classes based on row values, so not so simple

#

I did this:

groups.append([list(groups) for groups in grouper.groups])

This is probably not very efficient (and I guess my kernel doesn't like it)

serene scaffold Jun 8, 2022, 1:35 PM

#

urban lance I'd like to access each individual group of a pandas group by. (there are about ...

once you have the groupby, you can just iterate over it. it will give you tuples of (label, dataframe)
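
A minimal sketch of iterating over a groupby and passing each group through a function. The column names (`user_id`, `pages`) and the `classify` rule are hypothetical stand-ins for the real per-user logic.

```python
import pandas as pd

# Hypothetical data: several rows per website user.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "pages":   [3, 5, 1, 1, 2, 9],
})

def classify(group: pd.DataFrame) -> str:
    """Hypothetical rule: assign a class based on the rows of one user."""
    return "heavy" if group["pages"].sum() >= 8 else "light"

# Iterating a groupby yields (label, sub-DataFrame) tuples, one per group.
labels = {user_id: classify(group) for user_id, group in df.groupby("user_id")}
print(labels)
```

With hundreds of thousands of groups, a Python-level loop like this can be slow. If the rule can be written with built-in aggregations (for example `df.groupby("user_id")["pages"].sum()` followed by a vectorized comparison), that is usually much faster.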

urban lance Jun 8, 2022, 1:36 PM

#

serene scaffold once you have the groupby, you can just iterate over it. it will give you tuples...

I'll give it a try

plush jungle Jun 8, 2022, 2:08 PM

#

I've rewritten some deep Q learning code that was originally built to learn flappy bird (which it does perfectly after 2 million rounds)

#

I revamped it to try to train this stationary blue circle to shoot this stationary red circle

#

#

but it's been about 900,000 rounds, and it's still not learning effectively

#

the reward structure for this model is

Doing Nothing -> 0
Firing -> 0.1
Aiming up one degree -> 0.2
Aiming down one degree -> 0.2
having the laser pointed at the red dot -> 1.0
having a bullet overlapping with the red dot -> 10.0

#

it trains every turn by selecting a batch of 32 turns chosen randomly from the previous 10,000, and doing the Bellman Equation on them
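
A rough sketch of that training step: a replay buffer capped at 10,000 turns, a random batch of 32, and the Bellman targets computed for the batch. The transition layout, the `GAMMA` value, and the random array standing in for the network's output are assumptions; the linked repository may organise this differently.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.99                      # discount factor (assumed value)
N_ACTIONS = 4                     # nothing, fire, aim up, aim down
replay = deque(maxlen=10_000)     # oldest turns fall off automatically

def bellman_targets(batch, q_next):
    """Return r + gamma * max_a Q(s', a) for each transition in the batch.

    batch:  list of (state, action, reward, next_state, done) tuples
    q_next: array of shape (len(batch), N_ACTIONS) holding the network's
            Q-values for each next_state (supplied by the caller)
    """
    rewards = np.array([t[2] for t in batch], dtype=np.float64)
    done = np.array([t[4] for t in batch], dtype=bool)
    # Terminal transitions keep only the immediate reward.
    return rewards + GAMMA * q_next.max(axis=1) * ~done

# Fake transitions standing in for real game turns.
for turn in range(12_000):
    replay.append((turn, turn % N_ACTIONS, 0.1, turn + 1, turn % 50 == 49))

batch = random.sample(replay, 32)
# Stand-in for the neural net's predictions on the next states.
q_next = np.random.default_rng(0).random((len(batch), N_ACTIONS))
targets = bellman_targets(batch, q_next)
print(len(replay), targets.shape)
```

Each target only looks one step ahead, so a reward that arrives many turns after the decisive action has to be carried back through many updates before it influences that action's value.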

#

any idea why it's not training as effectively as the flappy bird model with the same code was?

rose agate Jun 8, 2022, 2:21 PM

#

plush jungle any idea why it's not training as effectively as the flappy bird model with the ...

I know nothing about this problem so I'm probably no help, but have you observed the laser ever actually point at the red dot? I thought that maybe it's just continually shooting below, and since it never sees the dot it's just randomly aiming up and down, which might not be enough to get to the point where it sees it in the first place. Maybe you can alter the reward to be +0.2 for rotating aim to the right and +0.1 to rotate aim to the left so it tends to drift to the right so it'll eventually intersect with the dot if that is the problem.

plush jungle Jun 8, 2022, 2:23 PM

#

rose agate I know nothing about this problem so I'm probably no help, but have you observed...

it has an epsilon value (that starts at .01 and is now 0.05) and every turn it generates a random number. if the number is less than epsilon, it does a random action regardless of what the neural net tells it to do
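
The exploration rule described above is usually called epsilon-greedy. A minimal sketch, assuming the network's output is available as a plain list of Q-values (the function name and the example values are hypothetical):

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    # Exploit: index of the largest predicted Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)

# Example Q-values for: nothing, fire, aim up, aim down (made-up numbers).
q_values = [0.1, 0.9, 0.3, 0.2]
greedy = choose_action(q_values, epsilon=0.0)    # never explores
explore = choose_action(q_values, epsilon=1.0)   # always explores
print(greedy, explore)
```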

#

and I've observed it both shooting the target successfully and aiming at it successfully

#

but its rate of success has barely gone up at all

rose agate Jun 8, 2022, 2:23 PM

#

Also my intuition suggests that having a reward for pointing at the dot is a bad idea because that's not the actual angle it needs but idk

plush jungle Jun 8, 2022, 2:24 PM

#

wait why

rose agate Jun 8, 2022, 2:24 PM

#

is the ball falling down in a parabola?

#

or does it go straight

plush jungle Jun 8, 2022, 2:25 PM

#

ball goes straight

rose agate Jun 8, 2022, 2:25 PM

#

oh ok I thought it was falling in a parabola, that makes sense then

plush jungle Jun 8, 2022, 2:25 PM

#

oh yeah

rose agate Jun 8, 2022, 2:27 PM

#

so if the laser points and fires a bullet at the dot it will for sure overlap with the red dot later?

#

it seems weird to have two rewards for what will be the same outcome

plush jungle Jun 8, 2022, 2:28 PM

#

I added the laser overlap reward as an attempt to get it to learn more effectively, but it doesn't seem to have worked

#

I was worried that because the bullet hits like 50 turns later, it was too far in the future to effectively teach the agent

rose agate Jun 8, 2022, 2:30 PM

#

my intuition would be to try removing the overlap reward and making the aim up reward double the aim down reward to see what happens. Definitely not my area of work though, this makes me want to look into it

plush jungle Jun 8, 2022, 2:35 PM

#

rose agate my intuition would be to try removing the overlap reward and making to aim up re...

original code is here if you want to try it

#

https://github.com/nevenp/dqn_flappy_bird

rose agate Jun 8, 2022, 2:42 PM

#

thanks

pseudo wren Jun 8, 2022, 3:42 PM

#

is anyone familiar with the loader issue in google colab

#

it's driving me nuts

#

i am trying to import packages and i keep hitting the loader issue

hollow sentinel Jun 8, 2022, 4:07 PM

#

i don't really use google colab so idk

misty flint Jun 8, 2022, 4:18 PM

#

pseudo wren i am trying to import packages and i keep hitting the loader issue

what does it look like

#

/what are you trying to do

pseudo wren Jun 8, 2022, 4:19 PM

#

misty flint what does it look like

I’m trying to import TimeSeries from darts

#

But I keep running into an error that says it’s missing a positional argument

#

I figured it out though

#

I had to download an old version of Colab

bold timber Jun 8, 2022, 4:50 PM

#

Hi, I have a problem that confuses me. In the cost column, some of the values are lists. How do I detect all of the values in the cost column that contain lists?

serene scaffold Jun 8, 2022, 4:59 PM

#

bold timber Hi, I have a problem that makes me confused. In the cost column, I have the valu...

this seems like a weird situation to have arrived at. you should probably post what the dataframe looked like earlier in the program and explain what you are trying to do

#

because now that you have numbers and lists of numbers in the same Series, that is bad.
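
If the mixed column already exists, one way to find the list-valued rows and get back to one number per row is sketched below. The `cost` values are made up, and whether flattening is the right repair depends on how the lists ended up there in the first place.

```python
import pandas as pd

# Hypothetical column mixing plain numbers and lists of numbers.
df = pd.DataFrame({"cost": [10, [20, 30], 15, [5], 10]})

# Boolean mask marking the rows whose value is a list.
is_list = df["cost"].apply(lambda v: isinstance(v, list))
rows_with_lists = df[is_list]

# explode() gives every list element its own row; scalars are left as they are.
flat = df.explode("cost")
flat["cost"] = pd.to_numeric(flat["cost"])

print(rows_with_lists)
print(flat["cost"].mode())
```

Once the column holds plain numbers again, ordinary Series methods such as `mode()` behave as expected.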

pseudo wren Jun 8, 2022, 5:10 PM

#

Hm... i'm getting an error in my model that it failed to converge

#

not totally sure how to proceed with it

#

or how to go about fixing it

#

i've been browsing stack overflow for the answer but haven't found a ton of helpful resources.

bold timber Jun 8, 2022, 5:37 PM

#

serene scaffold this seems like a weird situation to have arrived at. you should probably post w...

but how do I show the mode in that case?

jaunty sky Jun 8, 2022, 5:47 PM

#

hey folks, I am given a problem statement: predict whether a transaction might be fraud or not based on a few parameters. There are some customer ids; should I cluster them on their id rather than predicting their transactions one by one?

arctic wedgeBOT Jun 8, 2022, 5:47 PM

#

Hey @jaunty sky!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold Jun 8, 2022, 5:53 PM

#

jaunty sky hey folks, I am given a problem statement, to predict whether a transaction migh...

IDs are usually arbitrary values, whereas you want to cluster things based on non-arbitrary properties of what you're trying to represent.

#

what information are you given about each transaction?

jaunty sky Jun 8, 2022, 5:54 PM

#

https://drive.google.com/file/d/1eYY1u7Ar5sOOpDmZC-UI1VpHfqaRkVLS/view?usp=sharing

serene scaffold Jun 8, 2022, 5:54 PM

#

it will be easier for me to help if you answer the question directly. I don't want to have to wade through a pdf.

jaunty sky Jun 8, 2022, 5:54 PM

#

this is the exact task

#

so the main question is to predict whether a transaction is fraud or not

#

it's pretty simple (just the question statement)

#

but it's not given whether to cluster them or not

cerulean mauve Jun 8, 2022, 6:22 PM

#

Hello, was hoping to ask if anyone knew how to see the row by row changes that happen when using vectorized operations in pandas/numpy.

#

Without looping through row data ideally.

#

since the vectorized operation is a lot faster without needing to loop.

jaunty sky Jun 8, 2022, 6:27 PM

#

serene scaffold it will be easier for me to help if you answer the question directly. I don't wa...

sir, please help me

serene scaffold Jun 8, 2022, 6:38 PM

#

jaunty sky so the main question is to predict whether a transaction is fraud or not

What information are you given about each transaction?

jaunty sky Jun 8, 2022, 6:39 PM

#

serene scaffold What information are you given about each transaction?

predict whether it is fraud or not

serene scaffold Jun 8, 2022, 6:40 PM

#

jaunty sky predict whether it is fraud or not

That's what you're being asked to do. It doesn't answer my question

grave cloak Jun 8, 2022, 6:41 PM

#

can someone help me with pandas?

serene scaffold Jun 8, 2022, 6:41 PM

#

grave cloak can someone help me with pandas?

Don't ask to ask, just ask

grave cloak Jun 8, 2022, 6:43 PM

#

Ah, ok

#

I need to do an analysis of a championship. Each year has a file and I need to merge this data, is it possible?

serene scaffold Jun 8, 2022, 6:43 PM

#

grave cloak I need to do an analysis of a championship. Each year has a file and I need to m...

So each file is a csv?

grave cloak Jun 8, 2022, 6:43 PM

#

json and csv

serene scaffold Jun 8, 2022, 6:44 PM

#

Does the JSON have basically the same kind of data?

grave cloak Jun 8, 2022, 6:44 PM

#

Text files are as json and tables as CSV

serene scaffold Jun 8, 2022, 6:45 PM

#

So each year has a JSON and a csv?

grave cloak Jun 8, 2022, 6:46 PM

#

It's a little more complicated

#

Sorry

serene scaffold Jun 8, 2022, 6:52 PM

#

grave cloak It's a little more complicated

just drag an example json and csv into the chat

#

or something. just leaving it at "it's complicated" doesn't help me help you.

jaunty sky Jun 8, 2022, 6:56 PM

#

serene scaffold What information are you given about each transaction?

['step', 'type', 'amount', 'nameOrig', 'oldbalanceOrg', 'newbalanceOrig', 'nameDest', 'oldbalanceDest', 'newbalanceDest', 'isFraud','isFlaggedFraud']

serene scaffold Jun 8, 2022, 6:58 PM

#

jaunty sky ['step', 'type', 'amount', 'nameOrig', 'oldbalanceOrg', 'newbalanceOrig', ...

great. for all the ones that are fraud (according to isFraud), what properties do they have in common? are there "nameDest" that are more frequent for fraud values? what about the change in balance?

jaunty sky Jun 8, 2022, 6:58 PM

#


type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

amount - amount of the transaction in local currency.

nameOrig - customer who started the transaction

oldbalanceOrg - initial balance before the transaction

newbalanceOrig - new balance after the transaction

nameDest - customer who is the recipient of the transaction

oldbalanceDest - initial balance recipient before the transaction. Note that there is not information for customers that start with M (Merchants).

newbalanceDest - new balance recipient after the transaction. Note that there is not information for customers that start with M (Merchants).

isFraud - This is the transactions made by the fraudulent agents inside the simulation. In this specific dataset the fraudulent behavior of the agents aims to profit by taking control or customers accounts and try to empty the funds by transferring to another account and then cashing out of the system.

isFlaggedFraud - The business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200.000 in a single transaction.```

#

these are attributes details
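
A first look at the questions asked above (what the fraud rows have in common, and how the balance changes) could be sketched like this. The six rows are made up for illustration; only the column names and transaction types come from the description.

```python
import pandas as pd

# Tiny made-up sample using the column names from the dataset description.
tx = pd.DataFrame({
    "type": ["TRANSFER", "CASH-OUT", "PAYMENT", "TRANSFER", "PAYMENT", "CASH-OUT"],
    "amount": [9000.0, 9000.0, 120.0, 300.0, 45.0, 700.0],
    "oldbalanceOrg": [9000.0, 9000.0, 500.0, 1000.0, 45.0, 2000.0],
    "newbalanceOrig": [0.0, 0.0, 380.0, 700.0, 0.0, 1300.0],
    "isFraud": [1, 1, 0, 0, 0, 0],
})

# Share of fraudulent transactions per transaction type.
fraud_rate_by_type = tx.groupby("type")["isFraud"].mean()

# Derived feature: did the transaction empty the originating account?
tx["emptied_account"] = (tx["oldbalanceOrg"] > 0) & (tx["newbalanceOrig"] == 0)
emptied_vs_fraud = pd.crosstab(tx["emptied_account"], tx["isFraud"])

print(fraud_rate_by_type)
print(emptied_vs_fraud)
```

Features like these describe behaviour rather than identity, which is why they tend to be more useful inputs than the arbitrary `nameOrig` / `nameDest` identifiers.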

grave cloak Jun 8, 2022, 7:06 PM

#

serene scaffold just drag an example json and csv into the chat

📎 brasileirao-2021.json

serene scaffold Jun 8, 2022, 7:11 PM

#

@grave cloak this json can't be read directly as tabular data. how does it relate to the content in the CSVs?

#

there's unfortunately way too much content for me to wrap my head around what the schema of the JSON is.

grave cloak Jun 8, 2022, 7:13 PM

#

Ok, I get it. I'm talking to the author of the dataset

cerulean mauve Jun 8, 2022, 7:33 PM

#

@serene scaffold do you know of a way to get debug output when a vectorized operation is carried out in pandas?

#

Let's say I wanted to cast all my dates to datetime64 and I wanted to see what rows got operated on. Is this possible?
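
Vectorized operations do not log row by row, but one workaround is to keep the original column and compare it with the result afterwards. A sketch for the datetime case, with a hypothetical `date` column:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2022-06-08", "not a date", "2022-06-09", None]})

# errors="coerce" turns unparseable values into NaT instead of raising.
converted = pd.to_datetime(df["date"], errors="coerce")

# Rows that had a value before but could not be converted.
failed = df["date"].notna() & converted.isna()

# Side-by-side report of what each row looked like before and after.
report = pd.DataFrame({"before": df["date"], "after": converted, "failed": failed})
print(report)
```

The comparison itself is vectorized, so no explicit loop over the rows is needed.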

opaque echo Jun 8, 2022, 7:47 PM

#

Hey, so I have an np array of, say, 70K points with their x and y coordinates, with shape [70K, 2]. These points cover a region of about 5368 x 7152 units. I want to sample about 1/25th of the points, uniformly with respect to their local densities. This means removing more points from denser regions than from sparser ones, which will give me a nicely uniform point array. Any idea how I can do it super fast?
Currently, I put all those points on a 2D array of shape [5368, 7152] and slide an 11x11 kernel over it to get a rough local density estimate, by which I probabilistically sample the points.
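
One possible shortcut, sketched under assumptions: instead of rasterising the points and sliding a kernel, bin them into a coarse grid and weight each point by the inverse of its bin's count. This is a swapped-in approximation (fixed bins rather than a sliding window), and the function name, the cell size, and the synthetic test data are all hypothetical.

```python
import numpy as np

def density_balanced_indices(points, n_keep, cell=11.0, rng=None):
    """Pick n_keep row indices, favouring points that sit in sparse grid cells."""
    rng = np.random.default_rng() if rng is None else rng
    # Integer (i, j) grid cell for every point.
    ij = np.floor((points - points.min(axis=0)) / cell).astype(np.int64)
    # Collapse (i, j) into a single key and count the points per occupied cell.
    keys = ij[:, 0] * (ij[:, 1].max() + 1) + ij[:, 1]
    _, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
    # Each point's weight is 1 / (number of points sharing its cell).
    weights = 1.0 / counts[inverse]
    return rng.choice(len(points), size=n_keep, replace=False, p=weights / weights.sum())

rng = np.random.default_rng(0)
dense = rng.uniform(0, 100, size=(9_000, 2))      # crowded corner
sparse = rng.uniform(0, 1_000, size=(1_000, 2))   # thin background
points = np.vstack([dense, sparse])

idx = density_balanced_indices(points, n_keep=400, rng=rng)
sample = points[idx]
print(sample.shape, (idx < 9_000).mean())   # share of the sample from the crowded corner
```

Because the weights inside one cell sum to 1, every occupied cell is roughly equally likely to contribute a point, which thins the dense regions far more than the sparse ones.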

shy oasis Jun 8, 2022, 9:59 PM

#

Hello all. Deep learning/machine learning newbie but ex-competitive programmer here. Is there any problem based curriculum to teach myself deep learning? Like how codeforces/hackerrank has problem based structures. Step by step problems getting harder to solve and along the way require you to learn new things. Any tips welcome

misty flint Jun 8, 2022, 10:13 PM

#

have you tried fast.ai

#

not necessarily problem-based but helpful for coders

lapis sequoia Jun 8, 2022, 10:21 PM

#

are there any servers dedicated for python ai?

hasty mountain Jun 8, 2022, 10:52 PM

#

Hey guys, is it normal for a DCGAN to output images with a margin of 9 squares on them? It feels like it adds a # mask on its outputs, even if the generated images are good.

#

I don't know if it's a sign of overfitting, if it's normal or if it's another problem...

misty flint Jun 8, 2022, 10:57 PM

#

the code might be missing a post-processing step

#

if its a consistent output of a 9-square margin, you can just add a step to remove said margin

#

otherwise, youd need to investigate further, starting with maybe the training data as well as padding

hasty mountain Jun 8, 2022, 11:25 PM

#

misty flint otherwise, youd need to investigate further, starting with maybe the training da...

Strange... The thing is...the output, at the beginning of the training, seems to include no squares. However, as the generator gets better, the squares show up, gradually.

#

This is why I thought it was a sign of overfitting.

pliant sundial Jun 9, 2022, 12:03 AM

#

What are some ML/AI projects.

#

I need some inspiration to learn it.

misty flint Jun 9, 2022, 12:09 AM

#

hasty mountain Strange... The thing is...the output, at the beginning of the training, seems to...

have you checked the training data itself

#

like just look through it

#

actually

#

what model are you using

#

since maybe its been trained with data with padding

hasty mountain Jun 9, 2022, 12:10 AM

#

misty flint have you checked the training data itself

Yes, I made the training data myself. It's not a big dataset like CIFAR, but I think it'll do, right?

hasty mountain Jun 9, 2022, 12:11 AM

#

misty flint what model are you using

DCGAN

hasty mountain Jun 9, 2022, 12:11 AM

#

misty flint since maybe its been trained with data with padding

It wasn't. It was trained with Batch Normalization layers. I'm following Pytorch's DCGAN tutorial.

misty flint Jun 9, 2022, 12:16 AM

#

yeah i just looked at the original paper https://arxiv.org/pdf/1511.06434.pdf

#

looks like so

#

i have no idea then

#

we used pix2pix and cycleGAN for our GAN project

#

and never faced the issue youre facing

#

kekHands

hasty mountain Jun 9, 2022, 12:18 AM

#

grumpchib

misty flint Jun 9, 2022, 12:18 AM

#

Oopsies

#

computer vision isnt my specialty so maybe someone else has insight into whats going on

hasty mountain Jun 9, 2022, 12:19 AM

#

I actually removed the Batch Normalization now because I thought it was that that was causing this issue

#

But...now I'm seeing those squares again

misty flint Jun 9, 2022, 12:20 AM

#

looks like you found out one thing though

#

it wasnt the batch normalization step

#

kekHands

#

maybe it is just overfitting. idk why that would be a manifestation of overfitting tho

hasty mountain Jun 9, 2022, 12:21 AM

#

Yeah, I'm starting to think it's overfitting

#

The squares appeared after 400.000 epochs

misty flint Jun 9, 2022, 12:21 AM

#

bruh

#

why the heck

#

do you have so many epochs

#

kekHands

hasty mountain Jun 9, 2022, 12:22 AM

#

I thought it was normal to have that many epochs py_guido

#

At least I've read in some paper that the guys used many, many epochs

misty flint Jun 9, 2022, 12:23 AM

#

hasty mountain At least I've read in some paper that the guys used many, many epochs

not for GANs my dude

#

straight from google's intro to GANs

#

https://developers.google.com/machine-learning/gan/training

Google Developers

GAN Training | Generative Adversarial Networks | Google Develop...

hasty mountain Jun 9, 2022, 12:24 AM

#

Oh yeah, I had a problem with that in a certain model, then I had to use 235.000 epochs

misty flint Jun 9, 2022, 12:24 AM

#

For a GAN, convergence is often a fleeting, rather than stable, state.

#

kekHands

hasty mountain Jun 9, 2022, 12:24 AM

#

Perhaps my memory is playing tricks on me and it wasn't a paper for GANs, but for Reinforcement Learning

misty flint Jun 9, 2022, 12:24 AM

#

that

hasty mountain Jun 9, 2022, 12:24 AM

#

hemlock

misty flint Jun 9, 2022, 12:24 AM

#

would make much more sense

#

kekHands

hasty mountain Jun 9, 2022, 12:25 AM

#

Uh...well, then...how many epochs did you use?

#

I'm using 16x16 images. I thought 500.000 epochs were ok...until now.

misty flint Jun 9, 2022, 12:26 AM

#

i dont remember kekHands
but def not as much

hasty mountain Jun 9, 2022, 12:26 AM

#

I only noticed overfitting with 32x32

#

And it was quite clear it was overfitting with mode collapse

#

#

Can you see the squares?

#

Fun fact: I removed batch norm because it caused those squares to appear with fewer epochs, this is why I thought the batch norm was causing them.
It seems the batch norm was only making the generator able to reach its ideal point faster thinkmon

pseudo wren Jun 9, 2022, 12:41 AM

#

I have a quick question about pandas dataframe manipulation

#

i have a dataframe that i need to limit certain categories of

#

for example

#

i have a dataframe of house details

#

i want to limit the categories of bathrooms and bedrooms to only houses with 2-3 bathrooms and houses with 3-4 bedrooms

#

i'm looking through the documentation but i cannot find anything thus far

hasty mountain Jun 9, 2022, 12:45 AM

#

pseudo wren i want to limit the categories of bathrooms and bedrooms to only houses with 2-3...

Hm... it's been a while since I last worked with pandas, but you should probably check DataFrame.drop()

pseudo wren Jun 9, 2022, 12:45 AM

#

hm yeah

#

i figured i'd have to use drop

#

but i'm more so looking to get those specific parameters

#

and figure out how to drop it so that i can get those specific parameters

#

i don't know if a for loop would do me good here

hasty mountain Jun 9, 2022, 12:46 AM

#

I think you'll use something like data = data.drop("bathroom" < 2) or something like that

pseudo wren Jun 9, 2022, 12:48 AM

#

that might work

hasty mountain Jun 9, 2022, 12:48 AM

#

Probably won't, since I don't remember the command, but it's something around that

#

data = data.drop(data[data["bathroom"] < 2].index)

#

And be careful with the argument axis=0 or axis=1

pseudo wren Jun 9, 2022, 12:51 AM

#

hm

#

that's not quite it either

#

'<' not supported between instances of 'str' and 'int'

#

the error i got

hasty mountain Jun 9, 2022, 12:53 AM

#

pseudo wren ```'<' not supported between instances of 'str' and 'int'```

Check if the numbers in your bathroom column are real numbers and not strings

pseudo wren Jun 9, 2022, 12:59 AM

#

hm

#

so i came up with some sort of solution but

#

it doesn't seem to be a fix all

#

housedf = housedf[df.bedrooms != 5]
housedf = housedf[df.bathrooms!= 1]
housedf = housedf[df.bathrooms!= 4]
housedf.head()```

#

#

and yes i know it's bad code

#

but in bathrooms it doesn't register that 4.50 shouldn't be there

#

so i need to figure out how to put it in a range

brazen pike Jun 9, 2022, 1:22 AM

#

housedf = housedf[(df.bathrooms< 4)|(df.bathrooms>=5)] i think would work for that specific line of code

#

i think in general you may be able to simplify that logic just using a range of what you're looking for

#

sorry, corrected something with my suggestion, i think. & = and, | = or

pseudo wren Jun 9, 2022, 1:37 AM

#

hm i'll try it

#

for some reason it's still including values i am trying to tell it to explicitly exclude

#

housedf = housedf[(housedf.bedrooms > 2)|(housedf.bedrooms < 5)]
housedf.head()```
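
A likely reason the filter above still includes everything: with `|` (or), every number is either greater than 2 or less than 5, so the condition is true for all rows. A range needs `&` (and), or `Series.between`, which is inclusive on both ends by default. A sketch with made-up rows, reusing the `housedf`, `bedrooms` and `bathrooms` names from the messages:

```python
import pandas as pd

housedf = pd.DataFrame({
    "bedrooms":  [2, 3, 4, 5, 3, 4],
    "bathrooms": [1.0, 2.0, 2.5, 3.0, 4.5, 3.0],
})

# "> 2 OR < 5" is satisfied by every number, so nothing gets filtered out.
always_true = (housedf.bedrooms > 2) | (housedf.bedrooms < 5)

# Keep houses with 3-4 bedrooms AND 2-3 bathrooms (both bounds inclusive).
filtered = housedf[housedf.bedrooms.between(3, 4) & housedf.bathrooms.between(2, 3)]
print(filtered)
```

Because `between` compares numerically, a value such as 4.5 bathrooms is excluded without having to list it explicitly.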

#

sick fern Jun 9, 2022, 2:16 AM

#

square = np.zeros((300, 300), np.uint8)

#

hey guys, this is a piece of code that i'm not understanding

#

I get that it's making a matrix with only zeroes, but what is 'np.uint8' here?

#

because I've seen it used with np.uint32 as well and I don't understand that

serene scaffold Jun 9, 2022, 2:17 AM

#

@sick fern unsigned eight bit integer

sick fern Jun 9, 2022, 2:17 AM

#

serene scaffold @sick fern unsigned eight bit integer

could you elaborate further? Sorry, I've never dealt with numpy before

serene scaffold Jun 9, 2022, 2:18 AM

#

sick fern could you elaborate further? Sorry, I've never dealt with numpy before

You know how data is just bits? Everything on the computer is 0 or 1?

sick fern Jun 9, 2022, 2:18 AM

#

Yeah

serene scaffold Jun 9, 2022, 2:19 AM

#

sick fern Yeah

An 8 bit integer is a whole number represented using 8 bits. So you can represent 2⁸ = 256 different values

#

And it's unsigned, so there isn't a bit used to tell you if it's positive or negative. That makes the range 0 to 255.

sick fern Jun 9, 2022, 2:20 AM

#

ohh okay got it

#

but how does that relate to the piece of code?

#

like what does it do

serene scaffold Jun 9, 2022, 2:20 AM

#

You're telling numpy to create an array using that data type. Rather than a 32 bit integer. Or a 64 bit float. Or something.

sick fern Jun 9, 2022, 2:21 AM

#

so each integer in the array can go up to 2 to the power 8

#

?

serene scaffold Jun 9, 2022, 2:21 AM

#

It might be more memory efficient, but I'm not really sure.
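
A small sketch of what the dtype argument changes in practice, using the same 300×300 array from the question. The variable names other than `square` are made up for illustration.

```python
import numpy as np

square = np.zeros((300, 300), np.uint8)

# Range of values an unsigned 8-bit integer can hold
info = np.iinfo(np.uint8)
print(info.min, info.max)

# One byte per element for uint8, four bytes per element for int32
bytes_uint8 = square.nbytes
bytes_int32 = np.zeros((300, 300), np.int32).nbytes
print(bytes_uint8, bytes_int32)

# Array arithmetic wraps around instead of growing past the maximum
wrapped = np.array([250], dtype=np.uint8) + np.array([10], dtype=np.uint8)
print(wrapped)
```

So the smaller dtype does save memory, at the cost of silently wrapping when a result leaves the 0 to 255 range. uint8 is common for images because pixel intensities usually fit in that range.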

sick fern Jun 9, 2022, 2:21 AM

#

ohh okay got it

#

thanks a lott

serene scaffold Jun 9, 2022, 2:21 AM

#

You're welcome

grave cloak Jun 9, 2022, 2:23 AM

#

Can you help me unite values in Pandas?

serene scaffold Jun 9, 2022, 2:24 AM

#

grave cloak Can you help me unite values in Pandas?

Unite? In what way?

grave cloak Jun 9, 2022, 2:25 AM

#

It has the name of the clubs that appear in different rounds, there are 38 in total, I have to gather and show how many goals each team scored in total

#

But only the minute the goal was scored appears, not the goals themselves.

serene scaffold Jun 9, 2022, 2:28 AM

#

@grave cloak does the minute determine what round it was part of?

grave cloak Jun 9, 2022, 2:29 AM

#

No. It only has the id of the match, the minute in which the goal was scored, and the round

serene scaffold Jun 9, 2022, 2:32 AM

#

@grave cloak can you do a group by, like I showed you before?

grave cloak Jun 9, 2022, 2:33 AM

#

serene scaffold @grave cloak can you do a group by, like I showed you before?

It didn't work, I tried but it's too long and full of unnecessary information

#

I thought of a way to do...

#

If you took the null minutes from each game of each team and each round, and could show how many times they appeared

junior quest Jun 9, 2022, 2:56 AM

#

https://pastebin.com/axPysKLU

#

I'm getting a dimension mismatch here and insanely high loss and 0 accuracy

#

what is wrong with my implementation?

#

I don't think the dimension mismatch in the plot is relevant to the model accuracy and loss though

rose agate Jun 9, 2022, 3:41 AM

#

pseudo wren ```housedf = housedf[(housedf.bathrooms > 1)|(housedf.bathrooms < 4)] housedf = ...

You're using the '|' in this statement which is OR instead of what should be an AND, '&'

#

you could try housedf.loc[housedf.bathrooms.isin([2,3]) & housedf.bedrooms.isin([3,4])], which should be cleaner, assuming bedrooms and bathrooms are ints
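
To make the `|` versus `&` difference concrete, here is a short sketch on made-up data (the rows in `housedf` below are invented for illustration). With `|`, every number is either greater than 2 or less than 5, so nothing gets filtered out.

```python
import pandas as pd

# Hypothetical listing data
housedf = pd.DataFrame({
    "bedrooms": [2, 3, 4, 5, 3],
    "bathrooms": [1.0, 2.0, 3.5, 4.5, 4.0],
})

# OR: every row satisfies at least one side, so all rows survive
either = housedf[(housedf.bedrooms > 2) | (housedf.bedrooms < 5)]

# AND: only rows inside the range survive
both = housedf[(housedf.bedrooms > 2) & (housedf.bedrooms < 5)]

# Range filters also handle fractional values such as 3.5 or 4.5
wanted = housedf[
    housedf.bedrooms.between(3, 4)
    & (housedf.bathrooms > 1)
    & (housedf.bathrooms < 4)
]
print(wanted)
```

The range comparison avoids listing every allowed value, which matters here because bathrooms can be fractional and `isin([2, 3])` would drop 3.5.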

pseudo wren Jun 9, 2022, 3:55 AM

#

rose agate You're using the '|' in this statement which is OR instead of what should be an ...

Yeah I realized my error eventually

#

I fixed it

lofty charm Jun 9, 2022, 5:18 AM

#

trying to follow along with the code for sklearn's logistic regression (https://github.com/scikit-learn/scikit-learn/blob/80598905e517759b4696c74ecc35c6e2eb508cff/sklearn/linear_model/_logistic.py#L754) and am a bit confused about fit_intercept. where in the code is the intercept added to X?

bold timber Jun 9, 2022, 6:26 AM

#

I want to get the most frequency of year and I get an error. How fix that?

rose agate Jun 9, 2022, 7:45 AM

#

bold timber I want to get the most frequency of year and I get an error. How fix that?

try replace .mode with .agg(pd.Series.mode)
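
The original code was posted as an image, so this is only a guess at its shape: a sketch of taking the most frequent year per group. The `films` frame and its column names are hypothetical. It uses a lambda that keeps the first mode, a variation on `pd.Series.mode` that always yields a single value per group even when there are ties.

```python
import pandas as pd

# Hypothetical data: one row per film
films = pd.DataFrame({
    "genre": ["drama", "drama", "drama", "comedy", "comedy", "comedy"],
    "year": [1999, 2001, 1999, 2010, 2010, 2012],
})

# GroupBy objects have no .mode method, so pass a function to .agg instead
most_common_year = films.groupby("genre")["year"].agg(lambda s: s.mode().iloc[0])
print(most_common_year)
```

For a plain column without grouping, `films["year"].mode()` works directly.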

weary ridge Jun 9, 2022, 8:49 AM

#

given this image and the string `bluetooth address`, I have to output the corresponding value

#

#

this is the image

rose agate Jun 9, 2022, 9:57 AM

#

weary ridge given this image, and the string ```bluetooth address```i have to output the co...

I got it somewhat functional using pytesseract. The text it identifies isn't perfect though

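import difflib

import pytesseract
from PIL import Image

# Path to the Tesseract binary; replace NAMEHERE with your Windows user name
pytesseract.pytesseract.tesseract_cmd = 'C:/Users/NAMEHERE/AppData/Local/Programs/Tesseract-OCR/tesseract.exe'
# Run OCR on the image and split the recognized text into lines
text = pytesseract.image_to_string(Image.open('unknown.png'), config='--psm 3')
lines = text.split('\n')
string_to_match = 'bluetooth address'

for line in lines:
    # Fuzzy-match each recognized line against the label we are looking for
    match = difflib.get_close_matches(string_to_match, [line])
    if match:
        # Split on the first colon only, so a colon-separated value stays in one piece
        label, _, value = line.partition(':')
        print(value.strip())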

#

For 'bluetooth address'

#

You'll probably have to look into how to make it detect better

#

this is what it read

split parcel Jun 9, 2022, 10:15 AM

#

Needed some help with Python (particularly MatPlotLib)
I need to plot some data as a stem plot which is very closely spaced sometimes
Now what matplotlib does is that it just plots the two stems on top of each other
I think this won't happen if the least count is reduced along the x-axis
But I am not getting any documentation online on how to do that
Note that I don't want to change x-ticks since I don't care about the text labels; I want to change the least count which matplotlib is using to put the stems on the graph

gloomy anvil Jun 9, 2022, 10:26 AM

#

split parcel Needed some help with Python (particularly MatPlotLib) I need to plot some data ...

I dont have a perfect solution off the bat, but maybe you can do grouped bar chart. then you make the single bars very narrow (basically like a line) and then add a circle as a marker for each bar at the top.

#

so you'd basically have a bar plot that looks like a stem plot

split parcel Jun 9, 2022, 10:29 AM

#

gloomy anvil I dont have a perfect solution off the bat, but maybe you can do grouped bar ch...

Okay I'll try and see if this looks better

gloomy anvil Jun 9, 2022, 10:30 AM

#

gloomy anvil Jun 9, 2022, 10:31 AM

#

split parcel Okay I'll try and see if this looks better

so basically you'd take a bar chart like this, and style the bars with almost no width. and then simply take a marker like the circle and set it to the same data like the bars

split parcel Jun 9, 2022, 10:32 AM

#

gloomy anvil so basically you'd take a bar chart like this, and style the the bars with almos...

How do you get the markers, like which type of plot do you use to get the circle, triangle, square markers?

gloomy anvil Jun 9, 2022, 10:32 AM

#

https://matplotlib.org/stable/api/markers_api.html

#

here is an example: https://matplotlib.org/stable/gallery/lines_bars_and_markers/scatter_star_poly.html
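
A rough sketch of the thin-bar idea described above, with invented data where some x values are very close together. The bar width and marker size are arbitrary choices to tune, not recommended values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data with closely spaced x positions
x = np.array([1.00, 1.02, 1.05, 3.00, 3.01])
y = np.array([2.0, 3.5, 1.5, 4.0, 2.5])

fig, ax = plt.subplots(figsize=(8, 3))

# Very narrow bars act as the stems
ax.bar(x, y, width=0.005)

# Markers only (no connecting line) act as the stem heads
ax.plot(x, y, "o", markersize=3)
```

Calling `fig.savefig(...)` with a higher `dpi`, or widening `figsize`, gives the closely spaced stems more pixels to separate into.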

gloomy anvil Jun 9, 2022, 10:49 AM

#

I have a question myself: It is very easy to look at correlations of a dataframe in python with a heatmap

#

#

It is also very easy to plot autocorrelations with statsmodels:

plot_acf(data, lags=50)

#

#

But now, how can I find out if there is a correlation in a more complex dataset? Right now I have around 60 columns and about 3000 rows. How can I see if there is a strong correlation in the dataset between column 23 and column 54 with a 5 day lag or so? Is there some way find it automatically? Or would I need to create this manually? Or maybe is there an interactive plot where I can try different lags and different columns and see if there are strong correlations?

I feel like this might be a standard problem for which there must be some generic solution. I can't be the first one to look for signs in data, if there is some pattern which is correlated with an event a few days later?

steep bramble Jun 9, 2022, 11:11 AM

#

Hello, I'm looking for help regarding seaborn heatmap

#

I'm following a training course on this topic but I can't see how to approach it.
I have a list of products such as pasta, vegetables in a dataframe. One column is the timestamp when the product was added to the DB.
I'm asked to make a heatmap of when the products are added (crossing month from 1 to 12 and hours from 1 to 24). I'm good with x and y but I don't know how to compute the values
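
One way to get the values is to count rows per (hour, month) pair and reshape the counts into a grid. This is a sketch on invented data; the `products` frame and the `added_at` column name are assumptions, and pandas numbers hours 0 to 23 rather than 1 to 24.

```python
import pandas as pd

# Hypothetical product table with an insertion timestamp
products = pd.DataFrame({
    "name": ["pasta", "rice", "carrots", "leeks"],
    "added_at": pd.to_datetime([
        "2022-01-05 09:15", "2022-01-20 09:40",
        "2022-03-02 18:05", "2022-12-31 23:59",
    ]),
})

# Count how many products were added in each hour/month combination
counts = (
    products.assign(hour=products["added_at"].dt.hour,
                    month=products["added_at"].dt.month)
    .groupby(["hour", "month"])
    .size()
    .unstack(fill_value=0)
)

# Make sure every hour and month appears, even those with no products
counts = counts.reindex(index=range(24), columns=range(1, 13), fill_value=0)
print(counts.loc[9, 1])
```

Passing `counts` to `seaborn.heatmap` should then draw the grid, with hours on one axis and months on the other.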

rose agate Jun 9, 2022, 11:25 AM

#

gloomy anvil But now, how can I find out if there is a correlation in a more complex dataset?...

try looking at partial autocorrelation, plot_pacf, I don't know the maths behind it but it basically accounts for the correlation in previous lags built in. I think you can then get those values and test which are significant which is basically above the shaded blue area in the plot

mint garnet Jun 9, 2022, 11:39 AM

#

Hey guys, does anyone here have experience with data mapping using AI? My organization does a lot of work mapping client metadata fields to internal ones and I'm interested in automating the process.

mint garnet Jun 9, 2022, 11:43 AM

#

gloomy anvil But now, how can I find out if there is a correlation in a more complex dataset?...

Have you tried lagging your data by one period and then regressing your data on your lagged data?

somber burrow Jun 9, 2022, 11:57 AM

#

https://stackoverflow.com/questions/72559796/pandas-python-export-to-xlsx-with-multiple-sheets hi guys i have a problem with exporting multiple xlsx as sheets

Stack Overflow

Pandas,(Python) -> Export to xlsx with multiple sheets

i`m traind to read some .xlsx files from a directory that is create earlier using curent timestamp and the files are store there, now i want to read those .xlsx files and put them in only one .xlsx...

gloomy anvil Jun 9, 2022, 12:05 PM

#

rose agate try looking at partial autocorrelation, `plot_pacf`, I don't know the maths behi...

yeah, i've been using pacf plots a lot as well. the thing is just, that it is also based on the residuals of autocorrelation. So it looks at the correlation within itself of a timeseries, not at the correlation between two different timeseries in the time dimension. So that does not help with finding lagged correlations between columns.

gloomy anvil Jun 9, 2022, 12:09 PM

#

mint garnet Have you tried lagging your data by one period and then regressing your data on ...

that is what I was hoping to avoid, as I would have to create a heatmap per column and per lag. so that would be roughly 60 columns x 50 time lags x 60 columns = 180.000 heatmaps to look through.
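
Instead of one heatmap per lag, the lagged correlations can be collected into a single table and sorted. The sketch below is one possible approach, not an established tool: `lagged_correlations` is a made-up helper that correlates each column at time t with every other column at time t + lag, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd

def lagged_correlations(df, max_lag):
    """Correlate each column at time t with every other column at t + lag."""
    rows = []
    for leader in df.columns:
        for follower in df.columns:
            if leader == follower:
                continue
            for lag in range(max_lag + 1):
                # shift(-lag) puts the follower's value from t + lag on row t
                r = df[leader].corr(df[follower].shift(-lag))
                rows.append((leader, follower, lag, r))
    return pd.DataFrame(rows, columns=["leader", "follower", "lag", "corr"])

# Synthetic demo: column b repeats column a five steps later, plus noise
rng = np.random.default_rng(0)
a = pd.Series(rng.normal(size=300))
demo = pd.DataFrame({
    "a": a,
    "b": a.shift(5) + 0.1 * rng.normal(size=300),
    "c": rng.normal(size=300),
})

table = lagged_correlations(demo, max_lag=10)
best = table.loc[table["corr"].abs().idxmax()]
print(best)
```

Sorting `table` by absolute correlation surfaces the strongest column pair and lag first, so only the top few rows need a closer look. With many columns and lags, some high values will appear by chance, so the candidates still need checking.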

mint garnet Jun 9, 2022, 12:11 PM

#

gloomy anvil that is what I was hoping to avoid, as I would have to create a heatmap per colu...

Oof

gloomy anvil Jun 9, 2022, 12:13 PM

#

mint garnet Hey guys, does anyone here have experience with data mapping using AI? My organi...

It totally depends on how the data is structured and what kind of job the AI has to do. Example: having unstructured data, you could use topic modelling or name entity recognition to map this data to your internal fields. it just totally depends on what you need to do and how good the data is

mint garnet Jun 9, 2022, 12:18 PM

#

gloomy anvil It totally depends on how the data is structured and what kind of job the AI has...

It's mostly a matter of mapping whatever column head a client has used to define an industry standard field (ex. Loan origination date) to the column head we use to denote the same field. Right now we are basically doing the whole process field by field manually, which is incredibly time consuming and tedious. I am hoping to build some sort of a software solution that will match fields based on some combination of data similarity and field name similarity so that match candidates are populated automatically and we just have to validate the generated matches and fill in any fields that weren't matched manually.

gloomy anvil Jun 9, 2022, 12:21 PM

#

you could maybe use a simple way of vectorizing to calculate the distance between the two vectors (which is essentially a score of similarity). You'd have to test how well this approach works and it probably would still need manual control work, but it might be an easy and fast approach to solve this.
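
As a rough illustration of that idea, here is a sketch that turns field names into character trigram counts and scores pairs with cosine similarity. All function names and the example field names are invented, and it only looks at names, not at the data in the columns.

```python
import math
from collections import Counter

def ngram_vector(name, n=3):
    """Represent a field name as counts of its character n-grams."""
    text = "  " + name.lower().replace("_", " ") + " "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def best_matches(client_fields, internal_fields):
    """Suggest the most similar internal field for each client field."""
    internal = {f: ngram_vector(f) for f in internal_fields}
    suggestions = {}
    for field in client_fields:
        vec = ngram_vector(field)
        winner = max(internal, key=lambda f: cosine(vec, internal[f]))
        suggestions[field] = (winner, round(cosine(vec, internal[winner]), 2))
    return suggestions

# Hypothetical client and internal column heads
matches = best_matches(
    ["Loan Orig Date", "borrower_name"],
    ["loan_origination_date", "borrower_full_name", "interest_rate"],
)
print(matches)
```

The score can double as a confidence value: high-scoring suggestions go to quick validation, low-scoring ones to manual mapping.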

mint garnet Jun 9, 2022, 12:23 PM

#

gloomy anvil you could maybe use a simple way of vectorizing to calculate the distance betwee...

Alright great, could you recommend any resources that I can use to teach myself how to apply that approach? I'm relatively new to data science and still have a lot to learn.

gloomy anvil Jun 9, 2022, 12:30 PM

#

mint garnet Alright great, could you recommend any resources that I can use to teach myself ...

if you work with a lot of text data, you could start here: https://machinelearningmastery.com/natural-language-processing/

It is also a great resource for other disciplines of AI. It is also good to start with some simple statistics text book as a foundation before jumping into ML

Machine Learning Mastery

What Is Natural Language Processing?

Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software. The study of natural language processing has been around for more than 50 years and grew out of the field of linguistics with the rise of computers. In this post, you will […]

edgy glen Jun 9, 2022, 12:38 PM

#

hiho, i am trying to make a script (argparse) from the data that i analysed.
unfortunately i don't get any output and some variables are not found.
kind of stuck.. anyone down for a quick help

mint garnet Jun 9, 2022, 1:06 PM

#

gloomy anvil if you work with a lot of text data, you could start here: https://machinelearni...

Great, thanks! I studied economics in college, so I have some exposure to statistical analysis, but I could use a refresher.

steady basalt Jun 9, 2022, 2:01 PM

#

Anyone wana help me with my stats homework

serene scaffold Jun 9, 2022, 2:36 PM

#

steady basalt Anyone wana help me with my stats homework

don't ask to ask, just ask.

#

if you had asked an actual question, and it was one that I knew the answer to, I'd be answering it right now.

west tapir Jun 9, 2022, 4:19 PM

#

a question:

Lets imagine a scenario where we have to ask all the employees a set of question (mandatory btw) in which each question have a particular weight and based on their answers we have to put them in different categories aka buckets.
how can we implement this using python dynamically

wooden sail Jun 9, 2022, 4:24 PM

#

that depends on the type of question. one way, for example, is to take multiple choice questions and encode the answers with a numerical value, e.g. an integer. then you can put all of those values into a vector. you can then split up the n-dimensional space into regions based on the values. you can find intersections of half spaces by just writing out inequalities

west tapir Jun 9, 2022, 5:01 PM

#

wooden sail that depends on the type of question. one way, for example, is to take multiple ...

Let's assume there are 20 questions, single choice each, and all of the questions depict human nature. Based on that we need to decide if a person is a party head or an introvert who likes to stay home, or something else, each of which is a different output

#

Now we have to cluster the employees according to the output derived from their answers

wooden sail Jun 9, 2022, 5:19 PM

#

this sounds like something you'd do with a spreadsheet or something of the sort, nothing python specific

#

you could use a bunch of ifs or inequalities
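
A minimal sketch of the weighted-answers-plus-inequalities approach. The question ids, weights, thresholds and bucket labels are all invented; answers are assumed to be encoded as integers from 0 to 4.

```python
# Hypothetical weights per question and score thresholds per bucket
WEIGHTS = {"q1": 3, "q2": 1, "q3": 2}
BUCKETS = [(0.66, "party head"), (0.33, "in between"), (0.0, "introvert")]

def score(answers, weights=WEIGHTS, max_answer=4):
    """Weighted score scaled to the range 0..1; every question is mandatory."""
    missing = weights.keys() - answers.keys()
    if missing:
        raise ValueError(f"unanswered questions: {sorted(missing)}")
    total = sum(weights[q] * answers[q] for q in weights)
    return total / (max_answer * sum(weights.values()))

def bucket(value, buckets=BUCKETS):
    """Return the first bucket whose threshold the score reaches."""
    for threshold, label in buckets:
        if value >= threshold:
            return label
    return buckets[-1][1]

employee = {"q1": 2, "q2": 2, "q3": 2}
print(bucket(score(employee)))
```

Keeping weights and thresholds in plain data structures is what makes this "dynamic": questions or buckets can change without touching the functions.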

lapis sequoia Jun 9, 2022, 6:20 PM

#

Is it "fair", formally, if a university analyses the questions on which minority students perform worse. And deletes those questions from a test.

#

@serene scaffold

serene scaffold Jun 9, 2022, 6:23 PM

#

I HAVE BEEN SUMMONED

#

what do you want

lapis sequoia Jun 9, 2022, 6:23 PM

#

Yes Mr. Bond

#

shipit

lapis sequoia Jun 9, 2022, 6:23 PM

#

lapis sequoia Is it "fair", formally, if a university analyses the questions on which minority...

Help me with this shit

agile cobalt Jun 9, 2022, 6:24 PM

#

it seems like even the formal definition of fairness still leaves whether or not something is fair subjective, so: unclear imo

serene scaffold Jun 9, 2022, 6:24 PM

#

lapis sequoia Is it "fair", formally, if a university analyses the questions on which minority...

I'm not an ethicist. but I would wonder why minorities do worse on those questions. do the questions rely on a cultural context that isn't shared? or is it a socioeconomic issue?

lapis sequoia Jun 9, 2022, 6:25 PM

#

I feel like it is unfair. Maybe because you are forcefully trying to admit minorities. And not based on their skillset.

#

From what it appears from the prompt. It is an analytical ability test

#

Not explicitly mentioned though

serene scaffold Jun 9, 2022, 6:26 PM

#

I think this is an important issue. But I don't think it's one that our community is really intended to facilitate.

lapis sequoia Jun 9, 2022, 6:26 PM

#

#

3rd part

#

1 is definitely not true. Not sure about 3rd

#

Oh nevermind. It is not fair. I made it out from the options. Because second is true. And no option for 2 and 3

serene scaffold Jun 9, 2022, 6:31 PM

#

wrt test questions, 3 seems more compelling than 1

lapis sequoia Jun 9, 2022, 6:38 PM

#

Is it "fair" or not. We don't know. It's not fair straightforwardly. It's only when you put in some social science in it and use the bigger picture. It might look fair then. But more like unfair rn. Because you are trying to create a "biased" test.
Assuming the questions are analytical in nature. And not some qualitative ones in that case the test might be biased originally and you removed the bias.

grave cloak Jun 9, 2022, 8:37 PM

#

Guys, I'm using Pandas and when I try to see the values of different columns it returns the same value
Does anyone know how to solve?

wispy hill Jun 9, 2022, 8:44 PM

#

grave cloak Guys, I'm using Pandas and when I try to see the values of different columns i...

are you trying to output the names of the columns in your dataframe?

grave cloak Jun 9, 2022, 8:45 PM

#

Yes it was my mistake

#

Can you tell me if there is a way to add results from two groupby?

serene scaffold Jun 9, 2022, 8:47 PM

#

grave cloak Can you tell me if there is a way to add results from two groupby?

what do you mean by "add"? do you mean actually 1 + 2 or "put side by side"?

grave cloak Jun 9, 2022, 8:47 PM

#

1+2

serene scaffold Jun 9, 2022, 8:48 PM

#

grave cloak 1+2

can you show what "the result of a groupby" looks like?

grave cloak Jun 9, 2022, 8:49 PM

#

#

#

Add the results of 'mandante_placar' with 'visitante_placar'

wispy hill Jun 9, 2022, 8:52 PM

#

grave cloak Guys, I'm using Pandas and when I try to see the values of different columns i...

```
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['Col A', 'Col B', 'Col C'])

print(df.columns[0])
```

would print "Col A"

grave cloak Jun 9, 2022, 8:54 PM

#

ok i will try

serene scaffold Jun 9, 2022, 9:00 PM

#

grave cloak

if the set of labels on the left are the same for both, you can literally just add them with +

digital dew Jun 9, 2022, 9:00 PM

#

hey guys can anyone help me with some ML/NLP error please ?

#

i searched on google and didn't find a solution

grave cloak Jun 9, 2022, 9:01 PM

#

serene scaffold if the set of labels on the left are the same for both, you can literally just a...

It is yes

wispy hill Jun 9, 2022, 9:01 PM

#

anyone free to check #help-cake ? im trying to practice for exam tomorrow and im stuck on a problem

grave cloak Jun 9, 2022, 9:12 PM

#

serene scaffold if the set of labels on the left are the same for both, you can literally just a...

Did not work 🥲

serene scaffold Jun 9, 2022, 9:15 PM

#

grave cloak Did not work 🥲

saying that it "didn't work" doesn't help me to help you. what happened instead? did your computer explode?

grave cloak Jun 9, 2022, 9:17 PM

#

No hahaha, but when I did the sum it added up the score of the match. For example '4x2' became 6

serene scaffold Jun 9, 2022, 9:17 PM

#

grave cloak No hahaha, but I went to do a sum and he added the match. For example '4x2' beca...

I don't understand.

#

please show the exact code that you ran and explain what it did that was different from what you wanted.

grave cloak Jun 9, 2022, 9:19 PM

#

df1 ['total'] = df1['mandante_placar'] + df1 ['visitante_placar']

serene scaffold Jun 9, 2022, 9:19 PM

#

did you get an error message or what?

grave cloak Jun 9, 2022, 9:20 PM

#

But it's doing the sum of the match and not of each club

serene scaffold Jun 9, 2022, 9:21 PM

#

please do print(df[['mandante_placar', 'visitante_placar']].to_dict()) and put that in the paste bin

#

!paste

arctic wedgeBOT Jun 9, 2022, 9:21 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

grave cloak Jun 9, 2022, 9:23 PM

#

serene scaffold Jun 9, 2022, 9:24 PM

#

grave cloak

it has to be text in the paste bin that I can copy and paste. please do not post screenshots of text.

grave cloak Jun 9, 2022, 9:25 PM

#

The content is too big

serene scaffold Jun 9, 2022, 9:26 PM

#

grave cloak

please give the code in this screenshot as text, and the one after it as well.

grave cloak Jun 9, 2022, 9:26 PM

#

arctic wedge

Not even here saved

serene scaffold Jun 9, 2022, 9:26 PM

#

grave cloak The content is too big

that's what the paste bin is for.

#

anyway, we can forget about that for now. just give the code in those two screenshots as text. I don't want to retype what is in them

grave cloak Jun 9, 2022, 9:26 PM

#

serene scaffold that's what the paste bin is for.

grave cloak Jun 9, 2022, 9:27 PM

#

serene scaffold anyway, we can forget about that for now. just give the code in those two screen...

Ok

serene scaffold Jun 9, 2022, 9:27 PM

#

grave cloak

please do not post any more screenshots. I will not look at them. when you use the paste bin, you have to save it and give the link. this was in the instructions from the !paste command.

grave cloak Jun 9, 2022, 9:28 PM

#

Ok, one momento

serene scaffold Jun 9, 2022, 9:29 PM

#

like I said, just the two lines of code from the two screenshots. as text in the chat.

#

@grave cloak I have to leave soon

serene scaffold Jun 9, 2022, 9:35 PM

#

grave cloak

you need to add this expression

arctic wedgeBOT Jun 9, 2022, 9:35 PM

#

Hey @grave cloak!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

serene scaffold Jun 9, 2022, 9:35 PM

#

grave cloak

to this one

#

with +

grave cloak Jun 9, 2022, 9:36 PM

#

Ok, thanks a lot for the help, I'll try here

#

I tried to send the print and the result but it's too big

serene scaffold Jun 9, 2022, 9:38 PM

#

In this case, I was only asking for the code, not the displayed result

grave cloak Jun 9, 2022, 9:45 PM

#

df1.groupby(['mandante'])['mandante_placar'].sum()
df1.groupby(['visitante'])['visitante_placar'].sum()

serene scaffold Jun 9, 2022, 9:50 PM

#

df1.groupby(['mandante'])['mandante_placar'].sum() + df1.groupby(['visitante'])['visitante_placar'].sum()

#

Try that @grave cloak

grave cloak Jun 9, 2022, 9:51 PM

#

File "<ipython-input-41-205e99e99ae7>", line 1
df1.groupby(['mandante'])['mandante_placar'].sum() + df1.groupby(['visitante'])['visitante_placar'].sum()
^
SyntaxError: invalid character in identifier

#

++

#

That was the mistake I think

#

Thank you so much @serene scaffold and thanks for your patience too
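
A small sketch of adding the two groupby results on made-up match data (the club names and scores are invented; the column names follow the thread). Plain `+` aligns on the club label, and `Series.add` with `fill_value=0` covers a club that appears on only one side.

```python
import pandas as pd

# Hypothetical matches: home club, away club, and their scores
df1 = pd.DataFrame({
    "mandante": ["A", "B", "A", "C", "A"],
    "visitante": ["B", "A", "C", "A", "D"],
    "mandante_placar": [4, 1, 0, 2, 1],
    "visitante_placar": [2, 1, 3, 0, 1],
})

home = df1.groupby("mandante")["mandante_placar"].sum()
away = df1.groupby("visitante")["visitante_placar"].sum()

# With plain +, a club missing from one side ends up as NaN
plain = home + away

# fill_value=0 treats the missing side as zero goals
total = home.add(away, fill_value=0)
print(total)
```

Club D never plays at home in this data, so it is NaN in `plain` but keeps its away goals in `total`.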

pseudo wren Jun 9, 2022, 10:02 PM

#

is anyone familiar with the python library darts?

#

it's specifically used for time series modeling

limber token Jun 9, 2022, 11:32 PM

#

Hey guys, any idea how I can slice a pandas dataframe with a datetime column every X amount of days (specifically monthly and yearly)? The dataframe currently has daily info. I've tried two different approaches: filtering every 30 or 365 indexes, but that doesn't work since there are gaps in information: for example index 30 is Feb 10th, 1995, and index 60 is March 28th, 1995. I've also tried locking the relevant day/month based on the selected filter (image in annex), but again that's not the best solution since it's looking for exact days and there are gaps in the df. In some instances it goes like 8 years without finding the same exact date. Any tips on how to go about this?

serene scaffold Jun 10, 2022, 12:22 AM

#

@limber token can you set the timestamp as the index?

limber token Jun 10, 2022, 12:23 AM

#

serene scaffold @limber token can you set the timestamp as the index?

I could, but how would that help?

serene scaffold Jun 10, 2022, 12:24 AM

#

limber token I could, but how would that help?

You could use a Grouper

#

!docs pandas.Grouper

arctic wedgeBOT Jun 10, 2022, 12:24 AM

#

pandas.Grouper


```
class pandas.Grouper(*args, **kwargs)
```
A Grouper allows the user to specify a groupby instruction for an object.

This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.

If axis and/or level are passed as keywords to both Grouper and groupby, the values passed to Grouper take precedence.

limber token Jun 10, 2022, 12:25 AM

#

I'm a bit of a noob with this, how could I filter by time intervals by grouping?

serene scaffold Jun 10, 2022, 12:25 AM

#

limber token I'm a bit of a noob with this, how could I filter by time intervals by grouping?

Click the link

limber token Jun 10, 2022, 12:25 AM

#

I did

#

Didn't understand how to apply it

#

Oh

#

There's literally a freq parameter, my bad lol

limber token Jun 10, 2022, 12:40 AM

#

serene scaffold Click the link

Got a <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000026B36D96490> object, how to actually keep it as df?

misty flint Jun 10, 2022, 1:23 AM

#

serene scaffold !docs pandas.Grouper

wow you are really a pandas guru, stel

#

11/10 would follow any online content you post

#

kekHands

rose agate Jun 10, 2022, 1:34 AM

#

limber token Got a `<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000026B36D9649...

I believe you would be able to loop through each filtered df like

for name, groupdf in filtered_df:
    print(name)
    print(groupdf)

serene scaffold Jun 10, 2022, 1:54 AM

#

limber token Got a `<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000026B36D9649...

You have to do something to the groups

#

Think of a grouped DataFrame as a bag of DataFrames
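
To get a DataFrame back, each group needs to be reduced to something, for example its first row. A sketch with invented dates that have gaps, assuming a `date` column and a monthly frequency (`"MS"`, month start):

```python
import pandas as pd

# Hypothetical daily data with gaps (nothing at all in April)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1995-01-03", "1995-01-20", "1995-02-10", "1995-03-28", "1995-05-02",
    ]),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Group rows into calendar months, whatever days happen to be present
grouped = df.groupby(pd.Grouper(key="date", freq="MS"))

# Keep the first available row of each month; drop months with no data
monthly = grouped.first().dropna(how="all")
print(monthly)
```

The resulting index holds the start of each month rather than the original date of the row. Other reductions such as `.last()` or `.mean()` work the same way, and a yearly frequency alias can replace `"MS"` for yearly slices.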

serene scaffold Jun 10, 2022, 1:55 AM

#

misty flint 11/10 would follow any online content you post

I thought about starting a Twitter to post my hot takes, but that might destroy my family

misty flint Jun 10, 2022, 1:57 AM

#

serene scaffold I thought about starting a Twitter to post my hot takes, but that might destroy ...

no no no twitter is a hot mess

#

dont do it stel

#

kekHands

serene scaffold Jun 10, 2022, 2:42 AM

#

misty flint no no no twitter is a hot mess

I've never actually used it

lapis sequoia Jun 10, 2022, 3:00 AM

#

Hello, I'm trying to remove all rows with NaN values from my dataframe. I am using the following code:

df = web.DataReader('DEXUSEU', "fred", start, end)
df['SP500'] = web.DataReader('SP500', "fred", start, end)
df['Inflation'] = web.DataReader('FPCPITOTLZGUSA', 'fred', start, end)
df['Interest Rates'] = web.DataReader('INTDSRUSM193N', 'fred', start, end)

df = df.dropna()

print(df)

but when I do that I get this output:

Empty DataFrame
Columns: [DEXUSEU, SP500, Inflation, Interest Rates, M3, M2, M1, Interbank]
Index: []

any ideas on what I'm doing wrong here?
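
One possible cause, not confirmed in the thread: the series are published at different frequencies (daily, monthly, yearly), so after alignment on the date index no single row has a value in every column, and `dropna()` removes them all. The sketch below reproduces that with synthetic data instead of the FRED series.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2021-01-01", periods=6, freq="D")

# Daily series with no quote on the first day (e.g. a holiday)
df = pd.DataFrame({"daily": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=days)

# Yearly series: a single value dated January 1st, NaN on every other day
df["yearly"] = pd.Series([1.5], index=pd.to_datetime(["2021-01-01"]))

# Every row is missing at least one column, so nothing survives
strict = df.dropna()

# Carry the last known value forward first, then drop what is still missing
filled = df.ffill().dropna()
print(len(strict), len(filled))
```

Whether forward filling is appropriate depends on the analysis. Resampling everything to the coarsest frequency, or `dropna(subset=[...])` on selected columns, are alternatives.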

grave cloak Jun 10, 2022, 3:15 AM

#

df1.rename(columns={'mandante': 'clube'},inplace=True)
df1_clubegols = df1.groupby(['clube'])['mandante_placar'].sum() ++ df1.groupby(['visitante'])['visitante_placar'].sum()
print(df1_clubegols.to_markdown())

#

When I run this it is like 'club' and '0' the columns, does anyone know how I can rename this '0' and generate a graph with the values
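
The '0' most likely appears because the two summed Series have different names, so the result has no name. A sketch of naming it explicitly, using invented totals in place of the real groupby results (`gols` is a made-up column name):

```python
import pandas as pd

# Stand-ins for the two groupby results
home = pd.Series([5, 1, 2], index=pd.Index(["A", "B", "C"], name="clube"),
                 name="mandante_placar")
away = pd.Series([1, 2, 3], index=pd.Index(["A", "B", "C"], name="visitante"),
                 name="visitante_placar")

# The names differ, so the sum is an unnamed Series (shown as column 0)
total = home + away

# Give the values and the index a name, then turn it into a two-column table
goals = total.rename("gols").rename_axis("clube").reset_index()
print(goals)
```

From there, something like `goals.plot.bar(x="clube", y="gols")` should produce the chart.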

plush jungle Jun 10, 2022, 5:10 AM

#

I'm training a deep Q learning AI to play a top down shooter of my own design

#

#

and I just learned that the deep Q learning algorithm is guaranteed to converge

#

but there's no guarantee it'll happen quickly

#

at 1.1 Million moves, its loss is still extremely high and it can't hit the stationary target to save its life

#

but I just watched a video on reinforcement learning for a boxing game in unity

#

and they could barely stand until 250 million cycles

#

so I'm wondering if I'm just being impatient, and in a couple hundred million moves it'll converge after all

#

but currently it's training at a rate of about 100k turns an hour

#

so it'll take about 104 days of continuous training for it to get there

#

but the thing is, it's barely using my gpus

#

#

and I've got a 3080

#

so it's pretty powerful

wooden sail Jun 10, 2022, 5:16 AM

#

taking millions of cycles sounds about right. as to how long it takes, it depends on how many parameters there are and how difficult it is to compute the maximization/prediction per step

#

what are you using for this? pytorch? tensorflow?

plush jungle Jun 10, 2022, 5:16 AM

#

pytorch

#

it uses the gpu, but in between every turn it has to run pygame code

#

I'd like to make full use of the gpu

wooden sail Jun 10, 2022, 5:17 AM

#

are you batching things up?

plush jungle Jun 10, 2022, 5:17 AM

#

yes, but I'm using the batch size of the code I skeletonized, 32

#

out of the last 10k turns

wooden sail Jun 10, 2022, 5:17 AM

#

you can probably increase that a lot more

plush jungle Jun 10, 2022, 5:17 AM

#

if I dramatically increase batch size, would my gpu run them concurrently?

wooden sail Jun 10, 2022, 5:18 AM

#

that should be the case

plush jungle Jun 10, 2022, 5:18 AM

#

let me test that theory

wooden sail Jun 10, 2022, 5:18 AM

#

also, are you generating all quantities on gpu? idk if pytorch allows you to directly create variables on gpu

#

moving them from cpu to gpu and back is super slow

plush jungle Jun 10, 2022, 5:20 AM

#

```
minibatch = random.sample(replay_memory, min(len(replay_memory), model.minibatch_size))

# unpack minibatch
state_batch = torch.cat(tuple(d[0] for d in minibatch))
action_batch = torch.cat(tuple(d[1] for d in minibatch))
reward_batch = torch.cat(tuple(d[2] for d in minibatch))
state_1_batch = torch.cat(tuple(d[3] for d in minibatch))

if torch.cuda.is_available():  # put on GPU if CUDA is available
    state_batch = state_batch.cuda()
    action_batch = action_batch.cuda()
    reward_batch = reward_batch.cuda()
    state_1_batch = state_1_batch.cuda()

# get output for the next state
output_1_batch = model(state_1_batch)

# set y_j to r_j for terminal state, otherwise to r_j + gamma*max(Q)
y_batch = torch.cat(tuple(reward_batch[i] if minibatch[i][4]
                          else reward_batch[i] + model.gamma * torch.max(output_1_batch[i])
                          for i in range(len(minibatch))))

# extract Q-value
q_value = torch.sum(model(state_batch) * action_batch, dim=1)

# PyTorch accumulates gradients by default, so they need to be reset in each pass
optimizer.zero_grad()

# returns a new Tensor, detached from the current graph, the result will never require gradient
y_batch = y_batch.detach()

# calculate loss
loss = criterion(q_value, y_batch)

# do backward pass
loss.backward()
optimizer.step()
```

#

this is the training code

#

my assumption is that the .cuda() will parallelize it

wooden sail Jun 10, 2022, 5:21 AM

#

where's the .cuda in there

plush jungle Jun 10, 2022, 5:21 AM

#

```
if torch.cuda.is_available():  # put on GPU if CUDA is available
    state_batch = state_batch.cuda()
```

wooden sail Jun 10, 2022, 5:21 AM

#

ah i missed it

#

that's very slow, i think there's a way to create them directly on gpu. but try increasing the batch size first
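PyTorch does allow tensors to be created directly on the GPU by passing `device=` to the factory functions. A minimal sketch, with made-up shapes and variable names, that falls back to the CPU when CUDA is not available:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Created on `device` from the start: no host-to-device copy afterwards.
rewards = torch.zeros(32, device=device)
noise = torch.randn(32, 4, device=device)

# Data that already lives on the CPU (e.g. a sampled replay batch) can be
# stacked once and moved in a single transfer instead of tensor by tensor.
cpu_states = [torch.randn(4) for _ in range(32)]
state_batch = torch.stack(cpu_states).to(device)

print(rewards.device, noise.device, state_batch.shape)
```

Note that `.cuda()` or `.to(device)` only moves data; it does not by itself parallelize Python-level loops over the batch.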

plush jungle Jun 10, 2022, 5:22 AM

#

what frustrates me is that the flappy bird model this code was originally for converged after 2 million rounds

#

and mine is at a million and hasn't even improved

#

is aiming and firing from a stationary location to a stationary location really more complex than flappy bird?

wooden sail Jun 10, 2022, 5:24 AM

#

i couldn't say

plush jungle Jun 10, 2022, 5:30 AM

#

ok yeah that definitely changed it

#

it was at 320 batch size before, not 32

#

so I upped it to 5000

#

and now the GPU spikes pretty seriously every time it back propagates

iron basalt Jun 10, 2022, 5:38 AM

#

plush jungle I'm training a deep Q learning AI to play a top down shooter of my own design

Do a different problem first to make sure it's not just a bug.

plush jungle Jun 10, 2022, 5:39 AM

#

you mean a different top down problem?

iron basalt Jun 10, 2022, 5:39 AM

#

Any RL problem, like a maze.

plush jungle Jun 10, 2022, 5:39 AM

#

cause I tried the flappy bird model before I started this, and my model trained pretty quickly

iron basalt Jun 10, 2022, 5:39 AM

#

Do a bunch and see where it fails or not.

plush jungle Jun 10, 2022, 5:40 AM

#

ok

iron basalt Jun 10, 2022, 5:40 AM

#

Like gym.

plush jungle Jun 10, 2022, 5:40 AM

#

a maze seems like a good idea if this batch size thing doesn't work

#

maybe I'll do a driving one too

#

since that's basically a top-down flappy bird

iron basalt Jun 10, 2022, 5:42 AM

#

After you have a bunch of different ones played or failed, make a matrix of features of each and try to find out what it struggles with (assuming it's not just a bug).

plush jungle Jun 10, 2022, 5:42 AM

#

features? how do I compare different games?

#

if it succeeds on a maze and a driving game, what features would be different?

#

from a top down shooter

iron basalt Jun 10, 2022, 5:43 AM

#

You kind of have to guess with it. But you can make an educated guess.

plush jungle Jun 10, 2022, 5:43 AM

#

yeah any information is valuable

iron basalt Jun 10, 2022, 5:43 AM

#

Action space, turn based vs not, grid based vs not, delayed rewards, sparse rewards, etc.
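One way to lay out such a feature matrix, as a rough sketch: the feature names follow the list above, the games are the ones mentioned in the chat, and every True/False value is a placeholder guess to be replaced after actually inspecting each game.

```python
# Columns of the matrix: candidate features that might explain failures.
features = ["large_action_space", "turn_based", "grid_based",
            "delayed_rewards", "sparse_rewards"]

# Placeholder guesses only; missing features default to False.
games = {
    "maze":             {"turn_based": True, "grid_based": True, "sparse_rewards": True},
    "flappy_bird":      {},
    "top_down_shooter": {"large_action_space": True, "delayed_rewards": True},
}

# Outcome of training on each game: True, False, or None if not tried yet.
learned = {"maze": None, "flappy_bird": True, "top_down_shooter": False}


def feature_row(name):
    """Return the full 0/1 row for one game, defaulting missing features to 0."""
    return [int(games[name].get(f, False)) for f in features]


matrix = {name: feature_row(name) for name in games}

# Features present in every failing game and absent from every solved one
# are the first candidates for a hypothesis.
failed = [g for g, ok in learned.items() if ok is False]
solved = [g for g, ok in learned.items() if ok is True]
suspects = [
    f for i, f in enumerate(features)
    if all(matrix[g][i] for g in failed) and not any(matrix[g][i] for g in solved)
]
print(suspects)
```

With more games filled in, the same comparison narrows down which feature to isolate in a purpose-built test game.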

plush jungle Jun 10, 2022, 5:43 AM

#

yeah ok

iron basalt Jun 10, 2022, 5:46 AM

#

Then make a hypothesis, etc, etc (do science, data science).

#

(You can make specific games with specific features)

#

*Also RL is hard/unsolved, it not working without a lot of effort (or just A LOT of compute) is expected.

#

**What is really annoying is that you will often not know if it's just a bug or not. I have had times where removing a bug made it worse (AI/ML is kind of special in this way / that this can happen).

plush jungle Jun 10, 2022, 5:51 AM

#

I guess I'm spoiled then with the flappy bird model being tuned just right

#

and that is annoying that you can't really know what the problem is

iron basalt Jun 10, 2022, 5:55 AM

#

***Some papers out there sometimes make me wonder if their implementation was bugged because when reproducing it I did not get any good results, and tried pretty hard to find any bugs. This is a huge problem with not releasing some source code. There is no reproduction (which is needed for it to be science).

plush jungle Jun 10, 2022, 5:58 AM

#

what exactly do you define as a bug?

#

if a model is valid but only with the hyperparameters tuned just right, is that a bug?

iron basalt Jun 10, 2022, 5:59 AM

#

No, but something like writing out of bounds is.

plush jungle Jun 10, 2022, 5:59 AM

#

writing out of bounds?

iron basalt Jun 10, 2022, 5:59 AM

#

index out of bounds

plush jungle Jun 10, 2022, 5:59 AM

#

oh I see

#

which could give you results that aren't actually real

iron basalt Jun 10, 2022, 5:59 AM

#

Bugs happen all the time, and there is no way to tell that they did not just mess it up if there is no source code available.

plush jungle Jun 10, 2022, 6:00 AM

#

yeah, a paper with no code is a pretty big red flag

iron basalt Jun 10, 2022, 6:00 AM

#

The thing about ML is that a bug could actually result in better results, which is a special case in software.

#

Other stuff like traditional algorithms would probably just break (usually very noticeable).

#

Another place where this kind of thing can show up is in game development where a bug becomes a feature because during testing they found that it enhanced the game (e.g. Minecraft pistons strange buggy behavior).

#

The reason it shows up in ML is because the systems are often good at dealing with noise and are random already.

#

And adding a bit of noise in a specific way can make it better (not all the bugs add just a bit of noise, there are other types but that is one of them).

plush jungle Jun 10, 2022, 6:04 AM

#

that's a good point, I never thought about how weird that is that a bug could actually be "beneficial" in those fields

hazy gazelle Jun 10, 2022, 7:19 AM

#

yooo do u guys know how to solve this?

wooden sail Jun 10, 2022, 7:22 AM

#

looks like you want to set up some inequality based on the value of AVERAGE SALES

hazy gazelle Jun 10, 2022, 7:31 AM

#

wooden sail looks like you want to set up some inequality based on the value of AVERAGE SALE...

can you help me to solve this problem?

hazy gazelle Jun 10, 2022, 7:32 AM

#

hazy gazelle yooo do u guys know how to solve this?

fyi, this is excel file

hazy gazelle Jun 10, 2022, 7:33 AM

#

hazy gazelle yooo do u guys know how to solve this?

the deadline 2 hours left

wooden sail Jun 10, 2022, 7:33 AM

#

i can help you with the logic, but i won't do your homework for you

hazy gazelle Jun 10, 2022, 7:34 AM

#

wooden sail i can help you with the logic, but i won't do your homework for you

yea just explain, i'll do the rest

wooden sail Jun 10, 2022, 7:34 AM

#

you want some nested if else statements

#

for example if(value on the column to the left <= some number and value on the column to the left >= some other number, then 'SILVER', else ( if ... ) )
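The same nested if/else logic, sketched in Python rather than as a spreadsheet formula. The thresholds and the labels other than 'SILVER' are invented for illustration; the real cut-offs come from the assignment.

```python
def tier(average_sales: float) -> str:
    """Map an average-sales value to a tier label (thresholds are made up)."""
    if average_sales >= 10_000:
        return "GOLD"
    elif average_sales >= 5_000:   # reached only when the value is below 10,000
        return "SILVER"
    else:
        return "BRONZE"


print([tier(v) for v in (12_000, 7_500, 300)])
```

Because the branches are checked from the highest threshold down, each one only needs a single comparison instead of an explicit lower and upper bound.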

low spear Jun 10, 2022, 7:37 AM

#

#

does anyone know an effective way on how to do this?

hazy gazelle Jun 10, 2022, 7:38 AM

#

wooden sail for example if(value on the column to the left <= some number and value on the c...

can i add u on discord, to talk further?

gilded kestrel Jun 10, 2022, 8:02 AM

#

hey guys, I have a question. Let's say you want to predict house/apartment selling prices for a part of the city, but you don't have floor details, i.e. two apartments in the same building can appear as duplicates in your data but with different selling prices. Do you assume that these are different apartments, or assume that it is the same apartment sold another time and remove them to keep the observations independent?

hazy gazelle Jun 10, 2022, 8:11 AM

#

hazy gazelle yooo do u guys know how to solve this?

pls guys i still need ur help

gilded kestrel Jun 10, 2022, 8:21 AM

#

low spear

df1.merge(df2, on='Code', how='left')
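A small sketch of what that left merge does. The two frames are invented here because the original question was an image; only the `Code` key column comes from the reply above.

```python
import pandas as pd

# Hypothetical frames sharing a "Code" key column.
df1 = pd.DataFrame({"Code": ["A", "B", "C"], "qty": [1, 2, 3]})
df2 = pd.DataFrame({"Code": ["A", "B"], "name": ["apple", "banana"]})

# how="left" keeps every row of df1; codes missing from df2 get NaN.
# indicator=True adds a "_merge" column showing where each row matched.
merged = df1.merge(df2, on="Code", how="left", indicator=True)
print(merged)
```

Rows marked `left_only` in `_merge` are the codes that had no match in the second frame.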

inland zephyr Jun 10, 2022, 10:11 AM

#

hello everyone, does anyone know the best dataset to practice on for a support recommender task?

#

association or another approach such as classification or sparse matrix is welcome

grand bronze Jun 10, 2022, 10:34 AM

#

Hey everyone! I'm very new to computer vision and looking for some input on this problem I have.
I want to train a model that takes an image as input and gives the same image as output but with one or multiple added overlays. Like this:
Input:

#

#

Output:

#

It's basically adding a missing voronoi cell

#

I have identified pix2pix as a possible candidate, but I get the feeling that model is not meant to take a photo as input but more a schematic as input

#

I'm not asking anyone to guide me through the whole way to do it, just looking for a pointer to the right type of model

#

Image segmentation would perhaps also work to kind of mask the area that the cell is missing from, but I don't think that's what those models are meant for either

grand bronze Jun 10, 2022, 10:45 AM

#

grand bronze Output:

To clarify, it is known in advance what cell is missing, the model would be trained on a synthetic dataset of where these cells are supposed to be. There is only 1 'solution', it doesn't need to create new voronoi patterns

oblique agate Jun 10, 2022, 12:12 PM

#

Can we find confidence value of a prediction made by logistic regression model?

grave cloak Jun 10, 2022, 1:20 PM

#

Does anyone know how I change that '0'?

serene scaffold Jun 10, 2022, 1:41 PM

#

grave cloak Does anyone know how I change that '0'?

the second to last line creates a new DataFrame, so the thing you did in the first line doesn't count anymore.

going forward, I won't attempt to answer any questions you ask that involve a screenshot. sorry.

grave cloak Jun 10, 2022, 1:42 PM

#

serene scaffold the second to last line creates a new DataFrame, so the thing you did in the fir...

OK sorry. But I think it looks better to display the error

serene scaffold Jun 10, 2022, 1:43 PM

#

grave cloak OK sorry. But I think it looks better to display the error

unless it's not text, you can copy and paste it as text. everything in that screenshot is text.

odd meteor Jun 10, 2022, 2:19 PM

#

oblique agate Can we find confidence value of a prediction made by logistic regression model?

Yes you can. Use the `predict_proba()` method to get the predicted probability of each class (how confident your model is about the prediction it's made).
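A minimal sketch of reading a per-prediction confidence from scikit-learn's `LogisticRegression`. The synthetic dataset is only there to make the snippet runnable, and the probabilities are the model's own estimates, which are not guaranteed to be well calibrated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, just for illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:5])   # shape (5, n_classes); each row sums to 1
pred = clf.predict(X[:5])          # predicted class labels
confidence = proba.max(axis=1)     # probability assigned to the predicted class
print(pred, confidence)
```

The columns of `proba` follow the order of `clf.classes_`, so `proba.argmax(axis=1)` indexes into that array.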

oblique agate Jun 10, 2022, 2:20 PM

#

odd meteor Yes you can. You use `predict_proba()` method to find the probability of your pr...

Thanks

autumn mountain Jun 10, 2022, 3:11 PM

#

Hello, I just found this community (used to look on irc channels but seems like I am getting older)

misty flint Jun 10, 2022, 3:12 PM

#

does anyone have resources on writing aws lambda functions to call ML models for inference

#

kekHands

#

the more i work, the more i feel i know nothing

#

forever imposter syndrome

#

calling it now

#

💀

autumn mountain Jun 10, 2022, 3:14 PM

#

I wanted to know how to convert this little table:

#

#

to something having 3 columns, Year, Month, Value

serene scaffold Jun 10, 2022, 3:18 PM

#

autumn mountain

you could .stack() and then .T to transpose, I think. if you do print(df.to_dict('list')) and give the text in the chat, I can experiment.

autumn mountain Jun 10, 2022, 3:19 PM

#

serene scaffold you could `.stack()` and then `.T` to transpose, I think. if you do `print(df.to...

Cool let me give that print to you

autumn mountain Jun 10, 2022, 3:20 PM

#

serene scaffold you could `.stack()` and then `.T` to transpose, I think. if you do `print(df.to...

{2021: {1: '525.785', 2: '427.857', 3: '477.502', 4: '468.083', 5: '484.556', 6: '457.686', 7: '478.079', 8: '518.769', 9: '532.103', 10: '562.109', 11: '544.405', 12: '526.958'}, 2020: {1: '470.827', 2: '424.322', 3: '459.281', 4: '463.401', 5: '507.738', 6: '509.694', 7: '518.549', 8: '543.902', 9: '566.628', 10: '619.065', 11: '589.061', 12: '583.310'}, 2019: {1: '386.320', 2: '331.042', 3: '374.333', 4: '423.750', 5: '518.613', 6: '525.426', 7: '551.882', 8: '575.643', 9: '556.631', 10: '599.759', 11: '573.220', 12: '539.916'}, 2018: {1: '265.377', 2: '224.912', 3: '278.908', 4: '295.472', 5: '317.927', 6: '317.742', 7: '347.696', 8: '373.720', 9: '401.025', 10: '413.556', 11: '406.041', 12: '407.751'}, 2017: {1: '290.174', 2: '225.300', 3: '252.969', 4: '236.823', 5: '248.159', 6: '245.243', 7: '293.297', 8: '316.340', 9: '307.968', 10: '302.871', 11: '293.155', 12: '285.409'}, 2016: {1: '284.158', 2: '226.621', 3: '264.373', 4: '275.014', 5: '295.629', 6: '297.553', 7: '280.241', 8: '299.579', 9: '334.492', 10: '337.164', 11: '320.987', 12: '289.284'}, 2015: {1: '402.896', 2: '335.901', 3: '341.535', 4: '327.988', 5: '367.478', 6: '362.013', 7: '362.265', 8: '357.917', 9: '347.830', 10: '361.113', 11: '332.901', 12: '314.040'}}

#

(without the 'list' argument)

serene scaffold Jun 10, 2022, 3:20 PM

#

autumn mountain {2021: {1: '525.785', 2: '427.857', 3: '477.502', 4: '468.083', 5: '484.556', 6:...

why did you do something other than what I asked? tangerine_think but okay

autumn mountain Jun 10, 2022, 3:21 PM

#

serene scaffold why did you do something other than what I asked? <:tangerine_think:756526770693...

Sorry you are right:

#

{2021: ['525.785', '427.857', '477.502', '468.083', '484.556', '457.686', '478.079', '518.769', '532.103', '562.109', '544.405', '526.958'], 2020: ['470.827', '424.322', '459.281', '463.401', '507.738', '509.694', '518.549', '543.902', '566.628', '619.065', '589.061', '583.310'], 2019: ['386.320', '331.042', '374.333', '423.750', '518.613', '525.426', '551.882', '575.643', '556.631', '599.759', '573.220', '539.916'], 2018: ['265.377', '224.912', '278.908', '295.472', '317.927', '317.742', '347.696', '373.720', '401.025', '413.556', '406.041', '407.751'], 2017: ['290.174', '225.300', '252.969', '236.823', '248.159', '245.243', '293.297', '316.340', '307.968', '302.871', '293.155', '285.409'], 2016: ['284.158', '226.621', '264.373', '275.014', '295.629', '297.553', '280.241', '299.579', '334.492', '337.164', '320.987', '289.284'], 2015: ['402.896', '335.901', '341.535', '327.988', '367.478', '362.013', '362.265', '357.917', '347.830', '361.113', '332.901', '314.040']}

#

(I thought you were going to miss the month)

serene scaffold Jun 10, 2022, 3:22 PM

#

autumn mountain (I thought you were going to miss the month)

that's fine, I guess. but .unstack() will return a Series of (year, month) -> value, and .stack() will do (month, year) -> value

autumn mountain Jun 10, 2022, 3:24 PM

#

serene scaffold that's fine, I guess. but `.unstack()` will return a Series of `(year, month) ->...

How do I go from that "(month, year) -> value" to a dataframe with Month and Year columns ? sorry I am just starting on this

serene scaffold Jun 10, 2022, 3:24 PM

#

In [8]: df.unstack().to_frame().T
Out[8]:
      2021                                               ...     2015
        1        2        3        4        5        6   ...       7        8        9        10       11       12
0  525.785  427.857  477.502  468.083  484.556  457.686  ...  362.265  357.917  347.830  361.113  332.901  314.040

autumn mountain Jun 10, 2022, 3:24 PM

#

oh let me try

autumn mountain Jun 10, 2022, 3:25 PM

#

serene scaffold ```py In [8]: df.unstack().to_frame().T Out[8]: 2021 ...

I dont really want a multilevel column, but 2 distinct columns

serene scaffold Jun 10, 2022, 3:26 PM

#

each value has a year and a month. so what are these two columns going to mean?

autumn mountain Jun 10, 2022, 3:27 PM

#

Example:
Year Month Val
2021 1 427.5
2021 2 456.6
2020 12 123.45

#

for example

#

(this is because I need to join this dataframe to another based on Year and Month)

raw urchin Jun 10, 2022, 3:28 PM

#

Is this the best place to ask about webscraping?

serene scaffold Jun 10, 2022, 3:28 PM

#

raw urchin Is this the best place to ask about webscraping?

no, this is for data science. try a help channel. see #❓｜how-to-get-help

raw urchin Jun 10, 2022, 3:28 PM

#

serene scaffold no, this is for data science. try a help channel. see #❓｜how-to-get-help

TY ❤️

serene scaffold Jun 10, 2022, 3:30 PM

#

autumn mountain (this is because I need to join this dataframe to another based on Year and Mont...


In [13]: df.unstack().reset_index()
Out[13]:
    level_0  level_1        0
0      2021        1  525.785
1      2021        2  427.857
2      2021        3  477.502
3      2021        4  468.083
4      2021        5  484.556
..      ...      ...      ...
79     2015        8  357.917
80     2015        9  347.830
81     2015       10  361.113
82     2015       11  332.901
83     2015       12  314.040

[84 rows x 3 columns]

renaming the columns would be a bit of a pain.

autumn mountain Jun 10, 2022, 3:31 PM

#

cool it worked !

#

let me try to rename the cols

serene scaffold Jun 10, 2022, 3:31 PM

#


In [15]: df.columns.rename('year', inplace=True)
In [17]: df.index.rename('month', inplace=True)

In [18]: df.unstack().reset_index()
Out[18]:
    year  month        0
0   2021      1  525.785
1   2021      2  427.857
2   2021      3  477.502
3   2021      4  468.083
4   2021      5  484.556
..   ...    ...      ...
79  2015      8  357.917
80  2015      9  347.830
81  2015     10  361.113
82  2015     11  332.901
83  2015     12  314.040

[84 rows x 3 columns]
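The same reshape can also name all three columns in one chain. A sketch on a small excerpt of the table above; converting the strings with `astype(float)` assumes the dot is a decimal point, and the column names `Year`, `Month`, `Value` are taken from the request earlier in the chat.

```python
import pandas as pd

# Small excerpt of the table from the chat: columns are years, index is month.
df = pd.DataFrame(
    {2021: ["525.785", "427.857", "477.502"],
     2020: ["470.827", "424.322", "459.281"]},
    index=[1, 2, 3],
)

long_df = (
    df.astype(float)                               # the values arrive as strings
      .rename_axis(index="Month", columns="Year")  # name both axes before reshaping
      .unstack()                                   # Series indexed by (Year, Month)
      .reset_index(name="Value")                   # three flat columns
)
print(long_df)
```

The result can then be joined to another frame with `on=["Year", "Month"]`.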

autumn mountain Jun 10, 2022, 3:34 PM

#

Thank you Stelercus !

storm oasis Jun 10, 2022, 3:56 PM

#

can anyone help me and give me tips or references?
I want to build Named Entity Recognition using Conditional Random Fields, but I'm having trouble with entity labelling. Any advice on how to label data for text in the Indonesian language?

serene scaffold Jun 10, 2022, 4:03 PM

#

storm oasis anyone can help me and give me tips or reference ? i am actually wanna build Nam...

you're trying to make an NER model for more than one language?

storm oasis Jun 10, 2022, 4:18 PM

#

serene scaffold you're trying to make an NER model for more than one language?

actually for indonesian language

#

so i have done the preprocessing phase, and next is entity labelling / annotating the text with which tokens are a location, person and so on. but i don't know how to do that

serene scaffold Jun 10, 2022, 4:28 PM

#

storm oasis actually for indonesian language

so what do you mean by "multi language"?

storm oasis Jun 10, 2022, 4:31 PM

#

sorry i was wrong giving the explanation.

pseudo wren Jun 10, 2022, 5:00 PM

#

Working on a time series model with the package Darts. Having trouble returning the acf, as the error says time series has no attribute to shape

#

i can't find much on stack overflow about this error

serene scaffold Jun 10, 2022, 5:01 PM

#

pseudo wren Working on a time series model with the package Darts. Having trouble returning ...

please show the whole error from Traceback as well as the relevant code

pseudo wren Jun 10, 2022, 5:01 PM

#

will do

#

AttributeError                            Traceback (most recent call last)
<ipython-input-33-9be0d81f7eb4> in <module>
----> 1 plot_acf(train)

2 frames
/usr/local/lib/python3.7/dist-packages/statsmodels/graphics/tsaplots.py in _prepare_data_corr_plot(x, lags, zero)
     17     if lags is None:
     18         # GH 4663 - use a sensible default value
---> 19         nobs = x.shape[0]
     20         lim = min(int(np.ceil(10 * np.log10(nobs))), nobs - 1)
     21         lags = np.arange(not zero, lim + 1)

AttributeError: 'TimeSeries' object has no attribute 'shape'

#

the error

#

the code is this

serene scaffold Jun 10, 2022, 5:02 PM

#

so x, whatever that is, is not an array.

pseudo wren Jun 10, 2022, 5:02 PM

#

hm

#

i'll send the relevant code

#

```py
train, val = series.split_before(pd.Timestamp('20190101'))
train.plot(label='train')
val.plot(label='validation')
plt.legend();
# here we are splitting our model into training and validation. Everything before 2019 is training, and 2019 onwards will be the validation series
```

#

output

#

from there i attempted to plot the acf

serene scaffold Jun 10, 2022, 5:04 PM

#

pseudo wren i'll send the relevant code

Python is dynamically typed. so it might be that it's up to you to know what x is supposed to be, both in terms of what type it is and what it represents.

pseudo wren Jun 10, 2022, 5:04 PM

#

the x value is the dates

#

but i thought i had already established it

#

i imported the date time package as well so it could be read

serene scaffold Jun 10, 2022, 5:05 PM

#

pseudo wren the x value is the dates

if you do type(x), what is that?

pseudo wren Jun 10, 2022, 5:06 PM

#

serene scaffold Jun 10, 2022, 5:07 PM

#

how is it a regex flag?

#

also is that an upper case X?

pseudo wren Jun 10, 2022, 5:08 PM

#

yes it is

serene scaffold Jun 10, 2022, 5:09 PM

#

the x in your function is lower case. so this is something completely unrelated.

pseudo wren Jun 10, 2022, 5:09 PM

#

i thought so too

#

but X capital shows up

#

x lowercase does not

#

this is my first time attempting a time series model

serene scaffold Jun 10, 2022, 5:10 PM

#

you have to put print(type(x)) in the function and run it to figure out what type(x) is.

pseudo wren Jun 10, 2022, 5:10 PM

#

yeah i tried that

#

nothing

#

lowercase x is not defined

#

i'm following the documentation guide for darts though

#

on building a time series model

serene scaffold Jun 10, 2022, 5:11 PM

#

oh it's here

#

AttributeError: 'TimeSeries' object has no attribute 'shape'

pseudo wren Jun 10, 2022, 5:11 PM

#

yeah

#

that was the initial issue

#

however i've seen other people work around this

serene scaffold Jun 10, 2022, 5:11 PM

#

you said time series has no attribute to shape

#

which is not the same thing.

pseudo wren Jun 10, 2022, 5:11 PM

#

ah

#

i've seen other people work around this error though

serene scaffold Jun 10, 2022, 5:12 PM

#

what library does TimeSeries come from? if it's a sequence of some kind, you might be able to convert it to an array.

pseudo wren Jun 10, 2022, 5:12 PM

#

The Darts library

#

admittedly a library i have never used before

#

i can send the code leading up to that error

#

```py
series = TimeSeries.from_dataframe(wrc, 'Date', 'High', fill_missing_dates=True, freq='B')
train, val = series[:-36], series[-36:]
wrc.head()
```

#

wrc.shape

#


```py
model = ExponentialSmoothing()
model.fit(train)
prediction = model.predict(len(val), num_samples=1000)
```

#


```py
wrcm = Theta()
wrcm.fit(train)
pred = wrcm.predict(len(val))  # predict with the Theta model just fitted, not `model`
```

#

```py
train.plot(label='train')
val.plot(label='validation')
plt.legend();
```

sleek forum Jun 10, 2022, 5:14 PM

#

pseudo wren i'll send the relevant code

The relevant part of ur code is where u declare x

serene scaffold Jun 10, 2022, 5:15 PM

#

sleek forum The relevant part of ur code is where u declare x

it's an argument for a function that they didn't necessarily write (though maybe they did)

pseudo wren Jun 10, 2022, 5:15 PM

#

no i didn't

serene scaffold Jun 10, 2022, 5:15 PM

#

anyway, here are the docs for TimeSeries

#

https://unit8co.github.io/darts/generated_api/darts.timeseries.html

pseudo wren Jun 10, 2022, 5:15 PM

#

according to the documentation i didn't need to declare x outright

sleek forum Jun 10, 2022, 5:15 PM

#

As Stelercus said before, whatever x is is not what it should be (a list).

serene scaffold Jun 10, 2022, 5:16 PM

#

there's no "declaring" in Python. variables are just names.

where do you call the function that has x?

pseudo wren Jun 10, 2022, 5:17 PM

#

series = TimeSeries.from_dataframe(wrc, 'Date', 'High', fill_missing_dates=True, freq='B')

sleek forum Jun 10, 2022, 5:17 PM

#

serene scaffold it's an argument for a function that they didn't necessarily write (though maybe...

Yes, but if the error is in the call of this function, then he had to call the function passing the parameters to cause that

serene scaffold Jun 10, 2022, 5:18 PM

#

AttributeError                            Traceback (most recent call last)
<ipython-input-33-9be0d81f7eb4> in <module>
----> 1 plot_acf(train)

#

where is this in the code

#

whatever train is (it is probably a TimeSeries) is not a valid type for that function.
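The traceback shows statsmodels reading `x.shape`, so its `plot_acf` needs an array-like input (a NumPy array or pandas Series), not a wrapper object. The sketch below reproduces the idea with a stand-in class: `WrappedSeries` and `autocorr` are hypothetical and only mimic the situation. With Darts, the analogous step would likely be passing `train.values()` to the statsmodels function, or using the ACF plot from `darts.utils.statistics`, assuming the installed version provides them.

```python
import numpy as np


class WrappedSeries:
    """Hypothetical stand-in for a container that has no `.shape` attribute."""

    def __init__(self, data):
        self._data = list(data)

    def values(self):
        """Expose the underlying data as a NumPy array."""
        return np.asarray(self._data, dtype=float)


def autocorr(x, lag):
    """Sample autocorrelation at one lag; reads x.shape like statsmodels does."""
    n = x.shape[0]
    centered = x - x.mean()
    return float((centered[: n - lag] * centered[lag:]).sum() / (centered ** 2).sum())


train = WrappedSeries([1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0])

try:
    autocorr(train, lag=1)          # fails: the wrapper has no .shape
except AttributeError as err:
    print("wrapper:", err)

print("array:", autocorr(train.values(), lag=1))  # works on the underlying array
```

Wrapping the object in a Python list does not help, because the list still contains the wrapper rather than the numbers.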

pseudo wren Jun 10, 2022, 5:19 PM

#

train, val = series.split_before(pd.Timestamp('20190101'))

#

this is the code that it's pointing to

#

i had seen someone work around this error using the same darts module though. From what I can tell, they followed relatively the same process.

#

which is frustrating

sleek forum Jun 10, 2022, 5:20 PM

#

serene scaffold it's an argument for a function that they didn't necessarily write (though maybe...

I think I understand it now, this error could have been caused by a call to a method from one of the libs it is using.

sleek forum Jun 10, 2022, 5:33 PM

#

pseudo wren ```train, val = series.split_before(pd.Timestamp('20190101')) train.plot(label='...

So, what I understood of your code, and what may be causing the error, is that val receives a datetime object and then u call the method plot on it, but it expected a list.

#

I ask you to ignore the English mistakes, I'm not fluent

pseudo wren Jun 10, 2022, 5:34 PM

#

nah totally fine!

#

that bit of code does actually return a plot

#

but it won't return the shape of said plot

sleek forum Jun 10, 2022, 5:35 PM

#

Try to put the val in a list and call that list in the place of val.

pseudo wren Jun 10, 2022, 5:37 PM

#

#

this is the output of that

#

trying to plot it again

#

still nothing

#

same error

#

misty flint Jun 10, 2022, 5:41 PM

#

@pseudo wren this looks like an interesting library nonetheless...maybe i will check it out if i ever have to do time series stuff

#

PikaThink

pseudo wren Jun 10, 2022, 5:42 PM

#

misty flint @pseudo wren this looks like an interesting library nonetheless...maybe...

i chose it because it supposedly abstracts away a lot of the processes behind creating a timeseries like you would find in matplotlib

#

buuut so far it's just a headache

misty flint Jun 10, 2022, 5:43 PM

#

oof

#

well it still looks like itd be promising for maybe simple stuff

#

at least according to the readme

#

i like that you can create a time series object straight from a pandas df

#

so ill bookmark it

#

just in case

#

EvilKermit

pseudo wren Jun 10, 2022, 5:44 PM

#

yeah absolutely

#

i just need to figure out how to actually use it all the way

#

creating a time series directly from pandas has been insanely convenient though

misty flint Jun 10, 2022, 5:45 PM

#

if i end up using this library later on, ill let you know

#

kekHands

pseudo wren Jun 10, 2022, 5:45 PM

#

thanks! i'd be interested to see how you fare with it

limber token Jun 10, 2022, 6:52 PM

#

serene scaffold <@305021323183128577> can you set the timestamp as the index?

So I spoke to my boss and I actually can't, so I tried this:

#

But I don't really like this solution, for two reasons:

#

I need it to match the exact day first, and then if it doesn't match, try the range
It's pulling more than one day per month, since it's pulling every day that matches the range

serene scaffold Jun 10, 2022, 6:57 PM

#

limber token So I spoke to my boss and I actually can't, so I tried this:

you're also overwriting the name of the dataframe, and unless the dataframe is preserved in some other scope, that is bad

misty flint Jun 10, 2022, 6:59 PM

#

omg stel, i have done so many type conversions with this data that im pretty sure something got lost along the way

#

time to check this json i exported i guess

#

kekHands

inland drum Jun 10, 2022, 7:03 PM

#

I have to build a large dataset on employees from a couple of siloed datasets and I'm trying to engineer it so that data scientists have an easy time passing it to estimators / prediction algos.

I'm struggling because I have daily snapshots of the employees' properties like their contract, benefits, manager, department. However, I also have other data that does not update daily but fortnightly, such as survey results and satisfaction, and I think I have to somehow merge them.

Many employees do not change properties such as manager daily, but some do.

One of the goals is enabling DS to generate monthly or weekly predictions over the employees, such as churn rate or satisfaction per department per month, with the dataset whilst including the auxiliary data that comes from evaluations or surveys.

I would like to know what strategy is normally used to handle this kind of time-granularity diversity in features pertaining to the same population.

Could anyone provide an outline please?

In kaggle you rarely see data that has history snapshots of the features, so this feels like a not so common case

plush jungle Jun 10, 2022, 7:19 PM

#

yes! my Q learner finally learned to wiggle to maximize laser-on-target time

#

after 1.5 million rounds

#

today wiggling, tomorrow aiming

slender plinth Jun 10, 2022, 7:21 PM

#

inland drum I have to build a large dataset on employees from a couple of siloed datasets an...

I don't know if I understood correctly.
But wouldn't that be a case for providing all the data timestamped to your DS? It would be more like a time series; the DS would be able to analyse your data and work out what is useful or not.

If it is not helpful, let me know.

wooden sail Jun 10, 2022, 7:24 PM

#

plush jungle after 1.5 million rounds

aha, so it was the number of iterations

plush jungle Jun 10, 2022, 7:24 PM

#

seems that way

#

we'll see if it learns to aim in a million or two more rounds

inland drum Jun 10, 2022, 7:25 PM

#

slender plinth I don't know if understood correctly. Bu tha wouldn't be the case of provide al...

I believe this would just result in passing the problem to the data scientist?

Wouldn't this force them to go around the data silos to find the feature pieces and then figure out how to merge them / combine them into useful / homogeneous time ranges? We would like to avoid that.

Naturally, we still want to give them the freedom to mix and match, but at the same time it seems valuable/standard to have something like a feature store that they can query in a homogeneous manner that suits the most common training techniques.

I posted the question in help-donut in case anyone has something they'd like to say

slender plinth Jun 10, 2022, 7:29 PM

#

inland drum I believe this would just result on passing the problem to the data scientist? ...

Better, I have not much to add about it as I'm used to "receive" the data and deal by myself

inland drum Jun 10, 2022, 7:34 PM

#

@slender plinth Sounds like you are a DS?

I feel inclined to believe your experience should still be relevant as it is just a matter of who prepares the data.

So if you were to receive just the timestamped data as you suggest (keeping in mind that some data does not follow the same granularity), ie: payroll survey data is only available every 15 days whilst sentiment score has daily snapshots whilst the department is sampled in the dataset daily but may not change for years, how would you do the merge?

Keep in mind that this means you would actually receive N datasets where N is number of silos and you'd have to figure out how to merge them should you need to do that (perhaps by date and employee, but date granularity is non-homogenous)

slender plinth Jun 10, 2022, 7:39 PM

#

inland drum @slender plinth Sounds like you are a DS? I feel inclined to believe your...

Yes sir I am a DS.

So, this is my opinion and experience.
I would love to have someone parsing and organising data before it comes to me.

But! What I usually see is data coming in raw. It is up to the DS to verify and understand how the DBs should be merged and what is useful or not.

If you could talk to your DS and ask how he/she would like to receive the data, that would be best. I kind of like to receive all the raw information so I can draw my own insights from it and then merge it to do all the work (cleaning, feature engineering, etc...)

misty flint Jun 10, 2022, 7:44 PM

#

I would love to have someone parsing and organising data before it comes to me.

#

if only

#

there is super nested json where it feels like i have to grab hidden secret data

#

in order to train a model

#

it is a nasty schema too since there's somehow multiple choice questions in there

#

too many records too

#

it's like 7 levels in btw

#

kekHands

#

and then like the questions and answers are on separate levels

#

like who did this

#

kekHands

inland drum Jun 10, 2022, 7:49 PM

#

slender plinth Yes sir I am a DS. So, this is my opinion and experience. I would love to have ...

I have a meeting scheduled with him but I am trying to get a better feel for the matter to not be so lost.

I think I failed to english in my last paragraph and omitted the question after just explaining context lol

If it's not much of a bother and you have time, I'd like to hear how you would approach merging such a heterogeneous dataset. Like, don't you need to have homogeneous date ranges in order to pass that to an estimator?

slender plinth Jun 10, 2022, 9:08 PM

#

inland drum I have a meeting scheduled with him but I am trying to get a better feel for the...

Sorry for the late reply, urgent meeting..

Yes, you need to have specific keys between the data sets in order to connect them.
But you can do this, see if it makes sense:
Link all your data by user ID if available. For the data that is not daily, use a specific invalid date (1/1/1800) and let the DS know. For every dataset, create week and month columns. From your explanation I believe you can do that at least by month for all data points; if you could also add weeks, that would be good.

With that, the DS will be able to connect.

Edit: I think your primary key would be the user ID, then your secondary keys would be Date, Week, Month, Year (if that is the case).

I could only come up with that solution, not sure if is the optimum but pretty sure your DS will figure out how to handle this data.
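
A rough pandas sketch of that idea, under some assumptions: the two frames, their column names (`employee_id`, `date`, `sentiment`, `survey_score`) and the values are made up for illustration. It adds the coarser month/week key columns and also shows an as-of join, which is an alternative not mentioned above: each daily row picks up the most recent survey row for the same employee, so the granularities do not have to match.

```python
import pandas as pd

# Hypothetical silo 1: daily sentiment snapshots per employee.
daily = pd.DataFrame({
    "employee_id": [1, 1, 1, 1],
    "date": pd.to_datetime(["2022-06-01", "2022-06-10", "2022-06-16", "2022-06-20"]),
    "sentiment": [0.2, 0.5, 0.1, 0.7],
})

# Hypothetical silo 2: payroll survey, roughly every 15 days.
survey = pd.DataFrame({
    "employee_id": [1, 1],
    "date": pd.to_datetime(["2022-06-01", "2022-06-16"]),
    "survey_score": [3, 4],
})

# Coarser key columns, so the DS can also group or join by month or week.
for frame in (daily, survey):
    frame["month"] = frame["date"].dt.to_period("M")
    frame["week"] = frame["date"].dt.isocalendar().week

# As-of join: for every daily row, take the latest survey row at or before
# that date for the same employee. Both inputs must be sorted by the "on" key.
merged = pd.merge_asof(
    daily.sort_values("date"),
    survey.sort_values("date")[["employee_id", "date", "survey_score"]],
    on="date",
    by="employee_id",
    direction="backward",
)
print(merged)
```

The raw frames stay ungrouped here; nothing is summarized before the join, so different groupings remain possible afterwards.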

inland drum Jun 10, 2022, 10:16 PM

#

No need to apologize mate, thanks a bunch for taking the time to answer.

This sounds like a plan. I also spoke to them and got some clues about what structure to keep the data in at the DS level for the least amount of pain for us all. I don't think it's necessary to go into detail, but it hints to me that we need to keep the data ungrouped: different analyses will require them to do different groupings, so summarizing at the source instead of during their pre-processing may end up restricting some analyses. @slender plinth

chilly token Jun 10, 2022, 11:57 PM

#

can anyone help me a bit? I'm a beginner in programming. i want to extract data from a pdf and make graphs using matplotlib. i've already extracted the data string out of the pdf using pdfminer, but the way it is will be very hard for me to parse. i want to make something where i upload any similar pdf and it gives me graphs. if anyone can help or guide me on this, that would be appreciated

gusty frost Jun 11, 2022, 1:59 AM

#

Should I learn the math for ML/AI before I get started with programming and making projects?

serene scaffold Jun 11, 2022, 2:18 AM

#

gusty frost Should I learn the math for ML/AI before I get started with programming and maki...

AI libraries abstract a lot of the math away. but you can't really make intelligent decisions about what models or neural architectures to use unless you understand the math.

#

you should learn the basics of linear algebra, probability, and statistics. at the very least.

#

but if you feel like starting a project, and you enjoy working on it, and you learn something along the way, it doesn't necessarily matter if you don't finish it.

gusty frost Jun 11, 2022, 2:29 AM

#

So I should learn the basics of linear algebra, and statistics before actually programming?

#

Wouldn't I also have to learn some calculus?

#

@serene scaffold

serene scaffold Jun 11, 2022, 2:31 AM

#

gusty frost Wouldn't I also have to learn some calculus?

eventually, yes

serene scaffold Jun 11, 2022, 2:31 AM

#

gusty frost So I should learn the basics of linear algebra, and statistics before actually p...

you can still experiment with numpy, pandas, and sklearn if you want

gusty frost Jun 11, 2022, 2:34 AM

#

serene scaffold you can still experiment with numpy, pandas, and sklearn if you want

If I learn the basics of linear algebra and statistics I would be able to make AI/ML projects?

serene scaffold Jun 11, 2022, 2:36 AM

#

gusty frost If I learn the basics of linear algebra and statistics I would be able to make A...

there are ML algorithms that just involve statistics, so yes

gusty frost Jun 11, 2022, 2:36 AM

#

what about the programming part?

gusty frost Jun 11, 2022, 2:36 AM

#

serene scaffold there are ML algorithms that just involve statistics, so yes

what would I have to learn for programming the actual model?

serene scaffold Jun 11, 2022, 2:37 AM

#

gusty frost what would I have to learn for programming the actual model?

you can use numpy and sklearn and stuff

gusty frost Jun 11, 2022, 4:25 AM

#

Do you guys have any resources on linear algebra/statistics for ML?

tacit basin Jun 11, 2022, 4:28 AM

#

gusty frost Do you guys have any resources on linear algebra/statistics for ML?

https://mml-book.github.io/

Mathematics for Machine Learning

severe grail Jun 11, 2022, 5:42 AM

#

Hello

arctic wedgeBOT Jun 11, 2022, 5:44 AM

#

Hey @severe grail!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

severe grail Jun 11, 2022, 5:46 AM

#

gilded flame Jun 11, 2022, 5:51 AM

#

#

what is wrong with my dataframes indexing?

#

i used append and concat

#

```py
dataFrameStack = None

cursor = cnx.cursor()
cursor.execute(QUERY)
df = pd.DataFrame(cursor)
df.head()
print(df)

if not df.empty:
    if dataFrameStack is not None:
        dataFrameStack = dataFrameStack.append(df, ignore_index=True)
    else:
        dataFrameStack = df

print('\n\n\n\n\n***********************')
print(dataFrameStack)
field_names = [i[0] for i in cursor.description]
print(field_names)

xlswriter = pd.ExcelWriter('{}/{}.xls'.format(type, loc), engine='openpyxl')

if not df.empty:
    df.columns = field_names
    df.to_excel(xlswriter, index=False)
    xlswriter.save()
else:
    cnx.close()
```

#

what is wrong with the logic?

#

pd.concat([dataFrameStack,df],axis=0,ignore_index=True)

#

won't work

#

the second dataframe jump after the last columns
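
One possible cause, sketched below under assumptions: if the snippet runs once per query and `dataFrameStack` survives between runs, the first `df` gets its columns renamed to `field_names` (and the stack is the same object), while the next `df` still has the default integer column labels `0, 1, ...`. `append` and `pd.concat` align on column labels, so the second frame's values end up in extra columns to the right. The rows and names here are made up for illustration.

```python
import pandas as pd

field_names = ["id", "name"]   # stand-in for cursor.description
rows_a = [(1, "a"), (2, "b")]  # stand-in for the first query result
rows_b = [(3, "c")]            # stand-in for the second query result

named = pd.DataFrame(rows_a, columns=field_names)
unnamed = pd.DataFrame(rows_b)  # column labels default to 0, 1

# Labels differ, so concat keeps all four columns and fills the gaps with NaN.
misaligned = pd.concat([named, unnamed], ignore_index=True)

# Naming the columns before stacking keeps the values in the same columns.
aligned = pd.concat(
    [named, pd.DataFrame(rows_b, columns=field_names)],
    ignore_index=True,
)
print(misaligned)
print(aligned)
```

If this is what is happening, setting the column names when each frame is built, before it is stacked, should keep the columns lined up.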

arctic wedgeBOT Jun 11, 2022, 6:16 AM

#

@harsh spade Per Rule 6, your invite link has been removed. If you believe this was a mistake, please let staff know!

Our server rules can be found here: https://pythondiscord.com/pages/rules

fierce pine Jun 11, 2022, 9:05 AM

#

Heylo, on what topic would you guys prefer a research summary!??

fierce pine Jun 11, 2022, 9:07 AM

#

severe grail

Where you are getting problem mate?

wary plank Jun 11, 2022, 1:01 PM

#

hii anyone around? I need some help in a problem.

#

so I have a dataset with 800 features, and fun part is there is no correlation between these variables.

#

So how should i reduce the dimensionality of it?

#

i have tried using pca, feature extraction, feature selection and some other method but on test dataset highest r2 score is 1.8 😦

wooden sail Jun 11, 2022, 1:10 PM

#

if you take an SVD of the data, how do the singular values look? do you know anything about where the data comes from?

wary plank Jun 11, 2022, 1:15 PM

#

they haven't told where does that data come from but i think it's a real world data which is feature engineered

#

it's like 8 main features and rest are derived features

wooden sail Jun 11, 2022, 1:17 PM

#

when you did PCA, how did you choose how many components to keep?

#

and how many did you keep

wary plank Jun 11, 2022, 1:17 PM

#

wooden sail when you did PCA, how did you choose how many components to keep?

so i tried keeping various features starting from 50 going till 800

#

r2 score seems to be increasing with increase in features

wooden sail Jun 11, 2022, 1:19 PM

#

with r2 score you mean mean squared error?

wary plank Jun 11, 2022, 1:20 PM

#

no

#

coefficient of determination

#

explained variance

summer pebble Jun 11, 2022, 1:21 PM

#

anyone knows why my friend's and my feature importance value are different despite the fact that we are using the same dataset? 🤔

wooden sail Jun 11, 2022, 1:21 PM

#

it's a scaled mean squared error, i just checked

wary plank Jun 11, 2022, 1:21 PM

#

wooden sail it's a scaled mean squared error, i just checked

yea scaled between 0-1
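
For reference, a small sketch of the usual definition of the coefficient of determination, R² = 1 - MSE / variance of the target. The function name `r2_score_manual` is made up here. It shows that the score is at most 1 but is not bounded below by 0: a model that does worse than predicting the mean gets a negative value.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (mean squared error) / (variance of the target)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    variance = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - mse / variance

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score_manual(y, y)                            # exact predictions
baseline = r2_score_manual(y, np.full_like(y, y.mean()))   # always predict the mean
worse = r2_score_manual(y, y[::-1])                        # worse than the mean
print(perfect, baseline, worse)
```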

wooden sail Jun 11, 2022, 1:22 PM

#

you'd expect this quantity to go to 1 as you increase the number of principal components

wary plank Jun 11, 2022, 1:22 PM

#

it hasn't got above 0.3 even taking all features

wooden sail Jun 11, 2022, 1:22 PM

#

that seems wrong, since taking all features would mean you just have the original data again

wary plank Jun 11, 2022, 1:23 PM

#

and on test data it messed up big time, .9 so it's overfitting

wooden sail Jun 11, 2022, 1:23 PM

#

on training data, you mean?

wary plank Jun 11, 2022, 1:23 PM

#

wooden sail that seems wrong, since taking all features would mean you just have the origina...

yea, that's why i am trying to find ways to feature engineer some attributes from original variables

wary plank Jun 11, 2022, 1:23 PM

#

wooden sail on training data, you mean?

yea yea

wooden sail Jun 11, 2022, 1:24 PM

#

probably needs something more robust to noise. that's why i was asking how the singular values look

wary plank Jun 11, 2022, 1:24 PM

#

wait i will send you the ss

wooden sail Jun 11, 2022, 1:24 PM

#

that can give you some idea of how noisy the data set is, and whether a robust version of PCA could work better

wary plank Jun 11, 2022, 1:26 PM

#

just digging in my old code, give me 2 mins

#

#

n_components is 10

wooden sail Jun 11, 2022, 1:30 PM

#

that does already seem like a pretty decent approx

#

can you plot all of the singular values?

#

how many examples are in the training data

wary plank Jun 11, 2022, 1:32 PM

#

wooden sail how many examples are in the training data

20k rows

#

800 attributes

wooden sail Jun 11, 2022, 1:32 PM

#

20k is pretty good

#

so yeah, there should be 800 singular values and the question is how noisy they are

wary plank Jun 11, 2022, 1:33 PM

#

wary plank Jun 11, 2022, 1:33 PM

#

wooden sail so yeah, there should be 800 singular values and the question is how noisy they ...

how can i analyze the noise of 800 features?

wooden sail Jun 11, 2022, 1:33 PM

#

aha, but there you see the singular values are still quite large

wary plank Jun 11, 2022, 1:33 PM

#

wooden sail aha, but there you see the singular values are still quite large

what does it indicate?

wooden sail Jun 11, 2022, 1:36 PM

#

what i was gonna note is that, if you have your samples in a vector of size 800, and have 20k examples of these vectors, you can place them in a matrix of size 800 x 20k. then 1/20k (MM^T) is an approximation of the covariance matrix. under the assumption that there is noise that is uncorrelated with the true data or true features, 1/20k MM^T = C + N, where C is the true covariance and N is the noise covariance. for real world data, C is usually rank-deficient. noise tends to be full rank, and often/hopefully close to diagonal

#

under those conditions, the singular values of the covariance matrix are roughly the original singular values plus the noise singular values (exactly so if N is a multiple of the identity), so the overall covariance appears to be full rank. as long as the true singular values are modestly large, you will mostly see the behavior of the data. once they become small, they are dominated by the noise

#

so if there is a weird sudden change in the profile of the singular values, it often hints at moving out of the signal space and into the noise space

#

(which would, in a noise free case, just be the null space)

#

so it's a good idea to make a plot of all 800 singular values of the sample covariance
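
A minimal numpy sketch of that check, on synthetic data rather than the real set: the sizes, the rank of 5 and the noise level are all assumptions chosen for illustration. The data is low-rank signal plus small noise, and the spectrum of the sample covariance shows a sharp drop where the signal space ends.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, true_rank = 2000, 50, 5

# Synthetic stand-in: rank-5 signal plus small uncorrelated noise.
signal = rng.normal(size=(n_samples, true_rank)) @ rng.normal(size=(true_rank, n_features))
noise = 0.1 * rng.normal(size=(n_samples, n_features))
X = signal + noise

# Center the features, then take singular values of the data matrix.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)

# Eigenvalues of the sample covariance (1/n) Xc^T Xc, largest first.
cov_eigs = s**2 / n_samples

# Largest ratio between consecutive values marks the sudden change in profile.
ratios = cov_eigs[:-1] / cov_eigs[1:]
elbow = int(np.argmax(ratios)) + 1
print(elbow, cov_eigs[:8])
```

On real data one would plot `cov_eigs` (a log scale on the y axis usually helps) and look for the drop by eye instead of trusting a single ratio.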

wary plank Jun 11, 2022, 1:41 PM

#

the data has lots of nan values as well, i have dropped the features that have more than 15,000 nan values

wary plank Jun 11, 2022, 1:53 PM

#

wooden sail so it'S a good idea to make a plot of all 800 singular values of the sample cova...

the model is overfitting if i use xgboost

#

model score is 0.92 on train, on test it's showing 0.34

wooden sail Jun 11, 2022, 1:54 PM

#

what are you doing?

wary plank Jun 11, 2022, 1:54 PM

#

so i took all features and transformed using pca, and ran my model

wooden sail Jun 11, 2022, 1:54 PM

#

why?

wary plank Jun 11, 2022, 1:55 PM

#

i thought that's what you said 😦

wooden sail Jun 11, 2022, 1:55 PM

#

i never said anything about ML

#

we were looking at the data first to see if we could learn something

wary plank Jun 11, 2022, 1:55 PM

#

na na model i ran just to check

wary plank Jun 11, 2022, 1:55 PM

#

wooden sail we were looking at the data first to see if we could learn something

okay my bad.

wooden sail Jun 11, 2022, 1:56 PM

#

and what is your model doing anyway? what are you trying to get from the data

wary plank Jun 11, 2022, 1:56 PM

#

based on features i am trying to predict the score of a person given by some coach

wooden sail Jun 11, 2022, 1:56 PM

#

scores of what

#

you never mentioned any of this before so i have no idea what you're doing

wary plank Jun 11, 2022, 1:57 PM

#

okay i will tell you problem first

#

so i have 8 main features of a football player, and based on those 8 features there are other derived variables. We are trying to predict the score given by a scout on the basis of those features

#

features include, position 1, position 2 of a player, weight, age, height, team code he plays for etc etc

wooden sail Jun 11, 2022, 2:01 PM

#

all right

wary plank Jun 11, 2022, 2:01 PM

#

so nearly all the features are scaled between 0, 1

#

including height

#

some categorical features are there which i changed using one hot

#

i just need some clue or hints on how to handle these many features, as i have never worked on something like this

#

zenith panther Jun 11, 2022, 2:46 PM

#

hi, i have a question.. is image classification useful to rate images out of 5 ??

serene scaffold Jun 11, 2022, 2:50 PM

#

zenith panther hi, i have a question.. is image classification useful to rate images out of 5 ?...

what do the ratings mean?

zenith panther Jun 11, 2022, 2:52 PM

#

from the image i can tell if the house for instance is high standing

#

or not

#

the rate will be from 1 to 5 its like giving 5 stars thing

shell depot Jun 11, 2022, 2:53 PM

#

zenith panther hi, i have a question.. is image classification useful to rate images out of 5 ?...

I think yess

#

coz you already have 5 classes, so yess by classifying an image the result will be one of the five classes ofc

zenith panther Jun 11, 2022, 2:55 PM

#

shell depot coz you already have 5 classes, so yess by classifying an image the result will ...

ah i see i see thank you

shell depot Jun 11, 2022, 2:55 PM

#

wlcm

serene scaffold Jun 11, 2022, 2:58 PM

#

zenith panther from the image i can tell if the house for instance is high standing

so the goal is to decide how "good" the house is on a scale of 1 to 5?

zenith panther Jun 11, 2022, 2:58 PM

#

serene scaffold so the goal is to decide how "good" the house is on a scale of 1 to 5?

yes

serene scaffold Jun 11, 2022, 3:08 PM

#

zenith panther yes

what about the value of the home? are you trying to decide that?

zenith panther Jun 11, 2022, 3:10 PM

#

yes its like seeing the rating of the house comparing to its price

#

cause i did scraping to get the data

zenith panther Jun 11, 2022, 3:11 PM

#

zenith panther yes its like seeing the rating of the house comparing to its price

the rating of the image of the house*

west tapir Jun 11, 2022, 6:12 PM

#

Is there any algorithm which dynamically updates/eliminates a various number of output while the user is giving input to a set number of questions?

stoic adder Jun 11, 2022, 6:14 PM

#

west tapir Is there any algorithm which dynamically updates/eliminates a various number of ...

What are you trying to do??

west tapir Jun 11, 2022, 6:21 PM

#

stoic adder What are you trying to do??

I got a question from a friend,

Assume there are 20 mandatory personality questions, and each question has a specific and unique weight assigned to it. Based on the questions answered by the user there are different outputs, or let's call them buckets: for example the person is a party person, introvert, alcoholic etc.
Now what we have to do is ask all the employees of a company to fill out the form and, based on their input, put them in different buckets. We can do that in any language by just comparing the weights. My friend asked what we can do to make it dynamic, using python specifically

#

so i was thinking of an ai/ml algorithm which eliminates the output while the user inputs the form

stoic adder Jun 11, 2022, 6:26 PM

#

I think the better way would be to just compare the weights as u suggested earlier. I doubt there's an algorithm of such type. Not that I've heard of ofc.
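
A minimal sketch of the "just compare the weights" approach, kept incremental so the scores can be updated after every answer. Everything here is hypothetical: the question ids, the bucket names, the weights, and the assumption that each question carries one weight per bucket.

```python
# Hypothetical weights: one weight per (question, bucket) pair.
WEIGHTS = {
    "q1": {"party person": 2.0, "introvert": 0.0},
    "q2": {"party person": 0.0, "introvert": 3.0},
    "q3": {"party person": 1.0, "introvert": 1.5},
}

def update_scores(scores, question, answer):
    """Add one answered question to the running bucket scores."""
    for bucket, weight in WEIGHTS[question].items():
        scores[bucket] = scores.get(bucket, 0.0) + weight * answer
    return scores

def best_bucket(scores):
    """Return the bucket with the highest running score."""
    return max(scores, key=scores.get)

scores = {}
for question, answer in [("q1", 1), ("q2", 0), ("q3", 1)]:
    update_scores(scores, question, answer)
    print(question, scores, best_bucket(scores))
```

Because the scores are recomputed after each answer, the current best bucket is always available while the form is being filled in.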

#

Again I'm not sure if there really isn't any algorithm possible. U can try out some unsupervised operations to make sure tho.

west tapir Jun 11, 2022, 6:28 PM

#

hmm alright will try

#

ty!!

naive rover Jun 11, 2022, 6:29 PM

#

hi

#

I need help about data science, the chat room of the problem is #help-chocolate

#

🙂

barren wedge Jun 11, 2022, 6:36 PM

#

Does any one know how to code a game

stoic adder Jun 11, 2022, 6:42 PM

#

barren wedge Does any one know how to code a game

Any specific idea you working on??

misty flint Jun 11, 2022, 7:16 PM

#

barren wedge Does any one know how to code a game

have you checked #game-development

barren wedge Jun 11, 2022, 7:16 PM

#

I will go check..

naive rover Jun 11, 2022, 8:01 PM

#

hi

#

I need help about data science, the chat room of the problem is help-chocolate

#

I'm developing a machine learning model to identify non-payers

#

#help-chocolate

haughty topaz Jun 11, 2022, 8:11 PM

#

```py
from sklearn.cluster import KMeans
from scipy.spatial import KDTree
import webcolors
import cv2

def convert_rgb_to_names(rgb_tuple):
    # a dictionary of all the hex and their respective names in css3
    css3_db = webcolors.CSS3_HEX_TO_NAMES
    names = []
    rgb_values = []
    for colour_hex, colour_name in css3_db.items():
        names.append(colour_name)
        rgb_values.append(webcolors.hex_to_rgb(colour_hex))

    kdt_db = KDTree(rgb_values)
    distance, index = kdt_db.query(rgb_tuple)
    # This of course only returns a closest match
    return names[index]

image = cv2.imread(r"helper\data\Nike-SB-Dunk-Low-Pro-Bart-Simpson-Product.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# get n clusters of colours
image = image.reshape((image.shape[0] * image.shape[1], 3))
clt = KMeans(n_clusters = 4)
clt.fit(image)

img_4_most_frequent_colours = []
for colour in clt.cluster_centers_:
    colour = convert_rgb_to_names(colour.astype("uint8").tolist())
    img_4_most_frequent_colours.append(colour)


print(img_4_most_frequent_colours)
```

#

I'm trying to get 4 colours from a picture

#

is this an efficient way?
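
One thing worth noting about the snippet: `cluster_centers_` is not ordered by how many pixels fall in each cluster, so the list is four representative colours rather than the four most frequent in order. A small numpy sketch of ordering them by pixel count, assuming you pass in the fitted model's `cluster_centers_` and `labels_`; the helper name `centers_by_frequency` and the toy arrays are made up.

```python
import numpy as np

def centers_by_frequency(centers, labels):
    """Order cluster centers from most to least populated cluster."""
    counts = np.bincount(labels, minlength=len(centers))
    order = np.argsort(counts)[::-1]
    return centers[order], counts[order]

# Toy stand-ins for clt.cluster_centers_ and clt.labels_.
centers = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
labels = np.array([1, 1, 1, 0, 2, 2])

ordered_centers, ordered_counts = centers_by_frequency(centers, labels)
print(ordered_centers, ordered_counts)
```

As for speed, fitting on a random subset of the pixels instead of every pixel is a common shortcut, though whether it is needed depends on the image size.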

rich merlin Jun 11, 2022, 8:29 PM

#

Would someone have a moment to guide me through what I would need to do to begin this assignment?
Or perhaps a good youtube tutorial to follow for this assignment? I've only ever used pandas on jupyter notebooks

fierce pine Jun 11, 2022, 9:41 PM

#

fierce pine Heylo, on what topic would you guys prefer a research summary!??

@wooden sail :)

cerulean stream Jun 11, 2022, 10:18 PM

#

Hi, I'm having a little trouble w/ numpy
where `arr` is a 2D array with shape `(30, 30)` and `dtype=uint8`

```py
arr = np.where(
    arr < lower,
    new - diff,
    np.where(
        arr > upper,
        new + diff,
        new - color + arr,
    )
).astype(np.uint8)
```

this was my former (but slow) solution

```py
def func(element: int, new: int) -> int:
    if element < lower:
        return new - diff
    elif element > upper:
        return new + diff
    else:
        return new - color + element
# and I map func over each element within the nested array
```

it does not match the desired results at all 😔
does anyone happen to know where the process is differing? pithink

serene scaffold Jun 11, 2022, 10:43 PM

#

cerulean stream Hi, Im having a little trouble w/ numpy where arr is a 2D array with shape `(30,...

```py
out = new - color + arr
out[arr < lower] = new - diff
out[arr > upper] = new + diff
```

something like this?

#

if that doesn't help and you want to continue, please say the types of all the variables in your example (other than arr)
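
A guess at where the two versions can diverge, assuming `lower`, `upper`, `new`, `diff` and `color` are plain Python ints: the scalar function works with unbounded Python ints, while the array version works in (or is cast back to) `uint8`, so anything below 0 or above 255 wraps modulo 256 either during the arithmetic or in the final `.astype(np.uint8)`. The sketch below widens the array first and compares against the scalar logic; the parameter values are made up.

```python
import numpy as np

def remap(arr, lower, upper, new, diff, color):
    """Vectorised version that does the arithmetic in int32, not uint8."""
    wide = arr.astype(np.int32)
    return np.where(
        wide < lower,
        new - diff,
        np.where(wide > upper, new + diff, new - color + wide),
    )

def func(element, lower, upper, new, diff, color):
    """Scalar reference, same branches as the slow solution."""
    if element < lower:
        return new - diff
    elif element > upper:
        return new + diff
    return new - color + element

arr = np.array([[10, 60], [130, 250]], dtype=np.uint8)
params = dict(lower=50, upper=200, new=100, diff=10, color=120)

vectorised = remap(arr, **params)
reference = np.array([[func(int(e), **params) for e in row] for row in arr])
print(vectorised)
```

If the result has to be `uint8` again, it is worth deciding explicitly what should happen to out-of-range values (for example clipping) before casting back.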

tidal bough Jun 11, 2022, 10:45 PM

#

it looks pretty fine for me

cerulean stream Jun 11, 2022, 10:55 PM

#

serene scaffold if that doesn't help and you want to continue, please say the types of all the v...

everything is integers (except for arr obv)

cerulean stream Jun 11, 2022, 10:56 PM

#

tidal bough it looks pretty fine for me

yea I was going off of the example you provided today this morning

cerulean stream Jun 11, 2022, 10:57 PM

#

serene scaffold ```py new = arr - color + element new[arr < lower] -= diff new[arr > upper] += d...

yes, that was suggested to me initially and I tried it, but what about the else statement?
for that, this morning @ confused reptile suggested np.where

serene scaffold Jun 11, 2022, 11:10 PM

#

cerulean stream yes I was suggested that initially and tried it that but what about the else sta...

instead of an else statement, the new array is created where every element is what the else block would have created.

cerulean stream Jun 11, 2022, 11:10 PM

#

serene scaffold instead of an else statement, the `new` array is created where every element is ...

ah that is smart I did not see that I will try it, thank you

serene scaffold Jun 11, 2022, 11:11 PM

#

cerulean stream ah that is smart I did not see that I will try it, thank you

I suppose it's "smart", but it also results in wasted computation.

arctic wedgeBOT Jun 12, 2022, 12:02 AM

#

:incoming_envelope: :ok_hand: applied mute to @shell yew until <t:1654992737:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

misty flint Jun 12, 2022, 1:23 AM

#

my favorite DS youtuber https://www.youtube.com/watch?v=0ItYIoOrrUs

YouTube

Ken Jee

Can Machine Learning Fix My Baseball Swing?

What happens when Machine Learning and Baseball converge? You get a system that tells you exactly what you need to do to improve your baseball swing. How much will I improve? Watch to find out! Learn more about it here: https://www.sas.com/en_us/curiosity/battinglab.html

Special thanks to SAS for bringing me out to see their batting lab. The b...


#

i didnt even know SAS had a campus and everything

#

kekHands

#

fun fact: they apparently use python there too

#

💀

wooden sail Jun 12, 2022, 2:19 AM

#

fierce pine <@467435887236612106> :)

current flavors of explainable AI?

steel burrow Jun 12, 2022, 2:27 AM

#

I’m looking for a data analysis book. I was recommended Wes McKinney's Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, but I’ve seen bad reviews saying it’s not updated. Is there a better book than this? I need your suggestions please

tacit basin Jun 12, 2022, 3:26 AM

#

steel burrow I’m looking for a data Analysis book, I was recommended this book Wes McKinney P...

Updated version here https://wesmckinney.com/book/

Python for Data Analysis, 3E

pseudo wren Jun 12, 2022, 3:26 AM

#

What is a good package to use for the double scalar error or a good fix in general

#

I’ve heard that error is a result of too many large negative numbers

tacit basin Jun 12, 2022, 3:28 AM

#

pseudo wren What is a good package to use for the double scalar error or a good fix in gener...

What is double scalar error?

pseudo wren Jun 12, 2022, 3:29 AM

#

From what I found on stackoverflow that error occurs when there’s a very large very negative number

#

I am trying to create a predictive time series model

loud cove Jun 12, 2022, 12:33 PM

#

honestly, just create an inside function that filters and pass those values.

rich fiber Jun 12, 2022, 1:22 PM

#

Can someone guide me on AI

#

In blogs there are conflicts

#

Maybe a roadmap you can provide?

serene scaffold Jun 12, 2022, 1:29 PM

#

@rich fiber start by learning about k nearest neighbors

rich fiber Jun 12, 2022, 1:29 PM

#

I just learned the basics of python

#

Now should I learn about data analysis libraries?

#

Or algorithms first?

serene scaffold Jun 12, 2022, 1:30 PM

#

rich fiber Now should I learn about data analysis libraries?

you can experiment with numpy and pandas, I guess. but it's more useful to learn how to do a certain thing, and learn how to use the libraries as a secondary concern.

rich fiber Jun 12, 2022, 1:32 PM

#

Alright so I start by learning different ML algorithms?

serene scaffold Jun 12, 2022, 1:32 PM

#

rich fiber Alright so I start by learning different ML algorithms?

sure. make sure you already know the basics of linear algebra and stats.

rich fiber Jun 12, 2022, 1:34 PM

#

Can you kindly suggest me any resources to learning the maths required

#

I tried a book of Cambridge University and I could barely read a line without having to Google

pliant pewter Jun 12, 2022, 1:37 PM

#

Cambridge University Press publishes all kinds of textbooks at all university levels

rich fiber Jun 12, 2022, 1:38 PM

#

pliant pewter Cambridge University Press publishes all kinds of textbooks at all university le...

The book "Mathematics for Machine Learning" by Deisenroth, Faisal, and Ong

pliant pewter Jun 12, 2022, 1:39 PM

#

That sounds like it's trying to teach you the mathematics. I haven't read it, so I don't know what level it's aimed at.

rich fiber Jun 12, 2022, 1:39 PM

#

Yes that exactly what I was trying to learn alongside basics of python

#

So later on I won't have trouble learning the algorithms

#

Or dealing with large data (I hope)

rich fiber Jun 12, 2022, 1:41 PM

#

pliant pewter That sounds like it's trying to teach you the mathematics. I haven't read it, s...

Which books have you read as a beginner?

pliant pewter Jun 12, 2022, 1:43 PM

#

It sounds like you're a beginner in three different things at once: Python, mathematics, and machine learning. I don't think I have good advice for that, because I have not attempted to learn all three of these at the same time.

rich fiber Jun 12, 2022, 1:44 PM

#

I've done python

pliant pewter Jun 12, 2022, 1:44 PM

#

I would say, go get the mathematical foundation you need to understand the machine learning textbook, and then come back to machine learning later. You need at least basic statistics, probability theory, calculus, and linear algebra.

rich fiber Jun 12, 2022, 1:44 PM

#

Yes that's exactly what I was trying to do

#

Kindly give me a resource where I can learn the maths required

pliant pewter Jun 12, 2022, 1:45 PM

#

I don't have a good thing to recommend as all the resources I know are graduate level

#

But look up textbooks on those topics.

rich fiber Jun 12, 2022, 1:46 PM

#

Unfortunate although thanks for all the help :)

inner jackal Jun 12, 2022, 2:10 PM

#

hey guys i don't know if i'm in the right channel, i'm looking for good proxies for scraping with python. any good website? thanks in advance

proud skiff Jun 12, 2022, 2:33 PM

#

Hey can anyone help me out with my DL college project?
It's an Image Classification DL work
I've constructed the model, but my accuracy is always at 0.5000

#

https://cdn.discordapp.com/attachments/702850003814580307/985544729355755550/Screenshot_2022-06-12_193133.png

#

https://cdn.discordapp.com/attachments/702850003814580307/985544792152866887/Screenshot_2022-06-12_193154.png

#

https://cdn.discordapp.com/attachments/702850003814580307/985544817280974948/Screenshot_2022-06-12_193213.png

vivid jasper Jun 12, 2022, 3:11 PM

#

Hello! Does anyone have a way to save an excel file with a password that works reliably in python?

gleaming marsh Jun 12, 2022, 3:30 PM

#

Anyone w/ experience using numpy and numba together? I'm having a weird error regarding arrays and matrices #help-cupcake

lapis sequoia Jun 12, 2022, 3:40 PM

#

Hehe boi

limber token Jun 12, 2022, 5:44 PM

#

limber token Hey guys, any idea how I can slice a pandas dataframe with a datetime column eve...

Hey guys, it's me again lemon_angrysad

Is there any way I can filter a datetime column in a df by the closest timedelta?

serene scaffold Jun 12, 2022, 6:05 PM

#

limber token Hey guys, it's me again <:lemon_angrysad:817323592693841961> Is there any way ...

What exactly do you mean by filter

#

Are you sure you don't mean sort?

limber token Jun 12, 2022, 6:07 PM

#

Yes, they're already sorted, what I want is to find the next row with the closest timedelta to either 30 or 365 days

serene scaffold Jun 12, 2022, 6:08 PM

#

@limber token keep in mind that "filter" means "retain only values that satisfy a certain condition". It does not mean "select the most similar"

limber token Jun 12, 2022, 6:09 PM

#

Okay, but "sort" is not really the word here either is it? lemon_thinking

serene scaffold Jun 12, 2022, 6:09 PM

#

It's not

limber token Jun 12, 2022, 6:09 PM

#

Anywhoo

#

Any tips?

serene scaffold Jun 12, 2022, 6:11 PM

#

See if there's any "select closest" functionality built into pandas. I doubt that there is, but it's worth it to check.

Otherwise you'll have to make a new column that is the timedelta and loop through it.

#

Well, I guess you don't have to loop through it manually. Because if you have a column of timedeltas, the closest one is going to be the idxmin

limber token Jun 12, 2022, 6:12 PM

#

When searching "select closest date" I only found how to find the closest date to the initial date

limber token Jun 12, 2022, 6:12 PM

#

serene scaffold Well, I guess you don't have to loop through it manually. Because if you have a ...

How do you mean?

serene scaffold Jun 12, 2022, 6:14 PM

#

limber token How do you mean?

Do you understand what an argmin or argmax are? Idxmin is the pandas version of argmin. If you don't know argmin, read about that and come back.

If you have a column of absolute timedeltas relative to time x, the idxmin will be the index of the row closest to time x.

limber token Jun 12, 2022, 6:17 PM

#

I know what argmin and argmax are, what I meant is, I'm trying to find the closest days to a month and a year after date x, so I'm confused how idxmin would help 😺

(Sorry if I'm being confusing, not a native English speaker)

serene scaffold Jun 12, 2022, 6:19 PM

#

limber token I know what argmin and argmax are, what I meant is, I'm trying to find the close...

You can add a 13 month (one year plus one month) offset to the original date, and then do what I said, and you'll have the closest date to 13 months out.
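
A small pandas sketch of the idxmin idea, with made-up dates and a hypothetical helper `closest_label`. It uses the 30 and 365 day targets from the question; any other offset works the same way.

```python
import pandas as pd

def closest_label(dates: pd.Series, target: pd.Timestamp):
    """Return the index label of the row whose date is nearest to target."""
    return (dates - target).abs().idxmin()

df = pd.DataFrame({"date": pd.to_datetime(
    ["2022-01-01", "2022-01-28", "2022-02-02", "2022-12-30", "2023-01-05"]
)})

start = df["date"].iloc[0]
month_label = closest_label(df["date"], start + pd.Timedelta(days=30))
year_label = closest_label(df["date"], start + pd.Timedelta(days=365))

print(df.loc[[month_label, year_label]])
```

Taking the absolute value matters: without it, idxmin would pick the earliest date rather than the nearest one.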

limber token Jun 12, 2022, 6:19 PM

#

Oh of course, that makes a lot of sense

#

Thank you 🙂

serene scaffold Jun 12, 2022, 6:21 PM

#

@limber token I'm at the gym. If you're still confused in like an hour, ping me and I can give a better example

hollow sentinel Jun 12, 2022, 6:37 PM

#

generally speaking it's possible that your dataset doesn't have a problem that can be answered, right?

#

https://www.kaggle.com/datasets/azminetoushikwasi/cr7-cristiano-ronaldo-all-club-goals-stats

⚽ Cristiano Ronaldo ⭐ All Club Goals 📈📊

Date, Time, Opponent, Match - All 698/+ Club Goal stats of Cristiano Ronaldo-CR7

#

for example idk what the problem here is

#

i often have a hard time thinking of the problem at hand

haughty topaz Jun 12, 2022, 6:59 PM

#

What would be a good way to add values to the NA values in this dataframe

#

I'm trying to plot three lines but there's values missing

serene scaffold Jun 12, 2022, 7:01 PM

#

haughty topaz What would be a good way to add values to the NA values in this dataframe

why are they missing? what would they be if they weren't missing?

haughty topaz Jun 12, 2022, 7:01 PM

#

Cause the data doesn't exist

#

And I guess I'm trying to find a good estimation

serene scaffold Jun 12, 2022, 7:02 PM

#

haughty topaz Cause the data doesn't exist

and this is time series data?

haughty topaz Jun 12, 2022, 7:02 PM

#

yea

serene scaffold Jun 12, 2022, 7:03 PM

#

haughty topaz yea

you need to do "time series interpolation"

#

here's an article about it https://towardsdatascience.com/how-to-interpolate-time-series-data-in-apache-spark-and-python-pandas-part-1-pandas-cff54d76a2ea

#

you can see how they insert values that "make sense" given the known values.
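
#

A small sketch of what that looks like in pandas, assuming the data has a DatetimeIndex. The dates and values below are made up. `method="time"` weights the estimate by the actual gap between timestamps, while the default linear method treats rows as evenly spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value and unevenly spaced dates.
idx = pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"])
s = pd.Series([10.0, np.nan, 50.0], index=idx)

# Time-aware: Jan 2 is 1/4 of the way from Jan 1 to Jan 5.
filled_time = s.interpolate(method="time")

# Position-based default: the gap is treated as the midpoint between neighbours.
filled_linear = s.interpolate()
```

Which one is the better estimate depends on the data; for irregular timestamps the time-aware version is usually the more sensible starting point.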

haughty topaz Jun 12, 2022, 7:08 PM

#

ok thx

#

df1[df1["kwartaal"] == "Q1"]["month"] == "january"

#

Is it not possible to add a column on a slice like this?

#

"month" is a new column

serene scaffold Jun 12, 2022, 7:22 PM

#

haughty topaz Is it not possible to add a column on a slice like this?

use loc instead of stacked [ ][ ]

#

also, you have == on the rightmost side, which is not assignment.

#

that said, you can't add a column to a slice. every cell in a column has to have a value, even if it's NaN. (though pandas might initialize values outside the slice to NaN if you do it that way--idk)
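
#

For reference, a sketch of the loc version on a made-up frame (only the `kwartaal` and `month` names come from the question; the rows are invented). It assigns with a single `=` and shows what happens to rows outside the mask.

```python
import pandas as pd

# Hypothetical stand-in for df1.
df1 = pd.DataFrame({"kwartaal": ["Q1", "Q2", "Q1", "Q3"]})

# Boolean mask selecting the Q1 rows.
mask = df1["kwartaal"] == "Q1"

# One .loc call with (row selector, column label) and "=" for assignment.
# "month" does not exist yet, so pandas creates it for the whole frame.
df1.loc[mask, "month"] = "january"

# Rows outside the mask are left as NaN in the new column.
missing_outside_mask = df1["month"].isna().tolist()
```

So the column is added to the full dataframe, not to the slice, and only the masked rows get a value.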

rich fiber Jun 12, 2022, 7:23 PM

#

Can someone confirm this:
"Houses with 2 bedrooms are cheaper than houses with 3 bedrooms" is data science.
However, predicting prices is ML.

serene scaffold Jun 12, 2022, 7:24 PM

#

rich fiber Can someone confirm this, "House with 2 bedrooms are cheaper than house with 3be...

the whole framing of this question is weird. data science is using principles from programming and math to make use of large amounts of data. ML is when you have an algorithm that adjusts itself ("learns") based on data.

rich fiber Jun 12, 2022, 7:25 PM

#

Alright I get you

#

Thanks

serene scaffold Jun 12, 2022, 7:26 PM

#

but I guess just making a factual statement about a trend in the data is more "data science" than it is ML.
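
#

To make the distinction concrete, a toy sketch with invented prices (in thousands). The groupby is the "factual statement about the data" part; the least-squares line fit is a stand-in for the simplest possible "learning" step, chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical house data; not from any real dataset.
df = pd.DataFrame(
    {"bedrooms": [2, 2, 3, 3], "price": [200.0, 220.0, 300.0, 320.0]}
)

# Descriptive: average price per bedroom count, a statement about existing data.
mean_price = df.groupby("bedrooms")["price"].mean()

# Predictive: fit price = slope * bedrooms + intercept to the data,
# then use the fitted parameters on an unseen input (4 bedrooms).
slope, intercept = np.polyfit(df["bedrooms"].to_numpy(), df["price"].to_numpy(), 1)
predicted_4_bedrooms = slope * 4 + intercept
```

The first result only summarizes rows that exist; the second uses parameters adjusted to the data to say something about a house that is not in the table.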

haughty topaz Jun 12, 2022, 7:26 PM

#

```py
df1.loc[df1["kwartaal"] == "Q1", ["month", "day"]] = "january", "1"
```

#

yea this adds NA values for all not Q1

serene scaffold Jun 12, 2022, 7:39 PM

#

haughty topaz ```py df1.loc[df1["kwartaal"] == "Q1", ["month", "day"]] = "january", "1" ```

my prediction was correct 🔥