#data-science-and-ml

1 messages Β· Page 409 of 1

iron basalt
#

In the equation what is being updated? Write the expression here.

plush jungle
#

Q(action) = Q(action) + learning_rate * (reward_t + discount_factor * MaxQ(s_t+1,a) - Q(Action))

#

?

#

the value in the table that corresponds to the action that just got us to the goal

#

is being added to because we just got a reward

iron basalt
#

Q takes two arguments.

plush jungle
#

a state and an action?

iron basalt
#

Yes. As shown in the link.

plush jungle
#

so that means that this part

#

MaxQ(s_t+1,a)

iron basalt
#

No.

#

Q always takes two arguments.

plush jungle
#

ok Q(state_t, action_t) is being incremented

#

based on both the current reward

#

and the future predicted reward?

iron basalt
#

Not incremented necessarily, just updated.

plush jungle
#

right

#

in this part MaxQ(s_t+1,a) what does a represent?

#

the action we just took?

#

and s_t+1 is the state we got as a result of action at time t

iron basalt
#

Ok, so from the beginning, we are at s_t and take action a_t, we are now at s_t+1. And according to the equation we are updating Q(s_t, a_t), which is not a value at the current state Q(s_t+1, ...).

plush jungle
#

not a value?

#

oh right

#

because we haven't gotten a reward yet

#

so everything in the tables is just 0

iron basalt
#

Q(s_t, a_t) does not hold the value at the current state s_t+1, it's the previous state (and action from there).

plush jungle
#

oh... so s_t+1 is the current state? the state after we took the action at time t?

iron basalt
#

Yes.

plush jungle
#

ok

#

so what is a in this MaxQ(s_t+1,a)

iron basalt
#

If you are at s_t, and take some action, you are now at s_t+1. Or from a different POV (looking into the past), you are now at s_t, and were at s_t-1.

#

You could, if you wanted to, rewrite the equation from that POV, but it's the same thing.

#

(Just trivial change of variable names)

#

If you look at wikipedia for example: https://en.wikipedia.org/wiki/Q-learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.
For any finite Markov decision process (FMDP), Q-learning finds a...

#

In the algorithm section: ```
Before learning begins, Q {\displaystyle Q} Q is initialized to a possibly arbitrary fixed value (chosen by the programmer). Then, at each time t {\displaystyle t} t the agent selects an action a t {\displaystyle a_{t}} a_{t}, observes a reward r t {\displaystyle r_{t}} r_{t}, enters a new state s t + 1 {\displaystyle s_{t+1}} s_{t+1} (that may depend on both the previous state s t {\displaystyle s_{t}} s_{t} and the selected action), and Q {\displaystyle Q} Q is updated. The core of the algorithm is a Bellman equation as a simple value iteration update, using the weighted average of the old value and the new information

plush jungle
#

but is this correct then?
MaxQ(s_t+1, a_t)

iron basalt
#

Yes.

#

But note that at the end goal there are no more actions to make.

plush jungle
#

so the first time it reaches the end of the maze, that MaxQ is no value, but when it reaches the second to last square, the MaxQ gets passed the last square as its state?

iron basalt
#

The reward is given for the transition from s_t to s_t+1 (r(s_t, a_t)).

#

Since you are reaching the end goal, there is a reward.

#

Which is used to update the value.

#

But Q-learning uses the reward given and the values. So for ones where there is some next possible action (not at the terminal / final state), there is some max value.

plush jungle
#

OH

#

I think i'm starting to understand why it's QMax

#

when updating the weights for state t

#

it doesn't just add the reward

iron basalt
#

Remember that the values (not rewards), are taking into account the future in this back chaining way.

#

And they take into account the max possible on the next.

plush jungle
#

it also adds the best move from state t+1

#

the one it thinks is the best move anyway

iron basalt
#

(greedy)

plush jungle
#

so if you get a temporary reward, but the highest move from there is only negative, like dead ends

#

it'll punish it

iron basalt
#

Punishment comes in the form of the reward (e.g. negative or just less), value is trying to take into account future discounted rewards.

plush jungle
#

but is it recursive?

#

or is it just the t+1 move Q value that gets added in

#

not t+2

iron basalt
#

So lets say i'm playing chess and I take a Queen, huge reward. But now my opponent wanted that and they sacrificed it. So they make a move that leaves me with only like 3 moves and all of them are bad. They have bad value, but I will still pick the best one (the max given some action). So now next time i'm in that first state, I know that while my current next reward for taking the queen is good, taking into account my future options (even the best one / max), it's not worth it and I need to update my values to reflect that.

plush jungle
#

oh, because every state takes into account the state right after it, it's not recursive, but it has essentially the same effect

#

so three moves before taking the queen, that updated weight gets factored in implicitly because each move considers the Q value of the move right after it as well

#

and it's like links in a chain

iron basalt
#

Single step here, going from s to s' by taking action a, resulting in reward r. Now you can update Q(s, a) (update your table if it's tabular).

mint palm
#

which python version is best for all D

#

Dl, Comp vision stuff, and visualisation and all

iron basalt
mint palm
#

3.7.X?

iron basalt
#

Another way to look at the terminal state in chess is that in that case you only have 1 action, surrender, and so the max of the options is surrender (only item to do max of), and it's a really low max.

plush jungle
#

I really appreciate you taking the time to explain this

iron basalt
#

But you don't actually need a future value for the terminal state, because there is no future after that.

#

The bad reward for getting there will propagate on its own. (max of 0 actions to take, can just let it give 0)

#

(Or just not have the terminal state be a special case and treat surrender as an action, whatever, same thing, depends on how you prefer to code it)

stark saddle
#

I need To know how to make image matching in python like in gta 4 or police verification systems

#

Like how to achieve that ?

vital torrent
#

Hello guys! Nice to meet you all :), I was wondering if you could help me with something. I'm just within my Data Science master's program, and I'm still a little bit new to this field. My inquiry is, is there something such as a "nested time series regression" model using Keras? Like, a model using a time series data, but instead of having a defined "batch size" for the input of the model, the input of the model depends of the entries available of each "ID" or "Patient" in the dataset? (like treating each batch/sample as an independent one for the input of the time series model)

#

I hope I'm making myself clear, I'm just struggling with a project right now πŸ™

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

haughty pewter
#

to find a correlation between the two, is it right to say:

  • since satisfied column is left-skewed, scores of 41 and 43 are the most frequent scores of being satisfied with their flight, or lower scores (22 or less) exponentially decreases with the decrease in scores
  • as neutral or satisfied column is normally distributed, it is very close to the median Final Score of 37.5 compared to the satisfied columns, with its most common value being 36.
  • we can can conclude that the highest values of both categories of satisfaction are considerably close to the median Final Score of 37.5, making the Median having a correlation between both satisfaction categories, even if the Neutral or Disassatisfied Columns are left-skewed, it does not deviate much
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @upper lichen until <t:1654688547:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

urban lance
#

I'd like to access each individual group of a pandas group by. (there are about 277k of those).
The goal is to pass each individual group though a function, what would be the most efficient way to go about this?

haughty topaz
#

So what 277k groups which you each want to put through the same function?

urban lance
#

(data on) users of a website @haughty topaz

rose agate
urban lance
#

I did this:

groups.append([list(groups) for groups in grouper.groups])

This is probably not very efficient (and I guess my kernel doesn't like it)

serene scaffold
plush jungle
#

I've rewritten some deep Q learning code that was originally built to learn flappy bird (which it does perfectly after 2 million rounds)

#

I revamped it to try to train this stationary blue circle to shoot this stationary red circle

#

but it's been about 900,000 rounds, and it's still not learning effectively

#

the reward structure for this model is

Doing Nothing -> 0
Firing -> 0.1
Aiming up one degree -> 0.2
Aiming down one degree -> 0.2
having the laser pointed at the red dot -> 1.0
having a bullet overlapping with the red dot -> 10.0

#

it trains every turn by selecting a batch of 32 turns chosen randomly from the previous 10,000, and doing the Bellman Equation on them

#

any idea why it's not training as effectively as the flappy bird model with the same code was?

rose agate
# plush jungle any idea why it's not training as effectively as the flappy bird model with the ...

I know nothing about this problem so I'm probably no help, but have you observed the laser ever actually point at the red dot? I thought that maybe it's just continually shooting below and since it never sees it its just randomly aiming up and down randomly which might not be enough to get to the point where it sees it in the first place. Maybe you can alter the reward to be +0.2 for rotating aim to the right and +0.1 to rotate aim to the left so it tends to drift to the right so it'll eventually intersect with the dot if that is that problem.

plush jungle
#

and I've observed it both shooting the target successfully and aiming at it successfully

#

but it's rate of success has barely gone up at all

rose agate
#

Also my intuition suggests that having a reward for pointing at the dot is a bad idea because that's not the actual angle it needs but idk

plush jungle
#

wait why

rose agate
#

is the ball falling down in a parabola?

#

or does it go straight

plush jungle
#

ball goes straight

rose agate
#

oh ok I thought it was falling in a parabola, that makes sense then

plush jungle
#

oh yeah

rose agate
#

so if the laser points and fires a bullet at the dot it will for sure overlap with the red dot later?

#

it seems weird to have two rewards for what will be the same outcome

plush jungle
#

I added the laser overlap reward as an attempt to get it to learn more effectively, but it doesn't seem to have worked

#

I was worried that because the bullet hits like 50 turns later, it was too far in the future to effectively teach the agent

rose agate
#

my intuition would be to try removing the overlap reward and making to aim up reward double the aim down reward to see what happens. Definitely not my area of work though, this makes me want to look into it

rose agate
#

thanks

pseudo wren
#

is anyone familiar with the loader issue in google colab

#

it's driving me nuts

#

i am trying to import packages and i keep hitting the loader issue

hollow sentinel
#

i don't really use google colab so idk

misty flint
#

/what are you trying to do

pseudo wren
#

But I keep running into an error that says it’s missing a positional argument

#

I figured it out though

#

I had to download an old version of Colab

bold timber
#

Hi, I have a problem that makes me confused. In the cost column, I have the value of the list. How do detect all of the values in the cost column that contained the list values?

serene scaffold
#

because now that you have numbers and lists of numbers in the same Series, that is bad.

pseudo wren
#

Hm... i'm getting an error in my model that it failed to converge

#

not totally sure how to proceed with it

#

or how to go about fixing it

#

i've been browsing stack overflow for the answer but haven't found a ton of helpful resources.

bold timber
jaunty sky
#

hey folks, I am given a problem statement, to predict whether a transaction might be fraud or not based on few parameters, and there are some customer id, should I cluster them on their id and rather than predicting their transaction one by one?

arctic wedgeBOT
#

Hey @jaunty sky!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold
#

what information are you given about each transaction?

serene scaffold
#

it will be easier for me to help if you answer the question directly. I don't want to have to wade through a pdf.

jaunty sky
#

this is the exact task

#

so the main question is to predict whether a transaction is fraud or not

#

it pretty simple (just the question statement)

#

but its not given whether to cluster them or not

cerulean mauve
#

Hello, was hoping to ask if anyone knew how to see the row by row changes that happen when using vectorized operations in pandas/numpy.

#

Without looping through row data ideally.

#

since the vectorized operation is a lot faster without needing to loop.

serene scaffold
jaunty sky
serene scaffold
grave cloak
#

can someone help me with pandas?

serene scaffold
grave cloak
#

Ah, ok

#

I need to do an analysis of a championship. Each year has a file and I need to merge this data, is it possible?

grave cloak
#

json and csv

serene scaffold
#

Does the Jason have basically the same kind of data

grave cloak
#

Text files are as json and tables as CSV

serene scaffold
#

So each year has a Jason and a csv?

grave cloak
#

It's a little more complicated

#

Sorry

serene scaffold
#

or something. just leaving it at "it's complicated" doesn't help me help you.

jaunty sky
serene scaffold
jaunty sky
#

type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

amount - amount of the transaction in local currency.

nameOrig - customer who started the transaction

oldbalanceOrg - initial balance before the transaction

newbalanceOrig - new balance after the transaction

nameDest - customer who is the recipient of the transaction

oldbalanceDest - initial balance recipient before the transaction. Note that there is not information for customers that start with M (Merchants).

newbalanceDest - new balance recipient after the transaction. Note that there is not information for customers that start with M (Merchants).

isFraud - This is the transactions made by the fraudulent agents inside the simulation. In this specific dataset the fraudulent behavior of the agents aims to profit by taking control or customers accounts and try to empty the funds by transferring to another account and then cashing out of the system.

isFlaggedFraud - The business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200.000 in a single transaction.```
#

these are attributes details

serene scaffold
#

@grave cloak this json can't be read directly as tabular data. how does it relate to the content in the CSVs?

#

there's unfortunately way too much content for me to wrap my head around what the schema of the JSON is.

grave cloak
#

Ok, I get it. I'm talking to the author of the dataset

cerulean mauve
#

@serene scaffold do you know of a way to get debug output when a vectorized operation is carrier out in pandas?

#

Let's say I wanted to cast all my dates to datetime64 and I wanted to see what rows got operated on. Is this possible?

opaque echo
#

Hey, so have an np array of say 70K points with their x and y coordinates., with shape [70K, 2]. These points cover a region of about 5368, 7152 units. I want to sample about 1/25th of points from them, uniformly based on their local densities. This means removing points from denser regions more than the sparser regions which will give me a nice uniformly distributed point array. Any idea how can I do it super fast?
Currently, I put all those points on a 2D array of shapes [5368, 7152] and slide a kernel of shape 11x11 onto it to get a rough local density estimate by which I probabilistically sample the points.

shy oasis
#

Hello all. Deep learning/machine learning newbie but ex-competitive programmer here. Is there any problem based curriculum to teach myself deep learning? Like how codeforces/hackerrank has problem based structures. Step by step problems getting harder to solve and along the way require you to learn new things. Any tips welcome

misty flint
#

not necessarily problem-based but helpful for coders

lapis sequoia
#

are there any servers dedicated for python ai?

hasty mountain
#

Hey guys, is it normal for a DCGAN to output images with a margin of 9 squares on them? It feels like it adds a # mask on its outputs, even if the generated images are good.

#

I don't know if it's a sign of overfitting, if it's normal or if it's another problem...

misty flint
#

the code might be missing a post-processing step

#

if its a consistent output of a 9-square margin, you can just add a step to remove said margin

#

otherwise, youd need to investigate further, starting with maybe the training data as well as padding

jaunty sky
#

;lkjh

hasty mountain
#

This is why I thought it was a sign of overfitting.

pliant sundial
#

What are some ML/AI projects.

#

I need some inspiration to learn it.

misty flint
#

like just look through it

#

actually

#

what model are you using

#

since maybe its been trained with data with padding

hasty mountain
hasty mountain
hasty mountain
misty flint
#

looks like so

#

i have no idea then

#

we used pix2pix and cycleGAN for our GAN project

#

and never faced the issue youre facing

hasty mountain
misty flint
#

computer vision isnt my specialty so maybe someone else has insight into whats going on

hasty mountain
#

I actually removed the Batch Normalization now because I thought it was that that was causing this issue

#

But...now I'm seeing those squares again

misty flint
#

looks like you found out one thing though

#

it wasnt the batch normalization step

#

maybe it is just overfitting. idk why that would be a manifestation of overfitting tho

hasty mountain
#

Yeah, I'm starting to think it's overfitting

#

The squares appeared after 400.000 epochs

misty flint
#

bruh

#

why the heck

#

do you have so many epochs

hasty mountain
#

I thought it was normal to have that many epochs py_guido

#

At least I've read in some paper that the guys used many, many epochs

hasty mountain
#

Oh yeah, I had a problem with that in certain model, then I had to use 235.000 epochs

misty flint
#

For a GAN, convergence is often a fleeting, rather than stable, state.

hasty mountain
#

Perhaps my memory is playing tricks at me and it wasn't a paper for GANs, but for Reinforcement Learning

misty flint
#

that

hasty mountain
misty flint
#

would make much more sense

hasty mountain
#

Uh...well, then...how much epochs did you use?

#

I'm using 16x16 images. I thought 500.000 epochs were ok...until now.

misty flint
#

i dont remember kekHands
but def not as much

hasty mountain
#

I only noticed overfitting with 32x32

#

And it was quite clear it was overfitting with model collapse

#

Can you see the squares?

#

Fun fact: I removed batch norm because it caused those squares to appear with less epochs, this is why I thought the batch norm was causing them.
It seems the batch norm was only making the generator able to reach its ideal point faster thinkmon

pseudo wren
#

I have a quick question about pandas dataframe manipulation

#

i have a dataframe that i need to limit certain categories of

#

for example

#

i have a dataframe of house details

#

i want to limit the categories of bathrooms and bedrooms to only houses with 2-3 bathrooms and houses with 3-4 bedrooms

#

i'm looking through the documentation but i cannot find anything thus far

hasty mountain
pseudo wren
#

hm yeah

#

i figured i'd have to use drop

#

but i'm more so looking to get those specific parameters

#

and figure out how to drop it so that i can get those specific parameters

#

i don't know if a for loop would do me good here

hasty mountain
#

I think you'll use something like data = data.drop("bathroom" < 2) or something like that

pseudo wren
#

that might work

hasty mountain
#

Probably won't, since I don't remember the command, but it's something around that

#

data = data.drop(data["bathroom"] < 2)

#

And be careful with the argument axis=0 or axis=1

pseudo wren
#

hm

#

that's not quite it either

#

'<' not supported between instances of 'str' and 'int'

#

the error i got

hasty mountain
pseudo wren
#

hm

#

so i came up with some sort of solution but

#

it doesn't seem to be a fix all

#
housedf = housedf[df.bedrooms != 5]
housedf = housedf[df.bathrooms!= 1]
housedf = housedf[df.bathrooms!= 4]
housedf.head()```
#

and yes i know it's bad code

#

but in bathrooms it doesn't register that 4.50 shouldn't be there

#

so i need to figure out how to put it in a range

brazen pike
#

housedf = housedf[(df.bathrooms< 4)|(df.bathrooms>=5)] i think would work for that specific line of code

#

i think in general you may be able to simplify that logic just using a range of what you're looking for

#

sorry, corrected something with my suggestion, i think. & = and, | = or

pseudo wren
#

hm i'll try it

#

for some reason it's still including values i am trying to tell it to explicitly exclude

#
housedf = housedf[(housedf.bedrooms > 2)|(housedf.bedrooms < 5)]
housedf.head()```
sick fern
#

square = np.zeroes((300,300) , np.uint8)

#

hey guys, this is a piece of code that i'm not understanding

#

I get that it's making a matrix with only zeroes, but what it 'np.uint8' here?

#

because ive seen it send with np.uint32 as well and i dont understand that

serene scaffold
#

@sick fern unsigned eight bit integer

sick fern
serene scaffold
sick fern
#

Yeah

serene scaffold
# sick fern Yeah

An 8 bit integer is a whole number represented using 8 bits. So you can represent numbers up to 2⁸

#

And it's unsigned, so there isn't a bit used to tell you if it's positive or negative.

sick fern
#

ohh okay got it

#

but how does that relate to the piece of code?

#

like what does it do

serene scaffold
#

You're telling numpy to create an array using that data type. Rather than a 32 bit integer. Or a 64 bit float. Or something.

sick fern
#

so each integer in the array can go up to 2 to the power 8

#

?

serene scaffold
#

It might be more memory efficient, but I'm not really sure.

sick fern
#

ohh okay got it

#

thanks a lott

serene scaffold
#

You're welcome

grave cloak
#

Can you help me unite values ​​in Pandas?

serene scaffold
grave cloak
#

It has the name of the clubs that appear in different rounds, there are 38 in total, I have to gather and show how many goals each team scored in total

#

But only the minute the goal was scored appears, not the goals themselves.

serene scaffold
#

@grave cloak does the minute determine what round it was part of?

grave cloak
#

Not. It only has the id of the match, the minute in which the goal was scored, and the round

serene scaffold
#

@grave cloak can you do a group by, like I showed you before?

grave cloak
#

I thought of a way to do...

#

If you took the null minutes from each game of each team and each round, and could show how many times they appeared

junior quest
#

I'm getting a dimension mismatch here and insanely high loss and 0 accuracy

#

what is wrong with my implementation?

#

I don't think the dimension mismatch in the plot is relevant to the model accuracy and loss though

rose agate
#

you could try housedf.loc[housedf.bathrooms.isin([2,3]) & housedf.bedrooms.isin([3,4])], which should be cleaner, assuming bedrooms and bathrooms are ints

pseudo wren
#

I fixed it

lofty charm
bold timber
#

I want to get the most frequency of year and I get an error. How fix that?

rose agate
weary ridge
#

given this image, and the string bluetooth addressi have to output the corresponding value

#

this is the image

rose agate
# weary ridge given this image, and the string ```bluetooth address```i have to output the co...

I got it somewhat functional using pytesseract. The text it identifies isn't perfect though

import pytesseract
from PIL import Image
import difflib

pytesseract.pytesseract.tesseract_cmd = 'C:/Users/NAMEHERE/AppData/Local/Programs/Tesseract-OCR/tesseract.exe'
text = (pytesseract.image_to_string(Image.open('unknown.png'), config='--psm 3'))
text = text.split('\n')
string_to_match = 'bluetooth address'

for line in text:
    match = difflib.get_close_matches(string_to_match, [line])
    if match:
        print(*line.split(':')[1:])
#

For 'bluetooth address'

#

You'll probably have to look into how to make it detect better

#

this is what it read

split parcel
#

Needed some help with Python (particularly MatPlotLib)
I need to plot some data as a stem plot which is very closely spaced sometimes
Now what matplotlib does is that it just plots the two stems on top of each other
I think this won't happen if the least count is reduced along the x-axis
But I am not getting any documentation online on how to do that
Note that I don't want to change x-ticks since I don't care about the text labels; I want to change the least count which matplotlib is using to put the stems on the graph

gloomy anvil
#

so you'd have a basically a bar plot that looks like a stem plot

split parcel
gloomy anvil
gloomy anvil
split parcel
gloomy anvil
#

I have a question myself: It is very easy to look at correlations of a dataframe in python with a heatmap

#

It is also very easy to plot autocorrelations with statsmodels:

plot_acf(data, lags=50)

#

But now, how can I find out if there is a correlation in a more complex dataset? Right now I have around 60 columns and about 3000 rows. How can I see if there is a strong correlation in the dataset between column 23 and column 54 with a 5 day lag or so? Is there some way find it automatically? Or would I need to create this manually? Or maybe is there an interactive plot where I can try different lags and different columns and see if there are strong correlations?

I feel like this might be a standardproblem where there must be some generic solution for. I can't be the first one to look for signs in data, if there is some pattern which is correlated with an event a few days later?

steep bramble
#

Hello, I'm looking for help regarding seaborn heatmap

#

I follow a training on this topic but I can't see how to handle the topic.
I have a list of products such as pasta, vegetables in a dataframe. One column is the timestamp when the product has been added to the DB.
I'm requested to make a heatmap of when the product are added (crossing month from 1 to 12 and hours from 1 to 24). I'm good with x and y but I don't know how to compute values

rose agate
mint garnet
#

Hey guys, does anyone here have experience with data mapping using AI? My organization does a lot of work mapping client metadata fields to internal ones and I'm interested in automating the process.

mint garnet
somber burrow
gloomy anvil
gloomy anvil
gloomy anvil
mint garnet
# gloomy anvil It totally depends on how the data is structured and what kind of job the AI has...

It's mostly a matter of mapping whatever column head a client has used to define an industry standard field (ex. Loan origination date) to the column head we use to denote the same field. Right now we are basically doing the whole process field by field manually, which is incredibly time consuming and tedious. I am hoping to build some sort of a software solution that will match fields based on some combination of data similarity and field name similarly so that match candidates are populated automatically and we just have to validate the generated matches and fill in an fields that weren't matched manually.

gloomy anvil
#

you could maybe use a simple way of vectorizing to calculate the distance between the two vectors (which is essentially a score of similarity). You'd have to test how well this approach works and it probably would still need manual control work, but it might be an easy and fast approach to solve this.

mint garnet
gloomy anvil
# mint garnet Alright great, could you recommend any resources that I can use to teach myself ...

if you work with a lot of text data, you could start here: https://machinelearningmastery.com/natural-language-processing/

It is also a great ressource for other disciplines of AI. It is also good to start with some simple statistics text book as a foundation before jumping into ML

Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software. The study of natural language processing has been around for more than 50 years and grew out of the field of linguistics with the rise of computers. In this post, you will […]

edgy glen
#

hiho, i am trying to make a script (argpars) form my data that i analysed.
unfortunatly i dont get any output abd some variables are not found.
kind of stuck.. any one down for a quick help

mint garnet
steady basalt
#

Anyone wana help me with my stats homework

serene scaffold
#

if you had asked an actual question, and it was once that I knew the answer to, I'd be answering it right now.

west tapir
#

a question:

Lets imagine a scenario where we have to ask all the employees a set of question (mandatory btw) in which each question have a particular weight and based on their answers we have to put them in different categories aka buckets.
how can we implement this using python dynamically

wooden sail
#

that depends on the type of question. one way, for example, is to take multiple choice questions and encode the answers with a numerical value, e.g. an integer. then you can put all of those values into a vector. you can then split up the n-dimensional space into regions based on the values. you can find intersections of half spaces by just writing out inequalities

west tapir
#

Now we have to make clusters of employees respectively to their output which was got by the inputs they did

wooden sail
#

this sounds like something you'd do with a spreadsheet or something of the sort, nothing python specific

#

you could use a bunch of ifs or inequalities

lapis sequoia
#

Is it "fair", formally, if a university analyses the questions on which minority students perform worse. And deletes those questions from a test.

#

@serene scaffold

serene scaffold
#

I HAVE BEEN SUMMONED

#

what do you want

lapis sequoia
#

Yes Mr. Bond

agile cobalt
#

it seems like even the formal definition of fairness still leaves whenever or not something is fair subjective, so: unclear imo

serene scaffold
lapis sequoia
#

I feel like it is unfair. Maybe because you are forcefully trying to admit minorities. And not based on their skillset.

#

From what it appears from the prompt. It is an analytical ability test

#

Not explicitly mentioned though

serene scaffold
#

I think this is an important issue. But I don't think it's one that our community is really intended to facilitate.

lapis sequoia
#

3rd part

#

1 is definitely not true. Not sure about 3rd

#

Oh nevermind. It is not fair. I made it out from the options. Because second is true. And no option for 2 and 3

serene scaffold
#

wrt test questions, 3 seems more compelling than 1

lapis sequoia
#

Is it "fair" or not. We don't know. It's not fair straightforwardly. It's only when you put in some social science in it and use the bigger picture. It might look fair then. But more like unfair rn. Because you are trying to create a "biased" test.
Assuming the questions are analytical in nature. And not some qualitative ones in that case the test might be biased originally and you removed the bias.

grave cloak
#

Guys, I'm using Pandas and when I try to see the values ​​of different columns it returns the same value
Does anyone know how to solve?

wispy hill
grave cloak
#

Yes it was my mistake

#

Can you tell me if there is a way to add results from two groupby?

serene scaffold
grave cloak
#

1+2

serene scaffold
grave cloak
#

Add the results of 'mandante_placar' with 'visitante_placar'

wispy hill
grave cloak
#

ok i will try

serene scaffold
# grave cloak

if the set of labels on the left are the same for both, you can literally just add them with +

digital dew
#

hey guys can anyone help me with some ML/NPL error please ?

#

i searched on google and didn't find a solution

wispy hill
#

anyone free to check #help-cake ? im trying to practice for exam tomorrow and im stuck on a problem

serene scaffold
grave cloak
#

No hahaha, but I went to do a sum and he added the match. For example '4x2' became 6

serene scaffold
#

please show the exact code that you ran and explain what it did that was different from what you wanted.

grave cloak
#

df1 ['total'] = df1['mandante_placar'] + df1 ['visitante_placar']

serene scaffold
#

did you get an error message or what?

grave cloak
#

But it's doing the sum of the match and not of each club

serene scaffold
#

please do print(df[['mandante_placar', 'visitante_placar']].to_dict()) and put that in the paste bin

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

grave cloak
serene scaffold
# grave cloak

it has to be text in the paste bin that I can copy and paste. please do not post screenshots of text.

grave cloak
#

The content is too big

serene scaffold
# grave cloak

please give the code in this screenshot as text, and the one after it as well.

grave cloak
serene scaffold
#

anyway, we can forget about that for now. just give the code in those two screenshots as text. I don't want to retype what is in them

serene scaffold
# grave cloak

please do not post any more screenshots. I will not look at them. when you use the paste bin, you have to save it and give the link. this was in the instructions from the !paste command.

grave cloak
#

Ok, one momento

serene scaffold
#

like I said, just the two lines of code from the two screenshots. as text in the chat.

#

@grave cloak I have to leave soon

serene scaffold
arctic wedgeBOT
#

Hey @grave cloak!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

serene scaffold
#

with +

grave cloak
#

Ok, thanks a lot for the help, I'll try here

#

I tried to send the print and the result but it's too big

serene scaffold
#

In this case, I was only asking for the code, not the displayed result

grave cloak
#
df1.groupby(['mandante'])['mandante_placar'].sum()
df1.groupby(['visitante'])['visitante_placar'].sum()
serene scaffold
#
df1.groupby(['mandante'])['mandante_placar'].sum() + df1.groupby(['visitante'])['visitante_placar'].sum()β€Š
#

Try that @grave cloak

grave cloak
#

File "<ipython-input-41-205e99e99ae7>", line 1
df1.groupby(['mandante'])['mandante_placar'].sum() + df1.groupby(['visitante'])['visitante_placar'].sum()
^
SyntaxError: invalid character in identifier

#

++

#

That was the mistake I think

#

Thank you so much @serene scaffold and thanks for your patience too

pseudo wren
#

is anyone familiar with the python library darts?

#

it's specifically used for time series modeling

limber token
#

Hey guys, any idea how I can slice a pandas dataframe with a datetime column every X amount of days (specifically monthly and yearly)? The dataframe current has daily info. I've tried two different approaches: filtering every 30 or 365 indexes, but that doesn't work since there are gaps in information: for example index 30 is Feb 10th, 1995, and index 60 is March 28th, 1995. I've also tried locking the relevant day/month based on the selected filter (image in annex), but again that's not the best solution since it's looking for exact days and there are gaps in the df. In some instances it goes like 8 years without finding the same exact date. Any tips on how to go about this?

serene scaffold
#

@limber token can you set the timestamp as the index?

limber token
serene scaffold
#

!docs pandas.Grouper

arctic wedgeBOT
#

class pandas.Grouper(*args, **kwargs)```
A Grouper allows the user to specify a groupby instruction for an object.

This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.

If axis and/or level are passed as keywords to both Grouper and groupby, the values passed to Grouper take precedence.
limber token
#

I'm a bit of a noob with this, how could I filter by time intervals by grouping?

limber token
#

I did

#

Didn't understand how to apply it

#

Oh

#

There's literally a freq method, my bad lol

limber token
misty flint
#

11/10 would follow any online content you post

rose agate
serene scaffold
#

Think of a grouped DataFrame as a bag of DataFrames

serene scaffold
misty flint
#

dont do it stel

serene scaffold
lapis sequoia
#

Hello, I'm trying to remove all rows with NaN values from my dataframe. I am using the following code:

df = web.DataReader('DEXUSEU', "fred", start, end)
df['SP500'] = web.DataReader('SP500', "fred", start, end)
df['Inflation'] = web.DataReader('FPCPITOTLZGUSA', 'fred', start, end)
df['Interest Rates'] = web.DataReader('INTDSRUSM193N', 'fred', start, end)

df = df.dropna()

print(df)

but when I do that I get this output:

Empty DataFrame
Columns: [DEXUSEU, SP500, Inflation, Interest Rates, M3, M2, M1, Interbank]
Index: []

any ideas on what I'm doing wrong here?

grave cloak
#
df1.rename(columns={'mandante': 'clube'},inplace=True)
df1_clubegols = df1.groupby(['clube'])['mandante_placar'].sum() ++ df1.groupby(['visitante'])['visitante_placar'].sum()
print(df1_clubegols.to_markdown())
#

When I run this it is like 'club' and '0' the columns, does anyone know how I can rename this '0' and generate a graph with the values

plush jungle
#

I'm training a deep Q learning AI to play a top down shooter of my own design

#

and I just learned that the deep Q learning algorithm is guaranteed to converge

#

but there's no guarantee it'll happen quickly

#

at 1.1 Million moves, its loss is still extremely high and it can't hit the stationary target to save its life

#

but I just watched a video on reinforcement learning for a boxing game in unity

#

and they could barely stand until 250 million cycles

#

so I'm wondering if I'm just being impatient, and in a couple hundred million moves it'll converge after all

#

but currently it's training at a rate of about 100k turns an hour

#

so it'll take about 104 days of continuous training for it to get there

#

but the thing is, it's barely using my gpus

#

and I've got a 3080

#

so it's pretty powerful

wooden sail
#

taking millions of cycles sounds about right. as to how long it takes, it depends on how many parameters there are and how difficult it us to compute the maximization/prediction per step

#

what are you using for this? pytorch? tensorflow?

plush jungle
#

pytorch

#

it uses the gpu, but in between every turn it has to run pygame code

#

I'd like to make full use of the gpu

wooden sail
#

are you batching things up?

plush jungle
#

yes, but I'm using the batch size of the code I skeletonized, 32

#

out of the last 10k turns

wooden sail
#

you can probably increase that a lot more

plush jungle
#

if I dramatically increase batch size, would my gpu run them concurrently?

wooden sail
#

that should be the case

plush jungle
#

let me test that theory

wooden sail
#

also, are you generating all quantities on gpu? idk if pytorch allows you to directly create variables on gpu

#

moving them from cpu to gpu and back is super slow

plush jungle
#
minibatch = random.sample(replay_memory, min(len(replay_memory), model.minibatch_size))

        # unpack minibatch
        state_batch = torch.cat(tuple(d[0] for d in minibatch))
        action_batch = torch.cat(tuple(d[1] for d in minibatch))
        reward_batch = torch.cat(tuple(d[2] for d in minibatch))
        state_1_batch = torch.cat(tuple(d[3] for d in minibatch))

        if torch.cuda.is_available():  # put on GPU if CUDA is available
            state_batch = state_batch.cuda()
            action_batch = action_batch.cuda()
            reward_batch = reward_batch.cuda()
            state_1_batch = state_1_batch.cuda()

        # get output for the next state
        output_1_batch = model(state_1_batch)

        # set y_j to r_j for terminal state, otherwise to r_j + gamma*max(Q)
        y_batch = torch.cat(tuple(reward_batch[i] if minibatch[i][4]
                                  else reward_batch[i] + model.gamma * torch.max(output_1_batch[i])
                                  for i in range(len(minibatch))))

        # extract Q-value
        q_value = torch.sum(model(state_batch) * action_batch, dim=1)

        # PyTorch accumulates gradients by default, so they need to be reset in each pass
        optimizer.zero_grad()

        # returns a new Tensor, detached from the current graph, the result will never require gradient
        y_batch = y_batch.detach()

        # calculate loss
        loss = criterion(q_value, y_batch)

        # do backward pass
        loss.backward()
        optimizer.step()```
#

this is the training code

#

my assumption is that the .cuda() will parallelize it

wooden sail
#

where's the .cuda in there

plush jungle
#
  if torch.cuda.is_available():  # put on GPU if CUDA is available
            state_batch = state_batch.cuda()```
wooden sail
#

ah i missed it

#

that's very slow, i think there's a way to create them directly on gpu. but try increasing the batch size first

plush jungle
#

what frustrates me is that the flappy bird model this code was originally for converged after 2 million rounds

#

and mine is at a million and hasn't even improved

#

is aiming and firing from a stationary location to a stationary location really more complex than flappy bird?

wooden sail
#

i couldn't say

plush jungle
#

ok yeah that definitely changed it

#

it was at 320 batch size before, not 32

#

so I upped it to 5000

#

and now the GPU spikes pretty seriously every time it back propagates

iron basalt
plush jungle
#

you mean a different top down problem?

iron basalt
#

Any RL problem, like a maze.

plush jungle
#

cause I tried the flappy bird model before I started this, and my model trained pretty quickly

iron basalt
#

Do a bunch and see where it fails or not.

plush jungle
#

ok

iron basalt
#

Like gym.

plush jungle
#

a maze seems like a good idea if this batch size thing doesn't work

#

maybe I'll do a driving one too

#

since that's basically a top-down flappy bird

iron basalt
#

After you have a bunch of different ones played or failed, make a matrix of features of each and try to find out what it struggles with (assuming it's not just a bug).

plush jungle
#

features? how do I compare different games?

#

if it succeeds on a maze and a driving game, what features would be different?

#

from a top down shooter

iron basalt
#

You kind of have to guess with it. But you can make an educated guess.

plush jungle
#

yeah any information is valuable

iron basalt
#

Action space, turn based vs not, grid based vs not, delayed rewards, sparse rewards, etc.

plush jungle
#

yeah ok

iron basalt
#

Then make a hypothesis, etc, etc (do science, data science).

#

(You can make specific games with specific features)

#

*Also RL is hard/unsolved, it not working without a lot of effort (or just A LOT of compute) is expected.

#

**What is really annoying is that you will often not know if it's just a bug or not. I have had times where removing a bug made it worse (AI/ML is kind of special in this way / that this can happen).

plush jungle
#

I guess I'm spoiled then with the flappy bird model being tuned just right

#

and that is annoying that you can't really know what the problem is

iron basalt
#

***Some papers out there sometimes make me wonder if their implementation was bugged because when reproducing it I did not get any good results, and tried pretty hard to find any bugs. This is a huge problem with not releasing some source code. There is no reproduction (which is needed for it to be science).

plush jungle
#

what exactly do you define as a bug?

#

if a model is valid but only with the hyperparameters tuned just right, is that a bug?

iron basalt
#

No, but something like writing out of bounds is.

plush jungle
#

writing out of bounds?

iron basalt
#

index out of bounds

plush jungle
#

oh I see

#

which could give you results that aren't actually real

iron basalt
#

Bugs happen all the time, and there is no way to tell that they did not just mess it up if there is no source code available.

plush jungle
#

yeah, a paper with no code is a pretty big red flag

iron basalt
#

The thing about ML is that the bug could actually results in better results, a special case in software.

#

Other stuff like traditional algorithms would probably just break (usually very noticeable).

#

Another place where this kind of thing can show up is in game development where a bug becomes a feature because during testing they found that it enhanced the game (e.g. Minecraft pistons strange buggy behavior).

#

The reason it shows up in ML is because the systems are often good at dealing with noise and are random already.

#

And adding a bit of noise in a specific way can make it better (not all the bugs add just a bit of noise, there are other types but that is one of them).

plush jungle
#

that's a good point, I never thought about how weird that is that a bug could actually be "beneficial" in those fields

hazy gazelle
#

yooo do u guys know how to solve this?

wooden sail
#

looks like you want to set up some inequality based on the value of AVERAGE SALES

hazy gazelle
hazy gazelle
hazy gazelle
wooden sail
#

i can help you with the logic, but i won't do your homework for you

hazy gazelle
wooden sail
#

you want some nested if else statements

#

for example if(value on the column to the left <= some number and value on the column to the left >= some other number, then 'SILVER', else ( if ... ) )

low spear
#

does anyone know an effective way on how to do this?

hazy gazelle
gilded kestrel
#

hey guys, I have a question. Let's say you want to predict house/apartment selling price for a part of the city, but you don't have floor details i.e. two apartments on the same building can appear as duplicates in your data but with different selling price. Do you assume that these concern different apartments or assume that it is the same apartment sold another time hence remove them to keep the observations independent?

hazy gazelle
gilded kestrel
inland zephyr
#

hello everyone, does anyone know best dataset to practice for support recomender task?

#

association or other approach such classification or sparse matrix is welcomed

grand bronze
#

Hey everyone! I'm very new to computer vision and looking for some input on this problem I have.
I want to train a model that takes an image as input and gives the same image as output but with one or multiple added overlays. Like this:
Input:

#

Output:

#

It's basically adding a missing voronoi cell

#

I have identified pix2pix as a possible candidate, but I get the feeling that model is not meant to take a photo as input but more a schematic as input

#

I'm not asking anyone to guide me trough the whole way to do it, just looking for a pointer to the right type of model

#

Image segmentation would perhaps also work to kind of mask the area that the cell is missing from, but I don't think that's what those models are meant for either

grand bronze
# grand bronze Output:

To clarify, it is known in advance what cell is missing, the model would be trained on a synthetic dataset of where these cells are supposed to be. There is only 1 'solution', it doesn't need to create new voronoi patterns

oblique agate
#

Can we find confidence value of a prediction made by logistic regression model?

grave cloak
#

Does anyone know how I change that '0'?

serene scaffold
grave cloak
serene scaffold
odd meteor
autumn mountain
#

Hello, I just found this community (used to look on irc channels but seems like I am getting older)

misty flint
#

does anyone have resources on writing aws lambda functions to call ML models for inference

#

the more i work, the more i feel i know nothing

#

forever imposter syndrome

#

calling it now

#

πŸ’€

autumn mountain
#

I wanted to know how to convert this little table:

#

to something having 3 columns, Year, Month, Value

serene scaffold
# autumn mountain

you could .stack() and then .T to transpose, I think. if you do print(df.to_dict('list')) and give the text in the chat, I can experiment.

autumn mountain
autumn mountain
# serene scaffold you could `.stack()` and then `.T` to transpose, I think. if you do `print(df.to...

{2021: {1: '525.785', 2: '427.857', 3: '477.502', 4: '468.083', 5: '484.556', 6: '457.686', 7: '478.079', 8: '518.769', 9: '532.103', 10: '562.109', 11: '544.405', 12: '526.958'}, 2020: {1: '470.827', 2: '424.322', 3: '459.281', 4: '463.401', 5: '507.738', 6: '509.694', 7: '518.549', 8: '543.902', 9: '566.628', 10: '619.065', 11: '589.061', 12: '583.310'}, 2019: {1: '386.320', 2: '331.042', 3: '374.333', 4: '423.750', 5: '518.613', 6: '525.426', 7: '551.882', 8: '575.643', 9: '556.631', 10: '599.759', 11: '573.220', 12: '539.916'}, 2018: {1: '265.377', 2: '224.912', 3: '278.908', 4: '295.472', 5: '317.927', 6: '317.742', 7: '347.696', 8: '373.720', 9: '401.025', 10: '413.556', 11: '406.041', 12: '407.751'}, 2017: {1: '290.174', 2: '225.300', 3: '252.969', 4: '236.823', 5: '248.159', 6: '245.243', 7: '293.297', 8: '316.340', 9: '307.968', 10: '302.871', 11: '293.155', 12: '285.409'}, 2016: {1: '284.158', 2: '226.621', 3: '264.373', 4: '275.014', 5: '295.629', 6: '297.553', 7: '280.241', 8: '299.579', 9: '334.492', 10: '337.164', 11: '320.987', 12: '289.284'}, 2015: {1: '402.896', 2: '335.901', 3: '341.535', 4: '327.988', 5: '367.478', 6: '362.013', 7: '362.265', 8: '357.917', 9: '347.830', 10: '361.113', 11: '332.901', 12: '314.040'}}

#

(without the 'list' argument)

serene scaffold
autumn mountain
#

{2021: ['525.785', '427.857', '477.502', '468.083', '484.556', '457.686', '478.079', '518.769', '532.103', '562.109', '544.405', '526.958'], 2020: ['470.827', '424.322', '459.281', '463.401', '507.738', '509.694', '518.549', '543.902', '566.628', '619.065', '589.061', '583.310'], 2019: ['386.320', '331.042', '374.333', '423.750', '518.613', '525.426', '551.882', '575.643', '556.631', '599.759', '573.220', '539.916'], 2018: ['265.377', '224.912', '278.908', '295.472', '317.927', '317.742', '347.696', '373.720', '401.025', '413.556', '406.041', '407.751'], 2017: ['290.174', '225.300', '252.969', '236.823', '248.159', '245.243', '293.297', '316.340', '307.968', '302.871', '293.155', '285.409'], 2016: ['284.158', '226.621', '264.373', '275.014', '295.629', '297.553', '280.241', '299.579', '334.492', '337.164', '320.987', '289.284'], 2015: ['402.896', '335.901', '341.535', '327.988', '367.478', '362.013', '362.265', '357.917', '347.830', '361.113', '332.901', '314.040']}

#

(I thought you were going to miss the month)

serene scaffold
autumn mountain
serene scaffold
#
In [8]: df.unstack().to_frame().T
Out[8]:
      2021                                               ...     2015
        1        2        3        4        5        6   ...       7        8        9        10       11       12
0  525.785  427.857  477.502  468.083  484.556  457.686  ...  362.265  357.917  347.830  361.113  332.901  314.040
autumn mountain
#

oh let me try

autumn mountain
serene scaffold
#

each value has a year and a month. so what are these two columns going to mean?

autumn mountain
#

Example:
Year Month Val
2021 1 427.5
2021 2 456.6
2020 12 123.45

#

for example

#

(this is because I need to join this dataframe to another based on Year and Month)

raw urchin
#

Is this the best place to ask about webscraping?

serene scaffold
serene scaffold
autumn mountain
#

cool it worked !

#

let me try to rename the cols

serene scaffold
#

In [15]: df.columns.rename('year', inplace=True)
In [17]: df.index.rename('month', inplace=True)

In [18]: df.unstack().reset_index()
Out[18]:
    year  month        0
0   2021      1  525.785
1   2021      2  427.857
2   2021      3  477.502
3   2021      4  468.083
4   2021      5  484.556
..   ...    ...      ...
79  2015      8  357.917
80  2015      9  347.830
81  2015     10  361.113
82  2015     11  332.901
83  2015     12  314.040

[84 rows x 3 columns]
autumn mountain
#

Thank you Stelercus !

storm oasis
#

anyone can help me and give me tips or reference ?
i am actually wanna build Named Entity Recognition Using Conditional Random Fields. But I have trouble in entity labelling so any advice from you guys how to labelling data for text with indonesian language?

serene scaffold
storm oasis
#

so i have done preprocessing phase, then i will go to entity labelling / annonate the text to which location, person and other. but i don't know how do that

serene scaffold
storm oasis
#

sorry i was wrong giving the explanation.

pseudo wren
#

Working on a time series model with the package Darts. Having trouble returning the acf, as the error says time series has no attribute to shape

#

i can't find much on stack overflow about this error

serene scaffold
pseudo wren
#

will do

#
AttributeError                            Traceback (most recent call last)
<ipython-input-33-9be0d81f7eb4> in <module>
----> 1 plot_acf(train)

2 frames
/usr/local/lib/python3.7/dist-packages/statsmodels/graphics/tsaplots.py in _prepare_data_corr_plot(x, lags, zero)
     17     if lags is None:
     18         # GH 4663 - use a sensible default value
---> 19         nobs = x.shape[0]
     20         lim = min(int(np.ceil(10 * np.log10(nobs))), nobs - 1)
     21         lags = np.arange(not zero, lim + 1)

AttributeError: 'TimeSeries' object has no attribute 'shape'

#

the error

#

the code is this

serene scaffold
#

so x, whatever that is, is not an array.

pseudo wren
#

hm

#

i'll send the relevant code

#
train.plot(label='train')
val.plot(label='validation')
plt.legend();
# here we are splitting our model into training and validation. Everything before 2019 is training, and 2019 onwards will be the validation series```
#

output

#

from there i attempted to plot the acf

serene scaffold
pseudo wren
#

the x value is the dates

#

but i thought i had already established it

#

i imported the date time package as well so it could be read

serene scaffold
pseudo wren
serene scaffold
#

how is it a regex flag?

#

also is that an upper case X?

pseudo wren
#

yes it is

serene scaffold
#

the x in your function is lower case. so this is something completely unrelated.

pseudo wren
#

i thought so too

#

but X capital shows up

#

x lowercase does not

#

this is my first time attempting a time series model

serene scaffold
#

you have to put print(type(x)) in the function and run it to figure out what type(x) is.

pseudo wren
#

yeah i tried that

#

nothing

#

lowercase x is not defined

#

i'm following the documentation guide for darts though

#

on building a time series model

serene scaffold
#

oh it's here

#

AttributeError: 'TimeSeries' object has no attribute 'shape'

pseudo wren
#

yeah

#

that was the initial issue

#

however i've seen other people work around this

serene scaffold
#

you said time series has no attribute to shape

#

which is not the same thing.

pseudo wren
#

ah

#

i've seen other people work around this error though

serene scaffold
#

what library does TimeSeries come from? if it's a sequence of some kind, you might be able to convert it to an array.

pseudo wren
#

The Darts library

#

admittedly a library i have never used before

#

i can send the code leading up to that error

#
series = TimeSeries.from_dataframe(wrc, 'Date', 'High', fill_missing_dates=True, freq='B')
train, val = series[:-36], series[-36:]
wrc.head()```
#

wrc.shape

#

model = ExponentialSmoothing()
model.fit(train)
prediction = model.predict(len(val), num_samples=1000)```
#

wrcm = Theta()
wrcm.fit(train)
pred = model.predict(len(val))```
#
train.plot(label='train')
val.plot(label='validation')
plt.legend();```
sleek forum
serene scaffold
pseudo wren
#

no i didn't

serene scaffold
#

anyway, here are the docs for TimeSeries

pseudo wren
#

according to the documentation i didn't need to declare x outright

sleek forum
#

As Stelercus said before, whatever x is is not what it should be (a list).

serene scaffold
#

there's no "declaring" in Python. variables are just names.

where do you call the function that has x?

pseudo wren
#

series = TimeSeries.from_dataframe(wrc, 'Date', 'High', fill_missing_dates=True, freq='B')

sleek forum
serene scaffold
#
AttributeError                            Traceback (most recent call last)
<ipython-input-33-9be0d81f7eb4> in <module>
----> 1 plot_acf(train)
#

where is this in the code

#

whatever train is (it is probably a TimeSeries) is not a valid type for that function.

pseudo wren
#

train, val = series.split_before(pd.Timestamp('20190101'))

#

this is the code that it's pointing to

#

i had seen someone work around this error using the same darts module though. From what I can tell, they followed relatively the same process.

#

which is frustrating

sleek forum
sleek forum
#

I ask you to ignore the English mistakes, I'm not fluent

pseudo wren
#

nah totally fine!

#

that bit of code does actually return a plot

#

but it won't return the shape of said plot

sleek forum
#

Try to put the val in a list and call that list in the place of val.

pseudo wren
#

this is the output of that

#

trying to plot it again

#

still nothing

#

same error

misty flint
#

@pseudo wren this looks like an interesting library nonetheless...maybe i will check it out if i ever have to do time series stuff

pseudo wren
#

buuut so far it's just a headache

misty flint
#

oof

#

well it still looks like itd be promising for maybe simple stuff

#

at least according to the readme

#

i like that you can create a time series object straight from a pandas df

#

so ill bookmark it

#

just in case

pseudo wren
#

yeah absolutely

#

i just need to figure out how to actually use it all the way

#

creating a time series directly from pandas has been insanely convenient though

misty flint
#

if i end up using this library later on, ill let you know

pseudo wren
#

thanks! i'd be interested to see how you fare with it

limber token
#

But I don't really like this solution, for two reasons:

#
  1. I need it to match the exact day first, and then if it doesn't match, try the range
  2. It's pulling more than one day per month, since it's pulling every day that matches the range
serene scaffold
misty flint
#

omg stel, i have done so many type conversions with this data that im pretty sure something got lost along the way

#

time to check this json i exported i guess

inland drum
#

I have to build a large dataset on employees from a couple of siloed datasets and I'm trying to engineer it so that data scientists have an easy time passing it to estimators / prediction algos.

I'm struggling because I have daily snapshots of the employees properties like their contract, benefits, manager, department. However, I also have other data that does not update daily such as survey results, satisfaction, but fortnight, and I think I have to somehow merge them.

Many employees do not change properties daily such as manager but some do.

One of the goals is enabling DS to be able to generate monthly or weekly predictions over the employees, such as churn rate or satisfaction per department per month with the data set whilst including that auxiliar data that comes from evaluations or surveys.

I would like to know what strategy is used normally to face this type of time granularity diversity on features pertaining the same population.

Could anyone provide an outline please?

In kaggle you rarely see data that has like history snapshots of the features so this feels like a non so common case

plush jungle
#

yes! my Q learner finally learned to wiggle to maximize laser-on-target time

#

after 1.5 million rounds

#

today wiggling, tomorrow aiming

slender plinth
wooden sail
plush jungle
#

seems that way

#

we'll see if it learns to aim in a million or two more rounds

inland drum
# slender plinth I don't know if understood correctly. Bu tha wouldn't be the case of provide al...

I believe this would just result on passing the problem to the data scientist?

Wouldnt this force them to go around the data silos to find the feature pieces and then figure out how to merge them / combine into useful / homogenous time ranges? We would like to avoid that.

Naturally, we want to still give them the freedom to mix and match but at the same, it seems valuable/standard to have something like a feature store that they can query in an homogenous manner that suits most common training techniques.

I posted the question in help-donut in case anyone has something they'd like to say

slender plinth
inland drum
#

@slender plinth Sounds like you are a DS?

I feel inclined to believe your experience should still be relevant as it is just a matter of who prepares the data.

So if you were to receive just the timestamped data as you suggest (keeping in mind that some data does not follow the same granularity), ie: payroll survey data is only available every 15 days whilst sentiment score has daily snapshots whilst the department is sampled in the dataset daily but may not change for years, how would you do the merge?

Keep in mind that this means you would actually receive N datasets where N is number of silos and you'd have to figure out how to merge them should you need to do that (perhaps by date and employee, but date granularity is non-homogenous)

slender plinth
# inland drum <@470235392101187584> Sounds like you are a DS? I feel inclined to believe your...

Yes sir I am a DS.

So, this is my opinion and experience.
I would love to have someone parsing and organising data before it comes to me.

But! What I use to see is data coming in raw The DS is up to verify and understand how the DB's should be merged and what are useful or not.

If you could talk to your DS and ask him how he/she would like to receive the data would be the best. I kind like receive all the raw information so I can take my own insights from it and them merge them to do all the work (cleaning, feature engineering, etc...)

misty flint
#

I would love to have someone parsing and organising data before it comes to me.
kekHands

#

if only

#

there is super nested json where it feels like i have to grab hidden secret data

#

in order to train a model

#

it is a nasty schema too since there's somehow multiple choice questions in there

#

too many records too

#

it's like 7 levels in btw

#

and then like the questions and answers are on separate levels

#

like who did this

inland drum
# slender plinth Yes sir I am a DS. So, this is my opinion and experience. I would love to have ...

I have a meeting scheduled with him but I am trying to get a better feel for the matter to not be so lost.

I think I failed to english in my last paragraph and omitted the question after just explaining context lol

If its not much of a bother or you have time, I'd like to hear how would you approach merging such an heteregenous dataset? Like, don't you need to have homogenous date ranges in order to pass that to an estimator?

slender plinth
# inland drum I have a meeting scheduled with him but I am trying to get a better feel for the...

Sorry for the late reply, urgent meeting..

Yes, you need to have specific keys between the data set in order do connect them.
But you can do this, see if make sense:
Link all your data by user Id if available, for the data that are not daily, create a specific invalid dare(1/1/1800) and let the DS know. For all dataset, create week and month columns, I believe, by your explanation that you can do at least for all data points by month, if you could also add weeks would be good.

With that, the DS will be able to connect.

Edit: I think your primary key would be the user ID, then you would have your secondary key's Date, Week, Month, Year(if that is the case).

I could only come up with that solution, not sure if is the optimum but pretty sure your DS will figure out how to handle this data.

inland drum
#

No need to apologize mate, thanks a bunch for taking the time to answer.

This sound like a plan, I also spoke to them and got some clues onto what structure to keep the data at the DS level for the least amount of pain for us all. I don't think its necessary to go in length but it hints to me that we need to keep the data ungrouped because different analysis will require them to do different groupings so summarizing from the source instead of during their pre-processing may end up restricting some analysis. @slender plinth

chilly token
#

can anyone help me a bit I'm a beginner in programming, i want to extract data from a pdf and make graphs using matplotlib, ive already extracted the dat dtring out of the pdf using pdfminer but the way it is will be very hard for me to parse it, i want to make something where i upload any similar pdf and it should give me graphs. Now anyone can help or guide in this please that would be appreciated

gusty frost
#

Should I learn the math for ML/AI before I get started with programming and making projects?

serene scaffold
#

you should learn the basics of linear algebra, probability, and statistics. at the very least.

#

but if you feel like starting a project, and you enjoy working on it, and you learn something along the way, it doesn't necessarily matter if you don't finish it.

gusty frost
#

So I should learn the basics of linear algebra, and statistics before actually programming?

#

Wouldn't I also have to learn some calculus?

#

@serene scaffold

serene scaffold
serene scaffold
gusty frost
serene scaffold
gusty frost
#

what about the programming part?

gusty frost
serene scaffold
gusty frost
#

Do you guys have any resources on linear algebra/statistics for ML?

tacit basin
severe grail
#

Hello

arctic wedgeBOT
#

Hey @severe grail!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

severe grail
gilded flame
#

what is wrong with my dataframes indexing?

#

i used append and concat

#
                    dataFrameStack = None
                    
                    cursor = cnx.cursor()
                    cursor.execute(QUERY)
                    df = pd.DataFrame(cursor)
                    df.head()
                    print(df)


                    if not df.empty:
                        if dataFrameStack is not None:
                            dataFrameStack = dataFrameStack.append(df,ignore_index=True)
                        else:
                            dataFrameStack = df

                 
                    print('\n\n\n\n\n***********************')
                    print(dataFrameStack)
                    field_names = [ i[0] for i in  cursor.description]
                    print(field_names)
                        
                    xlswriter = pd.ExcelWriter('{}/{}.xls'.format(type,loc),engine='openpyxl')

                    if not df.empty:
                        df.columns = field_names  
                      
                        df.to_excel(xlswriter,index=false)

                        xlswriter.save()
                    else:
                        cnx.close()```
#

what is wrong with the logic?

#

pd.concat([dataFrameStack,df],axis=0,ignore_index=True)

#

won't work

#

the second dataframe jump after the last columns

arctic wedgeBOT
#

@harsh spade Per Rule 6, your invite link has been removed. If you believe this was a mistake, please let staff know!

Our server rules can be found here: https://pythondiscord.com/pages/rules

fierce pine
#

Heylo, on what topic would you guys prefer a research summary!??

fierce pine
wary plank
#

hii anyone around? I need some help in a problem.

#

so I have a dataset with 800 features, and fun part is there is no correlation between these variables.

#

So how should i reduce the dimensionality of it?

#

i have tried using pca, feature extraction, feature selection and some other method but on test dataset highest r2 score is 1.8 😦

wooden sail
#

if you take an SVD of the data, how do the singular values look? do you know anything about where the data comes from?

wary plank
#

they haven't told where does that data come from but i think it's a real world data which is feature engineered

#

it's like 8 main features and rest are derived features

wooden sail
#

when you did PCA, how did you choose how many components to keep?

#

and how many did you keep

wary plank
#

r2 score seems to be increasing with increase in features

wooden sail
#

with r2 score you mean mean squared error?

wary plank
#

no

#

coefficient of determination

#

explained variance

summer pebble
#

anyone knows why my friend's and my feature importance value are different despite the fact that we are using the same dataset? πŸ€”

wooden sail
#

it's a scaled mean squared error, i just checked

wary plank
wooden sail
#

you'd expect this quantity to go to 1 as you increase the number of principal values

wary plank
#

it hasn't got above 0.3 even taking all features

wooden sail
#

that seems wrong, since taking all features would mean you just have the original data again

wary plank
#

and on test data it messed up big time, .9 so it's overfitting

wooden sail
#

on training data, you mean?

wary plank
wary plank
wooden sail
#

probably needs something more robust to noise. that's why i was asking how the singular values look

wary plank
#

wait i will send you the ss

wooden sail
#

that can give you some idea of how noisy the data set is, and whether a robust version of PCA could work better

wary plank
#

just digging in my old code, give me 2 mins

#

n_components is 10

wooden sail
#

that does already seem like a pretty decent approx

#

can you plot all of the singular values?

#

how many examples are in the training data

wary plank
#

800 attributes

wooden sail
#

20k is pretty good

#

so yeah, there should be 800 singular values and the question is how noisy they are

wary plank
wary plank
wooden sail
#

aha, but there you see the singular values are still quite large

wary plank
wooden sail
#

what i was gonna note is that, if you have your samples in a vector of size 800, and have 20k examples of these vectors, you can place them in a matrix of size 800 x 20k. then 1/20k (MM^T) is an approximation of the covariance matrix. under the assumption that there is noise that is uncorrelated with the true data or true features, 1/20k MM^T = C + N, where C is the true covariance and N is the noise covariance. for real world data, C is usually rank-deficient. noise tends to be full rank, and often/hopefully close to diagonal

#

under those conditions, the singular values of the covariance matrix are the original singular values plus the noise singular values, so the overall covariance appears to be full rank. as long as the true singular values are modestly large, you will mostly see the behavior of the data. once they become small, they are dominated by the noise

#

so if there is a weird sudden change in the profile of the singular values, it often hints at moving out of the signal space and into the noise space

#

(which would, in a noise free case, just be the null space)

#

so it'S a good idea to make a plot of all 800 singular values of the sample covariance

wary plank
#

the data has lots of nan values as well, i have dropes those features which has nan values > 15,000

wary plank
#

model score is 0.92 on train, on tet its showing 0.34

wooden sail
#

what are you doing?

wary plank
#

so i took all features and transformed using pca, and ran my model

wooden sail
#

why?

wary plank
#

i thought that's what you sad 😦

wooden sail
#

i never said anything about ML

#

we were looking at the data first to see if we could learn something

wary plank
#

na na model i ran just to check

wooden sail
#

and what is your model doing anyway? what are you trying to get from the data

wary plank
#

based on features i am trying to predict the score of a person given by some coach

wooden sail
#

scores of what

#

you never mentioned any of this before so i have no idea what you're doing

wary plank
#

okay i will tell you problem first

#

so i have 8 main features of a football player, and based those 8 features there are other derived variables. We are trying to predict the score given by a scout on the basis of those features

#

features include, position 1, position 2 of a player, weight, age, height, team code he plays for etc etc

wooden sail
#

all right

wary plank
#

so nearly all the features are scaled between 0, 1

#

including height

#

some categorical features are there which i changed using one hot

#

i just need some clue or hints on how to handle these many features, as i have never worked on something like this

zenith panther
#

hi, i have a question.. is image classification useful to rate images out of 5 ??

zenith panther
#

from the image i can tell if the house for instance is high standing

#

or not

#

the rate will be from 1 to 5 its like giving 5 stars thing

shell depot
#

coz you already have 5 classes, so yess by classifying an image the result will be one of the five classes ofc

shell depot
#

wlcm

serene scaffold
serene scaffold
zenith panther
#

yes its like seeing the rating of the house comparing to its price

#

cause i did scraping to get the data

zenith panther
west tapir
#

Is there any algorithm which dynamically updates/eliminates a various number of output while the user is giving input to a set number of questions?

west tapir
# stoic adder What are you trying to do??

I got a question from a friend,

Assume there are 20 personality questions which are mandatory each questions have a specific and unique weights assigned to them. based on the questions answered by the user there are different output or lets call the buckets, for example the person is a party person, introvert, alcoholic etc.
Now what we have to do is, ask all the employees of a company to fill out the form and based on their input we have to put them in different buckets. Now we can do it with any language by just comparing the weights. My friend ask that what can we do to make it dynamic using python specifically

#

so i was thinking of an ai/ml algorithm which eliminates the output while the user inputs the form

stoic adder
#

I think the better way would be to just compare the weights as u suggested earlier. I doubt there's an algorithm of such type. Not that I've heard of ofc.

#

Again I'm not sure if there really isn't any algorithm possible. U can try out some unsupervised operations to make sure tho.

west tapir
#

hmm alright will try

#

ty!!

naive rover
#

hi

#

I need help about data science, the chat room of the problem is #help-chocolate

#

πŸ™‚

barren wedge
#

Does any one know how to code a game

stoic adder
misty flint
barren wedge
#

I will go check..

naive rover
#

hi

#

I need help about data science, the chat room of the problem is help-chocolate

#

I'm developing a machine learning model to identify non-payers

haughty topaz
#
from sklearn.cluster import KMeans
from scipy.spatial import KDTree
import webcolors
import cv2

def convert_rgb_to_names(rgb_tuple):
    # a dictionary of all the hex and their respective names in css3
    css3_db = webcolors.CSS3_HEX_TO_NAMES
    names = []
    rgb_values = []
    for colour_hex, colour_name in css3_db.items():
        names.append(colour_name)
        rgb_values.append(webcolors.hex_to_rgb(colour_hex))
    
    kdt_db = KDTree(rgb_values)
    distance, index = kdt_db.query(rgb_tuple)
    # This of course only returns a closest match
    return names[index]

image = cv2.imread(r"helper\data\Nike-SB-Dunk-Low-Pro-Bart-Simpson-Product.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# get n clusters of colours
image = image.reshape((image.shape[0] * image.shape[1], 3))
clt = KMeans(n_clusters = 4)
clt.fit(image)

img_4_most_frequent_colours = []
for colour in clt.cluster_centers_:
    colour = convert_rgb_to_names(colour.astype("uint8").tolist())
    img_4_most_frequent_colours.append(colour)


print(img_4_most_frequent_colours)
#

I'm trying to get 4 colours from a picture

#

is this an efficient way?

rich merlin
#

Would someone have a moment to guide me through what I would need to do to begin this assignment?
Or perhaps a good youtube tutorial to follow for this assignment? I've only ever used pandas on jupyter notebooks

cerulean stream
#

Hi, Im having a little trouble w/ numpy
where arr is a 2D array with shape (30, 30 and dtype=uint8

arr = np.where(
    arr < lower,
    new - diff, 
    np.where(
        arr > upper, 
        new + diff, 
        new - color + arr,
    )
).astype(np.uint8)

this was my former but (slow) solution

def func(element: int, new: int) -> int:
    if element < lower:
        return new - diff
    elif element > upper:
        return new + diff
    else:
        return new - color + element
# and I map func over each element within the nested array

it does not match the desired results at all πŸ˜”
does anyone happen know where the process is differing pithink

serene scaffold
#

if that doesn't help and you want to continue, please say the types of all the variables in your example (other than arr)

tidal bough
#

it looks pretty fine for me

cerulean stream
cerulean stream
cerulean stream
serene scaffold
cerulean stream
serene scaffold
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @shell yew until <t:1654992737:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

misty flint
#

i didnt even know SAS had a campus and everything

#

fun fact: they apparently use python there too

#

πŸ’€

wooden sail
steel burrow
#

I’m looking for a data Analysis book, I was recommended this book Wes McKinney Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython but I’m seen bad Reviews that’s not updated is there’s a better book then this I need your suggestions please

pseudo wren
#

What is a good package to use for the double scalar error or a good fix in general

#

I’ve heard that error is a result of too many large negative numbers

pseudo wren
#

From what I found on stackoverflow that error occurs when there’s a very large very negative number

#

I am trying to create a predictive time series model

loud cove
#

honestly, just create an inside function that filters and pass those values.

rich fiber
#

Can someone guide me on AI

#

In blogs there are conflicts

#

Maybe a roadmap you can provide?

serene scaffold
#

@rich fiber start by learning about k nearest neighbors

rich fiber
#

I just learned the basics of python

#

Now should I learn about data analysis libraries?

#

Or algorithms first?

serene scaffold
rich fiber
#

Alright so I start by learning different ML algorithms?

serene scaffold
rich fiber
#

Can you kindly suggest me any resources to learning the maths required

#

I tried a book of Cambridge University and I could barely read a line without having to Google

pliant pewter
#

Cambridge University Press publishes all kinds of textbooks at all university levels

rich fiber
pliant pewter
#

That sounds like it's trying to teach you the mathematics. I haven't read it, so I don't know what level it's aimed at.

rich fiber
#

Yes that exactly what I was trying to learn alongside basics of python

#

So later on I won't have trouble learning the algorithms

#

Or dealing with large data (I hope)

rich fiber
pliant pewter
#

It sounds like you're a beginner in three different things at once: Python, mathematics, and machine learning. I don't think I have good advice for that, because I have not attempted to learn all three of these at the same time.

rich fiber
#

I've done python

pliant pewter
#

I would say, go get the mathematical foundation you need to understand the machine learning textbook, and then come back to machine learning later. You need at least basic statistics, probability theory, calculus, and linear algebra.

rich fiber
#

Yes that's exactly what iw as trying to do

#

Kindly give me a resource where I can learn the maths required

pliant pewter
#

I don't have a good thing to recommend as all the resources I know are graduate level

#

But look up textbooks on those topics.

rich fiber
#

Unfortunate although thanks for all the help :)

inner jackal
#

hey guys i don't know if im' in the right channel, i'm looking for a good proxies for scraping with python any good website? thanks in advance

vivid jasper
#

Hello! Does anyone have a way to save an excel file with a password that works reliably in python?

gleaming marsh
#

Anyone w/ experience using numpy and numba together? I'm having a weird error regarding arrays and matrices #help-cupcake

lapis sequoia
#

Hehe boi

limber token
serene scaffold
#

Are you sure you don't mean sort?

limber token
#

Yes, they're already sorted, what I want is to find the next row with the closest timedelta to either 30 or 365 days

serene scaffold
#

@limber token keep in mind that "filter" means "retain only values that satisfy a certain condition". It does not mean "select the most similar "

limber token
#

Okay, but "sort" is not really the word here either is it? lemon_thinking

serene scaffold
#

It's not

limber token
#

Anywhoo

#

Any tips?

serene scaffold
#

See if there's any "select closest" functionality built into pandas. I doubt that there is, but it's worth it to check.

Otherwise you'll have to make a new column that is the time Delta and loop through it.

#

Well, I guess you don't have to loop through it manually. Because if you have a column of timedeltas, the closest one is going to be the idxmin

limber token
#

When searching "select closest date" I only found how to find the closest date to the initial date

serene scaffold
# limber token How do you mean?

Do you understand what an argmin or argmax are? Idxmin is the pandas version of argmin. If you don't know argmin, read about that and come back.

If you have a column of timedeltas relative to time x, the idxmin will be the index of the row closest to time x.

limber token
#

I know what argmin and argmax are, what I meant is, I'm trying to find the closest days to a month and a year after date x, so I'm confused how idxmin would help 😺

(Sorry if I'm being confusing, not a native English speaker)

serene scaffold
limber token
#

Oh of course, that makes a lot of sense

#

Thank you πŸ™‚

serene scaffold
#

@limber token I'm at the gym. If you're still confused in like an hour, ping me and I can give a better example

hollow sentinel
#

generally speaking it's possible that your dataset doesn't have a problem that can be answered, right?

#

for example idk what the problem here is

#

i often have a hard time thinking of the problem at hand

haughty topaz
#

What would be a good way to add values to the NA values in this dataframe

#

I'm trying to plot three lines but there's values missing

serene scaffold
haughty topaz
#

Cause the data doesn't exist

#

And I guess I'm trying to find a good estimation

serene scaffold
haughty topaz
#

yea

serene scaffold
#

you can see how they insert values that "make sense" given the known values.

haughty topaz
#

ok thx

#
df1[df1["kwartaal"] == "Q1"]["month"] == "january"
#

Is it not possible to add a column on a slice like this?

#

"month" is a new column

serene scaffold
#

also, you have == on the rightmost side, which is not assignment.

#

that said, you can't add a column to a slice. every cell in a column has to have a value, even if it's NaN. (though pandas might initialize values outside the slice to NaN if you do it that way--idk)

rich fiber
#

Can someone confirm this,
"House with 2 bedrooms are cheaper than house with 3bedrooms" is data science
However when predicting prices, it is known as ML

serene scaffold
rich fiber
#

Alright I get you

#

Thanks

serene scaffold
#

but I guess just making a factual statement about a trend in the data is more "data science" than it is ML.

haughty topaz
#
df1.loc[df1["kwartaal"] == "Q1", ["month", "day"]] = "january", "1"
#

yea this adds NA values for all not Q1

serene scaffold