#data-science-and-ml
i got this activity to use eigenface and svm for face recognition
but i am having a hard time understanding the steps
im not a math guy😅
for each class there should be a corresponding eigenface?
and for classifiying ill get the eigenface of the input image then check on which other eigenfaces it is close then it will belong to that class?
https://paste.pythondiscord.com/xiyarahofe.py this is my code, it's a school assignment so I don't expect any direct answer but could someone give me a hint of what I'm doing wrong? I get 4 graphs whereas 3 of them are exactly the same and one of them is completely out of line (broken). Seems like the data I'm trying to get is wrong? The "folkmängd" one seems to be correct, the rest are corrupts. Any idea? (ping on reply)
Hi, I can't really code for what I need to do with this- I need to check the values from a column of a dataframe df1 and see if they match from a column from df2. If they do match, I want to take the corresponding value from the value matched and append another column from df2 to a new column in df1
Hi i want to use UMAP but, i got an error like this? How to fix that?
i cant manage to get object_detection in path, anyone??
Hej hej, I'm building plots with plotly. And after some code checking i found out, that I have a problem with my time data. I'm from Germany and so my time is recorded as dd.mm.yyyy but plotly needs the US Format yyyy-mm-dd. Does anybody knows how to call the datafram what is day and what is month in my data? I have a date and a time column. After calling the time column to timedelta there was no problem to combine both to a date_time column. But still the date is translated wrong.
date time should be 2021-03-05
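A minimal sketch of one way to handle the dd.mm.yyyy question above, assuming the date and time arrive as strings in two columns (the column names and sample values here are made up). Passing an explicit `format` to `pd.to_datetime` removes the day/month guessing:

```python
import pandas as pd

# Hypothetical sample data: German-style dates plus a separate time column.
df = pd.DataFrame({"date": ["05.03.2021", "12.03.2021"],
                   "time": ["08:30:00", "14:15:00"]})

# An explicit format tells pandas that the first field is the day.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")

# Add the time of day as a timedelta to get one datetime column.
df["date_time"] = df["date"] + pd.to_timedelta(df["time"])

print(df["date_time"].iloc[0])
```

`pd.to_datetime(..., dayfirst=True)` is a looser alternative when the exact format varies.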
:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1638960164:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).
tbf, its just a formula - maybe ask a researcher?
yo optimizers on model are the one responsible on adjusting the neurons right?
the the basis of the optimizers on which or how much to adjust is the loss function ?
adam optimizer and categorical crossentropy is what im trying to use on a cnn model
https://www.kaggle.com/supreeth888/linear-algebra-for-machine-learning-1
Hey guys check this notebook and give a upvote if you learned something !! Thank you !!
Any viable way to pd.merge() if df1 is string and df2 has int type?
no, you'd have to add an extra column to one of them where the values to merge on are equivalent, but it would be easy with .astype(...)
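For reference, a small sketch of the `.astype(...)` idea, assuming the string column really holds digits only (the frames and column names are invented for illustration). Here the conversion is done with `assign` on a copy rather than by adding a permanent helper column:

```python
import pandas as pd

# Hypothetical frames: the same ids stored as strings in one and as ints in the other.
df1 = pd.DataFrame({"id": ["34", "52", "27"], "code": ["KXW", "JQW", "WUH"]})
df2 = pd.DataFrame({"id": [34, 52, 12], "value": [1.5, 2.5, 3.5]})

# Convert the string key to int on a temporary copy, then merge on the shared key.
merged = df1.assign(id=df1["id"].astype(int)).merge(df2, on="id")
print(merged)
```

Only ids present in both frames survive, because `merge` defaults to an inner join.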
Which datatype would I change and what to?
I'm not completely sure since I don't know what your dataframes look like. Do print(df1.head().to_dict('list'), df2.head().to_dict('list')) and share the text. Note that I will not look at screenshots.
Ehm, I can give an example.
df1[col1] : KXW453, JQW134, WUHF532, ABFO121
df2[col2]: 34, 52, 27, 12
This is not what I asked for. Please do print(df1.head().to_dict('list'), df2.head().to_dict('list')). This is the only format I will accept.
I can't share the data, an example is what I can share though.
There's no way that the example you gave could be interpreted as dataframes, so I'm at a loss for how to help. Good luck!
Right. I made a mistake.
those are the columns of the DataFrames
up
I tried adding a bit more info if you could look again
Or you could tell me if you need me to add more
df1[col1] : KXW453, JQW134, WUHF532, ABFO121
df2[col2]: 34, 52, 27, 12
so if you see 34, do you want it to go with a value that has 34 at the end?
!e
import pandas as pd
col = pd.Series(['KXW453', 'JQW134', 'WUHF532', 'ABFO121'])
number_col = col.str.extract(r'(\d+)$').astype(int)
print(number_col)
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
001 | 0
002 | 0 453
003 | 1 134
004 | 2 532
005 | 3 121
My initial question was to find a way to match values of two columns, df1[col_1], df2[col_1] of two separate dataframes. The second dataframe, let's call it df2 has another column, df2[col_2] which basically is an index for the code/values in df2[col_1]. I wanted to match the values of df1[col_1] and df2[col_1] and if they match I wanted to add the values from df2[col2] (the index values) to another dataframe.
It seems I don't know how to answer this in the abstract. If you can come up with example dataframes that encapsulate what the problem is, then I'll take a look at that.
this has some pretty good advice, even if you have to make pseudodata: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples
Hi
I need some help
Recently I have completed few python libraries- Numpy, Pandas, and Matplotlib. So i was wondering what is that I can do next ... I want to do some projects. but I don't know where to find then and what's next.
df1['col_1']
0 KPWA231
1 J2F372
2 LKT121
3 TWE443
4 NPR432
df2['col_1']
0 J2F372
1 APQ124
2 KJH152
3 TWE443
4 LKT121
df2['col_2']
0 30
1 17
2 12
3 24
4 52
df1.merge(df2, on='col_1')
?
yes but you have to represent the data directly in the code, since you can't upload the CSV into our execution environment
yes, with !e
Is it possible to do
df1['col_1'] = df1['col_1'].merge(df2, on='col_2')
or rather, what does the on= do?
no, because merge is for merging entire DataFrames
on specifies which column you want to use to merge.
Okay so, is it not possible to perform the merge using specific columns between DataFrames?
it is, but I'm not sure your understanding of what merging is is correct
df1['col_1'] = df1['col_1'].merge(df2, on='col_2')
The result of a merge is an entire DataFrame, not a Series
I'm doubting it too.
I think I'm getting confused between a particular desired column and a resulting DataFrame
I'm looking to do this though.
so you just want a column from one dataframe to also be in the other?
Right, sort of, basically.
sort of basically?
If values in two columns match, I want the corresponding index values (for eg. the relationship between df2['col_1'] and df2['col_1'] to be returned.
you want index values to be returned?
Yes
so you want to know which indices have values in both columns?
you don't want to modify the content of either dataframe?
No, I just need the indices attached to the same values, I don't want to modify the content of the dataframes
what if the indices are different for the same value? does that matter?
Those indices are attached to the values that are matched
but in the two columns, the same values might have different indices
Can I dm you about this?
I'm about to go offline for about an hour, so I probably won't be able to solve it in time
I'd be able to explain it better
The first data set has a column with certain identifiers or codes I guess for products from this sample data. The second dataset has a column with different yet some same identifiers or codes and also has a column for indices for these identifiers. My aim is to see if the identifiers match. If the identifiers match, I want to store those specific indices in a column, if some identifiers don't match, I want to return "-" or NaN for that instance.
If you could take a look at it whenever you're available it'd be really helpful.
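One possible reading of the description above, sketched with the toy values posted earlier: look up each identifier from df1 in df2 and leave NaN where there is no match. The new column name `matched_col_2` is made up, and the sketch assumes identifiers in df2 are unique:

```python
import pandas as pd

# Toy data modelled on the columns described in the chat.
df1 = pd.DataFrame({"col_1": ["KPWA231", "J2F372", "LKT121", "TWE443", "NPR432"]})
df2 = pd.DataFrame({"col_1": ["J2F372", "APQ124", "KJH152", "TWE443", "LKT121"],
                    "col_2": [30, 17, 12, 24, 52]})

# Build a lookup Series (identifier -> index value) and map df1's identifiers through it.
lookup = df2.set_index("col_1")["col_2"]
df1["matched_col_2"] = df1["col_1"].map(lookup)
print(df1)
```

Identifiers with no partner in df2 come out as NaN, which can be replaced afterwards with `fillna("-")` if a dash is preferred.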
Hi do you know to fix this problem?
try just x = UMAP()
i think you need to do umap.umap_() if you wanna use im not sure tho
i get an error like this
Do you only have those two columns or are there others in the dataframes?
Others in the dataframe
Okay. So when you say you want to merge, you want to combine all the other columns too, right?
I want to match the two columns I mentioned that I want to match. If some values match, I want to resulting dataframe column to contain the index of the ones that match and the ones that didn't should have a blank or a zero or a NaN instead of them
anyone can help me?
A toy example is something like this right?
df1 = pd.DataFrame({
'col_1': ['KPWA231',
'J2F372',
'LKT121',
'zTWE443',
'NPR432'],
'other_col': ['a', 'b', 'c', 'd', 'e']})
df2 = pd.DataFrame({
'col_1': ['J2F372',
'APQ124',
'KJH152',
'TWE443',
'LKT121'],
'col_2':[30,
17,
12,
24,
52],
'another_col': ['q', 'w', 'e', 'r', 't']})
there might be a method insude UMAP you want to use
Or instead of that, how can I merge a productId column with the index column. Since the datatype of both of those columns is different
so it would be UMAP.method()
This would be the toy data.
According to that data how can I merge df1[col_1] with df2[col_2] on df2[col_1]
like this
Is the result supposed to look like this:
result = pd.DataFrame({
'col_1': ['KPWA231',
'J2F372',
'LKT121',
'zTWE443',
'NPR432'],
'other_col': ['a', 'b', 'c', 'd', 'e'],
'col_2': [pd.NA, 30,52, pd.NA, pd.NA ],
'another_col': ['', 'q', 't', '', '']
})
result
I.e., the rows for J2F and LKT are common for col_1 between the two original dataframes. You keep all the original values from df1 and use null values for df2
remove the () after UMAP
an error like this
can you show me a snippet of the module?
this
yeah theres no method called fit_transform in it
I see an other people can run that code very well
I just wanted the result column basically.
but, what should I write of code?
This will produce the result I pasted above
df1 = df1.set_index('col_1')
df2 = df2.set_index('col_1')
merged = df1.join(df2i, how='left')
merged = merged.reset_index()
What does set_index() do?
what tutorial were you using?
youtube
Every dataframe has an index. By default it is just a row number from 0 to N. To do things like merges and joins you have to ensure that the indices are the same. Since you wanted to join on the value in column 1, it makes sense to use it as the index
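As a side note, `DataFrame.merge` can also match on a named column directly, without setting the index first. A short sketch with made-up rows, not the approach posted in the chat:

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": ["KPWA231", "J2F372", "LKT121"], "other_col": ["a", "b", "c"]})
df2 = pd.DataFrame({"col_1": ["J2F372", "LKT121", "APQ124"], "col_2": [30, 52, 17]})

# how="left" keeps every row of df1; unmatched rows get NaN in df2's columns.
# indicator=True adds a "_merge" column saying where each row came from.
merged = df1.merge(df2, on="col_1", how="left", indicator=True)
print(merged)
```

The `_merge` column is handy for checking which identifiers found a partner.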
then just check the yt vid my man
I'm getting the error Series has no attribute set_index() when I try to do that
You are probably writing something like
df1['col_1'].set_index('col_1')
Instead, do
df1.set_index('col_1')
why is there an i in merged = df1.join(df2i, how='left')
I'll have to work on this tomorrow though since it's pretty late for me now, thank you for the help. I'll reach out to you and i'll let you know if it works for me.
That was a typo, you can remove the 'i'. I had added it because I wanted to test a few things before pasting them here and had made a couple different versions of the dataframes
yo anyone here can help me understand this ? https://sites.cs.ucsb.edu/~mturk/Papers/mturk-CVPR91.pdf
like the simplified steps?
this is what i think it is
yo anyone here can help me understand eigenfaces?
- flatten the images to make vectors
- get the mean face by getting the average vector of the vectors
whats next?
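A rough sketch of the usual next steps, using random arrays in place of real face images (sizes and `k` are arbitrary choices). The eigenfaces are shared by the whole training set rather than one per class. Each image is reduced to a short vector of weights, and a classifier such as an SVM would then be trained on those weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 20 random "images" of 8x8 pixels (real use would load face images).
images = rng.random((20, 8, 8))

# 1. Flatten each image into a vector: shape (n_images, n_pixels).
X = images.reshape(len(images), -1)

# 2. Compute the mean face and subtract it from every image.
mean_face = X.mean(axis=0)
X_centered = X - mean_face

# 3. PCA via SVD: the rows of Vt are the principal directions, i.e. the eigenfaces.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
k = 5  # number of eigenfaces to keep (an arbitrary choice here)
eigenfaces = Vt[:k]

# 4. Project each face onto the eigenfaces to get k weights per image.
weights = X_centered @ eigenfaces.T

# A new image is centred with the same mean face and projected the same way.
new_image = rng.random((8, 8))
new_weights = (new_image.reshape(-1) - mean_face) @ eigenfaces.T
print(weights.shape, new_weights.shape)
```

The SVM step is left out here. It would take `weights` as features and the person labels as targets.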
i got this activity to use eigenface and svm for face recognition
but i am having a hard time understanding the steps
im not a math guy😅
you should probably learn some basics of python before diving into something like umap
also fit_transform is part of the transformer api in sklearn, so spend a bit of time learning about the transforms in sklearn
define properly explaining
Course materials and notes for Stanford class CS231n: Convolutional Neural Networks for Visual Recognition.
this is straightforward
hi is someone free to help rn? i just started working with pandas
Hi, is someone free to discuss with me where should I start with making automatic script for reading information from ID picture?
Try asking your question about pandas. Then people can see if they can help. Remember to share DataFrames as text with df.head().to_dict('list') and not as screenshots.
As if I just finished training my Tensorflow model after 1 month and finally try to predict it and all of my outputs are: [1,0,0,0,0,0,0,0,0,0.....]
rip
Does anyone with Tensorflow experience know how to correctly combine an image and question input?
My Images are of shape: (36,2048) and my questions are of shape: (64)
I didn't realise before because when I was using my batches, the error never comes up, but if I try to predict just one image and question from my dataset, I get the error:
ValueError: Data cardinality is ambiguous:
x sizes: 36, 64
Make sure all arrays contain the same number of samples.
This probably explains why my answer outputs are always [1, x] where x is 63 zeroes. So clearly my training hasn't worked correctly.
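One common cause of this error is a missing batch axis on a single sample. A hedged sketch with placeholder arrays of the shapes quoted above, assuming the model takes the image and the question as two separate inputs:

```python
import numpy as np

# Single sample with the shapes quoted above (values are placeholders).
image = np.zeros((36, 2048), dtype=np.float32)
question = np.zeros((64,), dtype=np.int32)

# Without a batch axis, the first dimension (36 vs 64) is read as the sample count.
# Adding a leading axis of length 1 makes both inputs "one sample".
image_batch = np.expand_dims(image, axis=0)
question_batch = question[np.newaxis, :]

print(image_batch.shape, question_batch.shape)
```

The two batched arrays would then be passed together, for example as `model.predict([image_batch, question_batch])`, if the model was built with two inputs in that order.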
Just figured this out, I was a bit of an idiot. sklearn thought I was doing a classifier because for some reason I made the model a classifier when it was supposed to be a regressor. When it tried to convert my continuous data to classification labels, it figured out something was wrong and threw that error.
I hope you were joking about running training for a month. If not, it's a reminder why it's always a good practice to do a sanity test using a small amount of data/epochs/layers, etc
👀
I mean, I say a month, most of that was unsaved and restarted (checkpointed every few epochs)
but im basically at the start again because some random thing seems to have taught my model what an "unknown" token is more than anything else
I wish I could remember more details, but there's some professor with a computer in his office that has been running a numerical experiment since the 90s. Would suck if it finally finishes but he had a bug in his code
i mean that's basically what I've done lmao
idek where to start on fixing it, I thought my training was nailed down
I'm really hoping the error is because of the extra dimension but god knows haha
even if it is the extra dimension, I still have a problem in changing that to be compatible
ah it's all good, my model.evaluate loss is tiny... (that is a joke)
Hey guys
do you guys have information abt MLOps ?
what tools to use to work as MLOps engineer ?
Just released:
A free Deep Learning course using Pytorch and starting at the very beginning. Including some cool stuffs:
• Transformers
• Generative models
• Self-supervised learning
Course Overview!
nice
Hey guys.. what kind of join/append/concat is needed to do following:
I'm doing it in pyspark.. but really I can use anything.. the actual DFs I'm using are pyspark DFs.. but I can convert. The above is just an example of what I'm trying to do.
This is called an outer join.
I was afraid of an outer join dropping IDs (the common) that don't have a partner
But if it won't.. then THANK YOU 😁
An outer join is where every row from both the left and right is retained, even those that don't have a match in the other. The rows that can't be matched end up with null values in those cells that can't be filled.
An inner join, on the other hand retains only those rows that can be matched on both the left and right.
In set terminology, an outer join is union and an inner join is intersection.
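The union/intersection picture can be checked on a toy example. This sketch uses pandas rather than pyspark purely for illustration, with made-up ids:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"id": [2, 3, 4], "b": [20, 30, 40]})

# Outer join: every id from either side, with nulls where there is no partner.
outer = left.merge(right, on="id", how="outer")

# Inner join: only ids present on both sides.
inner = left.merge(right, on="id", how="inner")

print(sorted(outer["id"]))
print(sorted(inner["id"]))
```

Ids 1 and 4 have no partner, so they appear only in the outer result, each with one null cell.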
Yeah, 99% of the time at work in doing inner and left joins.. thank you!
no problem 
Stelercus big brain
Man Tensorflow has amazing documentation until I try to figure out things like masking because I only found it after A LOT of specific googling
Imagine building and processing a model for a month and all you teach it is what an <unk> (unknown token) is
I have spent 12 hours trying to fix it and my Tensorflow model still only outputs this ,-,
whats the best algorithm to build the model for this problem
why couldn't you copy that line of text into the chat as text?
What information in tweets would tell you about "trends of various crypto-currencies"?
what is a "trend" in the context of crypto?
like how much tweet in a day about bitcoin
what determines if a tweet is or is not about bitcoin?
i want to group all the tweets which contain word="bitcoin"
if that's all, you can just use the tweepy library and retain tweets with the substring bitcoin in them.
and also find the sentimental anlysis
the twitter API gives you a sentiment score.
whether or not their model for assigning that score is good... 
do i need to use db for vizualising the trend?
db?
database to store data
I don't see the connection between data storage and data visualization.
what will the two axes of your graph represent?
x axis date,y axis amount of tweets
you don't need to hold the content of the tweets for that. just a running count of tweets of interest by day.
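A minimal sketch of the running-count idea, with hard-coded (timestamp, text) pairs standing in for tweets fetched from the API. The fetching itself is not shown, and the sample tweets are invented:

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, text) pairs standing in for collected tweets.
tweets = [
    (datetime(2021, 12, 8, 9, 0), "Bitcoin is up again"),
    (datetime(2021, 12, 8, 17, 30), "nothing to see here"),
    (datetime(2021, 12, 9, 8, 15), "selling my bitcoin"),
    (datetime(2021, 12, 9, 22, 5), "BITCOIN to the moon"),
]

daily_counts = Counter()
for created_at, text in tweets:
    # Case-insensitive substring check; only the count is kept, not the text.
    if "bitcoin" in text.lower():
        daily_counts[created_at.date()] += 1

for day, count in sorted(daily_counts.items()):
    print(day, count)
```

The sorted days and counts map straight onto the x and y axes described above.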
oh okay,thx.my class teacher gave the problem statement told us to do build the model
but i just started learning ml
its hard to understand problem statements
yo anyone can help me understand the math of eigenfaces?
Did you ever solve the issue of torch.cuda.is_available() returning false in Conda even though you install the Gpu version. I’m on win10, with a 2080 most recent drivers.
who else is taking the IBM data science coursera course
Is anyone out there who expart in EEDG data visualization !
Plz contact with you
@pastel valley @marsh yacht @gusty sparrow it sounds like you're all waiting for someone before you ask a question. Don't ask to ask--just ask.
@serene scaffold
Hi ! Can you help me in my project to visualize EEDG data ?
idk what that is. But again, don't ask if people know stuff. Just ask your question.
I appreciate that you're being polite, but in a real time chat such as this, the easiest way to give and receive help is to just put a question right out there. What data do you have? What have you tried to do to visualize it?
@serene scaffold
Ok brother now i understand your concern .
Thanks to your politeness .
If you have n observations from a pareto distribution (shape param 2, scale 1) and an expected value of 2, can you make the claim that the sqrt(n)*(x bar - 2) does not converge to a normal distribution?
Struggling with the central limit theorem here
Cause this does not contradict the CLT but I can't figure out why
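A sketch of why there is no contradiction, assuming the standard Pareto density with shape 2 and scale 1. The classical CLT requires a finite variance, and here the second moment diverges:

```latex
\begin{align*}
f(x) &= \frac{2}{x^{3}}, \qquad x \ge 1, \\
\mathbb{E}[X] &= \int_{1}^{\infty} x \cdot \frac{2}{x^{3}} \, dx
              = \int_{1}^{\infty} \frac{2}{x^{2}} \, dx = 2, \\
\mathbb{E}[X^{2}] &= \int_{1}^{\infty} x^{2} \cdot \frac{2}{x^{3}} \, dx
                  = \int_{1}^{\infty} \frac{2}{x} \, dx = \infty .
\end{align*}
```

So the mean is 2 but the variance is infinite, and the hypothesis of the classical CLT is not met for $\sqrt{n}\,(\bar{X}_n - 2)$.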
- Write your code to construct a dictionary variable data_large_countries whose keys are iso_code,
total_cases_per_million, total_deaths_per_million, population, population_density.
You can start from the original dataset data_dict.
You can adapt the code given to you in Instruction 2.
Print the dictionary.
sorry to ask this ....but can anyone help me with this
im a beginner
my question is has everyone know the eigenfaces theory maybe someone can simplify it for me because i need to do face recognition with it but i dont understand the theory behind it
- Batch training can solve this RAM issue. Load your data in batches. Use the `chunksize` parameter when loading your data in pandas.
  Please do a quick Google search to see how batch training is done. I'm busy at the moment so I can't write code this time.
- Since you're using the Sequential method to build your neural network, it's your prerogative to determine the number of hidden layers and nodes your network should have.
- Then on your second question about input shape, this should handle it.
It's just an example
Assuming each of your hidden layer has 100 nodes and you're performing a classification problem.
input_data = df.drop(['target_column'], axis=1)
n_cols = input_data.shape[1]
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
.
.
.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics = ['accuracy'])
model.fit(input_data, target)
Then proceed from here to prediction, then evaluation.
If you're working with a more complex architecture, you should then configure yours appropriately.
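On the `chunksize` suggestion above, a small sketch of what chunked loading looks like in pandas. An in-memory CSV stands in for a file too large to load at once, and the per-chunk work is just a sum:

```python
import io
import pandas as pd

# In-memory CSV standing in for a file too large to load at once.
csv_text = "x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

total = 0
n_chunks = 0
# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["y"].sum()
    n_chunks += 1

print(n_chunks, total)
```

In a training loop, each chunk would be preprocessed and fed to the model in turn instead of being summed.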
Hello peeps!! I have a dictionary that looks like this. (see image)
year_wise_topk = {}
for i,year in enumerate(os.listdir(years_path)):
topics,topic_prob,topic_ids = prob_topwords(reverse_vocab,year)
year_wise_topk[str(i)+'_topic'] = topics
year_wise_topk[str(i)+'_topic_prob'] = topic_prob
year_wise_topk[str(i)+'_topic_ids'] = topic_ids
I have this code in which topics, topic_prob and topic_ids are lists
I want to plot the probability of for each year with their respective words....
How can i do that
?
the plot should look something like this... or any way to plot it
Ohh, it's a time series analysis, I understand. Have you tried using colab?
can anyone make sense of what these variables in the k-means algorithm means?
I haven't worked with hdf5 dataset but I'm pretty sure it's gon follow the same syntax you'd normally use to load a csv data or image data into keras
It's still gon follow this syntax
haha, ya i did but i still dont know what caused the problem because i was switching from machine to machine and it eventually worked
SageMaker is a strong contender for those starting out in deep learning and almost a straight upgrade from the free version of Colab. Compared to Kaggle and Colab Pro’s P100, SageMaker’s T4 can be faster in Mixed Precision training, while significantly slower in single precision training.
I changed to a new env on Conda and got it workng 🤷🏻♂️
ya that sounds ab right
idk, when i had the problem it very well couldve been bc of the fact that my windows was saying it wasnt legit even tho it is
maybe some features were turned off?idk
Hey guys, any good math sources to learn AI oriented?
or not AI oriented, just good stuff you know of 😉
feel free to tag me
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 10 11:14:59 2021
@author: HP
"""
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
iris = pd.read_csv("Downloads/iris.csv")
iris.head()
iris.plot(kind = "scatter",x = "sepallengthcm", y = "sepalwidthcm")
hello all
I want to do some investigation about the vector distance between each image augmentation and the embedding model (ex: VGG,Resnet,Mobilenet,etc.). Since i use milvus and normalize each vectors to use dot product distance (which give same output as cosine similarity) instead euclidean distance, i need to visualize each result grouped by model based on each object. Is there any tools that i can use for this task?
My code this way
When I see my month_df in expiry column i am getting nan values
But when I check for single month using break statement it worked as expected but
When I remove break then I am getting nan values in expiry copumn
Please check code of for loop block
Ping me when replying
U can see in last expiry column of this df
I am getting nan values
Assuming 4 bytes per entry - an array of that shape is over a terabyte of data? It's not really practical to try and do that sort of thing on one machine?
Hey there, could someone help me with people motion tracker in opencv?
if anyone has experience implementing gradient descent in python and is willing to help me get my head around some things, please drop me a message or an @ it would be greatly appreciated.
Given I've never used either before, is it wise to use jupyter lab or notebook?
vscode jupyter extensions seem pretty attractive
I find google colab much better, with the same functionality
Hmm interesting
Is that the one that also offers ML compute integrated?
all in browser, allocates resources to your environment and is very good for all but very large scale ML operations
oh right
hmm i thought i heard about people abusing some google service to mine crypto
I'm a student, with a ML module but was initially instructed to use jupyter and me and nearly all my class migrated to colab. So i'm (very) far from an expert but can recommend it on personal experience 🙂
and if you decide to switch they both support the same file formats (namely .ipynb)
@whole phoenix You can use == to compare numpy arrays.
You get back an array.
Try comparing two arrays of the same shape and dtype. Equal objects in the same location of each array tested will cause there to be a True in the array at the same position.
If you're only after one value at a time, however, you can compare a whole array to a single value. You'll get an array with True where it matches.
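A tiny illustration of both comparisons, with arbitrary values:

```python
import numpy as np

a = np.array([1, 0, 1, 1])
b = np.array([1, 1, 1, 0])

# Elementwise comparison of two same-shaped arrays gives a boolean array.
print(a == b)

# Comparing against a single value broadcasts that value over the array.
print(a == 1)
```

Either boolean array can be used directly as a mask, e.g. `a[a == 1]`.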
@sterile heath oooh you may be able to help me with a similar issue, I have to explore a mnist dataset and want to extract just the items that have a label of '1' but have been struggling to do so, okay if you haven't worked with them before jsut thought it's worth an ask whilst you're here 🙂
Extract. What does this mean?
Tally?
What do you mean by labelled?
Fair warning, I'm a numpy novice.
I'm picking things up, but I'm no expert.
Oooh. Okay. I just looked up mnist.
Yeah, that's computer vision/ML-related stuff.
yeah, it's a pain haha i'll post my question regardless in case someone else is hovering around
Really not my area.
the dataset I am exploring is for a binary classification problem, so each image is tagged with a label 1 or 0, i'm trying to extract those which are labelled just 1 so that I can print examples, but am finding it hard to do so when it seems like it should be easy
"Don't know how? Ha! Go pound sand."
I can't wait to fail this module and be done with it haha
I assume you're asking about one of the npz files?
I'm not too sure, i'm exploring just the breastmnist dataset
That tracks.
hello ,i am trying to map my label df with the images in my training folder
but i cant understand where i am doing wrong
```py
import os

LABELS_g = []
for root, dirs, files in os.walk(r'D:\glaucoma_train\ODIR-5K\ODIR-5K\Training Images', topdown=False):
    for name in files:
        print(name)
        #img_name_g = name
        #print(img_name_g)
        row_g = df_labels_g.loc[df_labels_g['image'] == name]
        print(row_g)
```
it shows empty df
@crisp cargoWhat's the images list name and what's the labels list name?
['train_images', 'val_images', 'test_images', 'train_labels', 'val_labels', 'test_labels']
Which are pertinent?
yo anybody here have knowledge on eigenfaces implementation?
just a general question:
I am starting to read up on tensorflow and its implementations. I have a set of csv files with one relevant class variable that i want to predict.
I want to build a predictive model for said variable with tensorflow. It would be nice if you could give me some basic pointers to research, on how i would do such a thing :D
I have no experience with tensorflow outside of looking at some basic programs.
@crisp cargo So I've come up with two solutions. Both start like this.
```py
import numpy as np
with np.load("breastmnist.npz") as file:
    data = dict(file)
```
Now, from here, you can use Python's `zip`, which is handy. If you haven't used it before, you can iterate over two or more things at a time, which is great if you want to do pairwise things. e.g.
```py
zip(data["val_images"], data["val_labels"])
```
I did it with a list comprehension with an `if` in it, but you don't have to.
The second way you could grab the images you want is through use of `numpy.where` and `numpy.take`. You can use `data["val_labels"]` in `np.where`, then plug that (`[0]`), along with `data["val_images"]` into `np.take` (`axis=0`), giving you your desired arrays.
!e
import numpy as np
things = np.array(["Apples", "Pears", "Grapes", "Oranges", "Bananas", "Lychees", "Watermelon"])
indexes = np.array([1,3,6])
print(np.take(things, indexes))
@sterile heath :white_check_mark: Your eval job has completed with return code 0.
['Pears' 'Oranges' 'Watermelon']
!e
import numpy as np
things = np.array([False, True, False, True, False, False, True])
indexes = np.where(things)[0]
print(indexes)
@sterile heath :white_check_mark: Your eval job has completed with return code 0.
[1 3 6]
This works on anything that's truthy within the array.
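Putting the pieces together for the "label 1" question, here is a sketch on synthetic arrays. The `(N, 1)` label shape and the 28x28 image size are assumptions about the `.npz` layout, not something confirmed in the chat:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for data["val_images"] and data["val_labels"].
# The (N, 1) label shape is an assumption about the .npz layout.
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = np.array([[1], [0], [1], [1], [0], [0]])

# Flatten the labels to shape (N,) and keep the images whose label equals 1.
mask = labels.ravel() == 1
ones = images[mask]

# Equivalent route via np.where and np.take, as described in the chat.
idx = np.where(mask)[0]
ones_take = np.take(images, idx, axis=0)

print(ones.shape)
```

Boolean-mask indexing and the `np.where`/`np.take` route select the same images.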
hello i have a dataframe
my code this way
for i in unique_months:
    print('i=', i)
    if i == 'March':
        continue
    get_month_data = df2[df2['month']== i]
    # check if last thursday is present or not otherwise take wednesday data
    get_data = get_month_data[get_month_data['week_day']== 'Thursday']
    # if thursday is not present then get wednesday data
    get_expiry_data = get_data['nf_date'].iloc[-1]
    get_expiry_date = get_data['nf_date'].iloc[-1]
    print('expiry=', get_expiry_date)
    month_df = month_df.append(get_month_data)
    month_df['expiry'] = get_expiry_date
    break
when i use break statement then code works fine when i remove break statment then i get last value to month_df['expiry'] column
my df using break statment please check expiry column
when i remove break statment then i get this way please check expiry column
this value is appended to all rows
I don't really understand what the code snippet is trying to do
Like what's the intent of that for loop?
I can't interpret what's correct and incorrect
see i am getting each month data and their last date value
then i am getting the last date as expiry and i am appending it to expiry column
Where is the month_df dataframe defined?
see first ss is correct when i do my task for one month using break statment
```python
month_df = pd.DataFrame()
for i in unique_months:
    print('i=', i)
    if i == 'March':
        continue
    get_month_data = df2[df2['month']== i]
    # check if last thursday is present or not otherwise take wednesday data
    get_data = get_month_data[get_month_data['week_day']== 'Thursday']
    # if thursday is not present then get wednesday data
    get_expiry_data = get_data['nf_date'].iloc[-1]
    get_expiry_date = get_data['nf_date'].iloc[-1]
    print('expiry=', get_expiry_date)
    month_df = month_df.append(get_month_data)
    month_df['expiry'] = get_expiry_date
```
do u get my point ? @loud cave
So it's an empty dataframe and you're filling it in that loop?
can u show how way u are saying ?
@loud cave do u get my point by the way what i am trying to do ?
Anyone experienced with Nifi that could give me a hand?
i tried this way: `month_df = df2.month.apply(lambda x:x )` can u please look into this? how i can do this?
Hey! Does any of you have a tutorial on AI simulations in Python?
Hello guys !
I'm exploring jupyterlite, and i was wondering if it's possible to access the DOM from a DedicatedWorkerGlobalScope (which is what we have access to with pyodide from inside a notebook) ?
cause i would like to communicate with my web interface
Nevermind, just forgot to run as admin
```
i= March
i= April
expiry= 4/27/2017
i= May
expiry= 5/25/2017
i= June
expiry= 6/29/2017
i= July
expiry= 7/27/2017
i= August
expiry= 8/31/2017
i= September
expiry= 9/28/2017
i= October
expiry= 10/26/2017
i= November
expiry= 11/30/2017
i= December
expiry= 12/28/2017
i= January
expiry= 1/18/2018
i= February
expiry= 2/22/2018
```
i want to fill each date with respected month
so i get month with its respective expiry value in dataframe
ping me when replying
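In the loop posted above, `month_df['expiry'] = get_expiry_date` assigns to the whole column on every pass, so the last month's value ends up on all rows. One way around that, sketched on a made-up frame with the same column names, is to compute the expiry per month once and map it back. The Wednesday fallback and the skipped March are not handled here:

```python
import pandas as pd

# Toy frame with the column names from the chat; the values are made up.
df2 = pd.DataFrame({
    "month": ["April", "April", "April", "May", "May"],
    "week_day": ["Thursday", "Friday", "Thursday", "Wednesday", "Thursday"],
    "nf_date": ["4/20/2017", "4/21/2017", "4/27/2017", "5/24/2017", "5/25/2017"],
})

# Last Thursday date of each month (rows are assumed to be in date order).
thursdays = df2[df2["week_day"] == "Thursday"]
expiry_by_month = thursdays.groupby("month")["nf_date"].last()

# Map each row's month to that month's expiry, so values differ per month.
month_df = df2.copy()
month_df["expiry"] = month_df["month"].map(expiry_by_month)
print(month_df)
```

This also avoids `DataFrame.append`, which was removed in pandas 2.0.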
Hi everyone, i'm trying to create a plot that look like this. But i dont know what is the name of it in order to google it. Anyone knows what this type of plot is called?
this looks like an (in matplotlib terms) imshow/matshow - plotting an array as an image, one cell per pixel.
thank you so much
:incoming_envelope: :ok_hand: applied mute to @placid horizon until <t:1639152401:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).
Hello everyone. Please I need help creating a TFrecord file out of this raw data below. Thanks.
raw_data = [
{'x': 1.20},
{'x': 2.99},
{'x': 100.00}
]
I have tried but not getting it.
How to use annotations in px.choropleth ?
I want to plot a numpy array but i got an error like this. How to fix it?
Please show the code that caused this. Also, it's a lot more helpful to show everything (code and error messages) as text.
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
It looks like you're dealing with DataFrames, which are not quite the same as arrays.
This is my code
I do not look at screenshots of text.
I cannot help. Good luck!
Heyy, i have a question
I'm trying to do a multiclass classification, there are 3 classes norrmal glaucoma and dr
I have 4 image datasets for glaucoma. To preprocess these should i combine them and put them in a single folder?
If so, how do i go about it . Please guide me ,thank youu!
I have never done image recognition, but you can store the data however you want.
Probably want to split it between training and testing data?
I'm assuming you are trying to make a model right
no no, i just want to preprocess the training data
What library are you using
im planning on using tensorflow
If it's keras then you have to split your data into 2 arrays, the features and the labels, since those are the inputs when you train the model
Yeah keras is like a submodule of tensorflow
(would recommend using numpy arrays since that's what everyone uses)
Yeah this is exactly what i wanna do
Oh wait
Im still learning this
But before i get it in that form
Exactlyy
I need to preprocess it
I forgot but either numpy or tensorflow has a helper function to reduce an image to a certain ratio
I am very confused cause i have 3 image datasets w labels and idk how to combine
Ok well I dont know any good ways to get jpg images to numerical data like a neural network needs
Maybe someone else can help with that
I can convert it into a numpy array
So ima go
I just wanna know that if i have 3 kaggle datasets how should i merge it
Oh shit alrightt!!
Yeah idk
you can partition the data between the two in memory with sklearn, though it might be helpful to save the data as it's divided to disk if you need it to be the same every time.
I don't wanna split it into train an test as of yet
I wanna preprocess my datasets
I have got 3 kaggle image datasets
Should i combine them,put them in a single folder and do it?
in what way are you going to preprocess them?
resize,grayscale and augment them
you can put them all in one folder if you want. It's just a matter of opening files from one folder or three.
you are right! would i have to make any changes in the names of the labels' columns or i can leave that as is
I don't know enough about what you are trying to do to comment.
they are 3 image datasets, with labels showing if they are glaucoma or not. The last question i have is whether i can save all these images in the same numpy array even if they are in different datasets..will it make any difference
whether i can save all these images in the same numpy array even if they are in different datasets
the real question is "can I have all these images in one numpy array if they are different sizes?"
and the answer is no. arrays have to be "rectangular"
if i normalize the images?
ohh
any direction you slice an array, the shape of each slice has to be the same
I'm not sure how this limitation is overcome in image processing because I have never done it.
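One common way around the "rectangular" restriction is to resize every image to a single shared shape before stacking. Below is a minimal sketch, assuming grayscale images that are already loaded as 2-D NumPy arrays. `resize_nearest` is a hypothetical helper written only for this illustration (real pipelines usually call an image library's resize function instead), and the three random arrays stand in for images from three different datasets.

```python
import numpy as np

def resize_nearest(image, out_height, out_width):
    """Resize a 2-D array with nearest-neighbour sampling (illustrative helper)."""
    in_height, in_width = image.shape
    # For each output row/column, pick the index of the matching source pixel.
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    # rows[:, None] has shape (out_height, 1); broadcasting gives the full grid.
    return image[rows[:, None], cols]

# Made-up "images" of different sizes, standing in for three datasets.
rng = np.random.default_rng(0)
images = [rng.random((30, 40)), rng.random((64, 64)), rng.random((100, 80))]

# Once every image has the same shape, they can share one array.
resized = [resize_nearest(img, 32, 32) for img in images]
batch = np.stack(resized)

print(batch.shape)  # (3, 32, 32)
```

Which dataset an image came from does not matter to the array; only the shape of each image does. The labels can be kept in a second array of the same length.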
ah alrightt!thank you soo much, i will google a bit more on this
I fear I was not particularly helpful. Good luck!
no no, i got my answers.was mainly confused about the datasets and their labels, thanks a lott:))
For a keras dense layer, what would be the best bias_initalizer?
The default is zero but I want it to start off with some amounts of bias
Is there anyone who can help with learning the first few simple steps to make an ai voice assistant, if there is anyone please ping :)
Hey, my tensor flow model is giving me the following errors when i try to fit the model:
model.fit(training_vals, training_labels, epochs=10)
ValueError: logits and labels must have the same shape, received ((None, 2) vs (None, 1)).
training_vals is shaped like this: (191164, 34)
training_labels like this: (191164,)
what might be causing this issue?
try reshaping training_labels to (191164,1) so that it's a column vector and not a regular vector.
do i just training_labels.reshape(191164,1) to do that?
I think so.
o7, will try
I'm a bit confused as to why the shapes in the error message are given as (None, x)
idk what None even means in that context. When you take slices of an array, None is used to insert a new axis. So training_labels.reshape(191164,1) should have the same effect as training_labels[:, None]
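A small sketch of the two spellings mentioned above, using a made-up four-element label array. It only illustrates the NumPy behaviour; it is not the poster's data.

```python
import numpy as np

labels = np.array([0, 1, 1, 0])            # shape (4,)

as_column_reshape = labels.reshape(-1, 1)  # -1 lets NumPy infer the length
as_column_none = labels[:, None]           # None inserts a new axis of size 1

print(labels.shape)                        # (4,)
print(as_column_reshape.shape)             # (4, 1)
print(np.array_equal(as_column_reshape, as_column_none))  # True

# reshape returns a new array object; the original keeps its shape
# unless the result is assigned back to the same name.
print(labels.shape)                        # (4,)
```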
new output when i print the shapes:
(191164, 34)
(191164, 1)
ValueError: logits and labels must have the same shape, received ((None, 2) vs (None, 1)).
error is still there
are you aware that reshaping an array/tensor is not done in-place?
it returns a new array/tensor
yes
without seeing the exact code, I don't know why you're still getting the error.
training_labels = training_labels.reshape(191164,1)
that is not enough of the code to infer what the problem is. I need to see everything between this and the line causing the error.
this is how i construct the training data:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(anlage1.drop(labels=['Fehlernummer'], axis=1), anlage1['Fehlernummer'], test_size=1/3, random_state=0)
training_vals = np.array(x_train, dtype="float32")
training_labels = np.array(y_train.values, dtype="uint8")
print(training_vals.shape)
training_labels = training_labels.reshape(191164,1)
print(training_labels.shape)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(34,activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(34,activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(34,activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.sigmoid))
model.compile(optimizer='adam',
loss=tf.keras.losses.binary_crossentropy,
metrics=[keras.metrics.FalseNegatives(name="fn"),
keras.metrics.FalsePositives(name="fp"),
keras.metrics.TrueNegatives(name="tn"),
keras.metrics.TruePositives(name="tp"),
keras.metrics.Precision(name="precision"),
keras.metrics.Recall(name="recall")])
model.fit(training_vals, training_labels, epochs=10)#, class_weight={0:0.06178233806436323645, 1:0.93821766193563676355})
my data comes from a pandas dataframe that reads from a csv file
so anlage1 is a pandas dataframe
did you confirm that all the values in the DataFrame are numeric?
yes
i make them numeric after reading from the csv
i dropped the datetime column too, after reading from the csv
for cols in anlage1.columns:
    if cols != 'Date':  # compare strings with !=, not "is not"
        anlage1[cols] = pd.to_numeric(anlage1[cols])
(None, x) means (from painful hours of debugging actually 3 days ago) (1, x)
should make them numeric, right?
in terms of shapes right
hmmmmmmmm
sounds like training_vals and training_labels isn't the source of the problem
that would mean it is the model expecting something shaped like (1, 1)?
if it is saying it wants (none, 1)?
If its asking for (None, 1) then ye
but... why?
¯_(ツ)_/¯
lol
after hours of debugging you dont even question the solution
I just did .reshape(1, 64) when I used .predict() (in my case it wanted (None, 64))
I'm assuming you can do something similar with your dataset?
Not 100% sure
I am just using model.fit not predict.
i did reshape my array, but that did not help.
OOOO
could it be that it wants 2, because my last dense layer goes down from dense(34, activation=tf.nn.relu) to a Dense(2, activation=tf.nn.sigmoid)?
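A note on the shapes in this error: in Keras shape tuples, None stands for the batch dimension, whose size is not fixed in advance, so (None, 2) reads as "any number of rows, 2 columns". The guess in the last message matches the error text: a final Dense(2) layer produces two columns per sample, while the labels have one. The sketch below uses plain NumPy arrays as stand-ins for the layer outputs (no model is built here) to show the two usual ways to make the shapes agree; which one fits depends on the problem.

```python
import numpy as np

batch_size = 5
labels = np.array([0, 1, 1, 0, 1], dtype="uint8").reshape(batch_size, 1)  # (5, 1)

# Stand-in for the output of a final Dense(2) layer: one column per unit.
two_unit_output = np.zeros((batch_size, 2))
# Stand-in for the output of a final Dense(1) layer.
one_unit_output = np.zeros((batch_size, 1))

print(two_unit_output.shape == labels.shape)  # False: the mismatch in the error
print(one_unit_output.shape == labels.shape)  # True: option 1, a single output unit

# Option 2: keep two output units and one-hot encode the labels to (5, 2).
one_hot_labels = np.eye(2, dtype="float32")[labels.ravel()]
print(one_hot_labels.shape)                   # (5, 2)
```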
So I am trying to connect to my deep learning vm by using ssh port forwarding
ERROR: (gcloud.compute.ssh) argument [USER@]INSTANCE: Must be specified.
I get this error. How do I fix it
sounds like this is more a networking issue than an AI issue.
right
the error sounds pretty self explanatory
just supply the username and instance for your instance
i want to pick a discrete int from [0, 1, 2... 1000] randomly, but i want the 0, 1, 2.. end to be much more likely than the ..., 999, 1000 end
what statistic/probability function is this?
do you want the probability to decrease as the value increases?
how do you want it to decrease?
like it depends on how unlikely you want, say, 700 to be
relative to 1
but for simplicity
Hey I’m a pretty casual python programmer but I want to get into some data science. But i don’t really know where to start, does anyone know a good starting point?
you could probably just use a folded normal distribution
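A minimal sketch of the folded normal idea: take the absolute value of a zero-mean normal draw, round it down, and clip it to the allowed range. The function name `sample_small_biased` and the `scale` of 200 are assumptions made for the illustration; `scale` controls how quickly large values become unlikely.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_small_biased(size, upper=1000, scale=200.0):
    """Draw ints in [0, upper], with small values more likely (illustrative)."""
    # The absolute value of a zero-mean normal is a folded (half-) normal.
    draws = np.abs(rng.normal(loc=0.0, scale=scale, size=size))
    # Round down to integers and clip the rare draws that exceed the range.
    return np.clip(np.floor(draws), 0, upper).astype(int)

samples = sample_small_biased(10_000)
print(samples.min(), samples.max())
print((samples < 500).mean())  # share of draws that land in the lower half
```

Clipping puts the very rare out-of-range draws exactly on the upper bound. If a specific probability for each value is needed instead, `rng.choice` with an explicit `p` array is another option.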
this looks good thanks
algorithm trading
learn and lose money
Guys i have a question
So i created a linear regression model
And from that i found the rmse value for the training dataset which is 0.32 and on testing its 152.182
Is this correct or am i doing something wrong here
Wondering if this is a good approach to machine learning with chess. Bot plays against itself. If it wins with white, it associates every position after it moved with absolutely winning for white (playing as white and black too) and with black, it will associate every position after it moves with absolutely winning for black, if it ties, it associates every position after it moved as a tie.
Oh and a constant epsilon of 10%
Hi guys I just started learning programming and I have a question. Just out of curiosity is it possible to train an AI model through user inputs instead of letting it run until it is successful? For example training an autonomous driving agent on a race track through reinforcement learning by letting it learn by itself, can we train the model by ourselves instead? We would drive the car for a few laps and then let the model learn from the laps we've done. Is that possible?
learn what, exactly?
Learn to finish a lap by itself in a game perspective
for the racing game, are there other players that can try to sabotage you? does the track change in random ways? or can you complete a lap by following an exact sequence of steps?
A car driving through the exact same route with no changes and no other players at all.
This isn't really AI as enumerating an exact sequence of steps is how programs work in general.
Accelerate this much for this long, brake this much for this long, turn at this angle, etc...
Now, if the car had to learn how to respond to events that happen randomly, and it did that based on data that you create by playing, that's another story
But that's just creating training data. It's not "training the model by ourselves" as some thing that's separate from supervised learning in general.
So if I start off by playing the game a few times and let the model train through my gameplay that's called creating training data? Thanks for taking the time to answer my question, I am really new to this and I might not be able to express myself correctly
you'd have to record how you played the game in a structured way. Like, which buttons you pressed and for how long at a time.
And you'd have to have some way of representing what those keystrokes were in response to.
because the point of the model is to learn what the right sequence of keystrokes is given the objective of the game and events that randomly occur during the game.
disclosure, I have never made a model that plays a video game, though a past coworker did.
That's pretty interesting! I learned something new today thanks to you!
I will have to work my way to that point myself lol i just started programming about a month ago
I'm a few years in and I still only barely understand what I'm doing.
Nice haha are you data scientist or something?
computational linguist; basically yes, but with more linguistics.
Oh i've never heard of this career path what do you usually do at work?
I can't answer that.
but in general, a computational linguist might work on machine translation, extracting data from text, or speech recognition and speech synthesis.
That sounds awesome!!
I am still new and I don't know which career path I want to learn yet
I recently bought a Udemy course on data science just to see if I would like that subject
in a lot of ways, it's just using statistics to leverage patterns that exist in data
but then, what else would it be?
That sounds alot like stock trading through patterns hahah
isn't most stock trading these days bots interacting with eachother?
You are right
the company I own the most stock in recently saw some of its employees unionize for the first time. guess I should see how that's doing.
it's down four dollars from the last time I checked. I'm ruined.
ouch, do you still believe in it long term though?
The stock market has been very volatile the past few weeks
The only reason I have that stock is because they gave it to me when I worked for them. I'd be fine with it becoming worthless if subsequent generations of employees get non-shit working conditions.
But now I'm off-topic.
Damn that's bad
Hey I just finished learning python through a Udemy course, would you recommend me practicing what I've learn through a project or coding challenges from websites like codewars?
I would just keep making stuff, even if they have no practical use.
There's actually a few projects I would like to make just for fun but I don't know my basic knowledge will allow me to make them happen. Such as a news filter/scraper or a program to scrape my pdf/receipts and put them in an excel sheet
that's why there's always google to help
the point of doing projects is to learn more, not to just have everything memorized
what I started with was taking some code I found online and modifying it in different ways to improve/adapt it for my use, that can help you gain a good understanding of how it works without having to start from scratch
Is it normal to always google and find whatever I need to solve the coding problems i try to solve?
I see what you're saying
you shouldn't be blindly copy-pasting code from the internet without knowing what it's doing, but its normal to have to look up documentations and error messages
Of course. When I tried doing coding challenges this past week, I find myself always looking on google how to code the ideas/solution that I have in my head
yeah thats perfectly normal
programming is more about knowing how to solve problems rather than memorizing documentation
I see problem solving skill is important
the only way to improve it is to continue to code everyday? lol
yeah pretty much
what is one versus one
and one versus all
there are multiple class classifiers such as naive bayes and random forest...
but you can force sci kit learn to use one versus one or one versus all w binary classifiers?
says in the book OVA's preferred
There are two popular approaches to multiclass classifier.
- One vs. Rest
- Multinomial/Softmax
One vs. Rest used to be the default behaviour of scikit-learn's LogisticRegression. Since version 0.22 the default is multi_class='auto', which picks multinomial for multiclass data with most solvers.
Modifying your loss function so that it directly tries to optimize accuracy on a multiclass problem is known as the Multinomial Strategy. This is also called Multinomial Logistic Regression or Softmax or Cross-Entropy Loss.
Like you rightly mentioned, Naive Bayes handles multiclass problems natively. Note that the choice between MultinomialNB and GaussianNB depends on the feature type (counts vs. continuous values), not on the number of classes.
Comparing Both Strategies
So let me briefly mention the distinction between One vs. Rest and Multinomial strategy
- Overall Modus Operandi
a) One Vs. Rest: Fit a binary classifier for each class
b) Multinomial: Fit a single classifier for all classes
- How It handles Prediction Behind the Scene
a) One Vs. Rest: Predict with all but take the largest output
b) Multinomial: Prediction directly outputs the best class
- **Their Drawbacks and Edge**
a) One Vs. Rest: simpler, modular, but not directly optimising accuracy. Not really a standard approach in Neural Network
b) Multinomial: more complicated but tackles accuracy problem directly. The standard approach in Neural Network.
Answering your 2nd Ques.
Yes, we can force the classifier to either use One vs. Rest or Multinomial strategy in Scikit-learn. Let me use a linear-based model to illustrate how it works...
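from sklearn.linear_model import LogisticRegression
# Note: recent scikit-learn releases deprecate multi_class; wrapping the model in
# OneVsRestClassifier is the suggested replacement for 'ovr'.
lr_ovr = LogisticRegression(multi_class='ovr')
lr_ovr.fit(X, y)  # One vs. Rest: one binary classifier per class
lr_mn = LogisticRegression(multi_class='multinomial', solver='lbfgs')
lr_mn.fit(X, y)  # Softmax/Multinomial: a single classifier for all classes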
I hope you understand it now? 😊
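For the original question about forcing one-versus-one or one-versus-all with a binary classifier, scikit-learn also ships wrapper classes in `sklearn.multiclass`. The sketch below is one way to try them, assuming a synthetic 4-class dataset from `make_classification` (the dataset and its parameters are made up for the illustration).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic data with 4 classes, used only to have something to fit.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

base = LogisticRegression(max_iter=1000)

# One-vs-rest: one binary classifier per class.
ovr = OneVsRestClassifier(base).fit(X, y)
# One-vs-one: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(base).fit(X, y)

print(len(ovr.estimators_))  # 4 classifiers, one per class
print(len(ovo.estimators_))  # 6 classifiers, one per pair: 4 * 3 / 2
```

The wrappers clone the base estimator, so the same `base` object can be passed to both.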
hey guys! how do I get started with data-science
I'm advanced in python so idm any more challenging to understand resources
https://www.youtube.com/c/Freecodecamp have a browse
Hello, are 600 images for a 3 class classification sufficient for transfer learning??
Or will that give a low accuracy
- Depends what "low" means
- Depends on the data
600 images can be enough if you have a pre-trained model that you finetune on 3 classes.
I concur with @lunar star. More so, more data leads to better performance. So if you have access to more data, you might wanna make use of it as well.
Low as in 80% accuracy i guess
Yes i do have a pre trained,mostly gonna use resnet or vgg16
The prob is i dont have more data, it's about 600 after data augmentation
Make do with what you have.
80% accuracy in my opinion isn't that bad (depending on the severity of what you're trying to do). If you ask some MLOps guys they'll be glad to inform you that models with 69% - 72% accuracy have once made it to production 😂.
If 600 images is capable of yielding 0.80 then what can such model be able to do when fed with, say, 3000 images? 🤔 I'll be curious to find out.
Got it
Hi, I'm working on AutoML project and want to launch process for automated model training. I am interested in how can I monitor the process for resource usage or failures with kinda ready-made APIs.
Maybe? Please at least have a complete question ready if you're going to ping me
oh sorry
ok ill put them all in one message just tell me if i am right or wrong
Scalars - just 1 value
Vector - a list and has more than 1 value
Tensors - an N-dimensional array. When N = 1, that tensor is a vector; when N = 2, that tensor is a matrix. [0, 1] is a vector, [[0, 1], [1, 2]] is a matrix
I see one error in this (I think)
Vectors can only have one value
Or even none (maybe? this one I'm not sure)
what do you think? @serene scaffold
@arctic crown @upbeat dove a scalar is a stand-alone number, yes. And then a vector, generally speaking, is a one-dimensional array. A (n,1) shaped array is a column vector because, despite having two dimensions, it's basically just vector in a column
And then (1, n) is a row vector.
what do you mean by a one-dimensional array
Any array where the shape is all 1s except for one n is a vector in a certain space. I'm iffy on the terminology there
@arctic crown when the shape is (n,)
[1 2 3 4] has a shape of (4,)
[[1 2 3 4]] has a shape of (1, 4)
im sorry i am asking so many questions
but whats a shape?
im just getting started on this
Uhh. I'll just make a random array and you can figure out what the shape is intuitively
ok
!e import numpy as np; print(np.random.random((2, 4, 3)))
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
001 | [[[0.28129572 0.33512977 0.93151513]
002 | [0.66246171 0.14259656 0.77884458]
003 | [0.1443945 0.72199244 0.43976486]
004 | [0.6581581 0.27763684 0.14058968]]
005 |
006 | [[0.29535783 0.08170229 0.85562144]
007 | [0.7064785 0.23825289 0.21846468]
008 | [0.65787749 0.91715884 0.45522423]
009 | [0.01521974 0.79929807 0.23513918]]]
This is a three-dimensional array where the shape is (2, 4, 3)
There's two groups with four rows, three columns @arctic crown
ah and what are these random numbers?
Exactly that.
ah so random number
but what do we do with these numbers?
If you look at it clearly you'll see it 😊
This might seem a bit unusual 'cos it has a depth of 3. That is to say, it has 3 dimensions.
This was how I got to understand it clearly when I started learning. Maybe you'll find it useful or useless... Who knows? 🤷🏾♂️
- How many square brackets does the array have at the beginning?
Your answer would be 3, I presume. Which is correct. This means the array is a 3-dimensional array
- How many homogeneous groups does it have?
Answer = 2
- How many rows does each homogeneous group have?
Answer = 4
- How many columns does each homogeneous group have?
Answer = 3
Now, you'd say the array has a shape of (2 by 4 by 3)
No.of groups, Row, Column = 2 x 4 x 3
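The same counts can be read straight off the array. A short sketch, using an array of zeros with the shape from the example above:

```python
import numpy as np

arr = np.zeros((2, 4, 3))

print(arr.ndim)      # 3 -> number of axes (the nesting depth of the brackets)
print(arr.shape)     # (2, 4, 3) -> groups, rows, columns
print(arr.size)      # 24 -> total number of values, 2 * 4 * 3
print(arr[0].shape)  # (4, 3) -> one group is a 4-by-3 matrix
```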
yea that makes sense ty
but what do we do with the numbers?
Whatever you want
But I thought you were the one that asked for it? That's the shape.
ok now that i know what a shape is could you kindly explain what a vector is @serene scaffold
an array where the shape is (n,)
I went over that earlier, though
oh right sorry
so a vector is just an array that has a shape
and an array is basically a list but it has a fixed length. So once declared you can't add or remove elements from it. right @serene scaffold ?
sorry for the pings tho
I don't have time to answer these questions right now. Try directing them to the whole channel rather than just me.
ok sorry
informally: sequence (n,); vector (n, 1) or (1, n); matrix (r, c); tensor (n, r, c)
however, mathematically, they are all tensors
a tensor is "an interface" (in a SEng sense), and any "mathematical object" with that interface counts
so ordinary numbers are tensors, as well as vectors, matrices, etc.
informally, we tend to say: sequence, vector, matrix, tensor
but all of those are tensors (and theyre all sequences)
what makes a (n,) different from (n,1)?
(n,1) dot (1, n) is different than (n,) * (n,)
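A quick sketch of that difference in NumPy, with a made-up three-element array:

```python
import numpy as np

a = np.array([1, 2, 3])   # shape (3,)
col = a.reshape(3, 1)     # shape (3, 1), a column vector
row = a.reshape(1, 3)     # shape (1, 3), a row vector

print((a * a).shape)      # (3,)   elementwise product: [1, 4, 9]
print((col @ row).shape)  # (3, 3) outer product
print((row @ col).shape)  # (1, 1) inner product, [[14]]
print(a @ a)              # 14     inner product of two 1-D arrays is a scalar
```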
Haha ,but this is for a bachelor's level project and they expect an accuracy above 90% so i wasnt sure how i can increase my data
(1, n) = a function which accepts an (n, 1) and "sum-products" its elements with n coefficients
(m, 1) = a point with m components in m directions
(m,) = a sequence of numbers which may or may not describe a point
a row vector (1, n), eg., f = [1, 2, 3] is essentially just an abbreviated way of saying, f(x) = 1 * x_1 + 2 * x_2 + 3 * x_3
a column vector (m, 1), eg., c = [2, 4, 6] is just an abbreviated way of saying c = 2*[1, 0, 0] + 4*[0, 1, 0] + 6*[0,0,1]
what is n?
(1, n) just means the shape of some numpy array, where n is a positive integer
yes
```py
plot_decision_boundary(lambda x: clf.predict(x),X,Y)
```
what is the point of the parameters X and Y in this line, I'm kinda confused
the function plot... takes three args: a prediction function, the dataset X, and the dataset Y
incidentally, it can also be written: plot_decision_boundary(clf.predict, X, Y)
in ml is the dimension the same as a real-life dimension? like x,y,z
Ohhh alright, thank youu😁
dimension = a component of a data point which can change independently of the other components
space has three dimensions: you can go up/down without going back/forwards (, side/side)
what is a data point in ml?
and an array is basically a list but it has a fixed length. So once declared you can't add or remove elements from it. right?
the complete information about a single observation, it will be dataset dependent
in general, think of it as a row in a table
!e ```py
import numpy as np
print("Sequence", np.array([1, 2, 1, 2]).shape)
print("Vector", np.array([[1, 2, 1, 2]]).shape)
print("Matrix", np.array([[1, 2], [1, 2]]).shape)
print("Tensor", np.array([[[1, 2], [1, 2]]]).shape)
```
@untold tundra :white_check_mark: Your eval job has completed with return code 0.
001 | Sequence (4,)
002 | Vector (1, 4)
003 | Matrix (2, 2)
004 | Tensor (1, 2, 2)
Also i wanted to know if i can increase my dataset from 400 images to 800 images w data augmentation
!e ```py
import seaborn as sns
point = sns.load_dataset('tips').sample()
print("Dimensions", point.columns)
print("Data Point", point)
```
yes, eg., look up keras image augmentation
Alrightt, thanks a lott
please help, whats an algorithm
a procedure that solves a problem
ty
a procedure that creates a problem
a procedure
@pine wolf what do you mean?
what i meant is that it doesn't actually matter if it solves a problem or not
I replied to you accidentally
given the amount of knowledge of the person who asked, I think "a procedure" would be too opaque of an answer.
well, yes, but a set of instructions is probably good enough
i can give you, say, an algorithm for starting a forest fire, but it wouldn't be solving any problems
unless you're very cold
if you want there to be a forest fire, then the algorithm solves the problem of the forest not being on fire.
but what if i give you the algorithm and i don't want a forest fire
unless "solves a problem" becomes so nebulous that it could mean anything, then ok, it solves a problem
yes, that's pretty much what I'm getting at
"a set of instructions" sounds good.
Hi i'm new to Ai any sugestions?
```
assert(A2.shape == (1, X.shape[1]))
cache = {"Z1": Z1,
AssertionError:
```
can anyone tell what error this is
it would be helpful if you gave the whole error message. What comes after AssertionError:?
bro
can you help me with this google colab error?
It appears that it requires the shape of A2, which is probably an array, to be (1, n), where n is whatever the length of X is in its second dimension.
Sorry but I do not read screenshots of text.
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 10.24 GiB already allocated; 5.81 MiB free; 10.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
i am having trouble training a model
😢
It appears that you're trying to use more memory than the GPU you've been allocated has.
Hi i'm new to Ai any sugestions?
what do you mean by "manage the memory"?
i keep getting this error. how to resolve it?
"Data Science from Scratch" is a book that I read.
Thx
It might not be possible to resolve it since you can't control how much space is available on the GPU that you're allocated. I might be able to offer suggestions to decrease how much memory you're using at any given time if I see the code.
do i share the jupyter notebook?
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
Hey @terse frigate!
It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
whoops
Any other books that you recommend?
I'm reading one called "Hands on data analysis with Pandas" that's pretty good.
Sounds like that's strictly about using pandas. Which is something you'll need to learn how to do anyway, but if you want to build models, there's going to be a lot more to it.
It's difficult for me to suggest resources because I learned most of what I know as part of my degree, and from reading books that my company pays for.
I appreciate it all the same. Thanks.
whats a rank?
Hello everyone, I have a dataframe that contains this Name_Number_Number_Value_String
What i want to do , is create a column in that dataframe that has Name_Number_Number and another that has Value_String
So basically split the string that exists in one column
in 2 different ones
Does anyone have an idea as to how to approach this
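The exact DataFrame was never shown, so the following is only a sketch under stated assumptions: the column is called `raw` (a made-up name), every value has exactly five underscore-separated fields, and the first three belong together. The sample values are invented.

```python
import pandas as pd

# Invented sample data in the Name_Number_Number_Value_String layout.
df = pd.DataFrame({"raw": ["abc_1_2_high_x", "def_3_4_low_y"]})

# Split each string into a list of fields, then re-join the two halves.
parts = df["raw"].str.split("_")
df["name_numbers"] = parts.str[:3].str.join("_")   # Name_Number_Number
df["value_string"] = parts.str[3:].str.join("_")   # Value_String

print(df)
```

If the name itself can contain underscores, splitting from the right with `str.rsplit` would be the safer variant.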
(for keras)
Will model.fit([x, x2], [y, y2]) be equivalent to
```py
model.fit([x], [y])
model.fit([x2], [y2])
```
@boreal loom I need to see the exact DataFrame you're talking about to understand the question. Please do print(df.head().to_dict('list')) and copy and paste the text into the chat. This is the only way I'll be able to help.
This is what I used and it worked
There prolly is a cleaner a way
@serene scaffold thanks for the help, I solved by google fu, much appreciated
Glad you were able to figure it out! That's probably going to contribute more to your learning than if I had helped you.
I can suggest a cleaner way if you show the data with the line I showed before.
pandas is giving me a hard time 😄
but we love it so 🐼 
my guess is no, but did you try it?
No not yet
Hello I want to generate some music from a few folk songs... is there an ai that is suited for this task?
so multilabel classification involves algorithms that can put multiple labels on instances?
like if there was a facial recognition algorithm
and we had labels alice, bob, and charlie
alice and bob are in a picture
it should output [1,1,0]?
That is correct. It shouldn't be confused with multiclass classification, which is just if there's more than two classes.
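A tiny sketch of that encoding in plain Python. `encode_multilabel` is a hypothetical helper name used only for this illustration.

```python
all_labels = ["alice", "bob", "charlie"]

def encode_multilabel(present, labels):
    """Return one 0/1 flag per label: 1 if that person is in the picture."""
    return [1 if name in present else 0 for name in labels]

# Multilabel: any number of flags can be 1 at the same time.
print(encode_multilabel({"alice", "bob"}, all_labels))  # [1, 1, 0]

# Multiclass, by contrast, assigns exactly one class per instance.
print(encode_multilabel({"charlie"}, all_labels))       # [0, 0, 1]
```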
i don't quite understand what multioutput classification is
each label
is multiclass
am i wrong for saying this looks like y = mx + b
like y = theta0 + theta1(x1)
assuming you only had one feature
I think that's what it is, yes.
there's nothing in this which indicates multi-class
it's just multiclass if $\hat{y}$ can take more than one of two values
sure
what's the difference between a col vector and a normal vector... a col vector is vertical and a normal vector is just like [1,2] on a graph?
a col vector is a normal vector
it's just a different representation?
ordinary?
ok, so a col vector is the "ordinary vector"
a row vector is the strange one, a row vector is a function
oh
!e ```py
import numpy as np
point = np.array([1, 2, 3]).reshape(3, 1)
fn = np.array([2, 4, 6]).reshape(1, 3)
print(fn @ point)
```
@untold tundra :white_check_mark: Your eval job has completed with return code 0.
[[28]]
fn @ is a function which accepts point as an argument, and does 2 * 1 + 4 * 2 + 6 * 3
Would a convolutional layer be useful for a chess neural net?
what does N-dimensional mean? are dimensions the same as real-world like x,y,z?
that it has n dimensions, whatever integer n is.
our physical world has three spatial dimensions, but mathematically there's no limit to how many dimensions something can have.
whats the use of dimensions here?
Many of the questions you have asked would be clear given some reading on linear algebra. It's required for ML.
got it!! thank youu:))
there are some slightly different meanings of "dimension" in machine learning and scientific computing:
- the number of "axes" of an array. a vector is a 1-dimensional array, a matrix is a 2-dimensional array, a 3-tensor is a 3-dimensional array, and so on
- the dimension of a vector space, which technically is the size of the set of basis vectors but also corresponds to the number of elements in a vector; e.g. in the standard vector space defined on ℝ³, the dimension of the vector space is 3, and vectors have 3 coordinates
- the dimension of the column space of a matrix, i.e. the dimension of the image of the linear operator associated with that matrix. this is a very important concept in applied linear algebra, called the "rank" of a matrix.
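The three meanings can be seen side by side on one small array. A minimal sketch with a made-up 2-by-3 matrix whose second row is twice the first:

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 6.0]])

# Meaning 1: number of axes of the array.
print(matrix.ndim)                    # 2

# Meaning 2: each row is a vector with 3 coordinates, i.e. a point in R^3.
vector = matrix[0]
print(vector.shape[0])                # 3

# Meaning 3: rank, the dimension of the column space.
print(np.linalg.matrix_rank(matrix))  # 1, since the rows are multiples of each other
```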
and yes, i second squiggle's recommendation that linear algebra is "required reading"
i think for the most part, in machine learning discussions people use the 1st meaning
the 3rd meaning usually is called "rank", and the 2nd meaning usually is confined to theoretical math discussion
one well-known place where the 2nd meaning appears: the "curse of dimensionality" and the "dimensionality" of an embedding
so actually i'm wrong... people use both terms, and it's up to the listener to understand their meaning from context
it can be very confusing. unfortunate clash of terminology
maybe it's easier in other languages
i am getting this error. i googled it and it says i should use the second column..but i cant understand
fpr,tpr,thresholds =metrics.roc_curve(y_test,y_predict_test_prob)
raise ValueError(
ValueError: y should be a 1d array, got an array of shape (80000, 2) instead.
i want to plot the roc-auc curve
what is y_predict_test_prob? what does it contain? what does it represent?
y_predict_test_prob = MNB.predict_proba(X_test) it has the predicted probabilities
MNB is logistic regression
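The "use the second column" advice refers to the layout of `predict_proba`: for a binary problem it returns one column per class, ordered as in `model.classes_`, while `roc_curve` expects a single 1-D score for the positive class. A minimal sketch on synthetic data (the data and variable names here are made up and are not the poster's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic binary problem, used only to have something to fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)   # shape (200, 2): one column per class
positive_scores = proba[:, 1]    # shape (200,): probability of class 1

fpr, tpr, thresholds = roc_curve(y, positive_scores)
print(proba.shape, positive_scores.shape)
print(roc_auc_score(y, positive_scores))
```

In the poster's code the equivalent change would be passing `y_predict_test_prob[:, 1]` to `roc_curve`, assuming the positive class is the second entry of `classes_`.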
can we say 1-tensor or 2-tensor?
i kind of made up that term
I don't know if engineers use it but I wouldn't repeat it
sorry i meant to edit that out
hey are they an ml courses for free online i could also pay as well
but must be well taught and have a lot of math (doesnt skip over)
and preferably in python
no andrew ng course because its bad no math
im a math cs major
Needs to be more specific. The numbers before the -tensor could mean multiple things. Though in general, I think one would assume you mean the tensor rank (not same as matrix rank).
Could probably create a table with columns of tensors, linear algebra, and computer science, and rows of rank, and dimension. With each cell having a different definition.
(yes, i'm keeping tensors separate from lin alg because they are complicated)
could mean multiple things
Hmm, what other ones? An (r,s)-tensor is an element of this:
are there other common ways to classify them?
Depends on what the current context is, but yeah the number would be assumed to mean the rank. Just usually would write it as rank-2 tensor, rather than 2-tensor. An example of what might happen is that it might be interpreted to mean any tensor in 2 dimensional space (of any given rank).
guys any article that explains how to make a neural net using numpy and python
i wanna make a chat bot using ml
Can also use the terms dyad, triad, tetrad, etc, for rank 1, 2, and 3 tensors, etc (a bit more uncommon).
you mean 2,3,4, right?
yeah my bad
Dyad might get confused with dyadic. Honestly, math terminology is kind of a mess due to there just being so many different fields with overlapping naming. And in the case of ML, programmer terms too (and even ML made-up terms which also have different meanings in different papers). It really all relies heavily on context.
I am making a phishing link detector using python (machine learning) and I need some help. If anyone can help pls let me know, I'll give all the details of what help I need and you can refer to the base of the project from the link below. Thanks
I made and trained the machine learning algorithm with the dataset that I downloaded from uri. Now I want to make an algorithm that can check the provided link for the features that are present in the dataset and give the output accordingly.
https://www.activestate.com/blog/phishing-url-detection-with-python-and-ml/
Hello, people. I'm very new to python having just completed an introductory course. I want to step into Data Science & ML but Idk what all I should read or start in python. So, I've a question: If you were sent into the past to guide your younger self for ML, what all would you recommend him in python & from where?
I'm not a data scientist though I've spent some time as a Data Engineer working with DS. I recommend https://github.com/fastai/nbdev as it will help you experiment in Jupyter labs / notebooks and package it up into a python package. The tutorials take you through github or gitlab CI/CD (continuous integration / continuous delivery), which is a good practice for software development.
Thank you so much. But I want to do it from scatch. Can you please tell me what all I should read in python? I'll surely give a look to fastai
Also, I would appreciate if you can differentiate Data Engineer and Data Scientist.
Can you please tell me what all I should read in python?
No not really. I find the official python / pandas docs great these days.
if you can differentiate Data Engineer and Data Scientist.
A Data Engineer (DE) can be as simple as pulling data in from different sources however possible using just python or bash scripts or sometimes tools like apache airflow. A DS can make sense of that data once the data is in a db or place it can be worked with.
I see. DE has a quite neat work.
Also, don't get me wrong when I asked "What all I should read in python?" It didn't imply that I'm looking for shortcuts rather preferring what could be one of the best way to move ahead in ML.
i have a test dataset with shape (1600,10) and a train dataset with shape (6400,10)
is there a convenient way to construct a numpy array of shape (1600, 6400) containing the euclidean distance measures between my test and train data, other than using for loops
For a numerical series of data, is the lowest variance from the 90th percentile pretty much giving you the same data compared to the highest variance from the 10th percentile?
A low variance in the 90th percentile would imply that the data remains close to the 90th percentile, and having a high variance for the 10th percentile would mean that the data is far from that 10th percentile point - AKA it would be closer to the 90th percentile
I figure it's not an exact 1 to 1 relationship between these things. Is there some algorithm or formula to figure out this type of relationship?
something like (train - test)**(-0.5) ?
yh this is the code i currently have, but it takes forever to run, so im wondering if there is a more efficient and quicker approach to this
this is what my lecturer did for the cosine distance, but i dont understand what he did for modtest and modtrain
think that might be exactly what im looking for, ill have a read, thank you!
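For reference, a loop-free sketch of the distance matrix asked about above, assuming the two arrays have the shapes mentioned, (1600, 10) and (6400, 10). It relies on the identity ||a - b||² = ||a||² + ||b||² - 2 a·b, so the heavy lifting is a single matrix product. The function name and the random data are made up for illustration.

```python
import numpy as np

def pairwise_euclidean(test, train):
    """Return an array D with D[i, j] = ||test[i] - train[j]||."""
    # Squared norms of each row, shaped so they broadcast against each other.
    test_sq = np.sum(test ** 2, axis=1)[:, None]    # (n_test, 1)
    train_sq = np.sum(train ** 2, axis=1)[None, :]  # (1, n_train)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = test_sq + train_sq - 2.0 * (test @ train.T)
    # Rounding error can push tiny values below zero; clip before the sqrt.
    return np.sqrt(np.maximum(sq, 0.0))

# Random stand-in data with the shapes from the question.
rng = np.random.default_rng(0)
test = rng.normal(size=(1600, 10))
train = rng.normal(size=(6400, 10))
dist = pairwise_euclidean(test, train)  # shape (1600, 6400)
```

If SciPy is available, `scipy.spatial.distance.cdist(test, train)` should give the same matrix without the hand-written algebra.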
```
Traceback (most recent call last):
  File "F:\nifty_banknifty\nf_bnf_2.py", line 30, in <module>
    df2['one_year_dates'] = np.arange('2017-04', '2018-03', dtype='datetime64[D]')
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
    raise ValueError(
ValueError: Length of values (334) does not match length of index (46739)
```
ping me when u reply
i want to add array values as column in pandas dataframe
You'd have to loop over that range and assign whatever values you want to that column, but your dataframe has 46k rows, so every column needs 46k values (you can use None for the rest)
can u show what u are trying to say ?
You are now creating a column and giving it values of that range, like this
if thats what you want, you should pad your np.arange array
```
a = np.arange('2017-04', '2018-03', dtype='datetime64[D]')
print('a\n', a)
for i in a:
    df2['date1'].append(i)
```
i am trying this way
do you want the column indexes as dates? or you want dates as values to column 'one year dates'
hello there, we are writing up a CASH/Auto-ML algorithm based on the BaseSearchCV function on scikit-learn. we have setup a timeout so that it stops running after a specific period of time; however, we are struggling to extract the best results (after trying best_params_). does anyone have any guidance on that front?
? @lone drum
i want to take dates in expiry column
result = df['expiry']
? 😄 i don't understand what you meant by taking
do you want to put that dates (np.arange) into columns?
see in last expiry column i want to put dates
yes
@uncut stump u there, do u get my point?
yo anyone can recommend interesting computer vision related research papers?
ones that already exist, ill need to create a presentation about it
I'm sorry I'm back
can you pad the a array and try again?
np
means what exactly ?
there is 46k values in your dataframe right?
yes
how can i use this in my case?
>>> np.pad(np.array([1,2]), (0,46000), 'constant')
array([1, 2, 0, ..., 0, 0, 0])
>>> len(np.pad(np.array([1,2]), (0,46000), 'constant'))
46002
okay
now do i have to create a new column?
did you try with this padded array?
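For reference, a sketch of one alternative to manual padding. It assumes the 334 dates are meant to land in the first 334 rows and the remaining rows stay empty, which the conversation does not confirm. Wrapping the array in a `pd.Series` lets pandas align on the index and fill the remaining rows with `NaT`, instead of raising the length-mismatch error. The small frame and its `price` column are made up to keep the example short.

```python
import numpy as np
import pandas as pd

# Stand-in frame (assumption): 1000 rows instead of the real 46739.
df2 = pd.DataFrame({"price": np.arange(1000.0)})

# Same call as in the traceback: one value per day, 334 in total.
dates = np.arange("2017-04", "2018-03", dtype="datetime64[D]")

# A bare array must match the frame's length exactly. A Series is aligned on
# the index instead, so rows 334..999 simply get NaT (missing date).
df2["one_year_dates"] = pd.Series(dates)
```

If the dates should instead repeat or map onto specific rows, the Series needs an index that says so; the alignment only fills what the index matches.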
anyone skilled with clustering, specifically DBSCAN and HDBSCAN
I'm wondering how I can reduce the outliers in this HDBSCAN clustering plot
BASE TRUTH
the results are quite nice against the truth that I'm trying to replicate, but through a lot of experimentation I can't seem to pull the outliers in
increasing epsilon seems to have no effect as well, which is confusing me
I just have it set to 0
hdbscan has a lot of parameters, worth checking the docs for the python library. however if you are replicating a result, can you get access to the authors' code?
messing with parameters for 100 years seems like a poor use of your "research energy"
I'm not replicating a result someone else had, the second image is just how the data is actually distributed, so anything close to that would be successful
so I'm on my own there unfortunately
I will check this out though
ah i see. can you train a supervised model (classifier) instead?
seems like a good use case for xgboost or a neural network with 1 hidden layer
or random forest, but the feature subsampling won't be that useful in that case
probably better off with xgboost than rf for just 2 features
oh wait this is pca
I found something else really weird too, which is that if I run the algorithm twice through, it kind of doubles down and classifies more? I'm not sure why this is happening, here's some example:
clusterer = hdbscan.HDBSCAN(
    min_samples=20,  # for dist metric (mutual reachability metric)
    min_cluster_size=50,
    cluster_selection_epsilon=0.0
)
# fit data
clusterer.fit(df)
# add cluster to DF
cluster = clusterer.labels_
df['cluster'] = cluster
I run this through to get the first result I've shown, but if I literally copy and paste the block so that it runs through twice, I get this result:
like you can see the original clusters in there untarnished, but loads of the outliers suddenly become a part of a new cluster
any idea why that's happening just out of curiosity? 😂
interesting, iirc the algorithm is iterative, so it might be doing more iterations instead of starting over
scikit-learn has "warm start" functionality for iterative algorithms
it's weird though because I would've thought it would overwrite, since I'm writing the clusters to the dataframe with each iteration
you can usually use sklearn.clone to get a "clean" copy of the estimator object
and yeah it's 8 dimensions reduced to 2
wait... you are including the cluster labels in the 2nd round of clustering
don't do that
so that's it, how do you mean?
I'm really newbie to data science in general btw so feel free to explain like I'm 5
use a classifier
if you want to learn to predict classes on new data
it thinks the cluster labels are a feature to use for clustering
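For reference, a sketch of the point made above: once `df['cluster']` exists, calling `fit(df)` again feeds the old labels in as an extra feature. Selecting the feature columns explicitly avoids that. This uses scikit-learn's `DBSCAN` as a stand-in for `hdbscan.HDBSCAN` so it runs without the extra dependency; the column names `pc1`/`pc2`, the synthetic blobs and the parameter values are all made up.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN  # stand-in for hdbscan.HDBSCAN

rng = np.random.default_rng(0)
# Two synthetic blobs (assumption; stands in for the 2 PCA components).
points = np.vstack([
    rng.normal(loc=0.0, scale=0.2, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.2, size=(100, 2)),
])
df = pd.DataFrame(points, columns=["pc1", "pc2"])

# Fix the feature columns once, before any label column exists.
feature_cols = ["pc1", "pc2"]

runs = []
for _ in range(2):
    clusterer = DBSCAN(eps=0.5, min_samples=5)
    # Fit on the features only, so a previous run's labels are never an input.
    clusterer.fit(df[feature_cols])
    df["cluster"] = clusterer.labels_
    runs.append(clusterer.labels_.copy())
```

Because the label column is never part of the input, the second pass reproduces the first instead of "doubling down".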
what's an example of a classifier?
a decision tree is the easiest example to understand
oh so it's not clustering
this project is for uni and for some reason they want it to be unsupervised clustering methods only
but I saw you recommended lots of ways that probably might be superior
clustering is an unsupervised way of classification as far as I know
im trying to train a time series forecasting model to predict 1 timestep into the future. The data comes from a pretty complex thermal system so i would think an LSTM is the way to go. I have quite a lot of data, 12 years @ 3s per data point. My question is: Any tips on how large the network should be (both layers and how many neurons per layer)? Is there a way to figure out if i really need all 12 years of data without trying it out?
I have quite a lot of compute power so that shouldnt be an issue
For hyperparameter tuning, you could try grid searching or Bayesian Optimization.
Here is an article on grid searching
hmm, i dont have THAT much compute, one epoch takes around 1.5h on 8 nvidia V100
Ohh yeah that won’t work then
i was hoping for some kind of rule of thumb that gives me a good starting point
I don’t really know then, sorry
Hello. It's Ali. Is there any python image processing package/library from which after taking snap of the clothe eg. shirt the shirt can only be extracted from the image ?
is there a correlation between the size of the layers and how far into the past it should take into consideration? currently im using 256 as the look-back size (~12 min) should that influence the model size?
Hi, if i have a dataframe column that has values as
0 [31]
1 [24, 36]
2 [24]
3 [11, 24, 42]
and I wanted to return the column as
0 1
1 2
2 1
3 3
basically a count of elements per row
What's the right way to go about it
sounds like you just need to do .apply(len)
Looks good to me.
In [1]: pd.Series([[1, 2], [3], [4, 5, 6]])
Out[1]:
0 [1, 2]
1 [3]
2 [4, 5, 6]
dtype: object
In [2]: s = _
In [3]: s.apply(len)
Out[3]:
0 2
1 1
2 3
dtype: int64
Oh cool, thanks! That seemed to do it.
usually, using apply is a bad practice as it's not optimized with C, but it's the only option in this case because pandas has limited support for working with pure Python objects like lists.
Fair enough, though I tend to forget all the things that I can do with .apply
In other words, if you have a Series of numeric types (like ints or floats), you should pretty much never be using apply
I'll keep that in mind. Thanks for helping out.
In [12]: series
Out[12]:
0 0.097326
1 0.169674
2 0.348592
3 0.110183
4 0.479952
...
95 0.210811
96 0.216305
97 0.918834
98 0.281781
99 0.512556
Length: 100, dtype: float64
In [10]: %timeit series.apply(lambda x: x / series.sum())
2.19 ms ± 16 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [11]: %timeit series / series.sum()
63.7 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I saw someone do the first one on linkedin and it made me so mad
LinkedIn seems to do that usually, I don't blame you
I don't enjoy the way you need to portray yourself on linkedin
that is, how you have to maintain a specific (ie hirable) image?
Hi everyone, I'm trying to scrape data from https://www.idealista.com/en/maps/madrid-madrid/ using beautifulSoup
This is my code :
import requests
from bs4 import BeautifulSoup
url= 'https://www.idealista.com/maps/madrid-madrid/'
res =requests.get(url)
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
soup = BeautifulSoup(res.text,'html.parser')
print(soup.prettify())
But code is not giving the whole html content of the page.
can anyone help me to extract the name of the streets from the page?
Yeah, pretty much. Just feels pretentious a bit? (though I'd rather be pretentious than unemployed.)
Which is also why you see me here asking a lot of questions, just trying to learn/practice python
This is not a data science question. If you finish the scraping component and want to do something with the data, feel free to return here.
okay. sorry..
any suggestions for pdf manipulation libraries, pdfminer isn't working out super well for me.
Hey, i know i have asked this question many times and it has already been answered but this is the last time
what is a tensor
if you're asking about what a tensor is mathematically and what the word "tensor" has come to mean in Python, you will get slightly different answers.
nono i meant in tensorflow
Hi does anyone know how to round a list to 3 significant figures in python?
a tensor in tensorflow is basically the same thing as a numpy array. but they can exist on a GPU.
Significant figures are used to keep track of how precise a given measurement is, but because of how numbers are stored in a computer, I don't think any scientific computing libraries actually try to account for them.
so the question is, are you trying to round to three significant figures because you're doing actual scientific calculations, or is this something you've been asked to do to learn more about programming? (If it's the latter, that does not preclude you from getting help.)
can it be more than one-dimensional? like 27-dimensional array
tensorflow probably imposes some limit on how many dimensions you can have, but theoretically speaking there's no limit, no.
Hi @serene scaffold i'm trying to do actual scientific calculations, I'm trying to find the best way to group a list of floats to count their frequency so I can plot number/frequency vs x values(which are rounded)
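For reference, a sketch of rounding to 3 significant figures with NumPy and then counting how often each rounded value occurs. `round_sig` is a made-up helper name, not a library function, and the sample values are invented. The idea is to scale each number so its leading digit sits in a fixed position, round, and scale back.

```python
import numpy as np

def round_sig(values, sig=3):
    """Round each value to `sig` significant figures (zeros stay zero)."""
    values = np.asarray(values, dtype=float)
    out = np.zeros_like(values)
    nonzero = values != 0
    # Position of the leading digit, e.g. 2 for 123.4 and -2 for 0.0123.
    exponent = np.floor(np.log10(np.abs(values[nonzero])))
    factor = 10.0 ** (sig - 1 - exponent)
    out[nonzero] = np.round(values[nonzero] * factor) / factor
    return out

data = [0.012345, 0.012351, 123.456, 123.449, 0.0]
rounded = round_sig(data)

# Count how often each rounded value occurs (the frequency to plot).
unique_vals, counts = np.unique(rounded, return_counts=True)
```

If the goal is only a frequency plot, binning the raw values with `np.histogram` may be a simpler route than rounding first.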
ah ty
what do the sequence of numbers mean?
this is the same as asking "what does a number mean" in general. It depends on the situation.
oh sorry i meant what do you do with those numbers
it depends on what you are trying to do 
you could, for example, have an array of floats that represent test scores. Taking the mean of that array would tell you what the average test score was.
ah ty
is there a specific one you want feedback on?
wait are these numbers the numbers that go on the graph
what do you mean by graph?
if you were to plot a graph are these the number that go on it?
what do you need to do? there aren't many options, pdf as a format is kind of fucked up
@arctic crown "tensor" is another word with multiple meanings. in programming (eg tensorflow) it is a multidimensional array. in linear algebra it is a "multilinear map", which can be represented as a multidimensional array
you can think of them as a generalization of matrices and linear transformations (which can be represented as matrices)
what does generalization mean?
a "generalization" of a concept is a "more general" version of that concept. usually it means that some restriction is removed
like, H2O is a generalization of ice and water
(not the best example--I'll think about it and report back)
ok
yes but my question is what do you do with the numbers that are in that array
a better example would be how multiple linear regression is a generalization of linear regression
y = mx + b is a "special case" of the more-general case of multiple linear regression
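To spell out the special-case relationship: multiple linear regression with p inputs can be written as below. The symbols m_1 ... m_p, x_1 ... x_p and p are notation chosen here to extend the y = mx + b form, not something from the discussion. Setting p = 1 gives back y = mx + b.

```latex
y = b + m_1 x_1 + m_2 x_2 + \dots + m_p x_p
```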
you take a linear algebra course 😉
matrices and tensors represent transformations
i recommend studying MIT 18.06 linear algebra, the lectures are on youtube and the instructor is very good at helping you build intuition about the math
Now it's finally my turn to ask a question (though unfortunately I have to be a bit vague). One of my projects at work involves learning the mapping between certain sets of sequences, and the length of the input sequence isn't always the same, and the length of the output sequence isn't always the length of the input sequence. I've heard that RNNs or LSTMs might be the right architecture to use, but in my reading, it isn't clear to me how those are ideal in circumstances where the lengths of the sequences is inconsistent.
transformers
Leo Dirac (@leopd) talks about how LSTM models for Natural Language Processing (NLP) have been practically replaced by transformer-based models. Basic background on NLP, and a brief history of supervised learning techniques on documents, from bag of words, through vanilla RNNs and LSTM. Then there's a technical deep dive into how Transformers ...
like, the ones in "attention is all you need"?
⚙️ It is time to explain how Transformers work. If you are looking for a simple explanation, you found the right video!
🔗 Table of contents with links:
- 00:00 The Transformer
- 00:14 Check out the implementations of various Transformer-based architectures from huggingface! https://github.com/huggingface/transformers
- 00:38 RNNs recap
- 01:14 ...
yes
RIP easy solution
I don't think they're terribly difficult to implement from scratch, but I don't know how much data you need to train them effectively
my impression is that they are actually a lot more flexible and easier to deal with than LSTM
what does tf.ones mean?
It makes a tensor where every value is 1.0
We're happy to help, but I feel like you could answer a lot of these questions on your own. If you're just asking what a function does, this can be looked up in the documentation.
If you find the documentation confusing, we could also help you learn how to read docs, as this is a critical skill to develop.
To add here, sequence learning (of non-trivial length) is a very open ended problem that has no perfect nor even a nice solution at this time. The new bleeding edge stuff, even outside of deep learning, is still struggling with it. I know that GPT may seem impressive (and it is), but when you really get into it, all the massive limitations and issues show up.
does it matter how much the lengths vary?
Sequences are a different beast than other types of input.
oh i misread sorry
Variable lengths can matter more or less depending on the method used (how much variance). But in general it will make it more difficult.
am i correct in that transformers are a better default choice than lstm or other rnns nowadays?
There is also the problem of trying to recognize objects given a sequence, but that sequence may come out of order as it often does IRL (even more difficult and LSTM will fail hard at this for example).
just to check my understanding, if I'm building a transformer model, do I need to select a constant maximum input size and pad any input with enough zeros to reach that size?
i am also kind of curious what kinds of sequences you are working with
Transformers are often the better option in deep learning, but not always. Also learning about RNNs makes their motivations and methods make more sense / gives context.
afaik in text sequence modeling you use a special "padding" token
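For reference, a rough sketch of what padding with a special token can look like, using plain NumPy. `PAD_ID` and `pad_batch` are hypothetical names, and the token ids are made up. This version pads to the longest sequence in the batch rather than a global maximum, which is one common choice but not the only one, and it returns a mask so later layers can tell real tokens from padding.

```python
import numpy as np

PAD_ID = 0  # hypothetical id reserved for the padding token

def pad_batch(sequences, pad_id=PAD_ID):
    """Pad variable-length token-id lists to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for row, seq in enumerate(sequences):
        batch[row, :len(seq)] = seq
        mask[row, :len(seq)] = True  # True marks real tokens, False marks padding
    return batch, mask

batch, mask = pad_batch([[5, 8, 2], [7], [3, 9, 4, 6]])
```

The mask is what lets a model ignore the padded positions, so the padding value itself never needs to carry meaning.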
dt[f"rmean"] does anyone know what that f means?
which language?
btw go to regular help for these type of questions
I wanted to do some research in data-science and this channel has just become general help
Thats why i posted here
I had no idea it was basic python, I am sorry, I would not ask here, my bad 
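For reference, the `f` prefix marks a Python f-string (formatted string literal). Without any `{}` placeholders it behaves like an ordinary string, so `dt[f"rmean"]` looks up the same key as `dt["rmean"]`. The `window` variable below is made up to show what the prefix is normally for.

```python
window = 7  # hypothetical value, only here to fill the placeholder

plain = "rmean"
same = f"rmean"            # no {} placeholders, so identical to the plain string
named = f"rmean_{window}"  # the expression inside {} is evaluated and inserted
```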
Actually different question, is it better to have a nn with increasing # of neurons per layer, decreasing or identical
Depends on the problem, the nn
I'd suggest trying several combinations of each if you're able
There is not a pattern per se from what i have seen
This is mostly empirical unless you are hyperaware of what you are doing
You can use a library like Optuna to run a search over those parameters
Assuming that your model trains in a reasonable amount of time. If it takes hours/days it may not be feasible to do that type of experimentation
I'm trying to strip spaces out of a column, but I'm still finding entries with spaces. Even when targeting a specific string it doesn't appear to work. What might be happening here?
df1.iloc[1293].ABN.strip()
'14 011 062 338'
Doesn't strip just remove the leading/trailing spaces and not all spaces in a string? You should use .replace(" ", "")
hmm.. it looks like that's the case.. I must have misread the description.. thanks!
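For reference, a small sketch of the difference on a whole column, using the pandas string accessor. The frame is a stand-in: the first value is the one from the question with spaces added at the ends, and the second is made up.

```python
import pandas as pd

df1 = pd.DataFrame({"ABN": [" 14 011 062 338 ", "51 824 753 556"]})

# .str.strip() only removes whitespace at the two ends of each string.
stripped = df1["ABN"].str.strip()

# .str.replace removes every space; regex=False treats " " as a literal.
no_spaces = df1["ABN"].str.replace(" ", "", regex=False)
```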
I cant give you an answer, like most people here, since its a very specific problem. You should try by finding the number of layers you want to use and then why. Then try to find how many neurons you need to do a specific task, like what does this group of neurons accomplish etc etc. That's why i said its a very specific problem and you can only find out by trying different things out.
Alright then
For example in a CNN you would have layers where they accomplish certain things, so you could try and gauge from that, but i am not familiar with what type of nn you are using and as to how it operates. Either read more about it, or just try and repeat
If you really want to understand, i would suggest you find some good papers about NN and see how the team behind them arrived at certain assumptions such as layers etc. That might give you an insight
Well I'm using one convolutional layer and a couple of dense layers but thanks for the advice
Hoping it would find patterns like pins, forks etc then using the dense layers to evaluate those
You could also try and see how similar teams have achieved similar NN, and take this as a starting point
While it might be out of your scope i would suggest looking into a Reinforcement Learning model free scenario
Where your NN updates the Q
I need some assistance with learning data frames
Any time you ask a question, always immediately put enough information out to start answering it. Don't introduce a topic and wait for someone to show interest.
Hello
I have a date column in dataframe
The dates are in
```
04/03/2017
05/03/2017
```
format
I want it in
4/3/2017
5/3/2017
How can I do this? ping me when replying
!docs pandas.Series.dt.strftime
```
Series.dt.strftime(*args, **kwargs)
```
Convert to Index using specified date_format.
Return an Index of formatted strings specified by date_format, which supports the same string format as the python standard library. Details of the string format can be found in [python string format doc](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
I tried but I am getting some weird output
Not what I am expecting
I want to remove zero from day and month value
I cannot comment without knowing what code you ran and what the output was.
Check the date1 column
Do u get my code and output?
Sorry, but I cannot look at this. Try opening Discord on that computer and copy/pasting it into the chat.
```
new_df.date1 = pd.to_datetime(new_df.date1).dt.strftime('%b/%d/%Y')
```
Current data frame with date1 column
@serene scaffold do u get my code and output
It appears that it is not possible to do it with strftime, but the alternatives are a lot less elegant.
I guess you could use a regular expression
pd.to_datetime(new_df.date1).dt.strftime('%b/%d/%Y').str.replace(r'/0(\d)/', r'/\1/', regex=True)
See if that ends up looking like you wanted.
boop @lone drum
I am getting this error
In [5]: s = pd.Series(['Apr/09/2017', 'Apr/12/2017'])
Out[5]:
0 Apr/09/2017
1 Apr/12/2017
dtype: object
In [6]: s.str.replace(r'/0(\d)/', r'/\1/', regex=True)
Out[6]:
0 Apr/9/2017
1 Apr/12/2017
dtype: object
It worked when I did it.
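For reference, a sketch of a regex-free way to get `4/3/2017` from `04/03/2017`: parse the strings, then build the output from the integer day, month and year parts, which carry no zero padding. It assumes the strings are day/month/year, which the conversation does not confirm; the third sample value is made up.

```python
import pandas as pd

s = pd.Series(["04/03/2017", "05/03/2017", "12/11/2017"])

# Assumption: the strings are day/month/year; swap the format if they are not.
dates = pd.to_datetime(s, format="%d/%m/%Y")

# .dt.day / .dt.month / .dt.year are plain integers, so no zero padding appears.
no_pad = (
    dates.dt.day.astype(str)
    + "/" + dates.dt.month.astype(str)
    + "/" + dates.dt.year.astype(str)
)
```

The result is a column of strings, so it is best done as the last step before display or export.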
I was going by this screenshot
What does pd.to_datetime(new_df.date1).dt.strftime('%b/%d/%Y') look like if you print it? Please copy and paste the result as text and I'll be happy to continue helping.
