#data-science-and-ml
df1.loc[df1["kwartaal"] == "Q1", ["month", "day"]] = 1, 1
df1.loc[df1["kwartaal"] == "Q2", ["month", "day"]] = 4, 1
df1.loc[df1["kwartaal"] == "Q3", ["month", "day"]] = 7, 1
df1.loc[df1["kwartaal"] == "Q4", ["month", "day"]] = 10, 1
Can this be done more cleanly?
you could use a for loop, I guess. but it looks like what you're trying to do would be better accomplished with datetime instead of ints.
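A minimal sketch of both suggestions, assuming `df1` only needs the first month and day of each quarter. A dictionary lookup replaces the four `.loc` lines. The `year` column is hypothetical (it is not in the original) and is only there to show how to build real dates instead of separate ints.

```python
import pandas as pd

# Toy frame; "year" is a hypothetical column used only for the datetime version
df1 = pd.DataFrame({"kwartaal": ["Q1", "Q3", "Q2", "Q4"], "year": [2021, 2021, 2022, 2022]})

# One lookup instead of four .loc assignments: quarter -> first month of that quarter
quarter_start_month = {"Q1": 1, "Q2": 4, "Q3": 7, "Q4": 10}
df1["month"] = df1["kwartaal"].map(quarter_start_month)
df1["day"] = 1

# Datetime version: pd.to_datetime can assemble dates from year/month/day columns
df1["quarter_start"] = pd.to_datetime(df1[["year", "month", "day"]])
print(df1)
```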
How to make redqueen.py?
idk what that is
I think I understand but an example is welcome 😄
Hi guys, I have one small take-home test.
The data has an ID column and some values in another column.
My requirement is:
If I enter an ID number, the code should return the rows whose values are similar to the values for the ID I entered.
ahem, so i read a csv with pandas, (columns are ID Timestamp Contents Attachments), how would i use the data from the Contents in tensorflow?
nvm
i got it into a TensorSliceDataset
how do i fit it into a model
question about general AI: I'm trying to find a pattern in strings (I'm going to try to use this for reaction prediction in chemistry). Does anyone know a type of AI, library, or algorithm I could use or research to go more in depth into this topic?
thanks so much!
pattern in strings? like what?
because if your problem can be solved with an exact series of steps, you should just do that
basically (and this might just be me being an idiot lol) I'm trying to find a pattern between chemical equations and their results, because they can't be modeled mathematically (to keep it simple), so something that can realise that x+y=z, but if a is added, x+y+a=b
I don't mean to confuse you with the plusses; it's not a mathematical equation but a chemical one
why can't you model them mathematically? doesn't there have to be the same number of moles of each element on either side, or something?
yeah, but it gets a lotttttt more complicated :/ the moles on each side are for balancing, and I can do that easily enough. the issue is predicting an output, such as, if given
H2 + O2, how do you know it gives H2O (all unbalanced)?
and really there isn't any easy way to do that
since there are far too many exceptions, reasons for interactions and all sorts of nonsense
so instead im trying to just feed it like 100 million example equations and hope it can find the relationships itself instead of having to code them all
do you know what feature engineering is?
no sorry :/
it's where you take the inputs that you're given, and derive new inputs that help you figure out the outputs
ohhh
are there any libraries or methods you would recommend for a text-based version? or is it all the same kind of idea?
for your inputs, you need to extract information that a chemist would use to figure out the answer
like, you can write something that determines if there's at least two Hs and at least one O in the input, and then have has_water as a feature
a feature is like... a property of the input
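A minimal sketch of the `has_water` idea above: parse the reactant string into per-element atom counts and derive features from them. The regex and function names are illustrative assumptions; this parser does not handle parentheses such as `Ca(OH)2` or leading coefficients such as `2H2`.

```python
import re
from collections import Counter

# One element symbol (capital letter + optional lowercase letter) and an optional count
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

def element_counts(reactants):
    """Count atoms per element in a string like "H2 + O2" (no parentheses, no coefficients)."""
    counts = Counter()
    for species in reactants.split("+"):
        for symbol, number in ELEMENT.findall(species.strip()):
            counts[symbol] += int(number) if number else 1
    return counts

def extract_features(reactants):
    counts = element_counts(reactants)
    return {
        "n_H": counts["H"],
        "n_O": counts["O"],
        # The feature from the discussion: at least two H and at least one O present
        "has_water": counts["H"] >= 2 and counts["O"] >= 1,
    }

features = extract_features("H2 + O2")
print(features)
```

Each dictionary becomes one row of numeric/boolean inputs for a model, which is the kind of input the feature engineering discussion is pointing at.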
is there any way I can have it derive that? since there are a helluva lotta different reactions, combinations, and exceptions that I would need to code
I don't know, I've never done anything chemistry related with AI
why do you need to do this? because if you don't already have some ML experience, this sounds like a very challenging first project.
yeah :/ it's rlly annoying since very small things can change everything, and there are so many properties that can affect every part of the equation 😩
it's a project I'm trying to develop, hopefully for colleges, prolly just cuz it's fun lol
this channel is mostly about helping beginners with AI. you might need to go somewhere that's more about chemistry. because reducing the problem to "I need to detect patterns in strings" is so generalized from what you're really trying to do, I don't think there's an answer to that question that will solve it.
alr, tysm for the help!
Hey, it's me again but with another question 
I have a df with ~9500 rows, and I need to find the 'period' of 500 rows that has the highest sum. Can I do that without having to loop through it a bajillion times?
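One loop-free way to do this, sketched with random sample data and an assumed column name `valor`: a rolling sum gives the total of every 500-row window in one pass, and `idxmax` returns the label of the last row of the best window.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"valor": rng.normal(size=9500)})  # sample data standing in for the real df

window = 500
window_sums = df["valor"].rolling(window).sum()  # sum of each window ending at that row
end = window_sums.idxmax()                       # label of the LAST row of the best window (NaNs skipped)
best = df.loc[end - window + 1 : end]            # .loc slicing is inclusive on both ends
print(end, window_sums.max())
```

This assumes the default `RangeIndex`; with another index, use positions (`window_sums.to_numpy().argmax()` with `iloc`) instead of labels.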
Dataframe looks like this
Also, how can I filter out every row (datetime format for the column) that has the same month AND year as another row?
Do you want to (a) keep the first/last valor value of the month and simply ignore the rest, or (b) average or sum them? If (b), you just need to use groupby with the appropriate aggregation function.
has anyone ever managed to run code from either the stylegan2 or stylegan3 repos?
I'm a huge fan of how advanced these gans are at creating novel human faces and I'd love to play with the code
but I'm told stylegan2 is basically abandoned, and that I should use this code instead
and the newest version is stylegan3
but I've been trying for months to get any of this code to run on both my windows computer and my ubuntu virtual machine
and every time I solve an error, a new error replaces it, until I run into an error that has an issue on the github project with no solutions, or the only solution given didn't work for me
and when I run into an error that isn't on stackoverflow, I really just feel like I've hit a dead end and it's impossible
is it just me, or are these codebases so outdated now that the only way they'd work would be with the exact versions of their dependencies that they released with?
I'd love to know if anyone else can manage to run the generate.py files in either of these projects, and if so, what they did so I can copy it
The inner product cosine formula is a theorem in R^2 and R^3 by the law of cosines. In higher dimensions, the inner product cosine formula is a result of how we define angle.
Why then does cosine similarity work in higher dimensions? Seems to me that angle only has meaning in R^2 and R^3.
what one does is define the angle pairwise between two vectors
since vectors don't have a location in space, only a length and orientation, you can consider both vectors as starting at the origin and point toward a point in space
this gives you 3 points in space (the origin and the two endpoints), which, as long as the vectors aren't parallel, uniquely define a 2D plane
the angle between the vectors is computed on that plane
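A numerical check of the plane argument, assuming u and v are not parallel: build an orthonormal basis of the plane they span (Gram-Schmidt), measure the angle there with ordinary 2D trigonometry, and compare its cosine with the inner-product formula <u, v> / (|u| |v|). The function names are illustrative.

```python
import numpy as np

def cosine_via_inner_product(u, v):
    # The definition used in R^n: cos(theta) = <u, v> / (|u| |v|)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def angle_in_spanning_plane(u, v):
    # Orthonormal basis (e1, e2) of the plane spanned by u and v, via Gram-Schmidt;
    # assumes u and v are not parallel, otherwise w is the zero vector
    e1 = u / np.linalg.norm(u)
    w = v - np.dot(v, e1) * e1
    e2 = w / np.linalg.norm(w)
    # Coordinates of v inside that plane (u lies along e1 by construction)
    x, y = np.dot(v, e1), np.dot(v, e2)
    # Ordinary 2D angle between the e1 axis (where u points) and v; y >= 0, so it is in [0, pi]
    return float(np.arctan2(y, x))

rng = np.random.default_rng(0)
u, v = rng.normal(size=6), rng.normal(size=6)  # two random vectors in R^6
theta = angle_in_spanning_plane(u, v)
print(np.cos(theta), cosine_via_inner_product(u, v))
```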
Keep the last and ignore the rest.
this is the same as how the 2-norm (length, or Euclidean distance) of a vector is computed: you can compute it for 2 coordinates at a time and see that this easily extends to higher dimensions by just substituting repeatedly. this is equivalent to drawing 2D planes and using the Pythagorean theorem repeatedly
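The "2 coordinates at a time" idea can be written directly: `math.hypot(a, b)` is the Pythagorean theorem in one plane, and folding it over the coordinates gives the full Euclidean norm. A small sketch:

```python
import math
from functools import reduce

def norm_by_repeated_pythagoras(coords):
    # hypot(a, b) = sqrt(a**2 + b**2): fold one new coordinate in at a time,
    # each step being the Pythagorean theorem in a 2D plane
    return reduce(math.hypot, coords, 0.0)

length = norm_by_repeated_pythagoras([3.0, 4.0, 12.0])  # sqrt(9 + 16 + 144) = 13
print(length)
```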
Hmm. So you are saying this reduces to a 2D plane where angle makes sense?
Just finished my final data science project for a while
i'd love it if someone could review it really quickly
it's pretty short
but i want to make sure i got all the technical parts of it correct
pretty much. at least in R^n. in general, one does do it the other way around: one computes an inner product, and attaches the meaning of "angle" to it
share the project link, maybe someone here can review it
"Here a K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is “on” at a time." Can anyone please tell me what does a K-level qualitative variable mean in dummy variables?
you'll find this as "one-hot encoding" in google
the idea is that you have a list of categories your object can fall into, or a list of adjectives if you prefer
only one of them can be true at a time
so let's say something like [big, medium, small, tiny]
if your object is big, you encode this as [1,0,0,0]
so thats it??
yep
so among the given answers, only one answer will be correct. but what's the use of the term K in this?
there are K answers to choose from
in the example i gave you, K = 4
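A small sketch of the K = 4 example: each label maps to a vector of K bits with exactly one bit on. Every such vector has norm 1, which is the "same norm" property mentioned later in the thread. (`pandas.get_dummies` does the same thing for a DataFrame column.)

```python
import numpy as np

categories = ["big", "medium", "small", "tiny"]  # K = 4 mutually exclusive levels
position = {label: i for i, label in enumerate(categories)}

def one_hot(label):
    # Exactly one of the K bits is "on"
    vec = np.zeros(len(categories), dtype=int)
    vec[position[label]] = 1
    return vec

encoded = np.array([one_hot(s) for s in ["big", "small", "tiny"]])
print(encoded)
```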
ohh so it is already assumed that one out of those 4 will be correct right?
you need to have a good reason to believe this representation makes sense
i chose those adjectives because they are mutually exclusive
it doesn't make sense to do this with, for example, [big, small, blue, red]
oh
since 2 could be true at a time
then you have the problem that the possible options don't have the same norm
but minimum 1 will be correct right??
that's the idea, 1 has to be correct no matter what
if your vectors don't have the same norm, the gradients get really nasty after a few iterations
all righty
hello, does anyone have an e-book or link about generative adversarial networks (GANs) for generating images?
my teacher gave me a project but I'm still clueless about GANs (I've been trying to find and study GANs on the internet).
What is the best approach to image segmentation if I have color ranges?
For example:
Range1: rgb(255,0,0) - rgb(255,0,10)
Range2: rgb(255,255,0) - rgb(255,255,10)
computing inequalities? 😛
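"Computing inequalities" per channel, as a NumPy sketch: a pixel belongs to a range when every channel lies between the lower and upper bound (inclusive). This assumes the image array is in R, G, B order; `cv2.imread` loads B, G, R, so reverse the bounds in that case. OpenCV's `cv2.inRange(image, lower, upper)` performs the same inclusive check and returns a 0/255 mask.

```python
import numpy as np

def mask_in_rgb_range(image, low, high):
    """True where every channel satisfies low <= pixel <= high (bounds inclusive)."""
    low = np.asarray(low)
    high = np.asarray(high)
    return np.all((image >= low) & (image <= high), axis=-1)

# 2x2 RGB test image (assumption: channels are in R, G, B order)
image = np.array([[[255, 0, 5], [255, 255, 3]],
                  [[0, 0, 0], [255, 0, 20]]], dtype=np.uint8)

range1 = mask_in_rgb_range(image, (255, 0, 0), (255, 0, 10))
range2 = mask_in_rgb_range(image, (255, 255, 0), (255, 255, 10))
print(range1, range2, sep="\n")
```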
Do you guys agree?
Context - Deep Learning, Activation Function
sounds about right
if you're dealing with RNNs, tanh is usually used instead of ReLU
Please help, PANDAS
I’ll guess this will work for you: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.last.html
Or these will work: https://stackoverflow.com/questions/41525911/group-by-pandas-dataframe-and-select-latest-in-each-group
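For the "keep the last and ignore the rest" case, a sketch with made-up sample data (column names `data` and `valor` as used elsewhere in the thread): sort by date, group by calendar month (a monthly `Period` carries the year too), and keep the last row of each group. `GroupBy.last()` returns the last non-null value of each column separately, so with missing values it can mix rows; `tail(1)` keeps whole rows.

```python
import pandas as pd

df = pd.DataFrame({
    "data": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03", "2022-01-10"]),
    "valor": [1.0, 2.0, 3.0, 4.0],
})

df = df.sort_values("data")
year_month = df["data"].dt.to_period("M")      # e.g. Period('2021-01', 'M')
last_per_month = df.groupby(year_month).tail(1)  # last full row of each year-month
print(last_per_month)
```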
why does it keep saying invalid syntax
Can someone explain what we mean by modality?
Anyone here rly good at CV could you teach me segmenting techniques?
i think you'd want to create a unique dictionary containing the times, then do a groupby on the name sorted by the time, and find the lowest time in the dict that isn't in the time of the groupby
anyone willing to help with a data science/ai problem? (in a Jupyter Notebook)
is there a simple way to filter out dataframe columns with equal name and if the combined values of those columns are less than a specific value?
im not sure what you mean
Hi! would AI be a good career? some people told me that it'd be hard to get a job in that field.
If not, what's more hip?
and, is theory > practice? (if you had to choose a speciality that has only one of them)
Thanks!
Takes a couple years to learn
but is it worth it? in terms of ease of getting a job, and its salary?
Yes if you’re able to cope with the monumental work it takes to get there
Would you suggest any other career path that's worth it other than AI?
I don’t know what type of person u are so no
Management, consulting, anything easy that pays decently may be good
If u can’t sit down hours on end and do computational tasks
And spend over a year learning full time
Another downside to AI: if you aim to reach the top but don't have an insanely high IQ and a PhD in math, I don't think you'll be able to work on research tasks for big tech
Take that with a grain of salt tho, but yeah making it to the top in such a field is not attainable
I doubt I’ll ever make it past senior ds for a medium company
Oh and don’t forget douchebag managers who can’t code and think they know more than u
If u can stomach all of the above I’d say 100% do it it’s worth it
what do we mean by the signal and noise components in a dataset?
You have what it takes to do it. 💪🏿💪🏿
Nice and good luck, do u like python
Yep I like it so far
Thanks!
Good then it will be fun for u
Did u take stats in school? If so it will make life very easy
Signal could be a linear relationship or something and noisy component could be normally distributed error or something like that
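A toy illustration of that answer, with made-up numbers: the signal is a straight line y = 2x + 1, the noise is normally distributed error with standard deviation 1.5, and a linear fit recovers the signal approximately while the residuals estimate the noise.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 200)

signal = 2.0 * x + 1.0                     # the structure we want to learn
noise = rng.normal(0.0, 1.5, size=x.size)  # random, normally distributed error
y = signal + noise                          # what we actually observe

# A linear fit recovers (approximately) the signal's slope and intercept
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)     # what's left over estimates the noise
print(slope, intercept, residuals.std())
```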
Yepp, still will next year
you could do something like business analytics
join the dark side
afaik they do not need PhDs in math
we make da dashboard
no seriously to this day i have no idea what a business analyst does besides analyze "business data" like customer churn etc.
Anyone here know how to score segmentation accuracy
Is it how many ground truth pixels were correctly guessed?
In a certain area around the segment
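Pixel accuracy (the fraction of all pixels labelled correctly) is one option, but overlap scores such as IoU (Jaccard) and Dice are more common for segmentation because they ignore the easy background. A sketch for binary masks; the demo shows a model that predicts "background everywhere" still getting 96% pixel accuracy while IoU and Dice are 0.

```python
import numpy as np

def segmentation_scores(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    pixel_accuracy = float((pred == truth).mean())
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Convention chosen here: if both masks are empty, count it as a perfect match
    iou = float(intersection / union) if union else 1.0
    dice = float(2 * intersection / total) if total else 1.0
    return pixel_accuracy, iou, dice

truth = np.zeros((10, 10), dtype=bool)
truth[4:6, 4:6] = True                   # small object: 4 of 100 pixels
pred_background = np.zeros_like(truth)   # a model that always predicts background
scores = segmentation_scores(pred_background, truth)
print(scores)
```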
Don't ask to ask, just ask.
OKKKK
I want to cluster a list of names according to 5 different parameters, these parameters have scores ranging from 1 to 100
I did this using the following code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# This part is used to compute the optimal number of clusters using an elbow curve
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1, 12)).fit(df_target)
visualizer.show()
# Uncomment the previous section to compute the optimal number of clusters

# We will fix the number of clusters to 4 in the following
X = df_target.values.astype(float)

def calculate_cost(X, centroids, cluster):
    # Sum of Euclidean distances to the assigned centroid, using ALL features
    return np.linalg.norm(X - centroids[cluster.astype(int)], axis=1).sum()

def kmeans(X, k, seed=None):
    rng = np.random.default_rng(seed)
    # Initialise with k distinct observations picked at random
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    while True:
        # Distance from every observation to every centroid over all 5 columns
        # (the previous version only used columns 0 and 1)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        cluster = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster ends up empty
        new_centroids = np.array([
            X[cluster == j].mean(axis=0) if np.any(cluster == j) else centroids[j]
            for j in range(k)
        ])
        # If the centroids did not move, we have converged
        if np.allclose(centroids, new_centroids):
            return new_centroids, cluster
        centroids = new_centroids

k = 4
centroids, cluster = kmeans(X, k)

# Create the figure (a 3D plot can only show 3 of the 5 features)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Generate the values
x_vals = X[:, 0]
y_vals = X[:, 1]
z_vals = X[:, 2]
cluster = cluster.astype(int)
But this doesn't seem to work
do you have any suggestions, please?
For now the plot is in 3d
However given that I have 5 parameters
I didn't know how to proceed
The way I coded this is something I used for a 3 parameters problem, worked well
I use k-means
I looked on Stack Ex, apparently kmeans can work for problems up to 7 dimensions
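K-means itself works with any number of features; the 3D limit only applies to the plot. One common approach, sketched here with scikit-learn and random stand-in data (the real `df_target` isn't shown): cluster in the full 5-D space, then project to 3 dimensions with PCA only for plotting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical stand-in for df_target.values: 200 names scored 1-100 on 5 parameters
rng = np.random.default_rng(0)
X = rng.integers(1, 101, size=(200, 5)).astype(float)

# Cluster in the full 5-D space
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Reduce to 3-D ONLY for the plot; the clustering above already used all 5 columns
pca = PCA(n_components=3).fit(X)
X_3d = pca.transform(X)
centroids_3d = pca.transform(kmeans.cluster_centers_)
print(pca.explained_variance_ratio_.sum())  # share of the variance the 3-D view keeps

# Plot, e.g.: ax.scatter(X_3d[:, 0], X_3d[:, 1], X_3d[:, 2], c=labels)
```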
I have 4 graphs I plot with SHAP:
shap.summary_plot(shap_values[0], X_test, feature_names = X2.columns)
shap.summary_plot(shap_values[1], X_test, feature_names = X2.columns)
shap.summary_plot(shap_values[2], X_test, feature_names = X2.columns)
shap.summary_plot(shap_values[3], X_test, feature_names = X2.columns)
Is there a way to cycle between them with correspondent buttons?
Are you describing a web dev problem?
nope
just jupyter
displaying results obtained with randomforest
I was thinking of using a callback function that clears the plot area and replots the graph selected when the user clicks
but i don't know how to do that with SHAP
yes interactive
I'm not super familiar with JS and it would be preferred to use python in this task
could I use matplotlib maybe?
I don't know how well it works with SHAP
Plotly is what you need
professionally, how are trained NNs handed over to customers for deployment??
¿
thats something asked during the business requirements gathering stage
usually not after the fact

Hey can someone please explain how LSTM neural networks work I have a very specific doubt and I cant find the answer anywhere.
I managed to get it done with ipywidgets 👍
tell me i m hungry, its bijness
Good evening guys!
In Pandas, I have a df about 6000 rows deep, and I want to sum a specific column 500 rows at a time, something like this in Excel:
Next cell over would be =SUM(C2:C501) and so on
The df is exactly like this sheet, without the sum column
do you want a rolling sum of every window of 500 rows? or do you want to group the dataframe into each block of 500 rows, and sum it?
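The two options side by side, sketched on made-up data (the column name `valor` is taken from later in the thread): non-overlapping blocks via `groupby` on `row_number // 500`, and overlapping windows via `rolling`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"valor": np.arange(1, 1201, dtype=float)})  # 1200 rows of sample data
block = 500

# Non-overlapping blocks: rows 0-499, 500-999, 1000-1199 (the last block is partial)
block_sums = df.groupby(np.arange(len(df)) // block)["valor"].sum()

# Overlapping windows: one sum per row, covering that row and the 499 before it
rolling_sums = df["valor"].rolling(block).sum()
print(block_sums)
```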
!docs pandas.Series.rolling
Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
Provide rolling window calculations.
Yeah, I tried it, but it threw DataError: No numeric types to aggregate. I even tried to use pd.to_numeric on the valor column, but still got the same error
can you do selic_rates.dtypes and show the result as text?
Sure thing
so valor is fine. it's probably data that is giving you issues. try selic_rates['valor'].rolling(500).sum()
Worked, thank you so much! 🙂
do you understand why adding datetimes together doesn't make sense?
dunno what youre saying. good luck
Yeah absolutely, I thought that the on='valor' parameter was limiting the columns to sum 😅
The results are different from Excel somehow tho
Probably decimal place difference? Python uses more so the sums are a little bigger?
you should check the default behavior of rolling, and whether or not it's centered on the current value, or the next 500 values, or the previous 500. when you set the window to 500, which 500 is that?
but it looks like the answer is only different for the first one?
It looks like it's displaced by one
I don't like jumping back and forth with my eyes 😛
Like, 501 on Excel has the value of 500 on Python, 502 on Excel of 501 on Python and so on
From the looks of it, the previous 500, since rows 1-499 are NaN, right?
I suppose so
Looks like Python has one less row at the top, and one extra at the bottom
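A small sketch of how `rolling` labels its windows, which is usually where these off-by-one differences come from. By default each result sits on the LAST row of its window; `center=True` moves it to the middle row. The Excel mapping below is an assumption: with one header row and no skipped rows, DataFrame index `i` corresponds to Excel row `i + 2`, so with a window of 500 the first full pandas window (index 499) covers Excel rows 2 to 501.

```python
import pandas as pd

s = pd.Series(range(1, 11), dtype=float)  # 10 sample values: 1.0 ... 10.0
window = 3

# Default: the label is the LAST row of the window, so the first window-1 rows are NaN
trailing = s.rolling(window).sum()

# center=True labels each window by its middle row instead
centered = s.rolling(window, center=True).sum()

# Assumed mapping (one header row in Excel): DataFrame index i -> Excel row i + 2
excel_row_of_first_full_window = trailing.first_valid_index() + 2
print(trailing.tolist(), excel_row_of_first_full_window)
```

Comparing values through that index-to-row mapping, rather than by screen position, makes it easier to tell a real difference from a shifted label.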
how would i make my own model with a dataset?
nvm
does anyone know if pandas supports selecting specific table objects when reading from excel? note the blue table and top left where it says "Table4"
so Table4 might be in an arbitrary position
grabbed that photo from https://stackoverflow.com/questions/54241345/pandas-read-a-table-from-excel
have you already checked openpyxl? pandas is more about tabular manipulation, whereas openpyxl is more about interfacing with excel.
if you read that sheet with pd.read_excel, whatever you get back is probably the most you're going to get.
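A sketch of the openpyxl route, assuming the file uses a real Excel table object named `Table4`: openpyxl exposes each worksheet's tables through `ws.tables`, and a table's `.ref` is its cell range (for example `"C3:E5"`), which can be read cell by cell into a DataFrame. The function name is illustrative, and the demo builds a small workbook in memory because the original file isn't available.

```python
from io import BytesIO

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.worksheet.table import Table


def read_excel_table(source, table_name):
    """Return the named Excel table (e.g. "Table4") as a DataFrame, wherever it sits."""
    wb = load_workbook(source, data_only=True)  # data_only: cached formula results, not formulas
    for ws in wb.worksheets:
        if table_name in ws.tables:
            table_range = ws.tables[table_name].ref  # e.g. "C3:E5"
            rows = [[cell.value for cell in row] for row in ws[table_range]]
            header, *body = rows  # assumes the table has a header row
            return pd.DataFrame(body, columns=header)
    raise KeyError(f"no table named {table_name!r}")


# Demo: a workbook in memory with a table that does not start at A1
wb = Workbook()
ws = wb.active
for r, values in enumerate([["id", "name", "score"], [1, "a", 0.5], [2, "b", 0.7]]):
    for c, value in enumerate(values):
        ws.cell(row=3 + r, column=3 + c, value=value)
ws.add_table(Table(displayName="Table4", ref="C3:E5"))
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

df = read_excel_table(buffer, "Table4")
print(df)
```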

When converting a pandas column to datetime, is there a way to format the output? I'm currently passing format="%d/%m/%Y" but it's just telling pandas what the input format is
I tried using .strftime on the column later, but it converts them back to str
Also, probably last question: if I have a df with a date (daily) column and a rate column, but some days are missing, what is the fastest way to fill in the missing days copying the rate of the previous day?
formatting the output in what context? just how it displays in the console, or somewhere else?
Display mostly, and also cut out the time stamp, I only want the dates
mm/dd/yyyy is very confusing to a non-American
be sure that whatever you do, you don't change the underlying data. this SO post goes over the distinction and your options https://stackoverflow.com/questions/38067704/how-to-change-the-datetime-format-in-pandas
I actually looked at that post, but hadn't seen the .style.format option, thanks 😄
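A sketch of the "don't change the underlying data" advice, with made-up values: keep the real column as `datetime64` for calculations and make a formatted string copy only for display. `.dt.strftime` returns strings, which is why applying it to the original column turned the dates back into `str`.

```python
import pandas as pd

df = pd.DataFrame({
    "data": pd.to_datetime(["03/01/2022", "04/01/2022"], format="%d/%m/%Y"),
    "valor": [0.1, 0.2],
})

# Keep the real column as datetime64; make a string copy only for display
display_df = df.assign(data=df["data"].dt.strftime("%d/%m/%Y"))
print(display_df)
```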
Any ideas about this friend? @serene scaffold
have you looked at this? I know I talked about interpolation recently, but I don't remember if it was with you https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html
It wasn't, will check out! 😄
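For "copy the rate of the previous day" specifically, a forward fill matches the request more closely than the default linear interpolation, which would produce in-between values instead. A sketch with made-up dates, assuming one row per date: `asfreq("D")` inserts the missing calendar days as NaN rows and `ffill()` copies the last known rate into them.

```python
import pandas as pd

# Hypothetical daily rates with 5 and 6 January missing
df = pd.DataFrame({
    "data": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-07"]),
    "valor": [0.1, 0.2, 0.3],
})

filled = (
    df.set_index("data")
      .asfreq("D")   # insert the missing calendar days as NaN rows
      .ffill()       # copy the previous day's rate into them
      .reset_index()
)
print(filled)
```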


How do I deal with inconsistent city names across my data?
some city names are added, some are dropped, some are renamed
so the same city might have two or more names in the data. like "Washington" and "Washington, DC" for the same one?
I have data from 2014 to 2019. Each year has state, district, and subdistrict data. Within one year, let's say 2014, the data is consistent, but if I go to the next year, some districts/subdistricts are added or removed, or are spelled differently than the year before.
States are consistent because the government can't directly change the name of a state.
but as I go deeper to subdistrict, things gets messy.
so these are all places in the united states? how many unique names are we dealing with here?
India
look into "named entity resolution india places"
So, I made the list of distinct state-district-subdistrict combinations for all years, and there are about 6k for each year.
unless someone else has already made a look-up table that you can use, you will probably have to accept some inaccuracy, because chances are you don't want to disambiguate all of those by hand.
what i am trying to do is, i have made distinct list of state-district-subdistrict pair across years and stored in an array.
Now, i am finding setdiff between those arrays
so I get the pairs that appear in only one of the two lists
for eg:a=[1,2,3,4,5,6]
b=[4,5,6,7]
output: [1,7]
fuzzy string matching may or may not help
heard of dedupe?
Hi, is there a community for Matplotlib related questions?
hi can i get help with this question?
To better recommend songs, Spotify decides to create an agent trained recommender as a model of reinforcement learning. After training, the agent tends to recommend the same songs over and over again. How could this behavior be corrected during the learning process?
a) Decreasing the exploration decay rate.
b) Increasing the exploration decay rate.
c) It is not possible to correct this behavior.
d) Increasing the learning rate.
e) Increasing the discount factor.
Here is a good place for these as well
Don't know much about RL but is it something to do with exploration?
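Yes, this looks like an exploration question. Below is a sketch of a common epsilon-greedy setup with an exponentially decaying exploration rate (an assumption: the course may define "exploration decay rate" differently). The faster epsilon decays, the sooner the agent stops trying new songs and keeps exploiting the ones it already rates highest.

```python
import math
import random

def epsilon_at(step, decay_rate, eps_start=1.0, eps_min=0.05):
    """Exponentially decaying exploration probability (epsilon-greedy)."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay_rate * step)

def choose_song(q_values, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore: any song
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit: best known song

# After 500 steps, a slower decay still explores a lot; a faster one almost never does
eps_slow_decay = epsilon_at(500, decay_rate=0.001)  # about 0.63
eps_fast_decay = epsilon_at(500, decay_rate=0.01)   # about 0.056
print(eps_slow_decay, eps_fast_decay)
```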
This is actually why I joined today. Haha. If you find one, let me know.
I'm happy to try to help. I'm deep in the same stuff right now.
Reinforcement Learning isn't my forte. Hopefully, people with more experience on that niche can answer your questions
Is anyone skilled at object-oriented programming? I have a problem I need help with
What's everyone's favorite ML algorithm?
Don't ask to ask. Simply ask your question right away. If you had done that, someone would probably be answering your question(s) by now
Lately, XGBoost & CatBoost
XGB
Okay, I'm trying to create a table using the tabulate module in Python. I've stored the data in a list of object instances of a class called Shoe, and when I run the tabulate function I get a runtime error: Shoe object not iterable. Why is that? Also, how does one add a variable that is not initialized in __init__ to each object instance by running a for loop through the instances stored in a list? Like so:
for obj in list_of_objects:
    value = obj.quantity * obj.size
I want to store the variable "value" as a property of each object instance.
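A sketch answering both parts, with hypothetical `Shoe` fields (only `quantity` and `size` come from the question). `tabulate` expects an iterable of rows, such as a list of lists, and a `Shoe` instance isn't iterable, so each object is converted to a row first. Any attribute can be added after `__init__` simply by assigning to it.

```python
class Shoe:
    def __init__(self, product, size, quantity):
        self.product = product  # hypothetical field
        self.size = size
        self.quantity = quantity

shoes = [Shoe("Runner", 42, 3), Shoe("Boot", 44, 5)]

# New attributes can be attached to each instance just by assigning them
for shoe in shoes:
    shoe.value = shoe.quantity * shoe.size

# One list per object: this is the "iterable of rows" shape tabulate expects
headers = ["Product", "Size", "Quantity", "Value"]
rows = [[s.product, s.size, s.quantity, s.value] for s in shoes]
# Then: print(tabulate(rows, headers=headers))
print(rows)
```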
is this a case of overfitting or underfitting. what can i do to improve the model?
I really need someone to help me get my images into arrays properly
I'm reading them in with cv2, but they can't be converted to tensors, so I can't train a model
If anyones good at CV please dm me
If you know the meaning of over-fitting, it should be easy to see from these images if the model is over-fitting
Over-fitting just means that the model is performing really well on the data it is trained on, but is not able to generalize very well
I was trying to run the Dall-E mega model on my gtx 1080, but i seem to run out of vram, when trying to run their example code. Is there any way to add just a little bit of extra vram to my system without buying a super expensive 3080? Does Tensorflow support using the vram of 2 gpus connected through sli as a "pool"?
Are there any other ways i could get this to work?
I have 8GB of vram, which seems to be just a bit under the required amount
I thought you had to wait on a waiting list to use Dall-E?
Is there some download link to run it on your own machine?
Maybe it was Dall-E 2 then
Oh, but the transformer to generate the images is not open source?
Only the text to representation part I guess
This is where i found it
dunno if it is a modified dall-e mini, but it sure has the word mega in it
import wandb
run = wandb.init()
artifact = run.use_artifact('dalle-mini/dalle-mini/mega-1:latest')
artifact_dir = artifact.download()
But i cannot really evaluate it, as my gpu cannot run it :(
I don't think it's possible without getting a different GPU. You could also throw it on a google colab instance (you may need pro) if you just want to tinker.
That was my fear :(
Are you aware of how tensorflow interacts with sli? Could I get away with getting a cheapo 4g vram card and slapping it into my motherboard to have more vram?
I might just tinker with it on my CPU, if I can get that running. I'd like to have this on my remote server sometime anyway. Sad though
Not sure, but since most of the new GPU's don't support SLI, I would think that the support for SLI might start dwindling.
That being said, it looks like keras/tensorflow handle that for you and make it easy, so it's possible that another little gpu for the vram would be helpful.
https://datascience.stackexchange.com/questions/46952/should-i-connect-my-two-gpus-with-sli-or-not-for-keras-tensorflow
is anyone aware of the existence of an AI that can create Python programs? Something like "Create a program that traverses the file-system and finds all Linux distro ISOs and ftps them to this ftp server..." https://openai.com/blog/openai-api/ wow...
WHY
DONT
The errors
STOP
Willing to pay a lot of money to anyone who can fix my problem
Need to load images and masks into resnet
and masks?
!rule 9
resnet takes an image and outputs a class, so what do you even mean by mask?
And yeah no money
I hate to be that guy but hey...
https://copilot.github.com/ this is also pretty close to what you want
Not a solved problem though. It's more of a "helper"
yeah, copilot is neat. Now they need to have ai-assist for understanding large code-bases..
i need to very precisely track the motion of pupils for some task
so i decided to use openflow for that
is it ok to use that?
considering it is ok:
for eye detection i used dlib 81 landmarks,
is it the best for that?
and for this i first need to detect faces
what is the best face detection model?
is it vgg?
*so there are 3 questions here!
i am having trouble saving the image for the pytorch implementation of imagen running locally
https://paste.pythondiscord.com/udedoyuwaw this is my modified code from the repo. got help with the last 4 lines
original: https://paste.pythondiscord.com/unikupabiz
however, when i run it i get this error:
cv2.imwrite(f"image-{i}.png", images)
cv2.error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp:737: error: (-215:Assertion failed) image.channels() == 1 || image.channels() == 3 || image.channels() == 4 in function 'cv::imwrite_'
ive been trying various solutions such as putting in a subfolder, specifying the full path, trying without the f but no luck.
can anyone help? and if anyone know an alternative method of saving the image, it would be appreciated
ping if you know
What's 'images'? The name suggests multiple images?
seems like you can make a batch of images
although its fine if its just 1 at a time
i just cut it down to 1 in my attempt
But can OpenCV save a batch, or just a single image?
where would you put that
This is just image name
After you call imagen.sample
like this?
If it's not jupyter then you want to print it to see it
i wanna save it to my pc locally
ok do i just replace this with the last 4 lines?
sorry if i ask too many questions im not that good at python
Yeah what does it print?
aight lemme run it now, it takes around 4 minutes thats why i wanna make sure
In the last code if you change to
for i, image in enumerate(images):
    cv2.imwrite(f'image-{i}.png', image)
Would it work?
tote(images)
NameError: name 'tote' is not defined
ok ill remove tote and run this
It's type not tote
ah so like this?
Yes
aight
cv2.imwrite(f'image-{i}.png', image)
cv2.error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp:737: error: (-215:Assertion failed) image.channels() == 1 || image.channels() == 3 || image.channels() == 4 in function 'cv::imwrite_'
oh also at the start it says this now <class 'torch.Tensor'>
It complains about the number of channels in the image: for PNG it expects 1, 3, or 4
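A likely explanation, hedged: `cv2.imwrite` wants a NumPy array shaped height x width x channels, but PyTorch image batches are usually shaped (batch, channels, height, width). Iterating over such a batch yields (3, H, W) tensors, which OpenCV reads as an image with W channels, hence the assertion. The sketch below assumes values in [0, 1] and RGB order (check `images.shape`, `images.min()` and `images.max()` first); the function name is illustrative, and a NumPy array stands in for the imagen output.

```python
import numpy as np

def batch_to_opencv_images(batch):
    """Turn a (B, C, H, W) float batch in [0, 1] into uint8 H x W x C BGR images."""
    # For a torch.Tensor, first do: batch = batch.detach().cpu().numpy()
    arr = np.clip(np.asarray(batch, dtype=np.float32), 0.0, 1.0)
    arr = (arr * 255).round().astype(np.uint8)
    arr = arr.transpose(0, 2, 3, 1)  # channels-first -> channels-last
    # OpenCV expects BGR, so flip the RGB channel axis; make each image contiguous
    return [np.ascontiguousarray(img[..., ::-1]) for img in arr]

# Fake batch of two 4x5 pure-red images, standing in for the imagen output
fake = np.zeros((2, 3, 4, 5), dtype=np.float32)
fake[:, 0] = 1.0
images_for_cv2 = batch_to_opencv_images(fake)
# Then: for i, img in enumerate(images_for_cv2): cv2.imwrite(f"image-{i}.png", img)
print(images_for_cv2[0].shape)
```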
i saw smth online about tensors but i didnt quite understand it
Where's that code from?
imagen
Is there a GitHub repo? I see
in what timeframe?
Anyone here able to help me with my computer vision segmentation
With Unet experience
Hi. I am wondering if I could get some clarifications regarding these matters please.
In order to pass data to train an SVM or other ML estimator, given a dataset with nested / hierarchical data like in this SO question, is it always necessary to convert/flatten the dataset so that the facts are all present in each row (each example)?
Follow up question. If it is always necessary (meaning classifiers cannot be directly trained or predict based off hierarchical data (examples)), are the only available techniques to flatten the data to either:
- Group the nested data into one representative value
- Expand the nested data into columns/variables (new features)
Follow-up question 2: do RNNs have the same need for flattened data, or can they handle hierarchical data out of the box?
SO question -> https://stackoverflow.com/questions/66961525/how-to-feed-a-nested-array-into-an-svm-model
these are really limitations of how the procedure is coded, more than anything else. from the mathematical standpoint, these hierarchies don't really have any meaning. it is equivalent to just take out the "nested" data and appending it at the end of the vector, for example
you can write an equivalent (multi)linear transformation nested with your favorite activation function regardless of how you choose to arrange the data. most libraries are written so as to take a single vector as input, and the order in which you do that doesn't matter as long as it's consistent
Thanks for taking the time to answer Edd, but I am not really understanding your answer. Do you mean to say that the limitations exists or they do not? Do you mean to say that applying a multi linear transformation is going to remove the limitations? If so, could you direct me to any paper, git, or google terms I can use to study more such method.
what I mean is that if you were to look only at the math, the limitation is not there. the problem is that the libraries pick a fixed implementation
so it's not really a machine learning problem, but rather a software/API design choice
That I understand. Until that point at least.
beyond that, you have an input vector that has one or more entries that are vectors themselves
you can simply make one long vector out of this
this is kinda like what you said on your point 2, but it's not one-hot encoding
Yep, I was using the term one-hot encoding incorrectly.
So you expand the data: in some cases this expansion is done through pivoting, as in the SO post, or you could also come up with aggregate variables that take the vector from R^n to R (i.e. the mean of the nested vector). Are those the only two ways?
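The two flattening options discussed above, sketched on hypothetical records with two scalar features and one nested vector. Expanding appends the nested values to the end of the feature vector (this needs nested vectors of a fixed length, or padding); aggregating replaces the nested vector with one summary number (here the mean).

```python
import numpy as np

# Hypothetical examples: 2 scalar features plus a nested vector of fixed length 3
examples = [
    {"age": 30, "income": 50.0, "history": [1.0, 2.0, 3.0]},
    {"age": 45, "income": 72.5, "history": [0.0, 4.0, 8.0]},
]

# Option 1: expand - append the nested vector to the end of the flat feature vector
X_expanded = np.array([[e["age"], e["income"], *e["history"]] for e in examples])

# Option 2: aggregate - summarise the nested vector (R^n -> R), here with its mean
X_aggregated = np.array([[e["age"], e["income"], np.mean(e["history"])] for e in examples])
print(X_expanded.shape, X_aggregated.shape)
```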
I'm just now downloading Anaconda, and I was curious why I am unable to check the first checkbox
you could technically turn the data into an ND array of arbitrary shape, not just a vector
is it one or the other?
What do you mean exactly? I am trying to check mark "Add Anaconda3..." and so on.
there are two checkboxes there, and it might be that you can only pick one of them.
I should be able to check mark both checkboxes.
any calculus chads in chat able to tell me why my loss is 1x10^15
learning rate 0.0001 , batch size 1
rate could still be too high
it's already far below default
whats the loss at the 1st iter
maximum, it's only decreasing slowly
but it is decreasing? in that case you can try increasing the learning rate
it seems to converge though
the numbers alone mean nothing, just the trend. it's important to keep in mind that the admissible initial learning rate depends on the smoothness of the network's behavior. things like "default" don't exist
it stops decreasing and increases back to 12x10^15
try a smaller step size, then
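A toy illustration of why "below the default" says little on its own: for gradient descent on the 1-D quadratic loss f(w) = 0.5 * c * w^2, each step multiplies w by (1 - lr * c), so it converges only when lr < 2 / c. With a sharp curvature c = 10^4, a learning rate of 5e-5 converges while 3e-4 blows up. Real networks are not quadratic, so this only sketches the smoothness point made above.

```python
def final_loss(curvature, lr, steps=50, w0=1.0):
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2 (a toy 1-D loss)."""
    w = w0
    for _ in range(steps):
        grad = curvature * w
        w -= lr * grad  # each step multiplies w by (1 - lr * curvature)
    return 0.5 * curvature * w**2

sharp = 1e4                              # a high-curvature ("non-smooth") direction
stable = final_loss(sharp, lr=5e-5)      # |1 - 0.5| < 1: shrinks towards 0
unstable = final_loss(sharp, lr=3e-4)    # |1 - 3| > 1: grows every step
print(stable, unstable)
```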
Hey guys! 🙂
Anyone have a little time and patience to help me figure out why my code is displacing the correct values one day back please?
(There are extra days because I filled in the missing days for other purposes but the daily amount earned for the filled in days are 0)
Anyone know why I’m getting 99% validation accuracy in segmenting
Well, more like 97.8%
I know for a fact it’s too good to be true
if the problem is that calculate_interest is having unexpected behavior, you have to show the definition of that function.
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
Shit, I found the problem
In df.iterrows() is there a way to access the value of the previous row?
Should I store it in some variable?
you can do that, but if you show me the implementation, I can see if there's a more idiomatic approach.
Okay!
So basically I have this df (attached image) with daily dates (`data`) and interest rates (`valor_ajustado` holds them in decimal format for calculations).
Then I have the function: (don't worry about frequency)
What I noticed was that I'm calculating a day's gains using its own daily rate, when I should be using the previous day's rate.
def calculate_interest(df, frequency, capital):
    calculations_df = filter_by_freq(df, frequency)
    answer_df = pd.DataFrame(
        [{'Date': df['data'].values[0], 'Capital': capital, 'Amount Earned': 0}]
    )
    if frequency.upper() == 'DAY':
        current_capital = capital
        for index, row in calculations_df.iterrows():
            # -> This next line
            current_capital *= (1 + row['valor_ajustado'])
            # <- This past line
            new_row = {
                'Date': row['data'],
                'Capital': current_capital,
                'Amount Earned': current_capital - capital
            }
            answer_df = answer_df.append(new_row, ignore_index=True)
    elif frequency.upper() == 'MONTH':
        pass
    elif frequency.upper() == 'YEAR':
        pass
    return answer_df
pandas doesn't effectively support operations where the previous iteration matters. But you need to remove the `df = df.append` part, because that has an O(n^2) runtime.
appending to a dataframe (or numpy array) involves copying the entire dataframe into a new dataframe. it's incredibly inefficient
most people don't. I wish they'd just delete it from numpy and pandas
append everything to a list and convert it to a dataframe at the very end.
Makes sense
So the best way is to store the past rate in a variable every loop?
yeah, that's fine
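A sketch of both suggestions for the DAY branch only, assuming the `data` and `valor_ajustado` columns from the screenshot: rows are collected in a plain list and turned into a DataFrame once, and each day's gain uses the previous row's rate. `calculate_daily_interest` is a hypothetical name, not the original function, and here the first row of `calculations_df` doubles as the starting row. Because compounding is a running product, `shift` plus `cumprod` can also express it without a Python loop; the second function is that alternative.

```python
import pandas as pd


def calculate_daily_interest(calculations_df, capital):
    """Compound `capital` daily, applying each day's gain with the previous day's rate.

    Rows are collected in a list and converted to a DataFrame once at the end,
    instead of calling DataFrame.append inside the loop (which copies the
    whole frame every time).
    """
    rows = []
    current_capital = capital
    prev_rate = None  # the first day has no earlier rate, so it earns nothing
    for row in calculations_df.itertuples(index=False):
        if prev_rate is not None:
            current_capital *= 1 + prev_rate
        rows.append({
            "Date": row.data,
            "Capital": current_capital,
            "Amount Earned": current_capital - capital,
        })
        prev_rate = row.valor_ajustado
    return pd.DataFrame(rows)


def calculate_daily_interest_vectorized(calculations_df, capital):
    """Same result without a loop: shift the rates down one row, then take a running product."""
    growth = (1 + calculations_df["valor_ajustado"].shift(1, fill_value=0.0)).cumprod()
    capital_series = capital * growth
    return pd.DataFrame({
        "Date": calculations_df["data"].to_numpy(),
        "Capital": capital_series.to_numpy(),
        "Amount Earned": (capital_series - capital).to_numpy(),
    })
```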
Okay, thank you so much 🙂
Holy shit
The program was taking ~55s to run and now took 1.5 lol
that tends to happen when you go from O(n^2) to O(n) 😄
mental note. do not use df.append. got it. 
How can I get the total amount per "Card Holder" per "month" from an Excel sheet using pandas/numpy?
https://ibb.co/ZgT4wJP
I managed to split out the year, month, and day
I need to hand in this assignment tomorrow
hello there fellas, I'm learning AI and I'm kinda clueless about how to do what I'm trying to do
trying to analyze a large set of files, then auto-generate a new batch and manually rate them to train the model
not sure on how to do it to be honest
can you be more specific?
midi files
so you're trying to generate music, or what?
just midi files
what do the midi files mean?
umm its like the data of the melodies
I assume you want your model to do something other than generate completely random audio.
yea
for the moment, you are the only one who knows what the midi files are, and what you want the output to be as compared to the input.
this is what u see when u drag and drop a midi file into FL studio
it's an app for making music
a midi file holds data for each note
like the tone, pitch, velocity
any clue @serene scaffold
I know that app! 🙂
@royal hound are you trying to parse midi files?
i can help with the midi parsing/generating but other than that.....I know foo about training a model/ai/ml
Hey guys! Genuinely last question:
def filter_by_freq(df, frequency):
    filtered_df = df.copy()
    if frequency.upper() == 'DAY':
        pass
    else:
        date_obj = filtered_df['Date'].values[0]
        target_day = pd.to_datetime(date_obj).day
        target_month = pd.to_datetime(date_obj).month
        final_date_obj = filtered_df['Date'].values[-1]
        if frequency.upper() == 'MONTH':
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.day.eq(target_day)]
        elif frequency.upper() == 'YEAR':
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.day.eq(target_day)]
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.month.eq(target_month)]
    return filtered_df
How can I also include the very last row of the original df in the .loc? For frequency month I tried `filtered_df = filtered_df.loc[(filtered_df['Date'].dt.day.eq(target_day)) | (filtered_df['Date'].dt.date.eq(final_date_obj))]`, but it didn't work.
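A sketch of one way to do it, assuming `Date` is a datetime64 column and the frequencies are "DAY"/"MONTH"/"YEAR" as in the code above: build the day (and month) mask first, OR it with a mask that flags the last row by position, and index once. This sidesteps comparing against `final_date_obj` (a NumPy datetime64 taken from `.values`) entirely. `filter_by_freq_keep_last` is a hypothetical name.

```python
import pandas as pd


def filter_by_freq_keep_last(df, frequency):
    """Like filter_by_freq, but the final row of `df` is always kept."""
    freq = frequency.upper()
    if freq == "DAY":
        return df.copy()

    first = df["Date"].iloc[0]  # a pandas Timestamp
    mask = df["Date"].dt.day.eq(first.day)
    if freq == "YEAR":
        mask &= df["Date"].dt.month.eq(first.month)

    # Flag the last row by position, so it survives whatever its date is
    is_last = pd.Series(False, index=df.index)
    is_last.iloc[-1] = True
    return df.loc[mask | is_last].copy()
```

If the last row already matches the day filter, the OR simply keeps it once, so there is no duplicate.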
I know how to read and create midis
I just dont know/understand how to implement the ai part
df.groupby('month').count()
Does anyone know how to use a rectangle with text in it as the legend of a chart in Matplotlib?
Something that looks like this in the legend box (two rectangles with different colors and example text in it)
-------------
| 69% | success (percentage)
-------------
-------------
| 31% | failure (percentage)
-------------
does anyone know how to use Theano? I have done `pip install Theano`, but when I try to import it I get an error.
Hey @compact star!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
Could anyone guide me on a bullet hole detection system? I'm quite frankly stuck on what to do next with these images
Any good resources for the AWS DeepRacer competition?
It's a reinforcement-learning-based platform, and we need to develop a reward function that keeps the car on track and finishes the race in the fastest time
You get input parameters like distance from center and track width etc
What algo would be best suited for this scenario
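For DeepRacer, the console handles the RL algorithm itself (PPO, with SAC as an option), so most of the work is the reward. Below is a sketch of `reward_function(params)`, the entry point DeepRacer calls, using the `track_width`, `distance_from_center`, `all_wheels_on_track` and `speed` inputs. The centre-line bands follow AWS's "follow the centre line" example; the speed scaling and `SPEED_MAX = 4.0` are assumptions to tune against your own action space.

```python
def reward_function(params):
    """Sketch of a DeepRacer reward: stay near the centre line, stay on track, go fast."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Leaving the track earns (almost) nothing
    if not params["all_wheels_on_track"]:
        return 1e-3

    # Bands around the centre line, as fractions of the track width
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3

    # Scale by speed so faster driving scores higher; SPEED_MAX is an assumed cap
    SPEED_MAX = 4.0
    reward *= min(params["speed"], SPEED_MAX) / SPEED_MAX
    return float(reward)
```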
anyone got a tool/script about the PCA eigenface algorithm but with simple matrix inputs cause i wanna visualize the steps?
please @ when replying
I want to pick the 4 corners of an image, and n random points in the image. Then I want to create a new image that interpolates between all of those points. How would I do that? I tried SciPy's interp2d function, but interp2d returns its own object and I can't get it back to a 2D array for further processing. Anyone know how to do this?
It's for background subtraction of telescope data
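`interp2d` returns a callable rather than an array (and it is deprecated in recent SciPy versions), so one way to get a plain 2D array is `scipy.interpolate.griddata`, which takes scattered sample points and evaluates directly on a pixel grid. A sketch, assuming a 2D NumPy image; `background_from_points` and its defaults are hypothetical. Including the four corners makes the convex hull of the samples cover the whole image, so linear interpolation leaves no NaN pixels. For real telescope data, photutils' `Background2D` is the more complete tool.

```python
import numpy as np
from scipy.interpolate import griddata


def background_from_points(image, n_random=20, seed=0):
    """Interpolate a smooth background from the 4 corners plus random sample pixels."""
    h, w = image.shape
    rng = np.random.default_rng(seed)
    corners = np.array([[0, 0], [0, w - 1], [h - 1, 0], [h - 1, w - 1]])
    random_pts = np.column_stack([rng.integers(0, h, n_random), rng.integers(0, w, n_random)])
    # (row, col) sample locations; drop duplicates so the triangulation is clean
    pts = np.unique(np.vstack([corners, random_pts]), axis=0)
    values = image[pts[:, 0], pts[:, 1]]

    grid_r, grid_c = np.mgrid[0:h, 0:w]
    # Piecewise-linear interpolation over a Delaunay triangulation of the points;
    # the result is an ordinary (h, w) array you can subtract from the image
    return griddata(pts, values, (grid_r, grid_c), method="linear")
```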
tensorflow is running extremely slow, despite having installed CUDA, cuDNN and tensorflow-gpu. I'm running my deep q models, and they will take multiple hours to train on 1000 episodes, instead of minutes which I'd see on my old pc
Great turns out photutils already does all this for me. 3 hours of coding for nothing
anyone comfortable with opencv and pytesseract?
don't ask to ask. just ask.
one of the words is not getting detected
it's not recognized properly, so I am not able to match it with the expected text
how do I get it recognized reliably?
can i send the pic?
SDLTC should come
Hey there. How do we choose the number of components for NMF (Non-Negative Matrix Factorization)?
Or to be more precise n_components from sklearn.decomposition.NMF
(for topic extraction)
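There's no single rule for `n_components`; one common heuristic is to fit NMF for a range of values and watch the reconstruction error (`reconstruction_err_` on the fitted model), looking for the point where it stops dropping sharply (an "elbow"). A sketch on a synthetic matrix built from 3 nonnegative factors; for topic extraction specifically, topic coherence or simply reading the top words per topic is often a better guide, and this sketch doesn't cover that.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy nonnegative "document-term" matrix generated from 3 hidden factors
X = rng.random((30, 3)) @ rng.random((3, 12))

errors = {}
for k in range(1, 7):
    model = NMF(n_components=k, init="nndsvda", max_iter=1000, random_state=0)
    model.fit(X)
    errors[k] = model.reconstruction_err_  # Frobenius norm of X - W @ H

for k, err in errors.items():
    print(k, round(err, 4))
```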
Thank you, but I need to do it for each Card Holder
Hi friends, new to this server and concept. I've joined #help-potato with my problem; do I need to do anything else or just wait until someone can help?
Uh maybe `df.groupby(['Cardholder Name', 'Month'])['Amount'].sum()`? Not at my PC, so I can't test
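Here's that suggestion as a self-contained sketch. The column names (`Cardholder Name`, `Date`, `Amount`) are guesses, since the sheet is only in the screenshot; note that the column name has to be quoted (`['Amount']`). Using `dt.to_period("M")` keeps the same month in different years apart, which a bare month number would merge.

```python
import pandas as pd

# Hypothetical transactions standing in for the Excel sheet
df = pd.DataFrame({
    "Cardholder Name": ["Ann", "Ann", "Bob", "Ann"],
    "Date": pd.to_datetime(["2022-01-05", "2022-01-20", "2022-01-07", "2022-02-03"]),
    "Amount": [10.0, 5.0, 7.5, 3.0],
})

# Month as a period, so January 2022 and January 2023 stay separate
df["Month"] = df["Date"].dt.to_period("M")

# One total per (cardholder, month) pair
totals = df.groupby(["Cardholder Name", "Month"])["Amount"].sum()
# totals.unstack(fill_value=0) would give one column per month instead
```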
Hello guys, I feel a bit stupid for not knowing this... I know these are simple terms, but I can't get it right. What is the difference between groupby and aggregating?
got any idea what's the difference between SimpleImputer().fit_transform() and SimpleImputer().transform()?
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
@spiral furnace check the docs
can someone recommend a tutorial on how to make a text gen model with a custom dataset
I've read it but still can't get why you do fit_transform() on your train data and transform() on your validation data, since you still train your data afterwards
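`fit` is where the imputer learns its statistics (here, the column mean), and `transform` only applies statistics that were already learned. Fitting on the training split and merely transforming the validation split means validation rows are filled with the *training* mean, so nothing from the validation data leaks into preprocessing; the model is then trained on the filled training data either way. A toy sketch:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0], [3.0], [np.nan]])
X_valid = np.array([[np.nan], [100.0]])

imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)  # learns mean = 2.0 from train, then fills
X_valid_filled = imputer.transform(X_valid)      # reuses the train mean; nothing is refit
```

Calling `fit_transform` on the validation data instead would fill its gap with the validation mean (100.0 here), i.e. the preprocessing would have peeked at data the model is supposed to be evaluated on.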
@compact rose groupby is like clicking to collapse files so you can read only their names and not their contents... google "collapse vs expand" to understand... agg is different: you use it when you want to apply several functions to your data at once
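A small pandas sketch of the distinction, on made-up data: `groupby` splits rows into groups, a reduction like `sum` gives one value per group, and `agg` applies several functions at once (with or without a groupby).

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 3, 10]})

grouped = df.groupby("team")["score"]                   # split rows by team; nothing computed yet
per_team_sum = grouped.sum()                            # one function, one value per group
per_team_stats = grouped.agg(["sum", "mean", "count"])  # several functions at once
whole_column = df["score"].agg(["min", "max"])          # agg also works without groupby
```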
OpenCV yes, Tesseract no
WINDOWS?
Try Keras
Hello, can someone help me get feature importance for my data?
This is the code I'm running right now:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# feature_importances_ only exists on a fitted instance, not on the class itself
myModel = DecisionTreeClassifier()
myModel.fit(X_train, y_train)
myModel.predict(X_test)
print(myModel.score(X_test, y_test), myModel.feature_importances_)
It shows feature importance, but without any labels. I want to show the name of each feature next to its feature importance. Any ideas? Thanks
How does it show it?
If it is just a list of numbers, it is likely in the same order as your training data
Yes, it's just the list of numbers but how would I know which number belongs to which feature for datasets with a large amount of features? And how can I sort the list of feature importances by which ones are the most important and knowing which feature they belong to?
I did this to get every feature with its importance:
for i in range(len(X.columns)):  # iterate over X's columns, not df's (df may also hold the target)
    print(X.columns[i], myModel.feature_importances_[i])
But I still don't know how I can sort it by feature importance
feature_score = sorted(zip(X.columns, myModel.feature_importances_), key=lambda t: t[1])
@lapis sequoia
You zip the columns with the scores to get them grouped in tuples
then you sort the tuples
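Putting the importances in a pandas Series indexed by column name keeps each score attached to its feature, and `sort_values(ascending=False)` lists the most important first. A runnable sketch on scikit-learn's bundled iris data, standing in for the poster's `X`/`y`:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Label each importance with its column name, then sort most important first
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```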
I see, thanks
Sadly the tool must be opencv :/
Does anyone have any recommendations on where to get Forex (EURUSD specifically) data from? I would like to be able to get data in 5-second intervals.
this might be a dumb question but can someone help me with this
I have this corpus of text where the part highlighted in red is the label and the part after the \ is the actual text
can someone help me turn that into a list of
sentences = []
labels = []
pls ping me if you can help
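A sketch, assuming each line looks like `label\text` with the label before the first backslash (the real format is only in the screenshot, so adjust the separator if it differs). `str.partition` splits on the first backslash only, so any backslashes inside the text survive. `split_corpus` is a hypothetical helper.

```python
def split_corpus(lines):
    """Split lines of the form 'label\\text' into parallel sentence/label lists."""
    sentences = []
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, sep, text = line.partition("\\")
        if not sep:
            continue  # no backslash: the line doesn't match the assumed format
        labels.append(label.strip())
        sentences.append(text.strip())
    return sentences, labels
```

Usage would be something like `sentences, labels = split_corpus(open("corpus.txt", encoding="utf-8"))`, where the filename is a placeholder.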
hello. anyone know how to use Power BI? I'm wondering how to merge two columns together WITHOUT affecting the original table's raw data, only changing the format in a matrix visual (everything is purely visual, no need to mess with the data).
there's a way to do the aggregation you need with DAX
you'd have to look into it
should be in the Microsoft docs
Summer just started and I haven't been in school for over a year now. And yet, here I am, watching Khan Academy, taking notes in a physical notebook, and testing stuff out in Desmos, trying to figure out derivatives and dot products and sigmoid functions just so I can make a computer predict if some number is a 1 or a 0
what is your goal for doing this?
Gotta learn ML somehow
well, yes, and it can potentially pay off. will you be going back to school later?
Yeah, I'm a college CS student, but I've kinda just slipped through the cracks and moved at the least convenient time, so I've only had one class on local history since I graduated high school
I'd be doing this if I wasn't going, though. I just love technology and finally feel competent enough to understand this stuff
is an accuracy of 0.42 good for a text gen model? i made it myself, it's nowhere near gpt-2
do the sentences "sound good" to you?
Hi,
Hi
I am a professional ocr developer
Really?
yes
The OCR performance was not that good... around 60%
I believe that the binarization is not correct
What do you think?
Or what would you suggest?
yes
By the way, I know that this is a Python server
But I am using Java
Just for educational purposes
just use python
hehe but look at this job posting
i can hear the pain and frustration through these words

straight up an entire backstory in this one


could someone tell me if mediapipe with its pose solution is machine learning or deep learning? Thanks
can someone shed some light on why some of the words are not detected in my code using opencv and pytesseract
"input type (torch.floattensor) and weight type (torch.cuda.floattensor) should be the same"
How to fix this error?
Mediapipe is used in deep learning
why does it say that it is a machine learning solution? then what algorithm does it use? thank you, I ask because the internet doesn't explain it
I believe there's a wide range of stuff one can do with mediapipe aside from using it for computer vision projects.
yes but if I had to explain mediapipe, it's a deep learning convolutional neural network
It's a computer vision library. No it's not a convolutional neural net. Just think of Mediapipe the same way you use OpenCV, Skimage, and other computer vision and image processing libraries.
so it's machine learning but computer vision is that right?
Deep Learning is still Machine Learning, so yeah.
but I read on the internet that artificial intelligence contains machine learning, which contains deep learning. so if I have to explain to someone what category mediapipe falls in, should I say that it is machine learning and computer vision? I'm confused about this
Mediapipe is simply a computer vision library. On the other hand, computer vision is a field that makes heavy use of deep learning.
I hope this clears the confusion.
Also, there's more to AI than Machine Learning.
So if I have to explain what mediapipe has under the hood, is it deep learning to recognize the joints of the human body?
Hey anybody here into data science/data analysis / data field?
i need some strong career /skill suggestions to make progress
thx in advance!
pls ping me if u're into it
What's like the cheapest way of getting a python file to run 24/7 on a loop of like 30mins
without having your computer running all the time
running a web scraping script that logs data into an exce
excel
if you're running your script on your own computer, you can't exactly keep running that when it's switched off, so you'll need to use a remote machine/service
they look good
they aren't coherent most of the time but they look about right for a self made text gen
for example
i entered "binary" as the string, and it returned "I want to string the python"
well, you're not going to make something that rivals GPT-2, so if you've generated sentences that "sound good" and you learned from doing it, then that's a win.
hello guys, has anybody here used PCA in PySpark who can help? I'm trying to use it on highly correlated features, but I don't know how
Now, I'm studying time series forecasting. But I have a question: why does the result keep changing when I re-run the model?
With the same trained model?
Because predicting is normally deterministic, but training is not, as there is randomness in, e.g., weight initialization
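A framework-agnostic sketch of that point: a tiny SGD linear regression whose only randomness is the initial weights and the per-epoch shuffle, both drawn from one seeded generator. Same seed, same weights; different seed, slightly different weights. If the poster is on Keras/TensorFlow (an assumption), `tf.keras.utils.set_random_seed` seeds Python, NumPy and TensorFlow together, though GPU ops can still add some nondeterminism. `train_linear` is hypothetical.

```python
import numpy as np


def train_linear(X, y, seed, epochs=50, lr=0.01):
    """Tiny SGD linear regression; all randomness comes from `seed`."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])        # random weight initialization
    for _ in range(epochs):
        for i in rng.permutation(len(X)):  # random sample order each epoch
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w


X = np.column_stack([np.linspace(0, 1, 20), np.ones(20)])
y = 3 * X[:, 0] + 1

w_a = train_linear(X, y, seed=0)
w_b = train_linear(X, y, seed=0)  # same seed: identical weights
w_c = train_linear(X, y, seed=1)  # different seed: slightly different weights
```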
@bold timber
yes, trained with the same model
this is my model that I've build
Yeah but when you test it the second time, did you fit the model again as well?
this is my first plot after running the model
and this is my second plot, after running the model a second time
hello, I'm studying machine learning and rn I'm on the linear models and least squares section, where I saw this particular equation. is it necessary for me to remember this equation? also, I can somewhat understand what it does, so is it fine for me to proceed further? or do I need to do some practice problems related to it?
all of machine learning deals with that equation, so you certainly need to remember it
ah thank you, so will we be making more equations with the help of this equation? im sorry for asking dumb questions 😭
one way to interpret it geometrically is as an affine hyperplane. this means that, if you work in 1D, this equation represents a point on a line
if you work in 2d, it represents a line on a plane. in 3D, it represents a plane inside a cuboid. in 4d, a cuboid inside a ????, and so on
you will indeed
ooh
because the sum written there in sigma notation is the same expression as a dot product
that means that you can take several equations of this kind, and write them as a single matrix equation
you reach the familiar y = Ax + b when doing so, and machine learning deals extensively with such equations
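Assuming the equation in the screenshot is the ESL linear model $\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j$ (reconstructed from the description in the chat, since the image isn't in the log), here's a NumPy check with made-up numbers that the sigma form equals a dot product, and that stacking several inputs as rows of a matrix gives all predictions at once. In the y = Ax + b form, the stacked inputs play the role of the matrix and the coefficients play the vector.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
beta0 = 0.5                # intercept (made-up value)
beta = rng.normal(size=p)  # coefficients beta_1..beta_p (made-up)
x = rng.normal(size=p)     # one input vector X_1..X_p

# Sigma notation, term by term
y_sum = beta0 + sum(x[j] * beta[j] for j in range(p))
# The same sum written as a dot product
y_dot = beta0 + x @ beta

# Several inputs at once: stack them as rows of a matrix
X = rng.normal(size=(5, p))
y_matrix = X @ beta + beta0
y_rows = np.array([beta0 + row @ beta for row in X])
```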
okay so here's what's happening here: we are doing a dot product between Xj and B-hat j, doing this p times, adding all these dot product values, and then adding a B-hat 0 value, right??
the equation as it is is ambiguous unless you tell me which quantities are vectors, matrices, and scalars
it does mean summation, but idk if the stuff inside are scalars or vectors
oh
so like how does knowing if the stuff inside is a scalar or vector affect the addition??
yes
oh yes
vector addition and scalar additions are different
so is that what's affecting it??
or am i wrong?
multiplication is the issue, not addition
oh
multiplication of matrices and vectors adds in other summations
ooh
.latex e.g. if $\boldsymbol{x} = [x_1, x_2, \dots, x_n]$ and $\boldsymbol{y} = [y_1, y_2, \dots, y_n]$ are vectors, their dot product is $\boldsymbol{x} \cdot \boldsymbol{y} = \langle \boldsymbol{x}, \boldsymbol{y} \rangle = \boldsymbol{x}^{T} \boldsymbol{y} = \sum_{i=1}^{n} x_i y_i$
sigh this thing
damn
.latex e.g. if $\boldsymbol{x} = [x_1, x_2, \cdots, x_n]$ and $\boldsymbol{y} = [y_1, y_2, \cdots, y_n]$ are vectors, their dot product is $\boldsymbol{x} \cdot \boldsymbol{y} = \langle \boldsymbol{x}, \boldsymbol{y} \rangle = \boldsymbol{x}^{T} \boldsymbol{y} = \sum_{i=1}^{n} x_i y_i$
well, i'm not gonna try and spam it here
XD
Hello peeps i have a Web scraping question
hey guys, I need to know if there's a name for this, or if what I'm trying to do is even possible. Let's say I have two functions
def a(self, response):
    yield from response.follow_all(xyz, self.b)

def b(self, response):
    yield {
        "x": 'x',
        "y": 'y'
    }
result will be something like this
[{
"x":'x',
"y":'y'
},{
"x":'x',
"y":'y'
},{
"x":'x',
"y":'y'
}]
What if I wanted to add "z":'z' but from function a? Is it possible to do something like
def a(self, response):
    yield {"z": 'z',
        yield from response.follow_all(xyz, self.b)
    }

def b(self, response):
    yield {
        "x": 'x',
        "y": 'y'
    }
basically i have a structure like this
/sectorsPage / ==== needed for sector name
--//categoriesPage \ ==== needed for sector name
----//subCategoriesPage
-------//organizationsPage
----------//organizationDetailsPage <-- needed for coordinates
----------//organizationDetailsPage
----------//organizationDetailsPage
----------//...
-------//organizationsPage
-------//...
----//subCategoriesPage
----//subCategoriesPage
----//....
--//categoriesPage
--//categoriesPage
--//....
sample output
[{"Nom": "ETS TAKTADSQK MOURAD ", "Description": " Notre tablissement toute industrie et les besoins de nos clients....", "category": " ", "Addresse": " BP N 16 3089 ", "Tel": " ", "Fax": " ", "E-mail": " ets.@gmail.com", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1="},
{"Nom": "", "Description": " Raison sociale : ", "category": null, "Addresse": " ariana", "Tel": " ", "Fax": " ", "E-mail": " @yahoo.fr", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1="},
]
but i want it to be
[{"Nom": "ETS TAKTADSQK MOURAD ", "Description": " Notre tablissements besoins de nos clients....", "category": " ", "Addresse": " BP N 16 3089 ", "Tel": " ", "Fax": " ", "E-mail": " ets.@gmail.com", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1=","SECTOR":"finance"
},
{"Nom": "", "Description": " Raison sociale : ", "category": null, "Addresse": " ariana", "Tel": " ", "Fax": " ", "E-mail": " @yahoo.fr", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1=","SECTOR":"finance"},
]
is the range here obtained from cos theta??
ooh thank you for this formula
oh thanks for that
now it cleared my doubts
also
should i look into how this was proved? or should i just remember this as it is?
there is no proof for that, that is how matrix multiplication is defined
holy hell did you type out all these?!
so it's safe to assume that it's a corollary, right? and just remember it?
ooh thats sick
not corollary. definition
oh
okay okay thank you so much for the explanation :)
this looks so complex wtf 😭
latex is good to learn because it lets you nicely typeset your maths
ooh
this is what people use to write papers, books, magazines, blogs, etc. that are math-heavy
oh sheesh
and overleaf is an easy way of using latex, since you don't have to download anything nor compile it yourself
yep yep
now i understood how they typed all that
overleaf is nice
and i was trying to do that on microsoft word 💀
def worth bookmarking if you havent already
microsoft word also allows tex-style input in equation environments
i only know header and body in word XD
also ik it's a dumb question, but I'm reading this book called "The Elements of Statistical Learning" and I have no knowledge of ML
is it a nice idea?
or should i stop and start with a different book or resource?
presumably you'll learn what you need there
if you find it's going too fast, go back to something simpler
idk much about the resources but thanks ill look up online for some
its taking me some time to understand it
you can't escape learning at least basic linear algebra and multivar stats if you're doing ML, so you'll have to pick it up somewhere or another
oh no im familiar with linear algebra
its just that i need some basic brushup which im doing currently by following 3blue1brown's videos
i disagree, given the questions you just asked 😛
what
the stuff i just wrote out for you is week 1 of linear algebra
those are just definitions of basic operations with matrices
oh
so is my approach to ml wrong?
there's no such thing, learning is different for everyone
and stats is such a wild, weird field that you can almost learn it disjointly from everything else
oof
at some point you might need linalg for it though, so you should brush that up sooner rather than later
anyway, the interpretation of this stuff is the intersection of affine hyperplanes
each equation of the kind you shared is the equation of a hyperplane that may or may not pass through the origin
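A 2D NumPy illustration of that picture, with made-up numbers: each row of `A` together with its entry of `b` is one equation a_i · x = b_i, i.e. a line (a hyperplane in 2D) that misses the origin whenever b_i ≠ 0, and solving the system finds the point where the lines intersect.

```python
import numpy as np

# Each row is one affine "hyperplane" in 2D, i.e. a line a_i . x = b_i
A = np.array([[1.0, 1.0],    # x1 + x2 = 3
              [1.0, -1.0]])  # x1 - x2 = 1
b = np.array([3.0, 1.0])     # nonzero b: neither line passes through the origin

point = np.linalg.solve(A, b)  # where the two lines intersect
print(point)  # [2. 1.]
```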
ahh so far i need to look into scalars and vectors and matrices right??
mhm
thank you
also
may I ask how you learned AI and ML?
it might help guide me into it
i did a masters and am doing a phd 😛
so, going to class, reading books and papers, and writing papers
i like Gilbert Strang's linear algebra book
Stoica and Moses's Spectral Analysis of Signals
Axler's Linear Algebra Done Right (this one is on the harder side)
Kay's Fundamentals of Statistical Signal Processing

professor when

idk, still not sure if academia is my thing to begin with
Hello! I'm going crazy... I don't understand why, with matplotlib, once I make an interactive graph, the graph doesn't update when the slider changes. It must be something wrong with the update function, but I really don't understand what. Could anyone take a look? Many thanks! https://paste.pythondiscord.com/uhilebikal
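Without seeing the paste, a minimal working pattern to compare against: the callback changes the existing line in place with `set_ydata` and then calls `fig.canvas.draw_idle()`, and the `Slider` object is kept in a variable, since matplotlib's docs note that widgets stop responding if no reference to them is kept. The sine-wave data and names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)  # leave room below the plot for the slider
(line,) = ax.plot(x, np.sin(x))

slider_ax = fig.add_axes([0.2, 0.1, 0.6, 0.03])
freq = Slider(slider_ax, "freq", 0.5, 5.0, valinit=1.0)  # keep this reference alive


def update(val):
    line.set_ydata(np.sin(val * x))  # modify the existing artist in place
    fig.canvas.draw_idle()           # ask matplotlib to redraw


freq.on_changed(update)
```

Call `plt.show()` afterwards to open the interactive window.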
good. join industry instead

If it helps, i'm self taught
But i had previous knowledge in calculus and linear algebra due to university (no cs)
make small projects
don't get stuck up on the math, the math will come with practice
and it's gonna take time so be patient
How long does it take to learn PyTorch?
depends on how you define "learning" it, and how much previous knowledge you have
I also have knowledge of calculus and linear algebra, but I'll be revising them for a bit. How did you go about the learning process?
Thank you so much for these resources, really appreciate it :)
I know a little calculus
functions and relations
a little trigonometry
if you know how neural networks are structured, learning how to use PyTorch should be simple
actually understanding why you build them the way they're built or getting good results from your projects is a completely different topic
already knowing the math behind i just went for programming courses (python), then the most mainstream courses about machine learning and deep learning (coursera is a good one)
the rest on the job
Oh thank you sm for the guidance, really appreciate it
if you just want to use existing libs, you probably need nothing more than some base intuition. if you wanna produce new results yourself, you need the maths
For whom is this message meant?
just in general, since you're all discussing different studying methods and different math backgrounds
Yes, our college taught us some maths like complex integration, matrices, and vectors, so I thought I'd be able to get into ML
is PyTorch capable of doing everything?
@feral acorn show me your notebook?
Notebook for what?
Let me see what level of maths you need to get into ML
It will help me
I don't have a well maintained notebook lol
Book they taught you?
- gets your hopes down
- check the pins
- take an actual course if you want to actually understand how stuff works
Andrew Ng just released a new one using Python on Coursera, which you might want to check out. Alternatively, take a look at fast.ai's
But if i have to go through, its calculus (differentiation, integration, maxima, minima), linear algebra (matrices and vectors and scalars),and iirc some statistics
"doing all the things" is way too broad, but reality is: AI is not magic. They are made to do one thing each, but they do it well and fast
Can anyone experienced confirm if this is truth or not
some
Oof
I didn't understand the first step: hopes == self-esteem?
depending on how you intend to use it / what you want to do, you may need none of it, or all of the things you mentioned plus some more
What do you mean by "how you intend to use it"?
Statistics is for ds algo
Idk, prof never told us, he told us to follow his notes
if you just want to make a simple "is it a dog or a cat" model, you can download an existing model and use it easily without understanding anything that's happening behind the scenes
if you want to make something that does not exist anywhere in the world yet, good luck
I want to use AI to simulate a virtual phone without any human interaction and have it do certain tasks; is PyTorch capable of it?
e.g besides downloading datasets from kaggle and running models on them
like classification and regression models
yep, that sounds like it falls under "good luck"
like I said before, AI is not magic - they are taught to do one task, and you must have ways of giving it input & reading the output
So y'all studied so many complex stuffs to make stuffs that fall under "good luck" category 💀💀
how big will the generated .pth file be? Do you think it's possible?
it is not that simple, but at this rate, no.
if you can identify which task exactly you want for it to perform, find or create a model that excels at it, then find a way to integrate it with that """simulated virtual phone""", it might be
it sounds like you need help framing the problem
the whole point of CAPTCHAs is for your input to be used to train AI
it's a two birds, one stone thing
yes i think
welp
i got an idea.
take the CAPTCHA, redraw it, then do a Google image search on it. it should show the thing.
idk not sure just a concept
First part of my biggest project yet is complete 😮 https://github.com/corndogit/DataSpaceArt
weather data-driven art project using matplotlib
thinking of ways to carry out the next step of displaying it in places, i was thinking of just generating one per day and putting it on a website but i could also automate posting one to Twitter and embedding that on a site
Can anyone suggest any good courses or certifications for machine learning within python?
there's a line of TensorFlow courses on Coursera that covers the basics without going in depth into the math. you can ask for financial aid if you're a student of any kind, letting you take them for free while still getting a certificate
machine learning specialization + tensorflow specialization + deep learning specialization all on coursera
this has probably been asked plenty of times here, but does anybody know any fully structured path for data science/machine learning? there is lots of stuff online but it's hard to know what to read/watch first before moving to the next step
Im doing text classification and I have 5 labels
1,2,3,4,5
and this is my model
When I set the last dense layer to 5 units, I get the first image I sent
Loss of nan
but when I put 6 into the last dense layer I get a normal output. Why is that and can someone tell me if im doing something wrong
The labels should probably be 0, 1, 2, 3, 4 @blazing bridge
I would think, but not sure
no I checked
the labels are 1,2,3,4,5
So what does the model output?
its supposed to predict these 5 labels
I haven't made predictions with it yet
I can try doing that
but idk why it wouldnt be 5 in the last dense layer instead of 6
What I think is that the model outputs 5 logits, and the predicted class is the index of the highest logit, which goes from 0 through 4
and your labels go up to 5
thus giving an undesired result when comparing them
But when increasing the final dense layer, the output is size 6, so the index can go to 5 now
Giving a valid loss
You reduce the labels by 1
I have a list
that I then convert to a numpy array
it works now
with the dense layer of 5 at the end
I just used this for loop
final_labels = []
for i in range(0, len(labels)):
    final_labels.append(labels[i] - 1)
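A NumPy sketch of what's going on, assuming the model uses a sparse categorical cross-entropy loss (the model code is only in the screenshot): that loss picks the output entry at index `label`, so with 5 output units the valid labels are 0 to 4, and a label of 5 points past the end. Subtracting 1 from the whole array does the same as the loop in one step. `sparse_cross_entropy` is an illustrative re-implementation, not the Keras function.

```python
import numpy as np

labels = np.array([1, 2, 3, 4, 5])  # 1-based class labels as stored in the data
num_classes = 5                      # units in the final Dense layer

final_labels = labels - 1  # vectorised version of the loop: now 0..4
assert final_labels.min() >= 0 and final_labels.max() < num_classes


def sparse_cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]  # label must be a valid index into the outputs
```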
^
also, any recommendations to lower validation loss?
there's a huge difference in loss values between training and validation
Look up "overfitting"
yeah I tried a lot of dropout layers, but my validation loss is still very different from the training loss
and the validation loss keeps going up
If I have a variable age (years, bounded by 1-99) with missing values, is there a difference between making age a categorical variable with "missing/na" as a category vs keeping age as a continuous variable with a null/missing flag indicator variable? I feel like they are the same, as age acts as a discrete variable in this case.
if you have missing data that's a NaN, keep it as a NaN. if the fact that there's missing data becomes unavoidable, it's easier to deal with that when the missing data is a NaN
You probably don't want to make it categorical/1-hot encoded, as the output for someone with age 43 will likely be similar to output for someone with age 44, whereas categorical data would not maintain this relation.
sounds like what i ran into during my internship
So in this case, there is sampling bias causing the missing values in age. In terms of keeping as NaN, essentially I'm asking at the feature engineering stage, what exactly the difference is for a regression problem: Converting to categorical vs Keeping continuous, filling the NaNs + having a flag indicator for nulls.
I understand categorical would lose that information. Wouldn't binning age solve some of that issue?
If you think you know how to group the ages then maybe yeah
I guess I'll find out once I try both and get the results. But I was just curious if there's some theoretical justification that's already known.
But maybe there is a big difference between a 51 year old and 53 year old, but you are binning it in 50-60
So you need to think about that
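Both encodings side by side on made-up ages, to make the comparison concrete. Option A keeps age numeric (filled with the median, an arbitrary choice here) plus a 0/1 missingness flag; option B bins age and gives missing values their own category. The bin edges are hypothetical, and 51 and 53 land in the same bin, which is exactly the information binning throws away.

```python
import numpy as np
import pandas as pd

age = pd.Series([23, np.nan, 51, 53, np.nan, 67], name="age")  # toy data

# Option A: keep age continuous, fill the gaps, and add a missingness flag
option_a = pd.DataFrame({
    "age_filled": age.fillna(age.median()),
    "age_missing": age.isna().astype(int),
})

# Option B: bin age, treat "missing" as its own category, then one-hot encode
bins = pd.cut(age, bins=[0, 30, 50, 60, 99], labels=["0-30", "31-50", "51-60", "61-99"])
option_b = pd.get_dummies(bins.cat.add_categories("missing").fillna("missing"), prefix="age")
```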
Yes, that's a good point too.
How do I apply multilingual BERT to non-English text classification?
Does anyone have idea or if you have any good resources please share them
looks more poisson-like to me
is there anyone who's comfortable with CV?
one word is not getting detected in my code properly
I have that problem too
from my 3.5 minutes of cv research I know that you should usually transform the image so it's black text on a white background and sometimes use small amounts of gaussian blur if the text is sharp
Hey guys, I need a little advice. I have a feature that holds the age at which a person started drinking/smoking. The problem is that people who have never drunk/smoked have NaN values. What's the best way to fill them?
I tried filling them with high ages like 100 so that the model would learn that the later someone starts drinking/smoking the better, but it didn't improve the score
sounds like something rather to be included in the cost function, not as a feature
you could have an extra penalty term added to the cost function, and it is multiplied by something like not isnan. then for people that never started, there is a nan and this term is multiplied by 0
I'm writing a paper on my doctoral thesis in medicine and did all the statistical analysis using Python and scipy/pandas. How likely is it that there are differences between Python and SPSS if the exact same test is performed with both?
Since my tutor is unfamiliar with Python (as most physicians have no idea about anything IT-related), I'd like to know how "scientifically acknowledged" a pure Python analysis is
what is the difference between doing `newdf = df` and `newdf = df.copy()`?
if I just do `newdf = df` and then change something in df, should I see the change in newdf too?
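Yes. A small sketch with a made-up DataFrame: `newdf = df` only binds a second name to the same object, while `df.copy()` (deep by default) creates an independent one.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

alias = df          # no new data: both names point at the same DataFrame object
copy = df.copy()    # deep copy by default: an independent DataFrame

df.loc[0, "a"] = 99  # modify the original in place

print(alias.loc[0, "a"])  # 99 -> the alias sees the change
print(copy.loc[0, "a"])   # 1  -> the copy does not
```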
it should be fine, the mathematics are the same. What might change are the default parameters of the analysis, e.g. the number of bins for histograms, whether normalization is done, whether means, variances etc. are computed as for a sample or for a full population, and the like. That means it is possible to tweak the parameters of SPSS to do exactly what Python does, and the other way around
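One concrete example of the "sample vs. population" default, sketched with numpy on made-up numbers: `np.std` divides by n unless you pass `ddof=1`, whereas pandas' `Series.std()` defaults to `ddof=1`. Similarly, `scipy.stats.ttest_ind` runs Student's t-test (`equal_var=True`) unless told otherwise, while SPSS prints both an "equal variances assumed" and a "not assumed" (Welch) row, so it is worth checking which variant you are comparing.

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

# Population formula (divide by n): NumPy's default, ddof=0
pop_sd = np.std(x)
# Sample formula (divide by n - 1): pandas' default and what stats packages usually report
sample_sd = np.std(x, ddof=1)

print(pop_sd)     # 2.0
print(sample_sd)  # about 2.138
```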
What languages and what classes
Hey, does anyone have good materials on PCA?
I'm trying to do one based on gender and political parties, but it's not working out
😢
you can look up covariance matrices, eigenvalue decomposition, and singular value decomposition to learn more about it
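A compact sketch of how those three ideas connect: PCA's principal axes are the eigenvectors of the covariance matrix, and they can be obtained from the SVD of the mean-centred data, with the covariance eigenvalues equal to S**2 / (n - 1). The function name `pca_svd` is just for illustration; scikit-learn's `PCA` does essentially the same job in practice.

```python
import numpy as np

def pca_svd(X: np.ndarray, n_components: int):
    """PCA via SVD of the mean-centred data matrix (rows are samples)."""
    Xc = X - X.mean(axis=0)                          # centre every feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                   # principal axes, one per row
    scores = Xc @ components.T                       # data projected onto the axes
    # Eigenvalues of the sample covariance matrix are S**2 / (n - 1).
    explained_var = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_var

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
scores, components, explained_var = pca_svd(X, n_components=2)
# Cross-check against the eigenvalue decomposition of the covariance matrix.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(explained_var, eigvals[:2]))  # True
```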
It's not exactly a classification task; in general, it's a Hindi error-detection task using BERT
I mean comparing the text present in an image with the original text (a .txt file) and applying a BERT model to detect errors and grammatical mistakes
Check out the Kaggle course on PCA
Not sure what I should do about the data. Should I turn the political parties into numbers or something?
Because I only have data with Name, Surname, Gender, and Party columns
Thank you I'll check that out
yeah you have to turn everything into numbers somehow
you can do one-hot encoding or just assign a scalar to each class (the subspace dimension is invariant, but the principal components themselves will look different depending on what you choose)
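A minimal sketch of one-hot encoding with `pd.get_dummies`, using invented rows that only mirror the column names mentioned above. The name columns are dropped, since they identify people rather than describe them; the resulting 0/1 matrix is what you would pass to PCA.

```python
import pandas as pd

# Invented example rows with the columns described above.
people = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cem", "Dee"],
    "Surname": ["A", "B", "C", "D"],
    "Gender": ["F", "M", "M", "F"],
    "Party": ["Red", "Blue", "Green", "Blue"],
})

# Names are identifiers, not features, so leave them out.
features = people[["Gender", "Party"]]
# One 0/1 column per category: Gender_F, Gender_M, Party_Blue, ...
X = pd.get_dummies(features, dtype=float)
print(X.columns.tolist())
```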
Yeah, I did one-hot encoding on gender
Thanks for the help
Found this Kaggle article, which is actually really useful
the names are probably not important in your analysis
Never even knew about df.corr()
Yeah no
hi, I'm trying to develop a deep learning model. Is it bad that I have more loss on my training data than on my validation data?
or not necessarily
Are you using dropout?
no
ok nm, I have no idea what dropout is, I just got started xd
I've never heard of people using BERT for that. BERT is a language model. Do you know what that means?
Pretty random, but as far as I know it shouldn't be bad. If the validation loss were much higher than the training loss, it would mean the model is overfitting. But this one means that the model is pretty well trained. Well done 👍
There are loads of reasons it can happen: dropout, maybe you are accidentally using your validation data while training, or maybe the validation data is easier to classify
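To answer the dropout question, here is a numpy sketch of (inverted) dropout, the variant Keras and PyTorch use. Units are randomly zeroed only while training and the layer does nothing at evaluation time, so the training loss is measured on a handicapped network while the validation loss is not. That is one way training loss can end up above validation loss. The function signature is illustrative.

```python
import numpy as np

def dropout(x: np.ndarray, p: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero each unit with probability p, only while training."""
    if not training or p == 0.0:
        return x                         # evaluation mode: the layer is a no-op
    keep = rng.random(x.shape) >= p      # keep each unit with probability 1 - p
    return x * keep / (1.0 - p)          # rescale so the expected activation is unchanged

rng = np.random.default_rng(0)
h = np.ones((1000, 100))
train_out = dropout(h, 0.5, training=True, rng=rng)
eval_out = dropout(h, 0.5, training=False, rng=rng)
print((train_out == 0).mean())          # about half the units are zeroed
print(np.array_equal(eval_out, h))      # True
```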
I know what BERT is in general, but I'm not sure about error detection using BERT; this was a task given to us. But is text classification according to labels possible using BERT?
Yes it is
Is it possible to find errors between the obtained text and the original text using BERT, and to compute the accuracy?
That's not classification though
Classification would involve giving discrete labels
Plus, depending on how you are calculating accuracy, ML seems a bit overkill anyway
Thanks for the answer, I’m sorry for the late response but I didn’t see it earlier.
So you're saying that I can trust the results Python gives me to be the same (for something like an independent Student's t-test, Wilcoxon signed-rank test, Shapiro-Wilk test or Spearman correlation), without having to validate them in SPSS (and therefore learn how to use it, plus pay for the very expensive program)?
here, for the same state code, I need to add up the boys of all the different districts and make it a single row
Yeah, I am not working on a classification task
This is what I am working on @acoustic halo
Fair enough, since you said "is text classification according to labels possible using bert?", one would assume you wanted to classify something
How exactly do you want to find the errors between them? Like count the number of letters different?
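If the goal is just to count differences between the OCR'd text and the original .txt, the standard library's `difflib` can do it without any ML. A sketch under that assumption; note that `SequenceMatcher` is a heuristic matcher rather than a minimal edit-distance (Levenshtein) algorithm, and it compares Unicode code points, so a Devanagari vowel sign counts as its own character. The helper names are hypothetical.

```python
import difflib

def char_errors(reference: str, ocr_text: str):
    """List the character-level differences between reference and ocr_text."""
    matcher = difflib.SequenceMatcher(a=reference, b=ocr_text, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":                    # 'replace', 'delete' or 'insert'
            errors.append((tag, reference[i1:i2], ocr_text[j1:j2]))
    return errors

def char_accuracy(reference: str, ocr_text: str) -> float:
    """Share of reference characters that were matched exactly."""
    matcher = difflib.SequenceMatcher(a=reference, b=ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(reference), 1)

print(char_errors("hello world", "hel1o world"))    # [('replace', 'l', '1')]
print(char_accuracy("hello world", "hel1o world"))  # 10 of 11 characters match
```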
yeah that should be fine
Alright, thank you 😊
I guess I'd try something like df[df["state_cd"] == 36]["Boys_total"].sum(), but try claiming a help channel if that doesn't work: #❓|how-to-get-help
or nested in a for loop:
```python
codes = df["state_cd"].unique()
for code in codes:
    print(f"Code: {code}, Total Boys: {df[df['state_cd'] == code]['Boys_total'].sum()}")
```
you would want to use .loc for this, rather than stacked df[ ][ ] calls
df.loc[df['state_cd'].eq(36), 'Boys_total'].sum()
also, people can ask pandas questions in this channel.
@hallow salmon did you figure it out?
I actually did that but wasn't getting it; I guess I figured it out, will let you know..
actually, it looks like what you want might be this
df.groupby('state_cd')['Boys_total'].sum()
CC @brisk sage. It's rare that the best way to do something in pandas involves looping over sequences derived from itself.
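A sketch of that `groupby` on made-up district rows (the values are invented; the column names follow the messages above). `as_index=False` keeps `state_cd` as a normal column, giving one row per state.

```python
import pandas as pd

# Invented district-level rows.
df = pd.DataFrame({
    "state_cd": [36, 36, 36, 12, 12],
    "district": ["d1", "d2", "d3", "d4", "d5"],
    "Boys_total": [100, 250, 50, 70, 30],
})

# One row per state: sum Boys_total over all of that state's districts.
per_state = df.groupby("state_cd", as_index=False)["Boys_total"].sum()
print(per_state)
```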
yo, I need help with matplotlib. Basically, I have a huge dataframe containing a date and data from 100+ countries, and I need to show all of those countries' names as labels. I currently have this
yes, but I need to show them, like this
Right but that's 2 lines, you have 100
i have no choice, its kinda a requirement
100 lines in a single plot will be chaotic and uninformative
Let alone having a legend with the 100 labels too
hmmmm, you're right, maybe i should just get the average instead
Or separate into multiple plots, or make a grid of plots or something
Or a histogram, but 100 lines isn't the way to go
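A sketch of the grid-of-plots suggestion, assuming the data is in wide form (one column per country, indexed by date) and using random stand-in data for 10 countries. Each panel gets the country name as its title instead of a legend entry; for ~100 countries you would raise `ncols` (e.g. to 10) or split the grid across several figures.

```python
import matplotlib
matplotlib.use("Agg")                    # off-screen backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in wide data: one column per country, indexed by date.
dates = pd.date_range("2020-01-01", periods=30, freq="D")
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(30, 10)).cumsum(axis=0), index=dates,
                    columns=[f"Country {i}" for i in range(10)])

ncols = 4
nrows = int(np.ceil(data.shape[1] / ncols))
fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 2 * nrows),
                         sharex=True, sharey=True, squeeze=False)
for ax, country in zip(axes.flat, data.columns):
    ax.plot(data.index, data[country])
    ax.set_title(country, fontsize=8)    # the panel title replaces a legend entry
for ax in axes.flat[data.shape[1]:]:
    ax.set_visible(False)                # hide the unused panels in the last row
fig.tight_layout()
# fig.savefig("countries.png") or plt.show()
```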
I keep hearing that for loops are bad for some reason, though I've never figured out why
@still frost do you have to make a lineplot?
Would anyone like to give some feedback on an API for running embeddings-based classification, segmentation and reranking?
numpy and pandas are implemented in C and can do CPU-bound operations much much faster than Python can, and given that Python has one of the slowest runtimes of widely used languages for CPU-bound operations, that property of numpy in particular is what makes data science/AI in Python viable.
For example, the goal is to make it super flexible so that it can provide classification with BYO labels and no fine-tuning/training. I'm calling it https://similarity.ai, keen for feedback!
(based on things like k-means and other algorithms for segmentation)
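For readers unfamiliar with the idea, here is a generic sketch of embeddings-based classification with bring-your-own labels: embed each label and each item, then pick the label with the highest cosine similarity, with no training step. This is not the similarity.ai API; the tiny hand-written vectors are hypothetical stand-ins for the output of whatever embedding model you use.

```python
import numpy as np

def classify_by_similarity(items: np.ndarray, label_vecs: np.ndarray, labels: list) -> list:
    """Give each item the label whose embedding is most cosine-similar to it."""
    a = items / np.linalg.norm(items, axis=1, keepdims=True)
    b = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    sims = a @ b.T                       # (n_items, n_labels) cosine similarities
    return [labels[i] for i in sims.argmax(axis=1)]

# Toy 3-D "embeddings"; a real model would produce these vectors.
labels = ["sports", "finance"]
label_vecs = np.array([[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]])
items = np.array([[0.9, 0.0, 0.1], [0.1, 0.1, 0.8]])
print(classify_by_similarity(items, label_vecs, labels))  # ['sports', 'finance']
```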
it's also just easier to keep track of what your logic is when you use the provided numpy and pandas methods. these libraries favor a declarative programming style, rather than an imperative one.
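A small sketch of the speed point, with made-up data. Exact timings depend on the machine, but the explicit loop is typically far slower because every iteration goes through the Python interpreter. Both functions compute the same sum of squares, and the vectorised version also states the intent in a single call.

```python
import timeit
import numpy as np

x = np.random.default_rng(0).random(100_000)

def loop_sum_of_squares(values):
    total = 0.0
    for v in values:                     # one interpreted iteration per element
        total += v * v
    return total

def vectorised_sum_of_squares(values):
    return float(np.dot(values, values)) # one call; the loop runs in compiled code

print(timeit.timeit(lambda: loop_sum_of_squares(x), number=3))
print(timeit.timeit(lambda: vectorised_sum_of_squares(x), number=3))
```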
Yo, I'm back. I still don't know how I would go about doing what I want to do
I want to make an AI that analyzes a certain file type (MIDI) and tries to generate new ones based on what it learns
have you read about music generation since last time?
not quite
i looked at some projects
they dont really do what i want to do
they really focus more on generating sound
than midis
your problem statement is kind of underspecified.
AIs don't just accumulate arbitrary data. you have to know what you expect to happen when you train it on a given MIDI
so a midi file holds data of notes
each note in the MIDI file contains data on when it's played, its pitch, velocity, etc.
I want to make an ML/AI model that can look at these MIDIs and learn how they're made. After that, each generation needs to output a set of MIDI files, like 10-50, and then I rate them myself to give the AI some feedback
what do you mean "learn how the midi is made"?
what are the MIDIs going to be? music, or something else?
MIDIs are just files that hold data of the notes
and then you can later use it urself to create music
so are the MIDIs going to be completely random noise?
its not noise
then what is it?
i just said
like u know
c1 c#1 d1 d#1
then when u drag and drop into a music software like fl studio
it reads it
and then the noise is created through fl studio
the midi file is just data of the notes thats all
Unlike regular audio files like MP3s or WAVs, these don't contain actual audio data and are therefore much smaller in size. They instead explain what notes are played, when they're played, and how long or loud each note should be.
Files in this format are basically instructions that explain how the sound should be produced once attached to a playback device or loaded into a particular software program that knows how to interpret the data.
This makes MIDI files perfect for sharing musical information between similar applications and for transferring over low-bandwidth internet connections. The small size also allows for storing on small devices like floppy disks, a common practice in early PC games.
@serene scaffold
You would probably have to find a way to convert your input midi files into something a model can make use of, then do the inverse with the result
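One possible way to do that conversion is a piano roll: a (pitch x time-step) array a network can consume, plus a decoder back to notes. This sketch assumes the notes have already been read from the MIDI file (for example with a library such as mido or pretty_midi) into the hypothetical `Note` record below, with times quantised to steps. A known limitation: two back-to-back notes with the same pitch and velocity merge into one when decoded.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Note:                     # hypothetical, simplified note record
    pitch: int                  # MIDI note number, 0-127 (60 = middle C)
    start: int                  # onset, in time steps
    duration: int               # length, in time steps
    velocity: int               # loudness, 1-127

def to_piano_roll(notes, n_steps):
    """Encode notes as a (128, n_steps) array of velocities scaled to [0, 1]."""
    roll = np.zeros((128, n_steps), dtype=np.float32)
    for n in notes:
        roll[n.pitch, n.start:n.start + n.duration] = n.velocity / 127
    return roll

def from_piano_roll(roll):
    """Decode a piano roll back into notes (the inverse of to_piano_roll)."""
    notes = []
    for pitch in range(roll.shape[0]):
        row, t = roll[pitch], 0
        while t < len(row):
            if row[t] > 0:
                start, vel = t, row[t]
                while t < len(row) and row[t] == vel:   # extend while unchanged
                    t += 1
                notes.append(Note(pitch, start, t - start, int(round(float(vel) * 127))))
            else:
                t += 1
    return sorted(notes, key=lambda n: (n.start, n.pitch))

notes = [Note(60, 0, 4, 100), Note(64, 2, 2, 80), Note(67, 4, 4, 90)]
roll = to_piano_roll(notes, n_steps=8)
print(roll.shape)                      # (128, 8)
print(from_piano_roll(roll) == notes)  # True
```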
yes, I've already done that
i just dont know how to process it with tensorflow
I don't depend on tutorials unless it's complete beginner stuff
which is what im struggling to find about tensorflow
uhm tensorflow homepage?..
the only thing i know how to do right now is to make a model save the model and load the model
yeah tensorflow has loads of documentation and tutorials, though personally i would use something like keras or pytorch, how much machine learning do you actually know?
not much
Yeah when reading through what you want, it seems like you may just be in over your head
In that case I think you are jumping in at the deep end
This seems like a pretty complex task, and you don't have experience with ml
You would be better off following a structured course in ML to understand the actual concepts at work
looking at it
also, I don't think it would be that hard; I have a large dataset
uhm, so how much data science and/or ai/ml experience do you have @royal hound ?
I have good experience in data science
I haven't gotten into ML/AI yet
this will be my first project
so you know that a large dataset might not be good if you have a lot of variation within your data?
Hi, I'm using Google Colab with multiple cells for my Disco Diffusion artworks. I need a way to change the text in multiple cells at once. For example, I have the same sentence in different cells and I need to know how to change it everywhere at once. Can anyone help with that, please?
Find and replace? It won't be at once, but maybe it's good enough. Btw, why do you need to change it all at once?
I make bulk image sets, like thousands of images, so I have to do it over and over again, and I'm looking for an easier way to do my work
Can you use python for that?
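Since a Colab notebook is just an .ipynb JSON file, one possible approach is to rewrite the text in every code cell programmatically. A sketch with the standard `json` module; the function names and the file path are hypothetical. An even simpler option is to define the prompt once as a variable in the first cell and reference that variable in the other cells.

```python
import json

def replace_in_notebook(nb: dict, old: str, new: str) -> int:
    """Replace `old` with `new` in every code cell; return how many cells changed."""
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue                      # leave markdown/text cells alone
        src = cell["source"]
        text = "".join(src) if isinstance(src, list) else src
        if old in text:
            cell["source"] = text.replace(old, new).splitlines(keepends=True)
            changed += 1
    return changed

def update_notebook_file(path: str, old: str, new: str) -> int:
    """Load an .ipynb file, edit its code cells, and write it back."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    changed = replace_in_notebook(nb, old, new)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, ensure_ascii=False, indent=1)
    return changed

# Hypothetical usage with a notebook downloaded from Colab:
# update_notebook_file("my_diffusion_notebook.ipynb", "old prompt", "new prompt")
```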
we don't talk about AWS on this server, do we? Specifically deploying ML models, if possible
Is AWS a forbidden topic?
haha no, it's not. I just meant I don't see too many peeps discussing it
I'm trying to deal with AWS right now and I hate it
I think it should be a banned topic.
also AWS should be banned
I don't care how good the uptime or scalability are if it ruins my life otherwise.
Let's ban AWS and Rex for talking about it 😂
i need to figure out the tradeoffs between having an ML model on SageMaker Serverless vs. traditional Lambda + API gateway architecture

your first born or your left arm


