#data-science-and-ml
i'd rather not dm. you can replace your for loop involving W_val_hs with flat_gt == 255. this returns a vector of booleans of the same size as flat_gt, and it has True wherever flat_gt[i] == 255. the other entries are False
but then how do i only append those selective value to list. considering this example.
what i'm saying is you don't need to do that. appending to arrays is a bad idea and is slow
this array of booleans can be used as indices already
umm.. still thinking
give some time to digest this fact
this is what i mean
In [10]: x = np.array([1,2,3,4,5,6,7,8,9])
In [11]: indices = (x > 4)
In [12]: indices
Out[12]: array([False, False, False, False, True, True, True, True, True])
In [13]: x[indices]
Out[13]: array([5, 6, 7, 8, 9])
In [14]: x[x>4]
Out[14]: array([5, 6, 7, 8, 9])
there are very few scenarios in which this doesn't work
so 10 to 13 is a approach and 14 in itself is a single line solution right ?
10 to 13 is basically the explanation of how it works?
exactly
all good
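To tie this back to the original question, here is a minimal sketch of replacing the append loop with a boolean mask. The real `flat_gt` and the values being collected were never shown, so both arrays below are made-up stand-ins and the name `values` is an assumption.

```python
import numpy as np

# Hypothetical stand-ins: flat_gt is a flattened label array,
# values is a parallel array we want to filter (name assumed).
flat_gt = np.array([0, 255, 0, 255, 255, 0])
values = np.array([10, 11, 12, 13, 14, 15])

mask = flat_gt == 255      # boolean array, True wherever flat_gt is 255
selected = values[mask]    # keeps only the entries where mask is True
print(selected)
```

`selected` comes out as an array in one step, so there is no list to append to.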
Im a react native Developer. I learned it through tutorial.
I want to switch to data science and ML im in college 3rd year .
What should i do and learn. I hv no experience . I live in india
@nimble glacier it helps to have a plan and pathway. you can watch krish naik's youtube channel, the video is called datascience roadmap. there you will learn what you need to learn
Can even google datascience roadmap 2022
Krish naik i recommend his channel since he is self taught
And got into ML from software dev
He has lots of playlist tutorials for ML and datascience
I did, i know python basics. I want to know some yt channel or course if u can recommend.
Question regarding AI and quality metrics. As I understand, there is no guarantee that if I have trained for n epochs, the result will be better in epoch n+1, right? n+1 being slightly worse might be an option while n+2 can be better again when it comes to e.g. IoU or F-Score? Maybe @serene scaffold has an idea?
Thats good to know python basic, you want to learn pandas and numpy
Thats the next step
Because you will need to learn how to open datasets and manipulate them and arrays
There are lots of tutorials on this so just pick any , i recommend ones that are more recent like 2022 tutorials
Thank u thats really helpful
For pandas you want to learn like .iloc ,.loc , how to add data, remove, apply , edit data and merge datasets etc
That i should keep in mind while learning
Yeah
There is a good tutorial i think it is called pandas tutorial by keith gali
And numpy tutorial by freeCodeCamp on youtube
That video is also done by Keith Gali
it's not guaranteed, no. eventually, the model will be the best that it can be given its architecture and the training data. you might also "bounce around" the minimum of the cost function depending on your learning rate.
Thank you :)
Why tf is Pandas so slow? I just converted a program that was iterating through dfs that took >90min to transforming the dfs into dicts and now it takes ~5min
iterating through dfs
that's why.
if you don't write idiomatic pandas code, it's not going to be fast.
What do you mean by idiomatic?
pandas is designed so that you almost never have to write any for loops. and if you do, it probably won't be fast.
pandas does iterative procedures in C code (which means that part of the code can't be expressed as a Python loop), and that is what makes it fast.
also, if you were appending to a dataframe, that runs in quadratic time.
I wasn't, was running this: (skus_B and columns_B are both lists)
sorry, but I won't look at screenshots of code
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
No worries, my bad:
for curr_sku in tqdm(skus_B):
    for column_name in (columns_B):
        valueA = fileA[fileA['sku'] == curr_sku][column_name].values[0]
        valueB = fileB[fileB['sku'] == curr_sku][column_name].values[0]
this could probably be sped up with a groupby. is tqdm from a library, or something you defined?
tqdm is just a lib for checking progress of loops, it prints stuff like this:
How do you suggest using groupby?
fileA.groupby('sku').head(1)
that will give you one row of fileA for each unique value in the sku column. and then you don't have to loop over columns_B, because you have every column in that new dataframe.
why using only mask tokens to generate texts breaks BERT
Sorry, not sure I follow. But just to clear things up, as you can see from the shapes of the dfs, they're huge, and every SKU is already unique. What I'm doing is step 2 from this:
This is what the files look like:
do you know about sets? you can make a set of SKUs from file A, and then do .apply(skus_from_a.__contains__) on the file B dataframe, to get rid of the ones that don't appear in A
because seeing if something is in a set is O(1). so the whole process will just be O(n).
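A minimal sketch of the set-membership filter described above. The two frames are tiny made-up stand-ins for file A and file B, and the names `file_a`, `file_b` and the `price` column are assumptions. The `.apply` is done on the SKU column rather than the whole frame.

```python
import pandas as pd

# Hypothetical stand-ins for the two files; only the 'sku' column matters here.
file_a = pd.DataFrame({"sku": ["a", "b", "c"], "price": [1, 2, 3]})
file_b = pd.DataFrame({"sku": ["b", "c", "d"], "price": [2, 9, 4]})

skus_from_a = set(file_a["sku"])  # set lookups are O(1) on average

# Keep only the rows of file B whose SKU also appears in file A.
in_a = file_b["sku"].apply(skus_from_a.__contains__)
kept = file_b[in_a]

# pandas also has a built-in that does the same membership test.
kept_isin = file_b[file_b["sku"].isin(skus_from_a)]
print(kept)
```

Both versions make a single pass over file B, which is where the O(n) comes from.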
you can do step 2 with isna
provided that you set the SKU as the index for both
That step is already pretty quick, but thanks
How so?
even though the instructions say "compare each column", you can pretty much forget that it says that. because if you do operations between two DFs, it will already align on column names
@limber token "na" (or NaN) is a missing value. and you want to see which values are NaN in file B, but not in file A
so it's just boolean logic.
Sorry, I understand that, I meant how can I use the isna method for comparing which ones are NaN in one file but not in the other
do you know how to use the ~, |, and & operators?
right. and ~ is not
which also means that ~ is a unary operator, whereas the other two are binary.
so, if you do df.isna(), you get a dataframe where every element is a bool. and you get True if it was NaN in df, else False
so can you think of what pandas expression would give you "the values are NaN in fileB, but not in fileA"?
fileB.isna() & ~fileA.isna()?
Thank you so much
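A small sketch of that expression on made-up data, assuming (as suggested above) that the SKU is set as the index on both frames. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of file A and file B, indexed by SKU.
file_a = pd.DataFrame(
    {"sku": ["s1", "s2", "s3"],
     "color": ["red", "blue", np.nan],
     "size": [1.0, 2.0, 3.0]}
).set_index("sku")
file_b = pd.DataFrame(
    {"sku": ["s1", "s2", "s3"],
     "color": ["red", np.nan, np.nan],
     "size": [np.nan, 2.0, 3.0]}
).set_index("sku")

# True where the value is missing in B but present in A.
# The & aligns on both the index (SKU) and the column names.
missing_only_in_b = file_b.isna() & ~file_a.isna()
print(missing_only_in_b)
```

Note that `s3`/`color` comes out False because it is NaN in both files, which is the "but not in fileA" part.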
why is there little to no tutorial on machine learning?
but there is
there are tons of videos and tutorials and stuff about machine learning. the problem is that there's a lot you need to know before you can start to understand ML, so it's hard to have resources for all the various amounts of knowledge
Hi!
I'm working with a h5 file output from a ferromagnetic resonance experiment.
I have three arrays:
current - shape = (39,)
frequency - shape = (5001,)
amplitude - shape = (39, 5001)
I am trying to visualise them in a surface plot (current x, frequency y and amplitude z) and having a bit of a nightmare.
I made a meshgrid (x,y) with the current and frequency arrays and then just tried to surface plot with (x,y,z) but I'm obviously doing something fundamentally wrong.
thanks for sharing the shapes of your arrays (that's really useful info to reproduce). can you show the code you used to make the surface plot, and what the (unwanted) result was?
(the code should be a markdown block, but the plot can be a screenshot)
the most likely issue is that the cartesian product was done in the opposite order as the coordinates of the amplitude array, but yeah, we'd need to see how you did this
```
ax = plt.axes(projection='3d')
ax.plot_surface(current_mesh,frequency,amplitudes)
```
ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5001, 40) and arg 2 with shape (39, 5001).
oh wait current mesh has a shape of (5001,40) for some reason
can you print out the shapes of current, frequency, amplitude, current_mesh, and frequency after the meshgrid?
```
frequencies shape is (5001,)
amplitudes shape is (81, 5001)
current_mesh shape is (16203240, 81),
frequency_mesh shape is (16203240, 81)
```
sorry my supervisor gave me a new dataset to work with in between these messages
same structure though
the weird thing is that the number you have there equals 40 * 81 * 5001
are you by any chance using a jupyter notebook?
indeed
i bet you forgot to restart the runtime. jupyter is super bad for doing this type of work
idk why people swear by it. nondeterministic code results? no thanks
the issue is you stored the result of the frequency cartesian product in the same variable name. when you ran that cell again, you stacked cartesian products
```
frequencies shape is (5001,)
amplitudes shape is (81, 5001)
current_mesh shape is (5001, 81),
frequency_mesh shape is (5001, 81)
```
you need to rerun the whole thing. then do yourself a favor and don't use jupyter in the future
sounds like we agree on jupyter notebooks. but I wonder if jupyter users see us the way we see type-safety bigots.
why did you do angrysad
because many people disagree. random blog posts without quality checks have made jupyter king
```
ax.plot_surface(current_mesh, frequency_mesh, amplitudes)
```
```ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5001, 81) and arg 2 with shape (81, 5001).```
do i need to transpose one
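On the transpose question: a minimal sketch using the shapes from the first message (39 currents, 5001 frequencies) with random stand-in data. `np.meshgrid` defaults to `indexing="xy"`, which returns arrays of shape `(len(y), len(x))`, so either the amplitude array gets transposed or the grids are built with `indexing="ij"`. The value ranges below are made up.

```python
import numpy as np

# Stand-in data with the shapes from the question; the values are invented.
current = np.linspace(0.0, 1.0, 39)
frequency = np.linspace(1.0, 2.0, 5001)
amplitude = np.random.default_rng(0).random((39, 5001))

# Default 'xy' indexing gives (len(frequency), len(current)) = (5001, 39).
xy_current, xy_frequency = np.meshgrid(current, frequency)

# 'ij' indexing gives (len(current), len(frequency)), matching amplitude.
current_mesh, frequency_mesh = np.meshgrid(current, frequency, indexing="ij")

print(xy_current.shape, current_mesh.shape, amplitude.shape)
# With matplotlib, the call would then be something like:
# ax.plot_surface(current_mesh, frequency_mesh, amplitude)
```

Writing the grids to new names (`current_mesh`, `frequency_mesh`) instead of overwriting `current` and `frequency` also avoids the stacked-meshgrid problem when a notebook cell is rerun.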
where can i learn about using gpu in university labs remotely
there isn't really a whole lot you have to learn about GPU computation. if you're using pytorch, or something, you just need to install the right version for your CUDA version
CUDA is the API that pytorch et al use to interact with the GPU
did you already ssh into the machine that has a GPU?
i am using tensorflow with CUDA, after installation nothing really happened, except the prompts for "gpu...., cuda not .... " were gone
can i learn that with tensorflow too?
the guy in my interview today at BMW said he used jupyter
jupyter is ok if you want to make short presentations. it exports to PDF and other formats very easily, so you can make slidesets from it effortlessly. but please don't do serious dev there, debugging is hard enough without setting traps everywhere for yourself
I'm not sure what you're talking about. and yes, you can use tensorflow instead of pytorch, for pretty much anything that you would use pytorch for.
if you know what the limitations of jupyter are, you can make it work.
i don't even think i got the job tho... i wouldn't be able to do the job even if i did
Hello, I want to be able to look at an image of a plate full of food with different food items and tell whether a set of predefined food items exist in it or not (It should be independent of the position of the individual food items) . Is there a model that i can use abstractly for achieving this? (please ping when replying)
LOL this is something ive never seen applied
but yeah, its a cv task
its possible though, right?
if u have a shit load of training images
some nerds here are gona say 'well u can make synthetic data' but the truth is ur gona need to mask food items by hand and label them
i recommend against it
unless of course u have a method to label otherwise
u will need probably tens of thousands
this isnt rly a 1 man project
this is the kind thing that google will put out in a years time
hey look at this plate, these is what is on the plate: carrots, beans and potato
something that good sounds like a research teams project
im sure u can find food datasets out there but idk on a plate
really? there are a handful of fitness apps that do precisely this. recognizing foods from a picture and trying to estimate stuff like nutrients and calories
the thing is i need to upload some pictures in the beginning and later i will upload some unseen pictures, it should only say if they are similar to some degree (like 80%) but the similarity shouldnt be affected by position of the food on the plate
idk how widely available their models are though, idk how exactly they do it
hmm
can u tell me one of those, might use as reference
I'm fat af i wudnt know
there's also onmyplate
i need to train it with images I send
indeed, idk if their models are even available somewhere
if their model is available, you could consider transfer learning
ow
remove a few layers of theirs, keep their parameters fixed, and add a few of your own trainable layers
unfortunately idk how any of this works lol
although i can learn, would be easier if i can use something abstract
while using an autoencoder for denoising/sparse representation, since both sides contain the original image, how does the model learn to generate the object of interest rather than learning a sparse representation that reproduces the noise?
is it because most of the time, info relating to the object of interest >>>> info relating to noise
and while getting rid of some data both are equally penalized, but what remains is the object of interest??
as a 15 yr old is it possible to put them all in my mind?
Hi! I'm a beginner in ML, and I have a question. If the resources allow it, would it be better to learn how, and setup a dedicated GPU to train models or use paid online GPU servers?
Well if you are a beginner in ML you are probably still learning about the basics
And I doubt you would need to run a 100 million parameter model on your own gpu anytime soon
anyone know how to make a restaurant menu into a json using image to text recognition would i have to train a ai? any ideas
can you recommend free gpu servers alternatives?
anyone good at ai i got some questions
always ask your actual questions. don't ever ask to ask online.
Guys im tryna sort a dataframe by a function of one of its columns but each value has to check each other value in the column and the dataset is far too large for that. Specifically, each value must check how many times it occurs in the column. How would you go about this?
Specifically, each value must check how many times it occurs in the column
you should just calculate the `value_counts` once, and reuse it.
df.sort_values(by='some_col', key=pd.Series.value_counts)
hmm, that actually won't work
holy, key seems useful
yeah, but what I did was wrong. the key has to return a Series of the same length as the one you passed to it
ah
thats okay Ive been thinking for a day and a half. take your time
take a look at this https://paste.pythondiscord.com/qawayipise
yeah Im actually not worried about sorting the original df so using value_counts on the series was good enough
but now I think ill try and make that work just cause
doesnt seem to work as soon as i try to reindex it
show code (no screenshots)
def common_answers(data):
    return data.Answer.value_counts().reindex(data)
oh i have to probably sort after i reindex it
not sure what this is supposed to accomplish by itself.
also, I would do data['Answer'] instead of data.Answer
it goes through my dataframe and reports the frequency of answers in descending order
why no dot notation
dot notation doesn't work for column names that aren't valid as attribute names. and a lot of people find that use of dot notation to be jarring and ugly.
(some people wish they'd just remove it from pandas)
what does the style guide say lol
well, the python style guide would say to never name an attribute with a leading capital letter.
and it doesn't address ways that one could overload __getitem__
thats a backwards application of the rule but i respect it
yea if I figure this out cool if not whatev
you still need to call sort_values at some point.
hmm, looks like my second approach doesn't work either
its not reindexing properly idk whats happening to the values count
not sure what you mean by "not reindexing properly" either it is indexing properly, or you got an error message. and those are two different things.
I do not understand how it is reindexing
anyway, just ignore everything I said (other than the part about style) and read this https://www.geeksforgeeks.org/sort-dataframe-according-to-row-frequency-in-pandas/
even though geeks4geeks is a very terrible website
i expect to see the value count as the index for the dataframe when i reindex it but it is not
you don't want the value count to be the index, at any point in the process. because then you'll have rows that are considered duplicates
thats what the output looks like on the paste you sent
no, because the index for Rohan is 4, but it only appears twice.
I already told you to forget what I said about that
fair kk gonna learn this one then
I don't usually tell people to forget what I say. most of what I say is great.
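For the sort-by-frequency question in this thread, here is one sketch that satisfies the requirement that `key` return a Series of the same length as its input. The `Answer` column name comes from the snippet above, and the data is made up. It is not the approach from the linked paste or article, just one way that appears to work.

```python
import pandas as pd

# Made-up data; only the column name 'Answer' comes from the thread.
df = pd.DataFrame({"Answer": ["yes", "no", "yes", "maybe", "yes", "no"]})

# Compute the counts once, then reuse them.
counts = df["Answer"].value_counts()

# key receives the 'Answer' Series and must return a Series of the same
# length; mapping each value to its count does exactly that.
by_frequency = df.sort_values(
    by="Answer",
    key=lambda s: s.map(counts),
    ascending=False,
)
print(by_frequency)
```

The counts never become the index at any point, so the rows stay uniquely labelled.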
What results should I hope to achieve with NLP classification? I've been playing with a lot of hyperparam tuning and still only hovering around 80% accuracy. I'm not sure I could achieve higher
Problem is real vs fake Amazon reviews
I've seen/talked to others using hotel reviews which can hit accuracy in the 0.9s so I'm not sure
does anyone know how to encode the tags column like this?
Hello guys, I've written and tested a collection of REST API-s using the Flask framework in Python, for interacting with the Elastic cluster without breaking any security protocol or exposing any credentials/access tokens/confidential data. I'm open to collaborating on this project to add more functionalities or structure it in a better way in order to scale it to a more significant project.
So, please give it a look, star it, fork it and if you're interested in collaborating and then include it in your resume as an open source project, feel free to contribute by raising a PR.
GitHub Link of the project: https://github.com/atanughosh01/elasticsearch-api
Is there a better way to group these algos? How could I call the first group?
I dont want to split the first group into linear/Bayesian/non-parametric or so
Is there a better way to structure these Algos?
things like KNN really dont belong in the same bucket as logistic regression, so the only thing that makes sense is "Other models" and those models coming at the end
Thanks... yeah I figured there is no simple way to categorize those unfortunately...
Ill probably use your approach and put the "Other Models" in the end
another question though: Sklearn categorizes LogReg as Linear, which makes sense in a binary context. Sklearn also categorizes Bayes as a separate category. But isn't it also kind of linear in binary contexts?
naive bayes doesn't involve trying to fit a line, right, it's purely based on probabilities of events.
linear models, as i understand them, essentially involve some kind of regression and line fitting, where the line itself is linear.
logreg is a linear model because it also involves fitting a line, and then transforming that essentially using sigmoid
linear doesn't just refer to "lines" in the traditional sense, but rather to linear transformations. this includes matrix-vector products more generally
you could pair this with bayesian estimation if you wanted to, as well
you could put all of those under regression. you formulate a statistical model that can be parametric or not, bayesian or not, and linear or nonlinear. you can then choose how to find the parameters.
you can fit DL under different parts of these categories depending on if you use them as solution approaches or as part of the model. decision trees would be part of how you formulate the model
that'd be my take on it from the estimation theory perspective, at any rate. it's just very common that nowadays when people say algorithm, they mean "description of the model"
In this they are just using one hot encoding a
Hey Edd, thanks for your insight! I am sorry, but I do not have that deep mathematical/theoretical background and english is not my first language. I thought, SVM and K-Nearest Neighbour would be non-parametric models, as the model structure is not defined prior to training. And using Bernoulli Naive Bayes would be the only bayesian model that I'd have here, isn't it? Please excuse my ignorance, but how would you group these models then ? Which models would be parametric/non-parametric in your eyes?
for svm, i'd say it depends largely on your interpretation. SVM works by assuming there exists a hyperplane that separates the points in space. being on either side of the hyperplane defines what category the points are in. the task is to find the parameters describing the hyperplane. i'd be willing to call that parametric
knn looks nonparametric to me
as you have written them, bernoulli naive bayes seems to be the only bayesian one, yeah. you could mix and match that with some of the deep learning approaches depending on how you use them. also with logistic regression and svm
okay, but the hyperplane basically is the space between, that I try to measure - with KNN I do basically do the same, but measure the distance between the data points, but not via a hyperplane ?
but the deep learning approaches are neither bayesian nor linear, are they? The multilayerperceptrons try to fit nonlinear data, and they are also not bayesian?
they can be bayesian depending on the function they minimize
the cost function with which you learn the parameters determines this
is semantic segmentation/superpixel method, then masking, then reconstruction, then calculating reconstruction error that fast to incorporate real time anomaly detection??
yea ur gona need to treat the tags as categorical so that one certain list combination is one category and then get dummies
oh, u want seperate headings
thats gona get u the same results just harder really
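If separate per-tag columns are what's wanted, a minimal sketch follows. The screenshot isn't visible here, so the layout is an assumption: a `tags` column where each cell holds a list of strings. The frame and its values are made up.

```python
import pandas as pd

# Hypothetical frame: each row carries a list of tags (layout assumed).
df = pd.DataFrame({"id": [1, 2, 3],
                   "tags": [["a", "b"], ["b"], ["a", "c"]]})

# Join each list into one 'a|b' style string, then split it back out into
# one 0/1 indicator column per distinct tag.
dummies = df["tags"].str.join("|").str.get_dummies(sep="|")

encoded = pd.concat([df[["id"]], dummies], axis=1)
print(encoded)
```

This assumes no tag contains the `|` character; a different separator would be needed otherwise.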
Guys is it necessary to read research papers or books for ML or is that only reserved for a more research tailored approach
Sometimes the best explanation about an algorithm is straight from the source @nova matrix
If you can find a better explanation, then do so, but reading some research papers can be useful
I am currently on my path tbh. I started Data Science and ML like 6 months ago. I now know how to use the python libraries and use some algos like SVM, RF, Logistic, Linear, KNN, lightgbm, xgboost
Idk what to do next tho
What should the next approach be
Should I broaden my understanding of these algorithms through books first or do projects on kaggle
Also when should I make the jump to Deep Learning
hackathon
in what context?
show
object
so they're strings. where did this file come from?
looks like the cells in excel were formatted incorrectly
yea it looks like excel is fked
tat fuckery of a dataset
ill delete the rows
thanks anyways
r u from usa
yes. why?
are there better courses in us
better than which?
it's going to depend on the university, but it seems like a lot of places just have some data science electives for computer science majors. hopefully we'll start seeing actual DS/AI degree programs in the next several years.
stanford ml seems so hard core
my uni doesnt have anything similar
my uni just do heres some code
and yea end of course
it really depends on where you look. you'd have to look at engineering or maths programs if you wanna go very in depth
but stanford is maybe a exception
my uni cs is in eng
but ive heard a lot of unis put it under math
maybe they're in the "cs = software eng" school of thought, which is not necessarily what you're after
yep
how bout in usa?
is it in eng or math faculty
are most cs faculties cs = seng thought
that varies by uni, so you'll find both in the states. you could find it under math or still in eng in other au universities
i wouldn't say so
my uni defs has this feeling
lol
ill show u the handbook
click core courses
it looks all seng
software engineering fundamentals oop comp sci projects
at my uni (I say "uni" even though I'm american), computer science had been part of the math department, but it was moved to the engineering school some number of years before I enrolled. but the curriculum was more theoretical than software eng.
you can click through my discord profile to my github to find all my details
u look handsome!
thx bb
u started college at the same age as me
I don't think I have my age anywhere? 
the ppl challenged me saying tell me if u can build a website with math
hmm i see DSA and discrete maths are the only relevant core courses. what is "higher mathematics" lol
it just feels facepalm
o its just harder math
relevant to what?
calc 1 and calc2
yes but what kind xD math is very broad a term
plus lin alg up to vector spaces/linear/transformation/eigenvalues/a bit of stats
relevant to their desire to learn ML in depth
CS courses that are relevant to ML? well, some people would argue that discrete math isn't really part of CS
yo what
"something that you use in CS" doesn't mean that it's part of CS
oof
tats wat most ppl dont understand
am I making a controversial statement?
not controversial, but i'd say you'd make a good lawyer
virginia is a "purple state". the governorship recently flipped red. but both senators and the presidential electors are blue.
not very many people actually like joe biden, but a slight majority prefer him to trump.
we've strayed very far from data science and ai.
from god
I've watched some of his videos, and I think he's good at explaining things. haven't done a whole course of his though
im redoing his new one
I abandoned god a long time ago.
i'd agree they're good courses. and i'm just memeing
lol in australia < 50% are christian
other suggestions, if you're interested, are gilbert strang's linalg and boyd's convex optimization (these are books)
i dont need alg and convex optimisation
i would argue if you studied those more, you would pick up ML more easily
yep
Hey,
I transformed an excel file (xlsx) into a csv file.
Then when i run pandas.read_csv('csv_file'), i get this error thrown: pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 76, saw 4
Any idea of whats going on? And how to solve it?
I'm trying to follow along with the fastai tutorial, I've copied the code they posted on their Kaggle Jupyter notebooks and copy-pasted the relevant parts of it to my IDE, but I'm getting this error: https://pastebin.com/AQYuAbtz
Any ideas how to fix this?
My code: https://pastebin.com/Jqbiicd9
Fastai code I copied: https://www.kaggle.com/code/jhoward/is-it-a-bird-creating-a-model-from-your-own-data
^ solved
How did you fix it?
started my non-import code with:
if __name__ == '__main__':
Hey guys, a question
I have a pd.DataFrame column with datetime index and I need to get intervals between column value crossing 100 from below until it crosses 0 from above - is this possible at all?
Here's a sample:
2022-07-01 00:00:02.804000000 1.665
2022-07-01 00:00:02.808999936 2.570
2022-07-01 00:00:02.816999936 3.635
2022-07-01 00:00:02.820999936 3.615
2022-07-01 00:00:02.824999936 4.280
2022-07-01 00:00:02.831000064 4.275
2022-07-01 00:00:02.840000000 4.595
2022-07-01 00:00:02.846000128 2.700
2022-07-01 00:00:02.852999936 3.605
2022-07-01 00:00:02.860000000 5.200
and I need to extract parts where the column value crosses 10 from below (i. e where previous value < 10 and current >= 10) and until it crosses 0 from above (previous > 0, current <= 0)
If the value crosses 10 from below again after the first time, I need to ignore it
how big is the dataframe? after thinking about it for a while, i don't think there's any clever way to avoid looping through that column
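Since looping seems unavoidable here, a minimal sketch of the loop as a two-state machine. The function name `crossing_intervals` and the test data are made up. The thresholds are parameters because the question mentions both 100 and 10 as the upper level.

```python
import pandas as pd

def crossing_intervals(series, upper=10.0, lower=0.0):
    """Return (start, end) index labels for each stretch that begins when
    the value crosses `upper` from below and ends when it crosses `lower`
    from above. Upward crossings inside an open stretch are ignored."""
    intervals = []
    start = None   # label where the current stretch began, if any
    prev = None    # previous value in the series
    for label, value in series.items():
        if prev is not None:
            if start is None and prev < upper <= value:
                start = label
            elif start is not None and prev > lower >= value:
                intervals.append((start, label))
                start = None
        prev = value
    return intervals

# Made-up values on a plain integer index; a datetime index works the same way.
sample = pd.Series([5, 12, 8, 11, 3, -1, 4, 15, 0])
intervals = crossing_intervals(sample)
print(intervals)
```

Each `(start, end)` pair can then be used with `series.loc[start:end]` to pull out the matching slice.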
https://www.youtube.com/watch?v=VC-H2z0Um6o
in this video, did i understand correctly if i say it's basically normal masking-based anomaly detection, but the mask is dynamic and its dimensions are a hyperparameter?
what else is novel about the author's model??
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. The proposed self-supervised block is generic and can easily be incorporated into various state-of-the-art anomaly detection methods.
The open-access paper can be found at:
https://arxiv.org/pdf/2111.09099.pdf
T...
what's the structure of the json, and what are the patterns
this is too much to look at, and it's not structured in any way that makes it easy to read.
Please remove the parts that can be ignored, so that your question is focused only on what is relevant. We don't know which parts can be ignored unless you tell us.
some non-arbitrary use of new lines.
I can format it myself if you do that much, I guess.
I already had a plan for how I would format it. I just need the irrelevant parts stripped out.
well, it's your job to communicate your question effectively. you have to give some example of what the data is that is usable by a volunteer answerer.
what is difference between loss and validation loss?
basically loss gets lower with every epoch, but if you want to look more closely and avoid pitfalls, you should also look at val_loss
https://www.javacodemonk.com/difference-between-loss-accuracy-validation-loss-validation-accuracy-in-keras-ff358faa
Difference between Loss, Accuracy, Validation loss, Validation accuracy in Keras
what is correlation between val_loss and loss itself?
Just genuinely, how hard is it to make AI? I see Code Bullet do it and it seems possible to even create the shittiest of AI
I get that math is a huge portion and it's important to know the math semi decently and Python but just even a shitty AI that learns how to scale boxes the fastest
it depends on how much you want to understand what you're doing. high level APIs can help you make super neat stuff without ever understanding what you're doing in depth. this will also limit which problems you can solve and how, but if that doesn't matter to you, the answer is "very easy"
Ah I see. What about making your own? Is that a possibility that doesn't require a ph.D
you need at least undergrad level math for that
if you wanna derive any sort of guarantee or description of the performance, then you need to have a pretty decent level of math
Ok that doesn't seem so hard
I will be learning undergrad math
Uhhhh, I looked through the pins, is there anything else you suggest to help me get started?
learn einstein summation notation and get comfortable with isomorphisms of vectorspaces as soon as possible
so start linalg asap
Nice, words that I need to google
Oh it sounds scarier than it is
Ok yeah, I got this. Thank you for the help
yeah it just sounds fancy, the main idea is stuff like, if you have linear transformations acting on matrices, you could instead vectorize the matrices and make a new, equivalent transformation for that
and similarly for n-dimensional arrays
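A small numerical check of that idea, as a sketch: the linear map X -> A X B acting on matrices is the same as one bigger matrix acting on the column-stacked vector vec(X), via the standard identity vec(A X B) = (B^T kron A) vec(X). The shapes and random values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3))
X = rng.random((3, 4))
B = rng.random((4, 5))

# Transformation acting on the matrix, then flattened column by column.
matrix_form = (A @ X @ B).flatten(order="F")

# Equivalent transformation acting directly on the vectorized matrix.
big_matrix = np.kron(B.T, A)                  # shape (10, 12)
vector_form = big_matrix @ X.flatten(order="F")

same = np.allclose(matrix_form, vector_form)
print(same)
```

The `order="F"` matters: the identity in this form uses column-stacking, not NumPy's default row-major flattening.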
Yeah, doesn't seem the worst™
Hi! does anyone have source codes for python chatbot?
Really ez if u can code well
But soon all coding will be done by ai so anyone can do it soon
I feel like that's not quite right
It is because I do it
No none at all depending on your target
Because you know how to code well?
Not even well but yeah
Please continue explaining
I'm almost done w my masters and the math I'm learning is purely supplementary
None of it was required
Supplementary meaning?
Data science
Spent a lot of time feature engineering from data base
Too long actually, become god with pandas and python
Import ai library
Get predictions
Wow look at that successful model
Step 1) Import AI library
Step 2) Predictions
Literally yes
Amazing. How would that be implemented into, lets say, an AI that tries to walk
Sure, this is an actual what do u call it
Duno the word for it but it's like a independent entity
Reinforcement learning plus computer vision shud do it
In the first place yes absolutely, now not so much as people have already done it
It is a fact that yes you need to know math to know what's going on in the background
Step 3) Have you tried importing libraries?
But no u don't if all u want is to make a suboptimal model
But if you don't care about the background
Ah ok, makes sense
I would like to be able to create it? If that makes sense
All this will be automated by ai writing code for u soon anyway
Why so
Because ai can already write code
And imho these jobs will all be replaced in like 10 years or less
Except for the hardcore research which yes requires a lot of maths
So, management asap for me
Lucky ur not doing this for career
Oh wait, literally every career will be replaced by ai
And you can do that without maths how?
Never said I do it without maths
not just talking about normal maths either that shit requires REALLY advanced maths
Good luck then
Oh thought u meant ai research
It is fun
I am learning Python
But I have experience in web and C++ (didn't like web development so I switched to Python a few days ago)
I tried django lately it was hard
What's Django
Hey is there a tool which can write stuff using AI on the basis of existing text?
Like add onto it.
Hi everyone, I am trying to implement a simple 3 layer neural network on the MNIST handwritten digits dataset using numpy only. I am facing some errors when running the gradient descent. can someone please help me out.
this is the jupyter notebook i am working on
404 error
i think it should be accessible now
i was following andrew ng's course and i didn't really grasped what was going on in the backpropagation portion
oh
so i thought doing this would help me understand
fellow discord, where do you start, when learning machine learning
all I know is that for each weight you add a partial derivative of the loss
have you already taken calculus?
and yes, implementing the algorithms by hand is good.
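For anyone implementing this by hand: in plain gradient descent each weight is moved against its partial derivative, scaled by a learning rate. A minimal sketch on a one-weight linear model (the synthetic data, learning rate and step count are made-up values for illustration, not from the notebook being discussed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5            # synthetic targets: true w = 3.0, true b = 0.5

w, b = 0.0, 0.0
lr = 0.1                     # learning rate (assumed value)

for _ in range(2000):
    err = (w * x + b) - y
    grad_w = 2 * np.mean(err * x)   # dL/dw for L = mean(err ** 2)
    grad_b = 2 * np.mean(err)       # dL/db
    w -= lr * grad_w                # step against the gradient
    b -= lr * grad_b

print(w, b)
```

Backpropagation in a multi-layer network computes the same kind of partial derivatives, just chained through the layers.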
although I am still not able to fully understand :' ))
ignored*
yes i have
you can go to deeplearning.ai
is it free?
this seems ok: https://www.youtube.com/watch?v=rEDzUT3ymw4
then learn tensorflow/keras
and solve mnist
i know how it works
oh
like algorithm and stuff
do u know tensorflow
i studied calculus back in school, not so properly in college
but how do you actually write the code
module?
well, if someone is trying to start with ML, I wouldn't even touch neural networks for a few months.
yep, I learned with this tutorial: https://www.youtube.com/playlist?list=PLhhyoLH6IjfxVOdVC1P1L5z5azs0XjMsb
thats true
my teacher started with regression
they get really complex really fast
my model's validation keeps getting stuck at 55%
how can I fix this?
my goal is >= 80%
also, think about what non-ML people think ML is. they're usually very wrong about what ML is. and it would be difficult to learn neural networks, while also reframing your whole understanding of what a neural network does.
maybe you're overfitting to your training data? did you use regularization?
yep, and dropout
l2 regularization
imma check it out later
thanks
ok
are you using sklearn?
ohh i dont have experience with either keras or a conv net, my bad
ik, when I first started, I thought ML was like trees and evolution
ok
my dad thought they were basically huge flow-charts.
haha lol same( not my dad)
how to properly use dropout and regularization
how large is your data set and how large is the network? might be you need some augmentation, too
my dataset is i think 30000
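and this is my network: ```py
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D

tf.config.run_functions_eagerly(True)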
model = Sequential()
model.add(Conv2D(8, 2, activation='relu', input_shape=(48, 48, 1)))
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))
model.add(Conv2D(16, 2, activation='relu'))
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))
model.add(Conv2D(32, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(MaxPooling2D(2))
model.add(Conv2D(64, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(Conv2D(128, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.003)))
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.4))
model.add(Dense(128, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0001)))
model.add(Dropout(0.3))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))
model.summary()```
the exact size of the dataset is 28,709
help?
if you think about the size of your image and the number of convolutions and max pools you're taking, it makes more sense to use fewer convolutional layers with larger kernels
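To see how quickly the image shrinks, here is a rough sketch that tracks only the spatial size through the conv and pooling layers of the network posted above. It assumes "valid" padding and stride 1 for the convolutions and a stride equal to the pool size for the pooling layers (which I believe are the Keras defaults); the helper names are made up for the example.

```python
def conv_out(size, kernel, stride=1):
    """Output width of a 'valid' convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output width of a 'valid' max pool whose stride equals the pool size."""
    return (size - pool) // pool + 1


# (layer kind, kernel or pool size) in the order they appear in the model.
layers = [
    ("conv", 2), ("pool", 2),
    ("conv", 2), ("pool", 2),
    ("conv", 2), ("pool", 2),
    ("conv", 2), ("conv", 2),
]

size = 48
sizes = [size]
for kind, k in layers:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    sizes.append(size)

flattened = size * size * 128   # 128 filters in the last conv layer
print(sizes, flattened)
```

By the last conv layer only a 3 x 3 map is left, and every kernel along the way only ever sees a 2 x 2 patch.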
Hello everyone! I have a newbie question! ☺️☺️☺️ I've begun to preprocess Reddit comments for a beginner personal project of sentiment analysis.
My question is
does everyone use TextHero for all preprocessing needs?
Or do you guys use individual libraries for individual preprocessing tasks?
Thanks in advance
I do nlp professionally. and I haven't heard of text hero. what preprocessing are you trying to do?
Never heard of text hero either
Hi! Oh wow! Thank you for taking the time to respond! Currently I have
-extracted the comments from one post on Reddit
-I have them in a list of strings
- I think I can convert the list of strings into one string to best get at the text
- then! I am, one by one, trying to remove emojis, numbers, url tags, stemming, tokenize, etc… (I think that is the next step?) I was using NLTK so far…
-after doing this I hope to have words with which to begin some beginner analysis (summarizing, wordblob, sentiment analysis)…
Am I on the right track as far as my process? I heard that TextHero is something to use that is a "one stop shop" compared to SPACY or NLTK. Is that correct? Or do you have a favorite?
Thanks so much for any advice
I usually use spacy, or implement it myself with regex. but if this text hero thing works, I guess why not use it?
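A minimal sketch of the "implement it myself with regex" route, using only the standard library. The patterns and the `clean_comment` name are illustrative assumptions, not a recommended pipeline; note that it keeps one token list per comment rather than joining everything into a single string.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")   # digits, punctuation, emojis
WHITESPACE_RE = re.compile(r"\s+")


def clean_comment(text):
    """Lowercase, drop URLs and non-letters, then split on whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text.split()


comments = [
    "Check this out: https://example.com !!! 10/10 😀",
    "I LOVED it",
]
tokens = [clean_comment(c) for c in comments]
print(tokens)
```

This is deliberately crude: contractions like "don't" get split in two, and there is no stemming. spaCy or NLTK handle those cases better.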
Very fascinating! I did start learning about "regex" a bit yesterday. And I will continue today with that research. I'm so excited! Thank you for the advice!
But since you work professionally and haven't heard of it…
TextHero probably isn't being used professionally. So I don't want to get too into it if it's a waste of time… I am really trying to tailor my learning to applicable job skills so that I can get employed as soon as possible
-thank you again
maybe other people are using it professionally, idk. I'm just one person
Lol that's some good advice also!
if you're doing projects and learning the terminology, you're probably spending your time well.
Thanks for the encouragement! It's lonely here on the self taught roads… But yes! I am working on my first real NLP project and it's sooo hard but I am loving it.
Took me two days to even figure out how to get the Reddit comments into the list of strings! but it's so fascinating and I'm hooked
self-taught. do you have prior industry or academic experience?
No I come from the service industry. Warehouse jobs and bartending… so I know I have to work a million times harder than those coming from graduate degrees and technical backgrounds… But I'm READY for this challenge!! I believe in myself. It's so encouraging to hear you work professionally and are self taught as well!!! Wow I can't wait for that to be my story
sorry, but I'm not "self-taught" in that sense. I got a degree and did research for my university. (though all learning is fundamentally self-learning.)
and I had been working for starbucks for a few years before that. so unless some life circumstances would make it supremely impractical for you to get a degree, I would encourage you to reconsider it.
any quants in here?
Surely there are some practical uses of AI n warehouses and bartending no?
(quantitative analysts)
in warehouses yes, for bartending i'd say in a similar* way it could be useful for dogwalking
I can think of an image classifier to detect drunk people
as in the application would prob be overkill
Unfortunately my employer isn't encouraging for learning and stuff. So I know I'll have to seek opportunities elsewhere.
lol quantify how much liquor to put in someones drink by identifying drunkenness via facial recognition
as drunkenness scores higher , beverage alcohol content lessens
lmk plz
May I please ask why you call yourself "quants"?
Is quantitative analysis your favorite or something? I'm just a newbie over here wondering…
i dont call myself a quant
i want to become a quant tho, thats why i ask
quant is just short for quantitative analyst bc nobody wants to say that every time, and quant sounds less nerdy
Very interesting! Thanks so much for explaining. I love to learn all these new things in the community
Unfortunately it's not viable for me. But thanks for taking a moment to chat with me. Hope you have a groovy day
What does quantitative analyst mean?
No one says quant over here
so ur new lol?
If ur in america sure. In Europe ur gona need a degree
No
I actually had to look it up too, seems a finance thing
Quant always crops up as some sort of super advanced high paying job in finance
Not in data science
Sounds really like they're making trading algorithms
quantitative is a word that means data that refers to quantities, therefore it can be counted, numerically expressed/scaled, etc
Yeah but that's really a broad term
not rly
So quant is someone who does quant? Awesome. Or you could just say you're a data scientist specialising in finance
Anyway, I think this terms american so make sense why no one says it here a lot
if someone is a quant it is implied that they have a remarkable level of proficiency analyzing specifically quantitative data . heavily oriented around statistics, stochastic calculus, eda
So data analyst on steroids
data analyst on super steroids yes
"The data analyzer" seems more fun tho
So you want to be a data analyst who's basically just more skilled than normal ones
That is quant ?
I guess data analyst has been really dragged through the mud whereby you'd imagine people making bad charts and stuff
Makes sense you'd want to distinguish from this
i think thats a gross oversimplification
there is a reason why most seasoned quants make 200k-2m+/yr with insane bonuses
Google says quants develop mathematical models
This is certainly not like data analyst
Seems like data scientist plus financial mathematician
that happened to be incredibly proficient in analyzing numbers
sure
u get the idea
Basically in simpler terms mathematician at the end of the day
no point in continuing this circular convo
simply this
abs not , 90%+ of the work is coding
anyway
Compared to DS at least
so are any in the chat?
This is a data science chat
its closely related
did you have any questions?
about quantitative analysis? yes
If u ask it maybe someone can answer u
But no that I know of there are no quants here
Mostly data scientists
just curious where to start from a relatively sr/adept python dev
im already quite familiar with data analysis+/science already, ofc databases, some hft
i did not go to college so my math expertise is pretty average
And you want to be a quant ?
Didn't you just state quants are like math gods
Yeah that makes sense
a little math is good for everyone, so i'd start there. you can also look at domain-specific techniques and models to get a feel for what kind of maths will avail you
and no i didnt, i just said statistics and calculus
but yeah ofc those can be pretty intense areas of math.
I mean we can point to data science stuff but for financial quants I don't think we have that knowledge
but its not like rainman mit shit
well all good thanks anyway boys
Statistics and calculus isn't that heavy for most data scientists
glad to hear
I mean, in terms of data science I'd say start with python
ive been procrastinating on learning calc
I've never met a senior python dev who's 18
yea i work remotely as a python dev
you're definitely gonna want some calc. in my very limited knowledge, finance relies heavily on differential equations, so you'll wanna reach that eventually
and stats/probability goes without saying
Try PyTorch or tensorflow
Tho I'm not sure you need it for quant
Sounds to me like numpy will be most important to you for maths in code
oh ofc those as well
i never got into building models from complete scratch tho so i cant say im a true ml engineer
yea i think scipy will be too
painnn bro ok thank you
I'd be willing to bet there are python libraries for financial analysis
yeah guess im hitting the books today
And other math stuff
god bros feels like im learning from scratch all over again
To be fair, ur only 18 man
7 more years until neuroplasticity declines
we speedrunning
I mean, most people here are a lot older than you ur starting very early
yeah im glad im starting now. i just got lucky
that i had a friend around me who pushed me into it. otherwise it wouldnt have happened
must be. ive seen some nice gui-based py tools for it, check out quantstats
Do what u enjoy tho, if it doesn't turn out to be quant don't chase it you may find more fun stuff along the way
I use STATA almost exclusively for stats
Aim for the sky but I wud say don't expect an 800k salary
never heard about it, educate me?
ya thatll take years to get lol
might as well get into entrepreneurship before then
biostats? biological statistics?
interesting
I recently did such report
do you have a med background?
Yes, I did a report on survival analysis
So thousands of people over a study period, with certain factors which influence outcome
interesting, what were the parameters?
name every single hyperparameter
It doesn't use hyperparameters because it's not machine learning
joking
It's stats.. no tuning
you said you did survival analysis which i'm guessing means analyzing what survives based on some parameters?
Regression etc
Do you mean what covariates were there
like weight, height etc.
Oh yeah
Ahem: sex, smoking status, bmi, age, education, drugs, alzheimers were all confounders
This is not what the model's "based on", they're adjusted for
The model's purely bmi and death as well as bmi change
For example, that's one model
im joking it just made me think of that joke like
oh ur a math major ? name every single number
No my bachelor was biomedical science my master is data science
no no i know i mean that was the joke
it was a joke
like ur a chem major name every single molecule
I'm also on the spectrum
more power to you
yeah i was gonna ask
Basically obese people were 3x death rate
Using cox proportional hazards
so you used bmi to predict survival chance?
Being "overweight" actually had a protective effect
Compared to underweight
Yes as well as a few other things
I have this sort of experience as far as stats goes, mostly inferential
Not much in terms of theory that you may expect
wait i'm curious, the different columns of the data like height, weight etc. do you not call that parameters in english?
No
that's what i meant but how do you call them?
In ML they call that feature, in stats it's variable
oh ok
Anyway, they weren't columns here in that sense
It's not predictive it's causal inference
Except for MICE which you can say is predictive i guess?
I saw you typing edd
Dependent and Independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or demand that they depend, by some law or rule (e.g., by a mathematical function), on the values of other vari...
This might help.
Most generic is just "input" and "output".
Absolutely based
yeah no i know what those are i just didn't know how to call them because i didn't learn this stuff in english haha
you have the data you use to predict and the data you predict
Yeah this is just a list of English terms that you can use.
yeah in your case i wasn't referring to that i was talking about ML
thanks!
Are you Israeli
but just out of curiousity, don't you do exactly the same thing as you would in ML? you just fit a model to the data you get and then you can use it to either make a graph or use it in ML to predict the chance of death right?
and for other stuff
Hold on a second
and yes i am
hi so i finally got my neural network to work, but for some reason it is not converging
Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes, before some event occurs, to one or more covariates that may be associated with that quantity of time. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the ...
It isn't the same as ml
can someone please point out why it is not converging
Nice ur country is amazing for comp sci
"Parameter" is technically valid here (it's also super generic like "input" and "output"), but every field uses the word differently.
thanks
in what way? you do fit a model to data right? so is it different just in the sense that you don't use that model to predict things?
Risk of event uses hazard function
Kind of
U canโt really say that logistic or linear regression is machine learning by default
In my opinion
no they're both concepts taken from statistics used for ML but it's the same concept
I'm sure a lot of people do but for me it's just a tool
With uses
Maybe it becomes Ml when you want to predict death for data where death is not available sure
That's why I said for MICE it can be considered such
what would it be useful for other than that?
i mean you could use it to see how weight affects survival rate but you don't really need to fit a model for the data to do that right?
you can just look at the data and see it
Odds ratios or hazard ratios rather than just someone's interpretation of data requires a model
And yes it gives you "probabilities" but you're still not doing ML
ok i understand
An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently (due to symmetry), the ratio of the odds of B in the presence of A and the odds of B in the absence of A...
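A worked example of the definition quoted above, with made-up counts (they are not from the BMI study being discussed). It also checks the symmetry the definition mentions:

```python
# Hypothetical 2x2 table: rows are exposure B, columns are outcome A.
exposed_event, exposed_no_event = 30, 70
unexposed_event, unexposed_no_event = 10, 90

# Odds of the event with and without the exposure.
odds_exposed = exposed_event / exposed_no_event
odds_unexposed = unexposed_event / unexposed_no_event
odds_ratio = odds_exposed / odds_unexposed

# Symmetric form: odds of exposure among events vs. among non-events.
odds_ratio_sym = (exposed_event / unexposed_event) / (
    exposed_no_event / unexposed_no_event
)

print(round(odds_ratio, 3), round(odds_ratio_sym, 3))
```

Both routes reduce to (30 * 90) / (70 * 10). A logistic regression reports the same kind of quantity, but adjusted for the other covariates in the model.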
Look at that
This is like, the fundamentals of logistic
Used aloooooot in medical research
Like the example I used above, different BMI categories
But you may have noticed I used cox, that's because it's time dependent
Beginner here, a little frustrated with understanding pandas and the format/type of the values. Specifically with dates. I'm pulling data out of excel and out of a sql database for processing. It seems that pandas/python typically thinks my dates are strings. For example, when I run this code:
df_projects = df_projects[pd.to_datetime(df_projects['stf_date'], errors='coerce').notnull()]
print(df_projects['stf_date'].dtypes)
It tells me that the data in column stf_date is an object (which I assume means pandas/python thinks it's a string).
Can y'all help me understand why I'm getting that result, immediately after conversion?
The format of the date in excel, which is the source of the dataframe, is %Y-%m-%d.
can you do print(df_projects['stf_date'].head()), so we can see what the timestamp format is? you might need to set the format= argument for to_datetime
!docs pandas.to_datetime
```py
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
```
Convert argument to datetime.
This function converts a scalar, array-like, [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") or [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame")/dict-like to a pandas datetime object.
2023-01-31 00:00:00 is the result after conversion.
is that what you want?
I would like it to stay as %Y-%m-%d, as it was in excel. It's a project, so we're just working with a resolution of days, not minutes, so I don't need minutes.
That is an object
Not really it's a date time object in memory
keep in mind that moments in time are not strings. if pandas is showing you 2023-01-31 00:00:00, but the data type is datetime, the way the moment in time is being represented for human viewing is not a string.
Bad computer!
so, just forget about whether or not it "stays as %Y-%m-%d", because that won't matter for your calculations. what matters is that the moment in question is being represented unambiguously.
Or some function that will change it
print(df_projects['stf_date'].head())
df_projects = df_projects[pd.to_datetime(df_projects['stf_date'], errors='coerce').notnull()]
print(df_projects['stf_date'].dtypes)
print(df_projects['stf_date'].head())
gives me:
0 Complete
1 Complete
2 2023-01-31 00:00:00
3 2023-01-31 00:00:00
4 2023-01-31 00:00:00
Name: stf_date, dtype: object
object
2 2023-01-31 00:00:00
3 2023-01-31 00:00:00
4 2023-01-31 00:00:00
5 2023-01-31 00:00:00
6 2023-01-31 00:00:00
Name: stf_date, dtype: object
What's the problem with it
so, these are strings. you want datetimes. remove the errors='coerce' part and see what happens.
also, why do you have Complete in there?
Well, I think I need to keep that as the column is initially mixed between strings and dates. For a project that is complete, that column has the string complete in it.
Oooof
it might be that the timestamps are actually datetimes, but that your column is heterogenous (has more than one type). your columns need to be homogenous (every value is the same type)
Mixing data types
So the purpose of that line of code was to try to convert everything to a datetime, and the strings get converted to NaN
you need to pick a better data model. that's going to ruin all your calculations.
And then removed.
You probably don't want to mix dates and complete in one column
Xd
Yes, this is true. That's why I'm removing all those rows with complete in that line of code. After that, everything in that column is a date.
you can have is_complete as a separate column of bools.
Good now ur problems fixed
pandas won't automatically infer that that makes the column homogenous.
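As far as the posted snippet shows, `pd.to_datetime(...)` is only used to build the row filter, and its converted values are never stored back, which would explain why the column stays `object` after the filter. A sketch of one way to do both (the tiny frame is made up, and the string format is an assumption based on the `%Y-%m-%d` mentioned above):

```python
import pandas as pd

# Made-up stand-in for the projects table.
df_projects = pd.DataFrame(
    {"stf_date": ["Complete", "Complete", "2023-01-31", "2023-02-15"]}
)

# Convert once and keep the result; "Complete" cannot be parsed and becomes NaT.
converted = pd.to_datetime(
    df_projects["stf_date"], format="%Y-%m-%d", errors="coerce"
)

# Keep the completion info in its own boolean column, as suggested above.
df_projects["is_complete"] = df_projects["stf_date"].eq("Complete")
df_projects["stf_date"] = converted   # column is now homogeneous

dated = df_projects.dropna(subset=["stf_date"])
print(df_projects["stf_date"].dtype)
```

If the Excel cells arrive as real datetime objects rather than strings, dropping the `format=` argument is probably the thing to try first.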
OK, so let me take a step back here and talk about the table that's feeding this thing as I could use some advice on how better to build it.
In fact, lemme go stare at it a bit and see if I can reconfigure the table that's feeding the dataframe. It could definitely be improved, especially considering the issue I'm having at the moment.
Thanks for your help so far folks. Much appreciated.
Well, yes, but I mean, I shouldn't have the word 'Complete' in there. it's not misleading when you're looking at Excel, but perhaps there's a better way so that when I feed this into pandas it makes more sense to pandas.
However, once all the rows with complete are gone, why does pandas still think it's an object?
And can I convert it to datetime?
Date time is a object
O.o
not in pandas
Thus, my status as beginner.
It's automatically changed it?
What is it?
Anyway using date times in my opinion sucks in pandas I use float from time delta
it's a primitive type in pandas, datetime
So you're saying pandas can't do the math without converting it to datetime ?
not sure. but keep in mind that any non-objects in the dataframe are C primitives, and there's a C primitive for datetime (in the sense that they're represented as a float, or something)
Ok from excel dates maybe import them first as a python list?
My code does need to see it as a date. I guess it would be nice if I didn't have to convert it to a datetime when I use it from the dataframe.
the datetime in the python stdlib, however, is an object. because everything that's natively in python is an object.
@timid kiln what happens if you take two values and subtract them?
In ur python
Just get two random values from ur column
You can use iloc
If it gives you a timedelta then that's a w
Basically the table coming in is a list of projects and the stf_date is the date when the project should kick off. If the project was completed, that value is Complete even though I have another column called Status that has Complete in it. it's just that, as I pulled this data together, I didn't have dates for projects previous to 2021, and I do need to have the data in the table. So I'm thinking partly to put in a fake date so that I can have only dates in that column. If the project is complete, it doesn't really matter what's in the stf_date column. I just don't like it having a bogus date in there. I'm a bit OCD I guess.
What are u gona use pandas for
pandas is helping me build another table based on the information in the projects table. A status table, if you will, for forecasting the future.
Based on that stf_date and the start date of the project, certain values will be calculated and placed in a column that pertains to each month in the future.
What values
For now, how can I get pandas to recognize it pulled a date in from excel?
Have u tried putting the date from excel into a python list first
I'll answer all the questions about what I'm doing after I figure this one thing out. Dates in python have been kicking my butt for the past week, ever since I found out about datetime.datetime and datetime.date and timestamp and how they all hate each other apparently.
Put it in native python first and see
When you make a column out of it might convert
I haven't, because I never thought about testing that, nor did I think a value in a list could have a format, so to speak?
Yes, that's right. I never thought about testing it like that tho.
A list just lets you access it in memory daily
So how does pandas recognize something is a date in a column? When I pulled in data from a sql server, it had timestamp in the dataframe surrounding those values. So it seems that there needs to be some text of some kind in the dataframe for pandas to go "oh, this is a datetime.date or something"
Why donโt you test what steraclus said
Make a new column which is defined as date column minus a value
Where that value is another date maybe
And see if it works
It shud give you another date object that's a time delta in days
OK, I'll give that a try. Thank you for the suggestions!
If that works there's no issues for having that object type
If it doesn't work you'll have to convert it
Because it's looking at native items inside the series though I'd expect it to work
I'm about to have to work with date time dataframes myself
Need to define if someone died within a year of an event
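A sketch of both points: subtracting two datetime columns gives a timedelta column, and that can be compared against a threshold. The column names and dates are invented, and treating "a year" as 365 days is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "event_date": pd.to_datetime(["2020-01-10", "2020-03-01", "2020-06-15"]),
        # None means no death recorded; pandas stores it as NaT.
        "death_date": pd.to_datetime(["2020-12-01", "2022-01-01", None]),
    }
)

# Subtracting datetime columns yields a timedelta column.
elapsed = df["death_date"] - df["event_date"]

# Comparisons against NaT are False, so missing deaths are not flagged.
df["died_within_year"] = elapsed <= pd.Timedelta(days=365)

print(elapsed.dtype)
print(df)
```

`elapsed.dt.days` gives plain day counts if floats or ints are easier to work with downstream.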
I have a dataframe, that contains columns with some features, in one column I have seen that when a feature exists , then in another column I have empty values, is there a way to express that programmatically or check if it is the case for other features as well ?
You'd have to draw a diagram or something but anything's possible man
Ok I get it, yes that's easily possible
Example : column_name = "is_brown", with values [1,0,0,0,0,1] then I have another name "is_black" with values [0,1,1,1,1,0] , here the case is obvious but I used correlation and cramers and got 0.48, while it should be something better
It should be 1.0 u mean?
Correlation isn't gona work with categories like that by default
In that case it should be 1, but in my case it is more complicated, instead of is_black, I have various colours like red,green etc etc, so by taking the is_brown I should produce something better
It would treat it as a number but I'd have thought it wud be higher than 0.48?
It didn't error when tried missing value?
So for example instead of saying correlate the whole column, check for correlation only when value is 1?
I covered it with 0
Oh that's right I think it also skips missing
If u covered it with zero it's going to break ur pattern try not doing that
It might skip empty rows anyway
U may end up with 0.50
Ok I think I know why
Basically your correlation method is treating them as scalars
And sometimes you have 0>1 and other times 1>0 ?
So it should land on 0.5 ?
So if u skip the empty it shud be 0.5 not 0.48
Exactly 0.5
Unless you have more 1s than 0s in the first column by a little
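A quick check on the toy columns from the example, plus a direct way to test "whenever this flag is 1, that other column is empty" without going through a correlation coefficient. The `colour` column is invented for illustration. For exact complements the Pearson correlation is -1 rather than +1, so the sign is worth looking at as well as the magnitude.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "is_brown": [1, 0, 0, 0, 0, 1],
        "is_black": [0, 1, 1, 1, 1, 0],
    }
)
r = df["is_brown"].corr(df["is_black"])   # Pearson correlation

# Hypothetical second feature that is empty exactly when is_brown == 1.
other = pd.Series([np.nan, "red", "green", "red", "blue", np.nan], name="colour")

# Cross-tabulate the flag against "is the other column missing?".
table = pd.crosstab(df["is_brown"], other.isna())

# Boolean answer to the original question.
always_empty_when_brown = other[df["is_brown"] == 1].isna().all()

print(r)
print(table)
print(always_empty_when_brown)
```

Working on `isna()` keeps the missing values as information instead of overwriting them with 0, which changes what the correlation measures.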
do you add batch normalization after every layer?
Just doing basics. Anyone see why I am getting the exact same values from LinearRegression and Ridge?
have been playing around with alpha just to get a noticeable difference, but they're always same
that'll depend on the singular values of the gramian of the model matrix
you can think of alpha as a factor being added to the singular values of the matrix M^T M, where M is the model matrix. in your case that you have a linear model, that'd be an N x 2 matrix M, where N is the number of examples you have
you'd need to compute the singular values of this M^T M and see how large they are. alpha needs to be comparatively large to have any effect, as alpha -> 0 means the regularization disappears
Perfect, that makes sense. Thanks!
in your case, the matrix would be given by M = [x 1], where x is the vector of x values, and 1 is a vector of 1s of the same size as x. then you can compute M.T.dot(M) and ask numpy to compute the SVD for you
take a look at the singular values and see how large they are
hoo boy, that'd be why, then
try something in the scale of e3 or e4, so that you modify at least the lower singular value. if you go up to e10, then you'll definitely see differences
also notice that these vandermonde matrices often have this nasty condition number (the ratio of the largest to smallest singular values)
tbh idk if scikit learn internally works with the vanilla least squares expression or if it uses 1/N in front (same minimizer, smaller minimum). at any rate, using alpha in that range e4 to e10 will certainly net you some effect
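The check described above, sketched with NumPy only. The data is synthetic, and the closed form below penalizes both the slope and the intercept, which is a simplifying assumption; scikit-learn's `Ridge`, as far as I know, leaves the intercept unpenalized by default, so its numbers will differ somewhat.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
y = 2.0 * x + 5.0 + rng.normal(scale=10.0, size=x.size)

# Model matrix M = [x 1] and its Gramian M^T M.
M = np.column_stack([x, np.ones_like(x)])
gram = M.T @ M
singular_values = np.linalg.svd(gram, compute_uv=False)
print(singular_values)


def ridge_fit(alpha):
    """Closed-form ridge solution (M^T M + alpha I)^-1 M^T y."""
    return np.linalg.solve(gram + alpha * np.eye(2), M.T @ y)


ols = ridge_fit(0.0)   # alpha = 0 is ordinary least squares
for alpha in (1.0, 1e4, 1e10):
    print(alpha, ridge_fit(alpha))
```

When alpha is tiny next to the singular values, the ridge coefficients are indistinguishable from least squares, which matches the "exact same values" symptom.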
that data doesn't look so linear lol
Probably shouldn't be using linear stuff anyway, but
Lol, just practicing
Next up, log
Thanks for the help @wooden sail
all good
Sorry, fell asleep. That's interesting. I mainly didn't like web development because it was artsy and stuff
I am not an art person at all so creation of websites was harder than usual and I just prefer arbitrary numbers
yeah its pretty horrible to learn
typing html makes me sick
It's honestly not bad for me
Like it's fine. It's just, not something I ended up enjoying
HTML and CSS were easy to type and code up but then when I came to actually making it look nice? That was what sucked ass
Oh and atom is a fucking legend for the Beautify. What an amazing way to not have my html not look like shit
Easy to type to you , for me just nasty experience
My wrist physically hurts with all the <>
/.
has anyone used RLLIB with pettingzoo?
Auto fill are a blessing
hey everyone
i really need some help with pytorch
im confused about
the following error:
All my input files (in the dset folder) are 50 x 50, and my neural network has an input of 2500, so im confused as to what's going wrong
Hi and welcome!
It's not a channel for memes and shitposting though
it seems your input is not of size 2500, but 2500x3. are your images in RGB or black and white?
It's not meme. It's too late.
they are in rgb
that's your problem, then
you can convert to greyscale or apply a transformation that affects all 3 color channels
a simple way of doing the latter is to fully flatten the input to size 7500
what does flatten mean?
just change the input size to 7500?
ok
what documentation
can i refer to in order to convert
the matrix
into a vector in pytorch
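"Flatten" here just means reshaping each image into one long vector. The sketch below uses NumPy so it runs without PyTorch installed; the batch size of 8 and the channels-first layout are assumptions. On PyTorch tensors, `torch.flatten(x, start_dim=1)` or an `nn.Flatten()` layer does the same reshape.

```python
import numpy as np

# Stand-in for a batch of 8 RGB images, 50 x 50, channels first.
batch = np.zeros((8, 3, 50, 50), dtype=np.float32)

# Option 1: flatten everything except the batch axis -> 3 * 50 * 50 = 7500 inputs.
flat = batch.reshape(batch.shape[0], -1)

# Option 2: average the colour channels (a crude greyscale), then flatten -> 2500.
grey = batch.mean(axis=1).reshape(batch.shape[0], -1)

print(flat.shape, grey.shape)
```

With option 1 the first layer of the network needs 7500 inputs; with option 2 it can stay at 2500.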
i wouldn't know where to look, i've never used pytorch. go to pytorch's website and look there
alright
It's an immutable rule
Can I get some quick help please with DateTimeIndex and pandas?
I can say the set is consistent, say I have some dataframe df I can slice like:
df1 = df["2020-01-01":"2020-10-01"]
and get fine results
But if I try to jump over years, say df4["2020-05-01":"2021-01-01"] it returns a failure:
```
<ipython-input-58-cf7eb8035389> in <module>
----> 1 df4["2020-11-01":"2021-01-01"]
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3005 # either we have a slice or we have a string that can be converted
3006 # to a slice for partial-string date indexing
-> 3007 return self._slice(indexer, axis=0)
3008
3009 # Do we have a (boolean) DataFrame?
~\anaconda3\lib\site-packages\pandas\core\generic.py in _slice(self, slobj, axis)
3807 Slicing with this method is *always* positional.
3808 """
-> 3809 assert isinstance(slobj, slice), type(slobj)
3810 axis = self._get_block_manager_axis(axis)
3811 result = self._constructor(self._mgr.get_slice(slobj, axis=axis))
AssertionError: <class 'numpy.ndarray'>```
df["2021-01-01":"2021-12-01"] slices fine, and the whole frame df has no issues. I get the same error when I slice over 2021/2022 also (e.g. df["2021-11-01":"2022-01-01"])
Not sure what's happening here. I am taking advantage of the DateTimeIndex feature in pandas when it comes to using strings to filter/select/slice so maybe I could go barebones and just reference everything without using strings, but I'm curious as to how I've gotten here.
For what it's worth, the data is consistent across the year jump (that is, a lineplot vs time for the set displays what it's supposed to display)
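The thread doesn't establish the cause of this error. One thing that may be worth checking (an assumption, not a diagnosis) is whether the DatetimeIndex is sorted, since label-based range slicing expects a monotonic index. The sketch below builds a hypothetical out-of-order frame, checks is_monotonic_increasing, then sorts and slices across the year boundary with .loc.

```python
import pandas as pd

# Hypothetical daily data spanning a year boundary
idx = pd.date_range("2020-10-01", "2021-02-28", freq="D")
base = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Put the later rows first so the index is deliberately out of order
df4 = pd.concat([base.iloc[80:], base.iloc[:80]])
print(df4.index.is_monotonic_increasing)  # False for this shuffled frame

# Sorting the index makes string-based range slicing well defined
df4_sorted = df4.sort_index()
window = df4_sorted.loc["2020-11-01":"2021-01-01"]
print(len(window))
```

A line plot can look correct even when rows are stored out of order, so a plot alone doesn't rule this out.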
Bro help
I used wav2lip on conda at windows 11
And it took 1 hour for 10 secs video
How to make it faster
Any idea?
I have RX560
Laptop
it depends on your code I think, like what you do with the video and how
Does it support your graphics card? Most often Nvidia cards are supported.
Hi!
I am working with results from an FMR machine.
I am trying to remove background noise. I have a reading/contour plot for background noise. I have another for background noise along with the magnetic sample. I'm trying to isolate just the signal from the magnetic sample.
On X axis I have current, on Y axis I have Frequency. The contour is the amplitude (which is the signal i'm trying to separate)
background only (horizontal lines) and sample+background (image with curve included)
when i simply did amplitude2-amplitude1, i get the following:
horizontal lines still present
so it's basically background noise removal from an image/signal, where i have the background already recorded
can someone help me im a newbie in ML and i have an important project to finish this week?
do you have many examples of the background or only one? you can try stuff like cv2's background subtraction methods. for a relatively simple approach (with no guarantees), you can look for the optimal amplitude to subtract, since the scales of the background image and the image with the signal in it don't seem to match. you'd want some scalar s such that
.latex $\arg\min_s \Vert I_{\text{signal}} - s\, I_{\text{background}} \Vert_F^2$
min w.r.t. s || image_with_signal - s*image_with_only_background ||_F^2
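A minimal sketch of that least-squares idea, assuming both readings are 2D arrays of the same shape. Setting the derivative of the squared Frobenius norm to zero gives the closed form s = sum(I_signal * I_background) / sum(I_background^2). The function name best_scale and the synthetic arrays are made up for illustration.

```python
import numpy as np

def best_scale(image_with_signal, background):
    """Scalar s minimising ||image_with_signal - s * background||_F^2."""
    num = np.sum(image_with_signal * background)
    den = np.sum(background * background)
    return num / den

# Synthetic stand-ins: horizontal stripes as background, a small patch as the "sample"
rng = np.random.default_rng(0)
rows = rng.uniform(0.5, 1.5, size=(40, 1))
background = np.repeat(rows, 60, axis=1)
signal = np.zeros((40, 60))
signal[10:15, 20:40] = 0.3
sample = 1.8 * background + signal  # background recorded at a different scale

s = best_scale(sample, background)
cleaned = sample - s * background
print(s)
```

The estimate of s is pulled slightly away from the true scale wherever the sample signal overlaps the background, so this works best when the signal covers a small part of the image.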
is there any way to prevent label encoder from encoding NaN values
One should first figure out why it's encoding NaN values.
You need to figure out why it's doing it. And then either fix your data so that that doesn't happen, or decide what you want to happen if a NaN does get encoded
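If the decision is that NaN should simply stay missing, one possible approach is sketched below. Note this swaps in pandas' pd.factorize rather than a label encoder class: factorize assigns -1 to missing values by default, which makes them easy to mask back to NaN. The labels Series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing label
labels = pd.Series(["cat", np.nan, "dog", "cat"])

# factorize gives each distinct non-missing value an integer code; NaN gets -1
codes, uniques = pd.factorize(labels)

# Turn the -1 sentinel back into NaN so missing stays missing
encoded = pd.Series(codes, index=labels.index).where(codes >= 0)
print(encoded.tolist())
```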
If I have two dfs with the same exact columns but different data, what is the best way to join them?
I found .concat(), .merge() and .join() but not sure which is the best for this
concat and append seem to do what you want
Worth noting: I noticed I was somehow losing some rows with .concat()
Seems like the background is not always the same, so you can't just subtract
A very raw way to do it would be maybe taking the mode (or median) color of each row in the image, and subtracting that from the image
Since the background seems to just be horizontal lines of mostly the same color
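A rough sketch of the row-median idea, assuming the image is a 2D amplitude array where each row is mostly background. The function name subtract_row_median and the toy array are made up for illustration.

```python
import numpy as np

def subtract_row_median(image):
    """Subtract each row's median, which removes a constant per-row offset."""
    return image - np.median(image, axis=1, keepdims=True)

# Toy image: every row has its own constant level (the horizontal lines)
stripes = np.arange(5, dtype=float).reshape(5, 1) * np.ones((1, 9))
signal = np.zeros((5, 9))
signal[2, 3:5] = 10.0  # small feature covering less than half of the row

cleaned = subtract_row_median(stripes + signal)
print(cleaned[2])
```

This only works while the signal occupies less than half of each row. Otherwise the median starts tracking the signal instead of the background.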
if, by joining, you meant stacking them vertically, then pd.concat should work
it won't remove rows, so you need to prove that :p
merge & join perform more complicated concatenations; append is deprecated and will be gone
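A quick way to check the "losing rows" claim, sketched with two hypothetical frames that share the same columns. The ignore_index=True argument just renumbers the rows of the result.

```python
import pandas as pd

# Two hypothetical frames with identical columns but different data
df_a = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.7]})
df_b = pd.DataFrame({"id": [3, 4, 5], "score": [0.1, 0.9, 0.4]})

# Stack vertically and renumber the index from 0
stacked = pd.concat([df_a, df_b], ignore_index=True)

# The row count should be the sum of the inputs
print(len(stacked), len(df_a) + len(df_b))
```

If the counts match here but not on the real data, the rows are probably being dropped by a later step rather than by pd.concat.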
@candid garnet that would give you something like this
Hey guys, new to image recog, cnns, have a problem.
when i use model.summary (model is a Sequential object from keras), the output is being shown as multiple, instead of the usual tuples, please help
my code
please always give text as actual text. Code, error messages, etc. Only GUI stuff should be given as screenshots.
Would be nice to show the summary as well
Don't know what kind of datatype a multiple is
If it is a datatype at all
okay will tc next time
```model.add(keras.layers.Conv2D(64, 3, activation='relu', padding='same'))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Conv2D(128, 3, activation='relu', padding='same'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.BatchNormalization())```
```model.add(keras.layers.Flatten())
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(32, activation='relu'))
model.add(keras.layers.Dropout(0.3))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(class_num, activation='softmax'))```
```model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy', 'val_accuracy'])
model.build((32, 32, 3))
model.summary()```