#data-science-and-ml
ye
should be able to df.to_csv("path_to_file.csv")
replace df with the name of whichever dataframe you want to create a csv from
also, pandas has lots of options for reading and writing csv files, might be worth a look if you're looking to do something specific.
That line I gave you above should work, just give .to_csv() a path to work with
ok I erased the df1 completely, thank god I was smart enough to make 3 copies
lol
File "<ipython-input-85-3aedcd768420>", line 1
df1.to_csv("C:\Users\redacted\Dropbox\Documents\Elite")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
lol
sorry if I'm bothering you lol
On Windows you need to watch the backslashes in your path, they escape characters. Try "C:\Users\redacted\Dropbox\Documents\Elite\my_file.csv"
Same thing
hmm let me google
df1.to_csv(r"C:\Users\redacted\Dropbox\Documents\Elite\my_file.csv")
This worked
nice!
what does the r do?
r = raw, means don't escape anything.
I also just noticed that my line above took out the extra slashes I put in
"C:\\Users\\redacted\\Dropbox\\Documents\\Elite\\my_file.csv"
that's what I sent. That should also work
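For anyone reading back later, a minimal sketch of why both spellings work. The folder and file names are placeholders, not the path from the chat. In a normal string literal `\U` starts a Unicode escape, which is what the SyntaxError above is complaining about; a raw string or doubled backslashes avoid that.

```py
from pathlib import PureWindowsPath

# Three equivalent ways to spell the same Windows path in Python source.
# The folder and file names are placeholders, not a real location.
raw = r"C:\Users\someone\Documents\my_file.csv"         # raw string: backslashes kept literally
escaped = "C:\\Users\\someone\\Documents\\my_file.csv"  # each backslash escaped by hand
forward = "C:/Users/someone/Documents/my_file.csv"      # forward slashes are also accepted on Windows

# The raw and hand-escaped literals produce the identical string.
assert raw == escaped

# PureWindowsPath treats both separators the same, so all three name one file.
assert PureWindowsPath(raw) == PureWindowsPath(forward)
print(PureWindowsPath(forward))
```

Any of the three can be handed to `df1.to_csv(...)`.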
Ok thanks
I will create a bar graph now and a geomap
Should take me 5 hours and then I will be done for the day
lol
ha! good luck!
Pondering how to get Python 3.9 running on Databricks since there's not currently a runtime that supports > 3.8. My idea is to run a cluster init script that installs 3.9 and then see if setting the PYSPARK_PYTHON environment variable to point to the install location does the trick. Don't see why it wouldn't work, but any other ideas?
Can anyone help me? i am getting some really weird model results i can't figure out why
welcome to Machine Learning
BTW if you post your problem here, someone might be able to better help you
I am running 4 ML methods: Random Forest, Gradient Boost, Logistic Regression, and RNN. They are being run a credit card default data set. The first 2 seem to run fine and the results make sense. Here is the conf matrix for RF:
My last two are way off and look weird
here is the LG result
i can't figure out why its not fitting correctly
RNN training looks like this
anyone machine learning devs able to help, my GAN accuracy does not improve from 0 even after thousands of epochs, not sure why
my code can be found here: https://paste.ofcode.org/EPdRW2bKfRAw22jtsktUQH
not familiar with vsc. sorry. i usually use pycharm or jupyter
share code?
also a lil off topic for ur current situation but play around with sklearn's GridSearchCV function and u will be able to make ur models more optimized
do they have any documentation of sorts??
@undone scarab code is here: https://colab.research.google.com/drive/1gADi2h2iJfwR1ISO192I6A3jVOPalIwA?usp=sharing
i think i got it though
i had to transform the data using a scaler
i dont have access
Hello guys - what solution would be recommended for doing point and landmark labeling of images?
what is the task?
Maybe put the plot line up top and try?
Hello Everyone.
I have written a blog on image similarity without using any fancy techniques such as CNNs and more. It uses pure maths and basic programming and the results are pretty good. Please do give it a read.
https://www.analyticsvidhya.com/blog/2021/03/a-beginners-guide-to-image-similarity-using-python/
I am also new to this group but I aim to learn a lot from you guys. Thank you everyone
anyone know how to plot data like shown in the chart, or what kind of dataset is required for this ?
don't forget to ping on reply
Hello, can someone help, I tried the advice from https://stackoverflow.com/questions/34682828/extracting-specific-selected-columns-to-new-dataframe-as-a-copy but nothing works
# %% [markdown]
# ---
#
# **TASK #1: Resample the data into weekly bins**
#
# - Store the result in DataFrame referenced by variable `night_mexico_w_angles_df` in DataFrame `night_mexico_1w_sample`
# - Make a selection to only use column `counts`
# - See:
# - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html
# - Note: when pd.Timedelta() is used more, string formats are supported
# - `pd.Timedelta('1day')` works, `resample('1day')` does not.
# - https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
# - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html
#
# ---
# %%
night_mexico_1w_sample = night_mexico_w_angles_df[['counts']].copy()
night_mexico_1w_sample
# TODO
...have you looked closely at the code you wrote?
or the error?
can you suggest to me what you think the problem is
I think, the problem is that it gives me AttributeError: 'Series' object has no attribute 'counts'.
good one. Too often people treat Deep Learning as a silver bullet for their tasks that can be done with simple algos.
Thank you. I just wanted people who are scared of terms like CNNs DNNs to move past that barrier and understand that the world offers many ways to do and all you need is passion. I personally was shit scared when I heard people talk about DL but I did this as one of my assignments got to learn something and now I am confident.
Thank you for your reply.
shit scared when I heard people talk about DL
It's not that tough. classical DL is a tinker toy compared to the really sophisticated stuff in development
I still have a long way to go. Thanks for the help. Love to be a part of this group
once you build an MLP (multi-layer perceptron), which is very easy and can be done in 10 lines, you would realize that DL is just that scaled up
ATTENTION IS ALL YOU NEED ? naah Passion is all you need
self attention is a bit complicated IMO
but really agree about passion. I consider it to be one of the most important aspect
Thank you
while a lot of people say that passion is not really necessary - it's a pretty common occurrence in really successful people. passion provides such a powerful internal motivation because a passionate person would do anything because he/she loves his work so much.
especially in tech
I tried, but I don't understand it at all
I am starting with pandas. There is a function similar to Excel's vlookup in Pandas?
pd.merge()
Depends on what kind of data you're looking for. You can do something like
df[df['column_name'] == 3] and it'll keep only the rows where column_name is equal to 3
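A rough sketch of both suggestions, using made-up tables (`orders`, `prices` and their columns are hypothetical). `pd.merge` with `how="left"` is the closest thing to a VLOOKUP, and the filter is a boolean mask built from the column itself.

```py
import pandas as pd

# Hypothetical tables: orders reference a product by id, prices live elsewhere.
orders = pd.DataFrame({"product_id": [1, 2, 2, 3], "qty": [5, 1, 4, 2]})
prices = pd.DataFrame({"product_id": [1, 2], "price": [9.5, 20.0]})

# how="left" keeps every row of `orders`, like VLOOKUP filling in a column;
# ids with no match (3 here) get NaN instead of raising an error.
looked_up = pd.merge(orders, prices, on="product_id", how="left")

# Boolean-mask filtering: the comparison is applied to the column, not to its name.
only_product_2 = looked_up[looked_up["product_id"] == 2]

print(looked_up)
print(only_product_2)
```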
Hello
Can anyone teach me or instruct or provide source code for for how to fetch the table data from worldometer website ?
You could follow up a tutorial on lxml and requests library from python and read on how does XPATH works. Let me look for some resources for you. Give me a second.
https://lxml.de/ lxml official site with it's installing instructions and documentation. This will help you fetch the elements of the html.
https://requests.readthedocs.io/en/master/ requests library which helps you connect to the website itself.
Note that you could also use the beautiful soup library instead of lxml. I find lxml easier.
Here you'll find someone actually scraping with python and lxml
https://www.youtube.com/watch?v=5N066ISH8og
Come back with any follow-up question when you try it yourself @hoary wigeon
Anyone knows an image labeling tool that can do point and landmark labeling?
young
Hey man I looked into this and it looks pretty cool. I certainly wouldnt have thought that simple Euclidean distance could be used for image recognition
you wouldn't have thought that because its a terrible idea
While it might not be "optimal" I don't see why it is exactly a terrible idea? Can you please elaborate?
euclidean distance is an awful metric for image similarity-- you can arbitrarily make an image "dissimilar" by adding a constant noise to the image, or even a constant value shift to the pixels
count histograms are a decent idea-- it was heavily used pre cnn days, but the metrics used were scale invariant
in the end, if you wanted a suitable loss function you'd end up with convolutions anyway
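To make the constant-shift point concrete, here is a small sketch with random arrays standing in for grayscale images (the sizes, value ranges and the `centered_euclidean` helper are illustrative assumptions, not anything from the blog post). A brightened copy of an image ends up farther away than a completely unrelated image, and removing each image's mean cancels the shift.

```py
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for grayscale images: random 32x32 pixel values in [0, 156).
image = rng.integers(0, 156, size=(32, 32)).astype(np.float64)
other = rng.integers(0, 156, size=(32, 32)).astype(np.float64)  # unrelated image
brighter = image + 100.0  # same picture, every pixel shifted by a constant


def euclidean(a, b):
    """Plain Euclidean distance between two images of the same shape."""
    return float(np.linalg.norm(a - b))


def centered_euclidean(a, b):
    """Euclidean distance after subtracting each image's own mean."""
    return float(np.linalg.norm((a - a.mean()) - (b - b.mean())))


print("same image, brighter:", euclidean(image, brighter))
print("unrelated image:     ", euclidean(image, other))
print("brighter, centered:  ", centered_euclidean(image, brighter))
```

Mean-centering only handles a constant shift; it is shown here as the simplest possible invariance, not as a good similarity metric.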
@bronze skiff the point was not in making a good algo, the point was just using something with simple algos for beginners to get their hands dirty and learn a bit about how things are like in DS/ALGO/ML like.
Though ofc, a lot of stuff has improved but still simple algos are one of the best ways to introduce beginners. DL in general is a bit complex when approaching it at the start
beginners should know what and why things are awful
You seem to be one of them
I tried to show how count histograms can be used. I did not yet include cross-bin relations, but they are known for giving better results. Mahalanobis distance is another distance metric that is much more generic; however, with the SOTA available now this is indeed not practically efficient, but enough to get a beginner excited
Please explain how can a "beginner" know what and WHY something is bad. This is hilariously out of touch.
I tend to ignore such things. There are a couple of people who loved it and I am pretty happy with it
uhhh pastafish is no beginner
Definitely not but he is awful
¯\_(ツ)_/¯
awful?? do you even have a job?
lmao
pastafish is just here to vibe he's pretty chill
I'm not going to debate people's credential as it is both senseless and dumb, but my point stands:
Please explain how can a "beginner" know what and WHY something is bad. This is hilariously out of touch.
a beginner should learn the limitations of their algorithms and why other techniques improve on them
this is a critical part of the scientific process
By definition, a beginner knows nothing / very little. Assuming that people who are learning / just beginning will somehow know all permutations of something and why it works / doesnt work, at first glance, sounds unrealistic.
They won't know why its not practical unless they try out everything
This is different. I agree with this.
The initial reply however was that the beginner should already know it.
I'm a beginner myself and I take huge amounts of time to understand how and why an algorithm does something, even if sometimes i struggle to generalize situations; I just dont like some of the intellectual elitism ive seen in the data science community
As far as I know I learned how to use Euclidean distance and how its not practical moved to DNNs and then to CNNs
This is interesting, since I've never done anything with image similarity; Whenever I compare images outside of classification its always unsupervised. I've just started getting into unsupervised with autoencoders and stuff; anyone here know how these metrics work?
when someone says "the beginner should know..." its the same as "the beginner should learn in order to progress..."
i apologize that it didn't transfer
Perhaps there's a bit of a language difference there, my bad.
English "should" always confuses me because it can mean 10 things lol
no prob
its not my first language anyway
so i probably made some interpretation errors
I could have written a blog on CNNs but I wanted to share what I am learning.
I started with this moved to DNNs and then to CNNs and I know why it doesn't work. Sorry I should have mentioned that it isn't pragmatic
i didn't criticize you, or the blog-- it was decently written if anything
i was just remarking that its an ineffective technique
if that's intellectual elitism in ds, then i don't really know what else to say
I think we are all speaking the same language here but we are not communicating properly and we are all making incorrect assumptions about the others lol
Oh my apologies if this came up as a direct. It was more a general rant, not talking about you in particular.
So true. My bad. I just got to know today that there are discord servers where you can share and learn what you know so I posted it here
if you're introducing someone to distance matrices, why not use something that does work well with euclidean distance?
images are complicated to reason about anyway
i dont think teaching people "wrong" stuff is worth it even if it makes the task simpler, you never know who is going to read this blog post then try to do it at work
I've seen beginners to DS/ML who just try to dive headfirst into sklearn
It's not pretty
I'm a complete beginner and I'm trying to plot some things with matplotlib but I'm stuck, is this the right channel to ask or?
yep. go for it
i have a college stats project, im comparing exam scores for our year group and the one below and we have to analyse it etc etc its the first time we've done this, i collected my own data and since i do cs i decided to try visualize the data with pandas and matplotlib (my first time). i was able to read the csv file into a dataframe, the column Level is just ' year group ' e.g. AS or A2. and MAG is like a minimum end of year expected grade, the rest are numeric values out of 5.
i want to do some type of plotting but i cant' seem to get it work. i want to plot revision against difficulty for AS group and try show a correlation if possible. i also want to show a barchart ( if appropriate ) for Grade Vs MAG.
this is what i have so far:
```py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv('Report Task.csv')
df.columns = ['Level','Grade','Difficulty','Revision','Happy','MAG'] #numerical values are out of 5
df[df.Level.str.match('AS')] #to get only AS group
plt.plot(df.Revision, df.Difficulty)
```
the plot looks really bad though its like a 2 year old just scribbled all over the graph.. could this be because of my data? its not really the best and i was only able to get 30 people to fill in the form
whats your data look like
can i send the spreadsheet link?
also this line
df[df.Level.str.match('AS')]
consider using .loc instead
yeah you can send it to me
Plots like that imply that your points' x-coordinates aren't sorted in ascending order.
plot just plots the points in the order provided and connects them with lines; making sure they are in the right order is up to you.
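A small sketch of the two usual fixes, on made-up 1-5 survey answers (not the poster's spreadsheet): sort by the x column before calling `plot`, or use a scatter plot so no connecting line is drawn at all.

```py
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up survey answers on a 1-5 scale.
df = pd.DataFrame({
    "Revision": [4, 1, 5, 2, 3, 1],
    "Difficulty": [2, 5, 1, 4, 3, 4],
})

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Option 1: sort by the x column so the connecting line moves left to right.
ordered = df.sort_values("Revision")
ax_line.plot(ordered["Revision"], ordered["Difficulty"], marker="o")
ax_line.set_xlabel("Revision")
ax_line.set_ylabel("Difficulty")

# Option 2: a scatter plot draws the points only, so their order does not matter.
ax_scatter.scatter(df["Revision"], df["Difficulty"])
ax_scatter.set_xlabel("Revision")

# In a notebook or interactive session, plt.show() would display the figure.
```

For 30 survey responses with repeated x values, the scatter version is probably the more honest picture of any correlation.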
model does not generalize well to the validation data. either it is overfitting or the validation data is not representative enough of the training set. if you are deriving the validation set from the train data only, then try to increase a split slightly.
use k-fold validation to see if it overfits
Quick Question - if the validation accuracy is constant, does that always indicate that the model is overfitting, or can it also indicate lack of complexity of model?
Maybe it could be complexity - because increasing hidden layers does help
Alas I have a query - Does anyone know what a bell-shaped accuracy curve can lead to? Googling it doesn't produce relevant results
Nah, it was just that my input features were not correctly processed leading to the weird shaped graphs
My best guess is that the models are finding spurious relations that are hampering their ability to make correct predictions. I found my input pipeline was pretty bad and naive so I am changing that to trained word2vec averaged (maybe multiply by their TF-IDF scores). that's a pretty good way to represent input data (if you are doing NLP)
It could be overfitting, or it could be that you've reached the accuracy you can reach for the task, or your model isn't strong enough to predict
If training is still decreasing it's overfitting
But in general it's all a bit hard to tell, might have to experiment
ye, you would reach max accuracy for a task pretty quick if your input data is malformed
Btw, I'm a mathematician, but I'm interested in learning programming, all aspects, so if someone would like to learn math, and if you're interested in teaching me programming, please DM me
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
has lots of good stuff
how do I force sent_tokenize to be multilingual?
define mathematician, and what can you teach us

statistics, linear algebra, calculus, discrete maths, probability?
the list goes on
Hi everybody, I am looking for someone who writes production code in any data science field that would enjoy having an apprentice or a newB that would ask questions about data science and programming, I ultimately am going to start a programming career using python in any manner, and eventually specialize in machine learning
I am currently struggling to make sure my learning path isn't going to be wasteful, and need direction on how i should learn to understand what is considered clean code, why things are done in certain manners, and how to turn the structure in my head into code
this is my third time posting this and being recommended to another chat room if i am simply looking in the wrong community please let me know
i mean this is the right chat room...but if someone wants to accept you as a mentee...thatll be a different story
either way, best of luck pal
@misty flint Just glad i found the right spot, thank you
How can I import all files(PDFs) from a folder in my drive to colab?
Hey a simple and quick question, don't think requires me to open the help channels, anyone has a clue how to remove 0-50 in the yaxis?
why do you need someone who writes production code? are you a beginner? why not other beginners?
I would like to plot the total confirmed cases of COVID-19 in all countries using folium's heatmap but I can't make it work because it somehow requires geojson.
I have all the countries' name and their corresponding total number of cases and longitude and latitude coordinates saved in my csv. I already loaded the dataframe with pandas as well
This is the dataframe I have been working with
frame1.axes.get_yaxis().set_visible(False)?
Hey all, anyone know how I can display the username under each class: https://bpa.st/FLEQ
removes % Population too, there has been an answer to set the y axis to an empty list, but i'm wondering why it didn't work previously
ax.set_yticklabels([])?
or ax.set_yticks([]) if you want to remove the ticks in general, not just labels
yeah but wondering if there is a different way than this?
Any ideas @lean ledge you sound like the pro here?
Can anyone explain why it is impossible to create an embedding vector of a negative number?
that makes little sense
unless you mean "why does nn.Embedding not take a negative index?" at which point
it's a lookup table from range(0, <your max idx here>) to trainable vectors
yeah, I get that. but why is the domain only for N?
suppose I want the domain to be [-2, max_voc). Why can't I do that?
because that's how nn.Embedding is written? it's not hard to fix
either override it with your own nn.Module, or you shift all your idx by 2
the shift all idx by 2 is easier
would you happen to have a rough Ideas how I can override that?
wrap nn.Embedding in a new nn.Module
Ah, leave it
i have no idea how to write code blocks in discord
I will just add a dummy dimension
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
import torch.nn as nn

class ShiftedEmbedding(nn.Module):
    def __init__(self, min_idx: int, max_idx: int, emb_dim: int):
        super().__init__()
        self.min_idx = min_idx
        # one row per index in [min_idx, max_idx)
        self.emb = nn.Embedding(max_idx - min_idx, emb_dim)

    def forward(self, x):
        # shift so that min_idx maps to row 0 of the lookup table
        return self.emb(x - self.min_idx)
or something like that
My problem was that I was feeding a 2D numpy array while the Trsf layer requires 3D. Figured it would be better to have an embedding
I would prefer not to use anything external coz It may break it. Better reshape
Thanx a lot anyways!
Anyone know much about NLP? I recently became interested in it and I'm looking for resources on how to get started with it
Also, if an input vector contains negative components, is it a good idea to make them positive? or does it lead to loss in features?
i was recommended the stanford CS224n course
i've started going through it and its pretty good
https://www.youtube.com/playlist?list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z theres the yt link
Thanks! I'll check it out
Hello everyone, I'm doing a little hobby project of analysing tweets to see if they have any relation with rising and falling stock prices, anyone done something similar?
I've scraped over 300000 tweets for an index fund and have stock index dataset, applied vader sentiment analysis on tweets
That give 4 values of positive, negative neutral and compound all ranging from -1 to 1 how do I correlate it to stock data ? Any ideas
on the left hand side theres a google drive icon. click it and itll ask for your permission to access your Google drive
No I'm asking how do I import multiple files , some command to import all files in a folder maybe
Any good resources to learn Neural Networks / DeepL?
just write a for loop
once you connect to your drive you can access it like any other filepath
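The for loop could look roughly like this. The folder name `my_pdfs` is hypothetical, and `/content/drive/MyDrive` is the usual mount point after running `from google.colab import drive; drive.mount("/content/drive")` in a Colab cell; adjust both to your own setup.

```py
from pathlib import Path


def list_pdfs(folder):
    """Return every .pdf file directly inside `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.pdf"))


# Assumed Colab layout after mounting Drive; "my_pdfs" is a made-up folder name.
drive_folder = Path("/content/drive/MyDrive/my_pdfs")

if drive_folder.is_dir():
    for pdf_path in list_pdfs(drive_folder):
        data = pdf_path.read_bytes()  # raw bytes; pass pdf_path to a PDF library to parse
        print(pdf_path.name, len(data))
else:
    print("folder not found, is Drive mounted?")
```

Use `rglob("*.pdf")` instead of `glob` if the PDFs are spread over subfolders.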
!cp??
Hey @grave frost I got tensorflow working. So it turns out, vms are bad for tensorflow and I used my actual machine and it works. I got the predictive text to work
why were you using VM?
VM doesn't have GPU passthrough (unless you only wished to run it on CPU)
whats your error?
typo karnel
I thought it could work since it uses the processor, but now that I think about it. The model would've needed more processing power so a VM is a really horrible option.
yeah, unless you are using hypervisor, VM offer pretty bad perf
you should have used colab
Yeh.
No real justification. If it's a limited dataset, you can just not have a test set.
Make it like 70/30
How big is your dataset and how complex is the relationship you're trying to learn?
i have a graph like the one in the img, but it can have 1 or 2 points that are off the curve, like the top one on the white line. I'm thinking about how to delete these points: take the median of the values, multiply it by a number, and delete any value higher than that. what do u guys think about this approach, and is there an easier way?
So you are trying to remove outliers?
ye
Feature engineering is an important area in the field of machine learning and data analysis. It helps in data cleaning process where data scientists and analysts spend most of their time on. Here are few examples of feature engineering techniques,
- Outlier detection and removal
- One hot encoding
- Log transform
- Dimensionality reduction ...
Second video in series.
Or third.
ty, gonna watch it
One thing to note is that you can't just remove the point, since you have multiple curves that all need the same number of points (to match on the time axis), you can do various things instead, such as replace outliers with the average of the previous and next value (assuming neither of those are outliers). @lavish tundra
Or you need to remove all the points at that point in time for all the curves (the entire row in the table is invalidated). Edit: As @lean ledge wrote, this is the much more common case (probably what you want).
It's basically linear interpolation between two points.
Assuming a line connects previous and next point.
If it's only a few points, calculate the standard deviation for the values and just take out anything greater than X deviations manually.
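A minimal sketch of the standard-deviation approach combined with dropping the whole row, so all curves keep matching time points. The readings and the `drop_outlier_rows` helper are made up for illustration. One thing to watch with only a handful of points: with n samples the largest possible z-score is (n - 1) / sqrt(n), about 2.85 for n = 10, so a threshold of 3 would never trigger here.

```py
import pandas as pd

# Made-up readings for two curves sharing one time axis; row 3 of "a" is a spike.
df = pd.DataFrame({
    "a": [10.0, 11.0, 10.5, 95.0, 10.2, 10.8, 11.1, 10.4, 10.9, 10.6],
    "b": [20.0, 20.5, 19.8, 20.1, 20.3, 19.9, 20.2, 20.4, 19.7, 20.0],
})


def drop_outlier_rows(frame, n_std=2.5):
    """Drop whole rows where any column is more than n_std deviations from its mean."""
    z = (frame - frame.mean()) / frame.std()
    keep = (z.abs() <= n_std).all(axis=1)
    return frame[keep]


cleaned = drop_outlier_rows(df)
print(cleaned)
```

Because the outlier itself inflates the mean and standard deviation, a median-based rule like the one proposed in the question can be more robust on small samples.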
Averaging them out or linearly interpolating is generally not what you want to do
Yea that's the third video in the series.
Second is with percentiles.
Averaging outliers still gives them weightage. When you have 3 trials, in one of which you spilled half the solution, you get rid of the trial not average it out with other ones
In other cases averaging/filtering out is more appropriate
Eg a voltage spike may be a meaningful signal
Yea, idk what the y-axis is so I presented both choices.
The latter case is a lot rarer and probably not the case given the graph they showed
does anyone know about nltk, I'm having trouble installing it
Anyone know how I can remedy this error?
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
The block of code that this occurs on is:
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression(solver="liblinear")  # initialize
logmodel.fit(x_train, y_train)
but specifically that last line
validation set is not always the best way to test the abilities of a model. Recommend you do AUC and CV
hi is anyone particularly good at sql
they're not mutually exclusive
!rule 5
5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious or inappropriate. Do not help with ongoing exams. Do not provide or request solutions for graded assignments, although general guidance is okay.
ope sorry
also, "AUC" is an ambiguous term
what do you think the error means?
Honestly I have no idea. But I'll see what I can do to clean up the data
I mean the error itself says it: Part of the dataset contains NaN (clean it up) Infinity (clean it up, good luck) or is too large (normalize it; MinMax or Standard scalers, perhaps)
It's your dataset that's the issue. You should check to make sure there are no null values, empty spaces, letters or anything else like that in your dataset.
And like the poster above said, if different columns in the dataset have wide value margins, it could also cause an issue, so you'd have to normalise or standardise your dataset.
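A rough sketch of that clean-up on a made-up feature table (the column names and values are invented, and dropping rows is just one possible policy): count the NaN and inf values first, then remove them, then standardise.

```py
import numpy as np
import pandas as pd

# Made-up feature table containing a NaN and an inf.
x = pd.DataFrame({
    "income": [52000.0, np.nan, 61000.0, 48000.0],
    "ratio": [0.4, 0.7, np.inf, 0.2],
})

# 1. Find out where the bad values are before deciding what to do with them.
print(x.isna().sum())     # NaN count per column
print(np.isinf(x).sum())  # +/-inf count per column

# 2. One simple policy: treat inf as missing, then drop incomplete rows.
clean = x.replace([np.inf, -np.inf], np.nan).dropna()

# 3. Standardise each column (zero mean, unit variance) so scales are comparable.
scaled = (clean - clean.mean()) / clean.std(ddof=0)
print(scaled)
```

If rows are dropped from the features, the same rows have to be dropped from the labels (for example `y.loc[clean.index]`), otherwise `fit` will complain about mismatched lengths.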
Not sure, but I think the val curve being a little lower than the training curve is a good sign that the model is learning, since the val data is unseen data.
If the training curve is lower than the Val curve, then the network is probably overfitting the data.
Hey, anyone know how to put a cnn neural network on my graphics card
Hello does anyone here know how to work with converting feature EEG data into visualizations (time domain, frequency domain)?
most deep learning frameworks will automatically use gpu
just because of how infeasible it is to run it on cpu
what framework are you using?
about performance, do u guys think it's better to access data from a file or from a request to a link?
file usually
i mean if its not much data then requesting a link will probably be fine
but usually file is more reliable and faster
gotcha ty
Pytorch
it should use gpu by default
you just have to move the stuff to gpu
so like move the model to gpu using .cuda() or .to(torch.device("cuda"))
and the data too
at what line do i need to add that
Do i put in in the network?
you put that when you're creating the model object
so like if your model class was called Net it would be model = Net().cuda()
Can you give an example
or you can do it afterwards too
Thanks
like
```py
model = Net()
model.cuda()
```
Thanks for the help
also this is kinda obvious but you would have to install cuda and have the cuda version of pytorch as well
How do i install that
Ok, thanks
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html thats the command for pytorch 1.8.0 cuda 11.1
thats the latest
https://developer.nvidia.com/cuda-11.1.1-download-archive theres the link to install cuda 11.1
https://developer.nvidia.com/rdp/cudnn-download and theres to install cudnn
What if i already installed pytorch and torchvision
if you installed the cpu version youd need to reinstall it with the cuda version
do pip show torch and show me the output
i'll lyk if you need to reinstall
Ok, give me a minute
could anyone help me with R studio stuff
It shows torch 1.7.1
if it doesnt have +cu that means its the cpu only version
so you need to reinstall
How do i reinstall the framework
Can you reply to the message, i can't open the link on my phone
this one
https://pytorch.org/get-started/locally/ just go to this link
and follow the instructions there
Thanks
https://developer.nvidia.com/cuda-11.1.1-download-archive and then this for cuda and https://developer.nvidia.com/rdp/cudnn-download for cudnn
hey, can anyone explain me what that graph means? I ran 3 layers (200, 100, 64) with 0.2 dropout, optimizer= adam, loss= mae. And use earlystop at 5 with min_delta = 0.001. My confusion is that what is happening when curve intersect at each other... Is the point of intersection best_fit or something else? Thanks in advance.
the bottom of the graph is epochs right
yes
yeah so basically whats happening is that originally the training loss is very high, obviously because of the initialized weights, then the val loss is much lower since validation is after the first epoch so it would have had time to train. The intersection is basically just when the model technically starts overfitting, since the training loss becomes lower than the validation loss
its best to just take the lowest validation loss
and use that
since the validation loss is what you're going for
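As a sketch of "take the lowest validation loss" together with the patience / min_delta early stopping mentioned in the question: the function below is a plain-Python approximation of how such a callback typically behaves, not the library's own code, and the loss values are made up.

```py
def best_epoch_with_early_stopping(val_losses, patience=5, min_delta=0.001):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    An epoch counts as an improvement only if it beats the best loss so far
    by more than min_delta; training stops after `patience` epochs without one.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Never ran out of patience: training went through every epoch.
    return best_epoch, len(val_losses) - 1


# Made-up validation curve: improves for a while, then drifts upward.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.49, 0.51, 0.53, 0.52, 0.56, 0.60, 0.65]
best, stopped = best_epoch_with_early_stopping(val_losses)
print(f"best epoch {best}, stopped at epoch {stopped}")
```

The weights worth keeping are the ones from the best epoch, not the ones from the epoch where training stopped.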
while there technically is overfitting in that graph it isn't really too bad, so you don't really have to worry about it
you only really need to worry about it when the validation loss starts going up instead of down
and diverging
a lot
like this, this was one of my previous loss graphs
high variance
thank you, but what does it actually mean when the validation curve behaves like an ECG curve, very shaky
this is before early stopping
You really gotta have earlier early stopping. Small fluctuations is expected while finding weights towards minimal point, but your model is clearly overfitting
this is with batchnorm, without early stopping.
with early stopping using batchnorm
But what is your metrics saying?
Loss is relative.
What's more important is your metrics indicating your model is solving the problem.
someone here understand about seaborn?
Hii
If someone's good at deep learning and neural networks please dm me
Plz plz plz plz plz plz plz
@iron basalt Gotta admit HTM theory has blown my mind. I am just gonna start with Sparse Data Representations but I still don't get - why sparse? it just increases the complexity of data while not adding much information. It seems counterintuitive that our brain works that way.
The way brains work is inherently sparse compared to computers. Computers have essentially random memory access and operations are fixed size. They're highly centralised in a particular way for processing. Brains have sparsity in their connection topology (each neuron is inherently only linked to a few others), and there's sparsity built into activation (in the form of the action potential threshold voltage, etc)
It doesn't really add any complexity
For a computer to recreate the equivalent dimensionality of problem, it does make it faster computationally because you can ignore the 0 operations.
In case of HTM, sparsity helps with reliability by having some inherent redundancy
so...if each unit of a sparse vector is fed to a hierarchical model, then even the 0s count as features and helps the model learn/converge?
Also, does the current level of research incorporate hierarchy?
Hey guys, I've been programming for a while now and the machine learning field is one that interests me a lot, however I'm a high school student with no advanced math knowledge.
I've just read the "Getting Into ML: High Schoolers Guide" on r/machinelearning, and wanted to know your opinions about this. Do you think it's not worth it for a person of my age to get into machine learning rn, and I should focus on other things? Sorry if this is not the right place to ask, but thanks in advance!
just got to make a legend now
anyone know how to add a custom legend?
do any newbies here want to be friends? Let's tryhard together
Newby here
yeah, im noob too
You never ever ever touch the test set. Never. You would create cross validation only within the train set. You fine-tune your model based on what's happening with the val sets within your train set. And once you've trained the model to your satisfaction, you score against the test set, record the result, and you're done. Even if it's bad. As a best practice, you would have to get new data if you plan to research again.
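The workflow described above could look roughly like this in scikit-learn. This is only a sketch: the synthetic data, the choice of `LogisticRegression`, and the candidate values of `C` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption: any tabular classification set would do).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Split once; the test part is set aside until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Tune using cross validation on the training part only.
candidates = (0.01, 0.1, 1.0)
cv_means = {
    c: cross_val_score(
        LogisticRegression(C=c, max_iter=1000), X_train, y_train, cv=5
    ).mean()
    for c in candidates
}
best_c = max(cv_means, key=cv_means.get)

# Fit the chosen model on all training data, then score the test set once.
final_model = LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_c, test_accuracy)
```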
Hello, I'm trying to drop rows in a pandas Dataframe that has a column location with different values A1, L2, BB1 and others.
Since there are the fewest rows containing data on L2, I'm trying to chop rows from the other different locations to have the height at L2
I'm starting with A2 using the following lines, this brings the height of A2 to be the same as L2:
test_df.drop( test_df[ test_df['site'] == 'A2' ].loc[:'2/9/2020 10:30'].index, inplace=True )
test_df.drop( test_df[ test_df['site'] == 'A2' ].loc['1/1/2021 8:30':].index, inplace=True )
However, after removing those rows from A2, I noticed the size of total rows containing BB1 also changes!
Even more, I'm unable to perform the same two lines on BB1 to change its size to the same height as L2.
Please let me know if my question is unclear. Thank you!
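One possible cause, assuming the DataFrame is indexed by timestamps that repeat across sites: `drop` removes rows by index label, so every row sharing a timestamp with a dropped A2 row goes too, whatever its site. A sketch of a mask-based alternative follows. The toy data, the keep window, and the helper name `trim_site` are all hypothetical.

```python
import pandas as pd

# Hypothetical data: the same three timestamps appear for every site.
idx = pd.to_datetime(
    ["2020-02-09 10:00", "2020-02-09 11:00", "2020-02-09 12:00"] * 3
)
test_df = pd.DataFrame(
    {"site": ["A2"] * 3 + ["BB1"] * 3 + ["L2"] * 3, "value": range(9)},
    index=idx,
)

# Window to keep (assumed here to be the span that L2 covers).
start = pd.Timestamp("2020-02-09 10:30")
end = pd.Timestamp("2020-02-09 11:30")

def trim_site(df, site, start, end):
    """Drop rows of `site` outside [start, end]; rows of other sites are untouched."""
    outside = (df.index < start) | (df.index > end)
    is_site = (df["site"] == site).to_numpy()
    return df[~(is_site & outside)]

after_a2 = trim_site(test_df, "A2", start, end)
trimmed = trim_site(after_a2, "BB1", start, end)
print(trimmed["site"].value_counts())
```

Because the mask combines the site condition with the time condition row by row, trimming A2 cannot change the BB1 count.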
age is just a number, and best time to do something is always right now
This is a fine place to ask. If you're a high school student in the US, I would encourage you to take the most advanced math courses that you feel you would succeed in. You will likely need to take a few calculus courses and linear algebra.
And you'd likely be taking those in college, so if you can take the first semester's worth of calculus in high school, that will buy you some time.
And if you already feel that you have a good foundation in general programming skills, I'd say it's a fine time to pursue AI knowledge. That's a matter of how much time is available to you outside of school and how you'd like to spend it.
I'd recommend fast.ai lectures, Stanford Deep learning lectures, and most of all the deeplearning.ai coursera course with Andrew Ng. You don't need to understand all the linear algebra and calculus math to begin working with tensorflow, pytorch or any other machine learning libraries.
Also let me expand on what I had said about taking advanced math classes now: showing competence in math signals to admissions counselors that you would succeed in their computer science courses, so your math grades will probably be a key consideration.
@solemn holly does that help at all?
I recommend reading through https://en.wikipedia.org/wiki/Compressed_sensing.
"At first glance, compressed sensing might seem to violate the sampling theorem, because compressed sensing depends on the sparsity of the signal in question and not its highest frequency. This is a misconception, because the sampling theorem guarantees perfect reconstruction given sufficient, not necessary, conditions. A sampling method fundamentally different from classical fixed-rate sampling cannot "violate" the sampling theorem. Sparse signals with high frequency components can be highly under-sampled using compressed sensing compared to classical fixed-rate sampling" - This is a common confusion (key part).
Another key part is: "Compressed sensing takes advantage of the redundancy in many interesting signals - they are not pure noise. In particular, many signals are sparse, that is, they contain many coefficients close to or equal to zero, when represented in some domain.[11] This is the same insight used in many forms of lossy compression."
Compressed sensing is pretty unrelated to HTM other than the idea of sparsity but a lot of things have the idea of sparsity
It's also not really very impressive to anyone who hasn't taken an electrical engineering degree/signal processing course and doesn't already know Nyquist Shannon theorem
HTM is related to sparse coding, and one of sparse coding's main applications is in compressed sensing. I believe that reading about it would help give a bigger picture. Especially since @grave frost was asking "why sparse?".
Was it supposed to be impressive?
@grave frost @serene scaffold @nimble saffron
thank you for your responses!
Unfortunately I live in Europe, and in the place where I am things don't really work like in the US. Here, we have a very specific and predefined school program and we can't really go much beyond that, if you know what I mean.
There is only one math course available to us, and I'm already in it.
I've heard many good things regarding Andrew Ng's Machine Learning Course, but as I randomly explored some of its videos I came across some very intimidating math, to say the least xd, that's why I was asking that question in the first place.
Maybe with a bit of googling I could get through it? Or would it be better to try something more programming related like the Sentdex video series on yt? The deeplearning.ai course seems really interesting as well.
I currently have a lot of spare time and I think it would be a great chance to start learning this.
"why sparse" has a lot of different answers tbh
Is it possible to build a good image classifier without convolutions? As it turns out one can just use the Vision Transformer! It was proposed in the paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In this video, you can learn how to implement it from scratch in PyTorch.
In this video I implement the Vision Transformer from scratch. It is very much a clone of the implementation provided in https://github.com/rwightman/pytorch-image-models. I focus solely on the architecture and inference and do not talk about training. I discuss all the relevant concepts that the Vision Transformer is using e.g. patch embedding,...
Sparsity in compressed sensing is in basis vectors for L0 optimisation, which is a different reason than sparsity for computational reasons in general numerical computing, which is different than the many-brains type of reasons behind redundancy+sparsity of data
Just spend your own time learning maths
If you're in the most mathematical stream for your country, you'll go over calc 1, maybe a bit of calc 2, at some point in the last couple years.
You just have to go over linear algebra and calc 3
Wow I had no idea how different European school is, how interesting. What I would recommend is not randomly jumping through Andrew Ng's courses, but to instead enroll in his deeplearning.ai coursera course and pursue the courses in the recommended order. I've completed Linear Algebra and multivariate Calculus, but Andrew's courses lead you through the material and expect that you don't understand the math.
But yeah, just learn all the maths in high school. Save machine learning for university
Andrew Ng's course expecting you to not understand is hardly a good thing
Imo you don't need to wait until you understand the maths. You can totally start learning about machine learning now. The maths can be learned on a case by case basis.
If you're learning a field of applied maths without learning the maths, you're missing out on stuff and not even realising you are
Maths has anywhere from subtle to important implications
please don't just do DS/ML with no math foundation
take the time and learn the math
Yes sparsity has many benefits and one can make use of all or only one of them (or some number of them in-between). Compressed sensing is one of my favorite reasons / applications.
I like compressed sensing also, I find finding sparse bases is very interesting
And it has some interesting applications (speeding up MRI scan times etc)
Do you like using sparse distributed representations? If so, I would like to hear about any applications or other cool thing you found out about them.
I don't know as much about them. My background is more robotics/signal processing/control/scientific computing type stuff
Weird niche areas of neuroscience and AI are just a side hobby
You might want to look into them. Distributed representations are cool, but when you combine them with sparsity, it becomes really powerful (not just for HTM's purposes).
@hollow sentinel the math is kinda easy anyway. you just need to know what matrixes and gradients are
Only if you're looking at super basic courses lol
The more maths you know, the better
I'm only talking about the basic math you need to understand what ML is
Understanding what ML is is sadly different from actually doing ML in practice
cortical.io for example uses SDRs for NLP: https://www.cortical.io/science/sparse-distributed-representations/
Sparse Distribution - The fundamental concept behind Cortical.io's Retina Engine consists in converting text into a Sparse Distributed Representation (SDR) so that it becomes numerically computable. Contact Cortical.io to learn more about our solutions.
so should I go ahead with this or leave it for later?
Following Andrew Ng's course?
yeah
@solemn holly as someone, who used to live in India with 0 resources available to europeans, I would emphasize it hardly matters
or machine learning in general
Yes, definitely go for it!
ok guys, tysm! I'll give you updates later
Let me get that coursera course link I was talking about
You don't always need math for learning ML. Learning things intuitively first and then exploring the math behind it later is a really great way (speaking from experience)
I agree
yh that's what I was thinking, thx
@solemn holly here ya go https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning#instructors
Offered by DeepLearning.AI. In the first course of the Deep Learning Specialization, you will study the foundational concept of neural ... Enroll for free.
You can even get a certificate of completion for your linkedin profile afterwards
You can audit it
The coursework is laid out over 4 weeks, but if you're really in a groove you can bust it out in only a couple days.
@iron basalt @lean ledge I just learned more about SDRs and I am already quite fascinated by their properties. Honestly, there is much more than meets the eye. In the YT course, the guy demonstrated compressibility and noise-withstanding capabilities that I never could have guessed. Another thing - I think (not sure) that the SDR also provides redundancy by reusing features in the hierarchy of the HTM and having different columns study different parts of it? am I getting that right?
If you are willing to share what robotics you have been working on I would be very interested, there is no robotics section in this python discord, but you can DM me.
I haven't started on the thousand brains theory so I only know vaguely how HTM works
woah that sounds awesome, what is that?
some resources: https://numenta.com/resources/videos-podcasts-and-more/
Want to learn about Numenta's brain theory and technology? Whether you're a neuroscientist looking for published papers or an AI enthusiast interested in how the brain works, here you'll find papers, posters, podcasts and videos about HTM cortical theory and its applications for machine intelligence.
could you explain what audit is?
Watch the cortical.io video on that link. It may help your understanding even more since you have been doing NLP stuff IIRC.
I will surely do that, but right now I am doing the YT course cuz it lays out the very foundations - and since the theory is so complex anyways I would like to do it properly with actual math
linear algebra, calculus, statistics, discrete maths
Check out this article: https://learner.coursera.help/hc/en-us/articles/209818613-Enrollment-options It talks about what you'll get from auditing the course: basically you'll be allowed to view all course lessons and lectures, but you won't be able to submit the quizzes and tests that count towards receiving a course certificate.
If you really want the course certificate, you can subscribe for a trial to coursera after you've completed the material, take the quizzes/tests and then receive the certificate. That's what I did.
there's plenty of math you have to know
Pls just learn the maths first before ML, I've worked with a lot of "maths second" people and it still gives me nightmares
don't just dismiss the math as "easy" it hurts beginners to see someone just say that when they're struggling
@lean ledge the maths required for DL is often not in the curriculum. college is the best place to learn that level of maths
oh god
knowing the math and then learning sklearn is much better than randomly using algorithms that they don't even know
if you tell any high-schooler to do the amount of math to build a transformer, it would likely take years
If you want any confirmation the maths is important, compare Andrew Ng's course to his actual course at Stanford
I would start w the practical statistics behind machine learning
that is a good book
cuz its already made by people with phds
This is a fruitless argument, entirely based on opinion, lets agree to disagree @lean ledge @grave frost @hollow sentinel
His coursera course is just for self learnt people to feel better. If his coursera approach was any good, he'd teach it the same way to his actual undergrads
No, math is definitely important.
there is no opinion to this you either know the math behind what you're doing or you don't at all
it's just facts
@iron basalt I'm getting ready for work but I'll mention you in off-topic when in free :)
I agree with this, learning the math first will make learning the ML that much faster. The slow start will pay off over time. If you are not in an actual course, get some books and you can self study, but it requires a lot of self motivation.
@lean ledge if you don't mind me asking, what use do you do with HTM+ robotics
I don't do htm with robotics but I do use other neuroscience methods with Robotics
I am interested in all neuroscience based methods, especially applied in robotics (right now grid cells are my primary target / focus).
neuroscience seems to be a pretty well-focused area. But are there other theories apart from HTM that try to relate AGI with neuroscience
I'll move this to #ot1-perplexing-regexing to explain what I'm doing rn :)
Thank you guys for your opinions, time and help. I might leave this for later.
The problem with the math discussion in data science is that more often than not people view it deterministically. Either you learn it or you don't.
In reality, I think you will always learn it either before or after but YOU WILL be forced to learn it.
The thing is, what does "learn" math mean for you guys here? I haven't done any decent math in over 8 years, so the most I remember about calculus is general concepts and intuition. If you ask me "what is gradient?" I would just say "Delta, how much Y changed along with X, which is also the solution to f'(x) for x in the range of change"
If you ask me, and from what i've seen so far, Statistics is the math I'd consider "most important" -if one can even say such a preposterous thing-
Hypothesis testing, sampling, distributions, etc. I've found much more IRL uses to statistics than to Calculus. That's just my humble opinion.
On another note: It's sad reviewing my notebooks from 8 years ago when I could watch anime and solve differential equations; then look at myself now and I can't remember wtf a diff equation is 
Learn maths meaning understanding the relationships and nuances of mathematical objects, equations, etc.
Gradient is obviously high school level stuff but within ML there's stuff with nuance like optimisation. Can someone without much understanding of maths understand stuff like why quasi Newton methods are everywhere outside ML but don't work for ML?
Does someone doing linear regression understand they're placing a Gaussian assumption on the noise?
Sure often it doesn't matter when you're doing basic linear regression, but there's so many situations in which the extra maths background helps you have intuition and make better decisions than those without that same level of knowledge
math is important - just that the time to learn it may not always be early. everything has a proper time to be done. IMO if you ask a high-schooler to learn that level of maths (it being easy to you, so you may be underestimating it) that's overkill
Also yeah, people saying "that level of maths is best learned in college" are sort of missing their own point. Machine learning is best learnt in college
There's no reason machine learning should be accessible to high schoolers
@lean ledge Then there's me, who 90% understand things but always forgets wtf things are called so i'm always lost
by Gaussian you mean normally distributed?
Indeed. From a statistics perspective, linear regression solution is the maximum likelihood estimate of a linear model with normally distributed noise.
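For anyone following along, the claim above can be written out in a few lines. This is the standard textbook derivation, assuming i.i.d. noise with a fixed variance (the symbols w, x_i, y_i and sigma are notation introduced here, not from the conversation):

```latex
y_i = w^\top x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{i.i.d.}

\log L(w) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2

\hat{w}_{\mathrm{MLE}} = \arg\max_w \log L(w) = \arg\min_w \sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2
```

The first term of the log-likelihood does not depend on w, so maximising the likelihood is the same as minimising the sum of squared residuals. A different noise assumption (for example Laplace) would lead to a different loss.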
true, but there is no harm in being familiar with it - the same way you learn basic Quantum mechanics in high school but not the real deal
- The core difference is physics is about the intuition and understanding which can be done without maths
- If you know how physics people work, high schoolers interested in physics learn maths subjects like calc series and dive into it properly during high school
I absolutely dived into differential equations and then simplified infinite potential well/finite potential well stuff when I was in high school
since we're talking math, I'd like to ask for some particular advice @lean ledge your opinion.
I have variable, let's call it TOTAL.
TOTAL is the sum of 4 different features.
question: should i delete total?
after all, as a feature, it holds no mathematical value and its nothing but a simple addition of its parent features
My intuition tells me Total would skew any solving as it is adding unnecessary noise
Not really a maths question. Keep it for readability I guess
Thing is, my DF is around 250k rows, so any dimensionality reduction is appreciated haha
so i'm trying to get rid of whatever is -obviously- useless and then i'll go more in-depth
If you need the total at some point, you're gonna add them anyway. Dimensionality doesn't factor into this.
Adding or not adding isn't improving dimensionality anyway
wouldnt it affect model training? because if we are looking for y = wx, and x is also a sum of "other" x, isn't a bit, ehm, odd?
well, your high school is quite different. states?
@exotic maple that type of feature engineering is pretty common
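On the y = wx worry: for a plain linear model, a column that is an exact sum of other columns is perfectly collinear with them, so it adds no new information and the individual weights are no longer uniquely determined. A small sketch with made-up data (the four "parent" features here are random numbers, purely an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
parts = rng.normal(size=(1000, 4))        # four hypothetical parent features
total = parts.sum(axis=1, keepdims=True)  # TOTAL is their exact sum
X = np.hstack([parts, total])             # five columns in all

rank_without = np.linalg.matrix_rank(parts)
rank_with_total = np.linalg.matrix_rank(X)
print(rank_without, rank_with_total)
```

Both ranks come out as 4: the fifth column does not raise the rank. Regularised linear models and tree-based models usually tolerate this, so whether to drop TOTAL is more about interpretability and convenience than correctness.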
no, the US has dismal education
Australia
@lean ledge Damn i'm envious of your Australian schools. Diff eq in high school?
pretty cool schools you have
That, or I have to praise you for being gifted lol
For reference, we covered part of diffeq stuff in maths but physics finite/infinite potential well stuff is covered without diffeq
I self learnt it with proper maths on my own
It's not specific to Aus. Lots of European countries do part of diffeq in high school also
Eg UK has diffeq stuff in A level maths/further maths
idts
yeah, UK might be, not the whole EU
Some US high schools do too, but US schools have high variance.
my brother studied in Germany and he didn't learn jack shit about diff eq lol
ye, the country of your residence impacts much about your life
cries in 3rd world country
(and future too)
but still, stuff can be done - just in the same degree offered in other countries
admittedly I can't blame everything on country, but still...
wish i had a bit more math during high school :v
learning stuff through the internet is pretty inefficient - contrary to popular belief
Oh I know. I've been struggling with some stuff
I'm certain i would have easily learned it with a teacher
Might depend on stream? Was he in gymnasium?
@lean ledge I didn't see those things until college lol
Yeah my curriculum is definitely a step above most
@lean ledge beats me. German education is a bit weird. He has a masters in CS now thou
But it's also an international curriculum
..?
This is part of International Baccalaureate
IB?
Yep
I only learned about that option when I was already in 1st year of college
fk me
India had IB - but parents usually will not take it on the advice of (f-ing) teachers
First year for me repeated a lot of that yeah. Main maths I learnt in first year was linear algebra and multivariate stuff
oh we just got a name change? its about time
Doesn't really make sense compared to CBSE or something unless you're moving countries

So makes sense
CBSE sucks af
Yeah indian education is dismal. Glad I left there like more than 10 years ago now
did you give the board?
lucky
Sounds really bullshit but yeah. I mean, i'm also from a 3rd world country but i've worked with tons of Indians from all levels and we've always struggled
I'm still in uni, 10 years ago I was pretty young
@lean ledge how old are you?
a piece of advice from an old relic here (29): before 25 your brain is super fresh and fast, train it hard on math. after that it gets a bit difficult
29's young IMO
I'm joking lol. I dont feel old, but i've noticed a HUGE difference in learning speed and processing
I am 20
21 in a month or so
Month and a half
I taught myself Python and lots of stuff in like 2 months, even made a few shit programs to test myself, so I guess I can still "learn"
but some abstractions or some fast things I could do before...I can't anymore
sometimes i wonder if it's just stress occupying working memory in my brain or if im getting dumber lol
hmm....
Growing up sounds scary. That's why I try to speedrun through the learning part of life now :p
Just gotta make sure I don't delay my PhD too much
@lean ledge Eh, i make it sound bad but not everything is
you get bad at some things , you get better at others
doesn't bachelors+masters take years?
Oh soz should have clarified, I mean starting my PhD
I'm just finishing up my master's by end of this year
And plan to work for a couple years before jumping into PhD
It's probably a stress thing, my grandpa learned programming at like 70. What's important is that you take the time to build a solid foundation (like knowing a bunch of math), as it will make learning other things much faster (later on you may not have enough time due to other adult things, unless you become a study hermit type (no social life, etc)).
@iron basalt Yeah i've been thinking about it and i think its stress too. I'm finding it difficult to focus lately, but when I do i usually breeze through most things
Just also make sure you don't do drugs and you should be fine (don't damage your brain significantly).
is that supposed to be serious advice?
It is
Former smoker vouches for it
Yeah, people get stressed and do drugs, it's bad.
Ha, i've been staring at my DF for 2 hours thinking wtf to do with missing values
delete, interpolate from averages, bfill (for dates) ugh... there's also no reference info
Also study groups and clubs (or other gatherings) are very important for some. Some people need other people's shared excitement to stay focused/motivated for a longer period of time.
I estimate rows with missing data represent 2% of all my valid DF
I feel personally attacked. I learned this too late. I need people with similar interests to stay motivated on something
which is why im here :p
but its still not the same as IRL acquaintances
Conferences are a great tool in this aspect if you are doing research.
mmm... as expected, simply deleting all my NaN rows gets rid of 2% of data. i wonder if its worth trying to go ahead like that
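The options being weighed here (delete, interpolate, backfill) each map to one pandas call. A tiny sketch on made-up data follows. The column names `date` and `amount` are hypothetical, and which option is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing date and one missing amount.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-01", None, "2021-01-03", "2021-01-04"]),
        "amount": [10.0, np.nan, 30.0, 40.0],
    }
)

# Option 1: drop every row that has any missing value.
dropped = df.dropna()
share_lost = 1 - len(dropped) / len(df)

# Option 2: fill numeric gaps by linear interpolation between neighbours.
interpolated = df["amount"].interpolate()

# Option 3: fill a gap with the next valid value (backfill), e.g. for dates.
backfilled = df["date"].bfill()

print(share_lost, interpolated.tolist(), backfilled.tolist())
```

Checking `share_lost` first is a cheap way to decide whether dropping is acceptable. At around 2% it often is, provided the missing rows are not concentrated in one group.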
I'm tired of thinking wtf to do with some of those columns
its not. but better than nothing 
me
i start to lose motivation if i dont see others doing stuff
hey guys, im doing a decision tree with 418 data points. i have cleaned the data and fit the model, my classification report is only showing support of 105. i didnt specify a size in my train_test_split. what am i doing wrong? or in better words how do i test the entire csv. here is the code:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
X = test.drop('Survived',axis=1)
y = test['Survived']
X_train,X_test,y_train,y_test = train_test_split(X,y)
dtree = DecisionTreeClassifier()
dtree.fit(X_train,y_train)
prediction = dtree.predict(X_test)
print(confusion_matrix(y_test,prediction))
print("\n")
print(classification_report(y_test,prediction))
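A note on the 105: when no size is given, `train_test_split` holds out 25% of the rows by default, rounded up, so 418 rows give a test split of 105. The report is therefore behaving as designed. One way to get an evaluation that covers every row is cross-validated prediction. The sketch below uses synthetic data in place of the CSV, which is an assumption, as is the choice of 5 folds.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same number of rows as in the question.
X, y = make_classification(n_samples=418, n_features=6, random_state=0)

# Default split: 25% of 418 rows, rounded up, are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(len(y_test))  # 105

# Cross-validated predictions: each row is predicted exactly once,
# by a tree that did not see that row during fitting.
cv_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(classification_report(y, cv_pred))  # total support is now 418
```

Predicting on the same rows the tree was fitted on would also show support 418, but the scores would be misleadingly high.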
i made a software
that detects political ideology sentiment from religious books
where can i share my results
and how can i know if the code is doing the right stuff
Here
Liberalism : 8.5%
Socialism : 7.6%
Fascism : 7.5%
Syndicalism : 7.4%
Democracy : 7.3%
Communism : 7.2%
Libertarianism : 6.9%
Nationalism : 6.3%
Populism : 6.3%
Environmentalism : 5.5%
Conservatism : 5.4%
Communitarianism : 5.4%
Corporatism : 5.3%
Anarchism : 5.0%
Progressivism : 4.4%
Authoritarianism : 4.0%
after some nlp on bible
How did you arrive at these numbers?
Does this mean that 5.5% of the Bible supports authoritarianism?
I haven't read the bible, but does it have communism?
lemme explain
You can find Bible passages to prop up a lot of different ideas.
Liberalism : 8.6%
Socialism : 7.6%
Fascism : 7.5%
Syndicalism : 7.2%
Democracy : 7.2%
Communism : 7.1%
Libertarianism : 6.8%
Populism : 6.4%
Nationalism : 6.1%
Corporatism : 5.4%
Conservatism : 5.4%
Communitarianism : 5.4%
Environmentalism : 5.3%
Anarchism : 5.1%
Progressivism : 4.5%
Authoritarianism : 4.3%
quran sentiment
It's not off until we know specifically what it's designed to do
yeah, true
But you're right that I don't think these numbers are currently insightful
but the OP did say sentiment analysis
How would you classify ideological sentiment in the first place?
assuming everyone knows how senti analysis works
Doesn't that heavily depend on context?
Let's hear out their explanation
so i took the entire quran/bible in plain text
then i took the wikipedia articles on all these idealogies, democracy, communism etc
in plain txt
then tried to see their match
So you did textual similarity?
euclidean distance with vectors?
ye, NN's would be much better
the n i used was 3 4 5 6
but all produce similar results
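The exact method isn't shown in the chat, so the following is only a guess at its general shape: character n-gram counts compared with cosine similarity, then rescaled so the scores sum to 100%. All function names and the toy texts are hypothetical.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n):
    """Count overlapping character n-grams after lowercasing and squeezing spaces."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ideology_shares(target, references, n=3):
    """Similarity of `target` to each reference text, rescaled to sum to 1."""
    target_vec = char_ngrams(target, n)
    scores = {
        name: cosine(target_vec, char_ngrams(text, n))
        for name, text in references.items()
    }
    total = sum(scores.values())
    return {name: (s / total if total else 0.0) for name, s in scores.items()}

# Toy inputs, for illustration only.
target = "love thy neighbour and share what you have"
references = {
    "communism": "workers share the means of production",
    "liberalism": "individual liberty and free markets for all",
}
shares = ideology_shares(target, references)
print(shares)
```

If the real pipeline looks like this, near-uniform percentages are expected: any two long English texts share most common n-grams, and dividing by the total hides how small the differences between raw scores are.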
i wanna do nn
but donno how
Wouldn't you be picking up/comparing words like articles too?
I might need to head out but I'll try to get caught up with your explanation
wasn't there some command for DS resources?
i kinda sanitised the data
from wikipedia
That probably wouldn't matter if you're normalizing for how words are used throughout the corpus
i did it manually so i need an api that will only give me the plain txt article without bs
i did the normalisation shit
punctuation, stop words?
yeah
lemmatization and stemming?
yes all those
good, the bases are covered then
someone know why this code is giving me a error:
go_dias = [0, 24, 48, 72, 96, 120, 144, 168]
go.loc[i]["timestamp"] for i in enumerate(go_dias)
TypeError: object of type 'generator' has no len()
list(generator)
whats a generator?
you prob need to debug your model. OR I recommend you go the easy way and fine-tune some pre-trained model (Like BERT) for your task
A certain type of iterator. Why don't you open a help channel so that there's space carved out for your question?
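To make the generator answer concrete: an expression like `x for x in items` without square brackets produces a generator, which yields values lazily and has no length. Separately, `enumerate` yields (position, value) pairs, which is probably not what `.loc` should receive here. A sketch, assuming `go` is a DataFrame with an integer index and a "timestamp" column (the stand-in data below is made up):

```python
import pandas as pd

# Hypothetical stand-in for `go`: 200 hourly rows with a "timestamp" column.
go = pd.DataFrame(
    {
        "timestamp": pd.date_range(
            "2021-01-01", periods=200, freq=pd.Timedelta(hours=1)
        )
    }
)
go_dias = [0, 24, 48, 72, 96, 120, 144, 168]

# Square brackets build a real list, which has a len().
# Iterating go_dias directly passes each row label to .loc as-is.
timestamps = [go.loc[i, "timestamp"] for i in go_dias]

# Equivalent one-call form: select all the labels at once.
timestamps_vec = go.loc[go_dias, "timestamp"].tolist()

print(len(timestamps), timestamps[1])
```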
mostly liberal?
am i doing this shit right
probably not
tf is a bert
did you adjust class weights?
If your model is saying that every political ideology is represented similarly in every text, you probably haven't gotten it right quite yet
no no
its a model trained with a supervised technique
the ideologies in the quran are different
A way of representing semantics as a vector.
on a pretty big corpus of data (maybe even quran and bible)
I was explaining what Bert is.
yeah, the results are too same to say that your model is correct
basically my conclusion from the data is that the bible and quran convey the same ideologies and are primarily liberal
BERT is a model. internally it does represent as tensors but it doesn't return them
If you limit the scope to the old testament, I would expect your model to predict mostly authoritarianism
bible too i guess
.....?
Would you even be able to pull sentiment out of a Wikipedia article?
i took what i got on the internet
probably not
thats the only data i have from say communism
Right
the style of wiki and bible is much different
if you have better sample txt then send me
Keywords maybe but I can't imagine you'll be able to pull the actual ideology
what would be a better way
where can i get txt that contains only communist ideas
or socialist ideas
you have to hand label a part of it, or download some data from the net
or democratic ideas etc
no, no articles
how
The communist manifesto
only text whose vocabulary is similar to your target corpus
Communist manifesto/ other biased writing but your success will heavily depend on how you train it
tf is that
The book that describes communism
I can't think of a seminal work about the virtues of democracy
Find the root of the ideologies and find texts from those roots
maybe some lecture by democrats
I'm sure there's an Ancient Greek text out there somewhere
but the style will be quite different
but what do you guys think the bible/quran primarily are
Not capitalism
But even with all that, @lapis sequoia your model will perform badly even then
Both of those books are about following rules laid down by a supreme being, so they should probably be authoritarian.
Here, so what is the difference between sql and excel? I want to learn it but I need to know what it absolutely does
authoritarian seems to be the least
Also language is going to differ from text to text which skews that analysis
the supreme being seems to be cool
Sounds like a good question for #databases.
sql is some advanced shit, excel is like basic spreadshit
not only that, the context and style is so different, and they are using such a naive method
sql is commandline
i think so
I did not mean sql I mean excel
thats how I learnt it
So what is the difference between excel and pandas
Well also I can make the same "model" by generating a bunch of random numbers and arguing why they're right
Excel is its own software?
Feel free to continue the thread about SQL vs Excel in #databases. I want @lapis sequoia to get the feedback they asked for about their project
yeah
I mean for the functionality
its too naive to be inefficient
Please don't discuss websites that aren't appropriate for this community
well, he was discussing ai
Doesn't change what I said
@lapis sequoia recommend you pretrain some model
Try training a classifier to distinguish between capitalist and communist content
and do you think my results make any sense
not at all
I don't think the results you showed us are informative
Let's not be too harsh. It looks like they've made a lot of progress
I think it's a great project. How do you think it could be used?
who says anything about using it
Do excel and pandas have the same functionality
yea
Thanks
pd is better once you know python
you can do the same shit with both, one is gui, another is code
I know python but I use it mainly for web development and discord bot development
Excel has a graphical user interface. Pandas gives you a lot more flexibility with how you manipulate the data and what you do with it.
learn pd, it doesnt have a learning curve
Some people make complicated data pipelines in excel that are hard to understand
excel is pretty straightforward IMO
so is pd
well, excel is GUI
But the functionality is the same they do the exact same thing but differently
hmm...really?
bad programming
oh yes
Hello, i have a small question regarding Cartopy/Axis3D usage
I am guessing the best way to learn pandas is with the docs?
import itertools
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import LineCollection
import cartopy.feature
from cartopy.mpl.patch import geos_to_path
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
fig = plt.figure()
ax = Axes3D(fig)
ax.set_zlim([-110, 0])
ax.set_xlim([-72.4044, -67.3506])
ax.set_ylim([-31.279, -24.1272])
borde = [-72.4044, -67.3506, -31.279, -24.1272]
target_projection = ccrs.PlateCarree()
feature = cartopy.feature.NaturalEarthFeature("physical", "coastline", "50m")
geoms = feature.intersecting_geometries(borde)
lc = LineCollection(geoms, color="black")
ax.add_collection3d(lc)
ax.set_extent = (borde, target_projection)
ax.scatter(eve_long, eve_lati, eve_profu, s=20, c='b')
plt.xlabel("Longitud")
plt.ylabel("Latitud")
ax.set_zlabel("Profundidad")
plt.title(" TALTAL-SERENA 2019 : eventos 4> ")
plt.show()
In this code
Pandas and Excel are both about working with tabular data, but Excel is a specific environment for working with them, whereas pandas is programmatic
i can't leave the coastline feature within the box
Not really. Just try working on projects that use pandas and look things up as you go
Do you know how to use numpy?
the pandas docs are pretty good, but theres some good YT tutorials. even for projects too
Pandas is a lot like numpy
Tbh I barely know numpy, I think I should get started there
Thanks
Thanks
I don't think it matters which you learn first. Pandas and nunpy share a certain approach to iteration
Numpy*
I'm on mobile BTW
ye
pandas numpy matplotlib
But so many people tell me to learn numpy, I need to get started with that, then I can get up one more level by learning pandas, and then after maybe learn a few different databases basically something that is not JSON
hmm theyre not as sequential as you would think. you can learn both at once.
numpy and pandas are like pb&j

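they're closely related but not made by the same dude: NumPy was created by Travis Oliphant and pandas by Wes McKinney, and pandas is built on top of NumPy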
Just one problem, my computer broke, so I am stuck on an iPad until I have a couple more dollars to get a proper PC
Numpy's epic and super useful for a tonnn of stuff

Ik, but rn I am trying to learn MySql, I am so confused
Ah gl with that, I've had 0 motivation to go anywhere near databases
Numpy's a lot friendlier
Not really but it's more worth it imo
Thanks
Sorry removed the msg accidentally
Ya if you're doing data analysis then some sort of sql will go a long way
Yep
I wanna know which is the easiest
has anyone made an AI in python that makes AIs?
that would essentially be an AI with self-reproducing capability :>
That probably depends on your definition of AI. I assume you're mostly referring to some sort of AI Singularity which is a no. But if you're talking about an AI that makes some custom mini AIs for specific tasks, certainly
hey im learning that too!
SQL buddies


sql is a good skill to have
and it doesnt break after a million or so rows
unlike excel

Eyy!
Ma man I decided to use this tutorial https://youtu.be/p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes. We will cover what SQL is, why SQL is important, what SQL is used for, the syntax of SQL, and multiple examples of SQL. SQL is the standard language for interacting with and manipulating data in a relational database system, and is one of the most important con...
noice
handled, thanks for the ping
What version of Java do you suggest for latest Pyspark? It works with both 8 and 11, and 11 starts generating warnings, since apparently Pyspark uses some outdated features.

is this... a new channel or am I seeing this for the first time?
it used to be called plain data science
o, they renamed it
they should add ML to the name
make it DS-AI-ML
all the acronyms

the alphabet soup channel

anyway, debating on learning more math before learning more ML
its such a slog to get through
might just look at chain rule and partial derivatives for now

I have experience in frontend dev and am now learning backend. Do you think it's gonna be useful when I switch to a Data Engineering career path?
It'll help but not that much...
Data Engineering is a totally different thing.
The basics will be very similar but you can't become a data engineer just by learning backend.
I see do you think this roadmap is enough https://coursebank.ph/sparta/pathway/data-engineer
Which is better for image classification PyTorch or Tensorflow?
its been done already by google
though the task would be similar - its just that the child AI would perform much better than the parent one
I think there was a link
They are just frameworks for deep learning. Image classification is not dependent on the framework.
You can use any models in both of them..
For starters it looks good enough.
They are focusing on a lot of stuff but yeah, if you do them you should not have any problem starting as a data engineer.
Obviously you will have to learn new skills/tools depending on the company you work for, so one course can't cover everything.
Thanks for clarification bro
So I'm kind of confusing myself and just want some advice on something
Say ive got data on when a machine breaks: the inputs are things like the condition of the object the machine is working with and the output is "did machine break yes or no?"
Im using scikit-learn, specifically the multilayer perceptron
But im unsure which one to use, am i meant to be using the regression model or the classifier model?
The model should be able to predict when the machine breaks essentially
My gut is saying it's a classifier "did machine break? Yes class and no class". Could i be wrong on this?
Regression is for predicting some continuous quantity, classification is predicting some discrete quantity, like a class. Your task is very much the latter - a binary (only two classes) classification task.
If you turned it into trying to find a probability of it breaking, wouldnt that be like a continuous quantity and therefore become a regression problem?
Thank you for your answer btw!
The thing is, many classifiers are secretly just regressors finding the probability of each class, then predicting the class they predict the highest probability of
So if you just want a classifier that also tells you how sure it is about its prediction, this I believe can be achieved by asking for that data from a classifier model.
oooohhhh ok that does make sense actually!
One more thing, what score is acceptable?
I get around 0.6 as my score, which i assume means it's correctly predicting the test data 60% of the time
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
Looks like it's predict_proba. Essentially, when you ask a classifier to .predict, it simply gets the .predict_proba, then returns 1 for points whose probability is >0.5 and 0 otherwise.
score "Return the mean accuracy on the given test data and labels.", so yeah, it means for 60% of the points the right class is predicted.
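To make the link between predict_proba, predict and score concrete, here is a minimal pure-Python sketch of the binary case described above. The probabilities and labels are invented toy values, and `predict_from_proba` and `mean_accuracy` are hypothetical helpers that mimic the described behaviour, not scikit-learn's actual implementation.

```python
# Toy illustration: threshold class-1 probabilities, then compute mean accuracy.
# All numbers are made up for the example.

def predict_from_proba(proba_class1, threshold=0.5):
    """Return 1 where the class-1 probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in proba_class1]

def mean_accuracy(predictions, true_labels):
    """Fraction of points whose predicted class equals the true class."""
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)

proba_break = [0.9, 0.2, 0.7, 0.4, 0.6]   # hypothetical P(break) per point
true_labels = [1, 0, 0, 0, 1]             # hypothetical ground truth

predictions = predict_from_proba(proba_break)
accuracy = mean_accuracy(predictions, true_labels)
print(predictions)
print(accuracy)
```

Four of the five toy points are classified correctly, so the mean accuracy here is 0.8. The probabilities themselves carry the "how sure is it" information that the hard 0/1 predictions throw away.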
what would it say about the data if it wasn't consistently 60% though?
Like it jumps between 40% and 70% each time i run the code. The data is arranged differently each time
If i got a score of 0.1 that would be fine, im not worried about it being low, i just need it to be consistent
As for how good it is... well, suppose your data is 50% class A and 50% class B. Then a completely dumb model that just flips a coin each time would get 50% accuracy just by random chance.
Another, even worse example: consider your data is 90% class A and 10% class B. Then a dumb model may simply predict A always and get 90% accuracy.
So it really depends on how good the data is and what accuracy are you going for.
Hmm, are you using a validation subset (like, separating the validation points from the rest of the dataset to check the accuracy on points the model has never been trained on)?
Yeah so say i have 200 data points, I split 60% into a "training" data set and 40% into a "testing" data set
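A rough back-of-the-envelope check on how much the score can move just because different points land in the test set each run. This is a sketch under a simplifying assumption: each test prediction is treated as an independent coin flip with a fixed "true" accuracy of 0.6, which ignores the extra variation that comes from retraining on a different training set.

```python
import math

n_total = 200            # dataset size mentioned above
test_fraction = 0.4      # 40% held out for testing
assumed_accuracy = 0.6   # assumption: the model's underlying accuracy

n_test = round(n_total * test_fraction)
# Standard error of a proportion estimated from n_test independent trials.
standard_error = math.sqrt(assumed_accuracy * (1 - assumed_accuracy) / n_test)
low = assumed_accuracy - 2 * standard_error
high = assumed_accuracy + 2 * standard_error
print(n_test, round(standard_error, 3), round(low, 2), round(high, 2))
```

With only 80 test points, a spread of roughly 0.49 to 0.71 is already plausible from the split alone, so scores bouncing between runs is not surprising. Fixing the `random_state` of the split and the model, or averaging over several splits with something like scikit-learn's `cross_val_score`, should give a steadier number.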
So if 60% of my data gave breaks, it would just predict a break to happen 60% of the time? Is that what youre saying?
I'm just talking about the fact that the more imbalanced the data is, the higher the possible accuracy for a "dumb" model that does no actual computation can be. So depending on how much of your data is class A, 60% accuracy may mean the perceptron has actually learned something (if it's 50/50, that means it's beating a coin by a solid 10%), or it may be doing worse than it could be by simply predicting the more common class each time.
That's the problem with accuracy as a metric, basically. If you, say, identify cancer patients among healthy ones, you may have a percentage of sick ones as low as 0.1%. That means a model that just labels everyone as healthy will achieve an "enormous" 99.9% accuracy.
So for heavily unbalanced classes, accuracy is not a good metric.
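The "dumb model" accuracies from the three examples above can be checked with a few lines of plain Python. `majority_baseline_accuracy` is a hypothetical helper and the label lists are synthetic, built only to match the class proportions mentioned in the messages.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(labels)

# Synthetic label sets matching the proportions discussed above.
balanced = ["A"] * 50 + ["B"] * 50
skewed = ["A"] * 90 + ["B"] * 10
rare_disease = ["healthy"] * 999 + ["sick"] * 1

for name, labels in [("50/50", balanced), ("90/10", skewed), ("99.9/0.1", rare_disease)]:
    print(name, majority_baseline_accuracy(labels))
```

Any trained model's accuracy is only meaningful relative to this baseline for its own dataset.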
that makes so much sense, I should probably run a check to see if this is the case in my data
Yeah, that's a good idea.
So presumably the closer it is to being a coin flip on whether the machine breaks or not within my data the more likely the model is actually predicting something useful?
More like the closer your class composition is to 50/50, the lower the minimal accuracy levels that need actual work are.
If your accuracy is 60%, that needs (a bit of) actual work if your dataset is 60/40 or more balanced than that.
Also, a good idea to detect this kind of cheating is to check the accuracy for each class: as in, the 4 probabilities:
P(X is classified as A when really being A)
P(X is classified as A when really being B)
P(X is classified as B when really being A)
P(X is classified as B when really being B)
I just checked, this is literally the case. The number of breaks in the data is around 60%
or just the two P(X is classified as A when really being A) and P(X is classified as B when really being B) as the other two are 1-these.
So to do that, i should use the .predict(X) function and count how many times the output is "break" when the true result is "break" and count how many times the output is "break" when the true result is "no break"?
Yeah, but I feel like sklearn should have a function for that, hold on
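hmm, not as one of the single-number scores, since most sklearn metric functions return one output rather than an output per class (sklearn.metrics.confusion_matrix and recall_score with average=None do give per-class results, though)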
so do something like
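import numpy as np
# predict the validation dataset
validation_predictions = model.predict(validation_data)
# validation_true is the array of true labels for validation_data (0 = class A, 1 = class B)
# boolean masks selecting the points whose true class is A or B:
A_inds = validation_true == 0
B_inds = validation_true == 1
# for each true class, mark which of its points were predicted correctly:
A_correct = validation_predictions[A_inds] == validation_true[A_inds]
B_correct = validation_predictions[B_inds] == validation_true[B_inds]
# per-class accuracy = fraction of correct predictions within each true class
A_acc = np.mean(A_correct)
B_acc = np.mean(B_correct)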
Thank you! Let me try this out. What result should I be hoping for?
A bad result would be a very low accuracy on one of the classes - that's what the model cheating like I described looks like.
Oh right because it's disproportionately guessing between the two
yeah
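Here is a small self-contained sketch of what that "cheating" pattern looks like in the per-class numbers. The labels are a hypothetical 60/40 validation set (1 = break, 0 = no break), the model is a stand-in that predicts a break every time, and `per_class_accuracy` is an illustrative helper, not a library function.

```python
def per_class_accuracy(true_labels, predictions):
    """Map each true class to the fraction of its points predicted correctly."""
    totals, correct = {}, {}
    for t, p in zip(true_labels, predictions):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / totals[c] for c in totals}

# Hypothetical validation set: 6 breaks (1) and 4 non-breaks (0), i.e. 60/40.
true_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
always_break = [1] * 10   # "cheating" model: predicts a break every time

overall = sum(t == p for t, p in zip(true_labels, always_break)) / len(true_labels)
rates = per_class_accuracy(true_labels, always_break)
print(overall)
print(rates)
```

The overall accuracy is 0.6, which looks like the reported score, but the per-class view shows 1.0 on breaks and 0.0 on non-breaks. scikit-learn's `confusion_matrix` and `recall_score(..., average=None)` in `sklearn.metrics` report the same kind of per-class breakdown.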
@plush portal ask @hard frost ๐
well, if he doesn't mind replying, he is free to do so
would you mind if i quickly DM you about this?
Any plotly pros here?
Maybe best to just add the content with dash I guess, no need to filter the histogram
Data Science is an interdisciplinary field: a mix of statistics, computer science, and one or more other disciplines (e.g. biology, mathematics). Data Science can be an umbrella term but one can guess that it involves data. A data scientist's job is to fetch, clean, manipulate, and analyze large amounts of data so that the result of an analysis ends up in a form a human can read and understand. Anyway, thats the gist of it and u can just look it up on Google or some search engine
Bravo! thankies
@tidal bough So i gave it a whirl and I got exactly the same as my score which means i interpreted your code wrong lol What did you mean by the validation_true step when you separate the data-set into two classes?
nice status. lol

your point is?
how is this relevant here?
better do that on #ot0-psvm's-eternal-disapproval or other offtopic channels
Oh, sorry, I really hadn't seen that, I'll send it there now
Sorry again
does this channel topic include neural networks?
it does i guess since neural networks is part of machine learning
Technically yes, though I kind of feel like we need a numpy/pandas channel and then the rest of DS channel
#community-meta suggest that there if u think it is a good idea
well i have no idea what the transform argument is in the mnist function (im using pytorch), i have been following a tutorial but its kinda broken
also sadly i am new to data science because i am learning bioinformatics so i still cant help u with that
probably an outdated one?
check the versions
and read documentations just in case
wait its not outdated, i mistyped something, but i still have no idea what the transform argument is
hmmm i hope someone can help u with that. i usually ask in discourse or on another platform
can you send a link to the docs?
i seriously considered bioinformatics before deciding on a grad program that was more data-science leaning
(artificial intelligence)
i was just afraid it was too narrow of a scope and wouldnt have as many job prospects
any data science stuff is actually good,
u can translate what youve learned into other fields thats why data science is so hot right now for the next few decades be it in biology (bioinformatics, computational biology) or in other fields
yeah i actually have a medical and public health background, so ill see about leveraging that domain expertise some time after schooling
but also maybe not bc that industry is heavily regulated
I am avoiding data science for now bc I want to do data strucs/algos
so its harder to do stuff in the field
ooooh we have the same goals then

similar rather
yeah i might try a couple industries then circle back to the medical field
do machine learning for petroleum companies
they make hella money
it is actually a good choice to learn that first
its also extremely volatile
haha true but getting familiar with it is a good idea tho
exactly then I have the foundational knowledge for the algorithms in sklearn and I'll know the data structures I can use instead of just lists and dictionaries
I was joking
strange sense of humor
oil and gas industry: oh price of gas has gone down, since youre not a senior position, you get laid off

data science position at a bank
u should also know how to use grakn too guys
:3
plan to learn it next
after some stuffs
https://grakn.ai/ good stuff

planning to learn BioGrakn
I am stuck in linked list hell
R is not that bad to learn
its not
just julia with me
i used it to do my stats homework
according to the CUNA mutual group people it's easy to make the switch

i know R since i played with it in highschool
sigh if only Textron let me work remotely
julia is good too :((


