#data-science-and-ml
from what i see, without feature engineering this data can yield at most .76 accuracy
I guess I'm more asking for techniques that I might not know which can help in this situation
obv good feature engineering probably helps, but are there other methods that I should know about? that kinda stuff
well, if i were you i would begin with feature selection
then do either factor analysis or pca
to see if you can get more knowledge out of the data
to be honest this 3/4 accuracy is related to there being too many categorical variables
if there was more numerics it would have been better
Hello guys. I have a question. How can I know how many hidden layers a CNN has? I want to build one with 5 hidden layers but I am getting stuck
if u are building a neural net with keras it will look somewhat like this:
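from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# X_train is assumed to be your 2D feature matrix
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))  # hidden layer 1
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))  # hidden layer 2
model.add(Dropout(0.5))
model.add(Dense(16, activation='relu'))  # hidden layer 3
model.add(Dense(3, activation='softmax'))  # output layer, 3 classes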
every line that begins with model.add(Dense is a layer, the last one being the output.
pytorch has a similar syntax
import torch
import torch.nn as nn
# Define the neural network model
class BinaryClassificationNN(nn.Module):
    def __init__(self, input_size):
        super(BinaryClassificationNN, self).__init__()
        self.fc1 = nn.Linear(input_size, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        x = self.sigmoid(x)
        return x
this is a binary classification neural net in pytorch
3 linear layers: fc1 and fc2 are the hidden ones (16 and 8 units), fc3 is the output (1 unit)
nn.Linear takes the arguments (number of inputs, number of outputs)
and u add a sigmoid to the end to take the value in the last neuron and make it into a probability
Here it is:
actually im trying to build a personal virtual assistant and ive trained a tfidf vectorizer on a dataset
which part of code should i share
Can you make a confusion matrix? This is the case where it shines
you mean this right?
pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=['Dropout', 'Enrolled', 'Graduate']),
    index=['Dropout', 'Enrolled', 'Graduate'],
    columns=['Dropout', 'Enrolled', 'Graduate']
)
# original
-----SVC-----
          Dropout  Enrolled  Graduate
Dropout       206        29        49
Enrolled       31        52        76
Graduate        9        19       414
# class_weight='balanced'
-----SVC-----
          Dropout  Enrolled  Graduate
Dropout       193        57        34
Enrolled       28        92        39
Graduate       12        83       347
# down sample
-----SVC-----
          Dropout  Enrolled  Graduate
Dropout       134        50        16
Enrolled       30       131        38
Graduate        7        30       163
Have you tuned your SVC? What Kernel are you using?
I haven't done any tuning, am using rbf.
other untuned models (RandomForest, lightgbm, catboost) also showed a similar precision-recall-f1 trend, I haven't checked their confusion matrices tho
Can you try tuning it? Just tune the C and the gamma parameters
Have you split off some data properly? If so, just looking at those records that are misclassified helps
Looking at your confusion matrix class weights made it worse
split off some data properly what does that mean?
train test split followed by cross validation
And not infinitely checking the misclassified rows of your test set (that's cheating)
you mean this right? (I didnt do hyperparams search yet tho)
train_X, test_X, train_y, test_y = train_test_split(X, y)
# hyperparam search / tune on CV(train_X), check model on test_X
tuning it a lil got me
{'C': 15.705979599964326, 'gamma': 0.0019505526555292866}
# original
-----SVC-----
             precision    recall  f1-score   support
     Dropout      0.85      0.72      0.78       284
    Enrolled      0.54      0.38      0.45       159
    Graduate      0.77      0.93      0.84       442
    accuracy                          0.76       885
   macro avg      0.72      0.68      0.69       885
weighted avg      0.76      0.76      0.75       885
          Dropout  Enrolled  Graduate
Dropout       204        30        50
Enrolled       27        61        71
Graduate        9        22       411
# down sampled
-----SVC-----
             precision    recall  f1-score   support
     Dropout      0.78      0.67      0.72       200
    Enrolled      0.61      0.66      0.63       199
    Graduate      0.75      0.80      0.77       200
    accuracy                          0.71       599
   macro avg      0.71      0.71      0.71       599
weighted avg      0.71      0.71      0.71       599
          Dropout  Enrolled  Graduate
Dropout       133        51        16
Enrolled       30       132        37
Graduate        7        34       159
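For anyone wondering how the report numbers relate to the confusion matrix: a minimal sketch, in plain Python, that recomputes precision, recall and F1 from the tuned "# original" matrix pasted above. The helper name `per_class_metrics` is made up for illustration; rows are assumed to be the true class and columns the predicted class, which is how scikit-learn's `confusion_matrix` lays it out.

```python
# Confusion matrix copied from the tuned "# original" SVC run above
# (rows = true class, columns = predicted class).
labels = ["Dropout", "Enrolled", "Graduate"]
cm = [
    [204, 30, 50],
    [27, 61, 71],
    [9, 22, 411],
]


def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f1)} for a square confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
        actual_k = sum(cm[k])                    # row sum: everything that truly is k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(cm, labels)
accuracy = sum(cm[k][k] for k in range(len(labels))) / sum(map(sum, cm))
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(labels)

for label, (p, r, f1) in metrics.items():
    print(f"{label:10} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

The Enrolled row is the weak spot: most of its errors land in the Graduate column, which is what drags macro F1 below accuracy.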
welp gotta leave
guess I'll just let optuna run for a while and see if it comes up with smthn better
Does anyone have any experience with derivative free minimization
Are there any new methods that are better
exactly
Yeah, you only have 2 parameters so this is something I'd grid search and not use fancy optuna algos 😄
GridSearchCv from scikit is enough here
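A rough sketch of what that grid search could look like. The dataset here is synthetic (`make_classification`), since the real data isn't shown in the chat, and the grid values and the `f1_macro` scoring choice are assumptions for illustration, not recommendations. Scaling is put in a pipeline so it is refit inside every CV fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: 3 classes, like Dropout/Enrolled/Graduate).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# make_pipeline names the steps after the lowercased class name,
# so the SVC parameters are addressed as "svc__C" and "svc__gamma".
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1e-3, 1e-2, 1e-1, 1],
}

# Cross-validation happens only on the training split.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

# The held-out test split is scored once, at the very end.
test_score = search.score(X_test, y_test)
print(search.best_params_)
print(test_score)
```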
Hello everyone
I have used a virtual environment and installed these packages
but I cannot get them to be imported
@river cape your editor must be using a different environment than the one where you installed stuff
OH yea
But it isnt showing me the environment which I want
@river cape what editor are you using
VS code
You need to figure out where myenv is and tell Vs code to use that. You'll also need to restart Jupyter.
Thank you it worked
Actually it is showing to virtualenv of a folder called Notebooks
I deleted that
Yet it still points it to that
I have created another environment for this current folder , it doesnt show up in this
I have a question... What would be the logistics of running a small local language model in my simple 2d game that returns strings which are commentaries of various game events? (could be stuff like dialogue)
I'm very new to this, so please forgive the broadness of the question, but how feasible would this be? Where would be a good place to start regarding training my own small scale models?
You won't be able to train anything that is anything like what most people think language models are, unfortunately.
Models like ChatGPT are specifically generative and interactive language models
But language models are actually a much broader class of model than that.
You can make a generative language model that's based on markov chains on your laptop. Though I'm not sure it would produce coherent responses to game events
You would also have to encode each game event as a natural language statement.
In particular, it wouldn't be able to understand game state.
Or remember it.
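To make the markov chain idea concrete, here is a minimal word-level sketch using only the standard library. The corpus lines and the function names `build_chain` / `generate` are made up for illustration; a real game would need far more text, and as noted above the output has no notion of game state.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words that followed it in the corpus."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, rng=None):
    """Walk the chain from `start`, picking a random observed successor each step."""
    rng = rng or random.Random()
    words = [start]
    # Stop at max_words or when the last word was never followed by anything.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Hypothetical commentary lines standing in for a real corpus.
corpus = [
    "the player opens the chest",
    "the enemy attacks the player",
    "the player finds a golden key",
    "a goblin guards the chest",
]
chain = build_chain(corpus)
line = generate(chain, "the", rng=random.Random(0))
print(line)
```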
alright, but training aside, surely they could at least fine-tune an existing model
to some extent anyway
or if not fine-tune then throw prompts at it to make it behave the way they want
If The model is small enough that they can run it on their laptop then yes
I should have said that, I was distracted by pycon
speaking of language models, my last attempt at improving the multilayer GRU next token predictor was to add layer normalization between the layers, to... speed up training? at least that's what I understand all these normalization layers are for (I actually managed to skim over a couple papers on the topics I wanted to find out more about (currently reading the paper on attention)). not sure how much of an impact those layers had, but nonetheless, without handling class imbalance it converged to predicting only "." pretty much, and with class imbalance handled the test loss just went up and up, so uhhh, idk what could be the issue (maybe I should MLFlow through some hyperparameters), maybe the dataset is not large enough, maybe this or maybe that, I don't know. I'll now proceed over to transformers though and yeah, that's kind of my little update 😁
Yup, tune the parameters 🚀
I want to do a topological sorting
But I don't know how
Pls someone help me
My input data is Like
{
    "Node_ID": ["bias : int",
                "Activator : callable",
                [["Descendant_ID : int", "weight : float"], ...]],
    ...
}
Hi guys! I'm currently working on an ML project which consists of training a ResNet-18 model to predict tire tread depth. I have fully working code right now, but it's not achieving the desired accuracy. I have tried a lot of different stuff but still can't seem to reach the goal. Would someone mind helping me figure out a solution to get better accuracy? I would really appreciate it!
This will be used to sort the nodes while training the network using NEAT
Yes I will use DAGs network
Dead chat?
if you're working with graphs, odds are you should be using one of these, so here:
- https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.topological_sort.html (python library)
- https://neo4j.com/docs/graph-data-science/current/algorithms/dag/topological-sort/ (graph database)
if you are not using either of them, then first load your data into either of them then use them.
alternatively, you could use the source code of NetworkX as a reference to how to implement it yourself if you truly must
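If pulling in NetworkX or a graph database feels heavy, the standard library also has `graphlib.TopologicalSorter` (Python 3.9+). A minimal sketch, assuming the input looks like the dict described above, i.e. `node_id -> [bias, activator, [[descendant_id, weight], ...]]`. The example network and the function name `topological_order` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical network in the shape described above:
# node_id -> [bias, activator, [[descendant_id, weight], ...]]
network = {
    0: [0, abs, [[2, 0.5], [3, -1.0]]],
    1: [0, abs, [[2, 0.8]]],
    2: [1, abs, [[3, 0.3]]],
    3: [0, abs, []],
}


def topological_order(network):
    """Return node ids so that every node comes before its descendants."""
    # TopologicalSorter expects a mapping of node -> set of predecessors,
    # so the descendant lists have to be inverted first.
    predecessors = {node_id: set() for node_id in network}
    for node_id, (_bias, _activator, descendants) in network.items():
        for descendant_id, _weight in descendants:
            predecessors.setdefault(descendant_id, set()).add(node_id)
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(predecessors).static_order())


order = topological_order(network)
print(order)
```

Evaluating nodes in this order guarantees that all inputs of a node are already computed when it is reached.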
Thanks
I mean that I pretty much make the distribution of the words the same, i.e., factor / occurrences, that is apparently something one should do and it definitely makes sense for classification tasks or regression and such. It might be that I need to increase the threshold for when a token gets put into the group of rare tokens. I think I can see how in next token prediction class imbalance might be desired, but it just converges to the most common token it seems. Could be an issue with my model obviously, is GRU even used for next token prediction? There are frankly tons of variables at play here, but I think for now I'll just move on and direct my efforts towards understanding transformers.
yeah, I remember, I'll just assume something's off with the model then
and continue with transformers
cuz I also need to cover ViT afterwards
"Layer 'normalization' expected 3 variables, but received 0 variables during loading. Expected: ['normalization/mean:0', 'normalization/variance:0', 'normalization/count:0']" guys does anyone knows abt this error?
so far online source doesnt rlly provide a good answer
In my case I trained the model on Google Colab, which used tf 2.15, and now I'm loading it using tf 2.15 too, maybe the process of downloading the file from Google Drive to my machine has problems?
So u mean the versions match, so what's left is that the .keras file download has problems?
Ah gotcha, i will look into it, ty for the help mate
welp, I'll keep that in mind the next time 😅
anyways, this is what I got
{'C': 31973.23413892633, 'gamma': 9.61611073865163e-05}
# tuned on CV(X_train) and checked on X_test
-----SVC-----
             precision    recall  f1-score   support
     Dropout      0.82      0.70      0.76       284
    Enrolled      0.51      0.39      0.44       159
    Graduate      0.78      0.91      0.84       442
    accuracy                          0.75       885
   macro avg      0.70      0.67      0.68       885
weighted avg      0.74      0.75      0.74       885
          Dropout  Enrolled  Graduate
Dropout       200        34        50
Enrolled       30        62        67
Graduate       13        25       404
# no hyperparams
-----SVC-----
             precision    recall  f1-score   support
     Dropout      0.83      0.70      0.76       284
    Enrolled      0.51      0.36      0.43       159
    Graduate      0.76      0.92      0.83       442
    accuracy                          0.75       885
   macro avg      0.70      0.66      0.67       885
weighted avg      0.74      0.75      0.74       885
          Dropout  Enrolled  Graduate
Dropout       198        34        52
Enrolled       26        58        75
Graduate       14        21       407
sooo I don't think the hyperparams did much
Hello everyone,
I have developed a Python application for Windows that transcribes speech using OpenAI's Whisper model. I've also created a small UI for this app. However, I'm running into issues when trying to create a .exe file to share the program.
The main problem is with the backend: when I try to transcribe speech using the .exe version, I encounter various errors. It appears that not all dependencies are being included in the installer, likely due to the extensive nature of the Whisper model.
Could anyone advise on the best way to package such a large project with a substantial backend? Any tips or solutions would be greatly appreciated!
Thank you!
The next step is just looking at the misclassified ones
Hello everyone, sharing a notebook on customer segmentation using KMeans and Hierarchical Clustering. I'm yet to finish the conclusion. I hope you guys can check it out. Thank you.
https://www.kaggle.com/code/jaepin/customer-segmentation-using-unsupervised-ml-algo
do I like compare them against the correctly classified ones or something?
(there are 37 columns and I'm not sure how I'd see what's causing the misclassification manually)
not sure what I'm supposed to look at
Yeah, manually. It's time consuming but you need to get intuition on what's going wrong. Another thing you can do is make plots that show the error versus each variable. Maybe you can see that if this variable occurs errors are more common and so on
aight, ty for your help and insights
hi guys! I am working on a multilabel classification task, and I have 3 models trained on different datasets. I want to ensemble these models. what's the best approach?
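One common starting point (not necessarily the best for every setup) is soft voting: average the per-label probabilities of the models and threshold the result. A minimal sketch in plain Python, assuming all three models output probabilities for the same labels in the same order. The function name `ensemble_multilabel` and the numbers are made up for illustration.

```python
def ensemble_multilabel(prob_sets, weights=None, threshold=0.5):
    """Weighted average of per-label probabilities, then a per-label threshold.

    prob_sets: one entry per model; each entry is a list of samples,
               each sample a list of per-label probabilities.
    """
    n_models = len(prob_sets)
    weights = weights or [1.0 / n_models] * n_models
    n_samples = len(prob_sets[0])
    n_labels = len(prob_sets[0][0])
    averaged = [
        [
            sum(w * probs[i][j] for w, probs in zip(weights, prob_sets))
            for j in range(n_labels)
        ]
        for i in range(n_samples)
    ]
    predictions = [[int(p >= threshold) for p in row] for row in averaged]
    return averaged, predictions


# Hypothetical outputs of three models for 2 samples and 3 labels.
model_a = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]
model_b = [[0.8, 0.4, 0.3], [0.2, 0.9, 0.5]]
model_c = [[0.7, 0.1, 0.4], [0.3, 0.6, 0.2]]

averaged, predictions = ensemble_multilabel([model_a, model_b, model_c])
print(averaged)
print(predictions)
```

If one model is clearly stronger on a validation set, passing unequal `weights` (summing to 1) is the usual next step.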
I mean just tf.keras.models.load_model ye?
Does being a h5 or a keras file has anything to do with it aye?
weird both keras and tf are 2.15 in both places
I just needed to know
Anyone here using lightening ai? to train their model
Is this https://www.datawars.io/
a good place to make projects?
print("{0:20}{1:20}".format(word, wordnet_lemma.lemmatize(word, pos='v')))
Any idea as to what does :20 mean?
!e adds some empty space (padding) to align
```py
for name in ('Foo', 'Title Bar', 'Long Title Baz'):
    print(f'{name:20} - test')
```
@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | Foo                  - test
002 | Title Bar            - test
003 | Long Title Baz       - test
!d str.format
str.format(*args, **kwargs)
Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces `{}`. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.
```py
>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
```
See [Format String Syntax](https://docs.python.org/3/library/string.html#formatstrings) for a description of the various formatting options that can be specified in format strings.
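A few more variations on the `:20` width spec, as a small sketch. The words are placeholders for whatever the lemmatizer returns. The number after the colon is a minimum field width; `<`, `>` and `^` pick the alignment.

```python
word, lemma = "running", "run"

# Two columns, each padded to at least 20 characters.
row = "{0:20}{1:20}".format(word, lemma)

# Explicit alignment: right-aligned and centered in a 20-character field.
right = "{0:>20}".format(word)
centered = "{0:^20}".format(word)

# Strings are left-aligned by default, numbers are right-aligned by default.
number = "{0:20}".format(42)

for text in (row, right, centered, number):
    print(repr(text))
```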
Confession time: Trying to make pytorch to work with ROCm is the most rage inducing, frustrating and completely awful experience that anyone ever had to experience
tbh, I think it's a good idea. Not a lot of practical project sites for data.
It seems like it's more educational/useful than leetcode
Hey guys. I have a weather dataset on which I trained a model to predict solar irradiance. Now I want to use this model as a base model for PV solar energy prediction via transfer learning, but it doesn't do a good job on the latter, no matter what I change. So how can I debug what went wrong?
I'm working on a project where I'm trying to get an AI to learn how to play blackjack, would it be simpler to get the AI to predict whether you will lose or not depending on each move you could make and then make decisions based on that, or would it be simpler to just go straight to making decisions based on the state of the game and then improving based on the results of that?
I don't know the rules of blackjack. But for chess, the non-neural approach is to compute all the possibilities up to n turns out, and then use a heuristic to decide which path is best.
Okay thank you.
If you have a model that "predicts whether the next move will eventually cause you to win or lose", you still have to decide how the model will learn to make that prediction. Which still involves learning some understanding of what makes one move better than another
Does blackjack involve random chance?
does anyone know what's the best deep learning model for Regression task?
Hello, welcome to our wonderful data science chat.
Try being more specific about your task
I don't know how to be more specific other than that the dataset doesn't contain a Time feature, so LSTM isn't helpful.
Try telling the chat about the dataset
Well, the dataset is in the CSV format and contains 21 features, whereas the Yield values range from 0-1. I’ve tried XGBoost and CatBoost, but the accuracy seems to be stuck at 86%. I believe the dataset is imbalanced, but I want to try other models as well to see if I can reach 90% without doing data augmentation.
I've heard that xgboost tends to be the best for tabular data. Are there any hyper parameters you can change?
Take a look at what instances are getting predicted incorrectly. What do they have in common? Are there any patterns in the data that those instances don't uphold?
I’ve tried the Grid Search CV method to find the best hyper-parameters. It improved the accuracy but not by a lot.
I really don’t want to do data analysis, but it seems like I’ve no other option. oof
if you just want something quick, have you looked into automl solutions like autogluon, tpot, etc?
imbalanced
a lot of these models have parameters that try to remedy this, for xgb I believe it's sample_weight, catboost class_weights, lgbm is_unbalance, etc
and also if you know the data is imbalanced, accuracy may not be what you want to look at, something like f1 might be better
thx, I'll look into it.
idk, tell me if its good
Yeah, idk, I implemented a transformer, but it's only predicting dots... It does seem to converge faster to just predicting this one symbol compared to the RNNs, but nonetheless that is not the expected result as one might imagine
I mean, at this point, I have tried like at least 3 separate models (two RNN types, now one Transformer type), they just don't want to work and I'm a tad lost here
maybe it is the dataset, maybe it is that it's not actually tokenizing, it's more just splitting words
another thing I noticed is that they had a lot more sentences and a lot more words in the vocabulary, the dataset I'm using has 10k tokens and 34k sentences and I did adjust the other hyperparameters according to the paper (Attention Is All You Need) (as closely as possible)
dots are the most common token, by quite a large margin too
I'll let it train
maybe by some miracle, a miracle will happen, idk, will see later ig
what is itertools im not able to understand
tools for various iteration needs, you're gonna have to be a bit specific if you want a more specific answer
it is in a code which which is combining two different dataset columns
this
may I suggest the documentation for itertools.product
it's really just a couple iteration utilities, certainly one of the built-in modules I would recommend being familiar with at least to some extent
hmm
well is this part of data analysis or data science?
or anything else
and why is it called module
if you make it a part of either, then it is a part of it
ok
while I can't give you the exact origins of the name, I assume it's to do with Python files being modular in that you can import them and reuse and stuff like that, so, yeah, you can create your own modules, you can use built-in modules, you can use 3rd party modules, they're also sometimes called packages when there are several modules grouped together or libraries, but yk, they're all modules when you import them anyway, it's just an object in Python pretty much
it's probably something I'd recommend getting familiar with before trying to learn stuff like pandas
yeah sure, it's funny, I question weird things just because I'd rather do English literature than this
I question the meaning of the word instead of questioning what it's used for lol
hahaha
yk what, it seems it's going away from them dots 
I forgot to add rollouts so I'm stuck with test predictions and not some fun fresh stuff, but at least it's not just dots
im sorry, im not understanding why we used Itertools.Product() here
well, did you read the documentation? what part of the docs did you not understand? maybe I can help you explain it.
on a higher level it just seems to have been used to generate those combinations pretty much
hopefully this helps you better understand what itertools.product does
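A small sketch of what `itertools.product` does when combining two columns. The column values are made up, since the original code isn't shown here. It yields every pairing of an item from the first iterable with an item from the second, the same as two nested for loops.

```python
from itertools import product

# Hypothetical values from two dataset columns.
cities = ["Paris", "Tokyo"]
years = [2022, 2023, 2024]

# Every (city, year) combination, with the last iterable varying fastest.
pairs = list(product(cities, years))

# The equivalent nested loops, for comparison.
nested = [(city, year) for city in cities for year in years]

print(pairs)
print(pairs == nested)
```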
yooo, actually, it's doing a lot better than the dang GRU net (maybe it just needed a couple thousand more iterations and it would've been at the same level too...)
(I still partially blame the dataset)
How do I start learning ml and ai
Like I know the maths such as calculus and statistics I have also learned python so I would like to know how to start learning ml and ai , would also like if you provide some Good teaching websites or yt tutorials not those too heavy ones but just some basic to intermediate so that I can atleast create my own models and also fine tuning them
check out the pinned messages in this channel
omg thanks....this is exactly what i want to learn data analysis. this visual way of showing a code
btw im gonna have to give an interview for a "Web Data Scraping Company" - https://www.actowizsolutions.com/about-us.php
guys can anyone tell me what to prepare for this
oh thanks, i guess you approve of this website. im happy im learning on it. that i first came up with this, now everyone knows. haha i'm so amazing
what subjects would i need to be good at data analysis? tell me
i'll go there and learn it
i'll go with 'Probability and Statistics' for now and learn it before i go to sleep
instead of wasting time playing "age of mythology"
"One strategy is to use the website's API, if available. APIs often provide a more structured way of accessing data and are less likely to have blocks in place to prevent scraping."
what does website API mean?
also what does this mean in simple language
oh
so we read the given set of rules and understand the block they might give us for their authentication requirement and then do something with the block given
right?
i'm sorry for my English, i sound so unprofessional
yes
hello, I'm new here. I'm doing a project with neural collaborative filtering and I tried to run it on the GPU. I have Win 11 and I downloaded the latest version of CUDA, but pytorch can't find it. and yes, I have a GPU. what should I do?
ohhh god how can u make so much sense
hmmm right
but lets assume if there's a block
oh
god wow
how could u know this much man
you're amazing
so you're a data analyst ?
idk what is that but sounds cool
oh
Maximum likelihood estimator
i think it might be night there, if u r working in day, i think u should sleep and rest, u seem like a hard working person idk
Are you using windows subsystem for Linux?
no pure win
yeah well u r a solid person
Yeah, the latest torch versions don't work with GPU on pure windows
good man your future is bright
what should i do ?
good luck
You'll have to use WSL
It's explained somewhere in the docs
Actually I'm wrong sorry
This was the case with Tensorflow 🥴
does anybody know of a good way to generate smooth noise
Hmmmm you should be able to use it with pure windows. Have you tried running torch.cuda.is_available()
i mean i love this server becuz people here help a dumb person like me to understand things in better human language, i really trust u guys blindly. i've only came across smart people from this people. who always help me learn. you're one of them. thanks for helping me. you rock!
False
smooth noise as in random numbers around the mean?
yes and it should look smooth
wym
smooth noise
how in any dimensions
u can just use normally distributed values from numpy with howmuchever mean and deviation and then just shuffle em
do it for each dimension and voila
u prob can even type a oneliner like this
df = pd.DataFrame({f'Column{i+1}': np.random.permutation(np.random.normal(0, 1, 100)) for i in range(3)})
change the 0 to be mean, 1 to be the deviation and 100 to be the amount of values
oh wait a sec, u mean smooth as in the values must be continuous?
ok scratch that do this instead
import numpy as np
import matplotlib.pyplot as plt
from perlin_noise import PerlinNoise
# Set the random seed for reproducibility
np.random.seed(42)
# Define the size of the grid
grid_size = 100
# Create a Perlin noise object
noise = PerlinNoise(octaves=4, seed=42)
# Generate the 2D noise grid
noise_grid = np.array([[noise([i/grid_size, j/grid_size]) for j in range(grid_size)] for i in range(grid_size)])
# Display the noise grid
plt.imshow(noise_grid, cmap='gray')
plt.colorbar()
plt.title('2D Smooth Noise Grid')
plt.show()
this yields this image
u can find a more detailed package in https://pynoise.readthedocs.io/en/latest/
which also has perlin
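If installing a noise package is not an option, a cruder alternative (not Perlin noise, just a moving average over Gaussian noise) also gives smooth-looking 1D values with only the standard library. This is a sketch with made-up names (`smooth_noise_1d`, `mean_abs_step`) and an arbitrary window of 10. A wider window gives a smoother but flatter curve.

```python
import random


def smooth_noise_1d(n, window=10, mean=0.0, std=1.0, seed=None):
    """Return (raw, smooth): n Gaussian samples and their moving average."""
    rng = random.Random(seed)
    # Draw a few extra samples so every output point averages a full window.
    raw = [rng.gauss(mean, std) for _ in range(n + window - 1)]
    smooth = [sum(raw[i:i + window]) / window for i in range(n)]
    return raw[:n], smooth


def mean_abs_step(values):
    """Average absolute difference between neighbouring values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


raw, smooth = smooth_noise_1d(200, window=10, seed=42)
print(mean_abs_step(raw), mean_abs_step(smooth))
```

Neighbouring smoothed values share 9 of their 10 inputs, which is why the step between them is much smaller than in the raw series.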
in like 2 hours, but the good news is that it worked out in the end, yay, I was pessimistic at the start, but boy did a miracle (it's called math) happen, it went to like over 90% test accuracy, all the example inputs and predicted outputs that were printed matched perfectly, I'll try to do inference as well, as I didn't have any rollouts going on during training
oh and those attention matrices... man, those were beautiful to see
will share the fun stuff in them couple hours as well
honestly, kinda crazy
right and also the implementation (well, a chunk of it anyway) https://paste.pythondiscord.com/BJAQ
also the actual metrics at epoch 22
so yeah, very cool, onto the other stuff now
sure, I can try, if there will be any free GPUs available on paperspace, well, they'll become available eventually, so... yeah, but sure, can do
but using the dataset I have or sth else?
<@&831776746206265384> it appears that @vernal thunder is advertising a YouTube channel
Hello, your message has been deleted due to violating rule 6 of the server, which does not allow advertising
People, how do I use my M3 chip's GPU for tensorflow?
I have tried to install tensorflow macos but it didn't work I think.
Failed to install tensorflow metal from pip
and tensorflow GPU
why would they introduce more problems into their product and people's lives? smh
Rough take, switching from Tensorflow to Pytorch is a pain.
I am learning Pytorch but it just seems hella complex compared to tensorflow
My take is in TF you gotta worry about the logic and building of the model more than the syntax, for Pytorch, you gotta do both.
can you show some examples of which syntax elements you're thinking about?
Mainly having to work with OOP and just extra code for something simple like an ANN.
I could do the same thing with half the code with tensorflow
That's a little extreme but you get the point
Yes but why the need for OOP?
Tensorflow doesn't use it, idk why Pytorch needs to.
Enlightened me if I got it wrong
last I checked, TF definitely does use OOP
and yes need, you gotta keep a bunch of internal state somewhere
Not as much as Pytorch though
Everything I see with Pytorch just makes my head hurt
well, ig there's the whole torch.nn.functional which at least partially is getting deprecated for torch, so... hmm
class Model(nn.Module)
I assume this is initializing the nn?
I am 100% lost 😂
wait, I swear I saw a warning about some nn.functional thing getting deprecated for just torch.
tensorflow does OOP too as soon as you want to make any non-trivial model
Ong learning data science is insanely difficult without university courses...
Deep learning specifically
They've both converged to very similar libs, especially with LazyLinear, LazyConv2D and so on being a thing now in Torch
yes, reading the docs is what I'd recommend
the hard part is rarely the code though
If I were you I'd read a book about this stuff
Like I understand all the theory to NNs but this is just a tad bit confusing
Even if you're doing a uni course you still venture out and learn on your own
Maybe that's a good idea
This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely av...
I am not, I am literally in grade 9 💀
The course I am following should have let me know that OOP would be required, never bothered to learn it before.
Should probably learn it now.
a factory of closures? that sounds like peak Javaism
I'm not even sure Java has Closures
Yeah.
You need free functions for closures. Maybe they have it now because they're adding every feature known to man in the latest Java versions, but I digress
but ofc they have anonymous classes
Java initially didn't have syntactic support for closures (these were introduced in Java 8), although it was fairly common practice to simulate them using anonymous inner classes.
https://stackoverflow.com/a/3805576/14531062
Oh god this looks terrible 😂
Anyhow, just learn the syntax of OOP in Python but nothing more. Learn what class FeedForwardNetwork(L.LightningModule) means
You can do this:
from dataclasses import dataclass
import torch.nn as nn
@dataclass
class Network(nn.Module):
    w1: nn.Parameter
    # note: the generated __init__ does not call nn.Module.__init__()
    def forward(self):
        pass
dataclass is stdlib
I'm sure there's a way you could do it. You can pass default, mutable parameters with field_init or something similar
this is what I meant: https://docs.python.org/3/library/dataclasses.html#dataclasses.field
60-70 % of the classes I make are with dataclass but I don't make torch stuff with it (or use dataclass + inheritance, in general)
Something in me thinks it'll go south (and I don't know why)
like some sort of edgecase
default_factory FTW
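For reference, a minimal sketch of `default_factory` on a plain dataclass (deliberately not an `nn.Module`, given the doubts above). The class name `ExperimentConfig` and its fields are hypothetical. Writing `hidden_sizes: list = []` directly would raise a `ValueError` at class definition time, and `default_factory` is the supported way to give each instance its own fresh mutable default.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:  # hypothetical example class
    name: str
    # Each instance gets its own list, built by calling the factory.
    hidden_sizes: list[int] = field(default_factory=lambda: [64, 32])
    tags: list[str] = field(default_factory=list)


a = ExperimentConfig("baseline")
b = ExperimentConfig("wide")
b.hidden_sizes.append(16)  # only changes b, not a

print(a)
print(b)
```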
well, I'm not gonna find out any time soon 😂
There's something I'm missing tbh. I love absolute reproducibility but not the effort it entails. Some CLI tool that registers model / experiment runs and you can "reactivate" them with the CLI tool, which does a checkout to a commit and runs the .py file
Hey guys I need a help here,
I'm trying to make a Website Summarizer using Python and AI.
-> Which model is best for this use case? And I'm very new to this so I want to know if any type of fine-tuning can be done on the model.
-> Is there any web-service where I can deploy the model for free so I can use it as a backend to test, or I can run it locally in my laptop.
-> Is there also any lightweight fast models for this use case.
Ik there are too many questions here, and any help is appreciated.
Consider using an existing model/service. You could use OpenAI's models to do this, it makes the task significantly easier. There's also https://goose.ai/. Both of these assume you're at least willing to put a little bit of money in it though.
there are several issues with this
first, how are you gonna retrieve the content of those websites? you have to make sure you're not violating their ToS
second, it will most likely cost something. there are free frameworks that can deal with RAG such as langchain, but if you want to host it someplace, you'll need to pay for the computation time, even if it runs on the CPU
third, ehhh, fine-tuning? probably won't be necessary if you just go with RAG (which I'm not sure where that technique lies in the state of the art hierarchy)
I spend very little time making "data science" code pretty 🤷
data science code has got to be the ugliest code I write casually
I try my best whenever I can though
My DS code is really bad. I code the happy path and constrain myself to only using that
But, I'd say my standard for "really bad" isn't really bad (humble brag)
I still have a couple of interfaces, use some level of OOP etc etc but it's definitely a lot worse than my other code for various reasons
can multi-headed attention be parallelized by using a bigger weight matrix?
The interfaces are mostly because, for instance, at work I was making encoder decoder models. It didn't make sense, imo, to just flat out code a Seq2Seq but rather to make a TimeSeriesEncoder and a TimeSeriesDecoder, because you can have a CNN encoder with an RNN decoder, vice versa, or CNN + CNN, RNN + RNN and so on
oh, thought so
Definitely, thanks, I am learning about classes and objects rn, will learn a little bit more to get myself comfortable then get back to Pytorch.
dam, each of my multi-headed attention blocks could've been 8x faster had I opted for this
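A toy sketch of the "bigger weight matrix" idea for the projection step, in plain Python so it runs anywhere. The sizes and numbers are made up (2 tokens, width 4, 2 heads of width 2). Concatenating the per-head projection matrices column-wise and doing one matmul gives the same numbers as one small matmul per head. This only covers the linear projections. The attention scores are still computed per head, usually as one batched operation.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy input: 2 tokens, model width 4 (made-up numbers).
X = [[1.0, 2.0, 0.0, -1.0],
     [0.5, 0.0, 3.0, 1.0]]

# One 4x2 projection matrix per head.
W_heads = [
    [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.0, -1.0]],
    [[0.5, 1.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
]

# Separate: one small matmul per head.
per_head = [matmul(X, W) for W in W_heads]

# Fused: concatenate the head matrices column-wise into one 4x4 matrix,
# do a single matmul, then slice the result back into heads.
W_fused = [sum((W[r] for W in W_heads), []) for r in range(len(X[0]))]
fused = matmul(X, W_fused)
head_dim = len(W_heads[0][0])
split = [
    [row[h * head_dim:(h + 1) * head_dim] for row in fused]
    for h in range(len(W_heads))
]

print(split == per_head)
```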
ehh, it was expected 
General purpose inference server, built in Rust 🦀 that loads models from "storage" (S3, minio, azure blob, filesystem, ...) that are in ONNX format. Next to that also make a small Python lib that wraps libraries that export models to ONNX, add metadata necessary for my inference server and store it in the "storage" (that rust is reading from). Deploying a new model as an endpoint would just be export_to_storage and the endpoint is created automatically
If classes are classes then what are classes?
You were talking about the "class" object, right?
I see, okay
well, type is the default metaclass and other metaclasses have to inherit from it
(maybe not details that matter to beginners tho)
everything (aside from type ig) inherits from object
but tbf, I would not worry about it even at an upper intermediate level
everything's a PyObject *
in CPython anyway
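A quick sketch of those relationships that can be pasted into a REPL. `Plain`, `Meta` and `WithMeta` are throwaway names. Note that `type` itself also turns out to be a subclass of `object`.

```python
class Plain:
    pass


# A class is itself an object, and its type is the default metaclass `type`.
plain_is_made_by_type = type(Plain) is type
plain_inherits_object = issubclass(Plain, object)

# `type` is an instance of itself and also a subclass of `object`,
# while `object` is an instance of `type`.
type_is_its_own_type = type(type) is type
type_inherits_object = issubclass(type, object)
object_is_instance_of_type = isinstance(object, type)


# A custom metaclass normally inherits from `type`.
class Meta(type):
    pass


class WithMeta(metaclass=Meta):
    pass


custom_metaclass_used = type(WithMeta) is Meta

print(plain_is_made_by_type, plain_inherits_object)
print(type_is_its_own_type, type_inherits_object, object_is_instance_of_type)
print(custom_metaclass_used)
```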
Metaprogramming is dangerous because it's easy to convince yourself you need it
Yup, but particularly for things you train yourself. Especially traditional ML models.
Doubt it. it's not a hard language at all imo but you need patience + to care about programming
I'd love to use an API service, but I'm making this as a project so a model running locally or smt that I can modify or use would be a great addition.
And that's only a small % of people programming professionally (and that's totally ok! Your job doesn't need to be your hobby)
Then huggingface is your best bet https://huggingface.co/
ah, an unimportant detail is I want the project to be usable by people that only know Python and no Rust
Like through a CLI tool that spins up the inf server so as a user you just focus on making models and putting them in the right storage
Hmmm, I doubt it because cpp doesn't have web related stuff that is popular + it's much more of a pain to write
Here is a question, would you guys say chatGPT is an "AI" or a glorified search engine?
With AI you probably mean "artificial general intelligence" (also known as AGI)?
The term AI (without the G) actually includes many "basic" things like search algorithms. When I say search here I don't mean search engines but I mean depth first search, A*, uniform cost search and so on. These are traditionally seen as AI as well.
Now if the question is is it "AGI or a search engine" then the answer is: "ask a philosopher" 😂
What I mean is that it's not actually doing anything, it's trained on a massive massive dataset so anything you might ask would be in its dataset, it doesn't do anything out of the box.
REST APIs are mostly a solved problem - you don't suddenly decide you need to change your entire API to use XML instead of JSON, or stop using HTTP Headers to use a different type of request metadata
deep learning is much more experimental on that aspect, so the libraries do have to provide more low-level access, so that researchers can experiment with using different existing or new architectures, including all sorts of layers and connections between them, different ways of training the models, creating new activation functions and so on
You can use huggingface transformers if you want something on a similar level of abstraction to fastapi
Did you use TF1?
Well then you can see the progression we've made from TF1 all the way to Pytorch
But, I think @agile cobalt 's answer covers it pretty well
I mean the process of creating the text and using different code pieces and making a finished code from those is from the ability to learn, that's what makes anything an AI or NI. And no, I don't use it to make code for me.
The fact we have automatic differentiation is BIG
I guess that another point is that ML libraries cannot sacrifice performance at all, while most wouldn't complain if you got like 50% slower on fastapi because of using middlewares instead of defining things in a more low level way
it's easy to overlook the strides we've made. For something that is still research (like etrotta says) and not just code => solution means we might be there
Like?
It has learnt all of python or all of C# from its dataset, similar to how we learned those programming languages. It could be argued that that is what makes it AI.
the "learning" in machine learning is not something as lofty as you make it sound here. it just refers to optimizing parameters based on data examples
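To make "optimizing parameters based on data examples" concrete, here is a tiny sketch: two parameters, five made-up data points, plain gradient descent on mean squared error. Nothing here is specific to any library or to how LLMs are trained; it only illustrates the general idea.

```python
# Data examples generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0  # the parameters being "learned"
learning_rate = 0.05

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Nudge the parameters in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

The loop ends up close to the slope and intercept that generated the data; that is all "learning" means in this sense.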
you realize that app = FastAPI() is already oop right?
The weights you mean?
Or are there some other parameters you are talking about
not necessarily, but sure
AI and ML do not involve neural networks in general
I was not talking about ML specifically
It was more about LLMs as a whole
that you cannot reach the level of abstraction you are thinking about without making sacrifices, and these sacrifices are not (yet) viable at this point in time
Hmm the terms used are ill defined so you can't really have a discussion
which falls under deep learning, and that also falls under machine learning
AI > ML > LLMs
yep
So LLMs are per definition AI
What about it though? I understand that but I was thinking about it logically
Not definition wise
Yes I know that
what exactly is your question?
Can GPT be considered an AI due to the fact that it isn't doing anything intelligent or out of the box.
By that I mean it's not inventing anything.
by definition, yes
I can't exactly explain what I mean but I hope you get the general idea
otherwise, AI does not exist if that's what you're getting at
zestar was right in pointing out you mean something at or above AGI level
We are not there yet though
Hence why this is a question for philosophers
Lmao
so as i said, from your view point, there is no AI at all
Yeah ig
AI broadly refers to data-driven optimization, which is what i pushed in that direction
in that sense, GPT is AI
Like there was one checkers bot that defeated the world champion by inventing a new out of the box move.
https://keras.io/getting_started/intro_to_keras_for_engineers/ Keras 3 actually looks cool
in the sense you're making up, there is no AI at all
I think it's what I'd recommend for "engineers" that have to train a model or so
where do you draw the "engineer" line
I would consider that to be more of AGI. Cause it's doing some inventing and thinking
good question, I'd say specifically software engineers
ML engineers are also engineers
(my phd cardboard, if i ever finish this shit, is gonna say dr. ing. too)
nice
so technically engineer here too
The ing. flex is a thing in germany too
Well, you have ir. as a title for MSc engineering science (the hard one) and ing. for engineering technology (the easier one)
ooh keras with jax, tf, and pytorch. nice
hmm interesting, i'm not sure if that distinction exists here as well
gonna have to ask
It's because of the bologna accord soup thing
It's pretty bad I've been in situations where I saw an ir. tell an ing. "don't touch the machine, let me get an engineer!" on the shop floor 😭 (because 1 has more prestige than the other)
yeah sorry
(all the terms are bullshit anyway)
AI is pretty tough to define broadly enough
you hear people talk about game AI all the time and it's just like a for loop and 3 if statements
Prolog 😦
Honestly it just depends on who you ask as well
I learnt about a bunch of methods in operations research that were also called "AI" in later years
i think this is the best answer to your original question
It's the same as linear regression being AI
According to the definition I think
What about it?
Stack overflow is getting GPT?
Didn't they already have an auto mod
Anyhow, to give my last message in something that is going off topic.
AI is basically whatever creates value for critical stakeholders by means of increasing the amount of venture capital they have to burn.
💀
This just raises the question of what counts as "inventing" and "thinking."
On the inventing part, if it's just making something new, that is trivial, if it's making something new and useful, we already do that with AI / optimization: https://en.wikipedia.org/wiki/Evolved_antenna
In radio communications, an evolved antenna is an antenna designed fully or substantially by an automatic computer design program that uses an evolutionary algorithm that mimics Darwinian evolution. This procedure has been used since the early 2000s to design antennas for mission-critical applications involving stringent, conflicting, or unusual...
And as for thinking, it's just not really defined (beyond the not very useful stuff like "it's the process by which someone comes up with a solution").
For example a checkers AI, it defeated the checkers world champion by inventing a move from its training. That, I would consider an AI.
I know I am throwing the term AI very loosely but you get the point
Yeah, in that case we have been doing that for a long time.
My point is that GPT doesn't exactly do that
But that is really just because there are things that humans are not good at, and computers are, and for which you can make a well defined procedure.
Technically a human could find that move if they also did the search algorithm by hand on paper, it would just take really long.
So time is an important factor here.
Yes, GPT can only kind of do it, but not really intentionally.
It's meant to just be a chat bot.
that's a kinda "trivial" case though. there are finitely many (though really a LOT) of possible chess games. you don't need anything clever to make a "new" move. just loop over all of them and play a good move for your situation, no special thinking involved. people don't do that because memorizing a set of very good moves and having the skill to recognize them and know when to use them gets the job done. you can go out of your way and play weirdass moves if you like though
Although it's advertised as much more.
I kind of agree with this thinking but I was having a conversation with someone and wanted a second opinion
Ask it what script it's following
Or something like that
This, what is often considered intelligent is not having to do the brute force. An intelligent math proof is one where you cleverly get around doing brute force (to the extreme, you effectively skip it all (or even infinite)).
Not exactly a languages expert lol
guys i will start learning numpy and pandas... (some libraries i should learn for machine learning) so is jupyter or pycharm good for me????

So it basically just moved some letters around, not really reinvented or something.
whichever you like, doesn't really matter much. if you're just starting out with python, i would advise against jupyter because it can promote bad habits in out of order code execution
Between pycharm and Jupyter I would say Jupyter
@craggy agate What you are probably looking for is deep reasoning as described by Wolfram. You can think of this as having an internal thought loop in which it solves problems algorithmically. We do have AI models that do this (e.g. NTMs), but they have been overshadowed by language models in popularity, they are still there though and being worked on, including integrating them with language models.
No not really
hmmmmm
why? can you give me reasons
i think jupyter is good for small projects like beginners right?
don't care for my mistakes in english
i already gave you my 2 cents 😛 it promotes bad habits if you're new to python
yo, does anyone know of a neural net architecture that:
is trained using a target_processed_signal + raw_unprocessed_signal and outputs the parameters
is tested using only raw_unprocessed_signal and outputs some parameters that would transform the raw signal into a desired signal? any links to papers will be highly appreciated, especially if they contain neural net diagrams for the architecture
I'm thinking of a use case is to train the network to somehow "remember" the qualities of a desired signal
and when I have no target signal but give it new, raw signals it would give me some parameters that I could apply to process the new signals
we've had a few cases here of people asking for help where the issue was executing cells out of order, or running a cell a second time resulting in inadvertent composition of functions. the first means the code only works if you run the cells out of order, and the latter means the code only works if you re run all the cells in order, at which point you're better off just using any ide you like
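Both failure modes can be imitated outside a notebook. In this sketch each string stands in for a notebook cell and a dict plays the role of the kernel's namespace; the cell contents are invented for illustration.

```python
# Each string stands in for one notebook cell.
cell_1 = "values = [1, 2, 3]"
cell_2 = "values = [v * 10 for v in values]"  # overwrites its own input

# `state` plays the role of the kernel's shared namespace.
state = {}
exec(cell_1, state)
exec(cell_2, state)
run_once = list(state["values"])

# Running the second cell again silently applies the transformation twice.
exec(cell_2, state)
run_twice = list(state["values"])

# Running the second cell first in a fresh kernel fails outright.
fresh_state = {}
out_of_order_error = None
try:
    exec(cell_2, fresh_state)
except NameError as err:
    out_of_order_error = type(err).__name__

print(run_once)            # [10, 20, 30]
print(run_twice)           # [100, 200, 300]
print(out_of_order_error)  # NameError
```

In a plain script the same code runs top to bottom exactly once, so neither situation can come up by accident.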
ok
Jupyter cause you can split up your code cell by cell and vs code for better customization.
I see
IMO it's better to just get used to using text directly, not the notebooks. Notebooks have the issue that any other non-plain-text coding method has, and it's that everyone now needs that specific editor for it to access your code and they have to now learn that. In addition, Jupyter Notebook is not well designed even as a notebook IMO. It causes many issues with debugging.
you did learn ML or something like that right?
Yes, why?
I currently run an architecture that predicts parameters based on a signal difference
the problem is that if I want to apply the network for my use case, I will need BOTH the raw and target signals, but I have no target signal, just the raw version
ok that big reason
This comes from frustration of asking for code from others only to receive a notebook which I now need to manually go through and painfully copy paste (with my mouse, ew, I need vim please) into a regular text file while paying attention to what their cell execution order was.
you might be able to rewrite the problem as a "self supervised problem" where you take an autoencoder that starts with params, generates a signal, then estimates params from that signal. you then split up the autoencoder and keep the decoder. just bouncing ideas around, this may or may not be helpful for you. (you can also interpret this as just using synthetic data if you keep the "encoder" fixed)
i think you know jupyter is better for small projects, if i want to practice my new skills it will help me but pycharm won't help me because it's just for big projects right?
you could do big and small projects in either, just use what you prefer
neither will "help you" in any special way
ok
pycharm has a special debugger and automatically creates venvs. jupyter has in-line plotting and can display latex cells between code cells. neither helps you with coding
i think will use vscode 🤣 
Your tool of choice can only hinder you, choose the one that stays out of your way (so you can just code).
i do use vscode cuz you can use it either for vanilla coding or for notebooks (but most importantly for writing latex tbh)
yeah you right
pycharm is arguably the most complex out of pycharm, vscode, and jupyter.
if you already use that comfortably, the others won't give you any issue
but yeah just use what you like
but also it is good for programmers python
if you like it, sure
About vim, emacs, etc. These editors are for when you decide which editor to use for a lifetime, they have high upfront learning investment required, but you will never need anything else again. So if you think you are ready for that, then maybe give it a go.
no ide or editor will do your job for you. they also don't teach you how to code
just use whatever has tools you like
the best one for me is vscode, but i hear pycharm is good for python
i just asking i want some information
there is no "best", it just depends on how much you like it and how well you use it
Until then, yeah, probably vscode.
Recommend starting out with Jupyter NB but if you like pycharm then go for it
notebooks are not too bad - in particular, inline plotting can be useful for exploring datasets or testing data transformations, just remember to Restart Kernel every so often to ensure your results are reproducible and you didn't end up with a messy state
the tool you use doesn't really matter
though, there are indeed 2 main approaches that can influence the way you organize your code:
notebooks, where the state of the program is sort of saved as long as your session (or whatever it's called) is still active
or the classic way where you would have to run your code every time if you want to access a specific state
Yeah, just do whatever, you can use notepad, it's really whatever. What really matters is that you are making stuff, you will find out for yourself what is working better after trying them.
Don't get stuck in analysis paralysis.
I'd recommend the classic way if you're a beginner
don't look into notebooks just yet because there you can run the cells out of order and you could get unexpected results because you didn't pay attention to which cell you've run before
Path of least resistance since what right now matters is that you are coding. But keep in mind that some paths with more resistance have payoffs at the end.
I worked like 1.5 years with notebooks and it was a bit hard for me to transition back to the classic way of coding
i will never choose front end
I don't like front end either but anyway
i just 3 months 
looks at python code
for every epsilon, there is a delta...
If it does it says more about frontend work as a whole than about AI... (or any other "job")
i like back end is just like, i see my self like mr robot 🫠 🤣
Well, eventually everything maybe, frontend will be the least of our concerns at that point...
it will never replace all developers, but front end is something ai can replace
yeah
anyway, does anyone know how to train a network with 2 inputs X1, X2 (like a raw and a target) to match some output parameters Y
so that when I apply it to the real use case I only have the raw input X1 and I need to find the parameters that would transform X1 to X2 without having the X2?
i think many people choose ai
right??
Will people be choosing at all at that point?
if that happened the market will..
not all people, because in their eyes AI is the future
or I could use an encoder to estimate the params from the raw to the desired AND use the output of the decoder which should be the desired signal
and then the decoder would receive the params+raw as input and would generate the desired signal
that's already implicitly included in the model, since forward modelling the input parameters into synthetic data means your data y is a function of the params x. then you want to use y(x) to estimate x, which is a typical inverse problem
thing is: I already have a deterministic "algo" for the "decoder" - i.e. the signal processor I want to set those params for
which algorithm are you using? if it's deterministic and fixed, you certainly don't need both the signals and the parameters
there is no training
it's a parametric filter that's used to transform X1 to X2
I have a training set of a very specific set of desired X2s
and when I encounter real data, I wouldn't have the X2, but just an idea of what X2 I would want to find
it's basically the mixing process of a sound engineer
so the training would be needed to find the "good" parameters to make any X1 signal sound better even if I don't have a target X2
but I want to estimate the params instead of having the desired signal as a direct output, because I want to be able to make changes in the parameters for full customizability of the output
that sounds like a typical ML problem though, what's the issue?
you want to compute X2 from X1, you have a parametric function that can do that but you don't know the parameters
you have examples of X2 as well. do you have the X1 that go with those X2's?
Hi Guys I have actually started learning SQL for data science.
I was wondering what I should do after learning the syntax and those basic where,group by,join,etc
practice
Surely but how
I mean, ig practicing SQL on its own is gonna be a bit tough, you kinda gotta include it in some bigger project
So you suggest that instead of doing Hackerrank Problems,I should jump straight into using it in some project that requires SQL?
there are hackerrank problems that only need SQL?
but yeah, usually projects are a great way to practice
Definitely
check the pins in #databases
I mean, ig you can try hackerrank as well
Okay thanks
Yea
I get it now
Can I tell you ig what I am doing rn towards learning Data Science to make sure I am doing it right and in the right order?
I am just making sure that you are specialized in it
I'm not particularly specialized, but you can check out the pinned messages in this channel
nope
i have categorical columns in my dataset that have missing values. what could be a good way to impute them besides replacing empty values with the mode for that column? something that might impute these values depending on some other column to make it closer to what could have been the real value. I hope the question is understandable
So, you want to infer some data depending on the values of other rows?
yes. taking an example: suppose there is a dataset with cols A B C. A is a categorical col with many null values. I want to handle these missing values and to do so I want to use cols B and C to help. This way I can get a value which is as realistic as possible w.r.t the given dataset.
if you have enough examples, you could create a small supervised model to infer those values but it might involve a bit of work. But since you already have all the data structured it could be worth it
thanks for the idea👍
one idea is you can groupby some other column and use the modes of each group to fill instead
oh! thats actually a very nice approach. Thanks alot🙌
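A rough sketch of the group-wise mode idea, written in plain Python so it doesn't depend on any particular dataframe library. The rows and the column names A and B are toy data; in pandas the same thing would usually be a groupby followed by a fill.

```python
from collections import Counter, defaultdict

# Toy rows: "A" is the categorical column with gaps, "B" is the grouping column.
rows = [
    {"A": "red", "B": "x"},
    {"A": "red", "B": "x"},
    {"A": None, "B": "x"},
    {"A": "blue", "B": "y"},
    {"A": None, "B": "y"},
    {"A": "blue", "B": "y"},
    {"A": "red", "B": "y"},
]

# Count the observed values of A separately within each group of B.
counts = defaultdict(Counter)
for row in rows:
    if row["A"] is not None:
        counts[row["B"]][row["A"]] += 1

# Most frequent value of A per group.
group_mode = {
    group: counter.most_common(1)[0][0] for group, counter in counts.items()
}

# Fill each gap with the mode of its own group instead of the global mode.
filled = [
    {**row, "A": row["A"] if row["A"] is not None else group_mode.get(row["B"])}
    for row in rows
]

print(group_mode)
print([row["A"] for row in filled])
```

A group with no observed values at all would still come out as `None` here, so a fallback to the global mode may be needed on real data.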
afternoon everyone. can anyone recommend alternatives to the mml book? or website which lists all the topics to learn for ML (excluding the stats)?
hey guys
Take a look at the pinned posts, I wrote some good ones down there
im studying from khan academy for the Probability and Statistics, u asked. are you sure there can't be Probability and Statistics specially for data analysis coders?
Any idea as to why we use the toarray() function to get the output of the Bag Of Words?
It's hard to tell what you mean without any context, but generally toarray is used to turn a sparse matrix into a normal dense numpy array. Bag-of-words vectorizers (e.g. sklearn's CountVectorizer) return a scipy sparse matrix because most entries are zero, so you call toarray() when you want to look at it or pass it to something that needs a dense array.
Here why do we use toarray()
self - Reminder to not always set CV so high for parameter tuning, unless absolutely necessary
A couple of resources - sqlbolt
i am scrapping and completely redesigning my reward function
cuz obviously something aint working
C:\Users\Adam\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
hey
i have detection model for potholes, which is intended to work on the DVR set up on the driving car
currently I have only class potholes, but I need to diversify it and create new classes based on the size of the potholes : small_pothole, medium_pothole, large_pothole
should I retrain the model, or should I leverage the area of the potholes based on the pixels, such as
bbox_area = (x_max - x_min) * (y_max - y_min)
if bbox_area > area_threshold_large:
return "large_pothole"
elif ...
else...
?
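The area-threshold idea from the question, completed as a small sketch. The threshold values and the function name are placeholders, not recommendations. One caveat to keep in mind: pixel area depends on how far the pothole is from the camera, so a large pothole far away can look small in the frame.

```python
def classify_pothole(
    x_min,
    y_min,
    x_max,
    y_max,
    area_threshold_medium=2_000,   # placeholder value, needs tuning
    area_threshold_large=10_000,   # placeholder value, needs tuning
):
    """Label a detected pothole by its bounding-box area in pixels."""
    bbox_area = (x_max - x_min) * (y_max - y_min)
    if bbox_area > area_threshold_large:
        return "large_pothole"
    elif bbox_area > area_threshold_medium:
        return "medium_pothole"
    else:
        return "small_pothole"


print(classify_pothole(0, 0, 10, 10))     # area 100
print(classify_pothole(0, 0, 100, 50))    # area 5000
print(classify_pothole(0, 0, 200, 100))   # area 20000
```

This keeps the single-class detector as is and adds the size label as a post-processing step, so no retraining is needed to try it out.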
Hey data people
As a non passionate data analysis student, what do you guys think I need to understand when I'm finding it hard to have focus on a project, that sometimes I lose motivation while making one?
How to do that ?
Maybe my brain thinks everything is hard for it to understand until I do, you need to know that I'm not too sharp
# now lets try to convert all the cells in the date column into dates via to_datetime()
import pandas as pd
df = pd.read_csv('dirtydata.csv')
df["Date"] = pd.to_datetime(df["Date"])
print(df.to_string())
I am getting error, can someone tell me whats wrong here
i am trying to clean the date formatted in wrong way
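Without the actual error it's hard to say what is wrong, but a common cause is a cell that can't be parsed as a date. A hedged sketch of one way to handle that: `errors="coerce"` makes `pd.to_datetime` return `NaT` for unparseable cells instead of raising. The column contents below are invented, since the real dirtydata.csv isn't shown.

```python
import pandas as pd

# Made-up stand-in for the "Date" column of the CSV.
df = pd.DataFrame({"Date": ["2020/12/01", "2020/12/02", None, "not a date"]})

# errors="coerce" turns anything unparseable into NaT instead of raising.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Rows that could not be parsed can then be inspected or dropped.
bad_rows = df[df["Date"].isna()]
clean = df.dropna(subset=["Date"])

print(clean.to_string())
```

Looking at `bad_rows` before dropping anything shows which cells were the problem.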
I always get ModuleNotFoundError: No module named 'loss_functions' in the following lines:
File "/home/user/backend-project/core/scripts/classification.py", line 13, in classify_single_label
checkpoint = torch.load(model_path, map_location=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/backend-project/.venv/lib/python3.11/site-packages/torch/serialization.py", line 1025, in load
return _load(opened_zipfile,
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/backend-project/.venv/lib/python3.11/site-packages/torch/serialization.py", line 1446, in _load
result = unpickler.load()
^^^^^^^^^^^^^^^^
File "/home/user/backend-project/.venv/lib/python3.11/site-packages/torch/serialization.py", line 1439, in find_class
return super().find_class(mod_name, name)
Problem is, that this function (which I have used in another project for training my model) is never used in this new API project. Anyone ever experienced something similar?
!traceback Paste the full traceback plz
Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.
A full traceback could look like:
Traceback (most recent call last):
File "my_file.py", line 5, in <module>
add_three("6")
File "my_file.py", line 2, in add_three
a = num + 3
~~~~^~~
TypeError: can only concatenate str (not "int") to str
If the traceback is long, use our pastebin.
But, the fact that you're unpickling an object is likely the problem. What kind of object is it?
You (or someone) trained some model, pickled it, then are trying to load it... but to load it, you need the modules that it uses.
I've trained the model(s). I also know which module is needed, but I was so confused, as it is never used anywhere else outside of training. But pasting the module back to its original location in the new project, where I changed all locations for each file (since I am building the backend now, with a proper api structure), didn't work. Also new locations weren't accepted as well.
Is there a way to find out where the module should be located at?
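The traceback goes through `unpickler.load()` and `find_class`, so here is a small stdlib-only sketch of why that fails. A pickle stores functions and classes by reference (module name plus attribute name), and loading it imports that module under the same dotted name it had at save time. The module name `loss_functions_demo` and the function `my_loss` are made up for the demo.

```python
import pickle
import sys
import types

# Build a throwaway module at runtime and register it so it is importable.
module = types.ModuleType("loss_functions_demo")
exec("def my_loss(pred, target):\n    return (pred - target) ** 2", module.__dict__)
sys.modules["loss_functions_demo"] = module

# Pickling a function stores only a reference: module name + function name.
payload = pickle.dumps(module.my_loss)
name_is_in_payload = b"loss_functions_demo" in payload

# Make the module unimportable, as in a project where the file was moved.
del sys.modules["loss_functions_demo"]
error_message = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as err:
    error_message = str(err)

# Once the module is importable under the original name, loading works again.
sys.modules["loss_functions_demo"] = module
restored = pickle.loads(payload)

print(name_is_in_payload)
print(error_message)
print(restored(3, 1))
```

So the module has to be importable as exactly the name in the error message (here a top-level `loss_functions`), which depends on what is on `sys.path` when the API starts rather than on where the file sat in the old project.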
Traceback will be added via pastebin, as the python deleted my message...
*python bot
Is the module your code or something you installed via pip?
No, custom made (by me, my code) module, consisting of 2 functions. It is really small and only used for the loss calculation in training.
Traceback: https://paste.pythondiscord.com/5WZA
Sorry, missed the last msg, one sec
How is your code laid out? Just test to make sure your module can be imported by a simple program
This would be a good help thread, as I'm leaving in a few minutes #❓|how-to-get-help
What activation function I can use on hidden nodes and output nodes?
hidden nodes: nearly any can work, there isn't one clear winner for all cases, but ReLU is a popular choice
output nodes: if you mean the final output, very problem dependent but much of the time you wouldn't include one - The output of the model has to match the properties of your target variable
Most tutorials should mention which activation functions you should use for a particular problem, and you can always check which ones popular architectures are using
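A small sketch of the usual pairings, in plain Python so the formulas are visible. The example values are arbitrary; which output activation fits depends on the target, as said above.

```python
import math


def relu(x):
    """Common hidden-layer choice: pass positives through, clamp negatives to 0."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes one value into (0, 1); typical for a binary-probability output."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1 (multi-class output)."""
    shifted = [x - max(xs) for x in xs]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, -1.0, 0.5]  # arbitrary pre-activation values

hidden = [relu(z) for z in raw]   # hidden nodes
binary_prob = sigmoid(raw[0])     # binary classification output
class_probs = softmax(raw)        # 3-class classification output
regression_out = raw[0]           # regression: often no activation at all

print(hidden)
print(binary_prob)
print(class_probs)
```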
neurips does double blind and open review (the papers are put up somewhere where anyone can comment and leave feedback, plus reviewers are assigned to it)
as for the quality... it's a huge conference with tens of thousands of submissions, so there have been concerns with the quality of the reviews
how do i find bends in this image?
you'll have to be more specific about what you are asking
Yes, this table represents the correlations between each variable, the correlation is 0.389583. What exactly do you not understand?
On the diagonal you're computing the correlation between a variable and itself, which is obviously 1 (look at the formula again if you're unsure)
Afterwards you compute the correlation between weight and height, this is exactly the same as computing the correlation between height and weight (look at the formula again if you're unsure about this one as well)
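Both properties can be checked directly from the formula. The height and weight numbers below are invented, so the resulting correlation will not match the 0.389583 from the table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample data.
height = [150, 160, 170, 180, 190]
weight = [52, 70, 61, 85, 78]

diag = pearson(height, height)   # a variable with itself
hw = pearson(height, weight)
wh = pearson(weight, height)

print(diag)
print(hw, wh)
```

The diagonal entry comes out as 1 because the covariance of a variable with itself is its variance, and swapping the arguments only swaps the factors inside each product.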
do I just replace my SingleHeadedAttention with this? which one? try with both?
what's x_bcd 👀
(what are all these attention types)
it's moments like that that make me wonder, well, surely I'm in the valley of despair, but maybe it's a logarithmic scale
Hey i am new to pytorch and trying to train and evaluate a basic model. The training is not working very well, is there someone here I could ask some questions in the DMs about this? Maybe get some tips and hints.
Heads up, nobody will answer in DMs. It's best to just describe your issue here and if someone can help they will help
mmm, I use multiheaded attention, the singleheaded attention class is so I can expand it more easily and such or is that what you meant?
what? I'm pretty sure I'm doing it pretty much how it's described
I don't remember anything about that 
the shape of the output of the multiheaded sublayer is the same as the shape of a single head
is that not what you mean?
the HIDDEN_SIZE is the size of a single head
yeah, that's what I have
concat dotted with the fc layer
it's for a single head
yeah, so the input gets fed to each head, each of which has a hidden size of 64, each of them output a shape of (64, 64), that gets concatenated, so you get (64, 512) that then gets dotted with (512, 64) and the output of the whole multiheaded sublayer is (64, 64)
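The shapes from that message, traced in a minimal numpy sketch. The attention inside each head is the standard scaled dot-product form, which is an assumption about the poster's implementation; all weights are random placeholders and the constant names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, EMBEDDING_DIM, HIDDEN_SIZE, NUM_HEADS = 64, 64, 64, 8

x = rng.normal(size=(SEQ_LEN, EMBEDDING_DIM))


def single_head(x, w_q, w_k, w_v):
    """Scaled dot-product attention for one head."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Every head receives the same input but has its own weights.
head_outputs = []
for _ in range(NUM_HEADS):
    w_q, w_k, w_v = (
        rng.normal(size=(EMBEDDING_DIM, HIDDEN_SIZE)) * 0.1 for _ in range(3)
    )
    head_outputs.append(single_head(x, w_q, w_k, w_v))

concat = np.concatenate(head_outputs, axis=-1)  # (64, 512)
w_out = rng.normal(size=(NUM_HEADS * HIDDEN_SIZE, EMBEDDING_DIM)) * 0.1  # (512, 64)
out = concat @ w_out  # (64, 64)

print(head_outputs[0].shape, concat.shape, out.shape)
```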
mmm, should it? it just duplicates the input for each head pretty much
I was also reading this alongside the paper: https://jalammar.github.io/illustrated-transformer/
Are you guys aware of any hackathons where aiml skills are used?
I only hear web dev guys going and rocking there
i just wanna highlight that you don't need it to be a rectangular matrix for it to be a projection
(in fact rectangular matrices don't define projections at all, but the resulting vector space is isomorphic to a low dimensional subspace of the domain)
that it does, yes, but still you don't need it to be rectangular for that to happen
you also don't
you noted yourself that there is an implementation that uses dropout instead of those rectangular matrices
dropout is a projection onto a low dimensional subspace. a proper one, too. square matrix and idempotent, rank deficient
an identity with missing 1s on the diagonal
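A quick numpy check of those properties for one fixed dropout mask. The mask is hand-picked, and the sketch ignores the rescaling that dropout implementations usually apply during training (with the rescaling the matrix is no longer exactly idempotent).

```python
import numpy as np

# Hand-picked keep/drop pattern: 1 keeps a coordinate, 0 drops it.
keep_mask = np.array([1.0, 0.0, 1.0, 1.0, 0.0])

# "An identity with missing 1s on the diagonal".
P = np.diag(keep_mask)

is_square = P.shape[0] == P.shape[1]
is_idempotent = np.array_equal(P @ P, P)   # applying it twice changes nothing
rank = int(np.linalg.matrix_rank(P))       # 3 kept coordinates out of 5

v = np.arange(1.0, 6.0)   # [1, 2, 3, 4, 5]
projected = P @ v         # dropped coordinates become 0

print(is_square, is_idempotent, rank)
print(projected)
```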
they really don't
good for them, but they don't need to be
here
by enforcing low rank with some other condition
e.g. requiring the matrix to be diagonal and only a specified number of nonzero entries
implementation in paper and in code are two different things as well
in the example i gave you above of punctured identity mats, you can use a sparse array representation that is pretty efficient
the matrix also has square root as many parameters. win-win
i'm not bringing it up to be pedantic, but rather to highlight that there are multiple ways of achieving the same effect, and as you noted, not all of them have been tried. they have different properties and entail different costs. the rectangular approach certainly achieves the effect you want, but i don't want either of you to be chained to that approach and/or wonder why all of a sudden people do something different in another paper
the main idea is low dimensional subspaces, and you can get those in more than one way
ok, that visualization really messed with me
I just have this pretty much
alright
here's what I understand
in_features of each head correspond to embedding dimensions
which is what I have
it just happens that they are the same value
alright, in that case, I have the architecture implemented correctly I just need to ensure that embedding dim == number of heads * hidden size
phew
alright, but each head still receives the same input, right? like it's duplicated across each head, just each head has its own weights
alright, I see now, I misunderstood you at first
Has anyone ever installed onnxruntime for armv7 architecture
Hey so I was deploying a simple model with the following code in the predict function
```python
item = Item(
    prompt=prompt,
)
messages = [
    {
        "role": "system",
        "content": "You are an expert programming assistant",
    },
    {"role": "user", "prompt": item.prompt},
]
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    stop_sequence="<|im_end|>",
)
return outputs[0]["generated_text"][-1]["content"]
```
and I'm getting this error when calling the function:
```
{
  "run_id": "5e5b8f00-2587-93e4-96f5-bf23009ee062",
  "result": {
    "error": "When passing chat dicts as input, each dict must have a 'role' and 'content' key."
  },
  "run_time_ms": 26864.662170410156
}
```
My function call was just a simple:
```{
"prompt": "Program to add 3 numbers"
}```
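The error text points at the message dicts: the user entry (`{"role": "user", "prompt": item.prompt}`) has a `prompt` key where a `content` key is expected. A minimal sketch of the likely fix; `build_messages` and `has_role_and_content` are hypothetical helpers for illustration, and the check only mirrors the wording of the error, it is not the library's own validation:

```python
def build_messages(prompt: str) -> list[dict]:
    """Build chat messages; every dict carries both 'role' and 'content'."""
    return [
        {"role": "system", "content": "You are an expert programming assistant"},
        # The code above used the key "prompt" here, which matches the error.
        {"role": "user", "content": prompt},
    ]


def has_role_and_content(messages: list[dict]) -> bool:
    """Hypothetical check mirroring the error text (not the library's code)."""
    return all("role" in m and "content" in m for m in messages)


broken = [{"role": "user", "prompt": "Program to add 3 numbers"}]
fixed = build_messages("Program to add 3 numbers")
print(has_role_and_content(broken), has_role_and_content(fixed))
```

Passing the output of `build_messages(item.prompt)` to `pipe(...)` in place of the original `messages` list should be enough to get past this particular error.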
what is x_bcc supposed to be?
I'll assume dotproducts_bcc
well, it is
in the code you sent
in softmax
lol, what
also how is this supposed to work if the matrices are not square?
metric_dd = (self.coefmatrix_dd.weight + self.coefmatrix_dd.weight.transpose(-1, -2)) / 2
no, but why are you in vr 😄
well, I changed it
but why wouldn't it generalize over non-square matrices?
alright, I named it EMBEDDING_DIM and HIDDEN_SIZE btw
I assume K is the head hidden size
well, not to me
at least for now...
wait, why did it change to D, K for kk?
alr
kd seems to be the right one
also bit of a technicality, but why not use .forward and set bias=False or use a Parameter(Tensor(...))
any recommended projects for data analyst or data scientist
anyway, it's running, but I changed it to projection_kd instead of transposing
makes sense, but then why not just create a Tensor param
ah, I see
makes sense
can someone guide me where to start with data science? currently i know python and some bash, can bash be helpful? where should i look for algorithms and such...
I know python is not the only necessary tool to use, but i want to begin from somewhere
bash can probably be useful for deployments
it can indeed, um actually, are there any fundamental steps i should take before getting into data science? i can decide for bash later on, i dont think its necessary right now
probably not necessary rn, no
you can take a look at the pinned messages here
same for @warm pebble
ok I'll take a look at them, thanks
something seems off with that testing accuracy
does it mask everything it needs to mask though?
doesn't it need to mask below the sequence as well?
I mean like this
also as I understand masking is done only for the decoder column, right?
uhhh, is it because packing the sequence is gonna throw away the rest?
what's in those places then? the c has to have the same shape as the rest of everything regardless of the actual sequence size
now I'm confused
yeah
yes
but like what if there are other lengths in that batch?
actually, gimme a sec, I need to check something, lol
waiiit nooo
it gets padded
the dataset I have has varying lengths of text as far as I'm aware
and it doesn't get preprocessed to cut that down
it just gets padded with zeros
now, I'm not saying that's how it should be, but that's how it was handed to me
the dataset is a bunch of sentences of varying length, yes
I mean, ig then it makes sense to do this if I do have varying lengths of sequences
classification as in sentiment analysis?
mmm
I'm slightly veering into the RNN territory again, thinking of those architectures, lol
I think I saw those graphs where you thought there might be a leak
the only issue is that I haven't found a way besides using a for loop to mask that other area...
and that makes the training incredibly slow
sth about that test accuracy ain't looking good still
status update
alright, I have a feeling I messed up somewhere...
it appears the issue was too many dimensions for the embeddings?
yeah, that seems to be it, dam
I also just found the median sentence length and just sliced all the sentences to that length (that were long enough) so I have a fixed context size across the entire dataset so I don't have to use that dang slow for loop to do this whole thing, I can just use the triangle mask
I can't believe that doing what I did just now made it so much much faster, like, what it took a couple minutes to reach similar accuracy as before when it took like 6 hours...
so anyway, this is what I got with my model (but like, improved as of today)
this is this, it took it longer to reach similar accuracy as you can see, but it was certainly more interesting to see how the attention matrix developed over time (or maybe that's because it just took longer to develop, lol)
and this is for this, as you had said, it's rather similar to what I have
I modified the code a bit to do the projection thingy
class SingleHeadedAttention(torch.nn.Module):
    def __init__(self, mask: bool):
        super().__init__()
        self.projection_kd = torch.nn.Linear(in_features=EMBEDDING_DIM, out_features=HIDDEN_SIZE)
        self.coefmatrix_kk = torch.nn.Linear(in_features=HIDDEN_SIZE, out_features=HIDDEN_SIZE)
        self.mask = mask

    def forward(self, x_bcd, lengths, y=None, soft_attention=False):
        x_bkc = self.projection_kd.weight @ x_bcd.transpose(1, 2)
        x_bck = x_bkc.transpose(1, 2)
        dotproducts_bcc = x_bck @ self.coefmatrix_kk.weight @ x_bkc
        if self.mask:
            seq_size = x_bck.size(1)
            mask = torch.tril(torch.ones(seq_size, seq_size)).to(DEVICE)
            dotproducts_bcc = dotproducts_bcc.masked_fill(mask == 0, value=-torch.inf)
        dotproducts_bcc = torch.softmax(dotproducts_bcc, dim=-1)
        y_bcd = dotproducts_bcc @ x_bck
        if soft_attention:
            return y_bcd, dotproducts_bcc
        else:
            return y_bcd
```
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
```
2 Physical GPUs, 2 Logical GPUs
why am i getting this?
i have only one gpu on my system
also got this error when i started to train my model
https://pastebin.com/KHJDMfpe
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
Does anyone know the best Python library to under-sample the regression dataset to deal with an imbalanced dataset? I've tried resreg, but it's not really helpful since I can't control the under-sampling dataset, and the other one is smoter, which is extremely slow.
i'm not the one who created the dataset.
Don't undersample imo
You're better off using some sort of cost function
They added this in the latest version of sklearn. Now you don't need to do it manually
Before I'd have written it out in full but it seems they even have a guide that explains my point well now 😄
https://scikit-learn.org/stable/auto_examples/model_selection/plot_cost_sensitive_learning.html
thanks i'll look into it
hey how to decide which scaling to apply on data?
Models that use gradient descent or L2/L1 regularization need some sort of scaling, frequently standard scaling is applied. Imo it's a nice and easy exercise to figure out why that's the case (just look at the equations).
Models that don't fall into this category (famously, tree based models) don't necessarily need scaling but I frequently do it anyway.
I'd say the biggest downside of standard scaling is that it isn't built on robust statistics. If you have outliers they can skew your mean and standard deviation. That would be an argument for using a different scaler (e.g., a robust scaler based on the median and IQR). Defaulting to standard scaling is a good idea though.
can i show you snippet of data ?
sure
1st and 2nd channels are in exponential distribution and 3 and 4th channel are in normal distribution
I'd start by standard scaling all of them
It'd be great if you figure out why (it's not a hard exercise)
to bring it in same range?
So really just take the equation of MSE loss + L2 regularization
just saying but I'm masking only in decoder's self-attention, the decoder-encoder attention doesn't do masking and neither does encoder's self-attention
And think about "what happens to my regularization term if my variables are on different scales"
this one
oh ok it would be biased towards features with larger scale
Close, but it's the opposite. Small variables will have a large weight and would contribute disproportionately to the cost
got it!
And if you look at the update for (stochastic) gradient descent the same thing comes up.
ok so its directly proportional to magnitude of weights
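The point about the regularization term can be checked numerically. A minimal sketch, assuming a single-feature least-squares fit with no intercept (the data and the helper `fit_weight` are made up for illustration): putting the same feature on a scale 1000 times smaller makes the fitted weight 1000 times larger, so its L2 penalty grows by a factor of a million.

```python
def fit_weight(xs, ys):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # y = 2 * x exactly
xs_small = [x / 1000 for x in xs]  # same feature on a much smaller scale

w = fit_weight(xs, ys)
w_small = fit_weight(xs_small, ys)

# The L2 penalty is the squared weight, so the small-scale feature is
# penalized far more even though it carries the same information.
penalty, penalty_small = w ** 2, w_small ** 2
print(w, w_small, penalty, penalty_small)
```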
And to come full circle, the way you normalize is typically standard scaling but if you have outliers you may also use a robust scaler (median/IQR) or something that won't destroy your data.
seems like u have a decent exp in RAG in general, i need some review over a few things,
would be grateful if u could enlighten me with that
You should be trying this out with numpy to get the intuitions. Make a very large number in an array of randomly generated numbers with a certain mean and check what it does to your mean and stdev
outliers are important in my data
it's real life data so any anomaly in it should be learned by the model
Sure, but try some of the things out I mentioned. Either empirically (with numpy) or by looking at the equations
Then the whole scaling thing will be clear to you
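The experiment suggested above is quick to run. A minimal sketch using the standard library's `random` and `statistics` in place of numpy; the distribution parameters and the outlier value are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(1000)]
with_outlier = data + [1_000_000.0]  # one very large value

for name, values in [("clean", data), ("with outlier", with_outlier)]:
    print(name, statistics.mean(values), statistics.stdev(values), statistics.median(values))

# How much each statistic moves after adding a single outlier.
mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = abs(statistics.median(with_outlier) - statistics.median(data))
stdev_ratio = statistics.stdev(with_outlier) / statistics.stdev(data)
```

The mean and standard deviation move a lot while the median barely changes, which is why the choice of scaler matters when outliers are present.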
thank you that was really helpful
how can I find the intersection 3 indexes in pandas? the Index.intersection method only accepts to check 1 other index.
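One way to do this, as a sketch: `Index.intersection` returns an `Index`, so the calls can be chained, or folded over a list with `functools.reduce`. The index values here are made up for illustration.

```python
from functools import reduce

import pandas as pd

idx_a = pd.Index([1, 2, 3, 4])
idx_b = pd.Index([2, 3, 4, 5])
idx_c = pd.Index([3, 4, 6])

# Chain the calls directly...
common = idx_a.intersection(idx_b).intersection(idx_c)

# ...or fold over any number of indexes.
common_reduce = reduce(lambda left, right: left.intersection(right), [idx_a, idx_b, idx_c])

print(list(common))
```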
I, uhh, lost them, btw, how do I combine the attention from all heads in all layers in all blocks?
cuz, I'm gonna have to rerun it, cuz like, yeah... I'll also make the attention look nicer and across multiple sequences and add the sequence values as well
I mean, it converged pretty quickly
so it won't take long once I set it up
wait
but attention is bcc
yes
that's what I meant
I realize how it could've been misunderstood
no, I want to know how to combine the attention score matrices from all heads in the whole network
well, yes
but like, what attention scores do you want them?
I've been using one of the heads of the last encoder-decoder attention sub layer
I have basically no experience with RAG
oh right... that's very interesting
but then how do u know that much
but how do you do a decoder without the encoder-decoder sub-layer?
isn't that just an encoder with masking or sth?
dam
what is encoder + decoder used for then?
translation?
mmmmm, right
I thought you'd do the same with next token, though it did make me wonder why the same input is embedded in different embeddings 😁
(nope)
I have very surface level understanding of RAG, so maybe it just seems like I know a ton about it
mmm, I see
ya possible
i have a very surface level question for u then
is RAG just feeding the entire context to a transformer and then you just append your question to the end of that and it does next token prediction?
You take one or many documents, slice them into pieces, embed and store them. Get a query, embed your query, do a similarity search and append the contents of the most similar document fragments to the context
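The retrieval step described here can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the "store" is a plain list, and the chunks, query, and prompt layout are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "the cat sat on the mat",
    "gradient descent updates the weights",
    "paris is the capital of france",
]
query = "what is the capital of france"
context = retrieve(query, chunks, k=1)

# The retrieved fragments are placed in the context ahead of the question.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
print(prompt)
```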
nope
oh and then it just does next token based on that context?
After a long discussion with Edd and Etrota on this topic ...
it's basically what I learnt in search engines and information retrieval years ago
but with LLMs 🤷
My experience with RAGs is to the extent that I've made one with AWS bedrock to see wazzup
alright, I'll cover RAG at some point in the future, I'm here trying to even understand transformers to a reasonable extent, they're too magical... and I mean, they literally are since it's not like we actually understand how they work, do we?
^ that's what I always say
I think understanding the intuitions of self attention, multihead attention and so on isn't too hard but I'm not always convinced our intuition of the methods is aligned with how and especially why they work
well, we might have an "intuition"
so i've implemented these things (room for improvement)
I manually coded it but in the meantime LangChain has built-in classes and functions for it
Should i use it or my manually coded one
this is the working thing.
just need review
so, basically I was trying to translate English to English... that's hilarious
@spring field @past meteor maybe have a look, please?
i want u to review it
I'd prefer it if Maud'dib would have a look in my stead
I'm not sure what to review there... as I said, I have a surface level understanding of RAG
Make it blue
hmm
I mean the code looks alright at first glance
ya and it's working but the fear is i coded it manually
and aint using langchain libraries/classes for ReOrdering Text
also that format_docs function is a bit pointless
probably, it was on the docs
i have no as such exp in this field 😦
but our product is fully based on this
i will go bankrupt
just to recap
next token prediction with transformers is done using only encoders, but you mask the self-attention sub-layer?
because I see structures with "decoder-only", but like, that implies throwing out the whole encoder-decoder sub-layer and I don't like it
decoder-only doesn't make sense, it has to be a hybrid between the two, according to the paper
or can you also not mask the attention?
hm
okay okay
can we draw a conclusion out of this
so i can work on it
💀 please?
but if you mask it, your attention score is also masked? or did I forget the order of masking
No errors,
Want to make it better
no errors,
all good and working
need to make it
Better output from RAG
More relevant, more precise, just better
so true
oh, do you do the whole dot prod and softmax, then use that as the attention score and then mask the thing that's going to output?
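For reference, the `SingleHeadedAttention` code shared earlier applies the mask to the dot products (filling with `-torch.inf`) before the softmax, so masked positions end up with weight 0 and each row still sums to 1. A minimal pure-Python sketch of that order of operations; `causal_softmax` and the scores are hypothetical and only illustrate the idea:

```python
import math


def causal_softmax(scores: list[list[float]]) -> list[list[float]]:
    """Mask future positions with -inf, then apply a row-wise softmax."""
    out = []
    for i, row in enumerate(scores):
        # Position i may only attend to positions j <= i.
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        peak = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


scores = [[1.0, 2.0, 3.0], [1.0, 1.0, 5.0], [0.0, 0.0, 0.0]]
attention = causal_softmax(scores)
print(attention)
```

Masking after the softmax instead would leave rows that no longer sum to 1 unless they are renormalized.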
GL
hm
right, makes sense
I just may have some knowledge in some other areas, like 10 minutes ago I found out that RAG does similarity search
💀
more interesting.
sth like that
Is it possible to create a Network in pytorch that only uses a int-datatype, so the weights, input and output are all int? I've tried to make this work but it always returns
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
I mean, ig you can disable the require_gradients and do them yourself, but why would you use ints anyway?
i need really fast int8_t operations in c++ with simd avx2 instructions
but why ints?
simd instructions can handle floats as well
ig not as many as int8_t at once, but yk
why do you even need simd, it won't help you that much, better if you can run stuff on a GPU
tbf a lot of times float32 operations on intel chips can be faster than their integer versions
with a lot of integer operations you have the caveat of the overflow handling which normally limits how many lanes you can actually use even if you can hold more
also with int8_t you might easily run into overflow issues with neural nets
yeah, normally you end up going 16 x int8 ops instead of 32 x int8 ops moving up into int16 results
unless you don't care about the saturation or overflows, but then that can cause behaviour differences between archs
also some things like any integer division is incredibly expensive compared to fp
i thought of int8xint8 = int16 and then as activation function clamp [0,127] and so on
classical derivatives are only defined over the reals, not over the ints
you wouldn't get correct results with autograd in general, since derivatives could generally map into the rationals or reals
rounding/casting after differentiation is also generally not correct
probably worth a note though
if you want the best speed
Your buffers need to be aligned to 64 bytes
and your operations need to cut the branching down so you're doing about 64 values per loop call
yes i have already tested with float and i have aligned it to the cache size
Also probably want an AMD cpu rather than intel most of the time
template<uint64_t inputSize, uint64_t outputSize>
void layer_32(Layer<inputSize, outputSize>& layer_16, std::array<float, outputSize>& output, const std::array<float, inputSize> input)
{
    alignas(64) float arr[8];
    for (uint64_t j = 0; j < outputSize; ++j) {
        output[j] = layer_16.bias[j];
        for (uint64_t i = 0; i < inputSize; i += 32) {
            __m256 _weights0 = _mm256_load_ps(&layer_16.weights[j][i]);
            __m256 _weights1 = _mm256_load_ps(&layer_16.weights[j][i + 8]);
            __m256 _weights2 = _mm256_load_ps(&layer_16.weights[j][i + 16]);
            __m256 _weights3 = _mm256_load_ps(&layer_16.weights[j][i + 24]);
            __m256 _input0 = _mm256_load_ps(&input[i]);
            __m256 _input1 = _mm256_load_ps(&input[i + 8]);
            __m256 _input2 = _mm256_load_ps(&input[i + 16]);
            __m256 _input3 = _mm256_load_ps(&input[i + 24]);
            __m256 out0 = _mm256_mul_ps(_weights0, _input0);
            __m256 out1 = _mm256_fmadd_ps(_weights1, _input1, out0);
            __m256 out2 = _mm256_fmadd_ps(_weights2, _input2, out1);
            __m256 out3 = _mm256_fmadd_ps(_weights3, _input3, out2);
            __m256 temp = _mm256_hadd_ps(out3, out3);
            temp = _mm256_hadd_ps(temp, temp);
            _mm256_store_ps(arr, temp);
            output[j] += arr[0] + arr[4];
        }
    }
}
this is my matrix multiplication
for a input of 1d and output of 1d array
but with int8 i could make this way faster, the only thing i need is to some how train a model that is accurate enough
you are losing a ton of performance with how you have structured your ops btw
and the copying of memory per iteration
i dont quite follow
did you look into popular quantization and distillation methods first?
so SIMD instructions are basically 1 instruction, but they are not 1 instruction = 1 cycle
and sometimes multiple SIMD instructions can be executed within 1 cycle
which is where the whole "uops" stuff comes about
im currently experimenting with this but i havent made it work yet
but currently, you are bottlenecking yourself at major points from a quick glance:
__m256 out0 = _mm256_mul_ps(_weights0, _input0); becomes your dependency on the execution below so
__m256 out1 = _mm256_fmadd_ps(_weights1, _input1,out0);
__m256 out2 = _mm256_fmadd_ps(_weights2, _input2,out1);
__m256 out3 = _mm256_fmadd_ps(_weights3, _input3,out2);
Each step here is now having to wait on the previous instruction to finish before executing the next
so instead of the CPU being able to do this step in 2 cycles, its execution time jumps to 4 cycles (normally) and you have the added latency for each instruction
which is normally ~7-10
i was testing this implementation; before, i had:
```
template<uint64_t inputSize, uint64_t outputSize>
void layer_32(Layer<inputSize, outputSize>& layer_16, std::array<float, outputSize>& output, const std::array<float, inputSize> input)
{
    alignas(64) float arr[8];
    for (uint64_t j = 0; j < outputSize; ++j) {
        output[j] = layer_16.bias[j];
        for (uint64_t i = 0; i < inputSize; i += 32) {
            __m256 _weights0 = _mm256_load_ps(&layer_16.weights[j][i]);
            __m256 _weights1 = _mm256_load_ps(&layer_16.weights[j][i + 8]);
            __m256 _weights2 = _mm256_load_ps(&layer_16.weights[j][i + 16]);
            __m256 _weights3 = _mm256_load_ps(&layer_16.weights[j][i + 24]);
            __m256 _input0 = _mm256_load_ps(&input[i]);
            __m256 _input1 = _mm256_load_ps(&input[i + 8]);
            __m256 _input2 = _mm256_load_ps(&input[i + 16]);
            __m256 _input3 = _mm256_load_ps(&input[i + 24]);
            __m256 out0 = _mm256_mul_ps(_weights0, _input0);
            __m256 out1 = _mm256_mul_ps(_weights1, _input1);
            __m256 out2 = _mm256_mul_ps(_weights2, _input2);
            __m256 out3 = _mm256_mul_ps(_weights3, _input3);
            out0 = _mm256_add_ps(out0, out1);
            out2 = _mm256_add_ps(out2, out3);
            out0 = _mm256_add_ps(out0, out2);
            __m256 temp = _mm256_hadd_ps(out0, out0);
            temp = _mm256_hadd_ps(temp, temp);
            _mm256_store_ps(arr, temp);
            output[j] += arr[0] + arr[4];
        }
    }
}
```
i would have to test the exact performance but running this on random data a million times both versions take around the same time
but these are small changes that could increase performance just a bit, my main focus is if i can make int8xint8 = int16 work with acceptable accuracy
i should also mention that my first layer consist of 0 and 1, so i dont even need to convert them
I mean you can do it, but like you said I'm not sure how well your accuracy is going to carry over
i will try and see if i get good results, ty
What about just training on float16 and quantising to integers of whatever you want
thats what im currently trying
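A rough sketch of the "train in float, quantise afterwards" idea combined with the int8 x int8 -> wider accumulator -> clamp [0, 127] scheme mentioned earlier. Everything here is an assumption for illustration: symmetric quantisation, the scale of 1/64, the right shift, and the helper names `quantize` and `int_dot_clamped` are not taken from anyone's actual code.

```python
def quantize(values, scale):
    """Round to the nearest step of size `scale` and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int_dot_clamped(w_q, x_q, shift):
    """Integer dot product, rescaled by a right shift, clamped to [0, 127]."""
    acc = sum(w * x for w, x in zip(w_q, x_q))  # wider accumulator than int8
    return max(0, min(127, acc >> shift))


weights = [0.5, -0.25, 1.0]             # float weights from training
inputs = [1, 0, 1]                      # first-layer inputs are 0/1, as in the chat
w_q = quantize(weights, scale=1 / 64)   # 0.5 -> 32, -0.25 -> -16, 1.0 -> 64
out_q = int_dot_clamped(w_q, inputs, shift=0)
print(w_q, out_q, out_q / 64)
```

Dividing `out_q` by 64 recovers the float result here (1.5), which gives a cheap way to compare the integer path against the float model layer by layer.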
anyone know YOLO ? and have any experience into that ? i just need help
what is the best dataset(s) for people who are kind of new to NLPs?
Thank you
what about animequotes or hate speech?
what is the main difference between CountVectorizer and TfidfVectorizer?
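In short, a count vectorizer stores raw term counts, while a tf-idf vectorizer reweights those counts so that terms appearing in many documents count for less. A minimal standard-library sketch of the difference; the three documents are made up, and the smoothed idf plus L2 normalisation follow the defaults described in the scikit-learn documentation as far as I know, so treat the exact numbers as illustrative:

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab = sorted({w for d in docs for w in d.split()})

# Count vectors: raw term frequencies per document.
counts = [[Counter(d.split())[w] for w in vocab] for d in docs]

# Document frequency: in how many documents each term appears.
df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
n = len(docs)
# Smoothed inverse document frequency: rare terms get a larger weight.
idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]


def l2_normalize(row):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


tfidf = [l2_normalize([c * w for c, w in zip(row, idf)]) for row in counts]
print(vocab)
print(counts[1])
print(tfidf[1])
```

In the second document "dog" and "the" both have a count of 1, but "dog" gets the larger tf-idf weight because it appears in only one document.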
why am i here
religions attempt to answer this question. personally, I think life is more fulfilling if you don't try to ascribe purpose to your existence, and just spend it doing things that are fulfilling for you.
i have no words
why do you think you're here?
because i clicked a link
why did you do it?
because my actions were predestined 50,000 years before the creation of the universe
Interesting. Anyway, this is the data science channel, so let's talk about that going forward.
I just finished my first NLP, I think it is trash. How do I judge it objectively?
like, what are the tiers of skill in ML/data science whatever
nothing is "an NLP". NLP is a concept.
What did you create?
BCE can only be used when you have two classes at least for that particular output, right?
am I overthinking this or can BCE only be used for that one case where you have to predict between one of two classes? is that it? can it at all be used for multi-class classification?
ViT + TokenLearner 🤭
yeah
- TokenLearner
which ig is really helpful if you have more transformer layers, greater hidden size, and a larger dataset than I do (apparently too many dimensions with small datasets worsens the performance in my current experience)
like yesterday or mby it was early morning today when I was running those gpts for next token prediction, using a hidden size of 512 basically made it so it didn't learn at all (now thinking back, maybe it could've been caused by using encoder + decoder for next token prediction...), anyway, I reduced the hidden size to 128 (16 per head) and it immediately started learning again
embedding dimensions
how many dimensions a token is embedded into
mhm, that might have been what's happening
I assume it's been tried, but what about using a quadratic instead of a linear function? instead of Linear(x, M, b) = Mx + b use sth like Quadratic(x, A, B, c) = Axx^T + Bx + c
the linear layers with quadratic layers? it probably has to be differentiated twice though of which the second time is gonna be a const anyway.... hmmm
If you're in doubt about good initial hyperparameters a good thing you can do is make them too large and fit on a single batch
Your loss should go to 0, if it doesn't you have a bug. It's kind of a unit test for your architecture 😄
Aside from time (how long it takes to train it all) starting big and going smaller is a good idea
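The single-batch check can be sketched without any framework. A toy example, assuming a linear model trained with plain gradient descent on one made-up batch; with a real network the same idea applies, only the model and optimizer change:

```python
# One tiny batch generated from y = 3x + 1 (made-up data for illustration).
batch_x = [0.0, 1.0, 2.0, 3.0]
batch_y = [1.0, 4.0, 7.0, 10.0]

w, b, lr = 0.0, 0.0, 0.05


def mse(w, b):
    """Mean squared error of the model w * x + b on the batch."""
    return sum((w * x + b - y) ** 2 for x, y in zip(batch_x, batch_y)) / len(batch_x)


for step in range(5000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(batch_x, batch_y)) / len(batch_x)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(batch_x, batch_y)) / len(batch_x)
    w, b = w - lr * grad_w, b - lr * grad_b

final_loss = mse(w, b)
print(w, b, final_loss)
```

If the loss on that one batch refuses to approach zero, the model, the loss, or the training loop has a bug worth finding before tuning anything else.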
Any good guides for LSTMs for forecasting time series data?
watch videos on how object detection ais work
Ok
finally got pytorch running with cuda yall arent ready 😈 😈
wanna see it?
I do not know. I have been at this for a while. I base my self-worth on this and never ever stop doing it. I never feel like I am good and with all of this pytorch TensorFlow stuff (not that bad) like, I do not know, I always feel like I am trash. Like, I do not care about money. This data thing is a massive massive obsession. Whatever, everyone is obsessed with something
Like, you go pretty hard. What do I lack?
@serene scaffold it was this piece of garbage https://github.com/nickkatsy/python_ml_ect_/blob/master/hotel_nlp.py
ValueError: The filepath provided must end in `.keras` (Keras model format). Received: filepath=model_output/test01/model-{epoch:02d}-{val_loss:03f}.h5 i ran same code on my system and it worked fine but on kaggle it throws this error
does this mean i have to save the file with .keras extension or can i bypass it?
evidently, it expects a file with a .keras file extension. though if you change filepath=model_output/test01/model-{epoch:02d}-{val_loss:03f}.h5 to filepath=model_output/test01/model-{epoch:02d}-{val_loss:03f}.keras (and change the name of the file to match), you'll get a different error if the file isn't structured the way it expects.
you don't need to be so hard on yourself.
a tip: when working with pandas, assume that there's always a solution that doesn't involve .apply. you've used .apply several times when you should have used native pandas methods.
# not this
df['content'] = df['content'].apply(lambda x: x.lower())
# do this
df['content'] = df['content'].str.lower()
mcp_save = ModelCheckpoint(os.path.join(outdir,'model-{epoch:02d}-{val_loss:03f}.h5'), save_best_only=True, monitor='val_loss', mode='min') its giving error in this line
so here should i add save_format = none
in fact, every time that you use apply before at least line 68 should have been a .str. method.
https://pastebin.com/WXXNeX8t here is the whole code
i searched on google and someone said in latest version they did this to increase the usage of .keras
yeah, I was doing that. I do not know why I was doing the other method without lambda. I do not remember that one. I did like 4 today because I am trying to get them on lock
thank you for this: # not this
df['content'] = df['content'].apply(lambda x: x.lower())
do this
df['content'] = df['content'].str.lower(); Do you mean in general when it comes to cleaning text and stuff? I do not know, I kinda forgot what dataset that was in all honesty.
whenever you're trying to make a Series of strings lowercase, use .str.lower(), not a lambda or apply
Thanks
Hi, isn't this like none of the above?
my orange line is where there's the largest explained variance ratio right?
why not d?
yep i chose D haha thanks
D is basically what i said right?
it's perpendicular to the "/"
technically at least
this is like an ellipsoid in R^2 right?
yeah
okay thanks
is variance and mse calculated the same? 
also why does sample variance divide by n - 1 instead of n?
To correct for sample bias.
You don't have the population variance, only an estimate of it computed from a sample. Ideally that estimate converges to the actual variance as n goes to infinity (that's consistency), and its expected value equals the actual variance for any n (that's what makes it an unbiased estimator).
The actual reason has to do with taking the expected value of the statistic (in this case the sample variance) and checking if it's equal to the population variance. With a divisor of n it isn't, you need a correction to get there. That's where the n - 1 term comes from. It's a typical thing you do in a statistics class, it's been a while for me
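The expected-value argument can be checked exactly on a tiny example. A sketch assuming a made-up population of four values and samples of size 2 drawn with replacement; it enumerates every possible sample and averages both versions of the estimator:

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size

mu = Fraction(sum(population), len(population))
true_var = sum((x - mu) ** 2 for x in population) / len(population)


def sample_var(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)


# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))
mean_div_n = sum(sample_var(s, 0) for s in samples) / len(samples)
mean_div_n_minus_1 = sum(sample_var(s, 1) for s in samples) / len(samples)
print(true_var, mean_div_n, mean_div_n_minus_1)
```

Dividing by n underestimates the variance by a factor of (n - 1)/n on average, while dividing by n - 1 matches the population variance in expectation.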
No they're different
Ah wait, I see what you mean
Yeah that's a correct observation
very cool, thanks
hi, whats a good beginner data science python project?
they don't
they only do if whatever generates your estimate is "unbiased", yielding the correct value in expectation
otherwise the MSE is bias squared + variance
I meant like, it's a mean square over a bunch of differences
yes but the meaning is different
you can get a huge MSE with 0 variance
e.g. if your function just outputs 0 always, regardless of input
I get that (well, I understand that that's the case, I'm unsure of the deeper details)
I just found it surprising the formulas were conceptually the same
on purpose, so that you can study the mean and covariance :p
they both describe second order statistics
oh, is that why it's a square?
yes
https://en.wikipedia.org/wiki/Moment_(mathematics) some people call these "statistical moments"
In mathematics, the moments of a function are certain quantitative measures related to the shape of the function's graph. If the function represents mass density, then the zeroth moment is the total mass, the first moment (normalized by total mass) is the center of mass, and the second moment is the moment of inertia. If the function is a probab...
I'd call this an episode of enlightenment 
you could generally say that the two things (variance and MSE) are "second moments". the variance is the second central moment (subtract the TRUE mean of the random variable). the MSE is the "second moment" (NOT central) of the ERROR
if your estimator is unbiased, then the estimator's mean is the true mean and the error is now centered at zero, turning the MSE into a second central moment (a variance)
the example of the 0 estimator i mentioned before is pretty important because it reminds you that the MSE doesn't tell you the nature of the error. it's up to you to verify later if it's variance or bias
oooh, the puzzle pieces are coming together
at any rate, the point being that the MSE formula looks like the variance formula not because the MSE is a variance, but because both the MSE and the variance are the same kind of object, a statistical second moment
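A small numeric illustration of the two cases discussed above: the estimator that always outputs 0, and a noisy one. The estimates and the true value of 5 are made up, and `decompose` is a hypothetical helper; exact fractions are used so the identity MSE = bias squared + variance can be checked without rounding.

```python
from fractions import Fraction


def decompose(estimates, truth):
    """Return (mse, bias, variance) for a list of estimates of `truth`."""
    k = len(estimates)
    mean = Fraction(sum(estimates), k)
    mse = sum((Fraction(e) - truth) ** 2 for e in estimates) / k
    variance = sum((Fraction(e) - mean) ** 2 for e in estimates) / k
    return mse, mean - truth, variance


# The "always output 0" estimator: zero variance, large MSE, all of it bias.
mse_zero, bias_zero, var_zero = decompose([0, 0, 0, 0], truth=5)

# A noisy estimator: the MSE splits into squared bias plus variance.
mse_noisy, bias_noisy, var_noisy = decompose([4, 6, 5, 7], truth=5)

print(mse_zero, bias_zero, var_zero)
print(mse_noisy, bias_noisy, var_noisy)
```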
unbiased relative to the population, right?
like if you do linear regression and find the true mean, your bias value might not be 0, but still unbiased, is that correct? 
i see what you mean and you could technically define the bias either way
but we normally refer to the true parameters, so bias is defined w.r.t. the true mean and not the population one
this is what the correction factor zestar mentioned addresses
but you're mixing two things up as well
because i can have data of a population
compute the variance of that data
but the data has a true variance
so i can also compute the MSE of the variance
and those are two separate things 😛