#data-science-and-ml

jaunty helm
#

basically I know there's data imbalance, but after down sampling the models still can't predict Enrolled too well, so I'm asking what are some techniques I can apply here (multiclass classification, 1 class performs worse than all others)

cedar tusk
#

from what i see, without feature engineering this data can yield at most .76 accuracy

jaunty helm
#

I guess I'm more asking for techniques that I might not know which can help in this situation
obv good feature engineering probably helps, but are there other methods that I should know about? that kinda stuff

cedar tusk
#

well, if i were you i would begin with feature selection

#

then do either factor analysis or pca

#

to see if you can get more knowledge out of the data

#

to be honest this 3/4 accuracy is related to there being too many categorical variables

#

if there was more numerics it would have been better

noble topaz
#

Hello guys. I have a question. How can I know how many hidden layers a CNN has? I want to build one with 5 hidden layers but I am getting stuck

cedar tusk
#

every line that begins with model.add(Dense is a layer, the last one being the output.

noble topaz
#

I need to do this in pytorch. Thanks for your fast reply

cedar tusk
#

pytorch has a similar syntax

#
import torch
import torch.nn as nn

# Define the neural network model
class BinaryClassificationNN(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 16)  # input features -> 16 units
        self.fc2 = nn.Linear(16, 8)           # 16 units -> 8 units
        self.fc3 = nn.Linear(8, 1)            # 8 units -> 1 output value
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        x = self.sigmoid(x)  # squash the output into the range (0, 1)
        return x
#

this is a binary classification neural net in pytorch

noble topaz
#

3 hidden layers and one output

#

Am i mistaken?

cedar tusk
#

1 input, 1 hidden, 1 out

#

nn.Linear takes the arguments number of inputs, number of outputs

#

and u add a sigmoid to the end to take the value in the last neuron and make it into a probability
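
For the original question about five hidden layers, here is a minimal sketch of what that could look like in PyTorch. It assumes a plain fully connected network rather than a CNN, and the class name and layer widths are made up for illustration. Counting conventions vary; in this sketch every `nn.Linear` except the last one is counted as a hidden layer, which would give the `BinaryClassificationNN` example above two of them (`fc1` and `fc2`).

```python
import torch
import torch.nn as nn


class FiveHiddenLayerNet(nn.Module):
    """Fully connected binary classifier with five hidden layers.

    The class name and layer widths are illustrative assumptions,
    not something taken from the conversation.
    """

    def __init__(self, input_size, hidden_sizes=(64, 32, 16, 8, 4)):
        super().__init__()
        layers = []
        in_features = input_size
        # One Linear + ReLU pair per hidden layer.
        for out_features in hidden_sizes:
            layers.append(nn.Linear(in_features, out_features))
            layers.append(nn.ReLU())
            in_features = out_features
        self.hidden = nn.Sequential(*layers)
        self.out = nn.Linear(in_features, 1)  # output layer

    def forward(self, x):
        # Sigmoid turns the single output value into a probability.
        return torch.sigmoid(self.out(self.hidden(x)))


def count_linear_layers(model):
    """Count every nn.Linear inside the model (hidden layers + output)."""
    return sum(isinstance(m, nn.Linear) for m in model.modules())


model = FiveHiddenLayerNet(input_size=10)
n_linear = count_linear_layers(model)   # 5 hidden + 1 output
probs = model(torch.randn(3, 10))       # batch of 3 samples
```

Changing the length of `hidden_sizes` changes the number of hidden layers without touching `forward`.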

molten elk
#

Here it is:

autumn gull
#

actually im trying to build a personal virtual assistant and ive trained a tfidf vectorizer on a dataset

#

which part of code should i share

past meteor
#

Can you make a confusion matrix? This is the case where it shines

jaunty helm
# past meteor Can you make a confusion matrix? This is the case where it shines

you mean this right?

pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=['Dropout', 'Enrolled', 'Graduate']),
    index=['Dropout', 'Enrolled', 'Graduate'],
    columns=['Dropout', 'Enrolled', 'Graduate']
)

# original
-----SVC-----
          Dropout  Enrolled  Graduate
Dropout       206        29        49
Enrolled       31        52        76
Graduate        9        19       414

# class_weight='balanced'
-----SVC-----
          Dropout  Enrolled  Graduate
Dropout       193        57        34
Enrolled       28        92        39
Graduate       12        83       347

# down sample
-----SVC-----
          Dropout  Enrolled  Graduate
Dropout       134        50        16
Enrolled       30       131        38
Graduate        7        30       163
past meteor
jaunty helm
past meteor
jaunty helm
#

mkay

#

should I use class_weight='balanced'?
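
One way to answer that question empirically is to fit the same model with and without `class_weight='balanced'` and compare per-class recall. The sketch below uses a synthetic three-class dataset from `make_classification` as a stand-in for the real data, so the numbers it prints say nothing about the Dropout/Enrolled/Graduate problem; only the comparison pattern carries over.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced 3-class data (an assumption standing in for the real set).
X, y = make_classification(
    n_samples=1500, n_features=10, n_informative=6, n_classes=3,
    weights=[0.3, 0.2, 0.5], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

results = {}
for class_weight in (None, "balanced"):
    clf = SVC(class_weight=class_weight).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[class_weight] = {
        "confusion": confusion_matrix(y_test, y_pred, labels=[0, 1, 2]),
        # average=None returns one recall value per class.
        "recall": recall_score(y_test, y_pred, average=None, labels=[0, 1, 2]),
    }
    print(class_weight, results[class_weight]["recall"].round(2))
```

Whether the trade-off is worth it depends on which mistakes matter more: the matrices above suggest balanced weights buy Enrolled recall at the cost of Graduate recall.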

past meteor
#

Have you split off some data properly? If so, just looking at those records that are misclassified helps

past meteor
jaunty helm
past meteor
#

train test split followed by cross validation

#

And not infinitely checking the misclassified rows of your test set (that's cheating)
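
A minimal sketch of that workflow, assuming scikit-learn and a synthetic dataset in place of the real one: hold out a test set once, do all model comparison with cross-validation on the training part, and score on the test set only at the end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

# 1) Split off the test set once and leave it alone while iterating.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2) Compare ideas with cross-validation on the training data only.
model = make_pipeline(StandardScaler(), SVC())
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="f1_macro")

# 3) Fit on all training data and check the test set a single time.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
```

Inspecting misclassified rows is then best done on cross-validation predictions from the training data, which keeps the test set honest.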

jaunty helm
jaunty helm
# jaunty helm mkay

tuning it a lil got me

{'C': 15.705979599964326, 'gamma': 0.0019505526555292866}

# original
-----SVC-----
              precision    recall  f1-score   support

     Dropout       0.85      0.72      0.78       284
    Enrolled       0.54      0.38      0.45       159
    Graduate       0.77      0.93      0.84       442

    accuracy                           0.76       885
   macro avg       0.72      0.68      0.69       885
weighted avg       0.76      0.76      0.75       885

          Dropout  Enrolled  Graduate
Dropout       204        30        50
Enrolled       27        61        71
Graduate        9        22       411


# down sampled
-----SVC-----
              precision    recall  f1-score   support

     Dropout       0.78      0.67      0.72       200
    Enrolled       0.61      0.66      0.63       199
    Graduate       0.75      0.80      0.77       200

    accuracy                           0.71       599
   macro avg       0.71      0.71      0.71       599
weighted avg       0.71      0.71      0.71       599

          Dropout  Enrolled  Graduate
Dropout       133        51        16
Enrolled       30       132        37
Graduate        7        34       159
jaunty helm
#

welp gotta leave
guess I'll just let optuna run for a while and see if it comes up with smthn better

lapis sequoia
#

Does anyone have any experience with derivative free minimization

#

Are there any new methods that are better

past meteor
#

GridSearchCv from scikit is enough here
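
For two hyperparameters such as `C` and `gamma`, a grid search could look roughly like the sketch below. The data is synthetic and the grid values are arbitrary assumptions; a log-spaced grid is a common starting point for both parameters.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data so the example runs on its own.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 5 x 5 = 25 candidate combinations, spaced on a log scale.
param_grid = {
    "C": np.logspace(-1, 3, 5),
    "gamma": np.logspace(-4, 0, 5),
}

search = GridSearchCV(SVC(), param_grid, cv=3, scoring="f1_macro")
search.fit(X, y)
best = search.best_params_
print(best, round(search.best_score_, 3))
```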

river cape
#

Hello everyone

#

I have used a virtual environment and installed these packages

#

but I cannot get them to be imported

serene scaffold
#

@river cape your editor must be using a different environment than the one where you installed stuff

river cape
#

But it isnt showing me the environment which I want

serene scaffold
#

@river cape what editor are you using

river cape
#

VS code

serene scaffold
# river cape VS code

You need to figure out where myenv is and tell Vs code to use that. You'll also need to restart Jupyter.

river cape
#

I deleted that

#

Yet it still points it to that

river cape
#

I have created another environment for this current folder , it doesnt show up in this

clever cipher
#

I have a question... What would be the logistics of running a small local language model in my simple 2d game that returns strings which are commentaries of various game events? (could be stuff like dialogue)

I'm very new to this, so please forgive the broadness of the question, but how feasible would this be? Where would be a good place to start regarding training my own small scale models?

serene scaffold
#

Models like ChatGPT are specifically generative and interactive language models

#

But language models are actually a much broader class of model than that.

#

You can make a generative language model that's based on markov chains on your laptop. Though I'm not sure it would produce coherent responses to game events

#

You would also have to encode each game event as a natural language statement.
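
To make the Markov chain idea concrete, here is a minimal sketch of a word-level chain that learns from a handful of example commentary lines and generates new ones. The corpus, function names and sentence markers are all invented for illustration; a real game would need far more example text, and the output will often be only loosely coherent.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def build_chain(sentences):
    """Map each word to the list of words that followed it in the corpus."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=20):
    """Walk the chain from START until END or max_words is reached."""
    word, out = START, []
    while len(out) < max_words:
        word = rng.choice(chain[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)


corpus = [
    "the player found a rusty key",
    "the player opened a hidden door",
    "a hidden door creaks in the dark",
    "the goblin dropped a rusty sword",
]
chain = build_chain(corpus)
line = generate(chain, random.Random(0))
print(line)
```

Picking a separate corpus per event type would be one simple way to tie the generated line to what just happened in the game.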

serene scaffold
#

Or remember it.

spring field
#

or if not fine-tune then throw prompts at it to make it behave the way they want

serene scaffold
#

I should have said that, I was distracted by pycon

spring field
#

speaking of language models, my last attempt at improving the multilayer GRU next token predictor was to add layer normalization between the layers, to... speed up training? at least that's what I understand all these normalization layers are for (I actually managed to skim over a couple papers on the topics I wanted to find out more about (currently reading the paper on attention)). not sure how much of an impact those layers had, but nonetheless, without handling class imbalance it converged to predicting only "." pretty much, and with class imbalance handled the test loss just went up and up, so uhhh, idk what could be the issue (maybe I should MLFlow through some hyperparameters), maybe the dataset is not large enough, maybe this or maybe that, I don't know. I'll now proceed over to transformers though and yeah, that's kind of my little update 😁

past meteor
late lichen
#

I want to do a topological sorting

#

But I don't know how

Pls someone help me

#

My input data is Like

{
   "Node_ID":["bias : int",
            "Activator : callable",
            [["Descendant_ID : int ,
               weight : float"], ...]],...
}
keen crow
#

Hi guys! I'm currently working on an ML project which consists of training a ResNet-18 model to predict tire tread depth. I have fully working code right now, but it's not achieving the desired accuracy. I have tried a lot of different stuff but still can't seem to achieve the desired goal. Would someone mind helping me figure out a solution to get better accuracy? I would really appreciate it!

late lichen
#

Yes I will use DAGs network

late lichen
#

Dead chat?

agile cobalt
#

alternatively, you could use the source code of NetworkX as a reference to how to implement it yourself if you truly must
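
If implementing it by hand, Kahn's algorithm is one standard approach. The sketch below assumes the input shape described earlier, a dict mapping each node ID to `[bias, activator, [[descendant_id, weight], ...]]`; the example network and the use of `abs` as a placeholder activator are made up. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) is another option.

```python
from collections import deque


def topological_order(network):
    """Return node IDs so that every node comes before its descendants."""
    # Count incoming edges for every node.
    indegree = {node: 0 for node in network}
    for _bias, _activator, edges in network.values():
        for descendant, _weight in edges:
            indegree[descendant] = indegree.get(descendant, 0) + 1

    # Start with the nodes nothing points to.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # "Remove" the node by decrementing its descendants' counts.
        for descendant, _weight in network.get(node, (0, None, []))[2]:
            indegree[descendant] -= 1
            if indegree[descendant] == 0:
                ready.append(descendant)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle, no topological order exists")
    return order


# Hypothetical network: 0 and 1 are inputs, 3 is the output.
network = {
    0: [0.0, abs, [[2, 0.5], [3, -1.0]]],
    1: [0.0, abs, [[2, 1.5]]],
    2: [0.1, abs, [[3, 2.0]]],
    3: [0.2, abs, []],
}
order = topological_order(network)
print(order)
```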

late lichen
#

Thanks

spring field
#

I mean that I pretty much make the distribution of the words the same, i.e., factor / occurrences, that is apparently something one should do and it definitely makes sense for classification tasks or regression and such. It might be that I need to increase the threshold for when a token gets put into the group of rare tokens. I think I can see how in next token prediction class imbalance might be desired, but it just converges to the most common token it seems, could be an issue with my model obviously, is GRU used for next token prediction even? There are frankly tons of variables here at play obviously, but I think for now I'll certainly just move on and direct my efforts towards understanding transformers.

spring field
#

yeah, I remember, I'll just assume something's off with the model then
and continue with transformers
cuz I also need to cover ViT afterwards

vestal imp
#

"Layer 'normalization' expected 3 variables, but received 0 variables during loading. Expected: ['normalization/mean:0', 'normalization/variance:0', 'normalization/count:0']" guys does anyone knows abt this error?

#

so far online source doesnt rlly provide a good answer

vestal imp
#

For my case i have trained the model on gg colab which used tf 2.15 and now I'm loading it out using tf 2.15, mayb the process of downloading the file from gg drive to my machine has problems?

#

So u mean like the versions match, so what's left is that the .keras file download got probs?

#

Ah gotcha, i will look into it, ty for the help mate

jaunty helm
# past meteor Yeah, you only have 2 parameters so this is something I'd grid search and not us...

welp, I'll keep that in mind the next time 😅
anyways, this is what I got

{'C': 31973.23413892633, 'gamma': 9.61611073865163e-05}

# tuned on CV(X_train) and checked on X_test
-----SVC-----
              precision    recall  f1-score   support

     Dropout       0.82      0.70      0.76       284
    Enrolled       0.51      0.39      0.44       159
    Graduate       0.78      0.91      0.84       442

    accuracy                           0.75       885
   macro avg       0.70      0.67      0.68       885
weighted avg       0.74      0.75      0.74       885

          Dropout  Enrolled  Graduate
Dropout       200        34        50
Enrolled       30        62        67
Graduate       13        25       404

# no hyperparams
-----SVC-----
              precision    recall  f1-score   support

     Dropout       0.83      0.70      0.76       284
    Enrolled       0.51      0.36      0.43       159
    Graduate       0.76      0.92      0.83       442

    accuracy                           0.75       885
   macro avg       0.70      0.66      0.67       885
weighted avg       0.74      0.75      0.74       885

          Dropout  Enrolled  Graduate
Dropout       198        34        52
Enrolled       26        58        75
Graduate       14        21       407
jaunty helm
slim finch
#

Hello everyone,

I have developed a Python application for Windows that transcribes speech using OpenAI's Whisper model. I've also created a small UI for this app. However, I'm running into issues when trying to create a .exe file to share the program.

The main problem is with the backend: when I try to transcribe speech using the .exe version, I encounter various errors. It appears that not all dependencies are being included in the installer, likely due to the extensive nature of the Whisper model.

Could anyone advise on the best way to package such a large project with a substantial backend? Any tips or solutions would be greatly appreciated!

Thank you!

past meteor
boreal nest
jaunty helm
past meteor
jaunty helm
dreamy sorrel
#

hi guys! I am working on a multilabel classification task, and i have 3 models trained with different datasets. I want to ensemble these models. what's the best approach?

vestal imp
#

I mean just tf.keras.models.load_model ye?

#

Does being a h5 or a keras file has anything to do with it aye?

vestal imp
#

weird both keras and tf are 2.15 in both places

orchid forge
#

I just needed to know

warm trellis
#

Anyone here using Lightning AI to train their model?

orchid forge
#

If this https://www.datawars.io/
Is a good place to make projects

DataWars is a Project-based playground with +1000 ready-to-solve, interactive, Data Science projects. Practice your skills solving real life challenges in an interactive, real-life Data Science simulator.

river cape
#

print("{0:20}{1:20}".format(word, wordnet_lemma.lemmatize(word, pos='v')))

#

Any idea as to what does :20 mean?

agile cobalt
#

!e adds some empty space (padding) to align
```py
for name in ('Foo', 'Title Bar', 'Long Title Baz'):
    print(f'{name:20} - test')
```

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | Foo                  - test
002 | Title Bar            - test
003 | Long Title Baz       - test
agile cobalt
#

!d str.format

arctic wedgeBOT
#

```py
str.format(*args, **kwargs)
```
Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces `{}`. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

```py
>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
```
See [Format String Syntax](https://docs.python.org/3/library/string.html#formatstrings) for a description of the various formatting options that can be specified in format strings.

normal python
#

Confession time: Trying to make pytorch to work with ROCm is the most rage inducing, frustrating and completely awful experience that anyone ever had to experience

left tartan
#

tbh, I think it's a good idea. Not a lot of practical project sites for data.

#

It seems like it's more educational/useful than leetcode

warm trellis
#

Hey guys. I have a weather dataset that I train a model on to predict solar irradiance. After that, I want to use this model as a base model for PV solar energy prediction via transfer learning, but it doesn't do a good job on the latter no matter what I change. So how can I debug what went wrong?

spring field
#

that's intriguing

#

but it's only 50 days

#

like a db, but with history

final swift
#

I'm working on a project where I'm trying to get an AI to learn how to play blackjack, would it be simpler to get the AI to predict whether you will lose or not depending on each move you could make and then make decisions based on that, or would it be simpler to just go straight to making decisions based on the state of the game and then improving based on the results of that?

serene scaffold
final swift
#

Okay thank you.

serene scaffold
#

If you have a model that "predicts whether the next move will eventually cause you to win or lose", you still have to decide how the model will learn to make that prediction. Which still involves learning some understanding of what makes one move better than another

#

Does blackjack involve random chance?

frosty fulcrum
#

does anyone know what's the best deep learning model for Regression task?

serene scaffold
frosty fulcrum
serene scaffold
frosty fulcrum
# serene scaffold Try telling the chat about the dataset

Well, the dataset is in the CSV format and contains 21 features, whereas the Yield values range from 0-1. I’ve tried XGBoost and CatBoost, but the accuracy seems to be stuck at 86%. I believe the dataset is imbalanced, but I want to try other models as well to see if I can reach 90% without doing data augmentation.

serene scaffold
frosty fulcrum
frosty fulcrum
jaunty helm
jaunty helm
#

and also if you know the data is imbalanced, accuracy may not be what you want to look at, something like f1 might be better
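
A small worked example of why accuracy can mislead on imbalanced data, assuming the problem is framed as classification (F1 does not apply to a pure regression target). The labels are made up: 90 samples of class 0, 10 of class 1, and a "model" that always predicts class 0.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: 90% majority class, 10% minority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100          # always predict the majority class

accuracy = accuracy_score(y_true, y_pred)
# Macro F1 averages the per-class F1 scores, so the ignored class drags it down.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.2f}  macro F1={macro_f1:.2f}")
```

Accuracy comes out at 0.90 while macro F1 is roughly 0.47, because the minority class is never predicted and contributes an F1 of 0.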

orchid forge
#

idk, tell me if its good

spring field
#

Yeah, idk, I implemented a transformer, but it's only predicting dots... It does seem to converge faster to just predicting this one symbol compared to the RNNs, but nonetheless that is not the expected result as one might imagine
I mean, at this point, I have tried like at least 3 separate models (two RNN types, now one Transformer type), they just don't want to work and I'm a tad lost here
maybe it is the dataset, maybe it is that it's not actually tokenizing, it's more just splitting words
another thing I noticed is that they had a lot more sentences and a lot more words in the vocabulary, the dataset I'm using has 10k tokens and 34k sentences and I did adjust the other hyperparameters according to the paper (Attention Is All You Need) (as closely as possible)

#

dots are the most common token, by quite a large margin too

#

I'll let it train

#

maybe by some miracle, a miracle will happen, idk, will see later ig

orchid forge
#

what is itertools im not able to understand

spring field
#

tools for various iteration needs, you're gonna have to be a bit specific if you want a more specific answer

orchid forge
#

it is in some code which is combining two different dataset columns

spring field
#

may I suggest the documentation for itertools.product
it's really just a couple iteration utilities, certainly one of the built-in modules I would recommend being familiar with at least to some extent
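
As a quick illustration of `itertools.product`, the sketch below pairs every value of one list with every value of another, which is presumably what the code combining two columns was doing. The two lists are invented stand-ins for dataset columns.

```python
from itertools import product

# Stand-ins for the unique values of two dataset columns.
colors = ["red", "blue"]
sizes = ["S", "M", "L"]

# Every (color, size) combination, with the last input varying fastest.
pairs = list(product(colors, sizes))
print(pairs)
# [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
```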

orchid forge
#

hmm

#

well is this part of data analysis or data science?

#

or anything else

#

and why is it called module

spring field
#

if you make it a part of either, then it is a part of it

orchid forge
#

ok

spring field
# orchid forge and why is it called module

while I can't give you the exact origins of the name, I assume it's to do with Python files being modular in that you can import them and reuse and stuff like that, so, yeah, you can create your own modules, you can use built-in modules, you can use 3rd party modules, they're also sometimes called packages when there are several modules grouped together or libraries, but yk, they're all modules when you import them anyway, it's just an object in Python pretty much

#

it's probably something I'd recommend getting familiar with before trying to learn stuff like pandas

orchid forge
#

yeah sure, its funny question weird things just becuz i wanna to do english literature than this

#

i question the meaning of word instead questioning what is it use for lol

#

hahaha

spring field
#

I forgot to add rollouts so I'm stuck with test predictions and not some fun fresh stuff, but at least it's not just dots

orchid forge
spring field
#

well, did you read the documentation? what part of the docs did you not understand? maybe I can help you explain it.
on a higher level it just seems to have been used to generate those combinations pretty much

spring field
spring field
spring field
#

(I still partially blame the dataset)

burnt pond
#

How do I start learning ml and ai

Like I know the maths such as calculus and statistics I have also learned python so I would like to know how to start learning ml and ai , would also like if you provide some Good teaching websites or yt tutorials not those too heavy ones but just some basic to intermediate so that I can atleast create my own models and also fine tuning them

spring field
orchid forge
orchid forge
#

oh thanks, i guess you approve this website. im happy im learning on it. that i first came up with this, now everyone knows. haha i'm so amazing

#

what subjects would i need to be a good data analyst, tell me

#

i'll go there and learn it

#

i'll go with 'Probability and Statistics' for now and learn it before i go to sleep

#

instead of wasting time playing "age of mythology"

#

"One strategy is to use the website's API, if available. APIs often provide a more structured way of accessing data and are less likely to have blocks in place to prevent scraping."
what does website API means?

orchid forge
#

oh

#

so we read the given set of rules and understand the block they might give us for their authentication requirement and then do something with the block given

#

right?

#

i'm sorry for my English i sound so unprofessional

#

yes

fleet epoch
#

hello I'm new here. I'm doing a project with neural collaborative filtering and I tried to run it on the GPU. I have Win 11 and I downloaded the latest version of CUDA, but pytorch can't find it. and yes I have a GPU. what should I do?

orchid forge
#

ohhh god how can u make so much sense

#

hmmm right

#

but lets assume if there's a block

#

oh

#

god wow

#

how could u know this much man

#

you're amazing

#

so you're a data analyst ?

#

idk what is that but sounds cool

#

oh

past meteor
#

Maximum likelihood estimator

orchid forge
#

i think it might be night there, if u r working in day, i think u should sleep and rest, u seem like a hard working person idk

past meteor
fleet epoch
#

no pure win

orchid forge
#

yeah well u r a solid person

past meteor
#

Yeah, the latest torch versions don't work with GPU on pure windows

orchid forge
#

good man your future is bright

orchid forge
#

good luck

past meteor
#

It's explained somewhere in the docs

#

Actually I'm wrong sorry

#

This was the case with Tensorflow 🥴

lapis sequoia
#

does anybody know of a good way to generate smooth noise

past meteor
orchid forge
#

i mean i love this server becuz people here help a dumb person like me to understand things in better human language, i really trust u guys blindly. i've only came across smart people from this people. who always help me learn. you're one of them. thanks for helping me. you rock!

cedar tusk
lapis sequoia
lapis sequoia
#

smooth noise

#

how in any dimensions

cedar tusk
#

do it for each dimension and voila

#

u prob can even type a oneliner like this

df = pd.DataFrame({f'Column{i+1}': np.random.permutation(np.random.normal(0, 1, 100)) for i in range(3)})
#

change the 0 to be mean, 1 to be the deviation and 100 to be the amount of values

#

oh wait a sec, u mean smooth as in the values must be continuous?

#

ok scratch that do this instead

import numpy as np
import matplotlib.pyplot as plt
from perlin_noise import PerlinNoise

# Set the random seed for reproducibility
np.random.seed(42)

# Define the size of the grid
grid_size = 100

# Create a Perlin noise object
noise = PerlinNoise(octaves=4, seed=42)

# Generate the 2D noise grid
noise_grid = np.array([[noise([i/grid_size, j/grid_size]) for j in range(grid_size)] for i in range(grid_size)])

# Display the noise grid
plt.imshow(noise_grid, cmap='gray')
plt.colorbar()
plt.title('2D Smooth Noise Grid')
plt.show()
#

this yields this image

#

which also has perlin
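
For the "any dimensions" part of the question, a different technique from Perlin noise is to blur white noise with a Gaussian filter, since `scipy.ndimage.gaussian_filter` accepts arrays of any dimensionality. This is only a sketch under that substitution; the function name and default values are assumptions, and the result has a different character from Perlin noise.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_noise(shape, sigma=5.0, seed=0):
    """Smooth noise of any dimensionality via Gaussian-blurred white noise.

    Larger sigma gives smoother, slower-varying noise.
    """
    rng = np.random.default_rng(seed)
    white = rng.normal(size=shape)
    # mode="wrap" makes the result tile seamlessly at the edges.
    smooth = gaussian_filter(white, sigma=sigma, mode="wrap")
    return smooth / smooth.std()  # rescale to unit standard deviation


noise_1d = smooth_noise((200,))
noise_3d = smooth_noise((16, 16, 16), sigma=2.0)
```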

spring field
#

in like 2 hours, but the good news is that it worked out in the end, yay, I was pessimistic at the start, but boy did a miracle (it's called math) happen, it went to like over 90% test accuracy, all the example inputs and predicted outputs that were printed matched perfectly, I'll try to do inference as well, as I didn't have any rollouts going on during training
oh and those attention matrices... man, those were beautiful to see
will share the fun stuff in them couple hours as well

spring field
#

honestly, kinda crazy

#

also the actual metrics at epoch 22

#

so yeah, very cool, onto the other stuff now

spring field
#

sure, I can try, if there will be any free GPUs available on paperspace, well, they'll become available eventually, so... yeah, but sure, can do

#

but using the dataset I have or sth else?

serene scaffold
#

<@&831776746206265384> it appears that @vernal thunder is advertising a YouTube channel

long locust
#

Hello, your message has been deleted due to violating rule 6 of the server, which does not allow advertising

craggy agate
#

People, how do I use my M3 chip's GPU for tensorflow?

left tartan
#

Hah, Excel just announced a cutting edge new feature today:

craggy agate
#

I have tried to install tensorflow macos but it didn't work I think.

#

Failed to install tensorflow metal from pip

#

and tensorflow GPU

spring field
craggy agate
#

Rough take, switching from Tensorflow to Pytorch is a pain.

#

I am learning Pytorch but it just seems hella complex compared to tensorflow

#

My take is in TF you gotta worry about the logic and building of the model more than the syntax, for Pytorch, you gotta do both.

agile cobalt
#

can you show some examples of which syntax elements you're thinking about?

craggy agate
#

I could do the same thing with half the code with tensorflow

#

That's a little extreme but you get the point

#

Yes but why the need for OOP?

#

Tensorflow doesn't use it, idk why Pytorch needs to.

#

Enlighten me if I got it wrong

spring field
#

last I checked, TF definitely does use OOP
and yes need, you gotta keep a bunch of internal state somewhere

craggy agate
#

Everything I see with Pytorch just makes my head hurt

spring field
#

well, ig there's the whole torch.nn.functional which at least partially is getting deprecated for torch, so... hmm

craggy agate
#

I assume this is initializing the nn?

#

I am 100% lost 😂

spring field
#

wait, I swear I saw a warning about some nn.functional thing getting deprecated for just torch.

craggy agate
#

Where would y'all recommend I learn Pytorch from?

#

Documentations?

past meteor
#

tensorflow does OOP too as soon as you want to make any non-trivial model

craggy agate
#

Ong learning data science is insanely difficult without university courses...

#

Deep learning specifically

past meteor
#

They've both converged to very similar libs, especially with LazyLinear, LazyConv2D and so on being a thing now in Torch

past meteor
#

the hard part is rarely the code though

past meteor
craggy agate
#

Like I understand all the theory to NNs but this is just a tad bit confusing

past meteor
#

Even if you're doing a uni course you still venture out and learn on your own

craggy agate
past meteor
craggy agate
#

The course I am following should have let me know that OOP would be required, never bothered to learn it before.

#

Should probably learn it now.

spring field
#

a factory of closures? that sounds like peak Javaism

past meteor
#

I'm not even sure Java has Closures

craggy agate
#

Yeah.

past meteor
#

You need free functions for closures. Maybe they have it now because they're adding every feature known to man in the latest Java versions, but I digress

spring field
#

but ofc they have anonymous classes

Java initially didn't have syntactic support for closures (these were introduced in Java 8), although it was fairly common practice to simulate them using anonymous inner classes.
https://stackoverflow.com/a/3805576/14531062

past meteor
past meteor
#

You can do this:

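from dataclasses import dataclass

import torch.nn as nn

@dataclass
class Network(nn.Module):
  w1: nn.Parameter

  # NOTE: as written, Network(w1) is expected to fail, because the generated
  # __init__ assigns w1 before nn.Module.__init__() has run.
  def forward(self):
    pass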
#

dataclass is stdlib

#

I'm sure there's a way you could do it. You can pass default, mutable parameters with field_init or something similar

#

60-70 % of the classes I make are with dataclass but I don't make torch stuff with it (or use dataclass + inheritance, in general)

#

Something in me thinks it'll go south (and I don't know why)

#

like some sort of edgecase

spring field
#

default_factory FTW
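
For reference, a minimal sketch of `default_factory` on a plain dataclass, with no torch involved. The class and field names are hypothetical; the point is that each instance gets its own fresh mutable default instead of sharing one.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:
    """Hypothetical config class, used only to illustrate default_factory."""

    name: str
    # A bare `layer_sizes: list = [16, 8]` would raise ValueError at class
    # creation; default_factory builds a new list per instance instead.
    layer_sizes: list = field(default_factory=lambda: [16, 8])
    tags: list = field(default_factory=list)


a = ExperimentConfig("run-a")
b = ExperimentConfig("run-b")
a.layer_sizes.append(1)  # only affects a, not b
```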

past meteor
#

well, I'm not gonna find out any time soon 😂

#

There's something I'm missing tbh. I love absolute reproducibility but not the effort it entails. Some CLI tool that registers model / experiment runs and you can "reactivate" them with the CLI tool, which does a checkout to a commit and runs the .py file

vapid pumice
#

Hey guys I need a help here,

I'm trying to make a Website Summarizer using Python and AI.
-> Which model is best for this use case? And I'm very new to this so I want to know if any type of fine-tuning can be done on the model.
-> Is there any web-service where I can deploy the model for free so I can use it as a backend to test, or I can run it locally in my laptop.
-> Is there also any lightweight fast models for this use case.

Ik there are too many questions here, and any help is appreciated.

past meteor
spring field
#

there are several issues with this
first, how are you gonna retrieve the content of those websites? you have to make sure you're not violating their ToS
second, it will most likely cost something, there are free tools for building RAG pipelines such as langchain, but if you want to host it someplace, you'll need to pay for the computation time, even if it runs on the CPU
third, ehhh, fine-tuning? probably won't be necessary if you just go with RAG (which I'm not sure where that technique lies in the state of the art hierarchy)

past meteor
#

I spend very little time making "data science" code pretty 🤷

spring field
#

data science code has got to be the ugliest code I write casually
I try my best whenever I can though

past meteor
#

My DS code is really bad. I code the happy path and constrain myself to only using that

#

But, I'd say my standard for "really bad" isn't really bad (humble brag)

#

I still have a couple of interfaces, use some level of OOP etc etc but it's definitely a lot worse than my other code for various reasons

spring field
#

can multi-headed attention be parallelized by using a bigger weight matrix?
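
For the projection step at least, the idea can be checked numerically: stacking the per-head weight matrices side by side and doing one matrix multiplication gives the same result as one multiplication per head. The sketch below uses NumPy with made-up sizes and covers only a single projection (say, the queries), not the full attention computation.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2   # arbitrary illustrative sizes
d_head = d_model // n_heads

x = rng.normal(size=(seq_len, d_model))
per_head_weights = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]

# Option 1: one matmul per head.
separate = [x @ w for w in per_head_weights]

# Option 2: one big matmul, then split the result back into heads.
big_weight = np.concatenate(per_head_weights, axis=1)        # (d_model, d_model)
fused = (x @ big_weight).reshape(seq_len, n_heads, d_head)   # (seq, head, d_head)

same = all(np.allclose(separate[h], fused[:, h, :]) for h in range(n_heads))
print(same)
```

The attention scores themselves are still computed per head; the reshape just lets all heads be handled as one batched operation.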

past meteor
#

The interfaces are mostly because, for instance, for work I was making encoder decoder models. It didn't make sense, imo, to just flat out code a Seq2Seq but rather to make a TimeSeriesEncoder and a TimeSeriesDecoder, because you can have a CNN encoder with an RNN decoder, vice versa, or CNN + CNN, RNN + RNN and so on

spring field
#

oh, thought so

past meteor
#

I'm making something in this space soon 👀

#

Very early PoC

craggy agate
spring field
past meteor
#

General purpose inference server, built in Rust 🦀 that loads models from "storage" (S3, minio, azure blob, filesystem, ...) that are in ONNX format. Next to that also make a small Python lib that wraps libraries that export models to ONNX, add metadata necessary for my inference server and store it in the "storage" (that rust is reading from). Deploying a new model as an endpoint would just be export_to_storage and the endpoint is created automatically

craggy agate
#

If classes are classes then what are classes?

#

You were talking about the "class" object, right?

#

I see, okay

spring field
#

well, type is the default metaclass and other metaclasses have to inherit from it

past meteor
#

(maybe not details that matter to beginners tho)

spring field
#

everything (type included) inherits from object

#

but tbf, I would not worry about it even at an upper intermediate level

#

everything's a PyObject *

#

in CPython anyway
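
These relationships can be checked directly in the interpreter. In the sketch below, `Meta` and `Widget` are hypothetical names; note that `type` is itself a subclass of `object`, and is also its own class.

```python
class Meta(type):
    """A do-nothing custom metaclass (hypothetical, for illustration)."""


class Widget(metaclass=Meta):
    pass


facts = {
    "class_of_int": type(int),           # classes are instances of type
    "class_of_type": type(type),         # type is an instance of itself
    "class_of_widget": type(Widget),     # Widget's class is the metaclass
    "meta_inherits_type": issubclass(Meta, type),
    "type_inherits_object": issubclass(type, object),
    "widget_inherits_object": issubclass(Widget, object),
}
for name, value in facts.items():
    print(f"{name:24} {value}")
```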

past meteor
#

Metaprogramming is dangerous because it's easy to convince yourself you need it

#

Yup, but particularly for things you train yourself. Especially traditional ML models.

#

Doubt it. it's not a hard language at all imo but you need patience + to care about programming

vapid pumice
past meteor
#

And that's only a small % of people programming professionally (and that's totally ok! Your job doesn't need to be your hobby)

past meteor
#

ah, an unimportant detail is I want the project to be usable by people that only know Python and no Rust

#

Like through a CLI tool that spins up the inf server so as a user you just focus on making models and putting them in the right storage

#

Hmmm, I doubt it because cpp doesn't have web related stuff that is popular + it's much more of a pain to write

craggy agate
#

Here is a question, would you guys say chatGPT is an "AI" or a glorified search engine?

past meteor
# craggy agate Here is a question, would you guys say chatGPT is an "AI" or a glorified search ...

With AI you probably mean "artificial general intelligence" (also known as AGI)?

The term AI (without the G) actually includes many "basic" things like search algorithms. When I say search here I don't mean search engines but I mean depth first search, A*, uniform cost search and so on. These are traditionally seen as AI as well.

Now if the question is is it "AGI or a search engine" then the answer is: "ask a philosopher" 😂

craggy agate
#

What I mean is that it's not actually doing anything, it's trained on a massive massive dataset so anything you might ask would be in its dataset, it doesn't do anything out of the box.

agile cobalt
#

REST APIs are mostly a solved problem - you don't suddenly decide you need to change your entire API to use XML instead of JSON, or stop using HTTP Headers to use a different type of request metadata

deep learning is much more experimental on that aspect, so the libraries do have to provide more low-level access, so that researchers can experiment with using different existing or new architectures, including all sorts of layers and connections between them, different ways of training the models, creating new activation functions and so on

#

You can use huggingface transformers if you want something on a similar level of abstraction to fastapi

past meteor
#

Did you use TF1?

#

Well then you can see the progression we've made from TF1 all the way to Pytorch

#

But, I think @agile cobalt 's answer covers it pretty well

craggy agate
past meteor
#

The fact we have automatic differentiation is BIG

agile cobalt
#

I guess that another point is that ML libraries cannot sacrifice performance at all, while most wouldn't complain if you got like 50% slower on fastapi because of using middlewares instead of defining things in a more low level way

past meteor
#

it's easy to overlook the strides we've made. For something that is still research (like etrotta says) and not just code => solution means we might be there

#

Like?

craggy agate
#

It has learnt all of python or all of C# from its dataset, similar to how we learned those programming languages. It could be argued that that is what makes it AI.

wooden sail
past meteor
#

Just look at Keras for instance

#

So what's your point then 😭

agile cobalt
#

you realize that app = FastAPI() is already oop right?

craggy agate
#

Or are there some other parameters you are talking about

wooden sail
#

not necessarily, but sure

wooden sail
craggy agate
craggy agate
agile cobalt
#

that you cannot reach the level of abstraction you are thinking about without making sacrifices, and these sacrifices are not (yet) viable at this point in time

past meteor
#

Hmm the terms used are ill defined so you can't really have a discussion

wooden sail
#

which falls under deep learning, and that also falls under machine learning

past meteor
#

AI > ML > LLMs

wooden sail
#

yep

past meteor
#

So LLMs are per definition AI

craggy agate
#

What about it though? I understand that but I was thinking about it logically

#

Not definition wise

craggy agate
wooden sail
#

what exactly is your question?

craggy agate
#

By that I mean it's not inventing anything.

craggy agate
#

I can't exactly explain what I mean but I hope you get the general idea

wooden sail
#

otherwise, AI does not exist if that's what you're getting at

#

zestar was right in pointing out you mean something at or above AGI level

craggy agate
past meteor
#

Hence why this is a question for philosophers

craggy agate
wooden sail
#

so as i said, from your view point, there is no AI at all

wooden sail
#

AI broadly refers to data-driven optimization, which is what i pushed in that direction

#

in that sense, GPT is AI

craggy agate
#

Like there was one checkers bot that defeated the world champion by inventing a new out of the box move.

past meteor
wooden sail
#

in the sense you're making up, there is no AI at all

past meteor
#

I think it's what I'd recommend for "engineers" that have to train a model or so

wooden sail
#

where do you draw the "engineer" line

craggy agate
#

I would consider that to be more of AGI. Cause it's doing some inventing and thinking

past meteor
#

good question, I'd say specifically software engineers

#

ML engineers are also engineers

wooden sail
#

(my phd cardboard, if i ever finish this shit, is gonna say dr. ing. too)

past meteor
#

nice

wooden sail
#

so technically engineer here too

past meteor
#

The ing. flex is a thing in germany too

#

Well, you have ir. as a title for MSc engineering science (the hard one) and ing. for engineering technology (the easier one)

wooden sail
#

ooh keras with jax, tf, and pytorch. nice

#

hmm interesting, i'm not sure if that distinction exists here as well

#

gonna have to ask

past meteor
#

It's because of the bologna accord soup thing

#

It's pretty bad, I've been in situations where I saw an ir. tell an ing. "don't touch the machine, let me get an engineer!" on the shop floor 😭 (because one has more prestige than the other)

wooden sail
#

yeah sorry

#

(all the terms are bullshit anyway)

#

AI is pretty tough to define broadly enough

past meteor
#

We did things like search algorithms

#

Graphical models

wooden sail
#

you hear people talk about game AI all the time and it's just like a for loop and 3 if statements

past meteor
#

Prolog 😦

#

Honestly it just depends on who you ask as well

#

I learnt about a bunch of methods in operations research that were also called "AI" in later years

craggy agate
#

I think we are at narrow AI

#

AGI is on par with the human brain

wooden sail
past meteor
#

It's the same as linear regression being AI

craggy agate
#

According to the definition I think

craggy agate
#

Stack overflow is getting GPT?

#

Didn't they already have an auto mod

past meteor
#

Anyhow, to give my last message in something that is going off topic.

AI is basically whatever creates value for critical stakeholders by means of increasing the amount of venture capital they have to burn.

craggy agate
wooden sail
#

that immediately excludes me

#

no but i look dumb 😌

iron basalt
#

On the inventing part, if it's just making something new, that is trivial, if it's making something new and useful, we already do that with AI / optimization: https://en.wikipedia.org/wiki/Evolved_antenna

In radio communications, an evolved antenna is an antenna designed fully or substantially by an automatic computer design program that uses an evolutionary algorithm that mimics Darwinian evolution. This procedure has been used since the early 2000s to design antennas for mission-critical applications involving stringent, conflicting, or unusual...

#

And as for thinking, it's just not really defined (beyond the not very useful stuff like "it's the process by which someone comes up with a solution").

craggy agate
#

I know I am throwing the term AI very loosely but you get the point

iron basalt
craggy agate
iron basalt
#

But that is really just because there are things that humans are not good at and computers are, and for which you can make a well defined procedure.

#

Technically a human could find that move if they also did the search algorithm by hand on paper, it would just take really long.

#

So time is an important factor here.

iron basalt
#

It's meant to just be a chat bot.

wooden sail
#

that's a kinda "trivial" case though. there are finitely many (though really a LOT) of possible chess games. you don't need anything clever to make a "new" move. just loop over all of them and play a good move for your situation, no special thinking involved. people don't do that because memorizing a set of very good moves and having the skill to recognize them and know when to use them gets the job done. you can go out of your way and play weirdass moves if you like though

iron basalt
#

Although it's advertised as much more.

craggy agate
#

I kind of agree with this thinking but I was having a conversation with someone and wanted a second opinion

#

Ask it what script it's following

#

Or something like that

iron basalt
craggy agate
#

Not exactly a languages expert lol

sick eagle
#

guys i will start learn numpy and pandas... (some lybrairys should learn it for machine learning) so jupyter or pycharm is good for me????

craggy agate
#

So it basically just moved some letters around, not really reinvented or something.

wooden sail
craggy agate
iron basalt
#

@craggy agate What you are probably looking for is deep reasoning as described by Wolfram. You can think of this as having an internal thought loop in which it solves problems algorithmically. We do have AI models that do this (e.g. NTMs), but they have been overshadowed by language models in popularity. They are still there though and being worked on, including integrating them with language models.

craggy agate
#

No not really

sick eagle
sick eagle
#

don't care for my mistakes in english

wooden sail
#

i already gave you my 2 cents 😛 it promotes bad habits if you're new to python

whole zephyr
#

yo, does anyone know of a neural net architecture that:

is trained using a target_processed_signal + raw_unprocessed_signal and outputs the parameters

is tested using only raw_unprocessed_signal and outputs some parameters that would transform the raw signal into a desired signal? any links to papers will be highly appreciated, especially if they contain neural net diagrams for the architecture

I'm thinking of a use case is to train the network to somehow "remember" the qualities of a desired signal

and when I have no target signal but give it new, raw signals it would give me some parameters that I could apply to process the new signals

wooden sail
#

we've had a few cases here of people asking for help where the issue was executing cells out of order, or running a cell a second time resulting in inadvertent composition of functions. the first means the code only works if you run the cells out of order, and the latter means the code only works if you re run all the cells in order, at which point you're better off just using any ide you like
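A small plain-Python sketch of the "running a cell a second time" problem, with each notebook cell simulated as a block of statements (the function and variable names are made up):

```python
def double(values):
    """Stand-in for any transformation applied in a notebook cell."""
    return [v * 2 for v in values]


# "Cell 1": load the data.
values = [1, 2, 3]

# "Cell 2": overwrite the variable with its transformed version.
values = double(values)
after_first_run = list(values)   # [2, 4, 6]

# Re-running "cell 2" applies the function to its own output.
values = double(values)
after_second_run = list(values)  # [4, 8, 12]

# Safer pattern: keep the raw input and derive a new name from it, so the
# cell gives the same result no matter how many times it is executed.
raw_values = [1, 2, 3]
doubled_values = double(raw_values)
doubled_values = double(raw_values)
```

In a script this cannot happen by accident, because every run starts again from the top.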

craggy agate
#

I see

iron basalt
# sick eagle i think jupyter is good for small projects like biginners right?

IMO it's better to just get used to using text directly, not the notebooks. Notebooks have the issue that any other non-plain-text coding method has, and it's that everyone now needs that specific editor for it to access your code and they have to now learn that. In addition, Jupyter Notebook is not well designed even as a notebook IMO. It causes many issues with debugging.

sick eagle
craggy agate
whole zephyr
iron basalt
wooden sail
# whole zephyr I currently run an architecture that predicts parameters based on a signal diffe...

you might be able to rewrite the problem as a "self supervised problem" where you take an autoencoder that starts with params, generates a signal, then estimates params from that signal. you then split up the autoencoder and keep the decoder. just bouncing ideas around, this may or may not be helpful for you. (you can also interpret this as just using synthetic data if you keep the "encoder" fixed)

sick eagle
# craggy agate Yes, why?

i think you know jupyter is better in small projects if i want practicing my new skills it will help me but pycharm won't help me beceause it just for big projects right?

wooden sail
#

neither will "help you" in any special way

sick eagle
#

ok

wooden sail
#

pycharm has a special debugger and automatically creates venvs. jupyter has in-line plotting and can display latex cells between code cells. neither helps you with coding

sick eagle
#

i think will use vscode 🤣 pithink

iron basalt
sick eagle
#

i just kidding

#

i will use pycharm is good and simple

#

i did use it

wooden sail
wooden sail
#

pycharm is arguably the most complex out of pycharm, vscode, and jupyter.

#

if you already use that comfortably, the others won't give you any issue

#

but yeah just use what you like

sick eagle
wooden sail
#

if you like it, sure

iron basalt
#

About vim, emacs, etc. These editors are for when you decide which editor to use for a lifetime, they have high upfront learning investment required, but you will never need anything else again. So if you think you are ready for that, then maybe give it a go.

wooden sail
#

no ide or editor will do your job for you. they also don't teach you how to code

#

just use whatever has tools you like

sick eagle
sick eagle
wooden sail
#

there is no "best", it just depends on how much you like it and how well you use it

iron basalt
craggy agate
#

Recommend starting out with Jupyter NB but if you like PyCharm then go for it

sick eagle
#

ok

#

thanks too much guys

#

you are so usefull

#

thanks 🤝

agile cobalt
#

notebooks are not too bad - in particular, inline plotting can be useful for exploring datasets or testing data transformations, just remember to Restart Kernel every so often to ensure your results are reproducible and you didn't end up with a messy state

whole zephyr
#

the tool you use doesn't really matter

though, there are indeed 2 main approaches that can influence the way you organize your code:

notebooks, where the state of the program is sort of saved as long as your session (or whatever it's called) is still active

or the classic way where you would have to run your code every time if you want to access a specific state

iron basalt
#

Yeah, just do whatever, you can use notepad, it's really whatever. What really matters is that you are making stuff, you will find out for yourself what is working better after trying them.

#

Don't get stuck in analysis paralysis.

sick eagle
#

i think jupyter is so hard

#

It's not organized

whole zephyr
#

I'd recommend the classic way if you're a beginner

don't look into notebooks just yet because there you can run the cells out of order and you could get unexpected results because you didn't pay attention to which cell you ran before

iron basalt
#

Path of least resistance since what right now matters is that you are coding. But keep in mind that some paths with more resistance have payoffs at the end.

sick eagle
#

yeah

#

so guys do you think AI will replace devlopers front end ??

whole zephyr
#

I worked like 1.5 years with notebooks and it was a bit hard for me to transition back to the classic way of coding

sick eagle
#

i will never choose front end

whole zephyr
#

I don't like front end either but anyway

wooden sail
iron basalt
sick eagle
sick eagle
#

but after years AI will replace front end

#

i search in google

iron basalt
sick eagle
#

it will never replace all devlopers but just front end can ai replace it

whole zephyr
#

anyway, does anyone know how to train a network with 2 inputs X1, X2 (like a raw and a target) to match some output parameters Y

so that when I apply it to the real use case I only have the raw input X1 and I need to find the parameters that would transform X1 to X2 without having the X2?

sick eagle
#

right??

iron basalt
sick eagle
#

if that happend the market will..

sick eagle
whole zephyr
wooden sail
#

that's already implicitly included in the model, since forward modelling the input parameters into synthetic data means your data y is a function of the params x. then you want to use y(x) to estimate x, which is a typical inverse problem

whole zephyr
#

thing is: I already have a deterministic "algo" for the "decoder" - i.e. the signal processor I want to set those params for

wooden sail
#

which algorithm are you using? if it's deterministic and fixed, you certainly don't need both the signals and the parameters

#

there is no training

whole zephyr
#

it's a parametric filter that's used to transform X1 to X2

#

I have a training set of a very specific set of desired X2s

#

and when I encounter real data, I wouldn't have the X2, but just an idea of what X2 I would want to find

#

it's basically the mixing process of a sound engineer

#

so the training would be needed to find the "good" parameters to make any X1 signal sound better even if I don't have a target X2

but I want to estimate the params instead of having the desired signal as a direct output, because I want to be able to make changes in the parameters for full customizability of the output

wooden sail
#

that sounds like a typical ML problem though, what's the issue?

#

you want to compute X2 from X1, you have a parametric function that can do that but you don't know the parameters

#

you have examples of X2 as well. do you have the X1 that go with those X2's?

quiet bridge
#

Hi Guys I have actually started learning SQL for data science.
I was wondering what I should do after learning the syntax and those basic where,group by,join,etc

spring field
#

practice

quiet bridge
#

Surely but how

spring field
#

I mean, ig practicing SQL on its own is gonna be a bit tough, you kinda gotta include it in some bigger project

quiet bridge
#

So you suggest that instead of doing Hackerrank Problems,I should jump straight into using it in some project that requires SQL?

spring field
#

there are hackerrank problems that only need SQL?

#

but yeah, usually projects are a great way to practice

quiet bridge
#

Definitely

agile cobalt
spring field
#

I mean, ig you can try hackerrank as well

quiet bridge
quiet bridge
#

I get it now

#

Can I tell you ig what I am doing rn towards learning Data Science to make sure I am doing it right and in the right order?

#

I am just making sure that you are specialized in it

spring field
#

I'm not particularly specialized, but you can check out the pinned messages in this channel

quiet bridge
#

Yea I see

#

But do you know about Andrew Ng Specialization for ML on Coursera?

spring field
#

nope

calm hatch
#

i have categorical columns in my dataset that have missing values. what could be a good way to impute them besides replacing empty values with the mode for that column? something that might impute these values depending on some other column to make it closer to what could have been the real value. I hope the question is understandable

velvet olive
#

So, you want to infer some data depending on the values of other rows?

calm hatch
velvet olive
#

if you have enough examples, you could create a small supervised model to infer those values but it might involve a bit of work. But since you already have all the data structured it could be worth it
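As a rough sketch of the simplest possible version of that idea: learn the most common category for each value of another column and use it to fill the gaps, falling back to the overall mode. The column names and rows are invented for the example; a real classifier trained on the rows where the value is present would generalise this to several predictor columns.

```python
from collections import Counter, defaultdict

# Toy data: "title" has missing values, "department" is always present.
rows = [
    {"department": "engineering", "title": "developer"},
    {"department": "engineering", "title": "developer"},
    {"department": "engineering", "title": "tester"},
    {"department": "sales", "title": "account manager"},
    {"department": "sales", "title": "account manager"},
    {"department": "engineering", "title": None},
    {"department": "sales", "title": None},
    {"department": "legal", "title": None},
]


def fit_conditional_modes(rows, target, given):
    """Return ({given value: most common target}, overall most common target)."""
    per_group = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        if row[target] is not None:
            per_group[row[given]][row[target]] += 1
            overall[row[target]] += 1
    modes = {key: counts.most_common(1)[0][0] for key, counts in per_group.items()}
    return modes, overall.most_common(1)[0][0]


def impute(rows, target, given):
    """Fill missing targets from the group mode, else from the overall mode."""
    modes, fallback = fit_conditional_modes(rows, target, given)
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[target] is None:
            new_row[target] = modes.get(new_row[given], fallback)
        filled.append(new_row)
    return filled


imputed_rows = impute(rows, target="title", given="department")
```

Here the missing engineering title becomes "developer", the missing sales title becomes "account manager", and the "legal" row, which has no examples of its own, falls back to the overall mode.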

jaunty helm
calm hatch
#

oh! that's actually a very nice approach. Thanks a lot🙌

fallow coyote
#

afternoon everyone. can anyone recommend alternatives to the mml book? or website which lists all the topics to learn for ML (excluding the stats)?

barren mango
#

hey guys

past meteor
orchid forge
#

I'm studying Probability and Statistics from Khan Academy, like you asked. Are you sure there isn't a Probability and Statistics course made specifically for data analysis coders?

orchid forge
#

Yeah it's good

#

Actually

river cape
#

Any idea as to why we use the toarray() function to get the output of the Bag of Words?

tidal bough
#

It's hard to tell what you mean without any context, but generally toarray in ML is used just to make something a normal numpy array, rather than a torch tensor (which is similar to a numpy array but has extra stuff like autograd attached to it).

river cape
#

Here why do we use toarray()
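If the code in question is scikit-learn's `CountVectorizer` (an assumption, since the snippet is not visible here), the likely reason is that `fit_transform` returns a SciPy sparse matrix, and `toarray()` converts it to a regular dense NumPy array that can be printed or inspected easily. A minimal sketch with a made-up corpus:

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat", "the cat sat on the mat"]

vectorizer = CountVectorizer()
sparse_counts = vectorizer.fit_transform(corpus)  # SciPy sparse matrix
dense_counts = sparse_counts.toarray()            # plain NumPy array

vocabulary = sorted(vectorizer.vocabulary_)       # column order is alphabetical
is_sparse = issparse(sparse_counts)
```

Most scikit-learn estimators accept the sparse matrix directly, so the conversion is mainly needed for display or for code that requires dense input; on a large vocabulary the dense array can take a lot of memory.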

iron ruin
#

self - Reminder to not always set CV so high for parameter tuning, unless absolutely necessary

flat sigil
#

well shit

#

i messed something up 💀

flat sigil
#

i am scrapping and completely redesigning my reward function

#

cuz obviously something aint working

buoyant sapphire
#
C:\Users\Adam\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
charred sandal
#

hey
i have detection model for potholes, which is intended to work on the DVR set up on the driving car
currently I have only class potholes, but I need to diversify it and create new classes based on the size of the potholes : small_pothole, medium_pothole, large_pothole
should I retrain the model, or should I leverage the area of the potholes based on the pixels, such as
bbox_area = (x_max - x_min) * (y_max - y_min)

if bbox_area > area_threshold_large:
return "large_pothole"
elif ...
else...
?
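A runnable sketch of the thresholding option, assuming boxes come as pixel coordinates `(x_min, y_min, x_max, y_max)`. The threshold values and the function name are placeholders that would have to be tuned on real footage.

```python
def classify_pothole(x_min, y_min, x_max, y_max,
                     small_max_area=2_000, medium_max_area=10_000):
    """Map a bounding box to a size label using its pixel area.

    The two thresholds are hypothetical defaults, not measured values.
    """
    width = max(0, x_max - x_min)
    height = max(0, y_max - y_min)
    bbox_area = width * height

    if bbox_area > medium_max_area:
        return "large_pothole"
    if bbox_area > small_max_area:
        return "medium_pothole"
    return "small_pothole"


labels = [
    classify_pothole(10, 10, 40, 40),     # 30 x 30 = 900 px
    classify_pothole(0, 0, 100, 50),      # 100 x 50 = 5000 px
    classify_pothole(0, 0, 200, 100),     # 200 x 100 = 20000 px
]
```

One caveat with pure pixel area on dashcam footage: a pothole far from the camera covers fewer pixels than the same pothole up close, so the thresholds may need to depend on where in the frame the box sits.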

orchid forge
#

Hey data people

#

As a non-passionate data analysis student, what do you guys think I need to understand when I'm finding it hard to focus on a project and sometimes lose motivation while making one?

orchid forge
#

How to do that ?

#

Maybe my brain thinks everything is hard for it to understand until I do, you need to know that I'm not too sharp

pliant heron
#
# now lets try to convert all the cells in the date column into dates via to_datetime()
import pandas as pd
df = pd.read_csv('dirtydata.csv')
df["Date"] = pd.to_datetime(df["Date"])
print(df.to_string())

I am getting error, can someone tell me whats wrong here
i am trying to clean the date formatted in wrong way
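Without the traceback it is hard to say for sure, but a common cause is that the column mixes date formats or contains values that are not dates at all, which makes `pd.to_datetime` raise. A minimal sketch with made-up data (not the actual `dirtydata.csv`): `errors="coerce"` turns anything that cannot be parsed into `NaT` instead of raising.

```python
import pandas as pd

# Stand-in for the "Date" column: two formats plus a missing value.
df = pd.DataFrame({"Date": ["2020/12/01", "2020/12/02", "20201226", None]})

# Unparseable or missing entries become NaT rather than an exception.
# On pandas 2.0+, format="mixed" is another option for mixed formats.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Rows whose date could not be recovered can then be dropped.
clean = df.dropna(subset=["Date"])
```

Depending on the pandas version, the compact `20201226` value may either be parsed or end up as `NaT`, so it is worth checking which rows were dropped.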

hushed pike
#

I always get ModuleNotFoundError: No module named 'loss_functions' in the following lines:

  File "/home/user/backend-project/core/scripts/classification.py", line 13, in classify_single_label
    checkpoint = torch.load(model_path, map_location=device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/backend-project/.venv/lib/python3.11/site-packages/torch/serialization.py", line 1025, in load
    return _load(opened_zipfile,
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/backend-project/.venv/lib/python3.11/site-packages/torch/serialization.py", line 1446, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "/home/user/backend-project/.venv/lib/python3.11/site-packages/torch/serialization.py", line 1439, in find_class
    return super().find_class(mod_name, name)

Problem is, that this function (which I have used in another project for training my model) is never used in this new API project. Anyone ever experienced something similar?

left tartan
arctic wedgeBOT
#
Traceback

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
        ~~~~^~~
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

left tartan
#

You (or someone) trained some model, pickled it, then are trying to load it... but to load it, you need the modules that it uses.
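To illustrate with the standard library only: pickle stores functions and classes as a dotted import path, not as code, so unpickling fails when that path cannot be imported, even if the object is never called afterwards. The module name `loss_functions_demo` and its function are made up for this sketch.

```python
import pickle
import sys
import types

# Build a throwaway module in memory, standing in for a training-only module.
module = types.ModuleType("loss_functions_demo")
exec("def focal_loss(x):\n    return x * 2\n", module.__dict__)
sys.modules["loss_functions_demo"] = module

# The payload only records "loss_functions_demo.focal_loss", not the code.
payload = pickle.dumps({"criterion": module.focal_loss})

# Simulate the new project, where the module is not importable.
del sys.modules["loss_functions_demo"]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = str(exc)

# Once the module is importable under the same name, loading works again.
sys.modules["loss_functions_demo"] = module
restored = pickle.loads(payload)
```

Since the error names a top-level `loss_functions`, the file presumably has to be importable under exactly that name (for example by having its directory on `sys.path`), not under a new package path. Saving only `model.state_dict()` instead of whole objects tends to avoid this kind of dependency.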

hushed pike
# left tartan You (or someone) trained some model, pickled it, then are trying to load it... b...

I've trained the model(s). I also know which module is needed, but I was so confused, as it is never used anywhere else outside of training. But pasting the module back to its original location in the new project, where I changed all locations for each file (since I am building the backend now, with a proper api structure), didn't work. Also new locations weren't accepted as well.

Is there a way to find out where the module should be located at?

Traceback will be added via pastebin, as the python deleted my message...

#

*python bot

left tartan
hushed pike
#

No, custom made (by me, my code) module, consisting of 2 functions. It is really small and only used for the loss calculation in training.

left tartan
#

Sorry, missed the last msg, one sec

#

How is your code laid out? Just test to make sure your module can be imported by a simple program

late lichen
#

What activation function I can use on hidden nodes and output nodes?

agile cobalt
#

hidden nodes: nearly any can work, there isn't one clear winner for all cases, but ReLU is a popular choice
output nodes: if you mean the final output, very problem dependent but much of the time you wouldn't include one - The output of the model has to match the properties of your target variable

Most tutorials should mention which activation functions you should use for a particular problem, and you can always check which ones popular architectures are using
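A plain-Python sketch of how the output activation follows the target (the numbers are arbitrary; real frameworks provide these functions already):

```python
import math


def relu(x):
    """Common hidden-layer activation: zero for negatives, identity otherwise."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes a single raw output into (0, 1), e.g. for binary targets."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Turns a list of raw outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


raw_outputs = [2.0, -1.0, 0.5]

hidden = [relu(v) for v in raw_outputs]       # hidden layer: ReLU
regression_prediction = raw_outputs[0]        # real-valued target: no activation
binary_probability = sigmoid(raw_outputs[0])  # yes/no target: sigmoid
class_probabilities = softmax(raw_outputs)    # one-of-N target: softmax
```

In PyTorch, losses such as `CrossEntropyLoss` and `BCEWithLogitsLoss` expect the raw outputs and apply the softmax or sigmoid internally, which is one reason the model itself often ends without an activation.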

wooden sail
#

neurips does double blind and open review (the papers are put up somewhere where anyone can comment and leave feedback, plus reviewers are assigned to it)

#

as for the quality... it's a huge conference with tens of thousands of submissions, so there have been concerns with the quality of the reviews

native narwhal
#

how do i find bends in this image?

agile cobalt
#

you'll have to be more specific about what you are asking

past meteor
#

Yes, this table represents the correlations between each variable, the correlation is 0.389583. What exactly do you not understand?

#

On the diagonal you're computing the correlation between a variable and itself, which is obviously 1 (look at the formula again if you're unsure)

Afterwards you compute the correlation between weight and height, this is exactly the same as computing the correlation between height and weight (look at the formula again if you're unsure about this one as well)
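Both properties are easy to verify by computing the matrix by hand. The height and weight values below are invented and will not reproduce the 0.389583 from the table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 61, 63, 80, 79],
}

# Every pair of variables, including each variable with itself.
matrix = {a: {b: pearson(data[a], data[b]) for b in data} for a in data}
```

The formula is symmetric in its two arguments, which is why `matrix["height"]["weight"]` and `matrix["weight"]["height"]` come out identical, and plugging the same variable in twice makes the numerator equal the denominator.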

spring field
#

do I just replace my SingleHeadedAttention with this? which one? try with both?
what's x_bcd 👀
(what are all these attention types)

#

it's moments like that that make me wonder, well, surely I'm in the valley of despair, but maybe it's a logarithmic scale

crimson schooner
#

Hey, I am new to PyTorch and trying to train and evaluate a basic model. The training is not working very well. Is there someone here I could ask some questions about this in DMs? Maybe get some tips and hints.

past meteor
spring field
#

mmm, I use multiheaded attention, the singleheaded attention class is so I can expand it more easily and such or is that what you meant?

#

what? I'm pretty sure I'm doing it pretty much how it's described

#

I don't remember anything about that pithink

#

the shape of the output of the multiheaded sublayer is the same as the shape of a single head

#

is that not what you mean?

#

the HIDDEN_SIZE is the size of a single head

#

yeah, that's what I have

#

concat dotted with the fc layer

#

it's for a single head

#

yeah, so the input gets fed to each head, each of which has a hidden size of 64, each of them output a shape of (64, 64), that gets concatenated, so you get (64, 512) that then gets dotted with (512, 64) and the output of the whole multiheaded sublayer is (64, 64)
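Those shapes can be checked with a few lines of NumPy. Random arrays stand in for the actual head outputs, and the number of heads (8) is inferred from 512 / 64.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden_size, num_heads = 64, 64, 8

# Each head produces a (seq_len, hidden_size) output.
head_outputs = [rng.normal(size=(seq_len, hidden_size)) for _ in range(num_heads)]

# Concatenate along the feature axis: (64, 8 * 64) = (64, 512).
concat = np.concatenate(head_outputs, axis=-1)

# The final fully connected layer maps 512 features back down to 64.
w_out = rng.normal(size=(num_heads * hidden_size, hidden_size))
sublayer_output = concat @ w_out  # (64, 64)
```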

#

mmm, should it? it just duplicates the input for each head pretty much

lone hollow
#

Are you guys aware of any hackathons where aiml skills are used?
I only hear web dev guys going and rocking there

wooden sail
#

i just wanna highlight that you don't need it to be a rectangular matrix for it to be a projection

#

(in fact rectangular matrices don't define projections at all, but the resulting vector space is isomorphic to a low dimensional subspace of the domain)

#

that it does, yes, but still you don't need it to be rectangular for that to happen

#

you also don't

#

you noted yourself that there is an implementation that uses dropout instead of those rectangular matrices

spring field
#

I'm deeply lost now

#

what is token embed?

wooden sail
#

dropout is a projection onto a low dimensional subspace. a proper one, too. square matrix and idempotent, rank deficient

#

an identity with missing 1s on the diagonal
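A small NumPy check of that claim, with a fixed mask chosen by hand so the result is deterministic:

```python
import numpy as np

# 1 = unit kept, 0 = unit dropped.
keep = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])

# Dropout mask written as a matrix: identity with some diagonal 1s removed.
projection = np.diag(keep)

x = np.arange(1.0, 7.0)
dropped = projection @ x

is_square = projection.shape[0] == projection.shape[1]
is_idempotent = bool(np.allclose(projection @ projection, projection))
rank = int(np.linalg.matrix_rank(projection))
matches_elementwise_mask = bool(np.allclose(dropped, keep * x))
```

Note that common training-time implementations also rescale the kept units by 1 / (1 - p); with that scaling the matrix is no longer exactly idempotent, so the projection view applies to the unscaled mask.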

#

they really don't

#

good for them, but they don't need to be

#

here

#

by enforcing low rank with some other condition

#

e.g. requiring the matrix to be diagonal and only a specified number of nonzero entries

#

implementation in paper and in code are two different things as well

#

in the example i gave you above of punctured identity mats, you can use a sparse array representation that is pretty efficient

#

the matrix also has square root as many parameters. win-win

#

i'm not bringing it up to be pedantic, but rather to highlight that there are multiple ways of achieving the same effect, and as you noted, not all of them have been tried. they have different properties and entail different costs. the rectangular approach certainly achieves the effect you want, but i don't want either of you to be chained to that approach and/or wonder why all of a sudden people do something different in another paper

#

the main idea is low dimensional subspaces, and you can get those in more than one way

spring field
#

ok, that visualization really messed with me

#

I just have this pretty much

#

alright

#

here's what I understand

#

in_features of each head correspond to embedding dimensions

#

which is what I have

#

it just happens that they are the same value

#

alright, in that case, I have the architecture implemented correctly I just need to ensure that embedding dim == number of heads * hidden size

#

phew

#

alright, but each head still receives the same input, right? like it's duplicated across each head, just each head has its own weights

#

alright, I see now, I misunderstood you at first

hollow escarp
#

Has anyone ever installed onnxruntime for the armv7 architecture?

dull radish
#

Hey so I was deploying a simple model with the following code in the predict function

```
    item = Item(
        prompt=prompt,
    )

    messages = [
        {
            "role": "system",
            "content": "You are an expert programming assistant",
        },
        {"role": "user", "prompt": item.prompt},
    ]

    outputs = pipe(
        messages,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        stop_sequence="<|im_end|>",
    )

    return outputs[0]["generated_text"][-1]["content"]
```

and I'm getting this error when calling the function:
```
{
    "run_id": "5e5b8f00-2587-93e4-96f5-bf23009ee062",
    "result": {
        "error": "When passing chat dicts as input, each dict must have a 'role' and 'content' key."
    },
    "run_time_ms": 26864.662170410156
}
```
My function call was just a simple:
```
{
    "prompt": "Program to add 3 numbers"
}
```
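The error message points at the second message dict, which uses a `"prompt"` key where the pipeline expects `"content"`. A small sketch that reproduces the check in plain Python; `find_malformed_messages` is a hypothetical helper written for this example, not part of transformers.

```python
REQUIRED_KEYS = {"role", "content"}


def find_malformed_messages(messages):
    """Return the indices of chat dicts missing a 'role' or 'content' key."""
    return [i for i, message in enumerate(messages)
            if not REQUIRED_KEYS <= message.keys()]


user_prompt = "Program to add 3 numbers"

broken_messages = [
    {"role": "system", "content": "You are an expert programming assistant"},
    {"role": "user", "prompt": user_prompt},   # wrong key
]

fixed_messages = [
    {"role": "system", "content": "You are an expert programming assistant"},
    {"role": "user", "content": user_prompt},  # key renamed to "content"
]

broken_indices = find_malformed_messages(broken_messages)
fixed_indices = find_malformed_messages(fixed_messages)
```

Only the key name inside the message dict needs to change; the `"prompt"` field of the request body can stay as it is.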
spring field
#

what is x_bcc supposed to be?

#

I'll assume dotproducts_bcc

#

well, it is

#

in the code you sent

#

in softmax

#

lol, what

#

also how is this supposed to work if the matrices are not square?

metric_dd = (self.coefmatrix_dd.weight + self.coefmatrix_dd.weight.transpose(-1, -2)) / 2
#

no, but why are you in vr 😄

#

well, I changed it

#

but why wouldn't it generalize over non-square matrices?

#

alright, I named it EMBEDDING_DIM and HIDDEN_SIZE btw

#

I assume K is the head hidden size

#

well, not to me
at least for now...

#

wait, why did it change to D, K for kk?

#

alr

#

kd seems to be the right one

#

also bit of a technicality, but why not use .forward and set bias=False or use a Parameter(Tensor(...))

deft solar
#

any recommended projects for data analyst or data scientist

spring field
#

anyway, it's running, but I changed it to projection_kd instead of transposing

#

makes sense, but then why not just create a Tensor param

#

ah, I see

#

makes sense

honest reef
#

can someone guide me where to start with data science? currently i know python and some bash, can bash be helpful? where should i look for algorithms and such...

#

I know python is not the only necessary tool to use, but i want to begin from somewhere

spring field
#

bash can probably be useful for deployments

warm pebble
#

hello i need help making an object detection ai

#

i have no idea how to start

honest reef
spring field
#

probably not necessary rn, no
you can take a look at the pinned messages here
same for @warm pebble

honest reef
#

ok I'll take a look at them, thanks

spring field
#

something seems off with that testing accuracy

#

does it mask everything it needs to mask though?

#

doesn't it need to mask below the sequence as well?

#

I mean like this

#

also as I understand masking is done only for the decoder column, right?

#

uhhh, is it because packing the sequence is gonna throw away the rest?

#

what's in those places then? the c has to have the same shape as the rest of everything regardless of the actual sequence size

#

now I'm confused

#

yeah

#

yes

#

but like what if there are other lengths in that batch?

#

actually, gimme a sec, I need to check something, lol

#

waiiit nooo

#

it gets padded

#

the dataset I have has varying lengths of text as far as I'm aware

#

and it doesn't get preprocessed to cut that down

#

it just gets padded with zeros

#

now, I'm not saying that's how it should be, but that's how it was handed to me

#

the dataset is a bunch of sentences of varying length, yes

spring field
#

classification as in sentiment analysis?

#

mmm

#

I'm slightly veering into the RNN territory again, thinking of those architectures, lol

#

I think I saw those graphs where you thought there might be a leak

spring field
#

and that makes the training incredibly slow

spring field
#

sth about that test accuracy ain't looking good still

spring field
#

status update

spring field
#

alright, I have a feeling I messed up somewhere...

spring field
#

it appears the issue was too many dimensions for the embeddings?
yeah, that seems to be it, dam

spring field
# spring field I mean like this

I also found the median sentence length and sliced all the sentences that were long enough to that length, so I have a fixed context size across the entire dataset. That way I don't have to use that dang slow for loop to do this whole thing, I can just use the triangle mask

#

I can't believe that doing what I did just now made it so much faster, like, it took a couple of minutes to reach similar accuracy, when before it took like 6 hours...

#

so anyway, this is what I got with my model (but like, improved as of today)

#

this is this, it took it longer to reach similar accuracy as you can see, but it was certainly more interesting to see how the attention matrix developed over time (or maybe that's because it just took longer to develop, lol)

spring field
#

and this is for this, as you had said, it's rather similar to what I have
I modified the code a bit to do the projection thingy

```python
class SingleHeadedAttention(torch.nn.Module):
    def __init__(self, mask: bool):
        super().__init__()
        self.projection_kd = torch.nn.Linear(in_features=EMBEDDING_DIM, out_features=HIDDEN_SIZE)
        self.coefmatrix_kk = torch.nn.Linear(in_features=HIDDEN_SIZE, out_features=HIDDEN_SIZE)

        self.mask = mask

    def forward(self, x_bcd, lengths, y=None, soft_attention=False):
        x_bkc = self.projection_kd.weight @ x_bcd.transpose(1, 2)
        x_bck = x_bkc.transpose(1, 2)
        dotproducts_bcc = x_bck @ self.coefmatrix_kk.weight @ x_bkc

        if self.mask:
            seq_size = x_bck.size(1)
            mask = torch.tril(torch.ones(seq_size, seq_size)).to(DEVICE)
            dotproducts_bcc = dotproducts_bcc.masked_fill(mask == 0, value=-torch.inf)

        dotproducts_bcc = torch.softmax(dotproducts_bcc, dim=-1)
        y_bcd = dotproducts_bcc @ x_bck

        if soft_attention:
            return y_bcd, dotproducts_bcc
        else:
            return y_bcd
```
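
To make the masking step in that forward pass concrete, here is a minimal sketch of the lower-triangular (causal) mask applied to a batch of raw attention scores. The shapes (batch of 2, context of 4) and the random scores are assumptions for illustration only, and the mask is created on the same device as the scores instead of relying on a global `DEVICE`:

```python
import torch

torch.manual_seed(0)
scores_bcc = torch.randn(2, 4, 4)  # (batch, context, context) raw dot products

seq_size = scores_bcc.size(-1)
# True on and below the diagonal: position i may attend to positions <= i.
causal_mask = torch.tril(
    torch.ones(seq_size, seq_size, dtype=torch.bool, device=scores_bcc.device)
)

# Future positions get -inf, so softmax assigns them exactly zero weight.
masked_bcc = scores_bcc.masked_fill(~causal_mask, float("-inf"))
attention_bcc = torch.softmax(masked_bcc, dim=-1)

print(attention_bcc[0])
```

Every row keeps at least its diagonal entry, so the softmax never sees a row of only `-inf` and each row still sums to 1.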
gritty vessel
#

```python
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
```
```
2 Physical GPUs, 2 Logical GPUs
```
why am i getting this?

#

i have only one gpu on my system

frosty fulcrum
#

Does anyone know the best Python library to under-sample the regression dataset to deal with an imbalanced dataset? I've tried resreg, but it's not really helpful since I can't control the under-sampling dataset, and the other one is smoter, which is extremely slow.

#

i'm not the one who created the dataset.

past meteor
#

You're better off using some sort of cost function

#

They added this in the latest version of sklearn. Now you don't need to do it manually

gritty vessel
#

hey how to decide which scaling to apply on data?

past meteor
# gritty vessel hey how to decide which scaling to apply on data?

Models that use gradient descent or L2/L1 regularization need some sort of scaling, frequently standard scaling is applied. Imo it's a nice and easy exercise to figure out why that's the case (just look at the equations).

Models that don't fall into this category (famously, tree based models) don't necessarily need scaling but I frequently do it anyway.

I'd say the biggest downside of standard scaling is that robust statistics aren't used. If you have outliers they can skew your mean and standard deviation. That would be an argument for using a different scaler (e.g., one based on the median and interquartile range; min-max is even more sensitive to outliers). Defaulting to standard scaling is a good idea though.
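
A quick way to see the effect of a single outlier, using only the standard library. The data is made up for illustration, and `standard_scale` / `robust_scale` are hypothetical helper names; the second one swaps in a median and interquartile-range scaler as one example of a more robust alternative:

```python
import statistics


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Subtract the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]  # one extreme outlier

print(standard_scale(data)[:5])  # the five inliers end up almost identical
print(robust_scale(data)[:5])    # the five inliers keep a usable spread
```

With standard scaling the outlier inflates both the mean and the standard deviation, so the ordinary values collapse into a narrow band; the median and interquartile range barely move.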

gritty vessel
past meteor
#

sure

gritty vessel
#

1st and 2nd channels are in exponential distribution and 3 and 4th channel are in normal distribution

past meteor
#

I'd start by standard scaling all of them

gritty vessel
#

ok so apply standard scaling to all data

past meteor
#

It'd be great if you figure out why (it's not a hard exercise)

gritty vessel
#

to bring it in same range?

past meteor
#

So really just take the equation of MSE loss + L2 regularization

spring field
#

just saying but I'm masking only in decoder's self-attention, the decoder-encoder attention doesn't do masking and neither does encoder's self-attention

past meteor
#

And think about "what happens to my regularization term if my variables are on different scales"

#

this one

gritty vessel
#

oh ok it would be biased towards features with larger scale

past meteor
gritty vessel
#

ok so its directly proportional to magnitude of weights

past meteor
#

And to come full circle, the way you normalize is typically standard scaling, but if you have outliers you may want a robust scaler (median and interquartile range) or something else that won't destroy your data.
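
The regularization exercise can also be checked numerically. A minimal sketch, assuming a one-feature linear model `y = w * x` and made-up data: expressing the same feature in different units changes the weight needed to fit the data, and with it the size of an L2 penalty `lam * w**2`, even though the predictions are identical. `fit_weight` and `l2_penalty` are hypothetical helper names.

```python
def fit_weight(xs, ys):
    """Least-squares weight for y = w * x (no intercept, no regularization)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def l2_penalty(w, lam=1.0):
    """L2 regularization term for a single weight."""
    return lam * w * w


heights_m = [1.5, 1.6, 1.7, 1.8]            # feature in metres
targets = [3.0, 3.2, 3.4, 3.6]              # y = 2 * height in metres
heights_mm = [h * 1000 for h in heights_m]  # same feature in millimetres

w_m = fit_weight(heights_m, targets)    # close to 2
w_mm = fit_weight(heights_mm, targets)  # close to 0.002

# Same fit, but the penalties differ by a factor of about a million.
print(l2_penalty(w_m), l2_penalty(w_mm))
```

A feature on a large numeric scale gets a small weight and is barely penalized, while a feature on a small scale is penalized heavily. Scaling all features first puts the weights on a comparable footing.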

peak ridge
past meteor
gritty vessel
#

it's real life data so some anomalies in it should be learned by the model

past meteor
#

Sure, but try some of the things out I mentioned. Either empirically (with numpy) or by looking at the equations

#

Then the whole scaling thing will be clear to you

gritty vessel
finite sierra
#

how can I find the intersection of 3 indexes in pandas? the Index.intersection method only accepts 1 other index.
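
One way to do this is to chain the pairwise call, or fold it over a list with `functools.reduce`. A minimal sketch with made-up indexes (`idx_a`, `idx_b`, `idx_c` are placeholder names):

```python
from functools import reduce

import pandas as pd

idx_a = pd.Index([1, 2, 3, 4, 5])
idx_b = pd.Index([2, 3, 4, 6])
idx_c = pd.Index([0, 3, 4, 5])

# Index.intersection takes one other index, so chain it pairwise.
common = idx_a.intersection(idx_b).intersection(idx_c)

# The same idea for any number of indexes.
common_reduce = reduce(
    lambda left, right: left.intersection(right), [idx_a, idx_b, idx_c]
)

print(common.tolist())
```

The `reduce` form is handy when the number of indexes is not known in advance.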

spring field
#

I, uhh, lost them, btw, how do I combine the attention from all heads in all layers in all blocks?

#

cuz, I'm gonna have to rerun it, cuz like, yeah... I'll also make the attention look nicer and across multiple sequences and add the sequence values as well

#

I mean, it converged pretty quickly

#

so it won't take long once I set it up

#

wait

#

but attention is bcc

#

yes

#

that's what I meant

#

I realize how it could've been misunderstood

#

no, I want to know how to combine the attention score matrices from all heads in the whole network

#

well, yes

#

but like, which attention scores do you want?

#

I've been using one of the heads of the last encoder-decoder attention sub layer

spring field
#

oh right... that's very interesting

peak ridge
spring field
#

but how do you do a decoder without the encoder-decoder sub-layer?

#

isn't that just an encoder with masking or sth?

#

dam

#

what is encoder + decoder used for then?

#

translation?

#

mmmmm, right

#

I thought you'd do the same with next token, though it did make me wonder why the same input is embedded in different embeddings 😁

#

(nope)

spring field
#

mmm, I see

peak ridge
#

i have a very surface level question for u then

spring field
#

is RAG just feeding the entire context to a transformer and then you just append your question to the end of that and it does next token prediction?

past meteor
peak ridge
#

nope

spring field
past meteor
#

After a long discussion with Edd and Etrota on this topic ...

#

it's basically what I learnt in search engines and information retrieval years ago

#

but with LLMs 🤷

peak ridge
#

yes sir

#

do u have exp in rag @past meteor

past meteor
spring field
#

alright, I'll cover RAG at some point in the future, I'm here trying to even understand transformers to a reasonable extent, they're too magical... and I mean, they literally are since it's not like we actually understand how they work, do we?

past meteor
#

^ that's what I always say

#

I think understanding the intuitions of self attention, multihead attention and so on isn't too hard but I'm not always convinced our intuition of the methods is aligned with how and especially why they work

spring field
#

well, we might have an "intuition"

peak ridge
#

so i've implemented these things (room for improvement)
I manually coded it but in the meantime LangChain has built-in classes and functions for it

Should i use it or my manually coded one

#

this is the working thing.
just need review

spring field
#

so, basically I was trying to translate English to English... that's hilarious

peak ridge
#

@spring field @past meteor maybe have a look, please?
i want u to review it

past meteor
#

I'd prefer it if Maud'dib would have a look in my stead

spring field
#

I'm not sure what to review there... as I said, I have a surface level understanding of RAG

past meteor
#

Make it blue

peak ridge
#

hmm

spring field
#

I mean the code looks alright at first glance

peak ridge
#

ya and it's working but the fear is i coded it manually

#

and aint using langchain libraries/classes for ReOrdering Text

spring field
#

also that format_docs function is a bit pointless

peak ridge
#

okay

#

💀

peak ridge
#

i have no real exp in this field 😦
but our product is fully based on this

#

i will go bankrupt

spring field
#

just to recap
next token prediction with transformers is done using only encoders, but you mask the self-attention sub-layer?
because I see structures with "decoder-only", but like, that implies throwing out the whole encoder-decoder sub-layer and I don't like it

peak ridge
#

hmm

#

🧐

spring field
#

decoder-only doesn't make sense, it has to be a hybrid between the two, according to the paper

#

or can you also not mask the attention?

peak ridge
#

okay okay

#

can we draw a conclusion out of this

#

so i can work on it

#

💀 please?

spring field
#

but if you mask it, your attention score is also masked? or did I forget the order of masking

peak ridge
#

No errors,

Want to make it better

#

no errors,
all good and working

need to make it

#

Better output from RAG
More relevant, more precise, better

#

so true

spring field
#

oh, do you do the whole dot prod and softmax, then use that as the attention score and then mask the thing that's going to output?

peak ridge
#

i dont understand this

#

@spring field talks are above my knowledge

past meteor
#

GL

peak ridge
#

hm

spring field
#

right, makes sense

spring field
#

sth like that

flint gazelle
#

Is it possible to create a network in pytorch that only uses an int datatype, so the weights, input and output are all int? I've tried to make this work but it always returns
RuntimeError: Only Tensors of floating point and complex dtype can require gradients

spring field
flint gazelle
#

i need really fast int8_t operations in c++ with simd avx2 instructions

spring field
#

but why ints?

#

simd instructions can handle floats as well

#

ig not as many as int8_t at once, but yk

#

why do you even need simd, it won't help you that much, better if you can run stuff on a GPU

buoyant vine
#

tbf a lot of times float32 operations on intel chips can be faster than their integer versions

#

with a lot of integer operations you have the caveat of the overflow handling which normally limits how many lanes you can actually use even if you can hold more

spring field
#

also with int8_t you might easily run into overflow issues with neural nets

buoyant vine
#

yeah, normally you end up going 16 x int8 ops instead of 32 x int8 ops moving up into int16 results

#

unless you don't care about the saturation or overflows, but then that can cause behaviour differences between archs

#

also some things, like any integer division, are incredibly expensive compared to fp

flint gazelle
#

i thought of int8xint8 = int16 and then as activation function clamp [0,127] and so on

wooden sail
#

classical derivatives are only defined over the reals, not over the ints

#

you wouldn't get correct results with autograd in general, since derivatives could generally map into the rationals or reals

#

rounding/casting after differentiation is also generally not correct

flint gazelle
#

yeah but for clamp its just 0 or 1

#

the derivative

buoyant vine
#

probably worth a note though

#

if you want the best speed

#

Your buffers need to be aligned to 64 bytes

#

and your operations need to cut the branching down so you're doing about 64 values per loop call

flint gazelle
#

yes i have already tested with float and i have aligned it to the cache line size

buoyant vine
#

Also probably want an AMD cpu rather than intel most of the time

flint gazelle
#
```cpp
template<uint64_t inputSize, uint64_t outputSize>
void layer_32(Layer<inputSize, outputSize>& layer_16, std::array<float, outputSize>& output, const std::array<float, inputSize> input)
{
    alignas(64) float arr[8];

    for (uint64_t j = 0; j < outputSize; ++j) {
        output[j] = layer_16.bias[j];
        for (uint64_t i = 0; i < inputSize; i += 32) {

            __m256 _weights0 = _mm256_load_ps(&layer_16.weights[j][i]);
            __m256 _weights1 = _mm256_load_ps(&layer_16.weights[j][i + 8]);
            __m256 _weights2 = _mm256_load_ps(&layer_16.weights[j][i + 16]);
            __m256 _weights3 = _mm256_load_ps(&layer_16.weights[j][i + 24]);

            __m256 _input0 = _mm256_load_ps(&input[i]);
            __m256 _input1 = _mm256_load_ps(&input[i + 8]);
            __m256 _input2 = _mm256_load_ps(&input[i + 16]);
            __m256 _input3 = _mm256_load_ps(&input[i + 24]);

            __m256 out0 = _mm256_mul_ps(_weights0, _input0);
            __m256 out1 = _mm256_fmadd_ps(_weights1, _input1,out0);
            __m256 out2 = _mm256_fmadd_ps(_weights2, _input2,out1);
            __m256 out3 = _mm256_fmadd_ps(_weights3, _input3,out2);


            __m256 temp = _mm256_hadd_ps(out3, out3);
            temp = _mm256_hadd_ps(temp, temp);

            _mm256_store_ps(arr, temp);

            output[j] += arr[0] + arr[4];
        }
    }
}
```
#

this is my matrix multiplication

#

for a 1d input array and a 1d output array

#

but with int8 i could make this way faster, the only thing i need is to somehow train a model that is accurate enough

buoyant vine
#

you are losing a ton of performance with how you have structured your ops btw

#

and the copying of memory per iteration

flint gazelle
#

i dont quite follow

agile cobalt
buoyant vine
#

so SIMD instructions are basically 1 instruction, but they are not 1 instruction = 1 cycle

#

and sometimes multiple SIMD instructions can be executed within 1 cycle

#

which is where the whole "uops" stuff comes about

flint gazelle
buoyant vine
#

but currently, you are bottlenecking yourself at major points, from a quick glance:

__m256 out0 = _mm256_mul_ps(_weights0, _input0); becomes your dependency on the execution below so

            __m256 out1 = _mm256_fmadd_ps(_weights1, _input1,out0);
            __m256 out2 = _mm256_fmadd_ps(_weights2, _input2,out1);
            __m256 out3 = _mm256_fmadd_ps(_weights3, _input3,out2);

Each step here is now having to wait on the previous instruction to finish before executing the next

#

so instead of the CPU being able to do this step in 2 cycles, its execution time jumps to 4 cycles (normally) and you have the added latency for each instruction

#

which is normally ~7-10

flint gazelle
#

i was testing this implementation, before that i had:

```cpp
template<uint64_t inputSize, uint64_t outputSize>
void layer_32(Layer<inputSize, outputSize>& layer_16, std::array<float, outputSize>& output, const std::array<float, inputSize> input)
{
    alignas(64) float arr[8];

    for (uint64_t j = 0; j < outputSize; ++j) {
        output[j] = layer_16.bias[j];
        for (uint64_t i = 0; i < inputSize; i += 32) {

            __m256 _weights0 = _mm256_load_ps(&layer_16.weights[j][i]);
            __m256 _weights1 = _mm256_load_ps(&layer_16.weights[j][i + 8]);
            __m256 _weights2 = _mm256_load_ps(&layer_16.weights[j][i + 16]);
            __m256 _weights3 = _mm256_load_ps(&layer_16.weights[j][i + 24]);

            __m256 _input0 = _mm256_load_ps(&input[i]);
            __m256 _input1 = _mm256_load_ps(&input[i + 8]);
            __m256 _input2 = _mm256_load_ps(&input[i + 16]);
            __m256 _input3 = _mm256_load_ps(&input[i + 24]);

            __m256 out0 = _mm256_mul_ps(_weights0, _input0);
            __m256 out1 = _mm256_mul_ps(_weights1, _input1);
            __m256 out2 = _mm256_mul_ps(_weights2, _input2);
            __m256 out3 = _mm256_mul_ps(_weights3, _input3);

            out0 = _mm256_add_ps(out0, out1);
            out2 = _mm256_add_ps(out2, out3);

            out0 = _mm256_add_ps(out0, out2);

            __m256 temp = _mm256_hadd_ps(out0, out0);
            temp = _mm256_hadd_ps(temp, temp);

            _mm256_store_ps(arr, temp);

            output[j] += arr[0] + arr[4];
        }
    }
}
```
buoyant vine
#

What does the LLVM MCA breakdown give

#

those hadds and stores look a little bit sus

flint gazelle
#

i would have to test the exact performance but running this on random data a million times both versions take around the same time

#

but these are small changes that could increase performance just a bit, my main focus is whether i can make int8xint8 = int16 work with acceptable accuracy

#

i should also mention that the input to my first layer consists of 0s and 1s, so i don't even need to convert them

buoyant vine
#

I mean you can do it, but like you said I'm not sure how well your accuracy is going to carry over

flint gazelle
#

i will try and see if i get good results, ty

teal lance
#

I moved on from tkinter to pyside6 ❤️ im a happy learner

past meteor
#

What about just training on float16 and quantising to integers of whatever you want
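
As a rough illustration of that suggestion, combined with the int8 x int8 into a wider accumulator idea from earlier in the thread, here is a minimal pure-Python sketch of symmetric post-training quantisation for one output neuron. The weights, the 0/1 inputs, and the helper names `clamp` and `quantize` are assumptions for illustration; real quantisation toolchains handle scales, zero points and calibration more carefully.

```python
def clamp(value, low, high):
    """Saturate value into the closed range [low, high]."""
    return max(low, min(high, value))


def quantize(values, scale):
    """Map floats to int8 range symmetrically: q = round(v / scale), clamped to [-127, 127]."""
    return [clamp(round(v / scale), -127, 127) for v in values]


weights = [0.42, -0.17, 0.80, -0.55]  # trained in floating point (made-up numbers)
inputs = [1, 0, 1, 1]                 # binary inputs, already integers

# One scale per weight vector, chosen so the largest weight maps to 127.
w_scale = max(abs(w) for w in weights) / 127
q_weights = quantize(weights, w_scale)

# Integer-only dot product; the accumulator must be wider than int8.
acc = sum(qw * x for qw, x in zip(q_weights, inputs))

float_out = sum(w * x for w, x in zip(weights, inputs))
dequantized_out = acc * w_scale  # converted back to float only for comparison

print(float_out, dequantized_out)
```

The two outputs agree to within the rounding error of the quantisation step, which is the accuracy cost being traded for integer arithmetic.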

flint gazelle
#

thats what im currently trying

main citrus
#

Is bias like accuracy score

#

?

sinful surge
#

anyone know YOLO ? and have any experience into that ? i just need help

lapis sequoia
#

what are the best dataset(s) for people who are kind of new to NLP?

#

Thank you

#

what about animequotes or hate speech?

#

what is the main difference between CountVectorizer and TfidfVectorizer?

spring field
#

man, these plots hit hard

#

all that fancy schmancy spatial attention and stuff

robust zodiac
#

why am i here

serene scaffold
# robust zodiac why am i here

religions attempt to answer this question. personally, I think life is more fulfilling if you don't try to ascribe purpose to your existence, and just spend it doing things that are fulfilling for you.

serene scaffold
robust zodiac
serene scaffold
robust zodiac
serene scaffold
lapis sequoia
#

I just finished my first NLP project, I think it is trash. How do I judge it objectively?

#

like, what are the tiers of skill in ML/data science whatever

serene scaffold
spring field
#

BCE can only be used when you have two classes at least for that particular output, right?
am I overthinking this or can BCE only be used for that one case where you have to predict between one of two classes? is that it? can it at all be used for multi-class classification?

#

ViT + TokenLearner 🤭

#

yeah

#
  • TokenLearner
#

which ig is really helpful if you have more transformer layers, greater hidden size, and a larger dataset than I do (apparently too many dimensions with small datasets worsens the performance in my current experience)

#

like yesterday or mby it was early morning today when I was running those gpts for next token prediction, using a hidden size of 512 basically made it so it didn't learn at all (now thinking back, maybe it could've been caused by using encoder + decoder for next token prediction...), anyway, I reduced the hidden size to 128 (16 per head) and it immediately started learning again

#

embedding dimensions

#

how many dimensions a token is embedded into

#

mhm, that might have been what's happening

#

I assume it's been tried, but what about using a quadratic instead of a linear function? instead of Linear(x, M, b) = Mx + b use sth like Quadratic(x, A, B, c) = Axx^T + Bx + c

#

the linear layers with quadratic layers? it probably has to be differentiated twice though of which the second time is gonna be a const anyway.... hmmm

past meteor
#

If you're in doubt about good initial hyperparameters a good thing you can do is make them too large and fit on a single batch

#

Your loss should go to 0, if it doesn't you have a bug. It's kind of a unit test for your architecture 😄

#

Aside from time (how long it takes to train it all) starting big and going smaller is a good idea
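
A minimal sketch of that single-batch check in PyTorch. The model, the batch of 8 random samples, the learning rate and the step count are all assumptions for illustration; the point is only that an oversized model should drive the loss on one fixed batch close to zero.

```python
import torch

torch.manual_seed(0)

# One small fixed batch with random inputs and targets.
x = torch.randn(8, 4)
y = torch.randn(8, 1)

# Deliberately oversized model for 8 samples.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

initial_loss = loss_fn(model(x), y).item()
for _ in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(model(x), y).item()

print(f"initial {initial_loss:.4f} -> final {final_loss:.6f}")
```

If the final loss stays near the initial one, the usual suspects are a broken forward pass, detached gradients, or a mismatch between outputs and targets.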

craggy agate
#

Any good guides for LSTMs for forecasting time series data?

warm pebble
#

can somebody help me make an object detection ai

#

i have no idea where to start

violet gull
warm pebble
#

Ok

flat sigil
#

finally got pytorch running with cuda yall arent ready 😈 😈

lapis sequoia
#

I do not know. I have been at this for a while. I base my self-worth on this and never ever stop doing it. I never feel like I am good and with all of this pytorch TensorFlow stuff (not that bad) like, I do not know, I always feel like I am trash. Like, I do not care about money. This data thing is a massive massive obsession. Whatever, everyone is obsessed with something

#

Like, you go pretty hard. What do I lack?

gritty vessel
#

ValueError: The filepath provided must end in `.keras` (Keras model format). Received: filepath=model_output/test01/model-{epoch:02d}-{val_loss:03f}.h5 i ran same code on my system and it worked fine but on kaggle it throws this error

#

does this mean i have to save the file with .keras extension or can i bypass it?

serene scaffold
#
```python
# not this
df['content'] = df['content'].apply(lambda x: x.lower())
# do this
df['content'] = df['content'].str.lower()
```
gritty vessel
#

mcp_save = ModelCheckpoint(os.path.join(outdir,'model-{epoch:02d}-{val_loss:03f}.h5'), save_best_only=True, monitor='val_loss', mode='min') its giving error in this line

#

so here should i add save_format = none

serene scaffold
#

in fact, every apply you use up to at least line 68 should have been a .str method.

gritty vessel
#

i searched on google and someone said in latest version they did this to increase the usage of .keras
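
Going by the error message, the checkpoint path template has to end in `.keras` rather than `.h5`. A minimal sketch of the renamed template, reusing the directory and placeholders from the error above. The `ModelCheckpoint` call is left as a comment because it needs a Keras install, and the epoch and loss values passed to `format` are made up to show what a resulting filename looks like.

```python
import os

outdir = "model_output/test01"

# Same template as before, with the extension switched from .h5 to .keras.
checkpoint_template = os.path.join(outdir, "model-{epoch:02d}-{val_loss:03f}.keras")

# mcp_save = ModelCheckpoint(checkpoint_template, save_best_only=True,
#                            monitor="val_loss", mode="min")

# The placeholders are filled in at save time; str.format shows the result.
example_path = checkpoint_template.format(epoch=3, val_loss=0.1234)
print(example_path)
```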

lapis sequoia
serene scaffold
lapis sequoia
#

Thanks

buoyant shoal
#

Hi, isn't this like none of the above?

#

my orange line is where there's the largest explained variance ratio right?

wooden sail
buoyant shoal
#

D is basically what i said right?

#

it's perpendicular to the "/"

#

technically at least

#

this is like an ellipsoid in R^2 right?

wooden sail
#

yeah

buoyant shoal
#

okay thanks

spring field
#

are variance and mse calculated the same way?
also why does sample variance divide by n - 1 instead of n?

past meteor
# spring field is variance and mse calculated the same? also why...

To correct for sample bias.

You don't have the true variance, only an estimate of it computed from a sample. Ideally the expected value of that estimate equals the actual variance; that's what it means for it to be an unbiased estimator.

The actual reason has to do with taking the expected value of the statistic (in this case the variance) and checking if it's equal to the population variance. It isn't, you need a correction to get there. That's where the term comes from. It's a typical thing you do in a statistics class, it's been a while for me
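
That expected-value check can be done exactly on a toy case. A minimal sketch, assuming a made-up three-value population and samples of size 2 drawn with replacement; it enumerates every equally likely sample and averages the two candidate estimators using exact fractions:

```python
from fractions import Fraction
from itertools import product

population = [Fraction(0), Fraction(2), Fraction(4)]
n = 2  # sample size


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


true_variance = sum_sq_dev(population) / len(population)

# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

expected_div_n = mean([sum_sq_dev(s) / n for s in samples])
expected_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print(true_variance, expected_div_n, expected_div_n_minus_1)  # 8/3 4/3 8/3
```

Dividing by n underestimates the variance by a factor of (n - 1) / n on average, because deviations are measured from the sample mean rather than the true mean; dividing by n - 1 removes that bias.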

spring field
#

I (think I) see

#

but MSE and (population) variance share the same formula, right?

past meteor
#

No they're different

#

Ah wait, I see what you mean

#

Yeah that's a correct observation

spring field
#

very cool, thanks

latent girder
#

hi, whats a good beginner data science python project?

wooden sail
#

they only do if whatever generates your estimate is "unbiased", yielding the correct value in expectation

#

otherwise the MSE is squared bias + variance

spring field
#

I meant like, it's a mean square over a bunch of differences

wooden sail
#

yes but the meaning is different

#

you can get a huge MSE with 0 variance

#

e.g. if your function just outputs 0 always, regardless of input

spring field
#

I get that (well, I understand that that's the case, I'm unsure of the deeper details)

#

I just found it surprising the formulas were conceptually the same

wooden sail
#

on purpose, so that you can study the mean and covariance :p

#

they both describe second order statistics

spring field
wooden sail
#

yes

wooden sail
# spring field oh, is that why it's a square?

https://en.wikipedia.org/wiki/Moment_(mathematics) some people call these "statistical moments"

In mathematics, the moments of a function are certain quantitative measures related to the shape of the function's graph. If the function represents mass density, then the zeroth moment is the total mass, the first moment (normalized by total mass) is the center of mass, and the second moment is the moment of inertia. If the function is a probab...

spring field
#

I'd call this an episode of enlightenment

wooden sail
#

you could generally say that the two things (variance and MSE) are "second moments". the variance is the second central moment (subtract the TRUE mean of the random variable). the MSE is the "second moment" (NOT central) of the ERROR

#

if your estimator is unbiased, then the estimator's mean is the true mean and the error is now centered at zero, turning the MSE into a second central moment (a variance)

#

the example of the 0 estimator i mentioned before is pretty important because it reminds you that the MSE doesn't tell you the nature of the error. it's up to you to verify later if it's variance or bias
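
The decomposition behind this, MSE = bias² + variance, can be checked exactly on toy numbers. A minimal sketch with made-up estimator outputs treated as equally likely; `decompose` is a hypothetical helper name:

```python
from fractions import Fraction


def mean(values):
    return sum(values) / len(values)


def decompose(estimates, true_value):
    """Return (mse, bias, variance) for equally likely estimator outputs."""
    center = mean(estimates)
    mse = mean([(e - true_value) ** 2 for e in estimates])
    bias = center - true_value
    variance = mean([(e - center) ** 2 for e in estimates])
    return mse, bias, variance


true_value = Fraction(5)

# Estimator that ignores the data and always answers 0.
zero = decompose([Fraction(0)] * 4, true_value)

# Unbiased but noisy estimator: outputs scatter symmetrically around 5.
noisy = decompose([Fraction(3), Fraction(4), Fraction(6), Fraction(7)], true_value)

for name, (mse, bias, variance) in [("zero", zero), ("noisy", noisy)]:
    print(name, mse, bias, variance, mse == bias ** 2 + variance)
```

The zero estimator has an MSE of 25 that is entirely bias, while the noisy one has an MSE of 5/2 that is entirely variance; the MSE alone does not say which situation you are in.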

spring field
#

oooh, the puzzle pieces are coming together

wooden sail
#

at any rate, the point being that the MSE formula looks like the variance formula not because the MSE is a variance, but because both the MSE and the variance are the same kind of object, a statistical second moment

spring field
#

unbiased relative to the population, right?

#

like if you do linear regression and find the true mean, your bias value might not be 0, but still unbiased, is that correct?

wooden sail
#

i see what you mean and you could technically define the bias either way

#

but we normally refer to the true parameters, so bias is defined w.r.t. the true mean and not the population one

#

this is what the correction factor zestar mentioned addresses

#

but you're mixing two things up as well

#

because i can have data of a population

#

compute the variance of that data

#

but the data has a true variance

#

so i can also compute the MSE of the variance

#

and those are two separate things 😛