#data-science-and-ml
yes you need FC layers, each neuron is basically a dimension of your output vector
How many do you need? That's a hyperparameter.
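To make the "one neuron per output dimension" point concrete, here's a minimal pure-Python sketch of a fully connected layer. The weights and biases are made up for illustration: each neuron is one row of the weight matrix and produces one entry of the output vector, so 3 neurons give a 3-dimensional output.

```python
def fully_connected(x, weights, biases):
    """Dense layer: output[i] = dot(weights[i], x) + biases[i], one entry per neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

# Hypothetical layer: 2 inputs -> 3 neurons, so the output vector has 3 dimensions
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]
biases = [0.0, 0.0, 0.5]

out = fully_connected([2.0, 3.0], weights, biases)
print(out)  # [2.0, 3.0, 5.5]
```

In PyTorch the same idea is `nn.Linear(in_features, out_features)`, where `out_features` is the number of neurons.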
so does seaborn also have an unintuitive API?
alternatives? plotly?
I'm a plotly shill now
"Dress me slowly, I'm in a hurry"
i am also a plotly shill
does someone know why my PyTorch model isn't running on my GPU? #1320119231026827316
Matplotlib is OK if you read their docs then it makes sense
BUT it's awfully verbose to do some things seaborn does easily out of the box
The annoying thing is that, imo, to use seaborn you must know matplotlib or you'll not be able to customise it properly
Hi, can anyone help me with my code? It's quite long. I tried to train a model for computer vision. For some reason it does not run to completion
Hello, when you ask for help, always ask your actual question. Don't wait for a commitment before you give the details needed to answer your question.
hi, beginner here, does a smaller dataset cause predictions to capture more noise (the randomness in the data that causes irreducible error)?
so the smaller the dataset, the more the error reflects variation in the data, is that true?
@serene scaffold ._.
We discussed this earlier, did we not? Did nothing we discuss develop your understanding?
sorry, even though i think i know the material, i don't want to leave any gaps in my knowledge
so i usually ask my teachers to confirm whether my exact understanding of the material is right or not
I've never heard anyone describe a prediction as "capturing" something.
perhaps you may need to see yourself
wait
this is the definition of f and epsilon (which is the irreducible error i was talking about earlier)
wait I did?
lol I thought I deleted a different one >.<
I DM'ed you the message
yeah give me a sec, I might have worked with a too simplistic example.
can you show figures 2.3 and 2.4?
sure
@thorny geode what does all this have to do with the size of the dataset?
this is also what i meant from "capturing the noise" (the green curve)
the green curve is overfit. that isn't "capturing noise" per se, but I see what you mean.
it's not that all the data points are "noise". it's that the green curve isn't general.
my hypothesis is that as the dataset gets larger, the errors balance each other out, so the overall prediction captures the true f much more than the error (since the error rate, or variance of the data, decreases as the sample grows). But in an extreme case, where there are only 3 data points, even a slight variance in the data can cause the prediction to "wobble" considerably, so that the prediction captures much more noise than the correlation itself
yes, the prediction includes the variance of the dataset, the noise, too much; that's what I'm thinking of as overfitting
the three curves in the left plot of figure 2.9 are not datasets. they're different models trained on that data.
Let me ask a similar question, but more concretely. I wrote a PyTorch tensor using tensor subclasses. I basically just log any function call that is made to the tensor. My tensor is called LoggingTensor. So if you have a tensor t of type LoggingTensor and you do t + 1, you'll see a print statement saying that the addition function was used.
I have this network:
```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self, bias=False):
        super(MyModel, self).__init__()
        self.input = nn.Linear(1, 2, bias=bias)
        self.hidden1 = nn.Linear(2, 2, bias=bias)
        self.hidden2 = nn.Linear(2, 2, bias=bias)
        self.output = nn.Linear(2, 2, bias=bias)
        self.init_weights()

    def forward(self, x):
        v = self.input(x)
        h = self.hidden1(v)
        k = self.hidden2(h)
        o = self.output(k)
        return o

    def init_weights(self):
        # [[w1], [w2]]
        self.input.weight = torch.nn.Parameter(torch.tensor([[0.2], [0.3]]))
        # [[w3, w4], [w5, w6]]
        self.hidden1.weight = torch.nn.Parameter(torch.tensor([[0.4, 0.5], [0.6, 0.7]]))
        # [[w7, w8], [w9, w10]]
        self.hidden2.weight = torch.nn.Parameter(
            torch.tensor([[0.8, 0.9], [0.22, 0.33]])
        )
        # [[w11, w12], [w13, w14]]
        self.output.weight = torch.nn.Parameter(
            torch.tensor([[0.44, 0.55], [0.66, 0.77]])
        )
```
It's very basic.
Now I do:
```python
bias = False
model = MyModel(bias=bias)
x = torch.tensor([6.9], requires_grad=True)
breakpoint()
prediction = model(x)
# LoggingTensorOld is my logging tensor subclass (definition not shown here)
grad_out = LoggingTensorOld(torch.ones_like(prediction))
L = prediction.sum()
prediction.backward(gradient=grad_out)
```
the models made those predictions on those datasets, right? and the prediction overfits because it's too flexible and thus takes the irreducible error into account too much
Note that the input tensor is a normal torch tensor, but I define the "output gradients" as ones and of type LoggingTensor. I pass it to backward. This lets me see a log of all the function calls during backprop. I see this: https://bpa.st/FU4Q
As you can see, we have: (My notation is simplified)
- dL/do * W
- dL/dk * W
- dL/dh * W
But also
- dL/dx * x
- dL/dh * v
- dL/dk * h
and 2-3 other things I'm not sure about yet.
Question: Why exactly do I see forward values show up?
any point on the curve represents a prediction that that model could make
hmm maybe chatgpt to the rescue for me
god maybe I'm just way too tired.
Does backprop also compute dL/dw?
yes. and then the optimizer (which might be stochastic gradient descent) decides how to use dL/dw to update the values of w
yes, that part is clear
okay so it's really just as simple as
"we derive with respect to the weight duh"
okay good, very close to figuring out my issue
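One way to see why forward values show up in the backward log: for a bias-free linear layer `y = W x` (like the layers in `MyModel`), the chain rule gives `dL/dW[i][j] = dL/dy[i] * x[j]` and `dL/dx[j] = sum_i W[i][j] * dL/dy[i]`. So the weight gradient needs the layer's input from the forward pass, which autograd saves for backward. A minimal pure-Python sketch, checked against a finite difference; the loss `L = sum(y)` mirrors `prediction.sum()`, `W` reuses the `hidden1` weights, and `x` is `v = input(6.9)`.

```python
def linear(W, x):
    """Bias-free linear layer y = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def linear_backward(W, x, grad_y):
    """Gradients of L w.r.t. W and x, given dL/dy (grad_y)."""
    # dL/dW needs the forward input x
    grad_W = [[gi * xj for xj in x] for gi in grad_y]
    # dL/dx needs the weights W
    grad_x = [sum(W[i][j] * grad_y[i] for i in range(len(W))) for j in range(len(x))]
    return grad_W, grad_x

W = [[0.4, 0.5], [0.6, 0.7]]   # hidden1 weights from MyModel
x = [1.38, 2.07]               # v = [0.2 * 6.9, 0.3 * 6.9]
grad_y = [1.0, 1.0]            # dL/dy for L = sum(y), like torch.ones_like(prediction)

grad_W, grad_x = linear_backward(W, x, grad_y)

# Finite-difference check of dL/dW[0][1]
eps = 1e-6
W_shift = [row[:] for row in W]
W_shift[0][1] += eps
numeric = (sum(linear(W_shift, x)) - sum(linear(W, x))) / eps
print(grad_W[0][1], numeric)
```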
what about derivatives between nodes e.g. dv_i/dk_j? I think that's something you could theoretically compute during the forward pass right?
yes, i just need to confirm that the more observations a dataset has, the less the model will be affected by its irreducible error, as a larger percentage of its change comes from the function that relates y to x
honestly I'm not really sure.
sure, but you don't need to. calculating the derivative of the loss function and adjusting the weights will account for the weight between any two connected nodes.
sorry, my bad, i was just really unsure of myself making the right conclusion
also keep in mind that when you view the network as one grand function, the nodes aren't really discrete things in the function. the weights are.
you don't need to be sorry. I should be sorry for not knowing. but I'm not sorry, because I am evil.
but i think from talking with you i can connect this to the fact that variance decreases with sample size.... so technically the irreducible error becomes less important
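A quick simulation of the sample-size point: the noise level σ² itself does not change with n, but the variance of an estimate built from n noisy observations (here the sample mean) shrinks roughly like σ²/n, so estimates from tiny samples "wobble" much more. A minimal stdlib sketch; the noise level, sample sizes, trial count and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)
SIGMA = 1.0  # standard deviation of the irreducible noise (assumed)

def spread_of_mean(n, trials=2000):
    """Variance of the sample mean of n noisy observations, estimated over many trials."""
    means = [statistics.fmean(random.gauss(0.0, SIGMA) for _ in range(n)) for _ in range(trials)]
    return statistics.variance(means)

for n in (3, 30, 300):
    print(n, round(spread_of_mean(n), 4), "theory:", SIGMA**2 / n)
```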
I mean it's basically the derivative of the activation I think so you do need it.

the derivative of the loss function includes the derivative of the activation functions, yes
god, I swear fucking with the backprop implementation in torch makes me lose my mind, but yeah, I think I finally solved all my theoretical issues. No idea why I got so confused. Now I can finally continue coding, at 02:43 in the morning
thanks
go to sleep
haha I will - I just had to solve my issue otherwise I couldn't have slept.
I'm confused about which loss I should use for a Siamese NN: contrastive, triplet, or cross-entropy. Which fits best?
I made this financial model for crypto and in the optuna studies ive seen some really interesting behavior with this thing.
wow
can i have a look at the code?
so u can predict the past at 40% accuracy?
the residual plot also seems to lack generalization
it's actually discovering fundamental patterns in market behavior, not just predicting. this is a different trial with slightly different parameters.
you'll see that the validation begins to go negative.
u say it like its a good thing?
wait,what pattern are you inferring from this?
prob referring to his gaussian error plot but dunno
what worries me is the predicted vs actual returns
me too
looks like a overfit to me
Usually that's the case, but in my model it's actually doing better on validation while using simpler and simpler representations. This is an outline of what I'm using to compute the loss
```python
import math
import torch

# log_var, quantum_scale and error come from the rest of the model (not shown)
# Quantum-aware uncertainty scaling
quantum_uncertainty = torch.exp(-log_var) * quantum_scale
# Enhanced negative log likelihood with quantum effects
nll_loss = 0.5 * (quantum_uncertainty * error**2 + log_var + math.log(2 * math.pi))
```
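For reference, with `quantum_scale = 1` the loss outlined above is the standard Gaussian negative log-likelihood, where `log_var` is the predicted log-variance. A small stdlib sketch (the residual and variance values are arbitrary) that checks it against `statistics.NormalDist` and shows it dropping below zero whenever the predicted variance is small, which is why a negative loss value alone says little about predictive power:

```python
import math
from statistics import NormalDist

def gaussian_nll(error, log_var, scale=1.0):
    """0.5 * (scale * exp(-log_var) * error**2 + log_var + log(2*pi)); scale=1 is the plain Gaussian NLL."""
    precision = math.exp(-log_var) * scale
    return 0.5 * (precision * error**2 + log_var + math.log(2 * math.pi))

error, var = 0.01, 1e-4          # small residual, small predicted variance
nll = gaussian_nll(error, math.log(var))
reference = -math.log(NormalDist(mu=0.0, sigma=math.sqrt(var)).pdf(error))
print(nll, reference)            # both negative
```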
wouldn't you expect the actual vs expected return to not be so horizontal?
or the error vs predicted to not be a giant V ?
I start the model with a tiny dim size of 32, and I built a module that actively seeks to reduce dimensional usage when it spots simpler patterns. Think of it like compression: instead of using a huge dim to represent market patterns, it's learning to be more efficient
as far as the predictions vs actuals plot goes, instead of trying to predict every tiny price movement (which would give you that diagonal line), it's learning to recognize particular patterns where it can make reliable predictions. it shows the model is most accurate (smallest errors) around certain market conditions, and it knows to express more uncertainty when conditions don't match its learned patterns. but that's why the validation loss going negative is a good thing.
Not sure to follow.
Your model has predictions and they do not match the actual values (ie. horizontal line)
For the V, it sounds like you are predicting returns, which should be centered around 0 (and also explains why you have a lot of points centered there). So the V could mean the further you are from the center, the more wrong you can be, by definition
It's actually intentional due to how I built the architecture. The horizontal banding happens because the model separates market conditions into different "regimes" or patterns. Look at the color intensity in those bands: the brighter colors show where the model is most confident.
So when you see those horizontal bands, that's the model saying "under these specific market conditions, I expect returns in this range with high confidence." The spacing between bands suggests it's identified distinct market states where different return ranges are likely.
Think of those horizontal bands like prediction zones so when the model sees certain market conditions, it's saying "I expect returns to fall in this specific range." The fact that they're distinct bands rather than a scattered diagonal line means the model has identified clear, separate market states where different return ranges are likely.
What matters is how closely the predictions match the actual values, not their confidence. You can be super confidently wrong
If your model predicts a gain of 100% tomorrow but the stock actually loses 100%, you lose money, even if the model was super confident about it
better approach then would be to predict such events as a timeseries data giving u dates to invest
but honestly as recursive pointed out a confident model on wrong predictions brings u nothing
note also you can use/build a model to classify the market regime and use that in your model
but regardless, your predictions need to have some predictive power
Otherwise, how do you plan to use that model?
If you do not believe the feedback you got here, I would recommend you to run a back test of your model to see the results for yourself on a trading strategy (make sure the backtest data has data posterior to the training data of your model)
I'm already doing that.
```python
class MarketComplexityDetector(nn.Module):
    def forward(self, x):
        market_complexity, regime_probs = self.market_tracker(x)
```
what's your result?
2.75 - 3.5 sharpe ratio in the backtest with a 70% win rate? What do you want to know?
like win rate? sharpe ratio? max drawdown?
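For anyone following along, a minimal stdlib sketch of how those three backtest metrics are commonly computed from per-period strategy returns. Assumptions: returns are simple fractional returns, the risk-free rate is taken as 0, and `periods_per_year` (a hypothetical parameter, e.g. 252 for daily bars) annualizes the Sharpe ratio; conventions differ between tools.

```python
import math
import statistics

def win_rate(returns):
    """Fraction of periods/trades with a positive return."""
    return sum(r > 0 for r in returns) / len(returns)

def sharpe_ratio(returns, periods_per_year=252):
    """Mean over standard deviation of per-period returns, annualized (risk-free rate assumed 0)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough drop of the compounded equity curve, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

returns = [0.1, -0.2, 0.1]  # toy per-trade returns
print(win_rate(returns), max_drawdown(returns))
```

Under this convention a drawdown is zero or negative by construction; some tools report its magnitude as a positive number instead.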
The max drawdown is actually in the negative. lol
I can't argue with that, but something still smells fishy to me
Im sure it does hehe
but hey, I would be happy to be proven wrong
soon u are millionaire
but good for u if u backtested and it works, its fine to gamba a bit i guess
Maybe a nitpick but if the diagonal line doesn't run through (0, 0) and (1, 1) a predicted vs actuals plot is very hard to read
When that's fixed, you can tell in an instant if a model is doing well or not
I was showing my friends this but im actually for the entire image. I had to make it discord friendly, lol
lemme locate it
Either way, I'm always extremely skeptical of this stuff
It's the data equivalent of turning iron into gold
And the follow-up is "if I can do it, why can't an army of PhDs who have dedicated their lives to it at {insert massive firm here} do it?"
this could be argued cause they dont want to predict they want to force the market
ngl this is too good to be true
is ur model on 1 coin or multiple?
They have found stuff, but they haven't found stuff that good
also 102 trades seems very low.
I kinda do that in a month and am not even looking at intra days, more chill stuff
look, i get it. i'd be skeptical too. here's one, you have to save it to zoom in though
im running optuna trials so im still trying to narrow down the best parameters.
I see charts from 1970 all the way to 2020 but only 100 trades?
obviously bitcoin doesnt go that far dude, its a hobby project, its not perfect geez lol but my architecture does sing
I am trying to make sense of your charts and taking it seriously as you might start putting your own money down
Or at least simulate it in the real world
I do simulate real world.
And the results are the same there?
lol, that's why I implemented market regime detection, transaction costs, and realistic position sizing. Those were the real-world results, my friend. You should have seen them before these results, I had a Sharpe in the 4s
if you ever do actually trade with it, I would love to hear an update from you about your perspective and what you learned from it, regardless of the outcome
I plan to, but I'm going to get everything right before I chuck real money at it
I've seen contrastive loss used most of the time; however, triplet loss can also be used depending on how your Siamese network is structured.
The best fit depends on your task.
When to choose between contrastive and triplet loss:
- Triplet loss: when working on fine-grained distinctions or ranking tasks, when working with a very large dataset with diverse examples, or when working on tasks that require nuanced relative comparisons.
- Contrastive loss: when you're trying to determine whether two inputs are similar (binary similarity tasks), when working with a small dataset, or when working on simpler similarity/dissimilarity tasks.
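For reference, a minimal pure-Python sketch of the two losses on embedding vectors, using Euclidean distance and a margin (the margin of 1.0 and the toy vectors are arbitrary). It follows the common formulations: contrastive loss pulls similar pairs together and pushes dissimilar pairs at least `margin` apart (some write it with a 1/2 factor); triplet loss wants the anchor-positive distance to be smaller than the anchor-negative distance by at least `margin`. PyTorch ships the triplet version as `nn.TripletMarginLoss`.

```python
import math

def contrastive_loss(a, b, same, margin=1.0):
    """same=1 for a similar pair, 0 for a dissimilar pair."""
    d = math.dist(a, b)
    return same * d**2 + (1 - same) * max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
print(contrastive_loss(anchor, positive, same=1))   # 1.0
print(contrastive_loss(anchor, negative, same=0))   # 0.0 (already farther than the margin)
print(triplet_loss(anchor, positive, negative))     # 0.0 (1 - 5 + 1 < 0)
```

Note that if every embedding collapses to the same point, both distances are 0 and the triplet loss sits exactly at `margin`.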
I need help!
So i need a problem statement for my college major project, but I have no idea. So please help me. Drop some real-life problems that can be solved or optimized using AI that no one has attempted yet.
i finally got the quote i need !!!!!
Overfitting is especially likely in cases where learning was performed too long or where training examples are rare, causing the learner to adjust to very specific random features of the training data that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.
wikipedia
yay!
but is this true? i need an expert opinion
yes? wikipedia articles are constantly being scrutinized. they undergo more scrutiny than most peer-reviewed papers.
oh... yess finally oh my god i can sleep in peace
i used triplet loss but the loss doesn't improve, it's always around 1 or 0.8. I've tried changing the optimizer, lr, momentum, batch size
also thank you, i now know that i can trust wikipedia even though i can't use it as a major literary source
Use the sources of Wikipedia
I'm pretty sure all standard ML textbooks will have this info as well
In the pinned posts I link a few, https://www.statlearning.com/ has this stuff for sure
Where can I start learning about LLMs?
I assume you're specifically interested in generative LLMs like ChatGPT. what do you want to know about GLLMs?
Yeahh.
Andrej Karpathy's YouTube tutorial series
Unfortunately, there's no shortcut to this part in ML. You just have to experiment more to figure out how to improve your model performance.
To avoid wasting much effort, start your debugging by verifying that your model is at least able to overfit on a small subset of your data (n <= 100 samples) .
If you're unable to overfit on that small subset, then the issue is most likely one of these
- your model architecture is too simple to fit the data
- you didn't set up the training process properly
- there's indeed a bug in your implementation of Siamese network.
Today I launched my AI project for business owners, where AI handles inventory and chats with customers about business products, availability and services in real time. I would love for anyone to try it out https://app.cognova.io/?ref=dsc
do you have any other features available? otherwise, you're limited to figuring out if there's some sort of time-based cycle (weekly, monthly, yearly, etc.)
wdym?
there's other features but they're redundant, I used them to remove any outliers
the only actual data that would matter is the price and sold_at I'm pretty sure
try plotting the price over time with a line plot and see if there's a pattern
even after removing obvious outliers there's still some stragglers that I'll remove
why is there more than one point per day?
because there are multiple sales per day
what is the thing that's being traded? a stock?
nah, just an online product being sold
okay, what online product?
why does that matter?
could be seasonal for example
wouldn't it be easier to answer the question than argue about whether or not it's important?
it's just a gaming thing, available for purchase 24/7 and is non-seasonal. all of this stuff I've accounted for I'm fairly sure
the problem is idk which method of forecasting to use
Hey guys how do you manage your python virtual environments
I'm only able to determine price trends currently, I can't really accurately determine what the specific price would be in x days
I'm not sure that you have enough data to produce a forecasting model that isn't essentially random.
I have this problem where I don't know exactly what to do. Making one big env for a topic like data science means downloading a lot of libs into one place, and some projects don't need most of them. However, if I make small envs for each subtopic, I'll have to install extra libs that are probably already in the other envs, because most projects need mostly the same libs.
I would suggest uv
it does have a global cache btw
and it's very speed
how much more data and what data would you say I would need? just stuff describing the characteristic of the item?
for example how would you group these
the most important thing is that you don't use anaconda.
you can just use regular python environments (the venv module that comes with python). or you can look into uv.
even pip has a global cache. it's usually fine to have multiple venvs with the same packages
ah, hmm, thought it was a uv thing
oh well, uv is speeeeeeed
pandas
numpy
matplotlib
seaborn
streamlit
shiny
requests
beautifulsoup
scrapy
tensorflow
sklearn
keras
would you install it in one env or make small ones for each category?
i would not do categories. create a venv for each project
other features that are relevant for determining the price of the commodity. you might have to source those features from somewhere else. and having that data for a longer duration would make it easier as well.
Well, now I will have to repeat myself a lot for each project. I know that pip caches these envs, but this still takes a lot of storage space!
the point of caching is that it doesn't take up extra space (also the packages are not even that big to begin with)
okay cool, thanks. and what method of forecasting would you recommend I use without being too broad or too overkill (knowing I just want to determine the price of an item)? ik some models are super complicated and don't do much better than a simple moving average
they don't take much space at all.
The env itself does
start with a simple one and see how it does.
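A minimal stdlib sketch of "start simple" for the sales data above: collapse the multiple sales per day into one daily mean price, then forecast each day as the moving average of the previous `window` days and compare it with a naive "tomorrow equals today" forecast. The `price` and `sold_at` field names come from the thread; the toy records and window size are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

def daily_mean_prices(records):
    """Average all sales that happened on the same calendar day, in date order."""
    by_day = defaultdict(list)
    for rec in records:
        by_day[datetime.fromisoformat(rec["sold_at"]).date()].append(rec["price"])
    return [fmean(by_day[day]) for day in sorted(by_day)]

def moving_average_forecast(series, window):
    """Forecast for each day t >= window: mean of the previous `window` days."""
    return [fmean(series[t - window:t]) for t in range(window, len(series))]

def mae(pred, actual):
    return fmean(abs(p - a) for p, a in zip(pred, actual))

records = [
    {"price": 10.0, "sold_at": "2024-12-01T10:00:00"},
    {"price": 12.0, "sold_at": "2024-12-01T18:30:00"},
    {"price": 11.0, "sold_at": "2024-12-02T09:15:00"},
    {"price": 13.0, "sold_at": "2024-12-03T12:00:00"},
    {"price": 12.0, "sold_at": "2024-12-04T08:45:00"},
    {"price": 14.0, "sold_at": "2024-12-05T20:10:00"},
]
daily = daily_mean_prices(records)
window = 2
ma = moving_average_forecast(daily, window)
naive = daily[window - 1:-1]  # "tomorrow equals today"
actual = daily[window:]
print(daily)
print("MA MAE:", mae(ma, actual), "naive MAE:", mae(naive, actual))
```

If a fancier model can't beat these two baselines on held-out days, it probably isn't worth the extra complexity.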
about 25MB or something
that's negligible
okay cool, thanks
not really. on linux it's just a symlink
if you're really concerned about local storage, just use google colab or something
I make a lot of small projects and this would be inefficient!
I want to make a big env, but I am worried about the performance!
you don't need to squeeze every last drop of efficiency out of your computer.
and even if you did, the size of the env or the number of envs that you have will not affect performance. at all.
So, what do you recommend?
make a venv for each project
That is a storage space problem. I have limited storage on that Linux boot.
how much?
44.11 GiB / 58.81 GiB (75%) - ext4
okay, well having just one environment won't affect the performance of your code. but if you need to reproduce the results later, you need to keep track of all the library versions that you used
which you can do with pip freeze and write the result to file.
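`python -m pip freeze > requirements.txt` is the usual way to do that. As a stdlib-only alternative, here's a sketch that pins every distribution installed in the running environment via `importlib.metadata`. It is not byte-for-byte identical to `pip freeze` (which, for example, skips pip itself by default); the temp output location is arbitrary.

```python
import tempfile
from importlib import metadata
from pathlib import Path

def write_requirements(path="requirements.txt"):
    """Write 'name==version' for every installed distribution, sorted by name."""
    pins = sorted(
        {f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()},
        key=str.lower,
    )
    Path(path).write_text("\n".join(pins) + "\n")
    return pins

out_dir = Path(tempfile.mkdtemp())
pins = write_requirements(out_dir / "requirements.txt")
print(len(pins), "packages pinned")
```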
I mostly import all the dependencies at the beginning of the code. This is not primarily an issue.
right, but each environment can only have one version of each library
I just heard people saying that having a lot of dependencies would be heavy while loading the env
they were wrong.
Ok, then if this is the case I prefer having everything related to each topic in one place
Gemini just told me that having a lot of dependencies may make the env heavy
what does it mean for the env to be "heavy"?
takes time to activate
activating a venv is basically just setting some environment variables
Does this disprove it?
actually activating the venv is instantaneous no matter how much you have installed, for the reason PSVM just said.
starting a python process with that venv is different. having more libraries installed might make an imperceptibly small difference.
having more libraries installed increases the number of places python has to look when you import stuff. and this will have an imperceptibly small difference as well.
none of this is the kind of thing you need to optimize around.
does what disprove what?
the increase in time taken to activate an env depending on the number of libs
I covered this in my message just now.
yeah, thanks
remember that starting a python process with a venv is not "activating" the venv.
I am not talking about the efficiency of the code. I am talking about the time to activate the .venv
like source .venv/bin/activate? that is always instantaneous.
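To see for yourself that activation only sets environment variables, here's a stdlib sketch that creates a throwaway venv with the `venv` module (without pip, so it's quick) and prints the relevant lines of the generated `bin/activate` script. This assumes a POSIX layout; on Windows the scripts live under `Scripts\` instead. The temp directory is arbitrary.

```python
import tempfile
import venv
from pathlib import Path

env_dir = Path(tempfile.mkdtemp()) / ".venv"
venv.create(env_dir, with_pip=False)  # no packages installed

print((env_dir / "pyvenv.cfg").read_text())
activate = env_dir / "bin" / "activate"
if activate.exists():  # POSIX layout
    script = activate.read_text()
    # The script mainly sets VIRTUAL_ENV, prepends the venv's bin/ to PATH and tweaks the prompt
    print([line for line in script.splitlines() if "VIRTUAL_ENV" in line or "PATH=" in line][:6])
```

Nothing in there scales with the number of installed packages.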
can you use a Adam optimizer for a LSTM?
@serene scaffold are u not on mobile today :D?
at least not at this exact moment.
if you still need help with the problem from yesterday, make a paste bin that includes every user-defined function, variable, etc.
kinda, yes. i think i found the problem and it has to do with my dataset generation, however i don't see where the mismatch is
```python
import pathlib

import torch
import torchvision
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

img_path = pathlib.Path('./data/img/Test')
# Flatten nested folders: Test/<main>/<sub> -> Test/<main>_<sub>
# (list() snapshots each listing before renaming inside it)
for folder in list(img_path.iterdir()):
    if folder.is_dir():
        main_name = folder.name
        for subfolder in list(folder.iterdir()):
            if subfolder.is_dir():
                sub_name = subfolder.name
                subfolder.rename(img_path / f"{main_name}_{sub_name}")

dataset = torchvision.datasets.ImageFolder(img_path)
lable_dict = dataset.class_to_idx

# PIL to Torch img
data_transformer = transforms.Compose([transforms.ToTensor(),
                                       transforms.Resize((46, 46))])

img_paths, img_lables = [i[0] for i in dataset.imgs], [i[1] for i in dataset.imgs]


class CustomDataset(Dataset):
    def __init__(self, image_paths, labels, transform):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Load the image
        image = Image.open(self.image_paths[idx]).convert("RGB")
        image = self.transform(image)  # Apply transformations
        return image, torch.tensor(self.labels[idx], dtype=torch.long)


img_dataset = CustomDataset(img_paths, img_lables, data_transformer)
classes = dataset.classes
```
return image, torch.tensor(self.labels[idx], dtype=torch.long)
try moving these two to cuda?
simply with .cuda()?
```python
cuda = torch.device('cuda')
...
image.to(cuda), ...
```
nope
I would go back to the training loop and print the .device attribute of stuff
I didn't see the history here, what was the error? I occasionally have to fight with devices in PyTorch. Had a case where the tensors would get zeroed out unexpectedly due to a device mismatch and modification
Does anyone here work in the field of BIO DATA SCIENCE?
that's usually called bioinformatics.
I'm finally finding the sweet spots in the model's parameters. I gained nearly two hundred epochs even with a patience of 10 set. It's kinda hard to see, but there's a minus sign in front of the y=-0.03
try this, let's see what it says:
```python
dataset = torchvision.datasets.ImageFolder(img_path)
print(dataset.class_to_idx)
```
print the first 5 path and label pairs, lets see whats going on with the data
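One thing worth checking with the renaming loop above: `ImageFolder` treats every subdirectory of the root as a class, sorted by name, so the original parent folders are still there after their subfolders are moved up and will be counted as classes too (if they are left empty, torchvision may refuse to build the dataset). A stdlib sketch with a hypothetical helper that mirrors that class discovery, run on a made-up `cat/black`, `cat/white` layout:

```python
import tempfile
from pathlib import Path

def find_classes(root):
    """Hypothetical helper mirroring ImageFolder: sorted subfolder names -> 0..N-1."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: i for i, name in enumerate(classes)}

# Hypothetical layout: Test/cat/black and Test/cat/white
root = Path(tempfile.mkdtemp()) / "Test"
for sub in ("black", "white"):
    (root / "cat" / sub).mkdir(parents=True)

# Same flattening as in the snippet above
for folder in list(root.iterdir()):
    if folder.is_dir():
        for subfolder in list(folder.iterdir()):
            if subfolder.is_dir():
                subfolder.rename(root / f"{folder.name}_{subfolder.name}")

print(find_classes(root))  # {'cat': 0, 'cat_black': 1, 'cat_white': 2}
```

The leftover `cat` folder shifts every other label by one, which could easily look like a label mismatch.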
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
Hi,
If anyone familiar with this, please help me.
https://discord.com/channels/267624335836053506/1320601045784596542
I cannot find any explicit statement on statlearning.com, only implicit... and i do not want to make invalid assumptions
if I defined a pytorch model in a file, how do I use it in another file?
how do you overfit a model?
statlearning is a good enough book to cite. If not, type "overfit" into Google Scholar and you'll find stuff
you can check chapters 4 and 5 here. sadly it's not searchable, but also i didn't find overfitting mentioned explicitly. the idea is discussed in depth though: under some conditions, you can always generate a sequence of functions that converges exactly to the data points you observed, and this is often not what you want. you want a way of restricting the possible functions to ones that are somehow "simple"
the discussion on the order of polynomials in ch 4 is good
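A small stdlib illustration of that idea: with n training points you can always build a polynomial of degree n-1 that passes through every one of them (Lagrange interpolation), driving training error to zero, while a least-squares straight line cannot. The "true" function, noise level, points and seed below are made up; the thing to compare is training error versus error on held-out points, i.e. the generalization gap. The high-degree fit usually does worse on the held-out points, though the exact numbers depend on the noise draw.

```python
import random

random.seed(1)

def f(x):
    return 2 * x + 1  # assumed "true" function

def noisy(xs):
    return [f(x) + random.gauss(0, 0.3) for x in xs]

x_train = [i / 7 for i in range(8)]
y_train = noisy(x_train)
x_test = [(i + 0.5) / 7 for i in range(7)]  # held-out points between the training ones
y_test = noisy(x_test)

def lagrange(xs, ys):
    """Degree len(xs)-1 polynomial through every training point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def least_squares_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for name, model in [("degree 7", lagrange(x_train, y_train)),
                    ("degree 1", least_squares_line(x_train, y_train))]:
    print(name, "train MSE:", round(mse(model, x_train, y_train), 4),
          "test MSE:", round(mse(model, x_test, y_test), 4))
```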
I think if you really want the technical definition googling "generalization gap" is a good start
Anyone here has idea about datacubes
I want to learn how to create datacubes but can't find any useful resources
What do you exactly mean with "datacubes"?
Okay, couple of pointers
I doubt people think of the image above when talking about OLAP
And you're better off googling either "data engineering" or "dimensional modelling"
My data looks something like this so I thought it was a good image
What kind of data do you have?
kaboom can i dm you i have a question
Hey, does anyone here have a time series forecasting project with a dataset where dates repeat and there is an hourly pattern in the data?
Satellite data
Latitude longitude time and different sensors data at those
Coordinates and time
Yeah sure
And what do you want to do with your data?
What types of queries
Convert hourly or minutes data to daily or vice versa
please check your dms
Change resolutions of data suppose one satellite has 2x2km resolution
And another has 5x5km
So we have to interpolate or some other technique
To bring them to the same grid
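One common way to sample one regular grid onto another is bilinear interpolation. A minimal stdlib sketch, assuming a regular grid with spacing `dx` (km), values stored as `grid[row][col]` with row along y and col along x, origin at (0, 0), and target points inside the source extent. Real satellite products also need map projections, and when going to a coarser grid (2 km to 5 km) area averaging is often preferred, since point sampling ignores most of the fine cells.

```python
def bilinear(grid, dx, x, y):
    """Interpolate a value at (x, y) from a regular grid with spacing dx; grid[row][col] ~ (y, x)."""
    col, row = x / dx, y / dx
    c0, r0 = int(col), int(row)
    c1 = min(c0 + 1, len(grid[0]) - 1)
    r1 = min(r0 + 1, len(grid) - 1)
    tx, ty = col - c0, row - r0
    top = grid[r0][c0] * (1 - tx) + grid[r0][c1] * tx
    bottom = grid[r1][c0] * (1 - tx) + grid[r1][c1] * tx
    return top * (1 - ty) + bottom * ty

def regrid(grid, dx_src, dx_dst, width_km, height_km):
    """Sample the source grid at every point of a new grid with spacing dx_dst."""
    nx, ny = int(width_km / dx_dst) + 1, int(height_km / dx_dst) + 1
    return [[bilinear(grid, dx_src, i * dx_dst, j * dx_dst) for i in range(nx)] for j in range(ny)]

# Hypothetical 5 km source grid over a 10 km x 10 km area, field value = x + 2*y
src = [[(5 * i) + 2 * (5 * j) for i in range(3)] for j in range(3)]
dst = regrid(src, dx_src=5, dx_dst=2, width_km=10, height_km=10)
print(len(dst), len(dst[0]))  # 6 6
```

Bilinear interpolation reproduces a linear field exactly, which makes a convenient sanity check.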
hmm, then I don't think OLAP is a good fit. How did you get there in the first place?
What do you mean?
Just curious to see where you got the idea to use OLAP from + what their train of thought was?
Olap
Is a good choice suppose I have to slice a part of data
Or convert data to
Like monthly
Have you considered time series DBs?
Yes
And multiple dimensions as well?
Though I do have to create a datacube, as my guide told me to do so
I did create something similar
By storing it in n-dimensional arrays
Aside from time series DBs
I feel like there should be some geospatial DB or so
I've not worked with GIS, but upsampling and downsampling satellite images seems like a standard use case
Yes and different satellite might have different resolution
And like to check the trends
How it's behaving weekly
Monthly ,quarterly
just ask !!!
dont ask to ask
Alright, so I would like to make a face recognition system. The input is only one image. How do I augment the picture in a way that the face is facing every angle like for example I am looking up
Also, can mediapipe's face mesh be used for facial recognition? i.e. identifying who the face belongs to with the face mesh?
you only want to detect face or you wanna detect particular face?
Particular
Also, is there any python library or GAN to remove accessories like goggles from faces?
Hi, do you guys know any good Data Science conferences in Europe for 2025?
then just train on it, but make sure you have different angles
I think I might have stumbled on an interesting new approach for financial time series architectures, hear me out. Rather than manually tuning the dim sizes using the standard optimization methods, I've been experimenting with letting the network discover its own ideal capacity through a continuous feedback mechanism. The idea came to me while thinking about financial markets and their immutable property. So it got me thinking: what if it could adapt its capacity based on the inherent complexity of the market state it's processing? I'm testing this theory to check it out.
Do you understand the concept of bias-variance tradeoff?
Overfitting occurs when your model performs exceptionally well on the train data but poorly on the test data.
I suggested you check if your model can overfit on a small subset of your data because it's actually a good sanity check.
If you train your model on a small subset of your data and it cannot achieve 99% - 100% accuracy, then it's a clear indication that your model might have those 3 issues I mentioned in my last response.
The idea is, a small dataset is easy to memorize, even for a relatively simple model. So if your model cannot even overfit on a small dataset, then it'll likely struggle to generalize on even larger dataset.
So in essence, confirming your model can overfit on a small subset of your dataset is a sanity check used to catch bullsh!t very early w/o having to waste time and energy training a model for so long only to end up with an abysmal performance.
If you try that and you're able to overfit on a small subset of your data, then that proves that indeed your training pipeline (data loading, model architecture, optimizer, etc) is truly working correctly.
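A minimal sketch of that sanity check, using a toy logistic-regression model trained with plain full-batch gradient descent in pure Python. The model, the 6 data points, learning rate and step count are illustrative assumptions, not the Siamese setup from the question. On a handful of separable examples a working pipeline should reach 100% training accuracy; if it doesn't, the model, training loop or data handling is the first place to look.

```python
import math

def train_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit w, b for p(y=1|x) = sigmoid(w.x + b) by full-batch gradient descent on log loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for k, xk in enumerate(x):
                gw[k] += (p - y) * xk / len(xs)
            gb += (p - y) / len(xs)
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def accuracy(w, b, xs, ys):
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Tiny "subset": 6 linearly separable points
xs = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.2], [1.1, 0.7]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print("train accuracy:", accuracy(w, b, xs, ys))
```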
```
(Pdb) C = torch.tensor([[0,1]])
(Pdb) id(C)
5098297024
(Pdb) C[0], type(C[0]), id(C[0])
(tensor([0, 1]), <class 'torch.Tensor'>, 5099183120)
(Pdb) C[0][0], type(C[0][0]), id(C[0][0])
(tensor(0), <class 'torch.Tensor'>, 5372607632)
(Pdb) C[0][1], type(C[0][1]), id(C[0][1])
(tensor(1), <class 'torch.Tensor'>, 5369838736)
```
I always thought torch.tensor is just a memory wrapper with a bunch of extra info. but it seems that each element of C and C itself is a tensor object?
Both ECAI and ECIR 2025 are scheduled to be held in Italy.
Sadly, most popular (top) AI conferences are held in North America. (I could go on to rant on how tiring it is but unfortunately, that's where we found ourselves)
Meanwhile, I use https://openreview.net and https://aideadlin.es to keep tabs on stuff.
Of course, how can I forget to mention Heidelberg Laureate Forum (HLF) in Germany. This one is actually dope if you can get in.
Application deadline for HLF 2025 is on 11th February 2025.
C is a rank-2 tensor while C[0][0] is a rank-0 tensor; a scalar, yeah, but still a tensor. However, calling C[0][0].item() will yield a plain Python scalar (an instance of the int class).
Also, a tensor is more than just a memory wrapper. It also has other functionalities it encapsulates like autograd, PyTorch's computational graph, metadata.
I think the tensor library can be summarized as a multi-dimensional array with GPU support.
Emyrs is right. Keep in mind also that when you slice/index a tensor, the python object that it returns is newly created.
I got 90 on my sentiment analysis project
The highest in the class hahahaha
Teacher commented "you used so many different models and introduced focal loss to address imbalance which was very unique"
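For reference, the binary focal loss (Lin et al., 2017) is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the probability assigned to the true class. With γ = 0 and α_t = 0.5 it reduces to half the usual binary cross-entropy; larger γ shrinks the loss of easy, well-classified examples so the rare or hard ones dominate. A minimal NumPy sketch, where γ = 2 and α = 0.25 are commonly used defaults, not values from the project above:

```python
import numpy as np

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss.

    p: predicted probability of the positive class; y: 0/1 labels.
    """
    p = np.clip(p, eps, 1 - eps)                  # avoid log(0)
    p_t = np.where(y == 1, p, 1 - p)              # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    # (1 - p_t) ** gamma down-weights examples the model already gets right
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

easy = binary_focal_loss(np.array([0.95]), np.array([1]))  # confident and correct
hard = binary_focal_loss(np.array([0.30]), np.array([1]))  # leaning the wrong way
print(easy, hard)
```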
nice!
Thank you
I wasn't sure if I was gonna get a high score because I focused on F1-score, but the teacher said that's more important than accuracy on an imbalanced dataset
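A minimal pure-Python sketch of why F1 can matter more than accuracy on imbalanced data, using made-up labels with a 5% positive class: a model that never predicts the positive class and one that catches every positive both score 95% accuracy, but their F1 scores are 0 and about 0.67.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    # F1 = 2TP / (2TP + FP + FN); define it as 0 when there are no true positives
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

y_true = [0] * 95 + [1] * 5                # 5% positive class
always_negative = [0] * 100                # ignores the minority class entirely
catches_positives = [0] * 90 + [1] * 10    # 5 false positives, then 5 true positives

for name, pred in [("always_negative", always_negative), ("catches_positives", catches_positives)]:
    print(name, accuracy(y_true, pred), round(f1(y_true, pred), 3))
```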
guys, if im trying to train an ai but it's a pain to get the data since i need to collect it manually, is it possible to generate extra data to train it on, or do i just use a smaller dataset?
depends on what you're working on
for some things, you might be able to get an existing model to generate synthetic data for you
for others, you might be able to get it from the web and/or find a ready dataset
for others, it can be a very time expensive task
Many companies spend millions hiring people just to create datasets for them (e.g. through Amazon Mechanical Turk)
even if it is, I'm not sure how useful it is.
i made a ball
thank you zestar2512
thank you also zestar
https://paste.pythondiscord.com/ESTA can someone tell me if my ai thingy works or if its wrong or what
it gives output but idk if it is correct
or if its just being wrong
No one can answer this without running the code, so you have to show what the output is when you run it and explain what you think the output is supposed to be
right lemme explain
So I have a bunch of pickle files which I want to convert into JSON files. These JSON files include multiple economic values such as price, demand, etc. of different items (this is all in-game, in a very unusual game). I want to write a program that reads these pickle files, turns them into JSON, and then uses the information to tell me whether it would be optimal to buy or sell each item, shows me the average price change, and also shows me a "profitability" factor. The best way I could think of to do this was to use AI to try to do it as best it can.
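The conversion and the average-price-change parts don't need a model at all. A minimal sketch in plain Python, where the record fields (`name`, `price`, `demand`) and both function names are hypothetical stand-ins for whatever the game's pickles actually contain:

```python
import json
import pickle
import tempfile
from pathlib import Path

def pickle_to_json(pkl_path, json_path):
    """Load one pickle file and write its contents as JSON."""
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)  # only unpickle files you trust: loading can execute code
    with open(json_path, "w", encoding="utf-8") as f:
        # default=str turns objects JSON can't encode (e.g. datetimes) into strings
        json.dump(data, f, indent=2, default=str)
    return data

def average_price_change(prices):
    """Mean of consecutive differences, e.g. [10, 12, 11, 15] -> (2 - 1 + 4) / 3."""
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Usage with a made-up item record
with tempfile.TemporaryDirectory() as tmp:
    pkl = Path(tmp) / "item.pkl"
    with open(pkl, "wb") as f:
        pickle.dump({"name": "ore", "price": [10, 12, 11, 15], "demand": 40}, f)
    item = pickle_to_json(pkl, Path(tmp) / "item.json")
    with open(Path(tmp) / "item.json", encoding="utf-8") as f:
        reloaded = json.load(f)
    print(average_price_change(item["price"]))
```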
this is a small snippet of output
there i sent some info is that enough?
I'm going to sleep. Hopefully someone else can pick up.
https://discord.com/channels/267624335836053506/1320998949032562779 can someone help me
i seem to have an error
i just did it on a small subset with 100 samples and 10 epochs, got around 50% accuracy does that count?
best was 61
Were you able to overfit on the small subset? What's the performance on the test data?
i got almost the same result on both train and test subset
There you have your answer.
50% accuracy is as good as random guessing. So, now that you've seen your Siamese network was unable to overfit on a small subset of your data, it shouldn't be a surprise when it's unable to generalize on much larger data.
Those 3 issues I pointed out in my previous response: you now need to evaluate each of them to figure out where the problem comes from and how to fix it.
any of you have a favorite book from which to fish out definitions of probability distributions?
i decided not to randomly choose between positive and negative when getting the image pair and got better results, it does overfit now is this a good sign?
matplotlib is taking such a long time to learn ._.
i literally spent 2 days of my holiday reading only 10 pages of the book
use plotly
whats the difference between both of those?
i read that plotly is easier and more interactable
matplotlib is inspired by matlab, which is a different programming language. and given how unintuitive matplotlib is, it shows.
plotly is newer and more intuitive, and it can make both static and interactive plots.
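To make the MATLAB heritage concrete: Matplotlib has two interfaces, the implicit pyplot "state machine" (modelled on MATLAB, where calls act on whatever figure is current) and the explicit object-oriented one, where you call methods on Figure and Axes objects. A minimal sketch of both, using the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# Implicit, MATLAB-style: each plt.* call targets the "current" figure/axes.
plt.figure()
plt.plot(x, np.sin(x))
plt.title("implicit (pyplot) interface")

# Explicit, object-oriented: you hold the Figure and Axes and call their methods.
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.set_title("explicit (object-oriented) interface")
ax.set_xlabel("x")

plt.close("all")  # free the figures when done
```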
oh no... is it similar to matplotlib somewhat
I've been using matplotlib to varying extents for years, and I still don't feel like I understand it.
I mean, they're both for making plots? but plotly feels more intuitive and object-oriented.
ah... hmm, what should i do with this info now... since the ISLP, the book ive been using for the past month, is using matplotlib
up to you
its no problem learning it later on right? since its much easier anyway
no
what
no, it won't be a problem learning it later.
thank god
this is pretty funny, im 65 pages away from using matplotlib a second time
why is BERT driving me bloody mad?
A week ago, it was going fine, I guess I just forgot what I knew. Threw it out the window of my mind
Use a sentence transformer on the original data to enrich it.
Try that.
yeah the ai wont work in first place
but its 2am so im not gonna fix it rn, gn
can someone explain to me what model bias is? just having trouble trying to define it
The word "bias" is used in a few ways that, as far as I can tell, aren't really related to each other.
I somewhat figured it out. From what I searched up, it can involve a number of errors, like algorithm errors, the selection of training data used, and whatnot. The ISLP book doesn't really explain model bias that well at all
Also, can anyone recommend websites or books that are good at explaining what the Bayes classifier and KNN are? Something reasonably understandable. I want to learn a bit more about these two classification methods before I go on to the Classification chapter of ISLP
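For the statistical-learning sense of "bias" (the one ISLP uses in its bias-variance trade-off), the usual statement is the decomposition of the expected test error at a point x_0, with Y = f(X) + ε as in the book's setup. The expectation is taken over training sets (and ε); bias is how far the average fitted model is from the true f.

```latex
Y = f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0

\mathbb{E}\big[(y_0 - \hat f(x_0))^2\big]
  = \operatorname{Var}\big(\hat f(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)

\operatorname{Bias}\big(\hat f(x_0)\big) = \mathbb{E}\big[\hat f(x_0)\big] - f(x_0)
```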
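A minimal NumPy sketch of k-nearest neighbors (function name and toy data are hypothetical). In ISLP's framing, the Bayes classifier assigns x to the class with the highest P(Y = j | X = x), and KNN approximates it by estimating that probability as the fraction of the k nearest training points that belong to class j.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest training points
    votes = y_train[nearest]                    # their labels, shape (n_query, k)
    # Majority vote per query (labels must be non-negative ints; ties go to the smallest label)
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.3], [5.5, 5.5]])
print(knn_predict(X_train, y_train, X_query))
```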
i have now learned pytorch and cnn's whats next?
Please do not ping specific people to answer your questions. You can use pings to help continue an ongoing conversation. Not to summon someone to help you.
sorry i just replied
Yes, but that caused me to get a ping, and I wasn't talking to you.
do what you like
there are lots of architectures
but prefer to go with building projects
What yall working on? Is anyone working with training a model on say the Python grammar spec?
Training a model to do what?
On the python grammar spec+code so it understands python code...
You have to train models to do certain things. They don't actually know things. They only appear to know things insofar as they do their tasks correctly.
For models like ChatGPT, their task is to generate text. They aren't trained on formal representations of English syntax.
Maybe they should be...
Nope.
lol I can see you're not a candidate to assist me with my project lol
And you're a Computational Linguist!
What do you want your model to do?
what would understanding Python code even look like?
A different type of neural network
This conversation can't go anywhere until we've established what the inputs and outputs are for the model.
The inputs are the Python grammar specification coupled with example code snippets and the output is Python programs from a prompt
even if it were possible to learn the rules of an EBNF grammar from the grammar, it still doesn't have the semantics
There are already prompt-to-python models that always produce syntactically correct python no matter what. Where they fail is with the semantics. So I think it's doubtful that coupling the prompt -> code examples with a grammar spec would improve anything.
There are probably ways that you can improve their performance wrt semantics
I'm trying to find the paper on the guy who did this with another language to detect 'code smells' in a program
"code smell" is syntactically correct code that's considered distastefully designed
He used the language's grammar spec...
If you find the paper, please post it in this chat.
True but it takes a higher level to determine it's a 'smell'
I'm searching...
Yes. But whether or not something is a code smell isn't a syntax question. So I'm not sure how the grammar would help.
That's something but it's not the paper I'm talking about...I'm still searching for it
I can't find it. Crazy...
I think I can see a link to why one might think that learning grammar could help, here's an isolated example:
Given the code
for i in range(len(seq)):
    item = seq[i]
the model learns, based on grammar rules (is this grammar-related at all? anyway), that this can be simplified to
for item in seq:
    ...
Now, obviously, such a leap might be several levels of indirection deep for a model to even get there and simply "simplifying an expression while maintaining grammar compatibility" is certainly not a great metric for making code better.
It'd be much easier to just tell the model that one is bad and the other is good and let it figure out the rest or something. So yeah, idk what it would do with grammar exactly, but I can see a thread of thought leading somewhere in a somewhat logical direction I guess
for that behavior, you'd either need to train the model on pairs of (poorly factored, well factored) code. there's nothing about the Python EBNF that would allow any system to conclude that regular for loops are preferable and (for built-in sequences) semantically equivalent to rangelen loops.
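To illustrate the point about semantics: both loops are perfectly valid under the grammar (they both parse), yet whether they behave the same depends on the object being looped over. For a list they agree; for a dict, iterating yields keys, while `d[i]` looks up integer keys and raises `KeyError`. The helper names are hypothetical.

```python
import ast

def rangelen_items(seq):
    # Index-based loop: relies on len() and on seq[0], seq[1], ... being the elements
    out = []
    for i in range(len(seq)):
        out.append(seq[i])
    return out

def direct_items(seq):
    # Plain iteration: relies only on the iteration protocol (__iter__)
    out = []
    for item in seq:
        out.append(item)
    return out

# Both forms are syntactically valid; the grammar has no opinion on which is better.
ast.parse("for i in range(len(seq)):\n    item = seq[i]\n")
ast.parse("for item in seq:\n    pass\n")

print(rangelen_items([3, 1, 2]) == direct_items([3, 1, 2]))  # same result for a list

d = {"a": 1, "b": 2}
print(direct_items(d))  # iterating a dict yields its keys
try:
    rangelen_items(d)
except KeyError as err:
    print("KeyError:", err)  # d[0] looks up the key 0, which doesn't exist
```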
what's the other option?
good catch. I decided I was too tired to describe the other option, and forgot that I said "either"
I'll tell you another time if I remember.
(as a reminder for myself, my coworker whose name starts with J suggested it, and I thought it was a bad idea.)
was it Joe? /s
@worldly wagon, don't bother learning nltk. I work professionally as a linguist and do not use it.
hi sorry, stepped away for a bit to research before i ask
but i do need nltk, its for my research/research paper
then don't bother trying to become familiar with the whole thing. just learn the bare minimum to accomplish your goals. which is how most people approach most libraries.
badly worded my previous reply, i agree just new to NLP stuff so wanted to learn a bit
why are you sure that you need to use nltk?
most of my work before was in predictive modelling
centered around classification of words from text, i.e. features such as phonemes, morphemes, sight words, etc.
also the professor kinda told me to leverage the lib
i'm honestly worried about the performance too, but hoping pandas, numpy and some c++ can carry there
you can use spacy for this. it's more modern than nltk.
and don't worry about performance until you've confirmed that the performance isn't good enough.
i agree with you there, but i'm fairly sure performance wont be good enough but i dont/wont pre-optimize
ahh now i have two things to look into too. as for choosing spacy over nltk, if it's anything like polars over pandas that will be great, as i prob won't see my prof till jan
@worldly wagon what are you doing that you care about phonemes? (I am a linguist and have an IPA chart on my wall next to me right now.)
literacy metrics in the anglophone caribbean, basically taking a given text input and giving each word some classifications
how do phonemes pertain to that?
just one of the classifications necessary
the plan is for my UG to produce metadata for any given text input
please send me your paper when it's ready for external consumption.
sure will write that down appreciate the insight btw will look into them
you might also look into doublemetaphone. it's an algorithm for approximating a phonetic representation based only on spelling. it's very fast.
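Double Metaphone itself is fairly involved, so as a smaller illustration of the same idea (a phonetic code derived from spelling alone), here is a sketch of the much simpler American Soundex algorithm. This is a swapped-in technique, not the doublemetaphone package's API: names that sound alike, such as Robert and Rupert, get the same code.

```python
# Soundex digit groups: letters with similar sounds share a digit
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")  # the first letter's code also blocks an immediate repeat
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:       # collapse adjacent letters with the same code
                result += code
            prev = code
        elif ch not in "hw":
            prev = ""              # vowels (and y) separate repeats; h and w do not
        if len(result) == 4:
            break
    return result.ljust(4, "0")    # pad short codes with zeros

for name in ("Robert", "Rupert", "Tymczak", "Pfister", "Ashcraft"):
    print(name, soundex(name))
```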
thanks appreciate any advice i get
buy assets and sell liabilities.
don't lower your standards for her/him.
thank you thank you
you're welcome
I turned to linguistics after failing to become a professional quote maker.
In the same vein, this video is excellent "nobody is consistently right at making deviant forecasts" https://youtu.be/6WroiiaVhGo?feature=shared
Pretty happy with how the uncertainty distribution and calibration are looking after 570 epochs, but I'm curious what y'all think. Happy Holidays and all that jazz too.
And an IPA chart? I'd assume that's some chart of beers, but I assume it's far less interesting
Merry Xmas, good color choice
Haha thanks
Should I make different FC layers for ResNet when I train it on a different dataset, since it only has 1 FC layer?
What do you mean with "different fc layers"? Different weights? Different amount of neurons?
a new classifier with more linear layers
not sure how to describe it
Sure, the architecture can differ from use case to use case
Start large and decrease
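A sketch of a replacement head that starts wide and narrows, assuming a ResNet-50 backbone whose pooled features are 2048-dimensional (ResNet-18/34 give 512). The first layer's `in_features` has to match the backbone's output size. With torchvision you would usually read `model.fc.in_features` and assign the new head to `model.fc`; here random features stand in for the backbone, and the class count is hypothetical.

```python
import torch
from torch import nn

backbone_dim = 2048  # assumption: ResNet-50's pooled feature size
num_classes = 10     # hypothetical number of classes in the new dataset

# Head that starts wide and narrows toward the output
head = nn.Sequential(
    nn.Linear(backbone_dim, 512),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(512, num_classes),
)

features = torch.randn(4, backbone_dim)  # stand-in for a batch of pooled backbone features
logits = head(features)
print(logits.shape)
```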
I think this is the right area for my question? I have a dataset of book covers, and from each of those covers, I've written code to take the most dominant colors in that image (maybe the top 5? It's all a blur at this point). How do I cluster similar enough colors together so I get a manageable number of color groups to compare across the different subgenres? I've tried to use k-means a number of different ways and I'm just about to give up on the whole thing altogether.
ETA: for example, if I have 5 different RGB codes but they're all super similar, I want them to count as one color. And I don't want to hand-code it. Afaik, I need some sort of machine learning.
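One way to avoid picking k at all is a distance threshold: treat two colors as "the same" when they are closer than some cutoff. Below is a minimal NumPy sketch of greedy "leader" clustering in plain RGB (a swapped-in technique, not k-means; the function name and the threshold of 40 are assumptions to tune). Results depend on input order, and RGB distance only roughly matches perceived similarity, so converting to a perceptual space such as CIELAB first is a common refinement.

```python
import numpy as np

def group_colors(colors, threshold=40.0):
    """Greedy 'leader' clustering of RGB colors.

    Each color joins the first existing group whose centroid lies within
    `threshold` (Euclidean distance in RGB); otherwise it starts a new group.
    Returns one group label per color plus the group centroids.
    """
    colors = np.asarray(colors, dtype=float)
    centroids, members = [], []
    labels = np.empty(len(colors), dtype=int)
    for i, c in enumerate(colors):
        for g, centroid in enumerate(centroids):
            if np.linalg.norm(c - centroid) <= threshold:
                members[g].append(c)
                centroids[g] = np.mean(members[g], axis=0)  # update the group's average color
                labels[i] = g
                break
        else:  # no existing group was close enough
            centroids.append(c.copy())
            members.append([c])
            labels[i] = len(centroids) - 1
    return labels, np.array(centroids)

# Made-up dominant colors: three near-identical reds and two near-identical blues
colors = [(250, 10, 10), (245, 15, 12), (240, 5, 20), (10, 10, 250), (12, 20, 240)]
labels, centroids = group_colors(colors)
print(labels)
print(centroids.round(1))
```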
How do I deal with this data distribution?
You can try minmax/standard scaling
Maybe even apply a log transformation
or segment data to two groups, bell curve and spike
Merry Christmas everyone. Today is a good day to let your PC rest. Go enjoy your time with your family and loved ones.
make sure to check ur dataset, maybe there was an error within a pipeline
This is the original data
you could add a column that's just whether this data is 0 or not
Could you elaborate
add a boolean column that indicates if this data is 0
e.g. df['is_data_0'] = df['data'] == 0
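A sketch combining the two suggestions for a zero-inflated column, assuming the values are non-negative (the toy numbers are made up): the boolean flag captures the spike at zero, and `np.log1p` compresses the long right tail while mapping 0 to 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [0, 0, 0, 3.5, 12.0, 0, 150.0, 7.2]})  # toy zero-inflated column

df["is_data_0"] = df["data"] == 0      # flag the spike at zero
df["log_data"] = np.log1p(df["data"])  # log(1 + x): zeros stay 0, the long tail is compressed
print(df)
```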
does having more FC layers increase the accuracy? also, does the in_features param have a specific rule? i always see people set it to 4096 or 2048
guys this 2025 im starting with ai ml...........how to get started??
can you recommend some yt channels, videos or books maybe?
Check the pins
ok thank you
I am learning Data Science.. Anyone wants to collaborate ?
Is it the new Anaconda 10.24, or why can't I switch between command and edit mode? It's always blue when I run Jupyter Notebook??
People set it to powers of 2. I forgot why, but it's more efficient to do so
Unless you know exactly why you need to be using anaconda, I recommend just using regular python.
I'm taking a data analysis course and am still learning the fundamentals. The instructor set up Anaconda and switches between the two modes, while I'm stuck in command mode and don't know how to change to edit mode. He uses an old version of Anaconda.
@azure osprey I removed your message for containing self promotion.
if one knows a good approach to filter large amounts of text to features other than nltk and rapidfuzz would be interested in ur opinion
What kind of text and what kind of features, for what task?
Please be as specific as you possibly can.
i got a description of and want to find (over many descriptions) shared wording, therefore i split the original description into features using re. however this can lead to bad fragments etc., therefore i wanted to know if there's a better solution for that task.
Currently its working and im able to filter frequently used words by this approach but i have to do a manual selection of those words and also do manual cropping, once thats done i use a list of those frequently used words to check the original descriptions for the occurrence.
My question had three parts. Please let me know when you've answered all three.
Also, you wrote "I got a description of", and there's nothing after the "of".
i think i gave answers to all three:
- what type: description (recipes)
- what kind of feature: frequently used words (so to say a word)
- for what task: occurrence of frequently used words
Your original message didn't say recipes, so I was confused. Let me think.
yeh u noticed, i dropped that, sorry xD
Each description of a recipe is a document.
- split each document into tokens
- drop tokens that are stopwords
- replace each token with its lemma
- split each sequence of tokens into trigrams, 4grams, 5grams, ... Ngrams
- look at the frequency of the *grams across documents
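The steps above can be sketched in plain Python. The stopword list and the lemma table are tiny hypothetical stand-ins; in practice you'd get lemmas from something like spaCy's `token.lemma_` or NLTK's `WordNetLemmatizer`. Counting each n-gram at most once per recipe gives a document frequency, i.e. how many descriptions share that wording.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "with", "until"}  # tiny illustrative list
LEMMAS = {"onions": "onion", "chopped": "chop", "minutes": "minute"}      # hypothetical lemma table

def tokens(doc):
    words = re.findall(r"[a-z0-9]+", doc.lower())     # split into tokens
    words = [w for w in words if w not in STOPWORDS]  # drop stopwords
    return [LEMMAS.get(w, w) for w in words]          # replace each token with its lemma

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def ngram_document_frequency(docs, n_values=(3, 4, 5)):
    doc_freq = Counter()
    for doc in docs:
        toks = tokens(doc)
        seen = set()
        for n in n_values:
            seen.update(ngrams(toks, n))
        doc_freq.update(seen)  # each n-gram counted at most once per document
    return doc_freq

recipes = [
    "Fry the chopped onions in butter until golden.",
    "Add chopped onions in butter and stir.",
    "Fry the chopped onion in butter for 5 minutes.",
]
doc_freq = ngram_document_frequency(recipes)
print(doc_freq.most_common(3))
```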
do u know if it works for text like: "thisIs a string whichCan have linked Words and also integers12g..."
So medial capital letters are actually supposed to be a word break?
And medial digits?
i would say also, however digits can hold units following them, i tried to do that with re but thats not suitable i think
You can do it with re.
If a capital letter follows a lowercase letter, or a digit follows a letter, add a space.
But not for letters coming after digits.
There wouldn't be an easy way to not split medial capitals that are supposed to be there. Like for McGonigle.
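A sketch of those two rules with `re.sub` and zero-width lookarounds (the pattern is one possible encoding of the rules, not the only one). As noted, it also splits legitimate medial capitals, so "McGonigle" becomes "Mc Gonigle".

```python
import re

# Space between a lowercase letter and a following capital ("thisIs" -> "this Is"),
# and between a letter and a following digit ("integers12g" -> "integers 12g").
# Digit -> letter boundaries are left alone so units like "12g" stay together.
SPLIT = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=\d)")

def split_words(text):
    return SPLIT.sub(" ", text)

print(split_words("thisIs a string whichCan have linked Words and also integers12g..."))
print(split_words("McGonigle"))
```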
i was thinking to maybe give it a kind of limit but am unsure
i want to avoid a large dict which needs to be loaded and updated all time
for my testing, nltk seems to be way better than re. i think i will do prefiltering with nltk and then check for short tokens and their indices
why nltk over spacy in this usecase btw? (btw not criticising just asking for my own knowledge)
Hello, is there any way to bypass the requirement that the output shape match the yTrain shape when training a CNN model? I'm trying to follow a CNN architecture stated in a research paper, but there the input shape equals the output shape, which is not the shape of my yTrain data, so I get a mismatch. I'm worried that if I just reshape it, the model won't learn well.
i.e. input_shape = output_shape = [?, 1280, 1]
y_Train shape = [?, 5, 1]
The research paper I'm talking about is this
Leong, Zi Xian & Zhu, Tieyuan. (2021). Direct Velocity Inversion of Ground Penetrating Radar Data Using GPRNet. Journal of Geophysical Research: Solid Earth. 126. 10.1029/2020JB021047.
Or if there's any suggestion on how I can do it without need to bypass , thanks
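One option (an assumption, not something taken from the GPRNet paper) is to keep the architecture as published and append a small trainable projection that maps the 1280-long output to the 5 target values. Unlike a plain reshape, the projection's weights are learned. PyTorch is assumed here even though the original may use another framework, and `nn.Identity()` is a hypothetical stand-in for the actual network.

```python
import torch
from torch import nn

class WithProjection(nn.Module):
    """Wrap a model whose output is (batch, 1280, 1) and project it to (batch, 5, 1)."""

    def __init__(self, base, in_len=1280, out_len=5):
        super().__init__()
        self.base = base
        self.proj = nn.Linear(in_len, out_len)  # trainable mapping 1280 -> 5

    def forward(self, x):
        y = self.base(x)              # (batch, 1280, 1)
        y = self.proj(y.squeeze(-1))  # (batch, 5)
        return y.unsqueeze(-1)        # (batch, 5, 1), matching the targets

base = nn.Identity()  # hypothetical stand-in for the paper's network
model = WithProjection(base)
out = model(torch.randn(8, 1280, 1))
print(out.shape)
```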
Check out these new wild visuals. The training and val graphs are pretty smooth; it seems it's learning without overfitting. Finally got the complexity metrics working right too, but I'm still running optuna trials to tune parameters, playing with base/hidden/max dims. Tweaking those gives wildly different outputs.
The complexity metric is cool, it's like the pulse of the market
guys, how does linear algebra play into data science?
depends on whether you consider data science and ML to have a hard boundary. everything in ML involves linear algebra.
I feel like I'm dealing with the 3 body problem in my base/hidden and max dims
These parameters are all pulling on each other, and changing one can totally throw off the others.
I'd say "involves" is an understatement, lol. It's the bedrock of data science and ML. It's the fundamental language for representing data as vectors and matrices, and NumPy and Pandas are built upon these principles, making it essential for even basic data manipulation.
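As a small illustration: a dataset is a matrix (rows are samples, columns are features), and ordinary least-squares regression is a linear-algebra problem, min ||A w - y||². The sketch below, using made-up noiseless data, recovers the coefficients and intercept with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 samples (rows) x 3 features (columns)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0            # noiseless target with intercept 4

# Append a column of ones so the intercept is just another coefficient
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution of A w = y
print(w)
```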
Anyone please?
I'll check that out, thanks. I just did a quick google on what to use to tokenize larger strings and that's what I came up with
Does the new Anaconda update 10.24 not offer switching between edit and command mode? It always shows a blue bar with no In/Out or pencil icon. Can anyone confirm this, or is it just a bug in my app?
What python package is this?
I must know
Definitely installing immediately
Once I know
for visualization?
he probably used Matplotlib with dark background style
this is the official documentation to achieve this effect:
https://matplotlib.org/stable/gallery/style_sheets/dark_background.html
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('dark_background')
fig, ax = plt.subplots()
L = 6
x = np.linspace(0, L)
ncolors = len(plt.rcParams['axes.prop_cycle'])
shift = np.linspace(0, L, ncolors, endpoint=False)
for s in shift:
    ax.plot(x, np.sin(x + s), 'o-')
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title("'dark_background' style sheet")
plt.show()
According to the documentation, plt.style.use('dark_background') switches the plot's theme to dark mode.
this link has all the style sheets in Matplotlib:
https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html
I never used dark background
I always used fivethirtyeight
Or seaborn styles
I'll look at it
I assumed it was interactive as well
I just got Plotly on my side too
plotly is amazing
never heard of it, will check it out
You must
I have a thing for visualizations
I know machine learning is important and all
But the graphs are just so pleasing
i really like the altair.param method
in altair?
For Plotly you just use the plotly_dark template
In general with visualizations
Something about them seems so alluring
yeah dark visualizations look sick
Bloomberg used dark visualizations
Which is probably why
And economics is my profession
oh that's cool
Yeah so my point on this channel will probably be for yfinance and other related packages
I hope you find the help and the assist you need
Quick question, does randomforest get affected by outliers?
Should i remove outliers?
I'll have no problem, everyone loves Python right
Only wished that yfinance wasn't discontinued by Yahoo itself
Absolutely
any idea?
That sounds like a general statistics question
it is
just want confirmation, i cant trust gpt anymore
I know it has the bagging technique I think it's called
Bootstrap aggregation is what the Internet calls it
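A toy sketch of bootstrap aggregation with 1-D decision stumps (all names are hypothetical; scikit-learn's random forests additionally sample a random subset of features at each split). It also hints at why tree ensembles tend to be fairly robust to outliers in the features: a split only depends on the ordering of values, so a point at x = 1000 is treated like any other large x. Outliers in a regression target are a different matter, since they shift leaf averages.

```python
import numpy as np

def fit_stump(x, y):
    """1-D decision stump: predict 1 when x > t, choosing t to minimise training errors."""
    best_t, best_err = None, np.inf
    for t in np.unique(x):
        err = np.mean((x > t).astype(int) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_stumps(x, y, n_estimators=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    thresholds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap sample: n rows drawn with replacement
        thresholds.append(fit_stump(x[idx], y[idx]))
    return np.array(thresholds)

def predict(thresholds, x):
    votes = (x[:, None] > thresholds[None, :]).astype(int)  # one column per stump
    return (votes.mean(axis=1) >= 0.5).astype(int)           # majority vote

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000], dtype=float)  # 1000 is an outlier
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
thresholds = bagged_stumps(x, y)
print(np.median(thresholds))
print(predict(thresholds, np.array([2.0, 9.0, 1000.0])))
```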
So i found a dataset on kaggle and wanted to make my own notebook.
out of 100,000 records i ended up with 10,000.
My original idea was to make a neural network, but the number of records is low, isn't it?
idk
Can you describe the dataset? Or give a df.head
wait
9 features
Hmmm interesting
So idk, do you suggest continuing with a neural network or random forest?
My initial thought was to do a logistic regression with #diabetes part
but it's a classification problem, either 0 or 1
Im unsure
i'm a second-year data science student
i feel stupid not knowing where i'm going
Im economics so less knowledgeable on it. For me I'm only now looking at these things
bruh, python is really for everyone
Ye
i will try both then. Extra 2 hours, no worries
you removed 90,000 outliers?
Hi, what are the best tools to use for ml model deployment? Docker, Amazon Sagemaker, etc? Idk much about this so if you guys have some online course suggestions that I can take to learn this is would be great, ty!
Check out the pins for this channel
I didn't see anything about model deployment, maybe I'm blind lol
Ah, cuz there aren't any for that I don't think. I may have misread what resources you were looking for specifically.
matplotlib and seaborn dark theme
depends a lot on the type of model you're trying to deploy. deep learning models are super annoying, while linear regression is super simple
How would you do it for either scenario?
What do you think of this course?
https://www.udemy.com/course/deployment-of-machine-learning-models/
MLOps is quite broad, hence there's really no acclaimed "best" tool out there. This is because the notion of what's best can be very subjective.
You might find this roadmap interesting. https://marvelousmlops.substack.com/p/mlops-roadmap-2024?utm_source=substack&utm_medium=web&utm_content=embedded-post&triedRedirect=true
being good at docker is a superpower for MLops and in general. learn about the difference between an image and a container, all the things you can do when you start a container, and how to make an image with a Dockerfile.
hey guys i need help with my pytorch train function
# Train function
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def train_combined(model, train_loader, criterion, optimizer, epochs=30):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        for image, label in train_loader:
            image, label = image.to(device), label.to(device)
            optimizer.zero_grad()
            fwd_output = model(image)
            loss = criterion(fwd_output, label)
            loss.backward()
            optimizer.step()
            # Accumulate the batch loss; without this the printed loss is always 0
            running_loss += loss.item()
            # Calculate accuracy
            _, predicted = torch.max(fwd_output, 1)
            total += label.size(0)
            correct += (predicted == label).sum().item()
        train_acc = 100 * correct / total
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss / len(train_loader):.4f}, Accuracy: {train_acc:.2f}%")
I keep using cross-entropy loss and it keeps yapping about: RuntimeError: 0D or 1D target tensor expected, multi-target not supported
I dont really understand how though
btw ping me, i may get off in a bit (hopefully stelercus can impart some great wisdom)
What page should I read up to on mastering pytorch?
Please show the whole error message. (Remember to always do that)
Omg I just got my Reinforcement Learning book for next semester. The author uses Lisp in exercises
what the hell
I went to the codes and they were in lisp
fuck
I did. I'm asking what page of the book I should read up to.
so "mastering pytorch" is the name of the book? is it an O'Reilly book?
LISP was the AI language.
This class looks like a blast
which one is it @unkempt wigeon
a real blast that will blast my head
Im getting flashback from my data structures and algorithms course
@unkempt wigeon the "Who this book is for" section of the book says "Working knowledge of deep learning with Python programming is required", so you should not have purchased the book.
Omg I just looked at this @iron basalt
RL is all about algorithms?
RL is a branch of Control Theory.
And also a branch of ML.
It's for AI, not just ML.
It's math.
The algorithms are abstract and can be implemented in many ways.
good thing is that this class doesn't have exams
just a project and homework
I would be screwed I think
The book will give them to you like this:
I implemented them in C when I first went through it.
Second edition
oh okay
Oh man, memories of ai courses in grad school tragically missed the boat
do I need the first edition?
No, you need a book that's for learning about neural networks. not how to implement neural networks that you already know about.
What's the name of the book that I need, and by whom?
Also thank you
No, the data was imbalanced so i had to drop most of the records to make the majority class match the minority class
If the data is representative, you shouldn't try to solve the imbalance
https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning
what does "representative" mean
That the sample, when scaled up to the size of the population, (roughly) matches the population
The author stated that it depends on the learning algorithm or model you want to use
like he mentioned, random forest is not affected by imbalance
but i'm sure neural networks do get affected
I mean, another way to emphasize a minority would be to use a weighted cost function so that the loss is more affected by the minority than the majority and, I'm sure that would have a slightly different effect than solving the imbalance by throwing data out
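A sketch of the weighted-cost-function idea using the `weight` argument of PyTorch's `nn.CrossEntropyLoss`. The inverse-frequency weighting `n_samples / (n_classes * count)` is one common choice, not the only one, and the batch is made up. With the default `reduction="mean"`, PyTorch divides the weighted sum by the sum of the per-sample weights, so minority-class mistakes count for more.

```python
import torch
from torch import nn

# Made-up imbalanced batch: 8 samples of class 0, 2 samples of class 1
targets = torch.tensor([0] * 8 + [1] * 2)
logits = torch.randn(10, 2, generator=torch.Generator().manual_seed(0))

# Inverse-frequency class weights: n_samples / (n_classes * count_per_class)
counts = torch.bincount(targets, minlength=2).float()
weights = counts.sum() / (len(counts) * counts)  # tensor([0.6250, 2.5000])

plain_loss = nn.CrossEntropyLoss()(logits, targets)
weighted_loss = nn.CrossEntropyLoss(weight=weights)(logits, targets)
print(plain_loss.item(), weighted_loss.item())
```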
Oooo the info on this link looks good!! Tyyy!!
Got it tyy
Hi everyone, I'm encountering an issue with timing inconsistencies in my PyTorch GPU matrix multiplication loop. Here's the situation:
I'm iterating 1000 times and timing each matrix multiplication operation torch.matmul on the GPU.
Some iterations show 0.0 ms, which is unrealistic, while others show high variability (e.g., 188 ms vs. <1 ms).
My total execution time for 1000 iterations is ~496 seconds, which seems inefficient.
I suspect issues with CUDA context initialization, memory allocation within the loop, improper timing measurements, or CUDA/PyTorch compatibility.
Details:
Timing: time.time(), without torch.cuda.synchronize().
Installed PyTorch for CUDA version 12.4; CUDA version on PC is 12.7.
GPU: NVIDIA RTX 3060 Ti.
PC specs: Intel i5-12400F, 16GB RAM, SSD+HDD storage.
this is my code:
import torch
import numpy as np
import time
import matplotlib.pyplot as plt
times = [] # this variable is for storing the time taken for each iteration
# Perform large matrix multiplication on GPU 1000 times
start_gpu_100 = time.time()
for i in range(1000):
    # Define large matrices as random tensors directly on GPU
    large_matrix1_torch = torch.rand((10*i, 10*i), device='cuda')
    large_matrix2_torch = torch.rand((10*i, 10*i), device='cuda')
    print("Starting Iteration: ", i)
    iter_start = time.time()
    result_gpu_100 = torch.matmul(large_matrix1_torch, large_matrix2_torch)
    iter_time = (time.time() - iter_start) * 1000  # Convert to milliseconds
    print(f"Ending Iteration: {i} in {iter_time} ms")
    # Collect iteration times
    times.append(iter_time)
end_gpu_100 = time.time()
plt.plot(times)
plt.xlabel('Iteration')
plt.ylabel('Time (ms)')
plt.title('Time taken for each iteration on GPU')
plt.show()
print("Time taken on GPU (1000 times):", (end_gpu_100 - start_gpu_100) * 1000, "ms")
final time taken:
Time taken on GPU (1000 times): 496804.4521808624 ms
I also made this plot, i really wonder what is the cause of these spikes, and why isn't it a linear relationship with the matrix size?
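CUDA kernels are launched asynchronously, so `time.time()` around `torch.matmul` mostly measures the launch, and the real cost tends to show up in whichever later call has to wait for the GPU; that would likely explain both the 0.0 ms readings and the spikes. A sketch of a fairer measurement, assuming PyTorch and falling back to the CPU when no GPU is present: allocate the inputs outside the timed region, do a warm-up call, and call `torch.cuda.synchronize()` before reading the clock. Note too that a dense n x n matmul takes on the order of n³ operations, so time isn't expected to grow linearly with n.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def sync():
    # CUDA work is queued asynchronously: wait until it has actually finished
    if device == "cuda":
        torch.cuda.synchronize()

def time_matmul(n, repeats=5):
    """Average milliseconds per n x n matmul, excluding allocation and warm-up."""
    a = torch.rand((n, n), device=device)
    b = torch.rand((n, n), device=device)
    torch.matmul(a, b)  # warm-up: absorbs one-time costs (CUDA context, kernel selection)
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    sync()  # make sure every timed matmul is done before stopping the clock
    return (time.perf_counter() - start) / repeats * 1000

for n in (256, 512, 1024):
    print(f"n={n}: {time_matmul(n):.3f} ms")
```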
Alright
Btw I am using a vision transformer for this
how do you make a model that generates text and replies like chatgpt? what is it actually called in deep learning?
a large language model? or are you looking for some technical term?
yes
yes to what?
technical term
there are many terms, autoregressive sequence to sequence models and transformer models are the common ones
but it depends on the kind of LLM being used
Are you wanting to do this local? I posted a link a little bit back that has a new Nvidia computer and the guy puts llamas on it.
Okay, but what about the error message? Are you going to show it?
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-64-79153cd7a2a3> in <cell line: 1>()
----> 1 train_combined(new_vit_pathmnist, val_pathmnist_dataloader, loss_fn, optimizer_fn)
4 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
3477 if size_average is not None or reduce is not None:
3478 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3479 return torch._C._nn.cross_entropy_loss(
3480 input,
3481 target,
RuntimeError: 0D or 1D target tensor expected, multi-target not supported
I can send you the notebook if you want
No; I'll look at this when I'm at my computer
No problem sir
yeah im looking into it can you link it again
FREE GIVEAWAY OF JENSEN-HUANG-SIGNED ORIN NANO SUPER! See Below!
Join Dave as he explores NVIDIA's Jetson Orin Nano Super, a compact AI powerhouse with 1024 CUDA cores and 6 ARM cores for just $249. Learn why this could be the best AI board for your projects in robotics, IoT, or AI development. Free Sample of my Book on the Spectrum: https://a...
the error comes from inside cross_entropy_loss and it's saying that the target tensor must be 0d or 1d. it assumes that 2d means a "multi-target" problem, which cross_entropy_loss does not support. so you need to look at the shape of your label/target tensor and check that it conforms to what cross_entropy_loss expects.
is that device needed for llms?
the -> points to the line where the error occurred. the ----> above it points to the function that the error occurred inside of. that's usually all you need to understand the error message at the bottom.
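A minimal reproduction of that error, assuming the labels come as a column of shape (batch, 1), which some datasets return: for (batch, num_classes) logits, `F.cross_entropy` expects class indices of shape (batch,), and squeezing the extra dimension fixes it. The shapes and the 9 classes here are made up.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 9)                   # (batch, num_classes)
labels = torch.tensor([[3], [0], [8], [1]])  # (batch, 1): one extra trailing dimension

try:
    F.cross_entropy(logits, labels)
except RuntimeError as err:
    print("RuntimeError:", err)

# Class indices of shape (batch,) are what cross_entropy expects here
loss = F.cross_entropy(logits, labels.squeeze(1).long())
print(loss.item())
```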
PyTorch has a nice doc page for functions like LayerNorm() with the math and everything. Do they somewhere also have docs about the derivative/backward functions of it?
LLMs require a huge amount of computation power. The Jetson Orin Nano Super can run small language models (SLMs). Just to be clear, I don't have a Jetson Orin Nano Super on me, but from its specs you can tell that it can probably run something like a 2-7B parameter language model.
alright
i will check sir
alright sir
this is the label
these are random predictions:
i'm very confused sir
this is what it is for mnist
https://pytorch.org/docs/stable/generated/torch.amax.html states that
amax/amin evenly distributes gradient between equal values, while max(dim)/min(dim) propagates gradient only to a single index in the source tensor.
What exactly do they mean by that? Do they mean that if we have an array [1,2,3,4,4,3,2,1], then we have two max values (the two 4s), which might have different gradients, but amax then just takes the mean of them and reports that? What does max do in that case? The word "propagate" doesn't really mean much to me.
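A quick way to see the difference is to backpropagate through both on the array from the question. With `amax`, the incoming gradient of 1 is split evenly between the two tied 4s (0.5 each); with `max(dim=0)`, the whole gradient goes to the single index that `max` returns. The total gradient is the same in both cases; only how it's distributed among ties differs.

```python
import torch

vals = [1., 2., 3., 4., 4., 3., 2., 1.]

# amax: the gradient is shared equally among all tied maxima
x = torch.tensor(vals, requires_grad=True)
torch.amax(x).backward()
print(x.grad)  # 0.5 at the two positions holding 4, zeros elsewhere

# max(dim): the gradient flows only to the one index that max returns
y = torch.tensor(vals, requires_grad=True)
values, idx = y.max(dim=0)
values.backward()
print(idx.item(), y.grad)  # a single 1.0 at position idx
```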
Oh crap
Never mind sir I just realized that my data set was multi-class not single class
stelercus can rest now
It isn't needed specifically. It's just a little computer for cheap that is almost designed for the job. @ $250, if it's a serious project and you need more power, you could stack these as I understand (I also don't have one)
I can't rest until I have accomplished my glorious goals.
ik you can achieve them sir
damn i dont think i can handle a llm on my crappy intel i7 6600u
the LLMs that people talk about these days require enterprise hardware.
and even then, sometimes you have to quantize (which means storing each parameter at a lower precision, at some cost in output quality)
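A toy sketch of the idea, assuming the simplest scheme: symmetric, per-tensor int8 quantization of one random weight tensor. Real LLM quantization schemes (per-channel, group-wise, 4-bit, and so on) are more elaborate, but the trade-off is the same: fewer bytes per parameter in exchange for a small rounding error.

```python
import torch

torch.manual_seed(0)
w = torch.randn(1000)  # stand-in for one layer's float32 weights

# One float scale for the whole tensor maps the largest |w| to 127
scale = w.abs().max() / 127
w_int8 = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)

# Dequantize to use the weights again; the difference is the precision that was lost
w_restored = w_int8.to(torch.float32) * scale
max_err = (w - w_restored).abs().max().item()

print(w.element_size(), w_int8.element_size())  # 4 1  (bytes per parameter)
print(max_err <= scale.item() / 2 + 1e-6)       # rounding error is at most half a step
```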
can someone check out #databases
For a convolutional neural network, in my case, is it just taking a photo and converting it into an array?
an image can be represented as a 3d array. do you know what each of the three dimensions represents?
hint: greyscale images are 2d.
3d is for color
there are three dimensions. I'm asking you to say what each one is.
I'm asking: do I convert each coordinate into an array?
if you load the image with PIL, it can be converted straight into an array (e.g. with np.asarray).
Is that why we need the conversion?
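A minimal sketch of what the conversion gives you. A synthetic image created with `Image.new` stands in for `Image.open("photo.jpg")`, and the 64x48 size is arbitrary. A colour image becomes a 3-D array of (height, width, channels), and a greyscale one drops the channel axis.

```python
import numpy as np
from PIL import Image

# Synthetic 64x48 red RGB image; PIL sizes are given as (width, height)
img = Image.new("RGB", (64, 48), color=(255, 0, 0))
arr = np.asarray(img)
print(arr.shape, arr.dtype)  # (48, 64, 3) uint8 -> (height, width, channels)

gray = np.asarray(img.convert("L"))
print(gray.shape)            # (48, 64): greyscale has no channel axis

# PyTorch conv layers expect (channels, height, width), usually as floats in [0, 1]
chw = arr.transpose(2, 0, 1).astype(np.float32) / 255.0
print(chw.shape)             # (3, 48, 64)
```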
the error messages are there to explain the error, not to confuse you. read them, think about what they mean.
usually the answer to "how do i fix the error" either lies in the error message itself, or the immediately surrounding code, or both
only if you want to!
hi, if im starting out and want to do some of my own deep learning projects do you guys recommend I go thoroughly through the theory first, like a textbook or a course then start development or go head first into trying to build a project?
i'd say you should alternate between theory and practice. but a structured course is almost always better than going it alone. are you already competent at programming?
if so, you might want to check out "dive into deep learning" or "fast.ai" courses, both free online
hey guys, can you please give me a beginner roadmap for machine learning? i'm a beginner and new to programming, though i do have basic programming knowledge
Depends on what you mean by competent, but I would consider myself somewhat experienced.
I will check that out, thank you
you can research or search for roadmaps online but I know for a fact that if you are going into machine learning/ ai engineering then you must know your maths. linear algebra, calc, and stats are the main ones
okay please tell me
thank you
Np
what about something like the andrew ng course?
search on yt for the roadmaps
what is the andrew ng course
heyo
im working on an ml model and need a bit of guidance,
Don't wait for someone to agree to help you. Be as specific as you can about what you're trying to do and what problem you're having.
aye
i need a bit of help with the venv as it says "ipykernel" is missing and when vsc installs it, still doesn't work.
double check if you have the correct environment selected
maybe try reloading/restarting vscode
and i think it's the venv problem, but CUDA and other things still don't work, and when i have, say, py 3.7 selected, the terminal still says it's on 3.12 or something
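A quick diagnostic for this kind of mismatch: run the snippet below both in a notebook cell and from the terminal. If `executable` differs between the two, the notebook kernel and the terminal are using different interpreters. `describe_interpreter` is a hypothetical helper written for this example, and the torch part only runs if torch is installed in that interpreter.

```python
import sys

def describe_interpreter():
    # Which Python binary is actually running this code, and which version it is
    info = {"executable": sys.executable, "version": sys.version.split()[0]}
    try:
        import torch  # only present if torch is installed in *this* interpreter
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        info["torch"] = None
    return info

print(describe_interpreter())
```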
i did
just pinging this again, if someone could help that'd be great
I have questions about the following code
```py
df['Sentiment'] = df['Sentiment'].map({"positive":2,"negative":0,"neutral":1})
MODEL_NAME = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)

class Finance_Dataset(Dataset):
    def __init__(self,Sentence,targets,tokenizer,max_len):
        self.Sentence = Sentence
        self.targets = targets
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.Sentence)

    def __getitem__(self,idx):
        Sentence = str(self.Sentence[idx])
        Sentence = " ".join(Sentence.split())
        target = self.targets[idx]
        encoding = self.tokenizer.encode_plus(
            Sentence,
            max_length=self.max_len,
            padding="max_length",
            return_attention_mask=True,
            return_token_type_ids=True,
            add_special_tokens=True,
            truncation=True,
            return_tensors='pt',
        )
        attention_mask = encoding['attention_mask']
        input_ids = encoding['input_ids']
        token_type_ids = encoding['token_type_ids']
        return {
            "Sentence":Sentence,
            "attention_mask":torch.tensor(attention_mask,dtype=torch.long),
            "input_ids":torch.tensor(input_ids,dtype=torch.long),
            "targets":torch.tensor(target,dtype=torch.long),
            "token_type_ids":torch.tensor(token_type_ids,dtype=torch.float)
        }

from sklearn.model_selection import train_test_split
df_train,df_val = train_test_split(df,test_size=.20,random_state=42)
BATCH_SIZE_TRAIN = 8
VAL_BATCH_SIZE = 4
MAX_LEN = 200
num_epochs = 1

def get_dataloader(df,tokenizer,batch_size,max_len):
    ds = Finance_Dataset(
        Sentence = df['Sentence'].to_numpy(),
        targets = df['Sentiment'].to_numpy(),
        tokenizer=tokenizer,
        max_len=max_len
    )
    return torch.utils.data.DataLoader(
        ds,
        batch_size=batch_size,
        num_workers=0
    )

train_dataloader = get_dataloader(df_train, tokenizer=tokenizer, batch_size=BATCH_SIZE_TRAIN, max_len=MAX_LEN)
val_dataloader = get_dataloader(df_val, tokenizer=tokenizer, batch_size=VAL_BATCH_SIZE, max_len=MAX_LEN)
training_batch = next(iter(train_dataloader))
training_batch.keys()
attention_mask = training_batch['attention_mask']
input_ids = training_batch['input_ids']
targets = training_batch['targets']
token_type_ids = training_batch['token_type_ids']
bert_model = BertModel.from_pretrained(MODEL_NAME)

class BERTClass(nn.Module):
    def __init__(self):
        super(BERTClass, self).__init__()
        self.l1 = BertModel.from_pretrained(MODEL_NAME)
        self.l2 = torch.nn.Dropout(0.1)
        self.l3 = torch.nn.Linear(768, 3)

    def forward(self, input_ids, attention_mask, token_type_ids):
        _, output_1= self.l1(token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)
        output_2 = self.l2(output_1)
        output = self.l3(output_2)
        return output

model = BERTClass()
model.to(device)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(),lr=1e-5)
```
```py
def training_epoch(epochs):
    model.train()
    for d in train_dataloader:
        input_ids = d['input_ids'].to(device,dtype=torch.long)
        attention_mask = d['attention_mask'].to(device,dtype=torch.long)
        targets = d['targets'].to(device,dtype=torch.long)
        token_type_ids = d['token_type_ids'].to(device,dtype=torch.long)
        outputs = model(
            attention_mask=attention_mask,
            input_ids=input_ids,token_type_ids=token_type_ids)
        optimizer.zero_grad()
        loss = loss_fn(outputs,targets.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
@lapis sequoia please put a py after the three backticks so that there's color.
```py
code goes here
```
just keep that in mind for the future. what is your question?
@lapis sequoia please stop trying to fix it. please just ask your question.
ok, I am trying to run it in a training loop, I keep getting the same error
be sure to always show the error message right away.
raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds
Please show the whole error message. that's just the end.
don't worry about how long it is. just show the whole thing.
```
print(training_epoch(epoch))
Cell In[200], line 153 in training_epoch
  outputs = model(
File ~\anaconda3\envs\pytorch_env\Lib\site-packages\torch\nn\modules\module.py:1736 in _wrapped_call_impl
  return self._call_impl(*args, **kwargs)
File ~\anaconda3\envs\pytorch_env\Lib\site-packages\torch\nn\modules\module.py:1747 in _call_impl
  return forward_call(*args, **kwargs)
Cell In[200], line 131 in forward
  _, output_1= self.l1(token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)
File ~\anaconda3\envs\pytorch_env\Lib\site-packages\torch\nn\modules\module.py:1736 in _wrapped_call_impl
  return self._call_impl(*args, **kwargs)
File ~\anaconda3\envs\pytorch_env\Lib\site-packages\torch\nn\modules\module.py:1747 in _call_impl
  return forward_call(*args, **kwargs)
File ~\anaconda3\envs\pytorch_env\Lib\site-packages\transformers\models\bert\modeling_bert.py:1062 in forward
  raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds
```
what's the exact code that's in In[200]?
```py
code goes here on a new line
```
```
Cell In[200], line 153 in training_epoch
  outputs = model(
```
```py
def training_epoch(epochs):
    model.train()
    for d in train_dataloader:
        input_ids = d['input_ids'].to(device,dtype=torch.long)
        attention_mask = d['attention_mask'].to(device,dtype=torch.long)
        targets = d['targets'].to(device,dtype=torch.long)
        token_type_ids = d['token_type_ids'].to(device,dtype=torch.long)
        outputs = model(
            attention_mask=attention_mask,
            input_ids=input_ids,token_type_ids=token_type_ids)
        optimizer.zero_grad()
```
You have this
```py
outputs = model(
attention_mask=attention_mask,
input_ids=input_ids,
token_type_ids=token_type_ids
)
```
Which goes to `BERTClass.forward`
```py
def forward(self, input_ids, attention_mask, token_type_ids):
_, output_1= self.l1(token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)
output_2 = self.l2(output_1)
output = self.l3(output_2)
return output
```
Look at what you pass to `self.l1`, which is the BERT model that you're wrapping.
you define it outside first?
before we continue, do you understand that the py must go on the same line as the three backticks?
```py
code
```
yes, just exhausted
@lapis sequoia self.l1 is the BERT model itself. l2 and l3 are just two additional layers to turn it into a classifier. right?
yes
the BERT model requires you to specify either input_ids or inputs_embeds.
`self.l1(token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)`
before the nn.Module?
do you see?
I do. Is that defined in the class, or outside of it first? Or does it all go through the training epoch?
in your forward method, you don't do anything with the input_ids.
you need to pass them through to the BERT model.
self.l1( ) is where you pass things through the BERT model.
is defining these variables first, messing it up?
No. defining variables does not, in itself, mess anything up.
in the class, for the dataset, does anything not have to be there?
The BERT model is telling you "You have to specify either input_ids or inputs_embeds"
`self.l1(token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)`
you need to specify either input_ids or inputs_embeds
you currently do neither
do you see the solution?
show how this line should be modified to solve the problem.
`_, output_1= self.l1(token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)`
do you only need those two?
it's just one line--it's the only one you need to modify to solve the current problem.
do you know what the solution is?
you still need the input ids, or do you not?
the error message says "You have to specify either input_ids or inputs_embeds"
not both
you currently do neither.
can you imagine what code you would need to insert that would "specify input_ids"?
the only parameters you have available are input_ids, attention_mask, token_type_ids, none of which are embeds.
targets
targets?
Just replace the forward method with this.
```py
def forward(self, input_ids, attention_mask, token_type_ids):
    _, output_1= self.l1(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, return_dict=False)
    output_2 = self.l2(output_1)
    output = self.l3(output_2)
    return output
```
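A self-contained sketch of the fixed forward pass. It uses a tiny, randomly initialised `BertConfig` so it runs without downloading weights. The config sizes are arbitrary assumptions; in the real script you would keep `BertModel.from_pretrained(MODEL_NAME)`. It also swaps the loss, which the thread didn't cover: with three mutually exclusive sentiment classes mapped to 0/1/2, `CrossEntropyLoss` with integer targets fits. The original `BCEWithLogitsLoss` expects float targets with the same shape as the `(batch, 3)` logits, so it would likely be the next error.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel

class BERTClass(nn.Module):
    def __init__(self, bert, num_classes=3):
        super().__init__()
        self.l1 = bert                                   # the wrapped BERT model
        self.l2 = nn.Dropout(0.1)
        self.l3 = nn.Linear(bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask, token_type_ids):
        # input_ids must be passed through, otherwise BertModel raises
        # "You have to specify either input_ids or inputs_embeds"
        _, pooled = self.l1(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            return_dict=False,
        )
        return self.l3(self.l2(pooled))

# Tiny random config (hypothetical sizes) so the sketch runs offline
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
model = BERTClass(BertModel(config))

batch_size, seq_len = 4, 16
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
token_type_ids = torch.zeros(batch_size, seq_len, dtype=torch.long)
targets = torch.tensor([0, 1, 2, 1])  # negative=0, neutral=1, positive=2

logits = model(input_ids=input_ids, attention_mask=attention_mask,
               token_type_ids=token_type_ids)
loss = nn.CrossEntropyLoss()(logits, targets)  # integer class indices, shape (batch,)
loss.backward()
```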
I did too much too fast
are you following a tutorial, or something?
no. Instead of doing it normally, staying chill, and just asking someone what's off, I overdo it and go to another source, then I start doubting what I know and make a mess of it all. Yes, it was from a Hugging Face Colab
```py
def __getitem__(self,idx):
    Sentence = str(self.Sentence[idx])
    Sentence = " ".join(Sentence.split())
    target = self.targets[idx]
    encoding = self.tokenizer.encode_plus(
        Sentence,
        max_length=self.max_len,
        padding="max_length",
        return_attention_mask=True,
        return_token_type_ids=True,
        add_special_tokens=True,
        truncation=True,
        return_tensors='pt',
    )
    attention_mask = encoding['attention_mask']
    input_ids = encoding['input_ids']
    token_type_ids = encoding['token_type_ids']
    return {
        "Sentence":Sentence,
        "attention_mask":torch.tensor(attention_mask,dtype=torch.long),
        "input_ids":torch.tensor(input_ids,dtype=torch.long),
        "targets":torch.tensor(target,dtype=torch.long),
        "token_type_ids":torch.tensor(token_type_ids,dtype=torch.long)
    }

from sklearn.model_selection import train_test_split
df_train,df_val = train_test_split(df,test_size=.20,random_state=42)
BATCH_SIZE_TRAIN = 8
VAL_BATCH_SIZE = 4
MAX_LEN = 200
num_epochs = 1

def get_dataloader(df,tokenizer,batch_size,max_len):
    ds = Finance_Dataset(
        Sentence = df['Sentence'].to_numpy(),
        targets = df['Sentiment'].to_numpy(),
        tokenizer=tokenizer,
        max_len=max_len
    )
    return torch.utils.data.DataLoader(
        ds,
        batch_size=batch_size,
        num_workers=0
    )

train_dataloader = get_dataloader(df_train, tokenizer=tokenizer, batch_size=BATCH_SIZE_TRAIN, max_len=MAX_LEN)
val_dataloader = get_dataloader(df_val, tokenizer=tokenizer, batch_size=VAL_BATCH_SIZE, max_len=MAX_LEN)
training_batch = next(iter(train_dataloader))
training_batch.keys()
```
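One shape detail worth checking in the `__getitem__` above: with `return_tensors='pt'`, `encode_plus` returns each field with a leading dimension of size 1, i.e. `(1, max_len)`. After the DataLoader stacks items, that becomes `(batch, 1, max_len)` rather than the `(batch, seq_len)` layout BertModel expects. This sketch uses a zero tensor as a stand-in for the tokenizer output and `default_collate` (what DataLoader uses by default). It shows the extra axis and how `.squeeze(0)` removes it. Wrapping an existing tensor in `torch.tensor(...)` also only copies it (PyTorch warns about this), so `.squeeze(0)` alone is enough.

```python
import torch
from torch.utils.data import default_collate

max_len = 8
# Stand-in for tokenizer.encode_plus(..., return_tensors="pt")["input_ids"]:
# one encoded sentence comes back with a leading batch dimension of 1
item = torch.zeros(1, max_len, dtype=torch.long)

batched_raw = default_collate([{"input_ids": item} for _ in range(4)])
print(batched_raw["input_ids"].shape)    # torch.Size([4, 1, 8])

# Squeezing the leading dim in __getitem__ gives the usual (batch, seq_len) layout
batched_fixed = default_collate([{"input_ids": item.squeeze(0)} for _ in range(4)])
print(batched_fixed["input_ids"].shape)  # torch.Size([4, 8])
```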
I got it to work, thank you
is datacamp a good resource for beginners looking to get into data science? i've been looking into courses, but i'm confused about which one to commit to. datacamp is currently $159 for the year.
I don't know that it's bad necessarily, but I've never heard anyone recommend it.
if nothing else, people tend to value what they pay for, so buying it might trick your brain into sticking to it.
I don't think it's worth paying for
I did a lot of data camp years ago when I was in uni, each prof can apply for a free license for all of their students
It doesn't really teach you anything imo, but at the very least it's good at exposing you to many new concepts
And keeping you "busy" / motivated with some milestones
did bert take you a while to get down?
do all of you just tune bert like it is nothing?
I can't diagram BERT's entire architecture or explain every component, so in that sense, I don't "have it down".
I've done several projects that involve fine-tuning BERT for classification tasks, with varying degrees of success
It's definitely not "a walk in the park" or anything, right?
No
Is it one of the easier ones compared to T5 or BART? I've never fine-tuned those two, I've only used them.
I mean, there's a point at which you know enough about ML that you can adapt ML code without understanding how the model works "all the way down"
Yeah, I was just overwriting it
I've heard of BART but idk what it is
I'm trying to wrap my head around transfer learning. My understanding is that you use the architecture for a similar task and maybe even the layer parameters. Then you just train the output layer parameters. How does that work out?
I understand you could optionally just retrain all the parameters and just use the original task to initialize them. This wouldn't improve the model right? Only in theory reduce training time/resources?
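The freeze-the-reused-layers, retrain-the-head variant described above can be sketched like this in PyTorch. The "pretrained" backbone here is just a randomly initialised stand-in, and the layer sizes and 5-class head are assumptions for illustration. Freezing means the backbone's parameters get no gradients and only the new output layer is updated.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical backbone standing in for layers reused from the original task
backbone = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False          # freeze the reused layers

head = nn.Linear(32, 5)              # new output layer for the new task (5 classes assumed)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
x = torch.randn(8, 20)
y = torch.randint(0, 5, (8,))

loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(all(p.grad is None for p in backbone.parameters()))  # True: frozen layers get no gradient
print(all(p.grad is not None for p in head.parameters()))  # True
```

The "retrain everything" variant skips the freezing loop and passes `model.parameters()` to the optimizer, so the original weights act only as the starting point.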
transfer learning is just the general idea of using a model that was trained for one task, in the context of a different task. I don't think there's actually a throughline in any implementation-level details
Ah okay
Well, it seems like at the very least you have to retrain the output layer
Unless you are looking for exactly what the original model was
Have you ever used it and if so, how?
you say "layer" as though we're sure we're talking about neural networks. but transfer learning is even more general of a concept than that.
Ohhh
Yeah, I'm focused on neural networks.
It's just my current topic of learning, so forget that other machine learning stuff exists
Tbh I wasn't even aware it extended beyond neural nets
in that case, it would be more helpful to focus on fine-tuning, since that's a more specifically-defined concept.
it's literally just "using a model for a task that wasn't the original task"
Ah okay
"using" can mean anything.
I gotta assume regardless of context it needs some fine tuning, right?
Or at least experience some level of poorer performance
no, because there are types of models where "tuning" isn't even a thing.
Oh. What sort of models? Do they lack, like... hyperparameters?
I'm trying to give myself a crash course on all this so I can find better jobs, haha, so it's definitely very new
idk, everything I do involves deep neural networks. which I don't purport to fully understand.
Haha, fair enough!
I learned how to make them in numpy or even do them by hand. Obviously not particularly useful or practical, but...
Was still fascinating all the same
how to make what in numpy?
Oh, a neural network
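For anyone curious what "a neural network in numpy" looks like, here is a minimal sketch: a 2-8-1 network with a tanh hidden layer and sigmoid output, trained on XOR with full-batch gradient descent and hand-written backpropagation. The layer sizes, learning rate, and step count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
losses = []
for step in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # binary cross-entropy
    losses.append(loss)

    # Backward pass: for sigmoid + BCE, dL/dlogit = (p - y) / N
    dlogit = (p - y) / len(X)
    dW2 = h.T @ dlogit
    db2 = dlogit.sum(axis=0)
    dh = (dlogit @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0], losses[-1])
```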
nice
I've been doing a lot of learning on how to evaluate different neural networks or choose between them, but I still have no idea how to like... architect a network for a given problem.
How do you make that decision?
I don't, because I work in language technology, and the only thing to do that's relevant is to wrap or augment a generative language model (ChatGPT, Llama, etc)
Ahh gotcha
I guess no need to reinvent the wheel for this sort of thing, if other people have spent months/weeks determining an ideal architecture
or have pretrained a network
Do you work with like... audio data? Or just like text?
there are people in my department who work on speech data exclusively; I am not one of them.
Also, if you don't mind me asking, what's your job title?
computational linguist
Nice
(I have formal training in linguistics, in addition to CS)
Oh, badass
you should know, the market right now isn't great. and even when it is, ML jobs require a related degree more often than any other area of programming.
Tell me more about the market?
I understand about the degree thing, most are asking for PhDs
most of what I know is from #career-advice. my company's hiring is driven by different factors than the market in general.
Yes, the mirror we hold to our minds. The study of the purest manifestation of human intelligence.
depends on the day.
Does it help with knowing the context of the word and stopping recurrence?
what do you mean by recurrence?
like, repetitive use of the same word?
Knowing the context of a word embedding, and not throwing out words that matter
what were you doing that inspired you to ask this question?
I don't know, the word "pull" may be a stop word in most cases, but if it pertains to a bunch of people deadlifting weight, it matters and has context.
stopwords are usually a finite set of words that are the same in every context
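A small sketch of why stopword removal is context-free. The word list below is hypothetical and much shorter than the lists shipped with NLTK or spaCy, but the mechanism is the same: each token is dropped if, and only if, it appears in a fixed set, whatever the surrounding sentence is about.

```python
# Hypothetical stopword list for illustration
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}

def remove_stopwords(text):
    # Per-token set lookup: no information about context is used at all
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

print(remove_stopwords("The lifter is about to pull the bar"))
# ['lifter', 'about', 'pull', 'bar']
```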
Do you know if it's just ML or if it's CS in general?
Just asking how it helps, I know it does and I know people literally go to university for linguistics to get better at NLP.
seems like it's CS in general.
Are you talking about uni for ML?
you should ask in #career-advice. I just write python code and talk about linguistics.
NLP > CV for you?
does CV stand for computer vision, in this context?
Yes
NLP is the one that I do.
this is a non sequitur.
Why
being an engineer has no bearing on whether one would prefer NLP or CV.
True. I only based that on most people I know who are engineers use computer vision much more than others in DL/ML stuff.
any resources you would recommend? i've been looking into different courses on coursera as well
check the pins
I genuinely think I'm breaking new ground here, guys. Check out that V shape in the error vs. actuals; that's the uncertainty calibration at work. On smaller market moves it nails the predictions (tight at the bottom!), bigger moves show linearly increasing error, and the red line hugging the V means the model knows how uncertain it should be. The complexity shows the adaptive dimensionality is working: see how it's adjusting its computational capacity based on market conditions. I'm still doing intense parameter discovery right now, but I've implemented a dynamic range for the model's dimensions, i.e. 256 base dim, 512 max dim. So far it seems that the parameter controlling the maximum expansion limit, relative to the complexity scores, plays a huge role in it all.
Interestingly enough, because of the adaptive dim size instead of a big fixed one, I can increase the seq length substantially without hitting performance hiccups in training
Is this from 1 script or modular scripts? Are you threading? How long has this been training?
Probably finetuning
Unless you're doing research, or you're in a very specific niche where you can really go in depth rather than breadth, you… don't.
You pick an existing architecture and roll with it
!code
https://paste.pythondiscord.com/T36A
this doesn't train properly. i looked at everything i could and it isn't any different from working ones (i tried another dataset and it still doesn't work)
by doesnt train properly i mean the loss isnt going down
I explained how transfer learning works, and its different variants, a while ago. That might also help you solidify the concept.
I know you're doing this for learning purposes, but that's exactly the difficulty of hand-rolling your own architecture
It could be anything, your learning rate, batch size, amount of neurons, ...
i tried to fix this for 2 days with no result, so idk, i figured i might as well stop wasting time
What I'd suggest you do is to take it step by step
Comment out the data augmentation for now
Make your network larger. If you can't at least get the training loss to 0 with a large net that means you have a bug
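The "get the training loss to 0" sanity check can be sketched as deliberately overfitting one small batch. The data here is random and the sizes and learning rate are arbitrary choices. A network with plenty of capacity should memorise 16 examples almost perfectly, so if your own model can't do this on a batch of your real data, the bug is probably in the model, the loss, or the training loop rather than in the hyperparameters.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(16, 10)            # one small, fixed batch (random stand-in data)
y = torch.randint(0, 2, (16,))

model = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step == 0:
        first_loss = loss.item()

# With no augmentation or other regularisation, the loss should approach 0
print(first_loss, loss.item())
```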
okey
and i think the problem is with the dataset
or me augmenting it
since i copied a conv net i made and tried it with this set, and it's still not working
Yeah, so isolate the parts you don't trust one by one
Augmentation is a solution and not something you do when you don't have to
It's a form of regularization
Basically, when you make your large net that can get 0 training loss (but likely has a decently bad validation loss) you start adding regularization such as augmentation
Also, try 3e-4 for your learning rate for Adam
i don't know what 3e-4 means, sorry
0.00003?
it's 0.0003 (3 × 10^-4), not 0.00003. It's a value often used as a default for Adam, a nice starting point
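For reference, `3e-4` is Python's scientific notation for 3 × 10^-4, and it can be passed straight to the optimizer. The one-layer model here is just a placeholder.

```python
import torch

lr = 3e-4
print(lr == 0.0003)   # True: 3e-4 means 3 * 10**-4

model = torch.nn.Linear(4, 2)   # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
print(optimizer.param_groups[0]["lr"])  # 0.0003
```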
IT WORKED
tysm bro
Happy to help
https://karpathy.github.io/2019/04/25/recipe/ this is a great read
hi, I run a BERT as a classifier on a very very simple task. I have a script that trains it until I get a validation accuracy of 1.0. Then I have a second script with which I want to run some experiments. I load the pretrained BERT classifier model and check the accuracy again, just to be sure. It is 1.0. Now I put the model into training mode, take 1 sample, and do a forward pass: the prediction is wrong.
I'm a bit confused. Sure, during training the training acc is "only" 98% or 99% or whatever, but still. Now I know that setting the model to training mode will enable the dropout layers and whatnot, but I still expect correct predictions; otherwise, how should I be able to train?
Any explanation for this?
My guess would be: I don't store the optimizer state, so I end up with a different model state (if I'm in train mode), but that's too deep for me to really judge.
In fact, my test accuracy in train mode is only 63%
actually, optimizer state doesn't matter in my case since all I wanna do is do a prediction in train mode. I dont wanna continue training. hm.
hmm I think I misunderstand something
so I set dropout to 0, retrained my model, and now it seems to work. Dropout was 0.2. But I still don't fully understand. It makes sense to me that the randomness of dropout can fuck with your results, and that's kinda the point, but what I struggle to understand is: we train with dropout enabled and get, let's say, a training accuracy of 99%. Then we store the model and load it again; shouldn't we get the same accuracy?
hmm actually with dropout at 0.2 the training accuracy converges to 69%. I think I have to go back and study some theory! ^^
assuming pytorch, there's train mode and eval mode
in the latter, dropout layers are disabled so it performs better
how should I be able to train.
you're still training, the model will just never converge to a still point
which is actually a good thing cause that's probably just overfitting on training data
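A minimal sketch of the train/eval difference for a single `nn.Dropout` layer with p=0.2. This is PyTorch's standard ("inverted") dropout. In train mode, each element is zeroed with probability p and the survivors are scaled by 1/(1-p) = 1.25, so every forward pass is different. In eval mode, the layer returns its input unchanged. Calling `.train()` / `.eval()` on a whole model switches all of its dropout layers the same way.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.2)
x = torch.ones(1000)

drop.train()
out_train = drop(x)   # ~20% zeros, survivors scaled by 1 / (1 - 0.2) = 1.25
drop.eval()
out_eval = drop(x)    # dropout does nothing in eval mode

print((out_train == 0).float().mean().item())  # roughly 0.2, varies with the seed
print(torch.equal(out_eval, x))                # True
```

So predictions made in train mode on a freshly loaded model are noisy by design. For evaluating a saved model, call `model.eval()` first (and usually wrap inference in `torch.no_grad()`).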
this is torch's page on dropout which has a link to a paper for ig more theoretical stuff
ok, NER with a halo 2 dataset. Should I just use spacy or transformers?
what types of entities?
characters from h2 and their lines
what is h2?
Halo 2
okay. please try to be specific when you talk about non-programming things on this server. we have 400k people from every part of the world.
can you give a link for the halo 2 dataset, so that I know how it's structured?
No, it is a dataset for NER
and you can't show where you got it?
Kaggle
can you give the link...?
Ok
I cannot find it on kaggle at the moment. I downloaded it a year ago. It probably came from a database.
hey guys
i am trying to do binary classification on images, and I am very confused on how you are supposed to build the model and also what loss function to use
if anyone can help, it would be good
please say more about what the images and what the binary classes are.
alright
well, I am using a vision transformer that has a mlp head of 2 classes
and I am using a pneumonia dataset that is binary classification
the labels in the dataset usually are like this: [0]
and my output from the vit is like this: [0.3, 0.4]
so the images are xrays of lungs, and the classes are "HAS PNEUMONIA" and "DOES NOT HAVE PNEUMONIA"?
yup
okay, that's what I was asking.
what is the "vit"?
It is known as TinyViT
I still don't know what that is.
the first result for searching it up should be its github
all you need to know is that it is a vision transformer with not a lot of parameters
and there is a mlp head that I am modifying for binary classification
if you don't have much experience with image classification, I would start with a convolutional neural network.
I have done a cnn for multi-class problems like MNIST
but for binary classification, I am confused in general
it's more so about the MLP
wait i found something on kaggle
ViT (Vision Transformer) is a kind of transformer architecture designed for computer vision task. So I'm guessing TinyViT is most likely a small scale implementation of this vision transformer architecture.
if you're using pytorch, turn on eval mode for the model after loading it, and the dropout will be disabled
binary cross entropy
Preferably your last layer is just a linear layer without a sigmoid, then you use https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
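The two usual ways to set up the loss for a binary problem like this can be sketched side by side. Random 16-dimensional features stand in for the ViT's pooled output, an assumption for illustration, and labels are shaped like the `[0]` in the dataset. Option A keeps a 2-logit head and uses `CrossEntropyLoss` with 1-D integer labels. Option B uses a single raw logit (no sigmoid) with `BCEWithLogitsLoss` and float labels shaped like the logits.

```python
import torch
from torch import nn

torch.manual_seed(0)
features = torch.randn(4, 16)                 # stand-in for the backbone's pooled features
labels = torch.tensor([[0], [1], [1], [0]])   # labels shaped like [0], as in the dataset

# Option A: 2 logits + CrossEntropyLoss, labels as a (batch,) tensor of class indices
head_2 = nn.Linear(16, 2)
loss_a = nn.CrossEntropyLoss()(head_2(features), labels.squeeze(1).long())

# Option B: 1 raw logit + BCEWithLogitsLoss, float labels with the same (batch, 1) shape
head_1 = nn.Linear(16, 1)
logit = head_1(features)
loss_b = nn.BCEWithLogitsLoss()(logit, labels.float())

# At inference, option B's probability of the positive class is sigmoid(logit)
prob_positive = torch.sigmoid(logit)
print(loss_a.item(), loss_b.item(), prob_positive.squeeze(1).tolist())
```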
But it seems to me yours already has that baked in
yeah
It's modular; i split it into a few scripts. It has the database, risk management, and all the model stuff. No threading though.
two years ago on this day, I wrote my first line of python code. I don't know, just thought I would say that.
Which is better in face recognition? Deepface or face_recognition library for python?
do you guys learn regression models
Wdym? Regression is a pretty important/fundamental concept that you can't help but learn at some early point in learning data science
im stuck at learning how to use pandas, my holiday is almost over
How are you stuck? What concepts?
not really stuck, but i just barely got through slicing and boolean indexing
i cant believe this short text is really confusing to learn lol
guess which book it is
Do you get how the boolean indexing works? It's a pretty important idea
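A tiny sketch of boolean indexing on a made-up DataFrame. A comparison produces a boolean Series (the mask), indexing with the mask keeps only the rows where it's True, and masks are combined with `&` / `|`, with each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [55, 82, 91, 47]})

mask = df["score"] > 60
print(mask.tolist())            # [False, True, True, False]
print(df[mask]["name"].tolist())  # ['b', 'c']

# Combine conditions with & (and) / | (or); the parentheses are required
both = df[(df["score"] > 50) & (df["name"] != "c")]
print(both["name"].tolist())    # ['a', 'b']

# .loc takes the same mask plus a column selection
print(df.loc[mask, "score"].tolist())  # [82, 91]
```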
That's a pretty crazy example tho. Wth
Reminds me of how unis teach nested loops right after introducing loops, before students 'get it'
hey guys, I'm looking to learn more about the OpenAI API and how to use it to create AI agents that use RAG and function calling. I don't have much experience with these concepts yet, so I'd appreciate any guidance. Does anyone have suggestions on where to start or resources I could use to learn?
OpenAI has some good tutorials with code examples
Are these tutorials on yt?
Their developer docs are really good, plus they have examples on GitHub