#data-science-and-ml
@weary timber try it and see.
one last thing, i do feedforward with all the training data is it a problem?
like i do data = load()
and do feedforward(data)
do you know about training data and testing data?
yes
Hey guys, I'm currently working on a text-to-handwriting letter model with EMNIST, but the outputs from this Jupyter notebook are really bad and I don't know why. Any tips to improve it?
Like is something wrong with my training or
or potentially how my image is being saved
Hello! When you have freetime can you check my class project? I just want to get feedback from people who are in the field
no ultralytics_runner found, how to fix
why you guys use GPT to code
just go step by step
and you will understand more things
rather than directly put the code and ask here
Hello so i was working on a computer vision project for first time ever (project is to extract data from Invoices). I saw some YT videos that they extract the text and use Regular expressions. Is this the optimal way? or is there another approach ? Thanks in advance
20,000 samples something
I also tried this logic, which is faster than the last. but the validation accuracy still lacks.
def build_image2text_model():
    image_input = Input(shape=(5, 28, 28, 1))  # 5 images per sequence
    # feature extraction with regularization
    conv_features = TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same'))(image_input)
    conv_features = TimeDistributed(MaxPooling2D((2, 2)))(conv_features)
    conv_features = TimeDistributed(BatchNormalization())(conv_features)
    conv_features = TimeDistributed(Dropout(0.3))(conv_features)
    conv_features = TimeDistributed(Flatten())(conv_features)
    encoder_output = LSTM(128, return_sequences=True, kernel_regularizer=l2(1e-4))(conv_features)
    encoder_output = Dropout(0.4)(encoder_output)
    decoder_input = RepeatVector(3)(encoder_output[:, -1])
    decoder_output = LSTM(128, return_sequences=True, kernel_regularizer=l2(1e-4))(decoder_input)
    # attention mechanism
    encoder_aligned = Dense(128)(encoder_output)
    attention_output = Attention()([decoder_output, encoder_aligned])
    decoder_combined = Concatenate()([decoder_output, attention_output])
    # final layer
    output = TimeDistributed(Dense(13, activation='softmax'))(decoder_combined)
    # compile the model
    model = Model(inputs=image_input, outputs=output)
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# training model
image2text_model = build_image2text_model()
image2text_model.fit(
    X_img, y_text_onehot,
    epochs=40, batch_size=128,
    validation_split=0.1
)
kinda short on time for my school project sorry
I think it was because the emnist letters has 27 classes and my training was trying to use all the possible classes instead of just 27
still working on it
These days I really like PyTorch
Its code feels really Pythonic
I really love that
!code
import numpy as np
import pandas as pd

data = pd.read_csv(r"C:\Users\mehme\OneDrive\Desktop\neural network\train.csv")
data = np.array(data)
m, n = data.shape
np.random.shuffle(data)

data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.

data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.
_,m_train = X_train.shape

def sigmoid(n,deriv=False):
    x = 1 / (1 + np.exp(-n))
    if deriv:
        return x * (x-1)
    return x

def init():
    global params,cache
    params = {"W1":np.random.randn(10,784),"B1":np.random.randn(10,1),"W2":np.random.randn(10,10),"B2":np.random.randn(10,1)}
    cache = {}

def feedforward(input):
    cache["A0"] = input
    cache["Z1"] = params["W1"].dot(cache["A0"]) + params["B1"]
    cache["A1"] = sigmoid(cache["Z1"])
    cache["Z2"] = params["W2"].dot(cache["A1"]) + params["B2"]
    cache["A2"] = sigmoid(cache["Z2"])

def backprop(input,desired):
    m = input.size
    feedforward(input)
    dZ2 = cache["A2"] - desired
    dW2 = 1 / m * dZ2.dot(cache["A1"].T)
    db2 = 1 / m * np.sum(dZ2 , 2)
    dZ1 = params["W2"].T.dot(dZ2) * sigmoid(cache["Z1"],True)
    dW1 = 1 / m * dZ1.dot(cache["A0"].T)
    db1 = 1 / m * np.sum(dZ1)
    return dW1,db1,dW2,db2

def update_params(dW1,db1,dW2,db2,learning_rate):
    params["W1"] -= learning_rate * dW1
    params["B1"] -= learning_rate * db1
    params["W2"] -= learning_rate * dW2
    params["B2"] -= learning_rate * db1

def gradient_descent(x_train,y_train,learning_rate,iterations):
    init()
    for i in range(iterations):
        dW1,db1,dW2,db2 = backprop(x_train,y_train)
        update_params(dW1,db1,dW2,db2,learning_rate)
        if i % 50 == 0:
            print(i)

gradient_descent(X_train,Y_train,0.1,10)
feedforward(X_dev[0][0])
can someone with time check this code and tell me how do i fix the error
at backprop while calculating the db2 it gives an error saying the dz2 is 2 dimensional so it cant be summed in axis = 2
how does that make sense i cant get it
and another problem, when it works(when i set the sum to be done in axis 1) the output neurons returns 10x784 array
Hey, i have a Pandas series
```
session_duration
0 26.625450
17 28.681083
20 35.153633
25 27.660017
28 32.193067
```
and a df which has a column session. I now want to basically get only the rows of the df that have these sessions, how would i filter?
multiple lines have the same session
so you only want to get rows of df where the index is one of those five (0, 17, 20, 25, 28)?
also, what is the name of that series variable?
it's basically spotify sessions that i defined as continuous listening sessions
this looks sigma
i tried this:
if you're just here to troll, kindly refrain.
session_duration_avg_adapted_ids= session_duration_avg_adapted.index
session_duration_avg_adapted
df_session= [df['session']== session_duration_avg_adapted_ids]
can you answer my questions before you continue?
im sorry man
i didnt mean to 🥺
yes, name is session_duration
try df_session.loc[session_duration.index]
perhaps I misunderstood you. can you explain what it means for something to be sigma?
a sigma is a cool person
that means ur code is really good
i think
idk how to read code
is this the javascript server
ah. I thought a "sigma male" was someone undesirable.
my friend is saying javascript is better than python
get better friends.
is javascript bad
No. I'm just being silly. He can have that opinion.
Do exactly the code that I gave you and let me know what happens.
actually, do df_session.loc[session_duration_avg_adapted.index]
again, do this exactly.
KeyError: "None of [Int64Index([ 0, 17, 20, 25, 28, 48, 49, 58, 77, 125,\n ...\n 2982, 2987, 2991, 2995, 2998, 3012, 3014, 3023, 3034, 3055],\n dtype='int64', name='ts', length=236)] are in the [index]
lemme check dtypes
my friend say do you prefer tf or pt
might be string even tho that would be weird
i prefer subway
please run the code that I gave exactly, and if you get an error, show the whole error message
he said by pt is pie torch
I don't like pytorch per se, but development of tensorflow is slowing down, so I don't consider it a viable option.
i like subway do you like subway?
sorry, i think i have found something important rather than the message
okay; if you need help from me again, ping me with the result of the exact code and the whole error message.
you do deserve to be here.
@subtle talon you deserve to be here. please do not squander what you deserve by spamming.
@serene scaffold i dont think you can work with that but since u insist:
to recklessly waste
oh ok
KeyError Traceback (most recent call last)
Input In [120], in <cell line: 4>()
1 session_duration_avg_adapted= session_duration[(session_duration>= 26) & (session_duration<=36) ]
2 session_duration_avg_adapted
----> 4 df_session= df.loc[session_duration_avg_adapted.index]
5 df_session
File c:\Users\Grr\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.__getitem__(self, key)
964 axis = self.axis or 0
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)
File c:\Users\Grrr\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1191, in _LocIndexer._getitem_axis(self, key, axis)
1188 if hasattr(key, "ndim") and key.ndim > 1:
1189 raise ValueError("Cannot index with multidimensional key")
-> 1191 return self._getitem_iterable(key, axis=axis)
1193 # nested tuple slicing
1194 if is_nested_tuple(key, labels):
File c:\Users\Grrr\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1132, in _LocIndexer._getitem_iterable(self, key, axis)
1129 self._validate_key(key, axis)
1131 # A collection of keys
-> 1132 keyarr, indexer = self._get_listlike_indexer(key, axis)
1133 return self.obj._reindex_with_indexers(
...
-> 5842 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
5844 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
5845 raise KeyError(f"{not_found} not in index")
KeyError: "None of [Int64Index([ 0, 17, 20, 25, 28, 48, 49, 58, 77, 125,\n ...\n 2982, 2987, 2991, 2995, 2998, 3012, 3014, 3023, 3034, 3055],\n dtype='int64', name='ts', length=236)] are in the [index]"
so again in normal df we have multiple copies of a session value
@wheat snow try this
indexer = (session_duration >= 26) & (session_duration <= 36)
df_session = session_duration.loc[indexer]
where did session_duration come from?
What's the "mother dataframe"?
You can do .loc[indexer] on that.
the thing is, session_duration is not a column of the df
it is just a value i calculated by grouping milliseconds_played and the session together
```
# Assuming df is your DataFrame with 'ts' as the index
# Convert 'ts' to datetime if it's not already
df.index = pd.to_datetime(df.index)
# Sort the DataFrame by timestamp
df = df.sort_index()
# Calculate the time difference between consecutive tracks
df['time_diff'] = df.index.to_series().diff().dt.total_seconds()
# Define a session break threshold (e.g., 30 minutes)
session_break_threshold = 30 * 60  # in seconds
# Identify sessions
df['session'] = (df['time_diff'] > session_break_threshold).cumsum()
# Calculate session duration
session_duration = df.groupby('session')['ms_played'].sum() / 1000 / 60  # Convert to minutes
session_duration = session_duration[session_duration >= 1]
```
maybe that helps
that seemed to work so far @serene scaffold
indexer = (session_duration >= 26) & (session_duration <= 36)
df_session = df[df['session'].isin(indexer.index)]
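A minimal sketch of that filtering step on made-up data (only the column names follow the chat). One thing to check: `indexer` is a boolean Series indexed by every session label, so `indexer.index` contains all sessions, and `isin(indexer.index)` keeps every row. Selecting the labels where the mask is True is probably what was intended:

```python
import pandas as pd

# Toy data: hypothetical values, only the column names follow the chat.
df = pd.DataFrame({
    "session": [0, 0, 1, 1, 2],
    "ms_played": [900_000, 900_000, 60_000, 60_000, 1_800_000],
})

# Minutes per session, as in the snippet above.
session_duration = df.groupby("session")["ms_played"].sum() / 1000 / 60

# Boolean mask indexed by session label (True for sessions of 26 to 36 minutes).
indexer = (session_duration >= 26) & (session_duration <= 36)

# Keep only the labels where the mask is True, then filter the rows.
wanted_sessions = indexer[indexer].index
df_session = df[df["session"].isin(wanted_sessions)]
```

Here sessions 0 and 2 last 30 minutes each and session 1 lasts 2 minutes, so only the rows of sessions 0 and 2 survive.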
I submitted my paperrrrr
How do I show probability distributions like this using matplotlib? X-axis shows for which depth this distribution is, y-axis shows dates
This plot was generated by R library
help me with this please
Try printing the arrays and their shapes between steps, that way you can see where things get messy
It's kinda confusing so at least kinda. I'm pretty sure there are some metrics that you can use to calculate how good your predictions are based on it, though, and estimate that more "objectively".
the problem occurs when i set the sum axis to 2 when summing dZ2. if you know how i can fix that that fixes the output being 100arrays too
turns out the output arrays is (10,inputSize)
like the values of the output gets turned into an array with inputsize amount of the value
db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)
@weary timber
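A small sketch of why `axis=1, keepdims=True` fits here, assuming the layout from the posted code (one row per output neuron, one column per training example; the sizes below are made up). A 2-D array only has axes 0 and 1, which is why `axis=2` fails, and feeding a whole batch through the network gives one output column per example, hence the `(10, inputSize)` result:

```python
import numpy as np

# dZ2 has one row per output neuron and one column per training example.
rng = np.random.default_rng(0)
dZ2 = rng.normal(size=(10, 5))   # 10 outputs, 5 examples (toy sizes)
m = dZ2.shape[1]                 # number of examples, not the element count

# A 2-D array only has axes 0 and 1, so axis=2 raises an error.
try:
    np.sum(dZ2, 2)
    axis_error = None
except Exception as exc:
    axis_error = exc

# Summing over axis=1 collapses the examples; keepdims keeps shape (10, 1)
# so the result lines up with the (10, 1) bias vector.
db2 = np.sum(dZ2, axis=1, keepdims=True) / m
```

Without `keepdims=True` the sum has shape `(10,)`, and subtracting it from a `(10, 1)` bias would broadcast rather than update element by element.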
how can I manage an existing conda env using pixi - should i export conda yaml and convert that to pixi.toml
i have done it but the calcs are wrong, can you help me out? https://paste.pythondiscord.com/7EYA
yes, lets try and figure it out together
sounds good
hi, have a look at Matt Harrison's Essential Statistics course (using Seaborn, sorry!) https://github.com/LinkedInLearning/python-statistics-essential-training-4433355/blob/main/PyStats-Solutions.ipynb
Hi I want to build a machine learning model that can detect toxic messages throughout different languages
Do you guys have any ideas?
Kaggle has some datasets for that, at least for some of the most common languages like English and Portuguese
I need arabic
Just read this:
And it seems like classic Multilingual models like BERT are less effective than language specific. Is it a viable solution to first detect the language then use a model designed for that language specifically?
yes
in some cases a single model trained in multiple languages could perform better than models trained only in a single language - especially if the languages are similar to each other (e.g. Latin derived European languages) and you don't have a lot of training data for each
for this, just training specifically in one language should be fine though
just pay attention to class (in)balance and determine if you'd rather prioritize accuracy or recall
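To illustrate the class balance point, here is a toy sketch (made-up labels, plain Python) of how accuracy can look fine on imbalanced data while recall is zero. Precision and recall are the usual pair to trade off in this situation:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Imbalanced toy labels: 1 = toxic, 0 = not toxic (made-up data).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a model that never flags anything

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
```

The never-flagging model scores 80% accuracy here while catching none of the toxic messages.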
def backprop(DESIRED):
    global cache,params
    m = DESIRED.size
    DESIRED = np.eye(10)[DESIRED].T
    dZ2 = cache["A2"] - DESIRED
    dW2 = 1 / m * dZ2.dot(cache["A1"].T)
    dB2 = 1 / m * np.sum(dZ2,axis=1,keepdims=True)
    dZ1 = params["W2"].T.dot(dZ2) * sigmoid(cache["Z1"],True)
    dW1 = 1 / m * dZ1.dot(cache["A0"].T)
    dB1 = 1 / m * np.sum(dZ1,axis=1,keepdims=True)
    return dW1,dB1,dW2,dB2
isnt this a true implementation?
note that the inputs im making are transposed
@wooden sail
You do not have standing permission to ping people who have helped you in the past
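One way to sanity-check a backprop like the one posted above is to test its pieces in isolation. The sketch below assumes the same conventions (integer labels, examples as columns). It checks the one-hot encoding shape and compares the sigmoid derivative against a finite difference. Note that the derivative of the sigmoid is `s * (1 - s)`. The earlier `sigmoid(n, deriv=True)` returned `x * (x-1)`, which has the opposite sign:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)        # note the order: s * (1 - s), not s * (s - 1)

# One-hot encode integer labels into shape (classes, examples).
labels = np.array([3, 0, 9])
one_hot = np.eye(10)[labels].T

# Finite-difference check of the derivative at a few points.
z = np.array([-2.0, 0.0, 1.5])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
```

If `numeric` and `sigmoid_deriv(z)` agree to several decimal places, the derivative is fine and the remaining suspects are the shapes.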
What are the steps to process data? Like, when processing data, what are the things to consider or do? For example: handling missing values, duplicates, what else?
it varies a lot depending on which data you're working with
beginner data science material sometimes makes it sound like preprocessing is a well-defined thing with exact steps that you always follow. but it isn't really like that.
sorry!
@livid locust locked now, you still want help?
yeah i'll come back here after my classes i didn't realize the time frame
we should just dm or make a group ?
https://github.com/lucifertrj/Awesome-RAG/
found some useful GitHub repo with multiple notebooks on RAG mainly using Open Source LLMs
as long as it doesn't go too much into self-advertising
post it and we'll say if it's too much I guess
is that satire?
regardless... I would recommend putting things in GitHub instead of just pasting in the pastebin
and (if you have any) linking your citations or other useful resources
anyone know how to solve this??? GOING INSANE and cant figure it out
Maybe open a help thread, and paste the problem as text? Hard to read this. #how-to-get-help
Hello, a question please: which is the better approach when training a model? If a run brings no improvement, should I reload the weights from the last point of improvement, or keep training from the weights with the lower accuracy?
Maybe it's just a dip, and the model still has some parameters that will help it improve on later runs / in the long run
For image segmentation if that matters
I mean, for example, I train the model for x runs (each with y epochs) and save the weights after each run. If in one run the validation accuracy is worse than in the previous run, should I keep training with those weights on the next run, or reload the last weights that had a better accuracy score?
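One common compromise between the two options is to keep training through a dip while remembering the best weights seen so far, and to stop only after several runs without improvement. A minimal sketch, where `train_one_run` and `evaluate` are hypothetical stand-ins for the real training and validation steps:

```python
import copy

def train_with_patience(weights, train_one_run, evaluate, max_runs, patience):
    """Keep training through dips; remember the best weights seen so far."""
    best_score = evaluate(weights)
    best_weights = copy.deepcopy(weights)
    runs_without_gain = 0
    for _ in range(max_runs):
        weights = train_one_run(weights)      # continue from current weights
        score = evaluate(weights)
        if score > best_score:
            best_score, best_weights = score, copy.deepcopy(weights)
            runs_without_gain = 0
        else:
            runs_without_gain += 1
            if runs_without_gain >= patience:
                break                         # the dip lasted too long; stop
    return best_weights, best_score

# Toy stand-ins: the "weights" are one number and the score peaks at 3.
scores = {0: 0.50, 1: 0.60, 2: 0.58, 3: 0.70, 4: 0.65, 5: 0.64, 6: 0.63}
best_w, best_s = train_with_patience(
    weights=0,
    train_one_run=lambda w: w + 1,
    evaluate=lambda w: scores[w],
    max_runs=6,
    patience=2,
)
```

In the toy run the dip at run 2 is tolerated, the peak at run 3 is kept, and training stops after two further runs without a gain.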
Hi,
could anyone please check my understanding on this:
Athena vs Redshift
Athena can't run SQL queries on the data that is present inside the warehouse(For ex: Redshift).
Athena works only on the data present inside S3 bucket.
Athena vs Glue
Catalog can be created by athena service / glue service.
Recommended way to generate catalog is by glue. The reason is simple. Athena can prepare catalog only based on the data from s3.
How about preparing the catalog from many sources like S3, SQL DB, NoSQL DB etc.? Only glue can do that.
If you want to correct/add new thing, pls.
Note: I am not literally comparing the above services. They are there for different purposes. My intention is to compare certain aspects which are common between services.
Also, if glue can do more than what athena can do, whats the purpose of Athena?
what should I do next
i have done a course on coursera
from andrew ng, the machine learning specialization
There should be a channel for Quantitative Finance
Deep learning by the same guy.
What GPUs are the best for machine learning? (pytorch)
TPUs and NPUs
If you really wanted the best of the best, then of course you're not going to be using a normal GPU
you'd be using an A100 or something
last time I checked both of those are inferior to just buying something from nvidia tho?
The A100 is an Nvidia TPU
They're only used for AI
esp. training
my memory is fuzzy, but isn't NPU in like a CPU, and TPU you rent from google, and there was another thing that's very tiny which may run image recognition nets
More processing power than a 4090
An NPU is a Neural Processing Unit, a TPU is a Tensor GPU
The former is for neural networks
the latter is used for things such as ChatGPT
last time I checked nvidia called them Tensor Core GPUs
which I don't think is the same as TPUs, those are google products
former for neural network, latter for ChatGPT
that doesn't even make much sense as the architecture behind ChatGPT is a neural network
nvm
I was talking about scale
ChatGPT is a combination of 10,000 or so Nvidia A100 Tensor GPUs
*V100 not A100
different things, same purpose
anyway; still, your best hardware in terms of AI rn is still just everything nvidia (though the price isn't so attractive)
Yeah, the price is never attractive when you're talking about large scale stuff
And you did in fact ask for the best, not the most cost friendly
lol I just found out the A100 is around 4-5 times as powerful as the V100
I mean it's not even about large scale but software
pytorch and others were clearly built with CUDA in mind
Everything that's using AI has an Nvidia chip at this point, they're just the best option
if you're on windows and have an amd card, well too bad cause pytorch ROCm ain't supported
intel is also here I guess, even worse than amd though
Anyone?
I didn't know that those A5000 and so on were called differently, very interesting…
Yes
I'm investigating all of that, might come here later, thank you
The A5000 is a normal GPU in comparison to the A100
The A just means it's of the Ampere architecture
V means the Volta architecture
Very interesting indeed.
Do the numbers follow any logic?
I heard that CUDA Cores are the most important thing to look into…
what do you actually plan to do
like
What GPUs are the best for machine learning? (pytorch)
what are you gonna do with pytorch? try implementing image recognition nets? run large language models? finetune large language models?
The number of cores, the speed of the cores, and the latency/speed of data transfer to TPU/GPU. Those are the main things to look for.
Depending on the task at hand, the priority of these change.
It's Deepfake technology in real time…
Curious on if using two graphic cards will make that much of a difference
Image related.
I mean, from the looks of their github, a 3060 alr runs at 20 fps?
so just consumer grade is probably enough
I have a 3060 and it peaks at 20 FPS, usually it goes at around 7 frames (per second)
Willing to make an upgrade…
Although I can probably wait for the new generation
Hey, I had an idea for a model that could create Virtual Machines/Environments to run software not typically compatible on the system. I.e. Excel on Linux, but without the need to use Wine, and I also wanted to use it for games.
how is this related to DS and AI?
2 things, why this channel, and is this not already solved by Steam's Proton
ok, 3 things actually, WINE is normally better than a VM if you can use it because it distinctly isn't a VM
Oh, sorry, I forgot to mention I was going to use AI to make the environment.
As in, it scans the software and sees what the requirements are.
I was going to use AI to make the environment.
like an LLM or what?
Yeah? Something like that. I apologize I should have phrased it better.
Well, maybe not a large language model.
that isn't very realistic I think
what exactly do you have in mind when you say "scans the software"?
You are describing Proton, but with AI instead of just procedural detection
Can you explain how nested cross validation is performed or point me to some resource that does so?
The way I understand it so far is:
- split the data into k folds
- for i from 1 to k:
  - test set = `fold[i]`, outer training set = remaining (k - 1) folds; let's call the outer training set folds `out_folds`, for example
  - for each model that we have:
    - for j from 1 to (k - 1):
      - validation set = `out_folds[j]`, inner training set = remaining (k - 2) folds; let's call the inner training set folds `in_folds`, for example
      - train the model on `in_folds`, evaluate on the validation set
    - report the mean performance of the model on all folds
  - Now we should have performance measures for all models on the first round of cross validation on the outer fold, so do we pick the best model, train it on the first set of `out_folds` and evaluate it on the test set (`fold[i]`)? This way we might report different models as the best one from one outer fold to another, right?
    Or would we do the above steps on all combinations of inner folds, then report only one final model which performs best on all of them, and then apply cross-validation on this model with the outer folds?
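The steps above can be sketched in plain Python on index lists. This follows the first reading: select on the inner scores, refit on the whole outer training set, and score once on the held-out outer fold. In that reading different outer folds may indeed pick different models, and the outer scores estimate the whole selection procedure rather than one fixed model. `fit_and_score` is a hypothetical callable, and the dummy scorer is made up:

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k roughly equal folds."""
    return [indices[i::k] for i in range(k)]

def nested_cv(n_samples, k_outer, k_inner, candidates, fit_and_score):
    """Return (chosen model, test score) for each outer fold."""
    outer_folds = k_fold_indices(list(range(n_samples)), k_outer)
    results = []
    for i, test_idx in enumerate(outer_folds):
        outer_train = [x for j, f in enumerate(outer_folds) if j != i for x in f]
        inner_folds = k_fold_indices(outer_train, k_inner)

        # Inner loop: mean validation score of each candidate.
        mean_scores = {}
        for name in candidates:
            scores = []
            for j, val_idx in enumerate(inner_folds):
                inner_train = [x for m, f in enumerate(inner_folds) if m != j for x in f]
                scores.append(fit_and_score(name, inner_train, val_idx))
            mean_scores[name] = sum(scores) / len(scores)

        # Select on inner scores only, refit on the whole outer training set,
        # then score once on the untouched test fold.
        best = max(mean_scores, key=mean_scores.get)
        results.append((best, fit_and_score(best, outer_train, test_idx)))
    return results

# Dummy scorer (hypothetical): pretends "b" is always the better model.
def fake_fit_and_score(name, train_idx, eval_idx):
    return {"a": 0.6, "b": 0.8}[name]

results = nested_cv(20, k_outer=5, k_inner=4, candidates=["a", "b"],
                    fit_and_score=fake_fit_and_score)
```

The mean of the outer scores is then the reported performance estimate.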
is learning reinforcement learning right after cnn a good idea?
RL is a bit different; basic knowledge of NNs and some beginner-level algorithms would also be good
so wdym
i dont learn?
learn then
do you know basics of ML?
of course you do!
then continue to learn RL
are you being sarcastic with this?
no I was just rechecking
how do i get started on building a multiagent ai system?
What is multiagent. I am hearing about ai agent. What is all of that.
It's just AIs that are not fully chatbots; it's a network of AIs handling things like email automation, coding, data analysis or research. You put all these agents together and they discuss how to make their task successful. im also kinda new to it
I am on 3.11.9 of Python. CUDA 12.6.3.
Need to train a model asap, have a hackathon in 5 hours. Not sure how to install PyTorch.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu12
I tried this but it gave an error.
I am having massive trouble with reading and merging pandas dataframes. I am getting seemingly contradictory output from code.
I am on MacOS Sonoma 14.6.1, with Apple M2 chip. I have python 3.11.5. I have anaconda python installed.
Problem #1: This assert fails, which it shouldn't on an outer join. I checked, the dtype of truths.id, edges.id, and merged1.id are all int64
```python
import csv
import pandas as pd

# increase size limit because file is too big
csv.field_size_limit(10**9)
truths = pd.read_csv("truths.tsv",
                     sep="\t",
                     quotechar='"',
                     engine="python",
                     on_bad_lines='skip')
edges = pd.read_csv("truth_hashtag_edges.tsv",
                    sep="\t",
                    quotechar='"',
                    engine="python",
                    on_bad_lines='skip')
merged1 = truths.merge(edges, on = "id", how = "outer", indicator = True)
for item in merged1.id:
    assert (item in truths.id or item in edges.id)
```
Problem #2: When I run this after the above code, directly contradicting the fact that the assert failed
`(merged1.id.isin(truths.id) | merged1.id.isin(edges.id)).all()`
Problem #3: This sequence of asserts passes:
```python
for item in edges.id:
    assert item in truths.id
```
But edges.id.isin(truths.id) gives a pd.Series with some values True and some values False. If that assert passed, it should be all True.
Also I can't easily manually inspect this stuff because I'm dealing with dataframes with hundreds of thousands of rows.
Also the dataset is the Truth Social dataset found at https://zenodo.org/records/7531625
Please do print(merged1.head().to_dict()) and put the result in the paste bin. Please do not post any screenshots.
!paste
@tranquil zenith nevermind; I downloaded the dataset and ran your code.
The problem is that x in series tells you if x is an index for an element in that series. not if it's an element in the series.
In [6]: truths.id
Out[6]:
0 703265
1 807614
2 807615
3 807618
4 807619
...
739774 1060918
739775 1060919
739776 1060920
739777 1060928
739778 1060937
Name: id, Length: 739779, dtype: int64
In [7]: 1060937 in truths.id
Out[7]: False
In [11]: pd.Series.__contains__?
Signature: pd.Series.__contains__(self, key) -> 'bool_t'
Docstring: True if the key is in the info axis
File: c:\users\17032\appdata\local\programs\python\python312\lib\site-packages\pandas\core\generic.py
Type: function
In [12]: pd.Series.__iter__?
Signature: pd.Series.__iter__(self) -> 'Iterator'
Docstring:
Return an iterator of the values.
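A small sketch of the difference on a toy Series (the three ids are taken from the output above; the rest is illustrative). `in` looks at the index labels, while `isin`, `.values`, or a set look at the values:

```python
import pandas as pd

ids = pd.Series([703265, 807614, 1060937], name="id")

# `in` on a Series checks the index labels (0, 1, 2 here), not the values.
label_hit = 1 in ids               # True: 1 is an index label
value_miss = 1060937 in ids        # False: it is a value, not a label

# Ways to test membership among the values instead.
in_values = 1060937 in ids.values
in_set = 1060937 in set(ids)
mask = ids.isin([807614, 1060937])
```

This also explains the passing assert in Problem #3: the ids in `edges` happen to be valid row labels of `truths`, so `item in truths.id` is True for a reason unrelated to the values.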
what's the difference between contrastive loss and triplet loss? and how do i use them in pytorch?
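A rough NumPy sketch of the two losses in one common formulation (the margin value and the toy vectors are made up). Contrastive loss works on pairs with a same/different label, while triplet loss works on an anchor, a positive, and a negative at once. In PyTorch, `torch.nn.TripletMarginLoss` covers the triplet case; the pairwise contrastive form is usually written by hand along these lines:

```python
import numpy as np

def contrastive_loss(x1, x2, same, margin=1.0):
    """Pairwise loss: pull matching pairs together, push others past margin."""
    d = np.linalg.norm(x1 - x2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: anchor should be closer to positive than to negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.0, 1.0])   # same identity as the anchor, distance 1
n = np.array([3.0, 0.0])   # different identity, distance 3

pair_same = contrastive_loss(a, p, same=True)     # distance squared
pair_diff = contrastive_loss(a, n, same=False)    # zero once past the margin
trip = triplet_loss(a, p, n)                      # max(0, 1 - 3 + 1)
```

The triplet form only cares about the relative ordering of the two distances, whereas the contrastive form pushes each pair toward an absolute target.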
It is likely due to the initialization of the cluster centers. You could try to come up with some sort of "error" metric, like average distance to cluster mean, and run the algorithm multiple times with different initializations, and pick the one with the lowest error.
Is anyone at NeurIPS?
It'd have been easier to proffer a solution if you had mentioned the specific error message you got.
Meanwhile, I've been using light-the-torch since I discovered it.
pip install light-the-torch
@mild dirge is right, in practice K-means++ is used a lot. It's same algo with a smarter initialisation
It's pretty straight forward to implement https://en.wikipedia.org/wiki/K-means%2B%2B
In data mining, k-means++ is an algorithm for choosing the initial values (or "seeds") for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii, as an approximation algorithm for the NP-hard k-means problem: a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. It ...
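A minimal sketch of the k-means++ seeding step in NumPy, on made-up data. The first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function name and toy blobs are illustrative only:

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Pick k initial centers from the rows of X using k-means++ seeding."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]           # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        # Sample the next center with probability proportional to d2.
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# Two well-separated toy blobs (made-up data).
X = np.vstack([rng.normal(0, 0.1, size=(50, 2)),
               rng.normal(5, 0.1, size=(50, 2))])
centers = kmeans_pp_init(X, k=2, rng=rng)
```

The returned centers then replace the random initialization before the usual assign-and-update loop.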
i want to run a llama model locally, currently i have installed it with ollama and it runs on its local server port. I want to be able to load the model separately and have more freedom over it, probably like a .keras or .h5 file. Can anyone guide me to what I'm looking for?
please link the model card for the exact llama model you're talking about (there are several), and say how much GPU RAM you have.
I'm starting with a smaller model, llama3.1:8b
I'm not using a GPU for this, it seems to work fine on my laptop
what are you trying to do with the model?
For now nothing much, a web app for this model. Later on I want to try fine tuning it and much more
probably use whatever is in these directories? https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored
alternatively download from another site like hugging face instead of using their cli
let me take a look at that
Ordered a book on pytorch so the document is in front of me
May I ask a question for those who have made specific type of neural network? Sorry
!mute 868137138091343925 "1 day" You previously agreed to "not apologize for everything and not asking any questions to ask a question", and you have done that. This style of engagement is irritating for other users.
:incoming_envelope: :ok_hand: applied timeout to @unkempt wigeon until <t:1734059787:f> (1 day).
here we go again!
just go with hugging face
it is very easy
do you have a documentation or tutorial to use hugging face for this
can someone help me with this?
it's mostly for inference, for finetuning you'll have to find something else
tho finetuning without a gpu sounds like asking for hell
im trying to do facial keypoint detection but am running into a problem where my model plateaus pretty quickly and seemingly doesnt want to move. ive tried varying LR, dropout, BN, augmentations, model size etc but cant get it to work. Any ideas?
this is my current model:
class KeypointModel(nn.Module):
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.features = nn.Sequential(
            # First conv layer
            nn.Conv2d(1, 64, kernel_size=5, stride=2, padding=2),  # 96x96 -> 48x48
            nn.ReLU(),
            nn.BatchNorm2d(64),
            # Second conv layer
            nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2),  # 48x48 -> 24x24
            nn.ReLU(),
            nn.BatchNorm2d(128),
            # Large pooling to reduce dimensions
            nn.MaxPool2d(kernel_size=4, stride=4),  # 24x24 -> 6x6
        )
        # fully connected layers
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 30)
        )

    def forward(self, x):
        if x.dim() == 3:
            x = torch.unsqueeze(x, 0)
        x = self.features(x)
        x = self.classifier(x)
        return x
Keypoints are 15 coordinates (x,y)
i was hoping to overfit first but i cant seem to even do that. even with many epochs
Seems like a pretty small model @heavy crow , maybe try adding more convolutional layers with a stride of 1?
what is your learning rate?
you are on the right path with trying to overfit first
is cupy really much faster than numpy?
GPUs are orders of magnitude faster than CPUs for certain operations
Assuming that you have the required hardware and drivers setup, using libraries that execute heavy operations in it will be much faster than using libraries that execute the same operations in the CPU
im currently working on a project that is implementing algorithms like q learning but with a shared action and state space and it takes forever to run simulations
im already using sparse matrices
If you are already using numpy efficiently and have a GPU you can use, then it may be worth trying to migrate
If you aren't even using numpy efficiently/correctly, I'd recommend starting there instead of changing libraries though (most libraries are similar to numpy one way or the other)
For example, are you using any for loops to iterate over rows?
yes i am
....for the first part or the last question?....
second part sorry
iterating over numpy arrays completely destroys any performance benefits you could hope to gain from using it
same but even worse with GPU libraries
if you haven't yet, I very strongly recommend reading the numpy user guide
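As a toy illustration of the point about loops (the Q-table sizes here are made up), picking the greedy action for every state can be done row by row in Python or in one vectorized call:

```python
import numpy as np

rng = np.random.default_rng(0)
q_table = rng.random((1000, 4))    # toy Q-table: 1000 states, 4 actions

# Python-level loop over rows: one interpreter round trip per state.
best_loop = np.empty(1000, dtype=np.int64)
for s in range(q_table.shape[0]):
    best_loop[s] = np.argmax(q_table[s])

# Same result in a single vectorized call.
best_vec = q_table.argmax(axis=1)
```

Both give the same answer. The second form keeps the per-row work inside NumPy, which is where the speed-up comes from.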
I'm using a LR (on plateau) scheduler but starting between 1e-3 and 1e-6.
Yes I tried this as well, without much luck
It seems random. If I reinitialize the model a few times sometimes it will fit great
But not overfit. Val loss sinks with training loss. Overall it seems very sensitive to batch size, LR, and initialization. I'm using Adam btw
why is llama so popular?
when I read messages in this server, everyone who wants to improve a bot choose llama
Technologies like ChatGPT and Llama are generative (large) language models. Do not refer to generative LLMs as bots, as everyone has their own preconceived notion of what a "bot" is, and it often won't be "generative LLMs".
The Llama family of generative LLMs are probably the most capable open-source ones. The GPT family (not including GPTs 1 and 2) are generally more capable, but they're proprietary.
thanks for clarifying
Mixtral is another generative LLM that in my experience is very capable.
what I never get is how neural networks work, I tried to make a sentence boundary code and I had issues with the neural network part, I mean, the basic part is more or less easy, (sum(input)*weight)+bias
well and the threshold and stuff
but I dont have any idea of how to apply those to make a neural network
and youtube doesnt really help, any good homepage for learning the theory?
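A tiny sketch of how those pieces fit together, with made-up numbers. The weighted sum is usually written as the sum of each input times its own weight, plus the bias, rather than the sum of the inputs times one weight. A layer is just several such neurons evaluated at once:

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z is positive, else 0."""
    return (np.asarray(z) > 0).astype(float)

# One neuron: multiply each input by its own weight, sum, then add the bias.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.3, 0.2])
b = -0.6
neuron_out = step(np.dot(w, x) + b)     # 0.5 + 0.2 - 0.6 is positive

# A layer is many neurons at once: one weight row per neuron.
W = np.array([[0.5, -0.3, 0.2],
              [-1.0, 0.4, 0.1]])
B = np.array([-0.6, 0.0])
layer_out = step(W @ x + B)
```

Stacking layers so that one layer's output is the next layer's input gives a feedforward network. Training replaces the hard threshold with a smooth activation so gradients can flow.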
for learning the theory, starting where?
id go to a constant learning rate and do one of like 1e-2 and paste the graph here
i posted about my lstm here a few days ago and didnt come back i apologize (6 CS courses are killing me rn)
but if anyone could check this out sm appreciated
i wanted to show the variance (stuff related to accuracy of my model) and i ran into an issue
Start here: https://pytorch.org/tutorials/
I guess from the very beginning
Hello everyone, I am using insightface arcface models for face detection and recognition
The model returns embeddings for each face it detects. Embeddings are 512-dimensional vectors (I'm flattening the array at the end either way to perform cosine similarity on them)
so what is the industry standard for storing these vectors and comparing?
currently what I do is I have target_vector, vectors_in_group_photo.
I do a sequential comparison of them with each other using cosine similarity, if the similarity is below threshold. I make a verdict.
but the issue is if I need to find 2 people in the same image the time complexity for this becomes O(n²).
60 people in the image, 30 target people to recognise. Calculating cosine similarity for each is a resource intensive task.
how does one tackle it, is there any better approaches for it. Also would be eager to get information on how to store these vector embeddings once calculated
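One common way to avoid the pairwise Python loop, sketched under assumptions: normalise every embedding once, then get all similarities from a single matrix product. The array names, the random stand-in data, and the 0.5 threshold are hypothetical, and real ArcFace embeddings would replace the random vectors.

```python
import numpy as np


def cosine_similarity_matrix(targets, faces):
    """Return a (num_targets, num_faces) matrix of cosine similarities."""
    # L2-normalise each row so a plain dot product equals cosine similarity.
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    faces = faces / np.linalg.norm(faces, axis=1, keepdims=True)
    return targets @ faces.T


rng = np.random.default_rng(0)
targets = rng.normal(size=(30, 512))   # stand-ins for 30 known people
faces = rng.normal(size=(60, 512))     # stand-ins for 60 detected faces
faces[7] = targets[3] * 2.0            # plant one true match (same direction)

sims = cosine_similarity_matrix(targets, faces)
best_face = sims.argmax(axis=1)        # best-matching face index per target
threshold = 0.5                        # hypothetical value; tune on real data
matched = sims.max(axis=1) >= threshold
```

The work is still 30 x 60 comparisons, but it runs as one vectorised operation, which is usually fast at this scale.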
how do i get a random image from an ImageFolder dataset class?
have you looked at the methods that are available for that class? and can you link to the docs for them?
yes i did and still couldnt understand it
can you give the link? cuz I don't even know what that class is. I do nlp.
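A minimal sketch of one way to do it, relying only on the fact that torchvision's `ImageFolder` supports `len()` and integer indexing and returns `(image, class_index)` pairs. The `random_item` helper and the list standing in for the dataset are made up so the snippet runs without any images on disk.

```python
import random


def random_item(dataset, rng=random):
    """Pick one item uniformly at random from any indexable dataset."""
    index = rng.randrange(len(dataset))
    return index, dataset[index]


# Stand-in for an ImageFolder: a list of (image, class_index) pairs.
fake_dataset = [("img_a", 0), ("img_b", 0), ("img_c", 1)]
index, (image, label) = random_item(fake_dataset, random.Random(0))
print(index, image, label)
```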
hello everyone! I'm creating a system that performs credit analysis with Django Rest, integrated with an ML algorithm that predicts whether the client is a good client or not. I developed all the training and pipelines in a Jupyter environment and integrated them into Django, but I'm getting this error:
TypeError at /api/credit-analysis/3/
ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I checked in Jupyter whether any df had NaN values, but none does. I also checked whether the data extracted from the DB had any, and it doesn't either. Any tips or help?
can you show the whole traceback and the code that causes the error?
Read the DM from the @arctic wedge bot, if you got one
https://paste.pythondiscord.com/UBSA
Can you read it? It has both the code and the traceback
I'm not that system developer, so forgive any silly mistake :p
what type is xp in if xp.any(xp.isnan(known_values)):? can you show the code where xp = happens?
or is xp a module?
I think is a module of sklearn, I found here the code, one moment
I think it could be something about the categorical variables that I encoded to fit the ML algorithm, but in Jupyter it works well. I will test these pipelines outside the Django project.
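A small sketch of one common cause of this exact message, offered as an assumption and not a diagnosis of the linked code: `np.isnan` only works on numeric dtypes, so it raises `TypeError` when a column arrives as `object` (for example strings or `None` coming from the database). The `non_numeric_columns` helper is hypothetical.

```python
import numpy as np

numeric = np.array([1.0, np.nan, 3.0])             # float dtype
mixed = np.array(["A", "B", None], dtype=object)   # e.g. a raw category column

print(np.isnan(numeric).any())   # fine on floats

try:
    np.isnan(mixed)
    raised = False
except TypeError:
    raised = True                # isnan has no implementation for object arrays


def non_numeric_columns(columns):
    """Names of columns whose dtype is not numeric (hypothetical helper)."""
    return [name for name, values in columns.items()
            if not np.issubdtype(np.asarray(values).dtype, np.number)]


bad = non_numeric_columns({"income": numeric, "job": mixed})
print(bad)
```

Comparing the dtypes of the data built inside Django with the dtypes used in Jupyter is a cheap first check.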
hi for those who have experience in mlops, which course do you recommend most?
https://www.udemy.com/course/mlops-course/
https://www.udemy.com/course/azure-machine-learning-mlops-mg/
https://www.udemy.com/course/deployment-of-machine-learning-models/
do you know what Azure is, and do you already know for sure that you need to know how to use it?
can someone help me build a resume? I don't have any projects. I want to make some projects in Machine Learning or DL or LLMs.
please don't ask about the same thing in more than one place. you're already asking for help with this in #career-advice
Oh sorry
It's like AWS? For deployment, traffic management, maintenance, etc?
yes, it's like AWS. do you know for sure that you will need to use it? because it would be a waste of your time to learn Azure and then get hired by a company that only uses AWS (likely) or on-prem infrastructure.
No idk fs that I'll use it
So from the other two which one do you recommend?
can anyone help me in dms with my scikit?
No; ask your question here
I'll respond when I can
so I am following a post online for implementing a contrastive loss function in python like this
```py
class ContrastiveLoss(nn.Module):
    def __init__(self, temperature=0.05):
        super(ContrastiveLoss, self).__init__()
        self.temperature = temperature

    def forward(self, z_i, z_j):
        batch_size = z_i.shape[0]
        z = torch.cat([z_i, z_j], dim=0)
        sim_matrix = torch.mm(z, z.T) / self.temperature
        sim_matrix = sim_matrix - torch.eye(batch_size * 2).to(z.device)
        labels = torch.cat([torch.arange(batch_size), torch.arange(batch_size)]).to(z.device)
        loss = nn.CrossEntropyLoss()(sim_matrix, labels)
        return loss
```
where zi, zj is being generated from
```py
for images in trainloader:
    images = images[0].to(device)
    images = torch.cat([images, images], dim=0)
    z_i, z_j = model(images).chunk(2, dim=0)
    loss = contrastive_loss(z_i, z_j)
```
and each images in trainloader is a batch of 256 images
However, this seems wrong, no? Shouldn't the images being fed to the model be different in contrastive loss? We have one positive pair, but the positive pair shouldn't be the same image, should it?
Also, I am confused as to how that loss function is actually the contrastive loss function. Shouldn't it only be comparing the first two images?
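For comparison, a hedged sketch of how an NT-Xent style contrastive loss is usually set up: the two inputs are differently augmented views of the same batch, the diagonal is masked so a sample cannot match itself, and the target for row k is the other view of the same image. The `augment` function, the `Linear` encoder, and the tensor sizes are placeholders, not the code from the post.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent loss for two views of the same batch, each of shape (B, D)."""
    batch_size = z_i.shape[0]
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)   # (2B, D), unit length
    sim = z @ z.T / temperature                             # cosine similarity / tau
    # A sample must never be its own candidate, so mask the diagonal out.
    self_mask = torch.eye(2 * batch_size, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # The positive for row k is the other view of the same image: k+B or k-B.
    idx = torch.arange(batch_size, device=z.device)
    targets = torch.cat([idx + batch_size, idx])
    return F.cross_entropy(sim, targets)


def augment(x):
    """Placeholder augmentation (hypothetical): additive noise."""
    return x + 0.1 * torch.randn_like(x)


torch.manual_seed(0)
images = torch.randn(8, 16)          # stand-in for a batch of images
encoder = torch.nn.Linear(16, 4)     # stand-in for the real model
loss = nt_xent_loss(encoder(augment(images)), encoder(augment(images)))
print(loss.item())
```

Every row is compared against all other 2B - 1 embeddings in the batch, which is why the loss looks at the whole similarity matrix and not only one pair.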
how do i shuffle datasets while keeping its labels aligned
like i have a dataset of [1,2,3]
are you using pytorch
just define a new index array and shuffle it that way
wdym
are you letting yourself import libraries?
arr1 = np.array([1, 2, 3, 4, 5])
shuffle_array = [4, 3, 2, 0, 1]
arr1[shuffle_array]
like that works
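The same index-array idea with a random permutation, as a small sketch on made-up arrays: build one shuffled ordering and apply it to both the data and the labels so the pairs stay aligned.

```python
import numpy as np

features = np.array([[10, 11], [20, 21], [30, 31], [40, 41]])
labels = np.array([1, 2, 3, 4])            # labels[i] belongs to features[i]

rng = np.random.default_rng(seed=0)
order = rng.permutation(len(features))     # one shared random ordering

shuffled_features = features[order]
shuffled_labels = labels[order]
print(shuffled_features, shuffled_labels)
```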
@weary timber
I use this in one of my codes, maybe it can help you.
```py
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoNCELoss(nn.Module):
    """
    InfoNCE Loss for contrastive learning with proper dimension handling.
    """
    def __init__(self, temperature=0.07):
        super(InfoNCELoss, self).__init__()
        self.temperature = temperature
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, features_1, features_2, labels):
        """
        Compute InfoNCE loss between two sets of features.
        Args:
            features_1: First set of features (B x D)
            features_2: Second set of features (B x D)
            labels: Tensor of labels
        """
        if features_1.size(0) == 0 or features_2.size(0) == 0:
            return torch.tensor(0.0, device=features_1.device)
        # Project both feature sets to same dimension if needed
        # (note: this creates a fresh, untrained projection on every call)
        if features_1.size(-1) != features_2.size(-1):
            projection_dim = min(features_1.size(-1), features_2.size(-1))
            features_1 = nn.Linear(features_1.size(-1), projection_dim).to(features_1.device)(features_1)
            features_2 = nn.Linear(features_2.size(-1), projection_dim).to(features_2.device)(features_2)
        # Normalize the features
        features_1 = F.normalize(features_1, dim=1)
        features_2 = F.normalize(features_2, dim=1)
        # Calculate similarity matrix
        logits = torch.matmul(features_1, features_2.T) / self.temperature
        # Labels for contrastive loss (this overrides the labels argument:
        # row k of features_1 is paired with row k of features_2)
        labels = torch.arange(logits.size(0), device=logits.device)
        # Calculate loss in both directions
        loss_i = self.criterion(logits, labels)
        loss_t = self.criterion(logits.T, labels)
        return (loss_i + loss_t) / 2
```
thanks Ill look at this!
@rich moth it does look like this code is taking the same image and comparing it to itself does it not?
which seems odd
ya, that seems to be the issue it looks like its comparing the image with itself.
which is def odd ok thanks!
no worries
Hey snow
hey
So do u still need help
yes
Oh ok
I've been working on a model for crypto price prediction. The idea is to accurately forecast future asset returns while also providing uncertainty estimates for its predictions. I've still got some more fine tuning to do on the parameters in the Optuna study, but I'm really happy with the results right out of the gate.
Can u pls help snow for me
sure, i can try
Thanks bro
tyy
What part of the code seems to be running for so long? How big are the xlsx files? I mean, do you have a lot of columns and values? You're performing a grid search. It can take a while depending on your specs and data, so consider RandomizedSearchCV
A good idea is to create more logging and it will help you track down the issue more.
can you hop in vc possibly?
vc?
voice chat
Plunder can do it
not right now. maybe later
what are you basing the Gaussian distribution on?
for data science is it better to be a jack of all trades and focus on being able to work in diff languages and frameworks or is it better to focus on specific things and languages? dk if I should focus on python or start learning R. @ me replies
my overall python knowledge is decent, have 0 R knowledge though
I've never been asked to use R a single time ever.
Hi, can python programs using existing ML packages be optimized to be highly efficient?
Like, as efficient as programs written in C++ or so?
this question is a lot more complicated than you probably realize. what is your goal for having the answer, and we'll work backwards from there.
cool thanks
I'm thinking about the limits of the computing capability of ML packages. For example, if on a PC I feed in 10^6 training samples, each having 10^6 features, will the program collapse before it can give any result?
what does it mean for a program to "collapse"? programs don't just "collapse" if they don't finish after a certain amount of time.
@quartz karma if you want to learn about ML, and you only have time to learn one of Python and C++, do not pick C++.
like, will the 10^6 features exceed the maximum number of features allowed, which may lead to the program terminating?
that's a matter of how much RAM your computer has. which is a hardware question.
so the internal variables of python don't have a length limit or something?
Nope.
It sounds like you're very new to programming. If you're worried about optimizing the memory or runtime of your programs, I would encourage you to stop worrying about that. You don't currently know enough to understand what actually isn't optimal.
Yeh maybe. I'm just surprised that there's no limitation of feature numbers or other parameters when using python ml pkgs...
all these things are represented in terms of multi-dimensional arrays (also called tensors) and you can have as many of them as you want, as large as you want, as long as you have enough RAM.
if you have 10^6 instances with 10^6 features, then that's 1 trillion feature values. And if each one is a 16 bit float, then that's almost 2 terabytes.
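The arithmetic behind that estimate, spelled out as a quick check. The only assumption is 2 bytes per value (16-bit floats), as stated above.

```python
instances = 10**6
features = 10**6
bytes_per_value = 2                      # a 16-bit float takes 2 bytes

total_bytes = instances * features * bytes_per_value
terabytes = total_bytes / 10**12         # decimal terabytes (TB)
tebibytes = total_bytes / 2**40          # binary tebibytes (TiB)
print(f"{terabytes:.2f} TB, {tebibytes:.2f} TiB")
```

With 32-bit floats, which many libraries default to, the figure doubles.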
yes I agree that's huge amount, on the other hand consider how complex a ML model would be (esp. those with a scale close to LLM models), it's possible that we may need to deal with such huge data right?
yes. and all those LLMs are trained by Python code on machines with huge amounts of hardware resources.
could it be that the real efficiency comes from the fact that, at the lower level, the computations are mainly carried out by CUDA in graphics memory, which is totally different from python?
it's not "could it be". it is.
if you rewrote all of it with C++, but you didn't use CUDA, it would be way too slow.
and if you rewrote it in C++ and you still used CUDA, the performance gains would be negligible, but it would have taken longer to write.
So can I think it like this, that for smaller data/models we can handle them with existing python libs, and for big data size or LLM models, we use those suites from CUDA to achieve acceptable performance?
existing python libs still use cuda.
Could you provide some hints of examples?
pytorch.
LLMs can have 7 billion or even >140 billion parameters. but even models with hundreds of parameters can benefit from CUDA.
that's fascinating. On the other hand if i'm using simple models from say sci-kit learn or xgboost, then i'm totally relying on performance from python code?
the ones in sklearn will probably use numpy for performance gains. idk about xgboost
@gentle wyvern it's being fit to the empirical distribution of the model's prediction errors on the validation set
can someone take a look at this code and tell if this looks right for implementing contrastive loss?
class ContrastiveLoss(nn.Module):
    def __init__(self, temperature=0.5):
        super(ContrastiveLoss, self).__init__()
        self.temperature = temperature

    def forward(self, projections_1, projections_2):
        z_i = projections_1
        z_j = projections_2
        z_i_norm = F.normalize(z_i, dim=1)
        z_j_norm = F.normalize(z_j, dim=1)
        cosine_num = torch.matmul(z_i, z_j.T)
        cosine_denom = torch.matmul(z_i_norm, z_j_norm.T)
        cosine_similarity = cosine_num / cosine_denom
        numerator = torch.exp(torch.diag(cosine_similarity) / self.temperature)
        denominator = cosine_similarity
        diagonal_indices = torch.arange(denominator.size(0))
        denominator[diagonal_indices, diagonal_indices] = 0
        denominator = torch.exp(torch.sum(cosine_similarity, dim=1))
        loss = -torch.log(numerator / denominator).sum()
        return loss
That's awesome! I've been working on a trading bot. So far in the testing it's done well!
Hey, there's something I'm trying to work out/understand. I'm going over material that proposes one way of choosing which model to use, but then later declares that this is a flawed methodology.
The proposed way to select models was to select which model had the smallest loss in the test set. I can believe it's a flawed strategy, just not sure I understand why.
In the context of the lecture video, it's related to selecting what degree of polynomial model to use and argues that the higher degree polynomials inherently give an overly optimistic amount of loss. This is very intuitive when we're looking at the training set, but how does this carry over when we're evaluating with a test set?
Okay, this might be one of those embarrassing situations where I should have just kept watching. They're arguing to make a training set, test set, and cross validation set
If you test a large number of models, eventually you'll find some that perform better on your specific test set by pure chance, effectively overfitting your architecture to that test set
Less complex models are less sensitive thus more likely to generalize to unseen data
Ah, gotcha, super fair. I'm familiar with that sort of thing in statistical analysis. Where if you happen to be testing a ton of hypotheses you have to use much stricter requirements
yeah... in many (if not nearly all) cases, the data you collected for tests is not going to be a perfect representation of real world data
I guess that's another reason why it's a good idea to have a dev, test, and train set.
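A small sketch of that three-way split on synthetic data, using the polynomial-degree example from the lecture: fit on the training set, pick the degree on the validation set, and look at the test set exactly once at the end. The data, split sizes, and degree range are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=300)
# Synthetic quadratic data with noise.
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

# Three disjoint splits: fit on train, choose on validation, report on test.
x_train, x_val, x_test = x[:180], x[180:240], x[240:]
y_train, y_val, y_test = y[:180], y[180:240], y[240:]


def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


fits = {d: np.polyfit(x_train, y_train, d) for d in range(1, 9)}
best_degree = min(fits, key=lambda d: mse(fits[d], x_val, y_val))
test_error = mse(fits[best_degree], x_test, y_test)   # touched exactly once
print(best_degree, test_error)
```

Because the test set played no part in choosing the degree, its error is not flattered by the selection step.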
can anyone help me fine tune a model?
send your actual question, rather than asking if someone is available to only then maybe send it later
I'm going to sleep now though, and my only recommendation without more details would be taking a look at Kaggle resources
maybe someone can answer after you ask the actual question, good night
@lapis sequoia what etrotta said applies broadly to any time you want to talk to people on the internet. be sure to keep it in mind.
I meant in vc coz i wanna screenshare some questions
that doesn't go without saying. but this server doesn't really support screenshare calls. if you want to get help, I suggest asking your question as best you can in this channel with code examples.
ok here's my code:
https://paste.pythondiscord.com/DQIA
it's taking hours to process the data (800k entries), and these are the results:
Random Forest Cross-Validation Scores:
R² Scores: [0.52024959 0.61293593 0.6065748 0.60181169 0.45804028]
Mean R²: 0.5599 (+/- 0.1223)
Random Forest Metrics:
RMSLE: 0.07565448909307013
R² Score: 0.6019513442397915
RMSE: 5441924.603108913
how could i possibly speed it up and improve performance?
test data looks something like this and im trying to predict the prices
training data looks something like that
@lapis sequoia thank you. do you know what a nominal feature is?
yes
I suggest eliminating them from X_train. features like brand and country probably have too many unique values.
I would probably also eliminate date_listed and use only year.
so year, hours, manufacturer, model, type, connectivity?
condition is almost always "used", and only very rarely "good"
so i dont think this should be included
I think you can make that determination. think of it from the computer's perspective: numeric features can inherently be compared to one another. but names/labels cannot. if a feature can only be meaningfully represented as something other than a number (like a country, or a brand name), you want it to have as few unique values as possible. just the essentials.
thing is i wouldnt mind keep testing
but it takes 2-3hours to train once
and I'm running out of time, what could I do to train as fast as possible?
coz 800k entries is not optimal, but if I start doing like 10k it will not be as accurate
select only the bare minimum set of features that you think are relevant to the y data.
what is your y data? the price?
so try picking only the features that you're sure are relevant for determining the price. maybe train with only 20% of the training data, and see how accurate it is on the test data.
then you can start adding more of the training data back in. which will increase training time.
so
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.8, random_state=42)
instead of
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
ight
no. keep the test size at 0.2. but then make X_train and y_train a subset of themselves
just pick the first n rows, or something simple like that
well it'd be more like 200k
the important thing is that the mth row of X_train needs to line up with the mth row of y_train
hmmm
so
X_subset = X_train.sample(n=subset_size, random_state=42)
y_subset = y_train.loc[X_subset.index]
try a model that's faster to train maybe?
e.g. reduce the n_estimators in your random forest
basically that
any recommendation?
xgboost?
did I not include in the same message what you can do
I think that would work; inspect X_subset and y_subset to confirm
no i meant a recommendation of a faster model
ight ty
...
I said, you can reduce the n_estimators parameter in your random forest regressor to make it train faster
yes but you also said try a model thats faster to train, so i just assumed u meant to use a diff model
I don't think they know what that is. n_estimators appears once when they define param_grid, but that variable is never used.
i do, i currently have a grid with 50-100-200
you define param_grid in line 79 of your code, but you never use it.
the only hyperparameter that you specify (if it can even be considered a hyperparam) is the random state (which you set to the right number)
i just removed it coz someone told me it'd be faster when im testing
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
subset_size = int(len(X_train) * 0.2)
X_subset = X_train.sample(n=subset_size, random_state=42)
y_subset = y_train.loc[X_subset.index]
rf_model.fit(X_subset, y_subset)
rf_predictions = rf_model.predict(X_val)
so basically like that?
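A self-contained sketch of that subsetting step on made-up data, with a sanity check that the rows of `X_subset` and `y_subset` still line up. The column names and the synthetic price formula are hypothetical; the validation split is left untouched, as suggested earlier.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X_train = pd.DataFrame({"year": rng.integers(2000, 2024, size=1000),
                        "hours": rng.integers(0, 5000, size=1000)})
# Synthetic target so that alignment can be verified row by row.
y_train = pd.Series(X_train["year"] * 10 + X_train["hours"], name="price")

subset_size = int(len(X_train) * 0.2)
X_subset = X_train.sample(n=subset_size, random_state=42)
y_subset = y_train.loc[X_subset.index]        # same row labels, same order

# Row m of X_subset must still correspond to row m of y_subset.
aligned = X_subset.index.equals(y_subset.index)
print(len(X_subset), aligned)
```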
bad wording from my part then
anyways, you can turn down the n_estimators or change other params like max_depth
another option is HistGradientBoostingRegressor (HGBR), which is still tree-based but should be faster
in terms of other gradient boosted trees, lightgbm faster than xgboost faster than catboost (usually)
also, trees would usually take longer than say a ridge or lasso or a simple linear reg; however, then you'd need to deal with categorical variables differently if you have those
okay sounds good
or do i need to change x_val and y_val as well?
what's the difference between ImageFolder.imgs and ImageFolder.samples ?
can someone tell me how do i work with googles quick draw dataset when its unlabeled
i need to shuffle it too
is it necessary to use sigmoid on the subnet output in a siamese network?
hey guys
is it a bad idea to create a model that is bad predicting, so that you assume the opposite will happen?
for example, almost every time my model thinks it's going to rain, it doesn't
so I kinda know that whenever my model thinks it's gonna rain, it wont haha
smaaaaaaaaaart
the thing is at that point, how did you even get the "bad" model, sounds to me like you messed up some training data
e.g. if this is a binary classification problem, maybe you accidentally flipped the labels
for multiclass, then not really
e.g. if you know it's either sunny, cloudy or rainy, you still wouldn't know if it's cloudy or sunny if your bad model predicts raining
so even though I described a classification problem, in my case it's not, this was just for an example
like what if I'm predicting continuous variables
then shouldn't you have a continuous objective function and the bad model is just bad no matter how you spin it? I don't get it
like say your model never predicts correctly
- say your model always predicts too high, then again that sounds like something messed up in training
- say your model always predicts either too high or too low, but you don't know which; then it's not very useful cause you wouldnt know to go up or down from the predict
lets say my model predicts the number of new members python's discord group will have. Let's imagine my model predicts that there will be over X new members today, but that is almost never the case
then I know that since my model is bad, almost every time my model predicts there will be lets say, 10 new members, I can say there WONT be 10 new members
I guess?
seems weird
I have cloned this repo: https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
and followed the first code over here:
https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4#š¤-transformers
on running it I get this error:
```bash
CUDA extension not installed.
CUDA extension not installed.
Traceback (most recent call last):
  File "C:\Users\user\Desktop\LLAMA 2\gpt.py", line 7, in <module>
    model = AutoGPTQForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AutoGPTQForCausalLM.from_pretrained() missing 1 required positional argument: 'quantize_config'
```
How do I fix this?
I know it sounds weird, but in case I try to come up with a good model and I really cant, it would be weird but at least it would work?
I mean sure, it makes more sense to come up with GOOD models instead of BAD models, but if you cannot come up with GOOD ones, then just predict the opposite of what your BAD model predicts.
I'm not sure why you'd want that tho
like there's infinitely many worse answers than "there will be 10 members"
so knowing there isn't 10 members doesn't really help, you still have infinite wrong answers
like I can say "pydis's members will not double today"
well, duh, but also, how does that help me
the problem is the opposite isn't really useful ig?
like my example with ...will not double
can anyone help me about this?
Random Forest Cross-Validation Scores:
R² Scores: [0.48833798 0.56808773 0.55909636 0.56288745 0.42693348]
Mean R²: 0.5211 (+/- 0.1108)
Fitting Random Forest on all data...
Random Forest Metrics:
RMSLE: 0.07985307785989508
R² Score: 0.5581471021117961
RMSE: 5714076.326474625
This is my model's scores
this is my code
https://paste.pythondiscord.com/K5SA
what can i change to improve the rmsle?
no, because generally ReLu works fine
always depends on the data at hand
check for underfitting, or more likely as you're using trees, overfitting
do more diligent EDA and feature engineering
but you need it to calculate the similarity?
yeah but im lost rn im ngl.
date_listed: The date when the device was listed on the platform.
year: The year the device model was manufactured or first introduced to the market.
manufacturer: The company that produced the device.
model: The specific model name or code for the device.
type: The category of the device.
connectivity: The type of connectivity supported by the device.
hours: The total number of hours the device has been used.
brand: The commercial brand associated with the device.
country: The location where the device is listed for sale, mapped to various countries.
condition: The current state of the device.
accessories: The number of additional accessories included with the device.
discount_percentage: The percentage discount applied to the original listing price.
user_rating: A rating from past users or simulated feedback that reflects satisfaction with the device.
warranty_extension: A binary indicator of whether a warranty extension option is available for the device.
price: The original listing price of the device. (target btw)
condition is literally 99% "used" and 1% "good"
type has a lot of empty rows
can you explain for what purpose you are using this?
a siamese network
I mean first of all
```py
for col in df.select_dtypes(include=np.number):
    df[col] = df[col].fillna(df[col].median())
for col in df.select_dtypes(include='object'):
    df[col] = df[col].fillna(df[col].mode()[0])
```
surely you can do better than blindly filling every missing value with the median / mode of the entire dataframe?
not to mention you're leaking due to this
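To make the leakage point concrete, a minimal sketch on a synthetic column: split first, compute the fill value from the training rows only, then reuse that value for the validation rows. The `hours` array and the 80/20 split are made up, and a per-group or model-based imputation may fit the real data better.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = rng.uniform(0, 5000, size=100)
hours[rng.choice(100, size=15, replace=False)] = np.nan   # synthetic gaps

train, val = hours[:80], hours[80:]          # split BEFORE imputing

train_median = np.nanmedian(train)           # statistic learned from train only
train_filled = np.where(np.isnan(train), train_median, train)
val_filled = np.where(np.isnan(val), train_median, val)   # reuse, don't refit
print(train_median)
```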
sure, what do you recommend? simple imputer?
yeah, so in general sigmoid is not required
no
I recommend checking your data and finding something that makes more sense
and no, I have no idea what would "make more sense" because that 100% depends on the data you have
because sigmoid just maps the output of your layer/neuron into the range (0, 1)
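A quick sketch of that squashing behaviour in plain Python, with the input values chosen arbitrarily:

```python
import math


def sigmoid(x):
    """Logistic function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


values = [sigmoid(x) for x in (-5.0, 0.0, 5.0)]
print(values)
```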
do i need it when using tripletloss
i just looked at the pytorch example for siamese network and it uses sigmoid to get the similarity
make some plots and see & feel your data
find patterns, try using it to your advantage, check if it betters the performance
first, clear your purpose and then learn about it !
i have a lot of data, i just dont really have any other idea how to find patterns basiccally
plot the data and see if you spot anything
ive tried a few combinations of features
how much do you mean by "a lot" btw?
trees tend to overfit, so try tuning the parameters
do you really have to implement your own Contrastive loss function, because pytorch doesnt have it
Could anyone help me with this?
what parameters are best to handle overfitting?
smaller n_estimators or lower max_depth immediately comes to my mind
but also, unironically just check sklearn docs
def criterion(x1, x2, label, margin: float = 1.0):
    """
    Computes Contrastive Loss
    """
    dist = torch.nn.functional.pairwise_distance(x1, x2)
    loss = (1 - label) * torch.pow(dist, 2) \
        + (label) * torch.pow(torch.clamp(margin - dist, min=0.0), 2)
    loss = torch.mean(loss)
    return loss
found this on github gist
you just want to inference the model right?
@jaunty helm can you send that hf link again for that llama models which were 4bit
isnt TripletWithDistanceLoss the same thing?
and in other format too
I don't know man
you have to search about it
gptq is basically deprecated at this point
like the popular formats are either gguf if you need to run some parts on cpu, or exl2 if you don't
yeah can you send that gguf link?
it was this I guess
that's not the same model
ig this?
llama 2 is at least like 1 year old at this point
sure ig
for future reference, you can just go to the original model's page, and check to the side the Quantizations text in the model tree and click that
does it still count as a siamese network if i use CrossEntropyLoss
are all of these files needed?
cuz they are 3GB+ each
just curious about your insight on these.
no
each file is a different level of quantization
coz this means basically no correlation on all these values, so i'd rather stick to the others right?
potentially (as it may have a non-linear correlation)
looking at the graph I also don't see much interesting other than the 0s which I assume are missing
maybe the year has something but it's all squeezed to the ends cause of the 0s
using .corr()
Correlations with Price:
price 1.000000
year 0.039578
discount_percent 0.020176
accessories 0.000480
device_id 0.000047
warranty_extension_available -0.009423
user_rating -0.010529
so obv year is best in this, so the rest has to be the other cat data
yes, that calculates linear correlation
I meant that while yes, the correlations look low, there may or may not be non-linear correlation it can't detect
one of the easier ways to find out is just compare the models when you include vs. not include them
so any one of those should work i suppose
wouldn't just using xgboost manage that?
yes
don't use the QX_0 nor QX_1 ones, they're legacy and do worse
otherwise, the higher the X in QX, the higher bit it is
So Q2_K's good then?
Q2_K would be like 2-3 bits, if you're fine with that then sure
the original gptq you showed says INT4, which is 4bits, which is somewhere around Q3 or Q4
sure, or it could overfit (or you can still just tune the params to fix that)
I'm just saying that, just because .corr() is low, that doesn't mean there isn't some association between two variables
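A tiny synthetic illustration of that point: here y is completely determined by x, yet the Pearson correlation (what `.corr()` computes by default) is essentially zero because the relationship is not linear.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)     # symmetric around zero
y = x ** 2                          # y is fully determined by x

linear_corr = np.corrcoef(x, y)[0, 1]          # Pearson correlation of x and y
nonlinear_corr = np.corrcoef(x ** 2, y)[0, 1]  # after the right transform
print(linear_corr, nonlinear_corr)
```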
i want to try it out, I'll go to larger ones after playing around with these and getting familiar
usually the rule of thumb is, Q8 is virtually the same as the original, Q6 is within 99% the same
and for small models like an 8b, you don't want to go below Q4
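A rough back-of-the-envelope sketch of why the files differ so much in size, assuming an 8-billion-parameter model and treating the number in the quant name as the nominal bits per weight. Real GGUF files are somewhat larger because K-quants mix precisions and store extra metadata; the helper name is made up.

```python
def approx_weight_size_gb(num_params, bits_per_weight):
    """Rough size of the weights alone, ignoring metadata and mixed precision."""
    return num_params * bits_per_weight / 8 / 1e9


params_8b = 8e9
sizes = {bits: approx_weight_size_gb(params_8b, bits) for bits in (16, 8, 4, 2)}
print(sizes)
```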
I have tried with this, but then a new error talking about config.json:
```bash
Traceback (most recent call last):
  File "C:\Users\user\Desktop\LLAMA 2\gpt.py", line 6, in <module>
    pipeline = transformers.pipeline(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Desktop\LLAMA 2\.venv\Lib\site-packages\transformers\pipelines\__init__.py", line 849, in pipeline
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Desktop\LLAMA 2\.venv\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 1053, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in ./Meta-Llama-3-8B-Instruct-GGUF. Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```
I don't see any config files in the repo, do I make my own? What information do I provide?
are you following https://huggingface.co/docs/transformers/en/gguf#example-usage ?
I am trying this now, it seems to give this error I am unsure of:
```bash
Traceback (most recent call last):
  File "C:\Users\user\Desktop\LLAMA 2\gpt.py", line 6, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Desktop\LLAMA 2\.venv\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 875, in from_pretrained
    config_dict = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Desktop\LLAMA 2\.venv\Lib\site-packages\transformers\modeling_gguf_pytorch_utils.py", line 278, in load_gguf_checkpoint
    reader = GGUFReader(gguf_checkpoint_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Desktop\LLAMA 2\.venv\Lib\site-packages\gguf\gguf_reader.py", line 94, in __init__
    if self._get(offs, np.uint32, override_order = '<')[0] != GGUF_MAGIC:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Desktop\LLAMA 2\.venv\Lib\site-packages\gguf\gguf_reader.py", line 151, in _get
    .newbyteorder(override_order or self.byte_order)
     ^^^^^^^^^^^^
AttributeError: newbyteorder was removed from the ndarray class in NumPy 2.0. Use arr.view(arr.dtype.newbyteorder(order)) instead.
```
do I uninstall numpy and use a prior version of it? It might cause some other problems I suppose
that is a bug in the gguf package, looks like a fix was merged yesterday https://github.com/ggerganov/llama.cpp/pull/9772 but they haven't released a new version yet
for now just pip install "numpy<2" (keep the quotes, otherwise the shell treats < as a redirect)
alright that helped
Nice! It's pretty fun to tinker around with. I'm putting the final touches on my backtest engine. What kind of architecture did you end up going with? I used a transformer with multi-headed attention with some additions. I got inspired by my prime number predictor project to make it. Like the uncertainty estimator, it just seemed like a perfect fit for a trading bot.
I have this code:
```py
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./Meta-Llama-3-8B-Instruct-GGUF"
filename = "Meta-Llama-3-8B-Instruct.Q2_K.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

def generate_text(prompt, max_length=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompt = "Once upon a time in a land far, far away"
generated_text = generate_text(prompt)
print(generated_text)
```
Terminal:
```bash
Converting and de-quantizing GGUF tensors...: 100%|████████████████████| 291/291 [03:20<00:00, 1.45it/s]97/291 [02:03<00:57, 1.64it/s]
```
It didn't print any generated text, why was that?
I tried to save the dequantized model into a folder but it didn't seem to work, the folder is empty:
```py
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./Meta-Llama-3-8B-Instruct-GGUF"
filename = "Meta-Llama-3-8B-Instruct.Q2_K.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
model.save_pretrained("./dequantized_model")
tokenizer.save_pretrained("./dequantized_model")
```
```bash
Converting and de-quantizing GGUF tensors...: 100%|█████████████████████| 291/291 [02:45<00:00, 1.75it/s]
```
anyone able to tell me if this implementation looks correct for contrastive loss?
class ContrastiveLoss(nn.Module):
    def __init__(self, temperature=0.5):
        super(ContrastiveLoss, self).__init__()
        self.temperature = temperature

    def forward(self, projections_1, projections_2):
        z_i = projections_1
        z_j = projections_2
        z_i_norm = F.normalize(z_i, dim=1)
        z_j_norm = F.normalize(z_j, dim=1)
        cosine_num = torch.matmul(z_i, z_j.T)
        cosine_denom = torch.matmul(z_i_norm, z_j_norm.T)
        cosine_similarity = cosine_num / cosine_denom
        numerator = torch.exp(torch.diag(cosine_similarity) / self.temperature)
        denominator = cosine_similarity
        diagonal_indices = torch.arange(denominator.size(0))
        denominator[diagonal_indices, diagonal_indices] = 0
        denominator = torch.exp(torch.sum(cosine_similarity, dim=1))
        loss = -torch.log(numerator / denominator).sum()
        return loss
Well, I started with just a bot that monitored BTC so I could establish the best method to monitor the market. After messing with a few APIs I decided to go with the wss (WebSocket) feed from Coinbase. Much faster and you don't run out of calls. Then I tested something I had AI come up with when I showed it what I look at on the charts. That didn't test well but I was just cutting my teeth on Python. Now I have it with an ML module that decides when to buy and sell on its own. That's tested really well but my script got too long. I'm now hung up on changing it to a modular system so I can make a variety of bots with it by just changing the logic module.
Wrap the generation in a try except block, maybe something is going on behind the scene
Hey guys I'm back
Howdy back, I'm front..
@rich moth hey if you dont mind me asking does that implementation look correct?
So my script primes as many coins as you feed it with $500 to start. I have it running 4 coins on most tests, BTC, ETH,SHIB and DOGE. It buys low, sells high and has a hold (1 - 0 - -1
Alas, currently stuck at porting it to a modular file. I can't seem to get my starting prime balances to prime anything.
I still have to implement the stop loss and actual trading functions but it was testing well before I broke it up.
I used Coinbase Pro for trading with my first bot, I liked it. Though I didn't have many options being in the US. Lots of exchanges don't want to take you if you're from the US. So you need to design your RiskManager next? Sounds like you're diving in head first though and figuring it out. While you might cringe having to make it more modular, you'll thank yourself later.
@rich moth I like how you charted out your plots. I just used a standard graph. I'm in the US, was it about certain coins?
You're asking about crypto? Can you explain the AI angle to this?
Here was one test.
@stelecrcus . Well, I'm using it to write a lot of the base code and then I'm using the Machine Learning module to automate the trade logic.
@serene scaffold are you able to take a look at that loss function code and tell me if it looks right?
I've suggested a crypto-bots-in-Python topical chat. Would be great to have a place to talk specifically about crypto bots in Python.
You have to exponentiate all your similarities before summing them. You should normalize the projections before the matrix multiplication.
Thanks, I've learned over time how much visualizing your data can really help, numerical metrics only go so far. But in general they wont allow you to register if you click you're from the US on most exchanges i've tried.
yea I want cosine similarity for pretty specific reasons in this case hence why I am doing it that way - not sure if it will matter tbh
just curious what are the reasons?
so I really need the similarity of vectors and want to ignore the total magnitude
or divide through by it I guess
empirically tbh might end up being worse
and its an easy enough change tbf
thanks for the catch on the exp before summing
duh.
I see hmm
honestly it might not actually make a difference tbf
thank you for the set of eyes
much appreciated
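For reference, here is a rough sketch of that loss with both suggestions from above applied (normalize before the matmul, exponentiate each similarity before summing). This is not the original poster's final code. It assumes the InfoNCE-style form where the positive pair stays in the denominator, which lets `F.cross_entropy` handle the exp/sum/log in a numerically stable way, and it averages over the batch instead of summing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):
    def __init__(self, temperature=0.5):
        super().__init__()
        self.temperature = temperature

    def forward(self, projections_1, projections_2):
        # Normalize first, so the dot products below are cosine similarities.
        z_i = F.normalize(projections_1, dim=1)
        z_j = F.normalize(projections_2, dim=1)

        # (batch, batch) matrix of cosine similarities scaled by the temperature.
        logits = z_i @ z_j.T / self.temperature

        # Row k's positive is column k. cross_entropy computes
        # -log(exp(l_kk) / sum_m exp(l_km)), i.e. exp is applied per similarity
        # before the sum. Assumption: the positive is kept in the denominator.
        targets = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, targets)


loss_fn = ContrastiveLoss(temperature=0.5)
z = torch.eye(4)  # toy projections: every sample is orthogonal to the others
loss = loss_fn(z, z)
print(loss.item())
```

If you specifically want the positive excluded from the denominator, you would mask the diagonal of `logits` before a `logsumexp` instead of using `cross_entropy`.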
is there a way to make google colabs runtime less annoying? im dealing with large datasets and a bunch of libraries and nothing is running i have some weird problems with my runtime
either nothing runs at all and the runtime suddenly disconnects or it just takes absolute ages to run things that arent even that heavy
the only way is probably to pay for the subscription service.
I understand that this might come off as a really weird question, but
do any of you, in the "Data Science" sphere, use stuff like neovim?
I'm just interested in trying out "that stuff", switching to completely terminal-based code editing, however, I do really like some of the features of say PyCharm and VSCode that I'm not sure can be replicated in something like neovim
for instance, all the fun debugging tools in both PyCharm and VSCode, or the matplotlib chart display that's integrated into PyCharm and other similar features
I understand I can get all of actual text-editing-related functionality (autocomplete, intellisense, stuff like that) in neovim, I'm just unsure about the rest of it, the rest of it that I might miss (like some fun extensions from VSCode and such)
might I suggest using paperspace
it's paid, but last I checked, it was relatively cheap and there are some free options occasionally as well
it's also owned by DigitalOcean now (and I think DigitalOcean has some GPU droplets itself now as well (might be a result of acquiring paperspace I suppose))
So it's important for many machine learning methods to operate off of scaled values that have a mean of 0 and a std dev of 1.
My question is, does it matter if you scale or fit a polynomial model first? I'm watching a lecture video and they suggest fitting the polynomial model first and then scaling the values with StandardScaler. I've always done it the other way around.
Have I been doing it wrong?
And if I've been doing it wrong, what happens when it's done that way?
Debugging, yes, Matplotlib, possible in different ways. Ideal may be to just use a tiling window manager (if you are going for the full Linux experience).
Notebook-style: https://github.com/luk400/vim-jukit
Debugging: https://github.com/puremourning/vimspector
(Uses the same debugger server as Visual Studio Code for Python)
The reason I bring up the tiling window manager is because on Linux, the entire OS is the IDE (Unix). And with tiling windows it feels just like one.
Nevermind, I think I misunderstood what I was looking at. They only add polynomial features with sklearn's preprocessing convenience method
So that makes sense it would come before the scaling
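A small sketch of why the order matters, using a made-up single feature and degree 2 (both are assumptions for illustration). Adding the polynomial features first and scaling second leaves every column standardized. Doing it the other way round leaves the squared column with a mean of 1 rather than 0.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Made-up feature column: 1.0 .. 10.0
X = np.arange(1.0, 11.0).reshape(-1, 1)

# Order from the lecture: expand to [x, x^2], then standardize each column.
poly_then_scale = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
)
X_a = poly_then_scale.fit_transform(X)

# The other order: standardize x, then square the standardized values.
scale_then_poly = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
)
X_b = scale_then_poly.fit_transform(X)

print("poly -> scale  mean:", X_a.mean(axis=0), "std:", X_a.std(axis=0))
print("scale -> poly  mean:", X_b.mean(axis=0), "std:", X_b.std(axis=0))
```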
That's a neat way to look at it, thanks!
and thanks for the resources
Hi can someone help me w this ty
I am having this code:
```py
import traceback
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./Meta-Llama-3-8B-Instruct-GGUF"
filename = "Meta-Llama-3-8B-Instruct.Q2_K.gguf"
try:
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
    tokenizer.save_pretrained('dequantized_model')
    model.save_pretrained('dequantized_model')

    def generate_text(prompt, max_length=50):
        inputs = tokenizer(prompt, return_tensors="pt")
        print(f"{inputs = }")
        outputs = model.generate(**inputs, max_length=max_length)
        print(f"{outputs = }")
        result = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"{result = }")
        return result

    prompt = "Once upon a time in a land far, far away"
    generated_text = generate_text(prompt)
    print(generated_text)
except Exception as e:
    traceback.print_exc()
    print(F"{e = }")
```
The only thing in terminal is:
```bash
Converting and de-quantizing GGUF tensors...: 100%|█████████████████████| 291/291 [02:39<00:00, 1.83it/s]
```
do i have to modify the fc layer of resnet to train on custom dataset?
Good morning
I think the right answer is, it depends on the variant of transfer learning you wanna use.
There are 2 variants of transfer learning (TL)
a) TL from dataset perspective
b) TL from model perspective
Your question leans more towards b, so I'll focus more on that part.
Transfer learning from the model perspective also has 3 variants.
- 1st Variant: This variant of transfer learning is where you finetune only the last layer of your ResNet.
step 1: Train the whole model on a large dataset (since you're using ResNet, which has been pretrained, you'll skip this)
step 2: Freeze the model weights
step 3: Replace the last layer, which is the output layer.
step 4: After you've done step 3, retrain the model on your custom dataset. (here, only the new output layer which you replaced in step 3 will be updated while performing step 4)
- 2nd Variant:
This variant of transfer learning is used to train multiple output layers.
step 1: Train the whole model on a large dataset (you'll skip this since ResNet is already pretrained)
step 2: Freeze the weights
step 3: Replace the last layer
step 4: Train on the custom dataset and update all the different fully connected layers in the network.
So in contrast to the 1st Variant, you'll be updating multiple output layers here. This usually tends to perform slightly better than the 1st Variant.
- 3rd Variant:
This variant is where you finetune the whole model.
Step 1: Just like the 1st & 2nd Variant, you train the whole model on a large dataset (skip this once again since you're using ResNet).
Step 2: Replace the last layer
Step 3: Now, train on custom data and update all layers. That is, instead of only updating the last layer, you'll train the model and update all layers (notice in this variant we don't have to freeze model weights)
This variant can be a bit more expensive because we have to update all the weights. However, depending on the classification task at hand, this might yield better predictive accuracy.
I have multiple similar videos and I want to detect every instance of a common image. I have a perfect base image of what the program should detect, and I need it to, based on that, find every instance of it that looks kinda alike, and then present me with the candidates so that I can point out which ones are correct and which aren't so that it can train itself.
why freeze the layers ? im new
A good example would be a 2D RPG game, where the images are mostly clear with little distortion, except from the magic spells covering them sometimes. So the program would tell me: this square area appears to match 30%, or it looks 30% of this character is visible inside it.
Then I would confirm and the program would train itself based on that.
Is it possible?
If yes, could you guys recommend me some resources? Hints and advice would be great too.
I'm not a math person, I just read loads of documentation and test the output on things to know how to put the logic together.
Thank you in advance
Okay cool. Let me break it down a bit more. When it comes to transfer learning, we're essentially transferring knowledge from a large dataset.
Since labelled data are typically scarce, in practice, it makes sense to leverage pretrained models (example Resnet, Llama etc) which we can further finetune on our custom dataset.
The key idea behind transfer learning is that we can use these large datasets to learn general feature extraction layers that are generally useful.
Now, this is the part that answers your question more directly.
The reason for freezing the weights is that we want the backbone of the network (in your case, ResNet) to remain untampered with. That is, we don't wanna tamper with the CNN part. We can, however, decide to tamper with the MLP or even just the output layer.
Remember, Resnet has already been pretrained on a large image dataset (ImageNet) and it has learned stuff during that training.
If we now want to use ResNet on our own custom image dataset, say, CIFAR-10, we might not want to update the weights in every layer in ResNet again, hence, the reason one might decide to freeze the weights in those layers.
Of course, this is not always the case in transfer learning. If you want to update all the weights during training, you will use the 3rd variant.
thanks
also how do you calculate the similarity and distance in a siamese network? and how to classify them after?
is there anyone who is learning python advanced concepts here? am looking for a peer to study together
I would like to learn advanced concept in both ways, about python and about ML
Hi guys
bah humbuggy
may someone help me
anyone know if we need linux to utilise gpu or just anaconda is enough
TensorFlow removed support for GPU usage on Windows a while ago, so if you're using their latest version you will need to use Linux (or WSL2)
Nearly any other major tool supports it on both Windows and Linux
For most tools you don't really need Anaconda either, although it might be easier to set some of them up using it than without it
i thought anaconda navigator had
function to utilise gpu?
hmm
some tools you can install using it may utilise your GPU if configured correctly, but for most part, you can install those same tools without using Anaconda and they'll work just as well
A quick guide on how to enable the use of your GPU for machine learning with Jupyter Notebook, Tensorflow, Keras on the Windows operating system.
I researched and tried various methods to get this work, and discovered this to be the easiest and quickest solution.
This will allow you to use your GPU instead of your CPU when training your your n...
is this removed?
I would strongly recommend using PyTorch instead of TensorFlow if you do not have any specific reasons to use TensorFlow
why
in the last few years it has greatly overtaken TF in popularity, so most recent research, models and resources will be using it
i c
also
ive been creating lot of environments and install libraries for each of them
now my space is full
where can i clean them up
just delete the environments
how depends on what you're using to create them, but usually same place that you went to create in first place
Hi, I am currently trying to make a RL, DL rocket league bot that can play in competitive/casual games (online games). There is almost no way to extract data such as the speed/trajectory of the ball, as it changes memory address often. I have made some pattern recognition so it can find the ball on my screen and track it, and I'll soon be getting some more variables for it to see. Is there anything I can do since I am pretty stumped. It doesn't really know where it is relative to the field and the ball on the field, how can I make it much more accurate?
You should not make any kind of automation for playing rocket league in competitive games. we will not help you with this.
its not for comp its just for concept
i was trying to say doing it without the rlbot framework
Sorry if I haven't explained fully.
hello! i'm writing an academic research paper w/ all my data analyses done in jupyterhub & don't know how to cite it in the abstract. usually when i work in R, i can just say R (version 4.1.3) but how would it be for jupyterhub? like are there versions or am i just supposed to cite python itself
seems like there are some open Issues similar to that in their organizational repos, https://github.com/jupyterlab/frontends-team-compass/issues/144
As far as Jupyter Hub goes, you could just cite the https://github.com/jupyterhub/jupyterhub website
You may want (need?) to cite some of the libraries you are using like numpy, pytorch, polars or whatever you're using... Usually you should be able to find information about it in the GitHub readme, but in practice it's all over the place
you could also try bumping https://github.com/jupyterhub/jupyterhub/issues/2661 or otherwise contacting the repository maintainers
(for real... there are way too many open issues about this in their repos, https://github.com/jupyterhub/jupyterhub/issues/2039, https://github.com/jupyter/jupyter/issues/190...)
thank you so much for your response!! i tentatively said "all statistical analyses were performed in Python (version 3.11) in the JupyterHub environment" but i didn't think to cite each of the libraries. i also didn't think to check the readmes, i appreciate your help a ton!!
You do not necessarily have to cite every single tool you are using, maybe check with your professor or reviewer
I dual boot linux and windows switching to ubuntu for programming was the best decision of my life so if u can i would dual boot ubuntu or just switch entirely if you can
Hey, I have a question. I know that high bias is associated with underfitting, but what does the term mean? Like... the etymology of the term maybe? high variance makes intuitive sense to me but I'm not sure how to interpret high bias
Without actually answering your question: my opinion is that ML terminology is a shit show and you should fully disconnect any ML term from any straightforward meaning that a fluent English speaker would intuitively ascribe to it.
lmao fair enough
There definitely is some like.... nomenclature variable labeling that just bothers me because it contradicts with other mathematical paradigms
So, can I discuss ML here? Is that considered AI?
it's pretty much a subset of AI. and the only subset that people care about.
I used to program in Perl. Done a little C++ and Java. Fairly new to Python but I've used it a few times. I used gpt to start a script, it suggested using an ML model. I used the joblib file and "trained" it by running my code. I really don't know enough about what I'm doing so am looking into how I can find out more about the "ML" part of my file. In the general chat, I'm asked questions I don't know the answer to, but I'm trying to learn, not getting much help...
I understand the parts of the code that call it, but I don't really know what "it" is lol...
I don't see where my code is calling anything other then joblib.
well, the joblib file.
I can't imagine what joblib would have to do with ML. can you put the joblib file in the paste bin?
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
sure, it's there...
https://paste.pythondiscord.com/XSDA
ok, wait, sorry, that's not the joblib file... brb.
nevermind--don't post the joblib file
is there a way to add a file after the fact or do I need to make a new one?
it looks like it's just using joblib to save and load a python object
ok, it's small.
So, this is the current code... As I said, I was using cgpt to start the code off. I found that it seemed that cgpt was messing with me, but it deleted the ML module probably 50 times..
you have model = joblib.load(MODEL_FILENAME). it looks like model was originally created using the scikit-learn library
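To make the "joblib is just saving and loading a Python object" point concrete, here's a minimal sketch. The toy data, the `LogisticRegression` choice and the file name are all assumptions for illustration, not what's in the pasted script.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data (made up): one feature, labels 0 = sell, 1 = buy.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# The "ML part" is the scikit-learn estimator; training happens in fit().
model = LogisticRegression().fit(X, y)

# joblib only writes the trained object to disk and reads it back.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical file name
joblib.dump(model, path)
loaded = joblib.load(path)

# Inspecting the loaded object shows which library and class it came from.
print(type(loaded).__module__, type(loaded).__name__)
print(loaded.predict(X))
```

So the thing to read up on is whatever class the loaded object turns out to be, not joblib itself.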
if chatgpt gave you this code, but it didn't give you a complete file that you can download to be MODEL_FILENAME, then ChatGPT was totally bullshitting you
I usually just need a lil start to research it, but I was getting nowhere fast.
it did. It messed with me for days before I gave up and just went solo.
it deleted so many bit of code that worked fine when I didn't ask it to it was crazy..
I am a computational linguist--my profession is essentially evaluating the utility of technologies like ChatGPT for different use cases
And now I'm left as a noob, not understanding the implementation of ML in my code, but I want and need to.
my professional and very authoritative opinion is that you should only use ChatGPT for "informed questions"
which is not currently the case.
I'm not using at all anymore. Havent for quite a while. I installed zencoder and that's helped me hammer bugs.
once I started making the script modular, gpt couldn't handle it at all.
At least with zencoder, it can see all of my modules and functions.. But it's still been unable to get my script to run since I ported it.
But, every time I ask a question, I usually figure out the answer as I hit enter.. lol... 'Cept for ML.. I don't think my script is using it properly and I want to use it for a variety of models to test with.
I mean, clearly it's not really using it actively
How did you come to determine that it was the scikit-learn library? I'm guessing this is the same as sklearn?
So it's trained with the sklearn.linear_model Now I have to figure out how to retrain it using the new model. The first run used 1 as buy and 0 as sell. My new model uses 1 as buy, 0 as hold and -1 as sell.
Im I implementing the ML properly, to generate a joblib file and then that is whats used? Can it be used in a more live sense to dynamically learn?
(should read "am I"?)
they're the same. it was evident by the method calls.
the overwhelming majority of ML programs (ie, models) do not learn while they are being used; all the learning takes place before, during a training phase.
I assume a lot of people here have tried to build or at least learned from building models to predict the stock market which is pretty much "impossible"
however something gets me really curious, why do most people lose money trading? isnt it a 50/50?
and if we take the assumption that most people lose money trading, then shouldnt a simple python program that opens trades against normal traders be profitable?
why wouldnt it?
my guess is that there are people who make enough money from day trading to live, and that that only works because they make a large profit from lots of people making small losses.
Anecdotally, the majority of people I know who day-trade are trading off of poorly educated decisions, i.e trading on stocks for companies they like, ones they hear about a lot in the news, etc. and have a general disregard for the numbers. I have been able to make a fair amount of money by trading purely based on statistics, though this is not consistent.
In general, from what I have learned, advanced learning algorithms are not better at making trading decisions than classic statistical analysis, so AI is generally just slower at making decisions, requires more serious hardware, and consumes more electricity for mostly the same results as simpler algorithms.
woah, Fisher is in this channel
With all that said, I think making a simple python trading algorithm could be a fun project, it is not as easy to translate statistics into profit as you might immediately think, I always recommend testing extensively without real money and limit risk for programming errors.
Much of the difficulty is not with analyzing the market, but making decisions based on edge cases and handling the API. I.e I have written in bugs before where a stock was, like 1% away from hitting my sell mark to earn profit, but I missed it and didnt handle the fall correctly and let a share spiral and lose WAY more money than I would have if I was trading without the algo simply because I didnt handle an api error response correctly.
I saw you here, I had to
but I'm always in this channel
Oh, and to answer this portion: It probably would and there are loads of people doing this with many millions or billions of dollars. The problem here is that you are not necessarily competing against my random friend who may know nothing about trading, you are mostly competing in an arms race of other skilled and informed traders/developers to take the money from the randos who are making poor decisions. When you change the goal post, it is easy to see how that could be difficult.
I dunno, I made an advanced ML system and I'm pretty happy with the results so far. Looking at the metrics - Sharpe Ratio of 2.78 and Profit Factor of 1.88 - it's performing well above average. It's processing 18 different technical features simultaneously while also providing confidence levels for each prediction. Not something a simple classic system can do.
hi
i'm making a program to help a student do a new activity based on a book
i tried to use the openai API to do that but it's so expensive
how much will it cost if i decide to train my own ai?
That's kind of a loaded question. There's a bunch of factors and it can get complicated and expensive quick. Personally, if you're serious, get a rig with a 4090 and start experimenting. There are some free ones you could use in the meantime like Google Colab, they all come with limits though.
I can do this project with something like $2000?
I think its 2k just for Graphics card these days.
wait for blackwell
wait for them to sell out and watch people jack up prices? maybe
It was crypto that caused the shortage a few years ago. Ethereum really took things to the next level, but with the increase price in BTC i foresee another smaller mining boom and a sell out of these things quick.
Humans love doing that when things get scarce, remember the whole toilet paper debacle during COVID19? Like why? Lol. People saw it selling out on TV and it created a sense of scarcity. The perfect storm is coming. I bought my 4090 at retail when it first came out, now its like 700-1000 more, it hasnt budged in years
So in a way it's related to AI/ML because if you want one for your projects, I'd pull the trigger when they drop.
Mining is pointless now, industrial grade factories have taken over. In 2021 it was everyone's last opportunity to mine so they went all in and took advantage, but they halved btc and no one can compete now
this is like asking "how much would it cost to contract a programmer to make me a program?"
tbh if you're not doing this regularly, I heard from others that just renting is cheaper
renting GPU compute?
yes
if you value your time, renting GPU compute is cheaper than any roundabout solution one might come up with (but they probably won't come up with one)
like runpod or something
it's definitely cheaper than buying a what now, 1k 2k dollars 4090?
at least you can also use it for gaming
but then you can't game while your model trains, which is a non-starter.
where i can rent a GPU?
google colab
ok
Is any system currently utilizing a Time Series DRL for Rocket Control Systems? I have to refine it but the workflow how does it look?
How can i calculate the similarity and distance in a siamese network can someone help me?
I am running python code in jupyter notebook created by docker with below file
Everything is working fine, just one doubt.
Here i am creating 2 folders named ./files/data & ./files in the container right?
When i open the jupyter notebook, only seeing /work folder.
It's just confusing. I never said to create /work folder, but it got created & on the other hand, ./files/data & ./files are not created.
version: '3'
services:
  spark:
    image: jupyter/pyspark-notebook
    user: root
    ports:
      - "8888:8888" # Jupyter Notebook
      - "4040:4040" # Spark UI
    volumes:
      - ./files/data:/root/pyspark_simple_end_to_end_project/data
      - ./files:/root/pyspark_simple_end_to_end_project/jupyter_note_book_files
    environment:
      - JUPYTER_ENABLE_LAB=yes
    command: start.sh jupyter lab --NotebookApp.token=''
this is more of a #tools-and-devops question
im trying to make a drawing board that reprocess the image to be centered before feeding forward but cant manage to get it work
can someone help me with that?
pls help me with this
They took their shot in '21, and now they say it's over? Bah! Crypto's core was about everyone diggin' for treasure, not just the fat cats with their fancy ships. A new tide could come in, a new way to mine. Then we'll see who's laughin'
Is there a way to format pandas to_latex by column
```py
table_list = self.df.rename(columns=self.get_headers()).astype("float").to_latex(
    # right align columns, add || between measurement and calculation columns
    column_format="r" * len(self.measurements) + "||" + "r" * len(self.calculations),
    float_format="{:0.3E}".format,
    index=False,
    position="ht"
).split("\n")
```
Id like the first three to be floats (without trailing zeros) but id rather not concat the latex
there's a formatters kwarg of type dict[column_name, function], so you can have the function be "{:0.3E}".format for some columns and "{:0.2f}".format for others, or whatever you have in mind.
As I understand, you use a ML training model to make a joblib file then you use the joblib file to perform the function after it's been trained. In a crypto trading bot that continuously runs, for days+, can the ML training module be called say, every hour, to re-train on the last hour's data or is the joblib file static and once it's called, can no longer be modified.
You build a model and serialize that model (using whatever is appropriate: joblib, torch.save, etc). You may use the model, or retrain and use a new model (which you'd serialize again, either overwriting or creating a new file)
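A bare-bones sketch of that retrain-and-reserialize loop. Everything here is a stand-in: the random "market" data, the -1/0/1 labelling rule, the `retrain_and_save` helper, and a 3-iteration loop in place of an hourly schedule.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

MODEL_PATH = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical location


def retrain_and_save(X, y, path=MODEL_PATH):
    """Fit a fresh model on the latest data and overwrite the saved file."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, path)
    return model


rng = np.random.default_rng(0)
for hour in range(3):  # stand-in for "once every hour" in a long-running script
    # Fake features for the last hour; labels: 1 = buy, 0 = hold, -1 = sell.
    X = rng.normal(size=(60, 2))
    y = np.where(X[:, 0] > 0.5, 1, np.where(X[:, 0] < -0.5, -1, 0))

    retrain_and_save(X, y)

    # The running bot reloads the file to pick up the newest model.
    model = joblib.load(MODEL_PATH)
    print(hour, model.predict(X[:3]))
```

The file is only a snapshot: each retrain writes a new snapshot, and the bot keeps using whichever one it loaded last.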
@bitter harbor did it work?
had to change around how i was replacing the headers but yes ty
yay
now I can finally be happy
speaking of being happy, im formatting the var names in the headers myself but dont love having a constant max char len, is there a way to get latex to 'evenly' wrap so the table fits the page width?
Like I would be happy with Mass times Lever\\ Arm for m_2 but not Mass times Lever Arm for \\m_2
```py
def get_header(self) -> str | None:
    if isinstance(self, Constant):
        return None
    squished_name = self.squash_name()
    uncertainty = rf"\pm{self.uncertainty}" if self.uncertainty is not None else ""
    header_base = rf"{squished_name}\\${self.symbol}{uncertainty}$\\({self.unit})"
    return r"\thead{" + header_base + "}"

# I don't like this, but I couldn't figure out how to do it with latex
def squash_name(self) -> str:
    name: list[str] = []
    words = self.name.split(" ")
    for word in words:
        if len(name) != 0 and len(word + name[-1]) < 10:  # Put words on same line if < 10 chars combined
            name[-1] = name[-1] + " " + word
        else:
            name.append(word)
    return r"\\".join(name)
```
The other two lines are constant, only thought I've had is to try to minimize the dimensions of the text but that sounds like a lot of work and I don't know how well it would play with the tabular
THANK YOU BB. Alas, the question remains. Can I keep rebuilding, reloading, reusing, relearning, rebuilding, reloading inloop on a continuously running script?
This reminds me of the actual progression of mining. Individuals dig to find the gold, big companies come in and take over then later, individuals pick it up again when the big guys start overlooking the small pockets and the cycle continues. Main difference, different tools. The fat cats were the ones who could own all the fancy new tools. This time around, the programmers making the tools are using them for themselves. And they become the next fat cats.... There's an algorithm there somewhere
if you want to talk about the crypto economy, please go to an off-topic channel.
lol.. exactly what I'm talking about.. I asked a question and got no answer. I engage in someone elses comment, and then you say this.....
but to me..
lol
why not them?
This channel is not for the discussion of crypto. I'm muting you if this continues.
it's for data science and AI... So it pertains.
you can DM @sonic vapor if you have an issue with our moderation practices
crypto is not part of DS/AI
this message is a tangent about the crypto economy.
If I'm using python and AI and ML to make a bot (let's say it's not a trading bot but a research bot) then it absolutely DOES belong here, just because I'm researching crypto doesn't make it not pertain!
And where did that tangent come from?
what did I REPLY too?
THAT was relevant?
DM @sonic vapor if you have any questions about this. Please make sure that all your subsequent messages in this channel are only about data science and AI.
Yet, no answer for my question though....
Is making a bot that monitors crypto to look for specific patterns "data science"? Answer honestly now...
Think anyone could give me some guidance on this problem I am facing, with GPT-2 Fine tune.
So I am fine tuning GPT-2 on some conversational data scraped by reddit: https://www.kaggle.com/datasets/jerryqu/reddit-conversations with transformers.
The problem I am facing is with my output sentences being truncated (abruptly cut off),
Prompt: What is your favorite type of food?
Output: I like bananas, but I have a particular dislike for onion rings. I really dislike them. And (take note of the "And")
This is my tokenize function for both training and testing: (I have a feeling the problem is because of this max_length param here, please correct me if I am mistaken.)
def tokenize_data(data, tokenizer, max_length=1024):
    return tokenizer(data, return_tensors="pt", padding=True, truncation=True, max_length=max_length)
This is how I format my data after cleaning:
dialog_data.append(f"<USER> {user}{tokenizer.eos_token} <BOT> {bot}{tokenizer.eos_token}")
Then I pass dialog_data into the tokenize_data function.
Then I pass the tokenized data into get_train_eval_data function:
def get_train_eval_data(tokenized_data, test_size=0.2, random_state=42):
    input_ids = tokenized_data["input_ids"]
    attention_mask = tokenized_data["attention_mask"]
    train_input_ids, eval_input_ids, train_attention_mask, eval_attention_mask = train_test_split(
        input_ids, attention_mask, test_size=test_size, random_state=random_state
    )
    train_dataset = TensorDataset(train_input_ids, train_attention_mask)
    eval_dataset = TensorDataset(eval_input_ids, eval_attention_mask)
    return train_dataset, eval_dataset
Then finally, those get passed into the eval_dataset and train_dataset params of the Trainer class.
And on the testing side, this is my generate call:
def generate_response(prompt, model, tokenizer, temperature=0.7):
    inputs = tokenize_data(prompt, tokenizer).to("cuda")
    output = model.generate(
        inputs.input_ids,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=True,
        temperature=temperature,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
And my prompt:
prompt = f"<USER> {input('Enter Prompt: ')}{tokenizer.eos_token} <BOT>"
response = generate_response(prompt, model, tokenizer)
print("Response:", response)
is this the right channel to ask for help on code for ai speech recognition?
Since you already set padding to True, I think you should also try using the same tokenizer that was used on the pretrained GPT-2 model (if you're not doing so yet)
Once you've loaded and initialized the tokenizer, try to print the max_length of the input tokens to confirm if it's really 1024.
Once you've known the max length, use that same max_length in your utility function for tokenization.
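One more thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the generate call in the question sets no length limit, and when none is set transformers falls back to a short default (max_length=20 tokens, prompt included, unless the model's generation config overrides it), which can cut a reply off mid-sentence. A minimal sketch of keyword arguments that could be passed instead; `build_generate_kwargs` and the value 128 are hypothetical, chosen only for illustration:

```python
def build_generate_kwargs(eos_token_id, max_new_tokens=128, temperature=0.7):
    """Collect keyword arguments for model.generate (hypothetical helper)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "max_new_tokens": max_new_tokens,  # cap on generated tokens, not counting the prompt
        "pad_token_id": eos_token_id,
        "eos_token_id": eos_token_id,      # generation can still stop earlier at EOS
        "do_sample": True,
        "temperature": temperature,
    }


# Intended use, assuming `inputs`, `model`, and `tokenizer` from the question:
# output = model.generate(
#     inputs.input_ids,
#     attention_mask=inputs.attention_mask,
#     **build_generate_kwargs(tokenizer.eos_token_id),
# )
```

If the replies still stop mid-sentence with a larger limit, the cause is probably somewhere else (for example in how the training text was truncated).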
Yes
what's the keepdim parameter of functional.pairwise_distance?
sure what's ur project?
Hello. When you ask a question, be sure to ask your whole question all at once.
after aggregation, you lose a dimension (over which the aggregation was performed), keepdim lets you put a placeholder dimension of 1 there instead of it disappearing
not specific to torch, numpy has that too for appropriate functions and they call it keepdimS
why use it: may be you'll do something with the result afterwards that is more convenient if it has that extra dimension
example: arr := (N, D) array, arr.sum(0) is of shape (D,), arr.sum(0, keepdims=True) is of (1, D)
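The shapes in that example can be checked with a short numpy sketch. The array contents and the row-centering step are made up for illustration; only the shapes matter:

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(3, 4)  # (N, D) = (3, 4)

col_sum = arr.sum(0)                      # shape (4,): axis 0 disappears
col_sum_kept = arr.sum(0, keepdims=True)  # shape (1, 4): axis 0 kept with size 1

# Where the placeholder axis is convenient: subtract each row's mean from that row.
row_mean = arr.mean(1, keepdims=True)     # shape (3, 1)
centered = arr - row_mean                 # (3, 4) - (3, 1) broadcasts row by row

print(col_sum.shape, col_sum_kept.shape, centered.shape)
```

Without keepdims, `arr.mean(1)` would have shape (3,), and subtracting it from a (3, 4) array would not broadcast.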
I agree with ya. Don't let these guys bother you. I mean my crypto project involves ML.. soo
Guys, i trying to learn deep learning and machine learning but i need to get a better graphic card, i get a rtx 3060 ti 12GB or a rtx 4060 ti 8gb?
you don't need one
Oh.
ofc depends on what you wanna do but for just learning basic DL you don't need one, just use your CPU. and if you really need a gpu you can use google colab or some free service
if you just wanna train big networks then you need a GPU sure but I mean for learning you don't need it.
and if you want to train a "big network", it probably requires an enterprise-tier GPU (not a gaming one)
Hello everyone, I have an idea to write a python package for automating the machine learning process, the process that i have in mind is as follows:
- Filling (or dropping) missing values.
- Auto Feature Selection
- Splitting the data in multiple forms (for multiple training tests)
- Training multiple models on the training set
- Auto-Hyper parameter tuning
- Picking the best performing model
The code would look something like this:
model = AutoML(X, y, **optional_parameters) # AutoML is just a placeholder
X: dataset without the target
y: target
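To make the pipeline above concrete, here is a minimal stdlib-only sketch of the drop-missing / split / train-several / pick-best steps. Everything in it is hypothetical (`auto_select`, `MeanModel`, `LineModel`, the toy data), it skips feature selection and hyperparameter tuning, and it is not how any of the existing AutoML packages work:

```python
import random
from statistics import mean


class MeanModel:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, xs, ys):
        self.mu = mean(ys)
        return self

    def predict(self, xs):
        return [self.mu for _ in xs]


class LineModel:
    """One-feature least-squares line y = a*x + b."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = cov / var if var else 0.0
        self.b = my - self.a * mx
        return self

    def predict(self, xs):
        return [self.a * x + self.b for x in xs]


def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def auto_select(xs, ys, candidates, test_size=0.25, seed=0):
    # 1. Drop rows with a missing feature or target.
    rows = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    # 2. Shuffle and split into a training part and a held-out part.
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_size))
    test, train = rows[:n_test], rows[n_test:]
    x_tr, y_tr = zip(*train)
    x_te, y_te = zip(*test)
    # 3. Fit every candidate on the training part, score it on the held-out part.
    scores = {}
    for name, make_model in candidates.items():
        model = make_model().fit(x_tr, y_tr)
        scores[name] = mse(y_te, model.predict(x_te))
    # 4. Pick the candidate with the lowest held-out error.
    best = min(scores, key=scores.get)
    return best, scores


xs = [0, 1, 2, None, 4, 5, 6, 7, 8, 9]
ys = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]  # y = 2x + 1
best, scores = auto_select(xs, ys, {"mean": MeanModel, "line": LineModel})
print(best, scores)
```

The point of the sketch is the order of operations: models are fit on the training part only, and the held-out part is used just for comparing them.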
I have the skills to work and finish such a project, but I need your feedback, from my research i found that there are multiple python packages that can do the same thing, like MLJAR AutoML, Auto-sklearn, PyCaret, etc.
But I read multiple reddit threads that complaint about the limitations of these packages, and i am willing to fill these gaps in my package.
So, what do you think? should I commit to this project? or should I put my efforts into something else?
if your math is good and you wanna actually learn the theory: https://www.deeplearningbook.org/ otherwise just go and start with the pytorch tutorial ^^ and do some project (classify dogs or whatever)
Ok, thanks.
have a good day.
I'd try to learn the basic concept. A very nice video for that is https://www.youtube.com/watch?v=aircAruvnKk https://www.youtube.com/watch?v=IHZwWFHWa-w and other videos of that guy.
i gonna see this when i go to my home, i on the work now, and i need focus, english is not my first language.
I have some dificulty
well you'll get better with time. also if the math is way too complex for you: If you really wanna learn it, you can. It's very doable. But there's also a very applied side to deep learning where you don't need much math (but that's veeeery shallow)
I'm doing a recommendation system as a graduation project with my two project partners we want to make a hybrid system that includes a content-based, collaborative and demographic recommendations module while utilizing reinforcement learning to keep up with new users data
as none of us has any experience we thought we would like to turn professional communities for advice
we started preprocessing on the data but have fears toward actually being able to implement everything before the deadline February 12th and if there's any integration limitation that we should know of before starting to train the modules so that we wouldn't have to start from scratch after we finish a module cause we should've done something differently if we were going for an hybrid system
Ok, i'm new here and wasn't sure if it's appropriate to ask here
If you're unsure about finishing before the deadline and it's crucial to submit a complete P.O.C., you might want to explore Meta's 2024 ICML paper on recommendation systems. They introduced HSTU, a new architecture designed for high-cardinality, streaming data, achieving state-of-the-art performance and efficiency.
The paper highlights how Generative Recommenders can scale to massive datasets while significantly reducing computational costs.
Check out the paper and the code therein and see if it's something that interests you.
Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compu...
i will look into it
also no we don't need a complete P.O.C
it's just that me and my team are basically fumbling in the dark without prior experience so i just wanted to seek for guidance
thanks for your reply
i havent been able to figure out why my code is not working, i'm trying to set up basic voice recognition so that when i say "what time is it", it will respond with the current time. i followed a tutorial on how to do it but it isn't picking up my voice
anyone have experience with time series analysis/forecasting, arima models, or regression? if so, please give me book or resource recommendations that are easy to understand. I want to learn but the math and wording is so complicated everywhere I look. I suck at reading and understanding the conceptual math most of these books have. @ me replies or go to #1319410226088382576
There's this pinned message, it's a bit old so let us know if anything is a dead link: #data-science-and-ml message
Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively. Examples of time series forecasting use cases are: financial forecasting, product sales forecasting, web traffic forecasting, energy demand forec...
Wheres the problem? In speech recognition?
What library are you using for it?. I tried smthn similar
its not taking microphone input, using libraries pyttsx3 and speech_recognition, heres the code:
```
import pyttsx3
import speech_recognition as sr
from datetime import datetime

engine = pyttsx3.init()
engine.say("Hello Sir, how can I help you")
engine.runAndWait()
now = datetime.now()
current_time = now.strftime("%H:%M:%S")

def takecommand():
    command = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        command.adjust_for_ambient_noise(source)
        command.pause_threshold = 1
        audio = command.listen(source)
    try:
        print("Recognizing...")
        query = command.recognize.google(audio, language="en-in")
        print("You said", query)
        if 'what time is it' in query:
            engine = pyttsx3.init()
            engine.say(current_time)
            engine.runAndWait()
    except Exception as Error:
        return None
    return query

takecommand()
```
How can I learn to make a very very simple ai chatbot?
I have a programmer background very little data science--but I guess I'm trying to figure out how to make my own very very simple AI data models. Or at least learn how it's done
Example: chat bot that only has knowledge of automotives
a chat bot that can respond to questions about only one topic, and responds with plausible answers for that one topic domain, is still very advanced and not a good beginner project.
Start with a simple prediction model
Not chatbots which includes NLP, DL and LLMs lol
chatbots are part of NLP by definition, but I'm not aware of any question-answering bot techniques that are easy for beginners.
this is good advice.
Idk man i am still learning, my biggest problem was going right into ML and DL while not knowing the statistics fully
learn stats
all my homies learn stats
Yeah im doing rn, im learning distributions rn... i have a long way to go, i should do it as soon as possible, going to get ready for AI Olympiads
And im so stressed man
Have to learn a lot in one month lol
Hey just a quick question
when doing contour plots
is it possible to get the value of the contour along with your x and y coords when you hover over the interactive plot?
I'm working on a project with assemblyai with the transcribe live audio streams feature, realized you can't set it to a different language, it only works with english. Is there an alternative to assemblyai that works with spanish? Does anybody know? I can't seem to find anything that works as good as assemblyai.
is there a way to have speech recognition use a wake word?
@bitter oyster To use a wake word, you can do something like this. This uses regex to match your wake word; if found, it gets everything after the wake word, which you can then pass to your ai assistant.
import re

def extract_prompt(transcribed_text, wake_word):
    # \b keeps the wake word from matching at the end of a longer word
    pattern = rf'\b{re.escape(wake_word)}(.*)'
    match = re.search(pattern, transcribed_text, re.IGNORECASE)
    if match:
        prompt = match.group(1).strip()
        return prompt
    else:
        return None

prompt = extract_prompt(transcribed_text, wake_word)
if prompt:
    do_stuff()
do i need to change anything to add to my current code?
```
import speech_recognition as sr
import pyaudio
import pyttsx3
import time
from datetime import datetime

now = datetime.now()
current_time = now.strftime("%I:%M:")
current_date = now.strftime("%A:%B:%d")

# This tells the AI to Speak
def speak(text):
    engine = pyttsx3.init('sapi5')
    voices = engine.getProperty('voices')
    engine.setProperty('voice',voices[0].id)
    print("X.E.N.O.N.:" + text + "\n")
    engine.say(text)
    engine.runAndWait()

# This states how to process audio
def takeCommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...", end="")
        audio = r.listen(source)
        query = ""
        try:
            print("Recognizing...",end="")
            query = r.recognize_google(audio,language='en-US')
            print(f"User said: {query}")
        except Exception as e:
            print("Exception:"+str(e))
    return query.lower()

def main():
    Talk = True
    while Talk == True:
        userSaid = takeCommand()
        if "hello" in userSaid:
            speak("hello sir")
        if "bye" in userSaid:
            speak("goodbye sir")
        if "how are you" in userSaid:
            speak("doing well sir")
        if "stop" in userSaid:
            speak("stopping sir")
            break
        if "exit" in userSaid:
            speak("ending process")
        if "what time is it" in userSaid:
            speak(current_time)
        if "what day is it" in userSaid:
            speak(current_date)
        if "open my email" in userSaid:
            speak("Maybe you should finish your program before trying to use it")
        time.sleep(1)

main()
```
Is there a way to get FRED data without APi? I couldn't find a Internet tutorial on it.
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
@bitter oyster you just need to check for wake word after you get transcript, after calling takeCommand, then continue if wake word, otherwise do nothing
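A rough sketch of that gating step, with the microphone part left out so it can run on plain strings. The wake word "xenon" and the names `command_after_wake_word` and `handle` are assumptions made up for this example; in the real loop the transcript would come from takeCommand():

```python
import re


def command_after_wake_word(transcript, wake_word):
    """Return the text following the wake word, or None if it was not said."""
    pattern = rf"\b{re.escape(wake_word)}\b(.*)"
    match = re.search(pattern, transcript, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).strip(" ,.!?")


def handle(transcript, wake_word="xenon"):
    command = command_after_wake_word(transcript, wake_word)
    if command is None:
        return None  # wake word absent: do nothing
    if "hello" in command.lower():
        return "hello sir"
    if "stop" in command.lower():
        return "stopping sir"
    return "sorry, I did not catch that"


for heard in ["hello there", "Xenon, hello", "xenon stop"]:
    print(repr(heard), "->", handle(heard))
```

In the posted main() loop this would mean calling something like `handle(userSaid)` once per iteration and only speaking when it returns a string.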
Have you seen NVIDIA's latest mini supercomputer? 70 TRILLION computations a second. In the palm of your hand. 15 watts. $249 It can run python scripts and can run standalone AI models. In this video (I hope it's ok to share this) he uses one with a python script to monitor his driveway camera to identify guests and vehicles. Then he adds llama3.2 to it to interact with his script for the vocal call outs.
Is anyone using any of the Jetson computers or similar to run their intensive code?
https://youtu.be/QHBr8hekCzg
FREE GIVEAWAY OF JENSEN-HUANG-SIGNED ORIN NANO SUPER! See Below!
Join Dave as he explores NVIDIA's Jetson Orin Nano Super, a compact AI powerhouse with 1024 CUDA cores and 6 ARM cores for just $249. Learn why this could be the best AI board for your projects in robotics, IoT, or AI development. Free Sample of my Book on the Spectrum: https://a...
this is too good !!, nice
I'm trying to scrap the data below this 'load more' button, but I can't do so. Can someone help with it? code - https://paste.pythondiscord.com/4NUA
what online AI is this and do they have an API that you can use?
https://www.motorsport.com/live/, i'm trying to extract commentary data
Its unfortunate that they don't have an API. Even more unfortunate that their Terms of Service says to not scrape data.
So as per the server rules, can't help
!rule tos says so
5. Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.
Ohh okay, I didn't looked at it, I needed it to train the AI model, Are there any alternatives?
anyone here familiar with model training for computer vision?
Don't ask a question to ask a question. Ask your question with all the details one would need to be able to answer your question right away.
sorry my bad. I need assistance on how to do code relating model training for computer vision. For face recognition specifically. I've ran my codes but I dont get the output which supposedly have graphs
Open a help thread and share your code and output: #how-to-get-help . Hard to assist without seeing the specific problem
hey, im looking for a mlops tool to track image processing pipeline/steps that were applied, is this possible to achieve with mlflow? as far as Ive checked there are mlflow recipes available, but Im not sure if it tracks steps applied
Hello, can anyone recommend me a good tutorial/book/course on data engineering? Specifically Azure if possible. I have started data science recently and would like to get the basic or intermediate info on data engineering.
Why doesnt pytorch have a built-in contrastive loss?
I'm not sure; do you need help implementing your own loss function?
yes, i saw a bunch of them online but im not sure if they're the correct implementation
what's the formula, and what is the code for the implementation that you saw?
I have decided I'm going to make my Own LLM from Scratch, and then eventually implement higher functions into it to allow it to become AGI.
you will never have enough computation power at your disposal to do this.
class ContrastiveLoss(torch.nn.Module):
    def __init__(self, margin=2.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        # Calculate the euclidean distance and calculate the contrastive loss
        euclidean_distance = F.pairwise_distance(output1, output2, keepdim = True)
        loss_contrastive = torch.mean((1-label) * torch.pow(euclidean_distance, 2) +
                                      (label) * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2))
        return loss_contrastive
from here https://datahacker.rs/019-siamese-network-in-pytorch-with-application-to-face-similarity/
Never say never. The dumpster behind Game Stop might have some goodies.
I'm saying never and I mean it. only exceptionally large, wealthy companies have enough computation power to do this. you cannot do it by networking together a bunch of consumer-grade GPUs. if you want to learn more about AI and LLMs, you must pick a more attainable project.
Oh my Lord every which way I have an asperation there's some smart allik to shoot me down. Who cares if it's under powered or super basic in comparison to what the companies do? Half the biggest names in tech development started in their garage with practically nothing. I'm tired of the pessimism and cynicism. Why don't actually give me something that might help?
I like your attitude
By definition, an AGI cannot be "super basic".
There are things you can do with AI that are attainable with the hardware and resources available to you, and I'm happy to help you with those. I can't help you do things that you won't be able to do.
I'm still looking at this. one moment..
AGI doesn't even have a solid definition and doesn't necessarily mean singularity. Like, computer programs can and have been made to manipulate other programs and they run on the most basic of laptops
An LLM with some other higher functions added in is at least a bare bones AGI. It's a baby AGI.
imo in order to achieve what big companies were able to achieve by training LLMs on super computer, you need to invent a new approach to achieve this.
@worldly terrace you can potentially fine-tune (but not create from scratch) an LLM with hardware resources that you can obtain or rent. And you can also experiment with task-specific prompt engineering, without having to fine tune.
additionally, @worldly terrace can write multiple functions for the fine tuned LLM can use which gives it the ability to do advanced tasks.
My main Idea is to brute force it with anything and everything I can find in dumpster and garage sales. I don't care if it'd take hundreds or even thousands of components, this is my aspiration in the tech field. Plus I'd be cleaning up a whole lot of toxic e waste. Even the disposable vapes have LCDs and minor computer parts.
if you want to frame this as something environmentally-aware, you'll need to take the energy consumption into account
The e waste cleaning is just a bonus and wasn't really a primary focus smarty pants. Also you know how easy it is to make an AC current generator out of speaker magnets, scrap copper and an old trash bycicle?
Literally half if not everything I'd need can be found in the trash
Please re-read the #code-of-conduct. The way you're engaging here is not appropriate.
I realize that starting with "you will never have enough computation power at your disposal to do this" was stark, but I am actually trying to help you.
I'm disengaging. Once again I finally feel like I can achieve something and no one wants to genuienly help me. No one ping me I'm taking a long break.
@deep veldt I think their implementation is right, but I re-wrote it to look more like the formula.
def forward(self, output1, output2, label):
    euclidean_distance = F.pairwise_distance(output1, output2, keepdim=True)
    # label == 0 (similar pair): penalize any distance
    similar_term = (1 - label) * (euclidean_distance ** 2)
    # label == 1 (dissimilar pair): penalize only when closer than the margin
    dissimilar_term = label * torch.clamp(self.margin - euclidean_distance, min=0.0) ** 2
    return torch.mean(similar_term + dissimilar_term)
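One way to sanity-check any version of this loss is to work it out by hand for a single pair. The sketch below is plain Python (no torch) and assumes the label convention of the ContrastiveLoss class posted earlier, where 0 means a similar pair and 1 a dissimilar pair; `contrastive_pair_loss` and the sample points are made up for illustration:

```python
import math


def contrastive_pair_loss(a, b, label, margin=2.0):
    """Loss for one pair; label 0 = similar, 1 = dissimilar (assumed convention)."""
    d = math.dist(a, b)                             # Euclidean distance
    similar = (1 - label) * d ** 2                  # pulls similar pairs together
    dissimilar = label * max(margin - d, 0.0) ** 2  # pushes dissimilar pairs past the margin
    return similar + dissimilar


close = ([0.0, 0.0], [0.3, 0.4])  # distance 0.5
far = ([0.0, 0.0], [3.0, 4.0])    # distance 5.0

for name, (a, b) in [("close", close), ("far", far)]:
    for label in (0, 1):
        print(name, "label", label, "loss", round(contrastive_pair_loss(a, b, label), 4))
```

A dissimilar pair that is already farther apart than the margin contributes nothing, which is the behaviour the clamp is there for. Note that max(margin - d, 0) ** 2 is not the same quantity as max(margin ** 2 - d ** 2, 0), so the two should not be swapped when rewriting the formula.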
Not sure there's a specific recipe in MLFlow intrinsically capable of tracking applied image processing steps w/o creating it yourself.
You can however, customize and log those steps yourself in MLFlow like you would if you wanna track stuff like adding texts, tags, plots, and other artifacts etc.
When it comes to Contrastive Learning (a.k.a Relationship Learning) you'd have to write the loss function yourself in PyTorch.
I think this is because there are just so many type of contrastive learning methods available; hence, contrastive learning isn't tied to a single loss function.
For example if you were using SimCLR instead of Siamese network you'd still need to write the InfoNCE loss function yourself.
Blud you need to have raised a huge amount of funds for you to be able to build your own LLM from scratch
More so, I for one believe we all should refrain from building AGI. Nobody is ready for the consequences that comes from attaining AGI
I regret having asked at all
finetuning is still possible by individuals though
still, your hardware requirements only drops from nigh impossible to very high
I am not in the headspace right now.
Anyone?
ok, oh lord I have some questions. OK, with fine tuning bert, when the target is multi-labeled, do you use a different optimizer and is AdamW from transformers better than torch.optim.AdamW?
AdamW is the optimizer that I have used for that; it shouldn't matter which implementation you use, because the algorithm is the same.
it does not matter if the targets are binary or categorical?
I'm saying that it doesn't matter which of the Torch or transformers implementations of AdamW you use
oh
it takes a lot of data to fine-tune BERT for multilabel. what does your performance look like currently? can you show a precision-recall-f1-support table?
yeah, atm, I am trying to make a target variable based on wether or not the job posting is fake, so, I will cut that out and just show another example
what are you training BERT to do?
text classification
right, but can you be more specific? what are all the possible classes?
```
class Bert_Classifier(nn.Module):
    def __init__(self, n_classes):
        super(Bert_Classifier,self).__init__()
        self.bert = BertModel.from_pretrained(MODEL_NAME)
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        pooled_output = outputs.pooler_output
        output = self.drop(pooled_output)
        return self.out(output)

model = Bert_Classifier(2).to(device)
model
```
!code
the possible classes are "fake" or "real", but the current data cannot classifiy it but there are patterns in the text and I am trying to make it so it is binary depending on the statement
this is a binary classification task. did you think it was multilabel?
I know, I just asked that to see if there was a difference
here, I will show what I am trying to do
- for binary classification, the target for each instance is `0` or `1`.
- for multiclass (single label) classification with `n` classes, the target for each instance is a vector of `n` elements, where all elements are 0, except exactly 1 is `1`, ie `[0, 1, 0, 0]`
- for multiclass (multilabel) classification with `n` classes, the target is a vector that has a `1` for each label that the instance has, ie `[0, 1, 1, 0, 1]`
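The three target shapes can be written out in a few lines of plain Python. The class names and helper functions here are hypothetical, picked only to make the vectors concrete:

```python
classes = ["sports", "politics", "tech", "health"]  # hypothetical label set, n = 4


def binary_target(is_positive):
    """Binary: a single 0 or 1."""
    return 1 if is_positive else 0


def one_hot(label):
    """Multiclass, single label: exactly one element is 1."""
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels):
    """Multiclass, multilabel: one 1 per label the instance has."""
    return [1 if c in labels else 0 for c in classes]


print(binary_target(True))              # 1
print(one_hot("politics"))              # [0, 1, 0, 0]
print(multi_hot({"politics", "tech"}))  # [0, 1, 1, 0]
```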
df['description'] = df['description'].str.lower()
df['text'] = df['text'].str.lower()
import string
def remove_punctuations(text):
return text.translate(str.maketrans('','',string.punctuation))
df['text'] = df['text'].apply(remove_punctuations)
sample_txt = " ".join(i for i in df['text'])
target_txt = " ".join(i for i in df['description'])
from nltk import FreqDist
from textblob import TextBlob
blob = TextBlob(target_txt).sentences
most_common_words = FreqDist(blob).most_common(50)
print("top 50 most common words",most_common_words)
phrases = ["earn $5000/week!","contact now at fsmith@hotmail.com."]
df['fraudulent'] = [1 if (X == df['description'] | df['description'] == phrases) else 0 for X in df['fraudulent']]
blob = TextBlob(target_txt).ngrams(5)
df['n_grams'] = df['text'].apply(n_grams_blob)
df['n_grams'] = df['n_grams'].str.lower()
def fake_text(blob):
tokens = word_tokenize(phrases)
found_phrases = [phrases for phrase in tokens]
return " ".join(found_phrases)
df['fake_jobs'] = df['description'].apply(fake_text)
df['fake_jobs'].head(10)
df['fradulent'] = [for phrase in phrases if X == phrases return 1 else 0]
df['fradulent'].value_counts()
for phrase in phrases:
if df['description'] == phrases:
df['fradulent'].apply(phrases).astype(int)
again, it is extremly messy
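Reading the snippet above, the goal seems to be: mark a posting as fraudulent when its description contains one of the known phrases. Assuming that reading is right, a minimal sketch of just that step in plain Python could look like this (`flag_fraudulent` and the sample descriptions are made up; the phrases are the ones from the snippet):

```python
def flag_fraudulent(description, phrases):
    """Return 1 if any known phrase occurs in the description (case-insensitive), else 0."""
    text = description.lower()
    return int(any(phrase.lower() in text for phrase in phrases))


phrases = ["earn $5000/week!", "contact now at fsmith@hotmail.com."]
descriptions = [
    "Work from home and EARN $5000/WEEK! No experience needed.",
    "Senior accountant, 5 years of experience required.",
]
labels = [flag_fraudulent(d, phrases) for d in descriptions]
print(labels)  # [1, 0]
```

On a dataframe the same function could be applied per row, e.g. `df['description'].apply(lambda d: flag_fraudulent(d, phrases))`. The matching has to run before punctuation is stripped, since the phrases themselves contain punctuation.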
be sure to follow this
sorry about that
so it is not just anything with labels > 2 ?
multiclass is when there's more than two classes, and multilabel is when it's multiclass, and each instance can belong to more than one class.
ok, the optimizer and/or the loss function change, right?
not necessarily. loss functions and optimizers can be applied to any of those.
ok, does the max_length alway matter?
where in the code is max_length?
oh, it is not in that code, the code I sent was just myself trying to find a way for the text data to have phrases that are repeated all of the time and then try to make it binary(and numerical) and apply it to a new class in a dataframe, yes binary for that one, I was trying something else with sentiments that is unrelated to the code I posted which is why I asked about multi-class text classification with bert.
I need to see the code that involves max_length to tell you what it is and why it matters.
@serene scaffold do you want me to share a differnet example for that?
whatever the code is that inspired you to ask about max_length, that's what I need to see.
ok, I really rushed this this morning
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
from wordcloud import WordCloud
from textblob import TextBlob
from transformers import BertModel,BertTokenizer,AdamW,get_linear_schedule_with_warmup
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
device = ("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
df.head(10)
df.isnull().sum()
df['Review'] = df['Review'].str.lower()
df.duplicated().sum()
df.drop_duplicates(inplace=True)
df['label'] = [1 if X == "POSITIVE" else 0 for X in df['label']]
MODEL_NAME = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
MAX_LEN = 128
class Spotify_Dataset(torch.utils.data.Dataset):
    def __init__(self,Review,targets,tokenizer,max_len):
        self.Review = Review
        self.targets = targets
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.Review)

    def __getitem__(self,idx):
        Review = str(self.Review[idx])
        target = self.targets[idx]
        encoding = self.tokenizer.encode_plus(
            Review,
            max_length=self.max_len,
            padding="max_length",
            return_attention_mask=True,
            return_token_type_ids=False,
            add_special_tokens=True,
            return_tensors='pt',
        )
        return {
            "Review":Review,
            "attention_mask":encoding['attention_mask'].flatten(),
            "input_ids":encoding['input_ids'].flatten(),
            "targets":torch.tensor(target,dtype=torch.long)
        }
when you tokenize a segment of text with BERT, you get a vector where each token is one element of the vector. you can set a max length if you want all the tokens after a given length to get clipped off.
it prevents you from blowing out your GPU if you accidentally have an instance that's excessively long
so, do a sample tokenizer.encode_plus() first to see the length of the tokens?
you can get a sense for how many tokens your instances are by counting the number of whitespace spans there are in each instance (though "words" and tokens are not 1:1), and see if you have any outliers.
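A quick sketch of that whitespace count, in plain Python. The "3x the median" cutoff and the function names are assumptions for illustration, not a standard rule, and word counts only approximate BERT token counts:

```python
from statistics import median


def whitespace_lengths(texts):
    """Number of whitespace-separated words in each text."""
    return [len(t.split()) for t in texts]


def find_long_outliers(texts, factor=3.0):
    """Indices of texts whose word count exceeds factor x the median (hypothetical rule of thumb)."""
    lengths = whitespace_lengths(texts)
    cutoff = factor * median(lengths)
    return [i for i, n in enumerate(lengths) if n > cutoff]


reviews = ["great app", "love the playlists", "crashes a lot lately", "bad " * 200]
print(whitespace_lengths(reviews))  # [2, 3, 4, 200]
print(find_long_outliers(reviews))  # [3]
```

If most instances sit far below the chosen max length and only a few outliers exceed it, clipping those outliers costs little.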
you can also replace padding='max_length' with padding='longest'. do you know what that would do?
yes
I am also pretty new to torch
I got all of it to work and it passed two epochs and actually worked, thank you!
import numpy as np
import matplotlib as plt
from matplotlib import pyplot as plt
import seaborn as sns
import sklearn
from sklearn import linear_model
import plotly.express as px
print(plt.style.available)
plt.style.use('fivethirtyeight')
df = pd.read_csv('Five Stocks.csv')
SHY= df['SHY'].values
GIS = df['GIS'].values
DENN=df['DENN'].values
SWPPX=df['SWPPX'].values
SHEL=df['SHEL'].values
import plotly.express as px
px.scatter(SHY,GIS)```
Im unsure why the plotly express doesnt work even after I used pip install plotly
as for the csv all I did was take 5 random stocks and made a csv
you should delete every line where you do x = df['x'].values.
try doing px.scatter(df, x='SHY', y='GIS')
be sure to never say that something "doesn't work". that could mean basically anything. you always have to say what you expected to happen, and what actually happened.
How you guys do the code feature on Discord
!code
Ok so I expected a plotly graph after using "'px"' function
px
Oh
px is just the module. you have to call px.scatter, which you did
fig = px.scatter(df, x='SHY', y='GIS')
fig.show()
see if that works. it would cause a browser tab to open.
Yeah I'll do that when I get home
sure. to make sure that everyone's time is used effectively, it's best to ask when you're available to actively try things that people suggest.
are people using px more now instead of seaborn and matplotlib
I use plotly for nearly everything
matplotlib has the least intuitive API of all time and seaborn is just a wrapper for matplotlib.
This isn't related to data science or AI. I think you should look for mental health resources to help you work on these feelings. I would hate for someone on this server to say something to you that would make it worse.
Sorry for getting late but it worked
I thought plt.title at least was intuitive
No. It's actually illegal for matplotlib to be intuitive.
It's in the bylaws.
But I'm glad you got it to work!
hi, beginner here, does smaller data set cause predictions to capture more noise
can you explain what you mean by "capture noise"?
yes, noise as the randomness of the data that causes irreducible error, i learned the term from Introduction to Statistical Learning Python
the data that you have available is almost always a (very small) subset of all the actual data.
like if you have data about home sales in the US, you probably don't have data about every single home sale in the US.
and the smaller your dataset, the less representative it might be of the true, complete data.
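A small simulation can make the "less representative" part visible. This is only a sketch with made-up numbers (a synthetic population of "home prices"), comparing how much the sample mean jumps around for small versus large samples:

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)
# Synthetic stand-in for "every home sale": values are invented for illustration.
population = [rng.gauss(300_000, 80_000) for _ in range(100_000)]


def spread_of_sample_means(sample_size, trials=200):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(trials)]
    return pstdev(means)


small = spread_of_sample_means(5)
large = spread_of_sample_means(500)
print(f"n=5: {small:.0f}   n=500: {large:.0f}")
```

The estimate from 5 sales varies far more from sample to sample than the estimate from 500. That extra variation comes from having little data; it is separate from the irreducible noise in each individual sale, which stays the same no matter how many rows you collect.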
so the smaller the dataset, the larger the error that comes from variation in the data, is that true?
when building a siamese network, do you need fc layers? if yes how many num_classes should it output?
How to learn data analysis fast