#data-science-and-ml
ok i did it but there are no graphs visible?
let me see if i can point you to the right direction.
ax.bar(dates, volumes, width=1, edgecolor="white", linewidth=0.7)
what is this doing?
and also
- why are you using bar chart?
- what are linewidth and width in particular doing?
is there any way to animate this? some of the javascript libraries used to have automatic animation by just changing a bool to true
I'm trying to upgrade some old code from Pandas 1.0 to 1.2 and I'm getting this error: TypeError: Expected unicode, got pandas._libs.properties.CachedProperty. Internet says that I have to set frequency ( df = df.asfreq("1D") ), but the issue is I have multiple places where this happens and frequency is different...is there a generic solution for this one?
edit: it seems that frame.index.freq = frame.index.inferred_freq is what I need
import pandas as pd
import matplotlib.pyplot as plt
# Read the stock data
stockData = pd.read_csv("/home/needjobcoder/devlopment/python/dataSciencePractice/practice/stockMarket/indexProcessed.csv")
# Convert 'Date' column to datetime
dates = pd.to_datetime(stockData['Date'])
# Extract columns
high = stockData['High']
low = stockData['Low']
_open = stockData['Open']
close = stockData['Close']
# Combine columns into a single NumPy array
stock_array = stockData[['High', 'Low', 'Open', 'Close']].values
print(len(stock_array))
dates = dates.to_numpy().flat
print(len(dates))
# Create a boxplot
fig, ax = plt.subplots()
VP = ax.boxplot(stock_array, positions=dates, widths=0.6, patch_artist=True,
showmeans=False, showfliers=False,
medianprops={"color": "white", "linewidth": 0.5},
boxprops={"facecolor": "C0", "edgecolor": "white",
"linewidth": 0.5},
whiskerprops={"color": "C0", "linewidth": 1.5},
capprops={"color": "C0", "linewidth": 1.5})
ax.set(xlim=(0.5, 4.5),
ylim=(0, stock_array.max()),
)
plt.savefig('candlestick.png')
ValueError: List of boxplot statistics and positions values must have same the length
it is giving this but len of dates and stock_array is same
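One possible explanation, offered as a guess from the snippet alone: boxplot treats each column of a 2D array as one dataset, so a (rows, 4) array yields 4 boxes regardless of how many rows there are, while positions has one entry per row. The sketch below uses made-up random prices instead of the CSV and calls matplotlib.cbook.boxplot_stats, the helper boxplot uses to compute its statistics, to show the count.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical stand-in for the CSV: 5 trading days x 4 price columns
# (High, Low, Open, Close). The values are random, not real data.
rng = np.random.default_rng(0)
stock_array = rng.uniform(90, 110, size=(5, 4))

# Each COLUMN of a 2D array is treated as one dataset.
per_column = cbook.boxplot_stats(stock_array)
print(len(per_column))  # one entry per column, not per row

# Transposing makes each day (row) its own dataset instead.
per_day = cbook.boxplot_stats(stock_array.T)
print(len(per_day))
```

If that is the cause here, passing stock_array.T would give one box per date. Numeric positions such as np.arange(len(dates)) may also be easier to work with than raw datetimes, and xlim=(0.5, 4.5) would then need to be widened to show all boxes.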
I am trying to train a model to predict loan eligibility, and I am getting this error:
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 607, 11), found shape=(None, 11)
this is my code:
import pandas
import tensorflow
from sklearn.model_selection import train_test_split
dataset = pandas.read_csv('/Users/oliverjohnson/loan-eligibility-predictor/loan-train.csv')
x = dataset.drop(columns=['Loan_Status'])
y = dataset['Loan_Status']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.01)
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.Input(shape=(x_train.shape)))
#input layers
model.add(tensorflow.keras.layers.Dense(256, activation='sigmoid'))
#hidden layers
model.add(tensorflow.keras.layers.Dense(256, activation='sigmoid'))
#output layer
model.add(tensorflow.keras.layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
print('test')
model.fit(x_train,y_train, epochs=1000)
would appreciate any help
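A likely reading of the error message: the Input layer was given the full x_train.shape, which includes the number of rows, so Keras expects every single sample to be a 607 x 11 matrix. The sketch below only looks at the shapes, using a zero-filled stand-in array whose sizes are taken from the error message; the Keras line in the comment is what the fix would probably look like, not tested against this dataset.

```python
import numpy as np

# Stand-in for x_train: 607 rows and 11 feature columns
# (sizes assumed from the error message, contents are placeholders).
x_train = np.zeros((607, 11))

full_shape = x_train.shape            # includes the number of rows
per_sample_shape = x_train.shape[1:]  # shape of a single row

print(full_shape, per_sample_shape)

# In the model above this would correspond to something like:
#   model.add(tensorflow.keras.Input(shape=x_train.shape[1:]))
```

Keras adds the batch dimension (the None in the message) on its own, so only the per-sample shape goes into Input.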
ty for the answer!
ended up using Gephi!
I really just meant, the best way to be able to visualize and have a high-level overview of said network
turns out i'll need to do some programming work on this, I need to visualize a particular graph in a very specific manner
You can change the BLAS library selection ordering (preferred order) when building Numpy from source. I don't think there is any proper way to handle libraries in Windows. On Linux this would be searching lib directories / using pkg-config.
(You could also move MKL so it does not find it in the expected location and uses the next option)
okay, that makes sense. but what's with the groupings? you're interested in how the relationship between these independent variables and the dependent variable changes across 4 physical test locations?
this is a very traditional statistical modeling scenario... do you expect strongly nonlinear effects here? if not, i'd suggest maybe going for a more probabilistic model here
but if you want to use the random forest approach, i suggest just fitting the random forest model with all 3 of those dependent variables and categorical features as needed to describe the location
however i'm concerned if you only have 4 measurements, a random forest model will be approximately useless
and the measurements of a time series don't really count, each time series is effectively one measurement
Thank you so much for your explanation. And you’re totally right. The validity is of course robust but it gives us a more reasonable direction of thinking. I did figure out the issue: Matplotlib was cutting the flow, causing it to not do the random forest test on the other areas. @desert oar
No. Not exactly. I want to know which parameters (independent variables) are mostly the cause of the dependent variable. I will be using several other methods (such as the Mantel test and SEM, when I am able to build the model) and additional experiments to understand the mechanism behind it. I do have more than 4 measurements. Actually I have sampled weekly for over 1 year, so it is not about not having enough data but more about how to use the data and understand it in modeling and through statistical tests.
are there any measures of data coverage, or data control? 100% coverage meaning, all the scoped data is maximally useful and none of it is wasted
or at least, 100% means, all data has been accounted for, or all data has been involved somehow, etc.
a measure of data being-accounted-for, data-accountedness?
Can you give a concrete example, I don't really know what you mean / this isn't standard terminology
probably we'd define prior objectives, and we could determine whether a subset of data can be successfully transformed into a form that realizes this objective
I don't mean this in a bad way but I have no clue what you mean. Could you give a concrete example / use case?
hm, thanks for the question, definitely this needs more definiteness
hmm this is kinda cursed, i was hoping there'd be an easier solution 😛 thanks for the input
i guess WSL and fiddling with packages and directories it is
It seems Conda may have something for this: https://conda-forge.org/docs/maintainer/knowledge_base.html#switching-blas-implementation
this is exactly what i was hoping would exist
It's doing dynamic library manipulation / injection basically. Similar to what would be done with manual directory fiddling and stuff.
it does appear that this is linux only though
Yeah, Windows no idea.
ok, but doing this in wsl is still good
Windows does not work like Linux in this way, Linux prefers lots of dynamic libs in standard locations that are meant to be swapped out (e.g. for a security patch).
So that programs can be updated without recompiling everything.
mhm, makes sense
Windows's style is more that everything ships with its own copy of every lib (often statically linked too), and then "installs" by moving it to anywhere and editing the registry and path and such.
This means it's more annoying to develop on since the libs are not all in a standard spot, but the tradeoff is that if multiple libs depend on some DLL, and that gets updated by one of them installing, it does not break the rest.
(Which is part of why Windows apps don't break all the time like with Linux (unless you are using stuff like Flatpak specifically meant to avoid this issue))
right, the newer pre-packaged stuff like Flatpak and Snap takes a similar container-like approach
(The common offender is glibc)
(Which is partially why Musl exists)
i'll have to go read about musl, had never heard of it before
anyone done Gephi plugin programming before?
I'm working through some EDA and came across this warning. I feel that I am handling the issue that it's warning me about. Am I misunderstanding this?
Hey guys! I have a data feature which is positively skewed and I want to use it for linear programming. I used skewness and the Shapiro normality test, and after applying a logarithmic transformation the skewness decreased, but both the Shapiro and K-S tests still fail for normality. I've got around 400 data points; should I try more transformations or remove outliers?
Hello, I'm kinda new to machine learning stuff and I wanted to ask if someone knows a good book or free course for starting with it. I want to learn some AI related stuff so... any advice would be great.
check the pins
it's because matches_raw is itself a "slice" of another data frame. matches_raw = data[...]
if you explicitly meant to make a copy, use matches_raw = data[...].copy(). if you want your changes to apply to data and not just matches_raw, don't slice off any columns.
also in the future it's much easier for people to help you if you share code as text, not a screenshot. use https://paste.pythondiscord.com for sharing
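A small sketch of the .copy() suggestion, with a made-up frame since the original code was only shared as a screenshot (the column names here are hypothetical):

```python
import pandas as pd

# Hypothetical data; the column names are invented for illustration.
data = pd.DataFrame({"team": ["a", "b", "c"], "score": [1, 2, 3]})

# An explicit copy: later edits are independent of `data`,
# so pandas has no reason to warn about writing to a slice.
matches_raw = data[data["score"] > 1].copy()
matches_raw["score"] = matches_raw["score"] * 10

print(data["score"].tolist())
print(matches_raw["score"].tolist())
```

The original frame keeps its values and only the copy changes, which is usually what is intended when the warning shows up during EDA.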
hello i think i know the answer and it is quite complex wanna talk about it on the dms
this has nothing to do with loans as such, but it's worth noting that gradient boosting historically tends to work better on "tabular" data than neural networks
so you might want to try xgboost or similar in addition to your NN
1000 epochs ... ??
i sorted it
ok
gotta add [1]
i just thought of a more simple way
wow ur computer is fast
for 100 epochs it took 10 minutes
yeah m2 is kinda crazy
only passively cooled also
ik
the dataset is only like 600 entries though
do u want to make a project together
i know literally nothing about ai and ml i doubt id be a very useful partners lol
i watched this 15 min video on how get started with neural networks and tried to apply that to another dataset
so far had 0 luck, 30% accuracy lol
sure
What are the options to compute how far an object is from the edge?
Is exploratory data analysis all that is needed to solve a given problem? I hear people say that EDA is just a step in the data analysis process and that insights from EDA can be used for further steps and analysis, is this true?
to solve what given problem? global warming? no.
exploratory data analysis is just where you look at the data and understand how it's structured, how you could use it, etc.
What is a language model that can run on an Intel i3 CPU?
hi, im new with machine learning and stuff, but would like to know in object tracking, with like deepsort, is it possible to count the object tracked from the track id or is it not? why do most still use roi line to count or is there other method to count? thanks
Does any one here know about any deep learning programs, schools, or online courses that really teach you everything? Not just CNNs but all of deep learning.
From where do i learn data science
any data scientist here?
I am
Kaggle might be a starting point
There are also recorded online lectures
If you search for it you find plenty stuff.
And like karpathy said, it's not that important what you start with; it's more important that you start and put the hours in.
how old are you
31, whys that important?
I need a mentor 🙏🏽
it's not very likely that anyone will commit to being your mentor. you should just ask questions in this channel as you have them.
Sure sure... Thanks
But it feels like I'm not making any progress
Just making use of tensorflow keras applications (pretrained models)
I want more than that
I am currently working on developing a Random Forest model using a dataset that consists of weekly values for 16 different locations. My analysis focuses on the entire area rather than specific individual locations, which is why I've merged these locations into 4 distinct areas based on spatial considerations.
Regarding the imputation process, I am indeed using it to fill in missing values within the dataset. Specifically, when a location has missing data, I employ a method to calculate the mean value based on the remaining non-missing values from the grouped locations.
The issue I'm encountering is that after applying this imputation method, certain missing values that were initially 0 are now being replaced with unexpected values like 6. In essence, it seems like the imputation is causing non-missing values that were originally 0 to increase to 6.
I'm uncertain about the root cause of this issue and would greatly appreciate any insights or suggestions on how to resolve it. If there are any error messages or specific code segments that would aid in diagnosing the problem, please feel free to ask.
Number of missing values before imputation: 0
Number of missing values after imputation: 6
def fill_missing_values(data, columns_to_fill, area_groups):
    for column in columns_to_fill:
        for area, positions in area_groups.items():
            mask = (data['Position'].isin(positions)) & data[column].isna()
            data[column] = pd.to_numeric(data[column], errors='coerce')
            mean_value = data[mask]['Date'].dt.month.map(data[(data['Position'].isin(positions)) & ~data[column].isna()].groupby(data['Date'].dt.month)[column].mean())
            data.loc[mask, column] = mean_value
I am also dropping NaN values, as some of the parameters I sample are only measured monthly.
Here is the complete code:
https://paste.pythondiscord.com/PDIQ
What do you guys think would be the best way to analyze a text and give suggestions to replace phrases from a list? Cosine similarity?
I'm not really familiar with most lines of your code since I'm still a beginner but I've encountered something like this before.
What if you write an if statement that if the data in the columns are 0 it should return back 0 and see if it works.
But thinking about this, other cells that aren't 0 might have issues. So what if you say if the cells are not NaN return cells else return (whatsoever you want it to).
Again, I'm just trying to help, in case I'm not being helpful or anything; I'm still a beginner at this
Coding in a nutshell. I will have this try tomorrow. So simple and plain, yet I didn’t try it… It started confusing me when I noticed some values which was 64 and turned into 78 which shocked me a little. Thank you.☺️
errors='coerce' looks suspicious
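To spell out why errors='coerce' could explain the "0 missing before, 6 after" output: pd.to_numeric with errors='coerce' turns anything it cannot parse into NaN. In the function above the mask is computed before that conversion, so NaNs created by the conversion may never get filled. A minimal sketch with an invented column (the "<0.1" entry is just an example of a non-numeric cell, not taken from the real data):

```python
import pandas as pd

# Hypothetical column: "<0.1" stands for any entry that is not a plain number.
raw = pd.Series(["0", "6", "<0.1", "64"])

missing_before = int(raw.isna().sum())
numeric = pd.to_numeric(raw, errors="coerce")
missing_after = int(numeric.isna().sum())

# The unparseable entry has silently become NaN.
print(missing_before, missing_after)
```

Checking which rows became NaN, for example with raw[numeric.isna()], would show whether this is what is happening in the real dataset.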
Glad I was able to help, hopefully it works.
?
Not an answer to your question but you're leaking a bit of data
X, y = prepare_data(area_data)
if X.shape[0] > 0 and y.shape[0] > 0:
    rf_regressor, mae, mse, r2 = apply_random_forest(X, y, area_label)
You're not really supposed to impute and then train your model. You're imputing using the mean of the entire dataset which isn't really allowed. If I were you I would try and encapsulate your entire preprocessing and modeling into an sklearn ColumnTransformer and Pipeline.
scikit-learn's documentation is fantastic, I'd give it a read:
- A docs page on leakage, which is happening in your case https://scikit-learn.org/stable/common_pitfalls.html#data-leakage
- A docs page on pipelines etc. https://scikit-learn.org/stable/modules/compose.html
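A rough sketch of the pipeline idea from the message above, on synthetic data rather than the real dataset. It uses only Pipeline and SimpleImputer (no ColumnTransformer) to keep it short, and the step names "impute" and "forest" are arbitrary labels chosen here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data: 200 rows, 3 features, roughly 10% of the values missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The imputer is fitted inside the pipeline, so its means come from the
# training rows only and the test rows never influence them.
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
])
model.fit(X_train, y_train)

train_means = model.named_steps["impute"].statistics_
predictions = model.predict(X_test)
print(train_means)
print(model.score(X_test, y_test))
```

The point of the design is that fit only ever sees the training split; the same fitted means are then reused when predicting on held-out data.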
You see at around x = 2.5 and x = 6.1 there are 2 basically straight blue lines
this is because my function is like 1/tan(x) and thus goes to the infinites and comes back
How do i stop this line from being plotted
I dont want them to join up
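One common approach, sketched here for 1/tan(x) since the actual function was not posted: replace the huge values near each asymptote with NaN. Matplotlib does not draw line segments through NaN, so the two branches stop being joined. The threshold limit is an arbitrary choice for this example.

```python
import numpy as np

x = np.linspace(0.1, 6.2, 1000)
y = 1.0 / np.tan(x)

# Assumption: anything beyond +/- limit is the curve shooting off towards
# an asymptote and is not worth plotting.
limit = 50.0
y_masked = np.where(np.abs(y) > limit, np.nan, y)

print(int(np.isnan(y_masked).sum()), "points masked")

# plt.plot(x, y_masked) would then leave a gap at each asymptote,
# and plt.ylim(-limit, limit) keeps the axis range sensible.
```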
Why do I have to have insanely small learning rate in order not to get overflow runtime error?
import numpy as np
import matplotlib.pyplot as plt
import copy
def compute_gradient(w,b,x,y):
    djdw=np.zeros(x.shape[1])
    djdb=0.0
    for i in range (x.shape[0]):
        err=(np.dot(w,x[i])+b)-y[i]
        for j in range(x.shape[1]):
            djdw[j]+=err*x[i,j]
        djdb+=err
    return djdw/x.shape[0],djdb/x.shape[0]
def gradient_descent(alpha,epoch,_w,_b,x,y):
    w=copy.deepcopy(_w)
    b=_b
    for _ in range(epoch):
        djdw,djdb=compute_gradient(w,b,x,y)
        w=w-alpha*djdw
        b=b-alpha*djdb
    return w,b
if __name__ == '__main__':
    x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
    y = np.array([460, 232, 178])
    w = np.zeros(x.shape[1])
    alpha = 5.0e-7
    epochs = 1000
    w, b = gradient_descent(alpha, epochs, w, 0, x, y)
    predicted_y = np.dot(x, w) + b
    feature_index = 0
    plt.scatter(x[:, feature_index], y, color='blue')
    plt.scatter(x[:, feature_index], predicted_y, color='red')
    plt.xlabel('Feature Value')
    plt.ylabel('Target Value')
    plt.legend(['Actual', 'Predicted'])
    plt.show()
Anything larger than this value fails
Gradient descent has been known to diverge instead of converge if the learning rate is too large.
I'd advise you to step in a debugger if you're unsure, what's happening is that w or b gets either too large or too small based on your learning rate.
Yes, I tried debugging: the weights become large and eventually turn into NaN. Then I tried playing around with the learning rate, from 1e-10 to 5e-7, multiplying it by 3 each time to find the limit. It turned out that 5e-7 is a sweet spot. I even plotted J against the number of iterations to see how fast or slow it is converging. Is this a common problem? Does it depend on the format of the data (like the actual values, or even more trivial things like the number of elements)?
It's a common problem yes, I wouldn't be able to explain it better than this link: https://stats.stackexchange.com/questions/315664/gradient-descent-explodes-if-learning-rate-is-too-large?noredirect=1&lq=1
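To connect this to the data in the question: the first feature is in the thousands while the others are small, and for a squared-error cost the largest stable learning rate shrinks roughly with the square of the feature scale. A sketch of the usual remedy, z-score normalisation, using the same x and y as above. The helper max_stable_alpha is written for this example (it is not a library function) and relies on the standard bound for a quadratic cost.

```python
import numpy as np

# Same data as in the question above.
x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]], dtype=float)
y = np.array([460, 232, 178], dtype=float)

def cost(w, b, x, y):
    err = x @ w + b - y
    return float(err @ err) / (2 * len(y))

def gradient_descent(alpha, epochs, x, y):
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        err = x @ w + b - y              # vectorised version of the double loop
        w = w - alpha * (x.T @ err) / len(y)
        b = b - alpha * err.mean()
    return w, b

def max_stable_alpha(x):
    # For this quadratic cost, steps stay stable only while
    # alpha < 2 / (largest eigenvalue of A^T A / m), with A = [x, 1].
    a = np.hstack([x, np.ones((x.shape[0], 1))])
    return 2.0 / np.linalg.eigvalsh(a.T @ a / x.shape[0]).max()

raw_bound = max_stable_alpha(x)

# z-score normalisation: every feature gets mean 0 and standard deviation 1.
x_norm = (x - x.mean(axis=0)) / x.std(axis=0)
scaled_bound = max_stable_alpha(x_norm)

w, b = gradient_descent(0.1, 1000, x_norm, y)
final_cost = cost(w, b, x_norm, y)
print(raw_bound, scaled_bound, final_cost)
```

With normalised features a much larger learning rate becomes usable. New inputs have to be scaled with the same mean and standard deviation before predicting.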
Thanks for the resource
I love reading books. How can I put my Data science skills to use in analysing a single book. How can I leverage my DS skills to more critically examine let's say Harry Potter and the Prisoner of Azkaban?
Check out spaCy; you could do NLP analysis of a text and, perhaps, extract named entities etc., like finding all the interactions between Harry and Hermione
how can I analyze a text with spacy, and check for similarities between sets of 2 or 3 words rather than whole sentences or individual words?
Are you the same person as zakomayo?
What would make a "set of two or three words" "similar" to another word set?
why would I be the same guy?
I don't know, sentiment I guess. I'm currently using bigrams and trigrams. It works great for certain terms but not so well for others
Sentiment analysis is pretty challenging, so you might already be getting the best performance you can get without upgrading to a more sophisticated technique
gotcha, I'll probably leave it as is, it's working acceptably well so far, it's a coding challenge for a job opportunity
thanks
I once had a job interview where they asked me to do sentiment analysis
On 100k tweets
Is that what you're doing?
oh god, thankfully not
I got rejected 
I got a text paragraph and I have to suggest replacement phrases from a list based on a similarity score
I can approach it however I want, and decided to use spacy because it's what I'm most familiar with
it's working pretty alright, but certain sentences get suggestions that are not so good, and researching I found out about sentiment analysis, but yeah it seems very complex and I'm running out of time
You can use my thing
https://github.com/swfarnsworth/madlibert
thanks man
I may use it as inspiration, but I still wanna come up with my own script, since I'm still learning and I want to understand what I'm doing if I end up getting the job
- spaCy
import spacy
nlp = spacy.load("en_core_web_md") # make sure to use larger package!
doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")
# Similarity of two documents
print(doc1, "<->", doc2, doc1.similarity(doc2))
# Similarity of tokens and spans
french_fries = doc1[2:4]
burgers = doc1[5]
print(french_fries, "<->", burgers, french_fries.similarity(burgers))
Thank you sooo much.
Guys, I need help with my project. Plz this is very urgent. I'm not as well gifted as y'all. I need to solve a real life problem using Business intelligence or machine learning. A unique topic; help me with how I can collect data for it too. Plz guide me programming gods. It could be any problem.
For now I just need to give a problem and analysis on how I'd collect data and try to solve it
thank you!
I like this
I will look into scikit documentation later
Can I use scikit to train images?
Instead of using tensorflow models
Yes and no. Before neural networks people used to run algorithms like SIFT to make variables and then used models such as those found in sklearn but using a neural network is a lot easier and it has way better performance.
Wayyyy easier
Makes me feel I'm not even trying
Thanks man
could anyone throw me a couple of keywords to get me started of evaluating whether two paragraphs contain similar content/ideas?
I'm looking to get started with using AI in Python. I want to write a program that will run PowerShell scripts to look for vulnerabilities in a system and notify you about them. I could write an algorithm to do this manually but I wanna incorporate machine learning to automate the process. However, I have no idea where to start when it comes to working with AI. How should I get started?
Semantic similarity
Cosine distance
thanks!
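For the cosine keyword, a bare-bones sketch using plain word counts and only the standard library. This measures word overlap, not meaning; semantic similarity would normally swap the word counts for embedding vectors (for example from spaCy), which is not shown here. Cosine distance is just 1 minus this similarity.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity("the cat sat on the mat", "the cat lay on the rug"))
print(cosine_similarity("cats", "dogs"))
```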
This sounds insurmountably challenging for a first ai project
It also wouldn't be "more automated" than the program you have in mind
You use AI when the problem to be solved can't easily be expressed as an exact series of steps
I love challenging myself but capability to learn is what I'm most interested in since cyber security is ever evolving
If you want to get started, you should practice data exploration and manipulation with numpy and pandas. So that you understand what "data" is in the context of AI and ML
Admittedly, this is pretty removed from what you want to do
But if you start with trying to classify code examples as malicious or not, I think you'll be completely lost.
I'll start with your recommendations to familiarize myself with the core concepts before I delve into the more complicated things but thank you for your help
While it's on my mind, when I got started with nlp, I had a mentor who insisted that I learn concepts that were above my ability to comprehend at that time, and while I appreciate that he believed in my ability to understand it, I think that stunted my motivation.
I want to work on programming drones(AI oriented) but I have no idea how to start. I don't even have a drone but maybe a simulation can help but how do I start?
I think good starting point would be to figure out algorithm to do it manually to look for unused open ports/services then try to integrate the ai?
Drones are often trained in simulations, but I've never done anything with them. I do language technology.
Yh, alright then, thanks
Everyone is rushing to shoehorn AI into everything, but it doesn't go without saying that a solution involving AI will be better than one that doesn't.
I get what you're saying and I'll keep it in mind. Vulnerability scanning and detection nowadays works through an algorithmic approach using preset flags that, if tripped, notify the appropriate parties. However, if you can figure out these flags you can circumvent them. With an AI approach it's harder to predict what the AI would consider a trigger
You train the AI with appropriate trigger datasets
If the training goes well then you have nothing to worry about. Remember after training, we test our models with a test/new dataset to see the accuracy for ourselves rather than just looking at the accuracy score
I could incorporate some vulnerability databases to learn new triggers as they are discovered, which would help maintain its accuracy as new methods are developed
This works... Nice one honestly
Thank you again for your suggestions. It seems like it helped a lot, and the docs you shared were highly valuable. If you ever have the time, would you have a quick look at the code and see if it actually got improved?
Although it seems to be improved, I do have a few questions.
- It seems like the mae, mse, r2 are similar for all areas. Is the module just running for one area, although the terminal says it is running for all areas?
- The module could not handle the missing data for the parameter that I only have once a month (besides the other parameters, which I have sampled/tested weekly). Any suggestions on this?
- Another question regarding the R^2. It turned out that the R^2 is as low as 0.25, which is quite low. Is this suggesting that I still do not have enough data to run the module?
Again, if you ever have some sparetime, please have a look. And also, thanks to you @desert oar
https://paste.pythondiscord.com/YXKA
I don't have the time this week to look at the code but you can ping me around this time next week.
- I don't understand what you mean by "area"; I think it'll be clear after I read your script.
- How to handle missing data depends on your domain. You might be able to do a left fill or so. Do not do right fill or linear interpolation in time series as you might leak data.
- I never use R^2 for prediction problems personally. It's unlikely but possible that it's low because there is a non-linear relationship and the model is actually good; R^2 doesn't account for this.
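On the missing-data point, a short sketch assuming "left fill" means forward fill (carry the last known value forward) and "right fill" means backward fill. The weekly series is invented for illustration.

```python
import pandas as pd

# Hypothetical weekly series with gaps.
weekly = pd.Series(
    [1.0, None, None, 4.0, None],
    index=pd.date_range("2023-01-01", periods=5, freq="W"),
)

# Forward fill only uses values that were already known at each date.
filled_forward = weekly.ffill()

# Backward fill copies values from the future into earlier rows, which is
# how information can leak into the past of a time series.
filled_backward = weekly.bfill()

print(filled_forward.tolist())
print(filled_backward.tolist())
```

Linear interpolation has the same problem as backward fill, since each filled value depends on the next observation as well as the previous one.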
Thank you for your time. I’ll ping you later when I know you have more time.
As someone in security I'd second skipping the AI for this task.
almost all successful applications of "AI" on specific applications like this turn out to be more like a handcrafted combination of machine learning, heuristics, and statistics. in particular, figuring out how to actually represent your data in some way that you can actually run machine learning models on it is usually the most important thing you can spend your effort on. that process is often known as "feature engineering". as you might imagine, constructing useful features is very often a matter of understanding the problem domain and starting from a position of "how would a human look at this"?
If anything data science to help identify how often certain systems or types of systems need attention and how that affects labor costs and why you should maybe charge more/less for different system types when integrated into a vuln management program
You made progress... This is nice
@serene scaffold you on?
I just wanted you to check out this https://medium.com/@TheUndergrad/introducing-gemini-chat-app-your-conversational-companion-17d9cdb3eaac
:)
That is excellent to hear. Awesome ^^
they are different outputs
"Type a number: " should be in the input call
and then you should just print even or odd
o really?
also this is probably not the channel for this lol
ohh I sent it here because there's more active people in here compared to the other channels
just needed some little help and i appreciate that you helped me
no problem 👍
hey guys, after i performed train test split, and one hot encoded my train and test data, i wanted to put them into the regressor to evaluate the model's performance. but i've been stuck on this error for a long time. do need urgent help with this!
Sup how long do u guys think one should focus on eda+ supervise +unsupervised learning. The problem is its obvious that no one can master it in small amount of time but I can't get stuck over there for long period of time either. So advice me when to move and should deep learning be my next goal.
A year
Can anyone help me do my assignment ?
It's all in python but I don't really know what's going on since I'm new to it
It won't be complicated to you guys but it's entire another language for me.
@edgy pasture this is the data science and AI channel. Is it about that?
In either case, be sure to never ask to ask. Ask your actual question. Not if someone is willing to answer a question that you haven't revealed yet.
yea
its a mixture of datascience and python
could we hop on call, cause itll be easier to say what its about
We cannot
You new to py?
Yeah
Just fall in love with python and then it’s ez
yeah i think i did fall in love because now i understand it a little and its really fun messing around with it (trial/error)
W
I’ve been hanging out c# and html just trying to make stuff, I’ve been studying
Kinda fun, I agree with you
What initially got you into wanting to study Python?
Honestly my friend she’s likes pretty good at it and I was like woah future hacker 😆
onbb !
What language does she do?
on my what
python she’s like new as well but she made it more enjoyable
onb = on bro
You a dude right?
You lot tryna learn together
I’m a beginner too, but I need a group to grow with
Hello, I'm new writing in this section. I'd like to have some recommendations on a problem I'm trying to solve. As a context, I'm trying to solve it with Deep Reinforcement Learning.
The task is to control the activity of some fans [on / off] (in this case 3, all with the same flow rate of 95 m3/h) connected to a box that has a heater inside (currently it always runs at 100% capacity, which is 1 kW).
The current set of actions, with 3 fans, are numbers from 0 to 9, mapped as: 0 -> do nothing; 1 -> {0, 0, 0}; 2 -> {1, 0, 0}; 3 -> {0, 1, 0}; 4 -> {0, 0, 1}; 5 -> {1, 1, 0} .... 9 -> {1, 1, 1}, where {x, x, x} represents the on [1] / off [0] state of fans 1, 2 and 3.
The ambient temperature may or may not be hotter than the target temperature wanted inside the box.
Currently, the simulator I built with my colleague uses the basic heat transfer equations, without considering that faster wind lowers the entrance temperature.
The ambient temperature is obtained every second, and the change in internal temperature is calculated every 0.01 seconds (a simple interpolation is made to obtain the external temperature at each "dt"). The steps are every 2 seconds (might change in the future), which means that the algorithm has to take a decision every 2 seconds. There is no penalty for turning fans on/off consecutively (like a kid with a light switch), yet.
The values available for the NN are: Ti (internal T), Tt (target T), Te (external T), A_t1 (action in last step), Delta Ti (change in temperature in last step) and Dt (step size in seconds).
Here are my questions:
- If my intention is to keep the temperature near the target temp. Which would be a good q-function?
- Right now, I'm just using a Sequence Neural Network (SNN), with some "relu" activation functions and a linear activation function to obtain the q-function estimate as an output. Any recommendation on how deep or wide the NN should be?
- Would it be possible / wise to try to use an RNN? I would think that the hidden state would have some intrinsic information about how the external temperature has been changing over time.
- If I had to add more fans, the number of states would grow as 2^n + 1. Any advice on how to tackle this "curse of dimensionality"?
- If the fan state were continuous... Any idea how to approach it?
Thanks in advance for any idea, suggestions, questions
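Not an answer to all of the questions, but a small sketch of two pieces, under assumptions: the first question is read as asking for a reward signal (here simply the negative distance to the target temperature), and the action table is built programmatically so it scales to any number of fans. Both function names are made up for this example, and the ordering of combinations comes from itertools.product, which differs from the ordering listed in the message above.

```python
from itertools import product

def build_action_table(n_fans: int):
    """Index 0 means 'do nothing'; the rest enumerate every on/off combination."""
    return [None] + list(product((0, 1), repeat=n_fans))

def reward(internal_temp: float, target_temp: float) -> float:
    """Zero when exactly on target, increasingly negative further away."""
    return -abs(internal_temp - target_temp)

actions = build_action_table(3)
print(len(actions))   # 2**3 combinations plus the 'do nothing' action
print(actions[1], actions[-1])
print(reward(20.0, 22.0))
```

The table still grows as 2^n + 1. One alternative to consider for many fans would be a network with one on/off output per fan, so the output size grows with n instead of 2^n, though whether that trains well on this simulator is an open question.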
Well we not learning together exactly. She taking cs class and I’m learning python my self with some free courses
what makes it enjoyable is that whenever she gets like an exercise or project that she’s having trouble with, I can assist and it js makes coding hella fun imo
Yeah
We could be one but really it’s up to her
lmao, you like designing?
nvm then g
it's cool, you enjoy yourself
Nah we could be duos
Designing?
Hey guys I did this EDA project a while back and since I have been kind of stuck about how to improve my EDA skills, it would be helpful if you could point out some specific drawbacks in this project which would help me improve my skills. https://www.kaggle.com/code/omraizada/exploring-global-terrorism
it's cool, you can stick to your duo innit
Dropping all the NA values in cell 12 without any analysis on them isn't great. Same goes for the duplicates in cell 17 and the outliers and so on
So would it be appropriate to use something like heatmap to visualize it first?
The actual EDA, at a glance, looks good. You're asking relevant questions, providing detailed answers with context outside of the scope of your dataset and so on
Just looking at the data, scrolling through it can be really helpful
Like getting those duplicates indexes and really looking at the rows and seeing what's up with them. If they're truly duplicates you can throw them away but also show the reader that they are
Okayyy, will make sure to do these things from now on
Thanks a lot for your time
Removing data is a very drastic decision so always motivating why you're doing it is always great, for the rest you're doing a good job, keep it up!
Will keep that in mind, once again thanks
how large do I have to make GPT to get interesting results ?
i'll echo that removing outliers is drastic and needs to be carefully motivated. are they actually anomalous events in some way that might warrant removing them from the analysis? or are they legitimate values that happen to occur at the tail of the distribution?
this is good
i'd like to see it on log scale as well
currently the big spike dominates the graph, which is good: it tells a clear story, there is a huge increase compared to a global baseline
but there might be a secondary story which is hidden by the scale
one thing i wonder about is measurement methodology. what defines a terrorist attack? who collected this data? has the methodology or definition changed over time in a way that might affect the data?
i'll also echo that the "asking questions" section is excellent, both in concept and in execution
can anyone explain the [index//2] part for the skip connection? the model is unet:
```py
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

# DoubleConvolution (the double conv block) is not shown here


class UNET(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, features=[64, 128, 256, 512]):
        super().__init__()
        self.ups = nn.ModuleList()
        self.downs = nn.ModuleList()
        self.pools = nn.MaxPool2d(kernel_size=2, stride=2)

        # DOWN PART OF UNET
        for feature in features:
            # creating down sampling layers - adding every feature output
            self.downs.append(DoubleConvolution(in_channels, feature))
            in_channels = feature  # becomes input to next Conv

        # UP PART OF UNET
        for feature in reversed(features):
            # double width of image
            self.ups.append(nn.ConvTranspose2d(feature * 2, feature, kernel_size=2, stride=2))
            self.ups.append(DoubleConvolution(feature * 2, feature))

        # 512, 1024
        self.bottleneck = DoubleConvolution(features[-1], features[-1] * 2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        skip_connections = []
        for down in self.downs:
            x = down(x)  # downsampling tensor
            skip_connections.append(x)
            # pass through max pooling
            x = self.pools(x)

        x = self.bottleneck(x)
        # REVERSING LIST FOR UPSAMPLING
        skip_connections = skip_connections[::-1]

        # up, double conv
        for index in range(0, len(self.ups), 2):
            # upsample with the transposed conv
            x = self.ups[index](x)
            # skip connection - div due to step 2
            skip_connection = skip_connections[index // 2]
            if x.shape != skip_connection.shape:
                x = TF.resize(x, size=skip_connection.shape[2:])
            concat_skip = torch.cat((skip_connection, x), dim=1)
            # running through double conv
            x = self.ups[index + 1](concat_skip)

        return self.final_conv(x)
```
So it seems that you are computing two up layers for each iteration. And you have one skip connection every two layers.
Layer 1
Layer 2 -> 1 (skip connection index)
Layer 3
Layer 4 -> 2
Layer 5
Layer 6 -> 3
Yes sir
The code I was writing did not have 2 elements in self.ups before. I thought ups was only 4 elements long, so did not know mathematically how that was working
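To make the indexing above concrete, here is a tiny sketch that uses strings as hypothetical stand-ins for the real layers (no torch needed). It only shows that `self.ups` ends up with 8 entries for 4 feature sizes, so stepping by 2 and dividing by 2 walks the 4 skip connections in order:

```python
# Hypothetical stand-ins: strings instead of real layers, just to show the indexing.
features = [64, 128, 256, 512]

ups = []
for feature in reversed(features):
    ups.append(f"ConvTranspose2d({feature * 2}->{feature})")
    ups.append(f"DoubleConvolution({feature * 2}->{feature})")

# One skip connection per encoder level, deepest first (like skip_connections[::-1]).
skip_connections = [f"skip_{feature}" for feature in reversed(features)]

pairs = []
for index in range(0, len(ups), 2):
    # index runs 0, 2, 4, 6 so index // 2 runs 0, 1, 2, 3
    pairs.append((index, index // 2, ups[index], skip_connections[index // 2]))
    print(index, "->", index // 2, ups[index], skip_connections[index // 2])
```

So the even slots of `ups` are the upsampling layers, the odd slots are the double convs, and `index // 2` just counts which decoder level you are on.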
if I want to ask something pandas-related, is data-science-and-ai the right tag in the help section?
Yes I'd assume so
class SelfAttentionHead(nn.Module):
    def __init__(self, params: ModelParameters):
        super(SelfAttentionHead, self).__init__()
        self.d_k = params.word_vector_size // 3
        temp = []
        for _ in range(3):
            proj = make_parameter(size_x = params.word_vector_size, size_y = self.d_k)
            bias = make_parameter(size_x = 1, size_y = self.d_k)
            temp.append(proj)
            temp.append(bias)
        self.q, self.q_bias, self.k, self.k_bias, self.v, self.v_bias = temp

    def forward(self, sequence: torch.Tensor):
        q_vectors = self.q_bias + self.q @ sequence
        k_vectors = self.k_bias + self.k @ sequence
        attention_scores = q_vectors @ k_vectors.T
        # d_k is a plain int, so torch.sqrt(self.d_k) would raise; use ** 0.5 instead
        attention_scores /= self.d_k ** 0.5
        # softmax over the key axis (assuming rows index queries)
        attention_scores = torch.nn.functional.softmax(attention_scores, dim=-1)
        v_vectors = self.v_bias + self.v @ sequence
        return attention_scores @ v_vectors
I'm implementing nano gpt, and one thing that surprised me is that the Q,K,V matrices end up reducing the dimension of the embedded token
Which kinda ruins the intuition I've been reading about how all this is based on a sort of modified dot product for similarity.
The normalization is also quite strange. The normalization layers go like (v - E(v))/std(v)
Which does scale their size so as not to let them explode in value, and also centers them at 0. But I don't see much of an intuition when thinking of word embeddings as living in some dot product space as suggested by a lot of resources online
I'd imagine a better normalization would be, actual normalization, v / norm(v)
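One thing that may help with the intuition: the z-score version also fixes the length of the vector, just at sqrt(d) instead of 1, so it behaves like "center, then L2-normalize" up to a constant. A quick numpy check, using a made-up vector and leaving out the epsilon and the learned scale/shift that a real layer norm has:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(loc=3.0, scale=5.0, size=48)  # made-up "embedding", d = 48

z = (v - v.mean()) / v.std()   # z-score, as in layer norm (no eps, no learned scale/shift)
u = v / np.linalg.norm(v)      # plain L2 normalization

print(np.linalg.norm(z), np.sqrt(v.size))  # equal up to floating-point error
print(np.linalg.norm(u))                   # 1 up to floating-point error
print(z.mean())                            # 0 up to floating-point error
```

The difference from plain `v / norm(v)` is the centering step, which removes the component along the all-ones direction before rescaling.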
Furthermore, why is positional encoding needed ?
Couldn't the network pick up the position of each word via the literal position of the word vector in the matrix that represents the sequence ?
Hey guys, I am a Computer Science student and I want to learn Machine Learning, AI. So right now, I know a bit of Python and 7th grade Maths. I would be really glad if you can provide me with a super detailed roadmap on how to learn these stuff and finally land a job.
I don't wanna invest my time on learning something which is currently not of the most priority.
Thank You~~~
I do believe you need more math, just so you know what you are doing.
I'd reckon you should have:
- calculus I and II (important to understand gradient descent and why neural nets work at all)
- linear algebra - this is the basis for pretty much anything that is both high dimensional and linear and is like a language that you use to talk about all sorts of things, so it's pretty useful
- multivariate calculus - this is like joining points one and two, and is where neural nets reside I'd say. This is where the concept of gradient resides, which you need to understand gradient descent etc.
- stats and probability - neural nets can be thought of as statistical models, and you use statistical tools to evaluate their performance etc.
Signal processing concepts are also super useful. So like knowing what is a Fourier transform, knowing about kernels, knowing about DFT, knowing how to understand data, manipulate it, etc
I'd say, once you know all this stuff, and you are good with python, picking up the ML frameworks and just start building things is enough to get you going.
Are these stuff covered in grade 12 maths??
Uhm, I believe I covered all these in my first year college.
I'd highly recommend finding time for a college education if possible. If not, at least complementing the math til 12th grade and try to cover these subjects over time.
Do you guys recommend using pydot and GraphViz for visualization? Not sure if it's relevant but I am using Python in VS Code
I've never used GraphViz directly, only things built on top of it, same for Pydot so I can't comment on how good they are.
Personally I use:
- Matplotlib for straightforward things.
- Seaborn for things that are a bit more work in Matplotlib "natively"
- Plotly if I want interactive plots.
Seaborn is built on top of Matplotlib and honestly, if you want to learn Seaborn you need to know the basics of Matplotlib, it makes your life so much easier. Matplotlib has a very strange API, so this is a must-read; if you go over it, it'll all make sense in like half an hour or less 😄 https://matplotlib.org/stable/users/explain/quick_start.html. The "anatomy of a figure" section is critical to understand. This is also interesting because this is the code that actually makes the figure in question https://matplotlib.org/stable/gallery/showcase/anatomy.html.
Seaborn's documentation is also great, I'd block out an hour or two to read it after you're familiar with Matplotlib.
I am a bit familiar with matplotlib but I guess I still have a lot more to learn about it. But I find it a little tricky to do quick modifications to make the visualization look "beautiful". Some of my coworkers are pretty good at Origin but I don't feel like having to use another piece of software like that. I tried Origin once and I felt I was back to SPSS somehow.
Seaborn doesn't fit what I will be doing, as I am currently going to create a path diagram for my structural equation model.
Okay, that's good you mentioned this. Personally I have not done SEM, but I know enough of it to know I wouldn't make those plots in Matplotlib. For better or for worse, this is one of the times I'd reach for R because they have better tools in this space, e.g., TidyGraph and SemPlot
My brother will walk me through the basic of R language. I also tried it a couple of times after my jupyter notebook limited my work. Going from jupyter notebook to another python software is a big step. haha
For R I'd really just focus on learning how to do stuff and not necessarily being a competent R programmer. Treat it like a statistics and data visualisation toolkit. I love Python but R is better at both.
(Not) using notebooks is a surprisingly long and nuanced debate, I'm on mobile so I'll summarise it by saying that you should be able to code outside of an interactive session (Jupyter, Spyder, IPython, RStudio) at the very least yes
Python's my jam, but I'll give R a shot when I can. It's not that long ago that I started learning coding in general, so I don't want to overload myself besides my research. It will come smoothly along the way. Too much of each language may just confuse me.
The things that you're doing are already quite complex - might be a good idea to practice Python in isolation as well
guys I wanna ask how to install conda for python and how to run Jupyter notebook locally on my Windows machine. I'm learning data analysis with python from a website called Jovian dot com but I couldn't save my work online. if anyone can explain this to me, and also: is it worth learning python basics from this course? another point, the same course is available on freecodecamp
It's pretty straight forward. Just download the Anaconda Distribution. https://www.anaconda.com/download
Once you've done that, it brings alongside all its friends like Jupyter Notebook to the party.
Meanwhile, can you add more clarity on the "I couldn't save my work online" part.
Is it that you were using Colab or Binder to run your code?
Akash is one of founders of Jovian. His work has been featured in FreeCodeCamp so I believe his python course will be on point!
As you already know, we humans don't always like similar stuff... So I think what you should focus more on is finding out for yourself if that particular python course in Jovian is 'customer-friendly' to YOU.
Only way to find out is to try taking a few chapter of the course with an open mind.
And If it's hard for you to understand what's being taught or you find yourself sleeping off while watching the video (I presume it's a video course), then by all means don't hesitate to drop it and try another course.
Thank you I appreciate your answer
Is it better to use miniconda and How to run Jupyter locally from online course?
If you’re interested in visualizations, you should check out some of the great stuff in #1180191057498083418
Hello guys i want to make recommendation model based on the credit card data and one of the column is df['Reward rates']
which have data like this:
rows 1: '6X 6x Marriott Bonvoy point dollar eligible purchase hotel participating Marriott program 4X 4x point purchase made restaurant worldwide gas station wireless telephone service purchased directly service provider purchase shipping 2X 2x point eligible purchase'
row 2: '7X Earn 7X Hilton Honors Bonus Points dollar eligible purchase charged directly hotel resort within Hilton portfolio 5X Earn 5X Points per dollar purchase restaurant supermarket gas station 3X Earn 3X Points eligible purchase Card'
row 3: '12X Earn 12X Hilton Honors Bonus Points dollar eligible purchase charged Card directly hotel resort within Hilton portfolio 6X Earn 6X Points dollar purchase Card restaurant supermarket gas station 4X Earn 4X Points dollar Online Retail Purchases 3X Earn 3X Points eligible purchase Card'
'3 3 Cash Back supermarket per year purchase 1 3 3 Cash Back online retail purchase per year 1 3 3 Cash Back gas station per year 1 1 1 Cash Back purchase'
'12X 12X directly hotel resort Hilton portfolio 6X 6X Select Business Travel Purchases 3X 3X eligible purchase Terms Limitations Apply'
now I'm applying many NLP techniques to extract meaningful data, but either I can't get relevant features to train a model on, or too many columns get created if I use TF-IDF and n-grams. any help will be appreciated.
I've always used anaconda but some people also prefer miniconda. So you'll be fine with either one.
Yes you can run your code on JNB locally with the online course.
https://stackoverflow.com/questions/45421163/anaconda-vs-miniconda
You could also just go with a normal python install and use vs-code IDE with jupyter extension to work with notebooks.
one thing i do is this: for every row, creating separate columns based on the ?X values
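A rough sketch of that "split on the ?X values" idea using only the standard library. The helper name `split_rewards` and the regex are just assumptions for illustration, and it only handles rows that actually contain tokens like `6X` / `6x` (the cash back row has none, so it would come back empty):

```python
import re

# Matches reward multipliers such as "6X", "6x" or "12X" as whole tokens.
MULTIPLIER = re.compile(r"\b(\d+)[xX]\b")


def split_rewards(text):
    """Return {multiplier: description} for the multiplier tokens found in text."""
    # With one capture group, split() returns [before, mult, after, mult, after, ...]
    parts = MULTIPLIER.split(text)
    rewards = {}
    for mult, chunk in zip(parts[1::2], parts[2::2]):
        chunk = chunk.strip()
        if chunk:  # skip the empty gap between a repeated "6X 6x"
            rewards.setdefault(int(mult), []).append(chunk)
    return {mult: " ".join(chunks) for mult, chunks in rewards.items()}


row = ("6X 6x Marriott Bonvoy point dollar eligible purchase hotel participating "
       "Marriott program 4X 4x point purchase made restaurant worldwide gas station "
       "wireless telephone service purchased directly service provider purchase "
       "shipping 2X 2x point eligible purchase")
result = split_rewards(row)
for mult, description in result.items():
    print(mult, "->", description)
```

Each multiplier then becomes a (rate, category text) pair, which is a much smaller thing to turn into features than the raw string.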
So mostly I need to focus more on the nuances of the data and be more careful while processing the data, right? And code wise is it fine?
Hey I was just curious, are there any widely used models using stochastic calculus?
There are several ways to control the number of extracted features gotten by TfidfVectorizer.
Personally, in most of my work I always use ngram_range = (1, 2) to consider both unigram and bigrams in the final features tfidf extracts.
For every other parameter I experiment, experiment, and experiment before settling for the configuration that yields the optimal result.
The documentation will do better justice than I can in explaining what each parameter in TfidfVectorizer does.
i see but then so many other columns are created like almost 60-70 based on the data.
see using ngram of (2,2) i get this first
see
for single row too many ngrams created
What really determines the final number of features extracted is the configuration used when instantiating your vectorizer.
So if you want tfidf to extract just the top 100 features (100 new columns), it still can do that.
So it depends on how you configure the parameters.
Make Sense, let me try and also some different approaches.
I don't think I've studied stochastic calculus. But probably related are the usage of monte Carlo methods. Quite sure GPT uses a simple one to generate its output. GPT itself approximates a probability function, which is then used to sample tokens. I've also read very briefly about some networks that use probabilistic activation. And then there's quantum learning, which is inherently stochastic because quantum is all about probabilities.
it's not that you need to focus on those things in this particular project. it's more of something to keep in mind as you progress into professional work. you can't ignore the measurement and data collection procedure as part of the data generating process.
what monte carlo method is used in a GPT model? i always thought the text output from ChatGPT etc was just a next-word prediction given the context
usually "monte carlo" methods refer to computational techniques for approximately computing otherwise-intractable quantities, very often integrals. they are frequently used in Bayesian statistical inference and probability modeling to sample from a posterior distribution
the underlying theory of monte carlo computing techniques is indeed that of stochastic processes, eg. markov chain monte carlo (markov chain being a particular type of stochastic process)
It's an assumption of mine because a lot of the explanations I've read talk about chat GPT as a conditional probability distribution, given the previous tokens it outputs the probability for each token to occur as the next token.
No that's one of its applications. Monte Carlo methods is an umbrella term that encompasses computational techniques to perform pseudo random sampling
right
that's why i am wondering where and how it shows up in the GPT language model
i get that the output is a stochastic process, but that doesn't strike me as a monte carlo method except in a very generic sense
Well, I'm only going into detail about this architecture now, so I haven't gotten to the end. But the way I imagine it is: it outputs an array of probabilities, one for each token, and you use that to sample the next token.
yeah, that's a stochastic process
the "state" is the current context and the state transition function is the probability distribution over the next token
or something like that anyway
So the computational method you use to generate the token would be a monte Carlo method, albeit a simple one
that's a broader definition of monte carlo methods than what i'd use and typically see, but i understand what you mean
if you did something like repeatedly generating multiple outputs over and over using the same prompt, in order to compute some statistic or distribution over those outputs, i'd say that's more like a monte carlo method
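A small sketch of that distinction, with a made-up 4-token vocabulary and made-up probabilities standing in for a model's output: one draw is just random sampling, while many draws used to estimate a quantity is closer to the usual meaning of Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat"]
probs = np.array([0.5, 0.2, 0.2, 0.1])  # made-up next-token distribution

# One draw: plain random sampling of the next token.
token = rng.choice(vocab, p=probs)
print("sampled token:", token)

# Many draws used to estimate a statistic: the Monte Carlo flavour.
draws = rng.choice(len(vocab), size=100_000, p=probs)
estimate = np.bincount(draws, minlength=len(vocab)) / draws.size
print("estimated probabilities:", estimate)
```

Here the estimate just recovers `probs`, which is pointless on its own, but the same loop works for statistics you cannot read off directly, such as the distribution of output lengths.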
but maybe my interpretation is too narrow
Uhmmmmm, yeah I see what you mean. You'll usually be doing that yeah. So ig you'd just call it random sampling.
I mean is not so clear cut
In case of GPT, the underlying statistic/shape of distribution does matter a lot
But idk, I don't like to get too caught up in the definitions
Hi i have a bunch of annotated images and i want to make a python ai model that trains with those images so that it can detect the image from a given picture
can someone show me a good course or where i can get started
Wikipedia has an interesting passage about the possible definitions, I think this one is what aligns with the way I use it:
"""
Monte Carlo simulation: Drawing a large number of pseudo-random uniform variables from the interval [0,1] at one time, or once at many different times, and assigning values less than or equal to 0.50 as heads and greater than 0.50 as tails, is a Monte Carlo simulation of the behavior of repeatedly tossing a coin.
"""
So like, GPT would take the place of the [0, 1] distribution and the simulation would be the simulation of the behaviour of a person writing some text message.
This would mean that it's not just the last step, the whole thing would be a monte Carlo simulation.
The cutoff line is arbitrary, you may have multiple simulations interacting, but each must contain some random or pseudo-random generation that affects the output.
Monte Carlo is very broad.
Oh and repeat runs too*
I didn't quite understand your point. The passage I'm mentioning is making a distinction between monte Carlo, simulation and Monte Carlo simulation.
Simulation: Drawing one pseudo-random uniform variable from the interval [0,1] can be used to simulate the tossing of a coin: If the value is less than or equal to 0.50 designate the outcome as heads, but if the value is greater than 0.50 designate the outcome as tails. This is a simulation, but not a Monte Carlo simulation.
Monte Carlo method: Pouring out a box of coins on a table, and then computing the ratio of coins that land heads versus tails is a Monte Carlo method of determining the behavior of repeated coin tosses, but it is not a simulation.
Monte Carlo simulation: Drawing a large number of pseudo-random uniform variables from the interval [0,1] at one time, or once at many different times, and assigning values less than or equal to 0.50 as heads and greater than 0.50 as tails, is a Monte Carlo simulation of the behavior of repeatedly tossing a coin.
Uhm, no I think it would fall into the first one
Even accounting for the auto regression, the end result is one sample
Are you not drawing a large number?
It's technically a single sample from one distribution I think.
Btw, the answer is no, but it does make it more difficult for the network.
It can effectively learn to do what the position encoding does.
(But it's a waste, just neat that it can)
Oh okay I see, you give it a hint for how to represent position so it's more efficient to train it
What about this stuff tho, aren't we meant to think of embeddings as part of a vector space
Similar to other methods that precompute stuff and feed that as input rather than just the inputs directly, puts less burden on the network.
Especially functions that the network would require a lot of neurons to compute itself.
Yeah I didn't think of it that way, I had the impression that positional encoding was obligatory.
Or the extreme end of that, precompute a ton of random functions on the inputs, then at the end have a simple linear layer.
(Which is its own model of computation being researched, pulling answers out of chaos, one of those functions surely has the answer by chance)
hi is there a situation where the following is true
(ndarray * scalar // scalar) != ndarray
im currently facing this situation and not sure what could have caused it
shape of my ndarray is (39584,) single dim array
my scalar is 8 tho
Uhm is still possible I think, what's the smallest value in the array
0
Better yet, you can directly print the ones that are different
And try to see a pattern
left side is original ndarray, right side is after * and //
some of the bytes are short by 96 (hex 60)
Uhm, would be easier to see base 10
What is really cool is what it learns is pretty much exactly the positional encoding again, and behaves pretty much exactly the same as grid cells in biology (used for positioning in the brain). It seems that nature has converged to the same thing.
You mean it learns the sine wave stuff described in the 2017 paper ?
Yeah that's pretty cool
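For reference, a minimal numpy sketch of the sine/cosine encoding from the 2017 paper: even dimensions get sin(pos / 10000^(2i/d)), odd dimensions get the matching cos. The function name is made up, and it assumes `d_model` is even:

```python
import numpy as np


def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encoding; assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]      # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # the "2i" indices, shape (1, d_model / 2)
    angles = positions / np.power(10000.0, even_dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe


pe = sinusoidal_positions(seq_len=8, d_model=16)
print(pe.shape)
print(pe[1, :4])
```

Each pair of dimensions is a wave with a different frequency, which is the "several kinds that repeat at different frequencies" picture.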
It's also related to grid cells, and the Fourier Transform (grid cells act like one).
woah not sure if theres a way to do that in hxd
Uhm from what I recall grid cells sort of create a map of repeated circular shapes that repeat across space. You'd have several kinds which repeat at different frequencies and that's how it kinda encodes position
Yes.
Place fields, which are built with grid cells, also kind of show up (we know less about them, so can't really tell yet) in Transformers, when the context switches you can see the attention remapping.
(The context remapping place fields behavior)
Doesn't this at least point to GPT having "understanding" similar to our own ? Since it's using similar ways of representing things
It's closer than before. It's more like we are reinventing what biology has done, without realizing it, because we are also getting in new information about the biology at the same time.
Do you know how they are doing vision ? Is it a literal part of the input or is it part of a different network ?
Like, a transcription from image to text and then that gets fed to gpt ?
Are you familiar with https://en.m.wikipedia.org/wiki/Chinese_room?
It’s a philosophical debate over what ‘understanding’ means
A lot of what happens in the brain revolves around this positioning system / place fields, so it's probably needed for all future networks.
Vision has multiple sections in the brain. That is a lot of just parallel processing to simplify it for the rest of the brain.
Yes, the system as a whole has understanding even though none of the parts does. That's also what we are, no neuron understands what I do even tho I'm neurons.
No, the only people I know using stochastic calculus are actuaries. You don't need this.
My eli5 is. The argument is that a such a system cannot be described as ‘understanding’. It’s a good read/but purely philosophical point
Oh I mean in the new multimodal LLMs like GPT4V
Vision, like all other systems in the brain is heavily reliant on top down observer expectations, it's how you see things even if they are noisy, and also things that don't exist, like imaginary edges.
They are not accurate, because the whole system needs to be learned together.
Each system can affect the top-down effect on another.
I come up against the edges of it in fintech but I run away when I see it. But Black Scholes in particular.
Like priming someone with an auditory cue, which affects what they see.
That would imply that no human has true understanding. I've come to accept that the world is weird and counterintuitive.
GPT-style models can be decoded with beam search (though plain sampling from the distribution is also common).
You have a problem like this: maximize the likelihood over a sequence. Naturally you shouldn't always pick the greedy option: you might pick 4 suboptimal tokens first and then the rest in a greedy fashion, and end up with a higher likelihood at the end.
You can have exact solutions here using BFS/DFS but it's intractable.
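A toy sketch of why greedy can lose, with a made-up two-step "model" (the `NEXT` table is entirely hypothetical). Beam width 1 is greedy decoding; width 2 keeps the second-best first token around and finds the more likely sequence:

```python
import math

# Hypothetical toy model: next-token probabilities given the prefix so far.
NEXT = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(beam_width, length=2):
    """Return the best (sequence, log-probability) found with the given beam width."""
    beams = [((), 0.0)]
    for _ in range(length):
        # Extend every kept prefix by every possible next token.
        candidates = [
            (prefix + (token,), score + math.log(p))
            for prefix, score in beams
            for token, p in NEXT[prefix].items()
        ]
        # Keep only the beam_width highest-scoring prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_seq, greedy_score = beam_search(beam_width=1)
beam_seq, beam_score = beam_search(beam_width=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

Greedy commits to "a" because 0.6 > 0.4, but the best full sequence starts with "b" (0.4 * 0.9 = 0.36 versus 0.6 * 0.55 = 0.33).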
If trained together, the language modeling part would actually ground its symbols better to visuals and such, making it have an actual understanding of the world. Via just text is too narrow.
Read the wiki page, the argument is not so easily discarded
A big one is touch, specifically how positioning systems interact with that and model objects / spaces, and link that to stuff like words, sounds, etc.
However, this interconnected training problem is really hard, because you need all inputs coming in at the same time, you can't just train each part separate.
You can make an arbitrary model multimodal by doing this:
Language model A has an embedding space a.
Vision model B has an embedding space b.
Train a translation "network" c that maps a to b and vice versa.
There's been a large amount of research doing this. They take pretrained vision and language models and just train the translation/mapping network. You would need a training task that accurately allows you to learn this though, for instance image captioning may work to train this translation network.
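The simplest possible version of that translation "network" c is a single linear map fitted on paired embeddings while A and B stay frozen. A numpy sketch on synthetic data, where random vectors stand in for the outputs of the two pretrained models (real setups usually train a small neural net with gradient descent instead):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim_a, dim_b = 500, 32, 16

# Synthetic stand-ins for the frozen models' embeddings of paired data
# (e.g. a caption embedded by A and its image embedded by B).
emb_a = rng.normal(size=(n_pairs, dim_a))
hidden_map = rng.normal(size=(dim_a, dim_b))  # unknown in practice
emb_b = emb_a @ hidden_map + 0.01 * rng.normal(size=(n_pairs, dim_b))

# Fit only the translation a -> b; the "encoders" are never updated.
W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)

error = np.mean((emb_a @ W - emb_b) ** 2)
print(W.shape, error)
```

The point is only the structure: the two embedding spaces are fixed, and the only thing being learned is the mapping between them.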
Just plain text is about as convenient as it gets.
One thing to also note about training them separately is that you're doing a lot of redundant work; when interconnected during training they can make the learning processes faster for each other.
Agreed
But if your compute budget or dataset isn't massive I prefer freezing the language and vision models and just training the translation
Yeah, having separate models right now is the convenient, still-kind-of-works approach.
It's not ideal, but works decently well.
But multi-task learning has been shown to improve generalization and data efficiency in theory yes. I typically comment from the "practical" perspective 😄
But, if your models are online learners, now you can do some cool stuff in post. You can hook them up more directly and have them learn more together without disrupting each other's knowledge.
(Biology uses online learning, in part because it really needs to not disrupt existing stuff, especially while still growing)
I don't even know what black scholes is (anymore). I'm an alumnus of the faculty of economics and business (not CS) but after I finished my masters I purged everything 😩
(Interestingly, Transformers become more online-like the more they have trained)
I've read the system reply section for refutation. I don't think this thought experiment proves or disproves any side, it just brings to light how ignorant we are about consciousness.
I don't take a strong side, I just try to err on the side of caution so that I can act in an ethical manner in face of ignorance. We don't know how it works, so we should be careful when something starts acting conscious, otherwise we may inadvertently cause suffering.
I used to know though, we learnt about it in some math or finance class.
Agree, it’s just a philosophical debate… worthy of thinking through both sides
Oh that's pretty cool
Well yes, but I think about it a lot because it's a dangerous thing for us to be ignorant about.
And it seems like we won't have answers for a long time
My default stance to AI safety is that our current approach is bad
It's always philosophy 😩
issue was uint8 overflow
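A small repro of that, assuming the array's dtype is uint8: the multiplication stays in uint8 and wraps modulo 256 before the floor division happens, so anything at or above 32 comes back wrong. Widening the dtype first avoids it.

```python
import numpy as np

a = np.array([0, 31, 32, 100, 200, 255], dtype=np.uint8)
scalar = 8

# The product stays uint8, so a * 8 wraps modulo 256 before the division.
round_trip = a * scalar // scalar
print(round_trip)      # [ 0 31  0  4  8 31]
print(a - round_trip)  # [  0   0  32  96 192 224]

# Widen first so the intermediate product fits, then cast back.
safe = (a.astype(np.uint16) * scalar // scalar).astype(np.uint8)
print(np.array_equal(safe, a))  # True
```

The differences are always multiples of 32 (256 / 8), which matches bytes coming back "short by 96".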
Yah, it’s somewhat interesting for pricing options. The main point is it assumes a random walk, which is amazingly/surprisingly quite useful.
This is the only field where this is the case. When civil engineers are building a bridge they don't call in philosophers to talk about the safety of it nor expect engineers to become philosophers.
Agree, we need math.
It's become my ultimate pet peeve these days, we should stop this imho
Think of the Chinese room (Searle's) argument as a counter to the Turing test argument. The Turing test says: it’s intelligent if it’s indistinguishable from intelligent. Searle's says: it’s never intelligent.
(I don't mean this in reference to the conversation above btw! It's only tangentially related.)
Stop what tho?
The top voices of AI safety being dominated by people like Eliezer Yudkowsky
I'd say they still have a very important role to play, even when talking about how structural engineers build bridges. The most important thing should not be hypotheticals like "sentience" but things that are grounded in how models are actually trained, so basically grounded in math. There are great papers that take this angle, which Rob Miles frequently summarizes, e.g. "The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment"
Some AI type classification charts, like how algorithms are classified in theory of computation, but in this case AI safety.
(Based on math)
Sorry if I came on a bit strong and/or gatekeepe-y!
Was not my intention
I was just genuinely curious, I don’t really follow the ethics in ai space
I've always dreaded that field. The active ML researchers in that field are really made of steel!
https://news.ycombinator.com/item?id=15171513 <- this comment is a nice summary of what I mean. It's from 2017 which is even before the current hype 😄
They have to deal both with math and Twitter. Hard job.
It's always ballistic and house of flying daggers in that field. I don't think I can deal with that 🤣🤣
Robert Miles is the kind of not panicking, but also directly trying to solve issues person in this field (not the "there are no issues" type either). There are many such people, but outrage draws attention as usual.
(It's similar to those actually trying to solve climate change actively, versus the panickers and deniers, they are too busy solving the problem so you don't hear much from them)
To use it here, Elon Musk, Putin, and Mark Zuckerberg all have something in common
didn't know putin spoke about ai safety
2017 was a different world I suppose
anyway, is there anything being done in the direction of making the structure of the neural network part of the optimization ? so like, instead of just adjusting weights, also be able to add layers, increase their size, decrease their size, etc
Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy and ...
Yes, it's very common.
NAS is often done with a genetic algorithm
You'd typically have 2 optimization routines, one to train a network (a single instance) and then a hyperparameter search that maybe isn't fully random if you're going down this route
I haven't read any research of people doing it in one e.g., using the gradient after a batch to change the actual architecture, which I think the question is, maybe @iron basalt has
Differentiable NAS has shown to produce competitive results using a fraction of the search-time required by RL-based search methods.
IDR the names right now, but those don't work well yet (and probably won't for deep learning in general due to being offline).
what do you mean by being offline ?
Opposite of online learner.
As you might be able to imagine, a learner that is really good at learning new things without disrupting existing knowledge at all is ideal for growing more layers and such.
yeah ig I'm still getting caught up on all the terminology
are there any multi sequence GPT models ? like, two input sequences, one which is updated independently (like a user writing to a textbox real time) and another which is the output of the model
However, the concept of handling multiple independent input sequences in the context of GPT-like models is more about how you frame the problem and feed data into the model rather than a built-in feature of the model itself. For instance, you can design an application where one input stream is user-generated content (like real-time text input) and another is context or additional information (like a separate conversation or data stream). These inputs can be concatenated or formatted in a way that the model understands as separate but related pieces of information.
interesting, I think I'm still gonna do two branches and then kinda mix em up somehow before the output
my objective is to be able to talk with it via voice chat in real time
so like, it should know that it is interrupting me and not speak
instead of the current turn based thing they have on chat gpt mobile
sounds like what transformers were originally used for: sequence-to-sequence transformation
that's the encoder-decoder architecture, as in e.g. the "attention is all you need" paper
(unless i'm misunderstanding what you're looking for)
Right, in the original one the sequence doesn't change, but I suppose it's only a minor adjustment
It encodes the first sequence and then autoregression is done with the decoder
oh, i think i see what you mean
maybe it would work if you applied masking to both input and output...
that has to be done somewhere in the literature already. right?
I haven't looked into it yet, I'm still in the part of understanding and implementing the transformer. I'm using a Viz as guide
I'm using this as a guide for the implementation https://bbycroft.net/llm
A 3D animated visualization of an LLM with a walkthrough.
you're in the DS server right? i was asking a bunch of questions on this topic a few months ago while i was doing the same, search for my messages in the machine-learning channel there if you want to see the questions i asked and the answers i got
i was focused mostly on the self-attention mechanism specifically, since that was the non-obvious part to me
(not that any of it was obvious, but it was the part that i really didn't understand from reading the literature)
I think I got some intuition for self attention tho I do need to work through it.
I'm honestly stuck on why z score is used to normalize the vectors
Has no direct interpretation except that it keeps values from exploding
I am creating a MIDI music generative AI but have failed multiple times. I am starting over again and would like some insight on what models I should use
normalizing the mean to 0 and standard deviation to 1 just tends to work well in general
the important thing is to put all numbers on roughly the same numerical precision scale
centering at 0 and rescaling by standard deviation just happens to work well for that, it helps ensure that you're "in the middle" of the space of what can be represented by floating-point numbers, allowing lots of room for numbers to be significantly smaller than or significantly larger than 0
it also does have direct interpretation in statistical models, so there's some carry-over if you squint
oh also, if you center the mean at 0, then scaling down by standard deviation is just normalizing in the linear algebra sense of dividing by the l2 norm
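A quick numerical check of that last point, as a minimal sketch. The example vector is made up, and the population standard deviation (NumPy's default, `ddof=0`) is assumed. Under that assumption the z-score and the L2-normalized centered vector differ only by a constant factor of sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(loc=3.0, scale=2.0, size=8)  # arbitrary example vector (assumption)

centered = v - v.mean()

# z-score: divide the centered vector by its population standard deviation
z = centered / v.std()

# L2 normalization of the same centered vector
unit = centered / np.linalg.norm(centered)

# For a centered vector, std = ||centered|| / sqrt(n),
# so the two results differ only by the constant sqrt(n).
n = v.size
print(np.allclose(z, np.sqrt(n) * unit))
```

So "divide by std" and "divide by the l2 norm" point in the same direction; only the length differs (sqrt(n) versus 1).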
i really like this talk for an explanation of self-attention https://youtu.be/S27pHKBEp30?feature=shared&t=587
Leo Dirac (@leopd) talks about how LSTM models for Natural Language Processing (NLP) have been practically replaced by transformer-based models. Basic background on NLP, and a brief history of supervised learning techniques on documents, from bag of words, through vanilla RNNs and LSTM. Then there's a technical deep dive into how Transformers ...
the value of the i,j cell of the attention matrix is a relevance score of the j'th token in the input sequence "from the perspective of" the i'th token in the output sequence
that's why they mask off the upper triangle of the attention matrix in a decoder's self-attention, to prevent the i'th token in the decoded sequence from "attending to" any subsequent tokens
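The masking described above can be sketched in NumPy. The sizes, the random `q`/`k` matrices, and the `softmax` helper are all assumptions for illustration, not taken from any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d_k = 4, 8                      # hypothetical sizes
q = rng.normal(size=(n_tokens, d_k))      # one query vector per token
k = rng.normal(size=(n_tokens, d_k))      # one key vector per token

# scores[i, j]: relevance of token j from the perspective of token i
scores = q @ k.T / np.sqrt(d_k)

# True strictly above the diagonal, i.e. where j > i (a "future" token)
future = np.triu(np.ones((n_tokens, n_tokens), dtype=bool), k=1)

# Setting future scores to -inf makes their softmax weight exactly 0
masked_scores = np.where(future, -np.inf, scores)
attention = softmax(masked_scores)
print(attention.round(3))
```

Each row still sums to 1, but row i only spreads its weight over tokens 0..i.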
ig my question here would be if it is totally necessary, I'm fine with the division by std since it's the same as L2-normalization, but I'd be very happy to do away with subtraction by the mean if possible, im gonna try to graph this
ye makes no sense
in 2d is awful
maybe im doing something wrong
the vector goes from spanning the entire 2d plane to being confined to two points
which makes sense, subtraction makes it confined to y = -x, then normalization forces the norm to be 1
so each time this is done two dimensions are discarded, ig the network will find some way of accounting for this
class SelfAttentionHead(nn.Module):
    def __init__(self, params: ModelParameters):
        super(SelfAttentionHead, self).__init__()
        self.compressed_coordinates = params.word_vector_size // 3
        self.q: TensorFloat["coordinates compressed_coordinates"] = RandParameter(
            params.coordinates, self.compressed_coordinates
        )
        self.k: TensorFloat["coordinates compressed_coordinates"] = RandParameter(
            params.coordinates, self.compressed_coordinates
        )
        self.v: TensorFloat["coordinates compressed_coordinates"] = RandParameter(
            params.coordinates, self.compressed_coordinates
        )

    def forward(self, sequence: TensorFloat["words coordinates"]) -> TensorFloat["words compressed_coordinates"]:
        # TensorFloat["words coordinates"] @ TensorFloat["coordinates compressed_coordinates"]
        q_vectors: TensorFloat["words compressed_coordinates"] = sequence @ self.q
        k_vectors: TensorFloat["words compressed_coordinates"] = sequence @ self.k
        v_vectors: TensorFloat["words compressed_coordinates"] = sequence @ self.v
        # TensorFloat["words compressed_coordinates"] @ TensorFloat["compressed_coordinates words"]
        attention_scores: TensorFloat["words words"] = q_vectors @ k_vectors.T
        # torch.sqrt only accepts tensors, so take the square root of the int directly
        attention_scores /= self.compressed_coordinates ** 0.5
        # softmax over the last axis so that each row sums to 1
        attention_scores = torch.nn.functional.softmax(attention_scores, dim=-1)
        # TensorFloat["words words"] @ TensorFloat["words compressed_coordinates"]
        return attention_scores @ v_vectors


class SelfAttention(nn.Module):
    def __init__(self, params: ModelParameters):
        super(SelfAttention, self).__init__()
        self.head_1 = SelfAttentionHead(params)
        self.head_2 = SelfAttentionHead(params)
        self.head_3 = SelfAttentionHead(params)
        self.projection: TensorFloat["words words"] = RandParameter(params.words, params.words)

    def forward(self, sequence: TensorFloat["words coordinates"]):
        att_1: TensorFloat["words compressed_coordinates"] = self.head_1(sequence)
        att_2: TensorFloat["words compressed_coordinates"] = self.head_2(sequence)
        att_3: TensorFloat["words compressed_coordinates"] = self.head_3(sequence)
        # concatenate the heads side by side (torch.stack would add a new axis instead)
        output: TensorFloat["words coordinates"] = torch.cat([att_1, att_2, att_3], dim=1)
        return self.projection @ output
wait I should do softmax here isnt it
no is done on attention_scores = torch.nn.functional.softmax(attention_scores)
anyone has some resources on how to train neural networks??
It's hard to make recommendations without knowing your background. But my experience has been that if you know enough math and code, none of this is hard to pick up by just building stuff.
where have you seen standardization (centering and scaling by 1 standard deviation) rather than normalization (dividing by the norm) in a NN?
i'm not an expert in deep learning as i'm sure you know, but i've only ever seen the latter
Well in the transformer. The reference I'm using uses the z-score formula over the coordinates of the tokens.
I'm using this as ref: https://bbycroft.net/llm
They do m*z_score + b, where b and m are learnable
you're talking about the layer norm step?
i see, so it is
i maintain it makes sense to both center and scale
it's the same reason you do it in just about any other machine learning model
it's good for numerical behavior
the fact that the scaling of centered data coincides with l2 vector normalization is just a bonus
The goal is to make the average value in the column equal to 0 and the standard deviation equal to 1. To do this, we find both of these quantities (mean (μ) & std dev (σ)) for the column and then subtract the average and divide by the standard deviation.
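A minimal NumPy sketch of that step, combined with the learnable `m*z_score + b` mentioned above. The function name, `eps`, and the example shapes are assumptions; the "column" of the visualization is taken here to be one token's vector, stored as a row:

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each token vector (each row) to mean 0 and std 1,
    then apply a learnable scale (gain) and shift (bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    z = (x - mean) / np.sqrt(var + eps)   # eps avoids division by zero
    return gain * z + bias

rng = np.random.default_rng(0)
tokens = rng.normal(loc=5.0, scale=3.0, size=(4, 16))  # 4 tokens, 16 coordinates
gain = np.ones(16)    # "m" in the message above; learned during training
bias = np.zeros(16)   # "b" in the message above; learned during training

out = layer_norm(tokens, gain, bias)
print(out.mean(axis=-1).round(6), out.std(axis=-1).round(3))
```

With `gain` at 1 and `bias` at 0 this is the plain z-score. Training is free to move both, which is why the output is not strictly confined to mean 0 and std 1.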
i wish they'd say why you do this, because what i said above is not obvious at all unless you already happen to know it
great resource overall, but too much focus on what/how and not enough on why
I am aware of this. My issue with it is that it's a step away from interpretability, so if I can do away with it I'd rather do it.
1/norm(v) is much more intuitive
Uhm, I also wonder if anyone has tried to do "compression" of the attention heads.
So like, train a larger transformer, but then look at the attention heads and see if they can be used to train smaller ones. Effectively compressing them. Or maybe even changing architectures entirely.
that sounds like distillation maybe https://medium.com/nlplanet/a-model-distillation-survey-7f0e1b56b3cf
centering at 0 has numerical benefits and doesn't qualitatively change the data at all
it's not a linear transformation, but it doesn't actually change any of the subsequent interpretation
it's just shifting the entire space to exist in a more numerically-comfortable region
followed by rescaling the norm to 1
Every idea I have has already been tried ahahah
yeah but you keep reinventing successful ideas. it's much less encouraging if you keep having ideas that people have tried and they turned out to be bad ideas
It does because you actually remove two dimensions, normalization constrains the data to an (n-1)-dimensional hypersphere for ex
shifting the data (centering) isn't linear, 0 isn't preserved
but yes, normalization (scaling) is linear
The plot I made really made it look like the v - E(v) thing is almost a projection. For 2d it maps the plane to the y=-x line. But maybe I'm doing something wrong in the plot.
that does seem off
in this case "2D" means you have 2 possible tokens in the sequence. the idea is that the embedding for each token is centered at 0 mean and scaled to 1 std dev, but that shouldn't involve any nullifying of vector space dimensions. it's just shifting the origin, followed by squeezing/stretching
Normalization is an important step in the training of deep neural networks, and it helps improve the stability of the model during training.
i guess this is their explanation
x - .5*( x + y) = .5(x - y)
y - .5 (x + y) = -.5(x - y)
As a sanity check
I think that's right unless I got fooled again by my eternal enemy, the minus sign
So there's one free variable after subtracting the mean
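The same sanity check can be run numerically. This is a sketch with random 2D points (the sampling range is an arbitrary assumption). After subtracting each point's own mean, everything lands on the line y = -x, and dividing by the norm leaves only two possible points:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(-4, 4, size=(1000, 2))   # random 2D vectors (assumption)

# Subtract each point's own mean: (x, y) -> (.5(x - y), -.5(x - y))
centered = points - points.mean(axis=1, keepdims=True)

# The two coordinates now always sum to zero, i.e. the point lies on y = -x
on_line = np.allclose(centered.sum(axis=1), 0.0)

# Dividing by the norm then leaves only a sign: two points on the unit circle
unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
distinct = np.unique(np.round(unit, 6), axis=0)

print(on_line)
print(distinct)
```

So in 2D the plane collapses to two points, which matches the plot: one dimension is lost to centering and one to normalization.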
The 2017 paper points here: "Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint
arXiv:1607.06450, 2016."
Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which ...
oh i see what you mean, yes that makes sense
in statistics the sample average is just an estimate of a population mean so it makes sense in that context
Yes it makes sense if you look at the coordinates as samples from the same distribution. But that sort of collides with the picture of a vector space where dot products measure similarity.
Which is how the best intuitions I've seen are being built from.
that's a good point, especially because each individual vector is being normalized, rather than normalizing each dimension across all vectors. i'll have to think about it to see if there's any way to interpret this meaningfully, or if it's purely mechanical
there's also the whole aspect of learning slope and intercept parameters, which kind of throws off my interpretation before
guys How to start jupyter notebook locally on my windows I already installed Miniconda
Open your conda CLI, type jupyter notebook (all lowercase), then hit enter. With Miniconda you may need to run conda install notebook first.
How can I run external Jupyter notebook from online source to my windows??
Is miniconda enough to have Jupyter or do I need to install whole anaconda??
The difference between Anaconda and Miniconda is that Anaconda already comes with a ton of packages and tools preinstalled and Miniconda doesn't. As a beginner it might be a good idea to start with Anaconda.
I'll refer you to an editor (visual studio code): https://code.visualstudio.com/learn/get-started/basics. You can install it and you can be up and running in a minute but there's also a 5 minute video you can watch if you want/need to.
You'll have to install the Python extension https://marketplace.visualstudio.com/items?itemName=ms-python.python
and afterwards follow a third of this guide (you certainly don't need to read all of it) https://code.visualstudio.com/docs/datascience/jupyter-notebooks
I'm mostly sending you in the direction of tutorials that you need to read and/or watch because of the old adage: “Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.”
Hey, I have a 75k-row dataset and I need to use sklearn's MLPClassifier. I have 5 classes. My accuracy score is higher than 0.99 every time. Why could this happen?
my dataset distribution
Can you do 3 things please:
- When sharing code could you use triple backticks (`) to paste multiple lines instead of screenshots, it's typically preferred here :D
- Could you use cross_val_score instead of cross_val_predict and casting to integers? It's the more idiomatic way to do this thing.
- Can you split into train and test, cross_val_score on train, then train the classifier "for real" on train, predict on test and then make a confusion matrix.
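The workflow suggested in that list could look roughly like the sketch below. The data is synthetic (`make_classification` standing in for the real 75k-row dataset), and the network size and iteration limit are arbitrary assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real 5-class dataset (an assumption)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_classes=5, random_state=0,
)

# Hold out a test set that cross-validation never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)

# 5-fold cross-validated accuracy on the training part only
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("CV accuracy:", cv_scores.mean())

# Train "for real" on the full training set, then evaluate on the test set
clf.fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)
```

The confusion matrix shows per-class errors, which a single accuracy number hides, especially if one class dominates the dataset.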
How to optimise linear regression model to produce better predictions?
- Using RidgeCV or ElasticNetCV (scikit-learn's cross-validated ridge and elastic net) instead, as these models already carry out some hyperparameter tuning for you.
- Feature engineering: add interaction terms, binning, polynomials, splines, feature transforms and so on. The best way to identify if you need additional feature engineering is by doing residual analysis. Plot the error your model is making versus each variable. Normally there should be no structure in the residuals; if there is, you may need feature engineering.
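A small sketch of both suggestions on synthetic data. The data, the alpha grid, and the l1_ratio grid are assumptions, and a correlation is used as a numeric stand-in for eyeballing a residual plot. The target hides a quadratic term in `x2`, which shows up as structure in the residuals until that feature is added:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, RidgeCV

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)
y = 3.0 * x1 + x2**2 + rng.normal(scale=0.1, size=n)  # hidden quadratic in x2
X = np.column_stack([x1, x2])

# Both models pick their regularization strength by cross-validation
alphas = np.logspace(-3, 3, 13)
ridge = RidgeCV(alphas=alphas).fit(X, y)
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print("chosen alphas:", ridge.alpha_, enet.alpha_)

# Residual analysis: the leftover error should show no structure
residuals = y - ridge.predict(X)
corr_before = np.corrcoef(residuals, x2**2)[0, 1]

# Add the missing feature and check again
X_fe = np.column_stack([x1, x2, x2**2])
ridge_fe = RidgeCV(alphas=alphas).fit(X_fe, y)
corr_after = np.corrcoef(y - ridge_fe.predict(X_fe), x2**2)[0, 1]

print("residual vs x2^2 correlation before:", round(corr_before, 3))
print("residual vs x2^2 correlation after: ", round(corr_after, 3))
```

In practice you would scatter-plot `residuals` against each variable; a visible curve or trend is the hint that a transform of that variable is missing.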
I am using polynomials. I am also experimenting with ensemble methods
I will try the models you mention right away
When it comes to feature engineering I am trying to avoid it for today
Does it need to be linear regression? I always try a gradient boosted machine and/or say Random Forest to see what their performance is and then compare that to the linear models. If they're doing significantly better then there's at least a few non-linearities that your linear model is not accounting for.
Yeah it has to be linear regression.
Then you should definitely do what I suggested, the gbm / rf model will at least give you a lower bound of performance your linear model should be able to obtain
Btw RidgeCV performed horribly on my test data. I am currently using ElasticNet as it was the best one so far
The ElasticNetCV performed same as my regular Elastic Net. That’s because I already hand picked the best hyper parameters.
Then residual analysis and feature engineering are what you should probably be doing. I'd stay with RidgeCV and ElasticNetCV as adding features will change the hyperparameter values you need
hi guys, sorry for interrupting, but anyone knows where i went wrong?
download those csv files and look at them in excel and it should be obvious, or use Google sheets. Is there a google Drive API you could use to fetch the files?
I will take a break and start with the feature engineering. Any tips?
Just the stuff I mentioned, it's very case specific. Do the residual analysis and set yourself a "target number to beat" by running gradient boosting.
You're not interrupting. But what about it is wrong? We can't always tell how it's different from what you expect, unless you say what you expect.
You might need to change the axis for pd.concat
i wanted it to look like this
but instead, it looks like this
Can you download those CSV files and open them locally?
i could open it with excel i guess
you might have to use this https://developers.google.com/drive/api/reference/rest/v3
Or an easier solution might be to run the code in Colab, it might have direct access to Google Drive that you don't otherwise get
hey i'm interesting in getting into ML, what is the best way of starting this?
I've asked gpt to make these visualizations, haven't checked the code but they seem to align well with the 2d case
My intuition here is that you start with a 512 dimensional space, and you only use a slice of a slice of it, a subspace of 510 dimensions. Still plenty of room to work with, but you excluded points with values that might cause numerical instability.
I'm in a similar situation and moving from basic statistics to linear regression. I think it's the easiest to understand, and once you learn how to do linear regression with scikit learn, the approach is similar for other algorithms
great idea, and this actually makes a lot of sense now that i think about it more
the whole point is that the original data could be anywhere in space
and you want to bring it all back into the middle
but can you share the code? i want to make sure it's actually doing the right thing
that is, each "instance" should be normalized within itself, rather than what we normally do, which is each dimension being normalized across all "instances"
Sure:
Here's the full Python code used for the visualization with shifted and normalized grid points:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Generate a uniform grid in 3D space
x = np.linspace(-4, 4, 10)
y = np.linspace(-4, 4, 10)
z = np.linspace(-4, 4, 10)
xx, yy, zz = np.meshgrid(x, y, z)
grid_points = np.vstack([xx.ravel(), yy.ravel(), zz.ravel()]).T
# Subtract the mean of each point's coordinates from the point itself and then normalize
shifted_normalized_grid_points = np.array([(p - np.mean(p)) / np.linalg.norm(p - np.mean(p)) for p in grid_points])
# Plotting
fig = plt.figure(figsize=(12, 6))
# Original grid
ax1 = fig.add_subplot(121, projection='3d')
ax1.scatter(grid_points[:,0], grid_points[:,1], grid_points[:,2], color='b')
ax1.set_title("Original Grid")
ax1.set_xlabel('X axis')
ax1.set_ylabel('Y axis')
ax1.set_zlabel('Z axis')
# Shifted and normalized grid
ax2 = fig.add_subplot(122, projection='3d')
ax2.scatter(shifted_normalized_grid_points[:,0], shifted_normalized_grid_points[:,1], shifted_normalized_grid_points[:,2], color='r')
ax2.set_title("Shifted and Normalized Grid")
ax2.set_xlabel('X axis')
ax2.set_ylabel('Y axis')
ax2.set_zlabel('Z axis')
plt.show()
This code generates a 3D grid of points, processes each point by subtracting the mean of its coordinates and then normalizing it, and finally visualizes the original and processed grids.
Generalizing this to 4d, I reckon that the subtraction by mean gets you to a 3d space and then norm gets you to the familiar spherical surface of radius 1.
In that case the dot product is just the cosine similarity. For higher dimensions, you'd always still be working on a hypersphere.
It's also neat that it always ends up being a manifold. And that the dot product can still preserve that similarity interpretation.
How do latent spaces end up working ? Do the networks always partition the vector space into chunks ? Or do they make use of dimensions to represent things ?
(Like in QM, where a dimension will correspond to a possible state )
And has anyone tried doing these things with complex numbers ? Or check if the network ends up learning them somehow, and possibly even forming an Hilbert space instead of the usual thing ?
Guys, has anyone worked on chatgpt's api for python?
pretty good job by the bot there. i'm consistently impressed with its understanding of tasks like this, even if it doesn't know how to write particularly efficient or idiomatic numpy code
what do you mean by chunks?
iirc there has been work on using fourier transforms inside neural networks, e.g. transforming the data to fourier domain and training the model on that. so that will pull in complex numbers. otherwise i don't know if there's an advantage to complex numbers vs. 2 dimensional real vectors
It actually took me 15min to get it to do it right. At some point I just wanted it to succeed so I didn't give up. I explained it from so many angles ahah
lol fair enough
off the top of my head, you could rewrite shifted_normalized_grid_points this way:
shifted_normalized_grid_points = (p - np.mean(p, axis=1)) / np.linalg.norm(p - np.mean(p, axis=1), axis=1)
you might need to scatter in some [: , np.newaxis]
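For reference, one way the vectorized version could be written is with `keepdims=True`, which does the same job as the `[:, np.newaxis]`. This is a sketch using the same grid as the code above. One added assumption: grid points with x == y == z have zero norm after centering, so they are dropped here to avoid dividing by zero:

```python
import numpy as np

# Same 10 x 10 x 10 grid as in the code above
axis_values = np.linspace(-4, 4, 10)
xx, yy, zz = np.meshgrid(axis_values, axis_values, axis_values)
grid_points = np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])

# keepdims=True keeps a trailing axis of length 1 so broadcasting works per row
centered = grid_points - grid_points.mean(axis=1, keepdims=True)
norms = np.linalg.norm(centered, axis=1, keepdims=True)

# Points with x == y == z collapse to the origin; skip them (assumption)
keep = norms[:, 0] > 1e-12
shifted_normalized_grid_points = centered[keep] / norms[keep]

print(shifted_normalized_grid_points.shape)
```

Every remaining row has coordinates that sum to 0 and unit length, so the points sit on a circle: the intersection of the plane x + y + z = 0 with the unit sphere.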
In partitions, from what I saw when I trained a small GAN, it looked like there were regions dedicated to each face and then a continuous morphing from one region to the other.
ah, i don't know what GAN architectures are like
i'd say that in general no, NNs don't always partition the data like that. but maybe modern architectures for image GANs are designed to do that, or do it naturally
I used a decoder made of convolution layers. So it took a "small" vector of 512 dimensions and expanded it into an image. I trained it on random 512 vectors so it turned the vector space R512 into a latent space. A path in that space would originate an image of a face morphing continuously. I have the gif somewhere
Aaah I lost the gif
Can someone help me
you have to ask the actual question that you want help with.
Hi, how do I build a proper framework/schedule to learn coding?
I'm asking here because I'm trying to learn ml
and heard that you start with the python libraries, like pandas or mat
but doing tutorials and exercises is so boring to me
I usually have to slog through to make any progress
idk do I just push through
How can i change a list to an array guys
x = np.array(y)
Where y is a python list
I usually just come up with a cool project and kinda do it. The struggle generates learning and then at the end of it I have a project to list on my CV.
It's also a lot more bearable to work on something I love than to watch someone talk about random stuff I don't need yet.
And the nail in the coffin is that studying theory is 10% (or less) of the learning process. You can read an entire book and then not be able to apply any of it.
Hence why many software engineers seem to regret majoring in cs 💀
No hate towards anyone who does
I can't comment on it because I haven't majored in it. But if you love the subject and you enjoyed studying it, it wasn't a waste of time.
Were you able to get a job out of it tho? (If you dont mind sharing)
I don't have a formal education in CS.
But I was able to get a job because I've done interesting stuff
Good to know cause I'm going to college next year and dont know if I want to transfer to cs
And I like to think that I keep doing interesting stuff ofc ahah
True lol
Yeah it's a very personal choice, I'd take into account both personal circumstance and passion.
Passion will get you through the worst of it, but you should be careful with the particulars of your life.
So like, some people are very passionate about arts, but on avg that won't get you positioned in the job market.
Yea I currently have compE because resources for learning electronics are usually pricier
Compared to cs
From my experience, even smaller scale stuff will be expensive because you need GPU just to experiment with stuff
When you get into the good stuff
It's prohibitive
Like gpt4 for example, you need the backing of Microsoft
Anthropic is backed by Google
And Meta is literally one of the Maang/FAANG or wtv the current name is ahha
Ah and data, data is expensive. One of the lessons that really stuck with me was about the data quality
Your model is as good as the data you give it. And the best data you can get is expensive. You'll usually pay a bunch of people to do the manual work of annotating things.
Data quality really really does make a huge difference, it's actually insane.
ML is a field where programming is a means to a specific end. I suggest starting with just the basics, and then you can learn more about programming as you feel the need
https://python3.info/ just today i came across this book, it seems like a pretty good place to get started if you're interested in ML
Between 2 vectors, how can I find exactly which numbers appear in both, and which one shows up the most in these 2 vectors?
Like looking for the duplicate numbers in 2 vectors
I'm confused. You can find out if there's a two by doing any(vector == 2)
vector == 2 will apply the == 2 operation to each coordinate
Which will result in a boolean vector
And any will return True if any of the values is True
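A small sketch of both readings of the question. The example arrays are made up, and reading the question as "values shared by both vectors" is an assumption:

```python
from collections import Counter

import numpy as np

a = np.array([1, 2, 2, 3, 5, 5, 5])   # made-up example vectors
b = np.array([2, 5, 5, 7, 8])

# Elementwise comparison gives a boolean array; any() collapses it to one bool
has_two = bool(np.any(a == 2))

# Values that occur in both vectors (sorted, each value listed once)
shared = np.intersect1d(a, b)          # -> [2, 5]

# Of the shared values, the one that appears most often across both vectors
counts = Counter(np.concatenate([a, b]).tolist())
most_common_shared = max(shared.tolist(), key=lambda value: counts[value])

print(has_two, shared, most_common_shared)
```

`np.intersect1d` answers "which numbers are in both", and the `Counter` answers "which of those shows up the most".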
i'm surprised that worked at all. google sheets URLs do not return CSV data.
you'll need to export/download each document as CSV first
(google might have some API to do this programmatically, but i'm not aware of it)
yeah i understand it now. you need to replace the URL with read csv format
i thought i could use an oversimplified version
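A sketch of the URL rewrite being described. The helper name and the sheet ID are hypothetical, and it assumes the sheet is shared so that anyone with the link can view it, using Google's `export?format=csv` address form:

```python
import re

def sheet_csv_url(url: str, gid: str = "0") -> str:
    """Turn a Google Sheets browser URL into its CSV export URL.

    Hypothetical helper: assumes the sheet is viewable by link.
    """
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
    if match is None:
        raise ValueError("not a Google Sheets URL")
    sheet_id = match.group(1)
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )

# HYPOTHETICAL_SHEET_ID is a placeholder, not a real document
example = "https://docs.google.com/spreadsheets/d/HYPOTHETICAL_SHEET_ID/edit#gid=0"
csv_url = sheet_csv_url(example)
print(csv_url)
```

The resulting URL is what you would pass to `pd.read_csv`; the plain `/edit` URL returns an HTML page instead of CSV data.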
Has anyone faced the issue that "semTools" is not an exported object?
I have downloaded the semTools package, upgraded it, and I'm using library(semTools) at the beginning of my code.
Error: 'semTools' is not an exported object from 'namespace:semTools'
In addition: Warning messages:
1: In lav_data_full(data = data, group = group, cluster = cluster, :
lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
2: In lavaan::lavaan(model = model_description, data = data_processed, :
lavaan WARNING:
the optimizer warns that a solution has NOT been found!
Thats not python related right? Its R?
Try the steps mentioned here
Is R, right
Maybe is my knowledge gap but this doesn’t seem to be much related to the issues I am facing. If so, please elaborate.
I'm having a dejavu, didn't we discuss this in #1035199133436354600 ? Or is there a glitch in the Matrix? 😉
hey, does anyone know how I can use seaborn or other viz tools to create a grid plot to display multiple dataframes in it?
I don't mean the "classic" pair plot that takes all the numeric columns in a single dataframe, but rather I want to display, in multiple subplots, the regplot of same 2 columns that are shared across multiple dataframes.
oh yeah, i mean. i just understand that recently
sorry for being slow i guess?
sorry, not trying to tell you off, but if you link to the previous discussion then we can move on from there. Instead of starting from the same original question
- this is a python discord idk if its the right place for R stuff here, especially since its some R related package error and nothing datascience related.
- It seems like the package is missing that function based on that error. In the post it mentions some options to check/mitigate that error.
as usual, show your code
you can do it with matplotlib, but you'll need to do the regression and plot its output yourself
!d matplotlib.pyplot.subplots
matplotlib.pyplot.subplots(nrows=1, ncols=1, sharex=False, sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None, **fig_kw)
Create a figure and a set of subplots.
This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.
use that to set up a grid of axes, then you can plot whatever you want on each axes
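A sketch of that approach: one subplot per dataframe, with the same two columns regressed in each. The four dataframes, the column names `x` and `y`, and the 2x2 layout are assumptions; the regression line is a plain least-squares fit via `np.polyfit`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four made-up dataframes that share the columns "x" and "y" (assumption)
rng = np.random.default_rng(0)
frames = {}
for name, true_slope in [("df_a", 1.0), ("df_b", -0.5), ("df_c", 2.0), ("df_d", 0.2)]:
    x = rng.uniform(0, 10, size=50)
    frames[name] = pd.DataFrame({"x": x, "y": true_slope * x + rng.normal(size=50)})

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(8, 6))
fits = {}
for ax, (name, df) in zip(axes.flat, frames.items()):
    # Degree-1 least-squares fit: returns (slope, intercept)
    slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
    fits[name] = (slope, intercept)
    xs = np.linspace(df["x"].min(), df["x"].max(), 100)
    ax.scatter(df["x"], df["y"], s=10)
    ax.plot(xs, slope * xs + intercept, color="C1")
    ax.set_title(name)
fig.tight_layout()
```

Drop the `matplotlib.use("Agg")` line and call `plt.show()` to see the figure interactively. If you would rather keep seaborn, `sns.regplot` also accepts an `ax=` argument, so the same loop works with `sns.regplot(data=df, x="x", y="y", ax=ax)`.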
Hello guys, I'm working on a SOMA implementation for MEALPY and also enhancing algorithms regarding mirror boundaries. Currently I'm struggling with convergence problems; is there somebody who could help me? I'll share everything. My last step is to commit these improvements to the public package, but unfortunately I'm stuck.
guys, anyone understand where i did wrong?
Can you show where you set that variable
Right now that variable isn't a data frame, it's a function you haven't executed yet
What zestar said. You probably forgot to call a function in an earlier line, by adding parens (), or something similar
is it because of this?
Yes, like BillyBobby says, you're missing parentheses () behind copy
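To illustrate the difference, here is a tiny sketch with a made-up dataframe. Without the parentheses the variable holds the method itself, not a copy of the data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})   # made-up example data

not_called = df.copy     # no parentheses: this is the bound method, not a frame
called = df.copy()       # parentheses: the method runs and returns a new DataFrame

print(type(not_called))  # a method object
print(type(called))      # a DataFrame
```

Any dataframe operation on `not_called` (indexing, `.head()`, and so on) fails, because a method object has none of those.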
aight, i understand. thanks.
I'm trying to export my custom bert model to ONNX, but for some reason after loading the exported model it has empty input array, what could be the reason?
hi
Maybe my math is off, but I think self attention can be simplified to:
softmax(xMx^T / sqrt(d_k))Vx^T
Where M and V are the learnable parameters.
Went over it a couple times now.
This would kinda simplify the interpretation too, since M is kind of acting like a metric tensor
are you talking about "condensing" Q and K to a single matrix?
hello could anybody help me with uml class diagrams?
- "don't ask to ask" - you need to describe your question enough so that someone can actually help you
- you might want #software-architecture instead of this channel
i assume you'd want it to be the same shape as Q K'? how would you construct it?
It follows directly from the definition.
You just substitute :
K = Wk x
Q = Wq x
I'm using ' to distinguish between the K, Q in the paper and the transformations that produces them
i'm using ' for transpose.. can we call them Wk and Wq like in the attention-is-all-you-need paper?
Oh, sure
so you have the decoder-side tokens Y, and the encoder-side tokens X. how do you construct this M matrix differently from (Wq Y) (Wk X)'?
err, i think i swapped q and k. same idea though
Let me check this
yeah, GPT is decoder-only and BERT is encoder-only, but this is the most general case
in the case of the nanogpt model you were working through in https://bbycroft.net/llm, they already simplified this operation somewhat
in general, you project queries, keys, and values into 3 separate spaces. even if they come from the same input sequence
No I found an error in my calculation
and even if you enforce that those 3 spaces are the same, you still have this "cartesian product" operation, multiplying all pairs of tokens together (at least looking backwards in the sequence, if you're in a decoder unit)
ah, okay then
No this is too suss wait
Q x (K x)' = Q x x' K'
= (x x') Q K'
It's probly gonna be the other way around
x Q (x K)' = x Q K' x' = x M x'
The second way makes more sense
And is how I produce the matrix
I mean both ways produce a single matrix. But the second way makes it so that it's not a scalar mul
Looking at the paper, Wq and Wk (which I'm calling Q and K above), have dimension d_model x d_k, since d_model is the size of the embedding vector, it must come from the left as a 1xd_model
https://github.com/mistralai/mistral-src/tree/main/mistral
you guys most definitely know about mistral right
i have no background into machine learning, im a self taught python 'developer' (im not professional, though i am proficient)
How hard is the road of learning to work with ML in python
from 0 to being able to expand on open source frameworks
if you were to give me advice, please dont focus on required python skills (mentioning some important frameworks to learn is nice though) but instead maybe give some sort of guidance on what subjects to tackle first
thx 🙏
Perhaps start with https://cs50.harvard.edu/ai/2023/. That'll give you a taste and some ideas of what you might want to learn more of.
Expand in what way? Are you trying to make use of ML to solve some problem as a framework, in which you don't really touch the ML part directly, but build around it (like making use of a physics engine in a video game, but not touching the physics engine internals)? Or do you want to make new kinds of ML models (research)? Or the functions required to make those models, etc (e.g. GPU kernels)?
i think you're conflating the purpose of Q and K with the purpose of what i was calling Wq and Wk
ah, i see what you're doing here
Q = X @ Wq
K = X @ Wk
V = X @ Wv
Q @ K.T = (X @ Wq) @ (X @ Wk).T
= (X @ Wq) @ (Wk.T @ X.T)
= X @ (Wq @ Wk.T) @ X.T
i think you had it right the first time, but matrix multiplication doesn't commute so you can't pull out the (X @ X.T) like you did
that is a pretty interesting interpretation of what's going on though
The first three equations don't make sense dimensionally if you check. And the equation you're solving results in scalar multiplication of a single matrix, which doesn't really do anything , gotta be the other way I think.
you're right, the first 3 lines are swapped
Which means that the true interpretation of self attention is that the network learns custom dot product metrics, which is super elegant.
precisely
that's actually the whole point!
it's basically a "soft lookup" , hence the names "query", "key", and "value"
But did they try to do two matrices instead of three and didn't work out as well ? Even tho they're equivalent descriptions ? Or did they not realize what they were doing ? A single learnable metric tensor oughta be more efficient
is there a way to reduce this to a single linear transformation of (X @ X.T) or X? Wq @ (X @ X.T) @ Wk.T
oh, i forgot to swap the other lines
lol, hang on
X @ X.T ends up being dot product
Assuming X is only one vector, that's a single number
ah yeah
(which you can assume without loss of generality)
well yeah, that's the point
if X is one vector, that means the input sequence had one token
Yes
but anyway you were right after all, you get X @ <something> @ X.T
so let me think that through, why you wouldn't want to just have "something" there
Something like that, where C is a compression transformation and M is a metric tensor
it's entirely possible that models which work on a single sequence (not on a pair of encoded and decoded sequences) already do this as an optimization
that or it actually doesn't work as well, that i would not know
i'm also not sure it allows for masking
You can include it outside of all of it, when you get a square matrix
Like uhm
(mask) @ (custom dot product thing)
And then apply softmax, etc
ah nvm, im looking at the attention is all you need paper now to confirm, and they do the masking after QK anyway
It's a pretty cool idea this whole thing, but am super curious if the Wq and Wk are really needed and why, and if not why didn't they know it
again i think in the most general case it allows for two different sequences, encoder and decoder
it's probably how they arrived at the concept
why they didn't simplify after, i'm not sure
What do you mean by two different sequences ?
like in a machine translation scenario, you train it on pairs of e.g. english and spanish sentences
I haven't gotten to that part yet.
but GPT doesn't do that
as far as i understand, that was one of the earlier use cases of transformers, although one of our local NLP experts would know better than i would
GPT and BERT came out later than Attn Is All You Need
They use a single branch isn't it. Instead of encoder decoder thing
right
interestingly nanogpt (the one you were looking at in the visualization tool) also doesn't do this
https://github.com/karpathy/nanoGPT/blob/eba36e84649f3c6d840a93092cb779a260544d08/model.py#L29-L76 should be relatively easy to modify this code to use your idea
there's quite a lot of research now on making self-attention fast, eg. https://arxiv.org/pdf/2205.14135.pdf but no mention of this particular transformation
Ah I can't look at it, I'm implementing from scratch
well if you're implementing your own, you should be able to get the same or similar results doing it your way vs. the usual way
that'd be an interesting experiment, to compare training times and results
Yeah if no one's doing it, kinda sounds like a paper cuz it's one less operation per head right
the fact that it's not being done even in extremely optimized implementations makes me think we're just missing something
I mean is a super simple mod, so I doubt anyone hasn't tried it yet
Exactly
hm.. is it actually one less operation?
i know it's one less matrix multiply, but the dimensions involved are bigger
originally you have (d_batch,d_model x d_model,d_key) x (d_batch,d_model x d_model,d_key).T so the inner multiplication is between matrices of relatively small dimension d_key
it's the same number of dot products, but the dot products are between smaller vectors
hm... no, that doesn't matter. because you're kind of proposing that the dot products themselves are essentially pre-computable
keras uses some kind of einsum magic, not willing to muddle through it right now 😆 https://github.com/keras-team/keras/blob/v3.0.1/keras/layers/attention/multi_head_attention.py#L626
keras/layers/attention/multi_head_attention.py line 626
def _build_proj_equation(free_dims, bound_dims, output_dims):
Is there anyone who know why one would use Bootstraps in a structural equation model?
Disclaimer: I don't know SEM at all but you'd bootstrap to get a confidence interval on methods that don't give it to you "out of the box"
Alright! Thank you. : )
As you're not familiar with SEM, you may not know why semopy is not able to calculate the r-square.
Hi
Is there any one having good knowledge of opencv and ml
I want to build a project for that I need some navigation I can make possible that If any one is here who will help me then please reach me
I have very good project and we can build it together
Then DM me
Hi all, any resource for learning generative ai using python?
Guys imma planning to develop a ml model so can u guys suggest some fresh and new ideas with some complexity involved ?
@last ivy yes I have idea
During training you're going from 2*d*d_k parameters to d*d. So at least during training the condition for the first being more efficient is that 2*d_k < d.
in the case of nano gpt, 2*1/3*d < d -> 2/3 < 1
so the way it's done makes training more efficient
and if the other way around is more efficient for inference, it should be possible to reduce one form to the other
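The parameter-count comparison can be sketched in a few lines. The function name and the example sizes are hypothetical, chosen only to illustrate the 2*d_k < d condition:

```python
def qk_param_counts(d_model: int, d_k: int) -> tuple[int, int]:
    """Return (factored, merged) parameter counts for one attention head."""
    factored = 2 * d_model * d_k     # Wq and Wk, each d_model x d_k
    merged = d_model * d_model       # one d_model x d_model matrix
    return factored, merged

# Hypothetical sizes: d_model = 768 split over 12 heads gives d_k = 64.
factored, merged = qk_param_counts(768, 64)
print(factored, merged, factored < merged)
```

The factored form has fewer parameters whenever 2*d_k < d_model, which holds comfortably once d_model is split across several heads.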
Could someone smarter than me tell me why the resulting matrix doesnt have ones along its diagonal? Even though the paper explicitly states that the sqrt of a matrix has to be the original matrix when taking the dot product with itself ```
m.pearsonsCoefficient(covMatrix)
array([[ 0.60948941, -0.06662308, -0.59805044],
[-0.06662308, 0.00828873, 0.03770686],
[-0.59805044, 0.03770686, 1.34752355]])
https://paste.pythondiscord.com/OV5A
Yes I tested this method and the sqrt method works just fine as youd expect. Sigma is in this case the covariance matrix
right. the idea of using one for training and the other for inference is interesting. might be worth experimenting with
I am a newbie. can anybody give a road map for AI.
Sigma here is just the covariance matrix, it's not the eigendecomposition
that expression is just dividing each element by the square root of the product of its corresponding variances
Im not a native speaker, so in simple english; are you just supposed to divide the cov Matrix element wise?
i'm not sure what eigenVectors**0.5 * np.linalg.inv(eigenVectors) @ eigenVectors is supposed to do, but maybe i'm just too rusty with the math here
sorry it's supposed to be eigenValues**0.5 * np.linalg.inv(eigenVectors) @ eigenVectors but i still dont get the 1 along the main diagonal
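import numpy as np

def cov(x):
    # rows are observations, columns are variables
    m = x.mean(axis=0)
    c = x - m
    return (c.T @ c) / (x.shape[0] - 1)

def corr(x_cov):
    # D^(-1/2) @ Sigma @ D^(-1/2), where D is the diagonal of the covariance matrix
    x_vars_sqrt = np.diag(np.diag(x_cov) ** -0.5)
    return x_vars_sqrt @ x_cov @ x_vars_sqrt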
it should just be this
there's probably a way to rewrite x_vars_sqrt @ x_cov @ x_vars_sqrt using numpy broadcasting instead of constructing x_vars_sqrt explicitly. but the code above is the typical formula. it's also what's shown in your screenshot
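One possible broadcasting version, as a sketch. The name corr_from_cov and the synthetic data are assumptions for illustration, and np.corrcoef is used only as a cross-check:

```python
import numpy as np

def corr_from_cov(x_cov):
    # standard deviations are the square roots of the diagonal (the variances)
    std = np.sqrt(np.diag(x_cov))
    # std[:, None] * std[None, :] broadcasts to a matrix with entries std[i] * std[j]
    return x_cov / (std[:, None] * std[None, :])

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))        # synthetic data: 200 observations, 3 variables
x_cov = np.cov(x, rowvar=False)
r = corr_from_cov(x_cov)

print(np.diag(r))
print(np.allclose(r, np.corrcoef(x, rowvar=False)))
```

This divides each element by the square root of the product of its two variances, so no diagonal matrix has to be built.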
But wouldnt this contradict the assumption A^0.5A^0.5 is A?
ah, you're trying to solve for the square root of the diag matrix that way
Yeah im honestly kinda confused as well but I just went along as the author said and here I am 😉
In mathematics, the square root of a matrix extends the notion of square root from numbers to matrices. A matrix B is said to be a square root of A if the matrix product BB is equal to A. Some authors use the name square root or the notation A^(1/2) only for the specific case when A is positive semidefinite, to denote the unique matrix B that is po...
in this case a = np.diag(np.diag(x_cov)). the formula says that you want the square root of that thing. but we know by construction that a is diagonal, so we can use the special case formula where we just take the square roots of the elements
it should make sense intuitively... what is the result, in general, when you multiply two diagonal matrices?
i believe all the more-general matrix square root techniques depend on that result for diagonal matrices
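A small check of the diagonal special case, with made-up numbers. Multiplying two diagonal matrices just multiplies their diagonals elementwise, so the elementwise square root is a valid matrix square root here:

```python
import numpy as np

variances = np.array([4.0, 9.0, 0.25])   # made-up diagonal entries
a = np.diag(variances)
b = np.diag(np.sqrt(variances))          # elementwise square root of the diagonal

print(np.allclose(b @ b, a))
```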
in any case, you shouldn't need to compute the eigendecomposition here
again, Σ in your text is just the covariance matrix, it's not related to eigenvalues
Yep, thank you i now get along the diagonal only ones, which makes sense since they correlate to each other in a 1 to 1 ratio
Honestly I started with implementing a MLP, it's really up to you and how you feel. But I think it is always a good idea to begin somewhere simple so that you get a hands down experience with the fundamentals, again its just an opinion
hello, is there anyone who's more familiar with time series? more specifically price data
Always ask your actual question. Don't ask to ask. Even if someone does know about time series, they need to know the actual question to start helping.
not really sure how to word it, but I want to represent multiple chart patterns as features and I don't really know where to start
for example, I want to represent a wedge pattern as something I could feed into a model
By the help of an AI model i want to assess mental health of a kid using a survey/questionnare. But the problem is i dont have appropriate data set to train my model for this. What should I do in this case. can the concept of coldstart help in this case. If yes then how ? Also if i use 10-20 questions then is there a way to make the model learn from itself, like can i apply reinforced learning in this. If yes then how ?
you wouldn't use reinforcement learning for this. (it's reinforcement learning, not reinforced learning.)
what is the format of the questionnaire? Are they open-answer, or is it things like "I often feel like something bad is going to happen. Strongly agree, agree, neutral, disagree, strongly disagree."
yes something like this only, no open-answers. But we've to remember that questions will be specifically for children. So whatever you think might be used, please elaborate on it. I am in middle of a competition and I need some clarity on it. @serene scaffold
requesting anyone to please help. I need some information immediately. Kindly understand
what is the competition asking you to do?
we just have to assess the responses of the questions and based on that we have to show a rating. Thats it.
are you required to use ML?
@serene scaffold
Is there any one having good knowledge of opencv and ml
I want to build a project for that I need some navigation I can make possible that If any one is here who will help me then please reach me
I have very good idea and we can build it together and if not can you guide me through it
What are you looking to build?
I want to build a logging tracking system with opencv with addition of ml algo
I just need of some guidance if someone having clear vision of ml opencv
We can have a little conversation
Can I DM you
Does anyone here have experience with the Bureau of Labor Statistics series IDs? I'm working toward collecting more targeted data for analysis but the process of ascertaining data from their IDs is cumbersome.
Doesn't seem like an AI use case at all. There are standardized questionnaires and ways to score them in psychology
People don't typically engage with such requests here, it's best that you ask specific questions e.g., "how do I do this in opencv" or "this is how I'm approaching it, is there anything wrong with it" compared to "I need guidance on task X, can anyone help?"
anyone here interested in joining a hackathon, need people with decent data science and ai skills. Drop a dm, making a team
|| My previous team i joined all quited so making my own team ||
any ideas on how I could represent chart patterns on time series as features?
for example, I want to represent wedges or triangles on portions of the graph as some parameters that define them (as I would represent a trendline for the last i-X days with a slope at point i)
if i1==3:
    a1=df['Year'].value_counts()
    print(pd.DataFrame(a1,index=['A','B','C','D','E','F','G']))
getting all indeces as nan
any clue why?
@blazing vale df['Year'].value_counts() returns a Series where the indices are values from df['Year'] and the actual values are integers for how many times that index appeared in df['Year']
so, why are you trying to change the index to letters? what would that even mean?
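A toy reproduction of the NaN result (the data is invented). Passing index= to the DataFrame constructor looks the labels up in the existing index rather than renaming anything:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2017, 2017, 2016, 2017, 2016, 2015]})  # toy data
counts = df["Year"].value_counts()

# None of 'A', 'B', 'C' exist in counts.index (which holds years),
# so aligning on those labels finds nothing and every row is NaN.
relabelled = pd.DataFrame(counts, index=["A", "B", "C"])
print(relabelled)

# If different row labels are really wanted, overwrite the index instead.
renamed = counts.copy()
renamed.index = ["A", "B", "C"]
print(renamed)
```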
What is the idea behind this transformation?
# Here we map each temporal variable onto a circle such that the lowest value for that variable appears right next to the largest value. We compute the x- and y- component of that point using the sin and cos trigonometric functions.
df['day_sin'] = np.sin(df.day*(2.np.pi/31))
df['day_cos'] = np.cos(df.day(2.np.pi/31))
df['month_sin'] = np.sin((df.month-1)(2.np.pi/12))
df['month_cos'] = np.cos((df.month-1)(2.*np.pi/12))
it's so that "31" ends up spatially close to "1"
mathematically it's like turning the number line into a circle
i suggest actually drawing it out on a unit circle
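A numeric version of "drawing it out", as a sketch. The helper names are hypothetical, and period=31 follows the snippet in the question:

```python
import numpy as np

def encode_day(day, period=31):
    # map day-of-month onto the unit circle
    angle = day * (2.0 * np.pi / period)
    return np.array([np.sin(angle), np.cos(angle)])

def circle_distance(a, b):
    # straight-line distance between the two encoded points
    return float(np.linalg.norm(encode_day(a) - encode_day(b)))

print(circle_distance(31, 1))    # neighbours across the month boundary
print(circle_distance(1, 2))     # ordinary neighbours
print(circle_distance(1, 16))    # roughly opposite sides of the circle
```

Days 31 and 1 end up exactly as close as days 1 and 2, which is the wrap-around a raw integer column cannot express.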
btw you probably want to use code formatting for this:
df['day_sin'] = np.sin(df.day*(2.*np.pi/31))
df['day_cos'] = np.cos(df.day*(2.*np.pi/31))
df['month_sin'] = np.sin((df.month-1)*(2.*np.pi/12))
df['month_cos'] = np.cos((df.month-1)*(2.*np.pi/12))
!code
that said i don't know if day-of-month is all that useful
maybe for things like paydays and stuff that's on a bimonthly cycle
what's the difference between sklearn iterations and epochs?
but months are inconsistent in size and occur somewhat arbitrarily at various weekdays, so i think you'll have relatively low signal from day-of-month
that's going to depend on the particular model you're looking at. scikit-learn has dozens. can you be more specific?
for example MLPClassifier doesn't have epoch value
It only accepts max_iter
when i plot my loss_curve, iteration is just 12
but if i check internet other models return as a epoch
this is my loss curve
they're probably just calling it something different. you'll have to check the docs to see what each parameter means
Ty this helped alot!
what's a good dataset to practice training an ai on?
For what
Greetings everyone, I have a question, who has actually managed to fully train a functional Multi Linear Regression model (at least more that 6 features) using their off-the-shelf pc/laptop or at least Google Colab?
linear regression? i've trained linear regression models with thousands of features and millions of rows on my laptop in a few minutes
the only thing you really can't do on off-the-shelf general-purpose hardware nowadays is deep learning for massive models
linear regression on moderately large datasets has been doable on off-the-shelf general-purpose hardware since the 90s
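For a feel of the scale, here is a sketch that fits ordinary least squares on synthetic data with plain NumPy. The sizes are arbitrary, and timings will vary by machine:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 100_000, 50           # arbitrary sizes for illustration
X = rng.normal(size=(n_rows, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_rows)

start = time.perf_counter()
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
elapsed = time.perf_counter() - start

print(f"fit took {elapsed:.3f}s, max coefficient error {np.abs(w - true_w).max():.4f}")
```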
These are common tensorflow models plots. If you are using skelarn MLPClassifier, iterations are same as number of epochs. Usually in DL frameworks, one epoch means N iterations where N is basically (total samples / batch size). Maybe thats why its a bit confusing when we set max_iter parameter in MLPClassifier.
sklearn*
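A small sketch of that on synthetic data, assuming the adam solver and an arbitrary tiny network. With a stochastic solver, n_iter_ and the length of loss_curve_ both count epochs:

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # synthetic data

clf = MLPClassifier(hidden_layer_sizes=(16,), solver="adam",
                    batch_size=50, max_iter=12, random_state=0)
with warnings.catch_warnings():
    # 12 epochs is usually too few to converge, so silence that warning here
    warnings.simplefilter("ignore", ConvergenceWarning)
    clf.fit(X, y)

print(clf.n_iter_)              # epochs actually run (at most max_iter)
print(len(clf.loss_curve_))     # one loss value per epoch
```

Each epoch here contains 500 / 50 = 10 minibatch updates, which is what other frameworks tend to call iterations.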
Thanks sir
can you tell me which python libraries skills are needed for this hackathon ?
Hi Guys, i wanted to add value to the bar chart but it ended with error. where i did wrong?
I think x,y are not defined, in this case maybe the sns.barplot args should be the right arguments for addlabels
addlabels(class_distribution.index, class_distribution)
Yeah i was just being dumb here lol.
btw anyone knows how to get years along with this in output
if i1==4:
    a1=pd.DataFrame(df['Year'].value_counts())
    print('Year in which most number of games were released',a1.max(),'\n Year in which least number of games were released ',a1.min())
    space()
so i have a csv dataset of 7 years which consists all the info of games released on the ps4 console
2013-2020
however this piece of code is giving me only max number of games
and min number of games
it isnt giving me the years along with it
so value_counts returns a series that is automatically sorted for you, there's no need to turn it into a dataframe for this. Here's an example of how you can use it for this:
>>> a = pd.DataFrame({'Year': [1997, 1998, 1997, 2005, 2005, 2005]})
>>> a1 = a['Year'].value_counts()
>>> print(f'Year where the most games were released: {a1.index[0]} ({a1.iloc[0]} games)\nYear where the least games were released: {a1.index[-1]} ({a1.iloc[-1]} games)')
Year where the most games were released: 2005 (3 games)
Year where the least games were released: 1998 (1 games)
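An alternative sketch that doesn't rely on the sort order, using idxmax and idxmin on the same toy data:

```python
import pandas as pd

a = pd.DataFrame({"Year": [1997, 1998, 1997, 2005, 2005, 2005]})
counts = a["Year"].value_counts()

# idxmax/idxmin return the index label (the year), max/min return the count
most_year, most_n = counts.idxmax(), counts.max()
least_year, least_n = counts.idxmin(), counts.min()
print(f"most: {most_year} ({most_n} games), least: {least_year} ({least_n} games)")
```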
ohhh thankks mann
if i may ask what is a1.index[0] needed for here. Cant i also just write its iloc and print it? @small wedge
i am getting the same output but its giving me the name and dtype too. anyway to remove that from output?
Is it ok I tell you on dms?
Yes
hm can can you show what you're running? unless we have very different versions of pandas installed iloc should only give you the counts of each year; these counts are indexed via the years that they represent which is why I used a1.index
ohh how do check it?
what version i have
i can share my output
this is for value_counts
Year
2017 254
2016 222
2015 172
2014 98
2018 39
2013 20
2019 12
2020 8
pip show pandas I'm using 2.1.1
ok
i am using 2.0.3
Year in which most games were releasedcount 254
Name: 2017, dtype: int64,Year in which most games were releasedcount 12
Name: 2019, dtype: int64
interesting
if i use only iloc i get this output
i am using a dataset of 826 rows
and 9 columns
if i1==4:
    a1=pd.DataFrame(df['Year'].value_counts())
    print(f'Year in which most games were released{a1.iloc[0]},Year in which most games were released{a1.iloc[6]}')
    space()
ohh waittt i forgot to do that
lol
lemme make changes
working properly now 🫡
thankssss
np
but i still have a question
if i use it with a df why it returns name and dtype as well
but when i use the same func with series it doesnt do so
thats strange lol
and cool at the same time
because a dataframe returns a series when you index via iloc
but a series returns the value that's at the index
ohhh
so series just returns the value
whereas df returns a series the value along with name and dtype
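The difference is easy to see on a toy example (invented data):

```python
import pandas as pd

df = pd.DataFrame({"Year": [2017, 2017, 2016]})   # toy data
counts = df["Year"].value_counts()                # a Series
as_frame = pd.DataFrame(counts)                   # a one-column DataFrame

from_series = counts.iloc[0]     # a single number
from_frame = as_frame.iloc[0]    # a whole row, returned as a Series

print(type(from_series).__name__, from_series)
print(type(from_frame).__name__)
print(from_frame.iloc[0])        # index into the row again to get the number
```

Printing a Series inside an f-string includes its Name and dtype footer, which is where the extra text in the output came from.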
if c==5:
    while True:
        print('''Enter 1 to get Total Sales of all games\n
Enter 2 to get Total sales in each genre and by each publisher\n
Enter 3 to get game info about the games with Maximum and Minimum Sales across each Region and ROW Sales\n
Enter 4 to get Maximum and Minimum Sales made by each publisher across each Region and ROW Sales\n
Enter 5 to get Maximum and Minimum Sales made in each genre across each Region and ROW Sales\n
Enter 6 to return to previous Menu''')
        space()
        i1=eval(input('Enter your choice: '))
        space()
        if i1==1:
            print(df[['Game','Year','Genre','Publisher','Global']])
            space()
        if i1==2:
            if df['Global']=='Action':
                a1=df['Global'].sum()
@small wedge
Here
mhm
lemme show the error
Traceback (most recent call last):
File "C:\Users\LENOVO\Desktop\IP Project\Ip project101.py", line 130, in <module>
if df['Global']=='Action':
File "E:\lib\site-packages\pandas\core\generic.py", line 1466, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
error
data set
what are you trying to do with these lines?
if df['Global']=='Action':
    a1=df['Global'].sum()
trying to find total global sales of each genre
ah ok
here are all the genres if needed 'Action','Shooter','Action-Adventure','Sports','Role-Playing','Platform',
'Racing','Fighting','Adventure','MMO','Simulation','Music','Party','Strategy','Puzzle','Visual','Novel','Misc'
so when you do df['Global']=='Action' this creates a boolean series, a mask of True/False values with one entry per row
you can use this mask to index your dataframe, then take the sum from the result instead
df[df['Global']=='Action'].sum()
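A toy sketch of the mask idea. The data is invented, and it assumes the genre names live in the Genre column while Global holds the sales figures, as the column list printed earlier suggests. Comparing the numeric Global column against 'Action' would match no rows at all:

```python
import pandas as pd

# Toy stand-in for the games table (assumed column names: Genre, Global)
df = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Racing"],
    "Global": [1.5, 2.0, 0.5, 1.0],
})

mask = df["Genre"] == "Action"       # boolean Series, one True/False per row
try:
    if mask:                         # a whole Series has no single truth value
        pass
except ValueError as err:
    print("ValueError:", err)

# select the matching rows, then sum only the sales column
action_total = df.loc[mask, "Global"].sum()
print(action_total)
```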
s'all good, sometimes you get lost in the sauce
Yeah
Hey if i wanna do it all for once for all genres
Then should i pass a list of everything there
Ohh wait then that would do sum of everything too 😭
yeah and use df['Global'].isin(['Action','Other stuff', ...])
Should i define this function?
And use it again and again just by giving the name
Of the genre
This would reduce the typing and copy pasting part alot lol
you could, you could also use groupby to split them all up for you
then select the groups you want and take their sums
can you tell me what isin is?
a function for creating masks that match more than one category
Ohh
just like a == 'a' or a == 'b' is cleaner to write as a in [*'ab'], in pandas you use isin for the same thing
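Both suggestions side by side, on invented data with assumed Genre and Global columns. The isin mask is ordinary boolean indexing, and groupby gives every genre's total at once:

```python
import pandas as pd

df = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Racing", "Sports"],
    "Global": [1.5, 2.0, 0.5, 1.0, 3.0],
})  # toy data; column names are assumptions

# isin builds one boolean mask that matches any of several categories
picked = df[df["Genre"].isin(["Action", "Racing"])]
print(picked["Global"].sum())

# groupby computes the total for every genre in one go
totals = df.groupby("Genre")["Global"].sum()
print(totals)
```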
I dunno but i think we call it boolean indexing here(the masks)
I dunno if its the same thing lol
I've never heard it called that but it would make sense to call it that
Yeah
Cause it returns true and false when checking condition in df
And series too
Is it the same thing?
I am almost done with my project lol thanks to you
is what the same thing?
nice lol, happy to help
like boolean indexing and masking
probably but idrk 
@small wedge hey 🥲
Game 0
Year 0
Genre 0
Publisher 0
North America 0.0
Europe 0.0
Japan 0.0
Rest of World 0.0
Global 0.0
getting this output
In Pytorch how do you create a zero(?) dim tensor with a single value...
A couple of metrics return a tensor(0.4) but I have no idea how that is created 
show what you wrote