#data-science-and-ml


maiden arch
#

nevermind, i made it work i think, kinda

marble raft
#

Hi

#

Does anyone understand tensorflow i need help

maiden arch
#

ok i did it but there are no graphs visible?

boreal gale
maiden arch
#

haha the color is white

#

thats y it is not printing

#

lol

maiden arch
dense cloud
#

I'm trying to upgrade some old code from Pandas 1.0 to 1.2 and I'm getting this error: TypeError: Expected unicode, got pandas._libs.properties.CachedProperty. Internet says that I have to set frequency ( df = df.asfreq("1D") ), but the issue is I have multiple places where this happens and frequency is different...is there a generic solution for this one?

edit: it seems that frame.index.freq = frame.index.inferred_freq is what I need
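
For the "multiple places, different frequency" part, a small helper along these lines might be enough. This is only a sketch: `set_inferred_freq` is a made-up name, and it assumes each frame has a regular `DatetimeIndex` (if the spacing is irregular, `inferred_freq` is `None` and nothing is set).

```python
import pandas as pd


def set_inferred_freq(frame):
    """Attach the inferred frequency to an index that lost it (hypothetical helper)."""
    inferred = frame.index.inferred_freq  # e.g. "D", or None if nothing can be inferred
    if inferred is not None:
        frame.index.freq = inferred
    return frame


# Toy frame: an index built from plain strings carries no freq
daily = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)
freq_before = daily.index.freq
set_inferred_freq(daily)
print(freq_before, "->", daily.index.freqstr)
```

Calling the helper once per frame avoids hard-coding `"1D"` or any other frequency at each call site.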

maiden arch
# boreal gale let me see if i can point you to the right direction. ax.bar(dates, volumes,...

```py
import pandas as pd
import matplotlib.pyplot as plt

# Read the stock data
stockData = pd.read_csv("/home/needjobcoder/devlopment/python/dataSciencePractice/practice/stockMarket/indexProcessed.csv")

# Convert 'Date' column to datetime
dates = pd.to_datetime(stockData['Date'])

# Extract columns
high = stockData['High']
low = stockData['Low']
_open = stockData['Open']
close = stockData['Close']

# Combine columns into a single NumPy array
stock_array = stockData[['High', 'Low', 'Open', 'Close']].values
print(len(stock_array))


dates = dates.to_numpy().flat
print(len(dates))

# Create a boxplot
fig, ax = plt.subplots()
VP = ax.boxplot(stock_array, positions=dates, widths=0.6, patch_artist=True,
                showmeans=False, showfliers=False,
                medianprops={"color": "white", "linewidth": 0.5},
                boxprops={"facecolor": "C0", "edgecolor": "white",
                          "linewidth": 0.5},
                whiskerprops={"color": "C0", "linewidth": 1.5},
                capprops={"color": "C0", "linewidth": 1.5})

ax.set(xlim=(0.5, 4.5),
       ylim=(0, stock_array.max()),
       )

plt.savefig('candlestick.png')
```

#

ValueError: List of boxplot statistics and positions values must have same the length

#

it is giving this but len of dates and stock_array is same

hybrid maple
#

I am trying to train a model to predict loan eligibility, and I am getting this error:
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 607, 11), found shape=(None, 11)

this is my code:

```py
import pandas
import tensorflow
from sklearn.model_selection import train_test_split


dataset = pandas.read_csv('/Users/oliverjohnson/loan-eligibility-predictor/loan-train.csv')
x = dataset.drop(columns=['Loan_Status'])
y = dataset['Loan_Status']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.01)

model = tensorflow.keras.models.Sequential()

model.add(tensorflow.keras.Input(shape=(x_train.shape)))

#input layers
model.add(tensorflow.keras.layers.Dense(256, activation='sigmoid'))
#hidden layers
model.add(tensorflow.keras.layers.Dense(256, activation='sigmoid'))
#output layer
model.add(tensorflow.keras.layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
print('test')
model.fit(x_train,y_train, epochs=1000)
```

would appreciate any help
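
The error message suggests the `Input` layer was given the shape of the whole training set, rows included. Keras expects the shape of a single sample and adds the batch dimension itself. A minimal sketch with a NumPy stand-in for `x_train` (the 607 x 11 shape is taken from the error message; the real frame may also need its text columns encoded as numbers, which is not shown here):

```python
import numpy as np

# Stand-in for the training data: 607 rows, 11 feature columns (assumed from the error)
x_train = np.zeros((607, 11), dtype="float32")

full_shape = x_train.shape            # (607, 11): includes the number of rows
per_sample_shape = x_train.shape[1:]  # (11,): what a single row looks like

# Keras prepends the batch dimension (None) on its own, so the layer would be declared as
#   tensorflow.keras.Input(shape=per_sample_shape)
# and the model then expects (None, 11) instead of (None, 607, 11).
expected_batch_shape = (None, *per_sample_shape)
print(full_shape, per_sample_shape, expected_batch_shape)
```
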
long canopy
#

ty for the answer!

#

ended up using Gephi!

#

I really just meant, the best way to be able to visualize and have a high-level overview of said network

#

turns out i'll need to do some programming work on this, I need to visualize a particular graph in a very specific manner

iron basalt
#

You can change the BLAS library selection ordering (preferred order) when building Numpy from source. I don't think there is any proper way to handle libraries in Windows. On Linux this would be searching lib directories / using pkg-config.

#

(You could also move MKL so it does not find it in the expected location and uses the next option)

desert oar
#

okay, that makes sense. but what's with the groupings? you're interested in how the relationship between these independent variables and the dependent variable changes across 4 physical test locations?

#

this is a very traditional statistical modeling scenario... do you expect strongly nonlinear effects here? if not, i'd suggest maybe going for a more probabilistic model here

#

but if you want to use the random forest approach, i suggest just fitting the random forest model with all 3 of those dependent variables and categorical features as needed to describe the location

#

however i'm concerned if you only have 4 measurements, a random forest model will be approximately useless

#

and the measurements of a time series don't really count, each time series is effectively one measurement

quaint loom
#

Thank you so much for your explanation. And you’re totally right. The validity is of course robust but it gives us a more reasonable direction of thinking. I did figure out the issue: Matplotlib was cutting the flow, causing it to not do the random forest test on the other areas. @desert oar

quaint loom
# desert oar okay, that makes sense. but what's with the groupings? you're interested in how ...

No. Not exactly. I want to know which parameters (independent variables) are mostly the cause of the dependent variable. I will be using several other methods (such as the Mantel test and SEM, when I am able to build the model) and additional experiments to understand the mechanism behind it. I do have more than 4 measurements. Actually I have sampled weekly for over 1 year, so it is not about not having enough data but more about how to use the data and understand it in modeling and through statistical tests.

long canopy
#

are there any measures of data coverage, or data control? 100% coverage meaning, all the scoped data is maximally useful and none of it is wasted

#

or at least, 100% means, all data has been accounted for, or all data has been involved somehow, etc.

#

a measure of data being-accounted-for, data-accountedness?

past meteor
long canopy
past meteor
#

I don't mean this in a bad way but I have no clue what you mean. Could you give a concrete example / use case?

long canopy
#

hm, thanks for the question, definitely this needs more definiteness

wooden sail
#

i guess WSL and fiddling with packages and directories it is

wooden sail
#

this is exactly what i was hoping would exist

iron basalt
#

It's doing dynamic library manipulation / injection basically. Similar to what would be done with manual directory fiddling and stuff.

wooden sail
#

it does appear that this is linux only though

iron basalt
#

Yeah, Windows no idea.

wooden sail
#

ok, but doing this in wsl is still good

iron basalt
#

Windows does not work like Linux in this way, Linux prefers lots of dynamic libs in standard locations that are meant to be swapped out (e.g. for a security patch).

#

So that programs can be updated without recompiling everything.

wooden sail
#

mhm, makes sense

iron basalt
#

Windows's style is more that everything ships with its own copy of every lib (often statically linked too), and then "installs" by moving it to anywhere and editing the registry and path and such.

#

This means it's more annoying to develop on since the libs are not all in a standard spot, but the tradeoff is that if multiple libs depend on some DLL, and that gets updated by one of them installing, it does not break the rest.

#

(Which is part of why Windows apps don't break all the time like with Linux (unless you are using stuff like Flatpak specifically meant to avoid this issue))

wooden sail
#

right, the newer pre-packaged stuff like Flatpak and Snap take a similar container-like approach

iron basalt
#

(Which is partially why Musl exists)

wooden sail
#

i'll have to go read about musl, had never heard of it before

long canopy
#

anyone done Gephi plugin programming before?

shell ruin
#

Im working through some EDA and came across this warning. I feel that I am handling the issue that its warning me about. Am I misunderstanding this?

simple snow
#

Hey guys! I have a data feature which is positively skewed and I want to use it for linear programming. I checked the skewness and ran a Shapiro-Wilk test, and after applying a logarithmic transformation the skewness decreased, but the Shapiro-Wilk and K-S tests both still reject normality. I've got around 400 data points, should I try any more transformations or remove outliers?

simple hound
#

Hello, I'm kinda new to machine learning stuff and i wanted to ask if someone knows a good book or free course for starting w it. I want to learn some AI related stuff so... any advise would be great.

desert oar
# shell ruin Im working through some EDA and came across this warning. I feel that I am handl...

it's because matches_raw is itself a "slice" of another data frame. matches_raw = data[...]

if you explicitly meant to make a copy, use matches_raw = data[...].copy(). if you want your changes to apply to data and not just matches_raw, don't slice off any columns.

also in the future it's much easier for people to help you if you share code as text, not a screenshot. use https://paste.pythondiscord.com for sharing
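
A tiny illustration of the `.copy()` route, for reference. The frame and column names here are invented; only the `matches_raw = data[...]` pattern comes from the message above.

```python
import pandas as pd

# Invented example data standing in for the original frame
data = pd.DataFrame({"team": ["a", "b", "c"], "goals": [0, 2, 3]})

# Explicit copy: matches_raw is its own frame, so assigning to it is unambiguous
matches_raw = data[data["goals"] > 0].copy()
matches_raw["scored"] = True

print(matches_raw)
print(list(data.columns))  # the original frame is left untouched
```
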

marble raft
desert oar
#

so you might want to try xgboost or similar in addition to your NN

marble raft
#

ok

hybrid maple
marble raft
hybrid maple
#

only took like 30 secs on my m2

marble raft
#

for 100 epochs it took 10 minutes

hybrid maple
#

only passively cooled also

marble raft
hybrid maple
#

the dataset is only like 600 entries though

marble raft
#

do u want to make a project together

hybrid maple
#

i know literally nothing about ai and ml i doubt id be a very useful partners lol

marble raft
#

i can help u

#

i have 1 year xp

#

but i am still learning

hybrid maple
#

i watched this 15 min video on how get started with neural networks and tried to apply that to another dataset

#

so far had 0 luck, 30% accuracy lol

marble raft
#

oof

#

i can help

hybrid maple
#

that would be much appreciated

#

i am trying to predict loan eligibility

marble raft
#

ok

#

lets talk in the dms

hybrid maple
#

sure

brazen spire
#

What are the options to compute how far an object is from the edge?

shut girder
#

Is exploratory data analysis all that is needed to solve a given problem? I hear people say that EDA is just a step in the data analysis process and that insights from EDA can be used for further steps and analysis, is this true?

serene scaffold
jaunty geyser
#

What is a language model that can run on an Intel i3 CPU?

frigid creek
#

hi, im new with machine learning and stuff, but would like to know in object tracking, with like deepsort, is it possible to count the object tracked from the track id or is it not? why do most still use roi line to count or is there other method to count? thanks

abstract wasp
#

Does any one here know about any deep learning programs, schools, or online courses that really teach you everything? Not just CNNs but all of deep learning.

lofty thorn
#

From where do i learn data science

trim saddle
lofty thorn
#

any data scientist here?

trim saddle
#

I am

#

Kaggle might be a starting point

#

There are also recorded online lectures

#

If you search for it you find plenty stuff.

#

And like karpathy said, its not that important with what you start, more important that you start and put hours in.

lofty thorn
#

how old are you

trim saddle
shadow viper
#

I need a mentor 🙏🏽

serene scaffold
shadow viper
#

Sure sure... Thanks

shadow viper
#

But it feels like I'm not making any progress

#

Just making use for tensorflow keras applications (pretrained models)
I want more than that

quaint loom
#

I am currently working on developing a Random Forest model using a dataset that consists of weekly values for 16 different locations. My analysis focuses on the entire area rather than specific individual locations, which is why I've merged these locations into 4 distinct areas based on spatial considerations.

Regarding the imputation process, I am indeed using it to fill in missing values within the dataset. Specifically, when a location has missing data, I employ a method to calculate the mean value based on the remaining non-missing values from the grouped locations.

The issue I'm encountering is that after applying this imputation method, certain missing values that were initially 0 are now being replaced with unexpected values like 6. In essence, it seems like the imputation is causing non-missing values that were originally 0 to increase to 6.

I'm uncertain about the root cause of this issue and would greatly appreciate any insights or suggestions on how to resolve it. If there are any error messages or specific code segments that would aid in diagnosing the problem, please feel free to ask.

Number of missing values before imputation: 0
Number of missing values after imputation: 6

def fill_missing_values(data, columns_to_fill, area_groups):
    for column in columns_to_fill:
        for area, positions in area_groups.items():
            mask = (data['Position'].isin(positions)) & data[column].isna()
            data[column] = pd.to_numeric(data[column], errors='coerce')
            mean_value = data[mask]['Date'].dt.month.map(data[(data['Position'].isin(positions)) & ~data[column].isna()].groupby(data['Date'].dt.month)[column].mean())
            data.loc[mask, column] = mean_value

I am also dropping NaN rows, as some of the parameters I am sampling are only monthly.

Here is the complete code:
https://paste.pythondiscord.com/PDIQ
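
One possible reading of the 0 -> 6 count, without having run the full script: `mask` is computed before `pd.to_numeric(..., errors='coerce')`, so any entry that the coercion turns into NaN (for example a stray text value) is never covered by the mask and stays missing. A month with no observed value in an area would also map to NaN. A sketch that coerces first and then fills with the per-area, per-month mean; the toy data and the `"north"` group are invented, and the column names `Position` and `Date` are taken from the snippet above:

```python
import pandas as pd


def fill_missing_values(data, columns_to_fill, area_groups):
    """Fill NaNs with the mean of the same area and calendar month (sketch)."""
    data = data.copy()
    # Map every position to the area it belongs to
    position_to_area = {
        pos: area for area, positions in area_groups.items() for pos in positions
    }
    area = data["Position"].map(position_to_area).rename("area")
    month = data["Date"].dt.month.rename("month")
    for column in columns_to_fill:
        # Coerce BEFORE looking for missing values, so coerced NaNs get filled too
        data[column] = pd.to_numeric(data[column], errors="coerce")
        group_mean = data.groupby([area, month])[column].transform("mean")
        data[column] = data[column].fillna(group_mean)
    return data


# Invented toy data: one text entry and one real gap
toy = pd.DataFrame({
    "Position": [1, 1, 2, 2],
    "Date": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"]),
    "value": ["1.0", "bad", 3.0, None],
})
filled = fill_missing_values(toy, ["value"], {"north": [1, 2]})
print(filled["value"].tolist())
```

Rows whose area and month have no observed value at all would still come out as NaN, which seems preferable to silently inventing a number.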

radiant dock
#

What do you guys think would be the best way to analyze a text and give suggestions to replace phrases from a list? Cosine similarity?

shadow viper
# quaint loom I am currently working on developing a Random Forest model using a dataset that ...

I'm not really familiar with most lines of your code since I'm still a beginner but I've encountered something like this before.

What if you write an if statement that if the data in the columns are 0 it should return back 0 and see if it works.
But thinking about this, other cells that aren't 0 might have issues. So what if you say if the cells are not NaN return cells else return (whatsoever you want it to).

Again, I'm just trying to help incase I'm not being helpful or anything, still a beginner at this

quaint loom
desert oar
shadow viper
quaint loom
past meteor
# quaint loom I am currently working on developing a Random Forest model using a dataset that ...

Not an answer to your question but you're leaking a bit of data

X, y = prepare_data(area_data)
if X.shape[0] > 0 and y.shape[0] > 0:  
    rf_regressor, mae, mse, r2 = apply_random_forest(X, y, area_label)

You're not really supposed to impute and then train your model. You're imputing using the mean of the entire dataset which isn't really allowed. If I were you I would try and encapsulate your entire preprocessing and modeling into an sklearn ColumnTransformer and Pipeline.

scikit-learn's documentation is fantastic, I'd give it a read:

  1. A docs page on leakage, which is happening in your case https://scikit-learn.org/stable/common_pitfalls.html#data-leakage
  2. A docs page on pipelines etc. https://scikit-learn.org/stable/modules/compose.html
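
A rough idea of what the Pipeline / ColumnTransformer setup could look like. Everything about the data here is made up (column names, values, the 75/25 split), and the random split is only for brevity; for weekly time series a time-ordered split is probably more appropriate. The point is just that the imputer's means are learned from the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (not the real measurements)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "temp": rng.normal(15, 5, n),
    "ph": rng.normal(7, 0.5, n),
    "area": rng.choice(["A", "B", "C", "D"], n),
})
df["target"] = 2 * df["temp"] + rng.normal(0, 1, n)
df.loc[rng.choice(n, 20, replace=False), "ph"] = np.nan  # inject some gaps

X, y = df[["temp", "ph", "area"]], df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["temp", "ph"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["area"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
])

# fit() learns the imputation means from X_train only; predict() reuses them on X_test
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions[:5])
```
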
umbral charm
#

You see at around x = 2.5 and x = 6.1 there are 2 basically straight blue lines

#

this is because my function is like 1/tan(x) and thus goes to infinity and comes back

#

How do i stop this line from being plotted

#

I dont want them to join up
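
One common trick for this, sketched under the assumption that the function really behaves like 1/tan(x): replace the huge values near each asymptote with NaN, because matplotlib does not draw a line segment through NaN points. The helper name and the cutoff of 10 are arbitrary choices.

```python
import numpy as np


def break_at_asymptotes(y, limit=10.0):
    """Replace values beyond +/-limit with NaN so the plotted line is interrupted there."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) > limit, np.nan, y)


x = np.linspace(0.1, 6.0, 600)
y = 1.0 / np.tan(x)
y_masked = break_at_asymptotes(y, limit=10.0)
print(int(np.isnan(y_masked).sum()), "points masked")
```

Passing `y_masked` instead of `y` to `plt.plot(x, y_masked)` should leave a gap where the near-vertical lines were; `plt.ylim(-10, 10)` keeps the axis range matching the cutoff.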

toxic mortar
#

Why do I have to have an insanely small learning rate in order not to get an overflow runtime error?

import numpy as np
import matplotlib.pyplot as plt
import copy

def compute_gradient(w,b,x,y):
    djdw=np.zeros(x.shape[1])
    djdb=0.0
    for i in range (x.shape[0]):
        err=(np.dot(w,x[i])+b)-y[i]
        for j in range(x.shape[1]):
            djdw[j]+=err*x[i,j]
        djdb+=err
    return djdw/x.shape[0],djdb/x.shape[0]
def gradient_descent(alpha,epoch,_w,_b,x,y):
    w=copy.deepcopy(_w)
    b=_b
    for _ in range(epoch):
        djdw,djdb=compute_gradient(w,b,x,y)
        w=w-alpha*djdw
        b=b-alpha*djdb
    return w,b

if __name__ == '__main__':
    x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
    y = np.array([460, 232, 178])
    w = np.zeros(x.shape[1])
    alpha = 5.0e-7
    epochs = 1000
    w, b = gradient_descent(alpha, epochs, w, 0, x, y)
    predicted_y = np.dot(x, w) + b
    feature_index = 0
    plt.scatter(x[:, feature_index], y, color='blue')
    plt.scatter(x[:, feature_index], predicted_y, color='red')
    plt.xlabel('Feature Value')
    plt.ylabel('Target Value')
    plt.legend(['Actual', 'Predicted'])
    plt.show()

Anything larger than this value is failing

past meteor
toxic mortar
# past meteor Gradient descent has been known to diverge instead of converge if the learning r...

Yes, I tried debugging, the weights become large and eventually turn into NaN. Then I tried playing around with the learning rate, from 1e-10 to 5e-7, multiplying it by 3 each time and testing the limit. Turned out that 5e-7 is a sweet spot. I even plotted J against the number of iterations to see how fast or slow it is converging. Is this a common problem? Does it depend on the format of the data (like the actual values, or even more trivial things like the number of elements)?

past meteor
# toxic mortar Yes, I tried debugging, the weights become large and eventually turn into NaN. T...

It's a common problem yes, I wouldn't be able to explain it better than this link: https://stats.stackexchange.com/questions/315664/gradient-descent-explodes-if-learning-rate-is-too-large?noredirect=1&lq=1
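
To the "does it depend on the actual values" part: it does, and the first feature (values in the thousands) is what forces the tiny step here. A sketch using the same three rows, with z-score scaling added (the scaling step and the vectorized gradient are my additions, not part of the original code):

```python
import numpy as np


def gradient_descent(x, y, alpha, epochs):
    """Batch gradient descent for linear regression, vectorized."""
    m, n = x.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        err = x @ w + b - y
        w = w - alpha * (x.T @ err) / m
        b = b - alpha * err.mean()
    return w, b


x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]], dtype=float)
y = np.array([460, 232, 178], dtype=float)

# z-score normalization: every feature ends up with mean 0 and std 1
mu, sigma = x.mean(axis=0), x.std(axis=0)
x_norm = (x - mu) / sigma

# With scaled features a learning rate of 0.1 is stable
w, b = gradient_descent(x_norm, y, alpha=0.1, epochs=1000)
predicted_y = x_norm @ w + b
print(predicted_y)

# The same rate class on raw features blows up (warnings silenced on purpose)
with np.errstate(over="ignore", invalid="ignore"):
    w_raw, b_raw = gradient_descent(x, y, alpha=1e-2, epochs=1000)
print(w_raw)
```

New inputs have to be scaled with the same `mu` and `sigma` before predicting.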

rare ferry
#

I love reading books. How can I put my Data science skills to use in analysing a single book. How can I leverage my DS skills to more critically examine let's say Harry Potter and the Prisoner of Azkaban?

left tartan
radiant dock
#

how can I analyze a text with spacy, and check for similarities between sets of 2 or 3 words rather than whole sentences or individual words?

serene scaffold
#

What would make a "set of two or three words" "similar" to another word set?

radiant dock
#

why would I be the same guy?

radiant dock
serene scaffold
radiant dock
#

gotcha, I'll probably leave it as is, it's working acceptably well so far, it's a coding challenge for a job opportunity

#

thanks

serene scaffold
#

Is that what you're doing?

radiant dock
#

oh god, thankfully not

serene scaffold
#

I got rejected Pepega

radiant dock
#

I got a text paragraph and I have to suggest replacement phrases from a list based on a similarity score

#

I can approach it however I want, and decided to use spacy because it's what I'm most familiar with

#

it's working pretty alright, but certain sentences get suggestions that are not so good, and researching I found out about sentiment analysis, but yeah it seems very complex and I'm running out of time

serene scaffold
radiant dock
#

thanks man

#

I may use it as inspiration, but I still wanna come up with my own script, since I'm still learning and I want to understand what I'm doing if I end up getting the job

odd meteor
# radiant dock how can I analyze a text with spacy, and check for similarities between sets of ...
  1. spaCy

import spacy
nlp = spacy.load("en_core_web_md")  # make sure to use larger package!
doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")

# Similarity of two documents
print(doc1, "<->", doc2, doc1.similarity(doc2))

# Similarity of tokens and spans
french_fries = doc1[2:4]
burgers = doc1[5]
print(french_fries, "<->", burgers, french_fries.similarity(burgers))
  2. https://github.com/UKPLab/sentence-transformers

mortal rover
#

Guys in need help with my project. Plz this is very urgent. I'm not as well gifted as y'all. I need to solve a real life problem using Business intelligence or machine learning. A unique topic help me how I can collect data for it too. Plz guide me programming gods. It could be any problem.

#

For now I just need to give a problem and analysis on how I'd collect data and try to solve it

shadow viper
past meteor
shadow viper
#

Thanks man

long canopy
#

could anyone throw me a couple of keywords to get me started of evaluating whether two paragraphs contain similar content/ideas?

summer crypt
#

I'm looking to get started with using AI in python. I want to write a program that will run PowerShell scripts to look for vulnerabilities in a system and notify you about them. I could write an algorithm to do this manually but I wanna incorporate machine learning to automate the process. However, I have no idea where to start when it comes to working with ai. How should I get started?

serene scaffold
long canopy
serene scaffold
#

It also wouldn't be "more automated" than the program you have in mind

#

You use AI when the problem to be solved can't easily be expressed as an exact series of steps

summer crypt
serene scaffold
#

Admittedly, this is pretty removed from what you want to do

#

But if you start with trying to classify code examples as malicious or not, I think you'll be completely lost.

summer crypt
serene scaffold
#

While it's on my mind, when I got started with nlp, I had a mentor who insisted that I learn concepts that were above my ability to comprehend at that time, and while I appreciate that he believed in my ability to understand it, I think that stunted my motivation.

shadow viper
summer crypt
#

I think good starting point would be to figure out algorithm to do it manually to look for unused open ports/services then try to integrate the ai?

serene scaffold
serene scaffold
summer crypt
#

I get what you're saying and I'll keep it in mind. Vulnerability scanning and detection nowadays works through an algorithmic approach using preset flags that, if tripped, would notify the appropriate parties. However, if you could figure out these flags you can circumvent them. With an ai approach it's harder to predict what the ai would consider as a trigger

shadow viper
summer crypt
shadow viper
quaint loom
# past meteor Not an answer to your question but you're leaking a bit of data ```python X, y ...

Thank you again for your suggestions. It seems like they helped a lot, and the docs you shared were highly valuable. If you ever have the time, would you have a quick look at the code and see if it actually got improved?

Although it seems to be improved, I do have a few questions.

  • It seems like the mae, mse, r2 are similar for all areas? Is the module just running for one area, although the terminal says it is running for all areas?

  • The module could not handle the missing data for the parameter that I only have once a month (besides the other parameters, which I have sampled/tested weekly). Any suggestions on this?

  • Another question regarding the R^2. It turned out that the R^2 is as low as 0.25, which is quite low. Is this suggesting that I still do not have enough data to run the module?

Again, if you ever have some spare time, please have a look. And also, thanks to you @desert oar
https://paste.pythondiscord.com/YXKA

past meteor
# quaint loom Thank you again for you suggestions. It seem like it helped a lot. And the docs ...

I don't have the time this week to look at the code but you can ping me around this time next week.

  • I don't understand what you mean with "area", I think it'll be clear after I read your script.

  • How to handle missing data depends on your domain. You might be able to do a forward fill or so. Do not do a backward fill or linear interpolation in time series, as both use future values and you might leak data.

  • I never use R^2 for prediction problems personally. It's unlikely but possible that it's low because there is a non-linear relationship while the model is actually good; R^2 doesn't account for this.
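
Assuming the fill advice means using only earlier observations (pandas `ffill`) rather than later ones (`bfill`), the difference looks like this on an invented weekly series:

```python
import numpy as np
import pandas as pd

# Invented weekly series with a two-week gap
weekly = pd.Series(
    [1.0, np.nan, np.nan, 4.0],
    index=pd.date_range("2023-01-01", periods=4, freq="W"),
)

forward_filled = weekly.ffill()   # gap takes the last value seen so far (past only)
backward_filled = weekly.bfill()  # gap takes the next value (peeks into the future)

print(forward_filled.tolist())
print(backward_filled.tolist())
```

At the time of the gap, the value 4.0 had not been observed yet, which is why the backward-filled version counts as leakage when the model is later evaluated on those weeks.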

quaint loom
lean sparrow
desert oar
#

almost all successful applications of "AI" on specific applications like this turn out to be more like a handcrafted combination of machine learning, heuristics, and statistics. in particular, figuring out how to actually represent your data in some way that you can actually run machine learning models on it is usually the most important thing you can spend your effort on. that process is often known as "feature engineering". as you might imagine, constructing useful features is very often a matter of understanding the problem domain and starting from a position of "how would a human look at this"?

lean sparrow
#

If anything data science to help identify how often certain systems or types of systems need attention and how that affects labor costs and why you should maybe charge more/less for different system types when integrated into a vuln management program

shadow viper
thorn bobcat
#

@serene scaffold you on?

#

:)

delicate rune
#

what could the problem be?

quaint loom
#

That is excellent to hear. Awesome ^^

small wedge
#

"Type a number: " should be in the input call

#

and then you should just print even or odd

delicate rune
#

o really?

small wedge
#

also this is probably not the channel for this lol

delicate rune
#

ohh I sent it here because there's more active people in here compared to the other channels

#

just needed some little help and i appreciate that you helped me

small wedge
#

no problem 👍

placid cedar
#

hey guys, after i performed train test split, and one hot encoded my train and test data, i wanted to put them into the regressor to evaluate the model's performance. but i've been stuck on this error for a long time. do need urgent help with this!

dull flare
#

Sup how long do u guys think one should focus on eda+ supervise +unsupervised learning. The problem is its obvious that no one can master it in small amount of time but I can't get stuck over there for long period of time either. So advice me when to move and should deep learning be my next goal.

edgy pasture
#

Can anyone help me do my assignment ?

#

It's all in python but I don't really know what's going on since I'm new to it

#

It won't be complicated to you guys but it's entire another language for me.

serene scaffold
#

@edgy pasture this is the data science and AI channel. Is it about that?
In either case, be sure to never ask to ask. Ask your actual question. Not if someone is willing to answer a question that you haven't revealed yet.

edgy pasture
#

yea

#

its a mixture of datascience and python

#

could we hop on call, cause itll be easier to say what its about

lapis sequoia
delicate rune
lapis sequoia
delicate rune
lapis sequoia
#

I’ve been hanging out c# and html just trying to make stuff, I’ve been studying

#

Kinda fun, I agree with you

#

What initially got you into wanting to study Python?

delicate rune
delicate rune
lapis sequoia
delicate rune
delicate rune
lapis sequoia
lapis sequoia
#

Dang

lapis sequoia
#

I’m a beginner too, but I need a group to grow with

south crypt
#

Hello, I'm new writing in this section. I'd like to have some recommendations on a problem I'm trying to solve. As a context, I'm trying to solve it with Deep Reinforcement Learning.

The task is to control the activity of some fans [on / off] (in this case 3, all with the same flow rate of 95 m3/h) connected to a box that has a heater inside (currently it is always at 100% capacity, which is 1 kW).

The current set of actions, with 3 fans, are numbers from 0 to 9, being mapped as: 0 -> do nothing; 1 -> {0, 0 ,0}; 2 -> {1, 0, 0}; 3 ->{0, 1, 0}; 4->{0, 0, 1}; 5->{1, 1, 0} .... 9->{1, 1, 1}. Being {x, x, x} the representation of the state on[1]/off[0] of the fans_{1, 2, 3}

The ambient temperature may or may not be hotter than the target temperature wanted inside the box.

Currently, the simulator I built with my colleague uses the basic heat transfer equations, without considering that faster wind lowers the entrance temperature.

The ambient temperature is obtained every second, and the change in internal temperature is calculated every 0.01 seconds (a simple interpolation is made to obtain the external temperature in each "dt"). The steps are every 2 seconds (might change in the future). This means that the algorithm has to take a decision every 2 seconds. There is no penalty for turning fans on/off consecutively (like a kid with a light switch), yet.

The values available for the NN are: Ti (internal T), Tt (target T), Te (external T), A_t1 (action in last step), Delta Ti (change in temperature in last step) and Dt (step size in seconds)

Here are my questions:

  1. If my intention is to keep the temperature near the target temp. Which would be a good q-function?
  2. Right now, I'm just using a Sequence Neural Network (SNN), with some "relu" activation functions and a linear activation function to obtain the q-function-estimate as an output. Any recommendation on how deep or wide the NN?
south crypt
# south crypt Hello, I'm new writing in this section. I'd like to have some recommendations on...
  3. Would it be possible / wise to try to use an RNN? I would think that the hidden state would carry some intrinsic information about how the external temperature has been changing over time
  4. If I had to add more fans, the number of states would grow as 2^n + 1. Any advice to tackle this "curse of dimensionality"?
  5. If the fan state were continuous... any idea how to approach it?

Thanks in advance for any idea, suggestions, questions
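
For reference, the action mapping described above can be generated instead of written by hand. This is a sketch: `build_action_table` is a made-up name, and the ordering key is only my guess at the convention from the listed examples (fewest fans on first). For 3 fans it yields 9 actions indexed 0 to 8, so if the range really runs to 9 there is one more action than the message describes.

```python
from itertools import product


def build_action_table(n_fans):
    """Index 0 is 'do nothing'; the rest are every on/off combination of the fans."""
    combos = sorted(
        product((0, 1), repeat=n_fans),
        # Fewest fans on first; within a group, fan 1 before fan 2 before fan 3
        key=lambda state: (sum(state), tuple(-v for v in state)),
    )
    return [None] + combos


actions = build_action_table(3)
for index, state in enumerate(actions):
    print(index, "do nothing" if state is None else state)
```

The table has 2^n + 1 entries, which is the growth the later question worries about: `len(build_action_table(5))` is already 33.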

delicate rune
#

what makes it enjoyable is that whenever she gets like an exercise or project that she’s having trouble with, I can assist and it js makes coding hella fun imo

delicate rune
delicate rune
lapis sequoia
#

it's cool, you enjoy yourself

delicate rune
delicate rune
halcyon hedge
lapis sequoia
lapis sequoia
#

html and css

past meteor
halcyon hedge
past meteor
#

The actual EDA, at a glance, looks good. You're asking relevant questions, providing detailed answers with context outside of the scope of your dataset and so on

past meteor
#

Like getting those duplicates indexes and really looking at the rows and seeing what's up with them. If they're truly duplicates you can throw them away but also show the reader that they are

halcyon hedge
#

Okayyy, will make sure to do these things from now on

past meteor
halcyon hedge
final kiln
#

how large do I have to make GPT to get interesting results ?

desert oar
# halcyon hedge Thanks a lot for your time

i'll echo that removing outliers is drastic and needs to be carefully motivated. are they actually anomalous events in some way that might warrant removing them from the analysis? or are they legitimate values that happen to occur at the tail of the distribution?

#

this is good

#

i'd like to see it on log scale as well

#

currently the big spike dominates the graph, which is good: it tells a clear story, there is a huge increase compared to a global baseline

#

but there might be a secondary story which is hidden by the scale

#

one thing i wonder about is measurement methodology. what defines a terrorist attack? who collected this data? has the methodology or definition changed over time in a way that might affect the data?

#

i'll also echo that the "asking questions" section is excellent, both in concept and in execution

verbal venture
#

can anyone explain the [index//2] part for the skip connection? the model is unet: ```py
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

# DoubleConvolution is defined elsewhere (not shown in this message)
class UNET(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, features=[64, 128, 256, 512]):
        super(UNET, self).__init__()
        self.ups = nn.ModuleList()
        self.downs = nn.ModuleList()
        self.pools = nn.MaxPool2d(kernel_size=2, stride=2)

        # DOWN PART OF UNET
        for feature in features:
            # creating down sampling layers - adding every feature output
            self.downs.append(DoubleConvolution(in_channels, feature))
            in_channels = feature  # becomes input to next Conv

        # UP PART OF UNET
        for feature in reversed(features):
            # double width of image
            self.ups.append(nn.ConvTranspose2d(feature*2, feature, kernel_size=2, stride=2))
            self.ups.append(DoubleConvolution(feature*2, feature))

        # 512, 1024
        self.bottleneck = DoubleConvolution(features[-1], features[-1]*2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        skip_connections = []
        for down in self.downs:
            x = down(x)  # downsampling tensor
            skip_connections.append(x)
            # pass through max pooling
            x = self.pools(x)

        x = self.bottleneck(x)

        # REVERSING LIST FOR UPSAMPLING
        skip_connections = skip_connections[::-1]

        # up, double conv
        for index in range(0, len(self.ups), 2):
            # for each index upsample
            # upsample, pass through double transpose
            x = self.ups[index](x)
            # skip connection - div due to step 2
            skip_connection = skip_connections[index // 2]

            if x.shape != skip_connection.shape:
                x = TF.resize(x, size=skip_connection.shape[2:])

            concat_skip = torch.concat((skip_connection, x), dim=1)
            # running through double conv
            x = self.ups[index+1](concat_skip)

        return self.final_conv(x)
```
final kiln
verbal venture
#

Yes sir

#

The code I was writing did not have 2 elements in self.ups before. I thought ups was only 4 elements long, so did not know mathematically how that was working
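
A small sketch of that indexing in plain Python, with placeholder strings standing in for the real layers (the names are made up for illustration). Since `ups` holds two entries per level, stepping by 2 gives `index` = 0, 2, 4, 6, and `index // 2` turns that back into 0, 1, 2, 3 for the skip connections:

```python
# Placeholder names stand in for the real layers; only the indexing matters here.
features = [64, 128, 256, 512]

# Two entries per decoder level: an upsampling layer, then a double convolution.
ups = []
for feature in reversed(features):
    ups.append(f"up_{feature}")
    ups.append(f"double_conv_{feature}")

# One skip connection per encoder level, reversed so the deepest comes first.
skip_connections = [f"skip_{feature}" for feature in features][::-1]

pairs = []
for index in range(0, len(ups), 2):
    # index takes the values 0, 2, 4, 6; index // 2 maps them to 0, 1, 2, 3.
    pairs.append((ups[index], skip_connections[index // 2], ups[index + 1]))

for up, skip, conv in pairs:
    print(up, skip, conv)
```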

latent dirge
#

if I want to ask something pandas-related, is data-science-and-ai the right tag in the help section?

final kiln
#

Yes I'd assume so

#
class SelfAttentionHead(nn.Module):
  def __init__(self, params: ModelParameters):
    super(SelfAttentionHead, self).__init__()
    

    self.d_k = params.word_vector_size // 3

    temp = []
    for _ in range(3):
      proj = make_parameter(size_x = params.word_vector_size, size_y = self.d_k)
      bias = make_parameter(size_x = 1, size_y = self.d_k)
      temp.append(proj)
      temp.append(bias)

    self.q, self.q_bias, self.k, self.k_bias, self.v, self.v_bias = temp

  def forward(self, sequence: torch.Tensor):
    q_vectors = self.q_bias + self.q @ sequence
    k_vectors = self.k_bias + self.k @ sequence
    attention_scores =  q_vectors @ k_vectors.T
    attention_scores /= self.d_k ** 0.5  # torch.sqrt needs a tensor, d_k is an int
    attention_scores = torch.nn.functional.softmax(attention_scores, dim=-1)
    v_vectors = self.v_bias + self.v @ sequence
    return attention_scores @ v_vectors
#

I'm implementing nano gpt, and one thing that surprised me is that the Q,K,V matrices end up reducing the dimension of the embedded token

#

Which kinda ruins the intuition I've been reading about how all this is based on a sort of modified dot product for similarity.

#

The normalization is also quite strange. The normalization layers go like (v - E(v))/std(v)

#

Which does scale their size so as not to let them explode in value, and also centers them at 0. But I don't see much of an intuition when thinking of word embeddings as living in some dot product space as suggested by a lot of resources online

#

I'd imagine a better normalization would be, actual normalization, v / norm(v)

#

Furthermore, why is positional encoding needed ?

#

Couldn't the network pick up the position of each word via the literal position of the word vector in the matrix that represents the sequence ?
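
One way to see why position is not free: plain (unmasked) self-attention treats the sequence as a set. The NumPy sketch below uses a toy single head with random weights (all names and sizes are assumptions for illustration) and checks that shuffling the input rows only shuffles the output rows in the same way, so row order alone carries no signal. A causal mask breaks this symmetry, which is one reason a decoder can still pick up some position information by itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(sequence, w_q, w_k, w_v):
    # sequence: (words, dim); each weight matrix: (dim, d_k)
    q, k, v = sequence @ w_q, sequence @ w_k, sequence @ w_v
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return scores @ v

rng = np.random.default_rng(0)
words, dim, d_k = 5, 6, 2
sequence = rng.normal(size=(words, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, d_k)) for _ in range(3))

permutation = rng.permutation(words)
out = self_attention(sequence, w_q, w_k, w_v)
out_shuffled = self_attention(sequence[permutation], w_q, w_k, w_v)

# Shuffling the input rows shuffles the output rows identically and changes nothing else.
same_up_to_order = np.allclose(out[permutation], out_shuffled)
print(same_up_to_order)
```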

limber creek
#

Hey guys, I am a Computer Science student and I want to learn Machine Learning, AI. So right now, I know a bit of Python and 7th grade Maths. I would be really glad if you can provide me with a super detailed roadmap on how to learn these stuff and finally land a job.
I don't wanna invest my time on learning something which is currently not of the most priority.
Thank You~~~

final kiln
#

I'd reckon you should have:

  • calculus I and II (important to understand gradient descent and why neural nets work at all)

  • linear algebra - this is the basis for pretty much anything that is both high dimensional and linear and is like a language that you use to talk about all sorts of things, so it's pretty useful

  • multivariate calculus - this is like, joining points one and two, and is where neural nets reside I'd say. This is where the concept of gradient resides, which you need to understand gradient descent and etcs

  • stats and probability - neural nets can be thought as statistical models, and you use statistical tools to evaluate their performance etc etc etc

#

Signal processing concepts are also super useful. So like knowing what is a Fourier transform, knowing about kernels, knowing about DFT, knowing how to understand data, manipulate it, etc

#

I'd say, once you know all this stuff, and you are good with python, picking up the ML frameworks and just start building things is enough to get you going.

limber creek
#

Are these stuff covered in grade 12 maths??

final kiln
#

I'd highly recommend finding time for a college education if possible. If not, at least complementing the math til 12th grade and try to cover these subjects over time.

quaint loom
#

Do you guys recommend using pydot and GraphViz for visualization? Not sure if it's relevant but I am using Python in VS Code

past meteor
# quaint loom Do you guys recommend using pydot and GraphViz for visualtion? Not sure if its r...

I've never used GraphViz directly, only things built on top of it, same for Pydot so I can't comment on how good they are.

Personally I use:

  1. Matplotlib for straightforward things.
  2. Seaborn for things that are a bit more work in Matplotlib "natively"
  3. Plotly if I want interactive plots.

Seaborn is built on top of Matplotlib and honestly, if you want to learn Seaborn you need to know the basics of Matplotlib, it makes your life so much easier. Matplotlib has a very strange API, so this is a must-read; if you go over it, it'll all make sense in like half an hour or less 😄 https://matplotlib.org/stable/users/explain/quick_start.html. The "anatomy of a figure" section is critical to understand. This is also interesting because it's the code that actually makes the figure in question: https://matplotlib.org/stable/gallery/showcase/anatomy.html.

Seaborn's documentation is also great, I'd block out an hour or two to read it after you're familiar with Matplotlib.

quaint loom
# past meteor I've never used GraphViz directly, only things built on top of it, same for Pydo...

I am a bit familiar with matplotlib but I guess I still have a lot more to learn about it. But I find it a little tricky to do quick modifications to make the visualization "beautiful". Some of my coworkers are pretty good at Origin but I don't feel like having to use another piece of software like that. I tried Origin once and I felt I was back in SPSS somehow.

Seaborn doesn't fit what I will be doing, as I am going to create a path diagram for my structural equation model.

past meteor
quaint loom
past meteor
#

For R I'd really just focus on learning how to do stuff and not necessarily being a competent R programmer. Treat it like a statistics and data visualisation toolkit. I love Python but R is better at both.

#

(Not) using notebooks is a surprisingly long and nuanced debate, I'm on mobile so I'll summarise it by saying that you should be able to code outside of an interactive session (Jupyter, Spyder, IPython, RStudio) at the very least, yes

quaint loom
past meteor
pearl barn
#

guys I want to ask how to install conda for Python and how to run Jupyter Notebook locally on Windows. I'm learning data analysis with Python from a website called Jovian dot com, but I couldn't save my work online. Can anyone explain this? Also, is it worth learning Python basics from this course? The same course is available on freecodecamp

odd meteor
# pearl barn guys I wanna to ask How to install conda for python and How to run Jupyter noteb...

It's pretty straight forward. Just download the Anaconda Distribution. https://www.anaconda.com/download

Once you've done that, it brings alongside all its friends like Jupyter Notebook to the party.

Meanwhile, can you add more clarity on the "I couldn't save my work online" part.
Is it that you were using Colab or Binder to run your code?

odd meteor
# pearl barn guys I wanna to ask How to install conda for python and How to run Jupyter noteb...

Akash is one of the founders of Jovian. His work has been featured on freeCodeCamp, so I believe his Python course will be on point!

As you already know, we humans don't always like similar stuff... So I think what you should focus more on is finding out for yourself if that particular Python course on Jovian is 'customer-friendly' to YOU.

The only way to find out is to try taking a few chapters of the course with an open mind.

And If it's hard for you to understand what's being taught or you find yourself sleeping off while watching the video (I presume it's a video course), then by all means don't hesitate to drop it and try another course.

pearl barn
#

Is it better to use miniconda and How to run Jupyter locally from online course?

left tartan
pseudo pasture
#

Hello guys i want to make recommendation model based on the credit card data and one of the column is df['Reward rates']
which have data like this:

rows 1: '6X 6x Marriott Bonvoy point dollar eligible purchase hotel participating Marriott program 4X 4x point purchase made restaurant worldwide gas station wireless telephone service purchased directly service provider purchase shipping 2X 2x point eligible purchase'

row 2: '7X Earn 7X Hilton Honors Bonus Points dollar eligible purchase charged directly hotel resort within Hilton portfolio 5X Earn 5X Points per dollar purchase restaurant supermarket gas station 3X Earn 3X Points eligible purchase Card'

row 3: '12X Earn 12X Hilton Honors Bonus Points dollar eligible purchase charged Card directly hotel resort within Hilton portfolio 6X Earn 6X Points dollar purchase Card restaurant supermarket gas station 4X Earn 4X Points dollar Online Retail Purchases 3X Earn 3X Points eligible purchase Card'

'3 3 Cash Back supermarket per year purchase 1 3 3 Cash Back online retail purchase per year 1 3 3 Cash Back gas station per year 1 1 1 Cash Back purchase'

'12X 12X directly hotel resort Hilton portfolio 6X 6X Select Business Travel Purchases 3X 3X eligible purchase Terms Limitations Apply'

now I'm applying many nlp techniques to extract meaningful data but either can't get relevant features to train model on or there are so many columns created if i Use tf_idf and n-grams, any help will be appreciated.

odd meteor
# pearl barn Is it better to use miniconda and How to run Jupyter locally from online course?

I've always used anaconda but some people also prefer miniconda. So you'll be fine with either one.

Yes you can run your code on JNB locally with the online course.

https://stackoverflow.com/questions/45421163/anaconda-vs-miniconda

trim saddle
#

You could also just go with a normal python install and use vs-code IDE with jupyter extension to work with notebooks.

pseudo pasture
#

one thing i do is this: for every row, create separate columns based on the ?X values
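
A rough sketch of that idea, using only the standard library. The category keywords, the `reward_features` helper and the `base` column are assumptions made up for illustration, not something from the dataset. Each "NX" multiplier is paired with the text that follows it up to the next multiplier, which gives a small fixed set of numeric columns instead of thousands of n-gram columns. The cash-back rows without an "X" would need separate handling.

```python
import re

# Hypothetical category keywords; adjust to whatever appears in the real data.
CATEGORIES = ("hotel", "restaurant", "supermarket", "gas station", "online retail")
MULTIPLIER = re.compile(r"\b(\d+)[Xx]\b")

def reward_features(text):
    """Map one 'Reward rates' string to {category: multiplier} columns."""
    features = {category: 0 for category in CATEGORIES}
    matches = list(MULTIPLIER.finditer(text))
    for current, following in zip(matches, matches[1:] + [None]):
        multiplier = int(current.group(1))
        # The segment runs from this multiplier to the next one (or the end).
        end = following.start() if following else len(text)
        segment = text[current.end():end].lower()
        for category in CATEGORIES:
            if category in segment:
                features[category] = max(features[category], multiplier)
    # Treat the smallest multiplier as the catch-all rate (an assumption).
    features["base"] = min((int(m.group(1)) for m in matches), default=0)
    return features

row = (
    "7X Earn 7X Hilton Honors Bonus Points dollar eligible purchase charged "
    "directly hotel resort within Hilton portfolio 5X Earn 5X Points per dollar "
    "purchase restaurant supermarket gas station 3X Earn 3X Points eligible purchase Card"
)
row_features = reward_features(row)
print(row_features)
```

Applied over the whole column (for example with `df['Reward rates'].apply(reward_features)`), this would give one dict per row that can be expanded into columns.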

halcyon hedge
halcyon hedge
odd meteor
# pseudo pasture Hello guys i want to make recommendation model based on the credit card data and...

There are several ways to control the number of extracted features gotten by TfidfVectorizer.

Personally, in most of my work I always use ngram_range = (1, 2) to consider both unigram and bigrams in the final features tfidf extracts.

For every other parameter I experiment, experiment, and experiment before settling for the configuration that yields the optimal result.

The documentation will do better justice than I can in explaining what each parameter in TfidfVectorizer does.

https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

pseudo pasture
#

see using ngram of (2,2) i get this first

#

for single row too many ngrams created

odd meteor
pseudo pasture
final kiln
# halcyon hedge Hey I was just curious, are there any widely used models using stochastic calcul...

I don't think I've studied stochastic calculus. But probably related are the usage of monte Carlo methods. Quite sure GPT uses a simple one to generate its output. GPT itself approximates a probability function, which is then used to sample tokens. I've also read very briefly about some networks that use probabilistic activation. And then there's quantum learning, which is inherently stochastic because quantum is all about probabilities.

desert oar
desert oar
#

usually "monte carlo" methods refer to computational techniques for approximately computing otherwise-intractable quantities, very often integrals. they are frequently used in Bayesian statistical inference and probability modeling to sample from a posterior distribution

#

the underlying theory of monte carlo computing techniques is indeed that of stochastic processes, eg. markov chain monte carlo (markov chain being a particular type of stochastic process)

final kiln
final kiln
desert oar
#

right

#

that's why i am wondering where and how it shows up in the GPT language model

#

i get that the output is a stochastic process, but that doesn't strike me as a monte carlo method except in a very generic sense

final kiln
desert oar
#

yeah, that's a stochastic process

#

the "state" is the current context and the state transition function is the probability distribution over the next token

#

or something like that anyway

final kiln
#

So the computational method you use to generate the token would be a monte Carlo method, albeit a simple one

desert oar
#

that's a broader definition of monte carlo methods than what i'd use and typically see, but i understand what you mean

#

if you did something like repeatedly generating multiple outputs over and over using the same prompt, in order to compute some statistic or distribution over those outputs, i'd say that's more like a monte carlo method
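
The distinction can be sketched with a toy distribution standing in for a model's next-token probabilities (the tokens and numbers below are made up). A single draw is plain random sampling. Drawing many times in order to estimate some quantity is closer to the usual sense of a Monte Carlo method.

```python
import random

# A made-up next-token distribution standing in for a language model's output.
next_token_probs = {"cat": 0.5, "dog": 0.3, "fish": 0.2}
tokens = list(next_token_probs)
weights = list(next_token_probs.values())

rng = random.Random(0)  # seeded so the run is repeatable

# One draw: plain random sampling, as when generating a single next token.
single_token = rng.choices(tokens, weights=weights, k=1)[0]

# Many draws used to estimate a quantity: the usual sense of Monte Carlo.
num_samples = 20_000
samples = rng.choices(tokens, weights=weights, k=num_samples)
estimated_p_dog = samples.count("dog") / num_samples

print(single_token, estimated_p_dog)
```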

#

but maybe my interpretation is too narrow

final kiln
#

Uhmmmmm, yeah I see what you mean. You'll usually be doing that yeah. So ig you'd just call it random sampling.

#

I mean is not so clear cut

#

In case of GPT, the underlying statistic/shape of distribution does matter a lot

#

But idk, I don't like to get too caught up in the definitions

lapis sequoia
#

Hi i have a bunch of annotated images and i want to make a python ai model that trains with those images so that it can detect the image from a given picture
can someone show me a good course or where i can get started

final kiln
# desert oar if you did something like repeatedly generating multiple outputs over and over u...

Wikipedia has an interesting passage about the possible definitions, I think this one is what aligns with the way I use it:

"""
Monte Carlo simulation: Drawing a large number of pseudo-random uniform variables from the interval [0,1] at one time, or once at many different times, and assigning values less than or equal to 0.50 as heads and greater than 0.50 as tails, is a Monte Carlo simulation of the behavior of repeatedly tossing a coin.
"""

So like, GPT would take the place of the [0, 1] distribution and the simulation would be the simulation of the behaviour of a person writing some text message.

#

This would mean that it's not just the last step, the whole thing would be a monte Carlo simulation.

iron basalt
#

Monte Carlo is very broad.

#

Oh and repeat runs too*

final kiln
#

I didn't quite understand your point. The passage I'm mentioning is making a distinction between monte Carlo, simulation and Monte Carlo simulation.

#

Simulation: Drawing one pseudo-random uniform variable from the interval [0,1] can be used to simulate the tossing of a coin: If the value is less than or equal to 0.50 designate the outcome as heads, but if the value is greater than 0.50 designate the outcome as tails. This is a simulation, but not a Monte Carlo simulation.

Monte Carlo method: Pouring out a box of coins on a table, and then computing the ratio of coins that land heads versus tails is a Monte Carlo method of determining the behavior of repeated coin tosses, but it is not a simulation.

Monte Carlo simulation: Drawing a large number of pseudo-random uniform variables from the interval [0,1] at one time, or once at many different times, and assigning values less than or equal to 0.50 as heads and greater than 0.50 as tails, is a Monte Carlo simulation of the behavior of repeatedly tossing a coin.

#

Uhm, no I think it would fall into the first one

#

Even accounting for the auto regression, the end result is one sample

iron basalt
#

Are you not drawing a large number?

final kiln
#

It's technically a single sample from one distribution I think.

iron basalt
#

It can effectively learn to do what the position encoding does.

#

(But it's a waste, just neat that it can)

final kiln
#

Oh okay I see, you give it a hint for how to represent position so it's more efficient to train it
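
For reference, a minimal NumPy sketch of one common "hint": the fixed sinusoidal encoding from the "Attention Is All You Need" paper. The function name and sizes are assumptions for illustration, and other implementations learn a position embedding table instead. Each position gets a vector of sines and cosines at different frequencies, which is added to the token embeddings.

```python
import numpy as np

def sinusoidal_positions(num_positions, dim, base=10000.0):
    """Return a (num_positions, dim) table; dim is assumed to be even."""
    positions = np.arange(num_positions)[:, None]        # (num_positions, 1)
    frequencies = base ** (-np.arange(0, dim, 2) / dim)  # (dim / 2,)
    angles = positions * frequencies                     # (num_positions, dim / 2)
    encoding = np.zeros((num_positions, dim))
    encoding[:, 0::2] = np.sin(angles)  # even columns
    encoding[:, 1::2] = np.cos(angles)  # odd columns
    return encoding

encoding = sinusoidal_positions(num_positions=8, dim=6)
print(encoding.round(3))
```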

final kiln
iron basalt
#

Especially functions that the network would require a lot of neurons to compute itself.

final kiln
#

Yeah I didn't think of it that way, I had the impression that positional encoding was obligatory.

iron basalt
#

Or the extreme end of that, precompute a ton of random functions on the inputs, then at the end have a simple linear layer.

#

(Which is its own model of computation being researched, pulling answers out of chaos, one of those functions surely has the answer by chance)

lunar ibex
#

hi is there a situation where the following is true

(ndarray * scalar // scalar) != ndarray
#

im currently facing this situation and not sure what could have caused it

#

shape of my ndarray is (39584,) single dim array

final kiln
#

Probly something like dividing by zero

#

Or very small values

lunar ibex
#

my scalar is 8 tho

final kiln
#

Uhm is still possible I think, what's the smallest value in the array

lunar ibex
#

0

final kiln
#

Better yet, you can directly print the ones that are different

#

And try to see a pattern

lunar ibex
#

left side is original ndarray, right side is after * and //

#

some of the bytes are short by 96 (hex 60)
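
One cause that would fit "short by 96" is integer overflow, assuming the array holds bytes as `uint8` (the dtype is not stated in the chat, so this is a guess). The product `x * 8` wraps modulo 256 before the floor division, so the round trip returns `x % 32` and values from 96 to 127 come back exactly 96 short. The sketch below reproduces that and shows that widening the dtype first avoids it.

```python
import numpy as np

# Assumed dtype: uint8, since the values are described as bytes.
data = np.array([0, 31, 32, 100, 127, 200], dtype=np.uint8)
scalar = 8

round_trip = data * scalar // scalar            # the product wraps modulo 256 first
shortfall = data.astype(np.int64) - round_trip  # how much each value lost

# Widening before multiplying leaves enough headroom, so nothing wraps.
widened = data.astype(np.int64) * scalar // scalar

print(round_trip, shortfall, np.array_equal(widened, data))
```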

final kiln
#

Uhm, would be easier to see base 10

iron basalt
final kiln
final kiln
#

Yeah that's pretty cool

iron basalt
#

It's also related to grid cells, and the Fourier Transform (grid cells act like one).

lunar ibex
final kiln
#

Uhm from what I recall grid cells sort of create a map of repeated circular shapes that repeat across space. You'd have several kinds which repeat at different frequencies and that's how it kinda encodes position

final kiln
#

So is kinda a 3d sine wave

#

Didn't think of it that way, pretty cool

iron basalt
#

Place fields, which are built with grid cells, also kind of show up (we know less about them, so can't really tell yet) in Transformers, when the context switches you can see the attention remapping.

#

(The context remapping place fields behavior)

final kiln
#

Doesn't this at least point to GPT having "understanding" similar to our own ? Since it's using similar ways of representing things

iron basalt
final kiln
#

Do you know how they are doing vision ? Is it a literal part of the input or is it part of a different network ?

#

Like, a transcription from image to text and then that gets fed to gpt ?

left tartan
#

It’s a philosophical debate over what ‘understanding’ means

iron basalt
#

A lot of what happens in the brain revolves around this positioning system / place fields, so it's probably needed for all future networks.

iron basalt
final kiln
past meteor
left tartan
final kiln
iron basalt
#

Vision, like all other systems in the brain is heavily reliant on top down observer expectations, it's how you see things even if they are noisy, and also things that don't exist, like imaginary edges.

iron basalt
#

Each system can affect the top-down effect on another.

left tartan
iron basalt
#

Like priming someone with an auditory cue, which affects what they see.

final kiln
past meteor
iron basalt
#

If trained together, the language modeling part would actually ground its symbols better in visuals and such, giving it an actual understanding of the world. Text alone is too narrow.

left tartan
iron basalt
#

A big one is touch, specifically how positioning systems interact with that and model objects / spaces, and link that to stuff like words, sounds, etc.

#

However, this interconnected training problem is really hard, because you need all inputs coming in at the same time, you can't just train each part separate.

past meteor
# final kiln Do you know how they are doing vision ? Is it a literal part of the input or is ...

You can make an arbitrary model multimodal by doing this:

Language model A has an embedding space a.

Vision model B has an embedding space b.

Train a translation "network" c that maps a to b and vice versa.

There's been a large amount of research doing this. They take pretrained vision and language models and just train the translation/mapping network. You would need a training task that accurately allows you to learn this though, for instance image captioning may work to train this translation network.
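
A toy NumPy sketch of the "train only the translation" idea, under heavy assumptions: the two embedding sets are random stand-ins for the outputs of frozen encoders on paired data, and the translation is a single linear map fitted by least squares. Real setups typically train a small network by gradient descent on a task such as captioning, so this only illustrates the shape of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
num_pairs, dim_a, dim_b = 200, 8, 5

# Stand-ins for embeddings produced by two frozen models on paired inputs.
embeddings_a = rng.normal(size=(num_pairs, dim_a))
hidden_map = rng.normal(size=(dim_a, dim_b))  # unknown relation we hope to recover
noise = 0.01 * rng.normal(size=(num_pairs, dim_b))
embeddings_b = embeddings_a @ hidden_map + noise

# Only the translation map is fitted; the encoders themselves are never touched.
translation, *_ = np.linalg.lstsq(embeddings_a, embeddings_b, rcond=None)

predicted_b = embeddings_a @ translation
mean_abs_error = np.abs(predicted_b - embeddings_b).mean()
print(translation.shape, mean_abs_error)
```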

iron basalt
#

Just plain text is about as convenient as it gets.

#

One thing to also note about training them separately is that you're doing a lot of redundant work; when interconnected during training they can make the learning process faster for each other.

past meteor
#

Agreed

#

But if your compute budget or dataset isn't massive I prefer freezing the language and vision models and just training the translation

iron basalt
#

It's not ideal, but works decently well.

past meteor
#

But multi-task learning has been shown to improve generalization and data efficiency in theory yes. I typically comment from the "practical" perspective 😄

iron basalt
#

But, if your models are online learners, now you can do some cool stuff in post. You can hook them up more directly and have them learn more together without disrupting each other's knowledge.

#

(Biology uses online learning, in part because it really needs to not disrupt existing stuff, especially while still growing)

past meteor
iron basalt
final kiln
# left tartan Read the wiki page, the argument is not so easily discarded

I've read the system reply section for refutation. I don't think this thought experiment proves or disproves any side, it just brings to light how ignorant we are about consciousness.

I don't take a strong side, I just try to err on the side of caution so that I can act in an ethical manner in face of ignorance. We don't know how it works, so we should be careful when something starts acting conscious, otherwise we may inadvertently cause suffering.

past meteor
#

I used to know though, we learnt about it in some math or finance class.

left tartan
final kiln
#

And it seems like we won't have answers for a long time

past meteor
#

My default stance to AI safety is that our current approach is bad

#

It's always philosophy 😩

lunar ibex
left tartan
past meteor
#

This is the only field where this is the case. When civil engineers are building a bridge they don't call in philosophers to talk about the safety of it nor expect engineers to become philosophers.

iron basalt
past meteor
#

It's become my ultimate pet peeve these days, we should stop this imho

iron basalt
#

Yeah.

#

Won't go anywhere, you think what you think at this point.

left tartan
past meteor
#

(I don't mean this in reference to the conversation above btw! It's only tangentially related.)

past meteor
#

The top voices of AI safety being dominated by people like Eliezer Yudkowsky

#

I'd say they still have a very important role to play, even when talking about how structural engineers build bridges. The most important thing should not be hypotheticals like "sentience" but things that are grounded in how models are actually trained, so basically grounded in math. There are great papers that take this angle, which Rob Miles frequently summarizes, e.g. "The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment"

iron basalt
#

(Based on math)

past meteor
#

Was not my intention

left tartan
odd meteor
past meteor
iron basalt
odd meteor
iron basalt
#

(It's similar to those actually trying to solve climate change actively, versus the panickers and deniers, they are too busy solving the problem so you don't hear much from them)

final kiln
#

2017 was a different world I suppose

#

anyway, is there anything being done in the direction of making the structure of the neural network part of the optimization ? so like, instead of just adjusting weights, also be able to add layers, increase their size, decrease their size, etc

#

Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy and ...

past meteor
#

NAS is often done with a genetic algorithm (RL-based and gradient-based search are used too)

#

You'd typically have 2 optimization routines, one to train a network (a single instance) and then a hyperparameter search that maybe isn't fully random if you're going down this route

#

I haven't read any research of people doing it in one e.g., using the gradient after a batch to change the actual architecture, which I think the question is, maybe @iron basalt has

final kiln
#

Differentiable NAS has shown to produce competitive results using a fraction of the search-time required by RL-based search methods.

iron basalt
final kiln
iron basalt
#

As you might be able to imagine, a learner that is really good at learning new things without disrupting existing knowledge at all is ideal for growing more layers and such.

final kiln
#

yeah ig I'm still getting caught up on all the terminology

iron basalt
#

(And does not even need batches either)

#

(Which is why biology must do it)

final kiln
#

are there any multi sequence GPT models ? like, two input sequences, one which is updated independently (like a user writing to a textbox real time) and another which is the output of the model

#

However, the concept of handling multiple independent input sequences in the context of GPT-like models is more about how you frame the problem and feed data into the model rather than a built-in feature of the model itself. For instance, you can design an application where one input stream is user-generated content (like real-time text input) and another is context or additional information (like a separate conversation or data stream). These inputs can be concatenated or formatted in a way that the model understands as separate but related pieces of information.

#

interesting, I think I'm still gonna do two branches and then kinda mix em up somehow before the output

#

my objective is to be able to talk with it via voice chat in real time

#

so like, it should know that it is interrupting me and not speak

#

instead of the current turn based thing they have on chat gpt mobile

desert oar
#

that's the encoder-decoder architecture, as in e.g. the "attention is all you need" paper

#

(unless i'm misunderstanding what you're looking for)

final kiln
#

Right, in the original one the sequence doesn't change, but I suppose it's only a minor adjustment

#

It encodes the first sequence and then autoregression is done with the decoder

desert oar
#

oh, i think i see what you mean

#

maybe it would work if you applied masking to both input and output...

#

that has to be done somewhere in the literature already. right?

final kiln
#

I haven't looked into it yet, I'm still in the part of understanding and implementing the transformer. I'm using a Viz as guide

desert oar
#

i was focused mostly on the self-attention mechanism specifically, since that was the non-obvious part to me

#

(not that any of it was obvious, but it was the part that i really didn't understand from reading the literature)

final kiln
#

I think I got some intuition for self attention tho I do need to work through it.

#

I'm honestly stuck on why z score is used to normalize the vectors

#

Has no direct interpretation except that it keeps values from exploding

craggy patio
#

I am creating a MIDI music generative AI but have failed multiple times. I am starting over again and would like some insight on what models I should use

desert oar
#

the important thing is to put all numbers on roughly the same numerical precision scale

#

centering at 0 and rescaling by standard deviation just happens to work well for that, it helps ensure that you're "in the middle" of the space of what can be represented by floating-point numbers, allowing lots of room for numbers to be significantly smaller than or significantly larger than 0

#

it also does have direct interpretation in statistical models, so there's some carry-over if you squint

#

oh also, if you center the mean at 0, then scaling down by standard deviation is the same as normalizing in the linear algebra sense of dividing by the l2 norm, up to a constant factor of sqrt(n) for an n-dimensional vector
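
A quick numerical check of that relationship, as a sketch with an arbitrary example vector. With the population standard deviation (NumPy's default), std equals the l2 norm of the centered vector divided by sqrt(n), so the z-scored vector is the unit-normalized centered vector scaled by sqrt(n).

```python
import numpy as np

v = np.array([2.0, -1.0, 4.0, 7.0])  # arbitrary example values
n = v.size

centered = v - v.mean()
z_scored = centered / v.std()               # population std (ddof=0)
unit = centered / np.linalg.norm(centered)  # l2-normalized centered vector

# std = ||centered|| / sqrt(n), hence z_scored = sqrt(n) * unit.
matches = np.allclose(z_scored, np.sqrt(n) * unit)
print(matches, np.linalg.norm(z_scored))
```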

#

i really like this talk for an explanation of self-attention https://youtu.be/S27pHKBEp30?feature=shared&t=587

Leo Dirac (@leopd) talks about how LSTM models for Natural Language Processing (NLP) have been practically replaced by transformer-based models. Basic background on NLP, and a brief history of supervised learning techniques on documents, from bag of words, through vanilla RNNs and LSTM. Then there's a technical deep dive into how Transformers ...

#

the value of the i,j cell of the attention matrix is a relevance score of the j'th token in the input sequence "from the perspective of" the i'th token in the output sequence

#

that's why they mask off the upper triangle of the attention matrix in a decoder-only transformer, to prevent the i'th token in the decoded sequence from "attending to" any subsequent tokens
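
A minimal NumPy sketch of that masking, with random scores standing in for real query-key products. Entries above the diagonal are set to negative infinity before the softmax, so each row i ends up with zero weight on every later position j > i.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_tokens = 4
scores = rng.normal(size=(num_tokens, num_tokens))  # stand-in for q @ k.T / sqrt(d_k)

# True strictly above the diagonal, i.e. where column j is later than row i.
future = np.triu(np.ones((num_tokens, num_tokens), dtype=bool), k=1)
masked_scores = np.where(future, -np.inf, scores)
attention = softmax(masked_scores, axis=-1)

print(attention.round(3))
```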

final kiln
#

ye makes no sense

#

in 2d is awful

#

maybe im doing something wrong

#

the vector goes from spanning the entire 2d plane to being confined to two points

#

which makes sense, subtraction makes it confined to y = -x, then normalization forces the norm to be 1

#

so each time this is done two dimensions are discarded, ig the network will find some way of accounting for this
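
The 2d collapse can be reproduced with a short NumPy sketch (the `z_score` helper is just for illustration). With the z-score, every 2d vector lands on (1, -1) or (-1, 1), which have norm sqrt(2). Dividing by the l2 norm instead would give the same two directions with norm 1. In n dimensions the result lies on a sphere inside the hyperplane where the coordinates sum to 0, so n - 2 degrees of freedom remain, and the learnable scale and shift applied afterwards can move it off that set again.

```python
import numpy as np

def z_score(v):
    # Center, then divide by the population standard deviation.
    return (v - v.mean()) / v.std()

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2))  # random vectors covering the 2d plane

normalized = np.array([z_score(p) for p in points])

# Round away floating-point noise, then list the distinct outputs.
distinct = np.unique(np.round(normalized, 6), axis=0)
print(distinct)
```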

#
class SelfAttentionHead(nn.Module):
  def __init__(self, params: ModelParameters):
    super(SelfAttentionHead, self).__init__()
    self.compressed_coordinates = params.word_vector_size // 3
    self.q: TensorFloat["coordinates compressed_coordinates"] = RandParameter(
        params.coordinates, self.compressed_coordinates
    )
    self.k: TensorFloat["coordinates compressed_coordinates"] = RandParameter(
        params.coordinates, self.compressed_coordinates
    )
    self.v: TensorFloat["coordinates compressed_coordinates"] = RandParameter(
        params.coordinates, self.compressed_coordinates
    )

  def forward(self, sequence: TensorFloat["words coordinates"]) -> TensorFloat["words compressed_coordinates"]:
    # TensorFloat["words coordinates"] @ TensorFloat["coordinates compressed_coordinates"] 
    q_vectors: TensorFloat["words compressed_coordinates"] = sequence @ self.q
    k_vectors: TensorFloat["words compressed_coordinates"] = sequence @ self.k
    v_vectors: TensorFloat["words compressed_coordinates"] = sequence @ self.v

    # TensorFloat["words compressed_coordinates"] @ TensorFloat["compressed_coordinates words"]
    attention_scores: TensorFloat["words words"] =  q_vectors @ k_vectors.T
    attention_scores /= self.compressed_coordinates ** 0.5  # torch.sqrt needs a tensor, this is an int
    attention_scores = torch.nn.functional.softmax(attention_scores, dim=-1)

    # TensorFloat["words words"] @ TensorFloat["words compressed_coordinates"]
    return attention_scores @ v_vectors
#
class SelfAttention(nn.Module):
  def __init__(self, params: ModelParameters):
    super(SelfAttention, self).__init__()
    self.head_1 = SelfAttentionHead(params)
    self.head_2 = SelfAttentionHead(params)
    self.head_3 = SelfAttentionHead(params)
    self.projection: TensorFloat["words words"] = RandParameter(params.words, params.words)

  def forward(self, sequence: TensorFloat["words coordinates"]):
    att_1: TensorFloat["words compressed_coordinates"] = self.head_1(sequence)
    att_2: TensorFloat["words compressed_coordinates"] = self.head_2(sequence)
    att_3: TensorFloat["words compressed_coordinates"] = self.head_3(sequence)
    # Concatenate the heads along the feature axis; torch.stack would add a new axis instead
    output: TensorFloat["words coordinates"] = torch.cat([att_1, att_2, att_3], dim=1)
    return self.projection @ output
#

wait I should do softmax here isnt it

#

no is done on attention_scores = torch.nn.functional.softmax(attention_scores)

golden ridge
#

anyone has some resources on how to train neural networks??

final kiln
desert oar
#

i'm not an expert in deep learning as i'm sure you know, but i've only ever seen the latter

final kiln
#

They do m*z_score + b, where b and m are learnable
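
Roughly like this, as a NumPy sketch rather than anyone's actual implementation. The `eps` term and the initial values of `m` and `b` are my assumptions for illustration:

```python
import numpy as np

def layer_norm(x, m, b, eps=1e-5):
    """Normalize each row of x over its features, then apply m * z + b."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    z = (x - mean) / np.sqrt(var + eps)  # z-score within each row
    return m * z + b

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 0.0, -10.0, 0.0]])
m = np.ones(4)   # learnable scale, shown at an assumed initial value
b = np.zeros(4)  # learnable shift, shown at an assumed initial value
out = layer_norm(x, m, b)
print(out)
```

With m = 1 and b = 0 every row comes out with mean 0 and std dev 1, which also means an L2 norm of about sqrt(d). So z-scoring a centered vector is L2 normalization up to that constant factor.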

desert oar
#

you're talking about the layer norm step?

#

i see, so it is

#

i maintain it makes sense to both center and scale

#

it's the same reason you do it in just about any other machine learning model

#

it's good for numerical behavior

#

the fact that the scaling of centered data coincides with l2 vector normalization is just a bonus

#

The goal is to make the average value in the column equal to 0 and the standard deviation equal to 1. To do this, we find both of these quantities (mean (μ) & std dev (σ)) for the column and then subtract the average and divide by the standard deviation.

i wish they'd say why you do this, because what i said above is not obvious at all unless you already happen to know it

#

great resource overall, but too much focus on what/how and not enough on why

final kiln
#

1/norm(v) is much more intuitive

#

Uhm, I also wonder if anyone has tried to do "compression" of the attention heads.

So like, train a larger transformer, but then look at the attention heads and see if they can be used to train smaller ones. Effectively compressing them. Or maybe even changing architectures entirely.

desert oar
desert oar
#

it's not a linear transformation, but it doesn't actually change any of the subsequent interpretation

#

it's just shifting the entire space to exist in a more numerically-comfortable region

#

followed by rescaling the norm to 1

final kiln
desert oar
final kiln
desert oar
#

but yes, normalization (scaling) is linear

final kiln
desert oar
#

that does seem off

#

in this case "2D" means you have 2 possible tokens in the sequence. the idea is that the embedding for each token is centered at 0 mean and scaled to 1 std dev, but that shouldn't involve any nullifying of vector space dimensions. it's just shifting the origin, followed by squeezing/stretching

#

Normalization is an important step in the training of deep neural networks, and it helps improve the stability of the model during training.
i guess this is their explanation

final kiln
#

x - .5*( x + y) = .5(x - y)
y - .5 (x + y) = -.5(x - y)

#

As a sanity check

#

I think that's right unless I got fooled again by my eternal enemy, the minus sign

#

So there's one free variable after subtracting the mean

#

The 2017 paper points here: "Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint
arXiv:1607.06450, 2016."

#
desert oar
#

in statistics the sample average is just an estimate of a population mean so it makes sense in that context

final kiln
#

Yes it makes sense if you look at the coordinates as samples from the same distribution. But that sort of collides with the picture of a vector space where dot products measure similarity.

#

Which is how the best intuitions I've seen are being built from.

desert oar
#

there's also the whole aspect of learning slope and intercept parameters, which kind of throws off my interpretation before

pearl barn
#

guys How to start jupyter notebook locally on my windows I already installed Miniconda

odd meteor
pearl barn
#

How can I run external Jupyter notebook from online source to my windows??

#

Is miniconda enough to have Jupyter or do I need to install whole anaconda??

past meteor
past meteor
# pearl barn How can I run external Jupyter notebook from online source to my windows??

I'll refer you to an editor (visual studio code): https://code.visualstudio.com/learn/get-started/basics. You can install it and you can be up and running in a minute but there's also a 5 minute video you can watch if you want/need to.

You'll have to install the Python extension https://marketplace.visualstudio.com/items?itemName=ms-python.python

and afterwards follow a third of this guide (you certainly don't need to read all of it) https://code.visualstudio.com/docs/datascience/jupyter-notebooks

#

I'm mostly sending you in the direction of tutorials that you need to read and/or watch because of the old adage: “Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.”

hollow flicker
#

Hey, I've 75k row dataset. I need to use sklearn.MLPClassifier. I have 5 class. My accuracy score every time higher than 0.99. Why this can be happen?

#

my dataset distribution

past meteor
# hollow flicker Hey, I've 75k row dataset. I need to use sklearn.MLPClassifier. I have 5 class. ...

Can you do 3 things please:

  1. When sharing code could you use triple backticks (`) to paste multiple lines instead of screenshots, it's typically preferred here :D
  2. Could you use cross_val_score instead of cross_val_predict and casting to integers? It's the more idiomatic way to do this thing.
  3. Can you split into train and test, cross_val_score on train, then train the classifier "for real" on train, predict on test and then make a confusion matrix.
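
Something along these lines, as a sketch on synthetic data (the dataset, layer size and `max_iter` are placeholders I picked, not your setup):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real 75k-row, 5-class dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)

# Cross-validate on the training split only
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

# Fit "for real" on train, then evaluate once on the held-out test split
clf.fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))

print(cv_scores.mean())
print(cm)
```

If the confusion matrix on the held-out split still shows near-perfect results, the next things to check are usually class imbalance and features that leak the label.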
carmine ore
#

How to optimise linear regression model to produce better predictions?

past meteor
# carmine ore How to optimise linear regression model to produce better predictions?
  1. Using RidgeCV or ElasticNetCV (both in sklearn.linear_model) instead, as these models already carry out some hyperparameter tuning for you.
  2. Feature engineering: add interaction terms, binning, polynomials, splines, feature transforms and so on. The best way to identify if you need additional feature engineering is by doing residual analysis. Plot the error your model is making versus each variable. Normally there should be no structure in the residuals; if there is, you may need feature engineering.
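
A small sketch of what residual analysis can reveal, on made-up data with a hidden quadratic term. The data and the alpha grid are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 1.5 * x + x**2 + rng.normal(scale=0.3, size=200)  # hidden quadratic term

alphas = np.logspace(-3, 3, 13)  # candidate regularization strengths

# Purely linear features
X_lin = x.reshape(-1, 1)
model = RidgeCV(alphas=alphas).fit(X_lin, y)
residuals = y - model.predict(X_lin)

# Structure check: residuals should be unrelated to any function of x.
# Here they track x**2 closely, hinting at a missing feature.
corr_with_x2 = np.corrcoef(residuals, x**2)[0, 1]

# Add the engineered feature and refit
X_poly = np.column_stack([x, x**2])
model2 = RidgeCV(alphas=alphas).fit(X_poly, y)
residuals2 = y - model2.predict(X_poly)

print(corr_with_x2, residuals.std(), residuals2.std())
```

In practice you'd plot the residuals against each variable instead of computing one correlation, but the idea is the same.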
carmine ore
#

I will try the models you mention right away

#

When it comes to feature engineering I am trying to avoid it for today

past meteor
#

Does it need to be linear regression? I always try a gradient boosted machine and/or say Random Forest to see what their performance is and then compare that to the linear models. If they're doing significantly better then there's at least a few non-linearities that your linear model is not accounting for.

carmine ore
past meteor
#

Then you should definitely do what I suggested, the gbm / rf model will at least give you a lower bound of performance your linear model should be able to obtain

carmine ore
#

Btw RidgeCV performed horribly on my test data. I am currently using ElasticNet as it was the best one so far

#

The ElasticNetCV performed same as my regular Elastic Net. That’s because I already hand picked the best hyper parameters.

past meteor
#

Then residual analysis and feature engineering are what you should probably be doing. I'd stay with RidgeCV and ElasticNetCV, as adding features will change the hyperparameter values you need

grizzled locust
#

hi guys, sorry for interrupting, but does anyone know where i went wrong?

versed pilot
carmine ore
past meteor
serene scaffold
grizzled locust
#

but instead, it looks like this

serene scaffold
#

Can you download those CSV files and open them locally?

grizzled locust
versed pilot
#

Or an easier solution might be to run the code in Colab, it might have direct access to Google Drive that you don't otherwise get

onyx widget
#

hey i'm interested in getting into ML, what is the best way of starting this?

final kiln
#

My intuition here is that you start with a 512 dimensional space, and you only use a slice of a slice of it, a subspace of 510 dimensions. Still plenty of room to work with, but you excluded points with values that might cause numerical instability.

versed pilot
desert oar
#

the whole point is that the original data could be anywhere in space

#

and you want to bring it all back into the middle

#

but can you share the code? i want to make sure it's actually doing the right thing

#

that is, each "instance" should be normalized within itself, rather than what we normally do, which is each dimension being normalized across all "instances"

final kiln
# desert oar but can you share the code? i want to make sure it's actually doing the right th...

Sure:


Here's the full Python code used for the visualization with shifted and normalized grid points:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate a uniform grid in 3D space
x = np.linspace(-4, 4, 10)
y = np.linspace(-4, 4, 10)
z = np.linspace(-4, 4, 10)
xx, yy, zz = np.meshgrid(x, y, z)
grid_points = np.vstack([xx.ravel(), yy.ravel(), zz.ravel()]).T

# Subtract the mean of each point's coordinates from the point itself and then normalize
shifted_normalized_grid_points = np.array([(p - np.mean(p)) / np.linalg.norm(p - np.mean(p)) for p in grid_points])

# Plotting
fig = plt.figure(figsize=(12, 6))

# Original grid
ax1 = fig.add_subplot(121, projection='3d')
ax1.scatter(grid_points[:,0], grid_points[:,1], grid_points[:,2], color='b')
ax1.set_title("Original Grid")
ax1.set_xlabel('X axis')
ax1.set_ylabel('Y axis')
ax1.set_zlabel('Z axis')

# Shifted and normalized grid
ax2 = fig.add_subplot(122, projection='3d')
ax2.scatter(shifted_normalized_grid_points[:,0], shifted_normalized_grid_points[:,1], shifted_normalized_grid_points[:,2], color='r')
ax2.set_title("Shifted and Normalized Grid")
ax2.set_xlabel('X axis')
ax2.set_ylabel('Y axis')
ax2.set_zlabel('Z axis')

plt.show()

This code generates a 3D grid of points, processes each point by subtracting the mean of its coordinates and then normalizing it, and finally visualizes the original and processed grids.

final kiln
#

In that case the dot product is just the cosine similarity. For higher dimensions, you'd always still be working on a hypersphere.

#

It's also neat that it always ends up being a manifold. And that the dot product can still preserve that similarity interpretation.

#

How do latent spaces end up working ? Do the networks always partition the vector space into chunks ? Or do they make use of dimensions to represent things ?
(Like in QM, where a dimension will correspond to a possible state )

#

And has anyone tried doing these things with complex numbers ? Or checked if the network ends up learning them somehow, and possibly even forming a Hilbert space instead of the usual thing ?

timid trench
#

Guys, has anyone worked on chatgpt's api for python?

desert oar
desert oar
final kiln
desert oar
#

off the top of my head, you could rewrite shifted_normalized_grid_points this way:

shifted_normalized_grid_points = (p - np.mean(p, axis=1)) / np.linalg.norm(p - np.mean(p, axis=1), axis=1)

you might need to scatter in some [: , np.newaxis]
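
a fuller sketch of that vectorized version, reusing the grid from the code above. `keepdims=True` does the job of the `[:, np.newaxis]`; the zero-norm filter and its threshold are my additions, since points with x == y == z center to the zero vector:

```python
import numpy as np

axis_vals = np.linspace(-4, 4, 10)
xx, yy, zz = np.meshgrid(axis_vals, axis_vals, axis_vals)
grid_points = np.vstack([xx.ravel(), yy.ravel(), zz.ravel()]).T  # shape (1000, 3)

# keepdims=True keeps a trailing axis of length 1, so the (1000, 1) results
# broadcast against the (1000, 3) points
centered = grid_points - grid_points.mean(axis=1, keepdims=True)
norms = np.linalg.norm(centered, axis=1, keepdims=True)

# Points with x == y == z center to the zero vector; skip them to avoid 0/0
nonzero = norms[:, 0] > 1e-12
shifted_normalized_grid_points = centered[nonzero] / norms[nonzero]
print(shifted_normalized_grid_points.shape)
```

every surviving point lies on the unit circle inside the plane x + y + z = 0.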

final kiln
# desert oar what do you mean by chunks?

In partitions, from what I saw when I trained a small GAN, it looked like there were regions dedicated to each face and then a continuous morphing from one region to the other.

desert oar
#

i'd say that in general no, NNs don't always partition the data like that. but maybe modern architectures for image GANs are designed to do that, or do it naturally

final kiln
# desert oar ah, i don't know what GAN architectures are like

I used a decoder made of convolution layers. So it took a "small" vector of 512 dimensions and expanded it into an image. I trained it on random 512 vectors so it turned the vector space R512 into a latent space. A path in that space would originate an image of a face morphing continuously. I have the gif somewhere

#

Aaah I lost the gif

glass fiber
#

Can someone help me

serene scaffold
signal holly
#

Hi, how do I build a proper framework/schedule to learn coding?
I'm asking here because I'm trying to learn ml
and heard that you start with the python libraries, like pandas or mat
but doing tutorials and exercises is so boring to me
I usually have to slug through to make any progress
idk do I just push through

winter delta
#

How can i change a list to an array guys

final kiln
final kiln
#

It's also a lot more bearable to work on something I love than to watch someone talk about random stuff I don't need yet.

#

And the nail in the coffin is that studying theory is 10% (or less) of the learning process. You can read an entire book and then not be able to apply any of it.

signal holly
#

No hate towards anyone who does

final kiln
signal holly
#

Were you able to get a job out of it tho? (If you dont mind sharing)

final kiln
#

But I was able to get a job because I've done interesting stuff

signal holly
#

Good to know cause I'm going to college next year and dont know if I want to transfer to cs

final kiln
#

And I like to think that I keep doing interesting stuff ofc ahah

signal holly
#

True lol

final kiln
#

Passion will get you through the worst of it, but you should be careful with the particulars of your life.

#

So like, some people are very passionate about arts, but on avg that won't get you positioned in the job market.

signal holly
#

Yea I currently have compE because resources for learning electronics are usually pricier

#

Compared to cs

final kiln
#

Uhm yeah that's true

#

ML is also very much like that tho, it's very expensive

signal holly
#

Really? Ml is expensive?

#

In what way

final kiln
#

From my experience, even smaller scale stuff will be expensive because you need GPU just to experiment with stuff

#

When you get into the good stuff

#

It's prohibitive

#

Like gpt4 for example, you need the backing of Microsoft

#

Anthropic is backed by Google

#

And Meta is literally one of the Maang/FAANG or wtv the current name is ahha

#

Ah and data, data is expensive. One of the lessons that really stuck with me was about the data quality

#

Your model is as good as the data you give it. And the best data you can get is expensive. You'll usually pay a bunch of people to do the manual work of annotating things.

#

Data quality really really does make a huge difference, it's actually insane.

desert oar
#

https://python3.info/ just today i came across this book, it seems like a pretty good place to get started if you're interested in ML

winter delta
#

Between 2 vectors, how can i find exactly which numbers appear in both, and which one shows up the most?

#

Like looking for the duplicate numbers in 2 vectors

final kiln
#

I'm confused. You can find out if there's a two by doing any(vector == 2)

#

vector == 2 will apply the == 2 operation to each coordinate

#

Which will result in a boolean vector

#

And any will return True if any of the values is True
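
If the question is about values shared between two vectors, here's a sketch of one way to read it. The example arrays are made up, and I'm assuming NumPy arrays of integers:

```python
import numpy as np

a = np.array([1, 2, 2, 3, 5, 2])
b = np.array([2, 3, 3, 7, 2])

# Elementwise comparison gives a boolean vector; any() asks "is at least one True?"
has_two = bool(np.any(a == 2))

# Values present in both vectors (sorted, without repeats)
common = np.intersect1d(a, b)

# Count how often each common value occurs across both vectors
combined = np.concatenate([a, b])
counts = {int(v): int(np.sum(combined == v)) for v in common}
most_frequent = max(counts, key=counts.get)

print(has_two, common, counts, most_frequent)
```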

grizzled locust
#

anyone understand why it's like this,

#

and not like this?

desert oar
#

you'll need to export/download each document as CSV first

#

(google might have some API to do this programmatically, but i'm not aware of it)

grizzled locust
#

i thought i could use an oversimplified version

quaint loom
#

Has anyone faced the issue that "semTools" is not an exported object?

I have downloaded the semTools package, upgraded it and using library (semTools) in the beginning of my code.

Error: 'semTools' is not an exported object from 'namespace:semTools'
In addition: Warning messages:
1: In lav_data_full(data = data, group = group, cluster = cluster, :
lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
2: In lavaan::lavaan(model = model_description, data = data_processed, :
lavaan WARNING:
the optimizer warns that a solution has NOT been found!

trim saddle
#

Thats not python related right? Its R?

trim saddle
quaint loom
quaint loom
versed pilot
whole zephyr
#

hey, does anyone know how I can use seaborn or other viz tools to create a grid plot to display multiple dataframes in it?

I don't mean the "classic" pair plot that takes all the numeric columns in a single dataframe, but rather I want to display, in multiple subplots, the regplot of same 2 columns that are shared across multiple dataframes.

grizzled locust
#

sorry for being slow i guess?

versed pilot
#

sorry, not trying to tell you off, but if you link to the previous discussion then we can move on from there. Instead of starting from the same original question

trim saddle
desert oar
#

!d matplotlib.pyplot.subplots

arctic wedgeBOT
#

matplotlib.pyplot.subplots(nrows=1, ncols=1, sharex=False, sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None, **fig_kw)
Create a figure and a set of subplots.

This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.
desert oar
#

use that to set up a grid of axes, then you can plot whatever you want on each axes
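
a minimal sketch of that idea. the three dataframes and their column names "x" and "y" are hypothetical, and the fit line uses `np.polyfit` so the example only needs matplotlib; with seaborn installed, `seaborn.regplot(data=df, x="x", y="y", ax=ax)` can draw onto each axes instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical dataframes sharing the columns "x" and "y"
frames = {}
for name, true_slope in [("df_a", 1.0), ("df_b", -0.5), ("df_c", 2.0)]:
    x = rng.uniform(0, 10, size=50)
    frames[name] = pd.DataFrame({"x": x, "y": true_slope * x + rng.normal(size=50)})

# One subplot per dataframe, all sharing the y axis
fig, axes = plt.subplots(nrows=1, ncols=len(frames), figsize=(12, 4), sharey=True)
for ax, (name, df) in zip(axes, frames.items()):
    ax.scatter(df["x"], df["y"], s=10)
    slope, intercept = np.polyfit(df["x"], df["y"], deg=1)  # least-squares line
    xs = np.linspace(df["x"].min(), df["x"].max(), 100)
    ax.plot(xs, slope * xs + intercept, color="C1")
    ax.set_title(name)
fig.tight_layout()
```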

forest bolt
#

Hello guys, I'm working on a SOMA implementation for MEALPY and also enhancing the algorithms with mirror boundaries. Currently I'm struggling with convergence problems; is there somebody who could help me? I'll share everything, and my last step is to commit these improvements to the public package, but unfortunately I'm stuck.

grizzled locust
#

guys, anyone understand where i did wrong?

past meteor
#

Right now that variable isn't a data frame, it's a function you haven't executed yet

left tartan
grizzled locust
past meteor
grizzled locust
#

aight, i understand. thanks.

short heart
#

I'm trying to export my custom bert model to ONNX, but for some reason after loading the exported model it has empty input array, what could be the reason?

late wraith
#

hi

final kiln
#

Maybe my math is off, but I think self attention can be simplified to:

softmax(xMx^T / sqrt(d_k))Vx^T

Where M and V are the learnable parameters.

Went over it a couple times now.

#

This would kinda simplify the interpretation too, since M is kind of acting like a metric tensor

desert oar
jagged nebula
#

hello could anybody help me with uml class diagrams?

desert oar
desert oar
# final kiln Yes

i assume you'd want it to be the same shape as Q K'? how would you construct it?

final kiln
desert oar
desert oar
#

so you have the decoder-side tokens Y, and the encoder-side tokens X. how do you construct this M matrix differently from (Wq Y) (Wk X)'?

final kiln
#

You distribute the transpose

#

Wait

desert oar
#

err, i think i swapped q and k. same idea though

final kiln
#

No, it's X on both sides isn't it ?

#

Doesn't matter

desert oar
#

yeah, GPT is decoder-only and BERT is encoder-only, but this is the most general case

#

in the case of the nanogpt model you were working through in https://bbycroft.net/llm, they already simplified this operation somewhat

#

in general, you project queries, keys, and values into 3 separate spaces. even if they come from the same input sequence

final kiln
#

No I found an error in my calculation

desert oar
#

and even if you enforce that those 3 spaces are the same, you still have this "cartesian product" operation, multiplying all pairs of tokens together (at least looking backwards in the sequence, if you're in a decoder unit)

#

ah, okay then

final kiln
#

No this is too suss wait

#

Qx (Kx)' = Q x x' K'

#

= (x x') Q K '

#

It's probly gonna be the other way around

#

x Q (x K)' = x Q K' x' = x M x'

#

The second way makes more sense

#

And is how I produce the matrix

#

I mean both ways produce a single matrix. But the second way makes it so that it's not a scalar mul

#

Looking at the paper, Wq and Wk (which I'm calling Q and K above), have dimension d_model x d_k, since d_model is the size of the embedding vector, it must come from the left as a 1xd_model

cold dawn
cold dawn
#

i have no background into machine learning, im a self taught python 'developer' (im not professional, though i am proficient)

#

How hard is the road of learning to work with ML in python

#

from 0 to being able to expand on open source frameworks

#

if you were to give me advice, please dont focus on required python skills (mentioning some important frameworks to learn is nice though) but instead maybe give some sort of guidance on what subjects to tackle first

#

thx 🙏

left tartan
iron basalt
# cold dawn from 0 to being able to expand on open source frameworks

Expand in what way? Are you trying to make use of ML to solve some problem as a framework, in which you don't really touch the ML part directly, but build around it (like making use of a physics engine in a video game, but not touching the physics engine internals)? Or do you want to make new kinds of ML models (research)? Or the functions required to make those models, etc (e.g. GPU kernels)?

desert oar
#

ah, i see what you're doing here

#
Q = X @ Wq
K = X @ Wk
V = X @ Wv

Q @ K.T = (X @ Wq) @ (X @ Wk).T
        = (X @ Wq) @ (Wk.T @ X.T)
        = X @ (Wq @ Wk.T) @ X.T

i think you had it right the first time, but matrix multiplication doesn't commute so you can't pull out the (X @ X.T) like you did
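
a quick numerical check of that last line, with random matrices and made-up sizes (none of these numbers come from a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 12, 4  # illustrative sizes only

X = rng.normal(size=(n_tokens, d_model))
Wq = rng.normal(size=(d_model, d_k))
Wk = rng.normal(size=(d_model, d_k))

# Usual route: project to queries and keys first
scores_usual = (X @ Wq) @ (X @ Wk).T

# Merged route: fold both projections into one (d_model, d_model) matrix
M = Wq @ Wk.T
scores_merged = X @ M @ X.T

same = np.allclose(scores_usual, scores_merged)
rank_M = np.linalg.matrix_rank(M)
print(same, rank_M, Wq.size + Wk.size, M.size)
```

the two routes agree, but M is a d_model x d_model matrix whose rank can't exceed d_k. storing Wq and Wk separately takes 2 * d_model * d_k numbers versus d_model**2 for M, so the factored form is smaller whenever d_k < d_model / 2.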

#

that is a pretty interesting interpretation of what's going on though

final kiln
desert oar
final kiln
desert oar
#

precisely

#

that's actually the whole point!

#

it's basically a "soft lookup" , hence the names "query", "key", and "value"

final kiln
#

But did they try to do two matrices instead of three and didn't work out as well ? Even tho they're equivalent descriptions ? Or did they not realize what they were doing ? A single learnable metric tensor oughta be more efficient

desert oar
#

is there a way to reduce this to a single linear transformation of (X @ X.T) or X? Wq @ (X @ X.T) @ Wk.T

#

oh, i forgot to swap the other lines

#

lol, hang on

final kiln
#

Assuming X is only one vector, that's a single number

desert oar
#

ah yeah

final kiln
#

(which you can assume without loss of generality)

desert oar
#

if X is one vector, that means the input sequence had one token

final kiln
#

Yes

desert oar
#

but anyway you were right after all, you get X @ <something> @ X.T

#

so let me think that through, why you wouldn't want to just have "something" there

final kiln
#

It can be further expanded

#

x @ C @ M @ C.T @ x.T

desert oar
#

right, that's what i got to above

#

ah, ok

final kiln
#

Something like that, where C is a compression transformation and M is a metric tensor

desert oar
#

it's entirely possible that models which work on a single sequence (not on a pair of encoded and decoded sequences) already do this as an optimization

#

that or it actually doesn't work as well, that i would not know

#

i'm also not sure it allows for masking

final kiln
#

You can include it outside of all of it, when you get a square matrix

#

Like uhm

#

(mask) @ (custom dot product thing)

And then apply softmax, etc

desert oar
#

ah nvm, im looking at the attention is all you need paper now to confirm, and they do the masking after QK anyway

final kiln
#

It's a pretty cool idea this whole thing, but am super curious if the Wq and Wk are really needed and why, and if not why didn't they know it

desert oar
#

again i think in the most general case it allows for two different sequences, encoder and decoder

#

it's probably how they arrived at the concept

#

why they didn't simplify after, i'm not sure

final kiln
#

What do you mean by two different sequences ?

desert oar
#

like in a machine translation scenario, you train it on pairs of e.g. english and spanish sentences

final kiln
#

I haven't gotten to that part yet.

desert oar
#

but GPT doesn't do that

#

as far as i understand, that was one of the earlier use cases of transformers, although one of our local NLP experts would know better than i would

#

GPT and BERT came out later than Attn Is All You Need

final kiln
#

They use a single branch isn't it. Instead of encoder decoder thing

desert oar
#

right

#

interestingly nanogpt (the one you were looking at in the visualization tool) also doesn't do this

final kiln
#

Ah I can't look at it, I'm implementing from scratch

desert oar
#

well if you're implementing your own, you should be able to get the same or similar results doing it your way vs. the usual way

#

that'd be an interesting experiment, to compare training times and results

final kiln
#

Yeah if no one's doing it, kinda sounds like a paper cuz it's one less operation per head right

desert oar
#

the fact that it's not being done even in extremely optimized implementations makes me think we're just missing something

final kiln
#

I mean is a super simple mod, so I doubt anyone hasn't tried it yet

desert oar
#

hm.. is it actually one less operation?

#

i know it's one less matrix multiply, but the dimensions involved are bigger

#

originally you have (d_batch,d_model x d_model,d_key) x (d_batch,d_model x d_model,d_key).T so the inner multiplication is between matrices of relatively small dimension d_key

#

it's the same number of dot products, but the dot products are between smaller vectors

#

hm... no, that doesn't matter. because you're kind of proposing that the dot products themselves are essentially pre-computable

arctic wedgeBOT
#

keras/layers/attention/multi_head_attention.py line 626

def _build_proj_equation(free_dims, bound_dims, output_dims):
quaint loom
#

Is there anyone who know why one would use Bootstraps in a structural equation model?

past meteor
quaint loom
#

As you`re not familiar with SEM, you may not know why semopy is not able to calculate the r-square.

torpid violet
#

Hi

#

Is there any one having good knowledge of opencv and ml
I want to build a project for that I need some navigation I can make possible that If any one is here who will help me then please reach me

I have very good project and we can build it together

#

Then DM me

velvet thorn
#

Hi all, any resource for learning generative ai using python?

last ivy
#

Guys imma planning to develop a ml model so can u guys suggest some fresh and new ideas with some complexity involved ?

supple osprey
#

@last ivy yes I have idea

final kiln
#

in the case of nano gpt, 2*1/3*d < d -> 2/3 < 1

#

so the way it's done makes training more efficient

#

and if the other way around is more efficient for inference, it should be possible to reduce one form to the other

oblique quarry
#

Could someone smarter than me tell me why the resulting matrix doesnt have ones along its diagonal? Even though the paper explicitly states that the sqrt of a matrix has to be the original matrix when taking the dot product with itself

m.pearsonsCoefficient(covMatrix)
array([[ 0.60948941, -0.06662308, -0.59805044],
[-0.06662308, 0.00828873, 0.03770686],
[-0.59805044, 0.03770686, 1.34752355]])

https://paste.pythondiscord.com/OV5A
#

Yes I tested this method and the sqrt method works just fine as youd expect. Sigma is in this case the covariance matrix

desert oar
lapis sequoia
#

I am a newbie. can anybody give a road map for AI.

desert oar
#

that expression is just dividing each element by the square root of the product of its corresponding variances

oblique quarry
#

Im not a native speaker, so in simple english; are you just supposed to divide the cov Matrix element wise?

desert oar
#

i'm not sure what eigenVectors**0.5 * np.linalg.inv(eigenVectors) @ eigenVectors is supposed to do, but maybe i'm just too rusty with the math here

oblique quarry
desert oar
#

there's probably a way to rewrite x_vars_sqrt @ x_cov @ x_vars_sqrt using numpy broadcasting instead of constructing x_vars_sqrt explicitly. but the code above is the typical formula. it's also what's shown in your screenshot

oblique quarry
#

But wouldnt this contradict the assumption A^0.5A^0.5 is A?

desert oar
#

ah, you're trying to solve for the square root of the diag matrix that way

oblique quarry
#

Yeah im honestly kinda confused as well but I just went along as the author said and here I am 😉

desert oar
#

in this case a = np.diag(np.diag(x_cov)). the formula says that you want the square root of that thing. but we know by construction that a is diagonal, so we can use the special case formula where we just take the square roots of the elements

#

it should make sense intuitively... what is the result, in general, when you multiply two diagonal matrices?

#

i believe all the more-general matrix square root techniques depend on that result for diagonal matrices

#

in any case, you shouldn't need to compute the eigendecomposition here
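
here's a sketch of the whole computation on made-up data, both in the matrix form and with broadcasting. the mixing matrix is only there to make the columns correlated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up correlated data: 200 samples, 3 variables
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 2.0]])
data = rng.normal(size=(200, 3)) @ mixing
x_cov = np.cov(data, rowvar=False)

# diag(Sigma)^(-1/2): for a diagonal matrix, just take elementwise roots
std = np.sqrt(np.diag(x_cov))
inv_sqrt = np.diag(1.0 / std)
corr_matrix_form = inv_sqrt @ x_cov @ inv_sqrt

# Same thing via broadcasting: divide element (i, j) by std[i] * std[j]
corr_broadcast = x_cov / np.outer(std, std)

print(np.round(corr_broadcast, 3))
```

both versions have ones on the diagonal and match `np.corrcoef(data, rowvar=False)`.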

oblique quarry
#

Yeah now that you mention it. It does make sense

#

Im just not good at math lmao

desert oar
#

again, Σ in your text is just the covariance matrix, it's not related to eigenvalues

oblique quarry
#

Yep, thank you i now get along the diagonal only ones, which makes sense since they correlate to each other in a 1 to 1 ratio

oblique quarry
whole zephyr
#

hello, is there anyone who's more familiar with time series? more specifically price data

serene scaffold
whole zephyr
winged sigil
#

With the help of an AI model I want to assess the mental health of a kid using a survey/questionnaire. But the problem is I don't have an appropriate dataset to train my model for this. What should I do in this case? Can the concept of cold start help here? If yes, then how? Also, if I use 10-20 questions, is there a way to make the model learn by itself, like can I apply reinforcement learning to this? If yes, then how?

serene scaffold
winged sigil
#

yes something like this only, no open-answers. But we've to remember that questions will be specifically for children. So whatever you think might be used, please elaborate on it. I am in middle of a competition and I need some clarity on it. @serene scaffold

#

requesting anyone to please help. I need some information immediately. Kindly understand

serene scaffold
winged sigil
#

we just have to assess the responses of the questions and based on that we have to show a rating. Thats it.

serene scaffold
winged sigil
#

yes

#

because we want to train our model to our specific needs

supple osprey
#

@serene scaffold
Is there any one having good knowledge of opencv and ml
I want to build a project for that I need some navigation I can make possible that If any one is here who will help me then please reach me

I have a very good idea and we can build it together, and if not, can you guide me through it

small wedge
supple osprey
#

I want to build a logging tracking system with opencv with addition of ml algo

#

I just need of some guidance if someone having clear vision of ml opencv

#

We can make a little conversion

#

Can I DM you

obtuse bane
#

Does anyone here have experience the bureau of labor and statistics series id's? I'm working toward collecting more targeted data for analysis but the process of ascertaining data from their ids is cumbersome.

past meteor
past meteor
# supple osprey We can make a little conversion

People don't typically engage with such requests here, it's best that you ask specific questions e.g., "how do I do this in opencv" or "this is how I'm approaching it, is there anything wrong with it" compared to "I need guidance on task X, can anyone help?"

knotty flume
#

Anyone here interested in joining a hackathon? I need people with decent data science and AI skills. Drop a DM, I'm making a team.

|| Everyone on the previous team I joined quit, so I'm making my own team ||

whole zephyr
#

any ideas on how I could represent chart patterns on time series as features?

for example, I want to represent wedges or triangles on portions of the graph as some parameters that define them (as I would represent a trendline for the last i-X days with a slope at point i)

blazing vale
#
if i1==3:
    a1=df['Year'].value_counts()
    print(pd.DataFrame(a1,index=['A','B','C','D','E','F','G']))
#

getting all indices as nan

#

any clue why?

serene scaffold
#

@blazing vale df['Year'].value_counts() returns a Series where the indices are values from df['Year'] and the actual values are integers for how many times that index appeared in df['Year']
so, why are you trying to change the index to letters? what would that even mean?
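
A minimal sketch of what is probably happening, on made-up toy data (the column values and the `as_frame` name are only for illustration): when a Series is passed to `pd.DataFrame` together with `index=`, pandas looks those labels up in the Series index rather than renaming it, so labels that are not present come back as NaN.

```python
import pandas as pd

# Toy data standing in for the real CSV
df = pd.DataFrame({"Year": [2013, 2014, 2014, 2015, 2015, 2015]})
counts = df["Year"].value_counts()  # index = years, values = how often each year appears

# index= selects by label; 'A', 'B', 'C' are not years, so nothing matches
relabelled = pd.DataFrame(counts, index=["A", "B", "C"])
print(relabelled)

# Keeping the years as a regular column instead
as_frame = counts.rename_axis("Year").reset_index(name="count")
print(as_frame)
```

If plain row numbers are what is wanted, `reset_index` as above keeps the years visible next to the counts.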

woven sluice
#

What is the idea behind this transformation?

# Here we map each temporal variable onto a circle such that the lowest value for that variable appears right next to the largest value. We compute the x- and y- component of that point using the sin and cos trigonometric functions.
df['day_sin'] = np.sin(df.day*(2.*np.pi/31))
df['day_cos'] = np.cos(df.day*(2.*np.pi/31))
df['month_sin'] = np.sin((df.month-1)*(2.*np.pi/12))
df['month_cos'] = np.cos((df.month-1)*(2.*np.pi/12))

desert oar
#

mathematically it's like turning the number line into a circle

#

i suggest actually drawing it out on a unit circle

#

btw you probably want to use code formatting for this:

df['day_sin'] = np.sin(df.day*(2.*np.pi/31))
df['day_cos'] = np.cos(df.day*(2.*np.pi/31))
df['month_sin'] = np.sin((df.month-1)*(2.*np.pi/12))
df['month_cos'] = np.cos((df.month-1)*(2.*np.pi/12))
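
To make the "number line into a circle" idea concrete, here is a small sketch on toy data (the three-row frame and the `distance` helper are assumptions for illustration, not part of the original notebook). It checks that December ends up close to January on the circle, even though 12 and 1 are far apart as plain numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"month": [1, 6, 12]})

# Same encoding as above: month 1 sits at angle 0, month 12 just before a full turn
angle = 2.0 * np.pi * (df["month"] - 1) / 12
df["month_sin"] = np.sin(angle)
df["month_cos"] = np.cos(angle)

points = df.set_index("month")[["month_sin", "month_cos"]]

def distance(a, b):
    """Straight-line distance between two months on the unit circle."""
    return float(np.linalg.norm(points.loc[a].to_numpy() - points.loc[b].to_numpy()))

d_dec_jan = distance(12, 1)  # neighbours across the year boundary
d_jan_jun = distance(1, 6)   # five months apart
print(d_dec_jan, d_jan_jun)
```

Using both sin and cos matters: with only one of them, two different months can map to the same value.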
#

!code

arctic wedgeBOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

desert oar
#

that said i don't know if day-of-month is all that useful

#

maybe for things like paydays and stuff that's on a bimonthly cycle

hollow flicker
#

what's the difference between sklearn iterations and epochs?

desert oar
#

but months are inconsistent in size and occur somewhat arbitrarily at various weekdays, so i think you'll have relatively low signal from day-of-month

desert oar
hollow flicker
#

for example MLPClassifier doesn't have epoch value

#

It only accepts max_iter

#

when i plot my loss_curve, iteration is just 12

#

but if i check the internet, other models report it as epochs

#

this is my loss curve

desert oar
silent gull
#

what's a good dataset to practice training an ai on?

serene scaffold
crisp shuttle
#

Greetings everyone, I have a question: who has actually managed to fully train a functional multiple linear regression model (with more than 6 features) using their off-the-shelf PC/laptop, or at least Google Colab?

desert oar
#

the only thing you really can't do on off-the-shelf general-purpose hardware nowadays is deep learning for massive models

#

linear regression on moderately large datasets has been doable on off-the-shelf general-purpose hardware since the 90s

outer widget
# hollow flicker but if i check internet other models return as a epoch

These are common tensorflow models plots. If you are using skelarn MLPClassifier, iterations are same as number of epochs. Usually in DL frameworks, one epoch means N iterations where N is basically (total samples / batch size). Maybe thats why its a bit confusing when we set max_iter parameter in MLPClassifier.

#

sklearn*
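
A rough sketch to check this on synthetic data (the dataset, layer size and batch size below are arbitrary choices for illustration). For the stochastic solvers (`'sgd'`, `'adam'`) the scikit-learn docs describe `max_iter` as a number of epochs, so `loss_curve_` should hold one value per pass over the data, not one per mini-batch. With `solver='lbfgs'` the meaning is different, since there are no mini-batches.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(16,),
    solver="adam",
    batch_size=50,   # 200 / 50 = 4 gradient steps per pass over the data
    max_iter=12,     # upper bound on the number of passes (epochs)
    random_state=0,
)

with warnings.catch_warnings():
    # Stopping at max_iter before convergence raises a warning; expected here
    warnings.simplefilter("ignore", ConvergenceWarning)
    clf.fit(X, y)

print("passes over the data:", clf.n_iter_)
print("points on the loss curve:", len(clf.loss_curve_))
```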

humble cobalt
grizzled locust
#

Hi guys, I wanted to add value labels to the bar chart but it ended with an error. Where did I go wrong?

outer widget
blazing vale
#

btw anyone knows how to get years along with this in output

#
```py
if i1==4:
    a1=pd.DataFrame(df['Year'].value_counts())
    print('Year in which most number of games were released',a1.max(),'\n Year in which least number of games were released ',a1.min())
    space()
```
#

so i have a csv dataset covering 7 years which contains all the info on games released on the ps4 console

#

2013-2020

#

however this piece of code is giving me only max number of games

#

and min number of games

#

it isnt giving me the years along with it

small wedge
# blazing vale ```py if i1==4: a1=pd.DataFrame(df['Year'].value_counts()) ...

so value_counts returns a series that is automatically sorted for you, there's no need to turn it into a dataframe for this. Here's an example of how you can use it for this:

>>> a = pd.DataFrame({'Year': [1997, 1998, 1997, 2005, 2005, 2005]})
>>> a1 = a['Year'].value_counts()
>>> print(f'Year where the most games released {a1.index[0]}, got {a1.iloc[0]} games\nYear where the least games were released {a1.index[-1]}, got {a1.iloc[-1]} games')
Year where the most games released 2005, got 3 games
Year where the least games were released 1998, got 1 games
blazing vale
#

ohhh thankks mann

blazing vale
#

i am getting the same output but its giving me the name and dtype too. anyway to remove that from output?

knotty flume
humble cobalt
small wedge
blazing vale
#

ohh how do i check it?

#

what version i have

#

i can share my output

#

this is for value_counts

#

Year
2017 254
2016 222
2015 172
2014 98
2018 39
2013 20
2019 12
2020 8

small wedge
blazing vale
#

ok

#

i am using 2.0.3

#

Year in which most games were releasedcount 254
Name: 2017, dtype: int64,Year in which most games were releasedcount 12
Name: 2019, dtype: int64

small wedge
#

interesting

blazing vale
#

if i use only iloc i get this output

#

i am using a dataset of 826 rows

#

and 9 columns

#
```py
if i1==4:
    a1=pd.DataFrame(df['Year'].value_counts())
    print(f'Year in which most games were released{a1.iloc[0]},Year in which most games were released{a1.iloc[6]}')
    space()
```
small wedge
#

oh

#

you're still converting it to a dataframe

#

that's why the output is different

blazing vale
#

ohh waittt i forgot to do that

#

lol

#

lemme make changes

#

working properly now 🫡

#

thankssss

small wedge
#

np

blazing vale
#

but i still have a question

#

if i use it with a df why it returns name and dtype as well

#

but when i use the same func with series it doesnt do so

#

thats strange lol

#

and cool at the same time

small wedge
#

because a dataframe returns a series when you index via iloc

#

but a series returns the value that's at the index
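
A small sketch of that difference, using three of the counts posted above as toy data (the `frame`, `row` and `value` names are only for illustration): `.iloc[0]` on a DataFrame hands back a whole row as a Series, which prints with its `Name` and `dtype`, while `.iloc[0]` on a Series hands back the single value.

```python
import pandas as pd

counts = pd.Series([254, 222, 172], index=[2017, 2016, 2015], name="count")
frame = pd.DataFrame(counts)  # one column called 'count', years as the index

row = frame.iloc[0]     # a Series holding the first row; its name is the row label
value = counts.iloc[0]  # a single number

print(type(row).__name__)
print(row.name, value)

# To pull one number out of a DataFrame, give both a row and a column position
print(frame.iloc[0, 0])
```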

blazing vale
#

ohhh

#

so series just returns the value

#

whereas df returns a series, so you get the value along with name and dtype

blazing vale
#
if c==5:
      while True:
            print('''Enter 1 to get Total Sales of all games\n
Enter 2 to get Total sales in each genre and by each publisher\n
Enter 3 to get game info about the games with Maximum and Minimum Sales across each Region and ROW Sales\n
Enter 4 to get Maximum and Minimum Sales made by each publisher across each Region and ROW Sales\n
Enter 5 to get Maximum and Minimum Sales made in each genre across each Region and ROW Sales\n
Enter 6 to return to previous Menu''')
            space()
            i1=eval(input('Enter your choice: '))
            space()
            if i1==1:
               print(df[['Game','Year','Genre','Publisher','Global']])
               space()
            if i1==2:
               if df['Global']=='Action':
                  a1=df['Global'].sum()
#

@small wedge

#

Here

small wedge
#

mhm

blazing vale
#

lemme show the error

#
Traceback (most recent call last):
  File "C:\Users\LENOVO\Desktop\IP Project\Ip project101.py", line 130, in <module>
    if df['Global']=='Action':
  File "E:\lib\site-packages\pandas\core\generic.py", line 1466, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

#

error

#

data set

small wedge
#

what are you trying to do with these lines?

if df['Global']=='Action':
  a1=df['Global'].sum()
blazing vale
#

trying to find total global sales of each genre

small wedge
#

ah ok

blazing vale
#

here are all the genres if needed 'Action','Shooter','Action-Adventure','Sports','Role-Playing','Platform',
'Racing','Fighting','Adventure','MMO','Simulation','Music','Party','Strategy','Puzzle','Visual','Novel','Misc'

small wedge
#

so when you do df['Global']=='Action' this creates a boolean series, i.e. a mask of True/False values with one entry per row

#

you can use this mask to index your dataframe, then take the sum from the result instead

#

df[df['Global']=='Action'].sum()
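
A minimal sketch of the mask idea on toy data. It assumes the genre names live in a `Genre` column and the sales figures in `Global`, as the column list printed earlier suggests; the three rows are made up. A mask built from a column that never equals `'Action'` would be all False and select no rows, so it matters which column is compared.

```python
import pandas as pd

df = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action"],
    "Global": [1.5, 2.0, 0.5],
})

mask = df["Genre"] == "Action"  # one True/False per row

# Using the whole mask in a plain `if` is what triggers the "truth value is ambiguous" error
raised = False
try:
    if mask:
        pass
except ValueError:
    raised = True

# Select the matching rows, then sum only the sales column
action_total = df.loc[mask, "Global"].sum()
print(raised, action_total)
```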

blazing vale
#

Ohhhhhhhhhhhhhhhhhhhhh 💀

#

I am so dumbbbh

#

I could have done this lol

#

😭

small wedge
#

s'all good, sometimes you get lost in the sauce

blazing vale
#

Yeah

#

Hey if i wanna do it all for once for all genres

#

Then should i pass a list of everything there

#

Ohh wait then that would do sum of everything too 😭

small wedge
#

yeah and use df['Global'].isin(['Action','Other stuff', ...])

blazing vale
#

Should i define this function?

#

And use it again and again just by giving the name

#

Of the genre

#

This would reduce the typing and copy pasting part alot lol

small wedge
#

you could, you could also use groupby to split them all up for you

#

then select the groups you want and take their sums
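
Roughly what the groupby route could look like, again on made-up rows and assuming `Genre` and `Global` columns as in the earlier column list:

```python
import pandas as pd

df = pd.DataFrame({
    "Game": ["A", "B", "C", "D"],
    "Genre": ["Action", "Sports", "Action", "Racing"],
    "Global": [1.5, 2.0, 0.5, 1.0],
})

# One total per genre, computed in a single pass
per_genre = df.groupby("Genre")["Global"].sum()
print(per_genre)

# Pick out only the genres of interest afterwards
print(per_genre.loc[["Action", "Racing"]])
```

This avoids writing one mask per genre by hand; the result is a Series indexed by genre name.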

blazing vale
#

Isn’t groupby a sql function 💀

#

Never knew its in pandas too

#

Damnnnn

blazing vale
small wedge
#

a function for creating masks that match more than one category

blazing vale
#

Ohh

small wedge
#

just like a == 'a' or a == 'b' is cleaner to write as a in [*'ab'], in pandas you use isin
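
A quick sketch showing the two spellings side by side on a toy Series (the values are made up): chaining comparisons with `|` and calling `isin` should give the same mask.

```python
import pandas as pd

genre = pd.Series(["Action", "Sports", "Racing", "Action"])

# Element-wise "or" needs | and parentheses, not the `or` keyword
either = (genre == "Action") | (genre == "Racing")

# isin does the same check against a list of allowed values
same = genre.isin(["Action", "Racing"])

print(either.equals(same))
```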

blazing vale
#

I dunno but i think we call it boolean indexing here(the masks)

#

I dunno if its the same thing lol

small wedge
blazing vale
#

Yeah

#

Cause it returns true and false when checking a condition in a df

#

And series too

#

Is it the same thing?

#

I am almost done with my project lol thanks to you

small wedge
#

is what the same thing?

small wedge
blazing vale
#

like boolean indexing and masking

small wedge
#

probably but idrk shrug

blazing vale
#

lool

#

thanks again

#

imma continue my work further

blazing vale
#

@small wedge hey 🥲

#

Game 0
Year 0
Genre 0
Publisher 0
North America 0.0
Europe 0.0
Japan 0.0
Rest of World 0.0
Global 0.0

#

getting this output

buoyant vine
#

In Pytorch how do you create a zero(?) dim tensor with a single value...
A couple of metrics return a tensor(0.4) but I have no idea how that is created wearyfire

blazing vale
#

used the code u gave me

#

it isnt working

small wedge
blazing vale
#

ok

#
 if i1==2:
               print(df[df['Global']=='Action'].sum())