#data-science-and-ml


tidal bough
#

Looks like you shuffle the dataset (should use shuffle=False for time series). Also I'm a bit confused where the dataset is split into smaller timeseries - or does your LSTM take sequences of length 1?

gray slate
#

what goes in here? like, is the data set [0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7] and when split you have overlapping segments?

iron basalt
#

This goes back to the exposure I mentioned earlier. You just need to see a lot.

#

Same with math.

gray slate
# iron basalt Same with math.

Yeah I've seen a lot of procedural things I guess, so have a kind of process-based outlook on things. Too old to recalibrate myself for functional now :/

gray slate
tidal bough
# tidal bough Looks like you shuffle the dataset (should use `shuffle=False` for time series)....

(I think what's happening is that you're giving your model 80% of randomly chosen points along a curve, from which it's not hard to learn to predict the (few) gaps - the model just needs to learn to linearly interpolate between points it has seen. Whereas for timeseries prediction you need to, nonrandomly, use the first 80% of dataset for training and then the last 20% for prediction - then the model actually needs to learn how to extrapolate far into the future.)
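
A minimal sketch of the unshuffled split described above, in plain Python. The name `chronological_split` and the toy series are illustrative and not taken from the original code; with scikit-learn, passing `shuffle=False` to `train_test_split` should give the same kind of ordered split.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling: earliest rows train, latest rows test."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy series standing in for time-ordered prices (hypothetical data).
series = list(range(10))
train, test = chronological_split(series)
print(train)  # the first 80% of the timeline
print(test)   # the last 20%, which the model never saw during training
```

The point of the design is that every test row lies strictly after every training row, so a good score means the model extrapolated rather than interpolated.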

rancid sorrel
#
```
| 202401311456 | 0.686809    | 0.685685   | 0.693007  | 0.684981    | 0.00159292  |
| 202401311457 | 0.68398     | 0.685171   | 0.693259  | 0.685494    | 0.00108766  |
```
#

is the input data; the first column is being dropped as it's the date column

#

crap

gray slate
#

columns are "start, end, high, low, ??"

rancid sorrel
#

open,high,low,close,volume

#

aiming for close

#
```
created_by: parquet-cpp-arrow version 18.1.0
num_columns: 7
num_rows: 8601
num_row_groups: 1
format_version: 2.6
serialized_size: 4036


############ Columns ############
Date
Open
High
Low
Close
Volume
__index_level_0__```
#

if you want the full description

#

and yeah this is normalized data using min/max scaling

gray slate
#

are you predicting current close? or close of next minute?

rancid sorrel
#

honestly not sure, it's probably predicting the next one

gray slate
#

because it seems to me like "current low" and "current high" are gonna flail around until the end, you'll only be able to predict it if it stays within the window?

rancid sorrel
#

using non-shuffled data i got scarily the same results

gray slate
#

from what I understand, trading is largely based on psychology and superstition. people believe in technical analysis so they become self-fulfilling prophecies. traders use specific time windows like 1, 5, 15 minutes, 1 hr, 4hr etc, and look for patterns in the graphs

#

so you'd need to take as much data as fits on a trader's screen, at multiple scales, and use that as your input data

rancid sorrel
#
```
"Train Precision": 49.99, 
"Test Precision": 100.00, 
"Train Recall": 50.00, 
"Test Recall": 100.00, 
"Train F1 Score": 49.99, 
"Test F1 Score": 100.00, 
"Train Accuracy": 99.99, 
"Test Accuracy": 100.00, 
"Train MSE": 0.0473, 
"Test MSE": 0.2405, 
"Train RMSE": 2.17, 
"Test RMSE": 4.90, 
"Train MAE": 1.81, 
"Test MAE": 4.89, 
"Train MAPE": 1895412317.84, 
"Test MAPE": 6.56, 
"Train R^2": 98.90, 
"Test R^2": -78.00 ```
#

this is also with dropout in TensorFlow to eliminate overfitting

#

those are %

#

i dont trust data that good

tidal bough
#

I'd look very carefully at what data the model actually gets in fit_model

#

like, what shapes are X_train and y_train and their first few values and where they came from

gray slate
#

you'd need a ton of data for that I guess. you could maybe scrape it from tradingview or youtube though 🤔

rancid sorrel
#

(8601, 6)
total data shape of input
date is being dropped completely as the LSTM doesn't need it
train_x (ie open,high,low,volume)
(6880, 1, 4)
test_x
(1721, 1, 4)
train_y (close)
(6880,)
test_y
(1721,)

#

looks fine tbh

gray slate
#

what's len(set(", ".join(row) for row in data))

#

also drop the first column and check that. see if you have duplicate data
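
A rough sketch of that duplicate check, in plain Python. The one-liner above assumes every field is a string (`", ".join(row)` raises a `TypeError` on floats), so this version hashes tuples instead. The helper name `count_unique_rows` and the sample rows are hypothetical.

```python
def count_unique_rows(rows, skip_first_column=False):
    """Count distinct rows, optionally ignoring the first (date) column."""
    start = 1 if skip_first_column else 0
    return len({tuple(row[start:]) for row in rows})


# Hypothetical rows shaped like (date, open, close).
rows = [
    (202401311456, 0.50, 0.60),
    (202401311457, 0.50, 0.60),
    (202401311458, 0.70, 0.80),
]
unique_with_date = count_unique_rows(rows)
unique_without_date = count_unique_rows(rows, skip_first_column=True)
print(len(rows), unique_with_date, unique_without_date)
```

If the count without the date column is much smaller than the row count, the price columns repeat a lot, which is worth knowing before trusting any metric.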

tidal bough
#

as in, I think just guessing close=open would have a low loss here.
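
One way to test that suspicion is to score the trivial "close equals open" guess with the same metric as the model. This is only a sketch: the prices below are made up, and `mean_absolute_error` is a local helper, not the scikit-learn function of the same name.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical same-minute open and close prices.
opens = [1.00, 1.02, 1.01]
closes = [1.02, 1.01, 1.03]

# Baseline: predict that each bar closes exactly where it opened.
baseline_mae = mean_absolute_error(closes, opens)
print(f"naive close=open MAE: {baseline_mae:.4f}")
```

A model is only interesting if it beats this number on held-out data.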

rancid sorrel
#

most likely yes

gray slate
#

and the data is after close, and during the period you don't know high or low anyway, they're in a state of flux

rancid sorrel
#

see i knew it was too good to be true

#

ty guys

tidal bough
#

that probably explains your results, then. This isn't timeseries prediction. You need to transform your dataset into timeseries, so that each input to your model consists only of previous datapoints and the model has to predict the current one
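
The transformation described here can be sketched as a sliding window over the series, which also produces the overlapping segments asked about earlier. `make_windows`, `window`, and `horizon` are illustrative names, and the close prices are invented.

```python
def make_windows(values, window, horizon=1):
    """Turn a series into (past window, future target) pairs.

    Each input holds `window` consecutive past values; the target is the value
    `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(values) - window - horizon + 1):
        end = start + window
        inputs.append(values[start:end])
        targets.append(values[end + horizon - 1])
    return inputs, targets


closes = [10, 11, 12, 13, 14, 15]  # hypothetical close prices
X, y = make_windows(closes, window=3)
for past, target in zip(X, y):
    print(past, "->", target)
```

For an LSTM the inputs would then be reshaped to (samples, window, features), so the sequence length is `window` rather than 1.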

gray slate
#

yeah, you need "an average trader's screen width of data", and you want to predict a trend line from the last data point

#

if you're hoping to do high frequency trading, you're gonna get front-run by your broker's mates anyway - they're selling your data to the highest bidder who can make trades before you do

#

if you're not then you need a longer period of time

#

Idea being that you're predicting the behaviour of people who are doing technical analysis, or taking advice from people who are doing it, or are running bots that are acting that way. Then you add in sentiment analysis based on news + social media, or companies that provide feeds of those things (ideally ones that other traders are using)

tidal bough
#

Do humans actually... do technical analysis nowadays? I thought it was mostly a 1900s thing.

gray slate
#

it's the only legal way to conspire isnt it?

#

if you have a handbook of plays that are called "technical analysis" rather than "illegally conspiring with other traders using price history as a communication channel" then you can all conspire and get fat together

#

obviously when it comes to shares sentiment analysis is a channel (we get our mates in the financial press, as a covert signal), and quarterly reports are too (dunno if it's all hard data, the conspiritard in me would expect there to be secret handshake language in them), and global political news for things like currency and commodities, they send messages for what nation states want, and those who go against them won't get free printed money

#

I'd imagine a lot of that is decodable via machine learning techniques

#

there's a nonzero chance i'm talking out of my arse though 😂

unkempt wigeon
#

Has anyone ever thought of layer probability to add into
W*X+B

gray slate
unkempt wigeon
gray slate
#

lol I assumed I read it wrong, saw your edit!

#

you mean like a loss function for a specific layer?

unkempt wigeon
#

What I mean is taking a sum over the neurons within that specific layer and getting the average or the sum of what the layer is, and then passing that through to tell the network which layer might need to be tweaked a bit to get a more perfect answer

gray slate
#

do you know what the layer represents? like, did you train some of it one way and now you're locking layers during fine tuning or something?

#

or added a bunch of layers after training

unkempt wigeon
gray slate
#

I think there's "elastic weight consolidation" (EWC) that you use to stop catastrophic forgetting when you're tuning or continually training

#

that's kinda "find the most important weights and make sure they don't change too much because they have the strongest effect on the chance of a decent outcome"

#

but it's not per-layer, it has to be per-weight because in general you don't really know how the network is gonna capture the transformations. well, unless you are carefully designing the architecture to have certain properties, like encoder/decoder pairs
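
The per-weight idea matches the technique usually called elastic weight consolidation (EWC), whose penalty is (strength / 2) * sum_i F_i * (w_i - w*_i)^2, added to the loss of the new task. A toy sketch in plain Python follows; the numbers are invented, and in practice the importance values F_i would come from an estimate of the Fisher information.

```python
def ewc_penalty(weights, anchor_weights, importance, strength=1.0):
    """Quadratic penalty for moving important weights away from their old values."""
    return 0.5 * strength * sum(
        f * (w - a) ** 2
        for w, a, f in zip(weights, anchor_weights, importance)
    )


anchor = [0.0, 2.5]       # weights after the first task (hypothetical)
current = [1.0, 2.0]      # weights while training on the new task
importance = [10.0, 0.1]  # first weight matters a lot, second barely

penalty = ewc_penalty(current, anchor, importance)
print(penalty)
```

The first weight dominates the penalty even though the second one moved half as far, which is the "don't change the important ones too much" behaviour.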

unkempt wigeon
#

In essence you could make a sub-neuron group, like a class for each neuron. There's eight layers of neurons in each one that gives you a binary coded pair, ones and zeros, similar to a human brain. But that would require a high-end system just to render maybe one neuron, because it needs a lot of power and processing to go through each sub-layer of the neuron to find out the probability or number that should come out

gray slate
#

might be cool to have a single bit that says "if this bit is set, then don't change the power of this float. you can change its mantissa but not its power"

#

you'd get elastic weight freezing for 1 bit of RAM. But you'd also need to do some pretty low-level hacking in CUDA

#

dunno if its even possible. thought about it the other day but have no experience working at that level in CUDA

#

(I'm not a machine learning expert btw, just a graphics / performance nerd who is dipping his toe into this)

unkempt wigeon
#

Here's what I thought as a possible way sorry for my messy handwriting

unkempt wigeon
gray slate
unkempt wigeon
gray slate
#

ah okay. what's the intended outcome here?

unkempt wigeon
# gray slate ah okay. what's the intended outcome here?

To have the network get closer to a winnable probability of it saying this is correct, by locking the next layer till the probability output is as close as it can get. Sure, it seems stupid, but heck, it might be the most probable answer that can be obtained. The network might be slow, but after it gets the best probability on each layer, the network learns more about each probability. It does the back tracing as it's working on going forward, so that I can tweak it to get an outcome that can be quickly obtained without going through the entire network, finding out which layer needs to be changed, and then completely reworking it. By doing the process in parallel you could probably cut down on all the processes

gray slate
#

from what I understand, as the network learns, the first layers tend to settle down first and learn higher order patterns. then later on the deeper layers flap about more and learn more nuanced ones. You can use that to your advantage with curriculum learning - you give it easier data to start with then ramp up the complexity and reduce the learning rate as you progress

gray slate
#

though per layer sounds interesting. like if you found parts of your dataset that caused more variance in later layers, you could perhaps run a few tests and automatically build a curriculum?

unkempt wigeon
gray slate
#

Is P the loss at the current weight, and L of P some aggregate of the loss of its inputs?

unkempt wigeon
gray slate
#

I may be the wrong person to be asking this tbh! I kinda get the general principles but lack the mathemagic and practical experience!

unkempt wigeon
gray slate
#

seems like the whole field is blundering through hacks that kinda work via magic, and using empiricism to prove it

#

Do you have a specific task in mind?

unkempt wigeon
# gray slate Do you have a specific task in mind?

No, I was trying to condense the process into one task that can be done by the computer all at once. I know it seems stupid, but it allows the highest form or the best data with the highest probability to go forward to the next one, and then it would lock the best data strings and then take the data from that string and pass it on to the next one if it's already been previously unlocked via being above the threshold

gray slate
#

there's a lot of tasks out there, y'know

unkempt wigeon
# gray slate there's a lot of tasks out there, y'know

I know. I want it to be able to learn anything really, it depends on what I set it to, but the idea is that the network gets the highest probability through training. This is more of a genetic algorithm, but it can adapt: if the probability is high for something specific within that neuron arrangement or layer, it's allowed to pass on to the next one, and so on and so forth. If it gets to the end, the network learns that data, because it took multiple tries for the data to be fit enough to go through and be learned

rich moth
#

So try to imagine a single tool that can quantify the complexity of any dataset. Doesn't matter whether it's images, text, or numbers. Using a unified metric, Phi(x), it basically quantifies the complexity of the data based on things like density, entropy, phase, and uncertainty. It doesn't stop at just measuring complexity. Phi(x) helps uncover hidden patterns and relationships like identifying chaotic points in a time series or spotting high density regions in images or finding real relationships in tabular data. It's like an x-ray for datasets, a way to make sense of the abstract, hidden stuff in data.

rancid sorrel
#

you should also measure noise

rich moth
rancid sorrel
#

the biggest generic one is to find the data distribution

gray slate
#

compress it for starters I guess?

rancid sorrel
#

and the clustering of data

gray slate
#

I mean, decompress it first so it's raw. then throw it into PAQ

rancid sorrel
#

do you wanna know the "biggest" test for data?

rancid sorrel
#

second-order differential of the dataset

gray slate
#

rate of change of rate of change?

rancid sorrel
#

yup

#

d^2y/dx^2 or something
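
For evenly spaced samples, the "rate of change of rate of change" can be approximated with a second finite difference, y[i+2] - 2*y[i+1] + y[i]. A small sketch; the function name and sample data are illustrative.

```python
def second_difference(values):
    """Discrete second derivative for evenly spaced samples."""
    return [
        values[i + 2] - 2 * values[i + 1] + values[i]
        for i in range(len(values) - 2)
    ]


quadratic = [0, 1, 4, 9, 16]  # y = x^2 sampled at x = 0..4
linear = [1, 3, 5, 7]         # y = 2x + 1 sampled at x = 0..3

print(second_difference(quadratic))  # constant acceleration
print(second_difference(linear))     # no acceleration
```

With NumPy, `np.diff(values, n=2)` computes the same quantity.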

gray slate
#

you need to know dimensionality too though for that dont you?

rancid sorrel
#

not really

#

cause you want the trend line to be junk

#

if you get an actual trend line your data has been faked

#

its how they spot fake data in publications when someone with brains looks at it

gray slate
#

well a bitmap, for example, is [y][x][channels]

rancid sorrel
#

and it should, return junk

#

if it returns an r2 of say 1 you're fucked

#

cause the data has been manipulated

gray slate
#

not sure I understand the reasoning behind that. I could make it junk by multiplying in bytes from /dev/urandom couldn't i?

rancid sorrel
#

yeah but the ppl who submit their papers dont know that

gray slate
#

ah ok lol

#

they fake their papers and don't understand that they should find the normal distribution of the thing they want to fake and introduce some variance from it?

#

and, people actually do that? makes me very skeptical of any science that doesn't come with full datasets and runnable code; docker container / gtfo

rancid sorrel
#

sorry trying to find the video on it

#

can't right now

gray slate
#

I've a theory that anything that backs up something we already believe culturally is not science, it's ethics in a lab coat, and that the truth value of something is inversely proportional to how hard it's pushed.

rich moth
gray slate
#

clustering looks pretty cool, what's it clustering?

rich moth
gray slate
#

PCA = principal component analysis? what does that do exactly?
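
As a rough answer to what PCA does: it centers the data, finds the directions along which the data varies most, and projects onto the top few of them. The sketch below uses NumPy's SVD; the function name `pca` and the toy points are assumptions for illustration, not the code behind the plots being discussed.

```python
import numpy as np


def pca(data, n_components):
    """Project data onto its top principal components using an SVD."""
    centered = data - data.mean(axis=0)
    # Rows of `components` are unit-length directions, ordered by variance explained.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(data) - 1)
    projected = centered @ components[:n_components].T
    return projected, components[:n_components], explained_variance[:n_components]


# Toy 2-D points lying exactly on the line y = 2x.
points = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
projected, components, variance = pca(points, n_components=1)
print(components[0])  # direction of greatest variance (sign is arbitrary)
print(projected.ravel())
```

Here one component captures everything, because the two columns are perfectly correlated; on real data the explained variance shows how much each extra component buys.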

rancid sorrel
#

is point B related to A

gray slate
#

oh so that's the colour space of the image? like luminance and hue or something?

rich moth
gray slate
#

but does it operate on tensor array of pixels?

rancid sorrel
#

this data is clearly clustered for example

gray slate
#

or tensor of pixels I guess. array of pixels?

rancid sorrel
#

i mean you got a large group of one greyscale colour in an area kmeans would probably pick it up

rich moth
#

not exactly classification. clustering is an unsupervised learning technique.

rancid sorrel
#

i mean it tells you if say your data is a member of A,B,C

#

so yeah you can use it for classification

#

like for example
you like drama on netflix, if you like anime

#

thats one of the things kmeans can do

gray slate
#

yeah I get k-means. it kinda draws lines through the space cutting it into bubble type things that pop and stick together until you have N left?

rich moth
#

its results can inform or be used in conjunction with classification, but they're two separate beasts.

gray slate
#

with 3d it'd be planes, and with higher dimensions some n-dimensional surface line that's cropped by intersections

rancid sorrel
gray slate
#

k-means seems extremely computationally expensive from a gfx hacker point of view! but I guess it's generic

rancid sorrel
#

oh absolutely, it's an insane computation cost

#

its like what o^e time or someshit

gray slate
#

data science has all of the brute force and none of the ignorance lol

rancid sorrel
#

its been a while since ive consulted the chart

#

yeah i have a chart somewhere of a lot of the algorithms' complexity

rich moth
#

isnt k-means a general algo?

rancid sorrel
#

yes, but like a nuke is a long understood weapon you dont wanna pull it out

#

on big ass data ;)

rich moth
#

🙂

#

I mean there are several things going on but thats half of it

#

PCA is the other key

gray slate
#

hmm looking at it, k-means doesn't seem all that bad for iterations, it's splitting the space that'll hurt I guess

#

if you don't know how many clusters you have then I guess you need to have everything be a centroid then you'll get worst-case performance?
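
For reference, the standard k-means loop (Lloyd's algorithm) is short: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. Each pass costs roughly points x clusters x dimensions distance computations. This is a plain-Python sketch with invented data, and it assumes the number of clusters `k` is chosen up front.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

When the number of clusters is unknown, a common approach is to run this for several values of `k` and compare the results, rather than starting with every point as a centroid.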

rich moth
#

That was the last of the visuals

rancid sorrel
#

there are entire subjects on image processing tbh

#

it's one of the more developed fields, that and signal processing

wheat merlin
rich moth
#

scaling and PCA

#

But since all datasets will be memory efficient it solves that.

#

well all is a big word lol i cant say that yet

gray slate
#

The guy who invented YOLO put an entire course on computer graphics on YouTube

rancid sorrel
#

best option you have is parquet and avro

#

parquet if you want columns, avro if you want rows

gray slate
rich moth
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

gray slate
#

Well worth a watch, since, well, y'know, he actually invented YOLO. Knew a fair bit going in but there was a lot of good stuff in there too

rancid sorrel
#

whats pretty funny is disney cracked "green screen" decades ago

#

then lost it for 50-60 years

#

then some youtubers worked it out again

gray slate
#

what do you mean?

rancid sorrel
#

so pretty much all VFX now gonna be recorded in video streams

#

Four years ago, we learned about Disney's magic prism that created the best t...

rich moth
gray slate
rancid sorrel
#

you should also use the python profiler

rich moth
rancid sorrel
#

yeah it will tell you what's going on, with how many times stuff has been called etc

rich moth
#

ok let me check that out thanks

rancid sorrel
gray slate
#

snakevis

rancid sorrel
#

cProfile

gray slate
#

snakeviz? can't remember. it's good for profile outputs though
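
A minimal sketch of profiling with the standard library's `cProfile` and `pstats`. The function `slow_sum` is a made-up workload standing in for whatever code is being measured.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Collect the report in memory, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show at most the top 5 entries
report = stream.getvalue()
print(report)
```

Calling `profiler.dump_stats("profile.out")` writes the same data to a file (the filename is arbitrary), which a viewer such as SnakeViz should be able to open.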

rancid sorrel
#

i used it to test dataclasses

#

and holy fuck they are efficient

gray slate
#

if I had enough data

rich moth
gray slate
#

might be better as a web app actually, and have people type data in and post it to something that uploads it to archive.org

#

pretty sure milint and the likes already have this data. would be cool to let everyone else have access to it

rich moth
gray slate
#

Well, I'll run it for a bit locally and see if I can get something working I guess. If I can then javascript -> fastapi -> internetarchive -> tag the uploads so they can be found by anyone who wants to train it. release as a docker container

rich moth
gray slate
#

I doubt HuggingFace would allow it lol

rich moth
#

What about streamlit?

#

I guess thats not the best option

gray slate
#

Dunno, decentralized is probably the way to go for something that might upset the safety crowd

rich moth
#

How about FastAPI for the backend, React.js for the frontend, and leverage services like Elastic Cloud or AWS Elasticsearch and Grafana?

gritty vessel
#

Hey guys I wanted to ask I am working with conv3d

#

So if data is of height 300 width 300 depth let's say I stacked all images so it's 150

#

H=300,w=300,d=150

#

So If I keep kernel size x,y,10 so it will be capturing temporal features right? Till 10 time steps
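
To make the kernel question concrete: a depth-10 kernel looks at 10 consecutive frames at a time and slides along the stacked axis, and the output size per axis follows the usual convolution arithmetic. The sketch below is arithmetic only, and the 3x3 spatial kernel is an assumption since x and y were not given. In PyTorch's `Conv3d` the input layout is (batch, channels, depth, height, width), so the stacked frames would go on the depth axis.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis for a standard (non-transposed) convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


depth, height, width = 150, 300, 300  # stacked frames, image height, image width
kernel = (10, 3, 3)                   # depth 10 from the question; 3x3 is assumed

out_depth = conv_output_size(depth, kernel[0])
out_height = conv_output_size(height, kernel[1])
out_width = conv_output_size(width, kernel[2])
print(out_depth, out_height, out_width)
```

Each output position along depth summarises a window of 10 adjacent time steps, so a single layer cannot relate frames further apart than that.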

gray slate
# rich moth How bout FASTAPI for the backend React.js for the frontend and leverage services...

Well, with Internet Archive I don't need to pay anyone. I don't run it on anyone else's machine, I'm not responsible for it as a whole. It just runs when people decide to run it, they need an archive.org account - which is free. It collects and tags data, dropping it into a folder that can be downloaded by anyone, and Brewster's bunch aren't gonna delete data collected like that either. And some people will upload good data, others bad, and you can filter them by their archive.org username

rich moth
#

wow this is awesome thanks for sharing i didnt know something like this existed.

rich moth
gritty vessel
#

Encoder-decoder arch

rich moth
# gritty vessel Yeah

Do you plan on maintaining the dims throughout the network or will you collapse them at some point?

#

I mean what kind of output are you trying to generate with the encoder-decoder?

gritty vessel
#

It's unsupervised classification

#

So what I am trying to do is see if I can reconstruct an image properly

#

If I can reconstruct the images properly I will apply clustering on the latent space

rich moth
#

So that complexity tool is cool and all, but the real magic is how it feeds into a dynamic adaptive transformer. The cool part is the transformer can automatically adjust its dim size based on the complexity of the data, so like bigger for more complex stuff and smaller for simpler things. It's like the automatic transmission of a car: for each gear change it talks to a sensor to dynamically shift.

rancid sorrel
#

you know what i just thought of that would be fucking awesome

rancid sorrel
#

do you know what a continuously variable transmission is?

rich moth
#

I own one a Nissan Altima 2014.

rancid sorrel
#

make one for a transformer

rich moth
#

it's a pretty awesome idea.

rich moth
plucky iron
#
```python
# folder structure
# root/
# ├── demo/
# │   └── app.py
# ├── resnet18/
# │   └── app.py
# ├── resnet34/
# │   └── app.py
# └── app.py  # Main app to route traffic

import importlib.util
from pathlib import Path

from fastapi import FastAPI

# Create the main FastAPI instance
app = FastAPI()

BASE_DIR = Path(__file__).resolve().parent


def load_subapp(folder_name):
    """Load <folder_name>/app.py as its own module and return its `app` object."""
    module_path = BASE_DIR / folder_name / "app.py"
    spec = importlib.util.spec_from_file_location(f"{folder_name}_app", module_path)
    module = importlib.util.module_from_spec(spec)
    # Runs in the module's own namespace, so it cannot overwrite the main `app`
    # (the original exec(..., globals()) did exactly that).
    spec.loader.exec_module(module)
    return module.app


# Mount the sub-apps. Assumes each sub-app is an ASGI app (e.g. FastAPI);
# wrap it in WSGIMiddleware only if it is a WSGI app such as Flask.
for name in ("demo", "resnet18", "resnet34"):
    app.mount(f"/{name}", load_subapp(name))


@app.get("/")
def read_root():
    return {"message": "Welcome! Use /demo, /resnet18, or /resnet34 to access specific models."}
```

How can I route?
fickle shale
#

just ask

gritty vessel
#

Is there any way to capture variable time steps in a dataset? Like, I have many events

#

Rain events*

#

Some events are 9 hours long and some are like half hour

#

So when I create batches

#

One event has like 50 time steps that's the total duration of the event

#

And some had like 2 time steps that's around half hour

gritty vessel
#

Let's say event 1 ,timestep 1 will have a image

#

All the time steps in that event will have an image

#

But count of image differs in each event

fickle shale
gritty vessel
#

No what?

#

How can I have 5 images in 1 image

fickle shale
#

can u elaborate ur question i am not able to understand!

gritty vessel
#

Wait I will explain again

#

See it's raining for 9 hours so we will have 56 images for those 9 hours

#

This all 9 hours in 1 event

#

After few hours it rains again for 2 hours

#

So it's event 2 but now it has 12 images

#

Do you understand it now?

fickle shale
#

I don't know may be others can help u

fickle shale
#

and what does ur data look like?

gritty vessel
#

My bad

#

It's sequence of images

#

But for each event sequence is different

#

And size of sequence is different as well
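
One common way to batch sequences of different lengths is to pad them to the longest one and keep a mask marking which steps are real. This is a plain-Python sketch under that assumption; the integers stand in for image frames, and `pad_sequences` here is a local helper, not the Keras function of the same name.

```python
def pad_sequences(sequences, pad_value=0):
    """Pad every sequence to the longest length and return a matching mask."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real time step, 0 marks padding the model should ignore.
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


# Hypothetical events: a long rain event with 5 frames and a short one with 2.
events = [[11, 12, 13, 14, 15], [21, 22]]
padded, mask = pad_sequences(events)
print(padded)
print(mask)
```

The mask is then used so that padded steps do not contribute to the loss. Grouping events of similar length into the same batch keeps the amount of padding down.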

hybrid locust
#

Thing is, when running Main.py, it throws the following error. Do you know how to fix it?

```
midihum_model loading model from model_cache\midihum.json and model_cache\midihum_scaler.json
C:\Users\Guido\AppData\Local\Programs\Python\Python313\Lib\site-packages\sklearn\utils\_tags.py:354: FutureWarning: The MyXGBRegressor or classes from which it inherits use `_get_tags` and `_more_tags`. Please define the `__sklearn_tags__` method, or inherit from `sklearn.base.BaseEstimator` and/or other appropriate mixins such as `sklearn.base.TransformerMixin`, `sklearn.base.ClassifierMixin`, `sklearn.base.RegressorMixin`, and `sklearn.base.OutlierMixin`. From scikit-learn 1.7, not defining `__sklearn_tags__` will raise an error.
  warnings.warn(
midihum could not humanize the given file: 'super' object has no attribute '__sklearn_tags__'```
steep bough
#

Hello hello

#

I'm here to ask about what would it take to work on the more theoretical part of Data Science? The more "science" part of it rather than the more applied/industry part of it

#

I'm currently in a CS major, but I'm thinking about changing to a more math reliant degree at my college (which is math focused on Data Science and AI)

serene scaffold
#

@steep bough the kind of jobs you're describing are to be had in academia more so than in industry. so you'd need to get a PhD. And when you're getting a PhD, you get to blaze your own trail and be interdisciplinary.

whatever direction you end up wanting to go with theoretical data science, there are PhDs doing it in the context of CS, statistics, and probably a few others.

wheat merlin
#

Also when you apply, it can be a good practice to contact professors and suggest professors you would like to work with on your application

#

Showing that you know what research is and that you are prepared for it is like 80% of the selection process

steep bough
#

Yeah. I had a feeling that that would lead me to academia, which is what I want to do (or at least that's where my current interests lie)

#

And, what kind of projects does one do in theoretical data science?

steep bough
serene scaffold
steep bough
#

Got it
Thanks :3

rancid sorrel
mighty widget
#

i have an image it has bounding boxes which are txt files how do I make this in yolov8 format

#

the image looks like this

#

bounding boxes looks like this

#

in yolo format

#

how can I export this for use
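
If the boxes are stored as pixel corners, a YOLO label line is `class x_center y_center width height`, with all four numbers divided by the image size. The sketch below assumes that starting point; the box, image size, and class id are invented. If the txt files already hold normalized YOLO lines, no conversion is needed.

```python
def to_yolo_line(box, image_width, image_height, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box on a 400x500 (width x height) image.
line = to_yolo_line((100, 50, 300, 250), image_width=400, image_height=500)
print(line)
```

Each image then gets a `.txt` file with the same base name, one line per box. The Ultralytics tooling typically expects parallel `images/` and `labels/` folders plus a dataset YAML, though the exact layout is worth checking against its documentation.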

steep bough
#

Oh, and a bit of linear algebra

earnest widget
rich moth
#

The time series are really interesting.

fickle shale
cursive oriole
#

Hello, I want to get into AI-ML, I can't afford any of the paid courses available online which is why I'm watching a lot of statquest, but I haven't done anything on the coding part, I have a few project ideas I wanna try, one of them is to create an AI model that can be trained to speed run videogames and do things like find glitches or come up with tactics

#

I want some advice on what I should do next after watching StatQuest to like work towards that project if it is possible

bright rain
#

Why is the TensorFlow library not working on Python 3.13?

wheat merlin
rich moth
# fickle shale Damn! What r u doing!

The cool thing about analyzing the second derivative like this is that it pulls out patterns in how the data is accelerating or decelerating over time

odd meteor
odd meteor
# cursive oriole Hello, I want to get into AI-ML, I can't afford any of the paid courses availabl...

Are you familiar with Sklearn yet? If no, I think you can start with

  1. https://kaggle.com/learn
  2. Andrew Ng machine learning course on Coursera (it's free)
plush kettle
#

Do news classifiers classify news bias based on title or content? This is for NLP

serene scaffold
calm thicket
#

why not both

weak oxide
#

Some help with my learning of machine learning. I managed to master Prophet which I found surprisingly easy to implement but for some reason I cant implement a simple linear regression machine learning model

```python
import numpy as np
import pandas as pd  # was missing, but pd.read_csv is used below
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

print(plt.style.available)
plt.style.use('classic')

GIS = pd.read_csv('GIS Prices.csv')
GIS.head()
df = GIS[['y', 'Close']]  # keep only the two columns of interest
X = df['y']  # single brackets give a 1-D Series; this is what fit() rejects
y = df['Close']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
lr = LinearRegression()
lr.fit(X_train, y_train)
```
I keep getting this error : ValueError: Expected a 2-dimensional container but got <class 'pandas.core.series.Series'> instead. Pass a DataFrame containing a single row (i.e. single sample) or a single column (i.e. single feature) instead.
How would I fix this problem?
#

I get the error after I press lr.fit(X_train,y_train)

#

I would send the csv but dont know how

serene scaffold
weak oxide
serene scaffold
weak oxide
#

ok so Im unsure what to do then

serene scaffold
#

try changing X = df['y'] to X = df[['y']]

weak oxide
#

ok

#

It did work nice

#

didnt realize the importance of two brackets instead of one

serene scaffold
#

using two gives you a DataFrame instead of a Series

#

you can select more than one column that way. or you can select just one and get a DataFrame with only that column.
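
A quick way to see the difference between the two kinds of brackets. The frame below is made up, but the column names match the ones in the question.

```python
import pandas as pd

df = pd.DataFrame({"y": [1.0, 2.0, 3.0], "Close": [2.0, 4.0, 6.0]})

single = df["y"]    # one pair of brackets: a 1-D Series
double = df[["y"]]  # a list of column names: a 2-D DataFrame with one column

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

scikit-learn estimators expect `X` to be 2-D (samples by features), which is why the double-bracket form works with `fit` while the Series does not.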

weak oxide
#

somebody needs to change Stack Overflow

serene scaffold
weak oxide
#

I admit I didnt know

#

Ill see if I can get the linear to work anyway

serene scaffold
#
GIS = pd.read_csv('GIS Prices.csv')
train, test = train_test_split(GIS, test_size=0.2)
X_train = train[['y']]
y_train = train['Close']

this should be all that's required.

#

though it's pretty sus that the X data comes from the y column

weak oxide
#

I was doing prophet earlier with the same data

#

so it was ds, y, and then close (wheat prices)

#

just for familiarity with the forecasting model

serene scaffold
weak oxide
#

ok

#

anyways thanks

#
Prophet

Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.

#

thanks @serene scaffold

spiral willow
#

Hey guys, I am new to this channel. Actually I wanted to build some data science projects with your help and support. Can you guys please help me out? Have a great day ahead

serene scaffold
spiral willow
#

Okay sure, thanks

bleak dew
#

Are there any ways to get pandas to warn when using set items? df["new_column"] = ... ? Just spent an hour tracking down an unexpected edit of a dataframe like this.

serene scaffold
bleak dew
ornate iris
#

hello all, Im looking for other Data Scientists to review a tool Im trying to make, this one is mostly just about dealing with missing values and trying to automate work in our field where possible

https://paste.pythondiscord.com/FM6A

untold bloom
#
```
In [1]: import warnings

In [2]: def pd_setitem_that_warns(*args, original=pd.DataFrame.__setitem__):
   ...:     warnings.warn(f"setting some df with {args[1:]}")
   ...:     return original(*args)
   ...:

In [3]: pd.DataFrame.__setitem__ = pd_setitem_that_warns

In [4]: df["new"] = 7
/path/to/ipython:2: UserWarning: setting some df with ('new', 7)

In [5]: df
Out[5]:
  item  month  sales  new
0    A      1    100    7
1    A      2    200    7
2    B      3    300    7
3    A      2    100    7
4    D      1    300    7
5    Z      3    200    7
6    Z      4      0    7
7    B      2    500    7
```
you can wrap
serene scaffold
#

@bleak dew look at it ^

ornate iris
#

I decided to start again with the earlier file I shared as the foundation to a new program. Here's generation 1! Interface built in PyQt5.

I'd love to hear your feedback!

distant quail
#

whats the main difference between pytorch and tensorflow? Which should I be using more?

ornate iris
# distant quail whats the main difference between pytorch and tensorflow? Which should I be usin...

PyTorch and TensorFlow are both powerful frameworks for machine learning and deep learning, but they cater to slightly different preferences and workflows in the data science and ML community.

PyTorch is often favored for its dynamic computation graph, which allows you to define and modify the model graph on the fly. This makes it particularly intuitive and Pythonic, offering flexibility for experimentation, research, and debugging. PyTorch’s ecosystem integrates seamlessly with Python libraries like NumPy and pandas, which is useful for data preprocessing in data science workflows. Moreover, its focus on usability makes it a great choice for quickly prototyping models and working on smaller, more customized projects.

TensorFlow, on the other hand, is built with scalability and production in mind. It uses static computation graphs, which can be optimized for efficient deployment on various platforms, including servers, mobile devices, and even the web (via TensorFlow.js). TensorFlow offers tools like TensorFlow Extended (TFX) for end-to-end ML workflows and TensorFlow Serving for deploying models in production. If your work requires seamless transition from development to production or emphasizes large-scale, distributed training, TensorFlow might be the better fit.

In a data science context, if your focus is on exploratory analysis, rapid experimentation, and custom model development, PyTorch might feel more natural. However, if you're working on a project that needs to scale to production or integrate with a robust deployment pipeline, TensorFlow’s ecosystem offers more comprehensive support. Many data scientists choose to familiarize themselves with both frameworks, as each has unique strengths that can be leveraged based on the project’s requirements.

#

In the Python ecosystem, several libraries form the foundation of data science and machine learning work. NumPy serves as the cornerstone for numerical computing, providing efficient array operations and mathematical functions, while Pandas offers powerful data manipulation and analysis through its DataFrame structure. For machine learning, scikit-learn remains the go-to library for traditional algorithms, preprocessing, and model evaluation, while TensorFlow and PyTorch dominate deep learning applications, with PyTorch gaining particular popularity in research settings. XGBoost and LightGBM are essential for gradient boosting implementations.

For visualization, Matplotlib provides the basic plotting capabilities that many other libraries build upon. Seaborn extends Matplotlib with statistical visualizations and a more modern aesthetic, while Plotly offers interactive plots that work well in web applications and notebooks. For specialized visualizations, Bokeh excels at creating interactive dashboards, and Altair provides a declarative approach to creating statistical charts. In the R ecosystem, ggplot2 remains the gold standard for static visualizations, while libraries like Shiny enable interactive web applications.

distant quail
#

👍 thank you :)

serene scaffold
ornate iris
#

My apologies, I just reread the rules won't happen again

digital lark
#

HELP, I have a Darts TFT model that went well in the tests, but idk why it started giving out train_loss=nan.0. I even converted the covariates to dataframes, dropped NaNs and converted them back to series.

How can I stop getting training loss nan, what can cause this ?

serene scaffold
hexed plume
#

Hey, less intelligent question here. I am VERY new to python, don't know what I'm doing, and don't know how I managed to code a basic ai. I am using stable baselines 3 and gym for my ai and I'm wondering where I add additional inputs to the neural network. Please help me or give me good sources to learn how to do this 🙂

digital lark
echo lance
#

I want to work in machine learning.. but recently every job I get is related to gen AI and RAG projects only... I want to go more towards the ML side.. but I don't know why I automatically get pulled towards Gen AI.. is it good for the future? Or should I take some action?

shrewd spindle
#

Does anyone know about data parallelism like splitting the dataset to finetune a model using multiple GPUs. I have a project but I am a complete beginner in this field. Can anyone help? DM please

ornate iris
# echo lance I want to do work in machine learning.. but recently every work i got is related...

With the new advent of Large Concept Models, Gen AI has a promising next few years. Recommendation systems building more and more into slightly better chat bots and management systems. Especially with more and more research into multimodal, better versions of Test time adaptation and surprise minimization. But IMO, core ML isn't going anywhere and it's only getting stronger as Gen AI depends on developments in core ML skills. Really you have to ask what you're most passionate about.

serene scaffold
serene scaffold
fickle shale
echo lance
# serene scaffold RAG is the hottest thing in AI right now. it's not going to stay that way foreve...

I haven't done the finetuning yet.. and so it feels like I am just writing backend code, data processing and prompt engineering. And I feel that this is not data science. So I feel the experience I am gaining is not some high-demand experience.. this work can be done by a good developer with some documentation. 😕

So am I overthinking it? Or should I get more skills in ML or data engineering to get into a more difficult domain... with less crowd.. I don't know if it is correct to think like this

echo lance
odd meteor
# shrewd spindle Does anyone know about data parallelism like splitting the dataset to finetune a...

Hi glitch, don't ask question to ask question (if you know what I mean)

Not sure anyone would wanna commit to send you a DM if they can simply answer your questions here. Well, except the person is very free and feeling nice.

Well, what exactly do you need help with in distributed training? Did you try something and you got an error message or ?

Meanwhile, data parallelism (DP) isn't a recommended strategy in practise lately because there's a much better option.

Distributed Data Parallelism (DDP) is now the most preferred (recommended) strategy.

There are several variants of ddp strategy you could explore.

  • Regular one (DDP)
  • ddp_spawn
  • ddp_notebook (if you're training on jupyter lab instead of python script. I don't advise this though but, yeah, this strategy works if you're using Jupyter notebook/lab)
  • DDP Sharded
  • Bagua, DeepSpeed
  • Fully Sharded data parallel (fsdp)
  • etc.

I might be able to comment further if you provide a full picture on what exactly you need help with. Is it in fixing an error, or actually setting up your code for ddp, or??

fringe adder
#

I'm trying to write code for gan, and something is wrong with my discriminator but I can't figure out what. Can anyone help?

willow sequoia
#

guys do you think coding a neural network from scratch was a mistake ?😭

#

in python it was easy so i had to do it from "scratch" 😭

lapis sequoia
#

thanks to python and openai api i made a tool to cheat in school

willow sequoia
#

smart

mighty needle
#

I need a little help

Basically i have what i think is a tuple with data, the first is a tensor with the image data, and the other is an array which tells the class from which it came from

How do I use it to train a model? If i use something like data[0], i remove the other array, which i want as the classes also work as its classification

rancid sorrel
#

see data preprocessing

#

specifically image preprocessing

mighty needle
#

the image is in the form of a numpy array

serene scaffold
rancid sorrel
#

i mean there's a dedicated channel for it, but it's a part of data science and AI

#

there is just a specific subset of AI data preprocessing thats been around for a long time when it comes to Image/video

#

he can stay here if he wants, but... that was more the term for what he needs to google

#

there are a lot of existing libraries in tensorflow/pytorch so you don't have to actually mess with the image

hasty grail
#

I tried locally using gpt4all

```
  File "C:\Users\hp\Desktop\runnyyy.py", line 2, in <module>
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\jarvis\models\gpt4all\gpt4all-bindings\python\gpt4all\gpt4all.py", line 263, in __init__
    self.model = LLModel(self.config["path"], n_ctx, ngl, backend)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\jarvis\models\gpt4all\gpt4all-bindings\python\gpt4all\_pyllmodel.py", line 291, in __init__
    raise RuntimeError(f"Unable to instantiate model: {errmsg}")
RuntimeError: Unable to instantiate model: Could not find any implementations for backend: kompute```


this is the error I faced

```from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
model_path="D:\JARVIS\Models\gpt4all"
response = model.chat("Hello! How are you?")
print(response)```

this is the code I am trying to run

and I am pretty sure I followed all the instructions to locally download it
#

this is the picture of how my folder looks from inside of the gpt4all installation

rancid sorrel
#

sounds like you got an import error

#

this is why i dont use python on windows outside WSL2

mighty needle
rancid sorrel
#

ok what you actually trying to do?

#

like high level

mighty needle
#

the reason i have
batch_size=None
is because the batch size seems to turn the image data from a (256, 256, 3) to a (batch size, 256, 256, 3) which is a tensor

lapis sequoia
#

How do apis work?

serene scaffold
rich moth
#

I feel like future generations will be able to cheat easier, but the threshold for acceptance will change. It's funny yet strange how AI and humans agree on one thing, the easiest/fastest path is through exploitation.

rich moth
#

The real mind bender here is: did AI pick up exploitation from hard math or from studying human behavior in its training data? Maybe a synergy of both? What if looking for shortcuts is just a sign of intelligence?

fickle shale
#

Can anyone explain: in multi-head attention, why do we make the data more complex? Instead, couldn't we take the second-highest softmax probabilities (say we use 2 self-attention heads here)? Isn't that valid, and wouldn't it save computation?

small wedge
#

if you mean finding the easiest path that's just a result of how they are trained, the easiest path often results in a disproportionately large reward in rl for example (hence being an exploit)

rich moth
#

Imagine this, both AI and humans are wired to find the path of least resistance (well most of us). In AI training, especially RL, if there's a shortcut that gives a big reward, the AI will zero in on it.

#

I'm just seeing a pattern in how Ai and humans fundamentally achieve optimization problems in math and train human data.

#

It's no different than learning to play catch with your dad. I mean the training data.

small wedge
#

I guess that leads the analogy to an interesting place if you consider the exploitation/exploration balance that we manage in rl as well

#

if an agent only exploits something to the best of their knowledge there might be better exploits, they must take perceived suboptimal actions occasionally to thoroughly search the space of possible exploits
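
A minimal sketch of that exploration/exploitation trade-off as an epsilon-greedy bandit. The arm means, `epsilon`, and function names are all made up for illustration, nothing here comes from a specific RL library:

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon pick a random arm (explore), else the best-looking arm (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda arm: estimates[arm])

def run_bandit(true_means, epsilon, steps=5000, seed=0):
    """Play a noisy multi-armed bandit and return (pull counts, average reward)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running estimate of each arm's mean reward
    counts = [0] * len(true_means)
    total = 0.0
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # hypothetical noisy reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total / steps

counts, avg_reward = run_bandit([0.2, 0.5, 1.0], epsilon=0.1)
print(counts, avg_reward)
```

With `epsilon=0` the agent only ever exploits its current estimates, so it can lock onto a mediocre arm; a small `epsilon` keeps it sampling the "perceived suboptimal" arms.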

rich moth
rancid sorrel
#

i broke MS print to PDF

#

printing 979 pages to pdf is apparently the limit

bright comet
#

what are the best libraries that i can use to create an AI?

mighty needle
#

can anyone help me out with this error?

pine hawk
#

i wanna create a accident detection system how shall i do

fickle shale
unkempt apex
#

what it contains

unkempt apex
mighty needle
#

i used .shape() to check

#

both inputs used are tensors/arrays

cursive oriole
brave sand
#

does anyone know how to make sp.tocsc faster?

unkempt apex
pine hawk
#

how about crash detection T^T

unkempt apex
#

you can use any vision model, if you want to check performance!
but always go with recommendations

if you want to check more results
just search with "github" as suffix to your google search and you will find repos regarding to that

upbeat prism
#

Hello

cunning pond
#

Can someone help? I'm starting to learn Machine Learning and finding it really hard to understand linear regression. Why is that

cunning pond
#

i mean why is it that its hard for me to understand even the first model (is it normal for beginner?)

merry ridge
#

It helps if you have some background in calculus, the derivation is a pretty typical early application of derivatives.
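
For what it's worth, here is roughly what that derivation ends up with for a single feature: setting the derivatives of the squared error with respect to slope and intercept to zero gives slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A small sketch with made-up data (the function name is just for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # These two sums come from setting d(error)/d(slope) = 0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # This comes from setting d(error)/d(intercept) = 0
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```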

cunning pond
distant quail
#

if i want to train an AI to learn to play a game using Q Learning, do I have to design the game as well or can i do it on an actual game like subway surfers?

serene scaffold
distant quail
serene scaffold
#

But if all the information is knowable from the screen, then yes

distant quail
#

I see, and for things like people teaching their character in unity to walk, how would i go about doing something like that?

serene scaffold
#

I've never done reinforcement learning. But the AI has to be able to output the same keystrokes, etc as a player

distant quail
#

So it isnt feeding a script but rather letting it enter keystrokes and then telling it what was good what wasnt? by that i mean you dont give the character a script, you just let the script run on your laptop as if youre playing^

#

or am i misunderstanding the idea

iron basalt
distant quail
iron basalt
#

You can make custom environments, including hooking into real games.

#

There are extra environments online.

#

Start with these simple ones.
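
To make the Q-learning question concrete, here is a toy sketch of tabular Q-learning on a hand-written 5-state corridor. The environment, hyperparameters, and names are all invented for illustration; a real game would need its state and reward read from the screen or an API instead:

```python
import random

N_STATES = 5        # states 0..4, the reward sits at state 4
ACTIONS = [-1, +1]  # index 0 = move left, index 1 = move right

def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action choice (ties go right)
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move toward reward + discounted best next value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
print(q_table)
```

The agent is never given a script; it only sees states, picks actions, and gets told the reward, which matches the "let it press keys and tell it what was good" picture.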

odd meteor
warm copper
remote pewter
#

Hello guys. I just recently started learning Python and Pandas. I understand that becoming a data scientist or a data engineer is way out of reach for a beginner like me. So, I would like to first break into data analysis. What do you guys think I should study next?

serene scaffold
#

because companies probably aren't going to hire entirely self-taught people to help them make expensive business decisions

remote pewter
# serene scaffold is there a reason that you can't get a CS degree?

I was actually always interested in the field of programming but for some reason I didn't directly hop into computer science right after high school. I chose to pursue business instead because I was a very shy and introverted person and I believed studying business would somehow make me more outspoken (which is dumb but it actually worked out for me)

serene scaffold
remote pewter
#

Yes.

#

But i don't want to break into programming, rather into data science.

serene scaffold
#

sorry, I meant that it was in the business school, but what was the specific degree?

remote pewter
#

Business Administration

#

but it had a very diverse set of subjects ranging from statistics to business intelligence and data analysis

#

But right after i finished my undergrad, i started learning python. And i also took a 3 months data science with python class but they only taught me pandas.

serene scaffold
#

@remote pewter I would ask in #career-advice how people with your background have gotten jobs. be as detailed as you can, so that no one has to interview you to start giving you useful information.

cunning pond
remote pewter
weary timber
#

can anybody provide me a source to learn about residual networks

rich river
#
# Import necessary libraries
import torch
from PIL import Image
import torchvision.transforms as transforms

# Read a PIL image
image = Image.open('iceland.jpg')

# Define a transform to convert PIL 
# image to a Torch tensor
transform = transforms.Compose([
    transforms.PILToTensor()
])

# transform = transforms.PILToTensor()
# Convert the PIL image to Torch tensor
img_tensor = transform(image)

# print the converted Torch tensor
print(img_tensor)
#

Can transforms.PILToTensor() and other transforms only be used inside transforms.Compose? In every code example I see online they are used inside transforms.Compose
I tried to use tmp = transforms.ToImage(tmp) and it says Transform.__init__() takes 1 positional argument but 2 were given
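
A sketch of why that error shows up, using plain-Python stand-ins rather than torchvision itself (the `ToFloat`, `Scale`, and `Compose` classes below are hypothetical imitations of the pattern). Transforms are classes you instantiate first and then call on the data, so the torchvision equivalent would be `transforms.PILToTensor()(image)`, with or without `Compose`:

```python
class ToFloat:
    """Stand-in transform: configure in __init__, apply in __call__."""
    def __init__(self):
        pass

    def __call__(self, values):
        return [float(v) for v in values]

class Scale:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        return [v * self.factor for v in values]

class Compose:
    """Stand-in for a Compose helper: just calls each transform in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, values):
        for transform in self.transforms:
            values = transform(values)
        return values

pixels = [0, 128, 255]

alone = ToFloat()(pixels)                       # works without Compose
chained = Compose([ToFloat(), Scale(1 / 255)])(pixels)

try:
    ToFloat(pixels)                             # data passed to the constructor by mistake
except TypeError as err:
    error_message = str(err)

print(alone, chained, error_message)
```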

upbeat prism
#

Hello

A model basically “encodes” the learned knowledge, resp. its features, as weights. The “feature representation” of the knowledge is smaller than the knowledge itself. So I think there’s a lower bound to the size of the knowledge which is given by the “feature size” and an upper bound which is given by the “knowledge size”. Goal is to be on the lower bound so our model doesn’t just memorise.

The question now is: How do we figure out this size?

Note that I work with tiny networks, I solve XOR, MNIST. So we have a very low number of params.

My initial thought is: I just run the experiment for varying model size and varying data size. The idea: If the model's accuracy doesn't change for higher amounts of data, we actually captured the features, if accuracy is prop to it, then we memorize.

But there must be some more involved metric no?

odd meteor
torpid latch
#

hi folks, i want to learn to make ai. do you have any recommendations of book/video tutorials?

devout cloak
#

This video tutorial from 3Blue1Brown is a good tutorial/explanation of Artificial Neural Networks:

https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

What are the neurons, why are there layers, and what is the math underlying it?
Help fund future projects: https://www.patreon.com/3blue1brown
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

Additional funding for this project was provided by Amplify Partners

iron basalt
unkempt wigeon
#

Does torch have an audio transform?

torpid latch
serene scaffold
torpid latch
serene scaffold
torpid latch
serene scaffold
#

do you know what RAG stands for?

torpid latch
#

retrieval augmented generation?

#

im new to AI

serene scaffold
#

@torpid latch the steps are basically this:

  • the user asks a question
  • the system looks up information that's relevant to the question from a knowledge store; for example, it might pick out key words from the question and look up their Wikipedia articles
  • the system puts the user's question and the retrieved information into a prompt for the LLM
  • the LLM produces an answer to the original question
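
The steps above can be sketched in a few lines. This is a toy: the knowledge store is a hard-coded dict, retrieval is plain word overlap, and the final LLM call is left out entirely, so every name here is an assumption for illustration:

```python
# Toy knowledge store; a real system would use a search index or vector database.
KNOWLEDGE_STORE = {
    "prophet": "Prophet is a forecasting procedure implemented in R and Python.",
    "xgboost": "XGBoost is a gradient boosting library often used on tabular data.",
    "ddp": "DistributedDataParallel trains one model replica per GPU.",
}

def retrieve(question, store, top_k=2):
    """Step 2: look up passages that share words with the question."""
    words = set(question.lower().replace("?", "").split())
    scored = []
    for key, text in store.items():
        overlap = len(words & set(text.lower().replace(".", "").split()))
        scored.append((overlap, key, text))
    scored.sort(reverse=True)
    return [text for overlap, _, text in scored[:top_k] if overlap > 0]

def build_prompt(question, passages):
    """Step 3: put the question and the retrieved text into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What is Prophet?"            # step 1: the user asks a question
passages = retrieve(question, KNOWLEDGE_STORE)
prompt = build_prompt(question, passages)
print(prompt)                            # step 4 would send this prompt to an LLM
```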
torpid latch
opaque condor
# torpid latch do you have any recommendations of books/videos for starting learning AI?

Welcome to the most beginner-friendly place on the internet to learn PyTorch for deep learning.

All code on GitHub - https://dbourke.link/pt-github
Ask a question - https://dbourke.link/pt-github-discussions
Read the course materials online - https://learnpytorch.io
Sign up for the full course on Zero to Mastery (20+ hours more video) - https:/...
wide bane
#

can someone help with the deeplabv3 architecture, I want to learn about it and also if it is possible that i can use segmentation_model in training model?

upbeat prism
#

I do basic classification experiments on XOR, MNIST and Fashion-MNIST. Does anyone know a good reference for good models (and model sizes) for those problems?

white socket
#

hello im trying to train a yolo model using my gpu so i did this

from ultralytics import YOLO
model = YOLO("yolov9m.pt")
model.to('cuda')
results = model.train(data="config.yaml", epochs=3)

When training the model i get this error

NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
#

i dont understand what it means 😅

upbeat prism
# white socket i dont understand what it means 😅

YOLO, whatever it is, probably uses torchvision, in particular torchvision's nms op. When you use that to do computation, you run it on e.g. CPU or in your case GPU. Each case needs a backend: code that actually runs it on the CPU, GPU etc.

GPU = cuda, but there is no CUDA backend implemented for that op in your install.

In short: you can't do it on GPU with this install

#

there might be something else going on tho

#

from the docs. So I guess you dont have to do it manually.

teal reef
#

yo guys

#

i wanted to ask like how will you fix a data of where you have user input for their college name and now you have multiple variations of every college in the data now
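
One possible starting point, sketched with only the standard library: normalize the text, then fuzzy-match each entry against a list of canonical names. The canonical list, the cutoff of 0.6, and the function names are assumptions for illustration:

```python
import difflib

# Hypothetical list of "official" names you want everything mapped onto.
CANONICAL = [
    "Massachusetts Institute of Technology",
    "Stanford University",
    "University of Delhi",
]

def normalize(name):
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())

def canonicalize(raw, canonical=CANONICAL, cutoff=0.6):
    """Return the closest canonical name, or None if nothing is similar enough."""
    lookup = {normalize(c): c for c in canonical}
    matches = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

print(canonicalize("stanford univercity"))
print(canonicalize("  Univ. of Delhi "))
print(canonicalize("xyz"))
```

Abbreviations such as "MIT" will not match on string similarity alone, so they usually need a small hand-written alias table on top of this.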

sinful relic
#

did anyone try creating a text translation model, like from english to some unique language!
Let's say we want to create a model that translate English to LangX.
Any ideas?

serene scaffold
rich moth
#

Heres the complexity symphony i created for time series

rich moth
#

It shows the measurements of complexity over time in the time series in a way that's not just about numbers, but the evolution of data.

#

Like it's alive or something lol. But it makes sense, time series data is always concurrent, with a history of the past. Like time, it's always here.

rich moth
pearl imp
#

Hey I'm working on a chatbot that uses information from multiple datasets and I was wondering how I should handle its training. Specifically, how would you handle preprocessing, merging, and ensuring consistency across datasets, as well as choosing the right model and framework for fine-tuning?

serene scaffold
pearl imp
serene scaffold
#

Sorry, you said country. I thought you said county.

But what I said still applies.

pearl imp
#

What's wrong with fine tuning and what is RAG? The data is stuff like the last election result by district, previous polling stations, list of candidates by party, etc.

serene scaffold
hexed plume
#

Hey does anyone know what's wrong with

"model = PPO("MlpPolicy", env, verbose=55)"

It's giving me an error for not being a list?

serene scaffold
#

I'm in a car now so I'll elaborate when I get home hopefully

hexed plume
#

Traceback (most recent call last):
File "D:\Python.AI.Games\AI.8.venv\Scripts\Main.py", line 354, in <module>
train_model()
File "D:\Python.AI.Games\AI.8.venv\Scripts\Main.py", line 321, in train_model
model.learn(total_timesteps=30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
File "D:\Python.AI.Games\AI.8.venv\Lib\site-packages\stable_baselines3\ppo\ppo.py", line 311, in learn
return super().learn(
^^^^^^^^^^^^^^
File "D:\Python.AI.Games\AI.8.venv\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 323, in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Python.AI.Games\AI.8.venv\Lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 207, in step
return self.step_wait()
^^^^^^^^^^^^^^^^
File "D:\Python.AI.Games\AI.8.venv\Lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 59, in step_wait
obs, self.buf_rews[env_idx], terminated, truncated, self.buf_infos[env_idx] = self.envs[env_idx].step(
^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Python.AI.Games\AI.8.venv\Lib\site-packages\stable_baselines3\common\monitor.py", line 94, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Python.AI.Games\AI.8.venv\Lib\site-packages\shimmy\openai_gym_compatibility.py", line 250, in step
obs, reward, done, info = self.gym_env.step(action)
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object
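
The last frame of the traceback unpacks four values from the environment's `step`, so one common cause (an assumption here, since the environment code isn't shown) is a `step` method that reaches the end without a `return` on some code path. A minimal reproduction with made-up classes, no gym needed:

```python
class BrokenEnv:
    """Hypothetical env whose step() forgets to return anything."""
    def step(self, action):
        self.position = action
        # no return statement, so Python returns None

class FixedEnv:
    """Same env, but step() returns the 4-tuple the caller unpacks."""
    def step(self, action):
        self.position = action
        obs = [self.position]
        reward = 1.0 if action == 1 else 0.0
        done = False
        info = {}
        return obs, reward, done, info

try:
    obs, reward, done, info = BrokenEnv().step(1)
except TypeError as err:
    message = str(err)  # same kind of message as in the traceback

obs, reward, done, info = FixedEnv().step(1)
print(message)
print(obs, reward, done, info)
```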

#

code:

pearl imp
serene scaffold
white socket
marsh hedge
#

greetings everyone, i want to start learning ai, can anyone suggest me some of the good resources available online? Thankyou

rich moth
#

I'm still fine-tuning the visuals. I think I'll split them into two gifs

iron basalt
rich moth
#

i got some work todo still with the visuals (obviously) but

serene scaffold
marsh hedge
# iron basalt What do you think "AI" is? And what do you want to make?

i think 'AI' is the better version of the previous gen supercomputers which can do some tasks which requires human intelligence . It still cannot perform tasks like humans but in near future it will . what i want to make?-- my aim is to predict the packet loss in networking and automate the manual part because it sucks to get a high latency in fps games and i am tired of it. i am in my last year of graduation and i think i have the basics down....but i want to know what is happening in the global market....i also have some research papers on my name......i can show it to you personally which already got published.

marsh hedge
fickle shale
past meteor
#

Basically, you embed your documents (turn them into a dense vector) and when a question gets asked you embed that question, find the most similar documents and pass them on to the bot when it's answering
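
A rough sketch of that embed-and-compare step. Real systems use dense vectors from an embedding model; word-count vectors stand in for them here so the example runs without any dependencies, and the document strings are just placeholders:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in "embedding": a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "last election result by district",
    "list of candidates by party",
    "previous polling stations",
]
doc_vectors = [embed(d) for d in documents]  # embed the documents once, up front

def most_similar(question, k=1):
    """Embed the question and return the k closest documents."""
    q = embed(question)
    ranked = sorted(range(len(documents)),
                    key=lambda i: cosine(q, doc_vectors[i]),
                    reverse=True)
    return [documents[i] for i in ranked[:k]]

best = most_similar("which candidates does each party list")
print(best)
```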

dusty jetty
#

Hiee is there anyone who could suggest me how could i start up with elasticSearch as i want to start it in python and i do not know anything about elasticsearch

supple grove
#

if I have tabular data, is there a way I can gauge whether a random forest, xgb or NN will be the best performer without training all of them, or do I need to test all the models? Is there a way to gauge the threshold of data samples for a NN to be viable?

serene scaffold
#

I heard xgboost is the best for tabular data

supple grove
serene scaffold
supple grove
#

Didnt u just say "youd have to train them all"

serene scaffold
supple grove
#

I see the point with infinite node settings in a NN, but how does a model go about it without missing the best setting. training infinity is not possible

serene scaffold
#

You decide how many nodes the network will have. The network can't add or remove nodes from itself

supple grove
serene scaffold
supple grove
supple grove
serene scaffold
#

So you're submitting your model to a competition?

supple grove
#

No.

#

You dont want to be outperformed by competitors

#

Do you find it odd that I want to optimize the numbers?

serene scaffold
#

I think you've fallen into the trap of premature optimization

white socket
supple grove
#

E.g. one of my questions was whether anybody had suggestions on how to gauge the number of data samples needed to outperform xgb with a NN. A fair thought before training arbitrarily, without contemplating the possibilities and the better grounds for the end result, is hardly premature optimization.

rich moth
# dusty jetty Hiee is there anyone who could suggest me how could i start up with elasticSearc...

Download it from GitHub and install it, then edit the config file (config/elasticsearch.yml) to your liking. Start the server. You can add an initialization block to your Python script to connect to it. You can also index data right into it, like datasets and their embeddings. It's really easy to make a custom knowledge base for agents to access. Elasticsearch and Haystack work well together, maybe check that out.

dusty jetty
#

Is there any resource because I have been struggling for the entire day so it would help me a lot

upbeat prism
#

I have a torch dataset that has a plot function. I use subsets to split. Can I somehow make functions from the dataset available in my subset?

serene scaffold
rich moth
upbeat prism
sterile belfry
#

im trying to build a forecasting prediction model for sales but having some issues. would any one be able to give me a hand if I provide the code and data set

serene scaffold
serene scaffold
rich moth
#

Wanted to share my current version with you guys hope you like it. I could use some feedback too, thanks

#

So I built this adaptive transformer that uses this new method to measure data complexity. It's supposed to let the model automatically adjust its size and complexity to better fit the data it's processing, making it much more efficient and able to capture subtle details.

#

It also reduces the chances of the model overfitting by significant amount if not completely mitigating the issue

rich moth
#

I feel like AI and all of ML it encompasses is just pattern optimizers. I mean at a fundamental level.

serene scaffold
#

that's the whole game.

rich moth
serene scaffold
iron basalt
iron basalt
rich moth
iron basalt
iron basalt
#

ML is also slowly becoming an everything term in the same way AI is being used.

#

Some use ML in place of AI just so they can avoid a bunch of philosophical discussion, as everyone seems to have an opinion on it.

#

Or to de-hype what they are working on.

serene scaffold
rich moth
#

So I make an autotransformer with the DeepChem data. Playing around with the ideas I made.

vast yacht
#

hi guys, i'm currently working on a data engineering project that involves incremental load. i'm trying to replicate (not entirely) dbt's incremental model but dont know the backend logic they use to handle old/unchanged rows without (explicitly) running a full load again, maybe they use a log or cache or metadata or sth idk. if you're familiar with dbt, i need your help. let's dm ❤️ thankssssssssss

fervent canopy
amber sequoia
#

does anybody have experience maybe with outputting matplotlib to the browser?

supple grove
#

The first section is a 80iteration cross validation. Second part is predict_proba() results. Does this smell like over fitting or how can i check it? Noob at play

mild dirge
supple grove
#

Is the desired optimization by avoiding overfit that the test and train score goes towards a mean of the 2, in other words that the test score rises while the training score should drop?

upbeat prism
mild dirge
#

In which case training may not be affected as much, and the model will perform better on new data as well.

supple grove
#

All good thanks.

#

I read that data augmentation with tabular data could be difficult, not saying its off the table.
I tried to lower learningrate and colsamplebytree

remote stream
#

is there a way to utilise gpu in windows

supple grove
#

I got this instead, which seems to be more "generalized" and not overfitting?
I do wonder why the test auc mean is 67% but when I look at the last line (which is precision recall f1 support scores) they are very high, between 92% and 99%. I don't understand how they are so high when the test auc score is low.

jaunty helm
supple grove
#

no in case of the overfit?

#

Im not formatting it, just printing the result of print(precision_recall_fscore_support(y_test, y_pred))

jaunty helm
#

(I've never used that function in my life tbh)

supple grove
#

I'm just getting started, I was trying to understand the results im looking at.

supple grove
#

I dont understand when the testaucmean is around 67% shouldnt there be some of the results below that is closer to 67%?

jaunty helm
#

my guess is that you have a heavy imbalance of data, i.e. class_1 appears way more than class_2

supple grove
#

whats the testauc number at 67%?

jaunty helm
supple grove
#

Idk why I thought it would correlate with performance

#

No, i had looked it up.

jaunty helm
supple grove
#

based on the auc value or the bottom line?

jaunty helm
supple grove
#

I meant, is "wouldn't say it's performing great" based on the AUC or the line at the bottom (precision, recall, etc.)?

#

Im running the program again, ill let u know

jaunty helm
supple grove
#

ah, better formatting than the func I used lol.

jaunty helm
# supple grove

yeah so the "problem" is you have way more Falses than Trues, and your model does very poorly at predicting Trues overall
*this may or may not be a problem depending on your needs

supple grove
#

yea, only about 8% true

#

in the set

jaunty helm
# supple grove yea, only about 8% true

basically you can imagine that what your model's learned is to pretty much always predict False because that'll almost always be correct simply due to your training data having mostly Falses

#

and your data is called "imbalanced" in this case
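
A small sketch of why the headline numbers can look great on imbalanced data, assuming a test set with about 8% True (as mentioned above) and a stand-in "model" that always predicts False. The labels are made up; only the proportions matter.

```python
def per_class_recall(y_true, y_pred, label):
    """Fraction of samples whose true label is `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in relevant if p == label) / len(relevant)


# Hypothetical test set: 8 True out of 100.
y_true = [True] * 8 + [False] * 92
# A "model" that has only learned to say False.
y_pred = [False] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_false = per_class_recall(y_true, y_pred, False)
recall_true = per_class_recall(y_true, y_pred, True)
print(accuracy, recall_false, recall_true)
```

Accuracy and the False-class recall come out high while the True-class recall is zero, which is why per-class metrics are worth reading separately here.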

supple grove
#

hmm, I thought i was using smotetomek

jaunty helm
supple grove
#

I have forgotten to balance it after i switched to xgb, so need to repipe it

#

it wasnt balanced

#

hmm well i hope it works lol

calm thicket
supple grove
#

If I'm normalizing the data, is it wrong to normalize and then split? Should it be split first and normalized after the split?
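
A minimal sketch of the usual order, assuming plain standardization (subtract the mean, divide by the standard deviation): split first, learn the scaling statistics from the training part only, then apply those same statistics to the test part. Normalizing before the split lets test-set statistics leak into training. The helper names and the data are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn mean and standard deviation from the training values only."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # avoid dividing by zero for a constant feature
    return mu, sigma


def transform(values, mu, sigma):
    """Apply previously learned statistics; never refit on the test data."""
    return [(v - mu) / sigma for v in values]


data = [2.0, 4.0, 6.0, 8.0, 10.0, 50.0]
train, test = data[:4], data[4:]  # split first

mu, sigma = fit_scaler(train)  # statistics come from train only
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)
print(train_scaled, test_scaled)
```

The test values can land well outside the training range after scaling; that is expected and is part of what the test set is meant to reveal.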

jaunty helm
supple grove
#

tried with smotetomek. went from useless to trash ^^

jaunty helm
#

which will make the metrics less reflective of actual performance

supple grove
#

Right, I can see I actually did normalize after the split, as I thought I had when I began.

jaunty helm
supple grove
#

the minority is much more important to predict

jaunty helm
#

that's usually the case

#

depending on what model you use there also might be parameters that help w/ imbalanced datasets

supple grove
#

xgboost

jaunty helm
supple grove
#

read it was best for tabular data

#

should i remove the oversampling if trying class weights
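
A rough sketch of the class-weight idea, assuming the ~8% positive rate from earlier. XGBoost exposes a `scale_pos_weight` parameter for this, and a common starting value is the ratio of negatives to positives. Oversampling and class weights both boost the minority class, so it is common to try one or the other first rather than stacking them. The helper name and labels are hypothetical.

```python
def negative_to_positive_ratio(labels):
    """Ratio of negative to positive samples, a common starting value for a positive-class weight."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive samples to weight")
    return neg / pos


# Hypothetical training labels with 8% positives.
labels = [True] * 8 + [False] * 92
weight = negative_to_positive_ratio(labels)  # 92 / 8 = 11.5

# Equivalent per-sample weights: each positive counts `weight` times.
sample_weights = [weight if y else 1.0 for y in labels]
total_pos = sum(w for w, y in zip(sample_weights, labels) if y)
total_neg = sum(w for w, y in zip(sample_weights, labels) if not y)
print(weight, total_pos, total_neg)
```

With this weight the two classes carry the same total weight in the loss; whether that is the right trade-off depends on how costly a missed True is.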

upbeat prism
#

I wanna do a continual learning experiment on the Fashion-MNIST dataset. For that I want a paper or reference that tells me a good model size.

Here's their arxiv doc https://arxiv.org/pdf/1708.07747 but I'm not sure which one is actually a neural network in the classical sense. Maybe it's the SGDClassifier? ^^

#

ah maybe the MLP no

#

but wtf is a sgdclassifier then

#

Linear classifiers (SVM, logistic regression, etc.) with SGD training.

ah hmm
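
To make "linear classifier with SGD training" concrete, here is a minimal sketch of logistic regression trained by stochastic gradient descent in plain Python. It illustrates the idea only and is not scikit-learn's implementation; the toy data, learning rate, and epoch count are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(X, y, lr=0.1, epochs=1000, seed=0):
    """Fit weights w and bias b by updating on one sample at a time."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # "stochastic": visit samples in random order
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(z) - y[i]  # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """The model is linear: the class depends only on the sign of w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Toy, linearly separable data (made up for illustration).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [2.0, 2.0], [2.0, 3.0], [3.0, 2.0], [3.0, 3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_sgd(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

There are no hidden layers here, which is the difference from the MLP: the decision boundary is a single straight line (or hyperplane).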

supple grove
#

i tried to do a grid search. that just lowered the scores when applying the best "params". I feel the scores are really low and not good at predicting. Can I do something other than balancing and hyperparameter tuning? and could I hit a "jackpot", for example by doing something that would improve the accuracy 2x?

remote stream
calm thicket
#

there is a way

agile cobalt
#

generally speaking WSL is better supported, but some common tools do support it just fine

throw windows subsystem for linux on google if you never heard about it before

calm thicket
#

it's WSL

agile cobalt
#

oops derp
edited, I knew something felt off but always get it mixed up

errant bison
#

Can someone suggest a good project idea or any project definitions which is a requirement or good to learn

ornate iris
#

So I had this strange idea to put together a hybridized Latent Context Model and a Tiny Concept Model (based on a Large Concept Model), giving them a shared space, then wrap them in an AI team infrastructure. Then apply Monte Carlo Tree Search, test-time training and surprise minimization.

Does anyone have any suggestions or guidance on this kind of thing?

lapis sequoia
#

I guess I did most of what I could do, but I still got a CUDA OOM. Is this part of the VRAM unable to be allocated due to fragmentation or is it due to some other reason? I read the following document and still don't understand it.

weak oxide
#

Is there a good video you guys recommend for clean learning and understanding of LSTMs like any favorite YouTuber?

#

Because LSTMs I kinda want to utilize for long term prediction of some prices

#

If anyone has a favorite YouTuber that's good on the subject that's all

rich moth
rich moth
# weak oxide Is there a good video you guys recommend for clean learning and understanding of...

LSTM's and GRU's are widely used in state of the art deep learning models. For those just getting into machine learning and deep learning, this is a guide in plain English with helpful visuals to help you grok LSTM's and GRU's.

serene scaffold
#

!code

arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

tawdry sundial
#

whats better, llamaindex or langchain?

lapis sequoia
#

hello

#

is starting with scikit learn a good idea? 🤔

#

looks very solid

serene scaffold
#

and they tend to come up with obscure abstractions

timid jasper
#

Hey, what's the best way I can implement a generative AI?
I need a sort of unlimited amount of requests, but I prefer my AI to be locally hosted than using an API
I have over 1 million specific texts I want to train my AI on

serene scaffold
serene scaffold
timid jasper
tawdry sundial
#

how should i decide?

serene scaffold
#

at least, not in a form that you would recognize.

timid jasper
serene scaffold
timid jasper
#

Ah

#

So I'll have to go for an api solution, any recommendations for a free to priced api with good amount of requests per day?

serene scaffold
#

the generative AI boom has only been possible because of innovations in GPU technology since your GPU was designed.
and the cost of these new GPUs reflects their capabilities.

timid jasper
#

Makes sense!

serene scaffold
lapis sequoia
#

what differentiates an A100 and 5090 🤔 in terms of training models

serene scaffold
#

(so pretty much, the size and speed)

lapis sequoia
serene scaffold
lapis sequoia
#

a 4090 for example

serene scaffold
#

I'm not saying which is better for either; I'm just saying that if you're comparing two GPUs, those are the main things

lapis sequoia
#

yes

#

i just had another question 🫡

#

because it can do more calculations, shouldn't it technically render more polygons

#

so why dont they use a100 in those expensive gaming setups

serene scaffold
#

if the most graphically intense games are designed for at most 5090s, then there won't be added benefit from using a "better" GPU

lapis sequoia
#

how is 4090 faster than a100 in terms of processing power if a100 has 300tflops and 4090 has 90tflops

#

am i missing a variable

serene scaffold
#

like, if you need to carry 1L of water, and you have a 2L bucket, getting a 3L bucket wouldn't be an upgrade.

lapis sequoia
#

vram is the size of the bucket

serene scaffold
#

right; flops is speed. my metaphor is only for VRAM

lapis sequoia
#

so confusing

#

apparently, the a100 is slower

#

but it has more vram so it can train larger models

#

and more efficient than a 4090 since it uses less power

serene scaffold
#

whenever you say "more efficient", you have to say for what

lapis sequoia
#

like power efficiency

#

so its used in data centers

serene scaffold
#

less power per floating point operation?

lapis sequoia
#

for personal use, multiple 4090 is preferred over a100

#

yes

jaunty helm
jaunty helm
#

and also, you need software (drivers) that allows the hardware to render anything at all onto your screen
so e.g. if you only have an A100 linked up to your pc, it can't even render your desktop, unlike a 4090

#

the datacenter cards are pure CUDA compute

tight sphinx
#

Why when I request to Llama API for the llama3.1-70b model it takes forever and then just stops working

#

I'm using it for my discord bot

#

Also, I wanna train an AI on some datasets and pdfs too (texts), however I only have GTX 1050 Ti.. Is it sufficient?

#

If so, what's the suitable model to choose?

jaunty helm
tight sphinx
jaunty helm
#

and I don't use llama api in the first place

tight sphinx
#

I should have used the official API 💀

rancid sorrel
#

well video out

#

see lack of video out

brave sand
#

does anyone know faster alternatives to scipy tocsc?

serene scaffold
brave sand
serene scaffold
#

if it's converting between data formats, rather than doing a calculation, then probably not

brave sand
serene scaffold
brave sand
#

i replaced the sp (scipy) calls with cp and timed it

#

cupy is always slower

serene scaffold
#

cupy is just "numpy on cuda", so it implements array data types themselves
are the cupy arrays that you're creating using the GPU?

brave sand
#

shouldn't it default to using the gpu?

serene scaffold
brave sand
serene scaffold
brave sand
#

so i am using the gpu but it's still slower than scipy?

serene scaffold
#

you keep saying "slower than scipy", but that's not how it works

#

you can compare cupy and numpy. you can't compare either of them to scipy

#

can you show the code you're trying to run that's too slow?

brave sand
#

so context:
construct_reward_matrix is from a larger codebase that i ran a profiler on, it's saying .tocsc is taking the most time so i wanted to optimize it

serene scaffold
brave sand
serene scaffold
#

@brave sand and you're saying that the line that's too slow is which of these?

  • sp.coo_array((datas, (rows, cols)), shape=(self.n_actions, self.n_states))
  • cps.coo_matrix((datas, (rows, cols)), shape=(self.n_actions, self.n_states))
serene scaffold
#

because neither of them contains "tocsc"

brave sand
#
        return R.tocsc()
#

this one

#

right below it

serene scaffold
#

that same line appears after both; in which instance is it too slow?

brave sand
#

like, does .tocsc work for scipy and cupy?

#

so it's not a valid comparison?

serene scaffold
#

you have to look at what types sp.coo_array and cps.coo_matrix return. different types can have different methods with the same name, and that guarantees nothing.

brave sand
# serene scaffold you have to look at what types `sp.coo_array` and `cps.coo_matrix` return. diffe...

ok i will look into that. do you think this is even worth pursuing to make my code run faster?

this is the function from the codebase:

```py
    def construct_reward_matrix(
        self, t: int, market: Market = None, **kwargs
    ) -> sp.csc_array:
        """
        Construct a sparse reward matrix

        Args:
            t (int): timestep of the market price data ([-1, horizon])
            market (Market): market of prices.

        Returns:
            sp._csc.csc_matrix |A|x|S| matrix
        """
        if market is None:
            market = self.market

        if self.verbose:
            print(
                f"Constructing {self.n_actions:,}x{self.n_states:,} reward matrix."
            )

        harvest_items = self.get_harvest_items(market=market, t=t)
        penalty_constraint_sat_items = self.get_penalty_constraint_sat_items(
            t=t
        )  
        entries = harvest_items + penalty_constraint_sat_items

        locations = {(i[0], i[1]) for i in entries}
        assert len(entries) == len(locations)

        rows = []
        cols = []
        datas = []

        for action_id, state_id, data in entries:
            rows.append(action_id) 
            cols.append(state_id)
            datas.append(data)

        R = sp.coo_array(
            (datas, (rows, cols)), shape=(self.n_actions, self.n_states)
        )
        return R.tocsc()```
#

is the problem that converting to csc format is inherently slow no matter what library does it?

#

i thought using cupy would make it a lot faster because it's on the gpu

serene scaffold
#

it looks like CSCs are for representing sparse 2d arrays with less memory, with performance costs. and it also takes time to convert to that format.

brave sand
serene scaffold
brave sand
serene scaffold
brave sand
serene scaffold
brave sand
#
```
Raw duration: 0.000000 seconds
Scipy duration: 0.001004 seconds```
@serene scaffold
brave sand
#
```py
    def construct_reward_matrix(self, t, market=None, **kwargs):
        if market is None:
            market = self.market

        if self.verbose:
            print(f"Constructing {self.n_actions:,}x{self.n_states:,} reward matrix.")

        harvest_items = self.get_harvest_items(market=market, t=t)
        penalty_constraint_sat_items = self.get_penalty_constraint_sat_items(t=t)
        entries = harvest_items + penalty_constraint_sat_items

        locations = {(i[0], i[1]) for i in entries}
        assert len(entries) == len(locations)

        rows, cols, datas = zip(*entries)

        return rows, cols, datas```
#

*no lists sorry

#

i need a datatype to store them though no?

serene scaffold
#

what are "them"?

brave sand
#

bc i do this:

    R = sp.coo_array((datas, (rows, cols)), shape=(self.n_actions, self.n_states))
#

shld i use a dictionary?
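
A small sketch, assuming `entries` is a list of `(action_id, state_id, data)` tuples as in the function above: the three tuples produced by `zip(*entries)` can be passed to `sp.coo_array` directly (here converted to NumPy arrays first), so no dictionary is needed. The entries and shape below are made up.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical (action_id, state_id, reward) entries and matrix shape.
entries = [(0, 1, 2.5), (1, 0, -1.0), (2, 3, 4.0)]
n_actions, n_states = 3, 4

# Transpose the list of triples into three parallel tuples.
# Note: this unpacking fails if `entries` is empty, so guard for that case in real code.
rows, cols, datas = zip(*entries)

R = sp.coo_array(
    (np.asarray(datas, dtype=float), (np.asarray(rows), np.asarray(cols))),
    shape=(n_actions, n_states),
).tocsc()

print(R.toarray())
```

Whether this speeds anything up depends on where the time really goes; profiling the list-building loop and the `tocsc()` call separately would show which part dominates.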

serene scaffold
serene scaffold
violet gull
#

has anyone tried using LLMs for automated parsing in industry applications? I know LLMs can occasionally generate false information but recently I've seen no false information when it comes to numbers and stuff. I'm wondering if anyone has found a way to successfully use it and what kind of cross checking ensures accuracy?

#

Is there a way to increase the accuracy to the point it is equivalent to a human-written parser on complex input, and would it be cost effective to trade higher compute cost for lower development cost?

serene scaffold
violet gull
serene scaffold
# violet gull Chemical Sensor data which comes in big excel files

suppose that you have that excel file as a dataframe, and you pick the column that has the values you want to extract as col.
you can then ask something like "The following is a list of ... from a spreadsheet about ... . Please extract all the ... values that appear in the list. Examples of this include ... . Please return the extracted values as a JSON array of strings; please do not include any other explanations or text in your response. {col.tolist()}"

you will need to use an LLM that's instruction-tuned (they typically have "instruct" in the name).

#

you can modify the prompt and the inference parameters (like temperature), but don't try to fine-tune.

#

as far as evaluating the system: you need a test set with which you can calculate the precision and recall.
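
A hedged sketch of the evaluation loop described above, with no real LLM call: `model_response` is a hard-coded stand-in for whatever the model returns, and `build_prompt`, `parse_response`, the `kind`/`topic` arguments, and the sample values are all hypothetical. It shows parsing a JSON array of strings and scoring it against a hand-labelled test set.

```python
import json


def build_prompt(values, kind, topic):
    """Fill in a prompt template; `kind` and `topic` are chosen by the caller."""
    return (
        f"The following is a list of cells from a spreadsheet about {topic}. "
        f"Please extract all the {kind} values that appear in the list. "
        "Please return the extracted values as a JSON array of strings; "
        "please do not include any other explanations or text in your response. "
        f"{values}"
    )


def parse_response(text):
    """Accept only a JSON array of strings; anything else counts as a failed parse."""
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        return []
    return items


def precision_recall(predicted, expected):
    """Set-based precision and recall of extracted values against the labelled ones."""
    pred, exp = set(predicted), set(expected)
    true_pos = len(pred & exp)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(exp) if exp else 0.0
    return precision, recall


cells = ["reading 12.5 ppm at 10:00", "0.3 ppm (calibrated)", "sensor offline", "7 ppm"]
prompt = build_prompt(cells, kind="concentration", topic="chemical sensor readings")

expected = ["12.5 ppm", "0.3 ppm", "7 ppm"]           # hand-labelled test set
model_response = '["12.5 ppm", "0.3 ppm", "99 ppm"]'  # stand-in for the LLM output

precision, recall = precision_recall(parse_response(model_response), expected)
print(precision, recall)
```

Treating unparseable output as an empty extraction keeps format failures visible in the recall number instead of crashing the evaluation.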

violet gull
#

On large scale

serene scaffold
violet gull
#

I didn't ask for that

serene scaffold
#

you're asking how accurate LLM-based entity extraction is in general/universally

#

are you not?

violet gull
#

I'm asking has anyone done it, how well did it work

serene scaffold
#

it worked well when I did it.

#

at least for certain types of entities.

violet gull
#

Failure rate? implementation complexity? what kind of data?

iron basalt
#

It improves both memory usage and performance. The reason it improves performance is that when doing multiplication, you can skip all zero entries, resulting in fewer multiply-adds (fewer loop iterations in total; it does not even consider the zero entries).

#

However, those non-zero elements are slower (jumping around in memory position where those non-zeros are) than before, so this must be outweighed by the matrix being sparse enough.

#

If you are not constructing this sparse matrix repeatedly, but only once, or only every once in a while, then formats such as CSR/CSC are optimal.

#

They take more time to construct, but run just as fast as dense matrices, while also avoiding all zero entries and reducing memory usage.

#

(COO if you need to construct the matrix repeatedly and need that to not be slow (ratio of number of times you construct the matrix over how many times you use it to multiply is not small))

iron basalt
#

Converting a dense matrix to such a sparse matrix requires looping over all the matrix entries, creating this contiguous array from the non-zero entries. So if you then only use that to multiply once, you are not gaining anything unless the matrices are giant and very sparse (due to O(n^3) multiplication).
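
To illustrate the points above, a plain-Python sketch of what a CSC matrix stores and why multiplication can skip zeros. It is a teaching sketch, not how SciPy implements it: `coo_to_csc` sorts the (row, col, value) triples by column (the construction cost), and `csc_matvec` only ever touches stored non-zeros.

```python
def coo_to_csc(rows, cols, data, n_cols):
    """Convert COO triples into CSC arrays (indptr, indices, values)."""
    # Sorting by column is the conversion cost paid once up front.
    order = sorted(range(len(data)), key=lambda k: (cols[k], rows[k]))
    indptr = [0] * (n_cols + 1)
    for c in cols:
        indptr[c + 1] += 1          # count non-zeros per column
    for c in range(n_cols):
        indptr[c + 1] += indptr[c]  # prefix sum gives each column's start offset
    indices = [rows[k] for k in order]
    values = [data[k] for k in order]
    return indptr, indices, values


def csc_matvec(indptr, indices, values, x, n_rows):
    """Compute y = A @ x, visiting only the stored non-zero entries."""
    y = [0.0] * n_rows
    for c in range(len(indptr) - 1):
        for k in range(indptr[c], indptr[c + 1]):
            y[indices[k]] += values[k] * x[c]
    return y


# A 3x4 matrix with only 3 non-zero entries (made-up values).
rows = [0, 1, 2]
cols = [1, 0, 3]
data = [2.5, -1.0, 4.0]

indptr, indices, values = coo_to_csc(rows, cols, data, n_cols=4)
y = csc_matvec(indptr, indices, values, [1.0, 2.0, 3.0, 4.0], n_rows=3)
print(indptr, indices, values, y)
```

The inner loop runs 3 times here instead of 12 for the dense matrix, which is the saving being described; the sort is why construction is the slower step.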

rancid sorrel
serene scaffold
rancid sorrel
#

you wanna take A and make it B thats a sparse matrix?

hollow carbon
#

anyone know some free real time data api that i can use for data analysis maybe sports or something

rancid sorrel
#

depends on sport

hollow carbon
#

wdym

rancid sorrel
#

major league baseball and american football tend to put out a lot of stats

hollow carbon
#

yeah basketball or football

rancid sorrel
#

like you can see the players' injuries and criminal convictions if you want

hollow carbon
#

football meaning soccer

rancid sorrel
#

no "football" the real one 😉 doesn't put out as many stats

hollow carbon
#

ahh wb basketball

rancid sorrel
thick rapids
#

Guys one question, do you think an intelligent person with an economic background can be a good data scientist

rancid sorrel
hollow carbon
thick rapids
#

Because I'm doing a master's in data science and I have a lot of friends with a computer science background who look down on me

hollow carbon
#

idk pretty new to this api stuff

rancid sorrel
#

but generally the discipline you're looking at @hollow carbon is data scraping, there are 2 major types of this:
webscraping (http is an api)
actual APIs like REST and gRPC, which usually have a more defined model

#

if you want an api to play with, i recommend Guild Wars 2 or Eve Online, both have a market API you can mess with, documentation, and a community that knows what's going on

#

but this is more "software developer"

#

fyi this is what i am like when i write papers 😉

serene scaffold
rancid sorrel
#

the economist probably has more math

#

computer scientists tend to be software engineers with decent discrete math

thick rapids
serene scaffold
rancid sorrel
#

or just earn so much money you can buy them

thick rapids
rancid sorrel
#

do you know R and Python?

#

on top of your economics mathematics?

thick rapids
#

I have to do a project about spatial data in sql using pgadmin4 and postgis

thick rapids
rancid sorrel
#

congrats you're a data scientist

#

you just care about the answer

thick rapids
thick rapids
rancid sorrel
#

a lot of the time data scientists/engineers don't care about the answer

#

your next recommended netflix watch doesn't really matter that much

#

well not since everyone is using the same algorithm

#

btw anyone know a good book for programming/finance intro?

thick rapids
#

Check books about econometrics

#

I don't know why you need the book tho

rancid sorrel
#

i need to improve my time series programming, but i need domain knowledge about finance to make code that is, well, useful

thick rapids
#

Study stocks

#

You know finance is a bit difficult to start with

rancid sorrel
#

yeah that's more or less what i am doing
can i calculate a return and a monte carlo simulation, yes. do i know what they are for, no

#

can i do those in 3-4 different methods in 2-3 programming languages, yes. this is where i am: most of the skill and none of the reason
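
For context on what those two calculations are for, a minimal sketch under simple assumptions: a return is the fractional change in price from one period to the next, and a Monte Carlo simulation replays many possible futures to see the spread of outcomes rather than a single forecast. Resampling historical returns (a bootstrap) is just one simple way to generate the paths; the prices are made up.

```python
import random


def simple_returns(prices):
    """Fractional price change between consecutive periods."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def simulate_final_prices(start, returns, horizon, n_paths, seed=0):
    """Bootstrap Monte Carlo: resample past returns with replacement to build price paths."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        price = start
        for _ in range(horizon):
            price *= 1.0 + rng.choice(returns)
        finals.append(price)
    return finals


# Hypothetical closing prices.
prices = [100.0, 102.0, 99.96, 101.9592]
rets = simple_returns(prices)

finals = sorted(simulate_final_prices(prices[-1], rets, horizon=20, n_paths=1000))
# Rough 5th and 95th percentiles of the simulated price 20 periods ahead.
low, high = finals[50], finals[950]
print(rets, low, high)
```

The useful output is the range between `low` and `high`: it is a statement about risk, not a prediction of where the price will be.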

brave sand
thick rapids
rancid sorrel
#

I can already do that 🙂

thick rapids
rancid sorrel
#

I am going back to learn how to do it in Power BI as well 🙂

thick rapids
#

You may also try Tableau

wicked pine
#

guys

#

where can i paste code?

So i can get a link that leads to the code

rancid sorrel
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

pearl blaze
#

i started the khan academy linear algebra course of 144 videos some days ago for ml , are 144 vids worth it ?

#

i want to build strong foundation , so i will complete it anyway , but i just wanna ask will this thing help me in practical use cases too ?

#

And i don't wanna become research scientist sort of thing , my aim is ultimately one that is start my startup and grow .

#

i know i can call an api and integrate it into my web project as i also know a good amount of web dev , but i also wanna make something unique and do things like helping local / remote businesses at a very large scale , that's why i started ml from scratch . And i am doing this along with software dev .

thick rapids
#

You need time and practice on real world problems

#

I guess even 1444 videos aren't worth as much as a project

supple grove
ornate iris
#

This was right after reading about some performance metrics Microsoft was getting from a small language model and MCTS compared to O1

#

But I realized that TTT, MCTS and Surprise Minimization are each applied in slightly different aspects. So while they're each showing effectiveness independently, there's a lot of potential if they're used in tandem and used properly.

lilac island
#

Data analytics has a related Python course ?

real smelt
#

I want to learn Data science, where do i begin

errant bison
#

I want to create project using llms, nlp/ rag. Can anyone give idea how to go about it

gloomy river
past meteor
errant bison
green pilot
#

For people looking into data analytics with python I recommend just getting on youtube and looking things up. There are free data sets out there, and you can use pandas and other libraries that are commonly used. Right now I found a free data set of customers and am working on making graphs and heat maps with it

gentle sage
#

What separates a resume worthy project from a pet project? I've been practicing using RNN and LSTM on time series data however I don't know what makes it resume worthy.

serene scaffold
gentle sage
#

I have it annotated in a Jupyter notebook with comments on functionality inline

serene scaffold
gentle sage
#

I'm sorry I'm not the best with formal names. I did use markdown cells with latex but lowkey I just c+v and filled it out

#

Kinda just copied format from other notebooks I've seen

final cobalt
#

Hey people

#

Halp Q.Q

#

Somewhere in this code is a bug. Yesterday this code (a messier version of it) worked. Today it isn't working. Specifically, yesterday continued to learn much further before converging. Now it isn't getting far at all before getting caught in some kind of minimum

unkempt apex
pearl blaze
thick rapids
#

I hate linear algebra and I dont think you need it as much as statistics

#

start with stats

pearl blaze
#

205 videos of statistics

#

khan academy

#

worth it ?

#

and i am starting with linear algebra because its required in statistics too

#

as it provides essential tools for managing and analyzing data, particularly when dealing with multiple variables

#

It simplifies the handling of large datasets and is crucial for understanding and applying statistical concepts effectively

thick rapids
#

bruh yeah but don't spend much of your energy on math theory, go on with stats and python

fallow coyote
#

How do you balance learning the mathematics for ML and programming? I like learning the maths behind it but I don't want to spend so much time learning it and then not spend enough time programming.

pearl blaze
fickle shale
#

Learn and implement!

scarlet quest
#

Hey Guys

#

What's Going On

fallow coyote
fickle shale
#

Basic of linear algebra probability stats calculus takes time!

pearl blaze
#

if we learn 2% everyday , and implement atleast 30% of it , its more than enough .

thick rapids
#

yeah

pearl blaze
#

i would like to know , is any library

#

that connect

#

linear algebra with practicality , basically library that help me to apply daily linear algebra dose

thick rapids
#

there are no shortcuts my friend, unfortunately you have to connect things yourself

untold bloom
#

scipy.linalg, numpy.linalg
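
A quick sketch of applying course material with `numpy.linalg`, assuming NumPy is installed: solving a small linear system and fitting a line by least squares. The numbers are made up.

```python
import numpy as np

# Solve the system  2x + y = 3,  x + 3y = 5.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Least-squares fit of y = slope * t + intercept to four points.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
design = np.column_stack([t, np.ones_like(t)])  # columns: t and a constant term
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(x, coef)
```

The second half is linear regression written as a matrix problem, which is one of the more direct links between the linear algebra videos and ML practice.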

fallow coyote
#

I think I'm better off learning ML/AI when I've reached a sufficient understanding of mathematics whilst I'm at uni. Its such a complex topic that a uni environment would be much better for me to fully understand it

serene grail
fallow coyote
midnight rain
#

if u dont then u wont understand what you are doing

final cobalt
unkempt apex
#

!pastebin

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

final cobalt
#

Just in case anyone can see any obvious issues. I'm pretty sure whatever bug I'm looking for is in here. Sin switched with cos or something.

dense needle
final cobalt
#

In theory

#

But it's extremely helpful. Sorta the difference between using both arms or having one tied behind your back

dense needle
#

i am actually planning to work through Intro to Statistical Learning later in the spring/summer if anyone is interested in forming a group

final cobalt
#

As someone who self-taught myself programming, I know that no amount of studying in the world is a substitute for actually getting your hands on the thing, but, that the foundational knowledge will fill in gaps you can't really learn on your own

dense needle
#

or to start working through it. it's a big book lol

final cobalt
#

My advice is to simply push along both fronts, and you'll learn what you need to (or at least, come to know what you don't know) in due course

#

And in any case, it's a moot point. You're going to need Calculus period, full stop. It doesn't matter what you're going to end up doing, calculus is a must have. That alongside linear algebra is the basis of ML

#

So don't sweat it. Just focus on learning Calculus and when you're done, assuming you've dicked around with ML in the mean time, you'll have a pretty good idea of what's left to study once that's over

#

On that front, I do have a little advice

#

Professor Sexy Leonard is your friend. https://www.youtube.com/professorleonard

#

#1 best calculus teacher ever. You'll be fine

#

Second, do your calculus in a single straight shot. Do calc I, Calc II in the second term, Calc III in the summer (online if you have to) and then Calc IV in the second fall

#

And third: Subscribe to ChatGPT. It's $20 (USD) per month, but it'll be the best money you'll ever spend on an educational expense. You can give it a calculus problem and it'll break it down for you step by step with full English explanations. It'll also help with everything else in your educational life

#

And lastly, be a nuisance here. Ask every question you need to (after doing the reading of course, we aren't here to do your studying)

fallow coyote
dense needle
#

Maybe just try making stuff then until you bump into walls based on math limitations

#

Which is maybe something like "I can make this model but it performs poorly"?

#

Or "idk which model is appropriate here"

fallow coyote
#

Idk man. Ive gone back to square one, as I usually do. Its not like the maths is hard to learn. My mathematical ability has always been my strongest part. Even with complicated concepts I can power through and learn. I just want to do something interesting where I get stuck in and not do menial BS

dense needle
fallow coyote
#

In uni. Literally all my classes are maths (mechanics, electronics and your pure maths). Ill send you a link with my uni course: https://www.shu.ac.uk/courses/computing/bsc-honours-artificial-intelligence-and-robotics-with-foundation-year/full-time#course-modules

#

Its utter dogshit atm. I dropped out of uni over a year ago after doing biomed for three years. Was essentially forced to go back to uni by my parents. Dont want to do uni anymore or at least do uni the conventional way

dense needle
#

Seems like if you want to make some stuff just try to make it then and see where you land

fallow coyote
#

Yeah but with ml stuff, I'm very limited in what I can do if I don't understand what the code and values are telling me. I'm just going to switch to a different area of computing for now

#

Over a year of programming and im still barely a beginner.

analog gust
#

if anyone with knowledge in sklearn could read this i would be really thankful paimoncry
also if anyone is for some reason super interested in the topic please feel free to DM me since i dont think theres gonna be a simple answer for this ...

i'm not sure how to even start but, i was gonna program an AI for an NPC boss for my bachelor thesis, and my professor kind of pushed me into the direction of using scikit learn and he really wants me to use machine learning.
now, maybe my understanding of machine learning just isnt that great but, until now i've set up a python server which gets all data entries of the players movement, actions, his distance to the boss, attacks, dash directions, etc etc. and the idea was to use sklearn to evaluate the data and send its results back to the game, to change the bosses stats for the next round.
but i really don't have a lot of data that i'm sending, at least nothing compared to the sample sizes in the sklearn examples. and what is a graph of a DBscan gonna help me to evaluate the results? it seems i still need to evaluate them myself in the end? I've never worked with sklearn before , so maybe i just dont know its full potential.
I guess my question is basically, if your professor told you to use machine learning for this specific example, what would you do with the received data and how would that be helpful in the end?
ty in advance y'all are lovely sagehearts

serene scaffold
#

what is your professor's research domain?

analog gust
serene scaffold
analog gust
#

I don't think he works a lot with scikit himself, he only teaches advanced game development and not really ML

serene scaffold
#

scikit-learn decidedly doesn't support reinforcement learning

#

it says so in their FAQ

#

do not try to do this with sklearn, because it can't be done. show them the link I just sent you. do not let them convince you to continue this exercise in futility.

analog gust
#

so reinforcement learning would be the only option that would even make sense in this scenario?

serene scaffold
#

yes

#

if the other student got in trouble for not using the ML tools in sklearn for that assignment, it's because there are no ML tools in sklearn for that.

analog gust
#

thank you, that's kind of bad news but, at least I know not to keep looking for solutions now NamiPray

serene scaffold
#

you asked about this a few weeks ago, right?

analog gust
#

I think I opened a thread for it but at that time my issue was mostly still even connecting my project to a python environment to even access sklearn

serene scaffold
#

try this.

analog gust
#

I will look into it cNaruThank

serene scaffold
iron basalt
# analog gust so reinforcement learning would be the only option that would even make sense in...

You can also do a few other options. Such as behavioral cloning (copying a player playing through it), or genetic algorithms (evolution). Genetic algorithms take the most compute, but absolutely wreck any video game (far above human performance given enough training time). They are really good when applicable, and you can afford it (they can also be combined with other methods, such as reinforcement learning).

#

Reinforcement learning will work depending on the specific game / reward signal.

#

Some problems it will just never get there in any reasonable amount of training time (without you helping it out a lot (basically cheating)).

#

For animals IRL, they have a ton of evolved stuff that basically bootstraps them, so their reinforcement learning only needs to tweak things, it does not need to start from scratch (where the search space is so massive you often don't get anywhere).

#

(e.g. putting everything in your mouth when young is an evolved data collection behavior which helps the reinforcement learning, it gives it a bunch of data it would not have gotten otherwise through completely random actions (starting from random init))

#

(behavioral cloning is another way of getting a starting point)
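
As a rough illustration of the genetic-algorithm option mentioned above, a minimal sketch in plain Python. Each genome is a hypothetical list of boss stats in the 0 to 1 range, and `fitness` is a made-up stand-in; in a real game the score would come from actually playing rounds, which is where the compute cost goes.

```python
import random


def fitness(genome):
    """Hypothetical score: higher when the stats are close to a target profile."""
    target = [0.6, 0.3, 0.8]
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def evolve(pop_size=30, genome_len=3, generations=60, mutation_std=0.1, seed=0):
    """Keep the best third each generation and refill with mutated crossovers."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 3]  # survivors are kept unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]            # uniform crossover
            child = [g + rng.gauss(0.0, mutation_std) for g in child]   # mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because survivors are carried over unchanged, the best score never gets worse between generations; the open design question for a game is what the fitness function should reward.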

analog gust
# iron basalt You can also do a few other options. Such as behavioral cloning (copying a playe...

I see yea... well right now my game is kind of made to be probably less than 3 minutes of playtime in which the player either kills the boss, gets killed, or the timer runs out. after that, the next round starts obviously. in theory, the boss should change according to not only the play style, i.e. aggressive/defensive etc., but of course also the skill level. I guess it would make sense that I try to feed it test samples of what aggressive/defensive behavior would look like in the data? I'd hate for the test player to have to play for hours to see any changes floof_cry which method makes sense for this?