#data-science-and-ml


pseudo loom
#

C++

soft ermine
#

ye

pseudo loom
#

Cool. So

#

Hm js?

soft ermine
#

ye

pseudo loom
#

...

#

U know everything

soft ermine
#

?

pseudo loom
#

Wow

soft ermine
#

not everything

pseudo loom
#

Mostly

#

So can u actually make a full stacked ai model

soft ermine
#

full stacked? wat that mean for ai?

pseudo loom
#

Ye

soft ermine
#

no like. wdym full stacked. like just a working neural network?

pseudo loom
#

Kinda.

soft ermine
#

then ye i can make any neural net i want and train it

pseudo loom
#

U can actually build stuff and become rich perhaps

soft ermine
#

neural nets r not that complicated

pseudo loom
#

Like ai nowadays is going in a big boom

soft ermine
#

its gonna plateau soon

pseudo loom
#

Rly?

soft ermine
#

ye

pseudo loom
#

🙁

#

I felt like in the future the world might revolve around ai

soft ermine
#

the models havent changed mathematically for decades. the experts r going in to big companies which are run by old ppl so we are having no technical advancements for it.

pseudo loom
#

It actually makes sense

#

Sad

#

Who cares. Python is a great language though ain't it?

soft ermine
#

ai is not intelligent. the biggest models dont even think or reason, they just predict the next word in a sentence based on probability

pseudo loom
#

Hm

#

Can't predict what would happen still. I hope it doesn't end up like that

soft ermine
#

ye u dont need to predict anything. its just how limits work. ai will plateau soon because the training pool will be less effective over time and the predicting ability will stagnate

pseudo loom
#

o-0

soft ermine
#

ye so the next step is either big ai companies financing research into new models or throwing endless money for very little gain

pseudo loom
#

Many people r learning for the sole reason of making better ai models. Like my friends r learning python for the same reason

#

Yo just wait a min I will come back

soft ermine
#

ye they will all unanimously fall into the trap of doing exactly what they were taught to do, then if they make it to a company the company will tell them to copy the big companies and they will get stuck making another chatgpt. which is just a flawed concept

pseudo loom
#

Its just like that

#

When people in 2017 thought ai was gonna plateau. 💥 Chatgpt claude and gemini

soft ermine
#

no

#

they just got financed is all

pseudo loom
#

Its just like that. Limits can come in some fields. Who knows perhaps we get an AI that thinks for itself instead of predictions

#

U can't rly predict

soft ermine
#

hundreds of millions of dollars and the ais cant even make reliable video slop yet...

pseudo loom
#

I understand limits have occurred in some fields, but some other fields still r in the beginning phase

soft ermine
#

so idk what you believe, but its very obvious that these ais are going nowhere

pseudo loom
#

Let's see 🙂

#

Kk bbye u seem cool

soft ermine
#

k bye

pseudo loom
#

Gotta have breakfast

pseudo loom
#

I am back

soft ermine
proper urchin
#

so i made this ai

#

import random
from random import *

populationamount = 100
mutationrate = 0.15
goalnum = randint(1, 1000)/10
print(f"Goal: {goalnum}")
chars = "0123456789+-*/"
operators = "+-*/"
randpopamount = 10
addcharrate = 0.05
delcharrate = 0.05

def randexpr(length = 12):
    return "".join(choice(chars) for x in range(length))

def safeeval(expr):
    try:
        val = eval(expr)
        # reject huge values and anything below 1e-9 (including zero and negatives)
        if val > 1e9 or val < 1e-9:
            return float("inf")
        else:
            return val
    except (Exception, SyntaxWarning):
        return float("inf")

def mutateexpr(expr):
    expr = list(expr)
    newexpr = []
    for char in expr:
        if random() < mutationrate:
            # digits are twice as likely as operators
            newexpr.append(choices(chars, weights = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4], k = 1)[0])
        else:
            newexpr.append(char)

    if random() < addcharrate and len(expr) < 20:
        pos = randrange(len(newexpr))
        newexpr.insert(pos, choice(chars))

    if random() < delcharrate and len(expr) > 4:
        pos = randrange(len(newexpr))
        del newexpr[pos]
    return "".join(newexpr)

def crossover(a, b):
    cut = randrange(len(a))
    return a[:cut] + b[cut:]

def tourney(pop, k = 5):
    group = [choice(pop) for x in range(k)]
    return min(group, key = getfitness)

fitnesscache = {}
badpats = [
    ([f"{i}/{i}" for i in range(1, 10)], 50),
    ([str(goalnum + i) for i in range(-9, 9)], 100),
    (["0+", "+0", "-0", "0/", "0*", "0", "0*"], 100),
    (["1*", "*1", "/1"], 75),
    (["++", "+-", "-+", "/+", "/-"], 50),
    (["//"], 25)
]

def getfitness(expr):
    if expr in fitnesscache:
        return fitnesscache[expr]
    diff = abs(safeeval(expr) - goalnum)

#

    #punishments
    for outerpat in badpats:
        for innerpat in outerpat[0]:
            if innerpat in expr:
                diff += outerpat[1]
                break

    fitnesscache[expr] = diff
    return diff

def properformat(expr):
    newexpr = ""
    for char in expr:
        if char in operators:
            newexpr += f" {char} "
        else:
            newexpr += char
    return newexpr

population = [randexpr() for _ in range(populationamount)]
generation = 0

while True:
    generation += 1
    best = min(population, key = getfitness)
    if safeeval(best) == goalnum:
        print(f"Achieved target number on generation {generation}")
        print(f"{properformat(best)} = {safeeval(best)}")
        break
    else:
        print(f"Best of generation {generation}: {best} = {safeeval(best)}")
    newpopulation = []
    for _ in range(populationamount - randpopamount):
        parent1 = tourney(population)
        parent2 = tourney(population)
        child = mutateexpr(crossover(parent1, parent2))
        newpopulation.append(child)
    for _ in range(randpopamount):
        newpopulation.append(randexpr())
    population = newpopulation

#

ugh dc is formatting my stuff ;-;

soft ermine
#

um

#

thats not an ai

pseudo loom
#

thats cool bacon

#

but ya its not technically an ai model, but a random guessing program

#

its still cool

#

btw u have commited a syntax error at line 86 i think. in is invalid syntax

untold bloom
#

that's far from a random guessing program, it's indeed an intelligent one

velvet ice
#

Hi, I'm trying to make a predicting model which uses the opening price of a stock and returns the volume (how many people buy it.)

This image above is the relation between Opening price (X-axis) and volume (Y-Axis). I was wondering what regression model I should use in order to get the most accurate result.

waxen kindle
#

seems non-correlated, at least below ~500

arctic smelt
vale field
#

Quick question, I'm trying to interpret my results for cosine similarity. I know that it is a measure how similar e.g. a document is to other documents. Would most similar pairs be values that are greater than 0.5 and values that are 1?

arctic smelt
#

A value of 1 is the most similar

soft ermine
#

hoi

arctic smelt
#

Anything less than that depends entirely on your specific use case and data
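
A minimal sketch of how the score behaves, assuming the documents have already been turned into plain numeric vectors. The vectors and the `cosine_similarity` helper are made up for illustration. Vectors pointing the same way score 1 and vectors with nothing in common score 0; anything in between is only meaningful relative to the other pairs in the same data, so ranking pairs is usually safer than a fixed 0.5 cut-off.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical term-count vectors for three documents
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # same direction as doc_a, just longer
doc_c = [0.0, 0.0, 3.0]  # shares no terms with doc_a

same_direction = cosine_similarity(doc_a, doc_b)  # very close to 1
no_overlap = cosine_similarity(doc_a, doc_c)      # exactly 0
print(same_direction, no_overlap)
```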

soft ermine
#

noice

opaque condor
#

Does anyone have a neural mat that can generate useful structures?

serene scaffold
opaque condor
#

It was a typo and had to keep my eyes on the road for a second or two

waxen kindle
#

please ask again after driving

serene scaffold
opaque condor
#

I turned onto a side street to ask the question

waxen kindle
#

doesn't matter, please drive safely and will talk about that later

serene scaffold
#

There's no way we can have a productive conversation ... While you're driving

soft ermine
woven prairie
#

Has anyone worked with langgraph building an agent.

grand minnow
supple spruce
#

I have question.
for AI learning, is python essential?

grand minnow
supple spruce
#

but python also okay?

grand minnow
opaque condor
#

yes of course

supple spruce
#

ohk

#

thanks

red flint
#

Can anyone send a whole roadmap of AI? I mean everything from scikit-learn to TensorFlow to the latest LLM concepts

pseudo loom
#

Try finding some course on Coursera or Udemy. They can be really helpful. I suggest Andrew ng's full ai with python course in Coursera

opaque condor
#

Is there a book for py torch for beginners?

vale field
#

Im trying to learn more about BERT. I wanna ask which model should i use for generating BERT embeddings for sentences all-MiniLM-L6-v2 or all-mpnet-base-v2 or something else?

pseudo loom
#

According to me you should use all-mpnet-base-v2 if accuracy matters most: it is the higher-quality of the two, but it is larger and slower. all-MiniLM-L6-v2 is the usual recommendation when speed and memory matter, and it is still quite accurate for its size. bge-en-icl is noted as having top performance on various benchmarks, but with a much larger size.

#

Building LLMs with pytorch: step by step by Anand Trivedi is a good book for pytorch. You could find more books on Amazon

charred light
#

Ollama with RAG is still non-deterministic even with temperature set to 0 right?

lime grove
#

having some fun with the airline passengers prediction data set, and I got this with a naive LSTM implementation in pytorch

#

1 to 1 prediction, 0.67 train test split, 2000 epochs.

#

However, a single MinMax normalization line results in this

#

same hyperparameters

#

Now, the thing that I am finding interesting is this: the green line, which represents the test set, is nearly identical to the original dataset. The MinMax range was chosen to be (0, 0.1)

#

It is not as good if I normalize it as (0, 1), which is commonly done. So the absolute magnitude of the training errors matter in the end. The error is directly dependent on the absolute scale of the y-axis
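
A small sketch of the scale dependence, assuming a plain min-max transform and mean squared error. The helper names `make_minmax` and `mse` and the prediction values are made up for illustration. The same predictions give an MSE 100 times smaller in the (0, 0.1) range than in the (0, 1) range, so loss values (and the gradients an optimizer sees at a fixed learning rate) are not comparable across scalings. This does not by itself explain why one range trained better, only that the numbers depend on the range.

```python
def make_minmax(values, lo, hi):
    """Return a function that maps the range of `values` onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    scale = (hi - lo) / (vmax - vmin)
    return lambda v: lo + (v - vmin) * scale

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Illustrative target values and hypothetical model outputs in original units
passengers = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]
predicted = [110.0, 120.0, 130.0, 131.0, 119.0, 137.0]

to_unit = make_minmax(passengers, 0.0, 1.0)   # the common (0, 1) range
to_small = make_minmax(passengers, 0.0, 0.1)  # the (0, 0.1) range from the chat

mse_raw = mse(passengers, predicted)
mse_unit = mse([to_unit(v) for v in passengers], [to_unit(v) for v in predicted])
mse_small = mse([to_small(v) for v in passengers], [to_small(v) for v in predicted])

# Shrinking the range by 10x shrinks the squared error by 10**2
print(mse_raw, mse_unit, mse_small, mse_unit / mse_small)
```

To compare runs fairly, map the predictions back to the original units before computing the error.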

lime grove
#

and yes, did this with Jupyter, the hated notebook.

rich moth
#

What if you could turn ML artifacts into proof-carrying objects?

#

Like lets say, for any given dataset, model or run a small, deterministic and verifiable record gets produced that says, this is what i am, this is how i was computed and here how you can independently verify that I'm not lying.

torpid quartz
#

Running a model on my Mac even with metal takes an extremely long time (13 sec or more), but with the webgpu demo on chrome, exact model it has sub-second latency… anyone know what might be causing this?

#

It shouldn’t take this long as it’s meant to be very fast even on iOS

vast pond
#

this is just a silly question but whats happening here? why are those two not interleaving

wooden sail
#

make_moons generates a dataset for classification, so what you see are the two classes. since noise=0, the two classes are clearly separated

vale field
pseudo loom
#

Hmm

mellow vector
#

So I'm interpolating values for NaN's and I'm not sure if this instructor is just saying, "Hey, you can deal with NaNs this way!" or "Hey, you can synthesize data for training this way!" Is it unheard of to train on available data to generate missing values to then train on?

agile cobalt
# mellow vector So I'm interpolating values for NaN's and I'm not sure if this instructor is jus...

not unheard of, but there are a bunch of downsides like

  • can reinforce biases
  • increase the risk of data drift
  • many real datasets are made by concatenating different datasets ; the distribution in each of them would be different, so you may need to train a model per data source, or give up if a given source does not contains that field for any records at all

oftentimes it's better to just let the model figure it out instead of layering a model on top of another model

mellow vector
#

man, the precision that make up stats still perplexes me sometimes... I don't have a formal stats education and it would be so easy to look at two distributions and just say "eh, close enough"

lime grove
#

By interpolation do you actually mean imputation?

#

I feel that using the word "interpolation" leads to a mental dead end. If you start thinking of filling gaps in the data as "imputation" you'll find a rich and extremely challenging literature on this theme

#

Because, for example, preserving the statistical properties of the data set is important, and this isn't a solved problem, and is an area of active research. For a time series, for instance, there are techniques that consist of identifying distributions characteristic to the region around the gap, and then taking a random sample.
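
A rough sketch of that last idea, heavily simplified: instead of fitting a distribution around the gap, it just draws one of the observed values within a small window. The function name `impute_local_sample`, the window size, and the series are all assumptions for illustration, not a recommended method.

```python
import random

def impute_local_sample(series, window=3, seed=0):
    """Fill None entries by sampling from observed values within `window` steps."""
    rng = random.Random(seed)
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            lo = max(0, i - window)
            hi = min(len(series), i + window + 1)
            neighbours = [x for x in series[lo:hi] if x is not None]
            if neighbours:  # leave the gap if nothing was observed nearby
                filled[i] = rng.choice(neighbours)
    return filled

# Hypothetical time series with three missing points
series = [1.0, 1.2, None, 0.9, 1.1, None, None, 1.3]
filled = impute_local_sample(series)
print(filled)
```

Unlike linear interpolation, sampling keeps some of the local spread of the data, which is the statistical property the imputation literature cares about.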

mellow vector
#

Yeah I'm somewhat familiar with the different methods of filling in the gaps, I was just a bit surprised to have it presented as it was. I wouldn't intuitively consider training a model on synthetic data.

lime grove
#

I don't see an issue with synthetic data

#

The question is really about the properties that synthetic data has

#

As long as the model that generated that synthetic data has some kind of correspondence with what you're trying to ultimately model

#

Take the stock market: a good question to ask about the model that generated the synthetic data set is how well it represents the actual market. Probably not very well, but I hope you get the idea I'm trying to present

#

Ideally, synthetic data inserted into real data should be invisible insofar as the final result is concerned

mellow vector
#

It's just strange to treat a prediction like a feature for the first time I guess ¯_(ツ)_/¯

lime grove
#

what do you mean - a prediction like a feature?

vale field
#

Quick question, i'm supposed to be making heatmap of my document similarity results and when I was calculating the similarity results, I used 2000 of the documents. When I was trying to make heatmap e.g. for 50 documents, it was not interpretable whatsoever. I just wanna ask, does the number of documents when making visualisation matters or not? I'm not sure what people do when they need to make visualisations and they are working with a lot of data. I've been using top N documents so far but I don't know why I still feel uncomfortable doing this way.

torpid quartz
unkempt apex
torpid quartz
unkempt apex
torpid quartz
unkempt apex
#

lol

lime grove
lime grove
#

the usual question with clustering, which is: how much data do you need to resolve a cluster in N dimensions, as a function of N?

obsidian talon
obsidian talon
torpid quartz
#

Running llms and vlms on my Mac are very slow even with metal, even in webgpu it’s getting sub second response times but trying to run it manually it takes over 10 seconds - does anyone know why this might be?

#

<@&831776746206265384>

#

This guy is trolling in general as well

zenith nova
#

!mute 1435917124303589427

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @glad vessel until <t:1762854500:f> (1 hour).

sly viper
wheat snow
#

markov decision processes: the sum of a row within the transition probability matrix must be either 1 or 0 correct? when iterating over my transition matrix for a shape(22,22,4) matrix (its about the grid given in the assignment) i get some odd results:

#

T[s', s, action_space]

sum of T[0, 0, :]: 2.0
sum of T[0, 1, :]: 1.0
sum of T[0, 5, :]: 0.9999999999999999
sum of T[1, 0, :]: 0.9999999999999999
sum of T[1, 1, :]: 1.9999999999999998
sum of T[1, 2, :]: 1.0
sum of T[2, 1, :]: 0.9999999999999999

#

the 0.99 is due to machine precision error i assume but i am more worried that i get 2.0 as the prob sum to transition from State 0 to state 0 given to try all a from the action space

#

the ruleset was this

wheat snow
#

@buoyant slate

buoyant slate
buoyant slate
# wheat snow T[s', s, action_space] sum of T[0, 0, :]: 2.0 sum of T[0, 1, :]: 1.0 sum of T[0...

then it makes sense for it to be 2, while being in 0, 0 you only stay in place if you try to move north or west
then you can choose north or west, succeed and stay in place with probability p each or fail and go to west or north respectively with probability (1-p)/3
or you can go south or east, each has probability p of succeeding and probability (2 - 2p)/3 to instead go north or west and stay in place
giving you (6-6p)/3 + 2p = 2 - 2p + 2p = 2

#

like i said in math server the values in action rows don't have to make a distribution, the procedure is that you first choose an action, take the row corresponding to your chosen action and current state and this will be the distribution over next states (and hence has to sum to 1)
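
The arithmetic above written out for the corner state 0, assuming (as in the explanation) that a failed move goes to each of the other three directions with probability (1-p)/3 and that bumping into a wall leaves the agent in place:

```latex
\begin{aligned}
T[0, 0, \text{north}] &= p + \tfrac{1-p}{3} \\
T[0, 0, \text{west}]  &= p + \tfrac{1-p}{3} \\
T[0, 0, \text{south}] &= \tfrac{2(1-p)}{3} \\
T[0, 0, \text{east}]  &= \tfrac{2(1-p)}{3} \\
\sum_{a} T[0, 0, a]   &= 2p + \tfrac{6(1-p)}{3} = 2p + 2 - 2p = 2
\end{aligned}
```

With p = 0.7 the four entries are 0.8, 0.8, 0.2 and 0.2. Each one is a valid probability on its own; their sum is 2 only because it adds up four different distributions, one per action.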

wheat snow
#

which results in also staying

#

what does this mean now. that there is a 2 chance to stay in s0 when starting from s0 when we sum all actions. thats not really working in my brain, cause i knew probabilities being between 0.0 and 1.0

#

OHHH but ofc, the agent can only pick one action at a time

#

so i should rather see it like the agent has an x chance to stay in s0 when starting from s0 when picking one particular action

buoyant slate
wheat snow
buoyant slate
#

as in, the individual cells are probabilities, but they are probabilities over next states, not over actions

wheat snow
#
print(grid.T.sum(axis=0))

[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]] works out

wheat snow
#

wait holup what does axis=0 in a rank 22 tensor mean?

buoyant slate
#

side note- rank means number of dimensions in a tensor so it's rank 3 tensor

wheat snow
#

3 dimenison ye mb

buoyant slate
wheat snow
buoyant slate
#

that will result in the same cuboid tho

wheat snow
#

i could reshape and check

#

printout of the T

#

p=0.7 in this example

buoyant slate
#

yep, now imagine taking each of these matrices and stacking them in a separate dimension

#

now fixing one dimension will give you a slice, fixing two of them will give you a "line" in this cuboid, fixing three of them will give you one element
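
A tiny sketch of that picture using plain nested lists. The 2 x 3 x 4 `cube` and its values are made up so that each entry shows its own indices:

```python
# Hypothetical 2 x 3 x 4 cuboid: cube[i][j][k] == 100*i + 10*j + k
cube = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]

matrix_slice = cube[1]     # fix one index  -> a 3 x 4 matrix (a slice)
line = cube[1][2]          # fix two indices -> a list of 4 values (a "line")
element = cube[1][2][3]    # fix all three  -> a single number

print(len(matrix_slice), len(matrix_slice[0]))
print(line)
print(element)
```

With a NumPy array the same three lookups would be written `T[1]`, `T[1, 2]` and `T[1, 2, 3]`.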

wheat snow
# wheat snow

i mean this shape makes sense for me. the first chunk (layer) is an s'. this s' has 22 rows each representing s0-s21. the 4 columns represent the possible actions to take and the vals the probability to move from that state s (which is a row) with an action to the s' given by the entire first layer

#

and the 2nd layer is s1 then again with 22 rows representing the starter states

wheat snow
buoyant slate
#

then summing over axis=0 will eliminate the 0th axis (corresponding to "next state") so it will give you probability to moving to "any of the next states" for a given state-action pair
which is 1, because you always have to move to some next state
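
A minimal sketch of both sums on a made-up 2 x 2 grid (4 states, 4 actions), using plain nested lists instead of NumPy. The grid, the `NEIGHBOURS` table and `build_transition` are assumptions for illustration; the filling loop mirrors the T[s', s, a] convention used in this conversation.

```python
# Hypothetical 2x2 grid, states numbered row by row:
#   0 1
#   2 3
# Actions: 0=north, 1=east, 2=south, 3=west. Walking into a wall keeps the agent in place.
NEIGHBOURS = [
    [0, 1, 2, 0],  # from state 0
    [1, 1, 3, 0],  # from state 1
    [0, 3, 2, 2],  # from state 2
    [1, 3, 3, 2],  # from state 3
]

def build_transition(neighbours, p_success):
    """Return T as nested lists with T[s_next][s][a] = P(s_next | s, a)."""
    n_states = len(neighbours)
    n_actions = len(neighbours[0])
    T = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for a_result in range(n_actions):
                # chosen direction with p_success, each other direction shares the rest
                p = p_success if a_result == a else (1 - p_success) / (n_actions - 1)
                T[neighbours[s][a_result]][s][a] += p
    return T

T = build_transition(NEIGHBOURS, 0.7)
n_states = len(NEIGHBOURS)

# Sum over next states (axis 0): one value per (state, action) pair, each about 1
over_next_states = [
    [sum(T[s_next][s][a] for s_next in range(n_states)) for a in range(4)]
    for s in range(n_states)
]

# Sum over actions for a fixed (s'=0, s=0): not a distribution, so it need not be 1
over_actions = sum(T[0][0])

print(over_next_states)
print(over_actions)
```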

wheat snow
#

also i coded my solution with adding this comment:

Assumption: rewards get given to the agent upon departing from a state, so arriving from S16 -> S21: +0 and going from S21 -> S16: +10

this is what we got taught in the lesson about MRPs: rewards given upon departing

#

but i think it doesnt make too much sense for this?

#

also states move from left to right

#

this 1 between 10 and 11 is supposed to be empty

buoyant slate
wheat snow
#

also does absorbing state mean that the agent can't take any further action once reaching a so-called "absorbing state"?

buoyant slate
#

what is your algorithm for computing the policy tho

wheat snow
#

thats for next week

#

"An absorbing state is a state that, once entered, cannot be left. A (finite) drunkard's walk is an example of an absorbing Markov chain. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space." wikipedia

buoyant slate
wheat snow
#

so the idea cannot be for a successful policy/agent to get rewards upon departing from a state, because absorbing states trap us.

buoyant slate
#

also algorithms usually do offline learning - they don't change policy during simulating the agent, only after it has finished a "simulation episode"

wheat snow
buoyant slate
#

if that's not the case then yea it seems wrong at the first glance

wheat snow
#

anyway. can u have a look over the code i wrote to achieve my matrix @buoyant slate ?

buoyant slate
wheat snow
# buoyant slate not right now cuz i have to leave in 10 minutes but i can later

k, i leave it here fo ya to check if u got time

we have a big ah class GridWorld which contains all the info and a lot of helper functions:

```py
def __init__(self,
               shape = (5,5),
               prob_success = 0.7,
               obstacle_locs = [(1,1),(2,1),(2,3)],
               absorbing_locs = [(4,0),(4,1),(4,2),(4,3),(4,4)],
               absorbing_rewards = [-10, -10, -10, -10, 10]
              ):
    """
    GridWorld initialisation
    input:
      - shape {tuple} -- GridWorld shape (height, width)
      - prob_success {float} -- probability of success when taking an action, used to fill the transition matrix
      - obstacle_locs {list of tuples} -- location of all obstacles of the grid: [(obstacle 1), (obstacle 2), ...]
      - absorbing_locs {list of tuples} -- location of all absorbing states of the grid: [(state 1), (state 2), ...]
      - absorbing_rewards {list of float} -- reward corresponding to each absorbing state of the grid: [reward 1, reward 2, ...]
    output: /
    """
```
helper functions:

a neighbour matrix 22x4 (bad name) containing what happens when taking a direction a from a state and where you end up:
#

and the function i wrote

#
def fill_in_transition(self):
    """
    Compute the transition matrix of the grid
    input: /
    output: T {np.array} -- the transition matrix of the grid
    """
    # Empty tensor of dimension S*S*A: 22 stacks of 22 x 4 matrices.
    # Each stack is exactly one s_prime; each s_prime holds the 22 states it could originate from. T[s', s, a]
    T = np.zeros((self.state_size, self.state_size, self.action_size))

    a_size = self.action_size
    state_size = self.state_size
    neighbours = self.neighbours
    prob_success = self.prob_success

    # T[s_prime, s, action] --> similar to P(s_prime | s, a)
    # e.g. T[21, 17, 2]: the probability of going from state 17 to state 21 when choosing west
    for a in range(a_size):  # the action the agent picks
        for s in range(state_size):
            for a_result in range(a_size):  # the direction the agent actually ends up moving in
                s_prime = neighbours[s][a_result]

                if a_result == a:  # success case
                    p = prob_success
                else:
                    p = (1 - prob_success) / 3

                T[int(s_prime), s, a] += p  # += because several directions can lead to the same s_prime (walls)
    return T
#

i think helper functions arent used here yet

#

they are for the reward matrix

wheat snow
thick heart
#

what are some free ml bootcamps or courses?

wheat snow
torpid quartz
#

Help wud be appreciated thanksss

thick heart
lime grove
#

there's no reason to assume that those courses are better than a comparable rando Udemy / Coursera course

#

you kind of have to tailor the course you choose to your goals & skill sets.

#

I never recommend learning a data science topic in a college degree style, which is bottom up. In other words, before you learn the topic itself, first we must take a few semesters of adjacent courses, etc.

#

you should do it top-down, which is you focus on the topic itself, whatever it might be, and you pick up the needed knowledge along the way. Tons of options for this, so choosing the right one for you depends on who you are

#

my experience with online MIT / Harvard / etc courses is that they tend to be bottom-up in how they approach the topic. They are rigorous, demanding, but ultimately a waste of time.

wheat snow
wheat snow
#

@buoyant slate got some.time now?

smoky arrow
lime grove
#

and it takes a fair amount of mathematical literacy to do that. Reading journal articles should be something you can do on a regular basis

twilit topaz
#

You guys been using polars more or still pandas

serene scaffold
twilit topaz
#

The pandas query function is kinda tough for me to follow vs what polars does with filter/select

#

Pandas has the better transpose function doing it with just T

spring field
#

speed's not that relevant for most use cases probably

#

that said... someonesaidrust

twilit topaz
#

Only thing in polars I can't do properly is the transpose function

#

Polars transpose is more complicated

agile cobalt
spring field
twilit topaz
twilit topaz
spring field
#

I know it's faster, I just don't think the speed difference is meaningful in the vast majority of use cases

#

but of course, that's a lame excuse

twilit topaz
#

Like I had some code for Treasury data I had for time series and I couldn't transpose it to make yield curve using polars but the pandas T worked fine

#

Given if you use pandas to datetime

#

The documentation for polars on this was confusing

agile cobalt
agile cobalt
twilit topaz
#
```py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import yfinance as yf
import scipy as sp

Treasury = pd.read_csv('daily-treasury-rates.csv')
Treasury['Date'] = pd.to_datetime(Treasury['Date'])
Treasury = Treasury.set_index('Date')
Treasury1 = Treasury.T  # transpose: dates become columns, maturities become rows
plt.plot(Treasury1['2025-10-10 00:00:00'], color='black', label='10/10/2025 Yield Curve')
plt.plot(Treasury1['2025-08-28 00:00:00'], color='red', label='8/28/2025 Yield Curve')
```
agile cobalt
twilit topaz
#

This was in pandas but how to convert to polars is difficult

#

The data I used came from the US treasury website

#

The result I would get this

#

from pandas but polars i cant

#

i know i can convert the pd dataframe to polars but i cant transpose cleanly like pandas

agile cobalt
# twilit topaz

I think that df.unpivot(index='Date') works for that? ```pycon

import plotly.express as px
import polars as pl
df = pl.read_csv('daily-treasury-rates.csv')
t = df.unpivot(index='Date')
dates = ['10/10/2025', '01/02/2025']
test = t.filter(pl.col('Date').is_in(dates))
fig = px.line(test, x='variable', y='value', color='Date')
fig.write_html('test.html')
```

granted, you cannot index it later - polars explicitly avoids having an index like pandas's
twilit topaz
#

Unorthodox but works

somber willow
#
```py
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df.head(20)

# Keep the label out of the features: df still contains the 'target' column,
# and training on it would leak the answer into the model
X = df.drop(columns='target')
X_train, X_test, y_train, y_test = train_test_split(X, data.target, test_size=0.1, random_state=42)
model = LogisticRegression(max_iter=100000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_pred

model_accuracy = accuracy_score(y_test, y_pred)
model_classification_report = classification_report(y_test, y_pred)
model_confusion_matrix = confusion_matrix(y_test, y_pred)
```
#

is there anything to improve or anything as a recommendation, I am a beginnner to data science

#

it's a breast cancer detection model

obsidian talon
#

Technically a lot to improve, but what do you feel would help you most to learn how to do?

somber willow
hoary elbow
#

Hello,

I am having issues with setting GPU/CPU in order to train my ResNet model. My Jupyter notebook is currently set up with a GPU. When I try to load my dataset from a directory, I need to explicitly tell tensorflow to perform that operation on the CPU by doing

with tf.device('/CPU:0'):
    # code to load datasets here

When I want to create my resnet model i currently do;

with tf.device('/CPU:0'):    
    pretrained_model = tf.keras.applications.ResNet50V2(
        include_top = False,
        input_shape = (img_height, img_height, 3),
        weights = 'imagenet',
    )

    # other code here

    output = Dense(1, activation="sigmoid")(x)

    model = Model(inputs=pretrained_model.input, outputs=output)

Finally, I want to compile and fit the model;

# compile code here

epochs = 50

with tf.device('/GPU:0'):
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[checkpoint_cb]
    )

However when I try this, I get this error;

InvalidArgumentError: Graph execution error:

Detected at node StatefulPartitionedCall defined at (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
Trying to access resource conv1_conv/kernel/911 (defined @ /opt/conda/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py:38) located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
[[{{node StatefulPartitionedCall}}]] [Op:__inference_multi_step_on_iterator_38084]

How can this be solved? When I set the model creation to a GPU that code will then fail. So it becomes a big problem.

somber willow
true flicker
#

Hey everyone! I’m preparing for Data Engineer roles (Python, SQL, ADF, ETL) for the next 6months — anyone up for consistent learning and project collaboration?

buoyant slate
# wheat snow @buoyant slate got some.time now?

sorry i was quite busy yesterday and i forgot
the code looks alright i think, you iterate over action-state pairs and the innermost loop looks at all neighbours of the state and fills out probabilities in sliced matrix corresponding to s-a pair
although you might be misunderstanding what the tensor means - it is keeping transition probabilities, so probabilities that you will transition to a given state after you've chosen action in some state
so the innermost loop shouldn't in general loop over all possible actions, but rather over all possible states (or only the ones that are reachable from current state, varies from env to env) - in this case the actions correspond to possible next states but it doesn't have to be the case

wheat snow
buoyant slate
# wheat snow hmm i am not quite getting the 2nd part. why wouldnt we want to have the state l...

okay so the tensor you're filling out tells you this - "If we are in state s, we take action a, then for any state s', T[s', s, a] is the probability we end up in this state in the next step"
And what you are doing is "If we are in state s, we take action a, then for any action a' we fill out T with probabilities so that T[s', s, a] is probability of ending up at s' where s' is the state associated with a' "
the problem is - the possible states s' you can end up in after action a from state s do not have to correspond to actions, they are kinda independent
You assumed that each action corresponds to some direction in which we are trying to move - this is correct since we have a grid, but what if we had only 2 actions - one has 1/3 probability to either go north, west or east and second 1/3 probability of going south, east or west - then your innermost loop will go over two actions but you have 3 probabilities to add - since you can move in 3 different directions after any action

wheat snow
# buoyant slate okay so the tensor you're filling out tells you this - "If we are in state s, we...

thanks a lot. ima go over this with my code later, but i assume the correction would be to loop over action space 2x. then identifying and crafting the probabilities (that represent actually attempting to move in that direction a we are currently looping over):

if a_result==a: #sucsess case
   p=prob_success
else: #fail case
   p=(1-prob_success) /3

followed by looping over our states and within that loop determine s' and the val for T to fill in by:

s_prime= neighbours[s][a_result] #a_result is the inner a loop
T[int(s_prime), s, a] +=p
buoyant slate
# wheat snow thanks a lot. ima go over this with my code later, but i assume the correction w...

i mean your code is correct, I was just adding context since looping over actions second time seemed a bit odd to me
the usual way I did this is having some function that returns possible next states for state, action pair, e.g.

def possible_next_states(action, state):
    return neighbours[state]

Or just one that returns pairs (next_state, transition probability)
However you don't really need it in this case, just as a reference for future
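
A possible shape for that second variant, sketched on a made-up 2 x 2 grid. The `NEIGHBOURS` table, `transitions` and `build_T` are hypothetical names, not part of the assignment code. The helper returns next-state/probability pairs for one (state, action) pair, so the loop that fills T runs over next states rather than over actions a second time.

```python
from collections import defaultdict

# Hypothetical 2x2 grid, states numbered row by row; columns are north, east, south, west.
# Walking into a wall keeps the agent in place.
NEIGHBOURS = [
    [0, 1, 2, 0],
    [1, 1, 3, 0],
    [0, 3, 2, 2],
    [1, 3, 3, 2],
]

def transitions(state, action, neighbours, p_success=0.7):
    """Return {next_state: probability} for one (state, action) pair."""
    n_actions = len(neighbours[state])
    probs = defaultdict(float)
    for direction in range(n_actions):
        p = p_success if direction == action else (1 - p_success) / (n_actions - 1)
        probs[neighbours[state][direction]] += p  # several directions may share a next state
    return dict(probs)

def build_T(neighbours, p_success=0.7):
    """Fill T[s_next][s][a] by looping over the reachable next states."""
    n_states = len(neighbours)
    n_actions = len(neighbours[0])
    T = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for s_next, p in transitions(s, a, neighbours, p_success).items():
                T[s_next][s][a] = p
    return T

corner_north = transitions(0, 0, NEIGHBOURS)  # top-left corner, trying to go north
T = build_T(NEIGHBOURS)
print(corner_north)
```

Only `transitions` knows how actions map to outcomes, so an environment where actions do not correspond to directions would need a different `transitions` but the same `build_T`.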

nimble osprey
#

Hello all, I'm a beginner looking for ideas on how to approach creating a tool that pulls an identical subimage from a larger image using a template, then write what was found to a csv or json file. Specifically it is an inventory screen in a game UI, so the subimage would be an item icon in a fixed position and resolution for every screengrab and it would be an exact match. I've looked into opencv but was just wondering if there was any better suggested methods or tools to use?

vale field
mellow vector
#

so heres some fun everyone can enjoy, I want to convert MNIST back into images in altair. I wrote a 2x2 heat plot that was like 20 lines of code so I'm a little bit intimidated by the prospect. I'm not writing this because it's the best way, I have been doing everything with altair whether it makes sense or not to level up as a DA.

#

I'm about to whip out an autoencoder and then I'll need to check my results, some alt.Chart() code will follow

mellow vector
#

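so here's where I'm at ```py
import polars as pl
import torch
from torch import nn

def train_it(model):
    # df is the MNIST polars DataFrame defined earlier (global)
    optimizer = torch.optim.Adam(params=model.parameters(), lr=0.01)
    loss_func = nn.MSELoss()
    n_epochs = 10000
    for i in range(n_epochs):
        # draw a fresh random batch of 32 rows each step
        sample = df.sample(n=32, seed=i).lazy()
        X = sample.select(pl.exclude(['id', 'column_1'])).collect().cast(pl.Float32).to_torch()
        y_hat = model(X)
        loss = loss_func(y_hat, X)  # autoencoder: the target is the input itself

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # run 10 sampled rows through the trained model to check the output
    return model(
        df.sample(n=10, seed=42)
        .select(pl.exclude(['id', 'column_1']))
        .cast(pl.Float32)
        .to_torch())
```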
#

nice, am in business torch.Size([10, 784])

waxen kindle
mellow vector
#

autoencoder

waxen kindle
#

right now you are trying to regress lambda x:x

#

alright

mellow vector
#

is that wrong?

waxen kindle
#

no, didn't realise it was an autoencoder

#

I think you should use dataloaders and batches

mellow vector
#

reciting this from memory after a lecture a few hours ago, so ya feel free to correct mistakes

waxen kindle
#

it will work better

mellow vector
#

yeah I have no doubt really I'm more after the charting experience, this was a pretty easy subject

mellow vector
#

huh... so [column for column in encoded_df] returns the columns but encoded_df[0] returns the first row (polars)

#

how'm I supposed to iterate over it, I was thinking .to_numpy().reshape((28,28)) was going to get it reshaped

agile cobalt
mellow vector
#

yeah... hmm I'm less comfortable with numpy but that makes sense

agile cobalt
#

it's mostly the same as torch

mellow vector
#

in general you're not supposed to plot unlabeled data with altair heh

agile cobalt
#

also, you can use df[row, col] like df[:, 0] or df[0, :]

serene scaffold
#

without the loc

mellow vector
#

polars

serene scaffold
#

o

#

I'm on a losing streak today

mellow vector
#

we'll win you over eventually

mellow vector
#

reshaped_arr = encoded_arr.reshape((10,28,28))
reshaped_df = pl.concat([pl.DataFrame(reshaped_arr[i]) for i in range(10)])

#

this feels like a weak method for getting at the intended result

#

I'm still reaching for iteration
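
One way to drop the per-image loop is to go straight to a long table with one row per pixel, which is also the shape a heatmap encoding tends to want. This is only a sketch in NumPy; the function name and the column names `image`, `y`, `x`, `value` are made up for illustration.

```python
import numpy as np


def images_to_long_records(flat_images, height=28, width=28):
    """Turn an (n, height*width) array into long-form columns, one row per pixel."""
    arr = np.asarray(flat_images, dtype=float).reshape(-1, height, width)
    n_images = arr.shape[0]
    # np.indices gives the image, row and column index of every pixel at once
    image_idx, y_idx, x_idx = np.indices((n_images, height, width))
    return {
        "image": image_idx.ravel(),
        "y": y_idx.ravel(),
        "x": x_idx.ravel(),
        "value": arr.ravel(),
    }
```

The returned dict can be passed to `pl.DataFrame(...)` and then charted with something like `mark_rect()` encoding `x`, `y` and `value` as color, faceted by `image`. With 10 images that is 7,840 rows, so altair's default row limit may need to be lifted.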

#

oh well, break time here's where I'ma leave off ```py
reshaped_df = pl.concat([pl.DataFrame(reshaped_arr[i]) for i in range(10)])
reshaped_df = reshaped_df.with_row_index(name = 'Y')
reshaped_df.with_columns(pl.col('Y')%28+1)
```

oak hearth
#

guys i need to check ai % detection for my word document do you know any tools?

serene scaffold
spice tartan
#

Hey guys

#

How do you guys deal with git and merges with notebooks?

serene scaffold
spice tartan
#

I am looking at nbdime and jupytext rn

serene scaffold
umbral shell
#

Hey folks, I know there are a lot of professional programmers here, so I’d like to ask you something.
I keep seeing a lot of negativity around coding with AI / AI-generated code.
Why do you think that is?
I’m genuinely curious — not trying to start drama or be ironic.

Just to clarify:
I’m not talking about “one-prompt copy-paste code”,
but about serious, iterative development with AI tools involved.

jagged bane
#

I think a lot of negativity stems from the reduced human input. Code written by AI will almost always be worse than code written by someone with proper experience, and that will only compound

wicked flare
jagged bane
#

or people backing up their own "ideas" by saying "but chatgpt told me it was correct"

wicked flare
#

Another is that it's just kinda human nature to be skeptical of new and untested concepts or technologies.

umbral shell
wicked flare
#

Lots of people will just kinda lean in the negative direction until it's been proven beyond reasonable doubt that something really works well.

mellow vector
#

it can be useful to someone who can pick out the 1/3 of the code that's decent from the 2/3 that tend to wander into hallucination.

wicked flare
jagged bane
#

a calculator is only useful if you know what numbers you're meant to be punching in

umbral shell
wicked flare
#

We also get a lot of people coming into this server where they've used ChatGPT to vibe code something, and eventually they run into an issue they can't resolve just by re-prompting the AI, and ask us to solve it for them instead.

mellow vector
#

even code that works and is optimized will often be needlessly complex... is complexified a word? anyway, GPT likes to add libraries and variables inappropriately. Routinely in my studies I'll ask AI to make something happen and the learning process for me is to take what GPT gives me and do it the right way

jagged bane
#

Ask it to build something you already have deep knowledge of and you'll see how much unnecessary fluff it adds

umbral shell
jagged bane
#

but if you already have deep knowledge of something, why are you using AI for it....now apply this for things you're less confident about

mellow vector
#

additionally, its training is horribly outdated in many areas. If you ask it about something it should know, like how to write machine learning, it will give you advice on tensorflow. Tensorflow hasn't been relevant for years.

wicked flare
#

And it's deceptive, because it LOOKS as if you can.

sweet prawn
#

is it necessary to learn how to use LLMs?

wicked flare
#

You will get code, it just won't be good or useful code.

mellow vector
sweet prawn
#

it is

mellow vector
#

I mean, I've read that in china they have LLM class with Math, History, Literacy and whatever

wicked flare
umbral shell
sweet prawn
#

im just going to assume that i am in the "most situations"

wicked flare
sweet prawn
#

from what i hear from senior devs, it seems like it's not actually that useful anyway

#

or at least it doesn't really improve your productivity much

mellow vector
#

Learning how to use LLMs isn't really easy to answer, the models are constantly changing and you wont always be informed. What works today may not work tomorrow, but there're probably some concepts that can be applied across the subject

wicked flare
umbral shell
#

For me it feels like:
– 10 years ago: you had to learn Git
– later: you had to learn Docker / CI
– now: you have to learn how to use LLMs properly

sweet prawn
#

let's see if LLMs go into the crypto bin

wicked flare
#

It'll be interesting to see what will be considered industry standard practice in 10 or 20 years.

sweet prawn
#

for now, it seems i should be able to get away with not using LLMs

umbral shell
sweet prawn
#

im just going to run with that until im forced to do otherwise

wicked flare
# sweet prawn let's see if LLMs go into the crypto bin

I strongly doubt that will happen, because even today people are flailing to even conceptualize use cases for crypto, whereas I think there are already many obvious uses for LLMs, that just maybe still haven't been fully refined.

#

It's just hard to imagine that we'll just shelve all of this.

jagged bane
#

I use it to write me snippets that I can then evaluate, but other than that, it's not a regular part of my workflow

mellow vector
#

It will boil down to "How much hallucination is acceptable?"

jagged bane
#

I've started learning c# and I tried using chatgpt just to see what it would throw at a beginner and I hated it

wicked flare
#

I use the Copilot autocomplete all the time. I never turn it off and I accept its suggestions all the time.

umbral shell
#

hey folks, slightly off-topic for a moment 🙂
I’m building a lightweight IDE in PyQt + QScintilla because I got a bit tired of VS Code / Sublime / PyCharm / Spyder.
i’m planning to release it soon and I’m really curious about your experience.
What are the biggest pain points or cons of the IDEs you currently use?

wicked flare
#

I like using Copilot or ChatGPT to generate test code. I don't use it wholesale and I review it, but it's nice not to have to manually type out all the boilerplate that tends to come with test code.

sweet prawn
jagged bane
wicked flare
wicked flare
jagged bane
#

yeah, it's fine if you can vet what it gives you

umbral shell
#

hah, okay, but anyway guys, thank you for the interesting discussion about using ai at work!

wicked flare
#

I find in general that it's mentally easier for me to change something that already exists rather than create something new from scratch. Even long before LLMs came around I would often develop new features or tests by copying existing similar code and then changing it to my needs.

#

And I find that LLMs are useful for this, generating an initial draft that I can iterate on, for various tasks.

mellow vector
#

hey nothing personal but I don't really DM, you can usually find me here though

umbral shell
# mellow vector hey nothing personal but I don't really DM, you can usually find me here though

just wanted to say thanks and show you something

Task 3.4 – Discord UX Features (Idle Tabs, Variables, Beginner Mode)
Priority: P1 – IMPORTANT
Estimated time: 10–12 h

Scope:

Idle Tabs Manager
  • track the last_accessed timestamp for each tab,
  • configurable threshold in settings (default: 24 h),
  • “notify only” mode (status bar / lightweight popup) + logging,
  • absolute requirement: zero data loss
    (before any future auto-closing behavior, there must be autosave + a “parking lot” recovery system; for 1.0, notification-only is enough).

Variables Panel (post-run snapshot)
  • after script execution, generate a simple snapshot of locals() (no debugger),
  • filter out private names (_), provide readable repr() with length limit,
  • separate panel / output tab, read-only.

Beginner Mode
  • toggle in Settings (e.g. ui/beginner_mode),
  • when enabled:
      • simplified menu (hide advanced options),
      • larger fonts, more tooltips,
      • Variables Panel enabled by default, improved error messages.

Acceptance criteria:
  • Idle Tabs Manager gently notifies the user about long-unused tabs (no auto-closing),
  • Variables Panel displays a clear, sensible list of variables after script execution,
  • Beginner Mode noticeably simplifies the UI (fewer options, more guidance).
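
The Variables Panel part of that spec (snapshot after the run, skip private names, length-limited repr) could be prototyped along these lines. This is a rough sketch, not the IDE's actual code: `snapshot_variables` and `run_and_snapshot` are hypothetical names, and running the script with `exec` in the same process is an assumption made to keep the example short.

```python
def snapshot_variables(namespace, max_len=80):
    """Return {name: shortened repr} for the public names in a namespace dict."""
    snapshot = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skips private names and things like __builtins__
        text = repr(value)
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."
        snapshot[name] = text
    return snapshot


def run_and_snapshot(source, max_len=80):
    """Execute script source in a fresh namespace and snapshot what it defined."""
    namespace = {}
    exec(compile(source, "<script>", "exec"), namespace)
    return snapshot_variables(namespace, max_len)
```

The result is a plain dict of strings, so the panel itself can stay read-only and never hold references to the script's live objects.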

have a nice day ! 🙂

somber willow
#

is there someone here who's a beginner and looking to learn together? if yes, dm me personally

cedar tusk
#

ive been messing with ai for 5 years now and i still feel like a beginner

somber willow
cedar tusk
#

im learning other stuff

#

and also dont try to learn a library

#

its useless to just learn all the functions defined in a library

#

thats why google exist

somber willow
cedar tusk
#

just go along with what you are doing and search for stuff when its needed

somber willow
#

what are you learning rn

cedar tusk
#

decompositions, ode solvers, efficient ways to handle big data such as sparse matrices, comfyui, autoencoders,

somber willow
#

I get it you're doing the maths

cedar tusk
#

ode solvers is used for image generation

#

decompositions are useful for handling the math of latent space

#

big matrice storage u can guess probably

#

comfyui is for hobby

#

autoencoders is what transforms inputs into latent

somber willow
#

nahh how're you gonna apply it

#

like it's application

cedar tusk
#

i wanna make my own diffusion model

#

along with the vae

somber willow
cedar tusk
#

lol

#

variational autoencoder

somber willow
#

bro, how you're gonna build the model, like actually apply it in an app etc

cedar tusk
#

if i make the model, then applying it into an app is very easy

#

making the model is the hard part here

somber willow
cedar tusk
#

because i dont have to know the function names, as i know what to do and can find the name from there

#

google it

somber willow
#

can I switch to dms, cause I wanna send you a file

cedar tusk
#

sure

obsidian talon
carmine vale
#

what is the purpose of data science

rich moth
cedar tusk
rich moth
#

I just learned about the Kuramoto model recently.

#

Ive been dabbling in multi agent cognitive swarms.

rich moth
#

theres more to it, but thats the gist these days.

lofty root
#

Hi

#

Anyone tried python 3.14 with pytorch, Tensorflow OpenCV?

somber willow
wooden sail
# carmine vale what is the purpose of data science

the purpose is to extract information from data and to interpret it. all of these buzzwords like ML and data science are kinda muddy, unfortunately. for example, data science can involve using AI/ML to process and interpret data. it can also be done with classical methods instead

#

what plunder mentioned is more like data preparation, which is a cleanup process you can do before either of ML and data science

lime berry
#

Hello guys, can someone help me by giving me an idea for my final project? I want it for a hackathon, so I want a great idea that's also easy, cuz I'm a beginner. I'm down to learn more for the project 😄

slate trench
#

Also write a proper project plan once the idea and project goal starts to take shape. In my opinion, it's a good way to get started.

wooden sail
#

it's not uncommon for these libraries and others like numpy to not have support for a new python version for a handful of months. your best bet is to install 3.13 in a separate environment and use that

lofty root
#

Thank you @wooden sail

vale elbow
#

as a beginner which one is easier to learn, which one is easier to master? pandas/polars

lofty root
#

@wooden sail What do you recommend vscode or py charm

wooden sail
#

it depends on how experienced you are. in my personal opinion, learning python, learning a ML module, and learning an IDE are 3 completely different tasks and doing all 3 at the same time will make you learn everything more slowly

#

so if you're new to all, i would probably just use a syntax-highlighting text editor

#

a disproportionate amount of beginner questions in this server have to do with people fighting against vscode and pycharm to get things just to run

#

on a separate note, i do use vscode myself

lofty root
#

I've 3 years experience but I'm used to both

vale elbow
#

i have the basics of python, i can understand variables, functions, classes, dictionaries, tuple, list, for loop, while loop, if/else statements, data types (like str int bool float) and i recently learn some pandas

#

i also know a little about numpy and matplotlib but completely new to polars & sklearn

wooden sail
#

polars is better for large queries because it's faster

#

at least in my head, polars is for handling, moving, and accessing data, but not for any complex processing of the data

#

doing the latter will have you leave polars, transforming the data into something like numpy, torch, etc

vale elbow
#

basically i have to learn these libraries at my school

numpy
seaborn
sklearn
pandas
matplotlib

and i am trying to master them so i can take the exam

vale elbow
wooden sail
obsidian talon
#

Polars is great for processing data, but it has no ecosystem whatsoever.

vale elbow
#

how far does sklearn cover in machine learning? the python library

#

whats the maximum it can do

obsidian talon
#

It covers traditional machine learning

#

Supervised, unsupervised, preprocessing, and pipelines and all.

wooden sail
#

a really big amount of the classical statistical optimization/estimation methods, and some basic deep learning methods

vale elbow
#

oh right now we're on only supervised learning, the lecturer asked us to finish the datacamp course "supervised learning with scikit-learn" and we have some kind of project but i haven't started to even learn the library yet

obsidian talon
#

The majority of supervised learning algorithms in sklearn are hardly used in practice, but they build the foundation.

obsidian talon
#

How much do you know about just basic old linear regression

wooden sail
#

there is a large amount of papers being published on these topics and their applications still

obsidian talon
wooden sail
#

and they build the foundation for state of the art model-based neural networks

wooden sail
vale elbow
#

i know nothing yet i only know theory (like reading) classification, regression, where to use each, confusion matrix, decision trees, test data train data like that

obsidian talon
#

These aren't meant to be sequential

#

Supervised learning is easier to learn compared to unsupervised

#

But you need unsupervised to eventually optimize your supervised model

#

For feature engineering

obsidian talon
#

And they're used extensively in more modern tabular ML algorithms

#

I'd recommend going further into linear regression and its regularizations

#

So L1, L2, and L1+L2 - these become hyperparameters for many algorithms so it helps to know how they work
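
For reference, the three penalties written out for linear regression. The notation is an assumption on my part ($X$ the feature matrix, $y$ the targets, $\beta$ the coefficients, $\lambda$ the regularization strengths), and libraries often scale the terms slightly differently:

```latex
\begin{aligned}
\text{L2 (ridge):}\quad & \min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\text{L1 (lasso):}\quad & \min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{L1+L2 (elastic net):}\quad & \min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2
\end{aligned}
```

The $\lambda$ values are the hyperparameters being referred to: larger values shrink the coefficients more, and the L1 term can push some of them exactly to zero.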

#

For classification, I'd do logistic before anything else and multinomial logistic.

#

After that try KNN, then flip a coin between SVMs and Naive Bayes - difficult for different reasons.

#

If you're more of a stats guy, try for naive bayes, if you're a math and CS guy, go for SVMs

#

KNN and SVMs can both be used for regression, but a lot less commonly. You can technically use naive bayes for regression too, but I wouldn't recommend it.

#

Then I'd move on to decision trees and ensemble methods which sort of rule tabular ML for complex data and relationships

#

So random forest after, then general gradient boosting, XGBoost, LightGBM, and CatBoost

#

And then stacking if you're feeling fancy

past meteor
#

There's a lot of details that are off with this, we can go into the details if you're willing to 😄

obsidian talon
#

But learning preprocessing is a huge must!

#

Which details

past meteor
# obsidian talon Which details

The order in which to learn the algorithms seems off and that's partially because you don't (imo) appropriately highlight why any of them make sense to use in a given context

obsidian talon
#

This is similar to how I learned it in my machine learning class last semester.

past meteor
#

I don't think it's particularly helpful learning ML as a big box of different algorithms with different names

#

When it's more important to look at the properties of each method, and group them by property, which then maps to the kind of problems they're good at solving

obsidian talon
#

It's more about knowing where they fall in the landscape of ML and how these algorithms set the foundation for many others

#

Along the way you should know every model's assumptions/strengths/weaknesses

past meteor
#

In part because the number of unknowns in the optimization problem they solve is the number of observations, not the number of features

#

Then it's also clear you can't use them on large datasets since you need to make the Gram matrix (size N x N) which may not fit in memory
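
To put rough numbers on the N x N point, here is a back-of-the-envelope sketch. It assumes the full matrix is stored densely as 64-bit floats, which is the worst case; the helper name is made up.

```python
def gram_matrix_gib(n_samples, bytes_per_value=8):
    """Memory in GiB for a dense n_samples x n_samples matrix of floats."""
    return n_samples ** 2 * bytes_per_value / 2 ** 30


small = gram_matrix_gib(10_000)    # roughly 0.75 GiB
large = gram_matrix_gib(100_000)   # roughly 74.5 GiB
```

Ten times more rows means a hundred times more memory, which is why the sample count rather than the feature count is the limiting factor here.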

obsidian talon
#

Works great for high dimensional data with medium sized data sets. It was used commonly for NLP tasks. It blew up in the 90s.

past meteor
obsidian talon
#

They're rather obsolete in the sense of using a tabular model instead of a deep learning model that excels with text based/high cardinality/high dimensionality datasets

past meteor
#

Not at all

#

You need a lot more data to train deep learning models

obsidian talon
#

LLMs alone make an SVM model more novelty than anything else.

past meteor
#

You have many, many more hyperparameters! (In deep learning everything is a hyperparameter; in RBF SVMs you only have 2: C and gamma)

obsidian talon
#

Industries aren't using SVMs besides niche datasets where its too small for something more complicated but too complex for GLMs

past meteor
#

And finally, the optimization problem for SVMs, both in the primal and dual formulation leads to a global optimum. With neural nets, well good luck fiddling with parameters and training it over and over

wooden sail
past meteor
wooden sail
#

you're underestimating performance and optimality guarantees, which deep learning has very little of

past meteor
#

It's up to you to know when simple(r) methods are appropriate and use them

#

Instead of trying to dice a tomato with a chainsaw

obsidian talon
#

But realistically, organizations that need to deal with high cardinality like that

#

Theyre using LightGBM

#

Theyre using deep learning models

past meteor
#

Not necessarily

obsidian talon
#

Yes, you need more data, and companies have plenty of it. Too much of it.

past meteor
#

Again, it's not the cardinality

#

it's the amount of data

wooden sail
#

big AI companies have plenty of data. most companies do not have enough to train small models

past meteor
#

And when you're working with anything related to bio / human stuff

#

You have so so so little data

#

And these are domains where the most money can be made

obsidian talon
#

Im talking corporate level companies

wooden sail
#

for reference zestar works in bio applications with ML, and i work in industrial nondestructive testing, also with ML

#

with masters/phd

#

i have yet to collaborate with a company or university that says they have too much data

obsidian talon
#

I work as a data scientist

wooden sail
#

training with less data is an active research field

#

ML rollout in industry is impeded by lack of data

obsidian talon
#

Lack of publicly accessible data

past meteor
#

No, even within companies

#

Even if they have data, a lot of it isn't labelled to be used in the context of supervised ML

obsidian talon
past meteor
#

Can I be blunt? 😅

#

There's typically dozens of ways models/approaches can be improved by knowing some more of the theory. I see this at work as well, and models have been demonstrably improved by this.

We can have a cool discussion here, but I feel like I'm talking to a wall haha.

obsidian talon
#

I guess I just have a different perspective

past meteor
#

It's not different, we can't say 1+1=3 and call that a different perspective imo

obsidian talon
#

ML Algorithms that arent great out of the box are very costly and need rapid prototyping

past meteor
obsidian talon
#

And a lot of the modeling i do is bayesian modeling

past meteor
#

They are not great out of the box if you need a novel architecture because the design space is infinite

#

The whole point of this, is essentially that other methods, simpler ones are great out of the box

obsidian talon
#

Simpler ones are great if you have simple data and simple needs.

wooden sail
past meteor
#

Let's go back to fMRI data, that is definitely not simple data and not used for simple needs. SVMs will still outperform an exotic whatever architecture in most cases

#

On forecasting data exponential smoothing has been shown in large scale studies to be hyper competitive with whatever LSTM people were cooking up

obsidian talon
#

You can try benchmarking it against an XGBoost or LightGBM

past meteor
#

Yes

#

That's what they did in the survey paper

obsidian talon
#

What survey paper

past meteor
#

On forecasting methods

obsidian talon
#

Forecasting?

#

As in time series?

past meteor
#

Yes

#

And I hope you know XGB and LGBM have serious issues for forecasting (another nice theory one)

#

The values in the leaves and nodes are from the training data, correct?

obsidian talon
#

Theyre not as commonly used for time series modeling compared to cross sectional data

past meteor
#

Mostly because they can model seasonality without preprocessing

past meteor
obsidian talon
#

Most time series models fail horrendously regardless

past meteor
#

But a lot of people do not know this, so they employ these models in different scenarios where the real world distribution shifts in a very predictable, constant way (e.g., trend) and they cannot capture this, linear regression can

past meteor
obsidian talon
#

The time series has to be static. No sudden cuts to interest rates, no random politician shenanigans.

past meteor
#

If you're using exponential smoothing you're invariant to this

#

If the series suddenly shifts your predictions will also shift

obsidian talon
#

For serious time series modeling, theyre doing everything from scraping news articles and performing sentimental analyses on the

past meteor
#

because the prediction is a moving average
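
A tiny sketch of what "the prediction is a moving average" means for simple exponential smoothing. The series and `alpha=0.5` are made up for illustration, and this is the plain level-only version with no trend or seasonality.

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Return the smoothed level after each observation.

    levels[t] is the one-step-ahead forecast for step t + 1.
    """
    level = series[0]
    levels = [level]
    for value in series[1:]:
        # new level = weighted average of the latest value and the old level
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


# a series that jumps from 10 to 20 halfway through
series = [10.0] * 5 + [20.0] * 5
levels = simple_exponential_smoothing(series, alpha=0.5)
```

After the jump the forecast moves halfway to the new level on every step, so it catches up within a few observations without retraining anything. It still cannot see the jump coming, which is the other side of this argument.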

obsidian talon
#

Created several lagged features

#

Comparing the data to clusters of portfolios

past meteor
#

Forecasting is not just stocks btw

#

It's like, the generic term 😅

obsidian talon
#

Its not. One single "cancel culture" tweet would destroy your model and its predictions for next month's concert sales.

#

A single tweet, sudden changing trend, a natural disaster, a political whatever, a pandemic - that model is gone

#

A time series would have to be consistent and stable. These models learn from what they're given. They can't do anything about something none of us saw coming.

past meteor
obsidian talon
#

Great for valuing your most recent data points more than your previous ones.

#

Still cant do anything about that Mcdonalds E coli outbreak

#

Time series models overfit beyond most others. The second anything significant in that environment changes, it's all out the window.

#

They learn from what they have, but they cant predict or forecast a sudden feature it was never trained on, without a ton of uncertainty.

plush shuttle
#

Guy how to fine tune a model

#

on "cpu"

serene scaffold
#

the code doesn't look any different than the same code running on a GPU. It just won't finish.

prisma wing
plush shuttle
#

want the code then ping me

serene scaffold
# plush shuttle i tried dosent work

It's important to never say that something "didn't work". That doesn't give anyone any useful information. You have to say what you did, what you expected it to do, and what actually happened.

lofty root
#

Can anyone suggest me CV or NLP project ideas??

clever hollow
#

Hi, Is there anyone here, aged 14–25, who is interested in AI/ML?

serene scaffold
# plush shuttle

You deleted the pastebin entry? I'm not available right now, but it would have been useful for other people

serene scaffold
clever hollow
serene scaffold
lofty root
#

@clever hollow I'm interested

clever hollow
#

are u a university student

lofty root
#

Yes

#

I'm BSCS graduate

lofty root
clever hollow
lofty root
#

Amazing

#

I'm working in Computer Vision and Natural language processing

clever hollow
lofty root
#

Let's chat in DM

clever hollow
#

ok

gritty vessel
#

Hey, when working on a spatio-temporal problem, let's say I take inputs t1 to t6 and predict t7 to t9, then take t2 to t7 and predict t8 to t10, and so on. Will this result in data leakage, since the samples overlap?

#

Or should I build samples like this: inputs t1 to t6 with targets t7 to t9, then the next sample is t10 to t15 with targets t16 to t18?
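
A sketch of both schemes, under the assumption that the thing to guard against is windows sharing timesteps across the train/test boundary rather than overlap inside the training set itself. Timesteps are 0-indexed here, and the function names and the split point are made up for illustration.

```python
def make_windows(n_steps, n_in=6, n_out=3, stride=1):
    """Return (input_steps, target_steps) pairs over timesteps 0..n_steps-1.

    stride=1 gives the overlapping scheme; stride=n_in+n_out gives the
    non-overlapping one.
    """
    windows = []
    start = 0
    while start + n_in + n_out <= n_steps:
        inputs = list(range(start, start + n_in))
        targets = list(range(start + n_in, start + n_in + n_out))
        windows.append((inputs, targets))
        start += stride
    return windows


def chronological_split(windows, split_step):
    """Train windows end before split_step, test windows start at or after it.

    Windows that straddle the boundary are dropped, so no timestep is shared
    between the two sets.
    """
    train = [w for w in windows if w[1][-1] < split_step]
    test = [w for w in windows if w[0][0] >= split_step]
    return train, test


overlapping = make_windows(30, stride=1)
train, test = chronological_split(overlapping, split_step=20)
non_overlapping = make_windows(30, stride=9)
```

With a random shuffle-split instead, overlapping windows from the same stretch of time would land on both sides, and that is where the leakage comes from.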

ornate trellis
#

hey, can anybody help me develop skills in data science and ai? i am a 2nd year B.Tech student, done with python, NumPy, pandas, matplotlib and seaborn, and going to start ml. Please give me a step by step process to learn it, and also any suggestions

wispy haven
#

Hi everyone! I'm currently engaged in deep analytical research focused on the Oslo Bysykkel Open Data. My main goal is to extract maximum educational and practical value from this data asset. I've developed (mete) several distinct concepts for structuring this work, and I'd love to share the vision and gather feedback. I'd love to connect with anyone interested in discussing these concepts, collaborating on content, or just exchanging insights on the Oslo data. Feel free to send me a DM or comment below! Thanks

mellow vector
mellow vector
#

Spent about 3-400 hours studying ML math and inner workings in the last year, for context. As a professional DA I recognize that deep learning isn't usually the right tool for the job, but I figure the math will be valuable as I try to reenter the job market.

plush cloud
#

Hi everyone! I’m currently exploring Data Science through a course and have just started a GitHub repo to map out learning paths. It’s a collaborative space, and I’d love to include others who are passionate about DS — whether you're experienced or just starting out. If you'd like to contribute helpful resources or insights, feel free to DM me for an invite. Let’s learn and grow together

gritty vessel
mint shard
plush cloud
# mint shard heard data science is just statistics is that ryt?

Not exactly, DS includes statistics, but it’s much more than that. It also involves programming, machine learning, data wrangling, and storytelling through visualizations. Statistics helps us understand patterns, but data science uses that understanding to build models, solve problems, and make smart decisions with data.

serene scaffold
#

Hi @grizzled tartan . It actually takes an exceptional amount of computing power to create LLMs. Like it literally costs millions of dollars each time. You should start by learning how to train simpler models.

grizzled tartan
snow fog
#

Hi everyone, I have created a fuzzer for testing MCP servers. It's mostly helpful if you're writing an MCP server in a compiled language, since it can help detect crashes and other likely resource issues, and also if you're building your own custom MCP implementation. It's not thoroughly tested yet, as you can see from this issue: https://github.com/Agent-Hellboy/mcp-server-fuzzer/issues/108

please use this on your server and help me test it, could be a helpful project to the community.

full ravine
#

Hello guys

#

Would love to know your opinion on this project

woven prairie
#

My question is - Can we make subgraphs inside the main graph sharing the same state of the main graph in langgraph ?

brazen jungle
#

Guys.... Where to start?

rich moth
rich moth
# full ravine Hello guys

Im working on something a bit like that in nature, but im working toward a crypto economic provenance layer. Its got a unique complexity metric Ive been working on for a few years. But it also uses merkle anchored commitments on Polygon zkevm, basically ZK data availability layer.

rich moth
still prairie
#

Hello my name is Taha, nice to meet you! -> likedin/in/tahayacine
If you are a strong CS/AI/Data Undergrad or Masters, and strongly interested in AI research and just starting or just started, DM me, I am starting an initiative together.

placid pine
#

Hello everyone, I'm new to python. My goal is to learn about data and become a data scientist, but I'm really confused about what to learn after playing with python. Should I move on to sql, or learn something else related to data? I have no idea, so I just wanna ask: what should I learn next to become a data scientist?

prisma wing
#

uhm

grand minnow
placid pine
grand minnow
grand minnow
#

like data cleanup or reshaping, resizing, additional columns, drop NAs, etc

placid pine
placid pine
mellow vector
# placid pine oh yes, the seaborn one who made the chart of a data

Data science is a lot of statistics and data interpretation, much of which isn't performed programmatically. As agent mentioned, a dataframe library is the primary tool in their stack, personally I prefer polars over pandas. Corey Schafer goes over pandas and is highly regarded by the community if you're looking for some educational material.

#

As for charting, multiple sources have recommended plotly express as the first library to reach for. Past that there are many options, you may also benefit from Marimo, a notebook interface for *.py files with integrated visualization.

placid pine
hard kettle
#

hello guys, im currently looking for data science github repositories made by seniors, i want to know how seniors make projects

#

how can i find one?

lime grove
#

charting: plotly or matplotlib. Roughly equivalent, but I do not like how plotly has a sales component to it. Makes me feel sullied.

#

also note that you can use gnuplot from the command line if you want to do something really quick and dirty. I feel that gnuplot is often overlooked

#

it has a simple and easy to learn scripting language

orchid light
#

what kind of zesty finetuning were they doing for grok 4.1

errant bison
#

Which is the best and most scalable tool for complex workflows?

rich moth
#

Anyone in here read about complexity-aware embeddings yet?

#

i had an idea..

gloomy dirge
#

what is it

rich moth
#

take a SentenceTransformer that has this built-in complexity sense, driven by my UCF tool. The idea is to put a small head on it that projects embeddings into a 5D "UCF" space (N, A, ϵ, cosθ, sinθ), then maybe train it so that this subspace matches the analytic UCF while still doing normal semantic embeddings. something like that

#

one embedding, but two views

dusty violet
#

hi guys, im kind of obsessed with reproducibility (nix user 😔 ) and i was wondering if there is a library i can use for downloading datasets that:

  • caches the data on disk, so it's downloaded only once
  • asserts that the checksum of the file matches a given hash, so it's clear if the source data ever changes
  • returns a path to the downloaded/cached file

basically im looking for something similar to fetcher derivations in nix, but as a python library

fetchurl {
  url = "https://www.kaggle.com/api/v1/datasets/download/hojjatk/mnist-dataset";
  hash = "sha256-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
}
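
For what it's worth, the three requirements above can be sketched with just the standard library. This is only a rough illustration, not an existing package: `fetch_cached` and `sha256_of` are hypothetical names, the hash is expected as a plain hex digest (not nix's `sha256-...` SRI form), and sources that need authentication are not handled.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(url: str, sha256: str, cache_dir: Path) -> Path:
    """Download url once, verify its SHA-256 hex digest, return the cached path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / sha256              # content-addressed file name
    if not target.exists():
        tmp = cache_dir / (sha256 + ".part")
        with urllib.request.urlopen(url) as resp, tmp.open("wb") as out:
            while chunk := resp.read(1 << 20):
                out.write(chunk)
        actual = sha256_of(tmp)
        if actual != sha256:
            tmp.unlink()                     # never keep a file that failed the check
            raise ValueError(f"checksum mismatch for {url}: got {actual}")
        tmp.replace(target)                  # only verified files reach the cache
    return target
```

Naming the cached file after its hash means a changed upstream file can never silently replace the old one; it fails the check instead.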
rich moth
jaunty helm
twilit prism
mellow vector
twilit prism
woven prairie
#

Does anyone have the experience of using langgraph

serene scaffold
woven prairie
#

Yes will do it

waxen kindle
jaunty helm
agile cobalt
dusty violet
supple plover
#

Hi . I have recently started a master's course in machine learning and data science. I was hoping to find out if there was anyone on this channel that does or might be interested in tutoring/having a ML concepts discussion...basically to help with discussing doubts on basic ML concepts. Feel free to DM if anyone is interested.

waxen kindle
twilit prism
# supple plover Hi . I have recently started a master's course in machine learning and data scie...

Isn't it pretty much just:

  • prediction by pre-training
  • prediction by reward

using things like Q-learning, where you have a state, and after each action you adjust that state's values by how close the action came to the correct route. That changes the weights on the best action to choose, so it can make a different decision on the next run, working through the possible actions under the state constraints and adjusting each run until it converges on what's good by getting the right answer every time?

And if you take all the possible states and scope them into smaller sets of values to split up what each learning process has to assess, you get a network/framework of Q-learning instances, each working within its own state space. So instead of permuting something like 800 points, each instance gets cut down to around 60 (60 is a lot, idk of an example right now, but I try to get it to 10 or less).

There ya go, ML.
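
To make the "adjust the weights after each action" part concrete, here is a rough sketch of the standard tabular Q-learning update. The states, actions, reward, and the `alpha`/`gamma` values are made up for illustration, and `q_update` is a hypothetical helper name.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)   # value of the best follow-up action
    td_target = reward + gamma * best_next                  # what this step suggests Q should be
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)           # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]

# Hypothetical transition: in state 0, going right gives reward 1 and leads to state 1.
first = q_update(q, state=0, action="right", reward=1.0, next_state=1, actions=actions)
second = q_update(q, state=0, action="right", reward=1.0, next_state=1, actions=actions)
```

Each call nudges the stored value a fraction `alpha` of the way toward the observed target, which is why repeated runs converge gradually rather than jumping to the answer.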

tired wedge
#

What do I do to get experience that has an effect on the outside world so that I can turn python into something that makes me money. I have thought about process automation but I do not know where to reach people.

subtle lotus
tired wedge
#

What do you guys think about the longevity of data science?

vivid flicker
#

What bot should i make

waxen kindle
#

Whatever you need

bronze wyvern
#

Hello, anyone familiar with Roboflow here pls...I have a folder with my images and labels from txt file, anyone knows how I can upload that in roboflow? I can only upload a single folder at a time, do I need a json file or something linking each image to a label or something like that?

rich moth
#

damn my trading bot is destroying it. its up $1800 bucks in 40 hours. pretty proud of myself though, between the model and the trading bot itself it's taken me at least a year and half to make it this far.

rich moth
#

But you wont need nearly as many as we currently probably do.

vale elbow
#

I know nothing about trading

rich moth
vale elbow
#

Too busy with school to start now

slate trench
rich moth
pliant venture
#

Can yall help me with a data leakage problem

left tartan
pliant venture
#

Ok, I have a data leakage problem in this code where I'm getting a score of 1 (it may be overfitting, but I doubt it). Here's my code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

ts = pd.read_csv("/Users/arhaann/Documents/code/Python/Titanic Survival.csv")
s = pd.read_csv("/Users/arhaann/Documents/code/Python/Survive.csv")
ts['Age'] = ts['Age'].fillna(ts['Age'].median())
ts['Fare'] = ts['Fare'].fillna(ts['Fare'].median())
le_sex = LabelEncoder()
le_embarked = LabelEncoder()
#Female is 0, Male is 1
ts['Sex'] = le_sex.fit_transform(ts['Sex'])
#C is 0, Q is 1, S is 2
ts['Embarked'] = le_embarked.fit_transform(ts['Embarked'])
ts['Family_Size'] = ts['SibSp'] + ts['Parch'] + 1
ts['Family_Size'] = ts['Family_Size'].fillna(ts['Family_Size'].median())
ts['Survived'] = s['Survived']
ts = ts.sample(frac=1, random_state=41).reset_index(drop=True)
x = ts[['Pclass', 'Sex', 'Age', 'Embarked', 'Family_Size', 'Fare']]
y = ts['Survived']
print(ts.head(10))
print(s.head(10))
gbr = GradientBoostingClassifier()
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1000)
gbr.fit(x_train, y_train)
print(cross_val_score(gbr, x_train, y_train, cv = 3, n_jobs=-1).mean())
param_grid = {
    'n_estimators': [100, 200, 300, 500],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7]
}
gbr2 = GridSearchCV(gbr, param_grid, cv = 3, n_jobs=-1)
gbr2.fit(x_train, y_train)
y_pred = gbr.predict(x_test)
print("Base Model Accuracy:", accuracy_score(y_test, y_pred))
best_model = gbr2.best_estimator_
y_pred_best = best_model.predict(x_test)
print("Best Model Accuracy:", accuracy_score(y_test, y_pred_best))
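
One quick check when a score of 1 looks too good: see whether a single column already predicts the label perfectly. Below is a rough sketch using only the standard library. `single_feature_accuracy` is a hypothetical helper, and the toy values are made up, not taken from the Titanic files.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(feature, target):
    """Accuracy of predicting the majority label for each distinct feature value."""
    counts = defaultdict(Counter)
    for value, label in zip(feature, target):
        counts[value][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    total = sum(sum(c.values()) for c in counts.values())
    return correct / total


# Made-up toy data: the label is fully determined by 'sex', but not by 'pclass'.
sex = [0, 1, 0, 1, 0, 1]
pclass = [1, 1, 2, 2, 3, 3]
survived = [1, 0, 1, 0, 1, 0]

leaky = single_feature_accuracy(sex, survived)     # every group has a single label
weak = single_feature_accuracy(pclass, survived)   # every group is split evenly
```

Since the helper only iterates, it should also accept DataFrame columns, e.g. `single_feature_accuracy(ts['Sex'], ts['Survived'])`. A value at or near 1.0 for one column would suggest the label file was derived from that column, which is worth ruling out before blaming the model.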

limpid zenith
#

!code

arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

opaque condor
#

Does anyone have any kids stories they can type to my ai? (Chatgpt coded it and I have been playing with it)

quasi echo
#

Hey guys, it's nice to have you all here. I joined this Discord community today and I'm really excited to be here.

Is anybody here familiar with the libraries one needs to learn to be a data analyst? I'm pretty confused.

copper kindle
atomic magnet
#

hey guys can you gimme advice? i know nothing about AI (i know python, c, java tho). will this book help me build at least small language models? and what videos and media do you recommend other than this? really appreciated.

agile cobalt
slate trench
lime berry
#

Hello guys can you give ideas for data science project for a hackathon

mint flume
#

Some guidance for data science please

subtle lotus
rich moth
slate trench
#

Cool project, but FYI this isn’t an LLM. It’s basically a rule-based NLP engine, not a neural language model. Also, real question: is there even a single line of code in here that you actually wrote yourself?

#

Eager but completely uneducated "developers" are killing all open source work. My time was wasted again for 10 minutes. Open source is doomed.

worldly dawn
#

Not the place to shitpost

#

<@&831776746206265384> shitposting, aggressive

zenith nova
#

!mute 1426730370665152683

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @devout pivot until <t:1763803386:f> (1 hour).

cedar carbon
#

A cli tool to search for models and datasets on the hf hub.

Features

  • Search Models: Find models by keywords, author, tags, or task
  • Search Datasets: Find datasets by keywords, author, or tags
  • Export Results: Export search results to CSV or TXT files
  • Beautiful Output: Formatted terminal output with Rich
  • Python API: Use as a library in your Python projects

pip install hfsearch

drop a star🌟 if you like it https://github.com/HenokB/hfsearch

oak sentinel
#

Hi everyone,
I’m working on a sign-language classification project in TensorFlow and I need some advice because my accuracy is very low. I have used WLASL100 and WLASL1000, and I also tried using only the top 10 most frequently recorded words, but that didn’t improve accuracy much. I excluded the face keypoints and only used body, hands, and arms (with MediaPipe), which helped a little but didn’t solve the problem. My model is a small BiLSTM network with two layers (64 and 32 units), followed by a dense layer and a softmax output. For training, I used class weighting, early stopping, and learning rate reduction. Sequences are padded to the maximum length in the dataset, and I do a stratified train/validation split.
I’m wondering what I could do to improve accuracy, as my deadline is tomorrow. Should I switch to a different architecture or dataset? Any advice would be very helpful!
Thanks a lot!

waxen kindle
#

maybe try some bigger network ?

bronze wyvern
#

Hello, does anyone know if there are object detection models that detect letters and were trained on a high volume of data?

I want to make a program that will identify letters and based on that identify a whole word and perform some other logic

serene scaffold
bronze wyvern
bronze wyvern
agile cobalt
bronze wyvern
#

Is DeepSeek OCR a model that can be downloaded?

agile cobalt
bronze wyvern
#

yep noted, ty !

agile cobalt
#

note that deepseek is much larger and more compute-intensive than traditional methods like tesseract though

bronze wyvern
#

yeah I guess😭 , for my use case it's really for minimalistic thing, I think I will switch to a lighter thing

#

I noticed there is a library called EasyOCR, have you used it?

agile cobalt
#

iirc it's just a wrapper around other libraries... never mind, I must be confusing it with another one
no

copper kindle
crimson tulip
#

I am trying to learn how to make an Ai for a project I am working on. The ai will take in content from the user and reply in a kind comforting way almost like a therapist. I have no idea where I should start. I would appreciate any advice or suggestions!

serene scaffold
#

if you've only started learning about AI in the last few years, pretty much everything that you think of as "AI" is unobtainable.

crimson tulip
#

Thank you! I will look into Eliza.

strange hornet
#

Since the D2L library used there isn't compatible with Google Colab, I need to rearrange the programs.

#

I want to learn deep reinforcement learning, but I can't find good educational resources for it.

pliant venture
final badge
#

h

bronze wyvern
copper kindle
#

once you set it up correctly everything works

deep abyss
bronze wyvern
#

the thing is, can it be used on a cloud platform like google colab?

#

I wanted to try something but I'm unsure if I can use it there

bronze wyvern
#

ohhh

#

thanks !

slate trench
hasty ivy
#

explain me roadmap of ai and data science

copper kindle
little arrow
#

does anyone have a good resource on implementing a custom matrix in python? i cant really find much online

serene scaffold
little arrow
little arrow
#

idk how to really explain, like a mini matrix inside of a matrix

#

its like 4 matrices inside one matrix

serene scaffold
#

how big is a block?

little arrow
#

any value n

#

this is what its supposed to output

serene scaffold
#

so if the blocks are n * n, then the size of the whole matrix will be 2n * 2n

little arrow
#
`*` is any random number
#

yeah correct

serene scaffold
#

!docs numpy.zeros

arctic wedgeBOT
#

numpy.zeros(shape, dtype=None, order='C', *, device=None, like=None)
Return a new array of given shape and type, filled with zeros.
little arrow
#

so id define it initially with np.zeros(4,n)

#

so it would be ([n], [n], [n], [n])

waxen kindle
#

you can create a matrix of size 2n by 2n with the diagonal filled with the value

#

then take the second half of the matrix in both dimensions and fill it with a new value

#

something like:

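import numpy as np

# n, value and new_value are assumed to be defined already
bigmat = np.eye(2 * n) * value          # value along the whole diagonal
bigmat[n:, n:] = new_value              # overwrite the bottom-right n x n block
# or if the last line isn't working
bigmat[n:, n:] = np.ones((n, n)) * new_value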
little arrow
#

thank you

little arrow
#

im using scipy's linear operator class

#

so i can avoid constructing 2 large arrays filled with 0s

#

so if i want to perform a matvec i can just do it on the two non-zero sections of my matrix
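
A rough sketch of that idea, assuming the two non-zero sections sit on the block diagonal (the actual layout was in the screenshot, so this is a guess). `block_diag_matvec` is a hypothetical helper name and the blocks are random test data.

```python
import numpy as np


def block_diag_matvec(a, b, x):
    """Compute [[a, 0], [0, b]] @ x using only the two non-zero n x n blocks."""
    n = a.shape[0]
    return np.concatenate([a @ x[:n], b @ x[n:]])


n = 3
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))
x = rng.standard_normal(2 * n)

y = block_diag_matvec(a, b, x)

# Dense reference, built here only to check the result on a small case.
dense = np.zeros((2 * n, 2 * n))
dense[:n, :n] = a
dense[n:, n:] = b
```

A function like this could then be handed to scipy as the `matvec` argument, e.g. `LinearOperator((2 * n, 2 * n), matvec=lambda v: block_diag_matvec(a, b, v))`, so the zero blocks are never stored.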

sacred tusk
#

Hi everyone! I’m Stella from Zagreb. I recently finished a Python Developer course and have been diving deep into AI, experimenting and practicing to really understand how it works. I’m super excited to start my first projects and learn by doing, especially in Python + AI.

I’d love any advice, tips, or pointers on where to find opportunities or projects — or just general guidance on how to get started. Any help would mean a lot!

obsidian talon
#

There's machine learning, LLMs, deep learning, natural language processing, computer vision, chatbots, and even robotics can be considered AI.

#

In all cases, I'd highly recommend diving into stats for anything in the machine learning route. I'd also recommend learning how to wrangle and visualize data.

sacred tusk
# obsidian talon AI is a very broad term. Is there something in particular?

You’re right, AI is extremely broad.
My main focus is working hands-on with large language models — experimenting with them, building structured interactions, testing their behavior, and understanding how they reason.

I’ve spent a lot of time doing deep practical work with LLMs: creating prompt systems, running simulations, analyzing responses, and pushing models to understand complex patterns. So even though I’m new to the Python job market, I already have strong practical intuition in how AI models think, learn through feedback loops, and how to guide them effectively.

If anyone here works with Python + LLMs or is building small AI tools, I’d love to learn, contribute, and help wherever I can. Happy to be here!

sacred tusk
# obsidian talon What's your skillset atm?

My skillset right now is a mix of early-stage Python development and deep hands-on experience with LLMs.

Python (beginner):

basics: variables, loops, functions, OOP

working with files, APIs

simple scripts and automation

currently learning best practices and looking for small real projects to improve

AI / LLM practical experience:

prompt engineering

designing structured conversations

building and iterating “AI agent” personalities

studying model behavior, consistency and memory

running small simulations with an LLM to test reasoning and interaction patterns

I’m still early in Python professionally, but I learn fast and I’m very active in experimenting with AI behavior.
If you have suggestions for small projects or beginner-friendly tasks, I’d appreciate it.

twilit prism
#

Can someone explain the self attention formula steps for Q dot product with K, the final computation with k and the argmax step after all of the attention values are accumulated? I can't seem to find a resource that explains it step by step so I'm mixing up which step happens where.
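
In the standard formulation the steps are: project the inputs to Q, K and V, take Q·Kᵀ, scale by √d_k, apply a softmax over the keys (not an argmax), then multiply the resulting weights by V. A minimal single-head NumPy sketch follows, with random made-up weights and sizes, and no masking or multi-head logic.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention for one sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # step 1: project each token
    scores = q @ k.T / np.sqrt(k.shape[-1])    # step 2: Q.K^T, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)         # step 3: softmax over keys, rows sum to 1
    return weights @ v, weights                # step 4: weighted sum of the values


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 5
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
```

An argmax usually only shows up much later, when picking an output token from the final logits, which may be where the mix-up comes from.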

obsidian talon
#

NLP is just really intense classification

serene scaffold
obsidian talon
#

a lot of NLP tasks revolve around classification

serene scaffold
#

Would you consider machine translation in that?

sturdy stump
#

Hey guys, I got recommended by a kind lad to figure out a solution to my problem in python help, no one responded and my post got locked

#

its about rocket thrusters

mellow vector
#

you can just post it, we do a bit more with numpy so you might get a bite

#

these channels are slower though

sacred tusk
sturdy stump
#

ill just post the full thing here

#

I am given a rocket and I need to figure out the values of Fmax, F0 and mass of the rocket by varying my thrust values and according to the a(F) function given in the screenshot, the acceleration varying in such a way.

I have attached the text file for the code for the right thruster (code is pretty much the same for the left thruster) and the F0 for both thrusters differ while the Fmax is the same.

The problem I get when plotting my results is crazy oscillatory (at least that is what I think it is) behaviour as I go through the thrust values. I have tried to instead use np.gradient but I am not very sure if that is a good approach for this.

This in turn will not allow me to obtain the values at a high enough accuracy and precision.

I have attached some images giving a clearer picture as to what my issue is.

arctic wedgeBOT
# sturdy stump

Please react with ✅ to upload your file(s) to our paste bin, which is more accessible for some users.

rich moth
sonic olive
#

What are the recommended methods and resources for studying mathematics relevant to data science?

sturdy stump
#

now here’s my follow up, how do i fix this, should i just use np.gradient?
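
On the np.gradient question: finite differences amplify measurement noise, which is one common cause of oscillating derivative plots. Here is a small sketch under the assumption that acceleration is being estimated from noisy velocity samples at a fixed thrust. The a(F) screenshot isn't visible here, so all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
true_a = 2.0                                     # made-up constant acceleration
t = np.linspace(0.0, 10.0, 200)
v = true_a * t + rng.normal(0.0, 0.05, t.size)   # noisy velocity samples

a_fd = np.gradient(v, t)          # pointwise finite differences: amplifies the noise
a_fit = np.polyfit(t, v, 1)[0]    # slope of a least-squares line: averages the noise out
```

If each thrust setting gives one roughly constant acceleration, fitting a line per setting and then fitting a(F) to those slopes may be steadier than differentiating point by point.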

naive river
sturdy stump
#

ooo new concepts for me

thorny geode
#

My random forests is dying

#

4 true negative 72 false positive

jaunty helm
thorny geode
#

It's based on 80/20 split, there's 2922 data total

#

I feel likes it's something to do with classification

jaunty helm
thorny geode
jaunty helm
thorny geode
thorny geode
thorny geode
#

Thanks a lot Purplys, now I will be sleeping

jaunty helm
thorny geode
#

Oh god yes thanks Purplys and Nahita

lapis sequoia
#

Hi.
I've always enjoyed coding and I'm already comfortable with python and building small things. But I've recently realized that in order to get hired as a developer, you need to specialize in a field. And after a bit of research, I find myself drawn towards data science in python. However, I think it's worth mentioning that my math skills are not really good. The potential is there, but I've never really studied math as I'm a high school dropout. So I'm seeking advice on whether I should dive into data science or not. I have a few questions:
• do you need to be good at math?
• what kind of background do you need ?
• what is the best way to learn ?
• what are the best resources to learn ?
I would deeply appreciate any advice and thank you all.

#

Feel free to ping me any time

limpid zenith
# lapis sequoia Hi. l've always enjoyed coding and I'm already comfortable with python and buil...

do you need to be good at math?
Yes, But it's something you can learn to be good at.

what kind of background do you need ?
Data scientists come from all sorts of background, but most share some computer science, math and stats background.

what is the best way to learn ?
That depends on what works best for you.

what are the best resources to learn ?
Depends on what areas you want to concentrate in. Paid options include DataCamp, which has very high quality courses. Then there are more formal approaches like college/uni. Free approaches include financial aid on Coursera, watching YouTube videos, or reading the documentation online.

lapis sequoia
#

Why exactly do I need to be good at math ? I’m sorry I’m asking too much but I just wanna be certain before I commit to it. So, why do you need to be good at math ? What do you do daily that requires math

limpid zenith
# lapis sequoia Why exactly do I need to be good at math ? I’m sorry I’m asking too much but I j...

Most of data science is writing code to compute statistics and plot things. This requires a deep understanding of the theory.

For instance, if you're an entry level data analyst you might need to work with dataframes, pivot tables, and compute various statistics, which means knowing what formulas to apply and when. If you're a data scientist then it's even more math-heavy and usually requires lots of rigorous experimentation, bias/variance tests, hypothesis testing and so on.

#

More advanced data science positions, like senior level data scientists or machine learning engineers also work with lots of differential geometry, calculus and information theory.

lapis sequoia
#

To be honest, that truly doesn’t sound like something I would be good at. I might have to take some time to consider it. However, if I don’t get into data science, what other fields do you recommend I specialize in ?

#

Is automation/scripting a field you can get a job with ? Because I really like that. I do build small stuff for myself sometimes

limpid zenith
#

It's not a matter of whether you'd be good at it, it's a matter of whether you want to be good at it. The tech industry is currently saturated and jobs like those are harder to come by these days.

lapis sequoia
#

I see. Well, thanks for your advice.

tawny raft
lapis sequoia
serene scaffold
lapis sequoia
#

How is the job market for it ? Now that I’m looking into it, I really like it

serene scaffold
lapis sequoia
#

I’m not in the US.

serene scaffold
lapis sequoia
#

Thanks. I’m gonna also look into data engineering. Any resource to start with ?

serene scaffold
#

I'm not sure. you should probably be comfortable with SQL and MongoDB

lapis sequoia
#

Thanks

tawny raft
# lapis sequoia Thanks for replying. What is data engineering specifically and how is it differe...

You can check out this video to see how different roles on a data team work together. Some of those roles might overlap if you're working at a small company.
https://www.youtube.com/watch?v=tyJ476aNCYU

Watch this visual, animated breakdown of how modern data teams really work — including data engineers, analysts, scientists, architects, and ML experts collaborating on real projects.
lapis sequoia
#

I see that. Thank you.

random nymph
#

Hi! Im curious on what some of your guys' favorite deep learning packages/tools are.

My current stack is pretty standard. PyTorch, Polars/Pandas, numpy, Sklearn, Scipy, etc..

I've had ~2 years of hands-on experience with both designing and training models (mostly time series and NLP related), and I'm wondering if there are any underrated or super useful tools that you recommend checking out.

agile cobalt
#

for some things I prefer jax over pytorch, other than that just whatever solves specific (frequently niche) problems

e.g. docker image annoyingly large? use onnx over torch
and not really specific to machine learning, but I like markitdown to ingest any files and marimo for prototyping

random nymph
#

thank you! these seem pretty cool

agile cobalt
random nymph
#

ive always used jupyter notebook, marimo looks crazy

lone gulch
#

I have 360 wedding photos, all ready and edited. I wondered why I couldn't create an AI model to edit them. I came up with a few ideas and approaches:

First, I thought about it and asked Gemini to do it. It suggested training the model on the photos one by one. However, the result was messed up because it was editing pixel by pixel. For example, one half of the face would be over-lit while the other half was under-lit. The training would take about four hours. (I didn't like that idea; it wasn't what I wanted.)

#

The second idea was to look for algorithms that adjust lighting and colors. There are also algorithms that measure the percentage of lighting and colors. So, what did I do? I wrote code that extracted that data from all 360 edited photos into a table and trained on it ("unsupervised learning") so that if I fed it data from an unedited image, it would predict ideal lighting and colors and apply them to the image. (The idea wasn't the best; the editing was weak.)

The third idea involved importing the images into Lightroom and changing the settings so they reverted to their original state. I then extracted the data, resulting in two files: x = unedited data, y = edited data. I tried training them using Random Forest Regressor, but the result was worse, especially in terms of lighting. The colors were somewhat good. (Here, I felt the problem was with the data itself, as there was a small amount of incorrect data, but I didn't think it would significantly affect the results.) So, the questions I want to understand are:

#

What's the best training method?
Does even a small amount of incorrect data affect the results?
Is this small amount of data the cause?
Is my approach to these steps sound? In your opinion, how would you rate my thinking of these alternative plans out of 10 (regardless of the project not yet being successful)?
And these are questions for those with experience 👇

Is it possible to train the model and, if the training is insufficient, build an interface where the model predicts and applies the edit and then displays the modified image? I would then have three options to click: the first, "No," means the image is ruined; the second, "Maybe"; and the third, "Yes," means it was edited perfectly. If I click "Yes," it saves the data to the table and trains on it?

I would appreciate any helpful information, and if you have any ideas, please leave a comment.

rich moth
#

I just discovered notebooklm . What an incredible learning tool.

iron basalt
#

(Numba killer)

random nymph
random nymph
#

this thing sounds insane

serene scaffold
#

How do you get to the open source part? The link requires you to submit your business email.

radiant pasture
#

I'm developing an algorithmic trading bot, so I'm wondering what you guys use in VS Code to visualize/analyse large datasets

slate trench
royal kraken
#

that's true, because Jupyter Notebook is very popular right now

serene scaffold
#

!warn @waxen crag your message was removed for advertising. And it's not really an open-source project if you have to sign up for something to be able to access the code.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied warning to @waxen crag.

lapis sequoia
agile cobalt
#

the biggest upside is having no hidden state; the execution order depends only on which cells reference variables defined in which cells. you cannot run things out of order, so it's much harder to end up with results different from what you'd get running it fresh after restarting the kernel

about half of that comes from their reactive code, half from preventing you from doing things that are generally considered a bad idea though
(you cannot re-assign variables in different cells, and are generally discouraged from mutating things)
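
The "order depends only on variable references" idea can be sketched with a plain dependency graph. This is not marimo's actual implementation, just an illustration with hypothetical cells, and it leans on the one-definition-per-variable rule mentioned above.

```python
from graphlib import TopologicalSorter

# Hypothetical cells: the variables each one defines and the ones it reads.
cells = {
    "plot":  {"defines": {"fig"},   "reads": {"clean"}},
    "load":  {"defines": {"raw"},   "reads": set()},
    "clean": {"defines": {"clean"}, "reads": {"raw"}},
}

# Map each variable to the single cell that defines it.
owner = {var: name for name, cell in cells.items() for var in cell["defines"]}

# A cell depends on whichever cells define the variables it reads.
deps = {name: {owner[v] for v in cell["reads"]} for name, cell in cells.items()}

# Run order follows the dependencies, not the order the cells were written in.
order = list(TopologicalSorter(deps).static_order())
```

Because every variable has exactly one owner, the graph is unambiguous, which is what makes "restart and run everything" give the same result as running interactively.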

#

it also comes with some built-in UI elements and you can toggle between the code and a dashboard/webapp-ish view when using them, kinda like having streamlit built into the notebook

lapis sequoia
snow marsh
bronze wyvern
#

hello, does anyone know about the ipywidgets library? I recently came across it. can anyone explain what it is and how it's used pls? is it just a UI that lets us set up some settings?

agile cobalt
bronze wyvern
radiant pasture
#

guys do you know why the data looks so ugly? wasn't it supposed to be in clean separated columns?

#

I just want it to look like this

#

it says that I got a total of 1 column, why is that ?
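
A common cause of "1 column" is a file whose delimiter isn't a comma, so everything lands in one wide column. Here is a small sketch with made-up semicolon-separated data. The real file's delimiter isn't visible here, so `;` is only an assumption.

```python
import io

import pandas as pd

# Made-up sample: fields are separated by semicolons, not commas.
text = "date;open;close\n2024-01-02;10.5;10.9\n2024-01-03;10.9;11.2\n"

wrong = pd.read_csv(io.StringIO(text))            # default sep="," gives one wide column
right = pd.read_csv(io.StringIO(text), sep=";")   # matching delimiter gives three columns
```

Opening the file in a text editor usually shows which separator is in use.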

random nymph
radiant pasture
#

thanks man

barren wadi
#

Has anyone done anything using conformal prediction before?

prisma wing
#

just found this

#

better and faster than autogluon

slate trench
serene scaffold
slate trench
rich moth
noble spear
#

Can someone who knows a lot about these things help me with an AI/vision problem?

waxen kindle
#

Just ask

noble spear
#

I have an AI model that can detect eggs and tell if they are clean or dirty. On the dirty eggs I want to see how much dirt is on them, so I would like to make a mask of only the dirt. I have tried a few things but the results are not that great.

Does anyone know how I could make a precise mask of the dirt on the egg? Or have any ideas?

It's also important that the stamps on the eggs are ignored in the mask. I have no idea how to do this either, or if it's even possible.

noble spear
agile cobalt
#

I feel like some simple thresholding should work?

agile cobalt
#

yeah even just asking AI to write some OpenCV code it seems like it should do the job
(code)

noble spear
#

And it needs to be specific, and detect everything that's dirty and should not be on the egg

agile cobalt
#

that was meant to be at most a starting point, not me doing your entire job for you

#

you can test a few different strategies and see what works - it'll probably involve a ton of trial and error one way or the other

noble spear
#

I have tried some things myself as well and got to this point (see picture). It looks really doable to filter it out from here. I tried to filter out green with the inRange function, but I don't get anything out of it. The image is in HSV.
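
One thing that often gives an empty mask: for 8-bit images OpenCV stores hue as 0-179 (degrees divided by 2), so bounds picked on a 0-360 scale miss everything. Below is a NumPy-only sketch of the same inclusive range test that `cv2.inRange` performs. The pixel values and bounds are made up, and `in_range` is a hypothetical stand-in.

```python
import numpy as np


def in_range(hsv, lower, upper):
    """Inclusive per-channel range test on an HSV image, returning a 0/255 mask."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = (hsv >= lower) & (hsv <= upper)
    return inside.all(axis=-1).astype(np.uint8) * 255


# Made-up 1x3 image in OpenCV's 8-bit HSV convention (H: 0-179, S and V: 0-255).
hsv = np.array([[[60, 200, 200],     # green: a hue of 120 degrees is stored as 60
                 [10, 200, 200],     # orange-brown
                 [60, 20, 240]]],    # pale pixel with a greenish hue but low saturation
               dtype=np.uint8)

mask_ok = in_range(hsv, (40, 50, 50), (80, 255, 255))        # bounds on the 0-179 scale
mask_empty = in_range(hsv, (100, 50, 50), (140, 255, 255))   # bounds picked on a 0-360 scale
```

With real images, `cv2.inRange(hsv, lower, upper)` should behave the same way, and the saturation bound is what keeps pale shell pixels out of the mask.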

noble spear
radiant pasture
#

guys what is going on ??

serene scaffold
agile cobalt
radiant pasture
agile cobalt
#

if you are just trying to download/install packages, you should use tools like uv or pip that manage downloading automatically for you instead of downloading these files yourself

radiant pasture
#

yes, I use pip

#

what is the actual difference between these two ?

subtle lotus
#

For most cases, i recommend Python environments. It's simpler, less hassle.

radiant pasture
zenith nova
#

!ban 719846291332661259 spam

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied ban to @waxen crag permanently.

glossy tendon
#

Hi all, I have been trying to use dask and a notebook to process 560 gzip files of data, 100mb compressed and 524mb uncompressed each, on my computer which usually has 10-13 GB available RAM, is there a way to split the aggregates at the middle layer to write to their own files? I'm sick and tired of constantly having to cancel runs because of OOM

#

I know of repartition, maybe I'm just misreading this graph??
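
One way to keep memory bounded, whatever the scheduler does, is to reduce each file to a small partial aggregate on disk and combine those at the end. Here is a stdlib sketch assuming the files are gzipped CSVs and the aggregate is a per-key sum. The column names and helper names are hypothetical, and other aggregates (means, distinct counts) would need different partials.

```python
import csv
import gzip
import json
from collections import Counter
from pathlib import Path


def aggregate_one(gz_path: Path, out_dir: Path, key: str, value: str) -> Path:
    """Sum `value` per `key` for one gzip CSV and write the partial result to disk."""
    totals = Counter()
    with gzip.open(gz_path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):      # streams rows; the file is never fully in memory
            totals[row[key]] += float(row[value])
    out_path = out_dir / (gz_path.name + ".partial.json")
    out_path.write_text(json.dumps(totals))
    return out_path


def combine(partials) -> Counter:
    """Merge the small per-file partial sums into the final aggregate."""
    final = Counter()
    for path in partials:
        final.update(json.loads(Path(path).read_text()))   # Counter.update adds values
    return final
```

Because each partial is written before the next file is read, a cancelled run can resume from the partials already on disk instead of starting over.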

digital valley
#

“Hi, can anyone tell me which project I should put on my resume?”

#

in Data analysis

glossy tendon
#

the project you're most proud of or most relevant to what job you want

stable mural
#

Could someone recommend some books covering data analysis for beginners

radiant pasture
#

guys I dont know why but Its taking too long to connect with python

agile cobalt
#

it might not work well with 3.14 yet, try using 3.13 or even 3.12

molten badger
#

should i learn neural networks from scratch or should i directly learn tensorflow

random nymph
#

You can technically get away with just using a python library and not know how everything works for some basic projects, but diagnosing your problems and improving your models will be very hard

#

For coding from scratch, there are things like gradient calculation and optimizers that might make it a little harder and more confusing for beginners to implement, which is why I suggest just jumping to the library

#

Those libraries take care of it for you

molten badger
random nymph
#

I’d say from scratch is good if you want to dive super deep into the functionality and implementation, but if you just want to learn about architecture components and specific applications, libraries will probably help you a little more there

lapis sequoia
#

guys is this where the data analyst chat is?

serene scaffold
#

!warn 1401906940866465848 Your messages where you ask for work have been removed, as this is against the rules.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied warning to @craggy creek.

summer mist
#

Does anyone know a good course for pandas?

serene scaffold
summer mist
agile cobalt
#

also read the official user guide if you haven't yet

lapis sequoia
#

guys does anybody know what I should master in Python to become a data analyst?

random nymph
#

These libraries were more than enough for me to get my first work experience at least

lapis sequoia
#

what is easier, data analysis or data science?

waxen kindle
#

The one you prefer

lapis sequoia
#

i mean i want the easiest one only

radiant pasture
#

can anyone help me with this error ?

#

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyarrow
Failed to build pyarrow
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> pyarrow

waxen kindle
opaque condor
#

I'm finally happy

#

I can add bounding boxes to my dataset using LabelImg

sacred tusk
#

Lately I’ve been noticing something funny:
the same intuition I use when I read people in real life seems to work surprisingly well when I work with AI systems.

It’s like… every model has a ‘personality rhythm’.
Every prompt has an emotional temperature.
Every conversation has a hidden structure.

I don’t approach AI academically — I feel it first.
Behaviour, resonance, stability, the way the system “breathes”…
and only later I figure out the technical name behind it.

It’s a strange skill to describe, but it lets me spot patterns fast, tune prompts intuitively, and stabilise behaviour before it breaks.

Does anyone else work with AI more through intuition than textbooks?
Curious to hear how you translate your human instincts into AI work.

short saffron
#

hi

radiant pasture
#

What is the best model in Ollama for coding?

#

But at the same time not too big, because I only have 1 TB of storage and 16 GB of DDR5 RAM

spiral kindle
#

Hello everyone, is anybody ready to collaborate on a project that I can get hands-on learning from?

Please let me know, I'm open.

#

I like to learn from people guys.

surreal terrace
#

let's take DeepSeek Coder for example. looking at your storage and RAM, I think you'll have maybe an 8 or 12 GB VRAM GPU, or even more (I'm not familiar with the latest GPUs). you can for sure run an 8B or even 12B parameter model very easily, but if you go for bigger ones like the 33B you'll have to offload part of the work to your RAM and CPU too, which could be slow but should still run. again, I'm no expert
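
A back-of-the-envelope way to sanity check those sizes is parameter count times bits per weight. This is only a sketch: the helper name `estimate_weights_gb` is made up, the 4-bit and 16-bit figures are assumed quantization levels, and it counts the weights only (context cache and runtime overhead come on top).

```python
def estimate_weights_gb(params_billions, bits_per_param):
    """Rough size of the model weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Assumed quantization levels, for illustration only.
for params in (8, 12, 33):
    for bits in (4, 16):
        size = estimate_weights_gb(params, bits)
        print(f"{params}B params at {bits}-bit: about {size:.1f} GB")
```

By this estimate a 33B model at 4 bits already needs about 16.5 GB for the weights alone, which is why it would spill out of a typical consumer GPU into system RAM.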

toxic palm
#

Hi,
I am interested in learning AI. I checked out YouTube, Udemy etc. to find a course.
I looked into many suggestions from Reddit, but could not find something that explains things in a simple way with pictures and sample programs.
If you know any, please let me know.

dreamy latch
#

@toxic palm a specific part of the AI field or everything that came out in the last decade? I'm a newcomer too, but all I can say is that I watched Sebastian Raschka's LLM videos

dreamy latch
#

seems like Raschka's video tutorials are split into easy chunks, and his book has a bit more detail. but maybe there's better. how solid on math are you?

surreal terrace
toxic palm
surreal terrace
# toxic palm kind of what is AI, then writing some small AI programs etc

mhm that's cool. if you just want the basics then I'd say watch 3Blue1Brown's video (YouTube) on large language models, it's pretty simple to understand for beginners. there is also a person named Andrew Ng, who is really good at this. he has courses on Coursera, and although they are paid you can still watch all the videos of those courses. and if you want to understand it more visually then there is this website called mlu-explain.github.io, it explains the ways we train these "AI"

toxic palm
toxic palm
dreamy latch
#

@toxic palm it's the mainstream trend these days. then there's machine learning (various kinds of neural networks, deep learning) ... long ago there was GOFAI (expert systems, prolog)

#

I find the semantic vector embedding idea nice

toxic palm
dreamy latch
#

i'm too new to answer that sorry

#

start with the answer above (Andrew Ng), you'll see

quasi pier
#

Is anyone in here familiar with deep reinforcement learning?
I'm trying to solve highway_env using DQN and am struggling a lot.

quartz plover
#

hey y'all

#

anybody here learning about data engineering?

opaque condor
#

I don't know yet. Also, has anyone made a simulation with life in it?

agile cobalt
#

"with life in it"? what exactly do you mean by that

opaque condor
#

A network that can pass traits on like a genetic algorithm but goes through the same processes similar to animals or humans in any regard

serene scaffold
opaque condor
#

Good point

empty cipher
#

I wanna build a tool compatible with Next.js... Can someone guide me on how to build an AI that can check 10 PDF files, each file having one page, either pulled from some database or uploaded by the user?
What should I go for, an agent or some fine-tuned AI model?

raven swift
modern copper
#

hallo.
does this channel count as scientific computing?

serene scaffold
modern copper
#

epic.

#

gang NumPy slicing is SOO hard to get in my head.
wdym the last index is 'exclusive'😭🙏
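
A minimal sketch of what "exclusive" means for a slice, assuming NumPy is installed (the array and variable names are just for illustration): `a[start:stop]` includes `start` and goes up to but not including `stop`.

```python
import numpy as np

a = np.arange(10)        # elements 0 through 9

# start=2 is included, stop=5 is NOT: you get indices 2, 3, 4.
first = a[2:5]
print(first)

# Handy consequence 1: the slice length is simply stop - start.
print(len(first) == 5 - 2)

# Handy consequence 2: a[:k] and a[k:] split the array cleanly,
# with no overlap and nothing missing at the boundary.
left, right = a[:4], a[4:]
print(left, right)
```

Plain Python lists and `range` follow the same half-open convention, so the intuition carries over.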

serene scaffold