#data-science-and-ml | Python | Page 176

pseudo loom Nov 8, 2025, 4:22 AM

#

C++

soft ermine Nov 8, 2025, 4:22 AM

#

ye

pseudo loom Nov 8, 2025, 4:22 AM

#

Cool. So

#

Hm js?

soft ermine Nov 8, 2025, 4:23 AM

#

ye

pseudo loom Nov 8, 2025, 4:23 AM

#

...

#

U know everything

soft ermine Nov 8, 2025, 4:23 AM

#

?

pseudo loom Nov 8, 2025, 4:23 AM

#

Wow

soft ermine Nov 8, 2025, 4:23 AM

#

not everything

pseudo loom Nov 8, 2025, 4:23 AM

#

Mostly

#

So can u actually make a full stacked ai model

soft ermine Nov 8, 2025, 4:24 AM

#

full stacked? wat that mean for ai?

pseudo loom Nov 8, 2025, 4:24 AM

#

Ye

soft ermine Nov 8, 2025, 4:24 AM

#

no like. wdym full stacked. like just a working neural network?

pseudo loom Nov 8, 2025, 4:24 AM

#

Kinda.

soft ermine Nov 8, 2025, 4:24 AM

#

then ye i can make any neural net i want and train it

pseudo loom Nov 8, 2025, 4:24 AM

#

U can actually build stuff and become rich perhaps

soft ermine Nov 8, 2025, 4:25 AM

#

neural nets r not that complicated

pseudo loom Nov 8, 2025, 4:25 AM

#

Like ai nowadays is going in a big boom

soft ermine Nov 8, 2025, 4:25 AM

#

its gonna plateau soon

pseudo loom Nov 8, 2025, 4:25 AM

#

Rly?

soft ermine Nov 8, 2025, 4:25 AM

#

ye

pseudo loom Nov 8, 2025, 4:25 AM

#

🙁

#

I felt like in the future the world might revolve around ai

soft ermine Nov 8, 2025, 4:26 AM

#

the models havent changed mathematically for decades. the experts r going in to big companies which are run by old ppl so we are having no technical advancements for it.

pseudo loom Nov 8, 2025, 4:26 AM

#

It actually makes sense

#

Sad

#

Who cares. Python is a great language though ain't it?

soft ermine Nov 8, 2025, 4:26 AM

#

ai is not intelligent. the biggest models dont even think or reason, they just predict the next word in a sentence based on probability

pseudo loom Nov 8, 2025, 4:27 AM

#

Hm

#

Can't predict what would happen still. I hope it doesn't end up like that

soft ermine Nov 8, 2025, 4:28 AM

#

ye u dont need to predict anything. its just how limits work. ai will plateau soon because the training pool will be less effective over time and the predicting ability will stagnate

pseudo loom Nov 8, 2025, 4:28 AM

#

o-0

soft ermine Nov 8, 2025, 4:29 AM

#

ye so the next step is either big ai companies financing research into new models or throwing endless money for very little gain

pseudo loom Nov 8, 2025, 4:29 AM

#

Many people r learning for the sole reason of making better ai models. Like my friends r learning python for the same reason

#

Yo just wait a min I will come back

soft ermine Nov 8, 2025, 4:30 AM

#

ye they will all unanimously fall into the trap of doing exactly what they were taught to do, then if they make it to a company the company will tell them to copy the big companies and they will get stuck making another chatgpt. which is just a flawed concept

pseudo loom Nov 8, 2025, 4:31 AM

#

Its just like that

#

When people in 2017 thought ai was gonna plateau. 💥 Chatgpt claude and gemini

soft ermine Nov 8, 2025, 4:32 AM

#

no

#

they just got financed is all

pseudo loom Nov 8, 2025, 4:32 AM

#

Its just like that. Limits can come in some fields. Who knows, perhaps we get an AI that thinks for itself instead of making predictions

#

U can't rly predict

soft ermine Nov 8, 2025, 4:33 AM

#

hundreds of millions of dollars and the ais cant even make reliable video slop yet...

pseudo loom Nov 8, 2025, 4:33 AM

#

I understand limits have occurred in some fields, but some other fields still r in the beginning phase

soft ermine Nov 8, 2025, 4:33 AM

#

so idk what you believe, but its very obvious that these ais are going nowhere

pseudo loom Nov 8, 2025, 4:33 AM

#

Let's see 🙂

#

Kk bbye u seem cool

soft ermine Nov 8, 2025, 4:34 AM

#

k bye

pseudo loom Nov 8, 2025, 4:34 AM

#

Gotta have breakfast

pseudo loom Nov 8, 2025, 4:58 AM

#

I am back

soft ermine Nov 8, 2025, 5:06 AM

#

pseudo loom I am back

wb

proper urchin Nov 8, 2025, 5:50 AM

#

so i made this ai

#

from random import choice, choices, randint, random, randrange

populationamount = 100
mutationrate = 0.15
goalnum = randint(1, 1000) / 10
print(f"Goal: {goalnum}")
chars = "0123456789+-/"
operators = "+-/"
randpopamount = 10
addcharrate = 0.05
delcharrate = 0.05

def randexpr(length = 12):
    # random string of digits and operators; most of these will not even parse
    return "".join(choice(chars) for x in range(length))

def safeeval(expr):
    # eval is only tolerable here because expr is built from chars above
    try:
        val = eval(expr)
        # treat absurdly large magnitudes as invalid
        if abs(val) > 1e9:
            return float("inf")
        else:
            return val
    except Exception:
        # syntax errors, division by zero, leading zeros, ...
        return float("inf")

def mutateexpr(expr):
    expr = list(expr)
    newexpr = []
    for char in expr:
        if random() < mutationrate:
            # one weight per entry of chars: 10 digits, then 3 operators
            newexpr.append(choices(chars, weights = [8] * 10 + [4] * 3, k = 1)[0])
        else:
            newexpr.append(char)

    if random() < addcharrate and len(expr) < 20:
        pos = randrange(len(newexpr))
        newexpr.insert(pos, choice(chars))

    if random() < delcharrate and len(expr) > 4:
        pos = randrange(len(newexpr))
        del newexpr[pos]
    return "".join(newexpr)

def crossover(a, b):
    # single-point crossover: head of a, tail of b
    cut = randrange(len(a))
    return a[:cut] + b[cut:]

def tourney(pop, k = 5):
    # tournament selection: best (lowest fitness) of k random individuals
    group = [choice(pop) for x in range(k)]
    return min(group, key = getfitness)

fitnesscache = {}
# (patterns, penalty): an expression containing any pattern of a group gets that penalty once
badpats = [
    ([f"{i}/{i}" for i in range(1, 10)], 50),
    ([str(goalnum + i) for i in range(-9, 9)], 100),
    (["0+", "+0", "-0", "0/", "0*", "0", "0*"], 100),
    (["1*", "*1", "/1"], 75),
    (["++", "+-", "-+", "/+", "/-"], 50),
    (["//"], 25)
]

def getfitness(expr):
    # lower is better: distance to the goal plus penalties
    if expr in fitnesscache:
        return fitnesscache[expr]
    diff = abs(safeeval(expr) - goalnum)

#

    # punishments
    for outerpat in badpats:
        for innerpat in outerpat[0]:
            if innerpat in expr:
                diff += outerpat[1]
                break

    fitnesscache[expr] = diff
    return diff

def properformat(expr):
    # put spaces around operators for printing
    newexpr = ""
    for char in expr:
        if char in operators:
            newexpr += f" {char} "
        else:
            newexpr += char
    return newexpr

population = [randexpr() for _ in range(populationamount)]
generation = 0

while True:
    generation += 1
    best = min(population, key = getfitness)
    if safeeval(best) == goalnum:
        print(f"Achieved target number on generation {generation}")
        print(f"{properformat(best)} = {safeeval(best)}")
        break
    else:
        print(f"Best of generation {generation}: {best} = {safeeval(best)}")
    newpopulation = []
    # most of the next generation comes from crossover + mutation
    for _ in range(populationamount - randpopamount):
        parent1 = tourney(population)
        parent2 = tourney(population)
        child = mutateexpr(crossover(parent1, parent2))
        newpopulation.append(child)
    # a few fresh random expressions keep some diversity
    for _ in range(randpopamount):
        newpopulation.append(randexpr())
    population = newpopulation

#

ugh dc is formatting my stuff ;-;

soft ermine Nov 8, 2025, 8:26 AM

#

um

#

thats not an ai

pseudo loom Nov 8, 2025, 9:30 AM

#

thats cool bacon

#

but ya its not technically an ai model, but a random guessing program

#

its still cool

#

btw u have committed a syntax error at line 86 i think. in is invalid syntax

untold bloom Nov 8, 2025, 11:50 AM

#

that's far from a random guessing program, it's indeed an intelligent one

#

https://en.wikipedia.org/wiki/Genetic_algorithm

velvet ice Nov 8, 2025, 1:47 PM

#

Hi, I'm trying to make a predicting model which uses the opening price of a stock and returns the volume (how many people buy it.)

This image above is the relation between Opening price (X-axis) and volume (Y-Axis). I was wondering what regression model I should use in order to get the most accurate result.

waxen kindle Nov 8, 2025, 3:07 PM

#

seems non-correlated, at least below ~500

arctic smelt Nov 8, 2025, 3:36 PM

#

velvet ice Hi, I'm trying to make a predicting model which uses the opening price of a stoc...

I'd say try the log transform on your volume data. It's the easiest thing to do and might just work. If that makes the relationship linear, run a linear regression and see what you get
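A minimal sketch of that suggestion, assuming made-up data in which volume decays roughly exponentially with the opening price (the real dataset from the image is not available here, so `open_price`, `volume` and the noise level are all hypothetical). It fits a straight line to log(volume) and compares the R² with a fit on the raw volume:

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: exponential decay with multiplicative noise.
rng = random.Random(0)
open_price = [rng.uniform(10, 500) for _ in range(200)]
volume = [1e6 * math.exp(-0.005 * p) * math.exp(rng.gauss(0, 0.1)) for p in open_price]

# Fit on the log-transformed target.
log_volume = [math.log(v) for v in volume]
slope, intercept = fit_line(open_price, log_volume)
r2_log = r_squared(open_price, log_volume, slope, intercept)

# Fit on the raw target, for comparison.
raw_slope, raw_intercept = fit_line(open_price, volume)
r2_raw = r_squared(open_price, volume, raw_slope, raw_intercept)

def predict_volume(price):
    """Undo the log transform to get a prediction in the original units."""
    return math.exp(slope * price + intercept)

print(f"R^2 on log(volume): {r2_log:.3f}, R^2 on raw volume: {r2_raw:.3f}")
```

If the scatter is as uncorrelated as it looks below ~500, the R² will stay low after the transform too, which is itself a useful answer.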

vale field Nov 8, 2025, 4:48 PM

#

Quick question, I'm trying to interpret my results for cosine similarity. I know that it is a measure how similar e.g. a document is to other documents. Would most similar pairs be values that are greater than 0.5 and values that are 1?

arctic smelt Nov 8, 2025, 5:16 PM

#

vale field Quick question, I'm trying to interpret my results for cosine similarity. I know...

A good score is completely relative to what you're analyzing. In some datasets, especially in text analysis where word frequencies can't be negative, the entire scale is just 0 to 1. In that case, 0.5 is halfway to being identical, which might be significant or might be total garbage, depending on the context

#

A value of 1 is the most similar

soft ermine Nov 8, 2025, 5:17 PM

#

hoi

arctic smelt Nov 8, 2025, 5:17 PM

#

Anything less than that depends entirely on your specific use case and data
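To make the 0-to-1 range concrete, here is a small sketch with hypothetical word-count vectors (the four documents and their counts are invented for illustration). With non-negative counts the cosine similarity cannot go below 0, identical directions give 1, and documents with no shared words give 0:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word counts over a shared 4-word vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]  # same direction as doc_a, just a longer document
doc_c = [0, 0, 3, 1]  # shares no words with doc_a
doc_d = [1, 1, 1, 0]  # partial overlap with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
sim_ad = cosine_similarity(doc_a, doc_d)
print(sim_ab, sim_ac, sim_ad)
```

Whether a middling value such as the one for `doc_a` and `doc_d` counts as "similar" still depends on how the other pairs in the corpus score.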

soft ermine Nov 8, 2025, 5:20 PM

#

noice

opaque condor Nov 8, 2025, 9:21 PM

#

Does anyone have a neural mat that can generate useful structures?

serene scaffold Nov 8, 2025, 9:31 PM

#

opaque condor Does anyone have a neural mat that can generate useful structures?

Is neural mat not a typo?

opaque condor Nov 8, 2025, 9:36 PM

#

It was a typo and had to keep my eyes on the road for a second or two

waxen kindle Nov 8, 2025, 9:40 PM

#

please ask again after driving

serene scaffold Nov 8, 2025, 9:51 PM

#

opaque condor It was a typo and had to keep my eyes on the road for a second or two

Setting aside how dangerous that is, there's no way we can have a productive conversation where you will learn something while you're driving

opaque condor Nov 8, 2025, 9:53 PM

#

I turned onto a side street to ask the question

waxen kindle Nov 8, 2025, 9:54 PM

#

doesn't matter, please drive safely and will talk about that later

serene scaffold Nov 8, 2025, 10:03 PM

#

There's no way we can have a productive conversation ... While you're driving

soft ermine Nov 9, 2025, 1:19 AM

#

opaque condor Does anyone have a neural mat that can generate useful structures?

i believe openai open sources weights

woven prairie Nov 9, 2025, 7:43 AM

#

Has anyone worked with langgraph building an agent.

grand minnow Nov 9, 2025, 7:50 AM

#

woven prairie Has anyone worked with langgraph building an agent.

why do you ask?

supple spruce Nov 9, 2025, 8:01 AM

#

I have question.
for AI learning, is python essential?

grand minnow Nov 9, 2025, 8:03 AM

#

supple spruce I have question. for AI learning, is python essential?

you can also use other languages to do Machine Learning

supple spruce Nov 9, 2025, 8:04 AM

#

but python also okay?

grand minnow Nov 9, 2025, 8:04 AM

#

supple spruce but python also okay?

yes

opaque condor Nov 9, 2025, 8:04 AM

#

supple spruce I have question. for AI learning, is python essential?

no you could use C/C++, java but python is easy syntax

#

yes of course

supple spruce Nov 9, 2025, 8:04 AM

#

ohk

#

thanks

red flint Nov 9, 2025, 1:55 PM

#

Can anyone send a whole roadmap of AI? I mean the full path from scikit-learn to TensorFlow to the latest LLM concepts

pseudo loom Nov 9, 2025, 2:03 PM

#

Try finding some course on Coursera or Udemy. They can be really helpful. I suggest Andrew Ng's full ai with python course on Coursera

opaque condor Nov 9, 2025, 4:41 PM

#

Is there a PyTorch book for beginners?

vale field Nov 9, 2025, 4:48 PM

#

Im trying to learn more about BERT. I wanna ask which model should i use for generating BERT embeddings for sentences all-MiniLM-L6-v2 or all-mpnet-base-v2 or something else?

pseudo loom Nov 9, 2025, 5:32 PM

#

In my opinion you should use all-mpnet-base-v2, as it's a top performer on quality among the smaller models. all-MiniLM-L6-v2 is a highly recommended choice when you want a balance of speed and good accuracy with less memory, while bge-en-icl is noted as having top performance on various benchmarks, but with a larger size.

#

Building LLMs with pytorch: step by step by Anand Trivedi is a good book for pytorch. You could find more books on Amazon

vale field Nov 9, 2025, 5:37 PM

#

pseudo loom Building LLMs with pytorch: step by step by Anand Trivedi is a good book for pyt...

Oh ok thanks

charred light Nov 9, 2025, 10:11 PM

#

Ollama with RAG is still non-deterministic even with temperature set to 0 right?

lime grove Nov 10, 2025, 5:22 AM

#

having some fun with the airline passengers prediction data set, and I got this with a naive LSTM implementation in pytorch

#

#

1 to 1 prediction, 0.67 train test split, 2000 epochs.

#

However, a single MinMax normalization line results in this

#

same hyperparameters

#

Now, the thing that I am finding interesting is this: the green line, which represents the test set, is nearly identical to the original dataset. The MinMax range was chosen to be (0, 0.1)

#

It is not as good if I normalize it as (0, 1), which is commonly done. So the absolute magnitude of the training errors matters in the end. The error is directly dependent on the absolute scale of the y-axis
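A small sketch of the scale dependence described above, using a few illustrative monthly counts and a hypothetical prediction that is 3 passengers too high everywhere (neither is the actual model output). The same miss gives a mean squared error 100 times smaller under a (0, 0.1) range than under (0, 1). That changes the size of the loss and its gradients; it is only an assumption that this is what made the LSTM fit better:

```python
def make_minmax_scaler(reference, feature_range=(0.0, 1.0)):
    """Return a function that maps values into feature_range using reference's min and max."""
    lo, hi = feature_range
    vmin, vmax = min(reference), max(reference)
    span = vmax - vmin

    def scale(values):
        return [lo + (v - vmin) * (hi - lo) / span for v in values]

    return scale

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

series = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]  # illustrative counts
prediction = [v + 3.0 for v in series]  # hypothetical model output

scale_unit = make_minmax_scaler(series, (0.0, 1.0))
scale_tenth = make_minmax_scaler(series, (0.0, 0.1))

mse_unit = mse(scale_unit(series), scale_unit(prediction))
mse_tenth = mse(scale_tenth(series), scale_tenth(prediction))
print(mse_unit, mse_tenth, mse_unit / mse_tenth)
```

Comparing runs on different scales is fairer after mapping predictions back to the original units.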

lime grove Nov 10, 2025, 5:50 AM

#

and yes, did this with Jupyter, the hated notebook.

rich moth Nov 10, 2025, 7:18 AM

#

What if you could turn ML artifacts into proof-carrying objects?

#

Like lets say, for any given dataset, model or run a small, deterministic and verifiable record gets produced that says, this is what i am, this is how i was computed and here how you can independently verify that I'm not lying.

torpid quartz Nov 10, 2025, 8:21 AM

#

Running a model on my Mac even with metal takes an extremely long time (13 sec or more), but with the webgpu demo on chrome, exact model it has sub-second latency… anyone know what might be causing this?

#

It shouldn’t take this long as it’s meant to be very fast even on iOS

vast pond Nov 10, 2025, 8:52 AM

#

this is just a silly question but whats happening here? why are those two not interleaving

wooden sail Nov 10, 2025, 10:05 AM

#

vast pond this is just a silly question but whats happening here? why are those two not in...

what do you mean by not interleaving?

#

make_moons generates a dataset for classification, so what you see are the two classes. since noise=0, the two classes are clearly separated

vale field Nov 10, 2025, 10:58 AM

#

Can someone please help me with my post in #1035199133436354600 ???

dusty valve Nov 10, 2025, 1:09 PM

#

vast pond this is just a silly question but whats happening here? why are those two not in...

67

pseudo loom Nov 10, 2025, 1:46 PM

#

Hmm

mellow vector Nov 10, 2025, 2:01 PM

#

So I'm interpolating values for NaN's and I'm not sure if this instructor is just saying, "Hey, you can deal with NaNs this way!" or "Hey, you can synthesize data for training this way!" Is it unheard of to train on available data to generate missing values to then train on?

agile cobalt Nov 10, 2025, 2:10 PM

#

mellow vector So I'm interpolating values for NaN's and I'm not sure if this instructor is jus...

not unheard of, but there are a bunch of downsides like

can reinforce biases
increase the risk of data drift
many real datasets are made by concatenating different datasets ; the distribution in each of them would be different, so you may need to train a model per data source, or give up if a given source does not contain that field for any records at all

oftentimes it's better to just let the model figure it out instead of layering a model on top of another model

mellow vector Nov 10, 2025, 2:19 PM

#

man, the precision that makes up stats still perplexes me sometimes... I don't have a formal stats education and it would be so easy to look at two distributions and just say "eh, close enough"

lime grove Nov 10, 2025, 2:59 PM

#

By interpolation do you actually mean imputation?

#

I feel that using the word "interpolation" leads to a mental dead end. If you start thinking of filling gaps in the data as "imputation" you'll find a rich and extremely challenging literature on this theme

#

Because, for example, preserving the statistical properties of the data set is important, and this isn't a solved problem, and is an area of active research. For a time series, for instance, there are techniques that consist of identifying distributions characteristic to the region around the gap, and then taking a random sample.
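A deliberately crude sketch of the "sample from the region around the gap" idea, not a method taken from the literature: each missing value is replaced by a random draw from the observed values within a few steps of it. The function name, the `window` parameter and the example series are all hypothetical:

```python
import random

def impute_local_sample(series, window=3, seed=0):
    """Fill None gaps by sampling from observed values within `window` steps of the gap."""
    rng = random.Random(seed)
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            lo = max(0, i - window)
            hi = min(len(series), i + window + 1)
            nearby = [v for v in series[lo:hi] if v is not None]
            if nearby:  # leave the gap alone if nothing was observed nearby
                filled[i] = rng.choice(nearby)
    return filled

# Hypothetical series with a level shift in the middle and two gaps.
observed = [1.0, 1.2, None, 0.9, 1.1, 10.0, None, 9.8, 10.2]
filled = impute_local_sample(observed, window=2)
print(filled)
```

Unlike linear interpolation, a draw like this keeps some of the local spread instead of smoothing it away, at the cost of adding noise.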

mellow vector Nov 10, 2025, 3:08 PM

#

Yeah I'm somewhat familiar with the different methods of filling in the gaps, I was just a bit surprised to have it presented as it was. I wouldn't intuitively consider training a model on synthetic data.

lime grove Nov 10, 2025, 3:09 PM

#

I don't see an issue with synthetic data

#

The question is really about the properties that synthetic data has

#

As long as the model that generated that synthetic data has some kind of correspondence with what you're trying to ultimately model

#

Take the stock market: a good question to ask about the model that generated the synthetic data set is how well it represents the actual market. Probably not very well, but I hope you get the idea I'm trying to present

#

Ideally, synthetic data inserted into real data should be invisible insofar the final result is concerned

mellow vector Nov 10, 2025, 3:16 PM

#

It's just strange to treat a prediction like a feature for the first time I guess ¯_(ツ)_/¯

lime grove Nov 10, 2025, 5:10 PM

#

what do you mean - a prediction like a feature?

vale field Nov 10, 2025, 5:39 PM

#

Quick question, I'm supposed to be making a heatmap of my document similarity results, and when I was calculating the similarity results, I used 2000 of the documents. When I tried to make a heatmap, e.g. for 50 documents, it was not interpretable whatsoever. I just wanna ask, does the number of documents matter when making a visualisation? I'm not sure what people do when they need to make visualisations and they are working with a lot of data. I've been using the top N documents so far, but I still feel uncomfortable doing it this way.

torpid quartz Nov 10, 2025, 8:05 PM

#

torpid quartz Running a model on my Mac even with metal takes an extremely long time (13 sec o...

Anyone have any thoughts on this?

unkempt apex Nov 10, 2025, 8:17 PM

#

torpid quartz Anyone have any thoughts on this?

ask GPT first

torpid quartz Nov 10, 2025, 8:17 PM

#

unkempt apex ask GPT first

gng are we serious and also I did

unkempt apex Nov 10, 2025, 8:18 PM

#

torpid quartz gng are we serious and also I did

gng ??

torpid quartz Nov 10, 2025, 8:18 PM

#

unkempt apex gng ??

gang

unkempt apex Nov 10, 2025, 8:19 PM

#

lol

lime grove Nov 10, 2025, 8:36 PM

#

vast pond this is just a silly question but whats happening here? why are those two not in...

they are not "interleaving" because the data set is designed that way. Those sklearn.datasets are meant as training tools, with which to test clustering algorithms. In this example you can see how it might challenge centroid-based approaches, like k-means, but not density-based ones such as DBSCAN

lime grove Nov 10, 2025, 8:38 PM

#

vast pond this is just a silly question but whats happening here? why are those two not in...

My main complaint about those scikit training datasets is that they don't have a good assortment of high dimensional datasets, which would be useful for understanding the dimensionality-dependence of hyperparameters

#

the usual question with clustering, which is: how much data do you need to resolve a cluster in N dimensions, as a function of N?

obsidian talon Nov 11, 2025, 7:35 AM

#

mellow vector So I'm interpolating values for NaN's and I'm not sure if this instructor is jus...

There is, but there are different types of "missingness". Depending on the type, you can potentially deal with it algorithmically and impute with something like KNN imputation or MICE, but those make assumptions about the type of missingness present

obsidian talon Nov 11, 2025, 7:37 AM

#

vale field Quick question, i'm supposed to be making heatmap of my document similarity resu...

Usually with extremely high cardinality like that, thats the easiest way to do it. Otherwise you can try clustering based on semantic similarities, but its computationally expensive with that much data. Are the documents all related?

torpid quartz Nov 11, 2025, 8:40 AM

#

Running llms and vlms on my Mac are very slow even with metal, even in webgpu it’s getting sub second response times but trying to run it manually it takes over 10 seconds - does anyone know why this might be?

#

<@&831776746206265384>

#

This guy is trolling in general as well

zenith nova Nov 11, 2025, 8:48 AM

#

!mute 1435917124303589427

arctic wedgeBOT Nov 11, 2025, 8:48 AM

#

:incoming_envelope: :ok_hand: applied timeout to @glad vessel until <t:1762854500:f> (1 hour).

sly viper Nov 11, 2025, 8:48 AM

#

torpid quartz <@&831776746206265384>

I like how even the last two words got flagged as "AI"

wheat snow Nov 11, 2025, 10:25 AM

#

markov decision processes: the sum of a row within the transition probability matrix must be either 1 or 0 correct? when iterating over my transition matrix for a shape(22,22,4) matrix (its about the grid given in the assignment) i get some odd results:

#

T[s', s, action_space]

sum of T[0, 0, :]: 2.0
sum of T[0, 1, :]: 1.0
sum of T[0, 5, :]: 0.9999999999999999
sum of T[1, 0, :]: 0.9999999999999999
sum of T[1, 1, :]: 1.9999999999999998
sum of T[1, 2, :]: 1.0
sum of T[2, 1, :]: 0.9999999999999999

#

the 0.99 is due to machine precision error i assume but i am more worried that i get 2.0 as the prob sum to transition from State 0 to state 0 given to try all a from the action space

#

the ruleset was this

wheat snow Nov 11, 2025, 10:47 AM

#

@buoyant slate

buoyant slate Nov 11, 2025, 10:51 AM

#

wheat snow T[s', s, action_space] sum of T[0, 0, :]: 2.0 sum of T[0, 1, :]: 1.0 sum of T[0...

i assume T[0, 0, :] represents the row of probabilities over actions of staying in place for the top-left corner

buoyant slate Nov 11, 2025, 10:56 AM

#

wheat snow T[s', s, action_space] sum of T[0, 0, :]: 2.0 sum of T[0, 1, :]: 1.0 sum of T[0...

then it makes sense for it to be 2, while being in 0, 0 you only stay in place if you try to move north or west
then you can choose north or west, succeed and stay in place with probability p each or fail and go to west or north respectively with probability (1-p)/3
or you can go south or east, each has probability p of succeeding and probability (2 - 2p)/3 to instead go north or west and stay in place
giving you (6-6p)/3 + 2p = 2 - 2p + 2p = 2

#

like i said in math server the values in action rows don't have to make a distribution, the procedure is that you first choose an action, take the row corresponding to your chosen action and current state and this will be the distribution over next states (and hence has to sum to 1)
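The arithmetic above can be checked with a few lines, assuming p = 0.7 and a top-left corner where only north and west are blocked (the action names and that layout are assumptions made for the sketch). Each entry is a valid probability on its own; only their sum across actions reaches 2:

```python
p = 0.7             # probability that the chosen action succeeds
slip = (1 - p) / 3  # probability of each of the three unintended directions

# Probability of staying in the top-left corner, given each chosen action.
stay_given_action = {
    "north": p + slip,  # north succeeds into the wall, or the agent slips west
    "west": p + slip,   # west succeeds into the wall, or the agent slips north
    "south": 2 * slip,  # the agent slips north or west instead
    "east": 2 * slip,   # the agent slips north or west instead
}

total_over_actions = sum(stay_given_action.values())
print(stay_given_action, total_over_actions)
```

The total is not a probability of anything, because the agent never takes all four actions at once.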

wheat snow Nov 11, 2025, 11:18 AM

#

buoyant slate then it makes sense for it to be 2, while being in 0, 0 you only stay in place i...

so a0= p + (1-p)/3 because 1. if agent chooses north he has p chance to actually "go" there (by staying in place) + the (1-p)/3 chance from choosing west but actually going north

#

which results in also staying

#

what does this mean now. that there is a 2 chance to stay in s0 when starting from s0 when we sum all actions. thats not really working in my brain, cause i knew probabilities as being between 0.0 and 1.0

#

OHHH but ofc, the agent can only pick one action at a time

#

so i should rather see it like the agent has a x chance to stay in s0 when starting from s0 when picking that and that action

buoyant slate Nov 11, 2025, 11:22 AM

#

wheat snow what does this mean now. that there is a 2 chance to stay in s0 when starting fr...

yeah the point is that these are not probabilities, as you said actor can only pick one action at a time and these are probabilities that mean "T[s', s, a] is the probability that actor will move to s' GIVEN that he was in s and chose action a"

wheat snow Nov 11, 2025, 11:22 AM

#

buoyant slate yeah the point is that these are not probabilities, as you said actor can only p...

that means that the probabilities along our first dimension should add up to 1 tho ye?

buoyant slate Nov 11, 2025, 11:23 AM

#

as in, the individual cells are probabilities, but they are probabilities over next states, not over actions

wheat snow Nov 11, 2025, 11:23 AM

#

print(grid.T.sum(axis=0))

[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]] works out

buoyant slate Nov 11, 2025, 11:23 AM

#

wheat snow that means that the probabilities along our first dimension should add up to 1 t...

yeah i think so

wheat snow Nov 11, 2025, 11:23 AM

#

wait holup what does axis=0 in a rank 22 tensor mean?

buoyant slate Nov 11, 2025, 11:24 AM

#

side note- rank means number of dimensions in a tensor so it's rank 3 tensor

wheat snow Nov 11, 2025, 11:24 AM

#

3 dimenison ye mb

buoyant slate Nov 11, 2025, 11:25 AM

#

wheat snow wait holup what does axis=0 in a rank 22 tensor mean?

you can visualize the tensor as four 22x22 matrices stacked on top of each other forming a "cube" (cuboid to be exact)

wheat snow Nov 11, 2025, 11:26 AM

#

buoyant slate you can visualize the tensor as four 22x22 matrices stacked on top of each other...

no i think thats wrong. we have 22 22x4 matrices stacked on top

buoyant slate Nov 11, 2025, 11:26 AM

#

that will result in the same cuboid tho

wheat snow Nov 11, 2025, 11:27 AM

#

i could reshape and check

#

#

printout of the T

#

p=0.7 in this example

buoyant slate Nov 11, 2025, 11:28 AM

#

yep, now imagine taking each of these matrices and stacking them in a separate dimension

#

now fixing one dimension will give you a slice, fixing two of them will give you a "line" in this cuboid, fixing three of them will give you one element
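A short NumPy sketch of the slice / line / element picture, using a zero-filled array of the same shape as in the thread (the particular indices are arbitrary):

```python
import numpy as np

T = np.zeros((22, 22, 4))  # indexed as T[s_prime, s, a]

layer = T[0]        # fix one index: a 2-D slice of shape (22, 4)
line = T[0, 5]      # fix two indices: a 1-D line of shape (4,), one entry per action
cell = T[0, 5, 2]   # fix three indices: a single number

# Fixing s and a instead leaves a line along the next-state axis.
next_state_line = T[:, 5, 2]  # shape (22,)

print(layer.shape, line.shape, np.ndim(cell), next_state_line.shape)
```

Whether the cuboid is pictured as four 22x22 sheets or as 22 sheets of 22x4 only changes which axis is held fixed first; the array is the same.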

wheat snow Nov 11, 2025, 11:29 AM

#

wheat snow

i mean this shape makes sense for me. the first chunk (layer) is an s'. this s' has 22 rows, each representing s0-s21. the 4 columns represent the possible actions to take, and the vals are the probability of moving from that state s (which is a row) with an action to the s' given by the entire first layer

#

and the 2nd layer is s1, then again with 22 rows representing the starter states

wheat snow Nov 11, 2025, 11:31 AM

#

buoyant slate now fixing one dimension will give you a slice, fixing two of them will give you...

that is what i have been presenting with the T[s', s, a] already. i struggle to understand your representation in a different shape that should still somehow result in the same representation of the transition matrix

buoyant slate Nov 11, 2025, 11:32 AM

#

wheat snow and teh 2nd layer is s1 then again with 22 rows representing the starter states

yup that's correct

#

then summing over axis=0 will eliminate the 0th axis (corresponding to "next state") so it will give you probability to moving to "any of the next states" for a given state-action pair
which is 1, because you always have to move to some next state
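A minimal sketch of that check on a hypothetical 2x2 grid with no obstacles, so it stays small enough to verify by hand. The `neighbours` table, the action order (0=N, 1=E, 2=S, 3=W) and p = 0.7 are assumptions for the example, not the assignment's grid:

```python
import numpy as np

p = 0.7             # probability that the chosen action succeeds
slip = (1 - p) / 3  # probability of each unintended direction

# States 0..3 numbered row by row. neighbours[s, a] is the state reached when the
# agent actually moves in direction a; moving into a wall keeps it in place.
neighbours = np.array([
    [0, 1, 2, 0],
    [1, 1, 3, 0],
    [0, 3, 2, 2],
    [1, 3, 3, 2],
])
n_states, n_actions = neighbours.shape

T = np.zeros((n_states, n_states, n_actions))  # T[s_prime, s, a]
for a in range(n_actions):                # action the agent chooses
    for s in range(n_states):
        for a_result in range(n_actions):  # direction actually taken
            s_prime = neighbours[s, a_result]
            T[s_prime, s, a] += p if a_result == a else slip

over_next_states = T.sum(axis=0)  # one value per (s, a) pair, shape (4, 4)
over_actions = T[0, 0, :].sum()   # sum across actions for "stay in state 0"

print(over_next_states)
print(over_actions)
```

Summing over axis 0 collapses the next-state axis, so every (state, action) pair should come out as 1 up to floating-point error, while the sum across actions is unconstrained.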

wheat snow Nov 11, 2025, 11:33 AM

#

also i coded my solution with adding this comment:

Assumption: rewards are given to the agent upon departing from a state, so arriving from S16 -> S21: +0 and going from S21 -> S16: +10

this is what we got taught in the lesson about MRPs: rewards are given upon departing

#

but i think it doesnt make too much sense for this?

#

also states move from left to right

#

#

this 1 between 10 and 11 is supposed to be empty

buoyant slate Nov 11, 2025, 11:36 AM

#

wheat snow also i coded my solution with adding this comment: Assumption: rewards get gi...

rewards are usually given for "taking an action in a given state" - so they correspond to action-state pairs, in your case they are determined by the next state, which is in one-to-one correspondence with state-action pairs so it's okay
doesn't matter if you do it when leaving the state or if when arriving at the state, as long as it depends on the state you moved to

wheat snow Nov 11, 2025, 11:38 AM

#

buoyant slate rewards are usually given for "taking an action in a given state" - so they corr...

but wouldnt that change mess up our policy? if we give rewards upon departing, the agent will not rest in one of the absorbing states; it would likely want to move into the last absorbing state and move out?

#

also does absorbing state mean that the agent cant take any further action once reaching a so-called "absorbing state"?

buoyant slate Nov 11, 2025, 11:38 AM

#

what is your algorithm for computing the policy tho

wheat snow Nov 11, 2025, 11:39 AM

#

buoyant slate what is your algorithm for computing the policy tho

not yet written haha

#

thats for next week

#

"An absorbing state is a state that, once entered, cannot be left. A (finite) drunkard's walk is an example of an absorbing Markov chain. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space." wikipedia

buoyant slate Nov 11, 2025, 11:39 AM

#

buoyant slate rewards are usually given for "taking an action in a given state" - so they corr...

also i'm wrong here, it's not one-to-one correspondence with action-state pairs it's correspondence with outcomes of an action at a given state, it'll depend on algorithm which one matters

wheat snow Nov 11, 2025, 11:39 AM

#

so for a successful policy/agent the idea cannot be to give rewards upon departing from a state, because absorbing states trap us.

buoyant slate Nov 11, 2025, 11:42 AM

#

wheat snow so the thought cannot be for a succsessfull policy/agent to give rewards upon de...

that depends on context but the point of the policy learning is to maximize the rewards
so staying at the state with R=10 is actually the optimal policy here

#

also algorithms usually do offline learning - they don't change policy during simulating the agent, only after it has finished a "simulation episode"

wheat snow Nov 11, 2025, 11:43 AM

#

buoyant slate that depends on context but the point of the policy learning is to maximize the ...

yes but we can never collect that reward if we use the given assumption: rewards are given upon departing from a state because we are trapped in there

buoyant slate Nov 11, 2025, 11:43 AM

#

wheat snow yes but we can never collect that reward if we use the given assumption: _reward...

ah, i assumed staying at the state is also considered "departing from it", just to the same state

#

if that's not the case then yea it seems wrong at the first glance

wheat snow Nov 11, 2025, 11:54 AM

#

buoyant slate ah, i assumed staying at the state is also considered "departing from it", just ...

ye i have no idea anymore, i sent an email to the course assistants asking them to clarify that

#

anyway. can u have a look over the code i wrote to achieve my matrix @buoyant slate ?

buoyant slate Nov 11, 2025, 11:55 AM

#

wheat snow anyway. can u have a look ove rthe code i wrote to achieve my matrix <@454740319...

not right now cuz i have to leave in 10 minutes but i can later

wheat snow Nov 11, 2025, 11:58 AM

#

buoyant slate not right now cuz i have to leave in 10 minutes but i can later

k, i leave it here fo ya to check if u got time

we have a big ah class GridWorld which contains all the info and a lot of helper functions:

def __init__(self,
               shape = (5,5),
               prob_success = 0.7,
               obstacle_locs = [(1,1),(2,1),(2,3)],
               absorbing_locs = [(4,0),(4,1),(4,2),(4,3),(4,4)],
               absorbing_rewards = [-10, -10, -10, -10, 10]
              ):
    """
    GridWorld initialisation
    input:
      - shape {tuple} -- GridWorld shape (height, width)
      - prob_success {float} -- probability of success when taking an action, used to fill the transition matrix
      - obstacle_locs {list of tuples} -- location of all obstacles of the grid: [(obstacle 1), (obstacle 2), ...]
      - absorbing_locs {list of tuples} -- location of all absorbing states of the grid: [(state 1), (state 2), ...]
      - absorbing_rewards {list of float} -- reward corresponding to each absorbing state of the grid: [reward 1, reward 2, ...]
    output: /
    """
```helper functions: 

a neighbour matrix 22x4 (bad name) containing what happens when taking a direction a from a state and where you end up:

#

and the function i wrote

#

def fill_in_transition(self):
    """
    Compute the transition matrix of the grid
    input: /
    output: T {np.array} -- the transition matrix of the grid
    """
    # Empty tensor of dimension S*S*A, indexed as T[s', s, a]:
    # one S x A slice per s_prime, holding for every origin state s and chosen action a
    # the probability of landing in that s_prime.
    T = np.zeros((self.state_size, self.state_size, self.action_size))

    a_size = self.action_size
    state_size = self.state_size
    neighbours = self.neighbours
    prob_success = self.prob_success

    # T[s_prime, s, action] corresponds to P(s_prime | s, a),
    # e.g. T[21, 17, 2] is the probability of ending up in state 21 after departing from state 17 by choosing west
    for a in range(a_size):  # the action the agent picks
        for s in range(state_size):
            for a_result in range(a_size):  # the direction the agent actually ends up moving in
                s_prime = neighbours[s][a_result]

                if a_result == a:  # success case
                    p = prob_success
                else:  # fail case: one of the 3 other directions
                    p = (1 - prob_success) / 3

                # += because several directions can lead to the same s_prime (e.g. bouncing off multiple walls)
                T[int(s_prime), s, a] += p
    return T
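
One way to sanity-check a tensor like this: for every (s, a) pair, the probabilities over all s' should add up to 1. Below is a minimal standalone sketch of that check, not the course's GridWorld class. `build_neighbours` is a made-up helper for a small grid without obstacles or absorbing states, and the action order (N, E, S, W) is an assumption.

```python
import numpy as np

def build_neighbours(height, width):
    """Hypothetical helper: neighbours[s, a] is the state reached from s when moving
    in direction a (0=N, 1=E, 2=S, 3=W); hitting the border means staying in s."""
    moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]
    neighbours = np.zeros((height * width, 4), dtype=int)
    for r in range(height):
        for c in range(width):
            s = r * width + c
            for a, (dr, dc) in enumerate(moves):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < height and 0 <= nc < width
                neighbours[s, a] = nr * width + nc if inside else s
    return neighbours

def fill_in_transition(neighbours, prob_success):
    """Same triple loop as in the message above, written as a free function."""
    n_states, n_actions = neighbours.shape
    T = np.zeros((n_states, n_states, n_actions))
    p_fail = (1 - prob_success) / (n_actions - 1)
    for a in range(n_actions):
        for s in range(n_states):
            for a_result in range(n_actions):
                p = prob_success if a_result == a else p_fail
                T[neighbours[s, a_result], s, a] += p
    return T

neighbours = build_neighbours(2, 3)
T = fill_in_transition(neighbours, prob_success=0.7)

# Summing over s' (axis 0) leaves one total per (s, a) pair; each must be 1.
totals = T.sum(axis=0)
assert np.allclose(totals, 1.0)
```

If the assert fails on the real grid, the offending (s, a) pairs can be listed with `np.argwhere(~np.isclose(totals, 1.0))`.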

#

i think helper functions arent used here yet

#

they are for the reward matrix

wheat snow Nov 11, 2025, 3:25 PM

#

wheat snow markov decision processes: the sum of a row within the transition probability ma...

this is the start, for anyone reading and trying to help.

thick heart Nov 11, 2025, 4:54 PM

#

what are some free ml bootcamps or courses?

wheat snow Nov 11, 2025, 6:06 PM

#

thick heart what are some free ml bootcamps or courses?

aren't like all harvard courses free on youtube (at least the lectures, idk about the exercises)

torpid quartz Nov 11, 2025, 8:06 PM

#

torpid quartz Running llms and vlms on my Mac are very slow even with metal, even in webgpu it...

#

Help wud be appreciated thanksss

thick heart Nov 11, 2025, 9:31 PM

#

wheat snow arent like all harvard courses for free on youtube (at least the lectures idk ab...

why would you assume harvard courses are good?

lime grove Nov 11, 2025, 9:35 PM

#

there's no reason to assume that those courses are better than a comparable rando Udemy / Coursera course

#

you kind of have to tailor the course you choose to your goals & skill sets.

#

I never recommend learning a data science topic in a college degree style, which is bottom up. In other words, before you learn the topic itself, first we must take a few semesters of adjacent courses, etc.

#

you should do it top-down, which is you focus on the topic itself, whatever it might be, and you pick up the needed knowledge along the way. Tons of options for this, so choosing the right one for you depends on who you are

#

my experience with online MIT / Harvard / etc courses is that they tend to be bottom-up in how they approach the topic. They are rigorous, demanding, but ultimately a waste of time.

wheat snow Nov 11, 2025, 9:44 PM

#

thick heart why would you assume harvard courses are good?

Uh idk, never listened to one

wheat snow Nov 11, 2025, 9:48 PM

#

lime grove you should do it top-down, which is you focus on the topic itself, whatever it m...

I noticed that as well. I started my first pandas project knowing nothing, worked from there, and developed 2 amazing data dashboards about my spotify and netflix data. I also took a course a year back and some foundations were laid, but there was so much missing

#

@buoyant slate got some time now?

smoky arrow Nov 11, 2025, 10:26 PM

#

lime grove you should do it top-down, which is you focus on the topic itself, whatever it m...

college courses teach you the implementation. they can be considered useless if you will never work on designing ml algorithms

lime grove Nov 12, 2025, 12:13 AM

#

smoky arrow college courses teach you the implementation. they can be considered useless if ...

Right. So do that if your goal is to design an ML algorithm.

#

and it takes a fair amount of mathematical literacy to do that. Reading journal articles should be something you can do on a regular basis

twilit topaz Nov 12, 2025, 3:38 AM

#

You guys been using polars more or still pandas

serene scaffold Nov 12, 2025, 3:40 AM

#

twilit topaz You guys been using polars more or still pandas

Pandas. I don't have a compelling reason to switch.

twilit topaz Nov 12, 2025, 3:40 AM

#

serene scaffold Pandas. I don't have a compelling reason to switch.

It's faster and it has declarative syntax. The only thing I can't figure out how to do in polars is the transpose function

#

The pandas query function is kinda tough for me to follow vs what polars does with filter/select

#

Pandas has the better transpose function doing it with just T

spring field Nov 12, 2025, 3:50 AM

#

twilit topaz Faster it has declarative syntax. Only thing I can't figure how to do in polars ...

isn't pandas pretty declarative as well?

#

speed's not that relevant for most use cases probably

#

that said... someonesaidrust

twilit topaz Nov 12, 2025, 3:52 AM

#

spring field isn't pandas pretty declarative as well?

Meh I get confused with the brackets a lot. And then sometimes you are forced to use loc or iloc. I found it better to stack multiple conditions with polars instead of using a complicated query

#

Only thing in polars I can't do properly is the transpose function

#

Polars transpose is more complicated

agile cobalt Nov 12, 2025, 3:52 AM

#

twilit topaz Faster it has declarative syntax. Only thing I can't figure how to do in polars ...

unpivot/pivot works better for many transpose-ish use cases
polars also has a transpose method though

spring field Nov 12, 2025, 3:54 AM

#

spring field speed's not that relevant for most use cases probably

what I mean is the difference in speed between pandas and polars

twilit topaz Nov 12, 2025, 3:54 AM

#

agile cobalt `unpivot`/`pivot` works better for many transpose-ish use cases polars also has ...

I know they have their own but pandas T is better from my experience

twilit topaz Nov 12, 2025, 3:54 AM

#

spring field what I mean is the difference in speed between pandas and polars

Yeah polars is faster too and I found their lazy evaluation useful

spring field Nov 12, 2025, 3:55 AM

#

I know it's faster, I just don't think the speed difference is meaningful in the vast majority of use cases

#

but of course, that's a lame excuse

twilit topaz Nov 12, 2025, 3:57 AM

#

Like I had some code for Treasury data I had for time series and I couldn't transpose it to make yield curve using polars but the pandas T worked fine

#

Given that you use pandas to_datetime

#

The documentation for polars on this was confusing

agile cobalt Nov 12, 2025, 3:57 AM

#

spring field I know it's faster, I just don't think the speed difference is meaningful in the...

for some use cases it matters a lot, if doing things more efficiently can let you allocate fewer resources to achieve the same result (cost savings), or some niche cases in which ultra low latency matters

though for most cases I agree it's at best a convenience of having to wait a few less seconds

agile cobalt Nov 12, 2025, 3:58 AM

#

twilit topaz Like I had some code for Treasury data I had for time series and I couldn't tran...

do you have a complete example?

twilit topaz Nov 12, 2025, 4:01 AM

#

```py
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import yfinance as yf
import scipy as sp

Treasury = pd.read_csv('daily-treasury-rates.csv')
Treasury['Date'] = pd.to_datetime(Treasury['Date'])
Treasury = Treasury.set_index('Date')
Treasury1 = Treasury.T
plt.plot(Treasury1['2025-10-10 00:00:00'], color='black', label='10/10/2025 Yield Curve')
plt.plot(Treasury1['2025-08-28 00:00:00'], color='red', label='8/28/2025 Yield Curve')
```

agile cobalt Nov 12, 2025, 4:01 AM

#

twilit topaz The documentation for polars on this was confusing

iirc they use this: https://docs.rs/chrono/latest/chrono/format/strftime/index.html
polars really doesn't document it well, in great part since it's just wrapping chrono, but I still wish they'd at least include a link or two
might open an issue for that later pithink

agile cobalt Nov 12, 2025, 4:01 AM

#

twilit topaz ```import pandas as pd import matplotlib.pyplot as plt import numpy as np import...

set_index('Date')

twilit topaz Nov 12, 2025, 4:01 AM

#

This was in pandas but how to convert to polars is difficult

#

The data I used came from the US treasury website

#

The result I would get is this

#

#

from pandas but polars i cant

#

i know i can convert the pd dataframe to polars but i cant transpose cleanly like pandas

agile cobalt Nov 12, 2025, 4:18 AM

#

twilit topaz

I think that `df.unpivot(index='Date')` works for that?

```pycon
>>> import plotly.express as px
>>> import polars as pl
>>> df = pl.read_csv('daily-treasury-rates.csv')
>>> t = df.unpivot(index='Date')
>>> dates = ['10/10/2025', '01/02/2025']
>>> test = t.filter(pl.col('Date').is_in(dates))
>>> fig = px.line(test, x='variable', y='value', color='Date')
>>> fig.write_html('test.html')
```

granted, you cannot index it later - polars explicitly avoids having an index like pandas's

twilit topaz Nov 12, 2025, 4:27 AM

#

agile cobalt I think that `df.unpivot(index='Date')` works for that? ```pycon >>> import plot...

I think this is an acceptable solution

#

Unorthodox but works

somber willow Nov 12, 2025, 5:37 AM

#

```py
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df.head(20)

# Drop the label column from the features, otherwise the model gets to see the answer
X = df.drop(columns='target')
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
model = LogisticRegression(max_iter=100000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_pred

model_accuracy = accuracy_score(y_test, y_pred)
model_classification_report = classification_report(y_test, y_pred)
model_confusion_matrix = confusion_matrix(y_test, y_pred)
```

#

is there anything to improve, or any recommendations? I am a beginner to data science

#

it's a breast cancer detection model

obsidian talon Nov 12, 2025, 6:13 AM

#

somber willow is there anything to improve or anything as a recommendation, I am a beginnner t...

I mean the code will run. Do you understand the outputs? Do you want to improve accuracy of the model, improve interpretability of it, etc.

#

Technically there's a lot to improve, but what do you feel would help you most to learn how to do?

somber willow Nov 12, 2025, 9:27 AM

#

obsidian talon I mean the code will run. Do you understand the outputs? Do you want to improve ...

I know how the model works, but I would appreciate it if you explained how the confusion matrix works
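
For reference, a confusion matrix is just a table of counts: one row per true class, one column per predicted class, which is the layout scikit-learn's `confusion_matrix` uses. Here is a small dependency-free sketch with made-up labels (not the breast cancer data), where `confusion_matrix_2x2` is a hypothetical helper written only for illustration.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count outcomes for binary labels 0/1.
    Rows are the true class, columns are the predicted class."""
    matrix = [[0, 0], [0, 0]]
    for truth, guess in zip(y_true, y_pred):
        matrix[truth][guess] += 1
    return matrix

# Made-up labels, purely for illustration
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

matrix = confusion_matrix_2x2(y_true, y_pred)
(tn, fp), (fn, tp) = matrix  # treating class 1 as the "positive" class

accuracy = (tp + tn) / len(y_true)   # share of all predictions that were right
precision = tp / (tp + fp)           # of everything predicted 1, how much really was 1
recall = tp / (tp + fn)              # of everything that really was 1, how much was found

print(matrix)                        # [[2, 1], [1, 4]]
print(accuracy, precision, recall)   # 0.75 0.8 0.8
```

The diagonal holds the correct predictions and everything off the diagonal is a mistake. In `load_breast_cancer` label 0 is malignant and 1 is benign, so the top-right cell (true 0, predicted 1) counts malignant cases that were called benign, which is usually the error to worry about most.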

hoary elbow Nov 12, 2025, 9:37 AM

#

Hello,

I am having issues with setting GPU/CPU in order to train my ResNet model. My Jupyter notebook is currently set up with a GPU. When I load my dataset from a directory, I need to explicitly tell TensorFlow to perform that operation on the CPU by doing

with tf.device('CPU:0'):
    # code to load datasets here

When I want to create my resnet model i currently do;

with tf.device('/CPU:0'):    
    pretrained_model = tf.keras.applications.ResNet50V2(
        include_top = False,
        input_shape = (img_height, img_height, 3),
        weights = 'imagenet',
    )

    # other code here

    output = Dense(1, activation="sigmoid")(x)

    model = Model(inputs=pretrained_model.input, outputs=output)

Finally, I want to compile and fit the model;

# compile code here

epochs = 50

with tf.device('/GPU:0'):
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[checkpoint_cb]
    )

However when I try this, I get this error;

InvalidArgumentError: Graph execution error:

Detected at node StatefulPartitionedCall defined at (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
Trying to access resource conv1_conv/kernel/911 (defined @ /opt/conda/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py:38) located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
[[{{node StatefulPartitionedCall}}]] [Op:__inference_multi_step_on_iterator_38084]

How can this be solved? When I set the model creation to a GPU that code will then fail. So it becomes a big problem.

somber willow Nov 12, 2025, 9:41 AM

#

obsidian talon I mean the code will run. Do you understand the outputs? Do you want to improve ...

Like I know that it tells me how and where the model went wrong, but how can I fix that or improve its accuracy

true flicker Nov 12, 2025, 11:09 AM

#

Hey everyone! I’m preparing for Data Engineer roles (Python, SQL, ADF, ETL) for the next 6months — anyone up for consistent learning and project collaboration?

buoyant slate Nov 12, 2025, 1:46 PM

#

wheat snow @buoyant slate got some time now?

sorry i was quite busy yesterday and i forgot
the code looks alright i think, you iterate over action-state pairs and the innermost loop looks at all neighbours of the state and fills out probabilities in sliced matrix corresponding to s-a pair
although you might be misunderstanding what the tensor means - it is keeping transition probabilities, so probabilities that you will transition to a given state after you've chosen action in some state
so the innermost loop shouldn't in general loop over all possible actions, but rather over all possible states (or only the ones that are reachable from current state, varies from env to env) - in this case the actions correspond to possible next states but it doesn't have to be the case

wheat snow Nov 12, 2025, 2:18 PM

#

buoyant slate sorry i was quite busy yesterday and i forgot the code looks alright i think, yo...

hmm i am not quite getting the 2nd part. why wouldnt we want to have the state looping in the 2nd loop? what would be a "sentence" that describes what my version is doing ( i think that helps me understand the difference), which seems to be wrong?

buoyant slate Nov 12, 2025, 2:28 PM

#

wheat snow hmm i am not quite getting the 2nd part. why wouldnt we want to have the state l...

okay so the tensor you're filling out tells you this - "If we are in state s, we take action a, then for any state s', T[s', s, a] is the probability we end up in this state in the next step"
And what you are doing is "If we are in state s, we take action a, then for any action a' we fill out T with probabilities so that T[s', s, a] is probability of ending up at s' where s' is the state associated with a' "
the problem is - the possible states s' you can end up in after action a from state s do not have to correspond to actions, they are kinda independent
You assumed that each action corresponds to some direction in which we are trying to move - this is correct since we have a grid, but what if we had only 2 actions - one has 1/3 probability to either go north, west or east and second 1/3 probability of going south, east or west - then your innermost loop will go over two actions but you have 3 probabilities to add - since you can move in 3 different directions after any action

wheat snow Nov 12, 2025, 2:36 PM

#

buoyant slate okay so the tensor you're filling out tells you this - "If we are in state s, we...

thanks a lot. ima go over this with my code later, but i assume the correction would be to loop over action space 2x. then identifying and crafting the probabilities (that represent actually attempting to move in that direction a we are currently looping over):

if a_result == a:  # success case
    p = prob_success
else:  # fail case
    p = (1 - prob_success) / 3

followed by looping over our states and within that loop determine s' and the val for T to fill in by:

s_prime= neighbours[s][a_result] #a_result is the inner a loop
T[int(s_prime), s, a] +=p

buoyant slate Nov 12, 2025, 2:42 PM

#

wheat snow thanks a lot. ima go over this with my code later, but i assume the correction w...

i mean your code is correct, I was just adding context since looping over actions second time seemed a bit odd to me
the usual way I did this is having some function that returns possible next states for state, action pair, e.g.

def possible_next_states(action, state):
    return neighbours[state]

Or just one that returns pairs (next_state, transition probability)
However you don't really need it in this case, just as a reference for future
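
A rough sketch of that second variant, where a function hands back (next_state, probability) pairs and the fill loop no longer cares how actions relate to directions. The environment here is invented for illustration: 3 states on a line and 2 actions with 3 possible outcomes each, similar in spirit to the two-action example above. The names `possible_next_states` and `build_transition` and all the probabilities are assumptions, not part of the course code.

```python
import numpy as np

def possible_next_states(state, action):
    """Hypothetical environment: states 0..2 on a line, 2 actions.
    Returns (next_state, probability) pairs; outcomes are not tied one-to-one to actions."""
    left = max(state - 1, 0)    # walls at both ends keep the agent in place
    right = min(state + 1, 2)
    if action == 0:             # "mostly left"
        return [(left, 0.6), (state, 0.3), (right, 0.1)]
    return [(right, 0.6), (state, 0.3), (left, 0.1)]  # "mostly right"

def build_transition(n_states, n_actions):
    """Fill T[s', s, a] = P(s' | s, a) from the pairs above."""
    T = np.zeros((n_states, n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            for s_prime, p in possible_next_states(s, a):
                # += merges outcomes that land on the same state (e.g. at a wall)
                T[s_prime, s, a] += p
    return T

T = build_transition(n_states=3, n_actions=2)
assert np.allclose(T.sum(axis=0), 1.0)  # every (s, a) column is a probability distribution
```

The innermost loop now runs over outcomes rather than over actions, so two actions with three outcomes each is no longer a problem.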

nimble osprey Nov 12, 2025, 3:28 PM

#

Hello all, I'm a beginner looking for ideas on how to approach creating a tool that pulls an identical subimage from a larger image using a template, then write what was found to a csv or json file. Specifically it is an inventory screen in a game UI, so the subimage would be an item icon in a fixed position and resolution for every screengrab and it would be an exact match. I've looked into opencv but was just wondering if there was any better suggested methods or tools to use?
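
Since the icon is described as an exact match at a fixed position and resolution, one possible approach is to skip searching altogether: crop that region and compare it to each known icon. A minimal sketch with NumPy only, using a synthetic screenshot. The slot coordinates, the icon names and the `icon_matches` helper are all made up; real screenshots would first have to be loaded into arrays with an image library such as OpenCV or Pillow.

```python
import json
import numpy as np

def icon_matches(screenshot, template, top, left):
    """True if the template is pixel-identical to the screenshot region at (top, left)."""
    height, width = template.shape[:2]
    region = screenshot[top:top + height, left:left + width]
    return region.shape == template.shape and bool(np.array_equal(region, template))

# Synthetic stand-ins for a screengrab and two known item icons
rng = np.random.default_rng(0)
screenshot = rng.integers(0, 256, size=(60, 80, 3), dtype=np.uint8)
sword = screenshot[10:26, 20:36].copy()  # pretend this crop is the known "sword" icon
shield = 255 - sword                     # a different icon of the same size
templates = {"sword": sword, "shield": shield}

# Check the (assumed) fixed slot position against every known icon
slot_top, slot_left = 10, 20
found = [name for name, tpl in templates.items()
         if icon_matches(screenshot, tpl, slot_top, slot_left)]

report = json.dumps({"slot_0": found})
print(report)
```

If the position ever stops being fixed, OpenCV's `cv2.matchTemplate` is the usual tool for searching the whole image instead.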

vale field Nov 12, 2025, 4:03 PM

#

obsidian talon Usually with extremely high cardinality like that, thats the easiest way to do i...

Not exactly. Some are and some are not. I did try clustering based on semantic similarities but only for a small sample of documents e.g. 100 out of tens of thousands.

mellow vector Nov 12, 2025, 4:25 PM

#

so heres some fun everyone can enjoy, I want to convert MNIST back into images in altair. I wrote a 2x2 heat plot that was like 20 lines of code so I'm a little bit intimidated by the prospect. I'm not writing this because it's the best way, I have been doing everything with altair whether it makes sense or not to level up as a DA.

#

I'm about to whip out an autoencoder and then I'll need to check my results, some alt.Chart() code will follow

mellow vector Nov 12, 2025, 4:58 PM

#

so here's where I'm at
```py
# assumes torch, torch.nn as nn, polars as pl and the MNIST frame df are already defined
def train_it(model):
    optimizer = torch.optim.Adam(params=model.parameters(), lr=0.01)
    loss_func = nn.MSELoss()
    n_epochs = 10000
    for i in range(n_epochs):
        sample = df.sample(n=32, seed=i).lazy()
        X = sample.select(pl.exclude(['id', 'column_1'])).collect().cast(pl.Float32).to_torch()
        y_hat = model(X)
        loss = loss_func(y_hat, X)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return model(
        df.sample(n=10, seed=42)
        .select(pl.exclude(['id', 'column_1']))
        .cast(pl.Float32)
        .to_torch())
```

#

nice, am in business torch.Size([10, 784])

waxen kindle Nov 12, 2025, 5:09 PM

#

mellow vector so here's where I'm at ```py def train_it(model): optimizer = torch.optim.Ad...

wait, why are you using X as your target ?

mellow vector Nov 12, 2025, 5:10 PM

#

autoencoder

waxen kindle Nov 12, 2025, 5:10 PM

#

right now you are trying to regress lambda x:x

#

alright

mellow vector Nov 12, 2025, 5:10 PM

#

is that wrong?

waxen kindle Nov 12, 2025, 5:10 PM

#

no, didn't realise it was an autoencoder

#

I think you should use dataloaders and batches

mellow vector Nov 12, 2025, 5:10 PM

#

reciting this from memory after a lecture a few hours ago, so ya feel free to correct mistakes

waxen kindle Nov 12, 2025, 5:10 PM

#

it will work better

mellow vector Nov 12, 2025, 5:11 PM

#

yeah I have no doubt really I'm more after the charting experience, this was a pretty easy subject

mellow vector Nov 12, 2025, 6:07 PM

#

huh... so [column for column in encoded_df] returns the columns but encoded_df[0] returns the first row (polars)

#

how'm I supposed to iterate over it, I was thinking .to_numpy().reshape((28,28)) was going to get it reshaped

agile cobalt Nov 12, 2025, 6:09 PM

#

mellow vector how'm I supposed to iterated over it, I was thinking `.to_numpy().reshape((28,28...

in general you are not supposed to iterate over data frames

but to_numpy should work?

mellow vector Nov 12, 2025, 6:09 PM

#

yeah... hmm I'm less comfortable with numpy but that makes sense

agile cobalt Nov 12, 2025, 6:10 PM

#

it's mostly the same as torch

mellow vector Nov 12, 2025, 6:10 PM

#

in general you're not supposed to plot unlabeled data with altair heh

agile cobalt Nov 12, 2025, 6:11 PM

#

also, you can use df[row, col] like df[:, 0] or df[0, :]

serene scaffold Nov 12, 2025, 6:13 PM

#

agile cobalt also, you can use `df[row, col]` like `df[:, 0]` or `df[0, :]`

I think that will try to get a column whose index is a tuple of (row, col)

#

without the loc

mellow vector Nov 12, 2025, 6:13 PM

#

polars

serene scaffold Nov 12, 2025, 6:13 PM

#

o

#

I'm on a losing streak today

mellow vector Nov 12, 2025, 6:14 PM

#

we'll win you over eventually

mellow vector Nov 12, 2025, 6:39 PM

#

reshaped_arr = encoded_arr.reshape((10,28,28))
reshaped_df = pl.concat([pl.DataFrame(reshaped_arr[i]) for i in range(10)])

#

this feels like a weak method for getting at the intended result

#

I'm still reaching for iteration
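
A possible iteration-free route, sketched with NumPy only: reshape once, then let `np.indices` generate the image, row and column coordinates so every pixel becomes one long-form record. This assumes `encoded_arr` has shape (10, 784) as the `torch.Size([10, 784])` output suggests; the array below is dummy data, and the column names are made up.

```python
import numpy as np

n_images, side = 10, 28

# Dummy stand-in for the autoencoder output, shape (10, 784)
encoded_arr = np.arange(n_images * side * side, dtype=np.float32).reshape(n_images, side * side)

# One reshape gives (image, row, column); no per-image loop needed
images = encoded_arr.reshape(n_images, side, side)

# np.indices returns one coordinate grid per axis, each shaped like `images`
img_idx, y_idx, x_idx = np.indices(images.shape)

# Long form: one entry per pixel, which is the shape heatmap-style charts expect
long_form = {
    "image": img_idx.ravel(),
    "y": y_idx.ravel(),
    "x": x_idx.ravel(),
    "value": images.ravel(),
}

print(len(long_form["value"]))  # 7840 = 10 * 28 * 28
```

A dict of equal-length arrays like this can be handed to `pl.DataFrame(long_form)`, and from there an Altair `mark_rect` chart with `x`, `y` and `value` encodings, faceted by `image`, should be within reach.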

#

oh well, break time here's where I'ma leave off
```py
reshaped_df = pl.concat([pl.DataFrame(reshaped_arr[i]) for i in range(10)])
reshaped_df = reshaped_df.with_row_index(name = 'Y')
reshaped_df.with_columns(pl.col('Y')%28+1)
```

oak hearth Nov 13, 2025, 1:23 PM

#

guys i need to check ai % detection for my word document do you know any tools?

vale elbow Nov 13, 2025, 2:36 PM

#

oak hearth guys i need to check ai % detection for my word document do you know any tools?

Gptzero

serene scaffold Nov 13, 2025, 2:54 PM

#

oak hearth guys i need to check ai % detection for my word document do you know any tools?

none of those are actually good. unlike images or sound, there's no surefire way to detect that it's AI generated.

spice tartan Nov 14, 2025, 2:48 AM

#

Hey guys

#

How do you guys deal with git and merges with notebooks?

serene scaffold Nov 14, 2025, 3:08 AM

#

spice tartan How do you guys deal with git and merges with notebooks?

ipynb notebooks are notoriously incompatible with git. I'm not aware of a good solution except to switch to a different type of notebook like marimo.

spice tartan Nov 14, 2025, 3:09 AM

#

serene scaffold ipynb notebooks are notoriously incompatible with git. I'm not aware of a good s...

Big teams must collaborate somehow?

#

I am looking at nbdime and jupytext rn

serene scaffold Nov 14, 2025, 3:11 AM

#

spice tartan Big teams must collaborate somehow?

I think it's unusual for multiple people to collaborate on a notebook in a way that involves git versioning. If they do, they probably clear all the cell outputs before committing.
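
To make the "clear all the cell outputs" idea concrete: an .ipynb file is plain JSON, so outputs and execution counts can be blanked before a commit, which removes most of the noise from diffs. Below is a minimal sketch using only the standard library and an in-memory notebook dict, assuming the usual nbformat 4 layout; `strip_outputs` is a hypothetical helper. In practice existing tools such as nbstripout or `jupyter nbconvert --clear-output` cover this, and jupytext and nbdime tackle the diff/merge side.

```python
import json

def strip_outputs(notebook):
    """Return a copy of an nbformat-4 style notebook dict with code-cell outputs cleared."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned

# Tiny in-memory notebook standing in for a file loaded with json.load
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "execution_count": 3,
                    "metadata": {},
                }
            ],
        },
    ],
}

clean = strip_outputs(notebook)
print(json.dumps(clean["cells"][1], indent=1))
```

With outputs gone, two people editing different cells produce diffs that look much more like ordinary source changes.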

umbral shell Nov 14, 2025, 9:12 AM

#

Hey folks, I know there are a lot of professional programmers here, so I’d like to ask you something.
I keep seeing a lot of negativity around coding with AI / AI-generated code.
Why do you think that is?
I’m genuinely curious — not trying to start drama or be ironic.

Just to clarify:
I’m not talking about “one-prompt copy-paste code”,
but about serious, iterative development with AI tools involved.

jagged bane Nov 14, 2025, 9:14 AM

#

I think a lot of negativity stems from the reduced human input. Code written by AI will almost always be worse than code written by someone with proper experience, and that will only compound

wicked flare Nov 14, 2025, 9:14 AM

#

umbral shell Hey folks, I know there are a lot of professional programmers here, so I’d like ...

I think there are multiple reasons. One is that the easy accessibility of AI tools and AI code generation has led to an influx of people with little (or even no) programming experience producing and contributing poorly written code in large quantities, which can add a lot of extra workload to people who have to review or deal with that code.

jagged bane Nov 14, 2025, 9:15 AM

#

or people backing up their own "ideas" by saying "but chatgpt told me it was correct"

wicked flare Nov 14, 2025, 9:15 AM

#

Another is that it's just kinda human nature to be skeptical of new and untested concepts or technologies.

umbral shell Nov 14, 2025, 9:15 AM

#

jagged bane I think a lot of negativity stems from the reduced human input. Code written by ...

that's a fair point, but we have pyright, pylint; aren't those tools created to make code more.... robust?

wicked flare Nov 14, 2025, 9:15 AM

#

Lots of people will just kinda lean in the negative direction until it's been proven beyond reasonable doubt that something really works well.

mellow vector Nov 14, 2025, 9:15 AM

#

it can be useful to someone who can pick out the 1/3 of the code that's decent from the 2/3 that tend to wander into hallucination.

wicked flare Nov 14, 2025, 9:16 AM

#

umbral shell Hey folks, I know there are a lot of professional programmers here, so I’d like ...

That being said, I am a professional developer and I do use AI tools all the time in my daily work.

jagged bane Nov 14, 2025, 9:16 AM

#

umbral shell that's a fair point, but we have pyright, pylint, those tools arent created to m...

yeah, I'm not saying it can't be beneficial, but it still requires an experienced dev in the driver seat

#

a calculator is only useful if you know what numbers you're meant to be punching in

umbral shell Nov 14, 2025, 9:17 AM

#

jagged bane a calculator is only useful if you know what numbers you're meant to be punching...

agree in 100% haha

umbral shell Nov 14, 2025, 9:19 AM

#

wicked flare Lots of people will just kinda lean in the negative direction until it's been pr...

oh yea, in my opinion the worst thing is that chatgpt can generate code and a newbie can think: wow it's amazing, for sure this code is fully optimised

wicked flare Nov 14, 2025, 9:20 AM

#

We also get a lot of people coming into this server where they've used ChatGPT to vibe code something, and eventually they run into an issue they can't resolve just by re-prompting the AI, and ask us to solve it for them instead.

mellow vector Nov 14, 2025, 9:22 AM

#

even code that works and is optimized will often be needlessly complex... is complexified a word? anyway, GPT likes to add libraries and variables inappropriately. Routinely in my studies I'll ask AI to make something happen and the learning process for me is to take what GPT gives me and do it the right way

jagged bane Nov 14, 2025, 9:23 AM

#

Ask it to build something you already have deep knowledge of and you'll see how much unnecessary fluff it adds

umbral shell Nov 14, 2025, 9:23 AM

#

wicked flare We also get a lot of people coming into this server where they've used ChatGPT t...

Ohhh, didn't know that. I sometimes have the same kind of issues, but in general I then switch tools: claude / gemini / chatgpt / grok / perplexity.... 🙂
Anyway, coding with ai needs strong testing: lots of test plans, regression tests, cross-checks with other AIs. That's my opinion based on my little experience with python haha

jagged bane Nov 14, 2025, 9:23 AM

#

but if you already have deep knowledge of something, why are you using AI for it....now apply this for things you're less confident about

mellow vector Nov 14, 2025, 9:24 AM

#

additionally, its training is horribly outdated in many areas. If you ask it about something it should know, like how to write machine learning, it will give you advice on tensorflow. Tensorflow hasn't been relevant for years.

wicked flare Nov 14, 2025, 9:25 AM

#

umbral shell Ohhh, didn't knowed that, i am having same issues sometimes like that, but in ge...

I think something that's counter-intuitive and a lot of people fail to realize is that using an LLM effectively (in such a way that the result is useful and that you actually save time and effort) to accomplish complex tasks like coding is itself a skill that needs to be learned. It's not a magic box that just any person off the street can use with no training.

#

And it's deceptive, because it LOOKS as if you can.

sweet prawn Nov 14, 2025, 9:25 AM

#

is it necessary to learn how to use LLMs?

wicked flare Nov 14, 2025, 9:26 AM

#

You will get code, it just won't be good or useful code.

mellow vector Nov 14, 2025, 9:26 AM

#

sweet prawn is it necessary to learn how to use LLMs?

hah... is that a serious question?

sweet prawn Nov 14, 2025, 9:26 AM

#

it is

mellow vector Nov 14, 2025, 9:27 AM

#

I mean, I've read that in china they have LLM class with Math, History, Literacy and whatever

wicked flare Nov 14, 2025, 9:27 AM

#

sweet prawn is it necessary to learn how to use LLMs?

"necessary" is a strong word. I genuinely think it can be a useful tool in various contexts. But you can probably get away with not using it in most situations as well.

umbral shell Nov 14, 2025, 9:27 AM

#

wicked flare I think something that's counter-intuitive and a lot of people fail to realize i...

totally agree, if someone doesn't know how to use ai properly it can generate so many hallucinations, so much false code... and in the end they land in places like this one, based on what i see in this discussion

sweet prawn Nov 14, 2025, 9:27 AM

#

im just going to assume that i am in the "most situations"

wicked flare Nov 14, 2025, 9:28 AM

#

sweet prawn im just going to assume that i am in the "most situations"

Most people are in most situations most of the time. 😛

sweet prawn Nov 14, 2025, 9:28 AM

#

from what i hear from senior devs, it seems like it's not actually that useful anyway

#

or at least it doesn't really improve your productivity much

mellow vector Nov 14, 2025, 9:28 AM

#

Learning how to use LLMs isn't really easy to answer, the models are constantly changing and you wont always be informed. What works today may not work tomorrow, but there're probably some concepts that can be applied across the subject

wicked flare Nov 14, 2025, 9:29 AM

#

sweet prawn from what i hear from senior devs, it seems like it's not actually that useful a...

In my opinion, this technology is still in its infancy and most senior devs don't know much about it.

umbral shell Nov 14, 2025, 9:29 AM

#

For me it feels like:
– 10 years ago: you had to learn Git
– later: you had to learn Docker / CI
– now: you have to learn how to use LLMs properly

sweet prawn Nov 14, 2025, 9:29 AM

#

let's see if LLMs go into the crypto bin

wicked flare Nov 14, 2025, 9:29 AM

#

It'll be interesting to see what will be considered industry standard practice in 10 or 20 years.

sweet prawn Nov 14, 2025, 9:30 AM

#

for now, it seems i should be able to get away with not using LLMs

umbral shell Nov 14, 2025, 9:30 AM

#

wicked flare It'll be interesting to see what will be considered industry standard practice i...

i don't even want to imagine, seriously i don't even want to imagine haha

sweet prawn Nov 14, 2025, 9:30 AM

#

im just going to run with that until im forced to do otherwise

wicked flare Nov 14, 2025, 9:31 AM

#

sweet prawn let's see if LLMs go into the crypto bin

I strongly doubt that will happen, because even today people are flailing to even conceptualize use cases for crypto, whereas I think there are already many obvious uses for LLMs, that just maybe still haven't been fully refined.

#

It's just hard to imagine that we'll just shelve all of this.

jagged bane Nov 14, 2025, 9:32 AM

#

I use it to write me snippets that I can then evaluate, but other than that, it's not a regular part of my workflow

mellow vector Nov 14, 2025, 9:32 AM

#

It will boil down to "How much hallucination is acceptable?"

jagged bane Nov 14, 2025, 9:32 AM

#

I've started learning c# and I tried using chatgpt just to see what it would throw at a beginner and I hated it

wicked flare Nov 14, 2025, 9:33 AM

#

I use the Copilot autocomplete all the time. I never turn it off and I accept its suggestions all the time.

umbral shell Nov 14, 2025, 9:34 AM

#

hey folks, slightly off-topic for a moment 🙂
I’m building a lightweight IDE in PyQt + QScintilla because I got a bit tired of VS Code / Sublime / PyCharm / Spyder.
i’m planning to release it soon and I’m really curious about your experience.
What are the biggest pain points or cons of the IDEs you currently use?

wicked flare Nov 14, 2025, 9:34 AM

#

I like using Copilot or ChatGPT to generate test code. I don't use it wholesale and I review it, but it's nice not to have to manually type out all the boilerplate that tends to come with test code.

sweet prawn Nov 14, 2025, 9:34 AM

#

umbral shell hey folks, slightly off-topic for a moment 🙂 I’m building a lightweight IDE in ...

what's the problem you are trying to solve

jagged bane Nov 14, 2025, 9:35 AM

#

wicked flare I use the Copilot autocomplete all the time. I never turn it off and I accept it...

I think it's so damaging to beginners though. I immediately turned it off

wicked flare Nov 14, 2025, 9:35 AM

#

This is still #data-science-and-ml

wicked flare Nov 14, 2025, 9:35 AM

#

jagged bane I think it's so damaging to beginners though. I immediately turned it off

That might be the case, I'm speaking from my own perspective as a senior dev.

jagged bane Nov 14, 2025, 9:35 AM

#

yeah, it's fine if you can vet what it gives you

sweet prawn Nov 14, 2025, 9:35 AM

#

umbral shell hey folks, slightly off-topic for a moment 🙂 I’m building a lightweight IDE in ...

yea you probably should ask it in #python-discussion

umbral shell Nov 14, 2025, 9:36 AM

#

hah, okay, but anyway guys, thank you for the interesting discussion about using ai at work!

wicked flare Nov 14, 2025, 9:36 AM

#

I find in general that it's mentally easier for me to change something that already exists rather than create something new from scratch. Even long before LLMs came around I would often develop new features or tests by copying existing similar code and then changing it to my needs.

#

And I find that LLMs are useful for this, generating an initial draft that I can iterate on, for various tasks.

mellow vector Nov 14, 2025, 9:46 AM

#

hey nothing personal but I don't really DM, you can usually find me here though

umbral shell Nov 14, 2025, 10:52 AM

#

mellow vector hey nothing personal but I don't really DM, you can usually find me here though

just wanted to say thanks and show you something

Task 3.4 – Discord UX Features (Idle Tabs, Variables, Beginner Mode)
Priority: P1 – IMPORTANT
Estimated time: 10–12 h

Scope:

– Idle Tabs Manager
  – track the last_accessed timestamp for each tab,
  – configurable threshold in settings (default: 24 h),
  – “notify only” mode (status bar / lightweight popup) + logging,
  – absolute requirement: zero data loss (before any future auto-closing behavior, there must be autosave + a “parking lot” recovery system; for 1.0, notification-only is enough).
– Variables Panel (post-run snapshot)
  – after script execution, generate a simple snapshot of locals() (no debugger),
  – filter out private names (_), provide readable repr() with length limit,
  – separate panel / output tab, read-only.
– Beginner Mode
  – toggle in Settings (e.g. ui/beginner_mode),
  – when enabled:
    – simplified menu (hide advanced options),
    – larger fonts, more tooltips,
    – Variables Panel enabled by default, improved error messages.

Acceptance criteria:

– Idle Tabs Manager gently notifies the user about long-unused tabs (no auto-closing),
– Variables Panel displays a clear, sensible list of variables after script execution,
– Beginner Mode noticeably simplifies the UI (fewer options, more guidance).
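
A minimal sketch of the post-run snapshot described under Variables Panel, assuming the user script is run with `exec` into a fresh namespace dict. The names `snapshot_variables` and `max_len` are hypothetical and not taken from the IDE itself.

```python
def snapshot_variables(namespace, max_len=60):
    """Return {name: short repr} for the public names in a namespace dict."""
    snapshot = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip private names and dunders such as __builtins__
        text = repr(value)
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."  # keep the panel readable
        snapshot[name] = text
    return snapshot


# Run a tiny "user script" in its own namespace, then snapshot it (no debugger).
script = "x = 41 + 1\nname = 'demo'\n_hidden = 0\nbig = list(range(1000))"
namespace = {}
exec(script, namespace)
panel = snapshot_variables(namespace)
for name, text in sorted(panel.items()):
    print(f"{name} = {text}")
```

The snapshot stores strings rather than the objects themselves, which fits the read-only requirement: the panel cannot mutate program state by accident.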

have a nice day ! 🙂

somber willow Nov 14, 2025, 12:32 PM

#

is there someone here who's a beginner and looking to learn together? if yes, dm me

cedar tusk Nov 14, 2025, 2:36 PM

#

somber willow is there someone, here who's looking forward to learn together and is a begginer...

define beginner

#

ive been messing with ai for 5 years now and i still feel like a beginner

somber willow Nov 14, 2025, 2:59 PM

#

cedar tusk ive been messing with ai for 5 years now and i still feel like a beginner

nahh I mean are you still learning libraries such as scikit-learn, mathematics, stats etc

cedar tusk Nov 14, 2025, 3:00 PM

#

somber willow nahh I mean are you still learning libraries such as sciket learn, mathematics s...

nope

#

im learning other stuff

#

and also dont try to learn a library

#

its useless to just learn all the functions defined in a library

#

thats why google exist

somber willow Nov 14, 2025, 3:01 PM

#

cedar tusk its useless to just learn all the functions defined in a library

bro damn🤣🤣

cedar tusk Nov 14, 2025, 3:01 PM

#

just go along with what you are doing and search for stuff when its needed

somber willow Nov 14, 2025, 3:01 PM

#

what are you learning rn

cedar tusk Nov 14, 2025, 3:02 PM

#

decompositions, ode solvers, efficient ways to handle big data such as sparse matrices, comfyui, autoencoders,

somber willow Nov 14, 2025, 3:02 PM

#

cedar tusk decompositions, ode solvers, efficient ways to handle big data such as sparse ma...

ok now how're you gonna apply it

#

I get it you're doing the maths

cedar tusk Nov 14, 2025, 3:02 PM

#

ode solvers are used for image generation

#

decompositions are useful for handling the math of latent space

#

big matrix storage u can probably guess

#

comfyui is for hobby

#

autoencoders are what transform inputs into latent space

somber willow Nov 14, 2025, 3:03 PM

#

nahh how're you gonna apply it

#

like its application

cedar tusk Nov 14, 2025, 3:03 PM

#

i wanna make my own diffusion model

#

along with the vae

somber willow Nov 14, 2025, 3:04 PM

#

cedar tusk along with the vae

vue.js??????

cedar tusk Nov 14, 2025, 3:04 PM

#

lol

#

variational autoencoder

somber willow Nov 14, 2025, 3:05 PM

#

bro, how are you gonna build the model, like actually apply it in an app etc

cedar tusk Nov 14, 2025, 3:05 PM

#

if i make the model, then applying it into an app is very easy

#

making the model is the hard part here

somber willow Nov 14, 2025, 3:06 PM

#

cedar tusk if i make the model, then applying it into an app is very easy

my guy how're you gonna make a model, when you don't know libraries such as tensorflow which are essential for deployment

cedar tusk Nov 14, 2025, 3:06 PM

#

somber willow my guy how're you gonna make a model, when you don't know libraries such as tens...

if i know the math, programming it is easy

#

because i dont have to know the function names, as i know what to do and can find the name from there

#

google it

somber willow Nov 14, 2025, 3:07 PM

#

can I switch to dms, cause I wanna send you a file

cedar tusk Nov 14, 2025, 3:07 PM

#

sure

obsidian talon Nov 14, 2025, 9:09 PM

#

somber willow Like I know that it tells me how and where the model did wrong but how can I fix...

I just saw this. Did you still need help?

carmine vale Nov 15, 2025, 12:29 AM

#

what is the purpose of data science

rich moth Nov 15, 2025, 1:06 AM

#

cedar tusk define beginner

its adapting so fast, it does feel like that.

cedar tusk Nov 15, 2025, 1:08 AM

#

carmine vale what is the purpose of data science

to make data work

rich moth Nov 15, 2025, 1:08 AM

#

I just learned about the Kuramoto model recently.

#

I've been dabbling in multi-agent cognitive swarms.

rich moth Nov 15, 2025, 1:14 AM

#

carmine vale what is the purpose of data science

the way i see it, its purpose lies in organizing and arranging usually messy data so it can be used for ML.

#

there's more to it, but that's the gist these days.

lofty root Nov 15, 2025, 1:53 AM

#

Hi

#

Anyone tried python 3.14 with pytorch, Tensorflow OpenCV?

somber willow Nov 15, 2025, 5:06 AM

#

obsidian talon I just saw this. Did you still need help?

no not anymore

wooden sail Nov 15, 2025, 5:42 AM

#

carmine vale what is the purpose of data science

the purpose is to extract information from data and to interpret it. all of these buzzwords like ML and data science are kinda muddy, unfortunately. for example, data science can involve using AI/ML to process and interpret data. it can also be done with classical methods instead

#

what plunder mentioned is more like data preparation, which is a cleanup process you can do before either of ML and data science

lofty root Nov 15, 2025, 7:05 AM

#

Anyone tried python 3.14 with pytorch, Tensorflow OpenCV?

#

Hi

#

Anyone tried python 3.14 with pytorch, Tensorflow OpenCV?

lime berry Nov 15, 2025, 7:36 AM

#

Hello guys, can someone help me by giving me an idea for my final project? I want it for a hackathon, so I want a great idea that's also easy cuz I'm a beginner. I'm down to learn more for the project 😄

slate trench Nov 15, 2025, 8:29 AM

#

lime berry Hello guys can someone help me by giving me an idea for my final project. I want...

I recommend coming up with your own idea and being proud that it was your own concept, and that you carried out the project yourself from start to finish. Start by creating a mindmap of what interests you and what the possibilities are. It could give you insight into what truly excites you.

#

Also write a proper project plan once the idea and project goal starts to take shape. In my opinion, it's a good way to get started.

wooden sail Nov 15, 2025, 8:44 AM

#

lofty root Anyone tried python 3.14 with pytorch, Tensorflow OpenCV?

3.14 came out fairly recently. on pytorch's site, no support for 3.14 is listed yet

#

it's not uncommon for these libraries and others like numpy to not have support for a handful of months. your best bet is to install 3.13 in a separate environment and use that

lofty root Nov 15, 2025, 8:44 AM

#

Thank you @wooden sail

vale elbow Nov 15, 2025, 8:44 AM

#

as a beginner which one is easier to learn, which one is easier to master? pandas/polars

lofty root Nov 15, 2025, 8:45 AM

#

@wooden sail What do you recommend, vscode or pycharm?

wooden sail Nov 15, 2025, 8:47 AM

#

it depends on how experienced you are. in my personal opinion, learning python, learning a ML module, and learning an IDE are 3 completely different tasks and doing all 3 at the same time will make you learn everything more slowly

#

so if you're new to all, i would probably just use a syntax-highlighting text editor

#

a disproportionate amount of beginner questions in this server have to do with people fighting against vscode and pycharm to get things just to run

#

on a separate note, i do use vscode myself

lofty root Nov 15, 2025, 8:49 AM

#

I've 3 years experience but I'm used to both

vale elbow Nov 15, 2025, 8:49 AM

#

i have the basics of python, i can understand variables, functions, classes, dictionaries, tuples, lists, for loops, while loops, if/else statements, data types (like str int bool float) and i recently learned some pandas

#

i also know a little about numpy and matplotlib but completely new to polars & sklearn

wooden sail Nov 15, 2025, 8:50 AM

#

vale elbow i have the basics of python, i can understand variables, functions, classes, dic...

it kinda depends what you want to do. pandas has better integration with numpy at the moment, because it sits on top of numpy

#

polars is better for large queries because it's faster

#

at least in my head, polars is for handling, moving, and accessing data, but not for any complex processing of the data

#

doing the latter will have you leave polars, transforming the data into something like numpy, torch, etc

vale elbow Nov 15, 2025, 8:51 AM

#

basically i have to learn these libraries at my school

numpy
seaborn
sklearn
pandas
matplotlib

and i am trying to master them so i can take the exam

obsidian talon Nov 15, 2025, 8:51 AM

#

vale elbow basically i have to learn these libraries at my school numpy seaborn sklearn pa...

Focus on pandas

vale elbow Nov 15, 2025, 8:51 AM

#

wooden sail polars is better for large queries because it's faster

now i know why we weren't taught polars, we only needed to work with smaller datasets

wooden sail Nov 15, 2025, 8:52 AM

#

vale elbow basically i have to learn these libraries at my school numpy seaborn sklearn pa...

since you list all of these things, pandas plays way better with everything here than polars does. polars will require extra conversions into types that can be used in mpl, seaborn, numpy, sklearn, etc

obsidian talon Nov 15, 2025, 8:52 AM

#

Polars is great for processing data, but it has no ecosystem whatsoever.

vale elbow Nov 15, 2025, 8:53 AM

#

how much of machine learning does sklearn cover? the python library

#

whats the maximum it can do

obsidian talon Nov 15, 2025, 8:53 AM

#

It covers traditional machine learning

#

Supervised, unsupervised, preprocessing, and pipelines and all.

wooden sail Nov 15, 2025, 8:54 AM

#

a really big amount of the classical statistical optimization/estimation methods, and some basic deep learning methods

vale elbow Nov 15, 2025, 8:54 AM

#

oh right now we're on only supervised learning, the lecturer asked us to finish the datacamp course "supervised learning with scikit-learn" and we have some kind of project but i haven't started to even learn the library yet

obsidian talon Nov 15, 2025, 8:55 AM

#

The majority of supervised learning algorithms in sklearn are hardly used in practice, but it builds the foundation.

obsidian talon Nov 15, 2025, 8:55 AM

#

vale elbow oh right now we're on only supervised learning, the lecturer asked us to finish ...

I looked at the course. Dont recommend it.

#

How much do you know about just basic old linear regression

wooden sail Nov 15, 2025, 8:56 AM

#

obsidian talon The majority of supervised learning algorithms in sklearn are hardly used in pra...

i disagree with this, look at the module

#

there is a large amount of papers being published on these topics and their applications still

obsidian talon Nov 15, 2025, 8:56 AM

#

wooden sail i disagree with this, look at the module

This is scikit?

wooden sail Nov 15, 2025, 8:56 AM

#

and they build the foundation for state of the art model-based neural networks

wooden sail Nov 15, 2025, 8:57 AM

#

obsidian talon This is scikit?

yes

vale elbow Nov 15, 2025, 8:57 AM

#

i know nothing yet i only know theory (like reading) classification, regression, where to use each, confusion matrix, decision trees, test data train data like that

obsidian talon Nov 15, 2025, 8:57 AM

#

They're not meant to be sequential

#

Supervised learning is easier to learn compared to unsupervised

#

But you need unsupervised to eventually optimize your supervised model

#

For feature engineering

obsidian talon Nov 15, 2025, 8:59 AM

#

vale elbow i know nothing yet i only know theory (like reading) classification, regression,...

Decision trees are the foundation of the more "cutting edge" tabular models

#

And they're used extensively in more modern tabular ML algorithms

#

I'd recommend going into linear regression and its regularizations next

#

So L1 (lasso), L2 (ridge), and L1+L2 (elastic net) - these become hyperparameters for many algorithms so it helps to know how they work

#

For classification, I'd do logistic before anything else and multinomial logistic.

#

After that try KNN, then flip a coin between SVMs and Naive Bayes - difficult for different reasons.

#

If youre more of a stats guy, try for naive bayes, if youre a math and CS guy, go for SVMs

#

KNN and SVMs can both be used for regression, but a lot less commonly. You can technically use naive bayes for regression too, but I wouldn't recommend it.

#

Then id move on to decision trees and ensemble methods which sort of rule tabular ML for complex data and relationships

#

So random forest after, then general gradient boosting, XGBoost, LightGBM, and CatBoost

#

And then stacking if you're feeling fancy
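
To make the regularization point above concrete, here is a small sketch of L2 (ridge) regularization using its closed-form solution in NumPy. The data is synthetic and the function name `ridge_fit` is made up for illustration; there is no intercept term, and L1 is left out because it has no closed form and needs an iterative solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 5
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([3.0, 0.0, -2.0, 0.0, 1.0])  # synthetic ground truth
y = X @ true_w + rng.normal(scale=0.5, size=n_samples)


def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


w_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 is ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # a larger lam shrinks the weights

print("OLS   norm:", np.linalg.norm(w_ols))
print("ridge norm:", np.linalg.norm(w_ridge))
```

The penalty strength `lam` is the hyperparameter: it trades a bit of fit on the training data for smaller, more stable weights.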

past meteor Nov 15, 2025, 9:10 AM

#

There's a lot of details that are off with this, we can go into the details if you're willing to 😄

obsidian talon Nov 15, 2025, 9:10 AM

#

But learning preprocessing is a huge must!

#

Which details

past meteor Nov 15, 2025, 9:13 AM

#

obsidian talon Which details

The order in which to learn the algorithms seems off and that's partially because you don't (imo) appropriately highlight why any of them make sense to use in a given context

obsidian talon Nov 15, 2025, 9:14 AM

#

This is similar to how I learned it in my machine learning class last semester.

past meteor Nov 15, 2025, 9:14 AM

#

I don't think it's particularly helpful learning ML as a big box of different algorithms with different names

#

When it's more important to look at the properties of each method, and group them by property, which then maps to the kind of problems they're good at solving

obsidian talon Nov 15, 2025, 9:14 AM

#

It's more about knowing where they fall in the landscape of ML and how these algorithms set the foundation for many others

#

Along the way you should know every model's assumptions/strengths/weaknesses

past meteor Nov 15, 2025, 9:16 AM

#

obsidian talon KNN and SVMs can both be used for regression, but a lot less commonly. You can t...

As an example, SVM is commonly used in many domains. Anything related to EEG/fMRI will likely use (kernel-based) SVMs because they really shine when you have high dimensional data with a limited set of data points

#

In part because, in the dual form, the number of unknowns in the optimization problem they solve is the number of observations and not the number of features

#

Then it's also clear you can't use them on large datasets since you need to make the Gram matrix (size N x N) which may not fit in memory
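
A rough sketch of the Gram matrix argument above, using NumPy only. The helper names `rbf_gram` and `gram_bytes` and the data sizes are made up for illustration; a real SVM library may cache or approximate the kernel rather than store it densely.

```python
import numpy as np


def rbf_gram(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
n_samples, n_features = 40, 10_000  # few observations, many features
X = rng.normal(size=(n_samples, n_features))
K = rbf_gram(X, gamma=1.0 / n_features)
print(K.shape)  # N x N: depends on the number of observations only


def gram_bytes(n_samples, bytes_per_value=8):
    """Memory for a dense N x N float64 Gram matrix."""
    return n_samples * n_samples * bytes_per_value


print(gram_bytes(1_000_000) / 1e12, "TB for one million observations")
```

The same matrix size explains both sides of the argument: 10,000 features cost nothing extra once the kernel is computed, while a million rows make the dense matrix impractical.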

obsidian talon Nov 15, 2025, 9:17 AM

#

Works great for high dimensional data with medium sized data sets. It was used commonly for NLP tasks. It blew up in the 90s.

past meteor Nov 15, 2025, 9:18 AM

#

obsidian talon Works great for high dimensional data with medium sized data sets. It was used c...

I know, but courses will typically just write this stuff without explaining why, but the why is so important :p

obsidian talon Nov 15, 2025, 9:20 AM

#

They're rather obsolete in the sense of using a tabular model instead of a deep learning model that excels with text based/high cardinality/high dimensionality datasets

past meteor Nov 15, 2025, 9:20 AM

#

Not at all

#

You need a lot more data to train deep learning models

obsidian talon Nov 15, 2025, 9:22 AM

#

LLMs alone make an SVM model more novelty than anything else.

past meteor Nov 15, 2025, 9:22 AM

#

You have many, many, many more hyperparameters! (In deep learning everything is a hyperparameter, in RBF SVMs you only have 2 hyperparameters: C and gamma)

obsidian talon Nov 15, 2025, 9:23 AM

#

Industries aren't using SVMs besides niche datasets where its too small for something more complicated but too complex for GLMs

past meteor Nov 15, 2025, 9:23 AM

#

And finally, the optimization problem for SVMs is convex, so both the primal and the dual formulation lead to a global optimum. With neural nets, well, good luck fiddling with parameters and training it over and over

wooden sail Nov 15, 2025, 9:24 AM

#

obsidian talon LLMs alone make an SVM model more novelty than anything else.

deep learning is often the wrong approach for problems. many problems have simple solutions, sometimes even in closed form, where you cannot get any more performance no matter how you try

past meteor Nov 15, 2025, 9:24 AM

#

obsidian talon Industries aren't using SVMs besides niche datasets where its too small for some...

No, they definitely are using them. Where they are appropriate. When they're appropriate, they're simply (one of) the best methods you can apply

wooden sail Nov 15, 2025, 9:24 AM

#

you're underestimating performance and optimality guarantees, which deep learning has very little of

past meteor Nov 15, 2025, 9:24 AM

#

It's up to you to know when simple(r) methods are appropriate and use them

#

Instead of trying to dice a tomato with a chainsaw

obsidian talon Nov 15, 2025, 9:25 AM

#

But realistically, organizations that need to deal with high cardinality like that

#

Theyre using LightGBM

#

Theyre using deep learning models

past meteor Nov 15, 2025, 9:25 AM

#

Not necessarily

obsidian talon Nov 15, 2025, 9:25 AM

#

Yes, you need more data, and companies have plenty of it. Too much of it.

past meteor Nov 15, 2025, 9:25 AM

#

Again, it's not the cardinality

#

it's the amount of data

wooden sail Nov 15, 2025, 9:26 AM

#

obsidian talon Yes, you need more data, and companies have plenty of it. Too much of it.

this is 100% wrong

#

big AI companies have plenty of data. most companies do not have enough to train small models

past meteor Nov 15, 2025, 9:26 AM

#

And when you're working with anything related to bio / human stuff

#

You have so so so little data

#

And these are domains where the most money can be made

obsidian talon Nov 15, 2025, 9:26 AM

#

Im talking corporate level companies

wooden sail Nov 15, 2025, 9:26 AM

#

for reference zestar works in bio applications with ML, and i work in industrial nondestructive testing, also with ML

#

with masters/phd

#

i have yet to collaborate with a company or university that says they have too much data

obsidian talon Nov 15, 2025, 9:27 AM

#

I work as a data scientist

wooden sail Nov 15, 2025, 9:27 AM

#

training with less data is an active research field

#

ML rollout in industry is impeded by lack of data

obsidian talon Nov 15, 2025, 9:27 AM

#

Lack of publicly accessible data

past meteor Nov 15, 2025, 9:28 AM

#

No, even within companies

#

Even if they have data, a lot of it isn't labelled to be used in the context of supervised ML

obsidian talon Nov 15, 2025, 9:29 AM

#

past meteor Even if they have data, a lot of it isn't labelled to be used in the context of ...

Thats where my job comes in

past meteor Nov 15, 2025, 9:30 AM

#

Can I be blunt? 😅

#

There's typically dozens of ways models/approaches can be improved by knowing some more of the theory. I see this at work as well, and models have been demonstrably improved by this.

We can have a cool discussion here, but I feel like I'm talking to a wall haha.

obsidian talon Nov 15, 2025, 9:31 AM

#

I guess I just have a different perspective

past meteor Nov 15, 2025, 9:32 AM

#

It's not different, we can't say 1+1=3 and call that a different perspective imo

obsidian talon Nov 15, 2025, 9:32 AM

#

ML Algorithms that arent great out of the box are very costly and need rapid prototyping

past meteor Nov 15, 2025, 9:33 AM

#

obsidian talon ML Algorithms that arent great out of the box are very costly and need rapid pro...

You know you've just described deep learning?

obsidian talon Nov 15, 2025, 9:33 AM

#

And a lot of the modeling i do is Bayesian modeling

past meteor Nov 15, 2025, 9:33 AM

#

They are not great out of the box if you need a novel architecture because the design space is infinite

#

The whole point of this, is essentially that other methods, simpler ones are great out of the box

obsidian talon Nov 15, 2025, 9:34 AM

#

Simpler ones are great if you have simple data and simple needs.

wooden sail Nov 15, 2025, 9:35 AM

#

https://tenor.com/view/no-gif-23548728


past meteor Nov 15, 2025, 9:35 AM

#

Let's go back to fMRI data, that is definitely not simple data and not used for simple needs. SVMs will still outperform an exotic whatever architecture in most cases

#

On forecasting data exponential smoothing has been shown in large scale studies to be hyper competitive with whatever LSTM people were cooking up

obsidian talon Nov 15, 2025, 9:37 AM

#

You can try benchmarking it against an XGBoost or LightGBM

past meteor Nov 15, 2025, 9:37 AM

#

Yes

#

That's what they did in the survey paper

obsidian talon Nov 15, 2025, 9:37 AM

#

What survey paper

past meteor Nov 15, 2025, 9:37 AM

#

On forecasting methods

obsidian talon Nov 15, 2025, 9:37 AM

#

Forecasting?

#

As in time series?

past meteor Nov 15, 2025, 9:37 AM

#

Yes

#

And I hope you know XGB and LGBM have serious issues for forecasting (another nice theory one)

#

The values in the leaves and nodes are from the training data, correct?

obsidian talon Nov 15, 2025, 9:38 AM

#

Theyre not as commonly used for time series modeling compared to cross sectional data

past meteor Nov 15, 2025, 9:38 AM

#

obsidian talon Theyre not as commonly used for time series modeling compared to cross sectional...

They absolutely are

#

Mostly because they can model seasonality without preprocessing

past meteor Nov 15, 2025, 9:39 AM

#

past meteor The values in the leaves and nodes are from the training data, correct?

But due to this they cannot model trend

obsidian talon Nov 15, 2025, 9:39 AM

#

Most time series models fail horrendously regardless

past meteor Nov 15, 2025, 9:40 AM

#

But a lot of people do not know this, so they employ these models in different scenarios where the real world distribution shifts in a very predictable, constant way (e.g., trend) and they cannot capture this, linear regression can
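
The trend point above can be shown with a toy example. This is a hand-rolled 1-D regression tree in plain Python, not XGBoost or LightGBM, so treat it as a sketch of the mechanism only: every leaf stores a mean of training targets, so no prediction can exceed the largest target seen in training. All function names here are made up for illustration.

```python
def fit_tree(xs, ys, depth):
    """Tiny 1-D regression tree: each leaf stores the mean of its training targets."""
    if depth == 0 or len(xs) < 2:
        return sum(ys) / len(ys)
    best = None
    for i in range(1, len(xs)):  # xs is assumed sorted; try every split point
        left, right = ys[:i], ys[i:]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - mean_l) ** 2 for v in left) + sum((v - mean_r) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    i = best[1]
    threshold = (xs[i - 1] + xs[i]) / 2
    return (threshold, fit_tree(xs[:i], ys[:i], depth - 1), fit_tree(xs[i:], ys[i:], depth - 1))


def predict_tree(node, x):
    """Walk down the tree until a leaf (a plain float) is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x < threshold else right
    return node


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    return mean_y - b * mean_x, b


train_x = list(range(20))
train_y = [2.0 * x + 1.0 for x in train_x]  # pure upward trend

tree = fit_tree(train_x, train_y, depth=4)
a, b = fit_line(train_x, train_y)

future_x = 40  # well outside the training range
tree_pred = predict_tree(tree, future_x)
line_pred = a + b * future_x
print(tree_pred, line_pred, max(train_y))
```

The tree's forecast stays at or below the largest training target, while the line keeps following the trend. Boosted trees sum many such leaves, but the leaf values are still fitted on training targets, which is why detrending or differencing is often applied first.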

past meteor Nov 15, 2025, 9:40 AM

#

obsidian talon Most time series models fail horrendously regardless

They make us. so. much. money.

obsidian talon Nov 15, 2025, 9:41 AM

#

The time series has to be static. No sudden cuts to interest rates, no random politician shenanigans.

past meteor Nov 15, 2025, 9:41 AM

#

If you're using exponential smoothing you're invariant to this

#

If the series suddenly shifts your predictions will also shift

obsidian talon Nov 15, 2025, 9:42 AM

#

For serious time series modeling, they're doing everything from scraping news articles and performing sentiment analysis on the

past meteor Nov 15, 2025, 9:42 AM

#

because the prediction is a moving average
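
A small sketch of the claim above, using simple exponential smoothing written out by hand. The series and the smoothing factor `alpha=0.5` are made up; the function name is illustrative and not from any forecasting library.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast made after each observation."""
    level = series[0]
    forecasts = []
    for value in series:
        # New level = weighted average of the latest value and the old level.
        level = alpha * value + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


series = [10.0] * 20 + [30.0] * 20  # sudden level shift halfway through
forecasts = simple_exponential_smoothing(series, alpha=0.5)

print(forecasts[19])  # just before the shift
print(forecasts[24])  # five steps after the shift
print(forecasts[-1])  # after the series settles at the new level
```

The forecast cannot see the shift coming, but it follows the new level within a few steps because recent observations carry most of the weight.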

obsidian talon Nov 15, 2025, 9:42 AM

#

Creating several lagged features

#

Comparing the data to clusters of portfolios

past meteor Nov 15, 2025, 9:42 AM

#

Forecasting is not just stocks btw

#

It's like, the generic term 😅

obsidian talon Nov 15, 2025, 9:44 AM

#

Its not. One single "cancel culture" tweet would destroy your model and its predictions for next month's concert sales.

#

A single tweet, sudden changing trend, a natural disaster, a political whatever, a pandemic - that model is gone

#

A time series would have to be consistent and stable. These models learn from what they're given. They can't do anything about something none of us saw coming.

past meteor Nov 15, 2025, 9:46 AM

#

Okay, just gonna drop this here and move on 😄 https://otexts.com/fpp3/

Great book on forecasting. Definitely check out chapter 8 which may help on this topic (exponential smoothing)

Forecasting: Principles and Practice (3rd ed)

obsidian talon Nov 15, 2025, 9:47 AM

#

Great for valuing your most recent data points more than your previous ones.

#

Still cant do anything about that Mcdonalds E coli outbreak

#

Time series models overfit more than most others. The second anything significant in that environment changes, it's all out the window.

#

They learn from what they have, but they cant predict or forecast a sudden feature it was never trained on, without a ton of uncertainty.

plush shuttle Nov 15, 2025, 10:59 AM

#

Guys, how do I fine-tune a model

#

on "cpu"

serene scaffold Nov 15, 2025, 11:34 AM

#

plush shuttle on "cpu"

if it's an LLM, you just start doing it and wait for the heat death of the universe

#

the code doesn't look any different than the same code running on a GPU. It just won't finish.

prisma wing Nov 15, 2025, 12:35 PM

#

https://github.com/sdv-dev/SDV

GitHub - sdv-dev/SDV: Synthetic data generation for tabular data

prisma wing Nov 15, 2025, 12:35 PM

#

serene scaffold the code doesn't look any different than the same code running on a GPU. It just...

you could LoRA it 💀
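
For context on the LoRA suggestion, a NumPy sketch of the idea under made-up sizes: the pretrained weight `W` stays frozen and only two small low-rank matrices `A` and `B` are trained. The dimensions, rank and scaling value are illustrative assumptions, and there is no training loop here. This only shows why the trainable parameter count drops; fine-tuning an LLM on a CPU would still be slow.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 512, 512, 8, 16

W = rng.normal(size=(d_out, d_in))             # frozen pretrained weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable, small random init
B = np.zeros((d_out, rank))                    # trainable, starts at zero


def lora_forward(x):
    """y = W x + (alpha / rank) * B (A x); only A and B would receive gradients."""
    return W @ x + (alpha / rank) * (B @ (A @ x))


x = rng.normal(size=d_in)
full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params, lora_params / full_params)
```

Because `B` starts at zero, the adapted layer initially gives exactly the same output as the frozen layer, and training only has to learn the low-rank correction.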

plush shuttle Nov 15, 2025, 1:12 PM

#

serene scaffold the code doesn't look any different than the same code running on a GPU. It just...

i tried, doesn't work

#

want the code then ping me

serene scaffold Nov 15, 2025, 1:13 PM

#

plush shuttle i tried dosent work

It's important to never say that something "didn't work". That doesn't give anyone any useful information. You have to say what you did, what you expected it to do, and what actually happened.

lofty root Nov 15, 2025, 3:42 PM

#

Can anyone suggest CV or NLP project ideas?

plush shuttle Nov 15, 2025, 4:11 PM

#

serene scaffold It's important to never say that something "didn't work". That doesn't give anyo...

📎 main.txt

clever hollow Nov 15, 2025, 4:38 PM

#

Hi, Is there anyone here, aged 14–25, who is interested in AI/ML?

serene scaffold Nov 15, 2025, 4:38 PM

#

plush shuttle

You deleted the pastebin entry? I'm not available right now, but it would have been useful for other people

serene scaffold Nov 15, 2025, 4:38 PM

#

clever hollow Hi, Is there anyone here, aged 14–25, who is interested in AI/ML?

There are lots of people here who are interested in that. Why that age range?

clever hollow Nov 15, 2025, 4:41 PM

#

serene scaffold There are lots of people here who are interested in that. Why that age range?

just trying to find people my age to connect

serene scaffold Nov 15, 2025, 4:42 PM

#

clever hollow just trying to find people my age to connect

Does that mean we can't be friends? lemon_sentimental

clever hollow Nov 15, 2025, 4:43 PM

#

serene scaffold Does that mean we can't be friends? lemon_sentimental

of course we can 😄

lofty root Nov 15, 2025, 4:48 PM

#

clever hollow Hi, Is there anyone here, aged 14–25, who is interested in AI/ML?

Yes

#

@clever hollow I'm interested

clever hollow Nov 15, 2025, 4:53 PM

#

lofty root Yes

hi nice to meet you

#

are u a university student

lofty root Nov 15, 2025, 4:54 PM

#

Yes

#

I'm BSCS graduate

lofty root Nov 15, 2025, 4:59 PM

#

clever hollow hi nice to meet you

Nice to meet you too

clever hollow Nov 15, 2025, 5:02 PM

#

lofty root I'm BSCS graduate

How was your experience studying CS

lofty root Nov 15, 2025, 5:02 PM

#

Amazing

#

I'm working in Computer Vision and Natural language processing

clever hollow Nov 15, 2025, 5:04 PM

#

lofty root Amazing

oh nice im into those too got any projects u are working on

lofty root Nov 15, 2025, 5:05 PM

#

Let's chat in DM

clever hollow Nov 15, 2025, 5:06 PM

#

ok

gritty vessel Nov 15, 2025, 7:00 PM

#

Hey, when working on a spatio-temporal problem, let's say I take inputs t1 to t6 and predict t7 to t9, then take t2 to t7 and predict t8 to t10, and so on. Will this result in data leakage, since the samples overlap?

#

Or should I build samples like this: inputs t1 to t6 with targets t7 to t9, then the next sample is t10 to t15 with targets t16 to t18?
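
A sketch of both windowing schemes from the question, in plain Python with made-up sizes. The usual view, stated here as an assumption rather than a rule for every setup, is that overlapping windows inside the training set are fine, and the leak comes from a random split that puts overlapping windows on both sides. Splitting by time avoids that. `make_windows` and `split_step` are illustrative names.

```python
def make_windows(n_steps, input_len, target_len, stride):
    """Return (input_indices, target_indices) pairs over time steps 0..n_steps-1."""
    windows = []
    start = 0
    while start + input_len + target_len <= n_steps:
        inputs = list(range(start, start + input_len))
        targets = list(range(start + input_len, start + input_len + target_len))
        windows.append((inputs, targets))
        start += stride
    return windows


n_steps = 40
overlapping = make_windows(n_steps, input_len=6, target_len=3, stride=1)
disjoint = make_windows(n_steps, input_len=6, target_len=3, stride=9)

# Split by time, not at random: training windows must end before the first
# validation window starts, otherwise the same time steps land on both sides.
split_step = 30
train = [w for w in overlapping if w[1][-1] < split_step]
valid = [w for w in overlapping if w[0][0] >= split_step]

train_steps = {t for inputs, targets in train for t in inputs + targets}
valid_steps = {t for inputs, targets in valid for t in inputs + targets}
print(len(overlapping), len(disjoint), len(train), len(valid))
print(train_steps & valid_steps)  # no shared time steps between the two sets
```

Stride 1 yields far more training samples from the same series than the disjoint scheme, which usually matters for a model like ConvLSTM. Windows that straddle `split_step` are simply dropped.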

ornate trellis Nov 15, 2025, 11:20 PM

#

hey, can anybody help me develop skills in data science and ai? i am a 2nd year B.Tech student, done with python, NumPy, pandas, matplotlib and seaborn, and going to start ml. Please give me a step-by-step process to learn it, plus any suggestions

obsidian talon Nov 15, 2025, 11:32 PM

#

gritty vessel Hey when working on spatio temporal problem let's say I take input t1 to t6 and...

what model are you using

obsidian talon Nov 15, 2025, 11:33 PM

#

ornate trellis hey can anybody help me to develop skill in data science and ai i am in 2nd year...

Would be happy to help.

wispy haven Nov 16, 2025, 12:32 AM

#

Hi everyone! I'm currently engaged in deep analytical research focused on the Oslo Bysykkel Open Data. My main goal is to extract maximum educational and practical value from this data asset. I've developed several distinct concepts for structuring this work, and I'd love to share the vision and gather feedback. I'd love to connect with anyone interested in discussing these concepts, collaborating on content, or just exchanging insights on the Oslo data. Feel free to send me a DM or comment below! Thanks

mellow vector Nov 16, 2025, 5:49 AM

#

past meteor They make us. so. much. money.

I like making money. like. a lot. That book you recommended, would it be a valuable read? Are there others you can recommend?

mellow vector Nov 16, 2025, 6:08 AM

#

Spent about 3-400 hours studying ML math and inner workings in the last year, for context. As a professional DA I recognize that deep learning isn't usually the right tool for the job, but I figure the math will be valuable as I try to reenter the job market.

plush cloud Nov 16, 2025, 6:14 AM

#

Hi everyone! I’m currently exploring Data Science through a course and have just started a GitHub repo to map out learning paths. It’s a collaborative space, and I’d love to include others who are passionate about DS — whether you're experienced or just starting out. If you'd like to contribute helpful resources or insights, feel free to DM me for an invite. Let’s learn and grow together

gritty vessel Nov 16, 2025, 11:22 AM

#

obsidian talon what model are you using

Convlstm

mint shard Nov 16, 2025, 3:22 PM

#

plush cloud Hi everyone! I’m currently exploring Data Science through a course and have just...

heard data science is just statistics is that ryt?

plush cloud Nov 16, 2025, 4:32 PM

#

mint shard heard data science is just statistics is that ryt?

Not exactly, DS includes statistics, but it’s much more than that. It also involves programming, machine learning, data wrangling, and storytelling through visualizations. Statistics helps us understand patterns, but data science uses that understanding to build models, solve problems, and make smart decisions with data.

serene scaffold Nov 16, 2025, 4:33 PM

#

Hi @grizzled tartan . It actually takes an exceptional amount of computing power to create LLMs. Like it literally costs millions of dollars each time. You should start by learning how to train simpler models.

grizzled tartan Nov 16, 2025, 5:09 PM

#

serene scaffold Hi @grizzled tartan . It actually takes an exceptional amount of computing ...

thanks for understanding I am new to this field

snow fog Nov 17, 2025, 3:32 AM

#

Hi everyone, I have created a fuzzer to fuzz test MCP servers. It's mostly helpful if you're using a compiled language to create an MCP server, as it would help detect crashes and other probable resource issues, and also if you're writing your own custom MCP protocol implementation. It's not tested thoroughly, as you can see from the issue https://github.com/Agent-Hellboy/mcp-server-fuzzer/issues/108

please use this on your server and help me test it, could be a helpful project to the community.

GitHub

Documentation/error improvement request on running with bearer toke...

Hello and thanks for putting together this tool! I am trying to run the fuzzer against a local server requiring a bearer token and was not able to figure this out from the current docs originally: ...

past meteor Nov 17, 2025, 8:09 AM

#

mellow vector I like making money. like. a lot. That book you recommended, would it be a valu...

Check out the pinned comments

full ravine Nov 17, 2025, 10:20 AM

#

Hello guys

#

Would love to know your opinion on this project

#

https://github.com/awesome-open-source-projects/llm-data-forge

GitHub

GitHub - awesome-open-source-projects/llm-data-forge: A project aim...

A project aiming to produce high quality data for LLMs to train on - awesome-open-source-projects/llm-data-forge

woven prairie Nov 17, 2025, 11:10 AM

#

My question is - Can we make subgraphs inside the main graph sharing the same state of the main graph in langgraph ?

brazen jungle Nov 17, 2025, 11:24 AM

#

Guys.... Where to start?

rich moth Nov 17, 2025, 11:29 AM

#

brazen jungle Guys.... Where to start?

With what? thinkfast

rich moth Nov 17, 2025, 11:41 AM

#

full ravine Hello guys

Im working on something a bit like that in nature, but im working toward a crypto economic provenance layer. Its got a unique complexity metric Ive been working on for a few years. But it also uses merkle anchored commitments on Polygon zkevm, basically ZK data availability layer.

rich moth Nov 17, 2025, 11:43 AM

#

full ravine https://github.com/awesome-open-source-projects/llm-data-forge

Looks a bit AI generated. I'd be the last to knock you for that, but presentation. But I also dont see any code. Just your LICENSE and README. Its not even python related, it looks like AI slop honestly lol.

still prairie Nov 17, 2025, 12:01 PM

#

Hello my name is Taha, nice to meet you! -> likedin/in/tahayacine
If you are a strong CS/AI/Data Undergrad or Masters, and strongly interested in AI research and just starting or just started, DM me, I am starting an initiative together.

placid pine Nov 17, 2025, 12:05 PM

#

Hello everyone, I'm new to python, and now for the specific what i do is learn about data and be a data scientist, but now i'm really confused what should I learn next after playing with the python.. should I continue to learn about sql? or do I need to learn another thing that relate to a data? welp, no idea.. so i just wanna ask something, what the next thing should I learn to be a data scientist?

prisma wing Nov 17, 2025, 12:30 PM

#

uhm

grand minnow Nov 17, 2025, 1:18 PM

#

placid pine Hello everyone, I'm new to python, and now for the specific what i do is learn a...

learn pandas and numpy. you could learn SQL if you are going to get and check data in a SQL database.

placid pine Nov 17, 2025, 1:21 PM

#

grand minnow learn pandas and numpy. you could learn SQL if you are going to get and check da...

i already learn about sql.. but about pandas and numpy, i just know about the basic.. but, does pandas only make a graphic of the data? or can it do smth else?

grand minnow Nov 17, 2025, 1:23 PM

#

placid pine i already learn about sql.. but about pandas and numpy, i just know about the ba...

I could be wrong but I don't think pandas does graphic charts for data. That's normally done with another library like seaborn or matplotlib

grand minnow Nov 17, 2025, 1:23 PM

#

placid pine i already learn about sql.. but about pandas and numpy, i just know about the ba...

Even if it did charts, you can do more than that

#

like data cleanup or reshaping, resizing, additional columns, drop NAs, etc
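A small sketch of the operations listed above (drop NAs, add a column, reshape), using a made-up table; the column names and values are illustrative assumptions, not data from the chat. As a side note, pandas does expose a `DataFrame.plot` method that hands the drawing off to matplotlib by default, so the charting itself still comes from a plotting library.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value.
df = pd.DataFrame({
    "store": ["north", "north", "south", "south"],
    "year": [2024, 2025, 2024, 2025],
    "sales": [120.0, np.nan, 80.0, 95.0],
})

# Cleanup: drop rows where the value we care about is missing.
clean = df.dropna(subset=["sales"]).copy()

# Additional column derived from an existing one.
clean["sales_k"] = clean["sales"] / 1000

# Reshaping: one row per store, one column per year.
wide = clean.pivot(index="store", columns="year", values="sales")

print(clean)
print(wide)
```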

placid pine Nov 17, 2025, 1:23 PM

#

grand minnow I could be wrong but I don't think pandas does graphic charts for data. That's n...

or maybe i'm the one who wrong here 😅

placid pine Nov 17, 2025, 1:24 PM

#

grand minnow I could be wrong but I don't think pandas does graphic charts for data. That's n...

oh yes, the seaborn one who made the chart of a data

unkempt apex Nov 17, 2025, 1:53 PM

#

full ravine https://github.com/awesome-open-source-projects/llm-data-forge

AI slop?

unkempt apex Nov 17, 2025, 1:53 PM

#

rich moth Looks a bit AI generated. I'd be the last to knock you for that, but presentati...

yea it is

mellow vector Nov 17, 2025, 2:01 PM

#

placid pine oh yes, the seaborn one who made the chart of a data

Data science is a lot of statistics and data interpretation, much of which isn't performed programmatically. As agent mentioned, a dataframe library is the primary tool in their stack, personally I prefer polars over pandas. Corey Schafer goes over pandas and is highly regarded by the community if you're looking for some educational material.

#

As for charting, multiple sources have recommended plotly express as the first library to reach for. Past that there are many options, you may also benefit from Marimo, a notebook interface for *.py files with integrated visualization.

placid pine Nov 17, 2025, 2:13 PM

#

mellow vector Data science is a lot of statistics and data interpretation, much of which isn't...

ohh I see.. thanks for the information, I'll start to find out about it soon 😄

hard kettle Nov 17, 2025, 3:38 PM

#

hello guys, im currently looking for data science github repositories made by seniors, i want to know how seniors make projects

#

how can i find one?

lime grove Nov 17, 2025, 6:18 PM

#

charting: plotly or matplotlib. Roughly equivalent, but I do not like how plotly has a sales component to it. Makes me feel sullied.

#

also note that you can use gnuplot from the command line if you want to do something really quick and dirty. I feel that gnuplot is often overlooked

#

Nearly 40 year old plotting tool.
http://www.gnuplot.info/

#

it has a simple and easy to learn scripting language

subtle lotus Nov 17, 2025, 11:11 PM

#

hard kettle hello guys, im currently looking for data science github repositories made by s...

take a look here http://terence-lim.github.io/docs/financial-data-science-notebooks/README.html

orchid light Nov 18, 2025, 5:19 AM

#

what kind of zesty finetuning were they doing for grok 4.1

errant bison Nov 18, 2025, 8:24 AM

#

Which is the best and most scalable tool for complex workflows?

rich moth Nov 18, 2025, 8:44 AM

#

Anyone in here read about complexity-aware embeddings yet?

#

i had an idea..

gloomy dirge Nov 18, 2025, 8:52 AM

#

what is it

rich moth Nov 18, 2025, 8:59 AM

#

take a sentencetransformer that has this built in complexity sense, thats driven by my UCF tool. But the idea is to put a small head that projects embeddings into a 5D "UCF" space (N, A, ϵ, cosθ, sinθ), then.. maybe.. train it so that this subspace matches the analytic UCF while still doing normal semantic embeddings. something like that

#

one embedding, but two views

gloomy dirge Nov 18, 2025, 9:10 AM

#

rich moth take a sentencetransformer that has this built in complexity sense, thats driven...

so what next

dusty violet Nov 18, 2025, 9:11 AM

#

hi guys, im kind of obsessed with reproducibility (nix user 😔 ) and i was wondering if there is a library i can use for downloading datasets that:

caches the data on disk, so it's downloaded only once
asserts that the checksum of the file matches a given hash, so it's clear if the source data ever changes
returns a path to the downloaded/cached file

basically im looking for something similar to fetcher derivations in nix, but as a python library

fetchurl {
  url = "https://www.kaggle.com/api/v1/datasets/download/hojjatk/mnist-dataset";
  hash = "sha256-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
}

rich moth Nov 18, 2025, 9:30 AM

#

gloomy dirge so what next

hell if i know lol still thinking about it

jaunty helm Nov 18, 2025, 11:10 AM

#

lime grove charting: plotly or matplotlib. Roughly equivalent, but I do not like how plotly...

matplotlib has an interesting API that I can't stand honestly
I'm in the minority here but I like hvplot a bit (which is another abstraction layer on top of the plotting libs like matplotlib, plotly, bokeh)

twilit prism Nov 18, 2025, 11:40 AM

#

rich moth take a sentencetransformer that has this built in complexity sense, thats driven...

what are the odds. you're saying, have an origin which you use theta to cluster, and then use phi/theta/distance from 0,0,0 to find semantic clusters. Noice.
That's what I'm doing.

mellow vector Nov 18, 2025, 11:41 AM

#

jaunty helm matplotlib has an *interesting* API that I can't stand honestly I'm in the minor...

hvplot looks really cool, do you find yourself using it's interactive functionality? I recently switched to vega-altair and marimo but haven't ever put any real dashboards together. I guess I'm just wondering how many of these bells and whistles I'd benefit from with a more holistic familiarity.

twilit prism Nov 18, 2025, 11:42 AM

#

rich moth take a sentencetransformer that has this built in complexity sense, thats driven...

if you use a 30k vocab, that mapping reduces search space down to 7k or lower. You can do ALOT better than that though. It's not that helpful, but just enough to leverage it.

woven prairie Nov 18, 2025, 2:35 PM

#

Does anyone have the experience of using langgraph

serene scaffold Nov 18, 2025, 2:38 PM

#

woven prairie Does anyone have the experience of using langgraph

what would you ask that person? it's faster for everyone if you ask your actual question.

woven prairie Nov 18, 2025, 2:42 PM

#

Yes will do it

waxen kindle Nov 18, 2025, 4:23 PM

#

woven prairie Yes will do it

Well, do it...

jaunty helm Nov 18, 2025, 5:08 PM

#

mellow vector hvplot looks really cool, do you find yourself using it's interactive functional...

yeah pretty often
I mean I just find it usually nice to be able to drag around, zoom in/out, hover and get tooltips, etc
the interactivity just depends on the backend you're using
(I'd say I'm more of a hobbyist than a good data scientist, so take the above with big grains of salt)

agile cobalt Nov 18, 2025, 5:39 PM

#

mellow vector hvplot looks really cool, do you find yourself using it's interactive functional...

with marimo, you can also kind of use plots as inputs
e.g. select a region of a scatter plot then retrieve it as a dataframe
some examples: official docs, notebook I made a while ago

it is nice for data exploration and could be useful for some dashboards

dusty violet Nov 18, 2025, 6:54 PM

#

dusty violet hi guys, im kind of obsessed with reproducibility (nix user 😔 ) and i was wonde...

i found what i was looking for: https://pypi.org/project/pooch/

import pooch

file_path = pooch.retrieve(
    # URL to my data
    url="https://github.com/org/project/raw/v1.0.0/data/test_image.jpg",
    known_hash="sha256:50ef9a52c621b7c0c506ad1fe1b8ee8a158a4d7c8e50ddfce1e273a422dca3f9",
)

apparently it's pretty widely used (in packages like scipy, scikit, histolab, etc)
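For readers curious what the three requirements (cache on disk, verify a checksum, return a path) amount to, here is a standard-library sketch. It is not how pooch is implemented; `fetch_cached` and its parameters are hypothetical names for illustration only.

```python
import hashlib
import urllib.request
from pathlib import Path

def fetch_cached(url, sha256, cache_dir, filename):
    """Download url once into cache_dir, verify its SHA-256, return the path."""
    path = Path(cache_dir) / filename
    if not path.exists():
        # Only hit the network when the file is not cached yet.
        path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response:
            path.write_bytes(response.read())
    # Always check the hash, so a changed or corrupted file is noticed.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != sha256:
        raise ValueError(f"hash mismatch for {path}: got {digest}")
    return path
```

A maintained library like pooch is likely the better choice in practice. The sketch reads the whole file into memory and does no retrying.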

supple plover Nov 18, 2025, 7:58 PM

#

Hi . I have recently started a master's course in machine learning and data science. I was hoping to find out if there was anyone on this channel that does or might be interested in tutoring/having a ML concepts discussion...basically to help with discussing doubts on basic ML concepts. Feel free to DM if anyone is interested.

waxen kindle Nov 18, 2025, 8:49 PM

#

supple plover Hi . I have recently started a master's course in machine learning and data scie...

I don't know if you'll find someone, but in case you don't, feel free to ask your questions and doubts regarding ML and DS in this channel

twilit prism Nov 19, 2025, 12:58 AM

#

supple plover Hi . I have recently started a master's course in machine learning and data scie...

Isn't it pretty much just:

prediction by pre-training
prediction by reward

using things like Q-learning, where you have a state, and after each action you adjust the value of that action by how far/close it got to guessing the correct route, so the weights on the best action to choose are altered and it can make a different decision on the next test run, as a way to permutate every possible action given the state constraints, adjusting each run until it converges on what's good by getting the right answer each time?

And if you take all possible states that could be permutated, and scope them to a smaller set of values to split up the states assessed for the learning process, you have a network/framework of these q-learning instances working within their own state-space, so you dont permutate something like 800 points, and it cuts it down to like 60 (60 is alot, idk of an example right now, but I try to get it to 10 or less) per learning instance.

There ya go, ML.
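For reference, a minimal tabular Q-learning sketch of the "state, action, adjust the weights" loop described above. The five-state corridor, the reward and the hyperparameters are made-up assumptions, and this is the textbook update rule rather than the speaker's own multi-instance framework.

```python
import random

N_STATES = 5            # states 0..4; reaching state 4 ends the episode
ACTIONS = [-1, +1]      # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(2)             # explore
            else:
                best = max(q[state])             # exploit, ties broken at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

q_table = train()
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(policy)
```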

hard kettle Nov 19, 2025, 2:10 AM

#

subtle lotus take a look here http://terence-lim.github.io/docs/financial-data-science-notebo...

ty

tired wedge Nov 19, 2025, 3:05 AM

#

What do I do to get experience that has an effect on the outside world so that I can turn python into something that makes me money. I have thought about process automation but I do not know where to reach people.

subtle lotus Nov 19, 2025, 3:09 AM

#

tired wedge What do I do to get experience that has an effect on the outside world so that I...

honestly, there are 2 main paths: freelancing or actually getting a job

doesn't even have to be an IT job. anything Excel-related you can use Python to automate and impress your boss works too

rich moth Nov 19, 2025, 4:59 AM

#

tired wedge What do I do to get experience that has an effect on the outside world so that I...

Build a trading bot.

tired wedge Nov 19, 2025, 5:04 AM

#

What do you guys think about the longevity of data science?

vivid flicker Nov 19, 2025, 10:51 AM

#

What bot should i make

waxen kindle Nov 19, 2025, 3:04 PM

#

Whatever you need

bronze wyvern Nov 19, 2025, 7:14 PM

#

Hello, anyone familiar with Roboflow here pls...I have a folder with my images and labels from txt file, anyone knows how I can upload that in roboflow? I can only upload a single folder at a time, do I need a json file or something linking each image to a label or something like that?

rich moth Nov 19, 2025, 9:38 PM

#

damn my trading bot is destroying it. its up $1800 bucks in 40 hours. pretty proud of myself though, between the model and the trading bot itself it's taken me at least a year and half to make it this far.

rich moth Nov 19, 2025, 10:19 PM

#

tired wedge What do you guys think about the longevity of data science?

We will probably see it evolve with new tools and that kind of stuff, but i dont see it going anywhere. Not as long as AI is around.

#

But you wont need nearly as many as we currently probably do.

vale elbow Nov 20, 2025, 12:00 AM

#

I know nothing about trading

rich moth Nov 20, 2025, 2:23 AM

#

vale elbow I know nothing about trading

why not start now ?

vale elbow Nov 20, 2025, 2:49 AM

#

rich moth why not start now ?

Maybe in 3 or 4 years

#

Too busy with school to start now

slate trench Nov 20, 2025, 7:56 PM

#

rich moth damn my trading bot is destroying it. its up $1800 bucks in 40 hours. pretty p...

Nice work! Though 40 hours is still a pretty short sample, so let’s see where this stands in 10 years. As Taleb would remind us, this could easily be a Black Swan event 😉

rich moth Nov 20, 2025, 8:42 PM

#

slate trench Nice work! Though 40 hours is still a pretty short sample, so let’s see where th...

I restarted it last night and started it with 5k this time. Seemed like a perfect time to test its grit. Financial markets are on fire.

pliant venture Nov 21, 2025, 2:54 AM

#

Can yall help me with a data leakage problem

left tartan Nov 21, 2025, 2:59 AM

#

pliant venture Can yall help me with a data leakage problem

Just ask the whole question, so people can answer if they can.

pliant venture Nov 21, 2025, 3:02 AM

#

Ok, I have a data leakage problem in this code where im getting a 1 for the score (it may be overfitting but I doubt it), here's my code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

ts = pd.read_csv("/Users/arhaann/Documents/code/Python/Titanic Survival.csv")
s = pd.read_csv("/Users/arhaann/Documents/code/Python/Survive.csv")
ts['Age'] = ts['Age'].fillna(ts['Age'].median())
ts['Fare'] = ts['Fare'].fillna(ts['Fare'].median())
le_sex = LabelEncoder()
le_embarked = LabelEncoder()
#Female is 0, Male is 1
ts['Sex'] = le_sex.fit_transform(ts['Sex'])
#C is 0, Q is 1, S is 2
ts['Embarked'] = le_embarked.fit_transform(ts['Embarked'])
ts['Family_Size'] = ts['SibSp'] + ts['Parch'] + 1
ts['Family_Size'] = ts['Family_Size'].fillna(ts['Family_Size'].median())
ts['Survived'] = s['Survived']
ts = ts.sample(frac=1, random_state=41).reset_index(drop=True)
x = ts[['Pclass', 'Sex', 'Age', 'Embarked', 'Family_Size', 'Fare']]
y = ts['Survived']
print(ts.head(10))
print(s.head(10))
gbr = GradientBoostingClassifier()
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1000)
gbr.fit(x_train, y_train)
print(cross_val_score(gbr, x_train, y_train, cv = 3, n_jobs=-1).mean())
param_grid = {
    'n_estimators': [100, 200, 300, 500],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7]
}
gbr2 = GridSearchCV(gbr, param_grid, cv = 3, n_jobs=-1)
gbr2.fit(x_train, y_train)
y_pred = gbr.predict(x_test)
print("Base Model Accuracy:", accuracy_score(y_test, y_pred))
best_model = gbr2.best_estimator_
y_pred_best = best_model.predict(x_test)
print("Best Model Accuracy:", accuracy_score(y_test, y_pred_best))

limpid zenith Nov 21, 2025, 3:22 AM

#

!code

arctic wedgeBOT Nov 21, 2025, 3:22 AM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

opaque condor Nov 21, 2025, 5:54 AM

#

Does anyone have any kids stories they can type to my ai? (Chatgpt coded it and I have been playing with it)

worldly dawn Nov 21, 2025, 6:33 AM

#

opaque condor Does anyone have any kids stories they can type to my ai? (Chatgpt coded it and ...

would something like https://github.com/galaxykate/tracery help?

quasi echo Nov 21, 2025, 8:44 AM

#

Hey guys it's nice to have you all here , today i joined this discord community and it's really exciting for me to be here .

Is anybody among you guys familiar with the libraries that one needs to learn to be a data analyst? I'm actually pretty confused.

copper kindle Nov 21, 2025, 1:08 PM

#

quasi echo Hey guys it's nice to have you all here , today i joined this discord community ...

As a Data Scientist I hope I can answer this for a Data Analyst.

The learning never stops.

If you are a starter then:

Pandas
NumPy
Matplotlib
SciPy
statsmodels
Scikit-learn

Data Analyst Roadmap

roadmap.sh

Data Analyst Roadmap

Step by step guide to becoming an Data Analyst in 2025

atomic magnet Nov 21, 2025, 2:03 PM

#

hey guys can you gimme advice i know nothing bout AI (i know python. c. java tho) will this book help me like build at least small language models. and what videos and media you recommend other than this. rlly appreciated.

agile cobalt Nov 21, 2025, 2:20 PM

#

atomic magnet hey guys can you gimme advice i know nothing bout AI (i know python. c. java tho...

building a useful language model from scratch requires an absurd amount of data and compute, even a relatively "small" one
fine tuning existing models is not as bad though, take a look at Hugging Face or Unsloth's documentation

slate trench Nov 21, 2025, 2:57 PM

#

pliant venture Ok, I have a data leakage problem in this code where im getting a 1 for the scor...

label leak from the second csv, wrong gridsearch attribute, evaluating the wrong model. All humming together like a quiet bureaucratic nightmare.

lime berry Nov 21, 2025, 3:50 PM

#

slate trench I recommend coming up with your own idea and being proud that it was your own co...

Alright thanks dude

#

Hello guys can you give ideas for data science project for a hackathon

mint flume Nov 21, 2025, 6:53 PM

#

Some guidance for data science please

subtle lotus Nov 21, 2025, 6:57 PM

#

mint flume Some guidance for data science please

I'd start here forsure https://pythonprogramming.net/

rich moth Nov 21, 2025, 8:15 PM

#

atomic magnet hey guys can you gimme advice i know nothing bout AI (i know python. c. java tho...

maybe you can start with LORA model if you have a decent gaming rig.

#

I saw this recently. https://github.com/unslothai/unsloth

Has anyone seen unsloth before? Looks really awesome

GitHub

GitHub - unslothai/unsloth: Fine-tuning & Reinforcement Learning fo...

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM. - unslothai/unsloth

slate trench Nov 22, 2025, 8:12 AM

#

Cool project, but FYI this isn’t an LLM. It’s basically a rule-based NLP engine, not a neural language model. Also, real question: is there even a single line of code in here that you actually wrote yourself?

#

Eager but completely uneducated "developers" are killing all open source work. My time was wasted again for 10 minutes. Open source is doomed.

worldly dawn Nov 22, 2025, 8:18 AM

#

Not the place to shitpost

#

you will be better off in one of the off topic channels like #ot0-psvm’s-eternal-disapproval

#

<@&831776746206265384> shitposting, aggressive

zenith nova Nov 22, 2025, 8:23 AM

#

!mute 1426730370665152683

arctic wedgeBOT Nov 22, 2025, 8:23 AM

#

:incoming_envelope: :ok_hand: applied timeout to @devout pivot until <t:1763803386:f> (1 hour).

cedar carbon Nov 22, 2025, 10:00 AM

#

A cli tool to search for models and datasets on the hf hub.

Features

Search Models: Find models by keywords, author, tags, or task
Search Datasets: Find datasets by keywords, author, or tags
Export Results: Export search results to CSV or TXT files
Beautiful Output: Formatted terminal output with Rich
Python API: Use as a library in your Python projects

pip install hfsearch

drop a star🌟 if you like it https://github.com/HenokB/hfsearch

GitHub

GitHub - HenokB/hfsearch: cli to search for models and datasets on ...

cli to search for models and datasets on the Hugging Face Hub. - HenokB/hfsearch

oak sentinel Nov 22, 2025, 10:11 AM

#

Hi everyone,
I’m working on a sign-language classification project in TensorFlow and I need some advice because my accuracy is very low. I have used WLASL100 and WLASL1000, and I also tried using only the top 10 most frequently recorded words, but that didn’t improve accuracy much. I excluded the face keypoints and only used body, hands, and arms (with MediaPipe), which helped a little but didn’t solve the problem. My model is a small BiLSTM network with two layers (64 and 32 units), followed by a dense layer and a softmax output. For training, I used class weighting, early stopping, and learning rate reduction. Sequences are padded to the maximum length in the dataset, and I do a stratified train/validation split.
I’m wondering what I could do to improve accuracy, as my deadline is tomorrow. Should I switch to a different architecture or dataset? Any advice would be very helpful!
Thanks a lot!

waxen kindle Nov 22, 2025, 3:40 PM

#

maybe try some bigger network ?

bronze wyvern Nov 22, 2025, 3:57 PM

#

Hello, anyone knows is there is some kind of object detection models that detect letters trained on a high volume of data?

I want to make a program that will identify letters and based on that identify a whole word and perform some other logic

serene scaffold Nov 22, 2025, 4:05 PM

#

bronze wyvern Hello, anyone knows is there is some kind of object detection models that detect...

are you sure you're not asking about optical character recognition?
they usually use BiLSTMs for error correction, since those can take the left and right context into account.

bronze wyvern Nov 22, 2025, 5:01 PM

#

serene scaffold are you sure you're not asking about optical character recognition? they usually...

oh interesting, I will read a bit about that and come back

bronze wyvern Nov 22, 2025, 8:18 PM

#

serene scaffold are you sure you're not asking about optical character recognition? they usually...

by the way, do you have any suggestion about any open source library that I can use to perform OCR in python pls

agile cobalt Nov 22, 2025, 8:23 PM

#

bronze wyvern by the way, do you have any suggestion about any open source library that I can ...

the most popular was Tesseract for years, nowadays visual language models are also being used in some cases though
e.g. https://github.com/deepseek-ai/DeepSeek-OCR

GitHub

GitHub - deepseek-ai/DeepSeek-OCR: Contexts Optical Compression

Contexts Optical Compression. Contribute to deepseek-ai/DeepSeek-OCR development by creating an account on GitHub.

bronze wyvern Nov 22, 2025, 8:24 PM

#

agile cobalt the most popular was [Tesseract](<https://pypi.org/project/pytesseract/>) for ye...

yeahh, I have made use of tesseract, was very helpful, but if there are newer libraries, would really love to use them

#

The deepSeek OCR is a model that can be downloaded?

agile cobalt Nov 22, 2025, 8:25 PM

#

bronze wyvern The deepSeek OCR is a model that can be downloaded?

yes, the readme contains all commands and code necessary to install the dependencies, download the model and run it locally

bronze wyvern Nov 22, 2025, 8:26 PM

#

yep noted, ty !

agile cobalt Nov 22, 2025, 8:27 PM

#

note that deepseek is much larger and more compute intensive than traditional methods like tesseract though

bronze wyvern Nov 22, 2025, 8:27 PM

#

yeah I guess😭 , for my use case it's really for minimalistic thing, I think I will switch to a lighter thing

#

I noticed there is a library called EasyOCR, have you used it?

agile cobalt Nov 22, 2025, 8:29 PM

#

~~iirc it's just a wrapper around other libraries~~ never mind, must be confusing with another one
no

copper kindle Nov 22, 2025, 9:43 PM

#

bronze wyvern I noticed there is a library called `EasyOCR`, have you used it?

I tried it, though I would not recommend it if your background is colored. I had to do a lot of rule based amendments to the EasyOCR results

crimson tulip Nov 23, 2025, 1:42 AM

#

I am trying to learn how to make an Ai for a project I am working on. The ai will take in content from the user and reply in a kind comforting way almost like a therapist. I have no idea where I should start. I would appreciate any advice or suggestions!

serene scaffold Nov 23, 2025, 1:46 AM

#

crimson tulip I am trying to learn how to make an Ai for a project I am working on. The ai wi...

you can look into Eliza, which is something that is possible for someone to implement on their own. It is not possible to train an LLM on one's own computer, or on cloud infrastructure without significant costs.

#

if you've only started learning about AI in the last few years, pretty much everything that you think of as "AI" is unobtainable.
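To give a feel for the Eliza suggestion above, here is a tiny rule-based responder in that style. The patterns and replies are made up for illustration; the original ELIZA script is much larger and also swaps pronouns ("my" to "your"), which this sketch skips.

```python
import re

# (pattern, reply template) pairs, checked in order. All rules here are hypothetical examples.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "I'm listening. Please go on."

def reply(message):
    """Return a reflective reply by echoing part of the user's message."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT_REPLY

print(reply("I feel tired today."))
print(reply("Hello"))
```

There is no learning involved. The comforting tone comes entirely from hand-written templates, which is why it is feasible as a solo project.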

crimson tulip Nov 23, 2025, 1:47 AM

#

Thank you! I will look into Eliza.

strange hornet Nov 23, 2025, 3:39 AM

#

https://d2l.ai/chapter_introduction/index.html
is this a good resource for learning deep learning?

#

Since the D2L library used there isn't compatible with Google Colab, I need to rearrange the programs.

#

I want to learn deep reinforcement learning, but I can't find good educational resources for it.

pliant venture Nov 23, 2025, 4:16 AM

#

slate trench label leak from the second csv, wrong gridsearch attribute, evaluating the wrong...

I found the problem. It was a lot simpler actually. For some reason all the males died and all the females lived. No data leakage, just a bad dataset. Thank you though.
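A quick way to spot the situation described above, where one feature determines the label completely, is a cross-tabulation. The six-row table below is made up for illustration and is not the poster's data.

```python
import pandas as pd

# Hypothetical toy data where the label follows one feature exactly.
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Survived": [0, 1, 0, 1, 0, 1],
})

# Rows: feature values, columns: label values, cells: counts.
table = pd.crosstab(df["Sex"], df["Survived"])
print(table)

# If every feature value maps to exactly one label value, a model can score 1.0
# from that feature alone, which is worth investigating before trusting the score.
perfectly_separated = bool(((table > 0).sum(axis=1) == 1).all())
print(perfectly_separated)
```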

final badge Nov 23, 2025, 6:34 AM

#

h

bronze wyvern Nov 23, 2025, 6:48 AM

#

copper kindle I tried it, though I would not recommend it if your background is colored. I had...

oh ok, the problem with tesseract is there a lot to configure, no? Like adding to the environmental variable path etc

copper kindle Nov 23, 2025, 6:49 AM

#

bronze wyvern oh ok, the problem with tesseract is there a lot to configure, no? Like adding t...

yes but thats a one time thing.

#

once you set it up correctly everything works

deep abyss Nov 23, 2025, 6:50 AM

#

Can someone please help me with this....😅 ?
https://discord.com/channels/267624335836053506/1442034333526261770

copper kindle Nov 23, 2025, 6:57 AM

#

deep abyss Can someone please help me with this....😅 ? https://discord.com/channels/267624...

Replied

bronze wyvern Nov 23, 2025, 6:57 AM

#

copper kindle once you set it up correctly everything works

yep noted, ty !

#

the thing is, can it be used on a cloud platform like google colab?

#

I wanted to try something but I'm unsure if I can use it there

copper kindle Nov 23, 2025, 6:59 AM

#

bronze wyvern the thing is, can it be used on a cloud platform like google colab?

This might help you OCRusingTesseract on Google Colab

Google Colab

bronze wyvern Nov 23, 2025, 7:00 AM

#

ohhh

#

thanks !

slate trench Nov 23, 2025, 8:08 AM

#

pliant venture I found the problem. It was a lot simpler actually. For some reason all the male...

If it was the Titanic dataset, then it’s not a bad dataset at all, but a completely classic one, and you won’t get a figure like that if the analysis is done correctly. Although in the Titanic dataset there is no need for a second CSV file, so this may have been a different case.

hasty ivy Nov 23, 2025, 11:16 AM

#

explain me roadmap of ai and data science

copper kindle Nov 23, 2025, 12:19 PM

#

hasty ivy explain me roadmap of ai and data science

AI and Data Scientist Roadmap might help you.

roadmap.sh

AI and Data Scientist Roadmap

Step by step roadmap guide to becoming an AI and Data Scientist in 2025

worn nimbus Nov 23, 2025, 12:43 PM

#

copper kindle [AI and Data Scientist Roadmap](https://roadmap.sh/ai-data-scientist) might help...

you're a hero

little arrow Nov 23, 2025, 1:49 PM

#

does anyone have a good resource on implementing a custom matrix in python? i cant really find much online

serene scaffold Nov 23, 2025, 1:51 PM

#

little arrow does anyone have a good resource on implementing a custom matrix in python? i ca...

what kind of matrix? "matrix" just means "2d array".

little arrow Nov 23, 2025, 1:52 PM

#

serene scaffold what kind of matrix? "matrix" just means "2d array".

basically i need to implement a matrix that has 4 blocks with dimensions n*n

serene scaffold Nov 23, 2025, 1:52 PM

#

little arrow basically i need to implement a matrix that has 4 blocks with dimensions n*n

what is a block?

little arrow Nov 23, 2025, 1:52 PM

#

idk how to really explain, like a mini matrix inside of a matrix

#

its like 4 matrices inside one matrix

serene scaffold Nov 23, 2025, 1:52 PM

#

how big is a block?

little arrow Nov 23, 2025, 1:52 PM

#

any value n

#

this is what its supposed to output

serene scaffold Nov 23, 2025, 1:53 PM

#

so if the blocks are n * n, then the size of the whole matrix will be 2n * 2n
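
A minimal sketch of that layout, assuming the four n x n blocks already exist as NumPy arrays. The names `A`, `B`, `C`, `D` and the fill values are hypothetical placeholders; `np.block` stitches them into one 2n x 2n array.

```python
import numpy as np

n = 3  # example block size (assumed)

# Four hypothetical n x n blocks; swap in the real ones.
A = np.full((n, n), 1.0)
B = np.zeros((n, n))
C = np.zeros((n, n))
D = np.full((n, n), 2.0)

# np.block assembles a nested list of arrays into a single array.
M = np.block([[A, B],
              [C, D]])

print(M.shape)
```

Slicing works the other way round too: `M[:n, :n]` is the top-left block and `M[n:, n:]` the bottom-right one.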

little arrow Nov 23, 2025, 1:53 PM

#

is any random number

#

yeah correct

serene scaffold Nov 23, 2025, 1:53 PM

#

!docs numpy.zeros

arctic wedgeBOT Nov 23, 2025, 1:53 PM

#

numpy.zeros


numpy.zeros(shape, dtype=None, order='C', *, device=None, like=None)
Return a new array of given shape and type, filled with zeros.

little arrow Nov 23, 2025, 1:55 PM

#

so id define it initially with np.zeros(4,n)

#

so it would be ([n], [n], [n], [n])

waxen kindle Nov 23, 2025, 2:00 PM

#

you can create a matrix of size 2n by 2n with the diagonal filled with the value

#

then take the second half of the matrix in both dimensions and fill it with a new value

#

something like:

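import numpy as np

# n (block size) and value (fill value) are assumed to be defined already
bigmat = np.eye(2 * n) * value  # 2n x 2n matrix with `value` on the diagonal
bigmat[n:, n:] = value  # fill the lower-right n x n block (starts at index n, not n+1)
# or, equivalently, with an explicit n x n array
bigmat[n:, n:] = np.ones((n, n)) * value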

little arrow Nov 23, 2025, 2:06 PM

#

thank you

little arrow Nov 23, 2025, 2:34 PM

#

waxen kindle you can create a matrix of size 2n by 2n with the diagonal filled with the value

alright ive figured out a nicer way

#

im using scipy's linear operator class

#

so i can avoid constructing 2 large arrays filled with 0s

#

so if i want to perform a matvec i can just do it on the two non-zero sections of my matrix
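
A rough sketch of that idea with `scipy.sparse.linalg.LinearOperator`. It assumes the two non-zero n x n blocks sit on the diagonal, i.e. the matrix is [[A, 0], [0, D]]; the real layout is not stated above, and `make_block_operator`, `A`, and `D` are made-up names.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def make_block_operator(A, D):
    """Operator for [[A, 0], [0, D]] that never stores the zero blocks."""
    n = A.shape[0]

    def matvec(x):
        x = np.asarray(x).ravel()
        # Each non-zero block only touches its own half of the vector.
        return np.concatenate([A @ x[:n], D @ x[n:]])

    return LinearOperator((2 * n, 2 * n), matvec=matvec,
                          dtype=np.result_type(A, D))

n = 4
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))  # placeholder block
D = rng.normal(size=(n, n))  # placeholder block
op = make_block_operator(A, D)

x = rng.normal(size=2 * n)
y = op.matvec(x)  # same result as multiplying by the dense 2n x 2n matrix
```

If the non-zero blocks are off-diagonal instead, only the slices inside `matvec` need to change.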

sacred tusk Nov 23, 2025, 5:12 PM

#

Hi everyone! I’m Stella from Zagreb. I recently finished a Python Developer course and have been diving deep into AI, experimenting and practicing to really understand how it works. I’m super excited to start my first projects and learn by doing, especially in Python + AI.

I’d love any advice, tips, or pointers on where to find opportunities or projects — or just general guidance on how to get started. Any help would mean a lot!

obsidian talon Nov 23, 2025, 5:14 PM

#

sacred tusk Hi everyone! I’m Stella from Zagreb. I recently finished a Python Developer cour...

AI is a very broad term. Is there something in particular?

#

There's machine learning, LLMs, deep learning, natural language processing, computer vision, chatbots, and even robotics can be considered AI.

#

In all cases, I'd highly recommend diving into stats for anything in the machine learning route. I'd also recommend learning how to wrangle and visualize data.

obsidian talon Nov 23, 2025, 5:17 PM

#

sacred tusk Hi everyone! I’m Stella from Zagreb. I recently finished a Python Developer cour...

What's your skillset atm?

sacred tusk Nov 23, 2025, 5:18 PM

#

obsidian talon AI is a very broad term. Is there something in particular?

You’re right, AI is extremely broad.
My main focus is working hands-on with large language models — experimenting with them, building structured interactions, testing their behavior, and understanding how they reason.

I’ve spent a lot of time doing deep practical work with LLMs: creating prompt systems, running simulations, analyzing responses, and pushing models to understand complex patterns. So even though I’m new to the Python job market, I already have strong practical intuition in how AI models think, learn through feedback loops, and how to guide them effectively.

If anyone here works with Python + LLMs or is building small AI tools, I’d love to learn, contribute, and help wherever I can. Happy to be here!

sacred tusk Nov 23, 2025, 5:51 PM

#

obsidian talon What's your skillset atm?

My skillset right now is a mix of early-stage Python development and deep hands-on experience with LLMs.

Python (beginner):

basics: variables, loops, functions, OOP

working with files, APIs

simple scripts and automation

currently learning best practices and looking for small real projects to improve

AI / LLM practical experience:

prompt engineering

designing structured conversations

building and iterating “AI agent” personalities

studying model behavior, consistency and memory

running small simulations with an LLM to test reasoning and interaction patterns

I’m still early in Python professionally, but I learn fast and I’m very active in experimenting with AI behavior.
If you have suggestions for small projects or beginner-friendly tasks, I’d appreciate it.

twilit prism Nov 23, 2025, 7:26 PM

#

Can someone explain the self attention formula steps for Q dot product with K, the final computation with k and the argmax step after all of the attention values are accumulated? I can't seem to find a resource that explains it step by step so I'm mixing up which step happens where.
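
For reference, a minimal NumPy sketch of single-head scaled dot-product self-attention, assuming the standard formulation softmax(Q K^T / sqrt(d_k)) V. In that formulation the scores are normalized with a softmax rather than an argmax, and the last multiplication is with V rather than K. The weight matrices `W_q`, `W_k`, `W_v` and all sizes are made-up placeholders.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    Q = X @ W_q                          # step 1: project tokens to queries,
    K = X @ W_k                          #         keys,
    V = X @ W_v                          #         and values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # step 2: scaled Q.K dot products
    weights = softmax(scores, axis=-1)   # step 3: each row becomes a distribution
    return weights @ V, weights          # step 4: weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4          # hypothetical sizes
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
```

An argmax usually only shows up at the very end of a model, for example when greedily picking the most likely next token from the output probabilities.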

obsidian talon Nov 23, 2025, 7:41 PM

#

sacred tusk You’re right, AI is extremely broad. My main focus is working hands-on with larg...

getting into NLP would help I believe

#

NLP is just really intense classification

serene scaffold Nov 23, 2025, 7:43 PM

#

obsidian talon NLP is just really intense classification

What makes you say that?

obsidian talon Nov 23, 2025, 7:43 PM

#

a lot of NLP tasks revolve around classification

serene scaffold Nov 23, 2025, 7:45 PM

#

Would you consider machine translation in that?

sturdy stump Nov 23, 2025, 10:13 PM

#

Hey guys, I got recommended by a kind lad to figure out a solution to my problem in python help, no one responded and my post got locked

#

its about rocket thrusters

mellow vector Nov 23, 2025, 10:15 PM

#

you can just post it, we do a bit more with numpy so you might get a bite

#

these channels are slower though

sacred tusk Nov 23, 2025, 10:25 PM

#

obsidian talon getting into NLP would help I believe

One of my fav books 🫠 I've tried to send pdf here but server blocked me

sturdy stump Nov 23, 2025, 10:59 PM

#

mellow vector you can just post it, we do a bit more with numpy so you might get a bite

hmmm okok

#

ill just post the full thing here

#

I am given a rocket and I need to figure out the values of Fmax, F0 and mass of the rocket by varying my thrust values and according to the a(F) function given in the screenshot, the acceleration varying in such a way.

I have attached the text file for the code for the right thruster (code is pretty much the same for the left thruster) and the F0 for both thrusters differ while the Fmax is the same.

The problem I get when plotting my results is crazy oscillatory (at least that is what I think it is) behaviour as I go through the thrust values. I have tried to instead use np.gradient but I am not very sure if that is a good approach for this.

This in turn will not allow me to obtain the values at a high enough accuracy and precision.

I have attached some images giving a clearer picture as to what my issue is.

#

📎 RightThrusterRocket-KYAA.txt

rich moth Nov 24, 2025, 1:51 AM

#

sturdy stump I am given a rocket and I need to figure out the values of Fmax, F0 and mass of ...

looks like numerical noise coming from np.diff

sonic olive Nov 24, 2025, 5:58 AM

#

What are the recommended methods and resources for studying mathematics relevant to data science?

sturdy stump Nov 24, 2025, 8:15 AM

#

rich moth looks like numerical noise coming from np.diff

ah okok i thought that would be the problem

#

now here’s my follow up, how do i fix this, should i just use np.gradient?

naive river Nov 24, 2025, 10:13 AM

#

sturdy stump I am given a rocket and I need to figure out the values of Fmax, F0 and mass of ...

Savitzky-Golay Filter and the Kalman Filter might be relevant
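
A small sketch of the Savitzky-Golay route, using `scipy.signal.savgol_filter` with `deriv=1` so the smoothing and the differentiation happen in one step. The signal below is synthetic (a noisy sine standing in for the measured velocity), and the window length and polynomial order are guesses that would need tuning on the real rocket data.

```python
import numpy as np
from scipy.signal import savgol_filter

def smoothed_derivative(y, dt, window_length=51, polyorder=3):
    """First derivative of y via a local polynomial fit (Savitzky-Golay)."""
    return savgol_filter(y, window_length=window_length,
                         polyorder=polyorder, deriv=1, delta=dt)

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0, 10, dt)
velocity = np.sin(t) + rng.normal(scale=0.01, size=t.size)  # synthetic noisy signal
true_accel = np.cos(t)

raw_accel = np.diff(velocity) / dt            # amplifies the noise
smooth_accel = smoothed_derivative(velocity, dt)

raw_err = np.sqrt(np.mean((raw_accel - true_accel[:-1]) ** 2))
smooth_err = np.sqrt(np.mean((smooth_accel - true_accel) ** 2))
print(f"RMS error with np.diff: {raw_err:.3f}")
print(f"RMS error with savgol:  {smooth_err:.3f}")
```

`np.gradient` uses central differences, so it is a little less noisy than `np.diff` but still differentiates the raw noise; a wider window here trades noise for some loss of sharp features.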

sturdy stump Nov 24, 2025, 11:12 AM

#

ooo new concepts for me

thorny geode Nov 24, 2025, 5:12 PM

#

My random forests is dying

#

4 true negative 72 false positive

jaunty helm Nov 24, 2025, 5:17 PM

#

thorny geode 4 true negative 72 false positive

what about true positives and false negatives?
could be due to imbalance, could be due to small sample size

thorny geode Nov 24, 2025, 5:19 PM

#

jaunty helm what about true positives and false negatives? could be due to imbalance, could ...

506 true positive, 2 false negative

#

It's based on 80/20 split, there's 2922 data total

#

I feel like it's something to do with classification
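
Plugging the four counts above into the usual confusion-matrix formulas makes the problem visible; this is plain arithmetic on the numbers from the messages, nothing model-specific.

```python
# Counts reported above for the test split.
tn, fp, fn, tp = 4, 72, 2, 506

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)             # share of real positives that were caught
specificity = tn / (tn + fp)        # share of real negatives that were caught
precision = tp / (tp + fp)
negative_share = (tn + fp) / total  # how rare the negative class is in the test set
balanced_accuracy = (recall + specificity) / 2

print(f"test size:         {total}")
print(f"accuracy:          {accuracy:.3f}")
print(f"recall:            {recall:.3f}")
print(f"specificity:       {specificity:.3f}")
print(f"precision:         {precision:.3f}")
print(f"negative share:    {negative_share:.3f}")
print(f"balanced accuracy: {balanced_accuracy:.3f}")
```

Accuracy looks fine at roughly 87%, but specificity is only about 5%: nearly every negative is being labelled positive, which plain accuracy hides when negatives are rare.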

jaunty helm Nov 24, 2025, 5:22 PM

#

thorny geode 506 true positive, 2 false negative

looks a bit imbalanced / positive case is way easier to classify
did you do stratified split? also class_weight / sample_weight?

thorny geode Nov 24, 2025, 5:23 PM

#

jaunty helm what about true positives and false negatives? could be due to imbalance, could ...

Hmm, I think you are true, because the population proportions should be 77/23 for positive/negative, but the sample is around 14%,

jaunty helm Nov 24, 2025, 5:24 PM

#

thorny geode Hmm, I think you are true, because the population proportions should be 77/23 fo...

for this, you should use stratified split
which would ensure that your train set and test set have similar class distributions

thorny geode Nov 24, 2025, 5:25 PM

#

jaunty helm looks a bit imbalanced / positive case is way easier to classify did you do stra...

It is time based as the closest the data is to the present, the better it is (i guess) to predict future outbreaks

thorny geode Nov 24, 2025, 5:25 PM

#

jaunty helm looks a bit imbalanced / positive case is way easier to classify did you do stra...

Ah thanks a lot, I think I'll put more weight to 0 so the code think twice before labelling everything as outbreaks

thorny geode Nov 24, 2025, 5:26 PM

#

jaunty helm for this, you should use stratified split which would ensure that your train set...

I will try using it

#

Thanks a lot Purplys, now I will be sleeping

jaunty helm Nov 24, 2025, 5:27 PM

#

thorny geode It is time based as the closest the data is to the present, the better it is (i ...

well then that complicates it a bit
usually you don't do stratified for time series, you just split based on time
speaking of which, you should e.g. order your data based on time then turn shuffle=False when splitting, otherwise you leak future information into the training set
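
A minimal sketch of that advice with scikit-learn: sort by time, split with `shuffle=False`, and let the forest reweight the classes. The data is synthetic and the column names (`date`, `feature`, `outbreak`) are hypothetical stand-ins for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with an imbalanced binary target.
rng = np.random.default_rng(0)
n_rows = 500
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=n_rows, freq="D"),
    "feature": rng.normal(size=n_rows),
})
noise = rng.normal(scale=0.5, size=n_rows)
df["outbreak"] = (df["feature"] + noise > -0.7).astype(int)

df = df.sample(frac=1.0, random_state=0)  # pretend the rows arrive unordered
df = df.sort_values("date")               # 1) order by time first

# 2) no shuffling: the first 80% is train, the most recent 20% is test
train, test = train_test_split(df, test_size=0.2, shuffle=False)

# 3) class_weight="balanced" weights classes inversely to their frequency
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
clf.fit(train[["feature"]], train["outbreak"])
pred = clf.predict(test[["feature"]])
```

With this split every training row is older than every test row, so nothing from the future leaks into training.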

untold bloom Nov 24, 2025, 9:46 PM

#

or https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
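
A short sketch of what `TimeSeriesSplit` does, on placeholder data that is assumed to be sorted by time already: each fold trains on an expanding window of the past and tests on the block that follows it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time-ordered samples (placeholder data)

tscv = TimeSeriesSplit(n_splits=4)
folds = list(tscv.split(X))

for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx[0]}..{test_idx[-1]}")
```

The same `tscv` object can be passed as `cv=` to helpers such as `cross_val_score` or `GridSearchCV`.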

thorny geode Nov 24, 2025, 11:33 PM

#

Oh god yes thanks Purplys and Nahita

lapis sequoia Nov 25, 2025, 2:02 AM

#

Hi.
I've always enjoyed coding and I'm already comfortable with Python and building small things. But I've recently realized that in order to get hired as a developer, you need to specialize in a field. And after a bit of research, I find myself drawn towards data science in Python. However, I think it's worth mentioning that my math skills are not really good. The potential is there, but I've never really studied math as I'm a high school dropout. So, I'm seeking advice as to whether I should dive into data science or not. I have a few questions:
• do you need to be good at math?
• what kind of background do you need ?
• what is the best way to learn ?
• what are the best resources to learn ?
I would deeply appreciate any advice and thank you all.

#

Feel free to ping me any time

limpid zenith Nov 25, 2025, 2:12 AM

#

lapis sequoia Hi. I've always enjoyed coding and I'm already comfortable with python and buil...

do you need to be good at math?
Yes, But it's something you can learn to be good at.

what kind of background do you need ?
Data scientists come from all sorts of background, but most share some computer science, math and stats background.

what is the best way to learn ?
That depends on what works best for you.

what are the best resources to learn ?
Depends on what areas you want to concentrate in. Paid options include DataCamp, which has very high quality courses. Then there are more formal approaches like college/uni. Free approaches include financial aid on Coursera, watching YouTube videos, or reading the documentation online.

lapis sequoia Nov 25, 2025, 2:14 AM

#

Why exactly do I need to be good at math ? I’m sorry I’m asking too much but I just wanna be certain before I commit to it. So, why do you need to be good at math ? What do you do daily that requires math

limpid zenith Nov 25, 2025, 2:18 AM

#

lapis sequoia Why exactly do I need to be good at math ? I’m sorry I’m asking too much but I j...

Most of data science is writing code to compute statistics and plot things. This requires a deep understanding of the theory.

For instance, if you're an entry level data analyst you might need to work with dataframes, pivot tables, and compute various statistics, which means knowing what formulas to apply and when. If you're a data scientist then it's even more math-heavy and usually requires lots of rigorous experimentation, bias/variance tests, hypothesis testing and so on.

#

More advanced data science positions, like senior level data scientists or machine learning engineers also work with lots of differential geometry, calculus and information theory.

lapis sequoia Nov 25, 2025, 2:21 AM

#

To be honest, that truly doesn’t sound like something I would be good at. I might have to take some time to consider it. However, if I don’t get into data science, what other fields do you recommend I specialize in ?

#

Is automation/scripting a field you can get a job with ? Because I really like that. I do build small stuff for myself sometimes

limpid zenith Nov 25, 2025, 2:23 AM

#

It's not a matter of whether you'd be good at it, it's a matter of whether you want to be good at it. The tech industry is currently saturated and jobs like those are harder to come by these days.

lapis sequoia Nov 25, 2025, 2:25 AM

#

I see. Well, thanks for your advice.

tawny raft Nov 25, 2025, 2:48 PM

#

lapis sequoia Is automation/scripting a field you can get a job with ? Because I really like t...

If you like automation and aren't particularly good at maths, you could consider a Data Engineer role.

lapis sequoia Nov 25, 2025, 3:23 PM

#

tawny raft If you like automation and aren't particularly good at maths, you could consider...

Thanks for replying. What is data engineering specifically and how is it different from data science ?

serene scaffold Nov 25, 2025, 3:27 PM

#

lapis sequoia Thanks for replying. What is data engineering specifically and how is it differe...

data engineering is when you manage the data infrastructure for a team, like how it gets stored in databases and made available for other team members

lapis sequoia Nov 25, 2025, 3:28 PM

#

How is the job market for it ? Now that I’m looking into it, I really like it

serene scaffold Nov 25, 2025, 3:29 PM

#

lapis sequoia How is the job market for it ? Now that I’m looking into it, I really like it

the job market right now in the US is generally not great. but if you don't have a degree, you'll very likely need to get one, and the market might have improved by then

lapis sequoia Nov 25, 2025, 3:29 PM

#

I’m not in the US.

serene scaffold Nov 25, 2025, 3:30 PM

#

lapis sequoia I’m not in the US.

idk what the market is like in your country, whatever it might be. you can ask in #career-advice

lapis sequoia Nov 25, 2025, 3:30 PM

#

Thanks. I’m gonna also look into data engineering. Any resource to start with ?

serene scaffold Nov 25, 2025, 3:30 PM

#

I'm not sure. you should probably be comfortable with SQL and MongoDB

lapis sequoia Nov 25, 2025, 3:31 PM

#

Thanks

tawny raft Nov 25, 2025, 3:36 PM

#

lapis sequoia Thanks for replying. What is data engineering specifically and how is it differe...

You can check out this video to see how different roles on a data team work together. Some of those roles might overlap if you're working at a small company.
https://www.youtube.com/watch?v=tyJ476aNCYU

YouTube

Data with Baraa

How Modern Data Teams Work (Engineer, Analyst, Scientist, Architect...

Watch this visual, animated breakdown of how modern data teams really work — including data engineers, analysts, scientists, architects, and ML experts collaborating on real projects.

lapis sequoia Nov 25, 2025, 3:36 PM

#

I see that. Thank you.

thorny geode Nov 25, 2025, 4:47 PM

#

jaunty helm well then that complicates it a bit usually you don't do stratified for time ser...

thanks

random nymph Nov 25, 2025, 10:00 PM

#

Hi! Im curious on what some of your guys' favorite deep learning packages/tools are.

My current stack is pretty standard. PyTorch, Polars/Pandas, numpy, Sklearn, Scipy, etc..

I've had ~2 years of hands-on experience with both designing and training models (mostly time series and NLP related), and I'm wondering if there are any underrated or super useful tools that you recommend checking out.

agile cobalt Nov 25, 2025, 10:07 PM

#

for some things I prefer jax over pytorch, other than that just whatever solves specific (frequently niche) problems

e.g. docker image annoyingly large? use onnx over torch
and not really specific to machine learning, but I like markitdown to ingest any files and marimo for prototyping

serene scaffold Nov 25, 2025, 10:09 PM

#

agile cobalt for some things I prefer jax over pytorch, other than that just whatever solves ...

What's onnx?

random nymph Nov 25, 2025, 10:09 PM

#

thank you! these seem pretty cool

agile cobalt Nov 25, 2025, 10:10 PM

#

serene scaffold What's onnx?

ONNX Runtime

GitHub

GitHub - microsoft/onnxruntime: ONNX Runtime: cross-platform, high ...

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator - microsoft/onnxruntime

random nymph Nov 25, 2025, 10:10 PM

#

ive always used jupyter notebook, marimo looks crazy

lone gulch Nov 25, 2025, 10:34 PM

#

I have 360 wedding photos, all ready and edited. I wondered why I couldn't create an AI model to edit them. I came up with a few ideas and approaches:

First, I thought about it and asked Giminai to do it. He suggested training the model with the photos one by one. However, the result was messed up because he was editing pixel by pixel. For example, one half of the face would be over-lit while the other half was under-lit. The training would take about four hours. (I didn't like that idea; it wasn't what I wanted.)

#

The second idea was to discover algorithms that adjust lighting and colors. There are also algorithms that calculate the percentage of lighting and colors. So, what did I do? I wrote code that retrieved all the data from the 360 photos into a table of the edited images and trained it ("unsupervised learning") so that if I fed it data from an unedited image, it would predict ideal lighting and colors and apply them to the image. (The idea wasn't the best; the editing was weak.)

The third idea involved importing the images into Lightroom and changing the settings so they reverted to their original state. I then extracted the data, resulting in two files: x = unedited data, y = edited data. I tried training them using Random Forest Regressor, but the result was worse, especially in terms of lighting. The colors were somewhat good. (Here, I felt the problem was with the data itself, as there was a small amount of incorrect data, but I didn't think it would significantly affect the results.) So, the questions I want to understand are:

#

What's the best training method?
Does even a small amount of incorrect data affect the results?
Is this small amount of data the cause?
Is my approach to these steps sound? In your opinion, how would you rate my thinking of these alternative plans out of 10 (regardless of the project not yet being successful)?
And these are questions for those with experience 👇

Is it possible to train the model and, if the training is insufficient, create an interface where the model predicts and modifies an image and then displays the modified image? I would then have three options to click: the first, "No," means the image is corrupted; the second, "Maybe"; and the third, "Yes," means it's modified perfectly. If I click "Yes," it saves the data to the table, adds it, and trains from it?

I would appreciate any helpful information, and if you have any ideas, please leave a comment.

vale elbow Nov 26, 2025, 1:56 AM

#

lone gulch I have 360 wedding photos, all ready and edited. I wondered why I couldn't creat...

Giminai

rich moth Nov 26, 2025, 2:46 AM

#

I just discovered notebooklm . What an incredible learning tool.

iron basalt Nov 26, 2025, 2:57 AM

#

random nymph Hi! Im curious on what some of your guys' favorite deep learning packages/tools ...

https://www.taichi-lang.org/

Taichi Lang: High-performance Parallel Programming in Python

Taichi is a domain-specific language embedded in Python that helps you easily write portable, high-performance parallel programs.

#

(Numba killer)

random nymph Nov 26, 2025, 2:59 AM

#

iron basalt https://www.taichi-lang.org/

Thank you!

random nymph Nov 26, 2025, 3:21 AM

#

this thing sounds insane

serene scaffold Nov 26, 2025, 5:13 AM

#

How do you get to the open source part? The link requires you to submit your business email.

radiant pasture Nov 26, 2025, 5:36 AM

#

I'm developing an Algorithm Trading bot. So I'm wondering what do you guys use in VS Code to visualize/analyse large base of data

slate trench Nov 26, 2025, 7:30 AM

#

radiant pasture I'm developing an Algorithm Trading bot. So I'm wondering what do you guys use i...

A lot of people use Jupyter Notebooks inside VS Code. Personally I prefer doing all the data work in JupyterLab, and then I build "production-ready" Python packages in VS Code when needed.

royal kraken Nov 26, 2025, 10:49 AM

#

that's true, because Jupyter Notebook is very popular right now

serene scaffold Nov 26, 2025, 11:32 AM

#

royal kraken that's true, because Jupyter Notebook is very popular right now

I'd say they're less popular now than they have been in the last five years, now that there's competitors like marimo, and people seem generally more aware of their limitations.

#

!warn @waxen crag your message was removed for advertising. And it's not really an open-source project if you have to sign up for something to be able to access the code.

arctic wedgeBOT Nov 26, 2025, 11:38 AM

#

:incoming_envelope: :ok_hand: applied warning to @waxen crag.

lapis sequoia Nov 26, 2025, 1:48 PM

#

serene scaffold I'd say they're less popular now than they have been in the last five years, now...

Hi, first time hearing about 'marimo'. Apart from Jupyter notebook, I've used Kaggle notebook and Google colab, but they all feel the same in usage with colab having AI integration in it.
I am curious, in what area(s) does marimo excel over the platforms mentioned above? Thanks.

agile cobalt Nov 26, 2025, 2:06 PM

#

the biggest upside is having no hidden state; the execution order depends only on which cells reference variables defined in which cells. you cannot run things out of order, so it's much harder to end up with results different from what you'd get running it fresh after restarting the kernel

about half of that comes from their reactive code, half from preventing you from doing things that are generally considered a bad idea though
(you cannot re-assign variables in different cells, and are generally discouraged from mutating things)

#

it also comes with some built-in UI elements and you can toggle between the code and a dashboard/webapp-ish view when using them, kinda like having streamlit built into the notebook

lapis sequoia Nov 26, 2025, 2:35 PM

#

agile cobalt it also comes with some built-in UI elements and you can toggle between the code...

Thanks a bunch. Lots of features, I'd definitely give it a try.

snow marsh Nov 26, 2025, 4:02 PM

#

random nymph Hi! Im curious on what some of your guys' favorite deep learning packages/tools ...

I've only been doing ml and deep learning for a few months, but one library I found to be useful is skorch because it lets pytorch models interface well with scikit-learn functions. 🙂

random nymph Nov 26, 2025, 4:41 PM

#

snow marsh I've only been doing ml and deep learning for a few months, but one library I fo...

Ooh cool thank you

bronze wyvern Nov 26, 2025, 6:00 PM

#

hello, anyone knows about the ipywidgets library? I recently came across that, can anyone explain what is it and how it is used pls, is it just a UI that allows us to set up some settings?

agile cobalt Nov 26, 2025, 6:06 PM

#

bronze wyvern hello, anyone knows about the `ipywidgets` library? I recently came across that,...

pretty much - though not necessarily "settings", just user interface inputs.
You could use it for experiments/simulations/training parameters, plots/tables filters, or even just something like a calculator
the documentation explains it fairly well imo, is there some specific thing you feel that it is missing? https://ipywidgets.readthedocs.io/en/stable/

bronze wyvern Nov 26, 2025, 6:08 PM

#

agile cobalt pretty much - though not necessarily "settings", just user interface inputs. You...

will have a look at the docs, haven't dive into it yet, :c, ty !

radiant pasture Nov 26, 2025, 6:56 PM

#

guys do you know why the data looks so ugly? wasn't it supposed to be in clean, separated columns?

#

I just want it to look like this

#

it says that I got a total of 1 column, why is that ?

random nymph Nov 26, 2025, 7:05 PM

#

radiant pasture it says that I got a total of 1 column, why is that ?

Use sep="\t" (pandas) or separator="\t" (Polars) in the read_csv arguments for tab-separated values; I forget which arg name applies to your library. By default the separator is a comma, because CSV stands for "comma-separated values".
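
A quick sketch of the difference, assuming pandas (the library in use is not stated above) and a made-up tab-separated sample in place of the real file:

```python
import io
import pandas as pd

# Small tab-separated sample standing in for the real file (made-up values).
raw = "time\topen\tclose\n2025-01-01\t1.0\t1.1\n2025-01-02\t1.1\t1.2\n"

wrong = pd.read_csv(io.StringIO(raw))            # default sep="," gives one wide column
right = pd.read_csv(io.StringIO(raw), sep="\t")  # tab separator gives three columns

print(wrong.shape)
print(right.shape)
print(list(right.columns))
```

With a real file, the `io.StringIO(raw)` argument would simply be replaced by the file path.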

radiant pasture Nov 26, 2025, 7:14 PM

#

thanks man

barren wadi Nov 26, 2025, 7:19 PM

#

Did anyone do anything using conformal prediction before?

prisma wing Nov 26, 2025, 7:45 PM

#

https://github.com/perpetual-ml/perpetual

GitHub

GitHub - perpetual-ml/perpetual: A self-generalizing gradient boost...

A self-generalizing gradient boosting machine that doesn't need hyperparameter optimization - perpetual-ml/perpetual

#

just found this

#

better and faster than autogluon

slate trench Nov 26, 2025, 8:22 PM

#

serene scaffold I'd say they're less popular now than they have been in the last five years, now...

Cloud data-platform notebooks are also often used, such as those in Databricks or Fabric, and within the notebook the languages used can be Python, PySpark, SQL, or Scala.

serene scaffold Nov 26, 2025, 8:22 PM

#

slate trench Cloud data-platform notebooks are also often used, such as those in Databricks o...

But are those actually different notebook types, or are they just reskins of the ipython engine?

slate trench Nov 26, 2025, 8:30 PM

#

serene scaffold But are those actually different notebook types, or are they just reskins of the...

They look visually similar to Jupyter notebooks, but the execution engine is Spark (and other runtimes depending on the cell type). Since Apache Spark is built in Scala, Scala is fully supported in Spark environments like Databricks, though not all platforms. Fabric has PySpark, SQL and Python. No ipython under the hood at all.

rich moth Nov 27, 2025, 7:53 AM

#

radiant pasture I'm developing an Algorithm Trading bot. So I'm wondering what do you guys use i...

I've been working on one for a while, too. It's actually trading on a testnet right now. What kind of setup did you go with? As for a VS Code extension, maybe look at SandDance in the extensions.

noble spear Nov 27, 2025, 10:38 AM

#

Can someone help me with a ai/ vision problem that know alot about these things?

waxen kindle Nov 27, 2025, 11:04 AM

#

Just ask

noble spear Nov 27, 2025, 11:24 AM

#

I have an AI model that can detect eggs and tell if they are clean or dirty. I want to be able to see how much dirt is on the dirty eggs, so I would like to make a mask of only the dirt. I have tried a few things but the results are not that great.

Does anyone know how I could make a precise mask of the dirt on the egg? Or have any ideas?

It's also important that the stamps on the eggs are ignored in the mask. I have no idea how to do this either, or if it's even possible.

noble spear Nov 27, 2025, 11:58 AM

#

agile cobalt Nov 27, 2025, 12:06 PM

#

I feel like some simple thresholding should work?

agile cobalt Nov 27, 2025, 12:36 PM

#

yeah even just asking AI to write some opencv2 code it seems like it should do the job
(code)

noble spear Nov 27, 2025, 12:50 PM

#

agile cobalt yeah even just asking AI to write some opencv2 code it seems like it should do t...

This is not rlly what im looking for. Im looking for a mask for only the dirt on the egg. If you get what i mean

#

And it needs to be specific, and detect everything that's dirty and should not be on the egg

agile cobalt Nov 27, 2025, 12:58 PM

#

that was meant to be at most a starting point, not me doing your entire job for you

#

you can test a few different strategies and see what works - it'll probably involve a ton of trial and error one way or the other

noble spear Nov 27, 2025, 12:59 PM

#

I have tried some things myself as well and I've come to this (see picture). It looks really doable to filter it out of here. I tried to filter out green with the inRange command, but I don't get anything out of it. The img is in HSV

noble spear Nov 27, 2025, 1:00 PM

#

agile cobalt you can test a few different strategies and see what works - it'll probably invo...

Yeah i noticed😅

noble spear Nov 27, 2025, 1:01 PM

#

agile cobalt that was meant to be at most a starting point, not me doing your entire job for ...

Ty tho! But was not completely what i was looking for

radiant pasture Nov 27, 2025, 6:17 PM

#

guys what is going on ??

serene scaffold Nov 27, 2025, 6:22 PM

#

radiant pasture guys what is going on ??

You're trying to look at a file that isn't text in the text editor

agile cobalt Nov 27, 2025, 6:22 PM

#

radiant pasture guys what is going on ??

why are you working with wheel files in first place?

radiant pasture Nov 27, 2025, 6:23 PM

#

serene scaffold You're trying to look at a file that isn't text in the text editor

I dont know why, but it popped out suddenly, maybe I touched something that I should not have

agile cobalt Nov 27, 2025, 6:23 PM

#

if you are just trying to download/install packages, you should use tools like uv or pip that manage downloading automatically for you instead of downloading these files yourself

radiant pasture Nov 27, 2025, 6:23 PM

#

yes, I use pip

#

what is the actual difference between these two?

subtle lotus Nov 27, 2025, 7:09 PM

#

radiant pasture what is the actual difference between this two ?

python environments -> uses a default python env (which could be global or a venv)

existing jupyter server -> you must run the jupyter server separately, for example in another terminal. good for centralizing workflows in a single environment

#

For most cases, i recommend Python environments. It's simpler, less hassle.

radiant pasture Nov 28, 2025, 5:56 AM

#

rich moth I've been working on for a while, too. Its actually trading on a testnet right ...

I'm really new to Python. I have created many bots in Pine Script. Now I'm wondering which one I should use: Backtrader, Backtesting.py or some other tool?

zenith nova Nov 28, 2025, 7:54 AM

#

!ban 719846291332661259 spam

arctic wedgeBOT Nov 28, 2025, 7:54 AM

#

:incoming_envelope: :ok_hand: applied ban to @waxen crag permanently.

glossy tendon Nov 28, 2025, 8:13 AM

#

Hi all, I have been trying to use dask and a notebook to process 560 gzip files of data, each 100 MB compressed and 524 MB uncompressed, on my computer, which usually has 10-13 GB of RAM available. Is there a way to split the aggregates at the middle layer so they are written to their own files? I'm sick and tired of constantly having to cancel runs because of OOM.

#

I know of repartition, maybe I'm just misreading this graph??
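One way to keep memory bounded, independent of dask, is to aggregate each file on its own, write that partial result to disk, and combine the small partial files at the end. The sketch below is only an illustration under assumptions: the data format is not stated in the question, so it assumes gzip-compressed CSV with hypothetical `key` and `value` columns and a per-key sum. The function names are made up for the example.

```python
import csv
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path

def aggregate_file(src: Path, out_dir: Path) -> Path:
    """Stream one gzip CSV, sum `value` per `key`, and write the partial result to its own file."""
    totals = defaultdict(float)
    with gzip.open(src, "rt", newline="") as fh:
        for row in csv.DictReader(fh):  # rows are read one at a time, never the whole file
            totals[row["key"]] += float(row["value"])
    out_path = out_dir / (src.name + ".partial.json")
    out_path.write_text(json.dumps(totals))
    return out_path

def combine_partials(partials) -> dict:
    """Merge the small per-file results; only one partial is in memory at a time."""
    totals = defaultdict(float)
    for path in partials:
        for key, value in json.loads(Path(path).read_text()).items():
            totals[key] += value
    return dict(totals)

# Tiny demo with two made-up input files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for i, rows in enumerate([[("a", 1), ("b", 2)], [("a", 3)]]):
        with gzip.open(tmp / f"part{i}.csv.gz", "wt", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["key", "value"])
            writer.writerows(rows)
    partials = [aggregate_file(p, tmp) for p in sorted(tmp.glob("*.csv.gz"))]
    result = combine_partials(partials)

print(result)
```

This only works directly for aggregations that can be merged, such as sums, counts, minima and maxima. A mean would need the sum and the count stored per file and divided at the end.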

digital valley Nov 28, 2025, 8:51 AM

#

“Hi, can anyone tell me which project I should put on my resume?”

#

in Data analysis

glossy tendon Nov 28, 2025, 8:51 AM

#

the project you're most proud of or most relevant to what job you want

digital valley Nov 28, 2025, 8:52 AM

#

glossy tendon the project you're most proud of or most relevant to what job you want

Data analysis

stable mural Nov 28, 2025, 1:26 PM

#

Could someone recommend some books covering data analysis for beginners

radiant pasture Nov 28, 2025, 5:01 PM

#

guys I don't know why but it's taking too long to connect to Python

agile cobalt Nov 28, 2025, 5:02 PM

#

it might not work well with 3.14 yet, try using 3.13 or even 3.12

molten badger Nov 28, 2025, 5:21 PM

#

should i learn neural networks from scratch or should i directly learn tensorflow

random nymph Nov 28, 2025, 5:43 PM

#

molten badger should i learn neural networks from scratch or should i directly learn tensorflo...

I’d say learn the math and the theory first, but I’d recommend just implementing with PyTorch/Tensorflow after you understand how the structures work

#

You can technically get away with just using a python library and not know how everything works for some basic projects, but diagnosing your problems and improving your models will be very hard

#

For coding from scratch, there are things like gradient calculation and optimizers that can be harder and more confusing for beginners to implement, which is why I suggest just jumping to the library

#

Those libraries take care of it for you
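To make "gradient calculation and optimizers" concrete, here is a minimal from-scratch sketch: fitting a line with hand-derived gradients of the mean squared error and plain gradient descent. The data, learning rate and step count are made-up values for illustration. Libraries such as PyTorch or TensorFlow compute these gradients automatically and provide the optimizer step.

```python
# Toy data generated from y = 2x + 1 (made-up and noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0   # parameters to learn
lr = 0.05         # learning rate (assumed value)
n = len(xs)

for step in range(2000):
    # Loss: L = mean((w*x + b - y)^2)
    # dL/dw = mean(2 * (w*x + b - y) * x), dL/db = mean(2 * (w*x + b - y))
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Gradient descent update: move each parameter against its gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))
```

For a real network the same two pieces repeat for every layer, which is where deriving the gradients by hand becomes error-prone.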

molten badger Nov 28, 2025, 5:54 PM

#

random nymph I’d say learn the math and the theory first, but I’d recommend just implementing...

I know the maths. I'm reading a book called Neural Networks from Scratch in Python; it uses numpy to create neural networks

molten badger Nov 28, 2025, 5:55 PM

#

random nymph For coding by scratch, there are things like gradient calculation and optimizers...

yea ok i'll do that

random nymph Nov 28, 2025, 5:58 PM

#

I’d say from scratch is good if you want to dive super deep into the functionality and implementation, but if you just want to learn about architecture components and specific applications, libraries will probably help you a little more there

lapis sequoia Nov 28, 2025, 6:04 PM

#

guys is this where data analysts chat?

serene scaffold Nov 28, 2025, 6:07 PM

#

!warn 1401906940866465848 Your messages where you ask for work have been removed, as this is against the rules.

arctic wedgeBOT Nov 28, 2025, 6:07 PM

#

:incoming_envelope: :ok_hand: applied warning to @craggy creek.

summer mist Nov 28, 2025, 6:43 PM

#

Does anyone know a good course for pandas?

serene scaffold Nov 28, 2025, 6:44 PM

#

summer mist Someone knows a good formation for Pandas ?

I recommend the kaggle pandas tutorial, which is interactive

summer mist Nov 28, 2025, 6:44 PM

#

serene scaffold I recommend the kaggle pandas tutorial, which is interactive

Thanks i'll check it !

agile cobalt Nov 28, 2025, 6:51 PM

#

also read the official user guide if you haven't yet

lapis sequoia Nov 28, 2025, 6:53 PM

#

guys does anybody know what I should master in Python to become a data analyst

random nymph Nov 28, 2025, 7:01 PM

#

lapis sequoia guys anybody knows what should i master in python to become a data analyst

I’d say the standard libraries you must learn for any data related job are numpy, pandas, scipy, scikit learn, and matplotlib

#

These libraries were more than enough for me to get my first work experience at least

lapis sequoia Nov 28, 2025, 7:04 PM

#

random nymph I’d say the standard libraries you must learn for any data related job are numpy...

thanks bro

#

what is easier, data analysis or data science?

waxen kindle Nov 28, 2025, 7:05 PM

#

The one you prefer

lapis sequoia Nov 28, 2025, 7:06 PM

#

i mean i want the easiest one only

radiant pasture Nov 29, 2025, 8:14 AM

#

can anyone help me with this error ?

#

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyarrow
Failed to build pyarrow
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> pyarrow

waxen kindle Nov 29, 2025, 9:00 AM

#

lapis sequoia i mean i want the easiesst one only

The easiest one is the one you'll be the most at ease with, which is the one you prefer

opaque condor Nov 29, 2025, 7:20 PM

#

I'm finally happy

#

I can add a bounding box to my dataset using Labelimg

pearl wedge Nov 29, 2025, 7:45 PM

#

lapis sequoia guys anybody knows what should i master in python to become a data analyst

learn pandas

sacred tusk Nov 29, 2025, 11:43 PM

#

Lately I’ve been noticing something funny:
the same intuition I use when I read people in real life seems to work surprisingly well when I work with AI systems.

It’s like… every model has a ‘personality rhythm’.
Every prompt has an emotional temperature.
Every conversation has a hidden structure.

I don’t approach AI academically — I feel it first.
Behaviour, resonance, stability, the way the system “breathes”…
and only later I figure out the technical name behind it.

It’s a strange skill to describe, but it lets me spot patterns fast, tune prompts intuitively, and stabilise behaviour before it breaks.

Does anyone else work with AI more through intuition than textbooks?
Curious to hear how you translate your human instincts into AI work.

short saffron Nov 30, 2025, 7:53 AM

#

hi

radiant pasture Nov 30, 2025, 11:31 AM

#

What is the best model in Ollama for coding?

#

But at the same time not too big, because I only have 1 TB of storage and 16 GB of DDR5 RAM

spiral kindle Nov 30, 2025, 11:37 AM

#

Hello everyone, is anybody ready to collaborate on a project that I can get hands-on learning from?

Please let me know, I'm open.

#

I like to learn from people guys.

surreal terrace Nov 30, 2025, 1:19 PM

#

radiant pasture But at the same time not too big, because I only have 1Tb and 16gb DDR5 ram

I don't think storage matters a lot; it's more about your GPU, VRAM, RAM and CPU

#

Let's take DeepSeek Coder for example. Looking at your storage and RAM, I think you'll have maybe an 8 or 12 GB VRAM GPU, or even more (I'm not familiar with the latest GPUs), so you can for sure run an 8B or even 12B parameter model very easily. If you go for better ones like the 33B, you'll have to offload some of the work to your RAM and CPU too, which could be slow, but yeah, it would still run. Again, I'm no expert.
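The sizes above can be sanity-checked with simple arithmetic: the weights alone take roughly (number of parameters) × (bits per weight) / 8 bytes. The sketch below is a back-of-the-envelope estimate only. It ignores the context cache, activations and runtime overhead, and the 16-bit and 4-bit levels are assumptions about how a model might be stored, not a statement about any particular Ollama download.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

for size in (8, 12, 33):
    for bits in (16, 4):
        print(f"{size}B parameters at {bits}-bit: ~{estimate_weight_gb(size, bits):.1f} GB")
```

By this estimate a 33B model quantized to 4 bits needs about 16.5 GB for the weights, which is already more than 16 GB of system RAM, while an 8B model at 4 bits needs about 4 GB.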

toxic palm Nov 30, 2025, 1:28 PM

#

Hi,
I am interested in learning AI. I checked out YouTube courses, Udemy, etc. to find a course.
I looked into many suggestions from Reddit, but could not find something that explains things in a simple way with pictures and sample programs.
If you know any, please let me know.

dreamy latch Nov 30, 2025, 1:29 PM

#

@toxic palm a specific part of the AI field or everything that came in the last decade ? I'm a newcomer too but all I can say is that I watched sebastian raschka llm videos

toxic palm Nov 30, 2025, 1:31 PM

#

dreamy latch @toxic palm a specific part of the AI field or everything that came in...

just basics for now..

dreamy latch Nov 30, 2025, 1:32 PM

#

seems like raschka video tutorials are split in easy chunks, and his book has a bit more details. but maybe there's better. how solid on math are you

surreal terrace Nov 30, 2025, 1:32 PM

#

toxic palm Hi, I am interested in learning AI. checked out youtube courses & udemy etc to f...

learning "ai" in the sense how to build them?

toxic palm Nov 30, 2025, 1:34 PM

#

surreal terrace learning "ai" in the sense how to build them?

kind of what is AI, then writing some small AI programs etc

surreal terrace Nov 30, 2025, 1:40 PM

#

toxic palm kind of what is AI, then writing some small AI programs etc

mhm that's cool. If you just want the basics then I'd say watch 3Blue1Brown's video (YouTube) on large language models; it's pretty simple to understand for beginners. There is also a person named Andrew Ng, who is really good at this. He has courses on Coursera; although they are paid, you can still watch all the videos of the course. And if you want to understand it more visually, there is a website called mlu-explain.github.io that explains all the ways we train these "AI".

toxic palm Nov 30, 2025, 1:42 PM

#

dreamy latch seems like raschka video tutorials are split in easy chunks, and his book has a ...

Hey gum, just checked raschka video tutorials. they are good. he is explaining step by step starting from environment setup. Thank you.
Regarding math: any specific things you mean?

toxic palm Nov 30, 2025, 1:44 PM

#

surreal terrace mhm thats cool, if you just want the basics then i'd say watch 3blue1brown's vi...

Thank you. Also, one basic question:
when I ask about AI, why is everyone pointing to LLMs? Is it one concept within AI, or what is it?

Got it:
AI (artificial intelligence) is a broad field that aims to simulate human intelligence and behavior. Under its umbrella are machine learning, deep learning, and generative AI. All three concepts share a common foundation: learning from data.

dreamy latch Nov 30, 2025, 1:57 PM

#

@toxic palm it's the mainstream trend these days. then there's machine learning (various kinds of neural networks, deep learning) ... long ago there was GOFAI (expert systems, prolog)

#

I find the semantic vector embedding idea nice

toxic palm Nov 30, 2025, 1:58 PM

#

dreamy latch @toxic palm it's the mainstream trend these days. then there's machine...

ok. which branch is better for starters?

dreamy latch Nov 30, 2025, 1:59 PM

#

i'm too new to answer that sorry

#

start with the answer above (Andrew Ng), you'll see

quasi pier Nov 30, 2025, 3:37 PM

#

Is anyone in here familiar with deep reinforcement learning ?
I'm trying to solve highway_env using DQN and am struggling a lot.

quartz plover Nov 30, 2025, 6:30 PM

#

hey y'all

#

anybody here learn about data engineering ??

opaque condor Nov 30, 2025, 7:02 PM

#

I don't know yet. Also, has anyone made a simulation with life in it?

agile cobalt Nov 30, 2025, 7:06 PM

#

"with life in it"? what exactly do you mean by that

opaque condor Nov 30, 2025, 7:31 PM

#

A network that can pass traits on like a genetic algorithm but goes through the same processes similar to animals or humans in any regard

serene scaffold Nov 30, 2025, 8:03 PM

#

opaque condor A network that can pass traits on like a genetic algorithm but goes through the ...

How is what you're saying different from a genetic algorithm?
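For reference, the "passing traits on" idea is the core loop of a genetic algorithm: select the fittest individuals, recombine them, and mutate the offspring. The sketch below is a minimal illustration on a toy objective (counting 1 bits). The population size, mutation rate and generation count are arbitrary assumed values, and it does not model any biological process beyond inheritance and mutation.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

GENOME_LENGTH = 20

def fitness(genome):
    """Toy objective: the number of 1 bits in the genome."""
    return sum(genome)

def crossover(a, b):
    """Child inherits the start of one parent and the rest of the other."""
    cut = random.randrange(1, GENOME_LENGTH)
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LENGTH)] for _ in range(30)]
initial_best = max(fitness(g) for g in population)

for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # selection: keep the fittest third
    children = [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(len(population) - len(parents))
    ]
    population = parents + children  # parents survive unchanged (elitism)

final_best = max(fitness(g) for g in population)
print(initial_best, final_best)
```

Because the best individuals are carried over unchanged, the best fitness can never decrease from one generation to the next.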

opaque condor Nov 30, 2025, 9:49 PM

#

Good point

unkempt apex Dec 1, 2025, 5:26 AM

#

quasi pier Is anyone in here familiar with deep reinforcement learning ? I'm trying to solv...

elaborate more

empty cipher Dec 1, 2025, 8:51 AM

#

I want to build a tool compatible with Next.js. Can someone guide me on how to build an AI that can check 10 PDF files, each with one page, either fetched from a database or uploaded by the user?
What should I go for: an agent or a fine-tuned AI model?

raven swift Dec 1, 2025, 9:29 AM

#

empty cipher I wana build a tool compatible with next js... Can someone guide me on how to bu...

Use a Python backend (FastAPI) with Next.js frontend to upload or fetch 10 PDF files, then use OCR + structured parsing, and only call an LLM when needed. Start with a pipeline approach, not a finetuned model, unless your PDFs follow a consistent format.
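The "pipeline first, LLM only when needed" idea can be sketched without any framework: try a cheap rule-based parser on the extracted text and fall back to a model call only if that fails. Everything specific below is hypothetical, since the thread does not say what the PDFs contain: the field names, the regular expressions, and `fake_llm`, which stands in for a real model call. Text extraction or OCR is assumed to have happened already.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical field patterns; real ones depend on what the PDFs contain.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "total": re.compile(r"Total:\s*([\d.]+)"),
}

def parse_structured(text: str) -> Optional[Dict[str, str]]:
    """Rule-based parse; returns None if any expected field is missing."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            return None
        fields[name] = match.group(1)
    return fields

def check_page(text: str, llm_fallback: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Try the cheap parser first; call the LLM only when it fails."""
    parsed = parse_structured(text)
    return parsed if parsed is not None else llm_fallback(text)

llm_calls = []

def fake_llm(text: str) -> Dict[str, str]:
    """Stand-in for a real model call; records that it was invoked."""
    llm_calls.append(text)
    return {"invoice_no": "unknown", "total": "unknown"}

pages = ["Invoice No: A-17\nTotal: 45.00", "scanned page with no recognisable labels"]
results = [check_page(page, fake_llm) for page in pages]
print(results, len(llm_calls))
```

In this demo only the second page reaches the fallback, which is the cost-saving behaviour the pipeline approach is after.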

modern copper Dec 2, 2025, 1:49 PM

#

hallo.
does this channel count as scientific computing?

serene scaffold Dec 2, 2025, 2:10 PM

#

modern copper hallo. does this channel count as scientific computing?

this channel is where we talk about that

modern copper Dec 2, 2025, 2:10 PM

#

epic.

#

gang NumPy slicing is SOO hard to get in my head.
wdym the last index is 'exclusive'😭🙏

serene scaffold Dec 2, 2025, 2:23 PM

#

modern copper gang NumPy slicing is SOO hard to get in my head. wdym the last index is 'exclus...

it's the same as list slicing, in that regard
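A short illustration of the exclusive end index, using plain Python lists since basic one-dimensional NumPy slices follow the same start-inclusive, stop-exclusive rule. The values are arbitrary.

```python
values = [10, 20, 30, 40, 50]

# values[1:4] takes indices 1, 2 and 3; index 4 is excluded.
middle = values[1:4]
print(middle)

# Because the stop is exclusive, the length of a slice is simply stop - start.
print(len(middle) == 4 - 1)

# Adjacent slices share a boundary without overlapping or leaving a gap.
print(values[:2] + values[2:] == values)
```

The last two properties are the usual argument for the convention: no off-by-one adjustments are needed to compute lengths or to split a sequence at an index.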