#data-science-and-ml


iron basalt
#

(And there are still a lot of things that can be done with that simple ML (it's not all just LLMs and video generation, etc.; lots of practical problems can be solved with simple ML))

primal hemlock
#

Could liquid cooling work?

iron basalt
#

(Even if you are doing ML research btw, see LeCun right now for example, you start small and scale up later, make sure the idea works first in small)

iron basalt
#

Out of the PC, so it keeps running, but it needs to go somewhere.

#

Think about it as moving around the heat.

#

Ideally out of your house.

primal hemlock
#

Strip down an old fridge and use it

iron basalt
#

Keep replacing the bucket and that can work, if you are up for that 24/7.

#

But yeah, datacenters have this big problem, so they act like one giant PC that is the size of the whole building, and pump out the heat.

#

Liquid cooling.

#

(Locals hate it)

#

You can convert your home into a mini datacenter, if you really don't mind price.

primal hemlock
#

Heat pipe to the neighbors home, problem solved

iron basalt
#

Open their window, pipe from your window, they surely won't notice /s.

primal hemlock
#

Paint it camo

#

“This is not a heat pipe”

#

Bootlegged magnetocaloric effect fridge

primal hemlock
#

Where would you suggest I start with the beginner stuff?

versed pilot
#

For a total ML beginner start with sklearn. Linear regression etc. Maybe XGBoost once you hit the limits of what you can do with sklearn?

grim storm
#

Hello guys anyone worked with anomaly detection on Agriculture sensors ?

warm dune
serene scaffold
serene scaffold
#

I'm just fabulous. what do you think about data science and ML?

warm dune
serene scaffold
#

sure, what about it?

warm dune
#

like i genuinely could explain it to my mother and she understood

#

now i will check out a little RAG and LangChain

#

just to see what it is

half pulsar
half pulsar
warm dune
#

langchain, i have no idea what it is for

half pulsar
serene scaffold
#

The same is true when you do agentic development

serene scaffold
#

What question?

warm dune
#

transformers will teach the model how to read

#

and RAG will give the context

#

its like that?

serene scaffold
#

The reason we do RAG is because we can't trust LLMs to function as knowledge stores. They tend to form coherent sentences that make sense but are just false.

#

RAG is just the idea of looking up potentially relevant text from a knowledge store, and then putting that text after the user's question, and then letting the LLM generate text from there.
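To make the lookup-then-generate idea concrete, here is a minimal sketch of the prompt-building half of RAG. Everything in it is an assumption for illustration: the tiny `KNOWLEDGE_STORE`, the word-overlap scoring in `retrieve`, and the prompt layout in `build_prompt`. Real systems usually retrieve with embeddings and a vector index, and the final prompt would be sent to an LLM, which is left out here.

```python
# Toy RAG prompt builder. The store, the scoring rule and the prompt layout
# are hypothetical; they only illustrate "look up text, then append it".

KNOWLEDGE_STORE = [
    "The TI-84 Plus CE is a graphing calculator.",
    "LoRA fine-tunes a model by training small low-rank matrices.",
    "Softmax turns a list of scores into weights that sum to one.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(question, store, top_k=1):
    """Rank passages by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(store, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, store, top_k=1):
    """Put the retrieved text after the user's question."""
    context = "\n".join(retrieve(question, store, top_k))
    return f"Question: {question}\n\nPotentially relevant text:\n{context}\n\nAnswer:"


prompt = build_prompt("What does softmax do to scores?", KNOWLEDGE_STORE)
print(prompt)
```

The LLM then continues the text after `Answer:`, so it only has to synthesize what is already in front of it.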

warm dune
#

i think I understand

serene scaffold
#

But you can trust them to synthesize information that's immediately available to them

warm dune
#

thks pope

serene scaffold
#

I absolve thee

dull flicker
#

@versed pilot @grim storm my dms are open!

livid oasis
#

i am just curious as a beginner: how are libraries like numpy and pandas used in the later stages of machine learning?

#

or the 80/20 rule applies here?

jaunty helm
livid oasis
#

like data cleaning and pre-processing

warm dune
#

Guys, just a review question, neural networks are feature extractors

That is, the weights of neurons are, in part, vectors that simulate characteristics (after training, with the weights adjusted)

And through the dot product, we can see the similarity of the neuron (which carries a feature) and our input vector (the data) so if that vector has the features our dot product will send an 'intensity' to the next layer

That the next layer will do the same feature simulation, and now it will be kind of a 'feature of the feature', until it reaches the exit layer

And with each layer pass, the result of the 'intensity' that will be passed as an input vector, so we can extract the 'characteristic from the characteristic' and also modify the space, since this intensity ends up becoming the coordinates for a new space, so to speak
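The description above can be sketched in a few lines of plain Python. The weights below are hand-picked and hypothetical (a trained network would learn them), and ReLU is assumed as the activation; the point is only that each neuron is a dot product with the input, and the resulting intensities become the input vector for the next layer.

```python
def dot(u, v):
    """Dot product: how strongly the input lines up with a neuron's weights."""
    return sum(a * b for a, b in zip(u, v))


def relu(x):
    """Pass positive intensities through, clip negative ones to zero."""
    return max(0.0, x)


def layer(weights, inputs):
    """Each row of `weights` is one neuron; its output is relu(dot(row, inputs))."""
    return [relu(dot(row, inputs)) for row in weights]


# Hypothetical hand-picked weights, not learned ones.
layer1 = [
    [1.0, 0.0, -1.0],  # responds when x[0] is large and x[2] is small
    [0.0, 1.0, 1.0],   # responds when x[1] and x[2] are large
]
layer2 = [[1.0, 1.0]]  # combines the two first-layer intensities

x = [2.0, 1.0, 0.5]
hidden = layer(layer1, x)       # coordinates of x in the new 2-D space
output = layer(layer2, hidden)  # a "feature of the features"
print(hidden, output)
```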

serene scaffold
#

though it's not really as simple as "this layer identifies one of the features". the feature extraction is something that emerges from the whole network.

stuck swallow
mild dirge
unreal condor
warm dune
#

guys in the context of transformers, whats the main difference between heads and blocks?

unreal condor
#

A head is one attention computation built from three matrices: Query, Key, and Value (each is a learned projection of the previous layer's output). I think these three are inspired by the concept of information retrieval. If you compute several heads in parallel, you have multi-head attention. A block is the bigger unit: multi-head attention plus a feed-forward layer, and the model stacks many of these blocks.

#

Also, I have given up ML long ago so pls fact check : )

tawdry heart
#

@warm dune there's a 3b1b vid on transformers which is pr good if u haven't seen it

frigid niche
#

I am currently working on a Language Model that runs on the TI 84 Plus CE. It is 200k parameters! It uses syllables as a tokenization system. I have it running on the actual hardware, but did testing with an emulator first. I should have all of the documentation ready in a few days or so, but I was really excited to share a sneak peek!

iron basalt
# primal hemlock Where would you suggest I start with the beginner stuff?

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016
View the complete course: http://ocw.mit.edu/6-0002F16
Instructor: John Guttag

Prof. Guttag provides an overview of the course and discusses how we use computational models to understand the world in which we live, in particular he discusses the knapsack problem and g...

▶ Play video
#

After which I would find some resource on deep learning, and after that, pick some topic, such as computer vision. But it's really important to get these foundations covered from resources like that MIT course I linked. Without them you won't really know what it's all based on, and how to correctly evaluate various methods (and how not to do statistics (many ways to mess it up)).

#

Small note on the MIT course, they use outdated Python libraries, specifically PyLab, use matplotlib.pyplot to plot things instead and other replacements for things they do.

warm dune
#

Guys, in the context of fine-tuning a model (LLM), such as specializing it in a subject, transforming it into a chatbot and more, is there a place where I can explore that?

#

a video, article or anything

serene scaffold
warm dune
# serene scaffold you can look into how it's done conceptually, but fine-tuning also requires a lo...

I saw a video of a guy saying that

  1. Models like GPT and more are trained on text corpora (pre-training)

And in this context, the model doesn't know how to respond like an assistant, it would just complete the sentences. Then fine-tuning comes in

  2. For fine-tuning, the person in the video explained that we would 'train' the model, but only some weights, and gave LoRA as an example, which would then make the model respond like an assistant

I was trying to talk about it, I don't know if I used the wrong terms
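As a rough illustration of "training only some weights", here is a sketch of the usual LoRA formulation, where a frozen weight matrix `W` gets a trainable low-rank correction `B @ A` scaled by `alpha / r`. The matrix sizes, the values, and the helper names `matmul` and `lora_weight` are assumptions made for this example; the video's exact setup is not known.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(W, A, B, alpha):
    """Effective weight W + (alpha / r) * B @ A, where r is the rank (rows of A)."""
    r = len(A)
    delta = matmul(B, A)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Frozen pre-trained weight (4 x 4 identity here, purely for illustration).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[1.0, 2.0, 3.0, 4.0]]        # r x d_in, trainable
B = [[0.0], [0.0], [0.0], [0.0]]  # d_out x r, trainable, starts at zero

unchanged = lora_weight(W, A, B, alpha=1.0)  # B is zero, so nothing changes yet

B[0][0] = 0.5                                # pretend one training step moved B
adapted = lora_weight(W, A, B, alpha=1.0)

trainable = len(A) * len(A[0]) + len(B) * len(B[0])  # 8 numbers
frozen = len(W) * len(W[0])                          # 16 numbers
print(adapted[0], trainable, frozen)
```

With real model sizes the gap is far larger, which is why this kind of fine-tuning is much cheaper than updating every weight.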

agile cobalt
# warm dune I saw a video of a guy saying that 1. Models like GPT and more, are trained wit...

both things are called fine tuning,

  • going from a pre-trained model to an instruct-tuned chat model
  • further tuning an already instruct-tuned chat model to follow some specific formatting/guidelines
    but they are entirely different beasts, the former requiring orders of magnitude more data and compute than the latter

all major chatbot models like chatgpt, gemini, deepseek, qwen etc. go through some pre-training and fine-tuning, but there is relatively little to gain from further fine tuning models afterwards unless you have some very specific use case

warm dune
# agile cobalt both things are called fine tuning, - going from a pre-trained model into a inst...

I was watching Karpathy's video where he creates NanoGPT, and the predictions were based on the text itself. Then I started thinking: "If I use a dataset like Shakespeare's, it won't respond like a chatbot." So I looked into it and discovered fine-tuning LLMs: I can take a ready-made model like GPT-2 and transform it into whatever I want, since it's already trained to recognize context and more. I saw a comment on Twitter saying that this was the industry standard.

They train the model to learn context from the entire internet -> They do fine-tuning so it acts like a chatbot.

But you said there's little gain, so how would it work to have a greater gain?

#

like to transform a model to a chatbot

agile cobalt
# warm dune I was watching Karpathy's video where he creates NanoGPT, and the predictions we...

the open source models labs publish on huggingface and such already do everything we know of that leads to a greater gain

there is little (if anything) to improve that would lead to better general purpose usage, most fine tunes are either trying to remove censorship, improve the performance for niche scenarios at the cost of general performance, or aim for some ultra specific task

warm dune
agile cobalt
#

both
see the description of https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct for example, preferably also look into the actual papers and technical reports

Model Architecture: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
(in that release series, -Instruct is the chatbot, https://huggingface.co/meta-llama/Llama-3.1-8B is the base model ; some others invert it by adding a -Base suffix to the base model and no suffix to the instruct model)

frigid niche
#

Hello there everyone! I have recently completed a syllable-level autoregressive language model that runs entirely on a TI-84 Plus CE calculator! It generates original English prose and poetry from a seed phrase, doing all inference on-device with no external hardware! The architecture is something I am really proud of. Rather than working at the word or character level, the model tokenizes language into its phonetic syllable components, onset, nucleus, coda, stress, and word boundary, and predicts each one through six separate factored output heads. The hidden layer is 198 neurons split into two 99-neuron chunks to fit the TI-84's matrix constraints, with 21-dimensional embeddings per component and a context window of 10 syllables. There is also a 16-dimensional discourse state and an 8-dimensional word state that carry meaning across the generation, giving it a sense of narrative continuity! The full input dimension ends up at 874. The biggest challenge was getting inference to run at all on 154KB of RAM. I precompute the token-context H1 contributions ahead of time so the calculator only has to add vectors instead of multiplying full matrices at runtime, and the output weights are repacked in column-major order for a further speedup. Even with all of that, a full generation run takes about 2.5 to 3 hours on the calculator. You also have to keep an eye on it and confirm garbage collection prompts periodically, which I find adds a certain charm to the experience!

I hope that others will find joy, intrigue, or inspiration from this project. If anyone checks it out, please let me know what you think!

https://github.com/exploratorystudios/TILM2

GitHub

Syllable-level autoregressive language model that runs on a TI-84 Plus CE calculator. Full pipeline: corpus generation, training, PC inference, and export to native TI variable files. Generates poe...
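The precomputation trick described in the post can be sketched generically. This is not the project's actual code: the tiny `embeddings`, the per-slot weight blocks `W_slots`, and the function names are all hypothetical. The idea shown is that when the first-layer input is a concatenation of per-slot embeddings, each slot's contribution to the hidden layer can be tabulated ahead of time, so at runtime the hidden pre-activation is just a sum of looked-up vectors.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical sizes: 3 tokens, embedding dim 2, hidden dim 3, context of 2 slots.
embeddings = {0: [1, 0], 1: [0, 1], 2: [1, 1]}
W_slots = [
    [[1, 2], [0, 1], [3, 0]],  # weights applied to the embedding in slot 0
    [[2, 0], [1, 1], [0, 4]],  # weights applied to the embedding in slot 1
]


def hidden_direct(tokens):
    """Reference version: multiply weights by embeddings at runtime."""
    h = [0, 0, 0]
    for slot, tok in enumerate(tokens):
        contrib = matvec(W_slots[slot], embeddings[tok])
        h = [a + b for a, b in zip(h, contrib)]
    return h


# Done once, ahead of time: table[slot][token] is that token's contribution.
table = [{tok: matvec(W, emb) for tok, emb in embeddings.items()} for W in W_slots]


def hidden_from_table(tokens):
    """Runtime version: only vector additions, no matrix multiplies."""
    h = [0, 0, 0]
    for slot, tok in enumerate(tokens):
        h = [a + b for a, b in zip(h, table[slot][tok])]
    return h


print(hidden_direct([2, 1]), hidden_from_table([2, 1]))
```

The trade-off is memory for speed: the table needs one hidden-sized vector per (slot, token) pair.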

frigid niche
#

The constraint does not limit the work. It becomes the work.

wintry brook
#

Is there any docs to study ds & ml from ?

grand minnow
wintry brook
devout osprey
#

any resource for pandas aside docs?

#

more of a practical way to learn ?

grand minnow
oak veldt
gloomy fractal
#

i didnt get how the tutor did the last step

serene scaffold
#

I don't see where x_2 is ever defined? am I blind, @gloomy fractal?

gloomy fractal
#

you could see the 2nd image

gloomy fractal
#

of last step

warm dune
#

guys, is there any lib for RAG, or something like that? what is the standard of the industry?

unreal condor
#

you probably have more animal_migration instances in your dataset

trim dock
#

I checked multiple times

unreal condor
#

apart from acc, like Precision, Recall, F1?

trim dock
# unreal condor have you tried other metrics when training ?

Uhm... no, i just know the very basics of this and am just a hobbyist, but i will try the ones that you have mentioned. also i will re-re-check the image distribution you mentioned

Will see these when i receive the free credit things on colab

And i have re-opened the thread
Replies could be late as i am studying!

unreal condor
#

Accuracy is probably the least trustworthy metric tbh
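A small sketch of why accuracy can mislead on an imbalanced dataset. The label counts and the second class name `cat` are made up for illustration; the "model" here simply predicts `animal_migration` for everything.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical imbalanced test set: 90 animal_migration doodles, 10 cats.
y_true = ["animal_migration"] * 90 + ["cat"] * 10
# A degenerate model that always answers animal_migration.
y_pred = ["animal_migration"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
cat_scores = precision_recall_f1(y_true, y_pred, "cat")
print(accuracy, cat_scores)
```

Accuracy looks fine at 90%, while recall for `cat` is zero, which is the kind of failure per-class metrics expose.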

trim dock
#

Can i ping you here or in threads when i work on it again on colab?

unreal condor
#

whatever works tbh

#

also disclaimer, my ML knowledge is kinda rusty since I have given up ML for a while

#

also, did you test your model with custom inputs?

trim dock
trim dock
#

It was working whenever i gave it a photo from the test set, but it fails miserably when i doodle myself: 99% of the time it misclassifies it as animal_migration no matter what i doodle

unreal condor
trim dock
#

Was saying this animal migration 😂

trim dock
unreal condor
unreal condor
trim dock
trim dock
unreal condor
trim dock
unreal condor
trim dock
#

I do now understand why it says it is animal_migration

trim dock
#

Like, i made a digit_recognizer once, and its processed images (28x28) didn't look like this; they still preserved info

unreal condor
#

I think your input is also kinda wrong

#

the object should be white and the background black

unreal condor
#

your data instances are more pixelated too
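The two points above (white object on black background, coarser pixels) can be sketched without any image library. The 4x4 `doodle`, the block-averaging `downsample`, and the function names are assumptions for illustration; a real pipeline would load the drawing with an imaging library and match whatever size and polarity the training data used.

```python
def invert(image):
    """Flip an 8-bit grayscale image: dark strokes on white become white on black."""
    return [[255 - px for px in row] for row in image]


def downsample(image, factor):
    """Shrink by averaging each factor x factor block (sizes must divide evenly)."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            // (factor * factor)
            for c in range(0, width, factor)
        ]
        for r in range(0, height, factor)
    ]


# Hypothetical doodle: a black square (0) drawn on a white canvas (255).
doodle = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]

prepared = downsample(invert(doodle), 2)
print(prepared)
```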

trim dock
unreal condor
#

This is one of the toy datasets that has like no application in the real world tbh so your custom inputs will have a hard time to fit in the model :/

trim dock
#

See, this is a preprocessed image from my digit recognizer

trim dock
unreal condor
#

could OpenCV (cv2) do your thing?

trim dock
unreal condor
trim dock
unreal condor
trim dock
trim dock
unreal condor
#

no? all of this preprocessing stuff doesn't need a GPU?

trim dock
unreal condor
#

use it to preprocess data then

trim dock
#

Tho i trained the digit_recogniser on phone since it was small but not this sht

unreal condor
#

there is also kaggle for free GPU

trim dock
trim dock
unreal condor
#

idk, I haven't tried preprocessing images like this before

#

but OpenCV is literally for working with images so you should check it out if you can

trim dock
#

Thank you for pointing this out tho as i had forgotten about this i will go re-write the preprocess code using PIL since it worked last time

trim dock
#

@unreal condor using PIL to pre-process the data worked like a charm, it's at least classifying correctly. however, when it doesn't, it throws it in the animal_migration category, which tbh is frustrating, but hey, one step closer.
Thank you greg!

dense kite
#

I have a Query?

When humans see something, we immediately build mental stories and simulate possible futures. Current AI models generate predictions based on patterns in data, but do not seem to have internal simulation or understanding.

Do you think large neural networks are developing a form of internal world-model or imagination-like process, where they can simulate future outcomes beyond pattern completion? Or is this still fundamentally different from human cognition?

serene scaffold
abstract wasp
#

Hi do u guys use cursor ide and uv for building ai agents? I’m new to agents, is it better than just using vscode?

iron basalt
# dense kite I have a Query? When humans see something, we immediately build mental stories ...

World models are a thing in ML, they do already exist, but it's about how to do that well. It's part of the whole. Humans have many different subsystems for specific things, a general system wrapped around all of that (literally), and a meta system (which may or may not be called "consciousness" depending on who you ask / how you define that). What we have with a lot of things right now in AI/ML is basically taking one part of one of those systems, making a very crude approximation or just loosely inspired by it and scaling that up really big. But another big thing is just the high level design goals of these things. Humans for example will do things without being prompted, they will say "IDK" instead of always giving an answer with confidence, they are aligned (to varying degrees) with other humans in terms of goals and "taste," they don't need to be trained all ahead of time on a massive dataset (they learn "online"), they have a meta algorithm applied to the whole population (evolution), etc. A lot of things in AI/ML just don't even have these design goals, they are meant to do some specific job or set of jobs. Very different from a thing that just exists/survives and does stuff on its own (lots of interacting parts / goals). Human cognition involves this dance of all these systems interacting (this is not including the rest of the body which is also part of it all (and also social, etc)).

warm dune
#

someone knows a good article for model monitoring?

limpid zenith
# dense kite I have a Query? When humans see something, we immediately build mental stories ...

there is a huge body of evidence that suggests that there is no world model in LLMs; the Myhill-Nerode theorem and similar results showcase this

https://arxiv.org/abs/2406.03689v1

devout osprey
#

hey, i am done with python, numpy and pandas, and might look into matplotlib and seaborn later.
can anyone tell me good resources for learning ml/dl, and the mathematics required for ml/dl?

grand minnow
devout osprey
grand minnow
devout osprey
gloomy fractal
#

anyone active?

serene scaffold
gloomy fractal
#

hi @serene scaffold

#

can i create all linalg concepts from scratch

#

is it a good idea?

gloomy fractal
#

as codes

serene scaffold
#

don't ping people to say hi before you say the thing that you actually want them to read and respond to. that's like calling someone on the phone and then immediately putting them on hold, and is rude

#

you can implement linalg algorithms in python, yes. you'll get worse performance than if you had used numpy.

gloomy fractal
#

oh....mb

gloomy fractal
#

?

serene scaffold
#

what about that statement do you find confusing?

gloomy fractal
#

numpy is better for linalg? and doing the other way is worse?

serene scaffold
#

numpy is implemented in C and can apply elementwise operations to whole arrays at once using vectorized CPU instructions (SIMD), so it scales much better than pure python.

gloomy fractal
#

i want to build to learn..revision is boring, maybe building helps

serene scaffold
#

but writing something from scratch is a great way to learn, so go ahead and do it in pure python if you think that will help.
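As a possible starting point for that kind of from-scratch exercise, here is a minimal pure-Python matrix multiply and transpose, with matrices stored as lists of rows. The function names are just illustrative choices; as noted above, numpy would be much faster for real work.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul(a, b)
print(product, transpose(a))
```

Comparing the results against numpy on random inputs is a handy way to check each new function.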

gloomy fractal
#

okay

foggy jay
#

Hi everyone

#

I want to build ML projects any suggestions?

serene scaffold
#

I don't expect you to know what all of that means, but you'll be able to figure it out.

grand tulip
#

I have a research project involving camera object detection and I'd like to gain a solid understanding of OpenCV beforehand (the tool used in the research might not be OpenCV, but at the end of the day they’re all similar). What are the best resources?

mellow spruce
serene scaffold
half pulsar
# mellow spruce

I checked your Github and I’m not seeing AGI here. I’m seeing heavily branded LLM/tooling projects with AI-generated imagery and inflated claims.

If there’s real substance, explain it plainly. Otherwise call it an agent framework, not AGI.

gloomy fractal
#

how often OOP is used in ML

#

and in data science

serene scaffold
gloomy fractal
serene scaffold
#

what are they teaching you about OOP that you feel is depthful?

gloomy fractal
#

most of the dunder methods..,callables, using specific libs, descriptors, enumeration and many more....i covered only classes part in one month

serene scaffold
#

a lot of ML libraries use dunder methods, so it's good to understand them
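A small sketch of the kind of dunder methods that show up in ML code. The `Vector` class is hypothetical; it mirrors patterns such as `@` for products in numpy (via `__matmul__`) and `len(dataset)` / `dataset[i]` in PyTorch-style datasets (via `__len__` and `__getitem__`).

```python
class Vector:
    """Tiny illustrative vector type built on a Python list."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        # Enables len(v)
        return len(self.values)

    def __getitem__(self, index):
        # Enables v[i]
        return self.values[index]

    def __add__(self, other):
        # Enables v + w (elementwise)
        return Vector(a + b for a, b in zip(self.values, other.values))

    def __matmul__(self, other):
        # Enables v @ w (dot product)
        return sum(a * b for a, b in zip(self.values, other.values))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.values == other.values

    def __repr__(self):
        return f"Vector({self.values})"


v = Vector([1, 2, 3])
w = Vector([4, 5, 6])
print(v + w, v @ w, len(v), v[0])
```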

gloomy fractal
#

so is it good?

serene scaffold
#

yeah

warm dune
#

I don't think it gets much more than that about OOP

pseudo sundial
#

Hi,I have 5 years in game industry as an animator is it possible to switch careers to the data field (data engineer or data analyst) at age 27?

half pulsar
pseudo sundial
#

Yeahh I really want to change career into the data fields. So far I join online course @half pulsar hopefully works well 😄😄

half pulsar
ashen echo
#

looking to use PandasAI to do some data analysis, any suggestions for which underlying LLM i should use other than OpenAI? I am mostly doing some transforming and analysis of excel files.

true pollen
fading wigeon
royal raven
#

anyone here tried doing a sentiment analysis for book reviews?

ashen echo
# fading wigeon Why are you using an LLM for data analysis?

Pandas is the de facto standard, but I feel like automating some of my weekly analysis of certain excel files I pull down from our CRM system could produce certain conclusions automatically and give me a second angle to look at my data sometimes. I think PandasAI could help with that. Unless you have another suggestion?

mossy blaze
half pulsar
# mossy blaze I've made progress on my neuro-symbolic hybrid AI project. My latest work is ava...

I really like how clean and grounded this work is. I especially respect that it reports concrete benchmark results and openly states current limitations instead of overselling the system. The eval results are still limited, but that honesty makes the project feel more credible. The architecture is understandable, testable, and built around a clean separation between LLM-guided proposal and deterministic symbolic verification. Very nice work overall consider me impressed!

fading wigeon
brisk lantern
#

what would be a good free platform for building a chatbot for a uni assignment?
we are planning to build an expert system that basically functions as a sorta knowledge base that allows users to ask basic questions and learn more about a specific topic. Overall it is not going to be a very complex system.

warm dune
warm dune
#

does anyone know of a good, up-to-date article about model monitoring?

jaunty helm
brisk lantern
#

yep it matches that definition of an expert system

jaunty helm
brisk lantern
#

i see, ill check that out

#

thank you

jaunty helm
#
  • you can deploy it to a website for free p easily
sharp apex
#

im fresh out of high school and i want to get into data science
is there a roadmap for this field?

#

ive seen people saying python-> SQL -> apache airflow for data science
i know some other PLs so learning python shouldnt be hard, ive learned a thing or two about SQL as well but idk anything abt apache airflow

#

sorry if this is the wrong channel to ask questions

heavy crow
#

I have a question on object detection transformer architectures. Standard softmax attention artificially dilutes attention across multiple objects and forces unnatural focus onto empty backgrounds. E.g. if the image doesn't contain any objects, it still has to attend somewhere! And if there are many objects, or one object is made up of two patches that are far away from each other, it has to split its attention across them. Wouldn't an independent, per-token sigmoid activation fix this by allowing the model to flexibly attend to multiple targets simultaneously or completely ignore the background?

heavy crow
#

Here is a plot to visualize it, with what i believe happens with softmax on the left and what i think would happen with sigmoid on the right

#

it might still attend a bit to the first token because its a bit different than the other background tokens but less than the softmax.

unreal condor
pulsar crow
#

What are some examples of quantitative data analysis methods?

#

mean, median, mode, standard deviation...
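For reference, those four summaries are available in Python's standard `statistics` module. The sample numbers below are made up for illustration.

```python
import statistics

# Hypothetical sample, e.g. eight measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
spread = statistics.pstdev(data)        # population standard deviation

print(mean, median, mode, spread)
```

`statistics.stdev` gives the sample standard deviation instead, which divides by n - 1 rather than n.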

orchid lance
#

I'm looking for guidance to break into tier 1, buy-side quantitative hedge funds. I'm already a quant, but at a lower level (in risk & control side). My resume is probably good enough to get interviews with although I lack the pedigree. It would be nice if anyone can help me understand this industry because I don't really have connections in the space.

#

I'm currently studying the "Green Book" (A Practical Guide to Quantitative Finance Interviews) and doing NeetCode top 300 problems. I don't know if this is enough. I was thinking about also setting up an algorithmic trading bot and building out several machine learning projects to bolster my resume.

heavy crow
unreal condor
heavy crow
#

just from the scaled dot-product attention. so softmax(Q*K^T/sqrt(d_k))*V
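A minimal pure-Python sketch of scaled dot-product attention, with K transposed so that each query row is dotted with each key row. The tiny `Q`, `K`, `V` matrices are hypothetical; real implementations use batched tensor operations, multiple heads, and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for matrices stored as lists of rows."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # One score per key: dot(q, k) scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value rows.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


# Hypothetical toy inputs: one query, two keys, two values.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]

outputs, weights = attention(Q, K, V)
print(outputs, weights)
```

Each row of `weights` sums to 1 because of the softmax.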

orchid lance
# pulsar crow Whate are some Examples of Quantitative Data Analysis Methods?

In risk & control side, it doesn't go much farther than that plus a few more concepts. There's also outlier shooting algos, decision trees, RF, ATT/BTT analysis, confusion matrices for testing/validation, LASSO/Ridge regression, rule-based modeling. Time series for these models is almost always on monthly/quarterly/yearly basis, or rolling windows of 3/6/12 months; often, these windows are compared to the same window of the preceding year. All of that is for alert generation for a given model and there tends to be another layer that manages alerts across all models and can alter weights of the feeder models. If you say all of that, you'll 100% break into this field easily haha. That's the cheat sheet.

#

Being a quant in risk and control is like data scientist lite tbh. What I've just mentioned is pretty bottom of the barrel in terms of what other data scientists can do.

#

I was just really hoping someone here knew the process of becoming a quant trader/researcher/strategist at a tier 1 firm. I'm not sure how to differentiate myself and be taken seriously by the interviewers. I'm not even really sure about the interview topics.

unreal condor
# heavy crow In normal attention, we use softmax. This normalizes in such a way that the sum ...

I can't see how "it's always has to equal one" since the chance of equal one is astronomically low after the input has been passed through so many layers. And "it has to spend its attention somewhere" doesn't sound right because the attention block isn't the final block. And also the phrase "the model can pay attention using the attention mechanism" is kinda overly romanticized. Truth is deep within the layers of a neural net, things work like a blackbox so you shouldn't think of "attention" too literally

heavy crow
#

"I can't see how 'it's always has to equal one' since the chance of equal one is astronomically low after the input has been passed through so many layers."
Why? softmax ensures this.

unreal condor
heavy crow
#

Yes, i mean it sums to 1. That means it cant output zero across the board.
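The normalization property being discussed can be checked numerically. This sketch only compares the two activations on a made-up row of uniformly low scores; it does not show whether sigmoid attention would actually help a detection model, which would need an experiment.

```python
import math


def softmax(scores):
    """Weights that always sum to one, however low the raw scores are."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent per-token gate between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical scores for four background tokens, all strongly negative.
background_scores = [-8.0, -8.0, -8.0, -8.0]

soft = softmax(background_scores)
sig = [sigmoid(s) for s in background_scores]
print(soft, sig)
```

Softmax spreads the full unit of attention evenly over the four tokens, while the sigmoid gates can all stay close to zero at once.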

unreal condor
#

why do you want 0?

#

Like I said, it would be hardmax and iirc Andrew Ng explained why softmax is preferred

heavy crow
#

im working with object detection and am noticing that for scenes with no objects the model has a hard time predicting a low background confidence

#

thats why i thought it might be because of the softmax

unreal condor
#

is background a class that need to be classified in your dataset?

heavy crow
#

this is pointcloud data so the model predicts a confidence and a position vote, dataset is about 1:3 balanced for bg vs fg

unreal condor
#

so it's either bg or fg?

#

no other classes?

heavy crow
#

right. So single class segmentation with regression

unreal condor
#

oh, so segmentation

#

I thought you meant object detection like drawing bounding boxes around the objects

heavy crow
#

well it is detection, each token casts a vote for the centroid of the object and its bb

#

but it does this for all points (or a subset) in the scene, which makes the confidence score more of a segmentation task

unreal condor
#

object segmentation and object detection are two different problems tho

heavy crow
#

Here the model does both.

unreal condor
#

Is it a new problem or sth? Combining both detection and segmentation? I quit ML like a long time ago so I don't update myself anymore

heavy crow
unreal condor
obtuse acorn
#

anybody able to help me do something more efficiently?

#

basically ive got a pandas dataframe that i got from reading json, and its got a column with a list in it

#

and im wanting to compare the lists of each row and store the overlapping data

#

currently im doing this but it doesnt seem very efficient

#

im 99.99% sure theres a better method

#

but my brain isnt coming up with it

serene scaffold
#

@obtuse acorn remember to always share code as text. Not as a screenshot.

I think your code would be faster if you skipped pandas entirely and used sets.

obtuse acorn
#
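import pandas as pd

# ids: one list of card-id strings per row of the dataframe
newData = []
for card in ids:
  for card2 in ids:
    card3 = pd.Series(card)
    card4 = pd.Series(card2)
    # keep the entries of card that also appear in card2
    compared = card3[card3.isin(card4)]
    if compared.count() > 0:
      newData.append(compared)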
serene scaffold
#

What type is ids?

obtuse acorn
#

they are strings

#

i think

#

a list of strings

serene scaffold
#

@obtuse acorn I'm busy (at pycon no less) but look into sets and set intersection in python. It's designed to solve this exact problem
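A minimal sketch of the set-based approach suggested above. The sample `ids` lists and the function name `pairwise_overlaps` are assumptions for illustration; each row's list is converted to a set once, and every unordered pair of rows is intersected with `&`.

```python
from itertools import combinations


def pairwise_overlaps(id_lists):
    """Map (row_i, row_j) to the ids the two rows share, skipping empty overlaps."""
    sets = [set(ids) for ids in id_lists]
    overlaps = {}
    # combinations() visits each unordered pair once and never pairs a row with itself.
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        common = a & b
        if common:
            overlaps[(i, j)] = common
    return overlaps


# Hypothetical data: one list of card-id strings per dataframe row.
ids = [["a1", "b2", "c3"], ["b2", "c3", "d4"], ["e5"]]

overlaps = pairwise_overlaps(ids)
print(overlaps)
```

If the lists live in a dataframe column, something like `df["ids"].tolist()` would produce the input, assuming the column is named `ids`.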

viscid wigeon
#

Hey guys, I am a beginner in ML and data science, I want to know what concepts I have to know. For instance, I am a jr web developer and I want to implement a model that predicts disasters in a weather app

obtuse acorn
serene scaffold
obtuse acorn
#

i figured out why it was going so slow

#

i had exported it wrong and it had turned each character of the strings in the list into a set

primal hemlock
#

I just realized how much money this turing pi thing really costs.

#

Damn near a thousand. Is there anything else I could use to learn ML?

iron basalt
primal hemlock
#

Alright then

mellow vector
versed pilot
#

yes, both kaggle and colab offer GPU and TPU acceleration options

#

But talking of learning ML, you don't have to go straight for a GPU. learn the basics first, do some linear regression, look at scikit-learn etc.

gritty void
#

There are cheaper GPU rental options like lightning.ai or vast.ai, but the free tier of Kaggle and Colab Pro+ should be the first steps.

#

In the long term, buying a GPU with at least 32GB VRAM might be the cheapest option tho.

obtuse acorn
#

just marry someone with a powerful gpu

#

smh

gritty void
obtuse acorn
#

prices of weddings vary widely

fallow coyote
#

Apologies for being off topic (i.e. not talking about python), but how would you lot use Go for DS and ML?

fallow coyote
#

Im thinking about learning another programming language along with python so I want to see how. Just want to increase my skillset and see how I can use Go for data science and ML purposes.

serene scaffold
fallow coyote
#

I might do that then. Tbf I was thinking about using Go for more network based projects. Could be useful if I need to quickly setup a network application

versed pilot
#

It's not a language that is often mentioned for DS. Julia, R etc. yes.

serene scaffold
tiny mauve
#

hey I was just wondering if anyone here is a data scientist. if possible, can I dm someone for advice on a roadmap? I’ve done my research online but I don’t know anyone with actual expertise in my life n wanted some personal help, so I’d appreciate havin a more in depth conversation in dms. I’m 23, restarting my life as a returning student at community college n plan to transfer to uci after. any words would be greatly appreciated

serene scaffold
tiny mauve
#

university of Cali, irvine

serene scaffold
#

Did you get a bachelor's in something else previously?

tiny mauve
#

I took a big gap(3 years) and before I wasn’t really focused on school, I was pursuing a side hustle which ended up falling off

#

I’m coming back with a 2.23 gpa, n am trying to figure out a strategy to bring it up to an admissible grade for uc transfer (3.5)

serene scaffold
#

So a few things you should know:

Tech hiring is way down. It might improve by the time you finish a degree. You should look at how much debt you're looking at and what your risk tolerance is.

"Data scientist" has never had a widely agreed upon or consistently applied meaning. You should look at current job listings for various titles and see what skills are being asked for.

tiny mauve
#

I haven’t done too much extensive research yet on job listings for the field, I just figured it would work if I was passionate in business and analytics of the sort, coming back after the gap I figured I sort of had passions for understanding data n stuff along the lines of that

serene scaffold
#

Then I would include "analyst" in the list of job titles that you look for listings for

versed pilot
cursive cosmos
#

speaking from experience of working at large ecom w/ 50m+ MAU

drowsy pollen
#

im really new w ML

obtuse acorn
#

im trying to think what would be the best way to store overlaps between data

#

like the easy way is to just store copies of the overlapping parts

#

but you could instead do something like storing the index of the overlapping parts and just read the data from the array when you need it
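
A minimal sketch of the index-based idea above, assuming the data lives in one flat list and each chunk is described by a start/stop range into it. `Chunk` and `overlap` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A window into a shared array, stored as indices instead of a copy."""
    start: int
    stop: int  # exclusive

    def read(self, data):
        # Slice only when the values are actually needed
        return data[self.start:self.stop]


def overlap(a, b):
    """Return the overlapping index range of two chunks, or None if they are disjoint."""
    start, stop = max(a.start, b.start), min(a.stop, b.stop)
    return Chunk(start, stop) if start < stop else None


data = list(range(100, 110))
first, second = Chunk(0, 6), Chunk(4, 10)
shared = overlap(first, second)
print(shared, shared.read(data))
```

The trade-off is memory versus convenience: the index version stores two integers per overlap, but every reader needs access to the original array.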

vale badge
#

Hey peeps, I've got this graph, (Hue mean average over time) and it's showing some very strange oscillations. If I do a Fourier transform on the data set will that smooth out the whole graph? Also, if I want to find the frequency of the oscillation, and what might be causing it, how would I go about it?

Thanks in advance,

cursive cosmos
# vale badge Hey peeps, I've got this graph, (Hue mean average over time) and it's showing so...

its very hard to say for certain without knowing origin of the data, but oscillations are natural e.g. in physical systems.

yeah, fourier can help you keep only the frequencies under some threshold and "dampen" the signal; you'll have to inspect the result to make sure it didn't remove the wrong frequencies, though.

you can also do exponential moving average (EMA), which could be more versatile, since you can more easily iterate over weights. This feels much safer than frequency threshold.

Btw I have an implementation of EMA for the Adam optimizer in this notebook (there's also a link to the blog post that gives an overview of EMA and where exactly it appears in Adam): https://github.com/sutskelis/sutskelis_explains_stuff/blob/main/optimizers.ipynb

GitHub

Dragon gives interview-friendly prespective on Machine Learning - sutskelis/sutskelis_explains_stuff
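
For reference, a minimal sketch of the exponential moving average mentioned above. It is not taken from the linked notebook; the function name `ema`, the seeding with the first sample, and the example values are assumptions for illustration.

```python
def ema(values, alpha=0.1):
    """Exponential moving average: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x  # seed with the first sample
        smoothed.append(alpha * x + (1 - alpha) * prev)
    return smoothed


noisy = [10, 12, 9, 30, 11, 10, 12]  # one spike at index 3
print([round(v, 2) for v in ema(noisy, alpha=0.3)])
```

A smaller `alpha` smooths more but reacts more slowly, which is the single weight you would iterate over.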

wooden sail
#

the fourier transform doesn't do any smoothing, it only gives you an alternative representation of the data. it should be able to tell you something about the nature of the oscillations
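
A small sketch of reading the oscillation frequency off the magnitude spectrum. It uses a naive O(n²) DFT to stay dependency-free (in practice `numpy.fft.rfft` is the usual tool), and the 30 fps sampling rate and synthetic 2 Hz signal are assumptions standing in for the real hue trace.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for a real-valued signal (naive DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # drop the DC offset so bin 0 doesn't dominate
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        spectrum.append((k * sample_rate / n, abs(coeff)))
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Frequency of the strongest non-DC component."""
    return max(magnitude_spectrum(samples, sample_rate), key=lambda fm: fm[1])[0]


# Synthetic stand-in: a 2 Hz oscillation sampled at 30 frames per second
fps = 30
signal = [0.5 + 0.1 * math.sin(2 * math.pi * 2 * t / fps) for t in range(150)]
print(dominant_frequency(signal, fps))
```

The frequency resolution is `sample_rate / n`, so a longer recording separates nearby frequencies better.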

versed pilot
#

which should pick up the periodicity, if there is any

#

and you can do rolling mean or rolling median etc. for smoothing
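
A rough sketch of both suggestions, assuming the periodicity check meant here is autocorrelation (the plot mentioned just below suggests so). The function names and the synthetic series are illustrative only.

```python
from statistics import median


def autocorrelation(values, lag):
    """Normalized autocorrelation of a series at one lag (1.0 at lag 0)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0  # a constant series has no structure to correlate
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


def rolling_median(values, window=5):
    """Centered rolling median; the window shrinks at the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


series = [0, 1, 0, -1] * 10  # repeats every 4 samples
best_lag = max(range(1, 10), key=lambda lag: autocorrelation(series, lag))
print(best_lag)
print(rolling_median([1, 1, 50, 1, 1], window=3))
```

The median is less sensitive to isolated spikes than the mean, which matters if the trace has occasional outliers.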

#

This is an autocorrelation plot from a project I'm working on

vale badge
wooden sail
#

so the suggestion of lowpass filtering will probably work there. without knowing anything else about the topic, stuff like lighting changes introduces very sharp transitions

versed pilot
#

That looks like you occasionally have outliers that skew the distribution?

#

Not really periodic, they start very frequent and become more sparse over time

wooden sail
#

maybe a fourier transform can show you if there is a clear chunk of the spectrum that is nice, and other stuff that is noiselike

versed pilot
#

so I wouldn't do either fourier or autocorrelation, I would go back to the raw data before the average

wooden sail
#

but also maybe not. you can try to lowpass and also plot the magnitude spectrum and see if you learn something

#

what you can do is pick out a few of the frames, going by the timestamp, where these spikes occur

versed pilot
#

maybe take a small window around 300s and plot all points, or do box and whiskers for selected times etc.

wooden sail
#

see if there is anything explainable causing the variations and whether they need to be addressed

vale badge
# wooden sail so the suggestion of lowpass filtering will probably work there. without knowing...

My working theory is that it's lighting related. The video I'm recording and analysing is from a webcam, with the lighting being provided from an LED, and I think the webcam is picking up the flicker, but I've used this same LED under different conditions and not had this effect at all before.

But to summarise what you're all saying:
Go back to the original data and look for trends.
Maybe Lowpass filtering,

And analyse specific frames with notable peaks

Thanks guys

sudden canyon
#

!rule 6 9 @jagged dew We do not allow looking for developers on this server.

arctic wedgeBOT
#

6. Do not post unapproved advertising.

9. Do not offer or ask for paid work of any kind.

round crystal
#

Open source models are lowk scary like why do I have to download 7600 zigabytes of parameters

fading sedge
ionic zealot
#

Hi everyone, I’m starting from zero and my goal is to learn programming first, then move into AI and machine learning. I prefer a desktop PC. What build would you recommend for this path if I want something reliable, upgradeable, and good for the long term?

unreal condor
#

unless you have some serious budget, a normal setup is more than good enough for daily tasks. Just use cloud computing when you have enough knowledge and want to build some large models

vale badge
#

So I looked in depth at the RGB averages the webcam is picking up, and it turns out the pattern matches some small variations in the blue channel that are then just being amplified when expressed as the Hue.
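
A quick illustration of why that happens, using the standard-library `colorsys` module. The RGB values are hypothetical near-grey frame averages, not the actual measurements.

```python
import colorsys


def hue_degrees(r, g, b):
    """Hue in degrees for 8-bit RGB values."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360


# Only the blue channel moves, and only by 2 counts out of 255
print(hue_degrees(120, 121, 119))
print(hue_degrees(120, 121, 121))
```

Hue depends on the ratios of the differences between channels, so when the channels are nearly equal a 2-count change in blue swings the hue by about 90 degrees here.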

desert laurel
#

Hello everyone 👋
I’m currently a BCA student and I want to build my career in Data Science / AI-ML.
I’m a beginner right now and I’m a bit confused about the roadmap. Could anyone please guide me:
What should I start learning first?
Which skills are most important for beginners?
How should I plan my daily study routine?
And what is the best way to practice and build projects?

hard nest
#

In Google Cloud, I have a project billed by slot time; the edition is Standard and I have a max of 400 slots with autoscaling. I want to estimate the cost of automating some queries depending on the frequency (every hour, every 4 hours...). How do you do it?

versed pilot
#

Not sure about estimating slots, but if you do a dry run of the queries you get the gigabytes they process; I thought those kind of convert to $

#

you are not paying flat rate so many $/month for your 400 slots, right?

#

this in cloudshell or any shell with the SDK installed

bq query \
  --use_legacy_sql=false \
  --dry_run \
  'SELECT
     COUNTRY,
     AIRPORT,
     IATA
   FROM
     project_id.dataset.airports
   LIMIT
     1000'

#

Or just paste the sql in the console query editor and it should validate it and show the data that will be processed
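
Since the project is billed by slot time rather than bytes scanned, one rough way to frame the estimate is slot-hours per run times runs per month times the slot-hour rate. The sketch below is arithmetic only: the price and the slot usage are placeholder numbers, not real pricing, and the per-run slot time is assumed to come from a past run of the same query (for example the `total_slot_ms` column of `INFORMATION_SCHEMA.JOBS`).

```python
def monthly_slot_cost(slot_ms_per_run, runs_per_day, price_per_slot_hour, days=30):
    """Rough monthly cost of a scheduled query billed by slot time."""
    slot_hours_per_run = slot_ms_per_run / 1000 / 3600
    return slot_hours_per_run * runs_per_day * days * price_per_slot_hour


# Placeholder numbers: look up the slot-hour rate for your edition and region
PRICE_PER_SLOT_HOUR = 0.04
SLOT_MS_PER_RUN = 90_000_000  # 25 slot-hours, e.g. measured from a previous run

for label, runs_per_day in [("hourly", 24), ("every 4 hours", 6)]:
    cost = monthly_slot_cost(SLOT_MS_PER_RUN, runs_per_day, PRICE_PER_SLOT_HOUR)
    print(f"{label}: ~${cost:.2f}/month")
```

Treat the result as a lower bound: with autoscaling the bill may follow the capacity that was provisioned rather than only the slot time a query consumed.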

glass jetty
#

@prime holly I've deleted your message. If you know it's off-topic, don't post it.

prime holly
#

@glass jetty then where can i post it

#

which topic is it

glass jetty
arctic wedgeBOT
solemn depot
#

Hey everyone. I’ve been practicing strict data cleaning and just finished a project matching exact crypto news publication times to 1-minute market data (Kaggle link: https://www.kaggle.com/datasets/yevheniipylypchuk/bitcoin-news-vs-1m-btc-price-action-2025-26).

The hardest part was standardizing the UTC timestamps and handling the exact T0/T+15m delta calculation. If anyone here has experience building backtesting pipelines or scraping financial news, I’d love a quick roast of the methodology in my notebook. Did I miss any obvious edge cases?
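
For anyone following along, a minimal sketch of the T0/T+15m calculation described above. This is not the notebook's code; it assumes bars are keyed by their UTC minute with a close price, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc_minute(ts_iso):
    """Parse an ISO-8601 timestamp with an offset and floor it to the UTC minute."""
    dt = datetime.fromisoformat(ts_iso)
    if dt.tzinfo is None:
        raise ValueError("naive timestamp: the source timezone must be known")
    return dt.astimezone(timezone.utc).replace(second=0, microsecond=0)


def price_delta(bars, published_iso, horizon_minutes=15):
    """Relative close-to-close move from the T0 bar to the T0+horizon bar.

    `bars` maps UTC minute -> close price; returns None if either bar is missing.
    """
    t0 = to_utc_minute(published_iso)
    t1 = t0 + timedelta(minutes=horizon_minutes)
    if t0 not in bars or t1 not in bars:
        return None  # gap in the market data; don't silently fill it
    return bars[t1] / bars[t0] - 1


start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
bars = {start + timedelta(minutes=i): 100.0 + i for i in range(30)}
print(price_delta(bars, "2025-01-01T14:03:40+02:00"))
```

Edge cases worth checking in any such pipeline: naive timestamps, missing bars, and whether flooring to the publication minute lets the T0 price include information from before the article appeared.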

viscid wigeon
#

Does someone know a good course for practical computer vision?

warm dune
#

rn I'm studying ML Engineering, but I wanted to expand my knowledge to MLOPs, does anyone have a good course?

serene scaffold
warm dune
serene scaffold
warm dune
serene scaffold
#

If you're still training the model you plan to use in an application, you're probably not doing anything related to MLops

hushed light
#

I'm about to jump in to the waters of Machine Learning!!! I have no idea where to start. 🙂 I'm reading the beginner's guide for "Gymnasium" at the moment. I already have a game I've written in C that I want to use for the training. I.e., I want a ML agent to "learn" how to play this game. Presumably the output of this process is some file(s) with data that I can then use to write some kind of AI bot that can play my game, using this generated ML data? Is that the general flow?

serene scaffold
#

The most popular way to make a beginner chess playing bot is to use a heuristic to calculate how favorable one board arrangement is to a given player. Then you consider different possibilities up to n turns ahead and decide how you can get to a better board in the fewest possible turns

#

But this isn't machine learning.
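
The look-ahead search described above can be sketched on a much smaller game than chess. The example below is a depth-limited minimax for a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins); the game and the heuristic are stand-ins chosen for brevity.

```python
def legal_moves(stones):
    return [take for take in (1, 2, 3) if take <= stones]


def heuristic(stones, maximizing):
    """Crude score for a non-terminal position, from the maximizer's point of view."""
    # In this game, having to move with a multiple of 4 stones left is bad.
    to_move_is_losing = stones % 4 == 0
    return (-1 if to_move_is_losing else 1) * (1 if maximizing else -1) * 0.5


def minimax(stones, depth, maximizing):
    if stones == 0:
        # The previous player took the last stone and won
        return -1.0 if maximizing else 1.0
    if depth == 0:
        return heuristic(stones, maximizing)
    scores = [minimax(stones - take, depth - 1, not maximizing) for take in legal_moves(stones)]
    return max(scores) if maximizing else min(scores)


def best_move(stones, depth=4):
    """Pick the move whose resulting position scores best after looking `depth` turns ahead."""
    return max(legal_moves(stones), key=lambda take: minimax(stones - take, depth - 1, False))


print(best_move(10))
```

Nothing here is learned: the quality of play comes from the hand-written heuristic and the search depth.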

#

I don't recommend making a chess bot as your first ML project.

#

Hmm, why did I think you mentioned chess?

#

Is this a turn based game that you wrote? What I said is still applicable to turn based games with fully exposed state.

hushed light
#

It’s a text-based (console) adventure game. It has 40 rooms with ability to navigate between them. There are treasures to find and use and monsters to fight and kill with different strategies. The goal is to escape to the “victory” room and maximize your score along the way.

serene scaffold
#

You want to come up with a way to express game state and the player's decisions in some pure form, so that you can have a sequence of turns to train a model on.

hushed light
#

Yea, I’ve refactored the code into a format that is (presumably) compatible with ML training. I have a reset() function to restore the initial game state. I have perform_action() as the method called by the ML code during the training loop, etc. I have a defined GameState struct, but I have been learning about observation spaces, so I will be populating a struct for that, which is passed from the agent to the game engine and which the game will update based on the agent’s actions.

#

The possible commands are very simple to start with, represented by single characters. I have commands to go in a direction NSEWUD, to Pick up an item, Fight monster or Retreat, etc.
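
As an illustration of that structure, here is a toy environment that mirrors the Gymnasium call shape (`reset()` returning an observation and info, `step()` returning observation, reward, terminated, truncated, info) without importing the library. The corridor of five rooms, the rewards, and the class name are hypothetical stand-ins for the real game.

```python
class TinyAdventureEnv:
    """Toy stand-in for the game: a corridor of rooms 0..4, room 4 is the exit."""

    ACTIONS = "EW"  # single-character commands; a real game would add N, S, U, D, and so on
    MAX_TURNS = 20

    def reset(self, seed=None):
        self.room, self.turns = 0, 0
        return self._observation(), {}

    def step(self, action):
        command = self.ACTIONS[action]  # the agent sends an integer, the game gets a character
        move = 1 if command == "E" else -1
        self.room = min(4, max(0, self.room + move))
        self.turns += 1
        terminated = self.room == 4  # reached the exit
        truncated = self.turns >= self.MAX_TURNS and not terminated
        reward = 10.0 if terminated else -1.0  # small per-turn penalty favours short paths
        return self._observation(), reward, terminated, truncated, {}

    def _observation(self):
        return self.room  # the whole observable state here is the room number


env = TinyAdventureEnv()
obs, info = env.reset()
total = 0.0
for action in [0, 0, 1, 0, 0, 0]:  # E, E, W, E, E, E
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward
    if terminated or truncated:
        break
print(obs, total)
```

Mapping commands to integers is the key design choice: learning code generally works with numeric actions and observations, and the wrapper translates them to whatever the game engine expects.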

hushed light
#

So what exactly is Gymnasium? Is it just a tool for RL or is it for general purpose ML? What are its outputs? And once you have the outputs, what do you do with them?

iron basalt
#

Originally OpenAI, but it was abandonware and a mess. Lots of old papers use it and so to have those still be reproducible it was taken over by the Farama Foundation (forked). It has been heavily improved since then, effectively a full rewrite.

hushed light
#

And what about the outputs and how to use them?

iron basalt
# hushed light And what about the outputs and how to use them?
import gymnasium as gym

# Initialise the environment
env = gym.make("LunarLander-v3", render_mode="human")

# Reset the environment to generate the first observation
observation, info = env.reset(seed=42)
for _ in range(1000):
    # this is where you would insert your policy
    action = env.action_space.sample()

    # step (transition) through the environment with the action
    # receiving the next observation, reward and if the episode has terminated or truncated
    observation, reward, terminated, truncated, info = env.step(action)

    # If the episode has ended then we can reset to start a new episode
    if terminated or truncated:
        observation, info = env.reset()

env.close()
hushed light
#

As I stated, I am going through these tutorials right now. My question is about the end goal of this process. Does Gymnasium create some kind of "model" as its output? And then how would I use this "model" to control the thing I was training it for? Say I train it on how to land the lunar module. And now in my game the human user controls one lander and I want the other lander to be controlled by AI, presumably using the "model" I just trained. How do I use that model in my game?

iron basalt
#

It does what that code snippet does and nothing else.

#

It runs a virtual environment.

#

"How do I use that model in my game?" You give it observations, and it takes actions.

hushed light
#

How do I "bring my own model" when my whole goal is to create a model I don't have yet? If I want to say, create a model that can land the lunar lander successfully. That doesn't exist at first.

hushed light
#

I guess the first thing I need to know is the precise definitions of "model" and "agent."

iron basalt
warped salmon
#

random rant: I see how useful AI is in fields like robotics but then I see how all the big companies are using it for the dumbest, most wasteful shit

#

like...

gilded depot
hushed light
#

Ok, I think the word I am looking for is "Policy." Gymnasium trains to develop a Policy. How do I extract this policy from Gymnasium after I train it? How do I use this Policy in a different application that I write myself?

iron basalt
#

You should already know some calculus and statistics prior to getting into this. Although it's still readable without knowing much of these subjects.

#

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016
View the complete course: http://ocw.mit.edu/6-0002F16
Instructor: John Guttag

Prof. Guttag provides an overview of the course and discusses how we use computational models to understand the world in which we live, in particular he discusses the knapsack problem and g...

iron basalt
# hushed light Ok, I think the word I am looking for is "Policy." Gymnasium trains to develop a...

When you play a game, you use a learned policy to make decisions (hence the term "policy"). You are also using what you experienced while playing to update your policy such that it results in better decisions. The game knows nothing of your policy, or that you are a human playing it, it's just a game. So the game won't give you a policy/agent/model/AI/etc. You bring your own to play the game (which could be yourself, a person). Gymnasium is the game. It's just designed to simulate some game, and provide observations and rewards to the user.

hushed light
#

I understand this. I am making the game. I have structured it such that it is trainable via ML/RL. E.g., I have a reset(), perform_action(), check_game_over(), etc. I have modeled the data in such a way that I have an ObservationSpace struct that I update every game turn. I want to use Gymnasium to "train" on my game. Then I want to capture the Policy created by Gymnasium. I want to export this Policy, however this works. It's a black box to me at the moment. Then I want to implement an agent as part of my own game code that can play the game I wrote and just trained on. From a software perspective I know how to do all this. I just don't know what a Policy export is or what code I need to write in my own app to use it. Presumably it's very similar to the training code in Gymnasium where I start with the initial ObservationState after the first reset(). Then using the Policy data I extracted from Gymnasium, I determine the next action based on the current ObservationSpace. Rinse and repeat.... right?

iron basalt
# hushed light I understand this. I am making the game. I have structured it such that it is tr...

An example of a policy (not a learned one) is to randomly take some action from the action space each frame. That is a simple policy (env.action_space.sample()). A less simple policy would be taking some action based on what was observed. "Policy created by Gymnasium" - Gymnasium does not create policies. You create policies. If you boot up Tetris it does not spit out a policy, that's not its function. But when playing it, you receive information/data that you could use to craft a policy.
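
To make the split concrete, here is a minimal sketch in which the environment only supplies observations and rewards, and a separate learning algorithm produces the policy. Tabular Q-learning is swapped in as one simple choice of algorithm; the toy corridor game, the hyperparameters, and the JSON export format are all assumptions for illustration.

```python
import json
import random

N_ROOMS, EXIT = 5, 4  # toy corridor: rooms 0..4, exit at room 4
ACTIONS = (-1, +1)    # move west / move east


def step(room, action_index):
    """One game turn: returns (next_room, reward, done)."""
    next_room = min(EXIT, max(0, room + ACTIONS[action_index]))
    done = next_room == EXIT
    return next_room, (10.0 if done else -1.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the learned table is the policy's data."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_ROOMS)]
    for _ in range(episodes):
        room = 0
        for _turn in range(50):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = max((0, 1), key=lambda a: q[room][a])  # exploit
            next_room, reward, done = step(room, action)
            target = reward + (0.0 if done else gamma * max(q[next_room]))
            q[room][action] += alpha * (target - q[room][action])
            room = next_room
            if done:
                break
    return q


def policy(q, room):
    """Greedy policy: pick the action with the highest learned value."""
    return max((0, 1), key=lambda a: q[room][a])


# "Export": the policy is just numbers, so any serialisation format works
exported = json.dumps(train())

# "Import" in the game: load the numbers and query them every turn
q_table = json.loads(exported)
room, path = 0, [0]
while room != EXIT and len(path) < 20:
    room, _reward, _done = step(room, policy(q_table, room))
    path.append(room)
print(path)
```

For games with too many states for a table, the table is typically replaced by a neural network and the exported file holds its weights, but the loop stays the same: observation in, action out.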

fickle shale
#

how to perform good at datascience casestudies?

#

sometimes i am not able to think like i have an alzheimer!!

gilded depot
celest sandal
#

hi

hearty hazel
#

Hi o/

south quest
#

Hõla

pearl valve
#

Hey there

south quest
#

Hõla

small sun
#

how good has OCR gotten? i found a decently big data set i'd like to train a net on, but i need to extract the text from about ten hundred images of text on a generally mostly flat background

hearty hazel
#

OCR is still tricky

#

I've worked with Tesseract in Python but it really is a bit hit and miss still

#

A lot of tools are deliberately not colour aware, including that one

#

And will convert images to black and white before trying to read them

#

So you need to be careful with light colours in your text

#

Some processing may be required beforehand
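
A small sketch of why light-coloured text can vanish when an image is reduced to black and white. The fixed mid-grey threshold is an assumption for illustration (real OCR tools usually choose thresholds adaptively), and the pixel values are made up.

```python
def luma(rgb):
    """Greyscale value of an RGB pixel using the ITU-R 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(pixels, threshold=128):
    """Map each RGB pixel to 0 (ink) or 1 (background) by brightness."""
    return [[0 if luma(p) < threshold else 1 for p in row] for row in pixels]


WHITE, BLACK, YELLOW = (255, 255, 255), (0, 0, 0), (255, 230, 0)
row = [[WHITE, BLACK, WHITE, YELLOW, WHITE]]
print(binarize(row))                 # fixed threshold: the yellow "ink" is lost
print(binarize(row, threshold=240))  # raised threshold keeps it
```

Preprocessing in this sense means adjusting contrast or thresholds yourself before handing the image to the OCR engine.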

hollow kernel
#

hello there

#

could anyone help me with processing some images?

#

basically i'm given a set of hundreds of images

#

and i want to convert each of them to a matrix

#

and then to a vector

hearty hazel
#

You need to perform some kind of OCR?

naive swallow
#

Can't you use scipy for that sort of thing?

#
```
scipy.misc.imread()
```
^ returns a numpy array
#

You can also use numpy, apparently:

>>> import numpy
>>> from PIL import Image
>>> numpy.asarray(Image.open('1.jpg').convert('L'))
hollow kernel
#

i'm unsure what OCR is, i'm very very new to machine learning, and python

hearty hazel
#

Text recognition from images

hollow kernel
#

ah, yes that's what i'm trying to do

#

but i don't necessarily need help with that portion at the moment

#

it's the preprocessing of a different testing set that i'm working on

#

that's an example of an image

#

there's 150 '9's, 150 8s, etc down to 0

#

so my goal is to convert that image to a 28x28 image, then to a 28x28 matrix, then to a 784 (28*28) length vector

#

where i can test using my current model

hearty hazel
#

so, you're trying to join these images together into a grid?

hollow kernel
#

that i'm a bit unsure about

#

i understood it more as each image individually

#

this is what i have implemented atm

hearty hazel
#

well I mean, you already can't really crop it to 28x28

#

anyway, if you need to construct or modify images

#

you want pillow

hollow kernel
#

and i'd like to use that model to test it

hollow kernel
#

if you're still around to help and i could point you towards this link

#

that is effectively what i'm aiming to do, to process all the images and try to center the important parts

hearty hazel
#

Machine learning is honestly not my area

#

we're kind of lacking on that department here to be honest

elder otter
#

So I can help with this @hollow kernel

hollow kernel
#

hello

#

that would be lovely lol

elder otter
#

When they mean flatten, they just mean arranging the 28x28 image in a vector/array form

hollow kernel
#

yes

#

so i have 1500 images

elder otter
#

you can use any way you'd like to compress it into the array/vector form

#

as long as it's constant for all images

#

the network itself will identify relevant weights for the image

hollow kernel
#

so it's the compressing into an array/vector form that's giving me trouble right now

#

like i said i'm new to python

#

but i have a folder full of images ranging from test_0001 to test_1500

#

so what i'd like to end up with is a 1500x784 array

elder otter
#

the simplest way to do it is just join the rows of the 28x28 grid - this would work

hollow kernel
#

i think

elder otter
#

ya, you want 1500 long list of 784 length lists

#

if using python

hollow kernel
#

yes

elder otter
#
```
def compress(img):
    # Join the rows of the 2D image into one flat list
    compression = []
    for row in img:
        compression.extend(row)
    return compression
```
#

or idk, probably not called compression

#

but something simple like this is enough

#

it's just that if you're using a basic neural net, especially one that operates on a by pixel basis and doesn't require convolution, it's much simpler to format the inputs in the form of a vector

hollow kernel
#

that's effectively what i'm doing i think...

#

if each 28x28 matrix gets flattened into a 784 length vector

#

so pillow has one called resize

#

so if i were working with that

#
```
    compression = []
    for row in len(img):
        resize.extend(row)
    return compression
```
#

?

#

or is that way off?

elder otter
hollow kernel
#

i was looking at that one but yes

elder otter
#

seems like it might be more like this
```
from PIL import Image

def compress(filepath):
    compression = []
    img = Image.open(filepath)
    img = img.resize((28, 28))
    # getdata() yields the pixels row by row as one flat sequence
    compression.extend(img.getdata())  # not sure about this
    return compression
```

#

tbh i think the best way is just to try it out bc you're working with an Image object, but you want the compression to return as a simple array/vector

#

bc it's all 1s and 0s, and there are no RGB values involved

hearty hazel
#

This is some complicated stuff

#

I assure you I'm taking notes :P

elder otter
#

noo not complicated at all

#

just preprocessing data aha

#

i've never worked with the pillow module, but as long as you can figure out how to resize the image, transform it into a vector you should be good to go to input into the neural net

hollow kernel
#

so that should work

#

but i'm unsure how to do it for the entire data set

#

and i think that's where you put the # not sure about this lol

#
from PIL import Image
import os, sys

path = "/home/joe/Desktop/CSE474/proj3/Test/"
dirs = os.listdir( path )

def resize():
    for item in dirs:
        if os.path.isfile(path+item):
            im = Image.open(path+item)
            f, e = os.path.splitext(path+item)
            # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
            imResize = im.resize((28,28), Image.LANCZOS)
            imResize.save(f + ' resized.png', 'PNG', quality=90)

resize()
#

that's sort of working

elder otter
#

Just for loop across all files

#

For entire dataset

#

The function compress is meant to work for a single image

#

Loop over all images calling compress on each

#

The not sure is bc idk how the pillow image object works

#

this is actually less of a machine learning problem, and more of a how to use python modules problem
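
Putting the pieces above together, a sketch of looping over a folder and building the 1500 x 784 structure. It assumes Pillow is installed and that the files are PNG or JPEG; the function names are hypothetical, and greyscale conversion is an assumption about what the model expects.

```python
import os

from PIL import Image


def image_to_vector(path, size=(28, 28)):
    """Load one image, convert to 8-bit greyscale, resize, and flatten row by row."""
    with Image.open(path) as img:
        small = img.convert("L").resize(size)
        return list(small.tobytes())  # one byte per pixel in mode "L", row-major


def folder_to_matrix(folder, size=(28, 28)):
    """Return (filenames, rows) where rows[i] is the flattened image filenames[i]."""
    names = sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith((".png", ".jpg", ".jpeg"))
    )
    return names, [image_to_vector(os.path.join(folder, name), size) for name in names]
```

Calling `folder_to_matrix` on the test folder would give one 784-element list per image. Sorting the filenames keeps the rows aligned with the labels, and the result can be passed to `numpy.array` if the model wants an array.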

ionic summit
#

hello, are you aware of any python library that centralizes and eases downloading and loading machine learning datasets?

lapis sequoia
#
✨ Level Up!! ✨

Wolfgang just got to Level 1 - Beginner

ionic summit
#

I mean, when you use sklearn, you have access to the "dataset" module for this purpose, but for example with mnist, the function only loads a few examples of the total dataset

modern vapor
#

Hello Everyone, does anybody have a link where i can find weather sensitive product dataset or something similar.

hearty hazel
#

You'll have to explain what you mean by that I think

eternal falcon
#

so i'm just starting out with machine learning and i'm having trouble finding a place to begin so my question is, where do i begin?

#

i have absolutely no education in calculus, my highschool was a joke. i've found that to be a hurdle from what i can tell.

rose quarry
#
#

Im not a master in machine learning, in fact I know literally nothing, but this was a pretty cool intro to it

#

theres also this

foggy moss
#

calculus is helpful

#

but you really just need to understand the ideas of calculus

#

you dont need to learn how to solve a bunch of differential systems

placid river
#

Hey there

undone jackal
#

i dont really believe you need calculus for it, but it certainly helps

charred kite
#

i havent finished them yet but so far theyre very informative

#

usually when it comes to maths 3Blue1Brown is my go to

undone jackal
#

they and numberphile are the best math/number related channels ive seen

#

carykh is pretty great too even though its not pure math

quick willow
#

Does tensorflow not work for python3?

#

3.6 rather

spark nimbus
#

It should

tight dove
#

Hi. Good morning

#

I'm about to download Anaconda for Data Analytics

#

There are two versions available

#

for 2.7 and 3.6 versions of Python

#

Please I need advice on which version I should install

naive swallow
#

3.6

hearty hazel
#

@tight dove #welcome has an FAQ about that

naive swallow
#

no contest :^)

tight dove
#

@hearty hazel , @naive swallow Thanks!

#

I hope to ask and contribute here!

hearty hazel
#

We look forward to having you \o/

quick willow
#

I got it installed with Anaconda

#

Wasn't working with pip

quick willow
#

tensorflow is harder than I imagined GWdarateroLongneckThink

dim beacon
#

@quick willow you may want to use a higher-level abstraction such as TFLearn or Keras, which simplify working with NNs/DNNs using Tensorflow (or even other backends like Theano or Torch)

south quest
#

I used TFLearn a little

#

It's pretty nice

quick willow
#

I'll check it out

spark nimbus
#

Does anyone have any good references for Natural Language Processing?

wild oasis
#

@spark nimbus I'm also trying to get into NLP but specifically into language classification. If that's what you're interested in then I have found some papers

#

I will probably be doing the language identification with N-grams since that seems like the best approach. I'm currently trying to decide on which Python library to be using for this. TensorFlow / TFLearn? NLTK? TextBlob? Something else? Does anybody know which library is the best?

spark nimbus
#

I have never actively worked with machine learning so I wouldn't know

quick willow
#

TFLearn is simply a wrapper for Tensorflow

wild oasis
#

I was researching this for the past 2 hours and it seems that NLTK is the best thing to use for this. Tensorflow etc. is overkill
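
To show how little machinery n-gram language identification needs, here is a dependency-free sketch based on character trigram overlap. It is a simplified stand-in for the rank-based n-gram profiles used in the literature, and the two tiny reference texts are hypothetical; real profiles need far more text per language.

```python
from collections import Counter


def ngrams(text, n=3):
    """Count character n-grams after lowercasing and normalising whitespace."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def similarity(sample, reference):
    """Fraction of the sample's n-grams (with multiplicity) that also occur in the reference."""
    total = sum(sample.values())
    return sum(count for gram, count in sample.items() if gram in reference) / total


def identify(text, references):
    sample = ngrams(text)
    return max(references, key=lambda lang: similarity(sample, references[lang]))


references = {
    "en": ngrams("the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat"),
    "de": ngrams("der schnelle braune fuchs jagt den faulen hund und die katze sitzt auf der matte"),
}
print(identify("the dog and the cat", references))
print(identify("der hund und die katze", references))
```

Libraries such as NLTK mostly help with tokenisation and corpora for building the reference profiles; the comparison itself stays about this simple.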

novel hornet
#

Does anyone have any experience using convolutional nets to read bar graph data?

#

Or know of any papers around the idea?

odd basin
#

why is there such a high demand for ML programmers?

hearty hazel
#

A lot of companies think it's the Next Big Thing

#

in a way, they're not wrong, but I don't think it's as universally useful as they do

lean ledge
#

It definitely is the next big thing for large companies with lots of data

hearty hazel
#

Yeah, but that isn't everyone :P

lean ledge
#

Could definitely apply to everyone though. And definitely could revolutionise not just tech and CS industry, but lots of scientific industries and lots of businesses

odd basin
#

ok

#

so to do machine learning, I found on coursera that you have to understand statistics. how far should i learn that subject?

#

at least the fundamentals?

lean ledge
#

machine learning isnt about programming

#

its more like a field of maths

#

need knowledge of linear algebra, statistics and some calculus

#

be comfortable with those three

odd basin
#

oh

lean ledge
#

Subscribe for more (part 3 will be on backpropagation): http://3b1b.co/subscribe Thanks to everybody supporting on Patreon. https://www.patreon.com/3blue1bro...

What's actually happening to a neural network as it learns? Training data generation + T-shirt at http://3b1b.co/crowdflower Crowdflower does some cool work ...

This one is a bit more symbol heavy, and that's actually the point. The goal here is to represent in somewhat more formal terms the intuition for how backpro...
#

You pretty much need a PhD in ML to be considered for a job/research

odd basin
#

wowww

#

i dont even have a degree

#

haha

lean ledge
#

(thats simply because almost all research fields require a PhD to be taken seriously anyway, and most positions for ML happen to be at big companies that have the resources and the need for ML and want the best people)

#

Same

#

I'm not even an adult yet, I cant imagine spending years more at uni in what feels like a super specific field

odd basin
#

well i guess i should look into other field

#

i want to make money

#

fast

#

you know

lean ledge
#

almost no technical field in STEM makes money fast. finance or something is what you want to be looking at

#

most things that make lots of money require years of commitment or moving into something that isnt the field (like engineering management rather than engineering)

odd basin
#

i'm committed to putting in the years, but i was never a good student

#

yeah to make money and get something we have to pur a lot of dedication

#

put*

lapis sequoia
#

New to machine learning. How do I start? Are there any good tutorials? Appreciate the help. Thank you!

earnest prawn
#

mainly google and docs of the lib you use

novel hornet
#

Read o’reilly’s data science from scratch with python

#

And/or machine learning with scikit learn and tensorflow

#

I think both might be available online

quiet gyro
#

TensorFlow has a great tutorial series

#

Two versions, one for people new to ML, another for people who know the fundamentals of ML already

lapis sequoia
#

Thank you guys!

lean ledge
#

Is anyone else annoyed about the number of people trying to "do ML" by watching tutorials and videos that walk them through basic things, leaving them with no mathematical understanding of what they're doing

#

Everyone's trying to do ML without realising the nature of the field because it sounds cool and is the new hot thing

charred kite
#

i mean id get annoyed at myself for not knowing it, but not others

#

my general view is 'you do you'

#

i quite like learning maths behind these concepts tho, and as such have pursued learning calculus before even touching any form of learning

lean ledge
#

It just annoys me when people try to just jump into super quantitative and large fields without literally any background or research. Got a bunch of people asking how to get into quantum mechanics with high school level maths on the physics server I admined

#

I think part of it is that even if they do learn something basic, it often leads to them pretending they know what they're doing and being overconfident

charred kite
#

i do know what you mean there

#

i sometimes get the opposite effect, other people thinking im a master of some things i do (when i am very much not), which i guess can cause people to do that if it happens to them

ripe vessel
#

Personally, I'd rather jump into something without knowing what I'm doing in order to learn more about what I'm trying to do, and the solution to my "problem". I know it's not the same for everyone, but I can personally see why people without any backing or prior knowledge would try to jump into a topic like machine learning.

lean ledge
#

Intro ML doesn't even have the same requirements as does something like intro QM so it's probably still easier to get into ML. But the fact that people don't even research what it involves and just ask for basic tutorials or YouTube videos on it?? It's a large academic field like any scientific field.

charred kite
#

do you mean like people who are only doing it to make something that looks cool (as a sort of boastful act maybe), rather than to learn about it and get better at it?

#

like those who are looking for the easy way, rather than the proper way

lean ledge
#

Sort of. People that are so ignorant and arrogant that they have no clue what the field they want to study involves and have no idea how little they know already

lapis sequoia
#

Yes agreed @lean ledge. I asked because I work for a health research. we are starting a new project soon where we are going to use machine learning.

lapis sequoia
#

hey guys

#

I have trouble printing results from my classifiers.. I have the code up and running.. I have good accuracy but I'm not sure how to do the confusion matrix

#

and not sure how to print results

#

can someone help?

earnest prawn
#

kinda hard to help people without knowing their code

lapis sequoia
#

ahh yeah

#

hold on

#

here's my code

#

i have the accuracy.. there's no train test split from what I can observe of it.. I basically tried to fork another code and modify it for my purpose.. the accuracy is good but I need to print the results of the models..and stuff

#

i don't know how
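
A dependency-free sketch of building and printing a confusion matrix from true and predicted labels. The labels are hypothetical; with a real classifier, `y_pred` would be the predicted class for each sample of a held-out test set, and `sklearn.metrics.confusion_matrix` does the same job if scikit-learn is available.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def print_matrix(matrix, labels):
    print("true\\pred".ljust(10) + "".join(str(label).rjust(6) for label in labels))
    for label, row in zip(labels, matrix):
        print(str(label).ljust(10) + "".join(str(value).rjust(6) for value in row))


y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels = ["bird", "cat", "dog"]
matrix = confusion_matrix(y_true, y_pred, labels)
print_matrix(matrix, labels)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
```

Without a train/test split, the accuracy and the matrix only describe how well the model memorised its training data, so holding out some data first is worth the extra step.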

earnest prawn
#

uuh never worked with keras, dont think i can help you. my only advice would be looking up the docs, and maybe doing some dir() if you don't find anything in the docs, sry
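
A minimal sketch of printing a confusion matrix, assuming you already have the true labels and the predicted class labels as plain lists (for a Keras classifier the predictions would typically come from taking the argmax of the predicted probabilities). The label values and the helper name below are made up for illustration, since the code in question was not shown. scikit-learn's `sklearn.metrics.confusion_matrix` computes the same table if you prefer a library call.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels; replace with your held-out labels and model predictions.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for row in cm:
    print(row)
print("accuracy:", accuracy)
```

Note that the numbers only mean much if `y_true` and `y_pred` come from data the model was not trained on, so a train/test split is worth adding first.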

lapis sequoia
#

oki

#

Philosophical question, do you think data science vs web development has bigger potential to benefit humanity in the long run and why? :x

#

depends on how we use them..

#

as with anything else..

dusky agate
#

it's a very broad question.

lapis sequoia
#

so is the data science hype going to fade or explode out of proportions?

#

it's not a hype.. it'll be a way of life

#

it's not replacing web development..

earnest prawn
#

the hype wont fade its just ... like tron says actually

lapis sequoia
#

there's no measure of comparison.. it's apples and peanuts

#

fineee

foggy moss
#

h u h

#

thats a weird question

#

data science is still in infancy

#

and yet is already critical to so many things

#

in a few years your LG refrigerator will have more data science in it than all the data science in the DoD today

#

i recommend the first couple chapters of the Undoing Project by Michael Lewis

#

and someday

#

i hope in my lifetime

#

ML can give birth to "auto brightness" on a phone

#

that actually works

austere slate
#

+1 @foggy moss hahahaha sooo true

odd basin
#

hey guys

#

I dont want to learn R

#

Is there any book like Statistics with python

weak kiln
#

he said he didn't want to learn R :D

naive swallow
#

...

#

There's a good edX course for it

#

you can audit the course for free

tight dove
#

Hi guys

#

Please where can I learn Churn Prediction?

lapis sequoia
tight dove
#

@lapis sequoia thanks man

hasty maple
#

@lean ledge "Sort. People that are so ignorant and arrogant enough that they have no clue what the field they want to study involves and have no idea how little they know already" Why do you think knowing the math is so important for using an ML model? I know what gradient descent does: take the partial derivatives (which give the slope), and use that directional information to step downhill towards the minimum of the cost function. Even if I can't work out the actual partial derivatives, I am satisfied with this understanding. I form similar abstractions of other concepts, understand them, piece them together and apply the algorithms to my problem statements. Is this also considered ignorant/arrogant in your view?
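
The gradient descent idea described above can be sketched in a few lines. This is only an illustration on a made-up one-dimensional cost function, `(x - 3)^2`; the function name, learning rate and step count are assumptions, not anything from a particular library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, i.e. downhill on the cost."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Hypothetical cost: cost(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
# The minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

The derivative tells you which way is uphill, so the update subtracts it. A learning rate that is too large makes the steps overshoot instead of settling at the minimum.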

loud crypt
#

whats machine learning?

hearty hazel
tepid pagoda
young blaze
#

Hi, guys! Does anybody know a good machine learning course?

#

@earnest prawn are you Dutch by chance? 😄

earnest prawn
#

German

young blaze
#

oh, lol

#

Niemand also means nobody in Dutch

earnest prawn
#

I know

#

Already caused some confusion because of that

young blaze
#

Do you know any machine learning?

earnest prawn
#

Barely

#

Fun story

#

I actually listened to a two hour presentation of a Dutch professor about machine learning, he was speaking """"""German""""""""

young blaze
#

They're almost the same anyway XD

tepid pagoda
#

i know a little bit but only in german xD

earnest prawn
#

Dutch is raped German

young blaze
#

it's actually more the other way around

earnest prawn
#

No

#

German is older

young blaze
#

German sounds like an angry Dutchman

earnest prawn
#

I looked that up during a discussion with another Dutch guy

#

German was there before Dutch so Dutch is raped German

young blaze
#

goddamit

#

anyway, do you know some good resources?

#

I'm having a hard time finding them

#

What's even worse is that all resources are in English.

earnest prawn
#

Personally not but I am quite sure if you go to search and enter sth like
in: #data-science-and-ml has: link
You will find stuff

young blaze
#

okay, thanks!

hasty maple
#

@young blaze You could try andrew ng's basic ML course to get started in ML

young blaze
#

this one?

hasty maple
#

Yes

vast oasis
#

I have some ML course material for python from my uni

#

Maybe I can share that

#

If anyone would be interested.

hearty hazel
#

I'm sure some people would, we're sorely lacking on ML resources here

hasty maple
hearty hazel
naive swallow
#

Thanks for the resource

#

This channel's pretty dead

hasty maple
#

https://www.youtube.com/watch?v=yDLKJtOVx5c&list=PLD0F06AA0D2E8FFBA 15.1, 15.2 in this playlist helped me understand 2nd order optimizers. I didn't have time for seeing other videos but based on personal experience of 15.1 and 15.2 I assume the rest would be good as well.

#

Well there are dedicated ML discord servers, so I presume most discord ML conversations go there and fewer people visit the python server for ML specifics. It's usually those who are reasonably good with basic python coding that jump into ML, so it seems normal that they don't visit the python server for ML, and hence the ML resources on the python server are kinda sparse. If that makes sense 😅

vast oasis
#

As soon as I'm done studying I will put all the stuff on my git.

#

Is there a way to set a reminder for myself here?

earnest prawn
#

nop

#

maybe implement a feature for the bot 😄

vast oasis
#

Can I set a reminder to remind me to build a remind feature for the bot?

earnest prawn
#

u could open an issue on github

vast oasis
#

😉

#

You mean InfoBot?

earnest prawn
#

no

#

self.help()

arctic wedgeBOT
#
class Bot:
    bot.info()        # Get information about the bot
class ​NoCategory:
    bot.help          # Shows this message.

# Type self.help() command for more info on a command.
# You can also type self.help() category for more info on a category.
earnest prawn
#

this one

vast oasis
arctic wedgeBOT
#

A utility bot designed just for the Python server! Try bot.help() for more info.

Total Users

1727

Git SHA

47136cf

earnest prawn
#

no bot commands here

#

just for showing the bot

vast oasis
#

Please bot give me your github link.

#

😄

earnest prawn
#

oh go to github search for discord-python

vast oasis
#

👍

naive swallow
#

bot.help()

arctic wedgeBOT
#
class Bot:
    bot.info()        # Get information about the bot
class Deployment:
    bot.deploy_site() # Trigger website deployment on the server - will only ...
    bot.redeploy()    # Trigger bot deployment on the server - will only rede...
    bot.uptimes()     # Check the various deployment uptimes for each service
class ​NoCategory:
    bot.help          # Shows this message.

# Type bot.help() command for more info on a command.
# You can also type bot.help() category for more info on a category.
naive swallow
#

pls

earnest prawn
#

its the user owning the bot

naive swallow
#

bot.help > self.help

arctic wedgeBOT
#

No command called ">" found.

void depot
#

shouldn't bot.help be bot.help() if it's a function

earnest prawn
#

zis one

void depot
#

bot.help

arctic wedgeBOT
#
class Bot:
    bot.info()        # Get information about the bot
class ​NoCategory:
    bot.help          # Shows this message.

# Type bot.help command for more info on a command.
# You can also type bot.help category for more info on a category.
earnest prawn
void depot
#

oh

#

uhhh

naive swallow
#

bot.help()

arctic wedgeBOT
#
class Bot:
    bot.info()        # Get information about the bot
class Deployment:
    bot.deploy_site() # Trigger website deployment on the server - will only ...
    bot.redeploy()    # Trigger bot deployment on the server - will only rede...
    bot.uptimes()     # Check the various deployment uptimes for each service
class ​NoCategory:
    bot.help          # Shows this message.

# Type bot.help() command for more info on a command.
# You can also type bot.help() category for more info on a category.
naive swallow
#

you can do both

earnest prawn
#

nooo

vast oasis
#

Yeah, got it.

void depot
#

those are two different commands

earnest prawn
#

no bot commands

void depot
#

they do two different things

naive swallow
#

now that's weird

vast oasis
#

Nobody said no bot commands

naive swallow
#

#bot-commands

vast oasis
#

Yeah, I was just kidding. "Niemand" is German for "nobody"...

hearty hazel
#

@void depot They're the same command.

earnest prawn
#

yesss

hearty hazel
#

Aperture just has access to more stuff than you

void depot
#

ohh ok

earnest prawn
#

finally someone noticing its german and not saying its dutch

#

woop

#

thanks @vast oasis

hearty hazel
#

Your name is Dutch? I knew that.

#

:>

earnest prawn
#

no

vast oasis
#

Haha

earnest prawn
#

its german

#

f*** u gdude

hearty hazel
#

Haha

vast oasis
#

trolled

earnest prawn
#

does f*** u gude count as swearing?

naive swallow
hasty maple
#

inb4 ml is spam land 😂

spring flicker
#

ok so i have some questions. if anyone can give me a hand...< noob

#

when i download a github repository, i can never seem to figure out what's what, as in: which file is the ai? also when would i have use for a trainer file? wouldn't i use the ai itself during training?

#

i can't seem to find any tutorials that walk you through navigating a large project.

lean ledge
#

Look at the file you're supposed to run. Read it

spring flicker
#

they don't specify which file that would be.

dusky agate
#

is there a readme.md?

lean ledge
#

Well, look for a file with the same name as the project, or something along the lines of "main" or something

spring flicker
#

i go through them one by one reading, and for the most part i can say ok this is building a database or this is specifying the parameters of a training environment

#

yeah

#

so i think cool, watch the video. but the video is him writing a script that isn't in the repository, and he makes no mention of it.

#

i'm not asking hey what is this guy talking about.

charred kite
#

the layout may have been guidelines set out by the creators of whatever training/ai module thing he used

spring flicker
#

i'm asking is there a resource for learning how this stuff works

charred kite
#

i'd suggest taking a look at those

spring flicker
#

this is what i was talking about with siraj's videos. @south quest

earnest prawn
#

you will (sadly) rarely find help about ml here

lapis sequoia
#

oh okay

earnest prawn
#

and no need to delete stuff

south quest
#

there are various modules for markov chain generation

lapis sequoia
#

I thought the question didn't belong here

#

so I deleted it

south quest
lapis sequoia
#

thanks

loud crypt
#

Theoretically how would a machine learn?

lapis sequoia
#

pattern recognition is one example @loud crypt

dim beacon
#

@loud crypt cost function minimization, reinforcement methods, backpropagation, clustering, etc.

#

For instance, the mathematical models behind neural networks are loosely inspired by real biological neural networks, which means that, to a limited extent, artificial NNs learn in a similar way to us

loud crypt
#

so you would need to study how the brain functions to make a model of it @dim beacon

dim beacon
#

@loud crypt we have known how neurons work for decades

loud crypt
#

i mean like learn from books

dim beacon
#

You do not need to know the biological details to understand how neural networks work; simple artificial NN models are mathematically very explicit about it

#

However if biology interests you, do not keep yourself from reading about it anyway 👍

hasty maple
#

optimize a cost function, that's how they work

dim beacon
#

@hasty maple indeed, but this is true for every ML model. What I meant by "how NNs work" is how this cost function is optimized, which is by backpropagation (sometimes with some more specific details)

hasty maple
#

Ah, I was just giving a simpler shorter answer.

#

It wasn't in relation to your example.

thick siren
#

I'm wondering how to make a tensorflow neural network with two inputs, two hidden layers and one output

#

using relu as the activation function

dusky agate
#

have you read the TF docs?

hasty maple
#

It's coursera's andrew ng's course tensorflow basics code, maybe go through this to understand how to use tf

thick siren
#

Okay thanks @hasty maple

#

Will do

deft harbor
#

Enjoying that link, thanks

spare arch
#

hello

spark nimbus
#

So I was working on a neural net I probably took from somewhere

#

and I'm having issues setting it up

#

if someone's able to help me out, contact me either here or in DM

stone oasis
#

@spark nimbus what api? tensorflow? what o/s? using cuda?

spark nimbus
#

none of that

#

just own implementation

#

nothing too big

#

just some numpy size issues

lapis sequoia
#

say

#

this stuff looks cool

earnest prawn
#

this stuff looks cool

lapis sequoia
#

how do i implement machine learning in my program

earnest prawn
#

thats

#

a very broad question

#

considering that

lapis sequoia
#

are you an expert at machine learning

#

or AI

earnest prawn
#

machine learning can have dozens of use cases

#

and there are dozens of libs

#

and no im not

lapis sequoia
#

oh ok

earnest prawn
#

there is (not sure) no one on this server who is

lapis sequoia
#

like whats the first project you'd assign me

earnest prawn
#

but i guess you can ask stuff anyways

lapis sequoia
#

that involves machine learning

earnest prawn
#

oh

#

classifier

#

for numbers

#

to be more accurate

lapis sequoia
#

what does it do?

earnest prawn
#

classify numbers in a fixed size black white picture

#

well

lapis sequoia
#

ooooh

#

where do i start

earnest prawn
#

it gets a picture and gets the number out of that

#

its like the hello world of ml
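
A minimal sketch of that "hello world", assuming scikit-learn is installed. It uses the small 8x8 grayscale digits dataset bundled with scikit-learn rather than any particular picture format, and logistic regression is just one reasonable choice of classifier here, not the only one.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1797 images of handwritten digits, each flattened to 64 pixel values.
digits = load_digits()

# Hold out a quarter of the images to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# Predict the digit shown in the first held-out image.
print("predicted:", clf.predict(X_test[:1])[0], "actual:", y_test[0])
```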

lapis sequoia
#

i see

earnest prawn
#

well you do start by choosing a lib

#

there are.... a LOT

#

pytorch
tensorflow
scikit learn
and and and

#

but those are the most popular

lapis sequoia
#

oh okay

#

what do you recommend?

earnest prawn
#

oh

#

i dunno

#

i heard tf is more diy scikit is more highlevel and nothing about pytorch

#

tf is by google and now open source
pytorch by facebook and now opensource
and scikit learn well free software from the first day

#

i couldnt tell you much more without doing some research tbh

lapis sequoia
#

welll

#

what do i do first

#

oh

#

have u done it before?

earnest prawn
#

i have done the number detector in scikit

#

for reasons i dont remember anymore

lapis sequoia
#

oh

#

can u show me how to do et

earnest prawn
#

but most popular ai stuff you hear about is tf

lapis sequoia
#

ah okay

#

well whatever impresses my computer science teacher

earnest prawn
#

and two things
no i cant thats not the point of helping
there is a tutorial on their website

lapis sequoia
#

i topped my first semester class

#

oh ok

#

can u link me to it?

lapis sequoia
#

tbh

#

i want to be a programmer

#

yet an artist

earnest prawn
#

but?

lapis sequoia
#

yet a musician

earnest prawn
#

well you are lucky

#

machine learning and a few other programming topics mix that

lapis sequoia
#

awww yeeee

#

problem is

#

im shit at being an artist

#

and i never made music b4

#

well i made this small melody on fl studio but thats it lol

#

like this is the best i could do

#

lol

earnest prawn
#

better than i could ever do

#

but for example there is an ai which can make a van gogh out of a picture

#

or

#

make a bach out of bach examples

lapis sequoia
#

yeah

#

ive seen and heard

#

holy fuck its complicated

earnest prawn
#

the scikit thingy?

lapis sequoia
#

the link u sent me

earnest prawn
#

well

#

that is the high level stuff

#

the real machine learning and science stuff is in the functions which get called there

lapis sequoia
#

oh

#

well is there anything lower leveled

earnest prawn
#

tensorflow

lapis sequoia
#

ok

hasty maple
#

@spark nimbus The shapes of a and y are not same, numpy broadcasting is the issue, reshape y to y =y.reshape(y.shape[0],1) before delta = a-y and should work I think
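
A small illustration of the broadcasting problem described above. The shapes are hypothetical (the real code was not shown): `a` is assumed to be a column of network outputs and `y` a flat array of labels.

```python
import numpy as np

a = np.zeros((4, 1))         # network output as a column, shape (4, 1)
y = np.array([0, 1, 1, 0])   # labels as a flat array, shape (4,)

# (4, 1) minus (4,) silently broadcasts to a (4, 4) matrix.
bad = a - y

# Reshaping y to a column makes the subtraction element-wise again.
y_col = y.reshape(y.shape[0], 1)
delta = a - y_col

print(bad.shape, delta.shape)
```

No error is raised in the broken case, which is why this kind of bug tends to show up later as a size mismatch somewhere else.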

spark nimbus
#

It worked fine previously though

#

@hasty maple only after repurposing it did it break

hasty maple
#

repurposing?

spark nimbus
#

It used to be written number recognition

#

Of 16x16 images with grey tints only

hasty maple
#

and what is it now?

spark nimbus
#

Emotional recognition in sentences

#

Maybe I just screwed up the data format

#

Hold on

#

fml now I have to un-gzip un-pickle my data

hasty maple
#

Well I don't think there is an easy way to repurpose an image classifier for sentiment analysis

spark nimbus
#

You wanna help reimplementing it?

hasty maple
#

I haven't done any NLP work before, so I don't think I would be of much help 😅

#

How big is the data set? if it's small enough I'll download it and when I eventually get to NLP we could compare our implementations :)

spark nimbus
#

It's not too big

#

If empty, use "none"

#

Max input size should be changeable

#

And the amount of neurons should be [input_size, 3/4 input_size, 1/2 input_size, output_size (7)]
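
The layer sizes described above could be computed like this. The function name is made up, and rounding down when the input size does not divide evenly is an assumption, since the message does not say how to round.

```python
def layer_sizes(input_size, output_size=7):
    """Neurons per layer: input, 3/4 of input, 1/2 of input, output."""
    return [
        input_size,
        (3 * input_size) // 4,  # assumption: round down
        input_size // 2,        # assumption: round down
        output_size,
    ]


print(layer_sizes(40))
```

Keeping this in one function means the maximum input size stays changeable in a single place.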

hasty maple
#

most is empty though, how would the training go when you don't have enough labels 🤔

spark nimbus
#

this is all the data the project lead gave me

spark nimbus
#

once this works we'll add more data

narrow flare
#

can someone give me a brief outline on what machine learning algorithms are?

dusky agate
#

There's a bajillion youtube videos about this that can explain better than most people on this server

spark nimbus
#

This is the best guide I've seen so far

narrow flare
#

Magikarp ur right x) im just being a poo head as usual

serene pier
#

does anyone have any experience with online learning?

#

basically retraining models with new data constantly or periodically

thick siren
spark nimbus
#

This is fine

#

I like how, despite all weights being random, they all still look the same in each row

hasty maple
#

They just show the fully connected layers, not the actual weights

spark nimbus
#

Yeah I know they don't show the weights

#

But the value of each neuron in each layer has such a low diff

lapis sequoia
#

any of you guys happen to know good algorithms for image upscaling, preferably i want to get something like this

#

not sure how this was generated though

hearty hazel
#

Have you looked at waifu2x? It's an interesting image upscaler

spark nimbus
#

why is backpropagation so hard aaaa

lapis sequoia
#

i have, it's quite decent but it's not good at upscaling real-to-life images @hearty hazel

#

this looks like some sort of texture remapping

#

i forget what they call it specifically

#

@spark nimbus it's just math™

thick siren
#

@spark nimbus feels bad man

hasty maple
spark nimbus
#

I know the math

#

just not how to implement

hasty maple
#

oh

hasty maple
#

why not just use an opensource library?

#

that only requires forward propagation,it will do the back prop for you

spark nimbus
#

I don't understand shit about the terms they use

spark nimbus
#

Managed to implement these

#

Just the bias left

spark nimbus
#

@hasty maple for delta_{n+1}, should I take the sum of all the weight deltas for this neuron, or the average, or what?

hasty maple
#

You first calculate delta_N, the final layer's delta, given by (y_N - t) * derivative of the activation function for that layer
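
A sketch of how those deltas fit together in standard backpropagation, assuming a squared-error loss and sigmoid activations. The tiny 3-4-2 network, the variable names and the random values are all made up for illustration. Each hidden delta is the next layer's delta pushed back through the weights (a weighted sum, not an average), times the activation derivative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


rng = np.random.default_rng(0)

# Hypothetical tiny network: 3 inputs -> 4 hidden neurons -> 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(2, 4)), np.zeros((2, 1))
x = rng.normal(size=(3, 1))    # one input example as a column
t = np.array([[1.0], [0.0]])   # its target output

# Forward pass, keeping the pre-activations z for the backward pass.
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
y = sigmoid(z2)

# Final layer delta: (output - target) * activation derivative.
delta2 = (y - t) * sigmoid_prime(z2)
# Hidden layer delta: weighted sum of the next layer's deltas.
delta1 = (W2.T @ delta2) * sigmoid_prime(z1)

# Gradients of the loss 0.5 * sum((y - t)^2) for weights and biases.
grad_W2, grad_b2 = delta2 @ a1.T, delta2
grad_W1, grad_b1 = delta1 @ x.T, delta1

print(grad_W1.shape, grad_b1.shape, grad_W2.shape, grad_b2.shape)
```

The bias gradient is just the delta itself, because the bias enters each pre-activation with a coefficient of 1.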

#

@spark nimbus

spark nimbus
#

@hasty maple can you look into my code for a bit?

thick siren
#

Is there any gui tool you can use to make functional neural networks?

#

I can't figure out tensorflow

hasty maple
#

@thick siren maybe try keras

#

It's not gui but at a higher abstraction level than tensorflow

thick siren
#

ahhh it's so hard

tawdry dock
#

hello