#data-science-and-ml | Python | Page 149

river cape Oct 16, 2024, 5:44 PM

#

Btw do we as ML engineers require DSA?

wooden sail Oct 16, 2024, 5:45 PM

#

yes, among other things. a really important skill is being able to determine which problems merit using deep learning in the first place, because many (most, even) don't

#

DSA gets you familiar with common problems that already have good solutions, vs those that don't

#

you generally wanna have good familiarity with classical problem-solving and optimization methods, which includes DSA and more

charred egret Oct 16, 2024, 5:46 PM

#

Yeah like using NLP when a simple regex will suffice

desert oar Oct 16, 2024, 5:46 PM

#

for engineering specifically you might also actually want to know about things like space-time tradeoffs and avoiding accidentally quadratic/exponential algorithms
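
A minimal sketch of what "accidentally quadratic" can look like, assuming an order-preserving de-duplication task (the function names are hypothetical, not from anyone's production code). The only difference between the two versions is the container used for membership checks:

```python
def dedupe_quadratic(items):
    """Order-preserving de-duplication; `x not in seen` scans a list, so O(N^2) overall."""
    seen = []
    for x in items:
        if x not in seen:  # linear scan on every iteration
            seen.append(x)
    return seen


def dedupe_linear(items):
    """Same result, but membership checks hit a set, so roughly O(N) overall."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:  # average constant-time lookup
            seen.add(x)
            out.append(x)
    return out


data = [3, 1, 3, 2, 1]
print(dedupe_quadratic(data))
print(dedupe_linear(data))
```

The set version spends extra memory to save time, which is the kind of space-time tradeoff mentioned above.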

wooden sail Oct 16, 2024, 5:47 PM

#

it also helps you detect when your deep learning algo is shit

river cape Oct 16, 2024, 5:48 PM

#

Ahhh man need to learn that also now

desert oar Oct 16, 2024, 5:50 PM

#

example: finding the maximum in a list is O(N) but sorting a list is usually something like O(N log(N)), and it turned out that we were able to actually make our database queries run faster by switching from 1 = row_number() over (partition by i order by y) to i = max(i) over (partition by i) using that knowledge
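
The same idea in plain Python, as a rough sketch on synthetic data (this is not the database query itself): sorting does O(N log N) work to answer a question that a single O(N) pass can answer.

```python
import random

values = [random.random() for _ in range(100_000)]

# O(N log N): sorts everything just to read one element.
largest_via_sort = sorted(values)[-1]

# O(N): a single pass over the data.
largest_via_max = max(values)

print(largest_via_sort == largest_via_max)
```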

#

I also had some code running in production that was accidentally quadratic, we had to switch from using Pandas vectorized operations to a hand-written for loop

#

it also comes up in interviews

river cape Oct 16, 2024, 6:14 PM

#

should I increase the complexity?

cerulean kayak Oct 16, 2024, 7:37 PM

#

For anyone else using Google Colab, namely for image processing and making image based CNN models:

Under the "Change runtime type", do you guys use:

CPU
T4 GPU
TPU v2-8
please at me if you respond.

serene scaffold Oct 16, 2024, 9:02 PM

#

cerulean kayak For anyone else using Google Colab, namely for image processing and making image...

if you're writing code that uses CUDA, I would just pick T4 GPU.

nova matrix Oct 16, 2024, 9:18 PM

#

cerulean kayak For anyone else using Google Colab, namely for image processing and making image...

T4 works fine for most of the computer vision tasks for me

faint quail Oct 16, 2024, 9:32 PM

#

https://youtu.be/uV8g4THpF6g

YouTube

lol man

Rainbow Six Siege Soft Aim Assist AI

#

made this using my own custom neural network library

cerulean kayak Oct 16, 2024, 10:04 PM

#

faint quail made this using my own custom neural network library

In freaking Python?

faint quail Oct 17, 2024, 12:07 AM

#

cerulean kayak In freaking Python?

yeah most computer vision libraries are built in python

mint plume Oct 17, 2024, 12:45 AM

#

I'm doing a project right now and the issue is that I have two datasets that are related but have different granularities. One has data collected every hour and the other every 3 hours. I want to combine them to get a good design matrix but I'm not really sure what to do here.
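
One possible way to line the two series up, sketched with the standard library only. It assumes the 3-hour timestamps fall on 00:00, 03:00, and so on, and that averaging is an acceptable way to aggregate the hourly values; both are assumptions, and all names and numbers below are hypothetical. In pandas, `resample` covers the same ground.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_to_3h(hourly):
    """Average (timestamp, value) pairs into 3-hour buckets starting at 00:00, 03:00, ..."""
    buckets = defaultdict(list)
    for ts, value in hourly:
        start = ts.replace(hour=ts.hour - ts.hour % 3, minute=0, second=0, microsecond=0)
        buckets[start].append(value)
    return {start: mean(vals) for start, vals in buckets.items()}


def join_on_timestamp(a, b):
    """Inner-join two {timestamp: value} dicts into rows of (timestamp, a_value, b_value)."""
    return [(ts, a[ts], b[ts]) for ts in sorted(a.keys() & b.keys())]


t0 = datetime(2024, 10, 17)
hourly = [(t0 + timedelta(hours=h), float(h)) for h in range(6)]  # hypothetical 1-hour series
three_hourly = {t0: 10.0, t0 + timedelta(hours=3): 20.0}          # hypothetical 3-hour series

rows = join_on_timestamp(downsample_to_3h(hourly), three_hourly)
print(rows)
```

Going the other way (repeating or interpolating the 3-hour values onto the hourly grid) keeps more rows but invents values, so which direction fits depends on the model.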

faint quail Oct 17, 2024, 12:52 AM

#

mint plume I'm doing a project right now and the issue here is that I have two datasets t...

data augmentation?

signal escarp Oct 17, 2024, 12:53 AM

#

Hello guys, I have a project about reinforcement learning and supervised learning with a sort of zigzag algorithm that extracts top and bottom points for trading, but I have some issues with it. For supervised learning, it overfits too much: it goes from 10k to millions on training data but performs very badly on any other testing data. I used some dropout methods and tried increasing the batch size from 256 to 512 and even 1024, but it still has the same problem (I will try running 4096 for a few days, but I don't think that will solve it). I know that I have to use this model as the base policy (at least it will help a lot on train data for RL) and run it in the RL model, but the RL has some problems that I haven't found, and it is not learning at all. I debugged the rewards in a well-performing session and noticed that even though everything is high and positive, the reward is generally around -4 to -7, and it's similar in a badly performing session. The problem could be that, but I don't actually know. So I'm searching for someone who can help me with it so we can solve these issues together. If we can fix them we will have a great performing model; personally I'm planning to use it for automated investing, and I think it would be great for both sides. So, where can I find someone to fix the issues with, or is anyone interested?

mint plume Oct 17, 2024, 12:54 AM

#

faint quail data augmentation?

Not a bad idea, how would I implement this though?

#

The closest thing I've done is imputation.

faint quail Oct 17, 2024, 1:00 AM

#

signal escarp Hello guys, I have a project about the reinforcement learning and supervised lea...

This is a very outside look but to me the stock market seems like something incredibly random that you can't really easily predict, especially not without enterprise grade hardware to mine data and train giant models. I think it's not that the model is underperforming, I think the task of predicting the stock market is either impossible or at the very best needs extremely powerful hardware

#

It would be like trying to tell a model to predict the output of a hashing algorithm such as SHA256, technically it's possible since it's deterministic but in practice it's not possible

#

idk tho I don't know a lot about finance

signal escarp Oct 17, 2024, 1:07 AM

#

faint quail This is a very outside look but to me the stock market seems like something incr...

Generally the market is not that random, and I believe it could be solved at a level where a machine can understand complex patterns. I'm not trying to build a model that would perform well in the market on my computer; first I have to get an algorithm that performs well on small data (it should perform well on any kind of similar data), and after that I will use strong servers like 8x H200, and I believe it will end in a good model if we have a great training algorithm and enough NN complexity

cerulean kayak Oct 17, 2024, 1:11 AM

#

faint quail yeah most computer vision libraries are built in python

no but you were able to mod Rainbow 6 with Python?
Because I would've thought that Python was too slow for making videogames especially fast paced ones like fpses.

faint quail Oct 17, 2024, 2:26 AM

#

cerulean kayak no but you were able to mod Rainbow 6 with Python? Because I would've thought th...

I didnt mod the game I just draw circles and rectangles to the screen using the windows GDI api

untold fable Oct 17, 2024, 3:44 AM

#

what is the difference between standardization and normalization?

quaint mulch Oct 17, 2024, 4:45 AM

#

what are you into?

onyx frigate Oct 17, 2024, 5:36 AM

#

faint quail https://youtu.be/uV8g4THpF6g

I saw it it's awesome

marsh marsh Oct 17, 2024, 6:58 AM

#

kaggle datasets list -s retail-orders
^
SyntaxError: invalid syntax

can anyone help me with this ??

austere swift Oct 17, 2024, 6:58 AM

#

that's a command for your terminal, not python

#

so command prompt if you're on windows or whatever shell/terminal you use if you're on linux
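
If the command has to be launched from Python code rather than typed into a terminal, one option is to hand it to the shell through `subprocess`, sketched below. In a Jupyter cell, prefixing the line with `!` has a similar effect. The sketch assumes the `kaggle` CLI is installed and configured, and only runs it when it is found on PATH.

```python
import shlex
import shutil
import subprocess

COMMAND = "kaggle datasets list -s retail-orders"
ARGS = shlex.split(COMMAND)  # split the shell-style string into an argument list


def run_cli(args):
    """Run a command-line tool and return its combined text output."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout or result.stderr


if __name__ == "__main__":
    # Only attempt the call if the kaggle CLI is actually available.
    if shutil.which(ARGS[0]) is not None:
        print(run_cli(ARGS))
    else:
        print("kaggle CLI not found on PATH")
```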

marsh marsh Oct 17, 2024, 7:00 AM

#

ahh no, I was using the kaggle api to download a dataset in my Jupyter notebook extension in VS Code

#

but it's solved, thanks for the information tho

onyx frigate Oct 17, 2024, 7:45 AM

#

I have a question: how is it that Mixtral outperforms Llama 2 and GPT-3.5 in some benchmarks even though it has 56 billion parameters, whereas Llama and GPT have trillions??

grand breach Oct 17, 2024, 7:52 AM

#

what can i infer from this information:

if i've found out the mean ctr of entire dataset and tried finding out which categories from a categorical feature had ctr above the mean ctr of dataset

do those categories play an important role for the model to learn? what if a feature has only 1 or 2 categories that have ctr more than the mean? do they have any impact on what patterns my model learns?

hearty token Oct 17, 2024, 8:33 AM

#

I have an exam question like this. I would like to segment each question into its own snippet, sort of like taking a screenshot of each question as a separate image. How can I go about automating this with A.I.? I have thought about using edge detection of some sort

vestal spruce Oct 17, 2024, 10:25 AM

#

hearty token I have an exam question like this. I would like to segment each question into it...

My approach would be to use the question number as an anchor to determine the height of each question and segment it vertically, and then figure out the fixed width of each question horizontally. So the current problem is figuring out how to extract the question number without false positives from numbers inside the question text.

trim forum Oct 17, 2024, 10:27 AM

#

basically I would use Openai api

#

its fire

vestal spruce Oct 17, 2024, 10:29 AM

#

@hearty token
Assuming the width of every question in the test is the same, you could start by finding the fixed width first to get the dead zone for number detection: if a number is inside the dead zone, which is the width range, we can exclude it, so only the numbers to the left of the width range will be detected.

#

from there we can pad the coordinate of each question number vertically (basically shifting upward), and then take the vertical distance between consecutive question numbers, on the condition that both question numbers sit at roughly the same x coordinate (horizontally)
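
The anchoring idea above can be sketched roughly as follows. It assumes some detector (OCR or otherwise) has already returned the top-left `(x, y)` position of every number on the page; the function name, the `margin_right` threshold and the `pad` value are all hypothetical.

```python
def question_bands(number_boxes, margin_right, page_height, pad=10):
    """Turn detected question-number positions into vertical crop bands.

    number_boxes: (x, y) top-left corners of every detected number.
    margin_right: numbers with x >= margin_right lie inside the question
                  text (the "dead zone") and are ignored.
    Returns a list of (top, bottom) pixel rows, one per question.
    """
    anchors = sorted(y for x, y in number_boxes if x < margin_right)
    bands = []
    for i, y in enumerate(anchors):
        top = max(0, y - pad)  # shift upward so the number itself is not clipped
        bottom = anchors[i + 1] - pad if i + 1 < len(anchors) else page_height
        bands.append((top, bottom))
    return bands


# Three question numbers in the left margin plus one number inside question text.
detections = [(40, 100), (300, 150), (42, 400), (41, 900)]
print(question_bands(detections, margin_right=80, page_height=1200))
```

Each band can then be used to crop the page image between `top` and `bottom`.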

granite canopy Oct 17, 2024, 10:52 AM

#

Hello everyone, I'm new here, from Cameroon. I like Python programming but have never done it before. May you people help me start programming like the people here? Thanks.

left tartan Oct 17, 2024, 11:35 AM

#

granite canopy hello every one am new here from cameroon i like python programmation i never di...

Welcome! Start in #python-discussion to learn

crude nebula Oct 17, 2024, 11:37 AM

#

hearty token I have an exam question like this. I would like to segment each question into it...

!exam

#

!rule exam

arctic wedgeBOT Oct 17, 2024, 11:37 AM

#

Rules

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

hearty token Oct 17, 2024, 11:40 AM

#

crude nebula !exam

I just meant that i was trying to do image segmentation on an exam question

thorny geode Oct 17, 2024, 12:00 PM

#

quaint mulch what are you into?

ah ill reply in dm

rain glen Oct 17, 2024, 1:06 PM

#

FineTuning an OpenAI model with data from huggingface for the first time, and im using a 5gb file of chess moves lol. Gonna have the ultimate (mediocre) chess bot

vestal spruce Oct 17, 2024, 2:02 PM

#

Guys I have this distribution graph of the cumulative win/loss of a technical indicator on every stock in my market, is it a good idea to normalize and filter out the outliers?

quaint mulch Oct 17, 2024, 2:05 PM

#

use more bins?

vestal spruce Oct 17, 2024, 2:12 PM

#

quaint mulch use more bins?

I beg your pardon?

quaint mulch Oct 17, 2024, 2:20 PM

#

vestal spruce I beg your pardon?

https://en.wikipedia.org/wiki/Data_binning

Data binning

Data binning, also called data discrete binning or data bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall into a given small interval, a bin, are replaced by a value representative of that interval, often a central value (mean or median). It is related to qua...

vestal spruce Oct 17, 2024, 2:24 PM

#

quaint mulch https://en.wikipedia.org/wiki/Data_binning

Sorry I meant, were you giving solution for me or someone else, since you didn't specifically reply to any comment.

gentle storm Oct 17, 2024, 2:25 PM

#

Hi does anyone know a good program to start using python on? I want to make neural networks and simulations

vestal spruce Oct 17, 2024, 2:30 PM

#

gentle storm Hi does anyone know a good program to start using python on? I want to make neur...

For NN you can try out TensorFlow, they have a comprehensive tutorial and documentation on how to get started.

gentle storm Oct 17, 2024, 2:32 PM

#

vestal spruce For NN you can try out TensorFlow, they have a comprehensive tutorial and docume...

Thank you will check it out.

onyx frigate Oct 17, 2024, 2:33 PM

#

rain glen FineTuning an OpenAI model with data from huggingface for the first time, and im...

Then compare it with stockfish it would be awesome to see how good or bad it is.

rain glen Oct 17, 2024, 2:34 PM

#

Stockfish to this is the Magnus Carlsen of the kindergarten

quaint mulch Oct 17, 2024, 2:42 PM

#

vestal spruce Sorry I meant, were you giving solution for me or someone else, since you didn't...

I meant, the whole distribution seems to be captured by 3 bins. If it were me, I'd add more bins to see if it is actually normal-ish
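
A small sketch of why the bin count matters, using a synthetic normal sample (not the win/loss data from the plot) and a hand-rolled equal-width histogram. With 3 bins almost everything lands in the middle bucket; with 20 the shape becomes visible.

```python
import random


def histogram(values, bins):
    """Count values into `bins` equal-width buckets spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return counts


random.seed(0)
sample = [random.gauss(0, 1) for _ in range(1000)]
print(histogram(sample, 3))
print(histogram(sample, 20))
```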

vestal spruce Oct 17, 2024, 2:43 PM

#

quaint mulch I meant, the whole distribution seems to be captured by 3 bins. If it were me, ...

Ohh gotchu, thanks

river cape Oct 17, 2024, 3:22 PM

#

desert oar example: finding the maximum in a list is O(N) but sorting a list is usually som...

Do you use Python for dsa or C?

grand breach Oct 17, 2024, 3:58 PM

#

does it make any sense to do standardization after hash encoding categorical variables ? i saw a guy doing this in his analysis (not criticising his work at all) - my understanding is if there are no multiple ranges in dataset scaling might not be necessary, but main thing is doing this on categorical features is incomprehensible

mellow vector Oct 17, 2024, 4:29 PM

#

jupyter always duplicates images I paste into markdown cells, what am I doing wrong?

serene scaffold Oct 17, 2024, 4:46 PM

#

mellow vector jupyter always duplicates images I paste into markdown cells, what am I doing wr...

can you show all the code for a given cell that has this problem?

mellow vector Oct 17, 2024, 4:47 PM

#

there's a help topic open

#

https://discord.com/channels/267624335836053506/1296512171642982510

serene scaffold Oct 17, 2024, 4:47 PM

#

mellow vector there's a help topic open

if you ask for help about something where you have a help thread, be sure to always link to it

mellow vector Oct 17, 2024, 4:47 PM

#

ya sorry, was back and forth between ds and general

lilac lichen Oct 17, 2024, 7:15 PM

#

is there any news related to KAN?

odd meteor Oct 17, 2024, 9:20 PM

#

lilac lichen is there any news related to KAN?

Apparently, the hype around KAN seems to have fizzled out after some people on Twitter showed it's not as good as the hype suggested.

twilit sable Oct 17, 2024, 11:23 PM

#

Which ML framework should I learn as a beginner so that I can find job quickly??? I mean tf or pytorch

agile cobalt Oct 17, 2024, 11:24 PM

#

learning a framework is less than 20% of the work
the vast majority of what you have to learn is more about math and statistics rather than programming itself

serene scaffold Oct 17, 2024, 11:27 PM

#

twilit sable Which ML framework should I learn as a beginner so that I can find job quickly??...

It will probably be impossible to get an ML job quickly--or ever--without ML related coursework at a university. At which point the effort needed to learn pytorch is negligible.

twilit sable Oct 17, 2024, 11:35 PM

#

serene scaffold It will probably be impossible to get an ML job quickly--or ever--without ML rel...

Just tell me the framework

serene scaffold Oct 17, 2024, 11:36 PM

#

twilit sable Just tell me the framework

I told you implicitly which one I prefer. But there does not exist a framework, the knowledge of which enables one to get an ML job.

twilit sable Oct 17, 2024, 11:37 PM

#

serene scaffold I told you implicitly which one I prefer. But there does not exist a framework, ...

But I found that pytorch is used in most industries rather than tf

serene scaffold Oct 17, 2024, 11:37 PM

#

twilit sable But I found that pytorch is used in most industries rather than tf

Yep.

twilit sable Oct 17, 2024, 11:37 PM

#

So should i learn pytorch as a beginner

serene scaffold Oct 17, 2024, 11:37 PM

#

Nope.

twilit sable Oct 17, 2024, 11:38 PM

#

Why??

serene scaffold Oct 17, 2024, 11:38 PM

#

Because the framework doesn't matter. You need to learn the concepts.

#

ML has less to do with the actual code implementation than does general SWE

twilit sable Oct 17, 2024, 11:39 PM

#

Ik

#

Ahh. The good courses online are for tf, not pytorch. Why??

#

@serene scaffold

faint quail Oct 17, 2024, 11:41 PM

#

serene scaffold Because the framework doesn't matter. You need to learn the concepts.

word

#

built my own crappy framework

#

its not impossible to be self taught tho

#

just learn the concepts and understand WHY they work

twilit sable Oct 17, 2024, 11:42 PM

#

Can anyone help me ??

faint quail Oct 17, 2024, 11:42 PM

#

do what?

twilit sable Oct 17, 2024, 11:43 PM

#

Plz I am confused tf or pyt ???

#

Tensorflow or pytorch

#

I am confused

faint quail Oct 17, 2024, 11:43 PM

#

pytorch is more popular nowadays

#

so I recommend that

twilit sable Oct 17, 2024, 11:44 PM

#

From where should I learn ml with it ??

#

There are no good courses??

#

All good courses are of tf

#

I also want to learn pytorch

#

@faint quail

faint quail Oct 17, 2024, 11:45 PM

#

twilit sable From where should I learn ml with it ??

https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
https://www.youtube.com/watch?v=hfMk-kjRv4c
https://www.youtube.com/watch?v=Lakz2MoHy6o

YouTube

3Blue1Brown

But what is a neural network? | Deep learning chapter 1

What are the neurons, why are there layers, and what is the math underlying it?
Help fund future projects: https://www.patreon.com/3blue1brown
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

Additional funding for this project was provided by Amplify Partners

Typo correction: At 14 minutes 45 seconds...

YouTube

Sebastian Lague

How to Create a Neural Network (and Train it to Identify Doodles)

Exploring how neural networks learn by programming one from scratch in C#, and then attempting to teach it to recognize various doodles and images.

Source code: https://github.com/SebLague/Neural-Network-Experiments
Demo: https://sebastian.itch.io/neural-network-experiment

If you'd like to support me in creating more videos (and get early acce...

YouTube

The Independent Code

Convolutional Neural Network from Scratch | Mathematics & Python Code

In this video we'll create a Convolutional Neural Network (or CNN), from scratch in Python. We'll go fully through the mathematics of that layer and then implement it. We'll also implement the Reshape Layer, the Binary Cross Entropy Loss, and the Sigmoid Activation. Finally, we'll use all these objects to make a neural network capable of classif...

#

these are really good and how I learned

#

no crappy commentary with a cheap mic

#

just gets straight to the point

#

while still giving you a general understanding

twilit sable Oct 17, 2024, 11:45 PM

#

These are all about neural network

faint quail Oct 17, 2024, 11:45 PM

#

u dont have to watch the entire 3blue1brown playlist tho

twilit sable Oct 17, 2024, 11:45 PM

#

Not pytorch

#

Dude

twilit sable Oct 17, 2024, 11:46 PM

#

faint quail u dont have to watch the entire 3blue1brown playlist tho

I have

faint quail Oct 17, 2024, 11:46 PM

#

twilit sable Not pytorch

pytorch is a library for building neural networks

#

do you know python already?

twilit sable Oct 17, 2024, 11:46 PM

#

faint quail pytorch is a library for building neural networks

So is tf

faint quail Oct 17, 2024, 11:46 PM

#

just use whichever is easiest

twilit sable Oct 17, 2024, 11:46 PM

#

faint quail do you know python already?

Yeah I also know pandas, numpy, matplotlib and sql

faint quail Oct 17, 2024, 11:46 PM

#

if you understand the concept it generally doesnt matter what framework you use

#

just read the docs

twilit sable Oct 17, 2024, 11:49 PM

#

@faint quail Bro this is the most shitty person I have ever seen -> @serene scaffold

serene scaffold Oct 17, 2024, 11:50 PM

#

You're right. I'm the worst of all possible humans.

faint quail Oct 18, 2024, 12:01 AM

#

lol

#

better to just help them instead of putting them down

#

but he is kind of right because you're likely not gonna get hired unless you really understand the advanced calculus concepts of A.I

#

or at least to some degree the math for training A.I

serene scaffold Oct 18, 2024, 12:04 AM

#

faint quail better to just help them instead of putting them down

What did I say that you feel was me putting them down?
They asked about getting an ML job quickly, so I gave them information to ground their expectations.

left tartan Oct 18, 2024, 12:06 AM

#

twilit sable Yeah I also know pandas, numpy, matplotlib and sql

Perhaps consider data engineering? The barrier to entry is lower (still high) than data science.

iron basalt Oct 18, 2024, 12:10 AM

#

twilit sable Just tell me the framework

"Which suit should I wear so I can become the president of the US quickly?" "Armani or Gucci? Just tell me!"

faint quail Oct 18, 2024, 12:13 AM

#

iron basalt "Which suit should I wear so I can become the president of the US quickly?" "Arm...

lmao this is the best comparison

serene scaffold Oct 18, 2024, 12:36 AM

#

iron basalt "Which suit should I wear so I can become the president of the US quickly?" "Arm...

It's Gucci, isn't it

quaint mulch Oct 18, 2024, 4:49 AM

#

serene scaffold It will probably be impossible to get an ML job quickly--or ever--without ML rel...

Yea, I wish this were not the case

elder coyote Oct 18, 2024, 5:09 AM

#

Hello, is there any algorithm to find the exact images without having false positives while comparing two images? i have used cross correlation with pHash, but i still get false positives

#

i use them both to filter out the images

#

at the same time

agile cobalt Oct 18, 2024, 5:19 AM

#

elder coyote Hello, is there any algorithm to find the exact images without having false posi...

besides comparing if the colors of each pixel are exactly the same?

#

for "exact" matches, that is the only way
if you want similar images, it depends heavily on your definition of how "similar" the images should be
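
A minimal sketch of the exact-match check, assuming both images have already been decoded into raw pixel bytes by an image library (for example Pillow's `Image.tobytes()`). Comparing the files on disk is a different test, since metadata or compression can differ while the pixels are identical. The function name and sample bytes are hypothetical.

```python
def same_image(size_a, pixels_a, size_b, pixels_b):
    """Exact match: identical dimensions and identical raw pixel bytes."""
    return size_a == size_b and pixels_a == pixels_b


# Hypothetical 2x1 RGB images as raw bytes (3 bytes per pixel).
img1 = bytes([255, 0, 0, 0, 255, 0])
img2 = bytes([255, 0, 0, 0, 255, 0])
img3 = bytes([255, 0, 0, 0, 254, 0])  # one channel off by one

print(same_image((2, 1), img1, (2, 1), img2))
print(same_image((2, 1), img1, (2, 1), img3))
```

A perceptual hash such as pHash deliberately tolerates small differences like the one in `img3`, which is why it can report matches that are not exact.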

terse mirage Oct 18, 2024, 5:42 AM

#

hi, I'm a complete beginner in machine learning and am trying to implement a neural network from scratch, but i seem to be getting 'weird' results, could anyone go over my code once?

#

i am randomizing weights and biases every time i run the program (i haven't completely implemented backpropagation yet), and sometimes i get this output, where the first 0 refers to the calculated y value, and the second zero refers to the error of the last layer

#

my network has 4 layers with 1,2,5 and 1 neurons respectively

lilac lichen Oct 18, 2024, 5:51 AM

#

odd meteor Apparently, the hype around KAN seems to have fizzled out after some people on Tw...

where i can read more?

timber citrus Oct 18, 2024, 6:43 AM

#

Hello everyone, we are developing an FAQ chatbot to assist students with their enrollment by answering their queries. We have utilized a pre-trained BART model for the chatbot. My question is: how can I make the chatbot context-aware, so it can understand and maintain the context of the conversation? Additionally, any advice on developing a chatbot that specializes in answering questions would be greatly appreciated.

graceful niche Oct 18, 2024, 6:46 AM

#

anyone got a good book for building models from 0? neural networks or ML

#

or just some good in depth guide

timber citrus Oct 18, 2024, 6:49 AM

#

Any answer will help us plan our next step because we are stuck 😭

terse mirage Oct 18, 2024, 6:54 AM

#

graceful niche anyone got a good book for building models from 0? neural networks or ML

its not a book, but this is what i'm following
http://neuralnetworksanddeeplearning.com/index.html

graceful niche Oct 18, 2024, 6:56 AM

#

timber citrus Hello everyone, we are developing an FAQ chatbot to assist students with their e...

I mean

#

https://medium.com/@singla.jitesh27/generative-ai-101-making-your-llm-context-aware-using-langchain-1016fbf4c2b2

Medium

Generative AI 101: Making your LLM context-aware using LangChain 🦜🔗

Introduction

#

just google it next time

cold estuary Oct 18, 2024, 7:05 AM

#

Guys I am doing a major project on Text to floorplan using GANs. I have a dataset already but it has the floor plan layout only, like in the picture I have attached. My professor told me to include beds, dining tables, doors, and some other things that are kept in the respective rooms. So does anyone have any idea whether we can modify the dataset using a GAN itself?

quaint mulch Oct 18, 2024, 7:12 AM

#

terse mirage hi, I'm a complete beginner in machine learning and am trying to implement a neu...

if 0 is not what you expect, then what are you expecting?

quaint mulch Oct 18, 2024, 7:15 AM

#

graceful niche anyone got a good book for building models from 0? neural networks or ML

https://iamtrask.github.io/2015/07/12/basic-python-network/

A Neural Network in 11 lines of Python (Part 1) - i am trask

A machine learning craftsmanship blog.

quaint mulch Oct 18, 2024, 7:15 AM

#

graceful niche or just some good in depth guide

pinned messages in this channel and https://github.com/aprbw/ArianDLPrimer

GitHub

GitHub - aprbw/ArianDLPrimer: My personal list of what are the thin...

My personal list of what are the things to learn in deep learning. - aprbw/ArianDLPrimer

odd meteor Oct 18, 2024, 7:57 AM

#

lilac lichen where i can read more?

Left to me I'd say, KAN currently looks like a nice interpretable model to play with toy examples, but it hasn’t shown nearly enough evidence to claim that it can replace MLPs.

So I have doubts about the claim that KAN is superior to MLPs.

https://arxiv.org/abs/2407.17790

arXiv.org

Exploring the Limitations of Kolmogorov-Arnold Networks in Classifi...

Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptions (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific ...

terse mirage Oct 18, 2024, 8:03 AM

#

quaint mulch if 0 is not what you expect, then what are you expecting?

its not supposed to be zero (at least without backpropagation i think). The first output represents the result of the feedforward, and i don't think random weights and biases will align that often to make the feedforward result zero

#

the second zero represents the errors of the backpropagation layer

#

i can't predict what exactly i'm expecting because i'm randomising my weights and biases at every run

terse mirage Oct 18, 2024, 8:32 AM

#

I figured out why its outputting zero as feedforward results

#

a lot of my weights are negative

#

and this is impacting the weighted inputs

#

and since i'm using ReLU this is just becoming zero
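
The effect described above can be reproduced with a single neuron. This is a sketch with made-up inputs and weights, not the network from the question: when the weighted input is negative, ReLU clamps the activation to exactly zero.

```python
def relu(z):
    """Rectified linear unit: pass positive values through, clamp the rest to zero."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)


x = [1.0, 2.0]
print(neuron(x, [-0.5, -0.3], -0.1))  # weighted input is negative, so ReLU clamps it to zero
print(neuron(x, [0.5, 0.3], 0.1))     # weighted input is positive, so it passes through
```

If the final layer also uses ReLU, a negative weighted input there is enough to make the whole network output zero.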

dawn blaze Oct 18, 2024, 9:08 AM

#

Heyaaa, anyone here a "master" of tensorflow/keras? me and two classmates are working on audio-identification prediction and can't seem to get it to analyze our voices. 🙂 anyone wanting to have a look at our code 🙂

rich moth Oct 18, 2024, 10:10 AM

#

I made a Pokémon scraper. My next plan is to incorporate the MTG world into it but im not sure if it should be one parquet or two. what do you guys think? https://paste.pythondiscord.com/YFCQ

serene scaffold Oct 18, 2024, 12:46 PM

#

@dawn blaze always ask the question you actually want answered. Don't wait for a commitment.

olive wedge Oct 18, 2024, 1:01 PM

#

anyone got good resources from where i could begin learning?

crystal badger Oct 18, 2024, 1:03 PM

#

Has anyone tried using SDFusion or AutoSDF for 3D completion? I’d appreciate insights on how they perform, especially with setup or training. Were there any specific challenges you faced during the implementation or dataset preparation? I'm stuck right now

serene scaffold Oct 18, 2024, 1:04 PM

#

olive wedge anyone got good resources from where i could begin learning?

There are some in the pins

olive wedge Oct 18, 2024, 1:05 PM

#

serene scaffold There are some in the pins

thanksss!!

ancient hornet Oct 18, 2024, 4:22 PM

#

Hi, just wanted to ask some questions about MLE N-gram LMs. Does anyone have any expertise?

serene scaffold Oct 18, 2024, 4:22 PM

#

ancient hornet Hi, just wanted to ask some questions about MLE N-gram LMs. Does anyone have any...

Hello, remember to never ask to ask. Always ask the question you actually want answered.

olive wedge Oct 18, 2024, 4:36 PM

#

serene scaffold There are some in the pins

I went through the pinned messages, and most of the stuff is intermediate to advanced (personally for me). Do you have any resources for the beginner level?

serene scaffold Oct 18, 2024, 4:37 PM

#

olive wedge I went through the pinned messages, and most of the stuff is intermediate to adv...

!resources data science

arctic wedgeBOT Oct 18, 2024, 4:37 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

olive wedge Oct 18, 2024, 4:40 PM

#

serene scaffold !resources data science

oh wow, thank you so much!! 🙌

mossy hazel Oct 18, 2024, 5:12 PM

#

hi, i have a query related to a project that i am working on. I'll jump straight into my problem statement:

I need to extract text data from tables, plots, and images in PDF files. I need to develop a sub-project using a generative AI architecture.

Can anyone help me with how i should start? what things do i need to integrate? which LLMs would be effective? and finally, you can help me with anything related to this...

serene scaffold Oct 18, 2024, 5:41 PM

#

mossy hazel hi, i have an query related to my project that i am working. I quickly jump into...

why do you need to use generative AI?

#

to cut to the chase: you shouldn't use generative AI. generative AI is what's fashionable right now, but it's the wrong choice for this task. generative AI is about creating new content. but you're not trying to create new content. you're trying to extract content that already exists.

mossy hazel Oct 18, 2024, 5:51 PM

#

serene scaffold to cut to the chase: you *shouldn't* use generative AI. generative AI is what's ...

what would be the right approach to deal?

serene scaffold Oct 18, 2024, 5:54 PM

#

mossy hazel what would be the right approach to deal?

something involving OCR.

mossy hazel Oct 18, 2024, 5:56 PM

#

serene scaffold something involving OCR.

see, extracting normal text data from normal PDFs is easy. but in my case, i need to extract data from scientific/research documents, which isn't that easy. that is why using LLMs and agents would be robust.

#

i need help with the architecture and workflow for this project.

charred egret Oct 18, 2024, 6:10 PM

#

You’re extracting content from a pdf. Why would you want generative methods for this? What are you generating? You just need some way to read the pdf content, spot the tables, and transform it to whatever your target is

mossy hazel Oct 18, 2024, 6:22 PM

#

charred egret You’re extracting content from a pdf. Why would you want generative methods for ...

yeah understood, but i have been told to develop using generative arch specifically

left tartan Oct 18, 2024, 6:29 PM

#

charred egret You’re extracting content from a pdf. Why would you want generative methods for ...

This sounds way easier than it is. https://youtu.be/hjswZbbglbw?feature=shared for a nice talk on it

charred egret Oct 18, 2024, 6:39 PM

#

oh yeah I’m simplifying a lot here. text mining/extraction and friends can be very complex

ancient hornet Oct 18, 2024, 7:29 PM

#

I'm trying to create an MLE N-gram LM. I've been provided the following code that I need to fill in.

class mle_ngram_lm:
    def __init__(self, train : Sequence[str], n : int) -> None:
        # TODO: train the model here, declaring appropriate instance variables (i.e., model parameters!)
        # Consider what you need to compute logprob(w, c)!
        pass

    def logprob(self, w : str, c : Sequence[str]) -> float:
        assert (len(c) + 1) == self.n
        # TODO: compute p(w | c) using those instance variables
        return 0.0

I've been taking this course informally (not for accreditation) so I haven't been able to attend all of the lectures and now I'm confused as to what parameters need to be filled in. This is based on Chapter 3 of Jurafsky & Martin's Speech and Language Processing.

I posted this to the python-help channel but didn't get any replies so hoping that it gains more traction here.
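
For reference, a minimal sketch of one way the skeleton above could be filled in, assuming plain (unsmoothed) maximum-likelihood estimates, i.e. p(w | c) = count(c, w) / count(c). The `<s>` / `</s>` padding symbols and the instance variable names are assumptions, not part of the provided skeleton, and unseen n-grams come back as `-inf` because nothing is smoothed.

```python
import math
from collections import Counter
from typing import Sequence


class mle_ngram_lm:
    def __init__(self, train: Sequence[str], n: int) -> None:
        self.n = n
        # Pad with n-1 start symbols so the first real token has a full context.
        # "<s>" and "</s>" are assumed padding symbols (not given in the skeleton).
        tokens = ["<s>"] * (n - 1) + list(train) + ["</s>"]
        self.ngram_counts = Counter()    # counts of (context..., word) tuples
        self.context_counts = Counter()  # counts of the (n-1)-token contexts
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            self.ngram_counts[ngram] += 1
            self.context_counts[ngram[:-1]] += 1

    def logprob(self, w: str, c: Sequence[str]) -> float:
        assert (len(c) + 1) == self.n
        context = tuple(c)
        count = self.ngram_counts[context + (w,)]
        if count == 0:
            # Unseen n-gram: the MLE probability is zero, so the log is -inf.
            return float("-inf")
        return math.log(count / self.context_counts[context])


lm = mle_ngram_lm("a b a b a c".split(), 2)
print(lm.logprob("b", ["a"]))  # log(2/3): "a" is followed by "b" in 2 of its 3 occurrences
```

So the "parameters" of the model are just the two count tables; everything `logprob` needs is a lookup and a division.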

mint plume Oct 18, 2024, 9:23 PM

#

What IDEs do you guys prefer?

spring field Oct 18, 2024, 9:32 PM

#

PyCharm (which unfortunately is not starting up lately) and VSCode

rich moth Oct 18, 2024, 9:34 PM

#

!paste

arctic wedgeBOT Oct 18, 2024, 9:34 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

rich moth Oct 19, 2024, 3:00 AM

#

mossy hazel see, extracting normal text data from normal PDF's is easy. but in my case, i ne...

https://paste.pythondiscord.com/ITSA

Maybe something like that would be a good start for a foundation.

broken eagle Oct 19, 2024, 3:17 AM

#

Anyone familiar with text-image retrieval? What's the SOTA now? Anything significantly better than BLIP?

agile cobalt Oct 19, 2024, 3:34 AM

#

broken eagle Anyone familiar with text-image retrieval? What's the SOTA now? Anything signifi...

for "what's the SOTA", see https://paperswithcode.com/task/image-to-text-retrieval or some other tasks they link in https://paperswithcode.com/paper/blip-2-bootstrapping-language-image-pre

for how good a model will be for your specific use case / what the best model is, the answer is pretty much the same as all software engineering problems - "it depends"

some models may perform better in specific datasets for particular reason, some use cases may require lower cost/latency or require a higher accuracy etc. (not to mention fine tuning, distillation and so on)

Papers with Code - Image-to-Text Retrieval

Image-text retrieval is the process of retrieving relevant images based on textual descriptions or finding corresponding textual descriptions for a given image. This task is interdisciplinary, combining techniques from computer vision, and natural language processing. The primary challenge lies in bridging the semantic gap — the difference...

#

also "significantly" better is very subjective

mossy hazel Oct 19, 2024, 3:52 AM

#

rich moth https://paste.pythondiscord.com/ITSA Maybe something like that would be a good ...

pymupdf
sympy
matplotlib
farm-haystack[all]
elasticsearch

when i try to install the above requirements, it fails

Building wheels for collected packages: faiss-cpu
  Building wheel for faiss-cpu (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for faiss-cpu (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      running bdist_wheel
      running build
      running build_py
      running build_ext
      building 'faiss._swigfaiss' extension
      swigging faiss\faiss\python\swigfaiss.i to faiss\faiss\python\swigfaiss_wrap.cpp
      swig.exe -python -c++ -Doverride= -I/usr/local/include -Ifaiss -doxygen -DSWIGWIN -o faiss\faiss\python\swigfaiss_wrap.cpp faiss\faiss\python\swigfaiss.i    
      error: command 'swig.exe' failed: None
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for faiss-cpu
Failed to build faiss-cpu
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (faiss-cpu)

rich moth Oct 19, 2024, 4:12 AM

#

mossy hazel ```python pymupdf sympy matplotlib farm-haystack[all] elasticsearch ``` when i...

hmm, uninstall faiss-cpu: ```pip uninstall faiss-cpu```

then try ```pip install faiss-cpu``` and see if that works first

mossy hazel Oct 19, 2024, 4:14 AM

#

rich moth https://paste.pythondiscord.com/ITSA Maybe something like that would be a good ...

can you give me pip install with versions for all the imports ?

rich moth Oct 19, 2024, 4:14 AM

#

mossy hazel can you give me pip install with versions for all the imports ?

ya one sec.

mossy hazel Oct 19, 2024, 4:32 AM

#

rich moth ya one sec.

?

broken eagle Oct 19, 2024, 5:39 AM

#

agile cobalt for "what's the SOTA", see https://paperswithcode.com/task/image-to-text-retriev...

Fair point. Cool. Thanks.

rich moth Oct 19, 2024, 9:16 AM

#

mossy hazel ?

sorry dude, i got distracted

silver perch Oct 19, 2024, 9:18 AM

#

hey guys, i'm thinking of an excalidraw kind of application with ai integrated, so i can type a query and it creates a drawing accordingly
i'm not sure what i should use, can anyone give me an idea of how i can do this? (not image generation, but getting some type of data which i can parse and create stuff from myself)

rich moth Oct 19, 2024, 9:24 AM

#

mossy hazel ?

Im watching Alien Romulus gonna have to work with me 🙂

jaunty helm Oct 19, 2024, 10:40 AM

#

silver perch hey guys im thinking of a excalidraw kind of application with ai integrated so i...

trying to go directly query -> AI -> image prob sucks
maybe through some intermediary like mermaid or graphviz to make the graph
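
A rough sketch of that intermediary idea, assuming the model is prompted to reply with a small JSON description of the diagram that you then turn into Graphviz DOT text yourself. The JSON shape (`nodes`, `edges`) and the function name are hypothetical, and the sketch does not escape quotes inside node names.

```python
import json


def spec_to_dot(spec_json: str) -> str:
    """Convert a (hypothetical) JSON diagram spec into Graphviz DOT text."""
    spec = json.loads(spec_json)
    lines = ["digraph G {"]
    for node in spec["nodes"]:
        lines.append(f'  "{node}";')
    for src, dst in spec["edges"]:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Stand-in for what the model might return for "client talks to server, server talks to db".
example = '{"nodes": ["client", "server", "db"], "edges": [["client", "server"], ["server", "db"]]}'
print(spec_to_dot(example))
```

The point of the design is that the model only has to produce structured text, and the layout and drawing stay under your control.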

stone patrol Oct 19, 2024, 1:11 PM

#

Where to get simple Text ai code?

odd meteor Oct 19, 2024, 1:21 PM

#

stone patrol Where to get simple Text ai code?

"simple text AI code"... Care to elucidate?

stone patrol Oct 19, 2024, 1:26 PM

#

odd meteor "simple text AI code"... Care to elucidate?

Just it’s code

odd meteor Oct 19, 2024, 1:28 PM

#

stone patrol Just it’s code

Do you mean an ML model that works with text data, or are you referring to something else called Text AI?

stone patrol Oct 19, 2024, 1:29 PM

#

odd meteor Do you mean mean ML model that works with text data or are you referring to some...

Just text data

odd meteor Oct 19, 2024, 1:32 PM

#

stone patrol Just text data

Okay got it. You can start with simple NLP tasks like text classification and sentiment analysis, which can use a bag-of-words (BoW) model.

You can check https://kaggle.com for code examples.

Kaggle: Your Machine Learning and Data Science Community

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.
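
To make "BoW model" concrete, here is a minimal sketch of bag-of-words features in plain Python, assuming whitespace tokenization and lowercasing. The helper names and the toy documents are made up for illustration; a real project would more likely use a library vectorizer and feed the vectors to a classifier.

```python
from collections import Counter


def build_vocab(docs):
    """Sorted list of every distinct token across the documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def bow_vector(doc, vocab):
    """Count how often each vocabulary token occurs in one document."""
    counts = Counter(doc.lower().split())
    return [counts[tok] for tok in vocab]


docs = ["good movie", "bad movie", "good good plot"]
vocab = build_vocab(docs)
vectors = [bow_vector(doc, vocab) for doc in docs]
print(vocab)    # ['bad', 'good', 'movie', 'plot']
print(vectors)  # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
```

Each document becomes a fixed-length vector of word counts, ignoring word order, which is what a simple classifier is then trained on.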

fresh bay Oct 19, 2024, 5:16 PM

#

looking at the pytorch geometric implementation of GCN - I am confused because it seems like they are assuming the adjacency matrix will be made up of 1's and 0's, which might not always be the case, no?

#

#

compared with the formulation from Kipf

wooden sail Oct 19, 2024, 5:29 PM

#

fresh bay looking at the pytorch geometric implementation of GCN - I am confused because i...

should be the standard case. the adjacency matrix is a binary matrix with 1s where nodes are connected. if you include the weight of the edges, it's no longer an adjacency matrix. there are also some variations for directed and undirected graphs when you have loops (nonzeros along the diagonal)

wooden sail Oct 19, 2024, 5:29 PM

#

fresh bay

this doesn't say anything about what A and A tilde are, at least not explicitly. but note that self connections are considered through addition of an identity matrix, meaning it also assigns a 1 to any connections/edges

#

the matrix D there in the second figure is probably the "degree matrix", which tells you the number of edges for each vertex. given how they define it there, i'd expect A tilde to be a binary matrix, just like in the first image
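
A small sketch of the normalization being discussed, assuming the figure shows the usual Kipf & Welling form, where A tilde = A + I and D is the degree matrix of A tilde, giving D^{-1/2} (A + I) D^{-1/2}. The function name and the 3-node example graph are made up for illustration, and plain lists are used instead of tensors to keep it self-contained.

```python
import math


def gcn_normalize(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # A tilde: add self-connections by adding the identity matrix.
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degree of each node in A tilde (row sums), i.e. the diagonal of D.
    deg = [sum(row) for row in a_tilde]
    return [
        [a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


# Path graph 0 - 1 - 2 (undirected, binary adjacency matrix).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
A_hat = gcn_normalize(A)
for row in A_hat:
    print([round(x, 3) for x in row])
```

With a binary A, the row sums are node degrees plus one for the self-loop. The same arithmetic runs with non-negative edge weights in place of the 1s, and as far as I know PyTorch Geometric's `GCNConv` takes an optional `edge_weight` argument for that case.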

fresh bay Oct 19, 2024, 5:46 PM

#

Thanks Edd - that's a fair point. I am generally used to thinking of adj matrices as correlations from fMRI/transcriptome data - but yes, that should be the standard case

#

I actually didn't know that if you have edge weights it isn't a true adj matrix, that is good to know

#

@wooden sail catking

onyx laurel Oct 19, 2024, 5:57 PM

#

Hello, I am someone who wants to start learning data science. I found some books from the OpenStax series: Algebra and Trigonometry (1516 pages), Calculus Volume 1 (875 pages), Calculus Volume 2 (737 pages), Calculus Volume 3 (915 pages), High School Statistics (932 pages), Introductory Statistics (849 pages), and Business Statistics (627 pages). Do you think I need to study all of them fully?

#

Does anyone have any experience with them?

wooden sail Oct 19, 2024, 6:02 PM

#

onyx laurel Hello, I am someone who wants to start learning data science. I found some books...

the topics are definitely among the basics that people studying data science-adjacent programs in undergrad cover

#

but in uni it's unusual to grab a book and read it cover to cover. it's more common to grab several books on the topic and fish out some select few sections. you read the rest as needed, since it's unrealistic to cover all the topics in a whole book well in a semester

#

it might help to orient yourself by looking at the syllabi of some programs

#

you can find them e.g. for some of the MIT open courseware to get an idea of how fully books are really covered

onyx laurel Oct 19, 2024, 6:06 PM

#

You know, I want to learn data science through self-study, not by going to university. Do you have any suggestions for me?

wooden sail Oct 19, 2024, 6:06 PM

#

exactly the same ones i gave you just now

#

you wouldn't be asking about books if you were going to uni 😛 i'm giving you the context of how it would be for people who do

fresh bay Oct 19, 2024, 6:07 PM

#

Id also recommend ISL

#

Khan Academy is also good

onyx laurel Oct 19, 2024, 6:07 PM

#

what's ISL

wooden sail Oct 19, 2024, 6:08 PM

#

i would only add that it's important to try to set a schedule for yourself and stick to it, and to cover important topics even if you find them boring. also do several problems to check your understanding

fresh bay Oct 19, 2024, 6:08 PM

#

introduction to statistical learning

#

agree with Edd about the schedule

wooden sail Oct 19, 2024, 6:08 PM

#

uni naturally forces you to do all those things, but you might feel like skipping them when studying on your own. ideally you wouldn't skip them cuz they're actually helpful

onyx laurel Oct 19, 2024, 6:09 PM

#

But I'm really better at learning by reading text than by watching videos or taking courses.

fresh bay Oct 19, 2024, 6:10 PM

#

I know this is a python server but introduction to data science with tidyverse is pretty good as well

onyx laurel Oct 19, 2024, 6:10 PM

#

I found the atmosphere of this server better than the others. thanks both of you

wooden sail Oct 19, 2024, 6:11 PM

#

watching videos may be the worst way of learning a topic from scratch, there are too many pitfalls that are easy to fall into. and courses are designed around books and require you to go read it anyway. for the most part, grabbing the book and reading it + doing several exercises is the best way to go about it. at least imo

#

i do find the math server is pretty good to ask for help and explanations on math subjects like these btw

onyx laurel Oct 19, 2024, 6:13 PM

#

I'm a perfectionist, and I usually spend a lot of time on a topic. How much time do you think is enough, on average, for each topic?

wooden sail Oct 19, 2024, 6:13 PM

#

https://discord.gg/math

fresh bay Oct 19, 2024, 6:13 PM

#

I'd add, if you want videos, StatQuest by Josh Starmer is wonderful

fresh bay Oct 19, 2024, 6:13 PM

#

onyx laurel I'm a perfectionist, and I usually spend a lot of time on a topic. How much time...

this is completely variable

#

like there is no way to know

wooden sail Oct 19, 2024, 6:14 PM

#

onyx laurel I'm a perfectionist, and I usually spend a lot of time on a topic. How much time...

i would say that, at undergrad level, a working understanding is usually enough. you'd be surprised how little depth there is in undergrad

#

it's more about becoming aware that several weird things exist, getting some intuition, and most importantly, learning how to learn

fresh bay Oct 19, 2024, 6:15 PM

#

at some point - the best way to do this is just to go do it

onyx laurel Oct 19, 2024, 6:15 PM

#

thankss😊

wooden sail Oct 19, 2024, 6:15 PM

#

here, since you have data science as a goal, not straight up studying mathematics, it helps a lot to look for practical problems related to the math topics, if at all possible

fresh bay Oct 19, 2024, 6:15 PM

#

you need to accept you won't be able to do things perfectly - but make sure you do a few of the exercises, even if it's just the first two or three

#

dont just read you need to do a few problems

onyx laurel Oct 19, 2024, 6:17 PM

#

Do you think it's a good idea to ask ChatGPT to make a list of subjects that I should study?

wooden sail Oct 19, 2024, 6:18 PM

#

i wouldn't ask chatgpt something you don't already know about, no

fresh bay Oct 19, 2024, 6:18 PM

#

no

wooden sail Oct 19, 2024, 6:18 PM

#

or at the very least cross-check what it spits out

fresh bay Oct 19, 2024, 6:18 PM

#

you can use it if you can verify the answer

#

but if you cant then dont

onyx laurel Oct 19, 2024, 6:19 PM

#

again thanks

mellow vector Oct 19, 2024, 6:27 PM

#

yall use nbextensions?

mellow vector Oct 19, 2024, 6:46 PM

#

hmm Well I'm annoyed with the installation process now and have abandoned that. I was hoping to find a tool that would assist in writing docstrings, going to start spending some time exercising my tech writing synapses. If yall suggest anything for jupyter I'm all ears.

#

I'm also thinking I'll commit to the numpy standard, data analysis is my end goal for studying python.

mellow vector Oct 19, 2024, 8:19 PM

#

👂👂🌽👂

rich moth Oct 20, 2024, 1:00 AM

#

mellow vector hmm Well I'm annoyed with the installation process now and have abandoned that. ...

There's a lot of tools, you can use Cody integrated with VSC to help write doc strings or Mintlify doc writer.

vocal zealot Oct 20, 2024, 1:51 AM

#

agile cobalt Oct 20, 2024, 1:58 AM

#

vocal zealot

what?

delicate apex Oct 20, 2024, 1:59 AM

#

youtube should stop being dum with the constant anti-adblock war and disable video speed controls

vocal zealot Oct 20, 2024, 2:00 AM

#

delicate apex youtube should stop being dum with the constant anti-adblock war and *disable vi...

I use ad accelerator. Speeds up the ad so you can skip it after like one second. Never been blocked.

#

Chrome extension

wintry relic Oct 20, 2024, 2:01 AM

#

woa
this isn't data science...

delicate apex Oct 20, 2024, 2:01 AM

#

ah, fooey. this is not an ot channel. did you mean to post this here?

static falcon Oct 20, 2024, 3:46 AM

#

What a pathetic response by Stelercus. @viral zinc Learn both. Becoming comfortable with both will only make you a stronger candidate in the job market whether it's academia, industry, or government. On a further note, you can have fun spending a lot more unnecessary time reading in .csv or Excel data and creating a plot in Python that would take mere moments in R (and people are short on time, so use common sense).

serene scaffold Oct 20, 2024, 3:50 AM

#

static falcon What a pathetic response by Stelercus. <@801616748817940551> Learn both. Becomin...

Why are you responding to something I said several months ago?

static falcon Oct 20, 2024, 4:00 AM

#

Get comfortable understanding statistics and probability theory at its core. You should study calculus and, once in college, take a proofs course and then move into real analysis. Application comes from Theory, and without a solid foundation, well... just think of "The Three Little Pigs."

serene scaffold Oct 20, 2024, 4:02 AM

#

static falcon Get comfortable understanding statistics and probability theory at its core. You...

The message after that one mentions R. Are you searching for messages in this channel that mention R?

#

I'm not saying you can't do any of these things. I'm just trying to understand your motives.

pliant echo Oct 20, 2024, 4:40 AM

#

Heyo, question: what are the alternatives to confidence scores if the model you're using (Gemini) doesn't support them?

pine lake Oct 20, 2024, 7:07 AM

#

yo guys, can you please tell me some good textbooks on ML and AI?

wooden sail Oct 20, 2024, 7:29 AM

#

pine lake yo guys, can you please tell me some good textbooks on ML and AI?

try perusing the pinned messages

worldly dawn Oct 20, 2024, 7:39 AM

#

pine lake yo guys, can you please tell me some good textbooks on ML and AI?

I like the books from https://charuaggarwal.net/

brazen cape Oct 20, 2024, 10:23 AM

#

hey guys I have some questions about ml could someone help me with that

#

what are ml specializations and what is recommendation systems all about

thorny geode Oct 20, 2024, 10:44 AM

#

static falcon Get comfortable understanding statistics and probability theory at its core. You...

i see, i do lack some calculus and some basic statistics foundation, tho i want to also increase my practical skills together with my theoretical knowledge

#

last time i was forced to learn anova and regression tools for my competition, but i saw some real world results from those kinds of case study questions, so i'm now trying to find some similar projects like that

lunar gyro Oct 20, 2024, 11:19 AM

#

I want to be a data analyst. Rn I'm doing my undergraduate degree in computer applications. So, could someone suggest what projects I should work on to land a job at a good company with good earnings?

earnest widget Oct 20, 2024, 1:52 PM

#

lunar gyro I want to be a data analysis. Rn I'm doing under graduation in computer applicat...

Instead of focusing on many projects, try to focus on something you are interested in. Specifically domain based knowledge is better. A specific industry you want to get into.

desert oar Oct 20, 2024, 2:42 PM

#

grand breach does it make any sense to do standardization after hash encoding categorical var...

You mean like scaling your binary features to something other than 0 and 1? It can be useful for the purpose of interpreting your model, but it shouldn't change the results

grand breach Oct 20, 2024, 2:43 PM

#

high cardinality features; only the target variable is binary

#

some features have 1000s of categories

desert oar Oct 20, 2024, 3:19 PM

#

grand breach high cardinality features only the target variable is binary

it shouldn't matter, 0 and 1 are fine
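
For context, a minimal sketch of what hash encoding a high-cardinality categorical feature can look like, assuming each category is hashed into one of `n_buckets` indicator columns. The function names, the bucket count, and the example category are hypothetical. md5 is used only to get a hash that is stable across runs, since Python's built-in `hash()` for strings is randomized per process by default.

```python
import hashlib


def hash_bucket(value: str, n_buckets: int) -> int:
    """Map a category string to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(value: str, n_buckets: int) -> list:
    """Indicator vector with a single 1 in the category's bucket."""
    vec = [0] * n_buckets
    vec[hash_bucket(value, n_buckets)] = 1
    return vec


# A feature with thousands of categories collapses to a fixed number of 0/1 columns.
vec = hash_encode("city_123", 8)
print(vec)
```

The encoded columns only ever hold 0 or 1, which is the situation the reply above is describing.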

feral blade Oct 20, 2024, 5:56 PM

#

Hello! I wanted some help with ai tools.
I need to run a batch llm query on a dataset column with about 4k entries.
is there any that lets you do this for free without card info?

serene scaffold Oct 20, 2024, 7:24 PM

#

feral blade Hello! I wanted some help with ai tools. I need to run a batch llm query on a da...

Probably not. Your best bet is to try running the LLM on a platform like Google colab and hope you can get through all 4k before you max out your compute limit

#

Crypto miners ruined free compute for everyone

versed pilot Oct 20, 2024, 7:28 PM

#

lunar gyro I want to be a data analysis. Rn I'm doing under graduation in computer applicat...

Do you know SQL or Pandas/Python, or R? Do you know any visualisation tools (python or R libraries, Tableau, Power BI). Do you know basic statistics, normal distribution, means/medians/standard deviations etc? That's probably a good starting point, try and put together a portfolio to demonstrate your knowledge, ideally also displaying some domain expertise about the data you are analysing.

feral blade Oct 20, 2024, 7:40 PM

#

serene scaffold Probably not. Your best bet is to try running the LLM on a platform like Google ...

Ooh which ones can i run there?
I tried gpt api but it seemed like it wanted some cards set up before i can run anything.

serene scaffold Oct 20, 2024, 7:53 PM

#

feral blade Ooh which ones can i run there? I tried gpt api but it seemed like it wanted so...

You can't run any openai models. You have to use their API, and there is no way around this

#

The ones on hugging face can be used "anywhere" that has a large enough GPU

#

I would try mistral 7b. That means it has seven billion parameters. You might be able to download and load it on Colab.

#

You probably won't be able to run any 70b models.
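
A back-of-the-envelope sketch of why the parameter count matters, assuming memory is dominated by the weights alone (activations and the KV cache add more on top). The function name is made up, and the byte sizes are the usual 2 bytes per parameter for fp16 and roughly 0.5 bytes for 4-bit quantization.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, n_params in [("7B", 7e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(n_params, 2)    # 16-bit floats
    int4 = weight_memory_gb(n_params, 0.5)  # 4-bit quantized
    print(f"{name}: ~{fp16:.0f} GB in fp16, ~{int4:.1f} GB at 4-bit")
```

A T4 has 16 GB of memory, so a 7B model in fp16 (about 14 GB of weights) is already tight there, while a 70B model needs about 140 GB in fp16 and does not fit even at 4-bit.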

feral blade Oct 20, 2024, 7:54 PM

#

Oooh awesome, thanks a lot 👌

quartz lotus Oct 20, 2024, 10:05 PM

#

Does anyone know if pyautogui works with OpenCV template matching? I can't seem to get it to work at runtime, but for static images I can make template matching work

nocturne valley Oct 20, 2024, 11:50 PM

#

"data science and ai" feels like "statistics and epistemology"

serene scaffold Oct 20, 2024, 11:51 PM

#

nocturne valley "data science and ai" feels like "statistics and epistemology"

you are entirely right. I am going to memorialize your complete and total correctness.

nocturne valley Oct 20, 2024, 11:51 PM

#

"math and living"

#

lol

serene scaffold Oct 20, 2024, 11:51 PM

#

!otn a statistics and epistemology

arctic wedgeBOT Oct 20, 2024, 11:51 PM

#

:ok_hand: Added statistics-and-epistemology to the names list.

quaint mulch Oct 21, 2024, 1:21 AM

#

!otn

quaint mulch Oct 21, 2024, 1:21 AM

#

serene scaffold !otn a statistics and epistemology

what does this do?

quaint mulch Oct 21, 2024, 1:25 AM

#

earnest widget Instead of focusing on many projects, try to focus on something you are interest...

And also @versed pilot
Can you give an example of someone making a project / portfolio so good that they got many good job offers?

Like, this advice makes a lot of sense to me. And I used to give it out too. But now that I've got some experience, I'm wondering how true it is, and if it actually works. So I'm looking for evidence that this advice actually works.

left tartan Oct 21, 2024, 1:36 AM

#

quaint mulch And also <@1114957477889441822> Can you give an example of someone, making a pr...

Minecraft?

quaint mulch Oct 21, 2024, 1:42 AM

#

I mean, something more data-sciency

quaint mulch Oct 21, 2024, 1:45 AM

#

left tartan Minecraft?

According to wiki, Notch already had a job as a game dev.
I mean someone who was not already doing data science, or maybe someone who made a huge career jump from a data analyst to a data science role and tripled their pay

left tartan Oct 21, 2024, 1:49 AM

#

quaint mulch According to wiki, Notch is already have a job as a game dev. I mean, someone wh...

Jobs don't really work that way, at least not in some reproducible way.

quaint mulch Oct 21, 2024, 2:00 AM

#

left tartan Jobs don't really work that way, at least not in some reproducible way.

I know, I'm not asking for like a proper survey study with stats. I'm just asking if there are at least few anecdotal examples to support this idea.

fresh bay Oct 21, 2024, 3:05 AM

#

does pytorch geometric not have a way to represent my connections with edges in the data class?

earnest widget Oct 21, 2024, 3:50 AM

#

quaint mulch And also <@1114957477889441822> Can you give an example of someone, making a pr...

Well maybe if you’re looking at a project, it could be something where investors are ready to invest in or a big project which is open sourced and used by many. That’s something that catches a lot of eyes. But it has to be something really eye catching by tech firms or investors especially.

Or if you’re interested in the research field, then maybe some groundbreaking paper will definitely turn some eyes towards you.

quaint mulch Oct 21, 2024, 4:45 AM

#

then maybe some groundbreaking paper
...cries... (even incremental is already too hard!!!)

quaint mulch Oct 21, 2024, 4:47 AM

#

earnest widget Well maybe if you’re looking at a project, it could be something where investors...

something where investors are ready to invest in
Besides networking, how else can I figure out which people are ready to invest in what kind of capabilities, and how much money are they willing to spend?

finite narwhal Oct 21, 2024, 7:22 AM

#

any ideas why this is not working?

conda install -c conda-forge -c schrodinger pymol

or

conda install -c conda-forge -c schrodinger pymol-bundle

PackagesNotFoundError: The following packages are not available from current channels:

pymol
Current channels:

https://conda.anaconda.org/schrodinger

https://conda.anaconda.org/NEURON

https://conda.anaconda.org/conda-forge

https://conda.anaconda.org/r

https://conda.anaconda.org/bioconda
it's on the pymol webpage,

also the package is available: https://anaconda.org/schrodinger/pymol
this (listed in above link) doesn't find it either conda install schrodinger::pymol

versed pilot Oct 21, 2024, 7:56 AM

#

quaint mulch And also <@1114957477889441822> Can you give an example of someone, making a pr...

So often in interviews you will get asked about a project you worked on, you might even be asked to give a presentation on something. You can always talk about projects with previous employers but only up to a point, you can't give confidential and commercially sensitive information of the previous employer to the next employer. So a good personal project might be an alternative there. A good university project likewise, but as the years go by, you can't keep referring to your student work.
You can also present pet projects in meetups and community events which usually are good networking opportunities.

finite narwhal Oct 21, 2024, 7:59 AM

#

finite narwhal any ideas why this is not working? ``` conda install -c conda-forge -c schroding...

Ok it was the architecture, passing --platform osx-64 to micromamba fixed it

#

How do you all resolve dependency problems? I'm trying to get a simple example with a recent version of pymol and rdkit and running into a quagmire of dependency incompatibilities. I don't have a preference for python version etc. just get that basic example running.

#

added an env.yaml file but can't seem to get the versions right for it to work

#

ok specifying no versions helps, then I can copy the compatible versions found

#

what but the versions chosen are mega old, like python 3.6, pymol 2.3.5, rdkit from 2018 !??

versed pilot Oct 21, 2024, 8:05 AM

#

Python 3.6 is no longer supported I think

finite narwhal Oct 21, 2024, 8:06 AM

#

does the automatic solver (when I don't specify any versions) pick the most recent possible? or what is it doing

#

and if not how do I find otherwise the compatible versions, there's a ton of dependencies and I can't possibly figure out what goes with what

#

I mean sub-dependencies of pymol and rdkit

scarlet anchor Oct 21, 2024, 2:49 PM

#

Hi, I want to use a good AI model like Llama or any other to generate synthetic data. Specifically, I want to generate synthetic data to predict sentiments in Indian languages. Are there good ones I can use?

serene scaffold Oct 21, 2024, 3:02 PM

#

scarlet anchor Hi, I want to use any good AI model like LLama or any other to generate syntheti...

llama is a generative language model. you're trying to predict sentiments, which is not a form of text generation.
I would look into sentiment analysis techniques that don't require a pre-trained model.

scarlet anchor Oct 21, 2024, 3:05 PM

#

serene scaffold llama is a generative language model. you're trying to predict sentiments, which...

Uhhh, no, I want to use llama only to generate synthetic data to feed to my model, which predicts the sentiments

#

Due to lack of data, I want to generate synthetic data

#

https://distilabel.argilla.io/latest/sections/how_to_guides/basic/task/

Tasks for generating and judging with LLMs - Distilabel Docs

Distilabel is an AI Feedback (AIF) framework for building datasets with and for LLMs.

serene scaffold Oct 21, 2024, 3:05 PM

#

ah right. I see.

scarlet anchor Oct 21, 2024, 3:05 PM

#

for now, I am trying this

serene scaffold Oct 21, 2024, 3:06 PM

#

which languages specifically?

scarlet anchor Oct 21, 2024, 3:06 PM

#

serene scaffold which languages specifically?

Hindi, Kannada, Tamil, Telugu - Indian languages

serene scaffold Oct 21, 2024, 3:08 PM

#

scarlet anchor Hindi, Kannada, Tamil, Telugu - Indian Lanugages

Do you speak all four of these?
I just asked llama3-70b to generate some text in Hindi. I don't know what it means or if it's correct.

फ़ीस बुक बिल्डिंग में मौजूदा संकाय सदस्य, कर्मचारी और छात्र, साथ ही साथ अकादमिक अध्यक्ष और डीन अपने आप से तालमेल करके इमारत की इस मंज़िल पर उपलब्ध हैं।

scarlet anchor Oct 21, 2024, 3:08 PM

#

serene scaffold Do you speak all four of these? I just asked llama3-70b to generate some text in...

Hmmm. no, I only know Kannada

scarlet anchor Oct 21, 2024, 3:08 PM

#

serene scaffold Do you speak all four of these? I just asked llama3-70b to generate some text in...

Existing faculty members, staff and students in the Fee Book Building, as well as Academic Chairmen and Deans are available to coordinate on this floor of the building.

#

translation ^

serene scaffold Oct 21, 2024, 3:09 PM

#

scarlet anchor Hmmm. no, I only know Kannada

ಮಾನವೀಯ ಶಿಕ್ಷಣದ ಪ್ರಮುಖ ಉದ್ದೇಶವೆಂದರೆ ಸರ್ವಾಂಗೀಣ ಅಭಿವೃದ್ಧಿಯನ್ನು ಸಾಧಿಸಲು ನಿರ್ಮಲ ಮನಸ್ಸನ್ನು ಮತ್ತು ಸುಶಿಕ್ಷಿತ ಮನಸ್ಸನ್ನು ಬೆಳೆಸುವುದು.

#

how is that?

scarlet anchor Oct 21, 2024, 3:09 PM

#

serene scaffold > ಮಾನವೀಯ ಶಿಕ್ಷಣದ ಪ್ರಮುಖ ಉದ್ದೇಶವೆಂದರೆ ಸರ್ವಾಂಗೀಣ ಅಭಿವೃದ್ಧಿಯನ್ನು ಸಾಧಿಸಲು ನಿರ್ಮಲ ಮನಸ...

The main objective of humanitarian education is to cultivate a pure mind and a well-educated mind to achieve all-round development.

serene scaffold Oct 21, 2024, 3:10 PM

#

you'll need to confirm with fluent speakers of each language that the LLM reliably produces correct text.
neural machine translators might still produce correct-sounding translations, even if the original text contains mistakes that a fluent speaker wouldn't make.

scarlet anchor Oct 21, 2024, 3:10 PM

#

👍

#

yes thanks

serene scaffold Oct 21, 2024, 3:10 PM

#

also, llama3-70b takes a huge amount of GPU space. if you don't have an enterprise compute environment, you will need to pay for one. or buy some API credits.

scarlet anchor Oct 21, 2024, 3:10 PM

#

yes

scarlet anchor Oct 21, 2024, 3:12 PM

#

serene scaffold also, llama3-70b takes a huge amount of GPU space. if you don't have an enterpri...

I am still a student, so cannot purchase anything yet 😅

serene scaffold Oct 21, 2024, 3:13 PM

#

scarlet anchor I am still a student, so cannot purchase anything yet 😅

there is no way around this.

#

you can ask your university if they have a compute environment that you can use

#

but you can't run llama3-70b on your own computer, and no one is going to give you enough compute to do it for free. except maybe a university that you belong to.

scarlet anchor Oct 21, 2024, 3:14 PM

#

👍

narrow merlin Oct 21, 2024, 3:29 PM

#

serene scaffold but you can't run llama3-70b on your own computer, and no one is going to give y...

currently you got groq giving it to you for free

scarlet anchor Oct 21, 2024, 3:29 PM

#

narrow merlin currently you got groq giving it to you for free

free?

narrow merlin Oct 21, 2024, 3:29 PM

#

yes free

scarlet anchor Oct 21, 2024, 3:29 PM

#

https://console.groq.com/playground this?

GroqCloud

Experience the fastest inference in the world

narrow merlin Oct 21, 2024, 3:29 PM

#

yes

#

just register and start using it, all free. they show you what it WOULD cost if they activated payment, which just makes it even cooler, cause it's fractions of PENNIES what that stuff costs overall.

#

and sambanova.ai is also free, there you even get the 405b bomber

#

but i tell you: big models are not your solution

scarlet anchor Oct 21, 2024, 3:39 PM

#

👍

narrow merlin Oct 21, 2024, 3:42 PM

#

oh right, and huggingface also allows some usage of the inference api within certain rate limits. i am just not sure if you can make it work for the 405b, but i think 70b should work somewhere. i am just sooooo much not getting their complete product world yet

jaunty helm Oct 21, 2024, 4:02 PM

#

scarlet anchor Hindi, Kannada, Tamil, Telugu - Indian Languages

gemma might be a better fit for multilingual

scarlet anchor Oct 21, 2024, 4:13 PM

#

jaunty helm gemma might be a better fit for multilingual

sadly that didn't work either!!

My strength lies in understanding and generating text in English. Creating grammatically sound sentences in another language requires a deep understanding of its rules and nuances, which I don't currently possess.

narrow merlin Oct 21, 2024, 4:14 PM

#

well gemma knows a lot of languages, but it's not specifically made for languages. it's generally a good model though, and i hope you use gemma2 😉

#

if you want the full blown language stuff you can use aya, but aya is really bad at everything else 😄

scarlet anchor Oct 21, 2024, 4:15 PM

#

Hmm I do use gemma for other purposes ofc

scarlet anchor Oct 21, 2024, 4:15 PM

#

narrow merlin if you want the full blown language stuff you can use aya, but aya is really...

ah

#

what's aya?

narrow merlin Oct 21, 2024, 4:15 PM

#

https://ollama.com/library/aya

aya

Aya 23, released by Cohere, is a new family of state-of-the-art, multilingual models that support 23 languages.

#

actually i think that's even outdated by now hahaha

scarlet anchor Oct 21, 2024, 4:15 PM

#

narrow merlin https://ollama.com/library/aya

it can generate multilingual data?

#

narrow merlin Oct 21, 2024, 4:16 PM

#

oh right qwen was the one

#

https://ollama.com/library/qwen2.5

qwen2.5

Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model supports up to 128K tokens and has multilingual support.

#

but yeah, it always depends on what you wanna do with the language

scarlet anchor Oct 21, 2024, 4:16 PM

#

yess

narrow merlin Oct 21, 2024, 4:16 PM

#

testing testing testing

#

chainforge

scarlet anchor Oct 21, 2024, 4:16 PM

#

haha ye

bright garden Oct 21, 2024, 4:32 PM

#

Was just profiling my PyTorch Lightning code. Does anyone know why configure_optimizers() is called 4-6 thousand times?

#

I have a very standard PyTorch Lightning module with a configure_optimizers method defined

#

Does this have anything to do with the learning rate scheduler?

serene scaffold Oct 21, 2024, 6:06 PM

#

can someone help out in #1297970307747020850? it's a pretty simple question about numpy usage. I need to run an errand.

dapper berry Oct 21, 2024, 6:24 PM

#

bright garden Was just profiling my PyTorch Lightning code. Does anyone know why `configure_op...

cumtime i cant act mature

pastel frost Oct 21, 2024, 7:17 PM

#

Hi, I'd hate to fill the chat with absolute newbie questions. I finished all my gen ed classes at community college and transferred to another college for my data science major. Brand new to coding and coding classes! If anyone can help with study methods or anything data science career related, please let me know, or PM me, since I'm really new. I also have a really big test on functions/tuples/lists next Monday that I'm super nervous about! So, big SOS 😄

vestal spruce Oct 21, 2024, 8:00 PM

#

Hi, quick question I want to use NLTK for sentiment analysis but the language dataset I'm using is not English, do I need to find and use a local language tokenizer/preprocessing so it can provide accurate results?

serene scaffold Oct 21, 2024, 8:02 PM

#

vestal spruce Hi, quick question I want to use NLTK for sentiment analysis but the language da...

what language?

serene scaffold Oct 21, 2024, 8:03 PM

#

pastel frost Hi, I'd hate to fill the chat with absolute newbie questions. I finished all my ...

hello and welcome to our wonderful data science chat. sounds like your functions/tuples/list test is about python fundamentals, not data science. so you can just do any beginner python exercises.

pastel frost Oct 21, 2024, 9:11 PM

#

serene scaffold hello and welcome to our wonderful data science chat. sounds like your functions...

Thanks man, sorry to bother the chat i wasnt sure where to post.

serene scaffold Oct 21, 2024, 11:52 PM

#

pastel frost Thanks man, sorry to bother the chat i wasnt sure where to post.

you are not bothering the chat. this is what we are here for.

rich moth Oct 21, 2024, 11:52 PM

#

I’m having an issue with shape mismatch during the validation phase of my model. The input data includes fixed variables like ['a1', 'an', 'ak', 'n', 'd', 'Sn', 'k'], which I’m using to solve arithmetic progression formulas. Here's the code and an example. https://paste.pythondiscord.com/HT6Q

```python
import sympy as sp

FIXED_VARIABLES = ['a1', 'an', 'ak', 'n', 'd', 'Sn', 'k']  # Added 'Sn' and 'k'

# Define symbolic variables based on FIXED_VARIABLES
a1, an, ak, n, d, Sn, k = sp.symbols(FIXED_VARIABLES)

# Key AP formulas using fixed variables
AP_FORMULAS = [
    sp.Eq(an, a1 + (n - 1) * d),  # an = a1 + (n-1)d
    sp.Eq(a1, an - (n - 1) * d),  # Rearranged formula to solve for a1
    sp.Eq(ak, a1 + (k - 1) * d),  # ak = a1 + (k-1)d
    sp.Eq(an, (2 * Sn) / n - a1),  # Derived from Sn = (n/2) * (a1 + an)
    sp.Eq(d, (an - a1) / (n - 1)),  # d = (an - a1) / (n-1)
    sp.Eq(Sn, (n / 2) * (2 * a1 + (n - 1) * d)),  # Sum formula Sn = (n/2) * (2*a1 + (n-1)*d)
    sp.Eq(Sn, (n / 2) * (a1 + an)),  # Sn = (n/2) * (a1 + an), alternative form
]
```

```
Error in predict_formula: mat1 and mat2 shapes cannot be multiplied (1x6 and 5x256)
Input tensor shape before passing to model: torch.Size([1, 6])
2024-10-21 16:43:12,504 INFO:Generated 0 valid novel formulas.
2024-10-21 16:43:12,504 INFO:Testing model with an example...
Input tensor shape before passing to model: torch.Size([1, 6])
2024-10-21 16:43:12,505 ERROR:Error in predict_formula: mat1 and mat2 shapes cannot be multiplied (1x6 and 5x256)
2024-10-21 16:43:12,505 INFO:Scenario: {a1: 5.0, n: 10.0, d: 3.0}
2024-10-21 16:43:12,505 INFO:Predicted Formula: None
2024-10-21 16:43:12,505 INFO:Main training pipeline completed.
```


The model trains fine, but at the start of the formula validation I get the mismatch error above. During training the input size is 5 features, but during the validation phase the input shape becomes [1, 6], likely due to the formulas or how the input data is structured at that point.
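
The error message says the first layer expects 5 input features (the `5x256` weight) while the validation tensor has 6. One way to guard against this kind of drift is to build every input vector, for training and validation alike, from a single fixed feature order. A minimal sketch, assuming string keys and a hypothetical `FEATURE_ORDER` list (the real five training features are not shown in the message above):

```python
# Hypothetical: the 5 features the model was trained on, in a fixed order.
FEATURE_ORDER = ["a1", "an", "n", "d", "Sn"]


def encode_scenario(scenario: dict, feature_order=FEATURE_ORDER) -> list:
    """Map a scenario dict to a fixed-length vector; unknown values become 0.0."""
    unexpected = set(scenario) - set(feature_order)
    if unexpected:
        # Fail early instead of silently producing a vector of the wrong width.
        raise ValueError(f"Unexpected features: {sorted(unexpected)}")
    return [float(scenario.get(name, 0.0)) for name in feature_order]


vector = encode_scenario({"a1": 5.0, "n": 10.0, "d": 3.0})
print(len(vector), vector)
```

With this, the vector length is always `len(FEATURE_ORDER)`, so a mismatch shows up as a clear `ValueError` at encoding time instead of a matrix-multiplication error inside the model.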

tawdry sundial Oct 22, 2024, 12:18 AM

#

how can i improve my model accuracy?

#

#

I want to improve my RandomForestRegressor accuracy

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Preprocessing for numerical data
numerical_transformer = Pipeline(steps=[
    ("Scaler", StandardScaler()),
    ("Imputer", SimpleImputer(strategy="constant")),
])

# Preprocessing for categorical data
categorical_transformer = Pipeline(steps=[
    ("Imputer", SimpleImputer(strategy="most_frequent")),
    ("OHE", OneHotEncoder(handle_unknown="ignore")),
])

# Bundle preprocessing for numerical and categorical data
# (numerical_cols and categorical_cols are assumed to be defined earlier)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])

# Define model
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Check your answer
step_1.a.check()
```

#

this is currently my pipeline

rich moth Oct 22, 2024, 2:13 AM

#

rich moth I’m having an issue with shape mismatch during the validation phase of my model ...

I figured it out, i wasn't creating the synthetic data properly. It was causing issues downstream.

rich moth Oct 22, 2024, 2:58 AM

#

2024-10-21 21:07:48,134 INFO:Attempt 856: Valid data point generated: {'n': 65.44522712125084, 'a1': 2.0614216909210743, 'd': 22.104379657779848, 'an': 1426.5831891109}
2024-10-21 21:07:48,273 INFO:Attempt 857: Invalid target value: None. Skipping.
2024-10-21 21:07:48,413 INFO:Attempt 858: Invalid target value: None. Skipping.
2024-10-21 21:07:48,611 INFO:Valid solutions for Eq(Sn, n*(2*a1 + d*(n - 1))/2): [1.27052504310215]
2024-10-21 21:07:48,611 INFO:Attempt 859: Valid data point generated: {'Sn': 69.45588675795838, 'd': 71.04840790543672, 'a1': 45.05688735810755, 'n': 1.2705250431021455}
2024-10-21 21:07:48,780 INFO:Attempt 860: Invalid target value: None. Skipping.
2024-10-21 21:07:48,878 INFO:Valid solutions for Eq(an, 2*Sn/n - a1): [1.76248570047424]
2024-10-21 21:07:48,878 INFO:Attempt 861: Valid data point generated: {'Sn': 82.19978323777535, 'an': 22.98209692884189, 'a1': 70.29500962091075, 'n': 1.7624857004742391}
2024-10-21 21:07:48,914 INFO:Valid solutions for Eq(an, a1 + d*(n - 1)): [1.02059088463249]
2024-10-21 21:07:48,914 INFO:Attempt 862: Valid data point generated: {'an': 61.113350388683386, 'n': 38.768650571295055, 'a1': 22.567009890750036, 'd': 1.020590884632487}

Woohoo!

rich moth Oct 22, 2024, 3:44 AM

#

i'm using yoshitomo-matsubara/srsd-feynman_easy from HF. it contains physics equations and various variables, and I'm trying to discover new relationships between them. Here's a correlation heatmap I generated to analyze how the features in the dataset interact with each other.

rigid cape Oct 22, 2024, 5:32 AM

#

Hey people. How do I improve the results of my RAG project. I'm not getting good results from my direct retrieval from the vector database. Any resources or article links also would be great .

past meteor Oct 22, 2024, 5:55 AM

#

rigid cape Hey people. How do I improve the results of my RAG project. I'm not getting good...

Maybe the questions are just unsuitable for a RAG

rich moth Oct 22, 2024, 5:56 AM

#

rigid cape Hey people. How do I improve the results of my RAG project. I'm not getting good...

Let's see what you're working with.

past meteor Oct 22, 2024, 5:56 AM

#

Things like “summarise my documents” will not work

rich moth Oct 22, 2024, 5:58 AM

#

past meteor Things like “summarise my documents” will not work

I summarize conversations for short term memory. Guess it depends on what you're doing with it.

past meteor Oct 22, 2024, 5:59 AM

#

Yea, but that’s not summarising your documents

#

That will obviously work

rich moth Oct 22, 2024, 5:59 AM

#

Well it gets embedded into the index.

past meteor Oct 22, 2024, 6:00 AM

#

The problem is just that some questions don’t really result in a retrievable set of documents, that’s what I meant

rich moth Oct 22, 2024, 6:00 AM

#

rigid cape Hey people. How do I improve the results of my RAG project. I'm not getting good...

Anyways, let's see what you're trying to do, paste the code with !paste.

past meteor Oct 22, 2024, 6:00 AM

#

So I’m more interested in what questions he wants to ask

#

Not necessarily the code, depending on the questions you want to ask it’s already a dead end and you should consider other things than rags

#

Like, imagine you have a recipe book. You embed each recipe separately. You shouldn’t ask “give me a recipe”, your result will be pretty much arbitrary

rigid cape Oct 22, 2024, 6:22 AM

#

So I'm making a RAG application in which I've used some of my textbooks and previous years' question papers as the data. I need to get questions related to a topic whenever I ask about that particular topic

#

If I give it a broad chapter name, it does give me some questions related to the chapter that have the name in them. But if I give it some specific topic, it fails. I'm testing various ways to adjust my prompt and retrieval mechanisms.

rich moth Oct 22, 2024, 6:26 AM

#

You need to embed your data into the vector database and use a similarity function to compare the vectors in order to perform efficient nearest neighbor searches. Did you already decide on your framework?
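
To make the "similarity function plus nearest neighbor search" idea concrete, here is a tiny brute-force version in plain Python. The three-dimensional vectors and chunk names are invented for illustration; a real setup would use the embedding model's vectors and an index such as FAISS instead of a dict:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (score, doc_id) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


# Toy "vector database": made-up embeddings for three text chunks.
index = {
    "chunk-algebra": [0.9, 0.1, 0.0],
    "chunk-calculus": [0.1, 0.9, 0.2],
    "chunk-history": [0.0, 0.1, 0.9],
}

results = top_k([0.2, 0.8, 0.1], index, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

Printing the retrieved chunks with their scores like this is also a quick way to see whether a bad answer comes from retrieval or from generation.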

rigid cape Oct 22, 2024, 6:26 AM

#

Maybe RAG isn't suited for this ? I don't know , just gave it a try

rigid cape Oct 22, 2024, 6:26 AM

#

rich moth You need to embed your data into the vector database and use a similarity functi...

I did embed it into a faiss vector database

rich moth Oct 22, 2024, 6:27 AM

#

rigid cape I did embed it into a faiss vector database

The problem revolves somewhere around how you indexed or are retrieving the data from FAISS. What are you using to generate the embeddings?

rigid cape Oct 22, 2024, 6:28 AM

#

rich moth The problem revolves somewhere around how you indexed or are retrieving the data from...

Google Gemini API

rich moth Oct 22, 2024, 6:28 AM

#

Is it just text? English?

#

I'd recommend a sentence transformer or BERT, I've had lots of success with both of them

rigid cape Oct 22, 2024, 6:29 AM

#

rich moth Is it just text? English?

Yeah it's English , mostly text but with some math with it. I did use bert but it failed with the math part

#

I didn't convert the math to LaTeX but kept the UTF-8 symbols as they are (summation, integration, etc.). I was too lazy to do that. Maybe that could have caused some problems

rich moth Oct 22, 2024, 6:32 AM

#

rigid cape Yeah it's English , mostly text but with some math with it. I did use bert but i...

data processing and cleaning is key in a lot of AI stuff.

#

It could be your data too, have you tried to view it?

rigid cape Oct 22, 2024, 6:34 AM

#

rich moth It could be your data too, have you tried to view it?

Yeah , tried it . I stored the data chunks in a SQL database with references to page numbers , book names etc .

#

The problem I get is usually in the retrieval part .

#

I let my code usually print out the text it retrieves and then use it for the generated output

#

I guess I'll try re-encoding my math into latex . I'm still trying out various prompts . Hope it gets better.

rich moth Oct 22, 2024, 6:39 AM

#

rigid cape I guess I'll try re-encoding my math into latex . I'm still trying out various p...

Good luck!

rigid cape Oct 22, 2024, 6:40 AM

#

rich moth Good luck!

Yeah thank you for your help too

rich moth Oct 22, 2024, 7:20 AM

#

I think my synthetic data set is finally ready. wow what a pain in the ass. let me tell you.

past meteor Oct 22, 2024, 9:50 AM

#

rigid cape If I give it a broad chapter name , it does give me some questions related to th...

Maybe you could try different strategies like adding metadata to your db and filtering and/or doing a hybrid search and reranking
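
A rough sketch of what "metadata filtering plus hybrid search" could look like, in plain Python. Everything here is invented for illustration: the chunk dicts, the `chapter` metadata field, the precomputed `vector_scores`, and the simple word-overlap keyword score. A proper reranker (for example a cross-encoder) is not shown; the weighted sum only stands in for the ranking step:

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that literally appear in the chunk."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def hybrid_search(query, chunks, vector_scores, chapter=None, alpha=0.5, k=2):
    """Filter by metadata, then rank by a weighted mix of vector and keyword scores."""
    candidates = [c for c in chunks if chapter is None or c["chapter"] == chapter]
    ranked = sorted(
        candidates,
        key=lambda c: alpha * vector_scores[c["id"]]
        + (1 - alpha) * keyword_score(query, c["text"]),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    {"id": 1, "chapter": "Integration", "text": "Evaluate the definite integral by parts"},
    {"id": 2, "chapter": "Integration", "text": "Find the area under the curve"},
    {"id": 3, "chapter": "Series", "text": "Sum the arithmetic series by parts"},
]
# Made-up similarities, standing in for what the vector index would return.
vector_scores = {1: 0.60, 2: 0.65, 3: 0.70}

hits = hybrid_search("integral by parts", chunks, vector_scores, chapter="Integration")
print([c["id"] for c in hits])
```

In this toy data the vector score alone would put chunk 3 first; the chapter filter and the keyword overlap are what pull the specific "integral by parts" question to the top.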

#

Ultimately, it’s just important you know what you want to do first and then you can see if your approach is fit for purpose

rigid cape Oct 22, 2024, 10:30 AM

#

past meteor Ultimately, it’s just important you know what you want to do first and then you ...

Yeah doing it

unkempt apex Oct 22, 2024, 1:23 PM

#

anyone have exp with React + FastAPI?

#

getting some issues while deploying it

warm girder Oct 22, 2024, 1:51 PM

#

From where do i start as a beginner for data science?

serene scaffold Oct 22, 2024, 1:58 PM

#

warm girder From where do i start as a beginner for data science?

!resources data science

arctic wedgeBOT Oct 22, 2024, 1:58 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold Oct 22, 2024, 1:58 PM

#

unkempt apex anyone have exp with React + FastAPI?

Hello, this is a #web-development question.
remember to always ask your actual question. don't say you're "getting an issue". say what the issue is and give the information someone would need to help you.

past meteor Oct 22, 2024, 2:04 PM

#

unkempt apex anyone have exp with React + FastAPI?

You should ask your question in full instead of asking if we know X, Y or Z

serene scaffold Oct 22, 2024, 2:05 PM

#

past meteor You should ask your question in full instead of asking if we know X, Y or Z

unkempt apex Oct 22, 2024, 2:13 PM

#

past meteor You should ask your question in full instead of asking if we know X, Y or Z

my bad

#

getting issues with deploying FastAPI + React webapp on vercel

#

just reply and I will send the logs

past meteor Oct 22, 2024, 2:32 PM

#

unkempt apex just reply and I will send the logs

Send the logs.

unkempt apex Oct 22, 2024, 2:34 PM

#

past meteor Send the logs.

okay so on vercel
it says

as per the attached ss

although the build is complete in react ( I mean in frontend/ dir )

past meteor Oct 22, 2024, 2:35 PM

#

unkempt apex okay so on vercel it says as per the attached ss although the build is complet...

Seems like this doesn’t really have to do with data science or Python but what do you mean with “the build is complete in react”

#

Are you using vite?

unkempt apex Oct 22, 2024, 2:35 PM

#

past meteor Seems like this doesn’t really have to do with data science or Python but what d...

after npm run build

past meteor Oct 22, 2024, 2:36 PM

#

Are you planning on fastapi serving the static files?

unkempt apex Oct 22, 2024, 2:36 PM

#

it is now ready to be deployed

unkempt apex Oct 22, 2024, 2:36 PM

#

past meteor Are you planning on fastapi serving the static files?

I have done that already

#

wanna see?

past meteor Oct 22, 2024, 2:37 PM

#

I haven’t used vercel yet but if I were you I’d have CI/CD on GitHub actions where the CI runs npm run build, puts it in your static folder and the CD deploys your Python project like any normal one

unkempt apex Oct 22, 2024, 2:39 PM

#

past meteor I haven’t used vercel yet but if I were you I’d have CI/CD on GitHub actions whe...

ohh that's interesting, any article explaining this?

past meteor Oct 22, 2024, 2:43 PM

#

Not that I know of but it should be pretty easy 😄

I’d start with making 2 bash scripts, call one CI and one CD.

In the CI bash script you basically build your project and potentially mv the files to the right place (Python’s static folder).

In your CD bash script you use vercel cli to deploy your Python project.

After you have these show them to ChatGPT and figure out how to turn them into a GitHub actions yaml
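
The CI half of that idea, sketched in Python rather than bash to stay with the channel's language. The folder layout and function names are hypothetical, and the build output folder depends on the tooling (Create React App writes to `build`, Vite to `dist`). The CD step is left out because it depends on the hosting provider's CLI:

```python
import shutil
import subprocess
from pathlib import Path


def build_frontend(frontend_dir: Path) -> None:
    """CI step 1: run the React build (requires npm on PATH)."""
    subprocess.run(["npm", "run", "build"], cwd=frontend_dir, check=True)


def copy_build_to_static(build_dir: Path, static_dir: Path) -> None:
    """CI step 2: replace the backend's static folder with the fresh build."""
    if static_dir.exists():
        shutil.rmtree(static_dir)
    shutil.copytree(build_dir, static_dir)


def run_ci(frontend_dir: Path, static_dir: Path, build_subdir: str = "build") -> None:
    """Build the frontend, then copy its output where the Python app serves static files."""
    build_frontend(frontend_dir)
    copy_build_to_static(frontend_dir / build_subdir, static_dir)
```

A call such as `run_ci(Path("frontend"), Path("backend/static"))` would then be the single command a GitHub Actions job runs before the deploy step.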

#

Although, you’re the first person I know to run Python on vercel

unkempt apex Oct 22, 2024, 2:48 PM

#

past meteor Not that I know of but it should be pretty easy 😄 I’d start with making a 2 b...

ohh that's nice

unkempt apex Oct 22, 2024, 2:48 PM

#

past meteor Although, you’re the first person I know to run Python on vercel

yeah I mean I have uploaded like 3 webapps already on vercel
u just have to select an older version of node on vercel and it works

unkempt apex Oct 22, 2024, 2:53 PM

#

past meteor Not that I know of but it should be pretty easy 😄 I’d start with making a 2 b...

I think I will do this with node js now
bcz vercel works fine with node

signal whale Oct 22, 2024, 3:27 PM

#

#1298301107726061568

#

wondering if the way i am implementing np.log1p is accepted, because this way my results are 10x better
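
Without seeing the code in the linked thread, one thing worth checking with a log1p-transformed target is which scale the metric is computed on. An error measured in log space is much smaller by construction than the same error in original units, so the two are not comparable. A small illustration with made-up numbers, using the standard library's `math.log1p`/`math.expm1` (the scalar counterparts of `np.log1p`/`np.expm1`):

```python
import math

y_true = [10.0, 200.0, 3000.0]
# Pretend model output, already in log1p space.
y_pred_log = [math.log1p(12.0), math.log1p(180.0), math.log1p(2500.0)]

# Error measured in log space (small numbers by construction).
mae_log = sum(abs(math.log1p(t) - p) for t, p in zip(y_true, y_pred_log)) / len(y_true)

# Error measured after undoing the transform with expm1 (original units).
y_pred = [math.expm1(p) for p in y_pred_log]
mae_original = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"MAE in log space:      {mae_log:.3f}")
print(f"MAE in original units: {mae_original:.1f}")
```

If the "10x better" number was computed on the transformed target, inverting the predictions with `expm1` first gives the figure that can be compared with the untransformed model.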

upbeat prism Oct 22, 2024, 3:55 PM

#

Hi, I implemented my own transformer based on the attention is all you need paper and it works. Now I wanna do the following classification task using the encoder block only. Given a list of 10 numbers, each between 1 and 20 I wanna ask if the first element in the list is repeated. E.g. [1,2,3,4,5,6,7,8,9,0] -> False but [1,2,3,4,5,6,1,8,9,0] -> True.

So now I need a ClassifierHead. Basically my output of my encoder block is of dim (batch_size,sequence_length,d_model).

So basically I need to go from (sequence_length, d_model) to (2,)

How do I do that? I know I can reduce one dimension with a linear layer but it's still a 2D input.

#

for this specific task, I think I should basically make a linear map from (batch_size, 0, d_model) to (batch_size, 2) i.e. focusing on the first token but I'm a bit unsure how to argue for that.

I guess my question boils down to: What's the actual output of the encoder?

#

of course, I could also just use the (normalized) logits and feed it to a loss function?

#

(one that already includes sigmoid/softmax, like BCEWithLogitsLoss or CrossEntropyLoss from torch)
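
One common way to go from (batch_size, sequence_length, d_model) to (batch_size, 2) is to take the encoder output at position 0 and pass it through a single linear layer, the same idea as the [CLS] token in BERT-style classifiers. Because self-attention lets position 0 attend to every other position, its output vector can in principle encode whether the first element reappears. A minimal sketch assuming PyTorch; the class name and sizes are made up, and a random tensor stands in for the real encoder output:

```python
import torch
import torch.nn as nn


class FirstTokenClassifierHead(nn.Module):
    """Linear classifier applied to the encoder output of the first token."""

    def __init__(self, d_model: int, num_classes: int = 2):
        super().__init__()
        self.proj = nn.Linear(d_model, num_classes)

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch_size, sequence_length, d_model)
        first_token = encoder_out[:, 0, :]  # (batch_size, d_model)
        return self.proj(first_token)       # raw logits, (batch_size, num_classes)


torch.manual_seed(0)
batch_size, seq_len, d_model = 4, 10, 32
encoder_out = torch.randn(batch_size, seq_len, d_model)  # stand-in for the encoder

head = FirstTokenClassifierHead(d_model)
logits = head(encoder_out)

labels = torch.tensor([0, 1, 0, 1])          # 1 = first element is repeated
loss = nn.CrossEntropyLoss()(logits, labels)  # applies log-softmax internally
print(logits.shape, loss.item())
```

The head returns raw logits on purpose: `nn.CrossEntropyLoss` (and `nn.BCEWithLogitsLoss` for a single-logit variant) apply the normalization themselves, whereas `nn.BCELoss` expects probabilities that have already gone through a sigmoid. Mean-pooling over the sequence dimension is another option if the first token alone turns out not to be enough.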

mint palm Oct 22, 2024, 4:29 PM

#

full padding in convT

#

what does it do??

vestal spruce Oct 22, 2024, 6:36 PM

#

Hi, a question about sentiment analysis: if I wanted the analysis to factor in the age of each sentiment, how do I quantify this and include it as a feature in my model?

jaunty helm Oct 23, 2024, 4:26 AM

#

How do you guys usually deal with missing values in time series?
Right now I can bin them (i.e. original data is in 1 min intervals -> round to 5 min bins) and take the mean or something, or I can try imputation, or there's some other method I'm unaware of
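
The binning option described above, as a small standard-library sketch: 1-minute samples are floored to 5-minute bins and averaged, ignoring missing readings. The function name and the sample values are made up, and the flooring assumes the bin width divides 60 minutes:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_mean(samples, bin_minutes=5):
    """Average the non-missing values falling into each fixed-width time bin."""
    buckets = defaultdict(list)
    for ts, value in samples:
        if value is None:  # skip missing readings
            continue
        floored = ts - timedelta(
            minutes=ts.minute % bin_minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        buckets[floored].append(value)
    return {ts: sum(vals) / len(vals) for ts, vals in sorted(buckets.items())}


start = datetime(2024, 10, 23, 4, 0)
values = [1.0, None, 3.0, None, None, 10.0, 12.0, None, None, None]
samples = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]

binned = bin_mean(samples)
for ts, mean in binned.items():
    print(ts.strftime("%H:%M"), mean)
```

A bin whose samples are all missing simply does not appear in the result, so binning reduces the gaps but does not remove them; those remaining gaps are where imputation would still be needed. With pandas, `resample("5min").mean()` followed by `interpolate()` covers the same two steps.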

rich moth Oct 23, 2024, 4:26 AM

#

I did it baby! Watch out Claude and Chatgpt! Im coming baby!

versed pilot Oct 23, 2024, 8:06 AM

#

serene scaffold

Nice muscle guys, what food supplements would you recommend ? 😛

dense star Oct 23, 2024, 8:19 AM

#

I have a question: is it possible with a CNN model to skip a part of an object that I'm comparing? Can I use something like OpenCV to check it and program it to extract a certain part of the object and then compare? Or use something like a reference picture, check whether it's the same error on the object, and skip it?

hardy bear Oct 23, 2024, 8:56 AM

#

rich moth I think my synthetic data set is finally ready. wow what a pain in the ass. le...

Hm interesting

cyan urchin Oct 23, 2024, 1:53 PM

#

Hello, good day! I would like to ask if some good people here know any fast audio noise cancellation algorithm that will be used for real-time processing.

oblique isle Oct 23, 2024, 2:05 PM

#

hello guys, what are the best platforms where i can really prepare for my ML interview

#

theory + programming

serene scaffold Oct 23, 2024, 2:09 PM

#

oblique isle hello guys , wht are the best plateform where i can really prepare my ML intervi...

you shouldn't need to do lots of extra learning in addition to the education and experience that you got leading up to the interview. you probably won't retain it all anyway. they want to talk to you because they think the material on your resume is relevant to what they'd need you to do.

practice talking clearly and confidently about items on your resume. what projects did you contribute to, and what were your specific contributions?

oblique isle Oct 23, 2024, 2:11 PM

#

serene scaffold you shouldn't need to do lots of extra learning in addition to the education and...

but there is a technical interview about ML theory and programming

serene scaffold Oct 23, 2024, 2:11 PM

#

oblique isle but there is a technical interview about ML theory and programming

yes. you should already know how to answer all the technical ML theory and programming questions they might ask you. Cramming in the last few days probably won't make a difference.

#

which is why my recommendation is to practice talking clearly and confidently about your resume items.

oblique isle Oct 23, 2024, 2:12 PM

#

serene scaffold which is why my recommendation is to practice talking clearly and confidently ab...

Thanks man! you've got a point!! ❤️

serene scaffold Oct 23, 2024, 2:13 PM

#

you've probably spent months or years in school up to this point learning this kind of material. how much of a difference will three days of cramming really make?

#

(reasonable people can disagree with me on this, but that's my opinion.)

oblique isle Oct 23, 2024, 2:15 PM

#

well i don't agree with you completely, but there are some points where i do agree with you, and i got your point. and for real i appreciate your effort in responding!! ❤️

#

that was helpful

serene scaffold Oct 23, 2024, 2:16 PM

#

you are welcome pepefedora

vestal spruce Oct 23, 2024, 4:01 PM

#

Hi is anyone familiar with TextBlob? I just started learning how to use it and I'm wondering if the package includes stop words in its model?

#

Ok I just looked it up apparently textblob doesn't have stopwords, so I guess adding nltk is necessary for this preprocessing task :/

rich moth Oct 23, 2024, 8:30 PM

#

This link has a lot of information covering machine learning, if anyone is interested. https://www.cs.ubc.ca/~schmidtm/Courses/LecturesOnML/

100 Lectures on Machine Learning (Mark Schmidt)

Lecture slides from courses taught by Mark Schmidt at UBC

rich moth Oct 23, 2024, 9:52 PM

#

I had this crazy idea of using AI to predict prime numbers, and guess what? It actually works! lol so I trained 3 different ML models on big dataset of primes, and then combined their predictions for accuracy. It even generates an Ulam spiral to visualize the patterns!

serene scaffold Oct 23, 2024, 9:57 PM

#

rich moth I had this crazy idea of using AI to predict prime numbers, and guess what? It a...

How did you do it?

rich moth Oct 23, 2024, 10:02 PM

#

serene scaffold How did you do it?

So after I was reading about the Riemann Hypothesis, I had this weird thought. What if instead of trying to PROVE patterns in primes, we tried to PREDICT them using AI?

brave stream Oct 23, 2024, 10:44 PM

#

Hi, does anybody have any experience using the Magenta library? I'm attempting to work on an audio synthesis project using audio files/spectrograms and am trying to follow Magenta's guides on installation/implementation, but it doesn't seem to be properly supported by Google Colab anymore, or their guides just aren't current. Are there known workarounds/forks/repositories that account for this?

rich moth Oct 23, 2024, 11:25 PM

#

INFO:__main__:Generating primes...
INFO:__main__:Generated 78498 primes
INFO:__main__:Training quantum ensemble...
INFO:__main__:Extracting features for training...
INFO:__main__:Training with feature shape: torch.Size([78497, 6])
INFO:__main__:Epoch 0, Average Loss: 49.3643
INFO:__main__:Epoch 10, Average Loss: 1.6099
INFO:__main__:Epoch 20, Average Loss: 1.2289
INFO:__main__:Epoch 30, Average Loss: 1.0271
INFO:__main__:Epoch 40, Average Loss: 0.8719
INFO:__main__:Epoch 50, Average Loss: 0.8199
INFO:__main__:Epoch 60, Average Loss: 0.8696
INFO:__main__:Epoch 70, Average Loss: 0.6831
INFO:__main__:Epoch 80, Average Loss: 0.6840
INFO:__main__:Epoch 90, Average Loss: 0.7866
INFO:__main__:Training Random Forest...
INFO:__main__:Making predictions...
100%|██████████| 99/99 [00:06<00:00, 14.41it/s]
INFO:__main__:Mean Absolute Error: 10.0802

Im generating 5,761,455 primes now, Ill share the results with ya guys when its done

rich moth Oct 23, 2024, 11:43 PM

#

From a security perspective, what implications might this have if an AI could reliably predict prime number patterns?

serene scaffold Oct 24, 2024, 12:05 AM

#

rich moth From a security perspective, what implications might this have if an AI could re...

All the primes below some absurdly large number are known, so I can't imagine it would make a security difference

serene scaffold Oct 24, 2024, 12:06 AM

#

rich moth So after I was reading about the Riemann Hypothesis, I had this weird thought .W...

Sounds like an interesting thesis topic.

#

@wooden sail What is the largest known prime n for which all prime numbers lower than n are known?

#

(there might be undiscovered prime numbers that are less than the largest known prime)

rich moth Oct 24, 2024, 1:09 AM

#

Dude, I was imagining a blockchain where each new block represents a verified prime number, secured through a DAG consensus mechanism. Miners would contribute by finding new primes, I mean instead of just generating meaningless hashes, what I mean is the output of the computation is actually useful information we can build a ledger. I mean its actually contributing to science.

rich moth Oct 24, 2024, 2:34 AM

#

This ones a little more interesting.

gritty vessel Oct 24, 2024, 3:07 AM

#

Hey can I ask questions related to data processing in this channel?

rich moth Oct 24, 2024, 3:16 AM

#

gritty vessel Hey can I ask questions related to data processing in this channel?

ya, what's the worst that can happen 🙂

gritty vessel Oct 24, 2024, 3:16 AM

#

OK thanks

#

I am working on satellite data. let's say I have data from two satellites, namely l1 and l2, and I have to extract the area covered by l2 from l1

#

data in l1 is like a quadrilateral shape, and the region covered by l2 is a curved strip passing through that quadrilateral

#

Currently I am using cKDTree to extract the nearest coordinates in l1 and l2 with a tolerance of 0.057

#

cKDTree is working fine, but I wondered if there is any better approach to do this
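
For reference, a sketch of two ways to do this matching with SciPy's `cKDTree`, on invented toy coordinates (the real l1/l2 arrays are not shown here). It treats latitude/longitude as flat Euclidean coordinates, which is only an approximation over small areas:

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy (lat, lon) points: l1 is a dense grid, l2 a curved strip crossing it.
axis = np.linspace(0.0, 0.95, 20)
lat, lon = np.meshgrid(axis, axis)
l1_points = np.column_stack([lat.ravel(), lon.ravel()])
t = np.linspace(0.0, 0.95, 50)
l2_points = np.column_stack([t, t ** 2])

tree = cKDTree(l1_points)
tolerance = 0.057  # same value as above, in the same units as the coordinates

# Option 1: nearest l1 point per l2 point; misses come back with distance inf.
dist, idx = tree.query(l2_points, k=1, distance_upper_bound=tolerance)
matched = idx[np.isfinite(dist)]

# Option 2: every l1 point within the tolerance of any l2 point (the whole strip).
neighbours = tree.query_ball_point(l2_points, r=tolerance)
l1_in_strip = np.unique(np.concatenate([np.asarray(n, dtype=int) for n in neighbours]))

print(len(matched), len(l1_in_strip))
```

Option 1 gives one l1 sample per l2 sample, while option 2 returns the full band of l1 samples around the track, so which one counts as "the area covered by l2" depends on what the extracted data is used for.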

#

in this image the red line is l2 and the whole region visible on the map is the l1 data

rich moth Oct 24, 2024, 4:43 AM

#

!paste

arctic wedgeBOT Oct 24, 2024, 4:43 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

gritty vessel Oct 24, 2024, 5:07 AM

#

rich moth !paste

this is empty

#

should i share the code that i am using?

wooden sail Oct 24, 2024, 6:05 AM

#

serene scaffold @wooden sail What is the largest known prime `n` for which all prime nu...

i'm honestly not sure, though i think the largest ones like the recent one in the news are larger than that. but google tells me computing trillions of primes is a common thing, and it's a list longer than one would want (or be able to) store. here's a link to something i found on reddit https://t5k.org/notes/faq/LongestList.html that talks about it and provides a list of 50 million primes, but the TL;DR is that you'd need a method of generating the primes on the fly, at least up to 10^18 if you want to be competitive with this person's website
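
"Generating the primes on the fly" usually means a sieve. A minimal Sieve of Eratosthenes in plain Python, as a sketch (this simple version keeps the whole range in memory, so it is fine for 10^6 or 10^8 but nowhere near 10^18, which would need a segmented sieve):

```python
import math


def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]


primes = primes_up_to(10**6)
print(len(primes))  # 78498, the same count as the "Generated 78498 primes" log line above
```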

rich moth Oct 24, 2024, 6:49 AM

#

gritty vessel this is empty

Sure, if thats something you're comfortable with, we can take a look

gritty vessel Oct 24, 2024, 6:49 AM

#

rich moth Sure, if thats something you're comfortable with, we can take a look

https://pastebin.com/zaR6mNP0

Pastebin

# Create cKDTree for INSAT data ckdtree = cKDTree(i...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

Currently I am using this way

hazy vector Oct 24, 2024, 1:58 PM

#

Can anyone help me? i want to learn coding and spoken english at the same time, and i am confused about how to do that

unkempt apex Oct 24, 2024, 2:55 PM

#

gritty vessel https://pastebin.com/zaR6mNP0

bruhh, use python version of pastebin

lapis sequoia Oct 24, 2024, 4:14 PM

#

Has anyone built a CI/CD pipeline for Azure Databricks notebooks using Azure Pipelines??

mint palm Oct 24, 2024, 4:41 PM

#

if an unknown company approaches you, should they give some hint about the pay range?
I hate it when there is no way I can get an idea about the pay range
and asking upfront doesn't seem very convenient as a candidate.

signal whale Oct 24, 2024, 6:46 PM

#

@rich moth can you help?

rich moth Oct 24, 2024, 9:21 PM

#

signal whale @rich moth can you help?

I can sure try

signal whale Oct 24, 2024, 9:24 PM

#

rich moth I can sure try

uhh about to go to bed now it is almost midnight in europe 😅 pithink

golden canyon Oct 24, 2024, 10:02 PM

#

hey guys, I am looking for some advice on how to start being invited for the interviews as an ML engineer being fresh out of university? I am totally lost

rich moth Oct 24, 2024, 10:33 PM

#

signal whale uhh about to go to bed now it is almost midnight in europe 😅 <:pithink:65224755...

Sorry, we're on opposite time zones, but maybe we can do later.

rich moth Oct 24, 2024, 10:41 PM

#

golden canyon hey guys, I am looking for some advice on how to start being invited for the int...

I would think attending those tech conferences and trying to link up and meet other people would be a good bet. I imagine having some cool projects on your GitHub would be a plus. A resume, anyone can generate that. I would want to see your thoughts turned into actions in the form of projects, or something with some depth. I mean, that's what I would look for if I was in that position. It's like that one movie, "Field of Dreams". There's a quote, "if you build it, they will come". Have you tried just asking Google or a chatbot? Nowadays they're great places for resources. Use the new free Claude, ask that same question.

vocal cave Oct 24, 2024, 11:27 PM

#

serene scaffold (there might be undiscovered prime numbers that are less than the largest known ...

the largest known prime is a prime of the form 2^q - 1 with q around 150k. we don't know all primes up to there though, but those primes have interesting properties, I just can't remember what they were

vocal cave Oct 24, 2024, 11:28 PM

#

vocal cave the largest known prime is a prime of the form `2^q - 1` which q rounding 150k, ...

nevermind, the exponent is 136 million

#

with a whopping 41 million digits

#

https://en.wikipedia.org/wiki/Largest_known_prime_number

Largest known prime number

The largest known prime number is 2^136,279,841 − 1, a number which has 41,024,320 digits when written in base 10. It was found on October 12, 2024 by a computer volunteered by Luke Durant to the Great Internet Mersenne Prime Search (GIMPS).

A prime number is a natural number greater than 1 with no divisors other than 1 and itself. According to ...

#

https://en.wikipedia.org/wiki/Euclid–Euler_theorem
it was this obscure theorem I was talking about

Euclid–Euler theorem

The Euclid–Euler theorem is a theorem in number theory that relates perfect numbers to Mersenne primes. It states that an even number is perfect if and only if it has the form 2^(p−1)(2^p − 1), where 2^p − 1 is a prime number. The theorem is named after mathematicians Euclid and Leonhard Euler, who respectively proved the "if" and "only if" aspects o...
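A small sketch that checks both claims above numerically, assuming nothing beyond the standard library. The function names are made up for illustration, and the primality test is plain trial division, so it is only meant for tiny exponents (real Mersenne searches use the Lucas–Lehmer test).

```python
import math


def mersenne_digit_count(p):
    """Number of base-10 digits of 2**p - 1, without building the number."""
    # 2**p is never a power of 10, so 2**p - 1 has as many digits as 2**p.
    return math.floor(p * math.log10(2)) + 1


def is_prime(n):
    """Trial division; fine for the small Mersenne numbers used below."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_perfect(n):
    """True if n equals the sum of its proper divisors."""
    if n < 2:
        return False
    total = 1
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            if i != n // i:
                total += n // i
        i += 1
    return total == n


def even_perfect_numbers(max_p):
    """Euclid's direction: 2**(p-1) * (2**p - 1) whenever 2**p - 1 is prime."""
    result = []
    for p in range(2, max_p + 1):
        mersenne = 2 ** p - 1
        if is_prime(mersenne):
            result.append(2 ** (p - 1) * mersenne)
    return result
```

Calling `mersenne_digit_count(136279841)` reproduces the digit count quoted from Wikipedia, and every value returned by `even_perfect_numbers` passes `is_perfect`.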

rich moth Oct 24, 2024, 11:58 PM

#

I finally got all the losses to converge, but I need some serious equipment to train it, lol. 128 GB of RAM and 24 GB of VRAM isn't cutting it, it gets killed along the way due to OOM

#

Tonight I'm gonna combine the parquet files of MTG and Pokemon. If I'm gonna train this thing, I imagine it will be on AWS EC2? I'm gonna need to recreate the env in Ubuntu. Anyone got any ideas?

shut girder Oct 25, 2024, 4:42 AM

#

rich moth I finally got all the losess to converge, but I need some serious equipment to t...

Hi, I am not experienced enough to help. But this seems very interesting, what do you plan to do with the Pokemon cards?

rich moth Oct 25, 2024, 5:05 AM

#

shut girder Hi, I am not experienced enough to help. But this seems very interesting, what d...

I'm building a multi-modal AI model that's learning to understand trading card games. I built datasets with all the Pokemon and MTG cards and organized them, roughly 25 GB of data. I'm using a VQ-VAE to learn the card layouts and the visual elements of the art, diffusion to help generate and refine the images, and CLIP to align images with the text descriptions. I'm using a sentence transformer to understand the card mechanics and the flavor of the text, card names, card rules, etc., to pick up the different writing styles. I made a transformer to understand all the card stats, types, subtypes, etc. This confusion matrix shows how it's learning to classify the cards. I'm just doing Pokemon right now until I get the final touches on it. I'm using optuna to tune the hyperparameters. The idea is to get it to understand the realms of both and generate new and unique cards based on prompts.

#

There are some other things going on, but that's the gist.

versed pilot Oct 25, 2024, 7:08 AM

#

gritty vessel I am working on satellite data lets say i have two satellites data namely l1 and...

So is your data in raster format? Like netcdf with coordinates for each pixel? I'm more used to working with vector data, geopandas is a good tool for that, I wonder if you can tweak your data to work with it https://geopandas.org/en/stable/docs/user_guide/set_operations.html

signal whale Oct 25, 2024, 8:37 AM

#

!paste

arctic wedgeBOT Oct 25, 2024, 8:37 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

signal whale Oct 25, 2024, 8:39 AM

#

https://paste.pythondiscord.com/LJKA

toxic mortar Oct 25, 2024, 3:37 PM

#

How do you guys aproach parsing images within a document?

I’ve reached the point where I can extract them and use OpenAI vision models to analyze them, meaning I have two independent components ( ones is textual corpus and the other textual with image analysis origin ).

I want to embed the analyzed image context at the correct place within the parsed textual content of the page.
Is there any prebuilt solution for this within a Llamaparse which I am not aware of?

serene scaffold Oct 25, 2024, 7:01 PM

#

<@&831776746206265384> advertising @cosmic patrol

rugged mist Oct 25, 2024, 7:04 PM

#

@cosmic patrol please do not send advertisements in this server without approval from moderators

signal whale Oct 25, 2024, 7:41 PM

#

#

is there a way i can add 2 tables together in looker studio dim and fact tables 🙂

left tartan Oct 25, 2024, 8:17 PM

#

signal whale

wdym? You have two top 5 lists and you want one table?

signal whale Oct 25, 2024, 8:20 PM

#

#

wanna have something like this

#

as 3d table

left tartan Oct 25, 2024, 8:22 PM

#

Oh, it's a looker question, not a query question.. nm, can't help 🙂

signal whale Oct 25, 2024, 8:24 PM

#

yeah i sucks like power bi more lol and i use linux then you know it is bad

signal whale Oct 25, 2024, 9:23 PM

#

fixed it

fickle shale Oct 26, 2024, 5:14 AM

#

#

fickle shale Oct 26, 2024, 5:15 AM

#

fickle shale

Why it;s not working?

rich moth Oct 26, 2024, 6:45 AM

#

long robin Oct 26, 2024, 11:20 AM

#

Bottleneck issue in PyTorch:
I will be grateful if someone's solution or suggestion works for me.

I am learning deep learning using PyTorch. When I train CNN models on some datasets, or sometimes while using pretrained models, it doesn't utilize my GPU at all. I have an RTX 3050 Ti. And sometimes it works. Another thing is that with num_workers=0 (everything on the main thread) it didn't give me the issue, but for values greater than 0 it was causing the same low/zero GPU usage.

I have read so many articles, forums, and other stuff, but I am not able to understand what's actually happening.

jaunty helm Oct 26, 2024, 11:25 AM

#

long robin **Bottleneck issue in PyTorch:** I will be grateful if someone's solution or sug...

did you download pytorch with cuda? or did you download the cpu version
if you did pip install pytorch then you downloaded the cpu version

fickle shale Oct 26, 2024, 11:36 AM

#

fickle shale

Why It's not working?

floral delta Oct 26, 2024, 2:02 PM

#

hey guys I'm doing a data visualization with a BFS algo but the graph has some bugs when I run it, do u know why it happens

sharp crest Oct 26, 2024, 2:16 PM

#

Screenshot Your Bugs

earnest widget Oct 26, 2024, 2:36 PM

#

fickle shale Why It's not working?

What’s the error? We need some more info.

fickle shale Oct 26, 2024, 2:47 PM

#

earnest widget What’s the error? We need some more info.

in edit mode my code works fine but if we open my code in the normal view it's not working

floral delta Oct 26, 2024, 2:58 PM

#

https://paste.pythondiscord.com/6NNA

earnest widget Oct 26, 2024, 3:02 PM

#

fickle shale in edit file my code works fine but if we open my code it;s not working

What’s the error though?

floral delta Oct 26, 2024, 3:10 PM

#

I solve it but idk how to implement this point in my function: The function in point four above should accept i.e. have three arguments that can be changed, as

graph_name.
(Start/initial) student name who wishes to connect.
(End/goal) student name of target person.
For example, if you store the relationships under a structure named “GraphX” and “Ema” wishes to get
introduced to “Bob” then you would call the function from the main as follows:
BFS_firstname (“GraphX”, “Ema”, “Bob”)

#

cuz in my State is already declared my initial State = myName and my GoalState = Jill but idk what to exactly do in that point
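One way to read that requirement is to stop hard-coding the initial and goal states and pass them in as parameters instead. A minimal sketch, assuming the relationships are stored as a dict of adjacency lists; the graph contents and the name `bfs_path` are hypothetical, and the graph object itself is passed rather than the string "GraphX":

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return the shortest chain of introductions from start to goal, or None."""
    queue = deque([[start]])  # each queue entry is a full path so far
    visited = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for friend in graph.get(person, []):
            if friend not in visited:
                visited.add(friend)
                queue.append(path + [friend])
    return None  # goal is not reachable from start


# Hypothetical friendship graph, only for illustration.
graph_x = {
    "Ema": ["Jill", "Tom"],
    "Jill": ["Ema", "Bob"],
    "Tom": ["Ema"],
    "Bob": ["Jill"],
}

if __name__ == "__main__":
    print(bfs_path(graph_x, "Ema", "Bob"))
```

With the three arguments in the signature, the same function works for any graph and any pair of students without editing the state definitions.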

fickle shale Oct 26, 2024, 3:54 PM

#

earnest widget What’s the error though?

nothing check both screenshot same code but 2 diff output

fickle shale Oct 26, 2024, 3:56 PM

#

earnest widget What’s the error though?

https://www.kaggle.com/code/lucifierx/linear-regression-tutorial

Linear Regression Tutorial

Explore and run machine learning code with Kaggle Notebooks | Using data from House price prediction

#

Check this!

fickle shale Oct 26, 2024, 4:01 PM

#

fickle shale

on my https://www.kaggle.com/code/lucifierx/linear-regression-tutorial/ edit works fine

#

https://hastebin.com/share/abuxadoguw.xml

earnest widget Oct 26, 2024, 4:38 PM

#

fickle shale on my https://www.kaggle.com/code/lucifierx/linear-regression-tutorial/ edit wor...

Oh about the table formatting, yeah it could be how Kaggle is changing the view when it's not in the edit format.

fickle shale Oct 26, 2024, 4:39 PM

#

earnest widget Oh about the table formatting, yeah it could be how Kaggle is changing the view ...

so how can i correct it?

#

It's working but from today it's not working

earnest widget Oct 26, 2024, 4:52 PM

#

fickle shale so how can i correct it?

I don't think anything can be changed for it because the view is optimized through Kaggle. Instead maybe you could put it in a form of a pandas dataframe? But it won't look like a proper colour table. Looking at other notebooks, it seems that's the only way to provide a dataset description or you could make it in the form of a list view. That's about it I guess.

fickle shale Oct 26, 2024, 4:55 PM

#

earnest widget I don't think anything can be changed for it because the view is optimized throu...

ok!

#

and Thanks!

broken eagle Oct 27, 2024, 12:12 AM

#

Anyone here worked with streaming dataset, and dataloader? need some help for finetuning a Blip model. If you have any experience or reference notebook. Please hit me up. thank you.

hearty token Oct 27, 2024, 9:01 AM

#

How can I precisely describe this? I've attempted using VLMS but they tend to get the precise coordinates wrong

cyan birch Oct 27, 2024, 10:29 AM

#

i am dealing with an outlier problem and i have two approaches for handling the outliers:
create a class that has both IQR and z-score methods, compare them, and then filter with whichever works best, or
use the scipy zscore, convert the outliers to null, and then use a decision tree to predict them again (this could cause overfitting)
what should i do to address this problem: reduce the data or impute the data?
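Before deciding between dropping and imputing, it can help to see which points each rule actually flags. A minimal standard-library sketch of the two detectors mentioned above; the function names, the 1.5 and 3.0 cut-offs, and the sample data are conventional or made-up choices, not anything from the original question:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


# Made-up sample: 19 ordinary readings and one extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
          12, 11, 10, 13, 12, 11, 12, 10, 13, 95]

if __name__ == "__main__":
    print("IQR rule flags:", iqr_outliers(sample))
    print("z-score rule flags:", zscore_outliers(sample))
```

The z-score rule uses the mean and standard deviation, which the outliers themselves inflate, so on small or heavily skewed data the IQR rule is often the more robust of the two.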

hardy depot Oct 27, 2024, 10:44 AM

#

https://github.com/heuristic-solver/Image-Pixelation-Detection-And-Correction
this is my work on pixelation detection and correction if you need it for reference kate

GitHub

GitHub - heuristic-solver/Image-Pixelation-Detection-And-Correction...

This repository offers a comprehensive solution for detecting and removing image pixelation using deep learning models. It leverages MobileNetV2 for pixelation detection and ESPCN for high-quality ...

fiery citrus Oct 27, 2024, 12:19 PM

#

Hello, i have my training code working for my AI model but when i try to use my GPU tensorflow just outright doesnt detect it.

I have an AMD RX6600. With this code the GPU count is 0, could it be that my GPU isnt supported?

import tensorflow as tf

# Confirm TensorFlow is using GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

#

I have found out that i am an idiot and TF doesn't support AMD, i'll use a different module

jaunty helm Oct 27, 2024, 12:43 PM

#

fiery citrus I have found out that i am an idiot and TF doesnt support AMD, ill use a differe...

I'm not sure if many support amd at all
everyone's nvidia cuda
pytorch's got ROCm support on linux ig

unkempt wigeon Oct 27, 2024, 6:14 PM

#

What functions are good for a conventional neral network my apologies

serene scaffold Oct 27, 2024, 6:20 PM

#

unkempt wigeon What functions are good for a conventional neral network my apologies

We've discussed how your questions need to provide more context about what you're trying to do.

Also, did you mean to say convolutional?

unkempt wigeon Oct 27, 2024, 6:30 PM

#

serene scaffold We've discussed how your questions need to provide more context about what you'r...

I want to make a convolutional neural network that can tell different vehicles apart like a valiant to Nova

unkempt apex Oct 27, 2024, 6:30 PM

#

serene scaffold We've discussed how your questions need to provide more context about what you'r...

I am just getting bored by "my apologies"

unkempt wigeon Oct 27, 2024, 6:36 PM

#

unkempt wigeon I want to make a convolutional neural network that can tell different vehicles a...

Nova red

Vallient oxidized copper

#

I want the AI to tell these apart

serene scaffold Oct 27, 2024, 6:56 PM

#

unkempt wigeon I want to make a convolutional neural network that can tell different vehicles a...

Was there something you found lacking about the advice we've given you about CNNs over the last several weeks?

unkempt wigeon Oct 27, 2024, 7:00 PM

#

Yes what type of defs I should use just in case I want to give it more data of images of the cars

Do I use a Relu or do I use a leaky Relu should I use the softmax for the beginning and end

spring field Oct 27, 2024, 8:46 PM

#

I'm fairly certain that the choice of what activation function to use has been explained at some point over the last several weeks
you either follow some implementation from a paper or you just test with several or just go with your gut, nothing here is set in stone
for CNNs, a ReLU is probably one of the most common ones

in terms of code architecture:

what type of defs I should use
that's something you sort of figure out by trying out various stuff and seeing what works best for you, it's more of a software design choice really
like, on its own or rather in this context of neural nets, it doesn't really make that much sense as a question, it also somewhat indicates a lack of understanding of some core principles that you should already be familiar with before taking on a project like this
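For reference, the three functions being asked about are short enough to write out by hand. This is a plain-Python sketch for intuition only (a real CNN would use the framework's built-in versions), and the 0.01 slope for leaky ReLU is just a commonly used default:

```python
import math


def relu(x):
    """Pass positive inputs through unchanged, clamp negatives to zero."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small non-zero slope."""
    return x if x > 0 else slope * x


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(relu(-2.0), leaky_relu(-2.0))
    print(softmax([2.0, 1.0, 0.1]))
```

ReLU and leaky ReLU are applied element-wise inside the network, while softmax is normally used only once, on the final layer's outputs, to get class probabilities. It is not something you put at the beginning.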

sinful surge Oct 28, 2024, 10:28 AM

#

which version of tensorflow shall i install?? like am instaling conda as of now.

serene scaffold Oct 28, 2024, 10:31 AM

#

sinful surge which version of tensorflow shall i install?? like am instaling conda as of now.

Install pytorch.
But in general, unless you know why you need an earlier version, you can always just install the latest.

sinful surge Oct 28, 2024, 10:32 AM

#

serene scaffold Install pytorch. But in general, unless you know why you need an earlier version...

give me the command for tensorflow

#

leave the pytorch they are begineers.

serene scaffold Oct 28, 2024, 10:48 AM

#

sinful surge leave the pytorch they are begineers.

It's usually beginners who use tensorflow and professionals who use pytorch.

late lichen Oct 28, 2024, 10:51 AM

#

So uhm gradient descent is like, you take the relation of a parameter to the output, right?

serene scaffold Oct 28, 2024, 11:02 AM

#

late lichen So uhm gradient decent is Like you will take the relation of a parameter to the ...

The disparity between the actual output and the desired output is the loss.

Gradient descent is where you modify each parameter slightly in whichever direction would have resulted in a smaller loss.
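That description can be sketched in a few lines of plain Python. The example below fits a line `w * x + b` with mean squared error as the loss; the data, learning rate, and step count are made up for illustration:

```python
def mse_loss(w, b, xs, ys):
    """Mean squared difference between the actual and the desired outputs."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Repeatedly nudge w and b in the direction that lowers the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

if __name__ == "__main__":
    w, b = gradient_descent(xs, ys)
    print(w, b, mse_loss(w, b, xs, ys))
```

The gradient is the "relation of a parameter to the output" part: it says how much the loss changes when each parameter changes, and the update moves each parameter a small step the opposite way.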

wooden sail Oct 28, 2024, 11:20 AM

#

i wouldn't split it like that, but pytorch is a safer bet

#

tf introduces breaking changes irregularly

serene scaffold Oct 28, 2024, 11:21 AM

#

wooden sail i wouldn't split it like that, but pytorch is a safer bet

The only people I see using tensorflow are beginners following old tutorials

#

But There's nothing inherent to either library that makes it better for beginners or pros

indigo wing Oct 28, 2024, 11:55 AM

#

Hey, I want to enrich Llama 3.2 3B uncensored instruct with specific knowledge, fine-tuning the model for specific needs. How do I prepare the dataset? The dataset is about 3 GB of English and maths text. I have no idea what approach to take to preprocess it. The most I have done is stripping blank lines and unwanted characters using regex. My goal is to use it for querying. Do I use llama? I have no idea, I am new to fine-tuning and this is my first time using a GPU.

#

Do I embed the words first?

#

My dataset is PDF images with text, which I successfully extracted and saved via fitz and os.walk (hurray os.walk). I am also aiming for very specific answers, and for limiting knowledge and answer access based on user authentication level. Can someone please help me out here?

#

I also tried using llamaindex and weaviate instead of FAISS, but it's hard for me to set up. I don't want to use GPT and I want good results. I don't even know what to do anymore.

unique spoke Oct 28, 2024, 1:18 PM

#

Hey guys i am currently working on a computer vision project and im not sure if im going the right direction. So i made a flutter app which has a live camera stream and am trying to send each frame to the flask server containing the ai code which (not working yet but in theory ) should apply that code onto it. but am I going the wrong way. Am i supposed to create a camera livestream in the flask code and then somehow link it to the flutter because i saw some guy trying to do something similar to me on stack overflow and he went that direction. Any advice on how i should go about it? Step by Step process would be appreciated ( I will stick to flask and flutter tho)

twilit iris Oct 28, 2024, 3:34 PM

#

unique spoke Hey guys i am currently working on a computer vision project and im not sure if ...

Flutter App:
Use the camera package to capture the camera stream.
Convert each frame to a format that can be sent over the network (e.g., JPEG).
Use a library like http to send the frames to the Flask server.
Flask Server:
Use a library like flask to create a RESTful API that accepts frames from the Flutter app.
Use a library like OpenCV or TensorFlow to apply AI processing to the frames.
Send the processed frames back to the Flutter app.

iron basalt Oct 28, 2024, 8:38 PM

#

serene scaffold The only people I see using tensorflow are beginners following old tutorials

Tensorflow seems kind of like abandon-ware at this point. There are several issues open unresolved and the people still using it are starting to post stuff like https://github.com/tensorflow/tensorflow/issues/69586 .

GitHub

tensorflow is so buggy, you guys should just gave up and should mig...

Issue type Build/Install Have you reproduced the bug with TensorFlow Nightly? Yes Source source TensorFlow version tf 2.16 Custom code Yes OS platform and distribution ubuntu 24.04 Mobile device ws...

#

My guess is that Google focused instead on Jax.

serene scaffold Oct 28, 2024, 8:40 PM

#

iron basalt Tensorflow seems kind of like abandon-ware at this point. There are several issu...

Isn't Google diverting resources from TF to Jax?

iron basalt Oct 28, 2024, 8:40 PM

#

serene scaffold Isn't Google diverting resources from TF to Jax?

I think so.

marsh marsh Oct 28, 2024, 8:42 PM

#

can anyone help me with Sql Alchemy?

iron basalt Oct 28, 2024, 8:42 PM

#

Biggest issue with this is that there are a ton of old papers that used it and now those can't be reproduced, which kind of invalidates them. OpenAI Gym was having the same issue but it was forked and now maintained by the Farama Foundation.

serene scaffold Oct 28, 2024, 8:43 PM

#

iron basalt I think so.

Torch and Jax is a more sensible bifurcation of the neural network ecosystem

serene scaffold Oct 28, 2024, 8:43 PM

#

marsh marsh can anyone help me with Sql Alchemy?

Yes, but ask your question in #databases . And ask your actual question. Don't ask to ask.

marsh marsh Oct 28, 2024, 8:44 PM

#

serene scaffold Yes, but ask your question in #databases . And ask your actual questi...

i raised a post in Help channel

iron basalt Oct 28, 2024, 8:45 PM

#

Gym took a lot of effort because it has to be the exact same, for example, the image resize algorithm had to be the same, there is no standard way of doing that actually, and so it took digging through a bunch of old code to reproduce it. A lot of the DQNs fail if that is even slightly different (no noticeable difference to the human eye). If TF has any of these same issues it means all those old projects are doomed.

serene scaffold Oct 28, 2024, 8:45 PM

#

marsh marsh i raised a post in Help channel

Okay. Please only ask your question in one place. Don't open a help thread and then ask people to answer your question without telling them you actually have a thread.

marsh marsh Oct 28, 2024, 8:45 PM

#

serene scaffold Okay. Please only ask your question in one place. Don't open a help thread and t...

oh my mistake
but can you help me ?

serene scaffold Oct 28, 2024, 8:45 PM

#

marsh marsh oh my mistake but can you help me ?

Not right now. Sorry.

marsh marsh Oct 28, 2024, 8:46 PM

#

serene scaffold Not right now. Sorry.

okay sorry

toxic mortar Oct 28, 2024, 11:19 PM

#

Hi guys,

How is it possible that my model invocation time is longer than the entire request handling time (which includes the model request)? I'm calling the HuggingFace API.

Any ideas?


@ModelInvocationTimeMetric.time()
def model_invocation():
  pass

@RequestInvocationTime.time()
def handle_request():
  # ...
  model_invocation()
  # ...

buoyant vine Oct 28, 2024, 11:31 PM

#

Is the model on a GPU? It can be because the timer is being messed with by counting only the CPU time not the time spent on the gpu

toxic mortar Oct 28, 2024, 11:34 PM

#

buoyant vine Is the model on a GPU? It can be because the timer is being messed with by count...

Yup. In that case, that might be the answer!

untold shoal Oct 28, 2024, 11:44 PM

#

anyone here fairly familiar with langgraph?

young beacon Oct 28, 2024, 11:45 PM

#

Hello, I have a set of different passages. I'd like to identify the most frequent terms across them, identify the theme of each passage, and group them into different categories based on it

untold shoal Oct 28, 2024, 11:46 PM

#

could prolly use ai for that somehow lmao

young beacon Oct 28, 2024, 11:47 PM

#

I don't have AI available if you mean LLMs

untold shoal Oct 28, 2024, 11:48 PM

#

low parameter models like llama3.2 exist that are designed to run on literal phones

#

models are getting small but powerful recently

buoyant vine Oct 28, 2024, 11:48 PM

#

Most frequent terms is just tokenizing + counting
Theme/Classification into categories can be done with a GRU or similar with GloVe, Or you could use a small transformer BERT model
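The "tokenizing + counting" step can be sketched with the standard library alone. The stopword set, the regex tokenizer, and the sample passages are simplified placeholders for illustration; a real pipeline would likely use a proper tokenizer and stopword list:

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; real ones are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(passages, k=5):
    """Count non-stopword tokens across all passages and return the k most common."""
    counts = Counter()
    for passage in passages:
        counts.update(t for t in tokenize(passage) if t not in STOPWORDS)
    return counts.most_common(k)


# Made-up passages standing in for the real set.
passages = [
    "The cat sat on the mat.",
    "A cat and a dog.",
    "The dog chased the cat.",
]

if __name__ == "__main__":
    print(top_terms(passages, k=2))
```

The loop handles any number of passages, so a few hundred of them can simply be passed in as one list.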

young beacon Oct 28, 2024, 11:48 PM

#

But how do I pass all the 400 passages at once and identify patterns?

young beacon Oct 28, 2024, 11:49 PM

#

buoyant vine - Most frequent terms is just tokenizing + counting - Theme/Classification into ...

Can u expand on the second term, I did get started with first step

buoyant vine Oct 28, 2024, 11:49 PM

#

or do zero-shot classification with a bigger LLM like llama as dragon has said

untold shoal Oct 28, 2024, 11:49 PM

#

you CF8 is u familiar with langgraph in any major way?

buoyant vine Oct 28, 2024, 11:49 PM

#

nop

untold shoal Oct 28, 2024, 11:50 PM

#

ah alr

young beacon Oct 28, 2024, 11:50 PM

#

buoyant vine or do zero-shot classification with a bigger LLM like llama as dragon has said

But I don't have pre defined set of classes available. I need to understand the common themes across them to come with such classes

whole elm Oct 29, 2024, 2:12 AM

#

I have this figure using matplotlib. I want to fill in the area between the blue and green plots. I have tried fill_between and fill_betweenx but can't seem to get the entire area filled. Does anyone have any suggestions?

bitter garden Oct 29, 2024, 5:28 AM

#

This is a histGradientBoostingRegression plot from a 300 row dataset with n_splits=5, hence 5 folds. Avg R-squared is 0.77.
Would I be able to increase the R score if I restrict the model to train on only the lower end values? and maybe increase the number of folds which may or may not help idk.

stark echo Oct 29, 2024, 8:05 AM

#

can anybody help me with how I should learn TensorFlow?

rich moth Oct 29, 2024, 8:17 AM

#

https://wandb.ai/bigpunk2/volatility_prediction/reports/Volatility-Peridction--Vmlldzo5OTMxNjg4

I'm looking for some feedback or ideas, especially when it comes to gathering data for economic price indicators. Magnitude doesn't matter, I figure, as long as it gets marked on the board. What do you guys think?

#

It's basically a transformer with attention layers, but it also implements uncertainty estimation using Monte Carlo Dropout, capturing both aleatoric and epistemic uncertainty. Right now I'm just running optuna on the parameters.

grim carbon Oct 29, 2024, 9:33 AM

#

Hi, is there anyone here who uses the lbph algorithm as a face recognition method?

untold cliff Oct 29, 2024, 11:08 AM

#

Why does spaCy's small English model (en_core_web_sm) return vectors (of size 96) even though it shouldn't have any embeddings according to the documentation?

rich moth Oct 29, 2024, 11:15 AM

#

untold cliff Why does spacy's small english model (`en_core_web_sm`) return vectors (of size ...

There has to be some type of embedding then? If it's not a word embedding, it's a token embedding? Probably the size of the hidden layer

untold cliff Oct 29, 2024, 11:20 AM

#

rich moth There has to be some type of embedding then? If its not a word embedding its a ...

I'm sorry, I didn't quite get what you're trying to say. So even the small models have embeddings? Even though if you check nlp.vocab.vectors_length you'll find that it's zero

rich moth Oct 29, 2024, 11:24 AM

#

untold cliff I'm sorry, i didn't quite get what you're trying to say. So even the small model...

correct, that's probably why you see zero. But almost every model has token embeddings

long hazel Oct 29, 2024, 11:53 AM

#

serene scaffold The only people I see using tensorflow are beginners following old tutorials

Lol im watching sentdex rn 6 yr old playlist

mental ivy Oct 29, 2024, 1:36 PM

#

guys does anyone use Python NLTK?

serene scaffold Oct 29, 2024, 2:36 PM

#

mental ivy guys does anyone use Python NLTK?

Hello, remember to always ask your actual question. Don't ask if anyone knows about the topic of a secret question.

stable hollow Oct 29, 2024, 6:30 PM

#

Hey guys I'm looking for tutorials on making maps and working with map data. I want to focus on the graphics side of things, and customizing visuals with shapefiles and possibly layering things to make more in depth charts. I'm trying to work with geopandas and seaborn right now. Any ideas?

terse crag Oct 29, 2024, 7:01 PM

#

Hello, I have small problem with pyspark. I try to rewrite code from pandas where I have:

df.merge(df2, how="outer", left_on=["a"], right_on=["b"])

And I have got columns a_x and a_y. But when I do it in pyspark like:

df.join(df2, how="outer", on=["a"])

I have got two times column a... What did I wrong?

desert oar Oct 29, 2024, 8:10 PM

#

Maybe #tools-and-devops would be better? It doesn't seem like a Python-specific question

strong wharf Oct 29, 2024, 8:15 PM

#

Okay, I'll head there, thank you

rough herald Oct 29, 2024, 11:27 PM

#

is it possible to use ChatGPT in my Python code? not to help me code, just to implement it into my code

summer glade Oct 29, 2024, 11:37 PM

#

Hi everyone, i am new to machine learning. I created an image editor application in opencv and tkinter :
https://youtu.be/KCbpbTOPNt4?si=PtveFXaaVB1g3xQX

YouTube

GPU Governor

I made an image editor software in python (tkinter + openCV)

source code : https://github.com/gpu-governor/shrimp
discord: https://discord.gg/QM97pDZHtY

past bramble Oct 30, 2024, 1:45 AM

#

any project ideas for AI? Should I try something different from neural network models? I only used TensorFlow in most of my AI apps, any other suggestions I should try?

past bramble Oct 30, 2024, 7:07 AM

#

I have access to A100 GPU, can anyone guide me on making a text gpt with this?

rich moth Oct 30, 2024, 7:38 AM

#

past bramble I have access to A100 GPU, can anyone guide me on making a text gpt with this?

I'd suggest finding a guide online or asking AI to help guide you.

#

Truth is you'll learn faster that way than by waiting for someone on here to show you. Your best bet is to start experimenting and ask for help along the way.

serene scaffold Oct 30, 2024, 8:11 AM

#

past bramble I have access to A100 GPU, can anyone guide me on making a text gpt with this?

You unfortunately can't. There's only about five companies with enough compute power to do this

#

What are you ultimately trying to do?

past bramble Oct 30, 2024, 8:15 AM

#

i was shown a github repo in this channel guiding to building one requiring A100

#

I'll have to search for it

#

I'm not looking for a perfect advanced text gpt, maybe a simple one

#

found it

#

https://github.com/karpathy/nanoGPT

GitHub

GitHub - karpathy/nanoGPT: The simplest, fastest repository for tra...

The simplest, fastest repository for training/finetuning medium-sized GPTs. - karpathy/nanoGPT

grand breach Oct 30, 2024, 10:13 AM

#

I observed that after removing tomek links and undersampling only on majority class the ROC AUC score got reduced by 11 % (trained on 1,36,750 records with balanced classes) but when I trained initially over 37,668 samples (with balanced classes) that removed tomek links from both classes I got an ROC AUC score of 84%. I trained with xgboost which of these is actually a good model ? the one trained on lower samples or the one trained on more number of samples ?

grand breach Oct 30, 2024, 10:29 AM

#

i mean both of them have way lesser number of samples in minority class compared to the real dataset which has 600k minority samples

lime palm Oct 30, 2024, 12:53 PM

#

hey

rigid hamlet Oct 30, 2024, 12:54 PM

#

Hey bro

#

Haha the bots were going nuts lol

rigid hamlet Oct 30, 2024, 12:54 PM

#

past bramble any project ideas for AI? Should I try something different from neural network m...

Yes lol

harsh heron Oct 30, 2024, 1:30 PM

#

Actually I want to make a project in how nerves works during our overthinking and when we interact with people

I want to write program in python of this concept so, I need someone's help

vestal spruce Oct 30, 2024, 1:52 PM

#

Hi guys I just finished making a machine learning model comparison for financial sentiment analysis classification, so far I manage to find the best model to be svm w/ ngram-tfidf feature extraction and an accuracy score of 83% or so, at this point should I keep exploring better methods or should I focus my work on finetuning the hyper parameter for the best model?

cedar tusk Oct 30, 2024, 1:59 PM

#

vestal spruce Hi guys I just finished making a machine learning model comparison for financial...

83 percent is VERY good. Try to increase f1 score and you should be golden.

#

i dk the current version of the f1 score but if its less than 60% i mean

#

or its good for me as well

jaunty helm Oct 30, 2024, 2:03 PM

#

accuracy is especially bad if your data is imbalanced
e.g. detecting fraud, there's way more data about normal transactions than fraudulent ones
just blindly guessing everything's normal gives you high accuracy

#

you can check others like f1, recall, precision, roc-auc
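
A minimal sketch of the fraud example above in plain Python, to show why accuracy alone misleads on imbalanced data. The class counts (990 normal, 10 fraud) are made up for illustration, and the "model" is just the blind guess described above.

```python
# Toy data: 990 normal transactions (label 0) and 10 fraudulent ones (label 1).
# These counts are invented for illustration only.
y_true = [0] * 990 + [1] * 10
# A "model" that blindly predicts "normal" for every transaction.
y_pred = [0] * 1000


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, treating 0/0 as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = metrics(y_true, y_pred)
print(scores)
```

Accuracy comes out high even though not a single fraud case is caught, which is exactly what recall and F1 expose.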

past meteor Oct 30, 2024, 2:09 PM

#

I don’t really like auc either

vestal spruce Oct 30, 2024, 2:11 PM

#

jaunty helm you can check others like f1, recall, precision, roc-auc

noted, I'll try that too

opaque merlin Oct 30, 2024, 2:12 PM

#

hello, I am trying to do Arabic-to-English translation using a Seq2Seq RNN model. Can anyone experienced with NLP in PyTorch help? I wanted to ask a question

jaunty helm Oct 30, 2024, 2:13 PM

#

past meteor I don’t really like auc either

enlighten me please

grand breach Oct 30, 2024, 2:17 PM

#

past meteor I don’t really like auc either

isn't auc the proper metric here ?

grand breach Oct 30, 2024, 2:18 PM

#

grand breach I observed that after removing tomek links and undersampling only on majority cl...

these kind of findings are preventing me from concluding my project

past meteor Oct 30, 2024, 2:49 PM

#

jaunty helm enlighten me please

Because you don’t care about the entire area under the curve imo

#

You care about the misclassification costs at the operating point you set

grand breach Oct 30, 2024, 2:50 PM

#

I was reading that Tomek link removal drops majority samples at the decision boundary and is useful for dealing with class overlap
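
For reference, a rough sketch of what that means, in plain Python: two samples form a Tomek link when they have different labels and each is the other's nearest neighbour. The 1-D points and the helper names (`nearest_neighbor`, `tomek_links`, `drop_majority_in_links`) are made up for illustration; this is not the imbalanced-learn implementation.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i] (excluding i itself)."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j) with different labels that are each other's nearest neighbour."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        # i < j avoids reporting the same pair twice.
        if i < j and labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            links.append((i, j))
    return links


def drop_majority_in_links(points, labels, majority_label):
    """Remove only the majority-class member of every Tomek link."""
    to_drop = {
        k
        for link in tomek_links(points, labels)
        for k in link
        if labels[k] == majority_label
    }
    return [
        (p, lab)
        for k, (p, lab) in enumerate(zip(points, labels))
        if k not in to_drop
    ]


# Toy 1-D data: class 0 is the majority, class 1 the minority.
points = [(0.0,), (1.1,), (2.0,), (4.9,), (5.0,), (6.0,)]
labels = [0, 0, 0, 0, 1, 1]

print(tomek_links(points, labels))
kept = drop_majority_in_links(points, labels, majority_label=0)
print(kept)
```

Only the majority point sitting right next to the minority class is removed here; everything away from the boundary stays.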

past meteor Oct 30, 2024, 2:51 PM

#

Imagine you have 2 cases, one with a higher AUC and one with a lower.

It’s entirely possible the optimal setting is found on the second one

#

Each point on the ROC curve is an operating point
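
A small sketch of that idea in plain Python: pick a threshold (an operating point) by the misclassification cost it incurs, instead of by the whole area under the curve. The labels, scores and cost values are invented for illustration, and `cost_at_threshold` / `best_operating_point` are hypothetical helper names.

```python
def cost_at_threshold(y_true, scores, threshold, fp_cost, fn_cost):
    """Total cost when predicting positive for every score >= threshold."""
    cost = 0.0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 0:
            cost += fp_cost  # false positive
        elif pred == 0 and t == 1:
            cost += fn_cost  # false negative
    return cost


def best_operating_point(y_true, scores, fp_cost, fn_cost):
    """Return (threshold, cost) for the cheapest candidate threshold."""
    # Candidates: every distinct score, plus "predict everything negative".
    candidates = sorted(set(scores)) + [float("inf")]
    best = min(
        candidates,
        key=lambda th: cost_at_threshold(y_true, scores, th, fp_cost, fn_cost),
    )
    return best, cost_at_threshold(y_true, scores, best, fp_cost, fn_cost)


# Toy labels and model scores (made up).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

# Missing a positive is expensive: a low threshold wins.
print(best_operating_point(y_true, scores, fp_cost=1, fn_cost=5))
# False alarms are expensive: a high threshold wins.
print(best_operating_point(y_true, scores, fp_cost=5, fn_cost=1))
```

The same scores give a different best threshold once the costs change, which is why a single AUC number doesn't settle which model or setting is better for a given use case.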

pearl basin Oct 30, 2024, 3:05 PM

#

Hey there,
I already know basic web dev(mern)
Some python and DSA, OOP, stats
But have no idea about AI/ML dev

Can anyone suggest me a roadmap? For ML engineer

jaunty helm Oct 30, 2024, 3:06 PM

#

pearl basin Hey there, I already know basic web dev(mern) Some python and DSA, OOP, stats B...

I guess https://roadmap.sh/ ?

roadmap.sh

Developer Roadmaps - roadmap.sh

Community driven roadmaps, articles and guides for developers to grow in their career.

grand breach Oct 30, 2024, 3:40 PM

#

grand breach I observed that after removing tomek links and undersampling only on majority cl...

hi, can anyone help me decide with this ? i'm stuck and overthinking

cedar tusk Oct 30, 2024, 4:02 PM

#

pearl basin Hey there, I already know basic web dev(mern) Some python and DSA, OOP, stats B...

math -> stat -> python/R (or both) -> SQL -> database systems -> spark -> done

#

rest is u can do whatever u want at that point

grand breach Oct 30, 2024, 4:02 PM

#

Removing only the majority sample in each Tomek link sounds more correct, but I'm not sure why performance decreased. I used random undersampling, which might have caused this, I'm guessing? I used XGBoost without tuning many parameters

desert plinth Oct 30, 2024, 5:50 PM

#

Can anyone explain why when I run my RL agent for 1000 episodes (for the game snake) it just stops inputting by episode 500 and just dies to the top wall despite being penalised for doing so?

#

These are the first 10 episodes:
Episode: 1, Total Reward: 2.80, Epsilon: 1.00
Episode: 2, Total Reward: 12.10, Epsilon: 1.00
Episode: 3, Total Reward: 45.80, Epsilon: 0.99
Episode: 4, Total Reward: 22.90, Epsilon: 0.99
Episode: 5, Total Reward: 2.20, Epsilon: 0.99
Episode: 6, Total Reward: 22.00, Epsilon: 0.99
Episode: 7, Total Reward: 5.40, Epsilon: 0.98
Episode: 8, Total Reward: 1.80, Epsilon: 0.98
Episode: 9, Total Reward: 0.40, Epsilon: 0.98
Episode: 10, Total Reward: 11.40, Epsilon: 0.97
vs the last 10:
Episode: 990, Total Reward: -0.00, Epsilon: 0.30
Episode: 991, Total Reward: -0.50, Epsilon: 0.30
Episode: 992, Total Reward: -0.40, Epsilon: 0.30
Episode: 993, Total Reward: 0.10, Epsilon: 0.30
Episode: 994, Total Reward: -0.40, Epsilon: 0.30
Episode: 995, Total Reward: 0.10, Epsilon: 0.30
Episode: 996, Total Reward: -0.20, Epsilon: 0.30
Episode: 997, Total Reward: -0.50, Epsilon: 0.30
Episode: 998, Total Reward: -0.40, Epsilon: 0.30
Episode: 999, Total Reward: -0.40, Epsilon: 0.30
Episode: 1000, Total Reward: -0.40, Epsilon: 0.30

unkempt apex Oct 30, 2024, 5:54 PM

#

desert plinth Can anyone explain why when I run my RL agent for 1000 episodes (for the game sn...

okay okay wait wait

#

so as per you,

after you have trained it for 1000 episodes it is still dying to top wall right?

desert plinth Oct 30, 2024, 5:55 PM

#

yep

#

and it has a punishment of -10 for doing so

unkempt apex Oct 30, 2024, 5:55 PM

#

and till now you have only trained for 1k episodes?

desert plinth Oct 30, 2024, 5:55 PM

#

yeah I'm pretty new so idk if that's bad lol

unkempt apex Oct 30, 2024, 5:56 PM

#

for your idea,
1k is too short

desert plinth Oct 30, 2024, 5:56 PM

#

But at the beginning it makes a lot of inputs and explores but because of the epsilon decay it stops making inputs towards the end

unkempt apex Oct 30, 2024, 5:56 PM

#

desert plinth But at the beginning it makes a lot of inputs and explores but because of the ep...

so you have to study about epsilon then

#

read the DQN paper first

#

https://arxiv.org/pdf/1312.5602

#

this one

desert plinth Oct 30, 2024, 5:58 PM

#

okay thank you

unkempt apex Oct 30, 2024, 5:58 PM

#

desert plinth But at the beginning it makes a lot of inputs and explores but because of the ep...

so the main idea of epsilon is that for the first few episodes
it will explore the environment

now what does this mean? -> it will make random moves

and after that it will increasingly make the moves it has learned, following the learned pattern
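
Roughly, as a sketch in plain Python (this is not the code from the snake agent above; `select_action`, `decay_epsilon` and the decay constants are assumed for illustration):

```python
import random


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random move
    # exploit: index of the action with the highest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.05):
    """Shrink epsilon after each episode, but never below epsilon_min."""
    return max(epsilon_min, epsilon * decay)


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # made-up value estimates for 3 actions

epsilon = 1.0
for episode in range(1, 6):
    action = select_action(q_values, epsilon, rng)
    print(f"Episode: {episode}, Action: {action}, Epsilon: {epsilon:.2f}")
    epsilon = decay_epsilon(epsilon)
```

With epsilon near 1 almost every move is random; as it decays, the agent leans on whatever its value estimates say. If epsilon is used this way and is still 0.30 at episode 1000 as in the log above, about 30% of moves would still be random at that point.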

desert plinth Oct 30, 2024, 5:59 PM

#

yeah

unkempt apex Oct 30, 2024, 5:59 PM

#

but I'd suggest sharing your environment and your reward function

desert plinth Oct 30, 2024, 6:00 PM

#

I linked the file above

unkempt apex Oct 30, 2024, 6:00 PM

#

unkempt apex https://arxiv.org/pdf/1312.5602

just read it as per structure wise
for example, read about new jargons like -> reward functions, episodes and stuff

desert plinth Oct 30, 2024, 6:00 PM

#

with all the code

unkempt apex Oct 30, 2024, 6:00 PM

#

where?

#

plz share again if possible

desert plinth Oct 30, 2024, 6:00 PM

#

I can't chat reply it since it pops up with an error

unkempt apex Oct 30, 2024, 6:00 PM

#

desert plinth I can't chat reply it since it pops up with an error

that's why you have to use pastebin

desert plinth Oct 30, 2024, 6:00 PM

#

Ohhhh

unkempt apex Oct 30, 2024, 6:01 PM

#

https://paste.pythondiscord.com/

#

paste the code in this link and share the link

#

do you have your own custom environment or are you using Gym??

#

wait does gym have snake game in it?

desert plinth Oct 30, 2024, 6:02 PM

#

I just wrote the code for the snake game in with the agent

past bramble Oct 30, 2024, 6:02 PM

#

rigid hamlet Yes lol

what do you suggest

desert plinth Oct 30, 2024, 6:02 PM

#

I didn't use an environment

unkempt apex Oct 30, 2024, 6:02 PM

#

desert plinth I didn't use an environment

heh? so how you are training it?

desert plinth Oct 30, 2024, 6:03 PM

#

I made it store the values for the weights after each run

unkempt apex Oct 30, 2024, 6:03 PM

#

please don't use GPT and claude if you are learning

desert plinth Oct 30, 2024, 6:03 PM

#

so it keeps and improves them

unkempt apex Oct 30, 2024, 6:03 PM

#

this is all AI made code

#

don't do like this

desert plinth Oct 30, 2024, 6:03 PM

#

damn you're observant 😭

unkempt apex Oct 30, 2024, 6:04 PM

#

wait I will share some articles/blogs where you can learn that

#

again please don't use AI when you are learning something

desert plinth Oct 30, 2024, 6:04 PM

#

Okay

unkempt apex Oct 30, 2024, 6:04 PM

#

it will destroy your baseline of understanding things

desert plinth Oct 30, 2024, 6:04 PM

#

I thought I could grasp the concepts by seeing them done automatically

unkempt apex Oct 30, 2024, 6:04 PM

#

desert plinth I thought I could grasp the concepts by seeing them done automatically

nah

desert plinth Oct 30, 2024, 6:05 PM

#

Where do you suggest I start?

unkempt apex Oct 30, 2024, 6:09 PM

#

-> read this one just for an intro
https://blog.paperspace.com/introduction-to-reinforcement-learning/

-> this is to learn about environments
https://medium.com/@paulswenson2/an-introduction-to-building-custom-reinforcement-learning-environment-using-openai-gym-d8a5e7cf07ea

-> this is to start , RL for lunar lander with pre made OpenAI-Gym library
https://medium.com/@sokistar24/introduction-to-deep-reinforcement-learning-solving-the-lunar-lander-c1bb0f6e6f0

-> again same but for Pong game
https://towardsdatascience.com/intro-to-reinforcement-learning-pong-92a94aa0f84d

-> karpathy's one
https://karpathy.github.io/2016/05/31/rl/

desert plinth Oct 30, 2024, 6:09 PM

#

Oh wow I was not expecting that

#

thank you

unkempt apex Oct 30, 2024, 6:10 PM

#

I'd recommend using a pre-made environment first
train it for at least 100k episodes

learn about DQN -> but first learn what a Q-table is

#

and yeah, you can find different types of pre made environment in OpenAI - Gym

#

I have done like 300k episodes on my Pong game which has custom made environment

unkempt apex Oct 30, 2024, 6:12 PM

#

unkempt apex https://arxiv.org/pdf/1312.5602

but don't forget about this paper, it is the most important

#

read it in chunks and understand it step by step

cedar tusk Oct 30, 2024, 6:24 PM

#

ah reinforcement learning, one type of learning i cant do because i cant code the game itself xD

unkempt apex Oct 30, 2024, 6:26 PM

#

cedar tusk ah reinforcement learning, one type of learning i cant do because i cant code th...

you don't have to code the game actually

cedar tusk Oct 30, 2024, 6:26 PM

#

unkempt apex you don't have to code the game actually

i mean i prob can but its not worth the effort most of the time

#

because the most likely scenario is that you get the game logic wrong

#

and then create a biased and flawed model

unkempt apex Oct 30, 2024, 6:28 PM

#

unkempt apex you don't have to code the game actually

read this

#

gym library provides all

#

you just have to choose right approach and train it

rigid hamlet Oct 30, 2024, 6:28 PM

#

past bramble what do you suggest

Lots lol an ai stock market tool would be cool

rich moth Oct 30, 2024, 8:06 PM

#

rigid hamlet Lots lol an ai stock market tool would be cool

That's what I'm working on. Well, a model to predict market volatility. I've got 209 features from market prices, technical indicators, options data and economic signals to forecast the volatility of the market; well, that's the plan. The idea was to backtest it on QuantConnect.

rigid hamlet Oct 30, 2024, 8:10 PM

#

That’s fucking amazing

#

Wowwwwwww

#

I wish I could join you @rich moth XD

rich moth Oct 30, 2024, 8:20 PM

#

rigid hamlet Wowwwwwww

really? thanks lol. im always open to work with other people. you never know what you could learn!

rigid hamlet Oct 30, 2024, 8:20 PM

#

Yes 100%

#

I’m a big time investor in a sense

#

I dm you bro

craggy pilot Oct 30, 2024, 10:05 PM

#

rigid hamlet I dm you bro

can i join

rich kernel Oct 31, 2024, 2:05 AM

#

I need datasets (specifically udders of cows, teats, etc) for a project regarding early mastitis detection within cattle. I have looked at kaggle but images are pretty limited (they contain roughly ~100 images, but i would prefer thousands for accuracy). Does anyone know where I can find datasets?

long hazel Oct 31, 2024, 2:10 AM

#

is this a place where i can post my code?

#

specifically im getting an error that has to do with the .keras extension

#

ValueError: The filepath provided must end in .keras (Keras model format). Received: filepath=models/RNN_Final-{epoch:02d}-{val_acc:.3f}.model

tawdry sundial Oct 31, 2024, 2:13 AM

#

whats wrong here #1301353561397133372

agile cobalt Oct 31, 2024, 2:27 AM

#

rich kernel I need datasets (specifically udders of cows, teats, etc) for a project regardin...

a possible starting point is https://datasetsearch.research.google.com/

not sure if there's much out there though, maybe something like https://data.mendeley.com/datasets/kbvcdw5b4m/2 ? (note: I haven't downloaded it to check myself)

Data for: Clinical Mastitis in Cows based on Udder Parameter using ...

The data is collected from the udder of a cow to detect clinical mastitis. The four flex sensors and a temperature sensor are used to collect the udder data. Milk image is processed to find the quality of milk which also affects clinical mastitis. In the milk quality attribute, 0 indicates the normal milk and 1 indicates the abnormal milk. The d...

#

Kaggle has some for lumpy skin (1, 2 )

but for mastitis just https://www.kaggle.com/datasets/sivaprathishsiva/mastitis-disease-detection it seems

Hugging Face does not appear to have anything particularly useful either...

bitter garden Oct 31, 2024, 3:05 AM

#

rich moth That's what I'm working on. Well a model to perdict market volatility. I got ...

Working on a personal project of my own along similar lines, but for crypto.
Hourly data fetching -> a ton of technical analyses -> picking top 30 with a good score and backtesting them.
Additionally, a hist gradient model that feeds on technical data and makes hourly target predictions. Got it to almost 90% accuracy, just need to validate it on more data to arrive at a number and decide on tuning parameters. Sometimes I see overfitting issues, and there's still a lot of work to be done on implementing news feed sentiment analysis and maybe an ensemble model stack. I'll give it maybe 70% accuracy for now, even though it shows more than that in these plots, it just sounds too good to be true lol

bitter garden Oct 31, 2024, 3:39 AM

#

rich moth Oct 31, 2024, 4:01 AM

#

bitter garden

Looks sexy man. I've just started an Optuna parameter search for a new model today. https://wandb.ai/bigpunk2/volatility_hound?nw=nwuserbigpunk2

W&B

bigpunk2

Weights & Biases, developer tools for machine learning

#

Ya overfitting issues seems common place with us lol

brave stream Oct 31, 2024, 5:04 AM

#

Does anyone here have experience using Magenta? I'm attempting to use the library for music synthesis but it seems deprecated and doesn't interact very well with google colab now.

rich moth Oct 31, 2024, 5:13 AM

#

One of the losses came back infinity. 🤔 Guess it didnt like the parameters hehe

trim saddle Oct 31, 2024, 7:20 AM

#

long hazel ValueError: The filepath provided must end in `.keras` (Keras model format). Rec...

The error tells you the solution.
Your file ends with .model and it has to end with .keras.

Rename the file i guess?
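
In code terms the fix is just the extension on the filepath template. A small sketch, assuming the string is the `filepath` handed to a Keras checkpoint callback (the `{epoch:02d}` placeholders suggest that, but the original code wasn't posted). Plain `str.format` is used here only to imitate how the placeholders get filled in; the epoch and `val_acc` values are made up.

```python
# Filepath from the error message above.
old_filepath = "models/RNN_Final-{epoch:02d}-{val_acc:.3f}.model"

# Swap the unsupported ".model" suffix for ".keras" (str.removesuffix needs Python 3.9+).
new_filepath = old_filepath.removesuffix(".model") + ".keras"
print(new_filepath)

# Imitate the placeholder substitution with made-up values.
example_name = new_filepath.format(epoch=3, val_acc=0.91234)
print(example_name)
```

The placeholders stay untouched; only the suffix changes, so existing naming by epoch and validation accuracy keeps working.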

long hazel Oct 31, 2024, 7:46 AM

#

I got it to work using chatgpt

#

Btw can i not implement variational autoencoders on a non-image dataset?

scenic parcel Oct 31, 2024, 10:54 AM

#

Anyone know an open source rdbms similar to postgres but it lets you have column names longer than 64 characters

radiant frigate Oct 31, 2024, 12:11 PM

#

how do you pick optimal parameters for normal inverse gamma distribution? im building a 2-letter classifier based on MAP, ML and Bayesian inference estimates and Im not sure how do you ideally pick parameters for the NIG distribution

random sapphire Oct 31, 2024, 2:05 PM

#

hey is there any ML dc server that i can join

upbeat prism Oct 31, 2024, 2:07 PM

#

scenic parcel Anyone know an open source rdbms similar to postgres but it lets you have column...

you can 100% change that for psql. in the worst case you have to compile it yourself but I bet that's adaptable

random sapphire Oct 31, 2024, 2:07 PM

#

random sapphire hey is there any ML dc server that i can join

anyone?

vestal spruce Oct 31, 2024, 2:19 PM

#

random sapphire anyone?

Here is the place, I'd share you the alternative but idk if that would be against rule 6 of this server.

random sapphire Oct 31, 2024, 2:19 PM

#

vestal spruce Here is the place, I'd share you the alternative but idk if that would be agains...

alr

#

u can dm me

#

the alt

vestal spruce Oct 31, 2024, 2:20 PM

#

But if you must, try searching it on the discover community server and use machine learning as the keyword

random sapphire Oct 31, 2024, 2:20 PM

#

vestal spruce But if you must, try searching it on the discover community server and use machi...

where is that

vestal spruce Oct 31, 2024, 2:20 PM

#

though not much of them are active, aside from a few.

vestal spruce Oct 31, 2024, 2:21 PM

#

random sapphire where is that

most left bottom of the discord window on PC, you'll see a compass

#

there's a lot of discord server group based on types, such as but not limited to gaming, college/school, language, music community.

#

and yes even data science and ai/ml

random sapphire Oct 31, 2024, 2:23 PM

#

vestal spruce though not much of them are active, aside from a few.

oh then its alr

random sapphire Oct 31, 2024, 2:23 PM

#

vestal spruce there's a lot of discord server group based on types, such as but not limited to...

got it thanks very much

vestal spruce Oct 31, 2024, 2:23 PM

#

just type in those keyword in the search query of discover community server, you'll find them

vestal spruce Oct 31, 2024, 2:24 PM

#

random sapphire got it thanks very much

my pleasure 👌 🎩

random sapphire Oct 31, 2024, 2:24 PM

#

are you into ML

vestal spruce Oct 31, 2024, 2:24 PM

#

random sapphire are you into ML

Indeed.

#

If you have any questions about ML, don't be afraid to ask. Asking to ask isn't really an efficient way to find answers, so whatever it is, just ask it; someone will eventually have an answer. If not today, ask again tomorrow.

random sapphire Oct 31, 2024, 2:29 PM

#

great, Im a novice and ive been working on some project(mini) could you pls checkout my github i just wanna know if those projects are good to start out ML

#

its pxul1236 on github

vestal spruce Oct 31, 2024, 2:30 PM

#

random sapphire great, Im a novice and ive been working on some project(mini) could you pls chec...

Sure, that's one of the best approach to ml, gaining experience by doing, if you're not shy from sharing your work then go crazy share us what you've made

#

Ok I'll check it out

random sapphire Oct 31, 2024, 2:31 PM

#

is it ok to send repo links in this channel?

vestal spruce Oct 31, 2024, 2:32 PM

#

hmm not sure, would that be consider as advertisment, since it's self-promotion

random sapphire Oct 31, 2024, 2:33 PM

#

ohkk

clever current Oct 31, 2024, 3:10 PM

#

Does anyone have resources for developing a marketing attribution model? It doesn't have to be fancy. I'm joining an education startup soon that wants to improve their marketing attribution and tracking. They do mostly email, SMS, and phone call marketing

#

Please @ me 🙂

deep zealot Oct 31, 2024, 4:50 PM

#

i just realized how diverse the matplotlib library is after only using matplotlib.pyplot then being introduced to import matplotlib instead of import matplotlib.pyplot

#

im ngl i think im going to switch to R for data visualization

agile cobalt Oct 31, 2024, 5:05 PM

#

personally I like plotly for simple things

bokeh is also fine

jaunty helm Oct 31, 2024, 5:12 PM

#

deep zealot im ngl i think im going to switch to R for data visualization

plotnine

deep zealot Oct 31, 2024, 5:26 PM

#

simple plots on matplotlib.pyplot is fine

deep zealot Oct 31, 2024, 5:43 PM

#

deep zealot simple plots on matplotlib.pyplot is fine

https://cdn.discordapp.com/attachments/864546976670547981/1301602278536642570/image.png?ex=67251315&is=6723c195&hm=1117ed784a2b55f3c547b3435b5fe83d98ebfa8f21d0b81afcd224dca1183500&
https://cdn.discordapp.com/attachments/864546976670547981/1301602278062559302/image2.png?ex=67251315&is=6723c195&hm=f8273cf76414d376eeb7c0d4b4b4c5fc04fc7f408e8f912ee8b8dcc1274784cc&

#

its stuff like this im scared of

#

ill try seaborn soon anyway

jaunty helm Oct 31, 2024, 5:55 PM

#

try plotnine if you like ggplot syntax

hard mortar Oct 31, 2024, 6:21 PM

#

I made a python code to extract data from a website, but the code is taking a long time to execute due to the website taking a long time to update, I've already tried using commands to stop updating the website, it still doesn't work, each step of the code only runs when the website stop updating, does anyone know how to help me?

agile cobalt Oct 31, 2024, 6:33 PM

#

hard mortar I made a python code to extract data from a website, but the code is taking a lo...

do not extract from the website itself.

Either use an API or look for data dump or alike

open lily Oct 31, 2024, 7:28 PM

#

Hello, I'm kinda beginner in Python (I know how to code simple algorithm and I know a little bit of numpy), and I'm trying to learn ML

Do you guys have a plan to follow ? Like what are the basics, the things I must know etc...

Also I only code with VSCode but idk if I should try an other IDE

I want to know how to code a neural network from scratch before using frameworks

cedar tusk Oct 31, 2024, 7:57 PM

#

https://roadmap.sh/ai-data-scientist

roadmap.sh

AI and Data Scientist Roadmap

Learn to become an AI and Data Scientist using this roadmap. Community driven, articles, resources, guides, interview questions, quizzes for modern backend development.

cedar tusk Oct 31, 2024, 8:03 PM

#

open lily Hello, I'm kinda beginner in Python (I know how to code simple algorithm and I k...

.

open lily Oct 31, 2024, 8:07 PM

#

❤️@cedar tusk thank you !

rich moth Oct 31, 2024, 10:19 PM

#

!paste

arctic wedgeBOT Oct 31, 2024, 10:19 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

trim saddle Nov 1, 2024, 12:03 AM

#

open lily Hello, I'm kinda beginner in Python (I know how to code simple algorithm and I k...

https://youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&si=ulIcs1F11JTlhN-b

The best from scratch tutorial out there, by Andrej Karpathy

YouTube

Neural Networks: Zero to Hero

fallow coyote Nov 1, 2024, 12:48 AM

#

I've been stuck in progressing my programming skills so Im going to go back to original goal of learning ML. The project I have in mind to make is to create a program that creates a heatmap of a fighters movement around the ring. What libraries do I need to utilise and what books or websites provides good tutorial in the tools I need?

main solar Nov 1, 2024, 2:10 AM

#

wow google gemini is helping a lot, I just typed a few words and it gave me everything

random sapphire Nov 1, 2024, 7:30 AM

#

can someone tell me what is the use of warnings library

arctic star Nov 1, 2024, 8:50 AM

#

can anyone help me with computer vision

#

i am facing an error in intensity values in threshold images

#

does anyone have good code where the intensity values from the threshold image can be stored in a numpy array? I am getting wrong values

ionic valley Nov 1, 2024, 10:17 AM

#

I'm a sophomore interested in ML. Should I learn Julia?

wooden sail Nov 1, 2024, 10:37 AM

#

if you like it, why not? the ecosystem is smaller than in python, but it attracts the mathy type more. as a result it has some functionality that python doesn't

bronze creek Nov 1, 2024, 10:51 AM

#

Hi everyone, I am new here
And do not know much about this community. I am a student in Data Science and in first semester of my study program. I want to know how many of you are new here and on the same stage. Let's collaborate.

Which one do you think is more in demand in Data Science.
Python or R?

ionic valley Nov 1, 2024, 11:10 AM

#

bronze creek Hi everyone, I am new here And do not know much about this community. I am a stu...

Python but you should learn both

ionic valley Nov 1, 2024, 11:12 AM

#

wooden sail if you like it, why not? the ecosystem is smaller than in python, but it attract...

I don’t know if I like it, I’ve never used it, what I mean is 1.) will it make me stand out and 2.) is it in demand

#

am I likely to get an ROI

#

these questions are pretty bad nvm

wooden sail Nov 1, 2024, 11:28 AM

#

ionic valley I don’t know if I like it, I’ve never used it, what I mean is 1.) will it make m...

it really depends on what your aim is, tbh. python is a lot more widespread and it's most likely what you'll use in practical settings unless you specifically look for something with julia

#

i wouldn't say it'll make you stand out nor is it in high demand, since frameworks for ML are huge and well established

long robin Nov 1, 2024, 11:54 AM

#

jaunty helm did you download pytorch with cuda? or did you download the cpu version if you d...

with cuda
It happens sometimes
Not all the times...

clever current Nov 1, 2024, 2:52 PM

#

I think you should know math and statistics

clever current Nov 1, 2024, 2:57 PM

#

hard mortar I made a python code to extract data from a website, but the code is taking a lo...

if you don't need updated data that often, could you set up a cron job maybe? Or a bash script. If you're running your code with databricks I think they have scheduled jobs

hard mortar Nov 1, 2024, 3:02 PM

#

agile cobalt do not extract from the website itself. Either use an API or look for data dump...

but what if the site doesn't allow API

bronze creek Nov 1, 2024, 5:13 PM

#

ionic valley Python but you should learn both

🙂

ripe pawn Nov 1, 2024, 5:43 PM

#

i created a sklearn cheatsheet, if someone is free, can they proofread and give me a bit of feedback on it.

agile cobalt Nov 1, 2024, 6:47 PM

#

hard mortar but what if the site doesn't allow API

either do not touch the site or contact the site owners to see if they can provide some alternative

agile cobalt Nov 1, 2024, 6:49 PM

#

ripe pawn i created a sklearn cheatsheet, if someone is free, can they proofread and give ...

kinda falls within self-promotion I guess... but just post here, that level should be fine

clever current Nov 1, 2024, 7:07 PM

#

ripe pawn i created a sklearn cheatsheet, if someone is free, can they proofread and give ...

sure share a link or image!

left tartan Nov 1, 2024, 7:40 PM

#

Would like to see too

fallow coyote Nov 1, 2024, 10:48 PM

#

is it better to split a datetime column into individual day, month and year columns rather than to convert a date column into datetime?

mint bobcat Nov 1, 2024, 10:51 PM

#

Usually is better to work with datetimes (and time zones)

fallow coyote Nov 1, 2024, 10:57 PM

#

i read some things online saying for ml, its better to split the datetime columns into separate columns as itll be faster. id assume itd be faster that way rather than converting the date column into datetime and reading it

#

if you get what I mean

agile cobalt Nov 1, 2024, 11:11 PM

#

for storing the data itself and for most transformations, analysis etc. it is better to store it as a datetime column

For feeding it into ml models, you must convert it to a number however - that isn't limited just to datetimes, but also every other type of data like strings and what not

For some models, you could just feed it the timestamp, but usually you'll want to split into somewhat relevant features instead of just feeding the timestamp
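
As a rough illustration of "splitting into relevant features", using only the standard library. Which features are actually relevant depends on the problem; the ones picked here and the helper name `datetime_features` are just assumptions for the sketch.

```python
from datetime import datetime


def datetime_features(ts):
    """Turn one datetime into plain numeric features a model can consume."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),
    }


# Keep the column as a datetime for storage and analysis,
# and derive the numeric features only when building model inputs.
ts = datetime(2024, 11, 1, 22, 48)
feats = datetime_features(ts)
print(feats)
```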

nocturne valley Nov 2, 2024, 12:49 AM

#

anyone here have experience with wav2vec2? Any pointers for the best repo to clone for a non-GPU multicore?

rich moth Nov 2, 2024, 3:38 AM

#

nocturne valley anyone here have experience with wav2vec2? Any pointers for the best repo to clo...

I got a little, Im building kind a Frankenstein web scraper, if you will. Anyways you can experiment with the audio processing component if you want. But I found some resources on huggingface.

#

!paste

rich moth Nov 2, 2024, 3:39 AM

#

You'll have to test and experiment with it yourself, as I'm busy: https://paste.pythondiscord.com/LHSQ
I hope this can help.

bitter garden Nov 2, 2024, 6:15 AM

#

fallow coyote i read some things online saying for ml, its better to split the datetime column...

If you want to use the time series data for ML, you can convert them into cyclic features like in the image. For usual data analysis pd.to_datetime(df["timestamp"]) should be enough.
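
A minimal sketch of a cyclic encoding in plain Python, assuming the usual sine/cosine trick (the helper name `cyclic_encode` is made up). The point is that hour 23 and hour 0 end up close together instead of 23 units apart.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic value (e.g. hour of day, period=24) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.dist(a, b)


hour_0 = cyclic_encode(0, 24)
hour_12 = cyclic_encode(12, 24)
hour_23 = cyclic_encode(23, 24)

# 23:00 is one hour away from 00:00, and the encoding reflects that.
print(distance(hour_23, hour_0))
print(distance(hour_12, hour_0))
```

The same function works for day of week (`period=7`) or month (`period=12`).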

cinder schooner Nov 2, 2024, 8:41 AM

#

hello, i want to learn more about cuda and what i can do with it. Would you have any great resources on that?

rigid cape Nov 2, 2024, 10:53 AM

#

Hey guys, Need ideas for ML projects for portfolio. If you guys have any lists do share. I'm looking for some ideas where I can make a small ML or DL model and then use it to make a web app and host it online.
Thank you.

quaint mulch Nov 2, 2024, 11:33 AM

#

rigid cape Hey guys, Need ideas for ML projects for portfolio. If you guys have any lists d...

https://paperswithcode.com/sota This is a good list of things that ML can do at the moment.

Papers with Code - Browse the State-of-the-Art in Machine Learning

12097 leaderboards • 5194 tasks • 10850 datasets • 146270 papers with code.

quaint mulch Nov 2, 2024, 11:34 AM

#

bitter garden If you want to use the time series data for ML, you can convert them into cyclic...

What timeseries stuff are you doing?

quaint mulch Nov 2, 2024, 11:37 AM

#

random sapphire its pxul1236 on github

Improve your presentation, turn it into a well written blog post or reports.

calm dome Nov 2, 2024, 11:44 AM

#

hello, how can i integrate whatsapp meta AI with my chats so that it can auto reply to customers

odd stratus Nov 2, 2024, 1:37 PM

#

anyone have any methods for removing poisoned data from a data set?

random sapphire Nov 2, 2024, 2:49 PM

#

quaint mulch Improve your presentation, turn it into a well written blog post or reports.

wdym can you elaborate

left tartan Nov 2, 2024, 3:03 PM

#

odd stratus anyone have any methods for removing poisoned data from a data set?

Depends on the data, for sure

#

Tell us more?

bitter garden Nov 2, 2024, 3:18 PM

#

quaint mulch What timeseries stuff are you doing?

Hourly crypto price prediction

random sapphire Nov 2, 2024, 3:20 PM

#

left tartan Depends on the data, for sure

whats a poisoned data?

left tartan Nov 2, 2024, 3:21 PM

#

random sapphire whats a poisoned data?

Not sure what they meant... to me, poisoned is intentionally malicious data intended to 'mislead' the training

#

As opposed to outliers or other 'bad' data that skew the results / don't represent 'truth'

odd stratus Nov 2, 2024, 3:22 PM

#

left tartan Not sure what they meant... to me, poisoned is intentionally malicious data inte...

yeah thats correct

im just working on an assignment and trying to find methods to implement, except google is completely trash and i can not find any reasonable method explanation

#

the data is a list of 78 values from 0-1 and then a column labelled "normal" or "attack"

random sapphire Nov 2, 2024, 3:23 PM

#

left tartan As opposed to outliers or other 'bad' data that skew the results / don't represe...

ohkk thank you ::D

odd stratus Nov 2, 2024, 3:23 PM

#

ive managed to remove 97% of the poisoned data, but it still causes issues

left tartan Nov 2, 2024, 4:11 PM

#

odd stratus ive managed to remove 97% of the poisoned data, but it still causes issues

What issues do you see, and what kind of data?

odd stratus Nov 2, 2024, 4:12 PM

#

left tartan What issues do you see, and what kind of data?

the ai never overfits, indicating that the data is reducing its accuracy (even when the model used is exceedingly large for the data)

#

the data is 78 columns of data of values from 0-1 representing different security features to detect either potential attacks or normal system functions
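One commonly described way to hunt for flipped or poisoned labels in data like this is a nearest-neighbour consistency check: flag any row whose label disagrees with most of its closest rows. The sketch below is only an illustration of that idea, not the method used in the assignment. The function name `knn_label_disagreements`, the toy two-column rows, and `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_label_disagreements(rows, labels, k=5):
    """Return indices whose label differs from the majority label of their k nearest rows."""
    suspects = []
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Euclidean distance to every other row (quadratic, acceptable for a sketch)
        dists = sorted(
            (math.dist(row, other), j)
            for j, other in enumerate(rows)
            if j != i
        )
        neighbour_labels = [labels[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority != label:
            suspects.append(i)
    return suspects

# Toy data (hypothetical): two tight clusters, one label looks flipped.
rows = [
    (0.10, 0.10), (0.12, 0.08), (0.09, 0.11), (0.11, 0.13),
    (0.90, 0.90), (0.88, 0.92), (0.91, 0.89), (0.93, 0.87),
]
labels = ["normal", "normal", "normal", "attack",
          "attack", "attack", "attack", "attack"]

suspects = knn_label_disagreements(rows, labels, k=3)
print(suspects)
```

Flagged rows are candidates for review rather than proof of poisoning: genuine attacks that sit close to normal traffic would be flagged too.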

agile cobalt Nov 2, 2024, 4:15 PM

#

odd stratus the ai never overfits, indicating that the data is reducing its accuracy (even ...

that sounds pretty weird?

Are you sure that it is not overfitting, or is it just not reaching a good accuracy?

Overfitting does not necessarily mean you have good accuracy, it just means you are paying too much attention to details in the training data that are not present in the test data
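To make that distinction concrete, here is a tiny sketch that looks at the gap between training and test accuracy instead of the training accuracy alone. The helper name `describe_fit` and both thresholds (`gap_tol`, `good_acc`) are arbitrary assumptions chosen for illustration.

```python
def describe_fit(train_acc, test_acc, gap_tol=0.05, good_acc=0.90):
    """Rough label for a model based on train/test accuracy (thresholds are illustrative)."""
    gap = train_acc - test_acc
    if gap > gap_tol:
        # Large train/test gap: the model learned training-set details
        return "overfitting"
    if train_acc < good_acc:
        # Small gap but weak training accuracy: the model is not fitting well at all
        return "underfitting"
    return "ok"

print(describe_fit(0.70, 0.50))  # mediocre accuracy, yet still overfitting
print(describe_fit(0.60, 0.59))  # low accuracy with no gap
```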

odd stratus Nov 2, 2024, 4:18 PM

#

heres a few examples of the training data btw

odd stratus Nov 2, 2024, 4:19 PM

#

agile cobalt that sound pretty weird? Are you sure that it is not overfitting, or is it just...

its not overfitting on training data, i.e. the data that is labelled

on smaller subsets, it can overfit, meaning that the subset doesnt have poison data, but it is also easier to overfit on a small subset
however after removing a large portion of potential poison data, the overall accuracy increases due to the reduced poison data, and trying to get it to overfit is very annoying, and the main goal is to just remove the poison data lmao

odd stratus Nov 2, 2024, 4:20 PM

#

agile cobalt that sound pretty weird? Are you sure that it is not overfitting, or is it just...

the test data and training data have identical data input, other than the label of course

velvet mountain Nov 2, 2024, 6:27 PM

#

what are people usually using for monitoring deployed models? I'm mainly interested in a tool that can decide that the model is performing badly and send an alert to someone (not necessarily an automatic retrain). I know mlflow a bit, so if there is a solution with it I'm interested. but I'm also interested in knowing the heuristic that is actually used.

cedar tusk Nov 2, 2024, 6:28 PM

#

velvet mountain what are people usually using for *monitoring deployed model* ? I'm typically in...

how would you know if the model performs badly?

velvet mountain Nov 2, 2024, 6:29 PM

#

yes that's my question. what are people typically using for that?

#

I guess there is a need for a kind of feedback loop, where the model predicts and the actual value is sent back for the model to know it performed badly. but I'm wondering if people have a dedicated tool for that, or if it's home-made

cedar tusk Nov 2, 2024, 6:30 PM

#

tensorboard is used widely afaik

#

but different platforms have their own equivalent

velvet mountain Nov 2, 2024, 6:31 PM

#

I've looked into mlflow but I couldn't find a way to really feed a deployed model with actual values, and code a "I'm obsolete" trigger

cedar tusk Nov 2, 2024, 6:32 PM

#

velvet mountain I've looked into mlflow but I couldn't find a way to really feed a deployed mode...

models dont get obsolete automatically

#

people decide that

velvet mountain Nov 2, 2024, 6:32 PM

#

yeah yeah ofc ; I meant, just the idea of sending alerts

#

but if I get you right, it's just another tool that would compare the predicted vs actual

cedar tusk Nov 2, 2024, 6:33 PM

#

i dont see how a model can get obsolete

#

because you should build the model with everything in mind anyways

#

if you are trying to train the model continuously, you have checkpoints

#

aka u train the model every 1 month or whatever

velvet mountain Nov 2, 2024, 6:34 PM

#

it's more about data drift

cedar tusk Nov 2, 2024, 6:34 PM

#

population distributions dont change willy-nilly, unless a major event occurs

#

but people would handle the data change then anyways

velvet mountain Nov 2, 2024, 6:35 PM

#

use case: some model is trained on a production line

#

and then the production machines get older as time goes

#

so the model starts to produce bad results, because the entire pipeline of machines do not perform the same

#

that's the kind of thing I have in mind

#

---- doesn't sound like continuous training because somehow, the "past" data is not really relevant anymore for the prediction at present-time

cedar tusk Nov 2, 2024, 6:36 PM

#

velvet mountain and then the production machines get older as time goes

I see what you mean, you want an alert that will say if the observations dont match the training set no more

velvet mountain Nov 2, 2024, 6:36 PM

#

yes

#

well, not just the observation X in f(X)=Y

#

really the use case: X was sent, f(X) was predicted, but we got Y. and it's been N times we actually badly failed f(X) >> Y (for example) : so maybe it's time to retrain
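The "it's been N times we badly failed" heuristic described here can be sketched as a sliding window over prediction/outcome pairs. This is a home-made illustration, not the API of mlflow or any other tool. The class name `FailureMonitor` and the values for `tolerance`, `max_failures`, and `window` are assumptions.

```python
from collections import deque

class FailureMonitor:
    """Raise an alert when too many recent predictions miss by more than a tolerance."""

    def __init__(self, tolerance, max_failures, window):
        self.tolerance = tolerance
        self.max_failures = max_failures
        self.recent = deque(maxlen=window)  # True marks a bad miss

    def record(self, predicted, actual):
        """Store one (f(X), Y) pair; return True when the alert should fire."""
        self.recent.append(abs(predicted - actual) > self.tolerance)
        return sum(self.recent) >= self.max_failures

monitor = FailureMonitor(tolerance=1.0, max_failures=3, window=5)
pairs = [(10.0, 10.2), (10.0, 12.5), (10.0, 13.0), (10.0, 9.9), (10.0, 14.0)]
alerts = [monitor.record(p, a) for p, a in pairs]
print(alerts)
```

In a real deployment the `True` result would be wired to whatever notification channel is in use, and the ground-truth value `Y` has to arrive through some feedback path, which is usually the hard part.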

cedar tusk Nov 2, 2024, 6:37 PM

#

you need an observation group and do hypothesis testing on it automatically

#

shouldnt be too hard to implement
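As a rough sketch of what "hypothesis testing on an observation group" could look like, the snippet below compares a reference sample from training time with a recent sample using a two-sample Kolmogorov-Smirnov statistic, written with the standard library only. The function names and the toy samples are assumptions. The threshold uses the usual large-sample approximation for the KS critical value, so it is only indicative for small groups.

```python
import math
from bisect import bisect_right

def ks_statistic(reference, recent):
    """Largest gap between the empirical CDFs of the two samples."""
    ref, rec = sorted(reference), sorted(recent)
    d = 0.0
    for x in ref + rec:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_rec = bisect_right(rec, x) / len(rec)
        d = max(d, abs(cdf_ref - cdf_rec))
    return d

def drift_detected(reference, recent, alpha=0.05):
    """True when the KS statistic exceeds the approximate critical value at level alpha."""
    n, m = len(reference), len(recent)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical

# Toy one-feature samples (hypothetical)
reference = [i / 100 for i in range(100)]
similar = [(i + 0.5) / 100 for i in range(100)]
shifted = [0.5 + i / 100 for i in range(100)]

print(drift_detected(reference, similar))
print(drift_detected(reference, shifted))
```

The same check can be run on the model's errors instead of on an input feature, which is closer to the "f(X) versus Y" case raised above.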

velvet mountain Nov 2, 2024, 6:38 PM

#

no no for sure ; I'm just wondering if there is a tool that is commonly used for that

#

I mean, an "mlops" tool ; not a lib to do hypothesis testing 😄

cedar tusk Nov 2, 2024, 6:40 PM

#

from what i see, Evidently AI is not bad

#

take it with a grain of salt tho, never used the thing

cinder charm Nov 2, 2024, 7:52 PM

#

matplotlib is giving me a circular import

#

how do I fix this

#

The error code is

cannot import name '_version' from partially initialized module 'matplotlib'

tidal bough Nov 2, 2024, 8:58 PM

#

cinder charm matplotlib is giving me a circular import

typically the only reason you'd get a circular import when using a third-party library is if you accidentally named a file something like matplotlib.py, and hence aren't importing what you think you're importing
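A quick way to test the shadowing theory without triggering the failing import is to ask Python which file it would load for a module name. The sketch below uses only the standard library. The helper names `module_origin` and `is_shadowed_locally` are made up for this example, and the `site-packages` check is a simplifying assumption to avoid flagging a virtual environment that lives inside the project folder.

```python
import importlib.util
from pathlib import Path

def module_origin(name):
    """Return the file Python would load for `name`, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

def is_shadowed_locally(name, project_dir="."):
    """True if `name` resolves to a file inside project_dir rather than an installed package."""
    origin = module_origin(name)
    if origin is None or not Path(origin).is_file():
        return False  # not found, or a built-in/frozen module
    resolved = Path(origin).resolve()
    if "site-packages" in resolved.parts:
        return False  # installed package, e.g. inside a local virtual environment
    return Path(project_dir).resolve() in resolved.parents

print(module_origin("matplotlib"))        # a path ending in your own matplotlib.py would be the culprit
print(is_shadowed_locally("matplotlib"))
```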

cinder charm Nov 2, 2024, 9:05 PM

#

tidal bough typically the only reason you'd get a circular import when using a third-party l...

the issue has been solved, it was a broken install, but it was throwing the wrong error for some reason

compact valley Nov 2, 2024, 10:02 PM

#

Is data engineering part of data science, or should it go in another channel?

delicate apex Nov 3, 2024, 2:48 AM

#

!rule ad
also, nice job disclosing your conflict of interest there

arctic wedgeBOT Nov 3, 2024, 2:48 AM

#

Rules

6. Do not post unapproved advertising.

left tartan Nov 3, 2024, 3:02 AM

#

Please DM modmail and explain why this ad was posted?

neat violet Nov 3, 2024, 3:04 AM

#

@left tartan for helping students

left tartan Nov 3, 2024, 3:05 AM

#

neat violet @left tartan for helping students

Is this part of some Microsoft program? Are you being asked to post this

neat violet Nov 3, 2024, 3:05 AM

#

No

left tartan Nov 3, 2024, 3:07 AM

#

neat violet No

Don't post ads in this server, anywhere. Read our rules before posting: #rules

neat violet Nov 3, 2024, 3:07 AM

#

Okay

thorny geode Nov 3, 2024, 3:42 AM

#

hey, is the Stat110 course really that important in data science?

#

i've read through bayesian statistics and i feel its not being used much compared to just learning about t-test

alpine aspen Nov 3, 2024, 4:02 AM

#

https://dev.to/kylepena/the-unreasonable-usefulness-of-npeinsum-2cj1 <-- blog post I wrote about tensor operations

DEV Community

The Unreasonable Usefulness of numpy's einsum

Introduction I'd like to introduce you to the most useful method in Python,...

alpine aspen Nov 3, 2024, 4:08 AM

#

thorny geode i've read through bayesian statistics and i feel its not being used much compare...

FWIW I've used bayesian statistics pretty extensively. I've used naive bayes and complement naive bayes to do resume-to-job matching and it worked quite well. I've also implemented PyMC models for Amazon, and sometimes I think about things in terms of priors, evidence and posteriors

thorny geode Nov 3, 2024, 4:18 AM

#

alpine aspen FWIW I've used bayesian statistics pretty extensively. I've used naive bayes an...

oh.. it will take me a while to learn data analysis

thorny geode Nov 3, 2024, 4:19 AM

#

alpine aspen https://dev.to/kylepena/the-unreasonable-usefulness-of-npeinsum-2cj1 <-- blog po...

noice

wooden sail Nov 3, 2024, 5:19 AM

#

alpine aspen https://dev.to/kylepena/the-unreasonable-usefulness-of-npeinsum-2cj1 <-- blog po...

i think you should spend some time talking about the greedy optimizer for einsum's order of operations. if you just use it naively, especially for things like matrix multiplication whose base routine is super well optimized, einsum is just way slower
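For reference, `np.einsum` leaves contraction-order optimization off by default (`optimize=False`). Passing `optimize="greedy"` asks NumPy to choose a pairwise contraction order, and `np.einsum_path` shows what it chose. The sketch below only checks that the results agree. The array shapes are arbitrary assumptions, and actual speed differences depend on sizes and the BLAS build, so no timings are claimed here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 64))
B = rng.standard_normal((64, 64))
C = rng.standard_normal((64, 4))

# Default call: no contraction-order optimization
naive = np.einsum("ij,jk,kl->il", A, B, C)

# Let NumPy pick a pairwise contraction order greedily
optimized = np.einsum("ij,jk,kl->il", A, B, C, optimize="greedy")

# einsum_path returns the chosen order plus a human-readable cost report
path, report = np.einsum_path("ij,jk,kl->il", A, B, C, optimize="greedy")

reference = A @ B @ C
print(path)
print(np.allclose(naive, reference), np.allclose(optimized, reference))
```

A precomputed `path` can be passed back as `optimize=path` to avoid repeating the search on every call.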

alpine aspen Nov 3, 2024, 5:20 AM

#

wooden sail i think you should spend some time talking about the greedy optimizer for einsum...

Thanks, I'll look into that!

#

That's a great bit of feedback - wasn't aware of that aspect of np.einsum

wooden sail Nov 3, 2024, 5:23 AM

#

yep. especially for your example of multihead attention, you might find that einsum without optimization is slower than just using a for loop with regular multiplications

#

and then if you look into the automatic broadcasting behavior of @ and .dot, one of the two should already do slicewise matmul, and this will definitely be faster than einsum without the opt

#

but yeah, einsum is the best thing since sliced bread and everyone should use it 😌

alpine aspen Nov 3, 2024, 5:26 AM

#

well now i'm doubting that. i think what this means is i need to sit down and read the implementation for this optimization stuff

wooden sail Nov 3, 2024, 5:26 AM

#

heh

#

you can also drive the point home by saying that pytorch, tf, and jax all have einsum too

alpine aspen Nov 3, 2024, 5:27 AM

#

i mention it briefly at the start but it wouldn't hurt repeating it

#

there's also C vs FORTRAN order and things like that that can matter quite a bit

wooden sail Nov 3, 2024, 5:27 AM

#

that's exactly why the optimizer is important

#

the memory layout changes how fast the multiplication is depending on the order

alpine aspen Nov 3, 2024, 5:28 AM

#

is it optimizing more for coherence in memory access or for space complexity stuff with interstitial allocations

#

or both

#

i guess i just need to read it

#

thanks again, heading to bed. i've got a few other interesting blog posts as well if you want to check them out (although the deepfakes one needs another editing pass)

wooden sail Nov 3, 2024, 5:30 AM

#

here we go, found it in matmul

#

https://numpy.org/devdocs/reference/generated/numpy.matmul.html

fleet glade Nov 3, 2024, 5:31 AM

#

I want to start learning motion detection. I'd like to build something like a motion-tracking fitness app, but I have no knowledge about any of it.
Can anyone guide me on where to start and what to learn, and if possible share some Coursera links for learning those things?

alpine aspen Nov 3, 2024, 5:35 AM

#

wooden sail here we go, found it in matmul

The only advantage I can think of, then, is that if you use einsum you don't necessarily have to transpose K before using it

wooden sail Nov 3, 2024, 5:35 AM

#

for this operation, yes
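The two options from this exchange can be put side by side for the attention-score step. `@` (that is, `np.matmul`) broadcasts over the leading head axis but needs K's last two axes swapped, while `einsum` names the contracted index directly. The shapes and variable names below are assumptions for illustration, not taken from the blog post.

```python
import numpy as np

rng = np.random.default_rng(0)
heads, seq, dim = 4, 6, 8
Q = rng.standard_normal((heads, seq, dim))
K = rng.standard_normal((heads, seq, dim))

# matmul treats the leading axis as a batch of matrices; K must be transposed per head
scores_matmul = Q @ np.swapaxes(K, -1, -2)

# einsum: contract over d for each head h, no explicit transpose needed
scores_einsum = np.einsum("hqd,hkd->hqk", Q, K)

print(scores_matmul.shape)
print(np.allclose(scores_matmul, scores_einsum))
```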

tawdry sundial Nov 3, 2024, 11:38 AM

#

this makes no sense
```py
import torch
from torch import nn

# model, X_train and y are defined earlier
loss_fn = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

epochs = 5
for i in range(epochs):
    model.train()
    y_pred = model(X_train)
    loss_score = loss_fn(y_pred, y)
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss_score.backward()
    optimizer.step()
```

#

error at loss_fn(y_pred, y)

TypeError: 'int' object is not callable
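The thread does not establish what caused this error, so the following is only one common possibility: a name that used to hold a callable gets rebound to an integer somewhere else (for example in an earlier notebook cell), and the later call fails with exactly this message. The sketch reproduces that in plain Python. `l1_loss` is a hypothetical stand-in, not the PyTorch class.

```python
def l1_loss(pred, target):
    """Mean absolute error over two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

loss_fn = l1_loss
ok = loss_fn([1.0, 2.0], [1.5, 2.5])  # works: loss_fn is a function

loss_fn = 0  # the name is reused for a number, e.g. as a running total
try:
    loss_fn([1.0, 2.0], [1.5, 2.5])
except TypeError as exc:
    message = str(exc)

print(ok)
print(message)
```

Printing `type(loss_fn)` right before the failing line is a quick way to check whether this is what happened.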

wooden sail Nov 3, 2024, 11:40 AM

#

it does make sense, what is nn.L1Loss() and what does it return?

#

you probably meant to assign it without the () instead of calling the function in the very first line

tawdry sundial Nov 3, 2024, 11:42 AM

#

didnt work

#

I am pretty sure that L1Loss is a class

#

#

yea, doesnt make sense

wooden sail Nov 3, 2024, 11:43 AM

#

show the error message

tawdry sundial Nov 3, 2024, 11:43 AM

#

with () or without?

wooden sail Nov 3, 2024, 11:44 AM

#

with, and show the full traceback