#data-science-and-ml


river cape
#

Btw do we as ML engineers require DSA?

wooden sail
#

yes, among other things. a really important skill is being able to determine which problems merit using deep learning in the first place, because many (most, even) don't

#

DSA gets you familiar with common problems that already have good solutions, vs those that don't

#

you generally wanna have good familiarity with classical problem-solving and optimization methods, which includes DSA and more

charred egret
#

Yeah like using NLP when a simple regex will suffice

desert oar
#

for engineering specifically you might also actually want to know about things like space-time tradeoffs and avoiding accidentally quadratic/exponential algorithms
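A minimal sketch of the accidentally-quadratic pattern and the space-time tradeoff that fixes it. The function names and data are illustrative, not from anyone's codebase. `x in some_list` scans the list, so doing it inside a loop costs O(N·M), while paying O(M) extra memory for a set makes each lookup O(1) on average.

```python
def common_items_quadratic(a, b):
    # `x in b` scans the whole list, so this is O(len(a) * len(b)) comparisons
    return [x for x in a if x in b]


def common_items_linear(a, b):
    # Spend O(len(b)) extra memory on a set; each lookup is then O(1) on average,
    # so the total is O(len(a) + len(b)): a classic space-time tradeoff
    b_set = set(b)
    return [x for x in a if x in b_set]


a = list(range(0, 2000, 2))  # even numbers
b = list(range(0, 2000, 3))  # multiples of 3
shared = common_items_linear(a, b)  # multiples of 6, in the order they appear in a
```

Both functions return the same list; only the second one stays fast as the inputs grow.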

wooden sail
#

it also helps you detect when your deep learning algo is shit

river cape
#

Ahhh man need to learn that also now

desert oar
#

example: finding the maximum in a list is O(N), but sorting a list is usually something like O(N log N). Using that knowledge, we were able to make our database queries run faster by switching from 1 = row_number() over (partition by i order by y desc) to y = max(y) over (partition by i)
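A sketch of the same idea in plain Python, assuming the goal is the usual "keep the row with the largest y in each group i" query. The data and function names are made up for illustration. The sort-based version mirrors a row_number() filter and is O(N log N); the running-maximum version needs a single O(N) pass.

```python
from itertools import groupby
from operator import itemgetter

# (group i, value y) rows; illustrative data
rows = [("a", 3), ("b", 7), ("a", 9), ("b", 1), ("c", 5)]


def top_per_group_sorted(rows):
    # Like filtering on row_number() over (partition by i order by y desc) = 1:
    # sort by group, then by y descending, and keep the first row of each group. O(N log N)
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    return {key: next(group) for key, group in groupby(ordered, key=itemgetter(0))}


def top_per_group_max(rows):
    # Like filtering on y = max(y) over (partition by i):
    # a single pass that tracks the running maximum per group. O(N)
    best = {}
    for row in rows:
        key, y = row
        if key not in best or y > best[key][1]:
            best[key] = row
    return best
```

One difference from SQL: a y = max(y) filter keeps every row tied for the maximum, whereas both functions here keep a single row per group. Whether a given database actually skips the sort also depends on its query planner.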

#

I also had some code running in production that was accidentally quadratic; we had to switch from Pandas vectorized operations to a hand-written for loop

#

it also comes up in interviews

river cape
#

should I increase the complexity?

cerulean kayak
#

For anyone else using Google Colab, particularly for image processing and building image-based CNN models:

Under "Change runtime type", which do you use:

  • CPU
  • T4 GPU
  • TPU v2-8

please @ me if you respond.
serene scaffold
nova matrix
faint quail
#

made this using my own custom neural network library

cerulean kayak
faint quail
mint plume
#

I'm doing a project right now, and the issue is that I have two datasets that are related but have different granularities: one has data collected every hour and the other every 3 hours. I want to combine them into a good design matrix, but I'm not really sure what to do here.

signal escarp
#

Hello guys, I have a project combining reinforcement learning and supervised learning with a zigzag algorithm that extracts the top and bottom points in trading, but I have some issues with it.

For supervised learning, it overfits far too much: it goes from 10k to millions on the training data but performs very badly on any other test data. I used some dropout and tried increasing the batch size from 256 to 512 and even 1024, but it still has the same problem (I'll try running 4096 for a few days, but I don't think that will solve it).

So I know I have to use this model as a base policy (at least it will help a lot on the training data for RL) and run it in the RL model, but the RL model has some problem I haven't found: it's not learning at all. I debugged the rewards in a well-performing session and noticed that even though everything is strongly positive, the reward is generally around -4 to -7, and it's similar in a badly performing session. That could be the problem, but I don't actually know.

So I'm looking for someone who can help me with this so we can solve these issues together. If we can fix them, we'll have a great-performing model; I'm planning to use it myself for automated investing, and I think it would be great for both sides. So where can I find someone to work on these issues with, or is anyone interested?

mint plume
#

The closest thing I've done is imputation.
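Imputation is one option. Another common approach is to bring both sources onto the coarser 3-hour grid by aggregating the hourly data, then join on the shared timestamps. A minimal stdlib sketch, assuming each source is a dict mapping timestamps to one numeric value and that the 3-hour windows start at midnight (the function names are hypothetical; in pandas, `DataFrame.resample` followed by a merge does the same job):

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def to_3h_bucket(ts):
    # Floor a timestamp to the start of its 3-hour window (00:00, 03:00, 06:00, ...)
    return ts.replace(hour=ts.hour - ts.hour % 3, minute=0, second=0, microsecond=0)


def downsample_hourly(hourly):
    # Average the hourly values that fall into each 3-hour window
    buckets = defaultdict(list)
    for ts, value in hourly.items():
        buckets[to_3h_bucket(ts)].append(value)
    return {ts: mean(values) for ts, values in buckets.items()}


def join_on_3h(hourly, three_hourly):
    # Inner join: keep only the windows present in both sources
    coarse = downsample_hourly(hourly)
    return [(ts, coarse[ts], three_hourly[ts]) for ts in sorted(three_hourly) if ts in coarse]


start = datetime(2024, 1, 1)
hourly = {start + timedelta(hours=h): float(h) for h in range(6)}  # 00:00 .. 05:00
three_hourly = {start: 10.0, start + timedelta(hours=3): 20.0}
rows = join_on_3h(hourly, three_hourly)  # one row per 3-hour window
```

Averaging is an assumption; a sum, the last value, or min/max may fit better depending on what each column measures. Going the other way (upsampling the 3-hour data to hourly with forward-fill or interpolation) keeps more rows but fills in values that were never measured.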

faint quail
# signal escarp Hello guys, I have a project about the reinforcement learning and supervised lea...

This is very much an outsider's view, but to me the stock market seems incredibly random and not something you can easily predict, especially without enterprise-grade hardware to mine data and train giant models. I don't think the model is underperforming; I think the task of predicting the stock market is impossible, or at best needs extremely powerful hardware

#

It would be like training a model to predict the output of a hashing algorithm such as SHA-256: technically it's possible since it's deterministic, but in practice it's not

#

idk tho I don't know a lot about finance

signal escarp
# faint quail This is a very outside look but to me the stock market seems like something incr...

Generally the market is not that random, and I believe it could be solved at a level where a machine can understand the complex patterns. I'm not trying to build a model that performs well in the market on my own computer. First I have to get an algorithm that performs well on small data (it should perform well on any kind of similar data), and after that I will use strong servers like 8x H200. I believe it will end in a good model if we have a great training algorithm and a complex enough NN

cerulean kayak
faint quail
untold fable
#

what is the difference between standardization and normalization?
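The terms are used loosely, but in the most common usage standardization means rescaling to zero mean and unit standard deviation (z-scores), and normalization means min-max rescaling into [0, 1] (sometimes it instead means scaling vectors to unit norm). A minimal sketch of the first two, with made-up example data:

```python
from statistics import mean, pstdev


def standardize(xs):
    # z-score: subtract the mean, divide by the (population) standard deviation
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max_normalize(xs):
    # linear rescale so the minimum maps to 0 and the maximum to 1
    # (undefined if all values are equal, since hi - lo would be 0)
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(data)      # mean 0, standard deviation 1
scaled = min_max_normalize(data)  # every value in [0, 1]
```

Standardization doesn't bound the range; min-max scaling does, but it's sensitive to outliers because a single extreme value sets the min or max. In practice the statistics should be computed on the training split only.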

quaint mulch
#

what are you into?

onyx frigate
marsh marsh
#

kaggle datasets list -s retail-orders
^
SyntaxError: invalid syntax

can anyone help me with this ??

austere swift
#

that's a command for your terminal, not python

#

so command prompt if you're on windows or whatever shell/terminal you use if you're on linux

marsh marsh
#

ahh no, I was using the Kaggle API to download a dataset in my Jupyter notebook extension in VS Code

#

but it's solved, thanks for the information tho

onyx frigate
#

I have a question: how is it that Mixtral outperforms Llama 2 and GPT-3.5 in some benchmarks even though it has 56 billion parameters, whereas Llama and GPT have trillions??

grand breach
#

what can i infer from this:

i've found the mean CTR of the entire dataset and tried to find out which categories of a categorical feature have a CTR above that mean

do those categories play an important role in what the model learns? what if a feature has only 1 or 2 categories with a CTR above the mean? do they have any impact on the patterns my model learns?
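Comparing per-category CTR with the global mean is essentially the idea behind target (mean) encoding. A category whose CTR differs from the global mean suggests the feature carries some signal, but a category with only a handful of rows can land above the mean by chance. A minimal sketch, assuming 0/1 click labels; the additive-smoothing `prior_weight` parameter is an illustrative choice, not something from the question:

```python
from collections import Counter


def category_ctr(categories, clicks, prior_weight=0.0):
    """Return (global CTR, {category: CTR}) for 0/1 click labels.

    prior_weight > 0 applies additive smoothing: each category behaves as if it had
    `prior_weight` extra impressions at the global CTR, which pulls rare categories
    toward the mean while barely moving frequent ones.
    """
    global_ctr = sum(clicks) / len(clicks)
    impressions = Counter(categories)
    clicked = Counter(c for c, y in zip(categories, clicks) if y)
    per_category = {
        c: (clicked[c] + prior_weight * global_ctr) / (n + prior_weight)
        for c, n in impressions.items()
    }
    return global_ctr, per_category


cats = ["a", "a", "a", "a", "b", "b", "b", "b", "c"]
clicks = [1, 1, 1, 0, 0, 0, 1, 0, 1]

global_ctr, raw = category_ctr(cats, clicks)
_, smoothed = category_ctr(cats, clicks, prior_weight=4.0)
above_mean = sorted(c for c, ctr in raw.items() if ctr > global_ctr)
```

Here "c" looks like the best category with a raw CTR of 1.0, but it has a single impression; smoothing pulls it back toward the global mean. These statistics should be computed on training data only, otherwise the target leaks into the features.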

hearty token
#

I have an exam question like this. I would like to segment each question into its own snippet, sort of like taking a separate screenshot of each question. How can I go about automating this with A.I.? I have thought about using edge detection of some sort

vestal spruce
trim forum
#

basically I would use the OpenAI API

#

it's fire

vestal spruce
#

@hearty token
Assuming every question on the test has the same width, you could start by finding that fixed width to define a dead zone for number detection: if a number falls inside the dead zone (the width range of the questions), exclude it, so only numbers on the left-hand side of that range are detected.

#

from there we can pad each question number's coordinate vertically (basically shifting it upward), then take the vertical distance between consecutive question numbers, on the condition that both numbers sit at roughly the same x coordinate (horizontally)
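A minimal sketch of that approach, assuming some upstream step (OCR or a detector, not shown) already returns the (x, y) top-left corner of every detected number on a page. The function name, the `left_margin` dead-zone threshold and the `pad` value are all hypothetical:

```python
def question_bands(number_boxes, page_height, left_margin, pad=10):
    """Turn detected question-number positions into vertical crop bands.

    number_boxes: (x, y) top-left corners of detected numbers (hypothetical OCR output).
    left_margin: numbers with x >= left_margin fall in the "dead zone" of the question
        body and are ignored, so only the left-hand question numbers are kept.
    Returns one (top, bottom) pixel range per question, ordered top to bottom.
    """
    starts = sorted(y for x, y in number_boxes if x < left_margin)
    bands = []
    for i, y in enumerate(starts):
        top = max(0, y - pad)  # shift upward a little so the number itself is included
        # a question ends just above where the next one starts, or at the page bottom
        bottom = starts[i + 1] - pad if i + 1 < len(starts) else page_height
        bands.append((top, bottom))
    return bands


# (300, 250) is a number inside a question's text, so it lands in the dead zone
boxes = [(40, 100), (40, 420), (300, 250), (42, 760)]
bands = question_bands(boxes, page_height=1100, left_margin=120)
```

Each (top, bottom) band could then be cut out with Pillow's `Image.crop((0, top, page_width, bottom))`. A fixed dead-zone threshold only works if the layout is as regular as assumed above.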

granite canopy
#

hello everyone, I'm new here, from Cameroon. I like Python programming but have never done it before. I'd like you all to help me start programming like the people here, thanks..

arctic wedgeBOT
#

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

hearty token
thorny geode
rain glen
#

Fine-tuning an OpenAI model with data from Hugging Face for the first time, and I'm using a 5 GB file of chess moves lol. Gonna have the ultimate (mediocre) chess bot

vestal spruce
#

Guys, I have this distribution graph of the cumulative win/loss of a technical indicator across every stock in my market. Is it a good idea to normalize it and filter out the outliers?

quaint mulch
#

use more bins?

vestal spruce
quaint mulch
vestal spruce
gentle storm
#

Hi, does anyone know a good program to start using Python in? I want to make neural networks and simulations

vestal spruce
onyx frigate
rain glen
#

Stockfish to this is the Magnus Carlsen of the kindergarten

quaint mulch
river cape
grand breach
#

does it make any sense to standardize after hash-encoding categorical variables? i saw someone do this in their analysis (not criticising their work at all). my understanding is that if the features don't span different ranges, scaling might not be necessary, but mainly, doing this on categorical features makes no sense to me

mellow vector
#

jupyter always duplicates images I paste into markdown cells, what am I doing wrong?

serene scaffold
mellow vector
#

there's a help topic open

serene scaffold
mellow vector
#

ya sorry, was back and forth between ds and general

lilac lichen
#

is there any news related to KAN?

odd meteor
twilit sable
#

Which ML framework should I learn as a beginner so that I can find a job quickly??? I mean TF or PyTorch

agile cobalt
#

learning a framework is less than 20% of the work
the vast majority of what you have to learn is more about math and statistics rather than programming itself

serene scaffold
serene scaffold
twilit sable
twilit sable
#

So should i learn pytorch as a beginner

serene scaffold
#

Nope.

twilit sable
#

Why??

serene scaffold
#

Because the framework doesn't matter. You need to learn the concepts.

#

ML has less to do with the actual code implementation than does general SWE

twilit sable
#

Ik

#

Ahh. But the good courses online are for TF, not PyTorch. Why??

#

@serene scaffold

faint quail
#

built my own crappy framework

#

it's not impossible to be self-taught tho

#

just learn the concepts and understand WHY they work

twilit sable
#

Can anyone help me ??

faint quail
#

do what?

twilit sable
#

Plz I am confused tf or pyt ???

#

Tensorflow or pytorch

#

I am confused

faint quail
#

pytorch is more popular nowadays

#

so I recommend that

twilit sable
#

From where should I learn ml with it ??

#

There are no good courses??

#

All good courses are of tf

#

I also want to learn pytorch

#

@faint quail

faint quail
# twilit sable From where should I learn ml with it ??

What are the neurons, why are there layers, and what is the math underlying it?
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

Exploring how neural networks learn by programming one from scratch in C#, and then attempting to teach it to recognize various doodles and images.

Source code: https://github.com/SebLague/Neural-Network-Experiments
Demo: https://sebastian.itch.io/neural-network-experiment

In this video we'll create a Convolutional Neural Network (or CNN), from scratch in Python. We'll go fully through the mathematics of that layer and then implement it. We'll also implement the Reshape Layer, the Binary Cross Entropy Loss, and the Sigmoid Activation. Finally, we'll use all these objects to make a neural network capable of classif...

#

these are really good and how I learned

#

no crappy commentary with a cheap mic

#

just gets straight to the point

#

while still giving you a general understanding

twilit sable
#

These are all about neural network

faint quail
#

u don't have to watch the entire 3Blue1Brown playlist tho

twilit sable
#

Not pytorch

#

Dude

faint quail
#

do you know python already?

twilit sable
faint quail
#

just use whichever is easiest

twilit sable
faint quail
#

if you understand the concept, it generally doesn't matter what framework you use

#

just read the docs

twilit sable
#

@faint quail Bro this is the most shitty person I have ever seen -> @serene scaffold

serene scaffold
#

You're right. I'm the worst of all possible humans.

faint quail
#

lol

#

better to just help them instead of putting them down

#

but he is kind of right because you're likely not gonna get hired unless you really understand the advanced calculus concepts of A.I

#

or at least to some degree the math for training A.I.

serene scaffold
left tartan
iron basalt
faint quail
quaint mulch
elder coyote
#

Hello, is there an algorithm for finding exact image matches without false positives when comparing two images? I've used cross-correlation with pHash, but I still get false positives

#

i use them both to filter out the images

#

at the same time

agile cobalt
#

for "exact" matches, that is the only way
if you want similar images, it depends heavily on your definition of how "similar" the images should be
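For the exact case, one option is to compare the decoded pixels directly, for instance through a cryptographic hash of them: identical pixels always give the same digest, and any changed pixel gives a different one (collisions are practically impossible with SHA-256). A minimal sketch, assuming the images are already decoded into raw bytes; with Pillow these would come from `img.size`, `img.mode` and `img.tobytes()`. The function names are illustrative:

```python
import hashlib


def pixel_fingerprint(width, height, mode, pixel_bytes):
    # Hash the decoded pixels plus size and mode: files that differ only in metadata
    # or lossless compression still match, but any changed pixel gives a new digest
    h = hashlib.sha256()
    h.update(f"{width}x{height}:{mode}".encode())
    h.update(pixel_bytes)
    return h.hexdigest()


def exact_duplicates(images):
    # images: {name: (width, height, mode, pixel_bytes)} -> groups of identical images
    groups = {}
    for name, image in images.items():
        groups.setdefault(pixel_fingerprint(*image), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


images = {
    "a.png": (2, 1, "RGB", bytes([255, 0, 0, 0, 255, 0])),
    "copy.png": (2, 1, "RGB", bytes([255, 0, 0, 0, 255, 0])),
    "tweaked.png": (2, 1, "RGB", bytes([255, 0, 0, 0, 254, 0])),  # one channel off by 1
}
duplicates = exact_duplicates(images)
```

pHash can still serve as a fast prefilter, with the exact comparison as the final check, which removes false positives for the "exact" case. Note that re-saving with a lossy format such as JPEG changes the pixels, so such copies won't count as exact matches.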

terse mirage
#

hi, I'm a complete beginner in machine learning and am trying to implement a neural network from scratch, but i seem to be getting 'weird' results, could anyone go over my code once?

#

i am randomizing weights and biases every time i run the program (i haven't completely implemented backpropagation yet), and sometimes i get this output, where the first 0 is the calculated y value and the second 0 is the error of the last layer

#

my network has 4 layers with 1,2,5 and 1 neurons respectively

timber citrus
#

Hello everyone, we are developing an FAQ chatbot to assist students with their enrollment by answering their queries. We have utilized a pre-trained BART model for the chatbot. My question is: how can I make the chatbot context-aware, so it can understand and maintain the context of the conversation? Additionally, any advice on developing a chatbot that specializes in answering questions would be greatly appreciated.

graceful niche
#

anyone got a good book for building models from 0? neural networks or ML

#

or just some good in depth guide

timber citrus
#

Any answer will help us plan our next step because we are stuck 😭

cold estuary
#

Guys, I am doing a major project on text-to-floorplan using GANs. I already have a dataset, but it only has the floor plan layout, like in the picture I attached. My professor told me to include beds, dining tables, doors, and other things that go in the respective rooms. So does anyone have any idea whether we can modify the dataset using a GAN itself?

quaint mulch
quaint mulch
odd meteor
# lilac lichen where i can read more?

If it were up to me, I'd say KAN currently looks like a nice interpretable model to play with on toy examples, but it hasn't shown nearly enough evidence to claim that it can replace MLPs.

So I have doubts about the claim that KAN is superior to MLPs.

https://arxiv.org/abs/2407.17790

terse mirage
#

the second zero represents the errors of the backpropagation layer

#

i can't predict what exactly i'm expecting because i'm randomising my weights and biases at every run

terse mirage
#

I figured out why it's outputting zero as the feedforward result

#

a lot of my weights are negative

#

and this is impacting the weighted inputs

#

and since i'm using ReLU this is just becoming zero
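That diagnosis matches how ReLU behaves: any unit whose weighted input is negative outputs exactly 0, and with layers as narrow as 1-2-5-1 a whole layer can go dead for a given input. A minimal sketch, assuming a regression-style output: He-initialized weights, zero biases, and ReLU on the hidden layers only, so the output itself is never clamped to 0. Whether the original code also applies ReLU on the output layer is an assumption here, and all names are illustrative:

```python
import math
import random


def relu(z):
    return max(0.0, z)


def init_layer(n_in, n_out, rng):
    # He initialization: zero-mean Gaussian with std sqrt(2 / n_in), biases at 0.
    # About half the weights are still negative; that's expected.
    std = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    # ReLU on hidden layers only; the output layer stays linear,
    # so the prediction is not clamped to 0 when its weighted input is negative
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if i == len(layers) - 1 else [relu(v) for v in z]
    return x


rng = random.Random(0)
sizes = [1, 2, 5, 1]  # input, two hidden layers, output, as in the question
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
prediction = forward([0.5], layers)

# The failure mode: a negative weight makes the hidden unit's input negative,
# ReLU outputs 0, and everything downstream sees 0
dead_layers = [([[-1.0]], [0.0]), ([[1.0]], [0.0])]
dead_prediction = forward([0.5], dead_layers)
```

`dead_prediction` comes out as 0 regardless of the later weights, which is the same zero seen in the original output. Widening the hidden layers or using a leaky ReLU are other common mitigations.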

dawn blaze
#

Heyaaa, anyone here a "master" of TensorFlow/Keras? Me and two classmates are working on audio identification prediction and can't seem to get it to analyse our voices 🙂 anyone want to have a look at our code? 🙂

rich moth
#

I made a Pokémon scraper. My next plan is to incorporate the MTG world into it, but I'm not sure if it should be one Parquet file or two. What do you guys think? https://paste.pythondiscord.com/YFCQ

serene scaffold
#

@dawn blaze always ask the question you actually want answered. Don't wait for a commitment.

olive wedge
#

anyone got good resources from where i could begin learning?

crystal badger
#

Has anyone tried using SDFusion or AutoSDF for 3D completion? I'd appreciate insights on how they perform, especially with setup or training. Were there any specific challenges you faced during the implementation or dataset preparation? I'm stuck right now

serene scaffold
olive wedge
ancient hornet
#

Hi, just wanted to ask some questions about MLE N-gram LMs. Does anyone have any expertise?

serene scaffold
olive wedge
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

olive wedge
mossy hazel
#

hi, I have a query about a project I'm working on. Jumping straight to my problem statement:

I need to extract text data from tables, plots, and images in PDF files. I need to develop a subproject using a generative AI architecture.

Can anyone help me with how I should start? What things do I need to integrate? Effective LLM models? Any help with anything related to this is welcome...

serene scaffold
#

to cut to the chase: you shouldn't use generative AI. generative AI is what's fashionable right now, but it's the wrong choice for this task. generative AI is about creating new content. but you're not trying to create new content. you're trying to extract content that already exists.

mossy hazel
serene scaffold
mossy hazel
# serene scaffold something involving OCR.

see, extracting normal text data from normal PDFs is easy. but in my case, i need to extract data from scientific/research documents, which isn't that easy. that is why using LLMs and agents would be robust.

#

i need help with the architecture and workflow for developing this project.

charred egret
#

You're extracting content from a pdf. Why would you want generative methods for this? What are you generating? You just need some way to read the pdf content, spot the tables, and transform it to whatever your target is

mossy hazel
charred egret
#

oh yeah I'm simplifying a lot here. text mining/extraction and friends can be very complex

ancient hornet
#

I'm trying to create an MLE N-gram LM. I've been provided the following code that I need to fill in.

from typing import Sequence

class mle_ngram_lm:
    def __init__(self, train : Sequence[str], n : int) -> None:
        self.n = n  # model order; logprob() checks the context length against it
        # TODO: train the model here, declaring appropriate instance variables (i.e., model parameters!)
        # Consider what you need to compute logprob(w, c)!

    def logprob(self, w : str, c : Sequence[str]) -> float:
        assert (len(c) + 1) == self.n
        # TODO: compute p(w | c) using those instance variables
        return 0.0

I've been taking this course informally (not for accreditation) so I haven't been able to attend all of the lectures and now I'm confused as to what parameters need to be filled in. This is based on Chapter 3 of Jurafsky & Martin's Speech and Language Processing.
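For reference, the MLE estimate in Chapter 3 is just a ratio of counts, which is a strong hint about which instance variables `__init__` needs to store. In the book's notation, with N the model order and C(·) a count over the training data:

```latex
P(w_n \mid w_{n-N+1:n-1}) = \frac{C(w_{n-N+1:n-1}\, w_n)}{C(w_{n-N+1:n-1})}
\qquad
\log P(w \mid c) = \log C(c\, w) - \log C(c)
```

Here c plays the role of the (N−1)-word context passed to `logprob`. An n-gram never seen in training gets probability 0 (a log probability of negative infinity), which is the problem the smoothing sections of the same chapter address.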

I posted this to the python-help channel but didn't get any replies so hoping that it gains more traction here.

mint plume
#

What IDEs do you guys prefer?

spring field
#

PyCharm (which unfortunately is not starting up lately) and VSCode

rich moth
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

rich moth
broken eagle
#

Anyone familiar with text-image retrieval? What's the SOTA now? Anything significantly better than BLIP?

agile cobalt
# broken eagle Anyone familiar with text-image retrieval? What's the SOTA now? Anything signifi...

for "what's the SOTA", see https://paperswithcode.com/task/image-to-text-retrieval or some other tasks they link in https://paperswithcode.com/paper/blip-2-bootstrapping-language-image-pre

for how good a model will be for your specific use case / what the best model is, the answer is pretty much the same as all software engineering problems - "it depends"

some models may perform better on specific datasets for particular reasons, some use cases may require lower cost/latency or higher accuracy, etc. (not to mention fine-tuning, distillation and so on)

Image-text retrieval is the process of retrieving relevant images based on textual descriptions or finding corresponding textual descriptions for a given image. This task is interdisciplinary, combining techniques from computer vision and natural language processing. The primary challenge lies in bridging the semantic gap: the difference...

#

also "significantly" better is very subjective

mossy hazel
# rich moth https://paste.pythondiscord.com/ITSA Maybe something like that would be a good ...
pymupdf
sympy
matplotlib
farm-haystack[all]
elasticsearch

when i try to install the above requirements, it fails:

Building wheels for collected packages: faiss-cpu
  Building wheel for faiss-cpu (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for faiss-cpu (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      running bdist_wheel
      running build
      running build_py
      running build_ext
      building 'faiss._swigfaiss' extension
      swigging faiss\faiss\python\swigfaiss.i to faiss\faiss\python\swigfaiss_wrap.cpp
      swig.exe -python -c++ -Doverride= -I/usr/local/include -Ifaiss -doxygen -DSWIGWIN -o faiss\faiss\python\swigfaiss_wrap.cpp faiss\faiss\python\swigfaiss.i    
      error: command 'swig.exe' failed: None
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for faiss-cpu
Failed to build faiss-cpu
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (faiss-cpu)
rich moth
mossy hazel
mossy hazel
rich moth
silver perch
#

hey guys, I'm thinking of an Excalidraw-like application with AI integrated, so I can type a query and it creates a drawing accordingly
I'm not sure what I should use, can anyone give me an idea of how I can do this? (not image generation, but getting some kind of data I can parse and create stuff from myself)

rich moth
# mossy hazel ?

I'm watching Alien: Romulus, gonna have to work with me 🙂

jaunty helm
stone patrol
#

Where can I get simple text AI code?

odd meteor
stone patrol
odd meteor
odd meteor
fresh bay
#

looking at the PyTorch Geometric implementation of GCN - I am confused because it seems like they assume the adjacency matrix is made up of 1s and 0s, which might not always be the case, no?

#

compared with the formulation from Kipf

wooden sail
wooden sail
# fresh bay

this doesn't say anything about what A and A tilde are, at least not explicitly. but note that self connections are considered through addition of an identity matrix, meaning it also assigns a 1 to any connections/edges

#

the matrix D there in the second figure is probably the "degree matrix", which tells you the number of edges for each vertex. given how they define it there, i'd expect A tilde to be a binary matrix, just like in the first image
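For reference, the propagation rule in Kipf & Welling's paper is H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l)), with Ã = A + I and D̃_ii = Σ_j Ã_ij. Since D̃ is just a row sum, nothing in the normalization itself requires binary entries: with edge weights it becomes the weighted degree, and PyG's `GCNConv` accepts an optional `edge_weight` argument for that case. A minimal dense sketch of the normalization in plain Python (the function name and example matrices are illustrative):

```python
import math


def gcn_norm(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, possibly weighted, adjacency matrix."""
    n = len(adj)
    # A_tilde = A + I: add a self-loop of weight 1 to every node
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # D_tilde[i][i] = sum_j A_tilde[i][j]; with edge weights this is the weighted degree
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [
        [inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]


binary = [[0, 1], [1, 0]]
weighted = [[0, 0.5], [0.5, 0]]
norm_binary = gcn_norm(binary)
norm_weighted = gcn_norm(weighted)
```

The self-loops guarantee every degree is at least 1 for non-negative weights, so the inverse square root is always defined.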

fresh bay
#

Thanks Edd, that's a fair point. I'm generally used to thinking of adjacency matrices as correlations from fMRI/transcriptome data, but yes, that should be the standard case

#

I actually didn't know that if you have edge weights it isn't a true adjacency matrix, that's good to know

#

@wooden sail catking

onyx laurel
#

Hello, I am someone who wants to start learning data science. I found some books from the OpenStax series: Algebra and Trigonometry (1516 pages), Calculus Volume 1 (875 pages), Calculus Volume 2 (737 pages), Calculus Volume 3 (915 pages), High School Statistics (932 pages), Introductory Statistics (849 pages), and Business Statistics (627 pages). Do you think I need to study all of them fully?

#

Does anyone have any experience with them?

wooden sail
#

but in uni it's unusual to grab a book and read it cover to cover. it's more common to grab several books on the topic and fish out some select few sections. you read the rest as needed, since it's unrealistic to cover all the topics in a whole book well in a semester

#

it might help to orient yourself by looking at the syllabi of some programs

#

you can find them e.g. for some of the MIT OpenCourseWare courses, to get an idea of how fully books are really covered

onyx laurel
#

You know, I want to learn data science through self-study, not by going to university. Do you have any suggestions for me?

wooden sail
#

exactly the same ones i gave you just now

#

you wouldn't be asking about books if you were going to uni 😛 i'm giving you the context of how it works for people who do

fresh bay
#

I'd also recommend ISL

#

Khan Academy is also good

onyx laurel
#

what's ISL

wooden sail
#

i would only add that it's important to try to set a schedule for yourself and stick to it, and to cover important topics even if you find them boring. also do several problems to check your understanding

fresh bay
#

introduction to statistical learning

#

agree with Edd about the schedule

wooden sail
#

uni naturally forces you to do all those things, but you might feel like skipping them when studying on your own. ideally you wouldn't skip them cuz they're actually helpful

onyx laurel
#

But I'm really better at learning by reading text than by watching videos or taking courses.

fresh bay
#

I know this is a python server but introduction to data science with tidyverse is pretty good as well

onyx laurel
#

I found the atmosphere of this server better than the others. thanks both of you

wooden sail
#

watching videos may be the worst way of learning a topic from scratch, there are too many pitfalls that are easy to fall into. and courses are designed around books and require you to go read it anyway. for the most part, grabbing the book and reading it + doing several exercises is the best way to go about it. at least imo

#

i do find the math server is pretty good to ask for help and explanations on math subjects like these btw

onyx laurel
#

I'm a perfectionist, and I usually spend a lot of time on a topic. How much time do you think is enough, on average, for each topic?

wooden sail
fresh bay
#

I'd add, if you want a video: StatQuest by Josh Starmer is wonderful

fresh bay
#

like there is no way to know

wooden sail
#

it's more about becoming aware that several weird things exist, getting some intuition, and most importantly, learning how to learn

fresh bay
#

at some point - the best way to do this is just to go do it

onyx laurel
#

thankss😊

wooden sail
#

here, since you have data science as a goal, not straight up studying mathematics, it helps a lot to look for practical problems related to the math topics, if at all possible

fresh bay
#

you need to accept you won't be able to do things perfectly, but make sure you do a few of the exercises, even if it's just the first two or three

#

don't just read, you need to do a few problems

onyx laurel
#

Do you think it's a good idea to ask ChatGPT to make a list of subjects that I should study?

wooden sail
#

i wouldn't ask chatgpt something you don't already know about, no

fresh bay
#

no

wooden sail
#

or at the very least cross-check what it spits out

fresh bay
#

you can use it if you can verify the answer

#

but if you can't, then don't

onyx laurel
#

again thanks

mellow vector
#

yall use nbextensions?

mellow vector
#

hmm, well I'm annoyed with the installation process now and have abandoned that. I was hoping to find a tool that would assist in writing docstrings; I'm going to start spending some time exercising my tech writing synapses. If y'all suggest anything for Jupyter I'm all ears.

#

I'm also thinking I'll commit to the numpy standard, data analysis is my end goal for studying python.

mellow vector
#

👂👂🌽👂

rich moth
vocal zealot
agile cobalt
delicate apex
#

youtube should stop being dumb with the constant anti-adblock war and disable video speed controls

vocal zealot
#

Chrome extension

wintry relic
#

woa
this isn't data science...

delicate apex
#

ah, fooey. this is not an ot channel. did you mean to post this here?

static falcon
#

What a pathetic response by Stelercus. @viral zinc Learn both. Becoming comfortable with both will only make you a stronger candidate in the job market whether it's academia, industry, or government. On a further note, you can have fun spending a lot more unnecessary time reading in .csv or Excel data and creating a plot in Python that would take mere moments in R (and people are short on time, so use common sense).

serene scaffold
static falcon
#

Get comfortable understanding statistics and probability theory at its core. You should study calculus and, once in college, take a proofs course and then move into real analysis. Application comes from Theory, and without a solid foundation, well... just think of "The Three Little Pigs."

serene scaffold
#

I'm not saying you can't do any of these things. I'm just trying to understand your motives.

pliant echo
#

Heyo, question: what are the alternatives to confidence scores if the model you're using (Gemini) doesn't support them?

pine lake
#

yo guys, can you please tell me some good textbooks on ML and AI?

wooden sail
brazen cape
#

hey guys I have some questions about ml could someone help me with that

#

what are ML specializations, and what are recommendation systems all about?

thorny geode
#

last time I was forced to learn ANOVA and regression tools for a competition, but I saw some real-world results from those kinds of case study questions, so I'm now trying to find similar projects

lunar gyro
#

I want to be a data analyst. Rn I'm doing my undergrad in computer applications. So, could someone suggest what projects I should be doing to land a good company and earnings?

earnest widget
desert oar
grand breach
#

high-cardinality features only, and the target variable is binary

#

some features have 1000s of categories

desert oar
feral blade
#

Hello! I wanted some help with ai tools.
I need to run a batch llm query on a dataset column with about 4k entries.
is there any that lets you do this for free without card info?

serene scaffold
#

Crypto miners ruined free compute for everyone

versed pilot
# lunar gyro I want to be a data analysis. Rn I'm doing under graduation in computer applicat...

Do you know SQL or Pandas/Python, or R? Do you know any visualisation tools (python or R libraries, Tableau, Power BI). Do you know basic statistics, normal distribution, means/medians/standard deviations etc? That's probably a good starting point, try and put together a portfolio to demonstrate your knowledge, ideally also displaying some domain expertise about the data you are analysing.

feral blade
serene scaffold
#

The ones on Hugging Face can be used "anywhere" that has a large enough GPU

#

I would try Mistral 7B. The 7B means it has seven billion parameters. You might be able to download and load it on Colab.

#

You probably won't be able to run any 70b models.
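The rough arithmetic behind that: the weights alone take parameters × bytes per parameter, so 7B parameters in 16-bit precision is about 14 GB (13 GiB), which just about fits on a T4 with its 16 GB, while a 70B model doesn't fit even at 4-bit. A minimal sketch that ignores activations, the KV cache and framework overhead (the function name is illustrative):

```python
def weight_memory_gib(n_params, bytes_per_param):
    # Memory for the weights alone; activations, the KV cache and framework
    # overhead come on top of this
    return n_params * bytes_per_param / 2**30


for name, n_params in [("7B", 7e9), ("70B", 70e9)]:
    for dtype, nbytes in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"{name} {dtype}: {weight_memory_gib(n_params, nbytes):.1f} GiB")
```

Quantized loading (8-bit or 4-bit) is what usually makes 7B models comfortable on free-tier GPUs.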

feral blade
#

Oooh awesome, thanks a lot 👌

quartz lotus
#

Does anyone know if pyautogui works with OpenCV template matching? I can't seem to get it to work at runtime, but for static images I can make template matching work

nocturne valley
#

"data science and ai" feels like "statistics and epistemology"

serene scaffold
nocturne valley
#

"math and living"

#

lol

serene scaffold
#

!otn a statistics and epistemology

arctic wedgeBOT
#

:ok_hand: Added statistics-and-epistemology to the names list.

quaint mulch
#

!otn

quaint mulch
quaint mulch
quaint mulch
#

I mean, something more data-sciency

quaint mulch
# left tartan Minecraft?

According to Wikipedia, Notch already had a job as a game dev.
I mean someone who isn't already doing data science, or who made a huge career jump from a data analyst to a data science role and tripled their pay

left tartan
quaint mulch
fresh bay
#

Does PyTorch Geometric not have a way to represent my connections as edges in the Data class?

earnest widget
# quaint mulch And also <@1114957477889441822> Can you give an example of someone, making a pr...

Well, maybe if you're looking at a project, it could be something investors are ready to invest in, or a big open-source project used by many. That's something that catches a lot of eyes. But it has to be something really eye-catching, especially to tech firms or investors.

Or if you're interested in the research field, then a groundbreaking paper will definitely turn some eyes towards you.

quaint mulch
#

then maybe some groundbreaking paper
...cries... (even incremental is already too hard!!!)

quaint mulch
finite narwhal
#

any ideas why this is not working?

```
conda install -c conda-forge -c schrodinger pymol
```

or

```
conda install -c conda-forge -c schrodinger pymol-bundle
```

Both fail with:

```
PackagesNotFoundError: The following packages are not available from current channels:
```

Also, the package is available: https://anaconda.org/schrodinger/pymol
The command listed at that link doesn't find it either: `conda install schrodinger::pymol`

versed pilot
# quaint mulch And also <@1114957477889441822> Can you give an example of someone, making a pr...

So, often in interviews you will be asked about a project you worked on; you might even be asked to give a presentation on something. You can always talk about projects with previous employers, but only up to a point: you can't give confidential and commercially sensitive information of a previous employer to the next employer. So a good personal project might be an alternative there. A good university project likewise, but as the years go by, you can't keep referring to your student work.
You can also present pet projects in meetups and community events which usually are good networking opportunities.

finite narwhal
#

How do you all resolve dependency problems? I'm trying to get a simple example working with recent versions of PyMOL and RDKit, and I'm running into a quagmire of dependency incompatibilities. I don't have a preference for Python version etc.; I just want to get that basic example running.

#

added an env.yaml file but can't seem to get the versions right for it to work

#

ok specifying no versions helps, then I can copy the compatible versions found

#

what, but the versions chosen are mega old, like Python 3.6, PyMOL 2.3.5, RDKit from 2018!??

versed pilot
#

Python 3.6 is no longer supported I think

finite narwhal
#

does the automatic solver (when I don't specify any versions) pick the most recent possible? or what is it doing

#

and if not, how else do I find the compatible versions? There's a ton of dependencies and I can't possibly figure out what goes with what

#

I mean sub-dependencies of pymol and rdkit

scarlet anchor
#

Hi, I want to use a good AI model like Llama (or any other) to generate synthetic data. Specifically, I want to generate synthetic data for predicting sentiment in Indian languages. Are there good ones I can use?

serene scaffold
scarlet anchor
#

Due to lack of data, I want to generate synthetic data

serene scaffold
#

ah right. I see.

scarlet anchor
#

for now, I am trying this

serene scaffold
#

which languages specifically?

scarlet anchor
serene scaffold
# scarlet anchor Hindi, Kannada, Tamil, Telugu - Indian Languages

Do you speak all four of these?
I just asked llama3-70b to generate some text in Hindi. I don't know what it means or if it's correct.

फ़ीस बुक बिल्डिंग में मौजूदा संकाय सदस्य, कर्मचारी और छात्र, साथ ही साथ अकादमिक अध्यक्ष और डीन अपने आप से तालमेल करके इमारत की इस मंज़िल पर उपलब्ध हैं।

(Rough English translation: "The existing faculty members, staff and students in the 'Fees Book' building, as well as the academic chair and dean, are available on this floor of the building by coordinating among themselves.")

scarlet anchor
scarlet anchor
#

translation ^

serene scaffold
# scarlet anchor Hmmm. no, I only know Kannada

ಮಾನವೀಯ ಶಿಕ್ಷಣದ ಪ್ರಮುಖ ಉದ್ದೇಶವೆಂದರೆ ಸರ್ವಾಂಗೀಣ ಅಭಿವೃದ್ಧಿಯನ್ನು ಸಾಧಿಸಲು ನಿರ್ಮಲ ಮನಸ್ಸನ್ನು ಮತ್ತು ಸುಶಿಕ್ಷಿತ ಮನಸ್ಸನ್ನು ಬೆಳೆಸುವುದು.

(Rough English translation: "The main aim of humane education is to cultivate a pure mind and a well-educated mind in order to achieve all-round development.")

#

how is that?

serene scaffold
#

you'll need to confirm with fluent speakers of each language that the LLM reliably produces correct text.
neural machine translators might still produce correct-sounding translations, even if the original text contains mistakes that a fluent speaker wouldn't make.

scarlet anchor
#

👍

#

yes thanks

serene scaffold
#

also, llama3-70b takes a huge amount of GPU memory. If you don't have an enterprise compute environment, you will need to pay for one, or buy some API credits.

scarlet anchor
#

yes

scarlet anchor
serene scaffold
#

you can ask your university if they have a compute environment that you can use

#

but you can't run llama3-70b on your own computer, and no one is going to give you enough compute to do it for free. except maybe a university that you belong to.

scarlet anchor
#

👍

narrow merlin
scarlet anchor
narrow merlin
#

yes free

scarlet anchor
narrow merlin
#

yes

#

just register and start using it, all free. They show you what it WOULD cost if they activated payment, which makes it even cooler, because it's a splinter of PENNIES what that stuff costs overall.

#

and sambanova.ai is also free; there you even get the 405B bomber

#

but i tell you: big models are not your solution

scarlet anchor
#

👍

narrow merlin
#

oh right, and Hugging Face also allows some usage of the Inference API within rate limits. I'm just not sure if you can make it work for the 405B, but I think 70B should work somewhere. I'm just sooooo not getting their whole product world yet

jaunty helm
scarlet anchor
# jaunty helm gemma might be a better fit for multilingual

sadly that didn't work either!!

My strength lies in understanding and generating text in English. Creating grammatically sound sentences in another language requires a deep understanding of its rules and nuances, which I don't currently possess.

narrow merlin
#

well, Gemma knows a lot of languages, but it's not specifically made for languages; it's a good general model, though, and I hope you use Gemma 2 😉

#

if you want the full-blown language stuff you can use Aya, but Aya is really bad at everything else 😄

scarlet anchor
#

Hmm, I do use Gemma for other purposes ofc

narrow merlin
#

actually I think that's even outdated now hahaha

scarlet anchor
narrow merlin
#

oh right, Qwen was the one

#
#

but yeah, the question is always what you want to do with the language

scarlet anchor
#

yess

narrow merlin
#

testing testing testing

#

chainforge

scarlet anchor
#

haha ye

bright garden
#

Was just profiling my PyTorch Lightning code. Does anyone know why configure_optimizers() is called 4-6 thousand times?

#

I have a very standard PyTorch Lightning module with a configure_optimizers method defined

#

Does this have anything to do with the learning rate scheduler?

serene scaffold
#

can someone help out in #1297970307747020850? it's a pretty simple question about numpy usage. I need to run an errand.

pastel frost
#

Hi, I'd hate to fill the chat with absolute newbie questions. I finished all my gen ed classes at community college and transferred to another college for my data science major. Brand new to coding and coding classes! If anyone can help with study methods or anything data-science-career related, please let me know (PM me too); I know I'm really new. I also have a really big test on functions/tuples/lists next Monday that I'm super nervous about! So, big SOS 😄

vestal spruce
#

Hi, quick question: I want to use NLTK for sentiment analysis, but the dataset I'm using is not in English. Do I need to find and use a tokenizer/preprocessing for that language so it gives accurate results?

serene scaffold
pastel frost
serene scaffold
rich moth
#

I'm having an issue with a shape mismatch during the validation phase of my model. The input data includes fixed variables like ['a1', 'an', 'ak', 'n', 'd', 'Sn', 'k'], which I'm using to solve arithmetic progression formulas. Here's the code and an example: https://paste.pythondiscord.com/HT6Q

```python
import sympy as sp

FIXED_VARIABLES = ['a1', 'an', 'ak', 'n', 'd', 'Sn', 'k']  # Added 'Sn' and 'k'

# Define symbolic variables based on FIXED_VARIABLES
a1, an, ak, n, d, Sn, k = sp.symbols(FIXED_VARIABLES)

# Key AP formulas using fixed variables
AP_FORMULAS = [
    sp.Eq(an, a1 + (n - 1) * d),  # an = a1 + (n-1)d
    sp.Eq(a1, an - (n - 1) * d),  # Rearranged formula to solve for a1
    sp.Eq(ak, a1 + (k - 1) * d),  # ak = a1 + (k-1)d
    sp.Eq(an, (2 * Sn) / n - a1),  # Derived from Sn = (n/2) * (a1 + an)
    sp.Eq(d, (an - a1) / (n - 1)),  # d = (an - a1) / (n-1)
    sp.Eq(Sn, (n / 2) * (2 * a1 + (n - 1) * d)),  # Sum formula Sn = (n/2) * (2*a1 + (n-1)*d)
    sp.Eq(Sn, (n / 2) * (a1 + an)),  # Sn = (n/2) * (a1 + an), alternative form
]
```

```
Error in predict_formula: mat1 and mat2 shapes cannot be multiplied (1x6 and 5x256)
Input tensor shape before passing to model: torch.Size([1, 6])
2024-10-21 16:43:12,504 INFO:Generated 0 valid novel formulas.
2024-10-21 16:43:12,504 INFO:Testing model with an example...
Input tensor shape before passing to model: torch.Size([1, 6])
2024-10-21 16:43:12,505 ERROR:Error in predict_formula: mat1 and mat2 shapes cannot be multiplied (1x6 and 5x256)
2024-10-21 16:43:12,505 INFO:Scenario: {a1: 5.0, n: 10.0, d: 3.0}
2024-10-21 16:43:12,505 INFO:Predicted Formula: None
2024-10-21 16:43:12,505 INFO:Main training pipeline completed.
```


The model trains fine, but at the start of the formula validation I get the mismatch error above. During training the input size is 5 features, but during the validation phase the input shape becomes [1, 6], likely due to the formulas or how the input data is structured at that point.
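In PyTorch, `mat1 and mat2 shapes cannot be multiplied (1x6 and 5x256)` means the input batch has 6 columns while the first linear layer was built for 5 inputs (an `nn.Linear(5, 256)`, whose `in_features` attribute holds that 5). One way to make training and validation agree is to build every input row from a single, fixed, ordered feature list. A minimal sketch: `INPUT_FEATURES`, `to_input_row` and `check_width` are hypothetical names, the 5-feature list is an assumption, and filling absent variables with 0.0 is only a placeholder choice.

```python
from typing import Mapping, Sequence

# Hypothetical: the exact, ordered columns the model was trained on.
# Define it once and reuse it for training, validation and inference.
INPUT_FEATURES = ("a1", "n", "d", "an", "Sn")  # assumption: 5 features


def to_input_row(scenario: Mapping, features: Sequence[str] = INPUT_FEATURES,
                 fill_value: float = 0.0) -> list[float]:
    """Build one model input row in a fixed column order.

    Keys may be strings or SymPy symbols (str(symbol) gives its name).
    Unknown keys raise instead of silently widening the row.
    """
    by_name = {str(k): float(v) for k, v in scenario.items()}
    unknown = set(by_name) - set(features)
    if unknown:
        raise ValueError(f"features not seen in training: {sorted(unknown)}")
    # Missing variables get a placeholder value so the width never changes
    return [by_name.get(name, fill_value) for name in features]


def check_width(row: Sequence[float], expected: int) -> None:
    """Fail early with a readable message instead of a matmul shape error."""
    if len(row) != expected:
        raise ValueError(f"model expects {expected} features, got {len(row)}")
```

With the scenario from the log, `to_input_row({"a1": 5.0, "n": 10.0, "d": 3.0})` gives a 5-element row, so a tensor built from `[row]` has shape (1, 5).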
tawdry sundial
#

How can I improve my model's accuracy?

#

I want to improve my RandomForestRegressor accuracy

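```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numerical_cols, categorical_cols and step_1 come from earlier cells of the exercise notebook

# Preprocessing for numerical data
# (StandardScaler ignores NaNs when fitting, so the constant imputer that follows fills 0,
#  which on the scaled data is the column mean)
numerical_transformer = Pipeline(steps=[("Scaler", StandardScaler()), ("Imputer", SimpleImputer(strategy="constant"))])  # Your code here

# Preprocessing for categorical data
categorical_transformer = Pipeline(steps=[("Imputer", SimpleImputer(strategy="most_frequent")), ("OHE", OneHotEncoder(handle_unknown="ignore"))])  # Your code here

# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])

# Define model
model = RandomForestRegressor(n_estimators=100, random_state=0)  # Your code here

# Check your answer
step_1.a.check()
```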
#

this is currently my pipeline

rich moth
rich moth
#
```
2024-10-21 21:07:48,134 INFO:Attempt 856: Valid data point generated: {'n': 65.44522712125084, 'a1': 2.0614216909210743, 'd': 22.104379657779848, 'an': 1426.5831891109}
2024-10-21 21:07:48,273 INFO:Attempt 857: Invalid target value: None. Skipping.
2024-10-21 21:07:48,413 INFO:Attempt 858: Invalid target value: None. Skipping.
2024-10-21 21:07:48,611 INFO:Valid solutions for Eq(Sn, n*(2*a1 + d*(n - 1))/2): [1.27052504310215]
2024-10-21 21:07:48,611 INFO:Attempt 859: Valid data point generated: {'Sn': 69.45588675795838, 'd': 71.04840790543672, 'a1': 45.05688735810755, 'n': 1.2705250431021455}
2024-10-21 21:07:48,780 INFO:Attempt 860: Invalid target value: None. Skipping.
2024-10-21 21:07:48,878 INFO:Valid solutions for Eq(an, 2*Sn/n - a1): [1.76248570047424]
2024-10-21 21:07:48,878 INFO:Attempt 861: Valid data point generated: {'Sn': 82.19978323777535, 'an': 22.98209692884189, 'a1': 70.29500962091075, 'n': 1.7624857004742391}
2024-10-21 21:07:48,914 INFO:Valid solutions for Eq(an, a1 + d*(n - 1)): [1.02059088463249]
2024-10-21 21:07:48,914 INFO:Attempt 862: Valid data point generated: {'an': 61.113350388683386, 'n': 38.768650571295055, 'a1': 22.567009890750036, 'd': 1.020590884632487}
```

Woohoo!

rich moth
#

I'm using yoshitomo-matsubara/srsd-feynman_easy from HF, but it contains physics equations and various variables, and I'm trying to discover new relationships between them. Here's a correlation heatmap I generated to analyze how the features in the dataset interact with each other.

rigid cape
#

Hey people. How do I improve the results of my RAG project? I'm not getting good results from direct retrieval from the vector database. Any resources or article links would also be great.

past meteor
rich moth
past meteor
#

Things like "summarise my documents" will not work

rich moth
past meteor
#

Yea, but that's not summarising your documents

#

That will obviously work

rich moth
#

Well, it gets embedded into the index.

past meteor
#

The problem is just that some questions don't really result in a retrievable set of documents, that's what I meant

rich moth
past meteor
#

So Iโ€™m more interested in what questions he wants to ask

#

Not necessarily the code: depending on the questions you want to ask, it's already a dead end and you should consider things other than RAG

#

Like, imagine you have a recipe book. You embed each recipe separately. You shouldn't ask "give me a recipe"; your result will be pretty much arbitrary

rigid cape
#

So I'm making my RAG application, wherein I've used some of my textbooks and previous years' question papers as the data. I need to get questions related to a topic whenever I ask about that topic

#

If I give it a broad chapter name, it does give me some questions related to the chapter that have the name in them. But if I give it a specific topic, it fails. I'm testing various ways to adjust my prompt and retrieval mechanisms.

rich moth
#

You need to embed your data into the vector database and use a similarity function to compare the vectors in order to perform efficient nearest-neighbor searches. Have you already decided on your framework?
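The retrieval step described above boils down to ranking stored chunk embeddings by similarity to the query embedding. A minimal NumPy sketch with made-up 4-dimensional vectors standing in for real embeddings (a real system would get them from an embedding model). It also illustrates the recipe-book point: a specific query has one clear best match, while a vague query scores every chunk about the same, so whatever comes back is close to arbitrary.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k documents most cosine-similar to the query."""
    scores = normalize(doc_vecs) @ normalize(query_vec)
    order = np.argsort(-scores)[:k]
    return order, scores[order]


if __name__ == "__main__":
    # Toy "embeddings", one row per stored chunk
    docs = np.array([
        [1.0, 0.1, 0.0, 0.0],  # chunk about topic A
        [0.1, 1.0, 0.0, 0.0],  # chunk about topic B
        [0.0, 0.1, 1.0, 0.0],  # chunk about topic C
    ])
    specific = np.array([0.0, 0.9, 0.1, 0.0])  # clearly about topic B
    vague = np.array([1.0, 1.0, 1.0, 0.0])     # like "give me a recipe"
    print(top_k(specific, docs, k=1))
    print(top_k(vague, docs, k=3))
```

If a specific topic query fails, printing these scores for the retrieved chunks (as is done later in the thread with the retrieved text) shows whether the embedding step or the ranking is the problem.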

rigid cape
#

Maybe RAG isn't suited for this ? I don't know , just gave it a try

rigid cape
rich moth
rich moth
#

Is it just text? English?

#

I recommend a sentence transformer or BERT; I've had lots of success with both of them

rigid cape
#

I didn't convert the math to LaTeX but used the Unicode characters as they are for, say, summation, integration, etc. I was too lazy to do that. Maybe that could have caused some problems

rich moth
#

It could be your data too, have you tried to view it?

rigid cape
#

The problem I get is usually in the retrieval part.

#

I usually have my code print out the text it retrieves and then use it for the generated output

#

I guess I'll try re-encoding my math into LaTeX. I'm still trying out various prompts. Hope it gets better.

rigid cape
rich moth
#

I think my synthetic data set is finally ready. wow what a pain in the ass. let me tell you.

past meteor
#

Ultimately, it's just important you know what you want to do first, and then you can see if your approach is fit for purpose

unkempt apex
#

anyone have exp with React + FastAPI?

#

getting some issues while deploying it

warm girder
#

Where do I start as a beginner in data science?

serene scaffold
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold
past meteor
unkempt apex
#

getting issues with deploying a FastAPI + React webapp on Vercel

#

just reply and I will send the logs

past meteor
unkempt apex
past meteor
#

Are you using vite?

past meteor
#

Are you planning on having FastAPI serve the static files?

unkempt apex
#

it is now ready to be deployed

unkempt apex
#

wanna see?

past meteor
#

I haven't used Vercel yet, but if I were you I'd have CI/CD on GitHub Actions, where the CI runs npm run build and puts the output in your static folder, and the CD deploys your Python project like any normal one

unkempt apex
past meteor
#

Not that I know of, but it should be pretty easy 😄

I'd start by making 2 bash scripts, call one CI and one CD.

In the CI bash script you basically build your project and potentially mv the files to the right place (Python's static folder).

In your CD bash script you use the Vercel CLI to deploy your Python project.

After you have these, show them to ChatGPT and figure out how to turn them into a GitHub Actions YAML
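The CI step described above (build the frontend, then move the output into Python's static folder) could also be a small Python script instead of bash. A minimal sketch, assuming a Vite frontend (Vite's default build output folder is `dist/`) and a hypothetical layout where FastAPI serves files from a `static/` folder; the function names are made up for illustration. The CD step (calling the Vercel CLI) is left out, since its exact flags depend on the project setup.

```python
import shutil
import subprocess
from pathlib import Path


def build_frontend(frontend_dir: Path) -> Path:
    """Run `npm run build` in the frontend folder and return the build output folder.

    Assumes a Vite project, whose default output folder is `dist/`.
    """
    subprocess.run(["npm", "run", "build"], cwd=frontend_dir, check=True)
    return frontend_dir / "dist"


def publish_static(build_dir: Path, static_dir: Path) -> None:
    """Replace the backend's static folder with the fresh build output."""
    if not build_dir.is_dir():
        raise FileNotFoundError(f"build output not found: {build_dir}")
    if static_dir.exists():
        # Remove stale files (e.g. old hashed bundles) from previous builds
        shutil.rmtree(static_dir)
    shutil.copytree(build_dir, static_dir)
```

In the CI job this might be called from the repo root as `publish_static(build_frontend(Path("frontend")), Path("backend/static"))`; both paths are hypothetical and should match the actual repo.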

#

Although, you're the first person I know to run Python on Vercel

unkempt apex
unkempt apex
signal whale
#

wondering if the way I am implementing np.log1p is acceptable, because this way my results are 10x better
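One common use of `np.log1p` is on a skewed, non-negative regression target: train on `log1p(y)` and map predictions back with `np.expm1` (its exact inverse) before computing the metric. A sketch with made-up numbers; one thing worth checking when results suddenly look 10x better is whether the error is being computed on the log scale, where it is naturally much smaller, rather than on the original scale.

```python
import numpy as np


def to_log(y: np.ndarray) -> np.ndarray:
    """log(1 + y): compresses a right-skewed target (requires y > -1)."""
    return np.log1p(y)


def from_log(z: np.ndarray) -> np.ndarray:
    """Exact inverse of log1p: exp(z) - 1."""
    return np.expm1(z)


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


if __name__ == "__main__":
    y_true = np.array([10.0, 100.0, 1000.0])
    # Pretend these are predictions from a model trained on log1p(y)
    z_pred = to_log(np.array([12.0, 90.0, 1100.0]))
    print("MAE on log scale:     ", mae(to_log(y_true), z_pred))
    print("MAE on original scale:", mae(y_true, from_log(z_pred)))
```

Only the second number is comparable with a model trained on the raw target.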

upbeat prism
#

Hi, I implemented my own transformer based on the Attention Is All You Need paper and it works. Now I wanna do the following classification task using the encoder block only. Given a list of 10 numbers, each between 1 and 20, I wanna ask if the first element in the list is repeated. E.g. [1,2,3,4,5,6,7,8,9,0] -> False but [1,2,3,4,5,6,1,8,9,0] -> True.

So now I need a ClassifierHead. The output of my encoder block has shape (batch_size, sequence_length, d_model).

So basically I need to go from (sequence_length, d_model) to (2,)

How do I do that? I know I can reduce one dimension with a linear layer but it's still a 2D input.

#

for this specific task, I think I should basically make a linear map from the first position, i.e. h[:, 0, :] with shape (batch_size, d_model), to (batch_size, 2), focusing on the first token, but I'm a bit unsure how to argue for that.

I guess my question boils down to: What's the actual output of the encoder?

#

of course, I could also just use the raw logits and feed them to a loss function?

#

(one that already includes the sigmoid/softmax, like BCEWithLogitsLoss or CrossEntropyLoss from torch)
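The encoder returns one d_model vector per position, so a sequence-level classifier first pools over the sequence axis and then applies a linear layer. A NumPy sketch of that shape bookkeeping (in PyTorch this would be `h[:, 0, :]` or `h.mean(dim=1)` followed by `nn.Linear(d_model, 2)`); the function name and the random weights are placeholders. First-token pooling is plausible here because position 0 can attend to every other position, and mean pooling is the usual alternative. The output is raw logits, which is what `nn.CrossEntropyLoss` expects.

```python
import numpy as np


def classifier_head(h: np.ndarray, W: np.ndarray, b: np.ndarray,
                    pooling: str = "first") -> np.ndarray:
    """Map encoder output (batch, seq_len, d_model) to logits (batch, n_classes).

    W has shape (d_model, n_classes) and b has shape (n_classes,).
    """
    if pooling == "first":
        pooled = h[:, 0, :]        # (batch, d_model): keep position 0 only
    elif pooling == "mean":
        pooled = h.mean(axis=1)    # (batch, d_model): average over positions
    else:
        raise ValueError(f"unknown pooling: {pooling}")
    return pooled @ W + b          # (batch, n_classes), raw logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, seq_len, d_model, n_classes = 4, 10, 16, 2
    h = rng.normal(size=(batch, seq_len, d_model))
    W = rng.normal(size=(d_model, n_classes)) * 0.1
    b = np.zeros(n_classes)
    print(classifier_head(h, W, b).shape)
```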

mint palm
#

full padding in convT

#

?? what does it do?

vestal spruce
#

Hi, a question about sentiment analysis: if I wanted my analysis to factor in the age of a given sentiment, how do I quantify/calculate that and include this metric in my model?

jaunty helm
#

How do you guys usually deal with missings in time series?
Right now I can bin them (i.e. original data is in 1 min intervals -> round to 5 min bins) and take the mean or something, or I can try imputation, or there's some other method I'm unaware of
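The binning option can be combined with limited imputation: resample the 1-minute data to 5-minute bins (mean of whatever points fall in each bin), then interpolate only short runs of empty bins and leave long outages as missing. A pandas sketch, assuming a Series with a DatetimeIndex; `max_gap=2` bins is an arbitrary example threshold and `bin_and_fill` is a made-up name.

```python
import numpy as np
import pandas as pd


def bin_and_fill(s: pd.Series, rule: str = "5min", max_gap: int = 2) -> pd.Series:
    """Downsample to fixed bins, then fill only gaps of at most `max_gap` empty bins."""
    binned = s.resample(rule).mean()                  # empty bins become NaN
    filled = binned.interpolate(method="time", limit_area="inside")
    is_na = binned.isna()
    # Label each run of consecutive NaN/non-NaN bins, then measure the NaN runs
    run_id = is_na.ne(is_na.shift(fill_value=False)).cumsum()
    run_len = is_na.astype(int).groupby(run_id).transform("sum")
    # Put NaN back wherever the original gap was too long to trust interpolation
    return filled.mask(is_na & (run_len > max_gap))


if __name__ == "__main__":
    idx = pd.date_range("2024-01-01", periods=40, freq="min")
    s = pd.Series(np.arange(40.0), index=idx)
    s = s.iloc[np.r_[0:10, 15:20, 35:40]]  # drop a short and a long stretch
    print(bin_and_fill(s))
```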

rich moth
#

I did it baby! Watch out Claude and Chatgpt! Im coming baby!

versed pilot
# serene scaffold

Nice muscle guys, what food supplements would you recommend? 😛

dense star
#

I have a question: is it possible with a CNN model to skip a part of an object that I'm comparing? Can I use something like OpenCV to check it, and program it to take a certain part of the object and then compare? Or something like a backup picture, to check if it's the same error on the object and skip it?

cyan urchin
#

Hello, good day! I would like to ask if anyone here knows a fast audio noise cancellation algorithm suitable for real-time processing.

oblique isle
#

hello guys, what are the best platforms where I can really prepare for my ML interview?

#

theory + programming

serene scaffold
# oblique isle hello guys , wht are the best plateform where i can really prepare my ML intervi...

you shouldn't need to do lots of extra learning in addition to the education and experience that you got leading up to the interview. you probably won't retain it all anyway. they want to talk to you because they think the material on your resume is relevant to what they'd need you to do.

practice talking clearly and confidently about items on your resume. what projects did you contribute to, and what were your specific contributions?

oblique isle
serene scaffold
#

which is why my recommendation is to practice talking clearly and confidently about your resume items.

oblique isle
serene scaffold
#

you've probably spent months or years in school up to this point learning this kind of material. how much of a difference will three days of cramming really make?

#

(reasonable people can disagree with me on this, but that's my opinion.)

oblique isle
#

well, I don't agree with you completely, but there are some points where I do agree with you, and I got your point. For real, I appreciate your effort in responding!! ❤️

#

that was helpful

serene scaffold
#

you are welcome pepefedora

vestal spruce
#

Hi, is anyone familiar with TextBlob? I just started learning how to use it and I'm wondering if the package includes stop words in its model?

#

OK, I just looked it up: apparently TextBlob doesn't have stopwords, so I guess adding NLTK is necessary for this preprocessing task :/

rich moth
rich moth
#

I had this crazy idea of using AI to predict prime numbers, and guess what? It actually works! lol So I trained 3 different ML models on a big dataset of primes, and then combined their predictions for accuracy. It even generates an Ulam spiral to visualize the patterns!

rich moth
# serene scaffold How did you do it?

So after reading about the Riemann Hypothesis, I had this weird thought: what if, instead of trying to PROVE patterns in primes, we tried to PREDICT them using AI?

brave stream
#

Hi, does anybody have any experience using the Magenta library? I'm attempting to work on an audio synthesis project using audio files/spectrograms and am trying to follow Magenta's guides on installation/implementation, but it doesn't seem to be properly supported by Google Colab anymore, or their guides just aren't current anymore. Are there known workarounds/forks/repositories that account for this?

rich moth
#
```
INFO:__main__:Generating primes...
INFO:__main__:Generated 78498 primes
INFO:__main__:Training quantum ensemble...
INFO:__main__:Extracting features for training...
INFO:__main__:Training with feature shape: torch.Size([78497, 6])
INFO:__main__:Epoch 0, Average Loss: 49.3643
INFO:__main__:Epoch 10, Average Loss: 1.6099
INFO:__main__:Epoch 20, Average Loss: 1.2289
INFO:__main__:Epoch 30, Average Loss: 1.0271
INFO:__main__:Epoch 40, Average Loss: 0.8719
INFO:__main__:Epoch 50, Average Loss: 0.8199
INFO:__main__:Epoch 60, Average Loss: 0.8696
INFO:__main__:Epoch 70, Average Loss: 0.6831
INFO:__main__:Epoch 80, Average Loss: 0.6840
INFO:__main__:Epoch 90, Average Loss: 0.7866
INFO:__main__:Training Random Forest...
INFO:__main__:Making predictions...
100%|██████████| 99/99 [00:06<00:00, 14.41it/s]
INFO:__main__:Mean Absolute Error: 10.0802
```

I'm generating 5,761,455 primes now, I'll share the results with you guys when it's done

rich moth
#

From a security perspective, what implications might this have if an AI could reliably predict prime number patterns?

serene scaffold
serene scaffold
#

@wooden sail What is the largest known prime n for which all prime numbers lower than n are known?

#

(there might be undiscovered prime numbers that are less than the largest known prime)

rich moth
#

Dude, I was imagining a blockchain where each new block represents a verified prime number, secured through a DAG consensus mechanism. Miners would contribute by finding new primes instead of just generating meaningless hashes; the output of the computation is actually useful information we can build a ledger from. I mean, it's actually contributing to science.

rich moth
#

This one's a little more interesting.

gritty vessel
#

Hey can I ask questions related to data processing in this channel?

rich moth
gritty vessel
#

OK thanks

#

I am working on satellite data. Let's say I have data from two satellites, namely l1 and l2, and I have to extract the area covered by l2 from l1

#

the data in l1 is like a quadrilateral, and the region covered by l2 is a curved strip passing through that quadrilateral

#

Currently I am using cKDTree to extract the nearest coordinates in l1 and l2 with a tolerance of 0.057

#

cKDTree is working fine, but I wondered if there is any better approach to do this

#

In this image, the red line is l2 and the whole region visible on the map is the data of l1
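The cKDTree approach can be extended like this: instead of one nearest neighbour per point, `query_ball_point` returns every l1 point within the tolerance of each l2 point, and the union of those indices is the strip of l1 covered by l2. A sketch assuming both datasets are (N, 2) arrays of coordinates; `l1_points_near_l2` is a made-up name. If the coordinates are latitude/longitude in degrees, a fixed Euclidean tolerance is only approximate, since a degree of longitude shrinks away from the equator.

```python
import numpy as np
from scipy.spatial import cKDTree


def l1_points_near_l2(l1_xy: np.ndarray, l2_xy: np.ndarray, tol: float = 0.057) -> np.ndarray:
    """Sorted indices of l1 points lying within `tol` of at least one l2 point."""
    tree = cKDTree(l1_xy)                       # build once on the l1 grid
    hits = tree.query_ball_point(l2_xy, r=tol)  # one list of l1 indices per l2 point
    if len(hits) == 0:
        return np.array([], dtype=int)
    return np.unique(np.concatenate([np.asarray(h, dtype=int) for h in hits]))
```

The matching l1 rows are then `l1_xy[idx]`, and any other l1 variables can be indexed the same way.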

rich moth
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

gritty vessel
#

Should I share the code that I am using?

wooden sail
# serene scaffold <@467435887236612106> What is the largest known prime `n` for which all prime nu...

i'm honestly not sure, though i think the largest ones like the recent one in the news are larger than that. but google tells me computing trillions of primes is a common thing, and it's a list longer than one would want (or be able to) store. here's a link to something i found on reddit https://t5k.org/notes/faq/LongestList.html that talks about it and provides a list of 50 million primes, but the TL;DR is that you'd need a method of generating the primes on the fly, at least up to 10^18 if you want to be competitive with this person's website
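The "generate the primes on the fly" idea is usually done with a segmented sieve of Eratosthenes: sieve the base primes up to √N once, then sieve consecutive windows so memory never grows with N. A minimal pure-Python sketch (far slower than dedicated tools such as primesieve). For reference, 78,498, the count in the log above, is exactly the number of primes below 10^6, and 5,761,455 is the number of primes below 10^8.

```python
import math
from typing import Iterator


def simple_sieve(limit: int) -> list[int]:
    """All primes <= limit, using the classic sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... (smaller multiples are already crossed out)
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def primes_below(limit: int, segment_size: int = 1 << 16) -> Iterator[int]:
    """Yield every prime < limit, sieving one window [low, high) at a time.

    Memory stays around O(sqrt(limit) + segment_size) instead of O(limit).
    """
    base = simple_sieve(math.isqrt(limit))  # enough to sieve anything below limit
    for low in range(2, limit, segment_size):
        high = min(low + segment_size, limit)
        mark = bytearray([1]) * (high - low)
        for p in base:
            # First multiple of p inside the window, but never p itself
            start = max(p * p, -(-low // p) * p)
            if start >= high:
                continue
            mark[start - low :: p] = bytearray(len(range(start, high, p)))
        for offset, flag in enumerate(mark):
            if flag:
                yield low + offset
```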

rich moth
gritty vessel
#

Currently I am using this way

hazy vector
#

Can anyone help me? I want to learn coding and spoken English at the same time, and I'm confused about how to do that.

unkempt apex
lapis sequoia
#

Has anyone built a CI/CD pipeline for Azure Databricks notebooks using Azure Pipelines??

mint palm
#

If an unknown company approaches you, should they give some hint about the pay range?
I hate it when there is no way I can get an idea about the pay range,
and asking upfront doesn't seem very convenient as a candidate.

signal whale
#

@rich moth can you help?

rich moth
signal whale
golden canyon
#

hey guys, I am looking for some advice on how to start getting invited to interviews as an ML engineer fresh out of university. I am totally lost

rich moth
rich moth
# golden canyon hey guys, I am looking for some advice on how to start being invited for the int...

I would think attending tech conferences and trying to link up and meet other people would be a good bet. I imagine having some cool projects on your GitHub would be a plus. A resume, anyone can generate that. I would want to see your thoughts turned into actions in the form of projects, or something with some depth. That's what I would look for if I were in that position. It's like that movie "Field of Dreams": there's a quote, "If you build it, he will come." Have you tried just asking Google or a chatbot? Nowadays they're great places for resources; use the new free Claude and ask that same question.

vocal cave
vocal cave
#

with a whopping 41 million digits

#

The largest known prime number is 2^136,279,841 − 1, a number which has 41,024,320 digits when written in base 10. It was found on October 12, 2024 by a computer volunteered by Luke Durant to the Great Internet Mersenne Prime Search (GIMPS).

A prime number is a natural number greater than 1 with no divisors other than 1 and itself. According to ...

#

https://en.wikipedia.org/wiki/Euclid–Euler_theorem
it was this obscure theorem I was talking about

The Euclid–Euler theorem is a theorem in number theory that relates perfect numbers to Mersenne primes. It states that an even number is perfect if and only if it has the form 2^(p−1)(2^p − 1), where 2^p − 1 is a prime number. The theorem is named after mathematicians Euclid and Leonhard Euler, who respectively proved the "if" and "only if" aspects o...
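The theorem is easy to check numerically for small exponents. A minimal sketch using plain trial division (fine for tiny p, hopeless for record-size Mersenne numbers, which need specialised tests such as the Lucas–Lehmer test): it builds 2^(p−1)(2^p − 1) for every p where 2^p − 1 is prime and confirms that the proper divisors sum back to the number. By the same theorem, the 2024 record prime above also yields a new even perfect number.

```python
def is_prime(n: int) -> bool:
    """Trial division; only suitable for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def divisor_sum(n: int) -> int:
    """Sum of the proper divisors of n (every divisor except n itself)."""
    total = 1 if n > 1 else 0
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            if i != n // i:
                total += n // i
        i += 1
    return total


def even_perfect_numbers(max_p: int) -> list[int]:
    """Euclid's direction: 2**(p-1) * (2**p - 1) is perfect whenever 2**p - 1 is prime."""
    return [2 ** (p - 1) * (2 ** p - 1)
            for p in range(2, max_p + 1) if is_prime(2 ** p - 1)]
```

For example, `even_perfect_numbers(7)` gives the four classical perfect numbers 6, 28, 496 and 8128.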

rich moth
#

I finally got all the losses to converge, but I need some serious equipment to train it, lol. 128 GB of RAM and 24 GB of VRAM isn't cutting it; it gets killed along the way due to OOM

#

Tonight I'm gonna combine the Parquet files of MTG and Pokemon. If I'm gonna train this thing, I imagine it will be on AWS EC2? I'm gonna need to recreate the env in Ubuntu. Anyone got any ideas?

shut girder
rich moth
# shut girder Hi, I am not experienced enough to help. But this seems very interesting, what d...

I'm building a multi-modal AI model that's learning to understand trading card games. I built datasets with all the Pokemon and MTG cards and organized them, roughly 25 GB of data. I'm using a VQ-VAE to learn the card layouts and art visual elements, diffusion to help generate and refine the images, and CLIP to align images with the text descriptions. I'm using a sentence transformer to understand the card mechanics and flavor text (card names, card rules, etc.) and the different writing styles. I made a transformer to understand all the card stats, types, subtypes, etc. This confusion matrix shows how it's learning to classify the cards. I'm just doing Pokemon right now until I get the final touches on it. I'm using Optuna to tune the hyperparameters. The idea is to get it to understand the realms of both and generate new and unique cards based on prompts.

#

There are some other things going on, but that's the gist.

versed pilot
signal whale
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

signal whale
toxic mortar
#

How do you guys approach parsing images within a document?

I've reached the point where I can extract them and use OpenAI vision models to analyze them, meaning I have two independent components (one is the textual corpus, and the other is text that originates from the image analysis).

I want to embed the analyzed image context at the correct place within the parsed textual content of the page.
Is there any prebuilt solution for this within LlamaParse that I'm not aware of?

serene scaffold
#

<@&831776746206265384> advertising @cosmic patrol

rugged mist
#

@cosmic patrol please do not send advertisements in this server without approval from moderators

signal whale
#

Is there a way I can combine 2 tables, a dim table and a fact table, in Looker Studio? 🙂

left tartan
signal whale
#

wanna have something like this

#

as 3d table

left tartan
#

Oh, it's a Looker question, not a query question.. nm, can't help 🙂

signal whale
#

yeah, it sucks, I like Power BI more lol, and I use Linux, so you know it's bad

signal whale
#

fixed it

fickle shale
fickle shale
rich moth
long robin
#

Bottleneck issue in PyTorch:
I will be grateful for any solution or suggestion that works for me.

I am learning deep learning using PyTorch. When I train CNN models on some datasets, or sometimes when using pretrained models, it doesn't utilize my GPU at all. I have an RTX 3050 Ti. And sometimes it works. Another thing: with num_workers=0, where everything runs on the main thread, I didn't get the issue, but values greater than 0 caused the same low/zero GPU usage.

I have read so many articles, forums and other stuff, but I am not able to understand what's actually happening.

jaunty helm
fickle shale
floral delta
#

hey guys, I'm doing a data visualization with a BFS algo, but the graph has some bugs when I run it. Do you know why that happens?

sharp crest
#

Screenshot Your Bugs

earnest widget
fickle shale
floral delta
earnest widget
floral delta
#

I solved it, but idk how to implement this point in my function: "The function in point four above should accept, i.e. have, three arguments that can be changed, as follows:

  1. graph_name.
  2. (Start/initial) student name who wishes to connect.
  3. (End/goal) student name of target person.
    For example, if you store the relationships under a structure named "GraphX" and "Ema" wishes to get introduced to "Bob", then you would call the function from the main as follows:
    BFS_firstname("GraphX", "Ema", "Bob")"
#

cuz in my State is already declared my initial State = myName and my GoalState = Jill but idk what to exactly do in that point
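A minimal sketch of how the three changeable arguments could look, assuming the relationships are stored as adjacency lists in a dict keyed by graph name. `GRAPHS` and the "Tom" entry are hypothetical; only `BFS_firstname`, "GraphX", "Ema", "Bob" and "Jill" come from the assignment text. The point is that the start and goal become parameters instead of a hard-coded initial State and GoalState:

```python
from collections import deque

# Hypothetical storage: graphs keyed by name, each an adjacency list of
# student -> friends (undirected relationships are listed in both directions).
GRAPHS = {
    "GraphX": {
        "Ema": ["Tom", "Jill"],
        "Tom": ["Ema", "Bob"],
        "Jill": ["Ema"],
        "Bob": ["Tom"],
    }
}


def BFS_firstname(graph_name, start, goal):
    """Return the shortest introduction chain from start to goal, or None."""
    graph = GRAPHS[graph_name]
    queue = deque([[start]])  # each queue entry is a full path so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # goal is not reachable from start


if __name__ == "__main__":
    print(BFS_firstname("GraphX", "Ema", "Bob"))  # ['Ema', 'Tom', 'Bob']
```

Because BFS explores neighbours level by level, the first path that reaches the goal is a shortest chain of introductions.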

fickle shale
fickle shale
#

Check this!

fickle shale
earnest widget
fickle shale
#

It was working, but as of today it's not working

earnest widget
# fickle shale so how can i correct it?

I don't think anything can be changed for it, because the view is optimized by Kaggle. Instead, maybe you could put it in the form of a pandas DataFrame? But it won't look like a proper colour table. Looking at other notebooks, it seems that's the only way to provide a dataset description, or you could make it a list view. That's about it, I guess.

broken eagle
#

Has anyone here worked with streaming datasets and DataLoaders? I need some help fine-tuning a BLIP model. If you have any experience or a reference notebook, please hit me up. Thank you.

hearty token
#

How can I precisely describe this? I've tried using VLMs, but they tend to get the precise coordinates wrong

cyan birch
#

I'm dealing with an outlier problem and have two approaches for handling the outliers:
create a class with IQR and z-score methods, compare them, then build a filter with the better of the two, or
use scipy's zscore, convert the outliers to null, and then use a decision tree to predict them again (this could cause overfitting).
What should I do to address this problem: reduce the data or impute it?
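A minimal sketch of the comparison idea using only the standard library: both rules flag values, and flagged values are set to None instead of being deleted, so you can still choose between dropping and imputing afterwards. The 1.5×IQR and |z| > 3 cutoffs are conventional defaults rather than tuned values, and the data is made up:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score (population std) exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) / std > threshold for v in values]


def mask_outliers(values, flags):
    """Replace flagged values with None so they can be dropped or imputed later."""
    return [None if flagged else v for v, flagged in zip(values, flags)]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 10, 95]
    print(mask_outliers(data, iqr_outliers(data)))
    print(mask_outliers(data, zscore_outliers(data)))
```

On this 8-point toy sample the z-score rule flags nothing: with n points, the largest possible |z| (population std) is √(n−1) ≈ 2.65, so a cutoff of 3 can never trigger, while the IQR rule catches 95. That's one concrete reason the two rules can disagree on small data.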

hardy depot
fiery citrus
#

Hello, I have my training code working for my AI model, but when I try to use my GPU, TensorFlow just outright doesn't detect it.

I have an AMD RX 6600. With this code the GPU count is 0. Could it be that my GPU isn't supported?

import tensorflow as tf

# Confirm TensorFlow is using GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
#

I found out that I'm an idiot and the standard TF build doesn't support AMD GPUs, I'll use a different module

jaunty helm
unkempt wigeon
#

What functions are good for a convolutional neural network? My apologies

serene scaffold
unkempt wigeon
unkempt apex
unkempt wigeon
#

I want the AI to tell these apart

serene scaffold
unkempt wigeon
#

Yes. What type of defs should I use, in case I want to give it more images of the cars?

Do I use a ReLU or a leaky ReLU? Should I use softmax at the beginning and the end?

spring field
#

I'm fairly certain that the choice of what activation function to use has been explained at some point over the last several weeks
you either follow some implementation from a paper or you just test with several or just go with your gut, nothing here is set in stone
for CNNs, a ReLU is probably one of the most common ones

in terms of code architecture:

what type of defs I should use
that's something you sort of figure out by trying out various stuff and seeing what works best for you, it's more of a software design choice really
like, on its own or rather in this context of neural nets, it doesn't really make that much sense as a question, it also somewhat indicates a lack of understanding of some core principles that you should already be familiar with before taking on a project like this
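For reference, the activations mentioned above are short functions; here's a plain-Python sketch (in PyTorch you would normally use nn.ReLU, nn.LeakyReLU and nn.Softmax instead). Softmax usually goes only at the output layer of a multi-class classifier, not at the beginning. Also, nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so a model trained with it usually doesn't need its own final softmax:

```python
import math


def relu(x):
    """Zero for negative inputs, identity for positive ones."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope so the gradient is never exactly zero."""
    return x if x > 0 else alpha * x


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(relu(-2.0), leaky_relu(-2.0))
    print(softmax([2.0, 1.0, 0.1]))
```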

sinful surge
#

which version of TensorFlow should I install? I'm installing conda right now.

serene scaffold
sinful surge
#

leave the PyTorch, they are beginners.

serene scaffold
late lichen
#

So uhm, gradient descent is like taking the relation of a parameter to the output, right?
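Roughly yes, with one refinement: the gradient is the derivative of the loss (not the raw output) with respect to each parameter, i.e. how much the loss changes if you nudge that parameter, and gradient descent moves each parameter a small step in the opposite direction. A toy sketch fitting a single weight w in y ≈ w·x with a hand-derived gradient (the data and learning rate are made up):

```python
def fit_slope(xs, ys, lr=0.01, steps=500):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # For L(w) = mean((w*x - y)^2), dL/dw = mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # step against the gradient to reduce the loss
    return w


if __name__ == "__main__":
    print(fit_slope([1, 2, 3, 4], [2, 4, 6, 8]))  # converges to ~2.0
```

In a real network, autograd computes the same kind of derivative for every weight at once; the update rule is unchanged.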

serene scaffold
wooden sail
#

i wouldn't split it like that, but pytorch is a safer bet

#

tf introduces breaking changes irregularly

serene scaffold
#

But there's nothing inherent to either library that makes it better for beginners or pros

indigo wing
#

Hey, I want to enrich Llama 3.2 3B uncensored instruct with specific knowledge by fine-tuning the model for my needs. How do I prepare the dataset? The dataset is about 3 GB of English and maths text. I have no idea what approach to take to preprocess it. The most I've done is stripping blank lines and unwanted characters using regex. My goal is to make it queryable. Do I use Llama? I have no idea; I'm new to fine-tuning and this is my first time using a GPU.

#

Do I embed the words first?

#

My dataset is PDF images with text, which I successfully extracted and saved via fitz and os.walk (hurray os.walk). I'm also aiming for very specific answers and for limiting knowledge and answer access based on the user's authentication level. Can someone please help me out here?

#

I also tried using LlamaIndex and Weaviate instead of FAISS, but it's hard for me to set up, and I don't want to use GPT but I want good results. I don't even know what to do anymore.

unique spoke
#

Hey guys, I'm currently working on a computer vision project and I'm not sure if I'm going in the right direction. I made a Flutter app that has a live camera stream, and I'm trying to send each frame to the Flask server containing the AI code, which (not working yet, but in theory) should apply that code to it. Am I going the wrong way? Am I supposed to create a camera livestream in the Flask code and then somehow link it to Flutter? I saw someone on Stack Overflow trying to do something similar, and he went that direction. Any advice on how I should go about it? A step-by-step process would be appreciated (I will stick to Flask and Flutter though).

twilit iris
# unique spoke Hey guys i am currently working on a computer vision project and im not sure if ...

Flutter App:
Use the camera package to capture the camera stream.
Convert each frame to a format that can be sent over the network (e.g., JPEG).
Use a library like http to send the frames to the Flask server.
Flask Server:
Use a library like flask to create a RESTful API that accepts frames from the Flutter app.
Use a library like OpenCV or TensorFlow to apply AI processing to the frames.
Send the processed frames back to the Flutter app.

iron basalt
# serene scaffold The only people I see using tensorflow are beginners following old tutorials

TensorFlow seems kind of like abandonware at this point. Several issues remain open and unresolved, and the people still using it are starting to post stuff like https://github.com/tensorflow/tensorflow/issues/69586 .

#

My guess is that Google shifted its focus to JAX.

serene scaffold
iron basalt
marsh marsh
#

can anyone help me with SQLAlchemy?

iron basalt
#

The biggest issue with this is that a ton of old papers used it and now can't be reproduced, which kind of invalidates them. OpenAI Gym had the same issue, but it was forked and is now maintained by the Farama Foundation (as Gymnasium).

serene scaffold
serene scaffold
iron basalt
#

Gym took a lot of effort because it had to be exactly the same. For example, the image resize algorithm had to match, and there's actually no standard way of doing that, so it took digging through a bunch of old code to reproduce it. A lot of the DQNs fail if that is even slightly different (with no noticeable difference to the human eye). If TF has any of the same issues, it means all those old projects are doomed.

serene scaffold
marsh marsh
serene scaffold
marsh marsh
toxic mortar
#

Hi guys,

How is it possible that my model invocation time is longer than the entire request handling time (which includes the model request)? I'm calling the Hugging Face API.

Any ideas?


@ModelInvocationTimeMetric.time()
def model_invocation():
  pass

@RequestInvocationTime.time()
def handle_request():
  # ...
  model_invocation()
  # ...
buoyant vine
#

Is the model on a GPU? It could be that the timer is being thrown off by counting only CPU time and not the time spent on the GPU
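One way "inner > outer" can happen, sketched under the assumption that the two metrics use different clocks: time.process_time() counts only the CPU time of your process, so time spent waiting on a network call (like the Hugging Face API request) doesn't show up, while time.perf_counter() measures wall-clock time. The `timed` decorator is a hypothetical stand-in for the metric decorators in the question, and time.sleep() stands in for the remote call:

```python
import time
from functools import wraps


def timed(clock, sink, name):
    """Hypothetical decorator: store the elapsed `clock()` time of each call in sink[name]."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return func(*args, **kwargs)
            finally:
                sink[name] = clock() - start
        return wrapper
    return decorator


timings = {}


@timed(time.perf_counter, timings, "model_wall")  # wall-clock time
def model_invocation():
    time.sleep(0.2)  # stands in for waiting on a remote HTTP call


@timed(time.process_time, timings, "request_cpu")  # CPU time only
def handle_request():
    model_invocation()


handle_request()
# The outer CPU-time measurement excludes the ~0.2 s spent waiting,
# so it comes out far smaller than the inner wall-clock measurement.
print(timings)
```

If both metrics used the same wall-clock timer, the outer duration could never be smaller than the inner one, so checking which clock (and which units and aggregation) each metric uses is a reasonable first step.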

toxic mortar
untold shoal
#

anyone here fairly familiar with langgraph?

young beacon
#

Hello, I have a set of different passages. I'd like to identify the most frequent terms across them, identify the theme of each passage, and group them into categories based on it

untold shoal
#

could prolly use ai for that somehow lmao

young beacon
#

I don't have AI available if you mean LLMs

untold shoal
#

low parameter models like llama3.2 exist that are designed to run on literal phones

#

models are getting small but powerful recently

buoyant vine
#
  • Most frequent terms is just tokenizing + counting
  • Theme classification into categories can be done with a GRU or similar on GloVe embeddings, or you could use a small transformer model like BERT
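The "tokenizing + counting" part can be sketched with the standard library alone. The regex tokenizer and the tiny stop-word list are simplifying assumptions (real stop-word lists, e.g. from NLTK or spaCy, are much longer):

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(passages, k=10):
    """Count non-stop-word tokens across all passages and return the k most common."""
    counts = Counter()
    for passage in passages:
        counts.update(t for t in tokenize(passage) if t not in STOP_WORDS)
    return counts.most_common(k)


passages = [
    "The model trains on the GPU.",
    "GPU memory limits the model size.",
    "Tune the model learning rate.",
]
print(top_terms(passages, k=2))  # [('model', 3), ('gpu', 2)]
```

Since it takes a list, all 400 passages can be passed at once; assigning a theme to each passage still needs one of the models mentioned above.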
young beacon
#

But how do I pass all the 400 passages at once and identify patterns?

young beacon
buoyant vine
#

or do zero-shot classification with a bigger LLM like llama as dragon has said

untold shoal
#

CF8, are you familiar with LangGraph in any major way?

buoyant vine
#

nop

untold shoal
#

ah alr

young beacon
whole elm
#

I have this figure using matplotlib. I want to fill in the area between the blue and green plots. I have tried fill_between and fill_betweenx but can't seem to get the entire area filled. Does anyone have any suggestions?

bitter garden
#

This is a HistGradientBoostingRegressor plot from a 300-row dataset with n_splits=5, hence 5 folds. Avg R-squared is 0.77.
Would I be able to increase the R² score if I restrict the model to train only on the lower-end values? And maybe increase the number of folds, which may or may not help, idk.

stark echo
#

can anybody help me with how I should learn TensorFlow?

rich moth
#

It's basically a transformer with attention layers that also implements uncertainty estimation using Monte Carlo dropout, capturing both aleatoric and epistemic uncertainty. Right now I'm just running Optuna on the parameters.

grim carbon
#

Hi, is there anyone here who uses the lbph algorithm as a face recognition method?

untold cliff
#

Why does spaCy's small English model (en_core_web_sm) return vectors (of size 96) even though it shouldn't have any embeddings according to the documentation?

rich moth
untold cliff
rich moth
long hazel
mental ivy
#

guys, does anyone use Python NLTK?

serene scaffold
stable hollow
#

Hey guys I'm looking for tutorials on making maps and working with map data. I want to focus on the graphics side of things, and customizing visuals with shapefiles and possibly layering things to make more in depth charts. I'm trying to work with geopandas and seaborn right now. Any ideas?

terse crag
#

Hello, I have a small problem with PySpark. I'm trying to rewrite code from pandas where I have:

df.merge(df2, how="outer", left_on=["a"], right_on=["b"])

And I get columns a_x and a_y. But when I do it in PySpark like:

df.join(df2, how="outer", on=["a"])

I get column a twice... What did I do wrong?

desert oar
#

Maybe #tools-and-devops would be better? It doesn't seem like a Python-specific question

strong wharf
#

Okay, I'll head there, thank you

rough herald
#

is it possible to use ChatGPT in my Python code? Not to help me code, just to implement it in my program

summer glade
past bramble
#

any project ideas for AI? Should I try something different from neural network models? I only used TensorFlow in most of my AI apps, any other suggestions I should try?

past bramble
#

I have access to an A100 GPU, can anyone guide me on making a text GPT with it?

rich moth
#

Truth is, you'll learn faster by doing it yourself than by waiting for someone on here to show you. Your best bet is to start experimenting and ask for help along the way.

serene scaffold
#

What are you ultimately trying to do?

past bramble
#

i was shown a GitHub repo in this channel that guides you through building one, and it requires an A100

#

I'll have to search for it

#

I'm not looking for a perfect advanced text gpt, maybe a simple one

#

found it

grand breach
#

I observed that after removing Tomek links and undersampling only the majority class, the ROC AUC score dropped by 11% (trained on 136,750 records with balanced classes), but when I initially trained on 37,668 samples (with balanced classes) after removing Tomek links from both classes, I got an ROC AUC score of 84%. I trained with XGBoost. Which of these is actually a good model: the one trained on fewer samples or the one trained on more samples?

grand breach
#

I mean, both of them have far fewer minority-class samples than the real dataset, which has 600k minority samples

lime palm
#

hey

rigid hamlet
#

Hey bro

#

Haha the bots were going nuts lol

harsh heron
#

Actually, I want to make a project on how nerves work when we're overthinking and when we interact with people

I want to write a Python program for this concept, so I need someone's help

vestal spruce
#

Hi guys, I just finished a machine learning model comparison for financial sentiment classification. So far I've found the best model to be an SVM with n-gram TF-IDF feature extraction, with an accuracy score of about 83%. At this point, should I keep exploring better methods, or focus on tuning the hyperparameters of the best model?

cedar tusk
#

I don't know the current value of your F1 score, but if it's less than 60%, I mean

#

or it's good for me as well

jaunty helm
#

accuracy is especially bad if your data is imbalanced
e.g. detecting fraud, there's way more data about normal transactions than fraudulent ones
just blindly guessing everything's normal gives you high accuracy

#

you can check others like f1, recall, precision, roc-auc
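To make the fraud example concrete: a classifier that predicts "normal" for everything scores 99% accuracy on a 99:1 dataset but has zero precision, recall and F1 on the fraud class. A plain-Python sketch with made-up labels (for real work, sklearn.metrics provides precision_score, recall_score, f1_score and roc_auc_score):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 990 normal transactions (0) and 10 fraudulent ones (1), made up for illustration
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # "blindly guess everything is normal"
print(accuracy(y_true, y_pred))             # 0.99
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```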

past meteor
#

I don't really like AUC either

vestal spruce
opaque merlin
#

hello, I'm trying to do Arabic-to-English translation using a Seq2Seq RNN model. Can anyone experienced with NLP in PyTorch help? I wanted to ask a question

jaunty helm
grand breach
grand breach
past meteor
#

You care about the misclassification costs at the operating point you set

grand breach
#

I was reading that tomek links removes majority samples at decision boundary and is useful for dealing with class overlapping
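That matches the usual definition: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour, so they sit on (or across) the class boundary. A brute-force sketch that finds the links and drops only the majority-class member of each; it's O(n²) and meant only for illustration (imbalanced-learn's TomekLinks, whose sampling_strategy parameter controls which classes get cleaned, is the practical route at 136k rows). The points and labels are made up:

```python
import math


def nearest_neighbour(i, points):
    """Index of the point closest to points[i], excluding itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Index pairs (i, j) that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_in_links(points, labels, majority):
    """Drop only the majority-class member of each Tomek link."""
    drop = set()
    for i, j in tomek_links(points, labels):
        drop.update(k for k in (i, j) if labels[k] == majority)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


points = [(0, 0), (0, 1), (5, 5), (5.2, 5), (9, 9)]
labels = [0, 0, 0, 1, 0]
print(tomek_links(points, labels))  # [(2, 3)]
```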

past meteor
#

Imagine you have 2 cases, one with a higher AUC and one with a lower.

It's entirely possible that the optimal operating point is found on the second one

#

Each point on the ROC curve (whose area the AUC measures) is an operating point
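A sketch of picking the operating point by cost instead of by AUC: every threshold on the model's score gives one point on the ROC curve, and you keep the threshold with the lowest total misclassification cost. The scores, labels and costs (a false negative 10× as expensive as a false positive) are made up for illustration:

```python
def cost_at_threshold(scores, labels, threshold, cost_fp=1.0, cost_fn=10.0):
    """Total misclassification cost when predicting positive for score >= threshold."""
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return cost_fp * fp + cost_fn * fn


def best_operating_point(scores, labels, **costs):
    """Try every distinct score as a threshold; return (threshold, cost) with the lowest cost."""
    return min(
        ((t, cost_at_threshold(scores, labels, t, **costs)) for t in sorted(set(scores))),
        key=lambda pair: pair[1],
    )


scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(best_operating_point(scores, labels))  # (0.35, 1.0)
```

Two models with different AUCs can end up with the same, or reversed, cost at their respective best thresholds, which is the point being made above.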

pearl basin
#

Hey there,
I already know basic web dev (MERN),
some Python, DSA, OOP, and stats,
but I have no idea about AI/ML dev.

Can anyone suggest a roadmap for becoming an ML engineer?

jaunty helm
grand breach
cedar tusk
#

rest is u can do whatever u want at that point

grand breach
#

Removing only the majority samples in each Tomek link sounds more correct, but I'm not sure why performance decreased. I'd used random undersampling, which might have caused this, I'm guessing? I'd used XGBoost without much parameter tuning

desert plinth
#

Can anyone explain why, when I run my RL agent for 1000 episodes (for the game Snake), it stops making inputs by episode 500 and just dies on the top wall despite being penalised for doing so?

#

These are the first 10 episodes:
Episode: 1, Total Reward: 2.80, Epsilon: 1.00
Episode: 2, Total Reward: 12.10, Epsilon: 1.00
Episode: 3, Total Reward: 45.80, Epsilon: 0.99
Episode: 4, Total Reward: 22.90, Epsilon: 0.99
Episode: 5, Total Reward: 2.20, Epsilon: 0.99
Episode: 6, Total Reward: 22.00, Epsilon: 0.99
Episode: 7, Total Reward: 5.40, Epsilon: 0.98
Episode: 8, Total Reward: 1.80, Epsilon: 0.98
Episode: 9, Total Reward: 0.40, Epsilon: 0.98
Episode: 10, Total Reward: 11.40, Epsilon: 0.97
vs the last 11:
Episode: 990, Total Reward: -0.00, Epsilon: 0.30
Episode: 991, Total Reward: -0.50, Epsilon: 0.30
Episode: 992, Total Reward: -0.40, Epsilon: 0.30
Episode: 993, Total Reward: 0.10, Epsilon: 0.30
Episode: 994, Total Reward: -0.40, Epsilon: 0.30
Episode: 995, Total Reward: 0.10, Epsilon: 0.30
Episode: 996, Total Reward: -0.20, Epsilon: 0.30
Episode: 997, Total Reward: -0.50, Epsilon: 0.30
Episode: 998, Total Reward: -0.40, Epsilon: 0.30
Episode: 999, Total Reward: -0.40, Epsilon: 0.30
Episode: 1000, Total Reward: -0.40, Epsilon: 0.30

unkempt apex
#

so as per you,

after you have trained it for 1000 episodes it is still dying to top wall right?

desert plinth
#

yep

#

and it has a punishment of -10 for doing so

unkempt apex
#

and till now you have only trained for 1k episodes?

desert plinth
#

yeah I'm pretty new so idk if that's bad lol

unkempt apex
#

for your information,
1k is too short

desert plinth
#

But at the beginning it makes a lot of inputs and explores; because of the epsilon decay, it stops making inputs towards the end
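A sketch of what an epsilon schedule like the one in the log does. The decay rate 0.997 and floor 0.3 are guesses chosen to look similar to the printed values, not taken from the posted code. As epsilon shrinks, more actions come from the argmax over the Q-values, so if the Q-network has learned something wrong, the agent repeats the same bad move (such as heading for the top wall) on most steps:

```python
import random


def decayed_epsilon(episode, start=1.0, decay=0.997, minimum=0.3):
    """Multiplicative per-episode decay, clipped at a floor."""
    return max(minimum, start * decay ** episode)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in (0, 10, 500, 999):
        eps = decayed_epsilon(episode)
        print(episode, round(eps, 2), choose_action([0.1, 0.9, 0.5, 0.2], eps, rng))
```

So the lack of variety late in training mostly reflects what the Q-values currently prefer; the reward function and the Q-update are worth checking before the epsilon schedule.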

unkempt apex
#

read the DQN paper first

#

this one

desert plinth
#

okay thank you

unkempt apex
desert plinth
#

yeah

unkempt apex
#

but I'd suggest sharing details about your environment and your reward function

desert plinth
#

I linked the file above

unkempt apex
desert plinth
#

with all the code

unkempt apex
#

where?

#

plz share again if possible

desert plinth
#

I can't reply to it in chat since it pops up with an error

unkempt apex
desert plinth
#

Ohhhh

unkempt apex
#

paste the code in this link and share the link

#

do you have a custom environment, or are you using Gym??

#

wait does gym have snake game in it?

desert plinth
#

I just wrote the code for the Snake game together with the agent

past bramble
desert plinth
#

I didn't use an environment

unkempt apex
desert plinth
#

I made it store the values for the weights after each run

unkempt apex
#

please don't use GPT and claude if you are learning

desert plinth
#

so it keeps and improves them

unkempt apex
#

this is all AI made code

#

don't do like this

desert plinth
#

damn you're observant 😭

unkempt apex
#

wait I will share some articles/blogs where you can learn that

#

again please don't use AI when you are learning something

desert plinth
#

Okay

unkempt apex
#

it will destroy your baseline of understanding things

desert plinth
#

I thought I could grasp the concepts by seeing them done automatically

desert plinth
#

Where do you suggest I start?

unkempt apex
#
Paperspace by DigitalOcean Blog

This article gives a thorough introduction to reinforcement learning, covering topics like RL in robotics, Markov Decision Processes, Q-Learning, and more.

Medium

A gentle introduction to the key principles of Reinforcement Learning

desert plinth
#

Oh wow I was not expecting that

#

thank you

unkempt apex
#

I'd recommend using a premade environment first
and training it for at least 100k episodes

learn about DQN, but first learn what a Q-table is

#

and yeah, you can find different types of premade environments in OpenAI Gym

#

I've done about 300k episodes on my Pong game, which has a custom-made environment
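Since the advice is to learn Q-tables before DQN, here's a minimal tabular Q-learning sketch on a made-up 5-cell corridor (start on the left, reward 1 for reaching the right end). A DQN replaces the table q[state][action] with a neural network, but the update target r + γ·max Q(s′, ·) is the same idea. All sizes and hyperparameters are arbitrary:

```python
import random


def q_learning_corridor(n_states=5, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor; actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # the Q-table: q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore (and break ties randomly)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(q[next_state])  # terminal: no future value
            # Move Q(s, a) a fraction alpha towards the bootstrapped target.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning_corridor()):
        print(state, round(left, 3), round(right, 3))
```

After training, "move right" should have the higher value in every non-terminal cell, and the values shrink by roughly a factor of γ per step away from the goal.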

unkempt apex
#

read it in chunks and understand it step by step

cedar tusk
#

ah, reinforcement learning, the one type of learning I can't do because I can't code the game itself xD

unkempt apex
cedar tusk
#

because the most likely scenario is that you get the game logic wrong

#

and then create a biased and flawed model

unkempt apex
#

the Gym library provides all of that

#

you just have to choose right approach and train it

rigid hamlet
rich moth
# rigid hamlet Lots lol an ai stock market tool would be cool

That's what I'm working on. Well, a model to predict market volatility. I've got 209 features from market prices, technical indicators, options data, and economic signals to forecast the volatility of the market; well, that's the plan. The idea was to backtest it on QuantConnect.

rigid hamlet
#

That's fucking amazing

#

Wowwwwwww

#

I wish I could join you @rich moth XD

rich moth
# rigid hamlet Wowwwwwww

really? thanks lol. im always open to work with other people. you never know what you could learn!

rigid hamlet
#

Yes 100%

#

I'm a big-time investor in a sense

#

I'll DM you bro

craggy pilot
rich kernel
#

I need datasets (specifically udders of cows, teats, etc) for a project regarding early mastitis detection within cattle. I have looked at kaggle but images are pretty limited (they contain roughly ~100 images, but i would prefer thousands for accuracy). Does anyone know where I can find datasets?

long hazel
#

is this a place where I can post my code?

#

specifically, I'm getting an error that has to do with the .keras extension

#

ValueError: The filepath provided must end in .keras (Keras model format). Received: filepath=models/RNN_Final-{epoch:02d}-{val_acc:.3f}.model
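Going by the error message, the fix is likely just the suffix: in recent Keras versions, ModelCheckpoint wants full-model paths to end in .keras (and weights-only paths in .weights.h5). Separately, if the model was compiled with metrics=["accuracy"], the logged key is probably val_accuracy rather than val_acc, so the placeholder and monitor may need renaming too. This sketch only builds the path; the Keras call is left as a comment because it needs TensorFlow/Keras installed:

```python
# Same template as before, only the suffix changed from .model to .keras.
CHECKPOINT_PATH = "models/RNN_Final-{epoch:02d}-{val_acc:.3f}.keras"

# Usage (not executed here; assumes TensorFlow/Keras is installed):
#   checkpoint = keras.callbacks.ModelCheckpoint(
#       CHECKPOINT_PATH, monitor="val_acc", save_best_only=True, mode="max")
#   model.fit(..., callbacks=[checkpoint])

# ModelCheckpoint fills the placeholders from the epoch number and the logged
# metrics; str.format shows what one resulting file name would look like.
example = CHECKPOINT_PATH.format(epoch=3, val_acc=0.91234)
print(example)  # models/RNN_Final-03-0.912.keras
```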

tawdry sundial
agile cobalt
# rich kernel I need datasets (specifically udders of cows, teats, etc) for a project regardin...

a possible starting point is https://datasetsearch.research.google.com/

not sure if there's much out there though, maybe something like https://data.mendeley.com/datasets/kbvcdw5b4m/2 ? (note: I haven't downloaded it to check myself)

bitter garden
# rich moth That's what I'm working on. Well a model to perdict market volatility. I got ...

Working on a personal project of my own along similar lines, but for crypto.
Hourly data fetching -> a ton of technical analyses -> picking the top 30 with a good score and backtesting them.
Additionally, a hist gradient boosting model that feeds on technical data and makes hourly target predictions. Got it to almost 90% accuracy; I just need to validate it on more data to settle on a number and decide on tuning parameters. Sometimes I see overfitting issues, and there's still a lot of work to be done on implementing news-feed sentiment analysis and maybe an ensemble model stack. I'll give it maybe 70% accuracy for now; even though it shows more than that in these plots, it just sounds too good to be true lol

bitter garden
rich moth
#

Ya, overfitting issues seem commonplace with us lol

brave stream
#

Does anyone here have experience using Magenta? I'm attempting to use the library for music synthesis but it seems deprecated and doesn't interact very well with google colab now.

rich moth
#

One of the losses came back infinity. 🤔 Guess it didn't like the parameters hehe

trim saddle
long hazel
#

I got it to work using chatgpt

#

Btw, can I not implement variational autoencoders on a non-image dataset?

scenic parcel
#

Does anyone know an open-source RDBMS similar to Postgres that lets you have column names longer than 64 characters?

radiant frigate
#

how do you pick optimal parameters for a normal-inverse-gamma distribution? I'm building a two-letter classifier based on MAP, ML, and Bayesian inference estimates, and I'm not sure how to ideally pick the parameters for the NIG distribution
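Assuming the NIG is used as the conjugate prior for each class's Gaussian likelihood (mean and variance both unknown), its four parameters have readable meanings: μ0 is a prior guess for the mean, λ is how many pseudo-observations that guess is worth, and α, β set the prior on the variance. A common choice is a weak prior (small λ and α) so the data dominates, then comparing a few settings on held-out data; the defaults below are arbitrary placeholders. The sketch shows the standard conjugate update:

```python
import statistics


def nig_posterior(data, mu0=0.0, lam=1.0, alpha=1.0, beta=1.0):
    """Conjugate update of a Normal-Inverse-Gamma prior for Gaussian data.

    Prior: sigma^2 ~ InvGamma(alpha, beta), mu | sigma^2 ~ Normal(mu0, sigma^2 / lam).
    Returns the posterior parameters (mu_n, lam_n, alpha_n, beta_n).
    """
    n = len(data)
    xbar = statistics.fmean(data)
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations from the mean
    lam_n = lam + n
    mu_n = (lam * mu0 + n * xbar) / lam_n  # prior mean and sample mean, weighted by counts
    alpha_n = alpha + n / 2
    beta_n = beta + 0.5 * ss + lam * n * (xbar - mu0) ** 2 / (2 * lam_n)
    return mu_n, lam_n, alpha_n, beta_n


print(nig_posterior([1.0, 2.0, 3.0]))  # (1.5, 4.0, 2.5, 3.5)
```

From the posterior, μ_n is the estimate of the mean and β_n / (α_n − 1) is the posterior mean of σ² (for α_n > 1). With few samples per class the prior pulls these away from the ML estimates; as n grows the difference vanishes.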

random sapphire
#

hey is there any ML dc server that i can join

upbeat prism
random sapphire
vestal spruce
# random sapphire anyone?

This is the place. I'd share an alternative with you, but idk if that would be against rule 6 of this server.

vestal spruce
#

But if you must, try searching for community servers on Discord's Discover page, using "machine learning" as the keyword

vestal spruce
#

though not many of them are active, aside from a few.

vestal spruce
#

there are a lot of Discord servers grouped by type, such as, but not limited to, gaming, college/school, language, and music communities.

#

and yes even data science and ai/ml

random sapphire
vestal spruce
#

just type those keywords into the Discover search and you'll find them

vestal spruce
random sapphire
#

are you into ML

vestal spruce
#

If you have any questions about ML, don't be afraid to ask. Asking whether you can ask isn't really an efficient way to find answers, so whatever it is, just ask it; someone will eventually have an answer, and if not today, ask again tomorrow.

random sapphire
#

great, I'm a novice and I've been working on some mini projects. Could you please check out my GitHub? I just want to know if those projects are a good start in ML

#

its pxul1236 on github

vestal spruce
#

Ok I'll check it out

random sapphire
#

is it ok to send repo links in this channel?

vestal spruce
#

hmm, not sure. Would that be considered advertisement, since it's self-promotion?

random sapphire
#

ohkk

clever current
#

Does anyone have resources for developing a marketing attribution model? It doesn't have to be fancy. I'm joining an education startup soon that wants to improve their marketing attribution and tracking. They do mostly email, SMS, and phone call marketing

#

Please @ me 🙂

deep zealot
#

i just realized how diverse the matplotlib library is after only using matplotlib.pyplot then being introduced to import matplotlib instead of import matplotlib.pyplot

#

im ngl i think im going to switch to R for data visualization

agile cobalt
#

personally I like plotly for simple things

bokeh is also fine

deep zealot
#

simple plots on matplotlib.pyplot is fine

jaunty helm
#

try plotnine if you like ggplot syntax

hard mortar
#

I wrote Python code to extract data from a website, but it takes a long time to execute because the website takes a long time to update. I've already tried using commands to stop the website from updating, but it still doesn't work: each step of the code only runs once the website stops updating. Does anyone know how to help me?

agile cobalt
open lily
#

Hello, I'm kind of a beginner in Python (I know how to code simple algorithms and a little bit of NumPy), and I'm trying to learn ML

Do you guys have a plan to follow? Like, what are the basics, the things I must know, etc.

Also, I only code in VSCode, but idk if I should try another IDE

I want to know how to code a neural network from scratch before using frameworks

cedar tusk
open lily
#

โค๏ธ@cedar tusk thank you !

rich moth
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

fallow coyote
#

I've been stuck progressing my programming skills, so I'm going to go back to my original goal of learning ML. The project I have in mind is a program that creates a heatmap of a fighter's movement around the ring. What libraries do I need, and what books or websites provide good tutorials on the tools I need?

main solar
#

wow google gemini is helping a lot, I just typed a few words and it gave me everything

random sapphire
#

can someone tell me what the warnings library is used for?
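A small sketch of what the standard-library warnings module is for: code calls warnings.warn() to flag something suspicious without stopping the program (unlike raising an exception), and callers decide whether to show, record, or silence those messages. In data-science notebooks it is often imported just to call warnings.filterwarnings("ignore"), which hides library warnings but can also hide real problems. The `drop_missing` helper is hypothetical:

```python
import warnings


def drop_missing(values):
    """Hypothetical helper: drops None values but warns the caller that it did so."""
    if any(v is None for v in values):
        # A warning (unlike an exception) lets the call succeed while still flagging it.
        warnings.warn("input contains missing values; they were dropped", UserWarning)
    return [v for v in values if v is not None]


# Capture warnings, e.g. to check them in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = drop_missing([1, None, 3])

print(cleaned, caught[0].category.__name__, caught[0].message)

# Or silence them for one block only, instead of globally.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    drop_missing([None])
```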

arctic star
#

can anyone help me with computer vision

#

I'm facing an error with the intensity values in thresholded images

#

does anyone have good code for storing the intensity values from a thresholded image in a NumPy array? I'm getting wrong values

ionic valley
#

I'm a sophomore interested in ML. Should I learn Julia?

wooden sail
#

if you like it, why not? the ecosystem is smaller than in Python, but it attracts the mathy type more. as a result, it has some functionality that Python doesn't

bronze creek
#

Hi everyone, I'm new here
and don't know much about this community. I'm a Data Science student in the first semester of my study program. I want to know how many of you are new here and at the same stage. Let's collaborate.

Which one do you think is more in demand in Data Science:
Python or R?

ionic valley
ionic valley
#

am I likely to get an ROI

#

these questions are pretty bad nvm

wooden sail
#

i wouldn't say it'll make you stand out nor is it in high demand, since frameworks for ML are huge and well established

long robin
clever current
#

I think you should know math and statistics

clever current
hard mortar
bronze creek
ripe pawn
#

i created a sklearn cheatsheet, if someone is free, can they proofread and give me a bit of feedback on it.

agile cobalt
agile cobalt
clever current
left tartan
#

Would like to see too

fallow coyote
#

is it better to split a datetime column into individual day, month, and year columns rather than converting a date column into datetime?

mint bobcat
#

It's usually better to work with datetimes (and time zones)

fallow coyote
#

i read some things online saying that for ML it's better to split the datetime columns into separate columns, as it'll be faster. I'd assume it'd be faster that way rather than converting the date column into datetime and reading it

#

if you get what I mean

agile cobalt
#

for storing the data itself and for most transformations, analysis etc. it is better to store it as a datetime column

For feeding it into ml models, you must convert it to a number however - that isn't limited just to datetimes, but also every other type of data like strings and what not

For some models, you could just feed it the timestamp, but usually you'll want to split into somewhat relevant features instead of just feeding the timestamp
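A sketch of that split, keeping the datetime as the source of truth and deriving numeric features from it only when building the model input. The feature selection is illustrative; in pandas the same fields come from the .dt accessor (e.g. df["date"].dt.month, df["date"].dt.dayofweek). The sin/cos pair is one common way to tell a model that December and January are neighbours:

```python
import math
from datetime import datetime, timezone


def datetime_features(ts):
    """Turn one timezone-aware datetime into numeric features (an illustrative selection)."""
    month_angle = 2 * math.pi * (ts.month - 1) / 12
    return {
        "timestamp": ts.timestamp(),  # seconds since the Unix epoch
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "dayofweek": ts.weekday(),  # Monday = 0
        "hour": ts.hour,
        # Cyclic encoding: months 12 and 1 end up close together.
        "month_sin": math.sin(month_angle),
        "month_cos": math.cos(month_angle),
    }


features = datetime_features(datetime(2024, 3, 15, 14, 30, tzinfo=timezone.utc))
print(features["dayofweek"], features["month"])  # 4 3
```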

nocturne valley
#

anyone here have experience with wav2vec2? Any pointers for the best repo to clone for a non-GPU multicore machine?

rich moth
bitter garden
cinder schooner
#

hello, I want to learn more about CUDA and what I can do with it. Do you have any good resources on that?

rigid cape
#

Hey guys, Need ideas for ML projects for portfolio. If you guys have any lists do share. I'm looking for some ideas where I can make a small ML or DL model and then use it to make a web app and host it online.
Thank you.

quaint mulch
quaint mulch
quaint mulch
calm dome
#

hello, how can I integrate WhatsApp's Meta AI with my chats so that it can auto-reply to customers?

odd stratus
#

anyone have any methods for removing poisoned data from a data set?

left tartan
#

Tell us more?

bitter garden
random sapphire
left tartan
#

As opposed to outliers or other 'bad' data that skew the results / don't represent 'truth'

odd stratus
#

the data is a list of 78 values from 0-1 and then a column labelled "normal" or "attack"

odd stratus
#

ive managed to remove 97% of the poisoned data, but it still causes issues

left tartan
odd stratus
#

the data is 78 columns of data of values from 0-1 representing different security features to detect either potential attacks or normal system functions

agile cobalt
odd stratus
#

here are a few examples of the training data btw

odd stratus
# agile cobalt that sound pretty weird? Are you sure that it is not overfitting, or is it just...

it's not overfitting on the training data, i.e. the data that is labelled

on smaller subsets it can overfit, meaning that the subset doesn't have poisoned data, but a small subset is also easier to overfit on
however, after removing a large portion of the potentially poisoned data, the overall accuracy increases because there's less poisoned data, and trying to get it to overfit is very annoying; the main goal is just to remove the poisoned data lmao

odd stratus
velvet mountain
#

what are people usually using for monitoring deployed models? I'm specifically interested in a tool that could decide that the model is performing badly and send an alert somewhere to someone (not necessarily an automatic retrain). I know a bit of MLflow, so if there's a solution with it I'm interested. But I'm also interested in knowing the heuristic that is actually used.

cedar tusk
velvet mountain
#

yes that's my question. what are people typically using for that?

#

I guess there is a need for a kind of feedback loop, where the model predicts and the actual value is sent back for the model to know it performed badly. but I'm wondering if people have a dedicated tool for that, or if it's home-made

cedar tusk
#

TensorBoard is widely used afaik

#

but different platforms have their own equivalent

velvet mountain
#

I've looked into mlflow, but I couldn't find a way to feed a deployed model the actual values and code an "I'm obsolete" trigger

cedar tusk
#

people decide that

velvet mountain
#

yeah yeah ofc; I meant just the idea of sending alerts

#

but if I get you right, it's just another tool that would compare the predicted vs actual

cedar tusk
#

I don't see how a model can become obsolete

#

because you should build the model with everything in mind anyways

#

if you are trying to train the model continuously, you have checkpoints

#

aka u train the model every 1 month or whatever

velvet mountain
#

it's more about data drift

cedar tusk
#

population distributions don't change willy-nilly, unless a major event occurs

#

but people would handle the data change then anyways

velvet mountain
#

use case: some model is trained on a production line

#

and then the production machines get older as time goes

#

so the model starts to produce bad results, because the machines in the pipeline no longer perform the same way

#

that's the kind of thing I have in mind

#

this doesn't sound like continuous training, because the "past" data isn't really relevant anymore for predictions at the present time

cedar tusk
velvet mountain
#

yes

#

well, not just the observation X in f(X)=Y

#

really, the use case is: X was sent, f(X) was predicted, but we got Y, and N times now the prediction has badly missed (e.g. f(X) >> Y), so maybe it's time to retrain
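The trigger described here (N bad misses among recent predictions) is simple enough to hand-roll once the true Y values flow back. A minimal sketch; the class name `FailureCounterMonitor`, the absolute-error tolerance and the window size are all hypothetical choices, and actually sending the alert (email, Slack, a pager) is left out:

```py
from collections import deque


class FailureCounterMonitor:
    """Hypothetical monitor: alert when too many recent predictions miss badly."""

    def __init__(self, tolerance, max_failures, window):
        self.tolerance = tolerance          # |f(X) - Y| above this counts as a bad miss
        self.max_failures = max_failures    # this many bad misses in the window triggers an alert
        self.recent = deque(maxlen=window)  # rolling record: True = bad miss, False = ok

    def record(self, predicted, actual):
        """Log one (f(X), Y) pair once the true Y is known; return True to alert."""
        self.recent.append(abs(predicted - actual) > self.tolerance)
        return sum(self.recent) >= self.max_failures


monitor = FailureCounterMonitor(tolerance=2.0, max_failures=3, window=10)
pairs = [(10.0, 10.5), (11.0, 15.0), (9.0, 9.2), (12.0, 20.0), (10.0, 14.0)]
alerts = [monitor.record(p, a) for p, a in pairs]
print(alerts)  # [False, False, False, False, True]
```

In practice `record` would be called wherever the ground truth arrives, and a `True` return is where the notification would fire. Because the deque has a fixed length, old misses age out, so a burst of errors long ago doesn't keep the alarm on.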

cedar tusk
#

you need an observation group and to run hypothesis tests on it automatically

#

shouldn't be too hard to implement
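For the automatic hypothesis test, one common choice is a two-sample Kolmogorov-Smirnov test comparing a reference window (say, errors or a feature's values at validation time) with a recent window from production. SciPy provides this as `scipy.stats.ks_2samp`; the sketch below is a standard-library version using the usual large-sample critical value, so the helper names and the 0.05 significance level are assumptions, and the demo numbers are made up:

```py
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties)
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def drift_detected(reference, recent, c_alpha=1.358):
    """Large-sample KS test; c_alpha = 1.358 corresponds to alpha = 0.05."""
    n, m = len(reference), len(recent)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical


reference = [i / 100 for i in range(100)]             # e.g. residuals at validation time
recent_ok = [i / 100 + 0.005 for i in range(100)]     # nearly identical values
recent_drifted = [0.5 + i / 100 for i in range(100)]  # whole distribution shifted by 0.5

print(drift_detected(reference, recent_ok))       # False
print(drift_detected(reference, recent_drifted))  # True
```

Running this on a schedule over the latest window of residuals (or inputs) gives the "observation group plus automatic test" setup; a `True` is a reason to look at the model, not proof it must be retrained.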

velvet mountain
#

no no, for sure; I'm just wondering if there is a tool that is commonly used for that

#

I mean an "MLOps" tool, not a lib for hypothesis testing 😄

cedar tusk
#

from what I see, Evidently AI is not bad

#

take it with a grain of salt tho, never used the thing

cinder charm
#

matplotlib is giving me a circular import

#

how do I fix this

#

The error is:

cannot import name '_version' from partially initialized module 'matplotlib'
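This error means Python started importing `matplotlib` and hit a circular import partway through. A frequent cause (an assumption here, since the thread doesn't say what fixed it) is a file or folder in your own project named `matplotlib.py` / `matplotlib` shadowing the installed package; a broken or mixed install is another. A quick sketch using `importlib.util.find_spec` to see which file Python would load; `locate_module` and `looks_shadowed` are hypothetical helper names:

```py
import importlib.util
import os


def locate_module(name):
    """Return the file Python would load for `name`, without actually importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def looks_shadowed(name, project_dir="."):
    """True if `name` resolves to a file inside project_dir, e.g. a stray matplotlib.py."""
    origin = locate_module(name)
    if not origin or not os.path.exists(origin):
        return False  # not installed, built-in, or namespace package
    project = os.path.realpath(project_dir)
    return os.path.realpath(origin).startswith(project + os.sep)


print(locate_module("matplotlib"))   # should point into site-packages, not your project
print(looks_shadowed("matplotlib"))  # True suggests a local file is hiding the real package
```

If the path points into your project, renaming that file (and deleting any leftover `__pycache__`) is the usual fix; if it points into site-packages, reinstalling matplotlib is worth trying next.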
tidal bough
cinder charm
compact valley
#

Is data engineering part of data science or it should be another channel?

delicate apex
#

!rule ad
also, nice job disclosing your conflict of interest there

arctic wedgeBOT
#

6. Do not post unapproved advertising.

left tartan
#

Please DM modmail and explain why this ad was posted.

neat violet
#

@left tartan for helping students

left tartan
neat violet
#

No

left tartan
neat violet
#

Okay

thorny geode
#

hey, is the Stat110 course really that important in data science?

#

I've read through Bayesian statistics and I feel it's not used much compared to just learning about t-tests

alpine aspen
alpine aspen
thorny geode
wooden sail
alpine aspen
#

That's a great bit of feedback - wasn't aware of that aspect of np.einsum

wooden sail
#

yep. especially for your example of multihead attention, you might find that einsum without optimization is slower than just using a for loop with regular multiplications

#

and then if you look into the automatic broadcasting behavior of @ and .dot, one of the two (@, i.e. np.matmul) should already do slicewise matmul, and this will definitely be faster than einsum without the opt

#

but yeah, einsum is the best thing since sliced bread and everyone should use it 😌
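To make the slicewise point concrete for multi-head attention scores: `@` (`np.matmul`) treats everything but the last two axes as a batch and multiplies matrix by matrix, whereas `np.dot` on N-d arrays sums over the last axis of the first operand and the second-to-last of the second, without batching. A minimal sketch checking that the einsum and `@` versions agree; the shapes are arbitrary, and which one is faster depends on your shapes and NumPy/BLAS build, so time it (e.g. with `timeit`) rather than assuming:

```py
import numpy as np

rng = np.random.default_rng(0)
batch, heads, seq, d_head = 2, 4, 5, 8
Q = rng.standard_normal((batch, heads, seq, d_head))
K = rng.standard_normal((batch, heads, seq, d_head))

# Per-head attention scores: S[b, h, q, k] = sum_d Q[b, h, q, d] * K[b, h, k, d]
scores_einsum = np.einsum("bhqd,bhkd->bhqk", Q, K)
scores_einsum_opt = np.einsum("bhqd,bhkd->bhqk", Q, K, optimize=True)

# `@` (np.matmul) treats the leading axes as a stack of matrices and multiplies
# slice by slice, so transposing K's last two axes gives the same scores
scores_matmul = Q @ K.swapaxes(-1, -2)

print(np.allclose(scores_einsum, scores_matmul))      # True
print(np.allclose(scores_einsum_opt, scores_matmul))  # True
# np.dot does not batch: it pairs every slice of Q with every slice of K
print(np.dot(Q, K.swapaxes(-1, -2)).shape)            # (2, 4, 5, 2, 4, 5)
```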

alpine aspen
#

well now i'm doubting that. i think what this means is i need to sit down and read the implementation for this optimization stuff

wooden sail
#

heh

#

you can also drive the point home by saying that pytorch, tf, and jax all have einsum too

alpine aspen
#

i mention it briefly at the start but it wouldn't hurt repeating it

#

there's also C vs FORTRAN order and things like that that can matter quite a bit

wooden sail
#

that's exactly why the optimizer is important

#

the memory layout changes how fast the multiplication is depending on the order

alpine aspen
#

is it optimizing more for coherence in memory access, or for space complexity stuff with intermediate allocations

#

or both

#

i guess i just need to read it
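On what the optimizer actually does: as far as I know, `optimize` in `np.einsum` mainly chooses the order of pairwise contractions (minimizing FLOPs while capping how large intermediates may get) and can hand pairwise steps to BLAS-backed routines. `np.einsum_path` prints the plan it picks. A sketch with a three-matrix chain where the order matters a lot (shapes are arbitrary):

```py
import numpy as np

rng = np.random.default_rng(0)
# A chain where contraction order matters: A @ B is tiny (10x10), B @ C is huge (1000x1000)
A = rng.standard_normal((10, 1000))
B = rng.standard_normal((1000, 10))
C = rng.standard_normal((10, 1000))

path, report = np.einsum_path("ij,jk,kl->il", A, B, C, optimize="optimal")
print(path)    # the chosen pairwise contraction order
print(report)  # naive vs optimized FLOP counts and the largest intermediate

# Reuse the precomputed path so einsum doesn't have to search again
result = np.einsum("ij,jk,kl->il", A, B, C, optimize=path)
print(np.allclose(result, (A @ B) @ C))  # True
```

For these shapes the plan should contract A with B first, which avoids ever building the 1000x1000 intermediate that `A @ (B @ C)` would need.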

#

thanks again, heading to bed. i've got a few other interesting blog posts as well if you want to check them out (although the deepfakes one needs another editing pass)

wooden sail
#

here we go, found it in matmul

fleet glade
#

I want to start learning motion detection; I'd like to build something like a motion-tracking fitness app, but I have no knowledge about any of it.
Can anyone guide me on where to start and what to learn, and if possible point me to some Coursera courses on these topics?

alpine aspen
wooden sail
#

for this operation, yes

tawdry sundial
#

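this makes no sense
```py
import torch
from torch import nn

# model, X_train and y are assumed to be defined earlier (not shown in the post)
loss_fn = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

epochs = 5
for i in range(epochs):
    model.train()
    y_pred = model(X_train)
    loss_score = loss_fn(y_pred, y)
    optimizer.zero_grad()  # originally optimizer.grad_zero(), which torch optimizers don't have
    loss_score.backward()
    optimizer.step()
```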

#

error at loss_fn(y_pred, y)

TypeError: 'int' object is not callable

wooden sail
#

it does make sense, what is nn.L1Loss() and what does it return?

#

you probably meant to assign it without the () instead of calling the function in the very first line

tawdry sundial
#

didn't work

#

I am pretty sure that L1Loss is a class

#

yea, doesn't make sense

wooden sail
#

show the error message

tawdry sundial
#

with () or without?

wooden sail
#

with, and show the full traceback
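One plausible cause of this exact `TypeError` (an assumption; the thread never shows the full traceback or what `y` is): PyTorch's L1 loss, in the versions I'm aware of, compares prediction and target shapes by calling `target.size()`. On a `torch.Tensor` that's a method, but on a NumPy array or pandas object `.size` is a plain `int` attribute, so calling it fails with this message. The sketch reproduces that failure without PyTorch; `call_size` is a hypothetical stand-in for the shape check:

```py
import numpy as np


def call_size(target):
    """Hypothetical stand-in for the shape check a loss function performs."""
    return target.size()  # fine on a torch.Tensor, where .size() is a method


y = np.array([0.5, 1.0, 1.5])
print(y.size)  # 3: on a NumPy array, .size is an int attribute, not a method

try:
    call_size(y)
except TypeError as err:
    print(err)  # 'int' object is not callable
```

If that's what is happening, converting the target before the loop (for example `y = torch.as_tensor(y, dtype=torch.float32)`) should make the error go away.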