#data-science-and-ml


sand pivot
#

based on E function

#

and the idea is that, since the readings are just values of E, with some small randomization applied, then i hoped to make a NN that can predict values of E somewhat accurately

#

(i tried expanding the generated readings range, but the same problem remains, as soon as i go out of the training range, error rates just start gradually increasing)

wooden sail
#

that's generally not how neural networks work, sadly 😛 extrapolation and generalization are large challenges, and the more you black box a problem, the more difficult it is to handle those things

#

so yeah, getting worse estimates of E the farther you move from the training data sounds just about right

#

unless you know ahead of time a function that closely approximates the desired behavior and you estimate its parameters

sand pivot
#

hmmmmmm interesting

#

i actually even plotted them and you can see how it goes xD

wooden sail
#

here, if you know for a fact your stuff behaves as some sort of power series, you can try to set up a model something like ax^-3 + bx^-2 + ... + cx^3, and try to find all the coefficients

#

if you really in practice expect to know nothing about E though, there's no easy solution 😛

sand pivot
#

🤣 yeah, that makes sense, and if i think about it from another side, the value of the real function keeps getting closer to zero, so, i guess at some point there will be precision errors to deal with xD

#

on top of all the other hardships you mentioned

#

can you give me a reference for how to do this?
ax^-3 + bx^-2 + ... + cx^3

#

i am honestly still learning about ML, and i guess i could just try out this as another solution?

wooden sail
#

you could try with gradient descent

#

same as with ML, but with a well-motivated model instead of an arbitrary black box

#

using deep learning doesn't always make sense
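A minimal sketch of the suggestion above, assuming you already believe E(x) is a sum of power terms: the coefficients are fitted by plain gradient descent on the mean squared error. The toy target 2/x + 0.5x, the learning rate, and the step count are illustrative assumptions, not values from the thread. Because the model is linear in its coefficients, `np.linalg.lstsq` gives the same answer in closed form, which makes a handy check.

```python
import numpy as np

def design_matrix(x, powers):
    """One column per model term x**p, e.g. powers = [-3, -2, ..., 3]."""
    return np.stack([x ** p for p in powers], axis=1)

def fit_power_series_gd(x, y, powers, lr=0.1, steps=20000):
    """Fit y ~ sum_k c_k * x**powers[k] by gradient descent on the mean squared error."""
    x = np.asarray(x, dtype=float)  # floats, so negative powers are allowed
    y = np.asarray(y, dtype=float)
    X = design_matrix(x, powers)
    # Rescale each column to max |value| = 1 so one learning rate suits every term
    scale = np.abs(X).max(axis=0)
    Xs = X / scale
    coef = np.zeros(len(powers))
    for _ in range(steps):
        residual = Xs @ coef - y
        grad = 2.0 / len(y) * (Xs.T @ residual)  # gradient of the mean squared error
        coef -= lr * grad
    # Undo the column scaling so the coefficients apply to the raw x**p terms
    return coef / scale

# Toy data: pretend E(x) is known to have the form a/x + b*x (assumption)
x = np.linspace(1.0, 4.0, 50)
y = 2.0 / x + 0.5 * x
coef = fit_power_series_gd(x, y, powers=[-1, 1])

# Closed-form check: least squares on the same design matrix
exact, *_ = np.linalg.lstsq(design_matrix(x, [-1, 1]), y, rcond=None)
print(coef, exact)
```

With many terms (say powers -3 to 3) the columns become strongly correlated, so plain gradient descent may need far more steps; the least-squares solve does not have that problem. The fitted formula can then be evaluated outside the training range, but only if the assumed form is actually right.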

sand pivot
#

interesting, thanks a lot for all of the tips ❤️

lapis sequoia
#

@wooden sail are you really pro in ai?

#

i just want to ask what is the difference between ml and dl

wooden sail
#

i would say no. the difference is rather arbitrary, and neither of the two terms is well defined. they're kinda buzz words

lapis sequoia
wooden sail
#

machine learning refers very generally to many kinds of optimization problems solved through computing. deep learning refers to doing so with deep neural networks, but what "deep" is isn't well defined

lapis sequoia
#

and i was wondering if someone can help me understand what ai really is

wooden sail
#

it's a made up word. everywhere you look you'll find some different definition

lapis sequoia
#

so the main role of it is a part of machine learning?

wooden sail
#

what it originally referred to was what is now called "artificial general intelligence", which is the ability to solve and interact with a large variety of problems

lapis sequoia
#

now another question, what are the materials of ai?

#

principal components

wooden sail
#

ai is the broadest of these umbrella terms, it includes too much stuff to list

lapis sequoia
wooden sail
#

you could make a case for very nested if-else clauses being AI 😛

lapis sequoia
#

definition, models, materials, what is ann, what is dl, what is ml

#

i understood the definition and ml

#

materials and dl i cant figure them out?

wooden sail
#

dl is a specific kind of ann. "materials" is too generic a word without further context

lapis sequoia
lapis sequoia
#

and dl is part of ml?

wooden sail
#

i don't think making these distinctions is important at all, but sure

lapis sequoia
wooden sail
#

this seems about right, but the precise definitions are arbitrary

#

you could set ANNs between ML and DL

lapis sequoia
#

so its like this?

wooden sail
#

yeah

lapis sequoia
#

nice

wooden sail
#

though technically all deep networks are artificial neural networks

#

so maybe DL nested in ANN instead

lapis sequoia
#

ohh alright

wooden sail
#

trying to draw such hard lines is a fruitless endeavor imo

lapis sequoia
#

ANN is a neural network, which is the basis for DL and most of ML, they are mathematical constructs that mimic the brain's neuron activation. this is the definition?

wooden sail
#

leads to more confusion rather than helping

#

i would say ANN being interpreted that way is also historical artefact and hurts you more than it helps

lapis sequoia
wooden sail
#

that's not how the brain works anyway

lapis sequoia
#

like a lot

#

guess imma head to chat gpt and copy everything from there 🙂

#

but thank you anyway

wooden sail
#

that's a horrible idea too

lapis sequoia
wooden sail
#

oh boy. oh well, good luck

lapis sequoia
#

this is the real definition?

#

of ann?

wooden sail
#

that's not a definition, but a pictorial representation of a common architecture (not the only possible one)
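The architecture usually drawn in such pictures is a fully connected feed-forward network (a multilayer perceptron): each layer is a matrix multiply plus a bias, followed by a nonlinearity. A minimal NumPy forward pass, where the layer sizes, the ReLU, and the random weights are arbitrary illustrative choices:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected network: h -> relu(h @ W + b), no ReLU on the output layer."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # 3 inputs, one hidden layer of 4 units, 2 outputs
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

batch = rng.standard_normal((5, 3))  # 5 example inputs
out = mlp_forward(batch, weights, biases)
print(out.shape)
```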

potent sky
# wooden sail machine learning refers very generally to many kinds of optimization problems so...

deep learning refers to doing so with deep neural networks...

not-useful-fun-discussion time!
There's a bit of contention there. Many parties assert that the Deep in Deep Learning refers to the learning principle of extracting information (learning) from deep hierarchical representations of data or learning techniques that involve multiple levels of composition, where more "complex" representations can be derived from a composition of simpler representations.
Per this view, deep learning is not necessarily restricted to neural networks. Any machine learning technique that uses the principle of deep hierarchies might qualify, and need not be neurally inspired.
For example decision trees, Gradient boosting, random forests can be deep, and not be neurally inspired, while taking advantage of the deep learning principles of hierarchies of compositions.

That said, it is also argued that, per the Universal Approximation Theorem, all these other methods (decision trees, grad boost etc.) can also be formulated as forms of artificial neural nets, and so are subsumed under that category.
Which would mean that deep learning is after all restricted to neural networks xD

Also, these points are not very helpful. For all practical purposes in industry and most of academia, today, deep learning is taken to mean learning with artificial neural networks.
Edd is right on all of their points, I just wanted this discussion here and to see if anyone would like to add something 👀

wooden sail
#

i would just add from my side that i generally think of a "layer" as any function to be freely composed with others 😛
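A toy illustration of that view: if a "layer" is any function, a model is just a composition of them. The three functions here are arbitrary examples; any of them could be swapped for a tree, a lookup table, or a matrix multiply.

```python
from functools import reduce

def compose(*layers):
    """Return a function that applies `layers` left to right."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)

def scale(x):
    return [2 * v for v in x]

def relu(x):
    return [max(0, v) for v in x]

def total(x):
    return sum(x)

model = compose(scale, relu, total)
print(model([-1, 2, 3]))
```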

potent sky
wooden sail
#

"Any machine learning technique that uses the principle of deep hierarchies might qualify, and need not be neurally inspired."

#

just saying i also had something somewhat more general in mind when i was talking about deep learning above

potent sky
#

ohh yes yes makes sense. I just wanted to put this out here and see what interesting things people come up with xd

wooden sail
#

do you think there's much merit to trying to cleanly sort things out into categories?

#

just from your personal standpoint

potent sky
#

Not immediately.

Also, these points are not very helpful. For all practical purposes in industry and most of academia, today, deep learning is taken to mean learning with artificial neural networks.

But I do think it's useful in a sense to know the discussion around these contentions, because it might help broaden the horizons of our thinking process / research etc.
I mean if I just thought that deep learning = deep neural networks, I might not stumble upon some interesting ideas that I might with
"deep learning is not necessarily deep neural networks" somewhere in the back of my mind.

I think it adds useful perspective to be aware of the discussion somewhere in there, rather than some standardized definition of clean categories

#

I'm not sure how clear I was able to make it lol xd

#

tldr I think the discussion can be valuable, more than an "answer" of what is the "right" categorization

wooden sail
#

i get the feel, yeah

potent sky
#

we stand on the shoulders of giants to advance....maybe learning is high, not deep xD

#

high-learning

ancient fossil
#

Lol yeah this took me way too long to figure out. Wasn't even aware that the forward passes in the Siamese wrapper were independent till recently, as in like after asking about the training thing

verbal oar
#

deep learning is just buzzword

#

I mean term

#

not as subfield

#

should be named as other name

#

hmm and also this
machine learning is a subfield of AI, but artificial intelligence has no learning, rather rules or something like that
while machine learning has no intelligence, only learning

#

AI, machine learning are buzzwords too

#

can be named as misnomers

#

but people just accepted as it is

#

so why machine learning is subfield of AI

#

this is rather philosopical topic

#

or maybe its because intelligence contains learning

#

ok reinforcement learning is like intelligence and learning

#

but in supervised and unsupervised learning there is no intelligence

#

and why rl is intelligence and learning because it has intelligent agents, multiagent systems etc

#

but this is only my opinion

#

additionally, expert systems have intelligence but they don't learn

#

I'm paraphrasing a book a little

#

about intelligence and learning in the case of AI and machine learning, I found it in a Rust projects book from Apress

#

and there he is right I think

#

in rust projects there is chapter about ai, machine learning

#

I read only this chapter

#

and deep learning is because it is related to neural networks with deep layers

#

so deep neural networks

#

by machine learning I consider only classic machine learning

#

maybe shallow networks too

past meteor
potent sky
lapis sequoia
#

which reinforcement learning algorithm is recommended

#

I'm making a Google snake AI but plan to make AI for more complex mobile games in the future

gloomy parrot
#

hello, has anyone here tried to deploy an AWS Lambda function that needs PyTorch? My issue is the size limit.

agile cobalt
#

that sort of thing really doesn't work well in lambda functions; try a different deployment method like EC2, or even a different provider like HuggingFace

small wedge
# verbal oar so why machine learning is subfield of AI

AI is any program where an agent makes decisions based on some input. ML is any time an agent can get better at a task by itself. AI is a broader category because it includes all of ML as well as things that don't qualify as ML like graph search algorithms.

harsh minnow
#

Do I need to train an AI model? Fine-tune GPT-3.5, train open-source models, or use GPT-4?

I am trying to build an AI assistant app that can tailor its responses to users in different countries. It needs to sound very natural. I first ask some questions to get user data and generate responses about a specific topic for that user's country.

Now I want to train the AI on new conversational data, but when I fine-tune a model, it generates nonsense. I tried fine-tuning GPT-3.5, but it gave inaccurate info (GPT-4 is better). GPT-3 (4096 tokens) can't generate long enough responses, so I plan to call it multiple times (it's context-aware). I want the model to be smart - ask for country info and make decisions. It should output in different formats based on user needs. Each user's needs are different, so the model must analyze and generate tailored responses.

What is the best approach? How can I make a smart AI that knows each country's info and is accurate? I need a model that is smart, accurate, long context and properly trained.

agile cobalt
lapis sequoia
#

No, but I found it interesting to see what critics have to say about movies.

#

"manually"
ELI5: How do you do that?

past meteor
lapis sequoia
#

Good idea. Thank you

past meteor
lapis sequoia
#

That way I can hunt for any finicky formatting edgecases.

past meteor
#

Display.max_rows then?

past meteor
ancient fossil
gloomy parrot
lapis sequoia
#

hi people

#

i need some help with fastapi

#

ig because there are not good resources for fastapi can anyone help

spark nimbus
#

Is there an efficient way to get slices of a series so that I can iterate over s[:1], s[:2], ..., s[:n]?
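Nobody answered this in the thread, so here is a minimal sketch, assuming a plain `pd.Series` with a default index: `s.iloc[:i]` gives each prefix by position, but if what you ultimately need is a statistic of every prefix (sum, mean, max...), the cumulative and `expanding()` methods compute all of them in a single pass instead of re-scanning each slice.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])

# Prefix slices s[:1], s[:2], ..., s[:n] taken by position
prefixes = [s.iloc[:i] for i in range(1, len(s) + 1)]

# Re-scanning every prefix is O(n^2) overall...
prefix_sums = [p.sum() for p in prefixes]

# ...while cumulative / expanding methods give the same per-prefix numbers in one pass
running_sum = s.cumsum()
running_max = s.cummax()
running_mean = s.expanding().mean()
print(running_mean.tolist())
```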

viscid wedge
#

how do i decide if i should buy one expensive gpu or two cheaper ones? it's for pytorch / kaggle

spark nimbus
#

First thing to check would be if your current motherboard even supports two GPUs

#

Even if it has two x16 PCIE slots, some boards simply don't support it

mighty patio
#

What do you have now?
If you currently have no gpu I suggest you only get a single cheap gpu.
Only once you have tested your model can you evaluate if faster training will make any significant impact on performance metrics, or if there are other factors holding your model back

vestal spruce
#

Is it ok/a good idea for undergrad to dive into the field of speech recognition?

lapis sequoia
#

obviously anything that seems interesting to you is your best choice

#

speech recognition, art style recognition, a writing style recognition system: whatever is interesting to you and whatever you care about is your best bet for getting good at it.

#

i started with password generator in Python

vestal spruce
potent sky
#

Of course, make sure it's something you're genuinely interested in

vestal spruce
# potent sky I think it's fair game for a final thesis Plus you should get a better idea in t...

Well, I've been listening to a lot of podcasts from my local podcast service provider, and I was wondering if it's possible to transcribe these podcasts into text which could later be used as timestamp labels for the podcast discussion. It's a far-fetched idea since I don't actually have many ties with them (the local podcast service provider), but I'm hoping that building speaker diarisation to produce a transcription structured in a dialogue format would be a stepping stone toward this goal. I already found quite a few plugins to support developing this project, I just need to tweak them to the data that I have. Wish me luck in this endeavour. ^^

shy carbon
#

why is it such a pain to install pytorch3d

#

could someone help me install pytorch3d on windows please ?

#

after installing cuda 11.8, pytorch, I'm trying to install pytorch3d from the source

#

I have lots of errors

#

first I had to download ninja, then I had to set up my dev environment so pytorch3d finds cl.exe

#

and now I have a cryptic error : "file not found" in build_ext

slender kestrel
# past meteor I'd read that docs page carefully and see if there's anything in there that solv...

hey! sorry about the ping. 1st of all, thank you for suggesting the book ISL, it was way easier to read the book with the videos. i wanna ask about Pattern Recognition and Machine Learning: 1st, is the book worth reading? 2nd, if yes, do you have any idea where i can find video explanations of that book? 3rd, which of the 2 books would be more beneficial, Statistical Rethinking or Pattern Recognition and Machine Learning? and do you have any suggestions for a deep learning book too? ofc with video explanations, i really need video explanations, they make reading a whole lot easier lol

past meteor
#

I think after reading something like ISL you need to "do" something with it before jumping to the next book.

#

On the topic of books with videos - I've mentioned it a few times. I don't like videos, they give you a false sense of "hey I understand this" when it's not the case. They're risky business. Reading does that as well, but less. In terms of "understanding" it definitely goes like this: videos => reading => implementation.

If you're going to do PRML I suggest you don't actually watch any videos and you implement (some of) the algorithms.

past meteor
charred canyon
#

Can anyone help me with easy project guidelines? And is there a way to calculate it for me using the Python language?

ionic badge
#

hi all, just wondering if anyone has run into /lib/python3.8/multiprocessing/connection.py", in _recv raise EOFError while trying to do inference using a language model?

kindred isle
#

What's the difference between ravel and values in pandas?

#

why use both here?

charred canyon
#

@kindred isle
The image you sent shows a Python script that counts the number of records, missing values, and unique values in each column of a pandas DataFrame. It uses values and ravel for different purposes.
values is an attribute: it returns the DataFrame's (or Series') underlying data as a NumPy array with the original shape, so a DataFrame gives a 2-D array.
ravel() is a method, not a property: called on a NumPy array it returns a flattened 1-D view (or copy) of the data, which makes it easy to count things such as NaN values across every cell at once.
Overall, values gets you the raw array while preserving its shape, and ravel() flattens it.

I hope it helps you. by Bard
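Since the screenshot is not available, here is a small hypothetical DataFrame showing the difference concretely: `.values` (or the recommended `.to_numpy()`) keeps the 2-D shape, `.ravel()` flattens it to 1-D, and per-column counts are usually easier with the pandas methods that keep column labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

arr = df.values          # 2-D NumPy array with the same shape as df: (3, 2)
flat = arr.ravel()       # 1-D array of all 6 cells, row by row

total_missing = int(np.isnan(flat).sum())  # NaNs across the whole frame
missing_per_column = df.isna().sum()       # per-column counts, labels kept
unique_per_column = df.nunique()           # NaN is not counted by default

print(arr.shape, flat.shape, total_missing)
print(missing_per_column.to_dict(), unique_per_column.to_dict())
```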

past meteor
slender kestrel
#

also what are your views on pattern recognition and machine learning

bitter raptor
#

Task about lakes
Generate 20 random variable from 1 to 100
Draw the plot of the sequence
Lets assume each point represent the height and so all plotting is 2d mountains.
Then consider the unlimited rain from above - cavities become lakes full of water.
Determine the deepest lake

The question is, how can I see this zone on the chart? (now on the screenshot it is bright purple, hand-drawn)
How can I do this on a chart?
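A minimal sketch, assuming each random value is a 1-unit-wide column (so the terrain is drawn as bars rather than the line in the screenshot) and that "deepest lake" means the lake containing the largest water depth. The water surface above column i is min(highest column to its left, highest column to its right), the classic "trapping rain water" idea; consecutive flooded columns form one lake. `plot_lakes` is a hypothetical helper that draws terrain, water, and the deepest lake in a darker colour; matplotlib is imported only inside it.

```python
import numpy as np

def water_levels(heights):
    """Water surface above each column after unlimited rain."""
    h = np.asarray(heights, dtype=float)
    left_max = np.maximum.accumulate(h)               # highest column to the left (inclusive)
    right_max = np.maximum.accumulate(h[::-1])[::-1]  # highest column to the right (inclusive)
    return np.minimum(left_max, right_max)

def deepest_lake(heights):
    """Return (levels, (max_depth, start, end)) for the deepest lake, or (levels, None) if dry."""
    h = np.asarray(heights, dtype=float)
    level = water_levels(h)
    depth = level - h
    best = None
    i, n = 0, len(h)
    while i < n:
        if depth[i] > 0:
            j = i
            while j + 1 < n and depth[j + 1] > 0:  # extend over the flooded run
                j += 1
            lake_depth = float(depth[i:j + 1].max())
            if best is None or lake_depth > best[0]:
                best = (lake_depth, i, j)
            i = j + 1
        else:
            i += 1
    return level, best

def plot_lakes(heights):
    import matplotlib.pyplot as plt
    h = np.asarray(heights, dtype=float)
    level, best = deepest_lake(h)
    depth = level - h
    x = np.arange(len(h))
    plt.bar(x, h, width=1.0, color="saddlebrown", label="terrain")
    plt.bar(x, depth, bottom=h, width=1.0, color="lightblue", label="water")
    if best is not None:
        _, start, end = best
        sl = slice(start, end + 1)
        plt.bar(x[sl], depth[sl], bottom=h[sl], width=1.0, color="royalblue", label="deepest lake")
    plt.legend()
    plt.show()
```

For the task as stated, `heights = np.random.default_rng().integers(1, 101, size=20)` generates 20 values from 1 to 100 inclusive; then call `plot_lakes(heights)`.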

past meteor
past meteor
slender kestrel
slender kestrel
past meteor
slender kestrel
plucky ivy
slender kestrel
slender kestrel
past meteor
ionic badge
#

I have a V100 GPU and I've set up cudatoolkit in an Anaconda environment; however, whenever I run any kind of inference, irrespective of the model, I get the following: CUDA error 209 no kernel image is available for execution on the device

#

any clue why this might be happening?

warped osprey
#

why does hypergeom.pmf give me nan if I try to calculate the pmf there, but i get a value if I use the raw formula?

delicate apex
#

!d scipy.stats.hypergeom

arctic wedgeBOT
#

scipy.stats.hypergeom = <scipy.stats._discrete_distns.hypergeom_gen object>
A hypergeometric discrete random variable.

The hypergeometric distribution models drawing objects from a bin. *M* is the total number of objects, *n* is total number of Type I objects. The random variate represents the number of Type I objects in *N* drawn without replacement from the total population.

As an instance of the [`rv_discrete`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.rv_discrete.html#scipy.stats.rv_discrete) class, [`hypergeom`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.hypergeom.html#scipy.stats.hypergeom) object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.

See also

[`nhypergeom`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.nhypergeom.html#scipy.stats.nhypergeom), [`binom`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.binom.html#scipy.stats.binom), [`nbinom`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.nbinom.html#scipy.stats.nbinom)   Notes

The symbols used to denote the shape parameters (*M*, *n*, and *N*) are not universally accepted. See the Examples for a clarification of the definitions used here.
delicate apex
warped osprey
#

ahh that makes sense

#

my prob theory textbook and wikipedia both used n = draws, M = success objects, N = total so i figured it was standard

#

!e ```py
from scipy.stats import hypergeom
print(hypergeom.pmf(3, 1696, 64, 10))
```

arctic wedgeBOT
#

@warped osprey :white_check_mark: Your 3.12 eval job has completed with return code 0.

0.004762390442527913
warped osprey
#

i think it's that

#

pmf(expected, total_objects, valid_objects, draws)
total -> valid -> draws looks like it goes biggest to smallest

delicate apex
#

right, i ran the equation as is: there were 1696 good objects out of 64 total, which also doesn't make sense

warped osprey
#

ye ye

delicate apex
#

this is one of the many cases where non-descriptive variable names are not a good idea

#

bad scipy bonk
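To make the parameter order concrete: SciPy's signature is `hypergeom.pmf(k, M, n, N)` with `M` the population size, `n` the number of success objects, and `N` the number of draws, which differs from the textbook convention mentioned above. The sketch below uses descriptive variable names, checks SciPy against the raw formula built from `math.comb`, and shows that swapping `M` and `n` gives an invalid parameter set (more successes than objects), for which SciPy returns `nan`.

```python
from math import comb, isclose, isnan

from scipy.stats import hypergeom

total, successes, draws, k = 1696, 64, 10, 3

# SciPy's order: pmf(k, M=population size, n=successes in population, N=draws)
p_scipy = hypergeom.pmf(k, total, successes, draws)

# Textbook formula C(K, k) * C(N - K, n - k) / C(N, n), with descriptive names
p_formula = comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)

# Population size and successes swapped: 1696 successes among 64 objects is invalid
p_swapped = hypergeom.pmf(k, successes, total, draws)

print(p_scipy, p_formula, p_swapped)
```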

latent quarry
#

Can anyone tell me how to calculate heuristic value in an A* search algorithm
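In A*, the heuristic h(n) is an estimate of the remaining cost from node n to the goal, and nodes are expanded in order of f(n) = g(n) + h(n), where g(n) is the cost paid so far. As long as h never overestimates the true remaining cost (it is admissible), A* returns a shortest path. A minimal sketch for a 4-connected grid with unit step costs, where the Manhattan distance is a standard admissible choice (Euclidean distance suits free movement, and Chebyshev or octile distance suits 8-connected grids); the grid encoding with `#` as walls is a hypothetical example.

```python
import heapq

def manhattan(a, b):
    """Admissible heuristic for unit-cost moves up/down/left/right."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(grid, start, goal):
    """Return the length of a shortest path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}  # best known cost from start
    open_heap = [(manhattan(start, goal), 0, start)]  # entries: (f = g + h, g, node)
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            return cost
        if cost > g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to node was found later
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] != "#":
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    heapq.heappush(open_heap, (new_cost + manhattan(nxt, goal), new_cost, nxt))
    return None

grid = [
    "....",
    ".##.",
    "....",
]
print(a_star(grid, (0, 0), (2, 3)))
```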

slim reef
#

Hello! I am quite "new" to using AIs in python, but my current project uses a bunch.

Quick rundown: on my laptop I have Speech To Text code, and I'd like its output to be sent to an NLP model that filters out any words that aren't names of cities and discards them, basically only letting city names continue.

After that, the text the NLP gives, which will hopefully only be city names, will be sent to a Raspberry Pi running code that gives weather info about any city. It sends the text there with an HTTP POST request, and I still don't really know if it works 100%, despite having no errors.

Then, after the Pi gives the weather info, it sends it back to my laptop into Text To Speech code to read it out loud.

I have a working STT, Pi and TTS code, but no proper way to link them, plus i still need to make the NLP code, which is what im trying to focus on right now but i dont know much of or how it works, or which NLPs i can use etc. If anyone can help, let me know

#

any help appreciated

#

update, looking online, maybe something like NER would work better? i guess it's a branch of NLP

desert oar
slim reef
desert oar
slim reef
vestal spruce
#

Is sliding window and frame segmentation a similar audio segmentation method?

#

I'm trying to understand how VAD and Speaker Embedding (d-vector extraction) works.
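On the first question: frame segmentation is usually done with a sliding window, i.e. fixed-length frames taken every `hop` samples, overlapping whenever the hop is shorter than the frame. The sketch below frames a signal that way and runs a deliberately simple energy-threshold VAD on the frames. The 25 ms / 10 ms sizes and the -40 dB threshold are assumptions chosen for illustration, and real VADs such as webrtcvad use more robust models than a single energy threshold.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames (rows) of frame_len samples, starting hop samples apart."""
    x = np.asarray(x)
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row k holds sample indices k*hop ... k*hop + frame_len - 1
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

def energy_vad(x, sr, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Flag each frame as speech (True) if its mean energy exceeds threshold_db (dB re amplitude 1.0)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    energy_db = 10 * np.log10(energy + 1e-12)  # small offset avoids log10(0) on silence
    return energy_db > threshold_db

# One second of silence followed by one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
flags = energy_vad(signal, sr)
print(len(flags), flags[:3], flags[-3:])
```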

flint shale
#

Hey guys, I'm trying to make a histogram but it doesn't show correctly

#

Here's my Series

#

and I want to make it like this

#

but it only shows frequency

echo mesa
#

What kinda books would you guys recommend for understanding machine learning and data science, I know that there are a bunch out there already downloaded a few but I thought I'd ask here as well.

scarlet gulch
#

hi folks, here's something for any pandas expert/insider (let me know if there's a more focussed community for that): i've been flummoxed by something with pandas, in particular its df.replace() method for to_replace=<dict>. from the pandas documentation for replace(regex=<bool>):

Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.

Which means to_replace cannot be a dict when regex is True. However, in all my tests, something like df.replace(to_replace=nested_dict, regex=True) always works. Is the definition/documentation then wrong?
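A quick way to check the observed behaviour on a toy frame (the data is made up): pandas' own examples for `replace` combine dict forms with regular expressions, so the sentence quoted above appears to be stricter than what the method actually accepts. Both the nested-dict form with `regex=True` and the `regex=<dict>` form are exercised here.

```python
import pandas as pd

df = pd.DataFrame({"A": ["bat", "foo", "bait"], "B": ["abc", "bar", "xyz"]})

# Nested dict: only column "A", pattern -> replacement, interpreted as regex
r1 = df.replace(to_replace={"A": {r"^ba.$": "new"}}, regex=True)

# Dict passed as `regex` (to_replace left as None): applied to every column
r2 = df.replace(regex={r"^ba.$": "new", "foo": "xyz"})

print(r1)
print(r2)
```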

lavish kraken
#

Is anyone here very good at XAI?

serene scaffold
lavish kraken
serene scaffold
#

what is the question?

lavish kraken
#

Can't you see i am asking question?

serene scaffold
#

you said "Is anyone that is very good at XAI" and posted a screenshot, but the screenshot doesn't tell us what you need help with

lavish kraken
#

i don't know if that's clear enough...

serene scaffold
#

Now you're getting somewhere. But none of this information can be gathered from the screenshot.

lavish kraken
#

Don't know if anyone in this channel ever done this before

dense sluice
#

Whatup? I am new to using AI and CV tools! I have an OpenVINO program for facial reidentification, based on an OpenVINO sample. How would you store the identified faces? Some options:

  • Save the face images; this is how it's set up in the demo by default. Then OpenVINO loads it into description vectors.
  • Save the description vectors (in memory as numpy arrays) in a custom binary format, with x-endian 32-bit floats
  • Save the description vectors in an SQLite database, serialized as text; there are some popular base-64 serialization formats IIRC
  • Something else

Thoughts? TY!
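One possible take on the "something else" option, sketched under assumptions: store each description vector as a raw binary BLOB in SQLite instead of base64 text. `ndarray.tobytes()` and `np.frombuffer()` round-trip the floats exactly, and fixing the dtype to little-endian float32 (`'<f4'`) settles the endianness question explicitly. The table and column names and the 256-element vector are hypothetical.

```python
import sqlite3

import numpy as np

DTYPE = np.dtype("<f4")  # little-endian float32, fixed explicitly so stored bytes are portable

def save_embedding(conn, person_id, vec):
    """Store one description vector as raw bytes (4 bytes per float)."""
    blob = np.asarray(vec, dtype=DTYPE).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO faces (person_id, embedding) VALUES (?, ?)",
        (person_id, blob),
    )

def load_embeddings(conn):
    """Return {person_id: vector} for every stored face."""
    rows = conn.execute("SELECT person_id, embedding FROM faces").fetchall()
    return {pid: np.frombuffer(blob, dtype=DTYPE) for pid, blob in rows}

conn = sqlite3.connect(":memory:")  # use a file path to persist between runs
conn.execute("CREATE TABLE faces (person_id TEXT PRIMARY KEY, embedding BLOB NOT NULL)")
save_embedding(conn, "alice", np.arange(256, dtype=np.float32) / 256)
conn.commit()

stored = load_embeddings(conn)
print(stored["alice"][:4])
```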

urban knoll
#

Does anyone know of any "smart" VAD (voice activity detection) modules that are able to differentiate between: [i] a pause in speech that indicates a person still wants to talk, to finish a thought or what have you, and [ii] a pause in speech that indicates a person is done talking? If there is no such thing, are there modules or APIs I can use in combination with a VAD module like webrtcvad to get what I'm looking for?

shut girder
#

Hello, I am a beginner to data analysis, and I'm wondering how I should handle missing values when cleaning data.

austere swift
#

That depends on a lot of factors

#

If the amount of missing values is very little compared to the whole dataset, just drop them

#

If they’re more than a very small portion but not more than like 30% you could get away with filling them using means/medians/etc

#

If they’re a LOT of missing values and the feature isn’t too important to the data then just drop the feature entirely

#

And if they’re a lot of missing values and they are important to the data, get better data lol

#

There’s a lot of discretion involved with that as well

#

There can always be better data, and more data is better 99% of the time, but you have to figure out what’s “good enough”
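The rules of thumb above, sketched per column in pandas. The 5% and 30% cut-offs are assumptions (the 30% comes from the message above, the 5% stands in for "very little"), median/mode filling is just one imputation choice, and the code cannot judge whether a feature is important, so a column it drops may be one you actually need better data for.

```python
import numpy as np
import pandas as pd

def handle_missing(df, drop_rows_below=0.05, impute_below=0.30):
    """Per column: drop rows if few values are missing, impute if moderate, drop the column if many."""
    out = df.copy()
    fractions = df.isna().mean()  # share of missing values per column, from the original data
    for col in df.columns:
        frac = fractions[col]
        if frac == 0:
            continue
        if frac < drop_rows_below:
            out = out.dropna(subset=[col])  # very few missing: drop those rows
        elif frac <= impute_below:
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
        else:
            out = out.drop(columns=col)  # mostly missing: drop the feature
    return out

df = pd.DataFrame({
    "a": [np.nan] + list(range(1, 25)),                 # 1/25 missing -> drop that row
    "b": [np.nan] * 5 + [1.0] * 10 + [3.0] * 10,        # 5/25 missing -> median fill
    "c": [np.nan] * 20 + [1.0] * 5,                     # 20/25 missing -> drop column
})
cleaned = handle_missing(df)
print(cleaned.shape)
```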

upper jewel
#

hello, i want to try making stable diffusion models but give users the ability to train their models with selected images.
i can get keras stable diffusion to work and generate images, but i can't figure out how to train it.
i would like it to be similar to dreambooth/lora for efficiency.

the problem is i am not able to find this being done or idk how to go about this.
i would like to be able build on top of existing weights instead of retraining from scratch due to the price of training(150k training hours).

cunning agate
#

hello guys i have a dataset i want to work on it

#

could someone help me

#

it's a real dataset a banking one

#

There are null values how can I deal with them in this case

#

Like do imputation doesn’t work

slim reef
#

trying to use spaCy, but i keep getting this error:
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
Looking online, i already did the steps i found, and i'm still getting the error. any help?

serene scaffold
serene scaffold
# cunning agate Like do imputation doesn’t work

When asking for help, force yourself to describe the problem without saying "doesn't work". Because saying that something "doesn't work" isn't very informative. Is there a reason you can't use imputation? If you can use imputation, and you tried it, what happened that "didn't work"?

crude pilot
#

What would be a good way to train a word2vec model to learn synonyms, i.e. words that are often seen in the same context (around the same words) but rarely together?

#

I am new to this but as far as I understand, a skipgram model will make words seen together closer, but not necessarily synonymous

#

for instance "chocolate is good" and "candies are good" => "good" will be close to "chocolate" and "coffee" but not "chocolate" and "candy"

#

they will probably be relatively close, but you'd expect synonyms to be very close

#

if the model never sees "candies are chocolate" it won't be as good as it should

serene scaffold
#

@crude pilot if the model knows that two words are synonyms, how should that knowledge be represented in the model? by the vectors for synonyms a and b having a low cosine distance to each other?

crude pilot
#

a very low distance I think

#

but as far as I understand skipgram, "chocolate candies" could be a negative example

#

so the model could even learn to make them not too close cause they are never in the same sentence

#

like "chocolate" will be closer to "good" than to "candies"

#

I feel like I could somehow solve that by augmenting the dataset in a certain way, or doing 2 passes

#

like trying to detect words that have the same neighbours but are not close together

#

and make positive samples out of that

#

I know there are models around specific to synonyms but I specifically seek an unsupervised approach + conceptually I'd like to get word2vec right

serene scaffold
#

@crude pilot I don't think word2vec can account for a word (as in, an exact string) having more than one sense, which means that this problem can't really be solved, I don't think. For example, "begin" and "start" are synonyms as verbs, but not as nouns.

#

and then sometimes words with the same part of speech are only synonyms in certain context. Like "help" and "aid".

#

all this is to say that "true synonyms" aren't that much of a thing.

crude pilot
#

this unsupervised work then feeds a few-shot supervised binary classifier

crude pilot
#

basically if I rephrase it's like seeking a kind of transitivity

#

I feel like word2vec will make close words that are seen together but not necessarily words that are used as replacement from one another

#

(I see a lot of models at work but only started practicing recently so it's kinda new to me, perhaps there are other known models that do that or maybe even bag of words is better)
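One way to make "same neighbours, rarely together" explicit is second-order, count-based similarity: represent each word by the counts of the words around it and compare those context vectors, so two words that never co-occur can still be close. A minimal sketch on a made-up three-sentence corpus, with whole-sentence windows by default. This is not word2vec, though skip-gram embeddings tend to capture a similar signal, because words that predict the same contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=None):
    """Map each word to a Counter of the words seen around it (whole sentence if window is None)."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo = 0 if window is None else max(0, i - window)
            hi = len(tokens) if window is None else min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

corpus = [
    "chocolate is sweet and good",
    "candies are sweet and good",
    "coffee is hot and bitter",
]
vecs = context_vectors(corpus)
# "chocolate" and "candies" never share a sentence, yet their contexts overlap heavily
print(cosine(vecs["chocolate"], vecs["candies"]), cosine(vecs["chocolate"], vecs["coffee"]))
```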

slim reef
#

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')

#

used terminal now, should it work?

serene scaffold
slim reef
#

```py
import spacy
spacy.load('en_core_web_sm')
```

that no? id need the import

#

and yes, still same error

#

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

#

maybe because im using Pycharm? but that doesnt make much sense, that it would be PyCharm's fault

#

@serene scaffold still gives error

serene scaffold
#

which is a pretty common trap. fear not.

slim reef
#

ok ok

#

havent heard of that

serene scaffold
#

do you know about virtual environments or no?

slim reef
#

not much, no

serene scaffold
slim reef
#

erm no, just win r and enter

serene scaffold
#

and then do the spacy download command in there

slim reef
#

to do `python -m spacy download en_core_web_sm`?

#

ok, test again the code?

serene scaffold
#

yes

#

if that doesn't work, run this as python code

```py
from spacy.cli.download import download
download('en_core_web_sm')
```
#

cli stands for "command line interface"--I think this just runs the same code that python -m spacy download runs
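The "common trap" here is usually that the model was installed into a different Python environment than the one PyCharm runs. A small diagnostic sketch using only the standard library: print which interpreter is running and whether it can import the model package (`en_core_web_sm` is installed as a regular Python package, which is why `find_spec` can see it).

```python
import importlib.util
import sys

def model_visible(package_name="en_core_web_sm"):
    """True if `package_name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(package_name) is not None

print("Interpreter:", sys.executable)
print("en_core_web_sm importable:", model_visible())
```

If the two disagree with what the terminal shows, run `python -m spacy download en_core_web_sm` with the interpreter path printed here.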

slim reef
#

previous code still gives the same error, gonna try the new one

#

new one, is that good? ima guess so

serene scaffold
#

looks like it's working

slim reef
#

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')

#

gonna try the code from before again

serene scaffold
#

you can delete those two lines and try doing what you were planning to do

slim reef
#

OK! no error this time

#

dope, now i just need to figure out how to have the code filter text and let only city names pass

serene scaffold
slim reef
#
```py
import spacy

# Load the language model
nlp = spacy.load("en_core_web_sm")

# Your input text
text = "I traveled to New York last summer."

# Process the text with spaCy
doc = nlp(text)

# Extract city names
city_names = [ent.text for ent in doc.ents if ent.label_ == "GPE"]

# Print the extracted city names
print(city_names)
```

this code works, but i only want it to print the city names, as in (imagine it was in the console):
New York
But i get
['New York'], and i dont want ['']

#

unless i need to change the code idk

serene scaffold
#

that just shows that New York is a string

#

and city_names is a list

slim reef
#

ok ok

serene scaffold
#

you want to print out each item individually?

slim reef
#

no, just list out the city name

#

it will then go into my Raspberry Pi, etc....

#

ok, now i have "almost" all 4 codes, idk if the Pi one works, i mean it makes the HTTP POST but idk if it also does the other part of the code

#

well, that's for a separate discord

#

new question, I have this STT code, how can i have the text that it prints out go as the text the NLP code uses?

import speech_recognition as sr


def transform_audio_to_text():
    # store the recognizer in a variable
    r = sr.Recognizer()

    # set up the microphone
    with sr.Microphone() as origin:

        # seconds of silence that mark the end of a phrase
        r.pause_threshold = 0.8

        # tell the user that recording has started
        print('City?')

        # capture what it hears as audio
        audio = r.listen(origin)

        try:
            # send the audio to Google's speech recognition (Spanish)
            request = r.recognize_google(audio, language='es-ES')

            # show what was recognized
            print(request)

            # return the recognized text
            return request

        except (sr.UnknownValueError, sr.RequestError):

            # the audio was not understood, or the request to Google failed
            print('OOPS, SOMETHING WENT WRONG')

            # return a fallback value
            return 'still waiting'


transform_audio_to_text()
#

nvm, figuring it out myself
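Since transform_audio_to_text() already returns the recognized text, the usual way to connect the two scripts is to capture that return value and hand it to nlp(). A minimal sketch; speech_to_places is a hypothetical helper that takes the recognizer function and the spaCy pipeline as arguments, so it can be tried without a microphone:

```python
from typing import Callable, Iterable, List


def speech_to_places(get_text: Callable[[], str],
                     nlp: Callable[[str], object],
                     labels: Iterable[str] = ("GPE",)) -> List[str]:
    """Run the speech-to-text function, then keep only entities with the given labels."""
    text = get_text()
    if not text:
        return []  # nothing recognized, nothing to analyse
    doc = nlp(text)
    wanted = set(labels)
    return [ent.text for ent in doc.ents if ent.label_ in wanted]
```

With the real pieces that would be something like `places = speech_to_places(transform_audio_to_text, spacy.load("en_core_web_sm"))`. The recognizer is called with language='es-ES', so a Spanish pipeline (e.g. es_core_news_sm) may fit the text better; Spanish pipelines likely use LOC rather than GPE for places, which is why the label set is a parameter.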

bitter wing
slim reef
#

okkkk, so now I'm working on getting the response from the Pi to be sent to my laptop, but i keep getting a SyntaxError on line 40:

response = requests.post(laptop_url, json=tts_data)

#

idk why, i feel like there ISN'T a syntax error

#

i just need to have the response from the Pi go into the TTS and ill be done!

tidal bough
#

check previous line

#

or run your code in python 3.11 for better error messages 😛

#

or use an IDE for that matter; they generally also have better error messages.
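When Python reports a SyntaxError on a line that looks fine, the real mistake (often an unclosed bracket or quote) is frequently on an earlier line: the parser only notices the problem once the next line can't continue the statement. A small sketch that uses the built-in compile() to locate the error without running the script; find_syntax_error and the broken snippet are hypothetical:

```python
from typing import Optional, Tuple


def find_syntax_error(source: str, filename: str = "<script>") -> Optional[Tuple[int, str]]:
    """Return (line number, message) for the first SyntaxError in `source`, or None if it parses."""
    try:
        compile(source, filename, "exec")  # parses only, nothing is executed
    except SyntaxError as exc:
        return exc.lineno, exc.msg
    return None


broken = (
    "tts_data = {'text': reply\n"  # missing closing brace on the PREVIOUS line
    "response = requests.post(laptop_url, json=tts_data)\n"
)
print(find_syntax_error(broken))
```

The exact line and message depend on the Python version; 3.10 and later usually point much closer to the unclosed bracket, which is why newer versions give better error messages here.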

slim reef
#

previous line.... maybe yes, hold on

#

Yeah

#

it was

#

thanks! forgot to check, and the interpreter just didn't highlight it 😅

hollow sentinel
#

hey, can i attach an excel file here?

#

or is that not allowed

#

idk what the policies are

#

this is the file btw

#

my question is would it be a bad idea if i used some of the variables here to predict the "IPAnnualReimbursementAmt" with a regression?

#

idk what else to do because of how complicated the kaggle notebooks are on this dataset

#

my prof wants me to come up with some kind of hypothesis and i wrote a 14-page paper about it already from the provider side... like using machine learning to see if a Medicare provider is issuing fraudulent claims, but it was mostly a literature review

#

so i was wondering if anyone could help me out

#

also

#

there is no data dictionary for this dataset which really sucks

#

oh and here is the kaggle link i got everything from

waxen girder
#

@serene scaffold Do you have any idea how to decode abbreviations/acronyms? I have a database of acronyms and their full-text counterparts and I'd like to crosswalk it with the corresponding text. I'm trying to research the spaCy documentation for how to modify the tokenizer for this and have had no luck.

#

I guess I'm trying to avoid having to reconstruct the doc object.

#

But it appears that doing that might be required.
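If the goal is just to work with the expanded forms, one simple alternative to customizing spaCy's tokenizer is to expand the acronyms in the raw string before calling nlp(), so no Doc ever needs rebuilding. This is a swapped-in, regex-based approach, not a spaCy feature; expand_acronyms and the sample table are hypothetical:

```python
import re
from typing import Dict


def expand_acronyms(text: str, table: Dict[str, str]) -> str:
    """Replace whole-word acronyms in `text` with their full forms from `table`."""
    if not table:
        return text
    # Longer keys first, so overlapping alternatives prefer the longest acronym
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    # A function replacement inserts the expansion literally (no backslash processing)
    return pattern.sub(lambda m: table[m.group(1)], text)


table = {"CMS": "Centers for Medicare & Medicaid Services", "NY": "New York", "NYC": "New York City"}
print(expand_acronyms("CMS data for NYC and NY.", table))
```

Usage would be `doc = nlp(expand_acronyms(raw_text, acronym_table))`. Two caveats: character offsets in the Doc no longer match the original string, and acronyms that begin or end with punctuation (e.g. "U.S.") won't match `\b` and need a different boundary rule.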

hollow sentinel
#

please ping me y'all if you can help

serene scaffold
#

Oh no, I pinged the wrong person

#

Please don't forgive me as I do not deserve it.

waxen girder
#

I think I understand this now, thank you!

honest verge
#

is there a channel or person I can talk to for a cv related task

flat fiber
#

Hey guys, a very quick question.
Could you suggest some good baseline models for a forecasting problem?

potent sky
#

It's generally better to just ask your question away if possible and whoever is familiar with the concept and free can pick it up and help you with it.
People usually don't want to engage in back and forth just to get to the question and then discover whether they're familiar / free enough to help with it or not

dry flame
#

(it's ok to share a scan of part of a book right?)
so i'm currently learning NLU and found this flowchart. would you guys say it's a decent way to find out if what i'm doing can be solved with it?

i personally think this and the explanation before it make sense, but i'd like a second opinion

slim reef
#

getting this error:

[2023-10-18 20:55:44,506] ERROR in app: Exception on /generate_tts [POST]
Traceback (most recent call last):
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\danim\OneDrive\Desktop\programming\TTSPOST.py", line 28, in generate_tts
    stream(audio_stream_bytes)
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\elevenlabs\utils.py", line 76, in stream
    mpv_process.stdin.write(chunk)  # type: ignore
TypeError: a bytes-like object is required, not 'int'
<pi's ip> - - [18/Oct/2023 20:55:44] "POST /generate_tts HTTP/1.1" 500 -

when running the TTS code. Quick rundown: i have STT and NLP code that sends text to a Raspberry Pi, and then the Pi sends the response it generates back to this TTS code, but i get that error. Any help?
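The traceback shows elevenlabs' stream() looping over its argument and writing each item to mpv's stdin. That works for an iterator of bytes chunks, but iterating over a plain bytes object yields integers (one per byte), which matches "a bytes-like object is required, not 'int'". A pure-Python sketch of that difference, with io.BytesIO standing in for mpv's stdin; write_chunks and as_chunks are hypothetical helpers mimicking the loop:

```python
import io
from typing import Iterable, Iterator


def write_chunks(chunks: Iterable, sink: io.BytesIO) -> None:
    """Mimic a streaming player: write every item of `chunks` to a binary sink."""
    for chunk in chunks:
        sink.write(chunk)  # raises TypeError if chunk is an int


def as_chunks(data: bytes, size: int = 4096) -> Iterator[bytes]:
    """Wrap a complete bytes object as an iterator of bytes chunks."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


audio = b"ID3fake-mp3-bytes"
print(list(audio)[:3])  # iterating bytes gives ints, not bytes

sink = io.BytesIO()
write_chunks(as_chunks(audio, size=4), sink)  # works: every item is bytes
print(sink.getvalue() == audio)

try:
    write_chunks(audio, io.BytesIO())  # same mistake as passing complete bytes to stream()
except TypeError as exc:
    print(exc)
```

For the Flask route this suggests either passing the generator from generate(..., stream=True) straight to stream(), as the standalone elevenlabs test later in this thread does, or keeping the non-streaming bytes and giving them to a function meant for complete audio, such as play(). That is an inference from the traceback, not a tested fix.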

#

??

#

???

quaint loom
#

It is probably a simple task but I am not able to adjust it.

I want the values below the line (See picture) not being influenced when I change the second x-axis. https://paste.pythondiscord.com/HHEQ

quaint loom
# slim reef ???

Looks like the issue is about the "mpv_process.stdin.write(chunk)". Check and make sure the variable is located in the right place

slim reef
#

i don't have that variable....?

#

is the code wrong?

#

if so, let me know

#

or any fixes, whatever it is, i just want to finish this project by sunday

#

anyone?

quaint loom
# slim reef is the code wrong?

I don't understand what the issue is. Explain the error again. But I assume you replaced the API key placeholder with your actual ElevenLabs key when you asked GPT to generate the code for you.

slim reef
# quaint loom I don`t understand what the issues is. Explain the error code again. But I guess...

yes, but the error message is

[2023-10-18 20:55:44,506] ERROR in app: Exception on /generate_tts [POST]
Traceback (most recent call last):
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\danim\OneDrive\Desktop\programming\TTSPOST.py", line 28, in generate_tts
    stream(audio_stream_bytes)
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\elevenlabs\utils.py", line 76, in stream
    mpv_process.stdin.write(chunk)  # type: ignore
TypeError: a bytes-like object is required, not 'int'
<pi's ip> - - [18/Oct/2023 20:55:44] "POST /generate_tts HTTP/1.1" 500 -

idk why

#

when running the TTS code before, it actually said the text out loud; it still gave the message about the bytes, but it wasn't marked or highlighted as an error

quaint loom
slim reef
#

no, how do i check?

#

also, wdym?

quaint loom
#

Can you give me the output when you run print(type(audio_stream))

slim reef
#

like in the same code?

quaint loom
#

You can use it in the same code and see the output

slim reef
#

ok

quaint loom
#

Or I can modify it for you

#

Give me a moment.

slim reef
#

sure

#

also, when i add the print, it doesn't print anything rn, as the Pi hasn't sent anything rn cuz it's not on

#

like i don't have it with me either so

quaint loom
#

@app.route('/generate_tts', methods=['POST'])
def generate_tts():
    data = request.get_json()
    weather_info = data.get('weather_info')

    if weather_info:
        audio_stream = generate(
            text=weather_info,
            model="eleven_multilingual_v2",
            voice=Voice(
                voice_id='HefIefabApyvTp4BVxGv',
                settings=VoiceSettings(stability=0.24, similarity_boost=1, style=0.5, use_speaker_boost=True)
            )
        )

        # Show what generate() actually returned
        print(type(audio_stream))

        audio_stream_bytes = bytes(audio_stream)

        stream(audio_stream_bytes)

        return 'TTS audio generated and streamed', 200
    else:
        return 'Invalid weather information', 400
slim reef
#

a bunch of code errors, hold on

quaint loom
#

Try running your Flask app and sending a request to the /generate_tts endpoint

slim reef
#

wait because it has a whole bunch of errors

#

just to know, what did you change from my code and yours?

#

plus, it needs to receive the response from the Pi with the HTTP

quaint loom
#

The only change is the inserted print(type(audio_stream)) statement, which prints the type of the audio_stream object to the console, so we can see what kind of output generate() gives.

slim reef
#

so this?

from flask import Flask, request, jsonify
from elevenlabs import set_api_key, generate, Voice, VoiceSettings, stream

# Set your API key here
set_api_key("<your_api_key>")

app = Flask(__name__)


@app.route('/generate_tts', methods=['POST'])
def generate_tts():
    data = request.get_json()
    weather_info = data.get('weather_info')

    if weather_info:
        # Generate TTS audio for the weather information
        audio_stream = generate(
            text=weather_info,
            model="eleven_multilingual_v2",
            voice=Voice(
                voice_id='HefIefabApyvTp4BVxGv',
                settings=VoiceSettings(stability=0.24, similarity_boost=1, style=0.5, use_speaker_boost=True)
            )
        )
        print(type(audio_stream))

        # Convert the audio stream to bytes
        audio_stream_bytes = bytes(audio_stream)

        # Stream the audio
        stream(audio_stream_bytes)

        return 'TTS audio generated and streamed', 200
    else:
        return 'Invalid weather information', 400


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

quaint loom
slim reef
#

correct? wdym

quaint loom
#

Are you using your api_key?

slim reef
#

obviously yes

quaint loom
slim reef
#

nvm

quaint loom
#

Alright alright. Well, make sure it works fine

quaint loom
slim reef
#

yes i know, when i put the code i naturally use the key

#

i didn't put the key in the code

#

the only difference in the code is the print no??

quaint loom
#

print(type(audio_stream))

slim reef
#

how will that fix the error tho

quaint loom
#

It won't solve the issue by itself, you know. Knowing the type of audio_stream helps you understand why you're getting the error.

slim reef
#

not really no

quaint loom
#

Well, we've got to find out what's actually going wrong with the audio.

slim reef
#

yes

quaint loom
#

Maybe the source you're using isn't the best. Have you tried another approach?

slim reef
#

idk

desert oar
#

you want the top axis to apply to the values above the line, and the bottom axis to apply to values below the line?

#

i see a mix of units here and you'll probably need to normalize them all somehow

quaint loom
desert oar
#

if you plot everything with ax1 then it will all be connected to whatever you set for ax1 and ax2 won't be relevant

desert oar
quaint loom
hollow sentinel
quaint loom
desert oar
#

this is why it's a good idea to pick project topics where you already have some understanding of the field

#

otherwise you're left wondering if anything makes sense at all

#

for the sake of the exercise i think it can't hurt to try it

#

but as with all modeling of unknown data, you need to prepare for the disappointment of your model being useless or worse

#

that, or it works great. who knows?
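A quick way to find out whether a regression on this data is useless is to compare it with the simplest possible predictor, the training mean, on rows the model never saw. A minimal NumPy sketch using plain least squares; holdout_r2 is a hypothetical helper, and which feature columns to pass in is up to you:

```python
import numpy as np


def holdout_r2(X, y, test_frac=0.25, seed=0):
    """Fit ordinary least squares on a random training split and score it on the held-out rows.

    Returns (r2, coef). r2 <= 0 means the model does no better than always
    predicting the training mean; coef[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = max(1, int(len(y) * test_frac))
    test, train = idx[:n_test], idx[n_test:]

    # Prepend a column of ones so the model has an intercept
    A_train = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

    A_test = np.column_stack([np.ones(len(test)), X[test]])
    pred = A_test @ coef
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[train].mean()) ** 2)  # error of the "predict the mean" baseline
    return 1.0 - ss_res / ss_tot, coef
```

With a DataFrame this might look like `r2, coef = holdout_r2(df[["feature_1", "feature_2"]], df["IPAnnualReimbursementAmt"])`, where the feature names are placeholders. An R² near or below zero on the held-out rows is the "useless model" outcome.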

hollow sentinel
#

oh yeah i think the model will be useless

#

but i need to do it anyways

desert oar
#

(should be required reading for anyone working with data, including if it's from kaggle)

hollow sentinel
#

i can’t seem to find one with a medicare dataset with a data dictionary

#

i’ll keep looking though

desert oar
hollow sentinel
#

and i already wrote 14 pages about providers scamming medicare ppl

desert oar
#

actually since you're looking to work with health data, i think you should be very concerned with the topics of fairness and harm

#

as you're probably discovering, medical decisions can be literally life or death, or otherwise can have huge effects on people's quality of life. there's no small moral imperative to proceed with the utmost discretion and discipline.

hollow sentinel
#

my profs are just like whatever about it

#

they couldn’t care less

#

i’ll continue looking for data dictionaries

desert oar
#

otherwise you'll have to just make judgement calls and do the best you can

#

if you make a decision, document it clearly. nothing else you can do

#

but hopefully there are some important lessons here for the future

#

it looks like this particular dataset is connected with some larger project

#

you could also try to reach out to the author to get some info

#

it looks like some people have tried to get that information and the author has not responded

#

i would treat this data very very very cautiously

#

it might be good for the sake of your machine learning project though

#

but i would make it very clear in your project that this data has unknown provenance and therefore unknown quality and relevance to any real-world scenario

desert oar
hollow sentinel
#

yeah let’s see what i can do

#

i meet them again on the 30th

desert oar
#

don't wait

#

that's way too long to have no idea what you're supposed to be doing

hollow sentinel
#

i just had a meeting with them yesterday too

desert oar
#

send them an email with your concerns

hollow sentinel
#

alr

desert oar
#

i suspect that for the sake of the project the best course of action will be to proceed with this mystery dataset just so you aren't wasting time requesting access to datasets from the US govt, trying to get university funding for the fees, etc.

hollow sentinel
#

ok

desert oar
#

but keep in mind the caveats: unknown data will have unknown problems. do the best you can formulating hypotheses, but be very cautious about applying results to the real world.

hollow sentinel
#

i think the CMS datasets are downloadable, not sure though

desert oar
#

for actual help formulating hypotheses (which i think was your original and good question) -- either do the best with the literature you've read so far, or try to find someone at your university who might be able to help point you in the right direction

#

(or keep asking online and maybe you'll get lucky with someone who has experience in this particular field)

desert oar
#

in the future if you want lots of interesting economically-oriented data, check out the US BLS, USDA, and Federal Reserve

hollow sentinel
#

hmmm

desert oar
#

i did a fun school project years ago looking at corn and wheat prices from public USDA data (although i think some of it i had to manually copy from a PDF)

hollow sentinel
#

maybe it’s a better idea to just use the kaggle data then

desert oar
#

if you were earlier in the process, i'd stand by my original recommendation to change your topic. but it seems like it's too late now

hollow sentinel
#

yeah. thanks for the help!

quaint loom
misty flint
hollow sentinel
#

i'm trying to find the datasets this guy uses... not to a lot of success tho

#

i have no idea where on the website he got the data

upper jewel
#

hello, is anyone good at fine-tuning Stable Diffusion?
i am trying to run LoRA but can't seem to get it working

hollow sentinel
#

the link is broken

misty flint
hollow sentinel
misty flint
#

oof

desert oar
# quaint loom I didn`t get your meaning here.

you are doing all your plotting with ax1, so when you adjust ax1 it will affect everything you plotted using that axes object. use ax2.barh to plot the things you want connected to ax2
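In code: twiny() adds a second x-axis along the top that shares the y-axis, and only artists drawn with ax2 follow ax2's x-limits. A minimal sketch with made-up data; plot_split_axes is a hypothetical name, and the Agg backend is only there so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_split_axes(lower, upper):
    """Bars in `lower` follow the bottom x-axis (ax1); bars in `upper` follow the top one (ax2)."""
    fig, ax1 = plt.subplots()
    ax2 = ax1.twiny()  # second x-axis along the top, sharing ax1's y-axis

    n = len(lower)
    ax1.barh(range(n), lower, color="tab:blue")                    # below the line
    ax2.barh(range(n, n + len(upper)), upper, color="tab:orange")  # above the line
    ax1.axhline(n - 0.5, color="black")                            # the dividing line

    ax1.set_xlabel("bottom axis (lower bars)")
    ax2.set_xlabel("top axis (upper bars)")
    return fig, ax1, ax2


fig, ax1, ax2 = plot_split_axes([1, 2, 3], [150, 300])
ax2.set_xlim(0, 500)  # rescales only the orange bars; ax1's limits are untouched
```

If the quantities above and below the line are in different units, this keeps each group on its own scale instead of normalizing them.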

desert oar
hollow sentinel
#

these guys changed their entire data dictionary

#

also wtf, it's glitching my google chrome from the amount of data

#

ok. here's the idea. we use a json get request

desert oar
hollow sentinel
desert oar
#

what are you actually trying to do?

#

you aren't just clicking "download" in kaggle?

hollow sentinel
#

i decided to use the medicare dataset because it has a data dictionary

last lodge
#

I'm getting some depth problems with my plots. I don't use matplotlib much so no clue what the problem is... any ideas? edit: ignore this, I just rendered them in reverse order, I'm surprised matplotlib doesn't do any depth sorting for plots
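For 2D plots, matplotlib draws artists in ascending zorder, and artists with the same zorder in the order they were added, so the stacking can be fixed without reordering the plotting calls. A minimal sketch with made-up data:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# The line is created first, but its higher zorder puts it on top of the bars
(front_line,) = ax.plot([0, 1, 2], [0.5, 1.5, 0.5], color="black", zorder=3)
back_bars = ax.bar([0, 1, 2], [1, 1, 1], color="lightgray", zorder=1)

buf = io.BytesIO()
fig.savefig(buf, format="png")  # rendering draws artists from low to high zorder
```

3D axes (mplot3d) are a different story: the mplot3d FAQ notes it is not a full 3D renderer, so overlapping 3D artists can still be drawn in the wrong order.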

lapis sequoia
#

Hey guys, i'm a beginner in Python AI

#

How can I get better?

quaint loom
lapis sequoia
#

Can you recommend a course?

#

Or books?

quaint loom
small wedge
#

there are other resources in the pins for the channel as well

quaint loom
lapis sequoia
#

I prefer a course in portuguese

#

I'm brazilian

#

How old is everyone here?

serene scaffold
lapis sequoia
#

Just say your ages

serene scaffold
#

Please don't ask people to do that.

lapis sequoia
#

Why not?

serene scaffold
#

Often, people who want to know other peoples ages are predators

lapis sequoia
#

?

serene scaffold
#

You just want to know how experienced people are with AI, do you not?

lapis sequoia
#

I just want to compare the other ages with mine

serene scaffold
#

Well, you can't do that here.

lapis sequoia
#

Ok

#

👍

#

So, how do you learn AI?

serene scaffold
#

I don't have any suggestions for resources in Portuguese, unfortunately

lapis sequoia
#

I didn't ask you

hollow sentinel
#

alr, so i got the json data, created a pandas dataframe from it, and then exported it to an excel file

#

now i have to do the same thing for the part B dataset as well

slim reef
#

still getting this error, any help?

C:\Users\danim\PycharmProjects\EXAM\venv\Scripts\python.exe C:\Users\danim\OneDrive\Desktop\programming\TTSPOST.py 
 * Serving Flask app 'TTSPOST'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://<laptop ip>:5000
Press CTRL+C to quit
<class 'bytes'>
[2023-10-19 19:44:55,431] ERROR in app: Exception on /generate_tts [POST]
Traceback (most recent call last):
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\flask\app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\danim\OneDrive\Desktop\programming\TTSPOST.py", line 28, in generate_tts
    stream(audio_stream_bytes)
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\elevenlabs\utils.py", line 76, in stream
    mpv_process.stdin.write(chunk)  # type: ignore
TypeError: a bytes-like object is required, not 'int'
<pi's Ip> - - [19/Oct/2023 19:44:55] "POST /generate_tts HTTP/1.1" 500 -
#

i have no clue what to do to fix the audio not working

#

cuz when i run this code

from elevenlabs import set_api_key, generate, Voice, VoiceSettings, play, save, stream

set_api_key("<not gonna put my key, obviously>")

audio_stream = generate(
    text="test",
    stream=True,
    model="eleven_multilingual_v2",
    voice=Voice(
        voice_id='HefIefabApyvTp4BVxGv',
        settings=VoiceSettings(stability=0.24, similarity_boost=1, style=0.5, use_speaker_boost=True)
    )

)

stream (audio_stream)
save(audio=audio_stream, filename="1.mp3")

it works, i mean i do get this

Traceback (most recent call last):
  File "C:\Users\danim\OneDrive\Desktop\programming\APItests.py", line 17, in <module>
    save(audio=audio_stream, filename="1.mp3")
  File "C:\Users\danim\PycharmProjects\EXAM\venv\lib\site-packages\elevenlabs\utils.py", line 52, in save
    f.write(audio)
TypeError: a bytes-like object is required, not 'generator'

Process finished with exit code 1

but the audio plays
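The save() error in this test script is a separate issue. With stream=True, generate() returns a generator, and a generator can only be iterated once: after stream() has consumed it, nothing is left, and save() expects complete bytes anyway. One way around it is to keep a copy of each chunk as it goes past and join them afterwards. A sketch with a fake chunk generator standing in for generate(..., stream=True); record_chunks is a hypothetical helper and no elevenlabs calls are made:

```python
from typing import Iterable, Iterator, List


def record_chunks(chunks: Iterable[bytes], store: List[bytes]) -> Iterator[bytes]:
    """Pass chunks through unchanged while keeping a copy of each one in `store`."""
    for chunk in chunks:
        store.append(chunk)
        yield chunk


def fake_audio_stream() -> Iterator[bytes]:
    # Stand-in for generate(..., stream=True)
    yield b"chunk-1 "
    yield b"chunk-2"


gen = fake_audio_stream()
print(b"".join(gen))  # first pass gets all the data
print(b"".join(gen))  # second pass gets b"": the generator is already exhausted

kept: List[bytes] = []
for chunk in record_chunks(fake_audio_stream(), kept):
    pass  # a streaming player would consume the chunks here
full_audio = b"".join(kept)  # complete bytes, suitable for writing to a file
```

In the test script that would mean something like `stream(record_chunks(audio_stream, kept))` followed by `save(b"".join(kept), "1.mp3")`. The installed stream() may also return the bytes it played, which would make the copy unnecessary; check that against your elevenlabs version.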

#

please help

slim reef
#

anyone?

lapis sequoia
inland nexus
#

Too complicated for my puny mind

lilac root
dawn breach
#

Hey, does anyone here use PythonAnywhere often? I have code that runs perfectly in pycharm, but will not run in PythonAnywhere. I was wondering what the issue was?

serene scaffold
potent sky
modern storm
#

gm

#

I'm looking for some recommendations on textbooks for learning some ML + neural networks. Hoping to try some deep reinforcement learning in the future

sage bolt
odd meteor
#
  1. Prof. Sebastian Raschka's Machine Learning with PyTorch & Scikit-Learn https://sebastianraschka.com/books/#machine-learning-with-pytorch-and-scikit-learn

  2. https://d2l.ai/

brisk cedar
#

Hello guys, I need some help in understanding my questions

serene scaffold
#

no one can know for sure if they can help unless they know what the actual question is.

hollow sentinel
#

!pastebin

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

hollow sentinel
#

so i have this code over here and i'm trying to access the Rfrg_NPI key

#

it's there in the excel file, but i can't see it in the dataframe columns

#

anyone have any ideas on how i can fix this?

#

i swear i'm not going crazy

harsh bane
#

No idea if this can be asked in this channel, but, is there a way to make stable diffusion install/load asynchronously? As whenever i install a different version/start over, it uses ages to download and install.

hollow sentinel
#

i'm not sure what mistake i'm making here

#

censoring some of what i think is private info here

#

but yeah the column is there

#

is it because of invisible characters?

#

i did auto stretch the excel file column to see all the numbers

#

nvm, figured it out.
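Column names that look identical in Excel can still differ by a leading or trailing space, or by an invisible character such as a byte-order mark (\ufeff) or a zero-width space (\u200b); printing repr() of the names reveals them. A minimal pandas sketch that normalizes the names; clean_columns and the sample frame are hypothetical:

```python
import pandas as pd

INVISIBLE = ("\ufeff", "\u200b")  # byte-order mark, zero-width space


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names have invisible characters and surrounding spaces removed."""
    out = df.copy()
    names = []
    for col in out.columns:
        name = str(col)
        for ch in INVISIBLE:
            name = name.replace(ch, "")  # str.strip() does not remove these
        names.append(name.strip())
    out.columns = names
    return out


raw = pd.DataFrame({"\ufeffRfrg_NPI ": [1], "npi": [2]})
print([repr(c) for c in raw.columns])   # shows the hidden characters
print(list(clean_columns(raw).columns))
```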

left tartan
#

Trying to make a pumpkin?

hollow sentinel
#

!pastebin


hollow sentinel
#

i have no idea what i'm doing

#

what does not in index mean?

#

idk how to fix this

#

can anyone help?

small wedge
hollow sentinel
#

!pastebin


hollow sentinel
#

do i remove those columns from the brackets?

lilac root
#

So this is my first time trying to label data myself. I've only ever used pre-labelled data before in tutorials, so bear with my noobness.

I am trying to create a model that can identify characters in Guilty Gear Strive (it's a fighting game, that's irrelevant though). When you pick a character you can choose roughly 12 different colour palettes for each character so my question is how should I approach this? When I'm labelling do I label each thing like SolColour1, SolColour2, KyColour1, KyColour2, etc? or is there a better way to do it than that?

hollow sentinel
#

ugh

lilac root
#

The problem is that you're trying to reference a key that isn't there

#

specifically "['npi', 'EXCLTYPE', 'EXCLDATE', 'REINDATE', 'WAIVERDATE', 'WVRSTATE', 'MIN_EXCLUSION_PERIOD', 'END_EXCLDATE', 'DATA_YEAR']", which sounds like it's supposed to be a value within the key rather than the key itself
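pandas raises a KeyError ending in "not in index" when you select columns with a list and some of those names no longer exist, for example because they were dropped earlier in the script. A small check that fails with a clearer message; require_columns and the sample frame are hypothetical:

```python
from typing import List

import pandas as pd


def require_columns(df: pd.DataFrame, wanted: List[str]) -> pd.DataFrame:
    """Select `wanted` columns, reporting every missing name and what is actually available."""
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns {missing}; available: {list(df.columns)}")
    return df[wanted]


df = pd.DataFrame({"npi": [1, 2], "EXCLTYPE": ["a", "b"]})
df = df.drop(columns=["EXCLTYPE"])  # dropped earlier in the script...
try:
    require_columns(df, ["npi", "EXCLTYPE"])  # ...so selecting it later fails
except KeyError as exc:
    print(exc)
```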

hollow sentinel
#

i see the problem now

#

i already dropped the keys

lilac root
#

Well that'll do it 😛

hollow sentinel
#

no it's not, now i'm really confused

#

damnit

#

my code consumes too much ram for google colab now

#

why is my pycharm having issues

sharp crest
#

Where are the projects for this Discord server? Can someone share the GitHub repo?

desert oar
#

if your data is huge, develop and debug everything on a small subset of it first
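With pandas, one way to do that is to read only part of the file while developing. A minimal sketch; load_sample is a hypothetical helper, and the in-memory CSV stands in for a real file path:

```python
import io

import pandas as pd


def load_sample(source, nrows: int = 1000, usecols=None) -> pd.DataFrame:
    """Read only the first `nrows` rows (and optionally only some columns) of a CSV."""
    return pd.read_csv(source, nrows=nrows, usecols=usecols)


csv_text = "a,b,c\n" + "".join(f"{i},{i * 2},{i * 3}\n" for i in range(10_000))
small = load_sample(io.StringIO(csv_text), nrows=100, usecols=["a", "c"])
print(small.shape)
```

pd.read_excel also accepts nrows and usecols, and for a DataFrame already in memory, df.sample(frac=0.01, random_state=0) gives a reproducible random subset. Once everything works on the subset, switch back to the full data.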

sharp crest
#

How do i run multiline code like this in git bash? I forgot how to go about it any tips?

desert oar
#

(assuming you're on windows?)

#

when you run the program "Git Bash" on the start menu, what you actually are doing is opening a program called a terminal, which is basically a box that relays text back and forth to another program. in the case of Git Bash, that "another program" is a special kind of program called a shell, which is meant for running other programs and generally interacting with your computer. Bash is the name of the shell program running inside the terminal.

#

so when you open Git Bash you need to start a different program, in this case a Python interpreter

#

it should be as simple as running python or py, but that kind of depends on how you installed python

#

note that it's usually not a great idea to try developing code by pasting things directly into a console. you will almost certainly want to use a code editor to help you, i recommend IDLE or Thonny if you're a beginner

#

(IDLE should be included with Python if you installed from python.org)

lilac root
# desert oar this seems like a reasonable way to encode the data as a starting point

So I also realized that dustloop.com/w/GGST actually has all of the colors and images of the characters for all of their moves, but I'm not sure if I can use those images to train. I'm also not sure whether, since there's no background on them, that's going to make it more difficult when I'm trying to detect the characters in a live environment. Lol I'm just at a bit of an impasse as to what to do, because it's such a big commitment to collect and annotate a bunch of screenshots, especially when I don't know if it's going to actually work. I mean I'll do it if that's what it takes, but I'm trying to get some input before I dive head first into something that won't work xD

sharp crest
#

Which is more user-friendly, Thonny or IDLE?

desert oar
lilac root
#

people actually use idle?

fading compass
#

Hello, I have 167 text files to read, and I want to extract the data from all of these files and put it in one big list in order to plot it. Can you give advice on how to do this?

desert oar
#

i don't think many people use it for serious professional work

desert oar
lilac root
#

lol when I first learned python I was coming from C++, I launched IDLE and went "okay so this sucks, what's a good IDE to use" found Visual Studio Code and haven't looked back since

desert oar
#

this is getting into territory where i don't have first-hand experience

fading compass
#

i actually tried something which doesn't work, may i send it here


desert oar
lilac root
#

that being said IDLE is still leagues above where I started learning C++ back in the 90s in notepad, at least IDLE has colours xD

#

rofl

#

beat me to it.

fading compass
lilac root
#

The joys of spending hours looking through on notepad to find that bracket or semi-colon you missed with no indication of where it might be xD

#

You gotta send the pastebin link

fading compass
#

this is the usual layout of all my files "3_......txt":

Station : BREST

Longitude : -4.49503994

Latitude : 48.38290024

Organisme fournisseur de données :

Fuseau horaire : UTC

Référence verticale : zero_hydrographique

Unité : m

Source 1 : Données brutes temps réel

Source 2 : Données brutes temps différé

Source 3 : Données validées temps différé

Source 4 : Données horaires validées

Source 5 : Données horaires brutes

Source 6 : Pleines et basses mers

Date;Valeur;Source

04/01/1846 00:00:00;3.48;4
04/01/1846 01:00:00;2.7;4
04/01/1846 02:00:00;1.99;4
04/01/1846 03:00:00;1.7;4
04/01/1846 04:00:00;2.15;4

lilac root
#

what is the error you're getting? I mean just looking at the code based on what you wanted to do I think you might have over-engineered this 😛

hollow sentinel
#

!pastebin


hollow sentinel
#

i don't know why my plot isn't visualizing

hollow sentinel
#

it seems like my code doesn't get to line 180

#

the pickle files don't actually generate

fading compass
#

I get a list for my dates that looks something like this: [01/01/1846,.............................................................,#,01/01/1847]

hollow sentinel
#

not sure why the pickle files aren't working

lilac root
#

Ohhhhh, you're referring to the # that's there when it shouldn't be?

fading compass
#

yes !

lilac root
#

Ahhhh I see

hollow sentinel
#

if anyone can help i'd greatly appreciate it

lilac root
#

are you sure it's actually there and it's not your IDE's way of displaying that it's truncating the output to console?

hollow sentinel
#

are you asking me?

lilac root
#

no

hollow sentinel
#

oh ok

lilac root
fading compass
#

thank you for your help though

lilac root
#

Lol I mean let's wait to see if I can actually figure out the problem before you thank me 😛

hollow sentinel
#

im gonna figure this out tomorrow

fading compass
lilac root
#

I mean that depends on what a boucie is 😛

fading compass
#

a loop, sorry x)

#

gallicism

lilac root
#

not a single boucie no. but with 2 you could

fading compass
#

boucle not boucie 😉

lilac root
#

since all of your files follow a linear naming convention you could use a loop to iterate through all of the files

fading compass
lilac root
#

So you want it to read through each file and put the contents into a list iirc?

fading compass
#

yes

lilac root
#

so can it read 1 file put that into the list then read the next file and put it into the list?

fading compass
#

no

#

i get this error:

with open(filename, "r") as file:
TypeError: expected str, bytes or os.PathLike object, not list

lilac root
#

one sec

#

what does your file structure look like?

#

I'm just curious why you put the filenames in a list instead of simply iterating through all the .txt files in a directory. Is that required for your task, or did you just not know you could do it that way?
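The TypeError comes from passing the whole list to open(), which expects one path at a time. A minimal sketch of the directory-iteration approach, assuming all the files sit in one folder and end in .txt (the folder name "data" is hypothetical):

```python
from pathlib import Path


def read_all_txt_files(folder):
    """Read every .txt file in `folder` (sorted by name) and return all their lines in one list."""
    all_lines = []
    for path in sorted(Path(folder).glob("*.txt")):
        # open() needs a single path; passing the whole list raises
        # "TypeError: expected str, bytes or os.PathLike object, not list"
        with open(path, "r") as file:
            all_lines.extend(line.strip() for line in file)
    return all_lines


if __name__ == "__main__":
    lines = read_all_txt_files("data")  # hypothetical folder name
    print(len(lines))
```

Sorting the paths keeps files with a linear naming convention (file1, file2, ...) in a predictable order, as long as the numbers are zero-padded.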

fading compass
lilac root
#
import json

def read_urls_from_text_file(file_path):
    with open(file_path, 'r') as file:
        urls = [line.strip() for line in file.readlines()]
    return urls

def create_label_studio_json(urls):
    tasks = []
    for index, url in enumerate(urls, start=1):
        task = {
            "url": url,
            "metadata": {
                "id": index,
                "description": f"Image {index}"
            }
        }
        tasks.append(task)
    return tasks

def save_json_to_file(json_data, output_file):
    with open(output_file, 'w') as file:
        json.dump(json_data, file, indent=4)

if __name__ == '__main__':
    input_file = 'E:/GGSTAI/files.txt'  # Replace with your input text file
    output_file = 'E:/GGSTAI/tasks.json'  # Replace with your desired output JSON file

    urls = read_urls_from_text_file(input_file)
    tasks = create_label_studio_json(urls)
    save_json_to_file(tasks, output_file)

    print(f'Converted {len(urls)} URLs from {input_file} to {output_file}')

I ask because I use this code to do something similar, except instead of putting the contents into a list, it writes them to a JSON file

fading compass
#

ok, no, that's not what I'm used to doing

wheat fox
#

What does subword tokenization mean?

void dome
#

hello

#

I am new to ML

#

and I have been assigned a project that uses ML, and I'm supposed to use a transformers model

#

basically the problem statement is :

#

Design a mathematical model for an improved sentiment understanding based on emotion-annotated corpora on Plutchik’s wheel of emotion.

#

so if anyone could show me how to proceed, I would be grateful

#

like I have done the text preprocessing and all

#

Thank you

lavish kraken
#

Interesting! will listen to this keynote

potent sky
#

interesting

echo mesa
#

Guys, I heard that vector calculus is part of machine learning. This might be a stupid question, but how exactly is it used and implemented?

tidal bough
#

Well, it's used in many places, but for one: the forward propagation of neural networks involves multiplying matrices by vectors, and so backpropagation (which is used in gradient descent to train neural networks by tweaking their parameters) involves taking the derivative of the loss with respect to the neural network's (many, many) coefficients, which is multivariate ("vector") calculus.

#

e.g. if you have a very simple one-layer neural network, that means you get a vector of inputs x and you have a matrix of parameters A and also a bias vector b, and to calculate the output you do:

y = f(A@x + b)

where f is some activation function. This output is going to be different from the correct output, target_y, and you could quantify that difference via a loss function like mean squared error:

loss = ((y-target_y)**2).mean()

To train the network via gradient descent, you calculate the derivatives of the loss with respect to A and b, slightly change A and b in the direction that lowers the loss, and repeat this process many times until it stops helping.
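To make those steps concrete, here is a minimal NumPy sketch of the same one-layer setup. It assumes f is the identity (so the "network" is plain linear regression) and uses hand-derived gradients of the mean squared error; the data, learning rate, and step count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 3 input features, 2 outputs, generated from a known A and b
true_A = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]])
true_b = np.array([0.5, -1.0])
X = rng.normal(size=(200, 3))        # each row is one input vector x
Y = X @ true_A.T + true_b            # target_y for each row

A = np.zeros((2, 3))                 # parameters to learn
b = np.zeros(2)
lr = 0.1

for step in range(500):
    pred = X @ A.T + b               # y = f(A@x + b) with f = identity, for all rows at once
    err = pred - Y
    loss = (err ** 2).mean()
    # Derivatives of the mean squared error with respect to A and b
    grad_A = 2 * err.T @ X / err.size
    grad_b = 2 * err.sum(axis=0) / err.size
    # Take a small step in the direction that lowers the loss
    A -= lr * grad_A
    b -= lr * grad_b

print(round(float(loss), 6))
```

With a nonlinear f, the only change is an extra chain-rule factor f'(A@x + b) inside both gradients; libraries like PyTorch compute those derivatives automatically.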

#

(this is a pretty rushed explanation, in any beginner ML course they'd go over it in more detail)

#

(and of course, not all ML is neural networks, and there are plenty of other places where you need linear algebra and calculus; Support Vector Machines come to mind)

past meteor
#

I think all methods except vanilla decision trees involve some level of linear algebra specifically

echo mesa
past meteor
#

At its heart, "learning" is an optimisation problem, which is typically solved with methods coming out of linear algebra or out of calculus.

wooden sail
#

at levels ranging from bsc to phd

#

the place to start with would be calculus, linalg, and statistics, the basics of which are more or less independent of each other

#

as calculus starts going into multivar and statistics starts dealing with multiple variables, linalg leaks into both

tidal bough
#

I actually started doing ML with Andrew Ng's ML course on Coursera: https://www.coursera.org/collections/machine-learning back when I was in high school, maybe year 10 or so?
It was legendary (something like the top course on Coursera, I believe) back then, and I think it still is. Note that this was before the course had a massive rework: back then it was taught in Octave, now it's taught in Python. It's probably still good, though.

echo mesa
#

Wow guys I appreciate all of your responses, thanks very much

wooden sail
#

i would also say statistics is something you have to revisit several times. basic stats doesn't require much else. multivar stats requires linalg. continuous stats requires calculus. estimation theory, which is where you want to arrive at for machine learning, requires both calc and linalg

echo mesa
past meteor
echo mesa
wooden sail
#

i'd recommend the book "Statistical Signal Processing" by Louis Scharf, as well as "Fundamentals of Statistical Signal Processing: Estimation Theory" by Steven Kay. these already require the 3 prerequisites we've been discussing. there are also other books that address ML more directly, but the key concepts are the same

past meteor
#

In my first year I got linear algebra first, then calculus. In my second year I had statistics and an optimization course. In my third year I got econometrics which teaches you the "finesse" of working with data.

echo mesa
past meteor
tidal bough
echo mesa
#

Thank you all, you guys are helping a lot

past meteor
#

Once you're "ready" this is the book to read in my opinion: https://www.statlearning.com/. It assumes you have a working knowledge of lin alg, calculus and probability/statistics

lapis sequoia
#

I have this CV mask I've extracted. With cv2, I want to detect how many white areas there are in this black-and-white mask. What would be the best (efficient and accurate) way of doing this? Below are some of the masks I've computed:

#

Also, these white areas move around in a circle, since I'm trying to make a bot for a game

#

I've tried asking ChatGPT and Copilot. This is the best I have so far, but it is still a little inaccurate:


import cv2

img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)
img = cv2.inRange(img, blue_lower_bound, blue_upper_bound)  # 255 where the pixel is in range, 0 elsewhere

ret, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)  # already binary after inRange, so this is a no-op
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)

print(num_labels)  # label 0 is the black background, so num_labels > 2 means more than one white area

echo mesa
past meteor
flat ridge
#

hello guys, I'm making a face recognition project using FaceNet and SVC. It performs well at identifying trained faces, but I also need it to say when a face is unknown (not in the training set). That doesn't happen, because the classifier always predicts whichever label is most likely. What can I do to solve this?
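One common way to get an "unknown" answer, sketched here as a gate you could put in front of (or instead of) the SVC rather than as a feature of SVC itself: compare the new face's embedding with the average embedding of each known person, and reject it when even the closest one is farther away than a threshold. The threshold is an assumption you would pick on a validation set that includes people outside the training identities. (Another option is `SVC(probability=True)` and rejecting faces whose highest `predict_proba` score is low.)

```python
import numpy as np


def fit_centroids(embeddings, labels):
    """Average the training embeddings of each known person."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_or_unknown(embedding, classes, centroids, threshold):
    """Return the nearest known identity, or "unknown" if even the nearest is too far away."""
    dists = np.linalg.norm(centroids - embedding, axis=1)
    best = int(np.argmin(dists))
    return str(classes[best]) if dists[best] <= threshold else "unknown"


# Toy 2-D "embeddings" for illustration; real FaceNet embeddings have far more dimensions
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [10.0, 0.0], [10.1, 0.3]])
train_y = ["alice", "alice", "bob", "bob"]
classes, centroids = fit_centroids(train_X, train_y)

print(predict_or_unknown(np.array([0.1, 0.0]), classes, centroids, threshold=2.0))
print(predict_or_unknown(np.array([5.0, 20.0]), classes, centroids, threshold=2.0))
```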

odd meteor
# echo mesa Would you guys recommend learning math before learning machine learning or as yo...

I'd say it depends on what you want to be at the end of the day. If, like me, you're more into ML research, then try to cover the basic math and statistics fundamentals. It's also nice to understand SVD in linear algebra.

So in summary, if you're still in high school or an undergraduate program, you're lucky, because most of these fundamentals will be covered extensively there. Just give them more attention in school.

Other than that, you can get into ML without being the best mathematician or statistician. Whatever you don't know now, you'll learn it as you keep progressing in the field.

hollow sentinel
#
partb = pickle.dumps("partb")
partd = pickle.dumps("partd")
dmepos = pickle.dumps("dmepos")
combined = pickle.dumps("combined")

partd = pickle.loads(partb)
partb = pickle.loads(partd)
dmepos = pickle.loads(dmepos)
combined = pickle.loads(combined)

#

is there something wrong with how i'm using pickle here?

#

i was following a youtube tutorial that used .dumps, but now i see videos saying to use .dump?

#

apparently they both do different things?

#
pd.to_pickle(partd, '/Volumes/ML_projects/Medicare_Fraud_Datasets/processed_data/partd.pkl')
pd.to_pickle(partb, '/Volumes/ML_projects/Medicare_Fraud_Datasets/processed_data/partb.pkl')
pd.to_pickle(dmepos, '/Volumes/ML_projects/Medicare_Fraud_Datasets/processed_data/dmepos.pkl')
pd.to_pickle(combined, '/Volumes/ML_projects/Medicare_Fraud_Datasets/processed_data/combined.pkl')
tidal bough
#

the ones with "s" take or return a bytes object, the ones without write to/read from a file.
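To illustrate the difference, here is a small sketch with a made-up two-row DataFrame and a temporary folder standing in for the real paths. It also shows why a call like `pickle.dumps("partb")` never writes anything: it serializes the *string* "partb" into bytes in memory.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"NPI": [1003000126, 1003000134], "TARGET": [0, 1]})  # made-up data

# dumps/loads: object <-> bytes in memory; no file is created
raw = pickle.dumps(df)
same_df = pickle.loads(raw)

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "partb.pkl"

    # dump/load: object <-> file on disk (note the binary modes "wb"/"rb")
    with open(path, "wb") as f:
        pickle.dump(df, f)
    with open(path, "rb") as f:
        from_file = pickle.load(f)

    # pandas wrappers that do the same for DataFrames
    df.to_pickle(path)
    from_pandas = pd.read_pickle(path)

print(same_df.equals(df), from_file.equals(df), from_pandas.equals(df))
```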

hollow sentinel
#

this is how the guy on GitHub did it

#

shouldn't it create a pickle file in my directory?

#

bc i see nothing

#

i copied my folder name in where his path is

#

i have no idea what's not working here

tidal bough
hollow sentinel
#

do macs block pickled files?

#
pd.to_pickle(partd, "/Users/rahuldas/Desktop/medicare fraud data/partd.pkl")
pd.to_pickle(partb, "/Users/rahuldas/Desktop/medicare fraud data/partb.pkl")
pd.to_pickle(dmepos, "/Users/rahuldas/Desktop/medicare fraud data/dmepos.pkl")
pd.to_pickle(combined, "/Users/rahuldas/Desktop/medicare fraud data/combined.pkl")
basically doxxing myself but whatevs
tidal bough
#

this should write some files unless it crashes

hollow sentinel
#

it doesn't crash strangely

#

should i post all my code in a pastebin?

tidal bough
#

actually, hmm

#

I can't find docs about pd.to_pickle existing, actually? there's .to_pickle on dataframes.

tidal bough
#

that's a method on dataframes, not a function in pandas.

hollow sentinel
#

let's see what type partd is

#

it doesn't reach the line

tidal bough
#

well, that explains why it doesn't do anything.

hollow sentinel
#

!pastebin

hollow sentinel
#

here's all the code

#

idk what's causing everything to hang

#

i don't get a warning or anything

#

actually i used to, but i set it to ignore warnings

#

i'm beyond stuck

#

!pastebin

hollow sentinel
#

so i printed partb, partd, and dmepos types on lines 128, 129, and 130

#

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

#

so it is a dataframe

#

now i'm really confused

#

if it is a dataframe why is it not working?

#

the problem is the lines below it, but idk what

#
partb['START_EXCLDATE'] = partb['EXCLDATE'].dt.year
partd['START_EXCLDATE'] = partd['EXCLDATE'].dt.year
dmepos['START_EXCLDATE'] = dmepos['EXCLDATE'].dt.year
#

it stops around these three lines of code

#

not sure why, they look fine

#

at least i was able to figure out the lines causing the problem

#

i don’t understand what the issue is

serene scaffold
#

.latex This is supposed to be a squared error loss function for fitting a linear function. But doesn't $(\text{h}(x) - y)^2$ have a different derivative than $(y - \text{h}(x))^2$ with respect to $w$ or $b$? How can we guarantee that the gradient will be positive if we compute $\frac{d}{dw} (y - (wx + b))^2$ and get $-2$ as a coefficient?

strange elbowBOT
tidal bough
#

(h(x) - y)^2 is exactly the same as (y - h(x))^2 and so has the same derivatives, too.

#

If it's not obvious, note that when calculating the derivative you'll get 2(y-h(x)) * -dh(x)/dw in the second case and 2(h(x)-y)dh(x)/dw in the first, which are the same thing (minus sign is just in a different place)
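Written out for h(x) = wx + b, the chain rule gives the same gradient for both orderings; the minus sign from the inner derivative of y - (wx + b) is absorbed by flipping the residual:

```latex
\begin{aligned}
\frac{\partial}{\partial w}\bigl(y - (wx + b)\bigr)^2
  &= 2\bigl(y - (wx + b)\bigr)\cdot(-x)
   = 2\bigl((wx + b) - y\bigr)\,x
   = \frac{\partial}{\partial w}\bigl((wx + b) - y\bigr)^2 \\
\frac{\partial}{\partial b}\bigl(y - (wx + b)\bigr)^2
  &= -2\bigl(y - (wx + b)\bigr)
   = 2\bigl((wx + b) - y\bigr)
   = \frac{\partial}{\partial b}\bigl((wx + b) - y\bigr)^2
\end{aligned}
```

So the sign of the gradient is not fixed in advance: it depends on whether the prediction wx + b overshoots or undershoots y, and gradient descent always steps against it.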

tidal bough
lavish kraken
#

finally did my best using XAI with LIME

#

so fucking complicated shit

odd meteor
lavish kraken
serene scaffold
#

@tidal bough thanks, I'll work it again when I get home 🐱

potent sky
#

Does anyone have recommendations for reading material on hyperbolic manifolds?

#

Or even videos, if they're rigorous

hollow sentinel
#

i have no idea what the problem is

#

the code should be fine

#

like what am i missing here

left tartan
jade bloom
#

hello

eternal finch
#

hi

jade bloom
#

I was wondering, how do you determine how many input nodes you need for a regression model, e.g. a basic Keras neural network?

potent sky
odd meteor
void dome
hollow sentinel
left tartan
hollow sentinel
midnight harbor
hollow sentinel
#

!pastebiin

#

!pastebin

left tartan
hollow sentinel
#

not like it does here

#

is it because i'm missing the savefig line?

left tartan
#

I'm just missing the context here. Your code relies on an external file which I don't have.

#

But at the very end you have plt.show() when you should probably have fig.show(), or just fig (which will auto-display if it's the last line of the cell). plt.show() might be fine, though; I use plotly more than matplotlib nowadays.

hollow sentinel
left tartan
#

I don't actually want the dataset tho... but I do want to help you debug it

#

Your first question was about a pandas issue. Do you still have that?

hollow sentinel
#

yeah, i don't see anything wrong with those lines

left tartan
#

So, from a debugging perspective, the question should be: does the dataframe "look" correct? Not the code, but the actual data.

#

I usually throw a "display(df)" in, so I can inspect the data

hollow sentinel
#

oh you have to import it

hollow sentinel
# left tartan So, from a debugging perspective, the question should be: does the dataframe "lo...
 LASTNAME FIRSTNAME MIDNAME  ... DATA_YEAR_max TARGET START_EXCLDATE
0          NaN       NaN     NaN  ...           NaN      0         2020.0
1          NaN       NaN     NaN  ...           NaN      0         1988.0
2          NaN       NaN     NaN  ...           NaN      0         1997.0
3          NaN       NaN     NaN  ...           NaN      0         1994.0
4          NaN       NaN     NaN  ...           NaN      0         2011.0
...        ...       ...     ...  ...           ...    ...            ...
69646      NaN       NaN     NaN  ...        2021.0      0            NaN
69647      NaN       NaN     NaN  ...        2021.0      0            NaN
69648      NaN       NaN     NaN  ...        2021.0      0            NaN
69649      NaN       NaN     NaN  ...        2021.0      0            NaN
69650      NaN       NaN     NaN  ...        2021.0      0            NaN
left tartan
#

Yah, so is that what you expected? I doubt it. Add a few more displays and figure out the earliest place the data looks wrong.

left tartan
#

Why so many null values?
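Following that advice, a small helper like this (a sketch; the name `checkpoint` is made up) can be called after each processing step to spot where columns start filling up with nulls. In a Jupyter notebook `display(df)` works out of the box; in a plain script you need `from IPython.display import display`, or you can just print.

```python
import pandas as pd


def checkpoint(df, name):
    """Print a quick sanity check of a DataFrame and return its per-column null counts."""
    nulls = df.isna().sum()
    print(f"--- {name}: {df.shape[0]} rows x {df.shape[1]} columns")
    print(nulls[nulls > 0])  # only the columns that contain missing values
    return nulls


# Made-up example: FIRSTNAME is entirely missing, which is what you want to catch early
part_b = pd.DataFrame({"NPI": [1, 2, 3], "FIRSTNAME": [None, None, None]})
nulls = checkpoint(part_b, "after loading part b")
```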

hollow sentinel
hollow sentinel
left tartan
#

Where did you insert that display? And which df was it?

hollow sentinel
#

part b

left tartan
#

So, first thing: Open part_b_data.xlsx and compare it to your dataframe. Are the first names really null in the spreadsheet?

hollow sentinel
#

no

#

they're not all null

left tartan
#

Are the first 5 firstnames null?

#

(or the last 5? the display you showed includes the first 5 and the last 5 rows)

hollow sentinel
#

nope, they're all actual names

left tartan
#

So, at line 69 you have a print('hi'). Can you change this to display(part_b)?

hollow sentinel
#
   Unnamed: 0  Rndrng_NPI  ... Bene_CC_Strok_Pct Bene_Avg_Risk_Scre
0             0  1003000126  ...              0.14             1.8026
1             1  1003000134  ...              0.03             1.0785
2             2  1003000142  ...              0.06             1.4920
3             3  1003000423  ...               NaN             0.6362
4             4  1003000480  ...               NaN             1.8233
..          ...         ...  ...               ...                ...
995         995  1003059544  ...              0.00             1.1879
996         996  1003059676  ...              0.05             1.3646
997         997  1003059684  ...               NaN             1.3586
998         998  1003059783  ...              0.12             2.6992
999         999  1003059866  ...              0.17             2.9189
#

it's like it was never read correctly

#

or maybe it was

#

because there are a lot of columns in between

#

there's the NPI, then the first name, last name

left tartan
#

Yah, so then look at line 128. At that point, you've just renamed and reordered the columns
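One common way (not necessarily what happened here) to end up with columns that are entirely NaN is reindexing or selecting by the *new* column names before the old ones have been renamed: pandas silently creates empty columns for names it doesn't find. A sketch with one real column name from the output above and one hypothetical one (`Rndrng_Prvdr_Last_Org_Name`):

```python
import pandas as pd

raw = pd.DataFrame({"Rndrng_NPI": [1003000126], "Rndrng_Prvdr_Last_Org_Name": ["SMITH"]})
new_order = ["NPI", "LASTNAME"]

# Reindexing by names that don't exist yet silently creates all-NaN columns
wrong = raw.reindex(columns=new_order)

# Renaming first, then selecting/reordering, keeps the data
right = raw.rename(columns={"Rndrng_NPI": "NPI", "Rndrng_Prvdr_Last_Org_Name": "LASTNAME"})[new_order]

print(wrong)
print(right)
```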

hollow sentinel
left tartan
#

No, I don't know yet. We're just trying to isolate it by displaying the df at various points.

hollow sentinel
#

i see

left tartan
#

Just basic debugging: start at the beginning, and display/print the variables to make sure the data looks correct

hollow sentinel
#

yep

#

i think the size of the dataset is messing it up too

#

not really sure

kind beacon
#

hi, how are you doing guys? i took a class in AI programming and i need some resources to learn from. the prof suggested some, but they're kind of 1980s style: a lot of theory and not very beginner friendly, and i'm not really big into AI-type stuff

mild dirge
mild dirge
#

What kind of resources do you expect then? Don't mean it in a harsh way, but making a model in Python is not the same as understanding it.

#

So going through the theory is just an important, sometimes tedious step of the process

serene scaffold
#

I don't find that hard to believe, even if I'm not entirely sure why

tidal bough
charred canyon
#

data-science-and-ai (AI)

serene scaffold
tidal bough
#

Sure, I suppose? It's just the distributive property: -(-b - w x + y) = (-1)*(-b - w x + y) = (-1)(-b) + (-1)(-wx) + (-1)(y) = b + wx - y

jade bloom
#

okay

#

let's say you have two features

#

but you make a model with 10 input nodes

#

what does the machine learning model do then? what does it use for the other 8 input nodes?

timid kestrel
#

Hey y'all, I have a quick question about writing OLS formulas in Python. I'm kinda new, and I'm trying to see whether 3 variable columns affect the outcome of another variable. But I noticed that the dependent variable is categorical/boolean, since it's a column that shows whether it received an award or not. I got an error when I tried ols.fit.

ols_formula = "Award ~ imdb_rating + C(genres) + tomato_rating"

I noticed that Award is boolean. Is there a way I can get around this?
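A yes/no outcome is usually modelled with logistic regression rather than OLS (OLS on a 0/1 column does run, as a "linear probability model"). A sketch with statsmodels' formula API on made-up data with the same column names; converting the boolean column to 0/1 integers first matters because the formula parser likely treats a True/False response as categorical, which is a plausible source of the ols.fit error.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data with the same column names as the question
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "imdb_rating": rng.uniform(4, 9, n),
    "tomato_rating": rng.uniform(20, 100, n),
    "genres": rng.choice(["Action", "Comedy", "Drama"], n),
})
p_award = 1 / (1 + np.exp(-(-8 + 0.8 * df["imdb_rating"] + 0.02 * df["tomato_rating"])))
df["Award"] = rng.uniform(size=n) < p_award   # boolean, like the original column

# Convert True/False to 1/0 so the formula sees a single numeric outcome
df["Award"] = df["Award"].astype(int)

# Logistic regression: models the probability of receiving an award
result = smf.logit("Award ~ imdb_rating + C(genres) + tomato_rating", data=df).fit(disp=0)
print(result.summary())
```

The coefficients are on the log-odds scale; `np.exp(result.params)` turns them into odds ratios.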

desert oar
#

an input "node" is just another name for what you might call a "feature" of each input item
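Put concretely: the input layer isn't a set of extra slots the model fills in; its size must equal the number of columns in X (in Keras you'd typically declare `keras.Input(shape=(n_features,))`). A NumPy sketch of a first dense layer shows why: its weight matrix has one row per input feature, so an input with the wrong number of columns simply fails. The sizes here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_units = 2, 8

# A dense layer's weights: one row per input feature, one column per unit
W = rng.normal(size=(n_features, n_units))
b = np.zeros(n_units)

X = rng.normal(size=(5, n_features))   # 5 samples, 2 features each
hidden = X @ W + b                     # (5, 2) @ (2, 8) -> (5, 8)
print(hidden.shape)

X_wrong = rng.normal(size=(5, 10))     # 10 "features" fed to a 2-input layer
try:
    X_wrong @ W
except ValueError as e:
    print("shape mismatch:", e)
```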

past meteor
spring loom
#

Hi guys, I'm wanting to start my first machine learning project. Is there any website I can use for this? I know of Kaggle; any tips would be really appreciated.

honest verge
#

Hey guys, how would I go about making a model that uses a live video feed to describe objects as well as their distance from the camera? How would I then take those generated descriptions and convert them to audio?

jade bloom
#

then based on the coordinates of the bounding box for an object you can calculate the area and tell how far away it is from the camera

#

each type of object would have different bounding box areas close to and far from the camera, so you would have to account for that
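A common refinement of that idea, swapped in here (it's the pinhole camera model, not something from the messages above), uses the box *height* plus a typical real-world height per object class, which is one way to account for different objects having different box sizes. The focal length in pixels comes from calibrating your camera once, e.g. by photographing an object of known height at a known distance. All numbers below are hypothetical.

```python
def estimate_distance_m(bbox_height_px, real_height_m, focal_length_px):
    """Pinhole-camera estimate: distance = focal length (px) * real height (m) / height in image (px)."""
    if bbox_height_px <= 0:
        raise ValueError("bounding box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px


# Hypothetical typical heights per detected class, in metres
TYPICAL_HEIGHT_M = {"person": 1.7, "chair": 0.9, "car": 1.5}

# Hypothetical values: focal length from a one-off calibration, box height from the detector
focal_length_px = 800.0
print(estimate_distance_m(100, TYPICAL_HEIGHT_M["person"], focal_length_px))
```

Estimates like this are rough (people sit, cars are seen at angles); a depth-estimation model or a stereo/depth camera is the more accurate route if you need it.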

#

those are just some ideas :)

#

not sure about the audio part though

#

(detector model below)

#

Also, unrelated, my question is this:

#

should I stop training at around epoch 18? This is a linear regression classification model

#

or wait

#

my interpretation is that I shouldn't, because the model seems to be overfitting later...

#

but please explain why I should let it train later for longer if I should
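The usual rule of thumb is to watch validation loss rather than training loss: once validation loss stops improving while training loss keeps falling, extra epochs are mostly fitting noise. Keras automates this with `keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`; the pure-Python sketch below only illustrates similar patience logic on a made-up loss curve.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training stops once validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs; the best epoch is the one
    whose weights you would keep.
    """
    best_loss = float("inf")
    best_epoch = None
    stop_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, stop_epoch


# Made-up curve: validation loss bottoms out at epoch 3, then climbs (overfitting)
val_losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.60, 0.70]
print(early_stopping(val_losses))  # -> (3, 6)
```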

honest verge
#

Ohhhh ok I see thank you so much that sounds like a great idea thank you so much!!

warm sorrel
#

Hi.
Just a simple question for the pandas dataframe library + sns plotting.

I have a music dataset. Among other columns, I have danceability (a float from 0 to 1 that describes how "danceable" a song is), genre, and tempo_category, a categorical variable with 8 levels ranging from "Extremely slow" to "Extremely Fast".

I am able to plot the danceability of each tempo category as shown with

sns.set(style="whitegrid")
ax = sns.boxplot(y=music["Danceability"],x=music["Tempo Cat"],showmeans=True,
                    meanprops={"marker":"o",
                    "markerfacecolor":"white",
                    "markeredgecolor":"none",
                    "markersize":"3"})
ax.get_figure().autofmt_xdate()

What I don't know how to do: How do I plot it but grouped by the genre?
Meaning I have the same x and y axis, but now I have e.g. ~18 plots (1 for each genre)
Beyond individually saving each category to a new dataframe variable to replot, i am not sure how to do this.
warm sorrel
#

I thought about wrapping this in a for loop, but when I tried, each new plot just overwrote the previous one

curGen = "HIPHOP"
sns.set(style="whitegrid")
ax = sns.boxplot(y=music.loc[music["Genre"] == curGen]["Danceability"],x=music.loc[music["Genre"]==curGen]["Tempo Cat"],showmeans=True,
                    meanprops={"marker":"o",
                    "markerfacecolor":"white",
                    "markeredgecolor":"none",
                    "markersize":"3"}).set_title(curGen)
ax.get_figure().autofmt_xdate()
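seaborn can do the grouping for you: `catplot` with `kind="box"` draws the same boxplot once per value of the `col` variable on a shared grid, so no loop is needed. (The loop version overwrote itself because every call drew on the same current axes; calling `plt.figure()` at the top of each iteration would also fix that.) A sketch on made-up data with only three genres and three tempo categories:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop this in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the real `music` DataFrame
rng = np.random.default_rng(0)
music = pd.DataFrame({
    "Genre": rng.choice(["HIPHOP", "ROCK", "JAZZ"], 300),
    "Tempo Cat": rng.choice(["Slow", "Medium", "Fast"], 300),
    "Danceability": rng.uniform(0, 1, 300),
})

sns.set_theme(style="whitegrid")
# One boxplot panel per genre, all with the same x and y axes
g = sns.catplot(
    data=music, x="Tempo Cat", y="Danceability", col="Genre",
    kind="box", col_wrap=3, order=["Slow", "Medium", "Fast"],
    showmeans=True,
    meanprops={"marker": "o", "markerfacecolor": "white",
               "markeredgecolor": "none", "markersize": 3},
)
g.set_titles("{col_name}")  # panel title = genre name
for ax in g.axes.flat:
    ax.tick_params(axis="x", labelrotation=45)
```

With ~18 genres, `col_wrap=4` or so keeps the grid readable, and `g.savefig("danceability_by_genre.png")` saves the whole grid as one image.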
west cloak
#

Any links where I can read about data quality plans for python?

serene scaffold
hollow sentinel
#
Train_Allpatientdata=pd.merge(Train_Outpatientdata,Train_Inpatientdata,
                              left_on=['BeneID', 'ClaimID', 'ClaimStartDt', 'ClaimEndDt', 'Provider',
       'InscClaimAmtReimbursed', 'AttendingPhysician', 'OperatingPhysician',
       'OtherPhysician', 'ClmDiagnosisCode_1', 'ClmDiagnosisCode_2',
       'ClmDiagnosisCode_3', 'ClmDiagnosisCode_4', 'ClmDiagnosisCode_5',
       'ClmDiagnosisCode_6', 'ClmDiagnosisCode_7', 'ClmDiagnosisCode_8',
       'ClmDiagnosisCode_9', 'ClmDiagnosisCode_10', 'ClmProcedureCode_1',
       'ClmProcedureCode_2', 'ClmProcedureCode_3', 'ClmProcedureCode_4',
       'ClmProcedureCode_5', 'ClmProcedureCode_6', 'DeductibleAmtPaid',
       'ClmAdmitDiagnosisCode'],
                              right_on=['BeneID', 'ClaimID', 'ClaimStartDt', 'ClaimEndDt', 'Provider',
       'InscClaimAmtReimbursed', 'AttendingPhysician', 'OperatingPhysician',
       'OtherPhysician', 'ClmDiagnosisCode_1', 'ClmDiagnosisCode_2',
       'ClmDiagnosisCode_3', 'ClmDiagnosisCode_4', 'ClmDiagnosisCode_5',
       'ClmDiagnosisCode_6', 'ClmDiagnosisCode_7', 'ClmDiagnosisCode_8',
       'ClmDiagnosisCode_9', 'ClmDiagnosisCode_10', 'ClmProcedureCode_1',
       'ClmProcedureCode_2', 'ClmProcedureCode_3', 'ClmProcedureCode_4',
       'ClmProcedureCode_5', 'ClmProcedureCode_6', 'DeductibleAmtPaid',
       'ClmAdmitDiagnosisCode']
                              ,how='outer')
#

i don't quite understand left_on and right_on

#

it looks like the tables are the same, so why all the extra arguments?
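When both tables use identical column names for the keys, `left_on` and `right_on` are redundant: a single `on=` list does the same thing. They are only needed when the key columns are named differently in the two tables. A sketch with tiny made-up tables (`indicator=True` adds a `_merge` column showing which table each row came from):

```python
import pandas as pd

# Tiny made-up stand-ins for the outpatient and inpatient claim tables
outpatient = pd.DataFrame({"BeneID": ["B1", "B2"], "ClaimID": ["C1", "C2"], "DeductibleAmtPaid": [0, 10]})
inpatient = pd.DataFrame({"BeneID": ["B3"], "ClaimID": ["C3"], "DeductibleAmtPaid": [1068]})

# Same key names in both tables: `on=` is enough
common_cols = [c for c in outpatient.columns if c in inpatient.columns]
merged = pd.merge(outpatient, inpatient, on=common_cols, how="outer", indicator=True)
print(merged)

# left_on/right_on are for keys whose names differ between the two tables
renamed = inpatient.rename(columns={"ClaimID": "Claim_ID"})
merged2 = pd.merge(outpatient, renamed,
                   left_on=["BeneID", "ClaimID", "DeductibleAmtPaid"],
                   right_on=["BeneID", "Claim_ID", "DeductibleAmtPaid"], how="outer")
```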

#

at least for once all the code works on my machine

sharp lagoon
#

Is this the right chat for a question I have about reinforcement learning?

sharp lagoon
#

Ok cool thank you. I am trying to train an AI to play Pokemon Red with the pyboy emulator's API for python. My model currently has these parameters:

num_cpu = 16
ep_length_multiplier = 2
ep_length = 2048*ep_length_multiplier
model = PPO('CnnPolicy', env, verbose=1, n_steps=ep_length // ep_length_multiplier, batch_size=128, n_epochs=5, gamma=0.999, learning_rate=0.003, gae_lambda=.98)

What happens is it starts out strong, using my exploration reward system to find its way outside the starting house and eventually to the first real reward event. It is rewarded heavily for this and then continues to get rewarded if it continues down the right path. However, the furthest it gets after a couple iterations is making it through to where you pick a pokemon, and beat your rival, then starts walking toward route 1.

The problem is that it only does this sometimes, no matter how many iterations/episodes I run. It seems to learn and get better and more efficient at making its way here (which is by far the most rewarding path), but then proceeds to search other paths indefinitely. If I run 16 CPUs, about 4 on average will make their way to the rewarded path, but they never improve in efficiency past a certain point, and the rest all get lost and bang their heads against a wall.

My thought was that exploration and exploitation were too imbalanced, but I've tried decreasing the exploration rewards, decreasing the entropy coefficient (to a negative number, because it was already 0), and raising and lowering gamma and gae_lambda.

Nothing has worked, and I think I'm misunderstanding something: if the AI is ever going to beat the game, it needs to very heavily favor going down the same very rewarding path almost every time, and I'm not sure how to make it do that any more strictly than I already am. Obviously you need a bit of exploration so it doesn't get stuck in a local maximum, but this still seems too random, as if exploration/entropy is too high.

rugged comet
#

What do you call it when you compare all the columns to each other in a grid of scatterplots? I want to see something like this image.

rugged comet
#

Thank you.

#

insurance charges, age, and bmi vs. each other. It seems like they charge considering what age bracket someone falls in.

#

I might be interpreting that incorrectly.

#

Yeah I am.

#

I wonder what would cause those three distinct lines to show up.

agile cobalt
#

the diagonal? that is plotting the feature against itself

#

or did you mean age X charges

rugged comet
#

I mean age x charges

hybrid mica
#

Which model can I use to determine how similar two short (around 5-10 words) pieces of text are?
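For short texts the usual suggestion is a sentence-embedding model (for example from the sentence-transformers library), compared with cosine similarity. As a no-download baseline, swapped in here purely for illustration, the same cosine comparison works on TF-IDF word vectors with scikit-learn; the third example shows its big weakness, since paraphrases with no shared words score 0.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def text_similarity(a, b):
    """Cosine similarity of TF-IDF vectors: 1.0 = same words, 0.0 = no words in common."""
    vectors = TfidfVectorizer().fit_transform([a, b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


print(text_similarity("the cat sat on the mat", "the cat sat on the mat"))
print(text_similarity("the cat sat on the mat", "a dog slept on a rug"))
print(text_similarity("buy cheap flights", "purchase inexpensive plane tickets"))
```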

agile cobalt
# rugged comet I mean age x charges

my impression is pretty much that there are three tiers of charges, each scaling mostly linearly with age, but the number of points is a bit absurd. it's very possible that 99% of the points are in the lowest "line"

#

try using some transparency (alpha)

rugged comet
#

It looks like you were right.

agile cobalt
#

personally I'd use even more

rugged comet
#

alpha = 0.05

noble plover
#

What do we use these days to display tables in Jupyter Notebook?

I have tried QGrid, but can't get it to work in the latest versions of python.
Is there anything similar?

I want to be able to sort columns and search

remote tulip
#

Do we have a channel for LLM and GenAI topics?

agile cobalt
#

if you just want to spam your outputs, offtopic
for discussion of the technical aspects, here

#

you may want to look for servers more specifically focused on it instead though

hybrid mica
sharp lagoon
#

If someone is well-versed here in PPO models in python and why my model is doing this, I would be open to a voice chat to show what's happening if text seems laborious

rugged comet
#

How do we interpret the root mean squared error of log-transformed responses/targets? Is it possible to get back to the original units, like USD instead of log USD?
I found
https://stats.stackexchange.com/questions/371529/interpreting-rmse-of-log-values
but it's hard for me to understand, and it doesn't talk about the inverse.

desert oar
#

you can't obtain the RMSE on original scale from only RMSE on log scale because many different possible predictions and errors can produce the same RMSE on log scale but different on original scale

#

consider y_pred = [20, 10] and y_true = [10, 5]

#

then consider y_pred = [200, 100] and y_true = [100, 50]. same RMSE as above in log scale, totally different in original scale
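desert oar's example, computed: both pairs have the same log-scale RMSE but very different RMSE in the original units. What does carry over is a multiplicative reading: here every prediction is off by a factor of 2, and exp(log RMSE) = 2, so exp of a log-scale RMSE is often read loosely as a typical multiplicative error factor rather than as an error in USD.

```python
import numpy as np


def rmse(pred, true):
    """Root of the mean of the squared errors."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


small = ([20, 10], [10, 5])       # (y_pred, y_true)
large = ([200, 100], [100, 50])

for y_pred, y_true in (small, large):
    log_rmse = rmse(np.log(y_pred), np.log(y_true))
    print(f"log-scale RMSE={log_rmse:.4f}  original-scale RMSE={rmse(y_pred, y_true):.4f}"
          f"  exp(log RMSE)={np.exp(log_rmse):.2f}")
```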

feral blade
#

hii! I'm using the pretrained VGG16 CNN layers and adding some dense layers to predict the 100 classes in CIFAR-100. What's an effective way to resize all inputs from 32x32 to 224x224?

earnest wren
clever lake
#

Guys, I would like to learn to program AI, but it seems unattainable. Where can I start?

feral blade
neon vessel
#

Hi folks,
Which lib for machine learning you use the most, Tensorflow, Pytorch or Keras?

amber kelp
versed gulch
#

is there any Python code available to measure vessel radius values in 3D images?

oblique orchid
#

hey guys, not a Python-related question, but can anyone come on VC and guide me? I want to understand how MongoDB is used for analytics, or how people work with NoSQL databases

cunning agate
#

Hello I want someone to review with me a notebook please

clever lake
clever lake
vestal widget
#

Anyone have tips on fine-tuning GPT-2? I'm fine-tuning the 124M model using the gpt-2-simple module, but I don't really know when I start overfitting. I know you're supposed to check the validation loss, but I don't think that's possible with this module, unless there's some code for it

harsh minnow
#

I am using Pinecone in my app. It retrieves information based on the latest text, but I want to retrieve information based on the whole conversation context, not just a similaritySearch. How should I do this?

potent sky
# clever lake y

That makes the entry more accessible. There are machine learning libraries with very simple Python APIs, so you could get your own models running without much effort.
But to get good learning value, I'd suggest reading up on the math and theory behind how these models work and how they're implemented.
For that, you could pick up a book.

#

O'Reilly's ML with Scikit-Learn and TensorFlow is generally a good book, with a mix of hands-on coding and some accompanying theory

#

these are all free

unique ether
#

I know you can't read anything on that, I'm sorry

rugged comet
#
import logging
import sklearn.linear_model
import sklearn.model_selection

def cross_validate(df, predictors, response):
    estimator = sklearn.linear_model.LinearRegression()
    scores = sklearn.model_selection.cross_validate(estimator, df[predictors], df[response])
    logging.info(scores["test_score"])

Output

[1. 1. 1. 1. 1.]

Is this telling me that I got an r-squared of 1.0 for all five (5) folds of cross validation?

agile cobalt
#

Seems like it.
(Side note: cross_validate uses the model's default scoring metric if you don't specify one, and LinearRegression's default scoring metric is r-squared.)

If I had to guess, there's probably some sort of data leak, unless it's a borderline-trivial problem.
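A minimal sketch of how a leak produces a perfect score in every fold, assuming the leak is a column that is (or is a linear function of) the target. To stay self-contained it uses NumPy least squares and a hand-rolled, unshuffled 5-fold split that roughly mirrors what cross_validate does for a regressor, instead of scikit-learn itself. The data and the names `r2`, `kfold_r2`, `honest`, and `leaked` are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination (r-squared)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_r2(X, y, k=5):
    """Fit least squares with an intercept on k-1 folds, score r-squared on the held-out fold."""
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Prepend a column of ones so the fit has an intercept, like LinearRegression.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        scores.append(r2(y[test_idx], A_test @ coef))
    return np.array(scores)

rng = np.random.default_rng(0)
honest = rng.normal(size=(200, 3))                      # made-up predictors
y = honest @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
leaked = 3.0 * y + 7.0                                  # a column derived from the target

print(kfold_r2(honest, y))                              # good, but clearly below 1
print(kfold_r2(np.column_stack([honest, leaked]), y))   # 1.0 (up to rounding) in every fold
```

If a real model scores exactly 1.0 on every fold, checking whether any predictor is computed from the response (or recorded after it) is a good first step.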

rugged comet
#

Oh yeah there was a data leak. Thanks.

void dome
#

Hello,
I was watching an intro video on Hugging Face and sentiment analysis
and followed each step side by side,
but the predictions I'm getting aren't what I expected

#

Any insights would be appreciated

sharp crest
#

So RMSE is the square root of the mean of the squared errors
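Written out, with $\hat{y}_i$ the predictions and $y_i$ the targets over $n$ samples; the log-scale version from the earlier question just applies $\log$ to both before taking the differences:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2},
\qquad
\mathrm{RMSE}_{\log} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log\hat{y}_i - \log y_i\right)^2}
```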

#

Hmm?

iron peak
#

I want to learn matplotlib. I currently have a horizontal bar graph, but the names on the y-axis are cut off. Does anyone have any resources on styling charts and saving them as images? (I want to import them into a PDF file.)

sage bolt
#

Use the savefig method to save your charts as images
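A minimal sketch, assuming matplotlib is installed; the labels, values, and file names are made up. `fig.tight_layout()` and `bbox_inches="tight"` in `savefig` are two common ways to keep long y-axis labels from being clipped, and saving with a `.pdf` extension writes a vector file that embeds cleanly in a PDF.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Made-up data with long category names that tend to get clipped.
names = ["A fairly long category name", "Another long label", "Short"]
values = [12, 7, 3]

fig, ax = plt.subplots(figsize=(8, 3))
ax.barh(names, values)
ax.set_xlabel("Count")

fig.tight_layout()  # adjust margins so the y-axis labels fit inside the figure
fig.savefig("bar_chart.png", dpi=200, bbox_inches="tight")  # raster image
fig.savefig("bar_chart.pdf", bbox_inches="tight")           # vector output for PDFs
plt.close(fig)
```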

cunning agate
#

Hey guys, if I have results like this, does that mean my models are overfitting?

#

My data has 12 features, and I took 50,000 observations to train my models

#

this is my main function

cosmic imp
#

I have a question

#

Does anyone own the Google Coral USB Accelerator?

desert oar
#

Also, Stack Overflow can have a lot of useful advice, but it can be hard to search. You might need to try several combinations of search terms

dusky basin
#

Where should I start learning genetic algorithms in Python and implementing them in practice?

rotund surge
#

Hello, I'm new here. I would like to get some help with a land price prediction algorithm I'm currently working on

covert aspen
#

I have a string of binary values, 0 and 1, that I converted to this string of dark and light circles so that it could visually aid me with identifying patterns (if any) in this string. Here, the dark circle represents a 0, and the light one represents a 1. I cannot find any patterns just by looking at this string. Are there any tools out there that can help me identify patterns in this string? I'm ready to learn anything to be able to identify any patterns in this string. For more context, the aforementioned string of 0s and 1s represents the outcomes of an experiment that I conducted where each outcome could take on a binary value.

jaunty helm
#

Actually, I think sequential pattern mining is the name of the topic you're looking for

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed that the values are discrete, and thus time series mining is closely related, but usually considered a different activity. Sequential pattern mining...
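Full sequential pattern mining is a bigger topic. As a simpler first check (a swapped-in technique, not sequential pattern mining itself), one can count how often each length-k substring appears and compare that with the count expected if every outcome were independent with the observed frequency of 1s. The function name `kgram_report` and the example string are hypothetical.

```python
from collections import Counter
from itertools import product

def kgram_report(bits: str, k: int):
    """For every length-k pattern of 0/1, return (observed count, expected count
    if outcomes were independent with the observed frequency of '1')."""
    windows = len(bits) - k + 1
    observed = Counter(bits[i:i + k] for i in range(windows))
    p1 = bits.count("1") / len(bits)
    report = {}
    for combo in product("01", repeat=k):
        pattern = "".join(combo)
        ones = pattern.count("1")
        expected = windows * p1 ** ones * (1 - p1) ** (k - ones)
        report[pattern] = (observed.get(pattern, 0), expected)
    return report

bits = "0110100110010110" * 4  # made-up sequence with hidden structure
for pattern, (obs, exp) in kgram_report(bits, 3).items():
    print(f"{pattern}: observed {obs:3d}, expected {exp:6.1f}")
```

In this made-up string, 000 and 111 never occur even though about 7.75 of each would be expected, which hints at structure. For a real conclusion one would follow up with a proper statistical test.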

fallow frost
#

What's the appropriate data structure for avoiding a full scan while doing this operation:

def search_name(prefix: str):
    return next((x for x in my_names if x.startswith(prefix)), None)
#

The first thing that comes to mind is how a DB would index that column: a B-tree. But what's the equivalent in Python?
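The closest standard-library analogue to a database's ordered index is probably a sorted list searched with the `bisect` module: names sharing a prefix sit next to each other in sorted order, so a binary search finds the first candidate in O(log n). A minimal sketch; `names` is a made-up stand-in for `my_names`.

```python
from bisect import bisect_left
from typing import List, Optional

def search_name(sorted_names: List[str], prefix: str) -> Optional[str]:
    """Return the alphabetically first name starting with `prefix`, or None.

    `sorted_names` must already be sorted. Sorting once costs O(n log n);
    after that, each lookup is an O(log n) binary search.
    """
    i = bisect_left(sorted_names, prefix)
    if i < len(sorted_names) and sorted_names[i].startswith(prefix):
        return sorted_names[i]
    return None

names = sorted(["bob", "alice", "albert", "carol"])
print(search_name(names, "al"))  # albert
print(search_name(names, "ca"))  # carol
print(search_name(names, "z"))   # None
```

Unlike the generator version, this returns the alphabetically first match rather than the first one in the original list order. To collect all matches, keep scanning forward from `i` while the names still start with `prefix`.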

fallow frost
#

Oh yeah, but is there a standard-library implementation?

#

or something similar

atomic tide
# fallow frost or smth similar

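A trick that I've seen is to use a defaultdict:
```py
from collections import defaultdict

# Each node is a dict of child nodes that creates missing children on access.
Trie = lambda: defaultdict(Trie)
STOP = object()  # sentinel key marking the end of a complete word

def add(trie, word):
    for letter in word:
        trie = trie[letter]  # walk down, creating nodes as needed
    trie[STOP]  # merely accessing the key creates it, marking a word end

def contains(trie, word, just_prefix=False):
    for letter in word:
        if letter not in trie:  # `in` does not create nodes
            return False
        trie = trie[letter]
    return just_prefix or STOP in trie

names = Trie()
add(names, "alice")
print(contains(names, "ali", just_prefix=True), contains(names, "ali"))  # True False
```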

#

A more pythonic way might be to just have a dictionary mapping prefixes to sets of words that start with that prefix.
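A minimal sketch of that dictionary approach, assuming the word list is fixed (or rebuilt when it changes); the function name and words are hypothetical. It trades memory (every prefix of every word is stored) for average O(1) lookups.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def build_prefix_index(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Map every prefix (including the full word) to the set of words starting with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        for end in range(1, len(word) + 1):
            index[word[:end]].add(word)
    return dict(index)  # plain dict, so looking up a missing prefix doesn't create it

index = build_prefix_index(["alice", "albert", "bob"])
print(index.get("al", set()))  # {'alice', 'albert'} (set order may vary)
print(index.get("z", set()))   # set()
```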

nimble tide
#

I'm using spaCy and I want to fine-tune a named entity recogniser (basically, I want to fine-tune en_core_web_trf)

How do I do that?

#

I'd rather not train my own NER, because that would be too expensive on the servers

covert aspen
fallow frost
unique ether
#

Anyone here any good at data cleaning?

#

I'm drowning here and I need someone to throw me a life vest

#

All I need to know is how to decide what to do with columns that have missing data:

When do I remove the whole column?
When do I impute the data?
When do I just drop the rows?

jaunty helm
#

How you impute the column depends a lot on what it represents

unique ether
#

Would it help if I showed you a visualization I've made of my data to help me decide which columns to drop?

#

You might need to download it to read it but it basically summarises my issue right now. I've got columns with varying correlation strengths to my target variable but they all have varying levels of missing data too

#

I'm using Spearman for the correlations on that chart, btw
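A minimal sketch of the kind of per-column summary that chart seems to be built from, assuming a pandas DataFrame with numeric features and a numeric target. The column names (`target`, `feat_a`, `feat_b`, `feat_c`), the values, and the function name are made up. `DataFrame.corr` excludes missing values pairwise, so each correlation only uses rows where both values are present.

```python
import numpy as np
import pandas as pd

def missing_vs_correlation(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Per numeric column: share of missing values and Spearman correlation with the target."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr(method="spearman")[target].drop(target)
    missing = numeric.isna().mean().drop(target)
    summary = pd.DataFrame({"missing_frac": missing, "spearman_with_target": corr})
    return summary.sort_values("missing_frac")

df = pd.DataFrame({
    "target": [1, 2, 3, 4, 5, 6],
    "feat_a": [10, 20, 30, 40, 50, 60],                # rises with the target
    "feat_b": [6, 5, np.nan, 3, 2, 1],                 # falls, one value missing
    "feat_c": [np.nan, np.nan, 1, np.nan, 2, np.nan],  # mostly missing
})
print(missing_vs_correlation(df, "target"))
```

A column that is mostly missing and only weakly correlated is an obvious drop candidate; a strongly correlated one with a few gaps is usually worth imputing, ideally based on what the column means.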

jaunty helm
unique ether
#

I just don't know what to do other than use the force 😆

jaunty helm
#

You might want to first think about what each column represents before deciding what to do

unique ether
#

Do you mean the distributions?

jaunty helm
#

For example (if I'm reading correctly), there are lots of missing values for OWN_CAR_AGE, but a missing value here may or may not mean they don't own a car

#

In that case, it might make sense to set all the missing values to 0
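A minimal pandas sketch of that idea, assuming (as a guess about this dataset) that a missing OWN_CAR_AGE means no car. The extra `OWN_CAR_AGE_MISSING` indicator column is a hypothetical, optional addition: it keeps the "was missing" information, so a model can still tell "no car" apart from "brand-new car" (age 0). The rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the OWN_CAR_AGE column name comes from the dataset discussed.
apps = pd.DataFrame({"OWN_CAR_AGE": [3.0, np.nan, 12.0, np.nan, 0.0]})

# Hypothetical indicator column: 1 where the age was missing (assumed to mean no car).
apps["OWN_CAR_AGE_MISSING"] = apps["OWN_CAR_AGE"].isna().astype(int)

# Fill the missing ages with 0.
apps["OWN_CAR_AGE"] = apps["OWN_CAR_AGE"].fillna(0)

print(apps)
```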

unique ether
#

I went through and did this to the dataset already earlier

import numpy as np
# Replace the placeholder strings that stand for missing values with np.nan
raw_applications.replace(['Unknown', 'XNA', 'not specified'], np.nan, inplace=True)

I'm not sure if it was a good idea though

unique ether
#

It's kind of a nightmare dataset. I've only just realized that some of the column headers are in Bulgarian