#data-science-and-ml

desert oar
#

i'm not sure if you need to re-fit the model once for each class. i'd have to think about it

#

show it to your advisor/director

compact parrot
desert oar
#

dt?

compact parrot
#

Data science

desert oar
#

why "t"?

compact parrot
#

Oh

#

3:42 am moment

final scaffold
#

Hi, i understand the reason behind creating env for each project.

I'm a bit confused about how these envs and the packages inside them work.

  • Anaconda is installed in c:/programdata/anaconda3.
  • And i wanna create all my projects inside my d:/
    So,
    I saw that anaconda environments are inside a folder "envs" in the above location.

My question is: when I create an environment inside the envs folder using "conda create -n venv python"
and then use that env for my project in d:/,
will all the packages I need for my project have to be installed again inside the new environment, even though they are already in base after the installation?

desert oar
#

also i am pretty sure you can compute this multi-class AUC without re-fitting the model each time, but you might end up with weird threshold values
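
A minimal sketch of that idea, assuming the model was fit once and returns one probability column per class. The probabilities and labels below are made up for illustration, and `auc_binary` is a hypothetical helper, not a library function. Each class is scored one-vs-rest from its own column, so nothing is re-fit:

```python
def auc_binary(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities from ONE fitted model
# (rows: samples, columns: classes 0, 1, 2).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.3, 0.5, 0.2],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
y_true = [0, 1, 2, 0, 2, 1]

ovr_auc = {}
for k in range(3):
    labels = [1 if y == k else 0 for y in y_true]   # class k vs. the rest
    scores = [row[k] for row in probs]              # column k of the same predictions
    ovr_auc[k] = auc_binary(labels, scores)

macro_auc = sum(ovr_auc.values()) / len(ovr_auc)
```

Averaging the per-class values gives a macro one-vs-rest AUC; other averaging schemes exist and may suit imbalanced classes better.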

desert oar
#

no, i don't think you do

#

try it without and see if it gives sensible results

desert oar
#

that said, the location of your project is unrelated to the location of the env you use

final scaffold
#

Understood

desert oar
#

so it's totally fine to put your project in d:\ and to let conda create the environment wherever it wants to with -n

final scaffold
#

What about the package installation for each project?

desert oar
#

what do you mean?

#

oh

#

Will all the packages I need for my project have to be installed again inside the new environment, even though they are already in base after the installation?
yes

final scaffold
#

Last part of my question

#

Uhh

desert oar
#

it won't re-download the packages from anaconda.org, but it will install them separately

#

that's the whole point

#

that's how conda achieves separation between environments

#

you can theoretically build a system that de-duplicates the actual installed contents of packages, but it would be very difficult and messy

#

if you really want something like that, check out nix or guix

#

but you will have to write a lot of package specifications yourself in that case. not worth it to save a few gb of disk space imo, although it does make for a more thoroughly reproducible environment setup

final scaffold
#

So if i create an environment inside the d:/ project folder and refer the default conda python, you are saying i would have to install every package again even though those packages with same version are already in the anaconda folder?

desert oar
#

what do you mean "refer the default conda python"

#

if you create an environment with conda create, you will have to install all your required packages into that environment, yes

#

like i said, that's explicitly the goal of environments: separation

final scaffold
compact parrot
final scaffold
#

I thought since it comes with packages i wouldn't have to install again every time.

desert oar
#

it comes with packages installed in the base environment

#

if you create a new environment, you have to install packages in that environment

#

it's really not a big deal though, i wouldn't worry about it

#

anaconda comes with a lot of junk imo anyway

final scaffold
#

So, what's the point of installing anaconda? i think environments and python packages can also be achieved if we only use python from python.org

desert oar
#

heck, you can even create a conda env that doesn't have python at all

desert oar
#

conda is fundamentally different from python + venv

#

you don't even have to install python in a conda environment

#

the only reason you need python in the base environment is that conda is itself a python application

compact parrot
#

Quite offtopic question
Is it smart to buy a home server for data science? On a computer, it is not always possible to leave the code running for a long time, since training consumes 100% of the cores

desert oar
#

some people build really wild computers for doing machine learning at home, 2 gpus and xeon processors with ecc ram

#

but that's very expensive, especially with the gpu and other chip shortage issues

compact parrot
desert oar
#

that's still considered high-end by a lot of standards, and fairly expensive

#

my home pc is some old i5 and a 1060

final scaffold
#

Cloud is very cheap if you are doing light weight but can empty your pocket if you dont know your exact requirements

compact parrot
#

My home pc is r9 3950x, 2070S and 64 gb ram
but even on it some datasets take more than a few days

compact parrot
final scaffold
final scaffold
desert oar
desert oar
#

i am a full time professional and i would consider $3000 a very expensive purchase

#

if that isn't expensive for you, then you are very fortunate, and you should go ahead and build a server

final scaffold
#

I was gonna recommend the Azure's free tier to you but nvm

compact parrot
#

It's expensive enough for me, but I could afford this purchase if I saved money for one or two years
So I'm trying to understand whether it's worth buying

compact parrot
final scaffold
#

Jk lol. Free tier has 2gigs of ram and can only run some stuff

compact parrot
#

Jk? I am not native speaker

final scaffold
#

Just kidding.

desert oar
compact parrot
#

I was trying some clouds on free tier and it was worse than my home pc(

final scaffold
#

It is. I only use them to schedule refresh some tasks or running bots

compact parrot
#

Ok)

marsh yacht
#

can anyone give me an easy way to add folium.CircleMarker markers to a folium map from a dataframe

#

nvm i got it

static osprey
#

hi i need help with anaconda when i try to install a library in cmd i type pip install 'lib name' then it gets downloaded in anaconda and i cant use the lib on my main python interpreter

serene scaffold
#

If not, try deleting it.

static osprey
serene scaffold
static osprey
#

before changing windows i used to download each lib twice, once on main python and once in the anaconda env. now when i download in main python it gets downloaded in the anaconda env

serene scaffold
#

since you like using anaconda, why are you trying to not use it, in this case? because you can make separate environments with anaconda and pip install stuff into those different environments.

#

anyway, what happens if you type which pip in the terminal

static osprey
serene scaffold
static osprey
serene scaffold
#

I've never heard of that happening PeepoShrug

#

but as you can see, on the D drive (whatever that is), pip is pointing to the pip in anaconda3

static osprey
#

so what should i do ;/

serene scaffold
#

I only use gitbash and powershell on Windows. cmd is annoying.

static osprey
serene scaffold
gaunt elbow
#

Can anyone tell me if a script to compare and analyze Cryptocurrency data is an AI ??

serene scaffold
#

git is a version control system and gitbash is a terminal.

static osprey
#

thank you for your help tho

serene scaffold
gaunt elbow
#

Ok i haven't started yet, i'm waiting for a friend to help me with the cryptocurrency analysis and then i will start

final scaffold
rough mountain
#

Could I set up a neural network in a way, that instead it uses memory to adapt to different datasets. The idea is, I feed it a hundred or so images, then it checks to see if the remaining images look similar to the first 100

serene scaffold
#

also how would it "check to see if the remaining images look similar"?

#

I think I misread what you said. Can you be more specific about what you're trying to do? You want to train a model to do what with images?

rough mountain
serene scaffold
rough mountain
#

close enough

serene scaffold
#

There isn't really "close enough" when you're trying to formally specify what something is supposed to do.

rough mountain
#

Upon more research, one shot learning on large sets of images.

rough mountain
rough mountain
wicked grove
# serene scaffold There isn't really "close enough" when you're trying to formally specify what so...

Hello, I'm really new to tensorflow and I'm doing the deeplearning.ai course on coursera
Could you please tell me if this is a good tutorial to follow https://youtu.be/bte8Er0QhDg

upper granite
#

Hello, everyone. I work with databricks and I have the following problem. I want to connect databricks cluster with my local machine I tried with Databricks connect but only the spark code execute on the cluster I want the entire code to execute on the cluster.

frozen pasture
lapis sequoia
#

Hi guys, does anyone know the maths of LDA and variational inference? I have some questions. What does parameterization mean?

exotic edge
#

Hi Guys im trying to start learning about AI and creating some sort of AI assistant and wondering where there is good places to start and learn to code this in python?

#

@ me with any advice if possible <3

serene scaffold
exotic edge
serene scaffold
exotic edge
serene scaffold
exotic edge
serene scaffold
exotic edge
serene scaffold
#

@exotic edge a high-level overview of K nearest neighbors: suppose you want to predict the political affiliations of people in a city, and you have the most recent electoral results from that city broken down by household (which sounds illegal as fuck). For people who didn't vote in that election, an effective way to guess how they might have voted would be to assume they voted the same way as those closest to them

#

so if one didn't vote, and they're in a house with three Purple voters and two Orange voters nearby, you could naively assume that they would have voted Purple.

exotic edge
serene scaffold
#

for an unknown person, assume they're the same as the majority of the k nearest people (where k is an integer like 4, or something)
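
A minimal sketch of that majority vote, using made-up household coordinates and the Purple/Orange labels from the example above. `knn_predict` is a hypothetical helper name:

```python
from collections import Counter
import math

def knn_predict(known, query, k=3):
    """known: list of ((x, y), label). Returns the majority label of the k nearest points."""
    nearest = sorted(known, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical households: position on a map and how they voted.
households = [
    ((0.0, 0.0), "Purple"),
    ((0.0, 1.0), "Purple"),
    ((1.0, 0.0), "Purple"),
    ((1.0, 1.0), "Orange"),
    ((2.0, 2.0), "Orange"),
    ((5.0, 5.0), "Orange"),
]

# The five nearest neighbours are three Purple and two Orange voters.
guess = knn_predict(households, (0.5, 0.5), k=5)
```

In practice k is usually chosen odd (or ties are broken explicitly), since an even k can split the vote.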

hollow sentinel
#

i'm so confused

#
corr = df.corr(method = "spearman")
plt.figure(figsize=(30,30))
sns.heatmap(corr, annot = True, fmt = ".2f",cmap="Blues")
plt.title("Spearman Correlation Heatmap")
sns.set(font_scale = 2)
plt.show()
#

how do i make the font size of a heatmap bigger?

#

like this is very small

#

ok i got it

soft haven
#

Hi, I got an assignment from my prof about machine learning that's been bothering me. If I'm not mistaken, he asked us to apply a regression algorithm to the iris data set, which IMO should be a classification problem. I'm new to this field and don't have enough of a clue, so I need help determining what type of algorithm should be applied to this dataset. Anyway, the data is from sklearn.datasets but I'm gonna send the link here

desert oar
#

e.g. can you predict sepal length from sepal width, petal length, and petal width?

desert oar
#

sure, there you go

soft haven
#

Never thought of that before, dang, thanks bro

bleak kiln
#

Is there someone who can help me?

import pandas as pd
df = pd.read_csv('testVersion.csv', sep='|')
showValue = df[['LeadInfo']]
x = df['Huidig type woning', 'Toekomstig adres', 'Toekomstige postcode'] = df['Leadinfo'].str.split(':',expand=True)
print(x)
serene scaffold
serene scaffold
bleak kiln
sour grotto
#

Hell

#

Hello

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @severe shell until <t:1638475742:f> (9 minutes and 59 seconds) (reason: newlines rule: sent 110 newlines in 10s).

rapid fog
#

!unmute @severe shell

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: pardoned infraction mute for @severe shell.

rapid fog
#

!paste Please use this in the future

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

severe shell
#

ok

#

so i have this code

#

it does this

#

its an assistant that listens for commands

#

in the first code it does this
and it listens
and after a command is done , it again listens till you shut it down
i want it to listen only when i say a hotword
like ok google

#
import speech_recognition as sr

hot_word = 'Hi'
r = sr.Recognizer()
r.pause_threshold = 5  # This waits for 5 sec after voice ends
with sr.Microphone() as source:
    audio = r.listen(source)
text = r.recognize_google(audio)
if hot_word in text:
    pass  # do anything like calling a function or reply to it
#

the second code allows me to use hey google like feature so it listens for commands only when i say the hotword
its a simple thing to do but i dont know what and where to remove in the first code and where to add the second one
thanks for understanding , ping me when you are available for help

#

if you want you can also help me in my dms or here works well too

#

@serene scaffold you could help me ?

serene scaffold
severe shell
#

No problem , anyone else ?

compact parrot
#

@desert oar thanks! Your suggestion from yesterday works. I got quite strange results, but they look correct

ashen umbra
#

hi can anyone tell me why this character by character tokenization happening here?

#

I am trying to call my preprocessor function on the each of the qualification feature

desert oar
#

show us the definition of preprocessor

#

!paste

desert oar
#

please do not post a screenshot of the definition

ashen umbra
#

this is essentially what i am doing in my preprocessor func

#

@desert oar

ashen umbra
#

Also another thing, does anyone here has any experience w xgboost?

echo thorn
#

Any good services that will run my code on a machine with a lot of cores and a lot of ram available?

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1638487055:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

ashen umbra
#

So far I found mse and rmse as the most common ones.. but what about the traditional ones?

#

Such as accuracy, precision and so on

#

Are they not good way of evaluating an xgb's performance?

desert oar
# ashen umbra

for i in text is suspicious when text is just a string
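
A small sketch of why that loop tokenizes character by character. The preprocessor itself was only posted as a screenshot, so the sample string and the whitespace split below are assumptions for illustration:

```python
text = "B.Sc Computer Science"

# Iterating over a string yields one character at a time.
chars = [tok for tok in text]

# Splitting first yields word-level tokens instead.
words = [tok for tok in text.split()]
```

If the function is applied to each cell of a column, `text` is a single string per call, so the loop needs to run over `text.split()` (or a proper tokenizer's output) rather than over `text` directly.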

pale sedge
#

Can someone here help to make a sliding windows classification ?

normal violet
#

please help im really stuck

supple trench
#

Good night, I'm having trouble retrieving an image from a URL

#

import io
import requests
import pytesseract
from PIL import Image

url = 'https://resultadosgenerales2021.cne.hn/imagen_acta.html?url=https://provisorio-honduras-2021.datosoficiales.com/opt/recuentos/mesa-8766_DIP.jpg'
headers = {
'User-agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding' : 'gzip,deflate,sdch',
'Referer' : 'https://resultadosgenerales2021.cne.hn/#resultados/PRE/HN'
}
response = requests.get(url, headers=headers)
response.content

#

This is my code

#

But in response.content

#

b'<!DOCTYPE html>\n<html>\n <script>\n document.addEventListener('DOMContentLoaded', (event) => {\n const queryString = window.location.search;\n const urlParams = new URLSearchParams(queryString);\n const imageUrl = urlParams.get('url');\n if (imageUrl) {\n document.querySelector('#imagen').setAttribute('src', imageUrl);\n }\n })\n </script>\n <body>\n <img id="imagen" src=""/>\n </body>\n</html>\n'

#

This is what i get

#

I'm trying to process the image in the url with pytesseract but I can't since it's not in the content of the request
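
A hedged sketch of one way around this: the returned HTML only copies its own `url` query parameter into the `<img>` tag with JavaScript, so the image address can be read straight from the page URL. Whether the image host accepts a direct download is an assumption, and `fetch_image_bytes` is a hypothetical helper that is defined but not called here:

```python
from urllib.parse import urlparse, parse_qs

page_url = (
    "https://resultadosgenerales2021.cne.hn/imagen_acta.html"
    "?url=https://provisorio-honduras-2021.datosoficiales.com/opt/recuentos/mesa-8766_DIP.jpg"
)

# The page sets <img src> from its ?url= parameter, so extract that parameter directly.
image_url = parse_qs(urlparse(page_url).query)["url"][0]

def fetch_image_bytes(url, headers=None):
    """Download the raw image bytes (network call; not executed in this sketch)."""
    import requests
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.content
```

The bytes could then be opened with `Image.open(io.BytesIO(fetch_image_bytes(image_url)))` and passed to `pytesseract.image_to_string`.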

hollow sentinel
#

i am so confused rn

#

this is the confusion matrix in the o'reilly machine learning book

#

but this is the confusion matrix in a statquest video?

#

what?

#

well i guess i'll be using the python variation of it

#

i'm just gonna go by the book

#

or is the o'reilly book wrong?

#

is there a certain confusion matrix layout i should stick to?

boreal escarp
#

Hey I was wondering if someone can help me figure out how can i make my code work, thanks!

#
def covariance(x,y):

    # Find the mean of series x and y
    mean_x = sum(column_x) / float(len(column_x))
    mean_y = sum(column_y) / float(len(column_y))

    # subtract the mean from the individual elements
    sous_x = [i - mean_x for i in x]
    sous_y = [i - mean_y for i in y]

    # Build the numerator and denominator to get the covariance formula
    nume = sum([sous_x[i] * sous_y[i] for i in range(len(sous_x))])
    denom = len(x) - 1
    cov = nume / denom
    return cov

with open('nicotinic1.csv') as nicotinic_1:
    fonction = covariance(x,y)
    print("La covariance du fichier nicotinic_1 est: ", fonction)
serene scaffold
#

@boreal escarp what does it do that is different from what you want it to do?

lone drum
#

Hello

#

I have a data frame
Which has date column I want to add an empty row before new date starts

For eg


01-01-2020
01-01-2020


02-02-2020
02-02-2020
02-02-2020


03-02-2020


04-02-2020

04-02-2020

This way
How I can do this?

#

Ping me when replying

slim fox
#

that's federated learning, isn't it

velvet thorn
#

because the classical case of federated learning is

#

a complete dataset divided horizontally across nodes

#

as opposed to vertically

velvet thorn
lone drum
#

I have pandas series
Which has different values like
String, float, int etc

How I can keep only string values only
Ping me when replying
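
One possible approach, sketched with a made-up Series: test each element's Python type and use the result as a boolean mask.

```python
import pandas as pd

# Hypothetical mixed-type Series standing in for the real one.
s = pd.Series(["apple", 3.5, 7, "pear", None])

# True where the element is a str, False otherwise (including None).
is_string = s.map(lambda v: isinstance(v, str))
only_strings = s[is_string]
```

The original index labels are kept; `only_strings.reset_index(drop=True)` renumbers them if that matters.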

slim fox
velvet thorn
#

problem was credit scoring with telco data

#

so the targets were basically default or no default, right

#

i.e. labels

#

and the features were the raw telco data like call records etc.

#

so the problem: when features and labels are separated, how to train model?

slim fox
#

oh. but why can't the target label be in the same place as the features?

#

like, I don't understand why labels are separated from features

limpid oak
#

is it possible to use df.iterrows() and enumerate(list) in a single for loop

#

for idx, row,idLst,rowLst in zip(villGdf[:5].iterrows(),enumerate(tempList)):
  print(row['id'])
    # eachPoint = geomPoints[i]
#
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19180/940817221.py in <module>
      1 tempList = [10,2,5,3,55,57,1]
      2 
----> 3 for idx, row,idLst,rowLst in zip(villGdf[:5].iterrows(),enumerate(tempList)):
      4   print(row['id'])
      5     # eachPoint = geomPoints[i]

ValueError: not enough values to unpack (expected 4, got 2)
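
A sketch of what the error is pointing at: `zip` yields two items per step, and each of them is itself a pair, so the loop target needs nested parentheses. `vill_df` below is a made-up stand-in for `villGdf`:

```python
import pandas as pd

# Hypothetical stand-in for villGdf; only the 'id' column matters here.
vill_df = pd.DataFrame({"id": [101, 102, 103]})
temp_list = [10, 2, 5, 3, 55, 57, 1]

pairs = []
# (idx, row) comes from iterrows(), (list_idx, list_val) from enumerate().
for (idx, row), (list_idx, list_val) in zip(vill_df.iterrows(), enumerate(temp_list)):
    pairs.append((int(row["id"]), list_idx, list_val))
```

`zip` stops at the shorter input, so the extra items in `temp_list` are silently ignored.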
barren moss
#

Does anyone know how to fit a linear regression to a data?

ripe forge
#

When put inside function definitions, ** indicates capturing variable kwargs. When put in other contexts, such as when calling a function, it's "unpacking" or "opening" the dictionary into kwargs

#

The two are like mirrored operations of each other

#

So in this case, **bla is saying "open up/unpack bla and pass its contents as separate kw arguments"
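
A tiny sketch of both directions, with a made-up function and dictionary (`describe` and the keys in `bla` are illustrative only):

```python
def describe(**options):
    # In a definition, ** collects keyword arguments into a dict.
    return sorted(options.items())

bla = {"sep": "|", "header": 0}

# In a call, ** unpacks the dict into separate keyword arguments.
unpacked = describe(**bla)
explicit = describe(sep="|", header=0)
```

Both calls receive exactly the same keyword arguments.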

dire bobcat
#

hello
I've got this error and I don't really understand what it means
I tried to put the variable time in an array with np.array and it doesn't work, even with a list
thank you in advance ! 😊

old grove
#

if train accuracy and test accuracy comes up 95 and 86, Should we assume that the model is underfitting ? what should be min difference between train and test accuracies so they are general fit ?

north stirrup
#

Hi, I am doing audio cleaning with U-Net (tensorflow, pytorch, soundfile, ...). It gives me an Nvidia GPU dependency and I have a Radeon AMD GPU. Does someone know how I can train a neural net with my GPU? It's my first time doing this so I might be doing something wrong... Some context could help me too

lapis sequoia
#

Hello I know "python" (notice the quotes😅) and I would like to start learning artificial intelligence. Any good resources that you recommend to kick off? Thank you very much in advance

teal mortar
# lapis sequoia Hello I know "python" (notice the quotes😅) and I would like to start learning a...
freeCodeCamp.org

“Good friends, good books, and a sleepy conscience: this is the ideal life.” ― Mark Twain. I hope you’re reading this blog in your pjs looking forward to a rejuvenating and healthy weekend. I have been working on multiple projects lately, from creating Machine Learning Engineering and Machine Learning Operations courses

lone drum
#

hello, i have pandas dataframe in that i have date column
01-02-2017
01-02-2017
01-02-2017
01-02-2017
01-02-2017
02-02-2027
02-02-2017
03-03-2017
04-02-2017
04-02-2017
04-02-2017
04-02-2017
04-02-2017
...
...
...
27-02-2018
27-02-2018
27-02-2018
this way

#

i want to add blank row before new date start

#

my expected op is

#
01-02-2017
01-02-2017
01-02-2017
01-02-2017
01-02-2017

02-02-2027
02-02-2017

03-03-2017

04-02-2017
04-02-2017
04-02-2017
04-02-2017
04-02-2017
...
...
...
27-02-2018
27-02-2018
27-02-2018
this way
#

ping me when replying

hollow sentinel
#

I still don’t understand why confusion matrices are presented differently

#

Like what

#

OH I SEE

#

THE PREDICTED AND THE ACTUAL ARE FLIPPED ON THE AXES

#

🤯🤯🤯🤯
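
A small sketch of the two layouts with made-up labels (`confusion` is a hypothetical helper): one puts actual classes on the rows, the other is its transpose.

```python
def confusion(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

actual = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 1]

rows_actual = confusion(actual, predicted, labels=[0, 1])
# The other layout (rows = predicted) is simply the transpose.
rows_predicted = [list(col) for col in zip(*rows_actual)]
```

Neither layout is wrong. scikit-learn's `confusion_matrix` uses rows for actual and columns for predicted, so it helps to check the axis labels whenever reading someone else's matrix.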

austere swift
#

if only it were 100mb larger

boreal summit
#

Hello everyone, I am tryna get myself acquainted with ML pipelines using TF & Apache-beam...so I'm reading this book. When I got to the TF transform part, the code didn't run well. So I went to the official TF website and tried to run their own example code which also failed.

#

I don't know if there's a way around this.

austere swift
#

can you send the error that you got?

boreal summit
#

Google should have updated their website.

#

If anyone has a workaround for this, I'd appreciate it. Thanks

austere swift
lapis sequoia
#

numpy.core._exceptions.MemoryError: Unable to allocate 820. KiB for an array with shape (100, 350, 3) and data type float64
I have this error, and google searches only bring me to unresolved issues or unanswered questions. Any ideas? lol

austere swift
austere swift
#

either make the array smaller (by reducing the precision or lowering the amount of values somehow) or get more memory

lapis sequoia
boreal summit
#

TFX versions below 1.0.0 were for experimental purposes which was stated, versions above 1.0.0 are what anyone should learn.

lapis sequoia
#

with nothing else running and more than enough spare mem

boreal summit
#

Just like on the tensorflow website,

austere swift
austere swift
boreal summit
#

TypeError: object of type 'NoneType' has no len()

boreal summit
#

That's the error I am getting.

austere swift
#

unless you want to make a lot of code changes to make it work with the new versions

#

the simplest solution to your issue is to just use the same version

lapis sequoia
#

okay i was being very not intelligent

austere swift
#

you can double check that by checking your memory usage and seeing if it's full

lapis sequoia
#

yup just re ran it there with the memory usage in front of me and it makes its way up to 100, then i get that error. Cheers!

austere swift
#

in your case, I'd recommend going down to float32 precision, float64 is usually not necessary
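
For scale, a quick sketch using the array shape from the error message: the float64 version is the 820 KiB that failed to allocate, and float32 halves it.

```python
import numpy as np

arr64 = np.zeros((100, 350, 3), dtype=np.float64)
arr32 = arr64.astype(np.float32)

kib64 = arr64.nbytes / 1024   # 8 bytes per value
kib32 = arr32.nbytes / 1024   # 4 bytes per value
```

A single array of this size is small, so the failure suggests that many such arrays (or something else) had already filled memory by that point.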

lapis sequoia
#

ohhh i see

#

much appreciated

lapis sequoia
#

anyone here good with encrypted strings?

serene scaffold
teal mortar
#

does anyone know a good mlops book?

boreal escarp
#

hey can someone help me figure out why my code isnt working? it has to do with the x and y variable, but i am not sure how to fix it. thanks !
```py
def covariance(x,y):

    # Find the mean of series x and y
    mean_x = sum(column_x) / float(len(column_x))
    mean_y = sum(column_y) / float(len(column_y))

    # subtract the mean from the individual elements
    sous_x = [i - mean_x for i in x]
    sous_y = [i - mean_y for i in y]

    # Build the numerator and denominator to get the covariance formula
    nume = sum([sous_x[i] * sous_y[i] for i in range(len(sous_x))])
    denom = len(x) - 1
    cov = nume / denom
    return cov

with open('nicotinic1.csv') as nicotinic_1:
    fonction = covariance(x,y)
    print("La covariance du fichier nicotinic_1 est: ", fonction)
```
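
One likely cause, offered as a hedged sketch: inside `covariance` the means are computed from `column_x` and `column_y`, which are never defined, while the parameters are named `x` and `y`; and `x` and `y` themselves are never read from the file. The layout of nicotinic1.csv isn't shown, so the two lists below are made up:

```python
def covariance(x, y):
    # Use the function's own parameters for the means.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, divided by n - 1 (sample covariance).
    nume = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return nume / (len(x) - 1)

# Hypothetical columns; in the real script these would be parsed from the CSV.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov = covariance(x, y)
```

Reading the two columns could be done with `csv.reader` or `pandas.read_csv`, depending on how the file is structured.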

odd meteor
native lava
#

I've got a data merge problem I can't seem to find a straight forward solution to. Datetime based records, main dataset every 5 minutes. Set to be merged is every 15 minutes and timestamps don't match exact. I want to merge with existing dataset filling in blanks with average values. I know I want to use pandas, but I'm really new to that, only a couple months experience. DB is MySQL running on a Linux server. Main app is based on Flask everything else is "pure" Python. I'm good at following rabbit holes, but I could use some advice on where to start and a direction to go in.

odd meteor
noble dove
#

hello

odd meteor
# old grove if train accuracy and test accuracy comes up 95 and 86, Should we assume that th...

Underfitting? Nah it's far from that. If it was underfitting your train set won't even smell a 95% accuracy 😊

Underfitting is like asking a 2 year old to solve MANOVA (Multivariate Analysis of Variance) when the child's brain is still too young to handle such complex task. Now, what do you think would happen?

The child will definitely perform woefully on the task. You can relate this to your model. If your model isn't robust and flexible enough to capture complex patterns in your dataset then it's most likely bound to underfit.

For your train data to hit a 95% accuracy score, do you now see it's far from underfitting? 😊

old grove
odd meteor
noble dove
#

i need help

odd meteor
odd meteor
hollow sentinel
rough mountain
#

There are many methods you can use to compare two images in ML (Siamese NN, CNNs, etc.) What I cannot figure out is comparing a large number of images (without retraining) to find images of a different object. The best way I can describe this is a few shot learning problem without retraining. Any ideas?

#

My only real idea is to use an RNN and have it memorize some of the required features of an image while it parses through all of them. I would also likely have to ensemble multiple RNNs with different sets of images in case the first RNN starts off on the outliers.

Probably not a good solution

wooden cosmos
#

Hello, i have a question regarding community detection in a bipartite graph.
Let's assume, that we have a set U of elements connected to a set of elements V. We define a proximity function for (a,b)∈U^2 such that F(a,b) are close if a and b map to the same elements in the set V, the more elements in V a and b map to - the higher the proximity (or lower the distance between a and b). Then we get an adjacency matrix which shows us the weight of every edge between the elements of the set U . I assume that there are communities in this graph, but i don't know how many there are.

for info, the set U contains 500k nodes, V contains 4k nodes

How do i detect communities and what is the most accurate way to represent the results?

Since I have so many nodes that I can't simply represent this data as a graph (it would be a complete mess), I was thinking about taking a node, putting the node in a N-dimensional space, then adding the neighbouring nodes according to the proximity to the first node and repeating this process until I embed all the nodes into my N-dimensional space [but that looks kinda like an NP problem (correct me if I'm wrong)]. Then I could use UMAP to detect the communities and a projection into 2D space to represent the results

upbeat dove
#

I'm a bit confused as to how CNNs work in terms of passing information onto other layers.
If you pass 64 feature maps to another convolutional layer, how does it interpret that?

old grove
odd meteor
old grove
dreamy bone
#

hi can anyone help me?

#

what does this do?

#
import pandas as pd
import numpy as np

def adder(ele1,ele2):
   return ele1+ele2

df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
df.pipe(adder,2)
print df.apply(np.mean)
#

the pipe part i mean

serene scaffold
#

!docs pandas.DataFrame.pipe

arctic wedgeBOT
serene scaffold
#

so it's the same as adder(df, 2)
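
A short sketch of that equivalence with a small made-up frame. Note that `pipe` returns a new object, so its result has to be assigned to be of any use:

```python
import pandas as pd

def adder(ele1, ele2):
    return ele1 + ele2

df = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.0, 4.0]})

piped = df.pipe(adder, 2)   # df is passed as the first argument of adder
direct = adder(df, 2)       # the same call written out by hand
```

`df` itself is left unchanged by either call.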

glass minnow
#
np.array(arr)[:,np.newaxis]
: = selecting every element in row
np.newaxis = it is creating new object at new column 

correct ?
#

can someone please help me understand this
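
A hedged sketch of what that expression does, using a made-up list: `:` keeps every element along the existing axis, and `np.newaxis` inserts a new axis of length 1 rather than creating new data, so a flat array becomes a single column.

```python
import numpy as np

arr = [1, 2, 3]
flat = np.array(arr)            # shape (3,)
column = flat[:, np.newaxis]    # shape (3, 1): same values, one per row
```

`flat[np.newaxis, :]` would instead give shape (1, 3), a single row.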

median quail
#

the CNN takes its input layer and processes it with a convolution filter, basically a small, shallow neural network applied to each pixel on an image

#

blurs in games or things like bokeh effects are kinda like this too

#

for a CNN these filters are learned, as they're a small neural network
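
To make the 64-feature-map case concrete, here is a naive sketch with random numbers standing in for learned weights (the sizes and `conv2d_valid` are illustrative, not how a framework implements it). Each filter in the next layer spans all 64 input channels and sums over them, so every filter produces one new feature map:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(8, 8, 64))    # height, width, 64 input channels
kernels = rng.normal(size=(3, 3, 64, 16))     # 16 filters, each covering all 64 channels

def conv2d_valid(x, w):
    """Naive convolution (cross-correlation), no padding, stride 1."""
    kh, kw, _, n_out = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]                # (3, 3, 64)
            out[i, j, :] = np.tensordot(patch, w, axes=3)   # sum over height, width, channel
    return out

output = conv2d_valid(feature_maps, kernels)   # 16 new feature maps
```

Bias terms and the activation function are left out to keep the channel handling visible.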

dreamy bone
#

But I know the few diffs between py2 and py3.10

dreamy bone
serene scaffold
ashen umbra
#

hi, can anyone tell me why I am getting this for row 4 here?

#

It's supposed to be characters like the other rows values

#

pls lmk if you need any other info to help out

spiral olive
#

any good short and best ml course for python

dreamy bone
#

Freecodecamp (:

#

I'm learning from there right now

#

Also GeeksforGeeks, tutorialspoint, w3schools (:

lapis sequoia
spiral olive
#

k

teal mortar
lapis sequoia
#

not sure if this is the right channel but

can I get the exact number of the line when I read a line and how?

Can I check for spaces in a string?

pastel valley
#

yo what is the difference of test and validation samples?
is the validation dataset required?

austere swift
#

So unless you’re doing automated hyperparameter tuning, a validation set isn’t really necessary

hexed schooner
#

can anyone explain what is data pipeline between SQL and python? what is it for and how to implement it (by luigi perhaps?)

pastel valley
austere swift
#

i.e number of layers, nodes per layer, number of conv filters, filter size, batch size, etc

#

would also include parameters for the optimizer like learning rate or which optimizer to use

prisma mist
#

how do i know when to normalize a dataset?

odd meteor
# prisma mist how do i know when to normalize a dataset?

When your features aren't in the same unit. You could have a feature whose unit is in secs, another in kg, another in weight, another in joules etc...

You'd have to normalize your data to at least give each feature a level playing ground for optimum performance before training your model with the data.

By doing so, any feature whose contribution is subpar or insignificant to your model performance won't feel so jelly or discriminated against if you decide to use your veto power to disqualify such feature from your magnificent project moving forward (pun intended) 😀

But I hope you get the point now
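
A minimal sketch of one common form of normalization, standardization to z-scores, with made-up values in two very different units. `standardize` is a hypothetical helper; scaling to a 0..1 range is another option:

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

seconds = [30.0, 60.0, 90.0]
kilograms = [1000.0, 2000.0, 3000.0]

z_seconds = standardize(seconds)
z_kilograms = standardize(kilograms)
```

After rescaling, both features cover the same range even though the raw numbers differ by orders of magnitude. Scale-sensitive models (distance-based or gradient-based ones) tend to benefit most; tree-based models usually care much less.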

pastel valley
austere swift
pastel valley
#

thank you very much sir @austere swift

pastel valley
#

btw if i created a model with an input size of 500x500x3 and i used that model in a mobile application where the input will be the camera captures, is that ok?

lapis sequoia
bold timber
#

why i get an error like this? AttributeError: 'numpy.ndarray' object has no attribute 'unique'
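
`.unique()` is a method of pandas Series, not of NumPy arrays. For an `ndarray` the equivalent is the function `np.unique`. A short sketch with a made-up label array:

```python
import numpy as np

labels = np.array([3, 1, 2, 3, 1, 3])

# np.unique returns the sorted distinct values; return_counts adds how often each occurs
values, counts = np.unique(labels, return_counts=True)
print(values)
print(counts)
```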

arctic wedgeBOT
#

Hey @errant path!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

odd meteor
pastel valley
hollow ember
#

Can someone help me out with this one

bold timber
bold timber
#

In this case, I want to build an image recognition model where the maximum pixel value is 255. So I am dividing X by 255 for normalization. @odd meteor

dreamy bone
#

do you guys think it'd be a decent idea to add statistics in my resume as well after I finish ML/NLP/Data science libraries? 😛

lone drum
#

hello my dataframe this way

#

i want to groupby data by age column

#

i tried this way
print(df.groupby('age').head(10))
but i am getting
       name  marks  subject  age
0      amar     78    maths   45
1      ajay     56  physics   56
2     kiran     36  science   20
3    pankaj     41    hindi   78
4     kiran     20    maths   23
5      amar     78  physics   45
6    pankaj     63    hindi   12
7    sanket     41  science   12
8     sahil     85    maths   20
9     kiran     26    hindi   84
10     amar     45  science   45
11   pankaj     98    maths   41
12  swapnil     14    hindi   30
13     amar     21    maths   56
14     sham     40    hindi   56
15   sanket     85    maths   45
16   pankaj     42  science   23
this way

#

ping me when replying
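
`groupby('age').head(10)` returns the first 10 rows of every group in their original order, so with small groups the frame comes back looking unchanged. What to do instead depends on the goal, which isn't stated; the sketch below shows two guesses (a per-age summary, and the same rows listed age by age) on a shortened copy of the data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["amar", "ajay", "kiran", "amar", "sanket"],
    "marks": [78, 56, 36, 78, 85],
    "subject": ["maths", "physics", "science", "physics", "maths"],
    "age": [45, 56, 20, 45, 45],
})

# one row per age: how many entries and their mean marks
summary = df.groupby("age")["marks"].agg(["count", "mean"])
print(summary)

# or keep every row but list them age by age
grouped_rows = df.sort_values("age", kind="stable")
print(grouped_rows)
```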

lapis sequoia
#

And it helps in ml anyways.!

serene scaffold
#

Find the most common subject for each age or what?

odd meteor
vague relic
#

Could anyone please clearly distinguish between data science and data analytics? I've searched online. But the definitions available are vague.

dreamy bone
#

im not a 100 percent sure but that probably is the gist of it

vague relic
serene scaffold
vague relic
dreamy bone
#

They mean the same thing

midnight heath
#

code:

import os
from os import listdir
from os.path import isfile, join

import matplotlib.pyplot as plt


def show_path_line_count():
    global folder_path
    total_linecount = 0
    labels, sizes = [], []  # one pie slice per file
    filename = None

    files = [f for f in listdir(folder_path) if isfile(join(folder_path, f))]
    for filee in files:
        try:
            with open(join(folder_path, filee), "r", encoding="UTF-8") as file:
                # count the lines of this file only (starts from zero for every file)
                line_count = sum(1 for _ in file)

            filename = os.path.basename(filee)
            total_linecount += line_count

            labels.append(filename)
            sizes.append(line_count)

        except Exception as e:
            # unreadable or binary files are reported and skipped
            print(f"Couldn't linecount {filee} | {e}")

    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
    ax1.axis('equal')

    plt.title(f'Total lines: {total_linecount}')
    plt.show()

    return total_linecount, filename

I want to make it so that the piechart only displays top 5 values and then displays the rest as "other". How would I do that?
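
One way to do this is to sort the slices by size, keep the five largest, and sum the remainder into a single slice before calling `pie`. A sketch in plain Python; the helper name `top_n_with_other` and the sample file names are invented for illustration.

```python
def top_n_with_other(labels, sizes, n=5, other_label="other"):
    """Keep the n largest slices and merge everything else into one slice."""
    pairs = sorted(zip(labels, sizes), key=lambda pair: pair[1], reverse=True)
    top, rest = pairs[:n], pairs[n:]
    top_labels = [label for label, _ in top]
    top_sizes = [size for _, size in top]
    if rest:  # only add the extra slice when something was cut off
        top_labels.append(other_label)
        top_sizes.append(sum(size for _, size in rest))
    return top_labels, top_sizes


labels = ["a.py", "b.py", "c.py", "d.py", "e.py", "f.py", "g.py"]
sizes = [120, 15, 300, 40, 8, 75, 2]
pie_labels, pie_sizes = top_n_with_other(labels, sizes)
print(pie_labels, pie_sizes)
```

The two returned lists can then be passed to `ax1.pie(pie_sizes, labels=pie_labels, ...)` in place of the full `sizes` and `labels`.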

flat patrol
#

My ANN model is always getting 100% accuracy. How do I get a more accurate accuracy?

austere swift
#

it's likely that it's overfitting

#

and having a small dataset can also do that (conceptually, its easier to get 100% accuracy if you have 5 samples than if you have 5000)

flat patrol
austere swift
#

can you send some code?

flat patrol
#

I am using model.evaluate()

#

here is a screenshot of the code and output:

austere swift
#

that all looks right

#

also you only have 1810 samples, not 1 million

flat patrol
#

wait, I must have done something wrong

#

it should be larger

#

let me check

austere swift
#

also make sure you don't have your labels in your training inputs, i've done that before and it can be hard to debug

flat patrol
#

Okay

#

Oh, originally my data was over 1 million values, but I had to cut a lot of it down because there were null values

#

Also, I get a memory error if I try to load in the whole dataset, as it is over 100 million rows

#

Do you know how I could get around this?

austere swift
#

if you don't have enough memory to load in your dataset, there isnt really much you can do about it other than just getting more memory

flat patrol
#

Alright

austere swift
#

if its in pandas you can try to use the low_memory parameter

flat patrol
#

By the way, when I change the number of layers and stuff, the accuracy stays the same. How many layers should I have?

flat patrol
left shadow
#

Hey guys

#

I'm using opencv2 to find a certain colour on a map and only show it whilst blacking out everything else

#

but one of the colours is just making the whole screen black

#

Im using BGR2HSV

quiet vault
#

What should the output layer of a object detection model look like? (Amount of nodes and activation function)

tight flare
#

send your code

left shadow
#

yes

#
import cv2
import numpy as np

img=cv2.imread("img.png")

choose = input("Which area: ").lower().strip()

def richplaces():

    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lowerrange = np.array([25,157,1])
    upperrange = np.array([130,255,255])

    mask = cv2.inRange(hsv,lowerrange,upperrange)

    cv2.imshow("Image", img)
    cv2.imshow("Mask", mask)

    cv2.waitKey(0)
    cv2.destroyAllWindows()

def middleclass():

    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lowerrange = np.array([0,121,255])
    upperrange = np.array([130,255,255])

    mask = cv2.inRange(hsv,lowerrange,upperrange)

    cv2.imshow("Image", img)
    cv2.imshow("Mask", mask)

    cv2.waitKey(0)
    cv2.destroyAllWindows()

def poverty():

    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lowerrange = np.array([179,149,251])
    upperrange = np.array([130,255,255])

    mask = cv2.inRange(hsv,lowerrange,upperrange)

    cv2.imshow("Image", img)
    cv2.imshow("Mask", mask)

    cv2.waitKey(0)
    cv2.destroyAllWindows()


if choose == "rich":
    richplaces()

if choose == "middle":
    middleclass()

if choose == "poor":
    poverty()
tight flare
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

left shadow
#

This is the map

tight flare
#

so what's the problem? Your code looks fine

left shadow
#

we can't mask the pink places

#

it shows the whole screen as balck

#

black

tight flare
#

That means that your upper and lower values aren't right, it's not detecting pink anywhere

#

where did you get these HSV values?

left shadow
#

We got the colour from the map

#

we got its rgb values

#

using colourpicker

#

it works well for the rich and middle class

tight flare
#

Well you should get new values

#

Are you trying to mask an HSV image with RGB values?
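
If the values did come from an RGB colour picker, they need converting to OpenCV's HSV scale first (for 8-bit images H runs 0 to 179, S and V run 0 to 255). Every component of the lower bound also has to be less than or equal to the matching component of the upper bound, otherwise `cv2.inRange` matches nothing; in `poverty()` above the lower hue is 179 while the upper hue is 130, which would explain an all-black mask. A rough sketch of the conversion using only the standard library; the helper names, the tolerances and the sample pink are made up for illustration, and hue wrap-around near red is not handled.

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Convert 0-255 RGB to OpenCV's 8-bit HSV scale (H 0-179, S and V 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


def hsv_bounds(hsv, h_tol=10, s_tol=60, v_tol=60):
    """Build (lower, upper) bounds around a colour, clamped to the valid ranges."""
    h, s, v = hsv
    lower = (max(h - h_tol, 0), max(s - s_tol, 0), max(v - v_tol, 0))
    upper = (min(h + h_tol, 179), min(s + s_tol, 255), min(v + v_tol, 255))
    return lower, upper


pink_rgb = (255, 105, 180)  # hypothetical value read from a colour picker
pink_hsv = rgb_to_opencv_hsv(*pink_rgb)
lower, upper = hsv_bounds(pink_hsv)
print(pink_hsv, lower, upper)
```

`lower` and `upper` can then be wrapped in `np.array(...)` and passed to `cv2.inRange(hsv, lower, upper)`.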

dreamy bone
#

This is a long shot lol, but would anyone wanna work on some sort of software product that incorporates ai,ml,NLP, data science into it? I mean it could look cool on your resume, and you'd gain some hands on exp lol

lapis sequoia
#

Is there any nice and precise website where we can find out info about state-of-the-art models for various tasks? as an example state-of-the-art model for machine translation. (please ping me if answered. thanks.)

lapis sequoia
lapis sequoia
lone drum
#

hello i have a dataframe with date column in it

#

this way

#

i want to add a blank line before new date

#

my expected output this way

#

how i can do this ping me when replying

#

blank row before new date

lapis sequoia
#

Guys can u suggest me a free data analytics course

serene scaffold
#

from pandas's perspective, this is like trying to add entries with all empty strings to an SQL database.

#

if you provide the code as text, we could offer alternatives.

#

however I don't see a "yes" outcome, so the function might as well not do any boolean logic and just return "no".

lone drum
# serene scaffold you might look into openpyxl instead of pandas, because that isn't the kind of t...

hii i tried this way ```python
def add_blank_rows(df, no_rows):
    df_new = pd.DataFrame(columns=df.columns)
    for idx in range(len(df)):
        df_new = df_new.append(df.iloc[idx])
        for _ in range(no_rows):
            df_new=df_new.append(pd.Series(), ignore_index=True)
    return df_new

df = pd.read_csv('pandas_dataframe.csv', names=['date', 'names', 'age', 'city'])

df_with_blank_rows = add_blank_rows(df, 1)

print(df_with_blank_rows)```

#

but i am getting
          date   names  age    city
0         date   names  age    city
1          NaN     NaN  NaN     NaN
2   01-01-2017    amar   23  mumbai
3          NaN     NaN  NaN     NaN
4   01-01-2017   ankit   24     goa
5          NaN     NaN  NaN     NaN
6   02-01-2017    ajay   25    pune
7          NaN     NaN  NaN     NaN
8   02-01-2017  sameer   26  nashik
9          NaN     NaN  NaN     NaN
10  02-01-2017   ankit   24     goa
11         NaN     NaN  NaN     NaN
12  02-01-2017    ajay   25    pune
13         NaN     NaN  NaN     NaN
14  03-01-2017    ajay   25    pune
15         NaN     NaN  NaN     NaN
16  04-01-2017  sameer   26  nashik
17         NaN     NaN  NaN     NaN
18  05-01-2017   ankit   24     goa
19         NaN     NaN  NaN     NaN
20  05-01-2017    ajay   25    pune
21         NaN     NaN  NaN     NaN

serene scaffold
lone drum
#

i want to add a blank row before each new date starts
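
The function above appends a blank row after every row because it never looks at the date. A possible alternative is to walk the rows and only emit a blank one when the date differs from the previous row. This is a sketch that assumes the frame is already sorted by date; the function name and the sample data are made up.

```python
import pandas as pd


def add_blank_row_before_new_dates(df, date_col="date"):
    """Insert one empty row wherever the value in date_col changes."""
    rows = []
    previous_date = None
    for record in df.to_dict("records"):
        # a blank row goes in front of every date change except the very first row
        if rows and record[date_col] != previous_date:
            rows.append({col: None for col in df.columns})
        rows.append(record)
        previous_date = record[date_col]
    return pd.DataFrame(rows, columns=df.columns)


df = pd.DataFrame({
    "date": ["01-01-2017", "01-01-2017", "02-01-2017", "03-01-2017"],
    "names": ["amar", "ankit", "ajay", "sameer"],
    "age": [23, 24, 25, 26],
})
result = add_blank_row_before_new_dates(df)
print(result)
```

The stray `date names age city` row at index 0 of the output above suggests the CSV has its own header line; passing `header=0` together with `names=[...]` to `pd.read_csv` should replace that header instead of reading it as data.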

serene scaffold
odd meteor
# flat patrol Also, I get a memory error if I try to load in the whole dataset, as it is over ...

Since you mentioned your data is too much, how about loading your data in batches in Pandas and subsequently doing Batch Training?

Presuming you're working with a tabular data, this should solve the problem in TensorFlow

import pandas as pd
import numpy as np
for batch in pd.read_csv('Jubitron.csv', chunksize= 10000) :
    target = np.array(batch['your_target_column'], np.float32)
    feats = np.array(batch['your_feature_column'], np.float32) 

You can increase or decrease the chunksize if you so wish.

You can also check online on how to do batch training with image or sound dataset. I think this should solve your low_memory problem.

odd meteor
odd meteor
amber echo
#

I am trying to get into AI and I am wondering if there are any online resources to help me start coding AI programs such as neural networks or linear regression type stuff. like some sort of youtube video series or some course I can possibly pay for

amber echo
#

im looking for specifics so I can actually begin coding

#

ive watched stuff on neural networks and such

#

and i know what they are

#

i just need some direction in the realm of actually applying such things

worldly dawn
worldly dawn
rough mountain
nimble tulip
#

.paste

delicate sphinx
#

Hello, does anyone have any ideas on how best to load images in for an image batching process?

#

I'm running a model on the MSCOCO dataset and have preprocessed image features to (256,256,3) which has given me .npy files that are 292 KB in size each, I'm loading in over 240,000 images each 292 KB which as you can expect, is bottlenecking my performance

#

(Those 240,000 to train on come from 80,000 input images and each image is used 3 times, realistically I load ~80,000 and duplicate the loaded array 3 times as opposed to loading in 240,000 images which I would definitely never do 👀👀)

loud cave
#

Are you using tensorflow?

delicate sphinx
#

I am indeed, sorry should've said

#

My times

#

I have a sort of work-around, namely that every 2 epochs (1h20m) I save the model weights and load them in the next runthrough

#

so instead of ~1600 minutes to train it fully on 40 epochs * 40 minutes per epoch, I can chunk it

loud cave
#

Isn't it a standard practice that one thread is reading data while another feeds it to the GPU? Like suppose your GPU can do a train step for a batch in one second, ideally you will have the next batch ready to feed immediately from the other thread?

delicate sphinx
#

I tried to use pooling but I didn't see any improvements which may be due to the structure of my loop

loud cave
#

It's been a while since I messed with it, but I think tf datasets can be configured to have this behavior

delicate sphinx
#

i tried to use a parallel map dataset

#

but that kept giving me incompatible errors

loud cave
#

Also if I recall with some of the configurations, they require one full pass before they take effect. I might be thinking of .cache() with that

delicate sphinx
#

iirc there was a type of batch = Dataset.from_tensor_slices()

#

and then a batch.map(lambda item1 : numpy_function( load_function, [item1], [float32]))

#

but on trying to do .prefetch(batch_size) and passing that to my .fit/.train_on_batch it just wasn't a fan

loud cave
#

hmm

delicate sphinx
#

in fact, more information that may help

#

but when i did it before

#

everytime I called .prefetch it would just never finish prefetching

#

and that may have been something i did wrong but it made me too scared to try it at all so i avoided it, so i should definitely have a look though i havent really changed my structure much except I do an inner loop to train x times on the batch (as it's already loaded)

delicate sphinx
#

so if I do im_batch = img_ds.batch(batch_size)

#

if i do im_batch = img_ds.prefetch(buffer_size = batch_size)

#

If i do the batch and prefetch earlier and try to subscript it

delicate sphinx
#

Thank you for trying to help, think I'll have to give up on that idea, tried about 30 different methods and none would work, most of the time it was giving me this error: (my input list to the mapping function would be a list of files i.e. "dir/numpyfile.npy"

austere swift
delicate sphinx
austere swift
#

so if you have a batch size of 32, rather than loading in 32 files, put 32 images in each file and only load in one file

#

it would be much faster

delicate sphinx
#

I want my batch sizes to be variable

#

as it depends on the PC that runs it

#

on Google Colab I could do batch sizes of 1 before it crashed

#

on my PC I'm doing 513

#

I've just figured out how to do Dataset.map

austere swift
delicate sphinx
#

though it depresses me because the thing it replaces was very impressive haha

austere swift
#

or batch sizes that are less than one chunk

#

best case would be to just put them all in one file, although that would take quite a bit of memory

delicate sphinx
#

tbh

#

this project is already 180GB

#

I'm not too worried about memory hahaha

austere swift
#

how much ram do you have?

delicate sphinx
#

I can load every numpy file into memory but that puts me over 90% memory

#

32 GB

austere swift
#

that wont handle all of it

delicate sphinx
#

mind you, i can only get 1 copy of each image

#

in ram at once

#

hence how it fit

austere swift
#

240,000 * 292kb = 70gb

delicate sphinx
#

yeah but 3 questions per image so the image appears 3 times

#

so if i load the image once it fits in ram

austere swift
delicate sphinx
#

yeah true, unless I do what i sort of did in one of my functions that counts how many questions each image has and copies it that many times

#

so that when it loads in from that file it would copy it three times

#

and something like np.memmap from what ive read might actually work with that

#

(but i havent used it)

austere swift
#

what I would do is have it saved as a .npz file with all of the numpy arrays inside of that, since loading a .npz file doesnt load the arrays into memory until you try to assign it to a variable (npz files act similar to dicts, with names of arrays as keys and the arrays as values)

#

so it would be a single file still, but you'd be able to load in individual arrays without loading everything into memory

#

although I haven't experimented with having more than a couple arrays in a single .npz file so i'm not sure how it will handle 240k
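
A small sketch of the `.npz` idea described above, using tiny stand-in arrays and a temporary directory; the array names (`img_0`, `img_1`, ...) are invented. `np.load` on an `.npz` returns a dict-like object, and an array is only read from disk when its key is accessed.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "images.npz")

# hypothetical stand-ins for the preprocessed image arrays
arrays = {f"img_{i}": np.full((4, 4, 3), i, dtype=np.float32) for i in range(5)}
np.savez(archive_path, **arrays)

with np.load(archive_path) as archive:
    names = sorted(archive.files)  # listing the keys does not load any array
    third = archive["img_2"]       # only this array is read from disk here

print(names, third.shape)
```

Each key access reads the array from the file again, so anything used repeatedly is better kept in a variable.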

delicate sphinx
#

npz is apparently on par with hdf5 for it

#

which are both meant for huge datasets

#

however im still not sure

#

part of me wants to try loading it in batches though i do know thats inefficient in the long run

#

Thank you very much, if the method im trying out now doesn't work i may look into that though, hope I didn't sound ungrateful 🙂

delicate sphinx
#

Does anyone know how I can make my map function for img_ds take multiple inputs? My dataset is 240,000 images but each one is used 3 times in a row, therefore I only need to load 80,000 images and just need to "copy" or duplicate the first value two more times. Any help is much appreciated as this .map function has seriously improved my I/O bottleneck issues by 4x the speed, though I can only load them 1 by 1.
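
A framework-agnostic way to think about this is to load each image once and duplicate only its index, so every image shows up three times in a row without being read again. The sketch below does that with NumPy on tiny stand-in arrays; the function name and sizes are made up. In `tf.data` the analogous step would probably be a `flat_map` over the loaded images with `Dataset.from_tensors(image).repeat(3)`, though that is not tested here.

```python
import numpy as np


def repeated_batches(images, repeats=3, batch_size=4):
    """Yield batches in which every image appears `repeats` times in a row."""
    # indices like [0, 0, 0, 1, 1, 1, ...]; only the index is duplicated up front
    order = np.repeat(np.arange(len(images)), repeats)
    for start in range(0, len(order), batch_size):
        yield images[order[start:start + batch_size]]


# two tiny stand-in "images", filled with 0s and 1s so they are easy to tell apart
images = np.stack([np.full((8, 8, 3), i, dtype=np.float32) for i in range(2)])
batches = list(repeated_batches(images))
print([batch.shape for batch in batches])
```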

#

Don't want to needlessly ping you but if you read this peace_within_reach I cannot thank you enough haha, 1 epoch now takes about 1/3rd or 1/4th of the time it took before and that's loading 240,000 images!

lone drum
#

Hello I want to insert a row in data frame where condition becomes false

#

How I can do

#

My code this way

#

I am getting this error

#

Ping me when replying

#

Can anyone please look into this

#

Ping me when replying

dreamy bone
#

The speed of python in this area is super fast right? Matching c?

swift oxide
#

no probably

#

the syntax is easy for python

#

but python is slower than the other popular languages

dreamy bone
#

In this area? In machine learning and data analysis?

#

I thought the AI libraries were made in C?

errant path
fervent compass
#

Hi there, i'm doing a Project part for a Masters Course. The Project in general is about digital quality management: A 5-Axis cnc machine is cutting a Part. Machine Data is collected through a Edge Device and then used to create a dot cloud/stl that itself is then surveyed/measured and compared with the measurements of the real part. My part in the project is using the raw machine data and creating a ML algorithm that does a predictive decision on "if the collected Data is sufficient to create accurate measurements through the digital twin". Sadly i feel ill prepared through previous courses for handling such a specific topic and do not even know where to begin. Thus i could use some pointers on how to proceed with checking/preparing the Data, what algorithms could be used to get a useful result, and so on. If you have questions or suggestions (for possibly useful tutorials on how to get started or such) feel free to respond here or in a DM. Any help is highly appreciated.

warm jungle
# dreamy bone The speed of python in this area is super fast right? Matching c?

If you use numpy then vectorised calculations over numpy arrays are fast (it's all implemented in C under the hood). If you do stuff in pure python then you're probably paying some constant factor overhead compared with implementing the same algorithm in C. But often with big data the real question is how things scale with the size of your data (big - O behaviour), rather than what the details of the constant factors are.
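
To illustrate the point, here is the same calculation written as a pure-Python loop and as a single vectorised NumPy call. Both give the same answer; timing them (for example with `timeit`) typically shows the NumPy version to be much faster, though the exact ratio depends on the machine. The function names are made up.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every multiplication goes through the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Vectorised: the loop runs inside NumPy's compiled code."""
    return float(np.dot(values, values))


values = np.arange(100_000, dtype=np.float64)
loop_result = sum_of_squares_loop(values)
numpy_result = sum_of_squares_numpy(values)
print(loop_result, numpy_result)
```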

dreamy bone
dreamy bone
#

If I used them in a large scale proj where I manipulate them

fervent compass
silk yoke
#

for a given row in Pandas dataframe, how do I return the column name that has the highest value?

the code i have right now only returns the highest value for a given row, but not the name of the column that the value belongs to

print(df.loc[139, :].max(axis = 0))
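
`idxmax` returns the label of the maximum instead of its value, so on a single row it gives the column name. A short sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"cat": [0.1, 0.7, 0.0], "dog": [0.5, 0.2, 0.3], "fish": [0.4, 0.1, 0.9]},
    index=[138, 139, 140],
)

best_for_row = df.loc[139].idxmax()              # column name of the max in one row
best_per_row = df.idxmax(axis=1)                 # the same for every row at once
top_two = df.loc[139].nlargest(2).index.tolist() # several top columns for one row
print(best_for_row, top_two)
print(best_per_row)
```
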
soft haven
#

Hi there, I'm currently learning about assumption tests for linear regression and use the Durbin-Watson test as one of them. The problem is that I have to compare the d value to the Durbin-Watson table (dL and dU) manually. Is it possible to do that comparison automatically in Python? Btw I use the stats.stattools.durbin_watson() function from statsmodels to get the d value.
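
As far as I know statsmodels does not ship the dL/dU table, so an exact table lookup would need the values typed in by hand. A common shortcut is a rule of thumb around d = 2. The sketch below computes d directly from residuals and applies such a rule; the 1.5 and 2.5 cut-offs are a convention chosen for illustration, not the table values, and the function names are made up.

```python
import numpy as np


def durbin_watson_stat(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def rough_reading(d, low=1.5, high=2.5):
    """Rule-of-thumb reading; the cut-offs are a convention, not the dL/dU table."""
    if d < low:
        return "possible positive autocorrelation"
    if d > high:
        return "possible negative autocorrelation"
    return "no strong sign of first-order autocorrelation"


d = durbin_watson_stat([1.0, -1.0, 1.0, -1.0])
print(d, rough_reading(d))
```

If a p-value is preferred over a table, the Breusch-Godfrey test in `statsmodels.stats.diagnostic` may be worth a look.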

odd meteor
# soft haven Hi there, i'm currently learn about assumption tests for linear regression and ...

This kinda reminded me of my parametric and non-parametric test class 😊. Sadly, we did ours manually with pen and paper, and kinda played around it with SPSS (nothing too serious then)

Unfortunately, I don't know how to use statsmodels to carry out the Durbin-Watson test... However, if you wouldn't mind using SPSS to figure this out, I'm sure there's a plethora of YouTube videos that explain it concisely using software like SPSS.

All the best ✌️

odd meteor
silk yoke
#

it's in relation to tf-idf where each row is a sentence and every column is a word, so i'd want to get the words with the highest score for each sentence

odd meteor
soft haven
odd meteor
rough pilot
#

anyone know how to calculate percent similarity between two columns in dataframe
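
"Percent similarity" can mean several things. Assuming it means the share of rows where the two columns hold the same value, comparing the columns and taking the mean of the boolean result is enough. The column names `a` and `b` are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z", "w"], "b": ["x", "y", "q", "w"]})

# True/False per row; the mean of booleans is the fraction of matches
percent_equal = (df["a"] == df["b"]).mean() * 100
print(percent_equal)
```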

#

rebrushing on python

night gorge
#

I am trying to make a linear regression model and getting an error
"Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
This is my code, can anyone explain what is happening:

x=df1['Add2(in Thousands)']

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=42,test_size=0.25)

from sklearn import linear_model
lr=linear_model.LinearRegression()
model= lr.fit(x_train,y_train)
predictions=model.predict(x_test)
odd meteor
# night gorge I am trying to make a linear regression model and getting an error "Reshape your...

Just as the error message reads, the problem is the shape of x. df1['Add2(in Thousands)'] is a Series, so x_train is 1-D, but scikit-learn expects the features as a 2-D array of shape (n_samples, n_features). y can stay 1-D.

You can confirm this by printing x.shape and y.shape before the split.

x = df1[['Add2(in Thousands)']]  # double brackets give a one-column DataFrame
# or: x = df1['Add2(in Thousands)'].values.reshape(-1, 1)

should fix it. Then rerun the train-test split and train your model again

night gorge
#

@odd meteor Thanks a lot..

dreamy bone
#

oh man im almost done with the data analysis libraries like numpy, pandas, matplotlib and seaborn, im so pumped to be learning machine learning (:

worn herald
#

im just getting started with data analysis libraries and i wanted to know how i could make a pie chart out of a csv file

#

for example making a pie chart out of the amount of times the diff groups in exgrupo (second column) appear, im still struggling with this whole concept hehe

pseudo wren
#

I would like someone to look over my data science curriculum

#

And say what they think of the work

pallid bison
#

guys im new

#

and im a kid

#

i want to develope a programming language in Python

#

how do i do it

#

i built the parser

#

and the lexer

#

can some1 help me please

#

and i build ai

#

too

arctic wedgeBOT
#

Hey @pallid bison!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

pallid bison
#

go here

#

that my project link

loud cave
worn herald
#

ohhh tyy!!

rough mountain
#

When creating a Siamese Network you create a model and instantiate it twice.

I wish to create a layer that will instantiate the model a variable number of times based on the input. It will then take the outputs of all of those models and do the distance math returning a fixed number of outputs.

Important note: The model is not pretrained, the goal is training that model.

rigid zodiac
#

!paste

serene scaffold
rigid zodiac
#

!paste

serene scaffold
#

@rigid zodiac the problem is that row is an empty list, as row[-1] would work if there was at least one element.

#

that, or row[-1] returns an integer that is out of range for expected

rigid zodiac
#

so how can I fix it?

serene scaffold
#

I do not know

#

I would probably use a step-through debugger

rigid zodiac
#

I suspect it has do with this error. I have change it to different number but still 😦

odd meteor
# pallid bison i want to develope a programming language in Python

I'm not sure I understand you properly. Python is already a programming language, how can you develop another programming language inside Python?

Or do you mean, you wanna develop another programming language like python? If so? Why do you wanna embark on such journey? Any specific reason? 😀

PS: IDK how to develop a new programming language myself

serene scaffold
#

but that's not really a data science question 😛

serene scaffold
#

this is still off-topic shrug2

dreamy bone
#

right sorry

serene scaffold
#

#algos-and-data-structs would be the nearest equivalent as it's also for the discussion of theoretical computer science in general.

wild pine
#

Do you know how to make Poetry and Torch behave? Or do you know of another PEP517 compliant tool to handle dependency/environment management?
I just really dislike the idea of splitting those two tasks between different tools, but Torch has some cursed packaging practices that doesn't seem to work well with a pyproject.toml file.
Basically cuda support relies on a specific pip flag to fetch the wheels from another URL

worn herald
#

datos = datos[datos["OcupacionEconomica"]!="SIN DATO"]
datos = datos[datos["OcupacionEconomica"]!="SIN DATO MINDEFENSA"]
    
df=pd.DataFrame(datos, columns=["TipoDeDesmovilizacion","ExGrupo","AnioDesmovilizacion","Sexo","SituacionFinalFrenteAlProceso","DepartamentoDeResidencia","MunicipioDeResidencia","BeneficioTRV","BeneficioFA","BeneficioFPT","BeneficioPDT","OcupacionEconomica","DesembolsoBIE","NumDeHijos","TotalIntegrantesGrupoFamiliar"])

grafico= df.pivot_table(columns=["OcupacionEconomica"], aggfunc="size")
plt.pyplot.bar(grafico)
plt.pyplot.show()

print(grafico)

#

im trying to make a bar graph with the following code

#

but i get an error that says TypeError: bar() missing 1 required positional argument: 'height'

#

how do i solve that?

regal ingot
#

1 gram model

worn herald
#

what does that mean?

#

do yall know if pivot tables can be assigned to the x and y axes for graphs?

upbeat dove
#

Anyone know a place where I can learn reinforcement learning OTHER THAN Q-LEARNING

tidal bough
upbeat dove
#

Alright I'll see

#

Wondering because I want to learn better techniques since I doubt any good reinforcement learning trained AI used Q-Learning

worn herald
#

datos = datos[datos["OcupacionEconomica"]!="SIN DATO"]
datos = datos[datos["OcupacionEconomica"]!="SIN DATO MINDEFENSA"]
grupos =sorted(datos["OcupacionEconomica"].unique())
grupos_dict = dict(list(enumerate(grupos)))


datos.columns = ["TipoDeDesmovilizacion","ExGrupo","AnioDesmovilizacion","Sexo","SituacionFinalFrenteAlProceso","DepartamentoDeResidencia","MunicipioDeResidencia","BeneficioTRV","BeneficioFA","BeneficioFPT","BeneficioPDT","OcupacionEconomica","DesembolsoBIE","NumDeHijos","TotalIntegrantesGrupoFamiliar"]

conteo = datos.groupby(["OcupacionEconomica"]).count()
print(conteo)

conteo.plot.bar()
plt.pyplot.show()

#

can someone tell me whats wrong with this code?

#

im trying to do what the person who responded showed

#

but instead im getting this kind of graph

#

which shows all the columns within each bar

loud cave
#

Maybe you should do

conteo = datos['OcupacionEconomica'].value_counts()

instead and try plotting that

worn herald
#

IT WORKEDDD

#

TYSMMM

mighty spoke
#

Hi does a definition with a nested loop work inside a for loop as I was trying but only plotted graphs when i took out the definition and return

desert oar
hallow sparrow
#

could someone explain what k-means clustering does and what types of dataset we need for clustering?

serene scaffold
#

@hallow sparrow it's where you have points in space and the algorithm tries to figure out which groups of points are close together

#

Here's an example for k = 5
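
For a feel of what the algorithm does internally, here is a bare-bones version of the usual k-means loop (assign every point to its nearest centroid, then move each centroid to the mean of its points). It is a teaching sketch with fixed starting centroids and made-up 2-D points, not a replacement for a library implementation.

```python
import numpy as np


def kmeans(points, initial_centroids, n_iter=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(initial_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # distance from every point to every centroid: shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
    return labels, centroids


points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(points, initial_centroids=[[0.0, 0.0], [10.0, 10.0]])
print(labels)
print(centroids)
```

It works on any dataset whose rows can be treated as numeric points where distance is meaningful, which is also why scaling the features first usually matters.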

rough mountain
#

Could someone link me to a keras implementation of magnet loss?

serene scaffold
rough mountain
rough mountain
pallid bison
#

but more easier to learn

loud cave
#

I think languages like lisp, ocaml, scheme, haskell, etc are popular choices for writing new language. Scala if you want something that runs on the JVM

rough mountain
pallid bison
#

IDK

serene scaffold
#

@pallid bison this channel isn't for discussing language design

pseudo wren
#

So I’m applying for a school, and they want to know if my data science program now covers the math topics that are necessary for me to be admitted to the course

#

I’d love if someone could help me review this syllabus so I could talk about what I should know and what my expectations should be

dreamy bone
pallid bison
#

but...

dreamy bone
#

don't look here lol, start from the very basics...go to freecodecamp on youtube and type in python tutorial for beginners

regal ingot
#

anyone got any knowledge on temporal difference q learning

robust shoal
#

anyone have any experience scraping data from a forum? What type of backend would I need to scrape data from a forum every day and automatically push it to my website? Also, what technologies would I use to scrape the data? Beautifulsoup?

lone drum
#

Hello I am using pandas between time function
I want to check for two different time intervals in my data frame how I can do this?

#

For eg i want to check for time interval between 09:15:00 to 15:28:00 and 18:30:00 to 19:28:00
This two time interval data i need

#

How I can get this?

#

Ping me when replying
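
`between_time` handles one interval per call, so one option is to call it twice and concatenate the results. The sketch assumes the frame has a `DatetimeIndex`; the sample timestamps and the `price` column are made up.

```python
import pandas as pd

index = pd.to_datetime([
    "2021-12-06 09:00:00", "2021-12-06 09:15:00", "2021-12-06 12:00:00",
    "2021-12-06 15:29:00", "2021-12-06 18:45:00", "2021-12-06 20:00:00",
])
df = pd.DataFrame({"price": [1, 2, 3, 4, 5, 6]}, index=index)

# both end points are included by default
morning = df.between_time("09:15:00", "15:28:00")
evening = df.between_time("18:30:00", "19:28:00")

# stitch the two selections back together in time order
both = pd.concat([morning, evening]).sort_index()
print(both)
```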

atomic tide
#

@errant path Please don't advertise without getting prior permission from the admins. Thanks.

dreamy bone
#

Hi so I don't have a master's in data science but I'm learning machine learning on my own. Would that get me a role of data scientist or would it be better for my if I decided to switch to software dev?

uneven thistle
#

Can someone tell me the solution for this problem ?

sharp stratus
shrewd lily
#

Hi There! Does someone of you have experience with calculating post-hoc tests in python and could help me out? 🙂

dreamy bone
#

When would we use map or applymap over apply?

#

Apply just seems like a superset to me right now
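
Roughly: `Series.map` works element by element on one column (and accepts a dict for lookups), `DataFrame.apply` hands the function a whole column or row, and `applymap` is the element-wise version for an entire DataFrame (newer pandas releases call it `DataFrame.map`). A small sketch of the first two on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.map: element-wise on one column; a dict works as a lookup table
labels = df["a"].map({1: "one", 2: "two", 3: "three"})

# DataFrame.apply: the function receives a whole column...
column_ranges = df.apply(lambda col: col.max() - col.min())

# ...or a whole row when axis=1
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(labels.tolist(), column_ranges.tolist(), row_sums.tolist())
```

So `apply` can imitate `map`, but `map` states the element-wise intent more clearly and is usually the simpler choice for a single column.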

lone drum
#

How to drop rows of pandas dataframe which contains specific time value
For eg i want to drop rows which has 15:29:00 value in time column

#

Ping me when replying

regal ingot
#

anyone know q learning

dreamy bone
#

Or you could use a filter mask

#

Like this

#

df = df[df[time_column] != '15:29:00']

#

I think that should work?

lone drum
#

Rows are not getting removed

#

Ping me when u reply

#

Can anyone help me in this?
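
Since the screenshot isn't in the log, these are guesses: either the filtered result is not assigned back to a variable, or the `time` column does not actually hold the string `'15:29:00'` (for example it holds `datetime.time` objects or has stray whitespace). The sketch below shows the second case and one way around it; the column names and values are made up.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "time": [datetime.time(15, 28), datetime.time(15, 29), datetime.time(15, 30)],
    "close": [100, 101, 102],
})

# comparing time objects with a string never matches, so nothing is dropped
unchanged = df[df["time"] != "15:29:00"]

# normalise to text first, then keep everything except 15:29:00
as_text = df["time"].astype(str).str.strip()
filtered = df[as_text != "15:29:00"].reset_index(drop=True)
print(len(unchanged), filtered)
```

Printing `df['time'].head()` and `df['time'].dtype` is a quick way to check which case applies.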

lapis sequoia
#

!e

import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4]})
print(df[df.a <3])
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 |    a
002 | 0  1
003 | 1  2
lapis sequoia
lone drum
#

I tried this way but rows are not getting removed

#

Rows are not getting removed

#

Ping me when reply

rigid zodiac
lone drum
#

Ping me when u reply

rigid zodiac
#

probably will be something like data[data['time']== ####].drop(axis=0,inplace = False)

rigid zodiac
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @vital dove until <t:1638881921:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

errant path
vital basin
shrewd lily
#

Is there an equivalent function in Python to R's emmeans ?

serene scaffold
#

@shrewd lily what does that do

shrewd lily
#

Calculate the estimated marginal means

lapis sequoia
#

Is there any way I can pull the list items in jupyter notebook's markdown shell rather than putting it manually?

lapis sequoia
#

for example list_a = [2, 4, 2, 4, 6, 6]

I want to loop over to this list inside a markdown shell

#

@rigid zodiac

#

basically, I want to generate a table with outputs
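
One way to get a table out of a list without typing it by hand is to build the Markdown text in a code cell. This is a sketch, and the `index` and `value` headers are placeholders.

```python
list_a = [2, 4, 2, 4, 6, 6]

# Build a Markdown table row by row from the list.
lines = ["| index | value |", "| --- | --- |"]
lines += [f"| {i} | {v} |" for i, v in enumerate(list_a)]
table_md = "\n".join(lines)
```

In a Jupyter code cell the string can then be rendered with `display(Markdown(table_md))` after `from IPython.display import Markdown, display`.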

bold timber
#

How to fix this error?

serene scaffold
#

@bold timber what do you think the error message is trying to tell you?

bold timber
serene scaffold
#

@bold timber you passed key word arguments that don't do anything

rigid zodiac
lapis sequoia
bold timber
serene scaffold
next lance
#

Is it worthwhile to create an app that can find rooms and houses for rent around someone and show them the map to get to that place

We can also filter by price and location

This app can be really helpful for both buyer and room seller

#

I can also add more things then just rooms like hotels, flats or apartments

#

Is it a good idea

#

😋

dreamy bone
#

anyone wanna work on a data science/ai project? I've always loved the idea of creating AI so after I finish up the theory of machine learning, i definitely wanna invest my time in making stuff (:

serene scaffold
dreamy bone
#

oh yeah for sure, open source, i just wanna get some hands on exp (:

teal mortar
teal mortar
dreamy bone
#

I mean im not making anything new, im just gonna look at what idea looks coolest on the internet 😛

gritty spear
dreamy bone
lone drum
#

i tried this way ```python
Traceback (most recent call last):

File "F:\nifty_banknifty\remove time values.py", line 4, in <module>
df = df[df['time']== '15:29:00'].drop(axis=0,inplace = False)

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 4308, in drop
return super().drop(

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\generic.py", line 4145, in drop
raise ValueError(

ValueError: Need to specify at least one of 'labels', 'index' or 'columns' ``` @rigid zodiac

#

@dreamy bone hello , can we discuss again?

dreamy bone
#

uh sure (:

#
just do df = df[df.time != '15:29:00']
(This filters out the dataframe so it doesn't have that value in it)
lone drum
dreamy bone
teal mortar
# dreamy bone how about you?

mostly repeating, didn't do ML for a while, but know a bit of scikit-learn, pandas, numpy , tensorflow and pytorch

dreamy bone
#

ah im going through each thing thoroughly and making notes so i can just refer to that, takes a little bit longer, but its really efficient studying

teal mortar
dreamy bone
dreamy bone
#

in pm or should i show here?

teal mortar
#

here, maybe other people will give better recommendations

dreamy bone
# lone drum this worked thanks

you forgot to assign the returned value, i think. Basically doing df[df.time!='date'] doesnt change in place, it returns a different value so you gotta store or print out to see it
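
To make the point about assignment concrete, here is a small sketch on a made-up frame. It assumes the `time` column holds plain strings. The earlier ValueError came from calling `drop()` without telling it which labels to drop, so the second variant passes the index of the matching rows.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["15:28:00", "15:29:00", "15:30:00"],
    "close": [101.0, 102.0, 103.0],
})

# Boolean mask: keep every row whose time is NOT 15:29:00.
# The result is a new frame, so it has to be stored somewhere.
filtered = df[df["time"] != "15:29:00"]

# Equivalent with drop(): pass the index labels of the rows to remove.
dropped = df.drop(df[df["time"] == "15:29:00"].index)
```

Neither call touches `df` itself, which is why the rows appear to stay unless the result is assigned back.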

#

sure

#

just give me a sec

lone drum
dreamy bone
#
1) Comments
2) Variables
3) Data Types
4) Numbers
5) Casting
6) Strings
7) Booleans
8) Operators
9)Lists
10) Tuples
11) Sets
12) Dictionaries
13) Loops
14) Functions (declared and anonymous, i.e. lambda), Generators
----> stuff like return, continue, break, pass

15) Objects and Classes
16) OOP stuff
17) Data structures and algorithms
18) Multi-threading and multi-processing
19) Modules like math, cmath, os, file, string (i have to do JSON, just realized: i made the stupid mistake of saving my file in the beginning (like the first day i had begun) as json.py, so i couldn't use the actual json library), random, string, statistics, collections, itertools, sys, formatting strings
20)number methods like bin, oct, hex, etc.
21)datetime, time modules
22) list, dict comprehensions
23)wrapper functions 
24) I regret wasting time in GUIs tho :(
25) Also Exception Handling

So, basically from all that, I conclude that i still gotta learn about regex, JSON and sockets.

I'm doing all the AI libraries now, and after I finish the theory ill do django before doing projects in Data Science/AI with Python. (:

Any advice? 😛 pixels_snek_2

#

you probably left haha, it took a while to write 😛 @teal mortar

teal mortar
#

and there are more advanced books after

dreamy bone
#

i use freecodecamp :P, it's got a load of content for django and i think for ml also

#

but first i read through the text

#

ill save the django link on my notepad then

#

i just bookmarked nvm lol

teal mortar
#

for ML you can start with Andrew Ng's course on coursera, I believe it is now free

#

but I would go with a book too

dreamy bone
#

oh im going through websites actually

#

geeksforgeeks has some really great content

teal mortar
#

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow is quite good for beginners

dreamy bone
teal mortar
dreamy bone
#

ill use that too if i can get some hands on even though freecodecamp has i think 3 ml models we can make in the process of watching the ml vid

#

well im gonna take a break from learning and get back to it tomorrow

#

feel free to add me if you wanna make some cool ML models in the future (:

teal mortar
teal mortar
#

learned something new, try to apply it

dreamy bone
#

yeah first solid understanding of theory, i.e., syntax and concepts like k-means clustering, linear regression and stuff is really key

#

first ill try to understand all the theory, and then ill go for implementation (:

teal mortar
#

in case of python visit codewars.com to get used to solving some problems

#

or leetcode

dreamy bone
#

i use leetcode (:

#

Basically my end goal by next year October is :

1) Python (OOP, DSA, Django, AI libraries)
2) Java (OOP, DSA, Hibernate, Spring, Springboot) 
3) SQL
4) Frontend stuff like Typescript, HTML, CSS
5) Rust in my free time, no rush as jobs in this are usually for seniors (:
#

and C++

teal mortar
#

C++ could take all that time :), I wouldn't go into Rust if you don't need it

dreamy bone
#

I suppose, the sequence is given priority-wise and im not gonna rush anything for sure

#

C++ would be above rust tho

teal mortar
#

in case of SQL learn postgreSQL and you'll be fine

dreamy bone
#

i use sql server at work actually

teal mortar
#

well, in that case you know better 🙂

dreamy bone
#

I really think using pandas would be more efficient than sql tho, we have millions of rows of data, so i think pandas could reduce the query time from like 50 seconds to maybe like 10 -20 secs idk

teal mortar
#

html you can learn 80-90% of the stuff in like 5 days

#

css a bit more

dreamy bone
#

let's see how it goes, either way, big mncs that have java software dev jobs ask for like 2 years of hands-on exp so ive got plenty of time

#

and python is solid for data science

teal mortar
#

I would go with javascript instead of java, if you want to go the django route

dreamy bone
#

typescript is actually a superset of javascript

#

ts is statically typed

#

imma add you and we can get to making ml models from next year then (:

teal mortar
#

ok 🙂

fast pawn
#

hey guys, I'm a data science fresher, just wanted to know if adding Linux in my resume would be beneficial or not?

#

under technical skills*

dreamy bone
#

i mean i added ms excel lol

wicked grove
dreamy bone
#

but ms excel does need to be learnt :3

fast pawn
#

yeah, i do know MS excel

wicked grove
fast pawn
#

But my daily driver is Arch Linux, would this add any weightage? or should i just not mention it?

dreamy bone
#

do you get a certification afterwards?

wicked grove
wicked grove
#

Some days i finish a week's work in a day or 2

dreamy bone
#

oh its paid, ill do it after i learn everything then lol

wicked grove
#

Yeahh it's pretty good
I think you have to pay only if you want the certificate

dreamy bone
#

good luck then (:

#

btw you should check this website out. it's got a nice amount of theory on the matter, might help

#

and use youtube (:

wicked grove
wicked grove
wicked grove
dreamy bone
#

never heard of it!

lone drum
dreamy bone
#

No prob

teal mortar
wicked grove
#

Alrightt, thank you so much! I will go through bothe

#

*both

teal mortar
#

you are welcome

pastel valley
#

the concept of svm is like a vector and distance to something right?

serene scaffold
#

and then when you go to make predictions, it uses the vectors at the edges of the regions (the "support vectors") to make decisions about which region the point you're trying to predict for is in

pastel valley
serene scaffold
#

because what ultimately matters is where the boundaries are for each region
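
A small scikit-learn sketch of that idea, using made-up points: after fitting a linear SVM, only the points sitting on the margin are kept as support vectors, and new points are classified by which side of the boundary they land on.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (made-up points).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

# Only the points on the margin are kept as support vectors.
support = clf.support_vectors_

# New points are classified by which side of the boundary they fall on.
pred = clf.predict([[0.5, 0.5], [4.5, 4.5]])
```

Printing `support` shows a subset of the training points, which is why the interior points of each cluster do not affect the boundary.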

pastel valley
#

sry my imagination is just poor please bear with me 😅
here

thorn coral
#

hi

#

can someone help me with training a linear regression model using sklearn?

pastel valley
#

the violet will go to green class because the nearest point is the green on that region?

#

oh i see nice nice thank you sir

paper badger
#

for ppl who are currently in uni: would you advise someone to choose computer science or data science as a major?

bronze skiff
#

has anyone used the python API for apache flink?

#

any roadbumps opposed to the scala API?

#

(would rather avoid pure java if possible)

bronze skiff
paper badger
bronze skiff
#

pure math in undergrad/grad

rigid zodiac
#

do a lot of applied statistics

#

and computer science

paper badger
#

ok thanks @bronze skiff

odd meteor
odd meteor
thorn coral
#

@odd meteor I've done that training part, the thing left is making 2 plots from crime dataset, what exactly should I plot? I can't get.. can u suggest?

rough pilot
#

how do you replace one column value with another
for example i'm setting a for loop like:
for i in df['a']:
replace (df['a'], df['b']

serene scaffold
#

any time you're trying to do something with a DataFrame, start with the assumption that there are no loops involved and wait to be proven wrong
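
For the column-replacement question, a loop-free sketch could look like this. The frame and the `-1` placeholder value are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, -1, 3, -1], "b": [10, 20, 30, 40]})

# Whole-column replacement: no loop needed.
copied = df.assign(a=df["b"])

# Conditional replacement: take b only where a holds the placeholder -1.
mask = df["a"] == -1
df.loc[mask, "a"] = df.loc[mask, "b"]
```

The mask version changes only the selected rows and leaves the rest of column `a` as it was.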

odd meteor
# thorn coral @odd meteor I've done that training part, the thing left is making 2 ...

This is where you'll let your data visualization skills shine. So you might wanna visualize X3 using a bar chart, visualize the trend in reported violent crime (X2) using a line plot, visualize the distribution of X1 or X2, visualize X5 using a scatter plot etc...

You can just think of any other useful visualization and plot it. You could add a little flex by using plotly to make your visualizations interactive

tawny hollow
#

Someone with science / math background able to help with this: https://stackoverflow.com/questions/70264206/golden-section-moving-average-with-python-numpy - Trying to reproduce the Golden Section Moving Average with Python/NumPy. Thank you for your time.

regal ingot
#

in temporal difference q learning what does a max mean

regal ingot
#

i got a cat and a dog in a four square room. the first trial states they're in the same square and the dog moves down
the reward stated earlier is +1 if they're not together and -1 if they are
is the initial reward +1 since the action was moving the dog out or -1 since they were together

charred umbra
regal ingot
#

a datascience major would be just comp sci and stat courses i guess

#

seems suckish

pallid shuttle
#

Hello I'm relatively new to AI and I've been learning from resources that you facilitated some days ago. I saw that there is a common pattern which consists of prototyping and modelling machine learning algorithms using tools like Octave or MATLAB instead of implementing those algorithms straight away in your desired programming language (Python). The purpose of doing this is to create a functional solution and then translate it into your desired programming language code so that you don't have to start from the very beginning which is a bit time consuming. Is that correct? I'm currently playing around with Octave and it looks cool but I'm afraid that I might not be using it considering that Python has great external libraries like tensorflow and so on...

regal ingot
#

damn

odd meteor
# pallid shuttle Hello I'm relatively new to AI and I've been learning from resources that you fa...

Octave was built with Matlab compatibility in mind. I'm presuming you're using Andrew Ng's ML course on Coursera to learn. 😀

Coding in octave was actually one of the things that threw me off Andrew Ng's course from the get go plus I struggled to understand at the beginning. He was using Octave to code and I wasn't particularly interested in being language agnostic when I started learning ML.

In my opinion, Andrew Ng's Coursera course is good for understanding the core Math and Statistics + Theory behind most fancy ML algorithms we use.

If you're interested in Python, then you're probably better off starting with Udemy courses or Kaggle.

pallid shuttle
#

But I will swap to kaggle or udemy as you said I don't like the fact that is being language agnostic

regal ingot
#

@odd meteor u got any knowledge on Q learning in RL

odd meteor
regal ingot
#

nice

crisp cargo
#

I'm really struggling with understanding and implementing mini-batch gradient descent within python (without Scikit learn) for a class, if anyone has experience and would be willing to help me out or give any sort of guidance, it would be greatly appreciated. feel free to drop me a message if you do not want to clog the chat here

To give more context:
I have my vector of targets and a design matrix containing my input variables. I have an initial grid of variables to try for the learning rate, regularization parameter, and num of data points. I believe this is a correct approach? I just don't really understand the math behind the algorithm and thus how to convert it to python code.
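
As a starting point, here is a minimal NumPy sketch of mini-batch gradient descent for L2-regularized linear regression. It assumes a squared-error loss, and the function name, parameter names and synthetic data are all made up for illustration, so it may not match the class's exact setup. Each step takes a random batch, computes the gradient of the loss on that batch only, and moves the weights a small step against it.

```python
import numpy as np

def minibatch_gd(X, t, lr=0.01, lam=0.0, batch_size=32, epochs=100, seed=0):
    """Fit weights w minimising mean squared error plus lam * ||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            Xb, tb = X[batch], t[batch]
            # Gradient of mean((Xb @ w - tb)^2) + lam * ||w||^2 on this batch.
            grad = 2 * Xb.T @ (Xb @ w - tb) / len(batch) + 2 * lam * w
            w -= lr * grad
    return w

# Synthetic check: recover known weights from noiseless data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
t = X @ true_w
w_hat = minibatch_gd(X, t, lr=0.05, batch_size=20, epochs=200)
```

A grid search would then just call this function once per combination of learning rate, regularization parameter and batch size, and compare the error on held-out data.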

vale swallow
#

Whats the preferred solution for making dynamic type dashboard for data presentation and manipulation?

#

I am looking at things like Flask but not sure if there are easier tools

#

Something I could maybe share with someone who isnt familiar with py or jupyter

mortal dove
#

I have a dataframe with multi indexed columns
I can get a new dataframe from the lower level columns with df['c1']
However I can't do that to filter on the higher level column. Any suggestions?

Dataframe creation is from an API, so I can't change that

#

Simple way is just calling df['c1']['a1'] for each lower level column and creating a new dataframe with that, but I'm wondering if there's a more built in method of doing this?

mortal dove
#

Found a solution to my own problem, can use df.swaplevel(axis=1)
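
For anyone hitting the same thing, a sketch with invented column labels: `swaplevel(axis=1)` flips the two column levels so the former inner level can be selected with plain `[]`, and `xs` is an alternative that selects on a given level directly.

```python
import pandas as pd

# Hypothetical two-level columns: outer level c1/c2, inner level a1/a2.
cols = pd.MultiIndex.from_product([["c1", "c2"], ["a1", "a2"]])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# Swap the two column levels, then select by what used to be the inner level.
swapped = df.swaplevel(axis=1)
a1_via_swap = swapped["a1"]

# xs() selects on a chosen level directly, without reordering anything.
a1_via_xs = df.xs("a1", axis=1, level=1)
```

Both results keep one column per outer label (`c1`, `c2`).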

cursive gazelle
#

Hi! Is there a way to pull strings from a dataframe column and have them return as normal strings?

mortal dove
#

Do you want to pull the column names?

cursive gazelle
#

No, just the values inside that column

#

So I have a column of 200k+ titles (they are strings) but only want a random sample of 10. The catch is I need them only as basic strings. When I do to_string, it returns them as one large string instead of their individual titles so that's not what I want

vale swallow
#

Anyone used 'streamlit' before?

mortal dove
#

Could use to_numpy() to get an array of the strings

cursive gazelle
#

That doesn't work either. It can't be in an array, list, tuple, etc

#

Only as a string itself

mortal dove
#

Do you want one string with 10 title names?

cursive gazelle
#

I want 10 strings of the title names

mortal dove
#
titles = df['title'].to_numpy()
selection = np.random.choice(titles, size=10, replace=False)

If this isn't what you want, could you give a small example of the format the data should be in?

cursive gazelle
#

That wasn't it, unfortunately.

This is an example from the docs

```
[
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]
```
mortal dove
#

Looks like a json file?

titles = df['title'].to_json(orient='records')
cursive gazelle
#

Ahhh, it was!! I think it will work now

#

Thank you so much! thanks

mortal dove
#

It's not clean so hopefully someone has a better solution, but I'd do it this way to get 10

import json

titles = df['title'].to_numpy()
selection = np.random.choice(titles, size=10, replace=False)
json_file = json.dumps(selection.tolist())
cursive gazelle
#

I think I will do this
titles = df['title'].sample(10).to_json(orient='records')
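
A slightly fuller sketch of the same idea, with a made-up stand-in for the titles column: `sample` draws the rows, `tolist()` gives a plain Python list of `str`, and `to_json(orient='records')` gives one JSON array string.

```python
import json
import pandas as pd

# Stand-in for the 200k-title column.
df = pd.DataFrame({"title": [f"Title {i}" for i in range(100)]})

# sample() draws without replacement; random_state makes it repeatable.
picked = df["title"].sample(10, random_state=0)

as_list = picked.tolist()                   # list of plain Python str
as_json = picked.to_json(orient="records")  # one JSON array string
```

Which of the two is needed depends on whether the consumer expects Python strings or JSON text.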

mortal dove
#

yup, much cleaner, lol

serene scaffold
#

yeah I was about to point out sample

#

my work here is done

#

goodbye

cursive gazelle
#

Lol, thanks again!

mortal dove
#

Yea, I'm not familiar enough with pandas to always know what to use, lol

cursive gazelle
#

Yeah, I just started learning too and someone else told me about sample() lol

serene scaffold
#

was probably me on an alt

cursive gazelle
#

Are you my mentor 👀

serene scaffold
#

no but all staff members are lemons' alts

mortal dove
#

Was that 2019? 2018? Can't even remember

serene scaffold
#

idk. 2020 was where we had social distancing in the off-topic channels

grave frost
mortal dove
#

Oh, that looks interesting. I'll give it a shot if not answered yet tomorrow

limpid root
#

Anyone know why I would be getting ValueError: Unknown label type: (array([...]), )? I'm using sklearn and it keeps on getting this error

#

the actual array is float64 and shape (400, )

#

for some reason when I put it in as the y of a .fit then it doesn't work

serene scaffold
limpid root
#
File "filepath\main.py", line 166, in get_model
    model.fit(X_train, y_train)
  File "filepath\venv\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py", line 752, in fit
    return self._fit(X, y, incremental=False)
  File "filepath\venv\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py", line 393, in _fit
    X, y = self._validate_input(X, y, incremental, reset=first_pass)
  File "filepath\venv\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py", line 1131, in _validate_input
    self._label_binarizer.fit(y)
  File "filepath\venv\lib\site-packages\sklearn\preprocessing\_label.py", line 301, in fit
    self.classes_ = unique_labels(y)
  File "filepath\venv\lib\site-packages\sklearn\utils\multiclass.py", line 102, in unique_labels
    raise ValueError("Unknown label type: %s" % repr(ys))
serene scaffold
limpid root
#

which parts?

serene scaffold
#

some region of the code that includes line 166

#

enough to establish the context.

#

like what X_train and y_train are

limpid root
#
X_train, X_test, y_train, y_test = train_test_split(data["X"], data["y"], test_size=0.2)
model = MLPClassifier(hidden_layer_sizes=(2, 2), max_iter=1000)
model.fit(X_train, y_train)
#

like that?

serene scaffold
#

yeah

#

is data a DataFrame?

limpid root
#

yep

serene scaffold
#

do print(data.head().to_dict('list')) and show the text, please

limpid root
#

{'X': [[9.905686378], [66.33001336], [21.75396634], [33.24767143], [2.689293441]], 'y': [[99.59781655], [8.039968478], [13.02439705], [86.38519375], [57.15746171]]}

serene scaffold
limpid root
#

Not sure. Dataframe is from csv

#

idk if that changes things

serene scaffold
#

do data = data.applymap(lambda x: x[0]) to get everything out of the lists

#

then try again

limpid root
#

{'X': [9.905686378, 66.33001336, 21.75396634, 33.24767143, 2.689293441], 'y': [99.59781655, 8.039968478, 13.02439705, 86.38519375, 57.15746171]}

serene scaffold
#

yes, that looks better

limpid root
#

Oh I just remembered why they're in lists

#

I did a thing where the data might actually look like this:
{'X': [[9.905686378, 99.59781655], [66.33001336, 8.039968478], [21.75396634, 13.02439705], [33.24767143, 86.38519375], [2.689293441, 57.15746171]], 'y': [[1079.75072], [677.9094591], [745.1665871], [1196.777592], [1132.214797]]}

#

I'm pretty sure that the shape of x for .fit is supposed to be 2d

#

found it in docs

#

shape of X_train is (400, 2) and y_train is (400, 1) when it's put into .fit

serene scaffold
#

if X is a Series of lists, that's not the same thing as it being a 2d-array-like.

limpid root
#

I know, but I do some stuff to data before putting it in so that it is a good shape

serene scaffold
#

and it won't interface with sklearn correctly

#

so, don't do that before

#

It should be noted that the X column is still one dimensional. The fact that it contains lists does not make it two-dimensional.
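
To illustrate the difference, here is a sketch with shortened, made-up values: the column of lists has shape `(n,)`, and it only becomes a real 2-D array once the nested lists are converted explicitly.

```python
import numpy as np
import pandas as pd

# A column whose cells are Python lists.
data = pd.DataFrame({
    "X": [[9.9, 99.6], [66.3, 8.0], [21.8, 13.0]],
    "y": [[1079.8], [677.9], [745.2]],
})

# The Series itself is 1-D with dtype object ...
series_shape = data["X"].shape  # (3,)

# ... so build real float arrays from the nested lists.
X = np.array(data["X"].tolist(), dtype=float)          # shape (3, 2)
y = np.array(data["y"].tolist(), dtype=float).ravel()  # shape (3,)
```

Arrays built this way have the `(n_samples, n_features)` and `(n_samples,)` shapes that `.fit` expects.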

limpid root
#

Right now, X_train looks like this right before going into fit:

```
[[54.39736997, 99.64921956],
 [53.00488272, 46.58886973],
 [24.22203264, 88.99648647],
 [71.8330977,  28.51141576],
 ...]
```
serene scaffold
limpid root
#

<class 'numpy.ndarray'>

serene scaffold
#

okay, what about print(X_train.shape)?

limpid root
#

(400, 2)

serene scaffold
#

try doing model.fit(X_train, y_train.reshape(-1)) @limpid root

limpid root
#

same error

serene scaffold
#

do print(y_train)

limpid root
#

[1187.151578, 568.100153, 685.7766812, 626.1073536, 1199.412295, 1543.641543, 1261.285556, 350.4392658 ...]

#

this is y_train after .reshape(-1)

#

shape of y_train is (400, )

serene scaffold
#

and you're still getting the same error that you showed at the beginning? if so, I don't think I'll be able to debug this remotely.

limpid root
#

I think so

#

Error is still ValueError: Unknown label type: (array([1116.166069 , 830.6421689, 1152.047414 ...]), )

serene scaffold
#

why is that a tuple

limpid root
#

I have no idea

#

I'm just passing y_train into fit

#

and y_train looks the same as what I sent just now

#

aight I gotta go for now, I'll continue banging my head on this tomorrow
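
One thing that may be worth checking, as a guess from the traceback rather than something confirmed in the chat: the targets are continuous floats, and scikit-learn classifiers such as `MLPClassifier` raise "Unknown label type" when `y` is continuous. If the goal is to predict a number, a regressor is the matching estimator. The data below is synthetic and the layer sizes are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in: 400 rows, 2 features, continuous float target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(400, 2))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, size=400)

# scikit-learn's own view of the target: "continuous" means regression.
target_kind = type_of_target(y)

# A regressor accepts continuous targets where a classifier would not.
model = MLPRegressor(hidden_layer_sizes=(8, 8), max_iter=200, random_state=0)
model.fit(X, y)
preds = model.predict(X[:5])
```

If the values really are class labels, they would need to be converted to integers or strings before going into a classifier.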

regal ingot
#

i got a csv with 2 cols of strings

#

how do i split each row into a dictionary

atomic tide
serene scaffold
#

there's the to_dict method

dreamy bone
regal ingot
#

i got it

#

um for the left column how do i check if the values in the string are the same

#

so WW returns +1

#

XV return -1
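
A plain-Python sketch for that check; the function name is made up. Putting the characters into a set leaves exactly one element when they are all the same.

```python
def same_letter_reward(pair):
    """Return +1 if every character in the string is identical, else -1."""
    return 1 if len(set(pair)) == 1 else -1

rewards = [same_letter_reward(p) for p in ["WW", "XV", "AA"]]
```

With pandas the same function could be applied to a column via `.map(same_letter_reward)`.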

dreamy bone
#

Hi ^ don't mean to interrupt your comment but I have an important question

#

Python is probably the best lang to make ai with any amount of complexity right? Cause someone said python was for basic ai and I'm pretty sure they're wrong? I mean python has a lotttttt of stuff in its ai Libraries

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @sour dew until <t:1638937557:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

regal ingot
#

i have this equation

#

TD q-learning equation

#

i wanna make a function to get the td q value

regal ingot
#

i have no idea how to implement this

golden path
#

hey guys im trying to do a project analyzing a gene expression profile matrix from a scRNAseq dataset from cancer cells.
i have about 3589 cells, ~2000 genes, and 6 cell types in the dataset. the tSNE graph would show the cell types. my plan is to use PCA and tSNE, followed by logistic regression to characterize the differences in expression between the clusters that the tSNE graphs gave.

does anyone have tips on how i can perform the logistic regression portion?

tawny hollow
potent flame
#

@regal ingot Have a look at some implementations of q-learning in python on Github. All that equation is, is the Bellman equation. You need to set up an MDP around your environment in order to define an iterative process that you can apply that equation to
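
For reference, the usual tabular update is `Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a))`, where the "max" is the value of the best action available from the next state. Below is a minimal sketch of that one step. The state numbering, action names, reward value and hyperparameters are assumptions for illustration and are not taken from the assignment.

```python
from collections import defaultdict

def td_q_update(q, state, action, reward, next_state, actions,
                alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # "max" term: value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical 4-square room: states are squares 0..3, actions move the dog.
actions = ["up", "down", "left", "right"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

new_value = td_q_update(q_table, state=0, action="down", reward=1.0,
                        next_state=2, actions=actions)
```

Running this update over many observed transitions is the iterative process described above.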

pastel valley
#

yo anyone here can help me understand eigenfaces?