#data-science-and-ml


timid kiln
#

Much appreciated. I'll take some time to absorb your suggestion. 🙂

wooden sail
#

here's a MWE

#
import numpy as np

def find_intersection(a, b, c, d):
    r'''
    function that finds the intersection between two line segments. one segment
    is defined by the points a, b; and the other, by c, d.
    '''

    y = c - a
    A = np.zeros((2,2))
    A[:,0] = b - a
    A[:,1] = c - d
    detA = A[0,0]*A[1,1] - A[0,1]*A[1,0]
    if detA == 0:
        return np.ones((2))*np.inf #intersection at infinity
    else:
        x = np.linalg.inv(A).dot(y)
        if (0 <= x[0] <= 1) and (0 <= x[1] <= 1):
            return a + x[0]*(b-a) #valid intersection
        else: 
            return np.ones((2))*np.inf #intersection out of segment

a = np.array([0,0])
b = np.array([1,0])
c = np.array([0,-1])
d = np.array([1,1])

p = find_intersection(a,b,c,d)
print(p)
#

the result is

[0.5 0. ]

as expected

#

i made it so it returns [inf, inf] if the matrix is not invertible (that means the lines are parallel. either they never touch, or they are the same line. it's a degenerate case)

#

i guess i forgot to check whether the entries of x are in the valid interval [0,1]
edit* there we go
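
A slightly more defensive variant of the MWE above, as a sketch rather than a replacement: it compares the determinant against a tolerance instead of testing `== 0` and uses `np.linalg.solve` instead of forming the inverse. The name `find_intersection_tol` and the `tol` default are assumptions, not part of the original snippet.

```python
import numpy as np

def find_intersection_tol(a, b, c, d, tol=1e-12):
    """Intersection of segments a-b and c-d, or [inf, inf] if there is none."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    # Solve a + s*(b - a) = c + t*(d - c) for s and t.
    A = np.column_stack((b - a, c - d))
    if abs(np.linalg.det(A)) < tol:
        return np.full(2, np.inf)  # parallel or degenerate
    s, t = np.linalg.solve(A, c - a)
    if 0 <= s <= 1 and 0 <= t <= 1:
        return a + s * (b - a)  # valid intersection
    return np.full(2, np.inf)  # lines cross outside the segments

p = find_intersection_tol([0, 0], [1, 0], [0, -1], [1, 1])
```

With the same four points as the MWE, `p` should again come out as the point (0.5, 0).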

misty flint
#

here is my daily complaint about aws

steady basalt
#

Here's my daily I don't use cloud computing

misty flint
wooden sail
#

here's my daily day

misty flint
#

otherwise itd be useless

steady basalt
#

Here's my daily thanks edd for all ur help X

#

Now help me with eigenvectors!

wooden sail
#

oof

#

i have like 30 mins

#

do you have any specific questions

misty flint
#

i think the only person who hates aws as much as me here is Stel

steady basalt
#

No not yet

#

Meant later

#

Before that I have to somehow get thru the numpy

misty flint
#

edd can you learn aws and then teach me please

wooden sail
#

tomorrow might be a good day to help, i have some dead time in between meetings

steady basalt
#

I'm too stuck to advance

misty flint
#

jk

steady basalt
misty flint
steady basalt
#

Pure numpy no functions

wooden sail
#

absolutely lovely

steady basalt
#

Allowed

#

It's a date then..

misty flint
#

eww

#

i have to run

steady basalt
#

They're gonna make me translate the panda

wooden sail
#

check out what i shared above. nice way to find the intersection of line segments by inverting a matrix

steady basalt
#

In python

fallen portal
#

good morning. i'm wondering if someone can help me with a question regarding dask

#

i turned a csv into many parquet files using pandas, and when i read these parquet files using Dask i can do basic operations such as .head() and .tail(), but when i try to do other things like operations on a column i'm getting a ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index. When I do .index on the dataframe it shows that i do have an index, but known_divisions is False. I'm not really sure how divisions play into Dask or its indexing structure, or why i'm receiving this error. Any thoughts?

#

The index on the files are just a basic 0-N index created by Pandas (I didn't specify a column)

steady basalt
#

A server?

serene scaffold
#

20 million rows isn't much on the scale that a lot of data science is done these days. are you trying to pick a particular flavor of SQL?

timid kiln
# wooden sail ```py import numpy as np def find_intersection(a, b, c, d): r''' functi...

Thank you! I came back to report that this: https://www.w3schools.com/python/ref_set_intersection.asp won't work. But as I re-read the other person's post, they indicated that I would have to include each and every point. So one, I did it wrong, and two, I won't be calculating each and every point. At least, that's not desirable for this process. 🙂

steady basalt
#

Csv

timid kiln
misty flint
#

what are you doing with said data

#

is it more read heavy or write heavy

#

analytical vs. transactional

serene scaffold
#

Rex asking the big questions over here

misty flint
#

something something ACID vs. CAP

misty flint
#

ehh you could probably get away with most things as long as it can store that many rows

#

since youre not trying to put anything into prod

#

so whatever youre most familiar with/most interested in learning

#

i recommend mongo but thats me

#

do it

#

you are a student right? you can get access to mongodb atlas too

#

through the student dev pack

serene scaffold
#

if you're planning to do sentiment analysis, and you don't need to do any complicated queries or transactions (ie, the data is just there for you to feed into a model as-is), a text file should be fine

misty flint
#

resume-driven development

serene scaffold
#

I would be more concerned about how easy it is to feed the data into the model (a heavily nested JSON is not that), and not losing the data.

misty flint
#

also mongo has easy ways to query json stuff btw

#

and its aggregation pipeline is pretty powerful

#

but yeah you can go old school with txt files ig

#

so it just depends on if you want to learn a new tool or not. up to you

iron basalt
#

You probably don't need a database. If you design your system correctly with a separate IO layer, then you can swap that out later to use a database without having to change anything else in the system.

#

And I would then start with a simple IO layer and only make an IO layer for databases when you actually need one.

#

Do you control the file format? JSON / what is in it?

#

Looks pretty straight forward. If you want something more simple / you get to decide the format, then I would recommend a even more simple flat file format.

#

JSON is often overkill.

#

But yeah, if this works, just go with it for now. Databases later when actually needed.

narrow saddle
#

What does this mean?

NumPy uses C-order indexing. That means that the last index usually represents the most rapidly changing memory location, unlike Fortran or IDL, where the first index represents the most rapidly changing location in memory. This difference represents a great potential for confusion.
It's from the numpy docs. https://numpy.org/doc/stable/user/basics.indexing.html
what do they mean by 'rapidly changing memory location'
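
One way to see what the docs mean, as a small sketch: the "most rapidly changing" index is the one whose neighbouring elements sit next to each other in memory. The array and variable names here are just for illustration.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# strides = bytes to jump when an index increases by one
c_strides = a.strides                      # C order: last index moves 8 bytes
f_strides = np.asfortranarray(a).strides   # Fortran order: first index moves 8 bytes

# the order in which elements are laid out in memory
flat_c = a.ravel(order="C").tolist()  # last index changes fastest
flat_f = a.ravel(order="F").tolist()  # first index changes fastest
```

In C order, walking through memory steps along the last index first (0, 1, 2, then the next row); in Fortran order it steps along the first index first (0, 3, 1, 4, ...).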

iron basalt
#

Well, nothing can be done about how bad Twitter's API gets.

#

Databases are not really the thing for solving overly complex JSON, they can do it, but there is so much else that they do and add as overhead because of it. There are libraries for wrangling more complex JSON on their own.

junior lintel
#

Hi guys, I don't know if this is the right section. In pandas I'm trying to figure out how to check with a "relative position" index without iterating through the whole dataframe.
A wrong code solution would be:
`
def foo_func(df):
    index = df.tail(1).index[0]
    dfCheck = df[index-3:index]
    mask1 = dfCheck["A"].head(1) == True
    mask2 = dfCheck["B"] > 0
    if not dfCheck[mask1].empty and dfCheck[mask2].empty:
        df.iloc[index] = True

df.rolling(3).apply(foo_func)
`
So the goal here is to check if there is a true in the -3 relative position and at least a value>0 in the -3:index portion. Any idea how can I translate this? Thank you all very much for any help
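
A possible vectorized take on this, as a sketch under one reading of the question: for row i, flag it when `A` was True three rows earlier and at least one of the three previous rows (i-3 to i-1) has `B > 0`. The column names come from the snippet above; the toy data, the `flag` column and the exact window are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [True, False, False, False, True, False, False, False],
    "B": [0, 1, 0, 0, 0, 0, 0, 2],
})

# True where column A was True exactly three rows earlier.
a_three_back = df["A"].shift(3, fill_value=False)

# True where any of the three previous rows (i-3 .. i-1) has B > 0.
b_positive = (df["B"] > 0).astype(int)
any_b_recent = b_positive.rolling(3).max().shift(1).fillna(0).astype(bool)

df["flag"] = a_three_back & any_b_recent
```

`shift` moves values down by a fixed number of rows, so no explicit loop or `apply` is needed; if the window should include the current row, dropping the `.shift(1)` would be the change to try.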

severe karma
#

Hi guys, anyone have experience with parsing SEC data (XML) using this library/API (https://arelle.org/arelle/documentation/xbrl-database/open-database/) ? I am trying to recover the XML form back into the table format of https://www.sec.gov/ix?doc=/Archives/edgar/data/0000315189/000155837021001774/de-20210131x10q.htm# and store into my database. Anyone knows how can I find the 'reference' or 'calculation' section using this package? Thx

steady basalt
odd meteor
hollow sentinel
#

code: https://github.com/krishnaik06/Car-Price-Prediction
Dataset: https://www.kaggle.com/nehalbirla/vehicle-dataset-from-cardekho
โญ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while youโ€™re typin...

โ–ถ Play video
#

nice example of an end to end project

vernal sail
#

i wanted to learn ai

serene scaffold
vernal sail
#

yeah

serene scaffold
#

what

vernal sail
#

artificial inelegance

serene scaffold
#

yes, but what is that

vernal sail
#

basically a human mind inside a computer

#

you can't learn that stuff

#

because that's stupid saying you want to learn ai

#

oh wait

#

ohhhhh

#

OHHHHH

#

😔

serene scaffold
vernal sail
#

oh

#

well

#

i'm ignorant in the knowledge of ai

#

please enlighten me

calm thicket
#

<@&831776746206265384> spam across multiple channels

serene scaffold
# vernal sail please enlighten me

in general, AI is when you have programs that solve a knowledge problem. or in other words, they emulate the application of knowledge in some way

#

in practice, it's usually understood to be a body of programming techniques that one would use when an exact sequence of steps can't arrive at an exact solution for a problem

vernal sail
#

oh

#

that pretty complex but i get why people learn about it

#

i see it's interest

vernal sail
#

lmfao

#

also

#

i'm not in trouble for doing this right

misty flint
wooden sail
weary ridge
#

is there anyone who has used pytesseract

#

and opencv

#

ocr

#

lemme know

stoic viper
#

Hey.
We are working on a machine Learning model. we use xgboost and want to try a blended model between xgboost and lightgbm. I have no idea how i should start on this any tips?
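
A common starting point is a weighted average of the two models' predictions, with the weight picked on a validation set. The sketch below uses plain NumPy arrays as stand-ins for the predictions; `pred_xgb`, `pred_lgbm` and the helper names are hypothetical, and in practice the arrays would come from each fitted model's `predict` on held-out data.

```python
import numpy as np

def blend(pred_a, pred_b, weight):
    """Weighted average of two prediction vectors."""
    return weight * np.asarray(pred_a) + (1.0 - weight) * np.asarray(pred_b)

def best_weight(pred_a, pred_b, y_true, grid=np.linspace(0.0, 1.0, 21)):
    """Pick the blend weight with the lowest validation MSE."""
    errors = [np.mean((blend(pred_a, pred_b, w) - y_true) ** 2) for w in grid]
    return float(grid[int(np.argmin(errors))])

# Toy stand-ins for validation targets and per-model predictions.
y_val = np.array([1.0, 2.0, 3.0, 4.0])
pred_xgb = y_val + 0.5   # one model overshoots
pred_lgbm = y_val - 0.5  # the other undershoots
w = best_weight(pred_xgb, pred_lgbm, y_val)
final_pred = blend(pred_xgb, pred_lgbm, w)
```

The weight should be chosen on data neither model was trained on, otherwise the blend tends to favour whichever model overfits more.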

steady basalt
steady basalt
winged jasper
#

Hey, already posted in general discussion but I guess I could receive more help here:
Hey guys, I'm an engineer that has some Python knowledge (I can easily follow documentation, find my way through a repository etc) and I need some help finding the best resources to help me build the following small project for a university course:

- User takes a photo with the phone's camera and uploads it (or is automatically uploaded to the pc that the phone is connected to with a cable / wifi)
- The photo goes through a Python API/ or is processed by a script that reads the photo, extracts the answers to a quiz and then offers the final score of the user
- The file the user will take a photo of looks similar to the one below (it might change just a little bit) 

I would really appreciate some help, I want to learn this myself. I know some JS, I know some python, linux, etc. And I understand AI/Machine learning, so I guess I will have to use OpenCV with some other libraries to have this pipeline. Looking forward to some constructive words 🙂

shell panther
#

hello, I want to build an open domain chatbot. What's the best python framework for that in your opinion?

amber perch
#

not working

neon imp
#

Quick, easy way to have users upload pictures is to simply have them email them to you.

#

Lots of ways to do that though.

#

#2 First build your "user uploads picture to you, picture arrives, user gets a score" workload.

#

Do the machine learning last.

winged jasper
#

Yeah, thing is those are the least important parts of the code 😄 I can also do it by manually uploading the images to a directory because users will have to pass the sheet manually to me anyway. And the machine part thus becomes the most important and hard part 😦

remote storm
# amber perch

Have you uploaded the xlsx file in your jupyter notebook

#

You can either upload in jupyter's home page or link up the path in your PC for that

mighty condor
#

Is this where people might know about pandas?

lapis sequoia
#

Hi guys how can I plot a heatmap from a data frame pandas?

#

I have a data frame with 120k rows and 3 columns: customer, expenditure-type and ranking, ranking goes from 1 to 5 and indicates the amount spent from each customer (1=very little etc..) per expenditure type, i want to plot these data into a heatmap how can I do?

serene scaffold
wooden sail
#

what do you want the heatmap to show?

steady basalt
#

Should we create a data analysis channel I see we get daily how to pandas

wooden sail
#

seaborn was also gonna be my suggestion, but more details can be given based on what info they want the heatmap to show
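
If the goal is to show how many customers fall into each ranking per expenditure type, one option is to count the combinations first and plot that small table. A sketch with toy data; the column name `expenditure_type` (with an underscore) and the sample rows are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["c1", "c2", "c3", "c4", "c5"],
    "expenditure_type": ["food", "food", "travel", "travel", "food"],
    "ranking": [1, 5, 3, 3, 1],
})

# Rows: expenditure type, columns: ranking, cells: number of customers.
counts = pd.crosstab(df["expenditure_type"], df["ranking"])

# With seaborn installed, the table can be drawn directly:
# import seaborn as sns
# sns.heatmap(counts, annot=True, fmt="d")
```

The 120k rows collapse into a table of (number of expenditure types) x 5 cells, which is what the heatmap actually draws.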

mighty condor
#

Hi everyone, I am trying to save an output of applying a function to a dataframe to a new column in that dataframe, and when I don't save it as a new column, I am seeing the correct response, but when I go to save it in a new column, it is filling it with NaNs

wooden sail
#

can you show a code snippet

mighty condor
#

so this is the correct output, all 1's

#

then I do this: `df['newcol'] = df.loc["Beg-4"].apply(foos.Beg_3)`

#

but it's filled with NaNs instead of the 1's

wooden sail
#

Beg-4 is a column?

mighty condor
#

a row

wooden sail
#

then the issue is that you're not applying the function to all rows, most likely?

mighty condor
#

just 1 row, the row named Beg-4

wooden sail
#

and what do you want pandas to put into the other rows you didn't specify?

mighty condor
#

oh, I just want it to take all the outputs, and put them in a new column...oh I guess I should put them in a new row?

#

I just want to save them somewhere in a df

#

actually ideally a new dataframe

#

that I would then add more stuff to once I apply different functions to the other rows

wooden sail
#

that would make more sense, since it seems you're applying the function to the elements of a row. the number of outputs is equal to the length of a row. if you have more columns than rows, putting this into a column will end up with many unspecified values

mighty condor
#

each row has its own function

wooden sail
#

you could make a new df if you like

#

or put the output as a row

mighty condor
#

ah so maybe if I view the whole dataframe some of it would have saved as not NaNs?

#

let me look

#

hmmm, no, they're all Nans

#

so it didn't fill some with nans, but all

wooden sail
#

i'm not sure what pandas' default behavior is, but in any case you tried to put a collection of values somewhere it doesn't fit 😛

#

different functions will handle that error in different ways

mighty condor
#

so this is how I create the new dataframe, right?

wooden sail
#

wasn't newdf already a new dataframe? (i'm asking, i've never used pandas)

#

the code looks ok, just making sure the newdff line isn't redundant

mighty condor
#

this is what newdf looks like, I think it's not an actual dataframe?

wooden sail
#

try type(newdf) and see what it prints

#

just for peace of mind

mighty condor
wooden sail
#

aha

#

aight, that's your answer

mighty condor
#

so it was a series?

wooden sail
#

yeah

mighty condor
#

and now it's a dataframe?

wooden sail
#

whatever that is 😛 yep

mighty condor
#

ty so much ❤️
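
For reference, a small sketch of why the column came out as all NaNs: applying a function to one row returns a Series indexed by the column labels, and assigning it to a new column aligns on the row labels, which don't match. The toy frame and the lambda below stand in for the real data and `foos.Beg_3`.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["Beg-4", "Beg-5"],
    columns=["a", "b", "c"],
)

# Applying to a single row gives a Series indexed by a, b, c.
row_result = df.loc["Beg-4"].apply(lambda v: v * 10)

# Assigning it as a column aligns on the row labels (Beg-4, Beg-5),
# none of which appear in row_result's index, so everything is NaN.
misaligned = df.copy()
misaligned["newcol"] = row_result

# Turning the Series into a one-row DataFrame keeps the values.
new_df = row_result.to_frame(name="Beg-4").T
```

Results for other rows could then be appended to `new_df` with `pd.concat`, one row per function.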

vernal sail
tacit basin
steady basalt
#

in c++ 😅

vernal sail
#

none

#

not a thing

steady basalt
#

Damn that sucks u gona have to learn how

vernal sail
#

and pretty bad in normal python

steady basalt
#

Get better then

vernal sail
#

ez

#

just got better

steady basalt
#

Well u asked how to and the answer is first be able to code

#

What can u do with python

misty flint
#

question

#

how would you make use of an ontological model in a business setting

#

these tend to be represented with knowledge graphs

#

okay, now what?

bold timber
#

Hi, I have a question: how do I show all the numbers on the x axis in plotly?

mild dirge
#

plt.xticks(range(1, 13)) @bold timber ?

wooden sail
#

i think when you create the fig, you can give the parameter x = [some_list_of_tick_values]

bold timber
#

doesn't work

wooden sail
#

so maybe x = range(1,13)

mild dirge
#

This is plt.bar() right?

#

@bold timber

wooden sail
#

they say plotly there, or do you shorten pyplot and plotly the same way?

mild dirge
#

Oh plotly

#

yeah that doesn't work then haha

wooden sail
#

should be something like fig = some_plotly_function(some_dataframe, x = range(1,13), other_params)

brave sand
#

how do I find the win rate of my algorithm?

wooden sail
#

what's the algorithm

brave sand
#

qmix

steady basalt
#

man i fkin love plotly so beautiful

brave sand
#

my professor wants me to find the win rate of this algorithm how do I do that?

wooden sail
#

ah, some flavor of learning

#

well, it's montecarlo time

steady basalt
wooden sail
#

generate a huge amount of scenarios with different starting conditions and see how many times it wins
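
The Monte Carlo idea can be sketched in a few lines. `play_episode` below is a hypothetical stand-in that wins with a fixed probability; in a real setup it would run the trained agents in the environment once and report whether they won.

```python
import random

def play_episode(rng):
    """Hypothetical stand-in for one evaluation episode; True means a win."""
    return rng.random() < 0.6

def estimate_win_rate(n_episodes, seed=0):
    """Fraction of episodes won over many randomized runs."""
    rng = random.Random(seed)
    wins = sum(play_episode(rng) for _ in range(n_episodes))
    return wins / n_episodes

rate = estimate_win_rate(10_000)
```

More episodes give a tighter estimate; the uncertainty shrinks roughly with the square root of the number of runs.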

steady basalt
#

i cannot read it

brave sand
wooden sail
#

not really

brave sand
steady basalt
wooden sail
#

you're training this yourself?

steady basalt
#

and proof

wooden sail
steady basalt
#

ummm

#

it makes it hard to follow for non mathematicians

brave sand
steady basalt
#

maybe put proof in the appendix?

brave sand
#

or at least I ran this algorithm on my local machine and I got an output pickle file

#

how do I interpret this?

wooden sail
#

the validation error of the final epoch is what you want

#

what exactly that looks like, i can't say

brave sand
#

am I allowed to send files here?

steady basalt
#

i wonder if i just pin a massive 20x20 table on my wall with every single math symbol i ll be able to decipher papers

#

after a few weeks

arctic wedgeBOT
#

Hey @brave sand!

It looks like you tried to attach file type(s) that we do not allow (). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

steady basalt
#

Reading this paper ive noticed their use of the triple equals sign, what is that?

#

is that just equating functions?

brave sand
wooden sail
#

the usual procedure is that lots of cool, cutting edge, and largely useless results are produced in academia and research. then the results slowly trickle down as either the authors make the code open-source or people make open-source implementations of the results. then people who know nothing about them start using them more widely through APIs

wooden sail
steady basalt
#

not exactly deep coding though anyone can do it

wooden sail
#

that's rather uncommon

steady basalt
#

≡ means identical to

#

umm

#

interesting

#

1/2 doesnt = 2/4

#

but rather is identical to

#

why didnt they teach this in school

wooden sail
#

because it's a useless distinction

steady basalt
#

We consider a partially observable scenario in which each agent draws individual observations z ∈ Z according to observation function O(s, a) : S × A → Z. Each agent has an action-observation history τ^a ∈ T ≡ (Z × U)^*, on which it conditions a stochastic policy π^a(u^a|τ^a) : T × U → [0, 1]. The joint policy π has a joint action-value function: Q^π(s_t, u_t) = E_{s_{t+1:∞}, u_{t+1:∞}}[R_t | s_t, u_t], where R_t = Σ_{i=0}^{∞} γ^i r_{t+i} is the discounted return.

wooden sail
#

they represent the same number, so there is no case in standard arithmetic where it matters

steady basalt
#

so reading that i saw that symbol and thought

#

wtf?

#

what is the : before infinite for?

#

isnt that usually a arrow?

wooden sail
#

where?

steady basalt
#

2nd last line

#

they just mean 1 to inf?

wooden sail
#

no idea

#

can you paste an image of the original text instead?

steady basalt
brave sand
#

how do you guys usually interpret pickle files?

steady basalt
#

is there any particular reason why the left side of the equation uses () and the right side uses [] ?

wooden sail
#

yeah seems like some sort of interval, but it's difficult to say. unbeknownst to most, symbols don't have universal meanings. you'll have to hope the authors explained their notation near the beginning of the paper, or work your way from the top as you figure out how they use the symbols

steady basalt
#

as a non math person this pisses me off

#

i often read things and have no clue what theyre saying

#

hows 1:inf normally written?

#

isnt it like

#

(1,inf]

#

or somehow

iron basalt
#

Expected value often uses [].

wooden sail
#

(1,inf] doesn't make it clear whether the numbers are taken from N, Z, or I

steady basalt
#

can you explain why in that equation they use expected values and what in maths that means? I thought equations didnt do expected

wooden sail
#

and yeah, expectation often uses [] or {}

iron basalt
#

Wikipedia

#

Styled E in this case.

wooden sail
#

.latex $\mathbb{N}$ are natural numbers, $\mathbb{Z}$ are the integers, and $\mathbb{I}$ are the reals

strange elbowBOT
steady basalt
#

how can expected value be a thing in normal equations? ive only ever ran into that in statistics making synthetic datasets

wooden sail
#

you're doing statistics there

steady basalt
#

i need to read up more on exactly how expected values work on maths

wooden sail
#

"stochastic policy" means "random policy"

steady basalt
iron basalt
#

You can think of it as a weighted sum for simplicity often.

#

(You have probably already seen a sum in an equation)
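
To make the "weighted sum" reading concrete, here is a tiny sketch: the expected value of a fair die as a probability-weighted sum, and a discounted return as a sum weighted by powers of a discount factor. The die, the reward list and `gamma` are made-up illustration values.

```python
# Expected value of a fair six-sided die: each outcome weighted by its probability.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expected = sum(v * p for v, p in zip(values, probs))

def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time offset."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

The expectation in the quoted formula averages this kind of return over all the random future states and actions.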

wooden sail
#

that was a typo, sorry. Z are whole numbers. I are the reals.

steady basalt
#

after i finish my linalg instead of calculus i will take a course on notation and definitions..

wooden sail
#

.latex $\mathbb{N}$ are natural numbers, $\mathbb{Z}$ are the integers, and $\mathbb{I}$ are the reals

strange elbowBOT
wooden sail
#

sadly i can't get the bot to delete nor update the tex, so here it is again

steady basalt
#

soooo sick of not knowing definitions

wooden sail
#

i'm surprised these things were not mentioned in your linalg course

steady basalt
#

these things were certainly not mentioned

wooden sail
#

one usually defines vector spaces as "a vector space over a field", so one has to at least briefly mention fields

steady basalt
#

is there any specific content that explains all these brackets and meanings?

wooden sail
#

no, because they're not universal, as i said

#

you pick up math-reading abilities as part of your mathematical maturity

steady basalt
wooden sail
#

by doing maths

steady basalt
#

or real value

#

i cannot do maths that advanced, i can learn that

iron basalt
#

"mathematical maturity" - Huh, so i'm not the only one that uses that term.

wooden sail
#

i just realized i still mixed up the symbols, i meant R, not I

steady basalt
#

starting from 0 means im years away from that sorta stuff

wooden sail
#

sorry, i'm tired

wooden sail
iron basalt
wooden sail
#

i'm sure, i was just being polite lol

iron basalt
#

I don't think it was not polite. I just have not seen the term used in a chat room before.

steady basalt
#

im never gona have the time on my hands to practise enough from 0 to being able to pass highschool or early degree level papers

#

that shit requires spamming it over and over

wooden sail
#

that's how everything is learned

steady basalt
#

yeah but i not in school anymore

#

i dont have that time

wooden sail
#

that's also fair. consider that these people do this stuff for a living

steady basalt
#

high school here starts at pretty easy level maths, like straight lines, surds and basic probability

#

sure

#

but after 1 year

#

it gets quite tricky

#

the trig is confusing, the calculus requires spamming and they dont even teach linalg

#

its like 99% trig

brave sand
#

I'm in linalg right now

wooden sail
#

linalg is good for your soul

#

probably the first bump with proof-heavy courses

steady basalt
#

they dont teach it here

wooden sail
#

sadness

iron basalt
#

Linalg is so widely applicable, especially for many programs (computers compute it really well).

steady basalt
#

yo

#

Consecutive terms of a sequence are related by u_{n+1} = 3 − (u_n)^2

#

dammit

#

whats the strategy to finding the 50th term?

#

like a one liner?

#

sure i could go thru them one by one but that wud take ages

wooden sail
#

you could use a for loop, sure. if you don't want to, though, you have a problem 😛

steady basalt
#

its a pen and paper math exam

wooden sail
#

the formula is recursive. you have to do the math yourself on paper

#

aha

#

well

#

see if you can find a pattern

steady basalt
#

its a small question they expect u do it in 2 lines

wooden sail
#

maybe it telescopes nicely

steady basalt
#

i thought hey thats easy when they asked to find third term

#

then the next q is 50th

wooden sail
#

compute a few terms and look for the pattern

steady basalt
#

srlsy?

wooden sail
#

that's the whole point

steady basalt
#

ur meant to deduce the 50th term by just doing the first 3 or 4 terms and gauging it?

wooden sail
#

yes

iron basalt
#

What is the first term?

#

u_1

wooden sail
#

ngl that thing grows pretty nastily lol (if you start from 0)

steady basalt
#

its a high school exam paper

#

i thought id have a look

#

damnnn

#

i cant

#

u1 is 2

wooden sail
#

aha, that's the trick

iron basalt
#

If you write it out you will see it.

wooden sail
#

write out like 4 or 5 terms and you're done

#

you really shouldn't need more than 4

#

you either missed the pattern or forgot the parentheses

iron basalt
#

A lesson on how the base case can completely change things.

steady basalt
#

ill do it after food

wooden sail
#

yeah i was 3 terms in and was like "those highschoolers are dead and buried by now"

iron basalt
#

I recommend trying u_1 is 3 or 4 to see what else happens in those cases.
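
For checking the pattern by machine, a short sketch that assumes the relation in the question reads u_{n+1} = 3 − (u_n)^2 (the pasted text is garbled, so that reading is an assumption) with u_1 = 2. The function name `term` is made up.

```python
def term(n, u1):
    """n-th term of the sequence u_{k+1} = 3 - u_k**2, starting from u_1 = u1."""
    u = u1
    for _ in range(n - 1):
        u = 3 - u * u
    return u

first_terms = [term(k, 2) for k in range(1, 6)]  # alternates between two values
u50 = term(50, 2)

# A different first term behaves very differently, e.g. starting from 3:
from_three = [term(k, 3) for k in range(1, 4)]
```

Starting from 2 the terms alternate, so every even-numbered term (including the 50th) equals the second term; other starting values blow up quickly.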

steady basalt
#

u shud have a look at UK A2 maths core3/4 papers or advanced maths

#

its so hard i dropped out

brave sand
#
```py
import pandas as pd

file_name = "/Documents/Python Virtual Environments/Popular-RL-Algorithms/model/qmix_agent (1)/archive"
objects = pd.read_pickle(file_name)
```
#

why does this not work?

#

Traceback (most recent call last):
  File "qmix_pickle_reader.py", line 4, in <module>
    objects = pd.read_pickle(file_name)
  File "/home/ethan/Documents/Python Virtual Environments/marl-test-env/lib/python3.8/site-packages/pandas/io/pickle.py", line 187, in read_pickle
    with get_handle(
  File "/home/ethan/Documents/Python Virtual Environments/marl-test-env/lib/python3.8/site-packages/pandas/io/common.py", line 795, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/Documents/Python Virtual Environments/Popular-RL-Algorithms/model/qmix_agent (1)/archive'

wooden sail
#

iirc to read pickled files you need to import all of the libraries that were involved in the object that got pickled

#

so try taking all of the imports you used on the file that generated the pickle, and put them also in this one that reads the pickle

#

oh but there it's also telling you you're reading from the wrong directory

brave sand
#

same error

#

yeah

#

I didn't think loading in libraries would resolve my file not found error

wooden sail
#

i'm pretty sure paths don't like spaces in them

#

try encasing the part of the path with a space in ''

#

'qmix_agent (1)'

#

otherwise, rename the folder 😛
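
Another thing worth checking, as a sketch: the path in the script starts at the filesystem root (`/Documents/...`), while the traceback shows the virtual environment living under `/home/ethan/Documents/...`. Assuming the model folder sits under the home directory too (a guess based on the traceback), the path can be built from `Path.home()`; spaces inside a Python string path are not a problem by themselves.

```python
from pathlib import Path

# Path pieces copied from the script; their location under the home
# directory is an assumption.
relative_parts = [
    "Documents",
    "Python Virtual Environments",
    "Popular-RL-Algorithms",
    "model",
    "qmix_agent (1)",
    "archive",
]
file_path = Path.home().joinpath(*relative_parts)

# Check before loading so a wrong location is reported clearly.
found = file_path.exists()
# if found:
#     objects = pd.read_pickle(file_path)
```

Printing `file_path` and `found` before calling `read_pickle` shows quickly whether the location or the file format is the real issue.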

upper spindle
#

best place to learn deep learning with basic python programming?

mild dirge
#

What are you currently at?

#

You know about stuff like linear regression, and perceptron, multi-layer perceptron etc?

#

@upper spindle

brave sand
pliant perch
#

anyone know the best way to get into ai for beginners

agile cobalt
#

quite much replying to both delta and yourdad: there's Andrew Ng's Machine Learning Specialisation on Coursera, but I cannot say for sure if it's the best option out there

pliant perch
#

any advice on this?

lost cairn
#

Is it possible to practice python with a mobile?

wooden sail
#

i would say andrew ng is pretty aight. you need some background knowledge though, and iirc it doesn't go much into code. still it's a great place to start and i encourage learning the math before trying the code

agile cobalt
pliant perch
agile cobalt
radiant forum
#

Hi people! I'm trying to understand the amount of parameters in a CNN. Well, I am classifying black and white images into four classes. Firstly I processed the images as RGB and later as 'grayscale'. I expected an exponential decrease in the amount of parameters after the Flatten layer, but actually they remained the same. What do the parameters actually depend on?

pliant perch
agile cobalt
pliant perch
radiant forum
#

ey people, I do not mean to be rude. Just in case you didn't know there is a pedagogy channel

agile cobalt
#

sklearn is fine for non-deep learning
pytorch or tensorflow are used for deep learning, sometimes via a higher level API / package such as fast.ai, huggingface or keras

radiant forum
#

please,

wooden sail
agile cobalt
wooden sail
#

all it does is change the shape. it doesn't apply any function whatsoever

#

it's akin to the "vectorization" operation you can apply to an m x n matrix in order to obtain a length m*n vector

#

same number of parameters, just reshaped (and generalized to more dimensions)

radiant forum
wooden sail
#

lemme make an example for you

#
In [11]: import numpy as np

In [12]: x = np.array([[2,3],[5,6]])

In [13]: print(x)
[[2 3]
 [5 6]]

In [14]: print(x.flatten())
[2 3 5 6]

In [15]:
#

they're exactly the same thing, just in a different shape

radiant forum
#

ok, let's change the point of view

#

when adding a convolutional layer it extracts the feature maps accordingly to a number of filters

#

if the size of the input is smaller I expected somehow a decrease on the amount of feature maps and also in the number of parameters

#

but there is something else... that is my question

wooden sail
#

oh

#

that's determined by the shape of the convolution layers, the pooling layers, and the dense layers

#

from which layer to which layer did you expect a large change?

#

flatten does nothing, dropout doesn't change the number of parameters, only deactivates them randomly at each iteration. the dense layer is a linear mapping from R^n to R^m, here with m << n, so that's the layer that has a ton of parameters, but the feature vectors are quite small after it

#

in the convolutional layers, you specify an input 2D shape and a number of filters. the output is of size ~ N - kernel_length x N - kernel_length x num_filters.

#

for all of the layers, the number of parameters is related to the underlying (multi-)linear transformation from something isomorphic to R^N to something isomorphic to R^M, having N*M parameters

radiant forum
#

so... what you mean is that an image of size (256,256,3) is isomorphic to another image of size (256,256,1)?

wooden sail
#

not at all

#

what i'm saying is that an image of size 256 x 256 x 3 is isomorphic to another of size 196608

#

and you put that into the network and get another size

#

and that output vector of some size is isomorphic to some other n-dimensional array

#

so one easy way to think about the number of parameters at a given layer is to vectorize the input and output

#

then the number of parameters is something like N * M... plus another M, if there are biases

#

since the effect of a layer (before applying the activation function) is that of an affine transformation y = Ax + b, and A and b are the parameters

radiant forum
#

nice. In that case I would have expected a decrease in the output shape of the first convolutional layer as well as it happened to its parameters

wooden sail
#

and that happened indeed. 2D convolutional layers shrink the 2D axis of the image by roughly their own size

#

though the number of output slices in the image depends on the number of filters you use
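
To put numbers on this, a rough sketch of the usual parameter counts for a 2D convolution and a dense layer. The 3x3 kernel, 32 filters, "valid" padding with stride 1, and the single conv layer followed directly by flatten are illustrative assumptions, not the network from the question. Convolution weights are shared across positions, so the count depends on kernel size, input channels and filters, not on image width or height.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias each."""
    return (kernel_h * kernel_w * in_channels + (1 if bias else 0)) * filters

def dense_params(n_in, n_out, bias=True):
    """Full weight matrix plus one bias per output unit."""
    return n_in * n_out + (n_out if bias else 0)

rgb_first = conv2d_params(3, 3, 3, 32)   # first layer on a 3-channel image
gray_first = conv2d_params(3, 3, 1, 32)  # same layer on a 1-channel image

# Output of that layer on a 256x256 input: (256 - 3 + 1) per side, 32 feature maps.
side = 256 - 3 + 1
flat_size = side * side * 32  # identical for RGB and grayscale input
dense_after_flatten = dense_params(flat_size, 4)
```

Only the first convolution sees the channel count; everything after it, including the size after Flatten, depends on the number of filters and the spatial size, which is why switching to grayscale barely changes the total.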

glacial sparrow
#

any resources for embedding categorical variables in LSTM?

wheat snow
#

Heyo, i want to calculate e.g. 2 different average values for the watchtime ("Duration") with pandas (from 2022-05-02 until 2022-06-02), but i want exactly 2 values: one for the first month (or the rest of the watch data available for that month), and the second should be all the watch data available in the second month
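
One way to get exactly one average per calendar month, as a sketch: filter to the date range, then group by the month of the date. The column name `Date` and the sample rows are assumptions; `Duration` comes from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-05-02", "2022-05-20", "2022-06-01", "2022-06-02"]),
    "Duration": [10.0, 30.0, 40.0, 60.0],
})

# Keep only the requested range, then average per calendar month.
in_range = df[(df["Date"] >= "2022-05-02") & (df["Date"] <= "2022-06-02")]
monthly_mean = in_range.groupby(in_range["Date"].dt.to_period("M"))["Duration"].mean()
```

Each month in the range gets one value, however many days of data it happens to contain.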

radiant forum
lapis sequoia
serene scaffold
#

!e

import json, pprint as pp
result = json.loads("""[{"rating":2351,"num_wins":2587,"num_losses":1916,"streak":-3,"drops":52,"timestamp":1656402092},{"rating":2357,"num_wins":2587,"num_losses":1915,"streak":-2,"drops":52,"timestamp":1656400992},{"rating":2363,"num_wins":2587,"num_losses":1914,"streak":-1,"drops":52,"timestamp":1656400031},{"rating":2369,"num_wins":2587,"num_losses":1913,"streak":4,"drops":52,"timestamp":1655825614},{"rating":2359,"num_wins":2586,"num_losses":1913,"streak":3,"drops":52,"timestamp":1655825282}]""")
pp.pprint(result)
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

001 | [{'drops': 52,
002 |   'num_losses': 1916,
003 |   'num_wins': 2587,
004 |   'rating': 2351,
005 |   'streak': -3,
006 |   'timestamp': 1656402092},
007 |  {'drops': 52,
008 |   'num_losses': 1915,
009 |   'num_wins': 2587,
010 |   'rating': 2357,
011 |   'streak': -2,
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/uyipusetir.txt?noredirect

serene scaffold
#

@lapis sequoia see?

lapis sequoia
#

but good to know for sure it's json

#

What's the best way to convert this into a pandas dataframe?

serene scaffold
arctic wedgeBOT
#
Certainly not.

No documentation found for the requested symbol.

serene scaffold
#

rip

#

!docs pandas.read_json

arctic wedgeBOT
#
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, ...)
Convert a JSON string to pandas object.
serene scaffold
#

that one

lapis sequoia
#

Ok, I'll give that a try. I'm trying it out by calling the API directly in the meantime.

serene scaffold
#

I think

#

otherwise you can do pd.DataFrame(json.loads(...))
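A small sketch of that second route, using two of the records from the eval output above. The `when` column is an extra added here for illustration, assuming the timestamps are Unix seconds.

```python
import json

import pandas as pd

raw = """[
  {"rating": 2351, "num_wins": 2587, "num_losses": 1916, "streak": -3, "drops": 52, "timestamp": 1656402092},
  {"rating": 2357, "num_wins": 2587, "num_losses": 1915, "streak": -2, "drops": 52, "timestamp": 1656400992}
]"""

# A JSON list of flat objects maps directly onto rows and columns
df = pd.DataFrame(json.loads(raw))

# Assumption: the timestamps are seconds since the Unix epoch
df["when"] = pd.to_datetime(df["timestamp"], unit="s")
print(df)
```
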

lapis sequoia
#

I have a list of top 10k players which I've changed into a dataframe, I want to create a new column in the dataframe('max_rating') which has the highest rating each player has ever had. I'm using the following code:

API = aoe.API()
df = pd.read_csv('AoE2_list_of_top_10000_players.csv')

for s in df['profile_id'].head():
    ratings_of_player = API.get_rating_history(profile_id=s)
    a = []
    for i in ratings_of_player:
        a.append(i.get('rating'))
    print(a)
    df['max_rating'] = max(a)

But whenever I check my dataframe after running this, every player seems to have the same exact highest rating(2560) which is obviously not correct. What am I doing wrong here? I want to append the max rating I've found separately to each player.
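The likely culprit is `df['max_rating'] = max(a)`: assigning a scalar to a column writes that value into every row, so each pass of the loop overwrites the whole column and only the last player's maximum survives. One possible fix, sketched with a stand-in for the API (the `get_rating_history` function and its data below are invented for the example), is to compute one value per `profile_id` with `Series.map`:

```python
import pandas as pd

# Hypothetical stand-in for API.get_rating_history(profile_id=...)
FAKE_HISTORY = {
    101: [{"rating": 2351}, {"rating": 2369}],
    102: [{"rating": 1800}, {"rating": 1750}],
}


def get_rating_history(profile_id):
    return FAKE_HISTORY[profile_id]


def max_rating(profile_id):
    """Highest rating found in one player's history."""
    return max(entry["rating"] for entry in get_rating_history(profile_id))


df = pd.DataFrame({"profile_id": [101, 102]})
# map calls max_rating once per row, so each player gets their own value
df["max_rating"] = df["profile_id"].map(max_rating)
print(df)
```
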

lapis sequoia
# serene scaffold that one

Ok, I tried it and it seems to be working pretty nicely. Probably won't need to use all the code I've written above if it works how I think it works.

#

I need help turning my file into tenserflow
Like I need help installing it, then turning it from python to tiffle
Ive tried all the videos, I just cant figure it out

lapis sequoia
#

I python file

#

into tiffle

#

I dont know how to do it

#

its easier to explain in vc

steady basalt
#

U mean TIF?

#

Why would u want to turn a py script into an image

lapis sequoia
#

@steady basalt sorry for late response

#

Into a TIFFLE

#

tesnsorflow

undone horizon
#

👍

lapis sequoia
#

I'm trying to get a list of all AoE2 players(from https://aoe2.net/#api using this wrapper: https://github.com/sixP-NaraKa/aoe2net-api-wrapper/blob/main/docs/docs.md) and their highest/lowest ratings in the previous 2 years. I've tried so many things but I keep failing when I try to convert the Json data to a Pandas dataframe. It converts it into a dataframe where the first few columns have index and some other information which isn't important to me, and then one column inserts a dictionary which has all the important information I need but it's impossible to access because it's all in one column.

Any help would be awesome
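When the useful fields end up as a dictionary inside a single column, `pd.json_normalize` can usually flatten them. The record layout below is invented for illustration (the real API response may be shaped differently); the second half shows one way to expand a dict column that is already in a DataFrame.

```python
import pandas as pd

# Hypothetical records with the interesting fields nested under "stats"
records = [
    {"profile_id": 101, "name": "player_a", "stats": {"rating": 2351, "num_wins": 2587}},
    {"profile_id": 102, "name": "player_b", "stats": {"rating": 1800, "num_wins": 950}},
]

# Nested keys become dotted column names such as "stats.rating"
flat = pd.json_normalize(records)
print(flat.columns.tolist())

# If the dicts already sit in a DataFrame column, expand that column directly
df = pd.DataFrame(records)
expanded = pd.concat(
    [df.drop(columns="stats"), pd.DataFrame(df["stats"].tolist())],
    axis=1,
)
print(expanded)
```
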

shell panther
tropic niche
#

I've got a bunch of data that I'm collecting from various sources in a tabular format. The data is all similar but the tables don't always have the same columns. For example, one source may provide a column with a start value, and an end value and nothing else, other sources may provide some interim values. The ordering of the columns may also be different between tables. The rows are almost always in sorted order.

I'm wondering if there is a way to train ML model to determine column headings. Currently I need to manually open the data in excel look at it, and assign the correct heading and then enter the table into my system such that it can be processed, this is really annoying and time consuming work. The tables can have as few as 200 rows, and up to the tens of thousands, and there are typically about 6 to 12 columns. How would I go about structuring the data to train such a model?

wooden sail
#

are the columns labelled in the files?

tropic niche
#

Yes, mostly but the labels are not consistent. Data from different sources can have different labels for the same data.

wooden sail
#

aight you could look at several examples from your data to see if you can learn something about the statistical distribution of the data. the annoying part is that the files have different row sizes. you can either pad the rows or extract statistical params yourself. then train the network on randomly generated examples based on what you observed in the data.
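One way to read the "extract statistical params yourself" option: summarize every column into a fixed-length feature vector, so that columns with 200 rows and columns with tens of thousands of rows look the same to a classifier. The particular statistics chosen here are an assumption, not something specified in the discussion.

```python
import numpy as np


def column_features(values):
    """Summarize a numeric column of any length as a fixed-length vector."""
    x = np.asarray(values, dtype=float)
    diffs = np.diff(x)
    return np.array([
        x.mean(),
        x.std(),
        x.min(),
        x.max(),
        np.median(x),
        # Fraction of non-decreasing steps: close to 1.0 for sorted columns
        (diffs >= 0).mean() if diffs.size else 1.0,
    ])


print(column_features([1, 2, 3, 4]))
print(column_features(range(1000)).shape)  # same length regardless of row count
```

Each labelled column then becomes one training example: the feature vector is the input and the known heading is the target.
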

tropic niche
#

The statistical distribution of the like columns should be similar regardless of the number of rows.

wooden sail
#

that was exactly my point 😛

lapis sequoia
#
for i in df.head()['profile_id']:
    all_ratings = []
    max_rating = 0
    min_rating = 0
    list_of_ratings = API.get_rating_history(profile_id=i)
    for i in list_of_ratings:
        all_ratings.append(i.get('rating'))
    max_rating = max(all_ratings)
    print(max_rating)
    df.loc[df['profile_id'] == i, 'max_rating'] = max_rating
#

What mistake am I making here? When I print df.head() I get max_rating values as NaN.
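A probable cause: the inner loop reuses the name `i`, so by the time `df.loc[df['profile_id'] == i, ...]` runs, `i` is the last rating record rather than the profile id. The mask then matches no rows and the new column stays NaN. A sketch with distinct loop names, using an invented stand-in for the API call:

```python
import pandas as pd

# Hypothetical stand-in for API.get_rating_history(profile_id=...)
FAKE_HISTORY = {
    101: [{"rating": 2351}, {"rating": 2369}],
    102: [{"rating": 1800}, {"rating": 1750}],
}


def get_rating_history(profile_id):
    return FAKE_HISTORY[profile_id]


df = pd.DataFrame({"profile_id": [101, 102]})

for profile_id in df["profile_id"]:
    # A separate name for the inner items keeps profile_id intact
    ratings = [entry["rating"] for entry in get_rating_history(profile_id)]
    df.loc[df["profile_id"] == profile_id, "max_rating"] = max(ratings)

print(df)
```
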

#

I want them to show the max ratings of the players

#

pls help 😦

sour tide
#

Hi.. so i got this cosine similarity matrix output from a python program. may i know how to do data classification on this, like finding accuracy and so on, with a specific classifier, namely an SVM classifier

#

the output is like this.. i'm putting a link here since i can't copy and paste my own output here

runic lantern
#

Hi everyone! I am working on my first data science project and i am facing some trouble with identifying and dealing with outliers in my dataset

#

would love to learn how to deal with outliers!!

serene scaffold
#

Don't ask for an expert. Ask your actual question.

runic lantern
#

https://github.com/Sparsh-mahajan/House-Price-Prediction/blob/main/data_cleaning.ipynb here is the what i have been working with, i have a dataset with around 2.9k rows and 80 columns, I have dealt with missing values in the dataset and have plotted out boxplot, histplot and a scatterplot for each column vs the label ('SalePrice')

GitHub

Predicting house prices using the dataset from https://www.kaggle.com/datasets/prevek18/ames-housing - House-Price-Prediction/data_cleaning.ipynb at main · Sparsh-mahajan/House-Price-Prediction

#

so now do I manually find out outliers in each of the 80 columns and then remove those rows from the dataset? also have i been following the correct method in finding out the outliers?

steady basalt
#

I’ve googled tensorflow tiffle, can’t see anything

#

guys am i tripping did i forget that tensorflow has a file type

#

We will also build the profile of the analyst profession more broadly across national policymakers and central government. This will include accreditation, training, career opportunities, status and pay to match. no fucking shot (UK, NHS)

#

how exactly would one obtain accreditation

leaden nova
#

hi

#

is there a way to convert a unicode to utf-8 character for example &#8217; to '

#

in python

#

if we have in string

wooden sail
#

should be possible to use something like my_string.encode('utf8')

leaden nova
#

nope
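`&#8217;` is an HTML numeric character reference rather than an encoding problem, which is why `str.encode` leaves it untouched. The standard library's `html.unescape` decodes it. Note that code point 8217 is the curly right single quotation mark, so the final `replace` is only needed if a plain ASCII apostrophe is wanted.

```python
import html

raw = "It&#8217;s here"

decoded = html.unescape(raw)  # turns the reference into the actual character
print(decoded)

# Optional: swap the curly quote (U+2019) for a plain apostrophe
ascii_version = decoded.replace("\u2019", "'")
print(ascii_version)
```
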

bold timber
#

Anyone can explain to me why I get a plot like this?

long locust
#

resample will create the bins based on the cyclic data

bold timber
#

ok thank youu

lapis sequoia
#

I meant TF2 File

#

Stupid me

#

I need help turning my python file into a TF2 file

misty flint
#

ah i had a similar problem with dask but it is solvable, but i cant remember how i did it without looking it up and i have work rn - all i can say is it looks like youre close

steady basalt
#

@lapis sequoia can u explain what u mean because afaik tf2 is not a script file type? Do u mean like save model ?

#

You can save a .py file as a text file easily but I’m not sure what a tf2 file is

steady basalt
hollow sentinel
#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel
#

i don't understand why the model isn't pickled

#

unless i have to create the pickle file first and then run the code?

upper spindle
steady basalt
hollow sentinel
#

does anyone know?

#

it could be very slow to run

#

but idk

lapis sequoia
#

im trying to optimise my python file to get more fps

#

so im trying to turn it into a Tensorflow file

#

so that I can put it into the deci platform

steady basalt
#

Frame per second?

#

Are you making a game?

#

Are u trolling?

#

I think ur attempting to save a model not convert a python script to a tensorflow “file”

lapis sequoia
#

its a script

#

that you run on a game

#

when I run it it doesnt give good FPS

#

so im trying to optimize the script

#

the only way I can figure out how to optimize it is by making it into a different file, one that is compatible with that one website

steady basalt
#

Are you doing computer vision in a game?

lapis sequoia
#

its easier to explain in voice call

#

can u call?

steady basalt
#

No but I can read

lapis sequoia
#

ok so basicly

#

I have this script

#

You run it on a game

#

it gives poor FPS

steady basalt
#

U need to explain what the script is

lapis sequoia
#

it is a FOV hack for my game

steady basalt
#

Why do you run it on a game, how does that work?

#

Oh okay

#

And this has what to do with tensorflow?

lapis sequoia
#

I figured

#

of I made it until

#

into*

steady basalt
#

Tensorflow is a library that creates functions for u to do ML

#

it’s not a file type

lapis sequoia
# lapis sequoia

I just want to make it one of these files: onnx, tensorflow, keras

#

because thats when i can optimize it

steady basalt
#

Those are for storing models

lapis sequoia
#

well how do I optimize it then

#

to get better fps

steady basalt
#

Is ur pc good

#

What game is it

#

Increasing ur fov in games is not the fov script that lags you but the game itself having to render more

lapis sequoia
#

my pc is good

#

its a fps game

#

sorta like fortnite

#

it has the same applications as it

#

runs on unity engine

#

I just need help optimizing it

steady basalt
#

If you make your fov script not python, u won’t be able to run it

lapis sequoia
#

it injects itself

steady basalt
#

And what good is even converting the language? It’s not the script, it’s your game

lapis sequoia
#

my friend was able to optimize his

steady basalt
#

Bro what?

#

Accuracy of what?

#

Your frames per second?

lapis sequoia
#

yes

#

nevermind

#

its hard to explain

steady basalt
#

Frames per second is not an accuracy

lapis sequoia
#

i'll figure it out on my own

steady basalt
#

Just to make it clear this isn’t a ML task right?

lapis sequoia
steady basalt
#

You’re not data science?

#

Game dev?

lapis sequoia
#

yes

#

someone told me to come to this channel

#

for help

steady basalt
#

None of this will help you ur being trolled

lapis sequoia
#

My deadline is fucking today

steady basalt
#

Dude

#

Ur friend is trying to make u fail

#

Yikes

#

Improve your games efficiency at rendering

#

This task has nothing to do with ai

#

Ur friend scammed u he’s not getting accuracy scores for this lmao

unique quail
#

how does pandas or matplotlib help in machine learning

#

just crious

#

curious*

wooden sail
#

matplotlib is for plotting, and pandas can help you read files and check out what properties the data has. other than that, not much else. the actual ML is done with other tools and can be done entirely without those 2 libs

grand vapor
#

having an issue with pandas read_csv function, I'm trying to use certain columns of a csv, but I get an error that they are expected but not found.
attached is a screenshot of said csv and the columns I want to extract
here is the code I am running to try to accomplish this:

LORD3DM_100128XY = pd.read_csv(str(PATH100128XY) + "3DM.csv", skiprows=15, usecols=['X Accel [x8004]', 'Y Accel [x8004]', 'Z Accel [x8004]'])

I typically don't have any problems with doing this kind of thing, not sure what's happening here. figured this might be a good place to ask
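Without the screenshot it is hard to say for sure, but this error often means the header names differ slightly from what was typed (leading spaces are a common cause) or that `skiprows` lands on the wrong line. A quick check is to read only the header row and print the names with `repr`. The in-memory CSV below is invented to reproduce the whitespace case.

```python
import io

import pandas as pd

# Hypothetical file: two preamble lines, then a header whose names carry leading spaces
raw = (
    "device: 3DM\n"
    "sample rate: 100\n"
    " X Accel [x8004], Y Accel [x8004], Z Accel [x8004],Temp\n"
    "0.1,0.2,9.8,21\n"
    "0.0,0.3,9.7,21\n"
)

# Step 1: read just the header row to see the exact column names
header = pd.read_csv(io.StringIO(raw), skiprows=2, nrows=0)
print([repr(c) for c in header.columns])

# Step 2: select columns by their stripped names, then clean the labels
wanted = {"X Accel [x8004]", "Y Accel [x8004]", "Z Accel [x8004]"}
df = pd.read_csv(io.StringIO(raw), skiprows=2, usecols=lambda c: c.strip() in wanted)
df.columns = df.columns.str.strip()
print(df)
```
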

sinful surge
#

Anyone know what this tensorflow error means? This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2

#

I dont think it has to do with my code

tidal bough
#

uhh, is it an error?

#

it's just saying your tensorflow binary is compiled with AVX and AVX2 support

steady basalt
#

Would anyone here wana teach me how to binary tree in python? Just the basics such as checking nodes and traversing

sinful surge
#

Oh

steady basalt
#

No matter how many tutorials I watch and can memorize key inputs i still don’t fully grasp it

#

Same for linked lists I get the theory but not oop
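For reference, a bare-bones sketch of the pieces asked about: a node class, a recursive traversal, a membership check, and inversion. It is a plain (unordered) binary tree, not a search tree, and all names are illustrative.

```python
class Node:
    """A binary tree node: a value plus optional left and right children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def in_order(node):
    """Visit left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def contains(node, target):
    """Check every node; no ordering of values is assumed."""
    if node is None:
        return False
    if node.value == target:
        return True
    return contains(node.left, target) or contains(node.right, target)


def invert(node):
    """Swap the children of every node, in place, and return the root."""
    if node is not None:
        node.left, node.right = invert(node.right), invert(node.left)
    return node


root = Node(2, Node(1), Node(3))
print(in_order(root))          # [1, 2, 3]
print(contains(root, 3))       # True
print(in_order(invert(root)))  # [3, 2, 1]
```
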

sinful surge
wheat snow
#

@untold bloom you still here?

untold bloom
#

yes, kind of :p

wheat snow
#

OHHHHH

untold bloom
#

how are you

wheat snow
#

wait

#

Im done with life

#

im stuck at something for days

untold bloom
#

:\

wheat snow
#

here

#

its been rumoring in my mind for days

#

i cant figure out how to do it

untold bloom
#

oh, sorry i didn't respond to that...

wheat snow
#

i want to calculate e.g. 2 different average values for the watchtime ("Duration") with pandas (from 2022-05-02 until 2022-06-02), but i want it to be exactly 2 values: one for the first month (or whatever watch data is available for that month), and the second should be all the watch data available in the second month

untold bloom
#

i usually hang out in other channels and this has a lot of text in between and i tend to forget and not answer then

#

if possible, can you give me some sample input and expected output?

wheat snow
#

hmm yes

#
result=df_vd_E.groupby(df_vd_E["Start Time"].dt.date)["Duration"].sum()
result.index = pd.to_datetime(result.index)
b=(result.loc["2021-04-15": "2021-07-15"].dt.total_seconds()/60/60)

Month= b.mean()
print(Month)

So, rn, it gives me the average of the time from 4-15 until 7-15 (one value). what i want is different: i want it to be one average per month, so one average value (duration) for the rest of month 4, then the full average watchtime duration of month 5, and so on
untold bloom
#

i see, thanks

wheat snow
#

btw

#

thank oyu so much for the help

untold bloom
#

yw

wheat snow
#

for my first pandas project, ig i really need some help

untold bloom
#

didn't help yet, though :p

wheat snow
#

but before you did

untold bloom
#

so as you said, the .mean() gives you a single number: the "global" mean

#

but you want it per month

#

whenever "per" shows up, we tend to go for .groupby

#

what will we group the data by in this case?

#

you want it per month so, month of the data

#

then we take action:

untold bloom
#
b.groupby(b.index.month).mean()
#

since the month information is at the index (right?), we reach it from there

wheat snow
#

the month thing refers to the datatype of it right?

untold bloom
#

uh, not quite

wheat snow
#

so .month automatticly knows that the -04- is a month?

untold bloom
#

yes

#

please observe what print(b.index.month) shows

#

b.index is a DateTime index; it has convenient attributes attached to it

#

.year, month, dayofweek, dayofyear...

untold bloom
#

for 12 months

wheat snow
untold bloom
#

:p

#

one caveat about the code above though:

#

we grouped by the month information only

#

so the year is ignored: any February day will be accounted for the mean of February

#

be it year 1998 data or year 2921 data

#

in your case, i guess this is fine

#

because you have only 2021 data in b

#

but in general...

#

you can do b.groupby([b.index.year, b.index.month]).mean()

#

groupby both year and month

#

so 1988's February and 2012's February are now signaling different groups.

#

before, they were falling into the same, February, group.
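A small made-up series shows the difference between the two groupings. The dates and hour values are invented for illustration.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2021-04-15", "2021-04-20", "2021-05-01", "2021-05-03", "2022-04-10"]
)
hours = pd.Series([1.0, 3.0, 2.0, 4.0, 10.0], index=idx)

# Month only: April 2021 and April 2022 land in the same group
by_month = hours.groupby(hours.index.month).mean()
print(by_month)

# Year and month: each calendar month is its own group
by_year_month = hours.groupby([hours.index.year, hours.index.month]).mean()
print(by_year_month)
```

With month-only grouping, the April mean mixes the 2021 and 2022 values; with both keys, April 2021 averages to 2.0 and April 2022 to 10.0.
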

wheat snow
#

okay,
"2022-02-01": "2022-04-01"

#

Start Time
2 2.563807
3 2.003324
4 2.275278

#

so, how do i now make it so that the rest of the month is counted in as well

untold bloom
#

it is counted in yes

wheat snow
#

ah okay

untold bloom
#

i mean, however many days are in that month, they will be counted

#

be it 1 day or 30, 31

wheat snow
#

okie okie

#

plot looking good so far

bold timber
#

Can anyone help me? Why do I get the same color? How to use different colors in that plot?

wooden sail
#

i wouldn't say they're the same color, but they're pretty close. how about you remove the color parameter?

steady basalt
#

Those are some accurate predictions

#

Model?

bold timber
wooden sail
#

remove the 'r' and 'g' too? just to see what happens

wheat snow
#

@untold bloom
Rapha= df_vd_R['Duration'].dt.total_seconds()/60/60

over here i want to print a single number for the whole watchtime duration of one user, but the output prints every duration for that user... how do i specify that i want the durations of some rows added together
bold timber
wooden sail
#

i wonder too tbh lol

lapis sequoia
#

Hello,
I have a piece of code but each time I run it it takes a long time. I'm guessing it takes a long time because it calls the API every time it runs and gets information from it.

Is there any way to speed it up so that it doesn't take a minute or two each time I run it?

wooden sail
# bold timber like this. But, it makes me wondering why I can't choose my color itself
lapis sequoia
#

Here is my code:

API = aoe.API()

df = pd.read_csv('AoE2_list_of_top_10000_players.csv')

df = df[['profile_id', 'name']]
df['max_rating'] = 0
df['min_rating'] = 0
df['date_of_max_rating'] = 0
df['date_of_min_rating'] = 0
df['difference_in_rating'] = 0


for i in df.head(100)['profile_id']:
    all_ratings = []
    max_rating = 0
    min_rating = 0
    list_of_ratings = API.get_rating_history(profile_id=i, count=5000)
    for j in list_of_ratings:
        if j.get('timestamp') > 1591036131:
            all_ratings.append(j.get('rating'))
        else:
            pass
    max_rating = max(all_ratings)
    min_rating = min(all_ratings)

    df.loc[df['profile_id'] == i, 'max_rating'] = max_rating
    df.loc[df['profile_id'] == i, 'min_rating'] = min_rating
    df.loc[df['profile_id'] == i, 'date_of_max_rating'] = dt.datetime.fromtimestamp(list_of_ratings[all_ratings.index(max_rating)].get('timestamp'))
    df.loc[df['profile_id'] == i, 'date_of_min_rating'] = dt.datetime.fromtimestamp(list_of_ratings[all_ratings.index(min_rating)].get('timestamp'))
    df.loc[df['profile_id'] == i, 'difference_in_rating'] = max_rating - min_rating

print(df.head(100))
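If the time really is going into the API calls (worth confirming by timing one call), one option is to cache each player's history on disk so that later runs only fetch what is missing. This is a sketch under assumptions: `cached_rating_history`, the `fetch` callable, and the `rating_cache` directory are all made up here, and it assumes the API returns JSON-serializable lists of dicts.

```python
import json
from pathlib import Path


def cached_rating_history(profile_id, fetch, cache_dir="rating_cache"):
    """Return one player's rating history, calling fetch only on a cache miss."""
    path = Path(cache_dir) / f"{profile_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    history = fetch(profile_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(history))
    return history


# Possible usage inside the loop above (not run here):
# list_of_ratings = cached_rating_history(
#     i, lambda pid: API.get_rating_history(profile_id=pid, count=5000)
# )
```

The trade-off is stale data: delete the cache directory whenever fresh ratings are needed.
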
steady basalt
#

Ladies and gentleman

#

I am proud to announce

#

I have inverted a binary tree

#

Thanks for all ur support

#

I’m ready to face interview

#

cant wait to apply data structures to uhh... dataframes..

untold bloom
#

not sure what the username column is called but let's say it's called "username"

#

let's first convert the durations to hours and then groupby

#

df.Duration.dt.total_seconds.div(3600).groupby(df["username"]).sum()

#

this gives you the total duration per username (in hours, since we divided the seconds by 3600)

#

which specific username you want, you can index into this to select it, e.g., above_thing.loc["user_1"]

untold bloom
#

if you only want a specific user, first filter the frame and then sum; no groupby is needed then

#

df.loc[df["username"].eq("user_1"), "Duration"].dt.total_seconds().div(3600).sum()
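Both variants on a tiny invented frame, assuming `Duration` is a timedelta column and the user column is called `username`. Note the parentheses on `total_seconds()`: it is a method, so it has to be called before `.div` can be chained.

```python
import pandas as pd

df = pd.DataFrame({
    "username": ["user_1", "user_2", "user_1"],
    "Duration": pd.to_timedelta(["01:30:00", "00:45:00", "00:30:00"]),
})

# Total hours per user
hours_per_user = (
    df["Duration"].dt.total_seconds().div(3600).groupby(df["username"]).sum()
)
print(hours_per_user)

# Total hours for a single user: filter first, then sum
one_user = (
    df.loc[df["username"].eq("user_1"), "Duration"].dt.total_seconds().div(3600).sum()
)
print(one_user)
```
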

misty flint
#

ahhhhhhhhhh

#

where is stel

#

hes probs busy

#

anyway my model doesnt fit even with aws lambda layers + s3

#

rip

#

so the alternative will have to be probably be putting the model and inference code into a docker container

#

then probs deploy using ECS or Fargate or something

#

more AWS services i do not know

wheat snow
#

@untold bloom it often tells me that 'function' object has no attribute .div

#

Traceback (most recent call last):
23118 1.025278
23118 1.025278
23119 0.719444
23120 0.000556
23121 0.019444
Name: Duration, Length: 10293, dtype: float64

#

this is the output btw

#

somehow it prints more....

#

hmm

#

df_vd.loc[df_vd["Profile Name"].eq("Rapha"), "Duration"].dt.total_seconds()/60/60

wheat snow
hollow sentinel
#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

misty flint
misty flint
#

since he is the only one here who feels same way about aws as me

wheat snow
#

its my first data project you gotta know

#

never used pandas before

misty flint
hollow sentinel
#

this is a fun error message

#

federal-gov

#

what the fuck is federal-gov

#

oh i'm being stupid

#

i never one hot encoded anything

#

that's why

misty flint
#

yep that will do it

wheat snow
#

@untold bloom ?

hollow sentinel
#

don't ping ppl asking for help

wheat snow
misty flint
wheat snow
misty flint
wheat snow
misty flint
#

ehh sometimes its topical chat

wheat snow
misty flint
#

we just get too many newbies

wheat snow
#

/Help

#

see

misty flint
#

Topical Chat/

#

see

wheat snow
#

ye, so it isn't prohibited to ask for help

last salmon
#

um guys I currently want to undertake a project that involves an ai classifying images shown to it and it getting better at doing so over time

#

how would i go about doing that?

#

all ik is show the ai data

#

train it

#

over time

#

and results

misty flint
#

otherwise you will have to wait till peeps are online

wheat snow
last salmon
#

can you guys help me

#

or do you need help like me lol

misty flint
#

hmm you should try to break down your problem so that someone with no context can understand it then

hollow sentinel
#

sometimes i've asked in help channels and then people who don't know pandas try to hop in and help

misty flint
misty flint
#

i have to go now

#

peace

wheat snow
#

@eternal trench

hollow sentinel
#
models = []

models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier())) 
models.append(('NB', GaussianNB())) 
models.append(('SVM', SVC()))
results = []
names = []

validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y,
    test_size=validation_size, random_state=seed)

import category_encoders as ce

encoder = ce.OneHotEncoder(cols=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 
                                 'race', 'sex', 'native-country'])

X_train = encoder.fit_transform(X_train)

X_test = encoder.transform(X_test).reshape(14)

for name, model in models:
  kfold = KFold(n_splits=10, random_state=seed, shuffle = True)
  cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
  results.append(cv_results)
  names.append(name)
  msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
  print(msg)
#
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-bb069de3b56c> in <module>()
     22 X_train = encoder.fit_transform(X_train)
     23 
---> 24 X_test = encoder.transform(X_test).reshape(14)
     25 
     26 for name, model in models:

1 frames
/usr/local/lib/python3.7/dist-packages/category_encoders/utils.py in _check_transform_inputs(self, X)
    321         # then make sure that it is the right size
    322         if X.shape[1] != self._dim:
--> 323             raise ValueError(f'Unexpected input dimension {X.shape[1]}, expected {self._dim}')
    324 
    325     def _drop_invariants(self, X: pd.DataFrame, override_return_df: bool) -> Union[np.ndarray, pd.DataFrame]:

ValueError: Unexpected input dimension 108, expected 14
#

sigh

#

i have no clue what to do here

wheat snow
hollow sentinel
#

who is jock

#

yeah idk how to fix this

#

.reshape maybe?

untold bloom
#

if not, what's the error?

hollow sentinel
#

i'm stumped

#

maybe use .getdummies instead?

#

idk what to do here

untold bloom
hollow sentinel
#
(26048, 14)
(9769, 108)
untold bloom
#

so something bad happened after train_test_split and that point

#

because train_test_split won't mess up with the number of features (i.e., number of columns)

wooden sail
#

what's the shape of the original X?

hollow sentinel
#

i guess it's the encoder

#
(32561, 14)
untold bloom
#

my guess is: you already transformed X_test sometime before; now it's as if you're trying to transform again

#

are you working with JupyterLab/Notebook?

untold bloom
hollow sentinel
#

yeah

#

oh i see

#

yeah i noticed on github people were using .ipynb

#

so i decided to use google colab

wooden sail
#

hmm my impression is that x train transform does the transformation implicitly, but does not change the variable in place

#

so that it might not be necessary to encode x test at all

hollow sentinel
#

but if i don't do that i get a weirder error

untold bloom
#

no, X_train and X_test are different entities

wooden sail
#

i'm aware

untold bloom
#

if a transformation happened to X_train, that should happen to X_test as well

wooden sail
#

what error do you get if you don't transform x test?

hollow sentinel
#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel
untold bloom
#

a piece of advice: in these Jupyter-like environments, it helps to make new variables whenever possible (and when it makes sense) instead of assigning to the same name

hollow sentinel
#

bc these are categorical features

untold bloom
#

e.g., you could do X_test_encoded = encoder.transform(X_test) above and that error wouldn't have happened

#

similar for X_train_encoded = ...

hollow sentinel
#

i see

#

i still get that same error after making X_train_encoded and X_test_encoded

untold bloom
#

possible; X_test has already been transformed to have got 108 features; trying again to transform it will error

#

perhaps restart the kernel

wooden sail
#

i find it weird that the number of examples in x train and x test doesnt add up to the total in x, too

#

yeah, restart the kernel first

#

then show again the original sizes of x, xtest, and x train before any transformation is applied

hollow sentinel
#
models = []

models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier())) 
models.append(('NB', GaussianNB())) 
models.append(('SVM', SVC()))
results = []
names = []

validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y,
    test_size=validation_size, random_state=seed)

import category_encoders as ce

encoder = ce.OneHotEncoder(cols=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 
                                 'race', 'sex', 'native-country'])

X_train = encoder.fit_transform(X_train)

X_test = encoder.transform(X_test).reshape(14)

for name, model in models:
  kfold = KFold(n_splits=10, random_state=seed, shuffle = True)
  cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
  results.append(cv_results)
  names.append(name)
  msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
  print(msg)
#

i don't see an X_test here

wooden sail
#

x train and x validation, then

#

where was x test defined, then, btw?

hollow sentinel
#

apparently nowhere

wooden sail
#

doesn't seem to come from splitting of the data

#

nice

hollow sentinel
#

maybe that's why it wasn't working

wooden sail
#

my best guess is they meant validation

hollow sentinel
#
(26048, 14)
#

i tried to print X_validation but nothing showed up

#

actually hold on

#

[6513 rows x 14 columns]

wooden sail
#

that makes sense

#

then the first dim of those two adds up to X's first dim

hollow sentinel
#

so then what's wrong with the one hot encoding

wooden sail
#

well, swap out the nonexistent variable with one that exists and see what error we get (i.e. x test <- x validation)

#

however, as nahita suggested, i strongly suggest you don't modify x train and x validation in place, but make another variable instead and use those

#

because as it turns out, jupyter is terrible for debugging

#

if you make changes in place, you might need to rerun the whole code from the beginning

hollow sentinel
#
models = []

models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier())) 
models.append(('NB', GaussianNB())) 
models.append(('SVM', SVC()))
results = []
names = []

validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y,
    test_size=validation_size, random_state=seed)

print(X_train.shape)
print(X_validation)

import category_encoders as ce

encoder = ce.OneHotEncoder(cols=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 
                                 'race', 'sex', 'native-country'])

X_train_encoded = encoder.fit_transform(X_train)

X_validation_encoded = encoder.transform(X_validation)


for name, model in models:
  kfold = KFold(n_splits=10, random_state=seed, shuffle = True)
  cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
  results.append(cv_results)
  names.append(name)
  msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
  print(msg)
#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel
#

looks like one hot encoding didn't work

wooden sail
#

needed to use x_train_encoded below

wheat snow
#

i received help

#

btw @untold bloom it worked... your thing ( i was just missing some () somewhere)

hollow sentinel
#

omg it's WORKING

wooden sail
#

cool

hollow sentinel
#

why do people use .ipynbs on github

wooden sail
#

then the issue was wrong variable names (test ~ val) and running the code out of order (jupyter)

#

because github can render the notebooks and they might be "pretty to look at"

#

especially when some nice latex typesetting is used

#

but you'd never use jupyter notebooks for real work 😛

#

it's a nice display tool, not good for development nor deployment though

hollow sentinel
#

i like thonny

#

but i thought my github looked weird with everything as a .py file

wooden sail
#

right

hollow sentinel
#

i did see people showcase their eda projects with notebooks

#

jupyter notebook actually broke on my mac and i can't even reopen it anymore

wooden sail
#

yeah, so ideally you'd put all your nice modules into .py, and then make a slick demo in a jupyter notebook

hollow sentinel
#

it's been like that for months

#

i see

#

welp third project

wooden sail
#

congrats

hollow sentinel
#

i am slowly learning this stuff

#

project based learning works

#

even if it's just regression and classification

#

idk if it's enough to turn heads for a portfolio yet, but it's a start

wooden sail
#

i can't speak about portfolios, but yeah, motivation comes from within. that means if you find a nice thing you're interested in, you'll have the motivation to see it through. that's a big factor in learning: actually practicing what you're learning, and you won't practice it if you're not interested/motivated

hollow sentinel
#

i find it hard to come up with portfolio projects

#

but i'll get there

#

one step at a time

hoary breach
#

can anyone help me with a problem involving SpaCy?

primal shuttle
#

@hoary breach ask your question, don't ask to ask 🙂

serene scaffold
primal shuttle
#

... 😉

hoary breach
#

in spacy you can use similarity for some data

serene scaffold
primal shuttle
#

dontasktoask . com 😉

#

There is that

hoary breach
#

I caught a snag... (AttributeError: 'str' object has no attribute 'similarity')

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

hoary breach
serene scaffold
#

what you've shown us is just the last line of a larger error message. the last line isn't very useful in itself.

arctic wedgeBOT
#

Hey @hoary breach!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold
#

no one is going to download and run your notebook. please just copy and paste the relevant code.

hoary breach
#

train_data = nlp(data)

#

print(data['parsed_doc'][0].similarity(data['parsed_doc'][1]))

serene scaffold
#

so, data['parsed_doc'][0] is a string

primal shuttle
#
train_data = nlp(data)
print(data['parsed_doc'][0].similarity(data['parsed_doc'][1]))
serene scaffold
#

what type of object do you expect to have a similarity method?

#

or is it a function from some spacy module?

hoary breach
#

it is a Spacy object

#

import pandas as pd
data = pd.read_csv('panel_discussion.csv')

serene scaffold
#

do print(type(data['parsed_doc'][0])) and you'll see.
spacy is the name of the library. "spacy" is not a type of object.
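
#

one likely reason for the `str`, sketched with the standard library only: anything written to a csv is stored as text, so objects come back as plain strings when the file is read again. `FakeDoc` below is a hypothetical stand-in, not a real spaCy class, and its word-overlap score is only a placeholder. this assumes the `parsed_doc` column was saved to csv and loaded back, which pandas' `read_csv` would treat the same way.

```py
import csv
import io


class FakeDoc:
    """Hypothetical stand-in for a parsed document object (not a spaCy class)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def similarity(self, other):
        # Placeholder score: share of words the two texts have in common.
        a = set(self.text.lower().split())
        b = set(other.text.lower().split())
        return len(a & b) / len(a | b)


docs = [FakeDoc("how do I get a cat"), FakeDoc("how do I obtain a pet")]
print(type(docs[0]))  # the in-memory object still has .similarity

# Writing to CSV keeps only the text form of each object.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["parsed_doc"])
for doc in docs:
    writer.writerow([doc])

# Reading it back yields plain strings, which have no .similarity method.
buffer.seek(0)
reloaded = [row["parsed_doc"] for row in csv.DictReader(buffer)]
print(type(reloaded[0]))
print(hasattr(reloaded[0], "similarity"))
```

if that is what happened, the text has to go through `nlp(...)` again after loading before `.similarity` is available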

hoary breach
#

it is a str class

primal shuttle
#

Yup

serene scaffold
#

what you mean is "it's an instance of str" or "it's a str". it's not a "str class". these distinctions matter.

primal shuttle
#

So you cannot compare strings in terms of similarity

hoary breach
#

similarity is a built in function in spacy

primal shuttle
#

Yes, that operates on vectors, not strings
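
#

To make "operates on vectors" concrete, here is a rough sketch of cosine similarity in plain Python. As far as I know spaCy's default similarity is a cosine over the document vectors, but treat that as an assumption. The three vectors are made up for illustration, and returning 0.0 for an all-zero vector is just a convention chosen here.

```py
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "document vectors", purely for illustration.
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]  # same direction as vec_a
vec_c = [0.0, 0.0, 3.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

A string has no such vector attached, which is why calling `.similarity` on a `str` fails.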

hoary breach
#

parsed doc refers to the column of the data

#

Why is it then that they use the columns and can in fact calculate it.

#

Is it that once you print a dataframe you cannot print it and import the csv code and then use similarity?

#

relevant: tokens = []
lemma = []
pos = []
parsed_doc = []
col_to_parse = 'Q1'

for doc in nlp.pipe(data[col5_to_parse].astype('unicode').values, batch_size=1,
n_process=1):
if doc.has_annotation("DEP"):
parsed_doc.append(doc)
tokens.append([n.text for n in doc])
lemma.append([n.lemma_ for n in doc])
pos.append([n.pos_ for n in doc])
else:
# We want to make sure that the lists of parsed results have the
# same number of entries of the original Dataframe, so add some blanks in case the parse fails
tokens.append(None)
lemma.append(None)
pos.append(None)
data['parsed_doc'] = parsed_doc
data['comment_tokens'] = tokens
data['comment_lemma'] = lemma
data['pos_pos'] = pos

primal shuttle
#
relevant: tokens = []
lemma = []
pos = []
parsed_doc = [] 
col_to_parse = 'Q1'
col2_to_parse = 'Q2'
col3_to_parse = 'Q3'
col4_to_parse = 'Q4'
col5_to_parse = 'AddQ'
col6_to_parse = 'LastQ'


for doc in nlp.pipe(data[col5_to_parse].astype('unicode').values, batch_size=1,
                        n_process=1):
    if doc.has_annotation("DEP"):
        parsed_doc.append(doc)
        tokens.append([n.text for n in doc])
        lemma.append([n.lemma_ for n in doc])
        pos.append([n.pos_ for n in doc])
    else:
        # We want to make sure that the lists of parsed results have the
        # same number of entries of the original Dataframe, so add some blanks in case the parse fails
        tokens.append(None)
        lemma.append(None)
        pos.append(None)
data['parsed_doc'] = parsed_doc
data['comment_tokens'] = tokens
data['comment_lemma'] = lemma
data['pos_pos'] = pos
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

hoary breach
#

sweet

#

i am in voice chat if someone cares to help more

#

so from my understanding the columns (like parsed doc) get appended to a pandas dataframe

#

I printed the data out and imported a csv.

#

is it that pandas dataframe presents data as a vector so that you can use similarity?

#

the example they provide is this

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

hoary breach
#
```py
doc2 = nlp("How do I obtain a pet?")
doc1.similarity(doc2)
```
serene scaffold
#

what would you like to say about this?

blissful bone
serene scaffold
blissful bone
#

Happy?

serene scaffold
hoary breach
#

when I run the code through the 'doc' class (posted above) I get <class 'spacy.tokens.doc.Doc'>

#

which is good! however my issue is i wanted to get multiple columns involved

#

so I parsed each part individually... but that results in printing a 'str'

hoary breach
#

I made a workaround instead but thanks for the help

hoary breach
#

🙂

hollow sentinel
#

where can i get help w selenium?

serene scaffold
hollow sentinel
#

shit, i think i just scraped a website i wasn't supposed to

#

and proceeded to get banned

misty flint
misty flint
hollow sentinel
misty flint
#

dang

#

what if you used sleep() / wait()
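
#

something like this is what i mean, as a rough sketch. `fetch_politely` and `fake_fetch` are hypothetical names, and the fetch function is passed in so the example runs without touching any real site. a pause between requests only lowers the load on the server, it wont get you past dedicated bot protection, and the sites terms still apply

```py
import time


def fetch_politely(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each url, pausing between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages.append(fetch(url))
    return pages


def fake_fetch(url):
    # Stand-in for a real download (e.g. a Selenium page load).
    return "<html>%s</html>" % url


pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b"],
    fake_fetch,
    delay_seconds=0.01,
)
print(len(pages))
```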

hollow sentinel
#

no they use something called

#

imperva

#

which sounds like a harry potter spell but it's this thing that blocks scrapers

misty flint
#

it really does

hollow sentinel
#

yep

#

some companies don't love getting scraped

#

ESPN let me scrape them

misty flint
#

im surprised espn doesnt have an api

#

or do they

#

oh hey they do

#

you can just grab the data from here

#

fun data engineering times

hollow sentinel
#

idk how to do api requests

#

time to learn

misty flint
#

its okay you can usually google those and its a good skill to have

misty flint
#

like if you added the ability to work with APIs to your projects, that would def go a long way imo

#

since more and more places require calling APIs for collecting data

#

nowadays
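
#

heres roughly what an api request looks like with just the standard library. the endpoint, the query parameters and the json payload are all made up for illustration (this is not espns actual api), and the network call itself is left as a comment so the sketch runs offline

```py
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params):
    """Build a GET request with URL-encoded query parameters."""
    query = urlencode(params)
    return Request(base_url + "?" + query, headers={"Accept": "application/json"})


# Hypothetical endpoint and parameters.
request = build_request("https://api.example.com/v1/scores", {"league": "nba", "limit": 2})
print(request.full_url)

# A real call would look like this (commented out so nothing is sent):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     payload = json.load(response)

# Made-up response body standing in for what the server might return.
sample_body = '{"games": [{"home": "A", "away": "B", "home_score": 101, "away_score": 99}]}'
payload = json.loads(sample_body)

# Flatten the nested JSON into simple rows for later analysis.
rows = [
    (game["home"], game["away"], game["home_score"] - game["away_score"])
    for game in payload["games"]
]
print(rows)
```

most apis follow the same pattern: build a url with parameters, send it, parse the json that comes back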

hollow sentinel
misty flint
#

that looks pretty comprehensive