#data-science-and-ml


jolly ginkgo
#

try plotly

#

and seaborn

lapis sequoia
#

I'm just asking what "mat" stands for rooThink1

vale crown
#

I think it stands for "MATLAB", but I'm not sure. Matplotlib looks like MATLAB's plotting system and was created to resemble it.

lapis sequoia
#

Sounds correct rooYes Matrix laboratory bearStudy

grave frost
#

uh-huh

#

is matplotlib short for math or matrix pithink
Does it honestly matter?

lapis sequoia
#

I enjoy etymology ruwu

mint obsidian
#

hey um my question wasn't answered in help so I figured maybe I could ask here?
so I'm new... if there were 5 columns for each person, is there a way to add a new column that counts how many columns were filled for each person?
considering the dataset comes from Excel that is

dapper halo
#

So for missing data....is there a way to have a sort of initialization layer that will deactivate an input neuron based on a specific value?

sharp turret
#

Hey fellows, if I were to run kmeans.predict() on a really large dataset, would it be faster to give it one huge numpy array, or a series of smaller numpy arrays?

mint obsidian
dapper halo
# mint obsidian I'm not sure if I understood you correctly... as I said I just started on Python...

The second sentence was my own question haha, so ignore that.

My suggestion for you was just to sum the values across each row. Maybe normalize them or something by their own values if you just want a total count. I'm sure there are better ways to do it. Also not sure what your missing values are labeled as. Whether or not this is the appropriate channel to ask that question...i have no idea haha.

exotic maple
velvet thorn
#

show a sample of your data please

velvet thorn
#

well

#

because the "mat" comes from MATLAB

#

so, yes, matrix laboratory

exotic maple
mint obsidian
# velvet thorn yes it is

I'm not sure how to do that but in case it helps, each row is a person with 43 columns, 5 of them are diagnoses (because max diagnoses in this data set is 5)
I just want to know how many diagnoses a person has (how many of the 5 columns are filled)

#

thanks for taking the time to answer btw

velvet thorn
#

like null

mint obsidian
#

do I just send a screenshot? but the data is confidential

#

yes

velvet thorn
#

for two reasons

#
  1. confidential
#
  2. screenshots in general are bad because hard to read + non-reproducible
#

data as text is better

#

anyway

#

I'm assuming

#

you're working with your data in Python

#

using pandas

#

is that correct?

mint obsidian
#

haha I'm not sure how to do that yet, I just started Python like 3 days ago

velvet thorn
#

or openpyxl?

mint obsidian
#

yes and numpy

velvet thorn
#

okay

#

what's your dataframe called?

mint obsidian
#

i just called it excel lol

velvet thorn
#

excel.notna().sum(axis=1) should get you the column you want

velvet thorn
#

fundamentals are important

mint obsidian
#

I know

velvet thorn
mint obsidian
#

but the data set is very messy, I had to split the diagnoses from a single cell separated with either comma or ;

#

my colleagues work with only excel and that doesnt cut it so....

#

anyway, wouldnt the code u gave me count also the other columns that are not diagnoses?

velvet thorn
#

that counts the number of non-null values in each row

mint obsidian
#

there are 43 columns in each row, only 5 of them are diagnoses

velvet thorn
#

oh

#

well

#

then subset the DataFrame first
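
A minimal sketch of the subsetting step, assuming the five diagnosis columns are named `diag_1` through `diag_5`. Those names and the toy rows are hypothetical; the real sheet will use its own column names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; column names are hypothetical.
excel = pd.DataFrame({
    "patient": ["A", "B", "C"],
    "age": [34, 51, 47],
    "diag_1": ["flu", "asthma", np.nan],
    "diag_2": ["cold", np.nan, np.nan],
    "diag_3": [np.nan, np.nan, np.nan],
    "diag_4": [np.nan, np.nan, np.nan],
    "diag_5": [np.nan, np.nan, np.nan],
})

diag_cols = ["diag_1", "diag_2", "diag_3", "diag_4", "diag_5"]

# Subset to the diagnosis columns first, then count non-null cells per row.
excel["n_diagnoses"] = excel[diag_cols].notna().sum(axis=1)
print(excel[["patient", "n_diagnoses"]])
```

Cells that hold an empty string rather than a true missing value would still be counted as filled, so those may need converting to NaN beforehand.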

mint obsidian
#

I want to know how many of the 5 is filled

#

yeah I figured

#

thank you for the pointers

velvet thorn
#

yw

#

feel free to ask further questions

mint obsidian
#

and sorry for that guy who didnt get his question answered yet

sharp turret
#

Hahah it's alright bro I figured it out experimentally

lavish tundra
#

if u are a expert about pandas help me in #help-ramen pls
its a really hard problem

dapper halo
#

typically you standardize data so if the domains of each input are different, they'll still be weighted equally. But if you want certain features to have a higher weight..is it common practice to maybe standardize particular features but leave others as raw? Or is this an inappropriate way to deal with this?
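
For reference, a small sketch of what standardizing does, using z-scores on two made-up features with very different ranges. The numbers are hypothetical and only illustrate the scaling.

```python
import numpy as np

# Two hypothetical features on very different scales.
X = np.array([
    [1.0, 1000.0],
    [2.0, 3000.0],
    [3.0, 5000.0],
])

# Z-score each column: subtract its mean, divide by its standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

print(X_std)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates purely because of its units.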

dapper halo
# velvet thorn why do you want that

My data physically has a higher dependency on certain parameters. Although thinking about it more shouldnt matter and should standardize them all. Idk not much forward progress has been made in a couple weeks and im just tryna think of stuff I could try and see if it potentially benefits my model. Still know very little with ML in general...so kinda the whole..if you know nothing everything is fair game whether its valid or not haha

velvet thorn
#

if you already knew

#

how to weight your parameters

dapper halo
velvet thorn
#

you wouldn't be building a model, right

dapper halo
#

this is true.

floral mantle
#

In pandas dataframe I have a huge DF and my data has multiple formats for name. I want to grab the first instance and drop the rest...

Ex:
ID Name
1 Sean S
1 Sean—
1 Sean
2 Bob

#

Not sure how and it’s killing me. I’d do MAXIF in excel

velvet thorn
floral mantle
#

Honestly I don’t care which one I grab

velvet thorn
#

like

floral mantle
#

I just can’t dump duplicates unless I pick one

velvet thorn
#

based on that

#

what do you want the result to be?

#

deduplicate based on the ID column?

floral mantle
#

I could drop the name column and deduplicate but then I need to bring a name back for reference

#

Without duplicates, if this makes sense

floral mantle
#

So
1 Sean S
2 Bob

Would be a good result

velvet thorn
#

okay

#

df.drop_duplicates(subset=['ID'])

floral mantle
#

Ok

#

So not sure how that works but yeah

#

That did what I wanted

#

That’s awesome. Thanks @velvet thorn

#

Pandas has some weird voodoo commands

velvet thorn
dapper halo
#

built for convenience

velvet thorn
#

but whether rows are duplicated

#

will be determined solely

#

by the values in the columns in subset

#

which is just ID

floral mantle
#

Oh that’s genius. So basically it didn’t care about the name at all

velvet thorn
#

so of each group of rows with the same ID value, the first will be taken
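
The behaviour described here can be checked on the sample rows from the question. This is a small sketch; `keep="first"` is the default and is spelled out only for clarity.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "Name": ["Sean S", "Sean—", "Sean", "Bob"],
})

# Only the ID column decides what counts as a duplicate;
# keep="first" (the default) keeps the first row of each ID group.
deduped = df.drop_duplicates(subset=["ID"], keep="first")
print(deduped)
```

Passing `keep="last"` would keep the last row of each group instead, and `keep=False` would drop every row whose ID appears more than once.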

floral mantle
#

And it just said screw you — take the first one you dumb python guy

#

I love it

velvet thorn
#

I would suggest you look @ the documentation for that function

#

plenty of options

floral mantle
#

Definitely will. I’m rewriting some long processes for my team to run more consistently

#

Getting out of crazy excel sheets

#

That was really helpful. I’ll dig into the docs more but needed this one asap

#

👍

lavish tundra
#

i only need to figure out now how to replace a str value in the dict's

brittle wing
#

is it possible to quantize a tensorflow(keras) model without converting to tflite

floral mantle
#

How about a pd function to add a column for earliest date?

Example: dataframe with all invoices paid by an account. I want the earliest since that’s customer start date.

Using:
loc now to filter to active accounts only, then .groupby(account ID) .agg(invoice date, min)

#

Sorry for the formatting... on mobile
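
One way to get the earliest date as a new column, sketched under assumptions: the column names `account_id`, `active`, and `invoice_date` are hypothetical, and `transform("min")` is used instead of `.agg` so the result lines up with the original rows.

```python
import pandas as pd

# Hypothetical invoice table; column names are assumptions.
invoices = pd.DataFrame({
    "account_id": [1, 1, 2, 2, 2],
    "active": [True, True, True, True, True],
    "invoice_date": pd.to_datetime([
        "2021-03-01", "2020-11-15", "2021-01-10", "2021-02-20", "2019-07-04",
    ]),
})

# Filter to active accounts, as in the .loc step described above.
active = invoices.loc[invoices["active"]].copy()

# transform("min") returns one value per original row, so it can be
# assigned straight back as a new column.
active["start_date"] = active.groupby("account_id")["invoice_date"].transform("min")
print(active)
```

A `.groupby(...).agg(...)` call would instead return one row per account, which would then need to be merged back onto the invoice rows.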

stuck socket
#

sean

#

@floral mantle r u there?

floral mantle
#

@stuck socket yes

stuck socket
#

how would u do to grab numbers from a column in this way: 1,2,3,4--2,3,4,5--3,4,5,6.. etc

worn bough
#

If the basic Series is 1,2,3,4 you can just do series+1

#

If you want to shift the Series, you can use series.shift(1)
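
The question is ambiguous, so this sketch shows both suggestions plus one more reading of it: overlapping windows of four consecutive values. The window interpretation is an assumption about what was being asked.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 6])

# Adding a scalar shifts every value: 1,2,3,4,... becomes 2,3,4,5,...
plus_one = series + 1

# shift(1) moves values down one position and leaves NaN in the first slot.
shifted = series.shift(1)

# Overlapping windows of four consecutive values, built with plain slicing.
window = 4
windows = [
    series.iloc[i:i + window].tolist()
    for i in range(len(series) - window + 1)
]
print(windows)
```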

slim jackal
#

Is anyone a data analyst here?

#

I need some help.

velvet thorn
#

unless you’re asking how to combine the two

#

in which case .merge

tropic junco
#

HOW DO I MAKE A CHAT BOT

#

@scarlet wasp

#

i need help

grave frost
tropic junco
#

ok

austere swift
#

I can't seem to understand why they would use torch.gather

#

also

#

the code just doesnt work

#

brings up all sorts of cuda errors

grave frost
austere swift
#

lol yeah i've used pytorch for a while now but I'm just now getting into more advanced NLP concepts

#

like attention and stuff

grave frost
#

but seriously, are you not using lightning?

austere swift
#

for some reason i just don't really like lightning that much tbh

#

plus i'm more used to normal pytorch

#

but i don't understand why they would give a tutorial code that literally just doesn't work lol

grave frost
#

wheres the error BTW?

#

oof, that's so programming heavy

austere swift
#

well with the original function they gave me it had the error RuntimeError: Size does not match at dimension 0 expected index [32, 1] to be smaller than src [1, 32] apart from dimension 1

#

I did a bunch of stuff to fix it but then I gave up and went back to the original one cus the stuff i did to fix it just ended up giving me a ton of cuda errors

#

theres also the issue that if i try to print any of those tensors my pc blue screens entirely

#

so i can't even visualize what it's doing :)

#

I don't know why it's doing that though

#

very weird

austere swift
#

its on the gather function

grave frost
#

for CUDA errors, switch to CPU to have a clearer traceback 🤷

grave frost
#

basically, your guess is as good as mine - you passed out of bound index to the index arg of torch.gather

#

S.O has a really nice explanation, if you want to understand torch.gather
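
As a rough illustration of what `torch.gather` does, here is a tiny example that picks one column per row. The tensors are made up and are not taken from the tutorial being discussed.

```python
import torch

# Hypothetical scores: 3 rows (a batch) by 4 columns (choices).
src = torch.arange(12.0).reshape(3, 4)

# One column index per row; shape [3, 1] matches src on dimension 0.
index = torch.tensor([[0], [2], [3]])

# Along dim=1, picked[i][j] = src[i][index[i][j]].
picked = src.gather(1, index)
print(picked)
```

In the error quoted above, the index has shape [32, 1] while the source has shape [1, 32]. With `dim=1`, the index may not be larger than the source on dimension 0, which is what the message reports.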

scenic wedge
#

Hey guys, im using a keras densely connected net and when i convert my labels to categorical i keep getting the error "TypeError: 'NoneType' object is not callable" when i run model.fit(). Anyone know why this is happening?

ripe forge
#

Full trace back? Sounds like something that shouldn't be a None is a None

scenic wedge
#

this happens only when i change the labels to categorical and the loss function to categorical crossentropy

#

Oh, nevermind its fixed somehow after i compiled the model again. Thanks anyways!

misty thicket
#

who good with pandas?

serene scaffold
#

Asking the actual question is going to be a lot faster than asking who knows about the topic of an unasked question.

misty thicket
#

like I need constant help

#

VC?

serene scaffold
#

Can you isolate a specific question for the moment?

misty thicket
serene scaffold
misty thicket
#

got its unique values too

serene scaffold
misty thicket
#

like I wanna asign a unique number to that unique value

serene scaffold
misty thicket
#

but for the same value it should be same

serene scaffold
#

alright. so make a dictionary mapping the unique values to numbers with enumerate, and then use the .replace method

misty thicket
#

thanks a lot

serene scaffold
#

No problem

misty thicket
serene scaffold
misty thicket
#

like

#

I wanna convert every single one to a number

#

but

#

abcd exists twice

#

will

#

the .replace convert it too?

serene scaffold
#

it would replace both instances of abcd with the same value

#

it just looks up abcd in the dict that you provide

misty thicket
#

can you give a code for like this example please

#

if you can*

serene scaffold
#
unique_vals = df['column'].unique()
num_mapping = dict(enumerate(unique_vals))
#

This is the first part

misty thicket
#
['M Chinnaswamy Stadium' 'Punjab Cricket Association Stadium, Mohali'
 'Feroz Shah Kotla' 'Eden Gardens' 'Wankhede Stadium'
 'Sawai Mansingh Stadium' 'Rajiv Gandhi International Stadium, Uppal'
 'MA Chidambaram Stadium, Chepauk' 'Dr DY Patil Sports Academy' 'Newlands'
 "St George's Park" 'Kingsmead' 'SuperSport Park' 'Buffalo Park'
 'New Wanderers Stadium' 'De Beers Diamond Oval' 'OUTsurance Oval'
 'Brabourne Stadium' 'Sardar Patel Stadium, Motera' 'Barabati Stadium'
 'Vidarbha Cricket Association Stadium, Jamtha'
 'Himachal Pradesh Cricket Association Stadium' 'Nehru Stadium'
 'Holkar Cricket Stadium'
 'Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium'
 'Subrata Roy Sahara Stadium'
 'Shaheed Veer Narayan Singh International Stadium'
 'JSCA International Stadium Complex' 'Sheikh Zayed Stadium'
 'Sharjah Cricket Stadium' 'Dubai International Cricket Stadium'
 'Maharashtra Cricket Association Stadium'
 'Punjab Cricket Association IS Bindra Stadium, Mohali'
 'Saurashtra Cricket Association Stadium' 'Green Park'
 'M.Chinnaswamy Stadium' 'MA Chidambaram Stadium' 'Arun Jaitley Stadium'
 'Rajiv Gandhi International Stadium'
 'Punjab Cricket Association IS Bindra Stadium'
 'MA Chidambaram Stadium, Chepauk, Chennai' 'Wankhede Stadium, Mumbai']
serene scaffold
#

Did you look at the docs for .replace?

misty thicket
serene scaffold
#

do you understand what dict(enumerate(unique_vals)) does?

misty thicket
#

but it converts to dict I see

#

I know what dict does

serene scaffold
#

!e

letters = 'abcdefg'
stuff = list(enumerate(letters))
print(stuff)
print(dict(stuff))
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

001 | [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g')]
002 | {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g'}
serene scaffold
#

@misty thicket enumerate gives you tuples of ints and items from whatever iterable you pass to it

serene scaffold
#

and you can put those in a dict as key-value pairs

misty thicket
#

and it converts to dict by dict method

#

yea ok got it thanks

serene scaffold
#

it's more a function than it is a method

misty thicket
#

ok

misty thicket
#

then I use this right?

serene scaffold
#

It occurs to me that I may have given you backwards instructions

misty thicket
#

😐

serene scaffold
#
unique_vals = df['column'].unique()
num_mapping = {v: k for k, v in enumerate(unique_vals)}
#

it's a small fix.
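
Putting the two steps together, a minimal end-to-end sketch. The column name and values are made up, and `.map` is used here in place of the `.replace` call mentioned earlier; both look each value up in the dict.

```python
import pandas as pd

df = pd.DataFrame({"column": ["abcd", "efgh", "abcd", "ijkl"]})

unique_vals = df["column"].unique()

# Map each distinct value to an integer: {'abcd': 0, 'efgh': 1, 'ijkl': 2}
num_mapping = {v: k for k, v in enumerate(unique_vals)}

# .map looks every value up in the dict; assign the result back because
# the original column is not modified in place.
df["column_id"] = df["column"].map(num_mapping)
print(df)
```

Both occurrences of `abcd` receive the same number, and the same dict can be reused on other columns that should share the numbering.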

serene scaffold
misty thicket
#

like there are 5-6 columns

serene scaffold
misty thicket
#

yea

#

same numbers

serene scaffold
#

most data frame methods return copies

#

you have to specify when you want to change a dataframe in place

misty thicket
serene scaffold
#
>>> a = pd.DataFrame([[1, 2], [3, 4]])
>>> a
   0  1
0  1  2
1  3  4
>>> b = pd.DataFrame([[5, 6], [7, 9]])
>>> b
   0  1
0  5  6
1  7  9
>>> a.add(b)  # returns a new dataframe, does NOT change a
    0   1
0   6   8
1  10  13
>>> a  # a is the same
   0  1
0  1  2
1  3  4
#

most dataframe methods are the same way. it does whatever change you wanted to a copy rather than changing the one that you already have.

misty thicket
#

ok nvm I'll figure it out

#

thanks for the help bud

lavish tundra
#

i have one pkl file like the img
i'm trying to replace the values inside the dict from "'s" to "s", but the values are dicts and i need to keep the type as dict
i know if i want change the keys i can use:

di['Name'] = di['Name'].map(lambda d: {k if k != 'EN-US' else 'en': v for k, v in d.items()})

but how about change the values?
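
The same dict-comprehension pattern can edit values instead of keys. This is a sketch on invented data, assuming every value in the dicts is a string.

```python
import pandas as pd

# Hypothetical stand-in for the pickled data: each cell holds a dict
# keyed by language code.
di = pd.DataFrame({
    "Name": [
        {"EN-US": "Hero's Sword", "DE-DE": "Heldenschwert"},
        {"EN-US": "King's Crown", "DE-DE": "Koenigskrone"},
    ]
})

# Rebuild each dict with the same keys and edited values, so the cell
# type stays dict.
di["Name"] = di["Name"].map(
    lambda d: {k: v.replace("'s", "s") for k, v in d.items()}
)
print(di["Name"].tolist())
```

Further word replacements could be chained inside the comprehension with additional `.replace(...)` calls.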

topaz obsidian
#

hiii!!

serene scaffold
lavish tundra
#

ye

#

but not only delete the apostrophes, but replace some words too

velvet thorn
#

answer my question

#

that time

#

why do you need to store as dict?

#

that's a bad data model IMO

lavish tundra
#

cause i need to read the languages(keys) of the items

velvet thorn
#

not a good reason

#

you can model the data differently

#

you should

#

paste

#

your data above

#

as text

lavish tundra
#

i didn't do it with different columns cause i need to check for one word fast in all languages too

velvet thorn
#

not as a screenshot

velvet thorn
#

in any case

#

did you profile?

#

or are you prematurely optimising?

lavish tundra
#

i did a few of profiles

lavish tundra
grave frost
#

"bylat, this shit slaps"

#

They are killing us by the suspense. just publish the code already!!!

ember jungle
#

what is this called

empty patio
#

anyone know how to back up R modules without reinstalling every time (I'm using a VPS)?

serene scaffold
empty patio
#

What if I had to do the same thing under python

#

does it have checksum authentication like R?

lapis sequoia
#

Hey anyone knows how to combine kernel density estimator and naive bayes classifier?

uncut monolith
#

does anyone here have experience with dash and plotly? im using them for a physics assignment and having some trouble

uncut barn
#

what are the ways that I can improve my solution when using k-means?

serene scaffold
#

You can install python libraries with pip and save a list of what you have installed

uncut barn
#

will look into that

lapis sequoia
#

Guys, after the great community interaction our AutoDataCleaner (https://pypi.org/project/AutoDataCleaner/) received, we figured out that it is time for a free web-based end-to-end ML service. Drop your CSV/Excel, choose what column to predict and it will take you through a drag-and-drop wizard which will end in having your new model, python code downloadable and FastAPI web-based server to test out your predictions!

The drag-and-drop wizard will contain all necessary steps for data cleaning, EDA, feature engineering, automatic model selection and automatic hyperparameter optimization.

We will call this free service: var.blue (we already got the domain!)

Building something that ends up on a dusty shelf really sucks; that is why we would like to know the following:

  1. Is this something you would appreciate and use?
  2. If yes, what features would you like to see in var.blue? This could be any statistical functions, specific data cleaning functions, data exploring practices, specific machine learning models. Literally anything that would make your ML project easier.
  3. If no, what ML service would you appreciate?

We are trying to build a go-to place for ML projects; a place for pros to get setup quickly and for beginners to explore and learn.

We are eager to hear from you!

Shout out to the people who supported AutoDataCleaner by their valuable feedback:
u/0x256
u/EvenMoreConfusedNow
u/browneyesays
u/jiejenn

If you would like to jump on board and help, please DM.

#

Your input would be really helpful

lapis sequoia
#

For what is useful ai?

fierce oracle
#

I begin AI today and Bellman algorithm burnt my brain 😂😂 I wasn't ready for this

robust charm
#

Hi, Can anyone help with this? I have created a CNN model that detects a certain object. I now want to test the model with random images with the object in the picture. When the CNN has detected the object I would like to draw a rectangle at the location. Could someone point me in the right direction.

#

So far Im reading about cascades using CV

ripe forge
#

" I have created a CNN model that detects a certain object." did you train a classifier or an object detection model?

robust charm
#

A classifier

serene scaffold
flint mason
#

How can we parse java script using beautiful soup that we returned from an API

serene scaffold
flint mason
#

Umm its web scraping in python for data analysis and ML

serene scaffold
flint mason
#

Yeah I got developers account and access as student developer

velvet thorn
#

depends on what kind of padding but

#

padding often means increasing the size of input data, adding zeros where necessary

#

masking is generally about removing certain (not necessarily contiguous) parts of data
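
A small NumPy sketch of both ideas on made-up variable-length sequences: zero-padding to a common length, then a boolean mask that marks which positions are real data.

```python
import numpy as np

# Variable-length sequences (hypothetical data).
sequences = [[5, 3, 8], [7], [2, 9]]
lengths = np.array([len(seq) for seq in sequences])
max_len = lengths.max()

# Padding: grow every sequence to max_len by filling the tail with zeros.
padded = np.zeros((len(sequences), max_len), dtype=int)
for row, seq in enumerate(sequences):
    padded[row, :len(seq)] = seq

# Masking: a boolean array marking which positions hold real data.
mask = np.arange(max_len) < lengths[:, None]

# Use the mask to ignore padded positions, e.g. in a per-row mean.
row_means = (padded * mask).sum(axis=1) / mask.sum(axis=1)
print(padded, mask, row_means, sep="\n")
```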

primal tulip
#

I'm trying to gather data from a source while it streams, do some transformations, and stop the program if the source ends. I must be doing something wrong because I'm running out of memory.
How should I share the code? Is a screenshot ok?

exotic maple
primal tulip
#

I'm (trying) to use chunksize with Pandas' read_csv().

#

That is being passed as a module.

#

And called from this. The issue is at the While loop.

#

I'm not sure if I should change the open_csv() function to YIELD instead of RETURN. That way I could use a generator, but I'm not aware on how could I use it.
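
A sketch of the generator idea, assuming a function named `open_csv` as mentioned above. The `value` column, the doubling step, and the in-memory CSV are placeholders for the real source and transformations.

```python
import io
import pandas as pd

def open_csv(source, chunksize):
    """Yield transformed chunks one at a time instead of returning them all."""
    # With chunksize set, read_csv returns an iterator of DataFrames.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Placeholder transformation (hypothetical): add a doubled column.
        chunk["double"] = chunk["value"] * 2
        yield chunk

# In-memory stand-in for the real file or stream.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
n_chunks = 0
# The for loop ends on its own when the source is exhausted.
for chunk in open_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["double"].sum()
    n_chunks += 1

print(n_chunks, total)
```

Only a running total is kept here, so memory use is bounded by one chunk. Appending every chunk to a list would bring the whole file back into memory.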

primal tulip
grave frost
serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

exotic maple
#

no. There are packages for Async, but in general async is a "type" of programming. Batch request is what the name says, instead of requesting all data, you request N responses, hold, and then continue

primal tulip
#

Thank you man I'll use it from now on.

exotic maple
#

you're running out of memory reading a CSV with pandas?

#

what's the file size lol

#

ive opened 12GB no problem

#

I thought you said you were requesting from an API

#

@primal tulip take a look at this article from KDNuggets

#

looks like your same problem

primal tulip
velvet thorn
#

like

#

subsetting with a mask

#

but yes, in general I would say that is more correct

mystic turtle
#

hi guys im new here, currently im doing my assignment and i faced this error, need some help to solve it

#

im doing nlp with textblob, but the error showing that is a name error with textblob

#

NameError: name 'TextBlob' is not defined

#

i had imported the textblob library

#

please help me with this, thanks in advance

velvet thorn
#

from the textblob library

mystic turtle
#

from textblob import TextBlob

#

this is the code right?i had done it

velvet thorn
#

not as a screenshot

arctic wedgeBOT
#

Hey @mystic turtle!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

mystic turtle
#

!code-blocks

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

mystic turtle
#
import textblob as Textblob

pol = lambda x: TextBlob(x).sentiment.polarity
sub = lambda x: TextBlob(x).sentiment.subjectivity

df1[b'Subjectivity'] = df1[b'comment'].apply(sub)
df1[b'Polarity'] = df1[b'comment'].apply(pol)
velvet thorn
#

do you know what as does when importing?

ionic drum
#

hey can someone help me out real quick with a program I'm trying to do?

velvet thorn
#

I'm p sure you want from textblob import TextBlob
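
To show the difference between the two import forms without needing textblob installed, this sketch uses the standard library's `collections` module as a stand-in. In the snippet posted above, the alias is spelled `Textblob` while the lambdas call `TextBlob`, which is why Python reports a NameError.

```python
# "import module as alias" binds the MODULE to the alias; it does not
# import anything from inside the module.
import collections as Counter

# Here Counter is the whole collections module, so calling it fails.
try:
    Counter("hello")
    module_call_failed = False
except TypeError:
    module_call_failed = True

# "from module import name" binds the class itself.
from collections import Counter

letter_counts = Counter("hello")
print(module_call_failed, letter_counts["l"])
```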

ionic drum
#

I need to calculate something from info in a file.

velvet thorn
#

and someone will get around to it, or not

#

(hopefully the former) 😉

mystic turtle
mystic turtle
ionic drum
#
AGE VEF HT SEX SMOKE
11 3.2220 72.0 1 0
10 2.5920 65.0 1 0
13 3.1930 70.0 1 0
11 1.6940 60.0 1 1
14 3.9570 72.0 1 1
11 2.3460 59.0 0 0
13 4.7890 69.0 1 1 

#

ok I have this data and I need to calculate the average of the 3rd column for those who are smoke=1 and smoke = 0

velvet thorn
#

you focus more on your Python basics

#

I mean

#

I get that machine learning is fun but

#

fundamentals are important

#

and how to import is REALLY fundamental

#

is that a DataFrame or what

ionic drum
#

yeah I sent nothing haha

#

sent by accident

velvet thorn
#

so

#

yes?

ionic drum
#

yes

velvet thorn
#

okay

#

that's quite a simple question

#

so instead of an answer

#

I'm going to give you a hint

#

you need a groupby

fickle tinsel
#

hello

velvet thorn
#

do you know what groupby is?

fickle tinsel
#

i need help

velvet thorn
ionic drum
#

nope but actually my prof said that we should do it just by using basic functions

fickle tinsel
#

AUC area under the curve

mystic turtle
velvet thorn
#

on your own time

mystic turtle
#

this is where i can't catch up with the syllabus

ionic drum
#

it's a practice example

velvet thorn
velvet thorn
#

you can also do that with filtering and aggregation

#

do you know how to filter a DataFrame?
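
If the rows were in a pandas DataFrame, both approaches could look like the sketch below. It assumes the third column, `HT`, is the one to average.

```python
import pandas as pd

# The sample rows posted above, as a DataFrame.
data = pd.DataFrame(
    [
        [11, 3.2220, 72.0, 1, 0],
        [10, 2.5920, 65.0, 1, 0],
        [13, 3.1930, 70.0, 1, 0],
        [11, 1.6940, 60.0, 1, 1],
        [14, 3.9570, 72.0, 1, 1],
        [11, 2.3460, 59.0, 0, 0],
        [13, 4.7890, 69.0, 1, 1],
    ],
    columns=["AGE", "VEF", "HT", "SEX", "SMOKE"],
)

# Filtering + aggregation: select rows with a boolean mask, then average.
mean_ht_smokers = data.loc[data["SMOKE"] == 1, "HT"].mean()
mean_ht_nonsmokers = data.loc[data["SMOKE"] == 0, "HT"].mean()

# groupby handles both groups in one step.
mean_ht_by_smoke = data.groupby("SMOKE")["HT"].mean()
print(mean_ht_by_smoke)
```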

ionic drum
#

basically I can use the open/close stuff but I only use if/while/for, etx

#

etc

#

this is what I have

mystic turtle
#

thanks for the advice, i will try as much as i can to improve my basics

velvet thorn
#

huh

velvet thorn
fickle tinsel
velvet thorn
#

you said it was a DataFrame?

#

like a pandas DataFrame?

ionic drum
#

nonono

velvet thorn
ionic drum
#

wait

fickle tinsel
#

I am having trouble with the formula

#

oh

#

sorry

ionic drum
#

ok now it's good

velvet thorn
#

are you having

#

do you understand the concept?

#

or do you have questions about it too

ionic drum
#
    return open(path,'rb').read().decode('utf-8')

# Function to write to the file
def writeFile(path,texte):
    f=open(path,'wb')
    f.write(texte.encode('utf-8'))
    f.close()

# Split the file's text into lines
def decouperEnLignes(contenu):
    lignes = contenu.split('\n')
    if lignes[-1] =='':
        lignes.pop()
    return lignes


path=input("Insert path of the data file.")
#

this is what I have

velvet thorn
fickle tinsel
#

basically you need to loop over two sets of arrays

ionic drum
#

nope

velvet thorn
ionic drum
#

aaaah

velvet thorn
#

and what you have is not a DataFrame

#

so

velvet thorn
#

but anyway

ionic drum
#

yeah sorry this is first year coding

velvet thorn
#

I'm going to assume

#

what you have

fickle tinsel
#

using a for loop to declare and initialize it

velvet thorn
#

is a list of lists

#

is that correct?

ionic drum
#

yes

velvet thorn
#

sure?

#

like it's okay to be not sure

ionic drum
#

yes I have list of lists

velvet thorn
#

okay

velvet thorn
ionic drum
#

yes

velvet thorn
#

have you written code already?

fickle tinsel
#

no

velvet thorn
#

okay

#

are you having trouble there?

fickle tinsel
#

I already created the list

ionic drum
#

here's the deal i'm in this class and I know how to do this but It uses methods not used in class

#

and they get all pissed

velvet thorn
#

with a simple loop

fickle tinsel
#

I got confused trying to lay down the logic

ionic drum
#

ok 🙂

velvet thorn
#

it's just

#

weird

#

for example...

#

!e

numbers = [3, 6, 1, 3]
accumulator = 0
count = 0

for number in numbers:
    accumulator += number
    count += 1

print(f'The mean is {accumulator / count}')
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

The mean is 3.25
velvet thorn
#

yes?

#

same concept

#

just scaled up because you also need to extract an element from the inner list
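
Scaled up to the list-of-lists case, the same accumulator idea might look like this. It uses only `for` and `if`, and assumes the third column (index 2) is the one to average and the fifth (index 4) is the smoke flag.

```python
# The sample rows as a list of lists: AGE, VEF, HT, SEX, SMOKE.
rows = [
    [11, 3.2220, 72.0, 1, 0],
    [10, 2.5920, 65.0, 1, 0],
    [13, 3.1930, 70.0, 1, 0],
    [11, 1.6940, 60.0, 1, 1],
    [14, 3.9570, 72.0, 1, 1],
    [11, 2.3460, 59.0, 0, 0],
    [13, 4.7890, 69.0, 1, 1],
]

smoker_total = 0.0
smoker_count = 0
nonsmoker_total = 0.0
nonsmoker_count = 0

for row in rows:
    value = row[2]   # third column
    smoke = row[4]   # fifth column
    if smoke == 1:
        smoker_total += value
        smoker_count += 1
    else:
        nonsmoker_total += value
        nonsmoker_count += 1

smoker_mean = smoker_total / smoker_count
nonsmoker_mean = nonsmoker_total / nonsmoker_count
print(f"smoke=1: {smoker_mean}, smoke=0: {nonsmoker_mean}")
```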

velvet thorn
ionic drum
#

ok I see

fickle tinsel
#

the formula

#

AUC

velvet thorn
fickle tinsel
#

that's the formula for Area under the curve

velvet thorn
#

what I'm asking is

#

okay maybe show the whole assignment or something

fickle tinsel
#

can I pm

#

or should I share screenshot

velvet thorn
#

uh

#

I'm guessing

#

wait

#

screenshot I guess

fickle tinsel
#

okay

velvet thorn
#

with your computer

velvet thorn
#

you know how to create two lists

#

right?

#

one for x and one for y

fickle tinsel
#

yes

#

i did that already using numpy

velvet thorn
#

okay

#

oh you're supposed to use arrays

#

right

#

fair

velvet thorn
#

the delta-x array?

fickle tinsel
#

no

velvet thorn
fickle tinsel
#

i do know its the change in x

modern phoenix
#

I have a pandas table that has a real number field 'success' and another field X that is an integer. I have a suspicion that success is correlated to X; is a scatter plot the best way to start with the hypothesis? Also, if there is a better channel for this, please let me know!

velvet thorn
#

so

velvet thorn
#

[x1 - x0, x2 - x1, x3 - x2...xn - x(n - 1)], right?

velvet thorn
#

but

fickle tinsel
#

yes

velvet thorn
#

what kind of correlation are you thinking of?

velvet thorn
#

so

#

how would you get

#

these two arrays?

#

[x1, x2, x3...xn]

#

and

modern phoenix
#

@velvet thorn I don't have a stats background, but I suspect the higher X is the worse success will be on average

velvet thorn
#

[x0, x1, x2...x(n - 1)]?

#

think about that

velvet thorn
#

that wasn't really what I meant

#

more like...

#

are you talking about linear correlation?

#

or nonlinear correlation

modern phoenix
#

I don't understand the difference. They are independent variables but I'm trying to understand what is affecting my 'success' score

modern phoenix
#

and so far in the 20+ cols, the thing that stands out is X (just eyeballing it)

fickle tinsel
velvet thorn
#

say you have two correlated variables

#

x and y

#

now, when x changes, y changes too

#

we can think of this roughly as "when x changes by an amount dx, y changes by an amount k * dx, on average"

#

where k is a fixed number.

velvet thorn
modern phoenix
#

it's most likely not like that

velvet thorn
#

if it is not, it's nonlinear
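
A quick numeric illustration of the distinction, using made-up data and the Pearson coefficient from `np.corrcoef`, which measures linear correlation only.

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

# Linear relationship: y changes by k * dx whenever x changes by dx (k = 2).
y_linear = 2.0 * x + 1.0

# Nonlinear relationship: y depends strongly on x, but not through a fixed k.
y_nonlinear = x ** 2

# np.corrcoef returns the Pearson (linear) correlation matrix.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_nonlinear = np.corrcoef(x, y_nonlinear)[0, 1]
print(r_linear, r_nonlinear)
```

The second coefficient comes out at essentially zero even though `y_nonlinear` is fully determined by `x`, which is why a scatter plot is a useful first check.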

modern phoenix
#

success is generally 0 or 1

velvet thorn
#

but anyway, a scatterplot would do well

velvet thorn
modern phoenix
#

but sometimes 0.24 but that's rare

velvet thorn
#

it could be correlation

#

with the probability of success

modern phoenix
#

and I have a feeling if X is very high, like 35000 then success is probably going to be 0

velvet thorn
#

or something called the log-odds

#

anyway

#

so success

#

is categorical?

#

i.e. 1 or 0

modern phoenix
#

pretty much but as I mentioned it can be fractional

velvet thorn
#

why?

#

okay

#

never mind that

#

one simple thing you can do is

#

group by success value

#

and get the mean/median

#

of X

modern phoenix
#

think of a headshot kill, you either miss or kill but sometimes they can get off wounded

velvet thorn
#

try slicing
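
The slicing hint can be sketched as follows. The assignment's formula is not shown in the chat, so the trapezoidal rule used here is an assumption, and the sample points are made up.

```python
import numpy as np

# Hypothetical sample points on a curve.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])

# x[1:] is [x1, ..., xn] and x[:-1] is [x0, ..., x(n-1)],
# so their difference is the delta-x array.
dx = x[1:] - x[:-1]

# Trapezoidal rule (assumed formula): width times the average of the
# two neighbouring heights, summed over all intervals.
auc = np.sum(dx * (y[1:] + y[:-1]) / 2.0)
print(dx, auc)
```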

velvet thorn
#

are there

#

for success?

modern phoenix
#

not what I'm dealing with but will give you an idea

#

i.e, say anything > 0 is success

velvet thorn
fickle tinsel
velvet thorn
modern phoenix
#

no, success is in the range [0, 1]

velvet thorn
#

which suggests quantisation

fickle tinsel
velvet thorn
fickle tinsel
#

ok

modern phoenix
velvet thorn
#

0 or 1

#

i.e. round up

#

now

#

find the mean/median

#

of X

#

for each group

#

i.e. success = 0, and success > 0

#

that's a very quick and dirty way
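
A sketch of that quick check on invented numbers. The column names `X` and `success` follow the question; the values and the `outcome` label column are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'success' is mostly 0 or 1, occasionally fractional.
df = pd.DataFrame({
    "X": [120, 300, 35000, 28000, 450, 31000],
    "success": [1.0, 1.0, 0.0, 0.0, 0.24, 0.0],
})

# Binarize: anything above 0 counts as a success.
df["outcome"] = (df["success"] > 0).map({True: "success", False: "failure"})

# Compare the distribution of X between the two groups.
summary = df.groupby("outcome")["X"].agg(["mean", "median"])
print(summary)
```

If the two rows of the summary differ a lot, that hints at some relationship between `X` and `success`; if they look alike, a scatter plot or a finer grouping may still reveal one.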

modern phoenix
#

@velvet thorn you mean mean, median of X? the dependent variable?

velvet thorn
#

wait

#

I thought you said

#

success was the dependent variable

#

well anyway it doesn't matter

#

the idea is that

#

if the two groups

#

have a wildly different X value

#

that suggests that there's some relationship

#

either way

modern phoenix
velvet thorn
#

basically

#

you have a hypothesis

#

that variable A

#

affects variable B

#

therefore

#

B is the dependent variable

#

because it depends on A, yes?

#

and by extension A is the independent variable

#

for obvious reasons

#

in other words, you ask

#

"if I were to increase A by 10%, how would B change?"

#

and not the other way round

modern phoenix
#

thank you

#

trying a groupby on success > 0

#

@velvet thorn unfortunately mean, median don't really look different....

sour thunder
velvet thorn
#

sometimes it can be hard to eyeball

inland isle
#

i am learning ml from the past 2 months, shall i first clear the concepts of ml algos or shall i put more focus on the data manipulation (feature engineering) part?

royal lintel
#

Hey, anyone knows if it is underfitting and if so, how to fix it?

#

I can provide code samples if needed

primal tulip
royal lintel
#

I'm working on a dataset from kaggle - https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset and got something like this after some time

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

import tensorflow as tf
import pandas as pd
import keras
from sklearn.model_selection import train_test_split
import numpy as np

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

df = pd.read_csv('datasets/heart.csv')
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)

train_x = train_df.drop('output', axis=1)
train_x = train_x.astype('float32')
train_x /= 255.0

train_y = train_df.output.values
train_y = train_y.astype('float32')

test_x = test_df.drop('output', axis=1)
test_x = test_x.astype('float32')
test_x /= 255.0

test_y = test_df.output.values
test_y = test_y.astype('float32')

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(2, activation='sigmoid')
])

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(train_x, train_y, epochs=5, batch_size=2)
model.evaluate(test_x, test_y)
```
#

The layers you see now are just from trying various things (more or fewer Dense/Dropout layers etc.), but overall it always showed about 60% accuracy when fitting the data and 70% at eval

grave frost
#

*slightly

royal lintel
#

any ideas on how to improve it? Idk if the loss/Dense layers are set up properly and can't find any solutions. I assume the data isn't prepared wrongly since it runs, but I could be wrong
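One thing that might be worth checking in the snippet above: dividing by 255 is a convention for 8-bit image pixels, while tabular columns sit on very different scales. A common alternative is to standardise each column with the training-set mean and standard deviation. A rough sketch with made-up column names and values (not the actual Kaggle schema):

```python
import pandas as pd

# Made-up tabular rows; the column names are placeholders.
train = pd.DataFrame({"age": [63.0, 37.0, 41.0, 56.0], "chol": [233.0, 250.0, 204.0, 236.0]})
test = pd.DataFrame({"age": [57.0, 44.0], "chol": [354.0, 263.0]})

# Statistics come from the training split only, so nothing leaks from the test split.
mean = train.mean()
std = train.std(ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.round(2))
print(test_scaled.round(2))
```

After this step every training column has mean 0 and standard deviation 1, which is usually an easier starting point for a dense network than raw values.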

grave frost
#

lowering learning rate should help a bit - along with adjusting dense, dropout, conv etc.

royal lintel
#

Is it okay when training set accuracy varies but overall increases? like e.g. 50, 40, 60, 55, 70. Tutorials always show accuracy that increases 'linearly' in some way

ripe forge
#

How is the test accuracy performing? That's the more important metric you should be looking at

grave frost
#

I'd rather use k-fold for smaller datasets
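A minimal sketch of what k-fold evaluation could look like with scikit-learn. The data is synthetic and the logistic-regression model is only a stand-in, so treat both as assumptions rather than a recommendation for this dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a small tabular dataset.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# 5 folds: every row is used for validation exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy")

print("per-fold accuracy:", scores.round(3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

With only a few hundred rows, the spread across folds gives a better feel for how noisy a single train/test split can be.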

delicate garnet
#

hi is someone good with streamlit?

glass cedar
#

I have a question about CNNs, I'm putting the image of a frog into the convolutional and pooling layers of a CNN, and I'm plotting the output after every layer. With the image on the right, I'm wondering how the CNN is able to make a determination that it's a frog on something that blurry

#

For nearly any image, by the time it goes through to the right most image, it's become an amorphous blob, and I'm not sure how training weights of the kernels will help make any identifications

tidal bough
#

The intermediate results of a neural network will generally not "mean" anything to humans.

burnt meadow
#

hey guys, wondering if it's possible to use data from an excel sheet as inputs and outputs in pytorch and how to do so

glass cedar
#

i think so, i've done it by saving it as a csv and reading it using pandas

#

then from pandas to pytorch should be trivial
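A small sketch of the CSV route described above, stopping at float32 NumPy arrays. The column names `x1`, `x2`, `y` and the inline CSV text are made up; the last hop into PyTorch is left as a comment so the example runs without it installed:

```python
import io
import pandas as pd

# Stand-in for a sheet exported from Excel as CSV; column names are made up.
csv_text = "x1,x2,y\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"
df = pd.read_csv(io.StringIO(csv_text))  # with a real file: pd.read_csv("sheet.csv")

# Split into inputs and targets, then convert to float32 NumPy arrays.
inputs = df[["x1", "x2"]].to_numpy(dtype="float32")
targets = df["y"].to_numpy(dtype="float32")

print(inputs.shape, targets.shape)
# With PyTorch installed, torch.from_numpy(inputs) and torch.from_numpy(targets)
# turn these arrays into tensors.
```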

burnt meadow
#

Thank you appreciate it

digital aurora
#

Hello people!!

#

Any data scientist available?

serene scaffold
digital aurora
#

Just wanted some guidance as a newcomer to this field

serene scaffold
digital aurora
#

Currently pursuing Engineering in computer science.

serene scaffold
digital aurora
#

Currently in 1st year. I know python language completely.

digital aurora
#

Although I have started studying statistics for data science from online resources.

#

Currently I am studying normal distribution in statistics.

serene scaffold
#

Another thing you can do to get started is to get comfortable with numpy and pandas.

digital aurora
#

Well, I would jump to it a bit later once my stats portion is complete.

digital aurora
#

Like I know the complete basic python language.

#

Is it enough?

serene scaffold
digital aurora
#

I don't know anything about Data structures and algorithms...so do I need study them also?

ripe forge
#

Those are good to know in general

digital aurora
#

But I guess they are not so important for data science field

serene scaffold
glass cedar
#

i don't think you'd use algos & data structs directly in data science, but it wouldn't be good not to know them

serene scaffold
#

Understanding runtime complexity is key to understanding why, for example, you shouldn't keep appending a dataframe
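To make the dataframe point concrete, here is a small sketch comparing the two patterns. The rows are made up, and the slow loop is deliberately kept short:

```python
import pandas as pd

rows = [{"id": i, "value": i * i} for i in range(1000)]

# Slow pattern: every pd.concat copies all rows accumulated so far,
# so the total work grows roughly quadratically with the number of rows.
slow = pd.DataFrame([rows[0]])
for row in rows[1:50]:  # kept short on purpose
    slow = pd.concat([slow, pd.DataFrame([row])], ignore_index=True)

# Faster pattern: collect plain Python objects, build the frame once at the end.
fast = pd.DataFrame(rows)

print(len(slow), len(fast))
```

Both produce the same kind of frame; the difference only shows up in how the running time scales as the row count grows.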

digital aurora
#

See, data structures and algorithms are a part of my CS curriculum, I will have to study it..but if its not much used in Data science field, I won't put much stress over it then.

serene scaffold
ripe forge
#

CS is used in data science. It's like the foundations for your work. You never really use those things directly, but rather build your understanding of things on top of the concepts you learn there

digital aurora
digital aurora
#

So as of now, I should start with numpy and pandas then?

serene scaffold
#

Those would be good to know, yes

ripe forge
#

Definitely

serene scaffold
#

If you have time after that, SQL is the other language you want to know.

#

Unlike R or Java, its use case doesn't really overlap with Python.

ripe forge
#

hears sql, breaks down and cries

serene scaffold
#

(it's not that bad)

ripe forge
#

. /sniff you promise?

serene scaffold
digital aurora
#

Firstly, let me study pandas and numpy...

#

Will ask you for further guidance then..🙂

serene scaffold
#

The reason I throw in sql is that pandas is about working with tabular data that's in live memory

digital aurora
#

Btw, do companies hire data scientist after bachelor's?

#

Like I see people getting hired after their masters

ripe forge
#

You can, though you do need a bit of luck with these kinds of things. I suppose it would depend on your country too

digital aurora
#

I see!

glass cedar
#

i'd imagine having many projects and internships under your belt upon graduation would help your job prospects after bachelors

serene scaffold
#

The courses really weren't enough.

glass cedar
#

i'd consider taking linear algebra, multi-variate calc, probability, algos & dat structs, and machine learning as early as possible to be competitive for internships + give you time to work on projects

digital aurora
#

I see!!

prime vortex
#

anyone know how to use zobrist hash for chess game?

#

im confused

#

i mean i know the theory but idk how to implement it
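A bare-bones sketch of Zobrist hashing, assuming a board stored as a `{square_index: piece_letter}` dict with a1 = 0 and h8 = 63. That representation and the helper names are made up for illustration, and castling rights, en passant and captures are left out to keep it short:

```python
import random

PIECES = "PNBRQKpnbrqk"  # white and black piece letters
rng = random.Random(2021)  # fixed seed so the keys are reproducible

# One random 64-bit key per (piece, square), plus one key for "black to move".
PIECE_KEYS = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}
BLACK_TO_MOVE = rng.getrandbits(64)


def zobrist_hash(board, black_to_move):
    """Hash a position given as {square_index: piece_letter}."""
    h = 0
    for sq, piece in board.items():
        h ^= PIECE_KEYS[(piece, sq)]
    if black_to_move:
        h ^= BLACK_TO_MOVE
    return h


def apply_quiet_move(h, piece, src, dst):
    """Update the hash for a non-capturing move without rescanning the board."""
    h ^= PIECE_KEYS[(piece, src)]  # remove the piece from its source square
    h ^= PIECE_KEYS[(piece, dst)]  # place it on the destination square
    h ^= BLACK_TO_MOVE             # the side to move flips
    return h


# White king on e1 (square 4), white pawn on e2 (12), black king on e8 (60).
board = {4: "K", 12: "P", 60: "k"}
before = zobrist_hash(board, black_to_move=False)

# Play e2-e4 (square 12 -> 28) and compare incremental vs full recomputation.
after_incremental = apply_quiet_move(before, "P", 12, 28)
board_after = {4: "K", 28: "P", 60: "k"}
after_full = zobrist_hash(board_after, black_to_move=True)
print(after_incremental == after_full)
```

The point of the XOR trick is that making or unmaking a move touches only a couple of keys, so the hash can be kept up to date during search instead of being recomputed from scratch.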

grave frost
#

you guys forgot ML competitions too!

#

while they won't be much of a direct factor, most people in the industry now recognize the value of Kaggle. they won't care about your kernels, but saying things like "top 2% out of x thousand data scientists" is nice

stuck socket
#

sup

exotic maple
dapper halo
#

Coming in hot with a super dumb question. I know how I could reorder this, but I'm sure there's a one liner that i'd much rather have.

If I have a dataframe with values x,y,z and a presorted array [y,z,x]. Without using .index, is there a way to reorder the dataframe to match the presorted array?

lavish tundra
#

i'm trying to get the closest matches for a mistyped string
i know i can do this

difflib.get_close_matches("mamoth", db['en'].astype(str))

the problem is: i need to look for the closest results not only in one column but in multiple columns, does someone know how to do that?

stuck socket
#

lumi

lavish tundra
#

i solved it doing a loop
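For reference, a loop over columns could look roughly like this. The `en` column comes from the call above; the `pt` column, the sample words and the `close_matches` helper are made up for illustration:

```python
import difflib
import pandas as pd

# Hypothetical lookup table.
db = pd.DataFrame({
    "en": ["mammoth", "monkey", "mouse"],
    "pt": ["mamute", "macaco", "rato"],
})


def close_matches(word, frame, columns, n=3, cutoff=0.6):
    """Return {column: [close matches]} for every column with at least one hit."""
    found = {}
    for col in columns:
        candidates = frame[col].astype(str).tolist()
        hits = difflib.get_close_matches(word, candidates, n=n, cutoff=cutoff)
        if hits:
            found[col] = hits
    return found


result = close_matches("mamoth", db, ["en", "pt"])
print(result)
```

Keeping the results per column makes it easy to tell afterwards which column each suggestion came from.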

flint mason
#

I want to color a bar plot based on a true or false value from a dataframe

#

When I use this I lost the names from the plot

exotic maple
dapper halo
dapper halo
#

idxx = [] for xx in range_idx.to_list(): idxx.append(ioncopy['ions'][ioncopy['ions']==xx].index[0]) ioncopy = ioncopy.sort_index(idxx)

exotic maple
dapper halo
#

oh geeze that copied bad...nvm squished screen. And I lied, three lines with list initialization
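One possible one-liner for the reordering, assuming the values in the `ions` column are unique. The frame below is made up, and `order` stands in for the presorted array:

```python
import pandas as pd

# Hypothetical frame; the column name `ions` is taken from the snippet above.
ioncopy = pd.DataFrame({"ions": ["x", "y", "z"], "mass": [1.0, 2.0, 3.0]})
order = ["y", "z", "x"]  # the presorted array

# Make `ions` the lookup key, pull rows in the wanted order, restore the column.
reordered = ioncopy.set_index("ions").loc[order].reset_index()
print(reordered)
```

If the column can contain duplicates, `.loc` returns every matching row for each label, which may or may not be what is wanted.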

exotic maple
dapper halo
exotic maple
#

I always refer to the documentation first before trying to self-define stuff 😛 I mean it's always good to try a personal solution, but those in the doc are almost always optimized

flint mason
#
plt.bar(popular_crypto_data['Name'], popular_crypto_data['percent_change_24h'], label = 'percent_change_24h', color = 'g' if popular_crypto_data['positive'] else 'r')
#

can someone check this
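A possible fix for the call above, sketched with made-up rows. `'g' if series else 'r'` asks for the truth value of a whole Series, which pandas refuses, so the colour has to be chosen per row:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up rows with the same column names as the call above.
popular_crypto_data = pd.DataFrame({
    "Name": ["AAA", "BBB", "CCC"],
    "percent_change_24h": [2.5, -1.2, 0.7],
})
popular_crypto_data["positive"] = popular_crypto_data["percent_change_24h"] > 0

# One colour per row: np.where picks element by element.
colors = np.where(popular_crypto_data["positive"], "g", "r").tolist()

fig, ax = plt.subplots()
ax.bar(popular_crypto_data["Name"], popular_crypto_data["percent_change_24h"], color=colors)
ax.set_ylabel("percent_change_24h")
# plt.show() in an interactive session
```

Passing the `Name` column as the first argument keeps the names on the x axis, and `color` accepts a list with one entry per bar.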

dapper halo
exotic maple
#

Does anyone here confidently understand gradient descent?

I understand all the concepts and the overall intuition but for God's sake I'm always missing something.

As I get, it works like this:

A(i , j , k) = [ i , j , k]
where i, j, k are the dimensions / features / independent variables that form the vector A.

We get the partial derivative of A for each vector ala -> dA / di

Then for a randomly initialized value we calculate the gradient / derivative at the value. Then we move in a direction and calculate again.

that's usually where i stop before getting lost in steps >.<
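A tiny worked example of the loop may help tie the steps together. It assumes the thing being differentiated is a scalar loss that depends on three parameters (i, j, k); the loss function itself is made up so that the answer is known in advance:

```python
import numpy as np

TARGET = np.array([1.0, -2.0, 3.0])  # where the made-up loss has its minimum


def loss(w):
    # A bowl-shaped function of the three parameters (i, j, k).
    return np.sum((w - TARGET) ** 2)


def gradient(w):
    # Vector of partial derivatives: d(loss)/di, d(loss)/dj, d(loss)/dk.
    return 2.0 * (w - TARGET)


rng = np.random.default_rng(0)
w = rng.normal(size=3)      # random starting point
learning_rate = 0.1

for step in range(100):
    g = gradient(w)             # slope at the current point
    w = w - learning_rate * g   # step against the gradient (downhill)

print(w.round(4), round(float(loss(w)), 8))
```

The gradient changes on every iteration simply because it is re-evaluated at the new point: far from the minimum it is large, and it shrinks toward zero as `w` approaches the target.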

grave frost
exotic maple
grave frost
rigid sundial
#

guys i need to make a decision

#

I have to choose between data science and computer science

#

what should i choose

#

which is better overall

velvet thorn
#

although that's not really a good question to ask

boreal summit
#

What's the difference between a depth dimension and spatial dimension in CNN?

exotic maple
#

Its not a question in itself. I'm just not properly understanding "how" the gradient changes with each iteration

boreal summit
#

I was studying and couldn't find much online.

exotic maple
#

I understand the derivative, the learning rate, etc. But not the whole thing together, i think @velvet thorn

solar geyser
velvet thorn
#

g

#

you're talking about height/length/width

#

vs number of channels

#

okay so like

#

for an image

#

assuming it's greyscale

#

each pixel can be described by (x, y, v)

#

x-coordinate, y-coordinate, value (how black or white it is)

#

now

#

consider a standard RGB colour image

#

you still need one x-coordinate and one y-coordinate

#

but now you need 3 numbers for colour

#

R, G, B

boreal summit
#

...following

velvet thorn
#

so

#

in the colour dimension

#

you have 3 values that can vary independently, yes?

#

R, G, and B

#

and on the "position" side you have X and Y

#

which can also vary independently

#

so X and Y control "where" the pixel is

#

and RGB controls "what" it is

#

the canonical way to store such an image

#

is in an array with the shape (x, y, c)

#

such that for the pixel at (x, y) c is an array describing its colour

boreal summit
#

So the RGB filter is the depth dimension, while each color coordinate is the spatial dimension?

velvet thorn
#

because

#

when you pass an image through a convolutional layer (assuming 2D here)

#

the c axis changes size

#

in particular, it will have a length equal to the number of filters in the layer
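The shape change can be checked with a few lines of NumPy. This is a naive convolution with random numbers (stride 1, no padding), meant only to show how the array shapes behave; the sizes 32x32 and 8 filters are arbitrary:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

image = rng.random((32, 32, 3))       # (x, y, c): 32x32 pixels, 3 colour channels
kernels = rng.random((3, 3, 3, 8))    # 8 filters, each 3x3 and spanning all 3 channels

# All 3x3 patches of the image: shape (30, 30, 3, 3, 3) = (x, y, c, kx, ky).
patches = sliding_window_view(image, (3, 3), axis=(0, 1))

# Each filter multiplies a patch elementwise and sums everything to one number,
# so the output has one channel per filter.
output = np.einsum("xyckl,klcf->xyf", patches, kernels)

print(image.shape, "->", output.shape)
```

The two spatial axes shrink a little because no padding is used, while the last axis goes from 3 colour channels to 8, one per filter.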

boreal summit
#

okay man, thanks.

barren iris
#

Hello guys! Can anyone recommend a good package that can extract a table from an image (or an image inside a pdf?)

I have some old paper documents with tables that i have to turn into an excel, but they're very different from each other and it would take weeks to do it by hand.

Tabula-py works fantastic but it requires that the table is stored as tabulated text or something, it doesn't work with images.

soft dock
#

Not exactly a package but this is the first thing I would follow
https://github.com/jainammm/TableNet

GitHub

Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images" - jainammm/TableNet

rotund dagger
#

is there a way to go through a csv and write each row to its own csv?

glacial monolith
rotund dagger
glacial monolith
rotund dagger
#

i will look at it. i haven't encountered it before

glacial monolith
#

I'm happy with it so far. It also supports multiple storage modes with an API very close to the native methods and file handling.

#

So the experience of using Dropbox or S3 feels like you're accessing a subdirectory of your project.

barren iris
rotund dagger
glacial monolith
# rotund dagger i will look at it. i haven't encountered it before

Anyway, you will just need to load in your source files (are they small enough that you don't have to worry about memory management?) and write a script to reorganize the data, then ship a new list off to a PyFS enabled method to build the new CSV files in the designated output folder.

rotund dagger
#

sounds accurate

#

yea the files are not large enough for memory trouble
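For files that fit in memory, the row-per-file split can also be done with the standard library alone. The sketch below uses `csv` and `pathlib` rather than PyFS; the `split_rows` helper, the file names and the sample data are made up:

```python
import csv
import tempfile
from pathlib import Path


def split_rows(source: Path, out_dir: Path) -> list[Path]:
    """Write every data row of `source` to its own CSV, repeating the header."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with source.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader, start=1):
            target = out_dir / f"{source.stem}_row{i}.csv"
            with target.open("w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerow(row)
            written.append(target)
    return written


# Demo on a throwaway file; the column names and values are made up.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "prices.csv"
    source.write_text("symbol,close\nAAA,10.5\nBBB,20.1\n")
    files = split_rows(source, Path(tmp) / "rows")
    names = [f.name for f in files]
    first_file_lines = files[0].read_text().splitlines()
    print(names)
```

Repeating the header in every output file keeps each one readable on its own by `csv.DictReader` or `pandas.read_csv`.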

glacial monolith
#

That makes things easier.

#

PyFS is kind of like a framework. Some of what it gives you isn't actually different from the core libraries, but it has some powerful utilities like walking and its optimized copy handling. In this case, it will give you conceptual structure for the conversion of data locked up in the file system into fully manipulable Python data structures.

#

That's the boring hard part, and the library takes care of a lot that you would have to figure out as you start to expand your idea of what you want to do.

rotund dagger
#

it sounds like a pretty powerful tool to leverage. from what im reading it should do what i need.

#

i will just need to try to implement it

glacial monolith
#

I'm using it on every Python project going forward that deals with files.

#

At the simplest level, you implement it in the same way as a vanilla file: with a context manager, where you use reading and writing commands on the open object to exchange data with instance variables inside your program.

rotund dagger
#

i just installed it now im playing around with it a bit.

#

they have a decent documentation for it it looks like

glacial monolith
rotund dagger
#

just once.

#

basically im doing a time series forecast on historical stock data.

#

i was in a group of three people in school, and they decided not to finish this assignment so im trying to complete it solo lol. got stuck on reading the data in because of the way the data is presented

arctic wedgeBOT
#

Hey @severe cloud!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

severe cloud
steady oxide
#

did you restart kernel?

severe cloud
#

never played with kernel

#

i am on linux kernel 5.8.0-44

ripe forge
#

Ah not that kernel. Confusing terminology but if you run jupyter notebooks or ipython repl, the place where the code runs and holds variables in memory is also called a kernel. Ipython kernel to be specific. Has no relation to Linux kernels*

#

So, first question to you would be, how are you running this code?

#

If I'm reading this correctly you're not dealing with kernels at all, just running python as a script from terminal

#

So the kernel suggestion doesn't apply to you

modern lion
#

I hope this is the right forum for my question.. Does anyone have an example of how to do a 3D FFT from a CSV?
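A possible starting point, assuming the CSV holds one row per point of a regular grid with columns `x`, `y`, `z`, `value`. That layout is a guess, and the CSV here is generated in memory so the example is self-contained:

```python
import io
import numpy as np
import pandas as pd

# Build a small made-up CSV: one row per grid point with columns x, y, z, value.
nx, ny, nz = 4, 4, 4
grid = np.array([(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)])
values = np.cos(2 * np.pi * grid[:, 0] / nx)  # one full cosine period along x
csv_text = pd.DataFrame({"x": grid[:, 0], "y": grid[:, 1], "z": grid[:, 2],
                         "value": values}).to_csv(index=False)

# Read it back, sort so the rows are in x-major order, reshape to a 3D volume.
df = pd.read_csv(io.StringIO(csv_text)).sort_values(["x", "y", "z"])
volume = df["value"].to_numpy().reshape(nx, ny, nz)

# 3D discrete Fourier transform over all three axes.
spectrum = np.fft.fftn(volume)
peak = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
print(spectrum.shape, peak)
```

The sort before the reshape matters: `reshape` assumes the rows arrive in a fixed axis order, so a shuffled CSV would otherwise scramble the volume.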

raven knoll
#

Is it possible to do text sentiment with unsupervised learning? I need to do a sentiment analysis on tweets

restive spade
#

Yes

#

But you need to know the sentiment of tweets

#

So it's better to use supervised learning 😅

raven knoll
# restive spade So it's better to use supervised learning 😅

I agree, but my project is about scraping twitter tweets about a company and getting the sentiment in the tweets. It's hard to label the data.

I am new to data science but I really love it. If there is anything I'm missing, I'd like to learn it. I haven't yet touched neural networks but that is my next topic

restive spade
#

🤔 I think you have to learn how to use neural networks first, then train a model with already labeled data (to be found on the Internet), then use it with your data.

raven knoll
#

That’s what I thought as well but the tweets are in Dutch and there are few trained models for that language, but thank you for confirming my thoughts

primal tulip
grave frost
#

you can pre-train a model with the unsupervised data, and fine-tune on supervised to get the best accuracy. but you can fine-tune BERT tho, if your tweets are english

lapis sequoia
#

Hey anyone has a clue how to apply OneHotEncoder in the model?

#

(sklearn)

lapis sequoia
#

which page you think I am on 🤣

primal tulip
#

Then what's your actual question?

lapis sequoia
#

just confused on how to implement it in a model

#

okay let me make an example

#
  1. so I have my unique items in the columns, I transform them into a vector with 1,2,3,4 etc
  2. Let's say we are using... SVC (example)
#

how do I implement my encoder in the SVC?

raven knoll
lapis sequoia
#
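```py
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
```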
#

so i pass this, but what's next?

#

Hi

grave frost
lapis sequoia
#

I want to ask that from where i can learn ai

grave frost
#

looks good too ^^^

primal tulip
# lapis sequoia so i pass this, but what's next?

I gave someone an example video on how to use the OneHotEncoder. I remember he was working with the Titanic dataset.
While I look for it, I'll tell you that you already have your data with the fit method, but if I recall correctly you need to transform it. Still if you call enc, you should have it properly encoded.

#

And just to make sure, you do OneHotEncoding in categorical data, not numerical values.
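One way the encoder and an SVC might be wired together is a scikit-learn pipeline, sketched below. The column names `sex` and `group`, the labels and the rows are all made up:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Made-up training data: one categorical column, one numeric column, one label.
X = pd.DataFrame({
    "sex": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "group": [1, 3, 2, 1, 3, 2],
})
y = [0, 1, 1, 0, 1, 0]

# One-hot encode only the categorical column; pass the numeric one through as is.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["sex"])],
    remainder="passthrough",
)

# The pipeline fits the encoder and the SVC together and reuses the same
# encoding at predict time.
model = make_pipeline(preprocess, SVC())
model.fit(X, y)

new_rows = pd.DataFrame({"sex": ["Female", "Male"], "group": [2, 1]})
predictions = model.predict(new_rows)
print(predictions)
```

Wrapping both steps in one object means there is no separate `transform` call to forget when new data comes in.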

lapis sequoia
lapis sequoia
primal tulip
#

Got it. It's a bit lengthy but you can skip until he goes into the good stuff.
https://www.youtube.com/watch?v=irHhDMbw3xo

In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?

In this video, you'll learn how to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a...

#

Go to minute 10 and on.

lapis sequoia
#

cheers, i'll check it out, yeah no worries will just do 2x

hard hound
#

Hey any tips on cleaning data

raven knoll
serene scaffold
late shell
#

I have this bollywood movie dataset, where each movie has a few features, like box office collection & stuff, and the target variable is the performance of the movie, namely 4 categories [flop, average, hit, super hit]. For this classification problem, how do I use the 1st column i.e movie name in a ML model (since it's string dtype)? I can't encode all of the movie names, right? so what do i do?

serene scaffold
late shell
#

umm it won't ig.. like it's no use for predicting the performance? idk, sorry, I'm still a noobie

#

im not sure abt that

serene scaffold
late shell
#

nice, but when I'm using the model to predict a bunch of movies, how will I know which performance prediction corresponds to which movie (I cant figure it out from the features , right.)?

serene scaffold
late shell
#

alright. thank you very much mate. 🙌

serene scaffold
late shell
#

The Tcollection?

serene scaffold
#

@late shell is this dataset on kaggle btw?

gentle sedge
#

Hey! How do i get my bar plot from matplotlib to show more? Right now the window ends almost instantly after the biggest bar.

#

I want it to give the bars a little bit of space from the top.
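A short sketch of one way to add headroom: set the upper y-limit a bit above the tallest bar. The data and the 20% factor are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt

names = ["a", "b", "c"]      # made-up data
heights = [3.0, 7.5, 5.2]

fig, ax = plt.subplots()
ax.bar(names, heights)

# Leave 20% headroom above the tallest bar.
ax.set_ylim(0, max(heights) * 1.2)
top = ax.get_ylim()[1]
print(top)
```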

candid sable
#

Hi guys - what would the numbers on the right mean in an activation map?

floral nexus
late shell
#

I have a dataset that has a few nan values in it. im using python. I want to replace the nan values with the average of the value that lies one above & below the nan value. How can I do this? bcoz df.fillna() only provides 2 methods [ffill, bfill].

ripe forge
#

Maybe do the task in separate steps. One way is to create two temp variables. One with ffill one with bfill. Then subset them on places where value is nan, and average the two. Then assign back to the slots where nan was present
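The steps above can be sketched in a few lines of pandas. The column name and values are made up, and the gaps are single NaNs with valid neighbours on both sides:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1.0, np.nan, 3.0, 4.0, np.nan, 8.0]})  # made-up data

forward = df.ffill()    # each NaN takes the value above it
backward = df.bfill()   # each NaN takes the value below it

# Where the original was NaN, use the average of the two; elsewhere keep the original.
filled = df.where(df.notna(), (forward + backward) / 2)
print(filled["value"].tolist())
```

A NaN in the first or last row has no neighbour on one side, so it stays NaN with this approach and needs separate handling.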

spark stag
grave frost
grave frost
stuck swallow
#

I trained an opencv cascade sheet to find among us characters. Is there any way to use this cascade sheet to generate images? I cant find any info on this online.

serene mural
grave frost
#

@iron basalt Finally got Hawkins' A thousand brains theory😌 (curse Brexit and world shipping) 🥳 🥳 I look forward to deep-diving fully into HTM in the coming months !!! 😎

ebon hound
#

is there an easy way to add a function on top of a scatter plot in matplotlib?

serene mural
#
import nltk
from nltk import word_tokenize
from nltk import FreqDist

my_text = input("Enter something: ")

cuss_words = []  # my list of cuss words, left out here since it's vulgar

tokens = word_tokenize(my_text)
text = nltk.Text(tokens)

fdist = FreqDist(text)
``` So what I am trying to do, is enter a input, say "abc fuck" or "abc" or "abc sh$t" and have it detect cusswords and bypassed cusswords, so it learns what the "bypassed cusswords" can be in a correct context, how can I achieve this?
flint mason
#

is there a library for sentiment analysis of a text

rotund dagger
#

how can i add a date column to my dataframe where the date value in each row is set to the date stored on the csv file header.

#

basically i have a bunch of csv files that are named "NSYE-Thursday-August-02-2018" and so on and so forth. and i want each row to have a date value that matches that header date. i would also add a day value to show the day of week.
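A sketch of pulling the date out of a name shaped like the one above. The `date_from_name` helper and the sample rows are made up, and `%A`/`%B` assume English day and month names in the current locale:

```python
from datetime import datetime
import pandas as pd


def date_from_name(stem: str) -> datetime:
    """Parse names like 'NSYE-Thursday-August-02-2018' (prefix, weekday, month, day, year)."""
    _, rest = stem.split("-", 1)  # drop the prefix before the first dash
    return datetime.strptime(rest, "%A-%B-%d-%Y")


# Stand-in for one loaded CSV; real code would use pd.read_csv(path) and Path(path).stem.
stem = "NSYE-Thursday-August-02-2018"
df = pd.DataFrame({"symbol": ["AAA", "BBB"], "close": [10.5, 20.1]})

df["date"] = pd.Timestamp(date_from_name(stem))   # same date on every row
df["day_of_week"] = df["date"].dt.day_name()
print(df)
```

Doing this per file before concatenating the frames gives every row its own date, which is what a time-series index needs later.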

late shell
#

I've been told that if I'm using the model for prediction only, then there is no need to get rid of multicollinearity? is that true?

ripe forge
#

i mean, if youre using it for just predictions, it must have been trained on the features that you no longer have control over in the first place.

#

so even thinking about changing features seems like a nonsensical conversation.

modern beacon
#

how can i easily write pcm data on an array? i want to write binary data using hearable sound but it all seems complicated

analog pike
#

Does anyone here have any suggestions for a good website to start with learning ai, tensorflow, decision trees and that sort of stuff. If it helps I'd say my skill level is between amateur and novice with python

#

The only ai I've built was a text generator using markov chain analysis

rotund dagger
analog pike
#

@rotund dagger how did you like it?

rotund dagger
#

i use it all the time. it was very well worth it.

analog pike
#

Is it similar at all to code academy because I really just don't like them

rotund dagger
#

i dont think so. i would say it is more like a youtube playlist but has more interaction. it provides resource files.

analog pike
#

Alright, i'll check it out

#

thanks

rotund dagger
#

np

#

i have a dataframe that looks like this but the company appears multiple times in the data frame.

#

i would like to make a dictionary with the key being the unique symbol. and the value being a dataframe with rows for each entry of the unique symbol from the original data frame.

#

for example: for symbol 'FCCY' i would like to add those entries to the values dataframe of the dictionary

exotic maple
#

"dict_name"[COMPANY] = Dataframe[dataframe["ticker"] == FCCY ]

#

basically

#

create an entry in the dictionary, and the value is the masked / resulting dataframe from filtering by ticker

#

now, if that's efficient, it's a different question...
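On the efficiency point, a `groupby` builds all the per-symbol frames in one pass instead of filtering once per symbol. A minimal sketch with made-up rows (`FCCY` is from the example above, everything else is filler):

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["FCCY", "ABCD", "FCCY", "ABCD", "FCCY"],
    "close": [18.1, 5.0, 18.4, 5.2, 18.0],
})

# One pass over the frame: groupby yields (key, sub-frame) pairs.
by_ticker = {ticker: group for ticker, group in df.groupby("ticker")}

print(by_ticker["FCCY"])
```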

rotund dagger
#

the end goal is to forecast stocks using holt winters time series.

#

the toughest time im having is importing in the data from all the csvs in such a way that i can use them in time series.

#

this was the only way i could think to do it, but im sure there is a more efficient way

rotund dagger
lapis sequoia
#

Always end up coding like that

#

then my whole kernel goes slow as hell

#

Sad

modern vine
#

Good night! What is the best AI Area & Framework to find certain patterns on HTML documents? In this case it would be bidding items

serene scaffold
#

What is a bidding item?

modern vine
modern vine
#

They're asking for an AI to find items in different HTMLs and then suggest a product from these items

#

I already have an algorithm to suggest a product using spacy, but I need another AI to find these texts to suggest a product

serene scaffold
modern vine
#

Of course

#

These are the items

#

But not every HTML is like this

#

I want to convert to Python Objects in a list

lapis sequoia
#

what resources worked for you in learning ml?

exotic maple
#

It seems you're building a recommender system of sorts?

visual spear
#

how would i graph non-functions (matplotlib, numpy, sympy) from their equation inputted as a string?

#

like conic sections

#

example:
the input is

x^2/4+y^2/16=1

and it should graph an ellipse

velvet thorn
#

you need a step to parse the equation

#

and a way to decide what bounds you want

#

those are the main issues

#

sympy has a function for the former

#

but it uses eval
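A rough sketch of both steps for the ellipse example: parse the string with sympy, then draw the zero contour of `lhs - rhs` on a grid. The plotting bounds of -5 to 5 are a guess, and because the parser relies on eval, it should only see trusted input:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sympy import lambdify, symbols
from sympy.parsing.sympy_parser import convert_xor, parse_expr, standard_transformations

x, y = symbols("x y")
equation = "x^2/4+y^2/16=1"

# Rewrite "lhs = rhs" as the expression lhs - rhs, treating ^ as a power.
transformations = standard_transformations + (convert_xor,)
lhs, rhs = (parse_expr(side, transformations=transformations) for side in equation.split("="))
f = lambdify((x, y), lhs - rhs, "numpy")

# Evaluate on a grid and draw the curve where f == 0.
X, Y = np.meshgrid(np.linspace(-5, 5, 400), np.linspace(-5, 5, 400))
fig, ax = plt.subplots()
ax.contour(X, Y, f(X, Y), levels=[0])
ax.set_aspect("equal")
```

Because the curve is drawn as a level set, the same code handles circles, hyperbolas and other relations that are not functions of x.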

visual spear
#

okay

rotund dagger
#

i think this is happening because of the indexing. is there a simple way to fix it? the dates on the x axis are not showing, it appears to be "binning" the index. each row doesn't have its own index

autumn basin
#

Just add an index

#

IMO that’s the simplest way

rotund dagger
#

i got it working.

#

i just reset the index

#

however i need to figure out how to make it display all of the values of x currently it shows 6 values of x out of 88 @autumn basin

whole mica
#

Hey guys! I was wondering if there are any places i can learn AI equations/algorithms that i can incorporate into my trading bot

primal tulip
grave frost
#

it should have been the reverse - integrating your model into a trading bot. doesn't make sense to make a bot that way (unless you are using some off-the-shelf financial strategy which wouldn't yield much)

lapis sequoia
#

I have like 30 columns and good luck finding which one is actually producing a good result

raven knoll
#

I'm still trying to start that dutch text-sentiment part of my project, but I currently cannot find a pre trained model or a labeled dataset.

I found one pre trained model but im new into machine learning and I don't really understand the code

grave frost
#

I found one pre trained model but im new into machine learning and I don't really understand the code
well, do the basics first then, or find some tutorial that teaches transfer learning if you already know the basics

broken warren
#

Hello, I'd like to build an AI (neural network) to predict the 6th number of a given 5-number series. My current one (copied from the internet, but I do understand it) is able to do something like 10,20,30,40,50 but at 10,0,10,0,10, for example, it fails miserably. Do you guys have any advice on what I could do? I'm quite new to AI.

bronze skiff
#

how many training examples did you give it

hard hound
#

hey I was getting an error that said it was unable to convert to float. I searched for it on stack overflow but wasn't able to solve it

#

ValueError: could not convert string to float: 'Biggin

#

Its a house price dataset with a lot of parameters
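That error usually means a text column reached a step that expects numbers. One common way around it is to one-hot encode the text columns first; a sketch with made-up column names and values:

```python
import pandas as pd

# Made-up house rows; column names and values are placeholders.
houses = pd.DataFrame({
    "suburb": ["North", "South", "North"],
    "rooms": [3, 2, 4],
    "price": [850000.0, 620000.0, 990000.0],
})

# Text columns cannot be cast to float; find them and one-hot encode them.
text_cols = [c for c in houses.columns if not pd.api.types.is_numeric_dtype(houses[c])]
encoded = pd.get_dummies(houses, columns=text_cols, dtype=float)

print(text_cols)
print(encoded.dtypes)
```

Columns with many distinct values (street addresses, for example) produce a lot of dummy columns, so dropping or grouping them is sometimes the better choice.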

young dock
#

i have a dataframe that for some reason was missing a few indexes, how do I reorder it so it's fixed?

It goes 47774, 47775, and then, 47778

#

I want the 47778 to become 47776, and 47779 to become 47777

#

nvm i'm dumb
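For the record, one way to close gaps like that is to renumber the index consecutively. The frame here is made up:

```python
import pandas as pd

# Made-up frame whose index skips 47776 and 47777.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[47774, 47775, 47778, 47779])

# Renumber the rows consecutively, starting from the first existing label.
df.index = range(df.index[0], df.index[0] + len(df))
print(df.index.tolist())
```

If the starting number does not matter, `df.reset_index(drop=True)` does the same thing from zero.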

#

if an explanatory variable only increases the adjusted r-squared by 0.01, is it still worth including in the regression?

#

what if only increases it by 0.05?

daring peak
#

I am making a game which has 2 ais battle each other, I have coded the game (might change a few things) but here is the game and I was wondering what modules should I use or how do i get started with adding the ais? (if code is needed I'll provide)

bronze skiff
winged yew
#

anyone

winged yew
#

can anyone suggest which would be better for deploying machine learning models?
Django or Flask ?

grave frost
flint mason
#

do I need permission if I want to scrape data off linkedin ?

wicked mantle
#

torch.Size([10, 1, 28, 28]) in this shape 28, 28 is 28x28 pixel image
What is 10, 1? dimensions?
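For image batches, PyTorch's convolution layers expect the layout (batch, channels, height, width), so a shape like this reads as 10 images with 1 channel each. The same unpacking, shown with a NumPy array of that shape so it runs without PyTorch:

```python
import numpy as np

# Same shape as torch.Size([10, 1, 28, 28]), built with NumPy for illustration.
batch = np.zeros((10, 1, 28, 28), dtype=np.float32)

n_images, n_channels, height, width = batch.shape
first_image = batch[0, 0]   # first image, its single (greyscale) channel
print(n_images, n_channels, height, width, first_image.shape)
```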

daring peak
serene mural
#

How hard would it be to make a chat bot?

uncut orbit
#

without ai its quite simple

#

with ai it'll take some more work

#

but you can use telegram and gpt 2

lapis sequoia
grave frost
cold mantle
#

my number classifier is not working, i upload an image, but it always says the image is a 2 or a 0

arctic wedgeBOT
#

Hey @cold mantle!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

Can anybody explain why we choose unit sizes of 16/32?

lapis sequoia
#

why

velvet thorn
#

what unit sizes

lapis sequoia
#

Like in a Dense layer

velvet thorn
#

ah

#

like number of neurons

lapis sequoia
#

Intuitively I understand we are getting the input vectors and shoving them into neurons

#

But why multiple neurons?

velvet thorn
#

hm

lapis sequoia
#

Arent they all doing the same thing?

velvet thorn
#

are you asking

#

why multiple neurons

lapis sequoia
#

Yes

velvet thorn
#

or why powers of 2

lapis sequoia
#

Why multiple neurons

velvet thorn
lapis sequoia
#

What decides the weight?

velvet thorn
#

in general, backpropagation of error

#

you can think of each neuron as learning a very limited aspect of the relationship between data and target

lapis sequoia
#

Yeah but what part of my code is distinguishing the weights?

#

It kinda just seems that I shove it in and get answers out

velvet thorn
lapis sequoia
#

Are there preferred weighting systems?

velvet thorn
#

it will seem like that

velvet thorn
velvet thorn
lapis sequoia
#

Like, are there common weight architectures people use for more accurate results?

velvet thorn
#

"model architecture" makes sense

#

so does "weight initialisation method", but I'm not really sure what you mean by "weight architecture"

lapis sequoia
#

Basically how much each neuron's function is changing

velvet thorn
#

depending on how you mean that

#

you could be referring to learning rate

#

or optimiser

lapis sequoia
#

So if we have a linear neuron y = cx + b let's say

velvet thorn
#

(assuming we're still in the realm of gradient descent backpropagation)

lapis sequoia
#

The weights would be all the values of c and b

#

Right?

velvet thorn
#

well

#

that's an implementation detail; some use one single bias value per layer
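A small NumPy sketch of a single dense layer may make the "why multiple neurons" point concrete. It uses one bias per neuron, which is what Keras' `Dense` does by default; the sizes 4 and 16 and the random values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 4, 16
x = rng.normal(size=(1, n_inputs))          # one input row with 4 features

# Every neuron has its own weight column and its own bias. The weights start
# out random, so the 16 neurons compute 16 different functions of the same input.
W = rng.normal(size=(n_inputs, n_units))
b = np.zeros(n_units)

pre_activation = x @ W + b                  # y = cx + b for all 16 neurons at once
output = np.maximum(pre_activation, 0.0)    # ReLU

print(output.shape, W.size + b.size)
```

Training then nudges each column of `W` separately through backpropagation, which is why the neurons do not end up doing the same thing.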