#data-science-and-ml


twilit pilot
#

how long does it typically take to train an sklearn.svm model? My dataframe has about 26,000 rows and the X is a 26,000 1d np.arrays with a length of 2500 and the y is 26,000 labels. My regularization=1, gamma=auto, and kernel=rbf?

twin latch
#

Hey guys im reading sensor data from my serial ports, ive successfully read sensor data to variables but i dont know like how to append that sensor data to csv file, can anyone help me?

torpid scarab
#

hello. anyone knows any good book for logic programming and asp?

twin latch
#

Oh okay Thank you

misty flint
#

is there a website somewhere that translates all the library abbreviations

#

like a master list

eternal hare
#

could anyone help me set up torch xla on google colab, im using pytorch lightning
ive been having tons of issues and theres nothing on stackoverflow

uncut barn
#

Anyone know the answer to this?

lapis sequoia
#

Hello, I'm new in data science, are there any books you recommend to start with?

serene scaffold
misty flint
#

what would we do without the o'reilly series

serene scaffold
misty flint
#

true

#

they just have the time advantage

#

just like AWS with the cloud

#

well ig AWS has more features but thats also a function of their age

#

side note: our nlp project on contracts is on github. it would be really easy to create a docker image/container for it right?

#

my first time working with docker

fossil rivet
#

Hey I was wondering if anyone could explain a Hough Transform to me. I know it's not syntax, but it's a CV concept. I have a table that we're supposed to understand for an upcoming exam but still don't

grave frost
#

I heard that using docker is pretty difficult (mostly people complaining about cryptic errors)

#

Also a somewhat argumentative question - I only saw a very basic overview of capsule networks (from Hinton) but it does seem kind of like the hierarchy theory in HTM's and there were little credits toward Hawkins. can someone with deep knowledge explain the core difference b/w Hawkins and Hintons' approach?

fleet hare
# fossil rivet

Pretty sure that table represents the Hough parameters for the lines in the image. It's similar to the 2nd example on the wikipedia page. If you want more help I can hop in a voice chat and walk you through it but check out the wikipedia page first and see if you can figure it out https://en.wikipedia.org/wiki/Hough_transform#:~:text=The Hough transform is a,shapes by a voting procedure.

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are ob...

fossil rivet
#

Alright, will do. Thanks!

fossil rivet
empty sable
#

everyone here seems rather experienced, I have a question about python capabilities
if I had historical data on a price or level of a number
could I build an algo that could give me a rough estimate of the direction the price would go

serene scaffold
#

@empty sable what format is the data in?

exotic maple
grave frost
#

Look up stock prediction with LSTM on Google. You will find plenty of tutorials to help you out

light stump
#

I’m trying to figure out how to use scipy.interpolate.RectBivariateSpline to perform a polynomial image warp between two halves of an image stored as a numpy array. Can someone help me figure out how to accomplish this by any chance?

proven plinth
#

You dont need a network, you could just use an autoregressive model to predict the series

#

They usually get decent results with less work than a RNN

#

You need to do all the basic data cleaning and normalization tho but you probably will have to do that anyway

#

Like range scaling, making the series stationary, removing seasonality etc
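A rough sketch of the autoregressive idea above, assuming an AR(1) model fit by least squares on the first-differenced series. The price list and the helper names `difference` and `fit_ar1` are made up for illustration; a real project would more likely reach for a time-series library.

```py
def difference(series):
    """First difference: a common way to make a trending series closer to stationary."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    prev, curr = series[:-1], series[1:]
    denom = sum(p * p for p in prev)
    return sum(p * c for p, c in zip(prev, curr)) / denom


prices = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0, 105.0, 108.0]  # made-up data
diffs = difference(prices)
phi = fit_ar1(diffs)

# One-step-ahead forecast: predict the next change, then add it to the last price.
next_change = phi * diffs[-1]
forecast = prices[-1] + next_change
print(f"phi={phi:.3f}, forecast={forecast:.2f}")
```

The sign of `next_change` is the "rough estimate of the direction" asked about earlier; on real prices it should be treated as a baseline, not a trading signal.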

brisk moth
#

could someone help me with an RNN language model?

#

for sentiment analysis

empty sable
proven plinth
#

Stock prices are easier to get a hold of than sugar levels

serene scaffold
empty sable
proven plinth
#

Thats not enough data my guy

#

Im guessing

empty sable
proven plinth
#

If you go the RNN route youre gonna need years of data

serene scaffold
#

if it's enough, it might be good at predicting that person's BSLs 🤷‍♂️

empty sable
#

and then change it to sugar levels

#

if the stock one is successful

#

or at least adapt it

proven plinth
#

Blood sugar spikes in different ways per person btw, also you would probably also need inputs on their diet too

proven plinth
#

Stock prices behave better imo

empty sable
#

that's why I want to use their data specifically

serene scaffold
empty sable
#

time of day, sugar level, day it happened. I could get more starting today

serene scaffold
#

if all you have are their blood sugar levels and timestamps, all you can really do is try to fit a curve for their blood sugar levels throughout the day

#

you don't know anything about what causes those blood sugar levels

#

you'd need data about their nutritional intake, I believe

#

though curve fitting is still good.

empty sable
#

its the relation of insulin to what they eat I believe

#

nutritional intake, I could track that

fossil rivet
#

Is anyone going to help me or no

empty sable
#

what do you guys think of building one for stocks as a baseline

#

and adapting it to sugar levels

proven plinth
#

The stocks one will be easier

empty sable
serene scaffold
#

I don't really know how insulin works, as neither me nor any family members need it

proven plinth
#

And give you an idea on how to make things like this

serene scaffold
#

do you also get to know how much insulin was administered after each reading, or something?

empty sable
proven plinth
#

Insulin ratios differ from person to person

grave frost
#

what features would you use?

empty sable
serene scaffold
#

do they know what features are?

empty sable
grave frost
empty sable
grave frost
#

how would you track nutritional intakes (since you would take data on your own)

empty sable
#

just the main things from each meal, how big the meal was, calories, carbs, etc

grave frost
#

unless you would pester them every 10 minutes about what they ate and are going to eat

empty sable
#

I live them

#

with them

#

I could just note it, I help take care of them anyways

grave frost
#

cool enough

#

seems like reasonable data for your model to converge on

empty sable
#

the more data the more accurate generally right?

grave frost
#

yep

empty sable
#

ok I will start planning this out and reading up on this, thanks for letting me bounce some ideas off you. do you mind if I occasionally dm you? it's cool if you're busy

grave frost
#

cool, no problem

#

I would think that insulin level on its own would be a pretty strong feature

#

that with the nutritional intake would be good enough. but better collect all data you can

wild dome
#

I want to count the eggs in this image using OpenCV

#

what filters do I have to apply before using findContours? right now I just applied grayscale

grave frost
#

mix n match

wild dome
#

the empty spaces where there are no eggs are giving me trouble when detecting edges

marsh berry
#

Hey all, I've got a spreadsheet with lots of data in it and I want to create visuals (charts, graphs, etc) for it. Do you guys know if there is a list of general stuff I can make?

misty flint
#

how come when i call my dependencies, some of them look like this

Pillow @ file:///C:/ci/pillow_1615224175364/work

#

Jinja2 @ file:///tmp/build/80754af9/jinja2_1612213139570/work

#

instead of

grave frost
#

.....?

misty flint
#

PyPDF2==1.26.0

#

a specific version number

grave frost
#

what command?

misty flint
#

pip freeze

#

in a virtual env

grave frost
#

Its giving me the normal versions

misty flint
#

ig its just me then

grave frost
#

just a sec

#

Imma try on my server with conda

misty flint
#

ah

#

it is a conda env

#

so that might be it

empty sable
grave frost
#

@misty flint yea, conda gives some file paths

misty flint
#

thats a problem right?

#

especially when im making requirements.txt files?

#

or no

grave frost
#

prob preinstalled

#

I dunno

#

tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1609612933698/work

misty flint
#

hmm

grave frost
#

tqdm came preinstalled in my env

#

so I assume thats why there is a path

misty flint
#

i wonder how it will play when i try to throw it into a docker container

#

might have to specify a specific version

grave frost
#

make a new, clean conda env

#

if you really want the reproducibility

misty flint
#

i did with this one tho

#

or do you mean to make it without the paths

grave frost
#

no, just make a brand new one

fleet hare
misty flint
#

ah make a new requirements.txt gotcha

exotic maple
#

@misty flint did your model finish training?

simple shadow
#

hey i need help with a dataset
so all demographic info is in one column
how do i split the demographic info into different columns, like age group, gender, and etc

exotic maple
#

You're going to need some regex for that

exotic maple
serene scaffold
#

would be nice to know what an entire row of the table looks like though

simple shadow
#

there are lots of rows

#

15840 rows

serene scaffold
#

right, but there's presumably a relatively low number of columns?

exotic maple
#

good luck cleaning that mess lol

#

but also, are those repeat instances?

#

I mean. does it have ALL attributes in the same row? or just one attribute or something

#

thats weird

simple shadow
#

ok i will take a screenshot of the entire row

exotic maple
#

that's... messy

#

you can keep it as it is

#

or you can create new columns for each type of breakout

#

but what will you do for empty ones?

#

NaN or 0?

#

it matters a lot for ages, for example

simple shadow
#

should i do mean of ages?

exotic maple
#

that's not what I mean

simple shadow
#

i mean for the empty ones, should i put the mean of age

exotic maple
simple shadow
#

ohh

exotic maple
#

you "could" set the mode of age-range as the fill-in value, but that's your call as researcher

#

and that's only for age. What about gender, race, etc.

simple shadow
#

race is a tricky situation

exotic maple
#

the problem is that break-out column has a lot of mixed info, so you need to decide what to do with all that data

#

and most importantly, what to do with missing data

simple shadow
#

should i keep it as is because race is tricky to do missing data for?

exotic maple
#

thats your call. you're the researcher

#

I'd keep it, but the most pressing issue for me is "wtf do i do with rows that do not have that info"

simple shadow
#

ohh

#

i looked, and those columns dont have missing data

#

i was wondering how do i make statistical stuff with it if it's all mixed info? @exotic maple

exotic maple
#

I didnt explain myself properly

#

think it like this

#

you have a single column called "TYPES" that holds data of type: "Age", "gender", "race", etc.

On a normal DB each of those would be a single column in itself. In your DB, this is all in a single column, which means you have mixed signals in a single feature.

Ideally, you want to separate that feature into multiple features that actually make sense (each one in their single column, as they are independent of each other), but if you do that, you will have missing data because 1 row can only have 1 of either type.

#

So if you create an age column, you will get NaN for all the rows where there is no age specified
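A small sketch of that split, assuming pandas and made-up column names (`record`, `type`, `value`); the real table's layout was only shown in a screenshot, so treat the names as placeholders.

```py
import pandas as pd

# Hypothetical long-format data: one demographic attribute per row.
df = pd.DataFrame({
    "record": [1, 2, 3, 4],
    "type": ["Age", "Gender", "Age", "Race"],
    "value": ["18-24", "Female", "25-34", "Asian"],
})

# One new column per attribute type; rows that hold a different type get NaN.
for kind in df["type"].unique():
    df[kind.lower()] = df["value"].where(df["type"] == kind)

print(df[["record", "age", "gender", "race"]])
print(df[["age", "gender", "race"]].isna().sum())
```

The NaN counts printed at the end are the missing data that then has to be handled per column.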

simple shadow
#

ohhh

#

so i can't split the features into different column due to the missing info?

exotic maple
#

You can, in fact, you should, but you need to deal with the missing data

simple shadow
#

first can i split them into different columns and then deal with the missing data?

exotic maple
#

yes, that's what I would do

simple shadow
#

thank you for your help!! @exotic maple

exotic maple
#

no prob.

#

I also used you to procrastinate and not work on this regex so :v

simple shadow
#

XDD

exotic maple
#

man pandas is so goddamn powerful. Even if you never do any data science, pandas itself is worth all the struggle

misty flint
#

i will die for pandas

undone heron
#

hey guys, I'm working on a df in pandas and after a merge one of the columns is coming in with NaN values.

```py
import concurrent.futures as cf

import pandas as pd


def preprocess(x):
    df = pd.merge(df_gps, x, on=['bus_id', 'date_time'], how='left')
    df.dropna()  # returns a new frame; the result is not assigned here
    df.to_csv("./mobility-dataset/merge_gps_translated_validation.csv", index=False)


reader = pd.read_csv("./mobility-dataset/translated_validation.csv", chunksize=10000)

futures = []
with cf.ThreadPoolExecutor(max_workers=6) as exe:
    for r in reader:
        r['date_time'] = pd.to_datetime(r['date_time'], format='%Y-%m-%d %H:%M:%S')
        r['busline_id'] = r['busline_id'].astype('int32')
        r['bus_id'] = r['bus_id'].astype('int32')

        futures.append(exe.submit(preprocess, r))

cf.wait(fs=futures)

df = pd.read_csv('./mobility-dataset/merge_gps_translated_validation.csv', nrows=1000000)
df
```
#

here is the output and above is the merge code between two df

#

From the two df, busline_id is actually ok, int32 value and what not

#

Appreciate any thoughts on why the NaN is coming

serene scaffold
#

@undone heron I'm looking at this, but please add a py to your code sample

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

Please ping when you reply or I will probably not know that you replied.

undone heron
#

sure

#

pre process basically returns

serene scaffold
#

It just so happens that I'm still here

undone heron
#

Oh ok lul

serene scaffold
#

though for future reference, if I'm helping you, always ping when you've completed your response no matter what.

#

so what does it return?

undone heron
#

It returns the image below the code, basically that csv I'm writing @serene scaffold

serene scaffold
#

preprocess actually does not return anything

undone heron
#

Wait u mean ping you at the end everytime?

#

Well the at the end it is processing what I want, so the final result is the DF at the image

serene scaffold
#

Once you have a completed thought that you are ready for me to read and there are no more messages that you are going to type until I respond, ping.

#
```py
def preprocess(x):
    df = pd.merge(df_gps, x, on=['bus_id', 'date_time'], how='left')
    df.dropna()
    df.to_csv("./mobility-dataset/merge_gps_translated_validation.csv", index=False)
```

Nothing is returned

#

(except, well None)

#

Depending on what exe.submit does, it may be that you don't need it to return something.

#

Note that saving something to disk is not the same as returning it.
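A minimal sketch of the difference, assuming the worker should hand its result back through the future. The doubling step is only a stand-in for the real merge, and the `chunks` list stands in for the chunks from `pd.read_csv`.

```py
import concurrent.futures as cf


def preprocess(chunk):
    """Return the processed chunk instead of only writing it to disk."""
    return [value * 2 for value in chunk]  # stand-in for the real merge


chunks = [[1, 2], [3, 4], [5, 6]]  # stand-in for the chunks from pd.read_csv

with cf.ThreadPoolExecutor(max_workers=2) as exe:
    futures = [exe.submit(preprocess, chunk) for chunk in chunks]
    # result() blocks until that task finishes and hands back its return value.
    results = [future.result() for future in futures]

print(results)  # [[2, 4], [6, 8], [10, 12]]
```

Calling `result()` also re-raises any exception from the worker, so failures inside `preprocess` don't go unnoticed.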

undone heron
#

For sure but on that code, writing to csv is what I need at the end. How would u recommend for me to return something with that futures.append()? @serene scaffold

serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

^ you can paste the CSVs there.

undone heron
#

Oh, nice 1 sec

serene scaffold
#

I think it is! and the second one is x?

undone heron
#

yep, it is the other csv I'm "merging" with @serene scaffold

serene scaffold
#

@undone heron is the problem that you're getting nans in busline_id?

#

and if so, how much do you know about the different types of joins you can do?

undone heron
#

Yep, that is the problem. I'm quite new to this whole thing so I don't know that much about the join types. I know the theory somewhat but maybe you have better insight. @serene scaffold

serene scaffold
#

for one thing, are all your datetimes on 2015-03-11, or is that just because of the rows you picked out for me?

#

Also, here are some of the different types of joins


```
left: use only keys from left frame.
right: use only keys from right frame.
outer: use union of keys from both frames.
inner: use intersection of keys from both frames.
```
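To make the join types concrete, here is a small sketch with made-up frames (the real `df_gps` and validation data are much larger and the `lat` column is an assumption). It shows a left join producing NaN where the right frame has no matching key, and an inner join dropping that row instead.

```py
import pandas as pd

# Made-up frames: bus 3 has a GPS row but no matching validation row.
df_gps = pd.DataFrame({"bus_id": [1, 2, 3], "lat": [10.0, 11.0, 12.0]})
df_val = pd.DataFrame({"bus_id": [1, 2], "busline_id": [100, 200]})

left = pd.merge(df_gps, df_val, on="bus_id", how="left")
inner = pd.merge(df_gps, df_val, on="bus_id", how="inner")

print(left)   # bus 3 is kept, with NaN in busline_id
print(inner)  # bus 3 is dropped, so no NaN appears
```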
#

@undone heron do those types of joins make sense, or do you need me to explain them?

undone heron
#

That is some test data for just one day to decrease the amount of data I need to process. So it is all on the same date, yes.
Makes sense! @serene scaffold

undone heron
#

So... Maybe outer makes more sense here? @serene scaffold

serene scaffold
#

because "outer" is the union-like join.

undone heron
#

I see, but it feels like the busline_id nan thing is because it is not keeping the column after the merge properly, right? @serene scaffold

serene scaffold
undone heron
#

I know! I just don't understand why the column is not being kept, I'm not using it as a key for the merge... Why would it become NaN? @serene scaffold

serene scaffold
#

the way that join operations work. If you use the wrong type of join for what you are trying to do, Pandas might fill in some blanks with NaNs.

#

Think of it this way

#

you join the tables based on date_time and bus_id, right?

#

within each dataframe, could there ever be two rows that have the same values for both date_time and bus_id?

undone heron
#

Matching rows on both dataframes based on those 2 keys? Yes, that is what I'm looking for. @serene scaffold

serene scaffold
undone heron
#

Inner lul

serene scaffold
#

let me know if that works

undone heron
#

Great stuff, running the script again, should be ready in about 1 hour. Thanks in advance! Really appreciate the way it was explained ok_handbutflipped

serene scaffold
#

it will be an hour before we know if it worked?!

serene scaffold
exotic maple
nocturne widget
#

I want to identify how similar older sections of text are to newer sections of text we'll call "section A". What type of data should I be looking at? I can't just take only "section A" text, but also "non-section A" text. But would I look into taking this "non-section A" text as any other section besides section A? Also, for text similarity models, would you suggest using an LSTM or a siamese NN? Currently I just have an LSTM in tf.

undone heron
#

Well, once it reached that script section it was pretty fast and it failed with zero entries all Dtypes as objects non-null @serene scaffold

exotic maple
#

I think im going crazy...why i do comment like im talking omg

undone heron
#

Hey guys, I have another question not exactly related to my problem above earlier.

I have a couple of dataframes such as validation.csv and I need to "translate" the values of a column and in a way map it to other values I have in another csv (links.csv). How would I go about to do that? For example:

on links.csv
230, 555 -> meaning if I find 555 in validation.csv I need to change it to 230 and so on through the whole file.

Any thoughts?

misty flint
# exotic maple

haha it gives character. i would like reading comments like that

lean ledge
#

Luckily for you, your images seem well controlled

#

Keep in mind you don't need to find the whole segment of the egg, you can just find one half of the egg and count those too

#

If that gives you trouble, you can also find contours as you're doing now and then filter by area. The eggs should be bigger than the small holes

harsh trellis
#

there is some heavy skew in it, so would a power transform be a good choice, or should I use a log transformation instead? Box-Cox doesn't seem like it will work here, since it only works on positive values

tacit stump
#

Is it possible to do text classification using linear regression model by converting the strings to their sum of the ascii values of all the characters?

wide oxide
sweet plaza
#

I have an assignment, basic machine learning application, but I'm very new in this.

there must be 1 runner and 2 chaser neural networks (each one of them are separate neural networks) and chasers aim to catch the runner and runner moves randomly.

what kind of ML can be applied here, unsupervised or reinforcement ? and which library would be more appropriate to use?

P.S. runner should be different program, and the environment is common for runner and chasers. environment has walls and created randomly

wet cedar
#

Does anyone know the path where I can see the list of pre-trained models in pickle?
I tried getting to pickle file but it was formatted with unsupported or binary formatting so, I can't somehow understand the values.

#

^^this returns an error that the file I'm trying to load does not exist. I want to know if its name has been changed due to updates or something like that

serene scaffold
#

Or better yet, replace

#

!docs pandas.DataFrame.replace

arctic wedgeBOT
#
```py
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
```
Replace values given in to_replace with value.

Values of the DataFrame are replaced with other values dynamically. This differs from updating with `.loc` or `.iloc`, which require you to specify a location to update with some value.

Parameters
**to_replace**: str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.

• numeric, str or regex:
    • numeric: numeric values equal to to_replace will be replaced with value
    • str: string exactly matching to_replace will be replaced with value
    • regex: regexs matching to_replace will be replaced with value
• list of str, regex, or numeric:... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html#pandas.DataFrame.replace)
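A short sketch of how `replace` could handle the links.csv translation asked about earlier. The frames and the column names `new_id`, `old_id`, and `stop_id` are assumptions, since the real files were not shown.

```py
import pandas as pd

# Made-up stand-ins for links.csv (new_id, old_id) and validation.csv.
links = pd.DataFrame({"new_id": [230, 231], "old_id": [555, 556]})
validation = pd.DataFrame({"stop_id": [555, 556, 999]})

# Build an {old: new} lookup, then replace matching values in one column.
mapping = dict(zip(links["old_id"], links["new_id"]))
validation["stop_id"] = validation["stop_id"].replace(mapping)

print(validation["stop_id"].tolist())  # [230, 231, 999]
```

Values that do not appear in the mapping (999 here) are left unchanged.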
wanton laurel
#

very quick pandas question - trying to apply a regex expr to my df column like so:
df.desc_copy = df.desc_copy.apply( lambda x: re.sub(r'(([0-9]{2})(Jan|Feb|Mar|Apr|May|Jun|Jul|Sep|Oct|Nov|Dec)([0-9]{2}))' r'(ON \d\d\d\d-\d+-\d+)|(\d+\d+)', '', str(x)))
the expr is supposed to remove each date string from every row in the desc_copy column but no effect is taking place - why?

#

i was printing the wrong column 😑

grave frost
dim trail
#

hola

#

Hey guys I need some help. I am currently doing my thesis and I got stuck in creating a dictionary.
My data comes from an experiment in which they asked people what quantity of CO2 they think certain products emit. In total I have 17 products, which means I have a dataframe with a length of N * 17. What I want is to create a nested dictionary that stores all the responses of the individuals, something like: {1: {car:200,beer:500}, 2: {car:5,beer:10}, ..., N:{car:NN,beer:NN}}. How can I do this?

serene scaffold
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

@wanton laurel is the problem that your regular expression is not matching anything? or is your problem with usage of dataframes? Please ping me if/when you're ready to continue or I may never know that you've replied.

serene scaffold
#

In either case, please run print(df.iloc[:5].to_csv()) and copy/paste the string into this chat so I can see what your data looks like.

serene scaffold
wanton laurel
#

Yeah that's Right, no longer needed, you too

dim trail
serene scaffold
dim trail
dim trail
#

I would like to have a row with all his responses instead of 17

#

that's why I am trying to create a nested dict

serene scaffold
#

@dim trail so given 0,5e6cf1cb28e5a82aed026a8f,1,500.0,car, what sub-dict do you want?

dim trail
#

one like this: {1 (this is the subject, 5e6cf1cb28e5a82aed026a8f): {car:200,beer:500}, 2: {car:5,beer:10}, ..., N (this is subject N): {car:NN,beer:NN}}

serene scaffold
#

I am only asking about 0,5e6cf1cb28e5a82aed026a8f,1,500.0,car

dim trail
#

,ppnr,subject,mode,product
0(index),5e6cf1cb28e5a82aed026a8f(identifier of subject 1),1 (subject 1),500.0 (his guess),car(product)
1,5e6cf1cb28e5a82aed026a8f,1,4.0,carSocialCost ---> all this is from same subject, I want all responses as dict in a bigger dict
2,5e6cf1cb28e5a82aed026a8f,1,2.0,warming
3,5e6cf1cb28e5a82aed026a8f,1,700.0,heatwave
4,5e6cf1cb28e5a82aed026a8f,1,900.0,seaLevelRise

serene scaffold
#

does car: 200 come from a row that is not given in the sample?

dim trail
#

no, that's just an example of mine

serene scaffold
#

so each sub-dict in your outputted, nested dict will represent data from different rows?

#

or does each row of the dataframe get represented as one sub-dict in your nested dict?

dim trail
#

no, each subdict will represent data for the subjects

#

for subject 1 I have 1

#

for subject 1 I have 17 responses

serene scaffold
#

Okay, so the data structure you want is Dict[str, Dict[str, int]], and each key-value pair in the inner dicts is a row that has the same subject as the outer dict

#

let me see

dim trail
#

yes

serene scaffold
#

are you using mode for anything?

dim trail
#

mode are their guesses

#

Dict[subject, Dict[product, mode]]

serene scaffold
#

great

#

what is ppnr for?

dim trail
#

ppnr is the unique identifier of each subject, I have that to identify each person in my others datasets

serene scaffold
#

so what is the key for the outer dict? the subject or the ppnr?

dim trail
#

the subject,

#

it would be more readable for me

serene scaffold
dim trail
#

yes, no problem. there is only one subject 1 in the whole experiment

serene scaffold
#

okay, so we can drop the ppnr column, basically

dim trail
#

yes

serene scaffold
#

@dim trail I'm still looking into it

dim trail
#

thanks

serene scaffold
#

@dim trail still looking

dim trail
#

is it very complicated? I tried for hours

serene scaffold
ripe forge
#

. / gasp

ripe forge
#

Dataframe iteration is usually a sin. So there's a good chance I may request you to explain the context again, waste 30 min, and then come to the same conclusion.

#

Though if you're making dictionaries out of it you're anyways leaving dataframes behind

dim trail
#

I will try row iteration and if I can't find a solution, I'll abandon the idea and try something else

serene scaffold
#

I'd actually like to know if there's a "panda-ic" solution

#
```
,subject,mode,product
0,1,500.0,car
1,1,4.0,carSocialCost
2,1,2.0,warming
3,2,700.0,heatwave
4,2,900.0,seaLevelRise
```

The desired output is:

```py
{1: {'car': 500.0, 'carSocialCost': 4.0, 'warming': 2.0},
 2: {'heatwave': 700.0, 'seaLevelRise': 900.0}}
```

The problem is that you're basically trying to create new columns based on values in the product column.
@ripe forge I tried to do a pivot table and then do to_dict

dim trail
ripe forge
#

!e

```py
import pandas as pd
from io import StringIO

string = """,subject,mode,product
0,1,500.0,car
1,1,4.0,carSocialCost
2,1,2.0,warming
3,2,700.0,heatwave
4,2,900.0,seaLevelRise"""

df = pd.read_csv(StringIO(string))


def dict_creator(df):
    return dict(zip(df['product'], df['mode']))


out = df.groupby('subject').apply(dict_creator).to_dict()
print(out)
```
arctic wedgeBOT
#

@ripe forge :white_check_mark: Your eval job has completed with return code 0.

{1: {'car': 500.0, 'carSocialCost': 4.0, 'warming': 2.0}, 2: {'heatwave': 700.0, 'seaLevelRise': 900.0}}
ripe forge
#

this would be my knee jerk reaction to it, but it's essentially looping via apply

serene scaffold
ripe forge
#

./blushes crimson

#

i dont think you'll have a lot of gains because ultimately vectorization has to be broken to create the dictionaries at the end though. ideally for larger datastructures you want to avoid going back to dictionaries if possible when using pandas. but this probably is enough for OP's needs
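Since the end result is plain dictionaries anyway, the same structure can be built without pandas. This is only a sketch under that assumption, reusing the sample rows from above and the standard-library `csv` module.

```py
import csv
from io import StringIO

text = """,subject,mode,product
0,1,500.0,car
1,1,4.0,carSocialCost
2,1,2.0,warming
3,2,700.0,heatwave
4,2,900.0,seaLevelRise"""

nested = {}
for row in csv.DictReader(StringIO(text)):
    # setdefault creates the inner dict the first time a subject appears.
    inner = nested.setdefault(int(row["subject"]), {})
    inner[row["product"]] = float(row["mode"])

print(nested)
```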

dim trail
grave frost
#

simplest code to remove a specific list of words from another list of words? maybe using sets?

ripe forge
#

could you elaborate? does the toy example showcase your original problem adequately?

ripe forge
grave frost
#

yea, is there any simpler method without iterating?

#

something elegant

ripe forge
#

iterating is the simple method. you could use set intersection(edit:? not intersection, difference) if you really wanted to, but you arent gaining performance there

#

[word for word in words if word not in words_to_remove] # where words_to_remove is a set
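A quick sketch comparing the two options on a made-up word list: the comprehension keeps order and duplicates, while set difference does not.

```py
words = ["the", "cat", "sat", "on", "the", "mat"]
words_to_remove = {"the", "on"}  # a set gives fast membership checks

# The comprehension keeps the original order and any duplicates.
kept = [word for word in words if word not in words_to_remove]
print(kept)  # ['cat', 'sat', 'mat']

# Set difference also removes the words, but loses order and duplicates.
unique_left = set(words) - words_to_remove
print(sorted(unique_left))  # ['cat', 'mat', 'sat']
```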

grave frost
#

one of the list is nested which complicates it 😅

ripe forge
#

ah, the plot thickens

#

can you make a minimal example that showcases the question adequately?

exotic maple
#

You could probably iterate over every element but that sounds very efficient lol

#

doesnt sound

#

@grave frost is it possible to transform those lists to numpy arrays? If you can, you could create a 2d array and filter via masking

#

should be more efficient as a vector operation instead of iterating over N elements of nested lists

grave frost
#

@exotic maple nah, its done with python

#

thanx anyways

exotic maple
#

oh you want it in base python?

spare vine
#

[el for sublist in mylist for el in sublist if el not in words_to_remove]

#

nested for loop but in a list comprehension

grave frost
#

no, I meant the prob has already been resolved in pure python, so no need to use numpy 🙂

exotic maple
#

I mean yes, you can do it in pure python. Let me think...

I would try this to get all unique words:

```py
set_var = set()
for sublist in mylist:  # avoid naming the variable `list`, which shadows the built-in
    for element in sublist:
        set_var.add(element)
```
#

question: Do you want to delete unique words or unique lists (as in, the actual block)?

spare vine
#

for the general case of arbitrarily nested lists you get to use recursion
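A sketch of that recursive approach, assuming the data is made of plain lists and strings; the function name `remove_words` is hypothetical.

```py
def remove_words(nested, words_to_remove):
    """Return a copy of a nested list with unwanted words dropped at every depth."""
    cleaned = []
    for item in nested:
        if isinstance(item, list):
            cleaned.append(remove_words(item, words_to_remove))
        elif item not in words_to_remove:
            cleaned.append(item)
    return cleaned


data = ["a", ["the", "cat", ["on", "mat"]], "the"]
print(remove_words(data, {"the", "on"}))  # ['a', ['cat', ['mat']]]
```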

grave frost
#

leave it guys 🙂

#

If you want to see the solution, you can check #help-apple

#

(but you will have to scroll up)

spare vine
#

or here: [el for sublist in mylist for el in sublist if el not in words_to_remove]

grave frost
#

no, it had to be done like this:

```py
out = [[word for word in words if word not in [_.replace(' ','') for _ in translated_stop]] for words in tqdm(data_text)]
```
heady tide
#

I have a multilayered perceptron implemented in python, but it spits out probabilities instead of classes, what do I need to do for it to return classes ? Should I change the activation function of the last layer to softmax ?

exotic maple
#

are you sure you aren't using predict_proba?

heady tide
#

I am just using the forward propagation value

#

the result from the last layer
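One common way to go from last-layer probabilities to classes is to take the index of the largest value per sample (argmax). Softmax on its own only normalizes the outputs; it still returns probabilities. The sketch below assumes one row of class probabilities per sample, and the `outputs` values and the name `to_classes` are made up.

```py
def to_classes(probabilities):
    """Pick the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Made-up outputs of a final layer for three samples and three classes.
outputs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
print(to_classes(outputs))  # [1, 0, 2]
```

For a network with a single sigmoid output, the usual equivalent would be thresholding at 0.5 instead.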

shy kraken
#

I'm trying to improve my documentation reading skills. I'm looking at numpy's linspace and I see code like this:

test = np.linspace(0, 500, 12)

and you get the same result if you do this:

test = np.linspace(0, 500, num=12)

Now when I look at the docs, it looks like this:

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)

My question is how do I know that I can type "num" or not and it will work the same? When I first saw it I was confused what the 12 did...

tidal bough
#

that's why when you pass 3 arguments, they resolve to start, stop, num

#

if you passed 4, the fourth one would be assumed to be endpoint, the fifth one retstep, and so on

#

Specifying arguments by name allows you to specify them regardless of position. Say, dtype you need to pass by name unless you are also passing num, endpoint and retstep
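A toy illustration of the same rule, using a hypothetical `linspace_like` function that copies only the first few parameters of the numpy signature quoted above. It is not numpy's implementation.

```py
def linspace_like(start, stop, num=50, endpoint=True):
    """Hypothetical stand-in that mirrors the first parameters of numpy.linspace."""
    steps = num - 1 if endpoint else num
    return [start + (stop - start) * i / steps for i in range(num)]


a = linspace_like(0, 10, 5)               # third positional argument binds to num
b = linspace_like(0, 10, num=5)           # same call, num given by name
c = linspace_like(0, 10, endpoint=False, num=5)  # keywords can come in any order

print(a)       # [0.0, 2.5, 5.0, 7.5, 10.0]
print(a == b)  # True
print(c)       # [0.0, 2.0, 4.0, 6.0, 8.0]
```

Any parameter shown with `=something` in the docs has a default, so it can be left out, passed by position, or passed by name.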

shy kraken
#

understood thanks! @tidal bough

misty flint
#

that concept of positional vs. keyword arguments

exotic maple
#

functions and their definitions (*args, **kwargs) sounds like you're grumbling

sacred gate
#

Hi. I would like to ask for some help choosing literature for data science (more precisely, the bioinformatics field). I'm reading "Mathematics for Machine Learning" by M. P. Deisenroth and "Practical Statistics for Data Scientists" by Peter Bruce. But I'm not sure about my choice, especially the second one. Could you recommend some books on statistics for DS (especially with Python examples)?

P.S.: I have named just the books I'm reading at the moment. I'm also planning to read "Data Science from Scratch" and "Deep Learning for the Life Sciences" from O'Reilly, and some other books.

Thanks in advance

hollow sentinel
#

how are you liking practical statistics for DS

#

I have that book stashed somewhere

lean ledge
#

Bishop's machine learning book or elements of statistical learning are the classic ML references

sacred gate
#

@hollow sentinel I have a sense that it's incomplete; maybe that's explained by the fact that this book doesn't cover the mathematical part of statistics

sacred gate
lean ledge
#

Personal preference. I like bishop more

granite wolf
#

Hey please could someone help me with a for loop where I'm trying to read multiple tables from an SQLite database, the current code looks like this:

#

so i have a list of tables within the database, and the idea is that for each table in the list a dataframe is produced as df_<table-name>

#

for example the output should be 7 dataframes: df_sqlite_sequence, df_Player_Attributes etc

#

but currently it is just loading the last table (Team) as df_table, overwriting the previous one

exotic maple
#

@granite wolf look at your code and think about what it does.

Your loop goes through every query and assigns the resulting dataframe to a variable. This variable does not change in any form in any iteration, so it's the same variable every time.

#

basically, you're overwriting the resulting dataframes with each step of the loop. That's why you only get the last table

#

now, depending on what you want there are many ways to move forward

#

you can create a list and store the dataframes there as elements of the list. This is the simplest solution and they will have the same order as the parent list, but you will not have any linking between them

granite wolf
#

im basically aiming for different dfs for each table in the database
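
One way to get "one result per table" without the overwrite is a dictionary keyed by table name. The sketch below is an assumption-laden illustration: it uses an in-memory database with two made-up tables and plain `sqlite3` rows so it runs with no extra packages. With pandas, the marked line would become `tables[name] = pd.read_sql_query(query, conn)`.

```python
import sqlite3

# Throwaway in-memory database standing in for the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Team (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE Player (id INTEGER, name TEXT)")
conn.execute("INSERT INTO Team VALUES (1, 'Reds')")
conn.executemany("INSERT INTO Player VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])

# Ask SQLite itself which tables exist.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

tables = {}
for name in table_names:
    # Table names cannot be bound as query parameters, so they are quoted
    # into the statement; only do this with names you trust.
    query = f'SELECT * FROM "{name}"'
    # A new key per iteration, so nothing gets overwritten.
    tables[name] = conn.execute(query).fetchall()  # pandas: pd.read_sql_query(query, conn)

conn.close()
print({name: len(rows) for name, rows in tables.items()})
```

`tables["Team"]` then plays the role of `df_Team`. A dictionary is usually easier to work with than generating variable names like `df_<table-name>` on the fly.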

sour mango
#

hey guys, im new to ml... i wanted to know how you can parse a live video feed from your computers webcam? ( i am trying to make a rock paper scissors game where the i play with the computer by showing rock paper or scissors (done by my hand) to the camera)

shy kraken
#

Got a question, let's say you have a computer that can do your machine learning operations reasonably well but not great, meaning it takes a little long to do. Is there any reason why you wouldn't use google colab? Like is there a benefit to just keeping everything on your computer?

uncut orbit
#

there isn't really a benefit to keeping everything on your computer

#

but it takes a little long for mostly everyone

#

if you switch everything to another computer....then it makes it harder to get back all of your projects

sour mango
uncut orbit
#

welcome

#

anytime

shy kraken
uncut orbit
#

its not too hard

#

but it is a headache

#

you need to move files from this computer to that

shy kraken
#

ahh ok, so you're advocating for putting it on colab because of the cloud

uncut orbit
#

yea

#

use github too

#

private repo

shy kraken
#

yeah that makes sense

#

thanks!

uncut orbit
#

welcome

iron basalt
#

The upside to cloud is that it does not matter which computer you use, all your stuff is always there. The downside is that you need an internet connection (that is stable and decently fast).

grave frost
#

but overall, the recommendation is GCP. Colab is good for newbies

iron basalt
#

Idk depends on how quickly you are doing things, the stability and lag can get in the way of a fast feedback loop (assuming the operations themselves don't take very long).

#

(And images can take a long time if you have a very slow internet speed)

grave frost
#

The most overhead I personally face is in the preprocessing part, but thats something even my decade-old laptop can do

iron basalt
#

Either way, the answer is just use both, local and cloud. Backups are always good.

uncut orbit
#

colab is good for sharing code too

iron basalt
#

Yup, hence the name.

uncut orbit
#

P.S. don't fear the loss of your job....the notebook makes basic predictions. we'll always need people to fine tune models 😉

grave frost
#

don't fear the loss of your job
.....?

bitter harbor
#

is there a reason to use seaborn over mpl?

#

my profs exclusively teaching it and i can't see why

uncut orbit
grave frost
uncut orbit
#

yea

grave frost
#

Data science has nothing to do with automating jobs. it is about deriving insight with data

uncut orbit
#

it is

#

thats true

grave frost
#

any field even remotely close to that is just robotics

uncut orbit
#

but i view data science as making life easier

grave frost
#

well, that's pretty wrong

uncut orbit
#

ok

hollow sentinel
#

I think you're thinking of ML engineers

#

they're more focused on automation

uncut orbit
#

hmm

#

maybe

#

but please get back to me on the notebook pls feel free to dm me

grave frost
uncut orbit
#

ok now im confused

sage aurora
#

hello

grave frost
uncut orbit
#

automation using data science

grave frost
sage aurora
#

i need a very simple help; need to plot a function in a subplot (basic)

uncut orbit
#

what about integrating data science with automation

uncut orbit
grave frost
#

data science is just an umbrella term to signify someone very experienced with statistics and other relevant fields to derive insight from data.

grave frost
grave frost
uncut orbit
#

like self driving cars

grave frost
uncut orbit
#

wait i get that too

#

oh shoot

bitter harbor
#

what would you put ai under?

uncut orbit
#

i got all my definitions wrong

grave frost
#

Data scientists do a little of ML (Machine Learning) but its mostly ML researchers and engineers that do the more complex stuff

uncut orbit
#

ok

grave frost
uncut orbit
#

never thought of it that way

bitter harbor
#

huh I always thought it and nn's were fields of data sci

grave frost
#

there's no exact definition for a data scientist. but we can draw some lines

uncut orbit
#

ok

#

now if i want to do ai with robotics how would that work

grave frost
#

NN's (Neural Networks) are mostly grouped under ML

grave frost
#

but you can learn right now too 🙂

uncut orbit
#

i've been doing data science since i was 12

grave frost
uncut orbit
grave frost
uncut orbit
#

ai with robotics

#

hmm

hollow sentinel
#

you've been learning it since you were 12? your math skills must be good

grave frost
#

Thats Reinforcement learning

uncut orbit
#

lmao

hollow sentinel
#

I'm not

grave frost
uncut orbit
#

ok ig i'll learn reinforcement learning

grave frost
#

no offense

hollow sentinel
#

I don't like the accusation

uncut orbit
grave frost
uncut orbit
#

using neural nets right?

grave frost
grave frost
#

like both aim to optimize some function

hollow sentinel
#

there's a generous amount of math that's important to know

#

that's all

uncut orbit
#

now what resources do i need?

grave frost
#

you can learn about it more by watching 3blue1brown for some basic maths and then prob pick some course

hollow sentinel
#
#

this is good

uncut orbit
#

ok

#

phd takes 12 years right?

grave frost
uncut orbit
#

how long for reinforcement learning?

grave frost
#

BTW what is your prior experience? just curious

grave frost
uncut orbit
grave frost
#

its much better to have an overall positive attitude towards learning than just doing something in "x" amount of time

grave frost
uncut orbit
#

igs

uncut orbit
grave frost
#

well, then you have the basics already done. I think you can move on to RL from there on

uncut orbit
#

ok

#

thx so much

grave frost
#

cool, no worries

iron basalt
#

ROCm support in pytorch is so nice. Don't need a Nvidia GPU.

#

Saves $$$.

grave frost
#

Is ROCm good now? I had ordered an AMD GPU before cuz I wanted to try it, but I got disappointed with the bugs and performance so returned the card to get an Nvidia one

#

But I read some intellectual discussion where they mentioned weird C stuff to prove that AMD cards won't be able to compete with CUDA's performance.

iron basalt
#

AMD vs Nvidia performance tests are all really bad (both ways), just try it yourself.

#

(Even if one could be faster than the other it's also limited by how much effort was put into each by the library being used)

grave frost
#

Nvidia poured billions on CUDA

iron basalt
#

Newer AMD GPUs align more with Nvidia GPUs too

#

That's because Nvidia wanted an iron grip on the ML community. So all the libraries added CUDA support and ignored AMD.

grave frost
#

AMD hasn't made many contributions to computing, and OpenCL sucks

grave frost
iron basalt
#

Yeah, they invested.

grave frost
#

If it were me, I would have gone with CUDA too. its the most sane decision

iron basalt
#

But AMD is cheap, and all that so it's totally worth having support for it even if it's slower.

#

I think the way it works with ROCm is that it somehow runs the CUDA code on AMD gpus.

grave frost
#

yeah, AMD is so great. Nvidia just had a monopoly and milked all the money

iron basalt
#

so they're still using the same CUDA-style code (translated through HIP)

grave frost
iron basalt
#

That way they don't need to recode everything

grave frost
#

the aim should be for native

#

not hacky stuff.

iron basalt
#

Not sure how hacky it is, if at all

heady tide
#

The graph on the right represents the error of the output layer after each epoch, is this normal for a MLP with these hyperparameters ?

grave frost
heady tide
#

I made it, a live visualization of how a multilayered perceptron works, using PyQt and multiprocessing to avoid the GIL

grave frost
#

that is pretty cool

heady tide
#

thank you

grave frost
grave frost
heady tide
#

well the tricky thing is that you have to run the neural network on a separate process because you're bound to run into the GIL if you run it on the main thread, so you have to create pipelines between the GUI and the network to exchange data.

#

A lot of people don't bother making visual representations but for me it's very insightful to see how everything works in real time

iron basalt
#

ROCm is pretty much native for AMD. Though it's not for all of their cards.

grave frost
#

OpenCL loses perf to CUDA

iron basalt
#

Yeah, because it's locked off from a bunch of things on Nvidia GPUs.

grave frost
#

I dunno 🤷 There is a whole github issue about it with plenty of technical arguments

iron basalt
#

It's pretty well known that the OpenCL drivers are intentionally crippled on Nvidia.

grave frost
#

....?

#

I mean OpenCL on AMD vs CUDA on Nvidia

iron basalt
#

Well AMD is different hardware.

#

It's not really just a OpenCL vs CUDA thing then. And can't really be compared.

#

You can do price per compute

#

Or something like that.

grave frost
#

well, its different architecture

#

but from what I barely understood, OpenCL is in general inferior to CUDA according to some C and optimization stuff

lean ledge
#

Data scientists can be anything from no ML to all ML, because it's a generic buzzword. Many ML engineers do no real ML, just software engineering around ML

lean ledge
#

If you're looking at stuff like Boston Dynamics, they use no "AI"

#

Good ol mechanical engineering and control theory

iron basalt
# grave frost well, its different architecture

Not really, that's just what Nvidia just wants you to think or some random people on the internet. OpenCL and CUDA do not really have anything to do with C optimizations. It's just two different APIs, what really matters is the hardware itself.

lean ledge
#

CUDA and OpenCL are unrelated to architecture lol

#

OpenCL can work on basically anything

iron basalt
#

The only issue with OpenCL is like I stated, on Nvidia hardware it's locked off from some stuff.

lean ledge
iron basalt
#

And also how much effort is put into supporting it

lean ledge
#

The OpenCL api is also not as good as the CUDA one

iron basalt
#

^

lean ledge
#

CUDA is really really good

iron basalt
#

CUDA is really convenient. Closest to it is probably OpenMP on that axis.

#

(Directly in the C/C++ code, not like passing around some string of code which one then compiles manually)

lean ledge
#

@uncut orbit In general, take everything anyone says here about machine learning or robotics or anything with a pinch of salt, there's not a lot of people with expertise in the area in this server. There's dedicated servers for AI/ML type stuff and they're generally better, alongside dedicated servers for robotics.

uncut orbit
#

ok

#

thank you

lean ledge
#

If you're asking about python, not many better servers than this but any theory work has limited talent here

uncut orbit
#

ok

iron basalt
#

This server is also not really the right place for theory, it's more for practical use with python. There are the off topic channels, but that's it.

lean ledge
#

I only exist to call out other people's BS on this server

shy kraken
#

nvm figured it out i think

#

its like a tensor specific function i guess

obtuse sable
#

How long should I spend on understanding the theory of neural networks before I can start implementing one using pytorch? I just want to compare to a logistic regression binary classifier. I have the data ready. Is 8.5k data points enough?

uncut bloom
#

there should be a baseline for your problem already... so none

#

just implement and use your baseline as a check that it worked

#

if not try, try , try, again

#

regarding data... it depends on how complex a network you're building and how much information your network needs to encode

#

try training it in batches to see the improvement rate to get a guess at the value of more data

#

e.g. training on 20%, 40% 60%...

#

plot out the curve of improvement on your valid set as a metric to get some kind of feel for the value of more data

#

if you see a jump in value in the last batch it's probably worth getting more data
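
The "train on 20%, 40%, 60%..." idea can be sketched as a small learning curve. Everything here is an illustrative assumption: the data is synthetic, and a nearest-centroid classifier stands in for the neural network so the example runs with the standard library alone.

```python
import random


def make_data(n, rng):
    """Two noisy 2-D blobs; class 1 is shifted by +1.5 on both axes."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        point = (rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0))
        data.append((point, label))
    return data


def fit_centroids(samples):
    """Stand-in 'model': the mean point of each class seen in training."""
    sums = {}
    for (x, y), label in samples:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda lab: (point[0] - centroids[lab][0]) ** 2
               + (point[1] - centroids[lab][1]) ** 2)


def accuracy(centroids, samples):
    hits = sum(predict(centroids, point) == label for point, label in samples)
    return hits / len(samples)


rng = random.Random(0)
train = make_data(500, rng)
valid = make_data(200, rng)  # held out, never used for fitting

curve = []
for fraction in (0.2, 0.4, 0.6, 0.8, 1.0):
    subset = train[: round(len(train) * fraction)]
    model = fit_centroids(subset)
    curve.append((fraction, accuracy(model, valid)))

for fraction, acc in curve:
    print(f"{fraction:.0%} of training data -> validation accuracy {acc:.3f}")
```

If the last step of the curve is still climbing, more data is likely to help; if it has flattened out, the model or the features are the more likely bottleneck.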

#

but you should also make a business decision on the metrics you're judging by and the difficulty in acquiring more

#

if you're smart about your weight initialization and optimization less data will be necessary

#

you can also think of clever ways to use your existing data to weakly label more

modest maple
#

Hey guys can anyone help me with an error message that I have been getting while using the scipy library. I am trying to get a numpy array of pearson r correlation similarity from a bulk data and this data is imported in the form of a pandas dataFrame.

obtuse sable
misty flint
#

youre probably not accessing the dataframe correctly

#

show us your code + error

plucky grotto
#

Hi, so I want to clone my base conda install but swap out the python version. I've tried conda create --name testenv2 python=2.7 --clone root but says too many arguments. Is this not possible?

#

I'd be fine with just having two installs of python too, so long as I can reference specifically which one I want

uncut barn
#

You need to design a Neural Network that solves the problem of facial attribute
recognition. More specifically the network should receive in the input an image of a face,
and should recognise whether the depicted subject wears glasses or not, has long or
short hair, smiles or not and should recognise its apparent age. Design the first and the
last layers of such a network, detailing your choices. Define the total cost function and
give the format of a training example and the corresponding ground truth associated with
it.
[Hint: You can treat the recognition of the age either as a regression problem, or as a
classification problem – either choice is equally valid.]

#

Can anyone help me with this question?

obtuse sable
#

anyone know of a good neural network binary classification problem with solution in Pytorch online to work through so I can familiarize myself with NNs and Pytorch? preferably with at least 10 features and > 5k datapoints

hard yew
#

Hi, I was recently calculating the number of parameters of conv2d, but how do I calculate the parameters of separableconv2d?
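
A sketch of the parameter arithmetic, assuming the usual depthwise-separable layout (as in Keras' `SeparableConv2D`): a depthwise kernel per input channel, then a 1x1 pointwise convolution, with a single bias vector added after the pointwise step. The function names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, use_bias=True):
    """Weights of a standard convolution: one k x k x C_in filter per output channel."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


def separable_conv2d_params(kernel_h, kernel_w, in_channels, out_channels,
                            depth_multiplier=1, use_bias=True):
    """Depthwise step + 1x1 pointwise step (+ one bias per output channel)."""
    depthwise = kernel_h * kernel_w * in_channels * depth_multiplier
    pointwise = in_channels * depth_multiplier * out_channels
    bias = out_channels if use_bias else 0
    return depthwise + pointwise + bias


# 3x3 kernel, 32 input channels, 64 output channels
standard = conv2d_params(3, 3, 32, 64)             # 9*32*64 + 64
separable = separable_conv2d_params(3, 3, 32, 64)  # 9*32 + 32*64 + 64
print(standard, separable)
```

For this layer the separable version has 2,400 parameters against 18,496 for the standard convolution.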

undone vine
#

guys how do u paste code in discord

tall trail
#

add three backticks

undone vine
#

ah k thx

#

does anyone here know how to solve that cause im trying to make it on a hex value

#
```
    shape = [(1, 1), (220, 190)]

    # creating new Image object
    img = Image.new("RGB", (w, h))

    # create rectangle image
    img1 = ImageDraw.Draw(img)
    img1.rectangle(shape, fill=f"{item_color.get(items[0])}")

    font = ImageFont.truetype('theboldfont.ttf', 30)
    text_position = 25, 80

    img1.text(text_position, items[0], 'white', font=font)

    img.save('fortnite.jpg', 'JPEG')

    await ctx.send(file=discord.File('fortnite.jpg'))
```
pearl arch
#

im looking for algorithms and methods for detecting anomalies in a vibration signal. there is a machine on which i set a sensor that measures temperature and vibration, and im looking for a machine learning algorithm to detect anomalies in it. does anyone have advice?

merry lintel
#

hey im interested in getting into ai machine learning etc... but concerned if i should learn things like calculus and linear algebra first or if it's fine to learn it along the way as well

blazing bridge
#

@merry lintel I'm still learning machine learning and AI and what I did was learn everything I needed in terms of math along the way with whatever I needed. It may be better to learn the math before because you won't have to worry about it as much for the resources that are more theoretical. I hope that helps

merry lintel
#

@blazing bridge

#

oh thanks

#

but probably will just learn them along the way

#

lazy

#

xd

blazing bridge
#

yeah I was the same lol

#

it doesn't really matter as long as you learn it

merry lintel
#

i mean algebra and calculus are interesting but still lazy. it is a wide range of concepts isn't it? @blazing bridge

hollow sentinel
#

there's a good book for math in ML

#
#

it's also pinned

merry lintel
#

oh nice

#

thanks a lot

#

didnt notice

#

i guess

lapis sequoia
uncut barn
#

When applying K-Means clustering on unlabelled data, if we use a linear classifier and artificial labels, What type of regularisation would we use?

#

Can anyone help me with this?

serene scaffold
lapis sequoia
obtuse sable
#

Hi guys. What's a good metric/score to look at if I want to prioritise minimizing false positives in my binary classifier? And also if I have a lot more "Y=1"s to "Y=0"s? Approximately in the ratio of 4:1

lapis sequoia
serene scaffold
uncut barn
#

its an exercise

#

on the sheet

serene scaffold
#

actually you might want to be looking at the precision score?

#

false positives bring your precision score down, but false negatives don't.

obtuse sable
#

FPs would be catastrophic for what I'm doing so I want to minimize that without losing too many TP. Is AUCROC or specificity ok?

lapis sequoia
lapis sequoia
obtuse sable
#

That should still work for unbalanced data like mine? Like accuracy is kind of bad here because a model that only predicts 1 would still get 75+ percent
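
To see why accuracy hides false positives on a 4:1 dataset, here is a small worked example with made-up labels (80 positives, 20 negatives). The two "models" are hypothetical prediction lists, not trained classifiers.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def report(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Of everything predicted positive, how much really was positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of all real negatives, how many were correctly left alone?
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


y_true = [1] * 80 + [0] * 20      # 4:1 class ratio
always_one = [1] * 100            # predicts 1 for everything
cautious = [1] * 60 + [0] * 40    # misses 20 positives, no false positives

always_one_scores = report(y_true, always_one)
cautious_scores = report(y_true, cautious)
print(always_one_scores)
print(cautious_scores)
```

Both prediction lists reach 0.8 accuracy, but the always-1 list has specificity 0.0 (every negative is a false positive) while the cautious one has precision and specificity 1.0. That gap is what precision and specificity expose and accuracy does not.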

lapis sequoia
obtuse sable
#

Ok. Thanks for the help!

bitter harbor
#

is there a reason to use seaborn over mpl?
my prof's exclusively teaching it and i can't see why other than maybe having to do a bit less work?

lapis sequoia
# bitter harbor is there a reason to use seaborn over mpl? my prof's exclusively teaching it and...

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

So seaborn contains a lot of prebuilt and defined plots and visualization that can be directly used.
Whereas matplotlib is a plotting library with limited predefined visualisation methods but greater customisation using its APIs.

In simple words:
If in future you want to create a custom plot/visualization that doesn't exist in seaborn or Matplotlib then you can use matplotlib library to create that using its API.

You can even create your own library like seaborn using Matplotlib.

bitter harbor
#

huh ok thanks that's kinda what I was thinking

misty flint
#

@paper lake listening to this podcast from 2 biostatisticians from Johns Hopkins talking about data science and thought of you

paper lake
#

oh well still thanks

#

i will follow that

misty flint
grave frost
#

I was surprised this took a long time. Apparently, deepix is some GH lib that can interpret blurred text. I thought that this was implemented wayy before

#

the amount of information in the 'mosaic' blurred images is incredible. a good model with plenty of data can easily break it

misty flint
rough shore
#

What is exactly AI in python?

#

Is there some sort of a special module for it?

misty flint
#

its a field of study

misty flint
rough shore
#

Thank you!

misty flint
#

np the course is built for non-technical people but its a nice overview

rough shore
#

Is AI machine learning?

misty flint
#

ML falls under AI

rough shore
#

oh ok

misty flint
#

just watch the video

#

it will be more clear

hollow sentinel
#

so what is this

#

just the theory of AI?

grave frost
#

must be some basic stuff with easy to relate examples

#

why the specific output? just asking

#

Its still not clear what is your end goal. 'motion tracking' - do you want bounding boxes or segmentation (or maybe both?)

#

That looks pretty advanced. I have never done anything like that. sorry

uncut orbit
#

how'd you do that

#

thats magic

misty flint
#

thats...a ton of data

hollow sentinel
misty flint
#

gonna have to use some big data tools

hollow sentinel
#

big boy tools

misty flint
#

dude. that output could be anything. no idea.

grave frost
#

the tensor is a float32
the whole of it?

#

why don't you flatten the output and use it like that

twin latch
#

dont know how to fix this exception error, can anyone help?

serene scaffold
twin latch
inner aspen
#

I really like working with Neural networks, Especially GANs. I am training Stylegan2-ADA on about 6.7K minecraft images, it's really cool

serene scaffold
#

#career-advice is another place where you can ask about that. you might want to give more context. I don't make hiring decisions, though there are data science jobs in my region (mid atlantic US) that accept applicants with bachelors degrees and relevant coursework

misty flint
#

same

#

but there are also many positions where theyre looking for a higher degree

serene scaffold
#

some listings say "bachelors and five years experience, or masters and two years". so basically if you don't get a masters but you spent the time you would have spent getting it in industry, the effect is the same.

lean ledge
#

Well the problem is you have to get that experience in data science

#

So you have to find at least one job that's willing to take you in with no experience in data science

grave frost
#

hmm...also, does anyone have any idea on how to get your first internship?

lusty iron
# serene scaffold Almost every data scientist listing I've seen has said that a master's degree is...

Funny thing is that most of the types that get those jobs are people whose only actual knowledge of ml comes from coursera classes. The truth is that even something as relevant as a phd in "Mathematical Optimization" does not do much for practicing machine learning. The managers for these positions don't know much about ML and believe the hype, thinking it is something from science fiction. There is not as much work/jobs in ML as people think; most "Data Scientist" positions are really just "Data Analyst" positions. After talking to some "Data Scientists", it is shocking how little they know. I have wondered what will happen to these people once the "Data Science" bubble bursts in a few years......

serene scaffold
misty flint
#

same

#

i think there will always be roles for technical people explaining things to non-technical people. if its less ML than promised, i would still be okay with such a role

lusty iron
#

Well there are many analyst roles that require python, if you are also very technical you can move over to data engineering.

misty flint
#

maybe instead of neural nets, youre doing linear regression. im still okay with that

lusty iron
#

to be fair, I am more worried about the python language once the data bubble bursts. Python web is dying, and there is also a lot of competition in sys-admin languages

misty flint
lean ledge
#

Not so sure about that, I know multiple data scientists with PhDs who've done pretty complex work in industry, even made their own techniques for multimodal data and stuff

lusty iron
#

Python's current data ecosystem looks a lot like Java's big data/hadoop ecosystem 8 years ago....a lot of those projects died, only a handful outlived the bubble (Spark, Presto)

lean ledge
#

There might be a lot of data science positions because of hype and a lot of people being hired when they're crap but that's a separate issue entirely that stems from stupid marketing of data science MOOCs

#

The whole "sexiest job" thing has probably hurt the field significantly

misty flint
#

yeah

#

def

#

too many rookies

#

now

#

im one of them

lean ledge
#

So much scientific and engineering work is python

lusty iron
#

I am hoping people in the python community see this, and try to make sure python can survive the data bubble bursting

#

so python science is prob not going anywhere

#

not much work outside of academia for it thu

grave frost
#

tbh, I think the focus should rather be more on research and development than so called 'application of ML'. right now, its more oriented towards "getting 0.5% more in that benchmark"

lean ledge
lusty iron
#

if you look at the pydata ecosystem, there are a lot of packages/projects for different science disciplines.....I don't know if they will translate to out of academia work......

lean ledge
#

not sure what you mean by pydata ecosystem but the scientific python ecosystem is large and well used within industry

misty flint
#

everyone thinks R is going to die but its still used heavily in many places

lean ledge
#

just not by software developers because they dont actually know any science

lusty iron
lean ledge
#

I know many of these actually being used in industry lol

grave frost
#

the top of the list lol numpy, pandas and matplotlib are all used

lean ledge
#

Heck I've used some of them

grave frost
#

I struggle to understand your arguments for such reasoning

lusty iron
#

things like MDAnalysis and ITK are only for work in academia

grave frost
#

who said that?

lusty iron
#

(I am not talking about Pandas/Matplotlib)

lean ledge
#

MDAnalysis is a Python library for the analysis of computer simulations of many-body systems at the molecular scale, spanning use cases from interactions of drugs with proteins to novel materials

#

I assure you this isn't only used in academia

lusty iron
#

I might be wrong.....fair enough....

grave frost
#

spanning use cases from interactions of drugs with proteins to novel materials
Probably most biotech firms?

misty flint
#

sounds like biotech/pharma

lean ledge
#

Biotech and pharmaceutical yes, but also general material science

misty flint
#

they always seem to have interesting tools

lean ledge
#

Molecular dynamics has tons of real world usecases lol

misty flint
#

lots of interesting R packages

grave frost
misty flint
#

barely

#

whats that phrase

#

"R is a glorified calculator"

lean ledge
#

And ITK is almost certainly used in industry too

#

First time I'm coming across it

misty flint
#

its good for stats tho

lean ledge
#

But it's darn useful looking

misty flint
#

R has many useful functions

grave frost
#

I guess there is a place and time for each language

lean ledge
#

Any programmer claiming X thing isn't used in industry is probably saying it because they have no domain knowledge about any industry outside of software

grave frost
#

I personally think MatLab is pretty good for stats. It seems pretty simple IMO

#

they have a GUI for regression 🤷

misty flint
#

hmm idk Matlab's stats capabilities but R's stats functions are pretty comprehensive

grave frost
#

Tho its DL toolkit sucks AF. its so limited

#

it has a self-driving toolkit too. nobody uses it lol

lusty iron
grave frost
#

I dont get your obsession with one language

misty flint
lean ledge
#

They absolutely will because they're not a "few jobs in that small field", it's "many many many jobs across many fields, just fields programmers don't know enough to work in"

#

It's like how CS majors claim MATLAB is dead and no one uses it

#

Because they don't realise it's used a fuck tonne, they just don't know enough other things to find and get those jobs

grave frost
#

even if python dies, we will migrate to another language if need be (there are several alternatives in development) what matters are the core programming fundamentals, not syntax

misty flint
#

yeah matlab is very industry specific

lusty iron
#

I guess I am the only one here that likes python a lot and wants it to thrive

lean ledge
#

Python is going to be used for many decades across robotics, control, signal processing, physics dynamics modelling, data analysis, etc.

grave frost
#

yeah, its hard to unravel a lot of effort and work put in it

misty flint
#

one language to rule them all

#

jk

lean ledge
#

Python can't be the one language to rule them all until it becomes a fast lower level language with no garbage collection

#

Rust on the other hand...

misty flint
#

omg garbage collection

#

i hate that i have to do it so often

#

on some stuff

#

and then i forget to do it when i need to

grave frost
#

I am vaguely familiar with garbage collection. is it the clearing of memory for stuff for which the variable does not exist

lean ledge
#

Yes

#

Clearing of memory that isn't being referenced so can't be used anymore

grave frost
#

if someone deletes a variable, shouldn't it delete the stored contents too

lusty iron
#

I vaguely played with gc when I tinkered in c/c++. Java has no gc, right? But it is a lot faster than python

lean ledge
#
x = [1, 2, 3] # allocates memory
x = 3
# someone needs to clear the memory in which [1, 2, 3] is stored
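
A small sketch of the same idea that can actually be observed, assuming CPython. `Payload` is a made-up class used only because plain lists cannot be weakly referenced; a weak reference lets us check whether the object still exists without keeping it alive.

```python
import gc
import weakref


class Payload:
    """Tiny stand-in object that supports weak references."""

    def __init__(self, values):
        self.values = values


x = Payload([1, 2, 3])         # allocates memory
watcher = weakref.ref(x)       # observes the object without owning it
alive_before = watcher() is not None

x = 3                          # the name now points elsewhere
gc.collect()                   # not needed in CPython here, but makes the point explicit
alive_after = watcher() is not None

# Deleting a name only removes that name, not the object it refers to.
a = Payload([4, 5, 6])
b = a                          # second reference to the same object
second_watcher = weakref.ref(a)
del a
gc.collect()
still_alive = second_watcher() is not None   # b still refers to it

print(alive_before, alive_after, still_alive)
```

So the memory is reclaimed once nothing refers to the object any more, not at the moment a particular variable is deleted or rebound.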
exotic maple
lean ledge
#

Java has a gc

#

C/C++ have no GC

lusty iron
#

yeah, I think I meant it the other way around

grave frost
lean ledge
#

ML isn't "AI", fancy regression is always what it's been about

#

ML never started with "AI" in the goal

exotic maple
lusty iron
#

You will be shocked how many people shy away from python due to its surprising lack of performance

grave frost
#

well, some people did aim for making AGI (Like Turing for instance) its just that not much is there for cutting edge stuff

exotic maple
#

I know people who spend DAYs killing themselves over a meager 0.5% accuracy KPI, where its not needed

#

I respect the theoretical building and the fantastic work many researchers do in ML and DL, but personally, it's not my thing, fk research lol

grave frost
#

making 0.5% on a benchmark =/= AI progress (note I use AI, not ML)

exotic maple
#

I want to be able to apply those things to something useful to me, that's all

lean ledge
exotic maple
#

obviously, to apply them properly, i need to understand them, what they do, etc

exotic maple
grave frost
#

well, it depends. for me, I appreciate the theoretical work more than the applied one because the theoretical ones are focusing on making AGI. Practically deploying models doesn't sound very appealing

lusty iron
exotic maple
lean ledge
#

Python gets left for something faster all the time but it has to be in:

  • business logic not numerics, or
  • extremely high throughput required, or
  • needing low latency and real-time
exotic maple
#

dont get me wrong though @grave frost I can still get all nerdy and ask about specific shit. There's a reason i studied engineering (even though i only ever worked business)

#

but not in data science, any CS graduate could shut me up with their indepth knowledge xd

grave frost
#

why is it that some packages written in pure Python still do stuff in like 0.001s (or at least they claim to)

lean ledge
#

What type of engineering?

exotic maple
#

Imaginary, I mean

#

Industrial Engineering

lean ledge
#

Industrial engineering ~= business engineering anyway

grave frost
#

I doubt a low-level language can improve performance by more than 0.001s

lean ledge
#

james acaster is great

exotic maple
#

I mean, i dont regret my choice truth be told

#

Business degree would be too shallow for me

#

and the other engineering fields are too close-minded for me

lean ledge
#

in which case the lower level language can and will eat Python's cake

exotic maple
#

I guess the only other thing i could have studied was CS, but that wasnt a choice for me at the time

lean ledge
#

how are other engineering degrees too close-minded ThinkingJeff

exotic maple
#

@lean ledge perhaps I didn't explain myself properly. What I meant was: their domain is exact -> a single topic. They are extremely in-depth and useful, but also narrow

#

Well, that was my view at the time, and for the most part i think it has held up.

#

Industrial engineering is shallow af. You dont learn much about any topic, but you get a good notion of a lot of things which helps you be versatile

lusty iron
#

what I don't get is that java feels like python but with forced classes and forced types......why can't they make a python compiler that takes typed python and gets the performance of java

misty flint
#

is industrial eng like operations?

exotic maple
lusty iron
exotic maple
#

@misty flint Its origin is basically in operations: factory management, floor management, etc

#

you need to know about production processes, statistical quality control, but also business and admin

lean ledge
#

Knowing X is this particular type and the output is supposed to be that type, etc, you can avoid extra operations that check the types of the inputs or ensure X is happening, there's less data to clean up, etc

#

For example, if you have 2 ints in Python and you write (a+b), it checks the type of a. a is an object, so fundamentally on the low level it's a PyObject struct, and you have to access its value and see if it supports the + operation, and then you have to see if b can be added to a, etc
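
The runtime lookup described above can be watched from Python itself. This is only a rough sketch: `Meters` and `add` are made-up names for illustration, and it shows the language-level protocol (`__add__`, then `__radd__`), not the exact C-level steps CPython takes.

```python
class Meters:
    """Toy numeric type (hypothetical), used only to trace what `+` does."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Tried first for `self + other`.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # tell the interpreter to try the other operand

    def __radd__(self, other):
        # Fallback used when the left operand's __add__ gives up.
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented


def add(a, b):
    # Nothing is known about a or b ahead of time, so each call
    # has to work out at runtime which addition to perform.
    return a + b


same_type = add(Meters(2), Meters(3))  # handled by Meters.__add__
mixed = add(1, Meters(3))              # int's add gives up, Meters.__radd__ runs
plain = add(2, 3)                      # built-in int addition, still resolved at runtime
```

The same `add` function ends up doing three different things depending on what it is handed, which is the flexibility a statically typed compiler does not have to pay for.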

lusty iron
#

is there syntax difference between reference counting and what java does? I feel like they look the same for the user

grave frost
#

dym that if python was explicit, it would be faster?

lean ledge
#

In java with strict typing, your compiler knows a and b are ints before hand, there's nothing new to do. You just insert an add operation

#

and that's it

lean ledge
#

that's why things like cython etc work

#

they force you to do type annotations etc

iron basalt
lean ledge
#

and that lets them optimise

exotic maple
lean ledge
#

They're the same thing

#

Python does the same thing

exotic maple
#

@lean ledge are you a DS?

lean ledge
#

Most GC is just fancy reference counting: each object maintains a count. Every X milliseconds a program goes around checking all the memory it has allocated and clears whatever has no references
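
A small sketch of how this plays out in CPython specifically, where reference counting is paired with a separate cycle collector (JVM collectors generally trace reachable objects instead of keeping per-object counts). `Node` is a hypothetical class, and the immediate-free behaviour shown here is a CPython detail, not a language guarantee.

```python
import gc
import weakref


class Node:
    """Hypothetical object used to watch when memory is reclaimed."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the cycle collector from running on its own during the demo

# Case 1: no cycle. Dropping the last reference frees the object right away.
a = Node()
a_alive = weakref.ref(a)
del a
freed_by_refcount = a_alive() is None

# Case 2: a reference cycle. The counts never reach zero on their own.
b, c = Node(), Node()
b.other, c.other = c, b
b_alive = weakref.ref(b)
del b, c
survived_refcount = b_alive() is not None

gc.collect()  # the cycle collector finds the unreachable pair
freed_by_collector = b_alive() is None

gc.enable()
```

The weak references are only there to observe the objects without keeping them alive.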

grave frost
#

so, if we do everything explicit in python, does it boost it a little bit?

lean ledge
#

In CPython, the normal Python you run, it doesn't. The explicit type hints are just hints
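
A quick way to see that the hints are not enforced, as a minimal sketch (`double` is a made-up function):

```python
def double(x: int) -> int:
    # The annotation says int, but CPython never checks it at call time.
    return x * 2


from_int = double(21)
from_str = double("ab")         # runs without error despite the hint
hints = double.__annotations__  # the hints are stored as plain metadata
```

Static checkers such as mypy read these annotations, but the interpreter itself neither validates them nor uses them to speed anything up.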

exotic maple
lusty iron
lean ledge
#

You need to switch to a different python implementation that takes advantage of it

lean ledge
lusty iron
lean ledge
exotic maple
misty flint
lusty iron
#

if I take non-numeric python code, add types....it will not be very fast using cython

lean ledge
#

It's just a normal website I made with bootstrap because I was bored. .gy is the Guyana domain

#

I just got lucky I got rag.gy as a domain

misty flint
#

like a good generalist skillset

exotic maple
misty flint
#

i feel like industrial engineers could be good product managers maybe

lean ledge
#

the only fun part of industrial eng is operations research

#

that's good shit

misty flint
exotic maple
misty flint
iron basalt
#

Cython is fast if you add types to the variables and disable a bunch of things python does by default, like bounds checking, etc.

exotic maple
#

Operations research is amazing

#

Its the one part of math and college i loved

lusty iron
#

did not know that.....

exotic maple
#

Sadly i never got to use it so i forgot everything

lean ledge
#

Oh yep, safety checks like bounds checks also add to slowness

misty flint
#

optimization?

exotic maple
iron basalt
#

Cython can tell you what code is probably slow: cython -a.

misty flint
exotic maple
#

Optimization, queueing theory, etc

lean ledge
#

s i m p l e x

misty flint
#

i only know bc of that podcast

exotic maple
#

S I M P L E X

#

no wait

#

"GUYS LETS START EXCEL SOLVER"

lean ledge
#

I do lots of optimisation as a robotics/control person so operations research is mildly cool. Not as cool as more continuous type optimisation though

exotic maple
lean ledge
#

Convex optimisation is cooler as a subject

lean ledge
iron basalt
#

One big reason to use cython is that it automatically works with numpy and you don't really need to set up a C/C++ project (all those different build tools are a nightmare and one big reason people dislike C/C++).

lean ledge
#

Cython is what happens when you realise as a scientist your simulation is slightly slow but it's going to be a bitch to rewrite in C++

#

So you add type hints and a couple of other optimisations and get that last bit of juice

grave frost
#

so its basically like a C wrapper for python to make it faster?

exotic maple
#

isnt Cython just normal python?

lean ledge
#

It's Python compiled into C or C++, with the right type hints and optimisations making it significantly faster

lean ledge
iron basalt
#

It's python (with some extra stuff) to C, but also with a bunch of stuff added to make it work with python. It compiles to a shared object / dll which python can load.

grave frost
lean ledge
#

Yep

exotic maple
lean ledge
#

Cython's used tons by my scientist friends

#

Although they also liked numba last they tried

iron basalt
#

I kind of dislike numba, it yells at me too much, I often give up and go to cython.

#

Also Jitting each time you run the program can give annoying startup times.

lean ledge
#

numba has some really cool scientific computing stuff

iron basalt
#

For threading it can be very useful.

#

(And GPU ofc)

lean ledge
#

I dream of a world where scientists don't need to learn the ins and outs of C++ because parallelisation to CUDA or clusters becomes much easier and optimisations are automatic

iron basalt
#

It feels a lot like OpenMP, but python.

iron basalt
#

Right now it's hell trying to get the kernels tweaked just right and dealing with all the API gunk.

#

I also wish FPGAs were easier to get into and use.

#

For making things that don't really work well on CPUs nor GPUs.

#

like python -> FPGA numba or something

exotic maple
#

anyone had the issue where Jupyter Lab/Notebook can't import NLTK?

#

I have specific env and everything, installed in there and everything

#

but its not working

#

I have verified all pointers in the env are ok as well

lean ledge
grave frost
#

thats why I always prefer pre-defined environments

exotic maple
#

@grave frost any clue on how to fix it?

iron basalt
grave frost
exotic maple
#

no module "nltk"

#

but its already installed in the enviroment and everything

grave frost
#

did you activate it?

exotic maple
#

obv lol

grave frost
#

well, then you did not install it. try force-reinstalling it

exotic maple
#

I can use it via cmd

grave frost
#

maybe some error you missed

exotic maple
#

I can use it via cmd, but not in jupyter

grave frost
#

hmm....whats the output with pip and pip3

exotic maple
#

already satisfied

#

you can see im in the env as well

#

its also listed

#

I tried everything I can think of

grave frost
#

pip3 install nltk

exotic maple
#

already tried. same thing

#

req satisfied

grave frost
#

when something doesn't work, there is only one solution left

exotic maple
#

ima try restarting my pc. bullshit sometimes works after restart

grave frost
#

Reboot

exotic maple
#

LOL

grave frost
#

yea, haha

exotic maple
#

fking crap isn't working

#

zzz

#

what the f

#

now it worked when I installed it outside the env...

#

it might be a PATH issue
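
One way to narrow this kind of thing down is to check which interpreter is actually running. A minimal sketch, assuming the likely cause is that the Jupyter kernel is a different Python from the one in the activated env (`describe_interpreter` is a made-up helper name):

```python
import importlib.util
import sys


def describe_interpreter(module_name="nltk"):
    """Report which Python is running and whether it can see a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


info = describe_interpreter("nltk")
for key, value in info.items():
    print(f"{key}: {value}")
```

Run it once from the env's Python in cmd and once in a notebook cell. If the `executable` paths differ, the notebook is not using the env where the package was installed. Installing `ipykernel` in the env and registering it with `python -m ipykernel install --user --name <env>`, or running `%pip install nltk` inside the notebook, are the usual ways to line them up.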

loud finch
#

Did you run any pip stuff in the environment

#

The conda docs clearly warn that mixing pip into a conda env can break it

#

Then you can throw that env in the trash

grave frost
#

That has to be the weirdest argument I have ever seen max_features=0.7000000000000001
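
A value like that usually comes from floating-point arithmetic when a grid of fractions is generated rather than typed by hand. A small sketch under that assumption (the grid below is hypothetical, not the one that produced the value above):

```python
import math

# Hypothetical grid of fractions, built by multiplying a step of 0.1.
step = 0.1
grid = [step * i for i in range(1, 10)]

seventh = grid[6]     # nominally 0.7
print(repr(seventh))  # may show trailing digits instead of a clean 0.7

# Two common ways to keep such values tidy before passing them on.
rounded = [round(v, 2) for v in grid]
close_enough = math.isclose(seventh, 0.7)
```

The extra digits are harmless for the model, they just look odd when the parameter is printed back.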

loud finch
#

Thats like.. Ok bro

serene scaffold