#data-science-and-ml | Python | Page 338

trim latch Sep 1, 2021, 4:05 PM

#

Please anyone

stuck karma Sep 1, 2021, 4:46 PM

#

hey! do you know how many samples are considered "enough" (or not enough) to do cross-validation?

#

because i think it is used when we don't have enough samples

#

i have 2100 features and 300 samples, is that "not enough" to split into a train set and a test set?

#

please ping me if you answer
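Nobody answered this one in the thread. As a rough note: there is no hard sample-count threshold for cross-validation. k-fold CV is most useful precisely when data is scarce, because every sample is used for both training and evaluation, and with 2100 features and only 300 samples a single train/test split gives a noisy estimate. Any feature selection should happen inside each fold to avoid leakage. Below is a minimal NumPy sketch of shuffled k-fold index generation; the function name `kfold_indices` is hypothetical, and `sklearn.model_selection.KFold` is the library equivalent.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # shuffle once, reproducibly
    folds = np.array_split(order, n_folds)  # near-equal chunks of indices
    for i, test_idx in enumerate(folds):
        # every other fold becomes the training set for this round
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


# 300 samples, 5 folds: each sample lands in exactly one test fold
splits = list(kfold_indices(300, n_folds=5))
```

With 5 folds each model trains on 240 samples and is scored on 60, and the reported metric is usually the mean (and spread) over the 5 rounds.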

livid kiln Sep 1, 2021, 4:55 PM

#

I had shown the wrong values in the explanation, I've edited them now

desert oar Sep 1, 2021, 4:56 PM

#

thanks for clarifying

#

i can't think of an obvious tidy way to do this. maybe ask on SO and link the question here. i can ponder and post a longer answer on SO if nobody else does first

livid kiln Sep 1, 2021, 4:57 PM

#

desert oar thanks for clarifying

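import pandas as pd

# assumes df has a default RangeIndex (labels == positions)
# True on the last row of each block: column 2 is NaN here, but the next row starts a new block
idx = df[2].isna() & df[2].notna().shift(-1, fill_value=False)
idx.iloc[-1] = True  # the final row always closes the last block

idxs = []
start = 0
for i, end in enumerate(idx.index[idx] + 1):  # one past each block's last row
    print(i, start, end)
    idxs.append((i, start, end))
    start = end
idxs

result = pd.concat({i: df[start:end].reset_index() for i, start, end in idxs}, axis=1)
result.columns.names = ['combo', 'stock']
result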

desert oar Sep 1, 2021, 4:57 PM

#

that's one way to do it

#

you're basically resorting to parsing a stream of 1s and 0s and grouping them

#

the absolute last resort technique 😛
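For comparison, a common pandas idiom for this kind of "stream of 1s and 0s" grouping is a cumulative sum over the block-start marker, fed to `groupby`. A sketch, assuming (as in the sample data posted later in this thread) that a non-null value in column `2` marks the first row of each block and that the very first row starts a block:

```python
import numpy as np
import pandas as pd

# Same shape as the sample data posted later in the thread
df = pd.DataFrame({
    0: [0.7, 0.1, 0.1, 0.1, 0.7, 0.15, 0.15],
    1: ["AAPL", "MSFT", "NVDA", "GOOG", "INTC", "AMD", "SKYT"],
    2: [0.5, np.nan, np.nan, np.nan, 0.5, np.nan, np.nan],
})

# A non-null value in column 2 marks the first row of a new block;
# cumsum turns those markers into block ids 1, 1, 1, 1, 2, 2, 2.
block_id = df[2].notna().cumsum()

# One DataFrame per block, each renumbered from 0
blocks = [block.reset_index(drop=True) for _, block in df.groupby(block_id)]

# Same side-by-side layout as the pd.concat(..., axis=1) approach
result = pd.concat(dict(enumerate(blocks)), axis=1)
result.columns.names = ["combo", "stock"]
```

Unlike the loop above, this uses `reset_index(drop=True)`, so the old row labels are not kept as an extra `index` column. If the first row had a NaN in column 2, it would land in a separate leading group with id 0.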

livid kiln Sep 1, 2021, 4:58 PM

#

desert oar i can't think of an obvious tidy way to do this. maybe ask on SO and link the qu...

Yeah I can't seem to think of an easy way either!

desert oar Sep 1, 2021, 4:58 PM

#

ask on SO anyway and show your current attempt (and make sure to include copy/paste-able sample data), maybe you'll get some other interesting variations

#

people like doing performance benchmarks on SO too

livid kiln Sep 1, 2021, 5:19 PM

#

desert oar ask on SO anyway and show your current attempt (and make sure to include copy/pa...

Found it!!!
https://stackoverflow.com/questions/21402384/how-to-split-a-pandas-time-series-by-nan-values/66015224#66015224

sparse_ts = (~df[2].notnull()).astype(pd.SparseDtype('bool'))
# we need to use .values.sp_index.to_block_index() in this version of pandas
block_locs = zip(
    sparse_ts.values.sp_index.to_block_index().blocs - 1,
    sparse_ts.values.sp_index.to_block_index().blengths + 2,
)
# Map the sparse blocks back to the dense timeseries
blocks = [
    df.iloc[start : (start + length - 1)]
    for (start, length) in block_locs
]
for block in blocks:
    print(block)

Stack Overflow

How to split a pandas time-series by NAN values

I have a pandas TimeSeries which looks like this:

2007-02-06 15:00:00 0.780
2007-02-06 16:00:00 0.125
2007-02-06 17:00:00 0.875
2007-02-06 18:00:00 NaN
2007-02-06 19:00:00 0.565...

desert oar Sep 1, 2021, 5:20 PM

#

welp

#

that's an interesting one

#

very clever

#

i'm gonna file that one away in my brain if i ever need to work on boolean stuff like this

livid kiln Sep 1, 2021, 5:41 PM

#

How do you get the depth of a multi index dataframe ?

serene scaffold Sep 1, 2021, 6:37 PM

#

livid kiln How do you get the depth of a multi index dataframe ?

what do you mean by the depth? the number of levels of a given index?

#

!docs pandas.MultiIndex.nlevels

arctic wedgeBOT Sep 1, 2021, 6:38 PM

#

pandas.MultiIndex.nlevels


property MultiIndex.nlevels
Integer number of levels in this MultiIndex.

Examples

```py
>>> mi = pd.MultiIndex.from_arrays([['a'], ['b'], ['c']])
>>> mi
MultiIndex([('a', 'b', 'c')],
           )
>>> mi.nlevels
3
```

tranquil drift Sep 1, 2021, 6:49 PM

#

Hello , any1 maybe have couple minutes to help me with pandas and merge df ?

livid kiln Sep 1, 2021, 6:51 PM

#

serene scaffold what do you mean by the depth? the number of levels of a given index?

I think I might have asked for the wrong thing! After doing a pd.concat, you can index into each 'level', how do you get the length of those?

serene scaffold Sep 1, 2021, 6:52 PM

#

livid kiln I think I might have asked for the wrong thing! After doing a pd.concat, you can...

can you show an example?

livid kiln Sep 1, 2021, 6:57 PM

#

import io
import pandas as pd

df = pd.read_csv(io.StringIO('''0.7, AAPL, 0.5
0.1, MSFT
0.1, NVDA
0.1, GOOG

0.7, INTC, 0.5
0.15, AMD
0.15, SKYT'''), header=None)

df

sparse_ts = (~df[2].notnull()).astype(pd.SparseDtype('bool'))

block_locs = zip(
    sparse_ts.values.sp_index.to_block_index().blocs - 1,
    sparse_ts.values.sp_index.to_block_index().blengths + 2,
)

df2 = pd.concat({i:df[start:(start + length - 1)].reset_index() for i, (start,length) in enumerate(block_locs)}, axis=1)
df2.columns.names = ['combo','stock']

df2

#

Now I can index df2 like this:
df2[0]
df2[1]
How do I get the number of these top-level groups (i.e. the max I can index up to)?

livid kiln Sep 1, 2021, 7:07 PM

#

serene scaffold can you show an example?

I was doing df2.index.nlevels, it is:
df2.columns.nlevels
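Worth noting: `nlevels` counts the levels of the column MultiIndex (here `combo` and `stock`, so 2), which only coincidentally matches the number of blocks in the example above. If the goal is how many top-level groups `df2[0]`, `df2[1]`, ... exist, counting the distinct keys of the first level is the more direct route. A small sketch with made-up data and three blocks, so the two numbers differ:

```python
import pandas as pd

# Made-up block; three copies side by side, keyed 0, 1, 2
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.concat({0: a, 1: a, 2: a}, axis=1)
df2.columns.names = ["combo", "stock"]

n_levels = df2.columns.nlevels                              # levels: combo, stock
n_combos = df2.columns.get_level_values("combo").nunique()  # distinct top-level keys
```

Here `n_levels` is 2 while `n_combos` is 3; `df2.index.nlevels` is 1 because only the columns are a MultiIndex.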

serene scaffold Sep 1, 2021, 7:20 PM

#

livid kiln I was doing df2.index.nlevels, it is: df2.columns.nlevels

I was in a meeting. you figured it out, then?

livid kiln Sep 1, 2021, 7:23 PM

#

serene scaffold I was in a meeting. you figured it out, then?

I got confused as I thought it would be df2.index.nlevels
Hence the name multi index

flat hollow Sep 1, 2021, 7:27 PM

#

when multiple indices are called multiindex and multiple columns are called multiindex 😦

quiet vault Sep 1, 2021, 7:56 PM

#

Hello, I have multivariate time series data with the shape (3916, 4). I am trying to use a CNN-LSTM model with the TimeDistributed layer. I need to reshape my data into [samples, subsequences, timesteps, features]. I want the number of time steps to be 5

#

Does anyone know how I can reshape the data into this

serene scaffold Sep 1, 2021, 8:00 PM

#

quiet vault Hello, I have multivariate time series data with the shape of (3916, 4). I am tr...

what would the dimensions be, in that case?

quiet vault Sep 1, 2021, 8:01 PM

#

3916, subsequences, 5, 4

#

i tried using the reshape function but it wont work

serene scaffold Sep 1, 2021, 8:02 PM

#

when you reshape the data, the products of lengths of each dimension have to be the same
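To make the product rule concrete: (3916, 4) holds 15664 values, and 15664 is not divisible by n_seq * n_steps * n_features = 2 * 5 * 4 = 40, so a direct `reshape` to [samples, subsequences, timesteps, features] cannot work. One common option is to cut overlapping windows first. A sketch assuming n_seq=2 and n_steps=5 (both hypothetical choices) and random stand-in data; `sliding_window_view` needs NumPy 1.20 or later:

```python
import numpy as np

n_seq, n_steps = 2, 5                                   # hypothetical: 2 subsequences of 5 steps
data = np.random.default_rng(0).normal(size=(3916, 4))  # stand-in for the real (3916, 4) array
n_features = data.shape[1]
window = n_seq * n_steps                                # consecutive rows per sample

# Overlapping windows along time: shape (3916 - window + 1, n_features, window)
windows = np.lib.stride_tricks.sliding_window_view(data, window, axis=0)
# Put time before features: (samples, window, n_features)
windows = windows.transpose(0, 2, 1)
# Split each window into subsequences: (samples, n_seq, n_steps, n_features)
X = windows.reshape(-1, n_seq, n_steps, n_features)
```

Each sample then spans 10 consecutive rows, giving X.shape == (3907, 2, 5, 4). The matching targets (for example the row right after each window) have to be built separately.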

#

subsequences and 5 are new, as compared to (3916, 4) -- where is that data now?

#

are these in arrays, tensors, dataframes, or what?

quiet vault Sep 1, 2021, 8:03 PM

#

arrays

serene scaffold Sep 1, 2021, 8:03 PM

#

what array has the subsequences?

quiet vault Sep 1, 2021, 8:03 PM

#

a TimeDistributed layer needs it

#

steps and provide a time series of interpretations of the subsequences to the LSTM model
to process as input. We can parameterize this and define the number of subsequences as
n_seq and the number of time steps per subsequence as n_steps.

serene scaffold Sep 1, 2021, 8:05 PM

#

do you have more than one array of shape (3916, 4)?

quiet vault Sep 1, 2021, 8:05 PM

#

no

serene scaffold Sep 1, 2021, 8:06 PM

#

whatever number subsequences is, if you have that many (3916, 4)-shaped arrays, you can stack them and get one that is (subsequences, 3916, 4)-shaped, and then you can transpose it.

#

I'm not sure about the 5.

quiet vault Sep 1, 2021, 8:06 PM

#

alright

#

thanks

livid kiln Sep 1, 2021, 9:03 PM

#

df.select_dtypes(include="object")
I would like to strip all these columns

white venture Sep 1, 2021, 9:19 PM

#

anyone know how to fix this error: Tensor.op is meaningless when eager execution is enabled

desert oar Sep 1, 2021, 10:31 PM

#

livid kiln df.select_dtypes(include="object") I would like to strip all these columns

df.drop(columns=df.select_dtypes(include='object').columns, inplace=True)

livid kiln Sep 1, 2021, 10:32 PM

#

desert oar ```python df.drop(columns=df.select_dtypes(include='object'), inplace=True) ```

By strip, I mean str.strip()

desert oar Sep 1, 2021, 10:32 PM

#

use .apply?

#

obj_cols = df.select_dtypes(include='object').columns  # column labels, not the sub-frame
df[obj_cols] = df[obj_cols].apply(lambda y: y.str.strip())

livid kiln Sep 1, 2021, 10:32 PM

#

does apply do things in parallel?

desert oar Sep 1, 2021, 10:33 PM

#

not by default but i think there's a patch to do it

#

you could also loop over column names and parallelize it yourself w/ joblib

livid kiln Sep 1, 2021, 10:34 PM

#

Is there a way to do this in place?

desert oar Sep 1, 2021, 10:34 PM

#

i was just about to look into that

#

you might need to manually memmap it or something

#

how big is this data exactly?

#

maybe you should be using dask if something as simple as str.strip is slow enough to want to parallelize

#

or a real database

#

unless you have 10k object columns i don't see much value in parallelizing something like this

livid kiln Sep 1, 2021, 10:37 PM

#

Yeah, just writing the prototype in python, going to rewrite this in rust. But would like to see how far python will go 😃 , might try numba

desert oar Sep 1, 2021, 10:37 PM

#

numba won't do much for you w/ strings

#

what are you actually trying to do?

#

this seems like a total non-useful thing to try to optimize

#

at least use dtype='string' instead of dtype='object'

livid kiln Sep 1, 2021, 10:38 PM

#

I used pd.StringDtype()

desert oar Sep 1, 2021, 10:38 PM

#

same thing

#

may be a bit faster than object; i believe it can be backed by an arrow array with pd.StringDtype("pyarrow") (pandas 1.3+), though the default storage is still python objects

#

i'm not sure if parallel in-place operations on a dataframe are possible without some serious care and fuckery

#

at which point, like i said, you're probably outside the realm of what you should be doing with pandas and should be considering dask or even spark

#

or, again, an rdbms

#

however this does exist https://github.com/nalepae/pandarallel, and i think i used another library like this a couple years ago

#

if you actually explain what you're trying to do maybe i can be more helpful

#

this sounds like an x-y problem

#

e.g. if you have an ETL pipeline that needs to run in 500ms and it currently takes 750ms, you need to roll up your sleeves and start profiling and/or thinking about how you can store/handle this data differently

#

often the biggest gains are algorithmic

#

reducing number of passes over the data, etc.

livid kiln Sep 1, 2021, 10:42 PM

#

will probably use clickhouse as the db

desert oar Sep 1, 2021, 10:42 PM

#

if it really does turn out that you are bottlenecked by string processing over a list, you might want to consider switching to native python lists, and using nuitka, mypyc, or cython to build a c extension

#

or switch to pypy, which can be significantly faster for "plain python" operations than cpython

livid kiln Sep 1, 2021, 10:43 PM

#

desert oar if it really does turn out that you are bottlenecked by string processing over a...

This is where I see julia being useful

desert oar Sep 1, 2021, 10:43 PM

#

but again, you need to have a goal in mind and start benchmarking

#

yep, i don't know specifically how much faster it will be for strings though

#

for all i know, dataframes.jl is backed by arrow anyway and the underlying performance characteristics might be similar

#

a lot of the slowness of python has to do with the overhead of the bytecode interpreter, so if you stay "inside" a C function for a long period of time then you're going to get significant speedups

#

yet another option is R, vectorized operations in R are very well-optimized, despite how ungodly slow things like loops and function calls are

#

i've processed 1bn+ records in data.table without thinking twice about it

#

for that much data i'd actually sooner pick up r w/ data.table than pandas

#

tldr you probably dont need to parallelize your str.strip operation

livid kiln Sep 1, 2021, 10:48 PM

#

The reason why I'm using pandas is 2 fold, 1. documentation is vast (including SO), 2. learnt python as my first language... Ideally I would switch to Julia, but due to a lack of filled SO pages, the coding process is slower for me atm.

desert oar Sep 1, 2021, 10:48 PM

#

i'm not saying not to use pandas! pandas is a great tool

livid kiln Sep 1, 2021, 10:48 PM

#

100% right tool for the right job

desert oar Sep 1, 2021, 10:48 PM

#

julia has pretty good docs though

#

including dataframes.jl

livid kiln Sep 1, 2021, 10:52 PM

#

strip = re.compile(r'(^\s+|\s+$)')
df.select_dtypes(include=pd.StringDtype()).replace(strip, '', regex=True, inplace=True)

#

This is the only way I've found inplace

#

But then I get this warning

#

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

desert oar Sep 1, 2021, 11:11 PM

#

does this still trigger the warning?

string_columns = df.select_dtypes(include=pd.StringDtype()).columns
df[string_columns].replace(
    re.compile(r'(^\s+|\s+$)'), '', regex=True, inplace=True
)
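For what it's worth: both `df.select_dtypes(...)` and `df[string_columns]` return new objects, so `replace(..., inplace=True)` mutates a temporary copy and `df` itself is left unchanged, which is what the SettingWithCopyWarning is pointing at. Assigning the result back avoids that. A sketch with made-up object-dtype columns (the same pattern should apply to `pd.StringDtype()` columns):

```python
import pandas as pd

# Made-up frame with padded text columns and one numeric column
df = pd.DataFrame({
    "ticker": ["  AAPL ", "MSFT  "],
    "sector": [" tech", "tech "],
    "weight": [0.5, 0.5],
})

# Column *labels* of the text columns, not the sub-DataFrame itself
obj_cols = df.select_dtypes(include="object").columns

# Build stripped copies of those columns, then write them back into df,
# so no intermediate copy is being modified "in place".
df[obj_cols] = df[obj_cols].apply(lambda s: s.str.strip())
```

Under the hood pandas builds new arrays either way, so `inplace=True` would not save memory here anyway.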

upper scarab Sep 1, 2021, 11:11 PM

#

678,679,aaa,male,0.96,NG
679,680,bbb,male,0.99,IN
690,691,ccc,male,0.99,IN
691,692,ddd,male,0.99,IN
in the first column you can see the missing data (indices 680 to 689 are missing)
I have another data frame
688,689,a4353
689,690,somename
690,691,dfsfds
691,692,sdfsdf
692,693,retret
693,694,ertret
I want to query the data in second data frame based on the missing index in first dataframe

desert oar Sep 1, 2021, 11:11 PM

#

upper scarab 678,679,aaa,male,0.96,NG 679,680,bbb,male,0.99,IN 690,691,ccc,male,0.99,IN 691,6...

wdym "missing"?

upper scarab Sep 1, 2021, 11:12 PM

#

678,679...690

#

index is missing

#

ignore the second column

desert oar Sep 1, 2021, 11:13 PM

#

and you want to fill in those missing rows with rows from the 2nd df?

upper scarab Sep 1, 2021, 11:13 PM

#

nope

desert oar Sep 1, 2021, 11:14 PM

#

or do you just want to get the rows from the 2nd df that fit that criterion?

upper scarab Sep 1, 2021, 11:14 PM

#

I want the rows from 2nd dataframe.

upper scarab Sep 1, 2021, 11:14 PM

#

desert oar or do you just want to get the rows from the 2nd df that fit that criterion?

yesss

#

680 to 689

#

from 2nd dataframe

#

there are few other rows missing in 1st dataframe

desert oar Sep 1, 2021, 11:20 PM

#

!e ```python
from io import StringIO

import pandas as pd

df1 = pd.read_csv(StringIO('''
678,679,aaa,male,0.96,NG
679,680,bbb,male,0.99,IN
690,691,ccc,male,0.99,IN
691,692,ddd,male,0.99,IN
'''),
index_col=0,
names=['idx', *(f'y{i}' for i in range(1, 6))]
)

df2 = pd.read_csv(StringIO('''
688,689,a4353
689,690,somename
690,691,dfsfds
691,692,sdfsdf
692,693,retret
693,694,ertret
'''), index_col=0, names=['idx', 'x1', 'x2'])

df1_idx_range = pd.RangeIndex(
    df1.index.min(),
    df1.index.max(),
)

df1_idx_missing = df1_idx_range.difference(df1.index)

df2_missing_df1 = df2.loc[df1_idx_missing.intersection(df2.index)]

print(df2_missing_df1)
```

arctic wedgeBOT Sep 1, 2021, 11:20 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |       x1        x2
002 | 688  689     a4353
003 | 689  690  somename

desert oar Sep 1, 2021, 11:20 PM

#

the key operations are RangeIndex, Index.difference, and Index.intersection

upper scarab Sep 1, 2021, 11:20 PM

#

Wow

upper scarab Sep 1, 2021, 11:21 PM

#

desert oar the key operations are `RangeIndex`, `Index.difference`, and `Index.intersection`

Thanks a lot

desert oar Sep 1, 2021, 11:22 PM

#

👍

upper scarab Sep 1, 2021, 11:22 PM

#

cant dm you

#

This helped me a lot

#

i cant see your name

desert oar Sep 1, 2021, 11:23 PM

#

happy to help. i keep dm's closed because otherwise i get a lot of them, more than i want

upper scarab Sep 1, 2021, 11:23 PM

#

Any way I can repay you

upper scarab Sep 1, 2021, 11:23 PM

#

desert oar happy to help. i keep dm's closed because otherwise i get a lot of them, more th...

understandable

desert oar Sep 1, 2021, 11:23 PM

#

you can repay me by reading the docs and making sure you understand how these functions work 🙂

#

then you can teach someone else one day

upper scarab Sep 1, 2021, 11:24 PM

#

Wow sure

#

I just got started. gonna read and learn the docs after this project

arctic wedgeBOT Sep 1, 2021, 11:24 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1630539287:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

eager imp Sep 1, 2021, 11:57 PM

#

i'd like to plot continuous wavelets (i'd use pyplot.matshow(coef, fignum=1, aspect="auto") in jupyter) but in pyqtgraph instead, anyone know how to do that?

quiet vault Sep 2, 2021, 12:44 AM

#

Has anyone worked with CONVLSTM2d layers in keras for time series data?

#

If so, can you explain the rules for reshaping data to become 5D?

#

Every time I try to reshape, I run into this problem where it says it cannot reshape it
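On the ConvLSTM2D question: the Keras docs describe its input (with `data_format='channels_last'`) as a 5D tensor `(samples, time, rows, cols, channels)`. For a 1D multivariate series, one common convention (an assumption here, not the only choice) is rows=1, cols=n_steps, channels=n_features, which just inserts a length-1 axis into the 4D CNN-LSTM array, so the element count never changes and the reshape cannot fail:

```python
import numpy as np

n_seq, n_steps, n_features = 2, 5, 4
# Stand-in for a 4D CNN-LSTM input of shape (samples, n_seq, n_steps, n_features)
X = np.arange(3907 * n_seq * n_steps * n_features, dtype=float).reshape(3907, n_seq, n_steps, n_features)

# ConvLSTM2D (channels_last): (samples, time, rows, cols, channels).
# Each subsequence becomes a 1-row "image": time=n_seq, rows=1, cols=n_steps, channels=n_features.
X5 = X.reshape(X.shape[0], n_seq, 1, n_steps, n_features)
```

`np.expand_dims(X, axis=2)` produces the same array; the layer's `input_shape` would then be `(n_seq, 1, n_steps, n_features)`.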

dusty cloud Sep 2, 2021, 12:51 AM

#

Is it possible, when the program is interrupted or stopped, to save data held in a variable of another file that is currently executing, from the SIGINT handler attached in the main file?
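One way to read this question: in CPython a signal handler runs in the main thread, and every imported module is a single shared object in `sys.modules`, so a SIGINT handler installed in the main file can read another module's globals and persist them. A minimal sketch; `worker`, `progress`, and `STATE_PATH` are hypothetical names, and note that SIGKILL cannot be caught at all:

```python
import json
import signal
import sys
import types

# Hypothetical stand-in for the other file; in a real project this would be
# `import worker`, where worker.py keeps its progress in a module-level dict.
worker = types.SimpleNamespace(progress={"rows_done": 0})

STATE_PATH = "interrupted_state.json"  # hypothetical output location


def save_state(path=STATE_PATH):
    """Write the other module's current state to disk as JSON."""
    with open(path, "w") as f:
        json.dump(worker.progress, f)


def handle_sigint(signum, frame):
    save_state()
    sys.exit(130)  # 128 + SIGINT (2): conventional exit status after Ctrl+C


if __name__ == "__main__":
    # signal.signal must be called from the main thread
    signal.signal(signal.SIGINT, handle_sigint)
```

Since Python's default SIGINT behaviour is to raise `KeyboardInterrupt`, wrapping the main loop in `try: ... except KeyboardInterrupt: save_state()` is a simpler alternative to a custom handler.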

proven sigil Sep 2, 2021, 5:22 AM

#

Directory: /home/hadoop/
module.py

def incr(value):
    return int(value + 1)

main.py

import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

import sys
sys.path.append('/home/hadoop/')
import module

if __name__ == '__main__':
    df = spark.createDataFrame([['a', 1], ['b', 2]], schema=['id', 'value'])
    df.show()

    print(module.incr(5)) #this works

    incr_udf = F.udf(lambda val: module.incr(val), T.IntegerType()) # this throws module not found error
    df = df.withColumn('new_value', incr_udf('value'))
    df.show()

I think spark task nodes do not have access to /home/hadoop/
How do I access module.py from spark task nodes?
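A likely fix, assuming the file only exists on the driver: `SparkContext.addPyFile` ships a `.py` (or `.zip`/`.egg`) file to every node and makes it importable in tasks, which is what the UDF needs. The wrapper function name below is hypothetical; the call inside it is the real PySpark API:

```python
def ship_module_to_executors(spark, module_path="/home/hadoop/module.py"):
    """Make a driver-local .py file importable inside UDFs running on executors."""
    # addPyFile distributes the file to all nodes for tasks run after this call;
    # the path may be a local file, an HDFS path, or an HTTP/HTTPS/FTP URI.
    spark.sparkContext.addPyFile(module_path)
```

Call it right after creating `spark` and before the `withColumn` that uses `incr_udf`. Passing the file at submit time with `spark-submit --py-files /home/hadoop/module.py main.py` should have the same effect.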

arctic wedgeBOT Sep 2, 2021, 6:06 AM

#

:incoming_envelope: :ok_hand: applied mute to @waxen thorn until <t:1630563396:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia Sep 2, 2021, 8:22 AM

#

Hey, this question is about SVD,
Say once i have U, sigma and V, how do i decrease the dimensions?

what i did is

  U = u[:,:no_of_parameters] # M, K # we can treat U as our X for smaller dimension
  Sigma = sigma[:no_of_parameters, :no_of_parameters] # K, K
  V = vh[:no_of_parameters] # K, N

Now say I have to classify this, so How would I get X which i could fit in and would have dimensions of (M, K)?

lapis sequoia Sep 2, 2021, 8:39 AM

#

moreover, now to classify with more dimensions, what i did was first slice them and then get the X back, but it would have the same dimensions: (M,K) @ (K,K) @ (K,N) = (M,N)

which works fine, but converting down to K and then back up to M,N (which is big) seems pointless.
is there any way i could fit a new M,K-sized matrix instead,
and somehow get the training data in K columns as well? (please ping me if you reply)

desert oar Sep 2, 2021, 9:42 AM

#

proven sigil Directory: `/home/hadoop/` `module.py` ```py def incr(value): return int(val...

I am pretty sure you need to physically put the file on every node, not sure how though. Hadoop admin is lemon_grimace

desert oar Sep 2, 2021, 9:44 AM

#

lapis sequoia Hey, this question is about SVD, Say once i have U, sigma and V, how do i decrea...

SVD is a matrix factorization, X = U Σ V*, so just multiply the factors back together

#

That said I haven't ever seen someone actually do this by hand like you describe

#

Usually I see people doing "truncated SVD" and working with the truncated U and V matrices

#

There is also an algorithm that directly computes a low rank approximation without doing full SVD and truncating

#

@lapis sequoia https://inst.eecs.berkeley.edu/~ee127/sp21/livebook/l_svd_low_rank.html

#

https://dustinstansbury.github.io/theclevermachine/svd-data-compression

The Clever Machine

SVD and Data Compression Using Low-rank Matrix Approximation

In a previous post we introduced the Singular Value Decomposition (SVD) and its many advantages and applications. In this post, we’ll discuss one of my favorite applications of SVD: data compression using low-rank matrix approximation (LRA). We’ll start off with a quick introduction to LRA and how it relates to data compression. Then we’ll demon...

#

https://www.victorchen.org/2016/01/23/svd-and-low-rank-approximation/

Victor Chen

SVD and low-rank approximation

#

So yes this looks like a totally valid approach to low rank approximation

#

Maybe use the U matrix and not the reconstruction? Or do PCA and use the first K columns, if you are just looking for dimension reduction (PCA can be obtained from SVD on the covariance matrix of standardized data)
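A sketch of the "use the U matrix" route with NumPy, on random stand-in data and a hypothetical k. The key points: the reduced training features are U_k Σ_k (shape M × k), which equals X V_k, so new documents are projected with the same V_k from the training fit, and the classifier is then fit on these k columns rather than on the M × N reconstruction. Note that `np.linalg.svd` returns the singular values as a 1D array `s`, not a matrix. scikit-learn's `TruncatedSVD` packages the same idea (`fit_transform` for training data, `transform` for new data).

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 300))  # stand-in for a (documents x words) matrix
X_test = rng.normal(size=(10, 300))   # new documents, same vocabulary
k = 20                                # hypothetical number of components to keep

# Fit the SVD on training data only
U, s, Vt = np.linalg.svd(X_train, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]

# Reduced training features, shape (M, k): U_k @ diag(s_k), which equals X_train @ Vt_k.T
Z_train = U_k * s_k
# New documents are projected with the same right singular vectors, shape (10, k)
Z_test = X_test @ Vt_k.T
```

A classifier would then be fit on `Z_train` and applied to `Z_test`.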

pine wolf Sep 2, 2021, 10:02 AM

#

desert oar Maybe use the U matrix and not the reconstruction? Or do PCA and use the first K...

did you see a post in r/math about matrix multiplication

desert oar Sep 2, 2021, 10:02 AM

#

pine wolf did you see a post in r/math about matrix multiplication

No, did i say something stupid?

pine wolf Sep 2, 2021, 10:03 AM

#

no, it seemed like an interesting post, but i didn't look at it

desert oar Sep 2, 2021, 10:03 AM

#

That was meant to be a reply to their question about how to use only the first K columns

#

Link the post?

#

I don't follow Reddit usually

pine wolf Sep 2, 2021, 10:03 AM

#

unrelated to your previous conversation, just something you might be interested in

#

a machine learning model of matrix multiplication i think ... : https://www.reddit.com/r/math/comments/pfioih/210610860_multiplying_matrices_without_multiplying/

r/math - [2106.10860] Multiplying Matrices Without Multiplying

170 votes and 23 comments so far on Reddit

#

i didn't look into it either, could just be fool's gold, dunno

desert oar Sep 2, 2021, 10:04 AM

#

Huh!

#

Interesting that it has a practical application

pine wolf Sep 2, 2021, 10:06 AM

#

i can think of several uses already without reading the paper

#

i mean, if you're already doing monte-carlo or statistical things, why not

velvet thorn Sep 2, 2021, 10:12 AM

#

pine wolf a machine learning model of matrix multiplication i think ... : https://www.redd...

didn't someone post this somewhere recently

#

I remember seeing it

pine wolf Sep 2, 2021, 10:12 AM

#

i don't know, possibly

#

i sort of earmarked it for later

lapis sequoia Sep 2, 2021, 10:50 AM

#

desert oar SVD is a matrix factorization, X = U Σ V*, so just multiply the factors back together

Thanks for answering, and yep I'm aware of that, but if i multiply them back together, even after cutting off some components like i mentioned earlier, I still get a matrix of the same size.

#

there was one answer which said, I can get new matrix by U x Sigma but giving that to classifier did not help.

#

So by truncating to low-rank matrices I do get a new X, yes, but its dimensions stay the same regardless.

desert oar Sep 2, 2021, 11:31 AM

#

lapis sequoia Thanks for answering, and yep I'm aware of that, but if i multiply them even aft...

It's not clear what you're trying to do, but that's exactly what that operation would do. Create a lower rank approximation to the original matrix

lapis sequoia Sep 2, 2021, 11:31 AM

#

I'll show you what I'm trying to do in code hold on.

desert oar Sep 2, 2021, 11:31 AM

#

lapis sequoia there was one answer which said, I can get new matrix by U x Sigma but giving th...

Wdym "did not help"?

#

Because yes that would be the right solution

lapis sequoia Sep 2, 2021, 11:32 AM

#

desert oar Wdym "did not help"?

the classification was kinda poor, it classified everything into one class.

desert oar Sep 2, 2021, 11:32 AM

#

You are doing clustering?

lapis sequoia Sep 2, 2021, 11:32 AM

#

knn

#

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def get_classifier_model(u, sigma, vh, no_of_parameters, k, y, X):
  U = u[:,:no_of_parameters] # M, K # we can treat U as our X for smaller dimension
  Sigma = sigma[:no_of_parameters, :no_of_parameters] # K, K (sigma here is the 2D np.diag(s))
  V = vh[:no_of_parameters] # K, N
  X_reduced = U @ Sigma @ V # Thats how we can get the X back(same dimensions but data will be changed)
  # you can verify how close we are to the original X with np.linalg.norm(X - X_reduced)
  # (np.sum(X - X_reduced) lets positive and negative errors cancel out)
  neigh = KNeighborsClassifier(n_neighbors=k)
  neigh.fit(X_reduced,y)
  return neigh, U, Sigma, V

so thats what i did.

desert oar Sep 2, 2021, 11:33 AM

#

Plot your data

lapis sequoia Sep 2, 2021, 11:34 AM

#

but it's very high-dimensional.

#

i have docs of say 61188 words, so what i did above was,

reduced dimensions, got X back of say rows*61188 and put it into the classifier.

now say if i just do U @ Sigma, which will give me matrix of say rows*100, the result is very poor.

#

of classifier.

#

I'm sorry if I'm sounding confusing.

prime hearth Sep 2, 2021, 11:50 AM

#

hello, i would like to ask: is it fine to include a Titanic dataset machine learning project in my resume if i build the machine learning algorithm from scratch? I googled and saw lots of advice from professionals that it is not good to include it, but i'm not sure why, or whether that applies to my circumstance, since i'm building linear regression from scratch without any library?

#

titanic data set is from kaggle*

desert oar Sep 2, 2021, 11:51 AM

#

lapis sequoia i have docs of say 61188 words, so what i did above was, reduced dimensions, go...

Consider that 100 is still a very high dimension for computing useful distance metrics due to the curse of dimensionality

desert oar Sep 2, 2021, 11:53 AM

#

prime hearth hello, i would like to please ask is it fine to include titantic dataset machine...

What does "from scratch" mean? Personally I don't think it's that bad to put it on a resume anyway, but it's certainly a low bar and a relatively easy/uncreative project

#

I don't think knowing about QR decomposition and applying formulas from your stats class is all that much better than anything else
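To illustrate the QR point: for a least-squares problem of this size, a direct solve via QR gives the coefficients in one step, while gradient descent reaches the same answer iteratively. A toy comparison on synthetic data (not the Titanic data; the coefficients, learning rate, and iteration count are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 features
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Least squares via QR: X = QR, then solve R beta = Q^T y (R is 3x3 upper triangular)
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Same problem via batch gradient descent on the mean squared error
beta_gd = np.zeros(3)
lr = 0.1
for _ in range(2000):
    grad = 2 / n * X.T @ (X @ beta_gd - y)  # gradient of mean((X b - y)^2)
    beta_gd -= lr * grad
```

Both land on the same coefficients; the QR route needs no learning rate or stopping rule, which is why it (or a similar direct solver) is the usual choice for small tabular data.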

prime hearth Sep 2, 2021, 11:54 AM

#

oh okay, and by from scratch i mean i don't use the sklearn library for linear regression or k-means; i implement all the algorithms myself using the math formulas like the cost function and gradient, so from scratch

desert oar Sep 2, 2021, 11:54 AM

#

It's not bad, it shows you know more than nothing, which is good

#

I would suggest that implementing SGD by hand is also kind of a low bar easy project

#

Moreover I think it reflects a shallow understanding of these algorithms, because I don't think SGD is a sensible choice for the titanic data

#

So if you're going to use the wrong algorithm then yes I would not suggest putting it on the resume

#

But it also depends on what kind of job you're applying for

#

Instead of putting it "on your resume", I would build a project around it

#

Demonstrate that you can write, that you can come up with a project plan, explain the plan, and reason about your decisions for choosing that plan

#

Demonstrate that you can generate useful data visualizations and reflect on the pros and cons of different approaches

prime hearth Sep 2, 2021, 11:56 AM

#

sorry, i meant the algorithms i'm implementing from scratch are for datasets that suit them or work well with them

desert oar Sep 2, 2021, 11:56 AM

#

Demonstrate that you can write good quality code that other people will be able to use

#

I don't think it's a good idea to just put your class homework assignments on your resume

#

If you did some kind of capstone project or thesis, that's a great thing to include

#

But I doubt your capstone project was to run SGD linear regression on the titanic data

prime hearth Sep 2, 2021, 11:57 AM

#

oh okay. So it's not great to include as a beginner?

desert oar Sep 2, 2021, 11:57 AM

#

There's nothing wrong with simple tools and familiar data sets, but nowadays the bar is higher than that

lapis sequoia Sep 2, 2021, 11:57 AM

#

desert oar Consider that 100 is still a very high dimension for computing useful distance m...

indeed, but for SVD the minimum i can go is k=min(m,n) so well.
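A note on the rank: k can be any integer from 1 up to min(m, n); min(m, n) is the largest possible rank, not the smallest. One common heuristic (an assumption here, not a rule) is to keep the smallest k whose singular values capture a chosen fraction of the total squared "energy"; `choose_rank` is a hypothetical helper:

```python
import numpy as np


def choose_rank(s, energy=0.9):
    """Smallest k whose top-k singular values hold `energy` of the total sum of squares."""
    frac = np.cumsum(s**2) / np.sum(s**2)       # cumulative fraction, increasing to 1
    return int(np.searchsorted(frac, energy) + 1)


s = np.array([10.0, 5.0, 1.0, 0.5])  # example singular values, sorted descending
k = choose_rank(s, energy=0.9)
```

Here the first two values already hold about 99% of the energy, so `k` is 2 even though min(m, n) would be 4.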

desert oar Sep 2, 2021, 11:58 AM

#

prime hearth oh okay. So it not great to include as a beginner?

It's not enough I think. It's literally a class assignment, which means that everyone who took your class can do the same thing

#

You can put the class on your resume, I think that's a great idea

prime hearth Sep 2, 2021, 11:58 AM

#

Oh okay, and so like for email spam classification, you think i should create some visualization for it and deploy it using flask or just a terminal command line for this app is good enough?

#

Just not sure which projects are "bad" to HR

desert oar Sep 2, 2021, 11:58 AM

#

It depends on the kind of job you're applying for

prime hearth Sep 2, 2021, 11:59 AM

#

oh okay, i guess entry level

#

or internships

#

for machine learning

desert oar Sep 2, 2021, 11:59 AM

#

Nothing is "bad", what is bad is putting some thing on your resume that doesn't reflect that you know what you're doing

#

10 years ago, SGD was hot shit

#

But today it's baseline entry level knowledge

#

They will ask you about SGD not to test how good you are, but to make sure that you aren't lying about having basic fundamental skills

prime hearth Sep 2, 2021, 12:00 PM

#

oh okay.

desert oar Sep 2, 2021, 12:00 PM

#

So if you put fundamental skills on your resume and try to advertise them as something special, it's a red flag that you don't know what you're talking about

#

As for your spam classification project, it certainly can't hurt to spin up a web application and make nice data visualizations. However there will be a point of diminishing returns with respect to your time and energy

prime hearth Sep 2, 2021, 12:01 PM

#

diminishing returns?

desert oar Sep 2, 2021, 12:01 PM

#

I would personally rather see that you have invested energy into making your work reproducible

#

Setting random seeds, using version control, etc

#

I would want to see that the code quality is good

#

I would want to see that you have handled the data correctly

#

Diminishing returns meaning, the benefit for every additional hour of work gets lower as you put more hours into the project

#

So what I would do is focus on making sure that the core classification project is solid, before moving onto "extra" stuff like a web application

prime hearth Sep 2, 2021, 12:02 PM

#

oh okay, so instead i should just put it on github, write documentation like in the README file, show visualizations, and communicate how i cleaned the data and approached the problem, with good code quality, instead of making a web app out of it and deploying it?

desert oar Sep 2, 2021, 12:02 PM

#

However, using data visualizations to demonstrate that your work is successful is a critical skill for a data scientist, so I do strongly recommend working on data viz

prime hearth Sep 2, 2021, 12:02 PM

#

oh okay

desert oar Sep 2, 2021, 12:03 PM

#

prime hearth oh okay, so instead i should just put it on github and write documentation like ...

Yes, I think that's more productive. I also recommend writing a one page "executive summary". It should explain the purpose of the project, the approach you took, and your findings. Include one or two small data visualizations

prime hearth Sep 2, 2021, 12:04 PM

#

oh okay thanks, so is it totally fine if the app is just a "run file" and the code executes, or should i build a web interface?

desert oar Sep 2, 2021, 12:04 PM

#

https://conference.nber.org/confer/2011/GFC11/Executive.pdf here is a longer executive summary from a sophisticated economics paper. Yours should be shorter, but they should give you a sense of the kind of content that belongs in an executive summary

desert oar Sep 2, 2021, 12:05 PM

#

prime hearth oh okay thanks so its totoally fine if the app is just a "run file" and the code...

Focus on writing documentation and making sure that your code actually runs from a "clean" environment. Can I follow your instructions and obtain the same results that you obtained?

#

And emphasize that you did this work aggressively when you interview

prime hearth Sep 2, 2021, 12:05 PM

#

oh okay thanks

desert oar Sep 2, 2021, 12:05 PM

#

Sloppy data science destroys business value. Demonstrate that you understand this, and demonstrate that you are not a sloppy data scientist

#

Your code is readable, your scripts can be run without too much difficulty. You have produced sufficient justification that your model performs well. You can communicate to non-technical stakeholders, business management, etc. and demonstrate the value of your work.

#

Then yeah, throw a web app on top of that if you have time

#

And of course you have demonstrated that you are not just copying and pasting shit from TDS and Stackoverflow and Kaggle

prime hearth Sep 2, 2021, 12:07 PM

#

oh okay thanks, i'll prob not do a web app since that requires more time like you said (flask or django) plus bug testing if it happens

desert oar Sep 2, 2021, 12:08 PM

#

Note that these are the opinions and biases of exactly one person and should not be treated as fact or even general principle

prime hearth Sep 2, 2021, 12:09 PM

#

oh okay. thanks for the help! Lastly i also saw this website called iNeuron which give free internship projects

#

from beginners to advance

#

not sure if you heard of it but should i include that under internship

#

or under projects in my resume

#

if i do work on a project there

#

each project there gives a data set and instructions on what they want ("work with x cloud, use this model") plus some other requirements, but you are implementing it yourself; there is no source code to copy

desert oar Sep 2, 2021, 12:10 PM

#

Probably can't hurt

prime hearth Sep 2, 2021, 12:10 PM

#

oh okay thanks

desert oar Sep 2, 2021, 12:11 PM

#

Note that if you did specifically want to mention the Titanic SGD thing, you can put it as a bullet point when describing the course you are taking

#

Good excuse to hit some HR buzzwords

prime hearth Sep 2, 2021, 12:12 PM

#

oh okay, i'm not taking any course doing Titanic, but i thought it was interesting just to show...

#

thanks for help

desert oar Sep 2, 2021, 12:18 PM

#

prime hearth oh okay,i not taking any course doing titantic but i thought it was interesting ...

I see, maybe you can put it as a single bullet point, but I don't think just programming SGD from scratch is enough to justify it as a "project"

#

Like I said, if you build more of a project around it it can stand on its own

#

It's like 20 lines of python, it's just too trivial

#

Do some data visualizations showing how SGD works maybe

#

Or compare different approaches to fitting linear models
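For reference, a from-scratch SGD fit for linear regression really is about 20 lines. A minimal sketch on synthetic data (the data, learning rate, and epoch count are assumptions for illustration), compared against NumPy's closed-form least-squares solution as one way to "compare different approaches to fitting linear models":

```py
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Fit y ~ X @ w + b by stochastic gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):   # visit samples in a fresh random order each epoch
            err = X[i] @ w + b - y[i]  # residual for a single sample
            w -= lr * err * X[i]       # gradient of 0.5 * err**2 with respect to w
            b -= lr * err              # gradient with respect to the bias
    return w, b

# Synthetic data (assumed): y = 3*x0 - 2*x1 + 1 + small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=200)

w_sgd, b_sgd = sgd_linear_regression(X, y)

# Closed-form least squares for comparison (a column of ones handles the bias)
X1 = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("SGD:  ", w_sgd, b_sgd)
print("lstsq:", coef[:2], coef[2])
```

Plotting `w` after each epoch against the `lstsq` answer would be one way to turn this into the kind of SGD visualization suggested above.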

tender hearth Sep 2, 2021, 12:45 PM

#

Embedding vectors are different from regular vector representations in that they have the property of being closer in terms of distance to similar inputs, right?

serene scaffold Sep 2, 2021, 12:46 PM

#

tender hearth Embedding vectors are different from regular vector representations in that they...

are you talking about nlp?

umbral raptor Sep 2, 2021, 12:47 PM

#

Is anyone familiar with a library for using fastText (pretrained) embeddings in NNs (PyTorch)? Online I can only find code to encode text data to embedding ids, and everyone is doing it with their own code. I thought by now there would be a library for that.
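Without a dedicated library, the usual pattern is short: build a word-to-row index, copy each known word's pretrained vector into a weight matrix, and give unknown words a small random vector. A minimal NumPy sketch, assuming the pretrained vectors are already loaded into a plain dict (the toy 3-dimensional vectors and the `build_embedding_matrix` helper are hypothetical placeholders, not real fastText output). In PyTorch the resulting matrix can then be wrapped with `torch.nn.Embedding.from_pretrained(torch.from_numpy(weights))`.

```py
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Hypothetical helper: return (word_to_index, weights), row i = vector for word i.

    vocab: words in the corpus; index 0 is reserved for padding.
    pretrained: dict mapping word -> vector (e.g. parsed from a fastText .vec file).
    """
    rng = np.random.default_rng(seed)
    word_to_index = {"<pad>": 0}
    for word in vocab:
        word_to_index.setdefault(word, len(word_to_index))
    weights = np.zeros((len(word_to_index), dim), dtype=np.float32)  # padding row stays zero
    for word, idx in word_to_index.items():
        if word == "<pad>":
            continue
        if word in pretrained:
            weights[idx] = pretrained[word]                  # copy the pretrained vector
        else:
            weights[idx] = rng.normal(scale=0.1, size=dim)   # out-of-vocabulary: random init
    return word_to_index, weights

# Placeholder vectors standing in for real pretrained embeddings
pretrained = {"cat": [0.1, 0.2, 0.3], "dog": [0.2, 0.1, 0.3]}
word_to_index, weights = build_embedding_matrix(["cat", "dog", "axolotl"], pretrained, dim=3)
token_ids = [word_to_index[w] for w in ["dog", "cat"]]  # the ids an embedding layer consumes
```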

tender hearth Sep 2, 2021, 12:49 PM

#

serene scaffold are you talking about nlp?

Does it matter? Are word embeddings special in some manner?

#

Although now that I think about it I've only seen embeddings in NLP contexts

serene scaffold Sep 2, 2021, 12:50 PM

#

tender hearth Does it matter? Are word embeddings special in some manner

I just wasn't sure if "embedding vectors" means something outside of NLP.

#

Based on how word embeddings are trained, words whose embeddings have a smaller cosine distance tend to be more semantically similar.

ripe forge Sep 2, 2021, 12:57 PM

#

tender hearth Embedding vectors are different from regular vector representations in that they...

Could you elaborate? What is a regular vector representation, and what are embedding vectors?

tender hearth Sep 2, 2021, 12:58 PM

#

From my understanding, embeddings are regular vectors with the additional property that their cosine distance from embeddings of similar inputs is relatively low

#

regular vector representation is just an abstract representation of some input

ripe forge Sep 2, 2021, 1:00 PM

#

Oh. Okay. Then sure yes. But they "end up" with this property specifically because the vectors were trained and fine-tuned with an objective that makes similar things get similar vectors.

#

Like, a poorly trained embedding vector would be no different than just a regular vector.
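To make the cosine-distance point concrete: cosine similarity is the dot product of two vectors divided by the product of their norms, and cosine distance is 1 minus that. A small NumPy sketch with made-up 3-dimensional "embeddings" (real learned embeddings have far more dimensions):

```py
import numpy as np

def cosine_distance(u, v):
    """1 - cos(angle between u and v): 0 means same direction, 2 means opposite."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical toy vectors: "king" and "queen" point in similar directions, "banana" does not
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.9, 0.15]
banana = [0.1, 0.05, 0.95]

print(cosine_distance(king, queen))   # small
print(cosine_distance(king, banana))  # much larger
```

With a poorly trained embedding, nothing forces related words into similar directions, so these distances would carry no meaning.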

proven sigil Sep 2, 2021, 1:11 PM

#

desert oar I am pretty sure you need to physically put the file on every node, not sure how...

I found other options
--py-files in spark-submit and sc.addPyFile(path)
But with both of them, the spark-submit command gets stuck forever. Any idea about this?

desert oar Sep 2, 2021, 1:11 PM

#

proven sigil I found other options `--py-files` in `spark-submit` and `sc.addPyFile(path)` ...

hm, both seem right

#

stuck forever?

proven sigil Sep 2, 2021, 1:12 PM

#

the run doesn't complete

#

It's just waiting without even printing anything

cold pine Sep 2, 2021, 1:35 PM

#

yo!
i have a problem. I'd like to plot top 5 countries to see if they change across timespan. After some time working on it i've created table with {year:[top_5_countries]}. How to plot it? 😮‍💨

prime hearth Sep 2, 2021, 1:44 PM

#

so like top 5 countries are the ones with highest year?

#

oh

#

one way is dataframe looping

desert oar Sep 2, 2021, 1:47 PM

#

(nvm sorry)

cold pine Sep 2, 2021, 1:47 PM

#

did it

#

but i just get a bunch of 0's and 1's

desert oar Sep 2, 2021, 1:48 PM

#

one way to plot this is to plot the rankings as a line over time

#

one line per country

cold pine Sep 2, 2021, 1:48 PM

#

thats what i want to do

#

but dunno how

desert oar Sep 2, 2021, 1:48 PM

#

can you post this data as code/text and not as a screenshot?

#

then i can copy and paste it to help

#

!code

arctic wedgeBOT Sep 2, 2021, 1:48 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

cold pine Sep 2, 2021, 1:49 PM

#

well i mean: if a country occurred -> plot, else -> don't plot, but i dunno how to do this.
{1896: ['AUS', 'BLR', 'DEN', 'HUN', 'ITA']}
{1900: ['EUN', 'LTU', 'NZL', 'PAK', 'URS']}
{1904: ['CUB', 'EUN', 'GDR', 'SUI', 'USA']}
{1906: ['CRO', 'IND', 'LTU', 'NZL', 'SRB']}
{1908: ['ARM', 'BOT', 'GUY', 'PAK', 'URS']}
{1912: ['EUN', 'GDR', 'GEO', 'GER', 'URS']}
{1920: ['ANZ', 'EUN', 'MNE', 'PAR', 'SCG']}
{1924: ['ANZ', 'CRO', 'FIN', 'IOA', 'URS']}
{1928: ['GDR', 'KOR', 'SCG', 'SRB', 'URS']}
{1932: ['CMR', 'EUN', 'FIN', 'IND', 'SCG']}
{1936: ['ANZ', 'EUN', 'GRE', 'NEP', 'URS']}
{1948: ['CIV', 'LTU', 'SCG', 'TJK', 'URS']}
{1952: ['AZE', 'ETH', 'GDR', 'PAK', 'SRB']}
{1956: ['ANZ', 'DEN', 'EUN', 'GDR', 'URS']}
{1960: ['EUN', 'GDR', 'SGP', 'URS', 'USA']}
{1964: ['GDR', 'NAM', 'SCG', 'URS', 'WIF']}
{1968: ['ANZ', 'GDR', 'SCG', 'URS', 'USA']}
{1972: ['EUN', 'MKD', 'PAK', 'SCG', 'UZB']}
{1976: ['GDR', 'GRN', 'RUS', 'URS', 'USA']}
{1980: ['ANZ', 'CUB', 'ESP', 'KOS', 'WIF']}
{1984: ['FIJ', 'GDR', 'JAM', 'PAN', 'URS']}
{1988: ['EUN', 'GDR', 'SRB', 'URS', 'USA']}
{1992: ['ANZ', 'EUN', 'GDR', 'HAI', 'URS']}
{1994: ['CUB', 'GDR', 'URS', 'URU', 'YUG']}
{1996: ['EUN', 'GDR', 'JAM', 'PAK', 'URS']}
{1998: ['ARM', 'GDR', 'SRB', 'SWE', 'URS']}
{2000: ['EUN', 'GDR', 'SRB', 'URS', 'ZIM']}
{2002: ['ANZ', 'EUN', 'HAI', 'IND', 'URS']}
{2004: ['AZE', 'BOH', 'GDR', 'SCG', 'URS']}
{2006: ['AZE', 'CRO', 'GDR', 'GEO', 'NGR']}
{2008: ['EUN', 'GDR', 'URS', 'USA', 'WIF']}
{2010: ['EUN', 'GEO', 'RUS', 'URS', 'UZB']}
{2012: ['EUN', 'GDR', 'PAR', 'UAE', 'URS']}
{2014: ['ETH', 'KOR', 'SGP', 'TTO', 'URS']}
{2016: ['ANZ', 'GDR', 'TGA', 'URS', 'WIF']}

desert oar Sep 2, 2021, 1:49 PM

#

thanks

#

so it's more than 5 countries total

#

it's the top 5 from a larger list?

cold pine Sep 2, 2021, 1:49 PM

#

yeah

#

for each year every country got points. I've sorted them by points and categorized them by year

desert oar Sep 2, 2021, 1:50 PM

#

hm, that might get tricky with the lines thing

#

how many countries total? you don't want like 10 different colored lines if you can avoid it

#

is the first item in the list the highest or lowest ranked?

#

also why is each record a separate dict? i would imagine this should be one dict, where each key is a year and each value is a list

#

{
    2006: ['AZE', 'CRO', 'GDR', 'GEO', 'NGR'],
    2008: ['EUN', 'GDR', 'URS', 'USA', 'WIF'],
    2010: ['EUN', 'GEO', 'RUS', 'URS', 'UZB'],
}

cold pine Sep 2, 2021, 1:52 PM

#

holy

#

58 exclusive countries

#

different ^

desert oar Sep 2, 2021, 1:53 PM

#

yeah so 58 different lines is out of the question, even if only 5 of them are going to be visible at a given time

cold pine Sep 2, 2021, 1:53 PM

#

huh

#

then could you give me advice how to see if these countries change across timespan?

desert oar Sep 2, 2021, 1:53 PM

#

you could narrow it down a lot by getting the list of countries that appear in the top 5

#

i assume some of the 58 countries never do?

#

or are there 58 unique countries across this top 5 data?

cold pine Sep 2, 2021, 1:54 PM

#

these are all across this top 5 data

desert oar Sep 2, 2021, 1:55 PM

#

are you interested in 1 country specifically? or are you trying to get a sense of how the top 5 has changed over time?

cold pine Sep 2, 2021, 1:55 PM

#

second one

desert oar Sep 2, 2021, 1:55 PM

#

proven sigil It's just waiting without even printing anything

unfortunately i'm not sure. hadoop and spark can be fussy, the last time i used spark it was on databricks because it was too much work to keep it running well

desert oar Sep 2, 2021, 1:57 PM

#

cold pine second one

you could plot a subset of countries' rankings still, maybe 10 or so

#

maybe take the countries with the 10 best average rankings over time

#

another option is a grid of some kind, where the y axis is the country and the x axis is the year

#

and each cell in the grid would have the ranking in it

#

you can try to use something called "seriation" to find interesting arrangements of the countries https://pypi.org/project/seriate/


#

or again do something naive like sort by average ranking over time
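A minimal pandas sketch of the grid idea, assuming the per-year dicts posted above are first merged into one `{year: [countries]}` dict (only three of them are used here). Because the posted lists look alphabetically sorted rather than ranked, this only marks whether a country made that year's top 5 (1/0) and orders rows naively by how often each country appears; real ranks could be dropped in instead if they are available. The finished table can be drawn with `matplotlib.pyplot.imshow(grid)`.

```py
import pandas as pd

# Per-year records as posted (three shown), merged into a single dict
records = [
    {2006: ['AZE', 'CRO', 'GDR', 'GEO', 'NGR']},
    {2008: ['EUN', 'GDR', 'URS', 'USA', 'WIF']},
    {2010: ['EUN', 'GEO', 'RUS', 'URS', 'UZB']},
]
top5 = {year: countries for rec in records for year, countries in rec.items()}

# Country x year grid: 1 if the country was in that year's top 5, else 0
grid = pd.DataFrame(
    {year: {country: 1 for country in countries} for year, countries in top5.items()}
).fillna(0).astype(int)

# Naive ordering: countries that appear most often go to the top
grid = grid.loc[grid.sum(axis=1).sort_values(ascending=False, kind="stable").index]
print(grid)
```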

valid pebble Sep 2, 2021, 2:10 PM

#

I am appending multiple dataframes to a CSV file using mode='a' in the to_csv method. Now how can I get one dataframe back from this file?

desert oar Sep 2, 2021, 2:11 PM

#

read it with pd.read_csv?

valid pebble Sep 2, 2021, 2:11 PM

#

desert oar read it with `pd.read_csv`?

Yeah

cold pine Sep 2, 2021, 2:11 PM

#

it might work! Ima try both options and see how it'll look. Thank you ❤️

valid pebble Sep 2, 2021, 2:12 PM

#

desert oar read it with `pd.read_csv`?

Or normal file reading whatever works

desert oar Sep 2, 2021, 2:12 PM

#

i don't understand the question

valid pebble Sep 2, 2021, 2:13 PM

#

The thing is, I am appending multiple dataframes to the same CSV

#

So now how can I get one dataframe out of this csv

desert oar Sep 2, 2021, 2:13 PM

#

do you understand what csv is?

valid pebble Sep 2, 2021, 2:14 PM

#

desert oar do you understand what csv is?

CSV file yeah

desert oar Sep 2, 2021, 2:14 PM

#

if you append them with header=False you can just read the file and it will already be one "data frame"

#

there's nothing magical here
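A small sketch of that, assuming every appended frame has the same columns (the file location and data are placeholders): write the header once, append later frames with `mode="a", header=False`, and `pd.read_csv` returns one combined frame.

```py
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "combined.csv")  # placeholder location

first = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
second = pd.DataFrame({"a": [3], "b": ["z"]})

first.to_csv(path, index=False)                           # header + rows
second.to_csv(path, mode="a", header=False, index=False)  # rows only, appended

combined = pd.read_csv(path)  # one DataFrame with all three rows
print(combined)
```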

valid pebble Sep 2, 2021, 2:14 PM

#

desert oar if you append them with `header=False` you can just read the file and it will al...

They are different dataframes with different structure

desert oar Sep 2, 2021, 2:14 PM

#

?

#

well that's important information

#

the answer is no, you can't really do that

#

why not just write them to separate files?

#

you'd have to put some kind of delimiter between the two dataframes, then read the file line by line, partitioning or splitting on the delimiter

#

seems like a lot of trouble instead of just writing 1 dataframe per file

#

if you need to store multiple tables in a single file, consider using sqlite or hdf5
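A minimal sketch of the SQLite route using pandas and the standard-library `sqlite3` module: each differently shaped frame goes into its own named table inside one file, and any single table can be read back by name. File and table names and the data are placeholders.

```py
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd

db_path = os.path.join(tempfile.mkdtemp(), "tables.sqlite")  # placeholder file

# Two frames with different structures (made-up data)
prices = pd.DataFrame({"ticker": ["AAA", "BBB"], "price": [10.5, 20.0]})
events = pd.DataFrame({"date": ["2021-09-01"], "event": ["split"], "ratio": [2]})

with closing(sqlite3.connect(db_path)) as conn:
    # One table per DataFrame; if_exists="append" adds rows if the table already exists
    prices.to_sql("prices", conn, if_exists="append", index=False)
    events.to_sql("events", conn, if_exists="append", index=False)
    conn.commit()

# Later: read back only the table you need
with closing(sqlite3.connect(db_path)) as conn:
    events_back = pd.read_sql("SELECT * FROM events", conn)
print(events_back)
```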

valid pebble Sep 2, 2021, 2:16 PM

#

desert oar if you _need_ to store multiple tables in a single file, consider using sqlite o...

Sqlite sounds good ....

#

Thanks

vapid sentinel Sep 2, 2021, 2:32 PM

#

guys, can anyone share a data science and machine learning course link or a proper video?

serene scaffold Sep 2, 2021, 2:41 PM

#

vapid sentinel guys can anyone share data science and machine learning cousre link or a proper ...

!resources

arctic wedgeBOT Sep 2, 2021, 2:41 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold Sep 2, 2021, 2:42 PM

#

There's data science stuff in here.

#

And there will be more pretty soon

umbral skiff Sep 2, 2021, 3:20 PM

#

I'm trying to create a new column in this DataFrame with the name of the day of the week, but I can't. Does anyone have any tips?


from datetime import date

days = (
  'Segunda-feira',
  'Terça-feira',
  'Quarta-feira',
  'Quinta-feira',
  'Sexta-feira',
  'Sábado',
  'Domingo'
  )

import pandas as pd

dados_vacinacao = pd.read_excel("vacinacao_br.xlsx")
dados_vacinacao.head()

UF    Data Vacinacao    quantidade
0    AC    2021-01-18 00:00:00    1
1    AC    2021-01-19 00:00:00    46
2    AC    2021-01-20 00:00:00    1021
3    AC    2021-01-21 00:00:00    1609
4    AC    2021-01-22 00:00:00    1105

dados_vacinacao["dia_semana"] = days[dados_vacinacao["Data Vacinacao"]].weekday()

TypeError: list indices must be integers or slices, not Series

proven sigil Sep 2, 2021, 3:25 PM

#

umbral skiff I'm trying to create a new column in this DataFrame with the name of the day of ...

days[dados_vacinacao["Data Vacinacao"].weekday()]

umbral skiff Sep 2, 2021, 3:27 PM

#

proven sigil ```py days[dados_vacinacao["Data Vacinacao"].weekday()] ```

This error appeared
AttributeError: 'Series' object has no attribute 'weekday'

proven sigil Sep 2, 2021, 3:28 PM

#

days[dados_vacinacao["Data Vacinacao"].dt.weekday()]

#

this works?

umbral skiff Sep 2, 2021, 3:30 PM

#

proven sigil ```py days[dados_vacinacao["Data Vacinacao"].dt.weekday()] ```

No
SyntaxError: invalid syntax

proven sigil Sep 2, 2021, 3:32 PM

#

ok, you have to convert the column to datetime format first

umbral skiff Sep 2, 2021, 3:33 PM

#

proven sigil ok, you have to convert the column to datetime format first

Do you know how I can do this?

flat fiber Sep 2, 2021, 3:44 PM

#

Hello, I wanted to learn to make a chatbot (preferably retrieval-based, but generative too if possible). I know Python with Pandas, Numpy and some regex.
Could you suggest some resources to learn how to do that?
Also, between the below two courses, which one seems better?
https://learn.datacamp.com/courses/building-chatbots-in-python
https://www.codecademy.com/learn/paths/build-chatbots-with-python

bleak trout Sep 2, 2021, 4:03 PM

#

What can we learn from Artificial Intelligence in video games?

quiet vault Sep 2, 2021, 4:21 PM

#

bleak trout What can we learn from Artificial Intelligence in video games?

What do you mean?

bleak trout Sep 2, 2021, 4:22 PM

#

quiet vault What do you mean?

that is the point

#

skool projects are like that

quiet vault Sep 2, 2021, 4:24 PM

#

You are doing AI in school?

somber prism Sep 2, 2021, 4:56 PM

#

guys, what is epsilon in the SVR model in sklearn.svm? is it the learning rate?

somber prism Sep 2, 2021, 4:57 PM

#

umbral skiff Do you know how I can do this?

pd.to_datetime(df[colname])

umbral skiff Sep 2, 2021, 5:34 PM

#

somber prism pd.to_datetime(df[colname])

Did not work

somber prism Sep 2, 2021, 5:36 PM

#

import pandas as pd

dados_vacinacao["Data Vacinacao"] = pd.to_datetime(dados_vacinacao["Data Vacinacao"])

assuming dados_vacinacao is the dataframe and "Data Vacinacao" is the column (the result has to be assigned back)
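Putting the pieces of this thread together: convert the column with `pd.to_datetime` and assign it back, then use the `.dt.weekday` accessor (a property, so no parentheses; Monday=0 through Sunday=6, matching the order of the `days` tuple) and map each number to its name. A sketch using the rows from the `head()` output above in place of `vacinacao_br.xlsx`:

```py
import pandas as pd

days = ('Segunda-feira', 'Terça-feira', 'Quarta-feira', 'Quinta-feira',
        'Sexta-feira', 'Sábado', 'Domingo')  # Monday..Sunday

# Rows from the head() output, standing in for pd.read_excel("vacinacao_br.xlsx")
dados_vacinacao = pd.DataFrame({
    "UF": ["AC"] * 5,
    "Data Vacinacao": ["2021-01-18", "2021-01-19", "2021-01-20", "2021-01-21", "2021-01-22"],
    "quantidade": [1, 46, 1021, 1609, 1105],
})

# Make sure the column is datetime64, assigning the result back
dados_vacinacao["Data Vacinacao"] = pd.to_datetime(dados_vacinacao["Data Vacinacao"])

# .dt.weekday gives 0 (Monday) .. 6 (Sunday); map each number to its name
dados_vacinacao["dia_semana"] = dados_vacinacao["Data Vacinacao"].dt.weekday.map(lambda i: days[i])
print(dados_vacinacao)
```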

opal mango Sep 2, 2021, 5:39 PM

#

sup, could you send a good python video that's focused on data science?

somber prism Sep 2, 2021, 5:40 PM

#

opal mango sup, could you send a good python video thats focused on data science

https://www.youtube.com/results?search_query=freecodecamp+data+science

opal mango Sep 2, 2021, 5:42 PM

#

thx

lapis sequoia Sep 2, 2021, 5:48 PM

#

does anyone know how to implement a machine learning api into python code

misty flint Sep 2, 2021, 7:08 PM

#

are you calling the api or coding one

#

those are 2 different topics

#


#

calling an API is relatively easy for most common APIs. look into their documentation, it's usually very helpful

desert oar Sep 2, 2021, 7:22 PM

#

somber prism guys, what is epsilon in svr model in sklearn.svm ? is it learning rate

no. support vector regression is like linear regression, but errors smaller than ε are ignored. so ε controls the tolerance of the model to errors; bigger ε means greater tolerance. this is loosely analogous to C in support vector classification, which controls the tolerance of the model to misclassification (there, a smaller C means greater tolerance).

see:
https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html#sklearn.svm.LinearSVR
https://scikit-learn.org/stable/modules/svm.html#linearsvr
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.4288
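To make the role of ε concrete, here is the ε-insensitive loss that SVR minimizes (together with a regularization term), written out in NumPy: residuals with absolute value at most ε cost nothing, and larger ones are penalized linearly by how far they exceed ε. The values below are made up.

```py
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Elementwise max(0, |y_true - y_pred| - epsilon)."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 2.5, 3.0, 3.0])

print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))  # the 0.05 error is ignored
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.6))  # a bigger epsilon tolerates more
```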


somber prism Sep 2, 2021, 7:24 PM

#

desert oar no. support vector regression is like linear regression, but errors smaller than...

thanks

lapis sequoia Sep 2, 2021, 8:35 PM

#

hi

#

im using jupyter notebooks

#

Screen_Shot_2021-09-02_at_4.35.41_PM.png

#

this happens when i try to load an image (.tif)

#

I dont know why

grave frost Sep 2, 2021, 9:30 PM

#

bleak trout What can we learn from Artificial Intelligence in video games?

"that initially simple mathematical policies can lead to somewhat intelligent and complex behaviours in AI agents, very reminiscent of humans". watch this for an illustration https://www.youtube.com/watch?v=n6nF9WfpPrA

glad mulch Sep 2, 2021, 11:23 PM

#

anyone know why i keep getting this error

#

## using Health Care as BaseLine
exog_vars = ['BOARD_SIZE','CEO_DUALITY','BOARD_AVERAGE_TENURE','BOARD_MEETING_ATTENDANCE_PCT',
             'PCT_WOMEN_ON_BOARD','ROE','CUR_MKT_CAP','ROA','D/E',
             'Consumer Discretionary','Consumer Staples', 'Energy', 'Financials',
             'Industrials', 'Information Technology', 'Materials',
             'Utilities']
exog = sm.tools.tools.add_constant(panel_data[exog_vars])
endog = panel_data['ESG_DISCLOSURE_SCORE']
mod = PanelOLS(endog, exog, entity_effects=True)
PanelOLS4_res = mod.fit(cov_type='clustered', cluster_entity=True)
# Store values for checking homoskedasticity graphically
fittedvals_fe4_OLS = PanelOLS4_res.predict().fitted_values
residuals_fe4_OLS = PanelOLS4_res.resids

#

AbsorbingEffectError                      Traceback (most recent call last)
<ipython-input-50-a32677c6977b> in <module>
      8 endog = panel_data['ESG_DISCLOSURE_SCORE']
      9 mod = PanelOLS(endog, exog, entity_effects=True)
---> 10 PanelOLS4_res = mod.fit(cov_type='clustered', cluster_entity=True)
     11 # Store values for checking homoskedasticity graphically
     12 fittedvals_fe4_OLS = PanelOLS4_res.predict().fitted_values

C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\model.py in fit(self, use_lsdv, use_lsmr, low_memory, cov_type, debiased, auto_df, count_effects, **cov_config)
   1779         if self.entity_effects or self.time_effects or self.other_effects:
   1780             if not self._drop_absorbed:
-> 1781                 check_absorbed(x, [str(var) for var in self.exog.vars])
   1782             else:
   1783                 # TODO: Need to special case the constant here when determining which to retain

C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\utility.py in check_absorbed(x, variables, x_orig)
    441         absorbed_variables = "\n".join(rows)
    442         msg = absorbing_error_msg.format(absorbed_variables=absorbed_variables)
--> 443         raise AbsorbingEffectError(msg)
    444     if x_orig is None:
    445         return

AbsorbingEffectError: 
The model cannot be estimated. The included effects have fully absorbed
one or more of the variables. This occurs when one or more of the dependent
variable is perfectly explained using the effects included in the model.

The following variables or variable combinations have been fully absorbed
or have become perfectly collinear after effects are removed:

          const, BOARD_SIZE, CEO_DUALITY, BOARD_AVERAGE_TENURE, ROE, ROA, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, BOARD_AVERAGE_TENURE, PCT_WOMEN_ON_BOARD, ROE, ROA, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, BOARD_AVERAGE_TENURE, ROA, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, ROA, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, BOARD_AVERAGE_TENURE, BOARD_MEETING_ATTENDANCE_PCT, PCT_WOMEN_ON_BOARD, ROE, ROA, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, BOARD_AVERAGE_TENURE, ROE, ROA, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities
          const, BOARD_SIZE, CEO_DUALITY, BOARD_AVERAGE_TENURE, BOARD_MEETING_ATTENDANCE_PCT, PCT_WOMEN_ON_BOARD, ROE, CUR_MKT_CAP, ROA, D/E, Consumer Discretionary, Consumer Staples, Energy, Financials, Industrials, Information Technology, Materials, Utilities

Set drop_absorbed=True to automatically drop absorbed variables.

desert oar Sep 3, 2021, 12:20 AM

#

@glad mulch it looks like one of your variables is fully explained by one of your effects, as per the error

#

It's possible that you forgot to drop a level in a dummy variable, or you have some combinations of variables that are redundant

#

@glad mulch https://github.com/statsmodels/statsmodels/issues/2568 https://stackoverflow.com/q/55071706 https://github.com/bashtage/linearmodels/issues/193 you need to drop levels of dummy variables

glad mulch Sep 3, 2021, 12:34 AM

#

i thought i did drop a variable

#

i dropped health care

#

@desert oar

desert oar Sep 3, 2021, 12:36 AM

#

It's a bit hard to read on my phone, but it looks like they are suggesting several combinations that are collinear

#

It's possible that your data is messed up

#

In fact, if you can share the data, or a sample of it that reproduces the problem, that might help
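One common cause worth checking (an assumption, since the data wasn't shared here): with `entity_effects=True`, PanelOLS effectively demeans every regressor within each entity, so any variable that never changes within a firm, such as a sector dummy, becomes all zeros and is absorbed no matter which sector is dropped as the baseline. A small pandas sketch of that within transformation on made-up data:

```py
import pandas as pd

# Made-up panel: two firms observed over three years
panel = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B", "B"],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "Energy": [1, 1, 1, 0, 0, 0],                  # sector dummy: fixed within each firm
    "ROA": [0.05, 0.07, 0.04, 0.10, 0.08, 0.12],   # varies within each firm
})

# Entity fixed effects are equivalent to subtracting each firm's own mean
cols = ["Energy", "ROA"]
demeaned = panel[cols] - panel.groupby("firm")[cols].transform("mean")
print(demeaned)

# Columns that are all (numerically) zero after demeaning cannot be estimated
absorbed = [col for col in demeaned if demeaned[col].abs().max() < 1e-12]
print(absorbed)
```

If that is what is happening, the sector dummies cannot be estimated alongside entity effects at all; the option the error message itself points to, `drop_absorbed=True`, drops them automatically.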

glad mulch Sep 3, 2021, 12:39 AM

#

i can only do partial amounts of data

#

to share

arctic wedgeBOT Sep 3, 2021, 12:40 AM

#

Hey @glad mulch!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

glad mulch Sep 3, 2021, 12:40 AM

#

can i dm u

#

@desert oar

orchid lance Sep 3, 2021, 1:06 AM

#

I'm studying data science at a low level and need help creating a multi-objective optimization using goal programming to generate a Pareto frontier.

#

I managed to make it in excel and mathematica but couldn't do so in python

#

Any help at all would be appreciated

#

I did make a post on help-apple
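Not goal programming itself, but a common first step for seeing a Pareto frontier in Python: evaluate candidate solutions on each objective and keep only the non-dominated ones. A minimal pure-Python sketch, assuming both objectives are minimized; the candidate points are made up, and in practice they might come from sweeping goal weights or targets through an optimizer.

```py
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimized).

    p dominates q if p is no worse than q in every objective and strictly better in one.
    """
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    return [q for q in points if not any(dominates(p, q) for p in points)]

# Hypothetical (cost, emissions) pairs for six candidate plans
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 3), (7, 5)]
front = sorted(pareto_front(candidates))
print(front)
```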

bleak trout Sep 3, 2021, 3:08 AM

#

grave frost "that initially simple mathematical policies can lead to somewhat intelligent an...

woah, that was quite a "technical" answer, thanks 🙂

sinful gale Sep 3, 2021, 4:21 AM

#

What do you call the classification categories in a formal way? Like if the dataset has three classification categories like -1,0,1 - what are -1,0,1 called?

tender hearth Sep 3, 2021, 5:29 AM

#

sinful gale What do you call the classification categories in a formal way? Like if the data...

labels

#

targets

sinful gale Sep 3, 2021, 5:30 AM

#

Gotcha

tender hearth Sep 3, 2021, 5:31 AM

#

sinful gale Gotcha

Unless, you're talking about the set of defined labels for your dataset?

#

I'm not sure if there's a separate term but it seems like there should be

sinful gale Sep 3, 2021, 5:32 AM

#

tender hearth Unless, you're talking about the set of defined labels for your dataset?

Nah I think labels is the correct word here
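For the set of distinct labels, scikit-learn's term is classes: a fitted classifier exposes them as its `classes_` attribute, typically in sorted order. The same set can be read straight off the label array with NumPy (the labels below are made up):

```py
import numpy as np

y = np.array([1, -1, 0, 1, 1, -1])  # one label per sample
classes, counts = np.unique(y, return_counts=True)
print(classes)  # the distinct classes
print(counts)   # how many samples carry each label
```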

cold pine Sep 3, 2021, 8:27 AM

#

hello. I have a df with two columns: first 'Class' (one of 8 skills) and second 'isOver30' (True/False).
How do I make a stacked bar chart with the classes on x,
the total count for each class on y, and the True count stacked under the False count?
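One way to do this, sketched with pandas on made-up data (the class names are placeholders): count rows per `Class` and `isOver30` value with `pd.crosstab`, reorder the columns so `True` comes first (pandas stacks columns bottom-up in column order), and call `.plot.bar(stacked=True)` on the result. `reindex` is used rather than `[[True, False]]`, because a list of booleans in `[]` would be read as a mask, not as column labels.

```py
import pandas as pd

# Made-up stand-in for the real data
df = pd.DataFrame({
    "Class": ["Mage", "Mage", "Rogue", "Rogue", "Rogue", "Warrior"],
    "isOver30": [True, False, True, True, False, False],
})

# Rows: classes; columns: counts for True and False, True first so it sits at the bottom
counts = pd.crosstab(df["Class"], df["isOver30"]).reindex(columns=[True, False], fill_value=0)
print(counts)

# With matplotlib installed: counts.plot.bar(stacked=True)
# Each bar's total height is the number of rows for that class.
```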

shut tapir Sep 3, 2021, 8:28 AM

#

Hi guys
Has anyone worked with BERT previously? If so, have you used the transformers library by HuggingFace? I need help running the BERT model offline: the system I will be running this model on will not have an internet connection, so a pre-trained model is the way to go. I'm stuck at this point and not sure how to proceed, please help. Thank you! 🙂

warm wharf Sep 3, 2021, 8:44 AM

#

hi :) i put a really basic question about BCELoss in #help-honey, if anyone could help i would appreciate it, thanks

livid pine Sep 3, 2021, 8:45 AM

#

Hi i need some help how can i dump rows in another df

lapis sequoia Sep 3, 2021, 9:05 AM

#

Hi can someone help me with databases

modest mulch Sep 3, 2021, 10:52 AM

#

lapis sequoia does anyone know how to implement a machine learning api into python code

If you're implementing your model with a TensorFlow backend, check out TF Serving;
otherwise check out Flask if you're looking for a simple API, it's pretty easy to learn

rustic scarab Sep 3, 2021, 1:30 PM

#

Anyone here doing projects on NLP? I'm looking for someone who is learning AI by building some projects.

royal crest Sep 3, 2021, 1:45 PM

#

Yes

hoary wigeon Sep 3, 2021, 1:56 PM

#

Hello

#

Julia vs JavaScript: which of these two is better for ML? : )

royal crest Sep 3, 2021, 1:59 PM

#

Julia

#

(Neither personally)

median cliff Sep 3, 2021, 2:04 PM

#

Are there any resources for stats stuff in regards to checking pre-processing image data for a CNN?

austere swift Sep 3, 2021, 2:17 PM

#

rustic scarab Anyone here doing projects on NLP? Am looking for someone who's is learning AI b...

Yeah i have a couple NLP projects i'm doing for learning

#

whenever i wanna learn a new kind of model or something I try to replicate it from the paper so I can see its functions and such

#

it gives me a good understanding of the inner workings of the model by hard coding them on my own

serene scaffold Sep 3, 2021, 2:24 PM

#

Dataframe:

0,count,b_f1,group,colors,unique_colors,unique_percent
CellLine,39,0.541,Animal,red,#FF0000,0.59
GroupName,939,0.399,Animal,red,#FF7F00,0.569
GroupSize,367,0.625,Animal,red,#FFD400,0.343
SampleSize,43,0.584,Animal,red,#FFFF00,0.791
Sex,574,0.83,Animal,red,#BFFF00,0.099
Species,1585,0.859,Animal,red,#6AFF00,0.047
Strain,372,0.62,Animal,red,#00EAFF,0.387
Dose,637,0.618,Dose,green,#0095FF,0.232
DoseDuration,210,0.545,Dose,green,#0040FF,0.262
DoseDurationUnits,198,0.536,Dose,green,#AA00FF,0.177
DoseFrequency,92,0.513,Dose,green,#FF00AA,0.522
DoseRoute,558,0.563,Dose,green,#EDB9B9,0.317
DoseUnits,472,0.592,Dose,green,#E7E9B9,0.252
TimeAtDose,115,0.276,Dose,green,#B9EDE0,0.617
TimeAtFirstDose,45,0.037,Dose,green,#B9D7ED,0.644
TimeAtLastDose,21,0.0,Dose,green,#DCB9ED,0.571
TimeEndpointAssessed,659,0.5,Dose,green,#8F2323,0.382
TimeUnits,594,0.653,Dose,green,#8F6A23,0.12
TestArticle,1831,0.485,Exposure,orange,#4F8F23,0.281
TestArticlePurity,26,0.213,Exposure,orange,#23628F,0.855
Vehicle,417,0.44,Exposure,orange,#6B238F,0.283
TestArticleVerification,5,0.0,Exposure,orange,#000000,1.0
Endpoint,4316,0.308,Endpoint,blue,#737373,0.844
EndpointUnitOfMeasure,682,0.35,Endpoint,blue,#CCCCCC,0.499

Code:

df.plot.scatter(x='unique_percent', y='b_f1', xlabel='Percent Unique Mentions', ylabel='BERT $F_1$ Score', color=df['unique_colors'])

How do I add a color legend on the side?

#

This is the result:

#

My advisor told me to do it like this. I'm not complaining because it's gay.

rigid zodiac Sep 3, 2021, 2:26 PM

#

Dumb question: I have a really big csv file, like 134,000 rows of data (in a json file). How can I speed up the data processing?

serene scaffold Sep 3, 2021, 2:26 PM

#

rigid zodiac Dump question, I have a really big csv file, like 134000 row of data (in json fi...

It depends on what data processing you are trying to do and how much of it can be done independently.

rigid zodiac Sep 3, 2021, 2:29 PM

#

serene scaffold It depends on what data processing you are trying to do and how much of it can b...

so far I have spent 2 days on it. In the data processing code, I'm trying to convert string data that includes [range, azimuth, doppler] into [x,y,z]

serene scaffold Sep 3, 2021, 2:29 PM

#

rigid zodiac so far I have spent 2 days with it. In the data processing code, I'm trying to c...

how is it stored? a csv?

rigid zodiac Sep 3, 2021, 2:29 PM

#

I did a similar process with other data (12,900 rows) before. Yeah, csv

serene scaffold Sep 3, 2021, 2:30 PM

#

let's just forget how many rows there are. It doesn't matter for figuring out what transformation you need to do

#

Can you provide a few example rows from the CSV as text (no screenshot)?

arctic wedgeBOT Sep 3, 2021, 2:32 PM

#

Hey @rigid zodiac!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

rigid zodiac Sep 3, 2021, 2:32 PM

#

serene scaffold let's just forget how many rows there are. It doesn't matter for figuring out wh...

https://paste.pythondiscord.com/uximecabef.yaml

#

It's just a snippet of it

serene scaffold Sep 3, 2021, 2:43 PM

#

rigid zodiac https://paste.pythondiscord.com/uximecabef.yaml

I can't dive into this at the moment, but if transforming each row can be done independently, you can figure out what the transformation is and do them in parallel

austere lion Sep 3, 2021, 2:54 PM

#

Can i ask a question about Natural Language Processing here?

#

I really can't find the definition of Shallow NLP

austere swift Sep 3, 2021, 2:58 PM

#

austere lion Can i ask a question about Natural Language Processing here?

yeah you can ask anything related to machine learning or data science (which nlp is)

austere swift Sep 3, 2021, 2:59 PM

#

rigid zodiac so far I have spent 2 days with it. In the data processing code, I'm trying to c...

if the rows are independent of each other and it's compute-bound, you can use multiprocessing and create a process pool to compute them in parallel

#

it would be something along the lines of:

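import multiprocessing as mp
if __name__ == "__main__":  # needed where workers are spawned (Windows/macOS)
    with mp.Pool() as pool:
        # each call receives one (index, row) tuple from iterrows()
        results = pool.map(transform_function, df.iterrows())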

#

you will have to mess with results a bit to get it back to a df because it will return a list of the outputs from the transform_function
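A minimal sketch of that round trip, assuming a hypothetical row function (`transform_row`) that returns the row's index together with a dict of new column values. Keeping the index in each result is what makes it easy to rebuild a DataFrame from the plain list that `pool.map` returns. The column names `a` and `b` are made up.

```python
import multiprocessing as mp

import pandas as pd


def transform_row(item):
    """Hypothetical per-row transform.

    df.iterrows() yields (index, row) tuples, so that is what each call receives.
    Returning the index alongside the result lets us rebuild the frame later.
    """
    index, row = item
    return index, {"total": row["a"] + row["b"], "ratio": row["a"] / row["b"]}


def results_to_frame(results):
    """Turn a list of (index, dict) pairs back into a DataFrame."""
    index = [idx for idx, _ in results]
    records = [record for _, record in results]
    return pd.DataFrame(records, index=index)


def parallel_transform(df, func, processes=None, chunksize=100):
    """Apply func to every row of df in a process pool and reassemble the output."""
    with mp.Pool(processes=processes) as pool:
        # chunksize sends rows to workers in batches to cut inter-process overhead
        results = pool.map(func, df.iterrows(), chunksize=chunksize)
    return results_to_frame(results)
```

Call `parallel_transform(df, transform_row)` from inside an `if __name__ == "__main__":` block, since platforms that spawn workers re-import the script in every child. Each row is pickled and shipped to a worker, so this only pays off when the per-row work is expensive compared with that overhead; `results_to_frame(list(map(transform_row, df.iterrows())))` is the serial equivalent and is handy for checking the output first.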

austere lion Sep 3, 2021, 3:01 PM

#

So... can anyone explain shallow NLP, or point me to a source that explains it?

austere swift Sep 3, 2021, 3:02 PM

#

austere lion So.. can anyone explain about shallow npl or where can i get the source that exp...

https://nlp.stanford.edu/projects/shallow-parsing.shtml I found this on shallow parsing, i'm not sure if it's what you're talking about though


austere lion Sep 3, 2021, 3:08 PM

#

austere swift https://nlp.stanford.edu/projects/shallow-parsing.shtml I found this on shallow ...

So, it's written "In NLP for text-retrieval, need robust approach and efficient (Shallow NLP)"

#

i dunno what it means, so i try to find out

mortal dove Sep 3, 2021, 3:22 PM

#

serene scaffold Dataframe: ```0,count,b_f1,group,colors,unique_colors,unique_percent CellLine,39...

import matplotlib.patches as mpatches

ax = df.plot.scatter(x='unique_percent', y='b_f1', xlabel='Percent Unique Mentions', ylabel='BERT $F_1$ Score', color=df['unique_colors'])

colorlist = zip(df['0'], df['unique_colors'])
handles = [mpatches.Patch(color=colour, label=label) for label, colour in colorlist]
labels = df['0']

ax.legend(handles, labels, loc='center right', bbox_to_anchor=(1.5, 0.5));
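A possibly simpler matplotlib-only variant, sketched with made-up labels and colours: plotting each point with its own `ax.scatter` call and a `label=` lets `ax.legend()` build the key automatically, so no proxy `Patch` handles are needed. The column names mirror the snippet above; the data itself is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data with the same column names as the snippet above
df = pd.DataFrame({
    "0": ["A", "B", "C"],
    "unique_percent": [12.5, 40.0, 73.2],
    "b_f1": [0.61, 0.74, 0.88],
    "unique_colors": ["#FF0000", "#00AA00", "#0000FF"],
})

fig, ax = plt.subplots()
for label, x, y, colour in zip(df["0"], df["unique_percent"], df["b_f1"], df["unique_colors"]):
    # one call per point, so each point gets its own legend entry
    ax.scatter(x, y, color=colour, label=label)

ax.set_xlabel("Percent Unique Mentions")
ax.set_ylabel("BERT $F_1$ Score")
# place the key just outside the right edge of the axes
legend = ax.legend(loc="center left", bbox_to_anchor=(1.02, 0.5))
```

This creates one artist per point, which is fine for a few dozen labelled points but would get slow for thousands.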

gray verge Sep 3, 2021, 3:24 PM

#

Any resources?

serene scaffold Sep 3, 2021, 3:24 PM

#

@mortal dove seems like an unnecessarily complicated solution, but I suppose matplotlib might not support a better one

mortal dove Sep 3, 2021, 3:24 PM

#

Yea, can't find an easier way, seaborn might have a simpler solution

serene scaffold Sep 3, 2021, 3:26 PM

#

@mortal dove you can add it to the so question I posted, but the question is posed differently https://stackoverflow.com/questions/69046713/create-a-color-coded-key-for-a-matplotlib-scatter-plot-with-specific-colors


#

Also if you post it there, please don't include the real labels.

mortal dove Sep 3, 2021, 3:30 PM

#

Right, will do

#

seaborn is indeed a lot easier if you can use it

#

import seaborn as sns
ax = sns.scatterplot(x=df['unique_percent'], y=df['b_f1'], hue=df['0'], palette=dict(zip(df['0'], df['unique_colors'])))  # dict palette maps each label to its own colour

#

thorn bobcat Sep 3, 2021, 3:44 PM

#

5 Steps to Machine Learning with Google(Completely Free)
https://medium.com/@TheUndergrad/5-steps-to-machine-learning-with-google-completely-free-ea5cddf39ddd


rustic scarab Sep 3, 2021, 4:06 PM

#

austere swift whenever i wanna learn a new kind of model or something I try to replicate it fr...

That's so cool

rustic scarab Sep 3, 2021, 4:10 PM

#

austere swift Yeah i have a couple NLP projects i'm doing for learning

can we collab on some projects related to NLP?

austere swift Sep 3, 2021, 4:15 PM

#

Nah I don’t really collab with people, sorry

quick kestrel Sep 3, 2021, 4:24 PM

#

Guys how can I start learning ai and data science

rigid zodiac Sep 3, 2021, 4:34 PM

#

quick kestrel Guys how can I start learning ai and data science

start with hitting the book or coursera

quick kestrel Sep 3, 2021, 4:34 PM

#

@rigid zodiac well I wanna learn ai and machine learning

untold parrot Sep 3, 2021, 4:46 PM

#

quick kestrel <@380930360407621646> well I wanna learn ai and machine learning

Do you have any specific thing to learn in mind? Like you wanna know more about text processing or image? or you just wanna learn more about them?

#

if you're new to ML check out Kaggle courses, maybe you can start with intro to machine learning course.

quick kestrel Sep 3, 2021, 4:49 PM

#

@untold parrot I am making a discord bot but I want it to learn about the way users talk and make it talk to them

#

Basically a chatbot

misty flint Sep 3, 2021, 4:50 PM

#

serene scaffold <@217713754094305290> you can add it to the so question I posted, but the questi...

i think plotly could do it better. itll also allow you to have hover text for the labels which i think would be helpful in this case

quick kestrel Sep 3, 2021, 4:50 PM

#

Btw what's the download size of tensorflow

serene scaffold Sep 3, 2021, 4:51 PM

#

misty flint i think plotly could do it better. itll also allow you to have hover text for th...

It's going in a PDF

misty flint Sep 3, 2021, 4:51 PM

#

quick kestrel <@759776646194135061> I am making a discord bot but I want it to learn about the...

youll need a database to store that information if its a certain server

quick kestrel Sep 3, 2021, 4:51 PM

#

misty flint youll need a database to store that information if its a certain server

U mean I don't need ML or u mean I will require a db too

misty flint Sep 3, 2021, 4:51 PM

#

serene scaffold It's going in a PDF

ah gotcha nvm. i was thinking with a deployment mindset

misty flint Sep 3, 2021, 4:52 PM

#

quick kestrel U mean I don't need ML or u mean I will require a db too

the latter

quick kestrel Sep 3, 2021, 4:52 PM

#

Ok so I need a db too

#

And I have that

#

Now I wanna learn ml for my bot but how @misty flint

misty flint Sep 3, 2021, 4:53 PM

#

plenty of tutorials. you dont need to learn all of ML first to do what you need

quick kestrel Sep 3, 2021, 4:53 PM

#

U mean I need to learn just a bit of ml

#

?

misty flint Sep 3, 2021, 4:56 PM

#

you need to learn the bare minimum. how is your chatbot going to learn from your dataset? what do you want it to say? etc.

quick kestrel Sep 3, 2021, 4:57 PM

#

misty flint you need to learn the bare minimum. how is your chatbot going to learn from your...

I want it to learn from the chat of the users

misty flint Sep 3, 2021, 4:57 PM

#

yes but what is it going to say in response

quick kestrel Sep 3, 2021, 4:58 PM

#

misty flint yes but what is it going to say in response

The thing it learned

misty flint Sep 3, 2021, 4:58 PM

#

theres different approaches to your problem so think about what youre looking for and then search through some of the tutorials. come back when you have a more specific problem, yeah?

untold parrot Sep 3, 2021, 5:09 PM

#

quick kestrel <@759776646194135061> I am making a discord bot but I want it to learn about the...

google discord bot and you can find many tutorials, see which one is close to what you want and follow those

sand timber Sep 3, 2021, 5:29 PM

#

anyone got good links for deciding on NN archs? e.g. how many conv layers, what kernel sizes, pooling types, etc

lapis sequoia Sep 3, 2021, 6:00 PM

#

Is there some good project to start with numpy and pandas? rooThink
These are both in my school syllabus and I'm thinking of learning them through a project, rather than just trying out each method one by one.. coz that's very boring blobpain

soft kite Sep 3, 2021, 6:25 PM

#

shot in the dark. does anyone here have experience with matplotlib for graphing data?

wraith lagoon Sep 3, 2021, 6:26 PM

#

while using pandas i came across statements df.sort_values('tip',ascending=False)[0:2] and df.sort_values('tip',ascending=False).iloc[0:2] both the statements gave me same result how?

#

i asked this in help channel but no reply

ripe forge Sep 3, 2021, 6:30 PM

#

pandas tends to give you multiple ways of doing the same thing.

#

the first uses a slice. the second slices an iloc. that's about it really, both do the same thing here.

#

(unless you're saying that they didn't give the same result; that would be a different question entirely)

wraith lagoon Sep 3, 2021, 6:37 PM

#

ripe forge pandas tends to give you multiple ways of doing the same thing.

when i used iloc it gave me data in some type of format

#

but here the results were in tables

ripe forge Sep 3, 2021, 6:39 PM

#

could you rephrase your question? are you getting same results from both or different

wraith lagoon Sep 3, 2021, 6:41 PM

#

same but iloc give rows

#

if we slice using iloc shouldn't we get rows excluding the header row

#

for example ['name','age']['abc',123]..... so in this name and age are column names

#

so if i use iloc they should be excluded

ripe forge Sep 3, 2021, 6:43 PM

#

oh, i see where you're coming from. when you select a row using iloc, you get the row. but when you slice, you still get the dataframe, so column names will be there.

wraith lagoon Sep 3, 2021, 6:44 PM

#

ripe forge oh, i see where you're coming from. when you select a row using iloc, you get th...

ya

ripe forge Sep 3, 2021, 6:44 PM

#

i.e. what you're saying would only be true if you were talking about .iloc[2]. .iloc[0:2] gives a slice of df, so no, the column names should be there, as intended

desert oar Sep 3, 2021, 6:45 PM

#

wraith lagoon while using pandas i came across statements df.sort_values('tip',ascending=False...

This will only give you the same results if df has integer indexes or RangeIndex

#

Actually maybe there is a special case in there for slices of integers

#

I try to avoid using implicit row slicing/subsetting whenever possible

wraith lagoon Sep 3, 2021, 6:46 PM

#

ripe forge ie. what you're saying would only be true if you were talking about `.iloc[2]`. ...

in .iloc[2] there is a type format its not in table form

ripe forge Sep 3, 2021, 6:46 PM

#

yes, it's a series

desert oar Sep 3, 2021, 6:46 PM

#

I always use loc/iloc for rows

ripe forge Sep 3, 2021, 6:46 PM

#

this only applies to selection, not to slices.

wraith lagoon Sep 3, 2021, 6:47 PM

#

so if we select multiple series it should be same

desert oar Sep 3, 2021, 6:47 PM

#

Calling iloc with a slice or something "array like" returns a DataFrame. Calling it with a "scalar" returns a Series

wraith lagoon Sep 3, 2021, 6:48 PM

#

so is there any way that i can specify that it's a header row and it should not be included

desert oar Sep 3, 2021, 6:48 PM

#

I think you're confusing the content of the table with its display on the screen

wraith lagoon Sep 3, 2021, 6:48 PM

#

desert oar I always use `loc`/`iloc` for rows

whats the difference b/w loc and iloc

desert oar Sep 3, 2021, 6:49 PM

#

That or you need to pass different options to read_csv
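If the column names really did end up as a data row, `read_csv`'s `header` argument is the usual culprit. A sketch with an in-memory CSV (the data is made up) showing both outcomes:

```python
from io import StringIO

import pandas as pd

csv_text = "name,age\nabc,123\ndef,45\n"

# header=None says the file has no header line, so "name,age" becomes data row 0
wrong = pd.read_csv(StringIO(csv_text), header=None)

# The default (header=0) uses the first line as the column names instead
right = pd.read_csv(StringIO(csv_text))

print(wrong)
print(right)
```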

ripe forge Sep 3, 2021, 6:49 PM

#

this might be an XY problem. a dataframe always has column names. maybe your goal can be achieved regardless depending on what you're actually facing

desert oar Sep 3, 2021, 6:49 PM

#

wraith lagoon whats the difference b/w loc and iloc

The former uses the row labels, the "index", the latter always uses the numerical position of the row in the table

wraith lagoon Sep 3, 2021, 6:49 PM

#

oh

desert oar Sep 3, 2021, 6:49 PM

#

It's a common newbie trap, because the default row labels are also integers
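A small sketch of that trap, using a hypothetical `tips` frame: after `sort_values` the row labels are shuffled, so position-based and label-based selection stop agreeing.

```python
import pandas as pd

# Hypothetical data; the default index labels are 0, 1, 2, 3
tips = pd.DataFrame({"tip": [1.0, 3.5, 2.0, 5.0]})
top = tips.sort_values("tip", ascending=False)  # labels are now 3, 1, 2, 0

first_two = top.iloc[0:2]      # positions 0 and 1 -> labels 3 and 1
same_rows = top[0:2]           # an integer slice inside [] is also positional
one_row = top.iloc[0]          # a scalar position returns a Series
one_row_frame = top.iloc[[0]]  # a list of positions returns a DataFrame
by_label = top.loc[[0, 1]]     # labels 0 and 1, i.e. the rows with tips 1.0 and 3.5

print(first_two, one_row, by_label, sep="\n\n")
```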

#

As for your actual question, are you just trying to print the data without the column names? Or did you somehow accidentally get the column names into a row in the dataframe?

wraith lagoon Sep 3, 2021, 6:51 PM

#

thing is that when i did for single iloc it gave me ans in series

#

but i want multiple rows

#

not columns

rigid zodiac Sep 3, 2021, 6:52 PM

#

you can range it

desert oar Sep 3, 2021, 6:52 PM

#

Then use a slice or a list of numbers, not a single number

rigid zodiac Sep 3, 2021, 6:52 PM

#

like df.iloc[: , : ]

wraith lagoon Sep 3, 2021, 6:52 PM

#

rigid zodiac you can range it

but can i fix my header row

rigid zodiac Sep 3, 2021, 6:52 PM

#

sure

desert oar Sep 3, 2021, 6:52 PM

#

what's this business about the header row?

wraith lagoon Sep 3, 2021, 6:52 PM

#

that it should not be included in iloc

desert oar Sep 3, 2021, 6:53 PM

#

@wraith lagoon can you show us the output you're getting and explain what you want to be different?

rigid zodiac Sep 3, 2021, 6:53 PM

#

wraith lagoon that it should not be included in iloc

I'm pretty sure when you slice like that, it will have the header row

desert oar Sep 3, 2021, 6:54 PM

#

it will display the header row because that's part of the dataframe. but the column names are not "part of" the data, they're kept in a separate place

wraith lagoon Sep 3, 2021, 6:54 PM

#

desert oar it will _display_ the header row because that's part of the dataframe. but the c...

ok

rigid zodiac Sep 3, 2021, 6:54 PM

#

Btw, has anyone worked on an LSTM or CNN model before? If yes, can I have your code to use as a sample?

wraith lagoon Sep 3, 2021, 6:55 PM

#

rigid zodiac Btw Have any one work on the LSTM model? or CNN model before? If yes can I have ...

i started learning pandas

heady yarrow Sep 3, 2021, 6:55 PM

#

!e ```py
import os
os.system("ls")

arctic wedgeBOT Sep 3, 2021, 6:55 PM

#

@heady yarrow :warning: Your eval job has completed with return code 0.

[No output]

heady yarrow Sep 3, 2021, 6:55 PM

#

bruh

desert oar Sep 3, 2021, 6:55 PM

#

!e ```python
import pandas as pd

df = pd.DataFrame([
['a', 11, 12],
['b', 21, 22],
['c', 32, 32],
], columns=['i', 'y', 'z']).set_index('i')

print(df)
print( df.iloc[1] )
print( df.iloc[[1]] )
print( df.iloc[:2] )
print( df.iloc[[0, 2]] )

arctic wedgeBOT Sep 3, 2021, 6:55 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |     y   z
002 | i        
003 | a  11  12
004 | b  21  22
005 | c  32  32
006 | y    21
007 | z    22
008 | Name: b, dtype: int64
009 |     y   z
010 | i        
011 | b  21  22
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ewezobaveh.txt?noredirect

wraith lagoon Sep 3, 2021, 6:55 PM

#

wraith lagoon i started learning pandas

don't know much about it

desert oar Sep 3, 2021, 6:56 PM

#

@wraith lagoon https://pandas.pydata.org/docs/user_guide/indexing.html

#

@heady yarrow #bot-commands

wraith lagoon Sep 3, 2021, 6:56 PM

#

desert oar it will _display_ the header row because that's part of the dataframe. but the c...

so when i do df[df.describe().transpose()] this should also work

#

but this doesn't work

desert oar Sep 3, 2021, 6:57 PM

#

...what did you expect that to do?

wraith lagoon Sep 3, 2021, 6:57 PM

#

just to transpose the dataframe

#

and to overwrite the previous one

desert oar Sep 3, 2021, 6:58 PM

#

i suggest reading the docs 🙂

#

df = df.transpose()

#

don't guess at syntax
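`df[...]` is a selection, not a transformation: given a DataFrame, pandas expects a boolean mask there, so passing the numeric output of `describe()` raises an error instead of transposing anything. A sketch of the two operations that were probably being mixed up, with made-up numbers:

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({"y": [11, 21, 32], "z": [12, 22, 32]}, index=["a", "b", "c"])

flipped = df.transpose()             # same data, rows and columns swapped (df.T is shorthand)
summary = df.describe().transpose()  # one row per original column, one column per statistic

print(flipped)
print(summary)
```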

rigid zodiac Sep 3, 2021, 6:58 PM

#

wraith lagoon just to transpose the dataframe

https://thispointer.com/drop-first-row-of-pandas-dataframe-3-ways/


desert oar Sep 3, 2021, 6:58 PM

#

!d pandas.DataFrame.describe

arctic wedgeBOT Sep 3, 2021, 6:58 PM

#

pandas.DataFrame.describe


DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding `NaN` values.

Analyzes both numeric and object series, as well as `DataFrame` column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

desert oar Sep 3, 2021, 6:59 PM

#

!d pandas.DataFrame.transpose

arctic wedgeBOT Sep 3, 2021, 6:59 PM

#

pandas.DataFrame.transpose


DataFrame.transpose(*args, copy=False)
Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property [`T`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.T.html#pandas.DataFrame.T "pandas.DataFrame.T") is an accessor to the method [`transpose()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html#pandas.DataFrame.transpose "pandas.DataFrame.transpose").

wraith lagoon Sep 3, 2021, 7:02 PM

#

Me totally confused

wraith lagoon Sep 3, 2021, 7:59 PM

#

Also, why is np.vectorize faster than the apply command in pandas? I searched but didn't find out what happens in the backend of np.vectorize

#

There was a big time difference

#

With apply the time was 3.something and with vectorize it was 0.something

#

So why is there such a vast difference?
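Without the original code this is a guess, but a common explanation: `df.apply(func, axis=1)` builds a pandas Series for every row before calling `func`, and that per-row object construction dominates the runtime. NumPy's docs describe `np.vectorize` as provided for convenience rather than performance ("essentially a for loop"), but it calls `func` on plain scalars and skips the Series overhead. Rewriting the function as whole-column arithmetic is usually faster than both. A sketch with a hypothetical tip-percentage function and random data:

```python
import numpy as np
import pandas as pd

# Hypothetical data
rng = np.random.default_rng(0)
df = pd.DataFrame({"tip": rng.uniform(1, 10, 10_000), "total_bill": rng.uniform(10, 100, 10_000)})


def tip_pct(tip, total_bill):
    """Hypothetical scalar function: tip as a percentage of the bill."""
    return 100 * tip / total_bill


# 1. Row-wise apply: a Series is constructed for each of the 10,000 rows
via_apply = df.apply(lambda row: tip_pct(row["tip"], row["total_bill"]), axis=1)

# 2. np.vectorize: still a Python-level loop, but over plain scalars
via_vectorize = np.vectorize(tip_pct)(df["tip"].to_numpy(), df["total_bill"].to_numpy())

# 3. Real vectorisation: the arithmetic runs once over whole columns in compiled code
via_columns = tip_pct(df["tip"], df["total_bill"])
```

All three give the same numbers; wrapping each line in `%timeit` in Jupyter shows the relative speeds on your own machine.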

serene scaffold Sep 3, 2021, 8:30 PM

#

I can't figure out how to calculate micro averages with only the scores to be averaged and the percent share.

uncut barn Sep 3, 2021, 8:56 PM

#

is there a way for python (e.g. in a jupyter notebook) to access files directly on the USB, i.e. without downloading the file locally first?

#

or other IDEs?

mint matrix Sep 3, 2021, 9:00 PM

#

I am working on an assignment in networkx and need a little help understanding the math and the graph, if someone could explain it to me:
The 2-dimensional lattice generated by vectors u,v consists of the points au+bv for all integers a,b. Considering vectors u,v, and an integer k>0 as input parameters, write a program creating a planar graph of a lattice generated by u,v, whose vertices are the origin (0,0) and only its k closest neighbors among the lattice points, while any vertices related by a translation by only u, v, u+v or u-v should be connected by a straight-line edge.
I have to implement it in networkx

#

but could not understand how to start

livid kiln Sep 3, 2021, 10:12 PM

#

cold pine yo! i have a problem. I'd like to plot top 5 countries to see if they change acr...

https://flourish.studio/2019/03/21/bar-chart-race/ this comes to mind, check it out!


serene scaffold Sep 3, 2021, 10:21 PM

#

uncut barn is there a way for python (i.e using jupyter notebook) to acess files directly o...

The ide that you use doesn't matter, as you provide a path and python requests the data from the operating system
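For example, a USB stick shows up as a drive letter or mount point, so a path into it works like any other path. The mount points in the comments are assumptions; check where your OS actually mounts the drive. `load_csvs` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


def load_csvs(folder):
    """Read every CSV directly from folder (e.g. on a USB drive) without copying it first."""
    folder = Path(folder)
    return {path.name: pd.read_csv(path) for path in sorted(folder.glob("*.csv"))}


# Hypothetical mount points: a drive letter on Windows, /media/<user>/<label> on many
# Linux desktops, /Volumes/<label> on macOS
# frames = load_csvs("E:/data")
# frames = load_csvs("/media/me/USBSTICK/data")
```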

soft kite Sep 3, 2021, 11:08 PM

#

Would this be the right chat to ask about how to label items in bar charts in matplotlib python library?

desert oar Sep 3, 2021, 11:45 PM

#

wraith lagoon Also why np.vectorize is faster then apply command in pandas . i searched but di...

Show the code you used

wanton spear Sep 4, 2021, 6:50 AM

#

#

guys i can't find a way to set individual ranges for bins... google didn't help much

crystal harness Sep 4, 2021, 9:55 AM

#

i want to be senpai in ML how long it will take me?

robust cosmos Sep 4, 2021, 10:18 AM

#

guys could anyone help in this code?

#

throws this error: '[2927] not found in axis'

flat hollow Sep 4, 2021, 10:18 AM

#

wanton spear

if you're plotting this using sns.distplot() you can pass a sequence of bin edges to the bins= kwarg
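`distplot` has since been deprecated in seaborn (0.11 onward) in favour of `histplot`, but the idea is the same there and in plain matplotlib/NumPy: pass a list of bin edges instead of a bin count, and the edges may be unevenly spaced. A sketch with made-up ages and edges:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

ages = np.array([3, 8, 15, 17, 22, 25, 31, 45, 52, 67, 70, 88])  # hypothetical data
edges = [0, 12, 18, 30, 65, 100]  # individual, uneven bin ranges

# counts per bin; each bin is [left, right) except the last, which also includes 100
counts, _ = np.histogram(ages, bins=edges)

fig, ax = plt.subplots()
ax.hist(ages, bins=edges, edgecolor="black")  # sns.histplot(ages, bins=edges) accepts the same edges
```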

robust cosmos Sep 4, 2021, 10:19 AM

#

this is for dropping random rows in a df

robust cosmos Sep 4, 2021, 10:19 AM

#

robust cosmos guys could anyone help in this code?

np.random.seed(np.random.randint(1000))
rand_values=[]
for i in range(2000):
    rand_values.append(np.random.randint(5999))
for i in range(2000):
    data=data.drop(rand_values[i])

flat hollow Sep 4, 2021, 10:23 AM

#

robust cosmos np.random.seed(np.random.randint(1000)) rand_values=[] for i in range(2000): r...

is your dataframe indexed using integers? .drop() requires the label matches what's in your dataframe. If you want to use iloc[] for this, you could do data.drop(data.iloc[rand_values[i]].name) or sth like that

#

perhaps even better solution than what you're trying to get working would be something like

import numpy as np

remove_n = 1
# replace=False: no label is picked twice
drop_indices = np.random.choice(df.index, remove_n, replace=False)
df_subset = df.drop(drop_indices)

robust cosmos Sep 4, 2021, 10:24 AM

#

the indexes are all integers and sorted , i just want to drop random rows .

flat hollow Sep 4, 2021, 10:25 AM

#

this would take a random choice of the actual indices that exist in your dataframe and pass them into .drop() for removing

velvet thorn Sep 4, 2021, 10:26 AM

#

robust cosmos the indexes are all integers and sorted , i just want to drop random rows .

.sample

velvet thorn Sep 4, 2021, 10:27 AM

#

robust cosmos np.random.seed(np.random.randint(1000)) rand_values=[] for i in range(2000): r...

np.random.seed(np.random.randint(1000))
this part doesn't make sense btw

robust cosmos Sep 4, 2021, 10:28 AM

#

flat hollow perhaps even better solution than what you're trying to get working would be som...

works just fine thanks a lot

robust cosmos Sep 4, 2021, 10:29 AM

#

velvet thorn > np.random.seed(np.random.randint(1000)) this part doesn't make sense btw

i have no idea why i did that, i was a little confused, thanks anyway

robust cosmos Sep 4, 2021, 10:30 AM

#

velvet thorn `.sample`

by updating frac? yeah i could do that

velvet thorn Sep 4, 2021, 10:30 AM

#

robust cosmos by updating frac? yeah i could do that

yeah, I'm guessing you want 1/3 of values

#

or you could just pass 2000

#

works too
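A likely cause of the original `'[2927] not found in axis'` error: `np.random.randint` draws with replacement, so 2000 draws from 5999 values will almost certainly contain repeats, and the second `drop` of an already-removed label fails. `.sample` draws without replacement. A sketch with a hypothetical 6000-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer index 0..5999, as in the question
data = pd.DataFrame({"value": np.arange(6000)})

# Pick 2000 distinct row labels (sample draws without replacement) and drop them
to_drop = data.sample(n=2000, random_state=42).index
remaining = data.drop(to_drop)

# Alternatively, keep a random 4000 rows directly (a different random selection)
kept = data.sample(n=len(data) - 2000, random_state=42)
```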

robust cosmos Sep 4, 2021, 10:31 AM

#

velvet thorn yeah, I'm guessing you want 1/3 of values

thank you so much been trying so hard and it was this ez

velvet thorn Sep 4, 2021, 10:31 AM

#

robust cosmos thank you so much been trying so hard and it was this ez

yw 👋

crystal harness Sep 4, 2021, 10:33 AM

#

Screenshot_2021-09-04-16-17-53-762_org.mozilla.firefox.jpg

#

Guys I am using matplotlib on Android

serene scaffold Sep 4, 2021, 11:01 AM

#

crystal harness i want to be senpai in ML how long it will take me?

This is something you'd spend a career on

uncut barn Sep 4, 2021, 11:05 AM

#

import os

import numpy as np
from openslide import OpenSlide

def get_patches(imgfolder, lvl = 0, size = 512):
  """
  This function takes in images at a certain resolution level,
  extracts square patches of size (size, size) and saves them as
  PNGs in one folder per image

  parameters:
  imgfolder - iterable of paths to imgs of file type NDPI
  lvl - resolution level of the img
  size - side of the square patch, so int -> (width, height) = (size, size)
  """
  out_root = '/content/drive/MyDrive/Data/older_retics_patches'

  # we get the number of the image with the file name
  for img_num, imgfile in enumerate(imgfolder, start = 1):

    # using openslide to open the img file
    os_img = OpenSlide(imgfile)
    out_dir = os.path.join(out_root, 'img' + str(img_num))
    os.makedirs(out_dir, exist_ok=True)

    # number of whole patches along each axis at this lvl
    # (integer division, because range() needs ints)
    os_img_w_interval = os_img.level_dimensions[lvl][0] // size
    os_img_h_interval = os_img.level_dimensions[lvl][1] // size
    # read_region takes its location in level-0 pixels, so scale by the downsample factor
    downsample = os_img.level_downsamples[lvl]

    # i: patch index along the width (x)
    for i in range(os_img_w_interval):
      # j: patch index along the height (y)
      for j in range(os_img_h_interval):
        # getting a patch of the img
        location = (int(i * size * downsample), int(j * size * downsample))
        ptch = os_img.read_region(location, lvl, (size, size))
        arr = np.array(ptch)
        # exclude the last channel of RGBA - as all values are 255 so opaque
        RGB = arr[:, :, :-1]
        # trying to get rid of all background pxs using a min px threshold
        # and proportion of green px vals
        flat_RGB = RGB.reshape(-1, 3)
        green = flat_RGB[:, 1] # just green pxs
        # proportion of bright green pxs
        prop = len([px for px in green if px > 180]) / len(green)
        # min px in the patch
        min_px = np.min(RGB)
        if (min_px < 130) or (prop < 0.8):
          # saving the image in PNG format into this image's directory
          ptch.save(os.path.join(out_dir, 'patch_' + str(i) + '_' + str(j) + '.png'))

    os_img.close()

Guys, is there a way I can speed up this code? This function extracts patches from NDPI images, which range from 120 MB to 3 GB in my data set
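One likely hotspot is the background test: the list comprehension walks all 512 × 512 = 262,144 green values in Python for every patch, whereas a NumPy comparison does the same check in compiled code. A sketch that keeps the thresholds from the code above; `is_tissue` is a hypothetical helper name.

```python
import numpy as np


def is_tissue(rgba, green_thresh=180, min_px_thresh=130, green_prop_thresh=0.8):
    """Return True if a patch looks like tissue rather than background.

    Same rule as the loop version: keep the patch if its darkest RGB value is
    below min_px_thresh, or if less than green_prop_thresh of its pixels have
    a green value above green_thresh.
    """
    rgb = np.asarray(rgba)[:, :, :3]       # drop the alpha channel (works on a PIL image too)
    green = rgb[:, :, 1]
    prop = np.mean(green > green_thresh)   # fraction of bright-green pixels, no Python loop
    return bool(rgb.min() < min_px_thresh or prop < green_prop_thresh)


# Inside the loop: if is_tissue(ptch): ptch.save(...)
```

Since each slide is handled independently, processing several slides in separate processes (for example `multiprocessing.Pool.map` over the file list) may also help, though with files this large disk throughput could end up being the limit.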

boreal wasp Sep 4, 2021, 11:18 AM

#

hi, may I know what is nltk?

#

stopwords

ripe forge Sep 4, 2021, 11:21 AM

#

Stopwords are just those filler words we use in English that may not be necessary to convey meaning for a machine.

#

So for example, with articles: "he is a man" would still retain its meaning more or less if you changed it to "he is man". It may sound weird, but a machine learns from whatever you send it anyway. The benefit is that you reduce the number of words needed

#

So it can lead to better performance for simple models

boreal wasp Sep 4, 2021, 11:22 AM

#

Ahh I see

#

Thank you

serene scaffold Sep 4, 2021, 11:29 AM

#

ripe forge So for example, articles "he is a man" would still retain the meaning more or le...

"man" would be the only non stopword in a lot of systems.

ripe forge Sep 4, 2021, 12:03 PM

#

Yeap yep. But I think it makes explaining it harder, so I'm perfectly happy to fudge the specifics a little to convey the concept easier.
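A toy version of stopword removal. The stopword set here is a tiny hand-picked stand-in; NLTK's actual English list comes from `nltk.corpus.stopwords.words('english')` after a one-time `nltk.download('stopwords')`.

```python
# Tiny hypothetical stopword set; real lists such as NLTK's are much longer
STOPWORDS = {"a", "an", "the", "he", "she", "is", "are", "was", "of", "to", "and"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lower-case, split on whitespace and drop any token found in the stopword set."""
    return [token for token in text.lower().split() if token not in stopwords]


print(remove_stopwords("He is a man"))  # ['man']
```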

quick kestrel Sep 4, 2021, 12:10 PM

#

Guys how can I make an image identifier

#

Using ai

serene scaffold Sep 4, 2021, 12:25 PM

#

quick kestrel Guys how can I make a image identifier

what are you trying to identify?

boreal wasp Sep 4, 2021, 12:25 PM

#

https://discordapp.com/channels/267624335836053506/342318764227821568/883670432807542784

#

Need help on this

#

Hi, I wonder how to grab values starting with '#'?
I've tried using it like this... but it returns an error
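The attempted code isn't shown, so this is a guess at the usual pattern, assuming the values sit in a pandas string column (the name `text` is hypothetical): `str.startswith` builds a boolean mask, and `na=False` stops missing values from breaking it.

```python
import pandas as pd

df = pd.DataFrame({"text": ["#python", "hello", "#ml", None, "a#b"]})  # hypothetical data

mask = df["text"].str.startswith("#", na=False)  # True only where the string begins with '#'
hashtags = df.loc[mask, "text"]

print(hashtags.tolist())  # ['#python', '#ml']
```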

#

quick kestrel Sep 4, 2021, 12:30 PM

#

@serene scaffold I want to identify images

serene scaffold Sep 4, 2021, 12:30 PM

#

quick kestrel <@253696366952316929> I want to identify images

Please be more specific

quick kestrel Sep 4, 2021, 12:31 PM

#

Like, I want to make an anti-NSFW system in my discord bot, but how will it identify NSFW content? That's where I thought of using AI: if any NSFW content is posted, my bot will immediately delete it

quick kestrel Sep 4, 2021, 12:32 PM

#

quick kestrel Like I want to make a anti NSFW system in my discord bot but how will it identif...

@serene scaffold

serene scaffold Sep 4, 2021, 12:33 PM

#

quick kestrel Like I want to make a anti NSFW system in my discord bot but how will it identif...

I imagine that's going to be exceptionally difficult as "nsfw content" could be a lot of things. I guess I would see what literature is out there about identifying nudity in images and go from there.

quick kestrel Sep 4, 2021, 12:34 PM

#

serene scaffold I imagine that's going to be exceptionally difficult as "nsfw content" could be ...

Well is it possible with ai to make an anti-NSFW system

#

?

serene scaffold Sep 4, 2021, 12:34 PM

#

quick kestrel Well is it possible with ai to make a anti NSFW

it is possible yes.

quick kestrel Sep 4, 2021, 12:35 PM

#

serene scaffold it is *possible* yes.

How?

serene scaffold Sep 4, 2021, 12:35 PM

#

quick kestrel How?

I don't personally know, though I already made a suggestion about what you could do next.

quick kestrel Sep 4, 2021, 12:36 PM

#

@serene scaffold what lib shall I use for this???

serene scaffold Sep 4, 2021, 12:36 PM

#

quick kestrel <@253696366952316929> what lib shall I use for this???

idk

quick kestrel Sep 4, 2021, 12:36 PM

#

Aah

serene scaffold Sep 4, 2021, 12:36 PM

#

I have nothing else to suggest that I haven't already suggested.

errant flame Sep 4, 2021, 12:42 PM

#

I just started with machine learning, so sorry if I made a mistake or something.

So I watched a few tutorials on deep reinforcement learning with tensorflow and keras. In the build-model step, they added different numbers of Flatten, Dense, and Convolution2D layers. In CartPole-v0, it was Flatten, Dense, Dense, Dense(actions). In one where the states were a number from 0 to 100, there was no Flatten. And in the one with Space Invaders, they added 3 Convolution2D layers.

How do I know which layers to add and which arguments to give? I'm currently trying to build one with an input of an 11 x 15 2D array.

dire crane Sep 4, 2021, 1:34 PM

#

quick kestrel How?

You can create a simple model with tensorflow, using hentai images as your dataset. I recommend taking at least 1000 NSFW images and 1000 non-NSFW images, labeling them as [nsfw, non-nsfw], and then training your model with any neural network architecture.

#

The idea is to create a model that can detect whether an image is NSFW or not. It's possible to improve it from there

dire crane Sep 4, 2021, 1:37 PM

#

dire crane The idea is to create a model thtat can detect if a image if nsfw or not. Its po...

'cuz here we are just classifying, but it's possible to create a model that both detects and classifies. For detection and classification there are specific models, like YOLO, SSD....

dire crane Sep 4, 2021, 1:44 PM

#

errant flame I just started with machine learning, so sorry if I made a mistake or something....

If I'm not mistaken, the keras docs explain most of the parameters of every layer and what each layer does. I recommend reading the keras docs, they helped me. And also, keep studying deep learning and its architectures.

uncut barn Sep 4, 2021, 2:21 PM

#

is there a way in python to check the file size of an image before saving it?

serene scaffold Sep 4, 2021, 2:22 PM

#

uncut barn is there a way in python to check the file size of an image before saving it?

what data type is representing the image file?

uncut barn Sep 4, 2021, 2:23 PM

#

so i've changed my array to a pil.image

serene scaffold Sep 4, 2021, 2:23 PM

#

uncut barn so i've changed my array to a pil.image

Please always provide code as text. But let me see

quick kestrel Sep 4, 2021, 2:24 PM

#

dire crane You can create a simple model using tensorflow, using hentai images as your data...

How th I am going to get 1000 images and how will I install them in it lol

serene scaffold Sep 4, 2021, 2:25 PM

#

uncut barn so i've changed my array to a pil.image

I'm not sure how to know what the size of the file will be before you save it, but there's os.path.getsize

lapis sequoia Sep 4, 2021, 2:27 PM

#

quick kestrel How th I am going to get 1000 images and how will I install them in it lol

~~Just automate your browser~~ use an API to get them 👀

dire crane Sep 4, 2021, 2:27 PM

#

quick kestrel How th I am going to get 1000 images and how will I install them in it lol

To install: https://www.tensorflow.org/install, and to get the images, you can go to kaggle (a website with a lot of datasets), do it yourself manually, or maybe web scraping can do the job (i've never done this with images)

serene scaffold Sep 4, 2021, 2:27 PM

#

uncut barn so i've changed my array to a pil.image

You could also save it to an in-memory file and use that.

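from io import BytesIO
from PIL import Image

image = Image.new('RGB', (64, 64))  # stand-in for your PIL image
img_file = BytesIO()
image.save(img_file, 'png')
print(img_file.tell())  # size in bytes of the encoded PNG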

uncut barn Sep 4, 2021, 2:28 PM

#

ok thanks will try this

lapis sequoia Sep 4, 2021, 2:29 PM

#

lapis sequoia ~~Just automate your browser~~ use an API to get them 👀

https://pics.hori.ovh:8036/docs/ Many people use this api in their discord bots, you can try it @quick kestrel rooThink

quick kestrel Sep 4, 2021, 2:29 PM

#

@dire crane Well what's web scraping

#

??

#

Never tried it

dire crane Sep 4, 2021, 2:30 PM

#

It's a method of getting data from the web, literally scraping data from websites

lapis sequoia Sep 4, 2021, 2:31 PM

#

Like getting all the html content of the site and parsing it to get some data out of it

dire crane Sep 4, 2021, 2:31 PM

#

There is a library called Beautiful Soup that does this for you

#

yes

quick kestrel Sep 4, 2021, 2:32 PM

#

Well, is it really done with selenium?

lapis sequoia Sep 4, 2021, 2:32 PM

#

Or just make requests...? easier and faster

quick kestrel Sep 4, 2021, 2:32 PM

#

lapis sequoia Or just make requests...? easier and faster

Http request?

lapis sequoia Sep 4, 2021, 2:32 PM

#

Yes

quick kestrel Sep 4, 2021, 2:32 PM

#

Ah, I'm a noob in Python; I just started working with Keras

lapis sequoia Sep 4, 2021, 2:32 PM

#

import requests
data = requests.get('https://github.com/AkshuAgarwal').text  # .text is an attribute (a str), not a method
# now use bs4

quick kestrel Sep 4, 2021, 2:33 PM

#

lapis sequoia ```py import requests data = requests.get('https://github.com/AkshuAgarwal').tex...

I want images not text

lapis sequoia Sep 4, 2021, 2:33 PM

#

You'll get the whole html of the site

#

Do some reverse engineering to find the ids of image tags and parse them
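A stdlib-only sketch of the "parse the HTML for image tags" step. It uses `html.parser` instead of Beautiful Soup so it runs without extra installs; the HTML string and the base URL are made up, and in practice the HTML would come from `requests.get(url).text`. Relative `src` values are turned into absolute URLs with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # turn relative paths like "/static/cat.png" into absolute URLs
                self.sources.append(urljoin(self.base_url, src))


# Made-up page; in practice: html = requests.get(url).text
html = """
<html><body>
  <img src="/static/cat.png" alt="cat">
  <div><img id="hero" src="https://cdn.example.com/dog.jpg"></div>
  <img alt="no src here">
</body></html>
"""

collector = ImageSrcCollector("https://example.com/gallery/")  # hypothetical base URL
collector.feed(html)
print(collector.sources)
```

Each collected URL could then be downloaded with `requests.get(src).content` and written to a file opened in binary mode. An image API, as suggested above, avoids depending on a site's HTML layout.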

quick kestrel Sep 4, 2021, 2:33 PM

#

lapis sequoia You'll get the whole html of the site

Ah I see, so it is faster and can be used to get many images

#

Right?

lapis sequoia Sep 4, 2021, 2:34 PM

#

Best is actually to use an API

lapis sequoia Sep 4, 2021, 2:34 PM

#

lapis sequoia <https://pics.hori.ovh:8036/docs/> Many people use this api in their discord bot...

^

quick kestrel Sep 4, 2021, 2:35 PM

#

lapis sequoia Best is actually to use an API

Ok thanks

#

@dire crane need some more help bud

dire crane Sep 4, 2021, 2:36 PM

#

what you need?

quick kestrel Sep 4, 2021, 2:36 PM

#

What are the things I need to learn to make this like tensorflow

#

And what's the use of neural networks in AI? I mean, do they work like human neurons, sending data and responses from one location to another?

dire crane Sep 4, 2021, 2:40 PM

#

If you really want to go straight to hands-on work, I suggest you read the TensorFlow and Keras docs. Keras is like a high-level TensorFlow, with all the common deep learning layers already created. But I highly suggest you study the basics of neural network architecture, because once you know the basics you can create simple architectures to predict values.

quick kestrel Sep 4, 2021, 2:41 PM

#

dire crane If you really want to go straight to hands-on i suggest you to read tensorflow a...

And do I need maths in it??

dire crane Sep 4, 2021, 2:42 PM

#

quick kestrel And what's the use of neural networks in ai, I mean does it work like neurones o...

Basically, these architectures were inspired by human neurons. I don't know many details, but feel free to study it; it must be very interesting. With Machine Learning or Deep Learning (Deep Learning is actually a part of Machine Learning) we can do a lot of things, like image classification, predicting stock values, and more
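To make the "inspired by neurons" idea a bit more concrete: an artificial neuron is just a weighted sum of its inputs plus a bias, passed through a non-linear activation, and a layer is many of those computed at once, which is why matrix products do most of the work. This NumPy sketch uses random placeholder weights, not a trained model; libraries like TensorFlow/Keras do this math (and learning the weights) for you.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs, plus bias, through an activation."""
    return sigmoid(np.dot(w, x) + b)


def dense_layer(x, W, b):
    """A layer is many neurons evaluated at once: one row of W per neuron."""
    return sigmoid(W @ x + b)


rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])                 # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer: 4 neurons (random weights)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer: 1 neuron (random weights)

hidden = dense_layer(x, W1, b1)   # data flows forward: inputs -> hidden neurons
output = dense_layer(hidden, W2, b2)
print(hidden.shape, output)
```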

dire crane Sep 4, 2021, 2:42 PM

#

quick kestrel And do I need maths in it??

It's good to know the basics, but you don't need to know everything unless you want to go into research

quick kestrel Sep 4, 2021, 2:43 PM

#

So no need of maths in it?

dire crane Sep 4, 2021, 2:43 PM

#

but knowing simple math and statistics is fine

quick kestrel Sep 4, 2021, 2:43 PM

#

dire crane but knowing simple math and statistics itss fine

Ok

#

So to do those AI calculations, do I need NumPy, or can I do it without any module? And are neurons used to transfer data like an API does?

dire crane Sep 4, 2021, 2:45 PM

#

quick kestrel So to do those calculations of ai do I need numpy or can do it without any modul...

TensorFlow does it all

quick kestrel Sep 4, 2021, 2:45 PM

#

dire crane tensorflow do it all

Wdym?

dire crane Sep 4, 2021, 2:46 PM

#

TensorFlow has a lot of functions that do the math

quick kestrel Sep 4, 2021, 2:46 PM

#

So what's the use of Keras, and what's the use of neural networks?

dire crane Sep 4, 2021, 2:47 PM

#

Keras is an API that facilitates the use of tensorflow

#

https://keras.io/


quick kestrel Sep 4, 2021, 2:48 PM

#

dire crane Keras is an API that facilitates the use of tensorflow

U mean I need keras to work with tensorflow to do the neural network stuff?

lapis sequoia Sep 4, 2021, 2:49 PM

#

Hey, I've been working on this imbalanced dataset for a while, but I can't seem to find a good enough approach for it.
Process:

Imported the CSV (target ratio 99:1) and fixed the skew
Created a 50-50 undersample as the train set
Scaled the train set
LogisticRegression
GridSearchCV to find the best F1 score for the minority target (fitted on the undersampled train df)
Scaled the original dataset (X_val, y_val)
Classification report and confusion matrix of predict(X_val) against y_val
The results are horrible: the best F1 score I'm getting is 6%. Could someone please share their experience and approach with imbalanced datasets?
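A sketch of one leak-free setup for this kind of 99:1 problem, assuming scikit-learn, with synthetic data from `make_classification` standing in for the CSV. Two things in the listed process are worth double-checking: the scaler should be fit on the training data only and then reused to transform the validation set (rather than fitting a new one on `X_val`), and `predict` takes only the features. Instead of undersampling, this version keeps all rows and uses `class_weight="balanced"`, scoring the grid search on F1 of the minority class; it's one option, not necessarily the best one for your data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real CSV: roughly 99:1 class ratio, class 1 is the minority
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.99],
                           flip_y=0, random_state=0)

# A stratified split keeps the 99:1 ratio in both parts
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The scaler lives inside the pipeline, so it is only ever fit on training folds
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
grid = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    scoring="f1",  # F1 of the positive (minority) class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X_train, y_train)

# predict() takes only the features; compare with y_val afterwards
y_pred = grid.predict(X_val)
val_f1 = f1_score(y_val, y_pred)
print(grid.best_params_)
print(classification_report(y_val, y_pred, digits=3))
```

If you keep undersampling, apply it only to the training folds and evaluate on untouched data. Also expect F1 measured on the real 99:1 distribution to be much lower than on a 50-50 sample, because precision drops when positives are rare.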

dire crane Sep 4, 2021, 2:49 PM

#

quick kestrel U mean I need keras to work with tensorflow to do the neural network stuff?

Yes and no. You can do everything with just TensorFlow, but it's too hard to practice as a beginner using only TensorFlow, so Keras helps us do the job

#

There is a book called Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition. It's a very good book, I recommend it

quick kestrel Sep 4, 2021, 2:51 PM

#

dire crane There is a book called: Hands-On Machine Learning with Scikit-Learn, Keras, and ...

Well, scikit-learn does the AI stuff too, right?

dire crane Sep 4, 2021, 2:51 PM

#

yes, that's another library

quick kestrel Sep 4, 2021, 2:52 PM

#

dire crane yes, thats another library

So my last question: which is the best lib for all this AI and ML stuff?

dire crane Sep 4, 2021, 2:53 PM

#

That depends on your problem. If you want to solve simple problems, like linear ones, scikit-learn may solve them. But if it's more complex, I prefer using Keras + TensorFlow

quick kestrel Sep 4, 2021, 2:54 PM

#

dire crane That depends on your problem, if you want to solve simple problems like linear p...

So tbh I have a lot of work to do with AI and ML, and a lot of it is a bit complex too, so should I use TensorFlow and Keras?

dire crane Sep 4, 2021, 2:56 PM

#

Take it slow; studying AI is a long journey. I've been studying it for 2-3 years and there is still so much to learn. I hope you enjoy this journey of AI ;D

quick kestrel Sep 4, 2021, 2:58 PM

#

dire crane take it slow, study AI is a long journey. I've been studying it for 2-3 years an...

2-3 years that's a pretty long time

#

Well, thanks for the help bud

dire crane Sep 4, 2021, 2:59 PM

#

you're welcome! ;3

median cliff Sep 4, 2021, 4:03 PM

#

Anyone have any resources for some stats analysis on a feature map for a potential CNN? I have a lot of data and I'm trying to check for skew/normality/etc etc
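A possible starting point, assuming SciPy and feature maps stored as an array shaped (samples, height, width, channels): summarise each channel's value distribution with `scipy.stats.skew`, `scipy.stats.kurtosis` and `scipy.stats.normaltest` (D'Agostino-Pearson). The random data is only a placeholder. With lots of data, normality tests reject even tiny deviations, so the skew and kurtosis values are often more informative than the p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder feature maps: 200 samples of 16x16 with 3 channels.
# Channel 0 ~ normal, channel 1 ~ exponential (right-skewed), channel 2 ~ uniform (flat).
maps = np.stack([
    rng.normal(size=(200, 16, 16)),
    rng.exponential(size=(200, 16, 16)),
    rng.uniform(size=(200, 16, 16)),
], axis=-1)

for c in range(maps.shape[-1]):
    values = maps[..., c].ravel()             # every value of this channel, across samples
    skewness = stats.skew(values)
    excess_kurtosis = stats.kurtosis(values)  # Fisher definition: 0 for a normal distribution
    _, p_value = stats.normaltest(values)
    print(f"channel {c}: skew={skewness:.2f}, kurtosis={excess_kurtosis:.2f}, p={p_value:.3g}")
```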

grave frost Sep 4, 2021, 4:54 PM

#

austere swift whenever i wanna learn a new kind of model or something I try to replicate it fr...

bet you forgot about stuff like transformers ehehe 😉

#

but if you did, big balls to you man

hasty mountain Sep 4, 2021, 5:03 PM

#

Hey guys, I'm trying to read an image to use as a training set. However, for some reason, instead of getting a 3-channel RGB array, I'm getting a 4-channel array with shape (26, 25, 4):
```
[[[186 26 51 255]
[194 51 68 255]
[205 82 90 255]
...
[221 132 134 255]
[215 151 156 255]
[208 167 177 255]]
```

I've tried using `np.delete(array, 3, axis=1)` to remove this `255` from the array, but it didn't work as I expected. Can someone give me a hint on how to remove this `255` column?

median cliff Sep 4, 2021, 5:08 PM

#

@hasty mountain I believe OpenCV (cv2) has some options to get down to 3 channels

hasty mountain Sep 4, 2021, 5:10 PM

#

Hm... How about keras.layers? (I want to use this set in a neural network)

median cliff Sep 4, 2021, 5:11 PM

#

I'm not sure how you pre-processed your data. Can you elaborate on how you got that array?

hasty mountain Sep 4, 2021, 5:12 PM

#

I just loaded an image using imageio (but it could be matplotlib.pyplot; it gives the same result).

median cliff Sep 4, 2021, 5:13 PM

#

I could be wrong here, but you'll want to get rid of that (I think it's an alpha channel) before you convert to the feature map
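Assuming the 4th value per pixel is indeed the alpha (transparency) channel, here is a NumPy-only sketch of dropping it. The channel axis is the last one (axis 2, or -1), not axis 1, which is why `np.delete(array, 3, axis=1)` removed a column of pixels instead. The tiny 2x2 RGBA array is made up from values in the question.

```python
import numpy as np

# Made-up 2x2 RGBA image: shape (height, width, 4), alpha = 255 (fully opaque)
rgba = np.array([
    [[186, 26, 51, 255], [194, 51, 68, 255]],
    [[221, 132, 134, 255], [208, 167, 177, 255]],
], dtype=np.uint8)

# Option 1: keep only the first three channels (R, G, B) with slicing
rgb = rgba[..., :3]

# Option 2: delete index 3 along the *last* axis (the channel axis)
rgb_too = np.delete(rgba, 3, axis=-1)

print(rgba.shape, "->", rgb.shape)  # (2, 2, 4) -> (2, 2, 3)
```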

hasty mountain Sep 4, 2021, 5:14 PM

#

Well, I don't know how that 4th dimension appeared

#

Maybe I messed up in Paint 3D...

median cliff Sep 4, 2021, 5:15 PM

#

@hasty mountain https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html?highlight=cvtcolor#void cvtColor(InputArray src, OutputArray dst, int code, int dstCn)

#

Here is an option in OpenCV

hasty mountain Sep 4, 2021, 5:19 PM

#

median cliff <@!388857837222100993> https://docs.opencv.org/2.4/modules/imgproc/doc/miscellan...

I see. I'll take a look, thanks!

austere swift Sep 4, 2021, 5:21 PM

#

grave frost bet you forget about stuff like transformers ehehe 😉

I did actually, but for that one I had to get some help from PyTorch's seq2seq tutorial to understand some stuff

grave frost Sep 4, 2021, 5:22 PM

#

austere swift I did actually, but for that one I had to get some help from pytorchs seq2seq tu...

chad

austere swift Sep 4, 2021, 5:22 PM

#

I basically tried it on my own, didn’t work very well, then went to that to see what I did wrong

arctic wedgeBOT Sep 4, 2021, 6:52 PM

#

📨 👌 applied mute to @topaz yew until Sep 4, 2021, 7:02 PM (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia Sep 4, 2021, 7:33 PM

#

Who are you?

#

Oh. Well, no

plucky oar Sep 4, 2021, 9:12 PM

#

so I'm trying to create an AI and this error is popping up:

edgy gulch Sep 4, 2021, 9:20 PM

#

Would anyone be able to help me make sure my implementation of multinomial naive bayes is correct?
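As a reference to check an implementation against, here is a compact NumPy sketch of multinomial naive Bayes with Laplace (add-alpha) smoothing, working in log space. The toy word-count matrix is invented. If scikit-learn is available, `sklearn.naive_bayes.MultinomialNB(alpha=1.0)` should give the same predictions and is handy to compare with.

```python
import numpy as np


class MultinomialNaiveBayes:
    """Multinomial naive Bayes on count features, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # log P(class): fraction of training rows in each class
        self.class_log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # per-class feature totals N_ci, smoothed to N_ci + alpha
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes_])
        smoothed = counts + self.alpha
        # log P(feature i | class) = log((N_ci + alpha) / (N_c + alpha * n_features))
        self.feature_log_prob_ = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
        return self

    def predict_log_joint(self, X):
        # log P(class) + sum_i x_i * log P(feature i | class); the multinomial
        # coefficient is the same for every class, so it is dropped
        return np.asarray(X, dtype=float) @ self.feature_log_prob_.T + self.class_log_prior_

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_log_joint(X), axis=1)]


# Toy word counts, columns = ["ball", "goal", "vote", "election"]
X_train = np.array([[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]])
y_train = np.array(["sport", "sport", "politics", "politics"])

model = MultinomialNaiveBayes(alpha=1.0).fit(X_train, y_train)
print(model.predict(np.array([[4, 1, 0, 0], [0, 0, 2, 4]])))  # ['sport' 'politics']
```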

plucky oar Sep 4, 2021, 9:22 PM

#

```
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'pyttsx3.drivers.sap15'
```

#

that's my code

#


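```python
import pyttsx3

# the Windows SAPI driver is named "sapi5" (with an "i"); "sap15" doesn't exist, hence the ModuleNotFoundError
engine = pyttsx3.init("sapi5")
voices = engine.getProperty("voices")
print(voices[0].id)
engine.setProperty("voice", voices[0].id)  # the property to set is "voice" (singular)
```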

#

ahh, in case you didn't know: you need to wait for the first person's problem to be solved, because otherwise you're blocking them from getting help

#

people won't see their message like that

slow jewel Sep 4, 2021, 9:32 PM

#

oh sorry

plucky oar Sep 4, 2021, 9:32 PM

#

in

#

t

#

its not a problem

slow jewel Sep 4, 2021, 9:32 PM

#

k

plucky oar Sep 4, 2021, 9:33 PM

#

I'll notify you when I get my problem solved

slow jewel Sep 4, 2021, 9:33 PM

#

ok thanks

plucky oar Sep 4, 2021, 9:49 PM

#

i think i solved it

plucky oar Sep 4, 2021, 9:49 PM

#

slow jewel ok thanks

yea you can post now

chilly torrent Sep 4, 2021, 10:15 PM

#

Hope it's OK if I do some self-promotion here for my unmonetized blog; I just released an article about Python scripting for big data.
https://www.babbling.fish/elt-cookbook-python/


slow jewel Sep 4, 2021, 10:17 PM

#

I need help reorganizing a pandas DataFrame. It's a bunch of stock market data about a bunch of companies. Right now, every row is a point in time describing a single stock: there is a row for each of the stocks for every 5 minutes over some period of time. I need the index to be the date and time, with many columns such as ALKT.US_OPEN, ALKT.US_HIGH, etc. for all of the stocks. That way, each row is a snapshot of the whole dataset at a given time. Can anyone give me an idea of how to do this? Here are some example rows:

        <TICKER>    <DATE>  <TIME>   <OPEN>  <HIGH>   <LOW>  <CLOSE>  <VOL>
319763   ALKT.US  20210729  185000  31.8400   31.96  31.840    31.85   1275
24615    ABCL.US  20210730  191500  15.3000   15.31  15.295    15.31   1595
55906     ACB.US  20210804  165500   7.0782    7.09   7.055     7.09  12275
843228   BCRX.US  20210826  203000  15.4100   15.43  15.395    15.41   5699
453707   ANTE.US  20210820  160000   2.5000    2.50   2.500     2.50    100

chilly torrent Sep 4, 2021, 10:18 PM

#

check out the pivot table method

slow jewel Sep 4, 2021, 10:23 PM

#

thanks, I just looked at that. Although it's similar, it's not really what I intend

mortal dove Sep 4, 2021, 10:26 PM

#

Could you also provide an example of what you'd want the dataframe to look like after transforming it?

slow jewel Sep 4, 2021, 10:27 PM

#

sure one sec

#

                <AMZN.US_OPEN>  <AMZN.US_HIGH>  ...  <APPL.US_OPEN>  <APPL.US_HIGH>  ...  (for every stock)
2021-07-1 3:00            1000            1010  ...             150             155  ...
2021-07-1 3:05            1001            1011  ...             144             155  ...
...

#

like this

#

the index of the row is the date and time and there would be a column for each stock's data like AMZN_OPEN or APPL_HIGH
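For what it's worth, `DataFrame.pivot` with a list of `values` does produce this wide layout; the extra steps are building a datetime index from `<DATE>`/`<TIME>` and flattening the resulting two-level column index into names like `ALKT.US_OPEN`. A sketch with a few made-up rows in the same format (angle brackets already stripped from the column names, e.g. via `df.columns = df.columns.str.strip("<>")`). It assumes each (timestamp, ticker) pair appears only once; otherwise `pivot` raises and `pivot_table` with an `aggfunc` is needed.

```python
import pandas as pd

# Made-up rows in the same long format as the CSV (angle brackets stripped)
df = pd.DataFrame({
    "TICKER": ["ALKT.US", "ACB.US", "ALKT.US", "ACB.US"],
    "DATE":   [20210729, 20210729, 20210729, 20210729],
    "TIME":   [185000, 185000, 185500, 185500],
    "OPEN":   [31.84, 7.07, 31.85, 7.08],
    "HIGH":   [31.96, 7.09, 31.90, 7.10],
    "LOW":    [31.84, 7.05, 31.80, 7.06],
    "CLOSE":  [31.85, 7.09, 31.88, 7.07],
    "VOL":    [1275, 12275, 900, 10100],
})

# Combine DATE (YYYYMMDD) and TIME (HHMMSS, zero-padded to 6 digits) into one timestamp
df["DATETIME"] = pd.to_datetime(
    df["DATE"].astype(str) + df["TIME"].astype(str).str.zfill(6),
    format="%Y%m%d%H%M%S",
)

# One row per timestamp, one column per (field, ticker) pair
wide = df.pivot(index="DATETIME", columns="TICKER",
                values=["OPEN", "HIGH", "LOW", "CLOSE", "VOL"])

# Flatten the MultiIndex columns: ("OPEN", "ALKT.US") -> "ALKT.US_OPEN"
wide.columns = [f"{ticker}_{field}" for field, ticker in wide.columns]
wide = wide.sort_index(axis=1)  # group each ticker's columns together
print(wide)
```

Timestamps where a ticker has no row simply become NaN in that ticker's columns.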

old axle Sep 5, 2021, 12:38 AM

#

i dont know if this is an appropriate question for this channel but

#

i have a dataset of jobs and salaries and the job titles are kinda weird? like some of them repeat and have roman numerals at the end of them and im kinda confused

#

like for example theres several "Accountant/Auditor"s but each has a different roman numeral at the end of it

#

and i was just wondering if anyone knew what this meant

#

https://catalog.data.gov/dataset/average-salary-by-job-classification this is the dataset btw

desert oar Sep 5, 2021, 3:17 AM

#

old axle like for example theres several "Accountant/Auditor"s but each has a different r...

Pay and seniority grades probably

old axle Sep 5, 2021, 3:17 AM

#

huh ok

#

thank you!

desert oar Sep 5, 2021, 3:18 AM

#

HR departments at large organizations have things like that

old axle Sep 5, 2021, 3:18 AM

#

weird

#

eugh jobs are so confusing

desert oar Sep 5, 2021, 3:19 AM

#

https://www.indeed.com/cmp/Biogen/faq/what-does-the-number-following-the-job-title-mean-for-example-manufacturing-associate-i-ii-iii-iv?quid=1c2ss81e65j4qf7r


old axle Sep 5, 2021, 3:22 AM

#

oh!! thank you so much :)

tardy jungle Sep 5, 2021, 3:42 AM

#

#

As we are on stock market

#

And also value of high

dusty cloud Sep 5, 2021, 5:56 AM

#

Hi all. What do the reports that data scientists submit to upper management generally look like in major companies? In what format do they submit them? Are they Word docs with code attached? PPTs? Or something else?

ripe forge Sep 5, 2021, 6:08 AM

#

Anyone non-technical can't digest code. It's usually PPTs all the way

buoyant adder Sep 5, 2021, 8:34 AM

#

For the next few days I'll be explaining some data science concepts through 1 minute videos so that everyone has a basic intuition and clears their basic concepts. Here's today's video: https://youtu.be/gDqeW2t2dV0


orchid silo Sep 5, 2021, 8:49 AM

#

buoyant adder For the next few days I'll be explaining some data science concepts through 1 mi...

thanks

buoyant adder Sep 5, 2021, 9:13 AM

#

Welcome!

orchid adder Sep 5, 2021, 10:12 AM

#

buoyant adder For the next few days I'll be explaining some data science concepts through 1 mi...

Thank you

hoary wigeon Sep 5, 2021, 10:17 AM

#

Which algorithm can be used to classify this data? Any ideas?

grave frost Sep 5, 2021, 11:23 AM

#

hoary wigeon which algo can be used to classify these data ? any idea ?

probably a weird sort of polynomial or something?

hoary wigeon Sep 5, 2021, 11:25 AM

#

yeah

grave frost Sep 5, 2021, 11:42 AM

#

that kinda looks like the sequence of prime numbers or something in 3Blue1Brown's video

serene scaffold Sep 5, 2021, 11:55 AM

#

hoary wigeon which algo can be used to classify these data ? any idea ?

I thought I saw that one can use SVMs in such a way that the two spirals become closer together.

#

see if there's a kernel function for that.

tender hearth Sep 5, 2021, 12:00 PM

#

hoary wigeon which algo can be used to classify these data ? any idea ?

regular FNN should do fine

velvet thorn Sep 5, 2021, 12:01 PM

#

tender hearth regular FNN should do fine

what's FNN?

#

is that a new NN type

#

man I've been out of touch for so long

#

I don't even recognise the techniques that are current any more

#

😔

tender hearth Sep 5, 2021, 12:01 PM

#

Quite the opposite haha

velvet thorn Sep 5, 2021, 12:01 PM

#

serene scaffold I thought I saw that one can use SVMs in such a way that the two spirals become ...

this sounds like it?

tender hearth Sep 5, 2021, 12:01 PM

#

feed forward neural network

velvet thorn Sep 5, 2021, 12:02 PM

#

it looks exactly like the kind of thing a projection into a higher dimensional space would solve

velvet thorn Sep 5, 2021, 12:02 PM

#

tender hearth feed forward neural network

oh...

#

😔

tender hearth Sep 5, 2021, 12:02 PM

#

hahahaha

serene scaffold Sep 5, 2021, 12:03 PM

#

and here I was thinking I was onto something. Also what exactly makes a NN feed forward?

tender hearth Sep 5, 2021, 12:03 PM

#

I know for sure because that screenshot looks like it's from Tensorflow's neural network playground and on Google's ML Crash Course they trained an FNN on exactly that pattern

tender hearth Sep 5, 2021, 12:04 PM

#

serene scaffold and here I was thinking I was onto something. Also what exactly makes a NN *feed...

the input does one pass through the network

serene scaffold Sep 5, 2021, 12:06 PM

#

tender hearth the input does one pass through the network

as opposed to having multiple epochs or what?

tender hearth Sep 5, 2021, 12:06 PM

#

like it doesn't loop compared to RNNs

#

i'm talking about MLPs

velvet thorn Sep 5, 2021, 12:08 PM

#

serene scaffold and here I was thinking I was onto something. Also what exactly makes a NN *feed...

direction of information flow

#

for example

#

well, I guess RNNs are a great example

#

🙂
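A bare-bones NumPy sketch of the difference described above, with random, untrained weights: a feed-forward network maps one input to one output in a single pass and keeps no memory, while an RNN loops over a sequence and feeds its hidden state back in at every step, so the order of the inputs matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2

# Feed-forward (MLP): information flows input -> hidden -> output, once
W1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)


def feed_forward(x):
    hidden = np.tanh(x @ W1 + b1)
    return hidden @ W2 + b2


# Recurrent: the hidden state from step t is an input to step t + 1
Wx = rng.normal(size=(n_in, n_hidden))
Wh = rng.normal(size=(n_hidden, n_hidden))
bh = np.zeros(n_hidden)


def rnn(sequence):
    h = np.zeros(n_hidden)
    for x_t in sequence:  # this loop (the feedback of h) is what makes it recurrent
        h = np.tanh(x_t @ Wx + h @ Wh + bh)
    return h  # final hidden state summarises the whole sequence


sequence = rng.normal(size=(4, n_in))   # 4 time steps of 3 features
print(feed_forward(sequence[0]).shape)  # (2,)
print(rnn(sequence).shape)              # (5,)
```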

frosty sigil Sep 5, 2021, 12:15 PM

#

Try one of these maybe? https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
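Since the screenshot looks like the classic two-spirals dataset (as in TensorFlow Playground), here is a hedged scikit-learn sketch comparing the two suggestions from this thread: an RBF-kernel SVM (the kernel implicitly works in a higher-dimensional space) and a small feed-forward network (`MLPClassifier`). The spiral generator and all hyperparameters are my own choices for illustration; accuracy is measured on a held-out split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_two_spirals(n_per_class=300, turns=1.5, noise=0.2, seed=0):
    """Two interleaved spirals (hypothetical helper): class 1 is class 0 rotated 180 degrees."""
    rng = np.random.default_rng(seed)
    # sqrt spreads points evenly along the arm instead of bunching them at the centre
    theta = np.sqrt(rng.uniform(0, 1, n_per_class)) * turns * 2 * np.pi
    arm = np.column_stack([theta * np.cos(theta), theta * np.sin(theta)])
    X = np.vstack([arm, -arm]) + rng.normal(scale=noise, size=(2 * n_per_class, 2))
    y = np.repeat([0, 1], n_per_class)
    return X, y


X, y = make_two_spirals()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# RBF kernel SVM
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma=10))
# Plain feed-forward network (multi-layer perceptron)
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 64), learning_rate_init=0.01,
                  max_iter=3000, n_iter_no_change=50, random_state=0),
)

for name, model in [("RBF SVM", svm), ("MLP", mlp)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```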

abstract sinew Sep 5, 2021, 1:20 PM

#

haha oh my god so many acronyms

grave frost Sep 5, 2021, 1:59 PM

#

tender hearth feed forward neural network

I think its called FFN in the lingo 😁

keen shadow Sep 5, 2021, 2:05 PM

#

I have an issue with NumPy when I pass an image through np.asarray.

It seems to change the shape of the image. I've used a workaround, but is there any reason why it does this?
Code

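import io

import numpy as np
from imageio import mimread
from PIL import Image

MAX_FRAMES = 50  # hypothetical value; use whatever limit your code already defines


def extract_frames(stream: io.BytesIO):
    bt_array = bytearray(stream.read())
    actual = io.BytesIO(bt_array)

    try:
        # decodes every frame of an animated image (e.g. a GIF) into arrays
        gif = mimread(actual, memtest='1MB', pilmode="RGB")
    except ValueError:
        # np.asarray(bt_array) would only wrap the *encoded* file bytes as a flat 1-D array,
        # so decode the single image with Pillow first
        gif = [np.asarray(Image.open(io.BytesIO(bt_array)).convert("RGB"))]

    frames = gif[:MAX_FRAMES]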

The sizes that I got were:

(671, 75) - Original Size
[(13696,)] - After passing through np.asarray
[(1, 13696)] - Before saving to memory
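A likely explanation, sketched with Pillow: `np.asarray(bytearray(...))` doesn't decode anything. It just wraps the raw bytes of the *compressed* file (format headers plus compressed pixel data) as a flat 1-D uint8 array, so its length is the file size in bytes rather than height × width. Decoding with an image library first gives the expected 2-D shape. The 671x75 black-and-white test image is synthetic.

```python
import io

import numpy as np
from PIL import Image

# Synthetic black-and-white image with the same shape as in the question
pixels = np.zeros((671, 75), dtype=np.uint8)
pixels[::10] = 255  # a few white stripes so the file isn't trivial
buffer = io.BytesIO()
Image.fromarray(pixels).save(buffer, "png")
raw = bytearray(buffer.getvalue())

# Wrapping the encoded bytes: a flat array whose length is the file size
as_bytes = np.asarray(raw, dtype=np.uint8)
print(as_bytes.shape)  # (N,) where N == len(raw), the file size in bytes

# Decoding first recovers the pixel grid
decoded = np.asarray(Image.open(io.BytesIO(raw)))
print(decoded.shape)  # (671, 75)
```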

lapis sequoia Sep 5, 2021, 2:06 PM

#

can anyone tell me the list of AI modules used?

#

nm

uncut barn Sep 5, 2021, 2:15 PM

#

For the same image saved in different formats, e.g. JPG, PNG, TIFF, etc., would all the values in the array (RGB values) be the same?

keen shadow Sep 5, 2021, 2:28 PM

#

uncut barn For the same img saved in different versions i.e. jpg, png, tiff, etc. would all...

for the current image yes but for others no

#

it's just a simple black-and-white image

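A quick way to check this for any image, sketched with Pillow: save the same pixels as PNG, TIFF and JPEG, reload them, and compare. PNG and TIFF (as Pillow writes them by default) are lossless, so the values come back identical; JPEG is lossy, so values can change, by an amount that depends on the image and the quality setting. The random test image is made up, chosen because noise is hard for JPEG to reproduce exactly.

```python
import io

import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # random RGB noise


def round_trip(array, fmt, **save_kwargs):
    """Encode the array in the given format, decode it again, return the pixels."""
    buffer = io.BytesIO()
    Image.fromarray(array).save(buffer, fmt, **save_kwargs)
    buffer.seek(0)
    return np.asarray(Image.open(buffer).convert("RGB"))


for fmt, kwargs in [("PNG", {}), ("TIFF", {}), ("JPEG", {"quality": 95})]:
    restored = round_trip(original, fmt, **kwargs)
    max_diff = np.abs(restored.astype(int) - original.astype(int)).max()
    print(f"{fmt}: identical={np.array_equal(restored, original)}, max difference={max_diff}")
```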

desert bear Sep 5, 2021, 2:29 PM

#

Hey, I'm doing some research on algorithms that help detect outliers (anomalies) in dataset. Does anybody have advice or experiences on how to automate it?

true estuary Sep 5, 2021, 2:39 PM

#

What do you want to automate? Outlier detection?

keen shadow Sep 5, 2021, 2:48 PM

#

probably to identify data that doesn't fit the pattern

desert bear Sep 5, 2021, 2:56 PM

#

true estuary What do you want to automate ? Outlier detection ?

Yes

#

I've been given the task of researching some options that could help detect anomalies when fed data from a specific timeframe every once in a while.

true estuary Sep 5, 2021, 3:00 PM

#

Well, there are a bunch of packages and approaches. How comfortable are you with the math?

#

Do you want to use a ready-made software tool that comes with ready-made models? Or code a model?

#

This example shows you how to code a model and train it
https://keras.io/examples/timeseries/timeseries_anomaly_detection/


desert bear Sep 5, 2021, 3:01 PM

#

true estuary Well there are a bunch of packages and approaches. How comfortable are you with ...

I think, well enough

#

I prefer to code the model myself. I know of a library, PyCaret, that seems to do all the coding work itself, and I would like to avoid that

dusky dome Sep 5, 2021, 3:01 PM

#

Hi can someone help me with this

true estuary Sep 5, 2021, 3:02 PM

#

All right, then the approach that example takes is the level of code you're looking for

dusky dome Sep 5, 2021, 3:02 PM

#

desert bear Sep 5, 2021, 3:02 PM

#

true estuary All right, then the approach that example takes is the level of code you're look...

Do you think that I can use this approach when I am dealing with data containing transactions?

true estuary Sep 5, 2021, 3:02 PM

#

I'd start with that one, just to get a feel. Or look for an even simpler model, like density estimation, to have a really simple baseline
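One reading of the "density estimation" baseline, as a NumPy sketch: fit a single multivariate Gaussian to the data and flag the points with the largest squared Mahalanobis distance (equivalently, the lowest Gaussian density). The data and the 2% threshold are assumptions for illustration; real transaction features are rarely Gaussian, which is exactly why this is only a baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
normal_points = rng.normal(size=(500, 2))
planted_outliers = rng.uniform(6, 8, size=(10, 2))  # made-up anomalies, far from the bulk
X = np.vstack([normal_points, planted_outliers])

# Fit one Gaussian: mean vector and covariance matrix
mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
cov_inv = np.linalg.inv(cov)

# Squared Mahalanobis distance of every point; larger = less likely under the Gaussian
diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Flag the 2% least likely points as anomalies (threshold is an assumption to tune)
threshold = np.quantile(d2, 0.98)
is_anomaly = d2 > threshold
print(is_anomaly.sum(), "flagged;", is_anomaly[-10:].sum(), "of the 10 planted outliers")
```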

true estuary Sep 5, 2021, 3:03 PM

#

desert bear Do you think that I can use this approach when I am dealing with data containing...

I was going to ask: what's your data? Time series, or something else?

desert bear Sep 5, 2021, 3:03 PM

#

So far I've researched models like LOF, Isolation Forest, kNN, and SVM
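Since LOF and Isolation Forest are already on the list, here is a minimal scikit-learn sketch running both on the same synthetic data: a Gaussian blob plus a few planted outliers. The `contamination` value of 2% is an assumption you'd normally tune or estimate from domain knowledge. Both `fit_predict` calls return -1 for points flagged as outliers and 1 for inliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(size=(500, 2)),        # made-up "normal" transactions
    rng.uniform(6, 8, size=(10, 2)),  # 10 planted anomalies, last in the array
])

iso = IsolationForest(contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(X)  # -1 = outlier, 1 = inlier

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
lof_labels = lof.fit_predict(X)

for name, labels in [("IsolationForest", iso_labels), ("LOF", lof_labels)]:
    caught = (labels[-10:] == -1).sum()
    print(f"{name}: {(labels == -1).sum()} flagged, {caught}/10 planted outliers caught")
```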

true estuary Sep 5, 2021, 3:03 PM

#

Transactions, sounds like fraud detection?

desert bear Sep 5, 2021, 3:03 PM

#

true estuary Transactions, sounds like fraud detection ?

I would consider them fraud yes

#

There is a nice paper on a similar topic: https://www.diva-portal.org/smash/get/diva2:897808/FULLTEXT01.pdf

true estuary Sep 5, 2021, 3:04 PM

#

Right, well fraud detection is one of the most used applications of anomaly detection, you should be able to find tons of resources

#

I personally don't know much about it, like what models/tools are the right ones to try

#

Hope someone will recommend some resources to get started

true estuary Sep 5, 2021, 3:05 PM

#

desert bear So far i've researched models like LOF, Isolation Forest, knn, SVM

Nice, did you try those with your data?

desert bear Sep 5, 2021, 3:06 PM

#

Most of the papers I've encountered seem to focus on the models' implementation and their benchmarks.

true estuary Sep 5, 2021, 3:06 PM

#

desert bear Do you think that I can use this approach when I am dealing with data containing...

That Keras anomaly example works with time-series data, so I don't expect it to work with transaction data

desert bear Sep 5, 2021, 3:06 PM

#

true estuary Nice, did you try those with your data ?

Well, I personally did not, but a colleague did a while ago

true estuary Sep 5, 2021, 3:23 PM

#

Well sounds like if you're reading papers

zenith slate Sep 5, 2021, 3:33 PM

#

hi guys, are writing custom loss functions and working with tensors must-knows for AI practitioners?

#

I would appreciate it if someone could take a look and help me debug a custom weighted multiclass loss function I'm attempting
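For debugging, it can help to have a plain NumPy reference for what a class-weighted categorical cross-entropy should compute, then compare the custom TensorFlow/Keras loss against it on a small batch. This uses one common definition (each sample's loss scaled by the weight of its true class, then averaged over the batch); some implementations divide by the sum of the weights instead, so check which one you intend. The labels, predictions and weights are made up.

```python
import numpy as np


def weighted_categorical_crossentropy(y_true, y_pred, class_weights, eps=1e-7):
    """Mean over the batch of  w[true class] * -log(p[true class]).

    y_true: one-hot labels, shape (batch, n_classes)
    y_pred: predicted probabilities (e.g. softmax output), same shape
    class_weights: shape (n_classes,)
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    per_sample_ce = -np.sum(y_true * np.log(y_pred), axis=1)
    per_sample_weight = y_true @ class_weights  # weight of each sample's true class
    return np.mean(per_sample_weight * per_sample_ce)


y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
weights = np.array([1.0, 2.0, 5.0])  # e.g. rarer classes get larger weights

loss = weighted_categorical_crossentropy(y_true, y_pred, weights)
print(round(loss, 4))
```

With all weights equal to 1 this reduces to ordinary categorical cross-entropy, which is a handy first sanity check for the custom version.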

uncut barn Sep 5, 2021, 3:54 PM

#

Is it better to save the images (given that we have thousands of them) as NumPy arrays or as image files, e.g. JPG, PNG or TIFF? (in terms of runtime and disk space) After this I'll be feeding these images into a CNN

zenith slate Sep 5, 2021, 3:57 PM

#

@uncut barn definitely image files because you can make use of compression
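One way to check the compression point on your own data, sketched with NumPy and Pillow: compare the byte size of the same image stored as raw `.npy` versus PNG. The synthetic image with large flat areas is made up and compresses unusually well; photos shrink less, and lossy JPEG would be smaller still at the cost of changing pixel values. Decoding compressed files does cost some CPU time at training time, which is the trade-off.

```python
import io

import numpy as np
from PIL import Image

# Synthetic 256x256 RGB image with large flat areas (compresses well)
image = np.zeros((256, 256, 3), dtype=np.uint8)  # black background
image[64:192, 64:192] = 255                      # white square
image[:, :32] = (200, 30, 60)                    # red stripe on the left

npy_buffer = io.BytesIO()
np.save(npy_buffer, image)  # raw array plus a small header, no compression

png_buffer = io.BytesIO()
Image.fromarray(image).save(png_buffer, "PNG")  # lossless, zlib-compressed

print("npy bytes:", npy_buffer.tell())
print("png bytes:", png_buffer.tell())
```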

serene scaffold Sep 5, 2021, 6:06 PM

#

zenith slate hi guys is writing custom loss functions a must know and working with tensors a ...

I don't believe these are as relevant to statistical approaches to AI.

#

Loss functions are important for neural network-based AI. And then I think "tensor" means something different in Python's data science ecosystem than it does in pure mathematics.

shell berry Sep 5, 2021, 6:12 PM

#

Anyone know of any cool datasets that aren't something boring like simple image recognition or house prices/stuff like that?

#

and non-NLP 😛 maybe something scientific

serene scaffold Sep 5, 2021, 6:17 PM

#

shell berry and non-NLP 😛 maybe something scientific

why not NLP?

ripe forge Sep 5, 2021, 6:21 PM

#

they said non boring! 😛

#

perhaps take the dataset from this tutorial. https://www.tensorflow.org/tutorials/images/classification


shell berry Sep 5, 2021, 6:28 PM

#

serene scaffold why not nlp?

Just finished a degree in NLP and spent a long time doing only NLP haha

#

thanks @ripe forge

raven cloud Sep 5, 2021, 6:49 PM

#

img = cv.imread(cv.samples.findFile(img_name))

I'm having trouble finding out what `samples` and `findFile` are.
It's related to `import argparse`.
I can assure you it's not from OpenCV; any advice on where to search in the code to decipher that? Or is the information I've provided way too sPARSE (pun intended)?
I don't like argparse

wide meadow Sep 5, 2021, 6:54 PM

#

I'm just a beginner in Python and ML; can someone please tell me what apply(lambda x: x) does in df['y'] = df['y'].apply(lambda x: x)?

ripe forge Sep 5, 2021, 6:55 PM

#

raven cloud ```python img = cv.imread(cv.samples.findFile(img_name)) ``` Im having trouble f...

you can assure us it's not from opencv, but your code tells us otherwise. perhaps just run print(cv.samples) and see what it says.

true elk Sep 5, 2021, 6:55 PM

#

Hey guys! I'm a python enthusiast and my uncle runs a car upholstery business. I want to help him to come up with new designs for car seat covers using pictures of previous works.
Is it doable? I have only a few notions of ML

#

I can also get a set of 100 vectorial designs with the same angle

#

I don't need complex material imitation and other fancy stuff, I only need "lines". Basically black and white

#

I've read some stuff about GAN but it seems overpowered for what I need

ripe forge Sep 5, 2021, 6:58 PM

#

wide meadow I'm just a beginner in python and ML, can someone please tell what apply(lambda ...

to me, it seems like that should do nothing. specifically apply takes a function and runs it on every item. but lambda x: x is a function that takes a parameter x and returns x so i don't see the point
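A tiny pandas check of that reading: `lambda x: x` returns each value unchanged, so the result equals the original column, while any other function passed to `apply` is run once per value. The DataFrame is made up.

```python
import pandas as pd

df = pd.DataFrame({"y": [3, 1, 4, 1, 5]})

identity = df["y"].apply(lambda x: x)     # returns every value unchanged
doubled = df["y"].apply(lambda x: x * 2)  # a function that actually does something

print(identity.equals(df["y"]))  # True
print(doubled.tolist())          # [6, 2, 8, 2, 10]
```

For simple arithmetic like the second line, the vectorized form `df["y"] * 2` gives the same result and is usually much faster than `apply`.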

wide meadow Sep 5, 2021, 6:59 PM

#

ripe forge to me, it seems like that should do nothing. specifically `apply` takes a functi...

Thank you!

raven cloud Sep 5, 2021, 7:00 PM

#

my bad, I had just never seen those methods before; I guess it is OpenCV.
Lack of sleep can get to you

ripe forge Sep 5, 2021, 7:00 PM

#

you're fine! (though yes, sleep is definitely worth it regardless)

raven cloud Sep 5, 2021, 7:01 PM

#

it was worth being wrong just to add the pun though

raven cloud Sep 5, 2021, 7:12 PM

#

ripe forge you're fine! (though yes, sleep is definitely worth it regardless)

I think I should have added this part 😮

```python
for img_name in args.img:
    img = cv.imread(cv.samples.findFile(img_name))
```
please don't tell me to read a guide on args, too tired 😅

```python
args = parser.parse_args()
```

#

I'll see if print(args...) works

ripe forge Sep 5, 2021, 7:26 PM

#

raven cloud I think I should have added this part 😮 ```python for img_name in args.img: ...

hey, what's your question here? args.img looks like some kind of iterable that sounds like it should have image names. nothing special as such.

raven cloud Sep 5, 2021, 7:28 PM

#

👋 ,
args.img
comes from args = parser.parse_args(); I have no clue what parser.parse_args() is supposed to be.

#

And how do you get the code-style background like you did for "args.img"?

#

perhaps my question is still too sParse?

#

argparse is making an arse out of me I tell you what

#

    args = parser.parse_args()
    print(args)
    # read input images
    imgs = []
    for img_name in args.img:
        img = cv.imread(cv.samples.findFile(img_name))

I just need to know where args.img is searching

ripe forge Sep 5, 2021, 7:36 PM

#

oh. so, that looks like command line args. (and for displaying code you can use backticks. single ` for inline)

#

so args is just the object argparse returns (an argparse.Namespace); it just "holds" all the arguments. argparse itself is a way to read parameters when the file is run from the command line with extra arguments. so, say, python thisfile.py arg1 arg2 arg3 is one way to run a Python file with extra arguments

#

that's probably the info you're missing. this code won't show you what img is unless you see what command it was run with.

#

so img is just whatever is specified in the parser arguments somewhere in the code. its actual value is decided at runtime, based on the command the file is run with.

#

for what it's worth, it's easy to assume it's basically like a list of strings, file names.

raven cloud Sep 5, 2021, 7:39 PM

#

parser.add_argument('img', nargs='+', help = 'input images')

#

😵‍💫

#

im really not a cli guy

#

not yet anyways

ripe forge Sep 5, 2021, 7:40 PM

#

yep, so if you wanted to see how exactly that works, you'd want to read up on argparse. but if you want a tl;dr, img is a list that's being given values during the CLI run of this file
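A self-contained sketch of how that `add_argument` line behaves: with `nargs='+'`, the positional `img` argument collects one or more command-line values into a list. Passing a list to `parse_args` simulates running `python stitching.py left.jpg right.jpg` without a terminal; the file names are made up.

```python
import argparse

parser = argparse.ArgumentParser(description="Stitch images")
parser.add_argument("img", nargs="+", help="input images")

# Equivalent to running: python stitching.py left.jpg right.jpg
args = parser.parse_args(["left.jpg", "right.jpg"])

print(args)      # Namespace(img=['left.jpg', 'right.jpg'])
print(args.img)  # ['left.jpg', 'right.jpg']

for img_name in args.img:
    print("would load", img_name)
```

Calling `parser.parse_args()` with no list reads `sys.argv[1:]` instead, so running the script without any image names produces the "the following arguments are required: img" error.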

raven cloud Sep 5, 2021, 7:42 PM

#

img is a list that's being given values during the cli run of this file
exactly what im trying to look for 😅

ripe forge Sep 5, 2021, 7:43 PM

#

the code of the file won't have it.

#

it would depend exactly on what command args were written when this file is being run

raven cloud Sep 5, 2021, 7:44 PM

#

can I not just use my relative path?
https://github.com/opencv/opencv/blob/master/samples/python/stitching.py

#

better not though

#

I think the entire code would need rewriting then

ripe forge Sep 5, 2021, 7:45 PM

#

you can do whatever you want with your own code. all I'll say is, you basically can't look at the code to see what input the user passed

raven cloud Sep 5, 2021, 7:45 PM

#

to a certain extent only

ripe forge Sep 5, 2021, 7:45 PM

#

it's like reading a code that says input("blah") and trying to figure out what the user wrote.

raven cloud Sep 5, 2021, 7:45 PM

#

ripe forge you can do whatever with your own codes. all i'll say is, you can't look at the ...

me looking at the code for the last one hour lol

ripe forge Sep 5, 2021, 7:45 PM

#

cli args are basically like the cli version of input

raven cloud Sep 5, 2021, 7:46 PM

#

Thanks very much 🙂

#

Cant help not looking at the code, guess I gotta read the docs lol

#

hold on u said its like a input

ripe forge Sep 5, 2021, 7:51 PM

#

in terms of the general concept, input is a separate thing of course.

raven cloud Sep 5, 2021, 7:52 PM

#

yeah

ripe forge Sep 5, 2021, 7:52 PM

#

i only meant both are similar in the sense that the code won't know what is passed to it

raven cloud Sep 5, 2021, 7:52 PM

#

Im trying to run on cmd

#

stitching.py: error: the following arguments are required: img same error but I think I may be close now unless u know ?

#

wait

#

its working 🤦‍♂️ 🤦‍♂️ 🤦‍♂️ Instructions were sort of there, albeit not that clear

#

Thank you Darr!

#

you mentioning `input` resulted in an output 😄

ripe forge Sep 5, 2021, 8:01 PM

#

np! 🙂

tough bolt Sep 5, 2021, 8:18 PM

#

Hey there!

I have this bunch of code right here:
(python!)

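dissimilarity_matrix = []
for adj_t in adj_matrices:
    dissimilarity_row = []
    for adj_tdt in adj_matrices:
        # squared Euclidean distance between the two matrices
        # (elementwise `-` and `**` need NumPy arrays; plain nested lists don't support them)
        dissimilarity_measure = ((adj_t - adj_tdt)**2).sum()
        dissimilarity_row.append(dissimilarity_measure)
    dissimilarity_matrix.append(dissimilarity_row)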

And since the running time grows quadratically with the dataset size (every entry is compared with every other entry), it gets notoriously slow really fast.
Right now this is all default python, no numpy or anything

#

adj_matrices is a list

#

Up to 90k+ in length

#

adj_t is a matrix (in form of 2 dimensional python list)

#

looks like this

#

The code is calculating the squared Euclidean distance between every pair of the 90k+ points.

#

Any idea whether and how this could be optimised with numpy?

And if so - with which numpy functions

tidal bough Sep 5, 2021, 8:22 PM

#

How did using scipy.pdist go?

tough bolt Sep 5, 2021, 8:23 PM

#

Hmm, I ditched that at some point, I'm not sure why.

#

not sure if pdist would work all that well for me as I need to add another factor later on to the dissimilarity_measure var

tidal bough Sep 5, 2021, 8:24 PM

#

this seems to work fine for me, say:

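import numpy as np
import scipy.spatial

#random data in that format (1000 10x10 matrices of bools)
arrs = np.random.randint(0,2,(1000,10,10), dtype=np.uint8)

#use pdist, after flattening each matrix; metric="sqeuclidean" matches the squared distance in your loop
#(the default metric, "euclidean", would also take the square root)
dists = scipy.spatial.distance.pdist(arrs.reshape(arrs.shape[0],-1), metric="sqeuclidean")
# this is a "compressed" (condensed) distance array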
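
The condensed array only stores the upper triangle (n*(n-1)/2 values). A minimal sketch, on random stand-in data rather than the real adjacency matrices, of expanding it back to a full (n, n) matrix with `scipy.spatial.distance.squareform` and spot-checking one entry against the loop's formula:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# stand-in data: 200 matrices of shape 10x10 with 0/1 entries
rng = np.random.default_rng(0)
arrs = rng.integers(0, 2, size=(200, 10, 10))

flat = arrs.reshape(arrs.shape[0], -1)          # one row per flattened matrix
condensed = pdist(flat, metric="sqeuclidean")   # upper triangle only, length n*(n-1)/2
square = squareform(condensed)                  # symmetric (n, n) matrix, zeros on the diagonal

# spot-check one pair against the original loop's formula
i, j = 3, 17
assert square[i, j] == ((arrs[i] - arrs[j]) ** 2).sum()
print(condensed.shape, square.shape)  # (19900,) (200, 200)
```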

tidal bough Sep 5, 2021, 8:25 PM

#

tough bolt not sure if pdist would work all that well for me as I need to add another facto...

Ah, now that would indeed be a problem

#

you can instead straightforwardly vectorize your loop, then, like, uhh...

tough bolt Sep 5, 2021, 8:26 PM

#

tidal bough Ah, now that would indeed be a problem

yeah, I'd later like to factor in values like velocity, number of neighbouring points, etc., data that isn't included in the Euclidean distance

tough bolt Sep 5, 2021, 8:26 PM

#

tidal bough you can instead straightforwardly vectorize your loop, then, like, uhh...

hmm

#

I mean, if it really can't be optimized, so be it. The dataset shouldn't exceed 100k entries. But still a shame - I'm also scared it will overload the RAM

#

but that's probably another story

tidal bough Sep 5, 2021, 8:29 PM

#

tough bolt hmm

import numpy as np

def pairwise_dists(arrs):
    # start by flattening the matrices, since we are taking diffs only anyway
    arrs = arrs.reshape(arrs.shape[0],-1) # shape: n_points, N*M
    # We need to subtract each row from all others.
    # That can be done by using some clever broadcasting
    arrs1 = arrs.reshape(arrs.shape[0], 1, -1) # "rotate" it so that the rows are pointing depthwise
    arrs2 = arrs.reshape(1,arrs.shape[0], -1)
    diffs = arrs1 - arrs2 # shape: n_points, n_points, N*M
    distances = (diffs**2).sum(axis=2) # sum depthwise, so over each original row
    return distances # n_points, n_points

# short version that might be a bit faster, not sure:
def pairwise_dists_short(arrs):
    n = arrs.shape[0]
    return ((arrs.reshape(n,1,-1) - arrs.reshape(1,n,-1))**2).sum(axis=2)

#

like this, I think that should work

#

yeah, seems to not crash and to produce a sane-looking result at least.

tidal bough Sep 5, 2021, 8:31 PM

#

tough bolt yeah, I'd later like to factor in values like velocity, amount of neighboured po...

But if you're adding stuff like this, then this approach might cease to work. In that case, you'll have one final solution left: using numba, or FFI with a compiled language of your choice, to do exactly what you were doing but without the slowness of Python.

tough bolt Sep 5, 2021, 8:32 PM

#

Oh wow, that's great. I'll try this - thank you!

#

Hey - this works wonders!

It now takes 1.5 seconds instead of 24 seconds to calculate for 2k entries ... amazing

tidal bough Sep 5, 2021, 8:42 PM

#

tough bolt Hey - this works wonders! It now takes 1.5 seconds instead of 24 seconds to ca...

Here's another solution (I've tested that it produces the same results as pairwise_dists above):

import numpy as np
from numba import njit

@njit
def pairwise_numba(arrs):
    n = arrs.shape[0]
    results = np.empty(shape=(n,n), dtype = np.uint32)
    for i in range(n):
        for j in range(i,n):  # upper triangle only, since the distance is symmetric
            dist = ((arrs[i]-arrs[j])**2).sum()
            results[i,j] = dist
            results[j,i] = dist  # mirror into the lower triangle
    return results

This is the "same thing you are doing but in numba" way

#

It's 2x faster:

>>> %timeit pairwise_dists(arrs)
354 ms ± 6.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit pairwise_numba(arrs)
150 ms ± 7.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

which is likely because it avoids doing half the work by using the symmetry of your distance function (for j in range(i,n)).

iron basalt Sep 5, 2021, 8:45 PM

#

There is also:

#

import numpy as np

def pairwise_dists(x, y):
    """ Computing pairwise distances using memory-efficient
    vectorization.

    Parameters
    ----------
    x : numpy.ndarray, shape=(M, D)
    y : numpy.ndarray, shape=(N, D)

    Returns
    -------
    numpy.ndarray, shape=(M, N)
        The Euclidean distance between each pair of
        rows between `x` and `y`."""
    sqr_dists = -2 * np.matmul(x, y.T)
    sqr_dists += np.sum(x**2, axis=1)[:, np.newaxis]
    sqr_dists += np.sum(y**2, axis=1)
    return np.sqrt(np.clip(sqr_dists, a_min=0, a_max=None))

#

minus the sqrt, since you want squared distances

#

Using this:

#

abstract sinew Sep 5, 2021, 8:46 PM

#

may i ask what the above functions are for?

tidal bough Sep 5, 2021, 8:47 PM

#

abstract sinew may i ask what the above functions are for?

Computing the pairwise distances between many matrices, with the caveat that the distance function might get more complex later

iron basalt Sep 5, 2021, 8:47 PM

#

The clip is for numeric precision issues.

tough bolt Sep 5, 2021, 8:47 PM

#

hmm, another problem I'm running into right now is this

numpy.core._exceptions.MemoryError: Unable to allocate 180. GiB for an array with shape (22001, 22001, 100) and data type int32

#

KEKW

#

180 gb

tidal bough Sep 5, 2021, 8:48 PM

#

iron basalt ```py def pairwise_dists(x, y): """ Computing pairwise distances using memor...

nice, so basically:

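def pairwise_dists_2(arrs):
    n = arrs.shape[0]
    # flatten; casting to int64 keeps small unsigned dtypes (e.g. uint8) from overflowing in -2 * x @ x.T
    x = arrs.reshape(n, -1).astype(np.int64)
    sq_norms = np.sum(x**2, axis=1)  # squared norm of each row, computed once
    sqr_dists = -2 * np.matmul(x, x.T)
    sqr_dists += sq_norms[:, np.newaxis]
    sqr_dists += sq_norms
    return sqr_dists # we are using integers, so no sqrt or clip needed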

Lemme compare it...

tidal bough Sep 5, 2021, 8:48 PM

#

tough bolt hmm, another problem I'm running into right now is this > numpy.core._exception...

oh well 😅

#

yeah, that's another problem of my vectorized solution

iron basalt Sep 5, 2021, 8:50 PM

#

Without the clipping you may get negative values that are very close to being zero.

#

So just clip them to 0
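
A small sketch of why the clip matters once the inputs are floats. The data is hypothetical: random points with a large common offset, chosen so that the cancellation in ||x||^2 - 2 x.y + ||y||^2 loses precision. The diagonal should be exactly 0 but can come out as tiny positive or negative residues, and a negative one would make `np.sqrt` return NaN:

```python
import numpy as np

# hypothetical float data: 50 points in 30 dimensions, all offset by ~1000,
# so the squared norms (~3e7) dwarf the squared distances (~60)
rng = np.random.default_rng(1)
x = rng.normal(loc=1000.0, scale=1.0, size=(50, 30))

sq_norms = np.sum(x**2, axis=1)
sqr_dists = -2 * (x @ x.T) + sq_norms[:, np.newaxis] + sq_norms  # expansion trick
print(np.diag(sqr_dists).min(), np.diag(sqr_dists).max())  # may show tiny nonzero residues

# clip before the sqrt so a tiny negative can't turn into NaN
dists = np.sqrt(np.clip(sqr_dists, 0, None))

# reference: direct differences (fine here because the data is small)
ref = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
print(np.abs(dists - ref).max())
```

With integer inputs, as in this thread, the arithmetic is exact (barring overflow), so there are no such residues to clip.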

tidal bough Sep 5, 2021, 8:50 PM

#

tidal bough nice, so basically: ```py def pairwise_dists_2(arrs): n = arrs.shape[0] ...

>>> res_2 = pairwise_dists_2(arrs)
>>> np.array_equal(res,res_2)
True
>>> %timeit pairwise_dists_2(arrs)
183 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Result is correct, and this is a bit slower than the numba solution but faster than my vectorized one

iron basalt Sep 5, 2021, 8:51 PM

#

Yea numba will be the best, minus the JIT time

tidal bough Sep 5, 2021, 8:51 PM

#

iron basalt Without the clipping you may get negative values that are very close to being ze...

In this case the original arr is of int dtype, I believe, and so are the distances, so no clipping is needed: floats aren't involved, so there's no rounding error.

tough bolt Sep 5, 2021, 8:51 PM

#

tidal bough ```py >>> res_2 = pairwise_dists_2(arrs) >>> np.array_equal(res,res_2) True >>> ...

hmm, so you think this approach would also solve the 180gb problem?

iron basalt Sep 5, 2021, 8:51 PM

#

You can do the solution I gave in numba too

tough bolt Sep 5, 2021, 8:51 PM

#

As the end result will always be an array that big, no?

tidal bough Sep 5, 2021, 8:52 PM

#

tough bolt hmm, so you think this approach would also solve the 180gb problem?

Neither pairwise_numba nor pairwise_dists_2 shares pairwise_dists's problem of allocating a giant array.

tidal bough Sep 5, 2021, 8:52 PM

#

tough bolt As the endresult always will be an array that big - no?

No, the final array is (n,n), so only about 1.8 GiB for you - pairwise_dists allocates an intermediate array of shape (n,n,<number of elements in each matrix>), and that's what kills the memory.
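
Those sizes can be checked with a quick back-of-the-envelope calculation, taking n = 22001, 100 elements per flattened matrix and 4-byte int32 elements from the MemoryError message, plus the 100k upper bound mentioned earlier:

```python
# Back-of-the-envelope memory estimate; sizes taken from the MemoryError message
n = 22001           # number of matrices
d = 100             # elements per flattened matrix
bytes_per_elem = 4  # int32

GiB = 2**30
final_bytes = n * n * bytes_per_elem             # the (n, n) result
intermediate_bytes = n * n * d * bytes_per_elem  # the (n, n, d) `diffs` array in pairwise_dists
final_100k = 100_000 * 100_000 * bytes_per_elem  # (n, n) result for a 100k-entry dataset

print(f"(n, n) result:          {final_bytes / GiB:.1f} GiB")         # 1.8 GiB
print(f"(n, n, d) intermediate: {intermediate_bytes / GiB:.1f} GiB")  # 180.3 GiB
print(f"(n, n) at n = 100k:     {final_100k / GiB:.1f} GiB")          # 37.3 GiB
```

So at 100k entries, even the (n, n) result alone would need roughly 37 GiB with 4-byte elements.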

tough bolt Sep 5, 2021, 8:53 PM

#

oh, I see

iron basalt Sep 5, 2021, 8:53 PM

#

The large memory issue comes from a very large intermediate value.

#

(Which the expression re-write avoids)
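
Spelled out, the re-write is the expansion of the squared Euclidean distance between two flattened matrices x_i and x_j with D elements each. In matrix form, with X the (n, D) array of flattened matrices and s the vector of squared row norms, the largest thing computed is the (n, n) product X Xᵀ, so the (n, n, D) array of differences never exists:

```latex
\|x_i - x_j\|^2 = \sum_{k=1}^{D} (x_{ik} - x_{jk})^2
              = \|x_i\|^2 - 2\,x_i^\top x_j + \|x_j\|^2

S = \mathbf{s}\,\mathbf{1}^\top - 2\,X X^\top + \mathbf{1}\,\mathbf{s}^\top,
\qquad s_i = \|x_i\|^2, \quad X \in \mathbb{R}^{n \times D}
```

This is exactly what pairwise_dists_2 computes: `-2 * np.matmul(x, x.T)`, plus the row norms broadcast down the rows (`[:, np.newaxis]`) and across the columns.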

tidal bough Sep 5, 2021, 8:54 PM

#

Basically, I'd suggest using the numba solution, not just because it's currently the fastest one, but also because it's the only one you'll easily be able to extend when you change your distance function. With vectorized ones, you'll have to work hard to figure out how to vectorize the new more complicated function (if it starts depending on other values than just the two, I mean), and it might even be impossible at all depending on the distance function

(well, I guess it would only be truly impossible if the dissimilarity isn't actually a distance function, since good distance functions should, at the very least, only depend on the two points)
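
To illustrate the extensibility point, a plain-Python/NumPy sketch of the loop shape (symmetric, so only the upper triangle is computed). The per-point `features` array and its `weight` are hypothetical stand-ins for something like velocity, and using a squared difference for the extra term is just an assumption for illustration. It's ordinary loops over arrays, the kind of code numba's `@njit` targets, though this exact version hasn't been compiled with numba here:

```python
import numpy as np

def pairwise_dissimilarity(points, features, weight=1.0):
    """Squared Euclidean distance plus a hypothetical extra term.

    points:   array of shape (n, N, M), e.g. the adjacency matrices
    features: array of shape (n,), a hypothetical per-point value (e.g. velocity)
    weight:   hypothetical scaling of the extra term
    """
    n = points.shape[0]
    flat = points.reshape(n, -1).astype(np.float64)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):  # symmetric: fill the upper triangle, mirror it below
            d = ((flat[i] - flat[j]) ** 2).sum()
            d += weight * (features[i] - features[j]) ** 2  # the extra factor plugs in here
            out[i, j] = d
            out[j, i] = d
    return out

# usage with random stand-in data
rng = np.random.default_rng(2)
pts = rng.integers(0, 2, size=(20, 4, 4))
vel = rng.normal(size=20)
print(pairwise_dissimilarity(pts, vel, weight=2.0).shape)  # (20, 20)
```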

tough bolt Sep 5, 2021, 8:55 PM

#

yeah ... I think I'm gonna need a minute to process all that 😅

mossy owl Sep 5, 2021, 8:55 PM

#

tidal bough Basically, I'd suggest using the numba solution, not just because it's currently...

we'll try it out

livid kiln Sep 5, 2021, 8:58 PM

#

When I read a multi index table, how do I specify the dtype of the header?

import io
import pandas as pd

df = pd.read_csv(io.StringIO('''combo,0,0,1,1,2,2
stock,weights,tickers,weights,tickers,weights,tickers
0,0.5,ADBE,0.7,AAPL,0.7,INTC
1,0.25,MSFT,0.1,MSFT,0.15,AMD
2,0.25,NVDA,0.1,NVDA,0.15,SKYT
3,,,0.1,GOOG,,

'''), header=[0, 1], index_col=0)

df.columns.get_level_values(0)

Index(['0', '0', '1', '1', '2', '2'], dtype='object', name='combo')

Here I would like the dtype to be int (not object)

tidal bough Sep 5, 2021, 8:58 PM

#

livid kiln When I read a multi index table, how do I specify the dtype of the header? ```py...

I indeed don't see any mentions of being able to specify the header dtype for read_csv, but have you tried converting it to int afterwards?

livid kiln Sep 5, 2021, 8:59 PM

#

tidal bough I indeed don't see any mentions of being able to specify the header dtype for `r...

Just found this solution: https://stackoverflow.com/questions/49199530/pandas-multi-index-column-header-change-type-when-reading-a-csv-file

#

It doesn't seem elegant tho 😅

tidal bough Sep 5, 2021, 9:01 PM

#

I believe it'll at least be efficient (changing the header's dtype only touches the column index and hopefully doesn't copy the entire DF), but yeah, annoying

livid kiln Sep 5, 2021, 9:03 PM

#

I tried using this:

#

df.columns = df.columns.set_levels(
      df.columns.get_level_values(0).astype(int), level=0
)

#

Gives me this error:

#

ValueError: Level values must be unique: [0, 0, 1, 1, 2, 2] on level 0

tidal bough Sep 5, 2021, 9:04 PM

#

that solution was for a multiindex, you probably need, uhhh...

#

actually, it's pretty fair

#

you indeed have duplicate values in your first row.

livid kiln Sep 5, 2021, 9:06 PM

#

But that is intentional

tidal bough Sep 5, 2021, 9:07 PM

#

Hmm, what should duplicate index values mean?

livid kiln Sep 5, 2021, 9:07 PM

#

Same group

#

e.g. I can already use this

df['0']

#

But would like to use this:

df[0]

mossy owl Sep 5, 2021, 9:09 PM

#

tidal bough Basically, I'd suggest using the numba solution, not just because it's currently...

hm the numba solution seems to work best
it is a bit slower than the numpy functions (but definitely still manageable), but it doesn't use that much RAM, so it's actually usable.
thanks

livid kiln Sep 5, 2021, 9:14 PM

#

livid kiln ```python df.columns = df.columns.set_levels( df.columns.get_level_values(...

@tidal bough I think it could be related to this:

df.columns.levels

gives:

FrozenList([['0', '1', '2'], ['tickers', 'weights']])

misty flint Sep 5, 2021, 9:20 PM

#

i wish we didn't have to learn feature eng in matlab

#

i'm pretty sure you can do the same things with python + opencv

#

like 90% sure

#

there are a lot of the same functions

livid kiln Sep 5, 2021, 9:25 PM

#

tidal bough Hmm, what should duplicate index values mean?

After reading this: https://github.com/pandas-dev/pandas/issues/28294
I found that set_levels interprets the passed values as the new unique values of the level, not as one label per column

df.columns = df.columns.set_levels(set(df.columns.get_level_values(0).astype(int)), level=0)

EDIT:
Actually a better way is:

df.columns = df.columns.set_levels(df.columns.levels[0].astype(int), level=0)

REF: https://stackoverflow.com/questions/49199530/pandas-multi-index-column-header-change-type-when-reading-a-csv-file/49199682#comment85952840_49199682
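
Putting it together, a sketch with the sample data from the question: `get_level_values(0)` returns one label per column (duplicates included), while `levels[0]` holds only the unique values the level is built from, and `set_levels` expects a replacement for the latter. That explains the "Level values must be unique" error when six values were passed, and why casting `levels[0]` works:

```python
import io
import pandas as pd

# same sample data as in the question
csv = '''combo,0,0,1,1,2,2
stock,weights,tickers,weights,tickers,weights,tickers
0,0.5,ADBE,0.7,AAPL,0.7,INTC
1,0.25,MSFT,0.1,MSFT,0.15,AMD
2,0.25,NVDA,0.1,NVDA,0.15,SKYT
3,,,0.1,GOOG,,
'''
df = pd.read_csv(io.StringIO(csv), header=[0, 1], index_col=0)

print(df.columns.get_level_values(0).tolist())  # ['0', '0', '1', '1', '2', '2'] (one per column)
print(df.columns.levels[0].tolist())            # ['0', '1', '2'] (unique level values)

# replace the unique level values; each column keeps pointing at the same position
df.columns = df.columns.set_levels(df.columns.levels[0].astype(int), level=0)
print(df[0].columns.tolist())                   # ['weights', 'tickers']
```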

glad mulch Sep 5, 2021, 10:52 PM

#

anyone know why my residual vs fitted vals are bounded near the bottom

plush pier Sep 5, 2021, 10:55 PM

#

What's a good set of modules to use for financial modeling? I want to run some simulations but there must be tools out there for modeling stochastic trend series and financial instruments.

#

I could reinvent the wheel but it would be pretty crude I think.

glad mulch Sep 5, 2021, 11:02 PM

#

plush pier What's a good set of modules to use for financial modeling? I want to run some ...

mplfinance

#

mlfinlab

desert oar Sep 6, 2021, 12:08 AM

#

glad mulch anyone know why my residual vs fitted vals are bounded near the bottom

Is there a natural upper or lower bound on the data?

thorn bobcat Sep 6, 2021, 12:30 AM

#

can someone explain a concept to do with gradients for me real quick?

velvet thorn Sep 6, 2021, 1:25 AM

#

thorn bobcat can someone explain a concept to do with gradients for me real quick?

well, nobody can answer you if you don't post the question

glad mulch Sep 6, 2021, 1:39 AM

#

desert oar Is there a natural upper or lower bound on the data?

there is a bound on the dependent variable: it can't be less than 0

thorn bobcat Sep 6, 2021, 1:39 AM

#

velvet thorn well, nobody can answer you if you don't post the question

i got it answered actually.

glad mulch Sep 6, 2021, 1:39 AM

#

and it can't be greater than 100
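
With a target bounded to [0, 100], each residual is boxed in by its own fitted value, so on a residual-vs-fitted plot the points must sit on or above the line e = -ŷ (and on or below e = 100 - ŷ). Observations with y = 0 land exactly on that lower line of slope -1, which is a likely explanation for the sharp bottom edge:

```latex
0 \le y_i \le 100
\quad\Longrightarrow\quad
-\hat{y}_i \;\le\; e_i = y_i - \hat{y}_i \;\le\; 100 - \hat{y}_i
```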

thorn bobcat Sep 6, 2021, 1:39 AM

#

it was about gradient descent but I got it figured out now.