#data-science-and-ml


desert oar
#
from math import isclose

# Lists of length-2 lists or tuples
computational_spans = [ ... ]
manual_spans = [ ... ]

def span_percent_overlap(lo1, hi1, lo2, hi2):
    # You can figure this part out :)
    ...

span_overlaps = {}
for comp_num, (comp_lo, comp_hi) in enumerate(computational_spans):
    for manu_num, (manu_lo, manu_hi) in enumerate(manual_spans):
        overlap = span_percent_overlap(comp_lo, comp_hi, manu_lo, manu_hi)
        if not isclose(overlap, 0.0):
            span_overlaps[comp_num, manu_num] = overlap
#

something like that maybe

slim drift
#

hmm ill try it

#

give me 1 sec

slim drift
desert oar
#

the starts and ends of the spans

#

as in "low" and "high"

slim drift
#

oh ok is it fine to keep it like that?

#

which part do i need to figure out

desert oar
#

variable names are up to you. this is just to demonstrate the use of enumerate and the nested loop

#

you need to figure out how to determine if 2 spans overlap

slim drift
desert oar
#

yes, isn't that what the contents of the lists are?

#
|----|
  |----|

|--------|
   |---|

|---|
      |----|

which spans overlap, and by how much? you need to write the code to do that part

#

it sounded like in your code the spans are represented as a pair of times, the start time and the stop time

#

which is a reasonable way to represent a time span!

desert oar
#

so given two start-stop pairs, how do you calculate the % overlap?
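
A minimal sketch of one way to fill in `span_percent_overlap`, assuming each span is a (start, stop) pair with start <= stop, and assuming "percent overlap" means the share of the first span that the second one covers. The chat never pins down the definition, so the choice of denominator is a guess:

```python
def span_percent_overlap(lo1, hi1, lo2, hi2):
    """Percent of span 1 (lo1..hi1) that is covered by span 2 (lo2..hi2).

    Assumption: the denominator is the length of span 1; the discussion
    does not say which span the percentage is relative to.
    """
    # The shared region starts at the later start and ends at the earlier stop.
    overlap = min(hi1, hi2) - max(lo1, lo2)
    length = hi1 - lo1
    if overlap <= 0 or length <= 0:
        return 0.0  # disjoint (or merely touching) spans, or an empty first span
    return 100.0 * overlap / length


# The three pictures drawn earlier in the chat, with made-up coordinates:
print(span_percent_overlap(0, 4, 2, 6))   # partial overlap
print(span_percent_overlap(0, 8, 3, 7))   # span 2 inside span 1
print(span_percent_overlap(0, 3, 6, 10))  # no overlap
```

Dividing by the shorter span, or by the union of both spans, would be equally defensible; only the last line of the function changes.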

slim drift
#

well there's time in between as well

desert oar
#

i'm not sure what you mean by that

slim drift
#

there is time = 0, time =100, but also there are times in between

desert oar
#

i still don't know what you mean by that

#

you asked how to find the overlaps between two lists of time spans, i am showing you the rough sketch of how you might do that

slim drift
forest beacon
#

Hello!

fig = plt.Figure()
ax = fig.subplots(1)
plt.gcf().autofmt_xdate()
ax.set_facecolor('#1E1E1E')
fig.patch.set_facecolor('#1E1E1E')
x, y = [], []
async for document in db.main.stats.find({}):
    x.append(datetime.fromtimestamp(document['time']))
    y.append(document['total_users'])
ax.plot(x, y, color='orange', linestyle='-')
ax.grid(color='#323232', linewidth=1, linestyle='-')
ax.set_xlabel('Время')  # "Time"
ax.set_ylabel('Количество пользователей')  # "Number of users"
ax.tick_params(axis='both', colors='white')
ax.spines['top'].set_color('white')
ax.spines['bottom'].set_color('white')
ax.spines['left'].set_color('white')
ax.spines['right'].set_color('white')
ax.xaxis.label.set_color('white')
ax.yaxis.label.set_color('white')
ax.xaxis.set_major_locator(HourLocator(interval=2))
ax.xaxis.set_major_formatter(DateFormatter('%H:%M', tz=tz.gettz('Europe/Moscow')))
ax.yaxis.set_major_formatter(plt.FuncFormatter(format_tick))
plt.close('all')
fig.savefig('./data/stats.png', bbox_inches='tight', dpi=256)

I use matplotlib to create a stats graph for my bot. It runs every 15 minutes to update the image, and each time matplotlib uses some memory but doesn't release it. I tried plt.cla(), plt.close(), fig.clear() and fig.clf(), but these have no effect.
I even asked this question on StackOverflow: https://stackoverflow.com/questions/69144822/releasing-memory-in-matplotlib
Maybe somebody here has had the same problem.
Help please.
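
One thing that stands out in the snippet: `plt.Figure()` builds a figure that pyplot does not manage, while `plt.gcf()` then creates a second, pyplot-managed figure, so `autofmt_xdate()` is applied to a different figure than the one being saved. A minimal sketch that avoids pyplot's global state entirely, assuming the plot only needs to be written out as a PNG. `render_stats_png` and its arguments are hypothetical names, and whether this removes the memory growth in the bot is not confirmed here:

```python
import io
from datetime import datetime, timedelta

from matplotlib.dates import DateFormatter, HourLocator
from matplotlib.figure import Figure


def render_stats_png(times, totals):
    """Render a line chart to PNG bytes without touching matplotlib.pyplot."""
    fig = Figure()  # not registered in pyplot's figure registry
    ax = fig.subplots()
    ax.plot(times, totals, color="orange", linestyle="-")
    ax.xaxis.set_major_locator(HourLocator(interval=2))
    ax.xaxis.set_major_formatter(DateFormatter("%H:%M"))
    fig.autofmt_xdate()  # applied to the same figure that is saved
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    return buf.getvalue()


# Made-up sample data: one point per hour for six hours.
start = datetime(2021, 9, 12, 0, 0)
sample_times = [start + timedelta(hours=h) for h in range(6)]
png_bytes = render_stats_png(sample_times, [10, 12, 15, 15, 18, 21])
```

Because the figure is only referenced inside the function, it becomes eligible for garbage collection as soon as the function returns.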

desert oar
slim drift
desert oar
#

yes, you'd probably want to use if at some point

#

you'll probably need the comparison operators like >, <=, etc. and the math operators like -

#

i highly suggest writing out your solution on paper first

#

do some test calculations by hand, to make sure it makes sense

#

then translate your solution to code

#

professionals do this all the time. it's a tool for thought

slim drift
#

ok i will write it out

olive shore
#

Does anyone have a reddit dataset that I could use to train GPT 2?

reef wing
bronze rampart
#

hello all

hasty grail
#

Hello, do you need help with something?

bronze rampart
#

definitely need help

#

having hard time making viz my default editor

#

cant install viz module

royal crest
bronze rampart
#

thanks

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1632201233:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

minor spoke
#

Hey can someone help me with ML stuff, I'm trying to work with upsampling point clouds

lone drum
#

Hello

#

My code

I tried
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}' , engine='python',  chunksize=100000 , iterator=True)):
    
    sep_time = pd.DataFrame()
    print('chunk no.:', i)
    for n in chunk['Transaction Time']:

        print('n:', n)
        
        converted_timestamp = datetime.fromtimestamp(n /1_000_000_000)
        print('converted:', converted_timestamp)
        print()
        
        get_month = converted_timestamp.month
        
        get_day = converted_timestamp.day
         
        #add 10 to year
        year_add_10 = converted_timestamp.year+10

        #hour
        get_hour = converted_timestamp.hour

        #minute
        get_min = converted_timestamp.minute

        #second
        get_sec = converted_timestamp.second
        
        # microsecond
        get_microsec = converted_timestamp.microsecond

        c = datetime(year_add_10, get_month, get_day, get_hour,get_min, get_sec, get_microsec)
        print('datetime:', c)
        print()
        final_date_time = datetime.strftime(c, '%d-%m-%Y %H:%M:%S.%f')
        print('final_date_time:',final_date_time)
        print()

        chunk['time_seprated'] = final_date_time
        
        print('chunk1:')
        print(chunk)
        print()
this way
#

I am getting same value in last column

chunk1:
      Msgtype Activity Type  ...   Qty               time_seprated
0       Order             N  ...  1175  28-08-2021 14:45:08.107780
1       Order             N  ...  1175  28-08-2021 14:45:08.107780
2       Order             N  ...   500  28-08-2021 14:45:08.107780
3       Order             N  ...  1175  28-08-2021 14:45:08.107780
4       Order             M  ...  1175  28-08-2021 14:45:08.107780
      ...           ...  ...   ...                         ...
99995   Order             M  ...    50  28-08-2021 14:45:08.107780
99996   Order             M  ...    25  28-08-2021 14:45:08.107780
99997   Order             M  ...    50  28-08-2021 14:45:08.107780
99998   Order             M  ...    25  28-08-2021 14:45:08.107780
99999   Order             M  ...    50  28-08-2021 14:45:08.107780
#

Can anyone help me to understand why I am getting same value in last column

#

Ping me when replying

royal crest
lone drum
lapis sequoia
#

bet

#

love this place

royal crest
#

i think rule 8 applies at this point

#

i won't be involved

royal crest
#

if the graph is generated as expected then you could probably rule out your matplotlib code

#

it might actually be the way your async code is set up instead

forest beacon
lone drum
gray tartan
#

Hey ! I have an issue with pandas (that i asked in #help-avocado and another channel before that)
should i copy it there or should i just wait ?
It is pretty blocking :/
Thanks in advance for the help !

uncut barn
#
o180_img10_patch_419_43.jpg, 
o180_mask10_patch_419_43.jpg

I want to check whether these strings match from patch_419_43.jpg onward, i.e. I want to first find the word patch and print it together with everything to its right in the string. I have an idea of using some kind of regex but I'm not sure how to implement it; any help will be much appreciated.
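
A small sketch of one way to do this with `str.partition`, which needs no regex. It assumes every filename contains the literal word `patch` exactly where the shared suffix starts; `patch_suffix` is a hypothetical helper name:

```python
def patch_suffix(filename):
    """Return the part of filename from the first 'patch' onward, or None."""
    _, found, tail = filename.partition("patch")
    return found + tail if found else None


img = "o180_img10_patch_419_43.jpg"
mask = "o180_mask10_patch_419_43.jpg"

print(patch_suffix(img))
print(patch_suffix(img) == patch_suffix(mask))
```

With a regex, `re.search(r"patch.*", filename)` should find the same substring.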

gaunt marsh
#

Thank you. What is that 06.2f in return '(' + ', '.join(format(val, '06.2f') for val in rgb) + ')' doing?
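
`'06.2f'` is a format specification: `0` pads with zeros, `6` is the minimum total width, `.2` keeps two digits after the decimal point, and `f` selects fixed-point notation. A short illustration with made-up values (the `rgb` tuple here is an assumption, not the data from the original code):

```python
rgb = (3.14159, 12.5, 255)

for val in rgb:
    print(format(val, "06.2f"))

label = "(" + ", ".join(format(val, "06.2f") for val in rgb) + ")"
print(label)
```

The width counts the decimal point too, so `255` already fills all six characters as `255.00` and gets no extra zeros.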

gaunt marsh
#

I am getting TypeError: unsupported operand type(s) for +: 'int' and 'str' How can I fix this?

I use this code:

plt.axis('off')
plt.show()

This is what rgb_labels look like
pliant shadow
#

has anyone here dealt with SageMath before?

desert oar
#

@dull turtle so not only did you refuse to do any work for yourself, but this also turned out to be a homework assignment? That's quite unethical. You are only cheating yourself by refusing to study and learn.

desert oar
desert oar
#

@gaunt marsh use the counts for the sizes and the labels for the axis labels

lapis sequoia
#

Hey

#

Has anyone worked on Marketing Mix Modelling?

dull turtle
#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

gaunt marsh
#

I need help with autopct for my pie chart. I have the absolute values as a list and want to pass them. The examples from stackoverflow didn't work

tidal bough
#

https://towardsdatascience.com/a-simple-cnn-multi-image-classifier-31c463324fa

Why does this article call it a CNN?
the model seems to be:

model = Sequential() 
model.add(Flatten(input_shape=train_data.shape[1:])) 
model.add(Dense(100, activation=keras.layers.LeakyReLU(alpha=0.3))) 
model.add(Dropout(0.5)) 
model.add(Dense(50, activation=keras.layers.LeakyReLU(alpha=0.3))) 
model.add(Dropout(0.3)) 
model.add(Dense(num_classes, activation='softmax'))

which I don't see any convolutions in

hasty grail
#

No idea

#

I don't think transfer learning is taking place either - the pretrained VGG16 model is only used to generate the labels

inner imp
#

Hello. I know basics of Python, and i wanna learn machine learning, AI and stuff. Can someone recommend me a good resource where i can learn 4 free?

dull turtle
#
Traceback (most recent call last):

  File "F:\office codes\transaction time.py", line 52, in <module>
    chunk['sep_time'] = sep_time

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
    self._set_item(key, value)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
    value = sanitize_index(value, self.index)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
    raise ValueError(

ValueError: Length of values (1) does not match length of index (100000)
#

my code ```python
final_date_time = datetime.strftime(c, '%d-%m-%Y %H:%M:%S.%f')
#New_Time.append(final_date_time)
sep_time.append(final_date_time)
print('final_date_time:', final_date_time)
print()

    chunk['sep_time'] = sep_time

    new_path = r'F:/output csv files/'
    file_name = 'Stream-5 datetime extracted'
    extension = '.csv'
    chunk.to_csv(f'{new_path}{file_name}{extension}', index=False, mode='a', header=None)
    print('data saved in extracted csv file')``` my code
#

how can i add the output as a column at the end of the dataframe

#

ping me when replying

desert oar
#

@dull turtle you've repeatedly ignored and disregarded something like an hour of help and attention from multiple people here

#

And now you're re-asking the original question

worldly lake
#

I have two arrays:

massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

anyone knows how i can combine them into one array like it:

massive_third = [{"sku": "1", "type": "car", "color": "yellow", "price": "4000", "quentity": "8"}]

but with for cycle for every itteration?

dull turtle
desert oar
dull turtle
#

i've been stuck on this since yesterday, can u please look into this?

worldly lake
#

@desert oar dataframe from pandas module?

lapis sequoia
#

so @dull turtle is this like small array or like big chunk of data?

#

if small array, i mean it should not be too hard IMO.

desert oar
lapis sequoia
#

also you mean to combine them assuming merge by indexes right?

dull turtle
#

so i am using chunksize for it

desert oar
lapis sequoia
# dull turtle my data is of 5.5 gb

oh df may be bad way for it then, since it will take quite RAM.
I suggest simply using loops.
assuming by 5.5gb you mean its in some kind of file.

desert oar
#

And the problem is that they are re-inserting the column at every iteration of a loop when they should be inserting it only once

#

Which we have explained over and over
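
The effect described here can be reproduced on a tiny frame. Assigning a single value to `chunk[...]` fills the whole column with it, so doing that inside a per-row loop leaves every row holding the last value. A minimal sketch with made-up timestamps, assuming the column holds integer epoch nanoseconds; the column names `inside_loop` and `once` are illustrative only:

```python
import pandas as pd

chunk = pd.DataFrame(
    {
        "Transaction Time": [
            1314520000000000000,
            1314520001000000000,
            1314520002000000000,
        ]
    }
)

# Pattern from the question: a scalar assigned inside the row loop.
for n in chunk["Transaction Time"]:
    chunk["inside_loop"] = n  # overwrites the entire column each time

# Alternative: build the whole column once, outside any row loop.
chunk["once"] = pd.to_datetime(chunk["Transaction Time"], unit="ns")

print(chunk["inside_loop"].nunique())
print(chunk["once"].nunique())
```

The first column ends up with a single distinct value, the second with one value per row.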

lapis sequoia
desert oar
dull turtle
lapis sequoia
#

that is great!! I'll look into chunksize in my free time.

lapis sequoia
worldly lake
#

@desert oar I have two arrays that I want to link using the same "sku" (article) keys, but the problem is that the second array contains some objects that are not in the first one, and I need to somehow compare and display them. I thought at first of making an object, but I need to iterate over two arrays at the same time. I read about itertools.chain but it doesn't really help me much.

lapis sequoia
dull turtle
desert oar
lapis sequoia
#

also please no DM. @dull turtle

desert oar
#

@worldly lake I can't tag you because you have RTL in your name

dull turtle
desert oar
#

oh it's only buggy on mobile

#

the tagging works on desktop

worldly lake
#

@lapis sequoia Sorted, yes. I just have a list of products in the first array, and information about the price and quantity of each product in the second. And so it turns out that there is a product in the first array, but there is no price for it, which is why the first array of objects contains more than the second. And because of this, I cannot sort. Although I can export the sku data for the goods and then check whether the ones I need are in that export. I seem to have solved my problem.

desert oar
#

i would still use a dataframe here

#

implementing a join manually over lists would be slow and a lot more code than you need

lapis sequoia
#

wait you both have different questions?

worldly lake
#

@desert oar sorry for that

desert oar
#

they did

# inputs
massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

# result
massive_third = [{"sku": "1", "type": "car", "color": "yellow", "price": "4000", "quentity": "8"}]
dull turtle
#

salt rock can we discuss again ? please

lapis sequoia
#

okay okay. hold on.

desert oar
#

i won't re-answer the same question over and over

#

@worldly lake doing this "manually" would entail doing 2 loops: one over massive_first, and another over massive_second - basically comparing all pairs of elements

dull turtle
#

my data

desert oar
#

you can speed this up by using some kind of index instead of just scanning the list. the simplest thing to do in python would be to build a hash table of SKU -> record
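
A minimal sketch of the hash-table idea using a plain `dict`, assuming every record has a unique `sku`. The second list here includes an extra made-up SKU `2` to show what happens to records with no match; `merged` and `unmatched` are illustrative names:

```python
massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [
    {"sku": "1", "price": "4000", "quentity": "8"},
    {"sku": "2", "price": "150", "quentity": "3"},  # made-up record with no partner
]

# Index the first list once: SKU -> record.
first_by_sku = {record["sku"]: record for record in massive_first}

merged, unmatched = [], []
for record in massive_second:
    partner = first_by_sku.get(record["sku"])
    if partner is None:
        unmatched.append(record)
    else:
        merged.append({**partner, **record})  # one dict with keys from both

print(merged)
print(unmatched)
```

Each lookup in `first_by_sku` is a single dictionary access, so the second list is scanned only once instead of once per element of the first.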

dull turtle
lapis sequoia
desert oar
#

that's true too

dull turtle
#

blank csv file i get yesterday

desert oar
#

but again, doing this all in pandas would make this easier

#

yes @dull turtle, i said that this was untested code written partly by guessing, and that you need to double-check the documentation for these functions to see how to actually use it

#

set mode='a' in my code, try that

worldly lake
lapis sequoia
worldly lake
#

@lapis sequoia the first massive has an object with sku 107 but the second massive does not

lapis sequoia
#

that seems like a small data

#

wait is that js?

desert oar
#

"massive" would be like 107 million 🙂

#

and even then, that's pretty small for some systems

lapis sequoia
#

don't tell me the arrays you shared are array of jsons and not dicts :'(

desert oar
#

doing that by hand would be a lot slower and a lot more verbose

#

you might have to change how= depending on the desired effect:

df_result = df1.join(df2, how='inner')
df_result = df1.join(df2, how='outer')
df_result = df1.join(df2, how='left')
df_result = df1.join(df2, how='right')
burnt knot
#

So basically throughout August and September I've been beside myself as I struggled to figure out how to build a model that can generate TTS for English and Japanese in one sentence

desert oar
burnt knot
#

I used Multi-Tacotron-Voice-Cloning for the most part
This is a GitHub repository designed to make it easy to do multiple languages cloned by swapping out the English and Russian in the dataset

#

It's divided into four portions: g2p, encoder, synthesiser, and vocoder

#

I spent a good deal of time successfully making a g2p collection for English and Japanese

#

The problem comes with everything after that

#

The instructions are set out plainly for me to use accordingly

#

Probably because there's little room for error if I do it right

desert oar
#

you can also use pd.merge which is a slightly different interface (more concise in this particular case):

df1 = pd.DataFrame(massive_first)
df2 = pd.DataFrame(massive_second)

df_result = pd.merge(df1, df2, on='sku', how='outer')
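
For the case mentioned earlier where one list has SKUs the other lacks, `pd.merge` accepts `indicator=True`, which adds a `_merge` column saying where each row came from. A small sketch; SKU `107` is borrowed from the discussion, but its field values are made up:

```python
import pandas as pd

massive_first = [
    {"sku": "1", "type": "car", "color": "yellow"},
    {"sku": "107", "type": "bike", "color": "red"},  # made-up values
]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

df1 = pd.DataFrame(massive_first)
df2 = pd.DataFrame(massive_second)

# how='outer' keeps rows from both sides; _merge marks unmatched ones.
df_result = pd.merge(df1, df2, on="sku", how="outer", indicator=True)
missing_price = df_result.loc[df_result["_merge"] == "left_only", "sku"].tolist()

print(df_result)
print(missing_price)
```

Rows marked `left_only` exist only in the first frame, `right_only` only in the second, and `both` in each.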
burnt knot
#

It seems the main reason I can't make this model has to do with my choice for audio?

#

I struggled with the encoder step in terms of formatting and collecting English voices, but I got it to work
But then when I got to the synthesiser step I was stuck because I didn't have a metadata file

#

I don't know how I was expected to organise my data because it really hurt me in the process

#

I think I could overcome this if I figure out how to properly format any kind of dataset I have
But I don't really know how that would work because the author of the repository never addressed this

#

That's where I've been throughout September

#

I tried asking for help from other people that do voice cloning but they don't have a clue what to tell me

#

I'm running out of ideas and haven't figured out how to start over

worldly lake
#

@desert oar@lapis sequoia oh my godddddddddddd, thanks you both very much!!!!!

#

this is exactly what I needed!

#

can I express my gratitude to you?
can there be any section with reviews / reviews? hah: D

dull turtle
#

@desert oar hello i am using this code ```python
df_chunks = pd.read_csv(f'{path}{file_name}{extension}',
                        engine="python",
                        chunksize=100000,
                        iterator=True,
                        )

for i, chunk in enumerate(df_chunks):
    print("chunk no.:", i)

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365)
    chunk.insert(2, "time_separated", time_separated)
    del chunk["Transaction Time"]

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_append = not is_first_chunk

    append=write_append

    chunk.to_csv(filename_out, header=write_header, mode='a' )

    chunk.to_csv(f'{new_path}{output_file_name}{extension}', header=write_header,mode='a' if write_append else 'w')
```
#

can we discuss again ?

desert oar
#

!paste use this site 👇

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar
#

you only need to post the first 10 rows or something like that

#

also, @dull turtle is this actually a school assignment? if so, i will have to stop giving answers as per our rules, but i will at least try to debug what i already wrote

dull turtle
#

do u get my data in pastebin link ?

desert oar
#

yes, thank you

#

i noticed that another user posted very similar code and almost the exact same problem

#

are you collaborating with someone on it? are you using 2 different accounts?

#

were you given a template by your instructor?

dull turtle
#

not at all

desert oar
#

this is the same code as your original code, right down to the typos

dull turtle
#

i have written script for it

#

if u check my code u will find

#

now i just want to add that output column at end of df

#

do u get my point what i am trying to do ?

desert oar
#

yes, i understand. but i am trying to get more context for this

lapis sequoia
dull turtle
desert oar
#

what is the nature of the project? what kind of task were you given? are you going to just use my code and claim it as your own?

#

anyway, i tested my code and this worked as expected:

from datetime import timedelta

import pandas as pd


filename_in = "..."
filename_out = "..."

df_chunks = pd.read_csv(filename_in, sep='\t', chunksize=5, iterator=True)

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(filename_out, header=write_header, mode=write_mode, index=False, sep='\t')
#

obviously that chunksize is very small, it was just for testing
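
One caveat worth checking: `timedelta(days=365*10)` adds exactly 3650 days, which is not the same as ten calendar years once leap days are involved. A standard-library sketch of the difference, using a made-up epoch-nanosecond value of the same magnitude as the column (in pandas, `pd.DateOffset(years=10)` should be the calendar-aware counterpart):

```python
from datetime import datetime, timedelta, timezone

ns = 1314520000000000000  # made-up epoch-nanosecond value
original = datetime.fromtimestamp(ns / 1_000_000_000, tz=timezone.utc)

# Fixed number of days: ignores the leap days in between.
fixed_days = original + timedelta(days=365 * 10)

# Calendar years: same month and day, ten years later.
# replace() raises ValueError for Feb 29 if the target year is not a leap year.
calendar_years = original.replace(year=original.year + 10)

print(original.date(), fixed_days.date(), calendar_years.date())
```

The two results differ by the number of leap days that fall inside the ten-year window.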

burnt knot
#

I basically used the Google Colab provided and rewrote it to be mounted to my drive and work from there with the instructions

#

I've gone through many issues environment-wise, but I believe the problem is entirely in making sure all my files are in place

#

Something that I haven't been able to understand

dull turtle
#

?

#

Traceback (most recent call last):

  File "F:\office codes\transaction time seprateed code.py", line 46, in <module>
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err

KeyError: 'Transaction Time'

this error i am getting @desert oar
desert oar
#

i can't reproduce the error, works fine for me

dull turtle
quasi parcel
#

hi @desert oar i hope you are doing well, your code works well for string but for int or float its not working could you help when you are free?

dull turtle
desert oar
#

i don't know why you're getting the error, because it works for me

lapis sequoia
desert oar
#

that seems odd

#

more realistically del did something, i've since removed that from my code

desert oar
quasi parcel
#

i am sorry

#

sure

serene scaffold
# quasi parcel i am sorry

Part of the advantage of this channel is that lots of people with data science knowledge hang out here. salt rock lamp is very knowledgeable, but it's best to pose your questions as something that anyone can help you with.

quasi parcel
#

i agree sorry @serene scaffold

lapis sequoia
#

lol its alright, anyways, you may want to reshare the question/issue.

#

or atleast put the link of question shared previously

quasi parcel
#

sharing in two mins

#

@lapis sequoia

dull turtle
lapis sequoia
#

either you're doing something wrong with the data. or for that chunk that data is null in all cases(an assumption)

a simple way to debug would be to print the column names and/or see if you're deleting some columns or something

dull turtle
# lapis sequoia either you're doing something wrong with the data. or for that chunk that data i...

my code here ```python
df_chunks = pd.read_csv(f'{path}{file_name}{extension}', sep='\t', chunksize=100000, iterator=True)

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(f'{new_path}{output_file_name}{extension}', header=write_header, mode=write_mode, index=False, sep='\t')
```
#

i am using salt rock lamp code

lapis sequoia
#
print(chunk.columns.tolist())

you can see columns when this breaks to see if Transaction Time exists over there.

dull turtle
burnt knot
#

I was working with this (running into problems of missing files so I couldn't complete it properly) but I have doubts that this would be effective in constructing a voice cloning model for English-Japanese

lapis sequoia
dull turtle
#
for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")
    print(chunk.columns.tolist())
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

my code @lapis sequoia
burnt knot
#

I don't know if there are things in the code I need to rewrite for this to work because I haven't figured that out
But I started conflating this with the other code and now I'm a mess

dull turtle
# lapis sequoia is that the last output when it breaks?
Chunk: 0
['Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,Buy/Sell,Buy Order Number,Sell Order Number,Price,Qty']

error ```python
Traceback (most recent call last):

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Transaction Time'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "F:\office codes\transaction time seprateed code.py", line 46, in <module>
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err

KeyError: 'Transaction Time'```
lapis sequoia
#

not sure what is wrong then 😓

dull turtle
#

okay np thanks for your efforts

#

if anyone has solution for my issue please ping me

quasi parcel
#

@lime hemlock do you need a separate data frame for the transaction time?

#

is the transaction time in epoch?

dull turtle
quasi parcel
#

have you tried chunk['Transaction Time']

dull turtle
dull turtle
quasi parcel
#

ohh wait

#

i have encountered this

#

you can do this

dull turtle
#

the output came from Transaction Time i have to add this in column at end of df

dull turtle
quasi parcel
#

chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]

#

this should do the trick

#

@dull turtle

coral isle
#

Hello
I need help to delete cells in excel file and shift the rest to left such as
The photo

I will delete any cell before NR and shift left , if the row haven't NR should be deleted and shift up

quasi parcel
dull turtle
# quasi parcel this will create a new colomn in the exsisting df and convert the epoch to date
Traceback (most recent call last):

  File "F:\office codes\transaction time.py", line 46, in <module>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]

  File "F:\office codes\transaction time.py", line 46, in <listcomp>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]

OSError: [Errno 22] Invalid argument
quasi parcel
#

one moment

dull turtle
#

sure, ping me

quasi parcel
#

chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]

#

can you show me the code after adding this line please

#

?

#

os error generally occur when there is an invalid file path
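
In the traceback above, the `OSError` is raised inside `datetime.fromtimestamp`, not by a file operation. A likely cause, assuming the column holds epoch nanoseconds (values around 1.3e18, as the earlier code that divides by 1_000_000_000 suggests): dividing by 1000 leaves a number of seconds far beyond what the platform can convert. A sketch, with `to_datetime_or_none` as a hypothetical helper:

```python
from datetime import datetime, timezone


def to_datetime_or_none(value, divisor):
    """Convert value / divisor seconds since the epoch to a UTC datetime.

    Returns None when the result is out of range; depending on the platform,
    fromtimestamp signals that with OverflowError, OSError, or ValueError.
    """
    try:
        return datetime.fromtimestamp(value / divisor, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        return None


ns = 1314520000000000000  # made-up epoch-nanosecond value
print(to_datetime_or_none(ns, 1000))           # treated as milliseconds
print(to_datetime_or_none(ns, 1_000_000_000))  # treated as nanoseconds
```

Only the nanosecond divisor produces a date in a plausible range for this kind of value.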

dull turtle
quasi parcel
#

try this once

#

there will be some indentation issue but

burnt knot
#

I just hope my issue isn't terribly hard to address without testing out the program yourself
Which might be why I never get a response to my issue

quasi parcel
#

@burnt knot what was your issue again can you please give the link

dull turtle
# quasi parcel ```chunk['Transcation time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d...
chunk no.: 0
Traceback (most recent call last):

  File "F:\office codes\transaction time.py", line 19, in <module>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]

  File "F:\office codes\transaction time.py", line 19, in <listcomp>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]

OSError: [Errno 22] Invalid argument
dull turtle
quasi parcel
#

okay,

#

is it okay to give 10 rows in .csv format

#

@dull turtle

dull turtle
desert oar
# quasi parcel is it okay to give 10 rows in .csv format

this is what they gave me (tab-separated)

Msgtype    Activity Type    Transaction Time    Exchange    Token    Symbol    Buy/Sell    Buy Order Number    Sell Order Number    Price    Qty
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473430    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255595    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255600    500
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    607185    1175
Order    M    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255605    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255610    250
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473425    250
Order    M    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473420    250
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    607180    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473415    500
#

and this was the code i wrote, which worked

from datetime import timedelta

import pandas as pd


filename_in = "pd_chunks_1.csv"
filename_out = "pd_chunks_2.csv"

df_chunks = pd.read_csv(filename_in, sep='\t', chunksize=2, iterator=True)

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(filename_out, header=write_header, mode=write_mode, index=False, sep='\t')
quasi parcel
#

got it sir let me check

arctic wedgeBOT
#

Hey @dull turtle!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

dull turtle
quasi parcel
#

sure

dull turtle
#

plz check

burnt knot
quasi parcel
#

@desert oar your code works like a charm, the only issue was that @dull turtle was using comma-separated data but you used tab-separated

#

i have changed sep='\t' to sep=','

#

it started working
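
The symptom was visible in the earlier debug output: `chunk.columns.tolist()` printed a single column name containing every header joined by commas, which is what happens when `sep` does not match the file. A standard-library sketch of a quick check on the header line; `guess_separator` is a hypothetical helper, and it only distinguishes commas from tabs:

```python
def guess_separator(header_line):
    """Guess whether a header line is comma- or tab-separated by counting."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","


header = (
    "Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,"
    "Buy/Sell,Buy Order Number,Sell Order Number,Price,Qty"
)

sep = guess_separator(header)
columns = header.split(sep)
print(repr(sep), len(columns), "Transaction Time" in columns)
```

pandas can also attempt this itself: passing `sep=None` together with `engine='python'` to `pd.read_csv` asks the Python engine to sniff the delimiter.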

quasi parcel
#

@burnt knot what is wrong in first repo

#

?

#

could you please elaborate on the error

burnt knot
#

Okay
So the guy who made it listed the voice banks he used for his data for both English and Russian
I couldn't use his English voice bank because I had complications downloading it so I used my own, and I got my Japanese voice bank

#

The audio work starts in the encoder stage

#

It took me awhile to get anywhere with the audio but I eventually got it to train
But then I got to the synthesiser stage

#

I couldn't run the first command because it stated it needed a metadata.csv file, which I could not find
I take it that it's a graph with all the metadata or however that works
My point is I can't run the step without having the voice metadata file to give it

#

I don't know if it mattered on the kind of voices I was using--I'm using a new batch this time around

#

But I can't figure out how to work that synthesiser step

quasi parcel
#

okay

#

let me run the first repo do you have your data?

#

@burnt knot

burnt knot
#

You want me to share it?

quasi parcel
#

yes

#

if you are okay with it

#

?

quasi parcel
#

got it

#

it might take some time to debug your issue, it's a bit complex

burnt knot
#

Do you have the notebook too?

#

@quasi parcel

quasi parcel
#

your data is downloading

#

its still loading

#

sorry my bad

#

net

burnt knot
#

No I mean
Do you need the link to the Colab I used?
Or do you prefer to work on the hard drive?

modest timber
#

Hi could I ask question with pandas dataframe?

#

Or should I wait, there is queue 🙂

burnt knot
#

Ask away
There isn't a queue, you can expect a response eventually

quasi parcel
#

okay

#

my bad could you run colab again

#

?

#

sorry

modest timber
#

hey, please help.. I have code that adds two SMA columns (simple moving averages) to the data
On every loop iteration it is supposed to add 2 SMA columns and then start the next iteration from the original data frame (at the beginning of the loop I do df=data for that) - but for unknown reasons the columns keep accumulating every iteration, as if it were still working on the same old data.

#
import pandas as pd
def checkBestSMA(file, minSMA1, maxSMA1, minSMA2, maxSMA2):
    maxSMA1 += 1
    maxSMA2 += 1
    data = pd.read_csv(file)
    # read file .csv
    data = data.iloc[:, [0, 5]]
    # get only DATE and stock price
    for i in range(minSMA1, maxSMA1):
        for j in range(minSMA2, maxSMA2):
            df = data
            SMA1 = f"SMA{i}"
            SMA2 = f"SMA{j}"
            df[SMA1] = df['Adj Close'].rolling(i).mean()
            df[SMA2] = df['Adj Close'].rolling(j).mean()
            df = df.dropna()
            print(df.head())
if __name__ == '__main__':
    checkBestSMA('PKN.WA.csv', 20,50,40, 200)
#
          Date  Adj Close      SMA20      SMA40
39  2010-02-25  22.858307  23.167923  24.311018
40  2010-02-26  23.519203  23.131645  24.289287
41  2010-03-01  23.914301  23.126617  24.251393
42  2010-03-02  24.711685  23.133801  24.210088
43  2010-03-03  24.783520  23.132005  24.168782
          Date  Adj Close      SMA20      SMA40      SMA41
40  2010-02-26  23.519203  23.131645  24.289287  24.291705
41  2010-03-01  23.914301  23.126617  24.251393  24.280141
42  2010-03-02  24.711685  23.133801  24.210088  24.262620
43  2010-03-03  24.783520  23.132005  24.168782  24.224074
44  2010-03-04  24.855356  23.196657  24.130170  24.185527
          Date  Adj Close      SMA20      SMA40      SMA41      SMA42
41  2010-03-01  23.914301  23.126617  24.251393  24.280141  24.282719
42  2010-03-02  24.711685  23.133801  24.210088  24.262620  24.290416
burnt knot
quasi parcel
#

no can you just run the code

#

it was asking auth

burnt knot
#

Try now?

quasi parcel
#
/content
#

@burnt knot

#

how do you want the data to look like @modest timber

modest timber
#
          Date  Adj Close      SMA20      SMA40
39  2010-02-25  22.858307  23.167923  24.311018
40  2010-02-26  23.519203  23.131645  24.289287
          Date  Adj Close      SMA20      SMA41
40  2010-02-26  23.519203  23.131645  24.291705
41  2010-03-01  23.914301  23.126617  24.280141
          Date  Adj Close      SMA20      SMA42
40  2010-02-26  23.519203  23.131645  24.291705
41  2010-03-01  23.914301  23.126617  24.280141
......
#

@quasi parcel

#

I can send you .csv if you want

quasi parcel
#

could you

modest timber
#

For me, it is weird behavior of pandas.df

#

I wonder why this happen - ( or I have some bug)

quasi parcel
#

can you share a screenshot on how you want the data to be

#

its kinda confusing sorry

modest timber
#

ok, - you know its part of bigger project

lapis sequoia
#

so right now you're basically adding a lot of columns right?

#

and what behaviour do you want?

modest timber
#

No, this function should add only two columns, and then discard the dataframe

#

and start with a new one - so I'm loading df=data

#

at the top of the block in the loop

lapis sequoia
#

okay okay okay so you want to use new df everytime

modest timber
#

y

#

🙂

lapis sequoia
#

you may wanna use copy. df=data just makes df another name for data.

#

!d pandas.DataFrame.copy

arctic wedgeBOT
#

DataFrame.copy(deep=True)
Make a copy of this object’s indices and data.

When `deep=True` (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When `deep=False`, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
lapis sequoia
#

try with df = data.copy()

modest timber
#

yep

#

worked 😄

lapis sequoia
#

great 😊 gni8

modest timber
#

Could you shed some light on why that's happening?

lapis sequoia
#

sure. in a nutshell df=data means they both refer to the same object.
it's the same for lists and dicts.

modest timber
#

aaaaah

#

k 😄

lapis sequoia
#

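while
a=5
b=a
a+=1
will only change a, since ints are immutable and a+=1 just rebinds a to a new value

it's not the same case with lists or dicts or np arrays or dfs, where in-place changes show up under both names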

#

I'm on cellphone rn otherwise I'd give you an example.

quasi parcel
#

correct @lapis sequoia sorry i was confused @modest timber

modest timber
#

ok, np 😄

lapis sequoia
#

but in a nutshell they both are referring same address so changing something at that place will be like changing for both.

this is like we both have address of same house
so if you put a glass in that house, we both will see a new glass.
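
A minimal sketch of that idea, using a plain dict as a stand-in for the DataFrame (the names and numbers are made up for illustration). `copy.deepcopy` plays the role that `data.copy()` plays for a DataFrame:

```python
import copy

# Immutable values: rebinding one name does not affect the other.
a = 5
b = a
a += 1          # a now refers to a new int object; b still refers to 5

# Mutable values: plain assignment only creates a second name for one object.
data = {"Adj Close": [22.8, 23.5, 23.9]}
df = data                   # alias, not a copy
df["SMA2"] = [None, 23.15, 23.7]
alias_changed_original = "SMA2" in data   # both names see the new key

# An explicit copy gives an independent object, like DataFrame.copy().
fresh = {"Adj Close": [22.8, 23.5, 23.9]}
work = copy.deepcopy(fresh)
work["SMA2"] = [None, 23.15, 23.7]
copy_changed_original = "SMA2" in fresh   # only the copy was modified

print(a, b, alias_changed_original, copy_changed_original)
```

That is why the SMA columns kept piling up: every iteration was writing into the one object that both `df` and `data` pointed at.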

modest timber
#

@lapis sequoia ok i pretty much understand - it's like a pointer in c 😄

lapis sequoia
#

yeah exactly! it's nothing but pointer!!

modest timber
#

I think our teacher didn't explain this to us 😄

#

Ok, thanks again, have a nice day 🙂

lapis sequoia
#

you too😊

burnt knot
#

Have you used Colab before?

serene scaffold
#

For work I need to create something that, once complete, can be put on a flash drive and run on a computer that doesn't have internet access. So part of that means making sure that the finished product doesn't make API calls, but it also means being able to either (a) download all the dependencies before building the environment or (b) building a portable environment. I'm not really sure how to do either because I always just make a venv and pip install the requirements.

desert oar
#

pyinstaller i think just bundles the entire runtime and all your deps into a single .exe

serene scaffold
#

(not my decision)

desert oar
#

what exactly are the components involved?

#

and what's the target system?

serene scaffold
#

I'm waiting to hear back from the project lead about all of that. I guess, could one build the wheel on a VM that's the same OS? Are wheels portable?

desert oar
#

you can build a noarch wheel, but it won't help with deps

#

does your usb stick need to include python, jupyter, etc.? or just the deps for your particular notebook?

#

pyinstaller might still be the way to go, since jupyter is a python application after all

desert oar
serene scaffold
#

But I guess I need to figure out more about the requirements first

desert oar
#

yeah it sounds like it

chilly mica
#

Hey everyone, I'm looking for help with geopandas, json, and folium. Is anyone able to help me on a project?

serene scaffold
quasi parcel
lapis sequoia
# quasi parcel Yes I did

you can kinda play around with !ls and %cd to get the exact path (a plain !cd runs in a subshell, so it doesn't stick). once you do that you can use !pwd to get the absolute path, which will help for next time.

steel trellis
#

any data scientist here?

royal crest
#

i employ a lot of data science techniques in my work but my job description isn't data scientist

brave umbra
#

Yoo guys I have been learning python for about 6 weeks, made some guis with tkinter and used mysql databases to write data etc, you think I can move onto AI to do stuff like facial detection or is this too advanced for me and I should wait and learn more before doing this?

tender hearth
brave umbra
tender hearth
#

Ideal would be 10th-11th grade

#

Calculus and linear algebra

brave umbra
#

@tender hearth So i'd need to know that to do AI?

tender hearth
#

You'd need to know that to understand how it works behind the scenes

#

even though libraries abstract a lot of it away from you it's still essential when gluing stuff together

brave umbra
#

damn, maths isn't my strong suit

tender hearth
#

If you have any questions feel free to ask

brave umbra
#

thank you

gaunt marsh
#

My matplotlib pie chart only shows one color

pure gull
#

Hi, how on (or off!) earth do I get cuda/torch to play together on wintendo 10?

gaunt marsh
#

can anyone help me with this?

pure gull
#

no code, no dice

gaunt marsh
#


def format_rgb(rgb):
    # precision format for Float numbers to a string
    return '(' + ', '.join(format(val, '06.2f') for val in rgb) + ')'


rgb_labels = rgb_df.apply(format_rgb, axis=1)

rgb_counts = rgb_labels.value_counts()
print(rgb_counts[0])
rgb_10 = []
hex_10 = []
for k in range(len(rgb_counts)):
    if rgb_counts[k] >= 3:
        rgb_10.append(rgb_counts[k])
        hex_10.append(real_col[k])


def pct_to_absolute(value):
    a = rgb_10
    return a

plt.figure(figsize=(8, 6))
plt.pie(rgb_10, labels=hex_10, colors=hex_10, normalize=True, autopct=pct_to_absolute)
plt.show()
pure gull
gaunt marsh
pure gull
#

I can't run this without e.g. adding example data as ara

rough mountain
gaunt marsh
#

How can I print all Values in the console without these ..?

earnest shuttle
#

Hi guys! I am stuck in an ML project... anyone that can guide me through?

prime hearth
#

@earnest shuttle

#

Just post here

earnest shuttle
#

I'm actually stuck in an assignment for kNN
Have it due in some days
But I am not able to resize the data
It's a number recognition file in a .mat format and it includes 8s and 9s
I need to plot testing error rate vs k

#

Please dont judge, im new to ML

#

from scipy.io import loadmat
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np
import seaborn as sbn
from sklearn.neighbors import KNeighborsClassifier
dataset = loadmat(r"C:\Users\Kushal\Downloads\NumberRecognition.mat")
dataset.keys()
zeros = dataset["imageArrayTraining0"]
eights = dataset["imageArrayTraining8"]
nines = dataset["imageArrayTraining9"]
img = nines[:,:,1]
#plt.imshow(img, cmap="Greys")

#creating labels
y_eights = np.zeros(eights.shape[-1])
y_nines = np.ones(nines.shape[-1])
np.concatenate([eights,nines], axis = 2).shape
np.transpose([eights,nines]).shape

#

Need to reshape the data next and then get X and y

#

after that need to train and predict the model

#

But stuck here as i dont know what to do next

prime hearth
#

what do you need to reshape it to?

#

also, np.concatenate returns a new array

#

if you want to use the new, modified array, you need to assign it to a variable

#

otherwise np.concatenate does nothing useful, since its result isn't being used

#

can also find lots of tutorials on reshaping

shrewd fog
#

How can I fix Google Colab showing as unavailable?

#

Who can recommend other database to me?

desert oar
#

@gaunt marsh that's a pandas Series you are looking at

gaunt marsh
serene scaffold
#

@gaunt marsh numpy and pandas data structures will truncate what gets displayed, but one can override this

#

The reason being that the size of the structures can be exceptionally large

desert oar
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

tender hearth
#

I wonder why @arctic wedge didn't do its thing and tell you to put it in a codeblock already

serene scaffold
arctic wedgeBOT
#

Series.to_string(buf=None, na_rep='NaN', float_format=None, header=True, index=True, length=False, dtype=False, name=False, max_rows=None, min_rows=None)
Render a string representation of the Series.
gaunt marsh
#

I have another question. Is it possible to write Python code which does something like: the first 10 text files should go in directory x

serene scaffold
#

I would probably use that

gaunt marsh
#

that means that the next 10 files should go in directory y and so on

serene scaffold
#

you can use os.walk, I guess

#

I usually use pathlib

gaunt marsh
#

I guess that I need two loops. One which goes through all dirs and one which goes through all files, which shall be moved, right?

serene scaffold
#

you'd have to pip install more-itertools but that's easy

earnest shuttle
serene scaffold
earnest shuttle
#

Lol and I dont know what to do after reshaping

serene scaffold
#

what is the current shape and what is the desired shape?

desert oar
serene scaffold
desert oar
#

oh, i misunderstood the original question

#
import os

dirs = ['a', 'b', 'c']
files = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

iter_dirs = iter(dirs)
for i, f in enumerate(files):
    if i % 3 == 0:
        d = next(iter_dirs)
    os.rename(f, os.path.join(d, f))
#

something like that

#

normally i'd recommend using pathlib.Path.glob in real life. maybe also you want the directory list to cycle forever?

from itertools import cycle
from pathlib import Path

dirs = [Path('a'), Path('b'), Path('c')]
paths = Path('source-files').glob('*.txt')

for d, p in zip(cycle(dirs), paths):
    p.rename(d / p.name)
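
If the goal is ten files per directory rather than round-robin, one possible sketch using only the standard library is below. `plan_moves`, the batch size and the file names are assumptions for illustration; it only computes the plan and leaves the actual renaming to you.

```python
from pathlib import Path


def plan_moves(files, dirs, batch_size=10):
    """Pair each file with a target directory, batch_size files per directory.

    Files beyond len(dirs) * batch_size are left out of the plan.
    """
    moves = []
    for index, file_path in enumerate(files):
        dir_index = index // batch_size      # 0 for the first batch, 1 for the next, ...
        if dir_index >= len(dirs):
            break                            # ran out of target directories
        source = Path(file_path)
        moves.append((source, Path(dirs[dir_index]) / source.name))
    return moves


files = [f"file{n:02d}.txt" for n in range(25)]
moves = plan_moves(files, ["x", "y", "z"], batch_size=10)
print(moves[0], moves[10], moves[24])
```

Once the plan looks right, something like `for src, dst in moves: src.rename(dst)` would carry it out.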
gaunt marsh
#

The folders are already created via cat folder.txt | xargs mkdir

#

folder.txt contains the names of the folders

pure gull
#

@gaunt marsh might want to have a look at zmv if you're on mac/linux; it is great for stuff like this

#

do try with zmv -n to see what will be the result

earnest shuttle
#

if im transposing, it becomes 750,28,28,2 idk why
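
A rough sketch of why that happens, assuming both arrays have shape (28, 28, 750) as that output suggests (arrays of zeros and ones stand in for the real images). Passing a list to `np.transpose` first stacks the two arrays into shape (2, 28, 28, 750) and then reverses all the axes. For kNN you would more likely want one row per image:

```python
import numpy as np

# Stand-ins for the real data; the (28, 28, 750) shape is an assumption.
eights = np.zeros((28, 28, 750))
nines = np.ones((28, 28, 750))

# Passing a list stacks the arrays first, then transpose reverses every axis.
stacked_then_reversed = np.transpose([eights, nines])   # shape (750, 28, 28, 2)

# Join along the image axis, and keep the result this time.
images = np.concatenate([eights, nines], axis=2)        # shape (28, 28, 1500)

# Move the image axis to the front, then flatten each 28x28 image into a row.
X = np.moveaxis(images, 2, 0).reshape(images.shape[2], -1)   # shape (1500, 784)

# Labels in the same order as the images: 0 for eights, 1 for nines.
y = np.concatenate([np.zeros(eights.shape[2]), np.ones(nines.shape[2])])

print(stacked_then_reversed.shape, images.shape, X.shape, y.shape)
```

`X` and `y` in that form are what `KNeighborsClassifier.fit` expects: samples along the first axis, one label per sample.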

dull turtle
#

i am getting this error ```python
Traceback (most recent call last):

File "F:\office codes\resampling code.py", line 19, in <module>
chunk['Price'].resample('1T').sum()

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 957, in f
return self._downsample(_method, min_count=min_count)

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1053, in _downsample
self._set_binner()

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 195, in _set_binner
self.binner, self.grouper = self._get_binner()

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 202, in _get_binner
binner, bins, binlabels = self._get_binner_for_time()

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1042, in _get_binner_for_time
return self.groupby._get_time_bins(self.ax)

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1499, in _get_time_bins
first, last = _get_timestamp_range_edges(

File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1746, in _get_timestamp_range_edges
index_tz = first.tz

AttributeError: 'NaTType' object has no attribute 'tz'```

#

my code ```python
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}', engine='python', chunksize=100000, iterator=True)):

    chunk['Transaction Time'] = pd.to_datetime(chunk['Transaction Time'], errors='coerce')
    chunk = pd.DataFrame(chunk).set_index('Transaction Time')
    chunk['Price'].resample('1T').sum()
    print('chunk..')
    print(chunk)
    print()```
pure gull
dull turtle
# pure gull looks to me like you have mixed a non-timestamp value into a dataframe where you...

see
chunk..
                              Msgtype Activity Type  ...   Price   Qty
Transaction Time                                     ...
2011-08-28 09:15:02.006138097   Order             N  ...  473430  1175
2011-08-28 09:15:02.707897899   Order             N  ...  255595  1175
2011-08-28 09:15:03.373856246   Order             N  ...  255600   500
2011-08-28 09:15:04.159525439   Order             N  ...  607185  1175
2011-08-28 09:15:05.213452151   Order             M  ...  255605  1175
...                               ...           ...  ...     ...   ...
2011-08-28 09:15:14.243840962   Order             M  ...   21815    50
2011-08-28 09:15:14.243913774   Order             M  ...   21820    25
2011-08-28 09:15:14.243978819   Order             M  ...   21825    50
2011-08-28 09:15:14.244046696   Order             M  ...   21830    25
2011-08-28 09:15:14.244110503   Order             M  ...   21835    50
my dataframe
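
One plausible cause, though not confirmed for this file: `errors='coerce'` turns any value it cannot parse into `NaT`, and a chunk whose index contains `NaT` can trip up `resample`. A small sketch with made-up rows, dropping the unparseable ones before they reach the index:

```python
import pandas as pd

# Made-up rows; the second timestamp is deliberately unparseable.
chunk = pd.DataFrame({
    "Transaction Time": ["2011-08-28 09:15:02", "not a time", "2011-08-28 09:16:30"],
    "Price": [473430, 255595, 255600],
})

chunk["Transaction Time"] = pd.to_datetime(chunk["Transaction Time"], errors="coerce")

# Rows that failed to parse are now NaT; drop them before they reach the index.
bad_rows = int(chunk["Transaction Time"].isna().sum())
chunk = chunk.dropna(subset=["Transaction Time"]).set_index("Transaction Time")

# resample returns a new object, so keep the result.
per_minute = chunk["Price"].resample("1min").sum()
print(bad_rows)
print(per_minute)
```

Printing `bad_rows` per chunk is a cheap way to find out whether a header row or stray text is hiding somewhere in the CSV.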

desert oar
#

is this yet another person working on the same program?

dull turtle
#

no i am only

desert oar
#

or did you change your username @dull turtle ? are you daniel?

desert oar
#

ok, thanks. sorry

dull turtle
serene scaffold
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

False
gaunt marsh
#

How can I iterate through a numpy array? I am getting TypeError: 'numpy.int64' object is not iterable

reef wing
#

Very good question baby

#

You aren't iterating over an array to begin with; you have a scalar value of an int NOT an array of int

#

TLDR: Whatever is in that value isnt an array. @gaunt marsh and PS: baby freeza has horns I think; your profile picture is a lie!

gaunt marsh
#

I have one last question. This is my Array: [11, 7, 7, 7, 6, 6, 5, 5]
I give this to a pie chart and it prints this:

#

This is wrong because every value in that array belongs to one part of the pie

#

instead it prints all values to all parts

#

Do you have any Idea how to make it plot the right value to each part?

#

@reef wing this is my favorite xbox 360 avatar 😛

reef wing
#

So your array is good now; BUT your plotting method is bad; the only way for me to fix it is to see the plotting code.

rigid zodiac
#

quick question how do you split the file inside the folder into test-train and validation

reef wing
#

you don't split the file normally, you split the data into two separate variables

#

you can split the file also; but that would require reading in one file and creating two separate files

rigid zodiac
#

it was in npy file

reef wing
#

yeah if they are paying you to split a file then split the dang file lol

rigid zodiac
#

manually?

reef wing
#

I recommend reading it with the numpy module np.load()

gaunt marsh
#

rgb_10 outputs [11, 7, 7, 7, 6, 6, 5, 5]

reef wing
#

I think the common way to save or write to a file is np.save("filename.npy", array)

#

so

data = np.load("input.npy")

train, test = some_split_logic_you_invent(data)

with open('train.npy', 'wb') as f:
    np.save(f, train)

with open('test.npy', 'wb') as f:
    np.save(f, test)
#

I did 50% of the work there you just need to make that function to split and return two numpy arrays

#

sklearn has a train test split function
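
If you'd rather not pull in sklearn for this, here is a rough numpy-only sketch of the split function. `split_rows`, the fractions and the seed are assumptions for illustration, and a small generated array stands in for the loaded file.

```python
import numpy as np


def split_rows(data, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle rows and cut them into train, validation and test arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))          # shuffled row indices
    n_test = int(len(data) * test_fraction)
    n_val = int(len(data) * val_fraction)
    test = data[order[:n_test]]
    val = data[order[n_test:n_test + n_val]]
    train = data[order[n_test + n_val:]]
    return train, val, test


data = np.arange(200).reshape(100, 2)           # stand-in for np.load("input.npy")
train, val, test = split_rows(data)
print(train.shape, val.shape, test.shape)
```

Each piece can then be written out with `np.save` the same way as above.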

frigid elk
#

having issues loading an xgboost booster model saved from R into python and getting consistent predictions. .. any caveats i might need to take into consideration?

reef wing
#

Give me a second as my current virtualenv doesnt have matplotlib to figure it out @gaunt marsh

#

AH! I had that problem last year DRE

#

That's exactly why we had to move back to SKLEARN

#

are you storing into a pickle file as I did?

#

This can happen even if you don't store the file but pickles definitely exacerbated the problem

frigid elk
#

no, .. saved from R. .. xgb.save('model.model') ... then loading in python

reef wing
#

Yeah; i guess storing it as an object regardless of the method right now causes this to happen

#

the seed should stay the same within the same run; it's just that it seems to be set differently when running outside the same script instance

frigid elk
#

so no fix? i need to continue using our model in R?

reef wing
#

It may be different even if you run it again in R

frigid elk
#

ah

reef wing
#

the problem is if you get a consistent model like the Gradient booster in SKLEARN its like 10X slower

frigid elk
#

no, i set the seed in r (23) and am able to get consistent predictions

reef wing
#

Nice perfect; just double test it with that config of seed to make sure though;

frigid elk
#

i tried exporting the config from R as well and loading that in python while loading the model, still crappy predictions

reef wing
#

I had that problem and I actually had misinterpreted the output

#

sometimes the output array is flipped

frigid elk
#

i'm good with r, but would prefer python due to other requirements. .. namely automation and shap

#

really? flipped? ... i'll check on that

reef wing
#

yeah lmao I was classifying my customers as Assasins instead of Enjoying my product on text classification for like a whole day; did not end well. Hence the support team contacting people that actually loved the product lol

frigid elk
#

lol

reef wing
#

Thank god I was the only tech person that understood what was going on and I didn't get in trouble

frigid elk
#

i'm working on attrition, but yeah.. flipped results will definitely be noticed here

gaunt marsh
reef wing
#

yeah the only way I did it is by using plt.pie(x=data_array, labels=label_array)

#

where data_array elements and label_array elements are uni-dimensional and each element corresponds with each other

#

that method wont show the numbers inside the piechart; but has correct labels

gaunt marsh
#

the data_array is rgb_10 for me which is [11, 7, 7, 7, 6, 6, 5, 5]

reef wing
#

Where does real_col come from?

#

also people will nag at you about your loop when you're coding for money

#

rgb_10 = []
hex_10 = []
for k in range(len(rgb_counts)):
    if rgb_counts[k] >= 5:
        rgb_10.append(rgb_counts[k])
        hex_10.append(real_col[k])

rgb_10 = []
hex_10 = []
for count, col in zip(rgb_counts, real_col):
    if count >= 5:
        rgb_10.append(count)
        hex_10.append(col)
frigid elk
#

@reef wing no go on the flipped output array. ... is there somewhere i should be setting the seed outside of np.random.seed? does the model not save that as well?

reef wing
#

Tbh I think it's what I stated before; when saving the object into your file system it kinda deletes the local variables in the methods of the XGBoost object when you load it

#

regardless of the language you use to boot up the model

#

But it was a year ago

frigid elk
#

bummer, did you find out how to port a specific model over? ... i tried pickling via reticulate in r but get a 'cannot pickle pycapsule object' error

#

doesn't look like dill supports that either

reef wing
#

yeah for models that had to share with other people I used sklearn and it was consistent

#

for my own models I winged it and used xgboost

frigid elk
#

so sklearn implementation of xgboost is less accurate?

reef wing
#

sklearn implementation is more consistent when storing the model into hard disk; but slower in training stage

frigid elk
#

gotcha

#

is it worth training in xgboost native and pickling to sklearn to save and distribute?

reef wing
#

I don't think there is a transformer for XGboost to sklearn model tbh

#

Not 100% certain

frigid elk
#

oh, .. my mistake. .. was thinking the XGBRegressor could use booster

reef wing
#

Depends on what you are talking about; but I think the "boost" is using stochastic methods instead of deterministic methods to train

#

hence the randomness; but that randomness boosts the speed of training

#

@gaunt marsh

#

Did u fix ur problem freeza?

#

autopct is what helps you label inside the pie
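
When `autopct` is a function, matplotlib calls it once per wedge and passes in that wedge's percentage, so it has to turn one percentage into one label. Returning the whole list, as `pct_to_absolute` does, puts every value on every wedge. A sketch of one way to get the counts back; `make_autopct` and `draw_pie` are made-up names:

```python
def make_autopct(values):
    """Build a formatter that turns a wedge's percentage back into its count."""
    total = sum(values)

    def autopct(pct):
        # matplotlib passes the wedge's share of the pie as a percentage
        return str(int(round(pct * total / 100.0)))

    return autopct


def draw_pie(values, labels):
    # Imported here so the formatter above can be used without matplotlib.
    import matplotlib.pyplot as plt

    plt.figure(figsize=(8, 6))
    plt.pie(values, labels=labels, autopct=make_autopct(values))
    plt.show()


rgb_10 = [11, 7, 7, 7, 6, 6, 5, 5]
formatter = make_autopct(rgb_10)
print([formatter(100.0 * v / sum(rgb_10)) for v in rgb_10])
```

Calling `draw_pie(rgb_10, hex_10)` should then show one count per wedge.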

slim drift
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

slim drift
#

'''py
total_count = 0
total_frames = 0
overlap_count = 0
overlap_frames = 0
for i in manuals_spans:
    total_count += 1
    total_frames += (i[1]-i[0])
    check = 0
    for j in comp_spans:
        if (j[0] > i[0] and j[0] < i[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (i[1] - j[0])
        elif (i[0] > j[0] and i[0] < j[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (j[1] - i[0])
percent_called = (float(overlap_count)/total_count)*100
percent_overlap = (float(overlap_frames )/total_frames)*100
print(percent_called, percent_overlap)
'''
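
For comparison, a sketch of the overlap length computed with `max` and `min`, which also covers the case where one span sits entirely inside the other. `span_overlap` and `percent_overlap` are made-up names, the spans are invented, and it assumes the computed spans do not overlap each other.

```python
def span_overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1] and [lo2, hi2], or 0 if disjoint."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))


def percent_overlap(manual_spans, comp_spans):
    """Percent of manual spans hit at all, and percent of manual frames covered."""
    total_frames = sum(hi - lo for lo, hi in manual_spans)
    hit_count = 0
    overlap_frames = 0
    for m_lo, m_hi in manual_spans:
        covered = sum(span_overlap(m_lo, m_hi, c_lo, c_hi) for c_lo, c_hi in comp_spans)
        overlap_frames += covered
        if covered > 0:
            hit_count += 1
    percent_called = 100.0 * hit_count / len(manual_spans)
    percent_frames = 100.0 * overlap_frames / total_frames
    return percent_called, percent_frames


manual = [(0, 10), (20, 30), (40, 50)]
computed = [(5, 12), (22, 25)]
print(percent_overlap(manual, computed))
```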

prime hearth
#

Thanks!

stable kestrel
#

hey

#

do you guys know something about monte carlo method?

reef wing
#
Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle.

It is normally used to estimate a parameter in a statistical formula
#

Like the MLE method or some other method like method of moments

#

Except MLE and method of moments use pure mathematical solutions

#

it's in essence an old version of the reinforcement machine learning technique

scenic drum
#

hey yall, anyone know if conda tends to work ok and not mess up the rest of a system if I simply want to use strictly conda or strictly not-conda for each different project? years ago it used to trample stuff or something. (thread continued from #help-cheese)

bitter fiber
#

Yes

#

Stick to whatever you already know and use well

#

Don't invent hypothetical problems, just be productive until you actually face one. At this point the value isn't in the software for data science, it's in the applications of it

scenic drum
#

wait, @bitter fiber, that answer is slightly surprising as a response to my question so I'm confused if it was to me. you're in fact saying "yeah, it still tramples stuff, avoid conda unless it's in docker for someone else's project" basically?

bitter fiber
#

Oh no lol i mean use conda if thats what you are already using

#

Trampling stuff no but yes on 2nd point

#

Imho i just use normal python and i’ve done ok

scenic drum
#

to clarify, a lot of packages I've seen recently have been saying "hard to install without conda", so I want to see what these things are that are mostly conda-only, and try them out a bit. but to do that, I want to make sure if I try conda, it won't mess with my existing python installations (system, and on some computers, pyenv)

#

eg homebrew messes other pythons up

reef wing
#

This is more of an operating system problem; as I would use whereis python or python --version and not a true data science problem; I recommend asking in another channel as they can help you more. I have trouble with this too; which is why I have so many computers and am conservative with what I download to not increase stress.

#

also note that every environment will use a separate pip and python combination; hence there will be no problems if maintaining a project within an environment.

daring blaze
#

I’m not sure if this is something appropriate for this channel, or perhaps #async-and-concurrency instead, but I was wondering if anyone here had insight on the best way to train multiple RL agents “together”? By together though, I just mean training like multiple Q-learning agents with different hyper parameters at the same (or near the same) time

reef wing
#

Do the two RL agents communicate?

#

If they don't, I would use the below, and instead of x*x train the model in the function

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
#

wierd i cant design my code

lunar grotto
#

Any recommendations for APIs or free solutions to easily get the complete paginated search results from a google query search?

reef wing
#

i wish

#

Google Custom Search gives you 100 queries per day for free.
After that you pay $5 per 1000 queries.
There is a maximum of 10,000 queries per day.
lunar grotto
#

That doesn't sound like extortion. Thanks!

daring blaze
reef wing
#

You can pass a parameter to most machine learning modules that increase the number of processors you want to use in the model

#

but if you're doing reinforcement learning that's likely not the case; as you are developing it yourself most of the time.

#
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
#

I added color to the code to make it more elegant 😄

daring blaze
#

Yea my goal is to build everything ground up but train different agents in parallel to do hyperparameter searching
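
A rough sketch of what that could look like with `multiprocessing.Pool`. `train_agent` here is a toy stand-in that returns a made-up score instead of actually training anything, and the grid values are arbitrary:

```python
from itertools import product
from multiprocessing import Pool


def train_agent(params):
    """Toy stand-in for a training run: returns the params and a made-up score."""
    alpha, epsilon = params
    # Pretend the best agent has alpha=0.1 and epsilon=0.05.
    score = -((alpha - 0.1) ** 2 + (epsilon - 0.05) ** 2)
    return params, score


def search(grid, workers=4):
    """Train one agent per parameter combination in separate processes."""
    with Pool(workers) as pool:
        results = pool.map(train_agent, grid)
    return max(results, key=lambda item: item[1])


grid = list(product([0.01, 0.1, 0.5], [0.01, 0.05, 0.2]))

if __name__ == "__main__":
    best_params, best_score = search(grid)
    print(best_params, best_score)
```

Since the agents never talk to each other, each process just needs its own parameters and hands back a result, which is what makes `pool.map` a good fit.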

reef wing
#

I love the notion of reinforcement learning. Just let the computer figure it out, sounds like super high tech but so simple lol.

daring blaze
#

That's why I'm trying to play with it lol, my goal is to go through the Sutton & Barto book and replicate their graphs before moving onto some more complex/modern approaches

#

The whole "figure it out yourself" is much more interesting to me than "here's a million pictures, find the dog" lol

#

Though I know that's an oversimplification

reef wing
#

I want to do it for chess tbh

#

or checkers; since it's simpler

#

Simply because I want to make and practice making new dashboards in streamlit

arctic wedgeBOT
#

Hey @quasi parcel!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

reef wing
#

Thank god good bot

stable kestrel
#

ehm

#

but for that method

reef wing
#

oo @daring blaze someone made a checkers with rest api connection so you can play with your code

stable kestrel
#

we're trying to plot and calculate the integral with the monte carlo method

#

can you maybe help?

#

if you are available?

reef wing
#

I think you should ask direct questions of where you are stuck and not ask for someone to help with the whole thing as we aren't getting paid

stable kestrel
#

we're stuck at getting the dots to work

#

ehm

#

this is the code rn

#

and the assignment is to plot the graph, and make the points outside the graph red and the points inside the graph green

#

and watch out when the graph goes below the x axis because then the values get negative

#

that's where we're stuck rn

#

because the rest is working

reef wing
#

I am still not sure which specific part you are stuck :/

stable kestrel
#

no worries ❤️

#

i explained it wrong probs

#

ehm

#

when the values go below the y axis

#

*x axis

#

the code doesnt hold

#

and yeah the other point

#

on the picture

reef wing
#

to check if a value went below the x axis, you have to check if that value is less than 0

stable kestrel
#

ehm

#

hmm

#

below the x axis

#

are negative yvalues

#

i got it a bit

#

    #for y_values >= 0:
        
        if random_y_value <= y_values:
            green_dots_y_value.append(random_y_value)
            green_dots_x_value.append(random_x_value)
            counter_points_inside_graph += 1


        else:
            red_dots_y_value.append(random_y_value)
            red_dots_x_value.append(random_x_value)
            counter_points_outside_graph += 1
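
One way the negative part could be handled, as a sketch only: sample points in a box that also extends below the x axis, count a point as inside when it lies between the axis and the curve, and subtract the ones under the axis. `monte_carlo_integral`, the box limits and the use of `math.sin` as the test curve are all assumptions for illustration.

```python
import math
import random


def monte_carlo_integral(f, a, b, y_min, y_max, n_points=100_000, seed=0):
    """Hit-or-miss estimate of the integral of f over [a, b].

    Points between the x axis and the curve are "inside" (green); those under
    the axis count negatively. Everything else is "outside" (red).
    """
    rng = random.Random(seed)
    inside, outside = [], []
    signed_hits = 0
    for _ in range(n_points):
        x = rng.uniform(a, b)
        y = rng.uniform(y_min, y_max)
        fx = f(x)
        if 0 <= y <= fx:            # above the axis, under the curve
            signed_hits += 1
            inside.append((x, y))
        elif fx <= y < 0:           # below the axis, above the curve
            signed_hits -= 1
            inside.append((x, y))
        else:
            outside.append((x, y))
    box_area = (b - a) * (y_max - y_min)
    return box_area * signed_hits / n_points, inside, outside


estimate, green, red = monte_carlo_integral(math.sin, 0.0, 1.5 * math.pi, -1.0, 1.0)
print(estimate)
```

The `green` and `red` lists hold (x, y) pairs, so they can go straight into two scatter calls with different colors.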
quasi parcel
#

but this is the expected out

#

i am not understanding why it is happening like this

silver sun
#

Is anyone familiar with the fashion MNIST data set?

reef wing
#

@quasi parcel the first problem is that you are not using the methods that pandas has for dataframes

#

also i think you need to wrap your regex thing in parentheses before using the plus sign

#

so you can capture more than one thing

prime hearth
#

@silver sun a lot are, after all this is the AI channel

reef wing
#

res = re.findall("'Product_Id': '[(0-9)+]'", str((r))) note the plus inside the brackets idk how one would handle comma delimited in re module but I wouldn't even use re module to begin with for this problem.
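
A hedged sketch of the capture-group idea. In the pattern as posted, `[(0-9)+]` is a character class that matches exactly one character, so multi-digit ids will not match. The input below is hypothetical, since the real structure of `r` was not shown.

```python
import re

# Hypothetical input; the real structure of `r` was not shown in the chat.
r = [{'Product_Id': '101'}, {'Product_Id': '2045'}, {'Product_Id': '7'}]

# [0-9]+ matches a run of digits; the parentheses capture only those digits.
pattern = r"'Product_Id': '([0-9]+)'"
res = re.findall(pattern, str(r))
print(res)
```

If the data is already a list of dicts or a DataFrame column, reading the values directly avoids the round trip through `str()` entirely.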

prime hearth
#

It'd be better to ask about the problem you are facing rather than asking to ask

reef wing
#

Instead of training one yourself; unless you are learning how to train models ( It's out there just cant find it right now )

silver sun
#

fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(10,5))
for i in range(5):
    image = X_train[i]
    label = Y_train[i]
    label_name = label_names[label]
    axs[i].imshow(image, cmap='gray') # imshow renders a 2D grid
    axs[i].set_title(label_name)
    axs[i].axis('off')
plt.show()

prime hearth
#

What Happens

reef wing
#

are you displaying anything at all at this point?

prime hearth
#

when you run the code?

silver sun
#

So far I have just figured out to print out the first 5 pictures

reef wing
#

you hardcoded 5 in many places

#

So that may be it; try setting num_images = 10 on top and putting num_images instead of the hardcoded 5

#

You seem to have reached your own goal after re-reading everything you wrote lol

#

I am trying to display the first 5 images for each class then So far I have just figured out to print out the first 5 pictures
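
One possible way to get "the first 5 images for each class" is to collect the indices per label first and plot afterwards. This is only a sketch: `first_n_per_class` is a hypothetical helper and the toy labels stand in for `Y_train`.

```python
import numpy as np

def first_n_per_class(labels, n=5):
    """Map each class id to the indices of its first n samples."""
    labels = np.asarray(labels)
    return {int(c): np.flatnonzero(labels == c)[:n] for c in np.unique(labels)}

# Toy labels standing in for Y_train (hypothetical data).
Y_demo = np.array([2, 0, 1, 0, 2, 2, 1, 0])
picked = first_n_per_class(Y_demo, n=2)
print(picked)
```

With the indices in hand, a grid made by `plt.subplots(nrows=len(picked), ncols=n)` can show `X_train[idx]` in row `class`, column `position`.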

silver sun
silver sun
quasi parcel
#

apart from regex

quasi parcel
#

can anyone please help me

#

please

frigid elk
#

@reef wing curious your take on rstudio vs any other ide? ala vscode. .... any advantages to leaving rstudio?

quasi parcel
#

after doing your change @reef wing the list is empty

#

just an observation for less data the code is working fine

#

apart from using regex

#

the above link is the data

#

please help me

#

the expected output is this

polar dock
#

You can use .explode() in pandas

#

On that series

quasi parcel
#

how does that even work

polar dock
#

Each element of the series product_id is a list
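
A small sketch of what `.explode()` does, assuming a hypothetical frame whose `product_id` cells hold lists (the real data was only shared as a link).

```python
import pandas as pd

# Hypothetical frame: each product_id cell holds a list.
df = pd.DataFrame({
    "customer": ["a", "b"],
    "product_id": [[101, 102], [103]],
})

# explode() gives every list element its own row, repeating the other columns.
long_df = df.explode("product_id").reset_index(drop=True)
print(long_df)

# Collapse back to one flat Python list if that is the required output.
flat = long_df["product_id"].tolist()
print(flat)
```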

quasi parcel
#

the expected output should be in a list

polar dock
#

I'm writing a test for you

#

One sec

quasi parcel
#

okay sure

polar dock
#

I'll DM you

quasi parcel
#

sure

#

please

prime hearth
#

hello, i would like to ask about natural language processing: is it better to first tokenize then do text cleaning, or the other way around?

#

usually i do collect raw data then text cleaning then tokenize

magic dune
#

does matplotlib take HTML color values when plotting

reef wing
#

depends on your RAM constraints. I have 256 GB of RAM for NLP and I clean out using PCA in my grid search.

#

Also note that you should change your tokenizer to convert to int32 or int16 to reduce ram consumption * <- this took me years to learn haha.
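
To make the memory point concrete, here is a rough sketch comparing integer widths for the same token ids. The ids are made up, and `int16` is only safe when every id fits below 32768.

```python
import numpy as np

token_ids = [12, 845, 3, 19999, 7] * 1000   # hypothetical token ids

as_int64 = np.array(token_ids, dtype=np.int64)
as_int16 = np.array(token_ids, dtype=np.int16)

# int16 only holds values up to 32767, so check the largest id first.
fits = max(token_ids) <= np.iinfo(np.int16).max

print(fits, as_int64.nbytes, as_int16.nbytes)
```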

#

@prime hearth

prime hearth
#

oh ok thx

prisma mulch
#

can someone eli5 random_state
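
A rough illustration of the idea, using the standard library rather than scikit-learn: a seed fixes which "random" result you get, so reruns are reproducible. In scikit-learn, `random_state` plays the role of `seed` here.

```python
import random

def shuffled(items, seed):
    """Return a shuffled copy; the seed fixes which shuffle you get."""
    rng = random.Random(seed)
    out = list(items)
    rng.shuffle(out)
    return out

data = list(range(10))
same = shuffled(data, seed=42) == shuffled(data, seed=42)
print(same)  # same seed, same order on every run
```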

weary lynx
#

so

#

how to get started with data science and ai

warm nebula
#

Is this ai?

weary lynx
warm nebula
#

Well

weary lynx
#

thx

tender hearth
#

Throws a lot of terminology at you. Be warned

quasi parcel
#

i am facing some trouble with cosine similarities and weight-rating into pivot table

quasi parcel
#

yes two mins sharing the csv

#

i am copying

#

it

#

two mins

#

first need to find weightage after that cosine similarities

#

please anyone help

gaunt marsh
#

I want to write a list into a textfile with this Code:


```py
for row in rgb_10:
    np.savetxt(value_file, row)

value_file.close()
```

But it says `ValueError: Expected 1D or 2D array, got 0D array instead`

This is what my list looks like
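
A sketch of one likely fix, assuming `rgb_10` is a flat list of numbers (the real data was posted as an image, so the values below are hypothetical). Looping over a flat list hands `np.savetxt` one scalar at a time, which is the 0D array in the error message.

```python
import os
import tempfile

import numpy as np

rgb_10 = [255, 128, 0]  # hypothetical flat list

# Pass the whole sequence at once; a 1-D array is written one value per line.
path = os.path.join(tempfile.mkdtemp(), "values.txt")
np.savetxt(path, np.asarray(rgb_10), fmt="%d")

loaded = np.loadtxt(path, dtype=int).tolist()
print(loaded)
```

If `rgb_10` is actually a list of RGB triples, the same single call writes one triple per line.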
quasi parcel
#

can anyone help please

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @errant hull until <t:1632390001:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

royal crest
#

!d sklearn.metrics.pairwise.cosine_similarity

arctic wedgeBOT
#

```py
sklearn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True)
```
Compute cosine similarity between samples in X and Y.

Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:

>  K(X, Y) = <X, Y> / (||X||*||Y||)
> 
>   On L2-normalized data, this function is equivalent to linear\_kernel.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/metrics.html#cosine-similarity).
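
For intuition, the formula above can be written out for two plain vectors with NumPy. This is a hand-rolled sketch (`cosine_sim` is a hypothetical helper), not the scikit-learn function, which works on whole matrices of samples.

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine similarity of two 1-D vectors: <x, y> / (||x|| * ||y||)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

print(cosine_sim([1, 2], [2, 4]))   # parallel vectors
print(cosine_sim([1, 0], [0, 1]))   # orthogonal vectors
```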
royal crest
#

not too sure what weightage is

quasi parcel
#

got it

#

got it

#

@royal crest

#

thank you

#

and can you tell me how to add two matrices of different sizes using numpy

royal crest
quasi parcel
#

yes but is there any way we can achieve that

royal crest
#

can't you just use +

#

!e

import numpy as np

a = np.array([[3, 6, 9], [5, -10, 15], [-7, 14, 21]])
b = np.array([[9, -18, 27], [11, 22, 33], [13, -26, 39]])
print(a + b)
arctic wedgeBOT
#

@royal crest :white_check_mark: Your eval job has completed with return code 0.

001 | [[ 12 -12  36]
002 |  [ 16  12  48]
003 |  [  6 -12  60]]
royal crest
#

i don't know any fancier way to add matrices, perhaps someone else can chime in
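
For arrays whose shapes differ, two options come to mind; both are sketches, since the chat never says what "adding" differently sized matrices should mean. Broadcasting works when the shapes are compatible, and zero-padding (the hypothetical `add_padded` helper) is one way to handle the rest.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])              # shape (3,)

# Broadcasting: `row` is added to every row of `a`.
print(a + row)

def add_padded(x, y):
    """Add two 2-D arrays of different shapes by zero-padding to the larger shape."""
    rows = max(x.shape[0], y.shape[0])
    cols = max(x.shape[1], y.shape[1])
    out = np.zeros((rows, cols), dtype=np.result_type(x, y))
    out[:x.shape[0], :x.shape[1]] += x
    out[:y.shape[0], :y.shape[1]] += y
    return out

small = np.array([[1, 1], [1, 1]])        # shape (2, 2)
print(add_padded(a, small))
```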

royal crest
#

!e

import numpy as np

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f'{a}\n{type(a)}')

b = np.array(a)
print(f'{b}\n{type(b)}\nDimensions:{b.ndim}')
arctic wedgeBOT
#

@royal crest :white_check_mark: Your eval job has completed with return code 0.

001 | [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
002 | <class 'list'>
003 | [[1 2 3]
004 |  [4 5 6]
005 |  [7 8 9]]
006 | <class 'numpy.ndarray'>
007 | Dimensions:2
quasi parcel
#

got it

rigid zodiac
#

quick question, do we have some sort of recurrent KNN?

gaunt marsh
tender hearth
rigid zodiac
# tender hearth What would that achieve

Like i'm trying to train a model to detect whether it's a fall or no fall, and my junior was like you can use recurrent knn.

I think he meant RNN but he insists it's recurrent KNN

tender hearth
#

I mean

#

KNN is an iterative process, maybe he meant that

#

although he could've just said KNN

rigid zodiac
#

that's what I thought. like in grad school they didn't mention recurrent KNN... so idk

In his words, he was like you can use an initial set of data to create a KNN model, then keep training the KNN... like epochs
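
For context, KNN as usually defined is a lazy learner: "training" only stores samples, so there are no epochs, and feeding it more data later just enlarges the stored set. A toy sketch under that assumption (`TinyKNN` is illustrative, not a library class):

```python
from collections import Counter
import math

class TinyKNN:
    """Minimal k-nearest-neighbours classifier (illustrative only)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []   # list of (features, label) pairs

    def add(self, X, y):
        # "Training" is just memorising samples; there are no weights to update.
        self.points.extend(zip(X, y))

    def predict(self, x):
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

knn = TinyKNN(k=3)
knn.add([(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)], ["no fall"] * 3)
knn.add([(5.0, 5.1), (5.2, 4.9), (4.8, 5.0)], ["fall"] * 3)   # a later batch
print(knn.predict((4.9, 5.0)))
```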

quasi parcel
#

can anyone tell how to get the weightage for each product from pivot table

serene scaffold
#

@quasi parcel what is weightage

quasi parcel
#

so for each customer there is a product assigned, and i need to find the weightage for each product for every occurrence. give me some time, i will explain properly @serene scaffold sir

royal crest
#

i don't think that's a proper term

#

do you mean weighted average?

quasi parcel
#

yes

royal crest
#

you should probably write a function to calculate it

#

lambda that is

serene scaffold
serene scaffold
quasi parcel
#

yes sure

#

i can

serene scaffold
#

Please provide it as text and ping me when you have it.

quasi parcel
#

sure

#

give me some time

#

the algo is running

#

two mins

#

sorry

rigid zodiac
#

anyone know how to categorize or combine frames in milliseconds into seconds?
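
One simple reading of this question is bucketing by whole seconds: integer-divide each millisecond timestamp by 1000 and group on the result. A sketch with made-up frames follows; taking the mean per second is just one possible summary.

```python
from collections import defaultdict

# Hypothetical frames: (timestamp in milliseconds, value)
frames = [(0, 1.0), (400, 3.0), (999, 2.0), (1000, 10.0), (1500, 20.0)]

per_second = defaultdict(list)
for t_ms, value in frames:
    per_second[t_ms // 1000].append(value)   # integer division drops the milliseconds

# One summary value per second.
means = {sec: sum(vals) / len(vals) for sec, vals in per_second.items()}
print(means)
```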

pastel valley
#

can someone drop cs thesis topics here? any recommendations?

#

topic for cs student

spiral gale
#

Hey guys, last week my boss asked me to create an AI for form processing, but I am new to AI and ML.
I started researching Python libraries and found PyTorch and TensorFlow.
I want to study more about them, but I don't know where to start.
Can anyone help me?

lapis sequoia
lapis sequoia
quasi parcel
#

really sorry there was some prod issue i had to address

#

this is the data

#

@serene scaffold

serene scaffold
spiral gale
quasi parcel
#

for the product id occurrence per customer

serene scaffold
#

what determines the weight?

spiral gale
quasi parcel
#

the occurrence of each product_id
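
If the weight is simply how often each product occurs for a customer, one possible sketch is a normalized cross-tabulation. The column names and rows below are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical purchase log; column names are assumptions.
df = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c2"],
    "product_id":  ["p1", "p1", "p2", "p2", "p3"],
})

# Rows: customers, columns: products, cells: share of that customer's purchases.
weights = pd.crosstab(df["customer_id"], df["product_id"], normalize="index")
print(weights)
```

Each row sums to 1, so the table can go straight into a row-wise cosine similarity between customers.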

#

@serene scaffold

serene scaffold
spiral gale
quasi parcel
#

let me check thank you sir @serene scaffold

chilly mica
#

Does anyone have any experience using numpy and folium/heatmap? I've been trying to ask a question for a few days but we haven't gotten anywhere

reef wing
#

seaborn has a pretty heatmap

#

sns.heatmap(df, annot=True)

#

Never done folium though.

slim drift
#

hey if anyone here is able to help with a question come to helproom burrito

lapis sequoia
brisk moth
#

can anyone help me implement a way to stop getting rate limited using twitter api

tough bolt
#

Problem: I'm not getting the result shown on this website

#

not exactly what it's supposed to look like

#

Here is what my code currently looks like:

        cov_matrix = np.cov(points_for_cov)
        eigenvalue, eigenvector = np.linalg.eig(cov_matrix)
        center_x = np.average(all_points_x)
        center_y = np.average(all_points_y)
        
#

So I'm simply using numpy to calculate the covariance matrix from my point list

and then linalg.eig for the eigenvalue and vector

#

The rotation for the ellipse I'm calculating like this

math.atan2(eigenvector[0][1], eigenvector[0][0])
#

Any idea where I could be going wrong ?
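
Two things may be worth checking here, offered as a sketch rather than a diagnosis since the shape of `points_for_cov` is not shown: `np.cov` treats each row as one variable by default, and `np.linalg.eig` returns eigenvectors as columns, so `eigenvector[0]` is a row, not the first eigenvector. The helper below is hypothetical and uses `eigh`, which suits symmetric matrices.

```python
import numpy as np

def covariance_ellipse(xs, ys, n_std=2.0):
    """Center, full axis lengths and rotation (degrees) of a covariance ellipse."""
    pts = np.vstack([xs, ys])               # shape (2, N): one row per variable
    cov = np.cov(pts)                       # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input, ascending eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    major = eigvecs[:, -1]                  # eigenvectors are the COLUMNS
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    minor_len, major_len = 2.0 * n_std * np.sqrt(eigvals)
    center = (float(np.mean(xs)), float(np.mean(ys)))
    return center, float(major_len), float(minor_len), float(angle)

xs = [0, 1, 2, 3, 4]
ys = [0.1, 0.9, 2.1, 2.9, 4.1]
center, major_len, minor_len, angle = covariance_ellipse(xs, ys)
print(center, round(angle, 1))
```

For points scattered along y = x the angle comes out close to 45 degrees, which is a quick sanity check before trying the real data.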

brisk moth
#

nice graph

tough bolt
# brisk moth nice graph

the blue one is supposed to look like that, no worries.

the black lines are supposed to visualize the rotation and width of those ellipses above

brisk moth
#

you want elipses instead of lines?

tough bolt
#

also this

#

problem is, the black lines don't correctly fit my data

#

I can't pinpoint where I'm going wrong

#

I think the black line should more look like the red one

#

ellipses are just a different visualization of the black lines

brisk moth
#

i think its grabbing that little blue splotch

#

and trying to fit it

tough bolt
#

not sure honestly

#

might be it

reef wing
#

What yall up to?

slim drift
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

slim drift
#
manual_spans = []
manual_spans.append(plt.axvspan(242068, 249530, 0.5, 1, color='g',alpha=0.5))
manual_spans.append(plt.axvspan(242055, 249503, 0.5, 1, color='g',alpha=0.5))

total_count = 0
total_frames = 0
overlap_count = 0
overlap_frames = 0
for i in manual_spans:
    total_count += 1
    total frames += (i[1]-i[0])
    check = 0
    for j in ranges_omega:
        if (j[0] > i[0] and j[0] < i[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (i[1] - j[0])
        elif (i[0] > j[0] and i[0] < j[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (j[1] - i[0])
percent_called = (float(overlap_count)/total_count)*100
percent_overlap = (float(overlap_frames )/total_frames)*100
print(percent_called, percent_overlap)
#

so im trying to just find the percent overlap between manual_calls and ranges_reversal

#

and need some help checking this code

slim drift
#
File "<ipython-input-121-551078431366>", line 9
    total frames += (i[1]-i[0])
               ^
SyntaxError: invalid syntax
#
TypeError                                 Traceback (most recent call last)
<ipython-input-122-cb481450f9a5> in <module>()
      7 for i in manual_spans:
      8     total_count += 1
----> 9     total_frames += (i[1]-i[0])
     10     check = 0
     11     for j in ranges_omega:

TypeError: 'Polygon' object is not subscriptable
#

anyone know how to fix this error?
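
The traceback comes from indexing the object `plt.axvspan` returns, which is a drawing patch rather than a `(start, stop)` pair. A sketch of keeping the numbers in plain tuples and measuring overlap from those; the `ranges_omega` values are hypothetical and assumed not to overlap each other.

```python
def overlap_length(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1] and [lo2, hi2]; 0 if disjoint."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))

# Keep the numbers in plain tuples; draw the spans separately with plt.axvspan.
manual_spans = [(242068, 249530), (242055, 249503)]
ranges_omega = [(242000, 245000), (249000, 250000)]   # hypothetical values

for m_lo, m_hi in manual_spans:
    covered = sum(overlap_length(m_lo, m_hi, lo, hi) for lo, hi in ranges_omega)
    print(m_lo, m_hi, round(100 * covered / (m_hi - m_lo), 1))
```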

#

@tribal arrow i want to find time overlap

#

on the x axis

tribal arrow
#

can you join vc

#

it was helpful

slim drift
#

yea im on a call with advisor

#

for like another 10 min

tribal arrow
#

ah alright

#

Anyway, take the maximum and minimum value on the x-axis of the zigzag line (your sample)

#

and also the minimum and maximum value on the x-axis of the big rectangle

#

the difference, of each, between their minimum and maximum is their size

#

divide the rectangle size by the other and multiply the result by 100

#

should give you the percentage overlap

#

unless I'm making some elementary mistake

mossy maple
#

how do i get pd.read_csv to read a uint64 col? tried pd.read_csv(fname, squeeze = True, usecols = [4], names = ['time'], dtype = { 'time': 'uint64' }, engine = 'c'), but it reads it as float64 instead

#

a typical entry in this field looks like this 1598918686328, which doesnt fit in 32 bits

lapis sequoia
rigid zodiac
#

Hey guys I have a quick question, how can you categorize entire csv file?

#

I'm trying to categorize a csv, and then compare it with another

rigid zodiac
rigid zodiac
mossy maple
lapis sequoia
serene scaffold
#

I have a jupyter server running on a remote machine. how can I access the jupyter GUI for that server locally?

#

it gave me a url I can go to but obviously I can't just copy and paste that url into a local browser.

serene scaffold
grave frost
#

@serene scaffold just use something like ColabCode if you are too lazy. I use it all the time

quasi parcel
#

hi @serene scaffold by remote server do you mean SageMaker or an AWS instance, or what is it

#

can you tell me

#

i think we can get through with apache

serene scaffold
#

@grave frost @quasi parcel I think I figured it out but it's specific to this system.

quasi parcel
#

okay awesome sir @serene scaffold

brisk moth
#

can anyone help me with not getting rate limited using twitter api?

#

```py
        keywords_tweets = tweepy.Cursor(self.api.search, keyword, lang = lang, tweet_mode = 'extended').items(count)

        return keywords_tweets
```
is my main function
#

im guessing its a problem with using the cursor but how can i change it?
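
tweepy's `API` constructor has a `wait_on_rate_limit` option that may be all that is needed here. If manual control is preferred, a generic retry-with-backoff wrapper is one possible approach. The sketch below is library-agnostic: `RateLimitError` and `fake_search` are hypothetical stand-ins, not tweepy names.

```python
import time

class RateLimitError(Exception):
    """Stand-in for whatever exception the client raises when rate limited."""

def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait base_delay * 2**attempt and retry."""
    for attempt in range(retries):
        try:
            return func()
        except RateLimitError:
            sleep(base_delay * 2 ** attempt)
    return func()   # final attempt; let any error propagate

# Demo with a fake call that is rate limited twice before succeeding.
calls = {"n": 0}

def fake_search():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitError
    return ["tweet"]

waits = []
result = call_with_backoff(fake_search, sleep=waits.append)
print(result, waits)
```

Passing `sleep=waits.append` in the demo records the delays instead of actually waiting; in real use the default `time.sleep` applies.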

real wigeon
#

how would i be able to count the occurrences of a list of categories

#

using pandas

serene scaffold
#

@real wigeon you'd need to share an example and the expected output

#

Is it a list of strings or what?

real wigeon
#

yes

#

it's categorical data

#

strings

#

like if i query my db, and get a dataframe, and in that dataframe i want to count the values in a column

#

but i want to count how many instances of each category

#

my ultimate goal is to actually make a bar chart out of this

reef wing
#

yep value_counts isnt bad

#

I recommend turning into a string before counting though
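
A short sketch of the `value_counts` route on a made-up column; the frame and column name are assumptions standing in for the real query result.

```python
import pandas as pd

# Hypothetical query result with one categorical column.
df = pd.DataFrame({"category": ["a", "b", "a", "c", "a", "b"]})

counts = df["category"].value_counts()   # sorted from most to least frequent
print(counts)

# With matplotlib installed, counts.plot(kind="bar") draws the bar chart.
```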

vale zephyr
#

Hi ! Anyone know how to apply a green mask (show only the green on an image) within the forward function (need to convert it to OpenVINO) in PyTorch ?

rose cipher
#

Guys, it is possible to learn the math needed for Data Science (Calculus, Linear Algebra, Statistics, and Probability) only with videos? I know that books are the right way, but there's a lot of unnecessary information that I will probably not use. And I do not want to master math, just want the necessary to work with Data Science and its fields (ML, AI, Data Engineering).

reef wing
#

Nope @rose cipher ! There are things I did entire videos + writing down notes + all the exercises at the end of the chapter of book + reading the chapter

And I still forget it 2-3 years down the line

#

The things I do exercises and practice I remember after quick refresher video

#

What I think you should do is choose a subset of applications like I did; as it helps with learning much less stuff and makes you much more productive and smarter at that one thing by focusing:

Choose Image processing, text processing, sound processing, or some one domain first and master it as not all the models and math applies to everything.

#

And finally, welcome to the world of data; I hope all Hispanics support each other.

serene scaffold
rose cipher
reef wing
#

Lol ignore the well thought out and experienced answer if you must. The marketplace already pays me handsomely, wouldn't want competition anywho.

prime hearth
#

hello, i would like to ask: for sentiment analysis (whether something is positive or negative), would we use count vectorizer, TF-IDF, bag of words etc., and for text summarization or NLP with relationships would we use things like word2vec and GloVe?

reef wing
#

@prime hearth Essentially you need a huge dictionary mapping words to sentiments and conditional sentiments. I wouldn't train one myself but use an existing library. Do you need an english one?

prime hearth
#

oh it just for personal projects

#

i was looking it over and doing some research, but from what i read the conclusion i got is: if it's for classification then use something like count vectorizer since it's easier (or others, and see what gives better accuracy), otherwise for NLP use word2vec etc

#

it's because i'm making 2 models, one to classify something as good or bad and another to summarize text using an RNN. However, i'm not sure if it's best just to use word2vec for both models, or which to use? I know that since similarity is important i would use word2vec, but i'm not sure for classification

reef wing
#

Well vectorizing the data is only to prepare it for the model; you can input that "vector" of data into a gradient boosting algorithm or any classifier now; because you now have count/dummy variables as input
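
To show what "vectorizing" produces, here is a hand-rolled bag-of-words sketch on a toy corpus. It mimics the idea behind a count vectorizer but is not scikit-learn's implementation.

```python
from collections import Counter

docs = ["good movie good plot", "bad movie"]   # toy corpus

# Vocabulary: one column per distinct word, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

def count_vector(doc):
    """Count how often each vocabulary word appears in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

vectors = [count_vector(doc) for doc in docs]
print(vocab)
print(vectors)
```

Each row of `vectors` is what a classifier such as gradient boosting would receive as input.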

#

The one that's worked the best for me is gradient boosting for text classification. but I think BERT is the best one that exists for NLP

#

You can make separate models and use the output of one model as input to the 2nd model; OR use the classification to check if the summary returns the same label "GOOD" or "BAD" as the original text.

uncut turret
#

Suggest me resource for AI or ML , python.

tender hearth
# uncut turret Suggest me resource for AI or ML , python.

Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programme...

MIT Technology Review

Machine-learning algorithms find and apply patterns in data. And they pretty much run the world.

The Official NVIDIA Blog

AI, machine learning, and deep learning are terms that are often used interchangeably. But they are not the same things.

GitHub

🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained - GitHub - trekhleb/homemade-machine-learning: 🤖 Python examples of popular machine ...

uncut turret
#

It would be better to follow one at a time?

tender hearth
#

Then after that I stopped following courses and started reading papers/articles for reference in my projects

prime hearth
#

Another good resource too for learning data cleaning techniques is Krish Naik youtube channel

uncut turret
#

Ok suggest me papers for references. I mean how I will use.

uncut turret
#

I don't want to get stuck in too many, as I have other subjects, not just one.

uncut turret
tender hearth
uncut turret
#

Right I just want to take glance 🙂

#

@tender hearth
Suggest me using weka related course.

tender hearth
#

weka?

uncut turret
arctic wedgeBOT
#

failmail :ok_hand: applied mute to @solar grotto until <t:1632452878:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

subtle barn
#

how to perform ml with this type of data. Its FHIR dataset of Allergy Intolerance

lapis sequoia
tender hearth
#

Do you want to predict something, classify something, etc?

#

You can't just "do ML" on data

arctic wedgeBOT
#

Hey @subtle barn!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .json attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

subtle barn
hasty grail
#

I suppose that this is for some open-ended project based on the given dataset

subtle barn
hasty grail
#

They didn't provide any guidance on what to do?

subtle barn
#

they told to do like anything on it

subtle barn
#

pls somebody can u do ml implementation on it. simple is fine

elfin atlas
subtle barn
elfin atlas
#

json is even easier than csv

subtle barn
#

btw i wont get paid or anything like that

elfin atlas
#

anyways you can read through the tensorflow docs to get up to speed with the basics

#

or you can use keras for a more high-level tf implementation

subtle barn
elfin atlas
#

what data is given?

subtle barn
#

i will send u the raw json link

subtle barn
subtle barn
royal crest
#

uhhh

lapis sequoia
#

and about the ML implementation: for starters you can show some statistics, and maybe try to find the attributes which matter most. then maybe divide the dataset into training and testing sets and see how different supervised algos work out.

lapis sequoia
# subtle barn can u pls do an ml implementation and send me the notebook pls

I recommend a master’s degree in machine learning at university before you start doing anything. Processing important data without expertise leads to erroneous results. Wrong results lead to wrong decisions and it is detrimental to everyone. Do you take responsibility if someone makes you an incorrect machine learning model which lead to wrong decisions? A bit like a fake doctor would ask you to do surgery on your behalf.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1632465649:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

subtle barn
lapis sequoia
elfin atlas
#

you cant just bodge your way through an internship

lapis sequoia
#

To me it seems that you have lied about your ability to build machine learning models and now you are here asking others to do it for you. Give us a break. I'm sorry, my intention is not to be rude, but someone who can't even convert a JSON file to, e.g., a pandas DataFrame doesn't know even the simplest basics of the industry.

weary lynx
#

o

subtle barn
lapis sequoia
subtle barn
lapis sequoia
#

You need to understand the problem before you can solve the problem. You also need to be able to examine the data to see what it is suitable for. Sometimes no relevant results are obtained from the data.

#

But good luck going forward! I have an appointment.

desert bear
#

When I was testing an ensemble model, I stumbled upon the term verbosity. There is a parameter that, if set to True, "controls the verbosity of the building process."
Does anyone know what it means?

tender hearth
#

we can't make assumptions

misty flint
#

i think the best advice ive heard recently

#

is try to solve your problem WITHOUT ML first

#

even if you end up having to use ML in the end, your solution will usually be better for having thought from that perspective

#

also not every problem is a ML problem

lapis sequoia
gaunt marsh
#

I have a Pie chart which looks like this. I want a legend on the right side which shows me the RGB color values of the pies pieces. I have the color values in an Array, I give them to the chart already. How can I pass it to a legend?

rigid zodiac
#

quick question, how do you split files inside a folder as train test split

#

like I'm trying to split it, since each csv file is considered as a pattern

pure gull
rigid zodiac
#

where each one of them was split into 80+ frames (10 seconds)

pure gull
rigid zodiac
#

I want to, but the issue is that the 80+ frames (in 1 csv file) will be considered as a fall or nonfall, and I have like 320 csv files for nonfall and 200 csv files for fall

rigid zodiac
pure gull
#

Ok so you just run the split on two lists. Fall and not fall. (There might be something called «stratified» to address this already)

rigid zodiac
pure gull
#

Are the files so large that memory becomes an issue?

rigid zodiac
pure gull
#

I would still construct a big dataframe of it all because you can then easily work with the data, loop over positions, fall/nonfall, ... use stratified sampling, etc

#

(but there are many ways to do this and this is just my preference)

#

My thinking is that when I am doing the analysis, I do not want to be concerned with which file the data is in, which format, ... and other issues. Convert up to a more abstract dataset so I can focus on the data, not on the storage formats, ...

rigid zodiac
#

but when you split it. it will randomize, there will no longer be 10seconds frames or is it?

#

like I tried it before but it is not correct
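
One way to keep each 10-second recording intact is to split the list of files rather than the individual frames, and to split each class on its own so the fall/nonfall ratio is preserved. A sketch with hypothetical file names that match the counts mentioned in the chat:

```python
import random

def split_files(files, test_fraction=0.2, seed=0):
    """Shuffle whole files and split them; frames inside a file stay together."""
    files = list(files)
    random.Random(seed).shuffle(files)
    n_test = round(len(files) * test_fraction)
    return files[n_test:], files[:n_test]

# Hypothetical file names matching the counts mentioned in the chat.
fall = [f"fall_{i}.csv" for i in range(200)]
nonfall = [f"nonfall_{i}.csv" for i in range(320)]

# Split each class separately so both sets keep the same fall/nonfall ratio.
fall_train, fall_test = split_files(fall)
non_train, non_test = split_files(nonfall)
train_files = fall_train + non_train
test_files = fall_test + non_test
print(len(train_files), len(test_files))
```

Only after this split would each file be loaded, so no recording ends up partly in training and partly in testing.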

rigid zodiac
serene scaffold
pure gull
#

yeah maybe but you should be able to do it quite easily yourself, too - it's hard to understand exactly what you are after here

prisma mulch
#

can someone eli5 n_estimators in random forest regression?