desert oar Sep 20, 2021, 6:14 PM

#

from math import isclose

# Lists of length-2 lists or tuples
computational_spans = [ ... ]
manual_spans = [ ... ]

def span_percent_overlap(lo1, hi1, lo2, hi2):
    # You can figure this part out :)
    ...

span_overlaps = {}
for comp_num, (comp_lo, comp_hi) in enumerate(computational_spans):
    for manu_num, (manu_lo, manu_hi) in enumerate(manual_spans):
        overlap = span_percent_overlap(comp_lo, comp_hi, manu_lo, manu_hi)
        if not isclose(overlap, 0.0):
            span_overlaps[comp_num, manu_num] = overlap

#

something like that maybe

slim drift Sep 20, 2021, 6:14 PM

#

hmm ill try it

#

give me 1 sec

slim drift Sep 20, 2021, 6:15 PM

#

desert oar ```python from math import isclose # Lists of length-2 lists or tuples computat...

what does the lo1 hi1 part mean

desert oar Sep 20, 2021, 6:16 PM

#

the starts and ends of the spans

#

as in "low" and "high"

slim drift Sep 20, 2021, 6:16 PM

#

oh ok is it fine to keep it like that?

#

which part do i need to figure out

desert oar Sep 20, 2021, 6:17 PM

#

variable names are up to you. this is just to demonstrate the use of enumerate and the nested loop
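As a small illustration of the pattern in that snippet: `enumerate` yields an index alongside each item, and the inner `(lo, hi)` target unpacks each length-2 span. The span values below are made up for the example.

```python
# Hypothetical example data: each span is a (start, stop) pair.
computational_spans = [(0, 4), (6, 10)]
manual_spans = [(2, 6)]

pairs = []
for comp_num, (comp_lo, comp_hi) in enumerate(computational_spans):
    for manu_num, (manu_lo, manu_hi) in enumerate(manual_spans):
        # Every computational span is visited together with every manual span.
        pairs.append((comp_num, manu_num, comp_lo, comp_hi, manu_lo, manu_hi))

print(pairs)
```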

#

you need to figure out how to determine if 2 spans overlap

slim drift Sep 20, 2021, 6:18 PM

#

desert oar you need to figure out how to determine if 2 spans overlap

shouldnt this be based on if they overlap on time

desert oar Sep 20, 2021, 6:18 PM

#

yes, isn't that what the contents of the lists are?

#

|----|
  |----|

|--------|
   |---|

|---|
      |----|

which spans overlap, and by how much? you need to write the code to do that part
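For reference, one possible way to fill in `span_percent_overlap` is sketched below. It assumes each span satisfies `lo <= hi`, and it assumes "percent overlap" means the overlapping length as a percentage of the first span's length. Neither assumption comes from the conversation, so change the denominator if a different definition is needed. The helper name `span_overlap_length` is made up for the example.

```python
def span_overlap_length(lo1, hi1, lo2, hi2):
    """Length of the intersection of two spans, or 0.0 if they are disjoint."""
    # The intersection starts at the later start and ends at the earlier end.
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def span_percent_overlap(lo1, hi1, lo2, hi2):
    """Overlap as a percentage of the first span's length (an assumed definition)."""
    length = hi1 - lo1
    if length <= 0:
        return 0.0
    return 100.0 * span_overlap_length(lo1, hi1, lo2, hi2) / length


print(span_percent_overlap(0, 4, 2, 6))   # partial overlap
print(span_percent_overlap(0, 8, 3, 7))   # second span inside the first
print(span_percent_overlap(0, 3, 6, 10))  # disjoint spans
```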

#

it sounded like in your code the spans are represented as a pair of times, the start time and the stop time

#

which is a reasonable way to represent a time span!

slim drift Sep 20, 2021, 6:19 PM

#

desert oar which is a reasonable way to represent a time span!

yea

desert oar Sep 20, 2021, 6:19 PM

#

so given two start-stop pairs, how do you calculate the % overlap?

slim drift Sep 20, 2021, 6:20 PM

#

well there's time in between as well

desert oar Sep 20, 2021, 6:20 PM

#

i'm not sure what you mean by that

slim drift Sep 20, 2021, 6:20 PM

#

there is time = 0, time =100, but also there are times in between

desert oar Sep 20, 2021, 6:21 PM

#

i still don't know what you mean by that

#

you asked how to find the overlaps between two lists of time spans, i am showing you the rough sketch of how you might do that

slim drift Sep 20, 2021, 6:22 PM

#

desert oar you need to figure out how to determine if 2 spans overlap

i dont get how to tell if the 2 spans overlap

forest beacon Sep 20, 2021, 6:30 PM

#

Hello!

fig = plt.Figure()
ax = fig.subplots(1)
plt.gcf().autofmt_xdate()
ax.set_facecolor('#1E1E1E')
fig.patch.set_facecolor('#1E1E1E')
x, y = [], []
async for document in db.main.stats.find({}):
    x.append(datetime.fromtimestamp(document['time']))
    y.append(document['total_users'])
ax.plot(x, y, color='orange', linestyle='-')
ax.grid(color='#323232', linewidth=1, linestyle='-')
ax.set_xlabel('Время')  # "Time"
ax.set_ylabel('Количество пользователей')  # "Number of users"
ax.tick_params(axis='both', colors='white')
ax.spines['top'].set_color('white')
ax.spines['bottom'].set_color('white')
ax.spines['left'].set_color('white')
ax.spines['right'].set_color('white')
ax.xaxis.label.set_color('white')
ax.yaxis.label.set_color('white')
ax.xaxis.set_major_locator(HourLocator(interval=2))
ax.xaxis.set_major_formatter(DateFormatter('%H:%M', tz=tz.gettz('Europe/Moscow')))
ax.yaxis.set_major_formatter(plt.FuncFormatter(format_tick))
plt.close('all')
fig.savefig('./data/stats.png', bbox_inches='tight', dpi=256)

I use matplotlib to create a stats graph for my bot. It runs every 15 minutes to update the image, and each time matplotlib uses some memory but doesn't release it. I tried plt.cla(), plt.close(), fig.clear() and fig.clf(), but these have no effect.
I even asked this question on Stack Overflow: https://stackoverflow.com/questions/69144822/releasing-memory-in-matplotlib
Maybe somebody here has had the same problem.
Help please.
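One thing that stands out in the snippet: `plt.Figure()` creates a figure that pyplot does not track, so `plt.gcf()` on the next line returns a different figure (creating one if none exists), and `autofmt_xdate()` is applied to that one rather than to `fig`. Whether this explains the memory growth is not confirmed here. A minimal sketch that avoids pyplot entirely is below; the function name `render_stats_png` and the sample data are made up for illustration.

```python
import io
from datetime import datetime, timedelta

from matplotlib.figure import Figure


def render_stats_png(times, totals):
    """Render the line chart to PNG bytes without touching pyplot."""
    fig = Figure()  # not registered with pyplot, so there is nothing to close
    ax = fig.subplots()
    ax.plot(times, totals, color="orange", linestyle="-")
    fig.autofmt_xdate()  # applied to this figure, not to plt.gcf()
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    return buf.getvalue()


# Made-up sample data standing in for the database documents.
start = datetime(2021, 9, 20, 12, 0)
times = [start + timedelta(minutes=15 * i) for i in range(4)]
png_bytes = render_stats_png(times, [10, 12, 15, 14])
```

The bytes can then be written to `./data/stats.png` or sent directly, and the figure is freed by ordinary garbage collection once the function returns.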

desert oar Sep 20, 2021, 6:30 PM

#

slim drift i dont get how to tell if the 2 spans overlap

look at the "diagram" i drew - can you figure it out?

slim drift Sep 20, 2021, 6:31 PM

#

desert oar ``` |----| |----| |--------| |---| |---| |----| ``` which spans ove...

would it be an if statement if the bars overlap

desert oar Sep 20, 2021, 6:31 PM

#

yes, you'd probably want to use if at some point

#

you'll probably need the comparison operators like >, <=, etc. and the math operators like -

#

i highly suggest writing out your solution on paper first

#

do some test calculations by hand, to make sure it makes sense

#

then translate your solution to code

#

professionals do this all the time. it's a tool for thought

slim drift Sep 20, 2021, 6:32 PM

#

ok i will write it out

forest beacon Sep 20, 2021, 6:35 PM

#

forest beacon Hello! ```python fig = plt.Figure() ax = fig.subplots(1) plt.gcf().autofmt_xdate...

help please

olive shore Sep 20, 2021, 11:27 PM

#

Does anyone have a reddit dataset that I could use to train GPT 2?

reef wing Sep 21, 2021, 12:28 AM

#

https://www.reddit.com/r/datasets/ @olive shore

bronze rampart Sep 21, 2021, 2:35 AM

#

hello all

hasty grail Sep 21, 2021, 2:42 AM

#

Hello, do you need help with something?

bronze rampart Sep 21, 2021, 2:57 AM

#

definitely need help

#

having hard time making viz my default editor

#

cant install viz module

royal crest Sep 21, 2021, 3:02 AM

#

for editor-related stuff, perhaps enquire in #editors-ides

bronze rampart Sep 21, 2021, 3:03 AM

#

thanks

arctic wedgeBOT Sep 21, 2021, 5:03 AM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1632201233:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

minor spoke Sep 21, 2021, 5:08 AM

#

Hey can someone help me with ML stuff, I'm trying to work with upsampling point clouds

lone drum Sep 21, 2021, 5:09 AM

#

Hello

#

My code. I tried it this way:

```python
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}', engine='python', chunksize=100000, iterator=True)):

    sep_time = pd.DataFrame()
    print('chunk no.:', i)
    for n in chunk['Transaction Time']:

        print('n:', n)

        converted_timestamp = datetime.fromtimestamp(n / 1_000_000_000)
        print('converted:', converted_timestamp)
        print()

        get_month = converted_timestamp.month

        get_day = converted_timestamp.day

        # add 10 to year
        year_add_10 = converted_timestamp.year + 10

        # hour
        get_hour = converted_timestamp.hour

        # minute
        get_min = converted_timestamp.minute

        # second
        get_sec = converted_timestamp.second

        # microsecond
        get_microsec = converted_timestamp.microsecond

        c = datetime(year_add_10, get_month, get_day, get_hour, get_min, get_sec, get_microsec)
        print('datetime:', c)
        print()
        final_date_time = datetime.strftime(c, '%d-%m-%Y %H:%M:%S.%f')
        print('final_date_time:', final_date_time)
        print()

        chunk['time_seprated'] = final_date_time

        print('chunk1:')
        print(chunk)
        print()
```

#

I am getting the same value in the last column

```
chunk1:
      Msgtype Activity Type  ...   Qty               time_seprated
0       Order             N  ...  1175  28-08-2021 14:45:08.107780
1       Order             N  ...  1175  28-08-2021 14:45:08.107780
2       Order             N  ...   500  28-08-2021 14:45:08.107780
3       Order             N  ...  1175  28-08-2021 14:45:08.107780
4       Order             M  ...  1175  28-08-2021 14:45:08.107780
      ...           ...  ...   ...                         ...
99995   Order             M  ...    50  28-08-2021 14:45:08.107780
99996   Order             M  ...    25  28-08-2021 14:45:08.107780
99997   Order             M  ...    50  28-08-2021 14:45:08.107780
99998   Order             M  ...    25  28-08-2021 14:45:08.107780
99999   Order             M  ...    50  28-08-2021 14:45:08.107780
```

#

Can anyone help me understand why I am getting the same value in the last column?

#

Ping me when replying

royal crest Sep 21, 2021, 5:27 AM

#

lone drum My code ```python I tried for i, chunk in enumerate(pd.read_csv(f'{path}{file_na...

This looks exactly the same as @dull turtle's .....

lone drum Sep 21, 2021, 5:31 AM

#

royal crest This looks exactly same as <@!672406849538097173>'s .....

Yes
Can u please look into this?

lapis sequoia Sep 21, 2021, 5:31 AM

#

bet

#

love this place

royal crest Sep 21, 2021, 5:32 AM

#

i think rule 8 applies at this point

#

i won't be involved

forest beacon Sep 21, 2021, 6:59 AM

#

forest beacon Hello! ```python fig = plt.Figure() ax = fig.subplots(1) plt.gcf().autofmt_xdate...

help please

royal crest Sep 21, 2021, 7:12 AM

#

forest beacon help please

problems like this can come from multiple sources

#

if the graph is generated as expected then you could probably rule out your matplotlib code

#

it might actually be the way your async code is set up instead

forest beacon Sep 21, 2021, 7:16 AM

#

royal crest if the graph is generated as expected then you could probably rule out your matp...

yes, graph generates as expected

lone drum Sep 21, 2021, 9:27 AM

#

lone drum Can anyone help me to understand why I am getting same value in last column

Can anyone help me in this?

gray tartan Sep 21, 2021, 9:36 AM

#

Hey ! I have an issue with pandas (that i asked in #help-avocado and another channel before that)
should i copy it there or should i just wait ?
It is pretty blocking :/
Thanks in advance for the help !

uncut barn Sep 21, 2021, 11:03 AM

#

o180_img10_patch_419_43.jpg, 
o180_mask10_patch_419_43.jpg

I want to check if these strings match from patch_419_43.jpg onward, i.e. I want to first find the word patch and print it with everything to the right of it in the string. I have an idea of using some kind of regex but I'm not sure how to implement it. Any help will be much appreciated.
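A minimal sketch of the regex idea, assuming the word `patch` appears at most once in each filename. The helper name `patch_suffix` is made up for the example.

```python
import re


def patch_suffix(filename):
    """Return the part of filename starting at 'patch', or None if it is absent."""
    match = re.search(r"patch.*", filename)
    return match.group(0) if match else None


img = "o180_img10_patch_419_43.jpg"
mask = "o180_mask10_patch_419_43.jpg"

print(patch_suffix(img))
print(patch_suffix(img) == patch_suffix(mask))
```

Without a regex, `filename[filename.index("patch"):]` does the same job when the word is known to be present.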

gaunt marsh Sep 21, 2021, 11:05 AM

#

Thank you. What is that 06.2f in return '(' + ', '.join(format(val, '06.2f') for val in rgb) + ')' doing?

gaunt marsh Sep 21, 2021, 11:53 AM

#

I am getting TypeError: unsupported operand type(s) for +: 'int' and 'str' How can I fix this?

I use this code:

```python
plt.axis('off')
plt.show()
```

This is what rgb_labels look like

Bildschirmfoto_2021-09-21_um_13.53.02.png

pliant shadow Sep 21, 2021, 12:35 PM

#

has anyone here dealt with SageMath before?

#

facing an issue which is described here: https://github.com/sagemath/sage-windows/issues/63

desert oar Sep 21, 2021, 1:01 PM

#

@dull turtle so not only did you refuse to do any work for yourself, but this also turned out to be a homework assignment? That's quite unethical. You are only cheating yourself by refusing to study and learn.

desert oar Sep 21, 2021, 1:01 PM

#

gaunt marsh I am getting `TypeError: unsupported operand type(s) for +: 'int' and 'str'` How...

The sizes are supposed to be numbers... they're sizes after all!

desert oar Sep 21, 2021, 1:02 PM

#

gaunt marsh Thank you. What is that 06.2f in `return '(' + ', '.join(format(val, '06.2f') fo...

Padding, the "06" means "pad with 0s up to 6 characters total", and the ".2" means "show exactly 2 decimal places"
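A quick illustration of that format spec, using made-up values:

```python
values = [3.14159, 25.5, 255]

# '06.2f': fixed-point, 2 decimal places, zero-padded to a total width of 6.
formatted = [format(val, "06.2f") for val in values]
print(formatted)

# The same expression as in the question, applied to the sample values.
rgb_text = "(" + ", ".join(format(val, "06.2f") for val in values) + ")"
print(rgb_text)
```

Note that the width of 6 counts the decimal point and the digits after it, so `255` needs no padding at all.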

#

@gaunt marsh use the counts for the sizes and the labels for the axis labels

lapis sequoia Sep 21, 2021, 1:47 PM

#

Hey

#

Has anyone worked on Marketing Mix Modelling?

dull turtle Sep 21, 2021, 1:54 PM

#

!pastebin

arctic wedgeBOT Sep 21, 2021, 1:54 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

gaunt marsh Sep 21, 2021, 1:56 PM

#

desert oar Padding, the "06" means "pad with 0s up to 6 characters total", and the ".2" mea...

thank you!

#

I need help with autopct for my pie chart. I have the absolute values as a list and want to pass them. The examples from stackoverflow didn't work
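When `autopct` is a function, matplotlib calls it with each wedge's percentage, so one common approach is to convert that percentage back to an absolute value. A hedged sketch follows; the helper name `make_autopct` and the counts are made up, and the `plt.pie` call is shown only as a comment.

```python
def make_autopct(values):
    """Build an autopct callback that shows the absolute value next to the percentage."""
    total = sum(values)

    def autopct(pct):
        # pct is the wedge's share of the total, in percent.
        absolute = round(pct / 100 * total)
        return f"{absolute} ({pct:.1f}%)"

    return autopct


counts = [30, 50, 20]
label_for = make_autopct(counts)
print(label_for(30.0))

# Intended usage (not executed here):
# plt.pie(counts, labels=labels, autopct=make_autopct(counts))
```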

tidal bough Sep 21, 2021, 2:46 PM

#

https://towardsdatascience.com/a-simple-cnn-multi-image-classifier-31c463324fa

Why does this article call it a CNN?
the model seems to be:

model = Sequential() 
model.add(Flatten(input_shape=train_data.shape[1:])) 
model.add(Dense(100, activation=keras.layers.LeakyReLU(alpha=0.3))) 
model.add(Dropout(0.5)) 
model.add(Dense(50, activation=keras.layers.LeakyReLU(alpha=0.3))) 
model.add(Dropout(0.3)) 
model.add(Dense(num_classes, activation='softmax'))

which I don't see any convolutions in

hasty grail Sep 21, 2021, 2:58 PM

#

No idea

#

I don't think transfer learning is taking place either - the pretrained VGG16 model is only used to generate the labels

inner imp Sep 21, 2021, 3:04 PM

#

Hello. I know basics of Python, and i wanna learn machine learning, AI and stuff. Can someone recommend me a good resource where i can learn 4 free?

dull turtle Sep 21, 2021, 3:10 PM

#

```
Traceback (most recent call last):
  File "F:\office codes\transaction time.py", line 52, in <module>
    chunk['sep_time'] = sep_time
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
    raise ValueError(
ValueError: Length of values (1) does not match length of index (100000)
```

#

my code:
```python
        final_date_time = datetime.strftime(c, '%d-%m-%Y %H:%M:%S.%f')
        # New_Time.append(final_date_time)
        sep_time.append(final_date_time)
        print('final_date_time:', final_date_time)
        print()

    chunk['sep_time'] = sep_time

    new_path = r'F:/output csv files/'
    file_name = 'Stream-5 datetime extracted'
    extension = '.csv'
    chunk.to_csv(f'{new_path}{file_name}{extension}', index=False, mode='a', header=None)
    print('data saved in extracted csv file')
```

#

how can i add the output as a column at the end of the dataframe?

#

ping me when replying

desert oar Sep 21, 2021, 3:19 PM

#

@dull turtle you've repeatedly ignored and disregarded something like an hour of help and attention from multiple people here

#

And now you're re-asking the original question

worldly lake Sep 21, 2021, 3:19 PM

#

I have two arrays:

massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

does anyone know how i can combine them into one array like this:

massive_third = [{"sku": "1", "type": "car", "color": "yellow", "price": "4000", "quentity": "8"}]

but with a for loop, on every iteration?

dull turtle Sep 21, 2021, 3:20 PM

#

desert oar And now you're re-asking the original question

see actually i tried to solve my problem but i am getting above error

desert oar Sep 21, 2021, 3:20 PM

#

worldly lake I have two arrays: ``` massive_first = [{"sku": "1", "type": "car", "color": "ye...

Maybe use a dataframe and do a join?

dull turtle Sep 21, 2021, 3:20 PM

#

i have been stuck on this since yesterday, can u please look into this?

worldly lake Sep 21, 2021, 3:20 PM

#

@desert oar dataframe from pandas module?

lapis sequoia Sep 21, 2021, 3:21 PM

#

so @dull turtle is this like small array or like big chunk of data?

#

if small array, i mean it should not be too hard IMO.

dull turtle Sep 21, 2021, 3:22 PM

#

lapis sequoia so <@!672406849538097173> is this like small array or like big chunk of data?

my data is 5.5 gb

desert oar Sep 21, 2021, 3:22 PM

#

worldly lake <@!389497659087650836> dataframe from pandas module?

Yes. Are the two arrays in the same order?

lapis sequoia Sep 21, 2021, 3:22 PM

#

also you mean to combine them assuming merge by indexes right?

dull turtle Sep 21, 2021, 3:22 PM

#

so i am using chunksize for it

desert oar Sep 21, 2021, 3:23 PM

#

worldly lake <@!389497659087650836> dataframe from pandas module?

If both arrays are in the same order, you can just loop through pairs of elements. Otherwise you will need some kind of join
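A minimal sketch of the "loop through pairs" case, assuming both lists have the same length and the same order. The `sku` check is an added safeguard, not something from the conversation.

```python
massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

massive_third = []
for first, second in zip(massive_first, massive_second):
    # Guard against the two lists silently drifting out of step.
    if first["sku"] != second["sku"]:
        raise ValueError(f"sku mismatch: {first['sku']} vs {second['sku']}")
    # Later keys win on conflict; here only 'sku' is shared and it is equal.
    massive_third.append({**first, **second})

print(massive_third)
```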

lapis sequoia Sep 21, 2021, 3:23 PM

#

dull turtle my data is of 5.5 gb

oh, a df may be a bad way to do it then, since it will take quite a lot of RAM.
I suggest simply using loops.
assuming by 5.5gb you mean it's in some kind of file.

dull turtle Sep 21, 2021, 3:23 PM

#

lapis sequoia oh df may be bad way for it then, since it will take quite RAM. I suggest simply...

csv file

desert oar Sep 21, 2021, 3:23 PM

#

lapis sequoia oh df may be bad way for it then, since it will take quite RAM. I suggest simply...

We helped them extensively with this yesterday. They are reading a data frame in chunks using the pandas read_csv chunking/iterator API

#

And the problem is that they are re-inserting the column at every iteration of a loop when they should be inserting it only once

#

Which we have explained over and over
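To illustrate the point about inserting the column once: assigning a single string to a DataFrame column fills every row with that value, so after the inner loop finishes every row holds the last value. The sketch below uses a tiny made-up frame. The column name `Transaction Time` is from the conversation, while the values and the assumption that they are integer nanoseconds since the epoch are only for illustration.

```python
import pandas as pd

chunk = pd.DataFrame({"Transaction Time": [0, 86_400_000_000_000]})

# Scalar assignment inside a loop: each pass overwrites the whole column,
# so only the value from the final iteration survives.
for n in chunk["Transaction Time"]:
    chunk["looped"] = pd.to_datetime(n, unit="ns").strftime("%d-%m-%Y")

# One vectorized assignment: each row gets its own converted value.
converted = pd.to_datetime(chunk["Transaction Time"], unit="ns")
chunk["vectorized"] = converted.dt.strftime("%d-%m-%Y")

print(chunk)
```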

lapis sequoia Sep 21, 2021, 3:24 PM

#

desert oar We helped them extensively with this yesterday. They are reading a data frame in...

oh i see. alright, I kinda used the csv module in the past when i had data this big.

desert oar Sep 21, 2021, 3:24 PM

#

lapis sequoia oh i see. alright, I kinda used `csv` module in past when i had like this big da...

Yeah, I don't think pandas always had this functionality. I have done the same in the past, but now you can read in chunks in a tidy way

dull turtle Sep 21, 2021, 3:25 PM

#

lapis sequoia oh i see. alright, I kinda used `csv` module in past when i had like this big da...

can u now look into my problem also

lapis sequoia Sep 21, 2021, 3:25 PM

#

that is great!! I'll look into chunksize in my free time.

lapis sequoia Sep 21, 2021, 3:26 PM

#

dull turtle can u now look into my problem also

i mean i think desert oar has already said what you're doing wrong.
but yeah sure, i can try, what have you done so far?
Also you didn't say, is the sequence the same in both cases?

worldly lake Sep 21, 2021, 3:26 PM

#

@desert oar I have two arrays that I want to link using the same "sku" (article) keys, but the problem is that the second array contains some objects that are not in the first one, and I need to somehow compare and display them. I thought at first of making an object, but I need to iterate over two arrays at the same time. I read about itertools.chain but it doesn't really help me much.

dull turtle Sep 21, 2021, 3:26 PM

#

lapis sequoia i mean i think salt has already said what you're doing wrong. but yeah sure, i c...

wait i share my code

lapis sequoia Sep 21, 2021, 3:27 PM

#

worldly lake <@!389497659087650836>I have two arrays that I want to link using the same "sku"...

oh i see, are they sorted tho?

dull turtle Sep 21, 2021, 3:27 PM

#

my code here https://paste.pythondiscord.com/sizixeyiha.py @lapis sequoia

desert oar Sep 21, 2021, 3:27 PM

#

worldly lake <@!389497659087650836>I have two arrays that I want to link using the same "sku"...

Chain is for concatenating, you need to work on pairs of elements

lapis sequoia Sep 21, 2021, 3:27 PM

#

also please no DM. @dull turtle

desert oar Sep 21, 2021, 3:28 PM

#

@worldly lake I can't tag you because you have RTL in your name

dull turtle Sep 21, 2021, 3:28 PM

#

lapis sequoia also please no DM. <@!672406849538097173>

do u get my code?

desert oar Sep 21, 2021, 3:29 PM

#

oh it's only buggy on mobile

#

the tagging works on desktop

worldly lake Sep 21, 2021, 3:29 PM

#

@lapis sequoia Sorted, yes. I just have a list of products in the first array, and information about the price and quantity of each product in the second. It turns out that some products in the first array have no price, which is why the first array contains more objects than the second. And because of this, I cannot sort. Although I can export the sku values of the products and then check whether each one I need is present. I seem to have solved my problem.

desert oar Sep 21, 2021, 3:29 PM

#

i would still use a dataframe here

#

implementing a join manually over lists would be slow and a lot more code than you need

lapis sequoia Sep 21, 2021, 3:30 PM

#

wait you both have different questions?

worldly lake Sep 21, 2021, 3:30 PM

#

@desert oar sorry for that

lapis sequoia Sep 21, 2021, 3:30 PM

#

worldly lake <@456226577798135808>Sorted yes. I just have a list of products in the first arr...

share a small example.

desert oar Sep 21, 2021, 3:30 PM

#

they did

# inputs
massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

# result
massive_third = [{"sku": "1", "type": "car", "color": "yellow", "price": "4000", "quentity": "8"}]

dull turtle Sep 21, 2021, 3:30 PM

#

desert oar can we discuss again? please

lapis sequoia Sep 21, 2021, 3:30 PM

#

okay okay. hold on.

desert oar Sep 21, 2021, 3:31 PM

#

dull turtle salt rock can we discuss again ? please

i am happy to help you, but please read what i tried to tell you yesterday. i even wrote multiple new versions of your code for you

#

i won't re-answer the same question over and over

#

@worldly lake doing this "manually" would entail doing 2 loops: one over massive_first, and another over massive_second - basically comparing all pairs of elements

dull turtle Sep 21, 2021, 3:32 PM

#

my data

desert oar Sep 21, 2021, 3:32 PM

#

you can speed this up by using some kind of index instead of just scanning the list. the simplest thing to do in python would be to build a hash table of SKU -> record
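A minimal sketch of that index idea, assuming `sku` is unique within the second list. The function name `join_by_sku` and the extra record with sku `"2"` are made up for the example.

```python
def join_by_sku(first, second):
    """Merge records from `first` with matching records from `second` by 'sku'."""
    # Build the index once so each lookup is a dict access, not a list scan.
    second_by_sku = {record["sku"]: record for record in second}

    merged, missing = [], []
    for record in first:
        extra = second_by_sku.get(record["sku"])
        if extra is None:
            # No price/quantity record for this product: keep it and note the sku.
            missing.append(record["sku"])
            merged.append(dict(record))
        else:
            merged.append({**record, **extra})
    return merged, missing


massive_first = [
    {"sku": "1", "type": "car", "color": "yellow"},
    {"sku": "2", "type": "bike", "color": "red"},
]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

merged, missing = join_by_sku(massive_first, massive_second)
print(merged)
print(missing)
```

This behaves like a left join: every record of the first list is kept, and the skus without a match are reported separately.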

dull turtle Sep 21, 2021, 3:32 PM

#

desert oar i am happy to help you, but please read what i tried to tell you yesterday. i ev...

i tried your code but csv file in output has nothing inside it

lapis sequoia Sep 21, 2021, 3:32 PM

#

desert oar <@!287925492080967681> doing this "manually" would entail do 2 loops: one over `...

comparing all may not be required if both of them are sorted, by something unique

desert oar Sep 21, 2021, 3:32 PM

#

that's true too

dull turtle Sep 21, 2021, 3:32 PM

#

i got a blank csv file yesterday

desert oar Sep 21, 2021, 3:32 PM

#

but again, doing this all in pandas would make this easier

#

yes @dull turtle, i said that this was untested code written partly by guessing, and that you need to double-check the documentation for these functions to see how to actually use it

#

set mode='a' in my code, try that

worldly lake Sep 21, 2021, 3:33 PM

#

@lapis sequoia
https://pastebin.com/B3QxVqr0

lapis sequoia Sep 21, 2021, 3:34 PM

#

worldly lake <@456226577798135808> https://pastebin.com/B3QxVqr0

ah thanks hold on.

worldly lake Sep 21, 2021, 3:35 PM

#

@lapis sequoia first massive has an object with sku 107 but second massive does not

lapis sequoia Sep 21, 2021, 3:35 PM

#

that seems like a small data

#

wait is that js?

desert oar Sep 21, 2021, 3:36 PM

#

"massive" would be like 107 million 🙂

#

and even then, that's pretty small for some systems

lapis sequoia Sep 21, 2021, 3:36 PM

#

don't tell me the arrays you shared are array of jsons and not dicts :'(

desert oar Sep 21, 2021, 3:37 PM

#

https://paste.pythondiscord.com/inagowoyoh.py

#

doing that by hand would be a lot slower and a lot more verbose

#

you might have to change how= depending on the desired effect:

df_result = df1.join(df2, how='inner')
df_result = df1.join(df2, how='outer')
df_result = df1.join(df2, how='left')
df_result = df1.join(df2, how='right')

burnt knot Sep 21, 2021, 3:38 PM

#

So basically throughout August and September I've been beside myself as I struggled to figure out how to build a model that can generate TTS for English and Japanese in one sentence

desert oar Sep 21, 2021, 3:39 PM

#

https://www.w3schools.com/sql/sql_join.asp @worldly lake see here for the different kinds of joins

burnt knot Sep 21, 2021, 3:39 PM

#

I used Multi-Tacotron-Voice-Cloning for the most part
This is a GitHub repository designed to make it easy to do voice cloning for multiple languages by swapping out the English and Russian in the dataset

#

It's divided into four portions: g2p, encoder, synthesiser, and vocoder

#

I spent a good deal of time successfully making a g2p collection for English and Japanese

#

The problem comes with everything after that

#

The instructions are set out plainly for me to use accordingly

#

Probably because there's little room for error if I do it right

desert oar Sep 21, 2021, 3:41 PM

#

you can also use pd.merge which is a slightly different interface (more concise in this particular case):

df1 = pd.DataFrame(massive1)
df2 = pd.DataFrame(massive2)

df_result = pd.merge(df1, df2, on='sku', how='outer')
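Since the goal mentioned earlier was also to see which records exist in only one of the lists, `pd.merge` accepts `indicator=True`, which adds a `_merge` column saying where each row came from. A hedged sketch with made-up records follows; the second product with sku `"2"` is not from the conversation.

```python
import pandas as pd

massive1 = [{"sku": "1", "type": "car"}, {"sku": "2", "type": "bike"}]
massive2 = [{"sku": "1", "price": "4000"}]

df1 = pd.DataFrame(massive1)
df2 = pd.DataFrame(massive2)

# how='outer' keeps rows from both sides; _merge is 'both', 'left_only' or 'right_only'.
df_result = pd.merge(df1, df2, on="sku", how="outer", indicator=True)

only_in_first = df_result.loc[df_result["_merge"] == "left_only", "sku"].tolist()
print(df_result)
print(only_in_first)
```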

burnt knot Sep 21, 2021, 3:41 PM

#

It seems the main reason I can't make this model has to do with my choice for audio?

#

I struggled with the encoder step in terms of formatting and collecting English voices, but I got it to work
But then when I got to the synthesiser step I was stuck because I didn't have a metadata file

#

I don't know how I was expected to organise my data because it really hurt me in the process

#

I think I could overcome this if I figure out how to properly format any kind of dataset I have
But I don't really know how that would work because the author of the repository never addressed this

#

That's where I've been throughout September

#

I tried asking for help from other people that do voice cloning but they don't have a clue what to tell me

#

I'm running out of ideas and haven't figured out how to start over

worldly lake Sep 21, 2021, 3:49 PM

#

@desert oar @lapis sequoia oh my godddddddddddd, thank you both very much!!!!!

#

this is exactly what I needed!

#

can I express my gratitude to you?
is there any section for reviews? hah :D

dull turtle Sep 21, 2021, 3:52 PM

#

@desert oar hello i am using this code
```python
df_chunks = pd.read_csv(
    f'{path}{file_name}{extension}',
    engine="python",
    chunksize=100000,
    iterator=True,
)

for i, chunk in enumerate(df_chunks):
    print("chunk no.:", i)

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365)
    chunk.insert(2, "time_separated", time_separated)
    del chunk["Transaction Time"]

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_append = not is_first_chunk

    append = write_append

    chunk.to_csv(filename_out, header=write_header, mode='a')

    chunk.to_csv(f'{new_path}{output_file_name}{extension}', header=write_header, mode='a' if write_append else 'w')
```

#

can we discuss again ?

desert oar Sep 21, 2021, 3:54 PM

#

dull turtle <@!389497659087650836> hello i am using this code ```python df_chunks = pd.read_...

if you can post a few rows from your csv data, i can actually test this code

#

!paste use this site 👇

arctic wedgeBOT Sep 21, 2021, 3:54 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar Sep 21, 2021, 3:55 PM

#

you only need to post the first 10 rows or something like that

#

also, @dull turtle is this actually a school assignment? if so, i will have to stop giving answers as per our rules, but i will at least try to debug what i already wrote

dull turtle Sep 21, 2021, 3:57 PM

#

desert oar if you can post a few rows from your csv data, i can actually test this code

my data https://paste.pythondiscord.com/oyonajovac.apache check here

dull turtle Sep 21, 2021, 3:57 PM

#

desert oar also, <@!672406849538097173> is this actually a school assignment? if so, i will...

i am working on a project, i am stuck on this problem

#

do u get my data in pastebin link ?

desert oar Sep 21, 2021, 4:02 PM

#

yes, thank you

#

i noticed that another user posted very similar code and almost the exact same problem

#

are you collaborating with someone on it? are you using 2 different accounts?

#

were you given a template by your instructor?

dull turtle Sep 21, 2021, 4:02 PM

#

not at all

desert oar Sep 21, 2021, 4:02 PM

#

#data-science-and-ml message

#

this is the same code as your original code, right down to the typos

dull turtle Sep 21, 2021, 4:04 PM

#

desert oar were you given a template by your instructor?

see, firstly i have to change my Transaction Time into a human-readable time format

#

i have written script for it

#

if u check my code u will find

#

now i just want to add that output column at end of df

#

do u get my point what i am trying to do ?

desert oar Sep 21, 2021, 4:05 PM

#

yes, i understand. but i am trying to get more context for this

lapis sequoia Sep 21, 2021, 4:05 PM

#

worldly lake <@!389497659087650836><@456226577798135808> oh my godddddddddddd, thanks you bot...

all thanks to desert oar heh, i was a mere spectator in this case.

dull turtle Sep 21, 2021, 4:06 PM

#

desert oar yes, i understand. but i am trying to get more context for this

can u ask me what u want more? so u also get more idea?

desert oar Sep 21, 2021, 4:06 PM

#

what is the nature of the project? what kind of task were you given? are you going to just use my code and claim it as your own?

#

anyway, i tested my code and this worked as expected:

from datetime import timedelta

import pandas as pd


filename_in = "..."
filename_out = "..."

df_chunks = pd.read_csv(filename_in, sep='\t', chunksize=5, iterator=True)

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(filename_out, header=write_header, mode=write_mode, index=False, sep='\t')

#

obviously that chunksize is very small, it was just for testing
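One caveat worth noting: `timedelta(days=365*10)` is not exactly ten calendar years, because it ignores leap days. If the goal is to add 10 to the year, as in the original loop, `pd.DateOffset(years=10)` is one option. The timestamps below are made up for the example.

```python
from datetime import timedelta

import pandas as pd

ts = pd.Timestamp("2011-08-28 14:45:08")

by_days = ts + timedelta(days=365 * 10)   # drifts by the leap days in between
by_years = ts + pd.DateOffset(years=10)   # same month, day and time, year + 10
print(by_days, by_years)

# The same offset can be added to a whole datetime column at once.
times = pd.Series(pd.to_datetime(["2011-08-28", "2011-12-31"]))
shifted = times + pd.DateOffset(years=10)
print(shifted)
```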

burnt knot Sep 21, 2021, 4:08 PM

#

This is the Github repository I was talking about
https://github.com/vlomme/Multi-Tacotron-Voice-Cloning

#

I basically used the Google Colab provided and rewrote it to be mounted to my drive and work from there with the instructions

#

I've gone through many issues environment-wise, but I believe the problem is entirely in making sure all my files are in place

#

Something that I haven't been able to understand

dull turtle Sep 21, 2021, 4:10 PM

#

desert oar obviously that chunksize is very small, it was just for testing

can i use this code by increasing chunksize to 100000

#

?

#


Traceback (most recent call last):

  File "F:\office codes\transaction time seprateed code.py", line 46, in <module>
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err

KeyError: 'Transaction Time'

this error i am getting @desert oar

desert oar Sep 21, 2021, 4:21 PM

#

i can't reproduce the error, works fine for me

dull turtle Sep 21, 2021, 4:22 PM

#

desert oar i can't reproduce the error, works fine for me

i used this with the chunksize increased to 100000

quasi parcel Sep 21, 2021, 4:23 PM

#

hi @desert oar i hope you are doing well, your code works well for string but for int or float its not working could you help when you are free?

dull turtle Sep 21, 2021, 4:23 PM

#

desert oar i can't reproduce the error, works fine for me

i agree it works fine for u, but can u help me why i am getting error

desert oar Sep 21, 2021, 4:24 PM

#

i don't know why you're getting the error, because it works for me

lapis sequoia Sep 21, 2021, 4:24 PM

#

dull turtle ```python Traceback (most recent call last): File "F:\office codes\transacti...

simple traceback suggests that you may not have that column precisely for some chunk?

desert oar Sep 21, 2021, 4:24 PM

#

that seems odd

#

more realistically del did something, i've since removed that from my code

desert oar Sep 21, 2021, 4:24 PM

#

quasi parcel hi @desert oar i hope you are doing well, your code works well for st...

i don't remember what my code was. just post your code and some example data, and someone can help. i'm not the only person here who knows pandas, numpy, etc.

quasi parcel Sep 21, 2021, 4:25 PM

#

i am sorry

#

sure

serene scaffold Sep 21, 2021, 4:25 PM

#

quasi parcel i am sorry

Part of the advantage of this channel is that lots of people with data science knowledge hang out here. salt rock lamp is very knowledgeable, but it's best to pose your questions as something that anyone can help you with.

quasi parcel Sep 21, 2021, 4:27 PM

#

i agree sorry @serene scaffold

lapis sequoia Sep 21, 2021, 4:27 PM

#

lol its alright, anyways, you may want to reshare the question/issue.

#

or atleast put the link of question shared previously

quasi parcel Sep 21, 2021, 4:29 PM

#

sharing in two mins

#

@lapis sequoia

dull turtle Sep 21, 2021, 4:29 PM

#

lapis sequoia simple traceback suggests that you may not have that column precisely for some c...

see this data i have Transaction Time column

lapis sequoia Sep 21, 2021, 4:30 PM

#

either you're doing something wrong with the data. or for that chunk that data is null in all cases(an assumption)

a simple way to debug would be to print the column names and/or see if you're deleting some columns or something

dull turtle Sep 21, 2021, 4:31 PM

#

lapis sequoia either you're doing something wrong with the data. or for that chunk that data i...

my code here ```python
df_chunks = pd.read_csv(f'{path}{file_name}{extension}', sep='\t', chunksize=100000, iterator=True)

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(f'{new_path}{output_file_name}{extension}', header=write_header, mode=write_mode, index=False, sep='\t')
```

#

i am using salt rock lamp code

lapis sequoia Sep 21, 2021, 4:34 PM

#

print(chunk.columns.tolist())

you can see columns when this breaks to see if Transaction Time exists over there.

dull turtle Sep 21, 2021, 4:35 PM

#

lapis sequoia ```py print(chunk.columns.tolist()) ``` you can see columns when this breaks to ...

['Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,Buy/Sell,Buy Order Number,Sell Order Number,Price,Qty']

burnt knot Sep 21, 2021, 4:35 PM

#

I would also like to mention I was recommended this
https://github.com/deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning

GitHub

GitHub - deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning: ...

Tacotron 2 - PyTorch implementation with faster-than-realtime inference modified to enable cross lingual voice cloning. - GitHub - deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning: Tacotron...

#

I was working with this (running into problems of missing files so I couldn't complete it properly) but I have doubts that this would be effective in constructing a voice cloning model for English-Japanese

lapis sequoia Sep 21, 2021, 4:36 PM

#

dull turtle ```python ['Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,Buy/Sel...

is that the last output when it breaks?

dull turtle Sep 21, 2021, 4:36 PM

#

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")
    print(chunk.columns.tolist())
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

my code @lapis sequoia

burnt knot Sep 21, 2021, 4:36 PM

#

I don't know if there are things in the code I need to rewrite for this to work because I haven't figured that out
But I started conflating this with the other code and now I'm a mess

dull turtle Sep 21, 2021, 4:36 PM

#

lapis sequoia is that the last output when it breaks?

Chunk: 0
['Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,Buy/Sell,Buy Order Number,Sell Order Number,Price,Qty']

error:

Traceback (most recent call last):

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Transaction Time'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "F:\office codes\transaction time seprateed code.py", line 46, in <module>
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err

KeyError: 'Transaction Time'

lapis sequoia Sep 21, 2021, 4:40 PM

#

not sure what is wrong then 😓

dull turtle Sep 21, 2021, 4:41 PM

#

okay np thanks for your efforts

#

if anyone has solution for my issue please ping me

quasi parcel Sep 21, 2021, 4:42 PM

#

@lime hemlock do you need a separate data frame for the transaction time?

#

is the transaction time in epoch?

dull turtle Sep 21, 2021, 4:43 PM

#

quasi parcel is the transaction time is in epoch?

yes

#

my code here https://paste.pythondiscord.com/ixuvakohem.py @quasi parcel

quasi parcel Sep 21, 2021, 4:44 PM

#

have you tried chunk['Transaction Time']

dull turtle Sep 21, 2021, 4:44 PM

#

dull turtle see this data i have `Transaction Time` column

@quasi parcel my data here

dull turtle Sep 21, 2021, 4:45 PM

#

quasi parcel have you tried ```chunk['Transaction Time']```

yes but how i can append my data end of df

quasi parcel Sep 21, 2021, 4:45 PM

#

ohh wait

#

i have encounterd this

#

you can do this

dull turtle Sep 21, 2021, 4:45 PM

#

the output came from Transaction Time i have to add this in column at end of df

dull turtle Sep 21, 2021, 4:46 PM

#

quasi parcel you can do this

okay

quasi parcel Sep 21, 2021, 4:47 PM

#

chunk['Transcation time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transcation Time']]

#

this should do the trick

#

@dull turtle

dull turtle Sep 21, 2021, 4:47 PM

#

quasi parcel chunk['Transcation time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m...

what will this return?

coral isle Sep 21, 2021, 4:48 PM

#

Hello
I need help to delete cells in excel file and shift the rest to left such as
The photo

I will delete any cell before NR and shift left , if the row haven't NR should be deleted and shift up

#

quasi parcel Sep 21, 2021, 4:49 PM

#

dull turtle what this wiil return ?

this will create a new column in the existing df and convert the epoch to a date

dull turtle Sep 21, 2021, 4:49 PM

#

quasi parcel this will create a new colomn in the exsisting df and convert the epoch to date

let me try

dull turtle Sep 21, 2021, 4:51 PM

#

quasi parcel this will create a new colomn in the exsisting df and convert the epoch to date

Traceback (most recent call last):

  File "F:\office codes\transaction time.py", line 46, in <module>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]

  File "F:\office codes\transaction time.py", line 46, in <listcomp>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]

OSError: [Errno 22] Invalid argument

quasi parcel Sep 21, 2021, 4:52 PM

#

one moment

dull turtle Sep 21, 2021, 4:53 PM

#

sure, ping me

quasi parcel Sep 21, 2021, 4:53 PM

#

chunk['Transcation time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transcation Time']]

#

can you show me the code after adding this line please

#

?

#

os error generally occur when there is an invalid file path

dull turtle Sep 21, 2021, 4:54 PM

#

quasi parcel ?

https://paste.pythondiscord.com/kukicokoli.py plz check code here

quasi parcel Sep 21, 2021, 4:55 PM

#

https://paste.pythondiscord.com/jamufeziza.py

#

try this once

#

there will be some indentation issues, but

burnt knot Sep 21, 2021, 4:58 PM

#

I just hope my issue isn't terribly hard to address without testing out the program yourself
Which might be why I never get a response to my issue

quasi parcel Sep 21, 2021, 4:59 PM

#

@burnt knot what was your issue again can you please give the link

dull turtle Sep 21, 2021, 5:00 PM

#

quasi parcel ```chunk['Transcation time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d...

chunk no.: 0
Traceback (most recent call last):

  File "F:\office codes\transaction time.py", line 19, in <module>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]

  File "F:\office codes\transaction time.py", line 19, in <listcomp>
    chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]

OSError: [Errno 22] Invalid argument

dull turtle Sep 21, 2021, 5:01 PM

#

quasi parcel os error generally occur when there is an invalid file path

file path is correct
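
A possible cause of the OSError above, offered as a hedged sketch and not a confirmed diagnosis: the Transaction Time values in this data are around 1.31452E+18, and dividing that by 1000 still leaves a number far outside what datetime.fromtimestamp accepts, so the call fails with a platform-dependent error (OSError on Windows, ValueError or OverflowError elsewhere). The sketch assumes the values are nanosecond counts, which is how the pd.to_datetime call in desert oar's code treats plain numbers. The helper name ns_to_datetime and the sample value are illustrative only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ns_to_datetime(ns):
    """Convert a nanosecond count since 1970-01-01 to a naive datetime.

    Assumption: the raw value is in nanoseconds. Plain timedelta arithmetic
    avoids the OS-level limits that datetime.fromtimestamp runs into.
    """
    return EPOCH + timedelta(microseconds=int(ns) // 1000)


raw = 1.31452e18  # illustrative value, shaped like the data in this thread

# Treating raw / 1000 as seconds is far out of range, so this raises.
try:
    datetime.fromtimestamp(int(raw) / 1000)
    failed = False
except (OSError, OverflowError, ValueError):
    failed = True

converted = ns_to_datetime(raw)
shifted = converted + timedelta(days=365 * 10)  # same offset as desert oar's code
print(failed, converted.year, shifted.year)
```

If the unit assumption is wrong for a given file, only the divisor inside ns_to_datetime needs to change.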

quasi parcel Sep 21, 2021, 5:01 PM

#

okay,

#

is it okay to give 10 rows in .csv format

#

@dull turtle

dull turtle Sep 21, 2021, 5:02 PM

#

quasi parcel is it okay to give 10 rows in .csv format

sure

desert oar Sep 21, 2021, 5:03 PM

#

quasi parcel is it okay to give 10 rows in .csv format

this is what they gave me (tab-separated)

Msgtype    Activity Type    Transaction Time    Exchange    Token    Symbol    Buy/Sell    Buy Order Number    Sell Order Number    Price    Qty
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473430    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255595    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255600    500
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    607185    1175
Order    M    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255605    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    BUY    1.4E+15    0    255610    250
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473425    250
Order    M    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473420    250
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    607180    1175
Order    N    1.31452E+18    NSEFO    39203    BANKNIFTY 02SEP21 31500.00 CE    SELL    0    1.4E+15    473415    500

#

and this was the code i wrote, which worked

from datetime import timedelta

import pandas as pd


filename_in = "pd_chunks_1.csv"
filename_out = "pd_chunks_2.csv"

df_chunks = pd.read_csv(filename_in, sep='\t', chunksize=2, iterator=True)

for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")

    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)

    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(filename_out, header=write_header, mode=write_mode, index=False, sep='\t')

quasi parcel Sep 21, 2021, 5:04 PM

#

got it sir let me check

arctic wedgeBOT Sep 21, 2021, 5:05 PM

#

Hey @dull turtle!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

dull turtle Sep 21, 2021, 5:06 PM

#

quasi parcel is it okay to give 10 rows in .csv format

can i dm u my csv file, here i am not able to send the file

quasi parcel Sep 21, 2021, 5:06 PM

#

sure

dull turtle Sep 21, 2021, 5:07 PM

#

plz check

burnt knot Sep 21, 2021, 5:14 PM

#

quasi parcel @burnt knot what was your issue again can you please give the link

I was using a Github repository for weeks trying to turn an English-Russian voice cloning model into an English-Japanese one
I got stuck likely because my data was a mess
I was also recommended another Github repository but am doubtful about it because it seems attuned to one language rather than many, and I wouldn't know how to tweak it for more than one language

#

First link: https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/wiki/Training-(and-for-other-languages)

GitHub

Training (and for other languages) · vlomme/Multi-Tacotron-Voice-Cl...

Phoneme multilingual(Russian-English) voice cloning based on - Training (and for other languages) · vlomme/Multi-Tacotron-Voice-Cloning Wiki

#

Second link: https://github.com/deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning

GitHub

GitHub - deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning: ...

Tacotron 2 - PyTorch implementation with faster-than-realtime inference modified to enable cross lingual voice cloning. - GitHub - deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning: Tacotron...

coral isle Sep 21, 2021, 5:31 PM

#

coral isle Hello I need help to delete cells in excel file and shift the rest to left suc...

Any idea

quasi parcel Sep 21, 2021, 5:50 PM

#

@desert oar your code works like a charm, the only issue was that @dull turtle was using a comma-separated file but you used tab-separated

#

@dull turtle https://paste.pythondiscord.com/iriguhobut.py this should work

#

i have changed sep='\t' to sep=','

#

it started working
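
The symptom seen earlier in the thread, a column list holding a single string with commas inside it, is what a delimiter mismatch looks like. A minimal sketch of the effect using the standard-library csv module (chosen to keep the snippet dependency-free, it is not what the thread's code uses); the two-row sample and the variable names are made up for illustration:

```python
import csv
import io

# Hypothetical comma-separated sample, shaped like the header in the thread.
text = "Msgtype,Activity Type,Transaction Time\nOrder,N,1.31452E+18\n"

# Wrong delimiter: the whole header comes back as one field, commas included.
wrong_header = next(csv.reader(io.StringIO(text), delimiter="\t"))

# Right delimiter: the header splits into separate column names.
right_header = next(csv.reader(io.StringIO(text), delimiter=","))

print(wrong_header)
print(right_header)
print("Transaction Time" in wrong_header, "Transaction Time" in right_header)
```

pd.read_csv behaves analogously: with sep='\t' on a comma-separated file it produces one column whose name is the entire header line, which is why chunk["Transaction Time"] raised a KeyError.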

quasi parcel Sep 21, 2021, 6:08 PM

#

@burnt knot what is wrong in first repo

#

?

#

could you please elaborate on the error

burnt knot Sep 21, 2021, 6:09 PM

#

Okay
So the guy who made it listed the voice banks he used for his data for both English and Russian
I couldn't use his English voice bank because I had complications downloading it so I used my own, and I got my Japanese voice bank

#

The audio work starts in the encoder stage

#

It took me awhile to get anywhere with the audio but I eventually got it to train
But then I got to the synthesiser stage

#

I couldn't run the first command because it stated it needed a metadata.csv file, which I could not find
I take it that it's a graph with all the metadata or however that works
My point is I can't run the step without having the voice metadata file to give it

#

I don't know if it mattered on the kind of voices I was using--I'm using a new batch this time around

#

But I can't figure out how to work that synthesiser step

quasi parcel Sep 21, 2021, 6:13 PM

#

okay

#

let me run the first repo do you have your data?

#

@burnt knot

burnt knot Sep 21, 2021, 6:15 PM

#

You want me to share it?

quasi parcel Sep 21, 2021, 6:15 PM

#

yes

#

if you are okay with it

#

?

burnt knot Sep 21, 2021, 6:16 PM

#

https://drive.google.com/drive/folders/1ZqaNw3XezZrGkgmf9hRWbbatKHZDDMQV?usp=sharing

quasi parcel Sep 21, 2021, 6:19 PM

#

got it

#

it might take some time to debug your issue its bit complex

burnt knot Sep 21, 2021, 6:43 PM

#

Do you have the notebook too?

#

@quasi parcel

quasi parcel Sep 21, 2021, 6:48 PM

#

your data is downloading

#

its still loading

#

sorry my bad

#

net

burnt knot Sep 21, 2021, 6:54 PM

#

No I mean
Do you need the link to the Colab I used?
Or do you prefer to work on the hard drive?

modest timber Sep 21, 2021, 6:57 PM

#

Hi could I ask question with pandas dataframe?

#

Or should I wait, there is queue 🙂

burnt knot Sep 21, 2021, 7:03 PM

#

Ask away
There isn't a queue, you can expect a response eventually

quasi parcel Sep 21, 2021, 7:04 PM

#

burnt knot No I mean Do you need the link to the Colab I used? Or do you prefer to work on ...

anything is fine with me

burnt knot Sep 21, 2021, 7:05 PM

#

https://colab.research.google.com/drive/1Hao3P_DOoHUTKcYcNdmgm_Eq949fvlP3

Google Colaboratory

#

Here you go just in case

quasi parcel Sep 21, 2021, 7:05 PM

#

okay

#

my bad could you run colab again

#

?

#

sorry

modest timber Sep 21, 2021, 7:14 PM

#

hey, please help.. I have code that is adding two SMA to data (simple moving average to data)
Its job is to add 2 SMA columns on every loop iteration and then start the next iteration from the original data frame (at the beginning of the loop I do df=data for that). But for unknown reasons the columns keep accumulating every iteration, as if it were still working on the same old data.

#

import pandas as pd
def checkBestSMA(file, minSMA1, maxSMA1, minSMA2, maxSMA2):
    maxSMA1 += 1
    maxSMA2 += 1
    data = pd.read_csv(file)
    # read file .csv
    data = data.iloc[:, [0, 5]]
    # get only DATE and stock price
    for i in range(minSMA1, maxSMA1):
        for j in range(minSMA2, maxSMA2):
            df = data
            SMA1 = f"SMA{i}"
            SMA2 = f"SMA{j}"
            df[SMA1] = df['Adj Close'].rolling(i).mean()
            df[SMA2] = df['Adj Close'].rolling(j).mean()
            df = df.dropna()
            print(df.head())
if __name__ == '__main__':
    checkBestSMA('PKN.WA.csv', 20,50,40, 200)

#

          Date  Adj Close      SMA20      SMA40
39  2010-02-25  22.858307  23.167923  24.311018
40  2010-02-26  23.519203  23.131645  24.289287
41  2010-03-01  23.914301  23.126617  24.251393
42  2010-03-02  24.711685  23.133801  24.210088
43  2010-03-03  24.783520  23.132005  24.168782
          Date  Adj Close      SMA20      SMA40      SMA41
40  2010-02-26  23.519203  23.131645  24.289287  24.291705
41  2010-03-01  23.914301  23.126617  24.251393  24.280141
42  2010-03-02  24.711685  23.133801  24.210088  24.262620
43  2010-03-03  24.783520  23.132005  24.168782  24.224074
44  2010-03-04  24.855356  23.196657  24.130170  24.185527
          Date  Adj Close      SMA20      SMA40      SMA41      SMA42
41  2010-03-01  23.914301  23.126617  24.251393  24.280141  24.282719
42  2010-03-02  24.711685  23.133801  24.210088  24.262620  24.290416

burnt knot Sep 21, 2021, 7:15 PM

#

quasi parcel my bad could you run colab again

What do you mean? Do you need something?

quasi parcel Sep 21, 2021, 7:16 PM

#

no can you just run the code

#

it was asking auth

burnt knot Sep 21, 2021, 7:17 PM

#

Try now?

quasi parcel Sep 21, 2021, 7:40 PM

#

/content

#

@burnt knot

#

how do you want the data to look like @modest timber

modest timber Sep 21, 2021, 7:53 PM

#

      Date  Adj Close      SMA20      SMA40
39  2010-02-25  22.858307  23.167923  24.311018
40  2010-02-26  23.519203  23.131645  24.289287
          Date  Adj Close      SMA20   SMA41
40  2010-02-26  23.519203  23.131645  24.291705
41  2010-03-01  23.914301  23.126617  24.280141
          Date  Adj Close      SMA20   SMA42
40  2010-02-26  23.519203  23.131645  24.291705
41  2010-03-01  23.914301  23.126617  24.280141
......

#

@quasi parcel

#

I can send you .csv if you want

quasi parcel Sep 21, 2021, 8:00 PM

#

could you

modest timber Sep 21, 2021, 8:02 PM

#

For me, it is weird behavior of pandas.df

#

I wonder why this happen - ( or I have some bug)

quasi parcel Sep 21, 2021, 8:07 PM

#

can you share a screenshot on how you want the data to be

#

its kinda confusing sorry

modest timber Sep 21, 2021, 8:10 PM

#

ok, - you know its part of bigger project

lapis sequoia Sep 21, 2021, 8:11 PM

#

so right now you're basically adding a lot of columns right?

#

and what behaviour do you want?

modest timber Sep 21, 2021, 8:12 PM

#

No, this function should add only two columns, and then discard the dataframe

#

and start with a new one - so I am loading df=data

#

at the top of the block in the loop

lapis sequoia Sep 21, 2021, 8:13 PM

#

okay okay okay so you want to use new df everytime

modest timber Sep 21, 2021, 8:13 PM

#

y

#

🙂

lapis sequoia Sep 21, 2021, 8:13 PM

#

you may wanna use copy. df=data is the same as using data.

#

!d pandas.DataFrame.copy

arctic wedgeBOT Sep 21, 2021, 8:14 PM

#

pandas.DataFrame.copy


DataFrame.copy(deep=True)
Make a copy of this object’s indices and data.

When `deep=True` (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When `deep=False`, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

lapis sequoia Sep 21, 2021, 8:14 PM

#

try with df = data.copy()

modest timber Sep 21, 2021, 8:14 PM

#

yep

#

worked 😄

lapis sequoia Sep 21, 2021, 8:15 PM

#

great 😊 gni8

modest timber Sep 21, 2021, 8:15 PM

#

Could you shed some light on why that's happening?

lapis sequoia Sep 21, 2021, 8:16 PM

#

sure. in a nutshell df=data means they both are referring to same object.
it's same for lists, dicts.

modest timber Sep 21, 2021, 8:16 PM

#

aaaaah

#

k 😄

lapis sequoia Sep 21, 2021, 8:16 PM

#

while
a=5
b=a
a+=1
will only change a

it is not the same case with lists or dicts or np arrays or dfs: changing those in place shows up through both names

#

I'm on cellphone rn otherwise I'd give you an example.

quasi parcel Sep 21, 2021, 8:18 PM

#

correct @lapis sequoia sorry i was confused @modest timber

modest timber Sep 21, 2021, 8:18 PM

#

ok, np 😄

lapis sequoia Sep 21, 2021, 8:18 PM

#

but in a nutshell they both are referring same address so changing something at that place will be like changing for both.

this is like we both have address of same house
so if you put a glass in that house, we both will see a new glass.
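
A small sketch of the same point with a plain list, since no example made it into the thread (all names are illustrative). Assignment binds a second name to the same object, while .copy() creates a separate one. For nested containers a shallow copy still shares the inner objects, so this sketch sticks to a flat list.

```python
data = [1, 2, 3]

alias = data                 # same object, second name
alias.append(4)              # the mutation is visible through both names

independent = data.copy()    # new list holding the same items
independent.append(5)        # does not touch data

a = 5
b = a
a += 1                       # rebinds a to a new int; b still refers to 5

print(alias is data, data)
print(independent is data, independent)
print(a, b)
```

In the SMA loop, df = data meant every iteration added columns to the one shared frame; df = data.copy() gives each iteration its own frame to modify.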

modest timber Sep 21, 2021, 8:19 PM

#

@lapis sequoia ok i pretty much understand - it's like a pointer in C 😄

lapis sequoia Sep 21, 2021, 8:19 PM

#

yeah exactly! it's nothing but pointer!!

modest timber Sep 21, 2021, 8:20 PM

#

this teacher I think dont explain to us 😄

#

Ok, thanks again, have a nice day 🙂

lapis sequoia Sep 21, 2021, 8:21 PM

#

you too😊

burnt knot Sep 21, 2021, 9:05 PM

#

quasi parcel @burnt knot

I think you need to reroute the folder
Did you clone it into your drive?

#

Have you used Colab before?

serene scaffold Sep 21, 2021, 9:17 PM

#

For work I need to create something that, once complete, can be put on a flash drive and run on a computer that doesn't have internet access. So part of that means making sure that the finished product doesn't make API calls, but it also means being able to either (a) download all the dependencies before building the environment or (b) building a portable environment. I'm not really sure how to do either because I always just make a venv and pip install the requirements.

desert oar Sep 21, 2021, 9:17 PM

#

serene scaffold For work I need to create something that, once complete, can be put on a flash d...

pyinstaller or nuitka maybe

#

pyinstaller i think just bundles the entire runtime and all your deps into a single .exe

serene scaffold Sep 21, 2021, 9:18 PM

#

desert oar pyinstaller i think just bundles the entire runtime and all your deps into a sin...

I don't want that. I should have been more specific: it's going to be a jupyter notebook

#

(not my decision)

desert oar Sep 21, 2021, 9:18 PM

#

what exactly are the components involved?

#

and what's the target system?

serene scaffold Sep 21, 2021, 9:19 PM

#

I'm waiting to hear back from the project lead about all of that. I guess, could one build the wheel on a VM that's the same OS? Are wheels portable?

desert oar Sep 21, 2021, 9:21 PM

#

you can build a noarch wheel, but it won't help with deps

#

does your usb stick need to include python, jupyter, etc.? or just the deps for your particular notebook?

#

pyinstaller might still be the way to go, since jupyter is a python application after all

serene scaffold Sep 21, 2021, 9:22 PM

#

desert oar does your usb stick need to include python, jupyter, etc.? or just the deps for ...

probably just the deps

desert oar Sep 21, 2021, 9:40 PM

#

serene scaffold probably just the deps

so the target system will have python, ipykernel, and jupyter, but maybe not torch, pandas, matplotlib, etc?

#

you might want to ask in #tools-and-devops tbh, this sounds not-data-science-specific

serene scaffold Sep 21, 2021, 9:57 PM

#

desert oar you might want to ask in #tools-and-devops tbh, this sounds not-data-science...

it's data science if I'm the one doing it lemon_angrysad

#

But I guess I need to figure out more about the requirements first

desert oar Sep 21, 2021, 10:13 PM

#

yeah it sounds like it

chilly mica Sep 21, 2021, 11:11 PM

#

Hey everyone, I'm looking for help with geopandas, json, and folium. Is anyone able to help me on a project?

serene scaffold Sep 22, 2021, 12:04 AM

#

chilly mica Hey everyone, I'm looking for help with geopandas, json, and folium. Is anyone a...

It's best to just put your question out there. Then someone can try to answer it.

quasi parcel Sep 22, 2021, 4:31 AM

#

burnt knot Have you used Colab before?

Yes

quasi parcel Sep 22, 2021, 4:31 AM

#

burnt knot I think you need to reroute the folder Did you clone it into your drive?

Yes I did

lapis sequoia Sep 22, 2021, 4:56 AM

#

quasi parcel Yes I did

you can kinda play around with !ls and %cd to get the exact path (a plain !cd runs in its own subshell, so the change doesn't stick for later cells). once you do that you can use !pwd to get the absolute path, which will help for next time.

steel trellis Sep 22, 2021, 6:14 AM

#

any data scientist here?

royal crest Sep 22, 2021, 6:35 AM

#

i employ a lot of data science techniques in my work but my job description isn't data scientist

brave umbra Sep 22, 2021, 8:18 AM

#

Yoo guys I have been learning python for about 6 weeks, made some guis with tkinter and used mysql databases to write data etc, you think I can move onto AI to do stuff like facial detection or is this too advanced for me and I should wait and learn more before doing this?

tender hearth Sep 22, 2021, 8:27 AM

#

brave umbra Yoo guys I have been learning python for about 6 weeks, made some guis with tkin...

What's important is your proficiency in mathematics... the programming is just to implement an idea, and most data science / AI programming is procedural anyway

brave umbra Sep 22, 2021, 8:32 AM

#

tender hearth What's important is your proficiency in mathematics... the programming is just t...

What standard should my maths be at?

tender hearth Sep 22, 2021, 8:33 AM

#

Ideal would be 10th-11th grade

#

Calculus and linear algebra

brave umbra Sep 22, 2021, 8:34 AM

#

@tender hearth So i'd need to know that to do AI?

tender hearth Sep 22, 2021, 8:35 AM

#

You'd need to know that to understand how it works behind the scenes

#

even though libraries abstract a lot of it away from you it's still essential when gluing stuff together

brave umbra Sep 22, 2021, 8:41 AM

#

damn, maths isn't my strong suit

tender hearth Sep 22, 2021, 8:46 AM

#

brave umbra damn, maths isn't my strong suit

It's alright, that's what we are here for 😆

#

If you have any questions feel free to ask

brave umbra Sep 22, 2021, 10:04 AM

#

thank you

gaunt marsh Sep 22, 2021, 10:15 AM

#

My matplotlib pie chart only shows one color

pure gull Sep 22, 2021, 10:23 AM

#

Hi, how on (or off!) earth do I get cuda/torch to play together on wintendo 10?

gaunt marsh Sep 22, 2021, 10:31 AM

#

can anyone help me with this?

pure gull Sep 22, 2021, 10:31 AM

#

no code, no dice

gaunt marsh Sep 22, 2021, 10:32 AM

#



rgb_df = pd.DataFrame(ara, columns=list('rgb'), dtype=float)


def format_rgb(rgb):
    # precision format for Float numbers to a string
    return '(' + ', '.join(format(val, '06.2f') for val in rgb) + ')'


rgb_labels = rgb_df.apply(format_rgb, axis=1)

rgb_counts = rgb_labels.value_counts()
print(rgb_counts[0])
rgb_10 = []
hex_10 = []
for k in range(len(rgb_counts)):
    if rgb_counts[k] >= 3:
        rgb_10.append(rgb_counts[k])
        hex_10.append(real_col[k])


def pct_to_absolute(value):
    a = rgb_10
    return a

plt.figure(figsize=(8, 6))
plt.pie(rgb_10, labels=hex_10, colors=hex_10, normalize=True, autopct=pct_to_absolute)
plt.show()

pure gull Sep 22, 2021, 10:34 AM

#

gaunt marsh ```rgb_df = pd.DataFrame(ara, columns=list('rgb'), dtype=float) def format_rgb...

1. use a pastebin, gist, ...
2. NameError: name 'ara' is not defined

gaunt marsh Sep 22, 2021, 10:35 AM

#

pure gull 1. use a pastebin, gist, ... 2. NameError: name 'ara' is not defined

sorry, next time pastebin... ara is a np.array, which looks like this:

 [218.69241573 215.50561798 193.4508427 ]
 [252.28142803 250.81048234 232.56323585]
 ...
 [188.23099415 177.65497076 155.31067251]
 [151.82307692 144.07802198 128.6978022 ]
 [ 70.58236659  68.68097448  62.73085847]]

pure gull Sep 22, 2021, 10:40 AM

#

gaunt marsh sorry, next time pastebin... ara is a np.array, which looks like this: ```[[237...

it's good practice to prepare a standalone, runnable, minimal example before asking. This lets you figure out the problem more easily yourself, and people trying to help can do just that, instead of debugging how to set up the environment details etc

#

I can't run this without e.g. adding example data as ara

rough mountain Sep 22, 2021, 11:32 AM

#

Could you guys help me collect some data for a data science project? https://forms.gle/TgLm8N5HwkFfMPbf7


gaunt marsh Sep 22, 2021, 11:33 AM

#

How can I print all Values in the console without these ..?

Bildschirmfoto_2021-09-22_um_13.26.55.png

earnest shuttle Sep 22, 2021, 11:53 AM

#

Hi guys! I am stuck in an ML project... anyone that can guide me through?

prime hearth Sep 22, 2021, 11:58 AM

#

@earnest shuttle

#

Just post here

earnest shuttle Sep 22, 2021, 11:59 AM

#

I'm actually stuck in an assignment for kNN
Have it due in some days
But I am not able to resize the data
It's a number recognition file in a .mat format and it includes 8s and 9s
I need to plot testing error rate vs k

#

Please dont judge, im new to ML

#

from scipy.io import loadmat
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np
import seaborn as sbn
from sklearn.neighbors import KNeighborsClassifier
dataset = loadmat(r"C:\Users\Kushal\Downloads\NumberRecognition.mat")
dataset.keys()
zeros = dataset["imageArrayTraining0"]
eights = dataset["imageArrayTraining8"]
nines = dataset["imageArrayTraining9"]
img = nines[:,:,1]
#plt.imshow(img, cmap="Greys")

#creating labels
y_eights = np.zeros(eights.shape[-1])
y_nines = np.ones(nines.shape[-1])
np.concatenate([eights,nines], axis = 2).shape
np.transpose([eights,nines]).shape

#

Need to reshape the data next and then get X and y

#

after that need to train and predict the model

#

But stuck here as i dont know what to do next

prime hearth Sep 22, 2021, 12:20 PM

#

what do you need to reshape it to?

#

also, np.concatenate returns a new array

#

if you want to use the new, modified array, you need to assign it to a variable

#

otherwise np.concatenate does nothing useful, since its result is not being used

#

can also find lots of tutorials on reshaping
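A minimal sketch of the point about assignment, using small random stand-in arrays instead of the real `.mat` data (the `(28, 28, n_images)` layout and the names `images` and `labels` are assumptions): `np.concatenate` leaves its inputs untouched and returns a new array, so the result has to be bound to a name.

```python
import numpy as np

# Stand-in data; assumed layout is (height, width, n_images).
rng = np.random.default_rng(0)
eights = rng.random((28, 28, 3))
nines = rng.random((28, 28, 4))

# Calling concatenate without keeping the result changes nothing.
np.concatenate([eights, nines], axis=2)
print(eights.shape)  # still (28, 28, 3)

# Bind the returned array to a name to actually use it.
images = np.concatenate([eights, nines], axis=2)
labels = np.concatenate([np.zeros(eights.shape[-1]), np.ones(nines.shape[-1])])
print(images.shape, labels.shape)
```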

shrewd fog Sep 22, 2021, 12:35 PM

#

How can I fix Google Colab being unavailable?

#

Can anyone recommend another database to me?

desert oar Sep 22, 2021, 12:50 PM

#

gaunt marsh How can I print all Values in the console without these ..?

Nobody has any context for this, I know what it is because I wrote the code that created it, but you will need to be more specific when asking this kind of question in general

#

@gaunt marsh that's a pandas Series you are looking at

gaunt marsh Sep 22, 2021, 12:53 PM

#

desert oar @gaunt marsh that's a pandas Series you are looking at

Yes, but shouldn‘t print() give out all values?

serene scaffold Sep 22, 2021, 1:10 PM

#

@gaunt marsh numpy and pandas data structures will truncate what gets displayed, but one can override this

#

The reason being that the size of the structures can be exceptionally large

desert oar Sep 22, 2021, 1:23 PM

#

gaunt marsh Yes, but shouldn‘t print() give out all values?

Try setting pd.options.display.max_rows to None, or looping over idx, val in series.items() and printing each accordingly

tender hearth Sep 22, 2021, 1:24 PM

#

earnest shuttle from scipy.io import loadmat import matplotlib.pyplot as plt from pathlib import...

!code

arctic wedgeBOT Sep 22, 2021, 1:24 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

tender hearth Sep 22, 2021, 1:24 PM

#

I wonder why @arctic wedge didn't do its thing and tell you to put it in a codeblock already

gaunt marsh Sep 22, 2021, 1:27 PM

#

desert oar Try setting `pd.options.display.max_rows` to `None`, or looping over `idx, val i...

thanks!

serene scaffold Sep 22, 2021, 1:27 PM

#

desert oar Try setting `pd.options.display.max_rows` to `None`, or looping over `idx, val i...

!docs pandas.Series.to_string

arctic wedgeBOT Sep 22, 2021, 1:27 PM

#

pandas.Series.to_string

```py
Series.to_string(buf=None, na_rep='NaN', float_format=None, header=True, index=True, length=False, dtype=False, name=False, max_rows=None, min_rows=None)
```
Render a string representation of the Series.
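A short sketch of the two suggestions for printing a Series without the `...` truncation, assuming a made-up Series called `counts` that is long enough to be truncated by default:

```python
import pandas as pd

# Stand-in Series, long enough that the default repr truncates it with "...".
counts = pd.Series(range(100), name="count")

# Option 1: render every row explicitly.
full_text = counts.to_string()
print(full_text)

# Option 2: lift the row limit, but only inside this block.
with pd.option_context("display.max_rows", None):
    print(counts)
```

`pd.option_context` restores the previous display setting when the block ends, which is often preferable to changing `pd.options.display.max_rows` globally.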

gaunt marsh Sep 22, 2021, 1:27 PM

#

I have another question. Is it possible, to write Python code which does something like: every 10th textfile should go in directory x

serene scaffold Sep 22, 2021, 1:28 PM

#

I would probably use that

gaunt marsh Sep 22, 2021, 1:28 PM

#

that means that the next 10 files should go in directory y and so on

serene scaffold Sep 22, 2021, 1:28 PM

#

gaunt marsh I have another question. Is it possible, to write Python code which does somethi...

It is!

#

you can use os.walk, I guess

#

I usually use pathlib

gaunt marsh Sep 22, 2021, 1:31 PM

#

I guess that I need two loops. One which goes through all dirs and one which goes through all files, which shall be moved, right?

serene scaffold Sep 22, 2021, 1:32 PM

#

gaunt marsh I guess that I need two loops. One which goes through all dirs and one which goe...

you can make a flat list of all the files and then use chunked https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked

#

you'd have to pip install more-itertools but that's easy

earnest shuttle Sep 22, 2021, 1:33 PM

#

prime hearth what do need to rehsape it to?

A 2d matrix. Right now it's a 3d matrix and i want to reshape it to a 2d matrix

serene scaffold Sep 22, 2021, 1:34 PM

#

earnest shuttle A 2d matrix so its right now its a 3d matrix and i want to reshape it to a 2d m...

you have to be careful about how you reshape things so that you don't mess up the association between axes

earnest shuttle Sep 22, 2021, 1:34 PM

#

Lol and I dont know what to do after reshaping

serene scaffold Sep 22, 2021, 1:34 PM

#

what is the current shape and what is the desired shape?

desert oar Sep 22, 2021, 1:34 PM

#

gaunt marsh I have another question. Is it possible, to write Python code which does somethi...

Use a single loop with enumerate checking if the index is % 10

serene scaffold Sep 22, 2021, 1:35 PM

#

desert oar Use a single loop with `enumerate` checking if the index is `% 10`

I considered that solution as well. they'd probably need another iterator of directory names that they could advance with next

desert oar Sep 22, 2021, 1:37 PM

#

oh, i misunderstood the original question

#

@gaunt marsh note that this isn't a #data-science-and-ml question, and you can/should feel comfortable asking in a help channel, see #❓｜how-to-get-help

#

import os

dirs = ['a', 'b', 'c']
files = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

iter_dirs = iter(dirs)
for i, f in enumerate(files):
    if i % 3 == 0:
        d = next(iter_dirs)
    os.rename(f, os.path.join(d, f))

#

something like that

#

normally i'd recommend using pathlib.Path.glob in real life. maybe also you want the directory list to cycle forever?

from itertools import cycle
from pathlib import Path

dirs = [Path('a'), Path('b'), Path('c')]
paths = Path('source-files').glob('*.txt')

for d, p in zip(cycle(dirs), paths):
    p.rename(d / p.name)

gaunt marsh Sep 22, 2021, 1:46 PM

#

desert oar normally i'd recommend using `pathlib.Path.glob` in real life. maybe also you wa...

What I want to do is split all my 13,754 text files across subfolders "folder01, folder02 ...". A new folder should start every 600 files

#

The folders are already created via cat folder.txt | xargs mkdir

#

folder.txt contains the names of the folders
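One way this could look, as a sketch only: the function name `distribute`, the `*.txt` pattern and the sorted ordering are assumptions, and the target folders are expected to exist already. Integer division of the file's position by 600 picks the folder, so no second loop or manual counter is needed.

```python
from pathlib import Path
import shutil


def distribute(source_dir, target_dirs, per_folder=600, pattern="*.txt"):
    """Move files matching `pattern` into `target_dirs`, `per_folder` at a time."""
    files = sorted(Path(source_dir).glob(pattern))  # sorted for a stable order
    needed = -(-len(files) // per_folder)           # ceiling division
    if needed > len(target_dirs):
        raise ValueError(f"need {needed} folders, got {len(target_dirs)}")
    moved = []
    for i, path in enumerate(files):
        # Files 0..599 go to folder 0, files 600..1199 to folder 1, and so on.
        dest = Path(target_dirs[i // per_folder]) / path.name
        shutil.move(str(path), str(dest))
        moved.append(dest)
    return moved
```

With 13,754 files and 600 per folder this needs 23 folders. The folder names could be read with `Path("folder.txt").read_text().split()`, assuming none of the names contain whitespace.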

pure gull Sep 22, 2021, 1:53 PM

#

@gaunt marsh might want to have a look at zmv if you're on mac/linux; it is great for stuff like this

#

do try with zmv -n to see what will be the result

earnest shuttle Sep 22, 2021, 1:56 PM

#

serene scaffold what is the current shape and what is the desired shape?

current shape is (750, 28, 28,) desired is (1500, 768)

#

if im transposing, it becomes 750,28,28,2 idk why

dull turtle Sep 22, 2021, 1:59 PM

#

i am getting this error
```python
Traceback (most recent call last):
  File "F:\office codes\resampling code.py", line 19, in <module>
    chunk['Price'].resample('1T').sum()
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 957, in f
    return self._downsample(_method, min_count=min_count)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1053, in _downsample
    self._set_binner()
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 195, in _set_binner
    self.binner, self.grouper = self._get_binner()
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 202, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1042, in _get_binner_for_time
    return self.groupby._get_time_bins(self.ax)
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1499, in _get_time_bins
    first, last = _get_timestamp_range_edges(
  File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1746, in _get_timestamp_range_edges
    index_tz = first.tz
AttributeError: 'NaTType' object has no attribute 'tz'
```

#

my code
```python
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}', engine='python', chunksize=100000, iterator=True)):
    chunk['Transaction Time'] = pd.to_datetime(chunk['Transaction Time'], errors='coerce')
    chunk = pd.DataFrame(chunk).set_index('Transaction Time')
    chunk['Price'].resample('1T').sum()
    print('chunk..')
    print(chunk)
    print()
```

pure gull Sep 22, 2021, 2:03 PM

#

dull turtle i am getting this error ```python Traceback (most recent call last): File "F:...

looks to me like you have mixed a non-timestamp value into a dataframe where you're acting on it like it is a timestamp
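A hedged sketch of one possible fix, assuming the error really does come from timestamps that failed to parse: with `errors='coerce'`, unparseable values become `NaT`, and those rows can be dropped before the column becomes the index. The data below is made up; only the column names follow the question.

```python
import pandas as pd

# Stand-in chunk; the values are invented for illustration.
chunk = pd.DataFrame(
    {
        "Transaction Time": [
            "2011-08-28 09:15:02",
            "not a timestamp",  # becomes NaT under errors="coerce"
            "2011-08-28 09:15:40",
            "2011-08-28 09:16:05",
        ],
        "Price": [100, 200, 300, 400],
    }
)

chunk["Transaction Time"] = pd.to_datetime(chunk["Transaction Time"], errors="coerce")

# Drop rows whose timestamp could not be parsed before they reach the index.
chunk = chunk.dropna(subset=["Transaction Time"]).set_index("Transaction Time")

# resample() returns a new object, so keep the result.
per_minute = chunk["Price"].resample("1min").sum()
print(per_minute)
```

The sketch uses the `"1min"` alias, which means the same as `'1T'` but is not deprecated in newer pandas releases.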

dull turtle Sep 22, 2021, 2:04 PM

#

pure gull looks to me like you have mixed a non-timestamp value into a dataframe where you...

see
```python
chunk..
                              Msgtype Activity Type  ...   Price   Qty
Transaction Time                                     ...
2011-08-28 09:15:02.006138097   Order             N  ...  473430  1175
2011-08-28 09:15:02.707897899   Order             N  ...  255595  1175
2011-08-28 09:15:03.373856246   Order             N  ...  255600   500
2011-08-28 09:15:04.159525439   Order             N  ...  607185  1175
2011-08-28 09:15:05.213452151   Order             M  ...  255605  1175
...                               ...           ...  ...     ...   ...
2011-08-28 09:15:14.243840962   Order             M  ...   21815    50
2011-08-28 09:15:14.243913774   Order             M  ...   21820    25
2011-08-28 09:15:14.243978819   Order             M  ...   21825    50
2011-08-28 09:15:14.244046696   Order             M  ...   21830    25
2011-08-28 09:15:14.244110503   Order             M  ...   21835    50
```
my dataframe

desert oar Sep 22, 2021, 2:07 PM

#

is this yet another person working on the same program?

dull turtle Sep 22, 2021, 2:07 PM

#

no i am only

desert oar Sep 22, 2021, 2:07 PM

#

or did you change your username @dull turtle ? are you daniel?

dull turtle Sep 22, 2021, 2:07 PM

#

desert oar or did you change your username @dull turtle ? are you daniel?

yes i am

desert oar Sep 22, 2021, 2:07 PM

#

ok, thanks. sorry

dull turtle Sep 22, 2021, 2:08 PM

#

dull turtle i am getting this error ```python Traceback (most recent call last): File "F:...

now i am working on this issue

serene scaffold Sep 22, 2021, 2:19 PM

#

earnest shuttle current shape is (750, 28, 28,) desired is (1500, 768)

!e Whatever you reshape an array to, the product of the two shapes has to be the same.

from math import prod
print(prod((750, 28, 28)) == prod((1500, 768)))

arctic wedgeBOT Sep 22, 2021, 2:19 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

False
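The check comes out `False` because 750 * 28 * 28 is 588,000 while 1500 * 768 is 1,152,000: a single array holds only 750 images, and 28 * 28 is 784, not 768. A sketch of a reshape that does work, assuming each array is laid out as `(28, 28, 750)` (which matches the `shape[-1]` and `axis = 2` usage in the posted code) and using placeholder data:

```python
import numpy as np

# Stand-in arrays; the real ones come from the .mat file.
# The (28, 28, n_images) layout is an assumption.
eights = np.zeros((28, 28, 750))
nines = np.ones((28, 28, 750))

images = np.concatenate([eights, nines], axis=2)   # (28, 28, 1500)
images = np.moveaxis(images, -1, 0)                # (1500, 28, 28)
X = images.reshape(images.shape[0], -1)            # (1500, 784)
y = np.concatenate([np.zeros(750), np.ones(750)])  # 0 = eight, 1 = nine

print(X.shape, y.shape)
```

Moving the image axis to the front before flattening keeps each row of `X` tied to exactly one image, which is the axis association to be careful about.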

gaunt marsh Sep 22, 2021, 3:13 PM

#

How can I iterate through a numpy array? I am getting TypeError: 'numpy.int64' object is not iterable

reef wing Sep 22, 2021, 3:16 PM

#

Very good question baby

#

You aren't iterating over an array to begin with; you have a scalar value of an int NOT an array of int

#

TLDR: Whatever is in that value isnt an array. @gaunt marsh and PS: baby freeza has horns I think; your profile picture is a lie!
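A small sketch of how this error typically arises, using the counts from this conversation as example data: indexing an array with a single integer gives back a scalar, and a scalar cannot be looped over.

```python
import numpy as np

counts = np.array([11, 7, 7, 7, 6, 6, 5, 5])

# Iterating over the array itself works.
for value in counts:
    print(value)

# Indexing with a single integer yields a NumPy scalar, not an array.
single = counts[0]
try:
    for value in single:
        print(value)
except TypeError as exc:
    print("TypeError:", exc)

# np.atleast_1d wraps a scalar in a one-element array if a loop must accept both.
for value in np.atleast_1d(single):
    print(value)
```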

gaunt marsh Sep 22, 2021, 3:29 PM

#

reef wing TLDR: Whatever is in that value isnt an array. @gaunt marsh and PS: ba...

thank you that helped me!

#

I have one last question. This is my Array: [11, 7, 7, 7, 6, 6, 5, 5]
I give this to a pie chart and it prints this:

#

This is wrong because every value in that array belongs to one part of the pie

#

instead it prints all values to all parts

#

Do you have any Idea how to make it plot the right value to each part?

#

@reef wing this is my favorite xbox 360 avatar 😛

reef wing Sep 22, 2021, 3:45 PM

#

So your array is good now; BUT your plotting method is bad; the only way for me to fix it is to see the plotting code.

rigid zodiac Sep 22, 2021, 3:47 PM

#

quick question how do you split the file inside the folder into test-train and validation

reef wing Sep 22, 2021, 3:53 PM

#

you don't split the file normally, you split the data into two separate variables

#

you can split the file also; but that would require reading in one file and creating two separate files

rigid zodiac Sep 22, 2021, 3:54 PM

#

reef wing you don't split the file normally, you split the data into two separate variable...

I tried to reason with my Senior guy, he said I can do it and he show me the folder that he did

#

it was in npy file

reef wing Sep 22, 2021, 3:54 PM

#

yeah if they are paying you to split a file then split the dang file lol

rigid zodiac Sep 22, 2021, 3:54 PM

#

manually?

reef wing Sep 22, 2021, 3:54 PM

#

I recommend reading it with the numpy module np.load()

gaunt marsh Sep 22, 2021, 3:55 PM

#

reef wing So your array is good now; BUT your plotting method is bad; the only way for me ...

https://pastebin.com/gN2HiiNv


#

rgb_10 outputs [11, 7, 7, 7, 6, 6, 5, 5]

reef wing Sep 22, 2021, 3:55 PM

#

I think the common way to save or write to a file is np.save("filename.npy")

#

so

data = np.load("input.npy")

train, test = some_split_logic_you_invent(data)

with open('train.npy', 'wb') as f:
    np.save(f, train)

with open('test.npy', 'wb') as f:
    np.save(f, test)

#

I did 50% of the work there you just need to make that function to split and return two numpy arrays

#

woops I was wrong: https://numpy.org/doc/stable/reference/generated/numpy.save.html

#

sklearn has a train test split function

#

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
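A possible shape for the split function left open above, as a NumPy-only sketch. The name `split_rows`, the 70/15/15 fractions and the fixed seed are assumptions, and it produces a validation part as well, since the question asked for one.

```python
import numpy as np


def split_rows(data, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle rows and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))        # shuffled row indices
    n_train = round(fractions[0] * len(data))
    n_val = round(fractions[1] * len(data))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]        # whatever is left over
    return data[train_idx], data[val_idx], data[test_idx]


data = np.arange(200).reshape(100, 2)  # stand-in for np.load("input.npy")
train, val, test = split_rows(data)
print(train.shape, val.shape, test.shape)
```

Each part can then be written with `np.save`, one file per part. `train_test_split` from scikit-learn does the two-way version of this and can be applied twice to get a validation set.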

frigid elk Sep 22, 2021, 4:00 PM

#

having issues loading an xgboost booster model saved from R into python and getting consistent predictions. .. any caveats i might need to take into consideration?

reef wing Sep 22, 2021, 4:00 PM

#

Give me a second as my current virtualenv doesnt have matplotlib to figure it out @gaunt marsh

#

AH! I had that problem last year DRE

#

That's exactly why we had to move back to SKLEARN

#

are you storing into a pickle file as I did?

#

https://xgboost.readthedocs.io/en/latest/faq.html

Slightly different result between runs
This could happen, due to non-determinism in floating point summation order and multi-threading. Though the general accuracy will usually remain the same.

#

This can happen even if you don't store the file but pickles definitely exacerbated the problem

frigid elk Sep 22, 2021, 4:02 PM

#

no, .. saved from R. .. xgb.save('model.model') ... then loading in python

reef wing Sep 22, 2021, 4:02 PM

#

Yeah; i guess storing it as an object regardless of the method right now causes this to happen

#

the seed should stay the same within the same run; it's just that it seems to be set differently when running outside the same script instance

frigid elk Sep 22, 2021, 4:03 PM

#

so no fix? i need to continue using our model in R?

reef wing Sep 22, 2021, 4:03 PM

#

It may be different even if you run it again in R

frigid elk Sep 22, 2021, 4:03 PM

#

ah

reef wing Sep 22, 2021, 4:03 PM

#

the problem is if you get a consistent model like the Gradient booster in SKLEARN its like 10X slower

frigid elk Sep 22, 2021, 4:03 PM

#

no, i set the seed in r (23) and am able to get consistent predictions

reef wing Sep 22, 2021, 4:04 PM

#

Nice perfect; just double test it with that config of seed to make sure though;

frigid elk Sep 22, 2021, 4:04 PM

#

i tried exporting the config from R as well and loading that in python while loading the model, still crappy predictions

reef wing Sep 22, 2021, 4:05 PM

#

I had that problem and I actually had misinterpreted the output

#

sometimes the output array is flipped

frigid elk Sep 22, 2021, 4:05 PM

#

i'm good with r, but would prefer python due to other requirements. .. namely automation and shap

#

really? flipped? ... i'll check on that

reef wing Sep 22, 2021, 4:06 PM

#

yeah lmao I was classifying my customers as Assasins instead of Enjoying my product on text classification for like a whole day; did not end well. Hence the support team contacting people that actually loved the product lol

frigid elk Sep 22, 2021, 4:06 PM

#

lol

reef wing Sep 22, 2021, 4:06 PM

#

Thank god I was the only tech person that understood what was going on and I didn't get in trouble

frigid elk Sep 22, 2021, 4:07 PM

#

i'm working on attrition, but yeah.. flipped results will definitely be noticed here

gaunt marsh Sep 22, 2021, 4:07 PM

#

reef wing Give me a second as my current virtualenv doesnt have matplotlib to figure it ou...

ok 🙂 I use virtualenv's as well when I code with PyCharm

reef wing Sep 22, 2021, 4:08 PM

#

yeah the only way I did it is by using plt.pie(x=data_array, labels=label_array)

#

where data_array elements and label_array elements are uni-dimensional and each element corresponds with each other

#

that method wont show the numbers inside the piechart; but has correct labels

gaunt marsh Sep 22, 2021, 4:13 PM

#

reef wing that method wont show the numbers inside the piechart; but has correct labels

It doesn't matter if the correct value for each slice is in the pie or as a label. I still didn't get how to have the correct value for each part of the pie and not the whole array...

#

the data_array is rgb_10 for me which is [11, 7, 7, 7, 6, 6, 5, 5]

reef wing Sep 22, 2021, 4:21 PM

#

Where does real_col come from?

#

also people will nag at you about your loop when you're coding for money

#


rgb_10 = []
hex_10 = []
for k in range(len(rgb_counts)):
    if rgb_counts[k] >= 5:
        rgb_10.append(rgb_counts[k])
        hex_10.append(real_col[k])


rgb_10 = []
hex_10 = []
for count, col in zip(rgb_counts, real_col):
    if count >= 5:
        rgb_10.append(count)
        hex_10.append(col)

frigid elk Sep 22, 2021, 4:22 PM

#

@reef wing no go on the flipped output array. ... is there somewhere i should be setting the seed outside of np.random.seed? does the model not save that as well?

reef wing Sep 22, 2021, 4:25 PM

#

Tbh I think it's what I stated before; when saving the object into your file system it kinda deletes the local variables in the methods of the XGBoost object when you load it

#

regardless of the language you use to boot up the model

#

But it was a year ago

frigid elk Sep 22, 2021, 4:27 PM

#

bummer, did you find out how to port a specific model over? ... i tried pickling via reticulate in r but get a 'cannot pickle pycapsule object' error

#

doesn't look like dill supports that either

reef wing Sep 22, 2021, 4:32 PM

#

yeah for models that had to share with other people I used sklearn and it was consistent

#

for my own models I winged it and used xgboost

frigid elk Sep 22, 2021, 4:34 PM

#

so sklearn implementation of xgboost is less accurate?

reef wing Sep 22, 2021, 4:35 PM

#

sklearn implementation is more consistent when storing the model into hard disk; but slower in training stage

frigid elk Sep 22, 2021, 4:35 PM

#

gotcha

#

is it worth training in xgboost native and pickling to sklearn to save and distribute?

reef wing Sep 22, 2021, 4:37 PM

#

I don't think there is a transformer for XGboost to sklearn model tbh

#

Not 100% certain

frigid elk Sep 22, 2021, 4:38 PM

#

oh, .. my mistake. .. was thinking the XGBRegressor could use booster

reef wing Sep 22, 2021, 4:38 PM

#

Depends on what you are talking about; but I think the "boost" is using stochastic methods instead of deterministic methods to train

#

hence the randomness; but that randomness boosts the speed of training

#

@gaunt marsh

#

Did u fix ur problem freeza?

#

autopct is what helps you label inside the pie
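A sketch of why every slice showed the whole list, and one way around it. When `autopct` is a function, matplotlib calls it once per wedge with that wedge's percentage and uses the returned text as the label, so a function that returns all of `rgb_10` prints the full list everywhere. The helper name `make_autopct` is made up for this example.

```python
def make_autopct(values):
    """Build an autopct callback that labels each wedge with its absolute value."""
    total = sum(values)

    def autopct(pct):
        # pct is the wedge's share of the whole pie, in percent (0..100).
        return str(round(pct * total / 100))

    return autopct


rgb_10 = [11, 7, 7, 7, 6, 6, 5, 5]
label_for = make_autopct(rgb_10)

# Simulate the calls matplotlib would make, one per wedge.
print([label_for(100 * v / sum(rgb_10)) for v in rgb_10])
```

In the plotting code this would be passed as `autopct=make_autopct(rgb_10)` in the `plt.pie(...)` call.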

slim drift Sep 22, 2021, 4:57 PM

#

desert oar you asked how to find the overlaps between two lists of time spans, i am showing...

hey i made some code logic to go in the def you gave me, can you check it?

#

!code

arctic wedgeBOT Sep 22, 2021, 4:57 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

slim drift Sep 22, 2021, 4:58 PM

#

```py
total_count = 0
total_frames = 0
overlap_count = 0
overlap_frames = 0
for i in manuals_spans:
    total_count += 1
    total_frames += (i[1]-i[0])
    check = 0
    for j in comp_spans:
        if (j[0] > i[0] and j[0] < i[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (i[1] - j[0])
        elif (i[0] > j[0] and i[0] < j[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (j[1] - i[0])
percent_called = (float(overlap_count)/total_count)*100
percent_overlap = (float(overlap_frames )/total_frames)*100
print(percent_called, percent_overlap)
```
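For comparison, a compact sketch of the overlap calculation. The posted conditions measure from the later start to the end of one particular span, which overstates the overlap when one span lies entirely inside the other, and the strict `>` comparisons skip spans that start at the same time. Taking the later start and the earlier end covers all cases. Expressing the result as a percentage of the first span's length is an assumed convention, not something fixed earlier in the conversation.

```python
def span_overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1] and [lo2, hi2]; 0 if disjoint."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))


def span_percent_overlap(lo1, hi1, lo2, hi2):
    """Overlap as a percentage of the first span's length (assumed convention)."""
    length = hi1 - lo1
    if length <= 0:
        return 0.0
    return 100.0 * span_overlap(lo1, hi1, lo2, hi2) / length


print(span_percent_overlap(0, 10, 5, 15))   # partial overlap
print(span_percent_overlap(0, 10, 2, 4))    # second span inside the first
print(span_percent_overlap(0, 10, 20, 30))  # disjoint
```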

prime hearth Sep 22, 2021, 5:57 PM

#

Thanks!

stable kestrel Sep 22, 2021, 6:24 PM

#

hey

#

do you guys know something about monte carlo method?

reef wing Sep 22, 2021, 7:02 PM

#

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle.

It is normally used to estimate a parameter in a statistical formula

#

Like the MLE method or some other method like method of moments

#

Except MLE and method of moments use pure mathematical solutions

#

it's in essence an old version of the reinforcement machine learning technique

scenic drum Sep 22, 2021, 7:03 PM

#

hey yall, anyone know if conda tends to work ok and not mess up the rest of a system if I simply want to use strictly conda or strictly not-conda for each different project? years ago it used to trample stuff or something. (thread continued from #help-cheese)

bitter fiber Sep 22, 2021, 7:04 PM

#

Yes

#

Stick to whatever you already know and use well

#

Don't invent hypothetical problems, just be productive until you actually face a problem. At this point the value isn't in the software for data science, it's in the applications of it

scenic drum Sep 22, 2021, 7:05 PM

#

wait, @bitter fiber, that answer is slightly surprising as a response to my question so I'm confused if it was to me. you're in fact saying "yeah, it still tramples stuff, avoid conda unless it's in docker for someone else's project" basically?

bitter fiber Sep 22, 2021, 7:06 PM

#

Oh no lol i mean use conda if thats what you are already using

#

Trampling stuff no but yes on 2nd point

#

Imho i just use normal python and i’ve done ok

scenic drum Sep 22, 2021, 7:07 PM

#

to clarify, a lot of packages I've seen recently have been saying "hard to install without conda", so I want to see what these things are that are mostly conda-only, and try them out a bit. but to do that, I want to make sure if I try conda, it won't mess with my existing python installations (system, and on some computers, pyenv)

#

eg homebrew messes other pythons up

reef wing Sep 22, 2021, 7:15 PM

#

This is more of an operating system problem; as I would use whereis python or python --version and not a true data science problem; I recommend asking in another channel as they can help you more. I have trouble with this too; which is why I have so many computers and am conservative with what I download to not increase stress.

#

also note that every environment will use a separate pip and python combination; hence there will be no problems if maintaining a project within an environment.

daring blaze Sep 22, 2021, 7:24 PM

#

I’m not sure if this is something appropriate for this channel, or perhaps #async-and-concurrency instead, but I was wondering if anyone here had insight on the best way to train multiple RL agents “together”? By together though, I just mean training like multiple Q-learning agents with different hyper parameters at the same (or near the same) time

reef wing Sep 22, 2021, 7:25 PM

#

Do the two RL agents communicate?

#

If they don't, I would use the code below; instead of computing x*x, train the model inside the function

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

#

wierd i cant design my code

lunar grotto Sep 22, 2021, 7:26 PM

#

Any recommendations for APIs or free solutions to easily get the complete paginated search results from a google query search?

reef wing Sep 22, 2021, 7:27 PM

#

i wish

#

https://stackoverflow.com/questions/4082966/what-are-the-alternatives-now-that-the-google-web-search-api-has-been-deprecated


#


```
Google Custom Search gives you 100 queries per day for free.
After that you pay $5 per 1000 queries.
There is a maximum of 10,000 queries per day.
```

lunar grotto Sep 22, 2021, 7:29 PM

#

That doesn't sound like extortion. Thanks!

daring blaze Sep 22, 2021, 7:31 PM

#

reef wing If they don't I would use the below instead of x*x train the model in the functi...

They wouldn’t have to interact or share any info between themselves, something like what you shared was what I was thinking but I wasn’t sure if there was something more elegant for training ML/RL algorithms

reef wing Sep 22, 2021, 7:37 PM

#

You can pass a parameter to most machine learning modules that increase the number of processors you want to use in the model

#

but if you're doing reinforcement learning that's likely not the case; as you are developing it yourself most of the time.

#

```py
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
```

#

I added color to the code to make it more elegant 😄

daring blaze Sep 22, 2021, 7:40 PM

#

Yea my goal is to build everything ground up but train different agents in parallel to do hyperparameter searching
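A rough sketch of how the `Pool` pattern could carry a hyperparameter sweep, assuming the agents never share state. The `train_agent` function here is a toy stand-in (epsilon-greedy on a two-armed bandit with made-up payout probabilities), not a real Q-learning setup. The idea is only that each worker receives one configuration and returns its score.

```python
import random
from itertools import product
from multiprocessing import Pool


def train_agent(config):
    """Toy stand-in for one training run: epsilon-greedy on a 2-armed bandit."""
    alpha, epsilon, seed = config
    rng = random.Random(seed)   # per-run RNG keeps runs independent and repeatable
    win_prob = (0.3, 0.7)       # hypothetical payout probability of each arm
    q = [0.0, 0.0]              # action-value estimates
    total = 0.0
    steps = 2000
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)            # explore
        else:
            arm = 0 if q[0] >= q[1] else 1    # exploit
        reward = 1.0 if rng.random() < win_prob[arm] else 0.0
        q[arm] += alpha * (reward - q[arm])   # incremental value update
        total += reward
    return config, total / steps


if __name__ == "__main__":
    # Grid of (alpha, epsilon, seed) combinations, one training run each.
    grid = list(product([0.05, 0.1, 0.5], [0.01, 0.1], [0]))
    with Pool(4) as pool:
        results = pool.map(train_agent, grid)
    for config, avg_reward in sorted(results, key=lambda r: -r[1]):
        print(config, round(avg_reward, 3))
```

Passing the seed as part of the configuration means a run can be repeated exactly, regardless of which worker process picks it up.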

reef wing Sep 22, 2021, 7:43 PM

#

I love the notion of reinforcement learning. Just let the computer figure it out, sounds like super high tech but so simple lol.

daring blaze Sep 22, 2021, 7:45 PM

#

That's why I'm trying to play with it lol, my goal is to go through the Sutton & Barto book and replicate their graphs before moving onto some more complex/modern approaches

#

The whole "figure it out yourself" is much more interesting to me than "here's a million pictures, find the dog" lol

#

Though I know that's an oversimplification

reef wing Sep 22, 2021, 7:47 PM

#

I want to do it for chess tbh

#

or checkers; since it's simpler

#

Simply because I want to make and practice making new dashboards in streamlit

arctic wedgeBOT Sep 22, 2021, 7:48 PM

#

Hey @quasi parcel!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

reef wing Sep 22, 2021, 7:48 PM

#

Thank god good bot

stable kestrel Sep 22, 2021, 7:49 PM

#

ehm

#

but for that method

reef wing Sep 22, 2021, 7:49 PM

#

oo @daring blaze someone made a checkers with rest api connection so you can play with your code

stable kestrel Sep 22, 2021, 7:49 PM

#

we're trying to plot and calculate the integral with the monte carlo method

#

can you maybe help?

#

if you are available?

reef wing Sep 22, 2021, 7:50 PM

#

I think you should ask direct questions of where you are stuck and not ask for someone to help with the whole thing as we aren't getting paid

stable kestrel Sep 22, 2021, 7:50 PM

#

we're stuck at getting the dots to work

#

ehm

#

this is the code rn

#

https://codeshare.io/wn4wbP

#

and the assignment is: plot the graph, and make the points outside the graph red and the points inside the graph green

#

and watch out when the graph goes below the x axis, because then the values get negative

#

that's where we're stuck rn

#

because the rest is working

#

reef wing Sep 22, 2021, 7:54 PM

#

I am still not sure which specific part you are stuck :/

stable kestrel Sep 22, 2021, 7:55 PM

#

no worries ❤️

#

i explained it worng probs

#

ehm

#

when the values go below the y axis

#

*x axis

#

the code doesnt hold

#

and yeah the other point

#

on the picture

reef wing Sep 22, 2021, 7:56 PM

#

to check if a value went below the x axis, you have to check if that value is less than 0

stable kestrel Sep 22, 2021, 7:59 PM

#

ehm

#

hmm

#

below the x axis

#

are negative yvalues

#

i got it a bit

#


```python
    #for y_values >= 0:
        
        if random_y_value <= y_values:
            green_dots_y_value.append(random_y_value)
            green_dots_x_value.append(random_x_value)
            counter_points_inside_graph += 1


        else:
            red_dots_y_value.append(random_y_value)
            red_dots_x_value.append(random_x_value)
            counter_points_outside_graph += 1
```
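A self-contained sketch of the hit-or-miss idea with the negative region handled, under the assumption that the sampling box spans from `y_min` (at or below 0) to `y_max` (at or above 0) and contains the whole curve. Points between the x axis and the curve are the green ones: they count +1 where the curve is above the axis and -1 where it is below. The name `mc_integral` and its parameters are made up for this example.

```python
import math
import random


def mc_integral(f, a, b, y_min, y_max, n=200_000, seed=0):
    """Hit-or-miss estimate of the signed integral of f over [a, b].

    Assumes y_min <= 0 <= y_max and y_min <= f(x) <= y_max on [a, b].
    """
    rng = random.Random(seed)
    signed_hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(y_min, y_max)
        fx = f(x)
        if 0 <= y <= fx:
            signed_hits += 1   # green point under a positive part of the curve
        elif fx <= y < 0:
            signed_hits -= 1   # green point above a negative part: counts negatively
        # every other point is red and contributes nothing
    box_area = (b - a) * (y_max - y_min)
    return box_area * signed_hits / n


estimate = mc_integral(math.sin, 0.0, 2 * math.pi, -1.0, 1.0)
print(estimate)  # close to 0, since the two lobes of sin cancel
```

For plotting, the same two conditions decide the colour: a point that satisfies either branch would go in the green lists, and everything else in the red lists.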

quasi parcel Sep 22, 2021, 8:08 PM

#

https://paste.pythondiscord.com/toquxujeco.apache
the above link is the data. i have a problem with my code: the product_ids are not all getting fetched.
i mean, even though the product ids are there in the data column, the code that is supposed to separate them out and
keep them in a list is not working for this case
this is the code https://paste.pythondiscord.com/bigujivewa.py
i don't understand how to solve it

this is the out put i am getting

#

but this is the expected out

#

#

i am not understanding why it is happening like this

silver sun Sep 22, 2021, 8:11 PM

#

Is anyone familiar with the fashion MNIST data set?

reef wing Sep 22, 2021, 8:12 PM

#

@quasi parcel the first problem is that you are not using the methods that pandas has for dataframes

#

also i think you need to wrap your regex thing with parenthesis before using the plus sign

#

so you can capture more than one thing

prime hearth Sep 22, 2021, 8:16 PM

#

@silver sun a lot are , afterall this is ai channel

reef wing Sep 22, 2021, 8:16 PM

#

res = re.findall("'Product_Id': '[(0-9)+]'", str((r))) note the plus inside the brackets idk how one would handle comma delimited in re module but I wouldn't even use re module to begin with for this problem.

prime hearth Sep 22, 2021, 8:16 PM

#

It'd be better to ask about the problem you're facing rather than asking to ask

reef wing Sep 22, 2021, 8:17 PM

#

Yeah I also prefer downloading the model already trained from the MNIST site http://yann.lecun.com/exdb/mnist/

#

Instead of training one yourself; unless you are learning how to train models ( It's out there just cant find it right now )

silver sun Sep 22, 2021, 8:20 PM

#

prime hearth @silver sun a lot are , afterall this is ai channel

Yea sorry about that. I am trying to display the first 5 images for each class in my 10 x 5 grid but I'm stuck.

#

fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(10,5))
for i in range(5):
    image = X_train[i]
    label = Y_train[i]
    label_name = label_names[label]
    axs[i].imshow(image, cmap='gray') # imshow renders a 2D grid
    axs[i].set_title(label_name)
    axs[i].axis('off')
plt.show()

prime hearth Sep 22, 2021, 8:21 PM

#

What Happens

reef wing Sep 22, 2021, 8:21 PM

#

are you displaying anything at all at this point?

prime hearth Sep 22, 2021, 8:21 PM

#

When run code

silver sun Sep 22, 2021, 8:21 PM

#

So far I have just figured out to print out the first 5 pictures

reef wing Sep 22, 2021, 8:22 PM

#

you hardcoded 5 in many places

#

So that may be it; try setting num_images = 10 at the top and using num_images instead of the hardcoded 5

#

You seem to have reached your own goal after re-reading everything you wrote lol

#

I am trying to display the first 5 images for each class then So far I have just figured out to print out the first 5 pictures

silver sun Sep 22, 2021, 8:25 PM

#

silver sun Sep 22, 2021, 8:26 PM

#

reef wing `I am trying to display the first 5 images for each class` then `So far I have j...

This is what I have so far. I wanted to see the first 5 of each class. Like 5 tshirts, 5 pullovers, etc.
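
One possible way to get five images per class is to collect the first five indices for each label, then draw one row per class. This is only a sketch: `first_n_per_class` and `plot_class_grid` are hypothetical helper names, and `X_train`, `Y_train` and `label_names` are assumed to have the same meaning as in the snippet above.

```python
from collections import defaultdict


def first_n_per_class(labels, n=5):
    """Return {class_label: [indices of the first n samples with that label]}."""
    picked = defaultdict(list)
    for idx, label in enumerate(labels):
        label = int(label)
        if len(picked[label]) < n:
            picked[label].append(idx)
    return dict(picked)


def plot_class_grid(X_train, Y_train, label_names, n=5):
    """Draw one row per class and n columns (10 x 5 for Fashion-MNIST)."""
    # Imported here so the index helper above has no third-party dependency.
    import matplotlib.pyplot as plt

    picked = first_n_per_class(Y_train, n)
    classes = sorted(picked)
    # squeeze=False keeps axs two-dimensional even with a single row or column.
    fig, axs = plt.subplots(nrows=len(classes), ncols=n,
                            figsize=(n * 2, len(classes) * 2), squeeze=False)
    for row, cls in enumerate(classes):
        for col, idx in enumerate(picked[cls]):
            axs[row, col].imshow(X_train[idx], cmap="gray")
            axs[row, col].set_title(label_names[cls])
            axs[row, col].axis("off")
    plt.show()
```

Calling `plot_class_grid(X_train, Y_train, label_names)` should then give the 10 x 5 grid, provided every class has at least five samples.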

quasi parcel Sep 22, 2021, 8:27 PM

#

reef wing `res = re.findall("'Product_Id': '[(0-9)+]'", str((r)))` note the plus inside th...

could you please suggest a better way for this @reef wing

#

apart from regex

quasi parcel Sep 22, 2021, 8:49 PM

#

can anyone please help me

#

please

frigid elk Sep 22, 2021, 9:03 PM

#

@reef wing curious your take on rstudio vs any other ide? ala vscode. .... any advantages to leaving rstudio?

quasi parcel Sep 22, 2021, 9:04 PM

#

after doing your change @reef wing the list is empty

#

just an observation for less data the code is working fine

#

can anyone help me on how to get all products_ids for each customer from this data column and keep it in a separate column https://paste.pythondiscord.com/toquxujeco.py

#

apart from using regex

#

the above link is the data

#

please help me

#

the expected output is this

#

polar dock Sep 22, 2021, 10:00 PM

#

You can use .explode() in pandas

#

On that series

quasi parcel Sep 22, 2021, 10:00 PM

#

how does that even work

polar dock Sep 22, 2021, 10:01 PM

#

Each element of the series product_id is a list
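
A small sketch of what `.explode()` does, and how to get back to one list per customer afterwards. The column names and values here are made up for illustration; the real data may look different.

```python
import pandas as pd

# Hypothetical frame: each product_id cell holds a list.
df = pd.DataFrame({
    "customer": ["a", "b"],
    "product_id": [["101", "2057"], ["33"]],
})

# explode: one row per list element, other columns repeated.
long_df = df.explode("product_id").reset_index(drop=True)

# Going the other way: collect the ids back into one list per customer.
back = long_df.groupby("customer")["product_id"].agg(list)

print(long_df)
print(back)
```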

quasi parcel Sep 22, 2021, 10:02 PM

#

the expected output should be in a list

polar dock Sep 22, 2021, 10:02 PM

#

I'm writing a test for you

#

One sec

quasi parcel Sep 22, 2021, 10:03 PM

#

okay sure

polar dock Sep 22, 2021, 10:05 PM

#

I'll DM you

quasi parcel Sep 22, 2021, 10:06 PM

#

sure

#

please

prime hearth Sep 22, 2021, 10:55 PM

#

hello, i would like to please ask for Natural lang processing, is it better to first tokenize then do text cleaning or the other way around?

#

usually i do collect raw data then text cleaning then tokenize

magic dune Sep 22, 2021, 11:34 PM

#

does matplotlib take html color values when plotting

reef wing Sep 23, 2021, 12:28 AM

#

depends on your ram constraints I have 256 GB of ram for NLP and I clean out using PCA in my grid search.

#

Also note that you should change your tokenizer to convert to int32 or int16 to reduce ram consumption * <- this took me years to learn haha.

#

@prime hearth

prime hearth Sep 23, 2021, 12:39 AM

#

oh ok thx

prisma mulch Sep 23, 2021, 7:13 AM

#

can someone eli5 random_state

weary lynx Sep 23, 2021, 7:17 AM

#

so

#

how to get started with data science and ai

warm nebula Sep 23, 2021, 7:28 AM

#

Is this ai?

weary lynx Sep 23, 2021, 7:31 AM

#

warm nebula Is this ai?

yeah

warm nebula Sep 23, 2021, 7:33 AM

#

Well

tender hearth Sep 23, 2021, 7:43 AM

#

weary lynx how to get started with data science and ai

https://developers.google.com/machine-learning/crash-course

Google Developers

Machine Learning Crash Course | Google Developers

weary lynx Sep 23, 2021, 7:44 AM

#

thx

tender hearth Sep 23, 2021, 7:44 AM

#

Throws a lot of terminology at you. Be warned

quasi parcel Sep 23, 2021, 8:30 AM

#

i am facing some trouble with cosine similarities and weight-rating into pivot table

tender hearth Sep 23, 2021, 8:39 AM

#

quasi parcel i am facing some trouble with cosine similarities and weight-rating into pivot t...

Share it

quasi parcel Sep 23, 2021, 8:39 AM

#

yes two mins sharing the csv

#

i am copying

#

it

#

two mins

#

https://paste.pythondiscord.com/uhuzaxewuh.csv
https://paste.pythondiscord.com/egamurezol.csv
https://paste.pythondiscord.com/ulocucuton.py

here are the csv for these i have the pivot table already but i need to get a cosine similarity and weight-rating and create another combine matrix

#

first need to find weightage after that cosine similarities

#

please anyone help

gaunt marsh Sep 23, 2021, 9:01 AM

#

I want to write a list into a textfile with this Code:


```
for row in rgb_10:

    np.savetxt(value_file, row)

value_file.close()
```

But it says `ValueError: Expected 1D or 2D array, got 0D array instead`

This is what my list looks like

Bildschirmfoto_2021-09-23_um_11.01.01.png

quasi parcel Sep 23, 2021, 9:17 AM

#

can anyone help please

arctic wedgeBOT Sep 23, 2021, 9:30 AM

#

:incoming_envelope: :ok_hand: applied mute to @errant hull until <t:1632390001:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

royal crest Sep 23, 2021, 9:36 AM

#

!d sklearn.metrics.pairwise.cosine_similarity

arctic wedgeBOT Sep 23, 2021, 9:36 AM

#

sklearn.metrics.pairwise.cosine_similarity

```py
sklearn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True)
```
Compute cosine similarity between samples in X and Y.

Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:

>  K(X, Y) = <X, Y> / (||X||*||Y||)
> 
>   On L2-normalized data, this function is equivalent to linear_kernel.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/metrics.html#cosine-similarity).
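
To make the formula above concrete, here is a dependency-free sketch for two plain vectors. It is not the scikit-learn implementation, which works on whole matrices of samples; raising on zero vectors is a choice made for this sketch only.

```python
import math


def cosine_similarity(x, y):
    """K(x, y) = <x, y> / (||x|| * ||y||) for two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Choice made for this sketch: refuse zero vectors instead of guessing.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
```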

royal crest Sep 23, 2021, 9:38 AM

#

not too sure what weightage is

quasi parcel Sep 23, 2021, 9:39 AM

#

got it

#

got it

#

@royal crest

#

thank you

#

and can you tell how to add two different size matrices using numpy

royal crest Sep 23, 2021, 9:41 AM

#

quasi parcel and can you tell how to add two different size matrices using numpy

pretty sure you need two matrices of the same dimension in order to perform matrix addition.

quasi parcel Sep 23, 2021, 9:42 AM

#

yes but is there any way we can achieve that

royal crest Sep 23, 2021, 9:43 AM

#

can't you just use +

#

!e

import numpy as np

a = np.array([[3, 6, 9], [5, -10, 15], [-7, 14, 21]])
b = np.array([[9, -18, 27], [11, 22, 33], [13, -26, 39]])
print(a + b)

arctic wedgeBOT Sep 23, 2021, 9:44 AM

#

@royal crest :white_check_mark: Your eval job has completed with return code 0.

[[ 12 -12  36]
 [ 16  12  48]
 [  6 -12  60]]

royal crest Sep 23, 2021, 9:45 AM

#

i don't know any fancier way to add matrices, perhaps someone else can chime in
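
On the different-size question: NumPy can add arrays of different shapes when they are broadcast-compatible. Otherwise one option is to zero-pad the smaller array first. Whether zero-padding is the right meaning for the data is an assumption. The arrays below are made up.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])             # shape (2, 3)

# Broadcasting: a (3,) row is added to every row of `a`.
row = np.array([10, 20, 30])
sum_broadcast = a + row

# Incompatible shapes: zero-pad the (2, 2) array on the right to (2, 3) first.
small = np.array([[1, 1],
                  [1, 1]])
padded = np.pad(small, ((0, 0), (0, 1)), mode="constant")
sum_padded = a + padded

print(sum_broadcast)
print(sum_padded)
```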

royal crest Sep 23, 2021, 9:48 AM

#

gaunt marsh I want to write a list into a textfile with this Code: ```value_file = open("/x...

maybe convert your list into a numpy array then write it?

#

!e

import numpy as np

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f'{a}\n{type(a)}')

b = np.array(a)
print(f'{b}\n{type(b)}\nDimensions:{b.ndim}')

arctic wedgeBOT Sep 23, 2021, 9:50 AM

#

@royal crest :white_check_mark: Your eval job has completed with return code 0.

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
<class 'list'>
[[1 2 3]
 [4 5 6]
 [7 8 9]]
<class 'numpy.ndarray'>
Dimensions:2

quasi parcel Sep 23, 2021, 10:17 AM

#

got it

rigid zodiac Sep 23, 2021, 10:43 AM

#

quick question, do we have some sort of recurrent KNN?

gaunt marsh Sep 23, 2021, 11:02 AM

#

royal crest maybe convert your list into a numpy array then write it?

still doesn't work :/ my Array is now [43 19 16 15 14 12 12 12 12 11 11 11 11 10 10 10 10 10 10]and it says ValueError: Expected 1D or 2D array, got 0D array instead
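
A likely cause of the error above: looping over a 1D array hands `np.savetxt` one scalar (a 0D array) at a time. Passing the whole array in a single call avoids that. In this sketch the `io.StringIO` buffer stands in for the real file, and the array is a shortened copy of the one in the message.

```python
import io
import numpy as np

rgb_10 = np.array([43, 19, 16])  # shortened stand-in for the posted array

# Iterating a 1-D array yields scalars, which np.savetxt rejects.
error_message = None
try:
    for row in rgb_10:
        np.savetxt(io.StringIO(), row)
except ValueError as exc:
    error_message = str(exc)

# Passing the whole array writes one value per line.
buffer = io.StringIO()           # stand-in for open(path, "w")
np.savetxt(buffer, rgb_10, fmt="%d")

print(error_message)
print(buffer.getvalue())
```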

tender hearth Sep 23, 2021, 11:36 AM

#

rigid zodiac quick question, do we have some sort of recurrent KNN?

What would that achieve

rigid zodiac Sep 23, 2021, 11:53 AM

#

tender hearth What would that achieve

Like I'm trying to train a model to detect fall vs. no fall, and my junior was like, you can use recurrent KNN.

I think he meant RNN but he insists it's recurrent KNN

tender hearth Sep 23, 2021, 11:54 AM

#

I mean

#

KNN is an iterative process, maybe he meant that

#

although he could've just said KNN

rigid zodiac Sep 23, 2021, 11:55 AM

#

that's what I thought. like in grad school they didnt mention about recurrent KNN... so idk

In his word, he was like you can use initial set of data to create KNN model, then keep training KNN... like epoch

#

https://tenor.com/view/scooby-doo-woof-bark-ruh-roh-gif-4303248

quasi parcel Sep 23, 2021, 12:30 PM

#

can anyone tell how to get the weightage for each product from pivot table

serene scaffold Sep 23, 2021, 12:33 PM

#

@quasi parcel what is weightage

quasi parcel Sep 23, 2021, 12:37 PM

#

so for each customer there is a product assigned. i need to find the weightage for each product for every occurrence. give me some time, i will explain properly @serene scaffold sir

royal crest Sep 23, 2021, 12:41 PM

#

i don't think that's a proper term

#

do you mean weighted average?

quasi parcel Sep 23, 2021, 12:42 PM

#

yes

royal crest Sep 23, 2021, 12:43 PM

#

you should probably write a function to calculate it

#

lambda that is

serene scaffold Sep 23, 2021, 12:47 PM

#

quasi parcel so for each customer there is a product assigned need to find the weightage for ...

Can you give an example of the data?

serene scaffold Sep 23, 2021, 12:47 PM

#

royal crest lambda that is

Start with the assumption that you can solve something without making a lambda.

quasi parcel Sep 23, 2021, 12:48 PM

#

yes sure

#

i can

serene scaffold Sep 23, 2021, 12:49 PM

#

Please provide it as text and ping me when you have it.

quasi parcel Sep 23, 2021, 1:04 PM

#

sure

#

give me some time

#

the algo is running

#

two mins

#

sorry

rigid zodiac Sep 23, 2021, 1:36 PM

#

any one know how to categorize or combine frames in millisecond into second?

pastel valley Sep 23, 2021, 2:11 PM

#

can someone drop cs thesis topics here? any recommendations?

#

topic for cs student

spiral gale Sep 23, 2021, 2:43 PM

#

Hey guys, last week my boss asked me to create an AI for forms processing, but I am new to AI and ML.
I started my research on python library and found PyTorch and TensorFlow.
I want to study more about them, but I am lost where I start.
Can anyone help me?

lapis sequoia Sep 23, 2021, 2:58 PM

#

spiral gale Hey guys, last week my boss proposed me to create an AI to form processing, but ...

From Wikipedia: "Forms processing is a process by which one can capture information entered into data fields and convert it into an electronic format." Why would you need to build an AI when this can be done quite simply with a little code?

lapis sequoia Sep 23, 2021, 3:02 PM

#

pastel valley can someone drop cs thesis topics here? any recommendations?

Read new books and studies. The more you read, the more ideas you get. A good place to get ideas with good keywords: https://ieeexplore.ieee.org/Xplore/home.jsp

quasi parcel Sep 23, 2021, 3:26 PM

#

https://paste.pythondiscord.com/vunizezapa.yaml

#

really sorry there was some prod issue i had to address

#

this is the data

#

@serene scaffold

serene scaffold Sep 23, 2021, 3:28 PM

#

quasi parcel https://paste.pythondiscord.com/vunizezapa.yaml

and you want the weighted mean of what?

spiral gale Sep 23, 2021, 3:28 PM

#

lapis sequoia From Wikipedia: "Forms processing is a process by which one can capture informat...

The problem is that our forms are not structured
We already use a tool to capture the data, but it is manual because there are some fields that need optical recognition.

quasi parcel Sep 23, 2021, 3:28 PM

#

for the product id occurrence per customer

serene scaffold Sep 23, 2021, 3:29 PM

#

what determines the weight?

spiral gale Sep 23, 2021, 3:35 PM

#

spiral gale The problem is that our forms are not structured We already use a tool to captur...

here's an example: we need to recognize that the option 'Diariamente' from the field 'Bebidas Alcoolicas' is selected, and do this for all other fields

quasi parcel Sep 23, 2021, 3:36 PM

#

the occurrence of each product_id

#

@serene scaffold

serene scaffold Sep 23, 2021, 3:37 PM

#

quasi parcel the occurrence of each product_id

I don't think I can look into this rn but you can get weighted mean with np.repeat and np.mean
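
A sketch of the `np.repeat` + `np.mean` idea, next to `np.average` with `weights`, which gives the same number without materialising the repeats. The values and counts are invented. Using occurrence counts as the weights is an assumption about what "weightage" means here.

```python
import numpy as np

values = np.array([2.0, 4.0, 10.0])
counts = np.array([3, 1, 1])   # hypothetical occurrence counts used as weights

# Repeat each value `count` times, then take a plain mean.
by_repeat = np.mean(np.repeat(values, counts))

# Same result, computed directly from the weights.
by_average = np.average(values, weights=counts)

print(by_repeat, by_average)
```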

spiral gale Sep 23, 2021, 3:37 PM

#

spiral gale here's a example, we need to recognize that the option 'Diariamente' from field ...

Also, these fields and other different fields can be anywhere in the document with different alignments

quasi parcel Sep 23, 2021, 3:39 PM

#

let me check thank you sir @serene scaffold

chilly mica Sep 23, 2021, 3:42 PM

#

Does anyone have any experience using numpy and folium/heatmap? I've been trying to ask a question for a few days but we haven't gotten anywhere

reef wing Sep 23, 2021, 4:02 PM

#

seaborn has a pretty heatmap

#

sns.heatmap(df, annot=True)

#

Never done folium though.

#

https://stackoverflow.com/questions/54752175/add-heatmap-to-a-layer-in-folium

slim drift Sep 23, 2021, 4:52 PM

#

hey if anyone here is able to help with a question come to helproom burrito

lapis sequoia Sep 23, 2021, 4:57 PM

#

spiral gale here's a example, we need to recognize that the option 'Diariamente' from field ...

Okay, kind of Optical Character Recognition. There are definitely ready-made OCR models for it. All you need is good training data and some study.

brisk moth Sep 23, 2021, 4:57 PM

#

can anyone help me implement a way to stop getting rate limited using twitter api

tough bolt Sep 23, 2021, 5:00 PM

#

https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/

Hey ... I'm not exactly great at math but trying to translate this to python

Computer vision for dummies

A geometric interpretation of the covariance matrix

In this article, we provide a geometric interpretation of the covariance matrix, exploring the relation between linear transformations and data covariance.

#

Problem: I'm not getting the result shown on this website

#

#

not exactly what it's supposed to look like

#

Here is what my code currently looks like:

        cov_matrix = np.cov(points_for_cov)
        eigenvalue, eigenvector = np.linalg.eig(cov_matrix)
        center_x = np.average(all_points_x)
        center_y = np.average(all_points_y)

#

So I'm simply using numpy to calculate the covariance matrix from my point list

and then linalg.eig for the eigenvalue and vector

#

The rotation for the ellipse I'm calculating like this

math.atan2(eigenvector[0][1], eigenvector[0][0])

#

Any idea where I could be going wrong ?
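
Two things may be worth checking in the snippet above, without seeing the rest of the code. First, `np.cov` expects one row per variable, so the input should have shape (2, N). Second, `np.linalg.eig` returns eigenvectors as columns, so `eigenvector[0]` mixes components of two different eigenvectors. A sketch on synthetic data follows. The point cloud is made up, and `eigh` is swapped in because the covariance matrix is symmetric.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# Synthetic cloud stretched along the 45-degree diagonal (stand-in data).
u = rng.normal(scale=3.0, size=500)   # spread along the major axis
v = rng.normal(scale=0.3, size=500)   # spread along the minor axis
xs = (u - v) / math.sqrt(2)
ys = (u + v) / math.sqrt(2)

points = np.vstack([xs, ys])          # shape (2, N): one ROW per variable
cov_matrix = np.cov(points)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

major = eigenvectors[:, -1]           # COLUMN belonging to the largest eigenvalue
angle_deg = math.degrees(math.atan2(major[1], major[0]))
half_lengths = np.sqrt(eigenvalues)   # one standard deviation along each axis
center = points.mean(axis=1)

print(angle_deg % 180, half_lengths, center)
```

The sign of an eigenvector is arbitrary, so the angle is only meaningful modulo 180 degrees.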

#

brisk moth Sep 23, 2021, 5:08 PM

#

nice graph

tough bolt Sep 23, 2021, 5:09 PM

#

brisk moth nice graph

the blue one is supposed to look like that, no worries.

the black lines are supposed to visualize the rotation and width of those ellipses above

brisk moth Sep 23, 2021, 5:10 PM

#

you want ellipses instead of lines?

tough bolt Sep 23, 2021, 5:11 PM

#

brisk moth you want ellipses instead of lines?

sort of. check the article above I linked

#

https://datascienceplus.com/understanding-the-covariance-matrix/

DataScience+

Understanding the Covariance Matrix

This article is showing a geometric and intuitive explanation of the covariance matrix and the way it describes the shape of a data set. We will describe the geometric relationship of the covariance matrix with the use of linear transformations and eigendecomposition. Introduction Before we get started, we shall take a quick look at the […]

#

also this

#

problem is, the black lines don't correctly fit my data

#

I can't pinpoint where I'm going wrong

#

I think the black line should more look like the red one

#

ellipses are just a different visualization of the black lines

brisk moth Sep 23, 2021, 5:24 PM

#

i think its grabbing that little blue splotch

#

and trying to fit it

tough bolt Sep 23, 2021, 5:28 PM

#

#

not sure honestly

#

might be it

reef wing Sep 23, 2021, 5:35 PM

#

What yall up to?

slim drift Sep 23, 2021, 5:44 PM

#

!code

arctic wedgeBOT Sep 23, 2021, 5:44 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

slim drift Sep 23, 2021, 5:44 PM

#

manual_spans = []
manual_spans.append(plt.axvspan(242068, 249530, 0.5, 1, color='g',alpha=0.5))
manual_spans.append(plt.axvspan(242055, 249503, 0.5, 1, color='g',alpha=0.5))

total_count = 0
total_frames = 0
overlap_count = 0
overlap_frames = 0
for i in manual_spans:
    total_count += 1
    total frames += (i[1]-i[0])
    check = 0
    for j in ranges_omega:
        if (j[0] > i[0] and j[0] < i[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (i[1] - j[0])
        elif (i[0] > j[0] and i[0] < j[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (j[1] - i[0])
percent_called = (float(overlap_count)/total_count)*100
percent_overlap = (float(overlap_frames )/total_frames)*100
print(percent_called, percent_overlap)

#

so im trying to just find the percent overlap between manual_calls and ranges_reversal

#

and need some help checking this code

tribal arrow Sep 23, 2021, 5:48 PM

#

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvspan.html

slim drift Sep 23, 2021, 5:50 PM

#

File "<ipython-input-121-551078431366>", line 9
    total frames += (i[1]-i[0])
               ^
SyntaxError: invalid syntax

#

TypeError                                 Traceback (most recent call last)
<ipython-input-122-cb481450f9a5> in <module>()
      7 for i in manual_spans:
      8     total_count += 1
----> 9     total_frames += (i[1]-i[0])
     10     check = 0
     11     for j in ranges_omega:

TypeError: 'Polygon' object is not subscriptable

#

anyone know how to fix this error?
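
On the `TypeError`: `plt.axvspan` returns a `Polygon` patch, so `manual_spans` ends up holding drawings rather than numbers. One way around it is to keep the raw (start, stop) pairs in their own list and draw from them separately. The sketch below does that and computes the overlap with a min/max formula, which also covers the case where one span sits entirely inside the other. `overlap_length` is a hypothetical helper and the `ranges_omega` values are invented.

```python
def overlap_length(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1] and [lo2, hi2]; 0 if disjoint."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))


# Keep the raw (start, stop) pairs; plt.axvspan(lo, hi, ...) can be called
# on them separately for drawing.
manual_spans = [(242068, 249530)]
ranges_omega = [(245000, 250000), (100, 200)]   # hypothetical computed spans

total_frames = sum(hi - lo for lo, hi in manual_spans)
overlap_frames = sum(
    overlap_length(m_lo, m_hi, c_lo, c_hi)
    for m_lo, m_hi in manual_spans
    for c_lo, c_hi in ranges_omega
)
percent_overlap = 100 * overlap_frames / total_frames

print(overlap_frames, percent_overlap)
```

This assumes the computed spans do not overlap each other. If they do, the shared frames are counted more than once.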

#

#

@tribal arrow i want to find time overlap

#

on the x axis

tribal arrow Sep 23, 2021, 6:01 PM

#

can you join vc

#

it was helpful

slim drift Sep 23, 2021, 6:02 PM

#

yea im on a call with advisor

#

for like another 10 min

tribal arrow Sep 23, 2021, 6:02 PM

#

ah alright

#

Anyway, take the maximum and minimum value on the x-axis of the zigzag line (your sample)

#

and also the minimum and maximum value on the x-axis of the big rectangle

#

the difference, of each, between their minimum and maximum is their size

#

divide the rectangle size by the other and multiply the result by 100

#

should give you the percentage overlap

#

unless I'm making some elementary mistake

mossy maple Sep 23, 2021, 6:04 PM

#

how do i get pd.read_csv to read a uint64 col? tried pd.read_csv(fname, squeeze = True, usecols = [4], names = ['time'], dtype = { 'time': 'uint64' }, engine = 'c'), but it reads it as float64 instead

#

a typical entry in this field looks like this 1598918686328, which doesnt fit in 32 bits

lapis sequoia Sep 23, 2021, 7:05 PM

#

mossy maple how do i get `pd.read_csv` to read a `uint64` col? tried `pd.read_csv(fname, squ...

Pandas dtype is int64? NumPy type has uint64

rigid zodiac Sep 23, 2021, 7:06 PM

#

Hey guys I have a quick question, how can you categorize entire csv file?

#

I'm trying to categorize a csv, and then compare it with another

lapis sequoia Sep 23, 2021, 7:07 PM

#

rigid zodiac Hey guys I have a quick question, how can you categorize entire csv file?

I would use Pandas

rigid zodiac Sep 23, 2021, 7:07 PM

#

lapis sequoia I would use Pandas

how? I know I can add a categorical column... but it's not that

rigid zodiac Sep 23, 2021, 7:08 PM

#

lapis sequoia I would use Pandas

Because i'm trying to use ML to identify pattern

mossy maple Sep 23, 2021, 7:09 PM

#

lapis sequoia Pandas dtype is int64? NumPy type has uint64

ive tried it with np.uint64 also, no luck. ended up using pyarrow's read_csv instead

lapis sequoia Sep 23, 2021, 7:12 PM

#

mossy maple ive tried it with np.uint64 also, no luck. ended up using pyarrows read_csv inst...

Pandas doesn't have that dtype and converts it to float? Pandas has int64. Or is there NaN values?

serene scaffold Sep 23, 2021, 7:20 PM

#

I have a jupyter server running on a remote machine. how can I access the jupyter GUI for that server locally?

#

it gave me a url I can go to but obviously I can't just copy and paste that url into a local browser.

lapis sequoia Sep 23, 2021, 7:22 PM

#

Ah, https://pandas.pydata.org/docs/reference/api/pandas.UInt64Dtype.html
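
If missing values are the reason the column comes back as float64 (an assumption, since the file itself was not shared), the nullable `"UInt64"` dtype from the linked page can hold both the integers and the gaps. The CSV text and column names below are made up.

```python
import io
import pandas as pd

# Made-up CSV with one missing timestamp in the second row.
csv_text = "a,1598918686328\nb,\nc,1598918686999\n"

# Default inference: the missing value forces the column to float64.
default = pd.read_csv(io.StringIO(csv_text), names=["key", "time"])

# Nullable unsigned integers: values stay integral, the gap becomes <NA>.
nullable = pd.read_csv(io.StringIO(csv_text), names=["key", "time"],
                       dtype={"time": "UInt64"})

print(default["time"].dtype)
print(nullable["time"].dtype)
print(nullable)
```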

grave frost Sep 23, 2021, 7:36 PM

#

serene scaffold I have a jupyter server running on a remote machine. how can I access the jupyte...

ssh?

serene scaffold Sep 23, 2021, 7:36 PM

#

grave frost ssh?

what about ssh?

grave frost Sep 23, 2021, 7:40 PM

#

serene scaffold what about ssh?

I don't know much about networking myself but apparently you can use ngrok to make some sort of tunnel and use it remotely

#

@serene scaffold just use something like ColabCode if you are too lazy. I use it all the time

quasi parcel Sep 23, 2021, 7:41 PM

#

hi @serene scaffold by remote server, do you mean SageMaker or an AWS instance, or what is it

#

can you tell me

#

i think we can get through apache

serene scaffold Sep 23, 2021, 7:42 PM

#

@grave frost @quasi parcel I think I figured it out but it's specific to this system.

quasi parcel Sep 23, 2021, 7:42 PM

#

okay awesome sir @serene scaffold

brisk moth Sep 23, 2021, 8:18 PM

#

can anyone help me with not getting rate limited using twitter api?

#


```py
        keywords_tweets = tweepy.Cursor(self.api.search, keyword, lang = lang, tweet_mode = 'extended').items(count)

        return keywords_tweets
```
is my main function

#

im guessing its a problem with using the cursor but how can i change it?

real wigeon Sep 23, 2021, 8:28 PM

#

how would i be able to count the occurences of a list of categories

#

using pandas

serene scaffold Sep 23, 2021, 8:33 PM

#

@real wigeon you'd need to share an example and the expected output

#

Is it a list of strings or what?

real wigeon Sep 23, 2021, 8:35 PM

#

yes

#

it's categorical data

#

strings

#

like if i query my db, and get a dataframe, and in that dataframe i want to count the values in a column

#

but i want to count how many instances of each category

#

my ultimate goal is to actually make a bar chart out of this

lapis sequoia Sep 23, 2021, 9:07 PM

#

real wigeon but i want to count how many instances of each category

Like this?

reef wing Sep 23, 2021, 9:11 PM

#

yep value_counts isnt bad

#

I recommend turning into a string before counting though
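
A small sketch of counting categories with `value_counts`. The frame is invented; in the real case it would come from the database query.

```python
import pandas as pd

# Hypothetical query result with one categorical column.
df = pd.DataFrame({"category": ["a", "b", "a", "c", "a", "b"]})

# One row per distinct category, sorted by count (largest first).
counts = df["category"].value_counts()
print(counts)

# For the bar chart, counts.plot(kind="bar") is one option (needs matplotlib).
```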

vale zephyr Sep 23, 2021, 11:37 PM

#

Hi ! Anyone know how to apply a green mask (show only the green on an image) within the forward function (need to convert it to OpenVINO) in PyTorch ?

rose cipher Sep 23, 2021, 11:39 PM

#

Guys, it is possible to learn the math needed for Data Science (Calculus, Linear Algebra, Statistics, and Probability) only with videos? I know that books are the right way, but there's a lot of unnecessary information that I will probably not use. And I do not want to master math, just want the necessary to work with Data Science and its fields (ML, AI, Data Engineering).

reef wing Sep 24, 2021, 12:13 AM

#

Nope @rose cipher ! There are things I did entire videos + writing down notes + all the exercises at the end of the chapter of book + reading the chapter

And I still forget it 2-3 years down the line

#

The things I do exercises and practice I remember after quick refresher video

#

What I think you should do is choose a subset of applications like I did; as it helps with learning much less stuff and makes you much more productive and smarter at that one thing by focusing:

Choose Image processing, text processing, sound processing, or some one domain first and master it as not all the models and math applies to everything.

#

And finally, welcome to the world of data; I hope all Hispanics support each other.

serene scaffold Sep 24, 2021, 12:23 AM

#

rose cipher Guys, it is possible to learn the math needed for Data Science (Calculus, Linear...

who said that books are the right way? yes, I'm sure all the information you need is out there in youtube videos, though I would probably use a video series that has some structure to it and make sure that you have some way to verify that you're absorbing the information.

rose cipher Sep 24, 2021, 12:27 AM

#

serene scaffold who said that books are the right way? yes, I'm sure all the information you nee...

So, can I learn all the math I need for DS and Its fields with videos?

serene scaffold Sep 24, 2021, 12:38 AM

#

rose cipher So, can I learn all the math I need for DS and Its fields with videos?

probably

reef wing Sep 24, 2021, 12:55 AM

#

Lol ignore the well thought out and experienced answer if you must. The marketplace already pays me handsomely, wouldn't want competition anywho.

rose cipher Sep 24, 2021, 1:07 AM

#

reef wing Lol ignore the well thought out and experienced answer if you must. The marketpl...

Sorry?

prime hearth Sep 24, 2021, 1:12 AM

#

hello, i would like to ask: for sentiment analysis (is something positive or negative), would we use count vectorizer, TF-IDF, bag of words, etc., and for text summarization or NLP involving relationships between words, things like word2vec and GloVe?

reef wing Sep 24, 2021, 1:41 AM

#

@prime hearth Essentially you need a huge dictionary mapping words to sentiments and conditional sentiments. I wouldn't train one myself but use an existing library. Do you need an english one?

prime hearth Sep 24, 2021, 1:43 AM

#

oh it just for personal projects

#

i was doing some research, and from what i read, the conclusion i got is: if it's for classification, use something like count vectorizer since it's easier (or try others and see which gives better accuracy); otherwise for NLP use word2vec etc

#

it's because i'm making 2 models: one to classify something as good or bad, and another to summarize text using an RNN. However, i'm not sure if it's best to just use word2vec for both models, or which to use? I know that since similarity is important i would use word2vec, but i'm not sure about classification

reef wing Sep 24, 2021, 1:49 AM

#

Well vectorizing the data is only to prepare it for the model; you can input that "vector" of data into a gradient boosting algorithm or any classifier now; because you now have count/dummy variables as input

#

The one that's worked the best for me is gradient boosting for text classification. but I think BERT is the best one that exists for NLP

#

You can make separate models and use the output of one model as input to the 2nd model; OR use the classification to check if the summary returns the same label "GOOD" or "BAD" as the original text.

uncut turret Sep 24, 2021, 1:56 AM

#

Suggest me resource for AI or ML , python.

tender hearth Sep 24, 2021, 2:00 AM

#

uncut turret Suggest me resource for AI or ML , python.

A short list about Machine Learning
General Information

Machine learning

Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programme...

MIT Technology Review

What is machine learning?

Machine-learning algorithms find and apply patterns in data. And they pretty much run the world.

What Is Machine Learning? | How It Works, Techniques & Applications

Learn the 3 things you need to know about machine learning; Resources include MATLAB examples, documentation, and code describing different machine learning algorithms.

The Official NVIDIA Blog

The Difference Between AI, Machine Learning, and Deep Learning? | N...

AI, machine learning, and deep learning are terms that are often used interchangeably. But they are not the same things.

GitHub

GitHub - trekhleb/homemade-machine-learning: 🤖 Python examples of p...

🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained - GitHub - trekhleb/homemade-machine-learning: 🤖 Python examples of popular machine ...

uncut turret Sep 24, 2021, 2:02 AM

#

It would be better to follow one at a time?

tender hearth Sep 24, 2021, 2:03 AM

#

uncut turret It would be better to follow one at a time?

Well for me I started with this: https://cs50.harvard.edu/ai/2020/ and then this: https://developers.google.com/machine-learning/crash-course

#

Then after that I stopped following courses and started reading papers/articles for reference in my projects

prime hearth Sep 24, 2021, 2:06 AM

#

Another good resource too for learning data cleaning techniques is Krish Naik youtube channel

uncut turret Sep 24, 2021, 2:06 AM

#

Ok suggest me papers for references. I mean how I will use.

uncut turret Sep 24, 2021, 2:06 AM

#

prime hearth Another good resource too for learning data cleaning techniques is Krish Naik yo...

Ok

#

I don't want to get stuck on too many, as I have other subjects, not just one.

uncut turret Sep 24, 2021, 2:10 AM

#

tender hearth Well for me I started with this: <https://cs50.harvard.edu/ai/2020/> and then th...

Thanks for these..

uncut turret Sep 24, 2021, 2:12 AM

#

uncut turret Ok suggest me papers for references. I mean how I will use.

Well suggest me how can I get help from paper or article ?

tender hearth Sep 24, 2021, 2:16 AM

#

uncut turret Well suggest me how can I get help from paper or article ?

Papers and articles usually target specific problems. You can read them after you develop a respectable foundation, for example with the courses I sent above

uncut turret Sep 24, 2021, 2:20 AM

#

Right I just want to take glance 🙂

#

@tender hearth
Suggest me using weka related course.

tender hearth Sep 24, 2021, 2:29 AM

#

weka?

uncut turret Sep 24, 2021, 2:34 AM

#

https://en.m.wikipedia.org/wiki/Weka_(machine_learning)

Weka (machine learning)

Waikato Environment for Knowledge Analysis (Weka), developed at the University of Waikato, New Zealand, is free software licensed under the GNU General Public License, and the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques".

arctic wedgeBOT Sep 24, 2021, 2:57 AM

#

failmail :ok_hand: applied mute to @solar grotto until <t:1632452878:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

subtle barn Sep 24, 2021, 4:36 AM

#

how to perform ml with this type of data. Its FHIR dataset of Allergy Intolerance

lapis sequoia Sep 24, 2021, 4:38 AM

#

subtle barn how to perform ml with this type of data. Its FHIR dataset of Allergy Intoleranc...

what do you exactly mean by perform ml?

tender hearth Sep 24, 2021, 4:39 AM

#

subtle barn how to perform ml with this type of data. Its FHIR dataset of Allergy Intoleranc...

I mean. What kind of information do you want to extract from this dataset?

#

Do you want to predict something, classify something, etc?

#

You can't just "do ML" on data

arctic wedgeBOT Sep 24, 2021, 4:45 AM

#

Hey @subtle barn!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .json attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

subtle barn Sep 24, 2021, 4:47 AM

#

tender hearth Do you want to predict something, classify something, etc?

yes i couldnt understand what to do with that data

#

@tender hearth @lapis sequoia http://hapi.fhir.org/baseR4/AllergyIntolerance?_format=json pls check this out for me

hasty grail Sep 24, 2021, 4:53 AM

#

I suppose that this is for some open-ended project based on the given dataset

subtle barn Sep 24, 2021, 4:57 AM

#

hasty grail I suppose that this is for some open-ended project based on the given dataset

actually im doing an internship and they gave a sample dataset and im stuck. I have worked with csv but json is very new to me

hasty grail Sep 24, 2021, 4:57 AM

#

They didn't provide any guidance on what to do?

subtle barn Sep 24, 2021, 4:58 AM

#

they told me to do like anything on it

subtle barn Sep 24, 2021, 5:00 AM

#

hasty grail They didn't provide any guidance on what to do?

they told this - "Pick your own data set and show us some real analytics and ML implementations based on patient data"

#

pls somebody can u do ml implementation on it. simple is fine

elfin atlas Sep 24, 2021, 5:24 AM

#

subtle barn they told this - "Pick your own data set and show us some real analytics and ML ...

https://www.tensorflow.org/api_docs howd you even get a ML internship without knowing how to ML

TensorFlow

API Documentation | TensorFlow Core v2.6.0

An open source machine learning library for research and production.

subtle barn Sep 24, 2021, 5:25 AM

#

elfin atlas https://www.tensorflow.org/api_docs howd you even get a ML internship without kn...

well i have worked with csv but not json

elfin atlas Sep 24, 2021, 5:25 AM

#

json is even easier than csv

subtle barn Sep 24, 2021, 5:25 AM

#

btw i wont get paid or anything like that

elfin atlas Sep 24, 2021, 5:25 AM

#

anyways you can read through the tensorflow docs to get up to speed with the basics

#

or you can use keras for a more high level tf implementation

subtle barn Sep 24, 2021, 5:27 AM

#

elfin atlas or you can use keras for a more high level tf implementation

actually im not able to understand what kind of implementation should be done, whether its prediction or classification. I have cleaned the json and set it into tabular columns

elfin atlas Sep 24, 2021, 5:29 AM

#

what data is given?

subtle barn Sep 24, 2021, 5:29 AM

#

i will send u the raw json link

subtle barn Sep 24, 2021, 5:29 AM

#

elfin atlas what data is given?

http://hapi.fhir.org/baseR4/AllergyIntolerance?_format=json its fhir dataset of allergy intolerance

subtle barn Sep 24, 2021, 5:45 AM

#

elfin atlas what data is given?

can u pls do an ml implementation and send me the notebook pls

royal crest Sep 24, 2021, 5:51 AM

#

uhhh

lapis sequoia Sep 24, 2021, 6:11 AM

#

subtle barn can u pls do an ml implementation and send me the notebook pls

bruh people are not gonna just do the projects and do stuff for you. if you want help we're happy to help but don't expect full fledged code written just for you.

#

and about ML implementation, well for starters you can show some statistics, maybe try to find the attributes which matter most. and maybe divide the dataset into training and testing sets and see how different supervised algos work out.
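
A rough sketch of those steps (statistics, train/test split, comparing supervised models, checking which attributes matter) with pandas and scikit-learn. The data here is synthetic from `make_classification`, and the column names `f0`..`f7` and the two models are only assumptions for illustration, not a recommendation for the FHIR data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; a real project would load its own cleaned table here.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
features = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Step 1: basic statistics per column.
print(features.describe())

# Step 2: hold out a test set, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.25, stratify=y, random_state=0
)

# Step 3: fit a couple of supervised models and compare held-out accuracy.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))

# Step 4: a rough look at which attributes the forest relied on most.
importances = pd.Series(
    models["random_forest"].feature_importances_, index=features.columns
)
print(importances.sort_values(ascending=False))
```

Which target column to predict is still a question about the problem, not the code.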

lapis sequoia Sep 24, 2021, 6:25 AM

#

subtle barn can u pls do an ml implementation and send me the notebook pls

I recommend a master’s degree in machine learning at university before you start doing anything. Processing important data without expertise leads to erroneous results. Wrong results lead to wrong decisions and it is detrimental to everyone. Do you take responsibility if someone makes you an incorrect machine learning model which leads to wrong decisions? It's a bit like a fake doctor asking you to do surgery on their behalf.

arctic wedgeBOT Sep 24, 2021, 6:30 AM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1632465649:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

subtle barn Sep 24, 2021, 6:37 AM

#

lapis sequoia I recommend a master’s degree in machine learning at university before you start...

ok im sorry but the data is just a sample data obtained from public test server

lapis sequoia Sep 24, 2021, 7:29 AM

#

subtle barn ok im sorry but the data is just a sample data obtained from public test server

Do the work you are asked to do yourself. If you apply for a job you do not know how to do, please take responsibility for it.

elfin atlas Sep 24, 2021, 7:32 AM

#

subtle barn can u pls do an ml implementation and send me the notebook pls

why would i do your internship for you? dont apply for something if you cant do it. tell your employer that its beyond your reach.

#

you cant just bodge your way through an internship

lapis sequoia Sep 24, 2021, 7:33 AM

#

For me it seems that you have lied about your ability to build machine learning models and now you are here asking others to do it for you. Give us a break. I'm sorry, my intention is not to be rude, but someone who can't even convert a json file to e.g. a Pandas DataFrame doesn't know even the simplest basics in the industry.
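
For reference, a FHIR search response is a Bundle whose `entry` list wraps the individual resources, so flattening it is mostly one `pandas.json_normalize` call. A minimal sketch, assuming a tiny hand-written bundle (the ids and values below are made up for illustration, not taken from the HAPI test server):

```python
import pandas as pd

# Hypothetical, hand-written stand-in for a FHIR Bundle response.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "AllergyIntolerance", "id": "a1",
                      "criticality": "high", "code": {"text": "Peanut"}}},
        {"resource": {"resourceType": "AllergyIntolerance", "id": "a2",
                      "criticality": "low", "code": {"text": "Dust"}}},
    ],
}

# Each entry wraps one resource; nested dicts become dotted column names.
resources = [entry["resource"] for entry in bundle["entry"]]
df = pd.json_normalize(resources)

print(df[["id", "criticality", "code.text"]])
```

Real resources have more nesting (lists of codings, references), so some columns will still hold lists that need their own handling.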

weary lynx Sep 24, 2021, 7:54 AM

#

o

subtle barn Sep 24, 2021, 7:56 AM

#

lapis sequoia For me it seems that you have lied about your ability to build machine learning ...

It's ok. I have converted the json to tabular columns from there I cant figure out whether the data is for prediction or classification

lapis sequoia Sep 24, 2021, 7:58 AM

#

subtle barn It's ok. I have converted the json to tabular columns from there I cant figure o...

That's also why domain expertise is important.

subtle barn Sep 24, 2021, 7:59 AM

#

lapis sequoia That's also why domain expertise is important.

Btw its not that internship internship im just practicing myself

lapis sequoia Sep 24, 2021, 8:01 AM

#

You need to understand the problem before you can solve the problem. You also need to be able to examine the data to see what it is suitable for. Sometimes no relevant results are obtained from the data.

#

But good luck going forward! I have an appointment.

desert bear Sep 24, 2021, 8:42 AM

#

When I was testing ensemble model, I have stumbled upon a phrase verbosity. There is a parameter that, if set to True, "controls the verbosity of the building process."
Does anyone know what that means?

tender hearth Sep 24, 2021, 8:45 AM

#

desert bear When I was testing ensemble model, I have stumbled upon a phrase verbosity. Ther...

Share the exact function that accepts this parameter

#

we can't make assumptions

misty flint Sep 24, 2021, 9:30 AM

#

i think the best advice ive heard recently

#

is try to solve your problem WITHOUT ML first

#

blobhyperthink

#

even if you end up having to use ML in the end, your solution will usually be better for having thought from that perspective

#

~~also not every problem is a ML problem~~

#

RunFail

lapis sequoia Sep 24, 2021, 9:52 AM

#

subtle barn Btw its not that internship internship im just practicing myself

not sure what internship internship means but yeah we have nice resources in pins. you better learn stuff and then 'do ml' on that data lol. you can ask specific questions of course if you get stuck.

gaunt marsh Sep 24, 2021, 10:28 AM

#

I have a Pie chart which looks like this. I want a legend on the right side which shows me the RGB color values of the pies pieces. I have the color values in an Array, I give them to the chart already. How can I pass it to a legend?

rigid zodiac Sep 24, 2021, 11:00 AM

#

quick question, how do you split files inside a folder as train test split

#

like I'm trying to split it, since each csv file is considered a pattern

pure gull Sep 24, 2021, 11:06 AM

#

rigid zodiac quick question, how do you split files inside a folder as train test split

can't you just use sklearn's train test split on a list of filenames?

rigid zodiac Sep 24, 2021, 11:14 AM

#

pure gull can't you just use sklearn's train test split on a list of filenames?

what do you mean? like the one I have is a list of frames in a mmWave device

#

where each one of them was split into 80+ frames (10 seconds)

pure gull Sep 24, 2021, 11:15 AM

#

rigid zodiac what do you mean? like the one I have is a list of frames in a mmWave device

do you want to split your observations/frames or your files?

rigid zodiac Sep 24, 2021, 11:19 AM

#

I want to but the issue is 80+ frames (in 1 csv file) will be considered as a fall or nonfall, and I have like 320 csv files for nonfall and 200 csv files for fall

rigid zodiac Sep 24, 2021, 11:20 AM

#

pure gull do you want to split your observations/frames or your files?

So, the normal train test split is out of the way. Because it will mix between frames of fall and non fall

pure gull Sep 24, 2021, 11:22 AM

#

Ok so you just run the split on two lists. Fall and not fall. (There might be something called «stratified» to address this already)
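
A minimal sketch of that idea: split the list of file names instead of the frames, with `stratify` keeping the fall / non-fall ratio in both parts. The file names below are made up; only the counts (200 and 320) come from the discussion:

```python
from sklearn.model_selection import train_test_split

# Hypothetical file names; in practice these would come from listing the two folders.
fall_files = [f"fall/rec_{i:03d}.csv" for i in range(200)]
nonfall_files = [f"nonfall/rec_{i:03d}.csv" for i in range(320)]

files = fall_files + nonfall_files
labels = [1] * len(fall_files) + [0] * len(nonfall_files)  # 1 = fall, 0 = non-fall

# Split whole files, not frames, so every 10-second recording stays intact.
train_files, test_files, train_labels, test_labels = train_test_split(
    files, labels, test_size=0.2, stratify=labels, random_state=0
)

print(len(train_files), len(test_files))
print(sum(train_labels), sum(test_labels))  # number of fall recordings in each part
```

Each file is then loaded on its own when building the training and test sets, so the shuffle only reorders recordings and never mixes frames across them.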

rigid zodiac Sep 24, 2021, 11:24 AM

#

pure gull Ok so you just run the split on two lists. Fall and not fall. (There might be so...

but how can I do that with each of the csv file? Like I mention before, I have 200 csv files for fall in 1 folder, and 320 csv files for non fall in another folder.

pure gull Sep 24, 2021, 11:26 AM

#

rigid zodiac but how can I do that with each of the csv file? Like I mention before, I have 2...

I don't quite see what you want / are missing. I would personally use dask to make one large dataframe of all "Frames" / data you have, then make a column with "falling" bool True/False as its content. Then pass that column to stratify (it takes the label values themselves, not a column name) https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

#

Are the files so large that memory becomes an issue?

rigid zodiac Sep 24, 2021, 11:28 AM

#

pure gull I don't quite see what you want / are missing. I would personally use dask to ma...

Well it's not that large. but each of the csv files contains 10 seconds of fall or non fall for a particular position. that's why I cant just combine them into 1 large file (also because the number of frames varies)

pure gull Sep 24, 2021, 11:29 AM

#

I would still construct a big dataframe of it all because you can then easily work with the data, loop over positions, fall/nonfall, ... use stratified sampling, etc

#

(but there are many ways to do this and this is just my preference)

#

My thinking is that when I am doing the analysis, I do not want to be concerned with which file the data is in, which format, ... and other issues. Convert up to a more abstract dataset so I can focus on the data, not on the storage formats, ...

rigid zodiac Sep 24, 2021, 11:33 AM

#

but when you split it, it will randomize, so there will no longer be 10-second frames, or will there?

#

like I tried it before but it is not correct
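
One way to keep a single big dataframe and still leave each recording whole is to decide the split per file and then select rows by file. A sketch under assumptions: the `file_id`, `frame`, `falling` and `signal` columns and the random data are hypothetical stand-ins, and plain pandas is used instead of dask to keep it short:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical combined data: one row per frame, with the source file
# and that file's label repeated on every row.
rows = []
for file_id in range(10):
    falling = file_id < 4                    # made-up labels: first 4 files are falls
    n_frames = int(rng.integers(80, 90))     # the number of frames varies per file
    for frame in range(n_frames):
        rows.append({"file_id": file_id, "frame": frame,
                     "falling": falling, "signal": float(rng.normal())})
df = pd.DataFrame(rows)

# One label per file, so the split is decided per recording, not per frame.
per_file = df.groupby("file_id")["falling"].first()
train_ids, test_ids = train_test_split(
    per_file.index.to_numpy(),
    test_size=0.3,
    stratify=per_file.to_numpy(),
    random_state=0,
)

# Every frame follows its file, in its original order.
train_df = df[df["file_id"].isin(train_ids)]
test_df = df[df["file_id"].isin(test_ids)]
print(train_df["file_id"].nunique(), test_df["file_id"].nunique())
```

The shuffle here only reorders which files land in which part; the frames inside a file are never separated or reordered.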

rigid zodiac Sep 24, 2021, 11:42 AM

#

pure gull I would still construct a big dataframe of it all because you can then easily wo...

so your suggestion is to do it without shuffle?

serene scaffold Sep 24, 2021, 11:47 AM

#

reef wing Lol ignore the well thought out and experienced answer if you must. The marketpl...

Felipe only asked if it's possible to learn the math they want to learn via YouTube videos. They didn't ask about data science careers. Plenty of university students learn more from youtube videos than from their instructor.

pure gull Sep 24, 2021, 11:47 AM

#

yeah maybe but you should be able to do it quite easily yourself, too - it's hard to understand exactly what you are after here

velvet thorn Sep 24, 2021, 12:13 PM

#

reef wing Lol ignore the well thought out and experienced answer if you must. The marketpl...

🥴

prisma mulch Sep 24, 2021, 12:49 PM

#

can someone eli5 n_estimators in random forest regression?