from math import isclose

# Lists of length-2 lists or tuples
computational_spans = [ ... ]
manual_spans = [ ... ]

def span_percent_overlap(lo1, hi1, lo2, hi2):
    # You can figure this part out :)
    ...

span_overlaps = {}
for comp_num, (comp_lo, comp_hi) in enumerate(computational_spans):
    for manu_num, (manu_lo, manu_hi) in enumerate(manual_spans):
        overlap = span_percent_overlap(comp_lo, comp_hi, manu_lo, manu_hi)
        if not isclose(overlap, 0.0):
            span_overlaps[comp_num, manu_num] = overlap
#data-science-and-ml
what does the lo1 hi1 part mean
variable names are up to you. this is just to demonstrate the use of enumerate and the nested loop
you need to figure out how to determine if 2 spans overlap
shouldnt this be based on if they overlap on time
yes, isn't that what the contents of the lists are?
|----|
|----|
|--------|
|---|
|---|
|----|
which spans overlap, and by how much? you need to write the code to do that part
it sounded like in your code the spans are represented as a pair of times, the start time and the stop time
which is a reasonable way to represent a time span!
yea
so given two start-stop pairs, how do you calculate the % overlap?
well there's time in between as well
i'm not sure what you mean by that
there is time = 0, time =100, but also there are times in between
i still don't know what you mean by that
you asked how to find the overlaps between two lists of time spans, i am showing you the rough sketch of how you might do that
i dont get how to tell if the 2 spans overlap
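A minimal sketch of the part left as an exercise above. It assumes each span is a (start, stop) pair with start < stop, and that "percent overlap" means the shared length as a percentage of the first span's length. That definition is an assumption; the chat never pins it down.

```python
def span_percent_overlap(lo1, hi1, lo2, hi2):
    """Percent of span 1 (lo1..hi1) that is also covered by span 2 (lo2..hi2)."""
    # The shared region starts at the later start and ends at the earlier stop.
    shared_lo = max(lo1, lo2)
    shared_hi = min(hi1, hi2)
    # A negative length means the spans do not touch, so clamp it to zero.
    shared_length = max(0.0, shared_hi - shared_lo)
    # Assumed definition: shared length relative to the length of span 1.
    return 100.0 * shared_length / (hi1 - lo1)


print(span_percent_overlap(0, 100, 50, 150))  # spans that share half of span 1
print(span_percent_overlap(0, 10, 20, 30))    # spans that do not overlap
```

Two spans overlap exactly when the later start comes before the earlier stop, which is why a single max/min pair is enough and no chain of if statements is needed.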
Hello!
fig = plt.Figure()
ax = fig.subplots(1)
plt.gcf().autofmt_xdate()
ax.set_facecolor('#1E1E1E')
fig.patch.set_facecolor('#1E1E1E')
x, y = [], []
async for document in db.main.stats.find({}):
    x.append(datetime.fromtimestamp(document['time']))
    y.append(document['total_users'])
ax.plot(x, y, color='orange', linestyle='-')
ax.grid(color='#323232', linewidth=1, linestyle='-')
ax.set_xlabel('Время')  # Russian for "Time"
ax.set_ylabel('Количество пользователей')  # Russian for "Number of users"
ax.tick_params(axis='both', colors='white')
ax.spines['top'].set_color('white')
ax.spines['bottom'].set_color('white')
ax.spines['left'].set_color('white')
ax.spines['right'].set_color('white')
ax.xaxis.label.set_color('white')
ax.yaxis.label.set_color('white')
ax.xaxis.set_major_locator(HourLocator(interval=2))
ax.xaxis.set_major_formatter(DateFormatter('%H:%M', tz=tz.gettz('Europe/Moscow')))
ax.yaxis.set_major_formatter(plt.FuncFormatter(format_tick))
plt.close('all')
fig.savefig('./data/stats.png', bbox_inches='tight', dpi=256)
I use matplotlib to create a stats graph for my bot. The code runs every 15 minutes to update the image, and on each run matplotlib takes some memory that it never releases. I tried plt.cla(), plt.close(), fig.clear() and fig.clf(), but none of them had any effect.
I even asked this question in StackOverflow. https://stackoverflow.com/questions/69144822/releasing-memory-in-matplotlib.
Maybe somebody here had same problem.
Help please.
look at the "diagram" i drew - can you figure it out?
would it be an if statement if the bars overlap
yes, you'd probably want to use if at some point
you'll probably need the comparison operators like >, <=, etc. and the math operators like -
i highly suggest writing out your solution on paper first
do some test calculations by hand, to make sure it makes sense
then translate your solution to code
professionals do this all the time. it's a tool for thought
ok i will write it out
help please
Does anyone have a reddit dataset that I could use to train GPT 2?
https://www.reddit.com/r/datasets/ @olive shore
hello all
Hello, do you need help with something?
definitely need help
having hard time making viz my default editor
cant install viz module
for editor-related stuff, perhaps enquire #editors-ides
thanks
:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1632201233:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).
Hey can someone help me with ML stuff, I'm try to work with upsampling point clouds
Hello
My code
I tried
```python
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}', engine='python', chunksize=100000, iterator=True)):
    sep_time = pd.DataFrame()
    print('chunk no.:', i)
    for n in chunk['Transaction Time']:
        print('n:', n)
        converted_timestamp = datetime.fromtimestamp(n / 1_000_000_000)
        print('converted:', converted_timestamp)
        print()
        get_month = converted_timestamp.month
        get_day = converted_timestamp.day
        # add 10 to year
        year_add_10 = converted_timestamp.year + 10
        # hour
        get_hour = converted_timestamp.hour
        # minute
        get_min = converted_timestamp.minute
        # second
        get_sec = converted_timestamp.second
        # microsecond
        get_microsec = converted_timestamp.microsecond
        c = datetime(year_add_10, get_month, get_day, get_hour, get_min, get_sec, get_microsec)
        print('datetime:', c)
        print()
        final_date_time = datetime.strftime(c, '%d-%m-%Y %H:%M:%S.%f')
        print('final_date_time:', final_date_time)
        print()
        chunk['time_seprated'] = final_date_time
    print('chunk1:')
    print(chunk)
    print()
```
this way
I am getting same value in last column
chunk1:
Msgtype Activity Type ... Qty time_seprated
0 Order N ... 1175 28-08-2021 14:45:08.107780
1 Order N ... 1175 28-08-2021 14:45:08.107780
2 Order N ... 500 28-08-2021 14:45:08.107780
3 Order N ... 1175 28-08-2021 14:45:08.107780
4 Order M ... 1175 28-08-2021 14:45:08.107780
... ... ... ... ...
99995 Order M ... 50 28-08-2021 14:45:08.107780
99996 Order M ... 25 28-08-2021 14:45:08.107780
99997 Order M ... 50 28-08-2021 14:45:08.107780
99998 Order M ... 25 28-08-2021 14:45:08.107780
99999 Order M ... 50 28-08-2021 14:45:08.107780```
Can anyone help me to understand why I am getting same value in last column
Ping me when replying
This looks exactly same as @dull turtle's .....
Yes
Can u please look into this?
help please
problems like this can come from multiple sources
if the graph is generated as expected then you could probably rule out your matplotlib code
it might actually be the way your async code is set up instead
yes, graph generates as expected
Can anyone help me in this?
Hey ! I have an issue with pandas (that i asked in #help-avocado and another channel before that)
should i copy it there or should i just wait ?
It is pretty blocking :/
Thanks in advance for the help !
o180_img10_patch_419_43.jpg,
o180_mask10_patch_419_43.jpg
I want to check whether these strings match from patch_419_43.jpg onward, i.e. I want to find the word patch and print it together with everything to its right in the string. I have an idea of using some kind of regex but I'm not sure how to implement it; any help will be much appreciated
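One way the regex idea could look, as a sketch only. The helper name `patch_suffix` is made up for illustration, and it assumes the word "patch" appears at most once in each filename.

```python
import re


def patch_suffix(filename):
    """Return the part of filename from 'patch' onward, or None if absent."""
    # 'patch' followed by everything up to the end of the string.
    match = re.search(r"patch.*", filename)
    return match.group(0) if match else None


image_name = "o180_img10_patch_419_43.jpg"
mask_name = "o180_mask10_patch_419_43.jpg"

print(patch_suffix(image_name))
# The two files belong together when their suffixes are equal.
print(patch_suffix(image_name) == patch_suffix(mask_name))
```

Without a regex, `filename[filename.index("patch"):]` gives the same suffix, but it raises ValueError when "patch" is missing instead of returning None.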
Thank you. What is that 06.2f in return '(' + ', '.join(format(val, '06.2f') for val in rgb) + ')' doing?
I am getting TypeError: unsupported operand type(s) for +: 'int' and 'str' How can I fix this?
I use this code:
plt.axis('off')
plt.show()```
This is what rgb_labels look like
has anyone here dealt with SageMath before?
facing an issue which is described here: https://github.com/sagemath/sage-windows/issues/63
@dull turtle so not only did you refuse to do any work for yourself, but this also turned out to be a homework assignment? That's quite unethical. You are only cheating yourself by refusing to study and learn.
The sizes are supposed to be numbers... they're sizes after all!
Padding, the "06" means "pad with 0s up to 6 characters total", and the ".2" means "show exactly 2 decimal places"
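A small illustration of that format spec, using made-up values for `rgb` (the real values are not shown in the chat):

```python
# Hypothetical channel values, only to show what '06.2f' does.
rgb = (3.14159, 0.5, 255)

# '0' = pad with zeros, '6' = total width including the decimal point,
# '.2f' = fixed-point notation with exactly two decimal places.
parts = [format(val, "06.2f") for val in rgb]
label = "(" + ", ".join(parts) + ")"

print(parts)
print(label)
```

The width counts every character, so a value like 255 already fills all six characters as `255.00` and gets no extra padding.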
@gaunt marsh use the counts for the sizes and the labels for the axis labels
!pastebin
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
thank you!
I need help with autopct for my pie chart. I have the absolute values as a list and want to pass them. The examples from stackoverflow didn't work
https://towardsdatascience.com/a-simple-cnn-multi-image-classifier-31c463324fa
Why does this article call it a CNN?
the model seems to be:
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(100, activation=keras.layers.LeakyReLU(alpha=0.3)))
model.add(Dropout(0.5))
model.add(Dense(50, activation=keras.layers.LeakyReLU(alpha=0.3)))
model.add(Dropout(0.3))
model.add(Dense(num_classes, activation='softmax'))
which I don't see any convolutions in
No idea
I don't think transfer learning is taking place either - the pretrained VGG16 model is only used to generate the labels
Hello. I know basics of Python, and i wanna learn machine learning, AI and stuff. Can someone recommend me a good resource where i can learn 4 free?
Traceback (most recent call last):
File "F:\office codes\transaction time.py", line 52, in <module>
chunk['sep_time'] = sep_time
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
value = sanitize_index(value, self.index)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
raise ValueError(
ValueError: Length of values (1) does not match length of index (100000)```
my code ```python
final_date_time = datetime.strftime(c, '%d-%m-%Y %H:%M:%S.%f')
#New_Time.append(final_date_time)
sep_time.append(final_date_time)
print('final_date_time:', final_date_time)
print()
chunk['sep_time'] = sep_time
new_path = r'F:/output csv files/'
file_name = 'Stream-5 datetime extracted'
extension = '.csv'
chunk.to_csv(f'{new_path}{file_name}{extension}', index=False, mode='a', header=None)
print('data saved in extracted csv file')``` my code
how i can add output in column at end of dataframe
ping me when replying
@dull turtle you've repeatedly ignored and disregarded something like an hour of help and attention from multiple people here
And now you're re-asking the original question
I have two arrays:
massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]
anyone knows how i can combine them into one array like it:
massive_third = [{"sku": "1", "type": "car", "color": "yellow", "price": "4000", "quentity": "8"}]
but with a for loop, on every iteration?
see actually i tried to solve my problem but i am getting above error
Maybe use a dataframe and do a join?
i stucked at this from yesterday , can u please look into this ?
@desert oar dataframe from pandas module?
so @dull turtle is this like small array or like big chunk of data?
if small array, i mean it should not be too hard IMO.
my data is of 5.5 gb
Yes. Are the two arrays in the same order?
also you mean to combine them assuming merge by indexes right?
so i am using chunksize for it
If both arrays are in the same order, you can just loop through pairs of elements. Otherwise you will need some kind of join
oh df may be bad way for it then, since it will take quite RAM.
I suggest simply using loops.
assuming by 5.5gb you mean its in some kind of file.
csv file
We helped them extensively with this yesterday. They are reading a data frame in chunks using the pandas read_csv chunking/iterator API
And the problem is that they are re-inserting the column at every iteration of a loop when they should be inserting it only once
Which we have explained over and over
oh i see. alright, I kinda used csv module in past when i had like this big data.
Yeah, I don't think pandas always had this functionality. I have done the same in the past, but now you can read in chunks in a tidy way
can u now look into my problem also
that is great!! I'll look into chunk size in my free time.
i mean i think salt has already said what you're doing wrong.
but yeah sure, i can try, what have you done so far?
Also you didn't say, the sequence is same in both cases?
@desert oar I have two arrays that I want to link using the same "sku" (article) keys, but the problem is that the second array contains some objects that are not in the first one, and I need to somehow compare and display them. I thought at first of making an object, but I need to iterate over two arrays at the same time. I read about itertools.chain but it doesn't really help me much.
wait i share my code
oh i see, are they sorted tho?
my code here https://paste.pythondiscord.com/sizixeyiha.py @lapis sequoia
Chain is for concatenating, you need to work on pairs of elements
also please no DM. @dull turtle
@worldly lake I can't tag you because you have RTL in your name
do u get my code?
@lapis sequoia Sorted, yes. I just have a list of products in the first array, and information about the price and quantity of each product in the second. And so it turns out that there is a product in the first array, but there is no price for it, which is why the first array of objects contains more than the second. And because of this, I cannot sort. Although I can export the SKU data of the goods and then check whether the ones I need are in that export. I seem to have solved my problem.
i would still use a dataframe here
implementing a join manually over lists would be slow and a lot more code than you need
wait you both have different questions?
@desert oar sorry for that
share a small example.
they did
# inputs
massive_first = [{"sku": "1", "type": "car", "color": "yellow"}]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]
# result
massive_third = [{"sku": "1", "type": "car", "color": "yellow", "price": "4000", "quentity": "8"}]
salt rock can we discuss again ? please
okay okay. hold on.
i am happy to help you, but please read what i tried to tell you yesterday. i even wrote multiple new versions of your code for you
i won't re-answer the same question over and over
@worldly lake doing this "manually" would entail doing 2 loops: one over massive_first, and another over massive_second - basically comparing all pairs of elements
my data
you can speed this up by using some kind of index instead of just scanning the list. the simplest thing to do in python would be to build a hash table of SKU -> record
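A sketch of the hash-table idea in plain Python. The function name `merge_by_sku` and the SKU 107 record are made up for illustration; keeping records from the first list that have no match (a left join) is an assumption about the desired result.

```python
def merge_by_sku(first, second):
    """Combine records that share a 'sku', keeping every record of `first`."""
    # Build the index once: SKU -> record. A dict is Python's hash table.
    second_by_sku = {record["sku"]: record for record in second}
    merged = []
    for record in first:
        # Records with no match in `second` are kept unchanged.
        extra = second_by_sku.get(record["sku"], {})
        merged.append({**record, **extra})
    return merged


massive_first = [
    {"sku": "1", "type": "car", "color": "yellow"},
    {"sku": "107", "type": "bike", "color": "red"},  # hypothetical, no price
]
massive_second = [{"sku": "1", "price": "4000", "quentity": "8"}]

massive_third = merge_by_sku(massive_first, massive_second)
print(massive_third)
```

Each lookup in the dict takes roughly constant time, so the work grows with the sum of the two list lengths rather than their product.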
i tried your code but csv file in output has nothing inside it
comparing all may not be required if both of them are sorted, by something unique
that's true too
blank csv file i get yesterday
but again, doing this all in pandas would make this easier
yes @dull turtle, i said that this was untested code written partly by guessing, and that you need to double-check the documentation for these functions to see how to actually use it
set mode='a' in my code, try that
@lapis sequoia
https://pastebin.com/B3QxVqr0
ah thanks hold on.
@lapis sequoia first massive have object with sku 107 but second massive is not
"massive" would be like 107 million 🙂
and even then, that's pretty small for some systems
don't tell me the arrays you shared are array of jsons and not dicts :'(
doing that by hand would be a lot slower and a lot more verbose
you might have to change how= depending on the desired effect:
df_result = df1.join(df2, how='inner')
df_result = df1.join(df2, how='outer')
df_result = df1.join(df2, how='left')
df_result = df1.join(df2, how='right')
So basically throughout August and September I've been beside myself as I struggled to figure out how to build a model that can generate TTS for English and Japanese in one sentence
https://www.w3schools.com/sql/sql_join.asp @worldly lake see here for the different kinds of joins
I used Multi-Tacotron-Voice-Cloning for the most part
This is a GitHub repository designed to make it easy to do multiple languages cloned by swapping out the English and Russian in the dataset
It's divided into four portions: g2p, encoder, synthesiser, and vocoder
I spent a good deal of time successfully making a g2p collection for English and Japanese
The problem comes with everything after that
The instructions are set out plainly for me to use accordingly
Probably because there's little room for error if I do it right
you can also use pd.merge which is a slightly different interface (more concise in this particular case):
df1 = pd.DataFrame(massive1)
df2 = pd.DataFrame(massive2)
df_result = pd.merge(df1, df2, on='sku', how='outer')
It seems the main reason I can't make this model has to do with my choice for audio?
I struggled with the encoder step in terms of formatting and collecting English voices, but I got it to work
But then when I got to the synthesiser step I was stuck because I didn't have a metadata file
I don't know how I was expected to organise my data because it really hurt me in the process
I think I could overcome this if I figure out how to properly format any kind of dataset I have
But I don't really know how that would work because the author of the repository never addressed this
That's where I've been throughout September
I tried asking for help from other people that do voice cloning but they don't have a clue what to tell me
I'm running out of ideas and haven't figured out how to start over
@desert oar@lapis sequoia oh my godddddddddddd, thanks you both very much!!!!!
this is exactly what I needed!
can I express my gratitude to you?
can there be any section with reviews / reviews? hah: D
@desert oar hello i am using this code ```python
df_chunks = pd.read_csv(f'{path}{file_name}{extension}',
engine="python",
chunksize=100000,
iterator=True,
)
for i, chunk in enumerate(df_chunks):
    print("chunk no.:", i)
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365)
    chunk.insert(2, "time_separated", time_separated)
    del chunk["Transaction Time"]
    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_append = not is_first_chunk
    append=write_append
    chunk.to_csv(filename_out, header=write_header, mode='a' )
    chunk.to_csv(f'{new_path}{output_file_name}{extension}', header=write_header,mode='a' if write_append else 'w')```
can we discuss again ?
if you can post a few rows from your csv data, i can actually test this code
!paste use this site 👇
you only need to post the first 10 rows or something like that
also, @dull turtle is this actually a school assignment? if so, i will have to stop giving answers as per our rules, but i will at least try to debug what i already wrote
my data https://paste.pythondiscord.com/oyonajovac.apache check here
i am working on project , i stucked at this problem
do u get my data in pastebin link ?
yes, thank you
i noticed that another user posted very similar code and almost the exact same problem
are you collaborating with someone on it? are you using 2 different accounts?
were you given a template by your instructor?
not at all
see firstly i have to change my Transaction Time into human-readable time format
i have written script for it
if u check my code u will find
now i just want to add that output column at end of df
do u get my point what i am trying to do ?
yes, i understand. but i am trying to get more context for this
all thanks to salt heh, i was mere spectator in this case.
can u ask me what u want more? so u also get more idea?
what is the nature of the project? what kind of task were you given? are you going to just use my code and claim it as your own?
anyway, i tested my code and this worked as expected:
from datetime import timedelta
import pandas as pd
filename_in = "..."
filename_out = "..."
df_chunks = pd.read_csv(filename_in, sep='\t', chunksize=5, iterator=True)
for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)
    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(filename_out, header=write_header, mode=write_mode, index=False, sep='\t')
obviously that chunksize is very small, it was just for testing
This is the Github repository I was talking about
https://github.com/vlomme/Multi-Tacotron-Voice-Cloning
I basically used the Google Colab provided and rewrote it to be mounted to my drive and work from there with the instructions
I've gone through many issues environment-wise, but I believe the problem is entirely in making sure all my files are in place
Something that I haven't been able to understand
can i use this code by increasing chunksize to 100000
?
Traceback (most recent call last):
File "F:\office codes\transaction time seprateed code.py", line 46, in <module>
time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'Transaction Time'``` this error i am getting @desert oar
i can't reproduce the error, works fine for me
i have used increasing chunksize to 100000 this
hi @desert oar i hope you are doing well, your code works well for string but for int or float its not working could you help when you are free?
i agree it works fine for u, but can u help me why i am getting error
i don't know why you're getting the error, because it works for me
simple traceback suggests that you may not have that column precisely for some chunk?
that seems odd
more realistically del did something, i've since removed that from my code
i don't remember what my code was. just post your code and some example data, and someone can help. i'm not the only person here who knows pandas, numpy, etc.
Part of the advantage of this channel is that lots of people with data science knowledge hang out here. salt rock lamp is very knowledgeable, but it's best to pose your questions as something that anyone can help you with.
i agree sorry @serene scaffold
lol its alright, anyways, you may want to reshare the question/issue.
or atleast put the link of question shared previously
see this data i have Transaction Time column
either you're doing something wrong with the data. or for that chunk that data is null in all cases(an assumption)
a simple way to debug would be to print the column names and/or see if you're deleting some columns or something
my code here ```python
df_chunks = pd.read_csv(f'{path}{file_name}{extension}', sep='\t', chunksize=100000, iterator=True)
for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)
    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(f'{new_path}{output_file_name}{extension}', header=write_header, mode=write_mode, index=False, sep='\t')
```
i am using salt rock lamp code
print(chunk.columns.tolist())
you can see columns when this breaks to see if Transaction Time exists over there.
['Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,Buy/Sell,Buy Order Number,Sell Order Number,Price,Qty']```
I would also like to mention I was recommended this
https://github.com/deterministic-algorithms-lab/Cross-Lingual-Voice-Cloning
I was working with this (running into problems of missing files so I couldn't complete it properly) but I have doubts that this would be effective in constructing a voice cloning model for English-Japanese
is that the last output when it breaks?
for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")
    print(chunk.columns.tolist())
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)``` my code @lapis sequoia
I don't know if there are things in the code I need to rewrite for this to work because I haven't figured that out
But I started conflating this with the other code and now I'm a mess
Chunk: 0
['Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol,Buy/Sell,Buy Order Number,Sell Order Number,Price,Qty']``` error ```python
Traceback (most recent call last):
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Transaction Time'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:\office codes\transaction time seprateed code.py", line 46, in <module>
time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'Transaction Time'```
not sure what is wrong then 😓
@lime hemlock you need a separate data frame for transaction time?
is the transaction time is in epoch?
yes
my code here https://paste.pythondiscord.com/ixuvakohem.py @quasi parcel
have you tried chunk['Transaction Time']
@quasi parcel my data here
yes but how i can append my data end of df
the output came from Transaction Time i have to add this in column at end of df
okay
chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]
this should do the trick
@dull turtle
what this wiil return ?
Hello
I need help to delete cells in excel file and shift the rest to left such as
The photo
I will delete any cell before NR and shift left , if the row haven't NR should be deleted and shift up
this will create a new column in the existing df and convert the epoch to a date
let me try
Traceback (most recent call last):
File "F:\office codes\transaction time.py", line 46, in <module>
chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]
File "F:\office codes\transaction time.py", line 46, in <listcomp>
chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m%Y") for y in chunk['Transaction Time']]
OSError: [Errno 22] Invalid argument```
one moment
sure, ping me
chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]
can you show me the code after adding this line please
?
os error generally occur when there is an invalid file path
https://paste.pythondiscord.com/kukicokoli.py plz check code here
I just hope my issue isn't terribly hard to address without testing out the program yourself
Which might be why I never get a response to my issue
@burnt knot what was your issue again can you please give the link
chunk no.: 0
Traceback (most recent call last):
File "F:\office codes\transaction time.py", line 19, in <module>
chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]
File "F:\office codes\transaction time.py", line 19, in <listcomp>
chunk['Transaction time'] = [datetime.fromtimestamp(int(y)/1000).strftime("%d-%m-%Y") for y in chunk['Transaction Time']]
OSError: [Errno 22] Invalid argument```
file path is correct
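One plausible cause of that OSError, offered as an assumption rather than something confirmed in the chat: the earlier script divided the same column by 1_000_000_000, which suggests the values are nanoseconds since the epoch. Dividing by only 1000 then leaves a number far outside the range `datetime.fromtimestamp` can handle. A sketch of the conversion with a made-up value close to the 1.31452E+18 shown in the data:

```python
from datetime import datetime, timezone


def ns_epoch_to_datetime(nanoseconds):
    """Convert nanoseconds since the Unix epoch to an aware UTC datetime."""
    # Split into whole seconds and the leftover nanoseconds.
    seconds, remainder_ns = divmod(int(nanoseconds), 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only stores microseconds, so finer digits are dropped.
    return moment.replace(microsecond=remainder_ns // 1000)


raw_value = 1_314_520_000_000_000_000  # hypothetical, near 1.31452E+18
converted = ns_epoch_to_datetime(raw_value)
# Shift the calendar year by 10, as the original script does.
shifted = converted.replace(year=converted.year + 10)

print(converted.isoformat())
print(shifted.strftime("%d-%m-%Y %H:%M:%S.%f"))
```

Replacing the year keeps the same month and day, whereas adding `timedelta(days=365*10)` drifts by the number of leap days in between. The replace approach raises ValueError for 29 February when the target year is not a leap year, so that case would need its own handling.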
sure
this is what they gave me (tab-separated)
Msgtype Activity Type Transaction Time Exchange Token Symbol Buy/Sell Buy Order Number Sell Order Number Price Qty
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE SELL 0 1.4E+15 473430 1175
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE BUY 1.4E+15 0 255595 1175
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE BUY 1.4E+15 0 255600 500
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE SELL 0 1.4E+15 607185 1175
Order M 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE BUY 1.4E+15 0 255605 1175
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE BUY 1.4E+15 0 255610 250
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE SELL 0 1.4E+15 473425 250
Order M 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE SELL 0 1.4E+15 473420 250
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE SELL 0 1.4E+15 607180 1175
Order N 1.31452E+18 NSEFO 39203 BANKNIFTY 02SEP21 31500.00 CE SELL 0 1.4E+15 473415 500
and this was the code i wrote, which worked
from datetime import timedelta
import pandas as pd
filename_in = "pd_chunks_1.csv"
filename_out = "pd_chunks_2.csv"
df_chunks = pd.read_csv(filename_in, sep='\t', chunksize=2, iterator=True)
for i, chunk in enumerate(df_chunks):
    print(f"Chunk: {i}")
    time_separated = pd.to_datetime(chunk["Transaction Time"]) + timedelta(days=365*10)
    chunk.insert(2, "time_separated", time_separated)
    is_first_chunk = i == 0
    write_header = is_first_chunk
    write_mode = 'w' if is_first_chunk else 'a'
    chunk.to_csv(filename_out, header=write_header, mode=write_mode, index=False, sep='\t')
got it sir let me check
Hey @dull turtle!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:
• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
can i dm u my csv file, hwere i am not able to send file
sure
plz check
I was using a Github repository for weeks trying to turn an English-Russian voice cloning model into an English-Japanese one
I got stuck likely because my data was a mess
I was also recommended another Github repository but am doubtful about it because it seems attuned to one language rather than many, and I wouldn't know how to tweak it for more than one language
First link: https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/wiki/Training-(and-for-other-languages)
Any idea
@desert oar your code works like a charm. The only issue was that @dull turtle's file is comma-separated, but your code reads it as tab-separated
@dull turtle https://paste.pythondiscord.com/iriguhobut.py this should work
i have changed sep='\t' to sep=','
it started working
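Why the separator mattered, sketched with the standard-library `csv` module instead of pandas to keep the example dependency-free. The header line is a shortened copy of the one printed earlier in the log, and `read_header` is a made-up helper. Parsing a comma-separated header with a tab delimiter yields a single column whose name is the whole line, so a lookup of "Transaction Time" fails.

```python
import csv
import io

# Shortened copy of the comma-separated header from the log.
header_line = "Msgtype,Activity Type,Transaction Time,Exchange,Token,Symbol\n"


def read_header(text, delimiter):
    """Parse the first row of `text` using the given delimiter."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


as_tabs = read_header(header_line, "\t")
as_commas = read_header(header_line, ",")

print(len(as_tabs), "Transaction Time" in as_tabs)
print(len(as_commas), "Transaction Time" in as_commas)
```

This matches the single-element list that `chunk.columns.tolist()` printed before the KeyError.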
@burnt knot what is wrong in first repo
?
could you please elaborate on the error
Okay
So the guy who made it listed the voice banks he used for his data for both English and Russian
I couldn't use his English voice bank because I had complications downloading it so I used my own, and I got my Japanese voice bank
The audio work starts in the encoder stage
It took me awhile to get anywhere with the audio but I eventually got it to train
But then I got to the synthesiser stage
I couldn't run the first command because it stated it needed a metadata.csv file, which I could not find
I take it that it's a graph with all the metadata or however that works
My point is I can't run the step without having the voice metadata file to give it
I don't know if it mattered on the kind of voices I was using--I'm using a new batch this time around
But I can't figure out how to work that synthesiser step
You want me to share it?
No I mean
Do you need the link to the Colab I used?
Or do you prefer to work on the hard drive?
Ask away
There isn't a queue, you can expect a response eventually
anything is fine with me
hey, please help.. I have code that adds two SMA (simple moving average) columns to my data
Its job is to add 2 SMA columns on every loop iteration and then start the next iteration from the original data frame (at the beginning of the loop I do df=data for that). But for some unknown reason I get one more column every iteration, as if it was still working on the same old data.
import pandas as pd
def checkBestSMA(file, minSMA1, maxSMA1, minSMA2, maxSMA2):
    maxSMA1 += 1
    maxSMA2 += 1
    data = pd.read_csv(file)
    # read file .csv
    data = data.iloc[:, [0, 5]]
    # get only DATE and stock price
    for i in range(minSMA1, maxSMA1):
        for j in range(minSMA2, maxSMA2):
            df = data
            SMA1 = f"SMA{i}"
            SMA2 = f"SMA{j}"
            df[SMA1] = df['Adj Close'].rolling(i).mean()
            df[SMA2] = df['Adj Close'].rolling(j).mean()
            df = df.dropna()
            print(df.head())
if __name__ == '__main__':
    checkBestSMA('PKN.WA.csv', 20,50,40, 200)
Date Adj Close SMA20 SMA40
39 2010-02-25 22.858307 23.167923 24.311018
40 2010-02-26 23.519203 23.131645 24.289287
41 2010-03-01 23.914301 23.126617 24.251393
42 2010-03-02 24.711685 23.133801 24.210088
43 2010-03-03 24.783520 23.132005 24.168782
Date Adj Close SMA20 SMA40 SMA41
40 2010-02-26 23.519203 23.131645 24.289287 24.291705
41 2010-03-01 23.914301 23.126617 24.251393 24.280141
42 2010-03-02 24.711685 23.133801 24.210088 24.262620
43 2010-03-03 24.783520 23.132005 24.168782 24.224074
44 2010-03-04 24.855356 23.196657 24.130170 24.185527
Date Adj Close SMA20 SMA40 SMA41 SMA42
41 2010-03-01 23.914301 23.126617 24.251393 24.280141 24.282719
42 2010-03-02 24.711685 23.133801 24.210088 24.262620 24.290416
What do you mean? Do you need something?
Try now?
Date Adj Close SMA20 SMA40
39 2010-02-25 22.858307 23.167923 24.311018
40 2010-02-26 23.519203 23.131645 24.289287
Date Adj Close SMA20 SMA41
40 2010-02-26 23.519203 23.131645 24.291705
41 2010-03-01 23.914301 23.126617 24.280141
Date Adj Close SMA20 SMA42
40 2010-02-26 23.519203 23.131645 24.291705
41 2010-03-01 23.914301 23.126617 24.280141
......
@quasi parcel
I can send you .csv if you want
could you
To me this looks like weird behavior of the pandas DataFrame
I wonder why this happens (or maybe I have a bug)
can you share a screenshot on how you want the data to be
its kinda confusing sorry
ok, - you know its part of bigger project
so right now you're basically adding a lot of columns right?
and what behaviour do you want?
No, this function should add only two columns, and then throw the dataframe away
and start with a new one, so I do df=data
at the top of the block in the loop
okay okay okay so you want to use new df everytime
DataFrame.copy(deep=True)
Make a copy of this object’s indices and data.
When `deep=True` (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
When `deep=False`, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
try with df = data.copy()
great 😊 gni8
Could you shed some light on why that's happening?
sure. in a nutshell df=data means they both are referring to same object.
it's same for lists, dicts.
while
a=5
b=a
a+=1
will only change a
it is not same case with lists or dicts or np arrays or dfs
I'm on cellphone rn otherwise I'd give you an example.
correct @lapis sequoia sorry i was confused @modest timber
ok, np 😄
but in a nutshell they both are referring same address so changing something at that place will be like changing for both.
this is like we both have address of same house
so if you put a glass in that house, we both will see a new glass.
@lapis sequoia ok, I pretty much understand, it's like a pointer in C 😄
yeah exactly! it's nothing but pointer!!
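A minimal sketch of the difference, using a made-up four-row frame instead of the PKN.WA.csv data (the values and the `alias` / `fresh` names are only for illustration): plain assignment gives a second name for the same DataFrame, while `.copy()` gives an independent one.

```python
import pandas as pd

data = pd.DataFrame({"Adj Close": [10.0, 11.0, 12.0, 13.0]})

# Plain assignment only binds a second name to the same DataFrame.
alias = data
alias["SMA2"] = alias["Adj Close"].rolling(2).mean()
print(alias is data)         # both names point at one object
print(list(data.columns))    # the "original" gained the SMA2 column too

# .copy() makes an independent object, so new columns stay out of `data`.
fresh = data[["Adj Close"]].copy()
fresh["SMA3"] = fresh["Adj Close"].rolling(3).mean()
print(list(data.columns))    # unchanged by the edit to `fresh`
print(list(fresh.columns))
```

In the SMA loop this would correspond to starting each iteration with `df = data.copy()`, as suggested above.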
you too😊
I think you need to reroute the folder
Did you clone it into your drive?
Have you used Colab before?
For work I need to create something that, once complete, can be put on a flash drive and run on a computer that doesn't have internet access. So part of that means making sure that the finished product doesn't make API calls, but it also means being able to either (a) download all the dependencies before building the environment or (b) building a portable environment. I'm not really sure how to do either because I always just make a venv and pip install the requirements.
pyinstaller or nuitka maybe
pyinstaller i think just bundles the entire runtime and all your deps into a single .exe
I don't want that. I should have been more specific: it's going to be a jupyter notebook
(not my decision)
I'm waiting to hear back from the project lead about all of that. I guess, could one build the wheel on a VM that's the same OS? Are wheels portable?
you can build a noarch wheel, but it won't help with deps
does your usb stick need to include python, jupyter, etc.? or just the deps for your particular notebook?
pyinstaller might still be the way to go, since jupyter is a python application after all
probably just the deps
so the target system will have python, ipykernel, and jupyter, but maybe not torch, pandas, matplotlib, etc?
you might want to ask in #tools-and-devops tbh, this sounds not-data-science-specific
it's data science if I'm the one doing it 
But I guess I need to figure out more about the requirements first
yeah it sounds like it
Hey everyone, I'm looking for help with geopandas, json, and folium. Is anyone able to help me on a project?
It's best to just put your question out there. Then someone can try to answer it.
Yes
Yes I did
you can kinda play around with !ls and !cd to get exact path. once you do that you can use !pwd to get absolute path which will help for next time.
any data scientist here?
i employ a lot of data science techniques in my work but my job description isn't data scientist
Yoo guys I have been learning python for about 6 weeks, made some guis with tkinter and used mysql databases to write data etc, you think I can move onto AI to do stuff like facial detection or is this too advanced for me and I should wait and learn more before doing this?
What's important is your proficiency in mathematics... the programming is just to implement an idea, and most data science / AI programming is procedural anyway
What standard should my maths be at?
@tender hearth So i'd need to know that to do AI?
You'd need to know that to understand how it works behind the scenes
even though libraries abstract a lot of it away from you it's still essential when gluing stuff together
damn, maths isn't my strong suit
It's alright, that's what we are here for 😆
If you have any questions feel free to ask
thank you
My matplotlib pie chart only shows one color
Hi, how on (or off!) earth do I get cuda/torch to play together on wintendo 10?
can anyone help me with this?
no code, no dice
```py
def format_rgb(rgb):
    # precision format for Float numbers to a string
    return '(' + ', '.join(format(val, '06.2f') for val in rgb) + ')'
rgb_labels = rgb_df.apply(format_rgb, axis=1)
rgb_counts = rgb_labels.value_counts()
print(rgb_counts[0])
rgb_10 = []
hex_10 = []
for k in range(len(rgb_counts)):
    if rgb_counts[k] >= 3:
        rgb_10.append(rgb_counts[k])
        hex_10.append(real_col[k])
def pct_to_absolute(value):
    a = rgb_10
    return a
plt.figure(figsize=(8, 6))
plt.pie(rgb_10, labels=hex_10, colors=hex_10, normalize=True, autopct=pct_to_absolute)
plt.show()
```
- use a pastebin, gist, ...
- NameError: name 'ara' is not defined
sorry, next time pastebin... ara is a np.array, which looks like this:
```
[[218.69241573 215.50561798 193.4508427 ]
 [252.28142803 250.81048234 232.56323585]
 ...
 [188.23099415 177.65497076 155.31067251]
 [151.82307692 144.07802198 128.6978022 ]
 [ 70.58236659 68.68097448 62.73085847]]
```
it's good practice to prepare a standalone, runnable, minimal example before asking. This lets you figure out the problem more easily yourself, and people trying to help can do just that, instead of debugging how to set up the environment details etc
I can't run this without e.g. adding example data as ara
Could you guys help me collect some data for a data science project? https://forms.gle/TgLm8N5HwkFfMPbf7
How can I print all Values in the console without these ..?
Hi guys! I am stuck in an ML project... anyone that can guide me through?
I'm actually stuck in an assignment for kNN
Have it due in some days
But I am not able to resize the data
It's a number recognition file in a .mat format and it includes 8s and 9s
I need to plot testing error rate vs k
Please dont judge, im new to ML
from scipy.io import loadmat
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np
import seaborn as sbn
from sklearn.neighbors import KNeighborsClassifier
dataset = loadmat(r"C:\Users\Kushal\Downloads\NumberRecognition.mat")
dataset.keys()
zeros = dataset["imageArrayTraining0"]
eights = dataset["imageArrayTraining8"]
nines = dataset["imageArrayTraining9"]
img = nines[:,:,1]
#plt.imshow(img, cmap="Greys")
#creating labels
y_eights = np.zeros(eights.shape[-1])
y_nines = np.ones(nines.shape[-1])
np.concatenate([eights,nines], axis = 2).shape
np.transpose([eights,nines]).shape
Need to reshape the data next and then get X and y
after that need to train and predict the model
But stuck here as i dont know what to do next
what do you need to reshape it to?
also, numpy functions like np.concatenate return a new array
if you want to use the modified array, you need to assign it to a variable
otherwise np.concatenate does nothing useful, since its result is not being used
you can also find lots of tutorials on reshaping
Nobody has any context for this, I know what it is because I wrote the code that created it, but you will need to be more specific when asking this kind of question in general
@gaunt marsh that's a pandas Series you are looking at
Yes, but shouldn‘t print() give out all values?
@gaunt marsh numpy and pandas data structures will truncate what gets displayed, but one can override this
The reason being that the size of the structures can be exceptionally large
Try setting pd.options.display.max_rows to None, or looping over idx, val in series.items() and printing each accordingly
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
I wonder why @arctic wedge didn't do its thing and tell you to put it in a codeblock already
thanks!
!docs pandas.Series.to_string
Series.to_string(buf=None, na_rep='NaN', float_format=None, header=True, index=True, length=False, dtype=False, name=False, max_rows=None, min_rows=None)
Render a string representation of the Series.
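A short sketch of the two options mentioned above for seeing every value of a long Series. The 200-element Series is a stand-in, not the data from the question.

```python
import pandas as pd

series = pd.Series(range(200), name="counts")

# The default repr shortens long Series and puts dots in the middle.
short = repr(series)

# Option 1: render every row as one string.
full = series.to_string()

# Option 2: lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    full_repr = repr(series)

print(full)
```

`pd.option_context` restores the previous setting when the block ends, which may be preferable to changing `pd.options.display.max_rows` globally.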
I have another question. Is it possible, to write Python code which does something like: every 10th textfile should go in directory x
I would probably use that
that means that the next 10 files should go in directory y and so on
It is!
you can use os.walk, I guess
I usually use pathlib
I guess that I need two loops. One which goes through all dirs and one which goes through all files, which shall be moved, right?
you can make a flat list of all the files and then use chunked https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked
you'd have to pip install more-itertools but that's easy
A 2d matrix so its right now its a 3d matrix and i want to reshape it to a 2d matrix
you have to be careful about how you reshape things so that you don't mess up the association between axes
Lol and I dont know what to do after reshaping
what is the current shape and what is the desired shape?
Use a single loop with enumerate checking if the index is % 10
I considered that solution as well. they'd probably need another iterator of directory names that they could advance with next
oh, i misunderstood the original question
@gaunt marsh note that this isn't a #data-science-and-ml and you can/should feel comfortable asking in a help channel, see #❓|how-to-get-help
import os
dirs = ['a', 'b', 'c']
files = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
iter_dirs = iter(dirs)
for i, f in enumerate(files):
    if i % 3 == 0:
        d = next(iter_dirs)
    os.rename(f, os.path.join(d, f))
something like that
normally i'd recommend using pathlib.Path.glob in real life. maybe also you want the directory list to cycle forever?
from itertools import cycle
from pathlib import Path
dirs = [Path('a'), Path('b'), Path('c')]
paths = Path('source-files').glob('*.txt')
for d, p in zip(cycle(dirs), paths):
    p.rename(d / p.name)
What I want to do is split all my 13,754 text files into subfolders "folder01, folder02 ...", with a new folder starting every 600 files
The folders are already created via cat folder.txt | xargs mkdir
folder.txt contains the names of the folders
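One way this could look with pathlib, as a sketch only: the `distribute` helper, the `source-files` directory name and the `folderNN` naming are assumptions taken from the messages above, and the demo runs on seven throwaway files in a temporary directory with a chunk size of 3 instead of 600.

```python
from pathlib import Path
import tempfile

def distribute(source: Path, target_root: Path, chunk_size: int = 600) -> None:
    """Move *.txt files from source into folder01, folder02, ... in chunks."""
    files = sorted(source.glob("*.txt"))
    for index, path in enumerate(files):
        folder = target_root / f"folder{index // chunk_size + 1:02d}"
        folder.mkdir(parents=True, exist_ok=True)  # harmless if it already exists
        path.rename(folder / path.name)

# Tiny demo in a throwaway directory: 7 files, 3 per folder.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source-files"
    src.mkdir()
    for n in range(7):
        (src / f"doc{n}.txt").write_text("x")
    distribute(src, Path(tmp), chunk_size=3)
    layout = {d.name: len(list(d.glob("*.txt"))) for d in sorted(Path(tmp).glob("folder*"))}
    print(layout)
```

Integer division of the file index by the chunk size picks the folder, so no second loop or separate directory iterator is needed.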
@gaunt marsh might want to have a look at zmv if you're on mac/linux; it is great for stuff like this
do try with zmv -n to see what will be the result
current shape is (750, 28, 28,) desired is (1500, 768)
if im transposing, it becomes 750,28,28,2 idk why
i am getting this error ```python
Traceback (most recent call last):
File "F:\office codes\resampling code.py", line 19, in <module>
chunk['Price'].resample('1T').sum()
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 957, in f
return self._downsample(_method, min_count=min_count)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1053, in _downsample
self._set_binner()
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 195, in _set_binner
self.binner, self.grouper = self._get_binner()
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 202, in _get_binner
binner, bins, binlabels = self._get_binner_for_time()
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1042, in _get_binner_for_time
return self.groupby._get_time_bins(self.ax)
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1499, in _get_time_bins
first, last = _get_timestamp_range_edges(
File "C:\Users\shubh\anaconda3\lib\site-packages\pandas\core\resample.py", line 1746, in _get_timestamp_range_edges
index_tz = first.tz
AttributeError: 'NaTType' object has no attribute 'tz'```
my code ```python
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}' , engine='python', chunksize=100000 , iterator=True)):
    chunk['Transaction Time'] = pd.to_datetime(chunk['Transaction Time'], errors='coerce')
    chunk = pd.DataFrame(chunk).set_index('Transaction Time')
    chunk['Price'].resample('1T').sum()
    print('chunk..')
    print(chunk)
    print()```
looks to me like you have mixed a non-timestamp value into a dataframe where you're acting on it like it is a timestamp
see
chunk..
                              Msgtype Activity Type  ...   Price   Qty
Transaction Time                                     ...
2011-08-28 09:15:02.006138097   Order             N  ...  473430  1175
2011-08-28 09:15:02.707897899   Order             N  ...  255595  1175
2011-08-28 09:15:03.373856246   Order             N  ...  255600   500
2011-08-28 09:15:04.159525439   Order             N  ...  607185  1175
2011-08-28 09:15:05.213452151   Order             M  ...  255605  1175
...                               ...           ...  ...     ...   ...
2011-08-28 09:15:14.243840962   Order             M  ...   21815    50
2011-08-28 09:15:14.243913774   Order             M  ...   21820    25
2011-08-28 09:15:14.243978819   Order             M  ...   21825    50
2011-08-28 09:15:14.244046696   Order             M  ...   21830    25
2011-08-28 09:15:14.244110503   Order             M  ...   21835    50
my dataframe
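A possible cause, sketched on made-up rows: `errors='coerce'` turns any unparseable timestamp into NaT, and a NaT in the index can trip up `resample`. The sample values below are hypothetical; only the column names come from the code above. Dropping the NaT rows before setting the index is one way around it.

```python
import pandas as pd

# Hypothetical sample: one row has a value that cannot be parsed as a time.
raw = pd.DataFrame({
    "Transaction Time": ["2011-08-28 09:15:02", "not a time", "2011-08-28 09:16:30"],
    "Price": [100, 999, 250],
})

raw["Transaction Time"] = pd.to_datetime(raw["Transaction Time"], errors="coerce")
n_bad = int(raw["Transaction Time"].isna().sum())   # rows that were coerced to NaT

# Drop the NaT rows first, then index by time and resample per minute.
clean = raw.dropna(subset=["Transaction Time"]).set_index("Transaction Time")
per_minute = clean["Price"].resample("1min").sum()   # "1min" means the same as "1T"
print(n_bad)
print(per_minute)
```

Checking `n_bad` per chunk also shows whether bad rows are actually present, which the truncated printout cannot confirm.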
is this yet another person working on the same program?
no i am only
or did you change your username @dull turtle ? are you daniel?
yes i am
ok, thanks. sorry
now i am working on this issue
!e Whatever you reshape an array to, the product of the two shapes has to be the same.
from math import prod
print(prod((750, 28, 28)) == prod((1500, 768)))
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
False
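The check fails for two reasons: one array holds 750 images rather than 1500, and 28 * 28 is 784, not 768. A sketch of stacking both digit arrays and flattening them, assuming the images are stored as (height, width, n_images) as the `eights.shape[-1]` label code suggests (synthetic arrays stand in for the .mat data):

```python
import numpy as np

# Synthetic stand-ins; assumed layout is (height, width, n_images).
eights = np.zeros((28, 28, 750))
nines = np.ones((28, 28, 750))

stacked = np.concatenate([eights, nines], axis=2)   # (28, 28, 1500)
images_first = np.moveaxis(stacked, -1, 0)          # (1500, 28, 28)
X = images_first.reshape(len(images_first), -1)     # (1500, 784), one row per image
y = np.concatenate([np.zeros(eights.shape[-1]), np.ones(nines.shape[-1])])

print(X.shape, y.shape)
```

If the arrays are really stored images-first, as (750, 28, 28), the `moveaxis` step is unnecessary and the concatenation would go along `axis=0` instead.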
How can I iterate through a numpy array? I am getting TypeError: 'numpy.int64' object is not iterable
Very good question baby
You aren't iterating over an array to begin with; you have a scalar value of an int NOT an array of int
TLDR: Whatever is in that value isnt an array. @gaunt marsh and PS: baby freeza has horns I think; your profile picture is a lie!
thank you that helped me!
I have one last question. This is my Array: [11, 7, 7, 7, 6, 6, 5, 5]
I give this to a pie chart and it prints this:
This is wrong because every value in that array belongs to one part of the pie
instead it prints all values to all parts
Do you have any Idea how to make it plot the right value to each part?
@reef wing this is my favorite xbox 360 avatar 😛
So your array is good now; BUT your plotting method is bad; the only way for me to fix it is to see the plotting code.
quick question how do you split the file inside the folder into test-train and validation
you don't split the file normally, you split the data into two separate variables
you can split the file also; but that would require reading in one file and creating two separate files
I tried to reason with my Senior guy, he said I can do it and he show me the folder that he did
it was in npy file
yeah if they are paying you to split a file then split the dang file lol
manually?
I recommend reading it with the numpy module np.load()
Pastebin
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
rgb_10 outputs [11, 7, 7, 7, 6, 6, 5, 5]
I think the common way to save or write to a file is np.save("filename.npy")
so
data = np.load("input.npy")
train, test = some_split_logic_you_invent(data)
with open('train.npy', 'wb') as f:
    np.save(f, train)
with open('test.npy', 'wb') as f:
    np.save(f, test)
I did 50% of the work there you just need to make that function to split and return two numpy arrays
woops I was wrong: https://numpy.org/doc/stable/reference/generated/numpy.save.html
sklearn has a train test split function
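For the split step itself, a numpy-only sketch is below. `split_rows` is a hypothetical helper and the small array stands in for whatever `np.load("input.npy")` returns. scikit-learn's `train_test_split` does the same job with more options.

```python
import numpy as np

def split_rows(data, test_fraction=0.2, seed=0):
    """Shuffle the rows of `data` and return (train, test)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = int(round(len(data) * test_fraction))
    return data[order[n_test:]], data[order[:n_test]]

data = np.arange(50).reshape(10, 5)   # stand-in for np.load("input.npy")
train, test = split_rows(data, test_fraction=0.3)
print(train.shape, test.shape)
```

A validation set could be carved out the same way by calling the helper again on `train`.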
having issues loading an xgboost booster model saved from R into python and getting consistent predictions. .. any caveats i might need to take into consideration?
Give me a second as my current virtualenv doesnt have matplotlib to figure it out @gaunt marsh
AH! I had that problem last year DRE
That's exactly why we had to move back to SKLEARN
are you storing into a pickle file as I did?
https://xgboost.readthedocs.io/en/latest/faq.html
Slightly different result between runs¶
This could happen, due to non-determinism in floating point summation order and multi-threading. Though the general accuracy will usually remain the same.
This can happen even if you don't store the file but pickles definitely exacerbated the problem
no, .. saved from R. .. xgb.save('model.model') ... then loading in python
Yeah; i guess storing it as an object regardless of the method right now causes this to happen
the seed should stay the same within the same run; it's just that it seems to be set differently when running outside the same script instance
so no fix? i need to continue using our model in R?
It may be different even if you run it again in R
ah
the problem is if you get a consistent model like the Gradient booster in SKLEARN its like 10X slower
no, i set the seed in r (23) and am able to get consistent predictions
Nice perfect; just double test it with that config of seed to make sure though;
i tried exporting the config from R as well and loading that in python while loading the model, still crappy predictions
I had that problem and I actually had misinterpreted the output
sometimes the output array is flipped
i'm good with r, but would prefer python due to other requirements. .. namely automation and shap
really? flipped? ... i'll check on that
yeah lmao I was classifying my customers as Assasins instead of Enjoying my product on text classification for like a whole day; did not end well. Hence the support team contacting people that actually loved the product lol
lol
Thank god I was the only tech person that understood what was going on and I didn't get in trouble
i'm working on attrition, but yeah.. flipped results will definitely be noticed here
ok 🙂 I use virtualenv's as well when I code with PyCharm
yeah the only way I did it is by using plt.pie(x=data_array, labels=label_array)
where data_array elements and label_array elements are uni-dimensional and each element corresponds with each other
that method wont show the numbers inside the piechart; but has correct labels
It doesn't matter if the correct value for each slice is in the pie or as a label. I still didn't get how to have the correct value for each part of the pie and not the whole array...
the data_array is rgb_10 for me which is [11, 7, 7, 7, 6, 6, 5, 5]
Where does real_col come from?
also people will nag at you when your coding for money about your loop
rgb_10 = []
hex_10 = []
for k in range(len(rgb_counts)):
if rgb_counts[k] >= 5:
rgb_10.append(rgb_counts[k])
hex_10.append(real_col[k])
rgb_10 = []
hex_10 = []
for count, col in zip(rgb_counts, real_col):
if count >= 5:
rgb_10.append(count)
hex_10.append(col)
@reef wing no go on the flipped output array. ... is there somewhere i should be setting the seed outside of np.random.seed? does the model not save that as well?
Tbh I think it's what I stated before; when saving the object into your file system it kinda deletes the local variables in the methods of the XGBoost object when you load it
regardless of the language you use to boot up the model
But it was a year ago
bummer, did you find out how to port a specific model over? ... i tried pickling via reticulate in r but get a 'cannot pickle pycapsule object' error
doesn't look like dill supports that either
yeah for models that had to share with other people I used sklearn and it was consistent
for my own models I winged it and used xgboost
so sklearn implementation of xgboost is less accurate?
sklearn implementation is more consistent when storing the model into hard disk; but slower in training stage
gotcha
is it worth training in xgboost native and pickling to sklearn to save and distribute?
I don't think there is a transformer for XGboost to sklearn model tbh
Not 100% certain
oh, .. my mistake. .. was thinking the XGBRegressor could use booster
Depends on what you are talking about; but I think the "boost" is using stochastic methods instead of deterministic methods to train
hence the randomness; but that randomness boosts the speed of training
@gaunt marsh
Did u fix ur problem freeza?
autopct is what helps you label inside the pie
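A sketch of why every slice showed the whole array, and one way around it: when `autopct` is a function, matplotlib calls it once per wedge with that wedge's percentage, so returning `rgb_10` prints the full list on each slice. Converting the percentage back to a count gives one value per slice. The counts are the ones quoted above; the colors and labels are left out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

counts = [11, 7, 7, 7, 6, 6, 5, 5]
total = sum(counts)

def pct_to_absolute(pct):
    """autopct passes each wedge's percentage; convert it back to a count."""
    return str(int(round(float(pct) * total / 100)))

fig, ax = plt.subplots(figsize=(8, 6))
wedges, texts, autotexts = ax.pie(counts, autopct=pct_to_absolute)
print([t.get_text() for t in autotexts])
```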
hey i made some code logic to go in the def you gave me, can you check it?
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
'''py
total_count = 0
total_frames = 0
overlap_count = 0
overlap_frames = 0
for i in manuals_spans:
    total_count += 1
    total_frames += (i[1]-i[0])
    check = 0
    for j in comp_spans:
        if (j[0] > i[0] and j[0] < i[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (i[1] - j[0])
        elif (i[0] > j[0] and i[0] < j[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (j[1] - i[0])
percent_called = (float(overlap_count)/total_count)*100
percent_overlap = (float(overlap_frames )/total_frames)*100
print(percent_called, percent_overlap)
'''
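A compact alternative for the overlap length, offered as a sketch: the overlap of two spans is the smaller end minus the larger start, floored at zero. This also covers the case where one span sits entirely inside the other, which the if/elif version above seems to overcount. The function names and sample spans are made up, and the computational spans are assumed not to overlap each other.

```python
def span_overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1] and [lo2, hi2] (0 if disjoint)."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))

def percent_overlap(manual_spans, comp_spans):
    """Percent of total manual-span length covered by computational spans."""
    total = sum(hi - lo for lo, hi in manual_spans)
    covered = sum(
        span_overlap(m_lo, m_hi, c_lo, c_hi)
        for m_lo, m_hi in manual_spans
        for c_lo, c_hi in comp_spans
    )
    return 100 * covered / total

manual = [(0, 10), (20, 30)]
comp = [(5, 12), (22, 25), (40, 50)]
print(percent_overlap(manual, comp))
```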
Thanks!
Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle.
It is normally used to estimate a parameter in a statistical formula
Like the MLE method or some other method like method of moments
Except MLE and method of moments use pure mathematical solutions
it's in essence an old version of the reinforcement machine learning technique
hey yall, anyone know if conda tends to work ok and not mess up the rest of a system if I simply want to use strictly conda or strictly not-conda for each different project? years ago it used to trample stuff or something. (thread continued from #help-cheese)
Yes
Stick to whatever you already know and use well
Dont invent hypothetical problems just be productive until you actually face a problem at this point the value isnt the software for data science its the applications of it
wait, @bitter fiber, that answer is slightly surprising as a response to my question so I'm confused if it was to me. you're in fact saying "yeah, it still tramples stuff, avoid conda unless it's in docker for someone else's project" basically?
Oh no lol i mean use conda if thats what you are already using
Trampling stuff no but yes on 2nd point
Imho i just use normal python and i’ve done ok
to clarify, a lot of packages I've seen recently have been saying "hard to install without conda", so I want to see what these things are that are mostly conda-only, and try them out a bit. but to do that, I want to make sure if I try conda, it won't mess with my existing python installations (system, and on some computers, pyenv)
eg homebrew messes other pythons up
This is more of an operating system problem; as I would use whereis python or python --version and not a true data science problem; I recommend asking in another channel as they can help you more. I have trouble with this too; which is why I have so many computers and am conservative with what I download to not increase stress.
also note that every environment will use a separate pip and python combination; hence there will be no problems if maintaining a project within an environment.
I’m not sure if this is something appropriate for this channel, or perhaps #async-and-concurrency instead, but I was wondering if anyone here had insight on the best way to train multiple RL agents “together”? By together though, I just mean training like multiple Q-learning agents with different hyper parameters at the same (or near the same) time
Do the two RL agents communicate?
If they don't, I would use the below; instead of x*x, train the model in the function
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
wierd i cant design my code
Any recommendations for APIs or free solutions to easily get the complete paginated search results from a google query search?
i wish
Stack Overflow
Google Web Search API has been deprecated and replaced with Custom Search API (see http://code.google.com/apis/websearch/).
I wanted to search the whole web but it looks like with the new API only
Google Custom Search gives you 100 queries per day for free.
After that you pay $5 per 1000 queries.
There is a maximum of 10,000 queries per day.
That doesn't sound like extortion. Thanks!
They wouldn’t have to interact or share any info between themselves, something like what you shared was what I was thinking but I wasn’t sure if there was something more elegant for training ML/RL algorithms
You can pass a parameter to most machine learning modules that increase the number of processors you want to use in the model
but if you're doing reinforcement learning that's likely not the case; as you are developing it yourself most of the time.
```py
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
```
I added color to the code to make it more elegant 😄
Yea my goal is to build everything ground up but train different agents in parallel to do hyperparameter searching
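A toy sketch of that idea with the same `Pool.map` pattern: each worker gets one hyperparameter setting and returns its result. `train_agent` is a hypothetical stand-in (an incremental average with step size `alpha`, not a real Q-learning agent), so only the parallel structure carries over.

```python
from multiprocessing import Pool
import random

def train_agent(params):
    """Toy stand-in for a training run: returns (params, final_estimate)."""
    alpha, seed = params
    rng = random.Random(seed)
    estimate = 0.0
    for _ in range(1000):
        reward = rng.gauss(1.0, 0.5)             # noisy reward with true mean 1.0
        estimate += alpha * (reward - estimate)  # incremental update with step size alpha
    return params, estimate

if __name__ == "__main__":
    # One task per (alpha, seed) combination; the agents never communicate.
    grid = [(alpha, seed) for alpha in (0.01, 0.1, 0.5) for seed in (0, 1)]
    with Pool(processes=3) as pool:
        for params, estimate in pool.map(train_agent, grid):
            print(params, round(estimate, 3))
```

Passing the seed along with the hyperparameters keeps each run reproducible regardless of which worker picks it up.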
I love the notion of reinforcement learning. Just let the computer figure it out, sounds like super high tech but so simple lol.
That's why I'm trying to play with it lol, my goal is to go through the Sutton & Barto book and replicate their graphs before moving onto some more complex/modern approaches
The whole "figure it out yourself" is much more interesting to me than "here's a million pictures, find the dog" lol
Though I know that's an oversimplification
I want to do it for chess tbh
or checkers; since it's simpler
Simply because I want to make and practice making new dashboards in streamlit
Hey @quasi parcel!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:
• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
Thank god good bot
oo @daring blaze someone made a checkers with rest api connection so you can play with your code
we're trying to plot and calculate the integral with the Monte Carlo method
can you maybe help?
if you are available?
I think you should ask direct questions of where you are stuck and not ask for someone to help with the whole thing as we aren't getting paid
we're stuck at getting the dots to work
ehm
this is the code rn
and the assignment is: plot the graph, make the points outside the graph red and inside the graph green
and watch out when the graph goes below the x axis because then the values get negative
that's where we're stuck rn
because the rest is working
I am still not sure which specific part you are stuck :/
no worries ❤️
i explained it worng probs
ehm
when the values go below the y axis
*x axis
the code doesnt hold
and yeah the other point
on the picture
to check if a value went below the x axis, you have to check whether that value is less than 0
ehm
hmm
below the x axis
are negative yvalues
i got it a bit
```py
#for y_values >= 0:
if random_y_value <= y_values:
    green_dots_y_value.append(random_y_value)
    green_dots_x_value.append(random_x_value)
    counter_points_inside_graph += 1
else:
    red_dots_y_value.append(random_y_value)
    red_dots_x_value.append(random_x_value)
    counter_points_outside_graph += 1
```
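One way to handle the part below the x axis, as a sketch with made-up names: a point counts as "inside" when it lies between the axis and the curve, adding +1 above the axis and -1 below it. Everything else is "outside". The sine example and the bounding box are assumptions for the demo, not the assignment's function.

```python
import math
import random

def mc_integral(f, a, b, y_min, y_max, n=100_000, seed=0):
    """Hit-or-miss Monte Carlo estimate of the integral of f over [a, b].

    Points between the x axis and the curve are "inside" (green); they count
    +1 above the axis and -1 below it. Everything else is "outside" (red).
    """
    rng = random.Random(seed)
    signed_hits = 0
    inside, outside = [], []
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(y_min, y_max)
        fx = f(x)
        if 0 <= y <= fx:          # under the curve, above the axis
            signed_hits += 1
            inside.append((x, y))
        elif fx <= y < 0:         # above the curve, below the axis
            signed_hits -= 1
            inside.append((x, y))
        else:
            outside.append((x, y))
    box_area = (b - a) * (y_max - y_min)
    return box_area * signed_hits / n, inside, outside

estimate, green, red = mc_integral(math.sin, 0, 1.5 * math.pi, -1, 1)
print(estimate)   # the exact integral of sin over [0, 1.5*pi] is 1.0
```

The `green` and `red` lists can be passed to a scatter plot as they are. The bounding box has to reach from at or below the function's minimum to at or above its maximum.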
https://paste.pythondiscord.com/toquxujeco.apache
the link above is the data. my problem is that my code doesn't fetch all of the product_ids:
even though the product ids are there in the data column, the code that should separate them and
keep them in a list isn't working for this case
this is the code https://paste.pythondiscord.com/bigujivewa.py
i am not understanding how to solve
this is the out put i am getting
but this is the expected out
i am not understanding why it is happening like htis
Is anyone familiar with the fashion MNIST data set?
@quasi parcel the first problem is that you are not using the methods that pandas has for dataframes
also i think you need to wrap your regex thing with parenthesis before using the plus sign
so you can capture more than one thing
@silver sun a lot are , afterall this is ai channel
res = re.findall("'Product_Id': '[(0-9)+]'", str((r))) note the plus inside the brackets idk how one would handle comma delimited in re module but I wouldn't even use re module to begin with for this problem.
It would be better to ask about the problem you are facing rather than asking to ask
Yeah I also prefer downloading the model already trained from the MNIST site http://yann.lecun.com/exdb/mnist/
Instead of training one yourself; unless you are learning how to train models ( It's out there just cant find it right now )
Yea sorry about that. I am trying to display the first 5 images for each class in my 10 x 5 grid but I'm stuck.
fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(10,5))
for i in range(5):
    image = X_train[i]
    label = Y_train[i]
    label_name = label_names[label]
    axs[i].imshow(image, cmap='gray') # imshow renders a 2D grid
    axs[i].set_title(label_name)
    axs[i].axis('off')
plt.show()
What Happens
are you displaying anything at all at this point?
When run code
So far I have just figured out to print out the first 5 pictures
you hardcoded 5 in many places
So that may be it; try setting num_images = 10 on top and putting num_images instead of 5 hardcoded
You seem to have reached your own goal after re-reading everything you wrote lol
I am trying to display the first 5 images for each class then So far I have just figured out to print out the first 5 pictures
This is what I have so far. I wanted to see the first 5 of each class. Like 5 tshirts, 5 pullovers, etc.
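One way to get "5 of each class" is to first pick the indices per label and only then plot. This is a sketch, not the only way: `first_n_per_class` and `plot_grid` are made-up helper names, labels are assumed to be integers 0..9 as in Fashion-MNIST, and every class is assumed to have at least 5 samples.

```python
import numpy as np

def first_n_per_class(labels, n_classes=10, n_per_class=5):
    """Return an (n_classes, n_per_class) array of sample indices."""
    labels = np.asarray(labels)
    rows = []
    for cls in range(n_classes):
        # Indices of all samples of this class, in dataset order; keep the first few.
        rows.append(np.flatnonzero(labels == cls)[:n_per_class])
    return np.stack(rows)

def plot_grid(images, labels, label_names, n_per_class=5):
    # Imported here so the index helper above works without matplotlib.
    import matplotlib.pyplot as plt
    picks = first_n_per_class(labels, len(label_names), n_per_class)
    fig, axs = plt.subplots(nrows=len(label_names), ncols=n_per_class, figsize=(10, 20))
    for r, row in enumerate(picks):
        for c, i in enumerate(row):
            axs[r, c].imshow(images[i], cmap='gray')
            axs[r, c].set_title(label_names[r])
            axs[r, c].axis('off')
    plt.show()

# Tiny demo with two classes and two picks per class.
print(first_n_per_class([0, 1, 0, 1, 1, 0], n_classes=2, n_per_class=2))
```

With the variables from the snippet above, the call would be `plot_grid(X_train, Y_train, label_names)`.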
could you please suggest a better way for this @reef wing
apart from regex
@reef wing curious your take on rstudio vs any other ide? ala vscode. .... any advantages to leaving rstudio?
after doing your change @reef wing the list is empty
just an observation for less data the code is working fine
can anyone help me on how to get all products_ids for each customer from this data column and keep it in a separate column https://paste.pythondiscord.com/toquxujeco.py
apart from using regex
the above link is the data
please help me
the expected output is this
how that even work
Each element of the series product_id is a list
the expected output should in a list
okay sure
I'll DM you
hello, i would like to please ask for Natural lang processing, is it better to first tokenize then do text cleaning or the other way around?
usually i do collect raw data then text cleaning then tokenize
does matplotlib take html values for color when ploting
depends on your ram constraints I have 256 GB of ram for NLP and I clean out using PCA in my grid search.
Also note that you should change your tokenizer to convert to int32 or int16 to reduce ram consumption * <- this took me years to learn haha.
@prime hearth
oh ok thx
can someone eli5 random_state
Is this ai?
yeah
Well
thx
Throws a lot of terminology at you. Be warned
i am facing some trouble with cosine similarities and weight-rating into pivot table
Share it
yes two mins sharing the csv
i am copying
it
two mins
https://paste.pythondiscord.com/uhuzaxewuh.csv
https://paste.pythondiscord.com/egamurezol.csv
https://paste.pythondiscord.com/ulocucuton.py
here are the csv for these i have the pivot table already but i need to get a cosine similarity and weight-rating and create another combine matrix
first need to find weightage after that cosine similarities
please anyone help
I want to write a list into a textfile with this Code:
```py
for row in rgb_10:
    np.savetxt(value_file, row)
value_file.close()
```
But it says `ValueError: Expected 1D or 2D array, got 0D array instead`
This is what my list looks like
can anyone help please
!d sklearn.metrics.pairwise.cosine_similarity
```py
sklearn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True)
```
Compute cosine similarity between samples in X and Y.
Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:
> K(X, Y) = <X, Y> / (||X||*||Y||)
>
> On L2-normalized data, this function is equivalent to linear\_kernel.
Read more in the [User Guide](https://scikit-learn.org/stable/modules/metrics.html#cosine-similarity).
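To make the formula concrete, here is a small NumPy sketch of the same normalized dot product (`cosine_similarity_matrix` and `ratings` are made-up names, and rows that are all zeros are not handled here; they would give NaN):

```python
import numpy as np

def cosine_similarity_matrix(X, Y=None):
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Scale every row to unit length, then the dot product is the cosine.
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return X_unit @ Y_unit.T

# Rows 0 and 1 point the same way; row 2 is orthogonal to both.
ratings = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
sim = cosine_similarity_matrix(ratings)
print(np.round(sim, 3))
```

Entry `[i, j]` is the similarity between row `i` and row `j`, so a pivot table with one row per customer would give a customer-by-customer matrix.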
not too sure what weightage is
got it
got it
@royal crest
thank you
and can you tell how to add two different size matrices using numpy
pretty sure you need two matrices of the same dimension in order to perform matrix addition.
yes but is there any way we can achieve that
can't you just use +
!e
import numpy as np
a = np.array([[3, 6, 9], [5, -10, 15], [-7, 14, 21]])
b = np.array([[9, -18, 27], [11, 22, 33], [13, -26, 39]])
print(a + b)
@royal crest :white_check_mark: Your eval job has completed with return code 0.
001 | [[ 12 -12 36]
002 | [ 16 12 48]
003 | [ 6 -12 60]]
i don't know any fancier way to add matrices, perhaps someone else can chime in
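If the shapes really differ, one option is to zero-pad both into a common shape before adding. Whether zero is the right fill value depends on what the matrices mean, so treat this as a sketch under that assumption (`add_padded` is a made-up helper, and both inputs are assumed to be 2D and aligned at the top-left corner):

```python
import numpy as np

def add_padded(a, b):
    a, b = np.asarray(a), np.asarray(b)
    rows = max(a.shape[0], b.shape[0])
    cols = max(a.shape[1], b.shape[1])
    # Start from zeros in the larger shape and add each matrix into its corner.
    out = np.zeros((rows, cols), dtype=np.result_type(a, b))
    out[:a.shape[0], :a.shape[1]] += a
    out[:b.shape[0], :b.shape[1]] += b
    return out

small = np.array([[1, 2], [3, 4]])
large = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(add_padded(small, large))
```

Plain `a + b` only works when the shapes are equal or broadcastable (for example a 3x3 matrix plus a length-3 row).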
maybe convert your list into a numpy array then write it?
!e
import numpy as np
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f'{a}\n{type(a)}')
b = np.array(a)
print(f'{b}\n{type(b)}\nDimensions:{b.ndim}')
@royal crest :white_check_mark: Your eval job has completed with return code 0.
001 | [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
002 | <class 'list'>
003 | [[1 2 3]
004 | [4 5 6]
005 | [7 8 9]]
006 | <class 'numpy.ndarray'>
007 | Dimensions:2
got it
quick question, do we have some sort of recurrent KNN?
still doesn't work :/ my Array is now [43 19 16 15 14 12 12 12 12 11 11 11 11 10 10 10 10 10 10]and it says ValueError: Expected 1D or 2D array, got 0D array instead
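A guess at what is going on, based only on the loop shown earlier: iterating over a 1D array hands `np.savetxt` one scalar at a time, and a scalar is 0D. Passing the whole array in a single call avoids that. In this sketch `io.StringIO` stands in for the opened text file and the values are shortened from the message above.

```python
import io
import numpy as np

rgb_10 = np.array([43, 19, 16, 15, 14, 12])

# Each element of a 1D array is a scalar, i.e. it has 0 dimensions.
first = rgb_10[0]
print(np.ndim(first))

# One call on the whole array: a 1D array is written one value per line.
buffer = io.StringIO()  # stand-in for open("values.txt", "w")
np.savetxt(buffer, rgb_10, fmt="%d")
print(buffer.getvalue())
```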
What would that achieve
Like i'm trying to train a model to detect whether it's a fall or no fall, and my junior was like you can use recurrent knn.
I think he meant RNN but he insists it's recurrent KNN
I mean
KNN is an iterative process, maybe he meant that
although he could've just said KNN
that's what I thought. like in grad school they didn't mention recurrent KNN... so idk
In his words, he was like you can use an initial set of data to create a KNN model, then keep training the KNN... like epochs
can anyone tell how to get the weightage for each product from pivot table
@quasi parcel what is weightage
so for each customer there is a product assigned. need to find the weightage for each product for every occurrence. give me some time, i will explain properly @serene scaffold sir
yes
Can you give an example of the data?
Start with the assumption that you can solve something without making a lambda.
Please provide it as text and ping me when you have it.
any one know how to categorize or combine frames in millisecond into second?
Hey guys, last week my boss proposed that I create an AI for form processing, but I am new to AI and ML.
I started my research on python libraries and found PyTorch and TensorFlow.
I want to study more about them, but I am lost on where to start.
Can anyone help me?
From Wikipedia: "Forms processing is a process by which one can capture information entered into data fields and convert it into an electronic format." Why would you need to build AI when this can be done quite simply with a little coding?
Read new books and studies. The more you read, the more ideas you get. A good place to get ideas with good keywords: https://ieeexplore.ieee.org/Xplore/home.jsp
really sorry there was some prod issue i had to address
this is the data
@serene scaffold
and you want the weighted mean of what?
The problem is that our forms are not structured
We already use a tool to capture the data, but it is manual because there are some fields that need optical recognition.
for the product id occurrence per customer
what determines the weight?
here's an example, we need to recognize that the option 'Diariamente' from field 'Bebidas Alcoolicas' is selected, and do this for all other fields
I don't think I can look into this rn but you can get weighted mean with np.repeat and np.mean
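A small sketch of that idea with made-up numbers (`values` and `counts` are placeholders, with the counts acting as weights). `np.average` with `weights=` gives the same result without materialising the repeated array:

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0])
counts = np.array([1, 2, 3])  # how often each value occurs, used as the weight

# Repeat each value by its count, then take a plain mean.
via_repeat = np.mean(np.repeat(values, counts))
# Same thing, computed directly from the weights.
via_average = np.average(values, weights=counts)
print(via_repeat, via_average)
```

`np.repeat` needs integer counts, so for fractional weights only the `np.average` form applies.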
Also, these fields and other different fields can be anywhere in the document with different alignments
let me check thank you sir @serene scaffold
Does anyone have any experience using numpy and folium/heatmap? I've been trying to ask a question for a few days but we haven't gotten anywhere
seaborn has a pretty heatmap
sns.heatmap(df, annot=True)
Never done folium though.
hey if anyone here is able to help with a question come to helproom burrito
Okay, kind of Optical Character Recognition. There are definitely ready-made OCR models for it. All you need is good training data and some study.
can anyone help me implement a way to stop getting rate limited using twitter api
https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
Hey ... I'm not exactly great at math but trying to translate this to python
Problem: I'm not getting the result shown on this website
not exactly what it's supposed to look like
Here is what my code currently looks like:
cov_matrix = np.cov(points_for_cov)
eigenvalue, eigenvector = np.linalg.eig(cov_matrix)
center_x = np.average(all_points_x)
center_y = np.average(all_points_y)
So I'm simply using numpy to calculate the covariance matrix from my point list
and then linalg.eig for the eigenvalue and vector
The rotation for the ellipse I'm calculating like this
math.atan2(eigenvector[0][1], eigenvector[0][0])
Any idea where I could be going wrong ?
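Two things that commonly go wrong with this kind of code, offered as guesses since the full script isn't shown: `np.cov` expects one row per variable (so x and y as two rows), and the eigenvectors returned by NumPy are the columns of the matrix, so `eigenvector[0]` is a row that mixes components of both vectors. A sketch with made-up points (`principal_axis` is a hypothetical helper):

```python
import math
import numpy as np

def principal_axis(points_x, points_y):
    # One row per variable: row 0 is x, row 1 is y.
    cov_matrix = np.cov(np.vstack([points_x, points_y]))
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    # Eigenvectors are the columns; the last column belongs to the largest eigenvalue.
    major = eigenvectors[:, -1]
    angle = math.atan2(major[1], major[0])
    center = (float(np.mean(points_x)), float(np.mean(points_y)))
    return center, angle, eigenvalues

x = np.array([0.0, 1.0, 2.0, 3.0])
y = x + np.array([0.1, -0.1, 0.1, -0.1])  # roughly the line y = x
center, angle, eigenvalues = principal_axis(x, y)
# The eigenvector sign is arbitrary, so compare the angle modulo 180 degrees.
print(center, math.degrees(angle) % 180)
```

For the ellipse itself, the axis lengths scale with the square roots of the eigenvalues, which is the relationship the linked article describes.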
nice graph
the blue one is supposed to look like that, no worries.
the black lines are supposed to visualize the rotation and width of those ellipses above
you want elipses instead of lines?
sort of. check the article above I linked
also this
problem is, the black lines don't correctly fit my data
I can't pinpoint where I'm going wrong
I think the black line should more look like the red one
ellipses are just a different visualization of the black lines
What yall up to?
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
manual_spans = []
manual_spans.append(plt.axvspan(242068, 249530, 0.5, 1, color='g',alpha=0.5))
manual_spans.append(plt.axvspan(242055, 249503, 0.5, 1, color='g',alpha=0.5))
total_count = 0
total_frames = 0
overlap_count = 0
overlap_frames = 0
for i in manual_spans:
    total_count += 1
    total frames += (i[1]-i[0])
    check = 0
    for j in ranges_omega:
        if (j[0] > i[0] and j[0] < i[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (i[1] - j[0])
        elif (i[0] > j[0] and i[0] < j[1]):
            if check == 0:
                overlap_count += 1
                check = 1
            overlap_frames += (j[1] - i[0])
percent_called = (float(overlap_count)/total_count)*100
percent_overlap = (float(overlap_frames )/total_frames)*100
print(percent_called, percent_overlap)
so im trying to just find the percent overlap between manual_calls and ranges_reversal
and need some help checking this code
File "<ipython-input-121-551078431366>", line 9
total frames += (i[1]-i[0])
^
SyntaxError: invalid syntax
TypeError Traceback (most recent call last)
<ipython-input-122-cb481450f9a5> in <module>()
7 for i in manual_spans:
8 total_count += 1
----> 9 total_frames += (i[1]-i[0])
10 check = 0
11 for j in ranges_omega:
TypeError: 'Polygon' object is not subscriptable
anyone know how to fix this error?
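The TypeError comes from `plt.axvspan(...)` returning a drawing object, not the numbers that were passed in, so `i[0]` and `i[1]` have nothing to index. A sketch that keeps the (start, stop) pairs in their own list and computes the overlap from those; the `ranges_omega` values are made up, and the spans inside `ranges_omega` are assumed not to overlap each other (otherwise frames get counted twice):

```python
def overlap_length(lo1, hi1, lo2, hi2):
    """Length of the intersection of two [lo, hi] spans, 0 if they are disjoint."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))

# Plain numbers; plt.axvspan(lo, hi, ...) can still be called with them for drawing.
manual_ranges = [(242068, 249530), (242055, 249503)]
ranges_omega = [(242000, 243000), (249000, 250000)]  # made-up example values

total_frames = sum(hi - lo for lo, hi in manual_ranges)
overlap_frames = 0
overlap_count = 0
for m_lo, m_hi in manual_ranges:
    lengths = [overlap_length(m_lo, m_hi, o_lo, o_hi) for o_lo, o_hi in ranges_omega]
    overlap_frames += sum(lengths)
    # A manual span counts as "called" if it touches at least one computed span.
    overlap_count += any(length > 0 for length in lengths)

percent_called = 100 * overlap_count / len(manual_ranges)
percent_overlap = 100 * overlap_frames / total_frames
print(percent_called, percent_overlap)
```

Taking `min` of the ends and `max` of the starts also covers the case where one span sits entirely inside the other, which the two-branch `if`/`elif` version would overcount.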
@tribal arrow i want to find time overlap
on the x axis
ah alright
Anyway, take the maximum and minimum value on the x-axis of the zigzag line (your sample)
and also the minimum and maximum value on the x-axis of the big rectangle
the difference, of each, between their minimum and maximum is their size
divide the rectangle size by the other and multiply the result by 100
should give you the percentage overlap
unless I'm making some elementary mistake
how do i get pd.read_csv to read a uint64 col? tried pd.read_csv(fname, squeeze = True, usecols = [4], names = ['time'], dtype = { 'time': 'uint64' }, engine = 'c'), but it reads it as float64 instead
a typical entry in this field looks like this 1598918686328, which doesnt fit in 32 bits
Pandas dtype is int64? NumPy type has uint64
Hey guys I have a quick question, how can you categorize entire csv file?
I'm trying to categorize a csv, and the compare with other
I would use Pandas
how? I know I can add a categorical column... but it's not that
Because i'm trying to use ML to identify pattern
ive tried it with np.uint64 also, no luck. ended up using pyarrows read_csv instead
Pandas doesn't have that dtype and converts it to float? Pandas has int64. Or is there NaN values?
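For what it's worth, a quick check suggests `read_csv` can produce `uint64` when the column is clean, and that missing values are one common reason an integer column comes back as `float64`. Whether that is what happened with the file above is a guess. The CSV text below is made up; `"UInt64"` (capital U) is pandas' nullable unsigned type.

```python
import io
import pandas as pd

# Clean column: an explicit uint64 dtype is honoured.
clean = io.StringIO("time\n1598918686328\n1598918686329\n")
s_clean = pd.read_csv(clean, dtype={"time": "uint64"})["time"]
print(s_clean.dtype)

# Column with a missing entry: the default inference falls back to float64 ...
text_with_gap = "id,time\n1,1598918686328\n2,\n3,1598918686329\n"
s_default = pd.read_csv(io.StringIO(text_with_gap))["time"]
print(s_default.dtype)

# ... while the nullable UInt64 dtype keeps integers and marks the gap as <NA>.
s_nullable = pd.read_csv(io.StringIO(text_with_gap), dtype={"time": "UInt64"})["time"]
print(s_nullable.dtype, int(s_nullable.isna().sum()))
```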
I have a jupyter server running on a remote machine. how can I access the jupyter GUI for that server locally?
it gave me a url I can go to but obviously I can't just copy and paste that url into a local browser.
ssh?
what about ssh?
I don't know much about networking myself but apparently you can use ngrok to make some sort of tunnel and use it remotely
@serene scaffold just use something like ColabCode if you are too lazy. I use it all the time
hi @serene scaffold remote server means is it sagemaker or an aws instance or what is it
can you tell me
i think we can get through apache
@grave frost @quasi parcel I think I figured it out but it's specific to this system.
okay awesome sir @serene scaffold
can anyone help me with not getting rate limited using twitter api?
```py
keywords_tweets = tweepy.Cursor(self.api.search, keyword, lang = lang, tweet_mode = 'extended').items(count)
return keywords_tweets
```
is my main function
im guessing its a problem with using the cursor but how can i change it?
@real wigeon you'd need to share an example and the expected output
Is it a list of strings or what?
yes
it's categorical data
strings
like if i query my db, and get a dataframe, and in that dataframe i want to count the values in a column
but i want to count how many instances of each category
my ultimate goal is to actually make a bar chart out of this
Like this?
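In case it helps, counting the instances of each category in a column is what `value_counts` does. A minimal sketch with a made-up dataframe (the column name `category` is a placeholder for whatever the query returns):

```python
import pandas as pd

df = pd.DataFrame({"category": ["a", "b", "a", "c", "a", "b"]})

# One row per distinct value, sorted from most to least frequent.
counts = df["category"].value_counts()
print(counts)
```

From there `counts.plot.bar()` should draw the bar chart, assuming matplotlib is installed.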
Hi ! Anyone know how to apply a green mask (show only the green on an image) within the forward function (need to convert it to OpenVINO) in PyTorch ?
Guys, it is possible to learn the math needed for Data Science (Calculus, Linear Algebra, Statistics, and Probability) only with videos? I know that books are the right way, but there's a lot of unnecessary information that I will probably not use. And I do not want to master math, just want the necessary to work with Data Science and its fields (ML, AI, Data Engineering).
Nope @rose cipher ! There are things I did entire videos + writing down notes + all the exercises at the end of the chapter of book + reading the chapter
And I still forget it 2-3 years down the line
The things I do exercises and practice I remember after quick refresher video
What I think you should do is choose a subset of applications like I did; as it helps with learning much less stuff and makes you much more productive and smarter at that one thing by focusing:
Choose Image processing, text processing, sound processing, or some one domain first and master it as not all the models and math applies to everything.
And finally, welcome to the world of data; I hope all Hispanics support each other.
who said that books are the right way? yes, I'm sure all the information you need is out there in youtube videos, though I would probably use a video series that has some structure to it and make sure that you have some way to verify that you're absorbing the information.
So, can I learn all the math I need for DS and Its fields with videos?
probably
Lol ignore the well thought out and experienced answer if you must. The marketplace already pays me handsomely, wouldn't want competition anywho.
Sorry?
hello, i would like to please ask: for sentiment analysis (is something positive or negative) would we use count vectorizer or TF-IDF or bag of words etc, and for text summary or NLP with relationships we use things like word2vec and glove?
@prime hearth Essentially you need a huge dictionary mapping words to sentiments and conditional sentiments. I wouldn't train one myself but use an existing library. Do you need an english one?
oh it just for personal projects
i was looking over doing some research but from what i read the conclusion i got is if it's for classification then use something like count vectorizer since it's easier, or others, and see what gives better accuracy; otherwise for NLP use word2vec etc
it's because im making 2 models, one to classify something as good or bad and another to summarize text using an RNN. However, i'm not sure if it's best just to use word2vec for both models or which to use? I know since similarity is important i would use word2vec but not sure for classification
Well vectorizing the data is only to prepare it for the model; you can input that "vector" of data into a gradient boosting algorithm or any classifier now; because you now have count/dummy variables as input
The one that's worked the best for me is gradient boosting for text classification. but I think BERT is the best one that exists for NLP
You can make separate models and use the output of one model as input to the 2nd model; OR use the classification to check if the summary returns the same label "GOOD" or "BAD" as the original text.
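A toy sketch of the "vectorize, then feed any classifier" idea with scikit-learn. The four sentences and their labels are made up and far too few to learn anything real; the point is only how the pieces connect. `TfidfVectorizer` could be swapped in for `CountVectorizer` without changing the rest.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Made-up training data, purely for illustration.
texts = [
    "great product, loved it",
    "terrible, would not buy",
    "really great value",
    "awful and terrible quality",
]
labels = ["GOOD", "BAD", "GOOD", "BAD"]

# CountVectorizer turns each text into word counts; the classifier sees only numbers.
model = make_pipeline(CountVectorizer(), GradientBoostingClassifier(random_state=0))
model.fit(texts, labels)

preds = model.predict(texts)
print(list(preds))
```

Running the same `model.predict` on a generated summary is one way to do the "does the summary keep the original label" check mentioned above.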
Suggest me resource for AI or ML , python.
A short list about Machine Learning
General Information
- https://en.wikipedia.org/wiki/Machine_learning
- https://www.technologyreview.com/2018/11/17/103781/what-is-machine-learning-we-drew-you-another-flowchart/
- https://www.mathworks.com/discovery/machine-learning.html
- https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Tutorials - https://github.com/trekhleb/homemade-machine-learning
- http://ujjwalkarn.github.io/Machine-Learning-Tutorials/
- https://github.com/ZuzooVn/machine-learning-for-software-engineers
- https://christinakouridi.blog/2019/07/09/vanilla-gan-numpy/ GAN, Python
Courses - https://www.coursera.org/learn/machine-learning
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-036-introduction-to-machine-learning-fall-2020/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-867-machine-learning-fall-2006/
- https://ocw.mit.edu/courses/mathematics/18-657-mathematics-of-machine-learning-fall-2015/
- https://sebastianraschka.com/blog/2021/dl-course.html Deep Learning
Projects - https://github.com/josephmisiti/awesome-machine-learning
Tools - https://www.tensorflow.org/
- https://pytorch.org/
- https://scikit-learn.org/stable/
Videos - https://www.youtube.com/playlist?list=PLbg3ZX2pWlgKV8K6bFJr5dhM7oOClExUJ
It would be better to follow one at a time?
Well for me I started with this: https://cs50.harvard.edu/ai/2020/ and then this: https://developers.google.com/machine-learning/crash-course
Then after that I stopped following courses and started reading papers/articles for reference in my projects
Another good resource too for learning data cleaning techniques is Krish Naik youtube channel
Ok suggest me papers for references. I mean how I will use.
Ok
I don't want to stuck myself in many as I have other subjects not just one .
Thanks for these..
Well suggest me how can I get help from paper or article ?
Papers and articles usually target specific problems. You can read them after you develop a respectable foundation, for example with the courses I sent above
Right I just want to take glance 🙂
@tender hearth
Suggest me using weka related course.
weka?
how to perform ml with this type of data. Its FHIR dataset of Allergy Intolerance
what do you exactly mean by perform ml?
I mean. What kind of information do you want to extract from this dataset?
Do you want to predict something, classify something, etc?
You can't just "do ML" on data
yes i couldnt understand what to do with that data
@tender hearth @lapis sequoia http://hapi.fhir.org/baseR4/AllergyIntolerance?_format=json pls check this out to me
I suppose that this is for some open-ended project based on the given dataset
actually im doing an internship and they gave a sample dataset and im stuck. I have worked with csv but json is very new to me
They didn't provide any guidance on what to do?
they told to do like anything on it
they told this - "Pick your own data set and show us some real analytics and ML implementations based on patient data"
pls somebody can u do ml implementation on it. simple is fine
https://www.tensorflow.org/api_docs howd you even get a ML internship without knowing how to ML
well i have worked with csv but not json
json is even easier than csv
btw i wont get paid or anything like that
anyways you can read through the tensorflow docs to get up to speed with the basics
or you can use keras for a more high level tf implementation
actually im not able to understand what kind of implementation to be done whether its prediction or classification. I have cleaned the json and set it into tabular columns
what data is given?
i will send u the raw json link
http://hapi.fhir.org/baseR4/AllergyIntolerance?_format=json its fhir dataset of allergy intolerance
can u pls do an ml implementation and send me the notebook pls
uhhh
bruh people are not gonna just do the projects and do stuff for you. if you want help we're happy to help but don't expect full fledged code written just for you.
and about ML implementation well for starters you can show some statistics, may be try to find attributes which matter most. and may be divide dataset into training and testing set and see how different supervised algos work out.
I recommend a master’s degree in machine learning at university before you start doing anything. Processing important data without expertise leads to erroneous results. Wrong results lead to wrong decisions and it is detrimental to everyone. Do you take responsibility if someone makes you an incorrect machine learning model which lead to wrong decisions? A bit like a fake doctor would ask you to do surgery on your behalf.
ok im sorry but the data is just a sample data obtained from public test server
Do your own work that you will be asked to do. If you are applying for job you do not know, please bear your responsibility.
why would i do your internship for you? dont apply for something if you cant do it. tell your employer thats it beyond your reach.
you cant just bodge your way through an internship
For me it seems that you have lied about your ability to build machine learning models and now you are here asking others to do it for you. Give us a break. I'm sorry, my intention is not to be rude, but if one can't even convert a json file e.g to Pandas DataFrame doesn't know even the simplest basics in the industry.
o
It's ok. I have converted the json to tabular columns; from there I can't figure out whether the data is for prediction or classification
That's why also domain expertise is important.
Btw its not that internship internship im just practicing myself
You need to understand the problem before you can solve the problem. You also need to be able to examine the data to see what it is suitable for. Sometimes no relevant results are obtained from the data.
But good luck going forward! I have an appointment.
When I was testing ensemble model, I have stumbled upon a phrase verbosity. There is parameter that if set to True, "controls the verbosity of the building process."
Does anyone know what does it mean?
Share the exact function that accepts this parameter
we can't make assumptions
i think the best advice ive heard recently
is try to solve your problem WITHOUT ML first

even if you end up having to use ML in the end, your solution will usually be better for having thought from that perspective
also not every problem is a ML problem

not sure what internship internship means but yeah we have nice resources in pins. you better learn stuff and then 'do ml' on that data lol. you can ask specific questions of course if you get stuck.
I have a Pie chart which looks like this. I want a legend on the right side which shows me the RGB color values of the pies pieces. I have the color values in an Array, I give them to the chart already. How can I pass it to a legend?
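A sketch of one way to do it, assuming the colors are 0-255 RGB triples (matplotlib wants 0-1 floats, hence the scaling). The values in `rgb_values` and `sizes` are made up. `ax.pie` returns the wedge patches, and those can be handed to `ax.legend` together with one label string per wedge.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rgb_values = [(43, 19, 16), (120, 200, 80), (30, 60, 220)]  # made-up 0-255 colors
sizes = [50, 30, 20]

# Scale to 0-1 for matplotlib and build one legend label per color.
colors = [tuple(channel / 255 for channel in rgb) for rgb in rgb_values]
labels = [f"RGB{rgb}" for rgb in rgb_values]

fig, ax = plt.subplots()
wedges, _ = ax.pie(sizes, colors=colors)
# bbox_to_anchor=(1.0, 0.5) with loc="center left" puts the legend to the right of the pie.
legend = ax.legend(wedges, labels, loc="center left", bbox_to_anchor=(1.0, 0.5))
```

When saving, `fig.savefig(..., bbox_inches="tight")` helps keep a legend that sits outside the axes from being cut off.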
quick question, how do you split files inside a folder as train test split
like I'm trying to split it, since each csv file consider as a pattern
can't you just use sklearn's train test split on a list of filenames?
what do you mean? like the one I have is a list of frames in a mmWave device
where each one of it was split into 80+ frames (10 second)
do you want to split your observations/frames or your files?
I want to but the issue is 80+ frames (in 1 csv file) will be consider as a fall or nonfall, and I have like 320 csv file for nonfall and 200 csv file for fall
So, the normal train test split is out of the way. Because it will mix between frames of fall and non fall
Ok so you just run the split on two lists. Fall and not fall. (There might be something called «stratified» to address this already)
but how can I do that with each of the csv file? Like I mention before, I have 200 csv file for fall in 1 folder, and 320 csv file for non fall in another folder.
I don't quite see what you want / are missing. I would personally use dask to make one large dataframe of all "Frames" / data you have, then make a column with "falling" bool True/False as its content. Then pass that column to stratify, e.g. stratify=df["falling"] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
Are the files so large that memory becomes an issue?
Well it's not that large. but each of the csv file contain 10 second of fall or non fall for a particular position. that's why I cant just combine them into 1 large file (also because of the amount of frames is varies)
I would still construct a big dataframe of it all because you can then easily work with the data, loop over positions, fall/nonfall, ... use stratified sampling, etc
(but there are many ways to do this and this is just my preference)
My thinking is that when I am doing the analysis, I do not want to be concerned with which file the data is in, which format, ... and other issues. Convert up to a more abstract dataset so I can focus on the data, not on the storage formats, ...
but when you split it. it will randomize, there will no longer be 10seconds frames or is it?
like I tried it before but it is not correct
so your suggestion is to do it without shuffle?
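Going back to the earlier idea of splitting a list of filenames: if each CSV is one sample, the split can shuffle the files while every file's frames stay together and in order. A sketch with scikit-learn; the file names are placeholders for the two folders (in practice something like `glob` would list them), and the 80/20 ratio is just an example.

```python
from sklearn.model_selection import train_test_split

# Hypothetical file names standing in for the two folders.
fall_files = [f"fall/{i:03d}.csv" for i in range(200)]
nonfall_files = [f"nonfall/{i:03d}.csv" for i in range(320)]

files = fall_files + nonfall_files
labels = [1] * len(fall_files) + [0] * len(nonfall_files)

# Shuffles whole files, not frames; stratify keeps the fall/non-fall ratio in both parts.
train_files, test_files, train_labels, test_labels = train_test_split(
    files, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(train_files), len(test_files), sum(test_labels))
```

Each file in `train_files` can then be loaded on its own, so the 10-second sequence inside it is never mixed with frames from another recording.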
Felipe only asked if it's possible to learn the math they want to learn via YouTube videos. They didn't ask about data science careers. Plenty of university students learn more from youtube videos than from their instructor.
yeah maybe but you should be able to do it quite easily yourself, too - it's hard to understand exactly what you are after here
🥴
can someone eli5 n_estimators in random forest regression?